WO2023073971A1 - Three-dimensional reconstruction device, three-dimensional reconstruction method and program - Google Patents

Three-dimensional reconstruction device, three-dimensional reconstruction method and program

Info

Publication number
WO2023073971A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
dimensional
dimensional reconstruction
coordinates
indoor space
Prior art date
Application number
PCT/JP2021/040164
Other languages
French (fr)
Japanese (ja)
Inventor
みずき 田端
潤一郎 玉松
亮 田中
陽祐 竹内
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2023556075A (JPWO2023073971A1)
Priority to PCT/JP2021/040164 (WO2023073971A1)
Publication of WO2023073971A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery


Abstract

This three-dimensional reconstruction device (1) is provided with: an input unit (11) that inputs a panoramic image (21) of a target indoor space and three-dimensional coordinates (22) interpolated so as to connect the corners of the photographic subject shown in a drawing of the indoor space; an image encoding unit (12) that extracts image feature amounts from the panoramic image (21); a shape encoding unit (13) that extracts three-dimensional shape feature amounts from the interpolated three-dimensional coordinates (22); a concatenation operation unit (14) that generates feature amounts by concatenating the image feature amounts and the three-dimensional shape feature amounts; an image coordinate decoding unit (15) that generates boundary image coordinates by decoding the concatenated feature amounts; and a three-dimensional reconstruction unit (16) that performs three-dimensional reconstruction of the indoor space on the basis of the boundary image coordinates.

Description

Three-dimensional reconstruction device, three-dimensional reconstruction method, and program
 The present disclosure relates to a three-dimensional reconstruction device, a three-dimensional reconstruction method, and a program that perform three-dimensional reconstruction of an indoor space.
 Conventionally, technologies for three-dimensional reconstruction from panoramic images of indoor spaces such as rooms include HorizonNet, which uses a deep neural network. Three-dimensional reconstruction refers to reconstructing the original three-dimensional structure from a projected image of a solid object or from a planar image. A neural network is a mathematical model, reproduced on a computer, of the nerve cells (neurons) in the human brain and their connections, i.e., the neural circuit network, applied to operations such as machine learning. A neural network forms a multi-layered network consisting of an input layer, one or more hidden layers (intermediate layers), and an output layer. A deep neural network is a neural network adapted to deep learning, with a network four or more layers deep. Non-Patent Document 1 describes a method of estimating a three-dimensional room layout from a single panoramic image using HorizonNet. FIG. 5 reproduces a photograph disclosed in Non-Patent Document 1. As shown in FIG. 5, existing technologies such as HorizonNet use only a panoramic image 21 as input information and learn the corners and boundaries of wall surfaces from the image alone. Three-dimensional reconstruction 23 is then performed by detecting the boundary B and estimating the three-dimensional coordinates. Non-Patent Document 2 outlines PointNet, a deep neural network that can take measured point clouds directly as input.
 However, conventional boundary detection using only panoramic images is based on image feature amounts (edges, corners, etc.), and there are many indoor spaces where boundary detection from image feature amounts is difficult because the numerous objects present in the space occlude the boundaries. Applying the conventional technology to such indoor spaces lowers the boundary detection accuracy and therefore the reconstruction accuracy of the three-dimensional image.
 An object of the present invention, made in view of such circumstances, is to enable highly accurate three-dimensional reconstruction of an indoor space even when boundary detection from image feature amounts is difficult, such as when many objects are present in the indoor space.
 To solve the above problems, a three-dimensional reconstruction device according to one embodiment is a three-dimensional reconstruction device that performs three-dimensional reconstruction of an indoor space, comprising: an input unit that inputs a panoramic image of a target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the photographic subject described in a drawing of the indoor space; an image encoding unit that extracts an image feature amount from the panoramic image; a shape encoding unit that extracts a three-dimensional shape feature amount from the interpolated three-dimensional coordinates; a concatenation operation unit that generates a feature amount by concatenating the image feature amount and the three-dimensional shape feature amount; an image coordinate decoding unit that generates boundary image coordinates by decoding the concatenated feature amount; and a three-dimensional reconstruction unit that performs three-dimensional reconstruction of the indoor space based on the boundary image coordinates.
 To solve the above problems, a three-dimensional reconstruction method according to one embodiment is a three-dimensional reconstruction method for performing three-dimensional reconstruction of an indoor space, comprising, by a three-dimensional reconstruction device: a step of inputting a panoramic image of a target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the photographic subject described in a drawing of the indoor space; a step of extracting an image feature amount from the panoramic image; a step of extracting a three-dimensional shape feature amount from the interpolated three-dimensional coordinates; a step of generating a feature amount by concatenating the image feature amount and the three-dimensional shape feature amount; a step of generating boundary image coordinates by decoding the concatenated feature amount; and a step of performing three-dimensional reconstruction of the indoor space based on the boundary image coordinates.
 To solve the above problems, a program according to one embodiment causes a computer to function as the above three-dimensional reconstruction device.
 According to the present disclosure, even for spaces where boundary detection from image feature amounts is difficult, such as when many objects are present in the indoor space, using a panoramic image together with three-dimensional coordinates such as a point cloud makes highly accurate boundary detection possible, and therefore highly accurate three-dimensional reconstruction of the indoor space possible.
 FIG. 1 is a block diagram showing a configuration example of a three-dimensional reconstruction device according to one embodiment.
 FIG. 2 is a flowchart showing an example of a three-dimensional reconstruction method executed by the three-dimensional reconstruction device according to one embodiment.
 FIG. 3 is a flowchart explaining the procedure by which the shape encoding unit extracts a three-dimensional shape feature amount.
 FIG. 4 is a block diagram showing a schematic configuration of a computer functioning as the three-dimensional reconstruction device.
 FIG. 5 is a diagram showing a conventional procedure for three-dimensional reconstruction from a panoramic image.
 Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
 FIG. 1 is a block diagram showing a configuration example of a three-dimensional reconstruction device according to one embodiment. The three-dimensional reconstruction device 1 shown in FIG. 1 includes an input unit 11, an image encoding unit 12, a shape encoding unit 13, a concatenation operation unit 14, an image coordinate decoding unit 15, and a three-dimensional reconstruction unit 16. The three-dimensional reconstruction device 1 performs three-dimensional reconstruction of an indoor space.
 In this embodiment, a panoramic image 21 of the indoor space and a drawing of that indoor space are used together to realize highly accurate boundary detection even for spaces where it is difficult to obtain a sufficient image feature amount a. To this end, one panoramic image 21 of the indoor space and the three-dimensional coordinates 22 (with an arbitrary point as the origin) interpolated so as to connect the corners of the photographic subject described in the drawing of the indoor space must be prepared in advance as input data.
 The input unit 11 inputs a panoramic image 21 of the target indoor space and three-dimensional coordinates 22 interpolated so as to connect the corners of the photographic subject described in the drawing of the indoor space. The input unit 11 must be given one panoramic image and the corresponding three-dimensional coordinates 22 (obtained from the dimensions described in the drawing of the indoor space, with an arbitrary point as the origin). The input unit 11 outputs the panoramic image 21 of the target indoor space to the image encoding unit 12 and outputs the interpolated three-dimensional coordinates 22 to the shape encoding unit 13.
 The image encoding unit 12 extracts the image feature amount a, represented as a matrix, from the panoramic image 21. The image encoding unit 12 may extract the image feature amount a from the panoramic image 21 using HorizonNet. The image encoding unit 12 outputs the image feature amount a to the concatenation operation unit 14.
 HorizonNet is a neural network that addresses the task of estimating a three-dimensional room layout from a single panoramic image. The HorizonNet feature extractor receives a single panoramic image 21 and extracts a plurality of feature amounts represented as matrices. It then trains on a feature map obtained by concatenating these feature amounts and generates boundary image coordinates representing the boundary positions between floor and wall, between ceiling and wall, and between wall and wall. Finally, post-processing is applied to reconstruct the three-dimensional room layout from the image coordinates of the boundaries.
 The shape encoding unit 13 extracts the three-dimensional shape feature amount b, represented as a matrix, from the interpolated three-dimensional coordinates 22. The shape encoding unit 13 may extract the three-dimensional shape feature amount b from the interpolated three-dimensional coordinates 22 using PointNet. The shape encoding unit 13 outputs the three-dimensional shape feature amount b to the concatenation operation unit 14. The procedure by which the shape encoding unit 13 extracts the three-dimensional shape feature amount b is described later with reference to FIG. 3.
 The concatenation operation unit 14 generates the feature amount c by concatenating the image feature amount a extracted from the panoramic image 21 and the three-dimensional shape feature amount b extracted from the interpolated three-dimensional coordinates 22, each represented as a matrix. The concatenation operation unit 14 outputs the concatenated feature amount c to the image coordinate decoding unit 15.
 The image coordinate decoding unit 15 generates the boundary image coordinates d by decoding the concatenated feature amount c. The image coordinate decoding unit 15 may decode the concatenated feature amount c using HorizonNet.
 The three-dimensional reconstruction unit 16 performs three-dimensional reconstruction of the indoor space based on the boundary image coordinates d.
 FIG. 2 is a flowchart showing an example of the three-dimensional reconstruction method executed by the three-dimensional reconstruction device according to one embodiment.
<Step S01>
 In step S01, the input unit 11 inputs the panoramic image 21. The panoramic image 21, prepared before this step is executed, has the vertical direction of the image aligned with the zenith direction.
<Step S02>
 In step S02, the input unit 11 inputs the three-dimensional coordinates 22 interpolated so as to connect the corners of the photographic subject of the panoramic image 21. Before this step is executed, the three-dimensional coordinates of the photographic subject of the panoramic image 21 (the three-dimensional coordinates of the corners, obtained by setting the origin of the coordinate system at one arbitrary corner) are prepared. The three-dimensional coordinates 22 are these coordinates interpolated so as to connect the corners. Whereas the pre-interpolation data holds the three-dimensional coordinates of a point cloud consisting only of the corner points, the three-dimensional coordinates 22 also hold the three-dimensional coordinates of each point in the point cloud along the interpolating line segments connecting the corners.
<Step S03>
 In step S03, the image encoding unit 12 extracts the image feature amount a, represented as a matrix, from the panoramic image 21. The above-mentioned HorizonNet or the like may be used to extract the image feature amount a.
<Step S04>
 In step S04, the shape encoding unit 13 extracts the three-dimensional shape feature amount b, represented as a matrix, from the three-dimensional coordinates 22. The shape encoding unit 13 may extract the three-dimensional shape feature amount b from the three-dimensional coordinates 22 using PointNet. PointNet is a deep neural network that can take measured point clouds directly as input. FIG. 3 is a flowchart explaining the procedure by which the shape encoding unit 13 extracts the three-dimensional shape feature amount using PointNet. This procedure is described below, divided into steps S041 to S044.
 In step S041, the T-net takes the point cloud as input and outputs a 3x3 affine transformation matrix. The point cloud is represented as a set of three-dimensional points {Pi | i = 1, ..., n}, where each point Pi has (x, y, z) coordinates. The T-net is a network that takes a point cloud as input and outputs an affine transformation matrix. An affine transformation performs scaling, rotation, translation, and the like collectively using a matrix. As shown in FIG. 3, the internal structure of the T-net is similar to that of the shape encoding unit 13.
 In step S042, a matrix multiplier multiplies the input point cloud by the affine transformation matrix output by the T-net (matrix multiply). The result is an n x 3 matrix. This operation makes it possible to remove the effects of translation and rotation of the point cloud.
 As mentioned above, a neural network is a mathematical model, reproduced on a computer, of the nerve cells (neurons) in the human brain and their connections, i.e., the neural circuit network, applied to operations such as machine learning. A "perceptron" is the smallest unit of a neural network: a function that outputs one value for multiple inputs. A "perceptron" outputs one signal for n input signals and is expressed by the following equation (1), where y is the output signal, f() is the "activation function", x_i is the i-th input signal (i = 0, 1, ..., n-1), w_i is a variable representing the weight applied to the i-th input signal, and b is a variable called the bias.

   y = f(w_0 * x_0 + w_1 * x_1 + ... + w_{n-1} * x_{n-1} + b)   (1)
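 Equation (1) translates directly into code; a minimal sketch with made-up weights and inputs:

    import numpy as np

    def perceptron(x, w, b, f=np.tanh):
        """y = f(w_0*x_0 + ... + w_{n-1}*x_{n-1} + b), i.e. equation (1)."""
        return f(np.dot(w, x) + b)

    x = np.array([0.5, -1.0, 2.0])   # n = 3 input signals (illustrative)
    w = np.array([0.1, 0.4, -0.2])   # weights w_i
    print(perceptron(x, w, b=0.05))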
 In step S043, the n x 3 matrix obtained in step S042 is input to a multi-layer perceptron (denoted mlp in FIG. 3). "mlp(64, 128, 256)" in FIG. 3 is a multi-layer perceptron whose layers have output sizes 64, 128, and 256. A multi-layer perceptron is constructed, for example, by connecting fully-connected layers and activation functions. A fully-connected layer combines the numerical inputs arriving at one node from multiple nodes (modeled neurons) into a single value by a linear transformation. An activation function non-linearly transforms an input value into another value when passing output from one node to the next. In this embodiment, ReLU is used as the activation function. ReLU (Rectified Linear Unit) is a function whose output is always 0 when the input is 0 or less, and equal to the input when the input is greater than 0.
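 The ReLU definition above, as a one-line worked example:

    import numpy as np

    def relu(x):
        """Output 0 for inputs of 0 or less, the input itself otherwise."""
        return np.maximum(0, x)

    print(relu(np.array([-1.5, 0.0, 2.0])))  # [0. 0. 2.]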
 In step S044, the three-dimensional shape feature amount is obtained by max pooling. Max pooling selects and keeps the maximum value among the output values in each range. This operation makes it possible to remove the influence of the order of the points in the point cloud.
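 Putting steps S041 to S044 together, the shape encoding unit can be sketched as a PointNet-style encoder. This is a simplified illustration only, assuming the shared per-point MLP is realized with 1x1 convolutions; details of the published PointNet (batch normalization, the T-net's identity initialization, its exact layer sizes) are omitted.

    import torch
    import torch.nn as nn

    def shared_mlp(sizes, c_in=3):
        """Shared per-point MLP (mlp(64, 128, 256) in FIG. 3), built from
        1x1 convolutions so the same weights apply to every point."""
        layers = []
        for c_out in sizes:
            layers += [nn.Conv1d(c_in, c_out, 1), nn.ReLU()]
            c_in = c_out
        return nn.Sequential(*layers)

    class ToyShapeEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.tnet_mlp = shared_mlp((64, 128, 256))
            self.tnet_fc = nn.Linear(256, 9)   # predicts a 3x3 matrix
            self.mlp = shared_mlp((64, 128, 256))

        def forward(self, pts):                # pts: (B, n, 3)
            x = pts.transpose(1, 2)            # (B, 3, n)
            # S041: T-net outputs a 3x3 affine transformation matrix
            t = self.tnet_fc(self.tnet_mlp(x).max(dim=2).values).view(-1, 3, 3)
            # S042: matrix multiply of the point cloud by that matrix -> n x 3
            x = torch.bmm(pts, t).transpose(1, 2)
            # S043: shared multi-layer perceptron applied to each point
            x = self.mlp(x)                    # (B, 256, n)
            # S044: max pooling over points gives an order-invariant feature
            return x.max(dim=2).values         # (B, 256)

    b = ToyShapeEncoder()(torch.randn(1, 200, 3))
    print(b.shape)                             # torch.Size([1, 256])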
<Step S05>
 In step S05, the concatenation operation unit 14 concatenates the image feature amount a and the three-dimensional shape feature amount b to generate the concatenated feature amount c. Methods for combining feature amounts include simply concatenating the matrices, taking the element-wise sum or element-wise product of the matrices, using a bilinear model, and the like.
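 The combination methods named in step S05 look like this in code; the feature sizes are assumed to already agree where an operation requires it.

    import torch

    a = torch.randn(1, 256)   # image feature amount (illustrative size)
    b = torch.randn(1, 256)   # three-dimensional shape feature amount

    c_concat = torch.cat([a, b], dim=1)   # simple matrix concatenation -> (1, 512)
    c_sum = a + b                         # element-wise sum
    c_prod = a * b                        # element-wise product
    # bilinear model: outer product of the two feature vectors, flattened
    c_bilinear = torch.einsum('bi,bj->bij', a, b).flatten(1)   # (1, 65536)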
<Step S06>
 In step S06, the image coordinate decoding unit 15 decodes the concatenated feature amount c to generate the boundary image coordinates d. A HorizonNet decoder or the like may be used to decode the concatenated feature amount c.
<Step S07>
 In step S07, the three-dimensional reconstruction unit 16 performs three-dimensional reconstruction based on the decoded boundary image coordinates d. The three-dimensional reconstruction unit 16 performs processing such as determining the wall surfaces by principal component analysis based on the Manhattan world assumption. The Manhattan world assumption is the hypothesis that man-made structures in three-dimensional space, such as ceilings and walls, have three dominant, mutually orthogonal axes, and that the surfaces composing them, such as ceilings and walls, are arranged perpendicular or parallel to these three axes.
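 One way to picture the wall-surface determination in step S07: principal component analysis of the reconstructed boundary points yields dominant directions, which the Manhattan world assumption lets one treat as the room's three orthogonal axes. A rough sketch under that assumption, not the disclosed post-processing:

    import numpy as np

    # Hypothetical 3D boundary points recovered from the image coordinates d
    pts = np.random.rand(500, 3)

    centered = pts - pts.mean(axis=0)
    # Principal component analysis via SVD of the centered point matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt                    # rows: dominant, mutually orthogonal directions

    # Under the Manhattan world assumption, wall planes are perpendicular or
    # parallel to these axes; expressing the points in this frame lets
    # axis-aligned planes (walls, floor, ceiling) be fitted per coordinate.
    coords = centered @ axes.T
    print(axes.shape, coords.shape)   # (3, 3) (500, 3)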
 As described above, to achieve the object of the present invention of realizing highly accurate boundary detection even in an indoor space where it is difficult to obtain a sufficient image feature amount a, the panoramic image 21 of the indoor space is used together with the drawing of the indoor space.
 In the present disclosure, highly accurate boundary detection with the aid of the drawing is achieved by (i) preparing, as input data, the three-dimensional coordinates 22 interpolated between the corners of the photographic subject of the panoramic image 21 described in the drawing, (ii) adding the shape encoding unit 13, which extracts the three-dimensional shape feature amount (point cloud feature amount) b from the three-dimensional coordinates 22, and (iii) performing boundary estimation that takes three-dimensional information into account by decoding the feature amount c obtained by concatenating the image feature amount a and the three-dimensional shape feature amount b. This makes possible boundary detection that considers the three-dimensional shape feature amount b in addition to the image feature amount a, improving three-dimensional reconstruction accuracy.
 Using the three-dimensional information described in drawings and the like in addition to the panoramic image 21 enables boundary detection that better matches the drawing information. Highly accurate boundary detection in turn enables highly accurate three-dimensional reconstruction. Furthermore, highly accurate three-dimensional reconstruction reduces the image reprojection error, leading to improved visibility.
 Also, if an inspection image of a structure is used as the panoramic image 21, three-dimensional data containing deterioration information is obtained, so structural calculations can be performed and a quantitative soundness evaluation becomes possible.
 Therefore, according to the three-dimensional reconstruction device 1 of the present disclosure, even for spaces where boundary detection from the image feature amount a is difficult, such as when many objects are present in the indoor space, using the panoramic image 21 together with three-dimensional coordinates 22 such as a point cloud makes highly accurate boundary detection possible, and therefore highly accurate three-dimensional reconstruction possible.
 More specifically, according to the three-dimensional reconstruction device 1 of the present disclosure: (i) whereas the reconstruction accuracy of conventional three-dimensional reconstruction from panoramic images depends on the image feature amount, the present disclosure combines a panoramic image with three-dimensional coordinates such as a point cloud, realizing highly accurate boundary detection even for spaces where sufficient image feature amounts are difficult to obtain; (ii) highly accurate detection of the boundaries between wall surfaces becomes possible without requiring the camera position and orientation at the time the panoramic image was captured; and (iii) highly accurate three-dimensional image reconstruction becomes possible even for indoor spaces whose wall surfaces are occluded by their contents.
 The input unit 11, image encoding unit 12, shape encoding unit 13, concatenation operation unit 14, image coordinate decoding unit 15, and three-dimensional reconstruction unit 16 of the above three-dimensional reconstruction device 1 form part of a control device (controller). The control device may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array), by a processor, or by both.
 A computer capable of executing program instructions can also be used to realize the functions of the three-dimensional reconstruction device 1. FIG. 4 is a block diagram showing a schematic configuration of a computer functioning as the three-dimensional reconstruction device. Here, the computer functioning as the three-dimensional reconstruction device 1 may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like. The program instructions may be program code, code segments, or the like for executing the required tasks.
 As shown in FIG. 4, the computer 100 includes a processor 110; a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, and a storage 140 as storage units; an input unit 150; an output unit 160; and a communication interface (I/F) 170. These components are communicatively connected to one another via a bus 180.
 The ROM 120 stores various programs and various data. The RAM 130 temporarily stores programs or data as a work area. The storage 140 is configured by an HDD (Hard Disk Drive) or SSD (Solid State Drive) and stores various programs, including an operating system, and various data. In the present disclosure, the program according to the present disclosure is stored in the ROM 120 or the storage 140.
 The processor 110 is specifically a CPU (Central Processing Unit), MPU (Micro Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), SoC (System on a Chip), or the like, and may be configured by a plurality of processors of the same or different types. The processor 110 reads a program from the ROM 120 or the storage 140 and executes it using the RAM 130 as a work area, thereby controlling each of the above components and performing various arithmetic processing. At least part of this processing may be realized by hardware.
 The program may be recorded on a recording medium readable by the three-dimensional reconstruction device 1. Using such a recording medium, the program can be installed in the three-dimensional reconstruction device 1. The recording medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited, but may be, for example, a CD-ROM, a DVD-ROM, a USB (Universal Serial Bus) memory, or the like. The program may also be downloaded from an external device via a network.
 Regarding the above embodiments, the following appendices are further disclosed.
(Appendix 1)
A three-dimensional reconstruction device that performs three-dimensional reconstruction of an indoor space,
memory;
a controller connected to the memory;
with
The controller is
Input a panoramic image of the target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the shooting target described in the drawing of the indoor space,
extracting an image feature amount from the panoramic image;
extracting a three-dimensional shape feature amount from the interpolated three-dimensional coordinates;
generating a feature amount by connecting the image feature amount and the three-dimensional shape feature amount;
generating boundary image coordinates by decoding the concatenated features;
A three-dimensional reconstruction device that performs three-dimensional reconstruction of the indoor space based on the image coordinates of the boundary.
(Appendix 2)
The controller is
3. The three-dimensional reconstruction device according to claim 1, wherein image feature amounts are extracted from the panorama image using HorizonNet.
(Appendix 3)
The controller is
3. The three-dimensional reconstruction apparatus according to item 1 or 2, wherein a three-dimensional shape feature amount is extracted from the interpolated three-dimensional coordinates using PointNet.
(Appendix 4)
The controller is
4. The three-dimensional reconstruction device according to any one of additional items 1 to 3, wherein the concatenated feature quantity is decoded using HorizonNet.
(Appendix 5)
A three-dimensional reconstruction method for three-dimensional reconstruction of an indoor space, comprising:
With a three-dimensional reconstruction device,
a step of inputting a panoramic image of a target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the shooting target described in the drawing of the indoor space;
a step of extracting an image feature quantity from the panoramic image;
a step of extracting a three-dimensional shape feature quantity from the interpolated three-dimensional coordinates;
generating a feature amount by connecting the image feature amount and the three-dimensional shape feature amount;
generating image coordinates of boundaries by decoding the concatenated features;
performing three-dimensional reconstruction of the indoor space based on the image coordinates of the boundary;
A three-dimensional reconstruction method comprising:
(Appendix 6)
A non-temporary storage medium storing a computer-executable program, the non-temporary storage storing a program that causes the computer to function as the three-dimensional reconstruction device according to any one of appendices 1 to 4. medium.
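By way of non-limiting illustration, the processing flow of Appendices 1 to 5 can be sketched as follows in Python (PyTorch). This is a minimal sketch only: the class name ThreeDReconstructionPipeline, the layer choices, and all tensor dimensions are assumptions of this illustration and are not part of the disclosure, and simple convolutional stand-ins are used in place of the actual HorizonNet encoder/decoder and PointNet shape encoder.

import torch
import torch.nn as nn

class ThreeDReconstructionPipeline(nn.Module):
    """Minimal sketch of the encode-concatenate-decode flow (hypothetical)."""

    def __init__(self, img_feat_dim: int = 256, shape_feat_dim: int = 256, width: int = 1024):
        super().__init__()
        # Stand-in for the HorizonNet image encoder (cf. Appendix 2).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, img_feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, width)),  # collapse height, keep image columns
        )
        # Stand-in for the PointNet shape encoder (cf. Appendix 3):
        # a shared per-point MLP; max pooling over points follows in forward().
        self.shape_encoder = nn.Sequential(
            nn.Conv1d(3, 64, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(64, shape_feat_dim, kernel_size=1),
        )
        # Stand-in for the HorizonNet decoder (cf. Appendix 4): maps the
        # concatenated features to per-column boundary image coordinates.
        self.decoder = nn.Conv1d(img_feat_dim + shape_feat_dim, 2, kernel_size=1)

    def forward(self, panorama: torch.Tensor, interp_coords: torch.Tensor) -> torch.Tensor:
        # panorama: (B, 3, H, W) panoramic image; interp_coords: (B, 3, N)
        # 3D coordinates interpolated between the corners taken from the drawing.
        img_feat = self.image_encoder(panorama).squeeze(2)   # (B, C1, width)
        pt_feat = self.shape_encoder(interp_coords)          # (B, C2, N)
        global_shape = pt_feat.max(dim=2).values             # (B, C2) global shape feature
        # Broadcast the shape feature along the image columns, then concatenate.
        shape_cols = global_shape.unsqueeze(2).expand(-1, -1, img_feat.size(2))
        fused = torch.cat([img_feat, shape_cols], dim=1)     # (B, C1 + C2, width)
        return self.decoder(fused)                           # (B, 2, width) boundary coords

In this sketch, a PointNet-style global max pooling reduces the interpolated coordinates to a single shape vector, which is broadcast along the panorama's image columns before concatenation with the image feature; this mirrors the column-wise representation from which HorizonNet decodes ceiling and floor boundary positions.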
Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of the present disclosure. Therefore, the present invention should not be construed as being limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, a plurality of the configuration blocks shown in the configuration diagrams of the embodiments may be combined into one, or a single configuration block may be divided.
1   three-dimensional reconstruction device
11  input unit
12  image encoding unit
13  shape encoding unit
14  concatenation operation unit
15  image coordinate decoding unit
16  three-dimensional reconstruction unit
21  panoramic image
22  three-dimensional coordinates (interpolated three-dimensional coordinates)
23  three-dimensional reconstruction
100 computer
110 processor
120 ROM
130 RAM
140 storage
150 input unit
160 output unit
170 communication interface (I/F)
180 bus

Claims (6)

  1.  A three-dimensional reconstruction device that performs three-dimensional reconstruction of an indoor space, comprising:
     an input unit for inputting a panoramic image of a target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the object to be photographed described in a drawing of the indoor space;
     an image encoding unit that extracts an image feature amount from the panoramic image;
     a shape encoding unit that extracts a three-dimensional shape feature amount from the interpolated three-dimensional coordinates;
     a concatenation operation unit that generates a feature amount by concatenating the image feature amount and the three-dimensional shape feature amount;
     an image coordinate decoding unit that generates image coordinates of a boundary by decoding the concatenated feature amount; and
     a three-dimensional reconstruction unit that performs three-dimensional reconstruction of the indoor space based on the image coordinates of the boundary.
  2.  The three-dimensional reconstruction device according to claim 1, wherein the image encoding unit extracts the image feature amount from the panoramic image using HorizonNet.
  3.  The three-dimensional reconstruction device according to claim 1 or 2, wherein the shape encoding unit extracts the three-dimensional shape feature amount from the interpolated three-dimensional coordinates using PointNet.
  4.  The three-dimensional reconstruction device according to any one of claims 1 to 3, wherein the image coordinate decoding unit decodes the concatenated feature amount using HorizonNet.
  5.  A three-dimensional reconstruction method for performing three-dimensional reconstruction of an indoor space, the method comprising, by a three-dimensional reconstruction device:
     inputting a panoramic image of a target indoor space and three-dimensional coordinates interpolated so as to connect the corners of the object to be photographed described in a drawing of the indoor space;
     extracting an image feature amount from the panoramic image;
     extracting a three-dimensional shape feature amount from the interpolated three-dimensional coordinates;
     generating a feature amount by concatenating the image feature amount and the three-dimensional shape feature amount;
     generating image coordinates of a boundary by decoding the concatenated feature amount; and
     performing three-dimensional reconstruction of the indoor space based on the image coordinates of the boundary.
  6.  A program for causing a computer to function as the three-dimensional reconstruction device according to any one of claims 1 to 4.
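As a non-limiting illustration of the input recited in claim 1, the "three-dimensional coordinates interpolated so as to connect the corners" can, for example, be generated by linearly sampling points along the segments joining successive corner coordinates read from the drawing. The following is a minimal Python (NumPy) sketch under that assumption; the function name interpolate_corners, the per-edge sample count, and the closed-polygon traversal are illustrative choices of this sketch, not part of the claims.

import numpy as np

def interpolate_corners(corners: np.ndarray, points_per_edge: int = 64) -> np.ndarray:
    """Linearly interpolate 3D points so as to connect successive corners."""
    segments = []
    for i in range(len(corners)):
        start = corners[i]
        end = corners[(i + 1) % len(corners)]  # wrap around to close the polygon
        t = np.linspace(0.0, 1.0, points_per_edge, endpoint=False)[:, None]
        segments.append(start + t * (end - start))
    return np.concatenate(segments, axis=0)  # shape: (K * points_per_edge, 3)

# Example: four wall corners of a rectangular room, as read from a drawing.
corners = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0],
                    [4.0, 3.0, 0.0], [0.0, 3.0, 0.0]])
dense = interpolate_corners(corners)

The resulting dense point set would correspond to the interpolated three-dimensional coordinates (reference numeral 22) supplied to the shape encoding unit (reference numeral 13).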
PCT/JP2021/040164 2021-10-29 2021-10-29 Three-dimensional reconstruction device, three-dimensional reconstruction method and program WO2023073971A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023556075A JPWO2023073971A1 (en) 2021-10-29 2021-10-29
PCT/JP2021/040164 WO2023073971A1 (en) 2021-10-29 2021-10-29 Three-dimensional reconstruction device, three-dimensional reconstruction method and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/040164 WO2023073971A1 (en) 2021-10-29 2021-10-29 Three-dimensional reconstruction device, three-dimensional reconstruction method and program

Publications (1)

Publication Number Publication Date
WO2023073971A1 (en) 2023-05-04

Family

ID=86157671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040164 WO2023073971A1 (en) 2021-10-29 2021-10-29 Three-dimensional reconstruction device, three-dimensional reconstruction method and program

Country Status (2)

Country Link
JP (1) JPWO2023073971A1 (en)
WO (1) WO2023073971A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017203709A1 (en) * 2016-05-27 2017-11-30 楽天株式会社 Three-dimensional model generation system, three-dimensional model generation method, and program
US20200302686A1 (en) * 2019-03-18 2020-09-24 Geomagical Labs, Inc. System and method for virtual modeling of indoor scenes from imagery

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zioulis, Nikolaos; Alvarez, Federico; Zarpalas, Dimitrios; Daras, Petros: "Single-shot cuboids: Geodesics-based end-to-end Manhattan aligned layout estimation from spherical panoramas", Image and Vision Computing, Elsevier, vol. 110, 18 March 2021, ISSN: 0262-8856, DOI: 10.1016/j.imavis.2021.104160 *

Also Published As

Publication number Publication date
JPWO2023073971A1 (en) 2023-05-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962510

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023556075

Country of ref document: JP