JP2022050293A

JP2022050293A - Information processing device, information processing method, information processing system, and information processing program

Info

Publication number: JP2022050293A
Application number: JP2021009953A
Authority: JP
Inventors: 博和森田; Hirokazu Morita; 洋介安田; Yosuke Yasuda
Original assignee: Spacely Inc
Current assignee: Spacely Inc
Priority date: 2020-09-17
Filing date: 2021-01-26
Publication date: 2022-03-30
Also published as: JP2022049767A; JP6830561B1

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device, an information processing method, an information processing system and an information processing program, with which it is possible to process information that pertains to a photograph of a space and a plan view of the space.

SOLUTION: An information processing device 3 comprises a storage unit 33 and a control unit 32. The storage unit 33 stores a trained model that has been trained by machine learning using a convolutional neural network on the basis of a correspondence relation between the plan views of N first spaces and one or more photographs taken of each of the N first spaces. On the basis of the trained model, a photograph of a second space differing from the N spaces, and the plan view of the second space, the control unit 32 extracts information in which the photograph of the second space is compressed and concatenates it with information in which the plan view of the second space is compressed, thereby estimating the photographing position and photographing direction at which the photograph of the second space was taken in the plan view of the second space.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing device, an information processing method, an information processing system, and an information processing program. As one example, it relates to an information processing device, method, system, and program capable of processing information related to "room photographs" and "floor plans" of real estate properties.

A real estate listing usually includes a floor plan as a plan view of the property and, in addition, may include photographs of specific rooms or facilities. Conventionally, when room photographs of a property were taken (for example, photographs of a corridor, living room, Western-style room, kitchen, toilet, entrance, or veranda), the correspondence between each photograph and the floor plan (for example, from which location and in which direction it was taken) was entered manually by a real estate agent or the like.

Non-Patent Document 1: Nam Vo et al., CVPR 2019, Composing Text and Image for Image Retrieval - An Empirical Odyssey (arXiv:1812.07119)
Non-Patent Document 2: Shuai Liao et al., CVPR 2019, Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres (arXiv:1904.05404)

As described above, room photographs of a property have conventionally been published in property listings and the like, accompanied by a description explaining which location or room each photograph shows. However, to date, attempts to automate the input of the shooting position and shooting direction have not succeeded (other than by the present inventors). The main reason is that a floor plan, which is an architectural drawing, and a room photograph, which captures the actual room of a property, are images of entirely different dimensionality. See, for example, FIGS. 1 and 2. FIG. 1 is an example of a so-called floor plan (floor design). In the property of FIG. 1, there is a storage space directly in front of the entrance, and from the entrance one can enter the toilet and a 10.9-tatami LDK (living/dining/kitchen). The LDK contains the kitchen and adjoins a 6.7-tatami Western-style room on the right of the drawing and a 5.4-tatami Western-style room on the left. In total there are two rooms and an LDK. FIG. 2, by contrast, is a photograph of the interior of the property shown in FIG. 1, taken with a digital camera. It differs from the floor plan (floor design) in both dimensionality and viewpoint.

In FIG. 1, the shooting position and shooting direction of the photograph are indicated by arrows. A human observer can tell that the kitchen is straight ahead and that several doors are located to the right, but a computer could not identify the shooting position and shooting direction automatically with good accuracy, so a person at a real estate company or the like had no choice but to infer them and enter them by hand. Having identified this problem, the present inventors recognized the need for a more convenient information processing device for processing floor plans and room photographs of real estate properties, an information processing method using that device, an information processing system including that device, and an information processing program that causes a computer to operate as that device, and thereby arrived at the present invention. In doing so, the inventors encountered prior art documents on image recognition such as Non-Patent Documents 1 and 2, but the techniques described in those documents could not solve this problem.

In one embodiment of the present invention, there is provided an information processing device (corresponding to the information processing device 3 in the detailed description of the invention) having a storage unit (corresponding to the storage unit 33 in the detailed description) and a control unit (corresponding to the control unit 32 in the detailed description), wherein the storage unit stores a trained model based on a correspondence between information on N first objects (corresponding to the information on the rooms of the learning model in the detailed description) and one or more photographs taken of each of the N first objects (corresponding to the room photographs of the learning model in the detailed description), and the control unit estimates, based on the trained model, a photograph of a second object different from the N objects (corresponding to the room photograph to be estimated in the detailed description), and a plan view of the second object (corresponding to the floor plan associated with the room photograph to be estimated in the detailed description), the shooting position and shooting direction at which the photograph of the second object was taken in the plan view of the second object.

An information processing device having the above configuration can use the trained model to estimate, in the plan view of an object, the shooting position and shooting direction at which a photograph of the object was taken.

In one embodiment of the present invention, there is provided an information processing method in an information processing device (corresponding to the information processing device 3 in the detailed description of the invention) having a storage unit (corresponding to the storage unit 33 in the detailed description) and a control unit (corresponding to the control unit 32 in the detailed description), wherein the storage unit stores a trained model based on a correspondence between information on N first objects (corresponding to the information on the rooms of the learning model in the detailed description) and one or more photographs taken of each of the N first objects (corresponding to the room photographs of the learning model in the detailed description), and the control unit estimates, based on the trained model, a photograph of a second object different from the N objects (corresponding to the room photograph to be estimated in the detailed description), and a plan view of the second object (corresponding to the floor plan associated with the room photograph to be estimated in the detailed description), the shooting position and shooting direction at which the photograph of the second object was taken in the plan view of the second object.

An information processing method having the above configuration enables a computer to use the trained model to estimate, in the plan view of an object, the shooting position and shooting direction at which a photograph of the object was taken.

In one embodiment of the present invention, there is provided an information processing program that causes a computer to function as an information processing device (corresponding to the information processing device 3 in the detailed description of the invention) having a storage unit (corresponding to the storage unit 33 in the detailed description) and a control unit (corresponding to the control unit 32 in the detailed description), wherein the storage unit stores a trained model based on a correspondence between information on N first objects (corresponding to the information on the rooms of the learning model in the detailed description) and one or more photographs taken of each of the N first objects (corresponding to the room photographs of the learning model in the detailed description), and the control unit estimates, based on the trained model, a photograph of a second object different from the N objects (corresponding to the room photograph to be estimated in the detailed description), and a plan view of the second object (corresponding to the floor plan associated with the room photograph to be estimated in the detailed description), the shooting position and shooting direction at which the photograph of the second object was taken in the plan view of the second object.

An information processing program having the above configuration makes it possible to use a computer as a device that uses the trained model to estimate, in the plan view of an object, the shooting position and shooting direction at which a photograph of the object was taken.

In one embodiment of the present invention, there is provided an information processing system including an information processing device (corresponding to the information processing device 3 in the detailed description of the invention), the information processing device having a storage unit (corresponding to the storage unit 33 in the detailed description) and a control unit (corresponding to the control unit 32 in the detailed description), wherein the storage unit stores a trained model based on a correspondence between information on N first objects (corresponding to the information on the rooms of the learning model in the detailed description) and one or more photographs taken of each of the N first objects (corresponding to the room photographs of the learning model in the detailed description), and the control unit estimates, based on the trained model, a photograph of a second object different from the N objects (corresponding to the room photograph to be estimated in the detailed description), and a plan view of the second object (corresponding to the floor plan associated with the room photograph to be estimated in the detailed description), the shooting position and shooting direction at which the photograph of the second object was taken in the plan view of the second object. The information processing system may further include a terminal that can access the information processing device via an Internet connection or the like.

An information processing system having the above configuration enables a computer to use the trained model to estimate, in the plan view of an object, the shooting position and shooting direction at which a photograph of the object was taken and, when the system includes a terminal, allows the result of that estimation to be presented at the terminal.

In one embodiment of the present invention, estimating the shooting position and shooting direction may be characterized in that the control unit compresses the photograph of the second object based on the trained model to recognize features consisting of the content and position of objects in the photograph of the second object, and compares them with features consisting of the content and position of objects in the plan view of the second object.

With the above configuration, the amount of computation needed to estimate the shooting position and shooting direction is reduced, making it possible to estimate them with sufficient accuracy even on an ordinary server device available at the time of filing of the present application.

In one embodiment of the present invention, recognizing the features consisting of the content and position of objects in the photograph of the second object may be characterized in that the control unit extracts, from the photograph of the second object, information compressed into a vector in the channel direction and information compressed into a vector in the width direction.

With the above configuration, the accuracy of estimating the shooting position and shooting direction can be improved.
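As a minimal sketch of the compression described above, the Python snippet below reduces a photograph feature map to one vector along the channel direction and one along the width direction. The (C, H, W) array layout, the use of mean pooling, and the name `compress_photo_features` are assumptions made for illustration; this passage does not state how the compression is actually computed.

```python
import numpy as np


def compress_photo_features(feat):
    """Compress a photo feature map of assumed shape (C, H, W) into two vectors.

    Returns (channel_vec, width_vec) with shapes (C,) and (W,).
    Mean pooling is an illustrative assumption, not the patented method.
    """
    if feat.ndim != 3:
        raise ValueError("expected a feature map of shape (C, H, W)")
    # Channel-direction vector: average over height and width.
    channel_vec = feat.mean(axis=(1, 2))
    # Width-direction vector: average over channels and height.
    width_vec = feat.mean(axis=(0, 1))
    return channel_vec, width_vec


# Toy feature map with 2 channels, height 3, width 4.
feat = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
channel_vec, width_vec = compress_photo_features(feat)
print(channel_vec.shape, width_vec.shape)
```

In a panoramic photograph the width axis plausibly relates to horizontal viewing angle, which may be one reason a separate width-direction vector is kept, but that reading is an interpretation rather than something stated here.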

In one embodiment of the present invention, the comparison may be characterized in that the control unit concatenates each of the information compressed into the channel-direction vector and the information compressed into the width-direction vector of the photograph of the second object with information obtained by compressing the plan view of the second object, two or more times.

In one embodiment of the present invention, the comparison may further include the control unit outputting, from the result of the concatenation, a heat map for estimating the shooting position (for example, a two-dimensional heat map) and a heat map for estimating the shooting direction (for example, a one-dimensional heat map), and evaluating them.
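One way such heat maps could be evaluated is sketched below in Python. Taking the peak (argmax) of each heat map, treating the one-dimensional heat map as evenly spaced angle bins covering 360 degrees with bin 0 at 0 degrees, and the function names `decode_position` and `decode_direction` are all assumptions for illustration; the text only says that the heat maps are output and evaluated.

```python
import numpy as np


def decode_position(heatmap_2d):
    """Return (x, y) of the peak of a 2-D position heat map of shape (H, W)."""
    row, col = np.unravel_index(np.argmax(heatmap_2d), heatmap_2d.shape)
    return int(col), int(row)


def decode_direction(heatmap_1d):
    """Return the angle in degrees of the peak bin of a 1-D direction heat map.

    Assumes the bins evenly divide 360 degrees, starting at 0 degrees.
    """
    n_bins = heatmap_1d.shape[0]
    return float(np.argmax(heatmap_1d)) * 360.0 / n_bins


# Toy heat maps: a single peak in each.
position_map = np.zeros((4, 5))
position_map[2, 3] = 1.0
direction_map = np.zeros(8)
direction_map[2] = 1.0

xy = decode_position(position_map)
angle = decode_direction(direction_map)
print(xy, angle)
```

A peak-picking rule like this discards the rest of the distribution; a weighted average around the peak would be an alternative if sub-bin resolution were needed.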

In one embodiment of the present invention, the concatenation may be characterized in that the control unit replicates each of the information compressed into the channel-direction vector and the information compressed into the width-direction vector of the photograph of the second object one or more times based on the width direction and height direction of the plan view of the second object, and concatenates the replicated information with the information obtained by compressing the plan view of the second object.

With the above configuration, the information of the photograph compressed into the channel-direction vector and the information compressed into the width-direction vector can each be associated with the plan view separately, which may improve accuracy.
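The replicate-and-concatenate step can be sketched as follows, assuming the compressed plan view is a (C_p, H_p, W_p) array and the two photograph vectors are one-dimensional arrays. The function name `tile_and_concat`, the array layout, and concatenation along the channel axis are illustrative assumptions, not details given in this passage.

```python
import numpy as np


def tile_and_concat(plan_feat, channel_vec, width_vec):
    """Replicate two 1-D photo vectors over the plan-view grid and concatenate.

    plan_feat:   (C_p, H_p, W_p) compressed plan-view features (assumed layout)
    channel_vec: (C,) photo features compressed in the channel direction
    width_vec:   (W,) photo features compressed in the width direction
    Returns an array of shape (C_p + C + W, H_p, W_p).
    """
    _, hp, wp = plan_feat.shape
    # Copy each vector to every (row, column) position of the plan-view grid.
    tiled_channel = np.broadcast_to(
        channel_vec[:, None, None], (channel_vec.size, hp, wp))
    tiled_width = np.broadcast_to(
        width_vec[:, None, None], (width_vec.size, hp, wp))
    # Stack along the channel axis so the plan-view features stay intact.
    return np.concatenate([plan_feat, tiled_channel, tiled_width], axis=0)


plan_feat = np.random.default_rng(0).normal(size=(3, 4, 5))
channel_vec = np.array([1.0, 2.0])
width_vec = np.array([10.0, 20.0, 30.0])
fused = tile_and_concat(plan_feat, channel_vec, width_vec)
print(fused.shape)  # (8, 4, 5)
```

Because the plan-view features occupy the first `C_p` channels unchanged, every grid position keeps its own spatial information while also seeing the whole photograph summary.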

In one embodiment of the present invention, the information on the N first objects may be information on N real estate properties (for example, information on room contents, room information, objects in the rooms, or the contents of the floor plan); the one or more photographs taken of each of the N first objects may be one or more first room photographs of the real estate property; the plan view of the second object may be a second floor plan; and the photograph of the second object may be a second room photograph.

With the above configuration, it is possible to estimate the shooting position and shooting direction at which a room photograph (for example, a panoramic image) was taken in the floor plan of a given room.

In one embodiment of the present invention, there is provided an information processing device that has a control unit and associates a plan view of an object with a photograph of the object, wherein the control unit concatenates a three-dimensional feature of the plan view with a one-dimensional feature compressed into a channel-direction vector and a one-dimensional feature compressed into a width-direction vector, both extracted from the photograph, by replicating each one-dimensional feature one or more times in the width direction and height direction of the three-dimensional feature. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph.

With an information processing device having the above configuration, a three-dimensional feature and a one-dimensional feature can be associated without losing the information held by the three-dimensional feature.

In one embodiment of the present invention, there is provided an information processing method, in an information processing device having a control unit, for associating a plan view of an object with a photograph of the object, wherein the control unit concatenates a three-dimensional feature of the plan view with a one-dimensional feature compressed into a channel-direction vector and a one-dimensional feature compressed into a width-direction vector, both extracted from the photograph, by replicating each one-dimensional feature one or more times in the width direction and height direction of the three-dimensional feature. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph.

With an information processing method having the above configuration, a three-dimensional feature and a one-dimensional feature can be associated without losing the information held by the three-dimensional feature.

In one embodiment of the present invention, there is provided a computer program that causes a computer to operate as an information processing device having a control unit that associates a plan view of an object with a photograph of the object, wherein the control unit concatenates a three-dimensional feature of the plan view with a one-dimensional feature compressed into a channel-direction vector and a one-dimensional feature compressed into a width-direction vector, both extracted from the photograph, by replicating each one-dimensional feature one or more times in the width direction and height direction of the three-dimensional feature. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph.

With an information processing program having the above configuration, the information processing device can associate a three-dimensional feature with a one-dimensional feature without losing the information held by the three-dimensional feature.

In one embodiment of the present invention, there is provided an information processing system including an information processing device having a control unit that associates a plan view of an object with a photograph of the object, wherein the control unit concatenates a three-dimensional feature of the plan view with a one-dimensional feature compressed into a channel-direction vector and a one-dimensional feature compressed into a width-direction vector, both extracted from the photograph, by replicating each one-dimensional feature one or more times in the width direction and height direction of the three-dimensional feature. The information processing system may further include a terminal that can access the information processing device via an Internet connection or the like. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph.

With an information processing system having the above configuration, the information processing device can associate a three-dimensional feature with a one-dimensional feature without losing the information held by the three-dimensional feature.

In one embodiment of the present invention, there is provided an information processing device that has a control unit and associates a plan view of an object with a photograph of the object, wherein the control unit concatenates and compares a feature of the plan view with a feature extracted from the photograph, and outputs the comparison result as a one-dimensional or two-dimensional heat map to perform the association. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph. In another embodiment of the present invention, there is provided an information processing device that has a control unit and associates surrounding information (corresponding to the "surrounding image" in Modification 2) of an object (corresponding to the "terminal device 5" in Modification 2) with a photograph taken from the object (corresponding to "calculating the current orientation of the terminal device 5" in Modification 2), wherein the control unit concatenates and compares a feature of the surrounding information with a feature extracted from the photograph, and outputs the comparison result as a one-dimensional heat map to perform the association.

An information processing device having the above configuration can improve accuracy when associating a plan view (or surrounding information or a surrounding image) with a photograph.

In one embodiment of the present invention, there is provided an information processing method in an information processing device that has a control unit and associates a plan view of an object with a photograph of the object, wherein the control unit concatenates and compares a feature of the plan view with a feature extracted from the photograph, and outputs the comparison result as a one-dimensional or two-dimensional heat map to perform the association. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph. In another embodiment of the present invention, there is provided an information processing method in an information processing device that has a control unit and associates surrounding information (corresponding to the "surrounding image" in Modification 2) of an object (corresponding to the "terminal device 5" in Modification 2) with a photograph taken from the object (corresponding to "calculating the current orientation of the terminal device 5" in Modification 2), wherein the control unit concatenates and compares a feature of the surrounding information with a feature extracted from the photograph, and outputs the comparison result as a one-dimensional heat map to perform the association.

An information processing method having the above configuration can improve accuracy when associating a plan view (or surrounding information or a surrounding image) with a photograph.

本発明の一実施形態においては、コンピュータを、制御部を有し、物の平面図と当該物の写真との対応付けを行う情報処理装置であって、前記制御部が、前記平面図の特徴量と、前記写真から抽出された特徴量とをコンカットして対照し、該対照結果を１次元または２次元のヒートマップとして出力して前記対応付けを行う、情報処理装置として動作させるコンピュータプログラムが提供される。本発明の一実施形態において、前記物の平面図は、不動産物件の間取り図であり、前記物の写真は、部屋写真であってもよい。本発明の別の実施形態においては、コンピュータを、制御部を有し、物（変形例２における「端末装置５」に対応する。）の周囲情報（変形例２における「周囲画像」に対応する。）と当該物から撮影された写真との対応付けを行う（変形例２における「端末装置５の現在の向きを計算」することに対応する）情報処理装置であって、前記制御部が、前記周囲情報の特徴量と、前記写真から抽出された特徴量とをコンカットして対照し、該対照結果を１次元のヒートマップとして出力して前記対応付けを行う、情報処理装置として動作させるコンピュータプログラムが提供される。 In one embodiment of the present invention, there is provided a computer program that causes a computer to operate as an information processing device that has a control unit and associates a plan view of an object with a photograph of the object, wherein the control unit concatenates and compares a feature amount of the plan view with a feature amount extracted from the photograph, and outputs the comparison result as a one-dimensional or two-dimensional heat map to perform the association. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph. In another embodiment of the present invention, there is provided a computer program that causes a computer to operate as an information processing device that has a control unit and associates surrounding information (corresponding to the "surrounding image" in Modification 2) of an object (corresponding to the "terminal device 5" in Modification 2) with a photograph taken from the object (corresponding to "calculating the current orientation of the terminal device 5" in Modification 2), wherein the control unit concatenates and compares a feature amount of the surrounding information with a feature amount extracted from the photograph, and outputs the comparison result as a one-dimensional heat map to perform the association.

上記構成を備えた情報処理プログラムにより、情報処理装置において平面図（または周囲情報ないし周囲画像）と写真とを対応付ける際に、正確性を向上させることが可能となる。 An information processing program having the above configuration makes it possible to improve the accuracy when associating a plan view (or surrounding information or surrounding image) with a photograph in an information processing apparatus.

本発明の一実施形態においては、物の平面図と該物の写真との対応付けを行う制御部を有する情報処理装置を含む情報処理システムであって、前記制御部が、前記平面図の特徴量と、前記写真から抽出された特徴量とをコンカットして対照し、該対照結果を１次元または２次元のヒートマップとして出力して前記対応付けを行う、情報処理システムが提供される。当該情報処理システムはさらに、上記情報処理装置にインターネット回線等を通じてアクセス可能な端末を含みうる。本発明の一実施形態において、前記物の平面図は、不動産物件の間取り図であり、前記物の写真は、部屋写真であってもよい。本発明の別の実施形態においては、撮影手段を備えた端末（変形例２における「端末装置５」に対応する。）と、当該端末の周囲情報（変形例２における「周囲画像」に対応する。）および当該端末から撮影された写真を対応付ける（変形例２における「端末装置５の現在の向きを計算」することに対応する）制御部を有する情報処理装置と、を含む情報処理システムであって、前記制御部が、前記周囲情報の特徴量と、前記写真から抽出された特徴量とをコンカットして対照し、該対照結果を１次元のヒートマップとして出力して前記対応付けを行う、情報処理システムが提供される。 In one embodiment of the present invention, there is provided an information processing system including an information processing device having a control unit that associates a plan view of an object with a photograph of the object, wherein the control unit concatenates and compares a feature amount of the plan view with a feature amount extracted from the photograph, and outputs the comparison result as a one-dimensional or two-dimensional heat map to perform the association. The information processing system may further include a terminal that can access the information processing device via an Internet connection or the like. In one embodiment of the present invention, the plan view of the object may be a floor plan of a real estate property, and the photograph of the object may be a room photograph. In another embodiment of the present invention, there is provided an information processing system including a terminal equipped with photographing means (corresponding to the "terminal device 5" in Modification 2) and an information processing device having a control unit that associates surrounding information of the terminal (corresponding to the "surrounding image" in Modification 2) with a photograph taken from the terminal (corresponding to "calculating the current orientation of the terminal device 5" in Modification 2), wherein the control unit concatenates and compares a feature amount of the surrounding information with a feature amount extracted from the photograph, and outputs the comparison result as a one-dimensional heat map to perform the association.

上記構成を備えた情報処理システムにより、情報処理装置において平面図（または周囲情報ないし周囲画像）と写真とを対応付ける際に、正確性を向上させることが可能となる。 The information processing system provided with the above configuration makes it possible to improve the accuracy when associating a plan view (or surrounding information or surrounding image) with a photograph in an information processing apparatus.

本発明の一実施形態において、推定された撮影位置と撮影方向は、間取り図上において矢印等のマークを用いて表示されても良く、ユーザ端末装置における入力部の操作（たとえば、当該マークを選択すること）によって、対応する部屋写真を表示できるようにしても良い。 In one embodiment of the present invention, the estimated shooting position and shooting direction may be displayed by using a mark such as an arrow on the floor plan, and an operation of an input unit in the user terminal device (for example, selecting the mark). By doing so), the corresponding room photo may be displayed.

本発明の一実施形態において、情報処理装置は、推定された撮影位置と撮影方向を用いて、ユーザ端末装置に対し、不動産物件の自動的及び／又はインタラクティブなツアーを提供するバーチャルツアーを提供する。バーチャルツアーでは、当該バーチャルツアーにおいて訪問される間取り図における特定の部屋（位置）における、対応するパノラマ画像によって提供され、ユーザからの方角（撮影方向）を変更する信号を受信することで撮影方向が異なるパノラマ画像を提供することもできる。 In one embodiment of the present invention, the information processing device uses the estimated shooting position and shooting direction to provide the user terminal device with a virtual tour, that is, an automatic and/or interactive tour of a real estate property. The virtual tour is presented through the panoramic image corresponding to a specific room (position) on the floor plan visited during the tour, and a panoramic image with a different shooting direction can also be provided upon receiving a signal from the user to change the direction (shooting direction).

本発明の一実施形態において、記憶部を有し、第１の情報と第２の情報とを対応付ける情報処理装置であって、前記第１の情報から得た３次元特徴量と、前記第２の情報から得た第１の１次元特徴量（たとえば、ｃｈａｎｎｅｌ方向のベクトルに圧縮した１次元特徴量）および第２の１次元特徴量（たとえば、ｗｉｄｔｈ方向のベクトルに圧縮した１次元特徴量）とを前記各１次元特徴量を前記３次元特徴量の幅方向および高さ方向に１以上複製してコンカットして前記第１の情報と前記第２の情報との前記対応付けを行う、情報処理装置が提供される。 In one embodiment of the present invention, there is provided an information processing device that has a storage unit and associates first information with second information, wherein a three-dimensional feature amount obtained from the first information is concatenated with a first one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the channel direction) and a second one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the width direction) obtained from the second information, each one-dimensional feature amount being replicated one or more times in the width direction and the height direction of the three-dimensional feature amount, to perform the association between the first information and the second information.

上記構成を有する情報処理装置によれば、２つの入力の特徴量の次元が異なる場合の情報処理（コンカット）を行う情報処理装置を提供することができる。 According to the information processing apparatus having the above configuration, it is possible to provide an information processing apparatus that performs information processing (concatenation) when the feature amounts of the two inputs differ in dimension.
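The replication and concatenation described above can be illustrated with a minimal sketch. It assumes the three-dimensional feature amount is laid out as [height][width][channel] and that both one-dimensional feature amounts are appended along the channel axis of every spatial cell; the function name `tile_and_concat`, the layout, and all sizes and values are hypothetical choices for illustration, not details taken from the specification.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # assumed layout: [height][width][channel]


def tile_and_concat(feature_map: FeatureMap, vec_a: Vector, vec_b: Vector) -> FeatureMap:
    """Copy both 1-D vectors to every (height, width) cell and append them to its channels."""
    return [[cell + vec_a + vec_b for cell in row] for row in feature_map]


# Hypothetical sizes: a 2 x 3 plan-view feature map with 4 channels per cell.
plan_features = [[[float(h * 3 + w)] * 4 for w in range(3)] for h in range(2)]
channel_vec = [0.1, 0.2]       # stands in for the channel-direction 1-D feature amount
width_vec = [0.5, 0.6, 0.7]    # stands in for the width-direction 1-D feature amount

fused = tile_and_concat(plan_features, channel_vec, width_vec)
shape = (len(fused), len(fused[0]), len(fused[0][0]))
print(shape)  # (2, 3, 9)
```

Because every spatial cell keeps its original four channels and only gains the copied values, the height and width grid of the three-dimensional feature amount is unchanged by the concatenation.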

本発明の一実施形態において、コンピュータを、制御部を有し、前記制御部が、第１の情報の３次元特徴量と、前記第２の情報から得た第１の１次元特徴量（たとえば、ｃｈａｎｎｅｌ方向のベクトルに圧縮した１次元特徴量）および第２の１次元特徴量（たとえば、ｗｉｄｔｈ方向のベクトルに圧縮した１次元特徴量）とを前記各１次元特徴量を前記３次元特徴量の幅方向および高さ方向に１以上複製してコンカットして前記第１の情報と前記第２の情報との対応付けを行う情報処理装置として動作させる、コンピュータプログラムが提供される。 In one embodiment of the present invention, there is provided a computer program that causes a computer to operate as an information processing device having a control unit, wherein the control unit concatenates a three-dimensional feature amount of first information with a first one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the channel direction) and a second one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the width direction) obtained from second information, each one-dimensional feature amount being replicated one or more times in the width direction and the height direction of the three-dimensional feature amount, to associate the first information with the second information.

上記構成を有するコンピュータプログラムによれば、２つの入力の特徴量の次元が異なる場合の情報処理（コンカット）を行うことのできるコンピュータプログラムを提供することができる。 According to the computer program having the above configuration, it is possible to provide a computer program capable of performing information processing (concatenation) when the feature amounts of the two inputs differ in dimension.

本発明の一実施形態において、制御部を有する情報処理方法であって、前記制御部において、第１の情報の３次元特徴量と、第２の情報から得た第１の１次元特徴量（たとえば、ｃｈａｎｎｅｌ方向のベクトルに圧縮した１次元特徴量）および第２の１次元特徴量（たとえば、ｗｉｄｔｈ方向のベクトルに圧縮した１次元特徴量）とを前記各１次元特徴量を前記３次元特徴量の幅方向および高さ方向に１以上複製してコンカットして前記第１の情報と前記第２の情報との対応付けを行う、情報処理方法が提供される。 In one embodiment of the present invention, there is provided an information processing method using a control unit, wherein the control unit concatenates a three-dimensional feature amount of first information with a first one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the channel direction) and a second one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the width direction) obtained from second information, each one-dimensional feature amount being replicated one or more times in the width direction and the height direction of the three-dimensional feature amount, to associate the first information with the second information.

上記構成を有する情報処理方法によれば、２つの入力の特徴量の次元が異なる場合の情報処理（コンカット）を行うことのできる、情報処理装置における情報処理方法を提供することができる。 According to the information processing method having the above configuration, it is possible to provide an information processing method in an information processing apparatus capable of performing information processing (concatenation) when the feature amounts of the two inputs differ in dimension.

本発明の一実施形態において、制御部を有する情報処理装置を含む情報処理システムであって、前記制御部において、第１の情報の３次元特徴量と、第２の情報から得た第１の１次元特徴量（たとえば、ｃｈａｎｎｅｌ方向のベクトルに圧縮した１次元特徴量）および第２の１次元特徴量（たとえば、ｗｉｄｔｈ方向のベクトルに圧縮した１次元特徴量）とを前記各１次元特徴量を前記３次元特徴量の幅方向および高さ方向に１以上複製してコンカットして前記第１の情報と前記第２の情報との対応付けを行う、情報処理システムが提供される。 In one embodiment of the present invention, there is provided an information processing system including an information processing device having a control unit, wherein the control unit concatenates a three-dimensional feature amount of first information with a first one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the channel direction) and a second one-dimensional feature amount (for example, a one-dimensional feature amount compressed into a vector in the width direction) obtained from second information, each one-dimensional feature amount being replicated one or more times in the width direction and the height direction of the three-dimensional feature amount, to associate the first information with the second information.

上記構成を有する情報処理システムによれば、２つの入力の特徴量の次元が異なる場合の情報処理（コンカット）を行うことのできる情報処理装置を構成要素とする情報処理システムを提供することができる。 According to the information processing system having the above configuration, it is possible to provide an information processing system whose components include an information processing device capable of performing information processing (concatenation) when the feature amounts of the two inputs differ in dimension.

本発明の一態様によれば、物の平面図における、当該物の写真の撮影位置および撮影方向の推定が可能となる。たとえば、不動産（物件）のパノラマ画像と間取り図を入力にして、撮影位置と撮影方向の推定が可能となる。 According to one aspect of the present invention, it is possible to estimate the shooting position and the shooting direction of a photograph of the object in a plan view of the object. For example, it is possible to estimate the shooting position and shooting direction by inputting a panoramic image and a floor plan of a real estate (property).
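As a rough illustration of how a shooting position and shooting direction could be read off the heat maps mentioned earlier, the sketch below takes the highest-scoring cell of a two-dimensional heat map as the position and the highest-scoring bin of a one-dimensional heat map as the direction. Taking the peak, the even division of 360 degrees into bins, and the names `decode_position` and `decode_direction` are all assumptions made for this example; the specification does not state how the heat maps are decoded.

```python
from typing import List, Tuple


def decode_position(heatmap_2d: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest-scoring cell of a 2-D heat map."""
    cells = ((r, c) for r, row in enumerate(heatmap_2d) for c in range(len(row)))
    return max(cells, key=lambda rc: heatmap_2d[rc[0]][rc[1]])


def decode_direction(heatmap_1d: List[float]) -> float:
    """Map the peak bin to degrees, assuming the bins evenly cover 360 degrees."""
    peak = max(range(len(heatmap_1d)), key=lambda i: heatmap_1d[i])
    return 360.0 * peak / len(heatmap_1d)


# Hypothetical heat maps: a 2 x 3 grid over the floor plan and 4 direction bins.
position_map = [[0.0, 0.1, 0.0], [0.2, 0.9, 0.1]]
direction_map = [0.1, 0.2, 0.6, 0.1]

position = decode_position(position_map)
direction = decode_direction(direction_map)
print(position, direction)  # (1, 1) 180.0
```

A real system would also have to convert the grid cell back to floor plan pixel coordinates, which depends on the resolution of the heat map and is left out here.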

図１は、間取り図画像の一例を示す図である。 FIG. 1 is a diagram showing an example of a floor plan image.
図２は、デジタルカメラで撮影された、図１に示す部屋の部屋写真（パノラマ写真）の一例を示す図である。 FIG. 2 is a diagram showing an example of a room photograph (panoramic photograph) of the room shown in FIG. 1, taken with a digital camera.
図３は、情報処理装置３（サーバ）の構成の一例を示す図である。 FIG. 3 is a diagram showing an example of the configuration of the information processing device 3 (server).
図４は、情報処理装置３（サーバ）とインターネット４とユーザ端末装置５とを含む情報処理システム１の構成の一例を示す図である。 FIG. 4 is a diagram showing an example of the configuration of an information processing system 1 including the information processing device 3 (server), the Internet 4, and the user terminal device 5.
図５は、ユーザ端末装置の構成の一例を示す図である。 FIG. 5 is a diagram showing an example of the configuration of the user terminal device.
図６は、本発明の一実施形態に係る情報処理ステップの事前ステップを示すための図である。具体的には、記憶部３３が学習済みモデルＧを記憶するまでのステップの一例を示したものである。 FIG. 6 is a diagram showing the preliminary step of the information processing steps according to an embodiment of the present invention. Specifically, it shows an example of the steps up to the point where the storage unit 33 stores the trained model G.
図７は、本発明の一実施形態に係る情報処理ステップを示すための図である。 FIG. 7 is a diagram showing the information processing steps according to an embodiment of the present invention.
図８は、本発明の一実施形態に係る情報処理ステップを補足説明するための図である。具体的には、部屋写真（たとえばパノラマ画像）からｗｉｄｔｈ方向の情報を抽出する場合のステップに関する説明である。 FIG. 8 is a diagram providing a supplementary explanation of the information processing steps according to an embodiment of the present invention. Specifically, it explains the steps for extracting width-direction information from a room photograph (for example, a panoramic image).
図９は、本発明の一実施形態に係る情報処理ステップを補足説明するための図である。具体的には、部屋写真（たとえばパノラマ画像）からｗｉｄｔｈ方向の情報を抽出する場合のステップに関する説明である。図８において、部屋写真において対象となるべきオブジェクト位置に網掛けをして表示したものである。 FIG. 9 is a diagram providing a supplementary explanation of the information processing steps according to an embodiment of the present invention. Specifically, it explains the steps for extracting width-direction information from a room photograph (for example, a panoramic image). It shows FIG. 8 with the positions of the target objects in the room photograph shaded.
図１０は、本発明の一実施形態に係る情報処理ステップを補足説明するための図である。具体的には、１階のテンソル情報（ｃｈａｎｎｅｌ方向の情報および／またはｗｉｄｔｈ方向の情報）と、３階のテンソル情報（間取り図の情報）をコンカットする際に、３階のテンソル情報が保有する空間情報を失うことなくコンカットを行うことを可能とする手法に関する。 FIG. 10 is a diagram providing a supplementary explanation of the information processing steps according to an embodiment of the present invention. Specifically, it relates to a technique that makes it possible to concatenate rank-1 tensor information (channel-direction information and/or width-direction information) with rank-3 tensor information (floor plan information) without losing the spatial information held by the rank-3 tensor.

本発明の一実施形態を説明するフローチャートは、プロセスステップを意味しており、当該特定の例のみに限定されることなく、代替の実施態様を許容するものである。また、本明細書の明示の記載に反しない限りにおいて、プロセスステップは本明細書における特定の説明と異なる順序で実行することを許容するものである。このように、本発明の趣旨及び範囲から逸脱することなく本実施形態に対しては種々の変更を行うことができる。以下、本明細書においては、所定の部屋の「間取り図（画像）」と当該部屋の「部屋写真」を例にとって情報処理装置ないしは情報処理装置における情報処理内容を説明する。 The flowchart illustrating one embodiment of the invention means a process step and allows for alternative embodiments without limitation to the particular example. It is also permissible to carry out the process steps in a different order than the particular description herein, as long as it does not contradict the express statements herein. As described above, various changes can be made to the present embodiment without departing from the spirit and scope of the present invention. Hereinafter, in the present specification, the information processing content in the information processing device or the information processing device will be described by taking as an example a “floor plan (image)” of a predetermined room and a “room photograph” of the room.

［物体の認識］ [Object recognition]
近時、物体の認識について、畳み込みニューラルネットワーク（ＣＮＮ：ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）は、画像のクラス分類（画像認識）だけでなく、物体の位置や大きさの検出（物体検出）や形状の抽出（領域抽出）にも用いられ、例えば、物体検出には、リージョンＣＮＮであるＲ－ＣＮＮ（さらに、ＦａｓｔｅｒＲ－ＣＮＮ）、Ｙｏｌｏなどが、領域抽出にはＳｅｇＮｅｔや画像のセグメンテーションを推定するＵ－Ｎｅｔ（Ｕ字型の畳み込みネットワーク）などのニューラルネットワークが用いられるようになってきている。 In recent years, convolutional neural networks (CNNs) have come to be used not only for image classification (image recognition) but also for detecting the position and size of objects (object detection) and extracting their shapes (region extraction). For example, R-CNN, a region-based CNN (and further Faster R-CNN), and YOLO are used for object detection, while neural networks such as SegNet and U-Net (a U-shaped convolutional network), which estimates image segmentation, are used for region extraction.

たとえば、Ｒ－ＣＮＮでは、ある矩形が物体なのか背景なのかを学習し、検出した場所に具体的に何が写っているのかを学習する。また、Ｕ－Ｎｅｔ（Ｕ字型の畳み込みネットワーク）では物体の「局所的特徴」と「全体的位置情報」の両方を統合して学習する。本発明ではこれらの既存技術、非特許文献１および２に記載の技術を用いる際、これらの既存技術に対する詳細な説明自体は省略し、それらの技術内容を参照により本明細書に組み込むものとする。 For example, R-CNN learns whether a given rectangle contains an object or background, and learns what specifically appears at the detected location. U-Net (a U-shaped convolutional network) learns by integrating both the "local features" and the "global position information" of an object. Where the present invention uses these existing techniques and the techniques described in Non-Patent Documents 1 and 2, a detailed description of the techniques themselves is omitted, and their technical content is incorporated herein by reference.

［ハードウェア］ [Hardware]
本発明の実施形態に係るハードウェアの基本的な構成を説明する。まず、情報処理装置３は、サーバを念頭に置いているが、たとえば、スマートフォンやタブレット端末などのモバイル端末、ノートブックコンピュータ、デスクトップコンピュータなどの電子機器であってもよい。情報処理装置３は、物理的な演算装置（ＣＰＵおよび／またはＧＰＵ、図示せず）、一時的な作業内容を記憶しておく作業メモリ（ＲＡＭ、図示せず）等、情報処理装置として必須の物理的構成を備えている。 The basic hardware configuration according to an embodiment of the present invention will be described. The information processing device 3 is primarily assumed to be a server, but it may also be an electronic device such as a mobile terminal (for example, a smartphone or a tablet terminal), a notebook computer, or a desktop computer. The information processing device 3 has the physical components essential to an information processing device, such as a physical arithmetic unit (CPU and/or GPU, not shown) and a working memory (RAM, not shown) that stores temporary work contents.

図３に示すように、情報処理装置３は、通信部３１を有する。通信部３１は、インターネット等のネットワークを介して外部と通信可能に接続されている。なお、ネットワークは、有線回線、無線回線を問わない接続方法によるインターネットや、イーサネット（登録商標）等を利用したローカルエリアネットワーク等の公知のネットワーク接続方法を用いることができる。 As shown in FIG. 3, the information processing apparatus 3 has a communication unit 31. The communication unit 31 is connected so as to be able to communicate with the outside via a network such as the Internet. As the network, a known network connection method such as the Internet by a connection method regardless of a wired line or a wireless line, or a local area network using Ethernet (registered trademark) or the like can be used.

情報処理装置３は、画像データの処理等を行う制御部３２を更に有する。本明細書において「制御部」として説明するものは、ＣＰＵおよび／またはＧＰＵによって対応するプログラムが読みだされ、ＣＰＵおよび／またはＧＰＵによって実行される機能を意味する。制御部３２は、情報処理装置３内のプロセッサが所定のプログラムを実行することにより実現されてもよいし、ハードウェアで実装されてもよい。 The information processing device 3 further includes a control unit 32 that processes image data and the like. In this specification, a "control unit" means a function that is executed when the corresponding program is read out and run by the CPU and/or GPU. The control unit 32 may be realized by a processor in the information processing device 3 executing a predetermined program, or may be implemented in hardware.

情報処理装置は、記憶部３３を更に有する。記憶部３３には、制御部３２が取り扱う各種データが記憶される。記憶部３３は、たとえば内蔵ハードディスクドライブ、内蔵ソリッドステートドライブ、外部データストレージ（たとえば外付けハードディスクドライブ、サーバ用テープドライブ等）などのデータストレージであり、場合によってはクラウドストレージなどのデータストレージであってもよい。その意味で、本発明は、制御部３２と記憶部３３とは別の情報処理装置を用いるという態様を許容する。 The information processing device further includes a storage unit 33. The storage unit 33 stores the various data handled by the control unit 32. The storage unit 33 is data storage such as an internal hard disk drive, an internal solid-state drive, or external data storage (for example, an external hard disk drive or a server tape drive), and in some cases may be data storage such as cloud storage. In that sense, the present invention permits an aspect in which separate information processing devices are used for the control unit 32 and the storage unit 33.

なお、情報処理装置３はさらに入力部、表示部を備えていてもよい（図示せず）が、これらは必須の構成ではない。たとえば、入力部は情報を入力するためのインターフェースであり、モバイル端末におけるタッチパネルやマイクロフォン、ノートブックコンピュータないしデスクトップコンピュータなどにおけるタッチパッド、キーボードまたはマウスなどである。表示部は情報処理装置３の使用者に対して各種情報を表示するインターフェースであり、たとえば液晶ディスプレイ、ヘッドマウントディスプレイ等の映像表示手段である。 The information processing apparatus 3 may further include an input unit and a display unit (not shown), but these are not essential configurations. For example, the input unit is an interface for inputting information, such as a touch panel or microphone in a mobile terminal, a touch pad in a notebook computer or a desktop computer, a keyboard or a mouse. The display unit is an interface for displaying various information to the user of the information processing apparatus 3, and is a video display means such as a liquid crystal display or a head-mounted display.

［端末装置５］ [Terminal device 5]
本情報処理システムは、ユーザ端末装置５をさらに含んでも良い（図４）。ユーザ端末装置５（端末５ａおよび端末５ｂ）は、ユーザが入出力するためのクライアント端末を意味する。たとえば、スマートフォンやタブレット端末などのモバイル端末、ノートブックコンピュータ、デスクトップコンピュータ、ワークステーションなどの電子機器であってもよい。これらのユーザ端末装置５と情報処理装置３とは、ネットワーク４を介して接続される。なお、変形例２においては、端末装置５はカメラを搭載したロボットやカメラを搭載した自動運転車であるが、これも、情報処理装置３にネットワーク４を通じてアクセスすることができるクライアント端末の一例であることに変わりはない。 The information processing system may further include a user terminal device 5 (FIG. 4). The user terminal device 5 (terminal 5a and terminal 5b) means a client terminal through which a user performs input and output. For example, it may be an electronic device such as a mobile terminal (a smartphone or a tablet terminal), a notebook computer, a desktop computer, or a workstation. The user terminal device 5 and the information processing device 3 are connected via the network 4. In Modification 2, the terminal device 5 is a camera-equipped robot or a camera-equipped autonomous vehicle, but this too remains an example of a client terminal that can access the information processing device 3 through the network 4.

図５は、ユーザ端末装置５の構成の一例を示す図である。図５に示すように、端末装置５は、（端末）通信部５１と、（端末）制御部５２と、（端末）記憶部５３と、（端末）入力部５４と、（端末）表示部５５とを有し、各部は、バスを介して互いに通信可能に接続されている。以下、ユーザ端末装置５の各部については、情報処理装置３における各部と区別するために、端末通信部５１などと、頭に「端末」を付して説明する。 FIG. 5 is a diagram showing an example of the configuration of the user terminal device 5. As shown in FIG. 5, the terminal device 5 includes a (terminal) communication unit 51, a (terminal) control unit 52, a (terminal) storage unit 53, a (terminal) input unit 54, and a (terminal) display unit 55. And each part is connected to each other so as to be able to communicate with each other via a bus. Hereinafter, each part of the user terminal device 5 will be described with a terminal communication unit 51 or the like and a “terminal” at the beginning in order to distinguish each part from the information processing device 3.

端末通信部５１は、ユーザ端末装置５とネットワーク４との間の通信インターフェースである。端末通信部５１は、ネットワーク４を介して端末装置５と情報処理装置３との間で情報を送受信することができる。 The terminal communication unit 51 is a communication interface between the user terminal device 5 and the network 4. The terminal communication unit 51 can transmit and receive information between the terminal device 5 and the information processing device 3 via the network 4.

端末制御部５２は、端末装置５の各種処理を行う制御手段である。端末制御部５２は、端末装置５内のプロセッサが所定のプログラムを実行することにより実現されてもよいし、ハードウェアで実装されてもよい。 The terminal control unit 52 is a control means for performing various processes of the terminal device 5. The terminal control unit 52 may be realized by the processor in the terminal device 5 executing a predetermined program, or may be implemented by hardware.

端末記憶部５３は、たとえば内蔵メモリや外部メモリ（ＳＤメモリカード等）などのデータストレージである。端末記憶部５３には、端末制御部５２が取り扱う各種データが記憶される。 The terminal storage unit 53 is, for example, a data storage such as an internal memory or an external memory (SD memory card or the like). Various data handled by the terminal control unit 52 are stored in the terminal storage unit 53.

端末入力部５４は、ユーザが端末装置５に情報を入力するためのインターフェースであり、たとえばモバイル端末におけるカメラ（ＣＣＤカメラやＣＭＯＳカメラ）、タッチパネルやマイクロフォン、ノートブックコンピュータにおけるタッチパッド、キーボードまたはマウスなどである。なお、端末装置５がロボットないし自動運転車である変形例２においては、端末入力部５４としてカメラが必須の構成となる。 The terminal input unit 54 is an interface through which the user inputs information to the terminal device 5, for example a camera (CCD camera or CMOS camera), a touch panel, or a microphone in a mobile terminal, or a touch pad, a keyboard, or a mouse in a notebook computer. In Modification 2, in which the terminal device 5 is a robot or an autonomous vehicle, a camera is an essential component of the terminal input unit 54.

端末表示部５５は、端末装置５からユーザに対して各種情報を表示するインターフェースであり、たとえば液晶ディスプレイ等の映像表示手段である。具体的には、たとえば、端末表示部５５は、ユーザからの操作を受け付けるためのＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）を表示してもよい。なお、端末装置５がロボットないし自動運転車である変形例２においては、端末表示部５５は不要である。 The terminal display unit 55 is an interface for displaying various information from the terminal device 5 to the user, and is a video display means such as a liquid crystal display. Specifically, for example, the terminal display unit 55 may display a GUI (Graphical User Interface) for receiving an operation from the user. In the second modification in which the terminal device 5 is a robot or an autonomous vehicle, the terminal display unit 55 is unnecessary.

［間取り図］ [Floor plan]
間取り図画像（以下、単に「間取り図」とも称する。）は、建物における部屋（複数の場合もあり得る。）の配置の図（たとえば、平面図）であり、通常は、一律の縮尺で描かれている。不動産小売業者、建築家、不動産仲介業者、不動産情報業者等によって用いられる様々なタイプ又はスタイルの間取り図がある。通常は部屋の平面的な形状のみならず、部屋内の一定の物体（たとえば、扉、扉の開閉方向、キッチン、トイレ、バスタブ、床暖房の位置など）の位置と種類が、一定のルールで記載されている。間取り図画像は、ＪＰＥＧ、ＴＩＦＦ、ＧＩＦ、ＢＭＰ、ＰＮＧ等を含む任意の画像フォーマットのものとすることができる。 A floor plan image (hereinafter also simply referred to as a "floor plan") is a diagram (for example, a plan view) of the arrangement of rooms (possibly more than one) in a building, and is usually drawn at a uniform scale. There are various types and styles of floor plans used by real estate retailers, architects, real estate agents, real estate information providers, and others. Usually, not only the planar shape of the room but also the position and type of certain objects in the room (for example, doors, door opening directions, kitchen, toilet, bathtub, and the location of floor heating) are drawn according to fixed rules. The floor plan image can be in any image format, including JPEG, TIFF, GIF, BMP, and PNG.

［部屋写真］ [Room photograph]
本発明において「部屋写真」は、部屋内部ないしは建物内部の空間を撮影したカラー、グレースケールまたはモノクロ等の写真を意味するものとする。部屋写真は、ＪＰＥＧ、ＴＩＦＦ、ＧＩＦ、ＢＭＰ、ＰＮＧ等を含む任意の画像フォーマットのものとすることができる。 In the present invention, a "room photograph" means a color, grayscale, monochrome, or similar photograph of the space inside a room or a building. The room photograph can be in any image format, including JPEG, TIFF, GIF, BMP, and PNG.

本発明の一実施形態において部屋写真はパノラマ画像であることを想定しており、本発明において「パノラマ画像」とは、不動産物件内の内部ロケーションにある１以上の物体をキャプチャーすることが可能な広視野を有する任意の画像を意味するものとする。 In one embodiment of the present invention, it is assumed that the room photograph is a panoramic image, and in the present invention, the "panoramic image" can capture one or more objects at an internal location in a real estate property. It shall mean any image having a wide field of view.

本実施形態では、部屋内部ないしは建物内部の空間を撮影した画像（部屋写真１０）は、画角３６０度のパノラマ画像であることを想定しており、かかる画角が本発明に最も好適である。もっとも、パノラマ画像は、水平方向に３６０度未満の視野を有することもできる。また、理論上は、本発明の部屋画像１０として、たとえば２７０度、１８０度又はそれ未満のようにより狭い視野の画像を用いることもできる。本明細書において以下、パノラマ画像及び部屋写真という用語は区別なく用いられ得る。すなわち、部屋、又は建物内の空間を撮影した画角３６０度の画像を、単に「部屋写真」とも称するものとする。 In this embodiment, the image of the space inside a room or building (room photograph 10) is assumed to be a panoramic image with a 360-degree angle of view, which is the most suitable angle of view for the present invention. However, a panoramic image may also have a horizontal field of view of less than 360 degrees. In theory, an image with a narrower field of view, for example 270 degrees, 180 degrees, or less, can also be used as the room image 10 of the present invention. Hereinafter, the terms "panoramic image" and "room photograph" may be used interchangeably in this specification. That is, an image with a 360-degree angle of view capturing a room or a space in a building is also simply referred to as a "room photograph".

［学習済みモデル］ [Trained model]
本発明においては、事前に推定を行う画像等に関する学習（重みの自動調整）をさせたデータ（「学習済みモデル」）を記憶部３３に記憶することになる。たとえば、本発明の一実施形態では、事前に、パノラマ画像を入力とした部屋の種類を推定するタスクを畳み込みニューラルネットワークにより学習させ、その結果得られる学習済みモデルを記憶部３３に記憶しておく（図６：「事前ステップ」）。学習は、情報処理装置３の制御部３２において行われてもよいが、他の情報処理装置において行われ、情報処理装置３の記憶部３３に記憶されてもよい。 In the present invention, data that has been trained in advance (automatic adjustment of weights) on the images and the like to be estimated (a "trained model") is stored in the storage unit 33. For example, in one embodiment of the present invention, a convolutional neural network is trained in advance on the task of estimating the room type from an input panoramic image, and the resulting trained model is stored in the storage unit 33 (FIG. 6: "preliminary step"). The training may be performed by the control unit 32 of the information processing device 3, or it may be performed on another information processing device and the result stored in the storage unit 33 of the information processing device 3.

一例において学習セットは、部屋写真（間取り図１枚に対して部屋写真は複数枚であってもよい）と当該部屋の情報（部屋の種類）のセットであり、その学習方法ないし畳み込みニューラルネットワークの構成については、本明細書の他の記載により制限される他には特に制限はなく、非特許文献１に記載の手法などを含む、標準的な深層学習（たとえば、標準的な距離学習）を用いることができる。ここで、本発明において距離学習とは、対応すると判明しているアイテム同士を近くに、対応しないと分かっているアイテム同士を遠くにマッピングするようなニューラルネットの学習方法一般を意味する。たとえば、対応するペアの特徴量のベクトルの距離が小さくなるように重みパラメータの自動調整（学習）を行い、対応しないペアの特徴量のベクトルの距離は大きくなるように重みパラメータの自動調整を行ってもよい。一例として、コントラスティブロス（ＣｏｎｔｒａｓｔｉｖｅＬｏｓｓ）を用いることができる。コントラスティブロスは距離学習の一つであり、ペアとなるデータ（たとえば、２つの画像データ）に基づいて、損失関数の出力値（Ｌ）を最小化し、ニューラルネットワークの学習（重みの自動調整）を行う。そのほか、トリプレットロス（ＴｒｉｐｌｅｔＬｏｓｓ）を用いることもできる。トリプレットロスは距離学習の一つであり、３データに基づいて重みの自動調整を行う手法である。具体的に言えば、まず、３データ中で中心となるもの（「中心データ」）が決められる。次いで、残る２データに対し、中心データと相対的に似ているものと似ていないものがそれぞれ指定される。ニューラルネットは中心データと残る２データの距離を計算し、実際に似ているもの同士の距離を似ていないもの同士の距離よりも小さくするように更新する。 In one example, the training set is a set of room photographs (there may be several room photographs for one floor plan) and information about the room (room type). The training method and the configuration of the convolutional neural network are not particularly limited except as restricted by other statements in this specification, and standard deep learning (for example, standard metric learning), including the method described in Non-Patent Document 1, can be used. Here, in the present invention, metric learning (distance learning) means in general any neural network training method that maps items known to correspond close to each other and items known not to correspond far from each other. For example, the weight parameters may be automatically adjusted (trained) so that the distance between the feature vectors of a corresponding pair becomes small and the distance between the feature vectors of a non-corresponding pair becomes large. As one example, contrastive loss can be used. Contrastive loss is a form of metric learning that trains the neural network (automatic adjustment of weights) by minimizing the output value (L) of a loss function computed on paired data (for example, two pieces of image data). Triplet loss can also be used. Triplet loss is a form of metric learning that automatically adjusts the weights based on three pieces of data. Specifically, one of the three is first chosen as the center ("center data"). Of the remaining two, one is designated as relatively similar to the center data and the other as relatively dissimilar. The neural network computes the distance between the center data and each of the remaining two, and is updated so that the distance between the items that are actually similar becomes smaller than the distance between the items that are dissimilar.
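The two losses named above can be sketched in a few lines. The forms below are common textbook variants (squared distance for matching pairs, a hinge on a margin otherwise), chosen here as an assumption; the specification does not give formulas, and the `margin` parameter, the function names, and the sample vectors are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     is_match: bool, margin: float = 1.0) -> float:
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    d = euclidean(a, b)
    return d ** 2 if is_match else max(0.0, margin - d) ** 2


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Zero once the similar item is closer to the anchor than the dissimilar one by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


center = [0.0, 0.0]    # plays the role of the "center data"
similar = [0.0, 1.0]   # distance 1 from the center
different = [3.0, 4.0] # distance 5 from the center

print(contrastive_loss(center, different, is_match=True))               # 25.0
print(contrastive_loss(center, different, is_match=False, margin=6.0))  # 1.0
print(triplet_loss(center, similar, different))                         # 0.0
print(triplet_loss(center, different, similar))                         # 5.0
```

The last two lines show the intended ordering: the loss is zero when the similar item is already the closer one, and positive when the roles are swapped, which is what drives the weight updates during training.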

What has been learned is stored in the storage unit 33 as a trained model. This can also be expressed as follows: the storage unit stores a trained model G based on the correspondence (that is, the training set) between information on a first set of N objects (for example, room contents, room information, information on objects in the room, and information on the contents of the floor plan) and one or more photographs taken of each of those N objects.

[Information processing of Embodiment 1]
The details of the information processing in Embodiment 1 are described below. In outline, the control unit 32 of the information processing device 3 uses the trained model, the floor plan 20 of a given room, and one or more room photographs 10 of that room to estimate the position on the floor plan 20 at which each room photograph 10 was taken and the direction in which it was taken. The "floor plan 20 (image)" of a given room and the "room photograph 10" of that room are merely one specific example used to explain the present invention. The invention itself is not limited to estimation concerning a "floor plan (image)" of a room and a "room photograph" of that room; it also encompasses estimating or determining the relationship between a "plan view" of a given object (or objects) and a "photograph" of that object (or objects). It further encompasses estimating or determining the relationship between first information and second information when the feature quantities of the two differ in dimensionality.

FIG. 7 shows the information processing steps according to one embodiment of the present invention. Each step in FIG. 7 (Step 1 through Step 7) is described below.

[Details of the information processing steps]
[Step 1]
In Embodiment 1, the room photograph 10 to be estimated and the floor plan 20 corresponding to that room photograph 10 are first input to the information processing device 3 ("Step 1"). The input method is not particularly limited. For example, the room photograph 10 and the corresponding floor plan 20 may be saved on an external storage device such as a memory card or USB memory, and that device may be connected directly to the information processing device 3. In this case, the received room photograph 10 and floor plan 20 may be stored in the storage unit 33 or may remain on the external storage device.

Alternatively, for example, the information processing device 3 may receive the room photograph 10 and the floor plan 20 from the external network 4 via the communication unit 31. In this case, the received room photograph 10 and floor plan 20 are stored in the storage unit 33. Viewed at the level of the information processing system 1 as a whole, this means that the user may select, on the user terminal device 5 (for example, a smartphone), a floor plan 20 stored in advance in the storage unit 33 of the information processing device 3, or may instead photograph (scan) or enter a floor plan 20 on the terminal device 5 and upload it to the information processing device 3. The room photograph 10 can then be taken with the smartphone's camera and uploaded to the information processing device 3. The following description assumes that a panoramic image has been received from the external network 4 as the room photograph 10.

[Step 2]
In Step 2, the control unit 32 of the information processing device 3 reads the room photograph 10 (panoramic image). In one example, if the room photograph 10 (panoramic image) input to (that is, received by) the information processing device 3 is a grayscale image, it is read as is and passed to the processing described below; if it is a color image, it is read in as grayscale ("Step 2").

[Step 3]
From the loaded room photograph (grayscale image), the control unit 32 uses the trained model to extract the object information in the room photograph (panoramic image), that is, channel-direction information in the form of a first-order tensor, and also to extract the position information of each object in the panoramic image, that is, width-direction information in the form of a first-order tensor ("Step 3"). This behavior is explained further below.

As an example, the extraction of width-direction information (that is, position information) is explained below on the assumption that the loaded grayscale panoramic image has a resolution of H (height) = 256 and W (width) = 512. Naturally, the resolution of the loaded grayscale panoramic image is not limited to this. FIG. 8 illustrates the extraction of width-direction information. First, the control unit 32 applies convolutions to the grayscale panoramic image, converting it into data with, for example, H (height) = 8, W (width) = 16, and C (number of channels) = 1080. The control unit 32 may then discard the height-direction information by averaging over the height direction, which yields data with W (width) = 16 and C (number of channels) = 1080. A fully connected layer then produces W (width) = 16 and C (number of channels) = 64, and flattening this into a vector gives C (number of channels) = 16 × 64 = 1024. FIG. 9 shows these steps for the case where an object is assumed to be on the left side of the room photograph 10 in FIG. 8; the object position is shaded, as is the portion of the data corresponding to that object after each processing stage.
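The shape bookkeeping in this paragraph can be traced with a small sketch. Plain nested lists stand in for tensors, `average_over_height` and `flatten` are hypothetical helper names, and the fully connected layer is replaced by a dummy 16 × 64 table because the source does not give its weights; only the shapes are meant to be informative.

```python
def average_over_height(fmap):
    """Collapse an H x W x C feature map (nested lists) to W x C
    by averaging each (column, channel) entry over the height axis."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[i][j][k] for i in range(h)) / h for k in range(c)]
            for j in range(w)]

def flatten(rows):
    """Flatten a W x C table into one vector of length W * C."""
    return [value for row in rows for value in row]

# Shapes from the text: the convolved map is 8 x 16 x 1080.
# Every entry in row i is set to i, so the column averages are easy to check.
fmap = [[[float(i) for _ in range(1080)] for _ in range(16)] for i in range(8)]
pooled = average_over_height(fmap)          # 16 x 1080

# Dummy stand-in for the fully connected layer output (16 x 64).
fc_out = [[0.0] * 64 for _ in range(16)]
vector = flatten(fc_out)                    # length 16 * 64

print(len(pooled), len(pooled[0]), len(vector))  # prints: 16 1080 1024
```

Averaging over height keeps one column of features per horizontal position, which is why the result still carries left-to-right position information.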

[Step 4]
The channel-direction information corresponding to the objects in the room photograph 10 and the width-direction information corresponding to the position of each object in the panoramic image are saved in a predetermined file 100 and stored in the storage unit 33 in association with the room photograph 10 (panoramic image).

[Step 5]
The floor plan 20 input to the information processing device 3 is encoded by convolution. In one example, the floor plan 20 is encoded using a MobileNet-based architecture, which substantially reduces the amount of computation. Whereas an ordinary convolution convolves in the spatial direction and the channel direction simultaneously, in one embodiment of the present invention the convolution is performed, using MobileNet or the like, first in the width direction (the position information of each object in the panoramic image) and then in the channel direction (that is, the information on the objects in the image). In other words, the spatial and channel convolutions are not performed simultaneously but in the order just described. The architecture used for convolution is not limited to MobileNet, and if MobileNet is used, its version is not restricted. Preferably, MobileNetV2 is used rather than MobileNetV1. In general, the number of channels in a convolutional network grows with each layer and inflates the parameter count; MobileNetV1 ends with a shape of 7 × 7 × 1024, whereas MobileNetV2 reaches only 7 × 7 × 320, so its parameter count can be smaller than that of MobileNetV1. With respect to the estimation accuracy provided by the present invention, compressing the floor plan to the degree that MobileNetV2 does causes no significant problem.
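The saving from splitting a convolution into a spatial step followed by a channel step can be checked with simple weight counts. The sketch below compares a standard convolution with a depthwise-then-pointwise pair of the kind MobileNet is built on; the kernel size and channel counts are illustrative assumptions, not values from the source, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution that mixes space and channels at once."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a k x k depthwise (spatial) step followed by
    a 1 x 1 pointwise (channel) step."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128  # illustrative layer sizes (assumption)
print(standard_conv_params(k, c_in, c_out))   # prints: 73728
print(separable_conv_params(k, c_in, c_out))  # prints: 8768

# Element counts of the final feature-map shapes quoted in the text.
print(7 * 7 * 1024, 7 * 7 * 320)              # prints: 50176 15680
```

For this layer the split form needs roughly one eighth of the weights, which is consistent with the reduction in computation that the text attributes to the MobileNet-based encoder.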

[Step 6]
The information obtained in Step 5 is associated with the feature quantities of the room photograph 10 (panoramic image) stored in the file 100 (the channel-direction information corresponding to the objects in the image and the width-direction information corresponding to the position of each object in the panoramic image) by concatenating (vertically joining) each of them with it. In one example, this concatenation is performed multiple times (preferably twice).

Here, the floor plan is a third-order tensor, whereas the channel-direction information corresponding to the objects in the image and the width-direction information corresponding to the position of each object in the panoramic image are both first-order tensors. They therefore cannot be concatenated directly. One could reduce the third-order tensor to a first-order tensor (so that the information in the third-order tensor along dimensions that have no counterpart in the first-order tensor is left out of the concatenation) and then concatenate first-order tensors with each other, but this would lose spatial information and make it difficult to identify the shooting position and shooting direction. The present inventors therefore arrived at a method of concatenating the first-order tensor information (the channel-direction information and/or the width-direction information) with the third-order tensor information (the floor plan information) without losing the spatial information held in the third-order tensor. This method is described below with reference to FIG. 10.

In outline, the method replicates each first-order tensor to match the shape of the floor plan tensor and concatenates the channel-direction information and the width-direction information with it in turn. As noted above, the channel-direction information and the width-direction information are each first-order tensors having only C (number of channels); each is first converted into the shape (1, 1, C1). If the floor plan image has the shape (H2, W2, C2), then, taking the floor plan image as the reference, (1, 1, C1) is replicated H2 times in the H (height) direction and then W2 times in the W (width) direction, giving the shape (H2, W2, C1). The channel-direction information and the width-direction information are thereby each converted into a third-order tensor of shape (H2, W2, C1), the same order as the floor plan tensor. Finally, (H2, W2, C2) and (H2, W2, C1), both now third-order tensors, are concatenated.
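The replicate-then-concatenate step can be sketched with nested lists. The helper names `tile_vector` and `concat_channels` and the tiny sizes are assumptions chosen for readability; a real implementation would typically use a tensor library's tiling and concatenation routines.

```python
def tile_vector(vec, height, width):
    """Copy a length-C1 vector into every cell of a height x width grid,
    giving a nested-list tensor of shape (H2, W2, C1)."""
    return [[list(vec) for _ in range(width)] for _ in range(height)]

def concat_channels(a, b):
    """Concatenate two (H, W, *) nested-list tensors along the channel axis."""
    return [[cell_a + cell_b for cell_a, cell_b in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# Hypothetical sizes: floor-plan features (H2, W2, C2) = (2, 3, 4),
# photo feature vector with C1 = 2.
floor_plan = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
photo_vec = [1.0, 2.0]

fused = concat_channels(floor_plan, tile_vector(photo_vec, 2, 3))
print(len(fused), len(fused[0]), len(fused[0][0]))  # prints: 2 3 6
```

Every spatial cell of the floor plan keeps its own features and gains a copy of the photo features, so the result has shape (H2, W2, C2 + C1) and no spatial information from the floor plan is discarded.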

In Non-Patent Document 1, concatenation was performed on the basis of channel information alone. The present inventors found, however, that when concatenation uses channel information alone, the information on positional relationships is insufficient, and as a result the accuracy of the estimation according to the present invention cannot be ensured. The inventors therefore arrived at the technique of also using the width-direction information in the concatenation, which markedly improves accuracy.

[Step 7]
The concatenated information is decoded, output as a heat map of the same size as the floor plan, and evaluated. This can be done with existing techniques, for example with an architecture based on a U-shaped convolutional network (U-Net). In one example, however, the position information (two-dimensional) and the direction information (one-dimensional) are output as separate heat maps. This makes it possible to evaluate the shooting position (two-dimensional) and the shooting direction (one-dimensional) from the heat maps (that is, to compare them using the difference between the integrals of the probability distributions being compared).
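One simple way to read an estimate off the two heat maps is to take the peak of each. The source does not state how the final position and direction are read out, so the sketch below is an assumption: `decode_position` and `decode_direction` are hypothetical names, and the direction bins are assumed to divide 360 degrees evenly, with bin 0 at 0 degrees.

```python
def decode_position(heatmap):
    """Return (row, col) of the first largest value in a 2-D heat map."""
    best_rc, best_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_rc, best_val = (r, c), value
    return best_rc

def decode_direction(heatmap):
    """Return the angle in degrees of the strongest bin in a 1-D heat map,
    assuming the bins split 360 degrees evenly (assumption)."""
    n = len(heatmap)
    best_bin = max(range(n), key=lambda i: heatmap[i])
    return best_bin * 360.0 / n

print(decode_position([[0.1, 0.2], [0.7, 0.0]]))  # prints: (1, 0)
print(decode_direction([0.0, 0.1, 0.8, 0.1]))     # prints: 180.0
```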

In one example, the Jensen-Shannon (JS) divergence is used as the loss for estimating the shooting direction and the shooting position, in the role played in machine learning by the logarithmic loss (cross-entropy). That is, when there are two probability distributions P(x) and Q(x) over the same random variable x (that is, angle information), the distance (difference) between them could be evaluated with the Kullback-Leibler (KL) divergence. The KL divergence, however, is not symmetric (exchanging the two distributions does not give the same value) and so does not satisfy the axioms of a distance, whereas the Jensen-Shannon (JS) divergence is symmetric and is therefore well suited to the present invention.
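The asymmetry of the KL divergence and the symmetry of the JS divergence can be seen directly on two small discrete distributions. The sketch uses the standard definitions with natural logarithms; the example distributions `p` and `q` are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions. Terms with p_i == 0 contribute 0;
    q_i must be positive wherever p_i is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """JS divergence: the mean of KL(P || M) and KL(Q || M),
    where M = (P + Q) / 2 is the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.7, 0.2, 0.1]  # illustrative distributions (assumption)
q = [0.1, 0.3, 0.6]

print(kl_divergence(p, q), kl_divergence(q, p))  # two different values
print(js_divergence(p, q), js_divergence(q, p))  # the same value twice
```

Because the midpoint M is positive wherever P or Q is, the JS divergence also stays finite when one heat map assigns zero probability to a bin that the other does not.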

Non-Patent Document 2, a machine learning reference, reports that for estimating angle information, training to output the absolute value of sin, the absolute value of cos, the sign of sin, and the sign of cos, and to reduce the differences in those outputs, gave higher accuracy than training to output the signed sin and cos values. The present inventors found, however, that using a one-dimensional heat map as the output for estimating the shooting direction improves accuracy markedly.
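To make the idea of a one-dimensional direction heat map concrete, the sketch below encodes a single angle as a normalized distribution over angular bins. The source does not describe how such a target is built; the bin count, the Gaussian-shaped bump, `sigma_deg`, and the name `direction_heatmap` are all assumptions for illustration.

```python
import math

def direction_heatmap(angle_deg, bins=36, sigma_deg=10.0):
    """Normalized 1-D heat map over `bins` equal angular bins,
    peaked at `angle_deg` and wrapping around at 360 degrees."""
    step = 360.0 / bins
    weights = []
    for i in range(bins):
        diff = abs(i * step - angle_deg) % 360.0
        diff = min(diff, 360.0 - diff)  # shortest way round the circle
        weights.append(math.exp(-0.5 * (diff / sigma_deg) ** 2))
    total = sum(weights)
    return [w / total for w in weights]

heat = direction_heatmap(350.0)
peak_bin = max(range(len(heat)), key=lambda i: heat[i])
print(peak_bin)  # prints: 35
```

With this encoding, the bin at 0 degrees and the bin at 340 degrees receive equal weight for a target of 350 degrees, so directions just either side of the 0/360 boundary are treated as neighbors.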

[Additional information processing]
Through the above, a set of one or more images, preferably a set of wide-field panoramic images, and the corresponding floor plan image are stored in the storage unit 33 of the information processing device 3. Each of these panoramic images is then matched to the specific shooting position and shooting direction on the floor plan at which it was taken. On the basis of this matching, marks such as arrows can be displayed on the floor plan (see the arrows in FIG. 1), and the set of panoramic images can be presented to the user together with the corresponding floor plan image.

[Additional configuration]
Once the shooting position and shooting direction of each panoramic image within the floor plan have been identified, a virtual tour can be created that provides an automatic and/or interactive tour of the real estate property. In the virtual tour, each room (position) on the floor plan that is visited is presented through the corresponding panoramic image, and on receiving a signal from the user terminal device 5 to change the heading (shooting direction), a panoramic image with a different shooting direction can be provided. The information processing device 3 can also edit a panoramic image, correct a photograph taken from a given direction, and composite and display photographs taken from different directions. Instead of the wide-field panoramic image, it is also possible to provide, by cropping or similar processing, a standard field of view matched to the user's viewing angle.

As an example, the virtual tour may be interactive; for instance, the user may view the rooms using a head-mounted display or the like as the display unit 55 of the user terminal device 5. According to one embodiment of the present invention, the shooting position and shooting direction of each panoramic image are calculated accurately and associated with the floor plan, so the user can visit each room virtually in a manner consistent with the floor plan. The input unit 54 of the head-mounted display may also be configured to detect the user's gaze toward a given location, allowing operations such as pausing the virtual tour, advancing it, or switching to a different viewpoint.

So that the user's experience matches a real visit, the room photograph 10 with the most suitable shooting direction can be selected from among the multiple room photographs and displayed on the display unit 55 of the user terminal device 5; for example, when the user moves from room A into room B, an image of room B as seen from room A is provided.

[Embodiment 2]
As a further embodiment, a method is proposed in which objects are identified in both the room photograph 10B and the floor plan 20B, and the shooting position and direction are determined from the positional relationship of the identified objects. For example, in FIGS. 1 and 2, given the shooting position, the position of the kitchen on the floor plan, and the fact that the kitchen appears at the front of the room photograph, the control unit 32 can estimate the shooting direction from this information.
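The kitchen example reduces to a bearing calculation: if the object appears straight ahead in the photograph, the shooting direction is the direction from the shooting position to the object on the floor plan. The sketch below assumes floor-plan coordinates with x to the right and y upward and angles measured counter-clockwise from the +x axis; the source does not specify a convention, and `bearing_deg` is a hypothetical name.

```python
import math

def bearing_deg(camera_xy, object_xy):
    """Angle in degrees, in [0, 360), of the ray from the camera position
    to the object position, counter-clockwise from the +x axis (assumed)."""
    dx = object_xy[0] - camera_xy[0]
    dy = object_xy[1] - camera_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

# Hypothetical positions: camera at the origin, kitchen directly "above" it.
print(bearing_deg((0.0, 0.0), (0.0, 2.0)))  # prints: 90.0
```

If the floor plan is stored as an image with y increasing downward, the sign of `dy` would need to be flipped to keep the same angular convention.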

The position on the floor plan at which the room photograph was taken (that is, the shooting position) can be identified by the method of Embodiment 1, so a detailed description is omitted.

The positions of objects on the floor plan can be identified not only by the method of Embodiment 1 but also by other methods. For example, objects are identified using a trained neural network model; which objects can be identified depends on the trained model and is defined in advance. Specific examples include, but are not limited to, kitchen islands, doors and their doorknobs, walls, stairs, handrails, toilets, sinks, bathtubs, showers, stoves, refrigerators, washing machines, and shelves. In one example, detecting the position of the doorknob allows right-hand and left-hand doors to be distinguished.

[Modification 1]
The present invention is not limited to the embodiments above and can be modified as appropriate without departing from its gist. According to one embodiment of the present invention, even when the feature quantities of two inputs differ in dimensionality, the two can be concatenated and evaluated as described with reference to FIG. 10 and the corresponding detailed description (for example, "Step 6" above). Thus, for example, when an image and a text description of the image (for example, "a dog is running on the left and a balloon is flying on the right") are given as inputs, it is possible to determine accurately not merely "whether a dog is running" or "whether a balloon is flying" but whether the image matches the description (in this example, whether it is an image in which "a dog is running on the left and a balloon is flying on the right"). In this sense, the present invention is not limited to information processing of a "floor plan" and a "room photograph", nor to information processing of a "plan view of an object" and a "photograph of an object"; it concerns information processing in cases where the feature quantities of two inputs differ in dimensionality.

[Modification 2]
Furthermore, the technique described in "Step 7" above, in which angle information is output and evaluated as a one-dimensional heat map, can be applied widely in other fields that require accurate angle information. Specifically, as a modification in the fields of robotics and autonomous driving, consider a terminal device 5 equipped with a camera as its (terminal) input unit 54 (in this modification, the "terminal device 5" is, for example, a camera-equipped robot or a camera-equipped autonomous vehicle). Image information on the surroundings of the terminal is supplied to the information processing device 3, and the control unit 32 of the information processing device 3 calculates the current orientation of the terminal 5 from the image taken by the camera mounted on the terminal and the image information on the terminal's surroundings; the terminal's orientation is then changed appropriately so that it can move toward its destination, and is thereby controlled. In this configuration too, the technique contributes to improving the accuracy of the angle (terminal orientation) calculation. Here, what was described as the floor plan 20 in the embodiments above corresponds to the "image information on the surroundings of the terminal", and what was described as the room photograph 10 (or panoramic image) corresponds to the "image taken by the camera mounted on the terminal". Heat map output is also useful for calculating which direction a person or object in a given image is facing.

As described above, the present invention is useful because it can be applied not only to identifying the shooting position and shooting direction of a room photograph within a floor plan and to associating a plan view of an object with a photograph of that object, but also to associating an image with its text description and judging whether the description is correct, and to calculating angle information (direction) for robots and autonomous vehicles.

Information processing system 1
Room photograph 10
Floor plan (image) 20
Information processing device 3
Communication unit 31
Control unit 32
Storage unit 33
Network 4
User terminal device 5
Communication unit 51
Control unit 52
Storage unit 53
Input unit 54
Display unit 55
File 100
Trained model G

Claims

An information processing device comprising a storage unit and a control unit, wherein
the storage unit stores a trained model obtained by machine learning with a convolutional neural network based on the correspondence between plan views of a first set of N spaces and one or more photographs taken of each of the first N spaces, and
the control unit, based on the trained model, a photograph of a second space different from the N spaces, and a plan view of the second space, extracts information in which the photograph of the second space is compressed and concatenates it with information in which the plan view of the second space is compressed, thereby estimating the shooting position and the shooting direction, in the plan view of the second space, at which the photograph of the second space was taken.

The information processing device according to claim 1, wherein the estimation includes the control unit performing a loss calculation using the Jensen-Shannon divergence.

The information processing device according to claim 1 or 2, wherein the estimation includes evaluating the result of the concatenation with a U-shaped convolutional network.

A computer program that causes a computer to operate as an information processing device comprising a storage unit and a control unit, wherein
the storage unit stores a trained model obtained by machine learning with a convolutional neural network based on the correspondence between plan views of a first set of N spaces and one or more photographs taken of each of the first N spaces, and
the control unit, based on the trained model, a photograph of a second space different from the N spaces, and a plan view of the second space, extracts information in which the photograph of the second space is compressed and concatenates it with information in which the plan view of the second space is compressed, thereby estimating the shooting position and the shooting direction, in the plan view of the second space, at which the photograph of the second space was taken.

The computer program according to claim 4, wherein the estimation includes the control unit performing a loss calculation using the Jensen-Shannon divergence.

The computer program according to claim 4 or 5, wherein the estimation includes evaluating the result of the concatenation with a U-shaped convolutional network.