JP2019175139A

JP2019175139A - Architectural structure extraction system

Info

Publication number: JP2019175139A
Application number: JP2018062645A
Authority: JP
Inventors: 竜平濱口; Ryuhei Hamaguchi
Original assignee: Pasco Corp
Current assignee: Pasco Corp
Priority date: 2018-03-28
Filing date: 2018-03-28
Publication date: 2019-10-10
Anticipated expiration: 2038-03-28
Also published as: JP7096033B2

Abstract

PROBLEM TO BE SOLVED: To suppress oversights in building extraction. SOLUTION: A building extraction system includes: a first building detector trained, for a plurality of buildings belonging to a first group, using a learning input image and teacher data indicating the shapes of those buildings in the learning input image; a second building detector trained, for a plurality of buildings belonging to a second group different from the first group, using a learning input image and teacher data indicating the shapes of those buildings in the learning input image; an input unit that inputs, to the first building detector and the second building detector, feature information of an input image in which a learning target region on the ground surface is photographed from above; and an integration unit that integrates the output of the first building detector and the output of the second building detector for the input feature information. The first building detector and the second building detector include neural networks of mutually different types. SELECTED DRAWING: Figure 2

Description

The present invention relates to a building extraction system.

Techniques for extracting buildings from data acquired from above, such as aerial photographs and satellite images, have been studied. Patent Document 1 discloses a system in which an operator designates, on an image such as an aerial photograph, a work area containing the building to be extracted, and the outline of the building is extracted automatically within that work area. Patent Document 2 discloses an apparatus that extracts the outlines of buildings using a DSM (Digital Surface Model) acquired from above with a laser scanner or the like.

Patent Document 3 discloses, for an object detection device that recognizes pedestrians, an ensemble detector having three scales, in which the size of the pedestrian image to be detected differs from scale to scale.

Patent Document 1: JP 2011-76178 A
Patent Document 2: JP 2013-101428 A
Patent Document 3: JP 2018-5520 A

The inventors have been developing a method of extracting buildings using a convolutional neural network in order to reduce, for example, the workload of detecting changes to buildings (new construction or demolition).

As neural networks for building extraction, the inventors experimented with a model that includes convolution layers and pooling layers and with a model that uses dilated convolution. However, each model has strengths and weaknesses in extraction accuracy that depend on building attributes such as size, and with either model alone it was difficult to suppress oversights in building extraction.

The present invention has been made in view of the above problem, and its object is to provide a building extraction system capable of suppressing oversights in building extraction.

(1) A building extraction system including: a first building detector trained, for a plurality of buildings belonging to a first group, using a learning input image and teacher data indicating the shapes of those buildings in the learning input image; a second building detector trained, for a plurality of buildings belonging to a second group different from the first group, using a learning input image and teacher data indicating the shapes of those buildings; an input unit that inputs, to the first building detector and the second building detector, feature information of an input image in which a learning target region on the ground surface is photographed from above; and an integration unit that integrates the output of the first building detector and the output of the second building detector for the input feature information of the input image, wherein the first building detector and the second building detector include neural networks of mutually different types.

(2) The building extraction system according to (1), wherein the areas of the plurality of buildings belonging to the first group fall within a first range, and the areas of the plurality of buildings belonging to the second group fall within a second range different from the first range.

(3) The building extraction system according to (2), further including a filter that removes, based on area, buildings included in the output of the first building detector and buildings included in the output of the second building detector.

(4) The building extraction system according to any one of (1) to (3), wherein the first building detector includes a convolution layer that performs a dilated convolution operation, and the second building detector includes a pooling layer.

(5) The building extraction system according to (2), wherein the maximum value of the first range is smaller than the maximum value of the second range, the first building detector includes a convolution layer that performs a dilated convolution operation, and the second building detector includes a pooling layer.

(6) The building extraction system according to any one of (1) to (3), further including: an evaluation unit that evaluates, for a plurality of buildings belonging to one of the first group and the second group, the building-shape detection accuracy of each of a first candidate detector and a second candidate detector, the first candidate detector including a first type of neural network and trained using the learning input image and teacher data indicating the shapes of those buildings in the learning input image, and the second candidate detector including a second type of neural network and trained using the learning input image and teacher data indicating the shapes of those buildings in the learning input image; and a detector selection unit that selects, based on the detection accuracy evaluated by the evaluation unit, one of the first candidate detector and the second candidate detector as one of the first building detector and the second building detector.

(7) The building extraction system according to any one of (1) to (6), wherein the integration unit determines, as a region containing a building, any region recognized as a building in either the output of the first building detector or the output of the second building detector for the input feature information of the input image.

FIG. 1 is a diagram showing an example of the hardware configuration of a building extraction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the building extraction system.
FIG. 3 is a diagram explaining the types of learning detector.
FIG. 4 is a diagram explaining the difference in scale.
FIG. 5 is a diagram showing an outline of the configuration of a learning detector.
FIG. 6 is a diagram explaining the layers included in a learning detector of the pooling model.
FIG. 7 is a diagram explaining the layers included in a learning detector of the dilation model.
FIG. 8 is a diagram explaining an example of the layer structure in dilated convolution.
FIG. 9 is a flowchart showing an example of the process of training the learning detectors.
FIG. 10 is a flowchart showing an example of the processing performed by the learning execution unit for each window image.
FIG. 11 is a diagram showing an example of teacher data.
FIG. 12 is a flowchart showing an example of the process of evaluating the learning detectors.
FIG. 13 is a diagram showing evaluation results.
FIG. 14 is a diagram explaining an outline of the process of determining building regions.
FIG. 15 is a flowchart showing the flow of the process of generating an overall output image from a processing target image.

Embodiments of the present invention are described below with reference to the drawings. Components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

In the building extraction system according to this embodiment, feature information of an aerial photograph, satellite image, or the like showing the ground surface that is the target region of building extraction (this may be an orthoimage based on an aerial photograph or satellite image; hereinafter, the "processing target image") is input to building detectors, which are trained models using neural networks, and building regions are determined and extracted based on the images that the building detectors output. The system uses three building detectors to identify buildings in a processing target image. The three detectors are configured to detect, with higher accuracy, buildings whose areas fall within the ranges S, M, and L, respectively. For example, range S is less than 45 m², range M is 45 m² or more and less than 131 m², and range L is 131 m² or more. Roughly speaking, range L corresponds to condominium blocks and large commercial facilities, range M to apartment buildings and retail stores, and range S to ordinary houses.
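The example thresholds above can be written as a small lookup. This is only a sketch: the function name `area_range` and the constant names are hypothetical, and the 45 m² and 131 m² boundaries are the example values given in the text, not fixed parameters of the system.

```python
# Example boundaries in square metres, taken from the values quoted in the text.
S_MAX_M2 = 45.0    # range S: area < 45
M_MAX_M2 = 131.0   # range M: 45 <= area < 131; range L: area >= 131


def area_range(area_m2: float) -> str:
    """Return 'S', 'M', or 'L' for a building footprint area in square metres."""
    if area_m2 < 0:
        raise ValueError("area must be non-negative")
    if area_m2 < S_MAX_M2:
        return "S"
    if area_m2 < M_MAX_M2:
        return "M"
    return "L"
```

Note that both boundaries are half-open: a building of exactly 45 m² belongs to range M, and one of exactly 131 m² belongs to range L.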

For each of the area ranges S, M, and L, the system also trains a plurality of building detectors that differ from one another in the type of neural network and in the scale of the input learning images. For each range, the best detector is then selected from among them, and the selected detector is used to detect building regions in the processing target data.

FIG. 1 is a diagram showing the hardware configuration of the building extraction system according to the embodiment of the present invention. The building extraction system includes a learning server 1. The learning server 1 is a server computer and includes a processor 11, a storage unit 12, a communication unit 13, and an input/output unit 14.

The processor 11 operates according to a program stored in the storage unit 12. The processor 11 also controls the communication unit 13 and the devices connected to the input/output unit 14. The processor 11 may include a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit) used as a parallel computer. The program may be provided via the Internet or the like, or may be provided stored on a computer-readable storage medium such as a flash memory or a DVD-ROM.

The storage unit 12 consists of memory elements such as RAM and flash memory and a hard disk drive. It stores the program, as well as information input from each unit and calculation results.

The communication unit 13 implements the function of communicating with other devices and consists of, for example, a wired LAN integrated circuit. Under the control of the processor 11, the communication unit 13 transmits information to and receives information from other devices, and passes received information to the processor 11 and the storage unit 12. The communication unit 13 is connected to other equipment via, for example, a LAN.

The input/output unit 14 consists of a video controller that controls a display output device, a controller that acquires data from input devices, and the like. Input devices include a keyboard, a mouse, and a touch panel. Under the control of the processor 11, the input/output unit 14 outputs display data to the display output device and acquires data entered by the user operating an input device. The display output device is, for example, an externally connected display.

Next, an overview of the functions of the building extraction system is given. FIG. 2 is a block diagram showing the functional configuration of the building extraction system. Functionally, the system includes a learning data acquisition unit 51, a learning execution unit 52, a learning detector set 53, an evaluation data acquisition unit 56, an evaluation execution unit 57, a detector selection unit 58, an execution detector set 61, a target data input unit 65, an output acquisition unit 66, a filter unit 67, an integration unit 68, and an image output unit 69. These functions are realized mainly by the processor 11 executing the program stored in the storage unit 12 and accessing data stored in the storage unit 12. All of these functions may be executed by the learning server 1, or some of them may be executed by another server. For example, the functions of the target data input unit 65, the execution detector set 61, the output acquisition unit 66, the filter unit 67, the integration unit 68, and the image output unit 69 may be realized by another server that has its own processor 11, storage unit 12, communication unit 13, and input/output unit 14.

The learning detector set 53 has a plurality of learning detectors 54. In this embodiment there are six learning detectors 54, each having a common unit 540, which is trained in common regardless of the area ranges S, M, and L, and individual units 541, 542, and 543, which are trained for the area ranges S, M, and L, respectively. Each learning detector 54 is trained for a different combination of neural network type and input learning image scale.

The learning data acquisition unit 51 acquires learning input images and teacher data indicating the shapes of the buildings contained in those learning input images. The learning execution unit 52 trains the learning detectors 54 using the learning input images and the teacher data.

The evaluation data acquisition unit 56 acquires evaluation input images and ground-truth data indicating the shapes of the buildings contained in those evaluation input images. The evaluation input images and ground-truth data may be the same as the learning input images and teacher data. Using the evaluation input images and the ground-truth data, the evaluation execution unit 57 evaluates the building-shape detection accuracy of each of the individual units 541, 542, and 543 of each learning detector 54.

Based on the detection accuracy evaluated by the evaluation execution unit 57, the detector selection unit 58 selects, for each of the area ranges S, M, and L, the learning detector 54 that will detect buildings in the input target data. At least part of each selected learning detector 54 is used as one of the execution detectors 62, 63, and 64 that make up the execution detector set 61. More specifically, the combination of the common unit 540 and the individual unit 541 of the learning detector 54 selected for range S becomes the common unit 620 and the individual unit 621 of the execution detector 62 for range S. The combination of the common unit 540 and the individual unit 542 of the learning detector 54 selected for range M becomes the common unit 630 and the individual unit 631 of the execution detector 63 for range M. The combination of the common unit 540 and the individual unit 543 of the learning detector 54 selected for range L becomes the common unit 640 and the individual unit 641 of the execution detector 64 for range L.
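The per-range selection described above amounts to taking, for each of S, M, and L, the detector with the highest evaluated accuracy. The sketch below illustrates this; the function name `select_detectors`, the dictionary layout, and all accuracy values are hypothetical and are not taken from the evaluation results of this publication.

```python
def select_detectors(scores):
    """Pick the best detector number per area range.

    scores[detector_no][area_range] is the evaluated detection accuracy of the
    individual part for that area range inside that learning detector.
    Returns a dict mapping 'S', 'M', 'L' to the selected detector number.
    """
    selected = {}
    for area_range in ("S", "M", "L"):
        selected[area_range] = max(scores, key=lambda no: scores[no][area_range])
    return selected


# Hypothetical accuracies for six learning detectors (illustrative values only).
example_scores = {
    1: {"S": 0.61, "M": 0.70, "L": 0.72},
    2: {"S": 0.66, "M": 0.74, "L": 0.70},
    3: {"S": 0.69, "M": 0.71, "L": 0.65},
    4: {"S": 0.63, "M": 0.72, "L": 0.78},
    5: {"S": 0.68, "M": 0.77, "L": 0.75},
    6: {"S": 0.73, "M": 0.73, "L": 0.69},
}
```

Because the choice is made independently per range, the three execution detectors may come from three different learning detectors, each contributing its common part together with the matching individual part.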

The target data input unit 65 acquires an input target image, processes it as necessary, and inputs it to the execution detectors 62, 63, and 64. The output acquisition unit 66 acquires the output images produced by the execution detectors 62, 63, and 64.

The filter unit 67 removes, based on area, buildings contained in the output images of the execution detectors 62, 63, and 64, and generates filtered output images.
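One way such an area-based filter could work is sketched below. The text here does not specify the rule, so the following are assumptions: the detector output has already been binarized, a "building" is a 4-connected group of building pixels, and a building is kept only if its pixel count lies in a given half-open range. The function name `filter_by_area` is hypothetical.

```python
from collections import deque


def filter_by_area(mask, min_pixels, max_pixels):
    """Keep 4-connected components whose size is in [min_pixels, max_pixels).

    mask is a list of rows of 0/1 values; the result has the same shape.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed rule: retain the component only if its area is in range.
            if min_pixels <= len(component) < max_pixels:
                for y, x in component:
                    out[y][x] = 1
    return out
```

Converting the square-metre ranges S, M, and L into pixel counts would depend on the ground resolution of the image, which is left out of this sketch.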

The integration unit 68 integrates the filtered output images of the execution detectors 62, 63, and 64. It generates an image in which any region recognized as a building in at least one of the output images of the execution detectors 62, 63, and 64 is determined to be a region containing a building.
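The integration rule stated above is a pixel-wise logical OR over the three filtered outputs. A minimal sketch follows, assuming the outputs are already binarized masks of identical shape; the function name `integrate` is hypothetical.

```python
def integrate(masks):
    """Pixel-wise OR of equally sized binary masks (lists of rows of 0/1)."""
    if not masks:
        raise ValueError("at least one mask is required")
    h, w = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError("all masks must have the same shape")
    # A pixel is building if any detector marked it as building.
    return [[1 if any(m[y][x] for m in masks) else 0 for x in range(w)]
            for y in range(h)]
```

Taking the union rather than, say, a majority vote matches the stated goal of suppressing oversights: a building found by only one detector is still kept.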

The image output unit 69 outputs the image integrated by the integration unit 68 to the storage unit 12 or the display output device.

Next, the learning detector set 53 and the learning detectors 54 it contains are described in detail. FIG. 3 is a diagram explaining the types of learning detector 54. "No" in the table of FIG. 3 is the number assigned to each of the six learning detectors 54. "Scale" is the scale of the learning input images fed to the learning detector 54 of that number: the initially prepared learning input image is resized (enlarged or reduced as necessary) by the magnification given as the scale, and learning input images cut out so as to have the same number of pixels regardless of scale (hereinafter, a cut-out learning input image is called a "window image") are input to the learning detector 54. "Model type" is the type of neural network that makes up the learning detector 54 of that number. "Pooling" denotes a CNN (Convolutional Neural Network) model that combines convolution layers and pooling layers (hereinafter, the "pooling model"), and "Dilation" denotes a model that uses convolution layers performing dilated convolution (hereinafter, the "dilation model").

FIG. 4 is a diagram explaining the difference in scale. FIG. 4(a) is an example of a window image at a scale of 0.5, and FIGS. 4(b) and 4(c) are examples at scales of 1 and 2, respectively. The window images in FIGS. 4(a) to 4(c) contain the same region. Every window image has Px × Py pixels; Px and Py may be, for example, 32 or 64. The learning input image at scale 0.5 is obtained by reducing (decimating) the scale-1.0 learning input image so that the number of dots in each of the vertical and horizontal directions is halved, and the learning input image at scale 2.0 is obtained by enlarging the scale-1.0 image so that the number of dots in each direction is doubled (placing dots obtained by linear interpolation or the like between the original dots). This enlargement or reduction is performed by the learning data acquisition unit 51.

FIG. 5 is a diagram showing an outline of the configuration of the learning detector 54. As described above, the learning detector 54 has a common unit 540 and individual units 541, 542, and 543. The common unit 540 has a plurality of layers, and the individual units 541, 542, and 543 each have the same number of layers. The resized learning input image is input to the first layer of the common unit 540, and the feature information output by its last layer is input to the first layer of each of the individual units 541, 542, and 543. The output of each of the individual units 541, 542, and 543 is, for example, a 16 × 16 dot image in which each dot indicates the probability that a building exists at that position.
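The data flow in FIG. 5 (one shared part whose output feeds three area-specific parts) can be illustrated structurally as follows. The class name `LearningDetector` and the toy callables are hypothetical stand-ins; real common and individual parts would be stacks of neural-network layers.

```python
class LearningDetector:
    """One shared common part feeding three area-specific individual parts."""

    def __init__(self, common, individual):
        self.common = common           # callable: window image -> feature information
        self.individual = individual   # dict: 'S' / 'M' / 'L' -> callable on features

    def forward(self, window_image):
        # The feature information is computed once and shared by all three parts.
        features = self.common(window_image)
        return {name: part(features) for name, part in self.individual.items()}


# Toy stand-ins (hypothetical): row means as "features", thresholds as "parts".
def toy_common(window_image):
    return [sum(row) / len(row) for row in window_image]


toy_detector = LearningDetector(
    toy_common,
    {
        "S": lambda f: [v > 0.75 for v in f],
        "M": lambda f: [v > 0.50 for v in f],
        "L": lambda f: [v > 0.25 for v in f],
    },
)
```

Sharing the common part means the costly early feature extraction runs once per window image, while each individual part specializes in one area range.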

FIG. 6 is a diagram explaining the layers included in a learning detector 54 of the pooling model; the layers are listed in processing order. In the affiliation column, layers marked "common" belong to the common unit 540, and layers marked "individual" belong to the individual units 541, 542, and 543. Each layer marked "individual" exists in every one of the individual units 541, 542, and 543. The processing type column gives the type of each layer: "input" is the input layer, "convolution" is a convolution layer, and "pooling (s2)" is a pooling layer with a stride (the interval at which the kernel is applied) of 2. The kernel size is a parameter giving the size of the convolution filter. Because the data being processed is an image, the kernel is two-dimensional, and a kernel size of "k" means a k × k filter. The "number of feature maps" of a layer is the number of feature maps extracted by that layer, also called the number of channels. The stride is 1 unless otherwise noted and is not listed for each layer.

FIG. 7 is a diagram explaining the layers included in a learning detector of the dilation model. FIG. 7 follows the same conventions as FIG. 6, except that in the dilation model a "convolution" layer is a dilated convolution layer, whose setting is given in the dilation factor column.

Dilated convolution is now described further. FIG. 8 is a diagram explaining an example of the layer structure in dilated convolution. An input image such as a learning input image is spatially two-dimensional data, but to simplify the illustration and description, the input data to the learning detector 54 is treated here as one-dimensional. Specifically, the circles arranged horizontally in the input layer at the bottom of FIG. 8 make up the input data. Each element 30 of the input data, drawn as a circle, corresponds to a pixel (or pixel value) of the input image. The convolution layers shown in FIG. 8 are the feature extraction layers; the layers that follow them are omitted from the figure.

The neural network shown in FIG. 8 has seven convolution layers as feature extraction layers, and each of them performs a dilated convolution. The first convolution layer, located directly above the input layer, performs a dilated convolution with dilation factor d = 1. Specifically, a convolution is performed in each of the units 31 drawn as circles in the first layer, and each unit 31 outputs a weighted sum of the values of two adjacent elements 30 of the input layer.

The second convolution layer performs a dilated convolution with dilation factor d = 2. Specifically, a convolution is performed in each of the units 32 drawn as circles in the second layer, and each unit 32 outputs a weighted sum of the outputs of two units 31 in the first layer that have one unit between them.

Similarly, the third convolution layer performs a dilated convolution with dilation factor d = 3, and each unit 33 in the third layer outputs a weighted sum of the outputs of two units 32 in the second layer that have three units between them. The fourth convolution layer performs a dilated convolution with dilation factor d = 4, and each unit 34 in the fourth layer outputs a weighted sum of the outputs of two units 33 in the third layer that have seven units between them. Each unit 35 in the fifth layer performs a dilated convolution with d = 3, and each unit 36 in the sixth layer and each unit 37 in the seventh layer perform dilated convolutions with d = 2 and d = 1, respectively.

In the feature extraction structure of FIG. 8, the part made up of the first to fourth layers is called the front-end part, and the following part made up of the fifth to seventh layers is called the local feature extraction part. The front-end part is a series of convolution layers following the input layer in which the dilation factor d increases, in layer order, up to its maximum value within the feature extraction layers. The local feature extraction part is a series of convolution layers following the front-end part in which the dilation factor decreases in layer order.

図８は、第７層の或る１つのユニット３７の出力に畳み込まれる第１層から第６層のユニット及び入力層の接続関係を線で例示している。拡張畳み込み演算では、拡張係数ｄに応じて指数関数的にカーネルの適用範囲が拡張される。例えば、図８のｄ＝１〜４の畳み込み演算のカーネルは、いずれも２つの入力を畳み込むフィルタ、つまりサイズが２のフィルタであるが、ｄ＝１のカーネルにより畳み込まれる２つの入力の１次元データの並びでの間隔は１であるのに対して、ｄ＝２のカーネルにより畳み込まれる２つの入力の間隔は２であり、またｄ＝３では当該間隔は４、ｄ＝４では当該間隔は８となる。つまり、間隔は２^(ｄ−１)に設定されている。 FIG. 8 illustrates with lines the connections between the input layer and the units of the first to sixth layers that are convolved into the output of one unit 37 of the seventh layer. In the extended convolution operation, the range over which the kernel is applied expands exponentially with the expansion coefficient d. For example, the kernels of the convolution operations with d = 1 to 4 in FIG. 8 are all filters that convolve two inputs, that is, filters of size 2. The interval, within the one-dimensional data array, between the two inputs convolved by the kernel with d = 1 is 1, whereas the interval between the two inputs convolved by the kernel with d = 2 is 2; with d = 3 the interval is 4, and with d = 4 the interval is 8. That is, the interval is set to 2^(d−1).
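
The interval rule above can be checked with a short calculation. The following Python sketch is illustrative only: the function names and the receptive-field formula for stacked stride-1 convolutions are assumptions added here, while the kernel size of 2 and the coefficient sequence d = 1, 2, 3, 4, 3, 2, 1 follow the description of FIG. 8.

```python
def tap_interval(d: int) -> int:
    """Interval between the two kernel inputs for expansion coefficient d."""
    return 2 ** (d - 1)


def receptive_field(coefficients, kernel_size: int = 2) -> int:
    """Input positions seen by one top-layer unit (stride 1, no pooling)."""
    field = 1
    for d in coefficients:
        # Each layer widens the field by (kernel_size - 1) * interval.
        field += (kernel_size - 1) * tap_interval(d)
    return field


FRONT_END = [1, 2, 3, 4]   # layers 1 to 4: d increases to its maximum
LOCAL = [3, 2, 1]          # layers 5 to 7: d decreases again

intervals = [tap_interval(d) for d in FRONT_END + LOCAL]
front_rf = receptive_field(FRONT_END)
full_rf = receptive_field(FRONT_END + LOCAL)
print(intervals, front_rf, full_rf)
```

Under these assumptions, four front-end layers already cover 16 input positions without any pooling, which is the point made about widening the receptive field with few layers.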

フロントエンド部におけるユニットおよび入力層の接続関係からわかるように、拡張畳み込み演算では、カーネルの適用範囲を拡張することで、少ない層数で受容野を広げることができる。そして、畳み込みだけで受容野を広げるので、一般的なＣＮＮで用いるプーリング層が不要となり、プーリング層による解像度低下を回避できる。また、適用範囲を拡大する一方で、当該範囲内の要素を間引いて残った一部の要素しか畳み込まないことで、重みパラメータの増大が抑制される。 As can be seen from the connection relationship between the unit and the input layer in the front end unit, in the extended convolution operation, the receptive field can be expanded with a small number of layers by extending the application range of the kernel. And since a receptive field is expanded only by convolution, the pooling layer used by general CNN becomes unnecessary, and the resolution fall by a pooling layer can be avoided. Further, while expanding the application range, only a part of the remaining elements after thinning out the elements in the range is convoluted, thereby suppressing an increase in the weight parameter.

一方、フロントエンド部のように、順に拡張係数ｄが増加するように層を積み重ねる構造は、最上層における近傍ユニット間の相関が弱まるという問題や、入力データのローカルな特徴を拾いにくくなるという問題を有する。局所特徴抽出部はこの問題を解決するために設けられており、フロントエンド部と局所特徴抽出部とを組み合わせることで、第７層のあるユニットにおいて近傍ユニット間の相関が弱まるという問題や、第１層のユニット３１ａ，３１ｂが隣り合っているというローカルな情報を把握できないという問題が解決されている。 On the other hand, a structure in which layers are stacked so that the expansion coefficient d increases in order, as in the front end unit, has the problem that the correlation between neighboring units in the uppermost layer is weakened and the problem that local features of the input data become difficult to pick up. The local feature extraction unit is provided to solve these problems. Combining the front end unit with the local feature extraction unit resolves both the weakened correlation between neighboring units at a unit of the seventh layer and the inability to grasp the local information that the units 31a and 31b of the first layer are adjacent to each other.

言い換えると、フロントエンド部の後に局所特徴抽出部を設けた構成とすることで、フロントエンド部にて拡張畳み込み演算を積極的に利用し解像度を一切落とさずにコンテキストを得ると共に、局所特徴抽出部ではフロントエンド部により分散された局所特徴を集約する。これにより、コンテキストの情報と局所特徴の情報を有効活用でき、小さく密集したオブジェクトも認識可能となっている。 In other words, by providing the local feature extraction unit after the front end unit, the front end unit actively uses extended convolution operations to obtain context without any loss of resolution, while the local feature extraction unit aggregates the local features dispersed by the front end unit. As a result, both context information and local feature information can be used effectively, and small, densely packed objects can also be recognized.

次に、これまでに説明した学習検出器５４を、スケールに応じた学習用入力画像と、その学習用画像に含まれる建物の形状を示す教師データとを用いて学習させる処理の詳細について説明する。 Next, the details of the process of causing the learning detector 54 described so far to learn using the learning input image corresponding to the scale and the teacher data indicating the shape of the building included in the learning image will be described. .

図９は、学習検出器５４を学習させる処理の一例を示すフロー図である。図９には、学習データ取得部５１および学習実行部５２の処理が記載されており、この処理により、学習検出器５４が学習される。また、図９に示される処理は、学習検出器５４ごとに繰り返し回数だけ行われる。 FIG. 9 is a flowchart showing an example of a process for causing the learning detector 54 to learn. FIG. 9 shows the processing of the learning data acquisition unit 51 and the learning execution unit 52, and the learning detector 54 is learned by this processing. Further, the process shown in FIG. 9 is performed for each learning detector 54 by the number of repetitions.

学習データ取得部５１は、記憶部１２に格納された学習用画像を取得する（ステップＳ１０１）。学習用画像は、建物を抽出する処理の対象領域とする地表を撮影した航空写真や衛星画像等（航空写真や衛星画像に基づくオルソ画像であってよい）である。次に、学習データ取得部５１は、学習用画像のサイズを、学習検出器５４のスケールに合わせるように設定する（ステップＳ１０２）。例えば、学習検出器５４のスケールが０．５倍であれば学習用画像を０．５倍に縮小し、スケールが２倍であれば学習用画像を２倍に拡大する。なお、ステップＳ１０２の処理をする代わりに、予めスケールの種類のそれぞれに対応した複数の学習用画像を準備しておき、学習データ取得部５１が学習検出器５４のスケールに対応する画像を読み込んでもよい。 The learning data acquisition unit 51 acquires the learning image stored in the storage unit 12 (step S101). The learning image is an aerial photograph, a satellite image, or the like (it may be an ortho image based on an aerial photograph or a satellite image) of the ground surface that is the target region of the building extraction process. Next, the learning data acquisition unit 51 sets the size of the learning image to match the scale of the learning detector 54 (step S102). For example, if the scale of the learning detector 54 is 0.5 times, the learning image is reduced to 0.5 times its size, and if the scale is 2 times, the learning image is enlarged to twice its size. Instead of performing step S102, a plurality of learning images corresponding to the respective scale types may be prepared in advance, and the learning data acquisition unit 51 may read the image corresponding to the scale of the learning detector 54.

そして、学習実行部５２は、スケールに合わせるように設定された学習用画像から、学習検出器５４に入力する窓画像を切出す（ステップＳ１０３）。窓画像は、Ｐｘ×Ｐｙのサイズであり、１つの学習用画像から、ランダムに位置を選択し、選択した位置をもとに学習用画像から窓画像を切り出す。 Then, the learning execution unit 52 cuts out a window image to be input to the learning detector 54 from the learning image whose size has been set to match the scale (step S103). The window image has a size of Px × Py; a position is selected at random within one learning image, and the window image is cut out from the learning image based on the selected position.
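
The random cut-out of step S103 can be sketched as follows. This is a minimal illustration rather than the implementation of the learning execution unit 52: the image is represented as a nested list, the function name `crop_window` is hypothetical, and choosing the top-left corner uniformly at random is an assumption, since the description only states that a position is selected at random.

```python
import random


def crop_window(image, px, py, rng):
    """Cut a px-by-py window out of `image` at a random position."""
    height, width = len(image), len(image[0])
    if px > width or py > height:
        raise ValueError("window is larger than the image")
    # Assumption: the selected position is the top-left corner of the window.
    x0 = rng.randrange(width - px + 1)
    y0 = rng.randrange(height - py + 1)
    window = [row[x0:x0 + px] for row in image[y0:y0 + py]]
    return window, (x0, y0)


# Toy 10 x 8 "image" whose dot values encode their own coordinates.
image = [[y * 10 + x for x in range(10)] for y in range(8)]
rng = random.Random(0)  # fixed seed so the example is repeatable
window, (x0, y0) = crop_window(image, px=4, py=3, rng=rng)
```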

学習実行部５２は、学習用画像から切り出された窓画像を入力し、出力を教師データと比較することで学習検出器５４を学習させる（ステップＳ１０４）。 The learning execution unit 52 inputs the window image cut out from the learning image, and causes the learning detector 54 to learn by comparing the output with the teacher data (step S104).

図１０は、窓画像のそれぞれに対する学習実行部５２の処理の一例を示すフロー図であり、ステップＳ１０４の処理をさらに詳細に説明する図である。ステップＳ１０４では、はじめに、学習実行部５２は、学習検出器５４の共通部５４０へ、学習用画像から切り出された窓画像を入力する（ステップＳ１２１）。これにより、学習検出器５４の共通部５４０が窓画像を処理し、さらに共通部５４０の出力を個別部５４１，５４２，５４３が処理する。そして、学習実行部５２は、学習検出器５４の個別部５４１，５４２，５４３のそれぞれの出力画像を取得する（ステップＳ１２２）。ここで、以下では、面積の範囲Ｓに対応する個別部５４１の出力画像を出力画像（Ｓ）、面積の範囲Ｍに対応する個別部５４２の出力画像を出力画像（Ｍ）、面積の範囲Ｌに対応する個別部５４３の出力画像を出力画像（Ｌ）と記載する。また、個別部５４１，５４２，５４３の出力画像をまとめて出力画像（Ｓ，Ｍ，Ｌ）と記載する。ここで、出力画像（Ｓ，Ｍ，Ｌ）の各ドットの値は、建物の領域の存在確率を示している。 FIG. 10 is a flowchart showing an example of processing of the learning execution unit 52 for each of the window images, and is a diagram for explaining the processing of step S104 in more detail. In step S104, first, the learning execution unit 52 inputs a window image cut out from the learning image to the common unit 540 of the learning detector 54 (step S121). Accordingly, the common unit 540 of the learning detector 54 processes the window image, and the individual units 541, 542, and 543 process the output of the common unit 540. And the learning execution part 52 acquires each output image of the separate parts 541,542,543 of the learning detector 54 (step S122). Here, in the following, the output image of the individual unit 541 corresponding to the area range S is the output image (S), the output image of the individual unit 542 corresponding to the area range M is the output image (M), and the area range L An output image of the individual unit 543 corresponding to is described as an output image (L). The output images of the individual units 541, 542, and 543 are collectively referred to as output images (S, M, L). Here, the value of each dot in the output image (S, M, L) indicates the existence probability of the building area.

次に、学習実行部５２は、学習検出器５４の出力画像（Ｓ，Ｍ，Ｌ）と、教師データとの誤差を算出する（ステップＳ１２３）。ここで、教師データは、学習用画像データに含まれる建物の形状を示す情報である。 Next, the learning execution unit 52 calculates an error between the output image (S, M, L) of the learning detector 54 and the teacher data (step S123). Here, the teacher data is information indicating the shape of the building included in the learning image data.

図１１は、教師データの一例を示す図である。図１１に示される教師データは、図４に示される窓画像を含む学習用画像に対応しているビットマップ画像である。図１１に示される教師データは、面積が範囲Ｓに属する建物の領域（例えばＡ）と、範囲Ｍに属する建物の領域（例えばＢ）と、範囲Ｌに属する建物の領域（例えばＣ）とが区別されている。教師データは、例えば、建物のない領域のドットの値を０、面積が範囲Ｓ，Ｍ，Ｌの建物の領域のドットの値をそれぞれ１，２，３に設定された画像であってもよい。また、教師データは、面積が範囲Ｓに属する建物の領域のドットの値が１である画像と、面積が範囲Ｍに属する建物の領域のドットの値が１である画像と、面積が範囲Ｌに属する建物の領域のドットの値が１である画像との複数のレイヤーに相当する画像であってもよい。 FIG. 11 is a diagram illustrating an example of teacher data. The teacher data shown in FIG. 11 is a bitmap image corresponding to the learning image that includes the window image shown in FIG. 4. In the teacher data shown in FIG. 11, building regions whose area belongs to the range S (for example, A), building regions belonging to the range M (for example, B), and building regions belonging to the range L (for example, C) are distinguished from one another. The teacher data may be, for example, an image in which the dot value of regions without a building is set to 0 and the dot values of building regions whose areas fall in the ranges S, M, and L are set to 1, 2, and 3, respectively. Alternatively, the teacher data may be a set of images corresponding to a plurality of layers: an image in which the dots of building regions belonging to the range S have the value 1, an image in which the dots of building regions belonging to the range M have the value 1, and an image in which the dots of building regions belonging to the range L have the value 1.

学習実行部５２は、誤差の算出において、学習用画像の窓画像の中央の１６×１６ドットに相当する位置の画像を教師データから切り出し、そして、出力画像（Ｓ，Ｍ，Ｌ）のそれぞれと、教師データとを比較する。ここで、学習実行部５２は、教師データのうち建物のない領域および範囲Ｓに属する建物の領域については出力画像（Ｓ）との誤差を算出するが、範囲Ｍ，Ｌに属する建物の領域については誤差を算出しない。同様に、学習実行部５２は、範囲Ｓ，Ｌに属する建物の領域について出力画像（Ｍ）との誤差を算出せず、範囲Ｓ，Ｍに属する建物の領域について出力画像（Ｌ）との誤差を算出しない。これにより、個別部５４１，５４２，５４３のそれぞれが、面積の範囲Ｓ，Ｍ，Ｌの建物の検出に適するように学習が進む。 In calculating the error, the learning execution unit 52 cuts out from the teacher data the image at the position corresponding to the central 16 × 16 dots of the window image of the learning image, and compares each of the output images (S, M, L) with the teacher data. Here, the learning execution unit 52 calculates the error from the output image (S) for the regions of the teacher data without buildings and for the building regions belonging to the range S, but does not calculate the error for the building regions belonging to the ranges M and L. Similarly, the learning execution unit 52 does not calculate the error from the output image (M) for the building regions belonging to the ranges S and L, and does not calculate the error from the output image (L) for the building regions belonging to the ranges S and M. As a result, learning proceeds so that each of the individual units 541, 542, and 543 becomes suited to detecting buildings in the area ranges S, M, and L, respectively.
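
The selective error calculation can be illustrated with a masked error function. In this sketch the mean squared error is an assumption (the description does not name the error measure), the dot values 0 to 3 follow the example teacher-data encoding, and `masked_error` is a hypothetical name.

```python
def masked_error(prob, label_map, own_code):
    """Mean squared error of one individual unit's output image.

    Dots labelled with another area range are skipped, so they neither
    reward nor penalize this unit.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(prob, label_map):
        for p, label in zip(prob_row, label_row):
            if label not in (0, own_code):
                continue  # building of another range: no error calculated
            target = 1.0 if label == own_code else 0.0
            total += (p - target) ** 2
            count += 1
    return total / count if count else 0.0


label_map = [[0, 1], [2, 3]]          # 1 = S, 2 = M, 3 = L
output_s = [[0.0, 1.0], [0.9, 0.9]]   # high values on the M and L dots
error_ignoring_others = masked_error(output_s, label_map, own_code=1)
error_half = masked_error([[0.5, 0.5], [0.0, 0.0]], label_map, own_code=1)
```

In the first call the output for range S is wrong on the M and L dots, yet the error is zero because those dots are excluded.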

次に、学習実行部５２は、算出された誤差に基づいて、誤差逆伝播法（バックプロパゲーション）などにより、個別部５４１，５４２，５４３における重み等のパラメータの値を変更する（ステップＳ１２４）。また、学習実行部５２は、個別部５４１，５４２，５４３のそれぞれの最上位の層から共通部の最下層に伝播させるべき誤差を積算し（ステップＳ１２５）、積算された誤差に基づいて、誤差逆伝播法などにより、共通部５４０における重み等のパラメータの値を変更する（ステップＳ１２６）。 Next, the learning execution unit 52 changes the values of parameters such as weights in the individual units 541, 542, and 543 by the error back-propagation method (back propagation) based on the calculated error (step S124). . Further, the learning execution unit 52 integrates errors to be propagated from the uppermost layer of each of the individual units 541, 542, and 543 to the lowermost layer of the common unit (step S125), and based on the accumulated error, The value of a parameter such as a weight in the common unit 540 is changed by a back propagation method or the like (step S126).

ステップＳ１０３およびステップＳ１０４（図９）に示される学習の処理は、ある学習用画像から学習に用いるすべての窓画像が取得されるまで繰り返される。この処理のセットは、すべての学習検出器５４のそれぞれに対して繰り返し行われ、それにより、各学習検出器５４が学習される。ここで、ステップＳ１０３の処理の代わりに、学習に用いる複数の窓画像をまとめて切り出す処理を行ってもよい。この場合、窓画像を入力し学習検出器５４を学習させる処理が切り出された窓画像のそれぞれについて行われるように、ステップＳ１０４の処理が繰り返し実行されてよい。 The learning process shown in step S103 and step S104 (FIG. 9) is repeated until all window images used for learning are acquired from a certain learning image. This set of processes is repeated for each of the learning detectors 54, whereby each learning detector 54 is learned. Here, instead of the process of step S103, a process of cutting out a plurality of window images used for learning may be performed. In this case, the process of step S104 may be repeatedly performed so that the process of inputting the window image and learning the learning detector 54 is performed for each of the extracted window images.

次に、学習済の学習検出器５４を評価し、実際に処理対象画像から建物の領域を抽出する処理を実行させるための学習検出器５４を実行検出器６２，６３，６４として選択する処理の詳細について説明する。 Next, the details of the process of evaluating the trained learning detectors 54 and selecting, as the execution detectors 62, 63, and 64, the learning detectors 54 that will actually execute the process of extracting building regions from the processing target image will be described.

図１２は、学習検出器５４を評価する処理の一例を示すフロー図である。この処理では、はじめに、評価データ取得部５６は、記憶部１２から評価用画像および正解データを取得する（ステップＳ２０１）。評価用画像は学習用画像と同じであってもよく、異なってもよい。評価用画像の縮尺は学習用画像と同じである。正解データは評価用画像のうち面積の範囲Ｓ，Ｍ，Ｌのそれぞれに属する建物の領域を示す画像であり、評価用画像と学習用画像とが同じ場合は、正解データは教師データであってよい。また、図１２には図示されていないが、評価データ取得部５６は、学習データ取得部５１と同様に、評価用画像のサイズを学習検出器５４のスケールに合わせるように設定する。 FIG. 12 is a flowchart showing an example of the process of evaluating the learning detector 54. In this process, the evaluation data acquisition unit 56 first acquires an evaluation image and correct answer data from the storage unit 12 (step S201). The evaluation image may be the same as or different from the learning image. The scale of the evaluation image is the same as that of the learning image. The correct answer data is an image indicating the building regions in the evaluation image that belong to each of the area ranges S, M, and L; when the evaluation image and the learning image are the same, the correct answer data may be the teacher data. Although not illustrated in FIG. 12, the evaluation data acquisition unit 56, like the learning data acquisition unit 51, sets the size of the evaluation image to match the scale of the learning detector 54.

次に、評価実行部５７は、評価用画像から、学習検出器５４に入力する窓画像を切出す（ステップＳ２０２）。より具体的には、評価実行部５７は、切り出される領域がこれまでに切り出された窓領域と比べて所定数のドットがずれるように窓画像を切り出す。所定数のドットは１ドット以上、１６ドット以下の任意の大きさとすることができる。所定数の上限である１６は、学習検出器５４の出力が１６×１６ドットの画像であることに対応している。所定数は学習検出器５４の出力の縦または横の大きさ以下である。評価実行部５７は、評価用画像から切り出された窓画像を学習検出器５４へ入力し（ステップＳ２０３）、学習検出器５４の個別部５４１，５４２，５４３のそれぞれの出力画像（Ｓ，Ｍ，Ｌ）を取得する（ステップＳ２０４）。ここで、評価実行部５７は、取得された出力画像を、各ドットの存在確率の値が閾値より大きいか小さいかに基づいて２値化し、２値化された出力画像を記憶部１２に格納する。以下の処理では、出力画像は２値化された出力画像を指すものとする。そして、すべての窓画像について学習検出器５４の処理を行うまで、ステップＳ２０２からＳ２０４の処理を繰り返す（ステップＳ２０５参照）。 Next, the evaluation execution unit 57 cuts out, from the evaluation image, a window image to be input to the learning detector 54 (step S202). More specifically, the evaluation execution unit 57 cuts out the window image so that the cut-out region is shifted by a predetermined number of dots from the window regions cut out so far. The predetermined number of dots can be any value from 1 dot to 16 dots. The upper limit of 16 corresponds to the output of the learning detector 54 being a 16 × 16 dot image; that is, the predetermined number is no greater than the vertical or horizontal size of the output of the learning detector 54. The evaluation execution unit 57 inputs the window image cut out from the evaluation image to the learning detector 54 (step S203), and acquires the output images (S, M, L) of the individual units 541, 542, and 543 of the learning detector 54 (step S204). Here, the evaluation execution unit 57 binarizes each acquired output image based on whether the existence probability value of each dot is larger or smaller than a threshold value, and stores the binarized output image in the storage unit 12. In the following processing, the term output image refers to the binarized output image. The processing of steps S202 to S204 is then repeated until the learning detector 54 has processed all window images (see step S205).

すべての窓画像についての出力画像（Ｓ，Ｍ，Ｌ）が得られると、評価実行部５７は、それらの窓画像に対応する位置に出力画像（Ｓ）が配置された全体画像（Ｓ）と、それらの窓画像に対応する位置に出力画像（Ｍ）が配置された全体画像（Ｍ）と、それらの窓画像に対応する位置に出力画像（Ｌ）が配置された全体画像（Ｌ）と、を生成する（ステップＳ２０６）。より具体的には、評価実行部５７は出力画像（Ｓ，Ｍ，Ｌ）を窓画像の配置に対応するように互いに所定数のドットずれるように配置することで、全体画像（Ｓ，Ｍ，Ｌ）を生成する。ここで、窓画像を切出す際のずれの大きさである所定数のドットが１６ドットより小さい場合、各窓画像から得られる出力画像（Ｓ，Ｍ，Ｌ）のうち少なくとも一部のドットが他の窓画像についての出力画像（Ｓ，Ｍ，Ｌ）と重なる。評価実行部５７は、複数の窓画像の出力において位置が重なるドットについては、出力画像のドットの値が平均された平均値を全体画像（Ｓ，Ｍ，Ｌ）におけるドットの値とする。これにより、隣り合う出力画像（Ｓ，Ｍ，Ｌ）の境界が滑らかにつながらない場合であっても、それに起因する不整合が全体画像に表れることを防ぐことができる。 When the output images (S, M, L) for all the window images have been obtained, the evaluation execution unit 57 generates an entire image (S) in which the output images (S) are arranged at the positions corresponding to the window images, an entire image (M) in which the output images (M) are arranged at the positions corresponding to the window images, and an entire image (L) in which the output images (L) are arranged at the positions corresponding to the window images (step S206). More specifically, the evaluation execution unit 57 generates the entire images (S, M, L) by arranging the output images (S, M, L) shifted from one another by the predetermined number of dots so as to correspond to the arrangement of the window images. Here, when the predetermined number of dots, which is the size of the shift used when cutting out the window images, is smaller than 16 dots, at least some of the dots of the output images (S, M, L) obtained from each window image overlap the output images (S, M, L) of other window images. For dots whose positions overlap in the outputs of a plurality of window images, the evaluation execution unit 57 uses the average of the dot values of those output images as the dot value in the entire image (S, M, L). Thus, even when the boundaries between adjacent output images (S, M, L) do not connect smoothly, the resulting inconsistencies can be prevented from appearing in the entire image.
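
The arrangement and averaging of step S206 can be sketched as follows. This is an illustrative sketch: the function name `stitch`, the list-of-tiles representation, and the 2 × 2 toy tiles (standing in for the 16 × 16 output images) are assumptions.

```python
def stitch(tiles, height, width):
    """Place output tiles on a canvas, averaging dots covered more than once.

    `tiles` is a list of ((x0, y0), tile) pairs, where (x0, y0) is the
    top-left position of the tile in the entire image.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (x0, y0), tile in tiles:
        for dy, row in enumerate(tile):
            for dx, value in enumerate(row):
                total[y0 + dy][x0 + dx] += value
                count[y0 + dy][x0 + dx] += 1
    return [
        [t / c if c else 0.0 for t, c in zip(total_row, count_row)]
        for total_row, count_row in zip(total, count)
    ]


# Two 2 x 2 tiles shifted by one dot, so the middle column is covered twice.
tiles = [
    ((0, 0), [[1, 1], [1, 1]]),
    ((1, 0), [[0, 0], [0, 0]]),
]
whole = stitch(tiles, height=2, width=3)
```

The middle column receives the average of the two disagreeing tiles, which is how a hard seam between adjacent outputs is softened.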

そして、評価実行部５７は全体画像と正解データとを比較し、学習検出器５４の個別部５４１，５４２，５４３のそれぞれについて精度を評価する（ステップＳ２０７）。精度の評価は、例えば、評価実行部５７は正解データのうち面積の範囲Ｓに属する建物が存在する領域に、出力画像（Ｓ）において建物と判定された領域が存在する割合（Ｒｅｃａｌｌ）を求めることで行う。評価実行部５７は、正解データのうち面積の範囲Ｍ，Ｌに属する建物の領域と、出力画像（Ｍ）、出力画像（Ｌ）に存在する建物の領域とにおいても、同様に精度を評価する。 Then, the evaluation execution unit 57 compares the entire images with the correct answer data and evaluates the accuracy of each of the individual units 541, 542, and 543 of the learning detector 54 (step S207). The accuracy is evaluated, for example, by having the evaluation execution unit 57 obtain the proportion (recall) of the regions in the correct answer data that contain buildings belonging to the area range S in which a region determined to be a building is present in the output image (S). The evaluation execution unit 57 evaluates the accuracy in the same way for the building regions in the correct answer data belonging to the area ranges M and L and the building regions present in the output image (M) and the output image (L).
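
The recall used in step S207 can be written out per dot. The sketch below assumes binary images of equal size and counts dots rather than whole buildings, which is one possible reading of the proportion described above; the name `recall` and the toy data are illustrative.

```python
def recall(predicted, truth):
    """Fraction of true building dots that were also predicted as building."""
    true_dots = 0
    hit_dots = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if t:
                true_dots += 1
                if p:
                    hit_dots += 1
    return hit_dots / true_dots if true_dots else 0.0


truth_s = [[1, 1, 0], [1, 1, 0]]       # correct answer data for range S
predicted_s = [[1, 0, 0], [1, 1, 1]]   # binarized entire image (S)
score = recall(predicted_s, truth_s)
```

The extra predicted dot in the last column does not lower the score, because recall only measures how much of the correct building region was found.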

ステップＳ２０２からステップＳ２０７の処理により、１つの学習検出器５４の精度が評価される。そして、評価実行部５７は、すべての学習検出器５４について精度を評価していない場合、ステップＳ２０２からの処理を繰り返し（ステップＳ２０８）、これにより、評価実行部５７は、すべての学習検出器５４の精度を評価する。 Through the processing of steps S202 to S207, the accuracy of one learning detector 54 is evaluated. If the accuracy has not yet been evaluated for all the learning detectors 54, the evaluation execution unit 57 repeats the processing from step S202 (step S208). In this way, the evaluation execution unit 57 evaluates the accuracy of all the learning detectors 54.

図１３は、評価実行部５７による評価結果を示す図である。図１３における「Ｎｏ」は、図３に示されるものと同じく、学習検出器５４に振られた番号を示す。図１３の例では、面積の範囲がＳである、個別部５４１の出力については、スケールが１．０倍かつダイレーションモデルである学習検出器５４が最も精度がよい。また、面積の範囲がＭである個別部５４２の出力については、スケールが１．０倍かつプーリングモデルの学習検出器５４が最も精度がよく、面積の範囲がＬである個別部５４３の出力については、スケールが０．５倍かつプーリングモデルの学習検出器５４が最も精度がよい。 FIG. 13 is a diagram illustrating an evaluation result by the evaluation execution unit 57. “No” in FIG. 13 indicates the number assigned to the learning detector 54 as in FIG. In the example of FIG. 13, with respect to the output of the individual unit 541 whose area range is S, the learning detector 54 having a scale of 1.0 times and a dilation model has the highest accuracy. Regarding the output of the individual unit 542 whose area range is M, the learning detector 54 of the pooling model with the scale of 1.0 times has the highest accuracy, and the output of the individual unit 543 whose area range is L. The learning detector 54 having a scale of 0.5 times and a pooling model is most accurate.

学習検出器５４の精度が評価されると、検出器選択部５８は、面積の範囲Ｓ，Ｍ，Ｌのそれぞれについて、最も精度の高い学習検出器５４を、実行検出器６２，６３，６４として選択する（ステップＳ２０９）。実行検出器６２は、面積の範囲Ｓについて最も精度の高い学習検出器５４に含まれる、共通部５４０（以下では共通部６２０という）と個別部５４１（以下では個別部６２１という）との組み合わせである。実行検出器６３は、面積の範囲Ｍについて最も精度の高い学習検出器５４に含まれる、共通部５４０（以下では共通部６３０という）と個別部５４２（以下では個別部６３１という）との組み合わせである。実行検出器６４は、面積の範囲Ｌについて元も精度の高い学習検出器５４に含まれる、共通部５４０（以下では共通部６４０という）と個別部５４３（以下では個別部６４１という）との組み合わせである。 When the accuracy of the learning detectors 54 has been evaluated, the detector selection unit 58 selects, for each of the area ranges S, M, and L, the learning detector 54 with the highest accuracy as the execution detector 62, 63, or 64 (step S209). The execution detector 62 is the combination of the common unit 540 (hereinafter referred to as the common unit 620) and the individual unit 541 (hereinafter referred to as the individual unit 621) included in the learning detector 54 with the highest accuracy for the area range S. The execution detector 63 is the combination of the common unit 540 (hereinafter referred to as the common unit 630) and the individual unit 542 (hereinafter referred to as the individual unit 631) included in the learning detector 54 with the highest accuracy for the area range M. The execution detector 64 is the combination of the common unit 540 (hereinafter referred to as the common unit 640) and the individual unit 543 (hereinafter referred to as the individual unit 641) included in the learning detector 54 with the highest accuracy for the area range L.
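
The selection of step S209 amounts to taking, for each area range, the detector with the highest evaluated accuracy. In the sketch below the detector numbers and recall values are invented purely for illustration and are not the values of FIG. 13; the names `results` and `select_best` are likewise hypothetical.

```python
# Hypothetical evaluation results; the numbers are NOT taken from FIG. 13.
results = [
    {"no": 1, "scale": 0.5, "model": "pooling",
     "recall": {"S": 0.55, "M": 0.72, "L": 0.88}},
    {"no": 2, "scale": 0.5, "model": "dilation",
     "recall": {"S": 0.61, "M": 0.70, "L": 0.80}},
    {"no": 3, "scale": 1.0, "model": "pooling",
     "recall": {"S": 0.66, "M": 0.81, "L": 0.84}},
    {"no": 4, "scale": 1.0, "model": "dilation",
     "recall": {"S": 0.74, "M": 0.78, "L": 0.79}},
]


def select_best(results, area_range):
    """Return the detector entry with the highest recall for one area range."""
    return max(results, key=lambda entry: entry["recall"][area_range])


selected = {name: select_best(results, name) for name in ("S", "M", "L")}
selected_numbers = {name: entry["no"] for name, entry in selected.items()}
```

With these made-up values the choice mirrors the pattern described for FIG. 13: a larger scale with the dilation model for range S, and a smaller scale with the pooling model for range L.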

ここで、図１３の記載からもわかるように、ダイレーションモデルはプーリングモデルに比べて小さな変化をとらえやすい傾向があるため、面積の範囲（の最大値）が小さいものではダイレーションモデルが有利になり、面積の範囲が大きいものではプーリングモデルが有利になる。また、スケールが小さいと細かな情報が減る一方、大規模な建物の形状を判定しやすくなる傾向がある。そのため、面積の範囲（の最大値）が小さいものではスケールが大きい方が有利になり、面積の範囲が大きいものではスケールが小さい方が有利になる。 Here, as can be seen from the description of FIG. 13, the dilation model tends to catch small changes compared to the pooling model, and therefore the dilation model is advantageous when the area range (the maximum value) is small. Therefore, the pooling model is advantageous when the area range is large. In addition, when the scale is small, detailed information is reduced, while it tends to be easy to determine the shape of a large-scale building. Therefore, when the area range (the maximum value) is small, the larger scale is advantageous, and when the area range is large, the smaller scale is advantageous.

したがって、図１３の例においても、面積の範囲の最大値が小さいものに対応する実行検出器６２として、スケールが大きめの１．０倍であり、ダイレーションモデルである学習検出器５４が選択され、面積の範囲の最大値が大きいものに対応する実行検出器６４として、スケールが小さめの０．５倍であり、プーリングモデルである学習検出器５４が選択されている。 Accordingly, in the example of FIG. 13 as well, the learning detector 54 with the larger scale of 1.0 times and the dilation model is selected as the execution detector 62, which corresponds to the area range with the small maximum value, and the learning detector 54 with the smaller scale of 0.5 times and the pooling model is selected as the execution detector 64, which corresponds to the area range with the large maximum value.

検出器選択部５８は、単に後述の対象データ入力部６５が処理対象画像を入力し出力画像を取得する対象となる学習検出器５４を示す情報を記憶部１２に保存することで、学習検出器５４を選択してもよいし、実行検出器６２，６３，６４の実体として、選択された学習検出器５４の共通部５４０、個別部５４１等をコピーすることで学習検出器５４を実行検出器６２，６３，６４として選択してもよい。 The detector selection unit 58 may select a learning detector 54 simply by storing in the storage unit 12 information indicating the learning detector 54 to which the target data input unit 65, described later, inputs the processing target image and from which the output image is acquired. Alternatively, it may select the learning detector 54 as the execution detectors 62, 63, and 64 by copying the common unit 540, the individual unit 541, and so on of the selected learning detector 54 as the entities of the execution detectors 62, 63, and 64.

次に、実行検出器６２，６３，６４を用いて、処理対象画像から建物の領域を判定する処理について説明する。図１４は、建物の領域を判定する処理の概要を説明する図である。 Next, processing for determining a building area from the processing target image using the execution detectors 62, 63, and 64 will be described. FIG. 14 is a diagram illustrating an outline of processing for determining a building area.

はじめに、対象データ入力部６５は、処理対象画像を面積の範囲Ｓに適した実行検出器６２に入力し、出力取得部６６は、実行検出器６２の出力に基づいて全体出力画像（Ｓ）を取得する（ステップＳ３０１）。全体出力画像（Ｓ）は、処理対象画像の全体について、実行検出器６２により建物が存在すると判定された領域を示す画像である。後述の全体出力画像（Ｍ）、全体出力画像（Ｌ）は、同様に、それぞれ、実行検出器６３，６４により建物が存在すると判定された領域を示す画像である。 First, the target data input unit 65 inputs the processing target image to the execution detector 62 suited to the area range S, and the output acquisition unit 66 acquires the entire output image (S) based on the output of the execution detector 62 (step S301). The entire output image (S) is an image showing, for the whole of the processing target image, the regions in which the execution detector 62 has determined that a building exists. Similarly, the entire output image (M) and the entire output image (L), described later, are images showing the regions in which the execution detectors 63 and 64, respectively, have determined that a building exists.

図１５は、処理対象画像から全体出力画像を生成する処理の流れを示すフロー図であり、ステップＳ３０１の処理を詳細に説明する図である。はじめに、対象データ入力部６５は、処理対象画像のスケールを、実行検出器６２に設定されたスケールに合わせる（ステップＳ３２１）。対象データ入力部６５は、処理対象画像のスケールと実行検出器６２のスケールが異なる場合には処理対象画像を拡大または縮小することにより、スケールを合わせる。次に、対象データ入力部６５は、スケールが合わせられた処理対象画像から窓画像を切出す（ステップＳ３２２）。窓画像のサイズや処理対象画像から窓画像を切出す手法については、評価用画像から窓画像を切出す手法と同じであるので説明を省略する。次に、対象データ入力部６５は、実行検出器６２へ窓画像を入力する（ステップＳ３２３）。すると、実行検出器６２は、入力された窓画像について建物の領域を検出する処理を行い、出力取得部６６は、実行検出器６２の出力画像を取得する（ステップＳ３２４）。ここで、図示していないが、出力取得部６６は、取得された出力画像を、各ドットの存在確率の値が閾値より大きいか小さいかに基づいて２値化し、２値化された出力画像を記憶部１２に格納する。以下の処理では、出力画像は２値化された出力画像を指すものとする。そして、すべての窓画像について学習検出器５４の処理を行うまで、ステップＳ３２２からＳ３２４の処理を繰り返す（ステップＳ３２５参照）。 FIG. 15 is a flowchart showing the flow of processing for generating the entire output image from the processing target image, and is a diagram for explaining the processing in step S301 in detail. First, the target data input unit 65 matches the scale of the processing target image with the scale set in the execution detector 62 (step S321). If the scale of the processing target image is different from the scale of the execution detector 62, the target data input unit 65 adjusts the scale by enlarging or reducing the processing target image. Next, the target data input unit 65 cuts out a window image from the processing target image with the scale adjusted (step S322). Since the window image size and the method for cutting out the window image from the processing target image are the same as the method for cutting out the window image from the evaluation image, description thereof will be omitted. Next, the target data input unit 65 inputs a window image to the execution detector 62 (step S323). Then, the execution detector 62 performs a process of detecting a building area for the input window image, and the output acquisition unit 66 acquires the output image of the execution detector 62 (step S324). 
Here, although not shown, the output acquisition unit 66 binarizes the acquired output image based on whether the existence probability value of each dot is larger or smaller than a threshold value, and stores the binarized output image in the storage unit 12. In the following processing, the term output image refers to the binarized output image. The processing of steps S322 to S324 is then repeated until the learning detector 54 has processed all window images (see step S325).

なお、建物検出器が実行検出器６２，６３，６４の個別部６２１，６３１，６４１に対応し、建物検出器へ入力される処理対象画像の特徴情報が、それぞれ共通部６２０，６３０，６４０の出力であってよい。なお、学習検出器５４や実行検出器６２，６３，６４は、共通部５４０，６２０、６３０，６４０を含まなくてもよい。この場合、面積の範囲Ｓ、Ｍ、Ｌのそれぞれについて学習用入力画像や処理対象画像が入力され、建物検出器へ入力される処理対象画像の特徴情報は、単なる処理対象画像やその窓画像であってよい。 Note that the building detectors may correspond to the individual units 621, 631, and 641 of the execution detectors 62, 63, and 64, and the feature information of the processing target image input to the building detectors may be the outputs of the common units 620, 630, and 640, respectively. The learning detector 54 and the execution detectors 62, 63, and 64 need not include the common units 540, 620, 630, and 640. In that case, the learning input image or the processing target image is input for each of the area ranges S, M, and L, and the feature information of the processing target image input to the building detector may simply be the processing target image or its window image.

すべての窓画像についての出力画像が得られると、評価実行部５７は、それらの窓画像に対応する位置に出力画像が配置された全体出力画像（Ｓ）を生成する（ステップＳ３２６）。 When output images for all window images are obtained, the evaluation execution unit 57 generates an entire output image (S) in which the output images are arranged at positions corresponding to the window images (step S326).

次に、フィルタ部６７は、全体出力画像（Ｓ）に、面積に基づくフィルタをかける（ステップＳ３０２）。この処理は、より具体的には、フィルタ部６７は、全体出力画像（Ｓ）において建物が存在すると判定された領域の面積（領域のドット数とスケールから求められる）を算出し、その面積が面積の範囲Ｓに応じた許容範囲にない領域を全体出力画像（Ｓ）から削除する。具体的には許容範囲は、８９．２ｍ²未満である。なお、フィルタ部６７の処理は行われなくてもよい。 Next, the filter unit 67 applies an area-based filter to the entire output image (S) (step S302). More specifically, the filter unit 67 calculates the area of each region determined to contain a building in the entire output image (S) (obtained from the number of dots in the region and the scale), and deletes from the entire output image (S) any region whose area is not within the allowable range corresponding to the area range S. Specifically, the allowable range is less than 89.2 m². Note that the processing of the filter unit 67 may be omitted.

また、対象データ入力部６５は、処理対象画像を面積の範囲Ｍに適した実行検出器６３に入力し、出力取得部６６は、実行検出器６３の出力に基づいて全体出力画像（Ｍ）を取得する（ステップＳ３０３）。この処理の詳細は、実行検出器６２から全体出力画像（Ｓ）を取得する処理と同様であるので詳細の説明は省略する。 The target data input unit 65 also inputs the processing target image to the execution detector 63 suited to the area range M, and the output acquisition unit 66 acquires the entire output image (M) based on the output of the execution detector 63 (step S303). The details of this processing are the same as those of the processing for acquiring the entire output image (S) from the execution detector 62, and a detailed description is therefore omitted.

次に、フィルタ部６７は、全体出力画像（Ｍ）に、面積に基づくフィルタをかける（ステップＳ３０４）。この処理は、より具体的には、フィルタ部６７は、全体出力画像（Ｍ）において建物が存在すると判定された領域の面積（領域のドット数とスケールから求められる）を算出し、その面積が面積の範囲Ｍに応じた許容範囲にない領域を全体出力画像（Ｍ）から削除する。具体的には許容範囲は、２２．３ｍ²以上８９．２ｍ²未満である。 Next, the filter unit 67 applies an area-based filter to the entire output image (M) (step S304). More specifically, the filter unit 67 calculates the area of each region determined to contain a building in the entire output image (M) (obtained from the number of dots in the region and the scale), and deletes from the entire output image (M) any region whose area is not within the allowable range corresponding to the area range M. Specifically, the allowable range is 22.3 m² or more and less than 89.2 m².

また、対象データ入力部６５は、処理対象画像を面積の範囲Ｌに適した実行検出器６４に入力し、出力取得部６６は、実行検出器６４の出力に基づいて全体出力画像（Ｌ）を取得する（ステップＳ３０５）。この処理の詳細は、実行検出器６２から全体出力画像（Ｌ）を取得する処理と同様であるので詳細の説明は省略する。 The target data input unit 65 also inputs the processing target image to the execution detector 64 suited to the area range L, and the output acquisition unit 66 acquires the entire output image (L) based on the output of the execution detector 64 (step S305). The details of this processing are the same as those of the processing for acquiring the entire output image (L) from the execution detector 62, and a detailed description is therefore omitted.

次に、フィルタ部６７は、全体出力画像（Ｌ）に、面積に基づくフィルタをかける（ステップＳ３０６）。この処理は、より具体的には、フィルタ部６７は、全体出力画像（Ｌ）において建物が存在すると判定された領域の面積（領域のドット数とスケールから求められる）を算出し、その面積が面積の範囲Ｌに応じた許容範囲にない領域を全体出力画像（Ｌ）から削除する。具体的には許容範囲は、６５．４ｍ^２以上である。 Next, the filter unit 67 applies an area-based filter to the overall output image (L) (step S306). More specifically, the filter unit 67 calculates the area of each region of the overall output image (L) determined to contain a building (the area is obtained from the number of dots in the region and the scale), and deletes from the overall output image (L) any region whose area is not within the allowable range corresponding to the area range L. Specifically, the allowable range is 65.4 m² or more.
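
The area-based filtering of steps S302, S304, and S306 can be illustrated with the following minimal Python sketch. It is not the implementation of the filter unit 67. The names `ALLOWED_AREA_M2`, `connected_regions`, and `filter_by_area` are hypothetical, the use of 4-connectivity to delimit one region is an assumption (the text does not say how regions are delimited), and `metres_per_dot` stands in for the scale. Only the three allowable ranges are taken from the text.

```python
from collections import deque

# Allowable area ranges in square metres as stated in the text
# (lower bound inclusive, upper bound exclusive).
ALLOWED_AREA_M2 = {
    "S": (0.0, 89.2),
    "M": (22.3, 89.2),
    "L": (65.4, float("inf")),
}


def connected_regions(mask):
    """Yield each region of 1-dots as a list of (row, col) pairs.

    Assumption: dots belong to the same region when they are 4-connected.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield region


def filter_by_area(mask, metres_per_dot, area_range):
    """Return a copy of mask keeping only regions inside the allowable range."""
    low, high = ALLOWED_AREA_M2[area_range]
    dot_area = metres_per_dot ** 2  # ground area covered by one dot
    out = [[0] * len(row) for row in mask]
    for region in connected_regions(mask):
        area = len(region) * dot_area  # area = number of dots x area per dot
        if low <= area < high:
            for y, x in region:
                out[y][x] = 1
    return out
```

For instance, with a hypothetical scale of 5 m per dot, a single-dot region covers 25 m² and survives the S and M filters, while a four-dot region covers 100 m² and survives only the L filter.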

そして、統合部６８は、全体出力画像（Ｓ）、全体出力画像（Ｍ）、全体出力画像（Ｌ）の縮尺が一致するように、これらのうち少なくとも１つを拡大または縮小する処理を実行する（ステップＳ３０７）。なお、この処理は、フィルタ部６７の処理の前に行われてもよい。 Then, the integration unit 68 enlarges or reduces at least one of the overall output image (S), the overall output image (M), and the overall output image (L) so that their scales match (step S307). This processing may be performed before the processing of the filter unit 67.
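
As a rough illustration of step S307, the sketch below resizes a binary output image to a target size so that images produced at different scales can be compared dot by dot. The function name `resize_nearest` is hypothetical, and nearest-neighbour sampling is an assumption: the text states only that at least one image is enlarged or reduced, not which interpolation is used.

```python
def resize_nearest(mask, out_rows, out_cols):
    """Enlarge or reduce a binary image to out_rows x out_cols dots.

    Assumption: nearest-neighbour sampling, which keeps every dot 0 or 1.
    """
    in_rows, in_cols = len(mask), len(mask[0])
    return [
        [mask[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

An image produced at a scale of 0.5, for example, would be enlarged to twice its height and width before being combined with images produced at a scale of 1.0.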

統合部６８は、その処理がなされた全体出力画像（Ｓ）、全体出力画像（Ｍ）、全体出力画像（Ｌ）を統合する（ステップＳ３０８）。言い換えると、統合部６８は、全体出力画像（Ｓ）、全体出力画像（Ｍ）、全体出力画像（Ｌ）のいずれかにおいて建物と認識された領域を、建物のある領域と判定し、その判定がされた領域を示す統合された画像を生成する。より具体的には、統合部６８は、フィルタされた全体出力画像（Ｓ）、全体出力画像（Ｍ）、全体出力画像（Ｌ）の各ドットの論理和をとることで、統合された画像を生成する。ここで、全体出力画像（Ｓ）、全体出力画像（Ｍ）、全体出力画像（Ｌ）の各ドットは、建物が存在すると判定された領域において１であり、そうでない領域において０であるとする。 The integration unit 68 integrates the overall output image (S), the overall output image (M), and the overall output image (L) that have undergone this processing (step S308). In other words, the integration unit 68 determines that a region recognized as a building in any of the overall output image (S), the overall output image (M), and the overall output image (L) is a region containing a building, and generates an integrated image showing the regions so determined. More specifically, the integration unit 68 generates the integrated image by taking the logical OR of the corresponding dots of the filtered overall output image (S), overall output image (M), and overall output image (L). Here, each dot of the overall output image (S), the overall output image (M), and the overall output image (L) is assumed to be 1 in a region determined to contain a building and 0 elsewhere.
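
The dot-wise logical OR of step S308 can be sketched as follows. This is a minimal illustration rather than the implementation of the integration unit 68. The function name `integrate` is hypothetical, and the sketch assumes the input images have already been brought to the same size.

```python
def integrate(*masks):
    """Return the dot-wise logical OR of binary images of identical size."""
    rows, cols = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError("all images must have the same size")
    return [
        [int(any(m[r][c] == 1 for m in masks)) for c in range(cols)]
        for r in range(rows)
    ]
```

Because a dot is 1 in the result whenever any one detector marked it, a building missed by two detectors is still retained if the third one found it, which is why this form of integration tends to reduce oversights.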

そして、画像出力部６９は、統合部６８により生成された画像を記憶部１２や表示出力デバイスへ出力する。 Then, the image output unit 69 outputs the image generated by the integration unit 68 to the storage unit 12 and the display output device.

面積の範囲Ｓ，Ｍ，Ｌのそれぞれに好適なスケールやモデルの種類を有する実行検出器６２，６３，６４を用いて建物の領域が判定された画像を取得し、さらに統合部６８によりそれらの画像を統合することで、処理対象画像から判定される建物の精度を向上させ、特に見逃しを減らすことができる。 By acquiring images in which building regions have been determined using the execution detectors 62, 63, and 64, each of which has a scale and a model type suited to the corresponding area range S, M, or L, and by then integrating those images with the integration unit 68, the accuracy with which buildings are determined from the processing target image can be improved and, in particular, oversights can be reduced.

例えば、図１３に示される評価結果に基づいて、検出器選択部５８が、実行検出器６２，６３，６４として、それぞれ、スケールが１．０倍かつダイレーションモデル、スケールが１．０倍でプーリングモデル、スケールが０．５倍でプーリングモデルの学習検出器５４を選択した場合、ある実験では、見逃しの指標であるＲｅｃａｌｌの値が８７．０％であり、実行検出器６２，６３，６４として、どれもスケールが１．０倍でプーリングモデルとした場合における値である８２．０％や、実行検出器６２，６３，６４として、どれもスケールが１．０倍でダイレーションモデルとした場合における値である８３．８％を上回っている。ここで、Ｒｅｃａｌｌの値は、正解として与えられる建物の領域のうち、建物が存在すると判定された領域の数を、正解として与えられる建物の領域の数でわった数である。建物の領域の判定において、見落としを減らすことは一般的に容易ではないので、この効果は非常に大きいものとなる。 For example, suppose that, based on the evaluation results shown in FIG. 13, the detector selection unit 58 selects as the execution detectors 62, 63, and 64, respectively, the learning detectors 54 with a scale of 1.0 and the dilation model, a scale of 1.0 and the pooling model, and a scale of 0.5 and the pooling model. In one experiment, the value of Recall, which is an index of oversight, was 87.0% in this case. This exceeds both the 82.0% obtained when the execution detectors 62, 63, and 64 all used a scale of 1.0 and the pooling model, and the 83.8% obtained when they all used a scale of 1.0 and the dilation model. Here, the value of Recall is the number of building regions given as correct answers in which a building was determined to exist, divided by the total number of building regions given as correct answers. Because reducing oversights in the determination of building regions is generally not easy, this effect is very significant.
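
The Recall value defined above can be computed as in the following sketch. The function name `recall` and the data layout are hypothetical, and one point is an assumption: the text does not state how much overlap is needed for a correct-answer region to count as detected, so this sketch counts a region as detected when at least one of its dots is 1 in the predicted image.

```python
def recall(truth_regions, predicted_mask):
    """Fraction of correct-answer building regions that were detected.

    truth_regions: list of regions, each a list of (row, col) dots.
    predicted_mask: binary image, 1 where a building was determined to exist.
    Assumption: a region is detected if any one of its dots is predicted as 1.
    """
    if not truth_regions:
        raise ValueError("at least one correct-answer region is required")
    detected = sum(
        1 for region in truth_regions
        if any(predicted_mask[y][x] == 1 for y, x in region)
    )
    return detected / len(truth_regions)
```

A stricter overlap criterion would lower the computed value, so figures produced by this sketch are not directly comparable with the percentages reported in the experiment.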

これまでに説明した実行検出器６２，６３，６４を組み合わせた建築物抽出システムを用いることで、航空写真や衛星画像といったリモートセンシング画像から様々なサイズの構造物や建築物等をより高精度に認識できるようになる。そして、建築物抽出システムを、建物の新築や滅失などの把握に利用することができ、家屋異動に関する統計の基礎情報の取得を可能とする。さらに、建物領域を精度良く抽出可能となることで、個々の建物の時間的変移をより容易に把握し、また、抽出された建物領域の大きさや形状から建物の詳細属性（例えば、戸建、マンション、工場といった建物の種類）を判別することもより容易になる。 By using the building extraction system that combines the execution detectors 62, 63, and 64 described so far, structures and buildings of various sizes can be recognized with higher accuracy from remote sensing images such as aerial photographs and satellite images. The building extraction system can be used to track events such as the construction of new buildings and the loss of existing ones, which makes it possible to obtain the basic information for statistics on changes to houses. Furthermore, because building regions can be extracted accurately, it becomes easier to track changes in individual buildings over time and to determine detailed attributes of a building (for example, the building type, such as detached house, condominium, or factory) from the size and shape of the extracted building region.

そして、画像からの建物に関するこれらの情報抽出作業の自動化が図られることで、広範囲の地表を処理対象とした当該作業を低コストで高速に行うことが可能となる。 Then, by automating these information extraction operations relating to buildings from images, it is possible to perform the operations for processing a wide range of the ground surface at low cost and at high speed.

これまでに、本発明の実施形態について説明してきたが、本発明の趣旨の範囲内で様々な変形をすることができる。例えば、面積の範囲が３つではなく、２つや４つ以上でもよい。また、モデルの種類の数やスケールの種類の数が異なっていてもよい。また、個別部は建物の面積の範囲に応じて最適化されなくてもよい。例えば建物の高さなど、他の手法で分類されたグループに応じて個別部が最適化されてもよい。 Although embodiments of the present invention have been described so far, various modifications can be made within the scope of the spirit of the present invention. For example, the number of area ranges need not be three; it may be two, or four or more. The number of model types and the number of scale types may also differ. In addition, the individual units need not be optimized according to ranges of building area. For example, the individual units may be optimized according to groups classified by another criterion, such as building height.

１学習サーバ、１１プロセッサ、１２記憶部、１３通信部、１４入出力部、３０要素、３１,３２，３３，３４，３５，３６，３７ユニット、５１学習データ取得部、５２学習実行部、５３学習検出器セット、５４学習検出器、５４０共通部、５４１，５４２，５４３個別部、５６評価データ取得部、５７評価実行部、５８検出器選択部、６１実行検出器セット、６２，６３，６４実行検出器、６２０，６３０，６４０共通部、６２１，６３１，６４１個別部、６５対象データ入力部、６６出力取得部、６７フィルタ部、６８統合部、６９画像出力部。 1 learning server, 11 processor, 12 storage unit, 13 communication unit, 14 input / output unit, 30 elements, 31, 32, 33, 34, 35, 36, 37 unit, 51 learning data acquisition unit, 52 learning execution unit, 53 Learning detector set, 54 Learning detector, 540 Common unit, 541, 542, 543 Individual unit, 56 Evaluation data acquisition unit, 57 Evaluation execution unit, 58 Detector selection unit, 61 Execution detector set, 62, 63, 64 Execution detector, 620, 630, 640 common unit, 621, 631, 641 individual unit, 65 target data input unit, 66 output acquisition unit, 67 filter unit, 68 integration unit, 69 image output unit.

Claims

A first building detector that has learned a plurality of buildings belonging to a first group using a learning input image and teacher data of information indicating the shapes of the plurality of buildings included in the learning input image;
A second building detector that has learned a plurality of buildings belonging to a second group different from the first group using a learning input image and teacher data of information indicating the shapes of the plurality of buildings included in the learning input image;
An input unit for inputting feature information of an input image obtained by photographing a learning target region on the ground surface from the sky to the first building detector and the second building detector;
An integration unit that integrates the output of the first building detector and the output of the second building detector with respect to the feature information of the input image;
Including
The first building detector and the second building detector include different types of neural networks.
Building extraction system.

In the building extraction system according to claim 1,
The areas of the plurality of buildings belonging to the first group belong to the first range,
The areas of the plurality of buildings belonging to the second group belong to a second range different from the first range,
Building extraction system.

In the building extraction system according to claim 2,
Further including a filter that removes, based on area, a building included in the output of the first building detector and a building included in the output of the second building detector,
Building extraction system.

In the building extraction system according to any one of claims 1 to 3,
The first building detector includes a convolution layer that performs a dilated convolution operation;
The second building detector includes a pooling layer;
Building extraction system.

In the building extraction system according to claim 2,
The maximum value of the first range is smaller than the maximum value of the second range,
The first building detector includes a convolution layer that performs a dilated convolution operation;
The second building detector includes a pooling layer;
Building extraction system.

In the building extraction system according to any one of claims 1 to 3,
An evaluation unit that evaluates, for a plurality of buildings belonging to either the first group or the second group, the accuracy with which building shapes are detected by each of a first candidate detector, which includes a first type of neural network and has been trained using the learning input image and teacher data of information indicating the shapes of a plurality of buildings included in the learning input image, and a second candidate detector, which includes a second type of neural network and has been trained using the learning input image and teacher data of information indicating the shapes of a plurality of buildings included in the learning input image; and
A detector selection unit that selects, based on the detection accuracy evaluated by the evaluation unit, one of the first candidate detector and the second candidate detector as one of the first building detector and the second building detector,
Further including a building extraction system.

In the building extraction system according to any one of claims 1 to 6,
The integration unit determines, for the input feature information of the input image, that an area recognized as a building in either the output of the first building detector or the output of the second building detector is an area containing a building,
Building extraction system.