JP7096034B2

JP7096034B2 - Building extraction system

Info

Publication number: JP7096034B2
Application number: JP2018062646A
Authority: JP
Inventors: 竜平濱口
Original assignee: Pasco Corp
Current assignee: Pasco Corp
Priority date: 2018-03-28
Filing date: 2018-03-28
Publication date: 2022-07-05
Anticipated expiration: 2038-03-28
Also published as: JP2019175140A

Description

The present invention relates to a building extraction system.

Techniques for extracting buildings from data acquired from above, such as aerial photographs and satellite images, are being researched. Patent Document 1 discloses a system in which an operator specifies, on an image such as an aerial photograph, a work area containing a building to be extracted, and the outline of the building is automatically extracted within that work area. Patent Document 2 discloses an apparatus that extracts the outlines of buildings using a DSM (Digital Surface Model) acquired from above with a laser scanner or the like.

Patent Document 3 discloses, for an object detection device that recognizes pedestrians, an ensemble detector having three scales, and discloses that the size of the pedestrian image to be detected differs depending on the scale.

Patent Document 1: Japanese Unexamined Patent Publication No. 2011-76178
Patent Document 2: Japanese Unexamined Patent Publication No. 2013-101428
Patent Document 3: Japanese Unexamined Patent Publication No. 2018-5520

The inventors are developing a method of extracting buildings using a convolutional neural network in order, for example, to reduce the workload of detecting building changes (new construction or demolition). When buildings are extracted using a convolutional neural network, it has been difficult to suppress missed detections.

The present invention has been made in view of the above problem, and its object is to provide a building extraction system capable of suppressing missed detections in the extraction of buildings.

(1) A building extraction system including: a first building detector trained, for a plurality of buildings whose areas belong to a first range, using a first learning input image having a first scale and teacher data of information indicating the shapes of the plurality of buildings included in the first learning input image; a second building detector trained, for a plurality of buildings whose areas belong to a second range different from the first range, using a second learning input image having a second scale and teacher data including information indicating the shapes of the plurality of buildings included in the second learning input image; an input unit that inputs, to the first building detector, feature information of a first input image in which a learning target area on the ground surface is photographed from above, and inputs, to the second building detector, feature information of a second input image obtained by enlarging or reducing the first input image according to the ratio between the first scale and the second scale; and an integration unit that integrates the output of the first building detector for the feature information of the first input image with the output of the second building detector for the feature information of the second input image.

(2) The building extraction system according to (1), wherein the second scale differs from the first scale.

(3) The building extraction system according to (2), wherein the maximum value of the first range is larger than the maximum value of the second range, and the first scale is smaller than the second scale.

(4) The building extraction system according to any one of (1) to (3), further including a filter that removes, based on area, buildings included in the output of the first building detector and buildings included in the output of the second building detector.

(5) The building extraction system according to any one of (1) to (4), wherein the integration unit executes a process of enlarging or reducing at least one of two outputs, namely the output of the first building detector for the feature information of the first input image and the output of the second building detector for the feature information of the second input image, so that the scales of the two outputs match, and superimposes the two outputs after that process has been executed.

(6) The building extraction system according to any one of (1) to (5), further including: an evaluation unit that evaluates, for a plurality of buildings belonging to either the first range or the second range, the building-shape detection accuracy of each of a first candidate detector, trained using a third learning input image having a first candidate scale and teacher data of information indicating the shapes of the plurality of buildings included in the third learning input image, and a second candidate detector, trained using a fourth learning input image having a second candidate scale and teacher data of information indicating the shapes of the plurality of buildings included in the fourth learning input image; and a detector selection unit that selects, based on the detection accuracy evaluated by the evaluation unit, one of the first candidate detector and the second candidate detector as either the first building detector or the second building detector.

(7) The building extraction system according to any one of (1) to (6), wherein the integration unit determines that a region recognized as a building in either the output of the first building detector or the output of the second building detector, for the feature information of the input images, is a region where a building exists.

FIG. 1 is a diagram showing an example of the hardware configuration of a building extraction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the building extraction system.
FIG. 3 is a diagram explaining the types of learning detector.
FIG. 4 is a diagram explaining the difference in scale.
FIG. 5 is a diagram showing an outline of the configuration of a learning detector.
FIG. 6 is a diagram explaining the layers included in a learning detector of the pooling model.
FIG. 7 is a diagram explaining the layers included in a learning detector of the dilation model.
FIG. 8 is a diagram explaining an example of the layer structure in a dilated convolution operation.
FIG. 9 is a flow diagram showing an example of the process of training a learning detector.
FIG. 10 is a flow diagram showing an example of the processing performed by the learning execution unit for each window image.
FIG. 11 is a diagram showing an example of teacher data.
FIG. 12 is a flow diagram showing an example of the process of evaluating a learning detector.
FIG. 13 is a diagram showing evaluation results.
FIG. 14 is a diagram explaining an outline of the process of determining building regions.
FIG. 15 is a flow diagram showing the flow of the process of generating an overall output image from a processing target image.

Embodiments of the present invention are described below with reference to the drawings. Components having the same function are given the same reference numerals, and redundant description is omitted.

In the building extraction system according to the present embodiment, feature information of an aerial photograph, satellite image, or the like of the ground surface that is the target area of building extraction (this may be an ortho image based on an aerial photograph or satellite image, and is hereinafter referred to as the "processing target image") is input to a building detector, which is a trained model using a neural network, and building regions are determined and extracted based on the image output from the building detector. The building extraction system uses three building detectors when identifying buildings in the processing target image. The three building detectors are configured to detect, with higher accuracy, buildings whose areas belong to the ranges S, M, and L, respectively. For example, area range S is less than 45 m^2, area range M is 45 m^2 or more and less than 131 m^2, and area range L is 131 m^2 or more. Roughly speaking, buildings in area range L correspond to condominiums and large commercial facilities, buildings in area range M to apartments and retail stores, and buildings in area range S to ordinary houses.
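The example area ranges above can be written as a small classifier. This is a minimal sketch for illustration only: the function name `area_class` and the constant names are hypothetical, and the thresholds are simply the example values given in the embodiment.

```python
# Example thresholds from the embodiment: S < 45 m^2 <= M < 131 m^2 <= L.
S_UPPER_M2 = 45.0   # upper bound (exclusive) of area range S
M_UPPER_M2 = 131.0  # upper bound (exclusive) of area range M


def area_class(area_m2: float) -> str:
    """Return 'S', 'M', or 'L' for a building footprint area in square metres."""
    if area_m2 < 0:
        raise ValueError("area must be non-negative")
    if area_m2 < S_UPPER_M2:
        return "S"
    if area_m2 < M_UPPER_M2:
        return "M"
    return "L"


print(area_class(44.9), area_class(45.0), area_class(131.0))  # S M L
```

Both boundaries are treated as inclusive on the lower side, matching "45 m^2 or more" and "131 m^2 or more" in the text.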

In addition, in the building extraction system according to the present embodiment, for each of the area ranges S, M, and L, training is performed on a plurality of building detectors that differ from one another in neural network type and in the scale of the input learning images. For each of the area ranges S, M, and L, the best building detector is selected from among the plurality of building detectors, and the selected building detectors are used to detect building regions in the processing target data.

FIG. 1 is a diagram showing the hardware configuration of the building extraction system according to the embodiment of the present invention. The building extraction system includes a learning server 1. The learning server 1 is a server computer and includes a processor 11, a storage unit 12, a communication unit 13, and an input/output unit 14.

The processor 11 operates according to a program stored in the storage unit 12. The processor 11 also controls the communication unit 13 and the devices connected to the input/output unit 14. Here, the processor 11 may include a so-called CPU (Central Processing Unit) and a GPU (Graphics Processing Unit) used as a parallel computer. The program may be provided via the Internet or the like, or may be provided stored in a computer-readable storage medium such as a flash memory or a DVD-ROM.

The storage unit 12 is composed of memory elements such as RAM and flash memory, and a hard disk drive. The storage unit 12 stores the above program. The storage unit 12 also stores information input from each unit and calculation results.

The communication unit 13 provides the function of communicating with other devices and is composed of, for example, a wired LAN integrated circuit. Under the control of the processor 11, the communication unit 13 transmits information to and receives information from other devices. The communication unit 13 also passes received information to the processor 11 and the storage unit 12. The communication unit 13 is connected to other devices by, for example, a LAN.

The input/output unit 14 is composed of a video controller that controls a display output device, a controller that acquires data from input devices, and the like. Input devices include a keyboard, a mouse, and a touch panel. Under the control of the processor 11, the input/output unit 14 outputs display data to the display output device and acquires data entered by the user operating an input device. The display output device is, for example, an externally connected display.

Next, an overview of the functions of the building extraction system is given. FIG. 2 is a block diagram showing the functional configuration of the building extraction system. Functionally, the building extraction system includes a learning data acquisition unit 51, a learning execution unit 52, a learning detector set 53, an evaluation data acquisition unit 56, an evaluation execution unit 57, a detector selection unit 58, an execution detector set 61, a target data input unit 65, an output acquisition unit 66, a filter unit 67, an integration unit 68, and an image output unit 69. These functions are realized mainly by the processor 11 executing the program stored in the storage unit 12 and accessing the data stored in the storage unit 12. All of these functions may be executed by the learning server 1, or some of them may be executed by another server. For example, the functions of the target data input unit 65, the execution detector set 61, the output acquisition unit 66, the filter unit 67, the integration unit 68, and the image output unit 69 may be realized by another server having a processor 11, a storage unit 12, a communication unit 13, and an input/output unit 14.

The learning detector set 53 has a plurality of learning detectors 54. In the present embodiment there are six learning detectors 54, each of which has a common part 540, trained in common regardless of the area ranges S, M, and L, and individual parts 541, 542, and 543, trained for the area ranges S, M, and L, respectively. Each learning detector 54 is trained for a different combination of neural network type and input learning image scale.

The learning data acquisition unit 51 acquires a learning input image and teacher data indicating the shapes of the buildings included in that learning input image. The learning execution unit 52 trains the learning detectors 54 using the learning input image and the teacher data.

The evaluation data acquisition unit 56 acquires an evaluation input image and correct answer data indicating the shapes of the buildings included in that evaluation input image. The evaluation input image and correct answer data may be the same as the learning input image and teacher data. Using the evaluation input image and the correct answer data, the evaluation execution unit 57 evaluates, for each of the individual parts 541, 542, and 543 of each learning detector 54, the accuracy with which building shapes are detected.

Based on the detection accuracy evaluated by the evaluation execution unit 57, the detector selection unit 58 selects, for each of the area ranges S, M, and L, the learning detector 54 that will detect buildings in the input target data. At least part of each selected learning detector 54 is used as one of the execution detectors 62, 63, and 64 that make up the execution detector set 61. More specifically, the combination of the common part 540 and the individual part 541 in the learning detector 54 selected for area range S becomes the common part 620 and the individual part 621 of the execution detector 62 corresponding to area range S. The combination of the common part 540 and the individual part 542 in the learning detector 54 selected for area range M becomes the common part 630 and the individual part 631 of the execution detector 63 corresponding to area range M. The combination of the common part 540 and the individual part 543 in the learning detector 54 selected for area range L becomes the common part 640 and the individual part 641 of the execution detector 64 corresponding to area range L.
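The per-range selection described above can be sketched as picking, for each area range, the detector number with the best score. This is an illustrative sketch only: the function name `select_detectors`, the score layout, and the score values are hypothetical, and "higher is better" is an assumption, since the accuracy metric is not specified here.

```python
def select_detectors(scores):
    """Pick the best learning detector number for each area range.

    scores: {detector_no: {"S": float, "M": float, "L": float}},
            where a higher score is assumed to mean better detection accuracy.
    Returns {"S": detector_no, "M": detector_no, "L": detector_no}.
    """
    if not scores:
        raise ValueError("at least one evaluated detector is required")
    selected = {}
    for area_range in ("S", "M", "L"):
        # max() runs inside the loop body, so area_range is bound correctly.
        selected[area_range] = max(scores, key=lambda no: scores[no][area_range])
    return selected


# Hypothetical evaluation results for three of the six detectors.
example_scores = {
    1: {"S": 0.60, "M": 0.70, "L": 0.50},
    2: {"S": 0.80, "M": 0.65, "L": 0.55},
    3: {"S": 0.50, "M": 0.60, "L": 0.90},
}
print(select_detectors(example_scores))  # {'S': 2, 'M': 1, 'L': 3}
```

Because the choice is made independently per range, different ranges can end up using detectors with different scales and model types, as in the embodiment.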

The target data input unit 65 acquires an input target image, processes it as necessary, and inputs it to the execution detectors 62, 63, and 64. The output acquisition unit 66 acquires the output images produced by the execution detectors 62, 63, and 64.

The filter unit 67 removes, based on area, buildings included in the output images of the execution detectors 62, 63, and 64, and generates filtered output images.
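One way to picture area-based filtering is to label connected building regions in a binary output mask and keep only those whose pixel count falls in a chosen range. The sketch below is an assumption-laden illustration, not the filter of the embodiment: the function name `filter_by_area`, the use of 4-connectivity, and the half-open range `[min_pixels, max_pixels)` are all hypothetical choices, since the text states only that buildings are removed based on area.

```python
from collections import deque


def filter_by_area(mask, min_pixels, max_pixels):
    """Keep 4-connected regions whose pixel count lies in [min_pixels, max_pixels).

    mask: list of rows of 0/1 values (1 = detected as building).
    Returns a new mask of the same shape.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Copy the region to the output only if its area is in range.
            if min_pixels <= len(component) < max_pixels:
                for cy, cx in component:
                    out[cy][cx] = 1
    return out


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(filter_by_area(mask, 2, 10))  # the isolated single pixel is removed
```

To filter by ground area rather than pixel count, the pixel thresholds would be derived from the ground resolution of the output image, which depends on the scale used by each detector.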

The integration unit 68 integrates the filtered output images of the execution detectors 62, 63, and 64. The integration unit 68 generates an image in which any region recognized as a building in at least one of the output images of the execution detectors 62, 63, and 64 is determined to be a region where a building exists.
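Treating a region as a building when any detector recognizes it amounts to a pixel-wise logical OR of the detector outputs. The following is a minimal sketch under the assumption that the outputs have already been converted to binary masks of identical size; the function name `integrate_outputs` is hypothetical.

```python
def integrate_outputs(masks):
    """Pixel-wise logical OR of equally sized binary masks (lists of 0/1 rows)."""
    if not masks:
        raise ValueError("at least one mask is required")
    h, w = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError("all masks must share the same shape")
    # A pixel is a building if any detector marked it as one.
    return [[1 if any(m[y][x] for m in masks) else 0 for x in range(w)]
            for y in range(h)]


out_s = [[1, 0], [0, 0]]  # hypothetical output for area range S
out_m = [[0, 0], [0, 1]]  # hypothetical output for area range M
out_l = [[0, 0], [0, 0]]  # hypothetical output for area range L
print(integrate_outputs([out_s, out_m, out_l]))  # [[1, 0], [0, 1]]
```

A union of this kind favours recall: a building missed by one detector is still reported if another detector finds it, which fits the stated aim of suppressing missed detections.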

The image output unit 69 outputs the image integrated by the integration unit 68 to the storage unit 12 or to the display output device.

Next, the learning detector set 53 and the learning detectors 54 it contains are described in detail. FIG. 3 is a diagram explaining the types of learning detector 54. "No" in the table of FIG. 3 is the number assigned to each of the six learning detectors 54. "Scale" is the scale of the learning input image fed to the learning detector 54 of that number: the initially prepared learning input image is adjusted (enlarged or reduced as necessary) by the magnification given as the scale, and a learning input image cut out so as to have the same number of pixels regardless of scale (hereinafter, a cut-out learning input image is referred to as a "window image") is input to the learning detector 54. "Model type" is the type of neural network that makes up the learning detector 54 of that number. "Pooling" denotes a CNN (Convolutional Neural Network) model that combines convolution layers with pooling layers (hereinafter, the "pooling model"), and "Dilation" denotes a model that uses convolution layers performing dilated convolution (hereinafter, the "dilation model").

FIG. 4 is a diagram explaining the difference in scale. FIG. 4(a) is an example of a window image at a scale of 0.5, and FIGS. 4(b) and 4(c) are examples of window images at scales of 1 and 2, respectively. The window images shown in FIGS. 4(a) to 4(c) include the same area. Every window image has Px × Py pixels. The values of Px and Py may be, for example, 32 or 64. The learning input image for a scale of 0.5 is obtained by reducing (thinning out) the learning input image for a scale of 1.0 so that the numbers of dots vertically and horizontally are halved. The learning input image for a scale of 2.0 is obtained by enlarging the learning input image for a scale of 1.0 so that the numbers of dots vertically and horizontally are doubled (placing, between the existing dots, dots obtained by linear interpolation or the like). This enlargement or reduction of the learning input image is performed by the learning data acquisition unit 51.
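The scale adjustment and window cropping described above can be sketched in plain Python. This is an illustration under stated assumptions, not the processing of the learning data acquisition unit 51: the function names are hypothetical, halving keeps every second dot, and doubling interpolates linearly between neighbours while replicating the last dot at the edge (one of several possible edge treatments).

```python
def downscale_half(img):
    """Scale 0.5: thin out the image by keeping every second row and column."""
    return [row[::2] for row in img[::2]]


def _upsample_1d(values):
    """Double a 1-D sequence, inserting the midpoint after each value."""
    out = []
    for i, v in enumerate(values):
        nxt = values[i + 1] if i + 1 < len(values) else v  # replicate the edge
        out.extend([v, (v + nxt) / 2])
    return out


def upscale_double(img):
    """Scale 2.0: linear interpolation along rows, then along columns."""
    rows = [_upsample_1d(row) for row in img]
    cols = [_upsample_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def crop_window(img, top, left, height, width):
    """Cut out a window image of height x width pixels (Py x Px in the text)."""
    return [row[left:left + width] for row in img[top:top + height]]


img = [[0, 2],
       [4, 6]]
print(downscale_half(img))   # [[0]]
print(upscale_double(img))   # 4 x 4 image with interpolated values
```

Because the window always has the same pixel count, a window cut from the 0.5 scale image covers a wider ground area than one cut from the 2.0 scale image, which is why different scales suit buildings of different sizes.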

FIG. 5 is a diagram showing an outline of the configuration of the learning detector 54. As described above, the learning detector 54 has a common part 540 and individual parts 541, 542, and 543. The common part 540 has a plurality of layers, and the individual parts 541, 542, and 543 each have the same number of layers. The adjusted learning input image is input to the first layer of the common part 540, and the feature information output by its last layer is input to the first layer of each of the individual parts 541, 542, and 543. The output of each of the individual parts 541, 542, and 543 is, for example, a 16 × 16 dot image in which each dot indicates the probability that a building exists at that dot's position.

FIG. 6 is a diagram explaining the layers included in a learning detector 54 of the pooling model, with the layers listed in processing order. In the "affiliation" column, layers marked "common" belong to the common part 540 and layers marked "individual" belong to the individual parts 541, 542, and 543. Each layer marked "individual" exists in every one of the individual parts 541, 542, and 543. The "processing type" column gives the type of each layer: "input" is the input layer, "convolution" is a convolution layer, and "pooling (s2)" is a pooling layer with a stride (kernel application interval) of 2. The kernel size is a parameter representing the size of the convolution filter. Because the processing target is an image, the kernel is two-dimensional, and a kernel size of "k" means a k × k filter. The "number of feature maps" of a layer is the number of feature maps extracted by that layer, also called channels. The stride is 1 unless otherwise noted, and it is not listed for each layer.

FIG. 7 is a diagram explaining the layers included in a learning detector of the dilation model. FIG. 7 follows the same conventions as FIG. 6, except that "convolution" layers in the dilation model are dilated convolution layers, and the setting of each dilated convolution layer is given in the "dilation coefficient" column.

The dilated convolution operation is now described further. FIG. 8 is a diagram explaining an example of the layer structure in a dilated convolution operation. An input image such as a learning input image is spatially two-dimensional data, but to simplify the illustration and explanation, the input data to the learning detector 54 is treated here as one-dimensional data. Specifically, the circles arranged horizontally in the input layer at the bottom of FIG. 8 make up the input data. Each element 30 of the input data, drawn as a circle, corresponds to a pixel (or pixel value) of the input image. The convolution layers shown in FIG. 8 are the so-called feature extraction layers; the layers that follow the feature extraction layers are omitted from the figure.

The neural network shown in FIG. 8 has seven convolution layers as its feature extraction layers, and each convolution layer performs a dilated convolution operation. The first convolution layer, located above the input layer, performs a dilated convolution with dilation coefficient d = 1. Specifically, a convolution is performed in each of the units 31 drawn as circles in the first layer, and each unit 31 outputs the sum of the values of two adjacent elements 30 of the input layer, each multiplied by a weight.

The second convolution layer performs a dilated convolution with dilation coefficient d = 2. Specifically, a convolution is performed in each of the units 32 drawn as circles in the second layer, and each unit 32 outputs the weighted sum of the output values of two units 31 in the first layer that have one unit between them (that is, every other unit).

Likewise, the third convolution layer performs a dilated convolution with dilation coefficient d = 3, and each unit 33 drawn as a circle in the third layer outputs the weighted sum of the output values of two units 32 in the second layer that have three units between them. The fourth convolution layer performs a dilated convolution with dilation coefficient d = 4, and each unit 34 drawn as a circle in the fourth layer outputs the weighted sum of the output values of two units 33 in the third layer that have seven units between them. Each unit 35 in the fifth layer performs a dilated convolution with d = 3, and each unit 36 in the sixth layer and each unit 37 in the seventh layer perform dilated convolutions with d = 2 and d = 1, respectively.

In the structure of the feature extraction layers shown in FIG. 8, the part consisting of the first to fourth layers is called the front-end part, and the subsequent part consisting of the fifth to seventh layers is called the local feature extraction part. The front-end part is a plurality of convolution layers following the input layer, in which the dilation coefficient d increases, in layer order, up to its maximum value within the feature extraction layers. The local feature extraction part is a plurality of convolution layers following the front-end part, in which the dilation coefficient decreases in layer order.

FIG. 8 illustrates with lines the connections among the units of the first to sixth layers and the input layer that are convolved into the output of one unit 37 of the seventh layer. In the extended convolution operation, the range to which the kernel is applied expands exponentially with the expansion coefficient d. For example, the kernels of the convolution operations with d = 1 to 4 in FIG. 8 are all filters that convolve two inputs, that is, filters of size 2. The spacing, in the one-dimensional data sequence, between the two inputs convolved by the d = 1 kernel is 1, whereas it is 2 for the d = 2 kernel, 4 for d = 3, and 8 for d = 4. In other words, the spacing is set to 2^(d-1).
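
The relationship between the expansion coefficient and the tap spacing can be checked with a few lines of Python. This is only an illustrative sketch: the function names `tap_spacing` and `receptive_field` are hypothetical, and the receptive-field formula assumes the one-dimensional, size-2 kernels with stride 1 described for FIG. 8.

```python
def tap_spacing(d: int) -> int:
    """Spacing between the two kernel taps for expansion coefficient d, i.e. 2^(d-1)."""
    return 2 ** (d - 1)


def receptive_field(coefficients, kernel_size=2):
    """Number of input positions that can influence one unit of the last layer."""
    field = 1
    for d in coefficients:
        # Each layer widens the field by (kernel_size - 1) times its tap spacing.
        field += (kernel_size - 1) * tap_spacing(d)
    return field


# Expansion coefficients of layers 1 to 7 as described for FIG. 8.
layers = [1, 2, 3, 4, 3, 2, 1]
spacings = [tap_spacing(d) for d in layers]   # [1, 2, 4, 8, 4, 2, 1]
front_end_field = receptive_field(layers[:4])  # layers 1 to 4 only
full_field = receptive_field(layers)           # all seven layers
```

Under these assumptions, the four front-end layers alone already cover 16 consecutive input positions, which is the point of widening the spacing instead of adding layers.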

As can be seen from the connections among the units and the input layer in the front-end unit, the extended convolution operation widens the range to which the kernel is applied and can therefore enlarge the receptive field with a small number of layers. Because the receptive field is enlarged by convolution alone, the pooling layers used in a general CNN become unnecessary, and the loss of resolution caused by pooling layers is avoided. In addition, although the applied range is widened, the elements within that range are thinned out and only the remaining elements are convolved, so that the increase in the number of weight parameters is suppressed.

On the other hand, a structure in which layers are stacked so that the expansion coefficient d increases in order, as in the front-end unit, has the problem that the correlation between neighboring units in the uppermost layer weakens and the problem that local features of the input data become difficult to capture. The local feature extraction unit is provided to solve these problems. Combining the front-end unit with the local feature extraction unit resolves both the problem that the correlation between neighboring units weakens at a unit of the seventh layer and the problem that the local information that units 31a and 31b of the first layer are adjacent to each other cannot be grasped.

In other words, by providing the local feature extraction unit after the front-end unit, the front-end unit makes active use of the extended convolution operation to obtain context without lowering the resolution at all, and the local feature extraction unit aggregates the local features dispersed by the front-end unit. As a result, both the context information and the local feature information can be used effectively, and even small, densely packed objects can be recognized.

Next, the process of training the learning detector 54 described so far, using a learning input image matched to its scale and teacher data indicating the shapes of the buildings included in that learning image, will be described in detail.

FIG. 9 is a flow chart showing an example of the process of training the learning detector 54. FIG. 9 shows the processing of the learning data acquisition unit 51 and the learning execution unit 52, by which the learning detector 54 is trained. The process shown in FIG. 9 is performed for each learning detector 54 as many times as the number of repetitions.

The learning data acquisition unit 51 acquires a learning image stored in the storage unit 12 (step S101). The learning image is an aerial photograph, a satellite image, or the like (it may be an ortho image based on an aerial photograph or a satellite image) of the ground surface that is the target area of the building extraction process. Next, the learning data acquisition unit 51 sets the size of the learning image to match the scale of the learning detector 54 (step S102). For example, if the scale of the learning detector 54 is 0.5 times, the learning image is reduced to 0.5 times, and if the scale is 2 times, the learning image is enlarged to 2 times. Instead of performing step S102, a plurality of learning images corresponding to the respective scale types may be prepared in advance, and the learning data acquisition unit 51 may read the image corresponding to the scale of the learning detector 54.

Then, the learning execution unit 52 cuts out a window image to be input to the learning detector 54 from the learning image whose size has been set to match the scale (step S103). The window image has a size of Px × Py. A position is selected at random within one learning image, and the window image is cut out from the learning image based on the selected position.
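
The random cut-out of step S103 might look like the following sketch. The image is represented as a plain list of rows, and the function name `cut_random_window` and the treatment of the selected position as the top-left corner of the window are assumptions made for illustration; the text states only that a position is selected at random.

```python
import random


def cut_random_window(image, px, py, rng):
    """Cut a px-by-py window at a randomly selected position of a 2-D list image."""
    height, width = len(image), len(image[0])
    if px > width or py > height:
        raise ValueError("window is larger than the image")
    # Assumption: the random position is the top-left corner of the window.
    left = rng.randrange(width - px + 1)
    top = rng.randrange(height - py + 1)
    window = [row[left:left + px] for row in image[top:top + py]]
    return (left, top), window


rng = random.Random(0)  # fixed seed so that the example is repeatable
image = [[10 * y + x for x in range(8)] for y in range(6)]
(left, top), window = cut_random_window(image, px=4, py=3, rng=rng)
```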

The learning execution unit 52 inputs the window image cut out from the learning image to the learning detector 54 and trains the learning detector 54 by comparing its output with the teacher data (step S104).

FIG. 10 is a flow chart showing an example of the processing of the learning execution unit 52 for each window image, and explains the processing of step S104 in more detail. In step S104, the learning execution unit 52 first inputs the window image cut out from the learning image to the common unit 540 of the learning detector 54 (step S121). The common unit 540 of the learning detector 54 then processes the window image, and the individual units 541, 542, 543 each process the output of the common unit 540. The learning execution unit 52 then acquires the output image of each of the individual units 541, 542, 543 of the learning detector 54 (step S122). In the following, the output image of the individual unit 541 corresponding to the area range S is referred to as the output image (S), the output image of the individual unit 542 corresponding to the area range M as the output image (M), and the output image of the individual unit 543 corresponding to the area range L as the output image (L). The output images of the individual units 541, 542, 543 are collectively referred to as the output images (S, M, L). The value of each dot of the output images (S, M, L) indicates the probability that a building region exists at that dot.

Next, the learning execution unit 52 calculates the error between the output images (S, M, L) of the learning detector 54 and the teacher data (step S123). Here, the teacher data is information indicating the shapes of the buildings included in the learning image data.

FIG. 11 is a diagram showing an example of the teacher data. The teacher data shown in FIG. 11 is a bitmap image corresponding to the learning image that includes the window image shown in FIG. 4. In the teacher data shown in FIG. 11, building regions whose area belongs to the range S (for example, A), building regions belonging to the range M (for example, B), and building regions belonging to the range L (for example, C) are distinguished from one another. The teacher data may be, for example, an image in which the dot value is set to 0 in regions without a building and to 1, 2, and 3 in building regions whose areas fall in the ranges S, M, and L, respectively. Alternatively, the teacher data may be images corresponding to a plurality of layers: an image in which the dots of building regions belonging to the range S have the value 1, an image in which the dots of building regions belonging to the range M have the value 1, and an image in which the dots of building regions belonging to the range L have the value 1.
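
The two representations of the teacher data described above carry the same information, as the following sketch suggests. It converts a label-coded image (0, 1, 2, 3) into one binary layer per area range; the name `split_into_layers` and the small sample image are hypothetical.

```python
# Dot values named in the text: 0 = no building, 1/2/3 = area ranges S/M/L.
RANGE_CODES = {"S": 1, "M": 2, "L": 3}


def split_into_layers(label_image):
    """Turn a label-coded teacher image into one binary image per area range."""
    return {
        name: [[1 if dot == code else 0 for dot in row] for row in label_image]
        for name, code in RANGE_CODES.items()
    }


# Hypothetical 3 x 4 teacher image, for illustration only.
teacher = [
    [0, 1, 1, 0],
    [0, 0, 2, 2],
    [3, 3, 2, 2],
]
layers = split_into_layers(teacher)
```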

In calculating the error, the learning execution unit 52 cuts out from the teacher data the image at the position corresponding to the central 16 × 16 dots of the window image of the learning image, and compares each of the output images (S, M, L) with the teacher data. Here, for the regions of the teacher data without a building and the building regions belonging to the range S, the learning execution unit 52 calculates the error with respect to the output image (S), but it does not calculate that error for the building regions belonging to the ranges M and L. Similarly, the learning execution unit 52 does not calculate the error with respect to the output image (M) for the building regions belonging to the ranges S and L, and does not calculate the error with respect to the output image (L) for the building regions belonging to the ranges S and M. As a result, training proceeds so that the individual units 541, 542, 543 become suited to detecting buildings in the area ranges S, M, and L, respectively.
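
The masking rule of step S123 can be sketched as follows. The text does not specify the error function, so a mean squared error is assumed here purely for illustration; the function name `masked_error` and the label coding (0 for no building, 1, 2, 3 for the ranges S, M, L) are likewise assumptions based on the example teacher data.

```python
RANGE_CODES = {"S": 1, "M": 2, "L": 3}


def masked_error(output, teacher, own_range):
    """Mean squared error over dots that are background or belong to own_range."""
    own = RANGE_CODES[own_range]
    total, count = 0.0, 0
    for out_row, label_row in zip(output, teacher):
        for prob, label in zip(out_row, label_row):
            if label not in (0, own):
                continue  # building of another area range: no error is computed
            target = 1.0 if label == own else 0.0
            total += (prob - target) ** 2
            count += 1
    return total / count if count else 0.0


teacher = [[0, 1], [2, 3]]           # hypothetical 2 x 2 teacher patch
output = [[0.0, 1.0], [0.9, 0.9]]    # hypothetical existence probabilities
error_as_s = masked_error(output, teacher, "S")  # only dots labelled 0 and 1 count
error_as_m = masked_error(output, teacher, "M")  # only dots labelled 0 and 2 count
```

In this sketch the high probabilities at the M and L dots do not penalize the S output at all, which mirrors how each individual unit is left free to ignore buildings outside its own range.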

Next, based on the calculated error, the learning execution unit 52 changes the values of parameters such as weights in the individual units 541, 542, 543 by the error backpropagation method or the like (step S124). The learning execution unit 52 also accumulates the errors to be propagated from the uppermost layers of the individual units 541, 542, 543 to the lowermost layer of the common unit (step S125) and, based on the accumulated error, changes the values of parameters such as weights in the common unit 540 by the error backpropagation method or the like (step S126).

The training processing shown in steps S103 and S104 (FIG. 9) is repeated until all the window images to be used for training have been acquired from a given learning image. This set of processes is repeated for every learning detector 54, whereby each learning detector 54 is trained. Instead of the processing of step S103, a plurality of window images to be used for training may be cut out together. In that case, the processing of step S104 may be executed repeatedly so that the process of inputting a window image and training the learning detector 54 is performed for each of the cut-out window images.

Next, the process of evaluating the trained learning detectors 54 and selecting, as the execution detectors 62, 63, 64, the learning detectors 54 that will actually extract building regions from the processing target image will be described in detail.

FIG. 12 is a flow chart showing an example of the process of evaluating the learning detectors 54. In this process, the evaluation data acquisition unit 56 first acquires an evaluation image and correct answer data from the storage unit 12 (step S201). The evaluation image may be the same as or different from the learning image. The scale of the evaluation image is the same as that of the learning image. The correct answer data is an image showing, within the evaluation image, the building regions belonging to each of the area ranges S, M, and L; when the evaluation image is the same as the learning image, the correct answer data may be the teacher data. Although not shown in FIG. 12, the evaluation data acquisition unit 56, like the learning data acquisition unit 51, sets the size of the evaluation image to match the scale of the learning detector 54.

Next, the evaluation execution unit 57 cuts out a window image to be input to the learning detector 54 from the evaluation image (step S202). More specifically, the evaluation execution unit 57 cuts out each window image so that the cut-out area is shifted by a predetermined number of dots from the window areas cut out so far. The predetermined number of dots may be any value from 1 to 16. The upper limit of 16 corresponds to the fact that the output of the learning detector 54 is an image of 16 × 16 dots; that is, the predetermined number is no greater than the vertical or horizontal size of the output of the learning detector 54. The evaluation execution unit 57 inputs the window image cut out from the evaluation image to the learning detector 54 (step S203) and acquires the output images (S, M, L) of the individual units 541, 542, 543 of the learning detector 54 (step S204). Here, the evaluation execution unit 57 binarizes each acquired output image according to whether the existence probability of each dot is larger or smaller than a threshold value, and stores the binarized output image in the storage unit 12. In the following processing, "output image" refers to the binarized output image. The processing of steps S202 to S204 is repeated until the learning detector 54 has processed all the window images (see step S205).
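
The window shifting and the binarization of steps S202 to S204 could be sketched as below. The window size of 64 dots, the threshold of 0.5, and the handling of a probability exactly equal to the threshold are assumptions; the text fixes only the 1 to 16 dot range of the shift and the 16 × 16 dot output.

```python
def window_origins(image_size, window_size, shift):
    """Top-left coordinates along one axis, each shifted by `shift` dots."""
    if not 1 <= shift <= 16:
        raise ValueError("shift must be between 1 and 16 dots")
    return list(range(0, image_size - window_size + 1, shift))


def binarize(output, threshold=0.5):
    """Map each existence probability to 1 (building) or 0 by thresholding."""
    # Assumption: a value exactly equal to the threshold is treated as 0.
    return [[1 if prob > threshold else 0 for prob in row] for row in output]


origins = window_origins(image_size=96, window_size=64, shift=16)
mask = binarize([[0.2, 0.7], [0.5, 0.9]])
```

With a shift of 16 dots the 16 × 16 outputs tile the image without overlap; any smaller shift makes neighboring outputs overlap.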

When the output images (S, M, L) for all the window images have been obtained, the evaluation execution unit 57 generates a whole image (S) in which the output images (S) are arranged at the positions corresponding to their window images, a whole image (M) in which the output images (M) are arranged at the positions corresponding to their window images, and a whole image (L) in which the output images (L) are arranged at the positions corresponding to their window images (step S206). More specifically, the evaluation execution unit 57 generates the whole images (S, M, L) by arranging the output images (S, M, L) so that they are shifted from one another by the predetermined number of dots, in accordance with the arrangement of the window images. When the predetermined number of dots, that is, the shift used when cutting out the window images, is smaller than 16 dots, at least some dots of the output images (S, M, L) obtained from each window image overlap the output images (S, M, L) of other window images. For dots whose positions overlap among the outputs for a plurality of window images, the evaluation execution unit 57 uses the average of the dot values of those output images as the dot value in the whole images (S, M, L). As a result, even when the boundaries of adjacent output images (S, M, L) do not join smoothly, the resulting inconsistency is prevented from appearing in the whole image.
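
The arrangement and averaging of step S206 might be sketched as follows. The function name `stitch` and the tiny 2 × 2 tiles are hypothetical; the real outputs are 16 × 16 dots. The text does not say how a fractional average is handled afterwards, so the sketch simply returns the averaged values.

```python
def stitch(outputs, height, width):
    """Place output tiles at their offsets and average dots covered more than once."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (top, left), tile in outputs:
        for dy, row in enumerate(tile):
            for dx, value in enumerate(row):
                total[top + dy][left + dx] += value
                count[top + dy][left + dx] += 1
    # Dots covered by no tile are left at 0.0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


tile_a = [[1, 1], [1, 1]]
tile_b = [[0, 0], [0, 0]]
# tile_b is shifted by one dot to the right, so the middle column overlaps.
whole = stitch([((0, 0), tile_a), ((0, 1), tile_b)], height=2, width=3)
```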

The evaluation execution unit 57 then compares the whole images with the correct answer data and evaluates the accuracy of each of the individual units 541, 542, 543 of the learning detector 54 (step S207). The accuracy is evaluated, for example, by obtaining the proportion (recall) of the regions of the correct answer data in which buildings belonging to the area range S exist that are covered by regions determined to be buildings in the output image (S). The evaluation execution unit 57 evaluates the accuracy in the same way for the building regions of the correct answer data belonging to the area ranges M and L and the building regions present in the output image (M) and the output image (L).
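
One way to compute the recall of step S207 is shown below. The text does not state whether the proportion is counted per dot or per building; this sketch assumes a dot-wise count, and the function name `recall` and the sample images are hypothetical.

```python
def recall(predicted, truth):
    """Share of true building dots that the detector also marked as building."""
    true_dots = hits = 0
    for pred_row, truth_row in zip(predicted, truth):
        for pred, actual in zip(pred_row, truth_row):
            if actual:
                true_dots += 1
                hits += 1 if pred else 0
    return hits / true_dots if true_dots else 0.0


truth_s = [[1, 1, 0], [1, 1, 0]]   # correct answer data for the range S
whole_s = [[1, 0, 0], [1, 1, 1]]   # binarized whole image (S)
score = recall(whole_s, truth_s)
```

Note that the false detection in the last column does not lower the score, since recall measures only how few buildings are overlooked.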

The accuracy of one learning detector 54 is evaluated by the processing of steps S202 to S207. If the accuracy has not yet been evaluated for all the learning detectors 54, the evaluation execution unit 57 repeats the processing from step S202 (step S208). In this way, the evaluation execution unit 57 evaluates the accuracy of all the learning detectors 54.

FIG. 13 is a diagram showing the evaluation results obtained by the evaluation execution unit 57. "No" in FIG. 13, as in FIG. 3, indicates the number assigned to each learning detector 54. In the example of FIG. 13, for the output of the individual unit 541, whose area range is S, the learning detector 54 with a scale of 1.0 times and the dilation model is the most accurate. For the output of the individual unit 542, whose area range is M, the learning detector 54 with a scale of 1.0 times and the pooling model is the most accurate, and for the output of the individual unit 543, whose area range is L, the learning detector 54 with a scale of 0.5 times and the pooling model is the most accurate.

When the accuracy of the learning detectors 54 has been evaluated, the detector selection unit 58 selects, for each of the area ranges S, M, and L, the most accurate learning detector 54 as the execution detector 62, 63, or 64 (step S209). The execution detector 62 is the combination of the common unit 540 (hereinafter referred to as the common unit 620) and the individual unit 541 (hereinafter referred to as the individual unit 621) included in the learning detector 54 that is most accurate for the area range S. The execution detector 63 is the combination of the common unit 540 (hereinafter referred to as the common unit 630) and the individual unit 542 (hereinafter referred to as the individual unit 631) included in the learning detector 54 that is most accurate for the area range M. The execution detector 64 is the combination of the common unit 540 (hereinafter referred to as the common unit 640) and the individual unit 543 (hereinafter referred to as the individual unit 641) included in the learning detector 54 that is most accurate for the area range L.
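
The selection of step S209 amounts to taking, for each area range, the detector with the highest evaluated accuracy. The sketch below uses a hypothetical table keyed by detector number; the recall values are invented for illustration and are not those of FIG. 13.

```python
def select_execution_detectors(recall_table):
    """For each area range, return the number of the detector with the highest recall."""
    selected = {}
    for area_range in ("S", "M", "L"):
        selected[area_range] = max(
            recall_table, key=lambda number: recall_table[number][area_range]
        )
    return selected


# Hypothetical recall values per detector number, NOT the values of FIG. 13.
recall_table = {
    1: {"S": 0.81, "M": 0.88, "L": 0.90},
    2: {"S": 0.86, "M": 0.87, "L": 0.85},
    3: {"S": 0.70, "M": 0.84, "L": 0.93},
}
selected = select_execution_detectors(recall_table)
```

Because the choice is made independently for each range, the three execution detectors may come from three different learning detectors, or the same learning detector may be chosen for more than one range.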

As can also be seen from FIG. 13, the dilation model tends to capture small changes more readily than the pooling model, so the dilation model is advantageous for area ranges with a small maximum value, and the pooling model is advantageous for area ranges covering large areas. Furthermore, a small scale reduces fine detail but tends to make the shape of a large building easier to determine. Consequently, a larger scale is advantageous for area ranges with a small maximum value, and a smaller scale is advantageous for area ranges covering large areas.

Accordingly, in the example of FIG. 13 as well, a learning detector 54 with the relatively large scale of 1.0 times and the dilation model is selected as the execution detector 62, which corresponds to the area range with a small maximum value, and a learning detector 54 with the relatively small scale of 0.5 times and the pooling model is selected as the execution detector 64, which corresponds to the area range with a large maximum value.

The detector selection unit 58 may select a learning detector 54 simply by storing in the storage unit 12 information indicating the learning detector 54 to which the target data input unit 65, described later, is to input the processing target image and from which the output image is to be acquired. Alternatively, it may select the learning detector 54 as the execution detector 62, 63, or 64 by copying the common unit 540, the individual unit 541, and so on of the selected learning detector 54 as the substance of the execution detector 62, 63, or 64.

Next, the process of determining building regions from a processing target image using the execution detectors 62, 63, 64 will be described. FIG. 14 is a diagram outlining the process of determining building regions.

First, the target data input unit 65 inputs the processing target image to the execution detector 62, which is suited to the area range S, and the output acquisition unit 66 acquires the overall output image (S) based on the output of the execution detector 62 (step S301). The overall output image (S) is an image showing, for the whole of the processing target image, the regions in which the execution detector 62 has determined that a building exists. Likewise, the overall output image (M) and the overall output image (L), described later, are images showing the regions in which the execution detectors 63 and 64, respectively, have determined that a building exists.

FIG. 15 is a flow chart showing the flow of the process of generating an overall output image from the processing target image, and explains the processing of step S301 in detail. First, the target data input unit 65 matches the scale of the processing target image to the scale set for the execution detector 62 (step S321). When the scale of the processing target image differs from that of the execution detector 62, the target data input unit 65 matches the scales by enlarging or reducing the processing target image. Next, the target data input unit 65 cuts out a window image from the scale-matched processing target image (step S322). The size of the window image and the method of cutting out window images from the processing target image are the same as for cutting out window images from the evaluation image, so their description is omitted. Next, the target data input unit 65 inputs the window image to the execution detector 62 (step S323). The execution detector 62 then performs the process of detecting building regions in the input window image, and the output acquisition unit 66 acquires the output image of the execution detector 62 (step S324). Although not shown, the output acquisition unit 66 binarizes the acquired output image according to whether the existence probability of each dot is larger or smaller than a threshold value, and stores the binarized output image in the storage unit 12. In the following processing, "output image" refers to the binarized output image. The processing of steps S322 to S324 is repeated until the execution detector 62 (the selected learning detector 54) has processed all the window images (see step S325).

Note that the building detectors correspond to the individual units 621, 631, 641 of the execution detectors 62, 63, 64, and the feature information of the processing target image input to the building detectors may be the outputs of the common units 620, 630, 640, respectively. The learning detectors 54 and the execution detectors 62, 63, 64 need not include the common units 540, 620, 630, 640. In that case, a learning input image or a processing target image is input for each of the area ranges S, M, and L, and the feature information of the processing target image input to the building detector may simply be the processing target image or its window image.

When the output images for all the window images have been obtained, the evaluation execution unit 57 generates the overall output image (S), in which the output images are arranged at the positions corresponding to their window images (step S326).

Next, the filter unit 67 applies an area-based filter to the overall output image (S) (step S302). More specifically, the filter unit 67 calculates the area of each region of the overall output image (S) in which a building has been determined to exist (the area is obtained from the number of dots in the region and the scale), and deletes from the overall output image (S) any region whose area is outside the permissible range corresponding to the area range S. Specifically, the permissible range is less than 89.2 m². The processing of the filter unit 67 may be omitted.

The target data input unit 65 also inputs the processing target image to the execution detector 63, which is suited to the area range M, and the output acquisition unit 66 acquires the overall output image (M) based on the output of the execution detector 63 (step S303). The details of this process are the same as those of the process of acquiring the overall output image (S) from the execution detector 62, so a detailed description is omitted.

Next, the filter unit 67 applies an area-based filter to the overall output image (M) (step S304). More specifically, the filter unit 67 calculates the area of each region of the overall output image (M) in which a building has been determined to exist (the area is obtained from the number of dots in the region and the scale), and deletes from the overall output image (M) any region whose area is outside the permissible range corresponding to the area range M. Specifically, the permissible range is 22.3 m² or more and less than 89.2 m².

The target data input unit 65 also inputs the processing target image to the execution detector 64, which is suited to area range L, and the output acquisition unit 66 acquires the overall output image (L) based on the output of the execution detector 64 (step S305). This processing is the same as the processing for acquiring the overall output image (S) from the execution detector 62, so a detailed description is omitted.

Next, the filter unit 67 applies an area-based filter to the overall output image (L) (step S306). More specifically, the filter unit 67 calculates the area of each region of the overall output image (L) in which a building is determined to exist (the area is obtained from the number of dots in the region and the scale), and deletes from the overall output image (L) every region whose area is outside the permissible range corresponding to area range L. Specifically, the permissible range is 65.4 m² or more.
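
The area filters of steps S302, S304, and S306 can be illustrated with the following minimal sketch. It is not the implementation of the filter unit 67. The names `filter_by_area`, `AREA_RANGES`, and `m_per_dot` are hypothetical, regions are assumed to be 4-connected groups of dots, and the scale is assumed to be expressed as the ground length of one dot side in meters. Only the three permissible ranges are taken from the description.

```python
from collections import deque

# Permissible ranges in m^2 quoted in the description: (min inclusive, max exclusive).
# None means the range is unbounded on that side.
AREA_RANGES = {"S": (None, 89.2), "M": (22.3, 89.2), "L": (65.4, None)}


def filter_by_area(mask, m_per_dot, min_area=None, max_area=None):
    """Return a copy of a binary mask with out-of-range regions cleared.

    mask: list of rows of 0/1 dots (1 = building determined to exist).
    m_per_dot: assumed ground length in meters of one dot side.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 4-connected region by breadth-first search (assumption).
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Area = number of dots times the ground area of one dot.
            area = len(region) * m_per_dot ** 2
            too_small = min_area is not None and area < min_area
            too_large = max_area is not None and area >= max_area
            if too_small or too_large:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out
```

For example, with an assumed 5 m dot, a 4-dot region has an area of 100 m² and a 1-dot region has an area of 25 m², so `filter_by_area(mask, 5.0, *AREA_RANGES["M"])` would keep only the 1-dot region, while the range for L would keep only the 4-dot region.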

The integration unit 68 then enlarges or reduces at least one of the overall output image (S), the overall output image (M), and the overall output image (L) so that their scales match (step S307). This processing may be performed before the processing of the filter unit 67.
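
The scale matching of step S307 can be sketched as follows. The description does not state which resampling method the integration unit 68 uses; nearest-neighbor sampling is assumed here because it keeps the dots binary. The function name `resize_nearest` and the parameter `factor` are hypothetical.

```python
def resize_nearest(mask, factor):
    """Enlarge or reduce a binary mask by `factor` with nearest-neighbor sampling."""
    h, w = len(mask), len(mask[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        # Each output dot copies the source dot that its position maps back to.
        [mask[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))] for x in range(new_w)]
        for y in range(new_h)
    ]
```

Under this sketch, an output image produced at a scale of 0.5 would be enlarged with `factor=2.0` so that it lines up with output images produced at a scale of 1.0.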

The integration unit 68 integrates the overall output image (S), the overall output image (M), and the overall output image (L) that have undergone this processing (step S308). In other words, the integration unit 68 determines that any region recognized as a building in at least one of the overall output image (S), the overall output image (M), and the overall output image (L) is a region containing a building, and generates an integrated image showing the regions so determined. More specifically, the integration unit 68 generates the integrated image by taking the logical OR of the corresponding dots of the filtered overall output image (S), overall output image (M), and overall output image (L). Here, each dot of the overall output image (S), the overall output image (M), and the overall output image (L) is 1 in a region where a building is determined to exist and 0 elsewhere.
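
The dot-wise logical OR of step S308 can be sketched as follows, assuming the images have already been brought to the same size and are held as lists of rows of 0/1 dots. The function name `integrate_masks` is hypothetical.

```python
def integrate_masks(*masks):
    """Dot-wise logical OR of equally sized binary masks."""
    h, w = len(masks[0]), len(masks[0][0])
    if any(len(m) != h or any(len(row) != w for row in m) for m in masks):
        raise ValueError("all masks must share the same size")
    # A dot is 1 in the result if it is 1 in at least one input mask.
    return [[int(any(m[y][x] for m in masks)) for x in range(w)] for y in range(h)]
```

Because the result is 1 wherever any one detector reports a building, a building missed by two detectors is still retained if the third detects it, which matches the stated aim of reducing missed detections.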

The image output unit 69 then outputs the image generated by the integration unit 68 to the storage unit 12 or to a display output device.

Images in which building regions have been determined are acquired using the execution detectors 62, 63, 64, each of which has a scale and a model type suited to the corresponding one of the area ranges S, M, and L, and the integration unit 68 integrates those images. This improves the accuracy of the buildings determined from the processing target image and, in particular, reduces missed detections.

For example, suppose that, based on the evaluation results shown in FIG. 13, the detector selection unit 58 selects as the execution detectors 62, 63, 64, respectively, the learning detectors 54 with a scale of 1.0 and the dilation model, a scale of 1.0 and the pooling model, and a scale of 0.5 and the pooling model. In one experiment, the Recall value, which is an indicator of missed detections, was 87.0% in this case. This exceeds the 82.0% obtained when all of the execution detectors 62, 63, 64 used a scale of 1.0 and the pooling model, and the 83.8% obtained when all of them used a scale of 1.0 and the dilation model. Here, the Recall value is the number of building regions given as correct answers in which a building was determined to exist, divided by the total number of building regions given as correct answers. Because reducing missed detections in the determination of building regions is generally not easy, this effect is significant.
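
The Recall definition above can be illustrated with the following sketch. The description does not specify when a correct-answer region counts as one in which a building was determined to exist; the sketch assumes that a region counts if at least one of its dots is 1 in the output image. The function name `recall` and the data layout are hypothetical.

```python
def recall(truth_regions, predicted_mask):
    """Fraction of correct-answer building regions hit by the prediction.

    truth_regions: list of regions, each a list of (row, col) dots.
    A region counts as detected if any of its dots is 1 in predicted_mask
    (an assumed criterion; the description does not specify one).
    """
    if not truth_regions:
        raise ValueError("at least one correct-answer region is required")
    detected = sum(
        1 for region in truth_regions
        if any(predicted_mask[y][x] == 1 for y, x in region)
    )
    return detected / len(truth_regions)
```

With four correct-answer regions of which three overlap the output image, this gives 3 / 4 = 0.75, that is, a Recall value of 75%.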

Using the building extraction system described above, which combines the execution detectors 62, 63, 64, structures and buildings of various sizes can be recognized with higher accuracy from remote sensing images such as aerial photographs and satellite images. The building extraction system can be used to track new construction and loss of buildings, which makes it possible to obtain the basic information for statistics on changes to houses. Furthermore, because building regions can be extracted accurately, changes in individual buildings over time become easier to track, and it also becomes easier to determine detailed attributes of a building (for example, the building type, such as detached house, condominium, or factory) from the size and shape of the extracted building region.

Automating the extraction of this building-related information from images makes it possible to carry out the work over a wide area of the ground surface at low cost and at high speed.

Although an embodiment of the present invention has been described above, various modifications can be made within the scope of the gist of the present invention. For example, the number of area ranges need not be three and may be two, or four or more. The number of model types and the number of scale types may also differ. Further, the individual parts need not be optimized according to ranges of building area. The individual parts may instead be optimized according to groups classified in another way, for example by building height.

1 learning server, 11 processor, 12 storage unit, 13 communication unit, 14 input/output unit, 30 element, 31, 32, 33, 34, 35, 36, 37 units, 51 learning data acquisition unit, 52 learning execution unit, 53 learning detector set, 54 learning detector, 540 common part, 541, 542, 543 individual parts, 56 evaluation data acquisition unit, 57 evaluation execution unit, 58 detector selection unit, 61 execution detector set, 62, 63, 64 execution detectors, 620, 630, 640 common parts, 621, 631, 641 individual parts, 65 target data input unit, 66 output acquisition unit, 67 filter unit, 68 integration unit, 69 image output unit.

Claims

A building extraction system comprising:
a first building detector trained, for a plurality of buildings whose areas belong to a first range, using teacher data comprising a first learning input image having a first scale and information indicating the shapes of the plurality of buildings included in the first learning input image;
a second building detector trained, for a plurality of buildings whose areas belong to a second range different from the first range, using teacher data comprising a second learning input image having a second scale and information indicating the shapes of the plurality of buildings included in the second learning input image;
an input unit that inputs, to the first building detector, feature information of a first input image that has the first scale and is obtained by photographing a learning target area on the ground surface from the sky, and inputs, to the second building detector, feature information of a second input image obtained by enlarging or reducing the first input image according to the ratio between the first scale and the second scale; and
an integration unit that integrates the output of the first building detector for the feature information of the first input image and the output of the second building detector for the feature information of the second input image,
wherein the maximum value of the first range is larger than the maximum value of the second range,
the minimum value of the first range is larger than the minimum value of the second range, and
the first scale is smaller than the second scale.

A building extraction system comprising:
a first building detector trained, for a plurality of buildings whose areas belong to a first range, using teacher data comprising a first learning input image having a first scale and information indicating the shapes of the plurality of buildings included in the first learning input image;
a second building detector trained, for a plurality of buildings whose areas belong to a second range different from the first range, using teacher data comprising a second learning input image having a second scale and information indicating the shapes of the plurality of buildings included in the second learning input image;
an input unit that inputs, to the first building detector, feature information of a first input image that has the first scale and is obtained by photographing a learning target area on the ground surface from the sky, and inputs, to the second building detector, feature information of a second input image obtained by enlarging or reducing the first input image according to the ratio between the first scale and the second scale; and
an integration unit that integrates the output of the first building detector for the feature information of the first input image and the output of the second building detector for the feature information of the second input image,
wherein the maximum value of the first range is larger than the maximum value of the second range, and
the minimum value of the first range is larger than the minimum value of the second range,
the building extraction system further comprising:
a first candidate detector and a second candidate detector provided for each of a plurality of ranges including the first range and the second range;
an evaluation unit that evaluates, for each of the plurality of ranges, the shape detection accuracy of each of the first candidate detector, which is trained using teacher data comprising a third learning input image having a first candidate scale and information indicating the shapes of a plurality of buildings that belong to the range and are included in the third learning input image, and the second candidate detector, which is trained using teacher data comprising a fourth learning input image having a second candidate scale different from the first candidate scale and information indicating the shapes of a plurality of buildings that belong to the range and are included in the fourth learning input image; and
a detector selection unit that, based on the detection accuracy evaluated by the evaluation unit, selects one of the first candidate detector and the second candidate detector provided for the first range as the first building detector, selects one of the first candidate detector and the second candidate detector provided for the second range as the second building detector, selects as the first scale whichever of the first candidate scale and the second candidate scale corresponds to the candidate detector selected for the first range, and selects as the second scale whichever of the first candidate scale and the second candidate scale corresponds to the candidate detector selected for the second range.

The building extraction system according to claim 1 or 2,
further comprising a filter that removes, based on area, buildings included in the output of the first building detector and buildings included in the output of the second building detector.

The building extraction system according to any one of claims 1 to 3,
wherein the integration unit enlarges or reduces at least one of the output of the first building detector for the feature information of the first input image and the output of the second building detector for the feature information of the second input image so that the scales of the two outputs match, and superimposes the two outputs so processed.

The building extraction system according to any one of claims 1 to 4,
wherein the integration unit determines, as a region containing a building, any region recognized as a building in either the output of the first building detector or the output of the second building detector for the feature information of the respective input image.