JP7418281B2 - Feature classification system, classification method and its program

Info

Publication number: JP7418281B2
Application number: JP2020085247A
Authority: JP
Inventors: 知史山本; 晃一郎八幡; 陽平上野; 力草場; 直之橋本; 寛康吉川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-05-14
Filing date: 2020-05-14
Publication date: 2024-01-19
Anticipated expiration: 2040-05-14
Also published as: JP2021179839A

Description

The present invention relates to a technique for classifying ground features using aerial images such as satellite images.

The background art of the present invention is described below. In the following text, "extraction" means recognizing an object without knowing its type, and "categorization" means recognizing an object after identifying its type.

Because a satellite image covers a wide area in a single shot, satellite imagery is well suited to surveying ground cover over a given area. Research toward practical reusable rockets, miniaturization of Earth observation satellites, and constellation plans are advancing worldwide, and satellite Earth observation is trending toward high-resolution yet low-cost, high-frequency imaging. Against this backdrop, the number of satellite images to be interpreted is expected to grow explosively, and a shortage of human interpreters is a concern. Demand for technology that interprets satellite images automatically is therefore rising year by year.

Many applications of deep learning to image interpretation have been reported. Compared with ordinary photographs taken with a digital camera, however, satellite images rarely yield highly accurate classification results. Several reasons are conceivable. First, and most importantly, a satellite image is taken from directly overhead and is an extreme kind of image that basically shows only the single face seen from above. As a result, the vertical orientation of the image carries no meaning, the image contains no typical layout pattern, and difficulties arise such as foreground and background being hard to separate. Second, the imaging conditions of satellite images differ every time because of the weather, the positional relationship between the sun and the earth, and so on, which also makes automatic interpretation difficult. Patent Document 1 provides a mechanism that supports the user's interpretation by simulating how a known 3D model appears under changing imaging conditions. Third, satellite images contain a wide variety of ground features, and the labels that people assign to them vary so much that machine learning can be hard to apply. Labels given by people, which may put objects that look different or differ greatly in size into the same class, can be hard for a computer to make sense of. For example, in urban areas it is known that performance degrades when a deep learning model is built to extract buildings made up of widely varying numbers of pixels as a single class.

Automatic interpretation of satellite images requires close attention to resolution adjustment. Patent Document 2 discloses a technique that improves deep-learning classification performance by building the building-classification network as a multi-task network that trains detectors for three size classes. By also drawing on common-sense knowledge that people apply unconsciously, such as splitting objects into three size classes, tasks such as extracting buildings and road surfaces are starting to yield solid results.

Categorizing buildings and road surfaces is harder than the extraction task, because it requires indirect inference that draws on background knowledge and on information outside the image extent. However, as noted above, the cost of acquiring satellite images is falling, and recognition using a large number of images is becoming an option.

Patent Document 1: JP 2010-271845 A
Patent Document 2: JP 2019-175140 A

Deep learning offers higher recognition performance than other methods, but it is not sufficient for applications that need to categorize buildings and road surfaces by function. With satellite images, identifying the type requires something like background knowledge, which makes end-to-end execution impractical. If deep learning is nevertheless run end to end in such cases, correcting the results takes considerable effort, and the total man-hours spent on labeling cannot be reduced.

An object of the present invention is to reduce the man-hours required for classifying ground features by assisting labeling.

The present invention presents a technique that applies recognition to buildings and road surfaces in stages, thereby achieving both extraction and categorization of buildings, and that assists labeling while presenting an explanation of the factors that served as the basis for each decision.

One example of the "feature classification system" of the present invention for solving the above problem is a classification system that classifies ground features using a plurality of aerial images, comprising: a road surface polygon extraction unit that extracts road surface polygons by integrating extraction results recognized at a plurality of resolutions; a building polygon extraction unit that extracts building polygons using the road surface polygons and three-dimensional information obtained by multi-view stereo from a plurality of aerial image pairs; an attribute table creation unit that extracts the image features and geometric information of each polygon used to categorize the building polygons and road surface polygons and stores them in an attribute table; and a categorization unit that categorizes building polygons using the attribute table, evaluates, for polygons categorized with high confidence, their relationships with other polygons, adds the results to the attribute table, and repeats the categorization processing.

According to the present invention, the man-hours required for classifying ground features can be reduced by assisting labeling.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

Overall processing flow diagram of the feature classification method.
Functional block diagram of the feature classification system.
Flow diagram for extracting building polygons and road surface polygons.
Flow diagram for obtaining shaped polygons.
Flow diagram for creating the building attribute table.
Flow diagram for creating the road surface attribute table.
Diagram showing an example of the building attribute table.
Diagram showing an example of the road surface attribute table.
Flow diagram of the categorization processing for buildings and road surfaces.
Diagram showing the screen for selecting among extracted edges.
Diagram showing the result display screen for categorized polygons.
Diagram illustrating the hardware configuration of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

An embodiment that categorizes ground features from satellite images is described. In this embodiment, building regions and road surface regions are extracted from satellite images of the ground; building regions are categorized into, for example, schools, hospitals, and shopping malls, and road surface regions into, for example, parking lots, runways, and roads.

FIG. 1A shows a flowchart of the overall processing of the feature classification method. In S101, building polygons (building regions) and road surface polygons (road surface regions) are extracted from the satellite images. In S102, the building polygons and road surface polygons are shaped to obtain shaped polygons. In S103, attribute values of the building polygons and road surface polygons are extracted to create attribute tables. In S104, each polygon is categorized using the attribute tables.

The extraction of building polygons and road surface polygons is described with reference to FIG. 2, and the shaping of the extracted polygons with reference to FIG. 3. Creation of the building attribute table is described with reference to FIG. 4, and creation of the road surface attribute table with reference to FIG. 5. The categorization of buildings and road surfaces is described with reference to FIG. 8.

FIG. 1B shows a functional block diagram of the feature classification system that carries out the processing flow of FIG. 1A. The classification system includes an image input unit 11 that takes in satellite images, a road surface polygon extraction unit 12 that extracts road surface polygons from the satellite images, a building polygon extraction unit 13 that extracts building polygons from the satellite images using the road surface polygons, a shaped polygon acquisition unit 14 that obtains shaped polygons from the extracted building polygons and road surface polygons, an attribute table creation unit 15 that creates the building and road surface attribute tables 17 from the shaped polygons, and a categorization unit 16 that categorizes buildings and road surfaces based on the attribute tables 17. It also includes a user interface 18 that displays selection screens and the like and accepts user input as needed.

A satellite image is an image in which each pixel carries real-world distance information, and the finer the spatial resolution, the larger the image becomes. When software interprets images automatically, computing capacity often cannot keep up if the image is too large. Deep learning in particular achieves large-scale computation through massively parallel processing on GPUs (Graphics Processing Units), but as the image size grows the neural network becomes extremely large and computation becomes difficult. The image resolution is therefore generally adjusted before the image is input to the GPU, but after the image is reduced, features that were small in pixel terms to begin with may no longer be detectable. The basic approach is thus to decide the image scale first and then prepare training data that matches the pixel size of the features to be recognized. Deciding the scale can be difficult when the features to be recognized range widely from small to large, or when the size distribution itself is unknown. In such cases it can be effective to detect building regions by an approach other than deep learning. For example, when recognizing tall man-made objects such as buildings in an urban area, an alternative approach is to reconstruct the scene in three dimensions by photogrammetry and extract buildings from their height difference relative to locations judged to be road surface. The three-dimensional data is best handled as a DSM (Digital Surface Model) image, a two-dimensional image in which each pixel represents the elevation of the feature at that location.

As noted above, each pixel of a satellite image carries distance information, so absolute xyz coordinates can be obtained for the three-dimensional point cloud; y and x can be converted to latitude and longitude, and z to altitude. It is advisable to draw contour lines on the DSM image and treat regions whose height difference from the region extracted as road surface is at or above a threshold as artificial buildings. Small pixel dropouts in the DSM image are best handled with dilation and erosion (morphological) processing.
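The height-difference rule and the gap filling described above can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names, the use of the median road height as the reference level, the 3 m threshold, and the 3x3 structuring element are all assumptions made for the example.

```python
from statistics import median


def _neighbourhood(mask, r, c):
    """3x3 window around (r, c); cells outside the grid count as background."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else False
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


def dilate(mask):
    """A cell becomes True if any cell in its 3x3 window is True."""
    return [[any(_neighbourhood(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """A cell stays True only if every cell in its 3x3 window is True."""
    return [[all(_neighbourhood(mask, r, c)) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def building_mask(dsm, road_mask, min_height_m=3.0):
    """Mark non-road cells at least min_height_m above the road level.

    dsm: grid of heights in metres, None where the DSM pixel is missing.
    road_mask: grid of booleans, True where road surface was extracted.
    The threshold and the median reference level are illustrative assumptions.
    """
    road_heights = [h for row, road_row in zip(dsm, road_mask)
                    for h, is_road in zip(row, road_row)
                    if is_road and h is not None]
    if not road_heights:
        raise ValueError("no road pixel with a valid height")
    road_level = median(road_heights)
    raw = [[(h is not None) and (not is_road) and (h - road_level >= min_height_m)
            for h, is_road in zip(row, road_row)]
           for row, road_row in zip(dsm, road_mask)]
    # Dilation followed by erosion (morphological closing) fills small dropouts.
    return erode(dilate(raw))
```

A single global road level only makes sense on flat ground; on sloped terrain the comparison would need a local road height or a slope-corrected DSM.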

What matters in three-dimensional reconstruction from images is that the positional relationship between subject and camera is not biased. It is advisable to pair the reconstruction with an imaging plan that varies the satellite's imaging position and angle so that every face of a building is captured. However, because satellite images are not acquired frequently, it tends to take a long time to collect enough images for three-dimensional reconstruction. When the collection period grows long, new structures may be built on the ground or natural terrain may be leveled in the meantime. These effects can be mitigated by using high-frequency imaging by a satellite constellation or satellites equipped with a mode that records video of a single location.

If an image set of sufficient quality for three-dimensional reconstruction cannot be obtained, that is, if the imaging conditions lack the variation described above or the image collection period has become long, pixels in the DSM image are more likely to be missing. The three-dimensional reconstruction method is assumed to be Multi View Stereo, which integrates the three-dimensional point clouds obtained by stereo vision from pairs of satellite images. It is advisable to integrate the point clouds with emphasis on pairs of satellite images that are as recent as possible. Specifically, when a point cloud built from a pair of recent satellite images and a point cloud built from a pair of old satellite images are superimposed on the same geographic coordinates, the former should take priority. For example, the z coordinate of a given location can be calculated from several stereo pairs and averaged, with a larger averaging weight given to stereo pairs made of more recent images. Alternatively, taking a stereo pair of recent satellite images as the reference, the deviation of the z coordinate can be evaluated for any pair that includes a satellite image older than a predetermined age, and the pair can be excluded from averaging when the deviation is large.
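The two integration rules just described (recency-weighted averaging, and dropping old pairs that deviate from the newest ones) can be combined in one small sketch. The exponential half-life weighting, the parameter names, and all default values are assumptions chosen for illustration; the text only requires that newer pairs weigh more and that stale, deviating pairs can be excluded.

```python
def fuse_heights(estimates, half_life_years=2.0, stale_after_years=5.0,
                 max_deviation_m=3.0):
    """Fuse z estimates of one ground location from several stereo pairs.

    estimates: list of (z_metres, pair_year) tuples, where pair_year is a
    hypothetical per-pair acquisition year (e.g. that of the older image).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    newest = max(year for _, year in estimates)
    # The newest pairs serve as the reference for the deviation check.
    reference = [z for z, year in estimates if year == newest]
    reference_z = sum(reference) / len(reference)

    weighted_sum = 0.0
    weight_total = 0.0
    for z, year in estimates:
        age = newest - year
        # Exclude pairs that are both stale and far from the reference height.
        if age >= stale_after_years and abs(z - reference_z) > max_deviation_m:
            continue
        # Newer pairs get larger weights (weight halves every half_life_years).
        weight = 0.5 ** (age / half_life_years)
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total
```

For instance, with estimates of 30.0 m (2024), 30.4 m (2023) and 12.0 m (2015), the 2015 value is dropped as stale and inconsistent, and the result lies between the two recent values, closer to the 2024 one.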

The DSM image obtained in this way may show the ground tilted as a whole, which does not combine well with detecting buildings by their height difference from the road surface. It is therefore advisable to cancel the slope using a DEM (Digital Elevation Model), raster data representing the height of the ground surface. Because a DEM does not include features standing on the ground surface, only the height of features above the ground can then be calculated.
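One common way to cancel the slope is a per-pixel subtraction of the DEM from the DSM. The sketch below assumes both rasters are already resampled to the same grid; the function name and the use of None for missing DSM pixels are illustrative choices, not taken from the source.

```python
def height_above_ground(dsm, dem):
    """Return DSM minus DEM per pixel, keeping None where the DSM is missing.

    dsm, dem: equally sized grids of heights in metres on the same grid.
    """
    if len(dsm) != len(dem) or any(len(s) != len(g) for s, g in zip(dsm, dem)):
        raise ValueError("DSM and DEM must have the same shape")
    return [[None if s is None else s - g for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(dsm, dem)]
```

On a slope rising from 100 m to 102 m, a 10 m building in the middle then shows up as 10 m above ground, while the bare terrain on either side becomes 0 m.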

In road surface detection, shadows, occlusion by buildings, and similar effects can cause parts of the road to be missed, so the correct region may not be extracted. In that case it is advisable to recognize the same location in several images taken under different imaging conditions and integrate the results. For example, images taken from different angles can remove the effect of occlusion by buildings, and images taken at different times of day can remove the effect of shadows. If the road surface pixels to be recognized occupy too large a share of the image size that can be input to deep learning, they can no longer be used as shape information and accuracy degrades markedly. Therefore, when the width of the road surface to be recognized is large relative to the input image size, it is advisable to zoom out so that a wider area can be recognized: keeping the input image size fixed, the resolution is downsampled to prepare one or more levels of images showing a wider area, and recognition is run on each. The recognition results at these different resolutions are then superimposed, and segmentation is performed using the resulting likelihood information. The resolution is halved repeatedly, and zooming out stops once the target road width relative to the input image size falls below a certain value.
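The zoom-out rule and the integration of per-resolution results might look like the sketch below. The stopping ratio of 0.25, the 512-pixel input size, the cap on the number of levels, and fusion by simple averaging with a 0.5 threshold are all assumptions for illustration; the text specifies only that resolution is halved until the road width becomes small enough relative to the input size, and that the likelihoods are combined.

```python
def zoom_levels(road_width_m, gsd_m, input_size_px=512, max_ratio=0.25,
                max_levels=6):
    """Ground sample distances (m/pixel) at which to run road recognition.

    Starting from the native resolution, the resolution is halved (the GSD
    doubled) until the road width in pixels, divided by the fixed network
    input size, drops below max_ratio.
    """
    levels = [gsd_m]
    while (road_width_m / levels[-1]) / input_size_px >= max_ratio \
            and len(levels) < max_levels:
        levels.append(levels[-1] * 2)
    return levels


def fuse_likelihoods(maps, threshold=0.5):
    """Average per-level road likelihood grids and threshold the mean.

    maps: list of equally sized grids with values in [0, 1], assumed to be
    already resampled onto a common grid.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) >= threshold
             for c in range(cols)]
            for r in range(rows)]
```

With these assumed defaults, a 100 m wide runway at 0.5 m/pixel is processed at 0.5 and 1.0 m/pixel, whereas an 8 m wide street needs only the native resolution.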

FIG. 2 shows a flowchart for extracting building polygons and road surface polygons. First, in S201, natural objects such as vegetation and water bodies are excluded from the image. In the road surface extraction processing, images at a plurality of resolutions are obtained by the multi-resolution processing of S206. The road surface is extracted from the image at each resolution in S207, the per-resolution extraction results are integrated in S208, and an accurate road surface is extracted in S209. In the building extraction processing, three-dimensional reconstruction is performed in S202, a DSM image is generated in S203, slope correction is performed in S204, and denoising that removes outliers from the three-dimensional reconstruction is performed in S205. Then, in S210, building polygons are extracted, also using the road surface polygons.

The extraction of building polygons and road surface polygons is performed by the building polygon extraction unit 13 and the road surface polygon extraction unit 12 in the block diagram of FIG. 1B.

Because the polygons extracted for buildings and road surfaces as described above are expected to contain considerable noise, the polygon boundaries are refined as post-processing to obtain shaped polygons. This corresponds to the acquisition of shaped polygons in S102 of the processing flow of FIG. 1A, and is performed by the shaped polygon acquisition unit 14 in the block diagram of FIG. 1B.

The processing that derives the shape the user wants from the extracted polygons is described with reference to FIG. 3. The polygon refinement method of this embodiment is implemented by a building polygon refinement program, a road surface polygon refinement program, and a user interface that can display their intermediate progress, which enables interactive polygon extraction. The user is not required to specify every step; it is advisable to display the intermediate progress and provide a mechanism that gives priority to the user's changes only when the user chooses to make a correction.

Building extraction is assumed to use a DSM image, formed by taking the point cloud from the three-dimensional reconstruction described above with latitude and longitude on the xy plane and z as the height direction. It is natural, however, to want the extraction result displayed overlaid on a satellite image. The satellite image used for this overlay is called the reference image. When polygons extracted from the DSM image are overlaid on the reference image, they may appear displaced from the actual building positions. This is because, unless the satellite image is taken at nadir, a tall building is imaged so that it leans over locations with different latitude and longitude. To reduce this effect, it is desirable to select a satellite image taken as close to nadir as possible as the reference image (S311), but in principle it is difficult to eliminate this displacement completely.

This embodiment therefore provides a method that improves the accuracy of the extracted polygons using two-dimensional image features. First, a representative point is selected for each building polygon extracted from the DSM image (S312). The representative point may be specified by the user, or the centroid of the building polygon, its height peak, or a similar point may be selected. The reference image is cut out around these coordinates into chip images, and edges are analyzed per chip image. Each chip image is cut so that the whole building polygon is visible. Because a polygon may need to be joined with an adjacent polygon, when the distance between the representative points of polygons is smaller than a prescribed value, the chip image is cut so that all of those polygons are visible in full.
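As a rough sketch of this chip selection, the code below uses the polygon centroid as the representative point, groups neighbouring polygons whose centroids lie within a given distance, and returns a bounding box that covers the whole group. The function names, the pixel margin, and the choice of an axis-aligned bounding box (rather than a window strictly centred on the representative point) are simplifying assumptions.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def group_for_chip(target, others, merge_distance):
    """Target polygon plus neighbours whose centroid is closer than merge_distance."""
    tx, ty = polygon_centroid(target)
    group = [target]
    for poly in others:
        ox, oy = polygon_centroid(poly)
        if math.hypot(ox - tx, oy - ty) < merge_distance:
            group.append(poly)
    return group


def chip_bounds(polygons, margin=8):
    """Bounding box (xmin, ymin, xmax, ymax) covering all polygons plus a margin."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)
```

The returned box would then be used to crop the reference image; how the crop is padded at image borders is left open here.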

Once the representative point has been selected, the image around it is cut out and edge detection is applied (S313). The Hough transform, the Canny filter, and similar methods are candidates for edge detection. Because the detection target is a building, it is likely to be rectangular, and a histogram of edge directions will in many cases show frequencies concentrated at angles 90 degrees apart. This property can be used to identify the edges to emphasize and the edges to discard as noise. Once the edges of interest are determined, approximating straight lines are extracted. These lines form a grid, and the grid cells to keep are then selected. For example, the grid can be overlaid on a binary image obtained by rasterizing the building polygon and a cell selected when it contains at least a certain proportion of building area, or the continuity of regions can be evaluated by graph cut (S314).

It is advisable to provide an interface that lets the user choose which of the resulting straight lines to keep, by selecting lines to delete from the displayed candidates, selecting lines to use, or specifying new lines (S302). This can be expected to reduce rework in annotation and lighten the workload. For example, an interface supporting an edge candidate selection screen such as that shown in FIG. 9(a) may be provided. The polygon shape is then fixed by connecting the points where the straight lines intersect, which in some cases splits or merges the original polygons (S315). At this point it is advisable to display a result confirmation screen such as that shown in FIG. 9(b) together with a screen that allows corrections (S303). The correction screen should allow the user to undo polygon splits and merges and to manipulate polygon vertices; vertex operations include moving, adding, and deleting vertices. After the results are confirmed and corrected, the building polygons are finalized (S316).
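The direction-histogram idea can be illustrated with line segments that an edge detector has already produced. In this sketch, segment angles are folded into [0, 90) degrees so that the two perpendicular wall directions of a rectangular building fall into the same bin; the length weighting, the 5-degree bin width, and the 7.5-degree tolerance are assumptions, and the segment detector itself (Hough transform or similar) is outside the example.

```python
import math


def _folded_angle(segment):
    """Segment direction in degrees, folded into [0, 90)."""
    x0, y0, x1, y1 = segment
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 90.0


def dominant_orientation(segments, bin_width_deg=5):
    """Centre of the strongest bin in a length-weighted direction histogram."""
    n_bins = 90 // bin_width_deg
    hist = [0.0] * n_bins
    for seg in segments:
        x0, y0, x1, y1 = seg
        # The final modulo guards against a folded angle rounding up to 90.0.
        index = int(_folded_angle(seg) // bin_width_deg) % n_bins
        hist[index] += math.hypot(x1 - x0, y1 - y0)
    best = max(range(n_bins), key=hist.__getitem__)
    return best * bin_width_deg + bin_width_deg / 2


def keep_aligned(segments, orientation_deg, tolerance_deg=7.5):
    """Keep segments parallel or perpendicular to the dominant orientation."""
    kept = []
    for seg in segments:
        diff = abs(_folded_angle(seg) - orientation_deg)
        diff = min(diff, 90.0 - diff)  # distance on the 90-degree circle
        if diff <= tolerance_deg:
            kept.append(seg)
    return kept
```

Segments that survive `keep_aligned` are the candidates for the approximating straight lines; the discarded ones are treated as noise.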

Road surface polygons follow the same flow, but road surfaces are expected to have larger polygons than buildings. If the resolution is left unchanged, the range over which edges must be detected becomes large, so, as with segmentation, it is advisable to process at a plurality of resolutions (S321 to S326).

Besides this method, a machine-learning-based approach is also conceivable: a generative adversarial network that infers the outlines of buildings and road surfaces, taking as input the building polygons extracted from the three-dimensional shape and chip images of the satellite image, is used to sharpen the polygon shapes. In this case as well, it is advisable to provide screens for confirming and correcting the results and to allow polygon vertices to be manipulated.

The flow described above yields the outline polygons of buildings and road surfaces.

In this embodiment, information for categorizing the building polygons and road surface polygons obtained in this way is extracted. This corresponds to the creation of the building and road surface attribute tables in the processing flow of FIG. 1A (S103), and is performed by the attribute table creation unit 15 in the block diagram of FIG. 1B. FIG. 4 shows the flow for obtaining attribute values used to categorize building polygons, and FIG. 5 shows the flow for obtaining attribute values used to categorize road surface polygons. An ID is assigned to each building polygon to create a building attribute table as shown in FIG. 6, and an ID is assigned to each road surface polygon to create a road surface attribute table as shown in FIG. 7.

FIG. 4 shows the flow for creating the building attribute table. First, the height is extracted in S401. Height information can be obtained from the DSM image, so it is advisable to retain the peak, median, and mean values. Next, the area is calculated in S402. The area of the polygon is also meaningful for classifying the building, so it is calculated and kept as a column. Spectral features are calculated in S403. When attribute values are calculated from pixel values, as with the polygon's color information, texture, and NDVI (vegetation index)/PNVI, it is advisable to use a range of about 5 to 9 pixels from the centroid. For color information and NDVI/PNVI, the pixel mean should be used. PNVI cannot be calculated without the Red Edge band, so it may be unavailable depending on the satellite image used; in that case it may be left unused. For texture, a gray-level co-occurrence matrix (GLCM) should be used, and parameters such as contrast, correlation, energy, and homogeneity retained. The shape is extracted in S404. Polygon shapes are sorted into several patterns, with classes assigned according to the application: for example, 0 for a rectangular shape, which is expected to be the most common for buildings; 1 for a circular shape, which is often seen in special structures such as storage facilities; and 2, 3, and so on for other polygons. In S405, characters and signs are extracted. Facilities in Japan such as schools and police stations often have their names written on the roof, so pattern recognition should be applied to combinations of one or more kanji such as "ｘｘ病院" (xx Hospital), "ｘｘ警察" (xx Police), "ｘｘ小" (xx Elementary School), and "ｘｘ中" (xx Junior High School), and the detected strings kept in the attribute table. In addition, some buildings, elementary and junior high schools in particular, have numbers written on the roof so that helicopters can find them easily from the air during a disaster. Numbers written on a roof can therefore be used to identify the building, so when a polygon contains numbers they should be detected by pattern recognition and the numbers found kept in the attribute table. The circled H mark that indicates a helicopter landing site can appear on both road surfaces and buildings, but it is still useful reference information for identifying a building, so it should be kept in the attribute table when detected.
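The spectral step in S403 can be illustrated with a minimal sketch that averages NDVI over a small window around the polygon centroid. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red); the function name, the nested-list band layout, and the `half_width` parameter are assumptions made for illustration and do not come from the patent.

```python
def mean_ndvi_near_centroid(red, nir, centroid, half_width=3):
    """Mean NDVI over a square window centred on centroid = (row, col).

    red and nir are 2-D lists of reflectance values with the same shape.
    half_width is a hypothetical parameter; the text only suggests using
    roughly 5 to 9 pixels around the centroid.
    """
    rows, cols = len(red), len(red[0])
    r0, c0 = centroid
    values = []
    for r in range(max(0, r0 - half_width), min(rows, r0 + half_width + 1)):
        for c in range(max(0, c0 - half_width), min(cols, c0 + half_width + 1)):
            total = nir[r][c] + red[r][c]
            if total == 0:
                continue  # NDVI is undefined where both bands are zero
            values.append((nir[r][c] - red[r][c]) / total)
    # None signals that no valid pixel was found in the window
    return sum(values) / len(values) if values else None
```

Returning `None` rather than 0 keeps "not computable" distinct from "no vegetation", which mirrors the way the text allows PNVI to be left unused when the band is missing.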

As noted in the problem statement, it is difficult to use end-to-end classification by deep learning as is. However, the classification probabilities of each class obtained at the output layer of the neural network contain useful information and may be used as attribute table values. As shown in S406 to S409, a chip image is generated centered on the building polygon (S406), and a classification or object recognition network is applied to obtain the position, shape, classification probability, and so on of the classified class (S407). For example, a classification network can be used to infer the class of the chip image. When a classification network is used, the class and probability of the building shown in the chip image are output, so it is advisable to store the classification probabilities of about the top three classes in the attribute table (S409).
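Keeping the top three class probabilities (S409) might look like the following sketch. It assumes the network exposes raw output-layer scores (logits) that are turned into probabilities with a softmax; the function name and the class names in the usage note are hypothetical.

```python
import math


def top_k_probabilities(logits, class_names, k=3):
    """Return the k most probable (class, probability) pairs from raw scores."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

For instance, `top_k_probabilities([2.0, 1.0, 0.0, -1.0], ["hospital", "school", "warehouse", "station"])` returns the three highest-scoring classes in descending order of probability, ready to be written into the attribute table row.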

When an object recognition network is used, on the other hand, the position and class of the objects shown in the chip image are obtained. In satellite image recognition, few types of building can be recognized directly, so the network is expected to be used for extracting characteristic shapes. For example, oil storage tanks and radars appear as white circles when photographed from above. Object detection networks are effective for extracting buildings with such simple shapes. When a building is extracted with an object detection network, how the extracted position overlaps the building polygon is evaluated. The degree of overlap is evaluated using the IoU (Intersection over Union) between the position and shape of the extracted building and the building polygon (S408). When the IoU is at or above a certain value, that is, when the overlap is sufficient, the building detected by the object detection network and the building of the building polygon can be judged to be the same. An object recognition network yields position and shape information directly; when the output of a classification network is used instead, a technique such as Grad-CAM should be applied and the IoU taken with the region that served as the basis for each probability.
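The overlap test in S408 can be sketched as follows. For brevity the sketch compares axis-aligned bounding boxes rather than arbitrary polygons, which is a simplification of what the text describes; the function names and the 0.5 threshold are assumptions.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def same_building(detected_box, polygon_box, threshold=0.5):
    """Treat the detection and the polygon as one building if IoU >= threshold.

    The threshold value is hypothetical; the text only says "a certain value".
    """
    return box_iou(detected_box, polygon_box) >= threshold
```

With real building footprints the same ratio would be computed from polygon intersection and union areas instead of box arithmetic.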

Because the building polygons are generated from the three-dimensional shape, they can be assumed to be fully orthorectified. The chip image, however, is assumed to be cut out of the multispectral satellite image based on the coordinates of the building polygon, so depending on the imaging conditions the ortho assumption may not hold. It is therefore advisable to have an identity criterion other than IoU. For example, the two may be judged identical if the distance between the position and shape detected by the object detection network and the building polygon is at or below a predetermined value. Several definitions of distance are possible, such as the distance between the centroid of the building polygon and the centroid of the detected shape, or the shortest distance between the two polygons.
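The centroid-distance criterion mentioned above can be sketched with the standard shoelace-based centroid formula for a simple polygon. The function names and the `max_distance` parameter are assumptions for illustration; the shortest-distance variant is not shown.

```python
import math


def polygon_centroid(vertices):
    """Centroid of a simple polygon given as [(x, y), ...] in order."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def same_by_centroid_distance(polygon_a, polygon_b, max_distance):
    """Judge two shapes identical when their centroids are close enough."""
    ax, ay = polygon_centroid(polygon_a)
    bx, by = polygon_centroid(polygon_b)
    return math.hypot(ax - bx, ay - by) <= max_distance
```

A distance test of this kind tolerates the sideways displacement of tall buildings in off-nadir imagery, where the IoU can drop even though the detection and the polygon refer to the same structure.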

For the class probabilities of buildings recognized by the object detection network, the top-ranked classes and their probabilities are kept in the table, just as when a classification network is used. An object detection network can also perform simple shape extraction. For example, a network trained to detect circular buildings makes it possible to search for circular regions in an image. Combined with the identity determination described above, this makes it possible to fill in not only the class and probability columns of the attribute table but also the shape column.

The object detection network performs well not only on the characteristically shaped buildings described above but also on moving objects. Specific objects around a building may therefore be detected and counted, and the count used as an attribute value. For example, many parked cars suggest a parking lot, densely packed trains an outdoor train depot, densely packed airplanes an apron, and stacked containers a warehouse, so these objects help classify nearby buildings. Accordingly, for each building polygon, the number of such important moving objects within a predetermined range is counted and stored in the attribute table (S410). When cars, trains, airplanes, and the like are counted, objects that are absent from the image at one point in time are often present in another image of the same location. It is therefore advisable to use multiple satellite images, accumulate the detection counts from many past points in time, and use a statistic such as the mean. The distance to the feature that serves as the key for classification is also measured and kept in the attribute table. Key features differ depending on the region of interest. For example, at an airport, measuring the distance to the runway makes it possible to classify various facilities such as terminals and instrument landing systems. The number of building polygons and the number of moving objects lying between the polygon and the key feature should also be kept in the attribute table.
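The multi-date counting in S410 can be sketched as a mean of per-image counts. The sketch assumes detections are already available as point coordinates per image in the same coordinate system as the polygon centroid; the function name and the use of a circular search range are illustrative assumptions.

```python
import math


def mean_count_within(center, detections_per_image, radius):
    """Average number of detected objects within radius of center.

    detections_per_image holds one list of (x, y) detection points per
    acquisition date. An empty list for a date counts as zero detections.
    """
    if not detections_per_image:
        return 0.0
    counts = []
    for detections in detections_per_image:
        inside = sum(
            1
            for (x, y) in detections
            if math.hypot(x - center[0], y - center[1]) <= radius
        )
        counts.append(inside)
    return sum(counts) / len(counts)
```

Averaging over dates smooths out the case described in the text where a parking lot or apron happens to be empty in a single acquisition.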

FIG. 6 shows an example of the building attribute table created by the flow of FIG. 4. For the ID assigned to each building polygon, the table has items such as color spectral values, texture, vegetation index, height, area, shape class, number of objects within a predetermined distance, distance to the key, number of features between the polygon and the key, the class estimated by machine learning and its probability, and extracted characters.
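One row of such a table could be represented as a small record type, as in the sketch below. All field names and types are hypothetical; they simply mirror the items listed for FIG. 6 and are not taken from the patent's actual table layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BuildingAttributes:
    """One row of a building attribute table (field names are illustrative)."""

    polygon_id: int
    color_means: Dict[str, float] = field(default_factory=dict)   # per-band mean
    texture: Dict[str, float] = field(default_factory=dict)       # GLCM statistics
    vegetation_index: Optional[float] = None
    height: Optional[float] = None
    area: Optional[float] = None
    shape_class: Optional[int] = None
    nearby_object_counts: Dict[str, float] = field(default_factory=dict)
    distance_to_key: Optional[float] = None
    features_between_key: Optional[int] = None
    top_classes: List[Tuple[str, float]] = field(default_factory=list)
    extracted_text: List[str] = field(default_factory=list)
```

Optional fields default to `None` so that values which only become available later, such as the distance to a key feature, can be filled in during the iterative classification.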

FIG. 5 shows the flow for creating the road surface attribute table. The road surfaces extracted by the processing of FIG. 2 are assumed to be already classified by material, for example as region polygons of concrete or asphalt. In classifying road surface classes, detecting related objects lying on the surface is an important step (S501). For example, a straight, elongated road surface with cars on it is a road, while a wide road surface densely packed with cars is likely to be a parking lot. This step should reuse the object detection network flow used to fill in the attribute table of the building polygons. The region of each detected related object is expanded by a buffer of a fixed size. The overlap with the original road surface region polygon is then determined, and the shape and coordinates of the region polygon are updated (S502). Extracting related objects matters not only for updating the shape but also for classification. When features that serve as means of transport lie on a road surface, its type becomes easier to classify: many parked cars indicate a parking lot, densely packed trains an outdoor train depot, and densely packed airplanes an apron. Each road surface polygon should therefore retain information about the features lying on it.
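The buffering and overlap check in S501 and S502 can be sketched with axis-aligned boxes as follows. The sketch only tallies which detected objects touch a road surface region after buffering; it does not perform the polygon shape update itself. Function names, the box representation, and the `margin` parameter are assumptions.

```python
from collections import Counter


def buffer_box(box, margin):
    """Expand a (xmin, ymin, xmax, ymax) box by margin on every side."""
    xmin, ymin, xmax, ymax = box
    return (xmin - margin, ymin - margin, xmax + margin, ymax + margin)


def boxes_overlap(a, b):
    """True when two boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def objects_on_surface(surface_box, detections, margin=1.0):
    """Count detected objects per label whose buffered box overlaps the surface.

    detections is a list of (label, box) pairs from an object detector.
    """
    counts = Counter()
    for label, box in detections:
        if boxes_overlap(buffer_box(box, margin), surface_box):
            counts[label] += 1
    return counts
```

The resulting per-label counts correspond to the "features lying on the surface" item that each road surface polygon is meant to retain.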

In addition, the area around the region polygon should be divided into chip images and, as with the building polygons, machine learning that yields the position, shape, and class probabilities should be applied and the results stored in the table (S503 to S505). For example, a deep learning instance recognition network may be used to extract the shape and class of each region. Because instance recognition extracts road surface shape information pixel by pixel, the result is matched against the shape of the initial road surface region polygon, the polygon is split so as to cut out the overlapping region, and the shape and coordinates of the road surface region polygon are updated (S506). Next, the same processing as for the building polygons is performed. That is, the road surface attribute table is updated by extracting the spectral features obtained from the pixel values of the satellite image (color, texture, and vegetation index), extracting the shape, calculating the area, extracting characters, and so on (S507 to S510).

Because what features are adjacent to a road surface polygon is important information, contact with building polygons and with other road surface polygons is determined. Information on the polygons in contact is stored together with whether the relationship is inclusion or adjacency (S511).

FIG. 7 shows an example of the road surface attribute table created by the flow of FIG. 5. For the ID assigned to each road surface polygon, the table has items such as estimated material, color spectral values, texture, vegetation index, character strings, shape class, area, features lying on the surface, adjacent building polygons, included building polygons, and the class estimated by machine learning and its probability.

After the attribute table has been created for each polygon, the polygons are classified in the processing flow of FIG. 1A (S104). The classification is performed by the classification unit 16 in the block diagram of FIG. 1B.

FIG. 8 shows the processing flow of the classification. The classification method of this embodiment is realized by a building classification program, a road surface classification program, and a user interface that can display their intermediate results, which enables interactive classification.

Buildings that serve a specific function in society should play an important role where they are located. Their location is therefore meaningful and can be used to identify surrounding features. Conversely, once the location of a particular feature is known, the number of features that can be classified may grow in a chain reaction. Strong associations between classes should be set by rules, or the degree of association calculated in the training phase of machine learning. Road surfaces with a similar appearance may be called different things depending on the surrounding buildings, so in most cases they become classifiable only after the buildings have been classified.

First, the class of each feature should be determined polygon by polygon from the attribute table values, using machine learning or rules. The attribute values used in the determination and the confidence of the result are retained at this point.
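A rule-based variant of this step could be sketched as below. The patent does not define how the confidence is computed; here it is assumed, purely for illustration, to be the fraction of a class's rules that the attribute values satisfy. The rule names, attribute keys, and classes are all hypothetical.

```python
def classify_by_rules(attributes, rules):
    """Pick the class whose rules are best satisfied by the attribute values.

    rules maps class name -> {rule name: predicate(attributes) -> bool}.
    Returns (class name, confidence, matched rule names), or None if no
    class has any rule. Confidence is the matched fraction (an assumption).
    """
    best = None
    for class_name, conditions in rules.items():
        if not conditions:
            continue
        matched = [name for name, test in conditions.items() if test(attributes)]
        score = len(matched) / len(conditions)
        if best is None or score > best[1]:
            best = (class_name, score, matched)
    return best


# Hypothetical rules and attribute row
rules = {
    "hospital": {
        "rooftop_text": lambda a: "病院" in a.get("text", ""),
        "helipad": lambda a: a.get("helipad", False),
    },
    "warehouse": {
        "containers": lambda a: a.get("containers", 0) > 0,
    },
}
result = classify_by_rules({"text": "ｘｘ病院", "helipad": True}, rules)
```

Returning the matched rule names alongside the class keeps the basis of each decision available for later display, as the text requires.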

A shortcut may be provided that lets the user enter key features in advance through the user interface (S801). As feature information is confirmed, the information needed for classification becomes available and the number of buildings that can be classified grows. Once the key is confirmed, the attribute values (attribute table) are updated (S802). In the building attribute table of FIG. 6, for example, the distance to the key and the number of features between the polygon and the key are added, and the estimated class is updated. The class of each polygon is then evaluated (S803), and the polygons are presented on the user interface in descending order of confidence, together with the information used in the determination (S804, S805). Polygons classified with a confidence above a certain threshold should be displayed together. The classification candidates are then confirmed through the user interface (S806). A mechanism is provided for receiving the user's judgment of whether each classification result is correct; based on that information the attribute values are updated, and classification and presentation on the user interface are repeated so that the number of classified features gradually increases. The iteration ends when no remaining feature has a confidence above the threshold. Information on the classes that were not found and the updated attribute table are then sent to the user interface (S807), the reference information and basis for the decisions are updated on the user interface (S808), and the process ends.
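The loop of S803 to S806 can be sketched as follows. The `evaluate` and `confirm` callbacks stand in for the class evaluation and the user interface respectively; their signatures, the threshold value, and the toy data are assumptions made for illustration.

```python
def interactive_classification(polygon_ids, evaluate, confirm, threshold=0.8):
    """Repeat evaluation and user confirmation until nothing exceeds threshold.

    evaluate(pid, confirmed) -> (label, confidence) may use already
    confirmed polygons as context. confirm(pid, label, confidence) returns
    the accepted label, or None if the user rejects the candidate.
    """
    confirmed = {}
    while True:
        candidates = []
        for pid in polygon_ids:
            if pid in confirmed:
                continue
            label, confidence = evaluate(pid, confirmed)
            if confidence >= threshold:
                candidates.append((confidence, pid, label))
        if not candidates:
            break
        # Present the most confident candidates first
        candidates.sort(key=lambda item: item[0], reverse=True)
        progressed = False
        for confidence, pid, label in candidates:
            accepted = confirm(pid, label, confidence)
            if accepted is not None:
                confirmed[pid] = accepted
                progressed = True
        if not progressed:
            break  # every candidate was rejected; avoid looping forever
    unresolved = [pid for pid in polygon_ids if pid not in confirmed]
    return confirmed, unresolved


# Toy example: polygon 2 only becomes confident once polygon 1 is confirmed
def evaluate(pid, confirmed):
    if pid == 1:
        return ("hospital", 0.9)
    if pid == 2:
        return ("university", 0.9 if 1 in confirmed else 0.5)
    return ("unknown", 0.1)


def confirm(pid, label, confidence):
    return label  # stand-in for a user who accepts every candidate


confirmed, unresolved = interactive_classification([1, 2, 3], evaluate, confirm)
```

In the toy example the second polygon crosses the threshold only in the second pass, which illustrates the chain effect the text describes; the unresolved list corresponds to the features reported at the end of the flow.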

In this embodiment, the interface that displays the classification results presents the most plausible class when a polygon is selected and displays the attribute values on which it is based, as shown for example in FIG. 10. In the example of FIG. 10, it displays "Hospital (score: 0.93), text 'XX Hospital' present, heliport present, university nearby". It should also have a mechanism for visualizing the positional relationships with the other features used in the detection. For example, if an extremely large number of cars are parked near a port, the adjacent building is a trade-related facility; a mechanism for visualizing such relationships between features that act as associative conditions is desirable. In the example of FIG. 10, it displays "University (score: 0.63), text 'XX School' present, pool present, hospital nearby". When the classification result of a polygon is selected, in addition to displaying the class and attribute values, related polygons should be linked with lines, shown in pop-ups, or made to blink. After all building and road surface polygons have been evaluated, if there are features that are closely related to the features classified in the area but were not found, a report should be presented, with reference to past cases, together with the attribute information that was lacking (S807, S808). The report screen may be displayed in a window separate from the candidate presentation screen, in a separate frame or tab within the same window, or output as a log.

FIG. 11 shows a hardware configuration for implementing the present invention. The CPU 53 starts the program of the present invention and executes the classification of satellite images. Image data used in the computation, such as satellite images, is held temporarily in the memory 55; when high-speed computation is needed, the data may be transferred to the memory of the GPU 54 and the GPU's computing support used. The classification results are displayed on the display device 51. Based on the displayed results, the user checks their validity and enters an evaluation through the user interface 56. When the evaluation is received, the attribute table on the storage device 52 is updated. The communication interface 57 may be used when the system is configured from multiple computers or when the actual computation is executed on cloud instances.

The feature classification system of the present invention can be realized on a computer by having the CPU load a predetermined program into memory and execute the loaded program. The program may be read from a storage device in which it is stored, or received from a network via a communication device, and either loaded directly into memory or first stored in an external storage device and then loaded into memory.

The program invention of the present invention is a program that is installed on a computer in this way and causes the computer to operate as a feature classification system. Installing the program of the present invention on a computer configures the feature classification system shown in the functional block diagram of FIG. 1B.

Although the above embodiment has been described using satellite images as an example, the present invention is not limited to satellite images and can be used with aerial images in general that capture the ground from above, for example from an aircraft.

According to this embodiment, assisting the labeling in feature classification reduces the number of work hours required.

When large volumes of aerial images become available, a method of classifying features in aerial images that incorporates three-dimensional information can be provided.

11 Image input unit
12 Road surface polygon extraction unit
13 Building polygon extraction unit
14 Shaped polygon acquisition unit
15 Attribute table creation unit
16 Classification unit
17 Attribute table
18 User interface
51 Display device
52 Storage device
53 CPU
54 GPU
55 Memory
56 User interface
57 Communication interface

Claims

A feature classification system that classifies features using a plurality of aerial images, comprising:
a road surface polygon extraction unit that extracts road surface polygons by integrating extraction results recognized at multiple resolutions;
a building polygon extraction unit that extracts building polygons using the road surface polygons and three-dimensional information obtained by multi-view stereo from a plurality of aerial image pairs;
an attribute table creation unit that extracts image features and geometric information of each polygon for classifying the building polygons and the road surface polygons and stores them in an attribute table; and
a classification unit that classifies the building polygons using the attribute table, further evaluates, for polygons classified with high confidence, their relationships with other polygons, adds to and updates the attribute table, and repeats the classification process.

The feature classification system according to claim 1, further comprising:
a shaped polygon acquisition unit that, for the group of small polygons constituting the extracted building polygons and road surface polygons, sets a fixed rectangular area so as to include each small polygon, cuts that area out of the aerial image, extracts edges, and obtains shaped polygons using the edge information.

The feature classification system according to claim 2,
wherein the shaped polygon acquisition unit takes an angle histogram of the edges, forms a straight-line grid using the angle with the highest frequency, superimposes the straight-line grid on data obtained by binary rasterization of the small polygon, selects the grid cells to keep according to the occupancy ratio within the grid, and thereby obtains the shaped polygons.

The feature classification system according to claim 1,
wherein the attribute table consists of:
a building attribute table having attribute values of the building polygons; and
a road surface attribute table having attribute values of the road surface polygons.

The feature classification system according to claim 1,
wherein the attribute table assigns an ID to each polygon and includes: geometric information of each polygon such as area, height, and shape; image features obtained by image processing such as color, vegetation index, and texture; the estimated class resulting from classification by deep learning; and the positional relationships of feature polygons obtained by the classification unit.

The feature classification system according to claim 2,
further comprising a user interface,
wherein the user interface displays, in stages, the shape of the shaped polygons obtained by the shaped polygon acquisition unit, and accepts vertex editing and line correction by user operation.

The feature classification system according to claim 1,
further comprising a user interface,
wherein the user interface displays the classification results obtained by the classification unit, and accepts additions, deletions, and modifications of the classification results by user operation.

The feature classification system according to claim 1,
further comprising a user interface,
wherein the user interface displays each classification result obtained by the classification unit together with the one or more attribute values on which it is based.

The feature classification system according to claim 1,
further comprising a user interface,
wherein, when the classification result for a polygon obtained by the classification unit was calculated using its positional relationship with other polygons, the user interface emphasizes that relationship by adding straight lines connecting the polygons or by making the polygons blink.

A feature classification method for classifying features using a plurality of aerial images, the method comprising:
a road surface polygon extraction step of extracting road surface polygons by integrating extraction results recognized at multiple resolutions;
a building polygon extraction step of extracting building polygons using the road surface polygons and three-dimensional information obtained by multi-view stereo from a plurality of aerial image pairs;
an attribute table creation step of extracting image features and geometric information of each polygon for classifying the building polygons and the road surface polygons and storing them in an attribute table; and
a classification processing step of classifying buildings using the attribute table, further evaluating, for polygons classified with high confidence, their relationships with other polygons, adding to and updating the attribute table, and repeating the classification process.

The method for classifying features according to claim 10, further comprising:
a shaped polygon acquisition step of, for the group of small polygons constituting the extracted building polygons and road surface polygons, setting a fixed rectangular area so as to include each small polygon, cutting that area out of the aerial image, extracting edges, and obtaining shaped polygons using the edge information.

The method for classifying features according to claim 11,
wherein the shaped polygon acquisition step includes:
cutting the surroundings of the polygon before shaping out of the aerial image, taking an angle histogram of the edges, forming a straight-line grid using the angle with the highest frequency, superimposing the straight-line grid on data obtained by binary rasterization of the small polygon, selecting the grid cells to keep with an occupancy ratio of 1 within the grid, and obtaining the shaped polygons.

The method for classifying features according to claim 10,
wherein the attribute table includes a building attribute table having attribute values of the building polygons and a road surface attribute table having attribute values of the road surface polygons.

The method for classifying features according to claim 11,
wherein the attribute table assigns an ID to each polygon obtained in the shaped polygon acquisition step and includes: geometric information such as area, height, and shape; image features obtained by image processing such as color, vegetation index, and texture; the estimated class resulting from classification by machine learning; and the positional relationships of feature polygons obtained in the classification processing step.

A program that causes a computer to execute a classification method for classifying features using a plurality of aerial images, the method comprising:
a road surface polygon extraction step of extracting road surface polygons by integrating extraction results recognized at multiple resolutions;
a building polygon extraction step of extracting building polygons using the road surface polygons and three-dimensional information obtained by multi-view stereo from a plurality of aerial image pairs;
a shaped polygon acquisition step of, for the group of small polygons constituting the extracted building polygons and road surface polygons, setting a fixed rectangular area so as to include each small polygon, cutting that area out of the aerial image, extracting edges, and obtaining shaped polygons using the edge information;
an attribute table creation step of extracting image features and geometric information for classifying the building polygons and the road surface polygons and storing them in an attribute table; and
a classification processing step of classifying buildings using the attribute table, further evaluating, for polygons classified with high confidence, their relationships with other polygons, adding to and updating the attribute table, and repeating the classification process.