JP2024002745A

JP2024002745A - Information processing device, area division method, and program

Info

Publication number: JP2024002745A
Application number: JP2022102136A
Authority: JP
Inventors: Zhao Wang; Yusuke Nakano; Yubo Wang; Atsushi Otani; Katsuya Hasegawa
Original assignee: Waseda University; Nippon Telegraph and Telephone Corp
Current assignee: Waseda University; Nippon Telegraph and Telephone Corp
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2024-01-11

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device, an area division method and a program capable of dividing an area of traffic congestion on a road from an image of an aerial photograph.

SOLUTION: An information processing device includes: an aerial photograph collection unit 110 that acquires an image photographed from the sky; an algorithm calculation unit 120 that simultaneously segments a road area and a congestion area from the image by using a neural network model; an output processing unit 130 that outputs the segmentation result obtained by the algorithm calculation unit 120; and a learning unit 140 that inputs aerial photograph images to the model and adjusts the parameters of the model so as to minimize the error between the output of the model and the correct answer.

SELECTED DRAWING: Figure 1

Description

The present invention relates to technology for extracting road information from aerial photograph images.

By detecting traffic congestion and estimating traffic density from aerial photograph images, real-time traffic status information can be provided to city monitoring systems and drivers. Such traffic status information can be used, for example, to determine an appropriate driving route.

Non-Patent Document 2, a conventional technique for detecting traffic congestion, treats congestion detection as a classification problem by estimating traffic density from images taken by cameras installed at intersections.

Non-Patent Document 3 collects data through an open-source application programming interface (API) provided by the Land Transport Authority (LTA) and proposes a CNN (convolutional neural network) for estimating traffic density.

However, since the images used in Non-Patent Documents 2 and 3 were taken by traffic cameras (cameras installed at intersections and the like), they can only provide road condition information for a very small area.

In addition, many methods based on semantic segmentation have been proposed for extracting roads from aerial photograph images. For example, Non-Patent Document 1 discloses a technique that decomposes road extraction into three interrelated subtasks: road surface segmentation, road edge detection, and road centerline extraction. However, although these methods can extract the road network, they cannot segment areas of traffic congestion from aerial photograph images.

Y. Liu, J. Yao, X. Lu, M. Xia, X. Wang, and Y. Liu, "RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-Resolution Remotely Sensed Images," IEEE Transactions on Geoscience and Remote Sensing, 57(4):2043-2056, 2019.

J. Nubert, N. G. Truong, A. Lim, H. I. Tanujaya, L. Lim, and M. A. Vu, "Traffic Density Estimation using a Convolutional Neural Network," arXiv preprint arXiv:1809.01564, 2018.

M. Hasan, S. Das, and M. N. T. Akhand, "Estimating Traffic Density on Roads using Convolutional Neural Network with Batch Normalization," 2021 5th International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), 2021, pp. 1-6, doi: 10.1109/ICEEICT53905.2021.9667860.

The present invention has been made in view of the above points, and its object is to provide a technique that makes it possible to segment areas of traffic congestion on a road from an aerial photograph image.

According to the disclosed technology, there is provided an information processing device comprising:
an acquisition unit that acquires an image taken from above;
a calculation unit that simultaneously segments a road area and a traffic congestion area from the image using a neural network model; and
an output processing unit that outputs the segmentation result obtained by the calculation unit.

According to the disclosed technology, a technique is provided that makes it possible to segment areas of traffic congestion on a road from an aerial photograph image.

FIG. 1 is a configuration diagram of an information processing device in an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the flow of processing.
FIG. 3 is a diagram showing the overall configuration of the context-enhanced traffic segmentation model.
FIG. 4 is a diagram showing the configuration of the original traffic module.
FIG. 5 is a diagram showing the configuration of the attention calculation block.
FIG. 6 is a diagram showing an example of the hardware configuration of the device.

An embodiment of the present invention (the present embodiment) will be described below with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to it.

Note that in this specification and the claims, "division," "segmenting," "extraction," "partitioning," "classification," and "segmentation" may be used interchangeably. In other words, each of these terms, wherever it appears in the specification or the claims, may be replaced with any of the others.

Furthermore, the term "aerial photograph" (空中写真) may be replaced with its synonym 航空写真, which also translates as "aerial photograph." An aerial photograph image may also be called an "aerial image" (空中画像 or 航空画像). An aerial photograph is a photograph of the ground taken from a flying object in the sky, and the flying object is not limited to a specific type. For example, it may be an airplane, a satellite, or a drone.

Furthermore, "to enhance" means, for example, to increase the accuracy with which congested areas are segmented, or to clarify the boundary between congested areas and non-congested areas.

(Device configuration example, operation example)
FIG. 1 shows a configuration example of an information processing device 100 in the present embodiment. As shown in FIG. 1, the information processing device 100 includes an aerial photograph collection unit 110, an algorithm calculation unit 120, an output processing unit 130, and a learning unit 140. The aerial photograph collection unit 110 may also be called an acquisition unit, and the algorithm calculation unit 120 may also be called a calculation unit.

The flow of processing at inference time (test time) by the information processing device 100 will be described with reference to the flowchart of FIG. 2. In S101, the aerial photograph collection unit 110 acquires photographs or videos taken by a drone, satellite, aircraft, or the like. These photographs and videos are collectively referred to as "aerial photographs." The aerial photograph image acquired by the aerial photograph collection unit 110 is input to the algorithm calculation unit 120.

The algorithm calculation unit 120 has a neural network model (an end-to-end model), which will be described later. The model is assumed here to have already been trained. In S102, the algorithm calculation unit 120 inputs the aerial photograph image to the model and obtains, as the output of the model, an image in which the road area and the congestion area are segmented (that is, the road area is divided into congested and non-congested parts).

In S103, the output processing unit 130 may output the image obtained by the algorithm calculation unit 120 (the image in which the road area and the congestion area are segmented) as is, or may process the image and output the processed image. For example, the output processing unit 130 can extract and output only the congested area on the road from the image obtained by the algorithm calculation unit 120.
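The extraction of only the congested area in S103 can be sketched as a simple mask operation. This is a minimal illustration, assuming the segmentation result is available as an integer label map; the label ids `BACKGROUND`, `ROAD_FREE`, and `ROAD_CONGESTED` and the function name `congestion_only` are hypothetical and do not appear in the source.

```python
import numpy as np

# Hypothetical label ids; the source does not define a numeric encoding.
BACKGROUND, ROAD_FREE, ROAD_CONGESTED = 0, 1, 2

def congestion_only(label_map):
    """Keep congested-road pixels and reset everything else to background."""
    label_map = np.asarray(label_map)
    return np.where(label_map == ROAD_CONGESTED, ROAD_CONGESTED, BACKGROUND)

# Toy 3x3 segmentation result standing in for the model output.
segmented = np.array([[0, 1, 2],
                      [1, 2, 2],
                      [0, 0, 1]])
jam_only = congestion_only(segmented)
```

An output processing step of this kind leaves the model output untouched and only changes what is presented downstream.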

When training the model, a large number of aerial photograph images and their label data (for example, images labeled with road, congestion, and non-congestion) are used. The learning unit 140 inputs an aerial photograph image to the model and adjusts the parameters (weights) of the model so that the error between the output of the model and the correct answer is minimized.
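The idea of adjusting parameters to minimize the error against the labels can be illustrated on a toy problem. The sketch below is not the training procedure of the source, which does not name a loss function or optimizer; it assumes a per-pixel binary cross-entropy loss and plain gradient descent on a two-parameter logistic model, and all names (`pixel_bce`, `fit`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_bce(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a probability map and labels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def fit(x, y, lr=0.5, steps=200):
    """Gradient descent on a toy per-pixel model p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad = sigmoid(w * x + b) - y      # d(loss)/d(logit) for sigmoid + BCE
        w -= lr * float(np.mean(grad * x))
        b -= lr * float(np.mean(grad))
    return w, b

x = np.array([[0.0, 1.0], [0.0, 1.0]])     # toy input feature map
y = np.array([[0.0, 1.0], [0.0, 1.0]])     # toy "congested" labels
loss_before = pixel_bce(sigmoid(0.0 * x + 0.0), y)
w, b = fit(x, y)
loss_after = pixel_bce(sigmoid(w * x + b), y)
```

In a real setting the same loop would run over the weights of the full network, typically through an automatic differentiation framework.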

Note that the device that performs learning (the device that includes the learning unit 140) and the device that performs inference may be separate devices. In this case, the device that performs learning may be called a learning device, and the device that performs inference need not include the learning unit 140. The configuration and operation of the algorithm calculation unit 120 are described in detail below.

The algorithm calculation unit 120 has a neural network model. With this model, the road surface and the congested area on the road surface can be segmented simultaneously in an aerial photograph image.

In aerial photograph images, segmenting vehicles is generally very difficult because the size (scale) of a vehicle seen from above is smaller than the size of the road surface seen from above. This is called the scale variation problem. For this reason, with conventional technology it is very difficult to accurately segment the boundary between a congested area and a non-congested area.

The model constituting the algorithm calculation unit 120 according to the present embodiment solves the above problem and can accurately segment congested areas from an aerial photograph image.

The configuration and operation of the context-enhanced traffic segmentation model in the present embodiment are described in detail below. For convenience, the context-enhanced traffic segmentation model may be referred to simply as the "model."

(Overall configuration of the model)
FIG. 3 shows an example of the overall configuration of the context-enhanced traffic segmentation model. As shown in FIG. 3, the model includes a feature pyramid network (FPN) 210, an original traffic module 220, and a context attention module 230.

The context attention module 230 includes a global context generator 240 and an attention computation block 250.

Since multi-level prediction is effective against the scale variation problem, the aerial photograph image is first input to the feature pyramid network 210, which performs multi-level prediction. The feature pyramid network 210 generates from the input image a feature pyramid consisting of five feature maps (P2 to P6) of different scales. The P6 feature map contains the highest-level semantic information.
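To make the five scales concrete, the sketch below lists plausible shapes for P2 to P6. The source states only that the maps differ in scale and (elsewhere) that each has 256 channels; the strides 4, 8, 16, 32, and 64 are an assumption borrowed from common FPN configurations, and `pyramid_shapes` is a hypothetical helper.

```python
import math

def pyramid_shapes(h, w, channels=256, strides=(4, 8, 16, 32, 64)):
    """Return {level: (channels, height, width)} for P2..P6 under assumed strides."""
    return {
        f"P{level}": (channels, math.ceil(h / s), math.ceil(w / s))
        for level, s in zip(range(2, 7), strides)
    }

shapes = pyramid_shapes(512, 512)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumed strides, P2 keeps the finest spatial detail while P6 is the coarsest map, which is consistent with it carrying the highest-level semantic information.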

The five feature maps (P2 to P6) are input to the original traffic module 220, which generates segmentation results for the road surface and for traffic congestion (the original traffic jam map).

The P6 feature map is also input to the global context generator 240, which generates a global context feature from it.

The global context feature and the original congestion segmentation result are input to the attention computation block 250, which outputs an enhanced (higher-quality) congestion segmentation (the congested region).

The final output is obtained by combining the enhanced congestion segmentation with the road surface segmentation obtained by the original traffic module 220.

(Original traffic module 220)
Next, the original traffic module 220 will be described. To segment a congested area on a road, groups of vehicles on the road must be segmented. However, when viewed from above, vehicles are small relative to the road, which causes the scale variation problem. The original traffic module 220 has a multi-scale feature fusion network configuration to address this problem.

FIG. 4 shows a configuration example of the original traffic module 220. As shown in FIG. 4, the original traffic module 220 includes convolution layers 221 and a fusion layer 222. The convolution layers 221 include three consecutive 3×3 convolution layers for each feature map.

As shown in FIG. 4, each feature map $P \in \mathbb{R}^{256 \times H \times W}$ (size: 256×H×W) is input to the convolution layers 221. The same convolution processing is applied to every feature map and produces a new feature map $\tilde{P} \in \mathbb{R}^{1 \times H \times W}$. In the original plain-text specification, an accent that belongs above a letter is written before the letter (for example, "^~P" for $\tilde{P}$).

Next, each feature map is resized by bilinear interpolation to the scale of the original input, $h \times w$. The five feature maps are then fused by concatenation, and the fused feature map is input to one 3×3 convolution layer (fusion layer 222) to obtain the original segmentation result, consisting of a traffic congestion map $A \in \mathbb{R}^{1 \times h \times w}$ and a road surface map $B \in \mathbb{R}^{1 \times h \times w}$.
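The resize-and-fuse step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module of the source: the bilinear resize uses corner-aligned sampling (the source does not specify the convention), and the learned 3×3 fusion convolution is replaced by a per-pixel linear mix with a hypothetical weight matrix `weights` of shape (2, 5), one row each for the congestion map A and the road surface map B.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Resize a 2-D map with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = x.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]          # vertical interpolation weights
    wx = (xs - x0)[None, :]          # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def fuse(maps, out_h, out_w, weights):
    """Resize each single-channel map to (out_h, out_w), stack, and mix per pixel."""
    stacked = np.stack([bilinear_resize(m, out_h, out_w) for m in maps])  # (5, h, w)
    return np.tensordot(weights, stacked, axes=1)                          # (2, h, w)

# Five toy single-channel maps at decreasing resolution, standing in for P2..P6
# after the convolution layers 221.
maps = [np.full((s, s), float(i)) for i, s in enumerate([16, 8, 4, 2, 1])]
A_and_B = fuse(maps, 16, 16, np.ones((2, 5)))
```

Replacing the per-pixel mix with a 3×3 convolution would let the fusion also use neighboring pixels, which is what the fusion layer 222 in the source does.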

(Context attention module 230)
Next, the context attention module 230 will be described.

In general, the boundary between a congested area on a road and other areas is ambiguous in an aerial photograph image, so it is difficult to separate the congested area explicitly from the passable road area. The context attention module 230 addresses this problem and clarifies the boundary between congested areas and other areas. It does so by using the global context feature map to refine the original congestion map obtained by the original traffic module 220.

The global context generator 240 includes a pyramid pooling module (PPM). As mentioned above, the feature map P6 carries the strongest semantic information, and the global context generator 240 takes the feature map P6 as input.

That is, the global context generator 240 first applies the pyramid pooling module (PPM) to the P6 feature map to further exploit region representations and context dependencies. The global context generator 240 thereby obtains a global context feature map $C \in \mathbb{R}^{1 \times h \times w}$.

In the pyramid pooling module, the input is pooled over multiple grids whose sizes form a pyramid-like hierarchy. This yields coarse global context information indicating how strongly the features of each class are present in each grid cell.
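Pyramid pooling can be sketched for a single-channel map as follows. The grid sizes 1, 2, 3, and 6 are an assumption taken from common PPM configurations, since the source does not list them; nearest-neighbor upsampling is likewise an assumed choice, and the function names are hypothetical. The reduction of the stacked levels to the single-channel map C is omitted.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average a 2-D map over a bins x bins grid of roughly equal cells."""
    H, W = x.shape
    out = np.empty((bins, bins))
    for i in range(bins):
        r0, r1 = (i * H) // bins, ((i + 1) * H + bins - 1) // bins
        for j in range(bins):
            c0, c1 = (j * W) // bins, ((j + 1) * W + bins - 1) // bins
            out[i, j] = x[r0:r1, c0:c1].mean()
    return out

def pyramid_pool(x, grid_sizes=(1, 2, 3, 6)):
    """Pool at each grid size, upsample back to x's size, and stack the levels."""
    H, W = x.shape
    rows, cols = np.arange(H), np.arange(W)
    levels = []
    for bins in grid_sizes:
        pooled = adaptive_avg_pool(x, bins)
        # Nearest-neighbor upsampling: each pixel takes the value of its grid cell.
        levels.append(pooled[(rows * bins) // H][:, (cols * bins) // W])
    return np.stack(levels)            # (len(grid_sizes), H, W)

feature = np.arange(36.0).reshape(6, 6)   # toy stand-in for one channel of P6
context = pyramid_pool(feature)
```

The 1×1 level carries a single global average, while finer grids keep progressively more spatial detail, which is the coarse-to-fine context the paragraph above describes.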

The global context feature map and the original congestion map are input to the attention computation block 250.

FIG. 5 shows the processing configuration of the attention computation block 250. The neural network is configured so that this processing can be performed.

As shown in FIG. 5, the original congestion map $A$ and the global context feature map $C$ are first each downsampled to obtain $\{\bar{A}, \bar{C}\} \in \mathbb{R}^{1 \times h/4 \times w/4}$. These are then reshaped into two new feature maps $\{\tilde{A}, \tilde{C}\} \in \mathbb{R}^{1 \times n}$, where $n = \frac{h}{4} \times \frac{w}{4}$ is the number of pixels in the feature map.

As shown in FIG. 5 and Equation (1) below, matrix multiplication is performed between $\tilde{A}$ and the transpose of $\tilde{C}$, and a softmax layer computes the context attention map $S \in \mathbb{R}^{n \times n}$.

$$S = \mathrm{Softmax}(\tilde{C}^{T} \times \tilde{A}) \quad (1)$$

To enhance (clarify) the congested area using context information, as shown in FIG. 5 and Equation (2) below, $\tilde{A}$ is multiplied by the context attention map $S$, and the product is reshaped to $\mathbb{R}^{1 \times h/4 \times w/4}$. Then $\bar{A}$ is added to obtain the context-enhanced congestion map $\tilde{A}^{S} \in \mathbb{R}^{1 \times h/4 \times w/4}$. Here, $\alpha$ is a learnable weight parameter initialized to 0.

$$\tilde{A}^{S} = \alpha \, \mathrm{Reshape}(\tilde{A} \times S) + \bar{A} \quad (2)$$

Then $\tilde{A}^{S}$ is resized to obtain the final enhanced congestion segmentation result $A^{S} \in \mathbb{R}^{1 \times h \times w}$.

This concludes the processing of the context attention module 230. Finally, the enhanced congestion segmentation and the original road surface segmentation are combined and input to a 3×3 convolution layer to generate the final congestion segmentation result. In the final result, for example, the road area is delineated on the aerial photograph image, and the congested and non-congested parts of that road area are shown separately.
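The form of the final result can be illustrated by turning the two maps into a single label image. The source performs this combination with a learned 3×3 convolution layer; the sketch below swaps in a fixed threshold rule purely for illustration, and the label ids, the threshold of 0.5, and the name `to_label_map` are hypothetical.

```python
import numpy as np

# Hypothetical label ids; the source does not define a numeric encoding.
BACKGROUND, ROAD_FREE, ROAD_CONGESTED = 0, 1, 2

def to_label_map(road_prob, jam_prob, thresh=0.5):
    """Combine a road map and a congestion map into one label image."""
    road = road_prob >= thresh
    jam = (jam_prob >= thresh) & road          # congestion only counts on the road
    labels = np.full(road_prob.shape, BACKGROUND, dtype=np.uint8)
    labels[road] = ROAD_FREE
    labels[jam] = ROAD_CONGESTED
    return labels

road_prob = np.array([[0.9, 0.9], [0.1, 0.8]])
jam_prob = np.array([[0.7, 0.2], [0.9, 0.6]])
labels = to_label_map(road_prob, jam_prob)
```

Restricting congestion to road pixels mirrors the description of the output, in which congested and non-congested parts are both subdivisions of the road area.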

(Summary of the context-enhanced traffic segmentation model)
As described above, in the present embodiment the context-enhanced traffic segmentation model segments traffic congestion and the road surface from an aerial photograph image in an end-to-end manner. The "context-enhanced traffic segmentation model" is a traffic segmentation model whose performance is enhanced by context.

The model in the present embodiment consists of two modules (the original traffic module 220 and the context attention module 230) that make it possible to segment traffic congestion and the road surface explicitly.

The original traffic module 220 addresses the scale variation problem in aerial photograph images. It takes the multi-scale feature maps from the feature pyramid and extracts further features with the convolution layers 221. The fusion layer 222 then fuses the features at the different scales to obtain the original (initial) segmentation of traffic congestion and the road surface.

The context attention module 230 enhances the boundaries of traffic congestion. It consists of the attention computation block 250 and the corresponding global context generator 240. The feature map at the top level of the feature pyramid contains the strongest semantic information, so it is input to the global context generator 240, which obtains a global context map through a pyramid pooling operation. The attention computation block 250 then computes an attention map between the global context map and the original congestion segmentation. Finally, the attention map is used to sharpen (clarify) the boundaries of traffic congestion and obtain the final congestion segmentation result.
As a result, the road surface and the congested area can be segmented simultaneously and accurately. This also addresses the scale problem in aerial photograph images, in which vehicles are smaller than the road surface seen from the air and are therefore very difficult to segment.

(Hardware configuration example)
The information processing device 100 can be implemented, for example, by causing a computer to execute a program. The computer may be a physical computer or a virtual machine in the cloud.

That is, the information processing device 100 can be realized by using hardware resources such as a CPU and memory built into a computer to execute a program corresponding to the processing performed by the information processing device 100. The program can be recorded on a computer-readable recording medium (such as a portable memory) and stored or distributed. The program can also be provided over a network, such as the Internet or e-mail.

FIG. 6 is a diagram showing an example of the hardware configuration of the computer. The computer in FIG. 6 includes a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and the like, which are interconnected by a bus BS.

A program that implements the processing on the computer is provided, for example, on a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. However, the program does not necessarily need to be installed from the recording medium 1001 and may instead be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is given, the memory device 1003 reads the program from the auxiliary storage device 1002 and stores it. The CPU 1004 implements the functions of the information processing device 100 according to the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network, various measuring devices, exercise intervention devices, and the like. The display device 1006 displays a GUI (Graphical User Interface) and the like provided by the program. The input device 1007 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operation instructions. The output device 1008 outputs the calculation results.

(Effects of the embodiment)
With the technology according to the present embodiment, road areas and congested areas can be output automatically, instead of having a person extract road and congestion information from images by eye as in the past. In addition, regions can be segmented from an image with high accuracy because the scale variation problem in images (small cars, large roads, and so on) is taken into account. The boundary lines that separate the regions can also be displayed smoothly.

Furthermore, the segmentation map obtained by the technology according to the present embodiment makes it possible to grasp visually where congestion is located and what proportion of the road area the congested area occupies. Because passable places can be identified even when there is congestion, it is possible to judge, for example, whether an emergency vehicle can pass. In these respects the technology is superior to conventional congestion detection and density estimation.
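The proportion of the road area occupied by congestion can be computed directly from the segmentation masks. This is a minimal sketch, assuming boolean masks for the road and congested areas are available; the name `congestion_ratio` is hypothetical.

```python
import numpy as np

def congestion_ratio(road_mask, jam_mask):
    """Fraction of road pixels that are congested (0.0 if there is no road)."""
    road = np.asarray(road_mask, dtype=bool)
    jam = np.asarray(jam_mask, dtype=bool) & road   # ignore congestion off the road
    n_road = int(road.sum())
    return float(jam.sum() / n_road) if n_road else 0.0

road_mask = [[1, 1, 1, 1],
             [0, 0, 0, 0]]
jam_mask = [[1, 1, 0, 0],
            [1, 0, 0, 0]]
ratio = congestion_ratio(road_mask, jam_mask)
```

A per-pixel ratio like this depends on image resolution and viewing angle, so comparing values across images would need some normalization that the source does not discuss.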

(Supplementary notes)
Regarding the above embodiment, the following supplementary notes are further disclosed.
(Supplementary note 1)
An information processing device comprising:
a memory; and
a processor,
wherein the processor
acquires an image taken from above,
simultaneously segments a road area and a congestion area from the image using a neural network model, and
outputs the obtained segmentation result.
(Supplementary note 2)
The information processing device according to supplementary note 1, wherein the model comprises:
a first module that segments the road area and the congestion area in the image using a plurality of feature maps obtained from the image; and
a second module that enhances the congestion area obtained by the first module using a specific feature map among the plurality of feature maps.
(Supplementary note 3)
The information processing device according to supplementary note 2, wherein the plurality of feature maps are generated by a feature pyramid network included in the model, and the specific feature map is the highest-level feature map among the plurality of feature maps.
(Supplementary note 4)
The information processing device according to supplementary note 2 or 3, wherein the second module comprises:
a context generator that generates a global context from the specific feature map; and
an attention calculation block that uses the global context and the congestion area obtained by the first module to generate a congestion area with higher accuracy than that congestion area.
(Supplementary note 5)
An area segmentation method executed by an information processing device, the method comprising:
an acquisition step of acquiring an image taken from above;
a calculation step of simultaneously segmenting a road area and a congestion area from the image using a neural network model; and
an output step of outputting the segmentation result obtained in the calculation step.
(Supplementary note 6)
A non-transitory storage medium storing a program for causing a computer to function as each unit of the information processing device according to any one of supplementary notes 1 to 4.
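The second module described in the notes above (a context generator feeding an attention calculation block) can be sketched roughly as follows. This is only an illustration under stated assumptions: global average pooling as the context generator, sigmoid channel gating as the attention calculation, and a multi-channel congestion feature from the first module are all hypothetical choices that the notes do not confirm. The names `global_context` and `attention_refine` are likewise invented for this example.

```python
import math
from typing import List

Map2D = List[List[float]]  # one channel, H x W


def global_context(top_feature: List[Map2D]) -> List[float]:
    """Context generator (assumed): global average pooling per channel
    over the highest-level feature map."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in top_feature
    ]


def attention_refine(coarse: List[Map2D], context: List[float]) -> List[Map2D]:
    """Attention calculation (assumed): scale each channel of the coarse
    congestion features by sigmoid(context value for that channel)."""
    gates = [1.0 / (1.0 + math.exp(-c)) for c in context]
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(coarse, gates)
    ]


# Toy inputs: a 2-channel top-level feature map and 2-channel coarse
# congestion features taken as given from the first module.
top = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
coarse = [[[2.0, 2.0]], [[4.0, 8.0]]]

context = global_context(top)        # one context value per channel
refined = attention_refine(coarse, context)
print(context)
print(refined[1])
```

The design intent illustrated here is that image-wide information from the highest pyramid level re-weights the locally computed congestion response; the embodiment may use a different attention formulation.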

Although the present embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

100 Information processing device
110 Aerial photograph collection unit
120 Algorithm calculation unit
130 Output processing unit
140 Learning unit
210 Feature pyramid network
220 Original traffic module
221 Convolution layer
222 Fusion layer
230 Context attention module
240 Global context generator
250 Attention calculation block
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims

1. An information processing device comprising:
an acquisition unit that acquires an image taken from above;
a calculation unit that simultaneously segments a road area and a congestion area from the image using a neural network model; and
an output processing unit that outputs the segmentation result obtained by the calculation unit.

2. The information processing device according to claim 1, wherein the model comprises:
a first module that segments the road area and the congestion area in the image using a plurality of feature maps obtained from the image; and
a second module that enhances the congestion area obtained by the first module using a specific feature map among the plurality of feature maps.

3. The information processing device according to claim 2, wherein the plurality of feature maps are generated by a feature pyramid network included in the model, and the specific feature map is the highest-level feature map among the plurality of feature maps.

4. The information processing device according to claim 2, wherein the second module comprises:
a context generator that generates a global context from the specific feature map; and
an attention calculation block that uses the global context and the congestion area obtained by the first module to generate a congestion area with higher accuracy than that congestion area.

5. An area segmentation method executed by an information processing device, the method comprising:
an acquisition step of acquiring an image taken from above;
a calculation step of simultaneously segmenting a road area and a congestion area from the image using a neural network model; and
an output step of outputting the segmentation result obtained in the calculation step.

6. A program for causing a computer to function as each unit of the information processing device according to any one of claims 1 to 4.
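For orientation, the acquisition, calculation, and output units recited in the claims can be pictured as three functions chained together. The sketch below is not the claimed implementation: `stub_model` is a hypothetical threshold-based stand-in for the neural network model, the function names are invented, and the output label encoding (0 = background, 1 = road, 2 = congestion) is an assumption.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]
# A model returns both masks from a single call, mirroring "simultaneously".
Model = Callable[[Image], Tuple[Mask, Mask]]


def acquire(source: Callable[[], Image]) -> Image:
    """Acquisition unit: obtain an image taken from above."""
    return source()


def compute(model: Model, image: Image) -> Tuple[Mask, Mask]:
    """Calculation unit: get the road and congestion masks in one pass."""
    return model(image)


def output(road: Mask, congestion: Mask) -> List[List[int]]:
    """Output processing unit: merge the masks into one label map
    (assumed encoding: 0 = background, 1 = road, 2 = congestion)."""
    return [
        [2 if c else (1 if r else 0) for r, c in zip(road_row, cong_row)]
        for road_row, cong_row in zip(road, congestion)
    ]


def stub_model(image: Image) -> Tuple[Mask, Mask]:
    """Hypothetical stand-in for the neural network: simple thresholds."""
    road = [[v > 0.3 for v in row] for row in image]
    congestion = [[v > 0.7 for v in row] for row in image]
    return road, congestion


def run_pipeline(source: Callable[[], Image], model: Model) -> List[List[int]]:
    image = acquire(source)
    road, congestion = compute(model, image)
    return output(road, congestion)


toy_image = [[0.1, 0.5], [0.9, 0.2]]
print(run_pipeline(lambda: toy_image, stub_model))  # prints [[0, 1], [2, 0]]
```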