JP6797860B2

JP6797860B2 - Water intrusion detection system and its method

Info

Publication number: JP6797860B2
Application number: JP2018088946A
Authority: JP
Inventors: 純一富樫; 渡伊藤; 正也岡田; 一成岩永; 幸藤井
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2018-05-02
Filing date: 2018-05-02
Publication date: 2020-12-09
Anticipated expiration: 2036-12-09
Also published as: JP2018152106A

Description

The present invention relates to a surveillance system that detects ships, swimmers, drifting objects, and the like on the water from the video of surveillance cameras installed on the coast.

In recent years, video cameras with excellent sensitivity and resolution have become available at low cost, making it possible to image a wide area such as the sea with a realistic number of cameras and to capture images of the objects in it.

Techniques are known for detecting ships and the like that have entered a monitored area at sea without being affected by waves (see, for example, Patent Documents 1 to 6).

As a technique related to the present invention, estimating the phase velocity, period, wave height, and so on of waves on the water surface from camera video is also known (see, for example, Non-Patent Documents 2 and 3).

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-181795
Patent Document 2: Japanese Patent No. 4117073
Patent Document 3: Japanese Patent No. 3997062
Patent Document 4: International Publication No. 10/084902 pamphlet
Patent Document 5: Japanese Unexamined Patent Publication No. 2002-279429
Patent Document 6: Japanese Patent No. 4302801
Patent Document 7: Japanese Patent No. 5709255
Patent Document 8: Japanese Patent No. 4476517
Patent Document 9: Japanese Patent No. 4921857
Patent Document 10: Japanese Patent No. 5021913

Non-Patent Document 1: Gunnar Farneback, “Two-frame motion estimation based on polynomial expansion”, Scandinavian Conference on Image Analysis, 2003, Internet <URL: http://www.diva-portal.org/smash/get/diva2:273847/FULLTEXT01.pdf>
Non-Patent Document 2: Benetazzo, Alvise, et al., “Observation of extreme sea waves in a space-time ensemble”, Journal of Physical Oceanography, Vol. 45, No. 9, 11 September 2015, Internet <URL: http://journals.ametsoc.org/doi/pdf/10.1175/JPO-D-15-0017.1>
Non-Patent Document 3: Hiroki Hara, Ichiro Fujita, “Application of 2D Fast Fourier Transform in River Surface Flow Analysis Using Space-Time Images”, Journal of Hydraulic Engineering, Vol. 54, February 2010, Internet <URL: http://library.jsce.or.jp/jsce/open/00028/2010/54-0185.pdf>
Non-Patent Document 4: Toru Inaba and three others, “Estimation of water depth distribution by analyzing wave field images”, 55th Annual Scientific Lecture Meeting of the Japan Society of Civil Engineers, 2-5, September 2000, Internet <URL: http://library.jsce.or.jp/jsce/open/00035/2000/55-2/55-2-0005.pdf>

However, with the conventional techniques described above, when for example a ship, a swimmer (human), or a floating object (object) intrudes, the ability to distinguish whether a given object is a ship, a human, or an object has been limited. For example, inferring the type from the size, approximate shape, and speed of the object does not give sufficient accuracy. For this reason, detecting ships, humans, and objects at the same time has been considered difficult.
In addition, how far an intruding object is from land, that is, the degree of its intrusion toward land, cannot be estimated from the video alone. The only option has been to compute it from the exact camera installation position and shooting angle of view, when these are available.

In view of the above problems, an object of the present invention is to provide a monitoring system that detects, in an integrated manner, the various objects that may intrude from the sea and issues alarms.

A maritime intrusion detection system according to one embodiment detects candidates for intruding objects from the video of a visible-light camera monitoring the sea, and further derives their size, speed, direction of intrusion, linearity of motion, and the like to perform a degree of object identification. At the same time, it detects intruding objects from far-infrared camera video and distinguishes ships (high luminance), humans (medium luminance), and floating objects (same luminance as the sea) by their difference in luminance. Also at the same time, it analyzes the motion of the waves in the normal state in which no object is present. To analyze the wave motion, a Fourier transform is used to observe the periodicity of the temporal change in luminance. Compared against this normal wave motion, a ship shows almost no linkage with the waves; a human moves differently from the waves, so the linkage is relatively low; and an object such as a drum drifts almost entirely with the waves, so the linkage is relatively high. These cues are used to raise the accuracy of object identification.
A maritime intrusion detection system according to another embodiment tracks objects automatically by using a swiveling camera. During tracking, however, the angle of view changes constantly, so the position of a detected object cannot be determined directly. To measure the distance to land without having to specify in advance where the land is, the periodicity of the waves is used to derive an approximate position of the object. Closer to land the water is shallower than offshore, so the wave periodicity changes there, for example becoming disturbed and weaker. From this change, the distance of the object from land is roughly estimated and the degree of intrusion is derived automatically.

According to the present invention, ships, humans, and floating objects can be detected with high reliability.

Block diagram showing an example of the logical configuration of the monitoring system 1 of Example 1.
Functional block diagram of the sea surface condition acquirer 3 of this embodiment.
Functional block diagram of the difference-based detector 4 of this embodiment.
Functional block diagram of the silhouette detector 5 of this embodiment.
Functional block diagram of the feature-based detector 6 of this embodiment.
Functional block diagram of the tracker 7 of this embodiment.
Functional block diagram of the threat evaluator 8 of this embodiment.

The monitoring system of an embodiment of the present invention detects candidates for intruding objects from the video of a visible-light camera monitoring the sea, and further derives their size, speed, direction of intrusion, linearity of motion, and the like to perform a degree of object identification. At the same time, it detects intruding objects from far-infrared camera video and distinguishes ships (high luminance), humans (medium luminance), and floating objects (same luminance as the sea) by their difference in luminance. Also at the same time, it analyzes the motion of the waves in the normal state in which no object is present. To analyze the wave motion, a Fourier transform is used to observe the periodicity of the temporal change in luminance. Compared against this normal wave motion, a ship shows almost no linkage with the waves; a human moves differently from the waves, so the linkage is relatively low; and an object such as a drum drifts almost entirely with the waves, so the linkage is relatively high. These cues are used to raise the accuracy of object identification.
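The periodicity analysis described above can be illustrated with a minimal sketch. It assumes a single pixel's luminance sampled at a fixed frame rate and uses a naive discrete Fourier transform to find the strongest non-DC component; the function name `dominant_period`, the frame rate, and the synthetic signal are all illustrative assumptions, not details given in the description.

```python
import cmath
import math

def dominant_period(samples, frame_rate_hz):
    """Return the period in seconds of the strongest non-DC frequency, or None."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (DC) offset
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):  # frequency bins up to the Nyquist limit
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None  # perfectly flat signal: no periodicity to report
    return n / (best_k * frame_rate_hz)

# Synthetic luminance of one sea-surface pixel: 10 frames/s, 4 s wave period (assumed values).
frame_rate = 10.0
brightness = [128 + 40 * math.sin(2 * math.pi * t / (4.0 * frame_rate))
              for t in range(200)]
period = dominant_period(brightness, frame_rate)
print(period)
```

A practical system would more likely use a fast Fourier transform over many pixels or blocks; the naive transform is used here only to keep the sketch dependency-free.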

A maritime intrusion detection system according to another embodiment tracks objects automatically by using a swiveling camera.
During tracking, however, the angle of view changes constantly, so the position of a detected object cannot be determined directly. To measure the distance to land without having to specify in advance where the land is, the periodicity of the waves is used to derive an approximate position of the object. Closer to land the water is shallower than offshore, so the wave periodicity changes there, for example becoming disturbed and weaker. From this change, the distance of the object from land is roughly estimated and the degree of intrusion is derived automatically.

FIG. 1 is a block diagram showing an example of the logical configuration of the monitoring system 1 according to the first embodiment of the present invention. The monitoring system 1 of this example includes a video source 2, a sea surface condition acquirer 3, a difference-based detector 4, a silhouette detector 5, a feature-based detector 6, a tracker 7 that integrates and tracks their results, and a threat evaluator 8.
The video source 2 is a surveillance camera installed on the coast to image the sea surface, or a device that plays back its recorded video. The surveillance camera may be equipped with a motorized pan head and a motorized zoom lens. The video may come from the visible, near-infrared, or far-infrared range, and may be single-channel (grayscale) or multi-channel (color). Shake correction, heat-haze correction, automatic white balance, gradation correction, high dynamic range (HDR) compositing, and the like may also be applied. The video source 2 may output multiple versions of the video, filtered in the spatial or temporal domain, for the three subsequent detectors. It can also retain the input video of multiple past frames in freely readable form in case the input video is needed again in later processing.

As an example, the sea surface condition acquirer 3 automatically estimates, from the input video, the temporal period, spatial period, and amplitude (vertical motion) of the waves on the sea surface that forms the background. Various methods can be used for these estimates, including classical methods such as particle image velocimetry (PIV), the stereo imaging method of Non-Patent Document 2, and the space-time image method of Non-Patent Document 3. The processing need not run continuously, and part or all of it may involve human operation. Nor does it need to cover the entire image frame; it may be performed at only a few representative locations within the sea surface region. Conversely, the sea surface region within the frame can be estimated through the acquisition attempts of the sea surface condition acquirer 3.

The relationships that hold among the temporal period, amplitude, and wavenumber (phase velocity) of waves can be used to verify whether these quantities have been obtained correctly or to let them complement one another, although Non-Patent Document 2 suggests that the relationships are not necessarily followed in a simple way when multiple waves overlap. In observations of actual waves near the coast, the wave steepness (significant wave height / significant wavelength) is known to be 0.05 or less, with a distribution that, apart from rare high waves, mostly peaks near 0.01. In this example, accuracy is improved by using an amplitude obtained directly from a low-depression-angle image. Specifically, the wave image is classified into bright and dark parts, and of their apparent heights (numbers of pixels in the vertical direction of the image), one is adopted, for example the one whose mean or variance is smaller. The amplitude obtained in this way is the apparent amplitude in the image and is not affected by errors in the coordinate transformation between the camera image and the real world.
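One possible reading of the bright/dark height measurement is sketched below. It assumes a single vertical column of luminance values, a fixed bright/dark threshold, and selection of the class whose run heights have the smaller variance; the helper names and the sample column are hypothetical and serve only to make the idea concrete.

```python
from statistics import mean, pvariance

def run_lengths(column, threshold):
    """Split a non-empty vertical pixel column into bright and dark run heights."""
    bright, dark = [], []
    run_is_bright = column[0] >= threshold
    length = 0
    for value in column:
        is_bright = value >= threshold
        if is_bright == run_is_bright:
            length += 1
        else:
            # The run just ended: store its height under the right class.
            (bright if run_is_bright else dark).append(length)
            run_is_bright, length = is_bright, 1
    (bright if run_is_bright else dark).append(length)  # close the last run
    return bright, dark

def apparent_amplitude(column, threshold):
    """Mean run height (in pixels) of the class with the smaller height variance."""
    bright, dark = run_lengths(column, threshold)
    candidates = [runs for runs in (bright, dark) if runs]
    chosen = min(candidates, key=pvariance)
    return mean(chosen)

# Hypothetical column: bright crests of 3 pixels, dark troughs of 2 and 4 pixels.
column = [200, 200, 200, 50, 50, 210, 210, 210, 40, 40, 40, 40, 220, 220, 220]
amplitude_px = apparent_amplitude(column, threshold=128)
print(amplitude_px)
```

Because the result stays in pixel units, no camera calibration enters the estimate, which matches the point made above about avoiding coordinate-transformation error.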

The difference-based detector 4 detects pixels whose values change faster than a reference image (background) and is similar to those described in Patent Documents 2 to 6. The waves observed on a sea surface imaged over a viewing range of several tens of meters to several kilometers are assumed to have a period of about several seconds. In the documents above, a background image was generated by taking a moving average of the input video in a manner that follows the periodic wave fluctuation, and fast-moving objects were detected by thresholding, pixel by pixel, the absolute difference between the latest input image and the background image. In this example, two background images are created: a short-term background image specialized in suppressing only the wave-induced fluctuation of pixel values at a period of several Hz, and a long-term background image, averaged over several minutes to several hours, from which everything that moves other than background structures is removed as far as possible. Candidate regions (regions of interest) such as ships are extracted from the difference between the short-term and long-term background images.
A candidate region can be represented by the bitmap of logical values itself or by the attributes of each candidate region. Various attributes are defined in relation to the circumscribing rectangle of a blob of pixels having the value 1 in the bitmap: for example, the coordinates of the blob's centroid, the coordinates of its lowest point, and the height, width, aspect ratio, mean luminance, edge amount, number of pixels (area), fill ratio, perimeter, circularity, and complexity. The sides of the circumscribing rectangle are always set horizontal or vertical with respect to the image frame. The mean luminance and edge amount are computed from the pixels of the short-term background image or the input image that correspond to the blob of interest. The fill ratio is the proportion of pixels within the circumscribing rectangle that have the value 1. It is also advisable to add, as one of the attributes of a candidate region, the apparent distance between the lowest edge of the region and the horizon.
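Several of the candidate-region attributes listed above can be computed as in the following minimal sketch. It assumes 4-connected labeling of the 1-pixels and defines the aspect ratio as height divided by width; both choices, and the dictionary keys, are assumptions made for illustration rather than definitions taken from the description.

```python
from collections import deque

def region_attributes(bitmap):
    """Label 4-connected blobs of 1-pixels and return one attribute dict per blob."""
    height, width = len(bitmap), len(bitmap[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if bitmap[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects all pixels of this blob.
            queue, pixels = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            rect_w = max(xs) - min(xs) + 1   # axis-aligned circumscribing rectangle
            rect_h = max(ys) - min(ys) + 1
            area = len(pixels)
            regions.append({
                "centroid": (sum(xs) / area, sum(ys) / area),
                "lowest_y": max(ys),          # image y grows downward
                "width": rect_w,
                "height": rect_h,
                "aspect_ratio": rect_h / rect_w,  # assumed: height / width
                "area": area,
                "fill": area / (rect_w * rect_h),
            })
    return regions

bitmap = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
regions = region_attributes(bitmap)
print(regions)
```

Attributes that need the underlying image, such as mean luminance and edge amount, would be computed from the same pixel lists and are omitted here.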

In the difference method, the background update rate and the threshold are important controllable parameters. The update rate can be adjusted automatically according to the temporal period obtained by the sea surface condition acquirer 3, the number of detected candidate regions, and so on. In particular, the update rate of the short-term background image (the characteristic of the generating filter) should be tuned carefully so that swimmers and floating objects remain in the short-term background image and can be detected.
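The interplay of the two update rates can be seen in a small simulation. This sketch assumes exponential moving averages for both backgrounds (the description only says the images are generated by averaging), a four-pixel synthetic frame, and arbitrary values for the rates and the threshold; none of these numbers come from the description.

```python
import math

def update_background(background, frame, rate):
    """Blend a new frame into a running-average background (0 < rate <= 1)."""
    return [(1.0 - rate) * b + rate * f for b, f in zip(background, frame)]

def candidate_mask(short_bg, long_bg, threshold):
    """1 where the short-term and long-term backgrounds disagree strongly."""
    return [1 if abs(s - l) > threshold else 0 for s, l in zip(short_bg, long_bg)]

def synthetic_frame(t):
    """Four sea pixels oscillating with a 40-frame period; an object covers pixel 2 from t=200."""
    sea = 100.0 + 20.0 * math.sin(2.0 * math.pi * t / 40.0)
    frame = [sea, sea, sea, sea]
    if t >= 200:
        frame[2] = 200.0
    return frame

short_bg = long_bg = synthetic_frame(0)
for t in range(1, 300):
    frame = synthetic_frame(t)
    short_bg = update_background(short_bg, frame, rate=0.05)   # follows the object quickly
    long_bg = update_background(long_bg, frame, rate=0.001)    # barely notices it
mask = candidate_mask(short_bg, long_bg, threshold=30.0)
print(mask)
```

With these assumed rates the wave ripple is attenuated in the short-term background while the object is retained, so only the object pixel exceeds the threshold; a slower short-term rate would suppress waves further but would also delay or lose brief appearances such as swimmers.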

In the thresholding, for each pixel of the short-term background image, the difference is computed from the pixel value of the corresponding pixel (block) in the long-term background image of the distribution model to which that pixel is presumed to belong. The absolute value of that difference is then compared with a threshold based on the variance of the corresponding pixel, and a logical value of 0 or 1 is output. When the pixels of the input video are modeled, the short-term background image is one in which pixels belonging to different distribution models overlap in time. The threshold can therefore be adjusted using, as a baseline, the variance of the distribution model plus one half of the difference between the means of that model and the adjacent model. When the input video is in color, a threshold is used for each color; the output may be set to 1 if even one color exceeds its threshold (that is, a logical OR), or the sum of the amounts by which the thresholds are exceeded may itself be thresholded. Two thresholds may also be used, one that gives few missed detections but many false detections and one that gives few false detections but many missed detections, with the result of each output separately.
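The threshold rules above can be written out as a short sketch. The text specifies the variance as the spread term; the code takes the spread as a plain input so that either a variance or a standard deviation can be supplied, and all function names and numeric values are illustrative assumptions.

```python
def base_threshold(spread, model_mean, neighbor_mean):
    """Spread of the model plus half the gap between its mean and the adjacent model's mean."""
    return spread + 0.5 * abs(model_mean - neighbor_mean)

def exceeds_any_channel(short_pixel, long_pixel, thresholds):
    """Logical OR over color channels: 1 if any channel difference exceeds its threshold."""
    return int(any(abs(s - l) > t
                   for s, l, t in zip(short_pixel, long_pixel, thresholds)))

def exceeds_by_sum(short_pixel, long_pixel, thresholds, sum_threshold):
    """Threshold the summed excess of the per-channel differences over their thresholds."""
    excess = sum(max(0.0, abs(s - l) - t)
                 for s, l, t in zip(short_pixel, long_pixel, thresholds))
    return int(excess > sum_threshold)

# Assumed example values: spread 4, model mean 120, adjacent model mean 90.
threshold = base_threshold(4.0, 120.0, 90.0)
thresholds = (threshold, threshold, threshold)
print(threshold)
print(exceeds_any_channel((100, 100, 150), (100, 105, 100), thresholds))
print(exceeds_by_sum((115, 115, 115), (100, 100, 100), (10, 10, 10), 20.0))
```

Running the same comparison with a permissive and a strict threshold would yield the two separate outputs mentioned at the end of the paragraph.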

Both the short-term and long-term background images may be generated by first applying to the input video a spatial low-pass filter with characteristics matched to the spatial period of the waves, or downsampling, and then applying the respective temporal-domain processing.

The silhouette detector 5 detects dark regions as object candidates in video in which an object of almost black luminance appears against a background of almost saturated luminance, and is similar to that described in Patent Document 3. Such video is captured in situations where sunlight enters the camera directly or after reflection from the sea surface, particularly at times when the sun is low. As with the difference-based detector 4, the detection result is output as a bitmap or as attributes of candidate regions.
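A deliberately simple sketch of silhouette detection follows. It assumes 8-bit luminance, decides that a silhouette situation exists when at least half of the pixels are near saturation, and then marks near-black pixels; the levels 240 and 30 and the fraction 0.5 are placeholder assumptions, and the method of Patent Document 3 is not reproduced here.

```python
def silhouette_mask(frame, saturated_level=240, dark_level=30,
                    min_saturated_fraction=0.5):
    """Return a 0/1 bitmap of dark pixels, or None if the scene is not backlit enough."""
    pixels = [value for row in frame for value in row]
    saturated = sum(1 for value in pixels if value >= saturated_level)
    if saturated / len(pixels) < min_saturated_fraction:
        return None  # background is not close to saturation
    return [[1 if value <= dark_level else 0 for value in row] for row in frame]

backlit = [
    [255, 255, 250],
    [255, 10, 20],
    [248, 255, 255],
]
ordinary = [
    [100, 120],
    [90, 110],
]
print(silhouette_mask(backlit))
print(silhouette_mask(ordinary))
```

The resulting bitmap has the same form as the output of the difference-based detector, so the same attribute extraction can be applied to it.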

The feature-based detector 6 detects objects and identifies their type by processing more advanced than the difference method. The feature-based detector 6 of this example acquires information on the candidate regions detected by the difference-based detector 4 and the silhouette detector 5, and extracts features in the neighborhood of those regions in the input video. It then uses a machine learning method to determine whether the features correspond to a ship, a swimmer, a floating object, or something else (sea surface, disturbance, etc.).

The features are not limited to those obtained from a single image; they may be spatiotemporal features obtained from the most recent several frames, and any well-known feature may be usable. Likewise, any well-known machine learning method may be usable: unsupervised learning (clustering) such as k-means, linear discriminant analysis (LDA), and the EM algorithm, or supervised learning such as logistic discrimination, support vector machines, decision trees, and restricted Boltzmann machines. Depending on the application, however, some combinations of features and learners are preferable. In this example, a random forest is used as the learner, with features such as texton features (Semantic Texton Forests), color histograms, HOG (Histograms of Oriented Gradients), HOF (Histograms of Optical Flow), DOT (Dominant Orientation Templates), MBH (Motion Boundary Histogram), and a separable lattice hidden Markov model. Before these features are obtained from the input image, low-pass filtering in the spatial or temporal domain, the optical flow processing described in Non-Patent Document 1, and the like may be applied.
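To make one of the listed features concrete, the sketch below computes a single-cell histogram of gradient orientations, which is the core ingredient of HOG. It is a simplification under stated assumptions: central-difference gradients, nine unsigned orientation bins, no block normalization, and no classifier stage; a full HOG descriptor and the random forest are outside its scope.

```python
import math

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # horizontal central difference
            gy = patch[y + 1][x] - patch[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# A vertical luminance edge: gradients point horizontally, so all weight lands in bin 0.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
hist = orientation_histogram(vertical_edge)
print(hist)
```

A histogram like this, computed per cell around a candidate region, would be concatenated with other features before being passed to whichever learner is chosen.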

For each frame, the tracker 7 labels the candidate regions detected by the detectors above and associates them in the time direction. This allows swimmers and the like who appear and disappear between wave troughs to be tracked continuously, and removes regions that were falsely detected only sporadically. Well-known methods such as the Kalman filter, particle filter, Bayes filter, and mean shift can be applied to the tracker 7 of this example. If those processes require it, the tracker 7 is supplied with the original image of the candidate region. The tracker can also predict positions in the near future. Buoys, markers, and the like installed at fixed positions on the sea are removed on the basis of their positions. The tracker 7 can track positions expressed in image coordinates, but it is preferable to track positions in a global coordinate system corresponding to the real world, based on the camera installation height, depression angle, angle of view (zoom magnification), and so on. Image coordinates and global coordinates can be converted into each other by a projection matrix or, when object positions are restricted to a single plane, a homography matrix. The tracker 7 can integrate and track the coordinates of candidate regions from different detectors; in doing so, it collects and retains information indicating which detector supplied each candidate region used to generate a trajectory, along with the other attributes of that candidate region.
New attributes, such as the change in position from the previous frame, can also be added through tracking.
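The conversion between image coordinates and plane coordinates by a homography can be sketched as follows. The 3x3 matrices here are made-up numbers chosen for easy checking; a real matrix would be derived from the camera height, depression angle, and angle of view, which this sketch does not attempt.

```python
def apply_homography(h, x, y):
    """Map image point (x, y) to plane coordinates using a 3x3 homography h."""
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (it lies on the horizon line)")
    return xw / w, yw / w   # divide by the homogeneous coordinate

# Hypothetical matrices: a scale-and-shift, and one with a perspective term.
affine_like = [[2.0, 0.0, 1.0], [0.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]]
print(apply_homography(affine_like, 10, 20))
print(apply_homography(perspective, 4, 2))
```

Tracking in the resulting plane coordinates lets speeds and distances be expressed in real-world units, which is what a rule such as "approached the shore by a given distance" requires.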

The threat evaluator 8 identifies what each candidate region is caused by and then, taking into account the tendency of its movement (in particular, its approach to land), evaluates the threat comprehensively and issues alerts in multiple stages. For example, a candidate region is first recognized as some kind of object, rather than a disturbance of the sea surface, once it has existed for 5 seconds or more; when the candidate region later grows to about 20 pixels as the object approaches, its type can be identified. Continued tracking also makes it possible to determine whether it is tending to approach land. Because the degree of threat can thus change, the threat evaluator 8 issues a corresponding alarm each time the threat increases. This can be expected to improve the convenience and safety of maritime security.

Candidate regions are identified from the attributes of the candidate regions collected by the tracker 7 and, if available, the classification result of the feature-based detector 6, using a rule-based method configured by a person or a well-known machine learning method. The attributes that ships, swimmers, and floating objects can be expected to exhibit are as shown in the table below.

Note that when the edges are evaluated on the short-term background image, the sharpness of the image contour varies with the degree of motion: with much motion the edges become faint, and with little motion they come close to the true edges. In practice, decisions more complex than Table 1 (for example, referring to different tables for day and night) may be needed. The result of candidate region identification is fed back to the difference-based detector 4, the silhouette detector 5, and the feature-based detector 6, and used for parameter adjustment and online learning (reinforcement learning).

The degree of threat becomes especially high in situations where a clear intent to intrude is inferred. For example, such intent is inferred when the object is judged to have approached the shore by 10 m or more within 10 seconds.
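The example rule above (10 m of approach within 10 seconds) can be checked against a track as in this minimal sketch. It assumes the track is a time-sorted list of (time in seconds, distance to shore in meters) pairs; that representation and the function name are illustrative assumptions.

```python
def shows_intrusion_intent(track, window_s=10.0, min_approach_m=10.0):
    """True if the object closed at least min_approach_m toward shore within window_s.

    track: list of (time_s, distance_to_shore_m) tuples sorted by time.
    """
    for i, (t_start, d_start) in enumerate(track):
        for t_end, d_end in track[i + 1:]:
            if t_end - t_start > window_s:
                break  # later samples fall outside the time window
            if d_start - d_end >= min_approach_m:
                return True
    return False

fast_approach = [(0.0, 100.0), (5.0, 95.0), (10.0, 89.0)]
slow_drift = [(0.0, 100.0), (5.0, 98.0), (10.0, 95.0), (15.0, 92.0)]
print(shows_intrusion_intent(fast_approach))
print(shows_intrusion_intent(slow_drift))
```

In a multi-stage alerting scheme, a rule of this kind would raise the threat level on top of earlier stages such as first detection and type identification.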

The components from the sea surface condition acquirer 3 through the threat evaluator 8 can be implemented using a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or another processor specialized for image signal processing. To obtain an effective computing performance of 10000 MMACS (Million Multiply-Accumulates Per Second) or more, a co-design of DSP and FPGA is preferable, in particular a configuration in which the groups of processing that would consume memory bandwidth on the DSP are pipelined on the FPGA.

FIG. 2 shows a functional block diagram of the sea surface condition acquirer 3 of this embodiment. The horizon estimator 31 applies a Canny edge detection filter and a Hough transform to any video from the video source 2, or to the short-term background image or the like, and determines the position of a nearly horizontal line segment. The horizon position obtained here is supplied as appropriate to the components from the difference-based detector 4 through the tracker 7 so that no detection is performed in the region above the horizon (the mask region). The horizon position is also used by the tracker 7 and others when computing the apparent distance between a candidate region and the horizon.

The simple estimator 32 estimates a pixel model within a plurality of predetermined evaluation regions (pixel blocks) in the input video from the video source and outputs two thresholds for each region. Like the background image in a general difference method, the image of the sea surface can be represented by a Gaussian mixture model. This example assumes a sea surface, with bright parts and dark shadow parts corresponding to the undulation of the waves, and also highlight parts where sunlight is reflected directly, so two to three distributions are presumed to exist.

The threshold can be determined by the mode method, which uses the density value at a valley of the histogram; the P-tile method, which determines the number of pixels counted from the low-density end of the histogram according to the area ratio of the regions to be separated; the differential histogram method, which uses the density value at which the differential histogram is maximal; Otsu's threshold determination method; the variable threshold method, which changes the threshold according to the properties of each part of the image; and so on.

This example combines Otsu's method with the variable threshold method. That is, an appropriate initial threshold is set first, each pixel is assigned to one of the distribution models (classes) according to its pixel value, the assignments are accumulated over a sufficient number of frames, and then the pixel count, mean, and variance of each class are calculated. As for the initial thresholds, the highlight part consists of pixels with saturated luminance, so the threshold that identifies it is easy to determine, and the overall mean can be used as the threshold separating the bright part from the dark part. The thresholds are then updated according to Otsu's method so as to maximize the ratio of the between-class variance to the within-class variance. These steps are performed for each region and can use luminance values alone, without color information. Because the image of the sea surface is comparatively smooth, the distribution model can be estimated only for discretely placed evaluation regions, with the areas between regions estimated by interpolation. Alternatively, it can be computed per pixel rather than per evaluation region.
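
As a minimal sketch of the Otsu step for one evaluation region, assuming a two-class split on a luminance histogram: because the total variance is fixed, maximizing the between-class variance is equivalent to maximizing the ratio of between-class to within-class variance. The function name `otsu_threshold` and the histogram layout are illustrative assumptions, not taken from the patent.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    hist[i] is the number of pixels with luminance i (assumed layout).
    Pixels with luminance < t form the dark class, the rest the bright class.
    """
    total = sum(hist)
    total_sum = sum(i * n for i, n in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # pixel count of the dark class
    sum0 = 0  # luminance sum of the dark class
    for t in range(1, len(hist)):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split here
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance of this split.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

In the combined scheme described above, a function of this kind would be run once per evaluation region on the histogram accumulated over many frames.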

On receiving the estimated distribution model from the simple estimator 32, the dark part/bright part extractor 33 selects one of the classes corresponding to the bright part and the dark part. As one example, it selects the class with the smaller variance or the smaller pixel count. Then, in and around each evaluation region of the input video, it calculates the vertical pixel count (height) of each cluster of pixels belonging to the selected class. This value includes both the height and the depth of the shadowed or illuminated part of a wave as seen looking obliquely down at the sea surface, so it is not the actual height. The smaller class is chosen because in the larger class several waves are likely to be connected, and such regions should not be evaluated by mistake.

The apparent wave height estimator 34 converts the cluster height received from the dark part/bright part extractor 33 into a wave height using a predetermined conversion formula. The conversion formula is a function of the depression angle and can further be corrected empirically, although the smaller the depression angle, the smaller its influence. The depression angle is either preset or provided by the calibration executor 72 (described later). Because wave height inherently varies, it is desirable to collect a plurality of samples, sort them, and apply processing such as averaging the samples that rank between two chosen percentages from the top.
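
The sort-and-average step can be sketched as follows. The percentage bounds are left unspecified in the text, so `lo_pct` and `hi_pct` are hypothetical parameters, as is the function name `ranked_mean`.

```python
def ranked_mean(samples, lo_pct, hi_pct):
    """Average the samples ranked between the top lo_pct % and the top hi_pct %.

    Samples are sorted in descending order, so lo_pct=10, hi_pct=40 skips the
    highest 10 % (likely outliers) and averages the next 30 %.
    """
    ordered = sorted(samples, reverse=True)
    n = len(ordered)
    start = int(n * lo_pct / 100)
    stop = max(start + 1, int(n * hi_pct / 100))  # always keep at least one sample
    chosen = ordered[start:stop]
    return sum(chosen) / len(chosen)
```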

The wavenumber and period estimator 35 estimates the wavenumber and period near each region from the video of the video source 2 and, where possible, uses them to calculate a more reliable wave height. When requested by the threat evaluator 8, it also calculates and provides the wave height, period, and so on of the waves at the requested location. The wave height at locations between regions is calculated by interpolating the wave heights estimated by the apparent wave height estimator 34. There are various methods for estimating the wavenumber and related quantities, but when several waves of different direction and period overlap, estimation is not easy, especially from video taken at a shallow depression angle. Where necessary, only a single wave direction, from offshore toward the shore, is assumed. The period is obtained by applying an FFT to the time series of pixel values or classification results for a pixel in the region and detecting the peak. More simply, it can also be obtained by collecting the positions of the upper or lower edge of the clusters detected by the dark part/bright part extractor 33 as a time series and dividing the collection time by the number of crossings of the mean value.
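
Both period estimates can be sketched on a one-dimensional time series, under two assumptions that are not from the patent: a plain DFT stands in for the FFT to keep the example dependency-free, and only upward crossings of the mean are counted so that one crossing corresponds to one full period. The function names and the `fps` parameter are illustrative.

```python
import cmath
import math

def period_by_dft(samples, fps):
    """Dominant period in seconds, taken from the strongest non-DC DFT bin."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / (best_k * fps)

def period_by_crossings(samples, fps):
    """Collection time divided by the number of upward crossings of the mean."""
    mean = sum(samples) / len(samples)
    ups = sum(1 for a, b in zip(samples, samples[1:]) if a < mean <= b)
    if ups == 0:
        return None  # no crossing observed, period cannot be estimated
    return len(samples) / fps / ups
```

If both upward and downward crossings were counted instead, the quotient would correspond to half a period.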

The water depth estimator 36 estimates the water depth at a location from the characteristics of the waves. For example, as in Non-Patent Document 4, the depth can be calculated using the following dispersion relation for small-amplitude waves.

Here, T is the wave period, g is the gravitational acceleration, and h is the water depth. The phenomenon in which the wave height increases and the wavelength shortens as the water becomes shallower is called wave shoaling, and the same tendency is seen even in complex irregular waves. Shoaling is clearly observed only where the water depth is less than 1/2 of the (offshore) wavelength. The water depth estimator 36 does not need to obtain the absolute water depth; it may instead simply calculate the shoaling coefficient, which indicates how many times higher the wave is than offshore.
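
The dispersion relation itself does not appear in this text. As a hedged illustration only, the standard linear-wave form is L = (g·T²/2π)·tanh(2πh/L), where the wavelength L is a symbol introduced here; whether the patent uses exactly this form is an assumption. Under that assumption, the depth follows by inverting the relation, as sketched below with hypothetical function names.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def wavelength(period, depth, iterations=100):
    """Solve L = (g T^2 / 2 pi) tanh(2 pi h / L) for L by damped fixed-point iteration."""
    deep = G * period ** 2 / (2 * math.pi)  # deep-water wavelength
    length = deep
    for _ in range(iterations):
        target = deep * math.tanh(2 * math.pi * depth / length)
        length = 0.5 * (length + target)  # damping keeps the iteration monotone
    return length

def depth_from_wave(period, length):
    """Invert the relation: h = (L / 2 pi) atanh(2 pi L / (g T^2))."""
    ratio = 2 * math.pi * length / (G * period ** 2)
    if not 0 < ratio < 1:
        return None  # at or beyond the deep-water limit the depth is not observable
    return length / (2 * math.pi) * math.atanh(ratio)
```

The `None` branch reflects the remark above that shoaling is only observable where the water is shallow relative to the wavelength.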

FIG. 3 shows a functional block diagram of the difference method-based detector 4 of this embodiment. The difference method-based detector 4 includes a short-time background image generator 41, a long-time background image generator 42, an update coefficient setting device 43, an absolute difference device 44, a binarizer 45, a threshold setting device 46, a time filter (true value holder) 47, and a labeling device 48.

The short-time background image generator 41 has an internal frame memory. Each time an image frame is input from the video source 2 at a predetermined rate, it blends that frame with the image in the frame memory using a predetermined weight (update coefficient ρ₁), outputs the result as the short-time background image, and overwrites the frame memory with it. This processing is also called a temporal filter, recursive filter, IIR filter, or exponential moving average, and as one example has a time constant on the order of a few tenths of a second.

The long-time background image generator 42 has an internal frame memory. Each time an image frame is input from the video source 2 at a predetermined rate, it blends that frame with the image in the frame memory using a predetermined weight (update coefficient ρ₂), outputs the result as the long-time background image, and overwrites the frame memory with it. As one example, the long-time background image generator 42 has the same configuration as the short-time background image generator 41 but operates at a reduced frame rate or with a comparatively small update coefficient, giving a time constant of, for example, several seconds or more.
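
The recursive update shared by both generators can be sketched as an exponential moving average over a frame stored as nested lists. The function name `update_background` and the frame layout are assumptions for illustration; for a small coefficient ρ the time constant is roughly 1/(ρ × frame rate) seconds.

```python
def update_background(background, frame, rho):
    """Blend a new frame into the stored background with update coefficient rho.

    background and frame are lists of rows of luminance values (assumed layout);
    0 < rho <= 1, and a larger rho makes the background follow the input faster.
    """
    return [
        [(1.0 - rho) * b + rho * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

Calling this with a larger coefficient gives the short-time background and with a smaller one the long-time background, each kept in its own frame memory.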

The long-time background image generator 42 can take as input the short-time background image output by the short-time background image generator 41 instead of the image from the video source 2. The long-time background image generator 42 is not limited to generating a single-frame long-time background image; based on a mixture distribution model, it may generate a plurality of background images (mean images) and variance images. This can be implemented with the well-known codebook method. The results of the simple estimator 32 can be reused for the modeling.

The update coefficient setting device 43 automatically adjusts the update coefficients (update speeds) ρ₁ and ρ₂ according to the temporal period obtained by the sea surface condition acquisition device 3, the number of candidate regions detected, and so on. The update coefficient ρ₁ is set, for example by ρ₁ = β·f, so that the update proceeds with a time constant comparable to the wave period (a few seconds to a dozen or so seconds). Here, f is the wave frequency and β is a predetermined coefficient that can be adjusted continually according to the proportion of candidate regions the tracker 7 fails to track. The update coefficient ρ₂, on the other hand, is set with reference to the persistence of the wave state, said to be about 20 to 30 minutes, so that the update proceeds with a time constant shorter than that persistence and longer than the wave period.

The absolute difference device 44 calculates the absolute value of the difference between corresponding pixel values of the short-time background image and the long-time background image, and outputs the result as a difference image. When the input video is in color, this processing is performed for each color. Instead of a per-pixel difference, the difference between histograms in the neighborhood of the pixel of interest may be calculated.

The binarizer 45 compares the difference image from the absolute difference device 44 with a threshold, binarizes it, and outputs a binarized image. When the input video is in color, a separate threshold is used for each color; the output can be set to 1 when even one color exceeds its threshold (that is, a logical OR), or the sum of the amounts by which the thresholds are exceeded can itself be thresholded. Two kinds of threshold may also be used, one giving few missed detections at the cost of many false detections and one giving few false detections at the cost of many missed detections, with the result of each being output.
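
The logical-OR variant of the color binarization can be sketched as follows, assuming each pixel is a tuple of channel values and `thresholds` holds one value per channel; these layouts and the function name are illustrative assumptions.

```python
def binarize_color_difference(short_bg, long_bg, thresholds):
    """Return a binary image: 1 where any channel's absolute difference exceeds its threshold."""
    result = []
    for s_row, l_row in zip(short_bg, long_bg):
        out_row = []
        for s_px, l_px in zip(s_row, l_row):
            exceeded = any(abs(s - l) > t
                           for s, l, t in zip(s_px, l_px, thresholds))
            out_row.append(1 if exceeded else 0)
        result.append(out_row)
    return result
```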

When a model-based long-time background image is available, the binarizer 45 calculates, for each pixel of the short-time background image, the difference from the value of the corresponding pixel (block) in the long-time background image of the distribution model to which that pixel most likely belongs. It then compares the absolute value of that difference with a threshold based on the variance of the corresponding pixel, and outputs a logical value of 0 or 1. When the pixels of the input video are modeled, the short-time background image is, strictly speaking, a temporal superposition of pixels belonging to different distribution models. The threshold can therefore be adjusted with reference to the variance of the distribution model plus 1/2 of the difference between the means of that model and the adjacent model.

The threshold setting device 46 adaptively sets a threshold suited to detecting candidate regions. As one example, for each of a plurality of predetermined regions in the difference image, it averages the pixel values (absolute differences) to obtain the standard deviation of the distribution before the absolute value was taken, and multiplies this standard deviation by a predetermined coefficient to obtain the threshold. At pixel positions that belong to no region, the thresholds obtained in neighboring regions are interpolated and applied. The coefficient can be set manually and can also be adjusted according to how candidate regions are being detected. When the pixels of the input video are modeled as a Gaussian mixture, pixels belonging to different distribution models may overlap in time in the short-time background image. The threshold can therefore be adjusted with reference to the variance of the distribution model plus 1/2 of the difference between the means of that model and the adjacent model.
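
The step from mean absolute difference to standard deviation can be made explicit under one assumption not stated in the text: if the signed differences are zero-mean Gaussian, then E|x| = σ·√(2/π), so σ = mean|x|·√(π/2). The function name and the coefficient `k` are hypothetical.

```python
import math

def region_threshold(abs_diffs, k):
    """Threshold for one region: k times the standard deviation of the signed differences.

    abs_diffs holds the absolute differences of the pixels in the region.
    Assumes zero-mean Gaussian differences, for which E|x| = sigma * sqrt(2 / pi).
    """
    mean_abs = sum(abs_diffs) / len(abs_diffs)
    sigma = mean_abs * math.sqrt(math.pi / 2.0)
    return k * sigma
```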

The time filter 47 has an internal frame memory in which it holds, for each pixel, the index of the most recent frame in which that pixel was true. Each time it receives a binarized image it updates these indices, and it outputs an image (a smoothed binary image) in which a pixel is true if its index falls within the most recent n frames. With this processing, a pixel that has once become true keeps that value for at least n frames, so the shape of a cluster of true pixels approaches the shape of the object. Filtering is not limited to the time domain; a spatial filter such as a median filter may also be applied.
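
A minimal sketch of such a true value holder is given below. The class name, the frame layout, and the convention that the frame in which a pixel turns true counts as the first of its n held frames are assumptions for illustration.

```python
class TrueValueHolder:
    """Keeps each pixel true for n frames after it was last true in the input."""

    def __init__(self, height, width, n):
        self.n = n
        self.frame_index = 0
        # Index of the most recent frame in which each pixel was true.
        self.last_true = [[None] * width for _ in range(height)]

    def update(self, binary):
        """Take one binarized frame and return the smoothed binary image."""
        self.frame_index += 1
        out = []
        for y, row in enumerate(binary):
            out_row = []
            for x, value in enumerate(row):
                if value:
                    self.last_true[y][x] = self.frame_index
                last = self.last_true[y][x]
                held = last is not None and self.frame_index - last < self.n
                out_row.append(1 if held else 0)
            out.append(out_row)
        return out
```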

The labeling device 48 extracts clusters of true pixels from the binarized image output by the time filter 47 as candidate regions, using the 8-neighbor method, contour tracing, or the like, assigns an index to each, and acquires and outputs their attributes. When this processing is performed centrally by the tracker 7, the labeling device 48 is unnecessary. When the labeling device 48 extracts regions from the current frame using the extraction result (index table) of the immediately preceding frame, it can also perform simple tracking.
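
The 8-neighbor extraction can be sketched as a breadth-first flood fill. The attribute set returned here (area and bounding box) and the function name are illustrative assumptions; the text does not list the attributes at this point.

```python
from collections import deque

def label_regions(binary):
    """Return 8-connected regions of true pixels, each with index, area, and bounding box."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                # Visit all 8 neighbors (the pixel itself is already marked seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append({"index": len(regions), "area": area,
                            "bbox": (x0, y0, x1, y1)})
    return regions
```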

When the video source 2 consists of videos obtained from a separately provided visible-light camera and far-infrared camera, the difference method-based detector 4 can process the two videos independently. If necessary, two sets of the components from the short-time background image generator 41 through the labeling device 48 may be provided. The two sets need not have the same configuration, however; on the side that processes the far-infrared video, the short-time background image generator 41, the time filter 47, and so on can be omitted.

FIG. 4 shows a functional block diagram of the silhouette-condition detector 5 of this embodiment. The silhouette-condition detector 5 includes a binarizer 51, a threshold setting device 52, a labeling device 53, and a time filter 54. The binarizer 51 binarizes the video from the video source 2 in the same way as the binarizer 45 and outputs a binarized image, except that a pixel is true when its value is below the threshold and false when it is at or above the threshold. It is sufficient for the binarizer 51 to binarize only the luminance value of each pixel.

The threshold setting device 52 provides the threshold used by the binarizer 51. Because this threshold only has to discriminate between saturation (highlight) and darkness, it can be given as a fixed value. Alternatively, the threshold generated by the simple estimator 32 may be used, or the threshold may be adjusted according to the size of the candidate regions detected by the labeling device 53, for example by lowering it when a region far larger than the expected object size is detected.

The labeling device 53 labels the binarized image from the binarizer 51, processing it in the same way as the labeling device 48. Although not essential, it is desirable for the labeling device 53 to acquire attributes that indicate the complexity of a candidate region's contour, such as its aspect ratio, fill ratio, and number of contour pixels.

The time filter 54 uses either the simple frame-to-frame association made by the labeling device 53 or the tracking results of the tracker 7 to average the binarized images of corresponding candidate regions across frames, with their centroids aligned, and outputs the result as the image of the candidate region. In a silhouette video the object appears black and no luminance or color information can be obtained from the object itself, so recognizing the object from the outline of the binarized image alone risks reduced accuracy. The motion blur contained in the image produced by the time filter 54, on the other hand, provides additional features and is expected to help improve accuracy. Instead of the binarized image, the original image of the candidate region may be averaged, and temporal operations other than averaging may be applied. The original image of the candidate region is obtained by a logical AND of the video frame from the video source 2 and the binarized image of the candidate region. The time filter 54 is not essential.

FIG. 5 shows a functional block diagram of the feature-based detector 6 of this embodiment. The feature-based detector 6 identifies objects using the well-known random forest, and includes a patch specifier 61, a size normalizer 62, a decision tree executor 63, a probability integrator 64, a class discriminator 65, and an online learner 66.

The patch specifier 61 applies an image patch that suitably contains a candidate region detected by the difference method-based detector 4 or the silhouette-condition detector 5, and extracts a partial image from the video of the video source 2. The patch is normally square. The patch specifier 61 can generate any number of image patches as long as the processing capacity of the feature-based detector 6 is not exceeded. For example, several versions of a patch with slightly different positions and sizes may be applied for a single candidate region, or, even when there is no candidate region, patches may be scanned sequentially over the sea surface area detected by the sea surface condition acquisition device 3. In that case the patch size is set, based on the camera parameters obtained by the tracker 7, to the size at which a ship or the like would appear if it were present at the patch location (but not below the normalized size described later). Conversely, when too many candidate regions are detected in one video frame, the candidate regions must be selected by criteria such as a priority based on region size, a request from the threat evaluator 8, or round robin. When the decision tree executor 63 extracts features from frames at different times or from frames subjected to different temporal operations, the same image patch is applied to each of those frames.

The size normalizer 62 normalizes the partial image cut out by the patch specifier 61 to a predetermined size that the decision tree executor 63 can accept. In a 4:4:4 format, in which every pixel carries complete color information, a size of 7 × 7 pixels, for example, is sufficient.

The decision tree executor 63 traverses each of T decision trees built by prior learning and outputs the class probabilities associated with the leaf node it reaches. Branching in a decision tree is done by evaluating a branch function. As one example, the branch function thresholds the result of adding or subtracting the values of one to four specific pixels in the image patch, as known from Semantic Texton Forests and the like. Each leaf node holds, for the samples used in training, the posterior probability p(c|v) = |S_c|/|S| of each class, and it suffices to read that value out. Here p(c|v) is the probability that input v is identified as class c, |S| is the number of samples in the sample set S used for training, and |S_c| is the number of samples in S that belong to class c. In this example there are five or more classes: ship, person, floating object, sea surface, and other. A feature of the decision tree executor 63 of this example is that, in some decision trees, the values (explanatory variables) used by the branch function are not limited to those derived from the image patch but include attributes of the candidate region such as position (distance), size, contour complexity, and mean luminance.
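
Traversal of one tree can be sketched as follows, assuming a branch function that thresholds the difference of two pixel values (one of the one-to-four-pixel forms mentioned above). The dictionary-based node layout and both function names are hypothetical.

```python
def leaf_posterior(class_counts):
    """p(c|v) = |S_c| / |S| from the counts of training samples that reached a leaf."""
    total = sum(class_counts.values())
    return {c: n / total for c, n in class_counts.items()}

def traverse(node, patch):
    """Walk one decision tree and return the posterior stored at the reached leaf.

    Internal nodes hold "pixels" (two (row, col) positions), "threshold",
    "left", and "right"; leaves hold "posterior" (assumed layout).
    """
    while "posterior" not in node:
        (y1, x1), (y2, x2) = node["pixels"]
        value = patch[y1][x1] - patch[y2][x2]  # branch function: pixel difference
        node = node["left"] if value < node["threshold"] else node["right"]
    return node["posterior"]
```

A branch function that uses candidate-region attributes instead of pixels would replace only the line that computes `value`.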

The probability integrator 64 integrates, for each class, the posterior probabilities p obtained from the individual decision trees. If no tree arrives at a given class c, the probability is 0; if several trees do, their posterior probabilities p are integrated by an arithmetic mean, geometric mean, maximum, or the like.

The class discriminator 65 determines the class corresponding to the largest of the integrated posterior probabilities p and outputs it as the identification result.
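
Integration and class decision can be sketched together. This version uses the arithmetic mean over all trees, counting a class that a tree does not report as 0; that choice, the function names, and the dictionary layout are assumptions, and a geometric mean or maximum would be equally consistent with the description.

```python
def integrate(posteriors, classes):
    """Arithmetic mean of the per-tree posteriors for every class.

    posteriors is a list of dicts, one per tree; a class absent from a tree's
    dict contributes 0, so a class reported by no tree gets probability 0.
    """
    return {c: sum(p.get(c, 0.0) for p in posteriors) / len(posteriors)
            for c in classes}

def decide(integrated):
    """Return the class with the largest integrated posterior."""
    return max(integrated, key=integrated.get)
```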

The online learner 66 uses data gathered during operation to perform active learning, semi-supervised learning, transductive learning, or the like, or continues learning with the same algorithm as the prior offline learning by means of label propagation, thereby improving performance. In active learning, when some alert is raised, the result of the operator's visual identification of the object is given to the learning machine. For example, data can be fed back for alerts of low reliability (the probability output by the class discriminator), for misidentified objects or objects that are confusing even to the eye, for cases the operator subjectively wants the system to learn, and for other data expected to contribute to determining the decision boundary. For algorithms such as AdaBoost, in which the generalization error decreases the more the learner is trained on large-margin training data, data suited to that property should be fed back.

Transductive learning is a concept and technique in which operator-assigned labels are accumulated as test data and the classification error on that test data is minimized. Various well-known techniques, such as BrownBoost, can be used for learning the branch functions and posterior probabilities in the decision tree executor 63 and for ensemble learning in the probability integrator 64. Boosting is a meta-algorithm that generates a new version of a classifier by giving larger weights to the training data misrecognized by the existing version. In label propagation, the current identification results of the class discriminator 65 are used as provisional labels. In a simple example, the posterior probabilities p held by the leaf nodes are updated.

FIG. 6 shows a functional block diagram of the tracker 7 of this embodiment. The tracker 7 includes a coordinate system converter 71, a calibration executor 72, a tide level acquirer 73, an attribute integrator 74, and a Kalman filter 75.

The coordinate system converter 71 receives the attributes of candidate regions from the difference method-based detector 4 through the feature-based detector 6, and converts the coordinates and sizes of the candidate regions from image (scene) coordinate values to global coordinate values. In general, image coordinates expressed in homogeneous coordinates are converted to global coordinates by using the inverse matrix V of the projection matrix P.

Here, s is a value corresponding to the reciprocal of the depth in image coordinates, and is calculated as follows by assigning the elevation z_R of the candidate region in global coordinates to z_world.

For maritime surveillance, the position of an object can be assumed to be at Z = 0 (that is, at sea level) in global coordinates. The global coordinates in the homogeneous representation of the above equation become Euclidean coordinates when divided by the value of the bottom row, w = h₄₁·u_image + h₄₂·v_image + h₄₃·s. Assigning a constant to z_world in this way is, in effect, equivalent to the following homography transformation, or Direct Linear Transform.

Here, (u₁, v₁), ..., (uₙ, vₙ) are the image coordinates of n calibration points, (x₁, y₁), ..., (xₙ, yₙ) are the global (metric) coordinates of those points, and n is a natural number of 4 or more.
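
As a hedged sketch of the homography route, the code below fits a 3 × 3 matrix from exactly four calibration points with the bottom-right element fixed to 1, then maps an image point to sea-surface coordinates. Using exactly four points and a direct linear solve is a simplification chosen here; with more than four points a least-squares solve would normally be used. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_homography(image_pts, world_pts):
    """Fit H (with H[2][2] = 1) from four image/world point pairs."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(image_pts, world_pts):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        rhs.append(y)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def image_to_world(hmat, u, v):
    """Map image coordinates to global coordinates on the sea surface plane."""
    w = hmat[2][0] * u + hmat[2][1] * v + hmat[2][2]  # homogeneous divisor
    x = (hmat[0][0] * u + hmat[0][1] * v + hmat[0][2]) / w
    y = (hmat[1][0] * u + hmat[1][1] * v + hmat[1][2]) / w
    return x, y
```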

When the object is far away, its position in global coordinates can be corrected using the camera-to-object distance D estimated from the apparent distance between the object and the horizon. For example, when the camera position is taken as the origin of the global coordinates, the correction replaces the distance information of the original coordinates with D, as follows.
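
The correction equation is not reproduced in this text. One plausible reading, offered only as an assumption, is to keep the bearing of the original position and rescale its range to D, with the camera at the origin; the function name is hypothetical.

```python
import math

def correct_range(x, y, distance):
    """Keep the bearing of (x, y) from the origin and set its range to distance."""
    norm = math.hypot(x, y)
    if norm == 0:
        return x, y  # bearing undefined at the camera position, leave unchanged
    scale = distance / norm
    return x * scale, y * scale
```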

The calibration executor 72 calculates the projection matrix P, the homography matrix H, or the camera parameters required for conversion to global coordinates. The projection matrix P is given by the product of the intrinsic camera parameter matrix A and the extrinsic camera parameter (motion parameter) matrix M. The intrinsic camera parameter matrix A is determined by the focal length, the aspect ratio of the pixel pitch, and so on, and the extrinsic camera parameter matrix M is determined by the installation position and viewing direction of the camera. The projection matrix P has 11 degrees of freedom and can be estimated from six or more known points using the well-known methods of Z. Zhang or Tsai.

The tide level acquirer 73 is provided in the calibration executor 72 and, when available, acquires a more accurate sea level and provides it to the calibration executor 72. The water level can be estimated from the height of floating buoys or markers placed at specified locations on the sea surface, or by image processing from the position of the coastline, or of the water surface relative to artificial structures, captured in the video of the video source 2. Alternatively, tide level information may be acquired from outside as needed, or tide (astronomical tide) data may be held internally and read out according to the calendar.

The attribute integrator 74 associates the candidate regions obtained by the difference method-based detector 4 through the feature amount-based detector 6 with the corresponding candidate regions obtained in the past, passes them to the Kalman filter 75 that tracks each candidate region, receives the tracking results from the Kalman filter 75, and integrates, adds, or updates the attributes of candidate regions estimated to be the same. The association pairs candidate regions whose attributes are similar, in particular the position coordinates, size, and velocity expressed in global coordinates. Even when several detectors each yield a candidate region derived from the same object, these are integrated based on the similarity of their attributes. Once tracking has succeeded at least once, new attributes such as trajectory and velocity are added based on the change in position, and each subsequent successful tracking updates or adds to them in the same way as the other attributes. Even if a single object yields several fragmented candidate regions in a frame, they can be integrated by considering the commonality of their trajectories, their sizes, and so on. As tracking continues, a candidate region comes to be regarded as an object that probably exists. If tracking is interrupted, the attribute integrator 74 can request the feature amount-based detector 6 to attempt detection of a candidate object near the current expected position. To avoid interruption of tracking when an object leaves the field of view of the video source 2, whether the current expected position is outside the field of view or approaching its edge is judged in image coordinates or global coordinates, and the motorized pan head carrying the camera can be controlled accordingly.
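The association step can be illustrated with a deliberately simplified sketch. It assumes that only the global position is compared (the text also mentions size and velocity) and uses a greedy nearest-neighbour match within a fixed gate distance; the function name `associate` and the `gate` parameter are hypothetical.

```python
import math

def associate(tracks, detections, gate):
    """Greedily pair each track with its nearest unused detection within `gate`."""
    pairs = []
    used = set()
    for ti, (tx, ty) in enumerate(tracks):
        best, best_d = None, gate
        for di, (dx, dy) in enumerate(detections):
            if di in used:
                continue
            d = math.hypot(dx - tx, dy - ty)
            if d <= best_d:
                best, best_d = di, d
        if best is not None:
            used.add(best)
            pairs.append((ti, best))
    return pairs  # list of (track index, detection index)
```

Detections left unpaired would start new candidate regions, and tracks left unpaired correspond to the interrupted-tracking case described above.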

The Kalman filter 75 receives the position coordinates of the candidate regions from the attribute integrator 74, performs Kalman filtering for each candidate region, and outputs the estimated position, which has reduced noise. Because the Kalman filter 75 estimates a model internally, the variance of the calculated position can be used as the association threshold in the attribute integrator 74.
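A minimal sketch of such a per-candidate filter follows, assuming a two-dimensional constant-velocity motion model with position-only measurements. The patent does not specify the model, so the class name `ConstantVelocityKalman`, the noise parameters `q` and `r`, and the `gate_radius` helper are all illustrative assumptions.

```python
import numpy as np

class ConstantVelocityKalman:
    """2D constant-velocity Kalman filter; the state is [x, y, vx, vy]."""

    def __init__(self, x, y, dt=1.0, q=0.01, r=4.0):
        self.state = np.array([x, y, 0.0, 0.0])
        self.cov = np.eye(4) * 100.0          # large initial uncertainty
        self.f = np.eye(4)
        self.f[0, 2] = self.f[1, 3] = dt      # position += velocity * dt
        self.h = np.eye(2, 4)                 # only the position is measured
        self.q = np.eye(4) * q                # process noise (assumed)
        self.r = np.eye(2) * r                # measurement noise (assumed)

    def step(self, zx, zy):
        """Predict one time step, then update with the measurement (zx, zy)."""
        self.state = self.f @ self.state
        self.cov = self.f @ self.cov @ self.f.T + self.q
        innovation = np.array([zx, zy]) - self.h @ self.state
        s = self.h @ self.cov @ self.h.T + self.r
        k = self.cov @ self.h.T @ np.linalg.inv(s)
        self.state = self.state + k @ innovation
        self.cov = (np.eye(4) - k @ self.h) @ self.cov
        return self.state[:2].copy()

    def gate_radius(self, n_sigma=3.0):
        """Association threshold derived from the current position variance."""
        pos_var = np.trace(self.cov[:2, :2]) / 2.0
        return n_sigma * float(np.sqrt(pos_var))
```

The gate shrinks as tracking continues and the position estimate becomes more certain, which matches the use of the position variance as an association threshold.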

FIG. 7 shows an example of a functional block diagram of the threat evaluator 8 of this embodiment. The threat evaluator 8 includes a far-infrared image brightness evaluator 81, a position change evaluator 82, a size evaluator 83, an aspect ratio evaluator 84, a brightness fluctuation evaluator 85, an edge evaluator 86, a filling degree evaluator 87, other evaluators 88, a classifier 89, an intrusion degree evaluator 90, and an alarm controller 91. The components from the far-infrared image brightness evaluator 81 through the filling degree evaluator 87 calculate quantitative values, such as feature amounts (explanatory variables) or probabilities, that are used by the classifier 89.

The far-infrared image brightness evaluator 81 evaluates, among the attributes of the candidate region (object) accumulated by the tracker 7, the brightness of the candidate region in the far-infrared image, and outputs a numerical value describing it. As an example, it outputs the average brightness in the candidate region multiplied by a predetermined coefficient. The coefficient serves to normalize the variance of each feature amount. Alternatively, it may output the posterior probabilities that the object is a ship, a floating object, or a swimmer, given the obtained average brightness. The same applies to the other evaluators described below.

The position change evaluator 82 calculates the period and width (wave height) of the fluctuation from the time series of the centroid position, which is one of the accumulated attributes of the candidate region, and quantifies and outputs either the degree of agreement with the period and wave height obtained by the sea level condition acquirer 3, or the degree of linearity and constant velocity of the position change. The wave height may be compared as either the apparent or the actual wave height, with coordinate conversion applied as needed for the comparison. Alternatively, a statistical correlation value may be calculated between the time series of the vertical position of the bright or dark part extracted by the dark part / bright part extractor 33 near the candidate region and the time series of the centroid position. The degree of linearity and constant velocity can be quantified, for example, by dividing the average magnitude of the acceleration (its absolute value, or the component perpendicular to the velocity) by the average speed. The candidate region positions used here may be those before processing by the Kalman filter 75, or those in image coordinates.
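The linearity and constant-velocity measure described above (mean acceleration magnitude divided by mean speed) can be sketched as follows. The sketch assumes uniformly sampled positions and finite-difference derivatives; the function name `motion_irregularity` is hypothetical.

```python
import numpy as np

def motion_irregularity(positions, dt=1.0):
    """Mean |acceleration| / mean |velocity|; near 0 for straight, uniform motion."""
    p = np.asarray(positions, dtype=float)
    if len(p) < 3:
        raise ValueError("need at least 3 positions")
    vel = np.diff(p, axis=0) / dt      # finite-difference velocity
    acc = np.diff(vel, axis=0) / dt    # finite-difference acceleration
    mean_speed = np.linalg.norm(vel, axis=1).mean()
    if mean_speed == 0.0:
        return float("inf")            # stationary: the ratio is undefined
    return float(np.linalg.norm(acc, axis=1).mean() / mean_speed)
```

A vessel on a steady course gives a value near zero, while an object bobbing with the waves gives a much larger value.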

The size evaluator 83 outputs a value obtained by time-averaging the size (in world coordinates), which is one of the accumulated attributes of the candidate region. The median may be used instead of the time average. The same applies to the other evaluators.

The aspect ratio evaluator 84 outputs a value obtained by time-averaging the aspect ratio, which is one of the accumulated attributes of the candidate region.

The brightness fluctuation evaluator 85 outputs a value that evaluates the degree of variation, such as the statistical variance or deviation, in the time series of average brightness, which is one of the accumulated attributes of the candidate region.

The edge evaluator 86 outputs a value obtained by time-averaging the edge amount, which is one of the accumulated attributes of the candidate region.

The filling degree evaluator 87 outputs a value obtained by time-averaging the filling degree, which is one of the accumulated attributes of the candidate region.

The other evaluators 88 output other feature amounts, or parameters of the classifier 89, based on the attributes of the candidate region and the like. For example, they output a feature amount related to the type of video source (visible / far-infrared) or to sunlight (day / night), or a signal that switches the classifier accordingly.

The classifier 89 is a trained classifier built using well-known techniques such as case-based reasoning (k-nearest neighbor method), decision trees, logistic regression, Bayesian inference (including hidden Markov models), and perceptrons, and outputs the identification (classification) result for the candidate region and/or the probability of each class. Inside the classifier 89, the parameters or the learning machine can be switched according to the output of the other evaluators 88. If the identification result of the feature amount-based detector 6 is available, it may be integrated with this result, and if evaluation values for a candidate region are obtained from both the far-infrared image and the visible image, the results identified for each may be integrated. When the far-infrared image brightness evaluator 81 through the filling degree evaluator 87 output the probability of each class, the classifier 89 may be configured as an ensemble learner that integrates them.
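As a simple stand-in for the ensemble integration mentioned above, the following sketch combines per-evaluator class probabilities by weighted averaging (soft voting). The patent does not prescribe this particular rule; the function name `combine_probabilities`, the class labels, and the fixed weights are illustrative assumptions, and a trained ensemble would learn its weights from data.

```python
def combine_probabilities(per_evaluator, weights=None):
    """Weighted average of class-probability dicts that share the same keys."""
    classes = list(per_evaluator[0])
    if weights is None:
        weights = [1.0] * len(per_evaluator)
    total = sum(weights)
    return {
        c: sum(w * p[c] for w, p in zip(weights, per_evaluator)) / total
        for c in classes
    }
```

The class with the highest combined probability would then be reported as the identification result, together with the probabilities themselves.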

The intrusion degree evaluator 90 outputs, from the series of Kalman-filtered position coordinates that is one of the accumulated attributes of the candidate region, an evaluation value concerning the degree of intrusion into territorial waters or of approach to land, or the intention or possibility thereof. In a simple example, the shortest distance from the current position (global coordinates) to the coastline (baseline) or the territorial sea boundary on a map held in advance can be used as the evaluation value. This, however, may raise an alarm for a vessel with no intention of intruding, such as one merely passing off the tip of a cape. It is therefore desirable to train on a large number of trajectories using a well-known machine learning method, to perform outlier (anomaly) detection that responds to trajectories differing from those normally observed, or to perform destination estimation, and to vary the evaluation value according to those results.
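The simple example above, the shortest distance from the current position to the coastline, can be sketched as follows. The sketch assumes that the coastline is stored as a polyline of global (x, y) vertices in metres; the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_coastline(p, coastline):
    """Shortest distance from p to a polyline given as a list of vertices."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(coastline, coastline[1:])
    )
```

As the text notes, this distance alone can misfire for a vessel merely rounding a cape, which is why trajectory-based evaluation is preferred.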

The alarm controller 91 outputs, based on the object identification result from the classifier 89 and the evaluation value from the intrusion degree evaluator 90, a continuous or sufficiently multi-level evaluation value representing the degree of intrusion threat, and outputs an alarm each time that evaluation value changes across a set threshold. The identification result of the classifier 89 normally indicates a probability, but even if it merely indicates a single selected class, a reliability that increases with the length of the tracking period in the Kalman filter 75, or with the apparent size of the candidate region, can be used instead.
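The threshold-crossing behaviour can be sketched as follows, assuming a sequence of threat evaluation values and a list of set thresholds. The function name `crossing_alarms` and the tuple layout are hypothetical.

```python
def crossing_alarms(scores, thresholds):
    """Return (index, threshold, direction) for each threshold crossing."""
    alarms = []
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        for th in thresholds:
            if prev < th <= cur:
                alarms.append((i, th, "up"))
            elif cur < th <= prev:
                alarms.append((i, th, "down"))
    return alarms
```

Because an alarm fires only on a crossing, a threat value that stays above a threshold does not re-trigger the alarm on every frame.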

The configurations of the system, devices, and the like according to the present invention are not necessarily limited to those described above, and various configurations may be used. For example, the video frame may be differenced directly against the long-time background image without using the short-time background image; although many differences due to waves then arise, machine learning in the feature amount-based detector 6 or the like may be able to separate them.

The present invention can also be provided, for example, as a method or device for executing the processing according to the present invention, as a program for causing a computer to realize such a method, or as a non-transitory tangible medium on which the program is recorded.

The present invention can be applied to CCTV (Closed-Circuit Television) systems and the like.

1 Surveillance system
2 Surveillance camera device
3 Sea level condition acquirer
4 Difference method-based detector
5 Detector under silhouette situation
6 Feature amount-based detector
7 Tracker
8 Threat evaluator

Claims

A water intrusion detection system comprising:
a sea level condition acquirer (3) that automatically estimates, based on input video from a video source (2), wave attributes including the amplitude and period of the water surface waves forming the background of the input video;
a difference method-based detector (4) that generates a reference image from the input video and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness; and
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.

A water intrusion detection system comprising:
a difference method-based detector (4) that generates a reference image based on input video from a video source (2) and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness;
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type; and
a tracker (7) that labels the dark regions of the object candidates detected by the difference method-based detector, the detector under the silhouette situation, and the feature amount-based detector, associates and integrates them in the time direction, and updates the attributes of the dark regions of the object candidates,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.

A water intrusion detection system comprising:
a difference method-based detector (4) that generates a reference image based on input video from a video source (2) and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness;
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type; and
a threat evaluator (8) that identifies the object, continues to track the object, comprehensively evaluates the threat in consideration of the object's tendency to approach land, and issues alarms in multiple stages,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.

A water intrusion detection system comprising:
a sea level condition acquirer (3) that automatically estimates, based on input video from a video source (2), wave attributes including the amplitude and period of the water surface waves forming the background of the input video;
a difference method-based detector (4) that generates a reference image from the input video and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness;
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type; and
a tracker (7) that labels the dark regions of the object candidates detected by the difference method-based detector, the detector under the silhouette situation, and the feature amount-based detector, associates and integrates them in the time direction, and updates the attributes of the dark regions of the object candidates,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.

A water intrusion detection system comprising:
a sea level condition acquirer (3) that automatically estimates, based on input video from a video source (2), wave attributes including the amplitude and period of the water surface waves forming the background of the input video;
a difference method-based detector (4) that generates a reference image from the input video and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness;
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type; and
a threat evaluator (8) that identifies an object, comprehensively evaluates the threat in consideration of the movement tendency of the object, and issues alarms in multiple stages,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.

A water intrusion detection system comprising:
a difference method-based detector (4) that generates a reference image based on input video from a video source (2) and detects, from the input video, pixels whose values change at a higher speed than the reference image;
a detector (5) under a silhouette situation that detects a dark region as an object candidate from the input video showing a background of substantially saturated brightness and an object of substantially dark brightness;
a feature amount-based detector (6) that extracts an image feature amount from the input video and, when an image feature amount corresponding to an object type machine-learned in advance is found, outputs that object type;
a tracker (7) that labels the dark regions of the object candidates detected by the difference method-based detector, the detector under the silhouette situation, and the feature amount-based detector, associates and integrates them in the time direction, and updates the attributes of the dark regions of the object candidates; and
a threat evaluator (8) that identifies the object that caused the dark region of the object candidate, comprehensively evaluates the threat in consideration of the movement tendency of the object, and issues alarms in multiple stages,
wherein the feature amount-based detector (6) acquires information on the candidate regions detected by the difference method-based detector (4) and the detector (5) under the silhouette situation, and extracts the image feature amount in the vicinity of the dark region in the input video.