JP2011186719A - Apparatus, method and program for detecting object

Info

Publication number: JP2011186719A
Application number: JP2010050417A
Authority: JP
Inventors: Naoki Ito; 直己伊藤; Isamu Igarashi; 勇五十嵐; Isao Miyagawa; 勲宮川; Hiroyuki Arai; 啓之新井; Hideki Koike; 秀樹小池
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-03-08
Filing date: 2010-03-08
Publication date: 2011-09-22

Abstract

PROBLEM TO BE SOLVED: To count objects accurately by suppressing assimilation of an object into the background model even when the object (a person) remains stationary for a comparatively long period.
SOLUTION: An object detection processing part 3 divides an input image photographed by a fixed camera 1 and a background model into blocks, each composed of a plurality of pixels, and extracts, block by block, an object area corresponding to the contour of the object. The object detection processing part 3 then performs expansion processing to fill the area inside the contour and contraction processing to suppress the excess detection of the object area caused by the expansion processing. A load value table generation part 5 obtains, for each pixel, a load value expressing the contribution ratio of that pixel to the number of objects, and generates a load value table. An object counting part 6 integrates the load values of the pixels in the object area to count the number of objects.
COPYRIGHT: (C) 2011, JPO & INPIT

Description

The present invention relates to an object detection device, an object detection method, and an object detection program that detect the region of an object present in an input image captured by an image input device such as a fixed camera and count the number of objects in the input image. In particular, it relates to a technique suited to counting objects, such as people in a crowd, whose contour parts (hands, feet, toes, head, and so on) move slightly through swaying even while the object is standing still.

A technique described in Patent Document 1 is known for counting (measuring) the number of objects in an input image (video) from a fixed camera or the like. Based on geometric conditions including the orientation and position of the camera used for shooting, this technique obtains, for every pixel, a load value corresponding to the contribution ratio of that pixel to the number of objects when an object is present at that pixel. It also constructs a background model in which objects are excluded from the input image, and obtains the object region where objects exist from the difference between the input image and the background model. The number of objects in the input image is then measured by integrating the load values of the pixels that make up the object region.

Patent Document 1: JP 2009-294755 A

The background model is generated, for example, by averaging a plurality of input images captured in time series, and is updated periodically to cancel out temporal changes in outdoor illumination (such as the difference in brightness between morning and night). Consequently, when a person stays in one place for a long time, the movement of the object cannot be detected, the object tends to be assimilated into the background, and detections may be missed. Likewise, when many similar objects are present, the differences between them are small, so a similar object that previously occupied the same location is easily absorbed into the background, which again tends to cause missed detections.

When a person remains stationary for a long period, the person is generally never completely still; swaying produces slight changes in motion at contour parts such as the hands and feet. The present invention focuses on this point: by extracting the contour of the object and then performing expansion processing and contraction processing, it reduces missed detections caused by assimilation of the object into the background and counts the number of objects accurately, even when objects remain stationary for a long time or when many similar objects are present.

That is, the object detection device according to the present invention comprises: object detection processing means that constructs a background model based on an input image captured by an image input device installed at a predetermined depression angle with respect to the target range of object detection, and that detects, based on the input image and the background model, an object region in which an object is present in the input image; load value table generating means that obtains, based on the geometric conditions of the image input device, a load value representing the contribution ratio of each pixel to the number of objects when an object is present at that pixel, and generates a load value table; and object counting means that integrates the load values of the pixels constituting the object region to count the number of objects present in the input image.

The present invention is characterized in that the object detection processing means divides the input image and the background model into blocks each consisting of a plurality of pixels; extracts, for each block, an object region corresponding to the contour of the object based on the similarity between the input image and the background model; performs expansion processing on that object region so that the region inside the contour is corrected to be part of the object region; and performs contraction processing on the expanded object region so as to suppress excessive expansion caused by the expansion processing.

Detecting the object region in this way, quantitatively and in units of blocks that contain a plurality of pixels and are sized appropriately for the object, follows the shape of the object less closely than fine pixel-by-pixel detection would. In exchange, the effects of false detections at individual pixels are cancelled out and absorbed, so the contour of an object that undergoes minute fluctuations can be extracted reliably. The expansion processing then corrects the region inside the contour to be part of the object region, and the contraction processing suppresses the excessive expansion this causes, so the object region can be detected appropriately. This suppresses missed detections caused by assimilation of the object into the background model and enables accurate counting of the number of objects.
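The block-by-block comparison can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify the similarity measure, so the mean absolute difference per block and the names `detect_object_blocks`, `block`, and `threshold` are assumptions made for this example.

```python
def detect_object_blocks(image, background, block, threshold):
    """Flag each block as object (True) or background (False).

    image, background: 2-D lists of grey levels with identical shape.
    block: side length of a square block in pixels.
    threshold: assumed dissimilarity level above which a block is object.
    The mean absolute difference is an assumed stand-in for the
    unspecified similarity measure in the text.
    """
    rows, cols = len(image), len(image[0])
    flags = []
    for top in range(0, rows, block):
        row_flags = []
        for left in range(0, cols, block):
            total, count = 0, 0
            # Accumulate the difference over every pixel in this block.
            for y in range(top, min(top + block, rows)):
                for x in range(left, min(left + block, cols)):
                    total += abs(image[y][x] - background[y][x])
                    count += 1
            row_flags.append(total / count > threshold)
        flags.append(row_flags)
    return flags
```

Because the decision is taken on the block average, a single deviating pixel inside an otherwise unchanged block does not flag the block, which is the absorption of per-pixel false detections described above.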

However, the expansion and contraction processing unavoidably produces combined regions, in which the non-object area between adjacent objects is over-detected as object region, and this creates a new problem: the number of objects is over-counted by the amount of these combined regions. Preferably, therefore, the probability that a combined region occurs is obtained, and the number of objects counted by the object counting means is corrected downward based on this probability. This suppresses the over-count attributable to the combined regions.

As described above, the accuracy of object region detection and contour extraction depends heavily on the relationship between the size of the object and the size of the block. If the blocks (grid cells) are too small relative to the object, the number of blocks inside the contour in which no object is detected increases, as shown in FIG. 1(A), and it becomes difficult for the expansion processing described above to fill the missing portion inside the contour.

Preferably, therefore, the average size of an object in the input image is calculated based on the geometric conditions of the image input device, and the block size is set based on the shorter of the average object's height and width. Specifically, the block size is adjusted and set automatically so that the length of one side of the block falls within the range of 1/2 to 1/3 of that shorter dimension.
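The block-size rule can be written down directly. The sketch below assumes the average object height and width in pixels are already known; the function names are hypothetical, and picking the integer nearest the middle of the permitted range is an assumption, since the text only states the range.

```python
def block_side_range(avg_height_px, avg_width_px):
    """Return (low, high): 1/3 and 1/2 of the shorter object dimension."""
    shorter = min(avg_height_px, avg_width_px)
    return shorter / 3, shorter / 2


def choose_block_side(avg_height_px, avg_width_px):
    """Pick one integer block side inside the permitted range.

    Taking the value nearest the middle of the range is an assumption
    made for illustration; the text only requires 1/3 to 1/2.
    """
    low, high = block_side_range(avg_height_px, avg_width_px)
    return max(1, round((low + high) / 2))
```

For an average object of 60 × 24 pixels, the shorter side is 24, the permitted range is 8 to 12 pixels, and this sketch picks a block side of 10.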

Alternatively, the block size may be kept constant and the input image and background model enlarged or reduced instead. That is, the average size of an object in the input image is calculated based on the geometric conditions of the image input device; the input image and the background model are enlarged or reduced, based on the shorter of the average object's height and width, so that the size of the object relative to each block reaches a predetermined level; and objects are then detected block by block. In this case, the object counting means counts the number of objects after the detection result has been restored to the original size.

The processing described above is executed by a computer, for example in the form of a program, and the program can be recorded on a suitable recording medium.

As described above, according to the present invention, the contour of an object is extracted by detecting the object region in units of blocks containing a plurality of pixels, and expansion processing that fills the inside of the contour and contraction processing that suppresses excessive expansion are then performed. As a result, even when an object such as a person stays still in one place for a relatively long period, or when the background model has been updated to cope with long-term illumination changes, the contour of the object, which inevitably moves slightly because of the person's swaying, can be extracted, the object to be detected can be separated from the background model, and the object region can be detected and the number of objects counted appropriately.

FIG. 1 is an explanatory diagram showing the concept of the region detected as an object when (A) the block size is small and (B) the block size is large.
FIG. 2 is an explanatory diagram showing an example of (A) expansion processing and (B) contraction processing by morphological operations.
FIG. 3 is an explanatory diagram showing an example of the suppression of over-detected pixels around an object by contraction processing.
FIG. 4 is an explanatory diagram showing the combined region of objects that remains after expansion and contraction processing.
FIG. 5 is an explanatory diagram showing the regions in which objects may be combined by expansion processing.
FIG. 6 is a block diagram schematically showing the basic configuration according to the present invention when an image storage unit is included.
FIG. 7 is a block diagram schematically showing the basic configuration according to the present invention when no image storage unit is included.
FIG. 8 is a flowchart showing the flow of control processing according to a first embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of control processing according to a second embodiment of the present invention.
FIG. 10 is a flowchart showing the flow of control processing according to a third embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. First, the basic principles and concepts are described with reference to FIGS. 1 to 5.

When a person remains stationary in one place for a long period, the person is generally never completely still; swaying produces slight changes in motion at contour parts such as the hands, feet, and head. Focusing on this point, the present invention compares an input image captured by an image input device such as a fixed camera (also simply called the "image") with a background model (hereinafter also simply called the "background") for each grid-shaped block (also called a "grid cell") made up of a plurality of pixels. This extracts the contour of an object that exhibits such slight changes, and applying the expansion and contraction processing described later to that contour makes it possible to detect an object region that matches the actual object. In other words, detecting the object region in units of blocks of a plurality of pixels makes it possible to detect slight changes in the contour portion caused by the swaying of an object in the image, and to extract the contour of the object reliably and separately from the background, even when the background has been updated to cope with long-term illumination changes. Slight differences between similar objects can therefore be detected even when a plurality of similar objects are present.

FIG. 1 is a conceptual diagram of the first stage of object detection according to one embodiment of the present invention. An arbitrary image and the background model are each divided into square-grid blocks 10 of a suitable size, and the input image and the background model are compared block by block to determine whether each block 10, that is, all of the pixels contained in that block 10, is object region. For a moving object 11 such as a person, slight changes such as swaying cause background and object region to alternate in the contour and surrounding area of the object 11. Comparing the input image with the background model in units of blocks 10, each made up of a plurality of pixels, therefore allows the contour portion of the object 11 to be extracted as object region, as shown by the white portion 12 on the right side of FIG. 1.

The size of the block 10 can be set arbitrarily. However, if the block 10 is too small relative to the object 11, then, as shown in FIG. 1(A), the number of blocks forming the undetected region 13, which lies inside the contour portion 12 but is not detected as object, increases, and correction by the expansion processing described later becomes difficult. The block 10 is therefore given at least a certain size and, so that the undetected region can be corrected by the expansion processing described later, a uniform size throughout the image.

When the surface of the object shows little change, the contour portion 12 of the object is detected as object region, as shown in FIG. 1, but almost no movement or change occurs in the region 13 inside the object. Even with block-by-block comparison over a plurality of pixels, the object there is assimilated into the background model and cannot be detected. To correct this undetected region 13 inside the contour portion 12 so that it is detected as object region, expansion processing is performed using, for example, a known morphological operation, as shown in FIG. 2, followed by contraction processing to reduce the excessive expansion of the object region that the expansion processing causes. The expansion processing of the morphological operation corrects the undetected region 13 inside the region 12 corresponding to the contour so that it becomes object region, and the subsequent contraction processing suppresses the region 12A around the object that was over-detected by the expansion processing. The block used as the structuring element in the expansion and contraction processing is, for example, a square grid of 3 × 3 pixels, and the expansion processing and the contraction processing are each repeated a number of times equal to half the number of pixels on one side of the square grid (five times each for 10 pixels per side), whereby the undetected region 13 is detected as object region. As shown in FIG. 3, because the comparison is made for each block containing a plurality of pixels divided in a grid, the pixels around the object that are over-detected can also be reduced by the contraction processing of the morphological operation to an object region of a size appropriate to the actual object.
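The expansion and contraction steps can be illustrated with a small pure-Python sketch of binary dilation and erosion using a 3 × 3 structuring element. Two points are assumptions of this example rather than statements from the text: all dilations are run first and all erosions afterwards (a morphological closing), and neighbours that fall outside the image are simply ignored. The function names are hypothetical.

```python
def _neighbourhood(mask, y, x):
    """Values of the 3x3 neighbourhood of (y, x), clipped to the image."""
    rows, cols = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(rows, y + 2))
            for i in range(max(0, x - 1), min(cols, x + 2))]


def dilate(mask):
    """Expansion: a pixel becomes object if any 3x3 neighbour is object."""
    return [[any(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    """Contraction: a pixel stays object only if all 3x3 neighbours are."""
    return [[all(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def fill_and_trim(mask, iterations):
    """Dilate `iterations` times, then erode the same number of times."""
    for _ in range(iterations):
        mask = dilate(mask)
    for _ in range(iterations):
        mask = erode(mask)
    return mask
```

Applied to a hollow 5 × 5 contour in an 11 × 11 mask with two iterations, this fills the 3 × 3 interior and trims the outside back to the original 5 × 5 extent, so 16 contour pixels become a solid region of 25 pixels.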

FIG. 4 shows an example in which a plurality of objects 11 in the image lie close to one another within a certain range. Here, "within a certain range" means a distance between objects at which the expansion processing of the morphological operation can produce a combined region 14 in which the objects are joined. Pixels joined by the expansion processing do not return to their original state in the subsequent contraction processing, so they are detected as object region and cause over-detection of objects and of the number of objects. It is therefore necessary to calculate the probability that objects are combined by the expansion processing and to correct the error due to over-detection. The size of an object in the image varies with the actual size of the object, the focal length of the camera, and the distance from the camera to the object, but the size of an object and the distance from the camera to a moving object cannot be measured in advance. For simplicity, it is therefore assumed here that objects of average size are distributed uniformly in the image, and the probability that objects are combined is calculated using the average size of an object in the image rather than in real space.

Because of the nature of the morphological operation, combination by the expansion processing occurs only in the regions β1 to β4 above, below, to the left of, and to the right of the object shown in FIG. 5, and not in its diagonal directions. For example, combination occurs when the distance between the region β2 on the right side of an object 11 and the region β4 on the left side of another adjacent object (not shown) is shorter than a predetermined level. Likewise, combination occurs when the distance is short between the region β4 on the left side of the object 11 and the region β2 on the right side of another object, between the region β1 above the object 11 and the region β3 below another object, or between the region β3 below the object 11 and the region β1 above another object. It follows that an error due to combination occurs at a pixel (x, y) when that pixel is not an object and falls within one of these combinations of regions joined by the expansion processing. Let α be the average object area in the image (in pixels) and A the area of the entire image (in pixels). The probability P1'(x, y) that a pixel (x, y) is not an object when exactly one object is present is then given by equation (1).
P1'(x, y) = 1 − α/A … (1)
When two objects are present, the probability P1''(x, y) that a pixel (x, y) is not an object is P1' multiplied by the probability that the second object is not present there, taken over the area outside the first object. Letting a be the average floor area (in pixels) of an object in the image, it is given by equation (2).
P1''(x, y) = P1'(x, y) × {1 − α/(A − a)} … (2)
Generalizing equations (1) and (2), the probability P1(x, y) that a pixel (x, y) is not an object when N objects are present can be expressed by the following equation (3).
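The formula for equation (3) is not reproduced in this text. One form that reduces to equation (1) for N = 1 and to equation (2) for N = 2 is the product below; it is offered as an assumed reconstruction from those two cases, not as the published equation.

```latex
P_1(x, y) \;=\; \prod_{k=0}^{N-1} \left\{ 1 - \frac{\alpha}{A - k\,a} \right\} \qquad \text{(assumed form of (3))}
```

Here the k-th factor treats the k objects already placed as having removed k·a pixels of floor area from the region available to the next object, which is the same step that takes equation (1) to equation (2).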

The number of objects N is calculated, for example, from the object area α and the load values, which are obtained from the geometric relationship between the camera and the floor surface as described in Patent Document 1.

When object detection is performed on only part of the image rather than the whole image, the value of the area A should be taken as the area (in pixels) of the target range rather than the area of the entire image.

As long as the geometric conditions of the image input device, that is, the internal parameters such as focal length, image center, and lens distortion and the external parameters including orientation and position, and the target range of object detection remain unchanged, α, A, and a are constants and P1(x, y) is a function of N alone. Because N is the number of objects in the target range, its maximum does not change greatly as long as the floor area of the imaged range to be measured and the size of the objects are fixed. An upper limit for N is therefore easy to determine, and processing can be sped up by calculating P1(x, y) in advance for N from 1 to the upper limit and storing the results in a table.
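The precomputed table can be sketched as follows. The product form used for P1 is an assumption: it is the natural generalization of equations (1) and (2), but equation (3) itself is not reproduced in this text. The names `p1_not_object`, `build_p1_table`, and `footprint` (for the floor area a) are hypothetical.

```python
def p1_not_object(n, alpha, area, footprint):
    """Probability that a pixel is not an object when n objects exist.

    Assumed form: product over k = 0..n-1 of 1 - alpha / (area - k * footprint),
    which matches equation (1) for n = 1 and equation (2) for n = 2.
    alpha: average object area, area: A, footprint: a (all in pixels).
    """
    p = 1.0
    for k in range(n):
        p *= 1.0 - alpha / (area - k * footprint)
    return p


def build_p1_table(n_max, alpha, area, footprint):
    """Precompute P1 for N = 1..n_max so it can be looked up at run time."""
    return {n: p1_not_object(n, alpha, area, footprint)
            for n in range(1, n_max + 1)}
```

With α = 100, A = 10000, and a = 50, the entry for N = 1 is 0.99, and each further object multiplies in a slightly smaller factor, so the table decreases monotonically with N.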

Next, let the regions combined by the expansion processing be β1, β2, β3, and β4 shown in FIG. 5. The probability of combination by the expansion processing can then be considered in terms of the region where the β1 region of an object A overlaps the β3 region of another object different from object A (hereinafter, object X), so that two or more such regions coincide. Assuming that object A and object X both have the average object size in the image, β1 and β3, and likewise β2 and β4, have equal areas for both objects, as shown in equations (4) and (5).
β1 = β3 = β' … (4)
β2 = β4 = β'' … (5)
It follows that the probability that a pixel (x, y) is not combined by the expansion processing is the probability that no β' or β'' region, or only one, is present there. The probabilities Pb01(x, y) and Pb02(x, y) that no β' region and no β'' region, respectively, is present at a pixel (x, y) are given by equations (6) and (7).
Pb01(x, y) = 2 × (1 − β'/A)^N … (6)
Pb02(x, y) = 2 × (1 − β''/A)^N … (7)
The probabilities Pb11(x, y) and Pb12(x, y) that exactly one β' region and exactly one β'' region, respectively, is present at a pixel (x, y) are given by equations (8) and (9).
Pb11(x, y) = 2 × N × β'/A × (1 − β'/A)^(N−1) … (8)
Pb12(x, y) = 2 × N × β''/A × (1 − β''/A)^(N−1) … (9)
Using equations (6) to (9), the probability P2(x, y) that two or more β' or β'' regions are present at a pixel (x, y) is given by equation (10).
P2(x, y) = 1 − {Pb01(x, y) + Pb02(x, y) + Pb11(x, y) + Pb12(x, y)} … (10)
From equations (3) and (10), the probability P(x, y) that a pixel (x, y) is combined by the expansion processing is given by equation (11).
P(x, y) = P1(x, y) × P2(x, y) … (11)
Of P1(x, y) and P2(x, y), P1(x, y) decreases as the number of objects N increases, while P2(x, y) increases as N increases. Their product, the final probability P(x, y), therefore increases with N until N reaches a certain level and decreases with N beyond that level.

Of the pixels detected as object region, the number of pixels given by the probability P(x, y) can be regarded as pixels that were joined by the expansion processing, were not removed by the contraction processing, and are therefore over-detected. Because every object in the image is assumed to have the average object size, the number of pixels is proportional to the number of objects, and the over-detection probability of equation (11) can be applied to the number of objects rather than to pixels. Accordingly, as shown in equation (12), correcting the number of objects N with the probability P(x, y) calculated by equation (11) yields the number of objects N' that takes over-detection into account.

FIG. 6 shows an example of the basic configuration according to the present invention. In FIG. 6, reference numeral 1 denotes a fixed camera serving as the image input device, and 2 denotes an image storage unit that stores the images acquired by the camera 1. Reference numeral 3 denotes an object detection processing unit. It generates a background model from the images stored in the image storage unit 2, compares the background model with the input image for each block containing a plurality of pixels to detect the contour portion of an object as a region containing the object, and applies error reduction processing to the detected region, such as expansion processing that fills the missing portion inside the contour and contraction processing that reduces the portion over-detected by the expansion processing.

A known method can be used to generate the background model; for example, it can be created by averaging the images of multiple frames. The background model does not have to be generated every time an image is input. By holding the background model temporarily, a previously generated background model can be reused and updated at arbitrary intervals.
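The frame-averaging approach mentioned above can be sketched as follows. This is only an illustration under simple assumptions: frames are equally sized grayscale images stored as nested lists, and the function name `average_background` is hypothetical rather than taken from the publication.

```python
def average_background(frames):
    """Return the per-pixel mean of equally sized grayscale frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    total = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                total[y][x] += frame[y][x]
    count = len(frames)
    return [[value / count for value in row] for row in total]


# Two tiny 2x2 frames (illustrative values only).
frames = [
    [[10, 20], [30, 40]],
    [[20, 40], [30, 80]],
]
background = average_background(frames)
```

Holding the returned model and recomputing it only every so many frames would correspond to the "arbitrary interval" update described in the text.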

Reference numeral 4 denotes a camera information input unit that inputs camera information (geometric conditions) such as the internal parameters (focal length, image center, lens distortion, and so on) and the external parameters (orientation and position) of the camera 1. Reference numeral 5 denotes a load value table generation unit serving as load value table calculation means. Assuming the perspective projection relationship that associates the internal and external parameters input from the camera information input unit 4 with three-dimensional points in real space, it calculates, within the view volume projected onto each pixel of the observed image, the volume that contributes to the volume of an object. It then obtains for every pixel a load value, that is, a contribution ratio expressing quantitatively how much the pixel contributes to the number of objects, and generates the load value table.

Reference numeral 6 denotes an object counting unit that measures the number of objects from the pixels of the region containing objects detected by the object detection processing unit 3 and the load value of each pixel obtained by the load value table generation unit 5. In other words, the object counting unit 6 counts the number of objects by accumulating the load values of the pixels in the object region.

Reference numeral 7 denotes an error correction unit serving as object number correcting means. Based on the pixels of the region containing objects detected by the object detection processing unit 3 and the number of objects measured by the object counting unit 6, it calculates the probability that the number of objects is over-detected because of the merged regions between objects that inevitably arise from the error reduction processing (expansion and contraction processing) of the object detection processing unit 3. Based on this probability, it corrects the error in the measured number of objects again and recalculates the number of objects.

Reference numeral 8 denotes an object number output unit that outputs the number of objects recalculated by the error correction unit 7. The processing in the object detection processing unit 3, the load value table generation unit 5, the object counting unit 6, and the error correction unit 7 can be executed by, for example, a well-known computer system. The image storage unit 2 may use a recording medium such as a hard disk, a RAID device, or a CD-ROM, or may use remote data resources via a network.

FIG. 7 shows another example of the basic configuration according to the present invention. As in this example, a configuration that performs processing in real time without a storage device such as a storage unit or a recording medium is also possible. In FIG. 7, the same parts as in FIG. 6 are denoted by the same reference numerals, and redundant description is omitted as appropriate. In the example of FIG. 7, the image storage unit 2 is omitted. Instead, the object detection processing unit 3 holds the background model and updates it with the images input from the camera 1. The background model does not have to be updated every time an image is input; the update interval can be set to any value of one or more frames.

Next, the processing flow of each part of the apparatus of FIGS. 6 and 7 is described with reference to the flowchart of FIG. 8. To outline the overall flow first, an image captured by the camera 1 is acquired in step S1. In the case of FIG. 6, the acquired image is stored in the image storage unit 2. In step S2, the object detection processing unit 3 generates or updates the background model from the acquired image. In step S3, it is determined whether an unprocessed image exists. If one exists, the process proceeds to step S4, where the background model and the unprocessed input image are compared for each block of multiple pixels, and the region corresponding to the contour of an object is detected as a region containing the object. That is, for each block of multiple pixels, a similarity based on normalized correlation is calculated between the background model and the input image. If this similarity does not reach a predetermined level, the block of the input image is judged to be an object region, and all pixels in the block are detected as pixels where an object is present.

Next, in step S5, the region containing the object is subjected, as described above, to expansion processing that detects the missing portion inside the object contour as object region and to contraction processing that reduces the object region over-detected by the expansion processing, thereby reducing the error in the object region. The processing of steps S3 to S5 is performed by the object detection processing unit 3 of FIGS. 6 and 7.

Next, in step S10, based on the load value table generated in step S9, the load values of the pixels within the region containing objects detected in step S5 are added up, and the accumulated value is taken as the number of objects. This processing is performed by the object counting unit 6 of FIGS. 6 and 7. The load value table generated in step S9 is calculated for each pixel in step S8 based on geometric conditions such as the camera information input in step S6 and the camera parameters input and set in step S7. Because the load value table depends on the installation position of the camera, it can be calculated in advance, before the objects are counted, at any time after the camera has been installed.

Next, in step S11, the number of objects is corrected so as to suppress the excess count caused by merged regions, that is, non-object regions between adjacent objects that are over-detected as object region as a result of the expansion and contraction processing described above. Specifically, the error in the measured number of objects is calculated from the number of objects measured in step S10 and the number of pixels in the region containing objects calculated in step S5, and the measured number of objects is corrected for this error. Concretely, equation (12) above is used to obtain the number of objects N' that takes over-detection into account. The processing of step S11 is performed by the error correction unit 7 of FIGS. 6 and 7. Then, in step S12, the number of objects N' corrected in step S11 is output. The processing of step S12 is performed by the object number output unit 8 of FIGS. 6 and 7.

The details of each process are described below. Images acquired via the camera 1 are stored in the image storage unit 2, and the object detection processing unit 3 updates the background model based on the images retrieved from the image storage unit 2. The background model is updated at an arbitrary interval of one or more frames. For example, when the background model is created by averaging the images of multiple frames, the time taken for the background model to be completely refreshed through multiple updates corresponds to this "arbitrary interval"; it is set to 5 to 10 minutes outdoors and to about 30 minutes indoors.

Next, the retrieved image and the updated background model are each divided into grid-like blocks, and for each block of multiple pixels a similarity based on normalized cross-correlation is calculated using the pixels in the positionally corresponding blocks of the input image and the background model. This similarity takes a value between 0 and 1 and is obtained according to the installation environment of the camera (geometric conditions), such as indoors or outdoors. The similarity is compared with a threshold set in advance by experiment according to the camera installation environment and similar factors, and blocks whose similarity is at or below the threshold are detected as object region.
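The block-wise comparison described above might look like the following sketch. It assumes the normalized cross-correlation is computed without mean subtraction, which keeps the similarity in the range 0 to 1 for non-negative pixel values as the text states; the function names, the block size, and the threshold of 0.9 are illustrative assumptions, not values from the publication.

```python
import math


def block_similarity(image_block, background_block):
    """Normalized cross-correlation without mean subtraction (0..1 for non-negative pixels)."""
    dot = sum(a * b for a, b in zip(image_block, background_block))
    energy_image = sum(a * a for a in image_block)
    energy_background = sum(b * b for b in background_block)
    if energy_image == 0 and energy_background == 0:
        return 1.0  # two all-black blocks are treated as identical
    if energy_image == 0 or energy_background == 0:
        return 0.0  # one block is black and the other is not
    return dot / math.sqrt(energy_image * energy_background)


def detect_object_blocks(image, background, block_size, threshold):
    """Mark every pixel of a block as object (1) when the block similarity is at or below the threshold."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            ys = range(top, min(top + block_size, height))
            xs = range(left, min(left + block_size, width))
            image_block = [image[y][x] for y in ys for x in xs]
            background_block = [background[y][x] for y in ys for x in xs]
            if block_similarity(image_block, background_block) <= threshold:
                for y in ys:
                    for x in xs:
                        mask[y][x] = 1
    return mask


# A flat 4x4 background and an input image that differs only in the top-left 2x2 block.
background = [[100] * 4 for _ in range(4)]
image = [row[:] for row in background]
image[0][1] = 10
image[1][0] = 10
mask = detect_object_blocks(image, background, block_size=2, threshold=0.9)
```

Only the block containing the changed pixels falls below the threshold, so all four of its pixels are marked as object, matching the rule that every pixel of a detected block is treated as an object pixel.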

After the similarity calculation and threshold processing have been completed for all blocks, the expansion processing of the morphological operation shown in FIG. 2 corrects the undetected region inside the object contour that was detected as object region, so that the missing portion becomes object region. The contraction processing then reduces the regions over-detected by the expansion processing and the regions over-detected because the calculation was performed in units of blocks. From the morphological operation onward, the expansion and contraction processing is computed per pixel rather than per block.
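The expansion followed by contraction corresponds to what is commonly called binary dilation followed by erosion. A minimal per-pixel sketch is given below, assuming a square structuring element and treating pixels outside the image as background; both choices, and the function names, are assumptions made for illustration, since the publication does not specify them in this passage.

```python
def _morph(mask, radius, combine):
    """Apply `combine` (max for dilation, min for erosion) over a square window around each pixel."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                mask[ny][nx] if 0 <= ny < height and 0 <= nx < width else 0
                for ny in range(y - radius, y + radius + 1)
                for nx in range(x - radius, x + radius + 1)
            ]
            out[y][x] = combine(window)
    return out


def dilate(mask, radius=1):
    """Expansion: a pixel becomes object if any pixel in its window is object."""
    return _morph(mask, radius, max)


def erode(mask, radius=1):
    """Contraction: a pixel stays object only if every pixel in its window is object."""
    return _morph(mask, radius, min)


# A detected contour with an undetected hole in the middle.
contour = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
closed = erode(dilate(contour))
```

In this toy case the hole inside the contour is filled by the expansion, and the contraction brings the outer boundary back to its original extent.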

Next, from the object region detected by the object detection processing unit 3 and the load value table generated by the load value table generation unit 5, the object counting unit 6 counts the number of objects in the image by accumulating the load values of the pixels contained in the object region.
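The accumulation step can be illustrated with a short sketch. The load values below are made up for the example (a distant object is assumed to cover 2 pixels and a nearby one 4 pixels), and `count_objects` is a hypothetical name.

```python
def count_objects(mask, load_table):
    """Sum the load values of all pixels that belong to the object region."""
    return sum(
        load_table[y][x]
        for y in range(len(mask))
        for x in range(len(mask[0]))
        if mask[y][x]
    )


# Top row: far from the camera, each pixel contributes 0.5 object.
# Bottom row: near the camera, each pixel contributes 0.25 object.
load_table = [
    [0.5, 0.5, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
mask = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
count = count_objects(mask, load_table)
```

Here the two far pixels and the four near pixels each add up to one object, giving a count of 2.0 before any over-detection correction.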

Here, a fixed camera is assumed, and the three-dimensional position of the camera, the rotation angles representing its orientation, and its focal length (hereinafter, camera information), which are input through the camera information input unit 4, are all assumed to be known. The load value table generation unit 5 uses the camera information input through the camera information input unit 4 to associate the image space with the three-dimensional real space and calculates as the load value, for the case where an object is present at a given pixel, the ratio of that pixel to the total number of pixels on the object surface. The object counting unit 6 calculates the number of objects in the entire image by accumulating the values of the load value table that correspond to the object region detected by the object detection processing unit 3.

In the object detection processing unit 3, as shown in FIG. 4, the expansion processing of the morphological operation may cause a merged region that is not actually object region to be detected as object region. The error correction unit 7 therefore corrects the number of objects so as to suppress the over-detection caused by such merged regions. Based on the number of objects calculated by the object counting unit 6, the number of pixels in the entire image, the average size of an object in the image, and the number of pixels in the object region detected by the object detection processing unit 3, the error correction unit 7 calculates the merging probability P(x, y) as shown in equations (1) to (11) above. Because the calculated P(x, y) corresponds to the probability that an error due to merging occurs, the error correction unit 7 uses P(x, y) and the number of objects N calculated by the object counting unit 6 to calculate the corrected number of objects N' by equation (12) above. The object number output unit 8 acquires the corrected number of objects N' from the error correction unit 7 and outputs it as the final number of objects.

A first embodiment of the present invention is described with reference to FIG. 8 again. The processing of this embodiment consists broadly of a routine that calculates the load value table based on information about the camera installed in advance and a flow that counts the number of objects in the input image based on that table.

First, in the routine that calculates the load value table, the information about the camera installed in advance and the camera parameters are input in steps S6 and S7. Based on these values, the load value is calculated for each pixel in step S8, and the load value table is generated in step S9 from the calculated load values and the camera parameters. In the load value table, the proportion of the object surface area accounted for by each pixel of the input image is recorded as the load value, and the camera parameters used in the calculation are recorded as well.

In the object detection processing of the flow that counts the number of objects, an input image from the camera 1 is first acquired in step S1, and the background model is constructed in step S2 from the acquired input image. The background model is not generated for every input image; it can also be updated according to a processing interval set in advance. In parallel, step S3 determines whether an unprocessed input image exists. If there is one, the process moves to step S4; if there is none, the process returns to step S1.

In step S4, the background model generated or updated in step S2 and the input image are each divided into grid-like blocks, and for every block the background model and the input image are compared by normalized cross-correlation over the pixels in the block. If the normalized cross-correlation value is at or below a constant value set in advance, the difference from the background is large and the block is judged to be object region; if it exceeds that value, the difference from the background is small and the block is judged to be a region other than an object. In step S5, the expansion and contraction processing of the morphological operation suppresses the missing portions and over-detection that result from dividing the image into a grid for comparison. In step S10, the number of objects is counted based on the object region corrected in step S5 and the load value table created in advance in step S9. In step S11, the probability of over-detected regions arising from the merging of objects caused by the processing of step S5, as shown in FIG. 4, is calculated from the object region corrected in step S5 and the number of objects calculated in step S10. Finally, in step S12, the number of objects calculated in step S10 is corrected based on the error probability calculated in step S11 to obtain the object count.

In the first embodiment, normalized cross-correlation is used to compare the input image with the background model, and the block size is arbitrary. However, as shown in FIG. 1, if the block 10 is too small relative to the object, the missing portion inside the object contour cannot be fully corrected when the object detection processing unit 3 performs the correction of missing portions (expansion processing). On the other hand, if a fixed camera is used and camera information is input through the camera information input unit 4, the average size of an object in the input image can be calculated.

In the second embodiment shown in FIG. 9, therefore, the object region detection step S4A receives the camera information from the camera information input step S6, calculates the average object size from it, and automatically determines the size of the block used to compare the input image with the background model. For example, the target object to be detected is assumed to appear at the position closest to the center of the image, and the size of that object is used as the average object size. The region around the contour of an object can be detected thanks to slight wobbling of the object. The block size is therefore set, as shown in FIG. 1, to between one third and one half of the shorter of the height and width of the target object in the image. With this setting, the missing region in the central part of the object, which excludes the contour periphery detected through wobbling and is not detected as object region, is also roughly one block in size, so the correction of missing portions in the object detection processing unit 3 works well.
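The block-size rule of the second embodiment can be written down as a small helper. This sketch assumes the average object height and width in pixels have already been derived from the camera information; the function name and the integer rounding are illustrative choices.

```python
def block_size_range(object_height_px, object_width_px):
    """Return (smallest, largest) block edge length: one third to one half of the shorter object dimension."""
    shorter = min(object_height_px, object_width_px)
    smallest = max(1, shorter // 3)
    largest = max(smallest, shorter // 2)
    return smallest, largest


# A person assumed to appear about 90 pixels tall and 30 pixels wide near the image center.
smallest, largest = block_size_range(90, 30)
```

For this example the shorter dimension is the 30-pixel width, so a block edge between 10 and 15 pixels would satisfy the stated range.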

In the first and second embodiments, the block size used for the normalized cross-correlation is arbitrary. The larger the block, the more information (pixels) it contains and the easier it is to capture slight changes, but the amount of computation per block also increases. In the third embodiment, therefore, image size changing steps S13 and S14 are added, as shown in FIG. 10. The image size changing step S13 receives the information from the camera information input step S6, calculates the average object size from it, and changes the size of the input image acquired in step S1, so that the relative block size changes while the block size itself stays the same. As in the second embodiment, the size of the input image is changed so that the block is between one third and one half of the longer of the height and width of the target object in the image. The missing portion at the center of the object then also becomes roughly one block, so the correction of missing portions in the object detection processing unit 3 works well.

Because the load value table is created for the input image at its original size, the object region detected by the object detection processing unit is returned to the original input image size by the image size changing step S14, which follows the correction of missing portions and over-detection in step S5. The process then proceeds to step S10, where the object counting unit 6 counts the number of objects. Like the image size changing step S13, the image size changing step S14 receives the information from the camera information input step S6 and restores the image size based on that information.
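The resizing idea of the third embodiment can be sketched as follows: a scale factor is chosen so that a fixed block covers the desired fraction of the object's reference dimension, and the detected mask is later resized back to the original resolution. The function names, the nearest-neighbor resampling, and the choice of which object dimension serves as the reference are assumptions made for illustration.

```python
def scale_for_block(block_size, object_dim_px, fraction=0.5):
    """Scale factor that makes `block_size` equal to `fraction` of the scaled object dimension."""
    if not (1 / 3 <= fraction <= 1 / 2):
        raise ValueError("fraction is expected to lie between one third and one half")
    return block_size / (fraction * object_dim_px)


def resize_nearest(grid, new_height, new_width):
    """Nearest-neighbor resize of a 2D grid, usable for images and for binary masks."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[y * height // new_height][x * width // new_width] for x in range(new_width)]
        for y in range(new_height)
    ]


# With a fixed 8-pixel block and a 40-pixel reference dimension, shrink the image to 40 %.
scale = scale_for_block(block_size=8, object_dim_px=40, fraction=0.5)

# After detection on the reduced image, bring the mask back to the original 4x4 size.
small_mask = [[1, 0], [0, 0]]
restored_mask = resize_nearest(small_mask, 4, 4)
```

Restoring the mask before counting keeps it aligned with the load value table, which is defined on the original image size.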

The present invention can also be implemented as a program that causes a computer to perform some or all of the processing of each unit of the object detection device described above.

1 ... Camera (image input device)
2 ... Image storage unit
3 ... Object detection processing unit (object detection processing means)
4 ... Camera information input unit
5 ... Load value table generation unit (load value table generation means)
6 ... Object counting unit (object counting means)
7 ... Error correction unit (object number correcting means)
8 ... Object number output unit

Claims

An object detection apparatus comprising:
object detection processing means for constructing a background model based on an input image captured by an image input device installed at a predetermined depression angle with respect to a target range for object detection, and for detecting, based on the input image and the background model, an object region in the input image in which an object is present;
load value table generating means for generating a load value table by obtaining, based on a geometric condition of the image input device, a load value indicating the contribution ratio of each pixel to the number of objects when an object is present at that pixel in the input image; and
object counting means for accumulating the load values of the pixels in the object region and counting the number of objects present in the input image,
wherein the object detection processing means
divides the input image and the background model into blocks each composed of a plurality of pixels and, for each block, extracts an object region corresponding to the contour of the object based on the similarity between the input image and the background model,
performs expansion processing on the object region corresponding to the contour of the object in order to correct the region inside the contour of the object into object region, and
performs contraction processing on the object region after the expansion processing in order to suppress excessive enlargement of the object region caused by the expansion processing.

The object detection apparatus according to claim 1, further comprising object number correcting means for calculating the probability that a non-object region between adjacent objects is over-detected as object region as a result of the expansion processing and the contraction processing, and for correcting, based on this probability, the number of objects counted by the object counting means.

The object detection apparatus according to claim 1, wherein the object detection processing means calculates the average size of an object in the input image based on the geometric condition of the image input device and sets the size of the block based on the shorter of the height and width of the object.

The object detection apparatus according to any one of claims 1 to 3, wherein the object detection processing means
calculates the average size of an object in the input image based on the geometric condition of the image input device and, based on the shorter of the height and width of the object, detects objects for each block after enlarging or reducing the input image and the background model so that the size of the object relative to each block reaches a predetermined level, and
the object counting means counts the number of objects after the detection result has been returned to the original size.

An object detection program that causes a computer to function as each of the means according to any one of claims 1 to 4.

An object detection method comprising:
a background model construction step of constructing a background model based on an input image captured by an image input device installed at a predetermined depression angle with respect to a target range for object detection;
an object detection processing step of detecting, based on the input image and the background model, an object region in the input image in which an object is present;
a load value table generation step of generating a load value table by obtaining, based on a geometric condition of the image input device, a load value indicating the contribution ratio of each pixel to the number of objects when an object is present at that pixel in the input image; and
an object counting step of accumulating the load values of the pixels in the object region and counting the number of objects present in the input image,
wherein the object detection processing step
divides the input image and the background model into blocks each composed of a plurality of pixels and, for each block, extracts an object region corresponding to the contour of the object based on the similarity between the input image and the background model,
performs expansion processing on the object region corresponding to the contour of the object in order to correct the region inside the contour of the object into object region, and
performs contraction processing on the object region after the expansion processing in order to suppress excessive enlargement of the object region caused by the expansion processing.