JP5737909B2

JP5737909B2 - Image processing apparatus, image processing method, and program

Info

Publication number: JP5737909B2
Application number: JP2010250207A
Authority: JP
Inventors: 東條　洋; 洋東條
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-11-08
Filing date: 2010-11-08
Publication date: 2015-06-17
Anticipated expiration: 2030-11-08
Also published as: JP2012104964A

Description

The present invention relates to a technique for recognizing an object in frame images or the like acquired from a camera.

Conventionally, techniques have been disclosed in which people passing through an entrance or passage of a store or the like are photographed with a camera, the positions of their faces are detected in the captured video, and either the number of people who have passed is counted or it is determined whether a face belongs to a person registered in advance.

One technique for automatically counting passers-by in such a predetermined area from camera video is described in Patent Document 1 below. In Patent Document 1, a camera is installed above the passage and pointed straight down. Because a person's head appears circular when viewed from above, people are detected and counted by extracting circular objects from the camera video.

Meanwhile, techniques for detecting faces in images have come into practical use in recent years. Using such a technique, it is also possible to install a camera facing the passage from the front, as shown in FIG. 1 described later, and to count people by detecting faces in the camera video.

Here, the wider the area the camera covers, the more the orientation of the photographed face varies with the positional relationship between the camera and the person. When the orientation of the face changes, the facial features change as well, and recognition therefore becomes difficult.

To address this problem, Patent Document 2 below prepares recognition dictionaries according to face orientation, divides the frame image into a plurality of regions, and changes the recognition dictionary applied in each region. The case of FIG. 11 is taken as an example.

FIG. 11 is a schematic diagram illustrating that the orientation of a photographed face varies with the distance from the camera.
In FIG. 11, 1101 is the ceiling of the passage and 1102 is the floor. Reference numeral 1103 denotes a camera, which is installed on the ceiling 1101 and photographs the passage obliquely from above. When a person is far from the camera 1103, as at 1104, the vertical angle of the photographed face is small; when a person is close to the camera 1103, as at 1105, the vertical angle of the face is large.

FIG. 12 is a schematic diagram illustrating that facial features change depending on the viewpoint.
Reference numeral 1201 shows the face when a person is far from the camera 1103, like the person 1104, and 1202 shows the face when a person is close to the camera 1103, like the person 1105. As FIG. 12 shows, the facial features change depending on the viewpoint.

FIG. 13 is a schematic diagram for explaining the problem with the prior art.
Accordingly, as shown in FIG. 13, the frame image 1301 is divided into two regions 1302 and 1303; a dictionary that recognizes faces at a small angle is used for the region 1302, and a dictionary that recognizes faces at a large angle is used for the region 1303. That is, in the above example, 1304 is the position in the frame image of the face recognized when a person is at the position of the person 1104, and the dictionary for small-angle faces is used there. Likewise, 1305 is the position in the frame image of the face recognized when a person is at the position of the person 1105, and the dictionary for large-angle faces is used there.

Patent Document 1: JP-A-4-199487
Patent Document 2: JP 2007-25767 A

However, in Patent Document 2, recognition becomes difficult when a face is located at the boundary between the regions to which the respective recognition dictionaries are applied. In the above example, this is the case where a face is at the position 1306 in FIG. 13. The face angle at such a position is intermediate. In other words, the facial features deviate most from both the dictionary for small-angle faces and the dictionary for large-angle faces. Recognition accuracy therefore tends to be low whichever dictionary is used, and recognition becomes difficult.

The present invention has been made in view of this problem, and its object is to provide a mechanism that can recognize an object accurately even when the orientation of the object to be recognized varies within the frame.

To achieve the above object, the present invention provides an image processing apparatus and the like comprising: dictionary storage means for storing data of a plurality of dictionaries corresponding to different orientations of the same object; setting means for setting, for each of the plurality of dictionaries, an application region in a frame image such that the application regions of at least two dictionaries include an overlapping region; and recognition means for recognizing the object, using each of the plurality of dictionaries, in the application region set for that dictionary, wherein, for the application region that overlaps between at least two dictionaries, the recognition means switches the dictionary used from frame image to frame image over a plurality of consecutive frame images and integrates the recognition results obtained for the individual frame images.

According to the present invention, an object can be recognized accurately even when the orientation of the object to be recognized varies within the frame.

FIG. 1 is a schematic diagram showing an installation example of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the hardware configuration of the image processing apparatus according to the embodiment of the present invention.
FIG. 3 is a block diagram showing an example of the functional configuration of the image processing apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing an example of the processing procedure of an image processing method performed by the image processing apparatus according to the embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating the face angle according to the embodiment of the present invention.
FIG. 6 is a schematic diagram showing the relationship between the face angles to which recognition dictionaries A and B can be applied and the recognition accuracy, according to the embodiment of the present invention.
FIG. 7 is a schematic diagram illustrating the regions to which recognition dictionaries A and B can be applied, according to the embodiment of the present invention.
FIG. 8 is a diagram illustrating a method of searching an image for a face pattern.
FIG. 9 is a flowchart illustrating the operation of switching the dictionary and the matching region, according to the embodiment of the present invention.
FIG. 10 is a schematic diagram showing an example of trajectory generation and counting, according to the embodiment of the present invention.
FIG. 11 is a schematic diagram illustrating that the orientation of a photographed face varies with the distance from the camera.
FIG. 12 is a schematic diagram illustrating that facial features change depending on the viewpoint.
FIG. 13 is a schematic diagram for explaining the problem with the prior art.

Embodiments for carrying out the present invention are described below with reference to the drawings.

The embodiment described below uses the example of counting the number of people who pass through a passage.
FIG. 1 is a schematic diagram showing an installation example of an image processing apparatus according to an embodiment of the present invention.
Reference numeral 101 denotes the ceiling of the passage and 102 denotes the floor of the passage. Reference numeral 103 denotes a person walking through the passage. Reference numeral 104 denotes an imaging unit (camera), which is installed on the ceiling 101 so that the person 103 can be photographed obliquely from above. Reference numeral 105 denotes a LAN cable, which transmits the video captured by the imaging unit 104. Reference numeral 106 denotes a PC serving as the image processing apparatus that analyzes the video and performs the counting.

FIG. 2 is a block diagram showing an example of the hardware configuration of the image processing apparatus 106 according to the embodiment of the present invention.
In FIG. 2, reference numeral 201 denotes a CPU, which executes the various controls in the image processing apparatus 106 of this embodiment. Reference numeral 202 denotes a ROM, which stores a boot program executed when the image processing apparatus 106 starts up, as well as various data. Reference numeral 203 denotes a RAM, which stores the control program processed by the CPU 201 and provides a work area used when the CPU 201 executes the various controls. Reference numeral 204 denotes a keyboard and 205 denotes a mouse, which provide the user with an environment for various input operations.

Reference numeral 206 denotes an external storage device, which consists of a hard disk, flexible disk, optical disk, magnetic disk, magneto-optical disk, magnetic tape, or the like. The external storage device 206 is not an essential component, however, if the ROM 202 holds all of the control programs and the various data. In this embodiment, the control program for the processing of the present invention is assumed to be stored in the ROM 202 (or in the external storage device 206).

Reference numeral 207 denotes a display device, which consists of a display or the like and presents results and other information to the user. Reference numeral 208 denotes a network interface (NIC), which enables communication with the imaging unit 104 on the network via the LAN cable 105. Reference numeral 209 denotes a video interface (video I/F), which enables frame images to be captured from the imaging unit 104 via a coaxial cable. Reference numeral 210 denotes a bus that connects the above components.

FIG. 3 is a block diagram showing an example of the functional configuration of the image processing apparatus 106 according to the embodiment of the present invention.
Reference numeral 10 denotes imaging means consisting of an imaging lens and an imaging sensor such as a CCD or CMOS sensor. The imaging means 10 corresponds to the imaging unit 104 in FIG. 1.

Reference numeral 30 denotes image acquisition means, which acquires the image data captured by the imaging means 10 at predetermined time intervals and outputs it in units of a plurality of temporally consecutive frames. Frame images are sent from the imaging unit 104 over the LAN cable 105 as HTTP packet data and are acquired via the network interface 208 of the image processing apparatus 106. Alternatively, 105 may be a coaxial cable, with the frame images acquired through the video interface 209 of the image processing apparatus 106.

Reference numeral 40 denotes object recognition means, which performs recognition processing to determine whether a desired object appears in the image data acquired by the image acquisition means 30. Specifically, the object recognition means 40 uses a dictionary for the object to recognize the object in a predetermined orientation.
Reference numeral 50 denotes recognition result analysis/output means, which analyzes the results recognized by the object recognition means 40 and outputs the analysis results, for example for display on the display device 207.

Reference numeral 60 denotes object dictionary storage means, which is a memory that stores the object dictionaries for the desired recognition target used by the object recognition means 40. Specifically, the object dictionary storage means 60 stores data of a plurality of dictionaries corresponding to different orientations of the same object. Each object dictionary is obtained in advance by machine learning from a large number of object patterns in a given orientation. The dictionaries are stored in the external storage device 206 and are read into the RAM 203, for example when the program starts. In this embodiment, a plurality of recognition dictionaries are assumed to be prepared according to the vertical angle of the face. The recognition dictionaries may of course be divided according to angles in another direction (such as the horizontal direction), but to simplify the explanation, the following description uses recognition dictionaries divided by vertical angle.
Reference numeral 70 denotes dictionary/matching region setting means, which selects the recognition dictionary to be used from among the plurality of recognition dictionaries stored in the object dictionary storage means 60 and sets in the object recognition means 40, for each region in the frame image, the selected recognition dictionary to be switched in and used for matching. That is, the dictionary/matching region setting means 70 constitutes switching means that switches among the plurality of dictionaries for each region of the frame image and applies them to the object recognition means 40.

Reference numeral 80 denotes dictionary application region determination means, which determines the matching region in the frame image for each recognition dictionary stored in the object dictionary storage means 60.
Reference numeral 90 denotes dictionary application region storage means, which stores the matching region in the frame image for each recognition dictionary as determined by the dictionary application region determination means 80.

The image acquisition means 30, the object recognition means 40, the dictionary/matching region setting means 70, and the dictionary application region determination means 80 in FIG. 3 are implemented by, for example, the CPU 201 in FIG. 2, the control program stored in the ROM 202 (or the external storage device 206), and the RAM 203. The recognition result analysis/output means 50 is implemented by, for example, the CPU 201 in FIG. 2, the control program stored in the ROM 202 (or the external storage device 206), the RAM 203, and the display device 207. The object dictionary storage means 60 and the dictionary application region storage means 90 are implemented in, for example, the external storage device 206 in FIG. 2.

FIG. 4 is a flowchart showing an example of the processing procedure of the image processing method performed by the image processing apparatus 106 according to the embodiment of the present invention.

First, in step S400, the dictionary application region determination means 80 determines, for each of a plurality of recognition dictionaries covering different ranges of face angle, the region of the frame image to which that dictionary can be applied. This can be determined from the relationship between the position of a face in the frame image and the angle of the face when it is at that position. In this embodiment, the angle is assumed to be the vertical angle.

The face angle is the angle shown as Θ in FIG. 5, expressed with the horizontal direction taken as 0 degrees when the person is standing upright.
FIG. 5 is a schematic diagram illustrating the face angle according to the embodiment of the present invention.
In FIG. 5, 501 is the ceiling of the passage, 502 is the floor of the passage, 503 is the imaging unit (camera), and 504 is a person.

As also shown in FIG. 5, Θ is equal to the angle between a straight line along the ceiling and the straight line drawn from the camera 503 to the face of the person 504 (the solid line in FIG. 5). Accordingly, once the distance from the camera 503 to the person 504 (X in FIG. 5) and the difference between the height of the camera 503 above the ground and the height of the face above the ground (Y in FIG. 5) are known, Θ can be obtained from Equation (1) below.
Θ = tan⁻¹((height of the camera 503 above the ground − height of the face above the ground) / distance from the camera 503 to the person) ... (1)
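By way of a non-limiting illustration, Equation (1) can be evaluated as in the following Python sketch. The function name `face_angle_deg`, the use of metres and degrees, and the sample installation values are assumptions made for this example and do not appear in the embodiment.

```python
import math


def face_angle_deg(camera_height_m, face_height_m, distance_m):
    """Evaluate Equation (1): Theta = atan(Y / X), returned in degrees.

    Y is the camera height minus the face height, and X is the
    horizontal distance from the camera to the person (both hypothetical
    inputs in metres).
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    y = camera_height_m - face_height_m
    return math.degrees(math.atan(y / distance_m))


# Hypothetical installation: camera at 3.0 m, face at 1.6 m.
far = face_angle_deg(3.0, 1.6, 6.0)   # person far from the camera
near = face_angle_deg(3.0, 1.6, 1.5)  # person close to the camera
print(f"far: {far:.1f} deg, near: {near:.1f} deg")
```

With these sample values, the person close to the camera yields the larger angle, which matches the relationship described for FIG. 11.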

The range of Θ depends on the area that can be photographed, so it is determined by the angle of view of the camera 503 and the angle at which the camera 503 is installed. In the example of FIG. 5, the dotted lines indicate the area that can be photographed. The height of the face of the person 504 can be assumed to be, for example, that of a person of average height, so the angle can be calculated once the installation conditions of the camera 503 (camera height, camera angle, and camera angle of view) are determined. The user can therefore be asked to input the installation conditions of the camera 503, and the relationship between the position of a face in the frame image and the face angle can be obtained from Equation (1).

Alternatively, a relational expression (Θ = f(y)) between an arbitrary position (y) in the frame image and the face angle Θ may be defined, and f(y) may be obtained by inputting several values at installation time. For example, suppose that the relationship between a position (y) in the frame image and the face angle Θ can be expressed by the linear Equation (2) below.
Θ = ay + b ... (2)
In that case, the coefficient a and the constant b can be obtained if the positions in the frame and the face angles are measured in advance for two or more points. The coefficient a and the constant b may therefore be obtained by actually having the person 504 stand at two or more points in front of the camera 503 and inputting, for each point, the position in the frame and the distance from the camera 503, from which the face angle follows via Equation (1).
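As a non-limiting illustration of the calibration just described, the following Python sketch fits Equation (2) to measured pairs of frame position y and face angle Θ. An ordinary least-squares fit is an assumption made here so that more than two points can be used; with exactly two points it reduces to solving for the line through them. The function name `fit_linear` and the sample measurements are hypothetical.

```python
def fit_linear(samples):
    """Fit theta = a * y + b to (y, theta) pairs by least squares.

    samples: iterable of (y_pixels, theta_degrees) measured at
    installation time. Returns the tuple (a, b).
    """
    samples = list(samples)
    n = len(samples)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean_y = sum(y for y, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    sxx = sum((y - mean_y) ** 2 for y, _ in samples)
    if sxx == 0:
        raise ValueError("measurements must be at different y positions")
    sxy = sum((y - mean_y) * (t - mean_t) for y, t in samples)
    a = sxy / sxx
    b = mean_t - a * mean_y
    return a, b


# Hypothetical measurements: a face at row 100 seen at 15 degrees,
# and a face at row 400 seen at 45 degrees.
a, b = fit_linear([(100, 15.0), (400, 45.0)])
print(a, b)
```

Once a and b are known, the face angle for any row y of the frame can be estimated as a * y + b.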

Next, the recognition dictionaries are described.
The recognition dictionaries used are created so that the ranges of face angle to which they can be applied overlap.
FIG. 6 is a schematic diagram showing the relationship between the face angles to which recognition dictionaries A and B can be applied and the recognition accuracy, according to the embodiment of the present invention.

In FIG. 6, the horizontal axis shows the face angle to which a recognition dictionary can be applied and the vertical axis shows the recognition accuracy; 601 indicates recognition dictionary A and 602 indicates recognition dictionary B. The range of face angles to which recognition dictionary A (601) can be applied extends from 603 (Θ1) to 604 (Θ3), and the range to which recognition dictionary B (602) can be applied extends from 605 (Θ2) to 606 (Θ4), where Θ1 < Θ2 < Θ3 < Θ4.

As already described in [Problems to be Solved by the Invention], recognition accuracy falls off near the ends of the range of face angles to which a dictionary can be applied. In this embodiment, therefore, the dictionaries are created so that their applicable ranges of face angle (image regions) overlap, allowing the two recognition dictionaries to compensate for each other's drop in recognition accuracy. In the example of FIG. 6, the overlap is the range from 605 (Θ2) to 604 (Θ3). The size of the overlap can be determined by the tolerance set for recognition accuracy. Using the two recognition dictionaries A and B created in this way makes it possible to detect faces with consistently high recognition accuracy. The regions of the frame image to which recognition dictionaries A and B are applied are likewise determined so as to overlap, as follows, using the relationship between face position in the frame image and face angle described above.

FIG. 7 is a schematic diagram illustrating the regions to which recognition dictionaries A and B can be applied, according to the embodiment of the present invention.
In the example of FIG. 7, within 701 (the hatched region) in FIG. 7(a), the face angle changes from Θ1 to Θ3 as the face position moves from the upper end to the lower end. Recognition dictionary A (601) can therefore be applied to this region.

Within 702 (the hatched region) in FIG. 7(b), the face angle changes from Θ2 to Θ4 as the face position moves from the upper end to the lower end. Recognition dictionary B (602) can therefore be applied to this region. Superimposing FIGS. 7(a) and 7(b) gives FIG. 7(c), in which the portion 704 (the region marked with horizontal lines) corresponds to face angles between Θ2 and Θ3, where both recognition dictionary A and recognition dictionary B can be applied. That is, in the region 704, the application regions of the recognition dictionaries overlap.
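As a non-limiting illustration, the application regions of FIG. 7 can be derived from the angle ranges of FIG. 6 by inverting the linear relationship of Equation (2). The following Python sketch assumes that each region is a horizontal band of rows, that the angle increases toward the bottom of the frame (a > 0), and that rows are rounded inward. The names `rows_for_angles` and `overlap`, and all numeric values, are hypothetical.

```python
import math


def rows_for_angles(a, b, theta_lo, theta_hi, frame_height):
    """Return the band of rows (y_lo, y_hi) whose face angle lies in
    [theta_lo, theta_hi], assuming theta = a * y + b with a > 0."""
    if a <= 0:
        raise ValueError("this sketch assumes the angle grows with y")
    y_lo = max(0, math.ceil((theta_lo - b) / a))
    y_hi = min(frame_height - 1, math.floor((theta_hi - b) / a))
    return y_lo, y_hi


def overlap(band_1, band_2):
    """Return the rows shared by two bands, or None if they are disjoint."""
    lo = max(band_1[0], band_2[0])
    hi = min(band_1[1], band_2[1])
    return (lo, hi) if lo <= hi else None


# Hypothetical calibration and dictionary ranges (degrees).
a, b, frame_height = 0.125, 5.0, 480
theta1, theta2, theta3, theta4 = 10.0, 30.0, 35.0, 55.0

region_a = rows_for_angles(a, b, theta1, theta3, frame_height)  # cf. 701
region_b = rows_for_angles(a, b, theta2, theta4, frame_height)  # cf. 702
region_ab = overlap(region_a, region_b)                         # cf. 704
print(region_a, region_b, region_ab)
```

With these sample values, dictionary A covers rows 40 to 240, dictionary B covers rows 200 to 400, and both dictionaries apply to rows 200 to 240.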

In this way, the region to which each of the plurality of recognition dictionaries covering different ranges of face angle can be applied is determined. These regions are stored in the dictionary application region storage means 90.

The description now returns to FIG. 4.
When the processing of step S400 ends, the process proceeds to step S401.
In step S401, the image processing apparatus 106 determines whether to end the processing.

If the determination in step S401 finds that the power has been turned off or that the user has given an instruction to end the processing via the keyboard 204 or the mouse 205, the processing of this flowchart ends.

If, on the other hand, the determination in step S401 finds no instruction from the user to end the processing, the process proceeds to step S402. That is, the processing of steps S402 to S406 is repeated until the user gives an instruction to end the processing.

In step S402, the image acquisition means 30 acquires a frame image from the video input to the imaging means 10, using the method described above.
The image data read here is, for example, two-dimensional array data consisting of 8-bit pixels in three planes: R, G, and B. If the image data has been compressed by a method such as JPEG, it is decompressed according to the corresponding decompression method to produce image data consisting of RGB pixels. Furthermore, in this embodiment, the RGB data is converted into luminance data, and the luminance image data is used in the subsequent processing and stored in an image memory (for example, the external storage device 206 in FIG. 2). When YCrCb data is input as the image data, the Y component may be used directly as the luminance data.
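As a non-limiting illustration of the conversion from RGB data to luminance data, the following Python sketch uses the ITU-R BT.601 luma weights. The embodiment does not specify the conversion coefficients, so these weights, the nested-list image representation, and the function names `rgb_to_luma` and `to_luminance` are assumptions made for this example.

```python
def rgb_to_luma(pixel):
    """Convert one 8-bit (R, G, B) pixel to an 8-bit luminance value.

    The weights below are the ITU-R BT.601 luma coefficients, chosen
    here as an assumption; the embodiment does not name a formula.
    """
    r, g, b = pixel
    return min(255, round(0.299 * r + 0.587 * g + 0.114 * b))


def to_luminance(image):
    """Convert an image given as rows of (R, G, B) tuples into rows of
    luminance values of the same width and height."""
    return [[rgb_to_luma(pixel) for pixel in row] for row in image]


# A 1 x 3 sample frame: white, black, and pure red.
frame = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
print(to_luminance(frame))
```

A production implementation would more likely operate on array data in bulk; the per-pixel form is used here only to keep the example dependency-free.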

Next, in step S403, the object recognition means 40 matches the image data transferred to the internal image memory against the dictionary data set by the dictionary/matching region setting means 70 and recognizes the desired object.

First, general object recognition methods are described.
The methods proposed in Known Technique 1 and Known Technique 2 are known.
For example, Known Technique 1 detects face patterns in an image using a neural network. The method is briefly described below.

First, the image data in which faces are to be detected is read into memory, and a predetermined region to be matched against a face is cut out of the loaded image. A single output is then obtained by a neural network computation that takes the distribution of pixel values in the cut-out region as its input. The weights and thresholds of the neural network have been learned in advance from a very large number of face image patterns and non-face image patterns; for example, the pattern is judged to be a face if the output of the neural network is 0 or more, and a non-face otherwise. Here, the weights and thresholds are the dictionary data. Faces are detected in the image by moving the cut-out position of the image pattern that is input to the neural network and matched against a face, for example by scanning the whole image row by row and column by column as shown in FIG. 8.
FIG. 8 is a diagram illustrating a method of searching an image for a face pattern.
Specifically, the whole image 801 is scanned row by row and column by column to extract a pattern 802 to be matched, and face discrimination processing 803 is performed on the pattern 802.
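As a non-limiting illustration of the scan shown in FIG. 8, the following Python sketch slides a square window over the image and passes each cut-out pattern to a discriminator. The fixed window size, the step parameter, and the toy brightness-based `classify` function are assumptions made for this example; in the method described above, the discriminator would be the trained neural network.

```python
def scan_windows(width, height, win, step):
    """Yield the top-left (x, y) of each window, scanning left to right
    and then top to bottom."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y


def detect(image, win, step, classify):
    """Return the window positions whose pattern is judged to be a face.

    classify is a placeholder discriminator: it takes a win x win patch
    and returns a score, where 0 or more means "face".
    """
    height = len(image)
    width = len(image[0]) if image else 0
    hits = []
    for x, y in scan_windows(width, height, win, step):
        patch = [row[x:x + win] for row in image[y:y + win]]
        if classify(patch) >= 0:
            hits.append((x, y))
    return hits


# Toy stand-in for the discriminator: "face" when the patch is bright.
def classify(patch):
    return sum(sum(row) for row in patch) - 36


image = [[0, 0, 0],
         [0, 9, 9],
         [0, 9, 9]]
print(detect(image, win=2, step=1, classify=classify))
```

Restricting the ranges in `scan_windows` to a band of rows would limit matching to the application region of a given dictionary.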

Known Technique 2 is an example that focuses on speeding up the processing. This technique uses AdaBoost to combine many weak discriminators effectively and thereby improve the accuracy of face discrimination, while each weak discriminator is built from Haar-type rectangular features, and the rectangular features are computed at high speed using an integral image. In addition, the discriminators obtained by AdaBoost learning are connected in series to form a cascade-type face detector. The cascade-type face detector first uses the simple discriminators in the early stages to reject, on the spot, candidate patterns that are clearly not faces. Only the remaining candidates are then judged by the more complex later-stage discriminators, which have higher discrimination performance. Because complex judgments need not be made for every candidate, the detector is fast. As in Known Technique 1, the weights and thresholds used by the discriminators are the dictionary data.
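As a non-limiting illustration of the two ideas described above, the following Python sketch builds an integral image, uses it to obtain a rectangle sum from four table lookups, and evaluates a cascade that stops at the first stage a candidate fails. The stage scoring functions and thresholds are toy values chosen for this example; they are not trained AdaBoost discriminators, and the function names are hypothetical.

```python
def integral_image(img):
    """Return a (h + 1) x (w + 1) table in which ii[y][x] is the sum of
    all pixels above and to the left of (x, y), exclusive."""
    h = len(img)
    w = len(img[0]) if img else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), obtained
    from four lookups regardless of the rectangle size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def cascade_accepts(stages, ii):
    """Evaluate (score_fn, threshold) stages in order and reject as soon
    as one stage scores below its threshold."""
    for score, threshold in stages:
        if score(ii) < threshold:
            return False  # later, costlier stages are never evaluated
    return True


img = [[1, 2],
       [3, 4]]
ii = integral_image(img)

# Toy two-stage cascade: a cheap whole-window check, then a stricter one.
stages = [
    (lambda t: rect_sum(t, 0, 0, 2, 2), 5),
    (lambda t: rect_sum(t, 1, 1, 1, 1), 10),
]
print(rect_sum(ii, 1, 0, 1, 2), cascade_accepts(stages, ii))
```

A Haar-type rectangular feature would be formed as the difference between two or more such rectangle sums over adjacent rectangles.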

Next, the operation of the dictionary / collation area setting means 70 and related components, which is characteristic of this embodiment, will be described with reference to the flowchart of FIG. 9.
FIG. 9 is a flowchart showing an embodiment of the present invention and illustrating the operation of switching between dictionaries and collation areas.

First, in step S900, the dictionary / collation area setting means 70 determines whether all frames have been processed. If they have, the processing of this flowchart ends.

On the other hand, if not all frames have been processed yet, the process proceeds to step S901. That is, until all frames have been processed, the processing from step S901 to step S909 or S910 is repeated.

Next, in step S901, the dictionary / collation area setting means 70 selects recognition dictionary A from the plurality of recognition dictionaries read from the object dictionary storage means 60 and sets it in the object recognition means 40.

Next, in step S902, the dictionary / collation area setting means 70 reads the collation area of recognition dictionary A from the dictionary application area storage means 90 and sets it in the object recognition means 40. As described above, the collation area of recognition dictionary A is area 701 in FIG. 7(a).

Next, in step S903, the object recognition means 40 performs collation against the dictionary within the collation area set in step S902, using recognition dictionary A set in step S901.

Next, in step S904, the dictionary / collation area setting means 70 selects recognition dictionary B from the plurality of recognition dictionaries read from the object dictionary storage means 60 and sets it in the object recognition means 40.

Next, in step S905, the dictionary / collation area setting means 70 reads the collation area of recognition dictionary B from the dictionary application area storage means 90 and sets it in the object recognition means 40. As described above, the collation area of recognition dictionary B is area 702 in FIG. 7(b).

Next, in step S906, the object recognition means 40 performs collation against the dictionary within the collation area set in step S905, using recognition dictionary B set in step S904.

Next, to handle faces of various sizes, the subsequent processing reduces the frame image and repeats the collation.
First, in step S907, for example, the object recognition means 40 (or the dictionary / collation area setting means 70) determines whether the frame image has been reduced sufficiently. When the frame image has been reduced to the same size as the image pattern used for collation, the largest face that can appear in the frame image is detected.

If the determination in step S907 finds that the reduction is not yet sufficient, that is, the frame image can still be reduced without becoming smaller than the image pattern used for collation, the process proceeds to step S908.

In step S908, the object recognition means 40 reduces the frame image at a predetermined reduction ratio.

Next, in step S909, the dictionary / collation area setting means 70 reduces the collation areas of recognition dictionary A and recognition dictionary B at the same reduction ratio as in step S908. The process then returns to step S901.

Thereafter, the processing of steps S901 to S906 is performed on the frame image reduced in step S908. The collation areas set in steps S902 and S905 are the areas reduced in step S909.

In this way, the processing of steps S901 to S909 is repeated for a single frame image.

On the other hand, if the determination in step S907 finds that the frame image has been reduced to the same size as the image pattern used for collation, the process proceeds to step S910, where the image acquisition means 30 acquires the next frame image, and the process returns to step S900.
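The per-frame loop of steps S901 to S909 can be sketched as follows. This is a minimal sketch under stated assumptions: areas are (x0, y0, x1, y1) rectangles, the reduction uses nearest-neighbour sampling as a stand-in for step S908, and match_fn is a hypothetical callable representing collation with one recognition dictionary inside one collation area. None of these names appear in the original.

```python
def scale_rect(rect, ratio):
    """Scale a collation area by the cumulative reduction ratio (S909)."""
    x0, y0, x1, y1 = rect
    return (int(x0 * ratio), int(y0 * ratio), int(x1 * ratio), int(y1 * ratio))


def shrink(image, ratio):
    """Nearest-neighbour reduction of a list-of-rows image (stand-in for S908)."""
    h, w = len(image), len(image[0])
    nh, nw = int(h * ratio), int(w * ratio)
    return [[image[min(int(y / ratio), h - 1)][min(int(x / ratio), w - 1)]
             for x in range(nw)] for y in range(nh)]


def process_frame(frame, dictionaries, win, ratio=0.8):
    """dictionaries: list of (match_fn, area) pairs, e.g. dictionary A then B.

    match_fn(image, area, win) returns the hits found inside area.
    Returns a list of (hit, scale) pairs gathered over all reduction levels.
    """
    results = []
    image, scale = frame, 1.0
    while True:
        for match_fn, area in dictionaries:            # S901 to S906
            for hit in match_fn(image, scale_rect(area, scale), win):
                results.append((hit, scale))
        next_h = int(len(image) * ratio)
        next_w = int(len(image[0]) * ratio)
        if next_h < win or next_w < win:               # S907: reduced enough
            break
        image = shrink(image, ratio)                   # S908
        scale *= ratio                                 # S909
    return results
```

Returning the scale with each hit lets a caller map a detection on a reduced image back to coordinates in the original frame by dividing by that scale.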

Through the above processing, using the two recognition dictionaries A and B, whose applicable ranges of face angle overlap, enables highly accurate recognition wherever the face is located in the frame image. However, if both recognition dictionary A and recognition dictionary B are used for every frame, the overlapping portion (704 in FIG. 7(c)) is collated against a recognition dictionary twice (steps S903 and S906), which increases the calculation cost. This problem can be avoided by the following method.

The recognition dictionary and collation area to be used are changed between the Nth frame and the (N+1)th frame (where N is a natural number). That is, in the Nth frame, the dictionary / collation area setting means 70 sets recognition dictionary A, and collation is performed only on area 701 in FIG. 7(a) rather than on the entire image. In the (N+1)th frame, the dictionary / collation area setting means 70 sets recognition dictionary B, and collation is performed only on area 702 in FIG. 7(b). Recognition thus proceeds while the dictionary and the collation area are switched from one consecutive frame to the next. As a result, a face at any position in the frame image is recognized in either the Nth frame or the (N+1)th frame. In addition, because the collation area in each frame is limited, the calculation cost is lower than when the entire frame image is collated.
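The frame-by-frame switching can be sketched as a simple schedule. The names and the example area coordinates below are hypothetical; the rectangles merely play the roles of areas 701 and 702, whose actual coordinates are not given in this description.

```python
def region_area(region):
    """Pixel count of an (x0, y0, x1, y1) collation area."""
    x0, y0, x1, y1 = region
    return (x1 - x0) * (y1 - y0)


def schedule(num_frames, dictionaries):
    """Return [(frame_index, name, region)], switching dictionaries each frame.

    dictionaries: list of (name, region) pairs, e.g. [("A", ...), ("B", ...)].
    """
    return [(n,) + dictionaries[n % len(dictionaries)]
            for n in range(num_frames)]
```

For instance, with a hypothetical 100 x 100 frame, area A covering rows 0 to 60 and area B covering rows 40 to 100, each frame collates 6,000 pixels under this schedule, compared with 12,000 when both dictionaries are applied to every frame.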

As the recognition result, the logical sum (OR) of the results for the Nth frame and the (N+1)th frame may be used. Even if a face in the overlapping portion is recognized in both the Nth frame and the (N+1)th frame, its position hardly changes as long as the time difference between the two frames is sufficiently small, so the two detections can easily be determined to be the same face.
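One way to take this logical sum while treating nearby detections as the same face is sketched below. The max_shift parameter (the largest centre displacement still regarded as the same face) is an assumption introduced for illustration; the description only states that the position hardly changes between adjacent frames.

```python
import math


def center(rect):
    """Centre point of an (x0, y0, x1, y1) rectangle."""
    x0, y0, x1, y1 = rect
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def merge_detections(dets_n, dets_n1, max_shift):
    """Union of the detections of frame N and frame N+1.

    A detection in frame N+1 whose centre lies within max_shift of a
    detection in frame N is treated as the same face and not added again.
    """
    merged = list(dets_n)
    for rect in dets_n1:
        c = center(rect)
        if all(math.dist(c, center(kept)) > max_shift for kept in dets_n):
            merged.append(rect)
    return merged
```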

Here, the description returns to FIG. 4.
When the processing of step S403 ends, the process proceeds to step S404.
In step S404, the recognition result analysis / output means 50 reads from the RAM 203 the subject areas detected between a predetermined time ago and the present, and generates trajectories. This processing determines which of the multiple faces detected within the predetermined time correspond to the movement of the same person.

Details of this processing will be described with reference to FIG. 10.
FIG. 10 is a schematic diagram showing an embodiment of the present invention and illustrating an example of trajectory generation and counting.

In FIG. 10, reference numeral 1001 denotes the entire frame being captured. The face areas detected within a predetermined time are represented as rectangles and drawn superimposed on it (1003 to 1005). In the example of FIG. 10, three frames are superimposed: 1003 is detected in the oldest frame, 1004 in the next frame, and 1005 in the current frame that follows. To obtain the trajectory, the center of each area is computed, areas whose centers are closest to each other are regarded as the same subject, and their centers are connected by line segments. The trajectory obtained in this way is 1009 in the example of FIG. 10.
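The nearest-centre linking described above can be sketched as follows. This is one possible reading, not the patented procedure itself: pairs between consecutive frames are linked greedily in order of increasing centre distance, no maximum distance is enforced, and an unmatched detection simply starts a new trajectory. All function names are hypothetical.

```python
import math


def center(rect):
    """Centre point of an (x0, y0, x1, y1) rectangle."""
    x0, y0, x1, y1 = rect
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def link_frames(prev_rects, curr_rects):
    """Greedily pair rectangles of two frames, smallest centre distance first."""
    pairs = sorted(
        (math.dist(center(p), center(c)), i, j)
        for i, p in enumerate(prev_rects)
        for j, c in enumerate(curr_rects))
    used_prev, used_curr, links = set(), set(), []
    for _, i, j in pairs:
        if i not in used_prev and j not in used_curr:
            links.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return links


def build_tracks(frames):
    """frames: per-frame rectangle lists, oldest first (must be non-empty).

    Returns a list of trajectories, each a list of centre points.
    """
    tracks = [[center(r)] for r in frames[0]]
    owner = list(range(len(frames[0])))  # track index of each previous rect
    for prev, curr in zip(frames, frames[1:]):
        new_owner = [None] * len(curr)
        for i, j in link_frames(prev, curr):
            tracks[owner[i]].append(center(curr[j]))
            new_owner[j] = owner[i]
        for j, t in enumerate(new_owner):
            if t is None:  # unmatched detection starts a new trajectory
                tracks.append([center(curr[j])])
                new_owner[j] = len(tracks) - 1
        owner = new_owner
    return tracks
```

With three frames each containing one rectangle, as in the example of FIG. 10, build_tracks returns a single trajectory of three centre points.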

Next, in step S405, the recognition result analysis / output means 50 checks whether each trajectory created in step S404 satisfies a predetermined condition, and counts it if it does. The predetermined condition is, for example, whether the trajectory crosses a measurement line such as 1002 shown in FIG. 10. The measurement line 1002 is set within the frame by the user. In the example of FIG. 10, the trajectory 1009 crosses the measurement line 1002, so the count is 1. A trajectory that has not yet crossed the measurement line 1002 is not counted at this point.
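The measurement-line condition can be sketched with a standard segment intersection test. This is an illustrative sketch: trajectories are assumed to be lists of centre points, the measurement line is a pair of end points, and only strict crossings are counted (touching an end point or running along the line is ignored). The function names are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0, or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 strictly crosses segment q1-q2."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0 and
            _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def crosses_line(track, line):
    """True if any segment of the trajectory crosses the measurement line."""
    a, b = line
    return any(segments_cross(s, e, a, b) for s, e in zip(track, track[1:]))


def count_crossings(tracks, line):
    """Number of trajectories that cross the measurement line."""
    return sum(1 for t in tracks if crosses_line(t, line))
```

A practical counter would also need to remember which trajectories have already been counted so that the same person is not counted again in later frames; that bookkeeping is omitted here.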

Next, in step S406, the recognition result analysis / output means 50 displays the count result to the user.

Thereafter, the process returns to step S401.

As described above, a plurality of dictionaries with overlapping applicable ranges are prepared, and the dictionary and collation range to be used are switched between odd-numbered and even-numbered frames. As a result, the object can be recognized accurately even when its orientation changes within the frame.

In this embodiment, the recognition dictionary and the collation range used by the dictionary / collation area setting means 70 are switched alternately for each frame, but the following method may be used instead. The range 703 in FIG. 7(c) is collated using recognition dictionary A. The overlapping area 704 is also first collated using recognition dictionary A; at this time, a likelihood map is created from the likelihood obtained as a result of the collation with the recognition dictionary. The likelihood is obtained from the calculation result before thresholding in the collation with the dictionary.

Next, the likelihood map is referenced, and collation using recognition dictionary B is performed only on the portions where the likelihood is at or below a predetermined value. The range 705 in FIG. 7(c) is collated using recognition dictionary B. With this approach, part of area 704 in FIG. 7(c) is collated twice. However, when the cascade-type discriminator of known technique 2 is used, patterns with low likelihood can be decided by the early-stage discriminators, so their collation takes very little time. The calculation cost therefore does not increase significantly.
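The likelihood-map variant for the overlapping area 704 can be sketched as follows. This is a sketch under stated assumptions: score_a and score_b are hypothetical callables returning the pre-threshold output of dictionaries A and B at a window position, accept is the detection threshold, and retry_at_or_below is the "predetermined value" at or below which dictionary B is tried. The description does not specify these values or how the two thresholds relate.

```python
def match_overlap(positions, score_a, score_b, accept, retry_at_or_below):
    """Collate the overlap with A first, then with B only where A scored low.

    Returns (hits, likelihood_map, retried), where retried lists the
    positions at which dictionary B was actually evaluated.
    """
    # Likelihood map: pre-threshold score of dictionary A at each position.
    likelihood_map = {pos: score_a(pos) for pos in positions}
    hits = [pos for pos, s in likelihood_map.items() if s >= accept]
    retried = [pos for pos, s in likelihood_map.items()
               if s <= retry_at_or_below]
    # Dictionary B runs only on the low-likelihood positions.
    hits += [pos for pos in retried
             if pos not in hits and score_b(pos) >= accept]
    return hits, likelihood_map, retried
```

Positions whose likelihood falls between the two thresholds are neither accepted nor retried in this sketch, which is one possible policy among several.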

This embodiment has been described using the example of detecting the position of a face. However, it is also applicable to various parts of a person, such as the whole body, the upper body, or the head, and to various objects such as automobiles and bicycles. It is also applicable to the case of discriminating whether a person is a specific individual from that individual's facial features.

This embodiment has been described in terms of the vertical angle of the face, but the same naturally applies to the horizontal angle.

This embodiment has also described an example in which the recognition result analysis / output means 50 counts the number of people passing through a passage. However, it is applicable to various other uses, such as measuring the congestion rate of a predetermined area, analyzing flow lines, or raising an alarm for a specific person.

In this embodiment, the image processing apparatus 106, which is a PC, performs recognition, counting, and display, but the configuration is not limited to this. For example, all components from the object recognition means 40 through the dictionary application area storage means 90 may be housed in a chip and integrated with the imaging unit 104, so that only the count result is received by the image processing apparatus 106 via the LAN cable 105 and viewed there. Alternatively, the object recognition means 40, the dictionary / collation area setting means 70, the object dictionary storage means 60, the dictionary application area determination means 80, and the dictionary application area storage means 90 may be integrated with the imaging unit 104, so that only the recognition result is received by the image processing apparatus 106 via the LAN cable 105 and counted in the image processing apparatus 106.
Naturally, this embodiment can also be realized by executing a program in a computer.

This embodiment has described an example in which a plurality of dictionaries are switched for each area of the frame image. However, a form in which the images are temporally consecutive and the dictionary to be used is switched for each image is also applicable.

(Other embodiments)
The present invention can also be realized by executing the following processing.
That is, software (a program) that realizes the functions of the above-described embodiment is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program.
The program and a computer-readable recording medium storing the program are included in the present invention.

10 imaging means, 30 image acquisition means, 40 object recognition means, 50 recognition result analysis / output means, 60 object dictionary storage means, 70 dictionary / collation area setting means, 80 dictionary application area determination means, 90 dictionary application area storage means

Claims

An image processing apparatus comprising:
dictionary storage means for storing data of a plurality of dictionaries corresponding to different orientations of the same object;
setting means for setting, for each of the plurality of dictionaries, an application area in a frame image such that the application areas of at least two dictionaries include an overlapping area; and
recognition means for recognizing the object, using each of the plurality of dictionaries, in the application area set for that dictionary,
wherein, for the area where the application areas of at least two dictionaries overlap, the recognition means switches the dictionary to be used for each frame image in a plurality of consecutive frame images and integrates the recognition results for the frame images.

An image processing method comprising:
a setting step of setting, for each of a plurality of dictionaries corresponding to different orientations of the same object, an application area in a frame image such that the application areas of at least two dictionaries include an overlapping area; and
a recognition step of recognizing the object, using each of the plurality of dictionaries, in the application area set for that dictionary,
wherein, in the recognition step, for the area where the application areas of at least two dictionaries overlap, the dictionary to be used is switched for each frame image in a plurality of consecutive frame images, and the recognition results for the frame images are integrated.

A program for causing a computer to execute each step of the image processing method according to claim 2.