JP2004295781A

JP2004295781A - Image recognition method and image recognition apparatus

Info

Publication number: JP2004295781A
Application number: JP2003090303A
Authority: JP
Inventors: Norihiro Kasano; 範博笠野; Ken Katsuno; 憲勝野; Saori Makino; 沙織牧野
Original assignee: OCEAN NETWORK ENTERTAINMENT KK
Current assignee: OCEAN NETWORK ENTERTAINMENT KK
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-21

Abstract

PROBLEM TO BE SOLVED: To provide an image recognition method and an image recognition apparatus capable of reliably discriminating an object to be monitored from the contour shape of an imaged moving object, and realizable with a comparatively low-cost device.

SOLUTION: The image recognition apparatus 1 is provided with a monitoring camera 2 for imaging the moving object and outputting its moving image data; a frame separating means 3 for receiving the moving image data and separating it into frame images at prescribed time intervals; a converting means 5 for determining, pixel by pixel, the presence of an image change between two successively separated frame images and converting the frame images into numerical string data based on the determined results; a code generating means 6 for generating a shape code corresponding to the contour shape of the moving object based on the numerical string data; and a discriminating means 7 for discriminating whether the moving object is the object to be monitored, based on the shape code.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、画像認識方法及び画像認識装置に関し、特に、画像認識により、人間、動物、及び自動車等の監視対象物を認識する画像認識方法、及びその方法を利用する装置に関するものである。
【０００２】
【従来の技術】
従来、画像データを基に、人間や自動車等の監視対象物を認識する方法として、予め登録された背景画像データと、移動物体が撮像された画像データとの差分を取得し、その差分画像データを基に、監視対象物を認識する方法が知られている。
【０００３】
以上の従来技術は、当業者において当然として行われているものであり、出願人は、この従来技術が記載された文献を知見していない。
【０００４】
【発明が解決しようとする課題】
しかし、この方法では、高度の画像処理を伴うため、処理能力の高いコンピュータ等、高価な装置が必要となるとともに、監視対象物の認識に比較的長い時間を要していた。また、監視対象物の形状が一定ではない場合、すなわち人間等、その動作によって輪郭が随時変化する場合には、撮像された移動物体が監視対象物（人間）であるか否かを判別することが困難となっていた。また、カメラに対する移動物体の位置や、カメラと移動物体との距離が異なる場合には、撮像される移動物体の大きさが異なることから、判別の困難さが助長されていた。
【０００５】
そこで、本発明は、上記の実情に鑑み、撮像された移動物体の輪郭形状から監視対象物を確実に判別すると共に、比較的安価な装置で実現できる画像認識方法及び画像認識装置を提供するものである。
【０００６】
【課題を解決するための手段】
本発明にかかる画像認識方法は、移動物体を撮像した動画データを所定の時間間隔のフレーム画像に分離するフレーム分離工程と、順に分離された二つのフレーム画像について、画像の変化の有無を画素毎または予め分割された小領域毎に判定し、判定結果を基に前記フレーム画像を数値列データに変換する変換工程と、前記数値列データを基に、前記移動物体の輪郭形状に対応する形状コードを生成するデータ生成工程と、前記形状コードを基に、前記移動物体が監視対象物か否かを判別する判別工程とを備えるものである。
【０００７】
この画像認識方法によれば、動画データはフレーム分離工程においてフレーム分離され、そのフレーム画像を基本に以降の認識処理が行われる。なお、このときのフレーム分離の時間間隔は、認識すべき動作の速さや、認識の精度及び処理能力に応じて適宜設定される。
【０００８】
続く変換工程においては、所定の時間間隔で分離された二つのフレーム画像における画像の変化が、画素毎または予め分割された小領域毎に判定されるとともに、その判定結果に基づく数値が与えられる。例えば、変化があると判定された場合は「１」を出力し、変化がないと判定された場合は「０」を出力する。このようにして、フレーム画像が数値列データに変更される。次のコード生成工程では、この数値列データを基に移動物体の輪郭形状に対応する形状コードが生成される。判別工程では、生成された移動物体の形状コードを、予め設定された監視対象物の形状コードと比較することにより、移動物体が監視対象物であるか否かを判別する。
【０００９】
本発明にかかる画像認識装置は、移動物体を撮像してその動画データを出力する撮像手段と、前記動画データを入力して所定の時間間隔のフレーム画像に分離するフレーム分離手段と、順に分離された二つのフレーム画像について、画像の変化の有無を画素毎または予め分割された小領域毎に判定し、判定結果を基に前記フレーム画像を数値列データに変換する変換手段と、前記数値列データを基に、前記移動物体の輪郭形状に対応する形状コードを生成するコード生成手段と、前記形状コードを基に、前記移動物体が監視対象物か否かを判別する判別手段とを具備するものである。
【００１０】
この画像認識装置によれば、前述した画像認識方法と同様、撮像手段により撮像された動画データは、フレーム分割手段によってフレーム分離され、そのフレーム画像を基本に以降の認識処理が行われる。また、変更手段により、フレーム画像に変化があるか否かが、画素毎または予め分割された小領域毎に判定されるとともに、その判定結果に基づく数値が与えられる。その後、コード生成手段により、この数値列データを基に移動物体の輪郭形状に対応する形状コードが生成される。そして判別手段では、生成された移動物体の形状コードを、予め設定された監視対象物の形状コードと比較することにより、移動物体が監視対象物であるか否かを判別する。
【００１１】
また、この画像認識装置において、「前記コード生成手段は、前記移動物体の輪郭形状における特徴点の座標を検出する座標検出手段と、前記各特徴点を結ぶ線分の長さ及びその傾きを算出する線分算出手段とを有し、前記各線分の長さ及び傾きの組合せを基に前記形状コードを生成する」構成とすることができる。
【００１２】
ここで、特徴点とは、移動物体の輪郭上に位置する複数の点である。特徴点を設定する方法として、輪郭上の点を一定の間隔でサンプリングしてもよいが、輪郭を構成する曲線の頂点または変曲点（輪郭形状を示す曲線が凸から凹に、または凹から凸に変わる点）を抽出するようにしてもよい。
【００１３】
この画像認識装置によれば、形状コードを生成するにあたり、座標検出手段により、移動物体の輪郭形状における特徴点の座標が検出される。この複数の特徴点を順に結ぶことにより、輪郭形状を簡略化した多角形、すなわち直線のみからなる図形が形成される。そこで、線分算出手段は、各特徴点を結ぶ線分の長さ及びその傾きを算出し、各線分の長さ及び傾きの組合せ、すなわち多角形の図形に関する情報を基に形状コードを生成する。なお、直線のみからなる図形は、移動物体の輪郭形状を構成する曲線の集まりよりもはるかに情報量が少ないため、処理の簡略化を図ることが可能になる。
【００１４】
【発明の実施の形態】
以下、本発明の一実施形態である画像認識装置について、図１乃至図４に基づき説明する。図１は画像認識装置の機能的構成を示すブロック図であり、図２及び図３は画像認識方法を説明するための説明図であり、図４は画像認識装置における処理の流れを示すフローチャートである。
【００１５】
本実施形態の画像認識装置１は、人間、動物、及び自動車等の移動可能な監視対象物を、その輪郭形状を基に判別するものであり、例えば、侵入者を判別したときに報知手段を作動させるホームセキュリティシステム、走行中の自動車の車種を判別する車両認識システム、及びコンベア上の流れる工業製品の種類を認識し分別する分別システム等に適用することができる。
【００１６】
本実施形態では、一例として、画像認識装置１をホームセキュリティシステムに適用した場合について説明する。ホームセキュリティシステムとして、監視カメラを備えるものが従来から知られているが、従来のシステムでは、侵入者である人間と、犬や猫等のペットとを判別することができないため、室内でペットを飼っている住宅においては、居住者が外出する際、ペットを連れて外出したり、サークルやクレイトの中にペットを入れて室内を動き回らないようにする等の対策が必要であった。ところが、本発明の画像認識装置１をホームセキュリティシステムに組み込めば、侵入者とペット（例えば犬）とを判別することが可能になり、侵入者と判別された場合にのみ報知手段を作動させることが可能になる。
【００１７】
図１に示すように、本実施形態の画像認識装置１は、ＣＤＤカメラ等の監視カメラ２（本発明の撮像手段に相当）を備えるとともに、監視カメラ２から出力される動画データを基に、移動物体が監視対象物（すなわち侵入者）かペットかを判別する機能的な構成を有している。なお、この機能的構成は、汎用のコンピュータの記憶手段に格納された実行プログラムによって実現されるものであり、フレーム分離手段３、変換手段５、コード生成手段６、及び判別手段７が含まれている。フレーム分離手段３は、監視カメラ２から動画データを入力して所定の時間間隔（例えば０．１秒毎）に分離されたフレーム画像（静止画）を生成するものである。分離されたフレーム画像は、記憶装置８の画像記憶部９に記憶される。ここで、画像記憶部９にフレーム画像を記憶するのは、最新のフレーム画像が分離された時点で、前回のフレーム画像と比較するためである。
【００１８】
変換手段５は、所定の時間間隔で分離された最新のフレーム画像と前回のフレーム画像とを、画像記憶部９から読出し、画像の変化の有無を画素毎に判定する変化判定手段１０を有しており、この判定結果を基にフレーム画像を数値列データに変換するものである。さらに詳しく説明すると、変化判定手段１０は、画素毎に検出される変化量を予め定められたしきい値と比較して、変化量がしきい値よりも大きい画素を、動画素（動きのある画素）として抽出する。これにより、被写体の中から動きのある部分のみを抽出することができる。つまり、被写体に対して、動きのない背景と、動きのある移動物体とを分離することが可能になる。そこで、変換手段５では、例えば動画素に対して「１」を出力し、静止画素に対して「０」を出力し、その後、この数値を所定の順序に並べることにより、移動物体に関する数値列データを作成する。
【００１９】
コード生成手段６は、変換手段５によって生成された数値列データを基に、移動物体の輪郭形状に対応する形状コードを生成するものであり、座標検出手段１１と線分算出手段１２とを有している。座標検出手段１１は、移動物体の輪郭形状を構成する曲線に対して複数の特徴点を求め、その座標を検出するものである。例えば図２（ａ）に示すように、抽出された移動物体の形状が人間の上半身の場合には、その輪郭形状を構成する複数の曲線における頂点Ｔ１，Ｔ２，Ｔ３……Ｔｎを特徴点として設定する。この特徴点を結ぶことにより、輪郭形状を簡略化した多角形、すなわち、図２（ｂ）に示すように、直線のみからなる図形が形成される。線分算出手段１２は、各特徴点を結ぶ線分の長さＢ，Ｃ，Ｄ……と、隣接する線分同士の内角ｂ，ｃ，ｄ……とを算出し、各線分の長さ及び内角の組合せからなる形状コード（例えばＢｂＣｃＤｄ……）を生成し出力する。
【００２０】
図１に示す判別手段７は、コード生成手段６によって生成された形状コードを基に、移動物体が人間（侵入者）であるか、それとも犬等のペットであるかを判別するものであり、ここには、縦横比判別手段１３と、部位特定手段１４と、部位判別手段１５とが備えられている。まず縦横比判別手段１３は、移動物体全体の輪郭形状が縦長か横長かを判定し、縦長の場合には移動物体が人間であり、横長の場合には移動物体が人間以外の移動物体であると判別する。これは図３に示すように、人間Ｈは起立姿勢で動くことが多く、犬や猫等の動物Ａは横伏姿勢で動くことが多いことから、この姿勢の違いを利用して人間か否かを判別するものである。
【００２１】
一方、部位特定手段１４は、コード生成手段６によって生成された形状コードを基に、被写体に含まれる体の部位を特定するものである。記憶装置８のコード記憶部１６には、頭部、胴部、及び足部等、体の各部位における輪郭形状を表す形状コードが予め記憶されており、生成された形状コードとコード記憶部１６に記憶された形状コードとを照合させることにより、生成された形状コードがどの部位を表す形状コードであるのかを特定する。例えば、形状コードとして、細長く鉛直方向に延びる形態の形状コードが生成された場合には、足部を示す形状コードであると特定する。
【００２２】
部位判別手段１５は、部位特定手段１４によって特定された部位に応じて、検出された移動物体が人間か否かの判別を行うものである。具体的な判別としては、足部に関する関節判別部１７、頭部に関する頭部判別部１８、肩部に関する肩幅判別部１９、及び尻部に関する尻尾判別部２０を例示することができる。
【００２３】
関節判別部１７は、部位特定手段１４によって特定された部位が「足部」付近に相当する場合、「足の関節の成り立ち」に基づいて判別する。これは、図３に示すように、人間Ｈの足は、膝部を中心として大腿骨２５と脛骨２６とが上下方向に延び、曲げる動作によって「く」の字形となるのに対し、犬等の動物Ａの足は、飛節を中心として脛骨２７と中足骨２８とが正面上下方向に延出された逆「く」の字形の形状になっていることから、これらの形状の違いを利用して判別するものである。つまり、進行方向（移動方向）に対し、生成された形状コードが「く」の字状であれば人間Ｈ（侵入者）であると判別し、一方、形状コードが逆「く」の字状であれば、犬等の動物Ａであると判別する。なお、この判別は、監視カメラ２に対し移動物体が左右方向に横切った場合、すなわち移動物体の側面を撮像した場合に有効となる。
【００２４】
頭部判別部１８は、部位特定手段１４によって特定された部位が「頭部」付近に相当する場合、「胴部に対する頭部の位置」に基づいて判別する。これは、図３に示すように、移動物体を側面から見た場合、人間Ｈの頭は胴部の上方に位置するのに対し、犬等の動物Ａの頭は胴部の斜め上方に位置することから、これらの相対位置の違いを利用して判別するものである。つまり、移動物体の大きさが殆ど変わらない場合、すなわち監視カメラ２に対し移動物体が左右方向に横切った場合において、頭部が胴部の上方に位置している場合には、人間Ｈ（侵入者）であると判別し、一方、頭部が胴部の斜め上方または前方に位置している場合には犬等の動物Ａであると判別する。
【００２５】
肩幅判別部１９は、部位特定手段１４によって特定された部位が「肩部及び頭部」付近に相当する場合、「頭部に対する肩幅の広さ」に基づいて判別する。これは、移動物体の正面または背面を見た場合、人間Ｈは肩幅（胴部の横幅）が広いのに対し、犬等の動物Ａは胴部の横幅が狭いことから、この幅の違いを利用して判別するものである。つまり、移動物体の大きさが変化する場合、すなわち監視カメラ２に対して遠近方向に移動する場合において、頭部に対する肩幅が所定の比率よりも大きいときには、その移動物体は人間Ｈ（侵入者）であると判別し、一方、所定の比率よりも小さいときには犬等の動物Ａであると判別する。
【００２６】
尻尾判別部２０は、部位特定手段１４によって特定された部位が「尻部」付近に相当する場合、「尻尾の存在」に基づいて判別する。これは、人間Ｈには犬のような尻尾を有しないことから、この違いを利用して判別するものである。つまり、尻部付近に細長い紐状の形状が認識された場合には犬等の動物Ａであると判別する。
【００２７】
なお、人間Ｈと動物Ａとの判別は上記の方法に限定されるものではなく、例えば腕の位置や全体の大きさ等、人間Ｈと動物Ａとを区別できる形状または姿勢であれば、その違いを利用して判別することが可能である。
【００２８】
ところで、本実施形態では、フレーム画像における変化の有無、すなわち所定時間間隔で分離されるフレーム画像の差分に基づいて移動物体を抽出するため、揺れ動くカーテンや洗濯物が、移動物体として認識される可能性がある。しかし、本発明によれば、人間の特徴的部位に応じて人間か否かを判別するため、カーテンや洗濯物が風にふかれても報知手段を作動させることはない。
【００２９】
また、地震により画像認識装置１またはそれに内蔵された監視カメラ２が揺動した場合には、フレーム画像全体が動画素と判別され、背景自体が移動物体と認識される可能性がある。しかし、この場合、移動物体の輪郭形状は画面全体の形状となり、人間の輪郭とは明らかに異なることから、それが人間と判別されることはない。
【００３０】
次に、本実施形態の警備システム１における画像処理の流れについて、図４に示すフローチャートに基づき説明する。まず、フレーム分離工程（ステップＳ１）において動画データがフレーム画像に分離され、そのフレーム画像を基本に以降の認識処理が行われる。なお、このときのフレーム分離の時間間隔は、認識すべき動作の速さや、認識の精度及び処理能力に応じて適宜設定される。分離されたフレーム画像は画像記憶部９に記憶される（ステップＳ２）。
【００３１】
続いて順次分離されるフレーム画像に対して変化の有無を画素毎に判定する（ステップＳ３）とともに、変化の有無を数値列データに変換する（ステップＳ４）。そして、数値列データから移動物体の輪郭形状における特徴点を抽出しその座標を検出する（ステップＳ５）とともに、各特徴点を結ぶ線分の長さ及び内角を検出し（ステップＳ６）、それを基に形状コードを生成する（ステップＳ７）。ここで、ステップＳ３及びステップＳ４の処理が本発明の変換工程に相当し、ステップＳ５〜ステップＳ７の処理が本発明のコード生成工程に相当する。その後、移動物体の外観形状を示す形状コードを基に、移動物体が人間（侵入者）か否かを判別する（ステップＳ８）。なお、ホームセキュリティシステムでは、侵入者と認定された場合、すなわち、少なくとも一つの判別手段において人間と判別された場合、報知手段を作動させる。
【００３２】
このように、上記の画像認識装置１では、ホームセキュリティシステムに適用した場合、撮像された移動物体が侵入者であるかペットであるかを正確に判別することができ、侵入者である場合にのみ報知手段を作動させることができる。このため、ペットを連れて外出したり、サークルやクレイトの中にペットを入れて室内を動き回らないようにする等の対策が不要となり、使い勝手を大きく向上させることができる。また、上記の画像認識装置１では、監視対象物の特徴的部位に応じた複数の判別部を有するため、判別の精度を大きく向上させることができる。
【００３３】
さらに、上記の画像認識装置１では、膨大な情報を有する動画データを効率的に処理することによって、認識処理の対象データ量を少なくすることができる。これにより、汎用のコンピュータによるリアルタイムな処理が可能となり、比較的安価な画像認識装置を提供することが可能になる。
【００３４】
以上、本発明について好適な実施形態を挙げて説明したが、本発明はこの実施形態に限定されるものではなく、以下に示すように、本発明の要旨を逸脱しない範囲において、種々の改良及び設計の変更が可能である。
【００３５】
すなわち、上記の画像認識装置１では、フレーム画像についての変化の有無を、画素毎に判定するものを示したが、フレーム画像を複数の小領域に分割する領域分割手段を備え、フレーム画像についての変化の有無を分割された小領域毎に判定するようにしてもよい。フレーム画像の分割数は、フレーム画像の総画素数よりもはるかに低く設定されるため、この領域分割により処理の簡略化を図ることが可能になる。
【００３６】
上記の画像認識装置１では、形状コードを生成する際、線分の傾きとして隣接する線分同士の内角を算出するものを示したが、線分の傾斜角度やベクトルを算出するようにしてもよい。
【００３７】
上記の実施形態では、画像認識装置をホームセキュリティシステムに適用し、人間（侵入者）とペットとを判別するものを示したが、監視対象物は、特に限定されるものではなく、輪郭形状に特徴のある移動物体であれば、本発明の画像認識装置によって判別することが可能である。例えば、車両認識システムに適用した場合には、走行中の自動車の車種をその輪郭形状から判別することが可能になり、手配車の捜索や交通状況の調査等に利用することが可能になる。また、分別システムに適用した場合には、例えばコンベア上を流れる生産物を輪郭形状に応じて判別し自動的に振分けることが可能となる。特に、本発明の画像認識装置は、汎用の安価なコンピュータで実現することができることから、その応用範囲は広く、防犯設備、生産管理装置、安全装置、遊技機、及び玩具等、幅広い分野で適用することが可能である。
【００３８】
【発明の効果】
本発明によれば、移動物体の輪郭形状に対応する形状コードを生成するとともに、その形状コードを解析することによって監視対象物か否かを判別することから、監視対象物を比較的容易に且つ正確に認識することができる。また、膨大な情報を有する動画データを効率的に処理することによって、認識処理の対象データ量を少なくすることができる。これにより、汎用のコンピュータによるリアルタイムな処理が可能となり、比較的安価な装置で実現することが可能になる。
【図面の簡単な説明】
【図１】本発明の一実施形態である画像認識装置の機能的構成を示すブロック図である。
【図２】画像認識方法を説明するための説明図である。
【図３】人間と犬との輪郭形状の違いを説明するための説明図である。
【図４】画像認識装置における処理の流れを示す説明図である。
【符号の説明】
１画像認識装置
２監視カメラ（撮像手段）
３フレーム分離手段
５変換手段
６コード生成手段
７判別手段
１１座標検出手段
１２線分算出手段
[0001]
[Technical field of the invention]
The present invention relates to an image recognition method and an image recognition apparatus, and more particularly to an image recognition method for recognizing a monitoring target such as a human, an animal, or an automobile by image recognition, and to an apparatus using the method.
[0002]
[Prior art]
Conventionally, a known method of recognizing a monitoring target such as a human or an automobile from image data acquires the difference between previously registered background image data and image data in which a moving object is captured, and recognizes the monitoring target based on that difference image data.
[0003]
The above prior art is a matter of common practice among those skilled in the art, and the applicant is not aware of any document describing it.
[0004]
[Problems to be solved by the invention]
However, because this method involves sophisticated image processing, it requires an expensive device such as a computer with high processing capability, and recognizing the monitoring target takes a relatively long time. Further, when the shape of the monitoring target is not constant, that is, when the contour changes from moment to moment with movement, as with a human, it has been difficult to determine whether the captured moving object is the monitoring target (a human). In addition, when the position of the moving object relative to the camera or the distance between the camera and the moving object varies, the size of the captured moving object varies, which makes the determination even more difficult.
[0005]
In view of the above circumstances, the present invention provides an image recognition method and an image recognition device that can reliably determine a monitoring target from the contour shape of a captured moving object and that can be realized with a relatively inexpensive device.
[0006]
[Means for Solving the Problems]
The image recognition method according to the present invention includes: a frame separation step of separating moving image data of a captured moving object into frame images at predetermined time intervals; a conversion step of determining, for two successively separated frame images, the presence or absence of an image change for each pixel or for each small area divided in advance, and converting the frame image into numerical sequence data based on the determination result; a data generation step of generating a shape code corresponding to the contour shape of the moving object based on the numerical sequence data; and a discriminating step of discriminating whether or not the moving object is a monitoring target based on the shape code.
[0007]
According to this image recognition method, the moving image data is separated into frames in the frame separation step, and the subsequent recognition processing is performed on those frame images. The time interval of the frame separation is set as appropriate according to the speed of the motion to be recognized, the required recognition accuracy, and the processing capability.
[0008]
In the subsequent conversion step, the change in the image between the two frame images separated at a predetermined time interval is determined for each pixel or for each small area divided in advance, and a numerical value based on the determination result is assigned. For example, “1” is output when a change is detected, and “0” is output when no change is detected. In this way, the frame image is converted into numerical sequence data. In the next code generation step, a shape code corresponding to the contour shape of the moving object is generated based on the numerical sequence data. In the discriminating step, whether the moving object is a monitoring target is determined by comparing the generated shape code of the moving object with a preset shape code of the monitoring target.
[0009]
The image recognition device according to the present invention comprises: imaging means for imaging a moving object and outputting its moving image data; frame separation means for receiving the moving image data and separating it into frame images at predetermined time intervals; conversion means for determining, for two successively separated frame images, the presence or absence of an image change for each pixel or for each small area divided in advance, and converting the frame image into numerical sequence data based on the determination result; code generation means for generating a shape code corresponding to the contour shape of the moving object based on the numerical sequence data; and determination means for determining whether or not the moving object is a monitoring target based on the shape code.
[0010]
According to this image recognition device, as in the image recognition method described above, the moving image data captured by the imaging means is separated into frames by the frame separation means, and the subsequent recognition processing is performed on those frame images. The conversion means determines whether or not there is a change in the frame image for each pixel or for each small area divided in advance, and assigns a numerical value based on the determination result. The code generation means then generates a shape code corresponding to the contour shape of the moving object based on the numerical sequence data. Finally, the determination means determines whether the moving object is a monitoring target by comparing the generated shape code of the moving object with a preset shape code of the monitoring target.
[0011]
In this image recognition device, the code generation means may include coordinate detection means for detecting the coordinates of feature points in the contour shape of the moving object, and line segment calculation means for calculating the length and inclination of each line segment connecting the feature points, and may generate the shape code based on the combination of the lengths and inclinations of the line segments.
[0012]
Here, the feature points are a plurality of points located on the contour of the moving object. The feature points may be set by sampling points on the contour at regular intervals, or by extracting the vertices or inflection points of the curves constituting the contour (points at which the curve representing the contour changes from convex to concave or from concave to convex).
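The second option can be illustrated with a discrete stand-in for inflection points. The sketch below is only an assumption about how such points might be found on a contour that has already been reduced to an ordered, closed list of (x, y) points: it flags each vertex whose turning direction differs from that of the preceding vertex. The function names `cross` and `convexity_changes` are hypothetical and do not come from the patent.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convexity_changes(contour):
    """Return vertices of a closed contour whose turn direction differs
    from the turn direction at the preceding vertex."""
    n = len(contour)
    # Sign of the turn at every vertex (positive = left turn, negative = right turn).
    turns = [cross(contour[i - 1], contour[i], contour[(i + 1) % n]) for i in range(n)]
    return [contour[i] for i in range(n) if turns[i] * turns[i - 1] < 0]


# A square with a notch cut into its top edge (counter-clockwise order).
notched = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
changes = convexity_changes(notched)
```

For the notched outline, the flagged vertices are the bottom of the notch and the vertex after it, where the outline becomes convex again. A fully convex outline yields no such points, so regular-interval sampling would still be needed there.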
[0013]
According to this image recognition device, when the shape code is generated, the coordinates of the feature points in the contour shape of the moving object are detected by the coordinate detection means. Connecting these feature points in order forms a polygon that simplifies the contour shape, that is, a figure consisting only of straight lines. The line segment calculation means then calculates the length and inclination of each line segment connecting the feature points, and the shape code is generated based on the combination of the lengths and inclinations of the line segments, that is, on information describing the polygon. Because a figure consisting only of straight lines carries far less information than the set of curves that make up the contour shape of the moving object, the processing can be simplified.
[0014]
[Embodiments of the invention]
Hereinafter, an image recognition device according to an embodiment of the present invention will be described with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the functional configuration of the image recognition device, FIGS. 2 and 3 are explanatory diagrams for explaining the image recognition method, and FIG. 4 is a flowchart showing the flow of processing in the image recognition device.
[0015]
The image recognition device 1 according to the present embodiment discriminates a movable monitoring target such as a human, an animal, or an automobile based on its contour shape. It can be applied to, for example, a home security system that activates a notification means when an intruder is identified, a vehicle recognition system that identifies the model of a moving automobile, and a sorting system that recognizes and sorts the types of industrial products moving on a conveyor.
[0016]
In the present embodiment, a case where the image recognition device 1 is applied to a home security system will be described as an example. Home security systems equipped with a surveillance camera have long been known, but a conventional system cannot distinguish a human intruder from a pet such as a dog or a cat. In a house where a pet is kept indoors, the resident therefore had to take measures when going out, such as taking the pet along or confining the pet in a playpen or crate so that it could not move around the room. If the image recognition device 1 of the present invention is incorporated in a home security system, however, it becomes possible to distinguish between an intruder and a pet (for example, a dog), and to activate the notification means only when the moving object is determined to be an intruder.
[0017]
As shown in FIG. 1, the image recognition device 1 of the present embodiment includes a monitoring camera 2 such as a CCD camera (corresponding to the imaging means of the present invention), and has a functional configuration for determining, based on the moving image data output from the monitoring camera 2, whether a moving object is a monitoring target (that is, an intruder) or a pet. This functional configuration is realized by an execution program stored in the storage means of a general-purpose computer, and includes frame separation means 3, conversion means 5, code generation means 6, and determination means 7. The frame separation means 3 receives moving image data from the monitoring camera 2 and generates frame images (still images) separated at predetermined time intervals (for example, every 0.1 second). The separated frame images are stored in the image storage unit 9 of the storage device 8. The frame images are stored in the image storage unit 9 so that, when the latest frame image is separated, it can be compared with the previous frame image.
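As a rough illustration of frame separation at a fixed interval, the sketch below chooses which frames of a stream to keep and holds only the latest and previous frames for comparison. The function name `separate_frames`, the `fps` parameter, and the two-slot buffer are assumptions made for illustration; the text specifies only a predetermined interval (for example 0.1 second) and storage in the image storage unit 9.

```python
from collections import deque


def separate_frames(frames, fps, interval_s=0.1):
    """Yield (previous, latest) pairs of frames sampled every interval_s seconds."""
    step = max(1, round(fps * interval_s))  # source frames per separated frame
    store = deque(maxlen=2)                 # stands in for image storage unit 9
    for index, frame in enumerate(frames):
        if index % step:
            continue                        # not on a sampling instant
        store.append(frame)
        if len(store) == 2:
            yield store[0], store[1]


# Ten frames of a hypothetical 30 fps stream, identified here by their index.
pairs = list(separate_frames(range(10), fps=30, interval_s=0.1))
```

With a 30 fps source and a 0.1 second interval, every third frame is kept, so the pairs compare frames 0 and 3, 3 and 6, and 6 and 9.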
[0018]
The conversion means 5 includes change determination means 10, which reads the latest frame image and the previous frame image, separated at a predetermined time interval, from the image storage unit 9 and determines for each pixel whether or not the image has changed. The conversion means 5 converts the frame image into numerical sequence data based on this determination result. More specifically, the change determination means 10 compares the amount of change detected for each pixel with a predetermined threshold value, and extracts pixels whose amount of change is larger than the threshold value as moving pixels (pixels in which motion is present). As a result, only the moving part of the scene can be extracted. That is, the scene can be separated into a background without motion and a moving object with motion. The conversion means 5 then outputs, for example, “1” for a moving pixel and “0” for a still pixel, and arranges these values in a predetermined order to create numerical sequence data describing the moving object.
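The thresholding described here can be sketched in a few lines. The example assumes grayscale frames held as lists of rows, an absolute difference as the "amount of change", a threshold of 20, and row-major ordering of the output; none of these specifics are stated in the text, and `to_numeric_sequence` is a hypothetical name.

```python
def to_numeric_sequence(previous, latest, threshold=20):
    """Return a flat list of 1 (moving pixel) / 0 (still pixel) in row-major order."""
    sequence = []
    for prev_row, new_row in zip(previous, latest):
        for prev_px, new_px in zip(prev_row, new_row):
            # A pixel counts as moving only when its change exceeds the threshold.
            sequence.append(1 if abs(new_px - prev_px) > threshold else 0)
    return sequence


previous = [[10, 10, 10], [10, 10, 10]]
latest = [[10, 200, 10], [10, 180, 25]]
bits = to_numeric_sequence(previous, latest)
```

Here the two bright pixels in the middle column are marked as moving, while the small change from 10 to 25 stays below the assumed threshold and is treated as still.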
[0019]
The code generation means 6 generates a shape code corresponding to the contour shape of the moving object based on the numerical sequence data generated by the conversion means 5, and includes coordinate detection means 11 and line segment calculation means 12. The coordinate detection means 11 obtains a plurality of feature points on the curves constituting the contour shape of the moving object and detects their coordinates. For example, as shown in FIG. 2(a), when the extracted moving object has the shape of a human upper body, the vertices T1, T2, T3, ... Tn of the curves constituting the contour shape are set as feature points. Connecting these feature points forms a polygon that simplifies the contour shape, that is, a figure consisting only of straight lines, as shown in FIG. 2(b). The line segment calculation means 12 calculates the lengths B, C, D, ... of the line segments connecting the feature points and the interior angles b, c, d, ... between adjacent line segments, and generates and outputs a shape code (for example, BbCcDd...) composed of the combination of the lengths and interior angles.
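One possible reading of the length and interior-angle code is sketched below. It assumes the feature points form a closed polygon listed counter-clockwise in a coordinate system with the y axis pointing up, and it represents the code as a list of (length, angle) pairs rather than the letter string BbCcDd...; the function name `shape_code` and the rounding are illustrative choices, not details from the patent.

```python
import math


def shape_code(points):
    """Return [(segment_length, interior_angle_deg), ...] for a closed polygon.

    Each entry pairs the segment from points[i] to points[i + 1] with the
    interior angle at the end of that segment.
    """
    n = len(points)
    code = []
    for i in range(n):
        p, q, r = points[i], points[(i + 1) % n], points[(i + 2) % n]
        length = math.dist(p, q)
        # Interior angle at q, measured from segment q->r round to segment q->p.
        back = math.atan2(p[1] - q[1], p[0] - q[0])
        ahead = math.atan2(r[1] - q[1], r[0] - q[0])
        angle = math.degrees(back - ahead) % 360
        code.append((round(length, 3), round(angle, 1)))
    return code


square = shape_code([(0, 0), (2, 0), (2, 2), (0, 2)])
```

For a square of side 2 every entry is a length of 2 paired with a 90 degree angle. A concave vertex would produce an angle above 180 degrees, which is one way the code can distinguish a notch from a bulge.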
[0020]
The determination means 7 shown in FIG. 1 determines, based on the shape code generated by the code generation means 6, whether the moving object is a human (an intruder) or a pet such as a dog. It includes aspect ratio determining means 13, part specifying means 14, and part determining means 15. First, the aspect ratio determining means 13 determines whether the contour shape of the moving object as a whole is vertically long or horizontally long. If it is vertically long, the moving object is determined to be a human, and if it is horizontally long, the moving object is determined to be something other than a human. As shown in FIG. 3, the human H usually moves in an upright posture, whereas an animal A such as a dog or a cat usually moves in a horizontal posture, and this difference in posture is used to determine whether or not the object is a human.
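The aspect-ratio check can be illustrated directly on the 0/1 data. The sketch below assumes the moving pixels are available as a two-dimensional mask and compares the height and width of their bounding box; working from a bounding box rather than from the shape code is a simplification made for this example, and `is_vertically_long` is a hypothetical name.

```python
def is_vertically_long(mask):
    """Return True if the bounding box of the 1s in mask is taller than it is wide."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return False  # no moving pixels, so nothing to classify
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return height > width


standing = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
on_all_fours = [[0, 0, 0],
                [1, 1, 1],
                [0, 0, 0]]
```

In this toy case `is_vertically_long(standing)` is true and `is_vertically_long(on_all_fours)` is false, mirroring the upright versus horizontal distinction drawn in the text.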
[0021]
The part specifying means 14, on the other hand, specifies which part of the body is included in the captured subject, based on the shape code generated by the code generation means 6. Shape codes representing the contour shape of each part of the body, such as the head, the torso, and the legs, are stored in advance in the code storage unit 16 of the storage device 8. By comparing the generated shape code with the shape codes stored in the code storage unit 16, the part specifying means 14 identifies which part the generated shape code represents. For example, when the generated shape code describes an elongated form extending in the vertical direction, it is identified as a shape code representing a leg.
[0022]
The part determining means 15 determines whether or not the detected moving object is a human according to the part specified by the part specifying means 14. Specific examples include a joint discrimination unit 17 for the legs, a head discrimination unit 18 for the head, a shoulder width discrimination unit 19 for the shoulders, and a tail discrimination unit 20 for the buttocks.
[0023]
When the part specified by the part specifying means 14 corresponds to the vicinity of the “leg”, the joint discrimination unit 17 makes the determination based on the structure of the leg joint. As shown in FIG. 3, in the leg of the human H the femur 25 and the tibia 26 extend vertically on either side of the knee and form the shape of the Japanese character “く” (resembling “<”) when the leg bends, whereas in the leg of an animal A such as a dog the tibia 27 and the metatarsal 28 extend on either side of the hock and form an inverted “く” shape. The determination uses this difference in shape. In other words, if the generated shape code is “く”-shaped with respect to the traveling direction (moving direction), the object is determined to be the human H (an intruder), and if the shape code is an inverted “く” shape, it is determined to be an animal A such as a dog. This determination is effective when the moving object crosses in front of the monitoring camera 2 in the left-right direction, that is, when the side of the moving object is imaged.
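A minimal sketch of such a bend-direction test follows. The text does not say which horizontal direction the “く” shape corresponds to, so the example assumes that a joint lying ahead of the straight line between the top and bottom of the leg (in the direction of travel) indicates a human knee, and a joint lying behind it indicates an animal hock. The three-point leg model, the function name `joint_bend`, and the `travel_dx` parameter are all hypothetical.

```python
def joint_bend(upper, joint, lower, travel_dx):
    """Classify a leg from three (x, y) points and the sign of horizontal travel.

    Returns "human" when the joint lies ahead of the line upper->lower in the
    travel direction, "animal" when it lies behind, and None for a straight leg.
    Assumes upper and lower are at different heights.
    """
    # Horizontal position of the straight line upper->lower at the joint's height.
    t = (joint[1] - upper[1]) / (lower[1] - upper[1])
    line_x = upper[0] + t * (lower[0] - upper[0])
    offset = (joint[0] - line_x) * (1 if travel_dx > 0 else -1)
    if offset > 0:
        return "human"
    if offset < 0:
        return "animal"
    return None


# The same bent leg seen while moving right, then while moving left.
moving_right = joint_bend((0, 0), (2, 5), (0, 10), travel_dx=1)
moving_left = joint_bend((0, 0), (2, 5), (0, 10), travel_dx=-1)
```

Tying the test to the travel direction is what makes the result independent of whether the object crosses the frame from the left or from the right.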
[0024]
When the part specified by the part specifying means 14 corresponds to the vicinity of the “head”, the head discrimination unit 18 makes the determination based on the position of the head relative to the torso. As shown in FIG. 3, when the moving object is viewed from the side, the head of the human H is located above the torso, whereas the head of an animal A such as a dog is located obliquely above the torso, and the determination uses this difference in relative position. That is, when the size of the moving object hardly changes, in other words when the moving object crosses in front of the monitoring camera 2 in the left-right direction, the object is determined to be the human H (an intruder) if the head is located above the torso, and is determined to be an animal A such as a dog if the head is located obliquely above or in front of the torso.
[0025]
When the part specified by the part specifying means 14 corresponds to the vicinity of the “shoulders and head”, the shoulder width discrimination unit 19 makes the determination based on the width of the shoulders relative to the head. When the moving object is viewed from the front or the back, the human H has wide shoulders (a wide torso), whereas an animal A such as a dog has a narrow torso, and the determination uses this difference in width. That is, when the size of the moving object changes, in other words when the moving object moves toward or away from the monitoring camera 2, the moving object is determined to be the human H (an intruder) if the ratio of the shoulder width to the head is larger than a predetermined ratio, and is determined to be an animal A such as a dog if the ratio is smaller than the predetermined ratio.
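The ratio test can be written as a one-line comparison. In the sketch below the widths are assumed to have been measured already, in pixels, and the limit of 1.5 is a placeholder: the text speaks only of "a predetermined ratio" and gives no value. The function name `is_human_by_shoulder_width` is hypothetical.

```python
def is_human_by_shoulder_width(head_width, shoulder_width, min_ratio=1.5):
    """Return True when the shoulders are wide enough relative to the head."""
    if head_width <= 0:
        raise ValueError("head_width must be positive")
    return shoulder_width / head_width > min_ratio


near = is_human_by_shoulder_width(head_width=40, shoulder_width=100)
far = is_human_by_shoulder_width(head_width=8, shoulder_width=20)
narrow = is_human_by_shoulder_width(head_width=30, shoulder_width=33)
```

Because a ratio is used rather than an absolute width, the same object gives the same answer whether it is close to the camera or far from it, which suits the case where the object moves toward or away from the camera.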
[0026]
When the part specified by the part specifying means 14 corresponds to the vicinity of the “buttocks”, the tail discrimination unit 20 makes the determination based on the presence of a tail. Because the human H does not have a tail as a dog does, this difference is used for the determination. That is, when an elongated cord-like shape is recognized near the buttocks, the object is determined to be an animal A such as a dog.
[0027]
Note that the discrimination between the human H and the animal A is not limited to the above methods. Any shape or posture that distinguishes the human H from the animal A, such as the position of the arms or the overall size, can be used for the determination.
[0028]
Incidentally, in the present embodiment the moving object is extracted based on the presence or absence of a change in the frame image, that is, on the difference between frame images separated at predetermined time intervals, so a swaying curtain or laundry may be recognized as a moving object. However, according to the present invention, whether the object is a human is determined from the characteristic parts of a human, so the notification means is not activated even when a curtain or laundry is blown by the wind.
[0029]
When the image recognition device 1 or the monitoring camera 2 built into it is shaken by an earthquake, the entire frame image may be judged to consist of moving pixels, and the background itself may be recognized as a moving object. In this case, however, the contour shape of the moving object becomes the shape of the entire screen, which is clearly different from the contour of a human, so it is not determined to be a human.
[0030]
Next, the flow of image processing in the security system 1 of the present embodiment will be described with reference to the flowchart shown in FIG. 4. First, the moving image data is separated into frame images in the frame separation step (step S1), and the subsequent recognition processing is performed on those frame images. The time interval of the frame separation is set as appropriate according to the speed of the motion to be recognized, the required recognition accuracy, and the processing capability. The separated frame images are stored in the image storage unit 9 (step S2).
[0031]
Subsequently, the presence or absence of a change is determined for each pixel of the successively separated frame images (step S3), and the result is converted into numerical sequence data (step S4). Then, feature points in the contour shape of the moving object are extracted from the numerical sequence data and their coordinates are detected (step S5), the lengths and interior angles of the line segments connecting the feature points are detected (step S6), and a shape code is generated from them (step S7). Here, the processing of steps S3 and S4 corresponds to the conversion step of the present invention, and the processing of steps S5 to S7 corresponds to the code generation step of the present invention. Thereafter, whether or not the moving object is a human (an intruder) is determined based on the shape code representing the external shape of the moving object (step S8). In the home security system, the notification means is activated when the moving object is judged to be an intruder, that is, when at least one of the determination means determines that it is a human.
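The final rule, notifying when at least one determination means reports a human, can be sketched as a simple combination of independent checks. The example assumes each check is a function that takes the shape code and returns True (human), False (not human), or None (not applicable to the part seen); this three-valued convention and the name `should_notify` are assumptions for illustration.

```python
def should_notify(shape_code, discriminators):
    """Return True if at least one discriminator judges the shape code to be human."""
    return any(check(shape_code) is True for check in discriminators)


# Stand-in discriminators; real ones would inspect the shape code.
def says_human(code):
    return True


def says_animal(code):
    return False


def not_applicable(code):
    return None


alarm = should_notify("BbCcDd", [not_applicable, says_animal, says_human])
quiet = should_notify("BbCcDd", [not_applicable, says_animal])
```

Combining the checks with a logical OR follows the wording of the text; a stricter system could require agreement among several checks, but that would be a different design from the one described.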
[0032]
As described above, when the image recognition device 1 is applied to a home security system, it can accurately determine whether a captured moving object is an intruder or a pet, and can activate the notification means only when the object is an intruder. This eliminates the need for measures such as taking the pet along when going out or confining the pet in a playpen or crate so that it cannot move around the room, which greatly improves usability. In addition, because the image recognition device 1 has a plurality of discrimination units corresponding to characteristic parts of the monitoring target, the accuracy of the discrimination can be greatly improved.
[0033]
Further, in the image recognition device 1 described above, the amount of data to be subjected to recognition processing can be reduced by efficiently processing moving image data having enormous information. Thus, real-time processing can be performed by a general-purpose computer, and a relatively inexpensive image recognition device can be provided.
[0034]
The present invention has been described above with reference to a preferred embodiment. However, the present invention is not limited to this embodiment, and various improvements and design changes are possible without departing from the scope of the present invention, as described below.
[0035]
That is, in the image recognition device 1 described above, the presence or absence of a change in the frame image is determined for each pixel. Alternatively, the image recognition device 1 may include an area dividing means that divides the frame image into a plurality of small regions, and the presence or absence of a change may be determined for each of the divided small regions. Because the number of small regions is set far lower than the total number of pixels in the frame image, this region division simplifies the processing.
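The region-based variation can be sketched as follows. The sketch assumes square blocks and treats a block as changed when the mean absolute difference of its pixels exceeds a threshold. The name `block_change_map`, the block size, the threshold, and the use of a mean are all illustrative assumptions, since the text above does not specify how a small region is judged.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale values 0..255 (assumed representation)


def block_change_map(prev: Frame, curr: Frame, block: int = 2,
                     threshold: float = 16.0) -> List[List[int]]:
    """Return one 0/1 flag per small region instead of one per pixel."""
    height, width = len(curr), len(curr[0])
    flags: List[List[int]] = []
    for top in range(0, height, block):
        row_flags: List[int] = []
        for left in range(0, width, block):
            # Collect the absolute differences of all pixels in this region;
            # regions at the right and bottom edges may be smaller.
            diffs = [
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row_flags.append(1 if sum(diffs) / len(diffs) > threshold else 0)
        flags.append(row_flags)
    return flags


previous_frame = [[0, 0, 0, 0] for _ in range(4)]
current_frame = [[0, 0, 100, 100], [0, 0, 100, 100], [0, 0, 0, 0], [0, 0, 0, 0]]
change_map = block_change_map(previous_frame, current_frame)
```

In this example a 4 x 4 frame yields 4 region flags rather than 16 pixel flags, which is the kind of reduction the paragraph above describes.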
[0036]
In the image recognition device 1 described above, the interior angle between adjacent line segments is calculated as the inclination of the line segments when the shape code is generated. Alternatively, the inclination angle or the vector of each line segment may be calculated.
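The two ways of expressing line segment inclination can be compared in a short sketch. It assumes the feature points are given in order around a closed contour. `segment_features` returns the length and inclination angle of each segment, and `adjacent_angles` returns the angle between the two segments meeting at each feature point, in the range 0 to 180 degrees, so it does not distinguish reflex corners. Both function names and the closed-contour assumption are hypothetical, and how the values are combined into a shape code is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_features(points: List[Point]) -> List[Tuple[float, float]]:
    """(length, inclination angle in degrees) for each segment of a closed contour."""
    features = []
    count = len(points)
    for i in range(count):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % count]
        dx, dy = x1 - x0, y1 - y0
        features.append((math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))))
    return features


def adjacent_angles(points: List[Point]) -> List[float]:
    """Angle in degrees (0..180) between the two segments meeting at each point."""
    angles = []
    count = len(points)
    for i in range(count):
        px, py = points[i - 1]
        cx, cy = points[i]
        nx, ny = points[(i + 1) % count]
        ax, ay = px - cx, py - cy  # vector toward the previous feature point
        bx, by = nx - cx, ny - cy  # vector toward the next feature point
        dot = ax * bx + ay * by
        cross = ax * by - ay * bx
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return angles


rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
features = segment_features(rectangle)
corners = adjacent_angles(rectangle)
```

The angle between adjacent segments does not change when the contour is rotated, whereas the inclination angle of each segment is measured against the image axes. Which of the two suits a given monitoring target is a design choice that the text leaves open.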
[0037]
In the above embodiment, the image recognition device is applied to a home security system to discriminate between a human (intruder) and a pet. However, the monitoring target is not particularly limited, and any moving object whose contour shape has distinguishing characteristics can be discriminated by the image recognition device of the present invention. For example, when the present invention is applied to a vehicle recognition system, the type of a moving car can be determined from its contour shape, which can be used for searching for wanted vehicles, surveying traffic conditions, and the like. When the present invention is applied to a sorting system, products moving on a conveyor, for example, can be discriminated by contour shape and sorted automatically. In particular, since the image recognition device of the present invention can be realized with an inexpensive general-purpose computer, it can be applied to a wide range of fields such as security equipment, production control devices, safety devices, amusement machines, and toys.
[0038]
[Effect of the invention]
According to the present invention, a shape code corresponding to the contour shape of a moving object is generated and analyzed to determine whether or not the moving object is a monitoring target, so the monitoring target can be recognized accurately. In addition, by efficiently processing the large volume of moving image data, the amount of data subjected to recognition processing can be reduced. As a result, real-time processing can be performed by a general-purpose computer, and the invention can be realized with a relatively inexpensive device.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a functional configuration of an image recognition device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining an image recognition method.
FIG. 3 is an explanatory diagram for explaining a difference in contour between a human and a dog.
FIG. 4 is an explanatory diagram showing a flow of processing in the image recognition device.
[Explanation of symbols]
1 image recognition device
2 surveillance camera (imaging means)
3 frame separating means
5 converting means
6 code generating means
7 discriminating means
11 coordinate detecting means
12 line segment calculating means

Claims

An image recognition method comprising:
a frame separation step of separating moving image data of a moving object into frame images at predetermined time intervals;
a conversion step of determining, for two sequentially separated frame images, the presence or absence of a change in the image for each pixel or for each small region divided in advance, and converting the frame image into numerical sequence data based on the determination result;
a code generation step of generating a shape code corresponding to the contour shape of the moving object based on the numerical sequence data; and
a determination step of determining whether or not the moving object is a monitoring target based on the shape code.

An image recognition apparatus comprising:
image capturing means for capturing a moving object and outputting moving image data thereof;
frame separating means for receiving the moving image data and separating the moving image data into frame images at predetermined time intervals;
conversion means for determining, for two sequentially separated frame images, the presence or absence of a change in the image for each pixel or for each small region divided in advance, and converting the frame image into numerical sequence data based on the determination result;
code generation means for generating a shape code corresponding to the contour shape of the moving object based on the numerical sequence data; and
determination means for determining whether or not the moving object is a monitoring target based on the shape code.

The image recognition apparatus according to claim 2, wherein the code generation means includes:
coordinate detection means for detecting the coordinates of feature points in the contour shape of the moving object; and
line segment calculation means for calculating the length and the inclination of each line segment connecting the feature points,
and generates the shape code based on a combination of the length and the inclination of each line segment.