JP3810943B2 - Image processing apparatus, image processing method, and recording medium recording image processing program

Info

Publication number: JP3810943B2
Application number: JP12630899A
Authority: JP
Inventors: Osamu Yamaguchi (山口 修)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-05-06
Filing date: 1999-05-06
Publication date: 2006-08-16
Anticipated expiration: 2019-05-06
Also published as: JP2000322588A

Description

【０００１】
【発明の属する技術の分野】
本発明は、画像に図形等を合成する画像処理装置及びその方法に関する。
【０００２】
【従来の技術】
顔画像処理を利用したシステムとして様々なものがある。
【０００３】
例えば、存在検知、個人認証、視線・顔向き検出等の高度な利用法がある。また、顔領域に着目した上で、画像に変形を加えることを主目的としたものとして以下のものが挙げられる。
【０００４】
１．めがね着せ替えシステム
２．動画編集システム
３．髪型着せ替えシステム
４．カップル子供シミュレーションシステム
５．写真シール製造器（いわゆるプリクラ（商標））のフレーム装飾
６．ＳＩＧＧＲＡＰＨ９７で提案された顔変形システム
等がある。
【０００５】
１．では、特徴点を手動で指定をし、静止画に対する処理のものが製品化され、眼鏡装用シミュレーション装置（特開平６−１３９３１８号）のような公知例がある。特徴点の自動化についても、特許が出願されている。
【０００６】
２．は、動画に対する処理で、手動で対応点等を指定するシステムがテレビ局等で用いられ、放送に利用されている。なお、モーフィングについては、静止画用のモーフィングソフトについても様々なものが市販されている。
【０００７】
３．は、画像処理装置（特開平８−３２９２７８号）で開示されているように、顔の輪郭を検出し、髪の部分を領域として求めた後、他の髪型を重ねるというものである。
【０００８】
４．は、２人の人物、もしくは人物と他の画像とのモーフィングを行い、例えば２人の人物の子供の画像を合成する装置であり、商品化されている。顔部品の検出はハーフミラーに表示された位置に顔を合わせて検出を簡略化している。
【０００９】
５．については、顔の位置は特定しておらず、画像の枠に装飾を施すことを主目的としている。また特定の領域に顔の位置を自分で調整してフレームの位置に移動する必要がある。
【００１０】
６．については、文献［Darrell,T.Gordon,F.,Woodfill W.,Baker,H.:′′A MagicMorphin Mirror′′,SIGGRAPH′97 Visual Proceedings,ACM Press,1997］で提案され、色、距離情報とパターン情報を組み合わせて人物の顔位置を見つけ、その顔位置の画像を変形させることにより、顔の形状が曲げられたような効果を表示する装置である。
【００１１】
それぞれの用途、特徴として、１，３については、髪型、めがね等のファッションを検討するためのシステムである。２，４，６は画像を局所的に変形させ、映像的な効果を向上させることを目的としたシステムである。５は画像認識を必要とせず、自動的に検出を行うものではない。
【００１２】
【発明が解決しようとする課題】
本発明はこれらの装置には該当せず、それぞれ本発明の次の目的を達成するには機能的に不十分であった。
【００１３】
すなわち、本発明は、主として動画を対象とし、位置や大きさが多様に変化する顔領域に追随し、自動的に顔領域や顔部品を検出する機能を利用して、顔の領域の位置、大きさ、また、顔部品の位置大きさにあわせて、顔領域の様々な部分に、鮮やかな色彩を持つ図形表示を行うことで、画像中の人物を瞬時に変装させたり、装飾したりすることを目的としたものである。
【００１４】
また、自動的に顔の検出を行うために、利用者の手を煩わすことがなく、利用者が移動しても、移動した画面位置で同じ効果の合成画像が得られることも目的としている。
【００１５】
これにより、本発明は、アミューズメントとしての効果が大きく、瞬時に印象の異なる様々な人物の変装画像を自動的に得ることが可能になる画像処理装置及びその方法を提供する。
【００１６】
【課題を解決するための手段】
本発明は、画像を入力する画像入力手段と、前記入力した画像中の対象者の顔領域の大きさと位置を検出する顔領域検出手段と、前記検出した顔領域から目、鼻、または、口の顔部品の位置を検出する顔部品位置検出手段と、前記入力した画像と合成するための図形または画像の内容と位置を決定する図形計算手段と、前記入力した画像に対し、前記決定した図形または画像を合成する画像合成手段と、前記合成された画像を出力する画像出力手段と、を有し、前記図形計算手段は、複数の図形、または、画像の集合からなる登録図形パターン情報であり、かつ、前記顔領域、または、前記顔部品の位置、前記合成するための図形または画像と前記顔部品との相対的な位置関係、並びに、前記顔領域の大きさに基づいて設定可能な登録図形パターン情報を保持する情報保持手段と、前記顔領域の位置と大きさ、または、顔部品の位置の情報と、前記保持された登録図形パターン情報とを用いて、合成する図形または画像の位置と大きさを決定する重ね合わせ位置計算手段と、を有し、前記重ね合わせ位置計算手段は、前記登録図形パターン情報に含まれる相対的な位置関係を表すベクトルを用いるものであり、前記検出した顔部品の位置を中心として前記ベクトルによって前記合成する図形または画像の位置を決定することを特徴とする画像処理装置である。
【００１７】
また、本発明は、画像を入力する画像入力手段と、前記入力した画像中の対象者の顔領域を検出する顔領域検出手段と、前記検出した顔領域から目、鼻、または、口の顔部品の位置を検出する顔部品位置検出手段と、前記入力した画像を変換する内容と位置を決定する図形計算手段と、前記入力した画像における前記決定した位置を前記決定した内容に変換する画像合成手段と、前記変換された画像を出力する画像出力手段と、を有し、前記図形計算手段は、変換する領域の形状の集合からなる登録図形パターン情報であり、かつ、前記顔領域、または、前記顔部品の位置、前記変換する領域の形状と前記顔部品との相対的な位置関係、並びに、前記顔領域の大きさに基づいて設定可能な登録図形パターン情報を保持する情報保持手段と、前記顔領域の位置と大きさ、または、顔部品の位置の情報と、前記保持された登録図形パターン情報とを用いて、変換する領域の位置と大きさを決定する画像変換手段と、を有し、前記画像変換手段は、前記登録図形パターン情報に含まれる相対的な位置関係を表すベクトルを用いるものであり、前記検出した顔部品の位置を中心として前記ベクトルによって前記変換する領域の位置を決定することを特徴とする画像処理装置である。
【００３８】
【発明の実施の形態】
以下に本発明の実施例について説明する。
【００３９】
［実施例１］
本発明の実施例１の画像処理装置について説明する。
【００４０】
画像処理装置では、図２（ａ）のように、入力された画像に対して、顔領域と顔部品の大きさ、位置を検出し、図２（ｂ）（ｃ）のように、所定の図形を検出された顔部品位置に応じて、相対的に位置、大きさを変え、図２（ｂ）（ｃ）のように、入力画像と共に重ね合わせた表示、加工をする。これにより、表示された内容は、認識対象となった人物があたかも瞬時に変装したように見せることができる。
【００４１】
このような効果を狙った画像生成を行うためには、画像処理装置は、図１のような画像入力部１００、顔領域検出部２００、顔部品検出部３００、画像出力部４００、図形計算部５００、図形合成部６００が必要であり、以下、それぞれの構成要素に対して説明する。
【００４２】
なお、画像処理装置は、例えば、カメラを備えたパソコンで実現可能であり、各構成要素の機能を果たすプログラムをこのパソコンにインストールして記憶させておけばよい。また、このプログラムをＦＤ，ＣＤ−ＲＯＭ，ＭＯ，ＤＶＤ等の記録媒体に記録させておき、他のパソコンに移植してもよい。さらに、写真シール製造器等に使用してもよい。
【００４３】
（画像入力部１００）
画像入力部１００は、人物が写っている静止画像、または動画像を得るためのものである。基本的な構成としては図３に示すように、カメラ１０１、フレームグラバ１０２からなり、人間の顔を撮影するために用いる。入力画像は、カラー、モノクロに限らない。
【００４４】
また、画像入力部１００の別の構成として、カメラの代わりに、ＴＶチューナー、ＶＣＲ、ＤＶＤ等でもよい。
【００４５】
さらに、画像入力部１００の別の構成として、ファイルに記録されたデジタルビデオデータやネットワークを経由して得られるビデオデータを入力としてもよい。
【００４６】
もちろん動画、静止画の種類を問わない。フレームグラバ１０２上のメモリ１０３に入力画像が保存され、それぞれの各処理部からアクセスされる。
【００４７】
（顔領域検出部２００）
顔領域検出部２００は、画像入力部１００から得られた画像に対して、画像中から人物の顔領域を検出する。
【００４８】
顔領域の検出方法として、予め収集した顔画像から生成されたテンプレートを用いて、画像の各部分に当てはめ、テンプレートと高い相関をもつ部分を顔領域とする。
【００４９】
図４に示すように、顔領域検出部２００は、画像変換部２０１とテンプレート計算部２０２と領域決定部２０３からなる。
【００５０】
画像変換部２０１は、入力画像がカラー画像の場合は、モノクロ画像に変換し、さらに入力画像を数段階の大きさに縮小し、いくつかの大きさの画像を生成する。縮小率等はカメラのレンズ系、画角等の情報や画像の大きさ等から、想定される顔領域の大きさを予め判断し数段階の割合を決定しておく。
【００５１】
テンプレート計算部２０２は、用意されたテンプレートを用いて、縮小画像に対し、画像をラスタースキャンし、各部分画像に対してテンプレートとの類似度を求める。類似度の計算方法としては、テンプレートとして、学習パターンを用いて構成された部分空間を用いて、部分空間法［エルッキ・オヤ著，小川英光，佐藤誠訳，“パターン認識と部分空間法”，産業図書,1986 参照］を用いて類似度を求めてもよい。この場合、顔に類似しているが、顔ではないような擬似顔画像のテンプレートも用意しておき、同時に類似度を計算することによって、誤った領域を検出しないようにもできる。
【００５２】
領域決定部２０３は、各縮小画像における各部分画像での類似度から判断して、もっとも顔らしいと判定された領域についての情報を出力する。具体的には、縮小画像の縮小率、検出された場所の検出位置である。テンプレートの大きさと縮小率を考慮し、元の入力画像の大きさにあわせて、顔領域を表す矩形の大きさと、矩形の位置を出力とする。
【００５３】
（顔部品検出部３００）
顔部品検出部３００は、顔領域と判定された領域について、目、鼻、口といった顔の部品を検出する。
【００５４】
図５に示すように、顔部品検出部３００は、画像変換部３０１、候補点抽出部３０２、候補点検証部３０３、候補点決定部３０４で構成される。
【００５５】
画像変換部３０１では、入力画像から、顔領域検出部２００によって決定された領域を注目領域とし、その領域の濃淡値の低い部分に対して処理を行うための、処理画素を決定する。
【００５６】
候補点抽出部３０２では、文献［福井和広、山口修：「形状抽出とパターン照合の組合せによる顔特徴点抽出」,電子情報通信学会論文誌（D）,vol.J80-D-II,No.8,pp2170--2177 （1997）］で述べられている分離度フィルタを用いて円形領域の候補点を検出する。候補は複数個の点として表される。
【００５７】
候補点検証部３０３では、候補点をパターン認識手法により、各顔部品のテンプレートとの類似性を判定する。
【００５８】
候補点決定部３０４では、候補点の位置関係の組み合わせを用いて候補点を絞り、候補点の位置を出力する。
【００５９】
図６を用いて説明する。
【００６０】
図６の３５１は時間Ｔにおける入力画像を表し、顔領域検出部２００によって顔領域を検出した結果、その矩形領域を図６の３５２に示す。
【００６１】
顔部品検出部３００では、この矩形領域内部について、円形領域の候補点を検出し（図６の３５３）、位置関係、パターンの類似性等を満たすような４つの候補を選択し目鼻特徴点を得る（図６の３５４）。
【００６２】
また、本実施例では、ある入力画像に対して、瞳位置、鼻孔位置を検出した場合、その次のフレームではその検出位置付近のみを部品の探索範囲として、高速に部品検出を行うトラッキング探索を行う。
【００６３】
図６の３５５は時間Ｔにおける入力画像を表し、前述したような検出方法で目鼻特徴点を検出されたとする。その後時間Ｔ＋１では、再び全画面中から顔検出、目鼻検出を行うのではなく、図６の３５６に示したように、検出した点の近傍領域の矩形の部分だけに対して、候補点抽出部３０２の円形領域の候補点の検出、候補点検証部３０３における、各顔部品のテンプレートとの類似性判定を行い、候補点を絞り、候補点の位置を出力する。
【００６４】
時間Ｔ＋１の時点では、画像が図６の３５７のように別の方向を向いているように、変化していても、わずかな時間であれば人物の動作が少ないため、探索範囲を前述のように絞っても、特徴点の抽出が可能となる。
【００６５】
また、大きな動作を行い、時間Ｔ＋ａの時点では、画像が図６の３５８のように部品の探索範囲から外れた場合には、顔領域検出部２００により、顔領域の検出を行い、顔部品の再度検索を行う。
【００６６】
このトラッキング処理により、顔部品位置を求める時間が短縮され、スムーズな画像の合成、変換処理を行えるようになる。
【００６７】
本実施例では、目の瞳の位置の中心座標、鼻孔の位置の中心座標を出力する。
【００６８】
もちろん、口やその他の顔部品の位置を検出し、その出力としてもよい。
【００６９】
（画像出力部４００）
画像出力部４００は、合成内容を付加された入力画像を、出力機器に送る。
【００７０】
図７に示すように、画像を画面に出力する場合には、画像合成部６００の画像メモリにある合成結果をビデオＲＡＭ４１０に転送する。Ｄ／Ａ変換を介してディスプレイ４１１に表示する。
【００７１】
ファイルへの出力の場合には、ファイルポインタ、もしくはバッファに対して画像データを転送する。
【００７２】
ファイルは磁気ディスク４３０、メモリ４４０等に保存される。
【００７３】
また、画像の圧縮や画像のフォーマット変換等を行うエンコーダ４２０を通して画像を保存、出力してもよい。
【００７４】
また、ネットワーク４４０を介して、別の装置にデータを送付しても良い。
【００７５】
また、通信を行う場合は、合成した結果の画像を送信することも出来、入力と合成部分の画像を分ける等の方法や、図形計算部５００に必要な合成に必要な情報のみを送信し、受信側で合成を行う等の方法でもよい。
【００７６】
出力メディアが画像やデジタルビデオの場合には、各画像のメモリを描画内容にしたがって各画素の情報を変更し、記録メディア（ファイル）に保存する。
【００７７】
（図形計算部５００）
図形計算部５００は、入力画像に重ね合わせるための所定の図形や画像を保持、選択、位置計算を行う。図２（ｂ）（ｃ）で示したような重ね合わせるための所定の図形や画像の情報のことを、「登録図形パターン情報」と呼ぶ。
【００７８】
１登録図形パターン情報の内容
まず、登録図形パターン情報について説明する。
【００７９】
登録図形パターン情報の内容としては、表示する線、多角形、円、楕円、円弧等の規定図形、またビットマップと呼ばれるような画素集合等の規定画像等の、図形の属性を示すもの、また図形の色、テクスチャ等の属性、顔領域との相対的な位置関係を表す位置情報、等があり、これをオブジェクトと呼ぶ。
【００８０】
一例として図１１に示すような構造体で各オブジェクトを表現する。
【００８１】
ここで、各引数は図形を表現するためのパラメータや、相対的な位置関係を表現するための関係情報を表すものである。そのオブジェクトを複数個有し、ある意味を持った図形配置を行うものを登録図形パターン情報と呼ぶ。
【００８２】
図２（ｂ）の例では、"ｃａｔ"を表す登録図形パターン情報であり、図１０に示すように９個のオブジェクトからなる。５１１，５１２，５１３は多角形オブジェクト、５１４，５１５，５１６，５１７，５１８，５１９は線オブジェクトである。顔に合わせて相対的な位置関係に図形を配置することにより、猫に「変装」させることができる。
【００８３】
各オブジェクトは、オブジェクトの図形属性によって、それを表現する構造が異なる。例えば、多角形オブジェクトの場合は、多角形の角の個数、そして各ポイントの位置情報を表す変数が定義される。線オブジェクトの場合は、２点の位置情報が定義される。また色属性を表すためのＲＧＢ値を表す変数、またはテクスチャを表すための変数等も定義されている。
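The object layout described above might be represented as in the following minimal Python sketch. The class and field names (`FigureObject`, `RegisteredPattern`, `points`, `filled`) are hypothetical and are not the structure of FIG. 11, which the text does not reproduce; vertex positions are left as placeholder strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FigureObject:
    kind: str                                # hypothetical tag: "polygon" or "line"
    points: List[str]                        # one relative-position expression per vertex
    color: Tuple[int, int, int] = (0, 0, 0)  # RGB colour attribute
    filled: bool = False                     # fill attribute (meaningful for polygons)


@dataclass
class RegisteredPattern:
    name: str
    objects: List[FigureObject] = field(default_factory=list)  # drawn in list order


# The "cat" pattern of FIG. 10: three polygon objects followed by six line objects.
cat = RegisteredPattern("cat")
for _ in range(3):
    cat.objects.append(FigureObject("polygon", ["p1", "p2", "p3"], (255, 255, 255), True))
for _ in range(6):
    cat.objects.append(FigureObject("line", ["p1", "p2"]))
```

Keeping the objects in a plain list preserves the description order, which matters because later objects are drawn over earlier ones.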
【００８４】
図１２は、オブジェクトの相対位置を指定するための指定方法についての説明図である。
【００８５】
図１２（ａ）のような顔がカメラに入力されると、顔領域検出部２００、顔部品検出部３００の機能により、図１２（ｂ）に示す各ポイントが検出される。
【００８６】
図１２（ｃ）は、その検出された特徴点から、いくつかの基準となりうる点について、ラベルをつけたもので、その名前を示すことにより、各特徴点の座標位置を得ることができる。
【００８７】
図１２（ｄ）は、各特徴点を用いて、相対的に位置を表すための、いくつかのベクトル情報を示しており、それぞれのラベルで位置情報を指定できる。
【００８８】
実際に、入力された画像に対して、図１２（ｆ）で示すような、多角形（三角形）を描画する方法について説明する。
【００８９】
まず、図形の種類（現在は多角形）を指定し、その位置情報を指定する。
【００９０】
図１２（ｅ）に示すように、ｐ１，ｐ２，ｐ３の各点を目の位置ｂ１からの相対的な位置関係で表現する。
【００９１】
そのためには、ａ１,ａ２,ａ３のベクトルを次のように指定する。
【００９２】
まず、基準として、ｂ１（目）を選び、相対的な位置を表現するベクトルａ１を表す。
【００９３】
そのために、例えば、顔領域を基準とした
FACE_WIDTH,FACE_HEIGHT
というベクトルを用いて表現すると、
a1 = t1 * FACE_WIDTH + t2 * FACE_HEIGHT
よって、
p1 = b1 + a1 = b1 + FACE_WIDTH * t1 + FACE_HEIGHT* t2
といった記述をすればよい（ｔ１,ｔ２は実数値）。
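The relation p1 = b1 + t1 * FACE_WIDTH + t2 * FACE_HEIGHT can be illustrated with a short Python sketch. It assumes, as a simplification not stated in the text, that FACE_WIDTH and FACE_HEIGHT are axis-aligned vectors (w, 0) and (0, h) taken from the detected face rectangle; the function name `place` is hypothetical.

```python
def place(base, face_size, t):
    """Return p = b + t1 * FACE_WIDTH + t2 * FACE_HEIGHT.

    base      -- (x, y) of the reference feature point, e.g. an eye (b1)
    face_size -- (w, h) of the detected face rectangle; assumed to give the
                 vectors FACE_WIDTH = (w, 0) and FACE_HEIGHT = (0, h)
    t         -- (t1, t2), real-valued coefficients
    """
    bx, by = base
    w, h = face_size
    t1, t2 = t
    return (bx + t1 * w, by + t2 * h)


# The same coefficients follow the face as its size changes.
small_face = place((100, 80), (60, 90), (0.5, -1.0))
large_face = place((100, 80), (120, 180), (0.5, -1.0))
```

Because the offset is expressed in units of the face size, doubling the face rectangle doubles the offset from the eye, which is what keeps the figure in proportion.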
【００９４】
そして、図１２（ｆ）のような白い三角形を書くためには、以下の記述で可能となる。
【００９５】
FG_FILL_POLYGON 3
RIGHT_EYE ADD FACE_WIDTH t1 ADD FACE_HEIGHT t2
RIGHT_EYE ADD FACE_WIDTH -t3 ADD FACE_HEIGHT t4
RIGHT_EYE ADD FACE_WIDTH t5 ADD FACE_HEIGHT -t6
COLOR 255 255 255
５１１のオブジェクトの場合は、まず、表示位置を指定するために、基準となる顔部品の特徴点を指定する。
【００９６】
顔部品の特徴点としては、瞳位置、鼻孔位置、顔領域の中心、各部品間の中点等複数の候補があるが、５１１の場合、２つの鼻孔の中点を基準特徴点として選択し、その基準特徴点からの相対位置を設定する。
【００９７】
相対位置の設定は、顔の大きさが変化した場合にも、大きさをあわせて図形を重ね合わせるためであり、相対基準量を用いて指定する。
【００９８】
相対基準量は、検出された顔領域の大きさや２つの目の間の距離等が選択できる。
【００９９】
５１１では３角形を表示するために、相対基準量として、顔領域の大きさを選び、その定数倍のｘ、ｙ座標位置に３点を指定する。色属性は、５１１の場合は塗りつぶしを行い、ＲＧＢ値を指定している。
【０１００】
５１２，５１３の場合も同様に、基準特徴点として、それぞれの瞳位置を選択し、その上方に相対基準量を用いて、各点の指定を行っている。
【０１０１】
５１４から５１９は線オブジェクトであるために、２点を指定する。この場合も、基準特徴点と相対基準量を用いて指定する。線の場合は、実線、破線等の指定、色属性の指定等も行える。
【０１０２】
その他、円オブジェクトについても、円の中心を表す変数、円の大きさを表す変数等が定義され、その他の図形オブジェクトについても、同様に構造が定義される。
【０１０３】
ビットマップオブジェクトについては、表示を行う場合の位置、または表示させる場合の拡大率等を指定する。
【０１０４】
なお、これらのオブジェクトは描画順序によって描かれる結果が異なるために、描画順序を規定し、オブジェクトの記述順に画像に施されるとする。
【０１０５】
２図形計算部５００の内容
次に、図形計算部５００の振る舞いについて説明する。
【０１０６】
図形計算部５００は、図９に示すように図形データベース５０１、重ね合わせ位置計算部５０２から構成される。
【０１０７】
図形データベース５０１には、複数の登録図形パターン情報が保存されており、これらの登録図形パターンはファイルとして管理され、自由に新規追加、削除等が行われる。データベースには、パターン情報の個数や大きさ等の情報が管理される。
【０１０８】
各パターンの情報の構造は、図１３に示すように、データベースには複数の登録図形パターンがあり、各登録図形パターンは、それぞれオブジェクトの集合として構成されている。
【０１０９】
このデータベースは、磁気ディスクまたはメモリユニットから構成されている。そして、登録図形パターンのデータベースを記憶したＦＤ，ＣＤ−ＲＯＭ，ＭＯ，ＤＶＤ等の記録媒体を別途用意しておき、これをパソコンである画像処理装置にインストール等によって記憶させてもよい。この場合に、登録図形パターンのデータベースの記録媒体のみを製造、販売してもよい。
【０１１０】
パターン編集部５０４によって、登録図形パターンの入出力を可能にする。例えば、可搬型磁気ディスク、可搬型メモリユニット等の交換を目的としたメディア、またはネットワーク経由によるデータ交換等を目的としたインタフェースを有してもよいし、編集エディタ機能を有してもよい。
【０１１１】
これらの登録図形パターン情報は、あるスクリプト言語によって記述することができる。
【０１１２】
例えば、“ｃａｔ”を表す登録図形パターン情報は、図８に示すようなテキスト情報で記述されたスクリプトとして表現される。これにより、テキストファイルを編集することで、合成する図形情報を修正することができ、このファイルをネットワークを通じて流通させること等も可能である。
【０１１３】
重ね合わせ位置計算部５０２は、必要となる基準特徴点の情報や相対基準量の情報を、顔領域検出部２００や顔部品検出部３００から読み込み、引数によって指定されたそれぞれの位置を計算する。
【０１１４】
図１４のフローチャートを用いて説明する。
【０１１５】
まず、登録図形データ情報を図形データベース５０１の中からひとつを選択して、メモリに読み込む（ステップ５０２０１）。その選択は選択部５０３によって選ばれる。
【０１１６】
選択部５０３は、利用者によって選択された登録図形パターンのＩＤ（例えば、認識番号）を重ね合わせ位置計算部５０２に送り、そのＩＤのパターンを読み込む。
【０１１７】
ここで、選択部５０３に乱数発生機構やタイマを持たせ、一定時間間隔であるパターンの変更シーケンスを行ったり、パターンをランダムに切り替える等の演出効果を導入することも可能である。
【０１１８】
メモリにロードされた登録図形パターン情報について、複数のオブジェクトについて次の処理を行う。
【０１１９】
まず記述された順序通りに一つのオブジェクトを読み込む（ステップ５０２０２）。
【０１２０】
そのオブジェクトの中で定義されている引数の読み込みを行う（ステップ５０２０３）。
【０１２１】
引数には、位置情報を表すものとパラメータ（円の半径の大きさ等）を表すものがあるが、それぞれについて指定された引数の情報にしたがって処理をする。
【０１２２】
説明は位置情報の場合で説明する。始めに位置情報を表すための変数の初期化を行う。位置情報を表す点（ａｘ，ａｙ）とすると、ａｘ，ａｙについて、ａｘ，ａｙ共に０を代入する（ステップ５０２０４）。
【０１２３】
次に指定されている顔部品等の位置情報もしくは、顔部品から得られる図１２で示したベクトル等を用いて指定された長さを読み込む（ステップ５０２０５）。
【０１２４】
その位置情報に係数を施す（定数倍する）処理を行う（ステップ５０２０６）。
【０１２５】
そして位置情報を表す変数であるａｘ，ａｙに加算する（ステップ５０２０７）。
【０１２６】
さらに引数に記述がある場合には、ステップ５０２０５から繰り返し、そうでなければ（ａｘ，ａｙ）がその位置情報となる（ステップ５０２０８）。
【０１２７】
さらに引数がある場合には、繰り返し（ステップ５０２０９）、全てのオブジェクトの引数について値が決まった場合に終了する（ステップ５０２１０）。
【０１２８】
この手順により検出した顔の位置の情報に対応する、合成画像の位置が決定する。
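Steps 50204 to 50208 can be sketched in Python as a small accumulator loop. This is an illustration only: the expression syntax follows the script lines of paragraph 0095 (a base name followed by `ADD NAME coefficient` groups) but with numeric coefficients, and the function name `eval_position` and the `vectors` dictionary are assumptions.

```python
def eval_position(expr, vectors):
    """Evaluate one position argument such as
    'RIGHT_EYE ADD FACE_WIDTH 0.5 ADD FACE_HEIGHT -0.2'.

    vectors maps each name to an (x, y) pair supplied by the face area and
    face part detectors (hypothetical interface).
    """
    tokens = expr.split()
    terms = [(tokens[0], 1.0)]            # the base feature point, coefficient 1
    rest = tokens[1:]
    if len(rest) % 3 != 0:
        raise ValueError("expected groups of: ADD NAME COEFFICIENT")
    for i in range(0, len(rest), 3):
        op, name, coef = rest[i:i + 3]
        if op != "ADD":
            raise ValueError("unsupported operator: " + op)
        terms.append((name, float(coef)))

    ax, ay = 0.0, 0.0                     # step 50204: initialise (ax, ay)
    for name, coef in terms:              # step 50208: repeat while terms remain
        vx, vy = vectors[name]            # step 50205: read position or vector
        ax += coef * vx                   # steps 50206-50207: scale, then add
        ay += coef * vy
    return ax, ay


vectors = {"RIGHT_EYE": (100, 80), "FACE_WIDTH": (60, 0), "FACE_HEIGHT": (0, 90)}
p = eval_position("RIGHT_EYE ADD FACE_WIDTH 0.5 ADD FACE_HEIGHT -0.2", vectors)
```

Running the same function over every argument of every object, in description order, corresponds to the outer loops of steps 50209 and 50210.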
【０１２９】
（画像合成部６００）
画像合成部６００は、指定された登録図形パターン情報を図形計算部５００によって計算された位置に、それぞれの属性で合成、または画像の加工を行う。
【０１３０】
画像合成部６００の構成例を図１５に示す。
【０１３１】
画像メモリ６１０は、入力画像を保持し、それぞれの合成部６１２に渡される。
【０１３２】
各合成部６１２は、登録図形パターン情報で記述された形状の種類、処理方法に応じて選択的に用いられる。その指示は、図形計算部５００から渡される形状、位置情報、パラメータ、画像情報をオブジェクト情報制御部６１１が受取り、指定のオブジェクトの数だけ処理を繰り返す。
【０１３３】
画像メモリ６１０に保持されている画像は合成部６１２によって変換され、画像メモリ６１３に保持される。処理が繰り返される場合は、画像メモリ６１３から画像メモリ６１０に画像がコピーされ、処理が繰り返される。
【０１３４】
オブジェクト情報制御部６１１は、全ての合成が終了すると、画像メモリ６１３の画像を外部に渡す。
【０１３５】
これまで、画像に対して図形を合成することについて述べたが、次にビットマップ画像の重ね合わせについて説明する。
【０１３６】
ビットマップの重ね合わせは、図１６のように行われる。フローチャートは図１７である。
【０１３７】
図１６の６５０は処理対象となる入力画像であり、わかりやすくするために、検出した目鼻に円をつけて表示したものである。
【０１３８】
検出された位置に応じて計算される位置に合わせて、図１６の６５２で表されるようなビットマップ画像（サル）を重ね合わせ、図１６の６５１の合成画像を得ることを目的とする。
【０１３９】
図１７のステップ６６１では、合成するビットマップの画像を読み込み、同時に埋め込みの際の排除色（合成しない領域を表し、図１６の６５２の白い領域の色を指す。）を読み込む。
【０１４０】
ステップ６６２では、ビットマップ画像が合成される位置を図１６の６５３のように前述のような位置指定から計算する。
【０１４１】
ステップ６６３では、入力画像を読み込み、ビットマップ画像の合成を開始する。合成位置は拡大、縮小されることもあるため、対象となる領域のなかでの画素位置を求め（ステップ６６４）、それぞれの合成位置によって処理が異なる（ステップ６６５）。図１６の６５４の場所に対応する図１６の６５５の位置では、排除色の場合（図１６の６５６）は入力画像の画素値をそのまま出力し（ステップ６６６）、排除色ではなく図柄がある場合（図１６の６５７）はビットマップ画像の画素値を出力とする（ステップ６６７）。
【０１４２】
全ての対象領域に対して処理が行われたかを判断し（ステップ６６８）、終わっていれば、図１６の６５１に示したような合成画像を出力する。
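The bitmap overlay of steps 661 to 668 can be sketched in Python as follows. The sketch assumes images are 2-D lists of pixel values and uses nearest-neighbour sampling for the enlargement or reduction; the function name `overlay_bitmap` and the `dest` rectangle format are hypothetical.

```python
def overlay_bitmap(image, bitmap, key_color, dest):
    """Composite bitmap onto image inside dest = (x0, y0, w, h).

    Pixels of the bitmap equal to key_color (the exclusion colour read in
    step 661) leave the input image untouched.
    """
    x0, y0, w, h = dest                      # step 662: computed overlay position
    bh, bw = len(bitmap), len(bitmap[0])
    out = [row[:] for row in image]          # step 663: start from the input image
    for dy in range(h):
        for dx in range(w):
            x, y = x0 + dx, y0 + dy
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue                     # destination pixel lies off-screen
            # step 664: map the destination pixel back into the bitmap
            src = bitmap[dy * bh // h][dx * bw // w]
            if src != key_color:             # step 665: exclusion colour or pattern?
                out[y][x] = src              # step 667: take the bitmap pixel
            # step 666: otherwise the input pixel is kept as it is
    return out


background = [[1] * 4 for _ in range(4)]
sprite = [[0, 7],
          [7, 0]]                            # 0 is the exclusion colour here
composited = overlay_bitmap(background, sprite, 0, (0, 0, 4, 4))
```

In this usage the 2x2 sprite is stretched to 4x4, and only its non-excluded pixels replace the background.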
【０１４３】
もちろん、一枚の入力画像に複数のビットマップ合成を行っても良い。
【０１４４】
なお、合成部６１２の構成を変え、画像メモリを一つにしてもよいし、その構成は問わない。
【０１４５】
また、出力メディアがディスプレイ等で、リアルタイムに表示を行う場合は、ＶＲＡＭに直接画像変換を加えるような構成でもよい。
【０１４６】
出力メディアが画像やデジタルビデオの場合には、各画像のメモリを描画内容にしたがって各画素の情報を変更し、記録メディアに保存すればよい。
【０１４７】
［実施例２］
実施例２の画像処理装置について説明する。
【０１４８】
実施例２の画像処理装置では、実施例１で述べた画像合成部６００に画像変換部７００を加える。
【０１４９】
画像合成部６００に画像変換部７００の機能を追加した構成例を図１８に示す。
【０１５０】
画像合成部６００で説明した各機能と同列で、画像を変換する機能をもつもの、例えば画像のある領域に対してモザイク化を行うモザイク処理部７０１や画像のある部分領域を拡大、縮小する部分拡大縮小処理部７０２やその他の画像変換部７０３を追加する。
【０１５１】
画像合成部６００を用いた実施例では、図形や画像の重ね合わせによる合成のみであるが、画像自身に画像変換を加えることで、様々な印象の異なる画像を得ることがさらに可能となる。
【０１５２】
図１９に画像変換の例を示す。
【０１５３】
図１９（ａ）は処理対象となる入力画像であり、わかりやすくするために、検出した目鼻に円をつけて表示したものである。
【０１５４】
この画像に対し、検出された顔領域、目鼻位置の情報を用いて、顔領域付近以外の場所について、モザイク化の処理を行ったものが図１９（ｂ）である。
【０１５５】
また、顔領域付近のみに対しモザイク化の処理を行ったものが図１９（ｃ）であり、このような画像自身に対して画像変換を行う処理を提供する。
【０１５６】
もちろん、顔領域を自動的に追跡しているために、図１９（ｃ）（ｄ）のように、人物が移動しても、その場所でモザイク化の処理を行うことで、常時顔領域がモザイク処理された画像を得ることができる。
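The two mosaic variants of FIG. 19(b) and FIG. 19(c) can be sketched with one Python function. This is a simplified illustration on a grayscale image stored as a 2-D list; block averaging is one common way to produce a mosaic and is an assumption here, as are the function name `mosaic` and its parameters.

```python
def mosaic(image, rect, block=4, inside=True):
    """Replace pixels by their block average.

    rect   -- (x0, y0, x1, y1) face rectangle, x1 and y1 exclusive
    inside -- True: mosaic only the face area (as in FIG. 19(c));
              False: mosaic everything except the face area (as in FIG. 19(b))
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = rect
    out = [row[:] for row in image]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            ys = range(by, min(by + block, height))
            xs = range(bx, min(bx + block, width))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    in_rect = x0 <= x < x1 and y0 <= y < y1
                    if in_rect == inside:
                        out[y][x] = mean
    return out


picture = [[0, 2, 8, 12],
           [4, 6, 10, 14],
           [1, 1, 5, 7],
           [1, 1, 9, 11]]
face_only = mosaic(picture, (0, 0, 2, 2), block=2, inside=True)
background_only = mosaic(picture, (0, 0, 2, 2), block=2, inside=False)
```

Calling the function once per frame with the tracked face rectangle gives the behaviour described above, where the mosaic follows the person as they move.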
【０１５７】
また、入力画像の一部分を拡大、縮小する機能により、図１９（ｅ）（ｆ）のように、目部分の拡大や鼻部分の拡大といった効果を施すこともできる。
【０１５８】
従って、目鼻等の部品情報も用いた変形制御が可能であるため従来例の６．で述べたようなシステムとは異なる。
【０１５９】
処理手順は以下の通りである。
【０１６０】
画像メモリ６１０は、入力画像を保持し、それぞれの画像変換部に渡す。
【０１６１】
各画像変換部７０１，７０２，７０３は、登録図形パターン情報で記述された画像変換の種類、処理方法に応じて選択的に用いられる。
【０１６２】
その指示は、図形計算部５００から渡される形状、位置情報、パラメータ、をオブジェクト情報制御部６１１が受取り、画像メモリ６１０に保持されている画像を変換部７０１によって変換し、画像メモリ６１３に保持する。処理が繰り返される場合は、画像メモリ６１３から画像メモリ６１０に画像がコピーされ、処理が再度行われる。オブジェクト情報制御部６１１は、全ての合成、画像変換が終了すると、画像メモリ６１３の画像を外部に渡す。
【０１６３】
各画像変換部は、図２０のフローチャートで動作する。ここでは、部分拡大処理を行う部分拡大縮小処理部７０２の動作を例に説明する。
【０１６４】
まず、ステップ７１１では、画像変換の情報を読み込む。部分拡大処理部７０２では、目の領域の拡大率等を読み込む。
【０１６５】
ステップ７１２では、画像変換を施すための画像変換の位置情報を計算する。
【０１６６】
部分拡大縮小処理部７０２では、原画像の目領域と拡大後の目領域の大きさを求める。図２０の７２１のように拡大前の目領域の位置を求め、ステップ７１１で得られた拡大率を元に、図２０の７２２の拡大後の目領域の位置を求める。
【０１６７】
ステップ７１３では、画像変換を施すための対象画像をメモリから読み込み、ステップ７１４において、画像変換を施す。
【０１６８】
部分拡大縮小処理部７０２では、先の拡大を対象領域に対して行い、７２３のような画像を得る。そして、ステップ７１５で、その画像を出力する。
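Steps 711 to 715 for partial enlargement can be sketched in Python as follows. The sketch assumes the enlarged region stays centred on the original region and uses nearest-neighbour sampling; neither detail is specified in the text, and the function name `enlarge_region` is hypothetical.

```python
def enlarge_region(image, rect, scale):
    """Enlarge the part of image inside rect = (x0, y0, w, h) about its centre.

    Returns the transformed image and the enlarged rectangle.
    """
    x0, y0, w, h = rect                      # region before enlargement (721)
    cx, cy = x0 + w / 2, y0 + h / 2
    nw, nh = round(w * scale), round(h * scale)
    nx0, ny0 = round(cx - nw / 2), round(cy - nh / 2)   # region after (722)
    out = [row[:] for row in image]          # step 713: read the target image
    for dy in range(nh):                     # step 714: apply the conversion
        for dx in range(nw):
            x, y = nx0 + dx, ny0 + dy
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                # sample from the untouched input so pixels are not re-read
                out[y][x] = image[y0 + dy * h // nh][x0 + dx * w // nw]
    return out, (nx0, ny0, nw, nh)           # step 715: output


grid = [[10 * y + x for x in range(6)] for y in range(6)]
zoomed, zoomed_rect = enlarge_region(grid, (2, 2, 2, 2), 2)
```

Reading from the input image rather than from the output avoids smearing when the enlarged region overlaps the source region, which it always does when the centre is kept fixed.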
【０１６９】
もちろんこのような画像変形を複数回行ってもよいし、多重に操作してもよい。図形の合成と組み合わせて利用することでさらなる効果が期待できる。
【０１７０】
また、このような画像変換だけでなく、色調の補正や対象領域のエッジだけを抽出する等の別の処理でもよい。
【０１７１】
［変形例］
以下、上記実施例の変形例を幾つか説明する。
【０１７２】
（変形例１）
図２１のように、音声出力部７００を加え、画像の合成に合わせて音声、音楽等を出力してもよい。
【０１７３】
この場合、図形計算部５００で、合成が開始されたフレームに同期して、音声を出力させてもよい。また、合成が行われているときに、バックグラウンドミュージックを出力してもよい。音声情報は、登録図形パターン情報に付随して保持させればよい。
【０１７４】
（変形例２）
また、図２２のように、音声認識部８００を加え、音声による合成内容の選択を行っても良い。
【０１７５】
例えば、「サル」という声をシステムに対して入力すると音声認識部８００により、「サル」という言葉を認識し、その言葉に対応する変装内容を選択部５０３への指示で選択し、図１６のような変装が行われることが実現できる。
【０１７６】
もちろん、このために登録図形パターン情報にキーワードをつける拡張を施して良い。
【０１７７】
これにより、次々に合成情報につけられたキーワードを発声することで次々に様々な画像変形を楽しむことが出来る。
【０１７８】
（変形例３）
実施例１，２では、顔領域、顔部品の位置と大きさだけであったが、顔から得られる副次的な特徴をくわえ、さらにバリエーションを加えることができる。
【０１７９】
ここで、副次的な特徴とは、瞬き、顔の向き、瞳の位置、視線方向、口のあけ方、個人同定情報等をさす。
【０１８０】
例えば、顔の向きに応じて図形を変形させることにより、各図形オブジェクトにアフィン変換等の変形操作を加え、顔の向きに対応した図形出力が可能になる。
【０１８１】
また、顔の状態によって、表示、合成するオブジェクトの数を変更する等の拡張を行ってもよい。
【０１８２】
例えば、人間の目のまばたきを行うことをスイッチとして、目を閉じているフレームでは、別のオブジェクトが表示されることや、オブジェクトの属性（色、大きさ、位置、形状）を変更させることも可能である。このような拡張のためには、オブジェクトの記述法と登録図形パターン情報の構造を変更する必要がある。
【０１８３】
また、顔の特徴として、目鼻の位置だけであったが、口等を検出する部も加えれば、さらに表現能力を高めることができる。
【０１８４】
（変形例４）
現在は、入力画像自身には変更を加えてないが、画像にもモーフィングに代表されるような変形を加え、図形表示と組み合わせることで、表現力を高めることができる。
【０１８５】
例えば、ある動物の骨格に合わせて、顔の領域を変形させ、その後図形表示を行えば、さらにリアリティも向上させることができる。
【０１８６】
（変形例５）
また、入力画像の情報にあわせて各オブジェクトの属性を変更することも可能である。
【０１８７】
例えば、入力画像の色調にあわせて、色属性を変更する等がその例である。暗い画像が入力された場合には、明度や彩度を落として重ね合わせれば効果的である。
【０１８８】
（変形例６）
顔の検出は、実施例では一人で説明を行ったが、複数人を検出し、複数人の顔部品情報を同時に検出する方法に変更し、複数人を同時に仮装させることも可能である。
【０１８９】
【発明の効果】
本発明によれば、瞬時に様々な変形パターンを含んだ顔画像を得ることができ、アミューズメントやプレゼンテーション、画像通信、ビデオメール等の画像効果を向上させることが可能になる。
【０１９０】
また、通常とは異なる動画像を簡単に生成できるために、コミュニケーションのツールとしての効果をもたらすことが期待できる。
【図面の簡単な説明】
【図１】本発明の実施例１の画像処理装置のブロック図である。
【図２】合成適用例である。
【図３】画像入力部のブロック図である。
【図４】顔領域検出部のブロック図である。
【図５】顔部品検出部のブロック図である。
【図６】顔部品の検出の説明図である。
【図７】画像出力部のブロック図である。
【図８】登録パターン情報のスクリプト例である。
【図９】図形計算部のブロック図である。
【図１０】登録図形パターンの一例である。
【図１１】オブジェクトの構造体の一例である。
【図１２】オブジェクトの相対位置の指定法である。
【図１３】登録図形パターンの構造である。
【図１４】オブジェクトの相対位置の計算法の流れ図である。
【図１５】画像合成部のブロック図である。
【図１６】ビットマップ合成部の説明図である。
【図１７】ビットマップ合成のフローチャートである。
【図１８】画像変換部のブロック図である。
【図１９】画像変換適用例である。
【図２０】画像変換の説明図とフローチャートである。
【図２１】実施例２の画像処理装置のブロック図である。
【図２２】変更例の画像処理装置であり音声認識部を用いたブロック図である。
【符号の説明】
１００画像入力部
２００顔領域検出部
３００顔部品検出部
４００画像出力部
５００図形計算部
６００図形合成部

[0001]
[Field of the Invention]
The present invention relates to an image processing apparatus and method for synthesizing a figure or the like with an image.
[0002]
[Prior art]
There are various systems using face image processing.
[0003]
For example, there are advanced uses such as presence detection, personal authentication, and gaze and face orientation detection. In addition, the following systems focus on the face area and have modification of the image as their main purpose.
[0004]
1. Glasses try-on system
2. Video editing system
3. Hairstyle try-on system
4. Couple's child simulation system
5. Frame decoration in photo sticker machines (so-called "Purikura" (trademark))
6. Face deformation system proposed at SIGGRAPH 97
and so on.
[0005]
In system 1, feature points are specified manually. Products that process still images have been commercialized, and a known example is the spectacle wearing simulation device (Japanese Patent Laid-Open No. 6-139318). Patent applications have also been filed for automating feature point detection.
[0006]
System 2 processes moving images. Systems in which corresponding points and the like are specified manually are used at television stations and elsewhere for broadcasting. As for morphing, various morphing software packages for still images are also commercially available.
[0007]
System 3, as disclosed in an image processing apparatus (Japanese Patent Laid-Open No. 8-329278), detects the outline of the face, obtains the hair portion as a region, and then superimposes another hairstyle.
[0008]
System 4 is a device that morphs two persons, or a person and another image, and synthesizes, for example, an image of the child of the two persons. It has been commercialized. Detection of the face parts is simplified by having the user align the face with a position displayed on a half mirror.
[0009]
System 5 does not identify the position of the face; its main purpose is to decorate the border of the image. The user must adjust the position of their own face so that it fits a specific area within the frame.
[0010]
System 6 was proposed in the literature [Darrell, T. Gordon, F., Woodfill W., Baker, H.: "A MagicMorphin Mirror", SIGGRAPH '97 Visual Proceedings, ACM Press, 1997]. It combines color, distance, and pattern information to find the position of a person's face, and deforms the image at that position to display an effect in which the shape of the face appears bent.
[0011]
In terms of application and features, systems 1 and 3 are for trying out fashion items such as hairstyles and glasses. Systems 2, 4, and 6 locally deform an image to enhance its visual effect. System 5 does not require image recognition and performs no automatic detection.
[0012]
[Problems to be solved by the invention]
The present invention does not correspond to any of these devices, each of which is functionally insufficient to achieve the following objects of the present invention.
[0013]
That is, the present invention mainly targets moving images. It follows a face area whose position and size vary, and uses a function that automatically detects the face area and face parts to display vividly colored figures on various parts of the face area, matched to the position and size of the face area and of the face parts. Its object is thereby to instantly disguise or decorate a person in the image.
[0014]
Another object is that, because the face is detected automatically, the user need not do anything, and even if the user moves, a composite image with the same effect is obtained at the new screen position.
[0015]
As a result, the present invention provides an image processing apparatus and method that are highly effective for amusement and can automatically and instantly obtain disguised images of various persons with different impressions.
[0016]
[Means for Solving the Problems]
The present invention is an image processing apparatus comprising: image input means for inputting an image; face area detection means for detecting the size and position of the face area of a subject in the input image; face part position detection means for detecting the position of a face part, namely an eye, the nose, or the mouth, from the detected face area; graphic calculation means for determining the content and position of a figure or image to be combined with the input image; image composition means for combining the determined figure or image with the input image; and image output means for outputting the combined image. The graphic calculation means has information holding means and overlay position calculation means. The information holding means holds registered graphic pattern information, which consists of a set of a plurality of figures or images and which can be set based on the position of the face area or the face part, the relative positional relationship between the figure or image to be combined and the face part, and the size of the face area. The overlay position calculation means determines the position and size of the figure or image to be combined, using the position and size of the face area, or the position of the face part, together with the held registered graphic pattern information. The overlay position calculation means uses a vector, included in the registered graphic pattern information, that represents the relative positional relationship, and determines the position of the figure or image to be combined by applying that vector with the position of the detected face part as the center.
[0017]
The present invention is also an image processing apparatus comprising: image input means for inputting an image; face area detection means for detecting the face area of a subject in the input image; face part position detection means for detecting the position of a face part, namely an eye, the nose, or the mouth, from the detected face area; graphic calculation means for determining the content and position of a conversion to be applied to the input image; image composition means for converting the determined position in the input image into the determined content; and image output means for outputting the converted image. The graphic calculation means has information holding means and image conversion means. The information holding means holds registered graphic pattern information, which consists of a set of shapes of regions to be converted and which can be set based on the position of the face area or the face part, the relative positional relationship between the shape of the region to be converted and the face part, and the size of the face area. The image conversion means determines the position and size of the region to be converted, using the position and size of the face area, or the position of the face part, together with the held registered graphic pattern information. The image conversion means uses a vector, included in the registered graphic pattern information, that represents the relative positional relationship, and determines the position of the region to be converted by applying that vector with the position of the detected face part as the center.
[0038]
[Embodiments of the Invention]
Embodiments of the present invention will be described below.
[0039]
[Embodiment 1]
An image processing apparatus according to a first embodiment of the present invention will be described.
[0040]
The image processing apparatus detects the size and position of the face area and face parts in the input image, as shown in FIG. 2A. Then, as shown in FIGS. 2B and 2C, it changes the position and size of a predetermined figure relative to the detected face part positions and superimposes the figure on the input image for display and processing. As a result, the person being recognized appears in the displayed content as if instantly disguised.
[0041]
To generate images that achieve this effect, the image processing apparatus requires, as shown in FIG. 1, an image input unit 100, a face area detection unit 200, a face part detection unit 300, an image output unit 400, a graphic calculation unit 500, and a figure composition unit 600. Each component is described below.
[0042]
Note that the image processing apparatus can be realized by, for example, a personal computer equipped with a camera, with a program that performs the function of each component installed and stored on that computer. The program may also be recorded on a recording medium such as an FD, CD-ROM, MO, or DVD and transferred to another personal computer. The apparatus may also be used in a photo sticker machine or the like.
[0043]
(Image input unit 100)
The image input unit 100 obtains a still image or a moving image in which a person appears. As shown in FIG. 3, the basic configuration consists of a camera 101 and a frame grabber 102, and is used to photograph a human face. The input image may be either color or monochrome.
[0044]
As another configuration of the image input unit 100, a TV tuner, a VCR, a DVD, or the like may be used instead of the camera.
[0045]
Furthermore, as another configuration of the image input unit 100, digital video data recorded in a file or video data obtained via a network may be input.
[0046]
Of course, the input may be either moving or still images. The input image is stored in the memory 103 on the frame grabber 102 and accessed by each processing unit.
[0047]
(Face area detection unit 200)
The face area detection unit 200 detects a face area of a person from the image obtained from the image input unit 100.
[0048]
As a method for detecting a face area, a template generated from a face image collected in advance is applied to each part of the image, and a part having a high correlation with the template is set as a face area.
[0049]
As shown in FIG. 4, the face area detection unit 200 includes an image conversion unit 201, a template calculation unit 202, and an area determination unit 203.
[0050]
When the input image is a color image, the image conversion unit 201 converts it into a monochrome image. It then reduces the input image in several steps to generate images of several sizes. The reduction ratios are fixed in advance as several steps, by estimating the expected size of the face area from information such as the camera's lens system and angle of view and the size of the image.
[0051]
The template calculation unit 202 raster-scans each reduced image with the prepared template and obtains the similarity between each partial image and the template. As a method of calculating the similarity, a subspace constructed from learning patterns may be used as the template, and the similarity may be obtained by the subspace method [see Erkki Oja, translated by Hidemitsu Ogawa and Makoto Sato, "Pattern Recognition and Subspace Method", Sangyo Tosho, 1986]. In this case, templates of pseudo-face images, which resemble faces but are not faces, can also be prepared and their similarities calculated at the same time, so that erroneous regions are not detected.
[0052]
The area determination unit 203 judges from the similarity of each partial image in each reduced image and outputs information on the area determined to be most face-like. Specifically, this information is the reduction ratio of the reduced image and the position at which the area was detected. Taking the template size and the reduction ratio into account, the unit outputs the size and position of the rectangle representing the face area, scaled to the size of the original input image.
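The multi-scale scan performed by units 201 to 203 can be sketched in Python as follows. This is a simplified illustration, not the method of the embodiment: it substitutes normalized cross-correlation for the subspace-method similarity, reduces images by integer factors with nearest-neighbour sampling, and uses hypothetical function names (`downscale`, `ncc`, `detect_face`).

```python
import math


def downscale(img, factor):
    """Reduce a 2-D list by an integer factor (nearest-neighbour, an assumption)."""
    return [row[::factor] for row in img[::factor]]


def ncc(window, template):
    """Normalized cross-correlation between two equal-sized 2-D lists."""
    a = [v for row in window for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def detect_face(img, template, factors=(1, 2, 4)):
    """Raster-scan each reduced image and return the best match as
    (score, x, y, width, height) in input-image coordinates."""
    th, tw = len(template), len(template[0])
    best = None
    for f in factors:                                   # unit 201: several sizes
        small = downscale(img, f)
        for y in range(len(small) - th + 1):            # unit 202: raster scan
            for x in range(len(small[0]) - tw + 1):
                window = [row[x:x + tw] for row in small[y:y + th]]
                score = ncc(window, template)
                if best is None or score > best[0]:
                    # unit 203: map position and size back to the input scale
                    best = (score, x * f, y * f, tw * f, th * f)
    return best
```

Scanning a fixed-size template over several reduced images is what lets a single template find faces of different sizes; the reduction factor of the winning image gives the size of the output rectangle.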
[0053]
(Face part detection unit 300)
The face part detection unit 300 detects face parts such as eyes, nose, and mouth in the area determined as the face area.
[0054]
As shown in FIG. 5, the facial part detection unit 300 includes an image conversion unit 301, a candidate point extraction unit 302, a candidate point verification unit 303, and a candidate point determination unit 304.
[0055]
The image conversion unit 301 takes the area determined by the face area detection unit 200 as the region of interest in the input image, and determines the pixels to be processed so that processing is applied to the parts of that region with low gray values.
[0056]
The candidate point extraction unit 302 detects candidate points of circular regions using the separability filter described in the literature [Kazuhiro Fukui, Osamu Yamaguchi: "Face feature point extraction by combination of shape extraction and pattern matching", IEICE Transactions (D), vol. J80-D-II, No. 8, pp. 2170-2177 (1997)]. The candidates are represented as a plurality of points.
[0057]
The candidate point verification unit 303 then judges, by a pattern recognition method, the similarity between each candidate point and the template of each face part.
[0058]
The candidate point determination unit 304 narrows down candidate points using a combination of the positional relationships of candidate points, and outputs the positions of the candidate points.
[0059]
This will be described with reference to FIG. 6.
[0060]
351 in FIG. 6 represents an input image at time T. The rectangular area obtained when the face area detection unit 200 detects the face area is shown at 352 in FIG. 6.
[0061]
Within this rectangular area, the face part detection unit 300 detects candidate points of circular regions (353 in FIG. 6), selects four candidates that satisfy conditions such as positional relationship and pattern similarity, and thereby obtains the eye and nose feature points (354 in FIG. 6).
[0062]
Further, in this embodiment, once the pupil positions and nostril positions have been detected for a given input image, a tracking search is performed in the next frame: only the vicinity of the detected positions is used as the part search range, so that parts are detected at high speed.
[0063]
355 in FIG. 6 represents an input image at time T, in which the eye and nose feature points are assumed to have been detected by the method described above. At time T + 1, face detection and eye and nose detection are not repeated over the entire screen. Instead, as shown at 356 in FIG. 6, only the rectangular neighborhoods of the detected points are processed: the candidate point extraction unit 302 detects candidate points of circular regions, the candidate point verification unit 303 judges their similarity to each face part template, the candidates are narrowed down, and their positions are output.
[0064]
At time T + 1 the image may have changed, for example with the face turned in a different direction as shown at 357 in FIG. 6. Because a person moves little in so short a time, the feature points can still be extracted even with the search range narrowed as described above.
[0065]
If the person makes a large movement and, at time T + a, the parts fall outside the search range as shown at 358 in FIG. 6, the face area detection unit 200 detects the face area again and the face parts are searched for again.
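The tracking search with fallback to full detection can be sketched in Python as follows. The detectors are passed in as functions because their internals are described elsewhere; the names `search_window`, `track_parts`, `local_detect`, and `full_detect`, and the `margin` parameter, are hypothetical.

```python
def search_window(point, margin, width, height):
    """Clip a square neighbourhood around a previously detected point.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x, y = point
    return (max(0, x - margin), max(0, y - margin),
            min(width, x + margin + 1), min(height, y + margin + 1))


def track_parts(frame, prev_points, local_detect, full_detect, margin=8):
    """Search near the previous feature points first; if any part is lost,
    fall back to detection over the whole frame.

    local_detect(frame, window) returns a point or None (assumed interface).
    full_detect(frame) returns the list of feature points (assumed interface).
    """
    height, width = len(frame), len(frame[0])
    if prev_points:
        found = []
        for point in prev_points:
            window = search_window(point, margin, width, height)
            hit = local_detect(frame, window)
            if hit is None:          # part left its search range (358 in FIG. 6)
                break
            found.append(hit)
        else:
            return found, "tracked"  # every part found near its old position
    return full_detect(frame), "redetected"
```

The returned label is only for illustration; a caller would feed the returned points back in as `prev_points` for the next frame.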
[0066]
By this tracking process, the time for obtaining the face part position is shortened, and a smooth image composition and conversion process can be performed.
[0067]
In this embodiment, the center coordinates of the pupils and the center coordinates of the nostrils are output.
[0068]
Of course, the positions of the mouth and other face parts may also be detected and output.
[0069]
(Image output unit 400)
The image output unit 400 sends the input image, with the composite content added, to an output device.
[0070]
As shown in FIG. 7, when an image is output to the screen, the composition result held in the image memory of the image composition unit 600 is transferred to the video RAM 410. It is then displayed on the display 411 via D/A conversion.
[0071]
In the case of output to a file, image data is transferred to a file pointer or a buffer.
[0072]
The file is stored in the magnetic disk 430, the memory 440, or the like.
[0073]
The image may be stored and output through an encoder 420 that performs image compression, image format conversion, and the like.
[0074]
Further, the data may be sent to another device via the network 440.
[0075]
When communicating, the image of the combined result may be transmitted as is. Alternatively, the input image and the image of the combined part may be sent separately, or only the information that the graphic calculation unit 500 needs for combining may be sent, with the combining performed on the receiving side.
[0076]
When the output medium is an image or digital video, the information of each pixel is changed in the memory of each image according to the drawing content, and the information is stored in a recording medium (file).
[0077]
(Graphic calculation unit 500)
The graphic calculation unit 500 holds and selects the predetermined figures or images to be superimposed on the input image, and calculates their positions. Information on a predetermined figure or image for superimposition, as shown in FIGS. 2B and 2C, is referred to as “registered graphic pattern information”.
[0078]
1. Contents of registered graphic pattern information
First, the registered graphic pattern information will be described.
[0079]
The registered graphic pattern information is made up of units called objects. Each object has a graphic attribute naming what is displayed (a line, polygon, circle, ellipse, or arc, or a graphic image made of a set of pixels, called a bitmap), attributes such as the color and texture of the figure, and position information indicating the relative positional relationship with the face area.
[0080]
As an example, each object is represented by a structure as shown in FIG. 11.
[0081]
Here, each argument expresses either a parameter describing the figure or relation information describing a relative positional relationship. A set of objects whose figures are arranged to convey a particular meaning is called registered graphic pattern information.
[0082]
In the example of FIG. 2B, the registered graphic pattern information representing “cat” is composed of nine objects as shown in FIG. 10. Objects 511, 512, and 513 are polygon objects, and objects 514, 515, 516, 517, 518, and 519 are line objects. By placing these figures at positions relative to the face, the person can be “disguised” as a cat.
[0083]
Each object has a different structure for expressing it depending on the graphic attribute of the object. For example, in the case of a polygon object, the number of polygon corners and variables representing the position information of each point are defined. In the case of a line object, position information of two points is defined. In addition, variables representing RGB values for representing color attributes, variables for representing textures, and the like are also defined.
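As a rough illustration of such per-attribute structures, the following sketch models a polygon object and a line object. All class and field names are hypothetical and are not taken from the structure shown in the drawings; positions are kept symbolic as a reference label plus weighted vector labels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One weighted term of a relative position, e.g. ("FACE_WIDTH", 0.1).
Term = Tuple[str, float]


@dataclass
class RelativePoint:
    reference: str      # label of the reference feature point, e.g. "RIGHT_EYE"
    terms: List[Term]   # offsets expressed with relative reference vectors


@dataclass
class PolygonObject:
    points: List[RelativePoint]     # position information of each corner
    rgb: Tuple[int, int, int]       # color attribute
    texture: Optional[str] = None   # optional texture attribute
    filled: bool = True

    @property
    def corner_count(self) -> int:
        """Number of polygon corners."""
        return len(self.points)


@dataclass
class LineObject:
    start: RelativePoint            # a line needs position information for two points
    end: RelativePoint
    rgb: Tuple[int, int, int]
    style: str = "solid"            # e.g. "solid" or "broken"
```

A registered graphic pattern would then simply be an ordered list of such objects.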
[0084]
FIG. 12 is an explanatory diagram of a designation method for designating the relative position of an object.
[0085]
When a face as shown in FIG. 12A is input to the camera, the points shown in FIG. 12B are detected by the functions of the face area detection unit 200 and the face part detection unit 300.
[0086]
In FIG. 12C, those detected feature points that can be used as a reference are labeled, and the coordinates of each feature point can be obtained by specifying its name.
[0087]
FIG. 12D shows some vector information for relatively representing the position using each feature point, and the position information can be designated by each label.
[0088]
A method of actually drawing a polygon (triangle) as shown in FIG. 12F with respect to an input image will be described.
[0089]
First, the type of figure (here, a polygon) is designated, and then its position information is designated.
[0090]
As shown in FIG. 12E, the points p1, p2, and p3 are expressed by a relative positional relationship from the eye position b1.
[0091]
For this purpose, the vectors a1, a2, and a3 are designated as follows.
[0092]
First, b1 (eyes) is selected as a reference, and a vector a1 expressing a relative position is represented.
[0093]
For this purpose, the face area is used as a reference, for example. Using the vectors
FACE_WIDTH, FACE_HEIGHT
a1 is expressed as
a1 = t1 * FACE_WIDTH + t2 * FACE_HEIGHT
Therefore,
p1 = b1 + a1 = b1 + t1 * FACE_WIDTH + t2 * FACE_HEIGHT
(t1 and t2 are real values).
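The formula for p1 can be checked with a small numeric sketch. It assumes, purely for illustration, that FACE_WIDTH is a horizontal vector and FACE_HEIGHT a vertical vector whose lengths equal the detected face size; the function name and all sample values are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float]


def relative_point(b: Vec, face_width: Vec, face_height: Vec,
                   t1: float, t2: float) -> Vec:
    """p = b + t1 * FACE_WIDTH + t2 * FACE_HEIGHT, computed per component."""
    return (
        b[0] + t1 * face_width[0] + t2 * face_height[0],
        b[1] + t1 * face_width[1] + t2 * face_height[1],
    )


# Eye at (100, 80); face area 60 pixels wide and 80 pixels high (assumed values).
p1 = relative_point((100.0, 80.0), (60.0, 0.0), (0.0, 80.0), 0.5, -0.25)

# The same coefficients applied to a face twice as large double the offset.
p1_large = relative_point((100.0, 80.0), (120.0, 0.0), (0.0, 160.0), 0.5, -0.25)
```

Because t1 and t2 multiply vectors derived from the face area, the same coefficients give an offset that scales with the face.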
[0094]
In order to draw a white triangle as shown in FIG. 12 (f), the following description is possible.
[0095]
FG_FILL_POLYGON 3
RIGHT_EYE ADD FACE_WIDTH t1 ADD FACE_HEIGHT t2
RIGHT_EYE ADD FACE_WIDTH -t3 ADD FACE_HEIGHT t4
RIGHT_EYE ADD FACE_WIDTH t5 ADD FACE_HEIGHT -t6
COLOR 255 255 255
In the case of the object 511, first, in order to designate the display position, the feature point of the facial part that is the reference is designated.
[0096]
There are multiple candidate face feature points such as pupil position, nostril position, face area center, and midpoint between each part. In the case of 511, the midpoint of the two nostrils is selected as the reference feature point. The relative position from the reference feature point is set.
[0097]
The setting of the relative position is for superimposing figures with matching sizes even when the size of the face changes, and is specified using a relative reference amount.
[0098]
As the relative reference amount, the size of the detected face area, the distance between two eyes, and the like can be selected.
[0099]
For 511, which displays a triangle, the size of the face area is selected as the relative reference amount, and the three points are designated at x and y coordinate positions expressed as multiples of the face area size. As the color attribute of 511, fill is selected and an RGB value is designated.
[0100]
Similarly, in the case of 512 and 513, each pupil position is selected as a reference feature point, and each point is designated using a relative reference amount above it.
[0101]
Since 514 to 519 are line objects, two points are designated. Also in this case, it designates using a reference feature point and a relative reference amount. In the case of a line, it is possible to specify a solid line, a broken line, etc., and a color attribute.
[0102]
In addition, a variable representing the center of the circle, a variable representing the size of the circle, and the like are defined for the circle object, and the structure is similarly defined for the other graphic objects.
[0103]
For a bitmap object, a position for displaying or an enlargement ratio for displaying is specified.
[0104]
Note that these objects have different rendering results depending on the rendering order, so that the rendering order is defined and applied to the image in the object description order.
[0105]
2. Contents of graphic calculation unit 500
Next, the behavior of the graphic calculation unit 500 will be described.
[0106]
As shown in FIG. 9, the graphic calculation unit 500 includes a graphic database 501 and an overlay position calculation unit 502.
[0107]
The graphic database 501 stores a plurality of registered graphic pattern information, and these registered graphic patterns are managed as files, and are newly added and deleted freely. Information such as the number and size of pattern information is managed in the database.
[0108]
As shown in FIG. 13, the database holds a plurality of registered graphic patterns, and each registered graphic pattern is configured as a set of objects.
[0109]
This database is composed of a magnetic disk or a memory unit. Then, a recording medium such as an FD, CD-ROM, MO, and DVD that stores a registered graphic pattern database may be prepared separately and stored in an image processing apparatus that is a personal computer by installation or the like. In this case, only the recording medium of the registered graphic pattern database may be manufactured and sold.
[0110]
The pattern editing unit 504 enables input and output of registered graphic patterns. For example, it may provide exchangeable media such as a portable magnetic disk or a portable memory unit, an interface for exchanging data via a network, or an editor function.
[0111]
These registered graphic pattern information can be described by a script language.
[0112]
For example, registered graphic pattern information representing “cat” is expressed as a script written as text, as shown in FIG. 8. The graphic information to be synthesized can therefore be modified by editing the text file, and the file can be distributed over a network.
[0113]
The overlapping position calculation unit 502 reads necessary reference feature point information and relative reference amount information from the face area detection unit 200 and the face part detection unit 300, and calculates each position specified by the argument.
[0114]
This will be described with reference to the flowchart of FIG. 14.
[0115]
First, one piece of registered graphic pattern information is selected from the graphic database 501 and read into the memory (step 50201). The selection is made by the selection unit 503.
[0116]
The selection unit 503 sends the ID (for example, an identification number) of the registered graphic pattern selected by the user to the overlay position calculation unit 502, and the pattern with that ID is read.
[0117]
Here, the selection unit 503 can be provided with a random number generation mechanism and a timer to introduce a production effect such as performing a pattern change sequence at a constant time interval or switching the pattern randomly.
[0118]
For the registered graphic pattern information loaded in the memory, the following processing is performed for a plurality of objects.
[0119]
First, one object is read in the order described (step 50202).
[0120]
An argument defined in the object is read (step 50203).
[0121]
There are arguments that indicate position information and parameters (such as the size of the radius of a circle), and processing is performed in accordance with the argument information specified for each.
[0122]
The case of position information is described here. First, the variables representing the position information are initialized. Assuming that the point (ax, ay) represents the position information, 0 is assigned to both ax and ay (step 50204).
[0123]
Next, the position information of the designated face part or the like, or the designated vector shown in FIG. 12 obtained from the face parts, is read (step 50205).
[0124]
Processing for applying a coefficient to the position information (multiplying by a constant) is performed (step 50206).
[0125]
The result is then added to ax and ay, the variables representing the position information (step 50207).
[0126]
If further description remains in the argument, the processing is repeated from step 50205; otherwise (ax, ay) is taken as the position information (step 50208).
[0127]
If there are more arguments, the process is repeated (step 50209), and the process ends when the values of all the object arguments are determined (step 50210).
[0128]
By this procedure, the position of the image to be synthesized is determined in accordance with the detected face position information.
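Steps 50204 to 50208 can be sketched as a small evaluator for one position argument. The token layout (a label, an optional coefficient, then `ADD` before the next term) is inferred from the script lines quoted earlier and is an assumption; the function name, the symbol table, and the use of numeric literals in place of t1, t2 are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float]


def evaluate_position(argument: str, symbols: Dict[str, Vec]) -> Vec:
    """Evaluate e.g. 'RIGHT_EYE ADD FACE_WIDTH 0.5 ADD FACE_HEIGHT -0.25'."""
    tokens = argument.split()
    ax, ay = 0.0, 0.0                      # step 50204: initialize (ax, ay)
    i = 0
    while i < len(tokens):
        vx, vy = symbols[tokens[i]]        # step 50205: read feature point or vector
        i += 1
        coeff = 1.0                        # a bare label counts once
        if i < len(tokens) and tokens[i] != "ADD":
            coeff = float(tokens[i])       # step 50206: apply the coefficient
            i += 1
        ax += coeff * vx                   # step 50207: accumulate
        ay += coeff * vy
        if i < len(tokens):                # step 50208: more description?
            if tokens[i] != "ADD" or i + 1 == len(tokens):
                raise ValueError(f"malformed argument near token {i}: {argument!r}")
            i += 1
    return (ax, ay)


# Assumed detection results for one frame.
symbols = {
    "RIGHT_EYE": (100.0, 80.0),
    "FACE_WIDTH": (60.0, 0.0),
    "FACE_HEIGHT": (0.0, 80.0),
}
corner = evaluate_position("RIGHT_EYE ADD FACE_WIDTH 0.5 ADD FACE_HEIGHT -0.25", symbols)
```

Re-running the evaluator each frame with a fresh symbol table is what lets the figure follow the face as it moves or changes size.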
[0129]
(Image composition unit 600)
The image composition unit 600 synthesizes the designated registered graphic pattern information at the position calculated by the graphic calculation unit 500 with each attribute or processes the image.
[0130]
A configuration example of the image composition unit 600 is shown in FIG. 15.
[0131]
The image memory 610 holds an input image and passes it to each combining unit 612.
[0132]
Each combining unit 612 is selectively used according to the shape type and processing method described in the registered graphic pattern information. For the instruction, the object information control unit 611 receives the shape, position information, parameters, and image information passed from the graphic calculation unit 500, and repeats the process for the number of designated objects.
[0133]
The image held in the image memory 610 is converted by the synthesis unit 612 and held in the image memory 613. When the process is repeated, the image is copied from the image memory 613 to the image memory 610, and the process is repeated.
[0134]
The object information control unit 611 passes the image in the image memory 613 to the outside when all the synthesis is completed.
[0135]
Up to now, it has been described that a figure is synthesized with an image. Next, overlapping of bitmap images will be described.
[0136]
Bitmaps are superimposed as shown in FIG. 16. The flowchart is shown in FIG. 17.
[0137]
Reference numeral 650 in FIG. 16 denotes an input image to be processed, which is displayed with a circle on the detected nose for easy understanding.
[0138]
The aim is to obtain the composite image 651 in FIG. 16 by superimposing the bitmap image (a monkey) represented by 652 in FIG. 16 at the position calculated from the detected positions.
[0139]
In step 661 in FIG. 17, the bitmap image to be combined is read, together with the exclusion color used at the time of embedding (the color marking regions that are not to be combined; the white region of 652 in FIG. 16).
[0140]
In step 662, the position where the bitmap image is synthesized is calculated from the position designation as described above as indicated by 653 in FIG.
[0141]
In step 663, the input image is read and synthesis of the bitmap image begins. Because the bitmap may be enlarged or reduced at the combined position, the corresponding pixel position in the target region is obtained (step 664), and the processing branches on the pixel found there (step 665). At the position 655 in FIG. 16 that corresponds to the position 654 in FIG. 16, if the bitmap pixel has the exclusion color (656 in FIG. 16), the pixel value of the input image is output as is (step 666); otherwise (657 in FIG. 16), the pixel value of the bitmap image is output (step 667).
[0142]
It is determined whether processing has been performed for all target regions (step 668), and if completed, a composite image as shown at 651 in FIG. 16 is output.
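The bitmap superimposition of steps 661 to 668 can be sketched as below. Images are plain nested lists of RGB tuples, and nearest-neighbor sampling is used for enlargement and reduction; both choices are assumptions made for illustration, as is the function name, since the text does not specify the interpolation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def composite_bitmap(base: Image, bitmap: Image, left: int, top: int,
                     out_w: int, out_h: int, exclusion: Pixel) -> Image:
    """Paste `bitmap`, scaled to out_w x out_h, at (left, top) of a copy of `base`."""
    result = [row[:] for row in base]          # the input image stays untouched
    src_h, src_w = len(bitmap), len(bitmap[0])
    for dy in range(out_h):
        y = top + dy
        if not 0 <= y < len(result):
            continue                           # target row lies outside the image
        sy = dy * src_h // out_h               # step 664: matching bitmap row
        for dx in range(out_w):
            x = left + dx
            if not 0 <= x < len(result[0]):
                continue
            sx = dx * src_w // out_w           # step 664: matching bitmap column
            pixel = bitmap[sy][sx]
            if pixel != exclusion:             # step 665: branch on exclusion color
                result[y][x] = pixel           # step 667: output the bitmap pixel
            # step 666: otherwise the input pixel is kept as is
    return result
```

Calling the function repeatedly on its own output combines several bitmaps with one input image.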
[0143]
Of course, a plurality of bitmaps may be combined with one input image.
[0144]
Note that the combining unit 612 may instead be configured with a single image memory; the configuration is not limited to the one described.
[0145]
Further, when the output medium is a display or the like and displays in real time, a configuration in which image conversion is directly applied to the VRAM may be used.
[0146]
When the output medium is an image or digital video, the information of each pixel may be changed in the memory of each image according to the drawing content and stored in the recording medium.
[0147]
[Example 2]
An image processing apparatus according to the second embodiment will be described.
[0148]
In the image processing apparatus according to the second embodiment, an image conversion unit 700 is added to the image composition unit 600 described in the first embodiment.
[0149]
A configuration example in which the function of the image conversion unit 700 is added to the image composition unit 600 is shown in FIG. 18.
[0150]
In addition to the functions described for the image composition unit 600, units having image conversion functions are added: for example, a mosaic processing unit 701 that applies mosaic processing to a region of the image, a partial enlargement/reduction processing unit 702 that enlarges or reduces a partial region of the image, and other image conversion units 703.
[0151]
In the embodiment using the image composition unit 600, only composition by superimposing figures and images is performed, but it is possible to obtain images having various impressions by adding image conversion to the image itself.
[0152]
FIG. 19 shows an example of image conversion.
[0153]
FIG. 19A shows an input image to be processed, which is displayed with a circle on the detected eyes and nose for easy understanding.
[0154]
FIG. 19B shows the image obtained by performing mosaic processing on a place other than the vicinity of the face area using information on the detected face area and eye / nose position.
[0155]
Further, FIG. 19C shows a result of performing mosaic processing only on the vicinity of the face area, and provides processing for performing image conversion on such an image itself.
[0156]
Of course, since the face area is automatically tracked, even if the person moves as shown in FIGS. 19C and 19D, mosaic processing is performed at the new location, so an image in which the face area is always mosaicked can be obtained.
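A minimal sketch of mosaic processing restricted to a detected region follows. Block averaging is one common way to produce a mosaic and is assumed here, since the text does not specify the method; the function name, the box convention (right and bottom exclusive), and the block size are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def mosaic_region(image: Image, box: Box, block: int) -> Image:
    """Return a copy of `image` with `box` replaced by block-averaged colors."""
    if block < 1:
        raise ValueError("block size must be at least 1")
    left, top = max(0, box[0]), max(0, box[1])
    right, bottom = min(len(image[0]), box[2]), min(len(image), box[3])
    result = [row[:] for row in image]
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            count = len(ys) * len(xs)
            # Average each RGB channel over the block.
            avg = tuple(
                sum(image[y][x][c] for y in ys for x in xs) // count
                for c in range(3)
            )
            for y in ys:
                for x in xs:
                    result[y][x] = avg
    return result
```

Passing the face area box from the current frame on every call keeps the mosaic attached to the face; mosaicking everything except the face would instead apply the same averaging to the whole image and copy the original face pixels back.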
[0157]
Further, the function of enlarging and reducing a part of the input image can provide effects such as enlargement of the eye part and enlargement of the nose part as shown in FIGS.
[0158]
Since deformation can thus be controlled using part information such as the eyes and nose, this differs from the system described in conventional example 6.
[0159]
The processing procedure is as follows.
[0160]
The image memory 610 holds an input image and passes it to each image conversion unit.
[0161]
Each of the image conversion units 701, 702, and 703 is selectively used according to the type and processing method of the image conversion described by the registered graphic pattern information.
[0162]
For the instruction, the object information control unit 611 receives the shape, position information, and parameters passed from the graphic calculation unit 500, has the conversion unit 701 convert the image stored in the image memory 610, and stores the result in the image memory 613. When the process is repeated, the image is copied from the image memory 613 to the image memory 610, and the process is performed again. The object information control unit 611 passes the image in the image memory 613 to the outside when all the synthesis and image conversion are completed.
[0163]
Each image conversion unit operates according to the flowchart of FIG. 20. Here, the operation of the partial enlargement / reduction processing unit 702 when it performs partial enlargement will be described as an example.
[0164]
First, in step 711, image conversion information is read. The partial enlargement processing unit 702 reads the enlargement ratio of the eye region and the like.
[0165]
In step 712, position information for image conversion is calculated.
[0166]
The partial enlargement / reduction processing unit 702 obtains the sizes of the eye area of the original image and the enlarged eye area. The position of the eye area before enlargement is obtained as 721 in FIG. 20, and the position of the eye area after enlargement 722 in FIG. 20 is obtained based on the enlargement ratio obtained in step 711.
[0167]
In step 713, a target image for image conversion is read from the memory, and in step 714, image conversion is performed.
[0168]
In the partial enlargement / reduction processing unit 702, the enlargement described above is performed on the target area, and an image like 723 is obtained. In step 715, the image is output.
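The partial enlargement of steps 711 to 715 can be sketched as follows. The sketch assumes that the enlarged area shares its center with the original eye area and that pixels are resampled by nearest neighbor; neither detail is stated in the text, and the function name and box convention (right and bottom exclusive) are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def enlarge_region(image: Image, box: Box, ratio: float) -> Image:
    """Draw the content of `box`, magnified by `ratio`, over a copy of `image`."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w, half_h = (right - left) * ratio / 2, (bottom - top) * ratio / 2
    # Step 712: area after enlargement, clipped to the image.
    t_left, t_top = max(0, round(cx - half_w)), max(0, round(cy - half_h))
    t_right = min(len(image[0]), round(cx + half_w))
    t_bottom = min(len(image), round(cy + half_h))
    result = [row[:] for row in image]
    for y in range(t_top, t_bottom):
        # Map the center of each target pixel back into the source area.
        sy = min(max(math.floor(cy + (y + 0.5 - cy) / ratio), top), bottom - 1)
        for x in range(t_left, t_right):
            sx = min(max(math.floor(cx + (x + 0.5 - cx) / ratio), left), right - 1)
            result[y][x] = image[sy][sx]       # step 714: image conversion
    return result
```

With a ratio below 1 the same mapping shrinks the area, although the ring of original pixels left around it would then need separate handling.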
[0169]
Of course, such image deformation may be performed a plurality of times, or multiple operations may be performed. Further effects can be expected by using it in combination with figure composition.
[0170]
In addition to such image conversion, other processes such as color tone correction and extraction of only the edge of the target area may be used.
[0171]
[Modifications]
Hereinafter, some modifications of the above embodiment will be described.
[0172]
(Modification 1)
As shown in FIG. 21, an audio output unit 700 may be added to output audio, music, etc. in accordance with image synthesis.
[0173]
In this case, the graphic calculation unit 500 may output the sound in synchronization with the frame where the synthesis is started. In addition, background music may be output when synthesis is performed. The audio information may be held accompanying the registered graphic pattern information.
[0174]
(Modification 2)
In addition, as shown in FIG. 22, a speech recognition unit 800 may be added to select synthesized content by speech.
[0175]
For example, when the voice input “monkey” is given to the system, the speech recognition unit 800 recognizes the word “monkey” and instructs the selection unit 503 to select the disguise content corresponding to that word, so that a disguise such as the one shown in FIG. 16 can be realized.
[0176]
Of course, for this purpose, an extension for adding a keyword to the registered graphic pattern information may be performed.
[0177]
Thus, various image deformations can be enjoyed one after another by uttering the keywords attached to the composite information one after another.
[0178]
(Modification 3)
In the first and second embodiments, only the face area and the position and size of the face parts were used. However, the secondary features obtained from the face can be added and further variations can be added.
[0179]
Here, secondary features include blinking, face orientation, pupil position, line-of-sight direction, mouth opening, personal identification information, and the like.
[0180]
For example, applying a deformation operation such as an affine transformation to each figure object according to the orientation of the face makes it possible to output figures that follow the face orientation.
[0181]
As a further extension, the number of objects to be displayed and synthesized may be changed depending on the state of the face.
[0182]
For example, a person's blink can be used as a switch to display another object in frames where the eyes are closed, or to change the attributes (color, size, position, shape) of an object. Such an extension requires changing the description method of the objects and the structure of the registered graphic pattern information.
[0183]
Further, although the face feature is only the position of the eyes and nose, the expression ability can be further enhanced by adding a part for detecting the mouth and the like.
[0184]
(Modification 4)
At present, the input image itself is not changed, but it is possible to enhance the expressive power by adding a deformation represented by morphing to the image and combining it with a graphic display.
[0185]
For example, if the face area is deformed to match the skeleton of a certain animal before the graphic display is performed, the realism can be further improved.
[0186]
(Modification 5)
It is also possible to change the attribute of each object according to the information of the input image.
[0187]
For example, the color attribute is changed in accordance with the color tone of the input image. When a dark image is input, it is effective to superimpose with a reduced brightness and saturation.
[0188]
(Modification 6)
In the embodiments, face detection has been described for a single person, but the method may be changed to detect a plurality of persons and their face part information simultaneously, so that a plurality of persons are disguised at the same time.
[0189]
[Effects of the invention]
According to the present invention, face images including various deformation patterns can be obtained instantaneously, and image effects such as amusement, presentation, image communication, and video mail can be improved.
[0190]
In addition, since it is possible to easily generate a moving image different from normal, it can be expected to bring about an effect as a communication tool.
[Brief description of the drawings]
FIG. 1 is a block diagram of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a synthesis application example.
FIG. 3 is a block diagram of an image input unit.
FIG. 4 is a block diagram of a face area detection unit.
FIG. 5 is a block diagram of a face part detection unit.
FIG. 6 is an explanatory diagram of detection of a facial part.
FIG. 7 is a block diagram of an image output unit.
FIG. 8 is a script example of registered pattern information.
FIG. 9 is a block diagram of a figure calculation unit.
FIG. 10 is an example of a registered graphic pattern.
FIG. 11 is an example of an object structure.
FIG. 12 is a method for specifying the relative position of an object.
FIG. 13 is a structure of a registered graphic pattern.
FIG. 14 is a flowchart of a method for calculating the relative position of an object.
FIG. 15 is a block diagram of an image composition unit.
FIG. 16 is an explanatory diagram of a bitmap synthesis unit.
FIG. 17 is a flowchart of bitmap synthesis.
FIG. 18 is a block diagram of an image conversion unit.
FIG. 19 is an application example of image conversion.
FIG. 20 is an explanatory diagram and a flowchart of image conversion.
FIG. 21 is a block diagram of an image processing apparatus according to a second embodiment.
FIG. 22 is a block diagram illustrating a modified example of an image processing apparatus using a voice recognition unit.
[Explanation of reference numerals]
100 Image input unit
200 Face area detection unit
300 Face part detection unit
400 Image output unit
500 Graphic calculation unit
600 Image composition unit

Claims

An image input means for inputting an image;
A face area detection means for detecting the size and position of the face area of the subject in the input image by obtaining a similarity to a face template while changing the reduction ratio of the input image;
By detecting a plurality of candidate points of the face part from the detected face region, and selecting from among the plurality of candidate points based on the similarity with the template of the face part and the positional relationship between the candidate points, Facial part position detecting means for detecting the position of the facial part of the nose or mouth,
A figure calculation means for determining a figure to be combined with the inputted image or the content and position of the image;
Image combining means for combining the determined figure or image with the input image;
Image output means for outputting the synthesized image;
Have
The graphic calculation means
Registered graphic pattern information consisting of a collection of multiple figures or images,
(A) a face area or a face part which is a reference in the registered figure pattern;
(B) a relative reference quantity vector based on the size of the face area in the registered figure pattern, and
(C) a relative position vector expressing a relative positional relationship between the figure or image to be combined and the face region or the face part in the registered figure pattern using the relative reference amount vector;
Information holding means for holding registered graphic pattern information including:
Using the position and size of the face area detected by the face area detecting means, or the information on the position of the face part detected by the face part position detecting means, and the stored registered figure pattern information Superimposing position calculating means for determining the position and size of the figure or image to be combined;
Have
The superposition position calculation means is based on the position of the face part detected by the face part position detection means.
The relative position vector included in the registered graphic pattern information; and
Determining the position and size of the figure or image to be combined using a relative reference vector based on the size of the face region detected by the face region detection means;
An image processing apparatus.

An image input means for inputting an image;
A face area detection means for detecting the size and position of the face area of the subject in the input image by obtaining a similarity to a face template while changing the reduction ratio of the input image;
By detecting a plurality of candidate points of the face part from the detected face region, and selecting from among the plurality of candidate points based on the similarity with the template of the face part and the positional relationship between the candidate points, Facial part position detecting means for detecting the position of the facial part of the nose or mouth,
Graphic calculation means for determining the content and position to convert the input image;
Image synthesis means for converting the determined position in the input image into the determined content;
Image output means for outputting the converted image;
Have
The graphic calculation means
Registered graphic pattern information consisting of a collection of multiple figures or images,
(A) a face area or a face part which is a reference in the registered figure pattern;
(B) a relative reference quantity vector based on the size of the face area in the registered figure pattern, and
(C) a relative position vector expressing a relative positional relationship between the figure or image to be combined and the face region or the face part in the registered figure pattern using the relative reference amount vector;
Information holding means for holding registered graphic pattern information including:
Using the position and size of the face area detected by the face area detecting means, or the information on the position of the face part detected by the face part position detecting means, and the stored registered figure pattern information Image conversion means for determining the position and size of the area to be converted;
Have
The image converting means is based on the position of the face part detected by the face part position detecting means.
The relative position vector included in the registered graphic pattern information; and
An image processing apparatus, wherein a position and a size of the area to be converted are determined using a reference vector based on the size of the face area detected by the face area detecting means.

The image processing apparatus according to claim 1 or 2, wherein
the face area detecting means or the face part position detecting means,
if the input image is a moving picture,
detects a face area or a face part in the current frame of the moving picture using as the search area only
the vicinity of the position of the face area detected in the previous frame of the moving picture, or
the vicinity of each position of the face part detected in the previous frame of the moving picture.

An image processing method performed by an image processing apparatus,
An image input step in which the image input means inputs an image;
A face area detection step in which the face area detecting means detects the size and position of the subject's face area in the image input by the image input means, by obtaining the similarity to a face template while changing the reduction ratio of the input image;
The face part position detecting means detects a plurality of candidate points of the face part from the face area detected by the face area detecting means, and the similarity between the face part template and the candidate points among the plurality of candidate points A face part position detecting step for detecting the position of the face part of the eyes, nose, or mouth by selecting based on the positional relationship of
A figure calculating means for determining a figure for synthesizing with the image input by the image input means or the content and position of the image; and
Image synthesis means, the image input by said image input means, an image combining step of combining the determined shape or image on the graphic calculation means,
An image output step in which the image output means outputs the image synthesized by the image synthesis means ;
Have
The graphic calculation step includes:
Registered graphic pattern information consisting of a collection of multiple figures or images,
(A) a face area or a face part which is a reference in the registered figure pattern;
(B) a relative reference quantity vector based on the size of the face area in the registered figure pattern, and
(C) a relative position vector expressing a relative positional relationship between the figure or image to be combined and the face region or the face part in the registered figure pattern using the relative reference amount vector;
An information holding step in which the information holding means holds registered graphic pattern information including:
The superposition position calculating means includes the position and size of the face area detected by the face area detecting means , or information on the position of the face part detected by the face part position detecting means, and the information holding means. by using the been the registered graphic pattern information held by the position calculation step superposition to determine the location and size of graphic or image to be combined,
Have
In the superposition position calculation step, the superposition position calculation means is based on the position of the face part detected by the face part position detection means in the face part position detection step.
A relative position vector included in the registered graphic pattern information held in the information holding means , and
An image characterized in that the position and size of the figure or image to be synthesized are determined using a relative reference vector based on the size of the face region detected by the face region detection means in the face region detection step. Processing method.
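
The superposition position calculation recited above can be sketched as follows. This is only one possible reading: it assumes that the relative reference quantity vector is simply (face width, face height), that the relative position vector is an offset from the reference face part expressed in multiples of that vector, and that each registered figure also stores its size in the same units. The names `RegisteredFigure`, `reference_vector`, and `place_figure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RegisteredFigure:
    """One entry of the registered graphic pattern information (assumed layout)."""
    name: str
    rel_pos: Tuple[float, float]   # (C) offset from the reference part, in reference units
    rel_size: Tuple[float, float]  # assumed: figure size, in reference units


def reference_vector(face_w: float, face_h: float) -> Tuple[float, float]:
    """(B) Reference quantities derived from the detected face size (assumption)."""
    return (float(face_w), float(face_h))


def place_figure(fig: RegisteredFigure,
                 part_xy: Tuple[float, float],
                 ref: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Return (center_x, center_y, width, height) of the figure in the input image."""
    cx = part_xy[0] + fig.rel_pos[0] * ref[0]
    cy = part_xy[1] + fig.rel_pos[1] * ref[1]
    w = fig.rel_size[0] * ref[0]
    h = fig.rel_size[1] * ref[1]
    return (cx, cy, w, h)


# A hypothetical "headband" figure placed half a face height above the reference part.
headband = RegisteredFigure("headband", rel_pos=(0.0, -0.5), rel_size=(1.0, 0.25))
placement = place_figure(headband, part_xy=(160.0, 120.0), ref=reference_vector(80, 100))
```

Because both the offset and the size are scaled by the detected face size, the same registered pattern follows the face as it moves toward or away from the camera.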

An image processing method performed by an image processing apparatus, the method comprising:
an image input step in which image input means inputs an image;
a face area detection step in which face area detecting means detects the size and position of a subject's face area in the image input by the image input means, by obtaining the degree of similarity to a face template while changing the reduction ratio of the input image;
a face part position detecting step in which face part position detecting means detects a plurality of candidate points for face parts from the detected face area, and detects the position of a face part, namely an eye, the nose, or the mouth, by selecting from the plurality of candidate points on the basis of the similarity to a face part template and the positional relationship between the candidate points;
a graphic calculation step in which graphic calculation means determines the content and position of a figure or image to be combined with the image input by the image input means;
an image synthesis step in which image synthesis means combines the determined figure or image with the image input by the image input means; and
an image output step in which image output means outputs the image synthesized by the image synthesis means,
wherein the graphic calculation step includes:
an information holding step in which information holding means holds registered graphic pattern information consisting of a collection of a plurality of figures or images and including
(A) a face area or a face part serving as a reference in the registered graphic pattern,
(B) a relative reference quantity vector based on the size of the face area in the registered graphic pattern, and
(C) a relative position vector expressing, by means of the relative reference quantity vector, the relative positional relationship between the figure or image to be combined and the face area or the face part in the registered graphic pattern; and
an image conversion step in which image conversion means determines the position and size of the area to be converted, using the position and size of the face area detected by the face area detecting means in the face area detection step, or information on the position of the face part detected by the face part position detecting means in the face part position detecting step, together with the registered graphic pattern information held by the information holding means,
and wherein, in the image conversion step, the image conversion means determines the position and size of the area to be converted, with the position of the face part detected by the face part position detecting means in the face part position detecting step as a reference, using
the relative position vector included in the registered graphic pattern information held by the information holding means, and
the relative reference quantity vector based on the size of the face area detected by the face area detecting means in the face area detection step.
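
As an illustration of the image conversion step, the sketch below first derives the area to be converted from a face part position, a relative position vector, and a reference vector, and then applies a conversion to that area. The claim does not fix the kind of conversion; a block mosaic is used here purely as an example, and `conversion_region`, `mosaic`, and the `rel_size` parameter are assumptions.

```python
def conversion_region(part_xy, ref, rel_pos, rel_size, img_w, img_h):
    """Return integer bounds (x0, y0, x1, y1) of the area to convert, clipped to the image."""
    cx = part_xy[0] + rel_pos[0] * ref[0]
    cy = part_xy[1] + rel_pos[1] * ref[1]
    w = rel_size[0] * ref[0]
    h = rel_size[1] * ref[1]
    x0 = max(0, int(round(cx - w / 2)))
    y0 = max(0, int(round(cy - h / 2)))
    x1 = min(img_w, int(round(cx + w / 2)))
    y1 = min(img_h, int(round(cy + h / 2)))
    return x0, y0, x1, y1


def mosaic(img, region, block):
    """Replace each block x block tile inside region with its mean (grayscale, 2D list)."""
    x0, y0, x1, y1 = region
    out = [row[:] for row in img]  # leave the input image untouched
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            mean = sum(img[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


# 4x4 test image whose pixel value is y * 4 + x; convert only the top-left 2x2 area.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
converted = mosaic(image, (0, 0, 2, 2), block=2)
region = conversion_region((4, 4), (8, 8), (0.0, 0.0), (0.5, 0.5), 16, 16)
```

Pixels outside the computed region are copied unchanged, so the conversion stays local to the face part it is anchored to.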

The image processing method according to claim 4 or 5, wherein,
when the image input by the image input means is a moving image,
the face area detecting means or the face part position detecting means, in the face area detection step or the face part position detecting step, detects the face area or the face part in the current frame of the moving image, using as a search area only
the vicinity of the position of the face area detected in the previous frame of the moving image, or
the vicinity of each position of the face parts detected in the previous frame of the moving image.

A recording medium on which is recorded an image processing program that causes a computer to function as:
image input means for inputting an image;
face area detecting means for detecting the size and position of a subject's face area in the image input by the image input means, by obtaining the degree of similarity to a face template while changing the reduction ratio of the input image;
face part position detecting means for detecting a plurality of candidate points for face parts from the face area detected by the face area detecting means, and detecting the position of a face part, namely an eye, the nose, or the mouth, by selecting from the plurality of candidate points on the basis of the similarity to a face part template and the positional relationship between the candidate points;
graphic calculation means for determining the content and position of a figure or image to be combined with the image input by the image input means;
image synthesis means for combining the figure or image determined by the graphic calculation means with the image input by the image input means; and
image output means for outputting the image synthesized by the image synthesis means,
wherein the graphic calculation means includes:
information holding means for holding registered graphic pattern information consisting of a collection of a plurality of figures or images and including
(A) a face area or a face part serving as a reference in the registered graphic pattern,
(B) a relative reference quantity vector based on the size of the face area in the registered graphic pattern, and
(C) a relative position vector expressing, by means of the relative reference quantity vector, the relative positional relationship between the figure or image to be combined and the face area or the face part in the registered graphic pattern; and
superposition position calculation means for determining the position and size of the figure or image to be combined, using the position and size of the face area detected by the face area detecting means, or information on the position of the face part detected by the face part position detecting means, together with the registered graphic pattern information held by the information holding means,
and wherein the superposition position calculation means determines the position and size of the figure or image to be combined, with the position of the face part detected by the face part position detecting means as a reference, using
the relative position vector included in the registered graphic pattern information held by the information holding means, and
the relative reference quantity vector based on the size of the face area detected by the face area detecting means.
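
The face area detecting means recited above finds the size and position of the face by comparing the input image with a face template while changing the reduction ratio of the input image. A minimal sketch of that idea follows. It assumes grayscale images stored as 2D lists, nearest-neighbour reduction, and the negative mean absolute difference as the similarity measure; the claim names no particular measure, and `shrink`, `similarity`, and `detect_face_area` are hypothetical names.

```python
def shrink(img, ratio):
    """Nearest-neighbour reduction of a 2D grayscale image (0 < ratio <= 1)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, int(h * ratio)), max(1, int(w * ratio))
    return [[img[min(h - 1, int(y / ratio))][min(w - 1, int(x / ratio))]
             for x in range(nw)] for y in range(nh)]


def similarity(img, tpl, ox, oy):
    """Negative mean absolute difference at offset (ox, oy); 0 is a perfect match."""
    th, tw = len(tpl), len(tpl[0])
    diff = sum(abs(img[oy + y][ox + x] - tpl[y][x])
               for y in range(th) for x in range(tw))
    return -diff / (th * tw)


def detect_face_area(img, tpl, ratios=(1.0, 0.5, 0.25)):
    """Return (score, x, y, w, h) of the best match, in input-image coordinates."""
    th, tw = len(tpl), len(tpl[0])
    best = None
    for r in ratios:
        small = shrink(img, r)
        for oy in range(len(small) - th + 1):
            for ox in range(len(small[0]) - tw + 1):
                s = similarity(small, tpl, ox, oy)
                if best is None or s > best[0]:
                    # Map the match back to the scale of the original image.
                    best = (s, int(ox / r), int(oy / r), int(tw / r), int(th / r))
    return best


# A 3x3 "face template" and a 10x10 image containing it enlarged 2x at (2, 2).
template = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
image = [[0] * 10 for _ in range(10)]
for y in range(6):
    for x in range(6):
        image[2 + y][2 + x] = template[y // 2][x // 2]
found = detect_face_area(image, template)
```

The reduction ratio at which the template matches best yields the face size, and the matching offset yields the face position, so a single fixed-size template can cover faces of different sizes.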

A recording medium on which is recorded an image processing program that causes a computer to function as:
image input means for inputting an image;
face area detecting means for detecting the size and position of a subject's face area in the image input by the image input means, by obtaining the degree of similarity to a face template while changing the reduction ratio of the input image;
face part position detecting means for detecting a plurality of candidate points for face parts from the face area detected by the face area detecting means, and detecting the position of a face part, namely an eye, the nose, or the mouth, by selecting from the plurality of candidate points on the basis of the similarity to a face part template and the positional relationship between the candidate points;
graphic calculation means for determining the content and position of a figure or image to be combined with the image input by the image input means;
image synthesis means for combining the figure or image determined by the graphic calculation means with the image input by the image input means; and
image output means for outputting the image synthesized by the image synthesis means,
wherein the graphic calculation means includes:
information holding means for holding registered graphic pattern information consisting of a collection of a plurality of figures or images and including
(A) a face area or a face part serving as a reference in the registered graphic pattern,
(B) a relative reference quantity vector based on the size of the face area in the registered graphic pattern, and
(C) a relative position vector expressing, by means of the relative reference quantity vector, the relative positional relationship between the figure or image to be combined and the face area or the face part in the registered graphic pattern; and
image conversion means for determining the position and size of the area to be converted, using the position and size of the face area detected by the face area detecting means, or information on the position of the face part detected by the face part position detecting means, together with the registered graphic pattern information held by the information holding means,
and wherein the image conversion means determines the position and size of the area to be converted, with the position of the face part detected by the face part position detecting means as a reference, using
the relative position vector included in the registered graphic pattern information held by the information holding means, and
the relative reference quantity vector based on the size of the face area detected by the face area detecting means.
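
The face part position detecting means recited above selects face parts from several candidate points using both template similarity and the positional relationship between candidates. The sketch below shows one way such a selection could work for a pair of eyes. The geometric constraints (`min_sep`, `max_sep`, `max_tilt`) and the function name `select_eye_pair` are assumptions; the claim does not specify them.

```python
from itertools import combinations


def select_eye_pair(candidates, face_w, min_sep=0.25, max_sep=0.6, max_tilt=0.2):
    """Pick the most plausible (left, right) eye pair from candidate points.

    candidates: list of (x, y, similarity), where similarity is the score of
    the point against an eye template. A pair is accepted only if its
    horizontal separation lies between min_sep and max_sep times the face
    width and its vertical offset is at most max_tilt times that separation.
    Among accepted pairs, the one with the highest summed similarity wins.
    """
    best, best_score = None, float("-inf")
    for a, b in combinations(candidates, 2):
        left, right = sorted((a, b))  # tuples sort by x first
        dx = right[0] - left[0]
        dy = abs(right[1] - left[1])
        if not (min_sep * face_w <= dx <= max_sep * face_w):
            continue
        if dy > max_tilt * dx:
            continue
        score = left[2] + right[2]
        if score > best_score:
            best, best_score = (left, right), score
    return best


# Four candidate points inside a face area 100 pixels wide.
points = [(30, 40, 0.9), (70, 42, 0.8), (50, 80, 0.95), (32, 41, 0.3)]
eyes = select_eye_pair(points, face_w=100)
```

Here the point with the highest similarity, (50, 80), is rejected because no partner satisfies the positional constraints, which shows why similarity alone is not used for the selection.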

The recording medium on which the image processing program is recorded according to claim 7 or 8, wherein,
when the image input by the image input means is a moving image,
the face area detecting means or the face part position detecting means detects the face area or the face part in the current frame of the moving image, using as a search area only
the vicinity of the position of the face area detected in the previous frame of the moving image, or
the vicinity of each position of the face parts detected in the previous frame of the moving image.