JP2010152557A

JP2010152557A - Image processor and image processing method

Info

Publication number: JP2010152557A
Application number: JP2008328742A
Authority: JP
Inventors: Shoichi Ikegami; 渉一池上
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2008-12-24
Filing date: 2008-12-24
Publication date: 2010-07-08
Anticipated expiration: 2028-12-24
Also published as: JP5468773B2

Abstract

PROBLEM TO BE SOLVED: In visual tracking, it is difficult to track an object whose shape changes.

SOLUTION: Captured video data is read frame by frame, and whether to start tracking is determined from the presence of an object to be tracked (S20, S22). When tracking starts, an edge image of the image frame is generated (S24). The control point sequence of the B-spline curve representing the shape of the tracked object is expressed as a linear sum of the control point sequences of B-splines representing a plurality of prepared reference shapes, and particles are distributed in the space of the set of coefficients applied to those control point sequences (S26). The particles are further distributed in the space of shape space vectors (S28), the likelihood of each particle is observed, and the probability density distribution is obtained (S30). A curve obtained by averaging each parameter, weighted by the probability density distribution, is generated as the tracking result (S32).

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to information processing technology, and more particularly to an image processing apparatus that analyzes the position and shape of an object in an input image and changes in them, and to an image processing method executed by that apparatus.

Visual tracking is expected to find a wide range of applications in computer vision: visual surveillance in the security field; analysis, classification, and editing of recorded video in the AV field; man-machine interfaces; and human-to-human interfaces such as video conferencing and videophones. Much research has therefore been devoted to improving tracking accuracy and processing efficiency. In particular, many studies apply the particle filter to visual tracking. The particle filter has attracted attention as a time-series analysis method for signals with added non-Gaussian noise, which the Kalman filter cannot handle, and the Condensation (Conditional Density Propagation) algorithm is especially well known (see, for example, Non-Patent Documents 1 to 3).

In the Condensation algorithm, the object to be tracked is defined by a contour line of arbitrary shape composed of a B-spline curve or the like. A human head, for example, can be tracked by defining an Ω-shaped curve with a B-spline. This works because the shape of the head basically does not change when a person turns around or bends over, so the head shape can be expressed merely by translating, scaling, and rotating the Ω-shaped curve (see, for example, Patent Document 1).

Non-Patent Document 1: Michael Isard and Andrew Blake, "Contour tracking by stochastic propagation of conditional density," Proc. European Conf. on Computer Vision, vol. 1, pp. 343-356, Cambridge, UK (1996)
Non-Patent Document 2: Michael Isard and Andrew Blake, "CONDENSATION - conditional density propagation for visual tracking," Int. J. Computer Vision, 29, 1, 5-28 (1998)
Non-Patent Document 3: Michael Isard and Andrew Blake, "ICondensation: Unifying low-level and high-level tracking in a stochastic framework," Proc. 5th European Conf. Computer Vision, 1998
Patent Document 1: JP 2007-328747 A

As described above, the Condensation algorithm is highly effective in terms of computational load and accuracy when tracking an object whose shape hardly changes, such as a human head, a ball, or an automobile. When the shape of the object changes, however, and the object can no longer be expressed merely by translating, scaling, and rotating one specific shape, accurate tracking becomes difficult. A technique that can recognize changes in the shape and position of an object with a small amount of computation has therefore been desired.

The present invention has been made in view of this problem, and its object is to provide an image processing technique capable of recognizing changes in the shape and position of an object without increasing the computational load.

One aspect of the present invention relates to an image processing apparatus. The image processing apparatus comprises: a reference shape storage unit that stores a plurality of parameters defining the contour lines of a plurality of reference shapes; and an object shape determination unit that determines a set of coefficients, one for each parameter in a linear sum of the plurality of parameters stored in the reference shape storage unit, and thereby expresses and outputs the contour shape of an object in an image as that linear sum.

The image processing apparatus may further comprise an image acquisition unit that acquires moving image stream data including a first image frame and a second image frame in which the object is captured. The object shape determination unit may comprise: a shape prediction unit that, in the coefficient set space defined by the set of coefficients, generates and eliminates particles used for a particle filter based on the estimated existence probability distribution of the object in that space in the first image frame, and causes the particles to transition based on a predetermined transition model; an observation unit that observes the likelihood of each particle by matching the contour line of the object in the second image frame against the candidate contour defined by the particle; and a contour acquisition unit that calculates the estimated existence probability distribution, in the coefficient set space, of the object in the second image frame based on the likelihoods observed by the observation unit, and estimates the contour shape of the object in the second image frame by weighting the coefficient set of each particle based on that estimated existence probability distribution.

Here, the "first image frame" and the "second image frame" may be adjacent image frames in the image stream or image frames located apart from each other. In ordinary object tracking, which proceeds in the forward direction along the time axis, the "first image frame" temporally precedes the "second image frame", but the present embodiment is not limited to this. A "candidate contour" is a contour line of part or all of the object. "Likelihood" is a measure of how closely a candidate contour resembles the object. For example, when the tracking candidate is a two-dimensional figure, it is a numerical value representing the degree of overlap with the object, the distance from the object, or the like.

"Particles" are introduced in the particle filter, which is one method of estimating the current state from past information and current observation information. The sampling frequency of a parameter to be observed is expressed by the number of particles present in the parameter space.

Another aspect of the present invention relates to an image processing method. The image processing method includes the steps of: reading from a storage device a plurality of parameters defining the contour lines of a plurality of reference shapes, and determining a set of coefficients, one for each parameter in a linear sum of the parameters; and using the determined set of coefficients to express and output the contour line of an object in an image as the linear sum.

Any combination of the above components, and any conversion of the expression of the present invention among a method, an apparatus, a system, a computer program, a recording medium storing a computer program, and the like, are also effective as aspects of the present invention.

According to the present invention, information on changes in the position and shape of an object can be acquired with a small amount of computation.

First, to clarify the features and effects of the present embodiment, visual tracking using a particle filter is outlined. FIG. 1 is a diagram for explaining a visual tracking method in which a person is the tracking target. The person image 150 is one of the image frames constituting the image stream of a moving image, such as live-action video or video generated by computer graphics, and shows a person 152 who is the tracking target.

To track the movement of the person 152, an Ω-shaped curve 154 that approximates the shape of the head contour of the person 152 is described in a known representation. Meanwhile, edge extraction is applied to the person image 150 containing the person 152 to obtain an edge image. The parameters defining the curve 154 are then varied so that the curve 154 is translated, scaled, and rotated, and edges in its vicinity are searched for, thereby identifying the parameter values estimated to best match the head contour of the person 152. Repeating this processing for each frame advances the tracking of the person 152. Here, an edge generally refers to a location where the density or color of the image changes abruptly.

To match the head contour of the person 152 against curves 154 with various values of the defining parameters, a probability distribution prediction technique based on a particle filter is introduced. That is, the number of samples of the curve 154 is increased or decreased according to the probability distribution of the object in the parameter space in the immediately preceding frame, which narrows down the tracking candidates. Regions with a high existence probability can thus be searched intensively, and accurate matching can be performed efficiently.

A method of applying the particle filter to tracking that focuses on the contour of an object is described in detail in, for example, Non-Patent Document 3 (ICondensation: Unifying low-level and high-level tracking in a stochastic framework, Michael Isard and Andrew Blake, Proc. 5th European Conf. Computer Vision, 1998). The description here focuses on the points relevant to the present embodiment.

First, the Ω-shaped curve 154 is described by a B-spline curve. The B-spline curve is defined by n control points (Q0, ..., Qn) and n knots (s0, ..., sn). These parameters are set in advance so as to give the basic curve shape, in this case an Ω-shaped curve. The curve obtained with these settings is hereinafter called the template Qt. When tracking the person 152 in the person image 150 shown in FIG. 1, the template Qt is Ω-shaped, but its shape is changed according to the tracking target. That is, it is circular if the tracking target is a ball, hand-shaped if the target is a palm, and so on.
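As a rough illustration of how a control point sequence determines a curve, the following sketch samples a closed, uniform cubic B-spline. The uniform knot spacing, the cubic degree, the closed (wrap-around) form, and the function names are all assumptions made for this example; the text above does not specify them, and the Ω-shaped template would in practice be an open curve.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bspline_basis(t: float) -> Tuple[float, float, float, float]:
    """Blending weights of a uniform cubic B-spline span for t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return b0, b1, b2, b3


def sample_closed_bspline(control_points: List[Point],
                          samples_per_span: int = 8) -> List[Point]:
    """Sample points along a closed curve defined by the control points."""
    n = len(control_points)
    curve: List[Point] = []
    for span in range(n):
        # Each span is influenced by four consecutive control points (wrapping).
        quad = [control_points[(span + k) % n] for k in range(4)]
        for s in range(samples_per_span):
            w = cubic_bspline_basis(s / samples_per_span)
            x = sum(wk * p[0] for wk, p in zip(w, quad))
            y = sum(wk * p[1] for wk, p in zip(w, quad))
            curve.append((x, y))
    return curve
```

Because the four weights always sum to one, every sampled point is a weighted average of nearby control points, which is why moving the control points deforms the curve smoothly.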

Next, a shape space vector x is prepared as a transformation parameter for changing the state of the template. The shape space vector x is composed of the following six parameters.

Here, (shift_x, shift_y) is the amount of translation in the (x, y) directions, (extend_x, extend_y) is the magnification, and θ is the rotation angle. Using an action matrix W that applies the shape space vector x to the template Qt, the deformed curve, that is, the candidate curve Q, can be described as follows.
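The images for Equations 1 and 2 did not survive in this text. The block below is a reconstruction under assumptions: it uses the shape-space form commonly seen in contour tracking based on Non-Patent Documents 1 to 3, built from the six quantities named above. The exact component ordering and signs, and the superscripts x and y denoting the coordinate components of the template, are not confirmed by the surviving text.

```latex
% Equation 1 (assumed form): the six-component shape space vector
\mathbf{x} = \left( \mathit{shift}_x,\ \mathit{shift}_y,\
  \mathit{extend}_x\cos\theta - 1,\ \mathit{extend}_y\cos\theta - 1,\
  -\mathit{extend}_x\sin\theta,\ \mathit{extend}_y\sin\theta \right)^{T}

% Equation 2 (assumed form): candidate curve from template Q_t via action matrix W
Q = W\mathbf{x} + Q_t, \qquad
W = \begin{pmatrix}
  \mathbf{1} & \mathbf{0} & Q_t^{x} & \mathbf{0} & \mathbf{0} & Q_t^{y} \\
  \mathbf{0} & \mathbf{1} & \mathbf{0} & Q_t^{y} & Q_t^{x} & \mathbf{0}
\end{pmatrix}
```

With θ = 0, unit magnification, and zero shift, x is the zero vector and Q reduces to the template Qt, which is consistent with x being described as a parameter that changes the state of the template.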

With Equation 2, the template can be translated, scaled, and rotated by appropriately changing the six parameters constituting the shape space vector x, and the candidate curve Q can be varied in many ways through their combinations.
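The effect of the six parameters on a template can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the per-point formulation, and the rotation sign convention are assumptions, since the original equation is not reproduced in this text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def apply_shape_space(template: List[Point],
                      shift_x: float, shift_y: float,
                      extend_x: float, extend_y: float,
                      theta: float) -> List[Point]:
    """Scale, rotate, and translate every control point of the template.

    Assumed convention: scale by (extend_x, extend_y), rotate by theta,
    then translate by (shift_x, shift_y).
    """
    c, s = math.cos(theta), math.sin(theta)
    candidate: List[Point] = []
    for qx, qy in template:
        x = shift_x + extend_x * c * qx + extend_y * s * qy
        y = shift_y - extend_x * s * qx + extend_y * c * qy
        candidate.append((x, y))
    return candidate
```

For example, `apply_shape_space(template, 0, 0, 1, 1, 0)` returns the template unchanged, so each particle only needs to carry six numbers to describe a full candidate curve.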

Then, for the plurality of candidate curves expressed by varying the parameters of the template Qt, such as the control points and the knot intervals, and the six parameters constituting the shape space vector x, edges of the person 152 in the vicinity of each knot are searched for. The likelihood of each candidate curve is then obtained from, for example, its distance to the edges, and the probability density distribution in the six-dimensional space defined by the six parameters of the shape space vector x is thereby estimated.

FIG. 2 is a diagram for explaining a method of estimating the probability density distribution using a particle filter. To simplify the explanation, the figure plots the variation of one parameter x1, out of the six parameters constituting the shape space vector x, on the horizontal axis, but in practice the same processing is performed in the six-dimensional space. Assume here that the image frame whose probability density distribution is to be estimated is the image frame at time t.

First, particles at time t are generated (S12) using the probability density distribution on the parameter x1 axis estimated for the image frame at time t-1, the frame immediately preceding the image frame at time t (S10). If filtering has already been performed and particles already exist, their splitting and elimination are determined. The probability density distribution shown in S10 is obtained discretely at coordinates in the parameter space, and a larger circle indicates a higher probability density.

A particle embodies a value of the parameter x1 to be sampled together with the sampling density. For example, a region of the parameter x1 where the probability density was high at time t-1 is sampled intensively by raising the particle density there, whereas a range where the probability density was low is sampled sparsely by placing few particles there. In this way, many candidate curves are generated near the edges of the person 152, for example, and matching is performed efficiently.

Next, the particles are made to transition in the parameter space using a predetermined motion model (S14). The predetermined motion model is, for example, a Gaussian motion model or an autoregressive prediction motion model. The former is a model in which the probability density at time t is Gaussian-distributed around each probability density at time t-1. The latter assumes an autoregressive prediction model of second or higher order obtained from sample data and estimates from past changes in the parameters that, for example, the person 152 is moving at a constant velocity. In the example of FIG. 2, the autoregressive prediction motion model estimates movement in the positive direction of the parameter x1, and each particle is made to transition accordingly.

Next, edges of the person 152 in the vicinity of the candidate curve determined by each particle are searched for using the edge image at time t, the likelihood of each candidate curve is thereby obtained, and the probability density distribution at time t is estimated (S16). As noted above, the probability density distribution at this point is a discrete representation of the true probability density distribution 400, as shown in S16. By repeating this procedure, the probability density distribution at each time is represented in the parameter space. If the probability density distribution is unimodal, that is, if there is only one tracking target, the sum of the parameter values weighted by the obtained probability densities is taken as the final parameter, which yields the curve estimated to be the contour of the tracking target.
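Steps S12 to S16 can be sketched for a single parameter as follows. This is a simplified illustration under stated assumptions: the state is one-dimensional, the likelihood is an arbitrary caller-supplied function rather than an edge search, the motion model is a constant drift plus Gaussian noise, and all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def particle_filter_step(particles: List[float],
                         weights: List[float],
                         likelihood: Callable[[float], float],
                         drift: float,
                         sigma: float,
                         rng: random.Random) -> Tuple[List[float], List[float], float]:
    """One update: resample (S12), transition (S14), observe (S16)."""
    n = len(particles)
    # S12: particles in high-density regions are duplicated, others vanish.
    resampled = rng.choices(particles, weights=weights, k=n)
    # S14: move each particle with the assumed motion model.
    moved = [p + drift + rng.gauss(0.0, sigma) for p in resampled]
    # S16: observe the likelihood of each particle and normalize.
    raw = [likelihood(p) for p in moved]
    total = sum(raw)
    if total > 0.0:
        new_weights = [r / total for r in raw]
    else:
        new_weights = [1.0 / n] * n  # fall back to uniform if nothing matches
    # Weighted sum of parameter values gives the estimate (unimodal case).
    estimate = sum(p * w for p, w in zip(moved, new_weights))
    return moved, new_weights, estimate
```

Calling this function once per frame, feeding the returned particles and weights back in, mirrors the repetition described above. The weighted-sum estimate is meaningful only when the distribution has a single peak.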

The probability density distribution p(x_t^i) at time t estimated in S16 is calculated as follows.

Here, i is a number uniquely assigned to each particle, p(x_t^i | x_t^i, u_{t-1}) is the predetermined motion model, and p(y_t | x_t^i) is the likelihood.
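The image for Equation 3 is missing from this text. A form consistent with the symbols defined above, and with the usual recursive Bayesian update used by particle filters, is sketched below. The normalization constant η and the integral over the previous state are assumptions; only the motion model term and the likelihood term are confirmed by the surrounding text, and the motion model is written exactly as the text gives it.

```latex
% Equation 3 (assumed form)
p(x_t^{i}) = \eta\, p(y_t \mid x_t^{i})
  \int p(x_t^{i} \mid x_t^{i}, u_{t-1})\, p(x_{t-1}^{i})\, dx_{t-1}^{i}
```

Read this way, the integral propagates the previous distribution through the motion model (the prediction), and multiplying by the likelihood corrects that prediction with the observation at time t.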

The method described so far performs tracking on the premise that the shape of the initially set template is maintained to some extent. It is therefore very effective when the shape of the target itself changes little, as with a human head, because accurate tracking can be achieved with a small amount of computation. On the other hand, it cannot cope with changes in the shape of an object that cannot be expressed by translation, scaling, and rotation alone. In the present embodiment, therefore, the parameter set that defines the shape of the tracking target is expressed as a linear sum of a plurality of parameter sets prepared in advance, and changes in the shape of the tracking target are also estimated by adjusting the coefficients. This enables tracking that follows changes in the shape of the object.

The following describes the case where a control point sequence defining a B-spline curve is adopted as the parameter set expressed by the linear sum. First, N control point sequences Q_0, Q_1, ..., Q_N are prepared. As described above, each control point sequence consists of n control points, and each sequence defines a B-spline curve of a different shape. The control point sequence Qsum that defines the B-spline curve representing the estimated shape of the object is then the linear sum of these N control point sequences, as follows.

Here, the coefficients α_0, α_1, ..., α_N are weights applied to the prepared control point sequences, and the shape of the object is expressed by varying the set of coefficients α_0, α_1, ..., α_N (hereinafter also called the coefficient set α). Each particle is defined by the coefficient set α in addition to the shape space vector x. The likelihood of each particle is then observed, and the probability density distribution in the space of the coefficient set α is calculated in the same manner as in Equation 3.
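The image for Equation 4 is missing from this text. Given that Qsum is described as the linear sum of the control point sequences Q_0, ..., Q_N weighted by the coefficients α_0, ..., α_N, the equation presumably has the following form (a reconstruction from the surrounding definitions, not a copy of the original):

```latex
% Equation 4 (reconstructed from the surrounding definitions)
Q_{\mathrm{sum}} = \alpha_0 Q_0 + \alpha_1 Q_1 + \cdots + \alpha_N Q_N
```

Each Q_k is a sequence of n control points, so the sum is taken point by point: the k-th control point of Qsum is the weighted sum of the k-th control points of the prepared sequences.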

By expressing the shape of an object as a linear sum of parameters that define a plurality of shapes prepared in advance, shapes intermediate between the prepared shapes (hereinafter called reference shapes) can be represented. The amount of computation is therefore smaller than with methods such as preparing image data for every possible shape of the object and matching against all of it. The present embodiment further exploits this simple representation by setting transition probabilities for the coefficient set α, so that the search is performed efficiently with a small amount of computation and accuracy is improved. Basically, as with the shape space vector x, sampling is performed in the space of the coefficient set α by generating and eliminating particles according to the probability density distribution in that space and making them transition according to a predetermined model. The particles are then further generated, eliminated, and made to transition according to the probability density distribution in the space of the shape space vector x, which determines the candidate contour lines, and the likelihood of each is observed.
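The linear sum of control point sequences can be sketched as follows. The function name and the representation of a control point sequence as a list of (x, y) tuples are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def blend_control_points(reference_shapes: Sequence[Sequence[Point]],
                         alpha: Sequence[float]) -> List[Point]:
    """Return the point-by-point weighted sum of the reference control point sequences."""
    if len(reference_shapes) != len(alpha):
        raise ValueError("one coefficient is required per reference shape")
    n = len(reference_shapes[0])
    if any(len(shape) != n for shape in reference_shapes):
        raise ValueError("all reference shapes must have the same number of control points")
    blended: List[Point] = []
    for k in range(n):
        x = sum(a * shape[k][0] for a, shape in zip(alpha, reference_shapes))
        y = sum(a * shape[k][1] for a, shape in zip(alpha, reference_shapes))
        blended.append((x, y))
    return blended
```

With coefficients (1.0, 0.0, 0.0) the result is exactly the first reference shape, while (0.5, 0.5, 0.0) gives a shape halfway between the first two. No image data for intermediate shapes needs to be stored.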

FIGS. 3 and 4 are diagrams for explaining the values of the coefficient set α and the transition model. The figures show an example in which a hand playing rock-paper-scissors is the tracking target, and B-spline curves are prepared for three reference shapes: rock ("gu"), scissors ("choki"), and paper ("pa"). Let the control point sequences defining these reference shapes be Q_0, Q_1, and Q_2 in Equation 4, respectively. When the shape of the tracking target is rock, the coefficient set is α = (α_0, α_1, α_2) = (1.0, 0.0, 0.0). Similarly, α = (0.0, 1.0, 0.0) for scissors and α = (0.0, 0.0, 1.0) for paper. When the shape at the current time is one of the reference shapes rock, scissors, or paper, the probability P of heading toward each of the other two reference shapes at the next time, for example scissors or paper if the current shape is rock, is set to 0.5.

Even if the coefficient set α deviates slightly from the sequence of values representing a reference shape, the shape may in practice still be regarded as that reference shape, so the range of coefficient sets α regarded as each reference shape is set in advance. For example, in the space defined by the coefficient set α, every shape determined by an α within a predetermined Euclidean distance of (1.0, 0.0, 0.0) is regarded as rock ("gu"). Suppose that in FIG. 3 the shape at the current time is the black circle 102 and the coefficient set α is (0.9, 0.1, 0.0). If this state is set to be regarded as rock, the probability P of transitioning from it to scissors ("choki") and to paper ("pa") is 0.5 each.

Alternatively, the transition toward scissors ("choki") may be regarded as somewhat more likely, and weighting may be applied based on the Euclidean distance between (1.0, 0.0, 0.0) and (0.9, 0.1, 0.0) so that the probability of transitioning to scissors is larger than the probability of transitioning to paper ("pa"). The particles are then divided according to these transition probabilities and distributed in a Gaussian distribution 104 centered on the coefficient set α of the black circle 102, which is the current state, and in a Gaussian distribution 106 centered on a predetermined coefficient set α that lies within the rock ("gu") range and heads toward paper.
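The range check and the division of particles can be sketched as follows. This covers only the equal-probability case. The distance-based bias toward scissors is not implemented because the text does not give its formula. The function names, the radius value, and the rounding rule for particle counts are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def classify_reference(alpha: Sequence[float],
                       references: Sequence[Sequence[float]],
                       radius: float) -> Optional[int]:
    """Return the index of the reference shape whose range contains alpha, else None."""
    for index, ref in enumerate(references):
        distance = math.sqrt(sum((a - r) ** 2 for a, r in zip(alpha, ref)))
        if distance <= radius:
            return index
    return None


def equal_transition_probabilities(current: int, count: int) -> Dict[int, float]:
    """Equal probability of heading toward each of the other reference shapes."""
    if count < 2:
        raise ValueError("at least two reference shapes are required")
    others = [j for j in range(count) if j != current]
    return {j: 1.0 / len(others) for j in others}


def allocate_particles(total: int, probabilities: Dict[int, float]) -> Dict[int, int]:
    """Split a particle budget according to the transition probabilities."""
    counts = {j: int(total * p) for j, p in probabilities.items()}
    remainder = total - sum(counts.values())
    # Hand any leftover particles to the most probable destinations first.
    for j in sorted(probabilities, key=probabilities.get, reverse=True)[:remainder]:
        counts[j] += 1
    return counts
```

With references (1, 0, 0), (0, 1, 0), (0, 0, 1) and a radius of 0.2, the coefficient set (0.9, 0.1, 0.0) is classified as index 0 (rock), and its particles are split evenly between the two destinations.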

Suppose that in FIG. 4 the state at the current time is the black circle 108 and the coefficient set α is (0.4, 0.6, 0.0), which lies outside both the range regarded as rock ("gu") and the range regarded as scissors ("choki"). In this case the shape is judged to be in the middle of a transition to either rock or scissors, and the particles are distributed in a Gaussian distribution 110 centered on the coefficient set α of the black circle 108, which is the current state. The Gaussian distributions 104, 106, and 110 in FIGS. 3 and 4 are actually distributions in the three-dimensional space defined by the coefficient set α (α_0, α_1, α_2). The standard deviation of the distribution may be made larger in the direction of the line segment connecting the coefficient sets α of the reference shapes regarded as the destinations of the transition (rock and scissors in the example of FIG. 4). This places many particles on shapes with a high transition probability, which improves sampling efficiency and tracking accuracy.
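One way to draw particles with a larger spread along the line segment between two reference shapes is sketched below. The construction (isotropic Gaussian noise plus an extra Gaussian component along the segment direction), the function name, and the parameter names are assumptions; the text states only that the standard deviation may be larger in that direction.

```python
import math
import random
from typing import Sequence, Tuple


def sample_alpha_along_segment(center: Sequence[float],
                               ref_a: Sequence[float],
                               ref_b: Sequence[float],
                               sigma_along: float,
                               sigma_across: float,
                               rng: random.Random) -> Tuple[float, ...]:
    """Draw one coefficient set around center, stretched along ref_a -> ref_b."""
    if sigma_along < sigma_across:
        raise ValueError("sigma_along must be at least sigma_across")
    direction = [b - a for a, b in zip(ref_a, ref_b)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("reference shapes must differ")
    unit = [d / norm for d in direction]
    # Isotropic noise already contributes sigma_across**2 along the segment,
    # so only the remaining variance is added in that direction.
    extra = math.sqrt(sigma_along ** 2 - sigma_across ** 2)
    stretch = rng.gauss(0.0, extra)
    return tuple(c + rng.gauss(0.0, sigma_across) + stretch * u
                 for c, u in zip(center, unit))
```

For the state in FIG. 4, `center` would be (0.4, 0.6, 0.0) with `ref_a` and `ref_b` set to the rock and scissors coefficient sets, so most particles fall on shapes between those two.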

The particle distribution is not limited to those described above. It may be a Gaussian distribution with the same standard deviation in all directions, or a model other than a Gaussian distribution may be introduced. For example, the movement of the coefficient set α over multiple frames up to the current time may be obtained and a regression prediction model introduced. In that case, if it can be determined from past frames that, for example, the transition from rock ("gu") to scissors ("choki") is proceeding at a constant rate, more particles are distributed in the direction that proceeds further toward the scissors shape.

The probability P of transitioning from one reference shape to another was P = 0.5 above, where there are three reference shapes, rock ("gu"), scissors ("choki"), and paper ("pa"), but its value varies with the number of reference shapes and other factors. If the number of reference shapes reachable from a given reference shape is N, the probability P of transitioning to each of them is 1/N. Depending on the object, the transition probabilities need not be equal and may be biased, or they may be determined dynamically from preceding events.

In Equation 4, the linear sum of control point sequences is used as the parameter of the B-spline curve representing the shape of the tracking target, but the linear sum of knots, which are parameters that likewise define a B-spline curve, may be used instead. In terms of processing, however, using control points is more efficient because the expansion from control points to knots then needs to be performed only once.

FIG. 5 shows a configuration example of the visual tracking system in the present embodiment. The visual tracking system 10 includes an imaging device 12 that captures images of a tracking target 18, a tracking device 14 that performs tracking processing, and a display device 16 that outputs the image data captured by the imaging device 12 and the tracking result data. The tracking target 18 may be a person, an object, a part of either, or the like, depending on the intended use of the visual tracking system 10.

The tracking device 14 may be connected to the imaging device 12 or the display device 16 by wire or wirelessly, and the connection may pass through various networks. Alternatively, any two or all of the imaging device 12, the tracking device 14, and the display device 16 may be combined into a single unit. Depending on the usage environment, the imaging device 12 and the display device 16 need not be connected to the tracking device 14 at the same time.

The imaging device 12 acquires, at a predetermined frame rate, data of images that include the tracking target 18, or of images of a certain place regardless of whether the tracking target 18 is present. The acquired image data is input to the tracking device 14, which performs tracking processing on the tracking target 18. The processing result is output as output data to the display device 16 under the control of the tracking device 14. The tracking device 14 may also serve as a computer that executes other functions, and may implement various functions by using the data obtained from the tracking processing, that is, the position information, shape information, and the like of the tracking target 18.

FIG. 6 shows the configuration of the tracking device 14 in the present embodiment in detail. The tracking device 14 includes an image acquisition unit 20 that acquires input image data from the imaging device 12, an image storage unit 24 that stores the input image data and other data necessary for tracking processing, an image processing unit 22 that generates edge images and the like from the input image data, a tracking target region detection unit 26 that detects the region of the tracking target, a tracking start/end determination unit 28 that determines the start and end of tracking, a tracking processing unit 30 that performs tracking processing using a particle filter, a result storage unit 36 that stores the final tracking result data, and an output control unit 40 that controls output of the tracking result to the display device 16.

Each element shown in FIG. 6 as a functional block that performs various processes can be implemented in hardware by a CPU, memory, and other LSIs, and in software by a program that performs image processing or the like. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and they are not limited to any one of these.

The image processing unit 22 extracts the contour of the tracking target. Specifically, it performs edge extraction on each image frame of the input image data stored in the image storage unit 24 and generates an edge image. A general edge extraction algorithm such as a Canny edge filter or a Sobel filter can be used here. The image processing unit 22 may also include a foreground extractor (not shown) that uses background subtraction. By extracting the foreground that contains the tracking target from the input image as preprocessing for edge extraction, the edges of the tracking target can be extracted efficiently.
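
As an illustration of the Sobel filter mentioned above, the following pure-Python sketch marks pixels whose gradient magnitude reaches a threshold. The function name, the list-of-lists image format, and the threshold value are assumptions made for the example; a practical system would more likely use an optimized image-processing library.

```python
def sobel_edges(img, threshold):
    """Return a binary edge map using 3x3 Sobel kernels.

    img is a list of rows of grayscale values. Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges

# A 5x6 image with a vertical step between columns 2 and 3.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_edges(img, threshold=128)
```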

The tracking target region detection unit 26 analyzes each image frame of the input image data stored in the image storage unit 24 and detects the region of the tracking target. For example, it includes a foreground extractor (not shown) that uses background subtraction, determines from the shape of the foreground extracted from the image frame whether a tracking target is present, and then detects its region. If the tracking target is a human head, face detection technology may additionally be applied. Alternatively, a color detector may detect, as the tracking target, a region whose color differs from the background color or a region with a specific color. The region of the tracking target may also be detected by pattern matching against a preset object shape.

In addition to the imaging device 12, the visual tracking system 10 may be provided with a temperature sensor that measures the heat distribution of the space being imaged, or a piezoelectric sensor that acquires the contact area of the tracking target two-dimensionally, and the region of the tracking target may be detected from the heat distribution or pressure distribution. Existing techniques can be applied to the detection of an object by a temperature sensor or a piezoelectric sensor.

The tracking start/end determination unit 28 determines whether to start or end tracking based on the result of the tracking target region detection by the tracking target region detection unit 26. The "end" here may include a temporary suspension of tracking due to occlusion or the like. Tracking starts when, for example, the tracking target appears within the viewing angle of the imaging device or emerges from behind an object, and ends when, for example, the tracking target leaves the viewing angle of the imaging device or moves behind an object. The determination result is notified to the tracking processing unit 30, which starts or ends its tracking processing accordingly.

The tracking processing unit 30 includes a sampling unit 42, an observation unit 48, and a result acquisition unit 50. The sampling unit 42 includes a shape prediction unit 44 that performs sampling in the space of the coefficient set α, and a shape space vector prediction unit 46 that performs sampling in the space of the shape space vector x. The shape prediction unit 44 generates and eliminates particles based on the probability density distribution in the space of the coefficient set α estimated for the image frame at the previous time t-1. It then distributes the particles according to a predetermined rule that depends on the shape each particle represents, as in the rock-paper-scissors example above.

The shape space vector prediction unit 46 generates and eliminates particles based on the probability density distribution in the space of the shape space vector x estimated for the image frame at the previous time t-1. It then applies a predetermined motion model to all particles to make them transition in that space. Through the processing of the shape prediction unit 44 and the shape space vector prediction unit 46, a plurality of candidate curves for the image frame at time t can be determined in consideration of shape change as well as translation, scaling, and rotation. The sampling unit 42 starts processing when it receives a signal indicating the start of tracking from the tracking start/end determination unit 28, and ends processing when it receives a signal indicating the end of tracking.
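
The generation and elimination of particles followed by a motion-model transition can be sketched as below. This is an illustrative simplification that assumes multinomial resampling and a Gaussian random-walk motion model; the function names `resample` and `drift` are hypothetical.

```python
import random

def resample(particles, weights, count, rng):
    """Generate particles in proportion to the previous probability density.

    Particles with high weight are duplicated; those with low weight vanish.
    """
    return [list(p) for p in rng.choices(particles, weights=weights, k=count)]

def drift(particles, sigma, rng):
    """Apply a Gaussian motion model: perturb every parameter independently."""
    return [[v + rng.gauss(0.0, sigma) for v in p] for p in particles]

rng = random.Random(1)
previous = [[0.0, 0.0], [5.0, 5.0]]   # two particles in a 2-D parameter space
density = [0.0, 1.0]                  # all probability mass on the second one
survivors = resample(previous, density, count=4, rng=rng)
candidates = drift(survivors, sigma=0.1, rng=rng)
```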

The observation unit 48 observes the likelihood of the candidate curve defined by each particle that the sampling unit 42 has generated, eliminated, or transitioned. As described above, the likelihood is determined by searching the edge image generated by the image processing unit 22 for edges near each candidate curve and estimating the distance to those edges for each candidate curve. Based on the likelihoods observed by the observation unit 48, the result acquisition unit 50 calculates a probability density distribution as shown in Equation 3 in both the space of the coefficient set α and the space of the shape space vector x, derives the tracking result, such as data of the curve obtained from the parameters averaged with those weights, and stores it in the result storage unit 36. It also returns the data to the sampling unit 42 for use in the tracking processing at the next time t+1. The data stored in the result storage unit 36 may be the weighted-average value of each parameter, an image consisting only of the curve determined by those values, or data of an image obtained by combining the curve with the input image.
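
The weighting and averaging step can be illustrated as follows. Equation 3 is not reproduced in this passage, so the Gaussian mapping from edge distance to likelihood used here is an assumption, as are the function names and the value of `sigma`.

```python
import math

def likelihood(distance, sigma=2.0):
    """Map the distance from a candidate curve to the nearest edge to a likelihood.

    Assumption: a Gaussian of the distance; smaller distance, higher likelihood.
    """
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

def normalize(weights):
    """Scale weights so that they sum to 1 and act as a probability density."""
    total = sum(weights)
    return [w / total for w in weights]

def weighted_mean(particles, weights):
    """Average each parameter over all particles using the normalized weights."""
    dims = len(particles[0])
    return [sum(w * p[i] for p, w in zip(particles, weights)) for i in range(dims)]

particles = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]   # candidate parameter sets
distances = [0.5, 3.0, 6.0]                        # edge distance per candidate
weights = normalize([likelihood(d) for d in distances])
estimate = weighted_mean(particles, weights)       # pulled toward the first candidate
```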

When there are a plurality of tracking targets, the result acquisition unit 50 may additionally track each target using a template prepared for it, and combine the individual tracking results into a single tracking result. It also detects from the tracking results cases in which tracking targets overlap, and takes measures such as removing the target hidden behind another from the tracking processing at a predetermined timing. This avoids outputting an inappropriate tracking result even if the observation likelihood drops temporarily because one tracking target has moved behind another.

By performing the above processing in the image processing unit 22 and the tracking processing unit 30 on each frame, moving image data that includes the tracking result, for example, is stored in the result storage unit 36. In this case, outputting the moving image data to the display device 16 under the control of the output control unit 40 makes it possible to display the contour line moving in the same way as the tracking target. As described above, besides being displayed as a moving image, the tracking result may be processed as appropriate for the purpose of tracking, for example by being output to another computation module.

Next, the operation of the tracking device 14 configured as described above will be explained. First, the imaging device 12 captures the imaging target at a predetermined frame rate in response to, for example, an instruction input from the user. The captured images are input as input image data to the image acquisition unit 20 of the tracking device 14 and stored in the image storage unit 24. The image storage unit 24 also stores the control point sequences that define the plurality of reference shapes; in the example above, these are three control point sequences defining "gu" (rock), "choki" (scissors), and "pa" (paper). The tracking processing described below is performed in this state.

FIG. 7 is a flowchart showing the procedure of the tracking processing in the present embodiment. First, the tracking target region detection unit 26 reads the input image data stored in the image storage unit 24 frame by frame and detects a region in which an object that can be a tracking target exists. Based on the result, the tracking start/end determination unit 28 determines whether to start tracking (S20, S22). For example, when an object with a predetermined size and shape that can be presumed to be a palm appears in the foreground extracted from an image frame, it determines that tracking should start. The foreground size and shape that serve as the criteria are determined in advance, either theoretically or experimentally.

S20 and S22 are repeated (N in S22) until it is determined that tracking should start. When that determination is made (Y in S22), the tracking processing unit 30 starts the tracking processing. Here, the time corresponding to the image frame for which the start of tracking was determined is t = 0, and subsequent image frames correspond to times t = 1, 2, 3, and so on. First, the sampling unit 42 requests the image processing unit 22 to generate an edge image, and the image processing unit 22 generates the edge image of the image frame at t = 0 (S24). At this point the sampling unit 42 may also request edge image generation for subsequent frames, and the image processing unit 22 may process them in order.

The shape prediction unit 44 of the sampling unit 42 then performs sampling, first by placing particles uniformly in a predetermined region of the space of the coefficient set α (S26). If the tracking target region detection unit 26 has detected, by template matching or the like, that the tracking target is in one of the reference shapes, the particles may instead be distributed locally within a predetermined range around the coefficient set that defines that reference shape. Next, the shape space vector prediction unit 46 performs sampling by placing particles uniformly in a predetermined region of the parameter space (S28). The observation unit 48 then observes the likelihood by matching the candidate curve defined by each particle against the edge image, and the result acquisition unit 50 applies Equation 3 to both the space of the coefficient set α and the space of the shape space vector x to calculate the initial value p(t=0) of the probability density distribution (S30).

The result acquisition unit 50 further determines, as the final shape and position of the tracking target at time t = 0, the curve obtained by averaging each parameter weighted by the probability density distribution p(t=0). It then generates the desired tracking result data, for example by combining the curve with the original input image frame, and stores the data in the result storage unit 36 (S32).

Meanwhile, the image processing unit 22 reads the image frame at time t = 1 from the image storage unit 24 and generates an edge image (N in S34, S24). The sampling unit 42 generates, in the space of the coefficient set α, a number of particles corresponding to the initial value p(t=0) of the probability density distribution, and distributes them according to the value of the coefficient set α (S26). It also generates particles in the space of the shape space vector x and makes each of them transition based on a predetermined motion model (S28). The number of particles to generate is controlled in consideration of the processing load, based on the amount of computing resources available to the tracking device 14, the required output rate of results, and so on. The distribution rule and the motion model are selected in advance, according to the type of tracking target, from among a Gaussian motion model, an autoregressive prediction motion model, and the like, so as to obtain high tracking accuracy.

The observation unit 48 then observes the likelihood of each candidate curve defined by the particles after the transition, and the probability density distribution p(t=1) at time t = 1 is obtained from the result (S30). The likelihood is observed by searching for contour lines near each candidate curve in the edge image at time t = 1 that the image processing unit 22 generated in S24. When there are a plurality of tracking targets, this processing is performed for all of them. The result acquisition unit 50 then determines, as the final shape and position of the tracking target at time t = 1, the curve obtained by averaging each parameter weighted by the probability density distribution p(t=1). It generates the desired tracking result data, for example by combining the curve with the original input image frame, and stores the data in the result storage unit 36 (S32).

The tracking start/end determination unit 28 determines whether to continue or end the tracking processing (S34). For example, it determines that tracking should end when no object with the predetermined size and shape that can be presumed to be a palm has appeared in the foreground for a predetermined time. It likewise determines that tracking should end when an occlusion state continues for a predetermined time, for example when one tracking target has moved behind another in real space. A situation in which the tracking target has remained outside the angle of view of the imaging device 12 for a predetermined time is also detected by the same technique as for occlusion, and leads to the same determination.
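
The "continues for a predetermined time" condition can be sketched as a counter of consecutive frames in which the target is not observed. The class name `EndJudge` and the frame-count threshold are hypothetical; the description does not specify how the duration is measured.

```python
class EndJudge:
    """Signal the end of tracking after the target has been missing too long."""

    def __init__(self, max_missing_frames):
        self.max_missing_frames = max_missing_frames
        self.missing = 0  # consecutive frames without the target

    def update(self, target_visible):
        """Record one frame and return True when tracking should end."""
        self.missing = 0 if target_visible else self.missing + 1
        return self.missing >= self.max_missing_frames

judge = EndJudge(max_missing_frames=3)
visible_per_frame = [True, False, False, True, False, False, False]
decisions = [judge.update(v) for v in visible_per_frame]
```

Resetting the counter whenever the target reappears keeps a brief occlusion from ending the tracking.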

If it is determined in S34 that the tracking processing should not end (N in S34), an edge image is generated from the image frame at time t = 2, the particles are manipulated using the probability density distribution p(t=1) at time t = 1 obtained in S32, and likelihood observation, probability density distribution calculation, and tracking result data generation are performed for the frame at time t = 2 (S24 to S32). Thereafter, the processing from S24 to S32 is repeated for each frame until the tracking start/end determination unit 28 determines in S34 that tracking should end (Y in S34). As a result, moving image data in which the contour line given by the tracking result changes over time with the same shape and movement as, for example, a hand playing rock-paper-scissors is stored in the result storage unit 36. The output control unit 40 outputs this data to the display device 16, to a module that provides another function, or the like, so that the user can use the tracking result in the desired form.

The description so far has mainly dealt with representing the reference shapes of a palm by B-spline curves, but the tracking target is not limited to a palm. The same processing can be applied to anything whose shape changes, such as an entire human body, an animal, or an object. Furthermore, the method of representing the curves or straight lines that describe the shape of the tracking target, and the parameters that define the shape, are not limited to B-spline curves, control points, and the like.

As described above, the present embodiment enables visual tracking that can handle changes in the shape of the tracking target. Being able to handle shape changes means that the shape of an object can be recognized. In the course of the calculation, the distribution of the coefficient set α that defines the shape in the next image frame is predicted, by means of a transition model, from the coefficient set α that defines the shape in the previous image frame. In other words, the method not only recognizes the shape of the object in the image frame at the current time but also predicts its shape in subsequent image frames.

This feature makes it possible to detect the movement of a user in front of a camera in real time while minimizing the delay caused by the various processes, and thus to provide a highly responsive user interface. For example, when the user moves a virtual person drawn on the screen in step with the movement of his or her own body, or operates a remotely controlled robot hand, the time from information input to result output can be reduced.

The description above gave an example in which the output control unit 40 combines the contour line of the tracking target obtained from the tracking processing with the input image to generate a moving image in which the contour line moves in the same way as the tracking target. As described above, the present embodiment can trace the contour line of the tracking target accurately regardless of whether its shape changes. This feature can be used not only to display the contour line but also to apply various visual effects to the region of the object in the image or to the region outside the object. Examples are described below.

For example, when the contour line of a hand is acquired by the tracking processing, the positions of the fingers from the thumb to the little finger and the positions of the nails on each finger can be approximately identified. Here, a "position" may be the position of a point such as a feature point, or the position of a surface with a finite area. In a configuration in which an image of the user's hand is captured and shown on a display device, combining an image of a nail decorated with nail art at the position of a nail, or an image of a ring at the base of a desired finger, allows the user to try nail art or try on a ring virtually.

Because the tracking device 14 can derive the contour line in response to the movement and shape changes of the hand, the hand does not need to be in a predetermined position or state. Even if the orientation, size, and so on of the nails change with the orientation, size, and depth position of the hand, the prepared image can be deformed accordingly so that the nail art or ring fits the actual hand, which increases realism. Furthermore, since the tilt of the hand can be estimated from the movement of the contour line, changing the combined image according to the tilt relative to the camera, such as a front or side view, also lets the user check shading, light reflection, and the like.

FIG. 8 shows the configuration of an image processing device that performs image modification using the contour line acquired by the tracking processing. The image processing device 70 includes the tracking device 14, which acquires the contour line of an object, an input unit 72 that accepts instruction input from the user, a part specifying unit 74 that identifies the position of a predetermined part of the object, a modification processing unit 76 that applies predetermined image processing based on the position information of the predetermined part, an output unit 78 that outputs the result of the image processing, and a modification data storage unit 80 that stores the data used for the image processing.

The tracking device 14 can have the same configuration as the tracking device 14 shown in FIG. 6. Some parts of interest, such as a head, may not change shape, in which case the processing of the shape prediction unit 44 and the like may be omitted as appropriate. Conversely, when a wide variety of shape changes can be expected, as with a hand, tracking processing that handles those shapes should be enabled. Even in this case, defining the shape of the object as a linear sum of the parameters that define the reference shapes, as described above, allows a wide range of shapes to be represented with only a small number of prepared reference shapes. For a hand, for example, preparing as reference shapes the five shapes in which one of the five fingers is raised and the remaining four are folded makes it possible to represent hands with anywhere from one to five fingers raised.
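
The linear sum of reference shapes can be illustrated with a short sketch. Equation 4 is not reproduced in this passage, so the form used here, a coefficient-weighted sum of corresponding control points, is an assumption, and the two tiny "shapes" are placeholders rather than real hand contours.

```python
def blend_shapes(reference_shapes, alpha):
    """Blend reference control point sequences with the coefficient set alpha.

    Each reference shape is a list of (x, y) control points; all shapes must
    have the same number of points, and point i of the result is the
    alpha-weighted sum of point i of every reference shape.
    """
    n_points = len(reference_shapes[0])
    return [
        (sum(a * shape[i][0] for a, shape in zip(alpha, reference_shapes)),
         sum(a * shape[i][1] for a, shape in zip(alpha, reference_shapes)))
        for i in range(n_points)
    ]

# Two placeholder reference shapes with two control points each.
shape_a = [(0.0, 0.0), (2.0, 0.0)]
shape_b = [(0.0, 2.0), (2.0, 4.0)]
halfway = blend_shapes([shape_a, shape_b], alpha=[0.5, 0.5])
```

With five single-finger reference shapes, the same function would take a five-element coefficient set.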

The images to be processed are assumed to be those stored in the image storage unit 24 of the tracking device 14, but image data input to the image processing device 70 from a separately provided imaging device may instead be tracked in real time and then processed. The input unit 72 is an interface through which the user instructs the image processing device 70 to start and end processing and selects the content of the modification. The input unit 72 may be a general input device such as a keyboard, mouse, trackball, button, or touch panel, and may be combined with a display device that shows the options available for input.

The part specifying unit 74 acquires from the tracking device 14 the data of the curve representing the contour line of the object, which is the tracking result, and identifies the position of the target part, such as a nail or a finger. The target part may be determined by the user selecting it and entering the selection through the input unit 72, or it may be set in advance. In either case, information on the positional relationship between the contour line obtained from the tracking device 14 and the target part is stored in the modification data storage unit 80. In the nail art example above, the position of a nail is identified by setting in advance a rule that derives the nail region from the point on the hand contour that indicates the fingertip, the thickness of the fingertip, and so on. The part specifying unit 74 also identifies the tilt of the object, or of the target part, from the contour line.

FIG. 9 is a diagram for explaining an example of how the part specifying unit 74 identifies the tilt of an object. In the figure, state 82 shows the object 86 viewed from the front, and state 84 shows it after rotation from state 82 by an angle θ about the rotation axis 88. If W is the width of the object in the direction perpendicular to the rotation axis 88, the apparent width in state 82 is also W, as shown in the figure. In state 84, on the other hand, the width of the object appears to be W cos θ. Therefore, if a front image of the object is captured first as a calibration image, for example, the rotation angle can be obtained from the apparent width using the relationship in FIG. 9. The same applies to the tilt of the target part. The direction of the tilt is determined using information that can be obtained from the contour line, such as the position of the thumb, as appropriate. Since the present embodiment traces the movement of the contour line frame by frame, the rotation axis can easily be obtained once the movement of the object over a predetermined number of frames has been acquired. The temporal change of the rotation angle may also be obtained from this movement and used to estimate the tilt in the immediately following frame.
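
The relationship between apparent width and rotation angle can be worked through in a few lines. The function name is hypothetical, and clamping the ratio to the range 0 to 1 is an added safeguard against measurement noise, not something stated in the description.

```python
import math

def rotation_angle(apparent_width, frontal_width):
    """Recover the rotation angle (radians) from apparent width = W * cos(theta).

    frontal_width is W, measured from the calibration image taken from the front.
    The sign (direction) of the tilt cannot be recovered from the width alone.
    """
    ratio = apparent_width / frontal_width
    ratio = max(0.0, min(1.0, ratio))  # guard against noisy measurements
    return math.acos(ratio)

theta_front = rotation_angle(100.0, 100.0)   # no rotation
theta_half = rotation_angle(50.0, 100.0)     # width halved: 60 degrees
```

Because the cosine is symmetric, the direction of the tilt has to come from other cues, such as the position of the thumb mentioned above.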

Returning to FIG. 8, the processing unit 76 applies predetermined processing to the target part identified by the part specifying unit 74. The content of the processing may be selected by the user and entered through the input unit 72, may be set in advance, or may be a combination of the two. For example, options such as nail art colors and patterns are shown on the display device, and the user's selection is accepted. The selected nail art image is then read from the processed data storage unit 80 and displayed over the nail portions of the input image of the user's hand. For this purpose, the processed data storage unit 80 stores the image data needed for processing, such as 3D graphics data including texture data and shape data of the images to be composited, for example nails.

Because the part specifying unit 74 also determines the inclination of the target part, the processing unit 76 changes the image to be composited according to that inclination. It not only changes the inclination of the composited image but also renders the changes in shading and light reflection that accompany the movement. When composited images overlap each other, for example because target parts overlap, the part that lies behind is identified from the temporal change of the parts and the contour line, and the hidden portion of the composited image corresponding to the rear part is erased. Commonly used 3D graphics techniques such as shading and hidden surface removal can be applied to these processes as appropriate. Furthermore, because the contour line obtained in this embodiment can follow any shape the object takes, no processing is applied when the target part is not visible on the screen. For example, if the hand forms "scissors" (the rock-paper-scissors gesture) with the back of the hand facing the camera, the nail image is overlaid only on the nails of the index and middle fingers.

The output unit 78 displays the image obtained as a result of the processing by the processing unit 76, or stores it as moving image data. The output unit 78 therefore consists of a display device or a storage device such as a hard disk drive. When it is a display device, it may be the same display device as that of the input unit 72.

次に、上記の構成による画像処理装置７０の動作を説明する。図１０は画像処理装置７０が行う画像加工の処理手順を示すフローチャートである。まずユーザは、入力部７２に対し加工処理の開始指示や処理内容の選択に係る入力を行う（Ｓ４０）。処理開始の指示入力の後、表示装置に表示したネイルから好みの物を選択するなど、多段階の入力態様としてもよい。また、別のネイルを選択し直すなど処理内容の変更は、後の処理の間でも随時受け付けてよい。 Next, the operation of the image processing apparatus 70 configured as described above will be described. FIG. 10 is a flowchart showing a processing procedure of image processing performed by the image processing apparatus 70. First, the user inputs to the input unit 72 an instruction to start processing and selection of processing contents (S40). A multi-stage input mode may be employed, such as selecting a favorite item from the nail displayed on the display device after inputting a processing start instruction. In addition, a change in processing content such as reselecting another nail may be accepted at any time during the subsequent processing.

The tracking device 14 then acquires an image of the object at time t (S42) and obtains the contour line of the object by performing tracking processing (S44). As described above, the image of the object may be captured and acquired in real time while the user places an object such as their own hand at a predetermined location, or image frames of a previously captured moving image may be read from the image storage unit 24.

Next, the part specifying unit 74 determines, as described above, the position and inclination of the part relevant to the processing content from the contour line data acquired from the tracking device 14 (S46). It then sends this information, together with the image of the object, to the processing unit 76. The processing unit 76 generates a processed image by applying the processing selected by the user in S40 based on the information on the target part (S48). The output unit 78 performs output processing such as displaying the generated processed image (S50). As long as the user has not entered an instruction to end processing through the input unit 72 (N in S52), the time t is incremented (S54) and the processing from S42 to S50 is repeated for each image frame. When the user enters an instruction to end, the processing terminates (Y in S52).
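The control flow of steps S42 to S54 can be sketched as a simple per-frame loop. This is only an illustration: the callables passed in (`track`, `locate`, `process`, `emit`, `stop_requested`) are hypothetical stand-ins for the tracking device 14, the part specifying unit 74, the processing unit 76, the output unit 78, and the end instruction from the input unit 72. None of these names come from the patent.

```python
def run_processing_loop(frames, track, locate, process, emit, stop_requested):
    """Run the S42-S54 loop over an iterable of image frames.

    All callables are hypothetical placeholders for the units in FIG. 8.
    Returns the number of frames that were processed.
    """
    processed_count = 0
    for frame in frames:                # S42: acquire the frame at time t
        contour = track(frame)          # S44: obtain the contour line
        part = locate(contour)          # S46: position and inclination of the part
        emit(process(frame, part))      # S48: process, S50: output
        processed_count += 1
        if stop_requested():            # S52: end instruction received?
            break
        # S54: advancing to the next frame is the time increment
    return processed_count
```

A change of processing content during the loop, which the description allows, could be modelled by letting `process` consult the current user selection each time it is called.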

このような動作により、陰影や反射光の変化、オクルージョンなども考慮し、対象物の動きに追随した画像の加工が可能となる。これまでの説明では対象物を手とし、ネイルアートの試し塗りを仮想空間で行う態様を主たる例としてきたが、本実施の形態はその他、多くの応用例を実現することが可能である。以下、画像処理装置７０によって実現できる応用例を説明する。 By such an operation, it is possible to process an image that follows the movement of the object in consideration of shadows, changes in reflected light, occlusion, and the like. In the description so far, the main example has been an aspect in which the object is a hand and trial painting of nail art is performed in a virtual space, but the present embodiment can realize many other application examples. Hereinafter, application examples that can be realized by the image processing apparatus 70 will be described.

FIG. 11 shows an example of the screen displayed on the display device of the output unit 78 when the image processing apparatus 70 is used to try on clothes in a virtual space. The virtual try-on screen 90 includes a try-on image display area 92 and a clothes image display area 94. In this mode, the user first stands in front of the imaging device so that their whole body is within the angle of view. The image of the user's whole body acquired by the imaging device is displayed in the try-on image display area 92 of the virtual try-on screen 90 on the display device. If the imaging device is placed facing the same direction as the display device, the user can see an image of their whole body captured from the front.

洋服画像表示領域９４には、試着対象として選択できる洋服の画像を一覧表示する。例えば、ネットワークを介して洋服の受注を行う服飾店やオークションの出品者が自らの商品を画像として準備する。画像処理装置７０は、ユーザからの指示入力に従い当該画像をネットワークを介して取得して洋服画像表示領域９４に表示する。そして入力部７２を、ユーザの手元で仮想試着画面９０内に表示したポインタ９６を操作できるコントローラとすると、ユーザは当該コントローラを操作して、洋服画像表示領域９４から試着したい洋服をポインタ９６で選択することができる。 The clothes image display area 94 displays a list of clothes images that can be selected as try-on objects. For example, a clothing store or an auction exhibitor who receives an order for clothes via a network prepares their products as images. The image processing apparatus 70 acquires the image via the network in accordance with an instruction input from the user, and displays it in the clothes image display area 94. If the input unit 72 is a controller that can operate the pointer 96 displayed in the virtual try-on screen 90 at the user's hand, the user operates the controller to select a clothes to be tried on from the clothes image display area 94 with the pointer 96. can do.

すると図１０に示した処理手順により、試着画像表示領域９２に表示されたユーザの体に、洋服画像表示領域９４から選択された洋服を合成した画像を生成することができる。当該画像を試着画像表示領域９２に表示すると、ユーザは、選択した洋服を試着した自分の姿を見ることができる。この態様において追跡装置１４は、Ω型のテンプレートを用いてユーザの頭部の輪郭を追跡する。頭部の場合は上述のとおりΩ型のテンプレートの並進、伸縮、回転で追跡可能であるため、形状予測部４４の処理は省略してよい。 Then, according to the processing procedure shown in FIG. 10, an image in which the clothes selected from the clothes image display area 94 are combined with the user's body displayed in the try-on image display area 92 can be generated. When the image is displayed in the try-on image display area 92, the user can see his / her appearance after trying on the selected clothes. In this embodiment, the tracking device 14 tracks the contour of the user's head using an Ω-type template. In the case of the head, as described above, it can be traced by translation, expansion / contraction, and rotation of the Ω-type template, so the processing of the shape prediction unit 44 may be omitted.

The part specifying unit 74 then determines the position and size of the shoulder line within the Ω-shaped head contour output by the tracking device 14. The processing unit 76 overlays the image of the selected clothes on the image of the user so that the shoulder line of the clothes coincides with the identified shoulder line of the user. Repeating this for the image frame at each time makes the composited clothes follow the user's movements, so that it looks just as if the user were wearing the clothes and moving.

The user need not face the imaging device directly. Even when the user turns sideways or rotates, the part specifying unit 74 detects the orientation of the user's body based on the principle shown in FIG. 9, and the image of the clothes is rotated accordingly. To this end, images of the clothes photographed from a plurality of predetermined angles are stored in the processed data storage unit 80. Other angles are interpolated using known 3D graphics techniques. Whether the user's body has turned to the right or to the left may be estimated from the movement since the previous image frame as described above, or may be determined from the orientation of the face by introducing an existing face detection technique.

なお図１１の例は、ユーザが撮像装置に対してほぼ後ろを向いた状態を示している。撮像装置と表示装置を同じ方向に設置した場合、この瞬間において当該ユーザは表示装置の仮想試着画面９０を見ることができない。そこで加工処理部７６は、ユーザが後ろを向いた状態を検出し、そのときに生成した加工画像は、例えば数秒単位の所定時間、表示を遅延させるように制御してもよい。ユーザが後ろを向いた状態は、ユーザの輪郭線の肩のラインの幅の時間変化や、顔検出処理において顔が検出されなかったことなどに基づき検出する。こうすることによりユーザは、洋服を試着した自分の後ろ姿を確認することができる。 Note that the example in FIG. 11 illustrates a state in which the user is facing substantially rearward with respect to the imaging apparatus. When the imaging device and the display device are installed in the same direction, the user cannot see the virtual try-on screen 90 of the display device at this moment. Therefore, the processing unit 76 may detect a state in which the user is facing backward, and the processed image generated at that time may be controlled to delay display for a predetermined time in units of several seconds, for example. The state in which the user is facing backward is detected based on a temporal change in the width of the shoulder line of the user's contour line or the fact that no face is detected in the face detection process. By doing so, the user can confirm the back of the user who tried on the clothes.

Furthermore, when the processing unit 76 detects from the temporal change in the width of the shoulder line or the like that the user is turning, it may render the clothes being tried on changing shape according to the speed of rotation. For example, the hem of a skirt may flare out or a blouse may billow. If tables associating rotation speed with the degree of shape change are prepared according to the stiffness of the fabric, the shape of the garment, and so on, shape changes that depend on the rotation speed can be applied with common 3D graphics techniques. This lets the user check how the clothes look with a feel closer to reality.
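A table that associates rotation speed with the degree of shape change could be looked up as follows. This is a minimal sketch under stated assumptions: the table values, the units (degrees per second, a unitless degree between 0 and 1), and the use of linear interpolation between table entries are all illustrative choices, since the description only says that such a table is prepared per fabric and garment.

```python
from bisect import bisect_right


def deformation_degree(speed, table):
    """Look up the degree of shape change for a rotation speed.

    table is a list of (rotation_speed, degree) pairs sorted by speed.
    Values between entries are linearly interpolated (an assumption);
    values outside the table are clamped to the first or last entry.
    """
    speeds = [s for s, _ in table]
    if speed <= speeds[0]:
        return table[0][1]
    if speed >= speeds[-1]:
        return table[-1][1]
    i = bisect_right(speeds, speed)
    (s0, d0), (s1, d1) = table[i - 1], table[i]
    return d0 + (d1 - d0) * (speed - s0) / (s1 - s0)


# Hypothetical table for a light skirt fabric.
SKIRT_TABLE = [(0.0, 0.0), (90.0, 0.3), (180.0, 1.0)]
flare = deformation_degree(135.0, SKIRT_TABLE)
```

A stiffer fabric would simply use a different table with smaller degrees at the same speeds.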

画像処理装置７０によって実現できる別の応用例としてモザイク処理がある。例えばあらかじめ撮影された人物のビデオ画像に対し、人物の頭部のみにモザイク処理を施すことができる。この場合も追跡装置１４は、Ω型のテンプレートを用いて人物の頭部の追跡処理を行い、輪郭線を取得する。部位特定部７４は例えば、Ω型の輪郭線および端点を結んだ線分で囲まれた領域を頭部の領域として特定する。加工処理部７６は、特定した領域に対しモザイク処理を施す。これを各時刻の画像フレームに対して繰り返すことにより、人物の動きに追随してモザイク処理を施した動画像を生成することができる。 Another application example that can be realized by the image processing apparatus 70 is mosaic processing. For example, a mosaic process can be applied only to the head of a person on a video image of the person photographed in advance. Also in this case, the tracking device 14 performs tracking processing of a person's head using an Ω-type template and acquires a contour line. The part specifying unit 74 specifies, for example, an area surrounded by an Ω-shaped outline and a line segment connecting the end points as a head area. The processing unit 76 performs mosaic processing on the identified area. By repeating this for image frames at each time, it is possible to generate a moving image that has been subjected to mosaic processing following the movement of a person.
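The mosaic step can be illustrated with a small pure-Python sketch. It assumes the head region is given as a polygon of contour points (the Ω-shaped outline, closed by the segment joining its two end points) and that the image is a grayscale list of rows; the block-averaging mosaic and the even-odd point-in-polygon test are generic techniques chosen for illustration, not details taken from the patent.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test; the wrap-around edge closes the contour's end points."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def mosaic_region(image, polygon, block=2):
    """Return a copy of image with block-averaged pixels inside polygon only."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Pixels of this block whose centres lie inside the head region.
            cells = [
                (x, y)
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
                if point_in_polygon(x + 0.5, y + 0.5, polygon)
            ]
            if not cells:
                continue
            mean = sum(image[y][x] for x, y in cells) / len(cells)
            for x, y in cells:
                out[y][x] = mean
    return out
```

Applying `mosaic_region` to every frame with that frame's tracked contour gives a mosaic that follows the head while leaving the surroundings untouched.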

The tracking device 14 always acquires the contour of the head regardless of the orientation of the person's face. The head region can therefore be identified even in cases that are difficult for face detection and similar methods, such as when the person turns sideways, looks down, or faces away. This avoids situations in which the mosaic disappears merely because face detection failed even though the person could still be identified, for example from the back of the head, and situations in which the mosaic must be applied constantly to unnecessary areas, including the region around the person, to keep it from disappearing. As a result, information about the person's appearance can be concealed reliably while the necessary information in the image, such as the person's surroundings, is preserved.

Yet another application that can be realized by the image processing apparatus 70 is the display of information about objects in an image. As an example, FIG. 12 shows a screen that displays information about players during a soccer match. The player information display screen 120 in the figure is, for example, a live broadcast of a match, and three players 122, 126, and 130 are within the angle of view of the imaging device. Above the heads of players 122 and 126, images of information tags 124 and 128 have been added; each tag consists of an arrow pointing to the player and an area showing information about that player, such as name, uniform number, and number of shots taken today. As shown in the figure, the sizes of the information tags 124 and 128 vary according to the distance of the player from the imaging device.

この場合、追跡装置１４はΩ型のテンプレートを用いて試合中の映像における選手の頭部の追跡処理を行い、輪郭線を取得する。部位特定部７４はΩ型の輪郭線の頂点を頭頂部として特定するとともに、輪郭線の大きさを取得する。加工処理部７６は、あらかじめ設定しておいた、輪郭線の大きさと情報タグの大きさとの対応関係に基づき、情報タグの大きさを決定する。そして、あらかじめ準備した各選手の情報を加工データ記憶部８０から読み出して情報タグの画像を生成し、各選手の頭頂部に矢印の先が向くようにして試合中の映像に重ねて表示する。 In this case, the tracking device 14 performs tracking processing of the player's head in the video during the match using the Ω-type template, and acquires the contour line. The part specifying unit 74 specifies the apex of the Ω-type outline as the top of the head, and acquires the size of the outline. The processing unit 76 determines the size of the information tag based on the correspondence relationship between the size of the contour line and the size of the information tag that is set in advance. Then, information on each player prepared in advance is read from the processed data storage unit 80 to generate an information tag image, which is displayed superimposed on the video during the game with the arrow pointing to the top of each player.

The information tags are preferably displayed so that they do not cover other players. To this end, the part specifying unit 74 may also identify regions where no other player is present, based on the head contour information of the players within the angle of view, and the processing unit 76 may display the information tags in those regions. Repeating this processing for the image frame at each time produces match video in which information tags follow the players' movements.

Varying the size of an information tag according to the distance from the imaging device to the player gives the tags a sense of depth as well. In addition, the tags do not become cluttered even when many players are within the angle of view, and it is easier to tell which tag belongs to which player. When the tracking device 14 detects that several players overlap, the processing unit 76 may overlap the information tags in the same way, so that the tag of the player behind is partly hidden by the tag of the player in front.

An upper limit, a lower limit, or both may be set on the size of the information tags that are displayed. In the example of FIG. 12, no information tag is displayed for player 130, who is farthest away, because the tag size would fall below the lower limit. Setting lower and upper limits on tag size prevents the display of tags so small that the text is illegible and tags so large that they cover much of the image, so the image always remains easy to view.
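The sizing rule for information tags can be sketched as below. The proportional mapping from contour size to tag size and the parameter names are assumptions made for illustration; the description only states that a correspondence is set in advance. Hiding the tag (returning `None`) below the lower limit matches the FIG. 12 example, while hiding it above the upper limit is one possible reading of the text.

```python
def tag_size(contour_size, scale, min_size=None, max_size=None):
    """Return the information tag size for a head contour, or None to hide it.

    contour_size: size of the tracked head contour (e.g. its width in pixels).
    scale: hypothetical proportionality factor between contour and tag size.
    min_size / max_size: optional display limits; either may be omitted.
    """
    size = contour_size * scale
    if min_size is not None and size < min_size:
        return None  # too small to read, as for player 130 in FIG. 12
    if max_size is not None and size > max_size:
        return None  # would cover too much of the image (assumed behaviour)
    return size


near_player = tag_size(40, 0.5, min_size=8, max_size=30)
far_player = tag_size(10, 0.5, min_size=8, max_size=30)   # hidden
```

An implementation could instead clamp oversized tags to `max_size`; which behaviour is preferable depends on the desired presentation.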

Displaying information tags in this way makes it easier to identify players in sports played by many people over a wide area, such as soccer and marathon running, and lets the viewer easily take in information on each player while watching the state of the match and the players' movements. Display of the information tags may be switched on and off by a user instruction entered through the input unit 72. Information tags can be used not only for sports video but also to display information on characters and actors in dramas, on products appearing in moving images, and so on. They may also be used to display information on people and objects in a virtual space rendered with computer graphics, not only in live-action footage.

以上述べた本実施の形態によれば、追跡対象の形状を、あらかじめ用意した複数の基準形状を表すＢスプライン曲線を定義する制御点列の線形和で表現する。そして各制御点列にかかる係数で構成される係数セットを、パーティクルを定義するパラメータに含める。これにより、一のテンプレート形状の並進、伸縮、回転にのみ対応可能であったCondensationアルゴリズムを、追跡対象の形状そのものが変化する環境において適用することができる。 According to the present embodiment described above, the shape to be tracked is expressed by a linear sum of control point sequences that define B-spline curves representing a plurality of reference shapes prepared in advance. Then, a coefficient set composed of coefficients related to each control point sequence is included in the parameters defining the particles. As a result, the Condensation algorithm that can only handle translation, expansion, contraction, and rotation of one template shape can be applied in an environment where the shape of the tracking target itself changes.
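The linear-sum representation summarized above can be sketched in a few lines. The sketch assumes that every reference shape is given as a control point sequence of the same length, so that the sequences can be combined point by point; the function name and the two toy shapes are illustrative only.

```python
def blend_control_points(reference_shapes, coefficients):
    """Return the control point sequence sum_i coefficients[i] * reference_shapes[i].

    reference_shapes: list of control point sequences, each a list of (x, y).
    coefficients: the coefficient set, one coefficient per reference shape.
    """
    if len(reference_shapes) != len(coefficients):
        raise ValueError("need exactly one coefficient per reference shape")
    n_points = len(reference_shapes[0])
    if any(len(shape) != n_points for shape in reference_shapes):
        raise ValueError("all control point sequences must have the same length")
    blended = []
    for k in range(n_points):
        x = sum(c * shape[k][0] for c, shape in zip(coefficients, reference_shapes))
        y = sum(c * shape[k][1] for c, shape in zip(coefficients, reference_shapes))
        blended.append((x, y))
    return blended


# Two toy reference shapes; equal coefficients give the shape halfway between them.
shape_a = [(0.0, 0.0), (2.0, 0.0)]
shape_b = [(0.0, 2.0), (2.0, 2.0)]
halfway = blend_control_points([shape_a, shape_b], [0.5, 0.5])
```

The blended control points then define a B-spline curve in the usual way, and the coefficient set is what each particle carries in addition to its shape space vector.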

Moreover, because every shape intermediate between the reference shapes can be expressed by adjusting the coefficient set, the memory required is greatly reduced compared with preparing all the shapes the object can take, and the number of parameters used in the computation is also reduced. Since the coefficient set can be handled in the same way as the shape space vector, the conventional algorithm can be used as is, which preserves the advantages of tracking with a particle filter without increasing the amount of computation.

さらに係数セットの空間における遷移モデルを導入することにより、直後の形状を予測し、当該形状を定義する係数セットの近傍にパーティクルを分布させる。これにより、パーティクルの数を増大させずに、効率的かつ精度よく追跡処理を遂行することができる。一般的には、形状認識と追跡処理は別個の処理とされるが、パーティクルという概念でそれらを結びつけることができ、簡素なアルゴリズムで同時処理が可能となる。 Further, by introducing a transition model in the space of the coefficient set, the shape immediately after is predicted, and the particles are distributed in the vicinity of the coefficient set defining the shape. Thereby, the tracking process can be performed efficiently and accurately without increasing the number of particles. In general, shape recognition and tracking processing are separate processes, but they can be connected by the concept of particles, and simultaneous processing is possible with a simple algorithm.
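One step of particle handling in the coefficient set space can be sketched as follows. The sketch assumes the simplest transition, a Gaussian spread around each particle's previous coefficients, and a weighted average as the final estimate; a shape-specific transition model as described above would replace the Gaussian step. Function names, `sigma`, and the use of Python's `random` module are illustrative assumptions.

```python
import random


def resample_and_diffuse(particles, weights, sigma, rng):
    """Resample coefficient sets by weight, then perturb each with Gaussian noise.

    particles: list of coefficient sets (tuples of floats).
    weights: non-negative likelihood-based weights with a positive sum.
    sigma: standard deviation of the assumed Gaussian transition.
    rng: a random.Random instance, passed in for reproducibility.
    """
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    return [tuple(rng.gauss(c, sigma) for c in p) for p in chosen]


def weighted_mean(particles, weights):
    """Return the weighted average coefficient set (the tracking estimate)."""
    total = sum(weights)
    dim = len(particles[0])
    return tuple(
        sum(w * p[i] for p, w in zip(particles, weights)) / total
        for i in range(dim)
    )


rng = random.Random(0)
particles = [(1.0, 0.0), (0.0, 1.0)]
weights = [3.0, 1.0]
next_particles = resample_and_diffuse(particles, weights, sigma=0.05, rng=rng)
estimate = weighted_mean(particles, weights)
```

Particles with high weight are duplicated and those with low weight tend to vanish, which is how the particle set concentrates near likely shapes without increasing its size.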

Setting a shape transition model and distributing particles based on it amounts to predicting the shape of the object. This makes it possible, for example, to anticipate the hand a user is about to play in rock-paper-scissors, and to realize an interface that responds well to the user's movements. It can be applied not only to the user interfaces of ordinary information processing apparatuses but also to remotely operated robots, medical instruments, and the like.

Furthermore, various functions can be provided by accurately acquiring the contour line of an object that undergoes at least one of shape change, translation, expansion or contraction, and rotation, and using that information to process the image. Specifically, trial application of nail art, trying on rings and clothes, mosaic processing, the addition of information tags, and so on can be performed. Conventionally, cutting out the contour of an object in an image required a person to check and cut out the image frames one by one, and the cost of that work was enormous for moving images in particular. In this embodiment, the contour line can be acquired accurately and easily even for moving images. In addition, unlike chroma key compositing with a blue or green screen, or face detection techniques, no special conditions are imposed on the input image.

As a result, in addition to acquiring the contour line, processing that follows the movement of the object can be applied easily and with less computation than conventional methods. Because the inclination and overlap of objects can also be detected, the region to be processed and the shape of the composited image can be changed, and graphics processing such as shading and hidden surface removal can additionally be applied, so the virtual space can be rendered more realistically. Moreover, because the regions where the object is and is not present can be identified as the object moves, processing can be applied only to the object, or to a region chosen because it contains no object. Processed images that meet the user's needs in terms of design and information disclosure can thus be generated flexibly.

The present invention has been described above based on an embodiment. The embodiment is illustrative, and those skilled in the art will understand that various modifications to the combinations of its constituent elements and processes are possible, and that such modifications also fall within the scope of the present invention.

例えば本実施の形態では、主に追跡対象の輪郭線を推定するために、あらかじめ準備した基準形状を定義するパラメータの線形和で対象物の輪郭線を表現した。一方、この表現手法は、追跡対象の輪郭線の推定に限らず、対象物を描画する際の表現手法として広く適用することができる。例えば、三次元コンピュータグラフィックス上で使用されるポリゴンデータの生成などに用いてもよい。このような場合でも、表現可能な全ての形状のパラメータセットを準備しておく場合と比べて、格段に使用するメモリ量を少なくすることができる。 For example, in this embodiment, in order to mainly estimate the contour line of the tracking target, the contour line of the target object is expressed by a linear sum of parameters that define a reference shape prepared in advance. On the other hand, this expression method is not limited to the estimation of the contour line of the tracking target, but can be widely applied as an expression method for drawing an object. For example, it may be used for generating polygon data used on three-dimensional computer graphics. Even in such a case, the amount of memory to be used can be remarkably reduced as compared with a case where parameter sets of all shapes that can be expressed are prepared.

FIG. 1 is a diagram for explaining a visual tracking method in which a person is the tracking target.
FIG. 2 is a diagram for explaining a method of probability density estimation using a particle filter.
FIG. 3 is a diagram for explaining the values of the coefficient set and the transition model in this embodiment.
FIG. 4 is a diagram for explaining the values of the coefficient set and the transition model in this embodiment.
FIG. 5 is a diagram showing a configuration example of the visual tracking system in this embodiment.
FIG. 6 is a diagram showing the configuration of the tracking device in this embodiment in detail.
FIG. 7 is a flowchart showing the procedure of the tracking processing in this embodiment.
FIG. 8 is a diagram showing the configuration of an image processing apparatus that processes images using the contour line acquired by the tracking processing in this embodiment.
FIG. 9 is a diagram for explaining an example of the method by which the part specifying unit of this embodiment determines the inclination of an object.
FIG. 10 is a flowchart showing the procedure of the image processing performed by the image processing apparatus of this embodiment.
FIG. 11 is a diagram showing an example of the screen displayed on the display device when the image processing apparatus of this embodiment is used to try on clothes in a virtual space.
FIG. 12 is a diagram showing an example of a screen on which the image processing apparatus of this embodiment displays information about players during a soccer match.

Explanation of symbols

10 visual tracking system, 12 imaging device, 14 tracking device, 16 display device, 20 image acquisition unit, 22 image processing unit, 24 image storage unit, 26 tracking target area detection unit, 28 tracking start/end determination unit, 30 tracking processing unit, 30 observation unit, 36 result storage unit, 40 output control unit, 42 sampling unit, 44 shape prediction unit, 46 shape space vector prediction unit, 48 observation unit, 50 result acquisition unit, 70 image processing apparatus, 72 input unit, 74 part specifying unit, 76 processing unit, 78 output unit, 80 processed data storage unit, 90 virtual try-on screen, 92 try-on image display area, 94 clothes image display area, 120 player information display screen.

Claims

An image processing apparatus comprising:
a reference shape storage unit that stores a plurality of parameters defining the contour lines of a plurality of reference shapes; and
an object shape determination unit that determines a set of coefficients for the respective parameters in a linear sum of the plurality of parameters stored in the reference shape storage unit, and thereby expresses and outputs the contour shape of an object in an image as the linear sum.

2. The image processing apparatus according to claim 1, further comprising an image acquisition unit that acquires moving image stream data including a first image frame and a second image frame in which the object is captured,
wherein the object shape determination unit comprises:
a shape prediction unit that, in a coefficient set space defined by the set of coefficients, generates and eliminates particles used for a particle filter based on the estimated existence probability distribution of the object in the first image frame in that space, and causes the particles to transition based on a predetermined transition model;
an observation unit that observes the likelihood of each particle by matching the contour line of the object in the second image frame against the candidate contour defined by the particle; and
a contour line acquisition unit that calculates the estimated existence probability distribution of the object in the second image frame in the coefficient set space based on the likelihood observed by the observation unit, and estimates the contour shape of the object in the second image frame by weighting the coefficient set of each particle based on that estimated existence probability distribution.
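One frame-to-frame update of the kind recited in this claim might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the transition model is taken to be plain Gaussian diffusion, and `toy_likelihood` is a hypothetical stand-in for matching a candidate contour against the contour in the second image frame. All names and numeric values are invented for the example.

```python
import math
import random
from typing import Callable, List, Sequence

Coeffs = List[float]


def particle_filter_step(
    particles: Sequence[Coeffs],
    weights: Sequence[float],
    likelihood: Callable[[Coeffs], float],
    sigma: float,
    rng: random.Random,
) -> tuple:
    """Run one update in the coefficient set space; return (particles, weights, estimate)."""
    # 1. Generate and eliminate particles in proportion to the previous distribution.
    resampled = rng.choices(list(particles), weights=list(weights), k=len(particles))
    # 2. Transition: Gaussian diffusion, assumed here as the transition model.
    moved = [[c + rng.gauss(0.0, sigma) for c in p] for p in resampled]
    # 3. Observe the likelihood of each particle and normalize it into a distribution.
    scores = [likelihood(p) for p in moved]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("all particles have zero likelihood")
    new_weights = [s / total for s in scores]
    # 4. Estimate the coefficient set as the weighted average over all particles.
    dim = len(moved[0])
    estimate = [sum(w * p[d] for w, p in zip(new_weights, moved)) for d in range(dim)]
    return moved, new_weights, estimate


TARGET = [0.3, 0.7]  # hypothetical "true" coefficient set for the demo


def toy_likelihood(coeffs: Coeffs) -> float:
    # Stand-in for contour matching: peaks when the coefficients equal TARGET.
    d2 = sum((c - t) ** 2 for c, t in zip(coeffs, TARGET))
    return math.exp(-d2 / (2 * 0.1 ** 2))


rng = random.Random(0)
particles = [[rng.random(), rng.random()] for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(10):
    particles, weights, estimate = particle_filter_step(
        particles, weights, toy_likelihood, 0.05, rng
    )
```

The weighted average in step 4 corresponds to weighting the coefficient set of each particle by the estimated existence probability distribution; a real observation step would compute the likelihood from image data rather than from a known target.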

3. The image processing apparatus according to claim 1, wherein the parameter defining the contour line is the control point sequence used when the contour line is represented by a B-spline curve.
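How a control point sequence defines a contour line can be sketched with a closed uniform cubic B-spline. The claim only requires a B-spline curve, so the uniform cubic closed form, the function name `closed_cubic_bspline`, and the sample control points are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closed_cubic_bspline(
    control_points: Sequence[Point], samples_per_span: int = 8
) -> List[Point]:
    """Sample a closed uniform cubic B-spline defined by a control point sequence."""
    n = len(control_points)
    if n < 4:
        raise ValueError("a closed cubic B-spline needs at least four control points")
    curve = []
    for span in range(n):
        # Four consecutive control points, wrapping around to close the curve.
        p0, p1, p2, p3 = (control_points[(span + k) % n] for k in range(4))
        for s in range(samples_per_span):
            t = s / samples_per_span
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            curve.append((x, y))
    return curve


# Hypothetical control point sequence: the corners of a 4 x 4 square.
CONTROL = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
outline = closed_cubic_bspline(CONTROL)
```

Because every curve point is a fixed linear function of the control points, a linear sum of control point sequences yields the same curve as the corresponding linear sum of the reference curves.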

4. The image processing apparatus according to claim 1, wherein the parameter defining the contour line is the knot sequence used when the contour line is represented by a B-spline curve.

5. The image processing apparatus according to claim 3, further comprising a shape space vector prediction unit that, in a shape space vector space defined by a shape space vector specifying the translation amount, magnification, and rotation angle of the contour line defined by each particle, generates and eliminates the particles transitioned by the shape prediction unit based on the estimated existence probability distribution in that space, and causes them to transition based on a predetermined transition model,
wherein the observation unit observes the likelihood of the particles transitioned by the shape space vector prediction unit, and
the contour line acquisition unit further calculates the estimated existence probability distribution of the object in the second image frame in the shape space vector space based on the likelihood observed by the observation unit, and further estimates the translation amount, magnification, and rotation angle of the contour line of the object in the second image frame by weighting the shape space vector of each particle based on that estimated existence probability distribution.
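The effect of a shape space vector on a contour can be sketched as a similarity transform. The sketch assumes a single uniform magnification and a rotation about the origin; the parameterization used in the specification may differ, and the function name `apply_shape_space_vector` and its parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_shape_space_vector(
    points: Sequence[Point],
    shift_x: float,
    shift_y: float,
    scale: float,
    angle_rad: float,
) -> List[Point]:
    """Rotate, magnify, then translate every point of a contour."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    transformed = []
    for x, y in points:
        # Rotation about the origin followed by uniform magnification.
        xr = scale * (cos_a * x - sin_a * y)
        yr = scale * (sin_a * x + cos_a * y)
        # Translation moves the contour to its position in the image.
        transformed.append((xr + shift_x, yr + shift_y))
    return transformed


# A point at (1, 0) rotated by 90 degrees, doubled in size, then shifted by (2, 3).
moved = apply_shape_space_vector([(1.0, 0.0)], 2.0, 3.0, 2.0, math.pi / 2)
```

Under these assumptions, the coefficient set fixes the shape of the contour and the shape space vector fixes where, how large, and at what angle that shape appears in the frame.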

6. The image processing apparatus according to claim 2, wherein the shape prediction unit causes the particles generated and eliminated based on the estimated existence probability distribution of the object in the first image frame to transition so as to form a Gaussian distribution centered on their pre-transition coordinates in the coefficient set space.

7. The image processing apparatus according to claim 6, wherein, when the shape prediction unit detects, based on the pre-transition coordinates of a particle in the coefficient set space, that the shape defined by the particle lies between a first reference shape and a second reference shape, the shape prediction unit causes the particle to transition so as to form a Gaussian distribution whose standard deviation along the line connecting the coordinates representing the first reference shape and the coordinates representing the second reference shape in the coefficient set space is larger than its standard deviation in other directions.
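The direction-dependent Gaussian of this claim can be sketched by drawing isotropic noise and then replacing its component along the line between the two reference shapes with a wider draw. The function name, the standard deviations, and the two reference coordinates below are assumptions chosen for the example.

```python
import math
import random
from typing import List, Sequence


def transition_between_references(
    coords: Sequence[float],
    ref_a: Sequence[float],
    ref_b: Sequence[float],
    sigma_along: float,
    sigma_other: float,
    rng: random.Random,
) -> List[float]:
    """Move a particle with a larger spread along the line from ref_a to ref_b."""
    direction = [b - a for a, b in zip(ref_a, ref_b)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("the two reference shapes must have distinct coordinates")
    unit = [d / norm for d in direction]
    # Isotropic noise with the small standard deviation in every direction.
    noise = [rng.gauss(0.0, sigma_other) for _ in coords]
    # Component of that noise along the connecting line, to be replaced.
    along = sum(n * u for n, u in zip(noise, unit))
    # Wider draw along the connecting line.
    step = rng.gauss(0.0, sigma_along)
    return [c + n + (step - along) * u for c, n, u in zip(coords, noise, unit)]


# Hypothetical setup: two reference shapes at (1, 0) and (0, 1) in the coefficient
# set space, and a particle halfway between them.
REF_A, REF_B, START = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
rng = random.Random(0)
samples = [
    transition_between_references(START, REF_A, REF_B, 0.2, 0.02, rng)
    for _ in range(2000)
]
```

With these values the particles spread mainly along the segment joining the two reference shapes, which concentrates them on intermediate shapes and keeps them from drifting toward unrelated ones.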

8. The image processing apparatus according to claim 2, wherein, when the shape prediction unit detects, based on the pre-transition coordinates of a particle in the coefficient set space, that the shape defined by the particle is in a state regarded as a reference shape, the shape prediction unit distributes the particles on the assumption that transitions from that reference shape to each of the reference shapes to which a transition is possible are equally probable.
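The equal-probability assumption can be sketched by assigning each particle a destination reference shape drawn uniformly from those reachable from the current one. The transition graph `REACHABLE`, the shape labels, and the function name are hypothetical; the claim does not prescribe how reachability is stored.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical transition graph: the reference shapes reachable from each shape.
REACHABLE: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}


def assign_destinations(
    current: str,
    n_particles: int,
    reachable: Dict[str, List[str]],
    rng: random.Random,
) -> List[str]:
    """Pick, for each particle, the reference shape it will move toward."""
    targets = reachable[current]
    # Every reachable reference shape is chosen with the same probability.
    return [rng.choice(targets) for _ in range(n_particles)]


rng = random.Random(0)
destinations = assign_destinations("A", 3000, REACHABLE, rng)
counts = Counter(destinations)
```

Each particle would then be moved toward its assigned destination in the coefficient set space, so that roughly equal numbers of particles probe each possible next shape.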

9. An image processing method comprising:
reading, from a storage device, a plurality of parameters defining the contour lines of a plurality of reference shapes, and determining a set of coefficients, one for each parameter, in a linear sum of the parameters; and
expressing and outputting the contour line of an object in an image as the linear sum, using the determined set of coefficients.

10. The image processing method according to claim 9, further comprising acquiring moving image stream data including a first image frame and a second image frame in which the object is captured, and storing the data in a memory,
wherein the outputting step comprises:
predicting the contour line of the object in the second image frame based on the estimated existence probability distribution of the object in the first image frame in a coefficient set space defined by the set of coefficients, and obtaining the estimated existence probability distribution of the object in the second image frame by comparing the predicted contour line with the contour line of the object in the second image frame read from the memory; and
estimating the contour line of the object in the second image frame based on that estimated existence probability distribution, and storing the estimated contour line in the memory.

11. A computer program causing a computer to realize:
a function of reading, from a storage device, a plurality of parameters defining the contour lines of a plurality of reference shapes, and determining a set of coefficients, one for each parameter, in a linear sum of the parameters; and
a function of expressing and outputting the contour line of an object in an image as the linear sum, using the determined set of coefficients.

12. The computer program according to claim 11, further causing the computer to realize:
a function of acquiring moving image stream data including a first image frame and a second image frame in which the object is captured, and storing the data in a memory;
a function of predicting the contour line of the object in the second image frame based on the estimated existence probability distribution of the object in the first image frame in a coefficient set space defined by the set of coefficients, and obtaining the estimated existence probability distribution of the object in the second image frame by comparing the predicted contour line with the contour line of the object in the second image frame read from the memory; and
a function of estimating the contour line of the object in the second image frame based on that estimated existence probability distribution, and storing the estimated contour line in the memory.

13. A recording medium on which is recorded a computer program causing a computer to realize:
a function of reading, from a storage device, a plurality of parameters defining the contour lines of a plurality of reference shapes, and determining a set of coefficients, one for each parameter, in a linear sum of the parameters; and
a function of expressing and outputting the contour line of an object in an image as the linear sum, using the determined set of coefficients.

14. The recording medium according to claim 13, wherein the recorded computer program further causes the computer to realize:
a function of acquiring moving image stream data including a first image frame and a second image frame in which the object is captured, and storing the data in a memory;
a function of predicting the contour line of the object in the second image frame based on the estimated existence probability distribution of the object in the first image frame in a coefficient set space defined by the set of coefficients, and obtaining the estimated existence probability distribution of the object in the second image frame by comparing the predicted contour line with the contour line of the object in the second image frame read from the memory; and
a function of estimating the contour line of the object in the second image frame based on that estimated existence probability distribution, and storing the estimated contour line in the memory.