JP2009245314A

JP2009245314A - Identification system of time-series data, and apparatus of giving personal meta information to moving image

Info

Publication number: JP2009245314A
Application number: JP2008093028A
Authority: JP
Inventors: Kenji Matsuo; 賢治松尾; Masaki Naito; 正樹内藤; Kazunori Matsumoto; 一則松本
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2009-10-22
Anticipated expiration: 2028-03-31
Also published as: JP5061382B2

Abstract

PROBLEM TO BE SOLVED: To accurately identify which registered person or other object each input time-series data stream belongs to. SOLUTION: A plurality of time-series data streams are input to a comparison score calculating means 11. The comparison score calculating means 11, an LLR measuring section 14, and a threshold identifying section 16 perform hypothesis testing for each object and output an identification result indicating which object each time-series data stream belongs to. In accordance with the identification result, a state holding buffer 13 holds the hypothesis state (adopted, rejected, or under test) for each object. The comparison score calculating means 11 calculates a comparison score only for hypotheses under test. The identification result can be used for assigning person meta information. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a time-series data identification device and to a device for assigning person meta information to a moving image. In particular, it relates to a time-series data identification device that identifies which registered person or other object input time-series data belongs to, and to a device that uses it to assign person meta information to a moving image.

To make it possible to search for objects such as persons appearing in a moving image, meta information about the objects must be attached to the moving image frame by frame. This requires identifying which person or other object time-series data such as images or audio belongs to.

Non-Patent Document 1 proposes a method that uses the sequential probability ratio test (SPRT) to identify which registered object input time-series data belongs to.

Non-Patent Document 2 describes, in particular, a method that applies the SPRT to a speech time-series signal and identifies which registered speaker the input signal belongs to.

The SPRT is a sequential decision process that executes hypothesis tests sequentially along the time axis and stops testing once a conclusion is reached. It combines the idea of the sequential probability ratio test from hypothesis testing theory with concepts from decision theory.

The SPRT can perform identification while successively checking the trend of the input time-series data, without fixing in advance the time at which a conclusion must be drawn. It can therefore reach a reasonable conclusion efficiently, with little cost and effort.

FIG. 5 illustrates the operation of the conventional SPRT. Let LLR(H_1) and LLR(H_2) denote the log likelihood ratios (LLR: Logarithm of Likelihood Ratio) of hypotheses H_1 and H_2, which define objects 1 and 2, respectively. First, LLR(H_1) and LLR(H_2) are reset to zero at time t_0, and hypothesis tests are executed sequentially along the time axis on which the time-series data is input. As long as neither LLR(H_1) nor LLR(H_2) exceeds the upper threshold A, neither hypothesis is adopted.

Suppose that at time t_N1 LLR(H_1) exceeds the upper threshold A while LLR(H_2) does not. Hypothesis H_1 is then adopted at time t_N1, and the time-series data is identified as belonging to object 1. LLR(H_1) and LLR(H_2) are then reset to zero, and the sequential decision process is repeated.

Next, suppose that at time t_N2 LLR(H_2) exceeds the upper threshold A while LLR(H_1) does not. Hypothesis H_2 is then adopted at time t_N2, and this time the time-series data is identified as belonging to object 2. LLR(H_1) and LLR(H_2) are then reset to zero, and the sequential decision process is repeated.

Conversely, when LLR(H_1) or LLR(H_2) falls below the lower threshold B, hypothesis H_1 or H_2, respectively, is rejected. The LLR of a rejected hypothesis remains in the reset state until another hypothesis is adopted.

Non-Patent Document 1: Kazunori Matsumoto, Kazuo Hashimoto, "Communication Traffic Monitoring System Combining Locally Stationary Binomial Distribution Models," IEICE Transactions D, Vol. J84, No. 6, pp. 800-808, 2001.
Non-Patent Document 2: Takeshi Hamasaki, Hideki Noda, Eiji Kawaguchi, "Adaptive Speaker Identification Using Sequential Probability Ratio Test," IEICE Technical Report, PRMU, Vol. 99, No. 710, pp. 9-14, Mar. 2000.
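The conventional single-stream procedure described for FIG. 5 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the cited documents: the function name `conventional_sprt`, the per-step LLR increments passed in as dictionaries, and the threshold arguments `upper` (A) and `lower` (B) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def conventional_sprt(
    increments: List[Dict[int, float]], upper: float, lower: float
) -> List[Tuple[int, int]]:
    """Single-stream SPRT sketch.

    increments[t][h] is the (assumed, precomputed) LLR increment of
    hypothesis h at step t. Returns (step index, adopted hypothesis) pairs.
    """
    if not increments:
        return []
    hypotheses = sorted(increments[0])
    llr = {h: 0.0 for h in hypotheses}
    rejected = set()
    decisions = []
    for t, step in enumerate(increments):
        adopted = None
        for h in hypotheses:
            if h in rejected:
                continue  # a rejected hypothesis stays reset until an adoption
            llr[h] += step[h]
            if llr[h] > upper and adopted is None:
                adopted = h
            elif llr[h] < lower:
                rejected.add(h)
                llr[h] = 0.0
        if adopted is not None:
            decisions.append((t, adopted))
            # Reset every LLR and restart the sequential decision process.
            llr = {h: 0.0 for h in hypotheses}
            rejected.clear()
    return decisions
```

With a constant increment of +1.0 for hypothesis 1 and -1.0 for hypothesis 2 and thresholds of ±2.5, hypothesis 1 is adopted at every third step, after which all LLRs restart from zero.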

However, the conventional SPRT represented by Non-Patent Documents 1 and 2 gives no particular consideration to the case in which a plurality of time-series data streams are input simultaneously. If the conventional SPRT is simply applied to such a case, the identification results for the individual streams can be mutually inconsistent.

For example, when time-series data of several face images present in the same picture are input simultaneously and persons are identified from them, simply applying the conventional SPRT may wrongly identify several streams as the same person. This happens when variations are superimposed on the time-series data, and face images in particular are prone to variations in pose, illumination, facial expression, and so on.

The problem that arises when the conventional SPRT is simply applied to a plurality of time-series data streams is described below using a concrete example. FIG. 6 shows an example of one frame, at time t, of a scene in a TV program video. The frame contains two face images. Face image (a) faces forward, hardly moves within the scene, and is stable; face image (b) moves relatively vigorously within the scene and is unstable. Assume that the face images of three persons 1, 2, and 3, who appear on the performer list accompanying the TV program, have been registered in advance, and that the task is to identify which of the three registered persons each of the face images (a) and (b), detected frame by frame from the TV video, belongs to.

FIG. 7 illustrates the problem with the conventional SPRT. Because three registered persons (m = 3) are expected, hypotheses H_1, H_2, and H_3 defining persons 1, 2, and 3, respectively, are prepared, and the LLRs of hypotheses H_1, H_2, and H_3 are calculated for each of the input face images (a) and (b). FIGS. 7(A) and 7(B) show the transitions of the LLRs of hypotheses H_1, H_2, and H_3 calculated for face images (a) and (b), respectively. The solid line is LLR(H_1), the broken line is LLR(H_2), and the dash-dot line is LLR(H_3).

When the SPRT is performed on the LLRs along the time axis of FIG. 7, hypotheses are adopted and rejected in the order (1) to (6) below.

(1) At time t = t_1, LLR(H_3) for face image (a) falls below the lower threshold B (LLR(H_3) < B), so hypothesis H_3 is rejected. After time t = t_1, only hypotheses H_1 and H_2 continue to be tested for face image (a).

(2) At time t = t_2, LLR(H_3) for face image (b) falls below the lower threshold B (LLR(H_3) < B), so hypothesis H_3 is rejected. After time t = t_2, only hypotheses H_1 and H_2 continue to be tested for face image (b).

(3) At time t = t_3, LLR(H_1) for face image (a) exceeds the upper threshold A (LLR(H_1) > A), so hypothesis H_1 is adopted. Face image (a) is thereby identified as person 1.

(4) Because face image (a) was identified as person 1 in (3) above, all other hypotheses for face image (a) are rejected. In this case, hypothesis H_2 for face image (a) is rejected.

(5) At time t = t_4, LLR(H_1) for face image (b) exceeds the upper threshold A (LLR(H_1) > A), so hypothesis H_1 is adopted. Face image (b) is thereby identified as person 1.

(6) Because face image (b) was identified as person 1 in (5) above, all other hypotheses for face image (b) are rejected. In this case, hypothesis H_2 for face image (b) is rejected.

In the above example, face image (a) is identified as person 1 at time t = t_3, and face image (b) is also identified as person 1 at time t = t_4. In other words, face images (a) and (b) are both identified as the same person. Face image (b) moves vigorously and is unstable; as a result, its LLR takes a long time to exceed the threshold, and the identification is moreover considered to be erroneous. Face images in particular have diverse variation factors such as pose and facial expression, which readily cause fluctuations in the probability ratio (LLR), so inconsistent identification results are likely. The same inconsistency can, however, arise in any SPRT applied to time-series data with variation factors, not only face images.

The identification results become inconsistent in this way because the conventional SPRT does not take exclusivity between objects into account and allows a plurality of time-series data streams to be identified as the same object.

An object of the present invention is to solve the above problem and provide a time-series data identification device that can identify with high accuracy which registered person or other object input time-series data belongs to.

Another object of the present invention is to provide a person meta information assigning device that uses the time-series data identification device to attach meta information about persons to a moving image.

To achieve the above objects, the time-series data identification device according to the present invention is characterized by comprising a plurality of identification means, each of which individually identifies which registered object one of a plurality of input time-series data streams belongs to, and an exclusion means which, when one of the identification means identifies a time-series data stream as belonging to a registered object, excludes that object from the identification candidates of the other identification means.

The time-series data identification device according to the present invention is also characterized in that the identification means sets up, for each object, a hypothesis stating that the time-series data belongs to that object, obtains for each hypothesis a likelihood ratio expressing the ratio of the probability that the hypothesis is true to the probability that it is false, and identifies which registered object the time-series data belongs to on the basis of the magnitude of the likelihood ratio.

The time-series data identification device according to the present invention is also characterized in that the identification means comprises a registration database that stores the feature of each object in advance, a comparison score calculation means that successively compares the feature of each object stored in the registration database with the feature of each time-series data stream and calculates a comparison score, an LLR measurement means that obtains the likelihood ratio from the comparison score calculated by the comparison score calculation means, and a comparison means that compares the likelihood ratio obtained by the LLR measurement means with thresholds.

The time-series data identification device according to the present invention is also characterized in that the exclusion means comprises a state holding buffer that indicates whether each hypothesis is adopted, rejected, or under test, and the comparison score calculation means calculates the comparison score only for hypotheses in the under-test state.

The time-series data identification device according to the present invention is also characterized in that, when the likelihood ratio for an object exceeds the upper threshold, the identification means identifies the time-series data as belonging to that object and changes the state of the hypothesis to adopted.

The time-series data identification device according to the present invention is also characterized in that, when the likelihood ratio for an object falls below the lower threshold, the identification means identifies the time-series data as not belonging to that object and changes the state of the hypothesis to rejected.

The time-series data identification device according to the present invention is also characterized by comprising an LLR holding buffer that holds the log likelihood ratio of the previous time, wherein the likelihood ratio is a log likelihood ratio and the identification means obtains the likelihood ratio at the current time recursively from the likelihood ratio of the previous time held in the LLR holding buffer.

The time-series data identification device according to the present invention is also characterized in that the identification means obtains the likelihood ratio using a Gaussian density function modeled in advance by learning.

The device for assigning person meta information to a moving image according to the present invention treats the time-series data as a moving image of face images and the objects as persons. It is characterized by comprising any of the time-series data identification devices described above, a face index construction means that reads the moving image sequentially frame by frame and constructs a face index associating the face images of the same person with their continuous display period, and a person meta information assigning means, wherein the time-series data identification device identifies the face images and outputs person identification information, and the person meta information assigning means attaches the person identification information and the face index, as person meta information, to each corresponding frame of the moving image.

The device for assigning person meta information to a moving image according to the present invention is also characterized in that the time-series data identification device extracts a performer list from the program information accompanying the moving image and identifies the moving image using only the hypotheses that define the persons appearing on the performer list.

The time-series data identification device according to the present invention takes exclusivity between time-series data streams into account: when one of the plurality of identification means identifies a time-series data stream as a registered object, that object is excluded from the identification candidates of the other identification means. A plurality of time-series data streams are therefore never identified as the same object, and each stream can be identified with high accuracy.

The device for assigning person meta information to a moving image according to the present invention applies this exclusivity-aware identification method to face images, constructs a face index associating identical face images with their continuous display period, and assigns person meta information frame by frame. Meta information such as person names can therefore be attached to a moving image with high accuracy without visual confirmation, and scenes in which a desired person appears can be retrieved accurately by using this meta information as a query.

The present invention is described below with reference to the drawings. First, the time-series data identification device according to the present invention is described. The device takes a plurality of time-series data streams as input, identifies which of the pre-registered objects each of them belongs to, and outputs the identification results.

FIG. 1 is a block diagram showing an embodiment of the time-series data identification device according to the present invention. The time-series data identification device 10 of this embodiment comprises a comparison score calculation unit 11, a registration database (DB) 12, a state holding buffer 13, an LLR measurement unit 14, an LLR holding buffer 15, and a threshold identification unit 16.

A plurality of time-series data streams are input to the comparison score calculation unit 11 simultaneously. The streams are processed in parallel by the comparison score calculation unit 11, the LLR measurement unit 14, and the threshold identification unit 16, so one instance of each of these units is provided per input stream. Each unit can be implemented in either hardware or software.

The comparison score calculation unit 11 reads the time-series data at arbitrary time intervals and calculates a comparison score indicating the similarity between the time-series data and each object registered in advance in the registration DB 12. For example, when the time-series data is a face image (moving image), it reads the face image one frame at a time and calculates the comparison score.

The comparison score can be calculated from the features of the time-series data and of the objects. In this case, the registration DB 12 stores in advance the identification information of each object, the feature of each object, and the density functions of the score distributions for positive examples and negative examples. The data to be registered in the registration DB 12 can be obtained by learning in advance from time-series data whose objects are known.

The comparison score calculation unit 11 calculates the comparison score by comparing the feature of the input time-series data with the feature of each object registered in the registration DB 12. The more closely the input time-series data resembles an object, the higher the comparison score. The following description assumes that the input time-series data is a moving image (face image).

The state holding buffer 13 holds, for each hypothesis in the SPRT, a state indicating whether it is adopted, rejected, or under test. The comparison score calculation unit 11 refers to the states held in the state holding buffer 13 and calculates comparison scores only for hypotheses in the under-test state. As described later, these states are stored in the state holding buffer 13 in accordance with the identification results of the threshold identification unit 16.
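The interplay between the state holding buffer and the comparison score calculation can be sketched as below. This is only an illustrative assumption about one possible data layout: the names `State`, `StateBuffer`, `comparison_scores`, and the `similarity` callback are hypothetical and do not appear in the patent.

```python
from enum import Enum
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


class State(Enum):
    TESTING = "testing"
    ADOPTED = "adopted"
    REJECTED = "rejected"


class StateBuffer:
    """Holds one state per (stream, object) hypothesis; all start under test."""

    def __init__(self, streams: Iterable[Hashable], objects: Iterable[Hashable]):
        objects = list(objects)
        self._state: Dict[Tuple[Hashable, Hashable], State] = {
            (s, o): State.TESTING for s in streams for o in objects
        }

    def get(self, stream: Hashable, obj: Hashable) -> State:
        return self._state[(stream, obj)]

    def set(self, stream: Hashable, obj: Hashable, state: State) -> None:
        self._state[(stream, obj)] = state

    def under_test(self, stream: Hashable) -> List[Hashable]:
        return [o for (s, o), st in self._state.items()
                if s == stream and st is State.TESTING]


def comparison_scores(stream, feature, registry: Dict, buffer: StateBuffer,
                      similarity: Callable) -> Dict:
    """Score the input feature only against objects still under test."""
    return {o: similarity(feature, registry[o]) for o in buffer.under_test(stream)}
```

Skipping hypotheses that are already adopted or rejected is what lets a decision made for one stream remove a candidate from another stream: once the state is changed in the buffer, no further score is computed for that hypothesis.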

The LLR measurement unit 14 obtains the LLR at the current time from the comparison score calculated by the comparison score calculation unit 11.

Let the m class hypotheses be H_i (i = 1 to m), and let P_i(z_t) be the probability of hypothesis H_i for the data z_t at time t (t = t_1, t_2, ..., t_N, t_N+1, ...). The log likelihood ratio LLR(H_i) of hypothesis H_i at time t_N can then be obtained with Equation (1). The probability P_i(z_t) is learned in advance and modeled as a value given by a Gaussian density function of the comparison score.
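As a hedged illustration of how a Gaussian model of the comparison score could yield a per-step log likelihood ratio, the sketch below assumes that each object has one Gaussian fitted to scores of positive examples and one fitted to scores of negative examples. The function names and the (mean, standard deviation) parameterization are assumptions for this example, not the patent's definitions.

```python
import math
from typing import Tuple


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log of the univariate Gaussian density N(x; mean, std^2)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def llr_increment(score: float, pos: Tuple[float, float], neg: Tuple[float, float]) -> float:
    """Log ratio of the positive-example density to the negative-example density.

    pos and neg are (mean, std) pairs assumed to have been learned in advance.
    """
    return gaussian_log_pdf(score, *pos) - gaussian_log_pdf(score, *neg)
```

A score close to the positive-example mean gives a positive increment, a score close to the negative-example mean gives a negative one, and a score midway between two equal-variance models contributes nothing.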

The LLR at the current time can also be obtained recursively with Equation (3), using the LLR at the previous time stored in the LLR holding buffer 15. In Equation (3), LLR(H_i) is the log likelihood ratio of hypothesis H_i at the current time t = t_N, and LLR'(H_i) is the log likelihood ratio of hypothesis H_i at the previous time t = t_N-1. The second and third terms on the right-hand side of Equation (2) represent the LLR increment at the current time t = t_N.
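The equations referred to here are not reproduced in this text. One form consistent with the description (a previous-time term plus two current-time terms) is sketched below. It assumes that the increment is the log ratio of a positive-example score density p_i^+ to a negative-example score density p_i^-, evaluated at the comparison score s_t; these symbols are assumptions for illustration and are not the patent's notation.

```latex
\begin{align*}
\mathrm{LLR}(H_i) &= \sum_{n=1}^{N} \left[ \log p_i^{+}(s_{t_n}) - \log p_i^{-}(s_{t_n}) \right] \\
\mathrm{LLR}(H_i) &= \mathrm{LLR}'(H_i) + \log p_i^{+}(s_{t_N}) - \log p_i^{-}(s_{t_N})
\end{align*}
```

Under this reading, the second line follows from the first by splitting off the n = N term, so only the previous value LLR'(H_i) needs to be kept in the LLR holding buffer.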

The threshold identification unit 16 compares the LLR(H_i) at the current time, obtained by the LLR measurement unit 14, with two thresholds A and B (A > B) and outputs the identification result. That is, when the LLR(H_i) obtained by hypothesis testing for a given time-series data stream exceeds the upper threshold A, hypothesis H_i is adopted; when it falls below the lower threshold B, hypothesis H_i is rejected. When LLR(H_i) lies between the upper threshold A and the lower threshold B, no decision is made. The thresholds A and B can be derived statistically from accuracy criteria such as recall and precision.

For example, suppose time-series data f is being tested against hypothesis i. When LLR(H_i) exceeds the upper threshold A, hypothesis i is adopted and time-series data f is identified as the person defined by hypothesis i. At the same time, the state of hypothesis i is changed to adopted.

When LLR(H_i) falls below the lower threshold B, hypothesis i is rejected and its state is changed to rejected. When LLR(H_i) lies between the upper threshold A and the lower threshold B, nothing is done, and the state of hypothesis i remains undetermined (under test).
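The three-way decision made by the threshold identification unit can be written as a small function. This is a minimal sketch; the function name `classify` and the string labels are assumptions chosen for the example.

```python
def classify(llr: float, upper: float, lower: float) -> str:
    """Compare an LLR with the upper threshold A and the lower threshold B (A > B)."""
    if upper <= lower:
        raise ValueError("upper threshold A must be greater than lower threshold B")
    if llr > upper:
        return "adopted"    # the data is identified as this object
    if llr < lower:
        return "rejected"   # the data is identified as not this object
    return "testing"        # no decision yet; keep accumulating evidence
```

Values exactly equal to a threshold are treated as undecided here, matching the wording "exceeds" and "falls below" in the description.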

The LLR holding buffer 15 stores the LLR obtained by the LLR measurement unit 14, and the state holding buffer 13 holds the state of each hypothesis i sent from the threshold identification unit 16.

FIG. 2 illustrates the SPRT performed by the device of FIG. 1. Here again, the time-series data to be identified is assumed to be a moving image, and the objects are assumed to be persons 1, 2, and 3 appearing in a TV program.

FIGS. 2(a) and 2(b) show the transitions of the LLRs of hypotheses H_1, H_2, and H_3 obtained for the input face images (a) and (b) (FIG. 6), respectively. The solid line is LLR(H_1), the broken line is LLR(H_2), and the dash-dot line is LLR(H_3). When the SPRT is performed on the LLRs along the time axis of FIG. 2, hypotheses are adopted and rejected in the order (1) to (7) below.

(1) At time t = t_1, LLR(H_3) for face image (a) falls below the lower threshold B (LLR(H_3) < B), so hypothesis H_3 is rejected. After time t = t_1, only hypotheses H_1 and H_2 continue to be tested for face image (a).

(2) At time t = t_2, LLR(H_3) for face image (b) falls below the lower threshold B (LLR(H_3) < B), so hypothesis H_3 is rejected. After time t = t_2, only hypotheses H_1 and H_2 continue to be tested for face image (b). Up to time t = t_2, the process is exactly the same as in FIG. 7.

(3) At time t = t_3, LLR(H_1) for face image (a) exceeds the upper threshold A (LLR(H_1) > A), so hypothesis H_1 is adopted. Face image (a) is thereby identified as person 1.

(4) Because face image (a) was identified as person 1 in (3) above, all other hypotheses for face image (a) are rejected. In this case, hypothesis H_2 for face image (a) is rejected.

(5) Because face image (a) was identified as person 1 in (3) above, and the same person cannot appear twice in the same frame, hypothesis H_1 for face image (b) is rejected. After time t = t_3, only hypothesis H_2 continues to be tested for face image (b).

(6) At time t = t_5, LLR(H_2) for face image (b) exceeds the upper threshold A (LLR(H_2) > A), so hypothesis H_2 is adopted. Face image (b) is thereby identified as person 2.

(7) Because face image (b) was identified as person 2 in (6) above, all other hypotheses for face image (b) are rejected. In this case, no hypothesis remains to be rejected for face image (b).

As described above, the device takes into account that the same person cannot appear twice in the same frame: once a time-series data stream has been identified as a certain person, that person is excluded from the identification candidates of the other streams. No other time-series data stream is therefore identified as the same person.
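The exclusion rule walked through for FIG. 2 can be sketched as a small multi-stream loop. This is an illustration under stated assumptions rather than the patented implementation: the function name `exclusive_sprt`, the precomputed per-step LLR increments, and the `exclusive` switch (added so the conventional behavior can be compared) are all hypothetical.

```python
from typing import Dict, List


def exclusive_sprt(
    steps: List[Dict[str, Dict[int, float]]],
    upper: float,
    lower: float,
    exclusive: bool = True,
) -> Dict[str, int]:
    """steps[t][stream][obj] is an assumed LLR increment; returns {stream: obj}."""
    streams = sorted(steps[0])
    objects = sorted(steps[0][streams[0]])
    state = {(s, o): "testing" for s in streams for o in objects}
    llr = {(s, o): 0.0 for s in streams for o in objects}
    result: Dict[str, int] = {}
    for step in steps:
        for s in streams:
            for o in objects:
                if state[(s, o)] != "testing":
                    continue  # scores are computed only for hypotheses under test
                llr[(s, o)] += step[s][o]
                if llr[(s, o)] > upper:
                    state[(s, o)] = "adopted"
                    result[s] = o
                    for key in state:
                        # Reject the remaining hypotheses of this stream.
                        same_stream = key[0] == s and key[1] != o
                        # Exclusion: remove this object from the other streams.
                        same_object = exclusive and key[0] != s and key[1] == o
                        if state[key] == "testing" and (same_stream or same_object):
                            state[key] = "rejected"
                elif llr[(s, o)] < lower:
                    state[(s, o)] = "rejected"
    return result
```

With a stable stream "a" whose LLR for person 1 rises quickly and an unstable stream "b" whose LLR for person 1 rises slowly but still faster than its LLR for person 2, the non-exclusive run assigns both streams to person 1, while the exclusive run assigns "a" to person 1 and "b" to person 2. Note that the outcome depends on which stream reaches its threshold first.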

Next, the person meta information providing apparatus according to the present invention is described. It treats the input time-series data as a moving image and the objects as performers, uses the time-series data identification device described above to identify which person each time series (face image) belongs to, and attaches person meta information to the moving image.

To obtain the LLR recursively with the time-series data identification device described above, each input time series must consist of face images of a single person. For example, if two persons appear in the TV program being viewed, as shown in FIG. 6, the same face must be determined and associated between time t_N and the next time t_{N+1}, and the face images must be input to the identification device as two separate time series. For this reason, the person meta information providing apparatus described below has a face image tracking unit and related units in the stage preceding the time-series data identification device.

FIG. 3 is a block diagram showing an embodiment of the person meta information providing apparatus according to the present invention. The apparatus of this embodiment includes a moving image storage unit 31 that stores moving images containing face images, a face image tracking unit 32, a representative face determination unit 33, a face index construction unit 34, a performer information extraction unit 35, a face registration unit 36, a face DB 37, a face identification unit 38, and a person meta information adding unit 39.

The functions of the face image tracking unit 32, the representative face determination unit 33, and the face index construction unit 34 are the same as in the "Face index creation device for moving images and its face image tracking method" previously proposed by the present inventors (Japanese Patent Application No. 2007-88738), so only an outline is given here.

The face image tracking unit 32 reads the moving image from the moving image storage unit 31 frame by frame, tracks the face images of the same person that appear continuously over multiple frames, and detects their continuous display period. The representative face determination unit 33 determines a representative face for each continuous display period detected by the face image tracking unit 32. The face index construction unit 34 constructs a face index that associates the continuous display period of each face image with the representative face of that period. When one frame contains multiple face images, a face index is constructed for each face image.

The performer information extraction unit 35 extracts performer information, such as the names of the performers appearing in the moving image stored in the moving image storage unit 31. For example, if the stored moving image is a TV program and program information such as EPG (Electronic Program Guide) information is stored with it, the unit extracts the performer list from that program information as the performer information.

The face registration unit 36 holds names and face images registered in advance, in association with each other, for many persons. The face DB 37 corresponds to the registration DB of FIG. 1. From the data registered in the face registration unit 36, it extracts and registers the data for the persons on the performer list extracted by the performer information extraction unit 35: each person's name, the feature amount of each person's face image, and the density functions of the score distributions for positive and negative examples.
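Restricting the face DB to the persons on the performer list might look like the following sketch. The `PersonRecord` layout, its field names, and `build_face_db` are hypothetical and introduced only for illustration; the source does not specify a data format.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    feature: list    # face feature vector
    positive: tuple  # (mean, std) of scores for positive examples
    negative: tuple  # (mean, std) of scores for negative examples


def build_face_db(registered, performer_list):
    """Keep only the registered persons whose names are on the performer list."""
    wanted = set(performer_list)
    return {r.name: r for r in registered if r.name in wanted}
```

Names on the performer list that have no registered record are simply absent from the result, so no hypothesis is set up for them.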

The face identification unit 38 receives the time-series data of the face images that the face image tracking unit 32 has tracked and detected in the images. When an image contains multiple face images, a time series is generated for each face image, and the multiple time series are input to the face identification unit 38 simultaneously.

The face identification unit 38 corresponds to the comparison score calculation unit 11, state holding buffer 13, LLR measurement unit 14, LLR holding buffer 15, and threshold identification unit 16 of FIG. 1. It reads the data on the persons registered in the face DB 37, applies a sequential probability ratio test to the hypotheses about which person each time series belongs to, and outputs person identification information such as the person's name.
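A recursive log-likelihood-ratio update of the kind used in a sequential probability ratio test can be sketched as below. It assumes scalar comparison scores that are independent across time steps and Gaussian score densities for the positive and negative examples; the function names and the (mean, std) parameterization are assumptions made for this illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log of the Gaussian density N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def update_llr(prev_llr, score, positive, negative):
    """Recursive LLR update: previous LLR plus the log ratio of the new score.

    positive, negative: (mean, std) of the score distribution when the
    hypothesis is true and when it is false, respectively.
    """
    return (prev_llr
            + gaussian_logpdf(score, *positive)
            - gaussian_logpdf(score, *negative))
```

Only the previous LLR has to be stored between frames, which is the role the LLR holding buffer plays in the description.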

The person meta information adding unit 39 describes, frame by frame, the face index (continuous display period and representative face image) output from the face index construction unit 34 and the person identification information output from the face identification unit 38 as person meta information, and attaches it to the corresponding moving image in the moving image storage unit 31.

FIG. 4 is a block diagram giving a functional representation of the face image tracking unit 32 and the face identification unit 38 of FIG. 3. The face image tracking unit 32 is the same as that proposed in Japanese Patent Application No. 2007-88738, and the face identification unit 38 is the same as in FIG. 1, so only an outline is given.

The face image tracking unit 32 includes a frame image acquisition unit 41, a current frame buffer 42, a previous frame buffer 43, a shot change identification unit 44, a face detection unit 45, a current detection result buffer 46, a previous detection result buffer 47, an inter-face distance calculation unit 48, a distance-dependent association unit 49, and a similarity-dependent association unit 50.

The face identification unit 38 includes a comparison score calculation unit 51, a face DB 52, a state holding buffer 53, an LLR measurement unit 54, an LLR holding buffer 55, and a threshold identification unit 56.

The frame image acquisition unit 41 reads one frame of still image at a time, at an arbitrary time interval, from the moving image stored in the moving image storage unit 31. The current frame buffer 42 stores the still image of the frame read this time, and the previous frame buffer 43 stores the still image of the frame read the previous time.

The shot change identification unit 44 compares the still images stored in the current frame buffer 42 and the previous frame buffer 43, and identifies from their similarity whether a camera edit point between shots is present.

The face detection unit 45 detects face images in the still image of the current frame stored in the current frame buffer 42 and, for each detected face image, obtains spatial position information such as the position coordinates, width, and height of its display range.

The current detection result buffer 46 stores the position information of the face images obtained by the face detection unit 45. A face image that has just been detected by the face detection unit 45 is treated as an unconfirmed face candidate, and is treated as a confirmed face once a face image is detected at the same display position in the next frame as well. The previous detection result buffer 47 stores spatial position information, such as the position coordinates, width, and height of the display range, for the face candidates and confirmed faces detected in the previous frame.

The inter-face distance calculation unit 48 calculates the distance Δd (hereinafter, the inter-face distance) between the display position of each face image of the current frame, detected by the face detection unit 45 and stored in the current detection result buffer 46, and the display position of each face image of the previous frame stored in the previous detection result buffer 47. The distance Δd can be calculated, for example, as the distance between the top-left coordinates of the two face images.

For each pair of face images (face candidates and confirmed faces) whose inter-face distance Δd, calculated by the inter-face distance calculation unit 48, is below a predetermined threshold Δdref, the distance-dependent association unit 49 updates the state of the face candidate detected in the current frame to confirmed face in the current detection result buffer 46. At the same time, it updates the state of the face candidate detected in the previous frame to confirmed face in the previous detection result buffer 47, and associates the two face images with each other as a face image series. The threshold Δdref is preferably set to a value proportional to the size of the face image detected in the previous frame.
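The distance-based association can be sketched as follows. This is an illustrative sketch only: the `Face` class, the greedy nearest-match strategy, and the proportionality factor `k` are assumptions, since the source states only that Δd is compared with a threshold Δdref proportional to the previous face size.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    x: float  # top-left x of the display range
    y: float  # top-left y of the display range
    w: float
    h: float
    confirmed: bool = False  # False: face candidate, True: confirmed face


def associate_by_distance(prev_faces, cur_faces, k=0.5):
    """Pair previous and current faces whose top-left distance is below
    d_ref = k * (size of the previous face). Returns (prev_idx, cur_idx) pairs."""
    pairs, used = [], set()
    for i, p in enumerate(prev_faces):
        best, best_d = None, k * max(p.w, p.h)  # d_ref for this face
        for j, c in enumerate(cur_faces):
            if j in used:
                continue
            d = math.hypot(c.x - p.x, c.y - p.y)  # inter-face distance
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            p.confirmed = True
            cur_faces[best].confirmed = True
            pairs.append((i, best))
    return pairs
```

Scaling the threshold with the face size lets a large close-up face move further between frames than a small distant one before the association is dropped.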

The similarity-dependent association unit 50 takes each confirmed face of the previous frame, stored in the previous detection result buffer 47, that was not associated with any face image of the current frame, and uses it as a template for template matching within the current frame image. An image region whose similarity exceeds a predetermined threshold is added to the current detection result buffer 46 as a new face image (confirmed face), and the two face images are associated with each other as a face image series.

To reduce false detections in template matching and to reduce the amount of computation, the region to which template matching is applied is preferably limited to the position corresponding to the display position, in the previous frame, of the confirmed face used as the template, and its neighborhood, rather than the entire image of the current frame.
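A neighborhood-restricted template search might be sketched as below. The source does not name a similarity measure, so normalized cross-correlation is used here as a stand-in; `match_near`, `margin`, and `threshold` are hypothetical names and parameters, and grayscale images are assumed to be 2-D NumPy arrays.

```python
import numpy as np


def match_near(frame, template, x0, y0, margin, threshold):
    """Search for `template` in `frame` only within `margin` pixels of the
    previous top-left position (x0, y0). Returns (x, y, score) of the best
    match whose normalized cross-correlation exceeds `threshold`, else None."""
    th, tw = template.shape
    fh, fw = frame.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best = None
    for y in range(max(0, y0 - margin), min(fh - th, y0 + margin) + 1):
        for x in range(max(0, x0 - margin), min(fw - tw, x0 + margin) + 1):
            patch = frame[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = t_norm * np.linalg.norm(p)
            if denom == 0:
                continue  # flat patch or template: correlation undefined
            score = float((t * p).sum() / denom)
            if score > threshold and (best is None or score > best[2]):
                best = (x, y, score)
    return best
```

Limiting the loops to the margin around (x0, y0) is what keeps both the false detections and the computation down, as the paragraph above notes.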

The previous detection result buffer 47 holds the confirmed face images and their region coordinates, and these are sent to the representative face determination unit 33 (FIG. 3) and the face feature amount creation unit 57.

The face identification unit 38 has the same configuration as in FIG. 1: it applies a sequential probability ratio test to the hypotheses about which registered person each confirmed face image belongs to, and outputs person identification information. Here, however, a face feature amount creation unit 57 is placed between the previous detection result buffer 47 and the face identification unit 38.

The face feature amount creation unit 57 creates face feature amounts suitable for face identification from the time series of each confirmed face image output from the previous detection result buffer 47, and inputs them as time-series data to the comparison score calculation unit 51. The face feature amounts can be created by the Eigenface method, the Fisherface method, or other methods. They are preferably robust to variations in pose, illumination, and the like. When creating the face feature amounts, it is also preferable to extract facial parts such as both pupils and to normalize position and size with reference to them.
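As one of the options named above, an Eigenface-style feature can be sketched with a principal component analysis computed via SVD. This is a minimal sketch assuming the face images are already normalized, flattened to equal-length vectors, and held in a NumPy array; `fit_eigenfaces` and `project` are hypothetical names.

```python
import numpy as np


def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) array of normalized, flattened face images.
    Returns the mean face and the top principal axes (the eigenfaces)."""
    mean = faces.mean(axis=0)
    # Rows of vt are the principal axes of the centered data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(face, mean, basis):
    """Face feature: coordinates of the centered face in the eigenface basis."""
    return basis @ (face - mean)
```

The projected vector, much shorter than the raw image, would then serve as the per-frame feature compared against the registered features.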

As described above, the SPRT that determines a person from face images identifies which registered person each time series belongs to, using a hypothesis defined for each person. The conventional SPRT performs identification without considering exclusivity among multiple time series, whereas the present invention takes that exclusivity into account, so multiple time series are not misidentified as belonging to the same person. Multiple time series can therefore be identified with high accuracy, and meta information on the persons appearing can be attached accurately to the moving image frame by frame.

FIG. 1 is a block diagram showing an embodiment of the time-series data identification device according to the present invention.
FIG. 2 is a diagram explaining the operation of the sequential probability ratio test (SPRT) in FIG. 1.
FIG. 3 is a block diagram showing an embodiment of the person meta information providing apparatus according to the present invention.
FIG. 4 is a block diagram giving a functional representation of the face image tracking unit and the face identification unit of FIG. 3.
FIG. 5 is a diagram explaining the operation of the conventional SPRT.
FIG. 6 is a diagram showing an example of a one-frame image from a scene of a TV program video.
FIG. 7 is a diagram explaining a problem with the conventional SPRT.

Explanation of symbols

10: time-series data identification device
11, 51: comparison score calculation unit
12: registration database (DB)
13, 53: state holding buffer
14, 54: LLR measurement unit
15, 55: LLR holding buffer
16, 56: threshold identification unit
31: moving image storage unit
32: face image tracking unit
33: representative face determination unit
34: face index construction unit
35: performer information extraction unit
36: face registration unit
37, 52: face DB
38: face identification unit
39: person meta information adding unit
41: frame image acquisition unit
42: current frame buffer
43: previous frame buffer
44: shot change identification unit
45: face detection unit
46: current detection result buffer
47: previous detection result buffer
48: inter-face distance calculation unit
49: distance-dependent association unit
50: similarity-dependent association unit

Claims

A time-series data identification device that identifies which registered object each of a plurality of input time-series data belongs to, comprising:
a plurality of identification means, each individually identifying which registered object one of the input time-series data belongs to; and
exclusion means for excluding an object from the identification candidates of the other identification means when one of the identification means identifies that a time series belongs to that registered object.

The time-series data identification device according to claim 1, wherein the identification means sets up, for each object, a hypothesis that the time-series data belongs to that object, obtains a likelihood ratio representing the ratio of how likely each hypothesis is to be true to how likely it is to be false, and identifies, based on the magnitude of the likelihood ratio, which registered object the time-series data belongs to.

The time-series data identification device according to claim 2, wherein the identification means comprises: a registration database that stores a feature amount of each object; comparison score calculation means that sequentially compares the feature amount of each object stored in the registration database with the feature amount of each time-series data and calculates a comparison score; LLR measurement means that obtains a likelihood ratio using the comparison score calculated by the comparison score calculation means; and comparison means that compares the likelihood ratio obtained by the LLR measurement means with a threshold value.

The time-series data identification device according to claim 3, wherein the exclusion means includes a state holding buffer that indicates whether each hypothesis is adopted, rejected, or under test, and the comparison score calculation means calculates a comparison score only for hypotheses whose state is under test.

5. The identification means, when a likelihood ratio for a certain object exceeds an upper limit value, identifies that time-series data belongs to the object, and changes the hypothesis state to adopted. The time-series data identification device described in 1.

5. The identification unit, when a likelihood ratio for a certain object falls below a lower threshold, identifies that time-series data does not belong to the object, and changes the hypothesis state to rejection. Or the time-series data identification device according to 5.

An LLR holding buffer that holds a log likelihood ratio of the previous time is provided, the likelihood ratio is a log likelihood ratio, and the identification unit uses the likelihood ratio of the previous time held in the LLR holding buffer. 7. The time-series data identification device according to claim 2, wherein the likelihood ratio at the current time is obtained recursively.

7. The time-series data identification apparatus according to claim 2, wherein the identification unit obtains a likelihood ratio using a Gaussian density function modeled in advance by learning.

An apparatus for assigning person meta information to a moving image, in which the time-series data is a moving image of face images and the objects are persons, comprising:
the time-series data identification device according to any one of claims 1 to 8;
face index construction means for sequentially reading the moving image frame by frame and constructing a face index in which face images of the same person are associated with their continuous display periods; and
person meta information assigning means,
wherein the time-series data identification device identifies the face images and outputs person identification information, and the person meta information assigning means attaches the person identification information and the face index, as person meta information, to each corresponding frame of the moving image.

The apparatus for assigning person meta information to a moving image according to claim 9, wherein the time-series data identification device extracts a performer list from program information attached to the moving image and performs identification using only the hypotheses that define persons appearing in the performer list.