JP3746690B2 - Signal detection method and apparatus, program, and recording medium

Info

Publication number: JP3746690B2
Application number: JP2001209813A
Authority: JP
Inventors: 隆行黒住; 邦夫柏野; 洋村瀬
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-07-10
Filing date: 2001-07-10
Publication date: 2006-02-15
Anticipated expiration: 2021-07-10
Also published as: JP2003022084A

Description

【０００１】
【発明の属する技術分野】
入力信号内において参照信号と類似した部分を検出する信号検出方法およびその装置に関する。特に、例えば映像信号や音響信号などといった入力時系列信号の中から、入力時系列信号よりも短い参照時系列信号と類似した部分を検出する信号検出方法およびその装置に関する。
【０００２】
【従来の技術】
従来、信号検出方法に関しては、特許第３０６５３１４号「高速信号検出法、装置およびその記録媒体」のように、あらかじめ登録した音響信号と類似した音響信号の場所を探し出す音響信号検出方法が知られている。しかし、この方法では、参照時系列信号、または入力時系列信号のノイズによる特徴ひずみが少ないことが想定されており、特徴ひずみが激しい場合、探索精度が低下する可能性があるという欠点があった。
【０００３】
また、本願発明者らは、変動付加過程を設けることによって特徴ひずみに対して頑健な信号検出を行う方法を発明し、既に特許出願を行っている。
また、注目領域全体のパワーを用いて正規化することで、特徴ひずみに対して頑健な信号検出を行う方法もある。
【０００４】
【発明が解決しようとする課題】
上述した変動付加過程を設ける方法においては、複数の特徴ひずみを考慮する場合、複数の参照特徴を用意しなければならないという欠点があった。
また、上述した注目領域全体のパワーを用いて正規化する方法においては、周波数特性が変化した場合、探索精度が低下するという欠点があった。
【０００５】
本発明は、以上のような事情を考慮してなされたものであり、従来の方法よりも特徴ひずみに頑健な信号検出の処理手段を提供するとともに、従来の方法よりも汎用的で特徴ひずみに頑健な信号検出の処理手段を提供することを目的としている。
【０００６】
【課題を解決するための手段】
上記課題を解決するために、本発明は、参照信号に類似した信号を入力信号から検出する信号検出装置であって、入力された参照信号から参照特徴を抽出する参照特徴計算手段と、前記参照特徴を基に正規化処理を行うことにより、参照正規化特徴を計算する参照特徴正規化手段と、前記参照正規化特徴を線形変換する参照特徴変換手段と、入力信号から入力特徴を抽出する入力特徴計算手段と、前記入力特徴を基に正規化処理を行うことにより、入力正規化特徴を計算する入力特徴正規化手段と、前記入力正規化特徴を線形変換する入力特徴変換手段と、前記入力特徴変換手段により線形変換された後の入力正規化特徴（以降、変換後入力正規化特徴とする）上に設定された照合区間における変換後入力正規化特徴と前記参照特徴変換手段により線形変換された後の参照正規化特徴（以降、変換後参照正規化特徴とする）との類似度を計算し、計算された類似度に基づき検索結果を出力する特徴照合手段と、学習用信号を読み込み、前記参照特徴変換手段と前記入力特徴変換手段とにおいて線形変換する際に用いる変換係数を求める学習手段とを有し、前記学習手段は、前記学習用信号として、特徴ひずみのない原信号と該原信号に特徴ひずみが加えられたひずみ信号とを用い、前記入力特徴計算手段及び前記入力特徴正規化手段が用いる処理と同じ処理を前記学習用信号に実行することにより求められた学習用信号の正規化特徴の所定の区間の信号対を複数作成し、該信号対の級間分散の級内分散に対する比を評価関数としたときの、該評価関数が最大となる前記変換係数を求め、前記参照特徴変換手段または前記入力特徴変換手段は、前記学習手段の求めた変換係数を用いて前記参照正規化特徴、前記入力正規化特徴をそれぞれ線形変換することを特徴とする信号検出装置である。
【０００７】
また本発明は、前記信号検出装置が、前記評価関数を最大とすることにより求める変換係数を、
【数１】

【数２】

【数３】

から求められる固有ベクトルを用いて生成することを特徴とする。
【０００８】
また本発明は、参照信号に類似した信号を入力信号から検出する信号検出装置であって、入力された参照信号から参照特徴を抽出する参照特徴計算手段と、前記参照特徴を基に正規化処理を行うことにより、参照正規化特徴を計算する参照特徴正規化手段と、前記参照正規化特徴を線形変換する参照特徴変換手段と、入力信号から入力特徴を抽出する入力特徴計算手段と、前記入力特徴を基に正規化処理を行うことにより、入力正規化特徴を計算する入力特徴正規化手段と、前記入力正規化特徴を線形変換する入力特徴変換手段と、前記入力特徴変換手段により線形変換された後の入力正規化特徴（以降、変換後入力正規化特徴とする）上に設定された照合区間における変換後入力正規化特徴と前記参照特徴変換手段により線形変換された後の参照正規化特徴（以降、変換後参照正規化特徴とする）との類似度を計算し、計算された類似度に基づき検索結果を出力する特徴照合手段と、学習用信号を読み込み、前記参照特徴変換手段と前記入力特徴変換手段とにおいて線形変換する際に用いる変換係数を求める学習手段とを有し、前記学習手段は、前記学習用信号として、特徴ひずみのない原信号と該原信号に特徴ひずみが加えられたひずみ信号とを用い、前記入力特徴計算手段及び前記入力特徴正規化手段が用いる処理と同じ処理を前記学習用信号に実行することにより求められた学習用信号の正規化特徴の所定の区間の信号対を複数作成し、該信号対の級間分散を評価関数としたときの、該評価関数が最大となる前記変換係数を求め、前記参照特徴変換手段または前記入力特徴変換手段は、前記学習手段の求めた変換係数を用いて前記参照正規化特徴、前記入力正規化特徴をそれぞれ線形変換することを特徴とする信号検出装置である。
【０００９】
また本発明は、前記信号検出装置が、前記評価関数を最大とすることにより求める変換係数を、
【数５】

【数６】

【数７】

から求められる固有ベクトルを用いて生成することを特徴とする。
【００１０】
また本発明は、参照信号に類似した信号を入力信号から検出する信号検出装置の信号検出方法であって、信号検出装置の参照特徴計算手段が、入力された参照信号から参照特徴を抽出し、信号検出装置の参照特徴正規化手段が、前記参照特徴を基に正規化処理を行うことにより、参照正規化特徴を計算し、信号検出装置の学習手段が、特徴ひずみのない原信号と該原信号に特徴ひずみが加えられたひずみ信号とを示す学習用信号を読み込み、当該学習用信号を用い、入力特徴計算手段及び入力特徴正規化手段が用いる処理と同じ処理を前記学習用信号に実行することにより求められた学習用信号の正規化特徴の所定の区間の信号対を複数作成し、該信号対の級間分散の級内分散に対する比を評価関数としたときの、該評価関数が最大となる前記変換係数を求め、信号検出装置の前記参照特徴変換手段が、前記学習手段の求めた変換係数を用いて前記参照正規化特徴を線形変換し、信号検出装置の入力特徴計算手段が、入力信号から入力特徴を抽出し、信号検出装置の入力特徴正規化手段が、前記入力特徴を基に正規化処理を行うことにより、入力正規化特徴を計算し、信号検出装置の前記入力特徴変換手段が、前記学習手段の求めた変換係数を用いて前記入力正規化特徴を線形変換し、信号検出装置の特徴照合手段が、前記入力特徴変換手段により線形変換された後の入力正規化特徴（以降、変換後入力正規化特徴とする）上に設定された照合区間における変換後入力正規化特徴と前記参照特徴変換手段により線形変換された後の参照正規化特徴（以降、変換後参照正規化特徴とする）との類似度を計算し、計算された類似度に基づき検索結果を出力することを特徴とする信号検出方法である。
【００１１】
また本発明は、前記信号検出方法において、前記評価関数を最大とすることにより求める変換係数を、
【数１】

【数２】

【数３】

から求められる固有ベクトルを用いて生成することを特徴とする。
【００１２】
また本発明は、参照信号に類似した信号を入力信号から検出する信号検出方法であって、信号検出装置の参照特徴計算手段が、入力された参照信号から参照特徴を抽出し、信号検出装置の参照特徴正規化手段が、前記参照特徴を基に正規化処理を行うことにより、参照正規化特徴を計算し、信号検出装置の学習手段が、特徴ひずみのない原信号と該原信号に特徴ひずみが加えられたひずみ信号とを示す学習用信号を読み込み、当該学習用信号を用い、入力特徴計算手段及び入力特徴正規化手段が用いる処理と同じ処理を前記学習用信号に実行することにより求められた学習用信号の正規化特徴の所定の区間の信号対を複数作成し、該信号対の級間分散を評価関数としたときの、該評価関数が最大となる前記変換係数を求め、信号検出装置の参照特徴変換手段が、前記学習手段の求めた変換係数を用いて前記参照正規化特徴を線形変換し、信号検出装置の入力特徴計算手段が、入力信号から入力特徴を抽出し、信号検出装置の入力特徴正規化手段が、前記入力特徴を基に正規化処理を行うことにより、入力正規化特徴を計算し、信号検出装置の入力特徴変換手段が、前記学習手段の求めた変換係数を用いて前記入力正規化特徴を線形変換し、信号検出装置の特徴照合手段が、前記入力特徴変換手段により線形変換された後の入力正規化特徴（以降、変換後入力正規化特徴とする）上に設定された照合区間における変換後入力正規化特徴と前記参照特徴変換手段により線形変換された後の参照正規化特徴（以降、変換後参照正規化特徴とする）との類似度を計算し、計算された類似度に基づき検索結果を出力することを特徴とする信号検出方法である。
【００１３】
また本発明は、前記信号検出方法において、前記評価関数を最大とすることにより求める変換係数を、
【数５】

【数６】

【数７】

から求められる固有ベクトルを用いて生成することを特徴とする。
【００１４】
また本発明は、上記信号検出装置としてコンピュータを実行させるための信号検出プログラムである。
【００１５】
また本発明は、上記信号検出プログラムを格納したコンピュータ読み取り可能な記録媒体である。
【００２３】
【発明の実施の形態】
以下、図面を参照しながら、本発明の実施形態について説明する。
【００２４】
＜第１の実施形態＞
図１は、本発明の第１の実施形態であり、音響信号を対象とする特徴ひずみに頑健な信号検出装置の構成を示すブロック図である。
図１に示す信号検出装置は、音響信号を対象とする特徴ひずみに頑健な信号検出を実現するものであり、参照特徴計算手段１と、入力特徴計算手段２と、参照特徴正規化手段３と、入力特徴正規化手段４と、特徴照合手段５で構成され、参照時系列信号すなわち検索したい音響信号と、入力時系列信号すなわち検索される音響信号を入力とし、参照時系列信号との類似した入力時系列信号中の箇所を出力する。
【００２５】
参照特徴計算手段１は、参照時系列信号から、特徴ベクトルからなる参照特徴を導くものである。
入力特徴計算手段２は、入力時系列信号から、特徴ベクトルからなる入力特徴を導くものである。
参照特徴正規化手段３は、前記参照特徴から、周辺の参照特徴から導いた統計量を用いて特徴ベクトルの各要素ごと独立に正規化した参照正規化特徴を導くものである。
入力特徴正規化手段４は、前記入力特徴から、周辺の入力特徴から導いた統計量を用いて特徴ベクトルの各要素ごと独立に正規化した入力正規化特徴を導くものである。
特徴照合手段５は、前記入力正規化特徴中に照合区間を設定し、前記参照正規化特徴と、前記入力正規化特徴中の該照合区間のそれぞれから類似度を計算するものである。
【００２６】
次に、上述した参照特徴計算手段１〜特徴照合手段５における処理内容を具体的に説明する。図２は、図１に示した特徴ひずみに頑健な信号検出装置の動作を示すフローチャートである。以下、このフローチャートに沿って説明する。
参照特徴計算手段１は、はじめに、与えられた参照時系列信号を読み込む（ステップＳ１）。次に、読み込んだ参照時系列信号に対して特徴抽出を行う（ステップＳ２）。
【００２７】
本実施形態では、特徴として音響信号のフーリエ変換の振幅成分を用いる。例えば、実環境中で流れるＣＤの音響信号から携帯端末で受音した５秒程度の音響信号を探索したい場合、特徴抽出の具体的な設定を次のようにすると、良い結果が得られる。すなわち、周波数８０００Ｈｚ（ヘルツ）で標本化した音響信号の１秒の区間をフーリエ変換し、０〜４０００Ｈｚを等間隔に２０区間に分割し、各区間内での振幅成分の平均パワーからなる２０次元のベクトルを抽出する。また、前記特徴ベクトルは０．１秒毎に抽出する。
【００２８】
本実施形態とは別に、特徴として映像信号の縮小画像を用いることもできる。例えば、テレビの放送信号から１５秒程度の映像信号を探索したい場合、特徴抽出の具体的な設定を次のようにすると、良い結果が得られる。すなわち、１フレームの画像を横に４等分割、縦に３等分割し、１２個の領域を設け、各領域内でＲＧＢ（赤、緑、青の三原色）それぞれについて画素値を平均する。前記１２個の領域のＲＧＢそれぞれの平均画素値からなる３６次元ベクトルを特徴ベクトルとする。この場合、前記特徴ベクトルは１フレーム毎に得られる。
【００２９】
入力特徴計算手段２は、はじめに、入力信号を読み込む（ステップＳ３）。次に、読み込んだ入力信号に対して特徴抽出を行う（ステップＳ４）。特徴抽出は、前記参照特徴計算手段１において行ったものと同様の操作を行う。
【００３０】
参照特徴正規化手段３では、はじめに、参照特徴計算手段１により得られた参照特徴を読み込む。次に参照特徴の特徴ベクトルの各要素毎に、ある一定区画の平均値と標準偏差を求める。例えば、該特徴ベクトルの前後１秒間の区画の特徴ベクトルに対して、平均値と標準偏差を求める。次に、該特徴ベクトルから該平均値を差し引き、該標準偏差で割った値を要素にもつ特徴ベクトルを参照正規化特徴とする（以上、ステップＳ５）。
【００３１】
入力特徴正規化手段４では、はじめに、入力特徴計算手段２により得られた入力特徴を読み込む。次に、読み込んだ入力特徴に対して正規化を行う。正規化は、前記参照特徴正規化手段３において行ったものと同様の操作を行う（以上、ステップＳ６）。
【００３２】
特徴照合手段５では、はじめに、参照特徴正規化手段３及び入力特徴正規化手段４により得られた参照正規化特徴及び入力正規化特徴を読み込む。続いて、入力正規化特徴に対して参照特徴正規化手段３で与えられた参照正規化特徴と同じ長さの照合区間を設定する。次に、参照正規化特徴と入力正規化特徴の照合区間内の類似度を計算する。ここでは、類似度としてユークリッド距離を用いる。例えば、参照正規化特徴が５秒の長さの場合、参照正規化特徴のベクトルを１秒ごとに５つ抽出し、それらからなる１００次元を照合に用いるベクトルとする。つまり、音響信号をフーリエ変換することによって得た特徴ベクトルが２０次元であるので、そのベクトル５つ分（５秒分）で１００次元となる。
【００３３】
照合箇所は入力正規化特徴の先頭からずらしながら照合する。最後まで照合した後、ユークリッド距離が最も小さい箇所を探索結果として出力する。
つまり、特徴照合手段５は、入力信号の位置を初期化するとともに最短距離を初期化し（ステップＳ７）、参照信号と現位置における入力信号とを基にベクトル間のユークリッド距離を算出し（ステップＳ８）、算出された距離が最短距離より小さいか否かを判定し（ステップＳ９）、小さければ最短距離を更新し（ステップＳ１０）、小さくなければステップＳ１０の処理をスキップする。
そして、入力信号が終了したか否かを判定し（ステップＳ１１）、まだ終了していなければ、入力信号の位置をずらして（ステップＳ１２）、ステップＳ８の処理へ戻る。ステップＳ１１の判定において入力信号が終了していたなら、結果を出力して（ステップＳ１３）、全体の処理を終える。
【００３４】
なお、探索結果は、事前にユークリッド距離のしきい値を与えられていた場合、しきい値を下回るもののみ出力することもできる。これにより、入力信号中に参照信号にマッチする音響信号が存在しない場合には、結果を出力しないようにすることができる。
また、ユークリッド距離の上位Ｎ位までを出力するようにすることも可能である。これにより、入力信号中に参照信号にマッチする可能性のある箇所が複数存在する場合にも、それら複数の候補を出力することができる。
【００３５】
＜第２の実施形態＞
次に説明する第２の実施形態は、前記第１の実施形態に更に、変換手段を付加したものである。図３は、本発明の第２の実施形態であり、音響信号を対象とする特徴ひずみに頑健な信号検出装置の構成を示すブロック図である。
図３に示した参照特徴変換手段６及び入力特徴変換手段７は、参照時系列信号及び入力時系列信号から計算された正規化特徴に対して変換を行う。
【００３６】
次に、上述した参照特徴変換手段６及び入力特徴変換手段７における処理を具体的に説明する。図４は、図３に示した特徴ひずみに頑健な信号検出装置の動作を示すフローチャートである。このフローチャートにおいて、ステップＳ２１からＳ２６までの処理は、図２に示したステップＳ１からＳ６までの処理と同じである。
【００３７】
そして、参照特徴変換手段６は、はじめに前記参照特徴正規化手段３により得られた参照正規化特徴を読み込み、次に、前記参照正規化特徴の線形変換を行う（ステップＳ２７）。例えば、全２０次元のベクトルにおける４次元ずつの和をとり、５次元のベクトルに変換する。
また、入力特徴変換手段７は、はじめに前記入力特徴正規化手段４により得られた入力正規化特徴を読み込み、次に、上の参照特徴変換手段６による処理と同様の線形変換を行う（ステップＳ２８）。
また、図４のステップＳ２９からＳ３５までの処理は、図２に示したステップＳ７からＳ１３までの処理と同様である。
【００３８】
＜第３の実施形態＞
次に説明する第３の実施形態は、前記第２の実施形態に更に学習手段を有する形態である。図５は、本発明の第３の実施形態であり、音響信号を対象とする特徴ひずみに頑健な信号検出装置の一実施形態を示すブロック図である。
図５に示した学習手段８は、特徴ひずみによる変動の少ない変換を前もって学習により求める。
【００３９】
次に、上述した学習手段８における処理を具体的に説明する。図６は、図５に示した特徴ひずみに頑健な信号検出装置の動作を示すフローチャートである。このフローチャートにおいて、ステップＳ４１からＳ４６までの処理は、図４に示したステップＳ２１からＳ２６までの処理と同じである。
【００４０】
学習手段８では、はじめに、十分長いＣＤの楽曲などの音響信号を用意する。次に、この原信号と同一内容の楽曲などの音響信号で、特徴ひずみを含んだひずみ信号を複数用意する。ひずみ信号とは、例えば、携帯電話で受音した音響信号や車のエンジン音の雑音を含む音響信号などである。次に、原信号とひずみ信号から同一区間を複数切り取って信号対を作成する。原信号とひずみ信号から導かれる特徴または正規化特徴の変換出力において、信号対の級間分散を、原信号のまわりの二次モーメントを全信号対における和で割った値を評価関数とする。その評価関数が最大となるような変換係数を前もって学習する、すなわち、次の式（１）〜（３）で定義される一般固有値問題の固有ベクトルを変換係数として用いる。
【００４１】
【数１】

【００４２】
【数２】

【００４３】
【数３】

【００４４】
ただし、Ｍは信号対の数、Ｎは特徴ひずみを含む音響信号の種類、ｘ_ijはｉ番目の信号対のｊ番目の特徴ひずみを含む音響信号の正規化特徴（列ベクトル）、ｘ（バー）_iはｉ番目の信号対中の信号の正規化特徴の平均、ｘ_i0はｉ番目の信号対の原音の信号の正規化特徴、ｘ（バー）は全ての正規化特徴の平均、λは固有値、φは固有ベクトル、ｔは転置を表す。前記行列の固有ベクトルを固有値の大きいものから複数求めておく。本実施形態では、従来技術であるハウスホルダー法により８個の固有ベクトルを求めた。
【００４５】
つまり、学習手段８は上記のように、まず学習用信号を読み込み（ステップＳ４７）、その学習用信号を基に行列計算を行い（ステップＳ４８）、そして固有ベクトルを計算する（ステップＳ４９）。
【００４６】
次に、学習手段８が求めた固有ベクトルを用いて特徴の変換を行う。
参照特徴変換手段６は、はじめに、参照特徴正規化手段３により得られた参照正規化特徴を読み込む。次に、前記学習手段８により得られた固有ベクトルを読み込む。そして、固有ベクトルを用いて前記参照正規化特徴を変換する（以上、ステップＳ５０）。
なお、変換後のｋ番目の特徴ベクトルの要素ｙ_kは、次の式（４）で定義される値である。ただし、φ_kはｋ番目の固有ベクトルである。
【００４７】
【数４】

【００４８】
入力特徴変換手段７は、はじめに、入力特徴正規化手段４により得られた入力正規化特徴を読み込む。次に、前記学習手段８により得られた固有ベクトルを読み込む。そして、上述した参照正規化特徴の変換と同様の変換を入力正規化特徴に対して行う（以上、ステップＳ５１）。
【００４９】
変換後の処理、すなわちステップＳ５２からＳ５８までの処理は、図４に示したステップＳ２９からＳ３５までの処理と同様である。
【００５０】
＜第４の実施形態＞
次に説明する第４の実施形態は、前記第３の実施形態と同様に学習手段を有する構成であるが、その学習の処理の方法が異なるものである。
この第４の実施形態による信号検出装置の構成は、図５に示したものと同じである。また、その信号検出装置の動作手順も図６のフローチャートで示したものと同様である。
【００５１】
次に、本実施形態特有の学習処理の方法について説明する。学習手段は、はじめに、十分長いＣＤの楽曲などの音響信号を用意する。次に、この原信号と同一内容の楽曲などの音響信号であって特徴ひずみを含んだひずみ信号を複数用意する。ひずみ信号とは、例えば、携帯電話で受音した音響信号や車のエンジン音の雑音を含む音響信号などである。次に、原信号とひずみ信号から同一区間を複数切り取って信号対を作成する。原信号とひずみ信号から導かれる特徴または正規化特徴の変換出力において、信号対の級間分散を評価関数とする。その評価関数が最大となるような変換係数を前もって学習する、すなわち、次の式（５）〜（７）で定義される一般固有値問題の固有ベクトルを変換係数として用いる。
【００５２】
【数５】

【００５３】
【数６】

【００５４】
【数７】

【００５５】
ただし、Ｍは信号対の数、Ｎは原信号及び特徴ひずみを含む音響信号の種類、ｘ_ijはｉ番目の信号対のｊ番目の特徴ひずみを含む音響信号の正規化特徴（列ベクトル）、ｘ（バー）_iはｉ番目の信号対中の信号の正規化特徴の平均、ｘ（バー）は全ての正規化特徴の平均、λは固有値、φは固有ベクトル、ｔは転置を表す。前記行列の固有ベクトルを固有値の大きいものから複数求めておく。本実施形態では、従来技術であるハウスホルダー法により８個の固有ベクトルを求めた。
【００５６】
つまり、学習手段８は上記のように、まず学習用信号を読み込み（図６のステップＳ４７）、その学習用信号を基に行列計算を行い（同、ステップＳ４８）、そして固有ベクトルを計算する（同、ステップＳ４９）。
【００５７】
次に、学習手段８が求めた固有ベクトルを用いて特徴の変換を行う。
参照特徴変換手段６は、はじめに、参照特徴正規化手段３により得られた参照正規化特徴を読み込む。次に、前記学習手段８により得られた固有ベクトルを読み込む。そして、固有ベクトルを用いて前記参照正規化特徴を変換する。
なお、変換後のｋ番目の特徴ベクトルの要素ｙ_kは、次の式（８）で定義される値である。ただし、φ_kはｋ番目の固有ベクトルである。
【００５８】
【数８】

【００５９】
入力特徴変換手段７は、はじめに、入力特徴正規化手段４により得られた入力正規化特徴を読み込む。次に、前記学習手段８により得られた固有ベクトルを読み込む。そして、上述した参照正規化特徴の変換と同様の変換を入力正規化特徴に対して行う。
変換後の処理については、実施形態３と同様である。
【００６０】
上述した各実施形態の信号検出装置は、コンピュータを用いて実現される。そして、上述した参照信号読み込み、参照特徴抽出、入力信号読み込み、入力特徴抽出、参照特徴正規化、入力特徴正規化、学習、参照特徴変換、入力特徴変換、特徴間のユークリッド距離算出、最短ユークリッド距離による信号のマッチングなどの各処理の過程は、プログラムの形式でコンピュータ読み取り可能な記録媒体に記憶されており、このプログラムをコンピュータが読み出して実行することによって、上記処理が行われる。ここでコンピュータ読み取り可能な記録媒体とは、磁気ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ、半導体メモリ等をいう。また、このコンピュータプログラムを通信回線によってコンピュータに配信し、この配信を受けたコンピュータが当該プログラムを実行するようにしても良い。
【００６１】
なお、入力信号及び参照信号は、入力ポート等から電気信号として入力するようにする。あるいは、入力信号あるいは参照信号のいずれか一方または両方に相当するデジタルデータを予め磁気ディスク等の記録媒体に書き込んでおいて、この記録媒体からデータを読み込んで処理するようにしても良い。
【００６２】
また、前記各実施形態において、参照特徴計算手段１が計算した参照特徴のベクトルや、入力特徴計算手段２が計算した入力特徴のベクトルや、参照特徴正規化手段３が正規化の処理を行った参照正規化特徴のベクトルや、入力特徴正規化手段４が正規化の処理を行った入力正規化特徴のベクトルや、参照特徴変換手段６によって変換された後のベクトルや、入力特徴変換手段７によって変換された後のベクトルや、学習手段８によって求められた変換係数などのデータ、あるいはその他の必要なデータは、コンピュータが備える記憶装置にそれぞれ書き込まれる。また、後続する各過程でこれらのデータを参照する際には前記記憶装置に書き込まれたデータがそれぞれ読み出される。
【００６３】
次に、この発明を適用した装置の動作実験例を示す。
本発明の効果を確認するため、次のような比較実験を行った。まず入力時系列信号として１２分間の音響信号を用意した。そして、その音響信号の中から無作為に２００個の各５秒間の参照信号を選択した。そして、本発明を適用した場合としなかった場合とで、それぞれ上記入力時系列信号内での上記参照信号の探索を行い、その精度を比較した。
【００６４】
入力信号としては、あるＣＤの音響信号をそのまま装置に取り込んだもの、実環境中でマイクを用いて受音したもの、ＰＨＳ（パーソナル・ハンディフォン・システム）で受音したもの、携帯電話で受音したものを用意した。またこれらを室内において、自動車内において、そして商店街において、それぞれ収録して用意した。
【００６５】
そして、信号検出装置によって最も類似した部分として出力された探索結果の再現率をもって精度とした。ここで前記再現率とは、探索されるべきもののうち探索結果として出力されたものの割合である。
【００６６】
上記のような実験の結果、本発明を適用しなかった場合すなわち正規化を行わなかった場合の精度は６．０４％であった。
正規化特徴に対して変換を行った場合すなわち本発明の第２の実施形態を用いた場合の精度は５０．２％であった。
また、学習結果を用いて変換を行った場合、本発明の第３の実施形態を用いた場合の精度は５９．６％、本発明の第４の実施形態を用いた場合の精度は６１．２％であった。
このように、本発明を適用することにより、従来技術に比べて信号検出の精度が大幅に向上することが実証できた。
【００６７】
なお、本発明を応用することによって、例えば、実環境中に流れている音楽やＣＭ（コマーシャルメッセージ）を携帯端末で受音し、その受音された音響信号を用いて膨大な音楽ＣＭデータベースの中から同一の音楽やＣＭを検索するといったコンテンツ検索装置が実現可能となる。また、このようなコンテンツ検索装置を、コンテンツの不正な複製や不正な再生を検出するために用いることもできる。
また、音響信号だけでなく、映像信号など一般の信号の検出にも応用することができる。
【００６８】
以上、図面を参照してこの発明の実施形態を詳述してきたが、具体的な構成はこれらの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。
【００６９】
【発明の効果】
以上説明したように、この発明によれば、参照特徴を正規化する処理により参照正規化特徴を導く参照特徴正規化過程と、入力特徴を正規化する処理により入力正規化特徴を導く入力特徴正規化過程とを有し、特徴照合過程においては、これら正規化された特徴同士を用いて類似度の計算を行うため、様々な種類の特徴ひずみに共通な方法により、特徴ひずみに頑健な信号検出を行うことが可能となる。
【００７０】
また、この発明によれば、参照特徴正規化過程において計算された参照正規化特徴及び入力特徴正規化過程において計算された入力正規化特徴の少なくともいずれか一方に対して所定の変換処理を行う変換過程を有するため、より一層特徴ひずみに頑健な信号検出を行うことが可能となる。
【００７１】
また、この発明によれば、特徴ひずみによる変動の少ない特徴への変換を前もって学習処理によって求める学習過程を有しており、変換過程においては学習処理の結果得られる変換を用いて変換処理を行うため、より一層特徴ひずみに頑健な信号検出を行うことが可能となる。
【００７２】
これらにより、例えば映像信号や音響信号など、時系列信号において特定の信号と類似の部分を探索する場合に、探索精度を向上させることが可能となり、例えば、データベースの中から特定の音楽やＣＭや映像などを検索するコンテンツ検索装置の検索精度を向上させることが可能となる。
【００７３】
また、特に、複数の特徴ひずみに対しても一つの参照特徴で信号検出を可能とした点、注目領域全体のパワーを用いて正規化する方法に比べて周波数特性の変化にも対応可能にした点などにおいて、本願発明の技術は従来技術よりも飛躍的に進歩している。
【図面の簡単な説明】
【図１】この発明の第１の実施形態による信号検出装置の構成を示すブロック図である。
【図２】この発明の第１の実施形態による信号検出装置の動作手順を示すフローチャートである。
【図３】この発明の第２の実施形態による信号検出装置の構成を示すブロック図である。
【図４】この発明の第２の実施形態による信号検出装置の動作手順を示すフローチャートである。
【図５】この発明の第３および第４の実施形態による信号検出装置の構成を示すブロック図である。
【図６】この発明の第３および第４の実施形態による信号検出装置の動作手順を示すフローチャートである。
【符号の説明】
１参照特徴計算手段
２入力特徴計算手段
３参照特徴正規化手段
４入力特徴正規化手段
５特徴照合手段
６参照特徴変換手段
７入力特徴変換手段
８学習手段
[0001]
[Technical field of the invention]
The present invention relates to a signal detection method and apparatus for detecting a portion of an input signal that is similar to a reference signal. In particular, it relates to a signal detection method and apparatus for detecting, within an input time-series signal such as a video signal or an acoustic signal, a portion similar to a reference time-series signal that is shorter than the input time-series signal.
[0002]
[Prior art]
Conventionally, as a signal detection method, an acoustic signal detection method is known that finds the location of an acoustic signal similar to a previously registered acoustic signal, as in Japanese Patent No. 3065314, "High-speed signal detection method, apparatus and recording medium thereof". However, this method assumes that the feature distortion caused by noise in the reference time-series signal or the input time-series signal is small, and it has the drawback that search accuracy may decrease when the feature distortion is severe.
[0003]
The inventors of the present invention have also invented a method of performing signal detection that is robust against feature distortion by providing a variation adding process, and have already filed a patent application for it.
There is also a method of performing signal detection that is robust against feature distortion by normalizing using the power of the entire region of interest.
[0004]
[Problems to be solved by the invention]
The above-described method of providing a variation adding process has a drawback that a plurality of reference features must be prepared when a plurality of feature distortions are considered.
In addition, the above-described normalization method using the power of the entire region of interest has a drawback in that the search accuracy decreases when the frequency characteristic changes.
[0005]
The present invention has been made in view of the above circumstances. Its object is to provide signal detection processing means that are more robust to feature distortion than conventional methods, and that are also more general-purpose than conventional methods while remaining robust to feature distortion.
[0006]
[Means for Solving the Problems]
In order to solve the above problems, the present invention is a signal detection device for detecting, from an input signal, a signal similar to a reference signal, comprising: reference feature calculation means for extracting a reference feature from an input reference signal; reference feature normalization means for calculating a reference normalized feature by performing normalization processing based on the reference feature; reference feature conversion means for linearly transforming the reference normalized feature; input feature calculation means for extracting an input feature from the input signal; input feature normalization means for calculating an input normalized feature by performing normalization processing based on the input feature; input feature conversion means for linearly transforming the input normalized feature; feature matching means for calculating the similarity between the post-conversion input normalized feature in a matching section set on the input normalized feature after linear transformation by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature) and the reference normalized feature after linear transformation by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and for outputting a search result based on the calculated similarity; and learning means for reading a learning signal and obtaining the conversion coefficients used for the linear transformation in the reference feature conversion means and the input feature conversion means.
The learning means uses, as the learning signal, an original signal without feature distortion and a distorted signal obtained by adding feature distortion to the original signal; creates a plurality of signal pairs over predetermined sections of the normalized features of the learning signal, which are obtained by applying to the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means; and obtains the conversion coefficients that maximize an evaluation function defined as the ratio of the between-class variance of the signal pairs to their within-class variance. The reference feature conversion means or the input feature conversion means linearly transforms the reference normalized feature and the input normalized feature, respectively, using the conversion coefficients obtained by the learning means.
[0007]
Further, in the present invention, the signal detection device generates the conversion coefficients obtained by maximizing the evaluation function using the eigenvectors obtained from the following Expressions 1 to 3.
[Expression 1]

[Expression 2]

[Expression 3]
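For orientation, a criterion of the kind described here (a ratio of between-class to within-class variance, maximized by eigenvectors) is conventionally written as a Rayleigh quotient whose maximizers solve a generalized eigenvalue problem. The following is only a sketch under assumptions: the scatter matrices $S_B$ and $S_W$ and their normalizing constants are hypothetical, chosen to agree with the symbols the description defines for the third embodiment ($M$ signal pairs, $N$ kinds of distorted signal, $x_{ij}$, $\bar{x}_i$, $x_{i0}$, $\bar{x}$, $\lambda$, $\phi$). They are not the patent's own Expressions 1 to 3.

```latex
% Assumed form; S_B, S_W and the constants 1/M, 1/(MN) are illustrative.
\begin{align}
  J(\phi) &= \frac{\phi^{t} S_B \,\phi}{\phi^{t} S_W \,\phi},
  &
  S_B \,\phi &= \lambda \, S_W \,\phi, \\
  S_B &= \frac{1}{M} \sum_{i=1}^{M}
         (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{t},
  &
  S_W &= \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N}
         (x_{ij} - x_{i0})(x_{ij} - x_{i0})^{t}.
\end{align}
```

Under this reading, the eigenvectors $\phi$ with the largest eigenvalues $\lambda$ are the directions in which different signal pairs are well separated while the distorted versions of one signal stay close to its original.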
[0008]
The present invention is also a signal detection device for detecting, from an input signal, a signal similar to a reference signal, comprising: reference feature calculation means for extracting a reference feature from an input reference signal; reference feature normalization means for calculating a reference normalized feature by performing normalization processing based on the reference feature; reference feature conversion means for linearly transforming the reference normalized feature; input feature calculation means for extracting an input feature from the input signal; input feature normalization means for calculating an input normalized feature by performing normalization processing based on the input feature; input feature conversion means for linearly transforming the input normalized feature; feature matching means for calculating the similarity between the post-conversion input normalized feature in a matching section set on the input normalized feature after linear transformation by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature) and the reference normalized feature after linear transformation by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and for outputting a search result based on the calculated similarity; and learning means for reading a learning signal and obtaining the conversion coefficients used for the linear transformation in the reference feature conversion means and the input feature conversion means.
The learning means uses, as the learning signal, an original signal without feature distortion and a distorted signal obtained by adding feature distortion to the original signal; creates a plurality of signal pairs over predetermined sections of the normalized features of the learning signal, which are obtained by applying to the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means; and obtains the conversion coefficients that maximize an evaluation function defined as the between-class variance of the signal pairs. The reference feature conversion means or the input feature conversion means linearly transforms the reference normalized feature and the input normalized feature, respectively, using the conversion coefficients obtained by the learning means.
[0009]
Further, in the present invention, the signal detection device generates the conversion coefficients obtained by maximizing the evaluation function using the eigenvectors obtained from the following Expressions 5 to 7.
[Expression 5]

[Expression 6]

[Expression 7]
[0010]
The present invention is also a signal detection method for a signal detection device that detects, from an input signal, a signal similar to a reference signal, in which: the reference feature calculation means of the signal detection device extracts a reference feature from an input reference signal; the reference feature normalization means of the signal detection device calculates a reference normalized feature by performing normalization processing based on the reference feature; the learning means of the signal detection device reads a learning signal representing an original signal without feature distortion and a distorted signal obtained by adding feature distortion to the original signal, creates, using the learning signal, a plurality of signal pairs over predetermined sections of the normalized features of the learning signal obtained by applying to the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means, and obtains the conversion coefficients that maximize an evaluation function defined as the ratio of the between-class variance of the signal pairs to their within-class variance; the reference feature conversion means of the signal detection device linearly transforms the reference normalized feature using the conversion coefficients obtained by the learning means; the input feature calculation means of the signal detection device extracts an input feature from the input signal; the input feature normalization means of the signal detection device calculates an input normalized feature by performing normalization processing based on the input feature; the input feature conversion means of the signal detection device linearly transforms the input normalized feature using the conversion coefficients obtained by the learning means; and the feature matching means of the signal detection device calculates the similarity between the post-conversion input normalized feature in a matching section set on the input normalized feature after linear transformation by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature) and the reference normalized feature after linear transformation by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and outputs a search result based on the calculated similarity.
[0011]
Further, in the present invention, in the signal detection method, the conversion coefficients obtained by maximizing the evaluation function are generated using the eigenvectors obtained from the following Expressions 1 to 3.
[Expression 1]

[Expression 2]

[Expression 3]
[0012]
The present invention is also a signal detection method for detecting, from an input signal, a signal similar to a reference signal, in which: the reference feature calculation means of the signal detection device extracts a reference feature from an input reference signal; the reference feature normalization means of the signal detection device calculates a reference normalized feature by performing normalization processing based on the reference feature; the learning means of the signal detection device reads a learning signal representing an original signal without feature distortion and a distorted signal obtained by adding feature distortion to the original signal, creates, using the learning signal, a plurality of signal pairs over predetermined sections of the normalized features of the learning signal obtained by applying to the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means, and obtains the conversion coefficients that maximize an evaluation function defined as the between-class variance of the signal pairs; the reference feature conversion means of the signal detection device linearly transforms the reference normalized feature using the conversion coefficients obtained by the learning means; the input feature calculation means of the signal detection device extracts an input feature from the input signal; the input feature normalization means of the signal detection device calculates an input normalized feature by performing normalization processing based on the input feature; the input feature conversion means of the signal detection device linearly transforms the input normalized feature using the conversion coefficients obtained by the learning means; and the feature matching means of the signal detection device calculates the similarity between the post-conversion input normalized feature in a matching section set on the input normalized feature after linear transformation by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature) and the reference normalized feature after linear transformation by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and outputs a search result based on the calculated similarity.
[0013]
Further, in the present invention, in the signal detection method, the conversion coefficients obtained by maximizing the evaluation function are generated using the eigenvectors obtained from the following Expressions 5 to 7.
[Expression 5]

[Expression 6]

[Expression 7]
[0014]
The present invention is also a signal detection program for causing a computer to operate as the above signal detection device.
[0015]
The present invention is also a computer-readable recording medium storing the signal detection program.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0024]
<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a signal detection device according to the first embodiment of the present invention, which is robust to feature distortion and targets acoustic signals.
The signal detection device shown in FIG. 1 realizes signal detection that is robust to feature distortion for acoustic signals. It comprises reference feature calculation means 1, input feature calculation means 2, reference feature normalization means 3, input feature normalization means 4, and feature matching means 5. It takes as input a reference time-series signal, that is, the acoustic signal to be searched for, and an input time-series signal, that is, the acoustic signal to be searched, and outputs the locations in the input time-series signal that are similar to the reference time-series signal.
[0025]
The reference feature calculation means 1 derives a reference feature comprising a feature vector from a reference time series signal.
The input feature calculation means 2 derives an input feature composed of feature vectors from the input time series signal.
The reference feature normalizing means 3 derives, from the reference features, reference normalized features that are independently normalized for each element of the feature vector using a statistic derived from surrounding reference features.
The input feature normalizing means 4 derives an input normalized feature obtained by normalizing each element of the feature vector independently from the input feature using a statistic derived from surrounding input features.
The feature matching means 5 sets a matching section in the input normalized feature and calculates the similarity between the reference normalized feature and that matching section of the input normalized feature.
[0026]
Next, the processing performed by the reference feature calculation means 1 through the feature matching means 5 described above will be explained specifically. FIG. 2 is a flowchart showing the operation of the signal detection device robust to feature distortion shown in FIG. 1. The explanation below follows this flowchart.
The reference feature calculation means 1 first reads a given reference time series signal (step S1). Next, feature extraction is performed on the read reference time series signal (step S2).
[0027]
In this embodiment, the amplitude component of the Fourier transform of the acoustic signal is used as the feature. For example, to search the acoustic signal of a CD being played in a real environment for an acoustic signal of about 5 seconds received by a portable terminal, good results are obtained with the following feature extraction settings. A 1-second section of the acoustic signal, sampled at 8000 Hz (hertz), is Fourier-transformed; the range 0 to 4000 Hz is divided into 20 sections of equal width; and a 20-dimensional vector consisting of the average power of the amplitude components within each section is extracted. The feature vector is extracted every 0.1 second.
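The extraction settings above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes NumPy and a mono signal held in a one-dimensional array, and the function name `band_power_features` and its parameter names are hypothetical. Dropping the Nyquist bin so that the 20 bands have equal width is also an assumption.

```python
import numpy as np

def band_power_features(signal, sample_rate=8000, window_sec=1.0,
                        hop_sec=0.1, n_bands=20):
    """Return one n_bands-dimensional band-power vector per hop."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * window_sec))   # 8000 samples per window
    hop = int(round(sample_rate * hop_sec))      # 800 samples per step
    n_frames = max(0, 1 + (len(signal) - win) // hop)
    feats = np.empty((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + win]
        # Squared magnitude of the one-sided spectrum (0 Hz up to Nyquist).
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Keep the bins below Nyquist and split them into equal-width bands.
        bands = np.array_split(power[: win // 2], n_bands)
        feats[t] = [band.mean() for band in bands]
    return feats

# Usage: a 2-second, 1000 Hz tone yields 11 frames of 20 dimensions each.
t = np.arange(2 * 8000) / 8000.0
features = band_power_features(np.sin(2 * np.pi * 1000 * t))
```

With a 1-second window at 8000 Hz each spectral bin is 1 Hz wide, so each of the 20 bands averages 200 bins.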
[0028]
Apart from this embodiment, a reduced image of a video signal can also be used as a feature. For example, when it is desired to search for a video signal of about 15 seconds from a television broadcast signal, good results can be obtained by setting the specific feature extraction settings as follows. That is, an image of one frame is divided into four equal parts horizontally and three equal parts vertically, and 12 regions are provided, and pixel values are averaged for each of RGB (the three primary colors of red, green, and blue) in each region. A 36-dimensional vector composed of average pixel values of RGB in the 12 regions is defined as a feature vector. In this case, the feature vector is obtained for each frame.
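The reduced-image feature described above can be sketched in the same way. The sketch assumes a frame stored as a NumPy array of shape (height, width, 3) in RGB order; the function name `frame_feature` and the ordering of the 36 elements (region by region, then R, G, B) are illustrative assumptions, since the text does not fix an ordering.

```python
import numpy as np

def frame_feature(frame, rows=3, cols=4):
    """Average R, G, B over a rows x cols grid of regions (36 values by default)."""
    frame = np.asarray(frame, dtype=float)
    height, width, _ = frame.shape
    row_edges = np.linspace(0, height, rows + 1).astype(int)
    col_edges = np.linspace(0, width, cols + 1).astype(int)
    feature = []
    for r in range(rows):
        for c in range(cols):
            region = frame[row_edges[r]:row_edges[r + 1],
                           col_edges[c]:col_edges[c + 1]]
            # Mean pixel value of each color channel within the region.
            feature.extend(region.reshape(-1, 3).mean(axis=0))
    return np.asarray(feature)
```

Calling `frame_feature` on every frame of a video gives one 36-dimensional feature vector per frame, matching the rate stated above.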
[0029]
First, the input feature calculation means 2 reads an input signal (step S3). Next, feature extraction is performed on the read input signal (step S4). For feature extraction, the same operation as that performed in the reference feature calculation means 1 is performed.
[0030]
First, the reference feature normalization means 3 reads the reference feature obtained by the reference feature calculation means 1. Next, for each element of the feature vectors of the reference feature, it obtains the mean and standard deviation over a fixed section. For example, the mean and standard deviation are computed over the feature vectors in the section extending 1 second before and after the feature vector in question. Then the feature vector whose elements are obtained by subtracting the mean from the feature vector and dividing by the standard deviation is taken as the reference normalized feature (the above is step S5).
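A minimal sketch of this per-element local normalization is given below. It assumes NumPy and feature vectors spaced 0.1 second apart, so that 1 second before and after corresponds to 10 frames on each side. The name `normalize_features`, the truncation of the window at the ends of the signal, and the small constant `eps` guarding against division by zero are assumptions not stated in the text.

```python
import numpy as np

def normalize_features(feats, half_window=10, eps=1e-12):
    """Normalize each element by the mean and std of its local neighborhood."""
    feats = np.asarray(feats, dtype=float)
    out = np.empty_like(feats)
    for t in range(len(feats)):
        lo = max(0, t - half_window)
        hi = min(len(feats), t + half_window + 1)
        local = feats[lo:hi]                 # frames within +/- 1 second
        mean = local.mean(axis=0)            # per-element mean
        std = local.std(axis=0)              # per-element standard deviation
        out[t] = (feats[t] - mean) / (std + eps)
    return out
```

Because each element is normalized independently, multiplying any one frequency band by a constant gain leaves the normalized feature essentially unchanged, which is consistent with the stated aim of tolerating changes in frequency characteristics.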
[0031]
First, the input feature normalization means 4 reads the input features obtained by the input feature calculation means 2. Next, normalization is performed on the read input features. For normalization, the same operation as that performed in the reference feature normalization means 3 is performed (step S6).
[0032]
The feature matching unit 5 first reads the reference normalized feature and the input normalized feature obtained by the reference feature normalizing unit 3 and the input feature normalizing unit 4. Subsequently, a collation section having the same length as the reference normalized feature given by the reference feature normalizing unit 3 is set on the input normalized feature. Next, the similarity between the reference normalized feature and the input normalized feature in the collation section is calculated. Here, the Euclidean distance is used as the similarity measure. For example, when the reference normalized feature is 5 seconds long, five reference normalized feature vectors are extracted, one per second, and the 100-dimensional vector formed from them is used for matching. That is, since the feature vector obtained by Fourier-transforming the acoustic signal has 20 dimensions, five such vectors (5 seconds) give 100 dimensions.
[0033]
Matching is performed while the collation section is shifted from the head of the input normalized feature. After matching has reached the end, the location with the smallest Euclidean distance is output as the search result.
That is, the feature matching unit 5 initializes the position in the input signal and initializes the shortest distance (step S7), calculates the Euclidean distance between the vectors based on the reference signal and the input signal at the current position (step S8), and determines whether or not the calculated distance is smaller than the shortest distance (step S9). If it is smaller, the shortest distance is updated (step S10); if not, the process of step S10 is skipped.
Then, it is determined whether or not the input signal has ended (step S11). If it has not ended yet, the position of the input signal is shifted (step S12), and the process returns to step S8. If the input signal has been completed in the determination in step S11, the result is output (step S13), and the entire process is completed.
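The loop of steps S7 to S13 can be sketched as below. The function name `search_best_match` is hypothetical, and the sketch assumes both sequences have already been reduced to the vectors used for matching (for example, one vector per second), so that concatenating the vectors in a collation section gives the matching vector.

```python
import math


def search_best_match(reference, inputs):
    """Slide the reference over the input and return (position, distance)."""
    length = len(reference)
    ref_flat = [v for vec in reference for v in vec]   # e.g. 5 x 20 = 100 dims
    best_pos, best_dist = None, math.inf               # step S7
    for pos in range(len(inputs) - length + 1):        # steps S11 and S12
        win_flat = [v for vec in inputs[pos:pos + length] for v in vec]
        dist = math.sqrt(sum((a - b) ** 2
                             for a, b in zip(ref_flat, win_flat)))  # step S8
        if dist < best_dist:                           # step S9
            best_pos, best_dist = pos, dist            # step S10
    return best_pos, best_dist                         # step S13
```

The returned position is an index into the input vector sequence; converting it to a time offset depends on the spacing of the vectors.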
[0034]
In addition, when the threshold value of the Euclidean distance is given in advance, only search results that are below the threshold value can be output. As a result, when there is no acoustic signal matching the reference signal in the input signal, the result can be prevented from being output.
It is also possible to output the top N locations in ascending order of Euclidean distance. Thereby, even when there are a plurality of locations in the input signal that may match the reference signal, the plurality of candidates can be output.
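Both output variants can be sketched with one function. The name `ranked_matches` and its parameters `threshold` and `top_n` are hypothetical. The sketch ranks every collation position, which may return overlapping neighbours of the same match, and the text does not say how such overlaps are handled.

```python
import math


def ranked_matches(reference, inputs, threshold=None, top_n=None):
    """Return (position, distance) pairs sorted by increasing distance.

    threshold: keep only distances below this value, if given.
    top_n: keep at most this many results, if given.
    """
    length = len(reference)
    ref_flat = [v for vec in reference for v in vec]
    candidates = []
    for pos in range(len(inputs) - length + 1):
        win_flat = [v for vec in inputs[pos:pos + length] for v in vec]
        dist = math.sqrt(sum((a - b) ** 2
                             for a, b in zip(ref_flat, win_flat)))
        if threshold is None or dist < threshold:
            candidates.append((pos, dist))
    candidates.sort(key=lambda c: c[1])   # smallest distance first
    return candidates if top_n is None else candidates[:top_n]
```

An empty list then signals that nothing in the input was close enough to the reference.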
[0035]
<Second Embodiment>
In the second embodiment to be described next, conversion means is further added to the first embodiment. FIG. 3 is a block diagram showing a configuration of a signal detection device that is a second embodiment of the present invention and is robust against characteristic distortions for acoustic signals.
The reference feature conversion unit 6 and the input feature conversion unit 7 shown in FIG. 3 perform conversion on the normalized feature calculated from the reference time series signal and the input time series signal.
[0036]
Next, the processing in the reference feature conversion unit 6 and the input feature conversion unit 7 described above will be specifically described. FIG. 4 is a flowchart showing the operation of the signal detection apparatus robust to the characteristic distortion shown in FIG. In this flowchart, the processing from step S21 to S26 is the same as the processing from step S1 to S6 shown in FIG.
[0037]
Then, the reference feature conversion unit 6 first reads the reference normalized feature obtained by the reference feature normalizing unit 3, and then performs a linear conversion of the reference normalized feature (step S27). For example, each 20-dimensional vector is converted to a five-dimensional vector by summing its elements four dimensions at a time.
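The example conversion can be written as a matrix product, which makes its linearity explicit. The sketch assumes the four dimensions that are summed are adjacent, which the text does not state; the helper names `group_sum_matrix` and `apply_linear` are hypothetical.

```python
def group_sum_matrix(in_dim=20, group=4):
    """Build an (in_dim // group) x in_dim matrix of 0s and 1s.

    Row i has 1s in columns i*group .. i*group + group - 1 (assumed grouping).
    """
    out_dim = in_dim // group
    return [[1.0 if j // group == i else 0.0 for j in range(in_dim)]
            for i in range(out_dim)]


def apply_linear(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```

For instance, `apply_linear(group_sum_matrix(), x)` maps a 20-dimensional `x` to 5 dimensions. The same `apply_linear` would accept any other coefficient matrix, such as one obtained by learning.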
The input feature conversion means 7 first reads the input normalized feature obtained by the input feature normalization means 4, and then performs the same linear conversion as that performed by the reference feature conversion means 6 (step S28).
Further, the processing from step S29 to S35 in FIG. 4 is the same as the processing from step S7 to S13 shown in FIG.
[0038]
<Third Embodiment>
A third embodiment to be described next is a form further including learning means in the second embodiment. FIG. 5 is a block diagram showing an embodiment of a signal detection apparatus according to the third embodiment of the present invention, which is robust to characteristic distortions for acoustic signals.
The learning means 8 shown in FIG. 5 obtains a conversion with less fluctuation due to feature distortion by learning in advance.
[0039]
Next, the processing in the learning means 8 mentioned above is described concretely. FIG. 6 is a flowchart showing the operation of the signal detection apparatus robust to the characteristic distortion shown in FIG. In this flowchart, the processing from step S41 to S46 is the same as the processing from step S21 to S26 shown in FIG.
[0040]
The learning means 8 first prepares, as the original signal, an acoustic signal such as a sufficiently long piece of music from a CD. Next, a plurality of distortion signals, which are acoustic signals having the same content as the original signal but containing feature distortion, are prepared. A distortion signal is, for example, an acoustic signal received by a mobile phone or an acoustic signal containing car engine noise. Next, the same sections are cut out from the original signal and the distortion signals to create a plurality of signal pairs. For the converted output of the features, or normalized features, derived from the original signal and the distortion signals, the evaluation function is the inter-class variance of the signal pairs divided by the second moment about the original signal summed over all the signal pairs. The conversion coefficients that maximize this evaluation function are learned in advance; that is, the eigenvectors of the generalized eigenvalue problem defined by the following equations (1) to (3) are used as the conversion coefficients.
[0041]
[Equation 1]

[0042]
[Equation 2]

[0043]
[Equation 3]

[0044]
Here, M is the number of signal pairs, N is the number of types of acoustic signal containing feature distortion, x_ij is the normalized feature (column vector) of the acoustic signal containing the j-th feature distortion in the i-th signal pair, x̄_i is the average of the normalized features of the signals in the i-th signal pair, x_i0 is the normalized feature of the original acoustic signal of the i-th signal pair, x̄ is the average of all normalized features, λ is an eigenvalue, φ is an eigenvector, and t denotes transposition. A plurality of eigenvectors of the matrix are taken in descending order of eigenvalue. In this embodiment, eight eigenvectors are obtained by the conventional Householder method.
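Equations (1) to (3) are not reproduced in this text. The following is only a plausible form, inferred from the symbol definitions and from the description of the evaluation function as inter-class variance divided by the second moment about the original signal. The matrix names S_B and S_W, the normalizing constants, and the index ranges are assumptions, and the ordering of the equations may differ from the original.

```latex
% Assumed evaluation function for a conversion vector \phi
J(\phi) = \frac{\phi^{t} S_B \, \phi}{\phi^{t} S_W \, \phi}

% Assumed inter-class scatter: spread of the pair averages about the global average
S_B = \frac{1}{M} \sum_{i=1}^{M} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{t}

% Assumed second moment of the distorted features about the original signal
S_W = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (x_{ij} - x_{i0})(x_{ij} - x_{i0})^{t}

% Setting the gradient of J to zero gives a generalized eigenvalue problem
S_B \, \phi = \lambda \, S_W \, \phi
```

Under this form, the constants 1/M and 1/(MN) rescale λ but leave the eigenvectors unchanged, so they would not affect the conversion coefficients.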
[0045]
That is, as described above, the learning means 8 first reads the learning signal (step S47), performs matrix calculation based on the learning signal (step S48), and calculates the eigenvector (step S49).
[0046]
Next, feature conversion is performed using the eigenvector obtained by the learning means 8.
First, the reference feature conversion unit 6 reads the reference normalization feature obtained by the reference feature normalization unit 3. Next, the eigenvector obtained by the learning means 8 is read. Then, the reference normalization feature is converted using the eigenvector (step S50).
The k-th element y_k of the feature vector after conversion is the value defined by the following equation (4), where φ_k is the k-th eigenvector.
[0047]
[Equation 4]

[0048]
First, the input feature conversion unit 7 reads the input normalized feature obtained by the input feature normalization unit 4. Next, the eigenvector obtained by the learning means 8 is read. Then, the conversion similar to the conversion of the reference normalization feature described above is performed on the input normalization feature (step S51).
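Equation (4) is not reproduced in this text. Given that y_k is the k-th element of the converted vector and φ_k is the k-th eigenvector, a natural reading is the projection of the normalized feature onto each eigenvector. The symbol x for the normalized feature vector being converted and K for the number of eigenvectors are assumptions introduced for this sketch.

```latex
% Assumed form of the conversion applied to both reference and input features
y_k = \phi_k^{t} \, x , \qquad k = 1, \dots, K
```

With the eight eigenvectors of this embodiment, K = 8 and each normalized feature vector would be reduced to eight elements.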
[0049]
The process after conversion, that is, the process from step S52 to S58 is the same as the process from step S29 to S35 shown in FIG.
[0050]
<Fourth Embodiment>
A fourth embodiment to be described next has a learning unit as in the third embodiment, but the learning processing method is different.
The configuration of the signal detection apparatus according to the fourth embodiment is the same as that shown in FIG. The operation procedure of the signal detection apparatus is the same as that shown in the flowchart of FIG.
[0051]
Next, the learning processing method unique to this embodiment will be described. The learning means first prepares, as the original signal, an acoustic signal such as a sufficiently long piece of music from a CD. Next, a plurality of distortion signals, which are acoustic signals having the same content as the original signal but containing feature distortion, are prepared. A distortion signal is, for example, an acoustic signal received by a mobile phone or an acoustic signal containing car engine noise. Next, the same sections are cut out from the original signal and the distortion signals to create a plurality of signal pairs. For the converted output of the features, or normalized features, derived from the original signal and the distortion signals, the inter-class variance of the signal pairs is used as the evaluation function. The conversion coefficients that maximize this evaluation function are learned in advance; that is, the eigenvectors of the generalized eigenvalue problem defined by the following equations (5) to (7) are used as the conversion coefficients.
[0052]
[Equation 5]

[0053]
[Equation 6]

[0054]
[Equation 7]

[0055]
Here, M is the number of signal pairs, N is the number of types of acoustic signal, counting the original signal and the signals containing feature distortion, x_ij is the normalized feature (column vector) of the acoustic signal containing the j-th feature distortion in the i-th signal pair, x̄_i is the average of the normalized features of the signals in the i-th signal pair, x̄ is the average of all the normalized features, λ is an eigenvalue, φ is an eigenvector, and t denotes transposition. A plurality of eigenvectors of the matrix are taken in descending order of eigenvalue. In this embodiment, eight eigenvectors are obtained by the conventional Householder method.
[0056]
That is, as described above, the learning means 8 first reads a learning signal (step S47 in FIG. 6), performs matrix calculation based on the learning signal (step S48), and calculates the eigenvectors (step S49 in the same figure).
[0057]
Next, feature conversion is performed using the eigenvector obtained by the learning means 8.
First, the reference feature conversion unit 6 reads the reference normalization feature obtained by the reference feature normalization unit 3. Next, the eigenvector obtained by the learning means 8 is read. Then, the reference normalization feature is converted using an eigenvector.
The k-th element y_k of the feature vector after conversion is the value defined by the following equation (8), where φ_k is the k-th eigenvector.
[0058]
[Equation 8]

[0059]
First, the input feature conversion unit 7 reads the input normalized feature obtained by the input feature normalization unit 4. Next, the eigenvector obtained by the learning means 8 is read. Then, the same conversion as that of the reference normalization feature described above is performed on the input normalization feature.
The processing after conversion is the same as in the third embodiment.
[0060]
The signal detection apparatus of each embodiment described above is realized using a computer. Each of the processes described above, namely reference signal reading, reference feature extraction, input signal reading, input feature extraction, reference feature normalization, input feature normalization, learning, reference feature conversion, input feature conversion, calculation of the Euclidean distance between features, and matching of the signal with the shortest Euclidean distance, is stored in a computer-readable recording medium in the form of a program, and the processing is performed by the computer reading and executing the program. Here, the computer-readable recording medium means a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. Alternatively, the computer program may be distributed to the computer via a communication line, and the computer that has received the distribution may execute the program.
[0061]
The input signal and the reference signal are input as electric signals from an input port or the like. Alternatively, digital data corresponding to one or both of the input signal and the reference signal may be written in advance on a recording medium such as a magnetic disk, and the data may be read from the recording medium for processing.
[0062]
In each of the above embodiments, the reference feature vector calculated by the reference feature calculation unit 1, the input feature vector calculated by the input feature calculation unit 2, the reference normalized feature vector normalized by the reference feature normalization unit 3, the input normalized feature vector normalized by the input feature normalizing means 4, the vector converted by the reference feature converting means 6, the vector converted by the input feature converting means 7, data such as the conversion coefficients obtained by the learning means 8, and any other necessary data are written to a storage device included in the computer. When these data are referred to in subsequent processing, they are read out from the storage device.
[0063]
Next, an operation experiment example of the apparatus to which the present invention is applied will be shown.
In order to confirm the effect of the present invention, the following comparative experiment was conducted. First, an acoustic signal for 12 minutes was prepared as an input time series signal. Then, 200 reference signals for 5 seconds each were randomly selected from the acoustic signals. Then, the reference signal was searched for in the input time-series signal with and without applying the present invention, and the accuracy was compared.
[0064]
As input signals, the following were prepared from the acoustic signal of a certain CD: the signal taken directly into the device, the sound received by a microphone in a real environment, the sound received by a PHS (Personal Handyphone System), and the sound received by a mobile phone. Recordings were also made in a room, in a car, and in a shopping street.
[0065]
The accuracy was measured as the recall of the search result that the signal detection device output as the most similar location. Here, recall is the proportion of the items that should have been retrieved that were actually output as search results.
[0066]
As a result of the above experiment, the accuracy when the present invention was not applied, that is, when normalization was not performed, was 6.04%.
The accuracy when the normalized feature was converted, that is, when the second embodiment of the present invention was used was 50.2%.
Further, when the conversion was performed using the learning result, the accuracy was 59.6% with the third embodiment of the present invention and 61.2% with the fourth embodiment of the present invention.
Thus, it has been proved that the accuracy of signal detection is greatly improved by applying the present invention as compared with the prior art.
[0067]
By applying the present invention, it is possible to realize, for example, a content search apparatus that receives music or a CM (commercial message) playing in a real environment with a portable terminal and uses the received acoustic signal to search a huge database of music and CMs for the same music or CM. Such a content search device can also be used to detect unauthorized duplication or unauthorized reproduction of content.
Further, it can be applied not only to an acoustic signal but also to detection of a general signal such as a video signal.
[0068]
The embodiments of the present invention have been described in detail above with reference to the drawings. However, the specific configuration is not limited to these embodiments, and designs and the like within a scope not departing from the gist of the present invention are also included.
[0069]
[Effects of the invention]
As described above, the present invention includes a reference feature normalization process that derives a reference normalized feature by normalizing the reference feature, and an input feature normalization process that derives an input normalized feature by normalizing the input feature, and the feature matching process calculates the similarity using these normalized features. Signal detection that is robust against feature distortion can therefore be performed by a method common to various types of feature distortion.
[0070]
Further, the present invention includes a conversion process that performs a predetermined conversion on at least one of the reference normalized feature calculated in the reference feature normalization process and the input normalized feature calculated in the input feature normalization process, which makes it possible to perform signal detection that is more robust against feature distortion.
[0071]
In addition, according to the present invention, a learning process is performed in which a conversion to a feature with less fluctuation due to feature distortion is obtained in advance by a learning process, and the conversion process is performed using a conversion obtained as a result of the learning process. Therefore, it becomes possible to perform signal detection more robust against characteristic distortion.
[0072]
Thus, for example, when searching a time-series signal such as a video signal or an audio signal for a portion similar to a specific signal, the search accuracy can be improved. This makes it possible, for example, to improve the search accuracy of a content search apparatus that searches for specific music, CMs, video, and the like.
[0073]
In particular, signals can be detected with a single reference feature even under multiple kinds of feature distortion, and, unlike the normalization method that uses the power of the entire region of interest, changes in frequency characteristics can also be handled. In these respects, the technology of the present invention represents a significant advance over the prior art.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a signal detection device according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing an operation procedure of the signal detection apparatus according to the first embodiment of the present invention.
FIG. 3 is a block diagram showing a configuration of a signal detection device according to a second embodiment of the present invention.
FIG. 4 is a flowchart showing an operation procedure of a signal detection device according to a second embodiment of the present invention.
FIG. 5 is a block diagram showing a configuration of a signal detection device according to third and fourth embodiments of the present invention.
FIG. 6 is a flowchart showing an operation procedure of a signal detection device according to third and fourth embodiments of the present invention.
[Explanation of symbols]
1 Reference feature calculation means
2 Input feature calculation means
3 Reference feature normalization means
4 Input feature normalization means
5 Feature matching means
6 Reference feature conversion means
7 Input feature conversion means
8 Learning means

Claims

A signal detection device for detecting a signal similar to a reference signal from an input signal,
Reference feature calculation means for extracting a reference feature from the input reference signal;
Reference feature normalizing means for calculating a reference normalized feature by performing a normalization process based on the reference feature;
Reference feature conversion means for linearly converting the reference normalized feature;
An input feature calculation means for extracting an input feature from the input signal;
An input feature normalizing means for calculating an input normalized feature by performing a normalization process based on the input feature;
Input feature conversion means for linearly converting the input normalized features;
Feature matching means for calculating a similarity between the post-conversion input normalized feature in a matching section, set on the input normalized feature after linear conversion by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature), and the reference normalized feature after linear conversion by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and for outputting a search result based on the calculated similarity;
Learning means for reading a learning signal and obtaining a conversion coefficient used when performing linear conversion in the reference feature conversion means and the input feature conversion means;
The learning means uses, as the learning signal, an original signal without feature distortion and a distortion signal obtained by adding feature distortion to the original signal; creates a plurality of signal pairs from predetermined sections of the normalized feature of the learning signal, the normalized feature being obtained by executing on the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means; and obtains the conversion coefficient that maximizes an evaluation function defined as the ratio of the inter-class variance of the signal pairs to their within-class variance,
The reference feature conversion means or the input feature conversion means linearly converts the reference normalized feature and the input normalized feature, respectively, using the conversion coefficient obtained by the learning means.

The signal detection device according to claim 1,
The conversion coefficient obtained by maximizing the evaluation function is

A signal detection device characterized by generating using an eigenvector obtained from

A signal detection device for detecting a signal similar to a reference signal from an input signal,
Reference feature calculation means for extracting a reference feature from the input reference signal;
Reference feature normalizing means for calculating a reference normalized feature by performing a normalization process based on the reference feature;
Reference feature conversion means for linearly converting the reference normalized feature;
An input feature calculation means for extracting an input feature from the input signal;
An input feature normalizing means for calculating an input normalized feature by performing a normalization process based on the input feature;
Input feature conversion means for linearly converting the input normalized features;
Feature matching means for calculating a similarity between the post-conversion input normalized feature in a matching section, set on the input normalized feature after linear conversion by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature), and the reference normalized feature after linear conversion by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and for outputting a search result based on the calculated similarity;
Learning means for reading a learning signal and obtaining a conversion coefficient used when performing linear conversion in the reference feature conversion means and the input feature conversion means;
The learning means uses, as the learning signal, an original signal without feature distortion and a distortion signal obtained by adding feature distortion to the original signal; creates a plurality of signal pairs from predetermined sections of the normalized feature of the learning signal, the normalized feature being obtained by executing on the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means; and obtains the conversion coefficient that maximizes an evaluation function defined as the inter-class variance of the signal pairs,
The reference feature conversion means or the input feature conversion means linearly converts the reference normalized feature and the input normalized feature, respectively, using the conversion coefficient obtained by the learning means.

The signal detection device according to claim 3,
The conversion coefficient obtained by maximizing the evaluation function is

A signal detection method of a signal detection device for detecting a signal similar to a reference signal from an input signal,
The reference feature calculation means of the signal detection device extracts the reference feature from the input reference signal,
The reference feature normalization means of the signal detection device calculates a reference normalization feature by performing a normalization process based on the reference feature,
The learning means of the signal detection device reads a learning signal consisting of an original signal without feature distortion and a distortion signal obtained by adding feature distortion to the original signal, creates a plurality of signal pairs from predetermined sections of the normalized feature of the learning signal, the normalized feature being obtained by executing on the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means, and obtains the conversion coefficient that maximizes an evaluation function defined as the ratio of the inter-class variance of the signal pairs to their within-class variance,
The reference feature conversion means of the signal detection device linearly converts the reference normalized feature using the conversion coefficient obtained by the learning means,
The input feature calculation means of the signal detection device extracts the input feature from the input signal,
The input feature normalization means of the signal detection device calculates an input normalization feature by performing normalization processing based on the input feature,
The input feature conversion unit of the signal detection device linearly converts the input normalized feature using the conversion coefficient obtained by the learning unit,
The feature matching means of the signal detection device calculates a similarity between the post-conversion input normalized feature in a matching section, set on the input normalized feature after linear conversion by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature), and the reference normalized feature after linear conversion by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and outputs a search result based on the calculated similarity.

The conversion coefficient obtained by maximizing the evaluation function is

The signal detection method according to claim 5, wherein the signal is generated using an eigenvector obtained from

A signal detection method for detecting a signal similar to a reference signal from an input signal,
The reference feature calculation means of the signal detection device extracts the reference feature from the input reference signal,
The reference feature normalization means of the signal detection device calculates a reference normalization feature by performing a normalization process based on the reference feature,
The learning means of the signal detection device reads a learning signal consisting of an original signal without feature distortion and a distortion signal obtained by adding feature distortion to the original signal, creates a plurality of signal pairs from predetermined sections of the normalized feature of the learning signal, the normalized feature being obtained by executing on the learning signal the same processing as that used by the input feature calculation means and the input feature normalization means, and obtains the conversion coefficient that maximizes an evaluation function defined as the inter-class variance of the signal pairs,
The reference feature conversion means of the signal detection device linearly converts the reference normalized feature using the conversion coefficient obtained by the learning means,
The input feature calculation means of the signal detection device extracts the input feature from the input signal,
The input feature normalization means of the signal detection device calculates an input normalization feature by performing normalization processing based on the input feature,
The input feature conversion unit of the signal detection device linearly converts the input normalized feature using the conversion coefficient obtained by the learning unit,
The feature matching means of the signal detection device calculates a similarity between the post-conversion input normalized feature in a matching section, set on the input normalized feature after linear conversion by the input feature conversion means (hereinafter referred to as the post-conversion input normalized feature), and the reference normalized feature after linear conversion by the reference feature conversion means (hereinafter referred to as the post-conversion reference normalized feature), and outputs a search result based on the calculated similarity.

The conversion coefficient obtained by maximizing the evaluation function is

The signal detection method according to claim 7, wherein the signal detection method is generated using an eigenvector obtained from the following.

A signal detection program for causing a computer to function as the signal detection apparatus according to claim 1.

A computer-readable recording medium storing the signal detection program according to claim 9.