JP2001265779A - Acoustic retrieving method

Info

Publication number: JP2001265779A
Application number: JP2000079317A
Authority: JP
Inventors: Takashi Hasegawa; 長谷川　　隆
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-03-16
Filing date: 2000-03-16
Publication date: 2001-09-28

Abstract

PROBLEM TO BE SOLVED: To speed up and simplify the processing in acoustic data retrieval, in which acoustic information in a library is retrieved using acoustic information as the key. SOLUTION: The power of each scale note is obtained from both the database sound and the key sound by frequency analysis and normalization, and each value is quantized into three levels (large, medium, small) to create feature data for the sound to be searched and for the key sound (100). Whether the feature data of the sound to be searched and of the key sound match is decided using only these three levels (113). A high-speed acoustic search using feature data of few bits is thus achieved merely by specifying the parameter used for frequency analysis and the duration of sound over which a match is judged.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an acoustic search method, and more particularly to a method for creating an acoustic database, and an acoustic information retrieval method using it, for retrieving desired acoustic information such as speech or music from an acoustic database containing the speech, music, and other acoustic information in a library, using acoustic information as the key.

[0002]

[Prior Art] With the development of information processing technology, multimedia database systems and methods for searching them are being developed. So are music data sales systems, which search a music library for a piece of music and sell it, using as the key music information recorded on a recording medium such as a MiniDisc (MD) or cassette tape or input from an audio input device such as a microphone, and music video search systems, which search a video library of promotional or live videos for the corresponding music using the same kind of key.

[0003] Searching a multimedia database system that contains such acoustic information requires that the system user be able to find the desired acoustic or video information easily and quickly. To meet this requirement, the following high-speed acoustic search method has been studied as a way to search image and audio time series quickly (Kashino et al., "High-speed search of image and sound time series using multimodal active search," IEICE Technical Report, PRMU98-80, pp. 51-58, 1998/09).

[0004] (1) Every M samples, the acoustic information is passed through a bank of N frequency band filters (N channels), the output of each channel is normalized, and the resulting N values form an N-dimensional feature vector.

[0005] (2) Each component of the feature vector is quantized into b values. The feature vector therefore takes one of L = b^N possible values.

[0006] (3) K temporally consecutive feature vectors of the same acoustic information are grouped together (a divided window), and their histogram is computed.

[0007] (4) For corresponding divided windows of the sound to be searched and the key sound, the smaller of the two histogram values is summed over the histogram to give the similarity of the divided windows. (5) J consecutive divided windows are grouped together (an attention window), and the minimum similarity among the divided windows it contains is taken as the similarity S of the attention window. (6) If S is greater than a threshold θ computed from the mean and variance of the similarity and a manually supplied parameter c, the corresponding sound to be searched is taken to be the search target; otherwise the divided window of the sound to be searched is shifted by a number of positions computed from the value of S, and steps (2) to (6) are repeated.
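Steps (4) and (5) of this prior-art method can be sketched as follows. This is only an illustration: the function names are hypothetical, raw (unnormalized) histogram counts are assumed, and the threshold and skip computation of step (6) is omitted because the text does not give its formula.

```python
from collections import Counter

def window_similarity(codes_a, codes_b):
    """Step (4): histogram intersection of two divided windows.

    Each argument is the list of quantized feature-vector codes in one
    divided window. Raw counts are used here (an assumption).
    """
    hist_a, hist_b = Counter(codes_a), Counter(codes_b)
    # Sum, over the codes present in both windows, the smaller count.
    return sum(min(hist_a[c], hist_b[c]) for c in hist_a.keys() & hist_b.keys())

def attention_similarity(windows_a, windows_b):
    """Step (5): minimum divided-window similarity within one attention window."""
    return min(window_similarity(a, b) for a, b in zip(windows_a, windows_b))
```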

[0008] The search method described above can perform acoustic search at high speed, but many parameters such as M, N, b, K, J, and c must be set appropriately for its signal processing, which makes the signal processing complicated.

[0009]

[Problems to Be Solved by the Invention] The main object of the present invention is to realize an acoustic search method and system in which the person performing the search can retrieve acoustic information at high speed by supplying only a very small number of parameters.

[0010] Another object of the present invention is to realize a method for creating feature data, converted from the sound to be searched and from the key sound, such that the similarity between the two can be judged by simple arithmetic.

[0011]

[Means for Solving the Problems] To achieve the above objects, the acoustic search method of the present invention searches the acoustic data of the sound to be searched in a database or the like (hereinafter, searched acoustic data) for data containing the sound that serves as the search key (hereinafter, key sound). Feature data is extracted from the searched acoustic data and from the key sound by the same method, described below; the two sets of feature data are compared to judge their similarity; and acoustic information having a portion similar to the key sound is retrieved from the database or played back.

[0012] To create the feature data of the searched acoustic data and the key acoustic data, the acoustic data is frequency-analyzed at fixed time intervals (every several samples) to obtain the power in each of a plurality of frequency bands, the power in each band is normalized, and the normalized power is quantized into three values to form the feature data. To create the feature data, the user of the search system only needs to enter two values: a fixed time T corresponding to the feature data of the key sound, and a parameter E that determines the number of samples of searched acoustic data or key acoustic data needed for the frequency analysis that yields each feature vector making up the feature data.

[0013] In particular, when the acoustic data is musical sound, the power for each frequency is the power corresponding to each pitch in music. The powers of the pitches that share a note name are summed across octaves, and the feature data is extracted from the resulting twelve powers, one per note name.

[0014]

[Embodiments of the Invention] FIG. 1 and FIG. 2 are, respectively, a PAD diagram explaining the processing steps in an embodiment of the acoustic search method according to the present invention and a configuration diagram of an embodiment of an acoustic search system that implements the method.

[0015] First, when building the multimedia database, which is a database containing the searched acoustic data (hereinafter simply called acoustic data), the feature data of the acoustic data in the multimedia database 221 is created (100) using the feature data extraction program 210, which executes the method for creating feature data of acoustic information according to the present invention.

[0016] In step 100, which creates the feature data, the comparison time T (seconds), corresponding to the duration of the key sound, and the parameter E needed for frequency analysis are set in the memory 203 using the input/output device 202, which consists of a keyboard, a display, and so on (101). The feature data, the comparison time T (seconds), and the parameter E are described in detail later.

[0017] Next, using the feature data extraction program 210 and the parameters T and E set in the memory 203, the processor 201 extracts the feature data (103) from each of the plural acoustic data (multimedia data) 102 stored on the disk 221, and saves the feature data in the feature data storage 220.

[0018] When a key sound is input and an acoustic search is performed on the acoustic database 221, the processor 201 executes the search by the following procedure using the acoustic data search program 211 (110).

[0019] First, the key sound data is specified using the input/output device 202 (111). If the key sound data is already digital, the input device 202 used to input it is a drive for the medium on which the data is stored, such as a CD, MD, MO, or floppy (registered trademark) disk. When the multimedia database 221 is searched over a network, a network card or the like can be used. For data stored in a portable audio storage device or the like, a device that connects to that storage device is used. When sound is input directly, or from an analog device such as a cassette tape, the analog audio signal from the microphone or cassette deck is converted to digital data by an AD converter.

[0020] Next, using the memory 203, the processor 201 extracts the feature vectors of the key sound data by the same method used to extract the feature data of the acoustic data, and saves them in the memory 203 (112). A match determination is then made between the feature vectors of the key sound data and the feature vectors in the stored feature data, and the acoustic data whose feature data matches is found among the multimedia data (113).

[0021] The result of the match determination is output to the input/output device 202. That is, if matching acoustic data exists, the name of the multimedia data containing it is shown on the display device of the input/output device 202, or the acoustic information of the multimedia data itself is played back. FIG. 2 shows an example in which the feature data extraction process 100 and the acoustic data search process 110 run on the same processor 201, but they may run on separate processors. In that case, after the feature data is extracted (100), the multimedia data 221 and the feature data 220 are copied or moved to another system, where the search process (110) is performed. Accordingly, the feature data extraction program 210 is stored on the system that performs feature data extraction (100), and the acoustic data search program 211 is stored on the system that performs the search process (110).

[0022] FIG. 3 shows the feature data of the multimedia data stored in the feature data storage 220. The feature data of the acoustic data comprises the two internal parameters (M and N) 300 needed for feature vector extraction, the file information 301…303 of the multimedia data, and the feature vectors 310…313. Each item of file information 301…303 consists of a file name 321 and a pointer 322 that indicates the start position, in memory, of that file's feature vectors. As described in detail below, each of the feature vectors 310…313 is a set of, for example, 24-bit data items 320 (called feature elements).

[0023] FIG. 4 explains the feature vector and the feature data in detail.

[0024] (a) is a waveform of part of a musical sound, which is the acoustic data; each datum x is a sample taken at sampling frequency S. A fixed period T of the acoustic data is divided into N (an integer) consecutive blocks, the frequency characteristic of each block is obtained, and the result, converted into the specific form described later, is the feature vector. If each block contains M samples, the relation N = TS/M holds.

[0025] The sample count M is the number of samples needed to convert the acoustic data into frequency information by frequency analysis (FFT), and can be written M = 2^E. Here the parameter E is a constant determined experimentally from the range of sampling frequencies of the acoustic data. For ordinary sound, good search results are obtained with E = 9 for a sampling frequency of 11,025 Hz, E = 10 for 22,050 Hz, and E = 11 for 44,100 Hz, so E is chosen according to the sampling frequency of the audio in the multimedia data stored in the database.
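The relations M = 2^E and N = TS/M can be checked with a short sketch. The lookup table holds only the three sampling rates named in the text; the function names are hypothetical, and rounding N down to an integer is an assumption, since the text says only that N is an integer.

```python
# E values reported in the text for ordinary audio, keyed by sampling rate in Hz.
E_BY_SAMPLE_RATE = {11025: 9, 22050: 10, 44100: 11}

def block_size(sample_rate_hz):
    """Samples per block, M = 2^E."""
    return 2 ** E_BY_SAMPLE_RATE[sample_rate_hz]

def blocks_per_window(t_seconds, sample_rate_hz):
    """Number of blocks N = T*S/M, rounded down (rounding is an assumption)."""
    m = block_size(sample_rate_hz)
    return int(t_seconds * sample_rate_hz) // m
```

With these values a block spans the same duration at all three sampling rates, about 46 ms (512/11,025 s), so a given comparison time T yields the same N regardless of the sampling rate.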

[0026] (b) is the frequency characteristic of the acoustic data over the fixed period T; the horizontal axis is frequency on a logarithmic scale and the vertical axis is power P. Because the frequency analysis advances through the acoustic data M samples at a time, N frequency characteristics are obtained for the period T.

[0027] (c) Each of the N frequency characteristics is represented by m powers, one for each specific frequency component, for example each of the frequency bands f1, f2, …, fm−1, fm corresponding to pitches in music. The N frequency characteristics together therefore comprise m × N power values.

[0028] (d) Each power value is quantized into three values. To do so, the mean μ and standard deviation σ of the mN power values are obtained, and each power value is encoded as 2-bit data indicating whether it is O (within the range μ−σ to μ+σ), bl (below μ−σ), or bh (above μ+σ). The mN power values quantized in this way are called the feature data, and each of the N 2m-bit feature elements that make up the feature data is called a feature vector.
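The three-level quantization in (d) might look like the sketch below. The labels "bl", "O", and "bh" follow the text; the function name is hypothetical, and using the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is meant.

```python
import statistics

def quantize_ternary(powers):
    """Label each power as 'bl' (below mu - sigma), 'bh' (above mu + sigma), or 'O'."""
    mu = statistics.fmean(powers)
    sigma = statistics.pstdev(powers)  # population std dev: an assumption
    low, high = mu - sigma, mu + sigma
    labels = []
    for p in powers:
        if p < low:
            labels.append("bl")
        elif p > high:
            labels.append("bh")
        else:
            labels.append("O")
    return labels
```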

[0029] Returning to FIG. 1, the parameter input process (101) and the feature data extraction process (103) are described. FIG. 5 is a PAD diagram of the feature data extraction process (103). After the parameter E, which determines the parameter M, and the number of seconds corresponding to the period T over which a match is judged during search have been entered through the input device, the file names of the multimedia data 301, 302, …, 303, … are saved in the file name 321 fields of the feature data 220 (501). The file number is denoted i.

[0030] Next, the pointer p(i) 322, which indicates the start position of the series of feature vectors holding the feature data of the multimedia data with file number i, is set, and that position is also stored in the variable j (502).

[0031] Next, the total number K(i) of feature vector data for the feature data of file number i is obtained (503). As explained with FIG. 4, the feature vectors are computed from MN samples at a time while advancing through the acoustic data (sample data) M samples at a time, so K(i) is the quotient obtained by subtracting MN from ND(i), the number of acoustic data (total samples) in the multimedia data of file number i, and dividing by M.
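As a rough check of this count, the sketch below computes K(i) exactly as the text states it, the integer quotient of (ND(i) − MN) by M. The function name is hypothetical, and returning 0 for a file shorter than one window of MN samples is an assumption.

```python
def num_feature_positions(nd, m, n):
    """K(i) as stated in the text: integer quotient of (ND - M*N) by M."""
    if nd < m * n:
        return 0  # file shorter than one comparison window (assumption)
    return (nd - m * n) // m
```

For example, 60 seconds of audio at 44,100 Hz (ND = 2,646,000) with M = 2048 and N = 215 gives K = 1076.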

[0032] Next, with k denoting the feature vector number, the following is performed for k = 0 to K(i)−1 (504). Feature vectors are extracted from samples k through k+MN−1 of the acoustic data of the multimedia data with file number i (510), they are saved in the feature data storage 220 as d(j1) to d(jN) (313) (511), and finally j is incremented (512).

[0033] FIG. 6 is a PAD diagram explaining the feature vector extraction process (510). Here the series of sample values of the acoustic data subject to feature vector extraction is denoted x(k) to x(k+MN−1).

[0034] In the feature vector extraction process for the key sound data (112), the target data is the first MN samples of the key sound data; in the feature vector extraction process for the acoustic data in the database (510), the target data is D(k) to D(k+MN−1). In the process for the key sound (112), the extracted feature vectors are stored in the memory 203 of FIG. 1 as d(0) to d(N−1). In the process for the acoustic data in the database 221 (510), they are saved in the feature data storage 220 as d(j1) to d(jN).

[0035] First, the following is performed for i = 0 to N−1 (600). Frequency analysis is applied to the acoustic data x(iM) to x(iM+M−1) using a technique such as FFT (601); any conventionally known technique can be used. From the power obtained for each frequency band, the power in each frequency band corresponding to a musical pitch is obtained (602). Specifically, for example, the power of the frequency band corresponding to pitch A3 (la, 440 Hz) is the sum of the frequency components in the range [2^(−1/24) × 440 Hz, 2^(1/24) × 440 Hz], that is, [427 Hz, 453 Hz].
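The band for one pitch, half a semitone either side of its center frequency, and the summation of spectral power inside it can be sketched as follows. The function names and the layout of `power` (power[k] is the power of bin k of an M-point transform, whose frequency is k·S/M) are assumptions made for illustration; the transform itself is not shown.

```python
def pitch_band(center_hz):
    """Band edges half a semitone below and above the center frequency."""
    half_semitone = 2 ** (1 / 24)
    return center_hz / half_semitone, center_hz * half_semitone

def band_power(power, sample_rate_hz, m, center_hz):
    """Sum the power of all bins whose frequency k*S/M falls inside the pitch band."""
    low, high = pitch_band(center_hz)
    return sum(p for k, p in enumerate(power)
               if low <= k * sample_rate_hz / m <= high)
```

For a 440 Hz center, `pitch_band` returns approximately 427.5 Hz and 452.9 Hz, in line with the range quoted in the text.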

[0036] Next, the powers are combined by scale note (603). For example, the power of pitch A (la) is the sum of the powers corresponding to A2, A3, A4, and so on in each octave. The frequency analysis result is thereby reduced to twelve powers P(0, i) to P(11, i), corresponding to the twelve pitches. The powers obtained in steps 601 to 603 can also be obtained by applying band-pass filters to the acoustic data. Because this method combines the power across octaves, it has the property that a key made by shifting data in the database up or down an octave (that is, doubling or halving its frequency) still retrieves the original data.
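Folding per-pitch powers into twelve note-name powers can be sketched as below. The input layout is an assumption: `semitone_power` holds one power per consecutive semitone, starting at whichever pitch is assigned index 0, so that entries twelve apart share a note name.

```python
def fold_octaves(semitone_power):
    """Sum the powers of pitches that share a note name across octaves."""
    note_power = [0.0] * 12
    for index, p in enumerate(semitone_power):
        note_power[index % 12] += p
    return note_power
```

A tone moved up by one octave lands twelve entries later in the input and therefore in the same output slot, which is the octave-invariance property described above.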

[0037] One way to obtain the twelve powers P(0, i) to P(11, i) from x(iM) to x(iM+M−1) is, for example for the power of pitch A (la), to provide a band-pass filter that passes only the frequencies corresponding to A2, A3, A4, and so on in each octave, and to use the power of that filter's output.

[0038] Next, as explained with FIG. 4(d), the mean μ and standard deviation σ of the powers P(j, i) (j = 0 to 11, i = 0 to N−1) obtained above are computed (611), and the lower limit L (= μ−σ) and upper limit H (= μ+σ) of the middle range of power are obtained (612). When a power P(j, i) lies outside the range [L, H], that power represents a salient feature of the sound.

[0039] Then the following is performed for every power P(j, i) (j = 0 to 11, i = 0 to N−1) (620). If P(j, i) is less than the lower limit L, the feature value b(j, i) is set to bl, indicating low power (622). If P(j, i) is greater than H, b(j, i) is set to bh, indicating high power (623). Otherwise P(j, i) is a middle value, and b(j, i) is set to the value "0", indicating that it is not distinctive (624). Here "0", bl, and bh are all 2-bit values; bl = 1 and bh = 2 in the feature data of the acoustic data, and bl = 2 and bh = 1 in the feature vectors of the key sound. The tables of P(j, i) and b(j, i) for j = 0 to 11, i = 0 to N−1 correspond to FIG. 4(c) and (d), respectively. Swapping the values of bl and bh between the database and the key sound allows the feature vector match determination (113) to be carried out quickly using only bitwise logical AND operations, as described later.

[0040] Finally, the following is performed for i = 0 to N−1 (630). First, d(i) is set to "0" (631). Then, for j = 0 to 11 (632), d(i) is shifted left by 2 bits and ORed with b(j, i), and the result is assigned back to d(i) (633). Through steps 631 to 633, the feature values of the scale notes obtained in steps 621 to 624 are placed, 2 bits each, into bits 0 to 23 of d(i). Hence d(i) is a 24-bit value, and this process yields N feature vectors d(i).
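The shift-and-OR packing of steps 631 to 633 can be sketched as follows. The function name is hypothetical; `codes` is assumed to hold the twelve 2-bit values b(0, i) to b(11, i) for one block.

```python
def pack_feature(codes):
    """Pack twelve 2-bit codes into one 24-bit integer by shift-left-2 and OR."""
    d = 0
    for b in codes:
        d = (d << 2) | (b & 0b11)  # mask keeps each code to 2 bits
    return d
```

Because each step shifts the accumulated value left, b(0, i) ends up in the two most significant bits and b(11, i) in the two least significant bits.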

[0041] Next, the feature vector match determination process (113) is described with reference to FIG. 7. Let dk(i) (i = 1 to N) be the feature vectors of the key sound obtained in the feature vector extraction process for the key sound (112) and stored in the memory (203). The following is performed for each feature vector d(j, i) (310 to 313) of the multimedia data in the database 221, stored in the feature data 220 (700). The description below is for the j-th feature vector. First, for i = 1 to N in order (701), the logical AND of dk(i) and d(j, i) is computed (702); if the result is not "0", the remaining processing is abandoned and processing moves on to the next feature vector j+1 (703). If the AND of dk(i) and d(j, i) is "0" for all i = 1 to N, the feature vectors are judged to match, and the name of the file containing that feature vector is found (710). If feature vector j lies between the pointers (322) p(i) and p(i+1) in the file information (301 to 303), file name i is the one sought. Once the file name has been found, the match determination process ends (711).
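The matching loop can be sketched as below. For brevity this sketch assumes the database features are held as one flat list `db` of 24-bit words, one per block, so that the window starting at position j is db[j] to db[j+N−1]; the storage layout in the text, d(j1) to d(jN) per position j, may differ. The function names and the `pointers`/`names` lists are hypothetical.

```python
from bisect import bisect_right

def matches_at(key, db, j):
    """True if every key word ANDed with the aligned database word is zero."""
    return all((dk & db[j + i]) == 0 for i, dk in enumerate(key))

def find_match(key, db):
    """Return the first position j at which the key matches, or None."""
    for j in range(len(db) - len(key) + 1):
        if matches_at(key, db, j):  # all() stops at the first nonzero AND
            return j
    return None

def file_for_position(pointers, names, j):
    """Map position j to the file whose start pointer p(i) satisfies p(i) <= j < p(i+1)."""
    return names[bisect_right(pointers, j) - 1]
```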

[0042] FIG. 8 shows the truth table for the AND operation. As described above, a feature element of the key sound's feature vector is represented by the 2 bits bh = "01", O = "00", or bl = "10" when the power is greater than H (H = μ+σ), between H and L, or less than L (L = μ−σ), respectively, whereas in the feature vector of the searched acoustic data these are bh = "10", O = "00", and bl = "01". The AND of each pair of feature elements therefore follows the truth table, so the match determination is a simple operation that runs quickly. The feature data also consists of few bits, which reduces the memory capacity required of the storage means.
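The effect of swapping the bl and bh codes between key and database can be seen by enumerating all nine combinations. The dictionary names below are hypothetical; the bit patterns are those given in the text.

```python
# 2-bit codes from the text: key and database use swapped bl/bh patterns.
KEY_CODE = {"bh": 0b01, "O": 0b00, "bl": 0b10}
DB_CODE = {"bh": 0b10, "O": 0b00, "bl": 0b01}

# A pair is compatible when the AND of the two codes is zero.
compatible = {
    (k, d): (KEY_CODE[k] & DB_CODE[d]) == 0
    for k in KEY_CODE
    for d in DB_CODE
}

# The only incompatible pairs are the opposite extremes.
mismatches = {pair for pair, ok in compatible.items() if not ok}
```

With this encoding, the AND is nonzero only when one side is high and the other is low; a middle value "O" on either side never causes a mismatch.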

[0043] Applying the present invention to a music database makes it possible to offer a music distribution and sales service in which a user can purchase a piece of music from a fragment recorded off a radio or television music broadcast or with a microphone. Applied likewise to a music video database, it enables a music video distribution and sales service in which a video can be purchased or viewed from a fragment of music without the video. Further, once feature data has been extracted from a music database, a title search service can be offered that, without holding the multimedia data (221), finds the file name (321) from audio supplied by the user and outputs the song title and so on based on that file name. Further still, by supplying acoustic information distributed over the Internet, by broadcast, and so on as keys to a music database, and performing billing when music contained in the database is found, a system that monitors the copyright of the music in the database can be realized.

【００４４】[0044]

[Effects of the Invention] With the feature data according to the present invention, feature data can be represented with an extremely small number of bits, and the match determination between the searched audio data and the key sound in sound search and the like reduces to simple logical operations, so high-speed processing can be realized. The operator of the search system only has to specify the parameter E used for frequency analysis and the duration T of sound over which a match is determined to obtain a high-speed sound search.

[Brief description of the drawings]

FIG. 1 is a PAD diagram of an embodiment of the sound search method according to the present invention.

FIG. 2 is a block diagram of an embodiment of the sound search system according to the present invention.

FIG. 3 is a diagram showing the feature data of sound in an embodiment of the sound search method according to the present invention.

FIG. 4 is a diagram illustrating a feature vector and feature data.

FIG. 5 is a PAD diagram showing the feature data extraction process for one piece of multimedia data in an embodiment of the sound search method according to the present invention.

FIG. 6 is a PAD diagram showing the feature vector extraction process in an embodiment of the sound search method according to the present invention.

FIG. 7 is a PAD diagram showing the feature vector match determination process in an embodiment of the sound search method according to the present invention.

FIG. 8 is a truth table for explaining the logical operation of the feature vector match determination process in an embodiment of the sound search method according to the present invention.

[Explanation of symbols]

201: processor; 202: input/output device; 203: memory; 210, 211: programs; 220, 221: data; 300: parameter; 301 to 303: file information; 310 to 313: feature vectors; 320: feature element side; 321: file name; 322: pointer to feature vector.

Claims

[Claims]

1. A sound search method comprising: a first step of setting a first parameter having a value corresponding to the length of key sound data, the key sound data being sound data serving as a search key, and a second parameter having a value that sets the sampling time interval of the sound data for frequency analysis of the key sound data; a second step of extracting feature data of the key sound data using the first and second parameters; a third step of extracting feature data of searched sound information from the sound data included in a database, by the same method as in the second step; and a fourth step of comparing the feature data of the searched sound information with the feature data of the key sound data and searching the database for sound information having a part similar to the key sound.

2. A method for creating feature data of acoustic information, comprising: a first step of performing frequency analysis of acoustic data at predetermined time intervals to obtain the power for each of a plurality of frequency bands; a second step of normalizing the power of each frequency band; and a third step of quantizing the normalized power into three values to create feature data of the acoustic data.

3. The method according to claim 2, wherein said acoustic information is music sound, and said power for each frequency band is a power corresponding to each pitch in music.

4. The method according to claim 3, wherein the power corresponding to each pitch in the music sound is obtained by summing, across octaves, the power of the pitches that share the same pitch name, and the feature data is extracted based on the 12 powers corresponding to the pitch names.

5. The method according to claim 2, wherein the step of quantizing into three values obtains an upper limit and a lower limit of an intermediate range from the obtained powers, quantizes each power value into one of three values according to whether it is larger than the upper limit, smaller than the lower limit, or between the upper and lower limits, and uses the quantized power as feature data in the form of a 2-bit value.

6. The sound search method according to claim 1, wherein the feature data of the searched audio data and of the key audio data is created by the feature data creating method of any one of claims 2 to 6.

7. The method according to claim 1, wherein the feature data of the searched sound data and of the sound data serving as the key are created by the method for creating feature data of acoustic information according to claim 5; of the quantized feature values, the intermediate value is set to 0; the values representing the large and small values are inverted between the feature data obtained from the database and the feature data obtained from the key; a logical product of the respective feature data is taken; and a portion where the value is 0 over the entire set length of the feature data is used as a search result.

8. A database storing feature data created by the method for creating feature data of acoustic information according to claim 2.
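The feature data creation recited in claims 2, 4 and 5 can be illustrated with a short sketch. It is a sketch under stated assumptions rather than the claimed method: the function names are hypothetical, the input is assumed to be a list of per-semitone powers already produced by some frequency analysis, normalization is assumed to mean dividing by the total power, and the upper and lower limits are taken as H = μ + σ and L = μ − σ computed over the 12 values of a single frame.

```python
from statistics import mean, pstdev


def fold_octaves(semitone_powers):
    """Sum the power of same-named pitches across octaves into 12 values."""
    folded = [0.0] * 12
    for i, power in enumerate(semitone_powers):
        folded[i % 12] += power
    return folded


def normalize(powers):
    """Scale powers so they sum to 1 (assumed normalization; silence is left as-is)."""
    total = sum(powers)
    return [p / total for p in powers] if total > 0 else list(powers)


def quantize(powers):
    """Quantize each power into 'high', 'mid' or 'low' using H = mu + sigma, L = mu - sigma."""
    mu = mean(powers)
    sigma = pstdev(powers)
    upper, lower = mu + sigma, mu - sigma
    return ["high" if p > upper else "low" if p < lower else "mid" for p in powers]


def frame_features(semitone_powers):
    """Fold, normalize and quantize one analysis frame into 12 ternary features."""
    return quantize(normalize(fold_octaves(semitone_powers)))
```

Each of the 12 ternary values fits in 2 bits, so under these assumptions one analysis frame occupies 24 bits of feature data.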