JP5078032B2 - Sound source identification method and sound source identification apparatus

Info

Publication number: JP5078032B2
Application number: JP2008250360A
Authority: JP
Inventors: 彰岩田; クグレマウリシオ
Original assignee: 国立大学法人名古屋工業大学
Priority date: 2008-09-29
Filing date: 2008-09-29
Publication date: 2012-11-21
Anticipated expiration: 2028-09-29
Also published as: JP2010079188A

Description

The present invention relates to a sound source identification method and a sound source identification apparatus for identifying the type of a sound source.

The basic functions for grasping the surrounding environment by sound are sound source localization, which identifies the direction of a sound source, and sound source identification (sound source recognition), which identifies the type of a sound source. A sound source identification apparatus using a pulse neuron model is described in Non-Patent Document 1 below. The applicant has also filed Patent Documents 1 and 2 below on sound source identification; Patent Document 2 shows an example in which a sound source localization and identification apparatus is implemented on an FPGA (Field Programmable Gate Array) to speed up processing.

Patent Document 1: JP 2008-77177 A
Patent Document 2: JP 2008-85472 A
Non-Patent Document 1: Shinya Sakaguchi, Susumu Kuroyanagi, Akira Iwata, "Sound source identification system for grasping the environment", IEICE Technical Report (NC study group), IEICE, December 1999, NC99-70, pp. 61-68

However, as Table 1 of Patent Document 2 shows, the apparatus described there requires about 5,000 ALUTs for the frequency pattern detection unit for sound source identification alone. To further speed up processing and make the apparatus more compact, a sound source identification method that can be implemented in hardware with fewer circuits has been desired.

The present invention solves the problem described above, and its object is to provide a sound source identification method and a sound source identification apparatus that can be implemented in hardware with fewer circuits.

The sound source identification method of the present invention converts an input sound, for each of a plurality of frequency bands, into a pulse train whose pulse frequency corresponds to the sound pressure and whose pulses are arranged along the time axis, and identifies the sound source of the input sound using the pulse trains of the frequency bands. The method comprises: a first step of counting, in the pulse train of each frequency band, the number of pulses within a count range having a predetermined width along the time axis, and generating a pulse number vector whose elements are the pulse counts of the respective pulse trains; a second step of generating a feature vector in which the N (N: positive integer) largest elements of the pulse number vector are set to 1 and the remaining elements are set to 0; and a third step of searching a plurality of reference vectors, which are generated in the same manner as the feature vector from sounds whose sound sources are known and are stored classified into sound source categories indicating their original sound sources, for the k (k: positive integer) reference vectors nearest to the feature vector in Hamming distance, determining the sound source category to which the largest number of those k reference vectors belong, and outputting category information indicating that sound source category. These steps are repeated while the count range is moved along the time axis so as not to overlap, and the sound source of the input sound is identified based on the output category information.

Preferably, the first step, the second step, the third step, and a fourth step are repeated while the count range is moved along the time axis so as not to overlap. The fourth step raises the potential value of the sound source category indicated by the category information output in the third step, among the plurality of sound source categories into which the reference vectors are classified, and lowers the potential values of the other sound source categories. The sound source category whose potential value becomes the largest among the plurality of sound source categories is then determined to be the sound source of the input sound.

The sound source identification apparatus of the present invention comprises: pulse train generation means for converting an input sound, for each of a plurality of frequency bands, into a pulse train whose pulse frequency corresponds to the sound pressure and whose pulses are arranged along the time axis; pulse number vector generation means for counting, for each count range that has a predetermined width along the time axis and is set so as not to overlap along the time axis, the number of pulses within the count range in the pulse train of each frequency band, and generating a pulse number vector whose elements are the pulse counts of the respective pulse trains; feature vector generation means for generating a feature vector in which the N (N: positive integer) largest elements of the pulse number vector are set to 1 and the remaining elements are set to 0; reference vector storage means storing a plurality of reference vectors that are generated in the same manner as the feature vector from sounds whose sound sources are known and are each classified into a sound source category indicating the original sound source; and sound source category identification means for searching the reference vectors stored in the reference vector storage means for the k (k: positive integer) reference vectors nearest to the feature vector in Hamming distance, determining the sound source category to which the largest number of those k reference vectors belong, and outputting category information indicating that sound source category. The sound source of the input sound is identified based on the output category information.

Preferably, the apparatus further comprises potential value processing means that, among the plurality of sound source categories into which the reference vectors are classified, raises the potential value of the sound source category indicated by the category information output by the sound source category identification means and lowers the potential values of the other sound source categories. The sound source category whose potential value becomes the largest among the plurality of sound source categories is determined to be the sound source of the input sound.

The sound source identification method and apparatus of the present invention generate a feature vector that represents the features of the input sound with 0s and 1s, and use reference vectors generated in the same manner from sounds whose sound sources are known, determining the closeness (similarity) between the feature vector and each reference vector by Hamming distance. Because this logic is simple and easy to compute, it can be implemented in hardware with fewer circuits.

An embodiment of the present invention is described below with reference to the drawings.

As shown in FIG. 1, the sound source identification apparatus S includes pulse train generation means 6 connected to a microphone (not shown). The pulse train generation means 6 includes an AD conversion unit 14, a frequency decomposition unit 15 corresponding to the cochlea of the human auditory system, a nonlinear conversion unit 16 corresponding to the hair cells, and a pulse conversion unit 17 corresponding to the cochlear nerve. The AD conversion unit 14 AD-converts the signal (input sound) input from the microphone. The frequency decomposition unit 15 consists of a bank of band-pass filters (BPFs) and decomposes the AD-converted signal, over a predetermined frequency range, into signals for a plurality of frequency bands (hereinafter also called "channels") spaced on a logarithmic scale. The nonlinear conversion unit 16 applies a nonlinear transformation to the signal of each frequency band input from the frequency decomposition unit 15 to extract only its positive component, and performs envelope detection with a low-pass filter (LPF). The pulse conversion unit 17 converts the signal of each frequency band input from the nonlinear conversion unit 16 into a pulse train whose pulse frequency is proportional to the signal strength (that is, the sound pressure). Through these processes, the pulse train generation means 6 converts the input sound, channel by channel, into a pulse train whose pulse frequency corresponds to the sound pressure and whose pulses are arranged along the time axis.
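The per-channel chain after the band-pass filter (positive component, envelope, pulse conversion) can be illustrated with a minimal sketch. The text does not specify the filter design or the pulse conversion scheme, so the one-pole low-pass filter, the accumulate-and-fire pulse generator, and the parameters `alpha` and `gain` below are all assumptions made for illustration only.

```python
def half_wave_rectify(signal):
    """Keep only the positive component of a band-limited signal."""
    return [max(0.0, s) for s in signal]


def low_pass(signal, alpha=0.05):
    """One-pole low-pass filter used here as a simple envelope detector.

    alpha is a hypothetical smoothing constant, not a value from the source.
    """
    out, state = [], 0.0
    for s in signal:
        state += alpha * (s - state)
        out.append(state)
    return out


def to_pulse_train(envelope, gain=1.0):
    """Emit at most one pulse per sample, at a rate proportional to the envelope.

    Accumulate-and-fire is an assumed scheme; gain * envelope should stay <= 1.
    """
    pulses, acc = [], 0.0
    for e in envelope:
        acc += gain * e
        if acc >= 1.0:
            pulses.append(1)
            acc -= 1.0
        else:
            pulses.append(0)
    return pulses
```

With this scheme, doubling a constant envelope level doubles the number of pulses emitted over the same number of samples, which is the proportionality the pulse conversion unit 17 is described as providing.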

As shown in FIG. 2, the sound source identification apparatus S also includes pulse number vector generation means 1, feature vector generation means 2, sound source category identification means 3, and reference vector storage means 4.

The pulse train of each channel generated by the pulse train generation means 6 is input to the pulse number vector generation means 1. In the embodiment, the number of channels is 43. The pulse number vector generation means 1 has one pulse counter 5 per channel; each pulse counter 5 counts the pulses within the count range in the pulse train of its channel, and the means generates a pulse number vector whose elements are those counts. The pulse number vector therefore has as many elements as there are channels, that is, 43. In the embodiment, the count range is 1000 pulses wide. Because the input signal is sampled at 48 kHz to generate the pulse trains, up to 48,000 pulses can be generated per second, so a width of 1000 pulses corresponds to 1000 ÷ 48000 ≈ 0.02 s. In other words, the pulse train of each channel is divided into intervals of about 20 ms and counted.
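The windowed counting can be sketched as follows. The window width and sampling rate come from the embodiment; representing each pulse train as a list of 0/1 values per sample slot, and dropping a trailing partial window, are assumptions of this sketch.

```python
WINDOW = 1000          # count range width in pulse slots (from the embodiment)
SAMPLE_RATE = 48_000   # Hz (from the embodiment)


def window_seconds(window=WINDOW, rate=SAMPLE_RATE):
    """Duration of one count range in seconds."""
    return window / rate


def pulse_number_vectors(pulse_trains, window=WINDOW):
    """Yield one pulse number vector per non-overlapping, gap-free window.

    pulse_trains: list of per-channel lists of 0/1 values, all the same length.
    A trailing partial window is dropped (an assumption of this sketch).
    """
    length = len(pulse_trains[0])
    for start in range(0, length - window + 1, window):
        yield [sum(channel[start:start + window]) for channel in pulse_trains]
```

For example, `window_seconds()` evaluates 1000 / 48000, which is roughly 0.0208 s and matches the "about 20 ms" figure in the text.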

The pulse number vector is input to the feature vector generation means 2. The feature vector generation means 2 generates a feature vector in which the N largest elements of the pulse number vector are set to 1 and the remaining elements are set to 0. The feature vector is thus a binary vector that represents the features of the sound by marking the channels with strong sound pressure as 1 and the rest as 0. N is a positive integer no greater than the number of elements of the pulse number vector; in the embodiment, N = 9.

The feature vector is input to the sound source category identification means 3. The sound source category identification means 3 searches the reference vectors stored in the reference vector storage means 4 for the k reference vectors nearest to the feature vector in Hamming distance, determines the sound source category to which the largest number of those k reference vectors belong, and outputs category information indicating that category. k is a positive integer no greater than the total number of reference vectors; in the embodiment, k = 12.

The reference vector storage means 4 stores a plurality of reference vectors (a ROM in the embodiment). Each reference vector is a binary vector generated from an input sound whose sound source is known, in the same manner as the feature vector. That is, a pulse train is generated for each channel from an input sound of a predetermined duration; a count range having a predetermined width (1000 pulses) along the time axis is moved along the time axis without overlap and without gaps while the pulses within the count range are counted in each pulse train; and a pulse number vector whose elements are those pulse counts is generated. A feature vector is then generated by setting the N (= 9) largest elements of the pulse number vector to 1 and the remaining elements to 0, and that feature vector is used as a reference vector. Each reference vector is classified into a sound source category (sound source type) indicating its original sound source, for example by being associated with category information indicating that category. To reduce the total number of reference vectors, K-means clustering is used to divide the feature vectors belonging to each sound source category into a plurality of clusters, and the representative (center) of each cluster is used as a reference vector of that category. In the embodiment, 1,000 reference vectors are used for each sound source category. The clustering method may, of course, be a method other than K-means.

The sound source identification method executed in the sound source identification apparatus S is described next.

In the sound source identification apparatus S, the pulse train generation means 6 first converts the input sound into a pulse train for each channel. FIG. 3 illustrates how pulse trains are generated from an input sound (input signal). The top row of the figure is the input signal (audio signal); in the second and subsequent rows, the signal obtained by frequency decomposition of the input signal for each channel is shown by a broken line, and the pulse train generated from that decomposed signal is shown by a solid line. FIG. 3 shows only channel 1, channel 4, channel 7, channel 10, and channel 13.

FIGS. 4-1 to 6-2 show pulse trains generated from input sounds whose sound sources are known: FIG. 4-1 an alarm clock alarm, FIG. 4-2 an intercom chime, FIG. 4-3 the boiling sound of a whistling kettle, FIG. 5-1 an ambulance siren, FIG. 5-2 a police car siren, FIG. 5-3 a telephone ring, FIG. 6-1 a fire engine siren, and FIG. 6-2 a human voice. The vertical axis is the channel, the horizontal axis is time, and darker shading indicates a higher pulse frequency.

<First Step> Next, the sound source identification apparatus S executes a first step in which the pulse number vector generation means 1 generates a pulse number vector from the per-channel pulse trains. In the first step, as shown in step S01 of FIG. 7, the pulse number vector generation means 1 counts the pulses in each window (time window) with the pulse counters 5. Here, a window denotes the count range: as windows A and B in FIG. 8 show, it is, so to speak, a slot of predetermined width (1000 pulses in the embodiment) through which the pulse trains of all channels are viewed and counted.

The pulse number vector generation means 1 first counts the pulses in the first window, which spans from time 0 to about 20 ms. While the pulses are being counted, the apparatus is in a standby (idling) state, as shown in FIG. 7. When counting for that window is finished, the pulse number vector generation means 1 passes a pulse number vector, in which the pulse counts are arranged in channel order, to the feature vector generation means 2. The next time step S01 is performed, the pulse number vector generation means 1 moves the window so that it neither overlaps the first window nor leaves a gap after it, and counts the pulses in the new window. When counting for that window is finished, it again passes a pulse number vector with the counts in channel order to the feature vector generation means 2. The pulse number vector generation means 1 repeats this processing while moving the window along the time axis (that is, from earlier to later times) so that consecutive windows neither overlap nor leave a gap between them.

Through this processing, as shown in FIG. 8 for example, the pulse number vector A1 = (3, 2, 4, 3, …, 5, 2, 4, 5) is generated for window A, and the pulse number vector B1 = (2, 3, 5, 3, …, 3, 5, 9, 10) is generated for window B.

<Second Step> On receiving the pulse number vector from the pulse number vector generation means 1, the feature vector generation means 2 executes a second step of generating a feature vector from the pulse number vector (S02 in FIG. 7).

Specifically, as shown in FIG. 9, the feature vector generation means 2 arbitrarily selects N (= 9) elements from the pulse number vector and stores them in a predetermined area (S101). Next, it finds the smallest of the nine elements stored in the predetermined area (S102). Having found that minimum, it compares it with the remaining elements that were not stored in the predetermined area (S103). If it finds a remaining element larger than the minimum, it stores that larger element in the predetermined area in place of the minimum, returns to step S102, and again finds the smallest element stored in the predetermined area.

Steps S102 and S103 are repeated until no remaining element is larger than the smallest element in the predetermined area. When step S103 finds no such element, the predetermined area holds the nine largest elements of the pulse number vector, so the feature vector is generated by setting those nine elements of the pulse number vector to 1 and all other elements to 0 (S104).
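Steps S101 to S104 can be sketched as follows. The function name, the use of index lists for the "predetermined area", and the choice of the first N elements as the arbitrary initial selection are assumptions of this sketch. The text does not say how equal counts are handled; here an element replaces the stored minimum only if it is strictly larger, so exactly N elements are always set to 1.

```python
def feature_vector(counts, n=9):
    """Binarize a pulse number vector: the n largest elements become 1, the rest 0."""
    if not 0 < n <= len(counts):
        raise ValueError("n must be between 1 and len(counts)")
    selected = list(range(n))                  # S101: start from any n elements
    remaining = list(range(n, len(counts)))
    while True:
        # S102: position (within `selected`) of the smallest stored element
        pos = min(range(n), key=lambda p: counts[selected[p]])
        # S103: look for a remaining element strictly larger than that minimum
        for r, idx in enumerate(remaining):
            if counts[idx] > counts[selected[pos]]:
                selected[pos], remaining[r] = idx, selected[pos]   # swap, then retry
                break
        else:
            break                              # nothing larger is left: done
    chosen = set(selected)
    return [1 if i in chosen else 0 for i in range(len(counts))]   # S104
```

The repeated compare-and-replace loop needs only a comparator and a small register file, which is presumably why the procedure is described this way rather than as a full sort.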

Through this processing, as shown in FIG. 8 for example, the feature vector A2 is generated from the pulse number vector A1, and the feature vector B2 is generated from the pulse number vector B1. The generated feature vector is passed to the sound source category identification means 3.

<Third Step> On receiving the feature vector from the feature vector generation means 2, the sound source category identification means 3 executes a third step in which it examines the distances between the reference vectors and the feature vector and outputs the category information of the sound source category to which the most reference vectors near the feature vector belong (S03 in FIG. 7).

Specifically, as shown in FIG. 10, the sound source category identification means 3 first reads all reference vectors from the reference vector storage means 4 into a work area (S201). Next, it selects any k (= 12) reference vectors from the work area and stores them in a predetermined area (S202).

It then finds, among the 12 reference vectors stored in the predetermined area, the one whose distance from the feature vector is largest (S203). The distance used is the Hamming distance. Having found the reference vector at the maximum distance, it computes the distance between the feature vector and each of the remaining reference vectors not stored in the predetermined area, and compares those distances with the maximum distance (S204). If it finds a remaining reference vector whose distance is smaller than the maximum distance, it stores that nearer reference vector in the predetermined area in place of the one at the maximum distance, returns to step S203, and again finds the reference vector at the maximum distance among those stored in the predetermined area.

Steps S203 and S204 are repeated until no remaining reference vector is nearer than the farthest reference vector in the predetermined area. When step S204 finds no such reference vector, the predetermined area holds the k (= 12) reference vectors that are nearest to the feature vector among all reference vectors, that is, the k nearest neighbors (k-nearest neighbor method). Because each reference vector is classified into the sound source category to which it belongs, the category of each of the 12 reference vectors in the predetermined area is examined and a vote is cast for that category; for example, if a reference vector belongs to the ambulance siren category, 1 is added to the vote count for the ambulance siren (S205). The sound source category to which the largest number of the 12 reference vectors belong is then determined, and category information indicating that category is output as the identification result (S206).
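The search and vote in steps S201 to S206 can be sketched as follows. This sketch swaps in `heapq.nsmallest` for the replace-the-farthest loop described above; apart from how ties are broken, it selects the same k vectors. Packing each binary vector into a Python integer and the `(vector, category)` pair layout are assumptions, and the text does not say how ties in distance or in votes are resolved, so they are resolved arbitrarily here.

```python
import heapq
from collections import Counter


def hamming(a, b):
    """Hamming distance between two equal-length binary vectors packed into ints."""
    return bin(a ^ b).count("1")


def classify(feature, references, k=12):
    """Return the category holding the most of the k nearest reference vectors.

    references: list of (packed_vector, category) pairs.
    Ties in distance or in votes are resolved arbitrarily here (an assumption).
    """
    nearest = heapq.nsmallest(k, references, key=lambda ref: hamming(feature, ref[0]))
    votes = Counter(category for _, category in nearest)   # S205: one vote per neighbor
    return votes.most_common(1)[0][0]                      # S206: majority category
```

A small usage example with hypothetical 4-bit vectors: `classify(0b1100, [(0b1100, "alarm"), (0b1101, "alarm"), (0b0011, "phone")], k=2)` picks the two "alarm" vectors as neighbors and returns `"alarm"`.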

The sound source category indicated by the output category information is the one to which the largest number of the 12 reference vectors nearest to the feature vector belong, so it can be judged to be the sound source category to which the feature vector belongs.

As shown in FIG. 7, when the output of the category information is finished (that is, when classification is finished), the sound source identification apparatus S returns to step S01 and generates the pulse number vector for the next count range. It then repeats steps S01 to S03 until the pulse train ends. Steps S01 to S03 may also be performed in parallel; for example, the next pulse number vector may be generated in step S01 while steps S02 and S03 are being performed on the previous pulse number vector.

FIGS. 11-1 to 13-2 show the identification results obtained when various input sounds were identified by the sound source identification apparatus S; the sound source category indicated by the output category information is shown as a black bar. FIG. 11-1 shows the result for an alarm clock alarm, FIG. 11-2 for an intercom chime, FIG. 11-3 for the boiling sound of a whistling kettle, FIG. 12-1 for an ambulance siren, FIG. 12-2 for a police car siren, FIG. 12-3 for a telephone ring, FIG. 13-1 for a fire engine siren, and FIG. 13-2 for a human voice. The vertical axis is the sound source category, from top to bottom: unknown (Unknown), alarm clock alarm (Alarm), intercom chime (Interphone), whistling kettle (Kettle), ambulance siren (Ambulance), police car siren (Police), telephone ring (Phone), fire engine siren (Fire), and human voice (Voice). The horizontal axis is time. These figures show that the sound source identification apparatus S identifies sound sources fairly accurately.

The sound source identification apparatus S converts the pulse number vector into a feature vector instead of using it directly for identification because a pulse number vector used as-is is strongly affected by the sound pressure (loudness) and is therefore vulnerable to noise. The pulse count of each channel within one window is proportional to the average energy of that channel over the time corresponding to the window, so the feature vector represents where the sound energy is strong during that time. Because the strong parts are represented by "1" and the other parts by "0", the influence of the sound pressure is reduced and the representation is robust to noise.

Furthermore, as shown in FIG. 14, if the pulse number vector is used as is, a vector of the same kind is used as the reference vector, and the distance between the two is measured as the Manhattan distance, then two nearly identical vectors that differ only in that a few elements are interchanged can end up far apart. In FIG. 14, vector A, which is almost the same as the reference vector (reference), is farther from the reference vector than vector B, which does not resemble it. In the sound source identification device S, both the feature vector and the reference vector are binary vectors and the distance between them is measured as the Hamming distance, so similar vectors are close together and dissimilar vectors are far apart, which improves the accuracy of identification. In addition, the Hamming distance can be computed easily using exclusive OR.
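As a rough illustration of the Hamming distance by exclusive OR and of the k-nearest-neighbor vote performed by the sound source category identifying means 3, the following Python sketch packs each binary vector into an integer. The helper names `pack_bits`, `hamming`, and `knn_category`, the six-element vectors, and the tie-breaking behavior are all assumptions for illustration and are not taken from the embodiment.

```python
from collections import Counter


def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, first element as the top bit."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b else 0)
    return value


def hamming(a, b):
    """Hamming distance of two packed binary vectors: popcount of their XOR."""
    return bin(a ^ b).count("1")


def knn_category(feature, references, k):
    """Majority category among the k references nearest to `feature`.

    `references` is a list of (packed_vector, category) pairs.
    Ties are resolved in favor of the category seen first among the
    nearest references (an assumed rule, not from the description).
    """
    nearest = sorted(references, key=lambda ref: hamming(feature, ref[0]))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical reference vectors for two categories.
references = [
    (pack_bits([1, 1, 0, 0, 0, 0]), "Alarm"),
    (pack_bits([1, 0, 1, 0, 0, 0]), "Alarm"),
    (pack_bits([0, 0, 0, 1, 1, 0]), "Phone"),
    (pack_bits([0, 0, 0, 0, 1, 1]), "Phone"),
]
feature = pack_bits([0, 1, 1, 0, 0, 0])
print(knn_category(feature, references, k=3))
```

Packing the vectors into integers also reflects why binary reference vectors are cheap to store and compare: one XOR and one bit count replace a per-element subtraction and sum.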

As can be seen from FIGS. 11-1 to 13-2, the sound source identification device S classifies simple sounds correctly but makes classification errors on complex sounds. To remove these errors, it is preferable to provide, downstream of the sound source category identifying means 3, a potential value processing means that, each time the sound source category identifying means 3 outputs category information, raises the potential value of the sound source category indicated by that category information and lowers the potential values of the other sound source categories, and to determine the sound source category whose potential value is the largest to be the sound source of the input sound. The fourth step, which is performed by the potential value processing means, is described below. The fourth step is executed after the third step.

<Fourth Step> The potential value processing means stores a potential value P_i(t) for each sound source category i. Here, i is an index assigned to each of the eight sound source categories above, with i = 0 to 7. For example, i = 0 is the index assigned to the alarm clock alarm sound (Alarm), and i = 1 is the index assigned to the intercom ringing tone (Interphone). The sound source category identifying means 3 outputs this index as the category information. Also, t is time, and P_i(0) = 0 (i = 0 to 7).

When the potential value processing means receives the category information y(t) at time t from the sound source category identifying means 3, it increases or decreases P_i(t) (i = 0 to 7) according to the following equations (1) and (2).

For i = y(t): P_i(t) = min(P_max, P_i(t−1) + γ) … (1)
For i ≠ y(t): P_i(t) = max(0, P_i(t−1) − 1) … (2)

That is, the potential value of the sound source category indicated by the category information y(t) is raised by γ, and the potential values of all other sound source categories are lowered by 1. The increase per update is larger than the decrease per update (that is, γ > 1); here γ = 2. P_max is the upper limit of the potential value, and the lower limit of the potential value is 0.
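Equations (1) and (2) can be sketched in Python as follows. The values P_max = 192 and γ = 2 are those given in the experimental example; the function names `update_potentials` and `decide`, the sample sequence of category indices, and the rule that the lowest index wins when potential values are tied are assumptions made for illustration.

```python
P_MAX = 192  # upper limit of the potential value (value from the experimental example)
GAMMA = 2    # increase per update, larger than the decrease of 1


def update_potentials(potentials, y):
    """Apply equations (1) and (2) once for category information y."""
    return [
        min(P_MAX, p + GAMMA) if i == y else max(0, p - 1)
        for i, p in enumerate(potentials)
    ]


def decide(potentials):
    """Index of the category with the largest potential value.

    max() returns the first maximum, so ties go to the lowest index
    (an assumed rule, not stated in the description).
    """
    return max(range(len(potentials)), key=lambda i: potentials[i])


# Hypothetical outputs of the third step: mostly category 0, with two errors.
outputs = [0, 0, 3, 0, 0, 5, 0]
potentials = [0] * 8  # P_i(0) = 0 for i = 0 to 7
for y in outputs:
    potentials = update_potentials(potentials, y)
print(potentials, decide(potentials))
```

In this sketch the two isolated misclassifications (indices 3 and 5) raise their potential values only briefly, and those values decay again, so the decision remains category 0.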

When time information is added in this way, the sound source comes to be identified correctly as time passes, because a sound source generally does not change abruptly. An experimental example is shown below.

<Experimental example>
The eight sound sources above were arranged around a microphone, sounded one after another in ascending order of index, and recorded with the microphone to create three sound signal files at a sampling frequency of 48 kHz. Two of the files were used for training (that is, for creating the reference vectors), and one file was used for testing. The parameters were set as follows.

Window (count range) width = 1000 (equivalent to 1000 pulses)
N = 9
k = 12
P_max = 192
γ = 2

FIG. 15 shows the output obtained when the sound signal of the test file was input to the sound source identification device S. The top row of FIG. 15 (Original Labels) shows the input sound, and the second row (k-Nearest Neighbor Classification Result) shows the output of the sound source category identifying means 3 as × marks; where many × marks are clustered together, they look like a bar. The graph in the fourth row (Time Potentials) shows the potential values processed by the potential value processing means. In this graph, the lines labeled P0, P1, P2, P3, P4, P5, P6, and P7 represent P_0(t), P_1(t), P_2(t), P_3(t), P_4(t), P_5(t), P_6(t), and P_7(t), respectively. The fourth-row graph shows that the correct sound source comes to be indicated by the potential values as time passes. The third row (Time Potentials Classification Result) shows the sound source category whose potential value in the fourth row is the largest.

Thus, determining the sound source from potential values that incorporate time information makes it possible to correctly identify the sound source of the input sound even for complex sounds.

The sound source identification method of the present invention can be executed in software on a general-purpose computer (that is, the sound source identification device S can be realized on a general-purpose computer), but to speed up processing it is preferable to implement the processing logic in hardware. When the sound source identification method of the embodiment (excluding the fourth step, which processes the potential values) was coded and written into an FPGA, the circuit count was about 2,300 ALUTs, far fewer than in the device described in Patent Document 2. This is because the processing logic is simple and has few steps. Since the sound source identification method of the present invention can thus be implemented in hardware with a small circuit count, the sound source identification device S can be made compact and its processing can be made faster.

In addition, the sound source identification method of the present invention has fewer parameters than conventional techniques that use a pulse neuron model. FIG. 16 shows how the accuracy of identification (the proportion of correct identifications) changed when the value of parameter N (Number of Features) was varied over the range 6 to 12 and the value of parameter k (Nearest Neighbor) was varied over the range 1 to 17. As FIG. 16 shows, once N and k are reasonably large, the accuracy of identification is maintained even if their values are changed. In other words, neither parameter is critical, so tuning is easy, and the parameters remain easy to tune when a new sound is learned (that is, when reference vectors are generated from a new sound).

Furthermore, because the reference vectors are binary vectors, only a small memory capacity is needed to store them.

In the above embodiment, the number of pulses within the count range in each pulse train is counted while the count range is moved so that successive ranges neither overlap nor leave a gap in the time axis direction, but the count ranges may instead be spaced apart. Even when the count range is moved with suitable gaps between positions, the features of the input sound can still be extracted and the sound source can still be identified, while the amount of data is reduced.
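The two ways of moving the count range can be compared with a short Python sketch. It assumes, purely for illustration, that each pulse train is a list of 0/1 values per time step and that the width and gap are expressed in time steps; the function name `pulse_count_vectors` and the `gap` parameter are hypothetical.

```python
def pulse_count_vectors(pulse_trains, width, gap=0):
    """Count pulses per channel in successive, non-overlapping count ranges.

    pulse_trains: one list of 0/1 values per frequency channel (assumed layout).
    width: width of the count range in time steps.
    gap: time steps skipped between count ranges (0 reproduces the embodiment).
    """
    if width <= 0 or gap < 0:
        raise ValueError("width must be positive and gap must be non-negative")
    length = min(len(train) for train in pulse_trains)
    vectors = []
    start = 0
    while start + width <= length:
        # One pulse number vector: the pulse count of every channel in this range.
        vectors.append([sum(train[start:start + width]) for train in pulse_trains])
        start += width + gap
    return vectors


# Two hypothetical channels, eight time steps each.
trains = [
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]
print(pulse_count_vectors(trains, width=2))         # contiguous count ranges
print(pulse_count_vectors(trains, width=2, gap=2))  # spaced count ranges
```

With a gap of two time steps, the sketch produces half as many pulse number vectors from the same input, which corresponds to the reduction in data amount described above.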

A block diagram of the pulse train generating means of a sound source identification device according to an embodiment of the present invention.
A block diagram of the sound source identification device according to the embodiment.
A conceptual diagram showing how a pulse train is generated from an input signal.
A diagram showing the pulse train generated from the alarm sound of an alarm clock.
A diagram showing the pulse train generated from the ringing tone of an intercom.
A diagram showing the pulse train generated from the boiling sound of a whistling kettle.
A diagram showing the pulse train generated from an ambulance siren.
A diagram showing the pulse train generated from a police car siren.
A diagram showing the pulse train generated from a telephone bell.
A diagram showing the pulse train generated from a fire engine siren.
A diagram showing the pulse train generated from a human voice.
A flowchart showing the sound source identification method according to the embodiment.
A conceptual diagram showing how the pulse number vector and the feature vector are generated from the pulse trains.
A flowchart of the second step of the sound source identification method according to the embodiment.
A flowchart of the third step of the sound source identification method according to the embodiment.
A diagram showing the identification result when the alarm sound of an alarm clock is input.
A diagram showing the identification result when the ringing tone of an intercom is input.
A diagram showing the identification result when the boiling sound of a whistling kettle is input.
A diagram showing the identification result when an ambulance siren is input.
A diagram showing the identification result when a police car siren is input.
A diagram showing the identification result when a telephone bell is input.
A diagram showing the identification result when a fire engine siren is input.
A diagram showing the identification result when a human voice is input.
An example in which the distance between a pulse number vector and a reference vector of the same kind is measured as the Manhattan distance.
A diagram showing the output obtained when the sound signal of the test file is input.
A diagram showing how the accuracy of identification changes when the values of parameters N and k are varied.

Explanation of symbols

S … Sound source identification device
1 … Pulse number vector generating means
2 … Feature vector generating means
3 … Sound source category identifying means
4 … Reference vector storage means
6 … Pulse train generating means

Claims

1. A sound source identification method in which an input sound is converted, for each frequency band, into a pulse train in which pulses having a pulse frequency corresponding to the sound pressure are arranged in the time axis direction, and the sound source of the input sound is identified using the pulse trains of the frequency bands, the method comprising:
a first step of counting the number of pulses within a count range having a predetermined width in the time axis direction in the pulse train of each frequency band, and generating a pulse number vector whose elements are the numbers of pulses in the respective pulse trains;
a second step of generating a feature vector in which the N (N: a positive integer) largest elements of the pulse number vector are set to 1 and the remaining elements are set to 0; and
a third step of searching a plurality of reference vectors, which are generated in the same manner as the feature vector from sounds whose sound sources are known and are stored classified into sound source categories indicating their original sound sources, for the k (k: a positive integer) reference vectors nearest to the feature vector in Hamming distance, determining the sound source category to which the largest number of the k reference vectors belong, and outputting category information indicating that sound source category,
wherein the first to third steps are repeated while the count range is moved so as not to overlap in the time axis direction, and
the sound source of the input sound is identified based on the output category information.

2. The sound source identification method according to claim 1, wherein
the first step,
the second step,
the third step, and
a fourth step of raising the potential value of the sound source category, among the plurality of sound source categories into which the plurality of reference vectors are classified, that is indicated by the category information output in the third step, and lowering the potential values of the other sound source categories
are repeated while the count range is moved so as not to overlap in the time axis direction, and
the sound source category having the largest potential value among the plurality of sound source categories is determined to be the sound source of the input sound.

3. A sound source identification apparatus comprising:
pulse train generating means for converting an input sound, for each of a plurality of frequency bands, into a pulse train in which pulses having a pulse frequency corresponding to the sound pressure are arranged in the time axis direction;
pulse number vector generating means for counting, for each count range that has a predetermined width in the time axis direction and does not overlap another count range in the time axis direction, the number of pulses within the count range in the pulse train of each frequency band, and generating a pulse number vector whose elements are the numbers of pulses in the respective pulse trains;
feature vector generating means for generating a feature vector in which the N (N: a positive integer) largest elements of the pulse number vector are set to 1 and the remaining elements are set to 0;
reference vector storage means for storing a plurality of reference vectors that are generated in the same manner as the feature vector from sounds whose sound sources are known, each classified into a sound source category indicating its original sound source; and
sound source category identifying means for searching the reference vectors stored in the reference vector storage means for the k (k: a positive integer) reference vectors nearest to the feature vector in Hamming distance, determining the sound source category to which the largest number of the k reference vectors belong, and outputting category information indicating that sound source category,
wherein the sound source of the input sound is identified based on the output category information.

4. The sound source identification apparatus according to claim 3, further comprising potential value processing means for raising the potential value of the sound source category, among the plurality of sound source categories into which the plurality of reference vectors are classified, that is indicated by the category information output by the sound source category identifying means, and lowering the potential values of the other sound source categories,
wherein the sound source category having the largest potential value among the plurality of sound source categories is determined to be the sound source of the input sound.