JPH04109300A

JPH04109300A - Impulsive sound discrimination device

Info

Publication number: JPH04109300A
Application number: JP2226534A
Authority: JP
Inventors: Shoichi Hirai; 平井　彰一
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-08-30
Filing date: 1990-08-30
Publication date: 1992-04-10

Abstract

PURPOSE: To relax the observation conditions for an input sound and to discriminate impulsive sounds with high accuracy by providing a learning function that, given the category of an input pattern, updates the dictionary pattern. CONSTITUTION: The device comprises an acoustic input part 1, a feature extraction part 2, a segment processing part 3, a recognition processing part 4, a learning pattern memory part 8, and a learning processing part 9. When the recognition result obtained by the recognition processing part 4 differs from the previously known result, the pattern stored in the learning pattern memory part 8 is read out to update the characteristic kernel. After the characteristic kernel is updated, its eigenvectors are calculated and the contents of a standard pattern memory part 5 are replaced with the new dictionary. The device thus has a learning function that, given the category of the input pattern, updates the standard pattern. Consequently, the observation conditions for the input sound are relaxed and impulsive sounds can be identified with high accuracy.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Field of Application) The present invention relates to an impact sound identification device that discriminates, for example, cannon sounds or types of firearms.

(Prior Art) Impact sound identification devices that use a speech recognition device to discriminate cannon sounds or types of firearms have been under development. FIG. 5 shows the configuration of a word identification device as a typical speech recognition device.

In FIG. 5, speech is converted into an electrical signal by a sensor such as a microphone, converted into a digital signal by an A/D (analog/digital) converter 11, and then input to an acoustic processing circuit 12. The acoustic processing circuit 12 transforms the input speech signal into the frequency domain by band-pass filtering or a Fourier transform and detects a time series of feature parameters. As preprocessing, the speech pattern data that forms a word is cut out of the time series and stored in a data buffer memory 13. The speech pattern data stored in the memory 13 is sent sequentially to a similarity calculation circuit 14. The similarity calculation circuit 14 calculates the similarity between the input speech pattern and the word patterns registered in advance in a pattern dictionary file 15, and outputs the name of the word with the maximum similarity as the identification result through a control circuit 16.

It is well known that highly accurate identification performance is obtained when the composite similarity is used as this similarity. In the composite similarity method, samples of a given category are collected (for example, in digit recognition, data pronounced "ichi"), a KL expansion is performed, and several eigenvectors of the covariance matrix, taken in descending order of eigenvalue, are used as the dictionary for that category. If the dictionary for category L is $\{\varphi_n^{(L)}\}$, the composite similarity S for an input pattern f is defined by the following equation.

$$S^{(L)}[f] = \sum_{n} \frac{\left(f \cdot \varphi_n^{(L)}\right)^2}{\|f\|^2}$$

where $\|f\|$ is the norm of $f$ and $(f \cdot \varphi_n^{(L)})$ is the inner product of $f$ and $\varphi_n^{(L)}$.
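The composite similarity and the maximum-similarity decision can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the category labels, and the three-dimensional toy dictionaries are assumptions, and the dictionary vectors are taken to be orthonormal eigenvectors as the text describes.

```python
def composite_similarity(f, dictionary):
    """S[f] = sum_n (f . phi_n)^2 / ||f||^2 for one category's dictionary."""
    norm_sq = sum(x * x for x in f)
    if norm_sq == 0.0:
        return 0.0  # a zero pattern matches nothing
    projections = (sum(a * b for a, b in zip(f, phi)) for phi in dictionary)
    return sum(p * p for p in projections) / norm_sq


def classify(f, dictionaries):
    """Return the (label, similarity) pair with the largest composite similarity."""
    scores = {label: composite_similarity(f, d) for label, d in dictionaries.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical dictionaries: each category holds a few orthonormal vectors.
dictionaries = {
    "cannon": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "rifle": [[0.0, 0.0, 1.0]],
}
label, score = classify([3.0, 4.0, 0.0], dictionaries)
```

Because the similarity is normalized by the squared norm of f, it depends only on the direction of the input pattern, not on its overall amplitude.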

In speech recognition, however, it is assumed that the position where the speech is produced and the position of the sensor, such as a microphone, are a fixed distance apart; in most cases the speaker talks into a microphone or the input comes over a telephone. In identifying cannon sounds and the like, by contrast, observation is often done outdoors, and there is no guarantee that the distance is held constant. Therefore, even with a speech recognition device configured as above, it is difficult to identify impact sounds accurately.

(Problems to be Solved by the Invention) As described above, a conventional speech recognition device imposes strict observation conditions when used to identify impact sounds such as cannon sounds. Even for the same sound, the result depends strongly on the observation distance, so identification with high accuracy is difficult.

The present invention was made to solve the above problem, and its object is to provide an impact sound identification device that relaxes the observation conditions for the input sound and can identify impact sounds with high accuracy.

[Structure of the Invention] (Means for Solving the Problems) To achieve the above object, the present invention is an impact sound identification device comprising pattern detection means for detecting an impact sound pattern consisting of a time series of feature parameters of an input sound, storage means for storing the impact sound pattern obtained by the pattern detection means, and similarity calculation means for calculating the similarity between the impact sound pattern stored in the storage means and each of a plurality of pre-registered standard patterns, the device identifying the input sound based on the similarity values obtained by the calculation means. The device further has a learning function in which category information is input together with the input sound and the impact sound pattern of that category is re-registered as a standard pattern.

(Operation) The impact sound identification device configured as above has a learning function that, given the category of an input pattern, updates the standard pattern. Identification accuracy can therefore be improved by, for example, updating the standard pattern with an input pattern of a specific category to which a distance-dependent deformation has been applied.

(Embodiment) An embodiment of the present invention is described below with reference to FIGS. 1 to 4.

FIG. 1 shows the configuration. The acoustic input section 1 has a sensor such as a microphone and an A/D converter. The impact sound captured by the sensor is converted into an electrical signal, A/D converted, and supplied to the feature extraction section 2. The feature extraction section 2 consists of, for example, a bank of 16-channel band-pass filters, and the output of each band-pass filter is supplied in turn to the segment processing section 3 at fixed time intervals. The segment processing section 3 refers to the series of band-pass filter outputs and to changes in sound level to cut out the impact sound portion as a pattern, and the resulting impact sound pattern is sent to the recognition processing section 4. The recognition processing section 4 compares the input impact sound pattern with the standard patterns registered in advance in the memory section 5, calculates the similarity to each, and causes the recognition result display section 6 to display the type name of the pattern with the maximum similarity.
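The text does not specify how the segment processing section 3 locates the impact sound. One simple way to cut out a pattern from the band-pass filter outputs, assuming a fixed level threshold, is sketched below; the function name, the threshold, the minimum length, and the two-channel toy frames are all illustrative assumptions.

```python
def extract_segments(frames, threshold, min_frames=2):
    """Find runs of frames whose summed band-pass level reaches the threshold.

    frames: list of per-frame filter-bank outputs (e.g. 16 values per frame).
    Returns (start, end) frame index pairs, with end exclusive.
    """
    levels = [sum(abs(v) for v in frame) for frame in frames]
    segments = []
    start = None
    for i, level in enumerate(levels):
        if level >= threshold and start is None:
            start = i  # the sound level has risen: a segment begins
        elif level < threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(levels) - start >= min_frames:
        segments.append((start, len(levels)))  # segment runs to the end
    return segments


# Hypothetical 2-channel frames: quiet, a burst of three frames, quiet again.
frames = [[0.0, 0.1], [0.1, 0.0], [2.0, 3.0], [1.5, 2.5],
          [1.0, 1.0], [0.1, 0.1], [0.0, 0.0]]
segments = extract_segments(frames, threshold=1.0)
start, end = segments[0]
# Flatten the selected frames into one pattern vector for recognition.
pattern = [v for frame in frames[start:end] for v in frame]
```

A fixed threshold is only a stand-in here; since the source mentions changes in sound level, a real device might instead compare each frame with a running background level.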

The impact sound pattern obtained by the segment processing section 3 is also supplied to the learning pattern memory section 8 through the identification/learning mode changeover switch 7. The learning pattern memory section 8 stores the impact sound pattern supplied in learning mode as a learning pattern. The learning processing section 9 is activated when learning mode is selected. It takes in the recognition result obtained by the recognition processing section 4 and, if that result differs from the result known in advance, reads out the pattern stored in the learning pattern memory section 8 and updates the characteristic kernel. Once the characteristic kernel has been updated, the eigenvectors are calculated and the contents of the standard pattern memory section 5 are updated with the new dictionary.

The learning processing section 9 may also be configured to create patterns by applying several kinds of low-pass filtering to the learning pattern and to update the characteristic kernel with these patterns.

The processing performed by the above configuration is described below with reference to FIGS. 2 to 4.

The impact sound identification device is characterized by a learning function that, given the category of an input pattern, updates the dictionary pattern (standard pattern). In particular, an input pattern to which a distance-dependent deformation has been applied is used as a learning pattern for updating the dictionary. For the dictionary stored in the standard pattern memory section 5, the composite similarity method is used, and the following covariance matrix (characteristic kernel) is defined for the set of patterns $\{x_i \mid i = 1, \ldots, n\}$ belonging to one category.

$$K = \sum_{i=1}^{n} x_i x_i^{T}$$

Here $x_i^{T}$ is the transpose of $x_i$. K represents the characteristics of the distribution of the pattern set and is called the characteristic kernel. Several eigenvectors of K, taken in descending order of eigenvalue, are used as the dictionary pattern. When a pattern g belonging to the same category is misrecognized, the features of g are considered not to be reflected in K, so K is updated as

$$K' = K + \alpha\, g g^{T}$$

and several eigenvectors of K', taken in descending order of eigenvalue, are used as the new dictionary pattern.
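The kernel update and the rebuilding of the dictionary can be sketched in plain Python as follows. The patent does not say how the eigenvectors are computed; power iteration with deflation is used here only as a simple stand-in for an eigensolver, and the toy patterns, the value of alpha, and all function names are assumptions.

```python
import math


def add_outer(K, g, weight=1.0):
    """Return K + weight * g g^T as a new matrix (list of lists)."""
    n = len(g)
    return [[K[i][j] + weight * g[i] * g[j] for j in range(n)] for i in range(n)]


def characteristic_kernel(patterns):
    """K = sum_i x_i x_i^T over the patterns of one category."""
    n = len(patterns[0])
    K = [[0.0] * n for _ in range(n)]
    for x in patterns:
        K = add_outer(K, x)
    return K


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def top_eigenvectors(K, count, iterations=500):
    """Leading eigenvectors of a symmetric positive semidefinite K,
    largest eigenvalue first, by power iteration with deflation."""
    A = [row[:] for row in K]
    vectors = []
    for k in range(count):
        v = [1.0 / (i + k + 1) for i in range(len(A))]  # fixed generic start
        for _ in range(iterations):
            w = mat_vec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return vectors  # remaining eigenvalues are numerically zero
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(A, v)))
        vectors.append(v)
        A = add_outer(A, v, -eigenvalue)  # remove the component just found
    return vectors


def composite_similarity(f, dictionary):
    norm_sq = sum(x * x for x in f)
    return sum(sum(a * b for a, b in zip(f, phi)) ** 2 for phi in dictionary) / norm_sq


# Hypothetical 3-dimensional patterns for one category.
patterns = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
K = characteristic_kernel(patterns)
dictionary = top_eigenvectors(K, count=2)

g = [0.0, 0.0, 1.0]   # same category, but misrecognized
alpha = 1.0           # assumed update weight
before = composite_similarity(g, dictionary)

K_new = add_outer(K, g, alpha)                 # K' = K + alpha * g g^T
new_dictionary = top_eigenvectors(K_new, count=2)
after = composite_similarity(g, new_dictionary)
```

In this toy case the misrecognized pattern has no component in the original dictionary's subspace, so its similarity is zero before the update and close to one afterward. How large alpha must be for g to enter the leading eigenvectors depends on the existing eigenvalues of K.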

Furthermore, by creating g', a version of g to which a distance-dependent deformation has been applied, and setting

$$K' = K + \alpha\, g g^{T} + \beta\, g' g'^{T}$$

the dictionary pattern can be updated to reflect the distance.

To take a concrete example, suppose that a cannon sound observed at a position about 2.5 km away gives the signal waveform shown in FIG. 2. When the same type of cannon sound is observed at a position about 10 km away, the signal waveform is greatly deformed, as shown in FIG. 3. Because high-frequency components are generally attenuated more strongly as the distance increases, g' is created by applying low-pass filtering to the short-range data. That is, applying a low-pass filter of about 30 Hz to the data of FIG. 2 yields the waveform shown in FIG. 4, which approximates the waveform shown in FIG. 3.
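The creation of g' and the distance-aware update can be sketched as below. The source gives only the cutoff of about 30 Hz; the single-pole filter, the 1000 Hz sample rate, the toy waveform, and the weights alpha and beta are all assumptions made for illustration. For brevity the pattern g is treated directly as a sampled waveform.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """Single-pole IIR low-pass filter (a deliberately simple assumed filter)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)          # smoothing factor between 0 and 1
    y = samples[0]
    out = []
    for x in samples:
        y += a * (x - y)        # move the output a fraction of the way to the input
        out.append(y)
    return out


def update_kernel(K, g, g_far, alpha, beta):
    """K' = K + alpha * g g^T + beta * g' g'^T."""
    n = len(g)
    return [[K[i][j] + alpha * g[i] * g[j] + beta * g_far[i] * g_far[j]
             for j in range(n)] for i in range(n)]


# Hypothetical short-range pattern: a sharp, quickly decaying oscillation.
g = [0.0, 1.0, -0.8, 0.5, -0.3, 0.1, 0.0, 0.0]
g_far = low_pass(g, cutoff_hz=30.0, sample_rate_hz=1000.0)  # simulated long range

K = [[0.0] * len(g) for _ in g]   # empty kernel, for illustration only
alpha, beta = 1.0, 0.5            # assumed weights
K_new = update_kernel(K, g, g_far, alpha, beta)
```

Applying several cutoff frequencies and adding one term per filtered pattern would correspond to the variant in which the learning processing section uses several kinds of low-pass filtering.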

The impact sound identification device configured as above can therefore update the standard pattern in accordance with the deformation of the pattern caused by the observation distance, so the observation distance condition for the input sound is relaxed and impact sounds can be identified with high accuracy.

Although the above embodiment has been described with a focus on observation distance, the present invention is not limited to this. It can of course also handle various other conditions, for example cases in which the transmission characteristics of sound waves change with weather conditions.

[Effects of the Invention] As described above, the present invention can provide an impact sound identification device that relaxes the observation conditions for the input sound and can identify impact sounds with high accuracy.

[Brief Description of the Drawings]

FIG. 1 is a block circuit diagram showing an embodiment of the impact sound identification device according to the present invention. FIGS. 2 to 4 are waveform diagrams for explaining the impact sound pattern processing of the embodiment. FIG. 5 is a block circuit diagram showing the configuration of a word identification device based on speech recognition, used as a conventional impact sound identification device.

1 ... acoustic input section, 2 ... feature extraction section, 3 ... segment processing section, 4 ... recognition processing section, 5 ... standard pattern memory section, 6 ... recognition result display section, 7 ... identification/learning mode changeover switch, 8 ... learning pattern memory section, 9 ... learning processing section, 11 ... A/D converter, 12 ... acoustic processing circuit, 13 ... data buffer memory, 14 ... similarity calculation circuit, 15 ... pattern dictionary file, 16 ... control circuit.

Agent for the applicant: Patent Attorney Takehiko Suzue

Claims

[Claims]

(1) An impact sound identification device comprising pattern detection means for detecting an impact sound pattern consisting of a time series of feature parameters of an input sound, storage means for storing the impact sound pattern obtained by the pattern detection means, and similarity calculation means for calculating the similarity between the impact sound pattern stored in the storage means and each of a plurality of pre-registered standard patterns, the device identifying the input sound based on the similarity values obtained by the calculation means, wherein the device has a learning function in which category information is input together with the input sound and the impact sound pattern of that category is re-registered as a standard pattern.

(2) The impact sound identification device according to claim (1), wherein the learning function predicts, from the input sound, the change in the feature parameters caused by the difference between the observation position and the sound source position, creates a correspondingly modified sound, and adds it to the patterns to be re-registered.