JPH02278295A

JPH02278295A - Voice pattern registering system and voice recognizing device

Info

Publication number: JPH02278295A
Application number: JP1101144A
Authority: JP
Inventor: Junichiro Fujimoto (藤本潤一郎)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-04-19
Filing date: 1989-04-19
Publication date: 1990-11-14

Abstract

PURPOSE: To achieve correct recognition even in noise by removing the noise portion from a steady-state portion of speech to which sudden noise has been added, and replacing it with the correct pattern. CONSTITUTION: Speech is picked up by a microphone 1 and converted into an electrical signal, which is frequency-analyzed by a filter bank 2. From the analysis result, a peak detection section 4 extracts the positions of large frequency components, and a time counter 5 and a comparison section 6 determine whether each such position persists for a predetermined time. When there are several spectrally stable parts that persist beyond the predetermined time, a pattern comparison section 8 determines whether adjacent spectrally stable parts are of the same kind. If they are, they are treated as one continuous spectrally stable part, corrected by a pattern replacement section 9, and registered as a pattern. In this way, correct recognition can be performed even in noise.

Description

DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a pattern registration method for speech recognition, and to a speech recognition device that uses patterns registered by that method.

As speech recognition devices approach practical use, research has addressed their use in real environments. Ambient noise is a particular problem: if speech cannot be distinguished from noise, it cannot be recognized. Several methods have been proposed for recognition in noise.

A well-known method stores in memory the noise spectrum measured while no speech is being input. When speech is input, it subtracts the noise component from the input spectrum before recognition (Proceedings of the Acoustical Society of Japan, March 1982 (Showa 57), pp. 141-142, 3-4-6, "Experiments on noise processing for word recognition"). However, if the noise spectrum changes after it has been stored, this method can be counterproductive.

Another method measures the noise level before speech is input and uses that level to extract the speech from the noise. The point at which a louder sound arrives is taken as the start of the speech, and the point at which the level falls back below the measured value is taken as its end. However, if the noise becomes louder during the utterance, the level after the start point never falls below the pre-measured threshold. As a result, the end of the speech cannot be found.

A third method, the so-called adaptive filter, uses two audio input means. One receives mainly speech plus ambient noise and the other receives mainly ambient noise, and the filter characteristics are adjusted to minimize the error between the two (Speech Research Group materials 581-81, p. 651, "Noise removal using a high-speed microprocessor"). This method is accurate and can extract a noise-reduced signal even when the input noise is not stationary.
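The first prior-art method (subtracting a stored noise spectrum) can be sketched as follows. This is a minimal illustration that assumes each spectrum is a list of per-channel magnitudes from a filter bank. The function names and the zero floor are hypothetical choices and are not taken from the cited paper. Because the noise estimate is frozen when it is stored, any later change in the noise leaves residual noise or removes speech, which is the weakness noted above.

```python
from typing import List, Sequence

def average_noise_spectrum(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-channel magnitudes over frames recorded while no speech is input."""
    if not noise_frames:
        raise ValueError("need at least one noise frame")
    count = len(noise_frames)
    return [sum(channel) / count for channel in zip(*noise_frames)]

def spectral_subtract(frame: Sequence[float], noise: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the stored noise spectrum channel by channel, clamping at `floor`."""
    if len(frame) != len(noise):
        raise ValueError("spectra must have the same number of channels")
    return [max(s - n, floor) for s, n in zip(frame, noise)]

# Example: estimate the noise from two silent frames, then clean one speech frame.
noise = average_noise_spectrum([[1.0, 2.0, 1.0], [1.0, 2.0, 3.0]])
cleaned = spectral_subtract([5.0, 2.5, 1.0], noise)
print(noise, cleaned)  # [1.0, 2.0, 2.0] [4.0, 0.5, 0.0]
```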

However, this method needs a large amount of computation to converge to the minimum error. If data are acquired every 10 ms, as in ordinary speech recognition, it is difficult to reach convergence within each acquisition interval. Moreover, none of these methods can cope with sudden noise that occurs while speech is being uttered.
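For comparison, the adaptive-filter approach is commonly realized with the least-mean-squares (LMS) update. The source does not say which adaptation rule the cited work used, so LMS is an assumed stand-in here. The sketch shows where the computational cost comes from: every sample costs one filter evaluation plus one weight update over all taps, and many samples pass before the error converges. The tap count and the step size `mu` are illustrative values.

```python
import math
import random
from typing import List, Sequence, Tuple

def lms_noise_canceller(primary: Sequence[float], reference: Sequence[float],
                        taps: int = 4, mu: float = 0.01) -> Tuple[List[float], List[float]]:
    """Adaptive noise cancellation with the LMS rule.

    primary:   samples from the input carrying speech plus ambient noise
    reference: samples from the input carrying mainly ambient noise
    Returns (error signal, final weights); the error signal is the noise-reduced output.
    """
    weights = [0.0] * taps
    history = [0.0] * taps                 # most recent reference sample first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate               # the part the filter cannot explain
        weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output, weights

# Example: the primary input is pure noise, a scaled copy of the reference,
# so a converged filter should drive the output towards zero.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
out, w = lms_noise_canceller([0.8 * r for r in ref], ref)
early = sum(abs(e) for e in out[:100]) / 100
late = sum(abs(e) for e in out[-100:]) / 100
print(late < 0.01 * early)  # True
```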

The present invention was made in view of the above shortcomings of the prior art. Its object is to provide a pattern creation method that corrects feature patterns so that speech can be recognized even when noise occurs during speech input.

To achieve this object, the present invention comprises:
- a part that collects speech and converts it into an electrical signal;
- a part that frequency-analyzes the converted signal; and
- a part that extracts the positions of large frequency components from the analysis result and determines whether each such position persists for a predetermined time.

When several parts persist for the predetermined time or longer (hereinafter called spectrally stable parts), the invention determines whether adjacent spectrally stable parts are of the same kind. If they are, it corrects them into one continuous spectrally stable part and registers the resulting pattern. The invention is described below with reference to an embodiment.

FIG. 1 is a block diagram of one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a filter bank section, 3 is a sampling section, 4 is a peak detection section, 5 is a time counter, 6 is a comparison section, 7 is a memory section, 8 is a pattern comparison section, 9 is a pattern replacement section, and 10 is a registration memory section.

The invention is based on the observation that recognition candidates can be narrowed down with fairly good accuracy using only the sequence of steady-state phonemes in an utterance and their order of appearance. Accordingly, the device detects the positions of large frequency components that persist for a predetermined time or longer (spectrally stable parts). It judges whether adjacent spectrally stable parts are of the same kind and, if they are, corrects them into one continuous spectrally stable part before registering the pattern.

Referring to FIG. 1, speech is first input through the microphone 1 and converted into an electrical signal. The frequency analysis section that analyzes this signal may be, for example, the band-pass filter group 2. Alternatively, the waveform may be sampled by the sampling section 3 and converted to the frequency domain by FFT. The peak detection section 4 then detects the frequencies with large components in the result of the frequency conversion.
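The FFT route (sampling section 3 followed by a frequency transform) can be sketched with a plain discrete Fourier transform. This is an illustrative sketch only. The frame length and hop are hypothetical parameters, for example a hop equivalent to 10 ms, and a practical implementation would use an FFT routine and a window function.

```python
import cmath
import math
from typing import List, Sequence

def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the sampled waveform into (possibly overlapping) analysis frames."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """DFT magnitudes for bins 0 .. N/2 (O(N^2); an FFT yields the same values faster)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

# Example: a cosine completing 2 cycles per 16-sample frame peaks in bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)
print(max(range(len(spectrum)), key=spectrum.__getitem__))  # 2
```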

Here, peaks are extracted by comparing component magnitudes along the frequency axis and taking the local maxima. Other methods may of course be used, such as taking the difference between adjacent values along the frequency axis and treating the point where its sign reverses as a peak. Next, the time counter 5 measures how long each extracted local maximum persists over time, and the comparison section 6 checks whether this duration exceeds a predetermined length. A predetermined length of about 30 ms is suitable.
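The alternative peak-picking rule mentioned above (a sign reversal of the difference between neighbouring channels) might look like this. The `min_level` threshold for what counts as a "large" component is an assumed parameter, because the source does not specify one. This simple rule skips flat-topped peaks, where neighbouring values are equal.

```python
from typing import List, Sequence

def spectral_peaks(spectrum: Sequence[float], min_level: float = 0.0) -> List[int]:
    """Return channel indices where the first difference changes sign from + to -."""
    peaks = []
    for k in range(1, len(spectrum) - 1):
        rising = spectrum[k] - spectrum[k - 1] > 0
        falling = spectrum[k + 1] - spectrum[k] < 0
        if rising and falling and spectrum[k] >= min_level:
            peaks.append(k)
    return peaks

print(spectral_peaks([0, 3, 1, 1, 5, 2]))               # [1, 4]
print(spectral_peaks([0, 3, 1, 1, 5, 2], min_level=4))  # [4]
```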

Portions longer than this are marked in the memory section 7.
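The roles of the time counter 5, the comparison section 6 and the memory section 7 can be combined into one pass over per-frame peak lists. This is a hedged sketch under several assumptions: frames are 10 ms apart, and a peak position counts as "continuing" when every peak stays within `tol` channels of the previous frame's peak, which is a hypothetical criterion. The 30 ms figure from the text serves as the minimum duration.

```python
from typing import List, Sequence, Tuple

def peaks_continue(prev: Sequence[int], cur: Sequence[int], tol: int = 1) -> bool:
    """True if both frames have the same number of peaks and each moved by at most tol."""
    return bool(prev) and len(prev) == len(cur) and all(
        abs(a - b) <= tol for a, b in zip(prev, cur))

def mark_stable_parts(frame_peaks: Sequence[Sequence[int]], frame_ms: int = 10,
                      min_ms: int = 30, tol: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, whose peaks persist >= min_ms."""
    marks = []
    start = 0
    for i in range(1, len(frame_peaks) + 1):
        if i < len(frame_peaks) and peaks_continue(frame_peaks[i - 1], frame_peaks[i], tol):
            continue                        # time counter keeps running
        if frame_peaks[start] and (i - start) * frame_ms >= min_ms:
            marks.append((start, i))        # duration check passed: mark this run
        start = i                           # restart the counter
    return marks

peaks = [(5, 20), (5, 21), (6, 20), (5, 20), (12, 30), (12, 30),
         (), (12, 30), (12, 30), (12, 30)]
print(mark_stable_parts(peaks))  # [(0, 4), (7, 10)]
```

Because continuity is checked only between consecutive frames, a peak that drifts slowly still counts as stable; a stricter variant would compare against the first frame of the run.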

The portions marked here are vowels and phonemes such as /S/, /f/ and /l f/, and specifically the steady, stable parts (spectrally stable parts) of these phonemes. For example, when two different vowels occur in succession, the pattern loses its stationarity at the transition between them, so the marked region is split in two.

The pattern comparison section 8 compares adjacent marked portions to determine whether the two are the same phoneme. It does this by computing the similarity between the adjacent portions and checking whether it reaches a fixed value. If they are the same phoneme, the two portions are joined into one. When joining them, the marks may simply be made continuous, or the pattern may be rebuilt by inserting the pattern of the same vowel.

One effective use of patterns registered in this way is to narrow down the recognition candidates first and then perform ordinary recognition. Sometimes the vocabulary is not very large, or no two words share the same positions and combination of vowels and stationary consonants. In those cases, a recognition result can be obtained from these patterns directly.
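The same-phoneme test of the pattern comparison section 8 and the joining done by the pattern replacement section 9 can be sketched as follows. The sketch assumes that each marked part is summarised by a mean spectrum and that similarity is cosine similarity. The threshold 0.9 is a placeholder for the source's unspecified "fixed value". Joining makes the marks continuous (the first option in the text), so a gap left by noise between two parts is absorbed.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, List[float]]   # (start frame, end frame exclusive, mean spectrum)

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_same_phoneme(segments: Sequence[Segment], threshold: float = 0.9) -> List[Segment]:
    """Join neighbouring stable parts whose spectra are similar enough to be one phoneme."""
    merged: List[Segment] = []
    for start, end, spec in segments:
        if merged and cosine_similarity(merged[-1][2], spec) >= threshold:
            p_start, p_end, p_spec = merged[-1]
            w1, w2 = p_end - p_start, end - start
            # length-weighted mean spectrum of the joined part
            joined = [(a * w1 + b * w2) / (w1 + w2) for a, b in zip(p_spec, spec)]
            merged[-1] = (p_start, end, joined)   # marks made continuous over the gap
        else:
            merged.append((start, end, list(spec)))
    return merged

# A vowel split in two by a burst of noise (frames 4-5), followed by a different sound.
parts = [(0, 4, [1.0, 0.0, 2.0]), (6, 10, [1.0, 0.1, 2.0]), (12, 15, [0.0, 3.0, 0.0])]
print([(s, e) for s, e, _ in merge_same_phoneme(parts)])  # [(0, 10), (12, 15)]
```

Each new part is compared against the already-joined part rather than only its raw neighbour. This is a design choice of the sketch, and it lets a vowel that was interrupted more than once collapse back into a single part.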

If sudden noise occurs while a vowel is being uttered, the noise enters the vowel, but the operation described above restores the vowel. Likewise, noise may be mixed into a non-stationary consonant or attached to the end of a vowel. In either case, correcting the marks as described above returns the pattern to the state it would have had without the noise.

The example shown in FIG. 1 uses this method for preliminary recognition and places no restriction on the technique used by the recognition section. Preliminary recognition simply retrieves the dictionary patterns that have the same arrangement of marks, so pattern matching is not specifically required, although it may of course be used.
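Preliminary recognition by mark arrangement alone amounts to a dictionary lookup keyed on the sequence of phoneme labels of the marked parts. The sketch assumes that each marked part has already been labelled, for example by comparison with vowel templates. The labels and dictionary entries are hypothetical examples.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Index = Dict[Tuple[str, ...], List[str]]

def build_index(dictionary: Dict[str, Sequence[str]]) -> Index:
    """Group vocabulary words by the label sequence of their stable parts."""
    index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for word, labels in dictionary.items():
        index[tuple(labels)].append(word)
    return dict(index)

def preliminary_candidates(index: Index, observed: Sequence[str]) -> List[str]:
    """Words whose registered mark sequence equals the observed one (no pattern matching)."""
    return list(index.get(tuple(observed), []))

vocab = {"asa": ["a", "s", "a"], "kasa": ["a", "s", "a"], "ie": ["i", "e"], "ue": ["u", "e"]}
index = build_index(vocab)
print(preliminary_candidates(index, ["a", "s", "a"]))  # ['asa', 'kasa']
print(preliminary_candidates(index, ["i", "e"]))       # ['ie']
```

A single candidate ('ie') is already a recognition result, as the text notes for small vocabularies. Several candidates ('asa', 'kasa') would be passed on to the ordinary recognizer.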

As is clear from the above description, the present invention can remove the noise portion from a steady-state portion of speech to which sudden noise has been added and replace it with the correct pattern. As a result, correct recognition becomes possible even in noise.

[Brief explanation of drawings]

FIG. 1 is a block diagram for explaining one embodiment of the present invention.
1: microphone; 2: filter bank section; 3: sampling section; 4: peak detection section; 5: time counter; 6: comparison section; 7: memory section; 8: pattern comparison section; 9: pattern replacement section; 10: registration memory section.
Patent applicant: Ricoh Co., Ltd.

Claims

[Claims]
1. A speech pattern registration method comprising: a part that collects speech and converts it into an electrical signal; a part that frequency-analyzes the converted electrical signal; and a part that extracts the positions of large frequency components from the analysis result and determines whether each such position persists for a predetermined time. The method is characterized in that, when there are a plurality of parts that persist for the predetermined time or longer (hereinafter called spectrally stable parts), it determines whether adjacent spectrally stable parts are of the same kind and, if they are, corrects them into one continuous spectrally stable part and registers the pattern.
2. A speech recognition device using patterns registered by the pattern registration method according to claim 1.