JPS61240298A - Voice recognition equipment

Info

Publication number: JPS61240298A
Application number: JP60082012A
Authority: JP
Inventors: 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-04-17
Filing date: 1985-04-17
Publication date: 1986-10-25

Abstract

(57) [Summary] This publication consists of application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Technical field] The present invention relates to a speech recognition device.

[Prior art] Speech recognition devices include those that recognize words and those that recognize monosyllables, and monosyllable recognition is an essential technology for Japanese text input, as in a word processor. However, monosyllables closely resemble one another. In particular, the vowel part, which makes up most of an uttered monosyllable, has only five types, so the distinction must be made by the consonant alone. Among these, the distinctions among /p/, /t/, and /k/, among /b/, /d/, and /g/, and among /m/, /n/, and /g/ are especially hard to make, and the recognition rate is low as a result.

[Object] The present invention was made in view of the circumstances described above. In particular, its object is to provide a speech recognition device that can distinguish and recognize speech that is hard to distinguish, such as monosyllables.

[Constitution] To achieve the above object, the present invention is characterized, in one form, by having a voice collection unit, an extraction unit that extracts a speech signal, and a time length detection unit that determines the time length of the extracted signal, wherein the user is informed in advance of the time length of each utterance to be distinguished, and utterances are distinguished by the time length of the signal that has passed through the speech signal extraction unit.

In another form, it is characterized by having a voice collection unit, an extraction unit that extracts a speech signal, a time length detection unit that determines the time length of the extracted signal, a dictionary unit in which standard patterns are registered, and a similarity determination unit that determines the degree of similarity, wherein an utterance length at recognition time is fixed in advance for speech that is particularly hard to distinguish by pattern matching, and when the recognition result for an input pattern falls into that hard-to-distinguish group, the recognition result is corrected according to the length of the input speech. The invention is described below on the basis of embodiments.

FIG. 1 is an electrical circuit diagram for explaining one embodiment of the present invention. In the figure, 1 is a voice collection unit (microphone), 2 is a speech section detection unit, 3 is a sampling unit, 4 is a counter, 5 is a length classification unit, and 6 is a result output unit. This embodiment has a voice collection unit, a speech signal detection unit, and a time length detection unit that determines the time length of the speech signal. The user is informed in advance of the time length of each utterance to be distinguished, and recognition is performed according to the time length of the signal that has passed through the speech signal extraction unit.

For example, suppose a recognizer for three words is to recognize "kinou" (yesterday), "kyou" (today), and "ashita" (tomorrow). The user is instructed in advance to say "kyou" at normal speaking speed, to say "kinou" more slowly than usual so that it lasts about twice as long as "kyou", and conversely to say "ashita" quickly so that it lasts about half as long. Then, when one of these words is spoken into the microphone, the device detects the speech section and judges its length, and can thereby recognize which of the three words was spoken.

In this example the speech length is measured by sampling the speech signal and counting the samples. Several methods of detecting the speech section are known; for example, it may be judged by whether the speech power has exceeded a certain threshold. If the length measured in this way is 1 second or more, the word is taken to be "kinou"; if it is between 0.5 second and 1 second, "kyou"; and if it is 0.5 second or less, "ashita". This yields a speaker-independent word recognition device of very simple construction. With this method, however, the words become easy to confuse once there are more than three.
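The first embodiment can be sketched roughly as follows. This is a minimal illustration, not the circuit of FIG. 1: the sampling rate, the power threshold value, and the function names `count_speech_samples`, `classify_by_length`, and `recognize` are all assumptions introduced here. Only the power-threshold idea, the sample counting, and the 0.5 second and 1 second boundaries come from the description.

```python
from typing import Optional, Sequence

SAMPLE_RATE_HZ = 8000    # assumed; the description gives no sampling rate
POWER_THRESHOLD = 0.01   # assumed; the description only says "a certain threshold"


def count_speech_samples(signal: Sequence[float],
                         threshold: float = POWER_THRESHOLD) -> int:
    """Count samples from the first to the last sample whose power exceeds the threshold."""
    active = [i for i, x in enumerate(signal) if x * x > threshold]
    if not active:
        return 0
    return active[-1] - active[0] + 1


def classify_by_length(num_samples: int,
                       sample_rate_hz: int = SAMPLE_RATE_HZ) -> str:
    """Map a sample count to one of the three words using the stated boundaries."""
    seconds = num_samples / sample_rate_hz
    if seconds >= 1.0:       # 1 second or more
        return "kinou"
    if seconds <= 0.5:       # 0.5 second or less
        return "ashita"
    return "kyou"            # between 0.5 second and 1 second


def recognize(signal: Sequence[float]) -> Optional[str]:
    """Detect the speech section, measure its length, and classify it."""
    num_samples = count_speech_samples(signal)
    if num_samples == 0:
        return None          # no speech detected (behavior assumed for this sketch)
    return classify_by_length(num_samples)
```

With the assumed 8000 Hz rate, a burst of 6000 samples above the threshold corresponds to 0.75 second and is classified as "kyou".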

FIG. 2 is an electrical circuit diagram for explaining another embodiment of the present invention. In the figure, 11 is a voice collection unit (microphone), 12 is a speech section detection unit, 13 is a frequency analysis unit, 14 is a sampling unit, 15 is a counter, 16 is a register, 17 is a registration/recognition changeover switch, 18 is a dictionary unit, 19 is a matching unit, 20 is a determination unit, 21 is a result correction unit, and 22 is a result display unit. This embodiment has a voice collection unit, a speech section detection unit, a time length detection unit that determines the time length of the signal, a dictionary unit in which standard patterns are registered, and a similarity determination unit. An utterance length at recognition time is fixed in advance for speech that is particularly hard to distinguish by pattern matching, and when the recognition result for an input pattern falls into that hard-to-distinguish group, the recognition result is corrected according to the length of the input speech.

This recognition device performs monosyllable recognition and was devised with attention to the fact that /p/, /t/, and /k/, which are hard to discriminate, that is, the pa, ta, and ka rows, are easily misrecognized for one another. Since a monosyllable normally takes 25 ms to 40 ms to utter, pa-row syllables are uttered short, 25 ms or less, ta-row syllables are uttered as usual, and ka-row syllables are uttered long, 40 ms or more. If /b/, /d/, and /g/ are easily confused, a similar rule is made among them as well. The explanation here uses only /p/, /t/, and /k/ as the example.

First, switch 17 is set to the registration side and the dictionary is registered. This registration is no different from that of an ordinary speech recognition device.

A monosyllable is input from the microphone and converted into features (here, a frequency spectrum), which are then sampled. The counter measures the number of samples, that is, the time length, and the value is stored in register 16. The sampled features are matched by the matching unit against each pattern in the dictionary to obtain a degree of similarity. It is then determined whether the result with the maximum similarity is a syllable beginning with /p/, /t/, or /k/. If not, the result is output as is. If so, the result is corrected according to the speech length held in the register and then output: the pa row if the length is 25 ms or less, the ta row if it is 25 to 40 ms, and the ka row if it is longer.

Since the vowel part of a monosyllable can normally be recognized with close to 100% accuracy, only the consonant is corrected. For example, if the result before correction is /ke/ and the speech length is only 20 ms, the result is output as /pe/.
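The correction step of the second embodiment can be sketched as below. This is an illustration under assumptions: syllables are represented as phonemic strings such as "ke" (consonant followed by vowel), the names `consonant_for_length` and `correct_result` are hypothetical, and a length of exactly 40 ms is treated as the ta row, a boundary the description leaves open. The 25 ms and 40 ms limits and the /ke/ to /pe/ case come from the text.

```python
CONFUSABLE_CONSONANTS = ("p", "t", "k")  # the group used as the example in the text
SHORT_LIMIT_MS = 25.0                    # pa row: 25 ms or less
LONG_LIMIT_MS = 40.0                     # ka row: longer than this (boundary assumed)


def consonant_for_length(length_ms: float) -> str:
    """Choose the consonant that the agreed utterance length stands for."""
    if length_ms <= SHORT_LIMIT_MS:
        return "p"
    if length_ms <= LONG_LIMIT_MS:
        return "t"
    return "k"


def correct_result(syllable: str, length_ms: float) -> str:
    """Correct only the consonant of a pattern-matching result.

    `syllable` is the best match from the dictionary in phonemic notation,
    e.g. "ke". Results outside the confusable group pass through unchanged,
    and the vowel part is always kept.
    """
    if not syllable or syllable[0] not in CONFUSABLE_CONSONANTS:
        return syllable
    return consonant_for_length(length_ms) + syllable[1:]
```

For the case given in the text, `correct_result("ke", 20.0)` replaces only the consonant and yields "pe", while a result such as "ma" is returned unchanged whatever its length.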

Note that the present invention is not limited to the matching method described above; any other method, such as DP matching, may be used.
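For readers unfamiliar with DP matching, the following is a generic dynamic-programming (dynamic time warping) sketch, not a method taken from this patent. The frame representation (lists of spectral values), the city-block frame cost, and the names `dp_matching_distance` and `best_match` are assumptions chosen for illustration. A smaller distance here plays the role of a higher degree of similarity.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]


def dp_matching_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Accumulated distance between two frame sequences under time warping."""
    if not a or not b:
        raise ValueError("both patterns must contain at least one frame")
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    d: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def best_match(pattern: Sequence[Frame],
               dictionary: Dict[str, Sequence[Frame]]) -> str:
    """Return the label of the registered pattern closest to the input."""
    return min(dictionary,
               key=lambda label: dp_matching_distance(pattern, dictionary[label]))
```

Because the alignment may stretch or compress time, an input that is a shorter or longer rendition of a registered pattern can still match it closely, which is why the utterance length has to be measured separately, as the register does in the second embodiment.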

[Effect] As is clear from the above description, the present invention makes it possible to realize a speech recognition device with fewer recognition errors.

[Brief explanation of drawings]

FIGS. 1 and 2 are each electrical circuit diagrams for explaining the present invention in detail.

1: voice collection unit, 2: speech section detection unit, 3: sampling unit, 4: counter, 5: length classification unit, 6: result output unit, 11: voice collection unit, 12: speech section detection unit, 13: frequency analysis unit, 14: sampling unit, 15: counter, 16: register, 17: changeover switch, 18: dictionary unit, 19: matching unit, 20: determination unit, 21: result correction unit, 22: result display unit.

Claims

[Claims]

(1) A speech recognition device characterized by having a voice collection unit, an extraction unit that extracts a speech signal, and a time length detection unit that determines the time length of the extracted signal, wherein the user is informed in advance of the time length of each utterance to be distinguished, and utterances are distinguished by the time length of the signal that has passed through the speech signal extraction unit.

(2) A speech recognition device characterized by having a voice collection unit, an extraction unit that extracts a speech signal, a time length detection unit that determines the time length of the extracted signal, a dictionary unit in which standard patterns are registered, and a similarity determination unit that determines the degree of similarity, wherein an utterance length at recognition time is fixed in advance for speech that is particularly hard to distinguish by pattern matching, and when the recognition result for an input pattern falls into that hard-to-distinguish group, the recognition result is corrected according to the length of the input speech.