JPS61240298A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS61240298A
Authority
JP
Japan
Prior art keywords
time length
section
audio
speech
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP60082012A
Other languages
Japanese (ja)
Inventor
潤一郎 藤本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP60082012A
Publication of JPS61240298A

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech recognition device.

Prior Art

Some speech recognition devices recognize words and others recognize monosyllables. For Japanese text input, as in a word processor, monosyllable recognition is an essential technology. Monosyllables, however, closely resemble one another. In particular, the vowel part, which makes up most of an uttered monosyllable, comes in only five types, so syllables must be told apart by the consonant alone. The distinctions among /p/, /t/, and /k/, among /b/, /d/, and /g/, and among /m/, /n/, and /ŋ/ are especially hard to make, and the recognition rate is low as a result.

Object

The present invention was made in view of the circumstances described above. Its object is to provide a speech recognition device that can distinguish and recognize speech that is hard to tell apart, monosyllables in particular.

Configuration

To achieve the above object, the present invention has a voice collection section, an extraction section that extracts a voice signal, and a time length detection section that determines the time length of the extracted signal. The user is told in advance what utterance length to use for each item of speech to be distinguished, and speech is distinguished by the time length of the signal that has passed through the voice signal extraction section.

Alternatively, the invention has a voice collection section, an extraction section that extracts a voice signal, a time length detection section that determines the time length of the extracted signal, a dictionary section in which standard patterns are registered, and a similarity determination section that judges the degree of similarity. For speech that is particularly hard to distinguish by pattern matching, the utterance length to be used at recognition time is fixed in advance. When the recognition result for an input pattern falls into that hard-to-distinguish class, the result is corrected according to the length of the input speech. The invention is described below with reference to embodiments.

FIG. 1 is an electrical circuit diagram for explaining one embodiment of the present invention. In the figure, 1 is a voice collection section (microphone), 2 is a voice section detection section, 3 is a sampling section, 4 is a counter, 5 is a length classification section, and 6 is a result output section. This embodiment has a voice collection section, a voice signal detection section, and a time length detection section that determines the time length of the voice signal. The user is told in advance what utterance length to use for each item of speech to be distinguished, and recognition is performed according to the time length of the signal that has passed through the voice signal extraction section.

For example, suppose a recognition device for three words is to recognize "kinou" (きのう, yesterday), "kyou" (きょう, today), and "ashita" (あした, tomorrow). The user is instructed in advance to say "kyou" at normal speaking speed, to say "kinou" more slowly than usual so that it lasts about twice as long as "kyou", and conversely to say "ashita" quickly so that it lasts about half as long. When one of the three words is then input through the microphone, the device detects the voice section and judges its length, which is enough to tell which word was spoken.

In this example the length of the speech is measured by sampling the voice signal and counting the samples. Several methods of detecting the voice section are known; for instance, the section may be determined by whether the power of the speech exceeds a certain threshold. If the length measured in this way is 1 second or more, the word is taken to be "kinou"; from 0.5 second to 1 second, "kyou"; and 0.5 second or less, "ashita". This yields a speaker-independent word speech recognition device with a very simple configuration. With this method, however, the words become easy to confuse once there are more than three.
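The first embodiment can be illustrated with a short sketch. It is not the patent's circuit; it only mirrors the described flow of power-threshold section detection, sample counting, and length classification. The function names, the 10 ms sample period, and the choice of assigning exactly 0.5 second to "ashita" are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def detect_interval(power: Sequence[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of samples whose power exceeds the threshold."""
    active = [i for i, p in enumerate(power) if p > threshold]
    if not active:
        return None  # no speech detected
    return active[0], active[-1]


def classify_by_length(num_samples: int, sample_period_ms: int = 10) -> str:
    """Map a sample count to one of the three words by its duration."""
    duration_ms = num_samples * sample_period_ms
    if duration_ms >= 1000:   # 1 second or more
        return "kinou"
    if duration_ms > 500:     # between 0.5 second and 1 second
        return "kyou"
    return "ashita"           # 0.5 second or less


def recognize(power: Sequence[float], threshold: float,
              sample_period_ms: int = 10) -> Optional[str]:
    """Detect the voice section, count its samples, and classify by length."""
    interval = detect_interval(power, threshold)
    if interval is None:
        return None
    first, last = interval
    return classify_by_length(last - first + 1, sample_period_ms)
```

With a 10 ms sample period, a section of 120 samples above the threshold lasts 1.2 seconds and would be classified as "kinou".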

FIG. 2 is an electrical circuit diagram for explaining another embodiment of the present invention. In the figure, 11 is a voice collection section (microphone), 12 is a voice section detection section, 13 is a frequency analysis section, 14 is a sampling section, 15 is a counter, 16 is a register, 17 is a registration/recognition changeover switch, 18 is a dictionary section, 19 is a matching section, 20 is a determination section, 21 is a result correction section, and 22 is a result display section. This embodiment has a voice collection section, a voice section detection section, a time length detection section that determines the time length of the signal, a dictionary section in which standard patterns are registered, and a similarity determination section. For speech that is particularly hard to distinguish by pattern matching, the utterance length to be used at recognition time is fixed in advance, and when the recognition result for an input pattern falls into that hard-to-distinguish class, the result is corrected according to the length of the input speech.

This device recognizes monosyllables. It addresses the fact that misrecognition occurs easily among the hard-to-distinguish consonants /p/, /t/, and /k/, that is, among the pa, ta, and ka rows of the Japanese syllabary. An uttered monosyllable normally lasts 25 ms to 40 ms, so the user utters pa-row syllables briefly, in 25 ms or less; utters ta-row syllables as usual; and draws out ka-row syllables to 40 ms or more. If /b/, /d/, and /g/ are easily confused, a similar rule is made for them as well. The description here uses only /p/, /t/, and /k/ as the example.

First, switch 17 is set to the registration side and the dictionary is registered. This registration is no different from that of an ordinary speech recognition device.

A monosyllable is input through the microphone and converted into a feature quantity (here, a frequency spectrum). After the feature quantity is sampled, the counter measures the number of samples, that is, the time length, and the value is stored in register 16. The matching section compares the sampled feature quantity with each pattern in the dictionary and obtains the degree of similarity. The device then checks whether the result with the highest similarity is a syllable beginning with /p/, /t/, or /k/. If it is not, the result is output as it is. If it is, the result is corrected according to the speech length held in the register: to the pa row if the length is 25 ms or less, to the ta row if it is 25 to 40 ms, and to the ka row if it is longer.
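The matching step can be sketched as follows. The source does not say how similarity is computed, so cosine similarity over a single fixed-length spectrum vector is an assumption here, as are the names `cosine_similarity` and `best_match` and the toy dictionary contents.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_x * norm_y)


def best_match(dictionary: Dict[str, Sequence[float]],
               features: Sequence[float]) -> Tuple[str, float]:
    """Return the dictionary label with the highest similarity to the input."""
    scored = ((label, cosine_similarity(pattern, features))
              for label, pattern in dictionary.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical registered patterns; real patterns would be sampled spectra.
dictionary = {"ka": [1.0, 0.0, 0.0], "ma": [0.0, 1.0, 0.0]}
label, similarity = best_match(dictionary, [0.9, 0.1, 0.0])
```

The label returned here is the uncorrected result; whether it is passed through unchanged or corrected by length depends on its leading consonant.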

The vowel part of a monosyllable can normally be recognized with close to 100% accuracy. So if the result before correction is /ke/ and the speech length is only 20 ms, only the consonant is corrected and /pe/ is output.
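The correction rule lends itself to a small sketch. It assumes that syllables are written in romanized form with a single leading consonant letter, and it assigns a length of exactly 40 ms to the ta row; the source leaves that boundary ambiguous. The name `correct_consonant` is hypothetical.

```python
CONFUSABLE = ("p", "t", "k")  # consonants the matcher tends to mix up


def correct_consonant(syllable: str, duration_ms: float) -> str:
    """Replace a confusable leading consonant according to the utterance length.

    The vowel part is kept as recognized, since it is assumed to be reliable.
    """
    if not syllable or syllable[0] not in CONFUSABLE:
        return syllable  # outside the confusable class: output unchanged
    vowel_part = syllable[1:]
    if duration_ms <= 25:
        consonant = "p"   # short utterance: pa row
    elif duration_ms <= 40:
        consonant = "t"   # normal utterance: ta row (40 ms boundary is an assumption)
    else:
        consonant = "k"   # long utterance: ka row
    return consonant + vowel_part
```

For the case described in the text, `correct_consonant("ke", 20)` yields `"pe"`, while a syllable such as `"ma"` passes through unchanged whatever its length.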

The present invention is not limited to the matching method described above; any other method, such as one using DP matching, may be used.
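DP matching (dynamic programming matching, commonly known as dynamic time warping) compares two feature sequences of different lengths by aligning them in time. The source only names the technique, so the following is a generic textbook sketch, not the patent's implementation; the Euclidean frame distance and the function name `dp_matching_distance` are assumptions.

```python
import math
from typing import Sequence


def dp_matching_distance(a: Sequence[Sequence[float]],
                         b: Sequence[Sequence[float]]) -> float:
    """Accumulated distance of the best time alignment between two frame sequences."""
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (assumed local distance)
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the alignment may repeat frames, a pattern and a time-stretched copy of it have distance zero. The utterance length therefore has to be measured separately, as the counter does in the embodiment.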

Effect

As is clear from the above description, the present invention makes it possible to realize a speech recognition device with few recognition errors.

[Brief Description of the Drawings]

FIGS. 1 and 2 are each electrical circuit diagrams for explaining the present invention.

1: voice collection section, 2: voice section detection section, 3: sampling section, 4: counter, 5: length classification section, 6: result output section, 11: voice collection section, 12: voice section detection section, 13: frequency analysis section, 14: sampling section, 15: counter, 16: register, 17: changeover switch, 18: dictionary section, 19: matching section, 20: determination section, 21: result correction section, 22: result display section.

Claims (2)

[Claims]

(1) A speech recognition device comprising a voice collection section, an extraction section that extracts a voice signal, and a time length detection section that determines the time length of the extracted signal, wherein the user is informed in advance of the time length of the speech to be distinguished, and speech is distinguished by the time length of the signal that has passed through the voice signal extraction section.

(2) A speech recognition device comprising a voice collection section, an extraction section that extracts a voice signal, a time length detection section that determines the time length of the extracted signal, a dictionary section in which standard patterns are registered, and a similarity determination section that judges the degree of similarity, wherein the utterance length at recognition time is fixed in advance for speech that is particularly hard to distinguish by pattern matching, and, when the recognition result for an input pattern is classified as such hard-to-distinguish speech, the recognition result is corrected according to the length of the input speech.
JP60082012A 1985-04-17 1985-04-17 Voice recognition equipment Pending JPS61240298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60082012A JPS61240298A (en) 1985-04-17 1985-04-17 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60082012A JPS61240298A (en) 1985-04-17 1985-04-17 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS61240298A true JPS61240298A (en) 1986-10-25

Family

ID=13762605

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60082012A Pending JPS61240298A (en) 1985-04-17 1985-04-17 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS61240298A (en)

Similar Documents

Publication Publication Date Title
US4284846A (en) System and method for sound recognition
US4181813A (en) System and method for speech recognition
WO2014129856A1 (en) Method for recognizing voice of single sentence containing multiple commands
US4769844A (en) Voice recognition system having a check scheme for registration of reference data
JPS62217295A (en) Voice recognition system
JPS6138479B2 (en)
JPS6316766B2 (en)
JPS61240298A (en) Voice recognition equipment
JP2813209B2 (en) Large vocabulary speech recognition device
JP3039453B2 (en) Voice recognition device
JPH0682275B2 (en) Voice recognizer
JPS5936759B2 (en) Voice recognition method
JPH02124600A (en) Voice recognition device
JPS61118800A (en) Voice analyzer
JPS6070497A (en) Voice recognition equipment
JPS62165700A (en) Voice decision unit
JPS63131196A (en) Nasal identifier
JPS61233792A (en) Voice recognition equipment
JPS58123596A (en) Voice recognition system jointly using auxiliary information
JPS5885495A (en) Voice recognition equipment
JPS62166400A (en) Voice wordprocessor
JPS62218997A (en) Word voice recognition equipment
JPS60158496A (en) Voice recognition equipment
JPS62164097A (en) Voice discrimination system
JPS63220199A (en) Voice recognition equipment