JPS59160195A

JPS59160195A - Voice recognition equipment

Info

Publication number: JPS59160195A
Application number: JP58034812A
Authority: JP
Inventors: 米山　正秀; 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-03-03
Filing date: 1983-03-03
Publication date: 1984-09-10

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

Technical Field

The present invention relates to a device for identifying speech input.

Prior Art

Conventionally, a speech recognition device analyzes the input speech to extract its features, compares them with the features registered in advance in a dictionary for each word, and outputs as the recognition result the word with the greatest similarity or the smallest distance.
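The conventional matching step described above can be sketched as a nearest-template search. This is a minimal illustration, not the patent's circuit: the Euclidean distance, the three-element feature vectors, and the names `euclidean`, `recognize`, and `dictionary` are all assumptions introduced here.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, dictionary):
    """Return the registered word whose template is closest to `features`."""
    return min(dictionary, key=lambda word: euclidean(features, dictionary[word]))

# Hypothetical templates, for illustration only.
dictionary = {
    "ichi": [1.0, 0.0, 0.0],
    "ni": [0.0, 1.0, 0.0],
    "san": [0.0, 0.0, 1.0],
}
result = recognize([0.9, 0.2, 0.1], dictionary)
```

Every registered word is compared here, which is the cost the rest of the description sets out to reduce.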

Speech recognition should be fast, but matching the input against every word registered in the dictionary takes too long. The following methods have therefore been proposed.

(1) The dictionary is divided into groups according to the vowels contained in each word; the vowels of the input speech are also analyzed, and matching is performed only within the corresponding dictionary group.
(2) Before speaking, the operator uses a switch to enter the category of the input speech, which limits the range of the dictionary to be matched.
(3) The duration of each word is recorded at the head of its dictionary entry, and words whose length differs greatly from the input speech length are not matched.

However, method (1) requires a vowel analysis of the input speech, which takes extra time. Method (2) requires effort from the operator, and any operating mistake leads directly to a recognition error. Method (3) has the drawback that every dictionary entry must be read regardless of whether it needs to be matched.
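The drawback of method (3) can be seen in a small sketch. The entry layout (a stored length followed by the word), the tolerance value, and the name `filter_by_length` are hypothetical; the point is only that the length header of every entry is read even when the entry is then skipped.

```python
def filter_by_length(entries, input_length, tolerance):
    """Prior-art method (3): skip entries whose stored length differs too much.

    Returns the surviving words and the number of entry headers read.
    """
    reads = 0
    candidates = []
    for stored_length, word in entries:
        reads += 1  # the header of each entry is read, matched or not
        if abs(stored_length - input_length) <= tolerance:
            candidates.append(word)
    return candidates, reads

# Hypothetical dictionary entries: (length in frames, word).
entries = [(12, "go"), (30, "hachi"), (28, "nana"), (55, "juuichi")]
candidates, reads = filter_by_length(entries, input_length=29, tolerance=5)
```

Only two words survive the filter, yet all four entries were read.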

Object

The present invention aims to eliminate the above drawbacks of the prior art and to provide a speech recognition device that can recognize input speech at high speed.

Configuration

The configuration of the present invention is described below with reference to embodiments.

FIG. 1 is a block diagram for explaining one embodiment of the speech recognition device according to the present invention. In the figure, 1 is a microphone; 2 is a reset switch; 3 is a speech section detection unit that cuts out only the speech section from the signal input through microphone 1; 4 is a register that records this speech; 5 is a register that records numbers or symbols; 6 is a multiplier that takes the product of the contents of register 4 and the contents of register 5; 7 is a dictionary unit; 8 is a feature extraction unit; 9 is a matching unit that matches the features recorded in dictionary unit 7 against the features from feature extraction unit 8; 10 is a judgment unit that judges the similarity of the matching results; and 11 is an output unit that outputs the judgment result.

The recognition process begins after several words have been registered in the dictionary unit. First, reset switch 2 sets all contents of register 4 to 0, and the speech to be recognized is then input through microphone 1. Speech section detection unit 3 cuts out the speech section, and the result is recorded in register 4. Register 5, meanwhile, holds the numbers 1, 2, 3, ... The contents of the two registers are paired and multiplied: 1 with the contents of the first address of register 4, 2 with the second address, 3 with the third address, and so on. Taken in order from the head, the results are 1·f(1), 2·f(2), 3·f(3), ..., n·f(n), (n+1)·f(n+1), ..., N·f(N), where f(t) is the input speech, the register can hold N data items, and the speech occupies positions 1 to n. Because f(t) = 0 for t > n, the terms (n+1)·f(n+1) through N·f(N) are all 0.
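The pairing of register 4 with register 5 can be sketched in a few lines. The register size `N`, the sample values, and the function name `weighted_products` are assumptions chosen for illustration; the multiplication itself follows the description above.

```python
def weighted_products(register4, register5):
    """Multiply each stored sample f(k) by the number k held in register 5."""
    return [k * f for k, f in zip(register5, register4)]

N = 8                               # hypothetical register capacity
register4 = [0] * N                 # reset switch 2 clears register 4
speech = [3, -2, 5, 1, 4]           # hypothetical samples f(1)..f(n), n = 5
register4[:len(speech)] = speech    # detected speech section is stored
register5 = list(range(1, N + 1))   # numbers 1, 2, 3, ..., N

products = weighted_products(register4, register5)
# products == [3, -4, 15, 4, 20, 0, 0, 0]
```

The trailing zeros mark the unused part of the register, which is what makes the speech length recoverable from the products alone.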

When this sequence is read in the reverse direction, that is, N·f(N), (N-1)·f(N-1), ..., the first nonzero value to appear is n·f(n), so the speech length n is easily obtained. If the words in the dictionary are arranged in order of increasing length and matching starts from the words near speech length n, fewer operations are needed to reach the correct answer, and faster recognition can be expected.

In the illustrated embodiment the input speech enters register 4 and then reaches feature extraction unit 8, but the reverse arrangement, in which the input signal first passes through feature extraction unit 8 and is then stored in register 4, is more common. Although multiplier 6 is used here, a divider may be used instead, in which case the contents of register 4 are divided by the contents of register 5. Furthermore, the description above assumes that the numbers in register 5 are word file numbers or file names in the dictionary, but they may instead be addresses in the dictionary.
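A sketch of the reverse scan, followed by matching that starts from dictionary entries near length n. The names `speech_length` and `candidates_near`, the use of `bisect`, and the nearest-first visiting order are assumptions; the text only says that the dictionary is sorted by length and that matching starts near n.

```python
import bisect

def speech_length(products):
    """Scan from the end; the position of the first nonzero product is n."""
    for position in range(len(products), 0, -1):
        if products[position - 1] != 0:
            return position
    return 0  # register held no speech

def candidates_near(lengths, n):
    """Yield indices into the ascending list `lengths`, nearest to n first."""
    right = bisect.bisect_left(lengths, n)
    left = right - 1
    while left >= 0 or right < len(lengths):
        take_right = left < 0 or (
            right < len(lengths) and lengths[right] - n <= n - lengths[left]
        )
        if take_right:
            yield right
            right += 1
        else:
            yield left
            left -= 1

n = speech_length([3, -4, 15, 4, 20, 0, 0, 0])
order = list(candidates_near([2, 4, 5, 7, 9], n))
```

A matcher would walk `order` and could stop as soon as a template scores well enough, so words far from length n are rarely touched.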

FIG. 2 is a block diagram for explaining another embodiment of the present invention. Parts that function in the same way as in the embodiment of FIG. 1 are given the same reference numbers as in FIG. 1.

This embodiment differs from that of FIG. 1 in that a natural number generator 12 is provided in place of register 5. In the embodiment of FIG. 2, once the dictionary has been created, reset switch 2 first sets the contents of register 4 to 0, and speech is then uttered into microphone 1. Speech section detection unit 3 cuts out only the speech portion of the signal, feature extraction unit 8 converts it into feature values, and these are stored temporarily in register 4.

Meanwhile, natural number generator 12 generates a fairly large natural number, which is first multiplied by the contents of the last location of register 4. If the result is 0, the natural number is decremented by 1 (that is, -1 is added to it or 1 is subtracted from it) and then multiplied by the contents of the second-to-last location of register 4. If that result is also 0, the natural number is decremented again, and the same operation is repeated until the result is no longer 0. When the first nonzero result is obtained, the natural number at that moment corresponds to the input speech length.

As in the embodiment of FIG. 1, this number can then be used to select, from the word features in the dictionary, those with approximately the same length as the input speech. If only features whose length is within about ±30% of the input speech length are matched, the set of candidates is restricted and faster recognition can be expected. Needless to say, the same effect is obtained if the order of speech section detection and feature extraction is reversed. The natural number generator also need not start from a fairly large number; it may start from 0 or 1, in which case it is effective to have the arithmetic unit add 1 or another number. Furthermore, as noted above, the operation may be division instead of multiplication.
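The countdown of the second embodiment and the ±30% restriction can be sketched as follows. Starting the counter at the register capacity is an assumption (the text says only "a fairly large natural number"), as are the function names and the sample word lengths.

```python
def length_by_countdown(register4):
    """Walk register 4 from its last location, decrementing a counter.

    The counter starts at the register capacity (an assumption), so the
    value at the first nonzero product equals the speech length.
    """
    counter = len(register4)
    for value in reversed(register4):
        if counter * value != 0:
            return counter
        counter -= 1  # "add -1" to the natural number
    return 0

def within_tolerance(word_lengths, n, tolerance=0.30):
    """Keep only words whose length is within the given fraction of n."""
    low, high = n * (1 - tolerance), n * (1 + tolerance)
    return [word for word, length in word_lengths.items() if low <= length <= high]

n = length_by_countdown([3, -2, 5, 1, 4, 0, 0, 0])
# Hypothetical word lengths, in the same units as n.
shortlist = within_tolerance({"a": 3, "b": 4, "c": 6, "d": 7, "e": 10}, n)
```

Only the shortlisted words would be passed to the matching unit; the others are never compared.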

Effect

As is clear from the above description, the present invention makes it possible to provide a speech recognition device with a high recognition speed.

[Brief explanation of drawings]

FIGS. 1 and 2 are block diagrams for explaining embodiments of the speech recognition device according to the present invention.

1: microphone; 2: reset switch; 3: speech section detection unit; 4, 5: registers; 6: multiplier (divider); 7: dictionary unit; 8: feature extraction unit; 9: matching unit; 10: judgment unit; 11: output unit; 12: natural number generator.

Claims

[Claims]

(1) A speech recognition device that has a dictionary unit storing speech features and identifies input speech by comparing and matching the features of the input speech against the features in the dictionary unit, characterized by comprising: a register for temporarily recording the input speech; a reset switch for that register; a register in which numbers or symbols are recorded in advance; and an arithmetic unit that computes the product or quotient of the contents of that register and the contents of the register in which the input speech is recorded, wherein the dictionary is referenced on the basis of the result of the computation.

(2) A speech recognition device that compares and matches the features of input speech against a dictionary unit storing speech features and identifies the input speech from the result, characterized by comprising: a register that temporarily records the input speech; means for generating a natural number; means for increasing or decreasing that number; and means for taking the product or quotient of the contents of the register and the natural number, wherein the dictionary is referenced on the basis of the result.