JPS6266300A

JPS6266300A - Voice recognition system

Info

Publication number: JPS6266300A
Application number: JP60207131A
Authority: JP
Inventors: 潤一郎藤本; 室井　哲也
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-09-19
Filing date: 1985-09-19
Publication date: 1987-03-25
Anticipated expiration: 2012-04-16
Also published as: JP2601448B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

Technical field

The present invention relates to a speech recognition method.

Prior art

The present applicant has already proposed various forms of speech recognition based on the so-called BTSP (Binary Time-Spectrum Pattern) method, in which speech is binarized to obtain a feature pattern and the resulting input pattern is recognized by linear matching against dictionary patterns.

Because the BTSP method binarizes the speech, however, the energy and power information that expresses loudness is lost, which can cause recognition errors. For example, it is hard to distinguish /P/, a plosive whose consonant part rises rapidly, from the consonant printed as /に/, whose rise is comparatively gradual. One conceivable remedy is to binarize the power information by an ordinary method and hold it together with the binary TSP (BTSP) pattern. In that case, however, the BTSP part and the power part require different calculations, so a dedicated calculation unit is needed to evaluate the similarity of the power part. The apparatus becomes more complex, and the high-speed computation that is the whole advantage of the BTSP method is lost.

Object

The present invention was made in view of the above circumstances. Its object is to provide a recognition method that adds speech energy or power information to a recognition system based on binarized patterns while still allowing high-accuracy, high-speed recognition.

Constitution

To achieve this object, the present invention is a speech recognition method in which features of speech are extracted and held as standard patterns, and a recognition result is determined by matching them against the speech pattern of unknown input speech. It is characterized in that a speech pattern is created by binarizing the positions that indicate the energy or power shape of the speech, or their vicinity, with a code different from that of the remaining parts; one or more patterns created by the same procedure are superimposed and added to form a standard pattern; the unknown input speech is binarized in the same way and superimposed on the standard patterns to determine similarity; and the standard pattern with the maximum similarity is taken as the recognition result. The invention is described below with reference to embodiments.

Fig. 1 is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a power detection unit, 3 is a speech interval detection unit, 4 is a binarization unit, 5 is a register, 6 is an addition unit, 7 is a standard pattern, 8 is a superimposition unit, 9 is a similarity determination unit, and 10 is a recognition result output unit.

First, the power of the speech entering through microphone 1 is detected, the speech interval is determined, and the power is binarized only for the part belonging to the speech interval. The power can be obtained, for example, by detecting the amplitude envelope of the speech waveform, and the speech interval can be obtained as the portion in which the power found by power detection unit 2 is at or above a fixed value. The binarization unit may work as shown in Fig. 2: from the power signal of (A) it produces the binarized information of (B), in which the positions that trace the shape of the speech power are represented by "1" and all other positions by "0". In this example the power is quantized into five levels, and the waveform of (A) is easily recognized in (B).

When creating a standard pattern, switch S is set to the standard-pattern side.
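The power binarization of Fig. 2 (A) to (B) described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the applicant's circuit: the function name `binarize_power`, the `threshold` parameter, and the scaling of the levels by the peak power of the utterance are all hypothetical choices made for the example. The text itself only says that the power is quantized into five levels and that the speech interval is the portion where the power is at or above a fixed value.

```python
def binarize_power(power, levels=5, threshold=0.1):
    """Turn a per-frame power envelope into a frames x levels binary pattern.

    Each frame gets exactly one "1", placed at its quantized power level,
    so the 1s trace the shape of the power curve as in Fig. 2 (B).
    """
    # Speech interval: keep frames whose power is at or above a fixed value.
    # (Simplification: frames are tested one by one, so the kept frames are
    # not forced to form a single contiguous run.)
    voiced = [p for p in power if p >= threshold]
    if not voiced:
        return []

    # Assumption: levels are scaled by the peak power of this utterance.
    peak = max(voiced)
    pattern = []
    for p in voiced:
        level = min(levels - 1, int(p / peak * levels)) if peak > 0 else 0
        row = [0] * levels
        row[level] = 1  # mark the position that shows the power shape
        pattern.append(row)
    return pattern


frames = binarize_power([0.0, 0.3, 1.0, 0.5, 0.05])
for row in frames:
    print(row)
```

In this sketch the two frames below the threshold are dropped, and each remaining frame contributes one row with a single "1". A real front end would also have to bring utterances to a common number of frames before matching, which is left out here.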

One utterance, for example /Pa/, is then spoken three times. The pattern of the first utterance is placed in register 5, superimposed on and added to the pattern of the second utterance, and the result is placed in register 5 again. Next, the pattern of the third utterance is added to the contents of register 5, and the result is registered as the standard pattern. In other words, standard pattern (E) of Fig. 2 is created by adding patterns (B), (C) and (D). This is repeated for each utterance to be registered, after which recognition begins.

For recognition, an unknown utterance is input through microphone 1 and binarized by the same process used when creating the standard patterns. The binarized pattern is then matched against the standard patterns created earlier. Matching consists of superimposing the binarized pattern on one of the standard patterns and calculating a similarity. The binarized unknown pattern has the same form as Fig. 2 (B). If the two waveforms are similar, then when it is superimposed on standard pattern (E), the "1" elements of pattern (B) fall on the elements of (E) that hold large values. The similarity may therefore be defined as the sum of the products of the elements that correspond to each other when the two patterns are superimposed. The similarity between the unknown pattern and every registered standard pattern is obtained in this way, and the standard pattern with the largest similarity is output as the recognition result. This makes it possible to calculate similarity with power information included within binarized processing.

It is difficult, however, to recognize speech from power information alone.

Fig. 3 is an electrical block diagram showing another embodiment that removes this drawback. This embodiment uses the pattern created in the preceding embodiment together with a pattern created from another feature, obtains a similarity for each of the two patterns, and applies one similarity to the other to obtain the final similarity. The method chosen here for combined use is the binary TSP method described as prior art. The reason is that the binary TSP, like the power pattern, is a binarized representation, so the same calculation can be applied to both. It may of course be combined with other methods as well.

In Fig. 3, after the speech interval is cut out by speech interval detection unit 3, the power is detected by power detection unit 2, while the same signal undergoes feature conversion in feature conversion unit 11. In this embodiment the spectrum is a suitable feature. The shapes of the power and of the spectrum are binarized by binarization unit 4. Later calculation is easier if the spectrum pattern and the power pattern are joined into a single binarized pattern. An example of the pattern produced by binarization unit 4 is shown in Fig. 4, where F is the ordinary BTSP and G corresponds to the pattern of Fig. 2 (B). The similarity is calculated and the result obtained by the same procedure as in the example of Fig. 1. From the point of view of similarity determination unit 9 the pattern has simply become larger, so the procedure is unchanged, and the result is determined from the similarity of the sum of the two patterns. This ultimately gives better accuracy than the example of Fig. 1. It is also possible to place the weight on either the power pattern or the spectrum pattern and use the other as an auxiliary means.

Fig. 5 is an electrical block diagram for explaining an embodiment made from this point of view. In this embodiment, when the similarity is obtained, the second similarity is also calculated to determine the recognition result only when the first similarity satisfies, or fails to satisfy, a specific condition. As in the embodiment of Fig. 3, a pattern of the kind shown in Fig. 4, joining spectrum and power, is created, and several such patterns are superimposed and registered. At recognition time, the power part of the unknown input pattern produced by binarization unit 4 is matched in power pattern matching unit 12 against the power part of a standard pattern to check their similarity. If the two differ greatly, judgment unit 13 decides not to calculate the similarity of the spectrum part, and matching moves on to the next standard pattern. If judgment unit 13 decides that the similarity of the spectrum pattern should be calculated, the similarity between the patterns is obtained as in Fig. 3. This similarity may be calculated with or without the power part. The example here compares the power of the whole pattern, but the comparison may of course be made frame by frame within the pattern instead of over one whole speech pattern.

Effect

As is clear from the above description, according to the present invention power information is added to the binarized spectrum pattern, and the accuracy of speech recognition can be improved.
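The registration and matching steps of the first embodiment (register 5, addition unit 6, and the sum-of-products similarity) can be sketched as below. This is an illustrative Python sketch, not the circuit of Fig. 1: the function names `superimpose_add`, `similarity` and `recognize`, the second dictionary entry `"ka"`, and the small 3-frame by 5-level example patterns are all hypothetical. It also assumes that every pattern already has the same number of frames.

```python
def superimpose_add(patterns):
    """Element-wise sum of equally sized binary patterns.

    Mirrors register 5 and addition unit 6: the first pattern is stored,
    and each later pattern is added on top of the stored contents.
    """
    register = [row[:] for row in patterns[0]]
    for pattern in patterns[1:]:
        for r, row in enumerate(pattern):
            for c, bit in enumerate(row):
                register[r][c] += bit
    return register


def similarity(unknown, standard):
    """Sum of the products of the elements that overlap when superimposed."""
    return sum(u * s
               for urow, srow in zip(unknown, standard)
               for u, s in zip(urow, srow))


def recognize(unknown, dictionary):
    """Return the label of the standard pattern with the largest similarity."""
    return max(dictionary, key=lambda label: similarity(unknown, dictionary[label]))


# Three hypothetical utterances of one word, playing the role of (B), (C), (D).
b = [[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
c = [[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 0]]
d = [[0, 0, 0, 1, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
rising = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]

dictionary = {
    "pa": superimpose_add([b, c, d]),            # standard pattern (E)
    "ka": superimpose_add([rising, rising, rising]),
}
print(recognize(b, dictionary))
```

Because the unknown pattern is binary, each product is either zero or the stored count, so the similarity reduces to adding up the standard-pattern values that lie under the "1" elements of the input. One point the sketch leaves open is normalization: standard patterns built from different numbers of utterances would not be directly comparable without it.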

[Brief description of the drawings]

Fig. 1 is an electrical block diagram for explaining one embodiment of the present invention. Fig. 2 is a diagram showing binarized patterns for explaining the operation of the present invention. Fig. 3 is an electrical block diagram for explaining another embodiment of the present invention. Fig. 4 is a diagram showing an example of a binarized pattern. Fig. 5 is an electrical block diagram showing a further embodiment of the present invention.

1: microphone; 2: power detection unit; 3: speech interval detection unit; 4: binarization unit; 5: register; 6: addition unit; 7: standard pattern; 8: superimposition unit; 9: similarity determination unit; 10: recognition result output unit; 11: feature conversion unit; 12: power pattern matching unit; 13: judgment unit.

Patent applicant: Ricoh Co., Ltd.

[Drawings: Fig. 1, Fig. 3, Fig. 4 (patterns F and G)]

Claims

[Claims]

(1) A speech recognition method in which features of speech are extracted and held as standard patterns and a recognition result is determined by matching against the speech pattern of unknown input speech, characterized in that a speech pattern is created by binarizing the positions that indicate the energy or power shape of the speech, or their vicinity, with a code different from that of the remaining parts; one or more patterns created by the same procedure are superimposed and added to form a standard pattern; the unknown input speech is binarized in the same way and superimposed on the standard patterns to determine similarity; and the standard pattern with the maximum similarity is taken as the recognition result.

(2) The speech recognition method according to claim (1), characterized in that the above pattern is used together with a pattern created from another feature, a similarity is obtained for each kind of pattern, and the similarity of one kind is applied to the similarity of the other to obtain a final similarity from which the recognition result is determined.

(3) The speech recognition method according to claim (1), characterized in that, when the above pattern is used together with a pattern created from another feature and a similarity is obtained for each kind of pattern, the other similarity is also calculated to determine the recognition result only when one similarity satisfies, or fails to satisfy, a specific condition.
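The conditional matching of claim (3), described in the specification with reference to Fig. 5, might be sketched as follows. This is a hedged Python illustration only. The patent does not state what the "specific condition" is, so the threshold `min_power_score`, the function names `overlap` and `gated_recognize`, the dictionary layout with `"power"` and `"spectrum"` parts, and all example values are assumptions introduced for the example. Here the final score includes the power part, which the specification allows but does not require.

```python
def overlap(a, b):
    """Sum of products of overlapping elements of two equally sized patterns."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def gated_recognize(unknown, dictionary, min_power_score):
    """Match the power part first; score the spectrum part only if it passes.

    unknown and every dictionary value are dicts holding a "power" pattern
    and a "spectrum" pattern (the G and F parts of Fig. 4).
    Returns the best label, or None if every candidate was skipped.
    """
    best_label, best_score = None, None
    for label, standard in dictionary.items():
        power_score = overlap(unknown["power"], standard["power"])
        if power_score < min_power_score:
            # Role of judgment unit 13: the power shapes differ too much,
            # so the spectrum similarity is not calculated at all.
            continue
        score = power_score + overlap(unknown["spectrum"], standard["spectrum"])
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label


dictionary = {
    "pa": {"power": [[0, 0, 0, 1, 2], [0, 0, 2, 1, 0], [0, 3, 0, 0, 0]],
           "spectrum": [[3, 0, 1, 0], [0, 3, 0, 0], [0, 0, 3, 1]]},
    "ka": {"power": [[0, 3, 0, 0, 0], [0, 0, 3, 0, 0], [0, 0, 0, 0, 3]],
           "spectrum": [[3, 3, 3, 3], [3, 3, 3, 3], [3, 3, 3, 3]]},
}
unknown = {"power": [[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]],
           "spectrum": [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]}

print(gated_recognize(unknown, dictionary, min_power_score=5))
```

The point of the gate is speed: candidates whose power shape is clearly wrong never reach the larger spectrum comparison. The same test could be applied frame by frame rather than to the whole pattern, as the specification notes.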