JPH01309099A - Speech responding device

Info

Publication number: JPH01309099A
Application number: JP63126847A
Authority: JP
Inventors: Harutake Yasuda; 安田　晴剛; Shoji Kuriki; 章次栗木; Takashi Ariyoshi; 有吉　敬; Toshiki Kawamoto; 河本　俊毅; Tomofumi Nakatani; 中谷　奉文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-06-04
Filing date: 1988-05-23
Publication date: 1989-12-13

Abstract

PURPOSE: To maintain a high recognition rate without bias between speaker-independent (unspecified) recognition and speaker-dependent (specified) recognition by providing speaker-independent standard patterns for words whose registration contents are known in advance. CONSTITUTION: The top n candidates of speaker-independent recognition are denoted WI1, WI2, ..., WIn with similarities SI1, SI2, ..., SIn, and the top n candidates of speaker-dependent recognition are denoted WD1, WD2, ..., WDn with similarities SD1, SD2, ..., SDn. Using a coefficient k determined experimentally in advance, (k × SI1) is compared with SD1. The speaker-independent/speaker-dependent decision part outputs WI1 as the final recognition result when (k × SI1) > SD1, and WD1 when SD1 > (k × SI1). Consequently, the bias between speaker-independent and speaker-dependent recognition is eliminated and a high recognition rate is maintained.

Description

[Detailed Description of the Invention] The present invention relates to a voice response device, and more specifically to a speech recognition device applicable to voice dialing devices, voice word processors, and other voice response devices. It further relates to a device in which external peripheral devices connected to a control recognition processing section can be controlled by the same central processing section that controls recognition processing.

It is well known that voice registration in the speaker-dependent (specific-speaker) recognition devices developed and proposed to date is a burden on the user. To reduce the number of words that must be registered, one conceivable approach is to register device command words, common names, and the like in advance as speaker-independent (unspecified-speaker) standard patterns, and to recognize against them at the same time as against the speaker-dependent standard patterns registered by the user. However, current speech recognition technology cannot remove the influence of individuality from one person's standard pattern. It is therefore difficult to treat as equivalent a speaker-dependent standard pattern, which carries the influence of individuality, and a speaker-independent standard pattern, from which that influence has been removed by drawing on the utterances of many speakers. Moreover, if speaker-independent and speaker-dependent recognition are run separately and the similarities of their top candidates are compared directly, the comparison is not between equivalent similarities. The outcome is then likely to be biased toward one side: either candidates registered as speaker-independent standard patterns or candidates registered as speaker-dependent standard patterns tend to be selected.

Furthermore, conventional voice dialing devices hold only standard patterns for specific speakers. Numbers that are the same wherever the device is used, such as "110", "119", "104", and "177", and the corresponding names "police", "fire department", "directory assistance", and "weather forecast", therefore require recognition and synthesis standard patterns to be registered separately for every speaker, which is laborious.

Also, because recognition processing requires an enormous amount of computation, conventional recognition processors have been built as so-called recognition-only processors that implement nothing but the recognition function. Consequently, an external peripheral device such as a facsimile machine or a voice dialing device cannot be connected directly to such a processor. A control central processing unit (CPU) has to be interposed to bridge control between the recognition-only processor and the external peripheral devices.

Likewise, conventional speech recognition LSIs can execute only the recognition function because of the enormous amount of recognition computation. In matching by dynamic programming (DP), for example, the computation at the grid points of the match is so large that in practice it cannot be handled by a general-purpose CPU alone, and large-scale dedicated hardware is required. This dedicated hardware (or dedicated processor), however, cannot perform functions other than matching, such as registering dictionary patterns, outputting recognition results, or carrying out peripheral control related to recognition, nor can it be connected to external peripheral control, so a general-purpose CPU must be attached in these cases. In addition, LSIs for speaker-dependent recognition and LSIs for speaker-independent recognition generally use different recognition algorithms, so the arithmetic processor has to be dedicated in both cases, and the appropriate recognition LSI must be selected for each application.

Also, the arithmetic unit that computes the similarity (or distance) between a feature pattern and a dictionary pattern has conventionally been a large-scale dedicated processor, typified by the DP processor, and it has been impossible to include a CPU and the arithmetic unit in a single LSI.

One way to handle both specific and unspecified speakers with a conventional arithmetic processor is to use multiple templates, that is, to hold several dictionary patterns for each recognition result. To perform speaker-independent recognition this way, however, about 40 to 60 dictionary patterns are needed per recognition result, so recognition processing takes a long time.

In addition, even in the previously developed BTSP (Binary Time-Spectrum Pattern) method, the feature patterns are the same for speaker-dependent and speaker-independent recognition, but the dictionary patterns are structured differently, so the two could not be computed by the same arithmetic unit.

The present invention has been made in view of the circumstances described above. The inventions set forth in claims 1 and 2 of the present application aim to provide a speech recognition device that maintains a high recognition rate when speaker-independent recognition is used for command words, common names, and similar terms and speaker-dependent recognition is used for other words. They do so by eliminating the bias whereby either candidates registered as speaker-independent standard patterns or candidates registered as speaker-dependent standard patterns tend to be selected. The invention set forth in claim 3 aims to save the trouble of registering the names of public institutions that everyone uses in a voice dialing device that dials a telephone by voice. The inventions set forth in claims 4 and 5 aim to provide a recognition processing device in which, when recognizing the feature pattern of input data, an external peripheral device can be connected directly to the control recognition processing section without an intervening control central processing unit (CPU). Further aims are: to provide a recognition device in which the control recognition processing section can be shared, and computation executed at high speed, whether speaker-dependent or speaker-independent recognition is performed; to allow detection of the valid section of the feature pattern of the input data to be variably controlled from outside the control recognition processing section; and to reduce the amount of data handled when matching feature patterns of input data, lightening the computational load on the central processing section.

FIG. 1 is a configuration diagram for explaining one embodiment of the present invention, and FIG. 2 is a flowchart for explaining its operation. This embodiment relates to a speech recognition device applicable to, for example, voice dialing devices, voice word processors, and other voice response devices.

It is well known that voice registration in the speaker-dependent recognition devices developed and proposed to date is a burden on the user. To reduce the number of registered words, one conceivable approach is to register device command terms, common names, and the like in advance as speaker-independent standard patterns, and to recognize against them at the same time as against the speaker-dependent standard patterns registered by the user. However, current speech recognition technology cannot remove the influence of individuality from one person's standard pattern. It is therefore difficult to treat as equivalent a speaker-dependent standard pattern, which carries the influence of individuality, and a speaker-independent standard pattern, from which that influence has been removed by drawing on the utterances of many speakers. Moreover, if speaker-independent and speaker-dependent recognition are run separately and the similarities of their top candidates are compared directly, the comparison is not between equivalent similarities, so the result is likely to be biased toward either the speaker-independent or the speaker-dependent candidates.

This embodiment aims to provide a speech recognition device that maintains a high recognition rate when speaker-independent recognition is used for words such as command terms and common names and speaker-dependent recognition is used for other words. It does so by eliminating the bias between the two kinds of recognition, in which either candidates registered as speaker-independent standard patterns or candidates registered as speaker-dependent standard patterns tend to be selected.

The inventions set forth in claims 1 and 2 of the present application achieve the above object. The invention of claim 1 comprises: feature extraction means for extracting feature quantities of input speech; speaker-independent standard pattern storage means for storing speaker-independent standard patterns registered in advance; speaker-dependent standard pattern storage means for storing standard patterns registered by the user; recognition means for performing recognition by matching the feature quantities extracted by the feature extraction means against each of the speaker-independent and speaker-dependent standard patterns; recognition result confirmation means for informing the user of the result recognized by the recognition means; coefficient storage means for storing at least one predetermined coefficient; and means for applying an operation with the coefficient to the value corresponding to the similarity obtained by the recognition means for one of the two kinds of standard pattern, comparing the resulting value with the value corresponding to the similarity for the other kind of standard pattern, to which the operation was not applied, and recognizing the standard pattern that yields the value corresponding to the maximum similarity. The invention of claim 2 is a speech recognition method comprising the following steps: extracting the feature quantities of input speech with feature extraction means; matching, with recognition means, the extracted feature quantities against each of the speaker-independent standard patterns registered in advance and the speaker-dependent standard patterns registered by the user; applying an operation with at least one predetermined coefficient to the value corresponding to the similarity obtained for one of the two kinds of standard pattern; comparing the resulting value with the value corresponding to the similarity for the other kind of standard pattern, to which the operation was not applied; and recognizing the standard pattern for which the comparison yields the value corresponding to the maximum similarity. The invention is described below on the basis of an embodiment.

FIG. 1 is a configuration diagram for explaining one embodiment of the present invention, and FIG. 2 is a flowchart for explaining its operation. In FIG. 1, 1 is a microphone, 2 a feature extraction section, 3 a speaker-independent recognition section, 4 a speaker-dependent recognition section, 5 a speaker-independent standard pattern storage section, 6 a speaker-dependent standard pattern storage section, 7 a speaker-independent/speaker-dependent decision section, 8 a keyboard, 9 a coefficient section, 10 a speech synthesis section, 11 a speech synthesis dictionary section, and 12 a speaker. The feature quantities of the speech extracted by the feature extraction section 2 are pattern-matched against the speaker-independent standard patterns registered in advance in the speaker-independent standard pattern storage section 5, and the outcome becomes the result of the speaker-independent recognition section. In parallel, the same feature quantities are pattern-matched against the speaker-dependent standard patterns registered by the user in the speaker-dependent standard pattern storage section 6, and the outcome becomes the result of the speaker-dependent recognition section.

Let the top n candidates of speaker-independent recognition be WI1, WI2, ..., WIn with similarities SI1, SI2, ..., SIn, and let the top n candidates of speaker-dependent recognition be WD1, WD2, ..., WDn with similarities SD1, SD2, ..., SDn. Using a coefficient k determined experimentally in advance, (k × SI1) is compared with SD1. If (k × SI1) > SD1, WI1 becomes the final recognition result of the speaker-independent/speaker-dependent decision section; if SD1 > (k × SI1), WD1 does (Steps 1 to 4). When further candidates are needed, they are chosen in the same way, in descending order, from among the top n speaker-independent similarities each multiplied by k and the top n speaker-dependent similarities.
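The decision rule and the selection of further candidates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `merge_candidates` and `decide`, the tuple layout, and the sample similarity values are hypothetical. The source does not say how a tie between (k × SI1) and SD1 is resolved, so the stable sort used here simply keeps the speaker-independent candidate first in that case.

```python
def merge_candidates(independent, dependent, k):
    """Rank both candidate lists on a common scale.

    independent: list of (word, similarity) pairs WI1..WIn with SI1..SIn
    dependent:   list of (word, similarity) pairs WD1..WDn with SD1..SDn
    k:           experimentally chosen coefficient, applied to SI only
    """
    scaled = [(word, k * sim, "independent") for word, sim in independent]
    scaled += [(word, sim, "dependent") for word, sim in dependent]
    # Descending by (scaled) similarity; ties keep list order (assumption).
    return sorted(scaled, key=lambda c: c[1], reverse=True)


def decide(independent, dependent, k):
    """Final result: the top entry of the merged ranking."""
    return merge_candidates(independent, dependent, k)[0]


# Hypothetical similarity values, for illustration only.
si = [("redial", 80), ("cancel", 60)]
sd = [("office", 90), ("home", 70)]
ranking = merge_candidates(si, sd, k=1.2)
```

With these made-up values and k = 1.2 the ranking is redial, office, cancel, home. With k = 1.0 the word "office" would be chosen instead, which shows how the coefficient shifts the balance between the two recognizers.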

If the coefficient k is a single fixed value, the result of the decision section may become biased toward either speaker-independent or speaker-dependent candidates, depending on factors such as the environment in which the speaker-dependent standard patterns were registered, the recognition environment, the characteristics of the user's voice, and variation in utterance. Several values of the coefficient k are therefore prepared, and to correct this bias the user manually sets k, by a switch or a command, to the value that seems appropriate.

In the embodiment described above, the speaker-independent similarity is multiplied by the coefficient k and compared with the speaker-dependent similarity. The result is the same if, conversely, the speaker-dependent similarity is multiplied by a coefficient k' and compared with the speaker-independent similarity. In that case SDi × k' is compared with SIi, so k' and k are related by k' = 1/k.
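Why the two formulations agree can be written out in one line, assuming k > 0 (the source does not state this explicitly, but dividing both sides by k only preserves the inequality for a positive coefficient):

```latex
k \cdot SI_i > SD_i
\;\Longleftrightarrow\;
SI_i > \frac{1}{k}\, SD_i = k' \cdot SD_i,
\qquad k' = \frac{1}{k}
```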

The recognition result confirmation means informs the user of the recognition result by display, speech synthesis, or the like. (In FIG. 1, the speech synthesis section, the speech synthesis dictionary section, and the keyboard constitute the recognition result confirmation means. The speech synthesis dictionary section may use the recognition standard patterns themselves.) When the user indicates by key input, voice input, or the like that the result is wrong, the device outputs the next candidate. If the user cancels that result as well, the following candidate is output, and cancelling and outputting the next candidate are repeated until the desired candidate is selected (Steps 5 to 7). The coefficient k is adjusted automatically in this process. Only when a result belonging to the speaker-independent standard patterns is cancelled and a result belonging to the speaker-dependent standard patterns is adopted, k is decreased by an experimentally determined value or by a value calculated from an experimentally determined formula. Conversely, only when a result belonging to the speaker-dependent standard patterns is cancelled and a result belonging to the speaker-independent standard patterns is adopted, k is increased by such a value. In this way k is always kept at an appropriate value.

If the user cancels a result belonging to the speaker-independent standard patterns and then adopts a next candidate that again belongs to the speaker-independent standard patterns, the value of k is not changed. Likewise, if the user cancels a result belonging to the speaker-dependent standard patterns and then adopts a next candidate that again belongs to the speaker-dependent standard patterns, the value of k is not changed.
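The automatic adjustment of k can be summarized as a small update rule. The sketch below is an illustration under assumptions: the function name `update_k`, the two labels, and the fixed step of 0.05 are hypothetical, since the source only says that the amount is determined experimentally or by an experimentally determined formula.

```python
INDEPENDENT = "independent"  # result came from a speaker-independent pattern
DEPENDENT = "dependent"      # result came from a speaker-dependent pattern


def update_k(k, cancelled, adopted, step=0.05):
    """Return the adjusted coefficient after the user corrects a result.

    cancelled: kind of the result the user rejected
    adopted:   kind of the candidate the user finally accepted
    step:      hypothetical placeholder for the experimentally set amount
    """
    if cancelled == INDEPENDENT and adopted == DEPENDENT:
        return k - step   # speaker-independent side was favoured too much
    if cancelled == DEPENDENT and adopted == INDEPENDENT:
        return k + step   # speaker-dependent side was favoured too much
    return k              # same kind cancelled and adopted: leave k alone
```

Keeping the rule as a pure function makes the "no change" cases explicit: k moves only when the correction crosses from one kind of standard pattern to the other.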

As is clear from the above description, the embodiment shown in FIGS. 1 and 2 provides speaker-independent standard patterns for words (or other units of utterance) whose registration contents are known in advance, such as device command terms and common names. This spares the user the trouble of registering those words. At the same time it eliminates the bias between speaker-independent and speaker-dependent recognition, in which candidates registered as one kind of standard pattern tend to be selected over the other, so that a high recognition rate can be maintained.

FIG. 3 is a diagram for explaining another embodiment of the present invention, which relates to a voice dialing device. Conventional voice dialing devices hold only standard patterns for specific speakers. Numbers that are the same wherever the device is used, such as "110", "119", "104", and "177", and the corresponding names "police", "fire department", "directory assistance", and "weather forecast", therefore require recognition and synthesis standard patterns to be registered separately for every speaker, which is laborious.

This embodiment has been made in view of these circumstances. It aims to save the trouble of registering the names of public institutions that everyone uses in a voice dialing device that dials a telephone by voice.

The invention of claim 3 of the present application achieves the above object. It comprises: feature extraction means for extracting feature quantities of speech input from a telephone handset; standard pattern storage means for storing the feature quantities extracted by the feature extraction means; number storage means for storing the dial number corresponding to the input speech; pattern matching means for matching, at the time of voice input, the feature quantities extracted by the feature extraction means against the feature quantities stored in advance in the standard pattern storage means, and recognizing which standard pattern the input speech corresponds to; dial signal output means for reading from the number storage means the dial number corresponding to the result recognized by the pattern matching means and outputting a dial signal; and speaker-independent standard pattern storage means that holds, as standard patterns for unspecified speakers, standard patterns of the spoken names corresponding to specific telephone numbers.

The invention is described below on the basis of an embodiment.

FIG. 3 is a configuration diagram for explaining another embodiment of the present invention. In the figure, 21 is a transmitter, 22 a receiver, 23 a speech circuit, 24 a hook switch, 25 a feature extraction section, 26 a keyboard, 27 a control section, 28 a pattern matching section, 29 a recognition standard pattern storage section, 30 a speech synthesis section, 31 a synthesis standard pattern storage section, 32 a telephone number storage section, 33 an outgoing call control section, 34 an incoming call control section, 35 a line control section, and 36 an office line. The feature quantities of speech input from the transmitter are extracted by the feature extraction section. Depending on the state of the keyboard, the control section decides whether these feature quantities are to be stored as a recognition standard pattern or used for pattern matching. While the standard pattern registration button on the keyboard is not pressed, the pattern matching section matches the feature quantities against the recognition standard patterns stored in advance and selects the one with the highest similarity as the recognition result. This result is sent to the outgoing call control section, which reads from the number storage section the telephone number registered in advance via the keyboard in association with that pattern and places the call. The recognition result is also sent to the speech synthesis section, which reads out the synthesis standard pattern registered in association with it and outputs it to the receiver as synthesized speech. The recognition standard patterns, the corresponding synthesis standard patterns, and the number storage section are organized as shown in FIG. 4.
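The recognition-time flow just described (pick the best-matching pattern, look up its number, and speak the name back) might look roughly like this. All names are hypothetical, and the similarity scores are taken as given because the source does not specify the matching algorithm at this point.

```python
def recognize_and_dial(similarities, numbers):
    """Pick the most similar registered name and return what to do with it.

    similarities: dict mapping a registered name to its similarity score
    numbers:      dict mapping the same names to telephone numbers
    Returns (name_to_speak_back, number_to_dial).
    """
    best = max(similarities, key=similarities.get)  # highest similarity wins
    return best, numbers[best]


# Names and numbers follow the examples in the text; the scores are made up.
numbers = {"police": "110", "fire department": "119"}
scores = {"police": 0.41, "fire department": 0.87}
name, number = recognize_and_dial(scores, numbers)
```

Returning both the name and the number mirrors the two destinations of the recognition result in FIG. 3: the speech synthesis section for confirmation and the outgoing call control section for dialing.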

In FIG. 4, (A) is the synthesis standard pattern storage section, (B) the recognition standard pattern storage section, and (C) the telephone number storage section; (I) is the non-writable area and (II) the writable area. The speaker-independent part of the telephone number storage section holds the telephone numbers of public institutions, such as "110" and "119". For each such number, a plurality of corresponding speaker-independent standard patterns are stored, together with one synthesis standard pattern each; this area cannot be written to. Each storage section also has a writable area, where the patterns and telephone numbers registered by the user are written.

Since the user may have no use for some of the speaker-independent standard patterns, the keyboard is used to select those that are to be excluded from matching.
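One way to picture the layout of FIG. 4, together with the option of excluding unused presets, is a directory with a fixed preset part and a user-writable part. The class and method names below are hypothetical and the stored patterns are reduced to names; it is a sketch of the idea, not the device's actual memory organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str     # stands in for the recognition/synthesis patterns
    number: str   # telephone number to dial


class DialDirectory:
    def __init__(self, presets):
        self._presets = tuple(presets)  # area (I): fixed, never modified
        self._user = []                 # area (II): user registrations
        self._excluded = set()          # presets the user has switched off

    def register(self, name, number):
        """Add a user entry to the writable area."""
        self._user.append(Entry(name, number))

    def exclude_preset(self, name):
        """Remove a preset from matching without erasing it."""
        if all(e.name != name for e in self._presets):
            raise KeyError(name)
        self._excluded.add(name)

    def active_entries(self):
        """Entries that take part in pattern matching."""
        kept = [e for e in self._presets if e.name not in self._excluded]
        return kept + list(self._user)


directory = DialDirectory([Entry("police", "110"), Entry("weather forecast", "177")])
directory.register("office", "0000")   # placeholder number
directory.exclude_preset("weather forecast")
```

Excluding a preset only hides it from matching; the preset area itself is never rewritten, which matches the non-writable area (I) described in the text.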

Registration is described next. When the standard pattern registration button on the keyboard is pressed, the feature quantities of the input speech extracted by the feature extraction section are registered as a recognition standard pattern and a synthesis standard pattern.

この時、この入力音声に対応する電話番号をキーボード
より番号記憶部に登録する。電話番号の登録作業は、電
話番号をキーボードのテンキーにより番号記憶部３２に
入力することにより実行される。このテンキーによる入
力操作は発信時に実行するキーダイヤル操作でもある。At this time, the telephone number corresponding to this input voice is registered in the number storage section using the keyboard. Registration of a telephone number is performed by inputting the telephone number into the number storage section 32 using the numeric keys on the keyboard. This input operation using the numeric keypad is also a key dial operation performed when making a call.

The telephone number registration operation must therefore be carried out in the on-hook state, so that the number is not sent out to the office line. The on-hook state is the state in which the outgoing telephone-number signal is not connected to the office line, while the telephone can still detect an incoming call from the office line and enter a call. The registration modes are the recognition standard pattern registration mode, the synthesis standard pattern registration mode, and the telephone number registration mode. The operation is described with reference to FIGS. 3 and 4. When a registration mode key is operated and the system is set to any of the registration modes, it enters the on-hook state. Each registration mode may be set independently with its own key, or several modes may be set at once with a single key.

The speech synthesis portion, including the synthesis memory 31, may use any of the following methods: storing the speech as it is and playing it back; the so-called analysis-synthesis method used in this embodiment; or compressing the speech waveform for storage and expanding it for playback.

As is clear from the above description, the embodiment shown in FIGS. 3 and 4 saves the user the trouble of registering the telephone numbers of public institutions and the like that everyone uses, or the corresponding standard speech patterns, and so reduces the registration work.

FIGS. 5 to 10 are diagrams for explaining another embodiment of the present invention. This embodiment relates to a device in which external peripheral devices connected to the control recognition processing unit can be controlled by the same central processing unit that controls the recognition processing.

Because recognition processing involves an enormous amount of computation, conventional recognition processors have been built as so-called recognition-only processors that can implement nothing but the recognition function.

Consequently, when an external peripheral device such as a facsimile machine or a voice dialing device is to be used with a conventional recognition-only processor, it cannot be connected to that processor directly. A central processing unit (CPU) for control has to be interposed to bridge control between the recognition-only processor and the external peripheral device.

Likewise, conventional speech recognition LSIs can execute only the recognition function because of the enormous amount of recognition computation. In matching by dynamic programming (DP), for example, the computation over the lattice points is so large that in practice it cannot be handled by a general-purpose CPU alone, and large-scale dedicated hardware is required.

However, this dedicated hardware (or dedicated processor) cannot perform functions other than matching, such as registering dictionary patterns, outputting recognition results, or carrying out recognition-related peripheral control, nor can it be connected to external peripheral control. In these cases a general-purpose CPU must be attached. Furthermore, LSIs for speaker-dependent recognition and LSIs for speaker-independent recognition generally use different recognition algorithms, so the arithmetic processor has to be specialized for each, and the appropriate recognition LSI must be selected according to whether the application is speaker-dependent or speaker-independent recognition.

In addition, the arithmetic unit that computes the similarity (or distance) between a feature pattern and a dictionary pattern has conventionally been a large-scale dedicated processor, typified by the DP processor, and it has been impossible to include the CPU and the arithmetic unit in a single LSI.

Even with the previously developed BTSP (Binary Time-Spectrum Pattern) method, speaker-dependent and speaker-independent recognition use the same feature pattern but dictionary patterns of different structure, so the two could not be computed by the same arithmetic unit.

This embodiment was made in view of the circumstances described above. Its objects are: to provide a recognition processing device in which, when the feature pattern of input data is recognized, external peripheral devices can be connected directly to the control recognition processing unit without going through a separate control central processing unit (CPU); to provide a recognition device in which the control recognition processing unit can be shared between speaker-dependent and speaker-independent speech recognition and the computation can be executed at high speed; to provide a device in which the detection of the valid interval of the feature pattern of the input data can be variably controlled from outside the control recognition processing unit; and to reduce the amount of data handled when the feature pattern of the input data is matched, thereby lightening the computational load on the central processing unit.

The inventions set forth in claims 4 and 5 of the present application are intended to achieve the above objects. The invention of claim 4 comprises: a feature pattern generation unit that extracts features of input data and generates a feature pattern; a control recognition processing unit having a matching unit that matches the feature pattern against dictionary patterns and a central processing unit (CPU) that controls the generation unit and the matching unit and can also control external peripheral devices as required in conjunction with the generation unit; a program storage unit that supports control by the central processing unit; and a data storage unit that stores the data needed for matching and control. The generation unit, the control recognition processing unit, the program storage unit, and the data storage unit are each detachably connected to the transfer path (CPU bus) of the central processing unit.

The invention of claim 5 comprises a pattern matching circuit consisting of: a feature pattern input unit for inputting a feature pattern; a dictionary pattern input unit for inputting a dictionary pattern; an arithmetic unit that computes at least one element needed for pattern matching from the output data of the feature pattern input unit and the dictionary pattern input unit; a frame data register that stores one frame's worth of the values of the elements needed for pattern matching; a frame data adder that adds twice the data stored in the dictionary pattern input unit to the data of the arithmetic unit; a word data register that stores the values of the elements needed for pattern matching; and a word data adder that adds the values stored in the word data register to the one-frame values of the elements.

The invention is described below on the basis of embodiments.

(1) System using processor 1 and processor 2

FIG. 5 shows a configuration example of a basic speech recognition system. It consists of a processor 41, a processor 42, a program ROM 43, and a template RAM 44, which are connected by a CPU bus 45.

The processor 41 is an LSI for extracting speech features; it outputs the power spectrum and the BTSP at a sampling period of 10 ms. The function of each part of the processor 41 is described first.
I outputs the power spectrum and BTSP at a sampling period of 10 m5. First, the functions of each part within the processor 41 will be explained.

- The microphone amplifier 51 amplifies the speech signal input from the microphone 47.

- The low-pass filter 52 cuts the high band to remove sampling aliasing noise.

- The AGC (automatic gain control) amplifier and pre-emphasis block 53 brings the speech level into an appropriate range and emphasizes the high band to compensate for the lower power of speech at high frequencies.

- Block 55, consisting of band-pass filters, detectors, and low-pass filters, obtains the power spectrum of the speech with a sharpness of Q = 6 at 1/3-octave intervals, with center frequencies from 250 Hz to 6.35 kHz (15 channels).

- The SCF controller 54 is built from switched-capacitor filters (SCF) and controls block 55 (band-pass filters, detectors, and low-pass filters).

- The A/D converter 56 converts the speech power spectrum into 8-bit digital values.

- The register 57 stores the digitized power spectrum data.

- The LOG converter 58 converts the power spectrum to a logarithmic scale.

- The LSFL filter 59 applies a correction based on the least-squares fit line (LSFL) to normalize the sound source characteristics of the speaker.

- The binary converter 60 assigns 1 to those of the 15 channels whose value is at least 1/2 of the local peak, and 0 to the others.

The LOG converter 58, the LSFL filter 59, and the binary converter 60 are described in more detail below. The frequency characteristic of the human glottal sound source is attenuated at high frequencies and also varies considerably between individuals, so it is corrected before binarization by a method that uses a least-squares fit line. For the speech frequency data obtained every 10 ms, the straight line that minimizes the squared error is found and subtracted from each data value, and the result is then binarized. If the threshold used to convert the TSP into the BTSP is too low, the regions of "1" become broad and the differences between the patterns of different words diminish; if it is too high, even slight frequency shifts of the peak pattern can no longer be absorbed. Here the threshold was set to 50% of the peak on the basis of a simple experiment.
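The per-frame correction and binarization described above can be sketched as follows. This is a minimal illustration, not the circuit of the LSI: the function name `lsfl_binarize`, the use of a single global peak per frame (the text speaks of local peaks), and the small guard value for flat frames are all assumptions made for the example.

```python
def lsfl_binarize(log_spectrum, ratio=0.5):
    """Subtract the least-squares fit line from one frame, then binarize.

    log_spectrum: log-power values, one per channel (15 in the text).
    ratio: threshold as a fraction of the peak (0.5 in the text).
    Assumption: one global peak per frame is used for the threshold.
    """
    n = len(log_spectrum)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(log_spectrum) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log_spectrum))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual after removing the spectral tilt of the sound source.
    residual = [y - (slope * x + intercept) for x, y in zip(xs, log_spectrum)]
    peak = max(residual)
    if peak <= 1e-9:  # flat frame: nothing stands out (assumed guard)
        return [0] * n
    return [1 if r >= ratio * peak else 0 for r in residual]


# A tilted spectrum with one bump on channel 4.
frame = [15.0 - i for i in range(15)]
frame[4] += 10.0
bits = lsfl_binarize(frame)
```

Because the tilt is a straight line, it is absorbed by the fit, and only the channel carrying the bump exceeds half of the peak.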

- The timer 61 outputs pulses at a fixed period (10 ms) and sends an interrupt signal to, among others, the CPU inside the processor 42.

The processor 42 is an LSI that performs speech registration processing, recognition processing, and so on, and has a parallel port 76 for communicating with the peripheral device 46. The function of each part of the processor 42 is described next.

- The CPU 71 is a general-purpose 16-bit CPU core that executes the program held in the external ROM.

- The bus controller 72 controls the internal bus 73 inside the LSI and the CPU bus 45 outside the LSI.

- The memory controller 74 supplies the chip select for the external ROM 43.

- The interrupt controller 75 receives the timer interrupt signal and other signals from the processor 41 and controls interrupts.

- The parallel port 76 communicates with the peripheral device 46.

- The control signal generator 77 generates the various control signals used inside this LSI.

- The clock generator 78 generates, for example, the baud rate clock used when a serial interface is attached externally.

- The SECU (similarity element calculation unit) 79 receives an unknown input pattern and a template pattern, both expressed as BTSP, by memory transfer, and computes at high speed the elements Pd, Py, Pv, and Pi of equation (2), from which the similarity given by equation (1) below is obtained. The SECU 79 supports both speaker-dependent and speaker-independent recognition.

The speech recognition program resides in the program ROM and performs the following functions (some of which may be executed in parallel).

- When an interrupt signal arrives from the processor 41, the BTSP and the power spectrum are read in.

- The speech interval is detected, and the BTSP within it is taken as the unknown input pattern. When two patterns are matched, the extent of frequency and time variation is taken into account: one pattern is kept wide, while the other is thinned before matching, using a thinning method (a technique for extracting line features from a line figure that has width) to take out the points near the center of the width, or the center line. It is desirable to narrow the width along the time axis as well. In this way, even if one pattern varies along both the time and frequency axes, the thinned pattern can be matched without protruding from the wide pattern. Various thinning procedures are conceivable (see, for example, Institute of Electronics and Communication Engineers of Japan, technical report PRL-75-66, pp. 49-56); basically, points touching the boundary of the binary figure are deleted one at a time while the connectivity of the figure is preserved, leaving the points near the center, or the center line. In the description above one pattern was thinned, but since the aim is to match the two patterns without one protruding from the other, one pattern may instead be thickened while retaining its line features, or one may be thinned and the other thickened.

- The unknown input pattern and each template pattern are transferred to the SECU by memory transfer, and the resulting similarity elements are read out.

- The similarity of each template to the unknown input pattern is computed from these elements, and the template giving the highest similarity is taken as the recognition result.
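The idea of keeping one pattern wide so that the other does not protrude from it can be illustrated with a small sketch. It shows only the thickening variant mentioned in the list, not the thinning algorithm of the cited report; the function names `thicken` and `fits_inside` and the margins of one channel and one frame are assumptions chosen for the example.

```python
def thicken(pattern, df=1, dt=1):
    """Widen every 1 of a binary pattern by df channels and dt frames.

    pattern: list of frames (time axis), each a list of 0/1 per channel.
    """
    n_t = len(pattern)
    n_f = len(pattern[0]) if pattern else 0
    out = [[0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            if pattern[t][f]:
                for tt in range(max(0, t - dt), min(n_t, t + dt + 1)):
                    for ff in range(max(0, f - df), min(n_f, f + df + 1)):
                        out[tt][ff] = 1
    return out


def fits_inside(narrow, wide):
    """True if every 1 of the narrow pattern lies on a 1 of the wide one."""
    return all(w or not n
               for n_row, w_row in zip(narrow, wide)
               for n, w in zip(n_row, w_row))


reference = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]  # one channel higher
```

Here `shifted` does not fit inside `reference` as it stands, but it does fit inside `thicken(reference)`, which is the effect the text attributes to giving one of the two patterns extra width.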

Because the program ROM is external, the speaker-dependent recognition program can be swapped for the speaker-independent recognition program. In addition, the programs for speech registration and recognition can be held in the same ROM as other application programs, such as voice dialing, and executed by the CPU inside the processor 42. The recognition program can also be upgraded and the recognition vocabulary expanded.

The template RAM is also external, so templates can be replaced easily, and the recognition vocabulary can be expanded by adding RAM area.

The similarity $S_{yi}$ between the unknown input pattern $y$ and the template pattern $m_i$ is defined as follows.

$$S_{yi} = \frac{P_v}{P_d - P_v} \cdot \frac{P_i}{P_y - P_i} \qquad (1)$$

where

$$P_d = \sum_f \sum_t m_i(f,t), \quad P_y = \sum_f \sum_t y(f,t), \quad P_v = \sum_f \sum_t m_i(f,t) \cdot y(f,t), \quad P_i = \sum_f \sum_t m_i(f,t) \cap y(f,t) \qquad (2)$$

In these expressions, $\cdot$ denotes multiplication and $\cap$ denotes the logical product (1 when $y = 1$ and $m_i > 0$, and 0 otherwise). $f$ denotes frequency and corresponds to the channels; $t$ denotes time and corresponds to the frames.
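As a worked illustration of equations (1) and (2), the sketch below computes the four elements directly from small patterns and then forms the similarity. It assumes that equation (1) reads S = Pv/(Pd - Pv) · Pi/(Py - Pi) and that the elements are the sums over all channels and frames of m, y, m·y, and m ∩ y; the function names and the handling of a zero denominator are choices made for the example and are not taken from the source.

```python
def similarity_elements(unknown, template):
    """Return (Pd, Py, Pv, Pi) for one unknown pattern and one template.

    unknown:  list of frames, each a list of 0/1 values per channel.
    template: list of frames, each a list of non-negative ints per channel.
    """
    pd = py = pv = pi = 0
    for y_frame, m_frame in zip(unknown, template):
        for y, m in zip(y_frame, m_frame):
            pd += m                                # sum of template values
            py += y                                # sum of unknown bits
            pv += m * y                            # product term
            pi += 1 if (y == 1 and m > 0) else 0   # logical product
    return pd, py, pv, pi


def similarity(pd, py, pv, pi):
    """Assumed form of equation (1); infinite when a denominator is zero."""
    d1, d2 = pd - pv, py - pi
    if d1 <= 0 or d2 <= 0:  # assumed convention for a perfect overlap
        return float("inf")
    return (pv / d1) * (pi / d2)


unknown = [[1, 0, 1], [0, 1, 0]]
template = [[2, 0, 0], [1, 3, 1]]
elements = similarity_elements(unknown, template)  # (7, 3, 5, 2)
score = similarity(*elements)                      # 5/2 * 2/1 = 5.0
```

In this example Pd = 7, Py = 3, Pv = 5, and Pi = 2, so the similarity is (5 / 2) · (2 / 1) = 5.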

FIG. 6 is a flowchart for explaining another embodiment of the present invention. In this embodiment, as illustrated, a two-dimensional pattern of data is taken in (steps 1→2→3) and formed into a standard pattern (step 6). While the standard two-dimensional pattern is being formed, it is compared with the two-dimensional pattern taken in as data, that is, with its own source pattern (step 5), and the pattern width of the former along each dimension axis is determined so as to differ from that of the latter along the corresponding axis (steps 1→2→3→4→5→6→3). However, if the conversion is arranged so that the width of the standard pattern along each axis always changes in one direction, becoming either thinner or thicker than the corresponding width of the data pattern, the comparison of whether the widths of the two patterns differ (step 5) becomes unnecessary.

Of the above, the pattern creation is executed by the program in the external ROM, using the CPU in the processor 42 and the CPU storage area (work RAM) in the external template RAM.

FIG. 7 is a flowchart for explaining another embodiment of the present invention. In this embodiment, as illustrated, a two-dimensional pattern of data is taken in (steps 1→2→3) and formed as a standard two-dimensional pattern (step 4). In general, the two-dimensional data pattern is compared with the standard two-dimensional pattern that has been formed (step 6), and the pattern width of the former along each dimension axis is determined so as to differ from that of the latter along the corresponding axis (steps 1→2→3→5). However, if the conversion is arranged so that the width of the data pattern along each axis always changes in one direction, becoming either thinner or thicker than the corresponding width of the standard pattern, the comparison of whether the widths of the two patterns differ (step 6) becomes unnecessary.

Of the above, the pattern creation is executed by the program in the external ROM, using the CPU in the processor 42 and the CPU storage area (work RAM) in the external template RAM.

FIG. 8 shows the configuration of a speech recognition system that uses a personal computer 80.

In a typical personal computer 80, the bus 100 of the CPU 81 is brought out to expansion slots, so speech recognition can be carried out by inserting a speech recognition board 90 containing a processor 91 into one of these slots.

The speech recognition board 90 needs only the processor 91 and a decoder 92 for address decoding, and it is connected to the bus of the personal computer 80. The input from a microphone 101 is connected from outside.

All of the functions performed by the CPU 71 of the processor 42 in the example of FIG. 5 are performed here by the CPU 81 of the personal computer 80. The program and the template data are saved on a hard disk, floppy disk, or the like (102, 103) and are moved to the RAM 82 of the personal computer before speech recognition processing is performed.

(3) Configuration of the SECU

In the BTSP-based speech recognition method, each channel of one frame of the unknown input pattern is represented by 1 bit, so a frame fits in 2 bytes. If each channel of one frame of a template pattern has n bits, the frame takes n × 2 bytes. Let U (16 bits) denote one frame of the unknown input pattern. Using the same layout as U, one frame of the template pattern can then be expressed as Tn-1, which collects the most significant bit of every channel, Tn-2, which collects the next bit, and so on down to T0, which collects the least significant bits. The SECU uses U, Tn-1, Tn-2, ..., T0 to compute the similarity elements Pd, Py, Pv, and Pi of equation (2).
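The bit-plane layout described above can be sketched as follows. The assignment of channel f to bit f of each word is an assumption (the text only says that the template planes use the same layout as U), and the helper names `pack_unknown` and `to_bit_planes` are hypothetical.

```python
def pack_unknown(bits):
    """Pack one frame of 0/1 channel values into a single word U.

    Assumption: channel f is stored in bit f of the word.
    """
    word = 0
    for f, b in enumerate(bits):
        word |= (b & 1) << f
    return word


def to_bit_planes(channel_values, n_bits):
    """Split one template frame into the words [T(n-1), ..., T0].

    channel_values: one non-negative int (< 2**n_bits) per channel.
    Plane k collects bit k of every channel, laid out like U.
    """
    planes = []
    for k in range(n_bits - 1, -1, -1):  # most significant plane first
        word = 0
        for f, v in enumerate(channel_values):
            word |= ((v >> k) & 1) << f
        planes.append(word)
    return planes


u = pack_unknown([1, 0, 1, 1])             # bits 0, 2, 3 set
planes = to_bit_planes([3, 0, 1, 2], 2)    # [T1, T0]
```

With 2-bit channel values 3, 0, 1, 2, plane T1 has bits 0 and 3 set and plane T0 has bits 0 and 2 set, so each plane has the same shape as an unknown-pattern word.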

FIG. 9 shows the configuration of the SECU 79.

- The clear controller 79a clears the frame data register 79g and the word data register 79i.

- The unknown data register 79c is a 16-bit register for inputting one frame of the unknown input pattern U.

- The template data register 79b is a 16-bit register for inputting one frame of the template pattern T0, T1, ..., Tn-1.

- The logic operation unit 79e performs bitwise logical operations on the data in the unknown data register 79c and the template data register 79b. The data for Py and Pi are sent to the frame data register 79g, and the data for Pd and Pv are sent to the frame data adder 79f.

- The frame data register 79g holds the similarity elements for one frame.

- The frame data adder 79f, for Pd and Pv, adds the data temporarily held in the template data register 79b shifted up by one bit (that is, doubled) to the data from the logic operation unit 79e.

- The word data register 79i holds the similarity elements for one template.

- The word data adder 79h adds the similarity elements held in the word data register 79i to the similarity elements held in the frame data register 79g.

- The output buffer 79d is a buffer for sending the similarity elements held in the word data register 79i to the data bus 73.

FIG. 10 is a flowchart for computing the similarity elements between an unknown input pattern and one template using the SECU 79. The procedure is as follows.

- Clear the word data register 79i.

- Clear the frame data register 79g and, at the same time, transfer U.

- Send the template data, starting from the most significant plane (Tn-1).

- Repeat this step down to T0.

- Repeat the above procedure up to the final frame.

- Read out the result.

Through this procedure, the SECU 79 performs the following computation.

In equation (2), $y(f,t)$ and $m_i(f,t) \cap y(f,t)$ take 1-bit values, so the values of $P_y$ and $P_i$ for one frame, that is, $\sum_f y(f,t)$ and $\sum_f m_i(f,t) \cap y(f,t)$, are sent to the frame data register 79g immediately after Tn-1, ..., T0 have been transferred.

On the other hand, $m_i(f,t)$ and $m_i(f,t) \cdot y(f,t)$ take n-bit values, so the values of $P_d$ and $P_v$ for one frame, that is, $\sum_f m_i(f,t)$ and $\sum_f m_i(f,t) \cdot y(f,t)$, are obtained by computing from the most significant plane of Tn-1, ..., T0 downward, shifting by one bit each time and adding the next lower plane.
Since bit values are taken, the values of Pd and Pv for one frame, that is, the values of Tn-1, .

Once all the elements for one frame are present in the frame data register 79g, the data for each element are added to the data accumulated up to that point.
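The procedure of FIG. 10 can be mimicked in software as a shift-and-add accumulation over the bit planes. This is a sketch under assumptions, not a description of the actual circuit: it keeps the running per-frame sum in an ordinary variable and doubles it before each lower plane is added, counts set bits with a helper `popcount`, and takes the logical product as the AND of U with the OR of all planes. All names are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def secu_elements(unknown_words, template_planes):
    """Accumulate (Pd, Py, Pv, Pi) for one template.

    unknown_words:   one packed word U per frame.
    template_planes: per frame, the words [T(n-1), ..., T0].
    """
    pd = py = pv = pi = 0                  # word data register cleared
    for u, planes in zip(unknown_words, template_planes):
        f_pd = f_pv = 0                    # frame data register cleared
        nonzero = 0
        for t in planes:                   # most significant plane first
            f_pd = (f_pd << 1) + popcount(t)      # double, then add
            f_pv = (f_pv << 1) + popcount(u & t)
            nonzero |= t                   # channels with m > 0
        f_py = popcount(u)                 # 1-bit elements, no shifting
        f_pi = popcount(u & nonzero)
        pd, py, pv, pi = pd + f_pd, py + f_py, pv + f_pv, pi + f_pi
    return pd, py, pv, pi


# Frames [1,0,1] and [0,1,0] packed with channel f in bit f.
unknown_words = [0b101, 0b010]
# Template frames [2,0,0] and [1,3,1] as 2-bit planes [T1, T0].
template_planes = [[0b001, 0b000], [0b010, 0b111]]
result = secu_elements(unknown_words, template_planes)
```

For these inputs the accumulated elements are Pd = 7, Py = 3, Pv = 5, and Pi = 2, the same values obtained by summing the unpacked channel values directly. Because the loop simply runs over however many planes are supplied, the same routine serves templates with any number of bits per channel.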

Because each part of the SECU 79 operates in a very simple way, high-speed computation is possible, and the CPU can obtain the similarity elements for a template immediately after transferring the data of the final frame of that template.

In this computation scheme the template data are sent from the most significant plane downward, so the similarity elements can be obtained regardless of the number of bits n used to represent each channel of one frame of the template pattern. The scheme therefore applies to both speaker-dependent and speaker-independent speech recognition based on the BTSP method.

As is clear from the above description, in the embodiment shown in FIGS. 5 to 10 the BTSP computation is simple and the hardware is small, so the device can incorporate not only recognition computation and processing but also peripheral control and its hardware, and can run applications that include external control.

Moreover, besides the BTSP computation being simple, speaker-dependent and speaker-independent recognition can both be realized in software with only slight changes to the recognition processing, using the same features and the same binary patterns. Consequently, the vocabulary size can be changed, and the device can be switched between speaker-dependent and speaker-independent recognition, simply by changing the external ROM.

[Effects]

As is clear from the above description, according to the inventions set forth in claims 1 and 2 of the present application (the embodiment shown in FIGS. 1 and 2), speaker-independent standard patterns are provided for terms (or other utterance units) whose registration content is known in advance, such as device command words and common names. This saves the user the trouble of registering those words. It also eliminates the bias between speaker-independent and speaker-dependent recognition, in which candidates registered as speaker-independent standard patterns, or candidates registered as speaker-dependent standard patterns, tend to be selected more readily, so that a high recognition rate can be maintained.

According to the invention of claim 3, the user is spared the trouble of registering the telephone numbers of public institutions and the like that everyone uses, or the corresponding standard speech patterns, and the registration work is reduced.

Further, according to the inventions of claims 4 and 5, the computation in the BTSP method is simple and only small hardware is needed, so the device can include not only recognition computation and processing but also peripheral control and its hardware, and can execute applications that include external control. In addition, because the computation in the BTSP method is simple and the same feature quantities and the same binary patterns are used, both specific and unspecified recognition can be realized in software with only slight changes to the recognition processing. Therefore, by changing the external ROM, the number of words can be changed and the device can be switched between specific and unspecified recognition.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram for explaining one embodiment of the present invention. FIG. 2 is a flowchart for explaining its operation. FIG. 3 is a diagram for explaining another embodiment of the present invention. FIG. 4 is a diagram showing the configuration of the standard pattern storage unit for synthesis, the standard pattern storage unit for recognition, and the telephone number storage unit. FIGS. 5 to 10 are diagrams for explaining other embodiments of the present invention.

1: microphone, 2: feature extraction unit, 3: unspecified speaker recognition unit, 4: specific speaker recognition unit, 5: unspecified speaker standard pattern storage unit, 6: specific speaker standard pattern storage unit, 7: unspecified/specific speaker determination unit, 8: keyboard, 9: coefficient unit, 10: speech synthesis unit, 11: speech synthesis dictionary unit, 12: speaker.

21: transmitter, 22: receiver, 23: call circuit, 24: hook switch, 25: feature extraction unit, 26: keyboard, 27: control unit, 28: pattern matching unit, 29: standard pattern storage unit for recognition, 30: speech synthesis unit, 31: standard pattern storage unit for synthesis, 32: telephone number storage unit, 33: outgoing call control unit, 34: incoming call control unit, 35: line control unit, 36: office line.

41, 42: processor, 43: ROM, 44: RAM, 45: bus, 46: peripheral device, 47: microphone.

Claims

1. A speech recognition device comprising: feature extraction means for extracting feature quantities of input speech; unspecified standard pattern storage means for storing preregistered standard patterns for unspecified speakers; specific standard pattern storage means for storing standard patterns for performing specific speaker recognition; recognition means for performing recognition by comparing each of the unspecified standard patterns and the specific standard patterns with the feature quantities of the speech extracted by the feature extraction means; confirmation means for informing the speaker of the recognition result obtained by the comparison in the recognition means; coefficient storage means for storing at least one predetermined coefficient; and recognition means for performing a calculation on the coefficient and a value corresponding to the similarity, obtained by the recognition means, to one of the unspecified standard pattern and the specific standard pattern, comparing the calculated value with a value corresponding to the similarity to the other standard pattern, on which the calculation has not been performed, and recognizing the standard pattern for which the value corresponding to the maximum similarity is obtained.

2. A speech recognition method comprising the following steps: extracting feature quantities of input speech by feature extraction means; performing recognition by recognition means using the speech feature quantities extracted by the feature extraction means; performing a calculation on at least one predetermined coefficient and a value corresponding to the similarity, obtained by the recognition means, to one of the unspecified standard pattern and the specific standard pattern; comparing the calculated value with a value corresponding to the similarity to the other standard pattern; and recognizing the standard pattern for which the value corresponding to the maximum similarity is obtained through the comparison.

3. A voice dialing device comprising: feature extraction means for extracting feature quantities of speech input from a handset used for telephone transmission and reception; standard pattern storage means for storing the feature quantities extracted by the feature extraction means; number storage means for storing a dial number corresponding to the input speech; pattern matching means for comparing the feature quantities extracted by the feature extraction means at the time of speech input with the feature quantities stored in advance in the standard pattern storage means, and recognizing which standard pattern the input speech corresponds to; and dial signal output means for reading from the number storage means the dial number corresponding to the recognition result of the pattern matching means and outputting a dial signal; the voice dialing device further comprising standard pattern storage means for unspecified speakers, which stores, as standard patterns for unspecified speakers, standard patterns of the speech of names corresponding to particular telephone numbers.

4. A recognition processing device comprising: a feature pattern generation unit that extracts feature quantities of input data and generates a feature pattern; a control recognition processing unit having a matching unit that matches the feature pattern against a dictionary pattern and a central processing unit (CPU) that controls the generation unit and the matching unit and can also control the required external peripheral devices; a program storage unit that supports the control of the central processing unit; and a data storage unit that stores data necessary for matching and control; wherein the generation unit, the control recognition processing unit, the program storage unit, and the data storage unit are each detachably connected to a transfer path (CPU BUS) of the central processing unit.

5. A recognition processing device comprising a pattern matching circuit that comprises: a feature pattern input unit for inputting a feature pattern; a dictionary pattern input unit for inputting a dictionary pattern; a calculation unit that calculates at least one element necessary for pattern matching using the output data of the feature pattern input unit and the dictionary pattern input unit; a frame data register that stores one frame's worth of the value of an element necessary for the pattern matching; a frame data addition unit that adds double the data stored in the dictionary pattern input unit to the data of the calculation unit; a word data register that stores the values of the elements necessary for pattern matching; and a word data adder that adds one frame's worth of the element to the value stored in the word data register.