JPH0331275B2

Info

Publication number: JPH0331275B2
Application number: JP58168797A
Authority: JP
Inventors: Yasuo Sato; Takayuki Fujimoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-09-13
Filing date: 1983-09-13
Publication date: 1991-05-02
Also published as: JPS6060698A

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a speech standard feature pattern creation processing device. In particular, it relates to a device that plays back, for the user to hear, the registered speech associated with the standard feature patterns used for matching against input feature patterns obtained from unknown input speech, so that standard feature patterns registered from erroneous utterances can be easily deleted.

(B) Prior Art and Problems

In speech recognition generally, improving the recognition rate depends on which feature parameters are extracted from the speech information and used for matching. It is equally important, given the feature extraction defined by the system, to prepare in the dictionary the most suitable standard feature parameters to represent each item. However good the feature extraction and matching methods are, the recognition rate will not improve if the standard feature patterns registered in the dictionary include many defective standard feature patterns, such as patterns with added noise or patterns from unclear utterances, or many erroneous standard feature patterns caused by utterance errors, such as uttering "i" when "a" should have been registered.

Standard feature patterns are stored in the dictionary as digital information. They are numerous, they cannot be seen the way mechanical parts can, and they are not all used equally often. Once registered, therefore, such defective or erroneous standard feature patterns are not easy to detect.

Conventionally, every standard feature pattern, once registered, was treated as correct, and a recognition error was generally regarded as unavoidable, attributed either to poor input speech or to the limits of recognition. Learning methods have also been proposed that improve the quality of the dictionary by so-called averaging of the input feature pattern extracted from the misrecognized input speech with the already registered standard feature patterns. These methods, however, presuppose that the registered standard feature patterns are correct to some extent, and they converge slowly when a standard feature pattern is wrong.

It is therefore desirable to be able to detect a standard feature pattern that is unsuitable, whether it is about to be registered or has already been registered, and to re-register it.

Methods of outputting the speech recognition result as speech have been considered in the past. However, the speech information output as the recognition result was prepared for each item, not for each standard feature pattern. Consequently, when a misrecognition occurred, listening to the output speech did not make it possible to judge whether a standard feature pattern was good or bad.

(C) Object and Structure of the Invention

The present invention aims to solve the above problems. In recognition mode, it plays back and outputs the registered speech from which the standard feature pattern currently under consideration was derived, so that the user can recognize that a registration resulted from an erroneous utterance. Invalid standard feature patterns can then be re-registered, which improves the quality of the dictionary and raises the recognition rate.

To this end, the speech standard feature pattern creation processing device of the present invention is a device in a speech recognition system that performs speech recognition by matching an input feature pattern, obtained by acoustic analysis of unknown input speech, against standard feature patterns stored in advance for each item in a dictionary. It is characterized in that the dictionary has a voice information storage section that stores voice information corresponding to each standard feature pattern, and in that the device has an audio reproduction section that, during speech recognition or when a recognition error occurs, reproduces and outputs speech based on the corresponding voice information in the dictionary, and a registration deletion section that, in response to a registration deletion instruction, deletes from the dictionary the registration of the standard feature pattern selected in that recognition. An embodiment is described below with reference to the drawings.

(D) Embodiments of the Invention

FIG. 1 is a diagram explaining the relationship between the distribution of speech patterns and standard feature patterns; FIG. 2 is a diagram outlining the processing according to the present invention; FIG. 3 shows an example of reference technology related to the present embodiment; and FIG. 4 is an explanatory diagram of audio reproduction in the arrangement shown in FIG. 3.

In FIG. 1, the regions enclosed by the solid lines A, B, and C show the actual distributions of speech patterns in the pattern space. A₁ and A₂ are the registered standard feature patterns for word A (words include monosyllables; the same applies below), B₁ to B₃ are the standard feature patterns for word B, and C₁ is the standard feature pattern for word C. A single standard feature pattern sometimes covers one word item, as with C, but it is usual to prepare several standard feature patterns per item, as with A and B, so as to cover the distribution range of the speech patterns to be recognized. When, for example, an input feature pattern X is extracted from unknown input speech, the matching distance between X and each of the standard feature patterns A₁, A₂, B₁, ... is computed, and the item to which the standard feature pattern at the smallest distance belongs is taken as the recognition result.
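The matching step described above can be sketched as a nearest-template search. This is only an illustration under stated assumptions: the patent does not specify the distance measure, so plain Euclidean distance over fixed-length vectors is assumed here, and the function name `match` and all coordinates are hypothetical.

```python
import math

def match(input_pattern, dictionary):
    """Return (item, template_id, distance) for the nearest standard feature pattern."""
    best = None
    for item, templates in dictionary.items():
        for template_id, template in templates.items():
            # Assumed distance measure: Euclidean distance between vectors.
            d = math.dist(input_pattern, template)
            if best is None or d < best[2]:
                best = (item, template_id, d)
    return best

# Hypothetical dictionary: several standard feature patterns per item,
# as with words A and B in FIG. 1, and a single one for word C.
dictionary = {
    "A": {"A1": (0.0, 0.0), "A2": (1.0, 0.0)},
    "B": {"B1": (4.0, 4.0), "B2": (5.0, 4.0), "B3": (4.0, 5.0)},
    "C": {"C1": (9.0, 0.0)},
}

# Input feature pattern X; the recognition result is the item of the nearest template.
item, template_id, distance = match((4.2, 4.1), dictionary)
```

Returning the template identifier as well as the item matters for the rest of the description, because the invention acts on the individual standard feature pattern that was selected, not only on the item.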

If the standard feature patterns registered in the dictionary include defective or erroneous standard feature patterns that lie outside the distribution of speech patterns, the recognition rate deteriorates. The present invention seeks to improve the recognition rate by deleting such invalid standard feature patterns after registration.

Suppose, for example, that, as shown in FIG. 2, the speech patterns of the word "Shibuya" are distributed as S and those of the word "Hibiya" as H. Suppose further that, while several standard feature patterns are being registered for each word to build the dictionary, an operating or utterance error causes "Hibiya" to be uttered where "Shibuya" should have been, and the resulting standard feature pattern S₃ is registered. Although S₃ is actually a speech pattern of "Hibiya", the dictionary stores it as belonging to the word "Shibuya".

Once S₃ has been registered in this way, an utterance of "Shibuya" matches only the standard feature patterns S₁ and S₂ and does not match S₃. The fact that S₃ is wrong is not detected, however; it is simply treated as though no utterance corresponding to S₃ had been made. Now suppose that, as shown in FIG. 2, "Hibiya" is uttered and yields the input feature pattern X. Because the distance d₁ between X and the standard feature pattern S₃ is smaller than the distance d₂ between X and the standard feature pattern H₃, X is recognized as the word "Shibuya". In this situation, conventional learning methods would conclude not that S₃ is wrong but that the standard feature patterns H₁, H₂, and H₃ of the word "Hibiya" are unsuitable, and would add to or modify the standard feature patterns belonging to "Hibiya". The erroneous standard feature pattern S₃ therefore remains in the dictionary.
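The Shibuya/Hibiya situation can be made concrete with a small numeric sketch. The two-dimensional coordinates are invented for illustration, and Euclidean distance is an assumption; only the ordering d₁ < d₂ and the resulting misrecognition come from the description.

```python
import math

# (word stored in the dictionary, template name, hypothetical position)
templates = [
    ("Shibuya", "S1", (1.0, 1.0)),
    ("Shibuya", "S2", (2.0, 1.0)),
    ("Shibuya", "S3", (8.0, 8.0)),   # really an utterance of "Hibiya", mislabeled
    ("Hibiya", "H1", (9.0, 10.0)),
    ("Hibiya", "H2", (10.0, 9.0)),
    ("Hibiya", "H3", (9.0, 9.0)),
]

x = (8.2, 8.3)  # input feature pattern X from a genuine "Hibiya" utterance

def nearest(x, templates):
    """Pick the template at the smallest (assumed Euclidean) distance from x."""
    return min(templates, key=lambda t: math.dist(x, t[2]))

d1 = math.dist(x, (8.0, 8.0))   # distance to S3
d2 = math.dist(x, (9.0, 9.0))   # distance to H3
word, name, _ = nearest(x, templates)

# With the mislabeled S3 removed, the same input falls to a "Hibiya" template.
cleaned = [t for t in templates if t[1] != "S3"]
word_after, name_after, _ = nearest(x, cleaned)
```

In this sketch the input is pulled to S₃ and reported as "Shibuya"; adding further "Hibiya" templates would not remove the mislabeled one, which is the weakness of the conventional approach described above.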

In the reference technology shown in FIG. 3, the speech input for registering a standard feature pattern is played back and output on the spot, so that the user can hear whether the output speech is what was intended for registration. This prevents erroneous standard feature patterns such as S₃ from being registered in the first place. Furthermore, as described later with reference to FIG. 5, even if the erroneous standard feature pattern S₃ has been registered by mistake, the present invention plays back and outputs the registered speech for S₃ whenever S₃ is used as a recognition candidate, so that the user can judge whether the standard feature pattern is valid. Defective standard feature patterns and the like can thus be detected and removed from the dictionary.

FIG. 3 is a block diagram of the reference technology related to the present embodiment. In the figure, 1 is a microphone, 2 an acoustic analysis section, 3 a pattern extraction section, 4 a switching section, 5 an audio reproduction section, 6 a speaker, 7 an input pattern buffer, 8 a selection key, 9 an error indication section, 10 a pattern addition section, 11 a registration rejection section, 12 a dictionary, and 13 a matching determination section.

The speech signal input from the microphone 1 is frequency-analyzed in the acoustic analysis section 2. The acoustic analysis section 2 has, for example, a bank of band-pass filters and a parameter extraction circuit. It extracts feature quantities (parameters) of the input speech, such as the moment M₁ corresponding to the first formant frequency, the moment M₂ corresponding to the second formant frequency, and the low-band and high-band power, determines sample points for these feature quantities, and obtains time-series information on the feature quantities.

The parameter time-series information obtained by the acoustic analysis section 2 is input to the pattern extraction section 3, which extracts from it an input feature pattern representing the features of the input speech. The switching section 4 switches between registration and matching of pattern information in response to a mode-switching instruction from, for example, a keyboard (not shown).

When registration mode is selected, the input feature pattern is held in the input pattern buffer 7. The audio reproduction section 5 reproduces the input speech for registration, by speech synthesis or similar means, and outputs it from the speaker 6. If, for example, "Hibiya" is uttered by mistake where "Shibuya" should have been, the user can hear the utterance before it is registered and so detect the error. The selection key 8 is used to choose whether or not to register. The error indication section 9 activates the pattern addition section 10 when the pattern is to be registered, and activates the registration rejection section 11 when it is not to be registered because of an utterance error. The pattern addition section 10 adds the input feature pattern stored in the input pattern buffer 7 to the dictionary 12 as a standard feature pattern. The registration rejection section 11 discards the input feature pattern in the input pattern buffer 7 and prompts the user to utter the word again. In recognition mode, the matching determination section 13 matches the input feature pattern against the standard feature patterns in the dictionary and outputs the recognition result.
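The registration-mode flow of FIG. 3 can be outlined in code as follows. This is a rough sketch, not the circuit of the patent: the function name and the `play` and `confirm` callbacks are hypothetical stand-ins for the audio reproduction section 5 and the selection key 8, and the dictionary is modeled as a plain mapping from item to a list of patterns.

```python
def register_with_confirmation(item, pattern, audio, dictionary, play, confirm):
    """Play back the utterance, then add or reject it according to the user's choice."""
    buffer = pattern                  # held in the input pattern buffer 7
    play(audio)                       # audio reproduction section 5 -> speaker 6
    if confirm():                     # user's decision via the selection key 8
        # Pattern addition section 10: add as a standard feature pattern.
        dictionary.setdefault(item, []).append(buffer)
        return True
    # Registration rejection section 11: discard; the caller should
    # prompt the user to utter the word again.
    return False

dictionary = {}
played = []   # records what would have been sent to the speaker

# A correct utterance is confirmed and registered.
ok = register_with_confirmation("Shibuya", (1.0, 1.0), b"shibuya", dictionary,
                                play=played.append, confirm=lambda: True)
# A wrong utterance ("Hibiya" spoken for "Shibuya") is heard and rejected.
bad = register_with_confirmation("Shibuya", (8.0, 8.0), b"hibiya", dictionary,
                                 play=played.append, confirm=lambda: False)
```

Passing playback and confirmation in as callbacks keeps the sketch independent of any particular audio device or key input.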

In FIG. 3, the pattern addition section 10 may alternatively register the pattern unconditionally, with the registration rejection section 11 cancelling that registration later on an instruction from the error indication section 9.

The audio reproduction section 5 can reproduce the input speech and output it to the speaker 6 as shown, for example, in FIG. 4. In the case of FIG. 4(a), speech is output by synthesizing it from the feature parameter time series obtained by acoustic analysis of the registered speech. This speech synthesis can be realized with well-known techniques and is not described further here.

In the case of FIG. 4(b), the digital speech obtained by analog-to-digital conversion of the registered speech is stored as it is in a speech buffer, and the audio reproduction section performs digital-to-analog conversion and outputs it. In the case of FIG. 4(c), the digital speech obtained by analog-to-digital conversion of the registered speech is speech-coded and held in the speech buffer, then decoded and digital-to-analog converted to reproduce the speech.

FIG. 5 shows the configuration of an embodiment of the present invention, and FIG. 6 is an explanatory diagram of the audio reproduction methods of the embodiment shown in FIG. 5. In the figures, reference numerals 1 to 6, 12, and 13 correspond to those in FIG. 3; 20 is a pattern and voice registration section, 21 a misrecognition indication key, 22 a registration deletion instruction key, 23 a misrecognition indication section, 24 a registration deletion section, and 25 a voice information storage section.

The technology shown in FIG. 3 plays back and outputs the registered speech at registration time. The embodiment shown in FIG. 5 instead plays back and outputs the registered speech for the selected standard feature pattern at recognition time, either always or when a recognition error occurs. Erroneous standard feature patterns can therefore be detected and re-registered even after registration.

The acoustic analysis section 2, the pattern extraction section 3, and the switching section 4 are the same as described for FIG. 3. In registration mode, the pattern and voice registration section 20 registers the input feature pattern of the registered speech in the dictionary 12 and, together with it, registers the voice information described later with reference to FIG. 6 in the voice information storage section 25 of the dictionary 12, in correspondence with the standard feature pattern.

In recognition mode, the matching determination section 13 computes the distance between the input feature pattern and each standard feature pattern, performs matching, and outputs the item at the smallest distance as the recognition result. If the recognition result is wrong, the user presses the misrecognition indication key 21. The misrecognition indication section 23 detects the key press and notifies the processing section of the dictionary 12 or the audio reproduction section 5 of the recognition error. The audio reproduction section 5 reads from the voice information storage section 25 the voice information corresponding to the first-ranked recognition candidate, or to the standard feature patterns within a predetermined range or within a range requested by the user, reproduces the speech, and outputs it from the speaker 6.

Misrecognition generally has two causes: the unknown input speech is unclear, or a standard feature pattern is unsuitable. With the present invention, the speech output by the audio reproduction section 5 makes it possible to check the validity of the standard feature pattern selected during recognition. If the standard feature pattern is not valid, the user presses the registration deletion instruction key 22, whereupon the registration deletion section 24 deletes that standard feature pattern and its voice information from the dictionary 12. If necessary, the user then switches to registration mode and re-registers a correct standard feature pattern. As noted above, the audio reproduction section 5 may reproduce and output speech only when a recognition error has occurred, or it may output speech in accordance with each recognition result output by the matching determination section 13, whether or not a recognition error has occurred.

Next, the audio reproduction processing is described with reference to FIG. 6. As shown in FIG. 6(a), for example, the feature parameter time series obtained by acoustic analysis of the registered speech is stored in the dictionary 12 at registration time; the audio reproduction section 5 reads out this time series, synthesizes speech from it, and outputs it to the speaker 6. Alternatively, as shown in FIG. 6(b), the digital speech obtained by analog-to-digital conversion of the registered speech may be stored unchanged in the dictionary 12, then read out at recognition time and digital-to-analog converted to reproduce the speech. Further, as shown in FIG. 6(c), to reduce the storage area of the dictionary 12, the registered speech may be speech-coded after analog-to-digital conversion, the coded information stored in the dictionary 12, and the speech decoded, reproduced, and output at recognition time.
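The storage saving of form (c) over form (b) can be illustrated with a simple codec. The patent does not name a coding scheme, so mu-law companding to 8 bits is used here purely as an assumed example; the function names, the 8 kHz sampling rate, and the test tone are likewise hypothetical.

```python
import math
import struct

MU = 255.0  # companding constant of the assumed mu-law scheme

def encode_mulaw(samples):
    """Compress 16-bit PCM samples to one byte each by mu-law companding."""
    out = bytearray()
    for s in samples:
        x = max(-1.0, min(1.0, s / 32768.0))
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        out.append(round((y + 1.0) / 2.0 * 255.0))   # map [-1, 1] to 0..255
    return bytes(out)

def decode_mulaw(data):
    """Expand mu-law bytes back to approximate 16-bit PCM samples."""
    samples = []
    for code in data:
        y = code / 255.0 * 2.0 - 1.0
        x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
        samples.append(int(round(x * 32767.0)))
    return samples

# Stand-in for A/D-converted registered speech: 0.1 s of a 440 Hz tone at 8 kHz.
pcm = [int(20000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]

raw = struct.pack("<%dh" % len(pcm), *pcm)   # form (b): 2 bytes per sample
coded = encode_mulaw(pcm)                    # form (c): 1 byte per sample
restored = decode_mulaw(coded)               # decoded at recognition time for playback
max_error = max(abs(a - b) for a, b in zip(pcm, restored))
```

In this sketch the coded form needs half the storage of the raw form at the cost of a small reconstruction error, which matches the trade-off the description attributes to form (c). Form (a), storing the feature parameter time series and resynthesizing, would need a speech synthesizer and is not sketched here.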

(E) Effects of the Invention

As described above, the present invention makes it easy to delete the registration of defective feature patterns, such as noise-added patterns and unclear-utterance patterns, and of erroneous standard feature patterns caused by wrong utterances, so that the quality of the dictionary can be improved and a good recognition rate obtained.

[Brief explanation of the drawing]

FIG. 1 is a diagram explaining the relationship between the distribution of speech patterns and standard feature patterns; FIG. 2 is a diagram outlining the processing according to the present invention; FIG. 3 shows an example of reference technology related to the present embodiment; FIG. 4 is an explanatory diagram of audio reproduction in the arrangement shown in FIG. 3; FIG. 5 shows the configuration of an embodiment of the present invention; and FIG. 6 is an explanatory diagram of the audio reproduction methods of the embodiment shown in FIG. 5.

In the figures, 3 is the pattern extraction section, 5 the audio reproduction section, 9 the error indication section, 11 the registration rejection section, 12 the dictionary, 13 the matching determination section, 23 the misrecognition indication section, 24 the registration deletion section, and 25 the voice information storage section.

Claims

[Claims]

1. A speech standard feature pattern creation processing device in a speech recognition system that performs speech recognition by matching an input feature pattern, obtained by acoustic analysis of unknown input speech, against standard feature patterns stored in advance for each item in a dictionary, characterized in that the dictionary comprises a voice information storage section that stores voice information corresponding to the standard feature patterns, and the device comprises: an audio reproduction section that, during speech recognition or when a recognition error occurs, reproduces and outputs speech based on the corresponding voice information in the dictionary; and a registration deletion section that, in response to a registration deletion instruction, deletes from the dictionary the registration of the standard feature pattern selected in that recognition.

2. The speech standard feature pattern creation processing device according to claim 1, characterized in that the voice information stored in the dictionary is a feature parameter time series extracted from the registered speech, digital information obtained by A/D conversion of the registered speech, or speech-coded information thereof.