JPH04178699A

JPH04178699A - Method and device for speech recognition

Info

Publication number: JPH04178699A
Application number: JP2307607A
Authority: JP
Inventors: Toshiki Kawamoto; 河本　俊毅
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-11-14
Filing date: 1990-11-14
Publication date: 1992-06-25

Abstract

PURPOSE: To prevent an abnormal standard pattern for recognition from being superimposed on a normal standard pattern during standard pattern registration, by providing a feature extraction section, a standard pattern creation section, a memory section for standard patterns for recognition, a memory section for standard patterns for synthesis, and a speech synthesis section. CONSTITUTION: From each speech input, the feature extraction section 2 extracts the feature quantities for recognition and for synthesis and determines the speech section. The feature quantities in that section are sent to the standard pattern creation section 3, which creates the standard patterns for recognition and for synthesis; these are stored in the respective memory sections 4 and 5. The speaker utters each word three times. Only when the frame length, number of silent sections, etc. of just one of the three utterances differ from those of the other two by more than predetermined values is the standard pattern for synthesis created from that utterance synthesized by the speech synthesis section 7 and output. The speaker is made to judge whether the pattern is correct, and the standard patterns are registered.

Description

[Detailed description of the invention]

[Technical field]

The present invention relates to a speech recognition method and device, and more particularly to a specific-speaker speech recognition method for registering more accurate standard patterns and a device for the same.

[Prior art]

Some conventional speech recognition devices, when registering a standard pattern, register one predetermined utterance out of three or more utterances as the standard pattern for synthesis. Others always output a synthesized sound each time a registration utterance is spoken, so that the speaker can judge whether the utterance was correct.

However, in a speech recognition device that registers a predetermined utterance out of three or more, for example the second utterance, as the standard pattern for synthesis, the first and third utterances cannot be checked. Consequently, when the three speech patterns are superimposed to create the standard pattern for recognition, the first or third speech pattern is used even if it is defective, and a normal standard pattern can no longer be created.

In a speech recognition device that outputs a synthesized sound for every utterance during registration, the speaker must judge after every utterance whether it was input correctly, which places a heavy burden on the speaker.

[Object]

The present invention was made in view of the circumstances described above. Its object, particularly in specific-speaker speech recognition, is to prevent an abnormal standard pattern for recognition, caused by a speech-section detection error or the like, from being superimposed on a normal standard pattern during standard pattern registration, and further to relieve the speaker of the burden of judging after every utterance whether it was normal.

[Constitution]

To achieve the above object, the present invention provides the following method and device.

(1) A specific-speaker speech recognition method in which feature quantities such as spectral information are extracted from speech input from a microphone; a standard pattern for recognition and a standard pattern for synthesis are created from the extracted feature quantities; the created standard patterns for recognition are stored, and the created standard patterns for synthesis are stored in correspondence with the respective standard patterns for recognition; at the time of speech input, an input pattern created from the extracted feature quantities for recognition is matched against the previously stored standard patterns for recognition to recognize which standard pattern the input speech corresponds to; as the recognition result, the standard pattern for synthesis corresponding to the first-candidate standard pattern for recognition is read out from the storage unit for standard patterns for synthesis and output as synthesized sound; and each word is uttered three or more times during standard pattern registration. The method is characterized in that the frame length, number of silent sections, and the like of each utterance are measured during standard pattern registration and, only when the measured frame length, number of silent sections, and the like of just one of the three utterances differ from those of the other two utterances by more than a predetermined value, the standard pattern for synthesis created from that one utterance is synthesized and output, the speaker is made to judge whether that pattern is correct, and the standard pattern is registered.

(2) A specific-speaker speech recognition device comprising: a feature extraction unit that extracts feature quantities such as spectral information from speech input from a microphone; a standard pattern creation unit that creates a standard pattern for recognition and a standard pattern for synthesis from the feature quantities obtained by the feature extraction unit; a recognition standard pattern storage unit that stores the created standard patterns for recognition; a synthesis standard pattern storage unit that stores the created standard patterns for synthesis in correspondence with the respective standard patterns for recognition; a pattern matching unit that, at the time of speech input, matches an input pattern created from the feature quantities for recognition extracted by the feature extraction unit against the standard patterns for recognition previously stored in the recognition standard pattern storage unit, and recognizes which standard pattern the input speech corresponds to; and a speech synthesis unit that, as the recognition result, reads the standard pattern for synthesis corresponding to the first-candidate standard pattern for recognition out of the synthesis standard pattern storage unit and outputs it as synthesized sound; each word being uttered three or more times during standard pattern registration. The device is characterized in that the frame length, number of silent sections, and the like of each utterance are measured during standard pattern registration and, only when the measured frame length, number of silent sections, and the like of just one of the three utterances differ from those of the other two utterances by more than a predetermined value, the standard pattern for synthesis created from that one utterance is synthesized by the speech synthesis unit and output, the speaker is made to judge whether that pattern is correct, and the standard pattern is registered.
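The criterion shared by (1) and (2) can be illustrated with a short sketch. The function name `sole_outlier`, the representation of each utterance as a tuple of (frame length, number of silent sections), the threshold values, and the choice to treat two utterances as different when any one measurement exceeds its threshold are all assumptions made for illustration; the patent text does not specify them.

```python
def sole_outlier(measures, thresholds):
    """Return the index (0-2) of the one utterance that differs from the
    other two, or None if no single utterance stands out.

    measures:   three tuples, e.g. (frame_length, silent_sections)
    thresholds: one predetermined value per measurement (assumed values)
    """
    def far(a, b):
        # Assumption: two utterances differ if ANY measurement differs
        # by more than its threshold.
        return any(abs(p - q) > t for p, q, t in zip(a, b, thresholds))

    for i in range(3):
        j, k = [n for n in range(3) if n != i]
        if (far(measures[i], measures[j])
                and far(measures[i], measures[k])
                and not far(measures[j], measures[k])):
            return i
    return None


# Third utterance is much longer and has more silent sections.
print(sole_outlier([(50, 1), (52, 1), (80, 3)], thresholds=(5, 1)))
```

Only when this function returns an index would the synthesized sound be played back for confirmation; when it returns `None`, the speaker is not asked to judge anything.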

The present invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. As shown, the device comprises: a feature extraction unit 2 that extracts feature quantities such as spectral information from speech input from a microphone 1; a standard pattern creation unit 3 that creates a standard pattern for recognition and a standard pattern for synthesis from the feature quantities obtained by the feature extraction unit 2; a recognition standard pattern storage unit 4 that stores the created standard patterns for recognition; a synthesis standard pattern storage unit 5 that stores the created standard patterns for synthesis in correspondence with the respective standard patterns for recognition; a pattern matching unit 6 that, at the time of speech input, matches an input pattern created from the feature quantities for recognition extracted by the feature extraction unit 2 against the standard patterns for recognition previously stored in the recognition standard pattern storage unit, and recognizes which standard pattern the input speech corresponds to; a speech synthesis unit 7 that, as the recognition result, reads the standard pattern for synthesis corresponding to the first-candidate standard pattern for recognition out of the synthesis standard pattern storage unit and outputs it as synthesized sound; an output unit 8, such as a speaker, that outputs recognition results and the like; a keyboard 9 with keys that the speaker uses to start registration and recognition; a control unit 10 that controls each of these blocks; a display unit 11 that gives operating instructions to the operator (speaker); and so on.
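The relationship between the two storage units and the recognition path can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `PatternStores`, the use of fixed-length feature vectors, and Euclidean distance as the matching measure are assumptions, since the text does not state how the pattern matching unit 6 compares patterns.

```python
import math


class PatternStores:
    """Hypothetical stand-in for storage units 4 and 5 plus matching unit 6."""

    def __init__(self):
        self.recognition = {}  # word number -> standard pattern for recognition
        self.synthesis = {}    # word number -> standard pattern for synthesis

    def register(self, word_no, rec_pattern, syn_pattern):
        # Both patterns are stored under the same word number, so each
        # synthesis pattern corresponds to one recognition pattern.
        self.recognition[word_no] = rec_pattern
        self.synthesis[word_no] = syn_pattern

    def recognize(self, input_pattern):
        # Assumption: the first candidate is the stored pattern at the
        # smallest Euclidean distance from the input pattern.
        first = min(
            self.recognition,
            key=lambda w: math.dist(self.recognition[w], input_pattern),
        )
        # The synthesis pattern of the first candidate is what would be
        # handed to the speech synthesis unit 7.
        return first, self.synthesis[first]


stores = PatternStores()
stores.register(1, [0.0, 0.0], "synthesis pattern for word 1")
stores.register(2, [1.0, 1.0], "synthesis pattern for word 2")
print(stores.recognize([0.9, 1.2]))
```

Keying both dictionaries by word number mirrors the description's statement that patterns are stored together with the word number.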

Next, the operation of the invention during standard pattern registration with the above configuration is described, taking as an example the case where each word is uttered three times for registration. To register a standard pattern, the speaker first presses the registration key on the keyboard 9 and specifies the word number of the word to be registered.

When the control unit 10 detects that the registration key on the keyboard 9 has been pressed, it sets the switch 12, which selects between the registration operation and the recognition operation, to the registration side. The speaker inputs the first utterance of the word into the microphone 1; the feature extraction unit 2 extracts the feature quantities for recognition and for synthesis from the input speech and determines the speech section. The feature quantities within the determined speech section are sent to the standard pattern creation unit 3, which creates a standard pattern for recognition and a standard pattern for synthesis. When this processing is finished, the standard pattern creation unit 3 sends a processing-end signal to the control unit 10; after confirming that the operation has finished, the control unit 10 outputs on the display unit 11 a message prompting the speaker for the second utterance.

The speaker then inputs the second utterance into the microphone 1 and, as with the first utterance, the feature extraction unit 2 extracts the feature quantities for recognition and for synthesis from the input speech and determines the speech section. The feature quantities within the determined speech section are sent to the standard pattern creation unit 3. The standard pattern creation unit 3 creates a standard pattern for recognition from the feature quantities for recognition, compares the frame length, number of silent sections, and the like of the standard pattern from the first utterance with those of the standard pattern from the second utterance, and compares each difference with its predetermined threshold x, y, and so on. If the differences are smaller than the thresholds, the first and second standard patterns for recognition are superimposed to create a new standard pattern for recognition. As for the standard pattern for synthesis, only the one created from the first utterance is kept, and no standard pattern for synthesis is created from the synthesis feature quantities of the second utterance.

If, on the other hand, the difference is larger than the predetermined threshold, the first and second standard patterns for recognition are not superimposed, and both are stored as they are.
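The second-utterance step described in the surrounding paragraphs can be sketched as a small decision function. The names `differs` and `second_utterance_step`, the threshold values standing in for x and y, and the handling of a difference exactly equal to the threshold (treated here as similar) are assumptions; the text only covers the smaller-than and larger-than cases.

```python
def differs(a, b, thresholds):
    """True if any measurement differs by more than its threshold."""
    return any(abs(p - q) > t for p, q, t in zip(a, b, thresholds))


def second_utterance_step(m1, m2, thresholds=(5, 1)):
    """Decide what to keep after the second utterance.

    m1, m2: (frame_length, silent_sections) of utterances 1 and 2.
    Returns (recognition_groups, synthesis_kept):
      recognition_groups - utterance numbers superimposed into each
                           standard pattern for recognition
      synthesis_kept     - utterances whose synthesis pattern is kept
    """
    if differs(m1, m2, thresholds):
        # Not superimposed: both recognition patterns and both
        # synthesis patterns are stored as they are.
        return [(1,), (2,)], [1, 2]
    # Superimposed: one recognition pattern, first synthesis pattern only.
    return [(1, 2)], [1]


print(second_utterance_step((50, 1), (52, 1)))
print(second_utterance_step((50, 1), (80, 3)))
```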

As for the standard pattern for synthesis, one is created from the synthesis feature quantities of the second utterance and, likewise, both the first and second standard patterns for synthesis are stored as they are. When the processing up to this point is finished, the standard pattern creation unit 3 sends a processing-end signal to the control unit 10, and the control unit 10 causes the display unit 11 to output a message prompting the speaker for the third utterance. The speaker then inputs the third utterance into the microphone 1 and, as with the first utterance, the feature extraction unit 2 extracts the feature quantities for recognition and for synthesis from the input speech and determines the speech section. The feature quantities within the determined speech section are sent to the standard pattern creation unit 3.

The standard pattern creation unit 3 creates a standard pattern for recognition from the feature quantities for recognition and checks how many standard patterns for recognition have been created so far. If there is one, it compares the number of frames, number of silent sections, and the like of that pattern with those of the standard pattern for recognition from the third utterance, and compares each difference with its predetermined threshold x, y, and so on. If the differences are smaller than the thresholds, the existing standard pattern for recognition and the one from the third utterance are superimposed to create a new standard pattern for recognition; as for the standard pattern for synthesis, only the one created from the first utterance is kept, and none is created from the synthesis feature quantities of the third utterance. If, on the other hand, the difference is larger than the predetermined threshold, the existing standard pattern for recognition and the one from the third utterance are not superimposed, and both are stored as they are. A standard pattern for synthesis is created from the synthesis feature quantities of the third utterance and, likewise, both the first standard pattern for synthesis and the third are stored as they are.

If two standard patterns for recognition have been created so far, the number of frames, number of silent sections, and the like of the standard pattern for recognition from the third utterance are compared with those of the patterns from the first and second utterances, and the pattern with the smaller difference is selected. The selected pattern and the pattern from the third utterance are superimposed to create a new standard pattern for recognition. No standard pattern for synthesis is created from the third utterance; the existing standard patterns for synthesis are left as they are.

When the processing up to this point is finished, the standard pattern creation unit 3 sends a processing-end signal to the control unit 10. On receiving this signal, the control unit 10 checks the number of standard patterns for recognition in the standard pattern creation unit 3. If there is one, it stores that standard pattern for recognition in the recognition standard pattern storage unit 4 and the standard pattern for synthesis in the synthesis standard pattern storage unit 5, each together with the word number, and ends the standard pattern registration process for that word.
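The third-utterance step has two cases, depending on whether one or two standard patterns for recognition exist. The sketch below is illustrative only. `RecPattern`, `superimpose`, and `third_utterance_step` are hypothetical names; the text does not say how patterns are superimposed, so the measurements are simply averaged here; and "the pattern with the smaller difference" is resolved by comparing the frame difference first and the silent-section difference second, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecPattern:
    frames: float    # number of frames
    silences: float  # number of silent sections
    sources: tuple   # utterance numbers superimposed into this pattern


def diffs(a, b):
    return (abs(a.frames - b.frames), abs(a.silences - b.silences))


def superimpose(a, b):
    # Stand-in for superimposition: average weighted by utterance count.
    na, nb = len(a.sources), len(b.sources)
    return RecPattern(
        (a.frames * na + b.frames * nb) / (na + nb),
        (a.silences * na + b.silences * nb) / (na + nb),
        a.sources + b.sources,
    )


def third_utterance_step(existing, p3, synth_kept, thresholds=(5, 1)):
    """existing: one or two recognition patterns created so far.
    synth_kept: utterance numbers whose synthesis pattern is stored."""
    if len(existing) == 1:
        only = existing[0]
        if any(d > t for d, t in zip(diffs(only, p3), thresholds)):
            # Keep both recognition patterns; also keep synthesis pattern 3.
            return [only, p3], synth_kept + [3]
        return [superimpose(only, p3)], synth_kept
    # Two patterns: superimpose the third onto whichever is nearer.
    nearer = min(existing, key=lambda p: diffs(p, p3))
    other = next(p for p in existing if p is not nearer)
    return [superimpose(nearer, p3), other], synth_kept


p1 = RecPattern(50, 1, (1,))
p2 = RecPattern(80, 3, (2,))
p3 = RecPattern(52, 1, (3,))
recs, synth = third_utterance_step([p1, p2], p3, [1, 2])
print([r.sources for r in recs], synth)
```

If the result contains two recognition patterns, the one built from a single utterance is the candidate that the speaker is later asked to confirm.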

If, on the other hand, there are two standard patterns for recognition in the standard pattern creation unit 3, the control unit 10 detects which utterance produced the pattern that has not been superimposed, sends the standard pattern for synthesis created from that utterance to the speech synthesis unit 7, and outputs it as synthesized sound from the speaker 8. This standard pattern comes from the one utterance out of three whose frame length or number of silent sections differs. Speech-section detection may therefore have been inaccurate: extra noise may be attached, or the beginning or end of the word may be missing. The pattern is output as synthesized sound so that the speaker can judge whether it is abnormal. If the speaker hears the synthesized sound and judges the standard pattern to be abnormal, the speaker presses the cancel key on the keyboard 9; otherwise the speaker does nothing.

After receiving from the speech synthesis unit 7 a signal indicating that output of the synthesized sound has finished, the control unit 10 monitors for a fixed time whether the cancel key on the keyboard 9 is pressed. If it is pressed, the control unit 10 has the standard pattern creation unit 3 erase the abnormal standard pattern for recognition and standard pattern for synthesis. It then has the display unit 11 output a message prompting the speaker to utter the word once more, and the speaker inputs the same utterance once more. If the cancel key is not pressed within the fixed time, the other two utterances are regarded as abnormal: the control unit 10 has the standard pattern creation unit 3 erase the superimposed standard pattern for recognition and the corresponding standard pattern for synthesis, then has the display unit 11 output a message prompting the speaker to utter the word two more times, and the speaker inputs the same utterance two more times.

Once the standard patterns for recognition from three utterances have been superimposed, the resulting standard pattern for recognition is stored in the recognition standard pattern storage unit 4 and the standard pattern for synthesis in the synthesis standard pattern storage unit 5, together with the word number. This completes the description of standard pattern registration for one word. The speaker repeats the above operation for as many words as are to be registered.
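The confirmation step reduces to one decision driven by the cancel key. In the sketch below, `confirm_outlier` and its return format are hypothetical, and the boolean `cancel_pressed` stands for whether the cancel key was pressed during the control unit's fixed monitoring period; the timing itself is not modeled.

```python
def confirm_outlier(groups, cancel_pressed):
    """Decide what to erase after the outlier has been played back.

    groups: utterance numbers behind the two recognition patterns,
            e.g. [(1, 3), (2,)] when utterance 2 was not superimposed.
    cancel_pressed: True if the speaker pressed the cancel key in time.
    """
    outlier = next(g for g in groups if len(g) == 1)
    merged = next(g for g in groups if len(g) > 1)
    if cancel_pressed:
        # The speaker judged the played-back pattern abnormal:
        # erase it and ask for one more utterance.
        return {"played": outlier[0], "erase": outlier, "re_utter": 1}
    # No cancel: the other two utterances are regarded as abnormal,
    # so the superimposed pattern is erased and two utterances are needed.
    return {"played": outlier[0], "erase": merged, "re_utter": 2}


print(confirm_outlier([(1, 3), (2,)], cancel_pressed=True))
print(confirm_outlier([(1, 3), (2,)], cancel_pressed=False))
```

Note the asymmetry: doing nothing is not a neutral choice, because silence from the speaker causes the two mutually consistent utterances to be discarded.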

[Effect]

As is clear from the above description, according to the present invention, in specific-speaker speech recognition, an abnormal standard pattern for recognition caused by a speech-section detection error or the like can be prevented from being superimposed on a normal standard pattern during standard pattern registration, and the speaker is also relieved of the burden of judging after every utterance whether it was normal.

[Brief explanation of the drawing]

FIG. 1 is a block diagram for explaining one embodiment of the present invention.

1: microphone; 2: feature extraction unit; 3: standard pattern creation unit; 4: recognition standard pattern storage unit; 5: synthesis standard pattern storage unit; 6: pattern matching unit; 7: speech synthesis unit; 8: speaker; 9: keyboard; 10: control unit; 11: display unit.

Claims

[Claims]

1. A speech recognition method, being a specific-speaker speech recognition method in which feature quantities such as spectral information are extracted from speech input from a microphone; a standard pattern for recognition and a standard pattern for synthesis are created from the extracted feature quantities; the created standard patterns for recognition are stored, and the created standard patterns for synthesis are stored in correspondence with the respective standard patterns for recognition; at the time of speech input, an input pattern created from the extracted feature quantities for recognition is matched against the previously stored standard patterns for recognition to recognize which standard pattern the input speech corresponds to; as the recognition result, the standard pattern for synthesis corresponding to the first-candidate standard pattern for recognition is read out from the storage unit for standard patterns for synthesis and output as synthesized sound; and each word is uttered three or more times during standard pattern registration; characterized in that the frame length, number of silent sections, and the like of each utterance are measured during standard pattern registration and, only when the measured frame length, number of silent sections, and the like of just one of the three utterances differ from those of the other two utterances by more than a predetermined value, the standard pattern for synthesis created from that one utterance is synthesized and output, the speaker is made to judge whether that pattern is correct, and the standard pattern is registered.

2. A speech recognition device, being a specific-speaker speech recognition device comprising: a feature extraction unit that extracts feature quantities such as spectral information from speech input from a microphone; a standard pattern creation unit that creates a standard pattern for recognition and a standard pattern for synthesis from the feature quantities obtained by the feature extraction unit; a recognition standard pattern storage unit that stores the created standard patterns for recognition; a synthesis standard pattern storage unit that stores the created standard patterns for synthesis in correspondence with the respective standard patterns for recognition; a pattern matching unit that, at the time of speech input, matches an input pattern created from the feature quantities for recognition extracted by the feature extraction unit against the standard patterns for recognition previously stored in the recognition standard pattern storage unit, and recognizes which standard pattern the input speech corresponds to; and a speech synthesis unit that, as the recognition result, reads the standard pattern for synthesis corresponding to the first-candidate standard pattern for recognition out of the synthesis standard pattern storage unit and outputs it as synthesized sound; each word being uttered three or more times during standard pattern registration; characterized in that the frame length, number of silent sections, and the like of each utterance are measured during standard pattern registration and, only when the measured frame length, number of silent sections, and the like of just one of the three utterances differ from those of the other two utterances by more than a predetermined value, the standard pattern for synthesis created from that one utterance is synthesized by the speech synthesis unit and output, the speaker is made to judge whether that pattern is correct, and the standard pattern is registered.