JP3100208B2

JP3100208B2 - Voice recognition device

Info

Publication number: JP3100208B2
Application number: JP03337842A
Authority: JP
Inventors: Tetsuya Muroi (室井哲也)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-11-27
Filing date: 1991-11-27
Publication date: 2000-10-16
Anticipated expiration: 2015-10-16
Also published as: JPH05150798A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device for recognizing input speech.

[0002]

[Prior Art] In a typical speech recognition device, the feature pattern of the input speech is collated with various standard patterns registered in advance in a dictionary or the like. The standard patterns that resemble the feature pattern of the input speech are selected as candidates (recognition results), and the first candidate, which has the highest similarity among them, is basically chosen as the final recognition result.

Conventional devices go further. Even when a first candidate with the highest similarity is obtained, the device either outputs the final result as a rejection or asks the user to confirm the result and outputs only a confirmed result as correct. The rejection is used, for example, when the recognition reliability of that candidate is low. These functions prevent misrecognition when an acoustic signal other than the speech to be recognized is input or when the input speech is unstable.

[0003]

[Problem to Be Solved by the Invention] As described above, a conventional speech recognition device can effectively prevent misrecognition through functions such as rejection. These functions act when, for example, the recognition reliability of the obtained candidate is lower than a predetermined threshold. On the other hand, the same functions can prevent a correct recognition result from being obtained immediately.

For example, when the first utterance is rejected, the user may assume that it was not recognized because it was unclear. The user may then repeat the word slowly, lowering the speaking rate to make recognition easier. In this case, however, the input speech pattern becomes even longer relative to the duration of the standard pattern. The recognition reliability therefore drops further, and, contrary to the user's intention, the speech becomes still harder to recognize.

[0004] The present invention solves these conventional drawbacks. Its object is to provide a speech recognition device that readily obtains a correct recognition result for a repeated word when the user repeats an utterance whose recognition result was not adopted, so that the correct result is obtained early.

[0005]

[Means for Solving the Problem] To achieve the above object, the invention according to claim 1 comprises:

- speech input means for inputting speech;
- recognition means for recognizing the input speech using duration control; and
- determination means for determining whether to adopt the recognition result obtained by the recognition means.

When the determination means does not adopt the recognition result, the recognition means sets the duration of the standard pattern to a larger value and then recognizes the next input speech.

[0006] In the invention according to claim 2, when the determination means does not adopt the recognition result, the recognition means relaxes the duration constraint and then recognizes the next input speech.

[0007] In the invention according to claim 3, when the determination means does not adopt the recognition result, the recognition means increases the duration of the standard pattern corresponding to the end of the input speech and then recognizes the next input speech.

[0008] In the invention according to claim 4, when the determination means does not adopt the recognition result, the recognition means relaxes the duration-control constraint for the end of the input speech and performs recognition.

[0009] In the invention according to claim 5, when the determination means does not adopt the recognition result, the recognition means increases the duration of the standard pattern corresponding to the last vowel of the input speech and then recognizes the next input speech.

[0010] In the invention according to claim 6, when the determination means does not adopt the recognition result, the recognition means relaxes the duration-control constraint corresponding to the last vowel of the input speech and then recognizes the next input speech.

[0011]

[Operation] In the present invention, when the first utterance cannot be recognized, the next input speech is recognized after one of the following changes:

- the durations of the standard pattern are set larger;
- the duration constraint is relaxed;
- the duration of the standard pattern corresponding to the end of the input speech is increased;
- the duration-control constraint for the end of the input speech is relaxed;
- the duration of the standard pattern corresponding to the last vowel of the input speech is increased; or
- the duration-control constraint corresponding to the last vowel of the input speech is relaxed.

[0012]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of a speech recognition device according to the present invention. The device of FIG. 1 has three units:

- a speech input unit 1 for inputting speech;
- a recognition unit 2 that collates the feature pattern of the input speech with various standard patterns registered in advance, and selects the standard patterns similar to that feature pattern as candidates (recognition results); and
- a determination unit 3 that determines whether to adopt a candidate as the final recognition result.

Determination unit 3 basically selects the candidate with the highest similarity, but it also takes that candidate's recognition reliability into account before adopting it.
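The cooperation of units 1 to 3 can be pictured as a retry loop. What follows is a minimal Python sketch only, built on several assumptions:

- `get_utterance` and `recognize` are hypothetical callables standing in for speech input unit 1 and recognition unit 2.
- The scores are treated as distances, so smaller is better.
- Determination unit 3 is modeled with the difference-based reliability S = R(2) - R(1) described in [0019] and [0020].
- The `relaxed` flag is a placeholder for whichever duration adjustment the embodiments apply to the repeated utterance.

```python
from typing import Callable, Dict, Optional


def recognize_with_retry(
    get_utterance: Callable[[], object],
    recognize: Callable[[object, bool], Dict[str, float]],
    threshold: float,
    max_attempts: int = 3,
) -> Optional[str]:
    """Hypothetical control loop for units 1 to 3.

    recognize(utterance, relaxed) must return {word: distance} with at least
    two candidates; smaller distances are assumed to be better.
    """
    relaxed = False  # normal duration control on the first attempt
    for _ in range(max_attempts):
        utterance = get_utterance()              # unit 1: speech input
        scores = recognize(utterance, relaxed)   # unit 2: candidate scores
        ranked = sorted(scores, key=scores.get)  # best (smallest distance) first
        reliability = scores[ranked[1]] - scores[ranked[0]]  # S = R(2) - R(1)
        if reliability > threshold:              # unit 3: adopt the top candidate
            return ranked[0]
        # Rejected: the user is asked to repeat, and the repeated utterance is
        # recognized with lengthened durations or relaxed duration control.
        relaxed = True
    return None
```

Returning `None` after `max_attempts` is an illustrative choice. The text itself only says that the user is prompted to repeat.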

[0013] Various recognition methods can be used in recognition unit 2. In the present invention, recognition unit 2 performs duration control in order to improve the recognition rate.

[0014] Known speech recognition methods that perform duration control include the following:

- the hidden Markov model with duration control disclosed in Seiichi Nakagawa, "Speech Recognition by Stochastic Models" (確率モデルによる音声認識), IEICE, 1988, pp. 74-78; and
- the method disclosed in the present inventor's paper "Word Speech Recognition Using a Duration-Controlled State Transition Model" (継続時間制御型状態遷移モデルを用いた単語音声認識), Transactions of the IEICE, Vol. 72-D-II, November 1989.

The following description uses the duration-controlled state transition model described in the latter paper by the present inventor.

[0015] Basically, this duration-controlled state transition model computes the distance between an unknown input speech pattern and a standard pattern using the following recurrence formula.

[0016]

[Equation 1]

[0017] The symbols in Equation 1 are defined as follows.

- D(i, j) is the distance obtained when the frames from the start of the input speech up to the i-th frame are assigned to states 1 through j of the standard pattern.
- D(I, J) is the distance between the input speech and the standard pattern, where I is the final frame number of the input speech and J is the final state number of the standard pattern.
- x_i is the feature vector of the i-th frame of the input speech.
- y_j is the feature vector of the j-th state of the standard pattern.
- B(j) stores the shape of the matching path. It is the frame number at which the path first transitioned to state j. Hence i - B(j-1) in Equation 1 is the number of frames for which the input speech pattern has stayed in the j-th state.
- L(j) is the duration of the j-th state of the standard pattern.
- w_j sets the strength of the duration control. The larger w_j is, the stricter the duration constraint becomes.
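Equation 1 itself is not reproduced in this text. The following is therefore only a hypothetical Python sketch of a duration-penalized dynamic-programming match built from the definitions above. It is not the patent's recurrence. The assumptions are:

- The local distance is the Euclidean distance ||x_i - y_j||.
- A penalty w_j * (dwell - L(j))^2 is charged when the path leaves a state, and once more for the final state at the end.
- The entry frame of the current state is kept per DP cell. This plays the role of B(j).
- Indices are 0-based.

```python
import numpy as np


def duration_controlled_distance(x, y, L, w):
    """Hypothetical duration-penalized DP distance (not the patent's Equation 1).

    x: (I, d) input frames; y: (J, d) state feature vectors;
    L[j]: expected duration of state j in frames; w[j]: penalty weight.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    I, J = len(x), len(y)
    INF = float("inf")
    # local[i, j] = Euclidean distance between frame i and state j
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    D = np.full((I, J), INF)
    B = np.zeros((I, J), dtype=int)  # frame at which the best path entered state j
    D[0, 0] = local[0, 0]
    for i in range(1, I):
        for j in range(J):
            stay = D[i - 1, j]  # remain in state j
            move = INF          # enter state j from state j-1
            if j > 0 and D[i - 1, j - 1] < INF:
                dwell = i - B[i - 1, j - 1]  # frames spent in state j-1
                move = D[i - 1, j - 1] + w[j - 1] * (dwell - L[j - 1]) ** 2
            if stay <= move:
                D[i, j] = stay + local[i, j]
                B[i, j] = B[i - 1, j]
            else:
                D[i, j] = move + local[i, j]
                B[i, j] = i
    # The final state's dwell is penalized once the input ends.
    last_dwell = I - B[I - 1, J - 1]
    return D[I - 1, J - 1] + w[J - 1] * (last_dwell - L[J - 1]) ** 2
```

Keeping a single entry frame per cell makes this a greedy approximation rather than an exact optimum over all duration assignments. With L = 2 for every state, an input that holds each state for three frames incurs a penalty. With L = 3, the same input matches with distance 0, which is the effect the embodiments rely on.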

[0018] Recognition unit 2 uses duration-controlled state transition models as standard patterns, as described above. It performs pattern matching between the feature pattern of the input speech (that is, the unknown input pattern) and every standard pattern to be recognized. It takes each resulting distance as a similarity value (score) and selects candidates, in order of similarity value, up to a predetermined rank.

[0019] Various methods can also be used to compute the recognition reliability in determination unit 3. For example, the recognition reliability S of the first candidate can be taken as either the difference or the ratio between two similarity values:

- R(1), the similarity value of the first candidate, which has the largest similarity value; and
- R(2), the similarity value of the second candidate.

In the following description, determination unit 3 computes S from the difference between R(1) and R(2), as in the equation below. It adopts the first candidate as the recognition result if S is larger than a threshold TH (for example). If the candidate is not adopted, it prompts the user to speak again, that is, to repeat the utterance.

[0020]

[Equation 2] S = R(2) - R(1)
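The determination step can be sketched in a few lines of Python. The sketch reads R(k) as the k-th smallest matching distance, which keeps S = R(2) - R(1) non-negative as Equation 2 is written. That reading, the dictionary input and the name `judge` are assumptions for illustration.

```python
def judge(scores, threshold):
    """Hypothetical determination step.

    scores: {word: distance}, at least two candidates, smaller is better.
    Returns (top_word, S, adopted).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])  # smallest distance first
    (top_word, r1), (_, r2) = ranked[0], ranked[1]
    s = r2 - r1  # Equation 2: S = R(2) - R(1)
    return top_word, s, s > threshold
```

For example, `judge({"tokyo": 3.0, "kyoto": 7.5, "osaka": 9.0}, 2.0)` adopts "tokyo" with S = 4.5. The same scores with TH = 5.0 lead to a request to repeat.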

[0021] First, a first embodiment of the present invention is described. It uses the duration-controlled state transition model described above, and determination unit 3 makes the determination described above. Recognition unit 2 prepares, as the standard pattern for each word, a duration-controlled state transition model with eight states. It performs pattern matching between the input speech pattern and all standard patterns to be recognized.

Suppose determination unit 3 judges, based on Equation 2, that the recognition reliability S of the top candidate is lower than the threshold TH, and prompts the user to repeat the word. When the user does so, recognition unit 2 recognizes the repeated speech after one of the following changes:

(1) the values of all durations L(j) in Equation 1 are increased;
(2) the values of all w_j in Equation 1 are decreased;
(3) the durations L(j) for the end of the word are increased (for example, L(j) for j = 6, 7, 8 is multiplied by 1.5); or
(4) the values of w_j for the end of the word are decreased (for example, w_j for j = 6, 7, 8 is multiplied by 0.5).
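The four adjustments can be sketched as a Python function that returns modified copies of L and w. The factors 1.5 and 0.5 and the word-end states j = 6, 7, 8 come from the examples for (3) and (4). Applying the same factors to (1) and (2) is an assumption, since the text only says "increase" and "decrease". The name `adjust_for_retry` is hypothetical.

```python
def adjust_for_retry(L, w, method, end_states=(6, 7, 8), l_factor=1.5, w_factor=0.5):
    """Return adjusted copies of (L, w) for recognizing a repeated utterance.

    States are numbered 1..J as in the text. Methods:
      1: multiply every L(j) by l_factor
      2: multiply every w_j by w_factor
      3: multiply L(j) of the word-end states by l_factor
      4: multiply w_j of the word-end states by w_factor
    """
    J = len(L)
    targets = range(1, J + 1) if method in (1, 2) else end_states
    new_L, new_w = list(L), list(w)  # leave the stored standard pattern untouched
    for j in targets:
        if method in (1, 3):
            new_L[j - 1] = L[j - 1] * l_factor
        elif method in (2, 4):
            new_w[j - 1] = w[j - 1] * w_factor
        else:
            raise ValueError(f"unknown method {method}")
    return new_L, new_w
```

Returning copies keeps the original standard pattern available for the next fresh (non-repeated) utterance.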

[0022] Consider the first embodiment with method (1). When the speech could not be recognized from the first utterance, recognition unit 2 changes the standard-pattern durations L(j) to larger values before recognizing the repeated word.

Suppose the user then repeats the word more slowly than the first time, so that the value of i - B(j-1) in Equation 1 becomes large. L(j) has also been made large, so this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.

[0023] With method (2), recognition unit 2 decreases w_j to relax the duration constraint before recognizing the repeated word. Suppose the user lowers the speaking rate, so that i - B(j-1) becomes large and its difference from L(j-1) grows. Because w_j has been changed to a smaller value, this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.

[0024] With method (3), recognition unit 2 increases the standard-pattern durations L(j) for the end of the word (in the example above, the entries j = 6, 7, 8 among all L(j)) before recognizing the repeated word. Suppose the user repeats the word slowly, so that the word ending is lengthened and i - B(j-1) becomes large in the word-final portion. Because the durations L(j) for the end of the word have also been increased, this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.

[0025] With method (4), recognition unit 2 relaxes the duration-control constraint for the end of the word (in the example above, the entries j = 6, 7, 8 of w_j) before recognizing the repeated word. Suppose the user lowers the speaking rate, so that the word ending is lengthened and i - B(j-1) becomes large in the word-final portion. Because w_j for the end of the word has also been changed to a smaller value, this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.
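The arithmetic behind [0022] to [0025] can be made concrete. This Python example assumes, purely for illustration, a quadratic duration penalty w_j * (d - L(j))^2; the actual form in Equation 1 is not reproduced in this text. The values are hypothetical: one word-end state expects L(j) = 4 frames, and the slow repetition stays in it for d = 6 frames.

```python
def duration_penalty(dwell, L_j, w_j):
    # Assumed quadratic penalty for illustration; not the patent's Equation 1.
    return w_j * (dwell - L_j) ** 2


L_j, w_j = 4, 1.0  # hypothetical expected duration (frames) and weight
slow_dwell = 6     # the slower repetition stays 6 frames in this state

original = duration_penalty(slow_dwell, L_j, w_j)          # no adjustment
longer_L = duration_penalty(slow_dwell, L_j * 1.5, w_j)    # methods (1) and (3)
relaxed_w = duration_penalty(slow_dwell, L_j, w_j * 0.5)   # methods (2) and (4)
print(original, longer_L, relaxed_w)  # 4.0 0.0 2.0
```

Lengthening L(j) removes the mismatch in this instance. Halving w_j only halves its cost, but does so for any amount of slowing.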

[0026] Accordingly, with any of methods (1) to (4), the recognition reliability does not drop even when the word is repeated slowly. Instead, the reliability can be raised on the basis of the clearly articulated, slowly repeated speech, and a correct recognition result can be obtained quickly.

[0027] The above example uses only one of methods (1) to (4), but two or three of them, or all four, may be used together. When they are combined, the recognition result from the method giving the highest recognition reliability may be sent to determination unit 3. This further improves recognition accuracy.
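Selecting among combined methods can be sketched in Python. The input structure `{method_id: {word: distance}}` and the name `pick_most_reliable` are hypothetical, and reliability is again computed as S = R(2) - R(1) over distances (smaller is better). Each entry is assumed to be the result of recognizing the same repeated utterance under one adjustment method.

```python
def pick_most_reliable(results_by_method):
    """Return (method_id, top_word, S) for the run with the largest S.

    results_by_method: {method_id: {word: distance}}, each with at least
    two candidates (hypothetical structure).
    """
    best = None
    for method, scores in results_by_method.items():
        ranked = sorted(scores, key=scores.get)     # smallest distance first
        s = scores[ranked[1]] - scores[ranked[0]]   # S = R(2) - R(1)
        if best is None or s > best[2]:
            best = (method, ranked[0], s)
    return best
```

Only the winning result is forwarded. Determination unit 3 then applies its usual threshold test to that S.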

[0028] Next, a second embodiment of the present invention is described. Recognition unit 2 prepares, as phoneme standard patterns, duration-controlled state transition models with one to four states per phoneme. It concatenates these phoneme standard patterns to form the pattern for each word.

Suppose determination unit 3 judges, based on Equation 2, that the recognition reliability of the top candidate is lower than the threshold TH, and prompts the user to repeat the word. When the user does so, recognition unit 2 recognizes the repeated speech after one of the following changes:

(5) the durations L(j) of the standard pattern corresponding to the final vowel of the word are increased; or
(6) the values of w_j of the standard pattern corresponding to the final vowel of the word are decreased.
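The second embodiment concatenates phoneme models and then adjusts only the states of the word's final vowel. A Python sketch follows, with these assumptions:

- The phoneme inventory, the per-phoneme (L, w) values and the vowel set /a i u e o/ (romanized Japanese) are hypothetical.
- The factors 1.5 and 0.5 are borrowed from the first embodiment's example.
- The names `build_word_model` and `adjust_final_vowel` are illustrative.

```python
VOWELS = {"a", "i", "u", "e", "o"}  # assumed romanized Japanese vowel labels


def build_word_model(phonemes, phoneme_models):
    """Concatenate per-phoneme states (1 to 4 each) into one word model.

    phoneme_models: {phoneme: [(L, w), ...]}, a hypothetical inventory.
    Returns lists L, w and, per state, the index of the phoneme it belongs to.
    """
    L, w, owner = [], [], []
    for k, p in enumerate(phonemes):
        for dur, weight in phoneme_models[p]:
            L.append(dur)
            w.append(weight)
            owner.append(k)
    return L, w, owner


def adjust_final_vowel(phonemes, L, w, owner, method, l_factor=1.5, w_factor=0.5):
    """Method 5 lengthens L of the final vowel's states; method 6 shrinks its w."""
    # Index of the last vowel phoneme (raises ValueError if the word has none).
    last_vowel = max(k for k, p in enumerate(phonemes) if p in VOWELS)
    new_L, new_w = list(L), list(w)
    for s, k in enumerate(owner):
        if k != last_vowel:
            continue
        if method == 5:
            new_L[s] = L[s] * l_factor
        elif method == 6:
            new_w[s] = w[s] * w_factor
        else:
            raise ValueError(f"unknown method {method}")
    return new_L, new_w
```

For "sakura", only the states of the final "a" change. All other states keep their original durations and weights.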

[0029] Consider the second embodiment with method (5). When the speech could not be recognized from the first utterance, recognition unit 2 changes the standard-pattern durations L(j) corresponding to the final vowel of the input speech to larger values before recognizing the repeated word.

Suppose the user lowers the speaking rate and repeats the word slowly, so that the word ending is lengthened and i - B(j-1) becomes large in the word-final portion. Because the standard-pattern durations L(j) for the end of the word have also been increased, this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.

[0030] With method (6), recognition unit 2 relaxes the duration-control constraint for the final vowel of the input speech before recognizing the repeated word. Suppose the user lowers the speaking rate and repeats the word slowly, so that i - B(j-1) becomes large at the end of the word and its difference from L(j-1) grows. Because w_j has been changed to a smaller value, this effectively prevents the duration difference from increasing the distance between the input speech pattern and the standard pattern during matching.

[0031] Thus, in the second embodiment as well, with either method (5) or (6), the recognition reliability does not drop even when the word is repeated slowly. Instead, the reliability can be raised on the basis of the clearly articulated, slowly repeated speech, and a correct recognition result can be obtained quickly.

[0032] The above example uses only one of methods (5) and (6), but both may be used together. In that case, the recognition result from the method giving the higher recognition reliability may be sent to determination unit 3. This further improves recognition accuracy.

[0033] Furthermore, the first and second embodiments may be combined so that methods (1) to (6) are used together. In this case, the recognition result from the method giving the highest recognition reliability may be sent to determination unit 3. This improves recognition accuracy still further.

[0034]

[Effects of the Invention] As described above, in the present invention, when a recognition result is not adopted and the user repeats the utterance, the repeated utterance is recognized after one of the following changes:

- the durations of the standard pattern are set larger;
- the duration constraint is relaxed;
- the duration of the standard pattern corresponding to the end of the input speech is increased;
- the duration-control constraint for the end of the input speech is relaxed;
- the duration of the standard pattern corresponding to the last vowel of the input speech is increased; or
- the duration-control constraint corresponding to the last vowel of the input speech is relaxed.

Consequently, a correct recognition result is readily obtained for the repeated word, and it can be obtained early.

[Brief description of the drawings]

FIG. 1 is a block diagram of one embodiment of a speech recognition device according to the present invention.

[Explanation of symbols]

1 Speech input unit
2 Recognition unit
3 Determination unit

Continued from the front page
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 17/00; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means sets the duration of the standard pattern to a larger value and recognizes the next input speech.

2. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means relaxes the duration constraint and recognizes the next input speech.

3. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means increases the duration of the standard pattern corresponding to the end of the input speech and recognizes the next input speech.

4. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means relaxes the duration-control constraint for the end of the input speech and performs recognition.

5. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means increases the duration of the standard pattern corresponding to the last vowel of the input speech and recognizes the next input speech.

6. A speech recognition device comprising: speech input means for inputting speech; recognition means for recognizing the input speech using duration control; and determination means for determining whether to adopt the recognition result obtained by the recognition means; wherein, when the determination means does not adopt the recognition result, the recognition means relaxes the duration-control constraint corresponding to the last vowel of the input speech and recognizes the next input speech.