JPH05158493A

JPH05158493A - Speech recognizing device

Info

Publication number: JPH05158493A
Application number: JP3324930A
Authority: JP
Inventors: Hitoshi Iwamida (岩見田 均)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-12-10
Filing date: 1991-12-10
Publication date: 1993-06-25

Abstract

PURPOSE: To obtain an accurate recognition result with fewer inputs of voice information, in a speech recognition device that recognizes and outputs the character string represented by voice information input by a user. CONSTITUTION: In the speech recognition device 1, the likelihood between the registered voice standard patterns and the voice pattern to be recognized is evaluated while a local likelihood sequence is obtained, and the character string corresponding to the voice standard pattern with the highest likelihood is identified and output. The device 1 is provided with a division unit 12 and a calculation unit 13. The division unit 12 divides the character string obtained as the recognition result and divides the obtained local likelihood sequence correspondingly. The calculation unit 13 calculates the likelihood associated with each divided character portion by calculating a representative value of each divided local likelihood sequence. When the character string obtained as the recognition result is output, the likelihood calculated by the calculation unit 13 is associated with the corresponding character portion.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a speech recognition device that recognizes which of a set of pre-registered character strings corresponds to the voice information input by a user and outputs the recognition result to the user. More particularly, it relates to a speech recognition device that obtains an accurate recognition result with fewer inputs of voice information.

[0002] In recent years, systems have become widespread that recognize voice information uttered by an operator and then carry out processing such as automatic sorting of objects according to the recognition result. The speech recognition device used in such a system recognizes which of the pre-registered character strings the operator's utterance corresponds to. When the recognition result is incorrect, the operator repeats the utterance until a correct result is obtained. Such a speech recognition device therefore needs a configuration that produces the result the operator intends with as few utterances as possible.

[0003]

[Description of the Related Art] Some speech recognition devices recognize which pre-registered character string corresponds to the voice information input by a user and display the recognition result to the user. Conventionally, such a device displays either the character string of the first-ranked recognition result or the character strings of several top-ranked recognition results.

[0004] When none of these recognition results is correct, the user utters the voice information again and inputs it to the speech recognition device.

[0005]

[Problems to Be Solved by the Invention] A conventional speech recognition device merely displays the character string that constitutes the recognition result. The user therefore has no way of knowing in which part of the utterance the device made its error. When the device misrecognizes an utterance, the user repeats the input blindly, and the voice information may have to be input many times before the correct recognition result is obtained.

[0006] The present invention has been made in view of these circumstances. Its object is to provide a new speech recognition device that obtains an accurate recognition result with fewer inputs of voice information.

[0007]

[Means for Solving the Problems] FIG. 1 shows the principle configuration of the present invention. In the figure, 1 is a speech recognition device embodying the present invention, and 2 is an output device that outputs the recognition result of the speech recognition device 1.

[0008] The speech recognition device 1 comprises a standard pattern management unit 10, a recognition unit 11, a division unit 12, a calculation unit 13, and an output unit 14. The standard pattern management unit 10 manages the voice patterns of the voice signals that may be input, storing each as a voice standard pattern associated with a character string. When the recognition unit 11 is given the voice pattern of a voice signal to be recognized, it evaluates the likelihood between that voice pattern and each voice standard pattern managed by the standard pattern management unit 10. It obtains a local likelihood sequence in the process, and in this way it recognizes the character string represented by the voice signal.

[0009] The division unit 12 divides the character string recognized by the recognition unit 11. In correspondence with that division, it also divides the local likelihood sequence obtained by the recognition unit 11. The division unit 12 may divide the recognized character string on the basis of time, or on the basis of phonemes or syllables.

[0010] The calculation unit 13 calculates a representative value of each local likelihood sequence divided by the division unit 12. This gives the likelihood associated with each character portion produced by the division unit 12. The output unit 14 outputs the recognition result to the output device 2.
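The division and representative-value steps of [0009] and [0010] can be sketched as follows. This is a minimal Python illustration, not the patented implementation. It assumes that the local likelihood sequence is already available as a list of numbers. It also assumes that the division unit supplies the index at which each new character portion begins; the name `boundaries` is hypothetical. The representative value defaults to the mean, which is the example the description itself gives in [0012]. Any other reducer, such as `min`, can be passed in instead.

```python
from statistics import mean
from typing import Callable, List, Sequence


def split_by_boundaries(local_likelihoods: Sequence[float],
                        boundaries: Sequence[int]) -> List[List[float]]:
    """Cut the local likelihood sequence at the given start indices.

    `boundaries` (a hypothetical input) lists the index at which each
    character portion after the first begins; e.g. [2, 4] on a length-6
    sequence yields the slices [0:2], [2:4] and [4:6].
    """
    edges = [0, *boundaries, len(local_likelihoods)]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("boundaries must be strictly increasing and leave no empty portion")
    return [list(local_likelihoods[a:b]) for a, b in zip(edges, edges[1:])]


def portion_likelihoods(local_likelihoods: Sequence[float],
                        boundaries: Sequence[int],
                        representative: Callable[[Sequence[float]], float] = mean
                        ) -> List[float]:
    """Likelihood of each character portion = representative value of its segment."""
    return [representative(segment)
            for segment in split_by_boundaries(local_likelihoods, boundaries)]


if __name__ == "__main__":
    seq = [1, 3, 5, 5, 6, 10]
    print(portion_likelihoods(seq, [2, 4]))        # average of each portion
    print(portion_likelihoods(seq, [2, 4], min))   # worst frame of each portion
```

Passing the reducer as a parameter keeps the "representative value" open, as the description does.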

[0011]

[Operation] In the present invention, the user utters the voice information to be recognized, and its voice pattern is given to the recognition unit 11. The recognition unit 11 evaluates the likelihood between that voice pattern and the voice standard patterns managed by the standard pattern management unit 10, obtaining a local likelihood sequence in the process. It then recognizes one of two things as the character string represented by the input voice information: the character string corresponding to the voice standard pattern with the highest likelihood, or the character strings corresponding to several top-ranked voice standard patterns.

[0012] Once the recognition unit 11 has obtained the recognized character string in this way, the division unit 12 divides that character string, for example on a phoneme basis. It divides the local likelihood sequence obtained by the recognition unit 11 correspondingly. The calculation unit 13 then calculates a representative value, for example the average, of each divided local likelihood sequence. This gives the likelihood associated with each character portion produced by the division unit 12.

[0013] On receiving the results of the calculation unit 13, the output unit 14 outputs the character string recognized by the recognition unit 11, with each calculated likelihood associated with the corresponding character portion. The output unit 14 may also give special treatment to one or more relatively high likelihoods among those calculated. These likelihoods, the character portions associated with them, or both, may be output in an output form different from that used for the other character portions.
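One way to realize the output form of [0013] is sketched below in Python. The description leaves "relatively high" open. This sketch assumes it means the `k` largest likelihoods. It marks the selected likelihoods, their character portions, or both, by wrapping them in a hypothetical marker string. A real device would use whatever output form its output device supports.

```python
from typing import List, Sequence


def top_k_indices(likelihoods: Sequence[float], k: int = 1) -> List[int]:
    """Positions of the k largest likelihoods (the earlier position wins ties)."""
    ranked = sorted(range(len(likelihoods)), key=lambda i: (-likelihoods[i], i))
    return sorted(ranked[:k])


def format_result(parts: Sequence[str], likelihoods: Sequence[float], k: int = 1,
                  mark_part: bool = True, mark_likelihood: bool = True,
                  marker: str = "*") -> str:
    """Render 'part:likelihood' pairs, emphasizing the k highest likelihoods.

    mark_part / mark_likelihood choose whether the character portion, its
    likelihood, or both receive the different output form.
    """
    high = set(top_k_indices(likelihoods, k))
    cells = []
    for i, (part, value) in enumerate(zip(parts, likelihoods)):
        part_text, value_text = part, str(value)
        if i in high and mark_part:
            part_text = f"{marker}{part_text}{marker}"
        if i in high and mark_likelihood:
            value_text = f"{marker}{value_text}{marker}"
        cells.append(f"{part_text}:{value_text}")
    return " ".join(cells)
```

For instance, `format_result(["SH", "I", "G", "A"], [10, 5, 4, 12], k=2)` returns `"*SH*:*10* I:5 G:4 *A*:*12*"`. With `mark_part=False`, only the likelihood is emphasized.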

[0014] Thus, the speech recognition device 1 of the present invention does not merely display the character string that constitutes the recognition result. It displays that string together with the likelihood of each character portion obtained during recognition. When the recognition result is wrong and the voice information must be re-input, the user can tell which part of the utterance to pronounce carefully. The user therefore need not input the voice information many times.

[0015]

[Embodiment] The present invention is described in detail below with reference to an embodiment. FIG. 2 shows one embodiment of the present invention. In the figure, 1 is an embodiment of the speech recognition device according to the present invention, and 2a is a display device that outputs the recognition result of the speech recognition device 1. 3 is a microphone that generates a word speech signal by converting the word voice information uttered by the user into an electrical signal.

[0016] The speech recognition device 1 of this embodiment comprises the following units: a speech input unit 15, a frequency analysis unit 16, a standard frequency pattern management unit 10a, a matching unit 11a, a phoneme division unit 12a, a phoneme likelihood calculation unit 13a, and a recognition result display control unit 14a.

[0017] The speech input unit 15 A/D-converts the word speech signal produced by the microphone 3. The frequency analysis unit 16 frequency-analyzes the A/D-converted word speech signal at a suitable fixed period to obtain a time-frequency pattern, which shows the frequency characteristics at each point in time. The standard frequency pattern management unit 10a corresponds to the standard pattern management unit 10 of FIG. 1. It manages the typical time-frequency patterns of the word speech signals that may be input, storing each as a standard frequency pattern associated with a word character string.
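The "frequency analysis at a suitable fixed period" of [0017] is a form of short-time spectral analysis. The sketch below is an illustration under assumptions, not the analysis the patent specifies; the description names no frame length, window, or filter bank. The sketch cuts the A/D-converted samples into overlapping frames of `frame_len` samples every `hop` samples. It then computes a plain DFT magnitude spectrum for each frame, with no window function, using only the Python standard library.

```python
import cmath
import math
from typing import List, Sequence


def time_frequency_pattern(samples: Sequence[float], frame_len: int = 16,
                           hop: int = 8) -> List[List[float]]:
    """Return one magnitude spectrum (bins 0..frame_len//2) per analysis frame.

    Rows are time (one per `hop` samples) and columns are frequency bins, so
    pattern[t][k] says how strong frequency bin k is in frame t.
    """
    pattern = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of bin k; O(N^2) per frame, acceptable for a sketch.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        pattern.append(spectrum)
    return pattern
```

A practical front end would normally add windowing and often a filter bank or cepstral analysis. The source leaves those choices open.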

[0018] The matching unit 11a corresponds to the recognition unit 11 of FIG. 1. It uses a matching technique such as DP matching to compare the time-frequency pattern produced by the frequency analysis unit 16 with the standard frequency patterns managed by the standard frequency pattern management unit 10a. It identifies the word character string with the highest likelihood (similarity) and outputs it to the phoneme division unit 12a. It also outputs to the phoneme division unit 12a the local likelihood sequence obtained during matching, that is, the sequence of local likelihoods along the optimal matching path.
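The matching of [0018] can be sketched as a symmetric DP matching (dynamic time warping) that maximizes accumulated similarity and then reads the local likelihoods off the optimal path. This illustration rests on assumptions the source does not fix. The local likelihood between input frame i and reference frame j is taken as given in a matrix `local[i][j]`, where higher means more similar. The classic symmetric step weights are used, with the diagonal step counted twice, so that the total does not favor longer paths.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def dp_match(local: Sequence[Sequence[float]]
             ) -> Tuple[float, List[Cell], List[float]]:
    """Symmetric DP matching that maximizes accumulated local likelihood.

    local[i][j] is the local likelihood (similarity) between input frame i and
    reference frame j. Returns (normalized score, optimal path, local
    likelihood sequence along that path).
    """
    n_in, n_ref = len(local), len(local[0])
    g = [[float("-inf")] * n_ref for _ in range(n_in)]
    back: List[List[Optional[Cell]]] = [[None] * n_ref for _ in range(n_in)]
    g[0][0] = 2 * local[0][0]
    for i in range(n_in):
        for j in range(n_ref):
            if i == 0 and j == 0:
                continue
            steps = []
            if i > 0:            # vertical step, weight 1
                steps.append((g[i - 1][j] + local[i][j], (i - 1, j)))
            if j > 0:            # horizontal step, weight 1
                steps.append((g[i][j - 1] + local[i][j], (i, j - 1)))
            if i > 0 and j > 0:  # diagonal step, weight 2
                steps.append((g[i - 1][j - 1] + 2 * local[i][j], (i - 1, j - 1)))
            g[i][j], back[i][j] = max(steps, key=lambda s: s[0])
    # Trace the optimal path back from the end point to (0, 0).
    path: List[Cell] = []
    cell: Optional[Cell] = (n_in - 1, n_ref - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    # With these weights every path has total weight n_in + n_ref.
    score = g[n_in - 1][n_ref - 1] / (n_in + n_ref)
    return score, path, [local[i][j] for i, j in path]
```

To recognize a word, this would be run against every standard pattern, and the pattern with the highest score would be kept. The local likelihood sequence of that best match is what the division step receives.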

[0019] The phoneme division unit 12a corresponds to the division unit 12 of FIG. 1. It divides the word character string obtained by the matching unit 11a into phonemes. Correspondingly, it divides the local likelihood sequence obtained by the matching unit 11a into per-phoneme segments. The phoneme likelihood calculation unit 13a corresponds to the calculation unit 13 of FIG. 1. It calculates the likelihood of each phoneme as the average of that phoneme's local likelihood segment produced by the phoneme division unit 12a. The recognition result display control unit 14a corresponds to the output unit 14 of FIG. 1 and outputs the recognition result to the display device 2a.
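The phoneme division of [0019] requires knowing which stretch of the matched standard pattern belongs to which phoneme. The source does not say how this is known. The sketch below assumes, hypothetically, that each standard pattern carries the index of the first reference frame of every phoneme (`ref_starts`). Each point on the optimal DP path is assigned to the phoneme whose span contains its reference frame. The phoneme's likelihood is then the average of the local likelihoods assigned to it, as [0019] describes.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def phoneme_likelihoods(path: Sequence[Tuple[int, int]],
                        local_seq: Sequence[float],
                        phonemes: Sequence[str],
                        ref_starts: Sequence[int]) -> List[Tuple[str, float]]:
    """Average the path's local likelihoods per phoneme of the reference word.

    path[n] = (input frame, reference frame) and local_seq[n] is the local
    likelihood at that path point. ref_starts[p] (a hypothetical annotation) is
    the first reference frame of phonemes[p]; it must be increasing and start at 0.
    """
    sums = [0.0] * len(phonemes)
    counts = [0] * len(phonemes)
    for (_, ref_frame), value in zip(path, local_seq):
        p = bisect_right(ref_starts, ref_frame) - 1  # phoneme containing this frame
        sums[p] += value
        counts[p] += 1
    # A continuous DP path visits every reference frame, so each phoneme
    # spanning at least one frame receives at least one point.
    return [(ph, s / c) for ph, s, c in zip(phonemes, sums, counts)]
```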

[0020] Next, the operation of the embodiment configured as above is described in detail. When the user utters the word voice information to be recognized, the microphone 3 converts it into an electrical word speech signal. The speech input unit 15 A/D-converts this signal, and the frequency analysis unit 16 frequency-analyzes the A/D-converted signal at a suitable fixed period to obtain a time-frequency pattern.

[0021] Once the time-frequency pattern of the user's utterance has been obtained, the matching unit 11a compares it with the standard frequency patterns managed by the standard frequency pattern management unit 10a. It identifies the word character string with the highest likelihood and outputs that string to the phoneme division unit 12a, together with the local likelihood sequence obtained during matching.

[0022] On receiving the word character string from the matching unit 11a, the phoneme division unit 12a divides it into phonemes. It also divides the local likelihood sequence obtained by the matching unit 11a into the corresponding per-phoneme segments. The phoneme likelihood calculation unit 13a then calculates the likelihood of each phoneme as the average of its local likelihood segment.

[0023] The recognition result display control unit 14a then displays the word character string recognized by the matching unit 11a on the display device 2a. Using the results of the phoneme likelihood calculation unit 13a, it also displays each calculated phoneme likelihood next to the corresponding phoneme of that word character string.

[0024] Consider, for example, a case in which the user utters the word "CHIBA" (Chiba) and the matching unit 11a recognizes it as "SHIGA" (Shiga). The phoneme division unit 12a divides the recognition result "SHIGA" into the phonemes /SH/, /I/, /G/, /A/. The phoneme likelihood calculation unit 13a then calculates, for example, a likelihood of 10 for "SH", 5 for "I", 4 for "G", and 12 for "A". Using these results, the recognition result display control unit 14a displays the recognized word character string "SHIGA" on the screen of the display device 2a, with the likelihood of each of its constituent phonemes, as shown in FIG. 3.
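FIG. 3 is not reproduced in this text, so its exact screen layout is unknown. As a rough, hypothetical text-mode stand-in, the sketch below prints the recognized phonemes on one row and centers each phoneme's likelihood beneath it, using the values from this example.

```python
from typing import Sequence


def likelihood_display(phonemes: Sequence[str], likelihoods: Sequence[float]) -> str:
    """Two-row text layout: phonemes on top, likelihoods aligned underneath."""
    widths = [max(len(p), len(str(v))) for p, v in zip(phonemes, likelihoods)]
    top = " ".join(p.center(w) for p, w in zip(phonemes, widths))
    bottom = " ".join(str(v).center(w) for v, w in zip(likelihoods, widths))
    return f"{top}\n{bottom}"


if __name__ == "__main__":
    print(likelihood_display(["SH", "I", "G", "A"], [10, 5, 4, 12]))
```

Each column is as wide as the longer of the phoneme and its likelihood, so the two rows stay aligned.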

[0025] With this display, the user learns that "CH" in "CHIBA" was misrecognized as "SH" and that "B" was misrecognized as "G". The user can also see that the likelihood of "SH" is high. The user therefore knows that, when re-inputting "CHIBA", it is enough to utter it while paying particular attention to "CH", which was mistaken with high likelihood.

[0026] The display example of FIG. 3 simply shows each likelihood calculated by the phoneme likelihood calculation unit 13a next to the corresponding phoneme of the recognized word character string. To draw the user's attention more effectively, a relatively high likelihood may, for example, be displayed in a different color from the other likelihoods or made to flicker. The phoneme associated with it may likewise be displayed in a different color or made to flicker.
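On a text terminal, the different-color and flicker display of [0026] can be approximated with ANSI SGR escape codes: 31 selects a red foreground and 5 selects blink, which many terminals ignore. The sketch below is illustrative only. It assumes "relatively high" means the `k` largest likelihoods and applies the emphasis to both the phoneme and its likelihood.

```python
from typing import Sequence

RED = "\x1b[31m"          # SGR 31: red foreground
RED_BLINK = "\x1b[5;31m"  # SGR 5 (blink) combined with 31 (red)
RESET = "\x1b[0m"         # SGR 0: reset all attributes


def highlight(phonemes: Sequence[str], likelihoods: Sequence[float],
              k: int = 1, blink: bool = False) -> str:
    """Render 'phoneme(likelihood)' cells, coloring the k highest likelihoods."""
    ranked = sorted(range(len(likelihoods)), key=lambda i: (-likelihoods[i], i))
    high = set(ranked[:k])
    style = RED_BLINK if blink else RED
    cells = []
    for i, (ph, value) in enumerate(zip(phonemes, likelihoods)):
        text = f"{ph}({value})"
        cells.append(f"{style}{text}{RESET}" if i in high else text)
    return " ".join(cells)


if __name__ == "__main__":
    print(highlight(["SH", "I", "G", "A"], [10, 5, 4, 12], k=2, blink=True))
```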

[0027] As described above, the speech recognition device 1 of the present invention does not merely display the character string that constitutes the recognition result. Its distinguishing feature is that it displays that string together with the likelihood of each character portion obtained during recognition.

[0028] The illustrated embodiment has been described, but the present invention is not limited to it. For example, the embodiment uses the recognition of word voice information to explain the invention. The invention is not limited to this case and can be applied as is to the speech recognition of phrases and similar units.

[0029] Likewise, the embodiment divides the recognized character string on a phoneme basis, but the present invention is not limited to this. The character string may also be divided on the basis of syllables or of time. When it is divided on the basis of time, for example, the display tells the user whether to pay attention to the first half or the second half of the utterance.
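Time-based division as in [0029] does not need phoneme boundaries at all. The following is a minimal sketch under two assumptions. First, the local likelihood sequence is simply cut into `n_segments` nearly equal stretches of path points; two segments give the first-half/second-half display the paragraph mentions. Second, the average is used as the representative value.

```python
from statistics import mean
from typing import List, Sequence


def time_segment_likelihoods(local_seq: Sequence[float],
                             n_segments: int = 2) -> List[float]:
    """Average local likelihood of each of n_segments near-equal time slices."""
    n = len(local_seq)
    if not 0 < n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(local_seq)")
    # Integer edges; n >= n_segments guarantees every slice is non-empty.
    edges = [s * n // n_segments for s in range(n_segments + 1)]
    return [mean(local_seq[a:b]) for a, b in zip(edges, edges[1:])]
```

The sketch only computes a value for each segment. How those values are emphasized is left to the output step, as in [0013].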

[0030]

[Effects of the Invention] As described above, a speech recognition device according to the present invention displays the recognized character string together with the likelihood of each character portion obtained during recognition. When the recognition result is wrong and the voice information must be re-input, the user can tell which part of the utterance to pronounce carefully. The user therefore need not input the voice information many times.

[Brief description of drawings]

FIG. 1 is a diagram of the principle configuration of the present invention.

FIG. 2 shows an embodiment of the present invention.

FIG. 3 is an explanatory diagram of an example display of a recognition result.

[Explanation of symbols]

1 Speech recognition device
2 Output device
10 Standard pattern management unit
11 Recognition unit
12 Division unit
13 Calculation unit
14 Output unit

Claims

[Claims]

1. A speech recognition device that evaluates the likelihood between pre-registered voice standard patterns and a voice pattern to be recognized while obtaining a local likelihood sequence, and that identifies and outputs the character string corresponding to the voice standard pattern with the highest likelihood, or to a plurality of top-ranked voice standard patterns, the speech recognition device comprising: a division unit (12) that divides the character string of the recognition result and divides the obtained local likelihood sequence in correspondence with that division; and a calculation unit (13) that calculates a representative value of each local likelihood sequence divided by the division unit (12), thereby calculating the likelihood associated with each character portion produced by the division unit (12); wherein, when the character string of the recognition result is output, the likelihood calculated by the calculation unit (13) is output in association with the corresponding character portion.

2. The speech recognition device according to claim 1, wherein the division unit (12) divides the character string of the recognition result on the basis of time.

3. The speech recognition device according to claim 1, wherein the division unit (12) divides the character string of the recognition result on the basis of phonemes or syllables.

4. The speech recognition device according to claim 1, 2, or 3, wherein, when the character string of the recognition result is output, one or more relatively high likelihoods among those calculated by the calculation unit (13), the character portions associated with those likelihoods, or both, are output in an output form different from that used for the other character portions.