JP2003323196A

JP2003323196A - Voice recognition system, voice recognition method, and voice recognition program

Info

Publication number: JP2003323196A
Application number: JP2002132472A
Authority: JP
Inventors: Hideaki Nagatsuma; 秀明長妻
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-05-08
Filing date: 2002-05-08
Publication date: 2003-11-14

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition system in which a plurality of voice recognition processing sections having different reliability criteria for their outputs are complementarily combined.

SOLUTION: Voice recognition processing sections 131 and 132 each perform voice recognition on input voice data 111 according to their own recognition methods, and output voice recognition results together with reliabilities indicating the certainty of those results. A voice recognition result evaluating means 133 converts the output reliabilities into evaluation values on a common scale, using an evaluation function 134 prepared for each voice recognition processing section, selects the optimum voice recognition result on the basis of the evaluation values thus obtained, and outputs that result. The evaluation function 134 includes a reference evaluation function corresponding to section 131 and an evaluation function corresponding to section 132. The latter provides a higher evaluation value than the reference evaluation function over the reliability range in which the voice recognition result of section 132 is closer to the correct answer than that of section 131, and a lower evaluation value than the reference evaluation function over the reliability range in which it is not.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to speech recognition technology, and in particular to a speech recognition system, a speech recognition method, and a speech recognition program that improve the recognition rate by using a plurality of speech recognition processing units.

[0002]

[Prior Art] Speech recognition systems such as SmartVoice (http://www.nec.co.jp/press/ja/0107/2301-01.html) and ViaVoice (http://www-6.ibm.com/jp/voiceland/main/function.html) have been sold as products. However, because these conventional speech recognition systems use a single speech recognition method, their recognition rate varies from case to case in environments where, for example, the speech of several people with different speaking styles, or speech in different languages such as Japanese and English, is mixed. Other documents describing speech recognition systems that use a single speech recognition method include JP-A-8-248981 and JP-A-2001-175276.

[0003] To adapt to a wide variety of speech input environments and improve the recognition rate, a speech recognition system that uses a plurality of speech recognition means complementarily has been proposed in JP-A-8-202388 (hereinafter, Document 1). The conventional speech recognition system described in Document 1 comprises a plurality of speech recognition processing units, each of which performs speech recognition on the input speech according to its own recognition method and outputs a speech recognition result together with a reliability indicating the certainty of that result (called "distance" in Document 1), and a recognition result integration unit that evaluates these speech recognition results together according to a predetermined criterion and outputs a final result. The predetermined criterion is the past hit rate of each speech recognition processing unit (the ratio of correct outputs to the total number of past recognitions, called "reliability" in Document 1). The reliability output by each speech recognition processing unit is corrected by multiplying it by that unit's past hit rate, and the final result is selected by comparing the corrected reliabilities.
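The correction used in Document 1 can be sketched as follows. This is only an illustration of the scheme as summarized above, not code from Document 1: the names `Hypothesis` and `select_by_hit_rate` and all numeric values are hypothetical, and the sketch assumes that a larger corrected reliability is better.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str           # recognition result (character data)
    reliability: float  # the recognizer's own confidence, on its own scale


def select_by_hit_rate(hypotheses, hit_rates):
    """Scale each reliability by its recognizer's past hit rate and pick the largest."""
    corrected = [h.reliability * rate for h, rate in zip(hypotheses, hit_rates)]
    best = max(range(len(corrected)), key=corrected.__getitem__)
    return hypotheses[best], corrected


# Hypothetical outputs of two recognizers A and B for one utterance.
result, scores = select_by_hit_rate(
    [Hypothesis("hello", 0.70), Hypothesis("hollow", 0.80)],
    hit_rates=[0.90, 0.60],
)
```

Because each reliability is only scaled by a constant that belongs to its own recognizer, two recognizers that generate their reliabilities on different scales are still compared as if they shared one scale.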

[0004] According to this speech recognition system, it is said that, for example, if the plurality of speech recognition processing units use, as the templates stored in advance for recognition, templates created for the same words with different speaking styles (by gender, age, region, and so on) and in different formats, the recognition rate can be improved by integrating the recognition results, and that if templates created for different words are used, the recognizable vocabulary can be increased without raising the computational cost.

[0005]

[Problems to be Solved by the Invention] However, the speech recognition system described in Document 1 has the problem that it is difficult to combine arbitrary speech recognition processing units that use different recognition methods. The reason is that it presupposes that the reliabilities output by the combined speech recognition processing units are generated according to exactly the same criterion and scale. That is, in Document 1, when two speech recognition processing units A and B output the same reliability (distance in Document 1), the certainty of recognition is presumed to be the same. For speech recognition processing units whose criteria for generating reliabilities differ, however, the objective certainty of the two recognition results is not necessarily the same even if the reliabilities are equal. Consequently, arbitrary speech recognition processing units whose recognition methods differ greatly, and whose reliability generation criteria and scales therefore differ, cannot be combined. Document 1 does correct the reliabilities output from the speech recognition processing units and compares the corrected reliabilities, but that correction is based only on information specific to each individual speech recognition processing unit. Even with such correction, speech recognition processing units with different reliability generation criteria cannot be combined complementarily.

[0006] An object of the present invention is to provide a speech recognition system and a speech recognition method in which a plurality of speech recognition processing units can be combined complementarily even when the criteria and scales by which they generate their output reliabilities differ.

[0007]

[Means for Solving the Problems] A first speech recognition system of the present invention comprises: a plurality of speech recognition processing units, each of which performs speech recognition on input speech according to its own recognition method and outputs a speech recognition result and a reliability indicating the certainty of that result; and speech recognition result evaluation means that converts the reliability output from each of the speech recognition processing units into an evaluation value, which is a scale common to the plurality of speech recognition processing units, using an evaluation function prepared for each speech recognition processing unit, and that selects and outputs one of the speech recognition results based on a comparison of the evaluation values thus obtained. Here, the evaluation functions include, for example, a reference evaluation function corresponding to one of the speech recognition processing units and an evaluation function corresponding to another speech recognition processing unit. The evaluation function corresponding to the other speech recognition processing unit is such that, even for the same reliability value, it yields a higher evaluation value than the reference evaluation function over the range of reliability in which that unit's speech recognition result is closer to the correct answer than the result of the one speech recognition processing unit, and a lower evaluation value than the reference evaluation function over the range of reliability in which its result is not closer to the correct answer.
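A minimal sketch of this selection step, under stated assumptions: the function `select_result`, the two evaluation functions `f_ref` and `f_other`, and the sample outputs are all hypothetical and are chosen only to show a case where the unit with the lower raw reliability is selected.

```python
from typing import Callable, List, Sequence, Tuple

EvalFn = Callable[[float], float]


def select_result(
    outputs: Sequence[Tuple[str, float]],  # (recognition result, reliability) per unit
    eval_fns: Sequence[EvalFn],            # one evaluation function per unit
) -> Tuple[str, List[float]]:
    """Convert each reliability to the common scale and return the best-scoring result."""
    values = [fn(reliability) for (_, reliability), fn in zip(outputs, eval_fns)]
    best = max(range(len(values)), key=values.__getitem__)
    return outputs[best][0], values


def f_ref(reliability: float) -> float:
    # Hypothetical reference evaluation function (identity).
    return reliability


def f_other(reliability: float) -> float:
    # Hypothetical evaluation function for the other unit: above f_ref for
    # reliabilities over 50, below it for reliabilities under 50.
    return 2.0 * reliability - 50.0


text, values = select_result(
    [("recognize speech", 65.0), ("wreck a nice beach", 60.0)],
    [f_ref, f_other],
)
```

With these assumed functions the second unit's reliability of 60 maps to 70 on the common scale, which exceeds the 65 obtained for the first unit, so the second unit's result is selected.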

[0008] A second speech recognition system of the present invention comprises, in addition to the configuration of the speech recognition system described above: means for presenting to the user the speech recognition results obtained by the plurality of speech recognition processing units for sample speech data; means for displaying the reference evaluation function on a graph of reliability versus evaluation value; means for plotting on the graph, in accordance with instructions from the user, the evaluation value to be assigned to the reliability obtained by the other speech recognition processing unit; and means for creating the evaluation function for the other speech recognition processing unit based on the set of points plotted on the graph.

[0009] A first speech recognition method of the present invention includes the steps of: a) performing speech recognition on input speech in parallel in a plurality of speech recognition processing units, each according to its own recognition method, and outputting from each a speech recognition result and a reliability indicating the certainty of that result; b) converting the reliability output from each of the speech recognition processing units into an evaluation value, which is a scale common to the plurality of speech recognition processing units, using an evaluation function prepared for each speech recognition processing unit; and c) selecting and outputting one of the speech recognition results based on a comparison of the evaluation values obtained by the conversion. Here, the evaluation functions include, for example, a reference evaluation function corresponding to one of the speech recognition processing units and an evaluation function corresponding to another speech recognition processing unit. The evaluation function corresponding to the other speech recognition processing unit is such that, even for the same reliability value, it yields a higher evaluation value than the reference evaluation function over the range of reliability in which that unit's speech recognition result is closer to the correct answer than the result of the one speech recognition processing unit, and a lower evaluation value than the reference evaluation function over the range of reliability in which its result is not closer to the correct answer.

[0010] A second speech recognition method of the present invention further includes d) a step of generating the evaluation function corresponding to the other speech recognition processing unit. Step d includes: d-1) presenting to the user the speech recognition results obtained by the plurality of speech recognition processing units for sample speech data; d-2) displaying the reference evaluation function on a graph of reliability versus evaluation value; d-3) plotting on the graph, in accordance with instructions from the user, the evaluation value to be assigned to the reliability obtained by the other speech recognition processing unit; d-4) repeating steps d-1 to d-3 for each of the prepared sample speech data; and d-5) creating the evaluation function for the other speech recognition processing unit based on the set of points plotted on the graph.
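Step d-5 is stated here only as creating the function from the set of plotted points. One possible reading, offered purely as an assumption, is piecewise-linear interpolation through the points, averaging the values of points that share the same reliability and holding the end values constant outside the plotted range. The name `make_evaluation_function` and the sample points are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def make_evaluation_function(points):
    """Build an evaluation function from (reliability, evaluation value) plot points.

    Assumption: piecewise-linear interpolation; the text does not fix the fitting method.
    """
    grouped = defaultdict(list)
    for reliability, value in points:
        grouped[reliability].append(value)
    if not grouped:
        raise ValueError("at least one plot point is required")
    xs = sorted(grouped)
    # Average the values when several samples were plotted at the same reliability.
    ys = [sum(grouped[x]) / len(grouped[x]) for x in xs]

    def evaluate(reliability):
        # Outside the plotted range, hold the nearest end value.
        if reliability <= xs[0]:
            return ys[0]
        if reliability >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, reliability)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (reliability - x0) / (x1 - x0)

    return evaluate


# Hypothetical plot points collected over four sample utterances.
fb = make_evaluation_function([(50, 70), (65, 88), (75, 95), (65, 86)])
```

A least-squares fit or a smoothing curve would serve equally well as a substitute if the plotted points are noisy.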

[0011]

[Operation] In the first speech recognition system and speech recognition method of the present invention, the reliabilities output from the plurality of speech recognition processing units are converted, using the evaluation functions prepared for each speech recognition processing unit, into evaluation values that are a scale common to the plurality of speech recognition processing units, and the optimum speech recognition result is selected based on a comparison of the converted evaluation values. Consequently, even when speech recognition processing units whose recognition methods differ greatly, and whose output reliabilities therefore follow different criteria and scales, are combined, a speech recognition system and method that select and output the speech recognition result closer to the correct answer are obtained.

[0012] In the second speech recognition system and speech recognition method of the present invention, the user can easily create the required evaluation function from the sample speech data and the reference evaluation function.

[0013]

[Embodiments of the Invention] Next, a first embodiment of the present invention will be described in detail with reference to the drawings.

[0014] Referring to the block diagram of FIG. 1, the speech recognition system according to the first embodiment of the present invention comprises a speech data input device 101, a data processing device 102 that operates under program control, a character data output device 103, a mode switch 104, a display device 105, and an input device 106.

[0015] The mode switch 104 consists of a switch or the like that switches the operation mode of the speech recognition system between a speech recognition mode and an evaluation function generation mode. The output of the mode switch 104 is input to the data processing device 102 and the character data output device 103.

[0016] The speech data input device 101 includes speech data input means 121, which receives input speech data 111 entered by the user in the speech recognition mode and sample speech data 112 entered by the user in the evaluation function generation mode, and outputs them to the data processing device 102. The speech data input means 121 has a microphone that picks up speech, an amplifier that amplifies the microphone output, an AD converter that converts the amplifier output into a digital signal, and the like.

[0017] The data processing device 102 includes two speech recognition processing units 131 and 132, speech recognition result evaluation means 133, evaluation functions 134, evaluation function creation means 135, and speech recognition result plotting means 136.

[0018] The speech recognition processing units 131 and 132 each perform speech recognition on the speech input from the speech data input means 121 according to their own recognition methods, and each output a speech recognition result and a reliability indicating the certainty of that result. The speech recognition result is character data. The reliability is a numerical value whose maximum is, for example, 100%. The speech recognition processing units 131 and 132 operate identically in both modes. The only difference is that the input is the input speech data 111 in the speech recognition mode and the sample speech data 112 in the evaluation function generation mode.

[0019] In this embodiment, the speech recognition processing unit 131 consists of speech recognition means 137 and reliability calculation means 139. Various speech recognition techniques have been proposed or put into practical use, including knowledge-engineering, pattern-matching, probabilistic and statistical, and neural-network techniques, and there are many variations even within the same technique. Any of them may be used for the speech recognition means 137. Given input speech, the speech recognition means 137 internally generates several candidate recognition results together with their reliabilities, and outputs the candidate with the highest reliability along with that reliability. The reliability calculation means 139 is used when the speech recognition means 137 outputs reliabilities per constituent unit of the input speech, such as per word. It calculates the reliability of the entire input speech from the reliabilities of the individual constituent units. For example, the mean of the reliabilities of all constituent units is taken as the reliability of the entire input speech. When the speech recognition means 137 itself outputs the reliability of the entire input speech, the reliability calculation means 139 may either be configured to pass on the reliability output by the speech recognition means 137 unchanged, or be omitted.
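The averaging example given for the reliability calculation means 139 can be written as the short sketch below. The function name `utterance_reliability` and the per-word values are hypothetical; the text names the mean only as one example of how the overall reliability may be derived.

```python
from statistics import fmean


def utterance_reliability(unit_reliabilities):
    """Mean of the per-unit (e.g. per-word) reliabilities of one utterance."""
    if not unit_reliabilities:
        raise ValueError("at least one constituent unit is required")
    return fmean(unit_reliabilities)


# Hypothetical per-word reliabilities (in percent) for a three-word utterance.
overall = utterance_reliability([80.0, 60.0, 55.0])
```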

[0020] The other speech recognition processing unit 132 likewise consists of speech recognition means 138 and reliability calculation means 140, which are similar to the speech recognition means 137 and the reliability calculation means 139 of the speech recognition processing unit 131. However, the recognition method of the speech recognition means 138 differs from that of the speech recognition means 137. Here, recognition methods are said to differ not only when the speech recognition technique itself differs (knowledge engineering, pattern matching, probabilistic and statistical, neural network, and so on), but also when the same technique is used with different standard patterns or the like stored in advance for the recognition process.

[0021] The operation of the speech recognition result evaluation means 133 depends on the mode. In the speech recognition mode, it receives the speech recognition results and reliabilities from the speech recognition processing units 131 and 132, and converts each reliability, using the evaluation function 134 prepared for the respective speech recognition processing unit, into an evaluation value that is a common scale for comparing the reliabilities relative to each other. It then selects one speech recognition result from the plurality of speech recognition results based on a comparison of the evaluation values obtained by the conversion, and outputs it to the character data output device 103. In the evaluation function generation mode, it outputs the speech recognition results and reliabilities from the speech recognition processing units 131 and 132 unchanged to the character data output device 103.

[0022] The evaluation functions 134 consist of an evaluation function fa() for the speech recognition processing unit 131 and an evaluation function fb() for the speech recognition processing unit 132. In this embodiment, one of them is fixed in advance and the other is created in the evaluation function generation mode. The evaluation function that is fixed in advance is called the reference evaluation function. Either fa() or fb() may serve as the reference evaluation function, but in the following description fa() for the speech recognition processing unit 131 is the reference evaluation function. FIG. 2 shows an example of fa() as a graph. The horizontal axis is the reliability, the vertical axis is the evaluation value, and the line segment denoted by reference numeral 201 corresponds to fa(). In this case, the reference evaluation function fa() is a linear function of the reliability. According to this fa(), a reliability of 65%, for example, is converted to an evaluation value of 85. The reference evaluation function is not limited to a linear function. In the evaluation function generation mode, the remaining evaluation function fb() is generated in relation to the reference evaluation function fa(). The speech recognition result plotting means 136 and the evaluation function creation means 135 take part in this generation. Their functions are described later.
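As a small illustration of a linear reference evaluation function, the sketch below builds one from a slope and an intercept. The text gives only one point on fa(), namely that a reliability of 65% maps to an evaluation value of 85, so the slope of 1 and intercept of 20 used here are assumptions chosen solely to reproduce that point, and `make_linear_evaluation_function` is a hypothetical helper.

```python
def make_linear_evaluation_function(slope, intercept):
    """Return a linear evaluation function: value = slope * reliability + intercept."""
    def evaluate(reliability):
        return slope * reliability + intercept
    return evaluate


# Assumed coefficients; only fa(65) == 85 is taken from the text.
fa = make_linear_evaluation_function(slope=1.0, intercept=20.0)
value_at_65 = fa(65.0)
```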

[0023] The character data output device 103 has character data output means 151, whose operation depends on the mode. In the speech recognition mode, it outputs the character data of the speech recognition result output from the speech recognition result evaluation means 133 as the recognition result 113. The output destination is not particularly limited. It may be the display device 105, or a recording medium, a communication line, or the like (not shown). In the evaluation function generation mode, on the other hand, it displays both speech recognition results obtained by the speech recognition processing units 131 and 132, together with their reliabilities, on the display device 105 to present them to the user. More specifically, in this embodiment it operates as follows.

[0024] First, the reference evaluation function fa() is displayed in the graph display area of the display device 105 in a graph format like that shown in FIG. 2. Next, when the speech recognition results and reliabilities obtained by the speech recognition processing units 131 and 132 for a given item of sample speech data are passed from the speech recognition result evaluation means 133, the character data of each speech recognition result is displayed in the recognition result area of the display device 105 to present it to the user. In addition, the evaluation value to which the reliability of the speech recognition processing unit 131 is converted by the reference evaluation function fa() is indicated on the graph of fa() shown in the graph display area, and the reliability of the speech recognition processing unit 132 is indicated on the same graph. For example, as shown in FIG. 3, if the reliability obtained by the speech recognition processing unit 131 is 65%, a line segment 202 is drawn from the 65% mark on the horizontal axis, parallel to the vertical axis, up to the reference evaluation function 201, and a line segment 203 is drawn through the intersection of the line segment 202 and the reference evaluation function 201, parallel to the horizontal axis. This shows the user that the reliability of the speech recognition processing unit 131 is converted by the reference evaluation function fa() to an evaluation value of 85. If the reliability obtained by the speech recognition processing unit 132 is, for example, 50%, a straight line 204 parallel to the vertical axis is drawn from the 50% mark on the horizontal axis, as shown in FIG. 3, which indicates the reliability of the speech recognition processing unit 132 on the same graph.

[0025] The speech recognition result plotting means 136 mentioned above is means for assigning, for the evaluation function fb() that is to be created automatically, an evaluation score to the current reliability. It has the function of plotting one point on the straight line 204 drawn as described above, in accordance with the user's instruction from the input device 106. The user decides which position on the straight line 204 to designate as the plot point by comparing the speech recognition results of the speech recognition processing units 131 and 132 shown on the display device 105. Specifically, this is done as follows, for example.

[0026] Suppose, as shown in FIG. 3, that the reliabilities obtained this time by the speech recognition processing units 131 and 132 are 65% and 50%, respectively. If the speech recognition result of the speech recognition processing unit 132 is closer to the correct answer than that of the speech recognition processing unit 131, the speech recognition processing unit 132 must be given an evaluation score higher than 85, the score obtained by converting the 65% reliability of the speech recognition processing unit 131 with the reference evaluation function fa(). The user therefore plots a point on the straight line 204 slightly above the line segment 203, as illustrated by plot point 205 in FIG. 3. Conversely, if the speech recognition result of the speech recognition processing unit 131 is closer to the correct answer than that of the speech recognition processing unit 132, the original reliabilities of 65% and 50% already express that relationship correctly, so the user plots the intersection of the straight line 204 and the reference evaluation function 201, as illustrated by plot point 206 in FIG. 3. If there is no difference between the recognition results, the user plots the intersection of the straight line 204 and the line segment 203.

[0027] As shown in FIG. 4, suppose the reliabilities obtained this time by the voice recognition processing units 131 and 132 are 65% and 75%, respectively. If the voice recognition result of unit 132 is closer to the correct answer than that of unit 131, unit 132 must be given an evaluation score higher than the evaluation score of 85 obtained by converting unit 131's reliability of 65% with the reference evaluation function fa(). The user therefore plots, for example, the intersection of the straight line 204 and the reference evaluation function 201, as illustrated by plot point 207 in FIG. 4. Conversely, if the voice recognition result of unit 131 is closer to the correct answer than that of unit 132, the user plots a point on the straight line 204 slightly below the line segment 203, as illustrated by plot point 208 in FIG. 4. If there is no difference between the recognition results, the user plots the intersection of the straight line 204 and the line segment 203.

[0028] As shown in FIG. 5, suppose the reliabilities obtained this time by the voice recognition processing units 131 and 132 are both 65%. If the voice recognition result of unit 132 is closer to the correct answer than that of unit 131, the user plots a point on the straight line 204 slightly above the line segment 203, as illustrated by plot point 209 in FIG. 5. Conversely, if the voice recognition result of unit 131 is closer to the correct answer than that of unit 132, the user plots a point on the straight line 204 slightly below the line segment 203, as illustrated by plot point 210 in FIG. 5. If there is no difference between the recognition results, the user plots the intersection of the straight line 204 and the line segment 203.
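The three cases of FIGS. 3 to 5 can be condensed into one rule of thumb. The following is only a sketch of one possible reading: it assumes the straight line 204 is the vertical line at unit 132's reliability and the line segment 203 is the horizontal level fa(a). The piecewise-linear `fa`, the `margin` of 2 points, and the function name `plot_point` are hypothetical; the text fixes only fa(65) = 85 and says "slightly" above or below.

```python
def fa(reliability: float) -> float:
    """Hypothetical reference evaluation function through (0, 0), (65, 85), (100, 100)."""
    if reliability <= 65:
        return reliability * 85 / 65
    return 85 + (reliability - 65) * 15 / 35


def plot_point(a: float, b: float, closer: str, margin: float = 2.0) -> float:
    """Return the evaluation score to plot at reliability b of unit 132.

    a, b   : reliabilities (%) reported by units 131 and 132
    closer : "A" if unit 131's result is judged closer to the correct answer,
             "B" if unit 132's is, anything else if there is no difference
    margin : assumed size of "slightly above/below" the level fa(a)
    """
    level = fa(a)  # height of line segment 203
    if closer == "B":
        # Unit 132 must score above fa(a); keep fa(b) if it is already higher.
        return max(fa(b), level + margin)
    if closer == "A":
        # Unit 132 must score below fa(a); keep fa(b) if it is already lower.
        return min(fa(b), level - margin)
    return level  # no difference: intersection of line 204 and segment 203


print(plot_point(65, 50, "B"))  # FIG. 3, point 205: slightly above 85
print(plot_point(65, 75, "A"))  # FIG. 4, point 208: slightly below 85
```

In practice the user makes this choice by eye on the displayed graph; the function only makes the ordering constraint explicit.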

[0029] The plotting work described above is carried out for each item of sample voice data. Enough sample voice data is prepared to cover a wide variety of input voice environments. Once the plotting work has been completed for all sample voice data, there are as many plot points as there are items of sample voice data. The evaluation function creating means 135 generates the evaluation function fb() for the voice recognition processing unit 132 from this set of plot points. The evaluation function fb() need only be a function that approximates the set of plot points well, and any specific method may be used to create it. For example, at every 1% step on the reliability scale of the horizontal axis, take all plot points lying on the line drawn from that reliability parallel to the vertical axis, read off the evaluation score of each point on the vertical axis, and compute their average; fb() is then the function representing the curve that passes through these averages at each 1% step. The evaluation function fb() created in this way is registered in the evaluation function 134 and used in voice recognition mode.
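The averaging method given as an example can be sketched as follows. This is an illustrative assumption rather than the method of the evaluation function creating means 135: the name `make_fb`, linear interpolation between the per-1% averages, and clamping outside the sampled range are all choices made here, since the text only requires a curve passing through the averages.

```python
from bisect import bisect_left
from collections import defaultdict


def make_fb(plot_points):
    """Build fb() from (reliability %, evaluation score) plot points."""
    points = list(plot_points)
    if not points:
        raise ValueError("at least one plot point is required")

    # Group scores by reliability rounded to the nearest 1% step.
    buckets = defaultdict(list)
    for reliability, score in points:
        buckets[round(reliability)].append(score)

    xs = sorted(buckets)                                   # sampled 1% steps
    ys = [sum(buckets[x]) / len(buckets[x]) for x in xs]   # average per step

    def fb(reliability):
        # Clamp outside the sampled range (an assumption of this sketch).
        if reliability <= xs[0]:
            return ys[0]
        if reliability >= xs[-1]:
            return ys[-1]
        # Linear interpolation between the two neighbouring averages.
        i = bisect_left(xs, reliability)
        x0, x1 = xs[i - 1], xs[i]
        t = (reliability - x0) / (x1 - x0)
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return fb


fb = make_fb([(50, 60), (50, 70), (60, 90)])
print(fb(50), fb(55), fb(60))
```

A smoothing spline or regression fit would satisfy the same requirement; interpolation is used here only because it needs nothing beyond the standard library.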

[0030] In the description above, the character data output means 151 displays the graph of the reference evaluation function on the display device 105, presents on the graph the reliability obtained by the voice recognition processing unit 131 together with its corresponding evaluation value, and presents on the graph the reliability obtained by the voice recognition processing unit 132. These processes may instead be performed by the voice recognition result plotting means 136. Alternatively, the user may personally work out the reliability obtained by the voice recognition processing unit 131 and its corresponding evaluation value.

[0031] Next, the operation of the voice recognition system of this embodiment in evaluation function generation mode is described, following the flowchart of FIG. 6. The processing shown in FIG. 6 is executed when the mode switch 104 is set to evaluation function generation mode.

[0032] First, the user creates in advance a reference evaluation function fa() that serves as the reference for evaluating reliability, and sets it in the evaluation function 134 (step S101). Next, when the user inputs sample voice data 112 as the voice data to be recognized, using the voice data input means 121 of the voice data input device 101, the voice data input means 121 passes the sample voice data 112 to the data processing device 102 (step S102). The data processing device 102 sends the sample voice data 112 received from the voice data input means 121 to the voice recognition processing units 131 and 132.

[0033] Using the voice recognition means 137, the voice recognition processing unit 131 recognizes the received sample voice data 112 with the recognition method specific to unit 131 and outputs voice recognition result A; using the reliability calculation means 139, it outputs the reliability a of that result (step S103). Similarly, using the voice recognition means 138, the voice recognition processing unit 132 recognizes the received sample voice data 112 with the recognition method specific to unit 132 and outputs voice recognition result B; using the reliability calculation means 140, it outputs the reliability b of that result (step S104).

[0034] Next, the voice recognition result evaluation means 133 outputs the voice recognition results A and B and the reliabilities a and b received from the voice recognition processing units 131 and 132 to the character data output means 151 of the character data output device 103 (step S105). The character data output means 151 displays the character data of both voice recognition results in the recognition result column of the display device 105 and, in the manner described with reference to FIG. 3, presents on the graph the reliability a of unit 131 with its corresponding evaluation value fa(a), and the reliability b of unit 132 (step S106).

[0035] Next, the user compares the two voice recognition results displayed in the recognition result column of the display device 105 and judges which is closer to the correct answer. Based on that judgment, and following the method described with reference to FIGS. 3 to 5, the user uses the voice recognition result plotting means 136 to plot on the graph the evaluation value fb(b) corresponding to the current reliability b of the voice recognition processing unit 132 (step S107).

[0036] Next, the user checks whether the sampling process is complete. If it is not (No in step S108), the user inputs the next sample voice data (step S102), and the processing described above is repeated. If all sample voice data have been processed (Yes in step S108), the user starts the evaluation function creating means 135 with an instruction from the input device 106. The evaluation function creating means 135 generates the evaluation function fb() for the voice recognition processing unit 132 from the set of evaluation values fb(b) that the user plotted on the graph of reliability against evaluation value with the voice recognition result plotting means 136, and sets it in the evaluation function 134 (step S109).
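The loop of steps S102 to S108 can be outlined in code. This is a hedged sketch, not the system's implementation: `collect_plot_points`, the recognizer callables, and `choose_score` are hypothetical names, and `choose_score` stands in for the user's manual judgment and plotting in steps S105 to S107.

```python
from typing import Callable, Iterable, List, Tuple

# A recognizer maps one sample to (recognized text, reliability in %).
Recognizer = Callable[[str], Tuple[str, float]]


def collect_plot_points(
    samples: Iterable[str],
    recognize_a: Recognizer,   # stands in for unit 131
    recognize_b: Recognizer,   # stands in for unit 132
    choose_score: Callable[[str, float, str, float], float],
) -> List[Tuple[float, float]]:
    """Gather one (reliability b, plotted score) pair per sample."""
    points = []
    for sample in samples:                      # S102, repeated until S108 is Yes
        result_a, a = recognize_a(sample)       # S103
        result_b, b = recognize_b(sample)       # S104
        # S105-S107: in the described system the user compares both results
        # on screen and plots a score; here a callback makes that decision.
        score = choose_score(result_a, a, result_b, b)
        points.append((b, score))
    return points                               # input to step S109
```

The returned list is what step S109 would turn into fb().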

[0037] FIG. 7 shows an example of a graph plotted for all sample voice data, together with an example of the evaluation function fb() generated from that set of plots. In the figure, each small dot is a plot point, and the dashed curve is an example of the evaluation function fb() generated from the set of plot points. In this example, if the reliability range is divided into a high region and a low region, the generated fb() yields a higher evaluation value than fa() for the same reliability in the high region, and a lower evaluation value than fa() for the same reliability in the low region.

[0038] Next, the operation of the voice recognition system of this embodiment in voice recognition mode is described, following the flowchart of FIG. 8. The processing shown in FIG. 8 is executed when the mode switch 104 is set to voice recognition mode.

[0039] When the user inputs the input voice data 111 to be recognized, using the voice data input means 121 of the voice data input device 101, the voice data input means 121 passes the input voice data 111 to the data processing device 102 (step S201). The data processing device 102 sends the input voice data 111 received from the voice data input means 121 to the voice recognition processing units 131 and 132.

[0040] Using the voice recognition means 137, the voice recognition processing unit 131 recognizes the received input voice data 111 with the recognition method specific to unit 131 and outputs voice recognition result A; using the reliability calculation means 139, it outputs the reliability a of that result (step S202). Similarly, using the voice recognition means 138, the voice recognition processing unit 132 recognizes the received input voice data 111 with the recognition method specific to unit 132 and outputs voice recognition result B; using the reliability calculation means 140, it outputs the reliability b of that result (step S203).

[0041] Next, the voice recognition result evaluation means 133 converts the reliability a received from the voice recognition processing unit 131 into the evaluation value fa(a) using the evaluation function fa(), and converts the reliability b received from the voice recognition processing unit 132 into the evaluation value fb(b) using the evaluation function fb() (step S204). If the evaluation value fa(a) of unit 131 is higher than the evaluation value fb(b) of unit 132 (Yes in step S205), it selects voice recognition result A of unit 131 and outputs it to the character data output device 103 (step S206). Otherwise, that is, if fa(a) is not higher than fb(b) (No in step S205), it selects voice recognition result B of unit 132 and outputs it to the character data output device 103 (step S207). The character data output means 151 of the character data output device 103 outputs the character data of the voice recognition result passed from the voice recognition result evaluation means 133 as the recognition result 113 (step S208).
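Steps S204 to S207 reduce to a single comparison. The sketch below assumes the evaluation functions are passed in as plain callables; the function name `select_result` and its parameters are illustrative, not part of the described system.

```python
def select_result(result_a, reliability_a, result_b, reliability_b, fa, fb):
    """Pick the recognition result whose evaluation value is higher."""
    score_a = fa(reliability_a)   # S204: convert reliability a with fa()
    score_b = fb(reliability_b)   # S204: convert reliability b with fb()
    if score_a > score_b:         # S205
        return result_a           # S206
    return result_b               # S207: also taken when the values are equal
```

Note that, following the "not higher" wording of step S205, a tie selects result B.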

[0042] FIGS. 9 to 12 show combinations of the reliability a of the voice recognition processing unit 131, its evaluation value fa(a), the reliability b of the voice recognition processing unit 132, and its evaluation value fb(b), for the case where the evaluation functions fa() and fb() are 201 in FIG. 2 and 301 in FIG. 7, respectively. FIG. 9 shows the case where reliability a is higher than reliability b and the evaluation value fa(a) is also higher than the evaluation value fb(b). In this case, the voice recognition result evaluation means 133 selects voice recognition result A. FIG. 10 shows the case where reliability a is lower than reliability b and the evaluation value fa(a) is also lower than the evaluation value fb(b). In this case, the voice recognition result evaluation means 133 selects voice recognition result B. In the cases of FIGS. 9 and 10, the outcome is no different from selecting by reliability rather than by evaluation value. In contrast, when reliability a is higher than reliability b but the evaluation value fa(a) is lower than the evaluation value fb(b), as shown in FIG. 11, the voice recognition result evaluation means 133 selects voice recognition result B. Likewise, when reliability a is lower than reliability b but the evaluation value fa(a) is higher than the evaluation value fb(b), as shown in FIG. 12, the voice recognition result evaluation means 133 selects voice recognition result A. Thus, in the cases of FIGS. 11 and 12, the outcome is the opposite of what selection by reliability would give.
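The four situations of FIGS. 9 to 12 can be reproduced with toy numbers. Everything numeric here is an assumption for illustration: `fa` is taken as the identity, `fb` as a crude step that adds 15 points in the upper half of the reliability range and subtracts 15 in the lower half, and the reliability pairs are invented to land in each case. The actual curves 201 and 301 are not given in the text.

```python
def fa(r: float) -> float:
    """Hypothetical reference evaluation function (identity)."""
    return float(r)


def fb(r: float) -> float:
    """Hypothetical fb(): above fa() for high reliability, below it for low."""
    shifted = r + 15 if r >= 50 else r - 15
    return float(min(100, max(0, shifted)))


def pick_by_reliability(a: float, b: float) -> str:
    return "A" if a > b else "B"


def pick_by_evaluation(a: float, b: float) -> str:
    return "A" if fa(a) > fb(b) else "B"


# (a, b) pairs invented so that each falls into one of the four cases.
cases = {"FIG. 9": (80, 40), "FIG. 10": (40, 80),
         "FIG. 11": (75, 70), "FIG. 12": (45, 48)}

for name, (a, b) in cases.items():
    print(name, "by reliability:", pick_by_reliability(a, b),
          "| by evaluation value:", pick_by_evaluation(a, b))
```

With these numbers the two selection rules agree for the FIG. 9 and FIG. 10 pairs and disagree for the FIG. 11 and FIG. 12 pairs, which is the behaviour the paragraph describes.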

[0043] In the embodiment above, the evaluation function fb() for the voice recognition processing unit 132 was created experimentally, based on the reference evaluation function fa() and the results of actually recognizing sample voice data. However, when the characteristics of the voice recognition processing units to be combined can be estimated, the evaluation functions can be created in advance based on that estimate. This is described below.

[0044] Assume that the voice recognition means 137 of the voice recognition processing unit 131 performs voice recognition by phrase analysis and does not perform recognition by semantic analysis, and that the reliability calculation means 139 calculates the reliability of the result recognized by the voice recognition means 137. As the calculation method in the reliability calculation means 139, it is possible, for example, to take the average of the reliabilities of the individual phrases in the voice recognition result as the reliability of the result. For example, suppose the input speech "コンニチハ（ワ）イイテンキデスネ" ("konnichiwa, ii tenki desu ne") is recognized and the voice recognition result "今日は、いい天気ですね。" ("It is nice weather today, isn't it?") is obtained. If the reliabilities of the phrases "今日", "は", "、", "いい", "天気", "です", "ね", and "。", expressed as appearance frequencies, are 70%, 60%, 75%, 35%, 50%, 40%, 30%, and 80%, respectively, then the reliability of the voice recognition result for the entire input speech is their average, 55%.
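The 55% figure can be checked directly. The snippet below is a minimal sketch of the averaging method named in the paragraph; the function name `utterance_reliability` is hypothetical, while the phrases and percentages are those of the example.

```python
def utterance_reliability(phrase_reliabilities):
    """Average the per-phrase reliabilities (%) of one recognition result."""
    values = [r for _, r in phrase_reliabilities]
    return sum(values) / len(values)


example = [("今日", 70), ("は", 60), ("、", 75), ("いい", 35),
           ("天気", 50), ("です", 40), ("ね", 30), ("。", 80)]

print(utterance_reliability(example))  # 440 / 8 = 55.0
```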

[0045] On the other hand, assume that the voice recognition means 138 of the voice recognition processing unit 132 performs voice recognition by both phrase analysis and semantic analysis. When recognition extends to semantic analysis, the recognizer generally outputs a reliability for the entire continuously recognized input speech. The reliability calculation means 140 therefore takes, for example, the similarity of the voice recognition result output by the voice recognition means 138 and outputs it unchanged as the reliability of the voice recognition result for the entire input speech.

[0046] With the voice recognition means 137, which recognizes by phrase analysis, any given phrase in the recognized character data is unaffected by the phrases before and after it. It is therefore considered optimal for the evaluation function fa(), which derives an evaluation value from the reliability of the recognition result, to be nearly a straight line. For example, when the input speech "コンニチハ（ワ）イイテンキデスネ" is recognized, even if the phrases surrounding "いい" (namely "今日", "は", "、", "天気", "です", "ね", and "。") can be expected to have fairly high reliability, this has no effect on the reliability of "いい". It is therefore appropriate for the reliability of the recognition result for the entire input voice data to be a value, such as the average over all phrases, that increases monotonically as the reliability of each phrase increases. Moreover, as this overall reliability increases monotonically, the recognition rate, that is, the proportion of recognition results that the user accepts as correct, also increases monotonically. Hence, as shown by the line segment 401 on the graph of FIG. 13, the evaluation function fa() that evaluates the reliability of the voice recognition means 137, which recognizes by phrase analysis, is a linear function.

[0047] On the other hand, when the voice recognition means 138, which recognizes by semantic analysis, is used, the example sentence above could yield recognition results such as "今日は、良い（ヨイ）天気ですね。" and "今日は、いい（イイ）天気ですね。". Because there are more candidate recognition results than with phrase analysis alone, the overall reliability of the recognition result goes down. Also, once the surrounding phrases have been recognized to some extent, using them raises the recognition rate for the entire input voice data. Conversely, if the surrounding phrases are not recognized, the recognition rate may fall. For example, if the phrases before and after "いい" ("今日", "は", "、", "天気", "です", "ね", and "。") can be recognized to some extent, then even if the reliability for the entire input voice data is not very high, the recognition rate of the result "今日は、良い天気ですね。" for the entire input voice data becomes high. Hence, as shown by the curve 402 on the graph of FIG. 13, the evaluation function fb() that evaluates the reliability of the voice recognition means 138, which recognizes by semantic analysis, traces a curve whose evaluation value is higher than that of the evaluation function fa() for the voice recognition means 137 when reliability is high, and lower when reliability is low.
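One way to picture the relationship between line segment 401 and curve 402 is a straight line against an S-shaped curve. The logistic form, its `midpoint` of 50%, and its `steepness` are purely assumptions chosen to show the qualitative shape; the text does not say which family of curves fb() belongs to.

```python
import math


def fa(reliability: float) -> float:
    """Assumed linear evaluation function for phrase-analysis recognition."""
    return reliability


def fb(reliability: float, midpoint: float = 50.0, steepness: float = 0.1) -> float:
    """Assumed S-shaped evaluation function for semantic-analysis recognition."""
    return 100.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))


for r in (20, 50, 80):
    print(r, fa(r), round(fb(r), 1))
```

With these parameters the curve lies below the line at low reliability and above it at high reliability, crossing at the midpoint.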

[0048] As described above, when the evaluation functions fa() and fb() of both voice recognition processing units 131 and 132 can be estimated in advance, setting the estimated evaluation functions fa() and fb() in the evaluation function 134 allows processing in voice recognition mode to start immediately. In that case, the evaluation function creating means 135, the voice recognition result plotting means 136, the mode switch 104, the display device 105, and the input device 106, as well as the evaluation function generation mode functions of the voice recognition result evaluation means 133 and the character data output means 151, can be omitted, simplifying the system to the voice recognition system shown in FIG. 14.

[0049] In each of the embodiments above, two voice recognition processing units were combined complementarily, but three or more can also be combined. When two or more other evaluation functions are created from the reference evaluation function in the same way as in the evaluation function generation mode of the embodiment of FIG. 1, the voice recognition result of the one unit assigned the reference evaluation function is compared with the voice recognition result of each of the other two or more units, and an evaluation value is plotted for each. Thus, for each item of sample voice data, as many evaluation values are plotted as there are other voice recognition processing units. To distinguish which unit each plot point belongs to, several kinds of plot point are used, assigned one-to-one to the other units, and the evaluation function creating means 135 generates an evaluation function for each set of plot points of the same kind.
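With three or more units, selection in voice recognition mode generalizes naturally to taking the result with the highest evaluation value. The sketch below is an assumption about how that would look; the names `select_best`, `outputs`, and `evaluation_functions` are hypothetical, and the text does not say how ties among three or more units are resolved (Python's `max` keeps the first maximal entry).

```python
def select_best(outputs, evaluation_functions):
    """Return the result with the highest evaluation value.

    outputs              : list of (unit_id, result, reliability %)
    evaluation_functions : dict mapping unit_id to that unit's evaluation function
    """
    best = max(outputs, key=lambda o: evaluation_functions[o[0]](o[2]))
    return best[1]


functions = {
    131: lambda r: r,          # reference evaluation function (assumed linear)
    132: lambda r: r + 10,     # hypothetical functions for the other units
    133: lambda r: r - 5,
}
outputs = [(131, "result A", 70), (132, "result B", 65), (133, "result C", 78)]
print(select_best(outputs, functions))  # evaluation values 70, 75, 73
```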

[0050] Also, in each of the embodiments above, the reliability calculation means 139 and 140 were provided independently in the voice recognition processing units 131 and 132. However, when, for example, the voice recognition means 137 and 138 being combined use the same voice recognition technique and differ only in their standard patterns, the reliability is calculated in the same way. In such cases the same reliability calculation means 161 may be shared by the voice recognition means 137 and 138, as shown for example in FIG. 15.

[0051] Embodiments of the present invention have been described above, but the invention is not limited to these embodiments, and various other additions and modifications are possible. The functions of the voice recognition system of the present invention can of course be realized in hardware, and can also be realized by a computer and a voice recognition program. The voice recognition program is provided recorded on a computer-readable recording medium such as a magnetic disk or semiconductor memory, is read by the computer at startup or a similar time, and controls the operation of the computer, thereby causing the computer to function as the voice recognition system of each embodiment described above.

[0052]

[Effects of the Invention] As described above, the present invention provides a voice recognition system and a voice recognition method in which a plurality of voice recognition processing units can be combined complementarily even when the criteria and scales by which they generate their output reliabilities differ. In addition, the user can easily create the necessary evaluation functions from sample voice data and a reference evaluation function.

[Brief description of drawings]

【図１】本発明の第１の実施の形態の構成を示すブロッ
ク図である。FIG. 1 is a block diagram showing a configuration of a first exemplary embodiment of the present invention.

【図２】本発明で用いる基準評価関数の一例を示す図で
ある。FIG. 2 is a diagram showing an example of a reference evaluation function used in the present invention.

【図３】評価関数生成モードにおいて利用者が実施する
評価点のプロット作業の説明図である。FIG. 3 is an explanatory diagram of an evaluation point plotting operation performed by a user in an evaluation function generation mode.

【図４】評価関数生成モードにおいて利用者が実施する
評価点プロット作業の説明図である。FIG. 4 is an explanatory diagram of evaluation point plotting work performed by a user in an evaluation function generation mode.

【図５】評価関数生成モードにおいて利用者が実施する
評価点プロット作業の説明図である。FIG. 5 is an explanatory diagram of evaluation point plotting work performed by a user in an evaluation function generation mode.

【図６】本発明の第１の実施の形態における評価関数生
成モード時の処理例を示すフローチャートである。FIG. 6 is a flowchart showing a processing example in an evaluation function generation mode according to the first embodiment of the present invention.

【図７】サンプル音声データ全てについて実施された評
価点のプロットの集合とそれから作成された評価関数の
例を示す図である。FIG. 7 is a diagram showing an example of a set of evaluation point plots executed for all sample voice data and an evaluation function created from the set.

【図８】本発明の第１の実施の形態における音声認識モ
ード時の処理例を示すフローチャートである。FIG. 8 is a flowchart showing a processing example in a voice recognition mode according to the first embodiment of the present invention.

【図９】本発明の第１の実施の形態における音声認識モ
ード時の音声認識結果評価手段の動作説明図である。FIG. 9 is an operation explanatory diagram of the voice recognition result evaluation means in the voice recognition mode according to the first embodiment of the present invention.

【図１０】本発明の第１の実施の形態における音声認識
モード時の音声認識結果評価手段の動作説明図である。FIG. 10 is an operation explanatory diagram of the voice recognition result evaluation means in the voice recognition mode according to the first embodiment of the present invention.

【図１１】本発明の第１の実施の形態における音声認識
モード時の音声認識結果評価手段の動作説明図である。FIG. 11 is an operation explanatory diagram of the voice recognition result evaluation means in the voice recognition mode according to the first embodiment of the present invention.

FIG. 12 is a diagram explaining the operation of the voice recognition result evaluation means in the voice recognition mode according to the first embodiment of the present invention.

FIG. 13 is a diagram showing an example of an evaluation function used in the second embodiment of the present invention.

FIG. 14 is a block diagram showing the configuration of a voice recognition system according to the second embodiment of the present invention.

FIG. 15 is a block diagram showing a configuration in which the reliability calculation means is shared by a plurality of voice recognition means.

[Explanation of symbols]

101 ... Voice data input device
102 ... Data processing device
103 ... Character data output device
104 ... Mode changer
105 ... Display device
106 ... Input device
111 ... Input voice data
112 ... Sample voice data
121 ... Voice data input means
131, 132 ... Voice recognition processing unit
133 ... Voice recognition result evaluation means
134 ... Evaluation function
135 ... Evaluation function creating means
136 ... Voice recognition result plotting means
137, 138 ... Voice recognition means
139, 140 ... Reliability calculation means

Claims

[Claims]

1. A voice recognition system comprising: a plurality of voice recognition processing units, each of which performs voice recognition on an input voice according to its own recognition method and outputs a voice recognition result and a reliability indicating the certainty of that result; and voice recognition result evaluation means for converting the reliability output from each of the plurality of voice recognition processing units into an evaluation value, which is a scale common to the plurality of voice recognition processing units, by using an evaluation function prepared for each voice recognition processing unit, and for selecting and outputting a voice recognition result from the plurality of voice recognition results based on a comparison of the plurality of evaluation values obtained by the conversion.

2. The voice recognition system according to claim 1, wherein the evaluation function includes a reference evaluation function corresponding to any one voice recognition processing unit of the plurality of voice recognition processing units and an evaluation function corresponding to another voice recognition processing unit, and wherein the evaluation function corresponding to the other voice recognition processing unit yields, for the same reliability value, an evaluation value higher than that of the reference evaluation function in the range of reliability in which the voice recognition result of the other voice recognition processing unit is closer to the correct answer than the voice recognition result of the one voice recognition processing unit, and an evaluation value lower than that of the reference evaluation function in the range of reliability in which it is not closer to the correct answer.

3. The voice recognition system according to claim 2, further comprising: means for presenting to a user the voice recognition results obtained by the plurality of voice recognition processing units for sample voice data; means for displaying the reference evaluation function on a graph of reliability and evaluation value; means for plotting on the graph, according to an instruction from the user, an evaluation value to be given to the reliability obtained by the other voice recognition processing unit; and means for creating the evaluation function for the other voice recognition processing unit based on the set of plot points on the graph.

4. The voice recognition system according to claim 1, wherein the voice recognition processing unit comprises voice recognition means for performing voice recognition on an input voice and outputting a voice recognition result, and reliability calculation means for calculating a reliability indicating the certainty of the voice recognition result of the voice recognition means.

5. The voice recognition system according to claim 4, wherein the reliability calculation means is shared by a plurality of voice recognition processing units.

6. A voice recognition method comprising the steps of: a) performing, in parallel in a plurality of voice recognition processing units, voice recognition on an input voice according to each unit's own recognition method, and outputting from each unit a voice recognition result and a reliability indicating the certainty of that result; b) converting the reliability output from each of the plurality of voice recognition processing units into an evaluation value, which is a scale common to the plurality of voice recognition processing units, by using an evaluation function prepared for each voice recognition processing unit; and c) selecting and outputting a voice recognition result from the plurality of voice recognition results based on a comparison of the plurality of evaluation values obtained by the conversion.

7. The voice recognition method according to claim 6, wherein the evaluation function includes a reference evaluation function corresponding to any one voice recognition processing unit of the plurality of voice recognition processing units and an evaluation function corresponding to another voice recognition processing unit, and wherein the evaluation function corresponding to the other voice recognition processing unit yields, for the same reliability value, an evaluation value higher than that of the reference evaluation function in the range of reliability in which the voice recognition result of the other voice recognition processing unit is closer to the correct answer than the voice recognition result of the one voice recognition processing unit, and an evaluation value lower than that of the reference evaluation function in the range of reliability in which it is not closer to the correct answer.

8. The voice recognition method according to claim 7, further comprising the step of d) generating the evaluation function corresponding to the other voice recognition processing unit, wherein step d comprises: d-1) presenting to a user the voice recognition results obtained by the plurality of voice recognition processing units for sample voice data; d-2) displaying the reference evaluation function on a graph of reliability and evaluation value; d-3) plotting on the graph, according to an instruction from the user, an evaluation value to be given to the reliability obtained by the other voice recognition processing unit; d-4) repeating steps d-1 to d-3 for each item of prepared sample voice data; and d-5) creating the evaluation function for the other voice recognition processing unit based on the set of plot points on the graph.

9. A voice recognition program that causes a computer to function as: a plurality of voice recognition processing units, each of which performs voice recognition on an input voice according to its own recognition method and outputs a voice recognition result and a reliability indicating the certainty of that result; and voice recognition result evaluation means for converting the reliability output from each of the plurality of voice recognition processing units into an evaluation value, which is a scale common to the plurality of voice recognition processing units, by using an evaluation function prepared for each voice recognition processing unit, and for selecting and outputting a voice recognition result from the plurality of voice recognition results based on a comparison of the plurality of evaluation values obtained by the conversion.

10. The voice recognition program according to claim 9, wherein the evaluation function includes a reference evaluation function corresponding to any one voice recognition processing unit of the plurality of voice recognition processing units and an evaluation function corresponding to another voice recognition processing unit, and wherein the evaluation function corresponding to the other voice recognition processing unit yields, for the same reliability value, an evaluation value higher than that of the reference evaluation function in the range of reliability in which the voice recognition result of the other voice recognition processing unit is closer to the correct answer than the voice recognition result of the one voice recognition processing unit, and an evaluation value lower than that of the reference evaluation function in the range of reliability in which it is not closer to the correct answer.

11. The voice recognition program according to claim 10, which further causes the computer to function as: means for presenting to a user the voice recognition results obtained by the plurality of voice recognition processing units for sample voice data; means for displaying the reference evaluation function on a graph of reliability and evaluation value; means for plotting on the graph, according to an instruction from the user, an evaluation value to be given to the reliability obtained by the other voice recognition processing unit; and means for creating the evaluation function for the other voice recognition processing unit based on the set of plot points on the graph.
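
As an illustration only, the conversion and selection recited in the claims might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `reference_evaluation`, `make_evaluation_function`, and `select_result` are hypothetical, the identity reference function is an assumption, and piecewise-linear interpolation through the plot points is an assumed way of creating an evaluation function from them (the claims do not specify the fitting method).

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

EvalFn = Callable[[float], float]


def reference_evaluation(reliability: float) -> float:
    """Reference evaluation function (assumed here to be the identity)."""
    return reliability


def make_evaluation_function(points: Sequence[Tuple[float, float]]) -> EvalFn:
    """Create an evaluation function from (reliability, evaluation value) plot points.

    Assumption: piecewise-linear interpolation, clamped at both ends.
    """
    if not points:
        raise ValueError("at least one plot point is required")
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def evaluate(reliability: float) -> float:
        if reliability <= xs[0]:
            return ys[0]
        if reliability >= xs[-1]:
            return ys[-1]
        # xs[i-1] <= reliability < xs[i], so the segment width is non-zero.
        i = bisect_right(xs, reliability)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (reliability - x0) / (x1 - x0)

    return evaluate


def select_result(
    outputs: Sequence[Tuple[str, float]],
    evaluation_functions: Sequence[EvalFn],
) -> Tuple[str, float]:
    """Convert each unit's reliability to the common evaluation value and
    return the result with the highest value (ties go to the earlier unit)."""
    if not outputs or len(outputs) != len(evaluation_functions):
        raise ValueError("need one evaluation function per recognition output")
    scores: List[float] = [
        fn(reliability)
        for (_, reliability), fn in zip(outputs, evaluation_functions)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return outputs[best][0], scores[best]


# Hypothetical plot points for the second unit: rated below the reference
# at low reliability and above it at high reliability.
unit2_evaluation = make_evaluation_function([(0.0, 0.0), (0.5, 0.25), (1.0, 1.25)])
functions = [reference_evaluation, unit2_evaluation]

high_text, high_score = select_result([("hello", 0.8), ("hallo", 0.9)], functions)
low_text, low_score = select_result([("hello", 0.4), ("hallo", 0.45)], functions)
```

With these hypothetical plot points, the second unit's result is chosen when its reliability is high and the first unit's result is chosen when it is low, even though the second unit reports the larger raw reliability in both cases. A fitted curve could replace the interpolation without changing `select_result`.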