JP7214841B2

JP7214841B2 - THRESHOLD ADJUSTMENT DEVICE, THRESHOLD ADJUSTMENT METHOD, AND RECORDING MEDIUM

Info

Publication number: JP7214841B2
Application number: JP2021511407A
Authority: JP
Inventors: 健太長; 一彦阿部; 海亮李
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2019-03-29
Filing date: 2020-03-17
Publication date: 2023-01-30
Anticipated expiration: 2040-03-17
Also published as: WO2020203275A1; JPWO2020203275A1; CN111754995A; CN111754995B

Description

Embodiments of the present invention relate to a threshold adjustment device, a threshold adjustment method, and a recording medium.

Conventionally, there is known a speech recognition device that, when the input speech contains any of a plurality of predetermined recognition target words, outputs that recognition target word as the recognition result for the input speech. In such a device, how readily it responds to each recognition target word can be controlled by a threshold. For example, if the distance in the feature space between the acoustic feature calculated from the input speech and the acoustic feature of one of the recognition target words is equal to or less than the threshold set for that recognition target word, that recognition target word is output as the recognition result for the input speech. In this case, setting an appropriate threshold for each of the recognition target words allows the recognition target words contained in the input speech to be recognized correctly.

However, it is difficult to set, for each of the recognition target words, a threshold such that the device responds when that word is contained in the input speech but does not respond to other recognition target words or to noise. A mechanism that supports this kind of threshold adjustment is therefore needed.

Japanese Patent Application Laid-Open No. 2018-072599

The problem to be solved by the present invention is to provide a threshold adjustment device, a threshold adjustment method, and a recording medium that support threshold adjustment so that an appropriate threshold can be set for each of a plurality of predetermined recognition target words.

A threshold adjustment device according to an embodiment includes a speech recognition unit, an evaluation unit, and a display control unit. The speech recognition unit performs speech recognition. The evaluation unit inputs to the speech recognition unit a threshold list, whose elements are thresholds set individually for a plurality of predetermined recognition target words, together with evaluation speech. Based on the recognition results that the speech recognition unit outputs for the evaluation speech, the evaluation unit calculates, for each recognition target word, an evaluation value representing the recognition accuracy of the speech recognition unit when it uses the threshold list. The display control unit causes a display device to display a threshold adjustment screen for adjusting the threshold corresponding to any of the recognition target words. The threshold adjustment screen includes an accuracy list screen that presents the evaluation value calculated by the evaluation unit for each recognition target word together with an ideal value calculated in advance for that word.

With the threshold adjustment device configured as above, threshold adjustment can be supported so that an appropriate threshold can be set for each of a plurality of predetermined recognition target words.

FIG. 1 is a block diagram showing an example functional configuration of the threshold adjustment device according to the embodiment.
FIG. 2 is a diagram showing an example of the keyword table.
FIG. 3 is a diagram showing an example of the evaluation data table.
FIG. 4 is a diagram showing an example of the threshold list table.
FIG. 5 is a diagram showing an example of the evaluation result table.
FIG. 6 is a diagram showing an example of the ideal value table.
FIG. 7 is a flowchart showing an example of the operation of the threshold adjustment device according to the embodiment.
FIG. 8 is a diagram showing an example of the accuracy list screen.
FIG. 9 is a diagram showing an example of the misrecognition analysis screen.
FIG. 10 is a diagram showing an example of the initial evaluation result screen.
FIG. 11 is a block diagram showing an example functional configuration of a threshold adjustment device according to a modification.
FIG. 12 is a diagram showing an example of the accuracy list screen according to the modification.
FIG. 13 is a block diagram showing an example hardware configuration of the threshold adjustment device.

Specific embodiments of the present invention are described in detail below with reference to the drawings. The following embodiments assume application to voice-trigger speech recognition, which responds only to a few dozen specific keywords (recognition target words).

FIG. 1 is a block diagram showing an example functional configuration of the threshold adjustment device of this embodiment. As shown in FIG. 1, the threshold adjustment device of this embodiment includes a speech recognition unit 1, an evaluation unit 2, and a display control unit 3.

The speech recognition unit 1 performs speech recognition processing on input speech using a speech recognition model 10. The speech recognition model 10 used in this embodiment includes an acoustic model that analyzes the characteristics of sound and a group of keywords predetermined as the recognition targets of the speech recognition model 10. The keyword group included in the speech recognition model 10 is registered in a keyword table 20.

To control how readily the speech recognition model 10 responds to each keyword it recognizes, a threshold list is input to the speech recognition unit 1. The threshold list is a list whose elements are thresholds set individually for each keyword. In this embodiment, the distance in the feature space serves as the measure of similarity between acoustic features, and the speech recognition unit 1 outputs a keyword when the distance between the acoustic feature of the input speech and that of the keyword is equal to or less than the keyword's threshold. Consequently, the higher a threshold is set, the more readily the unit responds to the corresponding keyword. Thresholds take values from 0 to 1, and the distance in the feature space is normalized to a value from 0 to 1 before being compared with the threshold. Threshold lists are registered in a threshold list table 40.

For example, the speech recognition unit 1 calculates an acoustic feature from the input speech using the acoustic model and identifies, among the keywords to be recognized, the keyword whose acoustic feature is closest in the feature space to the acoustic feature of the input speech. Then, if the distance in the feature space between the acoustic feature of the input speech and that of the identified keyword is equal to or less than the threshold set for the identified keyword, the speech recognition unit 1 outputs the identified keyword as the recognition result for the input speech.
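The decision rule described above (pick the nearest keyword, then accept it only if its normalized distance is within that keyword's threshold) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names `recognize` and `normalize`, the use of Euclidean distance, and the `d / (1 + d)` mapping used to normalize distances into the range 0 to 1 are all assumptions, since the document does not say how acoustic features are compared or normalized. Returning `None` stands for the "no keyword" outcome that the evaluation counts as a rejection.

```python
import math
from typing import Dict, Optional, Sequence


def normalize(distance: float) -> float:
    """Hypothetical monotone mapping of a raw distance in [0, inf) into [0, 1)."""
    return distance / (1.0 + distance)


def recognize(
    input_feature: Sequence[float],
    keyword_features: Dict[str, Sequence[float]],
    thresholds: Dict[str, float],
) -> Optional[str]:
    """Return the nearest keyword if it passes its own threshold, else None (rejection)."""
    # Step 1: find the keyword whose acoustic feature is closest to the input.
    nearest = min(
        keyword_features,
        key=lambda kw: math.dist(input_feature, keyword_features[kw]),
    )
    # Step 2: compare the normalized distance with that keyword's own threshold.
    distance = normalize(math.dist(input_feature, keyword_features[nearest]))
    return nearest if distance <= thresholds[nearest] else None


if __name__ == "__main__":
    features = {"on": (0.0, 0.0), "off": (1.0, 1.0)}
    print(recognize((0.1, 0.0), features, {"on": 0.5, "off": 0.5}))  # on
    print(recognize((5.0, 5.0), features, {"on": 0.5, "off": 0.5}))  # None
    print(recognize((5.0, 5.0), features, {"on": 0.5, "off": 0.9}))  # off
```

The last two calls show the effect described in the text: raising the threshold for "off" admits a larger distance, so the same input that was rejected is now recognized.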

The evaluation unit 2 inputs to the speech recognition unit 1 a threshold list registered as needed in the threshold list table 40, together with the speech of the evaluation data registered in an evaluation data table 30 (evaluation speech). Based on the recognition results that the speech recognition unit 1 outputs for the evaluation speech, the evaluation unit 2 calculates, for each keyword, an evaluation value representing the recognition accuracy of the speech recognition unit 1 when it uses that threshold list. The evaluation results of the evaluation unit 2 are recorded in an evaluation result table 50.

In addition, the evaluation unit 2 calculates an ideal evaluation value (ideal value) for each keyword in advance through an initial evaluation, described later. The ideal value for each keyword calculated in advance by the evaluation unit 2 is recorded in an ideal value table 60.

The display control unit 3 causes any display device to display a threshold adjustment screen for adjusting the threshold corresponding to any keyword in the keyword group recognized by the speech recognition model 10. The threshold adjustment screen includes an accuracy list screen 70 (see FIG. 8), a misrecognition analysis screen 80 (see FIG. 9), and an initial evaluation result screen 90 (see FIG. 10), which are described later.

FIG. 2 is a diagram showing an example of the keyword table 20. The keyword table 20 is a table in which each keyword to be recognized by the speech recognition model 10 is registered. As shown in FIG. 2, the keyword table 20 includes an ID uniquely assigned to each keyword and the text of each keyword.

FIG. 3 is a diagram showing an example of the evaluation data table 30. The evaluation data table 30 is a table in which the evaluation data used for evaluation by the evaluation unit 2 is registered. Each item of evaluation data consists of speech and the keyword contained in that speech; that is, an item of evaluation data is a pair of an evaluation speech containing a keyword predetermined as a recognition target of the speech recognition model 10 and the correct keyword that should be recognized from that evaluation speech. As shown in FIG. 3, the evaluation data table 30 includes an ID uniquely assigned to each item of evaluation data, the file name of the speech, and a keyword ID. The keyword ID indicates the ID of the keyword contained in the speech.

FIG. 4 is a diagram showing an example of the threshold list table 40. The threshold list table 40 is a table in which the threshold lists input to the speech recognition unit 1 are registered. As shown in FIG. 4, the threshold list table 40 includes an ID uniquely assigned to each threshold list and the threshold list itself. As described above, a threshold list is a list whose elements are thresholds set individually for each keyword to be recognized by the speech recognition model 10.

The threshold lists registered in the threshold list table 40 are of three kinds: initial evaluation threshold lists used in the initial evaluation by the evaluation unit 2, an initial threshold list obtained from the initial evaluation by the evaluation unit 2, and adjusted threshold lists generated using the threshold adjustment screen. In the example of FIG. 4, the threshold lists with IDs "1" and "2" are assumed to be initial evaluation threshold lists, the threshold list with ID "19" the initial threshold list, and the threshold list with ID "20" an adjusted threshold list.

An initial evaluation threshold list sets a common threshold for every keyword recognized by the speech recognition model 10; that is, all of its elements have the same value. Multiple initial evaluation threshold lists, each with a different common value, are registered in the threshold list table 40.

The initial threshold list is a threshold list in which the threshold set individually for each keyword is the threshold of the initial evaluation threshold list that was in use when that keyword's ideal value was obtained in the initial evaluation. The initial threshold list is generated from the results of the initial evaluation by the evaluation unit 2 and registered in the threshold list table 40.

An adjusted threshold list is a threshold list obtained when the user adjusts the threshold corresponding to any recognition target word using the threshold adjustment screen. Adjusted threshold lists are generated as needed in response to operations on the threshold adjustment screen and registered in the threshold list table 40.

FIG. 5 is a diagram showing an example of the evaluation result table 50. The evaluation result table 50 is a table in which the evaluation results of the evaluation unit 2 are recorded. As shown in FIG. 5, the evaluation result table 50 includes an ID uniquely assigned to each evaluation result, a threshold list ID, a keyword ID, an accuracy rate, a rejection rate, and misrecognized keywords.

The threshold list ID is the ID of the threshold list input to the speech recognition unit 1 during the evaluation. The keyword ID is the ID of the keyword being evaluated. Here, the recognition accuracy of the speech recognition unit 1 is evaluated per keyword for a given threshold list, and the evaluation result for each combination of threshold list and keyword is recorded as one entry in the evaluation result table 50.

The accuracy rate counts a result as correct when the speech recognition unit 1 outputs the evaluated keyword for evaluation speech containing that keyword, and is calculated as (number of correct results / total number of evaluation data items containing that keyword) × 100 (%). The rejection rate counts a result as a rejection when the speech recognition unit 1 outputs no keyword, and is calculated as (number of rejections / total number of evaluation data items containing the evaluated keyword) × 100 (%). The accuracy rate and rejection rate are recorded in the evaluation result table 50 as evaluation values representing the recognition accuracy of the speech recognition unit 1 for the keyword when it uses the threshold list.

Besides correct results and rejections, the speech recognition unit 1 may output a keyword different from the correct one, that is, misrecognize the speech as another keyword. When this happens, the evaluation unit 2 counts the number of misrecognitions for each misrecognized keyword, and pairs of a misrecognized keyword's ID and its misrecognition count are recorded as an array in the misrecognized keyword field of the evaluation result table 50.
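The accuracy rate, rejection rate, and per-keyword misrecognition counts defined above can be computed from (correct keyword, recognition result) pairs roughly as follows. This is a sketch only: the function name `evaluate` and the dictionary layout of its result are hypothetical, and `None` stands for a "no keyword" result.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def evaluate(pairs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, dict]:
    """pairs: (correct keyword, recognized keyword or None) for each evaluation utterance."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    rejected: Counter = Counter()
    misrecognized: Dict[str, Counter] = defaultdict(Counter)
    for truth, result in pairs:
        totals[truth] += 1
        if result == truth:
            correct[truth] += 1                # correct recognition
        elif result is None:
            rejected[truth] += 1               # rejection: no keyword was output
        else:
            misrecognized[truth][result] += 1  # misrecognition as another keyword
    # Rates are percentages of the evaluation items that contain each keyword.
    return {
        kw: {
            "accuracy_rate": correct[kw] / n * 100,
            "rejection_rate": rejected[kw] / n * 100,
            "misrecognized": dict(misrecognized[kw]),
        }
        for kw, n in totals.items()
    }
```

Because every utterance ends up in exactly one of the three outcomes, a keyword's accuracy rate, rejection rate, and share of misrecognitions always add up to 100%.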

FIG. 6 is a diagram showing an example of the ideal value table 60. The ideal value table 60 is a table in which the ideal value for each keyword, calculated by the evaluation unit 2 through the initial evaluation, is recorded. The ideal value table 60 includes an ID uniquely assigned to each keyword and the accuracy rate and rejection rate that make up that keyword's ideal value. The recorded accuracy rate is the highest of the accuracy rates obtained in the initial evaluation, and the recorded rejection rate is the one obtained in the same initial evaluation run that yielded that highest accuracy rate.

Next, the operation of the threshold adjustment device of this embodiment is described with reference to the flowchart of FIG. 7. FIG. 7 is a flowchart showing an example of the operation of the threshold adjustment device of this embodiment.

First, data is registered in the threshold adjustment device (step S101). The data registered here consists of the speech recognition model 10 and the evaluation data. The speech recognition model 10 is registered, for example, by uploading a JSON file containing the file name of the model and the character string of each keyword to be recognized. The model file may be stored inside the device in advance or uploaded separately. The registered speech recognition model 10 is used by the speech recognition unit 1 for speech recognition processing, and each recognition-target keyword included in the speech recognition model 10 is registered in the keyword table 20.

The evaluation data is registered, for example, by uploading a JSON file containing the file names of multiple evaluation speech files and the character string of the keyword contained in each. The evaluation speech files may be stored inside the device in advance or uploaded separately. The file name and keyword ID of each uploaded evaluation speech are registered in the evaluation data table 30.
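As an illustration of step S101, the sketch below turns two registration JSON documents into rows for the keyword table 20 and the evaluation data table 30. The document only says that the JSON files contain the model file name, the keyword strings, and the evaluation speech file names with their keywords. The field names used here (`model_file`, `keywords`, `audio_file`, `keyword`), the function name `register`, and the sequential ID assignment are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def register(model_json: str, eval_json: str) -> Tuple[str, Dict[int, str], List[dict]]:
    """Parse hypothetical registration JSON into keyword-table and evaluation-data-table rows."""
    model = json.loads(model_json)
    # Keyword table 20: a unique ID for each keyword text.
    keyword_table = {i: text for i, text in enumerate(model["keywords"], start=1)}
    id_of = {text: i for i, text in keyword_table.items()}
    # Evaluation data table 30: ID, speech file name, and the ID of the keyword it contains.
    # A keyword that is not in the model raises KeyError, flagging an inconsistent upload.
    eval_rows = [
        {"id": i, "audio_file": item["audio_file"], "keyword_id": id_of[item["keyword"]]}
        for i, item in enumerate(json.loads(eval_json), start=1)
    ]
    return model["model_file"], keyword_table, eval_rows
```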

When data registration is complete, the evaluation unit 2 performs the initial evaluation (step S102). In the initial evaluation, the evaluation unit 2 repeatedly inputs to the speech recognition unit 1 an initial evaluation threshold list that sets a common threshold for every keyword, together with the speech of each item of evaluation data registered in the evaluation data table 30 (evaluation speech), while changing the threshold in the initial evaluation threshold list at a fixed interval. For example, the threshold is varied from a starting value of 0.1 up to 0.95 in steps of 0.05. These initial evaluation threshold lists are input to the speech recognition unit 1 in turn and registered in the threshold list table 40.
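The sweep of common thresholds in step S102 can be generated as below. With the example values from the text (0.1 to 0.95 in steps of 0.05) it yields 18 initial evaluation threshold lists. The function name and the choice to key the lists by sequential IDs starting at 1 are assumptions for illustration; thresholds are rounded to two decimals to avoid floating-point drift such as 0.15000000000000002.

```python
from typing import Dict, List


def initial_threshold_lists(
    num_keywords: int, start: float = 0.10, stop: float = 0.95, step: float = 0.05
) -> Dict[int, List[float]]:
    """Return {list ID: threshold list}, each list using one common threshold for every keyword."""
    n_steps = round((stop - start) / step) + 1  # number of thresholds in the sweep, endpoints included
    return {
        list_id: [round(start + step * i, 2)] * num_keywords
        for list_id, i in enumerate(range(n_steps), start=1)
    }
```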

Each time it inputs an initial evaluation threshold list and the evaluation speech to the speech recognition unit 1, the evaluation unit 2 evaluates the per-keyword recognition accuracy of the speech recognition unit 1 using the thresholds of that list. This per-keyword evaluation is performed by comparing the recognition results output by the speech recognition unit 1 with the keywords included in the evaluation data and calculating the accuracy rate and rejection rate (evaluation values) for each keyword. In other words, the evaluation unit 2 repeatedly inputs the initial evaluation threshold lists, with their thresholds varied as described above, and the evaluation speech to the speech recognition unit 1, and repeatedly calculates the evaluation value for each keyword from the recognition results the speech recognition unit 1 outputs for the evaluation speech. The results of this initial evaluation are recorded in the evaluation result table 50 together with the IDs of the initial evaluation threshold lists registered in the threshold list table 40.

After the initial evaluation, the evaluation unit 2 searches the evaluation result table 50, for each keyword, for the evaluation result with the best accuracy rate and records that accuracy rate and rejection rate as the ideal value in the ideal value table 60 together with the keyword ID. The evaluation unit 2 also registers in the threshold list table 40, as the initial threshold list, a threshold list whose elements are the per-keyword thresholds at which the best accuracy rate was obtained.
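Deriving the ideal values and the initial threshold list from the sweep amounts to an arg-max over thresholds for each keyword, as in the sketch below. The input layout (threshold mapped to keyword mapped to (accuracy rate, rejection rate)) and the function name are hypothetical, and the document does not say how ties in accuracy are broken; this sketch keeps the lowest threshold that reaches the best accuracy.

```python
from typing import Dict, Tuple

Metrics = Tuple[float, float]  # (accuracy rate %, rejection rate %)


def ideal_values(
    sweep: Dict[float, Dict[str, Metrics]]
) -> Tuple[Dict[str, Metrics], Dict[str, float]]:
    """Return (ideal value per keyword, threshold per keyword for the initial threshold list)."""
    ideal: Dict[str, Metrics] = {}
    initial: Dict[str, float] = {}
    for threshold in sorted(sweep):
        for kw, (acc, rej) in sweep[threshold].items():
            # Strict ">" keeps the lowest threshold on ties (an assumed tie-break).
            if kw not in ideal or acc > ideal[kw][0]:
                ideal[kw] = (acc, rej)  # best accuracy plus the rejection rate from the same run
                initial[kw] = threshold
    return ideal, initial
```

Ordering the `initial` mapping by keyword ID then gives the elements of the initial threshold list.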

Next, the evaluation unit 2 performs an evaluation using the initial threshold list (step S103). As in the initial evaluation, the evaluation unit 2 inputs the threshold list (here, the initial threshold list) and the evaluation speech to the speech recognition unit 1, calculates the accuracy rate and rejection rate (evaluation values) for each keyword from the recognition results output by the speech recognition unit 1, and records the evaluation results in the evaluation result table 50.

When the evaluation using the initial threshold list is complete, the display control unit 3 displays the threshold adjustment screen on a display device, such as the display of the terminal used by the user, and adjusts thresholds according to the user's operations on this screen (step S104).

First, the display control unit 3 causes the display device to display, as the threshold adjustment screen, an accuracy list screen 70 such as the one shown in FIG. 8. The accuracy list screen 70 presents the evaluation value calculated by the evaluation unit 2 for each recognition-target keyword together with the ideal value recorded in the ideal value table 60.

On the accuracy list screen 70 shown in FIG. 8, a graph display 71 presents the evaluation value of each keyword under the threshold list input to the speech recognition unit 1 (here, the initial threshold list) together with its ideal value. In the graph display 71, the recognition-target keywords are arranged along the horizontal axis, and each keyword's evaluation values (accuracy rate and rejection rate) and ideal values (accuracy rate and rejection rate) are plotted at the corresponding positions on the vertical axis. In the figure, white circles indicate the accuracy rate under the initial threshold list, black circles the ideal accuracy rate, white triangles the rejection rate under the initial threshold list, and black triangles the ideal rejection rate. The evaluation values for each keyword are obtained by searching the evaluation result table 50 with the ID of the initial threshold list and the keyword ID as keys, and the ideal value for each keyword is obtained by searching the ideal value table 60 with the keyword ID as the key.

A keyword's evaluation value under the initial threshold list can differ from its ideal value because of interactions between keywords: when the ideal value was obtained, every other keyword was given the same threshold, whereas the initial threshold list gives the other keywords different thresholds. By referring to the graph display 71 on the accuracy list screen 70, the user can easily identify keywords whose evaluation value under the initial threshold list falls below the ideal value ("on" in the example of FIG. 8).
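The comparison a user makes by eye on the graph display 71 can also be expressed as a small check that lists the keywords whose current accuracy rate falls short of their ideal accuracy rate, largest shortfall first. The function name `below_ideal` and the optional `tolerance` parameter are illustrative assumptions, not features described in the document.

```python
from typing import Dict, List


def below_ideal(
    current: Dict[str, float], ideal: Dict[str, float], tolerance: float = 0.0
) -> List[str]:
    """Keywords whose current accuracy rate is more than `tolerance` points below the ideal one."""
    shortfall = {kw: ideal[kw] - acc for kw, acc in current.items()}
    flagged = [kw for kw, gap in shortfall.items() if gap > tolerance]
    # Largest shortfall first, i.e. the keywords most affected by interactions.
    return sorted(flagged, key=lambda kw: shortfall[kw], reverse=True)
```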

When the user clicks to select a keyword on the accuracy list screen 70 and presses the "False Detection Analysis" button 72, the threshold adjustment screen switches to a misrecognition analysis screen 80 such as the one shown in FIG. 9. For the keyword selected on the accuracy list screen 70, the misrecognition analysis screen 80 presents, for each misrecognized keyword, the number of times the speech recognition unit 1 output that other keyword as the recognition result for evaluation speech containing the selected keyword, that is, the number of misrecognitions by the speech recognition unit 1.

On the misrecognition analysis screen 80 shown in FIG. 9, a graph display 81 presents the number of misrecognitions for each misrecognized keyword. In the graph display 81, the recognition-target keywords are arranged along the horizontal axis, and each misrecognized keyword is shown as a bar extending up to its misrecognition count on the vertical axis. The misrecognition counts are obtained by searching the evaluation result table 50 with the ID of the keyword selected on the accuracy list screen 70 as the key. By referring to the misrecognition analysis screen 80, the user can easily identify the keyword whose threshold should be lowered to prevent misrecognition ("music" in the example of FIG. 9).
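The data behind the graph display 81 could be prepared by summing the (keyword ID, count) pairs stored in the misrecognized keyword field and sorting them, so that the keyword most often output in place of the selected one, a natural first candidate for a lower threshold, comes first. The function name and argument layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def misrecognition_ranking(
    pairs: Iterable[Tuple[int, int]], keyword_table: Dict[int, str]
) -> List[Tuple[str, int]]:
    """pairs: (misrecognized keyword ID, count) entries; returns (keyword text, total), most frequent first."""
    counts: Counter = Counter()
    for keyword_id, n in pairs:
        counts[keyword_id] += n  # merge repeated entries for the same keyword
    return [(keyword_table[kid], n) for kid, n in counts.most_common()]
```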

When the user presses a "Back" button 82 on the misrecognition analysis screen 80, the threshold adjustment screen returns to the accuracy list screen 70 shown in FIG. 8. When the user then clicks to select a keyword on the accuracy list screen 70 and presses an "Initial Evaluation Result" button 73, the threshold adjustment screen switches to an initial evaluation result screen 90 such as the one shown in FIG. 10. For the selected keyword, the initial evaluation result screen 90 presents a list of the evaluation values (accuracy rate and rejection rate) calculated by the evaluation unit 2 for each threshold of the initial evaluation threshold lists used in the initial evaluation.

On the initial evaluation result screen 90 shown in FIG. 10, a graph display 91 presents the evaluation values for each threshold of the initial evaluation threshold lists used in the initial evaluation. In the graph display 91, the thresholds of the initial evaluation threshold lists are arranged along the horizontal axis, and the accuracy rate and rejection rate for each threshold are shown as bars extending to the corresponding positions on the vertical axis; white bars indicate the accuracy rate and black bars the rejection rate for each threshold. The accuracy rate and rejection rate for each threshold are obtained by searching the evaluation result table 50 with the keyword ID and the ID of the initial evaluation threshold list as keys. By referring to the initial evaluation result screen 90, the user can see, for example, the range within which the threshold can be adjusted: the threshold can reasonably be set anywhere the accuracy rate is sufficiently high and the rejection rate sufficiently low (0.45 to 0.75 in the example of FIG. 10).
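Picking out the adjustable range from the per-threshold results shown on the initial evaluation result screen 90 could look like the sketch below. The document leaves "sufficiently high" and "sufficiently low" unquantified, so the cutoffs `min_accuracy=90.0` and `max_rejection=10.0` are placeholders, as is the function name.

```python
from typing import Dict, List, Tuple


def adjustable_thresholds(
    per_threshold: Dict[float, Tuple[float, float]],
    min_accuracy: float = 90.0,
    max_rejection: float = 10.0,
) -> List[float]:
    """Thresholds whose (accuracy rate %, rejection rate %) pass both placeholder cutoffs, ascending."""
    return [
        t
        for t in sorted(per_threshold)
        if per_threshold[t][0] >= min_accuracy and per_threshold[t][1] <= max_rejection
    ]
```

The qualifying thresholds need not be contiguous, so returning the full list rather than only its endpoints lets the caller check for gaps.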

When the user clicks the desired threshold on the initial evaluation result screen 90 and presses the "Set Threshold" button 92, the threshold for the keyword selected on the accuracy list screen 70 is changed to the threshold selected on the initial evaluation result screen 90, and the threshold adjustment screen returns to the accuracy list screen 70. To change the threshold of another keyword identified on the misrecognition analysis screen 80, the user selects that keyword on the accuracy list screen 70, selects a threshold on the initial evaluation result screen 90, and presses the "Set Threshold" button 92. The user can thus adjust the threshold of each keyword to an appropriate value using the accuracy list screen 70, the misrecognition analysis screen 80, and the initial evaluation result screen 90, which the display device shows as the threshold adjustment screen.

After performing the same operation for every keyword whose threshold is to be changed, the user presses the "Re-evaluate" button 74 on the accuracy list screen 70 (step S105: No). A new threshold list reflecting the changed thresholds is then registered in the threshold list table 40 as an adjusted threshold list. The operation flow of the threshold adjustment device returns to step S103, where the evaluation unit 2 evaluates again using the adjusted threshold list and records the result in the evaluation result table 50. The accuracy list screen 70 is then displayed and presents the evaluation value of each keyword under the adjusted threshold list together with its ideal value. The evaluation values of each keyword under the initial threshold list may remain on the screen at the same time. In that case, the evaluation values for the initial threshold list and those for the adjusted threshold list should preferably be clearly distinguishable, for example by color.

The user repeats the above operations until an appropriate evaluation result is obtained for every keyword to be recognized. After confirming this, the user presses the "End" button 75 on the accuracy list screen 70 (step S105: Yes), which ends the series of operations of the threshold adjustment device. At this point, the latest adjusted threshold list may be distributed to a specified external destination as the optimal threshold list for the speech recognition model 10 registered in step S101. Alternatively, the latest adjusted threshold list may be stored inside the threshold adjustment device as the optimal threshold list for the speech recognition model 10 registered in step S101 and made accessible from outside as needed.

As described in detail above with specific examples, the threshold adjustment device of this embodiment uses a threshold list whose elements are thresholds set individually for each of a plurality of keywords predetermined as recognition targets. For each keyword, it calculates an evaluation value representing the recognition accuracy obtained with that threshold list. It then causes the display device to display the accuracy list screen 70, which presents the calculated evaluation values together with the ideal values. By referring to the accuracy list screen 70, the user can easily identify the keywords whose thresholds should be changed.

In addition, when a keyword is selected on the accuracy list screen 70, the threshold adjustment device of this embodiment causes the display device to display the misrecognition analysis screen 80 and the initial evaluation result screen 90. The misrecognition analysis screen 80 lets the user easily identify which other keywords the selected keyword tends to be misrecognized as. The initial evaluation result screen 90 lets the user easily see the range over which the selected keyword's threshold can be changed, so the threshold can be changed appropriately.

In this way, the threshold adjustment device of this embodiment supports threshold adjustment so that an appropriate threshold can be set for each of a plurality of keywords predetermined as recognition targets.

<Modification 1>
The threshold adjustment device described above may also be configured to adjust thresholds automatically. FIG. 11 is a block diagram showing an example functional configuration of the threshold adjustment device of this modification, which adds an automatic adjustment unit 4 to the configuration shown in FIG. 1.

In this modification, when the evaluation using the initial threshold list is complete, the display control unit 3 first causes the display device to display, as the threshold adjustment screen, the accuracy list screen 70 shown in FIG. 12. This accuracy list screen 70 adds an "Automatic Adjustment" button 76 to the accuracy list screen 70 shown in FIG. 8. When the user presses the "Automatic Adjustment" button 76 on this screen, the automatic adjustment unit 4 starts.

Once started, the automatic adjustment unit 4 first selects a keyword whose threshold is to be adjusted (first recognition target word), based on the difference between the evaluation value and the ideal value presented on the accuracy list screen 70. For example, the automatic adjustment unit 4 selects the keyword whose evaluation value falls furthest below its ideal value. It then raises the threshold of the selected keyword, based on the per-threshold evaluation values presented on the initial evaluation result screen 90 for that keyword, within a range in which the keyword's accuracy rate does not decrease and its rejection rate does not increase.
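A minimal Python sketch of this first step follows. It assumes that the accuracy rate serves as the single evaluation value compared with the ideal value. It also assumes that the threshold moves up by one step of the initial evaluation grid per iteration, because the description does not specify a step size. All names and data are hypothetical.

```python
from typing import Dict, Tuple

Curve = Dict[float, Tuple[float, float]]  # threshold -> (accuracy, rejection)


def select_first_keyword(evaluated: Dict[str, float], ideal: Dict[str, float]) -> str:
    """Pick the keyword whose evaluation value falls furthest below its ideal value."""
    return max(evaluated, key=lambda kw: ideal[kw] - evaluated[kw])


def raise_threshold(curve: Curve, current: float) -> float:
    """Move one step up the initial-evaluation threshold grid, but only if the
    accuracy rate does not decrease and the rejection rate does not increase."""
    grid = sorted(curve)
    i = grid.index(current)  # assumes current is one of the grid thresholds
    if i + 1 >= len(grid):
        return current  # already at the top of the grid
    acc_now, rej_now = curve[current]
    acc_up, rej_up = curve[grid[i + 1]]
    if acc_up >= acc_now and rej_up <= rej_now:
        return grid[i + 1]
    return current


evaluated = {"kw01": 0.80, "kw02": 0.95, "kw03": 0.90}  # hypothetical
ideal = {"kw01": 0.95, "kw02": 0.96, "kw03": 0.93}      # hypothetical
first = select_first_keyword(evaluated, ideal)
curve = {0.45: (0.88, 0.06), 0.60: (0.90, 0.05), 0.75: (0.90, 0.05), 0.90: (0.85, 0.04)}
print(first, raise_threshold(curve, 0.60))  # kw01 0.75
```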

The automatic adjustment unit 4 also selects a second keyword whose threshold is to be adjusted (second recognition target word): the keyword that the first keyword was most often misrecognized as on the corresponding misrecognition analysis screen 80. It then lowers the threshold of this second keyword, based on the per-threshold evaluation values presented on the initial evaluation result screen 90 for that keyword, within a range in which the keyword's accuracy rate does not decrease and its rejection rate does not increase.
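The second step can be sketched in the same hedged way. The sketch assumes that evaluation results are available as (keyword spoken, keyword recognized or `None`) pairs and that the threshold moves down by one grid step per iteration. The function names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Curve = Dict[float, Tuple[float, float]]  # threshold -> (accuracy, rejection)
# (keyword actually spoken, keyword recognized or None if rejected)
Result = Tuple[str, Optional[str]]


def select_second_keyword(results: List[Result], first: str) -> Optional[str]:
    """Return the keyword that utterances of `first` were most often misrecognized as."""
    confusions = Counter(
        rec for spoken, rec in results
        if spoken == first and rec is not None and rec != first
    )
    if not confusions:
        return None
    return confusions.most_common(1)[0][0]


def lower_threshold(curve: Curve, current: float) -> float:
    """Move one step down the threshold grid if accuracy does not decrease
    and rejection does not increase; otherwise keep the current threshold."""
    grid = sorted(curve)
    i = grid.index(current)
    if i == 0:
        return current  # already at the bottom of the grid
    acc_now, rej_now = curve[current]
    acc_down, rej_down = curve[grid[i - 1]]
    if acc_down >= acc_now and rej_down <= rej_now:
        return grid[i - 1]
    return current


results = [
    ("kw01", "kw01"), ("kw01", "kw02"), ("kw01", "kw02"),
    ("kw01", "kw03"), ("kw01", None), ("kw02", "kw02"),
]
second = select_second_keyword(results, "kw01")
curve_kw02 = {0.45: (0.95, 0.02), 0.60: (0.96, 0.03), 0.75: (0.96, 0.03)}
print(second, lower_threshold(curve_kw02, 0.75))  # kw02 0.6
```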

The automatic adjustment unit 4 repeats the above operations a specified number of times. It then registers a new threshold list reflecting the changed thresholds in the threshold list table 40 as an adjusted threshold list. As in the embodiment described above, the evaluation unit 2 evaluates again using the adjusted threshold list and records the result in the evaluation result table 50. The accuracy list screen 70 is then displayed and presents the evaluation value of each keyword under the adjusted threshold list together with its ideal value.

As described above, this modification automatically selects the keywords whose thresholds should be adjusted and adjusts the corresponding thresholds. This has the particular effect of reducing the user's operational burden.

<Modification 2>
In the embodiment described above, the evaluation speech is assumed to contain one of the keywords to be recognized by the speech recognition model 10. However, the evaluation speech may also include noise speech, either containing no keyword or containing other words that are not recognition targets of the speech recognition model 10. When such noise speech is added, "n/a" (no match) is recorded as the keyword ID of the corresponding entry in the evaluation data table 30. For such noise speech, the correct recognition result from the speech recognition unit 1 is no keyword (rejection).

In this modification, the evaluation performed after the initial evaluation (step S103 in FIG. 7) inputs noise speech to the speech recognition unit 1 in addition to the evaluation speech containing keywords. For each keyword to be recognized, it records how many times the speech recognition unit 1 misrecognized noise speech as that keyword (false responses). The accuracy list screen 70 shown in FIG. 8 then presents this noise misrecognition count alongside the evaluation value and ideal value for each keyword. By referring to this accuracy list screen 70, the user can easily identify keywords that noise speech tends to be misrecognized as, select such a keyword for threshold adjustment, and adjust its threshold appropriately as in the embodiment described above.
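A small sketch of the per-keyword noise count follows. It assumes that each evaluation result pairs the keyword ID from the evaluation data table 30 with the keyword recognized (or `None` when the input was rejected). The "n/a" marker comes from the description. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

NO_KEYWORD = "n/a"  # keyword ID recorded for noise entries in the evaluation data table


def count_noise_false_triggers(
    results: Iterable[Tuple[str, Optional[str]]],
) -> Counter:
    """results: (keyword ID from the evaluation data, keyword recognized or None).
    Counts, per keyword, how often noise speech was misrecognized as that keyword."""
    return Counter(
        recognized
        for keyword_id, recognized in results
        if keyword_id == NO_KEYWORD and recognized is not None
    )


results = [
    ("n/a", None),     # correctly rejected
    ("n/a", "kw03"),   # false trigger
    ("n/a", "kw03"),
    ("n/a", "kw01"),
    ("kw01", "kw01"),  # keyword utterance: not counted here
]
print(count_noise_false_triggers(results))  # Counter({'kw03': 2, 'kw01': 1})
```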

As described above, this modification lets the user accurately identify keywords that noise speech tends to be misrecognized as. This has the particular effect of supporting threshold adjustment more effectively.

<Modification 3>
In the embodiment described above, the adjustable range of the threshold is the range on the initial evaluation result screen 90 (for the keyword selected for threshold adjustment) where the accuracy rate is sufficiently high and the rejection rate is sufficiently low. However, this range is derived only from the evaluation unit 2's results on the evaluation data. If a threshold near the boundary of this range is chosen as the adjusted threshold, recognition accuracy may drop in actual speech recognition, where the input speech is more diverse.

Therefore, a warning may be displayed when the user tries to set, on the initial evaluation result screen 90, an adjusted threshold next to which the accuracy rate drops sharply or the rejection rate rises sharply. The warning indicates that the threshold adjustment may reduce accuracy. It may appear on the initial evaluation result screen 90 or on the accuracy list screen 70 that follows it. Whether the accuracy rate drops sharply or the rejection rate rises sharply at an adjacent threshold can be judged against a predetermined condition. For example, the condition may be that the accuracy rate at the adjacent threshold is lower by 10% or more, or that the rejection rate at the adjacent threshold is higher by 10% or more.
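The example condition can be checked as sketched below. The sketch interprets "10%" as 10 percentage points on rates expressed as fractions in [0, 1]. A relative 10% change would be an equally plausible reading. The names and data are hypothetical.

```python
from typing import Dict, Tuple

Curve = Dict[float, Tuple[float, float]]  # threshold -> (accuracy, rejection)


def needs_warning(curve: Curve, candidate: float, max_step: float = 0.10) -> bool:
    """True if, at a threshold adjacent to `candidate`, accuracy is lower by
    `max_step` or more, or rejection is higher by `max_step` or more."""
    grid = sorted(curve)
    i = grid.index(candidate)
    acc, rej = curve[candidate]
    for j in (i - 1, i + 1):  # neighbors on the threshold grid
        if 0 <= j < len(grid):
            acc_adj, rej_adj = curve[grid[j]]
            if acc - acc_adj >= max_step or rej_adj - rej >= max_step:
                return True
    return False


curve = {
    0.45: (0.70, 0.20),
    0.60: (0.93, 0.04),
    0.75: (0.95, 0.03),
    0.90: (0.94, 0.04),
}
print(needs_warning(curve, 0.60))  # True: accuracy falls by 0.23 at 0.45
print(needs_warning(curve, 0.75))  # False
```

The same check could gate the automatic adjustment unit 4 described in Modification 1 before it commits a new threshold.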

The same warning may also be displayed in the configuration of Modification 1 above, where the automatic adjustment unit 4 adjusts thresholds automatically. If the automatic adjustment unit 4 selects an adjusted threshold next to which the accuracy rate drops sharply or the rejection rate rises sharply, a warning may be displayed on the initial evaluation result screen 90 or the accuracy list screen 70. The threshold is then adjusted only if the user permits it.

As described above, this modification displays a warning whenever a threshold adjustment risks reducing recognition accuracy instead of improving it. This has the particular effect of supporting threshold adjustment more effectively.

<Modification 4>
The embodiment described above assumes application to voice-trigger speech recognition, which responds only to specific keywords. The present invention is not limited to voice-trigger speech recognition, however. It can also be applied to continuous speech recognition, which converts continuous speech into text. In continuous speech recognition, a user word dictionary can be added so that technical terms can be recognized in addition to general vocabulary. A threshold can then control how easily each word registered in the user word dictionary is recognized.

When the present invention is applied to continuous speech recognition, each word registered in the user word dictionary is therefore treated like a keyword in the voice-trigger speech recognition described above. Adjustment of the threshold set for each word in the user word dictionary can then be supported in the same manner as in the embodiment described above.

In the threshold adjustment devices of the embodiment and modifications described above, the evaluation values displayed on the threshold adjustment screen are the accuracy and rejection rates (evaluation values) calculated for each keyword registered in the keyword table 20, based on the recognition results output by the speech recognition unit 1. Each keyword's evaluation value is computed over recognition results for speech uttered by multiple speakers, so the evaluation pools the recognition results of all speakers.
However, the evaluation values need not be displayed this way. For example, the accuracy and rejection rates (evaluation values) of each keyword may be calculated separately from each speaker's recognition results and displayed per speaker. Similarly, the misrecognition analysis screen may calculate the misrecognized keywords for each speaker and display, per speaker, the misrecognized keywords and how often each occurred.
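A per-speaker breakdown could be computed as sketched below. The sketch assumes that the accuracy rate is the share of a keyword's utterances recognized as that keyword and that the rejection rate is the share for which no keyword was output. The exact definitions are given earlier in the specification and may differ. The record layout and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (speaker ID, keyword spoken, keyword recognized or None if rejected)
Record = Tuple[str, str, Optional[str]]


def per_speaker_rates(
    records: Iterable[Record],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(speaker, keyword): (accuracy rate, rejection rate)}."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    rejected: Dict[Tuple[str, str], int] = defaultdict(int)
    for speaker, spoken, recognized in records:
        key = (speaker, spoken)
        totals[key] += 1
        if recognized == spoken:
            correct[key] += 1
        elif recognized is None:
            rejected[key] += 1
    return {k: (correct[k] / n, rejected[k] / n) for k, n in totals.items()}


records = [
    ("A", "kw01", "kw01"), ("A", "kw01", None),
    ("B", "kw01", "kw01"), ("B", "kw01", "kw01"),
    ("B", "kw01", "kw02"), ("B", "kw01", None),
]
print(per_speaker_rates(records))
# {('A', 'kw01'): (0.5, 0.5), ('B', 'kw01'): (0.5, 0.25)}
```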

<Supplementary explanation>
The threshold adjustment devices of the embodiment and modifications described above can be implemented using, for example, a general-purpose computer as the basic hardware. The function of each unit of the threshold adjustment device can be implemented by having one or more processors in a general-purpose computer execute a program. The program may be installed on the computer in advance. Alternatively, it may be stored on a computer-readable storage medium or distributed over a network and then installed on the computer as appropriate.

FIG. 13 is a block diagram showing an example hardware configuration of the threshold adjustment device described above. As shown in FIG. 13, the threshold adjustment device has the hardware configuration of a general computer. It includes a processor 101 such as a CPU (Central Processing Unit) and a memory 102 such as RAM (Random Access Memory) or ROM (Read Only Memory). It also includes a storage device 103 such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). A device I/F 104 connects devices such as a display device 106 (for example, a liquid crystal panel) and an input device 107 (for example, a keyboard or pointing device). A communication I/F 105 communicates with the outside of the device, and a bus 108 connects these units.

When the threshold adjustment device is implemented with the hardware configuration shown in FIG. 13, the processor 101 can, for example, use the memory 102 to read and execute a program stored in the storage device 103 or elsewhere. This implements the functions of units such as the speech recognition unit 1, the evaluation unit 2, the display control unit 3, and the automatic adjustment unit 4. The speech recognition model 10, keyword table 20, evaluation data table 30, threshold list table 40, evaluation result table 50, and ideal value table 60 can be stored in, for example, the memory 102 or the storage device 103 and read out for processing as needed.

Some or all of the functions of the units of the threshold adjustment device may instead be implemented by dedicated hardware, that is, a dedicated processor rather than a general-purpose one, such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array). The functions of the units may also be implemented using multiple processors. Furthermore, the threshold adjustment device is not limited to a single computer; its functions may be distributed across multiple computers.

Although an embodiment of the present invention has been described above, it is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications fall within the scope and gist of the invention and within the invention described in the claims and its equivalents.

1 Speech recognition unit
2 Evaluation unit
3 Display control unit
4 Automatic adjustment unit
10 Speech recognition model
20 Keyword table
30 Evaluation data table
40 Threshold list table
50 Evaluation result table
60 Ideal value table

Claims

1. A threshold adjustment device comprising:
a speech recognition unit that performs speech recognition;
an evaluation unit that inputs, to the speech recognition unit, evaluation speech and a threshold list whose elements are a plurality of thresholds individually set for a plurality of predetermined recognition target words, and that calculates, based on the recognition results for the evaluation speech output from the speech recognition unit, an evaluation value representing the recognition accuracy of the speech recognition unit using the threshold list for each of the plurality of recognition target words; and
a display control unit that causes a display device to display a threshold adjustment screen for adjusting the threshold corresponding to any recognition target word among the plurality of recognition target words,
wherein the threshold adjustment screen includes an accuracy list screen that presents the evaluation value calculated by the evaluation unit for each of the plurality of recognition target words together with an ideal value calculated in advance for each of the plurality of recognition target words,
the evaluation unit counts, for each of the plurality of recognition target words, the number of times the speech recognition unit misrecognizes the evaluation speech containing that recognition target word as another recognition target word, separately for each misrecognized recognition target word, and
the threshold adjustment screen further includes a misrecognition analysis screen that presents, for a recognition target word selected from the plurality of recognition target words, the number of misrecognitions counted by the evaluation unit for each misrecognized recognition target word.

2. The threshold adjustment device according to claim 1, wherein the threshold adjustment screen further includes an initial evaluation result screen that presents, for a recognition target word selected from the plurality of recognition target words, a list of the evaluation values repeatedly calculated by the evaluation unit when an initial evaluation threshold list, which sets a threshold common to the plurality of recognition target words, and the evaluation speech are repeatedly input to the speech recognition unit while the common threshold in the initial evaluation threshold list is changed at predetermined intervals.

3. The threshold adjustment device according to claim 2, wherein the ideal value is the evaluation value representing the highest recognition accuracy of the speech recognition unit among the evaluation values repeatedly calculated by the evaluation unit when the initial evaluation threshold list and the evaluation speech are repeatedly input to the speech recognition unit while the common threshold in the initial evaluation threshold list is changed at predetermined intervals.

4. The threshold adjustment device according to claim 3, wherein the threshold list includes an initial threshold list in which each of the plurality of thresholds individually set for the plurality of recognition target words is the common threshold in the initial evaluation threshold list at which the ideal value of the corresponding recognition target word was obtained.

5. The threshold adjustment device according to claim 1, wherein the threshold list includes an adjusted threshold list in which the threshold corresponding to any recognition target word has been adjusted using the threshold adjustment screen.

6. The threshold adjustment device according to claim 2, further comprising an automatic adjustment unit that selects, from among the plurality of recognition target words, a first recognition target word whose threshold is to be adjusted, based on the difference between the evaluation value and the ideal value presented on the accuracy list screen.

7. The threshold adjustment device according to claim 6, wherein the automatic adjustment unit adjusts the threshold corresponding to the first recognition target word based on the list of evaluation values presented on the initial evaluation result screen corresponding to the first recognition target word.

8. The threshold adjustment device according to claim 6, wherein the automatic adjustment unit further selects a second recognition target word whose threshold is to be adjusted, based on the number of misrecognitions presented on the misrecognition analysis screen corresponding to the first recognition target word.

9. The threshold adjustment device according to claim 8, wherein the automatic adjustment unit adjusts the threshold corresponding to the second recognition target word based on the list of evaluation values presented on the initial evaluation result screen corresponding to the second recognition target word.

10. The threshold adjustment device according to claim 1, wherein the evaluation speech includes noise speech that contains none of the plurality of recognition target words.

11. The threshold adjustment device according to claim 2, wherein the display control unit displays a warning on the threshold adjustment screen when a threshold specified as the adjusted threshold for a recognition target word whose threshold is to be adjusted is determined to meet a predetermined condition, based on the list of evaluation values presented on the initial evaluation result screen corresponding to that recognition target word.

12. A threshold adjustment method comprising:
an evaluation step of inputting, to a speech recognition unit that performs speech recognition, evaluation speech and a threshold list whose elements are a plurality of thresholds individually set for a plurality of predetermined recognition target words, and calculating, based on the recognition results for the evaluation speech output from the speech recognition unit, an evaluation value representing the recognition accuracy of the speech recognition unit using the threshold list for each of the plurality of recognition target words; and
a display control step of causing a display device to display a threshold adjustment screen for adjusting the threshold corresponding to any recognition target word among the plurality of recognition target words,
wherein the threshold adjustment screen includes an accuracy list screen that presents the evaluation value calculated in the evaluation step for each of the plurality of recognition target words together with an ideal value calculated in advance for each of the plurality of recognition target words,
in the evaluation step, for each of the plurality of recognition target words, the number of times the speech recognition unit misrecognizes the evaluation speech containing that recognition target word as another recognition target word is counted separately for each misrecognized recognition target word, and
the threshold adjustment screen further includes a misrecognition analysis screen that presents, for a recognition target word selected from the plurality of recognition target words, the number of misrecognitions counted in the evaluation step for each misrecognized recognition target word.

13. A computer-readable recording medium storing a computer program that causes a computer to function as:
a speech recognition unit that performs speech recognition;
an evaluation unit that inputs, to the speech recognition unit, evaluation speech and a threshold list whose elements are a plurality of thresholds individually set for a plurality of predetermined recognition target words, and that calculates, based on the recognition results for the evaluation speech output from the speech recognition unit, an evaluation value representing the recognition accuracy of the speech recognition unit using the threshold list for each of the plurality of recognition target words; and
a display control unit that causes a display device to display a threshold adjustment screen for adjusting the threshold corresponding to any recognition target word among the plurality of recognition target words,
wherein the threshold adjustment screen includes an accuracy list screen that presents the evaluation value calculated by the evaluation unit for each of the plurality of recognition target words together with an ideal value calculated in advance for each of the plurality of recognition target words,
the evaluation unit counts, for each of the plurality of recognition target words, the number of times the speech recognition unit misrecognizes the evaluation speech containing that recognition target word as another recognition target word, separately for each misrecognized recognition target word, and
the threshold adjustment screen further includes a misrecognition analysis screen that presents, for a recognition target word selected from the plurality of recognition target words, the number of misrecognitions counted by the evaluation unit for each misrecognized recognition target word.