JP2005173157A - Parameter setting device, parameter setting method, program and storage medium

Info

Publication number: JP2005173157A
Application number: JP2003412497A
Authority: JP
Inventors: Yasuo Okuya; 泰夫奥谷; Toshiaki Fukada; 俊明深田; Yasuhiro Komori; 康弘小森
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-12-10
Filing date: 2003-12-10
Publication date: 2005-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a parameter setting device that can determine a parameter set so that the other conditions fall within ranges acceptable to the user, even when the parameter set is determined so that at least one of a plurality of operating conditions becomes optimal.

SOLUTION: In the parameter setting device, a speech recognition unit 204 performs speech recognition using a parameter set selected by a parameter selection unit 202. The operation results obtained by performing speech recognition in each trial, namely the recognition rate, recognition time, and memory usage, are held. A vector quantization unit 206 treats the operation results as vectors and performs vector quantization, and a display processing unit 208 displays the results of the vector quantization.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

The present invention relates to a parameter setting device, a parameter setting method, a program, and a storage medium for appropriately setting parameters for speech recognition.

To use speech recognition in a real environment, various parameters must be tuned. Here, "parameters" covers any parameter related to speech recognition, such as the beam width, the acoustic model, and the VAD (speech segment extraction) threshold. Tuning is generally done by an engineer familiar with speech recognition, who settles on parameter values through trial-and-error recognition experiments. Parameter tuning is therefore a task that only engineers with deep knowledge and experience of speech recognition can perform.

Meanwhile, a method for tuning these parameters automatically has been proposed (see, for example, Patent Document 1). In this method, an evaluation set (speech data and a recognition grammar), operating conditions (requirements on recognition rate, recognition time, memory usage, and so on), and search conditions for each parameter (step size and range) are given. Speech recognition is tried repeatedly while the value of each parameter is varied within the search conditions, the operation results of each trial (recognition rate, recognition time, memory usage) are recorded, and the optimal parameter set satisfying the operating conditions is finally obtained from the recorded results.
Patent Document 1: JP 2002-328696 A

However, in the method described above, expressing the operating conditions as a logical AND of individual requirements on recognition rate, recognition time, memory usage, and so on is unreasonable in itself. The operating conditions a user really wants are determined by a trade-off among recognition rate, recognition time, and memory usage. That is, even under operating conditions that prioritize the recognition rate, a parameter set cannot actually be used if its recognition time or memory usage is too large. Conversely, if a parameter set has a slightly lower recognition rate but needs only half the memory, the user would be glad to adopt it. For example, suppose the operating conditions are a recognition rate of 90% or more and a memory usage of 2 MB or less, and the trials yield a parameter set with a recognition rate of 90% and a memory usage of 2 MB. Among the parameter sets that were not selected, there may be one with a recognition rate of 89.9% but a memory usage of only 1.2 MB. Naturally, this parameter set is not returned as a solution because it does not satisfy the requirement on the recognition rate. In general, however, the 0.8 MB reduction in memory usage is more attractive than the 0.1% of recognition rate given up to obtain it. In such a case the user would want to select the latter parameter set, but the method described above selects the former.
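The drawback described above can be illustrated with a minimal sketch. The two trial results are the figures quoted in the text; the labels and the function name `satisfies_and_conditions` are hypothetical and introduced only for this illustration.

```python
# Trial results quoted in the text: (label, recognition rate in %, memory usage in MB).
# The labels "set A" and "set B" are hypothetical.
trials = [
    ("set A", 90.0, 2.0),
    ("set B", 89.9, 1.2),
]


def satisfies_and_conditions(rate, memory_mb, min_rate=90.0, max_memory_mb=2.0):
    """Return True only if every requirement holds at once (logical AND)."""
    return rate >= min_rate and memory_mb <= max_memory_mb


# Only results passing every threshold survive; near misses are discarded outright.
accepted = [label for label, rate, mem in trials if satisfies_and_conditions(rate, mem)]
print(accepted)
```

Set B is rejected for missing the recognition rate threshold by 0.1 points, although it needs 0.8 MB less memory, which is exactly the kind of trade-off a hard AND filter cannot express.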

Moreover, if only the two conditions of recognition rate and memory usage are set as above, the resulting optimal parameter set may have a very long recognition time.

The present invention has been made in view of the above problems. Its object is to provide a parameter setting device, a parameter setting method, a program that realizes the method, and a storage medium that can determine a parameter set so that, even when the parameter set is determined so that at least one of a plurality of operating conditions becomes optimal, the other conditions fall within ranges acceptable to the user.

To achieve the above object, the present invention provides a parameter setting device for speech recognition, comprising: trial means for attempting speech recognition with each of a plurality of parameter sets prepared in advance for operating conditions; operation result holding means for holding, for each trial, an operation result including a recognition rate together with the corresponding parameter set; and grouping means for grouping the operation results of the trials.

To achieve the above object, the present invention also provides a parameter setting method for speech recognition, comprising: a trial step of attempting speech recognition with each of a plurality of parameter sets prepared in advance for operating conditions; an operation result holding step of holding, for each trial, an operation result including a recognition rate together with the corresponding parameter set; and a grouping step of grouping the operation results of the trials.

To achieve the above object, the present invention also provides a program for realizing a parameter setting method for speech recognition, comprising: a trial module that attempts speech recognition with each of a plurality of parameter sets prepared in advance for operating conditions; an operation result holding module that holds, for each trial, an operation result including a recognition rate together with the corresponding parameter set; and a grouping module that groups the operation results of the trials.

To achieve the above object, the present invention also provides a storage medium on which the above program is stored in computer-readable form.

According to the present invention, even when a parameter set is determined so that at least one of a plurality of operating conditions becomes optimal, the parameter set can be determined so that the other conditions fall within ranges acceptable to the user.

Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment)
FIG. 1 is a block diagram showing the hardware configuration of a parameter setting device for speech recognition according to the first embodiment of the present invention. This embodiment describes the case where the parameter setting device is realized on a general-purpose personal computer, but the invention may equally be a dedicated parameter setting device or a device of some other form.

The parameter setting device is a device for appropriately setting parameters for speech recognition. As shown in FIG. 1, it comprises a control memory 101 such as a ROM, a central processing unit (CPU) 102, a memory 103 such as a RAM, an external storage device 104, an input device 105, a display device 106, a bus 107, and so on. The control memory 101 stores the control program that realizes the parameter setting device and the data used by that program. Under the control of the central processing unit 102, the control program and data are loaded into the memory 103 through the bus 107 as needed and executed by the central processing unit 102.

Next, the module configuration of the parameter setting device is described with reference to FIG. 2. FIG. 2 is a block diagram showing the module configuration of the parameter setting device of FIG. 1.

As shown in FIG. 2, the module configuration of the parameter setting device includes a parameter search condition holding unit 201, a parameter selection unit 202, an evaluation data holding unit 203, a speech recognition unit 204, an operation result holding unit 205, a vector quantization unit 206, a quantization result holding unit 207, a display processing unit 208, and an input processing unit 209. This module configuration is formed when the central processing unit 102 executes the control program stored in the control memory 101.

The parameter search condition holding unit 201 holds search conditions, such as the maximum value, minimum value, and step size, used when the various parameters are varied. The parameter selection unit 202 selects one parameter set that has not yet been tried from within the search conditions held by the parameter search condition holding unit 201. The evaluation data holding unit 203 holds the data used to evaluate speech recognition; specifically, evaluation speech data, a recognition grammar, a language model, an acoustic model, and so on. The speech recognition unit 204 performs speech recognition using the parameter set selected by the parameter selection unit 202. The operation result holding unit 205 holds the operation result obtained by executing speech recognition in each trial. Here, the operation result consists of the recognition rate, the recognition time, and the memory usage.

The vector quantization unit 206 treats the operation results as vectors and performs vector quantization. The quantization result holding unit 207 holds the result of the vector quantization performed by the vector quantization unit 206. The display processing unit 208 displays the result of the vector quantization. The input processing unit 209 has operation means with which the user selects a desired one of the quantization results, and accepts the quantization result selected through that operation. The parameter set determination unit 210 obtains the parameter set corresponding to the user's input.

Next, the processing of the parameter setting device is described with reference to FIG. 3. FIG. 3 is a flowchart showing the processing procedure of the parameter setting device of FIG. 2. The procedure shown in the flowchart of FIG. 3 is executed by the module configuration of FIG. 2.

In the parameter setting device, as shown in FIG. 3, the parameter selection unit 202 first determines in step S301 whether an untried parameter set remains within the search conditions held in the parameter search condition holding unit 201. Here, the search conditions are defined by the minimum value, maximum value, step size, and so on that each parameter can take, so that speech recognition can be tried with various parameter values. The search conditions may of course also be an explicit list of the parameter values to be tried. If an untried parameter set exists, the parameter selection unit 202 selects one of the untried parameter sets from within the search conditions held by the parameter search condition holding unit 201, and the processing proceeds to step S302. If no untried parameter set exists, the processing proceeds to step S304.
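As a rough illustration of how search conditions given as a minimum, maximum, and step size expand into the parameter sets to be tried, the following sketch enumerates every combination. The parameter names, the integer value ranges, and the function names are assumptions made for the example and are not taken from the source.

```python
from itertools import product

# Hypothetical search conditions: parameter name -> (minimum, maximum, step).
search_conditions = {
    "beam_width": (100, 300, 100),
    "vad_threshold": (1, 3, 1),
}


def candidate_values(minimum, maximum, step):
    """List the values from minimum to maximum (inclusive) at the given step."""
    count = int((maximum - minimum) // step) + 1
    return [minimum + i * step for i in range(count)]


def enumerate_parameter_sets(conditions):
    """Yield every combination of parameter values as a dict."""
    names = list(conditions)
    grids = [candidate_values(*conditions[name]) for name in names]
    for values in product(*grids):
        yield dict(zip(names, values))


parameter_sets = list(enumerate_parameter_sets(search_conditions))
print(len(parameter_sets))
```

Exhaustive enumeration is only one possible reading; as the text notes, the search conditions may equally be an explicit list of values to try.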

In step S302, the speech recognition unit 204 performs speech recognition using the parameter set selected by the parameter selection unit 202 and the evaluation data held by the evaluation data holding unit 203, and calculates the recognition rate, recognition time, and memory usage. In step S303, the operation result holding unit 205 holds the recognition rate, recognition time, and memory usage obtained from the speech recognition together with the parameter set used for that trial, as one record. The processing then returns to step S301.
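The loop over steps S301 to S303 might be sketched as follows. The recognizer here is a stand-in that returns made-up numbers; `run_trials`, `fake_recognize`, and the record layout are hypothetical names chosen for the illustration, not part of the source.

```python
def run_trials(parameter_sets, recognize):
    """Try each untried parameter set once and hold its result with it."""
    held_results = []
    untried = list(parameter_sets)
    while untried:                                     # S301: anything left to try?
        params = untried.pop(0)                        # select one untried set
        rate, time_ms, memory_mb = recognize(params)   # S302: run recognition
        held_results.append({                          # S303: hold result and set together
            "params": params,
            "rate": rate,
            "time_ms": time_ms,
            "memory_mb": memory_mb,
        })
    return held_results                                # continue with S304


def fake_recognize(params):
    """Stand-in for a real recognizer; the returned numbers are invented."""
    beam = params["beam_width"]
    return (80.0 + beam / 100, 2.0 * beam, 1.0 + beam / 1000)


results = run_trials([{"beam_width": 100}, {"beam_width": 200}], fake_recognize)
print(len(results))
```

Storing the parameter set in the same record as its operation result is what later allows a chosen result to be traced back to the parameters that produced it.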

In step S304, the vector quantization unit 206 treats the operation results obtained in the trials so far (recognition rate, recognition time, memory usage) as three-dimensional vectors, performs vector quantization, and stores the result in the quantization result holding unit 207. Any general vector quantization method may be used. The number of quantization vectors obtained should be small enough for the user to choose among them; about 5 to 10 is appropriate. This number may be set to a suitable value in advance or may be set by the user.
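The source leaves the quantization method open ("any general method"). As one possible choice, the sketch below uses plain k-means (Lloyd's algorithm); that choice, the deterministic initialization, and all names and sample values are assumptions of this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(vectors, k, iterations=20):
    """Plain k-means; returns (codebook, assignments)."""
    # Deterministic start: take k vectors spread evenly over the input order.
    codebook = [vectors[i * len(vectors) // k] for i in range(k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest representative vector.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, codebook[c]))
            for v in vectors
        ]
        # Move each representative to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old representative if a cluster is empty
                codebook[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return codebook, assignments


# Made-up operation results: (recognition rate %, recognition time ms, memory MB).
results = [
    (90.0, 900.0, 2.0),
    (89.9, 880.0, 1.2),
    (70.0, 300.0, 0.8),
    (72.0, 320.0, 0.9),
]
codebook, assignments = vector_quantize(results, k=2)
print(assignments)
```

The returned `assignments` record which representative vector each operation result belongs to. Note that with raw values the recognition time in milliseconds dominates the distance, because the three elements are on very different scales.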

Next, in step S305, the display processing unit 208 presents the quantization results held in the quantization result holding unit 207 to the user, and in the following step S306 the input processing unit 209 accepts the user's input. The information entered here is the one quantization result that the user prefers. Once the input is accepted, in step S307 the parameter set determination unit 210 uses the user's selection accepted by the input processing unit 209 to find the parameter set that matches the selected quantization result held in the quantization result holding unit 207, and takes it as the parameter set sought. The processing then ends.
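The source does not say how a "matching" parameter set is found when the selected quantization vector is not identical to any single trial. One plausible reading, shown here purely as an assumption, is to return the parameter set of the held trial whose operation result is nearest to the selected vector. All names and values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def determine_parameter_set(selected_vector, held_results):
    """Return the parameter set of the trial nearest to the selected vector."""
    best = min(
        held_results,
        key=lambda r: squared_distance(r["result"], selected_vector),
    )
    return best["params"]


# Made-up held results: (recognition rate %, recognition time ms, memory MB).
held_results = [
    {"params": {"beam_width": 100}, "result": (81.0, 200.0, 1.1)},
    {"params": {"beam_width": 200}, "result": (82.0, 400.0, 1.2)},
    {"params": {"beam_width": 300}, "result": (83.0, 600.0, 1.3)},
]

# Quantization vector the user picked in S306 (hypothetical values).
chosen = determine_parameter_set((82.4, 450.0, 1.22), held_results)
print(chosen)
```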

Next, the input and output data of the vector quantization performed by the vector quantization unit 206 in this embodiment are described with reference to FIG. 4. FIG. 4 shows an example of the input and output data of the vector quantization performed by the vector quantization unit 206 of FIG. 2.

The input to the vector quantization unit 206, that is, the operation results of all trials, is for example the input 401 shown in FIG. 4. The parameter sets corresponding to the input 401 are the parameter sets 403. The input 401 and the parameter sets 403 are held in the operation result holding unit 205 of FIG. 2. For the input 401, the vector quantization unit 206 produces the vector quantization output 402, which is held in the quantization result holding unit 207.

As described above, according to this embodiment, vector quantization is performed over all trials, so a parameter set close to the operating conditions the user wants can be determined. In other words, even when the parameter set is determined so that at least one of the operating conditions of recognition rate, recognition time, and memory usage becomes optimal, it can be determined so that the other conditions fall within ranges acceptable to the user.

This embodiment has described the case where the vector quantization results are presented to the user, who then selects the desired result, but the invention is not limited to this. Among the representative vectors obtained by vector quantization, the vector containing the maximum value of each element (recognition rate, recognition time, memory usage) may be assigned to a recognition-rate-priority mode, a recognition-time-priority mode, and a memory-saving-priority mode respectively, and the parameter set for each such vector may be offered to the user as the parameters of that mode. This allows even a user with little experience in tuning speech recognition parameters to select a parameter set.
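The mode assignment could look like the sketch below. The source speaks of the "maximum" value of each element; this example assumes that means the most favorable value, that is, the highest recognition rate but the lowest recognition time and lowest memory usage. That interpretation, the mode names, and the sample vectors are assumptions.

```python
# Made-up representative vectors from vector quantization.
representatives = [
    {"rate": 93.0, "time_ms": 900.0, "memory_mb": 2.3},
    {"rate": 90.0, "time_ms": 400.0, "memory_mb": 1.9},
    {"rate": 88.0, "time_ms": 600.0, "memory_mb": 1.1},
]


def assign_modes(vectors):
    """Pick one representative vector per priority mode.

    Assumption: "best" is the highest rate, the shortest time, the least memory.
    """
    return {
        "recognition_rate_priority": max(vectors, key=lambda v: v["rate"]),
        "recognition_time_priority": min(vectors, key=lambda v: v["time_ms"]),
        "memory_saving_priority": min(vectors, key=lambda v: v["memory_mb"]),
    }


modes = assign_modes(representatives)
for name, vector in modes.items():
    print(name, vector)
```

With named modes, the user only chooses a priority and never has to inspect individual parameter values.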

This embodiment has also described the case where vector quantization is performed over all trials, but the invention is not limited to this; vector quantization may be performed after results that are clearly unsuitable as speech recognition parameters have been removed. Unsuitability can be judged from the operation results. For the recognition rate, for example, results with a recognition rate of 80% or less, results at or below 80% of the maximum recognition rate, or results 20% or more below the maximum recognition rate may be treated as unsuitable. Needless to say, the same preprocessing is also effective for the operation results on recognition time and memory usage, not only the recognition rate. As a result, when the vectors for the elements (recognition rate, recognition time, memory usage) are assigned to the recognition-rate-priority, recognition-time-priority, and memory-saving-priority modes as described above, the recognition rate and memory usage are guaranteed to stay within acceptable ranges even if the parameter set of the recognition-rate-priority mode is selected.
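A minimal sketch of this preprocessing, using only the third rule mentioned in the text (discard results 20 or more points below the maximum recognition rate). Reading "20%" as percentage points, along with the function name and the sample data, is an assumption of the example.

```python
def drop_unsuitable(results, max_gap=20.0):
    """Keep only results whose recognition rate is within max_gap of the best."""
    best_rate = max(r["rate"] for r in results)
    return [r for r in results if best_rate - r["rate"] < max_gap]


# Made-up operation results.
results = [
    {"rate": 94.0, "time_ms": 900.0, "memory_mb": 2.3},
    {"rate": 75.0, "time_ms": 500.0, "memory_mb": 1.4},
    {"rate": 60.0, "time_ms": 200.0, "memory_mb": 0.6},
]

suitable = drop_unsuitable(results)
print([r["rate"] for r in suitable])
```

The same pattern applies to recognition time and memory usage by filtering on those fields instead, with the comparison reversed since smaller values are better there.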

(Second Embodiment)
Next, a second embodiment of the present invention is described with reference to FIG. 5. FIG. 5 shows the result of grouping on a two-dimensional plane of recognition time and memory usage in the parameter setting device according to the second embodiment of the present invention.

In the first embodiment, the number of quantization vectors obtained by quantization is preferably small enough for the user to choose among them, about 5 to 10. In contrast, the present embodiment groups the operation results by value and presents the groups.

In the method of this embodiment, the operation results obtained from all trials are classified by the pair of memory usage and recognition time. For example, memory usage is quantized in steps of 0.2 MB and recognition time in steps of 50 milliseconds, and all operation results are classified on this basis. Next, the operation result with the highest recognition rate is selected in each group. The operation results selected in the groups correspond to the vector quantization result of the first embodiment, and they are presented to the user.
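The grouping described above can be sketched as follows, using the step sizes given in the text (0.2 MB and 50 ms). The function name, record layout, and sample values are hypothetical.

```python
import math


def group_by_cell(results, memory_step=0.2, time_step=50.0):
    """Map each (memory cell, time cell) to its result with the highest rate."""
    best = {}
    for r in results:
        cell = (
            math.floor(r["memory_mb"] / memory_step),  # e.g. 11 covers 2.2 to 2.4 MB
            math.floor(r["time_ms"] / time_step),      # e.g. 17 covers 850 to 900 ms
        )
        if cell not in best or r["rate"] > best[cell]["rate"]:
            best[cell] = r
    return best


# Made-up operation results.
results = [
    {"rate": 94.0, "time_ms": 870.0, "memory_mb": 2.3},
    {"rate": 92.0, "time_ms": 860.0, "memory_mb": 2.25},
    {"rate": 93.0, "time_ms": 880.0, "memory_mb": 1.7},
    {"rate": 85.0, "time_ms": 420.0, "memory_mb": 1.7},
]

groups = group_by_cell(results)
for cell, r in sorted(groups.items()):
    print(cell, r["rate"])
```

Each dictionary entry corresponds to one cell of the memory usage and recognition time plane, and the stored recognition rate is the number that would be shown in that cell.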

The step sizes for memory usage and recognition time are set to a size that poses no problem for system design. In other words, results belonging to the same group can be regarded as not differing greatly. Because the operation result with the highest recognition rate can be selected within each group, this embodiment has the advantage over the first embodiment that the recognition rate is not sacrificed and that memory usage and recognition time can be divided into steps of a reasonable size.

In this embodiment, the grouping result is presented on a two-dimensional plane of recognition time and memory usage, for example as shown in FIG. 5. The numbers on the plane are the maximum recognition rates of the groups. From FIG. 5, the user can select a parameter set that achieves the desired recognition rate, memory usage, and recognition time.

For example, compare group 501, with a maximum recognition rate of 94, and group 502, with a maximum recognition rate of 93, in FIG. 5. Group 501 has a higher recognition rate than group 502, but the difference is only 1%. In recognition time, groups 501 and 502 fall in the same band of 850 to 900 milliseconds. In memory usage, however, group 501 requires 2.2 MB to 2.4 MB, whereas group 502 needs only 1.6 MB to 1.8 MB. The user can therefore choose group 502, which has a slightly lower recognition rate but far lower memory usage.

This embodiment has described the case where the recognition rate is presented on the plane of memory usage and recognition time, but the invention is not limited to this; a plane of memory usage and recognition rate, or a plane of recognition time and recognition rate, may be presented instead. Needless to say, the plane of memory usage and recognition rate displays the minimum recognition time, and the plane of recognition time and recognition rate displays the minimum memory usage.

The first embodiment described the case where the operation results (recognition rate, recognition time, memory usage) are treated as vectors and vector-quantized, but the invention is not limited to this; vector quantization may also be performed on (recognition rate, memory usage) vectors. Furthermore, in combination with the second embodiment, the quantization result may be displayed on the plane of memory usage and recognition time. The number shown on the plane is then the maximum recognition rate within the set of operation results represented by each representative vector. In this case, it is necessary to record which representative vector each operation result belongs to.

Instead of displaying the recognition rate on the plane of memory usage and recognition time, a plane of memory usage and recognition rate or a plane of recognition time and recognition rate may be presented. The minimum recognition time is displayed in the former case, and the minimum memory usage in the latter.

In the vector quantization of the first embodiment, the recognition rate, recognition time, and memory usage are quantities of different natures, so a distance measure normalized by the variance of each may be used. The variances can be obtained by computing, over all or part of the trial results, the variance of each element: recognition rate, recognition time, and memory usage.
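One way to realize such a distance measure is to divide each squared difference by the variance of that element, as sketched below. The exact form of the normalization is not given in the source, so this formula, the use of the population variance, and the sample vectors are assumptions; the sketch also assumes no element has zero variance.

```python
from statistics import pvariance


def element_variances(vectors):
    """Population variance of each element (column) across the trial results."""
    return [pvariance(column) for column in zip(*vectors)]


def normalized_squared_distance(a, b, variances):
    """Squared distance with each element scaled by its variance."""
    return sum((x - y) ** 2 / var for x, y, var in zip(a, b, variances))


# Made-up trial results: (recognition rate %, recognition time ms, memory MB).
vectors = [
    (90.0, 900.0, 2.0),
    (80.0, 500.0, 1.0),
]

variances = element_variances(vectors)
distance = normalized_squared_distance(vectors[0], vectors[1], variances)
print(variances, distance)
```

After normalization each element contributes equally here, whereas with raw values the 400 ms gap in recognition time would swamp the differences in recognition rate and memory usage.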

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the invention is also achieved when software program code that realizes the functions of the embodiments described above is downloaded via a network or the like and executed, or when a recording medium storing the program code is supplied to a system or apparatus and a computer (or CPU or MPU) of that system or apparatus reads and executes the program code stored on the recording medium.

In this case, the program code read from the recording medium itself realizes the functions of the embodiments described above, and the recording medium storing that program code constitutes the present invention. Recording media that can be used to supply the program code include, for example, a floppy (registered trademark) disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, magnetic tape, a nonvolatile memory card, and a ROM.

Needless to say, the invention covers not only the case where the functions of the embodiments described above are realized by the computer executing the program code it has read, but also the case where an OS or the like running on the computer performs part or all of the actual processing according to the instructions of the program code and the functions of the embodiments are realized by that processing.

Needless to say, the invention further covers the case where the program code read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or in that unit performs part or all of the actual processing according to the instructions of the program code, and the functions of the embodiments described above are realized by that processing.

A block diagram showing the hardware configuration of a parameter setting device for appropriately setting parameters for speech recognition, according to the first embodiment of the present invention.
A block diagram showing the module configuration of the parameter setting device of FIG. 1.
A flowchart showing the processing procedure of the parameter setting device of FIG. 2.
A diagram showing an example of the input and output data of vector quantization by the vector quantization unit 206 of FIG. 2.
A diagram showing the grouping results on a two-dimensional plane of recognition time and memory usage in the parameter setting device according to the second embodiment of the present invention.

Explanation of symbols

101 Control memory
102 Central processing unit (CPU)
103 Memory
104 External storage device
105 Input device
106 Display device
107 Bus
201 Parameter search condition holding unit
202 Parameter selection unit
203 Evaluation data holding unit
204 Speech recognition unit
205 Operation result holding unit
206 Vector quantization unit
207 Quantization result holding unit
208 Display processing unit
209 Input processing unit

Claims

1. A parameter setting device for speech recognition, comprising:
trial means for attempting speech recognition using each of a plurality of parameter sets prepared in advance for the operating conditions;
operation result holding means for holding, for each trial, an operation result including a recognition rate together with the parameter set corresponding thereto; and
grouping means for grouping the operation results obtained for each trial.

2. The parameter setting device according to claim 1, wherein the grouping means groups the operation results obtained for each trial using vector quantization.

3. The parameter setting device according to claim 1, further comprising removing means for removing inappropriate operation results as pre-processing for the grouping by the grouping means.

4. A parameter setting method for speech recognition, comprising:
a trial step of attempting speech recognition using each of a plurality of parameter sets prepared in advance for the operating conditions;
an operation result holding step of holding, for each trial, an operation result including a recognition rate together with the parameter set corresponding thereto; and
a grouping step of grouping the operation results obtained for each trial.

5. The parameter setting method according to claim 4, wherein in the grouping step, the operation results obtained for each trial are grouped using vector quantization.

6. The parameter setting method according to claim 4, further comprising a removal step of removing inappropriate operation results as pre-processing for the grouping in the grouping step.

7. A program for realizing a parameter setting method for speech recognition, comprising:
a trial module that attempts speech recognition using each of a plurality of parameter sets prepared in advance for the operating conditions;
an operation result holding module that holds, for each trial, an operation result including a recognition rate together with the parameter set corresponding thereto; and
a grouping module that groups the operation results obtained for each trial.

8. The program according to claim 7, wherein the grouping module groups the operation results obtained for each trial using vector quantization.

9. The program according to claim 7, further comprising a removal module that removes inappropriate operation results as pre-processing for the grouping by the grouping module.

10. A storage medium storing the program according to claim 7 in a computer-readable manner.
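
The flow recited in the method claims (trial, holding of operation results, removal of inappropriate results, grouping) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the specification: `run_trial` is a synthetic stand-in for a real speech recognizer, `beam_width` is the only parameter, the removal criterion is a hypothetical minimum recognition rate, and plain k-means (Lloyd iteration) is used as one simple form of vector quantization.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class Trial:
    params: Dict[str, float]  # parameter set used for this trial
    result: Vector            # (recognition rate, recognition time, memory usage)


def run_trial(params: Dict[str, float]) -> Vector:
    """Hypothetical stand-in for running a recognizer on evaluation data."""
    beam = params["beam_width"]
    rate = min(0.99, 0.60 + 0.004 * beam)  # synthetic: wider beam, higher rate
    time_s = 0.2 + 0.01 * beam             # synthetic recognition time
    memory_mb = 10.0 + 0.5 * beam          # synthetic memory usage
    return (rate, time_s, memory_mb)


def try_all(parameter_sets: List[Dict[str, float]]) -> List[Trial]:
    """Trial and holding steps: keep each result with its parameter set."""
    return [Trial(p, run_trial(p)) for p in parameter_sets]


def remove_inappropriate(trials: List[Trial], min_rate: float) -> List[Trial]:
    """Removal step (assumed criterion: recognition rate below min_rate)."""
    return [t for t in trials if t.result[0] >= min_rate]


def normalize(vectors: List[Vector]) -> List[Vector]:
    """Min-max scale each dimension so that no single unit dominates."""
    lows = [min(c) for c in zip(*vectors)]
    highs = [max(c) for c in zip(*vectors)]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(v, lows, highs))
        for v in vectors
    ]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(v, codebook[i]))


def group(vectors: List[Vector], k: int,
          iterations: int = 20) -> Tuple[List[int], List[Vector]]:
    """Grouping step: k-means used as a simple vector quantizer."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic initialization: k vectors spread through the input order.
    codebook = [vectors[(i * len(vectors)) // k] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(v, codebook) for v in vectors]
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:  # leave an empty group's centroid unchanged
                codebook[i] = tuple(sum(c) / len(members)
                                    for c in zip(*members))
    # Final assignment against the final codebook.
    return [nearest(v, codebook) for v in vectors], codebook


parameter_sets = [{"beam_width": float(b)} for b in (10, 20, 30, 80, 90, 100)]
trials = try_all(parameter_sets)
kept = remove_inappropriate(trials, min_rate=0.65)
labels, codebook = group(normalize([t.result for t in kept]), k=2)

for trial, label in zip(kept, labels):
    print(label, trial.params, trial.result)
```

Each printed line pairs a group index with the parameter set and operation result of one trial, so a user could pick a group whose recognition time and memory usage are acceptable and then choose the parameter set with the best recognition rate inside it. Min-max normalization before grouping is a design choice of this sketch, made so that memory usage in megabytes does not dominate the distance.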