JP6367993B1

JP6367993B1 - Learning device, noise suppression parameter set switching rule learning device, speech recognition device, learning method, noise suppression parameter set switching rule learning method, speech recognition method, program

Info

Publication number: JP6367993B1
Application number: JP2017026705A
Authority: JP
Inventors: 智子川瀬; 隆朗福冨; 岡本　学; 学岡本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-02-16
Filing date: 2017-02-16
Publication date: 2018-08-01
Anticipated expiration: 2037-02-16
Also published as: JP2018132683A

Abstract

[Problem] To provide a learning device that can set a noise suppression parameter set based on sound collection conditions. [Solution] The learning device includes a speech recognition result database creation unit and a noise suppression parameter set switching rule learning unit. The creation unit uses a speech database that stores, as first entries, speech files associated with their correct sentences (reference transcripts) and sound collection conditions. It stores, as second entries in a speech recognition result database, features, sound collection conditions, noise suppression parameter sets, and accuracy information associated with one another. The switching rule learning unit searches the speech recognition result database based on a designated sound collection condition, groups the retrieved second entries according to a grouping criterion based on the features, and stores, as third entries in a switching rule database, the optimal noise suppression parameter set, the grouping criterion, and the sound collection condition associated with one another. [Selected Drawing] FIG. 1

Description

The present invention relates to a speech recognition service that transcribes speech uttered into a smartphone or similar device into text. More specifically, it relates to a learning device, a noise suppression parameter set switching rule learning device, a speech recognition device, a learning method, a noise suppression parameter set switching rule learning method, a speech recognition method, and a program used to determine in advance a noise suppression parameter set suited to each environment, so that the speech recognition service can be provided in a variety of environments.

Patent Document 1 is an example of background art in this technical field. The parameter determination device of Patent Document 1 groups a speech data set using the per-band noise level as a feature and, for each group, selects the noise suppression parameter set that maximizes the speech recognition rate.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2016-139025
Patent Document 2: Japanese Patent No. 3309895

Non-Patent Document 1: Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, "Automatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics," Seattle, 2008.
Non-Patent Document 2: M. Vondrasek, P. Pollak, "Methods for speech SNR estimation: Evaluation tool and analysis of VAD dependency," Radioengineering, vol. 14, no. 1, pp. 6-11, 2005.

The optimal noise suppression parameter set can vary with the conditions under which the speech signal was collected. For example, when speech is recorded on different terminal models, the microphone elements and their configuration differ, so the parameter values best suited to noise suppression may differ substantially. Because the conventional technique does not take sound collection conditions into account, it may fail to select the optimal noise suppression parameter set.

An object of the present invention is therefore to provide a learning device that can set a noise suppression parameter set based on sound collection conditions.

The learning device of the present invention includes a speech recognition result database creation unit and a noise suppression parameter set switching rule learning unit.

The speech recognition result database creation unit uses a speech database that stores, as a first entry, a speech file, the correct sentence corresponding to that speech file, and a sound collection condition (a label describing the conditions under which the speech file was recorded), all associated with one another. For each noise suppression parameter set in a noise suppression parameter set group consisting of multiple noise suppression parameter sets, the unit stores the following items, associated with one another, as a second entry in a speech recognition result database: a feature that characterizes the noise in the speech file, the sound collection condition, the noise suppression parameter set (a set of parameters used for noise suppression), and accuracy information (a value that evaluates the accuracy of the speech recognition result).

The noise suppression parameter set switching rule learning unit searches the speech recognition result database based on a designated sound collection condition and groups the retrieved second entries according to a grouping criterion based on the feature. In each group, it selects a noise suppression parameter set such that the group's accuracy information satisfies a predetermined condition; this selected set is the optimal noise suppression parameter set. The unit then stores the optimal noise suppression parameter set, the grouping criterion, and the sound collection condition used for the search, associated with one another, as a third entry in a switching rule database.

The learning device of the present invention can set a noise suppression parameter set based on sound collection conditions.

FIG. 1 is a block diagram showing the configuration of the speech recognition system of Embodiment 1.
FIG. 2 is a block diagram showing the configuration of a speech recognition system according to a modification.
FIG. 3 is a block diagram showing the configuration of a speech recognition device according to a modification.
FIG. 4 is a sequence diagram showing the operation of the learning device of Embodiment 1.
FIG. 5 is a sequence diagram showing the operation of the noise suppression/speech recognition device of Embodiment 1.
FIG. 6 is a block diagram showing the configuration of the speech recognition result database creation unit.
FIG. 7 is a flowchart showing the operation of the speech recognition result database creation unit.
FIG. 8 is a block diagram showing the configuration of the noise suppression parameter set switching rule learning unit.
FIG. 9 is a flowchart showing the operation of the noise suppression parameter set switching rule learning unit.
FIG. 10 is a block diagram showing the configuration of the noise suppression/speech recognition device.
FIG. 11 is a flowchart showing the operation of the noise suppression/speech recognition device.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

<Speech Recognition System 1>
The configuration of the speech recognition system 1 of Embodiment 1 is described with reference to FIG. 1. As shown in the figure, the speech recognition system 1 of this embodiment includes a learning device 10, a speech database 11, a speech recognition result database 13, a switching rule database 15, and a noise suppression/speech recognition device 16. The learning device 10 includes a speech recognition result database creation unit 12 and a noise suppression parameter set switching rule learning unit 14.

As shown in FIG. 2, the learning device 10 may instead be split into two devices. Specifically, the system may be configured as a speech recognition system 1a that includes a speech recognition result database creation device 12, an independent device with the same functions as the speech recognition result database creation unit 12, and a noise suppression parameter set switching rule learning device 14, an independent device with the same functions as the noise suppression parameter set switching rule learning unit 14.

Alternatively, as shown in FIG. 3, the entire system may be configured as a single device. Specifically, it may be configured as a speech recognition device 1b that includes the speech database 11, the speech recognition result database creation unit 12, the speech recognition result database 13, the noise suppression parameter set switching rule learning unit 14, the switching rule database 15, and a noise suppression/speech recognition unit 16 with the same functions as the noise suppression/speech recognition device 16. In this case, the three databases may be located outside the speech recognition device 1b. The description below follows the configuration shown in FIG. 1.

This embodiment assumes a speech recognition service used online. The party that provides and operates the speech recognition service is called the operator, and the person who uses the service from a terminal such as a smartphone is called the user. In the figures and text, the operator is referred to as operator a and the user as user b.

For example, the learning device 10 is assumed to run when triggered by operator a, and the noise suppression/speech recognition device 16 is assumed to be an online processing device of the speech recognition service that runs when triggered by user b. The speech recognition result database creation unit 12 is a processing unit that prepares the speech recognition result database 13 used to learn the noise suppression parameter set switching rules. The noise suppression parameter set switching rule learning unit 14 is a processing unit that determines in advance the rules specifying which noise suppression parameter set is assigned to which kind of speech file when the noise suppression/speech recognition device 16 is used.

<Definition of Terms>
The terms used in this embodiment are defined in the table below.

<Operation of Learning Device 10>
Steps S12 and S14, which make up the operation of the learning device 10, are described below with reference to FIG. 4.

<Operation of Speech Recognition Result Database Creation Unit 12 (S12)>
Before this step, operator a prepares a noise suppression parameter set group and inputs it to the speech recognition result database creation unit 12. This step also refers to the speech database 11, which stores in advance, as first entries, speech files associated with their correct sentences and sound collection conditions. The speech recognition result database creation unit 12 calculates a feature from the target speech file, applies noise suppression to the target speech file using a noise suppression parameter set, and performs speech recognition on the noise-suppressed speech file. The noise suppression method and the noise suppression parameter sets can be the same as those of Patent Document 1. Finally, the unit calculates accuracy information by comparing the speech recognition result with the correct sentence.

For each noise suppression parameter set in the noise suppression parameter set group, the speech recognition result database creation unit 12 stores the feature, the sound collection condition, the noise suppression parameter set, and the accuracy information, associated with one another, in the speech recognition result database 13 as a second entry.

In other words, noise suppression uses a noise suppression parameter set group consisting of N noise suppression parameter sets (N is an integer of 2 or more) prepared in advance by operator a. Noise suppression produces N noise-suppressed speech files from one speech file, and a speech recognition result is produced for each. The N speech recognition results are compared with the correct sentence, yielding N pieces of accuracy information. Consequently, N second entries associating these items are stored in the speech recognition result database 13. This completes step S12, which is executed for each speech file in the speech database 11.
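As a rough illustration of step S12, the loop below produces N second entries per speech file. It is a minimal sketch, not the patented implementation: the field names (such as `audio_id`, which the patent does not list as part of a second entry) and the callables `extract`, `suppress`, `recognize`, and `score` are hypothetical placeholders for the feature calculation, noise suppression, speech recognition, and accuracy evaluation stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class FirstEntry:                 # one row of speech database 11
    audio_id: str                 # hypothetical key for the speech file
    signal: List[float]           # decoded samples of the speech file (assumed)
    reference: str                # correct sentence
    conditions: Dict[str, str]    # sound collection condition labels


@dataclass
class SecondEntry:                # one row of speech recognition result database 13
    audio_id: str                 # hypothetical link back to the speech file
    features: List[float]
    conditions: Dict[str, str]
    param_set: str
    accuracy: float


def build_result_db(
    speech_db: Sequence[FirstEntry],
    param_sets: Sequence[str],
    extract: Callable[[List[float]], List[float]],
    suppress: Callable[[List[float], str], List[float]],
    recognize: Callable[[List[float]], str],
    score: Callable[[str, str], float],
) -> List[SecondEntry]:
    """Step S12: one second entry per (speech file, noise suppression parameter set)."""
    result_db: List[SecondEntry] = []
    for entry in speech_db:
        features = extract(entry.signal)                       # S121, on the unsuppressed file
        for p in param_sets:
            hypothesis = recognize(suppress(entry.signal, p))  # S122, S123
            result_db.append(SecondEntry(
                audio_id=entry.audio_id,
                features=features,
                conditions=dict(entry.conditions),             # carried over unchanged
                param_set=p,
                accuracy=score(hypothesis, entry.reference),   # S124
            ))
    return result_db
```

The feature is computed once per file and reused for all N parameter sets because it describes the noise in the original speech file rather than in the suppressed output.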

The feature may consist of one or more of the following values, for example:
・ Noise power in each frequency band (see Patent Document 2)
・ Kurtosis of the noise (see Non-Patent Document 1)
・ Signal-to-noise ratio (see Non-Patent Document 2)
The sound collection condition stored in the speech database 11 in association with each speech file is carried over unchanged to the speech recognition result database 13. Examples of items and labels for the sound collection condition are listed in the table below. It is preferable for the sound collection condition to include the terminal name as an item.
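The sketch below computes one possible version of the three example features with NumPy. It assumes, purely for illustration, that the first `noise_len` samples of the file contain noise only (a crude stand-in for the voice activity detection discussed in Non-Patent Document 2), that the bands are equal-width, and that kurtosis is taken over the time-domain noise samples; the cited documents may define these quantities differently.

```python
import numpy as np


def noise_features(x: np.ndarray, noise_len: int, n_bands: int = 4) -> np.ndarray:
    """Return [band noise powers in dB (n_bands values), kurtosis, SNR in dB]."""
    eps = 1e-12
    noise = np.asarray(x[:noise_len], dtype=float)   # assumed noise-only segment
    rest = np.asarray(x[noise_len:], dtype=float)    # assumed to contain the speech
    # Per-band noise power: periodogram of the noise segment, averaged over
    # n_bands equal-width frequency bands.
    spec = np.abs(np.fft.rfft(noise)) ** 2 / len(noise)
    band_db = [10 * np.log10(b.mean() + eps) for b in np.array_split(spec, n_bands)]
    # Kurtosis (non-excess) of the noise samples: about 3 for Gaussian noise,
    # larger for impulsive noise.
    c = noise - noise.mean()
    kurt = np.mean(c ** 4) / (np.mean(c ** 2) ** 2 + eps)
    # SNR: power of the remainder minus the noise power, relative to the noise power.
    p_noise = np.mean(noise ** 2)
    p_signal = max(np.mean(rest ** 2) - p_noise, eps)
    snr_db = 10 * np.log10(p_signal / (p_noise + eps))
    return np.array(band_db + [kurt, snr_db])
```

For white Gaussian noise of standard deviation 0.1 followed by a unit-amplitude sine, this yields band powers near -20 dB, a kurtosis near 3, and an SNR near 17 dB.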

The accuracy information can be calculated from the following:
・ The number of characters or words in the correct sentence
・ The number of correctly recognized characters or words in the speech recognition result
・ The number of erroneous characters or words in the speech recognition result
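One common way to combine these counts into a single figure is recognition accuracy = (N - S - D - I) / N, where N is the length of the correct sentence and S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment. The patent does not fix a formula, so this definition is an assumption; the sketch supports character units (typical for Japanese, which has no word spacing) and whitespace-separated word units.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance, i.e. S + D + I for the best alignment."""
    prev = list(range(len(hyp) + 1))            # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # match or substitution
        prev = cur
    return prev[-1]


def recognition_accuracy(reference: str, hypothesis: str, unit: str = "char") -> float:
    """(N - S - D - I) / N; can be negative when there are many insertions."""
    if unit == "char":
        ref, hyp = list(reference), list(hypothesis)
    elif unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        raise ValueError("unit must be 'char' or 'word'")
    if not ref:
        raise ValueError("correct sentence is empty")
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)
```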

<Operation of Noise Suppression Parameter Set Switching Rule Learning Unit 14 (S14)>
Before this step, operator a designates the sound collection condition for which switching rules are to be learned and inputs it to the noise suppression parameter set switching rule learning unit 14. The unit searches the speech recognition result database 13 based on the sound collection condition designated by operator a and groups the retrieved second entries according to a grouping criterion based on the feature. In each group, it selects a noise suppression parameter set such that the group's accuracy information satisfies a predetermined condition, and takes it as the optimal noise suppression parameter set. The unit then stores the optimal noise suppression parameter set, the grouping criterion, and the sound collection condition used for the search, associated with one another, in the switching rule database 15 as a third entry. This completes step S14. A distinctive feature of this embodiment is that searching by sound collection condition separates noise characteristics that cannot be distinguished by the feature alone, so that a suitable noise suppression parameter set is selected for each.

The method of designating the sound collection condition is described next. The sound collection condition is designated by operator a; to reduce the workload, any of the following designation methods may be used for each item of the sound collection condition:
・ Designate a single condition.
・ Designate multiple conditions combined with OR.
・ Designate no condition (equivalent to OR over all conditions).

When the sound collection condition includes multiple items, the conditions designated for all items are combined with AND. Because the number of sound collection conditions is the number of combinations, it can become very large when there are many items or many conditions per item. The noise suppression parameter set switching rules may therefore be learned for all sound collection conditions or for only some of them. If they are learned for only some conditions, the sound collection condition designated when user b uses the speech recognition service may not exist in the switching rule database 15. It is therefore preferable to designate, in the switching rule database 15, one of the learned noise suppression parameter set switching rules as the default rule for such cases.
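The designation rules above (OR within an item, AND across items, no designation matching everything) and the default-rule fallback can be expressed as a small matching routine. This is a hypothetical sketch: representing a condition as a mapping from item name to a set of allowed labels, treating a label absent from the query as matching anything, and returning the first matching rule in storage order are assumptions rather than details taken from the patent.

```python
from typing import FrozenSet, List, Mapping, Optional, Tuple

# item name -> allowed labels (OR within the item); None means "no condition".
ConditionSpec = Mapping[str, Optional[FrozenSet[str]]]


def matches(spec: ConditionSpec, labels: Mapping[str, str]) -> bool:
    """AND across items; a label missing from `labels` is treated as matching anything."""
    for item, allowed in spec.items():
        if allowed is None:            # no condition designated for this item
            continue
        label = labels.get(item)
        if label is not None and label not in allowed:
            return False
    return True


def select_rule(rules: List[Tuple[ConditionSpec, str]],
                labels: Mapping[str, str], default_rule: str) -> str:
    """Return the first stored rule whose condition matches, else the default rule."""
    for spec, rule_id in rules:
        if matches(spec, labels):
            return rule_id
    return default_rule
```

The same `matches` function could serve the learning-side search, for example by keeping only the second entries `e` for which `matches(spec, e.conditions)` is true.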

Grouping and selection of the optimal noise suppression parameter sets can use the same method as Patent Document 1: the grouping criterion and the optimal noise suppression parameter set for each group are determined alternately and repeatedly so as to maximize the speech recognition accuracy (accuracy information) over all of the supplied data.

<Operation of Noise Suppression/Speech Recognition Device 16>
Step S16, the operation of the noise suppression/speech recognition device 16, is described below with reference to FIG. 5. Before this step, user b has input to the noise suppression/speech recognition device 16 the speech file to be recognized, together with the designated sound collection condition of that speech file.

The sound collection condition can be designated, for example, by user b manually specifying the current sound collection condition when using the speech recognition service, or by the terminal used by user b specifying it through software or similar means. Either at least one item of the sound collection condition is designated or acquired, or no item is. In the search, an item that was not designated or acquired matches all conditions (OR over all conditions).

The noise suppression/speech recognition device 16 searches the switching rule database 15 based on the sound collection condition designated by user b and assigns the speech file acquired from user b to a group according to the grouping criterion of the retrieved third entry. It then applies noise suppression to the speech file using the optimal noise suppression parameter set corresponding to that group, performs speech recognition on the noise-suppressed speech file, and outputs the speech recognition result. This completes step S16.

<Details of Speech Recognition Result Database Creation Unit 12 and Step S12>
Details of the speech recognition result database creation unit 12 and step S12 of this embodiment are described below with reference to FIGS. 6 and 7. As shown in FIG. 6, the speech recognition result database creation unit 12 includes a feature calculation unit 121, a noise suppression unit 122, a speech recognition unit 123, and a speech recognition accuracy evaluation unit 124. The feature calculation unit 121 calculates a feature from the target speech file (S121). The noise suppression unit 122 applies noise suppression to the target speech file using a given noise suppression parameter set from the group designated by operator a (S122). The speech recognition unit 123 performs speech recognition on the noise-suppressed speech file (S123). The speech recognition accuracy evaluation unit 124 calculates accuracy information by comparing the speech recognition result with the correct sentence (S124). For each noise suppression parameter set in the group designated by operator a, the speech recognition result database creation unit 12 stores the feature, the sound collection condition, the noise suppression parameter set, and the accuracy information, associated with one another, in the speech recognition result database 13 as a second entry. This completes the detailed description of step S12.

<Details of Noise Suppression Parameter Set Switching Rule Learning Unit 14 and Step S14>
Details of the noise suppression parameter set switching rule learning unit 14 and step S14 of this embodiment are described below with reference to FIGS. 8 and 9. As shown in FIG. 8, the noise suppression parameter set switching rule learning unit 14 includes a sound collection condition search unit 141, a grouping unit 142, an optimal parameter set selection unit 143, and a convergence determination unit 144. The sound collection condition search unit 141 searches the speech recognition result database 13 based on the sound collection condition designated by operator a (S141). The grouping unit 142 groups the retrieved second entries according to a grouping criterion based on the feature (S142a). The optimal parameter set selection unit 143 selects, for each group, the noise suppression parameter set that maximizes that group's accuracy information (S143a). The convergence determination unit 144 checks whether the selected noise suppression parameter sets of the groups have converged (S144a). If it determines that the optimization of the noise suppression parameter sets has converged (S144b-Y), it sends an output command to the optimal parameter set selection unit 143 (S144c). If it determines that the optimization has not converged (S144b-N), it sends a grouping command to the grouping unit 142 (S144d). On receiving the output command, the optimal parameter set selection unit 143 takes the latest noise suppression parameter sets as the optimal noise suppression parameter sets and stores them, together with the grouping criterion and the sound collection condition used for the search, in the switching rule database 15 as a third entry (S143b). On receiving the grouping command, the grouping unit 142 changes the grouping criterion and performs grouping again (S142b).
This completes the detailed description of step S14.
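The grouping criterion is not spelled out here, so the following is only one hypothetical instantiation of the loop S142a to S144d, not the method of Patent Document 1: the grouping criterion is a set of k centroids in feature space (nearest-centroid assignment), the per-group selection maximizes summed accuracy, convergence means the grouping and the selections stop changing, and the criterion is updated by moving each centroid to the mean feature of the files best served by that group's selected parameter set. `acc[i, j]` stands for the accuracy information of file i under parameter set j, as read from the retrieved second entries.

```python
import numpy as np


def learn_switching_rule(feats, acc, k, max_iter=50):
    """Alternate grouping and per-group parameter selection until nothing changes.

    feats: (n, d) features of the retrieved speech files
    acc:   (n, m) accuracy of file i under noise suppression parameter set j
    Returns (centroids, best): the grouping criterion and each group's set index.
    """
    feats = np.asarray(feats, dtype=float)
    acc = np.asarray(acc, dtype=float)
    centroids = feats[:k].copy()   # naive initialisation (assumption): first k files
    prev_state = None
    for _ in range(max_iter):
        # S142: group by nearest centroid, a feature-based grouping criterion.
        dist = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        groups = dist.argmin(axis=1)
        # S143a: per group, the parameter set with the highest summed accuracy
        # (index 0 is an arbitrary placeholder for an empty group).
        best = np.array([acc[groups == g].sum(axis=0).argmax() if np.any(groups == g) else 0
                         for g in range(k)])
        # S144a/b: converged when neither the grouping nor the selection changed.
        state = (groups.tolist(), best.tolist())
        if state == prev_state:
            break
        prev_state = state
        # S142b: change the criterion by moving each centroid to the mean feature
        # of the files that the group's selected parameter set serves best.
        preferred = acc[:, best].argmax(axis=1)
        for g in range(k):
            if np.any(preferred == g):
                centroids[g] = feats[preferred == g].mean(axis=0)
    return centroids, best
```

Because the final grouping is expressed through centroids of the features, the learned rule can later be applied to a new file using its feature alone.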

<Details of Noise Suppression/Speech Recognition Device 16 and Step S16>
Details of the noise suppression/speech recognition device 16 and step S16 of this embodiment are described below with reference to FIGS. 10 and 11. As shown in FIG. 10, the noise suppression/speech recognition device 16 includes a sound collection condition search unit 161, a feature calculation unit 162, a noise suppression parameter set derivation unit 163, a noise suppression unit 164, and a speech recognition unit 165. The sound collection condition search unit 161 searches the switching rule database 15 based on the sound collection condition designated by user b (S161). The feature calculation unit 162 calculates a feature from the speech file acquired from user b (S162). The noise suppression parameter set derivation unit 163 assigns the feature of that speech file to a group according to the grouping criterion of the retrieved third entry and derives the corresponding optimal noise suppression parameter set (S163). The noise suppression unit 164 applies noise suppression to the speech file acquired from user b using the derived optimal noise suppression parameter set (S164). The speech recognition unit 165 performs speech recognition on the noise-suppressed speech file and outputs the speech recognition result (S165). This completes the detailed description of step S16.
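Assuming, for illustration only, that the grouping criterion stored in a third entry is a set of centroids in feature space with nearest-centroid assignment, step S163 reduces to finding the closest centroid and returning that group's parameter set. The `SwitchingRule` layout below is a hypothetical representation of a third entry, not a format defined in the patent.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class SwitchingRule:
    """Hypothetical layout of one third entry in the switching rule database 15."""
    conditions: dict             # sound collection condition used in the search
    centroids: np.ndarray        # grouping criterion, assumed nearest-centroid, shape (k, d)
    param_sets: Sequence[str]    # optimal noise suppression parameter set of each group


def derive_param_set(rule: SwitchingRule, features: Sequence[float]) -> str:
    """S163: assign the input file's feature to a group and return that group's set."""
    f = np.asarray(features, dtype=float)
    group = int(((rule.centroids - f) ** 2).sum(axis=1).argmin())
    return rule.param_sets[group]
```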

<Effects of Speech Recognition System 1 of This Embodiment>
With the speech recognition system 1 of this embodiment, a noise suppression parameter set suited to each sound collection condition can be selected automatically, even when the sound collection conditions vary widely.

<Supplementary Notes>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or similar device can be connected, an output unit to which a liquid crystal display or similar device can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating outside the hardware entity can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, and the like), RAM and ROM as memory, an external storage device such as a hard disk, and a bus connecting the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that they can exchange data. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is an example of a physical entity having such hardware resources.

The external storage device of the hardware entity stores the programs needed to implement the functions described above and the data needed to process those programs (the programs need not be kept in the external storage device; they may, for example, be stored in ROM, a read-only storage device). Data obtained by processing these programs is stored in RAM, the external storage device, or the like, as appropriate.

In the hardware entity, each program stored in the external storage device (or ROM or the like), together with the data needed to process it, is read into memory as needed and interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU implements the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiment described above and may be modified as appropriate without departing from its spirit. The processing described in the embodiment need not be executed in time series in the order described; it may also be executed in parallel or individually, depending on the processing capability of the executing apparatus or as required.

As described above, when the processing functions of the hardware entity (the apparatus of the present invention) described in the embodiment are implemented by a computer, the processing content of the functions that the hardware entity should have is described by a program. Executing this program on a computer then implements those processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk drive, flexible disk, magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to it. In another execution form, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may sequentially execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, which implements the processing functions solely through execution instructions and result acquisition, without transferring the program from the server computer to this computer. The program in this embodiment also includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that define the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of the processing may instead be implemented in hardware.

Claims

A learning apparatus comprising:
a speech recognition result database creation unit that, using a speech database storing, as a first entry, an audio file in association with a correct sentence corresponding to the audio file and a sound collection condition, which is a label defining the conditions under which the audio file was recorded, executes, for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets, a process of storing in a speech recognition result database, as a second entry, a feature amount defining the noise characteristics of the audio file, the sound collection condition, the noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result, in association with one another; and
a noise suppression parameter set switching rule learning unit that searches the speech recognition result database based on a specified sound collection condition, groups the retrieved second entries according to a grouping criterion based on the feature amount, and stores in a switching rule database, as a third entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, in association with the grouping criterion and the sound collection condition used for the search.
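The database-creation part of this claim can be illustrated with a short Python sketch. It assumes, hypothetically, that the accuracy information is character accuracy (1 minus the character error rate, computed with Levenshtein distance) and that the feature amount, the noise suppression, and the recognizer are supplied as functions; the claim names none of these, and every identifier below is illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ParamSet = Dict[str, float]  # hypothetical: a noise suppression parameter set


@dataclass
class FirstEntry:
    audio: List[float]   # audio file (here: raw samples)
    reference: str       # correct sentence
    condition: str       # sound collection condition label


@dataclass
class SecondEntry:
    feature: float       # feature amount describing the noise characteristics
    condition: str       # sound collection condition label
    param_set_id: int    # index into the noise suppression parameter set group
    accuracy: float      # accuracy information


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed accuracy measure: 1 - character error rate (may be negative)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)


def build_result_db(
    speech_db: List[FirstEntry],
    param_sets: List[ParamSet],
    feature_fn: Callable[[List[float]], float],
    suppress_fn: Callable[[List[float], ParamSet], List[float]],
    recognize_fn: Callable[[List[float]], str],
) -> List[SecondEntry]:
    """Create one second entry per (first entry, parameter set) pair."""
    result_db: List[SecondEntry] = []
    for entry in speech_db:
        feature = feature_fn(entry.audio)  # noise feature of the unprocessed file
        for pid, params in enumerate(param_sets):
            hypothesis = recognize_fn(suppress_fn(entry.audio, params))
            result_db.append(SecondEntry(feature, entry.condition, pid,
                                         char_accuracy(entry.reference, hypothesis)))
    return result_db
```

Because every file is processed with every parameter set, the result database grows as (number of files) x (number of parameter sets), which is what later lets each group compare all candidate sets on the same data.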

A noise suppression parameter set switching rule learning device that uses a speech recognition result database in which entries, each associating a feature amount defining the noise characteristics of an audio file, a sound collection condition, which is a label defining the conditions under which the audio file was recorded, a noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result, are stored for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets, the device comprising:
a sound collection condition search unit that searches the speech recognition result database based on a specified sound collection condition;
a grouping unit that groups the retrieved entries according to a grouping criterion based on the feature amount; and
an optimum parameter set selection unit that stores in a switching rule database, as an entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, in association with the grouping criterion and the sound collection condition used for the search.
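The search, grouping, and selection performed by these units can be sketched in Python. The sketch assumes, hypothetically, that the grouping criterion is a list of ascending thresholds on a scalar feature amount and that the "predetermined condition" is the highest mean accuracy within the group; a group with no matching entries is marked with -1. None of these choices comes from the claim.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class SecondEntry:
    feature: float       # feature amount
    condition: str       # sound collection condition label
    param_set_id: int    # which noise suppression parameter set was used
    accuracy: float      # accuracy information


@dataclass
class ThirdEntry:
    condition: str             # sound collection condition used for the search
    thresholds: List[float]    # assumed grouping criterion
    best_param_ids: List[int]  # optimum parameter set per group (-1 = no data)


def learn_switching_rule(result_db: List[SecondEntry], condition: str,
                         thresholds: List[float]) -> ThirdEntry:
    # Search: keep only entries recorded under the specified condition.
    hits = [e for e in result_db if e.condition == condition]

    # Grouping: bucket accuracies by feature group, then by parameter set.
    scores: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for e in hits:
        group = bisect.bisect_right(thresholds, e.feature)
        scores[group][e.param_set_id].append(e.accuracy)

    # Selection (assumption): highest mean accuracy wins in each group.
    best: List[int] = []
    for group in range(len(thresholds) + 1):
        per_param = scores.get(group)  # .get does not create empty groups
        if not per_param:
            best.append(-1)
            continue
        best.append(max(per_param, key=lambda pid: mean(per_param[pid])))
    return ThirdEntry(condition, list(thresholds), best)
```

Choosing by mean accuracy is only one reading of "satisfies a predetermined condition"; a minimum-accuracy threshold or a worst-case criterion would slot into the same selection loop.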

A speech recognition device comprising:
a speech database that stores, as a first entry, an audio file in association with a correct sentence corresponding to the audio file and a sound collection condition, which is a label defining the conditions under which the audio file was recorded;
a speech recognition result database creation unit that executes, for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets, a process of associating, as a second entry, a feature amount defining the noise characteristics of the audio file, the sound collection condition, the noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result;
a speech recognition result database that stores the second entries;
a noise suppression parameter set switching rule learning unit that searches the speech recognition result database based on a sound collection condition specified by an operator, groups the retrieved second entries according to a grouping criterion based on the feature amount, and associates, as a third entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, with the grouping criterion and the sound collection condition used for the search;
a switching rule database that stores the third entries; and
a noise suppression / speech recognition unit that searches the switching rule database based on a sound collection condition specified by a user, assigns an audio file acquired from the user to a group according to the grouping criterion of the retrieved third entry, suppresses noise in that audio file based on the optimum noise suppression parameter set corresponding to the assigned group, recognizes the noise-suppressed audio file, and outputs a speech recognition result.

A learning method executed by a learning device, the method comprising:
a step of, using a speech database that stores, as a first entry, an audio file in association with a correct sentence corresponding to the audio file and a sound collection condition, which is a label defining the conditions under which the audio file was recorded, storing in a speech recognition result database, as a second entry, a feature amount defining the noise characteristics of the audio file, the sound collection condition, a noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result, in association with one another, for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets; and
a step of searching the speech recognition result database based on a specified sound collection condition, grouping the retrieved second entries according to a grouping criterion based on the feature amount, and storing in a switching rule database, as a third entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, in association with the grouping criterion and the sound collection condition used for the search.

A noise suppression parameter set switching rule learning method executed by a noise suppression parameter set switching rule learning device that uses a speech recognition result database in which entries, each associating a feature amount defining the noise characteristics of an audio file, a sound collection condition, which is a label defining the conditions under which the audio file was recorded, a noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result, are stored for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets, the method comprising:
a step of searching the speech recognition result database based on a specified sound collection condition;
a step of grouping the retrieved entries according to a grouping criterion based on the feature amount; and
a step of storing in a switching rule database, as an entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, in association with the grouping criterion and the sound collection condition used for the search.

A speech recognition method executed by a speech recognition apparatus, the method comprising:
a step of, using a speech database that stores, as a first entry, an audio file in association with a correct sentence corresponding to the audio file and a sound collection condition, which is a label defining the conditions under which the audio file was recorded, storing in a speech recognition result database, as a second entry, a feature amount defining the noise characteristics of the audio file, the sound collection condition, a noise suppression parameter set, which is a set of parameters used for noise suppression, and accuracy information, which is a value evaluating the accuracy of the speech recognition result, in association with one another, for each noise suppression parameter set in a noise suppression parameter set group consisting of a plurality of noise suppression parameter sets;
a step of searching the speech recognition result database based on a sound collection condition specified by an operator, grouping the retrieved second entries according to a grouping criterion based on the feature amount, and storing in a switching rule database, as a third entry, an optimum noise suppression parameter set, which is the noise suppression parameter set selected in each group so that the accuracy information in that group satisfies a predetermined condition, in association with the grouping criterion and the sound collection condition used for the search; and
a step of searching the switching rule database based on a sound collection condition specified by a user, assigning an audio file acquired from the user to a group according to the grouping criterion of the retrieved third entry, suppressing noise in that audio file based on the optimum noise suppression parameter set corresponding to the assigned group, recognizing the noise-suppressed audio file, and outputting a speech recognition result.

A program for causing a computer to function as the apparatus according to any one of claims 1 to 3.