JPWO2012165529A1

JPWO2012165529A1 - Language model construction support apparatus, method and program

Info

Publication number: JPWO2012165529A1
Application number: JP2013518149A
Authority: JP
Inventors: 長友 健太郎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-06-03
Filing date: 2012-05-31
Publication date: 2015-02-23
Also published as: WO2012165529A1

Abstract

The present invention extracts language feature quantities from learning data; stores language feature quantity origin information that associates each extracted language feature quantity with the learning data it originated from; acquires and stores language feature quantity usage statistics indicating the degree to which each language feature quantity was used in speech recognition processing; calculates the contribution of each learning data item to the speech recognition processing based on the usage statistics and the origin information; and outputs the calculated contributions.

Description

The present invention relates to a language model construction support apparatus, method, and program.

Speech recognition technology, which obtains text information from speech data, uses a component called a language model. The speech recognizer uses this component to select the most linguistically plausible candidate from among multiple recognition-result text hypotheses. One language model in wide use today is the N-gram language model. An N-gram language model is built by statistically learning, from a large amount of text data, the likelihood that a chain of N words appears. Because an N-gram language model operates on statistics obtained from learning data, when the learning data at hand belongs to a particular domain, speech that uses language expressions from that domain can be recognized with high accuracy, whereas recognition accuracy is lower for speech that uses language expressions outside that domain.

For example, Patent Document 1 discloses an apparatus that divides text data into a plurality of clusters, generates a language model for each cluster, evaluates each language model using evaluation data, and integrates the language models selected on the basis of the evaluation results, thereby creating a language model suited to the target application without requiring text data suited to that application.

Patent Document 1: Japanese Patent No. 4537970

However, with the apparatus of Patent Document 1, the recognition accuracy of the finally obtained language model for the target domain is higher than that of the other cluster-specific language models, but there is no guarantee that good recognition accuracy is achieved. All that this method guarantees is that the final language model recognizes the target domain more accurately than the other cluster-specific language models do.

To obtain better recognition accuracy, it is desirable to use learning data that is highly effective at adapting the language model to the target domain, and a mechanism that leads the user to register such learning data is therefore needed. However, it has been difficult for users to know what kind of learning data is effective in improving speech recognition accuracy.

The present invention has been made in view of the above problems, and an object of the invention is to provide a language model construction support apparatus, method, and program that support the registration of learning data that is highly effective at adapting a language model used in speech recognition processing to a target domain.

Another object of the present invention is to provide a language model construction support apparatus, method, and program that allow a user to easily grasp what kind of learning data is effective in improving speech recognition accuracy.

The present invention is a language model construction support apparatus comprising: language feature quantity extraction means for extracting language feature quantities from learning data; means for storing language feature quantity origin information that associates each extracted language feature quantity with the learning data it originated from; means for acquiring and storing language feature quantity usage statistics indicating the degree to which each language feature quantity was used in speech recognition processing; contribution calculation means for calculating the contribution of each learning data item to the speech recognition processing based on the language feature quantity usage statistics and the language feature quantity origin information; and means for outputting the calculated contributions of the learning data.

The present invention is also a language model construction support method comprising: extracting language feature quantities from learning data; storing language feature quantity origin information that associates each extracted language feature quantity with the learning data it originated from; acquiring and storing language feature quantity usage statistics indicating the degree to which each language feature quantity was used in speech recognition processing; calculating the contribution of each learning data item to the speech recognition processing based on the language feature quantity usage statistics and the language feature quantity origin information; and outputting the calculated contributions of the learning data.

The present invention is also a program for causing a computer to execute: a process of extracting language feature quantities from learning data; a process of storing language feature quantity origin information that associates each extracted language feature quantity with the learning data it originated from; a process of acquiring and storing language feature quantity usage statistics indicating the degree to which each language feature quantity was used in speech recognition processing; a process of calculating the contribution of each learning data item to the speech recognition processing based on the language feature quantity usage statistics and the language feature quantity origin information; and a process of outputting the calculated contributions of the learning data.

According to the present invention, it is possible to support the registration of learning data that is highly effective at adapting a language model used in speech recognition processing to a target domain.

FIG. 1 is a block diagram of a language model construction support apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart for explaining the aggregation phase.
FIG. 3 is a flowchart for explaining the learning phase.
FIG. 4 is a flowchart for explaining the recognition phase.
FIG. 5 is a flowchart for explaining the analysis phase.
FIG. 6 is a diagram illustrating another configuration example of the language model construction support apparatus according to the first embodiment.
FIG. 7 is a block diagram of a language model construction support apparatus according to the second embodiment.
FIG. 8 is a diagram illustrating data stored in the language feature quantity origin DB.
FIG. 9 is a block diagram of a language model construction support apparatus according to the third embodiment.
FIG. 10 is a block diagram of a language model construction support apparatus according to the fourth embodiment.
FIG. 11 is a block diagram of a language model construction support apparatus according to the fifth embodiment.
FIG. 12 is a block diagram of a language model construction support apparatus according to the seventh embodiment.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram of a language model construction support apparatus according to the first embodiment of the present invention. As illustrated, the language model construction support apparatus according to this embodiment includes a language feature quantity extraction unit 101, a language feature quantity origin storage unit 102, a language model learning unit 103, a speech recognition unit 104, and a learning data contribution analysis unit 105.

When the language feature quantity extraction unit 101 receives learning data for a language model as input, it extracts (calculates) language feature quantities from that learning data according to a given procedure.

The language feature quantity origin storage unit 102 stores the language feature quantities extracted by the language feature quantity extraction unit 101 in a database (the language feature quantity origin DB) in a form that allows the learning data they originated from to be traced. For example, information such as "the word 'Shiba Inu' (柴犬) appeared 20 times in learning data d1 and 15 times in learning data d2" (language feature quantity origin information) is recorded.

The language model learning unit 103 builds a language model from the language feature quantities extracted by the language feature quantity extraction unit 101, according to a given procedure. For example, in a typical word trigram (3-gram) language model, the language feature quantity is the number of occurrences of each set of three consecutive words. This count is divided by the number of times the first two words (the context) appear consecutively. Performing this calculation for all of the given language feature quantities yields a probabilistic language model.

The speech recognition unit 104 uses the language model built by the language model learning unit 103 to perform speech recognition processing on a separately input speech signal according to a given procedure. The speech recognition unit 104 also records every language feature quantity referenced during the speech recognition processing. This yields language feature quantity usage statistics that give a reference count for each language feature quantity in the speech recognition processing.

The learning data contribution analysis unit 105 calculates, according to a given algorithm, the contribution to the speech recognition processing of each learning data item that was input to the language feature quantity extraction unit 101. It does so based on the language feature quantity usage statistics recorded by the speech recognition unit 104, referring to the language feature quantity origin storage unit 102 as needed.

Next, the operation of the language model construction support apparatus according to this embodiment will be described. Processing in the apparatus consists broadly of four mutually independent phases: the aggregation phase, the learning phase, the recognition phase, and the analysis phase. There is a direct dependency within each of the pairs aggregation → learning, learning → recognition, and recognition → analysis. Each phase operates on the most recent processing result of the phase on which it directly depends. Typically the phases are executed in the order aggregation → learning → recognition → analysis, so the description below follows this order. This is only an example, however; the completion of one phase need not trigger the next. For example, each of these phases may be executed at an arbitrary time.

For example, the learning phase may run in response to the completion of the aggregation phase, may be started automatically immediately before the recognition phase starts, may be started explicitly by the user, or may be started by a timer at predetermined intervals (every few minutes to every hour, every night, and so on). The recognition phase is executed in response to a speech recognition request from the user. The analysis phase may run in response to the completion of the recognition phase, may be started explicitly by the user, or may be started by a timer at predetermined intervals.

The aggregation phase is described first. The aggregation phase is executed each time the user provides new learning data, and it incrementally updates the language feature quantity origin database. The details of the aggregation phase are described below with reference to FIG. 2.

First, the user inputs learning data (step A1). A learning data item is a single body of text data, typically one document file. Any file format is acceptable as long as text data can be extracted from it. Text data entered directly, for example through a text input form on a web page, may also be treated as one learning data item.

A single learning data item preferably consists only of language expressions belonging to a single domain, but it may contain language expressions belonging to multiple domains. However, compared with learning data that contains only language expressions from a specific domain, learning data that contains a uniform mix of expressions from different domains may be given a lower estimated contribution to the speech recognition of utterances made in the style of that specific domain.

For example, when the speech is a reading of daily business reports, the learning data should preferably be daily business reports themselves (the "daily business report" domain). If product brochures (the "product introduction" domain) are combined with them and given as a single learning data item, the absolute amount of language expressions in the learning data increases, yet the expressions specific to the "daily business report" domain hardly increase, so the contribution to utterances specific to the "daily business report" domain goes down.

The learning data preferably contains many of the language expressions actually used in the spoken utterances, but it need not contain any. Learning data that lacks such expressions may, however, lower speech recognition accuracy.

The language feature quantity extraction unit 101 extracts language feature quantities from the given learning data (step A2). For example, it applies morphological analysis to split the text into words (morphemes) and then counts the number of occurrences of each chain of N words. Other possible language feature quantities include the occurrence frequency of each individual word (unigram frequency) and occurrence frequencies per unit of meta-information (part of speech, semantic class, and so on) to which each word or word chain belongs. Phrases made up of combinations of several words, or subtrees of a syntax tree obtained by dependency analysis, may also be used. Which feature quantities to extract is decided in advance according to the language model that is ultimately generated and the speech recognition unit 104 described later.
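As an illustration of step A2, the following is a minimal sketch of counting word chains in text that has already been split into words. The morphological analyzer itself is out of scope here, and the function name `count_ngrams` and the sample token list are assumptions introduced for illustration, not part of the apparatus described.

```python
from collections import Counter
from typing import Sequence


def count_ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every run of n consecutive tokens in an already-tokenized text."""
    # range() is empty when the text is shorter than n, so the result is an empty Counter.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical output of a morphological analyzer for one short text.
tokens = ["今日", "は", "晴れ", "明日", "は", "雨", "です"]
unigrams = count_ngrams(tokens, 1)  # unigram frequencies
trigrams = count_ngrams(tokens, 3)  # frequencies of 3-word chains
```

Feature quantities based on part of speech or semantic class could be counted the same way by replacing each token with its meta-information label before counting.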

The language feature quantity origin storage unit 102 stores the occurrence frequencies of the language feature quantities extracted in step A2 in a form that allows the learning data they were extracted from to be traced (step A3). For example, the serial number of the learning data as registered in the language feature quantity origin storage unit 102 is adopted as the ID number representing that learning data, and for each language feature quantity a list whose elements are pairs of ID number and occurrence frequency is recorded in the language feature quantity origin database. This completes the aggregation phase.
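The record layout of step A3 can be sketched as an in-memory mapping from each language feature quantity to a list of (ID number, occurrence frequency) pairs. The class name `FeatureOriginDB` and its methods are hypothetical; the text does not prescribe any particular storage engine or interface.

```python
from collections import defaultdict


class FeatureOriginDB:
    """In-memory stand-in for the language feature quantity origin database."""

    def __init__(self):
        self._next_id = 1                   # serial number used as the learning data ID
        self._origins = defaultdict(list)   # feature -> [(data_id, frequency), ...]

    def register(self, feature_counts: dict) -> int:
        """Record the features of one learning data item and return its ID number."""
        data_id = self._next_id
        self._next_id += 1
        for feature, freq in feature_counts.items():
            self._origins[feature].append((data_id, freq))
        return data_id

    def origins(self, feature):
        """Return the (data_id, frequency) pairs recorded for a feature."""
        return list(self._origins.get(feature, []))


db = FeatureOriginDB()
d1 = db.register({"柴犬": 20, "散歩": 3})
d2 = db.register({"柴犬": 15})
```

With this layout, `db.origins("柴犬")` answers the question of which learning data items a feature came from and how often it appeared in each.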

After the aggregation phase has been performed at least once, the language model learning phase is executed. The learning phase may be started (1) after the aggregation phase completes, (2) at predetermined intervals (every few minutes to every few hours) or on a regular schedule, or (3) when the user explicitly instructs it to start, among other possibilities. When the learning phase is not started in conjunction with the aggregation phase, a database that temporarily accumulates the output of the aggregation phase may be used, or the language feature quantity origin database may be reused for this purpose. In either case, if neither the learning data nor the language feature quantities extracted from it have changed since the previous learning phase, the learning phase may end without doing any processing.

The details of the learning phase are described below with reference to FIG. 3.

The language model learning unit 103 collects the aggregated language feature quantities according to a given setting (step A4). Specifically, for example, it adds together the language feature quantities extracted from all known learning data. In doing so, the frequencies of language feature quantities that can be regarded as identical are summed across learning data items. When learning an N-gram language model, for example, a weighted sum may be taken using a weight value given separately for each learning data item. These weights may also be calculated automatically by some algorithm. Because these are methods commonly used for task adaptation of language models, their details are omitted.
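The collection in step A4 can be sketched as follows: frequencies of identical features are summed across learning data items, optionally scaled by a per-item weight. The function name `merge_counts`, the default weight of 1.0, and the sample data are assumptions made for this sketch; how the weights are chosen is left open, as in the text.

```python
from collections import Counter


def merge_counts(per_data_counts, weights=None):
    """Sum feature frequencies over learning data items, with optional per-item weights."""
    merged = Counter()
    for data_id, counts in per_data_counts.items():
        # Items without an explicit weight are counted as-is (weight 1.0).
        w = 1.0 if weights is None else weights.get(data_id, 1.0)
        for feature, freq in counts.items():
            merged[feature] += w * freq
    return merged


per_data = {1: {("柴犬",): 20}, 2: {("柴犬",): 15, ("猫",): 4}}
plain = merge_counts(per_data)                               # simple sum
weighted = merge_counts(per_data, weights={1: 0.5, 2: 2.0})  # weighted sum
```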

Next, the language model learning unit 103 learns a language model using the collected language feature quantities (step A5). In the case of an N-gram language model, for example, the model is obtained by dividing the occurrence frequency of each N-word chain collected as a language feature quantity by the occurrence frequency of the (N-1)-word chain formed by the first (N-1) words of that chain. That is, if the occurrence frequency of a word chain $w_{i-N+1} \ldots w_{i-1} w_i$ is written $N(w_{i-N+1}^{i})$, the conditional probability $P(w_i \mid w_{i-N+1}^{i-1})$ that $w_i$ appears after the word chain $w_{i-N+1} \ldots w_{i-1}$ has been observed is given by (Equation 1).

$$P(w_i \mid w_{i-N+1}^{i-1}) = \frac{N(w_{i-N+1}^{i})}{N(w_{i-N+1}^{i-1})} \quad \text{(Equation 1)}$$
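The division described in step A5 can be sketched as a relative-frequency estimate: each N-chain count is divided by the count of its first (N-1) words. The helper names and the toy token sequence are assumptions, and smoothing is deliberately left out of this sketch.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_probabilities(ngram_counts, context_counts):
    """Divide each N-chain count by the count of the chain of its first N-1 words."""
    probs = {}
    for ngram, freq in ngram_counts.items():
        context_freq = context_counts.get(ngram[:-1], 0)
        if context_freq > 0:  # skip contexts that were never counted
            probs[ngram] = freq / context_freq
    return probs


tokens = ["a", "b", "c", "a", "b", "d"]
trigram_probs = ngram_probabilities(count_ngrams(tokens, 3), count_ngrams(tokens, 2))
```

In this toy sequence the context ("a", "b") occurs twice, once followed by "c" and once by "d", so each continuation receives probability 0.5.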

This completes the learning phase, and the language model is built. In general, processing that assigns some probability mass to word chains with low occurrence frequency (smoothing) is performed as well.

After the learning phase has completed at least once, the recognition phase is executed. The recognition phase is executed each time the user requests speech recognition processing. The operation in the recognition phase is described below with reference to FIG. 4.

First, the user inputs speech signal data to the speech recognition unit 104. The speech recognition unit 104 analyzes the speech signal data according to a predetermined algorithm and outputs the character string, word string, or the like judged to be most appropriate (step A6). This speech recognition processing uses the language model built in the learning phase.

There are various methods for the specific procedure of speech recognition processing in the speech recognition unit 104, but it is generally performed as follows.

First, acoustic feature quantities are calculated from the input speech signal. An acoustic feature quantity is a vector of signal-level features, such as cepstrum, power, and Δ power, that people use as cues when perceiving speech. The resulting sequence of acoustic feature quantities is compared with the acoustic feature pattern distribution of each phoneme recorded in a component called an acoustic model, which generates the possible combinations of phoneme sequences. The closeness in cumulative distance between the pattern distributions and a phoneme sequence is called the acoustic likelihood.

Next, each phoneme sequence is checked against a dictionary to obtain the possible word sequences (hypotheses). The plausibility of each word sequence is then evaluated with the language model built by the language model learning unit 103, yielding a numerical evaluation result (the language likelihood). For a probabilistic language model such as an N-gram language model, the language likelihood is the occurrence probability of the word sequence. Finally, for each hypothesized word sequence, a value is calculated by combining its acoustic likelihood and language likelihood with a given weight, and the hypothesis with the highest value is output as the recognition result. In practice not all hypotheses can be searched, so hypotheses with low likelihood are pruned as the speech signal is input. The search for the maximum likelihood hypothesis is generally solved as a minimum-cost path search problem by using the reciprocal of the likelihood. The present invention does not depend on the specific speech recognition algorithm, but the language model used must at least be one built in the learning phase described above.
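The final selection step can be sketched as follows. The text says only that the two likelihoods are combined with a given weight; this sketch assumes log-domain scores and a weighted sum, which is a common choice but is not stated in the source. The function name `best_hypothesis`, the parameter `lm_weight`, and the scores are hypothetical.

```python
def best_hypothesis(hypotheses, lm_weight=1.0):
    """Pick the hypothesis with the highest combined score.

    Each hypothesis is (words, acoustic_log_likelihood, language_log_likelihood).
    Assumption: scores are log-likelihoods combined as acoustic + lm_weight * language.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])


hypotheses = [
    (["今日", "は", "あれ"], -10.0, -9.0),   # slightly better acoustically
    (["今日", "は", "晴れ"], -10.5, -4.0),   # much more plausible linguistically
]
with_lm = best_hypothesis(hypotheses, lm_weight=1.0)
acoustic_only = best_hypothesis(hypotheses, lm_weight=0.0)
```

With the language model weighted in, the linguistically plausible hypothesis wins; with the weight set to zero, the acoustically closer one does, which shows how the language model steers the result.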

The speech recognition unit 104 also records every language feature quantity referenced during the speech recognition processing. That is, after the speech recognition processing completes, or as it proceeds, the speech recognition unit 104 calculates usage statistics for the language feature quantities and inputs them to the learning data contribution analysis unit 105 (step A7).

For example, the words and word chains contained in the finally obtained maximum likelihood word sequence are recorded and output separately from the maximum likelihood word sequence itself. The statistics need not be limited to the maximum likelihood word sequence; the occurrence frequencies of words and word chains in the N highest-likelihood hypotheses (N-best) may be used. Alternatively, the statistics may cover the occurrence frequencies of the words or word chains contained in the lattice structure (word graph) of hypothesized word sequences that the speech recognition unit 104 held internally when the maximum likelihood word sequence was obtained. Beyond the final top-ranked hypotheses, every word or word chain for which the language model was consulted during the search may be recorded and output together with its count.

When these are recorded, the acoustic likelihood of the phoneme sequence that makes up each word or word chain may be recorded and output with it. A predetermined semantic analysis may also be applied to the maximum likelihood word sequence, the N-best sequences, or the word graph to extract semantic units (phrases or named entities) consisting of one or more word chains, and their occurrence counts may be aggregated, recorded, and output. Alternatively, aggregation, recording, and output may be performed per unit of meta-information (part of speech, semantic class) of the words or word chains. Dependency structures or co-occurrence information of words or word strings may also be aggregated, recorded, and output.

Whichever statistics are used, the language feature quantities being aggregated must be ones for which it can be determined, by referring to the language feature quantity origin storage unit 102, in which past learning data they appeared and how often. Typically this means targeting the language feature quantities extracted by the language feature quantity extraction unit 101 and recorded by the language feature quantity origin storage unit 102. However, secondary feature quantities that can be computed from multiple records held by the language feature quantity origin storage unit 102 may also be used. Specifically, even when only word trigrams are recorded in the language feature quantity origin storage unit 102, language feature quantities for word bigrams or part-of-speech trigrams can be derived by summing the trigram records appropriately, so statistics on such language feature quantities may be used. A plurality of different algorithms may also be used in the language feature quantity extraction unit 101, provided that consistency is maintained. This completes the recognition phase.
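Deriving a secondary feature quantity from recorded trigrams can be sketched by summing trigram frequencies over the last word. The function name `bigrams_from_trigrams` and the sample counts are assumptions. Note one limitation of this simple marginalization: the last two words of each text never start a trigram, so that final bigram is not counted.

```python
from collections import Counter


def bigrams_from_trigrams(trigram_counts):
    """Sum trigram frequencies over the third word to obtain leading-bigram frequencies."""
    bigrams = Counter()
    for (w1, w2, _w3), freq in trigram_counts.items():
        bigrams[(w1, w2)] += freq
    return bigrams


trigram_counts = {
    ("a", "b", "c"): 2,
    ("a", "b", "d"): 1,
    ("b", "c", "a"): 1,
}
bigram_counts = bigrams_from_trigrams(trigram_counts)
```

A part-of-speech trigram count could be derived in a similar way by mapping each word to its part of speech before summing, assuming such a mapping is available.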

After the recognition phase has completed at least once, the analysis phase is executed. The analysis phase may be started (1) after the recognition phase completes, (2) at predetermined intervals (every few minutes to every few hours), or (3) when the user explicitly instructs it to start, among other possibilities. When the analysis phase is not started in conjunction with the recognition phase, a database that temporarily accumulates the language feature quantity usage statistics output by the recognition phase may be used, or the statistics may be input directly to the learning data contribution analysis unit 105. In either case, if the language feature quantity usage statistics have not changed since the previous recognition phase, the analysis phase may end without doing any processing. The operation in the analysis phase is described below with reference to FIG. 5.

The learning data contribution analysis unit 105 uses the per-feature statistics output by the speech recognition unit 104 and, referring to the language feature origin storage unit 102, calculates with a given algorithm the contribution of each learning data item, that is, how much each item was referenced in speech recognition processing.

An example of the contribution calculation algorithm follows. The speech recognition unit 104 obtains the number of occurrences of each word contained in the maximum likelihood word sequence produced by speech recognition processing. For example, suppose the recognition result for the utterance 「今日は晴れ、明日は雨です」 ("Today is sunny, tomorrow will be rainy") is the word sequence 「今日/は/あれ/明日/は/雨/で」. Then statistics stating that 「今日」, 「あれ」, 「明日」, 「雨」, and 「で」 each appeared once and 「は」 appeared twice are output and input to the learning data contribution analysis unit 105. The learning data contribution analysis unit 105 stores the statistics input in the past and adds the newly obtained statistics to them. Suppose that, as a result, the cumulative counts are 30 for 「今日」, 100 for 「あれ」, 40 for 「明日」, 10 for 「雨」, and 500 for 「は」. Hereinafter, the set of word occurrence information at a time t is denoted Et, and the cumulative number of occurrences of a word w in Et is denoted Et(w).
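
The accumulation described above can be sketched as follows. This is only an illustration: the previous totals are hypothetical values chosen so that adding the example utterance yields the cumulative counts given in the text, and the variable names are not from the source.

```python
from collections import Counter

# Word sequence recognized for the example utterance.
recognized = ["今日", "は", "あれ", "明日", "は", "雨", "で"]

# Per-utterance statistics output by the recognizer.
utterance_stats = Counter(recognized)

# Hypothetical totals stored before this utterance (assumed values).
cumulative = Counter({"今日": 29, "あれ": 99, "明日": 39, "雨": 9, "は": 498})

# E_t(w): cumulative occurrence count of word w after adding the new statistics.
cumulative.update(utterance_stats)
```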

The learning data contribution analysis unit 105 refers to the language feature origin storage unit 102 and calculates how often each word appeared in each learning data item. For example, it obtains information such as: the word 「今日」 appeared 200 times in learning data d1, 20 times in learning data d2, and not at all in learning data d3. Hereinafter, the number of occurrences of a word w in a learning data item c is denoted F(w|c). The contribution Contribute(t|c) of learning data c at a time t is then expressed as in (Equation 2).

(Equation 2) expresses that learning data that supplies more of the words that occur more often in past recognition results contributed more to that past speech recognition. Here α is a weight for each learning data item and is basically set to match the value used in step A4. If the learning data were simply added together without weighting at learning time, α = 1 for all learning data.
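
The body of (Equation 2) is not reproduced in this text. The sketch below therefore assumes one plausible reading of the surrounding description, namely Contribute(t|c) = α_c · Σ_{w ∈ Et} F(w|c), in which each word seen in past recognition results is counted once regardless of how often it was used. This form, the function name, and the counts other than those for 「今日」 are assumptions made for illustration.

```python
def contribution_eq2(seen_words, origin_counts, alpha=None):
    """Assumed form: alpha_c * sum of F(w|c) over the words w present in E_t."""
    alpha = alpha or {}
    return {
        c: alpha.get(c, 1.0) * sum(counts.get(w, 0) for w in seen_words)
        for c, counts in origin_counts.items()
    }


# F(w|c): occurrences of word w in learning data c (count for 雨 is hypothetical).
origin_counts = {
    "d1": {"今日": 200, "雨": 5},
    "d2": {"今日": 20},
    "d3": {},
}
seen_words = {"今日", "雨"}
contributions = contribution_eq2(seen_words, origin_counts)
```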

This case does not take into account how many times each word was used in recognition processing. If that is taken into account, the contribution is expressed as in (Equation 3).
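
The body of (Equation 3) is likewise not reproduced here. A sketch under the assumption that usage is taken into account by multiplying each term by the cumulative usage count, that is, Contribute(t|c) = α_c · Σ_{w ∈ Et} Et(w) · F(w|c). The form, the names, and the count for 「雨」 in d1 are assumptions.

```python
def contribution_eq3(usage, origin_counts, alpha=None):
    """Assumed form: alpha_c * sum over w of E_t(w) * F(w|c)."""
    alpha = alpha or {}
    return {
        c: alpha.get(c, 1.0)
        * sum(n * counts.get(w, 0) for w, n in usage.items())
        for c, counts in origin_counts.items()
    }


# E_t(w): cumulative usage counts taken from the running example.
usage = {"今日": 30, "雨": 10}
# F(w|c): occurrences in each learning data item (count for 雨 is hypothetical).
origin_counts = {
    "d1": {"今日": 200, "雨": 5},
    "d2": {"今日": 20},
    "d3": {},
}
contributions = contribution_eq3(usage, origin_counts)
```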

The above equations simply treat all words equally as calculation targets. However, case particles such as 「が」 and 「は」, for example, can be considered to behave the same way regardless of the learning data, so they may be excluded from the calculation, or multiplied by a suitable coefficient so that their influence becomes smaller. With β(w) as the weight for each word, the contribution becomes, for example, (Equation 4).

There are two ways to assign per-word weights: assigning a weight value to each word in advance, and obtaining the weight during speech recognition processing. In the former, weights are decided beforehand, for example by giving proper nouns a high weight and particles and common nouns a low weight. In the latter, the weight is obtained, for example, from the likelihood or score of each word in the recognition result. A so-called confidence measure such as the word posterior probability can be used directly as the weight.
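
A sketch of the first approach, in which β(w) is fixed in advance from the part of speech and plugged into an Equation 4 style sum. Since the body of (Equation 4) is not reproduced in this text, the form α_c · Σ_w β(w) · Et(w) · F(w|c) is an assumption, as are the weight values, the part-of-speech labels, and the counts for 「は」 in each learning data item.

```python
# Hypothetical per-part-of-speech weights (values are assumptions).
POS_WEIGHT = {"proper_noun": 2.0, "common_noun": 0.5, "particle": 0.0}


def contribution_eq4(usage, origin_counts, beta, alpha=None):
    """Assumed form: alpha_c * sum over w of beta(w) * E_t(w) * F(w|c)."""
    alpha = alpha or {}
    return {
        c: alpha.get(c, 1.0)
        * sum(beta.get(w, 1.0) * n * counts.get(w, 0) for w, n in usage.items())
        for c, counts in origin_counts.items()
    }


usage = {"今日": 30, "は": 500}
part_of_speech = {"今日": "common_noun", "は": "particle"}
beta = {w: POS_WEIGHT[pos] for w, pos in part_of_speech.items()}
origin_counts = {
    "d1": {"今日": 200, "は": 1000},
    "d2": {"今日": 20, "は": 3000},
}
contributions = contribution_eq4(usage, origin_counts, beta)
```

With the particle weighted to zero, d2 no longer scores highly merely because it contains many occurrences of 「は」. For the second approach, `beta` could instead be filled from per-word confidence values produced by the recognizer.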

A further option is to take the size of the learning data into account. For example, if a word referenced during recognition processing appears 10 times in learning data consisting of 5000 words and 10 times in learning data consisting of 100 words, the latter is more likely to be learning data close to the content of the speech. This is expressed, for example, by (Equation 5).

Here y denotes a word contained in learning data c. Note that w is a word referenced during recognition processing, so the two range over different populations. As described above, this equation obtains the contribution from the occurrence probability of each word in the learning data rather than from its occurrence count. Because learning data ultimately affects the occurrence probability of each language feature in the language model, this method can be said to reflect that effect.
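
A sketch of the size-normalized variant, assuming (Equation 5), whose body is not reproduced in this text, divides each count F(w|c) by the total Σ_y F(y|c) of the learning data. The exact form and all names are assumptions. The data reproduce the 5000-word and 100-word example from the preceding paragraph, with a placeholder entry "other" standing in for the remaining words.

```python
def contribution_eq5(usage, origin_counts, beta=None, alpha=None):
    """Assumed form: alpha_c * sum_w beta(w) * E_t(w) * F(w|c) / sum_y F(y|c)."""
    beta = beta or {}
    alpha = alpha or {}
    result = {}
    for c, counts in origin_counts.items():
        size = sum(counts.values())  # sum over all words y in learning data c
        if size == 0:
            result[c] = 0.0
            continue
        score = sum(
            beta.get(w, 1.0) * n * counts.get(w, 0) / size
            for w, n in usage.items()
        )
        result[c] = alpha.get(c, 1.0) * score
    return result


usage = {"w": 1}
origin_counts = {
    "large": {"w": 10, "other": 4990},  # 5000 words in total
    "small": {"w": 10, "other": 90},    # 100 words in total
}
contributions = contribution_eq5(usage, origin_counts)
```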

There is another way to think about the size of the learning data. Call the larger learning data in the example above learning data A. Even if the occurrence probability of a word w in learning data A is indeed small, if learning data A was the only one in which w appeared among the language features obtained from all the learning data, its contribution should still be considered large. To handle this, for example, a linear sum of Equation 4 and Equation 5 with suitable weights may be used as the final contribution.

In addition, because word frequency distributions tend to be extremely skewed, the logarithm of each occurrence count may be used in place of the count itself.
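
A small sketch of this damping. The text only says to take the logarithm; `math.log1p` (the logarithm of 1 + count) is substituted here as an assumption so that a count of zero maps to zero instead of being undefined.

```python
import math


def log_damped(count):
    """Return log(1 + count), a damped stand-in for a raw occurrence count."""
    return math.log1p(count)


# Cumulative counts from the running example: 「は」 500 times, 「雨」 10 times.
raw_ratio = 500 / 10
damped_ratio = log_damped(500) / log_damped(10)
```

The raw counts differ by a factor of 50, while the damped values differ by a factor of less than 3, so frequent function words dominate the sum far less.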

Contributions can be calculated with the same algorithm when language features other than words are used. For example, when chains of three words (3-grams) are used, the speech recognition unit 104 tallies, for the example above, the occurrence counts (one each) of the seven word chains 「<S>/今日/は」, 「今日/は/あれ」, 「は/あれ/明日」, 「あれ/明日/は」, 「明日/は/雨」, 「は/雨/で」, and 「雨/で/<E>」. Here <S> and <E> are special symbols denoting the beginning and end of a sentence, respectively. 「で/<E>/<?>」 may be tallied as an eighth word chain; in that case, an arbitrary word entry <?> that does not actually exist (that is, one that cannot be reached from any phoneme sequence) is defined in advance and assigned as the third word. The rest of the contribution calculation algorithm is almost identical to the case where words are the language features, so its description is omitted.
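
The tally of word chains can be sketched as follows. The helper name `trigrams` is hypothetical; padding with one <S> and one <E> follows the example in the text.

```python
from collections import Counter


def trigrams(words, bos="<S>", eos="<E>"):
    """Return the word 3-grams of a sentence padded with start and end symbols."""
    padded = [bos] + list(words) + [eos]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


recognized = ["今日", "は", "あれ", "明日", "は", "雨", "で"]
chains = trigrams(recognized)
chain_counts = Counter(chains)
```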

Once the contribution of each learning data item has been calculated, the system outputs it in a predetermined format (A9). Specifically, a list of the contributions of each learning data item, calculated from the language feature usage statistics obtained from all speech recognition processing performed in the past, is encoded in a predetermined data format and output.

Contributions calculated using only the language feature usage statistics obtained from the speech recognition processing of a specific user may also be output. Contributions to the speech recognition processing of a group consisting of one or more specific users may be output. Alternatively, contributions to the speech recognition processing performed within a certain past period may be calculated. For example, by outputting the contribution of each learning data item day by day, the contributions can be displayed as a graph (such as a line graph). Alternatively again, contributions may be output in aggregate for each user who registered learning data rather than for each learning data item. For example, if a user has registered four learning data items whose contributions to all speech recognition processing in the most recent week are 10%, 15%, 2%, and 5%, the contribution of the learning data registered by that user can be obtained as 10 + 15 + 2 + 5 = 32%. This completes the analysis phase.
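
The per-user aggregation in this example can be sketched as follows. The identifiers, the second user, and the mapping from learning data to registrant are hypothetical.

```python
from collections import defaultdict


def contribution_by_user(contributions, registrant):
    """Sum per-learning-data contributions (in percent) by registering user."""
    totals = defaultdict(float)
    for data_id, value in contributions.items():
        totals[registrant[data_id]] += value
    return dict(totals)


# Contributions in percent over the most recent week (d5 is a hypothetical extra).
contributions = {"d1": 10, "d2": 15, "d3": 2, "d4": 5, "d5": 8}
registrant = {"d1": "userA", "d2": "userA", "d3": "userA", "d4": "userA", "d5": "userB"}
per_user = contribution_by_user(contributions, registrant)
```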

As described above, the present invention includes means for recording usage statistics of the language features referenced during speech recognition processing, and is configured so that how each language feature appeared in each given learning data item can be looked up afterwards. It is therefore possible to calculate, and present to the user, which learning data contain more of the language features that were actually referenced more often during speech recognition processing, and how large their relative share is.

The system configuration shown in FIG. 1 is only an example. For instance, as shown in FIG. 6, two language feature extraction units 101 may be arranged side by side, one connected to the language feature origin storage unit 102 and the other to the language model learning unit 103. For example, when the language model is an N-gram model but dependency relations are used as the language features, language feature collection and language model construction share almost no information, and in such a case the configuration of FIG. 6 may be applied.

(Second Embodiment)
Next, a language model construction support apparatus according to a second embodiment of the present invention is described. FIG. 7 is a block diagram of the language model construction support apparatus according to the second embodiment. This apparatus has almost the same configuration as the first embodiment, but differs in that a pre-update language model can be given to the unit that constructs the language model. The description below focuses on the differences from the first embodiment.

Step A5 in the learning phase is the step that actually constructs the language model. In the second embodiment, a new language model (the post-update language model) is constructed by using, in addition to the language features tallied in step A4, a separately given language model (the pre-update language model).

The language model learning unit 203 registers the set of language features used to construct the pre-update language model in the language feature origin storage unit 202. In this case, the origin of these language features is the pre-update language model itself, so they can be distinguished from the other learning data as appropriate when output in step A9.

There are, for example, two methods for extracting from the pre-update language model the language features used to construct it.

The first method is to store, when the pre-update language model is trained, the set of language features on which the calculation was based, in association with the language model (which can be regarded as a table that takes a language feature as the key and returns its language likelihood). For example, if the pre-update language model was itself constructed by a system conforming to the first embodiment of the present invention, storing such information is easy.

The second method is to restore the occurrence count of each language feature, using a given algorithm, from the language likelihood of each language feature contained in the pre-update language model. Any frequency restoration algorithm may be used. For example, Reference 1, 「相補的バックオフを用いた言語モデル融合ツールの構築」 ("Construction of a language model fusion tool using complementary back-off"), IPSJ SIG Technical Report, SLP (Spoken Language Processing) 2001, 11, pp. 49-54, introduces a restoration method that exploits the fact that an N-gram language model is a product of conditional probabilities. The result is an approximation, but this poses no particular accuracy problem for use in the present invention.

Various methods are also conceivable for constructing a language model by using the pre-update language model together with the language features tallied in step A4. Two representative methods are described here. Which of these methods is used can be set arbitrarily.

The first language model update method takes the sum (or weighted sum) of the set of language features used to construct the pre-update language model and the set of language features tallied in step A4. Reference 1 cited above calls this corpus merging. In the present embodiment, occurrence counts of language features are already extracted from the pre-update language model as described above, so this method is particularly suitable.

The second language model update method takes a linear sum of language likelihoods between the pre-update language model and a language model constructed from the tallied language features in the same way as step A5 in the first embodiment. There are various concrete ways of taking the linear sum. For a language feature that does not appear in one of the language models, Reference 1 describes a method that assigns zero as the probability value and a method that assigns a probability value estimated from the other language model. This second update method is faster than the first.
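
A minimal sketch of the linear sum, using the variant that assigns zero to a language feature missing from one model. Each model is simplified here to a flat table from language feature to probability, which ignores the conditional, backed-off structure of a real N-gram model; the interpolation weight `lam` and all names are assumptions.

```python
def interpolate(p_new, p_old, lam=0.5):
    """Linear sum of two probability tables; a missing feature counts as zero."""
    keys = set(p_new) | set(p_old)
    return {
        k: lam * p_new.get(k, 0.0) + (1.0 - lam) * p_old.get(k, 0.0)
        for k in keys
    }


# Hypothetical probability tables for the new and the pre-update model.
p_new = {"a": 0.5}
p_old = {"a": 0.25, "b": 0.75}
p_updated = interpolate(p_new, p_old, lam=0.5)
```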

According to the second embodiment of the present invention, language features obtained from learning data added later are added on top of an existing language model, so a larger and more robust language model can be obtained than by constructing a language model from scratch. Even when such a language model is used, the contribution of the learning data added later can be obtained correctly.

Next, as a working example of the present embodiment, a system whose task is the input of daily business reports is described. The learning data are past (handwritten) daily business reports as well as product brochures, e-mails, and the like, with one file treated as one learning data item. The language model to be generated is a word trigram model, and the language feature extraction unit 201 counts the occurrence frequency of each chain of three words. FIG. 8 shows an example of the data stored in the language feature origin DB by the language feature origin storage unit 202.

With such data, the occurrence frequency in each learning data item can be looked up using a trigram as the key. The pre-update language model is assumed to come with the trigram occurrence frequency table that was the input when that model was trained. The language model learning unit 203 merges the occurrence frequency table of the pre-update language model (a table from which an occurrence frequency can be looked up using a trigram as the key) into the table stored in the language feature origin storage unit 202, which adds one column. It then goes through the trigrams in the table of the language feature origin storage unit 202 one at a time and, for each trigram, sums the occurrence frequencies over all sources and stores the total. The subsequent probability calculation is the same as in the first embodiment, and the resulting table mapping trigrams to probability values is output as the post-update language model.
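
The merge and the per-trigram totals can be sketched as follows, assuming the origin table is held as a mapping from trigram to a per-source count dictionary. The layout, the column name `pre_update_lm`, the file names, and the counts are all hypothetical.

```python
def merge_pre_update(origin_table, pre_update_counts, column="pre_update_lm"):
    """Add the pre-update model's trigram counts as one more source column."""
    merged = {tri: dict(cols) for tri, cols in origin_table.items()}
    for tri, n in pre_update_counts.items():
        merged.setdefault(tri, {})[column] = n
    return merged


def total_counts(table):
    """Sum each trigram's occurrence counts over all sources."""
    return {tri: sum(cols.values()) for tri, cols in table.items()}


origin_table = {
    ("今日", "は", "雨"): {"report_001.txt": 3, "mail_017.txt": 1},
}
pre_update_counts = {
    ("今日", "は", "雨"): 10,
    ("明日", "は", "雨"): 4,
}
merged = merge_pre_update(origin_table, pre_update_counts)
totals = total_counts(merged)
```

Keeping the pre-update counts in their own column is what lets the output in step A9 distinguish the pre-update language model from the other learning data.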

As in the first embodiment, the speech recognition unit 204 calculates the maximum likelihood word sequence and outputs it to the display device. It also inputs each word contained in the maximum likelihood word sequence to the learning data contribution analysis unit 205. The learning data contribution analysis unit 205 keeps a count of the cumulative number of words received (the cumulative usage frequency). As the storage scheme, in the same manner as the language feature origin storage unit 202, it uses a table keyed by trigram and by the execution time of the speech recognition processing, from which the usage frequency at that execution time can be looked up. This makes it possible, for example, to delete data once one month has passed since the execution time. A user account identifying the user who executed the speech recognition processing may also be recorded with each entry.

When the user presses the "View learning data usage" button on the UI (user interface) screen, the learning data contribution analysis unit 205 calculates contributions from the counts it holds on the basis of Equation 4, generates a list for each learning corpus, and outputs it to the display device. Here, α and β are both always 1. When the user enters a number of days or months in the "Usage period to include" field on the UI screen, the learning data contribution analysis unit 205 ignores, when obtaining the cumulative usage frequency of each trigram, any entry whose execution time does not fall within the specified period. When the user selects a specific user from the "Users to include" list box on the UI screen, entries that were not registered with the user account of that user are likewise ignored. A multi-select list box may be used to select several users, or users may be selected by separately defined user groups so that all users belonging to the selected user group are included. In these cases too, entries registered with a user account that is not one of the selected users are ignored.
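
The period and user filters can be sketched as follows, assuming each usage entry is a record holding the language feature, the user account, the execution time, and a count. The record layout, field names, and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def cumulative_usage(entries, now, period=None, users=None):
    """Sum usage counts, ignoring entries outside the period or user selection."""
    usage = Counter()
    for entry in entries:
        if period is not None and entry["time"] < now - period:
            continue  # executed before the start of the requested period
        if users is not None and entry["user"] not in users:
            continue  # registered with a user account that was not selected
        usage[entry["feature"]] += entry["count"]
    return usage


tri = ("今日", "は", "雨")
entries = [
    {"feature": tri, "user": "alice", "time": datetime(2012, 5, 30), "count": 2},
    {"feature": tri, "user": "bob", "time": datetime(2012, 5, 29), "count": 1},
    {"feature": tri, "user": "alice", "time": datetime(2012, 3, 1), "count": 5},
]
now = datetime(2012, 5, 31)
all_usage = cumulative_usage(entries, now)
recent_usage = cumulative_usage(entries, now, period=timedelta(days=30))
alice_recent = cumulative_usage(entries, now, period=timedelta(days=30), users={"alice"})
```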

A plurality of attributes may be attachable to each learning data item when it is registered. Examples include the user account of the user who registered the learning data, the registration date and time, and tags (arbitrary labels). The user can select learning data having a specific attribute value as the target of the contribution calculation. For example, a control reading "View usage of learning data registered by XX" (where XX is a list box or text field) is placed on the UI, and the user selects from it. Contributions may also be calculated and displayed for learning data grouped into clusters by attribute. For example, a button reading "View learning data usage by registration time" is placed on the UI screen; when the user selects it, the learning data are classified by their registration date-and-time attribute into specific periods, for example by month, each class is treated as a learning data cluster, and contributions are calculated per cluster.

(Third Embodiment)
FIG. 9 is a block diagram of a language model construction support apparatus according to a third embodiment. This apparatus has the same configuration as the second embodiment, but additionally includes means for performing speech recognition with the pre-update language model as well, taking the difference in language features between it and the post-update model, and calculating contributions from that difference.

In the second embodiment, when the contribution of the pre-update language model is large, the contributions of the individual learning data become relatively small, and the spread between them also narrows. The processing method of the present embodiment is preferable particularly when the focus is on differences in contribution among learning data. One case in which such differences are especially important is when a reward such as a gratuity is given to users who added learning data with a high contribution. A similar value can be obtained in the second embodiment by treating the occurrence count of each language feature in the pre-update language model as zero in step A8, but the present embodiment evaluates the influence of the learning data group on the language models before and after the update more strictly.

(Fourth Embodiment)
FIG. 10 is a block diagram of a language model construction support apparatus according to a fourth embodiment. This embodiment addresses the case in which a correct transcription (correct text) of what the speaker said is given. The language feature extraction unit 501, the language feature origin storage unit 502, and the learning data contribution analysis unit 503 of the language model construction support apparatus of this embodiment have almost the same functions and configurations as the corresponding components in the first embodiment. However, the learning data contribution analysis unit 503 extracts language features from the input correct text and calculates contributions using the statistics of the extracted language features as the usage statistics.

(Fifth Embodiment)
FIG. 11 is a block diagram of a language model construction support apparatus according to a fifth embodiment. This embodiment has almost the same configuration as the first embodiment, but differs in that the learning data contribution analysis unit 105 calculates the contributions of the learning data on the basis of both the language feature usage statistics recorded by the speech recognition unit 104 and information such as the correct text input for each utterance.

For example, a transcription support system for meeting room recordings uses a workflow in which speech recognition is performed first, errors in the recognition result text are then corrected, and the final transcript is completed. In such a case, the recognition result can be compared with the correct text. The learning data contribution analysis unit 105 therefore, for example, multiplies the usage frequency of each language feature by +1 if the feature is common to the speech recognition result and the correct text and by -1 if it does not match, and then obtains contributions by a method such as Equations 2 to 5.

Alternatively, the correct text itself may not be given; instead, a degree of correctness may be given that indicates whether the recognition result text is "almost correct", "contains many errors", or "almost entirely wrong". In this case, the level of correctness (three levels in this example) is converted into a numerical weight such as +1, 0, -1 and multiplied into the language feature usage statistics. For example, if the word "NEC" appeared 5 times in "almost correct" utterances and 3 times in "almost entirely wrong" utterances, the contribution may be calculated by treating the net count as 2.

Furthermore, error correction information may be given for only part of the recognition result. One example is a case in which, as in kana-kanji conversion, the user designates a misrecognized word and selects the correct one from the lower-ranked candidates presented. In this case, a corrected word can clearly be judged to be an error, but for the other words it cannot be determined whether they were correct or were errors regarded as tolerable. By assigning -2 to corrected words and +1 to uncorrected words, the positive effect of the uncorrected words can be kept mild. Of course, in a case where the user explicitly inputs that a certain word is correct, +2 may be assigned to that word so that it has a stronger effect than words that were marked neither as errors nor as correct.
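
The graded weighting can be sketched as follows, using the +1, 0, -1 values given in the text. The level names, the data layout, and the function name are hypothetical.

```python
from collections import Counter

# Weights for the three correctness levels, using the values given in the text.
LEVEL_WEIGHT = {"almost_correct": 1, "many_errors": 0, "almost_wrong": -1}


def graded_usage(utterances):
    """Accumulate word usage, weighting each utterance by its correctness level."""
    usage = Counter()
    for words, level in utterances:
        for word in words:
            usage[word] += LEVEL_WEIGHT[level]
    return usage


# "NEC" appears in 5 almost-correct utterances and 3 almost-wrong ones.
utterances = [(["NEC"], "almost_correct")] * 5 + [(["NEC"], "almost_wrong")] * 3
usage = graded_usage(utterances)
```

The resulting net count of 2 for "NEC" would then take the place of the raw usage count in the contribution calculation.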

(Sixth Embodiment)
In each of the first, second, third, and fifth embodiments described above, the apparatus may search for language features that were frequently adopted as hypotheses during speech recognition processing but did not remain in the final recognition result (and/or in the top N candidates), and apply a penalty to the contribution of learning data containing many such language features. For example, suppose the final recognition result is 「菅総理」 ("Prime Minister Kan") and the speech recognition unit frequently referenced 「癌」 ("cancer") as an intermediate hypothesis. Learning data containing the word 「癌」 many times is then regarded as possibly belonging to a domain (for example, the "medical" domain) unrelated to the target domain (the "politics" domain in this example). Learning data belonging to another domain can have a negative effect on utterances in the target domain, so the contribution of such learning data is reduced somewhat before being presented to the user. For example, for a word that was adopted as a hypothesis N times but did not remain in any of the top K hypotheses with the highest likelihood, the language feature usage statistics are multiplied by -(γ/N) as a penalty. γ is a given constant that adjusts the strength of the penalty.

(Seventh embodiment)
FIG. 12 is a block diagram of a language model construction support apparatus according to the seventh embodiment. The language model construction support apparatus of this embodiment includes a plurality of different speech recognition units. Learning data is supplied collectively, but there are a plurality of speech recognition units 104, and a language model is learned (updated) separately for each of them. The speech recognition processes may all run simultaneously, or only one (or several) explicitly designated by the user may run. With this system configuration, a different language feature quantity usage history (language feature quantity usage statistical information) is obtained for each speech recognition unit, so the contribution can be obtained independently for each.

This makes it possible to show how much the learning data contributed to which speech recognition unit. For example, by assigning a speech recognition unit (and language model) to each domain or to each application (for example, voice dictation or voice map search), the user can learn which domain or application the learning data contributes to. An input designating the language feature quantity statistical information to be used for calculating the contribution may be accepted from the user.
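Computing the contribution independently for each speech recognition unit can be sketched as below. The scoring rule (the usage statistic of each feature multiplied by its occurrence count in the learning data, summed over features) is an assumption standing in for the expressions defined earlier in the description, and all names and values are hypothetical.

```python
def contribution(usage, origin):
    """Score each piece of learning data against one recognizer's usage statistics.

    usage:  feature -> usage statistic for one speech recognition unit
    origin: data_id -> {feature: occurrence count in that learning data}
    """
    return {
        data_id: sum(usage.get(f, 0) * c for f, c in feats.items())
        for data_id, feats in origin.items()
    }

# Origin information shared by all recognizers (learning data is given collectively).
origin = {"doc_a": {"route": 2, "station": 1}, "doc_b": {"memo": 3}}

# A separate usage history per speech recognition unit.
usage_by_recognizer = {
    "map_search": {"route": 5, "station": 2},
    "dictation": {"memo": 4, "route": 1},
}

per_recognizer = {
    name: contribution(usage, origin)
    for name, usage in usage_by_recognizer.items()
}
```

With these values, `doc_a` scores highest for the map search recognizer and `doc_b` for the dictation recognizer, which is the kind of per-application view described above.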

When a plurality of speech recognition units are operated simultaneously, the final recognition results are often combined into one by some method. For example, the ROVER method disclosed in "A post-processing system to yield reduced word error rate: Recognizer Output Voting Error Reduction (ROVER), In Proceedings IEEE Automatic Speech Recognition and Understanding Workshop, pp. 347-352" may be used. Alternatively, a plurality of recognition results may be presented to the user simultaneously.
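The idea of combining several outputs by voting can be shown with a deliberately simplified sketch. It is not the ROVER algorithm itself: ROVER first aligns the recognizer outputs into a word transition network and can also take confidence scores into account, whereas this sketch assumes the hypotheses are already aligned word by word and uses frequency voting only. The function name `vote` is hypothetical.

```python
from collections import Counter

def vote(aligned_hypotheses):
    """Pick the most frequent word in each position of pre-aligned hypotheses.

    aligned_hypotheses: list of word lists of equal length (alignment is
    assumed to have been done beforehand).
    """
    return [
        Counter(slot).most_common(1)[0][0]
        for slot in zip(*aligned_hypotheses)
    ]

hypotheses = [
    ["the", "cat", "sat"],
    ["the", "cat", "sat"],
    ["a", "cat", "sad"],
]
combined = vote(hypotheses)
```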

(Eighth embodiment)
The language model construction support apparatus according to the eighth embodiment has the same configuration as each of the first to seventh embodiments described above, and performs the analysis in units of learning data clusters when calculating the contribution. The contribution is calculated using Expressions 2 to 5, additionally taking a summation over "each corpus belonging to the learning cluster." Various settings can be used for the learning data clusters, for example data registered by the same user, data from the same department, data explicitly selected by the user, or data registered within a predetermined period (for example, one month). The apparatus differs from the first to seventh embodiments in that the learning data is divided into several clusters for recording and reference. For example, one cluster may be formed per user, data registered by users belonging to the same department may form one cluster, or the data may be divided by fixed periods (for example, one month) according to the time of registration. The contribution is calculated for each cluster by the same procedure as Expressions 2 to 5, and a sum (weighted sum) is then taken using given weight values. Thus, for example, if clusters are formed for fixed periods, learning data that is to be given particularly strong influence can be distinguished from learning data that is to have less influence. Of course, the way of dividing the data into clusters is not limited to the above.
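The cluster-level calculation can be sketched as follows. The per-corpus contributions are assumed to have been computed already (Expressions 2 to 5 are not reproduced here), and the monthly cluster keys, the weight values, and the function names are hypothetical.

```python
def cluster_contributions(corpus_contrib, cluster_of):
    """Sum the contributions of the corpora belonging to each cluster."""
    totals = {}
    for corpus, value in corpus_contrib.items():
        cluster = cluster_of[corpus]
        totals[cluster] = totals.get(cluster, 0.0) + value
    return totals

def weighted_total(cluster_totals, weights):
    """Weighted sum over clusters using the given weight values."""
    return sum(weights[c] * v for c, v in cluster_totals.items())

# Assumed inputs: contribution per corpus, and a cluster per registration month.
corpus_contrib = {"c1": 3.0, "c2": 1.0, "c3": 4.0}
cluster_of = {"c1": "2012-04", "c2": "2012-04", "c3": "2012-05"}
weights = {"2012-04": 0.5, "2012-05": 1.0}  # newer data weighted more strongly

totals = cluster_contributions(corpus_contrib, cluster_of)
total = weighted_total(totals, weights)
```

Giving the more recent cluster a larger weight is one way to realize the distinction between strongly and weakly influential learning data mentioned in the text.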

Although the present invention has been described above with reference to preferred embodiments, the present invention is not necessarily limited to those embodiments, and can be modified and implemented in various ways within the scope of its technical idea.

The language model construction support apparatus according to the embodiments of the present invention described above may be realized by a computer including a control unit such as a CPU (Central Processing Unit), a storage unit such as a memory or hard disk, a communication unit, and the like. The language feature quantity extraction unit, the language feature quantity origin storage unit, the language model learning unit, the speech recognition unit, and the learning data contribution analysis unit may be realized by the CPU reading and executing an operation program or the like stored in the storage unit, or may be configured as hardware. It is also possible to realize only some of the functions of the embodiments described above by a computer program. The computer program may be recorded on a computer-readable storage medium. The computer program may be loaded into memory from a recording medium, or may be downloaded to the computer through a network and loaded into memory.

Part or all of the above embodiments can also be described as in the following appendices, but are not limited to them.

(Appendix 1)
A language model construction support apparatus comprising:
language feature quantity extraction means for extracting language feature quantities for each piece of learning data;
means for storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
means for acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity was used in speech recognition processing performed in the past;
contribution calculation means for calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
means for outputting the calculated contribution of the learning data.

(Appendix 2)
The language model construction support apparatus according to Appendix 1, wherein the contribution calculation means calculates, as the contribution, a value corresponding to the degree of appearance, in each piece of learning data, of the language feature quantities used in the speech recognition processing.

(Appendix 3)
The language model construction support apparatus according to Appendix 1 or 2, wherein the language feature quantity usage statistical information is weighted based on at least one of a correct text for an utterance to be processed in the speech recognition processing, information on the degree of correctness of the speech recognition result for the utterance, and correction information for the speech recognition result for the utterance.

(Appendix 4)
The language model construction support apparatus according to any one of Appendices 1 to 3, wherein a language model used in speech recognition processing is updated or constructed based on the language feature quantities acquired by the language feature quantity extraction means.

(Appendix 5)
The language model construction support apparatus according to any one of Appendices 1 to 4, further comprising speech recognition means for performing speech recognition on speech data using the language model and outputting the language feature quantity usage statistical information.

(Appendix 6)
The language model construction support apparatus according to any one of Appendices 1 to 5, wherein at least one of the learning data for which the contribution is to be calculated and the language feature quantity usage statistical information used for calculating the contribution is switched or limited in response to a user request.

(Appendix 7)
The language model construction support apparatus according to any one of Appendices 1 to 6, wherein a difference is taken between the language feature quantity statistical information acquired in speech recognition processing performed using the language model before updating and the language feature quantity statistical information acquired in speech recognition processing performed using the language model after updating, and the contribution of the learning data is calculated using the difference.

(Appendix 8)
A language model construction support method comprising:
extracting language feature quantities for each piece of learning data;
storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity was used in speech recognition processing performed in the past;
calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
outputting the calculated contribution of the learning data.

(Appendix 9)
The language model construction support method according to Appendix 8, wherein, in calculating the contribution, a value corresponding to the degree of appearance, in each piece of learning data, of the language feature quantities used in the speech recognition processing is calculated as the contribution.

(Appendix 10)
The language model construction support method according to Appendix 8 or 9, wherein the language feature quantity usage statistical information is weighted based on at least one of a correct text for an utterance to be processed in the speech recognition processing, information on the degree of correctness of the speech recognition result for the utterance, and correction information for the speech recognition result for the utterance.

(Appendix 11)
The language model construction support method according to any one of Appendices 8 to 10, wherein a language model used in speech recognition processing is updated or constructed based on the acquired language feature quantities.

(Appendix 12)
The language model construction support method according to any one of Appendices 8 to 11, wherein speech recognition is performed on speech data using the language model and the language feature quantity usage statistical information is output.

(Appendix 13)
The language model construction support method according to any one of Appendices 8 to 12, wherein at least one of the learning data for which the contribution is to be calculated and the language feature quantity usage statistical information used for calculating the contribution is switched or limited in response to a user request.

(Appendix 14)
The language model construction support method according to any one of Appendices 8 to 13, wherein a difference is taken between the language feature quantity statistical information acquired in speech recognition processing performed using the language model before updating and the language feature quantity statistical information acquired in speech recognition processing performed using the language model after updating, and the contribution of the learning data is calculated using the difference.

(Appendix 15)
A program for causing a computer to execute:
a process of extracting language feature quantities for each piece of learning data;
a process of storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
a process of acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity was used in speech recognition processing performed in the past;
a process of calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
a process of outputting the calculated contribution of the learning data.

(Appendix 16)
The program according to Appendix 15, wherein, in the process of calculating the contribution, a value corresponding to the degree of appearance, in each piece of learning data, of the language feature quantities used in the speech recognition processing is calculated as the contribution.

(Appendix 17)
The language model construction support apparatus according to Appendix 15 or 16, causing the computer to execute a process of weighting the language feature quantity usage statistical information based on at least one of a correct text for an utterance to be processed in the speech recognition processing, information on the degree of correctness of the speech recognition result for the utterance, and correction information for the speech recognition result for the utterance.

(Appendix 18)
The program according to any one of Appendices 15 to 17, causing the computer to execute a process of updating or constructing a language model used in speech recognition processing based on the acquired language feature quantities.

(Appendix 19)
The program according to any one of Appendices 15 to 18, further causing the computer to execute a speech recognition process of performing speech recognition on speech data using the language model and outputting the language feature quantity usage statistical information.

(Appendix 20)
The program according to any one of Appendices 15 to 19, causing the computer to execute a process of switching or limiting, in response to a user request, at least one of the learning data for which the contribution is to be calculated and the language feature quantity usage statistical information used for calculating the contribution.

(Appendix 21)
The program according to any one of Appendices 15 to 20, causing the computer to execute a process of taking a difference between the language feature quantity statistical information acquired in speech recognition processing performed using the language model before updating and the language feature quantity statistical information acquired in speech recognition processing performed using the language model after updating, and calculating the contribution of the learning data using the difference.
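The processing common to Appendices 1, 8, and 15, together with the difference-based variant of Appendices 7, 14, and 21, can be sketched as follows. This is a minimal illustration under several assumptions that do not come from the text: language feature quantities are taken to be word bigrams, usage statistics are counted from recognition result text, and the contribution is the sum of usage statistic times occurrence count. All function names are hypothetical.

```python
from collections import Counter

def extract_features(text, n=2):
    """Extract word n-grams (bigrams by default) as language feature quantities."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def build_origin(corpora):
    """Origin information: learning data id -> features extracted from it."""
    return {data_id: extract_features(text) for data_id, text in corpora.items()}

def usage_statistics(recognition_results):
    """Count how often each feature appears in past recognition results."""
    usage = Counter()
    for text in recognition_results:
        usage.update(extract_features(text))
    return usage

def contribution(usage, origin):
    """Contribution of each piece of learning data (assumed scoring rule)."""
    return {
        data_id: sum(usage.get(f, 0) * c for f, c in feats.items())
        for data_id, feats in origin.items()
    }

def usage_difference(before, after):
    """Usage statistics after the model update minus those before it."""
    return {f: after.get(f, 0) - before.get(f, 0) for f in set(before) | set(after)}

corpora = {
    "doc_a": "speech recognition uses a language model",
    "doc_b": "the weather is fine today",
}
origin = build_origin(corpora)

before = usage_statistics(["a language model"])
after = usage_statistics(["a language model", "language model training"])

scores = contribution(after, origin)
delta_scores = contribution(usage_difference(before, after), origin)
```

In this example only `doc_a` shares bigrams with the recognition results, so it receives all of the contribution, and the difference-based score reflects only the usage added after the update.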

This application claims priority based on Japanese Patent Application No. 2011-124785, filed on June 3, 2011, the entire disclosure of which is incorporated herein.

101, 201, 501 Language feature quantity extraction unit
102, 202, 502 Language feature quantity origin storage unit
103, 203, 503 Language model learning unit
104, 204 Speech recognition unit
105, 205 Learning data contribution analysis unit

Claims

1. A language model construction support apparatus comprising:
language feature quantity extraction means for extracting language feature quantities for each piece of learning data;
means for storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
means for acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity was used in speech recognition processing performed in the past;
contribution calculation means for calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
means for outputting the calculated contribution of the learning data.

2. The language model construction support apparatus according to claim 1, wherein the contribution calculation means calculates, as the contribution, a value corresponding to the degree of appearance, in each piece of learning data, of the language feature quantities used in the speech recognition processing.

3. The language model construction support apparatus according to claim 1 or 2, wherein the language feature quantity usage statistical information is weighted based on at least one of a correct text for an utterance to be processed in the speech recognition processing, information on the degree of correctness of the speech recognition result for the utterance, and correction information for the speech recognition result for the utterance.

4. The language model construction support apparatus according to any one of claims 1 to 3, wherein a language model used in the speech recognition processing is updated or constructed based on the language feature quantities acquired by the language feature quantity extraction means.

5. The language model construction support apparatus according to any one of claims 1 to 4, further comprising speech recognition means for performing speech recognition on speech data using the language model and outputting the language feature quantity usage statistical information.

6. The language model construction support apparatus according to claim 1, wherein at least one of the learning data for which the contribution is to be calculated and the language feature quantity usage statistical information used for calculating the contribution is switched or limited in response to a user request.

7. The language model construction support apparatus according to any one of claims 1 to 6, wherein a difference is taken between the language feature quantity statistical information acquired in speech recognition processing performed using the language model before updating and the language feature quantity statistical information acquired in speech recognition processing performed using the language model after updating, and the contribution of the learning data is calculated using the difference.

8. A language model construction support method comprising:
extracting language feature quantities from learning data;
storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity is used in speech recognition processing;
calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
outputting the calculated contribution of the learning data.

9. A program for causing a computer to execute:
a process of extracting language feature quantities from learning data;
a process of storing language feature quantity origin information in which the extracted language feature quantities are associated with the learning data from which they originate;
a process of acquiring and storing language feature quantity usage statistical information indicating the degree to which each language feature quantity is used in speech recognition processing;
a process of calculating the contribution of each piece of learning data in the speech recognition processing based on the language feature quantity statistical information and the language feature quantity origin information; and
a process of outputting the calculated contribution of the learning data.