JP2002091477A

JP2002091477A - Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program

Info

Publication number: JP2002091477A
Application number: JP2000280674A
Authority: JP
Inventors: Yohei Okato; 洋平岡登; Jun Ishii; 純石井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-09-14
Filing date: 2000-09-14
Publication date: 2002-03-27

Abstract

PROBLEM TO BE SOLVED: To update acoustic and language models and improve recognition accuracy without placing a large burden on the user. SOLUTION: An acoustic model control server 20 obtains updated acoustic data 107 and constructs an acoustic model. A language model control server 30 obtains updated language data 114 and constructs a language model. These models are transmitted to a voice recognition device 10. In the device 10, an acoustic model updating means 111 updates an acoustic model 102 with the transmitted acoustic model, and a language model updating means 118 updates a language model 103 with the transmitted language model.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium storing a speech recognition program, which update the acoustic model and the language model referred to during speech recognition to their latest state via a network so that a high recognition rate is obtained.

[0002]

2. Description of the Related Art
In speech recognition, digitized input speech is usually converted, by signal processing techniques, into a time series of vectors that represent the acoustic characteristics of the speech well, and is then matched against speech models (an acoustic model and a language model).

[0003] The matching process is the problem of finding the uttered word string $W = [w_1 w_2 \cdots w_k]$ (k is the number of words) from the acoustic feature vector time series $A = [a_1 a_2 \cdots a_n]$, which consists of n time frames. To estimate the word string with the highest recognition accuracy, it suffices to find the recognized word string $W^{*}$ that maximizes the probability $P(W \mid A)$, that is,

$$W^{*} = \arg\max_{W} P(W \mid A) \qquad (1)$$

However, $P(W \mid A)$ is usually difficult to obtain directly, so it is rewritten using Bayes' theorem as in equation (2):

$$P(W \mid A) = \frac{P(W)\,P(A \mid W)}{P(A)} \qquad (2)$$

[0004] When finding the W that maximizes the left-hand side, the denominator $P(A)$ on the right-hand side does not depend on the recognition candidate W, so it suffices to find the W that maximizes the numerator. This gives equation (3):

$$W^{*} = \arg\max_{W} P(W)\,P(A \mid W) \qquad (3)$$

The probability model that gives $P(W)$ is called the language model, and the probability model that gives $P(A \mid W)$ is called the acoustic model.
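Equation (3) is normally evaluated in the log domain, where the product of the two model probabilities becomes a sum. The following minimal Python sketch picks $W^{*}$ from a small, fixed list of candidates; the romanized candidate strings and their log-probabilities are made-up illustrative values, not data from this specification, and a practical decoder searches a much larger hypothesis space rather than a short list.

```python
import math

def decode(candidates):
    """Return the word string maximizing log P(W) + log P(A|W), i.e. equation (3).

    candidates maps each word string W (a tuple of words) to a pair
    (log P(W), log P(A|W)). P(A) is omitted because it is the same for
    every candidate.
    """
    return max(candidates, key=lambda w: candidates[w][0] + candidates[w][1])

# Hypothetical (language model, acoustic model) scores for three candidates.
candidates = {
    ("kyou", "wa", "hare"): (math.log(0.02), math.log(1e-40)),
    ("kyou", "wa", "hara"): (math.log(0.001), math.log(3e-40)),
    ("kyouwa", "hare"): (math.log(0.005), math.log(2e-40)),
}
print(decode(candidates))
```

Here the first candidate wins even though the other two have higher acoustic scores, because its language model probability is larger; this is the interplay between $P(W)$ and $P(A \mid W)$ that equation (3) describes.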

[0005] A typical way of modeling these in speech recognition is to represent the acoustic model as a hidden Markov model and the language model as an (n-1)th-order Markov process over words, called an n-gram.

[0006] Details of these methods are described in, for example, L. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition" (Vols. 1 and 2), Japanese translation supervised by Furui, NTT Advanced Technology, November 1995 (hereinafter Reference 1); Kenji Kita, "Probabilistic Language Models," University of Tokyo Press (hereinafter Reference 2); and Kiyohiro Shikano, Satoshi Nakamura, and Shiro Ise, "Digital Signal Processing of Speech and Sound Information," Shokodo, November 1997 (hereinafter Reference 3).

[0007] In these methods, the parameters that make up a model are estimated statistically from a large amount of data. To build the acoustic model, speech data such as words and sentences are collected in advance from many speakers, and the parameters are estimated by statistical methods so as to improve the recognition accuracy or a measure closely correlated with it. For example, the parameters of the hidden Markov models constituting the acoustic model are estimated with the Baum-Welch algorithm so as to maximize the likelihood that the acoustic model outputs the training data. Methods for estimating acoustic models are described in detail in Volume 2 of Reference 1.
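To make the Baum-Welch step concrete, here is a minimal sketch of one re-estimation iteration for a small hidden Markov model. It assumes discrete output symbols (for example, vector-quantized frames) rather than the continuous feature vectors a real acoustic model would use, and it skips the scaling needed for long sequences; the toy parameters are hypothetical.

```python
import math

def forward_backward(pi, A, B, obs):
    """Unscaled forward/backward passes for a discrete-output HMM.

    pi[i]: initial probability of state i; A[i][j]: transition i -> j;
    B[i][k]: probability that state i outputs symbol k; obs: symbol indices.
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
        beta[T - 1][i] = 1.0
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    return alpha, beta, sum(alpha[T - 1])

def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation of (pi, A, B).

    Returns the new parameters and P(obs | old model); EM guarantees the
    likelihood never decreases from one iteration to the next.
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, p = forward_backward(pi, A, B, obs)
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)] for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, p

# Hypothetical 2-state, 3-symbol model and a short training sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0]
for it in range(5):
    pi, A, B, p = baum_welch_step(pi, A, B, obs)
    print(it, math.log(p))  # log-likelihood of the training data, non-decreasing
```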

[0008] Similarly, to build the language model, the probability of occurrence of each utterance, and of the words that make it up, is calculated from text such as newspapers and conversation transcripts according to the structure of the language model. For example, in an n-gram language model with n = 2 (called a bigram language model), $P(W)$ is approximated as in equation (4). The parameters of an n-gram language model are estimated from the frequencies of n adjacent words in the training text data. Methods for estimating language models are described in detail in Reference 2.

$$P(w_1 \cdots w_k) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_k \mid w_1 \cdots w_{k-1}) \approx P(w_1)\,P(w_2 \mid w_1) \cdots P(w_k \mid w_{k-1}) \qquad (4)$$
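The estimation from "frequencies of n adjacent words" can be sketched for n = 2 as maximum-likelihood counting, $P(w_2 \mid w_1) = C(w_1 w_2) / C(w_1)$. This minimal sketch assumes sentence-boundary tokens `<s>` and `</s>` (so $P(w_1)$ in equation (4) becomes $P(w_1 \mid \langle s \rangle)$) and applies no smoothing; the three-sentence corpus is hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w2 | w1) = C(w1 w2) / C(w1)."""
    history, pair = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history[w1] += 1
            pair[(w1, w2)] += 1
    return {(w1, w2): n / history[w1] for (w1, w2), n in pair.items()}

def sentence_prob(model, words):
    """Right-hand side of equation (4); an unseen bigram gives probability 0."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= model.get((w1, w2), 0.0)
    return p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
model = train_bigram(corpus)
print(model[("the", "cat")])  # 0.5: "the" is followed by "cat" once out of twice
print(sentence_prob(model, ["the", "cat", "sat"]))
```

The zero probability for unseen bigrams is why practical systems add smoothing; Reference 2 covers such methods.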

[0009] By statistically estimating the probability of occurrence of each word from a large amount of text in this way, a language model can be built that achieves higher recognition accuracy than methods that do not use statistics. Because Japanese text is not written with spaces between words, the definition of a word is ambiguous; in this specification, each unit obtained by dividing text by some consistent means is defined as a word. Such a word may be, for example, a linguistic unit such as a character, a morpheme, or a bunsetsu phrase, a unit obtained by segmenting text according to an entropy criterion, or a combination of these.

[0010] FIG. 12 is a block diagram showing the configuration of a conventional speech recognition device disclosed in Reference 1. In FIG. 12, reference numeral 101 denotes a matching means that receives a speech signal 100, performs speech recognition, and outputs a recognition result 104; 102 denotes the acoustic model referred to by the matching means 101 during speech recognition; and 103 denotes the language model referred to by the matching means 101 during speech recognition.

[0011] Next, the operation will be described. The matching means 101 receives the user's speech signal 100, performs speech recognition with reference to the acoustic model 102 and the language model 103, and outputs the recognition result 104. The acoustic model 102 represents the mapping between the time series of acoustic feature vectors, obtained by signal processing of the waveform of the input speech signal 100, and the smallest symbol units handled by the speech recognition device, represented for example by phonemes. The language model 103 describes how combinations of the symbols mapped by the acoustic model 102 correspond to longer recognition units such as words, together with information on word occurrence. In short, the acoustic model 102 gives the probability that the model of a given symbol outputs a vector time series, that is, the probability of an acoustic observation sequence of speech, and the language model 103 gives the probability of occurrence of a given word string.

[0012] FIG. 13 is a flowchart showing the procedure of conventional speech recognition processing that receives the speech signal 100 and obtains the recognition result 104. In step ST1301, the input speech signal 100 is A/D converted into a digital signal. In step ST1302, the digitized speech signal is processed at suitable intervals and converted into a time series of acoustic feature vectors that represent the properties of the speech well.
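Step ST1302 can be illustrated by cutting the digitized signal into overlapping frames and computing one feature vector per frame. This is a minimal sketch under stated assumptions: the 25 ms frame length and 10 ms shift at 16 kHz are common choices, not values from the specification, and the two features (log energy and zero-crossing rate) stand in for the richer spectral features, such as cepstral coefficients, that real recognizers use.

```python
import math

def frame_features(samples, frame_len=400, frame_shift=160):
    """Split a digitized signal into overlapping frames and return one
    2-dimensional feature vector (log energy, zero-crossing rate) per frame.

    frame_len=400 and frame_shift=160 are 25 ms frames every 10 ms at
    16 kHz sampling (assumed values).
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((log_energy, crossings / (frame_len - 1)))
    return features

# One second of a hypothetical 440 Hz tone sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
feats = frame_features(signal)
print(len(feats), feats[0])  # number of frames and the first feature vector
```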

[0013] In step ST1303, the acoustic feature vectors are matched against the acoustic model 102 in an acoustic matching process, and for each recognition candidate the probability of outputting the time series of acoustic feature vectors is obtained. In step ST1304, each recognition candidate is further matched against the language model 103 in a language matching process, and its probability is multiplied by the output probability of the word string. Finally, in step ST1305, the most appropriate candidate is selected from among the recognition candidates to obtain the recognition result 104. Usually, the most appropriate recognition result is the candidate judged by the above matching to have the highest probability.

[0014] When the acoustic and language models built by the above methods do not by themselves achieve sufficient performance, a speech recognition device that the user can customize may be able to improve recognition accuracy, so that hard-to-recognize speech and words are recognized better, by adapting the acoustic model to the user or by adding recognition target words to a user dictionary.

[0015] First, the case of adapting the acoustic model will be described. FIG. 14 is a block diagram showing the configuration of a conventional speech recognition device, disclosed in Reference 1, that includes an acoustic model adaptation means for adapting the acoustic model to the speech signal 100. FIG. 14 differs from FIG. 12 in that it includes an initial acoustic model 1003, an acoustic model adaptation means 1004, and an adapted acoustic model 1401 for adapting the acoustic model to the input speech signal 100, and in that the speech signal 100 is supplied selectively to the acoustic model adaptation means 1004 or to the matching means 101.

[0016] Next, the operation of the speech recognition device shown in FIG. 14 will be described. Using the adaptation speech (speech signal 100) collected before actual recognition and the initial acoustic model 1003, the acoustic model adaptation means 1004 adapts the initial acoustic model 1003 to the adaptation speech, for example by maximum a posteriori (MAP) estimation, and obtains the adapted acoustic model 1401. Methods for adapting acoustic models are described in Chapter 7 of Reference 3. The matching means 101 receives the speech signal 100, performs speech recognition with reference to the adapted acoustic model 1401 and the language model 103, and outputs the recognition result 104.
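As an illustration of MAP adaptation, the following minimal sketch re-estimates a single Gaussian mean vector with the standard MAP update $\hat{\mu} = (\tau \mu_0 + \sum_t x_t) / (\tau + T)$. It assumes every adaptation frame belongs to that one Gaussian (a real system weights frames by their state posteriors), and the prior weight $\tau = 10$ is a hypothetical value. With little data the estimate stays near the initial model; with much data it approaches the adaptation data's mean.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP re-estimate of one Gaussian mean vector.

    prior_mean: mean taken from the initial (pre-adaptation) acoustic model.
    frames: adaptation feature vectors assumed to belong to this Gaussian.
    tau: prior weight (hypothetical value); larger tau trusts the prior more.
    Returns (tau * prior_mean + sum(frames)) / (tau + T).
    """
    T = len(frames)
    dims = len(prior_mean)
    sums = [sum(f[d] for f in frames) for d in range(dims)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + T) for d in range(dims)]

prior = [0.0, 0.0]
adaptation = [[1.0, 2.0]] * 10  # ten hypothetical 2-dimensional frames
print(map_adapt_mean(prior, adaptation))  # [0.5, 1.0]: halfway, since T == tau
```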

[0017] Next, the case in which the user registers recognition target words in a dictionary will be described. FIG. 15 is a block diagram showing the configuration of a conventional speech recognition device, disclosed in Reference 1, to which a user dictionary has been added. FIG. 15 differs from FIG. 12 in that it includes a user dictionary 601.

[0018] FIG. 16 shows an example of the structure of the user dictionary 601. The user dictionary 601 is a collection of words that the user has registered so that words that are not recognized, or are hard to recognize, will be recognized better; it is a list of words, each consisting of a written form and a reading.

[0019] Next, the operation of the speech recognition device shown in FIG. 15 will be described. FIG. 17 is a flowchart showing the procedure of conventional speech recognition processing with the user dictionary 601 added. FIG. 17 differs from FIG. 13 in that, because the language matching process in step ST1704 refers to both the language model 103 and the user dictionary 601, the words registered in the user dictionary 601 are added to the recognition target words.

[0020] A word registered in the user dictionary 601 is made connectable to any word string with a suitable connection probability. For example, in a bigram language model, in which the occurrence of a word is conditioned only on the single preceding word, the probabilities $P(w_{\mathrm{user}} \mid w_i)$ and $P(w_j \mid w_{\mathrm{user}})$ of a user-registered word $w_{\mathrm{user}}$ placed between arbitrary words $w_i$ and $w_j$ are given constant values, and the probability values are redistributed so that the probabilities of the language model as a whole sum to 1. As a result, the user can obtain recognition results that contain the registered words.
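One way to realize the constant-probability insertion of [0020] is sketched below. The specification does not say how the redistribution is done, so this sketch assumes a simple scheme: $P(w_{\mathrm{user}} \mid w_i)$ is set to a hypothetical constant for every history and the existing probabilities are scaled by its complement, while $P(w_j \mid w_{\mathrm{user}})$ is made uniform over the vocabulary. It presumes the user word is not already in the model.

```python
def add_user_word(bigram, vocab, user_word, p_const=0.01):
    """Insert user_word into a bigram model given as {history: {next: prob}}.

    Assumed scheme: P(user_word | w_i) = p_const for every history w_i, with
    the other probabilities scaled by (1 - p_const) so that each conditional
    distribution still sums to 1; P(w_j | user_word) is uniform over vocab.
    p_const is a hypothetical value; user_word must not already be present.
    """
    new_model = {}
    for history, dist in bigram.items():
        scaled = {w: p * (1.0 - p_const) for w, p in dist.items()}
        scaled[user_word] = p_const
        new_model[history] = scaled
    new_model[user_word] = {w: 1.0 / len(vocab) for w in vocab}
    return new_model

# Hypothetical base model and a user-registered place name.
base = {
    "<s>": {"tokyo": 0.5, "osaka": 0.5},
    "tokyo": {"</s>": 1.0},
    "osaka": {"</s>": 1.0},
}
model = add_user_word(base, ["tokyo", "osaka", "</s>"], "kawasaki")
print(model["<s>"])
```

Because p_const is the same for every context regardless of how the word is actually used, it is exactly the kind of value that [0025] identifies as potentially inappropriate.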

[0021] However, with the configurations of the speech recognition devices shown in FIGS. 14 and 15, the user must personally customize the acoustic model 102 and the language model 103 to improve recognition accuracy, by supplying the speech signal 100 to create the adapted acoustic model 1401 or by registering words in the user dictionary 601. This imposes a heavy burden on the user.

[0022] Moreover, even when words are registered in the user dictionary 601, some of those words had not appeared, and some of their usages had not been taken into account, at the time the language model was built, so the occurrence probabilities assigned to them may be inappropriate. This in turn may reduce recognition accuracy.

[0023] Furthermore, the acoustic model 102 and the language model 103 customized as described above are referred to only by one particular matching means 101. Consequently, recognition accuracy drops when any other matching means 101 is used, since it does not have these customized models.

[0024] For this reason, for language processing systems such as kana-to-kanji conversion and machine translation, a system that addresses one of the above problems, the updating of the user dictionary 601, and reduces the user's burden by providing a function that automatically updates the dictionary via a network has been proposed, for example in Japanese Patent Application Laid-Open No. H10-260960. However, that publication does not consider handling models for pattern recognition such as speech, and its approach cannot be applied to models of pattern information such as the acoustic model 102.

[0025] Furthermore, when the language model 103 is updated, insertion errors, in which inappropriate short words are inserted into the recognition result, become more likely as the number of words registered by the user grows, so recognition accuracy tends to fall. This happens because words added to the user dictionary 601 may not have existed when the language model 103 was built, or the usage environment may have changed, so that inappropriate values may be assigned to occurrence probabilities such as $P(w_{\mathrm{user}} \mid w_i)$ and $P(w_j \mid w_{\mathrm{user}})$ of the user-registered words. As a result, inappropriate recognition results become more likely and recognition accuracy may drop. Preventing this requires setting word occurrence probabilities appropriately, but it is generally difficult for users to supply reasonable values themselves.

[0026]

PROBLEMS TO BE SOLVED BY THE INVENTION
Because conventional speech recognition devices are configured as described above, customizing the acoustic model or the language model to improve recognition accuracy places a heavy burden on the user.

[0027] In addition, recognition accuracy drops when the user uses any speech recognition device other than the one customized for that user.

[0028] Furthermore, the acoustic model cannot be updated automatically through customization over a network.

[0029] Furthermore, recognition accuracy tends to drop as the number of entries registered in the user dictionary grows.

[0030] The present invention has been made to solve the above problems. An object of the invention is to provide a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium storing a speech recognition program that improve recognition accuracy without placing a heavy burden on the user. To this end, a server connected to a network acquires the latest acoustic data or language data and builds an up-to-date acoustic model or language model, or builds an acoustic model or language model tailored to the user, and the acoustic model and language model on the user side are updated via the network.

[0031] Another object of the invention is to provide a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium storing a speech recognition program with which a customized acoustic model or language model can be used on any speech recognition device by connecting it via a network.

[0032] A further object of the invention is to provide a speech recognition system, a speech recognition device, an acoustic model management server, a language model management server, a speech recognition method, and a computer-readable recording medium storing a speech recognition program in which, by using dictionaries and text obtained from the user together with semi-automatically collected text, recognition accuracy is unlikely to drop even when the user dictionary becomes large.

[0033]

MEANS FOR SOLVING THE PROBLEMS
A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model that gives the probability of an acoustic observation sequence of speech, and outputs a recognition result; and an acoustic model management server that is connected to the speech recognition device via a network, acquires updated acoustic data, and builds the acoustic model. The acoustic model built by the acoustic model management server is transmitted to the speech recognition device, and the speech recognition device updates the acoustic model it refers to during speech recognition with the acoustic model transmitted by the acoustic model management server.
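The server-to-device update of [0033] can be sketched in-process as follows. Everything here is hypothetical: the class and method names are invented for illustration, the "network" is reduced to passing a JSON string, the placeholder "model" is just a mean value where a real server would retrain HMM parameters, and the version check on the device is an added design choice not stated in the specification.

```python
import json
from dataclasses import dataclass, field

@dataclass
class AcousticModelServer:
    """Hypothetical stand-in for the acoustic model management server."""
    acoustic_data: list = field(default_factory=list)
    version: int = 0

    def add_data(self, samples):
        # Newly collected ("updated") acoustic data.
        self.acoustic_data.extend(samples)

    def build_model(self):
        # Placeholder training: a real server would re-estimate HMM
        # parameters from all of self.acoustic_data.
        self.version += 1
        mean = sum(self.acoustic_data) / len(self.acoustic_data)
        return json.dumps({"version": self.version, "mean": mean})  # payload to send

@dataclass
class SpeechRecognizer:
    """Hypothetical stand-in for the speech recognition device."""
    acoustic_model: dict = field(default_factory=lambda: {"version": 0})

    def update_acoustic_model(self, payload):
        # Acoustic model updating means: replace the model the matching
        # means refers to, but only with a newer version (assumed policy).
        received = json.loads(payload)
        if received["version"] > self.acoustic_model["version"]:
            self.acoustic_model = received
            return True
        return False

server = AcousticModelServer()
device = SpeechRecognizer()
server.add_data([1.0, 2.0, 3.0])
payload = server.build_model()
print(device.update_acoustic_model(payload))  # True: newer model installed
print(device.update_acoustic_model(payload))  # False: same version, ignored
```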

[0034] In a speech recognition system according to the present invention, the acoustic model management server acquires an ID that identifies the acoustic model referred to by the speech recognition device during speech recognition, reads updated acoustic data corresponding to the specific condition indicated by the acquired ID, builds an acoustic model dependent on that specific condition, and transmits it to the speech recognition device.
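The specification does not say what the "specific condition" behind an ID is. Assuming, purely for illustration, that an ID names a speaker gender and microphone type, the server's data-selection step might look like the sketch below; the ID strings, condition keys, and records are all hypothetical, and the selected records would then be used to train the condition-dependent model.

```python
# Hypothetical catalogue: each acoustic model ID names a specific condition.
MODEL_CONDITIONS = {
    "AM-female-headset": {"gender": "female", "mic": "headset"},
    "AM-male-phone": {"gender": "male", "mic": "phone"},
}

def select_training_data(model_id, acoustic_db):
    """Return only the updated acoustic data matching the ID's condition."""
    cond = MODEL_CONDITIONS[model_id]
    return [rec for rec in acoustic_db
            if all(rec.get(key) == value for key, value in cond.items())]

# Hypothetical updated acoustic data, tagged with recording conditions.
acoustic_db = [
    {"gender": "female", "mic": "headset", "utt": "u1"},
    {"gender": "male", "mic": "phone", "utt": "u2"},
    {"gender": "female", "mic": "phone", "utt": "u3"},
]
print([r["utt"] for r in select_training_data("AM-female-headset", acoustic_db)])
```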

[0035] A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to a language model that gives the probability of occurrence of a word string, and outputs a recognition result; and a language model management server that is connected to the speech recognition device via a network, acquires updated language data, and builds the language model. The language model built by the language model management server is transmitted to the speech recognition device, and the speech recognition device updates the language model it refers to during speech recognition with the language model transmitted by the language model management server.

[0036] In a speech recognition system according to the present invention, the language model management server acquires an ID that identifies the language model referred to by the speech recognition device during speech recognition, reads updated language data corresponding to the specific condition indicated by the acquired ID, builds a language model dependent on that specific condition, and transmits it to the speech recognition device.

[0037] In a speech recognition system according to the present invention, the speech recognition device refers during speech recognition to a user dictionary in which words are registered, and the language model management server reads the user dictionary via the network, builds a language model dependent on the user dictionary with reference to the updated language data and the read user dictionary, and transmits it to the speech recognition device.
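In contrast to the fixed constant of [0020], a server that has both the user dictionary and updated text can estimate the user words' probabilities from how they actually occur. The sketch below shows one possible way, assuming add-k smoothing (a swapped-in technique the specification does not name) over a vocabulary that includes the dictionary words, so that registered words absent from the text still get a small nonzero probability. The corpus, dictionary words, and k = 0.5 are hypothetical.

```python
from collections import Counter

def user_word_bigrams(corpus, user_words, k=0.5):
    """Return prob(prev, nxt): add-k smoothed bigram probabilities estimated
    from the updated text `corpus`, over a vocabulary that includes the
    user-dictionary words (k is a hypothetical value)."""
    vocab = {w for sent in corpus for w in sent} | set(user_words) | {"</s>"}
    history, pair = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            history[a] += 1
            pair[(a, b)] += 1
    V = len(vocab)

    def prob(prev, nxt):
        # Each conditional distribution over vocab sums to 1.
        return (pair[(prev, nxt)] + k) / (history[prev] + k * V)
    return prob

corpus = [["meet", "at", "kawasaki"], ["meet", "at", "shinagawa"]]
prob = user_word_bigrams(corpus, ["kawasaki", "musashikosugi"])
print(prob("at", "kawasaki"), prob("at", "musashikosugi"))  # seen vs. unseen user word
```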

[0038] In a speech recognition system according to the present invention, the language model management server acquires text used by the user of the speech recognition device, builds a language model dependent on that text with reference to the updated language data and the acquired text, and transmits it to the speech recognition device.
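The specification does not say how the general language data and the user's own text are combined. A common technique, assumed here rather than taken from the text, is linear interpolation, $P(w \mid h) = \lambda P_{\mathrm{user}}(w \mid h) + (1 - \lambda) P_{\mathrm{gen}}(w \mid h)$. In this minimal sketch the two component models are hypothetical lookup tables and $\lambda = 0.3$ is an illustrative weight.

```python
def interpolate(p_general, p_user_text, lam=0.3):
    """Combine two bigram models given as callables prob(history, word):
    P(w | h) = lam * P_user(w | h) + (1 - lam) * P_general(w | h).
    lam is a hypothetical weight; the result sums to 1 if both inputs do."""
    return lambda h, w: lam * p_user_text(h, w) + (1.0 - lam) * p_general(h, w)

# Hypothetical models: general text vs. text the user actually wrote.
general = {("at", "station"): 0.6, ("at", "kawasaki"): 0.4}
user = {("at", "station"): 0.0, ("at", "kawasaki"): 1.0}
p = interpolate(lambda h, w: general.get((h, w), 0.0),
                lambda h, w: user.get((h, w), 0.0))
print(p("at", "kawasaki"), p("at", "station"))
```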

[0039] A speech recognition system according to the present invention comprises a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model that gives the probability of an acoustic observation sequence of speech, and outputs a recognition result; and an acoustic model management server that is connected to the speech recognition device via a network and holds an initial acoustic model before adaptation. The speech recognition device acquires an ID identifying the acoustic model, acquires speech data for adaptation from the input speech signal, and transmits the acquired ID and adaptation speech data to the acoustic model management server via the network. The acoustic model management server adapts the initial acoustic model using the transmitted adaptation speech data and stores the adapted acoustic model in association with the transmitted ID. On receiving an external acoustic model update command, the server receives the ID identifying the acoustic model from the speech recognition device via the network, selects and reads the adapted acoustic model corresponding to the received ID from among the stored adapted acoustic models, and transmits it to the speech recognition device via the network. The speech recognition device then updates the acoustic model it refers to during speech recognition with the adapted acoustic model transmitted by the acoustic model management server.
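The ID-keyed store of [0039] can be sketched as follows. This is a hypothetical in-process model: the class and method names are invented, the "acoustic model" is reduced to a single Gaussian mean adapted with the MAP mean update, the prior weight is an assumed value, and returning the initial model for an unknown ID is an added fallback not described in the specification. Because models are looked up by ID, any device that presents the same ID receives the same adapted model.

```python
class AdaptationServer:
    """Hypothetical acoustic model management server holding an initial model
    and a store of adapted models keyed by acoustic model ID."""

    def __init__(self, initial_mean, tau=10.0):
        self.initial_mean = initial_mean  # stands in for the initial acoustic model
        self.tau = tau                    # MAP prior weight (assumed value)
        self.adapted = {}                 # acoustic model ID -> adapted model

    def receive_adaptation_data(self, model_id, frames):
        # Adapt the initial model by MAP mean re-estimation; store it under the ID.
        T = len(frames)
        self.adapted[model_id] = [
            (self.tau * m + sum(f[d] for f in frames)) / (self.tau + T)
            for d, m in enumerate(self.initial_mean)
        ]

    def handle_update_request(self, model_id):
        # On an update command, return a copy of the model stored for this ID,
        # or of the initial model if none exists yet (assumed fallback).
        return list(self.adapted.get(model_id, self.initial_mean))

server = AdaptationServer(initial_mean=[0.0, 0.0])
server.receive_adaptation_data("user-42", [[1.0, 2.0]] * 10)
print(server.handle_update_request("user-42"))  # [0.5, 1.0]
```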

[0040] A speech recognition device according to the present invention comprises an acoustic model that gives the probability of an acoustic observation sequence of speech; a matching means that receives a speech signal, performs speech recognition with reference to the acoustic model, and outputs a recognition result; and an acoustic model updating means that receives, from an acoustic model management server connected via a network, an acoustic model built from updated acoustic data, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

[0041] In a speech recognition device according to the present invention, the acoustic model updating means receives, from the acoustic model management server connected via the network, an acoustic model that was built from updated acoustic data and depends on a specific condition of the acoustic model referred to by the matching means during speech recognition, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

[0042] A speech recognition device according to the present invention comprises a language model that gives the probability of occurrence of a word string; a matching means that receives a speech signal, performs speech recognition with reference to the language model, and outputs a recognition result; and a language model updating means that receives, from a language model management server connected via a network, a language model built from updated language data, and updates the language model referred to by the matching means during speech recognition with the received language model.

[0043] In a speech recognition device according to the present invention, the language model updating means receives, from the language model management server connected via the network, a language model that was built from updated language data and depends on a specific condition of the language model referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

[0044] The speech recognition apparatus according to the present invention further comprises a user dictionary in which words referred to by the matching means during speech recognition are registered. The language model updating means receives, from the language model management server connected via the network, a language model that is constructed from updated language data and depends on the user dictionary referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

[0045] In the speech recognition apparatus according to the present invention, the language model updating means receives, from the language model management server connected via the network, a language model that is constructed from updated language data and depends on text used by the user performing speech recognition. It then updates the language model referred to by the matching means during speech recognition with the received language model.

[0046] The speech recognition apparatus according to the present invention comprises: an acoustic model for obtaining the probability of a sequence of acoustic observations of speech; matching means for receiving a speech signal, performing speech recognition with reference to the acoustic model, and outputting a recognition result; acoustic model ID acquisition means for acquiring an ID that identifies the acoustic model; adaptation speech acquisition means for reading the ID acquired by the acoustic model ID acquisition means, acquiring speech data for adaptation from the input speech signal, and transmitting the read ID and the acquired adaptation speech data to an acoustic model management server connected via a network; and acoustic model updating means for receiving, from the acoustic model management server, an adapted acoustic model that has been adapted with the adaptation speech data corresponding to the ID, and updating the acoustic model referred to by the matching means during speech recognition with the received adapted acoustic model.
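The adaptation speech acquisition means in [0046] has to send the acoustic model ID and the adaptation speech data together. The patent does not define a message format, so the following is a hypothetical one: a JSON object carrying the ID, the sample rate, and 16-bit PCM samples encoded as base64, paired with the parser the server side would need.

```python
import base64
import json
import struct


def build_adaptation_request(model_id: str, samples: list[int],
                             sample_rate: int = 16000) -> bytes:
    """Hypothetical wire format: JSON with the acoustic model ID and the
    adaptation speech as base64-encoded little-endian 16-bit PCM."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    message = {
        "acoustic_model_id": model_id,
        "sample_rate": sample_rate,
        "pcm16_base64": base64.b64encode(pcm).decode("ascii"),
    }
    return json.dumps(message).encode("utf-8")


def parse_adaptation_request(payload: bytes) -> tuple[str, int, list[int]]:
    """Server-side inverse of build_adaptation_request."""
    message = json.loads(payload.decode("utf-8"))
    pcm = base64.b64decode(message["pcm16_base64"])
    samples = list(struct.unpack(f"<{len(pcm) // 2}h", pcm))
    return message["acoustic_model_id"], message["sample_rate"], samples
```

Keeping the ID inside the same message as the audio means the server can file the adaptation data under the right model without any session state.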

[0047] The acoustic model management server according to the present invention comprises: acoustic data acquisition means for acquiring updated acoustic data; acoustic model construction means for, upon receiving an acoustic model update command from outside, reading the updated acoustic data acquired by the acoustic data acquisition means and constructing an acoustic model for obtaining the probability of a sequence of acoustic observations of speech; and acoustic model transmission means for transmitting the acoustic model constructed by the acoustic model construction means via a network to a speech recognition apparatus that performs speech recognition.
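The patent does not say how the acoustic model construction means builds a model from the updated acoustic data. As a deliberately simplified stand-in for HMM training, the sketch below fits one diagonal Gaussian per phone label to labeled feature frames and scores an observation sequence by summing per-frame log densities. The `(phone, vector)` data layout and the variance floor are assumptions.

```python
import math
from collections import defaultdict


def build_acoustic_model(frames, var_floor=1e-3):
    """Estimate one diagonal Gaussian per phone label.

    frames: iterable of (phone_label, feature_vector) pairs standing in for
    the updated acoustic data. Returns {phone: (means, variances)}.
    """
    grouped = defaultdict(list)
    for phone, vec in frames:
        grouped[phone].append(vec)
    model = {}
    for phone, vecs in grouped.items():
        n, dim = len(vecs), len(vecs[0])
        means = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # Floor the variance so a dimension with no spread cannot yield
        # a zero variance (and a division by zero when scoring).
        variances = [
            max(sum((v[d] - means[d]) ** 2 for v in vecs) / n, var_floor)
            for d in range(dim)
        ]
        model[phone] = (means, variances)
    return model


def log_likelihood(model, phone, observations):
    """Log probability of an observation sequence under one phone's Gaussian,
    treating frames as independent (no HMM state transitions)."""
    means, variances = model[phone]
    total = 0.0
    for x in observations:
        for xd, mu, var in zip(x, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (xd - mu) ** 2 / var)
    return total
```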

[0048] The acoustic model management server according to the present invention comprises: acoustic data acquisition means for acquiring updated acoustic data; updated acoustic model ID acquisition means for, upon receiving an acoustic model update command from outside, acquiring an ID that identifies the acoustic model referred to during speech recognition by a speech recognition apparatus connected via a network; condition-specific acoustic data reading means for reading, in accordance with the specific condition indicated by the ID acquired by the updated acoustic model ID acquisition means, the updated acoustic data acquired by the acoustic data acquisition means; condition-specific acoustic model construction means for constructing, with reference to the updated acoustic data read by the condition-specific acoustic data reading means, an acoustic model dependent on the specific condition; and acoustic model transmission means for transmitting the acoustic model constructed by the condition-specific acoustic model construction means to the speech recognition apparatus via the network.
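For the condition-specific construction in [0048], the essential step is mapping an ID to a specific condition and reading only the acoustic data that matches it. The sketch below assumes, purely for illustration, that each record is tagged with speaker gender and recording channel and that the server holds a table from model IDs to those tags; the IDs, tags, and the toy per-phone mean "model" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcousticRecord:
    """One item of updated acoustic data with hypothetical condition tags."""
    phone: str
    features: tuple
    gender: str
    channel: str


# Hypothetical registry: model ID -> the specific condition it stands for.
CONDITIONS_BY_ID = {
    "AM-F-HEADSET": {"gender": "female", "channel": "headset"},
    "AM-M-PHONE": {"gender": "male", "channel": "telephone"},
}


def read_condition_specific_data(model_id, records):
    """Keep only the records matching every tag of the ID's condition."""
    condition = CONDITIONS_BY_ID[model_id]
    return [r for r in records
            if all(getattr(r, key) == value for key, value in condition.items())]


def build_condition_specific_model(model_id, records):
    """Toy condition-specific model: per-phone mean feature vector."""
    sums, counts = {}, {}
    for r in read_condition_specific_data(model_id, records):
        acc = sums.setdefault(r.phone, [0.0] * len(r.features))
        for d, value in enumerate(r.features):
            acc[d] += value
        counts[r.phone] = counts.get(r.phone, 0) + 1
    return {p: [s / counts[p] for s in acc] for p, acc in sums.items()}
```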

[0049] The acoustic model management server according to the present invention comprises: an initial acoustic model, before adaptation, for obtaining the probability of a sequence of acoustic observations of speech; acoustic model adaptation means for receiving adaptation speech data transmitted from a speech recognition apparatus connected via a network together with an ID identifying the acoustic model referred to by the speech recognition apparatus during speech recognition, adapting the initial acoustic model using the adaptation speech data, and storing the adapted acoustic model in adapted acoustic model storage means in association with the received ID; adapted acoustic model selection means for, upon receiving an acoustic model update command from outside, receiving the ID from the speech recognition apparatus via the network and selecting and reading the adapted acoustic model corresponding to the received ID from the adapted acoustic model storage means; and acoustic model transmission means for transmitting the adapted acoustic model read by the adapted acoustic model selection means to the speech recognition apparatus via the network.
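The adaptation technique in [0049] is left open by the patent. One common choice, shown here as an assumption rather than the authors' method, is MAP adaptation of Gaussian means: mu_hat = (tau * mu_0 + sum_t x_t) / (tau + N), where mu_0 is the initial mean, x_t are the N adaptation frames for that phone, and tau sets how strongly the initial model is trusted. The store below keeps one adapted model per ID and returns it on selection; it assumes the adaptation frames are phone-labeled and that every labeled phone exists in the initial model.

```python
import copy


class AdaptedModelStore:
    """Hypothetical adapted acoustic model storage, keyed by model ID."""

    def __init__(self, initial_means, tau=10.0):
        # initial_means: {phone: [mean per dimension]} of the initial model.
        self._initial = initial_means
        self._tau = tau
        self._adapted = {}

    def adapt(self, model_id, labeled_frames):
        """MAP-adapt the initial means with (phone, vector) frames and store
        the result under model_id. Phones without data keep initial means."""
        adapted = copy.deepcopy(self._initial)  # never modify the initial model
        by_phone = {}
        for phone, vec in labeled_frames:
            by_phone.setdefault(phone, []).append(vec)
        for phone, vecs in by_phone.items():
            prior = self._initial[phone]
            n = len(vecs)
            adapted[phone] = [
                (self._tau * prior[d] + sum(v[d] for v in vecs)) / (self._tau + n)
                for d in range(len(prior))
            ]
        self._adapted[model_id] = adapted

    def select(self, model_id):
        """Return the adapted model stored for this ID (KeyError if none)."""
        return self._adapted[model_id]
```

With little adaptation data the means stay close to the initial model, and they move toward the speaker's own averages as N grows.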

[0050] The language model management server according to the present invention comprises: language data acquisition means for acquiring updated language data; language model construction means for, upon receiving a language model update command from outside, reading the updated language data acquired by the language data acquisition means and constructing a language model for obtaining the occurrence probability of a word string; and language model transmission means for transmitting the language model constructed by the language model construction means via a network to a speech recognition apparatus that performs speech recognition.
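A language model "for obtaining the occurrence probability of a word string" is commonly an n-gram model. The sketch below assumes tokenized sentences as the updated language data, builds an add-one-smoothed bigram model, and returns log P(w1 ... wn) as the sum of log P(wi | wi-1) with sentence-boundary markers. Production systems use stronger smoothing; add-one is chosen here only to keep the example short.

```python
import math
from collections import Counter


def build_bigram_model(sentences):
    """Build an add-one-smoothed bigram model from tokenized sentences.
    Returns a function mapping a word list to its log probability."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # possible next tokens: all words plus end-of-sentence
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    v = len(vocab)

    def logprob(words):
        """log P(w1..wn) = sum_i log P(w_i | w_{i-1}), with <s> and </s>."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + 1) / (unigrams[p] + v))
            for p, c in zip(tokens, tokens[1:])
        )

    return logprob
```

For example, a model built from "speech recognition" and "speech synthesis" scores "speech recognition" well above the reversed order "recognition speech".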

[0051] The language model management server according to the present invention comprises: language data acquisition means for acquiring updated language data; updated language model ID acquisition means for, upon receiving a language model update command from outside, acquiring an ID that identifies the language model referred to during speech recognition by a speech recognition apparatus connected via a network; condition-specific language data reading means for reading, in accordance with the specific condition indicated by the ID acquired by the updated language model ID acquisition means, the updated language data acquired by the language data acquisition means; condition-specific language model construction means for constructing, with reference to the updated language data read by the condition-specific language data reading means, a language model dependent on the specific condition; and language model transmission means for transmitting the language model constructed by the condition-specific language model construction means to the speech recognition apparatus via the network.

[0052] The language model management server according to the present invention comprises: language data acquisition means for acquiring updated language data; user dictionary reading means for, upon receiving a language model update command from outside, reading the user dictionary referred to during speech recognition by a speech recognition apparatus connected via a network; user-dictionary-dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the user dictionary read by the user dictionary reading means; and language model transmission means for transmitting the language model constructed by the user-dictionary-dependent language model construction means to the speech recognition apparatus via the network.
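What "dependent on the user dictionary" means is not spelled out in this passage. One plausible reading, assumed here, is that the server sizes the vocabulary from the updated language data but always includes the words registered in the user's dictionary, so they are never collapsed into an unknown-word token. The sketch builds a unigram model on that basis; the function name and the `base_vocab_size` parameter are hypothetical.

```python
from collections import Counter


def build_user_dictionary_lm(sentences, user_dictionary, base_vocab_size=10000):
    """Sketch of a user-dictionary-dependent unigram model.

    Vocabulary = the `base_vocab_size` most frequent words in the updated
    language data, plus every word in the user dictionary, plus <unk>.
    Add-one smoothing keeps user words that never occur in the data at a
    nonzero probability.
    """
    counts = Counter(w for s in sentences for w in s)
    vocab = {w for w, _ in counts.most_common(base_vocab_size)}
    vocab |= set(user_dictionary)
    vocab.add("<unk>")
    # Map out-of-vocabulary words to <unk> before counting.
    mapped = Counter(w if w in vocab else "<unk>" for s in sentences for w in s)
    total = sum(mapped.values()) + len(vocab)
    return {w: (mapped[w] + 1) / total for w in vocab}
```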

[0053] The language model management server according to the present invention comprises: language data acquisition means for acquiring updated language data; user text acquisition means for, upon receiving a language model update command from outside, acquiring text used by the user of a speech recognition apparatus connected via a network; user-text-dependent language model construction means for reading the updated language data acquired by the language data acquisition means and constructing a language model dependent on the text acquired by the user text acquisition means; and language model transmission means for transmitting the language model constructed by the user-text-dependent language model construction means to the speech recognition apparatus via the network.
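For the user-text-dependent model in [0053], a standard technique (not confirmed by the source) is linear interpolation: P(w) = lam * P_user(w) + (1 - lam) * P_general(w), where P_user is estimated from the text the user has used and P_general from the updated language data. The sketch uses add-one-smoothed unigrams over a shared vocabulary; the weight `lam` is an assumed tuning parameter.

```python
from collections import Counter


def unigram(sentences, vocab):
    """Add-one-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s if w in vocab)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def build_user_text_lm(general_sentences, user_sentences, lam=0.3):
    """Interpolate: P(w) = lam * P_user(w) + (1 - lam) * P_general(w)."""
    vocab = {w for s in general_sentences + user_sentences for w in s}
    p_gen = unigram(general_sentences, vocab)
    p_user = unigram(user_sentences, vocab)
    return {w: lam * p_user[w] + (1 - lam) * p_gen[w] for w in vocab}
```

Because both components are normalized distributions over the same vocabulary, any `lam` in [0, 1] yields a valid distribution; raising it shifts probability toward words the user actually writes.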

[0054] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to an acoustic model for obtaining the probability of a sequence of acoustic observations of speech, and a recognition result is output, comprises: a first step of acquiring updated acoustic data; a second step of, upon receiving an acoustic model update command, reading the updated acoustic data acquired in the first step and constructing an acoustic model; a third step of transmitting the acoustic model constructed in the second step via a network; and a fourth step of receiving the acoustic model transmitted in the third step and updating the acoustic model referred to during speech recognition with the received acoustic model.
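The four steps of [0054] can be traced end to end in toy form. In the sketch below, the server accumulates updated acoustic data (first step), builds a per-phone mean model when the update command arrives (second step), and serializes it to JSON bytes standing in for network transmission (third step); the client deserializes the bytes and replaces its model (fourth step). Class names, the one-dimensional features, and the JSON layout are assumptions.

```python
import json
from collections import Counter


class AcousticModelServer:
    """First to third steps on the server side (hypothetical names)."""

    def __init__(self):
        self._updated_data = []  # updated acoustic data accumulates here
        self._version = 0

    def acquire(self, phone, value):
        # First step: acquire updated acoustic data (one feature per frame).
        self._updated_data.append((phone, value))

    def on_update_command(self) -> bytes:
        # Second step: read the accumulated data and build a toy model.
        sums, counts = Counter(), Counter()
        for phone, value in self._updated_data:
            sums[phone] += value
            counts[phone] += 1
        self._version += 1
        model = {"version": self._version,
                 "means": {p: sums[p] / counts[p] for p in counts}}
        # Third step: serialize the model for transmission.
        return json.dumps(model).encode("utf-8")


class RecognizerClient:
    """Fourth step on the speech recognition apparatus side."""

    def __init__(self):
        self.acoustic_model = {"version": 0, "means": {}}

    def on_model_received(self, payload: bytes):
        # Deserialize and replace the model referred to during recognition.
        self.acoustic_model = json.loads(payload.decode("utf-8"))
```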

[0055] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to a language model for obtaining the occurrence probability of a word string, and a recognition result is output, comprises: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, reading the updated language data acquired in the first step and constructing a language model; a third step of transmitting the language model constructed in the second step via a network; and a fourth step of receiving the language model transmitted in the third step and updating the language model referred to during speech recognition with the received language model.

[0056] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to an acoustic model for obtaining the probability of a sequence of acoustic observations of speech, and a recognition result is output, comprises: a first step of acquiring updated acoustic data; a second step of, upon receiving an acoustic model update command, acquiring an ID that identifies the acoustic model referred to during speech recognition; a third step of reading, in accordance with the specific condition indicated by the ID acquired in the second step, the updated acoustic data acquired in the first step; a fourth step of constructing, with reference to the updated acoustic data read in the third step, an acoustic model dependent on the specific condition; a fifth step of transmitting the acoustic model constructed in the fourth step via a network; and a sixth step of receiving the acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received acoustic model.

[0057] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to a language model for obtaining the occurrence probability of a word string, and a recognition result is output, comprises: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, acquiring an ID that identifies the language model referred to during speech recognition; a third step of reading, in accordance with the specific condition indicated by the ID acquired in the second step, the updated language data acquired in the first step; a fourth step of constructing, with reference to the updated language data read in the third step, a language model dependent on the specific condition; a fifth step of transmitting the language model constructed in the fourth step via a network; and a sixth step of receiving the language model transmitted in the fifth step and updating the language model referred to during speech recognition with the received language model.

[0058] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to a language model for obtaining the occurrence probability of a word string and a user dictionary in which words are registered, and a recognition result is output, comprises: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, reading the user dictionary referred to during speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the user dictionary read in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.

[0059] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to a language model for obtaining the occurrence probability of a word string, and a recognition result is output, comprises: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, acquiring text used by the user performing speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the text acquired in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.

[0060] The speech recognition method according to the present invention, in which a speech signal is input, speech recognition is performed with reference to an acoustic model for obtaining the probability of a sequence of acoustic observations of speech, and a recognition result is output, comprises: a first step of acquiring an ID that identifies the acoustic model; a second step of reading the ID acquired in the first step, acquiring speech data for adaptation from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; a third step of adapting an initial acoustic model before adaptation using the adaptation speech data transmitted in the second step, and storing the adapted acoustic model in association with the ID transmitted in the second step; a fourth step of, upon receiving an acoustic model update command, receiving via the network the ID acquired in the first step and selecting and reading the adapted acoustic model corresponding to the received ID from among the adapted acoustic models stored in the third step; a fifth step of transmitting the adapted acoustic model read in the fourth step via the network; and a sixth step of receiving the adapted acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received adapted acoustic model.

[0061] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of a sequence of acoustic observations of speech, and outputting a recognition result, and further realizes: an acoustic data acquisition function of acquiring updated acoustic data; an acoustic model construction function of, upon receiving an acoustic model update command, reading the updated acoustic data acquired by the acoustic data acquisition function and constructing an acoustic model; an acoustic model transmission function of transmitting the acoustic model constructed by the acoustic model construction function via a network; and an acoustic model updating function of receiving the acoustic model transmitted by the acoustic model transmission function and updating the acoustic model referred to by the matching function during speech recognition with the received acoustic model.

[0062] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, and further realizes: a language data acquisition function of acquiring updated language data; a language model construction function of, upon receiving a language model update command, reading the updated language data acquired by the language data acquisition function and constructing a language model; a language model transmission function of transmitting the language model constructed by the language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by the matching function during speech recognition with the received language model.

[0063] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of a sequence of acoustic observations of speech, and outputting a recognition result, and further realizes: an acoustic data acquisition function of acquiring updated acoustic data; an updated acoustic model ID acquisition function of, upon receiving an acoustic model update command, acquiring an ID that identifies the acoustic model; a condition-specific acoustic data reading function of reading, in accordance with the specific condition indicated by the ID acquired by the updated acoustic model ID acquisition function, the updated acoustic data acquired by the acoustic data acquisition function; a condition-specific acoustic model construction function of constructing, with reference to the updated acoustic data read by the condition-specific acoustic data reading function, an acoustic model dependent on the specific condition; an acoustic model transmission function of transmitting the acoustic model constructed by the condition-specific acoustic model construction function via a network; and an acoustic model updating function of receiving the acoustic model transmitted by the acoustic model transmission function and updating the acoustic model referred to by the matching function during speech recognition with the received acoustic model.

[0064] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, and further realizes: a language data acquisition function of acquiring updated language data; an updated language model ID acquisition function of, upon receiving a language model update command, acquiring an ID that identifies the language model; a condition-specific language data reading function of reading, in accordance with the specific condition indicated by the ID acquired by the updated language model ID acquisition function, the updated language data acquired by the language data acquisition function; a condition-specific language model construction function of constructing, with reference to the updated language data read by the condition-specific language data reading function, a language model dependent on the specific condition; a language model transmission function of transmitting the language model constructed by the condition-specific language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by the matching function during speech recognition with the received language model.

[0065] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string and a user dictionary in which words are registered, and outputting a recognition result, and further realizes: a language data acquisition function of acquiring updated language data; a user dictionary reading function of, upon receiving a language model update command, reading the user dictionary; a user-dictionary-dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the user dictionary read by the user dictionary reading function; a language model transmission function of transmitting the language model constructed by the user-dictionary-dependent language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by the matching function during speech recognition with the received language model.

[0066] The computer-readable recording medium recording a speech recognition program according to the present invention realizes a matching function of receiving a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, and further realizes: a language data acquisition function of acquiring updated language data; a user text acquisition function of, upon receiving a language model update command, acquiring text used by the user performing speech recognition; a user-text-dependent language model construction function of reading the updated language data acquired by the language data acquisition function and constructing a language model dependent on the text acquired by the user text acquisition function; a language model transmission function of transmitting the language model constructed by the user-text-dependent language model construction function via a network; and a language model updating function of receiving the language model transmitted by the language model transmission function and updating the language model referred to by the matching function during speech recognition with the received language model.

[0067] A computer-readable recording medium according to the present invention records a speech recognition program that realizes the following functions:
- a matching function that receives a speech signal, performs speech recognition with reference to an acoustic model giving the probability of an acoustic observation sequence of speech, and outputs a recognition result;
- an acoustic model ID acquisition function that acquires an ID identifying the acoustic model;
- an adaptation speech acquisition function that reads the ID acquired by the acoustic model ID acquisition function, acquires speech data for adaptation from the input speech signal, and transmits the read ID and the acquired adaptation speech data over a network;
- an acoustic model adaptation function that adapts the initial (pre-adaptation) acoustic model using the adaptation speech data transmitted by the adaptation speech acquisition function, and stores the adapted acoustic model in association with the ID transmitted by that function;
- an adapted acoustic model selection function that, in response to an acoustic model update command, receives over the network the ID acquired by the acoustic model ID acquisition function, and selects and reads out the adapted acoustic model corresponding to that ID from among those stored by the acoustic model adaptation function;
- an acoustic model transmission function that transmits the adapted acoustic model read out by the adapted acoustic model selection function over the network;
- an acoustic model update function that receives the transmitted adapted acoustic model and uses it to update the acoustic model that the matching function refers to during speech recognition.
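Paragraph [0067] does not say which adaptation algorithm is used. The sketch below assumes MAP (maximum a posteriori) re-estimation of a single Gaussian mean vector. This is a common speaker-adaptation technique chosen here purely for illustration. The adapted model is stored on the server under the ID sent by the device. The names `GaussianModel`, `AdaptedModelStore`, and the prior weight `tau` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GaussianModel:
    """Toy stand-in for an acoustic model: one Gaussian mean vector (hypothetical)."""
    mean: list[float]


def map_adapt_mean(prior: GaussianModel, frames: list[list[float]], tau: float = 10.0) -> GaussianModel:
    """MAP re-estimation of the mean: (tau * mu0 + sum of frames) / (tau + n)."""
    n = len(frames)
    dim = len(prior.mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    adapted = [(tau * prior.mean[d] + sums[d]) / (tau + n) for d in range(dim)]
    return GaussianModel(mean=adapted)


@dataclass
class AdaptedModelStore:
    """Server-side store of adapted models keyed by the ID received from the device."""
    initial: GaussianModel
    by_id: dict[str, GaussianModel] = field(default_factory=dict)

    def adapt(self, model_id: str, frames: list[list[float]]) -> None:
        # Always adapt from the initial (pre-adaptation) model, then file it under the ID.
        self.by_id[model_id] = map_adapt_mean(self.initial, frames)

    def select(self, model_id: str) -> GaussianModel:
        # On an update command, return the adapted model for this ID
        # (falling back to the initial model if none exists yet).
        return self.by_id.get(model_id, self.initial)
```

A larger `tau` keeps the adapted mean closer to the initial model when little adaptation speech is available.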

[0068]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention is described below.

Embodiment 1. FIG. 1 is a block diagram showing the configuration of a speech recognition system according to Embodiment 1 of the present invention. In the figure:
- 10 is a speech recognition device that performs speech recognition;
- 20 is an acoustic model management server connected to a network;
- 30 is a language model management server connected to the network.

Here, "network" refers generally to any wired or wireless communication path capable of carrying digital signals.

[0069] In the speech recognition device 10:
- 100 is the input speech signal;
- 101 is a matching means that performs speech recognition on the speech signal 100;
- 102 is an acoustic model that the matching means 101 refers to during speech recognition;
- 103 is a language model that the matching means 101 refers to during speech recognition;
- 104 is the recognition result output by the matching means 101;
- 111 is an acoustic model updating means that updates the acoustic model 102 with an acoustic model transmitted over the network;
- 118 is a language model updating means that updates the language model 103 with a language model transmitted over the network.

[0070] In the acoustic model management server 20:
- 105 is an acoustic model update command supplied from outside;
- 106 is an acoustic data acquisition means that acquires updated acoustic data;
- 107 is the updated acoustic data acquired by the acoustic data acquisition means 106;
- 108 is an acoustic model construction means that, on receiving the acoustic model update command 105, reads the updated acoustic data 107 and builds an acoustic model by estimating its parameters with a statistical method;
- 109 is an acoustic model storage means that stores the constructed acoustic model;
- 110 is an acoustic model transmission means that sends the acoustic model stored in the acoustic model storage means 109 to the speech recognition device 10 over the network.

[0071] In the language model management server 30:
- 112 is a language model update command supplied from outside;
- 113 is a language data acquisition means that acquires updated language data;
- 114 is the updated language data acquired by the language data acquisition means 113;
- 115 is a language model construction means that, on receiving the language model update command 112, reads the updated language data 114 and builds a language model by estimating its parameters with a statistical method;
- 116 is a language model storage means that stores the constructed language model;
- 117 is a language model transmission means that sends the language model stored in the language model storage means 116 to the speech recognition device 10 over the network.

[0072] The invention differs from the prior art as follows. In the acoustic model management server 20 and the language model management server 30, the acoustic data acquisition means 106 and the language data acquisition means 113 acquire the updated acoustic data 107 and the updated language data 114. The acoustic model construction means 108 and the language model construction means 115 then build the latest acoustic model and the latest language model, and these are transmitted over the network to the speech recognition device 10. In the speech recognition device 10, the acoustic model updating means 111 and the language model updating means 118 use them to update the acoustic model 102 and the language model 103 that the matching means 101 refers to.

[0073] Next, the operation is described. In the acoustic model management server 20, the acoustic data acquisition means 106 runs either synchronously or asynchronously with the acoustic model update command 105. It continually downloads acoustic data, automatically or semi-automatically, as that data is updated or distributed, and stores it as the updated acoustic data 107. The acquired acoustic data is, for example, speech data updated on the Internet or distributed by multimedia broadcasting, or transcriptions corresponding to such speech data. The data to download is located with an Internet search tool or selected using the program guide of a multimedia broadcast.

[0074] The updated acoustic data 107 is the accumulated acoustic model training data acquired by the acoustic data acquisition means 106. In the example above, it consists of speech data and the transcriptions corresponding to that speech data.

[0075] The acoustic model construction means 108 receives the acoustic model update command 105. This command is issued at an appropriate timing, such as at a fixed time interval, at the interval at which speech recognition processing is carried out, or on a user instruction entered from an input device. The construction means 108 then refers to the updated acoustic data 107 and estimates the acoustic model parameters with a statistical method. It can do so by applying a vector quantization algorithm to the speech data alone, or by applying the Baum-Welch algorithm to the speech data together with the corresponding transcriptions. The result is an acoustic model that represents the training data well, which is stored in the acoustic model storage means 109.
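Paragraph [0075] names vector quantization as one way to estimate parameters from untranscribed speech data. The sketch below shows that route only: building a VQ codebook with Lloyd's (k-means) algorithm. It assumes feature vectors (for example cepstral frames) have already been extracted from the speech data. The function names are hypothetical. The Baum-Welch route for transcribed data is not shown.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors: list[list[float]], k: int, iterations: int = 20) -> list[list[float]]:
    """Lloyd's (k-means) algorithm for a VQ codebook; the first k vectors seed the codewords."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        # Assignment step: put each vector in the cluster of its nearest codeword.
        clusters: list[list[list[float]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        # Update step: move each codeword to the centroid of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the previous codeword if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def quantize(v: list[float], codebook: list[list[float]]) -> int:
    """Index of the codeword nearest to v."""
    return min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
```

Seeding with the first k vectors keeps the sketch deterministic. Practical systems typically use better initialization, such as codebook splitting.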

[0076] The acoustic model storage means 109 stores the acoustic model built by the acoustic model construction means 108 and outputs it in response to a read request. The acoustic model transmission means 110 reads the acoustic model from the acoustic model storage means 109 and sends it over the network to the acoustic model updating means 111 of the speech recognition device 10.

[0077] In the speech recognition device 10, the acoustic model updating means 111 takes the acoustic model received over the network from the acoustic model transmission means 110 of the acoustic model management server 20. It uses that model to update the acoustic model 102 that the matching means 101 refers to during matching.
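The patent does not describe how the updating means 111 replaces the model while the matching means 101 may be using it. One possible approach is sketched below. It assumes the device holds the current model behind a lock and swaps in a received model only if that model's version number is newer. `ModelHolder` and the version check are illustrative assumptions, not part of the source.

```python
import threading


class ModelHolder:
    """Holds the model read by the matcher; update() swaps in a newer model atomically (hypothetical)."""

    def __init__(self, model, version: int = 0):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def current(self):
        # The matcher takes a consistent (model, version) snapshot before decoding an utterance.
        with self._lock:
            return self._model, self._version

    def update(self, new_model, new_version: int) -> bool:
        with self._lock:
            if new_version <= self._version:
                return False  # ignore stale or duplicate deliveries
            self._model, self._version = new_model, new_version
            return True
```

An utterance that is already being decoded keeps the snapshot it took, so a swap never changes the model partway through recognition.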

[0078] In the language model management server 30, the language data acquisition means 113 runs synchronously or asynchronously with the language model update command 112. It continually downloads language data as that data is updated or distributed, collects new language data for building the language model, and stores it as the updated language data 114. The acquired language data includes, for example, periodically distributed newspapers and e-mail newsletters, text searchable on the Internet, and text from chats, e-mail, manuals, and similar sources.

[0079] The updated language data 114 is the accumulated language model training data acquired by the language data acquisition means 113. It consists of text data, plus keyword information about the text content when such information is obtained along with the text.

[0080] The language model construction means 115 receives the language model update command 112. This command is issued at an appropriate timing, such as at a fixed time interval, at the interval at which speech recognition processing is carried out, or on a user instruction from an input device. The construction means 115 then reads text data from the updated language data 114 and estimates the language model parameters with a statistical method, for example by computing n-gram statistics from text segmented into words. The result is a language model that represents the training data well, which is stored in the language model storage means 116.
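As an illustration of the n-gram statistics in [0080], the sketch below computes maximum-likelihood bigram probabilities P(w2 | w1) from text already segmented into words. The sentence-boundary tokens `<s>` and `</s>` are an assumed convention. A deployable model would also need smoothing or back-off for unseen word pairs, which is omitted here.

```python
from collections import Counter


def train_bigram(sentences: list[list[str]]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood bigram probabilities P(w2 | w1) from word-segmented sentences."""
    history_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]  # assumed sentence-boundary markers
        for w1, w2 in zip(padded, padded[1:]):
            history_counts[w1] += 1
            pair_counts[(w1, w2)] += 1
    # P(w2 | w1) = C(w1, w2) / C(w1 as a history)
    return {pair: c / history_counts[pair[0]] for pair, c in pair_counts.items()}
```

For example, training on "the game ended" and "the game started" gives P(ended | game) = 0.5.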

[0081] The language model storage means 116 stores the language model built by the language model construction means 115 and outputs it in response to a read request. The language model transmission means 117 reads the language model from the language model storage means 116 and sends it over the network to the language model updating means 118 of the speech recognition device 10.

[0082] In the speech recognition device 10, the language model updating means 118 takes the language model received over the network from the language model transmission means 117 of the language model management server 30. It uses that model to update the language model 103 that the matching means 101 refers to during matching.

[0083] FIG. 2 is a flowchart of the process for updating the acoustic model 102 according to Embodiment 1 of the present invention. In step ST201, an acoustic model update timing determination means (not shown) decides on an appropriate update timing. It bases this on, for example, a user instruction, the time elapsed since the acoustic model was last updated, or monitoring of network usage. It then sends the acoustic model update command 105 to the acoustic model construction means 108 of the acoustic model management server 20. If the acoustic model construction means 108 has received the command 105, processing proceeds to step ST202; otherwise, processing ends.
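The timing decision in step ST201 (and in ST301 and ST501 later) combines three signals: a user instruction, the time since the last update, and network usage. The patent gives no rule for combining them, so the sketch below assumes one. An explicit user request always triggers an update. Otherwise an update happens only when a minimum interval has passed and the network is lightly loaded. Both thresholds and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpdatePolicy:
    min_interval_s: float = 24 * 3600  # hypothetical: at most one scheduled update per day
    max_network_load: float = 0.5      # hypothetical: update only when the link is under 50% busy


def should_update(policy: UpdatePolicy, user_requested: bool, now_s: float,
                  last_update_s: float, network_load: float) -> bool:
    """Return True when an update command (such as 105 or 112) should be issued."""
    if user_requested:
        return True  # an explicit user instruction always triggers an update
    due = now_s - last_update_s >= policy.min_interval_s
    idle = network_load < policy.max_network_load
    return due and idle
```

Requiring both conditions keeps scheduled updates, which may carry large model files, off a busy network.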

[0084] In step ST202, the acoustic model construction means 108 reads the updated acoustic data 107 to be used for training. In step ST203, the acoustic model construction means 108 builds an acoustic model from the updated acoustic data 107 by estimating its parameters with a statistical method. It stores the resulting acoustic model in the acoustic model storage means 109.

[0085] In step ST204, the acoustic model transmission means 110 reads the acoustic model from the acoustic model storage means 109 and sends it over the network to the acoustic model updating means 111 of the speech recognition device 10. In step ST205, the acoustic model updating means 111 uses the received acoustic model to update the acoustic model 102 that the matching means 101 refers to.

[0086] When the acoustic model update command 105 requests an update of the acoustic model 102, it can also carry the version of the acoustic model 102 that the speech recognition device 10 is currently using. The acoustic model transmission means 110 can then send only the difference from that model, instead of the entire acoustic model stored in the acoustic model storage means 109. This reduces the amount of data transmitted and the load on the network.
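The version-based differential update in [0086] (and [0094] for language models) can be sketched as follows. The sketch assumes a model can be flattened into named numeric parameters and that the server keeps every published version. The patent does not define a diff format, so the `changed`/`removed` layout and the `ModelServer` class are hypothetical.

```python
def make_diff(old: dict[str, float], new: dict[str, float]) -> dict:
    """Parameters that changed or were added, plus the names of removed parameters."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_diff(params: dict[str, float], diff: dict) -> dict[str, float]:
    """Client side: rebuild the latest parameters from the old ones plus the diff."""
    updated = dict(params)
    updated.update(diff["changed"])
    for k in diff["removed"]:
        updated.pop(k, None)
    return updated


class ModelServer:
    """Hypothetical server that keeps every published version of the parameters."""

    def __init__(self) -> None:
        self.versions: list[dict[str, float]] = []

    def publish(self, params: dict[str, float]) -> int:
        self.versions.append(dict(params))
        return len(self.versions) - 1

    def update_for(self, client_version: int) -> tuple[int, dict]:
        # The client reports its version; only the difference to the latest is sent back.
        latest = len(self.versions) - 1
        return latest, make_diff(self.versions[client_version], self.versions[latest])
```

Keeping every version on the server trades storage for smaller transfers. A real system might instead keep a limited history and fall back to sending the full model when a client's version is too old.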

[0087] The system works in the same way if the acoustic model construction means 108 builds the updated acoustic model in advance and then sends it to the speech recognition device 10 when the acoustic model update command 105 requests it.

[0088] The same processing is also possible when the speech recognition device 10 has a user dictionary.

[0089] Although this embodiment has been described for speech recognition, it applies equally to any pattern recognition task that uses two probability models: one representing the relationship between patterns and symbols, and one representing the occurrence of symbols.

[0090] The updated acoustic data 107 may be stored in any format usable for building the acoustic model. For example, signal processing may already have been applied to it, or its frequency distributions may have been computed in advance.

[0091] FIG. 3 is a flowchart of the process for updating the language model 103 according to Embodiment 1 of the present invention. In step ST301, a language model update timing determination means (not shown) decides on an appropriate update timing. It bases this on, for example, a user instruction, the time elapsed since the language model was last updated, or monitoring of network usage. It then sends the language model update command 112 to the language model construction means 115 of the language model management server 30. If the language model construction means 115 has received the command 112, processing proceeds to step ST302; otherwise, processing ends.

[0092] In step ST302, the language model construction means 115 reads the updated language data 114 to be used for training. In step ST303, the language model construction means 115 builds a language model from the updated language data 114 by estimating its parameters with a statistical method. It stores the resulting language model in the language model storage means 116.

[0093] In step ST304, the language model transmission means 117 reads the language model from the language model storage means 116 and sends it over the network to the language model updating means 118 of the speech recognition device 10. In step ST305, the language model updating means 118 uses the received language model to update the language model 103 that the matching means 101 refers to.

[0094] When the language model update command 112 requests an update of the language model, it can also carry the version of the language model 103 that the speech recognition device 10 is currently using. The server can then send only the difference from that model, instead of the entire language model stored in the language model storage means 116. This reduces the amount of data transmitted and the load on the network.

[0095] The system likewise works in the same way if the language model construction means 115 builds the updated language model in advance and then sends it when the language model update command 112 requests it.

[0096] The system can also operate in the same way when the speech recognition device 10 has a user dictionary.

[0097] Although this embodiment has been described for speech recognition, it applies equally to any pattern recognition task that uses two probability models: one representing the relationship between patterns and symbols, and one representing the occurrence of symbols.

[0098] The updated language data 114 may be stored in any format usable for building the language model. For example, it may be segmented into words in advance. It may also hold precomputed frequencies or probabilities for words, word sequences, and combinations of co-occurring words.

[0099] FIG. 1 of Embodiment 1 shows the case where the acoustic model is built in response to the acoustic model update command 105. Alternatively, the acoustic model construction means 108 may update the acoustic model each time the acoustic data acquisition means 106 acquires acoustic data, and store it in the acoustic model storage means 109. The acoustic model transmission means 110 then reads out the acoustic model when it receives the acoustic model update command 105. The same applies to updating the language model.

[0100] FIG. 1 shows a configuration with both the acoustic model updating means 111 and the language model updating means 118. The system may instead include only one of the two.

[0101] The speech recognition system of Embodiment 1 can also be recorded on a recording medium as a speech recognition program. In this case, the program consists of the following software:
- software for the acoustic model management server 20, comprising an acoustic data acquisition function, an acoustic model construction function, an acoustic model storage function, and an acoustic model transmission function. These perform the same processing as the acoustic data acquisition means 106, the acoustic model construction means 108, the acoustic model storage means 109, and the acoustic model transmission means 110, respectively.
- software for the language model management server 30, comprising a language data acquisition function, a language model construction function, a language model storage function, and a language model transmission function. These perform the same processing as the language data acquisition means 113, the language model construction means 115, the language model storage means 116, and the language model transmission means 117, respectively.
- software for the speech recognition device 10, comprising an acoustic model update function, a language model update function, and a matching function. These perform the same processing as the acoustic model updating means 111, the language model updating means 118, and the matching means 101, respectively.

[0102] The software for the speech recognition device 10 and the software for the acoustic model management server 20 may be recorded on separate recording media. Alternatively, both may be recorded on a single medium. In that case, whichever of the speech recognition device 10 or the acoustic model management server 20 reads the medium transmits the other's software to the other machine. The same applies to the language model.

[0103] As described above, in Embodiment 1 the acoustic model management server 20 or the language model management server 30 on the network acquires the updated acoustic data 107 or the updated language data 114. It builds an up-to-date acoustic model or language model and uses it to update, over the network, the acoustic model 102 or the language model 103 of the user's speech recognition device 10. This improves speech recognition accuracy without placing a heavy burden on the user.

[0104] Embodiment 2. FIG. 4 is a block diagram showing the configuration of a speech recognition system according to Embodiment 2 of the present invention. In the language model management server 30 of FIG. 4:
- 401 is an updated language model ID acquisition means that acquires an ID identifying the language model 103 of the speech recognition device 10 to be updated;
- 402 is a target-specific language data reading means that reads the updated language data 114 according to the specific condition indicated by the ID acquired by the updated language model ID acquisition means 401;
- 403 is a target-specific language model construction means that refers to the updated language data 114 read by the target-specific language data reading means 402 and builds a language model suited to that specific condition.

Means and models already described carry the same reference numerals, and their description is omitted.

[0105] This embodiment differs from the prior art in that it adds the updated language model ID acquisition means 401, the target-specific language data reading means 402, and the target-specific language model construction means 403. With these, it provides over the network the language model identified by a language model ID. A target-specific language model is one trained to achieve higher performance by specializing it to a particular user, group, application, or the like.

[0106] Next, the operation is described. In the language model management server 30, the updated language model ID acquisition means 401 acquires an ID used to select a particular language model 103 from among the multiple updated language models 103. This ID is, for example, the user's user ID or an ID representing the task targeted by the speech signal 100. It uniquely identifies the target-specific language model 103 to be updated.

[0107] The target-specific language data reading means 402 receives the updated language model ID acquired by the updated language model ID acquisition means 401. It reads the updated language data 114 in units of sentences or independent texts. For each unit, it judges whether the unit belongs to the target identified by the language model ID, for example from keywords about the text content attached to the language data or from keywords contained in the data itself. It attaches a flag recording this judgment and passes the data to the target-specific language model construction means 403.
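The flagging step in [0107] is sketched below under two assumptions. First, each language model ID maps to a hand-written set of topic keywords. Second, only keywords occurring in the sentence itself are checked; keyword metadata attached to the data is not used. The `ID_KEYWORDS` table and its IDs are hypothetical.

```python
# Hypothetical mapping from language model ID to topic keywords.
ID_KEYWORDS: dict[str, set[str]] = {
    "topic:sports": {"game", "score", "team"},
    "topic:weather": {"rain", "forecast", "temperature"},
}


def flag_sentences(model_id: str, sentences: list[list[str]]) -> list[tuple[list[str], bool]]:
    """Pair each word-segmented sentence with a flag: does it belong to the target of model_id?"""
    keywords = ID_KEYWORDS.get(model_id, set())  # unknown IDs flag nothing
    return [(words, any(w in keywords for w in words)) for words in sentences]
```

The flagged pairs are exactly the input the weighted training step in [0110] needs.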

[0108] The target-specific language model construction means 403 builds a language model from the updated language data 114, trained to give high recognition accuracy for the specific target. It stores the model in the language model storage means 116.

[0109] To do this, the training language data is first divided into units of one sentence or several sentences. For each unit, the language model 103 it pertains to is determined from the keywords and other cues it contains. Closely related text data is then given greater weight, as studied in Akinori Ito and Masaki Kohda, "Study of Prior Task Adaptation for Dialogue Speech Recognition," IEICE Technical Report SP96-81, 1996 (hereinafter Document 4). This yields a language model suited to the specific condition.

[0110] For example, consider building a target-specific language model specialized for sports topics. During training, the flag obtained from the target-specific language data reading means 402 is consulted. Occurrences in text data on sports topics are counted at their actual frequency multiplied by α, while occurrences in other articles are counted at their actual frequency. The two sets of counts are added together to estimate the probability model. The value of α is chosen to minimize the entropy of the language model on target-specific text for speech recognition that was not used for training.
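The weighting in [0110] can be illustrated with a unigram model, kept simple on purpose: the same idea applies to n-gram counts. Counts from flagged (in-domain) sentences are multiplied by α. α is then picked from a small candidate grid by minimizing per-word cross-entropy on held-out in-domain text. Several pieces are assumptions for illustration only: the unigram simplification, the add-one smoothing (needed so the entropy stays finite), the candidate grid, and the function names.

```python
import math
from collections import Counter


def weighted_unigram(flagged: list[tuple[list[str], bool]], alpha: float,
                     vocab: set[str]) -> dict[str, float]:
    """Add-one-smoothed unigram model in which in-domain counts are multiplied by alpha."""
    counts: Counter = Counter()
    for words, in_domain in flagged:
        weight = alpha if in_domain else 1.0
        for w in words:
            counts[w] += weight
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1.0) / total for w in vocab}


def cross_entropy(model: dict[str, float], heldout: list[list[str]]) -> float:
    """Average negative log2 probability per word on held-out text."""
    words = [w for sentence in heldout for w in sentence]
    return -sum(math.log2(model[w]) for w in words) / len(words)


def choose_alpha(flagged: list[tuple[list[str], bool]], heldout: list[list[str]],
                 candidates: tuple[float, ...] = (1.0, 2.0, 4.0, 8.0, 16.0)) -> float:
    """Pick the alpha whose model gives the lowest held-out entropy."""
    vocab = {w for words, _ in flagged for w in words} | {w for s in heldout for w in s}
    return min(candidates,
               key=lambda a: cross_entropy(weighted_unigram(flagged, a, vocab), heldout))
```

If the held-out text resembles the flagged sentences, the search favors a large α. If it does not, α stays near 1, and the model falls back to plain, unweighted counts.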

[0111] FIG. 5 is a flowchart of the process for updating the language model 103 according to Embodiment 2 of the present invention. In step ST501, a language model update timing determination means (not shown) decides on an appropriate update timing. It bases this on, for example, a user instruction, the time elapsed since the last update, or monitoring of network usage. It then sends the language model update command 112 to the updated language model ID acquisition means 401 of the language model management server 30. If the updated language model ID acquisition means 401 has received the command 112, processing proceeds to step ST502; otherwise, processing ends.

[0112] In step ST502, at the update timing, the updated language model ID acquisition means 401 acquires the ID of the language model 103 to be updated. It uses, for example, means for identifying the user group in use or means for identifying the task. It then sends the ID to the specific language data reading means 402.

In step ST503, the specific language data reading means 402 reads the updated language data 114 in units of sentences or independent texts, according to the specific language model ID. To each unit it attaches a flag indicating whether the unit belongs to the target identified by the language model ID.
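
Step ST503 attaches a target flag to each sentence or text, but the source does not say how membership is decided. The sketch below assumes that:

- each text unit carries topic tags;
- a hypothetical table `ID_TO_TOPICS` maps a language model ID to the topics it covers.

```python
# Hypothetical mapping from a language model ID to the topics it targets.
ID_TO_TOPICS = {"lm-sports": {"sports"}, "lm-finance": {"finance", "economy"}}

def read_with_flags(units, lm_id):
    """Yield (text, is_target) for each sentence or text unit of the updated data.

    units: iterable of (text, topic_tags) pairs; the tagging scheme is an assumption.
    """
    targets = ID_TO_TOPICS.get(lm_id, set())  # unknown IDs flag nothing as target
    for text, tags in units:
        yield text, bool(targets & set(tags))
```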

[0113] In step ST504, the specific language model construction means 403 estimates a language model from the updated language data 114 according to a learning algorithm. It stores the estimated model in the language model storage means 116.

In step ST505, the language model transmitting means 117 reads the language model and transmits it via the network to the language model updating means 118 of the speech recognition device 10.

In step ST506, the language model updating means 118 uses the received language model to update the language model 103 referenced by the matching means 101.

[0114] If a language model suited to the language model ID can be output, the specific language model need not be constructed in advance. For example, only the training language data may be prepared for each language model ID, and the language model may be constructed on request.
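
Building the model only on request, as described above, can be sketched as follows. The sketch assumes that:

- per-ID training text is kept in a hypothetical `TRAINING_DATA` table;
- Python's `functools.lru_cache` is used so that each ID is built once and then reused;
- an unsmoothed unigram estimator serves as a placeholder.

```python
from functools import lru_cache

# Hypothetical training text prepared in advance for each language model ID.
TRAINING_DATA = {
    "lm-sports": ["the match ended", "goal scored today"],
    "lm-news": ["prices rose today"],
}

@lru_cache(maxsize=32)
def get_language_model(lm_id):
    """Build the model for lm_id on first request; later calls return the cached model."""
    counts = {}
    for sentence in TRAINING_DATA[lm_id]:
        for w in sentence.split():
            counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}  # unsmoothed unigram, for brevity
```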

[0115] This embodiment uses the specific language model construction method of Document 4 as an example. However, the invention applies equally to any method that uses an ID capable of determining the language model to select from among a plurality of language models 103.

[0116] The speech recognition device 10 can also convey, in the language model update command 112, the version of the language model 103 it currently uses. The server can then transmit only the difference from the current language model 103, rather than the entire constructed specific language model, which reduces the network load.
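
The version-based differential update above can be sketched as below. The source does not specify the diff format, so the sketch assumes that:

- a model is a mapping from n-gram tuples to log probabilities;
- the server keeps earlier versions, so it can compute the diff against the version the client reports.

If the client's version is unknown to the server, the full model is sent instead.

```python
def make_diff(old, new):
    """Entries to set (added or changed) and keys to delete, going from old to new."""
    upserts = {k: v for k, v in new.items() if old.get(k) != v}
    deletes = [k for k in old if k not in new]
    return {"upserts": upserts, "deletes": deletes}

def apply_diff(model, diff):
    """Client side: return a copy of the current model with the diff applied."""
    updated = dict(model)
    updated.update(diff["upserts"])
    for k in diff["deletes"]:
        updated.pop(k, None)
    return updated

def serve_update(versions, client_version, latest_version):
    """Server side: send a diff if the client's version is known, else the full model."""
    if client_version in versions:
        return {"type": "diff", "base": client_version,
                "diff": make_diff(versions[client_version], versions[latest_version])}
    return {"type": "full", "model": versions[latest_version]}
```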

[0117] The same processing is possible even when the speech recognition device has a user dictionary 601.

[0118] This description has dealt with speech recognition. However, the invention applies equally to any pattern recognition that uses a probability model of the relationship between patterns and symbols and a probability model of symbol occurrence.

[0119] In this embodiment, a specific language model is constructed and used to update the language model 103. A specific acoustic model can likewise be constructed and used to update the acoustic model 102. In that case, FIG. 4 is modified as follows:

- an acoustic model management server replaces the language model management server 30;
- acoustic data acquisition means replaces the language data acquisition means 113;
- updated acoustic data replaces the updated language data 114;
- updated acoustic model ID acquisition means replaces the updated language model ID acquisition means 401;
- specific acoustic model reading means replaces the specific language data reading means 402;
- specific acoustic model construction means replaces the specific language model construction means 403;
- acoustic model storage means replaces the language model storage means 116;
- acoustic model transmitting means replaces the language model transmitting means 117.

In the speech recognition device 10, acoustic model updating means replaces the language model updating means 118 and updates the acoustic model 102.

[0120] The speech recognition system of Embodiment 2 can also be recorded on a recording medium as a speech recognition program. The program then consists of two pieces of software, in which each function performs the same processing as the means named.

Software on the language model management server 30:

- a language data acquisition function (language data acquisition means 113);
- an updated language model ID acquisition function (updated language model ID acquisition means 401);
- a specific language data reading function (specific language data reading means 402);
- a specific language model construction function (specific language model construction means 403);
- a language model storage function (language model storage means 116);
- a language model transmitting function (language model transmitting means 117).

Software on the speech recognition device 10:

- a language model updating function (language model updating means 118);
- a matching function (matching means 101).

The same applies when the acoustic model is the target.

[0121] The speech recognition program can be recorded in either of two ways. The software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media. Alternatively, both may be recorded on a single recording medium, and the relevant part transmitted from the speech recognition device 10 to the language model management server 30, or from the language model management server 30 to the speech recognition device 10. The same applies when the acoustic model is the target.

[0122] As described above, in Embodiment 2 the language model management server 30 connected to the network acquires two things: the updated language data 114 and the ID of the language model 103 of the user's speech recognition device 10. It can therefore construct a specific language model that is both up to date and suited to the user. It then updates the language model 103 of the user's speech recognition device 10 via the network. This improves recognition accuracy without placing a large burden on the user.

[0123] In Embodiment 2, a language model customized for a specific purpose is also updated via the network. As a result, even when the user uses several different matching means 101, an appropriate language model 103 is available to every one of them, and high recognition accuracy is obtained.

[0124] Embodiment 3. FIG. 6 is a block diagram showing the configuration of a speech recognition system according to Embodiment 3 of the present invention.

In the speech recognition device 10 of FIG. 6, 601 is a user dictionary in which are registered the words that the matching means 101 references during matching.

In the language model management server 30:

- 602 is user dictionary reading means. On receiving the language model update command 112, it reads via the network the user dictionary 601 referenced by the matching means 101.
- 603 is user-dictionary-dependent language model construction means. It refers to the updated language data 114 and to the user dictionary 601 read by the user dictionary reading means 602, and constructs a language model dependent on the user dictionary 601.

Means and models already described are given the same reference numerals, and their description is omitted.

[0125] This embodiment differs from the prior art in providing the user dictionary reading means 602 and the user-dictionary-dependent language model construction means 603.

[0126] Next, the operation will be described. On receiving the language model update command 112, the user dictionary reading means 602 of the language model management server 30 reads via the network the user dictionary 601 referenced by the matching means 101 of the speech recognition device 10. The user-dictionary-dependent language model construction means 603 then uses the words registered in the user dictionary 601 and the updated language data 114 to construct a language model that is both up to date and customized for the user.

[0127] A language model dependent on the user dictionary 601 can be constructed, for example, in two steps. First, the texts that contain words present in the user dictionary 601 are extracted from the updated language data 114. Second, these texts are treated as the specific text and the method of Document 4, referred to in Embodiment 2, is applied.

Some words in the user dictionary 601 were not registered in the original language model and so had no appropriate statistics. If such words appear in the updated text, this procedure can assign them appropriate statistics, and an improvement in recognition accuracy can be expected.
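
The text selection above amounts to filtering the updated data for texts that mention a user-dictionary word. The sketch below does this filtering. It also lists which dictionary words were missing from the original model's vocabulary but now have training text. Whitespace tokenization is an assumption made for readability; Japanese text would need a morphological analyzer instead, passed in through the `tokenize` parameter.

```python
def select_dictionary_texts(texts, user_dictionary, tokenize=str.split):
    """Return the texts that contain at least one word registered in the user dictionary."""
    vocab = set(user_dictionary)
    return [t for t in texts if vocab.intersection(tokenize(t))]

def newly_covered_words(selected_texts, user_dictionary, base_vocab, tokenize=str.split):
    """Dictionary words absent from the original model's vocabulary but present in the selected texts."""
    seen = set()
    for t in selected_texts:
        seen.update(tokenize(t))
    return {w for w in user_dictionary if w not in base_vocab and w in seen}
```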

[0128] FIG. 7 is a flowchart showing the process of updating the language model 103 according to Embodiment 3 of the present invention.

In step ST701, the language model update timing determination means (not shown) determines an appropriate update timing. It does so from, for example, a user instruction, the time elapsed since the last update, or monitoring of network usage. It then sends the language model update command 112 to the user dictionary reading means 602 of the language model management server 30. If the user dictionary reading means 602 has received the language model update command 112, the process proceeds to step ST702; otherwise, the process ends.

[0129] In step ST702, the user dictionary reading means 602 reads via the network the user dictionary 601 referenced by the matching means 101.

In step ST703, the user-dictionary-dependent language model construction means 603 reads the updated language data 114.

In step ST704, the user-dictionary-dependent language model construction means 603 constructs, from the user dictionary 601 and the updated language data 114, a language model dependent on the user dictionary 601. It stores this model in the language model storage means 116.

[0130] In step ST705, the language model transmitting means 117 reads the user-dictionary-dependent language model from the language model storage means 116. It transmits this model via the network to the language model updating means 118 of the speech recognition device 10.

In step ST706, the language model updating means 118 uses the received language model to update the language model 103 referenced by the matching means 101.

[0131] This embodiment has been described for speech recognition. However, it applies equally to any pattern recognition that uses a probability model of the relationship between patterns and symbols and a probability model of symbol occurrence.

[0132] The speech recognition system of Embodiment 3 can also be recorded on a recording medium as a speech recognition program. The program then consists of two pieces of software, in which each function performs the same processing as the means named.

Software on the language model management server 30:

- a language data acquisition function (language data acquisition means 113);
- a user dictionary reading function (user dictionary reading means 602);
- a user-dictionary-dependent language model construction function (user-dictionary-dependent language model construction means 603);
- a language model storage function (language model storage means 116);
- a language model transmitting function (language model transmitting means 117).

Software on the speech recognition device 10:

- a language model updating function (language model updating means 118);
- a matching function (matching means 101).

[0133] The speech recognition program can be recorded in either of two ways. The software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media. Alternatively, both may be recorded on a single recording medium, and the relevant part transmitted from the speech recognition device 10 to the language model management server 30, or from the language model management server 30 to the speech recognition device 10.

[0134] As described above, in Embodiment 3 the language model management server 30 connected to the network acquires the updated language data 114 and reads the user dictionary 601 of the user's speech recognition device 10. It can therefore construct a language model that is up to date and reflects the words registered in the user dictionary 601 in greater detail. It then updates the language model 103 of the user's speech recognition device 10 via the network. Recognition accuracy therefore improves without placing a large burden on the user, even when the user dictionary becomes large.

[0135] Embodiment 4. FIG. 8 is a block diagram showing the configuration of a speech recognition system according to Embodiment 4 of the present invention. In the language model management server 30 of FIG. 8:

- 801 is user text acquisition means. On receiving the language model update command 112, it acquires text used by the user.
- 802 is user text storage means. It stores the text acquired by the user text acquisition means 801.
- 803 is user-text-dependent language model construction means. It refers to the updated language data 114 and the text stored in the user text storage means 802, and constructs a language model dependent on that text.

Means and models already described are given the same reference numerals, and their description is omitted.

[0136] This embodiment differs from the prior art in providing the user text acquisition means 801, the user text storage means 802, and the user-text-dependent language model construction means 803. It refers to the text used by the user and to the language data 114 updated to the latest state, and constructs a language model matched to the user's text.

[0137] Next, the operation will be described. On receiving the language model update command 112, the user text acquisition means 801 of the language model management server 30 reads the text files that the user has referred to or written. It does so, for example, by scanning files and directories the user specified in advance. The user text storage means 802 stores the text collected by the user text acquisition means 801.
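
Collecting user text by scanning user-specified locations, as described above, might look like the sketch below. The following choices are assumptions, not taken from the source:

- only `.txt` files inside directories are read;
- files are decoded as UTF-8;
- missing paths are silently skipped;
- explicitly named files are read regardless of suffix.

```python
from pathlib import Path

def collect_user_texts(paths, suffixes=(".txt",)):
    """Read text from user-specified files, and from matching files under user-specified directories."""
    texts = []
    for p in map(Path, paths):
        if p.is_file():
            candidates = [p]  # an explicitly named file is always read
        elif p.is_dir():
            # recursive scan, keeping only regular files with an accepted suffix
            candidates = [f for f in sorted(p.rglob("*")) if f.is_file() and f.suffix in suffixes]
        else:
            continue  # assumption: paths that no longer exist are skipped
        for f in candidates:
            texts.append(f.read_text(encoding="utf-8", errors="replace"))
    return texts
```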

[0138] The user-text-dependent language model construction means 803 refers to the user text and the updated language data 114, and constructs a language model that increases recognition accuracy. For example, the user text can be treated as the specific text and the method of Document 4, referred to in Embodiment 2, applied to yield a language model dependent on the user text.

A language model constructed in this way reflects the characteristics of the text the user has referred to or written. It therefore captures the linguistic properties the user is likely to utter, and more accurate recognition results can be obtained.

[0139] FIG. 9 is a flowchart showing the process of updating the language model 103 according to Embodiment 4 of the present invention.

In step ST901, the language model update timing determination means (not shown) determines an appropriate timing. It does so from a user instruction, the time elapsed since the last update, monitoring of network usage, and the like. It then sends the language model update command 112 to the user text acquisition means 801 of the language model management server 30. If the user text acquisition means 801 has received the language model update command 112, the process proceeds to step ST902; otherwise, the process ends.

[0140] In step ST902, the user text acquisition means 801 reads the user text and stores it in the user text storage means 802.

In step ST903, the user-text-dependent language model construction means 803 reads the user text and the updated language data 114.

In step ST904, the user-text-dependent language model construction means 803 constructs a user-text-dependent language model from the user text and the updated language data 114. It stores this model in the language model storage means 116.

[0141] In step ST905, the language model transmitting means 117 reads the user-text-dependent language model from the language model storage means 116. It transmits this model via the network to the language model updating means 118 of the speech recognition device 10.

In step ST906, the language model updating means 118 uses the received language model to update the language model 103 referenced by the matching means 101.

[0142] In this embodiment, the user text is obtained by searching specific directories or files. However, the user text acquisition means 801 need not take text from files or directories, as long as it can collect text some other way. For example, it may use text the user has entered through speech recognition, a keyboard, a pen, OCR, or similar input, or text the user has viewed in a browser or similar application.

[0143] The user text storage means 802 has been described as storing the raw text collected by the user text acquisition means 801. The same result is obtained if it instead stores frequency counts, following the criteria of the user-text-dependent language model construction means 803. In that case, the text is split into words by suitable means and stored as frequencies of the items referred to during model construction, such as words, word chains, and combinations of co-occurring words.
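
Storing counts instead of raw text, as described above, could be done as follows. Each sentence is tokenized and reduced to three kinds of counts: unigrams, bigrams (word chains), and within-sentence co-occurrences. Whitespace tokenization and the choice of the sentence as the co-occurrence window are assumptions.

```python
from collections import Counter
from itertools import combinations

def text_statistics(sentences, tokenize=str.split):
    """Summarise text as unigram, bigram and within-sentence co-occurrence counts."""
    unigrams, bigrams, cooc = Counter(), Counter(), Counter()
    for s in sentences:
        words = tokenize(s)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))              # adjacent word chains
        cooc.update(combinations(sorted(set(words)), 2))   # each unordered pair once per sentence
    return unigrams, bigrams, cooc
```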

[0144] This embodiment has been described for speech recognition. However, it applies equally to any pattern recognition that uses a probability model of the relationship between patterns and symbols and a probability model of symbol occurrence.

[0145] The speech recognition system of Embodiment 4 can also be recorded on a recording medium as a speech recognition program. The program then consists of two pieces of software, in which each function performs the same processing as the means named.

Software on the language model management server 30:

- a language data acquisition function (language data acquisition means);
- a user text acquisition function (user text acquisition means 801);
- a user text storage function (user text storage means 802);
- a user-text-dependent language model construction function (user-text-dependent language model construction means 803);
- a language model storage function (language model storage means 116);
- a language model transmitting function (language model transmitting means 117).

Software on the speech recognition device 10:

- a language model updating function (language model updating means 118);
- a matching function (matching means 101).

[0146] The speech recognition program can be recorded in either of two ways. The software of the speech recognition device 10 and the software of the language model management server 30 may be recorded on separate recording media. Alternatively, both may be recorded on a single recording medium, and the relevant part transmitted from the speech recognition device 10 to the language model management server 30, or from the language model management server 30 to the speech recognition device 10.

[0147] As described above, in Embodiment 4 the language model management server 30 connected to the network acquires the updated language data 114 and the text used by the user. It can therefore construct a language model that is up to date and dependent on the user's text. It then updates the language model 103 of the user's speech recognition device 10 via the network. This improves recognition accuracy without placing a large burden on the user.

[0148] Embodiment 5. FIG. 10 is a block diagram showing the configuration of a speech recognition system according to Embodiment 5 of the present invention. In the speech recognition device 10 of FIG. 10:

- 1001 is acoustic model ID acquisition means. It acquires an ID identifying the acoustic model that the matching means 101 references during matching.
- 1002 is adaptation speech acquisition means. It reads the ID acquired by the acoustic model ID acquisition means 1001 and acquires speech data for adaptation from the input speech signal. It then transmits the ID and the adaptation speech data via the network to the acoustic model management server 20.

[0149] In the acoustic model management server 20 of FIG. 10:

- 1003 is the initial acoustic model before adaptation.
- 1004 is acoustic model adaptation means. It adapts the initial acoustic model 1003 using the adaptation speech data sent by the adaptation speech acquisition means 1002 of the speech recognition device 10. It stores the adapted acoustic model in the adapted acoustic model storage means 1005, associated with the ID sent by the adaptation speech acquisition means 1002.
- 1006 is adapted acoustic model selection means. On receiving the acoustic model update command 105, it receives via the network the ID acquired by the acoustic model ID acquisition means 1001 of the speech recognition device 10. It then selects and reads the corresponding adapted acoustic model from the adapted acoustic model storage means 1005.

Means and models already described are given the same reference numerals, and their description is omitted.
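
A minimal stand-in for the storage means 1005 and the selection means 1006 is a table keyed by acoustic model ID; per [0152], the ID may be a user ID. The class below is hypothetical. It also makes one assumption the source does not state: when no adapted model exists for an ID, it falls back to the initial acoustic model 1003.

```python
class AdaptedModelStore:
    """Hypothetical stand-in for adapted acoustic model storage keyed by acoustic model ID."""

    def __init__(self, initial_model):
        self._initial = initial_model  # the model before adaptation
        self._models = {}

    def store(self, model_id, model):
        """Called after adaptation: keep the adapted model under its ID."""
        self._models[model_id] = model

    def select(self, model_id):
        """Return the adapted model for the ID (assumption: fall back to the initial model)."""
        return self._models.get(model_id, self._initial)
```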

[0150] This embodiment differs from the prior art in two respects:

- the speech recognition device 10 is provided with the acoustic model ID acquisition means 1001 and the adaptation speech acquisition means 1002;
- the acoustic model management server 20 is provided with the initial acoustic model 1003, the acoustic model adaptation means 1004, the adapted acoustic model storage means 1005, and the adapted acoustic model selection means 1006.

[0151] In this embodiment, the update proceeds in three stages:

1. The speech recognition device 10 acquires the ID of the acoustic model to be adapted and speech data for adaptation.
2. The acoustic model management server 20 constructs an acoustic model adapted according to that ID and transmits it to the speech recognition device 10.
3. The speech recognition device 10 updates the acoustic model 102.

Because the user can then refer to the adapted acoustic model whichever matching means 101 is used, higher recognition accuracy is obtained.

[0152] Next, the operation will be described. In the speech recognition apparatus 10, the acoustic model ID acquisition means 1001 determines which acoustic model is to be adapted. The ID is, for example, the user ID of the user of the speech recognition apparatus 10. Before the matching means 101 is used, the adaptation speech acquisition means 1002 acquires speech data for adaptation from the speech signal 100. It then transmits two items to the acoustic model adaptation means 1004 of the acoustic model management server 20, which is connected via the network: the acoustic model ID read from the acoustic model ID acquisition means 1001, and the acquired adaptation speech data.
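The patent does not specify how the ID and the adaptation data travel over the network. The following is a minimal sketch of the request sent in step ST1103, assuming a JSON payload. `AdaptationRequest`, its field names and both helper functions are hypothetical. The adaptation data is carried as feature-vector frames, one of the storage forms permitted in paragraph [0163], rather than as a raw waveform.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AdaptationRequest:
    """Hypothetical payload: the ID chosen by means 1001 plus adaptation speech data."""
    model_id: str                 # e.g. the user ID of the apparatus's user
    frames: List[List[float]]     # adaptation data as acoustic feature vectors


def encode_request(req: AdaptationRequest) -> bytes:
    """Serialize the request as UTF-8 JSON for sending to server 20."""
    return json.dumps(asdict(req)).encode("utf-8")


def decode_request(payload: bytes) -> AdaptationRequest:
    """Server-side inverse of encode_request."""
    data = json.loads(payload.decode("utf-8"))
    return AdaptationRequest(model_id=data["model_id"], frames=data["frames"])


if __name__ == "__main__":
    req = AdaptationRequest("user-0042", [[0.1, 0.2], [0.3, 0.4]])
    assert decode_request(encode_request(req)) == req
```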

[0153] In the acoustic model management server 20, the initial acoustic model 1003 is the acoustic model before adaptation. The acoustic model adaptation means 1004 constructs an adapted acoustic model from two inputs: the adaptation speech data received via the network, and the initial acoustic model 1003. It stores the adapted acoustic model, together with the acoustic model ID received via the network, in the adapted acoustic model storage means 1005. The acoustic model is adapted using, for example, maximum a posteriori (MAP) estimation.
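Paragraph [0153] names MAP estimation but does not say which parameters are adapted. The sketch below makes two assumptions that are common practice but are not stated in the patent. The acoustic model is taken to be a diagonal-covariance Gaussian mixture, and only its means are adapted, using a relevance factor `tau`. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(weights: Sequence[float], means: List[List[float]],
                    variances: List[List[float]], frames: List[List[float]],
                    tau: float = 10.0) -> List[List[float]]:
    """MAP re-estimation of GMM means; weights and variances are kept fixed.

    mu_k_new = (tau * mu_k + sum_t gamma_k(t) * x_t) / (tau + sum_t gamma_k(t))
    """
    K, D = len(means), len(means[0])
    occ = [0.0] * K                        # sum_t gamma_k(t)
    acc = [[0.0] * D for _ in range(K)]    # sum_t gamma_k(t) * x_t
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                    # subtract the max for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        z = sum(post)
        for k in range(K):
            g = post[k] / z                # posterior gamma_k(t)
            occ[k] += g
            for d in range(D):
                acc[k][d] += g * x[d]
    return [[(tau * means[k][d] + acc[k][d]) / (tau + occ[k]) for d in range(D)]
            for k in range(K)]
```

With little adaptation data, the occupancy sum_t gamma_k(t) is small and each mean stays close to the initial model 1003. As more data arrives, each mean moves toward the average of the frames assigned to its component. This behavior is why MAP suits the small per-user data sets described here.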

[0154] The adapted acoustic model storage means 1005 stores each acoustic model ID together with the acoustic model adapted by the acoustic model adaptation means 1004. On request from the adapted acoustic model selecting means, it outputs the acoustic model that has the specified acoustic model ID. On receiving the acoustic model update command 105, the adapted acoustic model selecting means 1006 acquires the acoustic model ID from the acoustic model ID acquisition means 1001. It then selects and reads out the corresponding adapted acoustic model from the adapted acoustic model storage means 1005.
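The storage and selection behavior of means 1005 and 1006 amounts to a keyed store. The following is a minimal in-memory sketch. The class name is hypothetical, and so is the rule that a later adaptation for the same ID replaces the earlier one.

```python
from typing import Dict, Generic, Optional, TypeVar

Model = TypeVar("Model")


class AdaptedModelStore(Generic[Model]):
    """Sketch of storage means 1005: adapted models keyed by acoustic model ID."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def store(self, model_id: str, model: Model) -> None:
        # ST1107: assumed policy, a newer adaptation for the same ID overwrites the old one.
        self._models[model_id] = model

    def select(self, model_id: str) -> Optional[Model]:
        # ST1110 (selecting means 1006): None if no model was adapted for this ID.
        return self._models.get(model_id)
```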

[0155] FIG. 11 is a flowchart of the process of updating the acoustic model 102 according to Embodiment 5 of the present invention. The process is divided into two stages: an acoustic model adaptation stage (steps ST1101 to ST1107) and an acoustic model update stage (steps ST1108 to ST1112). In the adaptation stage, an acoustic model dependent on the acoustic model ID is constructed from the input speech data for adaptation.

[0156] In step ST1101, the acoustic model ID acquisition means 1001 of the speech recognition apparatus 10 acquires identification information, such as a user name, that identifies the acoustic model to be adapted. In step ST1102, the adaptation speech acquisition means 1002 reads the acoustic model ID and acquires the speech data for adaptation input as the speech signal 100. In step ST1103, the adaptation speech acquisition means 1002 sends an acoustic model adaptation request to the acoustic model management server 20 via the network. At the same time, it transmits the acoustic model ID and the adaptation speech data to the acoustic model adaptation means 1004.

[0157] In step ST1104, the acoustic model adaptation means 1004 receives the acoustic model adaptation request via the network and reads the acoustic model ID and the adaptation speech data. In step ST1105, the acoustic model adaptation means 1004 reads the initial acoustic model 1003. In step ST1106, the acoustic model adaptation means 1004 adapts the initial acoustic model 1003 using the adaptation speech data received via the network. In step ST1107, the acoustic model adaptation means 1004 stores the adapted acoustic model in the adapted acoustic model storage means 1005 so that it can be distinguished by its acoustic model ID.

[0158] In step ST1108, an acoustic model update timing determination means (not shown) determines a suitable update timing. It bases this on, for example, a user instruction, the time elapsed since the last update, and monitoring of network usage. It then sends the acoustic model update command 105 to the adapted acoustic model selecting means 1006. If the adapted acoustic model selecting means 1006 has received the acoustic model update command 105, the process proceeds to step ST1109; otherwise the process ends. In step ST1109, the adapted acoustic model selecting means 1006 reads, via the network, the ID of the acoustic model to be adapted from the acoustic model ID acquisition means 1001 of the speech recognition apparatus 10.
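The timing rule of step ST1108 is left open: the patent lists only the kinds of evidence used. The sketch below is one possible decision function. The thresholds, the `UpdatePolicy` type and the rule that an explicit user instruction always triggers an update are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdatePolicy:
    min_interval_s: float = 24 * 3600.0   # hypothetical: at most one update per day
    max_network_load: float = 0.5         # hypothetical: fraction of link capacity in use


def should_issue_update(policy: UpdatePolicy, last_update_s: float,
                        network_load: float, user_requested: bool,
                        now_s: Optional[float] = None) -> bool:
    """Decide whether to send acoustic model update command 105 (step ST1108)."""
    if user_requested:                    # assumed: an explicit user instruction always wins
        return True
    now = time.time() if now_s is None else now_s
    due = now - last_update_s >= policy.min_interval_s
    idle = network_load <= policy.max_network_load
    return due and idle
```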

[0159] In step ST1110, the adapted acoustic model selecting means 1006 selects and reads out the acoustic model specified by the acoustic model ID from the adapted acoustic model storage means 1005. In step ST1111, the acoustic model transmission means 110 transmits the adapted acoustic model that has been read out to the acoustic model updating means 111 of the speech recognition apparatus 10 via the network. In step ST1112, the acoustic model updating means 111 uses the received adapted acoustic model to update the acoustic model 102 referred to by the matching means 101.
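Steps ST1101 to ST1112 can be traced end to end in a few lines. The sketch below rests on three simplifying assumptions. Direct method calls stand in for the network. The "acoustic model" is a single Gaussian mean adapted by MAP. The apparatus starts from a copy of the initial model. All class and method names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def map_mean(prior: Vector, frames: List[Vector], tau: float) -> Vector:
    """Single-Gaussian MAP mean: (tau * mu0 + sum_t x_t) / (tau + T)."""
    n = len(frames)
    return [(tau * m + sum(f[d] for f in frames)) / (tau + n)
            for d, m in enumerate(prior)]


class AcousticModelServer:
    """Server 20: adaptation 1004, storage 1005, selection 1006, transmission 110."""

    def __init__(self, initial_mean: Vector, tau: float = 10.0) -> None:
        self.initial_mean = initial_mean        # initial acoustic model 1003
        self.tau = tau
        self.adapted: Dict[str, Vector] = {}    # storage means 1005

    def adapt(self, model_id: str, frames: List[Vector]) -> None:
        # ST1104-ST1107: adapt the initial model and store it under the ID.
        self.adapted[model_id] = map_mean(self.initial_mean, frames, self.tau)

    def select(self, model_id: str) -> Vector:
        # ST1109-ST1111: look up the adapted model for the ID and "send" it.
        return self.adapted[model_id]


class RecognizerClient:
    """Apparatus 10: ID means 1001, adaptation speech means 1002, updating means 111."""

    def __init__(self, model_id: str, server: AcousticModelServer) -> None:
        self.model_id = model_id                          # ST1101
        self.server = server
        self.acoustic_model = list(server.initial_mean)   # acoustic model 102 (assumed start)

    def send_adaptation_data(self, frames: List[Vector]) -> None:
        # ST1102-ST1103: send ID and adaptation data to the server.
        self.server.adapt(self.model_id, frames)

    def on_update_command(self) -> None:
        # ST1108 (command 105 received) through ST1112: replace model 102.
        self.acoustic_model = self.server.select(self.model_id)
```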

[0160] In the speech recognition apparatus 10, the acoustic model ID acquisition means 1001 need not identify the user. As long as it determines the acoustic model to be adapted, it may instead determine a line transfer characteristic, a background noise characteristic, a reverberation characteristic or the like.

[0161] In the speech recognition apparatus 10, the adaptation speech acquisition means 1002 acquires the user's speech data for adaptation in advance, before the matching means 101 is used. Alternatively, the matching means 101 may acquire speech data during matching and input it to the adaptation speech acquisition means 1002. In that case, the acoustic model referred to in the next matching is adapted.

[0162] Further, in the speech recognition apparatus 10, the adaptation speech data and the acoustic model ID may be acquired in the reverse order.

[0163] Further, in the speech recognition apparatus 10, the adaptation speech acquisition means 1002 stores the speech data as a speech waveform. Any other form usable for training an acoustic model may be used instead, for example:
- a time series of acoustic feature vectors obtained by signal processing of the speech data;
- a codebook code sequence obtained by looking up the acoustic feature vectors in a vector quantization codebook;
- a frequency distribution obtained by statistical processing of these;
- a probability distribution derived from the frequency distribution.
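The alternative storage forms in [0163] form a chain: feature vectors, then codebook indices, then a frequency distribution, then a probability distribution. The following is a minimal sketch of that chain, assuming nearest-neighbor vector quantization under squared Euclidean distance and relative-frequency estimation. The codebook itself and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def vq_encode(frames: Sequence[Sequence[float]],
              codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword (squared Euclidean)."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]


def code_histogram(codes: Sequence[int], size: int) -> List[int]:
    """Frequency distribution of codebook indices."""
    counts = Counter(codes)
    return [counts.get(k, 0) for k in range(size)]


def to_probabilities(hist: Sequence[int]) -> List[float]:
    """Relative-frequency probability distribution; all zeros if the histogram is empty."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)
```

Each step discards detail that the previous form retained, but it also shrinks the data that the apparatus must store or send to the server.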

[0164] Further, the acoustic model management server 20 may be given a means for storing the adaptation speech data received by the acoustic model adaptation means 1004. The initial model 1003 is then updated using the large amount of stored adaptation speech data. This raises the training accuracy, so an adapted acoustic model that gives more accurate recognition can be constructed.
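One simple way to pool stored adaptation data for re-estimating the initial model 1003 is to keep running sufficient statistics. The sketch below simplifies the model to a single diagonal Gaussian re-estimated by maximum likelihood. A real system would instead retrain its full model, for example an HMM trained with Baum-Welch. The class name is hypothetical.

```python
from typing import List, Tuple


class GaussianAccumulator:
    """Pools adaptation frames from many requests to re-estimate initial model 1003."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.s1 = [0.0] * dim    # per-dimension sum of x
        self.s2 = [0.0] * dim    # per-dimension sum of x^2

    def add(self, frames: List[List[float]]) -> None:
        for x in frames:
            self.n += 1
            for d, v in enumerate(x):
                self.s1[d] += v
                self.s2[d] += v * v

    def estimate(self) -> Tuple[List[float], List[float]]:
        """Maximum-likelihood mean and diagonal variance of all pooled frames."""
        if self.n == 0:
            raise ValueError("no adaptation data accumulated")
        mean = [a / self.n for a in self.s1]
        var = [b / self.n - m * m for b, m in zip(self.s2, mean)]
        return mean, var
```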

[0165] Furthermore, although this embodiment targets speech recognition, the same approach applies to any pattern recognition that uses two probability models: one representing the relationship between patterns and symbols, and one representing the occurrence of symbols.

[0166] Furthermore, the speech recognition system according to Embodiment 5 can also be recorded on a recording medium as a speech recognition program. In this case, the program consists of software for two components.
- Software for the speech recognition apparatus 10, comprising an acoustic model ID acquisition function, an adaptation speech acquisition function, an acoustic model updating function and a matching function. These perform the same processing as the acoustic model ID acquisition means 1001, the adaptation speech acquisition means 1002, the acoustic model updating means 111 and the matching means 101, respectively.
- Software for the acoustic model management server 20, comprising an acoustic model adaptation function, an adapted acoustic model storage function, an adapted acoustic model selection function and an acoustic model transmission function. These perform the same processing as the acoustic model adaptation means 1004, the adapted acoustic model storage means 1005, the adapted acoustic model selecting means 1006 and the acoustic model transmission means 110, respectively.

[0167] The speech recognition program contains software for the speech recognition apparatus 10 and software for the acoustic model management server 20. The two may be recorded on separate recording media. Alternatively, they may be recorded on a single recording medium, with the appropriate software then sent from the speech recognition apparatus 10 to the acoustic model management server 20, or from the acoustic model management server 20 to the speech recognition apparatus 10.

[0168] As described above, according to Embodiment 5, the acoustic model management server 20 connected to the network constructs an acoustic model adapted with adaptation speech data taken from the user's speech signal 100. The acoustic model 102 of the user's speech recognition apparatus 10 is then updated via the network. This improves speech recognition accuracy without placing a large burden on the user.

[0169] Also according to Embodiment 5, the acoustic model 102 is updated via the network with an adapted acoustic model adapted to the user. An appropriate acoustic model 102 is therefore used whenever any matching means 101 is used, even when the user uses several different matching means 101, so high recognition accuracy can be obtained.

[0170]

EFFECTS OF THE INVENTION As described above, according to the present invention, the acoustic model management server acquires updated acoustic data, constructs an acoustic model from it and transmits the model to the speech recognition apparatus via the network. The speech recognition apparatus then updates the acoustic model it refers to during speech recognition with the model transmitted by the acoustic model management server. This improves speech recognition accuracy without placing a large burden on the user.

[0171] According to the present invention, the acoustic model management server acquires an ID identifying the acoustic model that the speech recognition apparatus refers to during speech recognition. It reads the updated acoustic data corresponding to the specific condition indicated by that ID, constructs an acoustic model dependent on that condition, and transmits it to the speech recognition apparatus. This improves recognition accuracy without placing a large burden on the user. Moreover, because an acoustic model customized for a specific condition is transmitted via the network, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition apparatuses.

[0172] According to the present invention, the language model management server acquires updated language data, constructs a language model from it and transmits the model to the speech recognition apparatus via the network. The speech recognition apparatus then updates the language model it refers to during speech recognition with the model transmitted by the language model management server. This improves speech recognition accuracy without placing a large burden on the user.

[0173] According to the present invention, the language model management server acquires an ID identifying the language model that the speech recognition apparatus refers to during speech recognition. It reads the updated language data corresponding to the specific condition indicated by that ID, constructs a language model dependent on that condition, and transmits it to the speech recognition apparatus. This improves recognition accuracy without placing a large burden on the user. Moreover, because a language model customized for a specific condition is updated via the network, an appropriate language model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition apparatuses.

[0174] According to the present invention, the speech recognition apparatus refers during speech recognition to a user dictionary in which words are registered. The language model management server reads this user dictionary via the network. It then constructs a language model dependent on the user dictionary, by referring to the updated language data and the dictionary it has read, and transmits the model to the speech recognition apparatus. This improves speech recognition accuracy without placing a large burden on the user, even when the user dictionary becomes large.
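The patent does not say how the user dictionary shapes the language model. A minimal sketch, under stated assumptions, is an add-k smoothed unigram model built on the server whose vocabulary is extended with every user-dictionary word. Registered words that never occur in the updated language data then still receive nonzero probability. The function name and the choice of smoothing are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_unigram_lm(corpus: Iterable[List[str]], user_dictionary: Iterable[str],
                     k: float = 1.0) -> Dict[str, float]:
    """Add-k smoothed unigram model whose vocabulary includes every user-dictionary word."""
    counts = Counter(w for sentence in corpus for w in sentence)
    vocab = set(counts) | set(user_dictionary)
    total = sum(counts.values()) + k * len(vocab)
    return {w: (counts[w] + k) / total for w in vocab}
```

Building the model on the server means the apparatus only receives the finished model, however large the dictionary grows.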

[0175] According to the present invention, the language model management server acquires text used by the user of the speech recognition apparatus. It constructs a language model dependent on that text, by referring to the updated language data and the acquired text, and transmits the model to the speech recognition apparatus. This improves speech recognition accuracy without placing a large burden on the user.
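A common way to make a language model "dependent on" a user's own text is linear interpolation between a general model and a model estimated from that text. The sketch below uses unigram relative frequencies and a fixed weight `lam`. This technique is swapped in for illustration, since the patent does not state how the two sources are combined. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def relative_freq(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram probabilities; empty input gives an empty model."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()} if n else {}


def interpolate(general: Dict[str, float], user: Dict[str, float],
                lam: float) -> Dict[str, float]:
    """P(w) = lam * P_user(w) + (1 - lam) * P_general(w)."""
    vocab = set(general) | set(user)
    return {w: lam * user.get(w, 0.0) + (1 - lam) * general.get(w, 0.0)
            for w in vocab}
```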

[0176] According to the present invention, the speech recognition apparatus acquires an ID identifying an acoustic model, together with speech data for adaptation taken from the input speech signal, and transmits both to the acoustic model management server via the network. The acoustic model management server adapts the initial acoustic model using the transmitted adaptation speech data and stores the adapted acoustic model in association with the transmitted ID. In response to an external acoustic model update command, the server receives the ID from the speech recognition apparatus via the network. It then selects and reads out the adapted acoustic model corresponding to that ID from the stored adapted acoustic models and transmits it to the speech recognition apparatus via the network. Finally, the speech recognition apparatus updates the acoustic model it refers to during speech recognition with the adapted acoustic model transmitted by the server. This improves recognition accuracy without placing a large burden on the user. Moreover, because the acoustic model is updated via the network with an adapted acoustic model adapted to the user, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition apparatuses.

[0177] According to the present invention, the speech recognition apparatus includes acoustic model updating means. This means receives, from the acoustic model management server connected via the network, an acoustic model constructed from updated acoustic data. It then uses the received model to update the acoustic model that the matching means refers to during speech recognition. This improves speech recognition accuracy without placing a large burden on the user.

[0178] According to the present invention, the acoustic model updating means receives an acoustic model from the acoustic model management server connected via the network. This model is constructed from updated acoustic data and depends on a specific condition of the acoustic model that the matching means refers to during speech recognition. The updating means then replaces that acoustic model with the received one. This improves recognition accuracy without placing a large burden on the user. Moreover, because an acoustic model customized for a specific condition is received via the network, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different matching means.

[0179] According to the present invention, the speech recognition apparatus includes language model updating means. This means receives, from the language model management server connected via the network, a language model constructed from updated language data. It then uses the received model to update the language model that the matching means refers to during speech recognition. This improves speech recognition accuracy without placing a large burden on the user.

[0180] According to the present invention, the language model updating means receives a language model from the language model management server connected via the network. This model is constructed from updated language data and depends on a specific condition of the language model that the matching means refers to during speech recognition. The updating means then replaces that language model with the received one. This improves recognition accuracy without placing a large burden on the user. Moreover, because a language model customized for a specific condition is received via the network, an appropriate language model can be used, and high recognition accuracy obtained, even when the user uses several different matching means.

[0181] According to the present invention, the matching means has a user dictionary in which words referred to during speech recognition are registered. The language model updating means receives, from the language model management server connected via the network, a language model constructed from updated language data and dependent on the user dictionary. It uses this model to update the language model that the matching means refers to during speech recognition. This improves speech recognition accuracy without placing a large burden on the user, even when the user dictionary becomes large.

[0182] According to the present invention, the language model updating means receives a language model from the language model management server connected via the network. This model is constructed from updated language data and depends on the text used by the user performing speech recognition. The updating means uses it to update the language model that the matching means refers to during speech recognition. This improves speech recognition accuracy without placing a large burden on the user.

[0183] According to the present invention, the speech recognition apparatus comprises:
- an acoustic model that gives the probability of a sequence of acoustic observations of speech;
- matching means that receives a speech signal, performs speech recognition with reference to the acoustic model and outputs the recognition result;
- acoustic model ID acquisition means that acquires an ID identifying the acoustic model;
- adaptation speech acquisition means that reads the acquired ID, acquires speech data for adaptation from the input speech signal, and transmits the ID and the adaptation speech data to the acoustic model management server connected via the network;
- acoustic model updating means that receives from the acoustic model management server an adapted acoustic model, adapted with the adaptation speech data corresponding to the ID, and uses it to update the acoustic model that the matching means refers to during speech recognition.

This improves recognition accuracy without placing a large burden on the user. Moreover, because the acoustic model is updated via the network with an adapted acoustic model adapted to the user, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different matching means.
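An acoustic model that "gives the probability of a sequence of acoustic observations" is conventionally a hidden Markov model, but that is an assumption here, since this passage does not name the model type. The probability of the sequence is then computed with the forward algorithm. The sketch uses discrete emissions, which fit the codebook indices of [0163]. It works in plain probabilities for readability, whereas practical recognizers use continuous densities and log or scaled arithmetic to avoid underflow. The function name is hypothetical.

```python
from typing import Sequence


def forward_probability(init: Sequence[float], trans: Sequence[Sequence[float]],
                        emit: Sequence[Sequence[float]], obs: Sequence[int]) -> float:
    """P(o_1..o_T | HMM) by the forward algorithm, with discrete emission symbols."""
    n = len(init)
    # alpha_1(i) = pi_i * b_i(o_1)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)
```

For a two-state left-to-right model with `init=[1, 0]`, `trans=[[0.5, 0.5], [0, 1]]` and `emit=[[0.9, 0.1], [0.2, 0.8]]`, the sequence `[0, 1]` has probability 0.9 × 0.5 × 0.1 + 0.9 × 0.5 × 0.8 = 0.405.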

[0184] According to the present invention, the acoustic model management server comprises:
- acoustic data acquisition means that acquires updated acoustic data;
- acoustic model construction means that, in response to an external acoustic model update command, reads the updated acoustic data and constructs an acoustic model giving the probability of a sequence of acoustic observations of speech;
- acoustic model transmission means that transmits the constructed acoustic model via the network to the speech recognition apparatus that performs speech recognition.

This improves speech recognition accuracy without placing a large burden on the user.

[0185] According to the present invention, the acoustic model management server comprises:
- acoustic data acquisition means that acquires updated acoustic data;
- updated acoustic model ID acquisition means that, in response to an external acoustic model update command, acquires an ID identifying the acoustic model referred to during speech recognition;
- condition-specific acoustic data reading means that reads the updated acoustic data corresponding to the specific condition indicated by the acquired ID;
- condition-specific acoustic model construction means that constructs an acoustic model dependent on that condition by referring to the updated acoustic data read;
- acoustic model transmission means that transmits the constructed acoustic model to the speech recognition apparatus via the network.

This improves recognition accuracy without placing a large burden on the user. Moreover, because an acoustic model customized for a specific condition is transmitted via the network, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition apparatuses.
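The condition-specific reading and construction steps can be pictured as filtering a tagged corpus and then fitting a model to the matching subset. The sketch below makes two assumptions. Each stored frame carries metadata tags such as the noise and channel conditions mentioned in [0160]. The "model" is reduced to a mean vector. The tag names and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[float]
Corpus = List[Tuple[Dict[str, str], Frame]]


def read_condition_data(corpus: Corpus, condition: Dict[str, str]) -> List[Frame]:
    """Return the frames whose metadata matches every key/value in the condition."""
    return [frame for meta, frame in corpus
            if all(meta.get(k) == v for k, v in condition.items())]


def build_condition_model(corpus: Corpus, condition: Dict[str, str]) -> List[float]:
    """Construct a toy condition-dependent model: the mean of the matching frames."""
    frames = read_condition_data(corpus, condition)
    if not frames:
        raise ValueError(f"no acoustic data for condition {condition}")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
```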

[0186] According to the present invention, the acoustic model management server comprises:
- an initial acoustic model, before adaptation, that gives the probability of a sequence of acoustic observations of speech;
- acoustic model adaptation means that receives, from the speech recognition apparatus connected via the network, adaptation speech data and an ID identifying the acoustic model that the apparatus refers to during speech recognition; adapts the initial acoustic model using the adaptation speech data; and stores the adapted acoustic model in adapted acoustic model storage means in association with the received ID;
- adapted acoustic model selecting means that, in response to an external acoustic model update command, receives the ID from the speech recognition apparatus via the network, and selects and reads out the corresponding adapted acoustic model from the adapted acoustic model storage means;
- acoustic model transmission means that transmits the adapted acoustic model read out to the speech recognition apparatus via the network.

This improves recognition accuracy without placing a large burden on the user. Moreover, because the acoustic model is updated via the network with an adapted acoustic model adapted to the user, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition apparatuses.

[0187] According to the present invention, there are provided language data acquiring means for acquiring updated language data; language model constructing means that, upon receiving an external language model update command, reads the updated language data and constructs a language model giving the appearance probability of word strings; and language model transmitting means that transmits the language model constructed by the language model constructing means, via a network, to a speech recognition device that performs speech recognition. This improves speech recognition accuracy without placing a heavy burden on the user.
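
The language model constructing means is described only as building a model of word-string appearance probabilities. A minimal sketch, assuming a word bigram model with add-one (Laplace) smoothing built from whitespace-tokenized sentences; the function names and the smoothing choice are illustrative assumptions:

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def build_bigram_lm(sentences):
    """Return prob(prev, word) = P(word | prev) with add-one smoothing."""
    histories, bigrams = Counter(), Counter()
    vocab = {EOS}                      # every token that can follow a history
    for sent in sentences:
        words = [BOS] + sent.split() + [EOS]
        vocab.update(words[1:])
        histories.update(words[:-1])   # count each token used as a history
        bigrams.update(zip(words[:-1], words[1:]))
    v = len(vocab)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (histories[prev] + v)

    return prob


def sentence_logprob(prob, sentence):
    """Log appearance probability of a whole word string."""
    words = [BOS] + sentence.split() + [EOS]
    return sum(math.log(prob(p, w)) for p, w in zip(words[:-1], words[1:]))
```

A real server would serialize the probabilities into a table before transmission; the closure form here only keeps the sketch short.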

[0188] According to the present invention, there are provided language data acquiring means for acquiring updated language data; updated language model ID acquiring means that, upon receiving an external language model update command, acquires an ID identifying the language model to be referred to during speech recognition; specific language data reading means that reads the updated language data corresponding to the specific condition indicated by the ID; specific language model constructing means that constructs a language model dependent on the specific condition with reference to the read updated language data; and language model transmitting means that transmits the constructed language model to the speech recognition device via a network. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because a language model customized for the specific condition is transmitted over the network, an appropriate language model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition devices.
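
The patent leaves open what the ID looks like and how its "specific condition" is encoded. As a hedged sketch, assume each ID maps to a task domain and every item of updated language data carries a domain tag; the server then estimates a unigram model from the matching items only. `ID_TO_CONDITION`, the tags and the choice of a unigram model are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from model ID to the specific condition it indicates.
ID_TO_CONDITION = {"lm-nav-01": "navigation", "lm-mail-01": "email"}


def read_specific_data(corpus, model_id):
    """Keep the updated (tag, sentence) items whose tag matches the ID's condition."""
    condition = ID_TO_CONDITION[model_id]
    return [text for tag, text in corpus if tag == condition]


def build_unigram_lm(sentences):
    """Maximum-likelihood unigram probabilities from whitespace tokens."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def build_specific_lm(corpus, model_id):
    # Read the condition-specific data, then build a condition-dependent model.
    return build_unigram_lm(read_specific_data(corpus, model_id))
```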

[0189] According to the present invention, there are provided language data acquiring means for acquiring updated language data; user dictionary reading means that, upon receiving an external language model update command, reads the user dictionary that a speech recognition device connected via a network refers to during speech recognition; user-dictionary-dependent language model constructing means that reads the updated language data and constructs a language model dependent on the user dictionary; and language model transmitting means that transmits the constructed language model to the speech recognition device via the network. As a result, speech recognition accuracy can be improved without placing a heavy burden on the user, even when the user dictionary becomes large.
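
One plausible reading of a "language model dependent on the user dictionary", offered here only as an assumption, is a model whose vocabulary is extended with the user's registered words, with probabilities for those words estimated from the server's updated corpus rather than on the device. A minimal unigram sketch with add-k smoothing, so that registered words absent from the corpus still receive a nonzero probability; the function name and `k` are hypothetical:

```python
from collections import Counter


def build_user_dict_lm(corpus_sentences, base_vocab, user_dictionary, k=0.5):
    """Unigram LM over base vocabulary plus user-dictionary words, add-k smoothed."""
    vocab = set(base_vocab) | set(user_dictionary)
    # Count only in-vocabulary tokens from the server's updated language data.
    counts = Counter(w for s in corpus_sentences for w in s.split() if w in vocab)
    total = sum(counts.values()) + k * len(vocab)
    # Add-k gives registered words that never occur in the corpus a small probability.
    return {w: (counts[w] + k) / total for w in vocab}
```

Keeping this estimation on the server is consistent with the stated benefit for large user dictionaries, since the device only receives the finished model.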

[0190] According to the present invention, there are provided language data acquiring means for acquiring updated language data; user text acquiring means that, upon receiving an external language model update command, acquires text used by the user; user-text-dependent language model constructing means that reads the updated language data and constructs a language model dependent on the acquired text; and language model transmitting means that transmits the constructed language model to a speech recognition device via a network. This improves speech recognition accuracy without placing a heavy burden on the user.
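
The patent does not specify how the user's own text is combined with the updated language data. Linear interpolation of a general model with a model estimated from the user's text is one standard way to do it; the sketch below assumes unigram models and a hypothetical interpolation weight `lam`:

```python
from collections import Counter


def unigram(sentences):
    """Maximum-likelihood unigram probabilities from whitespace tokens."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def build_user_text_lm(general_sentences, user_texts, lam=0.3):
    """P(w) = lam * P_user(w) + (1 - lam) * P_general(w)."""
    p_gen, p_user = unigram(general_sentences), unigram(user_texts)
    vocab = set(p_gen) | set(p_user)
    return {w: lam * p_user.get(w, 0.0) + (1 - lam) * p_gen.get(w, 0.0)
            for w in vocab}
```

A larger `lam` biases the model toward the user's vocabulary; a smaller one keeps it closer to the general updated data.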

[0191] According to the present invention, the method comprises a first step of acquiring updated acoustic data; a second step of reading the updated acoustic data upon receiving an acoustic model update command and constructing an acoustic model; a third step of transmitting the constructed acoustic model via a network; and a fourth step of receiving the acoustic model and updating the acoustic model referred to during speech recognition with the received acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user.
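
The four steps can be simulated in a single process, with JSON strings standing in for network messages. This is a minimal sketch, assuming a toy "model" that is just the mean of one-dimensional feature values and a version number that the device uses to ignore stale models; both are illustrative assumptions, not part of the patent.

```python
import json
from statistics import fmean


class AcousticModelServer:
    """Hypothetical server side of steps 1 to 3."""

    def __init__(self):
        self.frames = []     # updated acoustic data (toy 1-D features)
        self.version = 0

    def acquire(self, frames):
        # Step 1: acquire updated acoustic data.
        self.frames.extend(frames)

    def on_update_command(self):
        # Step 2: rebuild the model from all acquired data.
        self.version += 1
        model = {"version": self.version, "mean": fmean(self.frames)}
        # Step 3: serialize it for transmission over the network.
        return json.dumps(model)


class RecognizerDevice:
    """Hypothetical device side of step 4."""

    def __init__(self, model):
        self.model = model   # the model the matching step refers to

    def receive(self, payload):
        # Step 4: replace the model in use, ignoring stale versions (assumption).
        new_model = json.loads(payload)
        if new_model["version"] > self.model["version"]:
            self.model = new_model
```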

[0192] According to the present invention, the method comprises a first step of acquiring updated language data; a second step of reading the updated language data upon receiving a language model update command and constructing a language model; a third step of transmitting the constructed language model via a network; and a fourth step of receiving the language model and updating the language model referred to during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user.

[0193] According to the present invention, the method comprises a first step of acquiring updated acoustic data; a second step of acquiring, upon receiving an acoustic model update command, an ID identifying the acoustic model to be referred to during speech recognition; a third step of reading the updated acoustic data corresponding to the specific condition indicated by the ID; a fourth step of constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data; a fifth step of transmitting the constructed acoustic model via a network; and a sixth step of receiving the acoustic model and updating the acoustic model referred to during speech recognition with the received acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because an acoustic model customized for the specific condition is received over the network, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition methods.

[0194] According to the present invention, the method comprises a first step of acquiring updated language data; a second step of acquiring, upon receiving a language model update command, an ID identifying the language model to be referred to during speech recognition; a third step of reading the updated language data corresponding to the specific condition indicated by the acquired ID; a fourth step of constructing a language model dependent on the specific condition with reference to the updated language data; a fifth step of transmitting the constructed language model via a network; and a sixth step of receiving the language model and updating the language model referred to during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because a language model customized for the specific condition is received over the network, an appropriate language model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition methods.

[0195] According to the present invention, the method comprises a first step of acquiring updated language data; a second step of reading, upon receiving a language model update command, the user dictionary referred to during speech recognition; a third step of reading the updated language data and constructing a language model dependent on the user dictionary; a fourth step of transmitting the constructed language model via a network; and a fifth step of receiving the language model and updating the language model referred to during speech recognition with the received language model. As a result, speech recognition accuracy can be improved without placing a heavy burden on the user, even when the user dictionary becomes large.

[0196] According to the present invention, the method comprises a first step of acquiring updated language data; a second step of acquiring, upon receiving a language model update command, text used by the user; a third step of reading the updated language data and constructing a language model dependent on the text; a fourth step of transmitting the constructed language model via a network; and a fifth step of receiving the language model and updating the language model referred to during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user.

[0197] According to the present invention, the method comprises a first step of acquiring an ID identifying an acoustic model; a second step of reading the acquired ID, acquiring speech data for adaptation from the input speech signal, and transmitting the read ID and the acquired speech data for adaptation via a network; a third step of adapting the initial (pre-adaptation) acoustic model using the transmitted speech data for adaptation and storing the adapted acoustic model in association with the transmitted ID; a fourth step of receiving, upon an acoustic model update command, the ID acquired in the first step via the network, and selecting and reading the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored in the third step; a fifth step of transmitting the adapted acoustic model read in the fourth step via the network; and a sixth step of receiving the transmitted adapted acoustic model and updating the acoustic model referred to during speech recognition with the received adapted acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because the acoustic model is updated over the network with an acoustic model adapted to the user, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different speech recognition methods.

[0198] According to the present invention, the speech recognition program recorded on the recording medium realizes an acoustic data acquisition function that acquires updated acoustic data; an acoustic model construction function that, upon receiving an acoustic model update command, reads the updated acoustic data and constructs an acoustic model; an acoustic model transmission function that transmits the constructed acoustic model via a network; and an acoustic model update function that receives the acoustic model and updates the acoustic model referred to by the matching function during speech recognition with the received acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user.

[0199] According to the present invention, the speech recognition program recorded on the recording medium realizes a language data acquisition function that acquires updated language data; a language model construction function that, upon receiving a language model update command, reads the updated language data acquired by the language data acquisition function and constructs a language model; a language model transmission function that transmits the language model constructed by the language model construction function via a network; and a language model update function that receives the language model transmitted by the language model transmission function and updates the language model referred to by the matching function during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user.

[0200] According to the present invention, the speech recognition program recorded on the recording medium realizes an acoustic data acquisition function that acquires updated acoustic data; an updated acoustic model ID acquisition function that, upon receiving an acoustic model update command, acquires an ID identifying the acoustic model; a specific acoustic data reading function that reads the updated acoustic data corresponding to the specific condition indicated by the acquired ID; a specific acoustic model construction function that constructs an acoustic model dependent on the specific condition with reference to the updated acoustic data; an acoustic model transmission function that transmits the acoustic model via a network; and an acoustic model update function that receives the acoustic model and updates the acoustic model referred to by the matching function during speech recognition with the received acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because an acoustic model customized for the specific condition is received over the network, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several matching functions.

[0201] According to the present invention, the speech recognition program recorded on the recording medium realizes a language data acquisition function that acquires updated language data; an updated language model ID acquisition function that, upon receiving a language model update command, acquires an ID identifying the language model; a specific language data reading function that reads the updated language data corresponding to the specific condition indicated by the acquired ID; a specific language model construction function that constructs a language model dependent on the specific condition with reference to the updated language data; a language model transmission function that transmits the constructed language model via a network; and a language model update function that receives the language model and updates the language model referred to by the matching function during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because a language model customized for the specific condition is received over the network, an appropriate language model can be used, and high recognition accuracy obtained, even when the user uses several matching functions.

[0202] According to the present invention, the speech recognition program recorded on the recording medium realizes a language data acquisition function that acquires updated language data; a user dictionary reading function that, upon receiving a language model update command, reads the user dictionary; a user-dictionary-dependent language model construction function that reads the updated language data and constructs a language model dependent on the user dictionary; a language model transmission function that transmits the constructed language model via a network; and a language model update function that receives the language model and updates the language model referred to by the matching function during speech recognition with the received language model. As a result, speech recognition accuracy can be improved without placing a heavy burden on the user, even when the user dictionary becomes large.

[0203] According to the present invention, the speech recognition program recorded on the recording medium realizes a language data acquisition function that acquires updated language data; a user text acquisition function that, upon receiving a language model update command, acquires text used by the user; a user-text-dependent language model construction function that reads the updated language data and constructs a language model dependent on the text; a language model transmission function that transmits the constructed language model via a network; and a language model update function that receives the language model and updates the language model referred to by the matching function during speech recognition with the received language model. This improves speech recognition accuracy without placing a heavy burden on the user.

[0204] According to the present invention, the speech recognition program recorded on the recording medium realizes an acoustic model ID acquisition function that acquires an ID identifying an acoustic model; an adaptation speech acquisition function that reads the acquired ID, acquires speech data for adaptation from the input speech signal, and transmits the read ID and the acquired speech data for adaptation via a network; an acoustic model adaptation function that adapts the initial (pre-adaptation) acoustic model using the transmitted speech data for adaptation and stores the adapted acoustic model in association with the transmitted ID; an adapted acoustic model selection function that, upon receiving an acoustic model update command, receives via the network the ID acquired by the acoustic model ID acquisition function, and selects and reads the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored by the acoustic model adaptation function; an acoustic model transmission function that transmits the read adapted acoustic model via the network; and an acoustic model update function that receives the adapted acoustic model and updates the acoustic model referred to by the matching function during speech recognition with the received adapted acoustic model. This improves speech recognition accuracy without placing a heavy burden on the user. In addition, because the acoustic model is updated over the network with an acoustic model adapted to the user, an appropriate acoustic model can be used, and high recognition accuracy obtained, even when the user uses several different matching functions.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the procedure of the acoustic model update process according to Embodiment 1 of the present invention.

FIG. 3 is a flowchart showing the procedure of the language model update process according to Embodiment 1 of the present invention.

FIG. 4 is a block diagram showing the configuration of a speech recognition device according to Embodiment 2 of the present invention.

FIG. 5 is a flowchart showing the procedure of the language model update process according to Embodiment 2 of the present invention.

FIG. 6 is a block diagram showing the configuration of a speech recognition device according to Embodiment 3 of the present invention.

FIG. 7 is a flowchart showing the procedure of the language model update process according to Embodiment 3 of the present invention.

FIG. 8 is a block diagram showing the configuration of a speech recognition device according to Embodiment 4 of the present invention.

FIG. 9 is a flowchart showing the procedure of the language model update process according to Embodiment 4 of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech recognition device according to Embodiment 5 of the present invention.

FIG. 11 is a flowchart showing the procedure of the acoustic model update process according to Embodiment 5 of the present invention.

FIG. 12 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 13 is a flowchart showing the procedure of a conventional speech recognition process.

FIG. 14 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 15 is a block diagram showing the configuration of a conventional speech recognition device.

FIG. 16 is a diagram showing a configuration example of the user dictionary 601.

FIG. 17 is a flowchart showing the procedure of a conventional speech recognition process.

[Explanation of symbols]

10 speech recognition device, 20 acoustic model management server, 30 language model management server, 100 speech signal, 101 matching means, 102 acoustic model, 103 language model, 104 recognition result, 105 acoustic model update command, 106 acoustic data acquiring means, 107 updated acoustic data, 108 acoustic model constructing means, 109 acoustic model storage means, 110 acoustic model transmitting means, 111 acoustic model updating means, 112 language model update command, 113 language data acquiring means, 114 updated language data, 115 language model constructing means, 116 language model storage means, 117 language model transmitting means, 118 language model updating means, 401 updated language model ID acquiring means, 402 specific language data reading means, 403 specific language model constructing means, 601 user dictionary, 602 user dictionary reading means, 603 user-dictionary-dependent language model constructing means, 801 user text acquiring means, 802 user text storage means, 803 user-text-dependent language model constructing means, 1001 acoustic model ID acquiring means, 1002 adaptation speech acquiring means, 1003 initial acoustic model, 1004 acoustic model adapting means, 1005 adapted acoustic model storage means, 1006 adapted acoustic model selecting means.

Claims

[Claims]

1. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model giving the probability of an acoustic observation sequence of the speech, and outputs a recognition result; and an acoustic model management server, connected to the speech recognition device via a network, that acquires updated acoustic data and constructs the acoustic model; wherein the acoustic model constructed by the acoustic model management server is transmitted to the speech recognition device, and the speech recognition device updates the acoustic model it refers to during speech recognition with the acoustic model transmitted by the acoustic model management server.

2. The speech recognition system according to claim 1, wherein the acoustic model management server acquires an ID identifying the acoustic model that the speech recognition device refers to during speech recognition, reads the updated acoustic data corresponding to a specific condition indicated by the acquired ID, constructs an acoustic model dependent on the specific condition, and transmits it to the speech recognition device.

3. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to a language model giving the appearance probability of word strings, and outputs a recognition result; and a language model management server, connected to the speech recognition device via a network, that acquires updated language data and constructs the language model; wherein the language model constructed by the language model management server is transmitted to the speech recognition device, and the speech recognition device updates the language model it refers to during speech recognition with the language model transmitted by the language model management server.

4. The speech recognition system according to claim 3, wherein the language model management server acquires an ID identifying the language model that the speech recognition device refers to during speech recognition, reads the updated language data corresponding to a specific condition indicated by the acquired ID, constructs a language model dependent on the specific condition, and transmits it to the speech recognition device.

5. The speech recognition system according to claim 3, wherein the speech recognition device refers during speech recognition to a user dictionary in which words are registered, and the language model management server reads the user dictionary via the network, constructs a language model dependent on the user dictionary with reference to the updated language data and the read user dictionary, and transmits it to the speech recognition device.

6. The speech recognition system according to claim 3, wherein the language model management server acquires text used by a user of the speech recognition device, constructs a language model dependent on the text with reference to the updated language data and the acquired text, and transmits it to the speech recognition device.

7. A speech recognition system comprising: a speech recognition device that receives a speech signal, performs speech recognition with reference to an acoustic model giving the probability of an acoustic observation sequence of the speech, and outputs a recognition result; and an acoustic model management server, connected to the speech recognition device via a network, that holds an initial acoustic model before adaptation; wherein the speech recognition device acquires an ID identifying the acoustic model, acquires speech data for adaptation from the input speech signal, and transmits the acquired ID and speech data for adaptation to the acoustic model management server via the network; the acoustic model management server adapts the initial acoustic model using the transmitted speech data for adaptation, stores the adapted acoustic model in association with the transmitted ID, and, upon receiving an external acoustic model update command, receives the ID identifying the acoustic model from the speech recognition device via the network, selects the adapted acoustic model corresponding to the received ID from the stored adapted acoustic models, and transmits it to the speech recognition device via the network; and the speech recognition device updates the acoustic model it refers to during speech recognition with the transmitted adapted acoustic model.

8. A speech recognition device comprising: an acoustic model for obtaining the probability of an acoustic observation sequence of speech; a matching means that inputs a speech signal, performs speech recognition with reference to the acoustic model, and outputs a recognition result; and an acoustic model updating means that receives, from an acoustic model management server connected via a network, an acoustic model constructed based on updated acoustic data, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

9. The speech recognition device according to claim 8, wherein the acoustic model updating means receives, from the acoustic model management server connected via the network, an acoustic model that is constructed from updated acoustic data and depends on a specific condition of the acoustic model referred to by the matching means during speech recognition, and updates the acoustic model referred to by the matching means during speech recognition with the received acoustic model.

10. A speech recognition device comprising: a language model for obtaining the occurrence probability of a word string; a matching means that inputs a speech signal, performs speech recognition with reference to the language model, and outputs a recognition result; and a language model updating means that receives, from a language model management server connected via a network, a language model constructed based on updated language data, and updates the language model referred to by the matching means during speech recognition with the received language model.

11. The speech recognition device according to claim 10, wherein the language model updating means receives, from the language model management server connected via the network, a language model that is constructed from updated language data and depends on a specific condition of the language model referred to by the matching means during speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

12. The speech recognition device according to claim 10, wherein the matching means refers during speech recognition to a user dictionary in which words are registered, and the language model updating means receives, from the language model management server connected via the network, a language model that is constructed from updated language data and depends on the user dictionary referred to by the matching means, and updates the language model referred to by the matching means during speech recognition with the received language model.

13. The speech recognition device according to claim 10, wherein the language model updating means receives, from the language model management server connected via the network, a language model that is constructed from updated language data and depends on text used by the user performing speech recognition, and updates the language model referred to by the matching means during speech recognition with the received language model.

14. A speech recognition device comprising: an acoustic model for obtaining the probability of an acoustic observation sequence of speech; a matching means that inputs a speech signal, performs speech recognition with reference to the acoustic model, and outputs a recognition result; an acoustic model ID acquisition means that acquires an ID specifying the acoustic model; an adaptation speech acquisition means that reads the ID acquired by the acoustic model ID acquisition means, acquires adaptation speech data from the input speech signal, and transmits the read ID and the acquired adaptation speech data to an acoustic model management server connected via a network; and an acoustic model updating means that receives, from the acoustic model management server, an adapted acoustic model that corresponds to the ID and was adapted using the adaptation speech data, and updates the acoustic model referred to by the matching means during speech recognition with the received adapted acoustic model.

15. An acoustic model management server comprising: an acoustic data acquisition means that acquires updated acoustic data; an acoustic model construction means that, upon receiving an acoustic model update command from outside, reads the updated acoustic data acquired by the acoustic data acquisition means and constructs an acoustic model for obtaining the probability of an acoustic observation sequence of speech; and an acoustic model transmission means that transmits the acoustic model constructed by the acoustic model construction means via a network to a speech recognition device that performs speech recognition.

16. An acoustic model management server comprising: an acoustic data acquisition means that acquires updated acoustic data; an updated acoustic model ID acquisition means that, upon receiving an acoustic model update command from outside, acquires an ID specifying the acoustic model that a speech recognition device connected via a network refers to during speech recognition; a specific acoustic data reading means that reads the updated acoustic data, acquired by the acoustic data acquisition means, corresponding to a specific condition indicated by the ID acquired by the updated acoustic model ID acquisition means; a specific acoustic model construction means that constructs an acoustic model dependent on the specific condition with reference to the updated acoustic data read by the specific acoustic data reading means; and an acoustic model transmission means that transmits the acoustic model constructed by the specific acoustic model construction means to the speech recognition device via the network.

17. An acoustic model management server comprising: an initial acoustic model before adaptation, for obtaining the probability of an acoustic observation sequence of speech; an acoustic model adaptation means that receives adaptation speech data transmitted from a speech recognition device connected via a network, together with an ID specifying the acoustic model that the speech recognition device refers to during speech recognition, adapts the initial acoustic model using the adaptation speech data, and stores the adapted acoustic model, in association with the received ID, in an adapted acoustic model storage means; an adapted acoustic model selection means that, upon receiving an acoustic model update command from outside, selects and reads the adapted acoustic model corresponding to the ID from the adapted acoustic model storage means; and an acoustic model transmission means that transmits the adapted acoustic model read by the adapted acoustic model selection means to the speech recognition device via the network.

18. A language model management server comprising: a language data acquisition means that acquires updated language data; a language model construction means that, upon receiving a language model update command from outside, reads the updated language data acquired by the language data acquisition means and constructs a language model for obtaining the occurrence probability of a word string; and a language model transmission means that transmits the language model constructed by the language model construction means via a network to a speech recognition device that performs speech recognition.

19. A language model management server comprising: a language data acquisition means that acquires updated language data; an updated language model ID acquisition means that, upon receiving a language model update command from outside, acquires an ID specifying the language model that a speech recognition device connected via a network refers to during speech recognition; a specific language data reading means that reads the updated language data, acquired by the language data acquisition means, corresponding to a specific condition indicated by the ID acquired by the updated language model ID acquisition means; a specific language model construction means that constructs a language model dependent on the specific condition with reference to the updated language data read by the specific language data reading means; and a language model transmission means that transmits the language model constructed by the specific language model construction means to the speech recognition device via the network.

20. A language model management server comprising: a language data acquisition means that acquires updated language data; a user dictionary reading means that, upon receiving a language model update command from outside, reads the user dictionary that a speech recognition device connected via a network refers to during speech recognition; a user-dictionary-dependent language model construction means that reads the updated language data acquired by the language data acquisition means and constructs a language model dependent on the user dictionary read by the user dictionary reading means; and a language model transmission means that transmits the language model constructed by the user-dictionary-dependent language model construction means to the speech recognition device via the network.

21. A language model management server comprising: a language data acquisition means that acquires updated language data; a user text acquisition means that, upon receiving a language model update command from outside, acquires text used by a user of a speech recognition device connected via a network; a user-text-dependent language model construction means that reads the updated language data acquired by the language data acquisition means and constructs a language model dependent on the text acquired by the user text acquisition means; and a language model transmission means that transmits the language model constructed by the user-text-dependent language model construction means to the speech recognition device via the network.

22. A speech recognition method for inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the method comprising: a first step of acquiring updated acoustic data; a second step of, upon receiving an acoustic model update command, reading the updated acoustic data acquired in the first step and constructing an acoustic model; a third step of transmitting the acoustic model constructed in the second step via a network; and a fourth step of receiving the acoustic model transmitted in the third step and updating the acoustic model referred to during speech recognition with the received acoustic model.

23. A speech recognition method for inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the method comprising: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, reading the updated language data acquired in the first step and constructing a language model; a third step of transmitting the language model constructed in the second step via a network; and a fourth step of receiving the language model transmitted in the third step and updating the language model referred to during speech recognition with the received language model.

24. A speech recognition method for inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the method comprising: a first step of acquiring updated acoustic data; a second step of, upon receiving an acoustic model update command, acquiring an ID specifying the acoustic model referred to during speech recognition; a third step of reading the updated acoustic data, acquired in the first step, corresponding to a specific condition indicated by the ID acquired in the second step; a fourth step of constructing an acoustic model dependent on the specific condition with reference to the updated acoustic data read in the third step; a fifth step of transmitting the acoustic model constructed in the fourth step via a network; and a sixth step of receiving the acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received acoustic model.

25. A speech recognition method for inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the method comprising: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, acquiring an ID identifying the language model referred to during speech recognition; a third step of reading the updated language data, acquired in the first step, corresponding to a specific condition indicated by the ID acquired in the second step; a fourth step of constructing a language model dependent on the specific condition with reference to the updated language data read in the third step; a fifth step of transmitting the language model constructed in the fourth step via a network; and a sixth step of receiving the language model transmitted in the fifth step and updating the language model referred to during speech recognition with the received language model.

26. A speech recognition method for inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string and a user dictionary in which words are registered, and outputting a recognition result, the method comprising: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, reading the user dictionary referred to during speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the user dictionary read in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.

27. A speech recognition method for inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the method comprising: a first step of acquiring updated language data; a second step of, upon receiving a language model update command, acquiring text used by a user performing speech recognition; a third step of reading the updated language data acquired in the first step and constructing a language model dependent on the text acquired in the second step; a fourth step of transmitting the language model constructed in the third step via a network; and a fifth step of receiving the language model transmitted in the fourth step and updating the language model referred to during speech recognition with the received language model.

28. A speech recognition method for inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the method comprising: a first step of acquiring an ID specifying the acoustic model; a second step of reading the ID acquired in the first step, acquiring adaptation speech data from the input speech signal, and transmitting the read ID and the acquired adaptation speech data via a network; a third step of adapting an initial acoustic model before adaptation using the adaptation speech data transmitted in the second step, and storing the adapted acoustic model in association with the ID transmitted in the second step; a fourth step of, upon receiving an acoustic model update command, receiving via the network the ID acquired in the first step and selecting and reading the adapted acoustic model corresponding to that ID from the adapted acoustic models stored in the third step; a fifth step of transmitting the adapted acoustic model read in the fourth step via the network; and a sixth step of receiving the adapted acoustic model transmitted in the fifth step and updating the acoustic model referred to during speech recognition with the received adapted acoustic model.

29. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the program further implementing: an acoustic data acquisition function that acquires updated acoustic data; an acoustic model construction function that, upon receiving an acoustic model update command, reads the updated acoustic data acquired by the acoustic data acquisition function and constructs an acoustic model; an acoustic model transmission function that transmits the acoustic model constructed by the acoustic model construction function via a network; and an acoustic model update function that receives the acoustic model transmitted by the acoustic model transmission function and updates the acoustic model referred to by the matching function during speech recognition with the received acoustic model.

30. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the program further implementing: a language data acquisition function that acquires updated language data; a language model construction function that, upon receiving a language model update command, reads the updated language data acquired by the language data acquisition function and constructs a language model; a language model transmission function that transmits the language model constructed by the language model construction function via a network; and a language model update function that receives the language model transmitted by the language model transmission function and updates the language model referred to by the matching function during speech recognition with the received language model.

31. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the program further implementing: an acoustic data acquisition function that acquires updated acoustic data; an updated acoustic model ID acquisition function that, upon receiving an acoustic model update command, acquires an ID specifying the acoustic model; a specific acoustic data reading function that reads the updated acoustic data, acquired by the acoustic data acquisition function, corresponding to a specific condition indicated by the ID acquired by the updated acoustic model ID acquisition function; a specific acoustic model construction function that constructs an acoustic model dependent on the specific condition with reference to the updated acoustic data read by the specific acoustic data reading function; an acoustic model transmission function that transmits the acoustic model constructed by the specific acoustic model construction function via a network; and an acoustic model update function that receives the acoustic model transmitted by the acoustic model transmission function and updates the acoustic model referred to by the matching function during speech recognition with the received acoustic model.

32. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the program further implementing: a language data acquisition function that acquires updated language data; an updated language model ID acquisition function that, upon receiving a language model update command, acquires an ID specifying the language model; a specific language data reading function that reads the updated language data, acquired by the language data acquisition function, corresponding to a specific condition indicated by the ID acquired by the updated language model ID acquisition function; a specific language model construction function that constructs a language model dependent on the specific condition with reference to the updated language data read by the specific language data reading function; a language model transmission function that transmits the language model constructed by the specific language model construction function via a network; and a language model update function that receives the language model transmitted by the language model transmission function and updates the language model referred to by the matching function during speech recognition with the received language model.

33. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string and a user dictionary in which words are registered, and outputting a recognition result, the program further implementing: a language data acquisition function that acquires updated language data; a user dictionary reading function that, upon receiving a language model update command, reads the user dictionary; a user-dictionary-dependent language model construction function that reads the updated language data acquired by the language data acquisition function and constructs a language model dependent on the user dictionary read by the user dictionary reading function; a language model transmission function that transmits the language model constructed by the user-dictionary-dependent language model construction function via a network; and a language model update function that receives the language model transmitted by the language model transmission function and updates the language model referred to by the matching function during speech recognition with the received language model.

34. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to a language model for obtaining the occurrence probability of a word string, and outputting a recognition result, the program further implementing: a language data acquisition function that acquires updated language data; a user text acquisition function that, upon receiving a language model update command, acquires text used by a user performing speech recognition; a user-text-dependent language model construction function that reads the updated language data acquired by the language data acquisition function and constructs a language model dependent on the text acquired by the user text acquisition function; a language model transmission function that transmits the language model constructed by the user-text-dependent language model construction function via a network; and a language model update function that receives the language model transmitted by the language model transmission function and updates the language model referred to by the matching function during speech recognition with the received language model.

35. A computer-readable recording medium recording a speech recognition program that implements a matching function of inputting a speech signal, performing speech recognition with reference to an acoustic model for obtaining the probability of an acoustic observation sequence of the speech, and outputting a recognition result, the program further implementing: an acoustic model ID acquisition function that acquires an ID specifying the acoustic model; an adaptation speech acquisition function that reads the ID acquired by the acoustic model ID acquisition function, acquires adaptation speech data from the input speech signal, and transmits the read ID and the acquired adaptation speech data via a network; an acoustic model adaptation function that adapts an initial acoustic model before adaptation using the adaptation speech data transmitted by the adaptation speech acquisition function, and stores the adapted acoustic model in association with the ID transmitted by the adaptation speech acquisition function; an adapted acoustic model selection function that, upon receiving an acoustic model update command, receives via the network the ID acquired by the acoustic model ID acquisition function and selects and reads the adapted acoustic model corresponding to the received ID from the adapted acoustic models stored by the acoustic model adaptation function; an acoustic model transmission function that transmits the adapted acoustic model read by the adapted acoustic model selection function via the network; and an acoustic model update function that receives the adapted acoustic model transmitted by the acoustic model transmission function and updates the acoustic model referred to by the matching function during speech recognition with the received adapted acoustic model.
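The claims above describe one update flow from several angles: a management server acquires updated data, builds a model (optionally selected by an ID-encoded condition or made dependent on a user dictionary) or adapts an initial model and stores it under the device's ID, and the speech recognition device then replaces the model its matching step refers to. The Python sketch below illustrates that flow under stated assumptions only. The "language model" is a toy unigram table, the "acoustic model" is a per-phone mean of a one-dimensional feature, the ID format `<user>/<condition>` and all class and method names are hypothetical, network transport is replaced by direct method calls, and a MAP-style mean update is swapped in as one possible adaptation method. The claims do not specify how models are constructed or adapted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def build_unigram(texts, user_words=()):
    """Toy language model: word -> probability estimated from texts.

    User-dictionary words get at least one count so they receive nonzero
    probability (an illustrative assumption, not the patent's method).
    """
    counts = Counter(w for t in texts for w in t.split())
    for w in user_words:
        counts[w] = max(counts[w], 1)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


@dataclass
class LanguageModelServer:
    # Updated language data, grouped by a condition such as a task domain.
    corpora: dict = field(default_factory=dict)

    def acquire(self, condition, text):
        self.corpora.setdefault(condition, []).append(text)

    def build(self, model_id, user_words=()):
        # Hypothetical ID format "<user>/<condition>": the condition selects
        # which updated language data is read.
        condition = model_id.split("/", 1)[1]
        return build_unigram(self.corpora.get(condition, []), user_words)


@dataclass
class AcousticModelServer:
    # Initial (pre-adaptation) model: per-phone mean of a 1-D feature (toy).
    initial_means: dict
    adapted: dict = field(default_factory=dict)  # model ID -> adapted model

    def adapt(self, model_id, samples, tau=10.0):
        # MAP-style mean update per phone: (tau * mu0 + sum(x)) / (tau + n).
        self.adapted[model_id] = {
            phone: (tau * mu0 + sum(samples.get(phone, [])))
            / (tau + len(samples.get(phone, [])))
            for phone, mu0 in self.initial_means.items()
        }

    def fetch(self, model_id):
        # On an update command, select the adapted model stored under this ID.
        return self.adapted[model_id]


@dataclass
class SpeechRecognizer:
    model_id: str
    user_words: tuple = ()
    language_model: dict = field(default_factory=dict)
    acoustic_model: dict = field(default_factory=dict)

    def update_language_model(self, server):
        self.language_model = server.build(self.model_id, self.user_words)

    def update_acoustic_model(self, server):
        self.acoustic_model = server.fetch(self.model_id)

    def lm_log_prob(self, words):
        # Unigram log probability of a word string; unseen words give -inf.
        lm = self.language_model
        return sum(math.log(lm[w]) if w in lm else -math.inf for w in words)


# Usage: counts are rain 2, tomorrow 1, today 1, typhoon 1 (total 5).
lm_server = LanguageModelServer()
lm_server.acquire("weather", "rain tomorrow rain today")
rec = SpeechRecognizer("user1/weather", user_words=("typhoon",))
rec.update_language_model(lm_server)
print(rec.language_model["rain"])  # 0.4

am_server = AcousticModelServer({"a": 0.0, "i": 1.0})
am_server.adapt("user1/weather", {"a": [2.0, 2.0]}, tau=2.0)
rec.update_acoustic_model(am_server)
print(rec.acoustic_model)  # {'a': 1.0, 'i': 1.0}
```

Keeping adapted models keyed by ID on the server, as in claims 7 and 17, means the device only has to supply its ID when an update is requested; in this sketch `fetch` raises `KeyError` if no model has yet been adapted for that ID.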