JP2001312294A

JP2001312294A - Learning method of transducer transducing input symbol series into output symbol series, and computer-readable recording medium with stored learning program of transducer

Info

Publication number: JP2001312294A
Application number: JP2000133943A
Authority: JP
Inventors: Hajime Tsukada; 元塚田
Original assignee: ATR ONSEI GENGO TSUSHIN KENKYU; ATR Spoken Language Translation Research Laboratories
Current assignee: ATR ONSEI GENGO TSUSHIN KENKYU; ATR Spoken Language Translation Research Laboratories
Priority date: 2000-05-02
Filing date: 2000-05-02
Publication date: 2001-11-09

Abstract

PROBLEM TO BE SOLVED: To provide a learning method for a transducer that converts an input symbol string into an output symbol string, yielding a transducer that takes into account the context of the output symbols as well as the context of the input symbols, and to provide a computer-readable recording medium storing a learning program for the transducer. SOLUTION: In this learning method for a transducer that converts an input symbol string into an output symbol string, pairs of input and output symbols that have been aligned in advance are used as learning data, and the sequence of aligned pairs is modeled as an n-gram.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a learning method for a transducer that converts an input symbol string into an output symbol string, and to a computer-readable recording medium storing a transducer learning program.

[0002]

Description of the Related Art: The probabilistic models needed for speech recognition have often been built on separate, independent formulations. For example, the acoustic model is typically modeled as an HMM and the language model as an n-gram. Recent advances in automata theory, however, have made it clear that these probabilistic models can be modeled uniformly as weighted finite-state transducers (hereinafter WFSTs) (see Reference 1).

[0003] Reference 1: Fernando Pereira and Michael Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pp. 431-453, MIT Press, Cambridge, Massachusetts, 1997.

[0004] A WFST generalizes likelihoods in the form of weights, and a probabilistic finite-state transducer (hereinafter PFST) can be regarded as a special case of it. It has also been demonstrated for large-vocabulary continuous speech recognition that expanding several WFSTs into a single WFST in advance by the composition operation enables efficient and accurate search (see Reference 2).

[0005] Reference 2: Mehryar Mohri, Michael Riley, Don Hindle, Andrej Ljolje and Fernando Pereira, "Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition", in Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), 1998.

[0006] Against this background, automatically learning the probabilistic models needed for speech recognition as WFSTs has recently become an important research topic.

[0007] Methods for automatically learning finite-state transducers have already been studied for problems such as morphological analysis and digit-string conversion (see References 3 and 4).

Reference 3: Emmanuel Roche and Yves Schabes, "Deterministic Part-of-Speech Tagging with Finite-State Transducers", Computational Linguistics, Vol. 21, No. 2, pp. 227-253, 1995.

[0008] Reference 4: Jose Oncina, Pedro Garcia and Enrique Vidal, "Learning Subsequential Transducers for Pattern Recognition Interpretation Tasks", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 5, 1993.

[0009] However, because these methods do not learn probabilistic models, they are not well suited to speech recognition. Another conceivable approach, widely used in speech recognition, is to build a context-dependent HMM for each input (or output) symbol and use it to model the symbol-string conversion. However, because HMM output symbols are assumed to be independent, the capability of such a conversion model is limited. In the field of statistical machine translation, a method has also been proposed for automatically learning conversion between languages that are aligned by synchronized dependency trees (see Reference 5).

[0010] Reference 5: Hiyan Alshawi, Bangalore Srinivas and Shona Douglas, "Learning Dependency Translation Models as Collections of Finite State Head Transducers", Computational Linguistics, Vol. 26, 2000.

[0011] However, the probabilistic head transducers learned there cannot directly use the operations defined on WFSTs that are useful for speech recognition.

[0012]

Problem to Be Solved by the Invention: This invention proposes a method for automatically learning the conversion between a context-dependent input symbol string s_in and a context-dependent output symbol string s_out as a WFST that outputs P(s_in, s_out) or P(s_in | s_out). The method models input/output symbol pairs that have been aligned in advance as a variable-length n-gram (References 6, 7 and 8). Making the order of the n-gram variable allows the number of parameters to be optimized.

[0013] Reference 6: Masahiko Haruno and Yuji Matsumoto, "Morphological Analysis Using Context Tree", IPSJ SIG Technical Report, 96-NL-112, pp. 31-36, 1996.

Reference 7: Hinrich Schutze and Yoram Singer, "Part-of-Speech Tagging Using a Variable Memory Markov Model", 32nd Annual Meeting of ACL, 1994.

Reference 8: Marcelo J. Weinberger, Jorma J. Rissanen and Meir Feder, "A Universal Finite Memory Source", IEEE Trans. Information Theory, Vol. 41, No. 3, 1995.

[0014] An object of the present invention is to provide a learning method for a transducer that converts an input symbol string into an output symbol string, and a computer-readable recording medium storing a transducer learning program, with which a transducer is obtained that takes into account the context of the output symbols as well as the context of the input symbols.

[0015]

Means for Solving the Problem: The present invention is a learning method for a transducer that converts an input symbol string into an output symbol string, characterized in that pairs of input and output symbols aligned in advance are used as learning data and the aligned input/output symbol pairs are modeled as an n-gram.

[0016] The aligned input/output symbol pairs are preferably modeled as a variable-length n-gram using a context tree.

[0017] The present invention is also a computer-readable recording medium on which is recorded a learning program for a transducer that converts an input symbol string into an output symbol string, characterized in that the recorded learning program causes a computer to execute processing that uses pairs of input and output symbols aligned in advance as learning data and models the sequence of aligned input/output symbol pairs as an n-gram.

[0018]

Embodiments of the Invention: Embodiments of the present invention are described below with reference to the drawings.

[0019] [1] Formulation of the recognition problem based on WFSTs

Here the recognition problem is formulated following Reference 1, in order to clarify why it is worthwhile to automatically learn a WFST that outputs a conditional probability.

[0020] First, the concept of a semiring is introduced to represent the weights (likelihoods) of a WFST. With semirings, the likelihoods used in speech recognition and the operations on them can be expressed clearly.

[0021] A semiring is a tuple <K, +, ·, 0_K, 1_K> that satisfies (i) to (iv):
(i) The sum is associative and commutative and has identity element 0_K.
(ii) The product is associative and has identity element 1_K.
(iii) The product distributes over the sum.
(iv) 0_K is the zero element (annihilator) of the product.

[0022] For example, probability values can be regarded as the semiring <{x ∈ R | 0 ≤ x ≤ 1}, +, ·, 0, 1>. Under the Viterbi approximation, the maximum max is used in place of + during processing; in that case too, probability values can be expressed as the semiring <{x ∈ R | 0 ≤ x ≤ 1}, max, ·, 0, 1>. In many cases the log of the probability value is used so that products become sums, and this too can be expressed as a semiring, <{x ∈ R | x ≤ 0} ∪ {−∞}, max, +, −∞, 0>.
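As an illustration only, the three semirings described above can be sketched in Python. The `Semiring` class, the constants `PROB`, `VITERBI` and `LOG_VITERBI`, and the helper `path_weight` are hypothetical names introduced for this example; they are not part of the original disclosure.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a semiring <K, +, ·, 0_K, 1_K>."""
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


# Probability semiring: <[0, 1], +, ·, 0, 1>
PROB = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
# Viterbi approximation: max replaces +
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
# Log domain: the product becomes a sum, zero is -inf, one is 0
LOG_VITERBI = Semiring(max, lambda a, b: a + b, -math.inf, 0.0)


def path_weight(sr: Semiring, weights) -> float:
    """Combine the weights along one path with the semiring product."""
    total = sr.one
    for w in weights:
        total = sr.times(total, w)
    return total
```

Writing an algorithm against `plus`, `times`, `zero` and `one` rather than against concrete arithmetic is what lets the same search code run with probabilities, Viterbi scores, or log scores.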

[0023] A WFST is a finite-state machine that converts an input symbol string into an output symbol string and returns a weight. A WFST A is represented by the 6-tuple <Q_A, Σ*_Ain, Σ*_Aout, i_A, F_A, E_A>, where Q_A is a finite set of states, Σ_Ain is a finite set of input symbols, Σ_Aout is a finite set of output symbols, i_A ∈ Q_A is the initial state, F_A: Q_A → K is the final weight function, and E_A ⊆ Q_A × Σ*_Ain × Σ*_Aout × K × Q_A is the set of transitions. Σ* is the Kleene closure of Σ, that is, the set of all strings formed by concatenating zero or more elements of Σ. K, which appears as a component of each transition, is the semiring of weights. F_A generalizes the usual notion of final states: final states in the ordinary sense can be expressed as a function that returns 1_K for a final state and 0_K otherwise. An ordinary automaton that has only input symbols can also be represented as a WFST by taking Σ_Ain and Σ_Aout to be equal.

[0024] The functions src, dst, in, out and ω, which refer to the components of a transition t ∈ E_A, are defined by t = (src(t), in(t), out(t), ω(t), dst(t)). A path p of a WFST is a sequence of transitions p = t_1, …, t_m such that src(t_i) = dst(t_{i-1}) for 1 < i ≤ m. The weight of path p is ω(p) = ω(t_1) … ω(t_m), its input symbol string is in(p) = in(t_1) … in(t_m), its output symbol string is out(p) = out(t_1) … out(t_m), its start is src(p) = src(t_1), and its end is dst(p) = dst(t_m). The weight W(u, v) for input symbol string u and output symbol string v is defined as Σ_p ω(p) · F_A(dst(p)), taken over all paths p such that src(p) = i_A, in(p) = u and out(p) = v.
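The definition of W(u, v) can be made concrete with a minimal Python sketch. It assumes the probability semiring and, for simplicity, transitions that each carry exactly one input and one output symbol (no ε), so the sum over paths can be accumulated state by state. The names `Transition` and `total_weight` are hypothetical.

```python
from collections import namedtuple

# One transition t = (src(t), in(t), out(t), weight(t), dst(t))
Transition = namedtuple("Transition", "src inp out weight dst")


def total_weight(transitions, initial, final_weight, u, v):
    """Return W(u, v): the sum over all matching paths p of weight(p) * F(dst(p)).

    Assumes every transition consumes exactly one input and one output symbol.
    """
    if len(u) != len(v):
        return 0.0
    # current[q] = summed weight of all partial paths from `initial` that end in q
    current = {initial: 1.0}
    for x, y in zip(u, v):
        nxt = {}
        for t in transitions:
            if t.src in current and t.inp == x and t.out == y:
                nxt[t.dst] = nxt.get(t.dst, 0.0) + current[t.src] * t.weight
        current = nxt
    # Apply the final weight function F to the end state of each path
    return sum(w * final_weight.get(q, 0.0) for q, w in current.items())
```

Replacing the additions with `max` would give the Viterbi variant of the same computation.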

[0025] From a WFST whose input/output symbol strings are in Σ*_0 × Σ*_1 and a WFST whose strings are in Σ*_1 × Σ*_2, a composed WFST over Σ*_0 × Σ*_2 can be defined. The composition algorithm resembles the one for computing the intersection of finite-state automata (FSAs), and both operations are associative. To keep the explanation simple, only the composition A * B of a WFST A whose transitions carry input/output symbols in Σ_0 × Σ_1 and a WFST B whose transitions carry symbols in Σ_1 × Σ_2 is described here. See Reference 1 for details.

[0026] A * B can be defined as the tuple <Q_A × Q_B, Σ_0, Σ_2, (i_A, i_B), F_{A*B}, E_{A*B}> that satisfies the following conditions (i) and (ii).

[0027]
(i) F_{A*B}(q, q') = F_A(q) F_B(q')
(ii) (q, x, y, k, r) ∈ E_A and (q', y, z, k', r') ∈ E_B ⇔ ((q, q'), x, z, k × k', (r, r')) ∈ E_{A*B}

[0028] With the concepts of WFSTs and their composition, the probabilistic models of speech recognition can be expressed clearly. P(s_{n+1} | s_0) can be rewritten as in Equation 1 by way of intermediate symbol strings s_i at several stages, provided that each s_i depends only on s_{i-1}.

[0029]

(Equation 1)
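The equation itself is not reproduced in this text. Under the stated assumption that each s_i depends only on s_{i-1}, a form consistent with the surrounding paragraphs would be the following chain-rule marginalization over the intermediate strings; this is a reconstruction, not the original rendering.

```latex
P(s_{n+1} \mid s_0)
  = \sum_{s_1, \ldots, s_n}
    P(s_{n+1} \mid s_n)\, P(s_n \mid s_{n-1}) \cdots P(s_1 \mid s_0)
```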

[0030] Suppose that the conditional probability P(s_i | s_{i-1}) is modeled by a WFST A_{i,i-1} that takes s_i as its input symbol string and s_{i-1} as its output symbol string. The weights are then represented in the semiring <{x ∈ R | 0 ≤ x ≤ 1}, +, ·, 0, 1> or <{x ∈ R | 0 ≤ x ≤ 1}, max, ·, 0, 1>. Equation 1 shows that P(s_{n+1} | s_0) can be modeled by A_{n+1,n} * A_{n,n-1} * … * A_{1,0}. Therefore, letting o be the speech parameter sequence and ω the symbol sequence to be recognized, the speech recognition problem is in general expressed as Equation 2.

[0031]

(Equation 2)
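The equation is not reproduced in this text. A form consistent with Equation 1 and with the composition chain described in the next paragraph would be the following, with s_1, …, s_n the intermediate symbol strings between ω and o; this is a reconstruction under that assumption.

```latex
\hat{\omega}
  = \operatorname*{arg\,max}_{\omega} P(\omega \mid o)
  = \operatorname*{arg\,max}_{\omega}
    \sum_{s_1, \ldots, s_n}
    P(o \mid s_n) \cdots P(s_2 \mid s_1)\, P(s_1 \mid \omega)\, P(\omega)
```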

[0032] In the end, the search for recognition candidates can be formulated as the problem of finding, in A_{o,s_n} * … * A_{s_2,s_1} * A_{s_1,ω} * A_{ω,ω}, the ω that maximizes W(o, ω) for a given input sequence o.

[0033] What matters here is that if each model needed for speech recognition can be modeled as a WFST representing a conditional probability, the models can be freely combined by the composition operation, which is associative. Because composition is so useful, a general-purpose method for automatically learning WFST models that represent conditional probabilities is highly desirable. In the field of acoustic modeling, such WFSTs have in fact already been learned automatically using context-dependent discrete HMMs. However, because HMM output symbols are assumed to be independent, a more general conversion model calls for modeling that takes context into account on both the input and the output side. A method for automatically learning such a WFST is described next.

[0034] [2] Basic method for learning a WFST

[0035] Viewed differently, a transducer is a finite-state automaton whose input symbols are elements of Σ = Σ_in × Σ_out. The basic idea of the proposed automatic learning method is to model the elements of Σ = Σ_in × Σ_out (where ε ∈ Σ_in and ε ∈ Σ_out) as an n-gram. That is, the input and output symbols of the learning data are aligned in advance, using DP matching or the like, so that each symbol corresponds to at most one symbol, and the resulting sequence of aligned input/output symbol pairs is modeled as an n-gram, which is a probabilistic finite-state automaton. In this way a WFST that outputs the joint probability P(s_in, s_out) (s_in ∈ Σ*_in, s_out ∈ Σ*_out) can be learned.
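The two steps described here, aligning the strings and counting n-grams over the aligned pairs, can be sketched as follows. The sketch assumes a plain unit-cost edit-distance alignment as the DP matching (the source does not specify the cost function), and the names `EPS`, `align` and `count_ngrams`, as well as the `<s>` padding symbol, are hypothetical.

```python
from collections import Counter

EPS = "ε"  # placeholder for "no symbol" on one side of a pair


def align(src, tgt):
    """Align two symbol sequences by unit-cost edit distance.

    Returns a list of pairs (x, y), (x, EPS) or (EPS, y), so each symbol
    corresponds to at most one symbol on the other side.
    """
    n, m = len(src), len(tgt)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the end, preferring a 1:1 pairing when it is optimal
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((src[i - 1], EPS))
            i -= 1
        else:
            pairs.append((EPS, tgt[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def count_ngrams(pair_seq, n):
    """Count n-grams whose unit is an aligned (input, output) pair."""
    padded = [("<s>", "<s>")] * (n - 1) + list(pair_seq)
    return Counter(tuple(padded[k:k + n]) for k in range(len(padded) - n + 1))
```

Because each n-gram unit is a pair, the resulting model conditions on the history of both the input and the output symbols, which is the point of the method.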

[0036] Fixing n in the n-gram is not optimal in terms of the trade-off between the number of parameters and performance. The present learning method therefore models the data as a variable-length n-gram using a context tree.

[0037] The WFST for the conditional probability P(s_in | s_out) is obtained as the composition of the WFST for P(s_in, s_out) and the WFST for 1/P(s_out). In general, however, composing this conditional-probability WFST is not only computationally very expensive but also tends to yield a very large WFST. Section [3] below therefore also describes a way of building the context tree that takes the conditional probability into account.

[0038] The context length considered in this method is the same for the input symbol string and the output symbol string. Naturally, as an extension of the method, a model that allows different context lengths for the two is also conceivable. In that case, a model that considers no context for the input symbol string is equivalent to a left-context-dependent discrete HMM. In what follows, the alignment of input and output symbols is described as m:n (0 ≤ m ≤ 1, 0 ≤ n ≤ 1), but the method itself can be extended to arbitrary maximum values of m and n. The ε symbols that result from the alignment are treated in this specification in the same way as ordinary symbols. Naturally, variants such as removing ε from the n-gram context are also conceivable.

[0039] [3] Modeling with a context tree

A context tree is a tree that manages n-gram contexts hierarchically. The branches of a context tree T are labeled with elements of Σ. Each node carries a label representing a context, defined recursively as follows.

[0040]
(a) The root node is labeled ε, representing the empty context.
(b) A child node is labeled σs, obtained by prepending the label σ of the branch from the parent to the child to the label s of the parent node.

[0041] At each node s, a probability P_T(σ | s) is defined such that Σ_{σ∈Σ} P_T(σ | s) = 1. The probability of a string ω = ω_1, ω_2, …, ω_n generated by the context tree T is then given by Equation 3.

[0042]

(Equation 3)
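The equation is not reproduced in this text. The sketch below assumes it has the usual variable-length n-gram form, P(ω) = ∏_{i=1..n} P_T(ω_i | s_{i-1}), where s_{i-1} is the deepest node of T whose label is a suffix of ω_1 … ω_{i-1}. The tree is represented as a dictionary from context tuples to next-symbol distributions; this representation and the names `deepest_context` and `sequence_probability` are hypothetical.

```python
def deepest_context(history, tree):
    """Return the longest suffix of `history` that labels a node of the tree.

    The empty tuple () is the root (context ε) and must be present in `tree`.
    """
    for k in range(len(history) + 1):
        suffix = tuple(history[k:])
        if suffix in tree:
            return suffix
    raise KeyError("the tree has no root node ()")


def sequence_probability(symbols, tree):
    """Multiply P_T(symbol | deepest matching context) over the sequence."""
    prob = 1.0
    for i, sym in enumerate(symbols):
        ctx = deepest_context(symbols[:i], tree)
        prob *= tree[ctx].get(sym, 0.0)
    return prob
```

For example, with a root distribution and a single one-symbol context `("a",)`, a symbol following "b" falls back to the root while a symbol following "a" uses the deeper node.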

[0043] Here s_0 = ε, and s_i (i > 0) is the suffix of ω_1, …, ω_i that matches the label of the deepest node of T. As the estimator of P_T(σ | s), Equation 4, which assumes a flooring value α, was used.

[0044]

(Equation 4)

[0045] Here n_T(σ | s) is the number of times σ appeared following context s. To prune the context tree to an optimal shape, the gain function of Equation 5 can be used (see References 6 and 8 above).

[0046]

(Equation 5)

[0047] This expresses the gain obtained by using the child nodes σs in place of the parent node s. As shown in FIG. 1, with this gain function and a given constant C, the branches below the deepest node s that satisfies Δ(s) ≥ C can be cut off.

[0048] Referring to FIG. 2, a context tree can be converted into a probabilistic finite-state automaton as follows.

[0049] (1) Parent nodes are successively copied to create child nodes until the following conditions (i) and (ii) are satisfied. When a child node σs is created, its P_T(w_i | σs) is set equal to P_T(w_i | s) of the parent node s.
(i) For every prefix of the label of every leaf node, a node with that prefix as its label exists in the tree.
(ii) Every node has all of its sibling nodes.

[0050] (2) The nodes of the context tree are mapped to states of the WFST.

[0051] (3) For every leaf node s and every σ ∈ Σ, the leaf node s' that is a suffix of sσ is uniquely determined. A transition is created from s to s' with input/output symbol σ and weight P_T(σ | s).
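Step (3) can be sketched in Python as follows. The sketch assumes that step (1) has already been carried out, so the set of leaf labels is complete, and that each symbol is a single character so that labels can be plain strings. The function name `build_transitions` and the `prob` dictionary layout are hypothetical.

```python
def build_transitions(leaves, alphabet, prob):
    """Create one transition per (leaf, symbol) pair.

    leaves   : labels of the leaf nodes of the completed context tree
    alphabet : the symbols σ
    prob     : dictionary mapping (leaf label, symbol) to P_T(symbol | leaf)

    Returns tuples (source leaf, symbol, weight, destination leaf).
    """
    leaf_set = set(leaves)
    transitions = []
    for s in leaves:
        for sym in alphabet:
            extended = s + sym
            # Leaves that are suffixes of sσ; a completed tree yields exactly one
            matches = [extended[k:] for k in range(len(extended))
                       if extended[k:] in leaf_set]
            if len(matches) != 1:
                raise ValueError(f"tree is not complete for context {extended!r}")
            transitions.append((s, sym, prob[(s, sym)], matches[0]))
    return transitions
```

With leaves `a`, `ab` and `bb` over the alphabet {a, b}, for instance, reading `b` in state `ab` leads to state `bb`, and reading `a` in any state leads back to state `a`.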

[0052] It has been shown that a WFST that outputs P_T(s_in, s_out), with s_in ∈ Σ*_in and s_out ∈ Σ*_out, can be learned automatically by taking Σ to be (Σ_in ∪ {ε}) × (Σ_out ∪ {ε}). Next, a method for constructing a WFST that outputs P_T(s_in | s_out) is described.

[0053] Let out: Σ^n → (Σ_out ∪ {ε})^n be the mapping from Σ^n to its output symbol string.

[0054] (1) Letting s_train ∈ Σ* be the learning data for T, T_out is learned using out(s_train).

[0055] (2) When the condition Δ_T(s) ≥ C and Δ_Tout(out(s)) ≥ C is not satisfied, both T and T_out are pruned.

[0056] (3) Both T and T_out are expanded in the same way as before to create child nodes.

[0057] (4) Transitions are created in the same way as before, except that P_T(σ | s) / P_Tout(out(σ) | out(s)) is used as the transition weight.
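The weight in step (4) can be illustrated with a short Python sketch. It assumes that each symbol σ is an (input, output) pair, that a context s is a tuple of such pairs, and that both probability tables are dictionaries keyed by (context, symbol); the function name `conditional_weight` and this layout are hypothetical.

```python
def conditional_weight(p_joint, p_out, sigma, context):
    """Return P_T(σ | s) / P_Tout(out(σ) | out(s)).

    sigma   : an (input symbol, output symbol) pair
    context : tuple of (input symbol, output symbol) pairs
    p_joint : dictionary mapping (context, sigma) to P_T(σ | s)
    p_out   : dictionary mapping (out(context), out(sigma)) to P_Tout
    """
    out_sigma = sigma[1]                              # out(σ): keep the output side
    out_context = tuple(pair[1] for pair in context)  # out(s)
    return p_joint[(context, sigma)] / p_out[(out_context, out_sigma)]
```

Dividing each joint transition weight by the matching output-side weight gives the conditional model directly, without composing two separately built WFSTs.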

[0058] In this way, one obtains the WFST that results from composing the WFST obtained from T with the WFST obtained from T_out (with its transition weights taken as 1/P_T(s_out)), treating ε as an ordinary symbol.

[0059] [4] Automatic learning of a pronunciation variation model

[0060] In the field of speech recognition, automatic learning of statistical models, such as HMM-based acoustic models and n-gram-based language models, has been quite successful. The pronunciation dictionary that models how words are pronounced, however, is still usually built by relying heavily on human, language-dependent knowledge. Because building it is very labor-intensive, some form of automation is desirable. The problem is especially serious when a system is to be made multilingual.

[0061] As a first step toward this goal, the proposed method is used to attempt automatic learning of a pronunciation variation model that converts a phoneme string p into a phoneme string p' representing the representative pronunciation. It is assumed here that the pronunciation of a word can be modeled by combining a dictionary of representative pronunciations, created automatically from the kana reading (furigana) of each word, with a model of pronunciation variation from those representative pronunciations.

[0062] Indeed, it has been confirmed that recognition performance improves when this pronunciation variation is modeled with a neural network and pronunciations are added to the pronunciation dictionary word by word (see Reference 9).

[0063] Reference 9: Toshiaki Fukada, Takayoshi Yoshimura and Yoshinori Sagisaka, "Automatic Generation of Multiple Pronunciations based on Neural Networks", Speech Communication, Vol. 27, No. 1, pp. 63-73, 1999.

[0064] [4-1] Experiments

Based on the formulation described in [1] above, the relationship among the acoustic model, the language model, the pronunciation dictionary and the pronunciation variation model is expressed as follows. Letting p be a phoneme string, p' the phoneme string of the representative pronunciation, and ω a word string, the continuous word recognition problem is expressed by Equation 6.

[0065]

(Equation 6)
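The equation is not reproduced in this text. Given the four models named in the surrounding paragraphs, P(o | p), P(p | p'), P(p' | ω) and P(ω), a consistent form would be the following; this is a reconstruction under that assumption.

```latex
\hat{\omega}
  = \operatorname*{arg\,max}_{\omega}
    \sum_{p,\, p'}
    P(o \mid p)\, P(p \mid p')\, P(p' \mid \omega)\, P(\omega)
```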

[0066] Furthermore, defining the sum of the weights as max gives Equation 7.

[0067]

(Equation 7)
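The equation is not reproduced in this text. Replacing the sum over p and p' in the recognition problem with a maximum, as the paragraph above describes, would give the following Viterbi-style form; this is a reconstruction under that assumption.

```latex
\hat{\omega}
  = \operatorname*{arg\,max}_{\omega}
    \max_{p,\, p'}
    P(o \mid p)\, P(p \mid p')\, P(p' \mid \omega)\, P(\omega)
```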

[0068] P(o | p) is the acoustic model, P(p | p') is the pronunciation variation model, P(p' | ω) is the representative pronunciation dictionary, and P(ω) is the language model.

[0069] To evaluate the pronunciation variation model P(p | p'), two simpler experiments, expressed by Equations 8 and 9, are carried out here.

[0070]

(Equation 8)

[0071]

(Equation 9)
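Neither equation is reproduced in this text. From the descriptions that follow (Equation 8 converts a phoneme transcription p into a representative pronunciation, Equation 9 recognizes speech o as a representative pronunciation) and from the later statement that a model P(p') was built, plausible forms are the following; both are reconstructions under those assumptions.

```latex
% Equation 8 (assumed form): transcription p to representative pronunciation
\hat{p}' = \operatorname*{arg\,max}_{p'} P(p \mid p')\, P(p')

% Equation 9 (assumed form): speech o to representative pronunciation
\hat{p}' = \operatorname*{arg\,max}_{p'} \max_{p}
           P(o \mid p)\, P(p \mid p')\, P(p')
```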

[0072] The experiment expressed by Equation 8 converts a phoneme transcription into the representative pronunciation. The experiment expressed by Equation 9 recognizes speech as the representative pronunciation.

[0073] [4-2] Experimental conditions

The representative pronunciations were generated by rule from the kana reading (furigana) of each word. In the actual pronunciations a long vowel is written as two consecutive vowels, but to absorb variation the representative pronunciations do not distinguish long vowels from short vowels and use the short vowel for both.

[0074] Likewise, /ou/ and /ei/ that do not span a morpheme boundary are represented by /o/ and /e/ respectively. The actual pronunciations include a special phoneme representing a pause, whereas the representative pronunciations do not. DP matching was used to align the two phoneme strings for the learning data.

[0075] FIG. 3 shows an example of the phoneme alignment for the utterance 「えっとー、エキストラベットをお願いします。」 ("Um, an extra bed, please."). The lower row of FIG. 3 shows the phoneme string of the actual pronunciation (input symbol string p), and the upper row shows the phoneme string of the representative pronunciation (output symbol string p').

【００７６】ＡＴＲ旅行会話データベース（文献１０参
照）中の５３２発話をテストセットに、それを含まない
６４１８発話を学習セットに設定した。言い間違いが存
在する発話は、正解単語列が音声データや音素書き起こ
しと整合しないため、これを含まないようにテストセッ
ト、学習セットを作成した。From the ATR travel conversation database (see Reference 10), 532 utterances were used as the test set and 6418 utterances not containing them were used as the learning set. Utterances containing slips of the tongue were excluded from both sets, because their correct word strings do not match the speech data or the phoneme transcription.

【００７７】文献１０ (Reference 10): Toshiyuki Takezawa, Tsuyoshi Morimoto and Yoshinori Sagisaka, "Speech and Language Databases for Speech Translation Research in ATR", First International Workshop on EALREW (Oriental COCOSDA), pp. 148-155, 1998.

【００７８】Ｐ（ｐ’）は文脈木を使った可変長ｎ−ｇ
ｒａｍ（ｎ≦３，Δ_T≦１０００）として作成した。Ｐ
（ｐ｜ｐ’）は、二種類の学習データを用いて作成し
た。一つは、「正解代表音素列×音素書き起こし(phone
me transcription) 」の対応づけデータから、もう一つ
は「正解代表音素列×音素タイプライタ(phonetic type
writer) の一位認識結果」の対応づけデータを用いた学
習を行った。どちらの場合も、Δ_T≦１０００で文脈木
を枝刈りした。P(p') was built as a variable-length n-gram using a context tree (n ≦ 3, Δ_T ≦ 1000). P(p | p') was built from two kinds of learning data. One model was trained on alignment data of "correct representative phoneme string × phoneme transcription", and the other on alignment data of "correct representative phoneme string × top-ranked recognition result of a phoneme typewriter". In both cases, the context tree was pruned with Δ_T ≦ 1000.

【００７９】音素タイプライタでは、言語モデルに音素
bigramを用いた。この言語モデルと音響モデルは上述の
５３２発話からなるテストセットを用いて学習したもの
である。したがって、Ｐ（ｐ｜ｐ’）の学習に用いるの
はクローズドな認識結果( 音素認識精度９１．７％）で
ある。また、上記数９の認識実験は、音素タイプライタ
の認識結果を再スコアづけする方法で行った。The phoneme typewriter used a phoneme bigram as its language model. This language model and the acoustic model were trained using the above-mentioned test set of 532 utterances. Consequently, the recognition results used to train P(p | p') are closed recognition results (phoneme recognition accuracy of 91.7%). The recognition experiment of Equation 9 was carried out by rescoring the recognition results of the phoneme typewriter.

【００８０】〔４−３〕実験結果表１に実験結果を示す。[4-3] Experimental Results: Table 1 shows the experimental results.

【００８１】[0081]

【表１】 [Table 1]

【００８２】学習条件(training cond.)は、モデル学習
に音素書き起こし(transcription)を使ったのか、それ
とも音素タイプライタ(typewrite) を使ったのかを示
す。テスト条件(test cond.)は、上記数８の音素書き起
こしの発音変換実験(transcription) なのか、それとも
上記数９の認識実験(typewriter)なのかを示す。The training condition (training cond.) indicates whether the model was trained on the phoneme transcription (transcription) or on the output of the phoneme typewriter (typewriter). The test condition (test cond.) indicates whether the experiment is the pronunciation conversion of the phoneme transcription according to Equation 8 (transcription) or the recognition experiment according to Equation 9 (typewriter).

【００８３】第１カラムのｎはｎ−ｇｒａｍの次数を表
す。表中の数字は、代表的な音素列への変換誤り率(Err
or Rate = 100 ×(Ins＋Del ＋Sub)/UtteranceLength)
である。ｎの最大値を２より増やしても改善は大きくな
いが、それでも学習条件と実験条件のあらゆる組み合わ
せにおいて、ｎが増加するにしたがってモデルの性能が
単調に向上することが確かめられた。ｎ＝１は全く文脈
を考慮しない場合であり、コンフュージョンマトリクス
に相当する。それと比較してｎ≦３では最大４５％の改
善率が得られた。The n in the first column is the order of the n-gram. The figures in the table are the error rates of conversion into the representative phoneme string (Error Rate = 100 × (Ins + Del + Sub) / UtteranceLength). Increasing the maximum value of n beyond 2 brings no large improvement, but for every combination of training and test conditions the model performance was confirmed to improve monotonically as n increases. The case n = 1 takes no context into account at all and corresponds to a confusion matrix. Compared with that case, n ≦ 3 yielded an improvement of up to 45%.
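
The error rate defined above can be computed as in the following sketch. Two points are assumptions not stated in the text: UtteranceLength is taken to be the length of the correct representative phoneme string, and errors and lengths are pooled over all utterances before dividing. The function names are chosen for this example only.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of insertions, deletions and substitutions
    (Ins + Del + Sub) needed to turn the reference into the hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + int(ref_symbol != hyp_symbol)
            deletion = previous[j] + 1        # reference symbol missing
            insertion = current[j - 1] + 1    # extra hypothesis symbol
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses):
    """Error Rate = 100 * (Ins + Del + Sub) / UtteranceLength,
    pooled over all (reference, hypothesis) utterance pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    length = sum(len(r) for r in references)
    return 100.0 * errors / length
```

For example, a reference of three phonemes and a hypothesis that contains one extra phoneme give one insertion over a length of three, that is, an error rate of about 33.3.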

【００８４】〔５〕むすびあらかじめ対応づけられら入出力記号ペアの列からＷＦ
ＳＴを自動学習する手法を提案した。学習されるモデル
は、文脈木を使って可変長のｎ−ｇｒａｍとして構成さ
れるため、パラメータ数の最適化をはかることができ
る。本モデルは、入力記号列の文脈だけでなく、出力記
号の文脈を考慮することができるため、文脈依存離散Ｈ
ＭＭよりも強力な変換モデルとなっている。[5] Conclusion: A method has been proposed for automatically learning a WFST from a sequence of input/output symbol pairs that have been aligned in advance. Because the learned model is constructed as a variable-length n-gram using a context tree, the number of parameters can be optimized. Because the model can take into account not only the context of the input symbol string but also the context of the output symbols, it is a more powerful conversion model than a context-dependent discrete HMM.
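
The idea of modeling the sequence of aligned input/output symbol pairs as an n-gram can be illustrated with the following sketch. It is a deliberate simplification and not the proposed method itself: it uses a fixed order n with plain maximum-likelihood estimates, without the context tree, the pruning, or any smoothing. The class name PairNGram and the start marker BOS are assumptions made for the example.

```python
from collections import defaultdict

BOS = ("<s>", "<s>")  # hypothetical start-of-sequence pair


class PairNGram:
    """Fixed-order n-gram over (input_symbol, output_symbol) pairs."""

    def __init__(self, n):
        self.n = n
        self.history_counts = defaultdict(int)
        self.ngram_counts = defaultdict(int)

    def train(self, aligned_sequences):
        """Count n-grams in sequences of (input, output) pairs."""
        for sequence in aligned_sequences:
            padded = [BOS] * (self.n - 1) + list(sequence)
            for k in range(self.n - 1, len(padded)):
                history = tuple(padded[k - self.n + 1:k])
                self.history_counts[history] += 1
                self.ngram_counts[history + (padded[k],)] += 1

    def prob(self, history, pair):
        """Maximum-likelihood P(pair | history); history is a tuple of
        n - 1 pairs. Unseen histories give 0.0 (no smoothing)."""
        history = tuple(history)
        total = self.history_counts.get(history, 0)
        if total == 0:
            return 0.0
        return self.ngram_counts.get(history + (pair,), 0) / total
```

Because each history element is a pair, the conditioning context contains the preceding output symbols as well as the preceding input symbols, which is the property the conclusion contrasts with a context-dependent discrete HMM.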

【００８５】代表的な発音からの発音変形という問題に
本手法を適用した実験により、可変長の文脈を考慮する
ことによって、文脈を全く考慮しないコンフュージョン
マトリクスでは不可能な記号列変換ができることを示し
た。Experiments applying the present method to the problem of pronunciation variation from the representative pronunciation showed that taking variable-length context into account enables symbol string conversion that is impossible with a confusion matrix, which takes no context into account at all.

【００８６】本発明は、目的に適した入出力記号ペアの
対応付けの手法を別途開発することによって、音声認識
以外のにも様々な利用が可能である。例えば、形態素解
析、タグ付きデータ変換、それから非常に似た言語間の
機械翻訳などにも利用可能である。By separately developing a method of aligning input/output symbol pairs suited to the purpose, the present invention can be used for various applications other than speech recognition. For example, it can be used for morphological analysis, conversion of tagged data, and machine translation between very similar languages.

【００８７】[0087]

【発明の効果】この発明によれば、入力記号の文脈だけ
でなく、出力記号の文脈についても考慮したトランスデ
ューサが得られるようになる。According to the present invention, it is possible to obtain a transducer that takes into account not only the context of input symbols but also the context of output symbols.

[Brief description of the drawings]

【図１】文脈木の枝刈りを説明するための模式図であ
る。FIG. 1 is a schematic diagram illustrating pruning of a context tree.

【図２】文脈木の拡張と遷移の追加を説明するための模
式図である。FIG. 2 is a schematic diagram for explaining expansion of a context tree and addition of transitions.

【図３】学習データである入出力記号の対応付けの例を
示す模式図である。FIG. 3 is a schematic diagram showing an example of correspondence between input and output symbols as learning data.

Claims

[Claims]

1. A learning method for a transducer that converts an input symbol string into an output symbol string, characterized in that sets of input/output symbols associated with each other in advance are used as learning data, and the associated sets of input/output symbols are modeled as an n-gram.

2. The transducer learning method according to claim 1, wherein a set of input / output symbols associated with each other is modeled as a variable-length n-gram using a context tree.

3. A computer-readable recording medium on which is recorded a learning program for a transducer that converts an input symbol string into an output symbol string, the learning program causing a computer to execute a process of using sets of input/output symbols associated in advance as learning data and modeling the sequence of the sets of input/output symbols as an n-gram.