JP2954215B2

JP2954215B2 - Language processing system

Info

Publication number: JP2954215B2
Application number: JP62118337A
Authority: JP
Inventors: Kazuhiko Ozeki (尾関和彦)
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 1987-05-15
Filing date: 1987-05-15
Publication date: 1999-09-27
Anticipated expiration: 2014-09-27
Also published as: JPS63282882A

Description

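[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a language processing system for post-processing the output of a Japanese speech recognition device or a Japanese word processor. In particular, when several bunsetsu (phrase-unit) candidates are given at each of a number of successive time points, the system takes into account both the certainty of those candidates and the consistency of dependencies in which several bunsetsu simultaneously depend on one other bunsetsu. It selects one bunsetsu from the candidates at each time point so that the resulting bunsetsu sequence is optimal as a Japanese phrase or sentence, determines the syntax of that sequence, and computes the well-formedness of the resulting sequence as a Japanese phrase or sentence.

[Prior Art]

For use in language processing that post-processes the output of a Japanese speech recognition device or a Japanese word processor, the following algorithm, for example, has been proposed. Given several bunsetsu candidates at each of a number of successive time points, it takes into account the certainty of the candidates and the consistency of dependencies between pairs of bunsetsu, selects one bunsetsu per time point so that an optimal bunsetsu sequence is formed, determines its syntax, and efficiently computes the well-formedness of the resulting sequence as a Japanese phrase or sentence.

Kazuhiko Ozeki: "A multi-stage decision algorithm for selecting the optimal bunsetsu sequence," Technical Report of the IECE Speech Research Group, SP86-32 (July 1986)

[Problems to Be Solved by the Invention]

Word order in Japanese is said to be relatively free, but in some cases word order matters. In addition, two bunsetsu of the same case may not depend on a single predicate. To compute how well-formed a bunsetsu sequence is as a Japanese sentence or phrase while accounting for such word-order and case-conflict problems, it is not enough to consider only the consistency of dependencies between two bunsetsu. One must use the consistency of several bunsetsu simultaneously depending on one other bunsetsu.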
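However, the conventional method above could find a solution only when the well-formedness as a Japanese phrase or sentence is determined by the sum of pairwise dependency consistencies. Exhaustive enumeration can solve the problem in principle, but its computational cost made it very difficult to apply to practical problems.

The object of the present invention is therefore to provide a language processing system that works even when well-formedness is based not on pairwise dependency consistency but on the consistency of several bunsetsu simultaneously depending on one other bunsetsu. From the bunsetsu sets given at successive time points, the system selects the bunsetsu sequence that is optimal as a Japanese sentence or phrase, determines the syntax over it, and efficiently computes the well-formedness of the resulting sequence.

[Means for Solving the Problems]

(a) Structure of Japanese in terms of bunsetsu

Before describing the configuration of the invention, we describe the structure of Japanese with the bunsetsu as the unit.

A Japanese sentence, or a coherent phrase, can be regarded as built from modification relations, in a broad sense, between units called bunsetsu. For example, in the Japanese sentence

[S1] 「私は７時の電車で会社に行きます」 ("I go to the office on the 7 o'clock train")

「私は」 (I), 「７時の」 (7 o'clock), 「電車で」 (by train), 「会社に」 (to the office), and 「行きます」 (go) are each a bunsetsu. 「私は」, 「電車で」, and 「会社に」 all modify 「行きます」, and 「７時の」 modifies 「電車で」; together they form one coherent sentence.

When bunsetsu $x$ modifies bunsetsu $y$, $x$ is said to depend on $y$, and $y$ is said to receive $x$. Such a modification relation is called a dependency (kakari-uke).

For a bunsetsu sequence to form a coherent Japanese phrase or sentence, it is considered necessary that dependencies satisfying the following conditions exist among its bunsetsu.

[C1] Every bunsetsu other than the last depends on exactly one bunsetsu located closer to the end of the sentence.

[C2] A dependency between two bunsetsu does not cross a dependency between two other bunsetsu.

Conditions [C1] and [C2] can be expressed by a "syntax" defined as follows.

[D1] (1) If $x$ is a bunsetsu, $[x]$ is a syntax. (2) If $X_1, X_2, \ldots, X_m$ are syntaxes and $x$ is a bunsetsu, $[X_1 X_2 \cdots X_m x]$ is a syntax.

[D2] A bunsetsu sequence $x_1 x_2 \cdots x_n$ that has been bracketed so as to become a syntax is called a syntax over $x_1 x_2 \cdots x_n$. The set of all syntaxes over $x_1 x_2 \cdots x_n$ is denoted $K(x_1 x_2 \cdots x_n)$.

Suppose we agree that in a syntax $[X_1 X_2 \cdots X_m x]$ with $X_1 = [\cdots x_1]$, $X_2 = [\cdots x_2]$, ..., $X_m = [\cdots x_m]$, the bunsetsu $x_1, x_2, \ldots, x_m$ depend on $x$. Then conditions [C1] and [C2] hold in any syntax in the above sense, and conversely any dependency relation over a bunsetsu sequence that satisfies [C1] and [C2] can be expressed by a syntax in the above sense.

A single bunsetsu sequence has several syntaxes over it. For example,

[[私は][７時の][電車で][会社に]行きます]
[[[[[私は]７時の]電車で]会社に]行きます]
[[[私は][７時の]電車で][会社に]行きます]
[[[私は][[７時の]電車で]会社に]行きます]
[[私は][[７時の]電車で][会社に]行きます]

are all syntaxes over the bunsetsu sequence of example [S1].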
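As an illustration of definitions [D1] and [D2], the following Python sketch enumerates every syntax over a bunsetsu sequence as a bracketed string. The function names `syntaxes` and `compositions` are hypothetical helpers introduced here, not part of the patent. The approach simply cuts the bunsetsu before the final one into consecutive blocks, each of which becomes one sub-syntax depending on the final bunsetsu.

```python
from itertools import product

def compositions(seq):
    """Yield every way to cut `seq` into consecutive non-empty blocks."""
    if not seq:
        yield []
        return
    for cut in range(1, len(seq) + 1):
        for rest in compositions(seq[cut:]):
            yield [seq[:cut]] + rest

def syntaxes(words):
    """Yield every syntax over `words` per [D1], written as a bracketed string."""
    *front, head = words
    if not front:
        yield f"[{head}]"          # [D1](1): a single bunsetsu
        return
    # [D1](2): each block becomes a sub-syntax whose last bunsetsu depends on `head`.
    for blocks in compositions(front):
        for subs in product(*(list(syntaxes(b)) for b in blocks)):
            yield "[" + "".join(subs) + head + "]"

all_syntaxes = list(syntaxes(["私は", "７時の", "電車で", "会社に", "行きます"]))
```

For the five bunsetsu of [S1] this yields 14 distinct syntaxes, including the five listed above; the count grows quickly with sequence length, which is why an evaluation function and an efficient search are needed.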
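To select a well-formed bunsetsu sequence from among so many syntaxes, some evaluation function is needed. We therefore first assume that the consistency of a dependency is given as follows.

[C3] The consistency with which bunsetsu $x_1, x_2, \ldots, x_m$ depend on bunsetsu $x$ is expressed by a function $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ that takes non-negative values.

The value of PEN ranges, for example, from 0 to 100, and by convention a value closer to 0 means higher consistency. How the function PEN is defined is a very important question, but it is not the focus of the present invention and is not discussed here.

With these preparations, the well-formedness $P(X)$ of a syntax $X$ is defined recursively as follows.

[D3] (1) If $X = [x]$ (where $x$ is a bunsetsu), then $P(X) = 0$.

(2) If $X = [X_1 X_2 \cdots X_m x]$ with $X_1 = [\cdots x_1]$, $X_2 = [\cdots x_2]$, ..., $X_m = [\cdots x_m]$, then

$$P(X) = P(X_1) + P(X_2) + \cdots + P(X_m) + \mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$$

The value of $P(X)$ defined in this way is the sum of the PEN values for all dependencies in $X$.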
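The recursive definition [D3] can be sketched in Python as below. A syntax is represented as a pair `(subs, head)`, where `subs` is the list of sub-syntaxes; this representation and the function `toy_pen` are assumptions made for illustration only, since the patent deliberately leaves PEN unspecified. The toy penalty charges 50 whenever two dependents of the same bunsetsu end in the same particle, a stand-in for the case conflict mentioned earlier that cannot be expressed as a sum of pairwise terms.

```python
def leaf(word):
    """The syntax [x] consisting of a single bunsetsu."""
    return ([], word)

def P(X, pen):
    """Well-formedness per [D3]: 0 for [x]; otherwise sub-syntax values plus PEN."""
    subs, x = X
    if not subs:
        return 0
    deps = [sub[1] for sub in subs]   # last bunsetsu of each sub-syntax depends on x
    return sum(P(sub, pen) for sub in subs) + pen(deps, x)

def toy_pen(deps, x):
    """Hypothetical PEN: 50 for each repeated final particle among the dependents."""
    particles = [d[-1] for d in deps]
    return 50 * (len(particles) - len(set(particles)))

good = ([leaf("私は"), ([leaf("７時の")], "電車で"), leaf("会社に")], "行きます")
clash = ([leaf("駅に"), leaf("会社に")], "行きます")   # two "に" phrases on one predicate
```

Under this toy penalty `P(good, toy_pen)` is 0 and `P(clash, toy_pen)` is 50; lower values mean a better-formed syntax, matching the convention adopted for PEN.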
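(b) Problem setting

In what follows, to keep long formulas readable, a sum $\sum_{m=i}^{j} f(m)$ may also be written in the linear form Σ(i ≦ m ≦ j)[f(m)], and likewise for min, argmin, and ∪. The brackets [ ] are sometimes omitted where no confusion can arise.

Now consider the following situation.

[J1] At each time point from 1 to $N$, bunsetsu sets $B_1, B_2, \ldots, B_N$ are given. For each bunsetsu set $B_k$, a function $S_k$ taking non-negative real values is defined:

$$S_k : B_k \to [0, \infty)$$

$S_k(x)$ is a number representing the certainty of bunsetsu $x$ in the set $B_k$. It takes values, for example, from 0 to 100, with a value closer to 0 meaning higher certainty. In speech recognition, $S_k(x)$ indicates how confidently the recognizer produced the result $x$ in $B_k$, and most speech recognizers output such a value together with the recognition result. In a Japanese word processor based on kana-kanji conversion, homophones cause several bunsetsu candidates with the same reading to be output, and each candidate can be given a certainty value according to the usage frequency of its kanji and compounds. The certainty of a syntax $X$ is defined from the certainties of its constituent bunsetsu $x_1, x_2, \ldots, x_m$ as

$$S(X) = \sum_{k=1}^{m} S_k(x_k)$$

For a sequence of bunsetsu sets $A_1, A_2, \ldots, A_m$ we also introduce the notation

$$KB(A_1 A_2 \cdots A_m) = \{\, X \mid X \in K(x_1 x_2 \cdots x_m),\ x_1 \in A_1,\ x_2 \in A_2,\ \ldots,\ x_m \in A_m \,\}$$

In this setting, the problem addressed by the invention can be stated as follows.

[P1] Find the following.

(1) $\min_{X \in KB(B_1 B_2 \cdots B_N)} [P(X) + S(X)]$

(2) $\operatorname{argmin}_{X \in KB(B_1 B_2 \cdots B_N)} [P(X) + S(X)]$

Since $KB(B_1 B_2 \cdots B_N)$ is a finite set, this problem can in principle be solved by enumeration: compute $P(X) + S(X)$ from the definitions for every $X \in KB(B_1 B_2 \cdots B_N)$, and obtain the minimum value and the $X$ that attains it. However, as shown in Table 1, the number of elements of $KB(B_1 B_2 \cdots B_N)$ grows rapidly with $N$ and quickly becomes enormous, so enumeration is extremely difficult to apply to practical problems.

In contrast, the present invention solves this problem within a practical amount of computation.

When the function PEN can be written as a sum of pairwise dependency consistencies,

$$\mathrm{PEN}(x_1, x_2, \ldots, x_m, x) = \mathrm{PEN}(x_1, x) + \mathrm{PEN}(x_2, x) + \cdots + \mathrm{PEN}(x_m, x)$$

the conventional method cited above solves the problem efficiently. The present invention instead targets the case where PEN cannot be decomposed into such a sum.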
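The enumeration approach to [P1] can be sketched as follows, mainly to make its cost visible. The candidate sets `B`, the `score` table, and `toy_pen` are invented for illustration; keying `score` by the word alone, rather than by time point as $S_k$ does, is a simplification that works only because the toy words are all distinct.

```python
from itertools import product

def compositions(seq):
    """Yield every way to cut `seq` into consecutive non-empty blocks."""
    if not seq:
        yield []
        return
    for cut in range(1, len(seq) + 1):
        for rest in compositions(seq[cut:]):
            yield [seq[:cut]] + rest

def structures(items):
    """Yield every syntax over `items` as a nested pair (subs, head)."""
    *front, head = items
    if not front:
        yield ((), head)
        return
    for blocks in compositions(front):
        for subs in product(*(list(structures(b)) for b in blocks)):
            yield (subs, head)

def cost(X, pen, score):
    """P(X) + S(X): all dependency penalties plus the certainty of every bunsetsu."""
    subs, x = X
    total = score[x]
    if subs:
        total += pen([s[1] for s in subs], x) + sum(cost(s, pen, score) for s in subs)
    return total

def brute_force(B, pen, score):
    """Solve [P1] by trying every candidate choice and every syntax over it."""
    return min(((cost(X, pen, score), X)
                for choice in product(*B)
                for X in structures(list(choice))),
               key=lambda pair: pair[0])

# Toy data (hypothetical): lower score means a more certain candidate.
B = [["私は", "渡しは"], ["駅に", "液に"], ["行きます"]]
score = {"私は": 5, "渡しは": 20, "駅に": 10, "液に": 15, "行きます": 0}

def toy_pen(deps, x):
    particles = [d[-1] for d in deps]
    clash = 50 * (len(particles) - len(set(particles)))    # same particle twice
    odd = 10 * sum(d in {"渡しは", "液に"} for d in deps)   # implausible words
    head = 0 if x.endswith("ます") else 20                  # non-predicate head
    return clash + odd + head

best_cost, best_syntax = brute_force(B, toy_pen, score)
```

Even in this tiny case the search visits every combination of candidates and every bracketing, so the work multiplies with each additional time point.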
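(c) Recurrence equations

This section describes the recurrence equations that play a fundamental role in the invention. Consider one syntax embedded within another syntax. The only bunsetsu in it that can have a dependency relation with a bunsetsu in another syntax is its final bunsetsu. With this in mind, we make the following definitions.

[D4] For $1 \le i \le j \le N$ and each $x \in B_j$:

(1) $\mathrm{OPT}(i,j;x) = \min_{X \in KB(B_i B_{i+1} \cdots B_{j-1} \{x\})} [P(X) + S(X)]$

(2) $\mathrm{OPK}(i,j;x) = \operatorname{argmin}_{X \in KB(B_i B_{i+1} \cdots B_{j-1} \{x\})} [P(X) + S(X)]$

The following recurrence equations hold for OPT and OPK.

[E1] (1) $\mathrm{OPT}(i,i;x) = S_i(x)$

(2) For $i < j$:

$$\mathrm{OPT}(i,j;x) = \min_{\substack{i-1 = k(0) < k(1) < \cdots < k(m) = j-1 \\ x_1 \in B_{k(1)},\ \ldots,\ x_m \in B_{k(m)}}} \Bigl[ \sum_{r=1}^{m} \mathrm{OPT}(k(r-1)+1, k(r); x_r) + \mathrm{PEN}(x_1, x_2, \ldots, x_m, x) \Bigr] + S_j(x)$$

[E2] (1) $\mathrm{OPK}(i,i;x) = [x]$

(2) For $i < j$, let $\hat{k}(0), \hat{k}(1), \ldots, \hat{k}(\hat{m}), \hat{x}_1, \ldots, \hat{x}_{\hat{m}}$ be the values of $k(0), k(1), \ldots, k(m), x_1, \ldots, x_m$ that attain the minimum in [E1]. Then

$$\mathrm{OPK}(i,j;x) = [\, \mathrm{OPK}(\hat{k}(0)+1, \hat{k}(1); \hat{x}_1)\ \mathrm{OPK}(\hat{k}(1)+1, \hat{k}(2); \hat{x}_2) \cdots \mathrm{OPK}(\hat{k}(\hat{m}-1)+1, \hat{k}(\hat{m}); \hat{x}_{\hat{m}})\ x \,]$$

The proofs of [E1] and [E2] are easy once the following fact is noted, so the details are omitted.

[E3]

$$KB(B_i B_{i+1} \cdots B_{j-1} \{x\}) = \bigcup_{\substack{i-1 = k(0) < k(1) < \cdots < k(m) = j-1 \\ x_1 \in B_{k(1)},\ \ldots,\ x_m \in B_{k(m)}}} \bigl\{\, [X_1 X_2 \cdots X_m x] \bigm| X_r \in KB(B_{k(r-1)+1} \cdots B_{k(r)-1} \{x_r\}),\ 1 \le r \le m \,\bigr\}$$

In a syntax $[X_1 X_2 \cdots X_m x]$ with $X_1 = [\cdots x_1]$, $X_2 = [\cdots x_2]$, ..., $X_m = [\cdots x_m]$, the bunsetsu $x_1, x_2, \ldots, x_m$ depend on $x$; that is, $x$ receives $x_1, x_2, \ldots, x_m$. It is reasonable to suppose that there is a fixed limit on the number of bunsetsu that one bunsetsu can receive. If this upper limit is $L$, it suffices to seek solutions in the range $1 \le m \le L$. With this restriction, [E1] becomes the following.

[E1′] (1) $\mathrm{OPT}(i,i;x) = S_i(x)$

(2) For $i < j$:

$$\mathrm{OPT}(i,j;x) = \min_{\substack{i-1 = k(0) < k(1) < \cdots < k(m) = j-1 \\ 1 \le m \le L \\ x_1 \in B_{k(1)},\ \ldots,\ x_m \in B_{k(m)}}} \Bigl[ \sum_{r=1}^{m} \mathrm{OPT}(k(r-1)+1, k(r); x_r) + \mathrm{PEN}(x_1, x_2, \ldots, x_m, x) \Bigr] + S_j(x)$$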
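The recurrence [E1′] translates almost directly into a memoized recursive function. The sketch below is one possible reading, not the patent's own procedure (which fills tables bottom-up): `make_opt`, its argument conventions, and the 0-based candidate index `q` are assumptions made here. `S(k, x)` plays the role of $S_k(x)$ and `pen(deps, x)` the role of PEN.

```python
from functools import lru_cache
from itertools import product

def make_opt(B, S, pen, L):
    """Return opt(i, j, q) = OPT(i, j; x) for x = B[j-1][q], following [E1']."""

    def splits(lo, hi, m):
        # All k(1) < ... < k(m) with k(1) >= lo and k(m) = hi.
        if m == 1:
            yield (hi,)
            return
        for k in range(lo, hi):
            for rest in splits(k + 1, hi, m - 1):
                yield (k,) + rest

    @lru_cache(maxsize=None)
    def opt(i, j, q):
        x = B[j - 1][q]
        if i == j:                                   # [E1'](1)
            return S(j, x)
        best = float("inf")
        for m in range(1, min(L, j - i) + 1):        # at most L dependents
            for ks in splits(i, j - 1, m):
                starts = (i,) + tuple(k + 1 for k in ks[:-1])
                for ps in product(*(range(len(B[k - 1])) for k in ks)):
                    deps = [B[k - 1][p] for k, p in zip(ks, ps)]
                    total = sum(opt(s, k, p) for s, k, p in zip(starts, ks, ps))
                    best = min(best, total + pen(deps, x))
        return best + S(j, x)                        # [E1'](2)

    return opt
```

Because each `opt(i, j, q)` is cached, a sub-span shared by many larger spans is evaluated only once, which is the source of the saving over enumeration.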
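If $L \ge N - 1$, [E1′] and [E1] are equivalent; that is, [E1′] includes [E1]. Hereafter [E1′] is therefore used in place of [E1]. [E2] is unchanged by this restriction. The values $k(1), k(2), \ldots, k(m)$ and $x_1, x_2, \ldots, x_m$ that attain the minimum in [E1′] are called the optimal split points for $i, j, x$ and the optimal bunsetsu at those split points.

Below, the number of elements of $B_k$ is written $\mathrm{NUM}(k)$, and the elements of $B_k$ are listed as

$$B_k = \{ x_{k,1}, x_{k,2}, \ldots, x_{k,\mathrm{NUM}(k)} \}$$

(d) Computing OPT, the optimal split points, and the optimal bunsetsu numbers

Part (2) of [E1′] shows that, for $i < j$, $\mathrm{OPT}(i,j;x_{j,q})$ can be computed once $\mathrm{OPT}(s,t;x_{t,p})$ has been computed for all $i \le s \le t \le j-1$ and $1 \le p \le \mathrm{NUM}(t)$. By part (1) of [E1′], when $i = j$ the value is fixed by $\mathrm{OPT}(i,i;x_{i,q}) = S_i(x_{i,q})$. Together these allow $\mathrm{OPT}(i,j;x_{j,q})$ to be computed starting from the spans with $j - i = 0$ and proceeding to successively larger spans, while the optimal split points and optimal bunsetsu numbers are determined at the same time.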
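(e) Computing the optimal syntax and its well-formedness

For simplicity, we describe the case where the optimal split points and optimal bunsetsu numbers are uniquely determined. First, since

$$\min_{X \in KB(B_1 B_2 \cdots B_N)} [P(X) + S(X)] = \min_{1 \le q \le \mathrm{NUM}(N)} \ \min_{X \in KB(B_1 B_2 \cdots B_{N-1} \{x_{N,q}\})} [P(X) + S(X)] = \min_{1 \le q \le \mathrm{NUM}(N)} \mathrm{OPT}(1,N;x_{N,q})$$

computing

$$q_0 = \operatorname{argmin}_{1 \le q \le \mathrm{NUM}(N)} \mathrm{OPT}(1,N;x_{N,q})$$

determines the last bunsetsu $x_{N,q_0}$ of the optimal bunsetsu sequence and the well-formedness $\mathrm{OPT}(1,N;x_{N,q_0})$ of the optimal syntax over that sequence. The optimal bunsetsu sequence and the optimal syntax over it are given by $\mathrm{OPK}(1,N;x_{N,q_0})$. It can be computed concretely as follows.

Let the optimal split points for $1, N, x_{N,q_0}$ be $k_1(1), k_1(2), \ldots, k_1(m_1)$ and the corresponding optimal bunsetsu numbers be $p_1(1), p_1(2), \ldots, p_1(m_1)$. By part (2) of [E2],

$$\mathrm{OPK}(1,N;x_{N,q_0}) = [\, \mathrm{OPK}(1,k_1(1);x_{k_1(1),p_1(1)})\ \mathrm{OPK}(k_1(1)+1,k_1(2);x_{k_1(2),p_1(2)}) \cdots \mathrm{OPK}(k_1(m_1-1)+1,k_1(m_1);x_{k_1(m_1),p_1(m_1)})\ x_{N,q_0} \,]$$

Among the terms $\mathrm{OPK}(k_1(r-1)+1, k_1(r); x_{k_1(r),p_1(r)})$ ($1 \le r \le m_1$) on the right-hand side, consider for example $\mathrm{OPK}(1,k_1(1);x_{k_1(1),p_1(1)})$. If $1 = k_1(1)$, then

$$\mathrm{OPK}(1,k_1(1);x_{k_1(1),p_1(1)}) = [x_{k_1(1),p_1(1)}]$$

so $\mathrm{OPK}(1,N;x_{N,q_0})$ can be rewritten as

$$\mathrm{OPK}(1,N;x_{N,q_0}) = [\, [x_{k_1(1),p_1(1)}]\ \mathrm{OPK}(k_1(1)+1,k_1(2);x_{k_1(2),p_1(2)}) \cdots \mathrm{OPK}(k_1(m_1-1)+1,k_1(m_1);x_{k_1(m_1),p_1(m_1)})\ x_{N,q_0} \,]$$

If $1 \ne k_1(1)$, let the optimal split points for $1, k_1(1), x_{k_1(1),p_1(1)}$ be $k_2(1), k_2(2), \ldots, k_2(m_2)$ and the corresponding optimal bunsetsu numbers be $p_2(1), p_2(2), \ldots, p_2(m_2)$. Then

$$\mathrm{OPK}(1,k_1(1);x_{k_1(1),p_1(1)}) = [\, \mathrm{OPK}(1,k_2(1);x_{k_2(1),p_2(1)})\ \mathrm{OPK}(k_2(1)+1,k_2(2);x_{k_2(2),p_2(2)}) \cdots \mathrm{OPK}(k_2(m_2-1)+1,k_2(m_2);x_{k_2(m_2),p_2(m_2)})\ x_{k_1(1),p_1(1)} \,]$$

so $\mathrm{OPK}(1,N;x_{N,q_0})$ can be rewritten as

$$\mathrm{OPK}(1,N;x_{N,q_0}) = [\, [\, \mathrm{OPK}(1,k_2(1);x_{k_2(1),p_2(1)}) \cdots \mathrm{OPK}(k_2(m_2-1)+1,k_2(m_2);x_{k_2(m_2),p_2(m_2)})\ x_{k_1(1),p_1(1)} \,]\ \mathrm{OPK}(k_1(1)+1,k_1(2);x_{k_1(2),p_1(2)}) \cdots \mathrm{OPK}(k_1(m_1-1)+1,k_1(m_1);x_{k_1(m_1),p_1(m_1)})\ x_{N,q_0} \,]$$

Repeating this operation until every OPK that appears is a syntax consisting of a single bunsetsu determines $\mathrm{OPK}(1,N;x_{N,q_0})$, that is, the optimal bunsetsu sequence and the optimal syntax over it, at the same time.

With $i, j$ the indices of bunsetsu sets and $x$ a bunsetsu belonging to $B_j$, there may be several sets of optimal split points $k(1), k(2), \ldots, k(m)$ and optimal bunsetsu $x_1, x_2, \ldots, x_m$ for $i, j, x$. In that case, the above operation is applied to every such set and all resulting syntaxes are enumerated.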
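The repeated expansion of OPK described in (e) is a simple recursion over the stored split points. The sketch below assumes the split points are held in a dictionary `table2` keyed by `(i, j, q)`, with each value a list of `(k, p)` pairs; that layout, the function name `build`, and the hand-written table are illustrative assumptions, and only the unique-optimum case is handled.

```python
def build(i, j, q, B, table2):
    """Expand OPK(i, j; x) for x = B[j-1][q] into a bracketed string.

    table2[(i, j, q)] lists the optimal split points k(1..m) together with the
    optimal candidate index p at each one (k is 1-based, p and q are 0-based).
    """
    x = B[j - 1][q]
    if i == j:                       # [E2](1): a single bunsetsu
        return f"[{x}]"
    parts, start = [], i
    for k, p in table2[(i, j, q)]:   # [E2](2): one sub-syntax per split point
        parts.append(build(start, k, p, B, table2))
        start = k + 1
    return "[" + "".join(parts) + x + "]"

# Hand-written example for sentence [S1], one candidate per time point.
B = [["私は"], ["７時の"], ["電車で"], ["会社に"], ["行きます"]]
table2 = {
    (1, 5, 0): [(1, 0), (3, 0), (4, 0)],   # 私は, 電車で, 会社に depend on 行きます
    (2, 3, 0): [(2, 0)],                   # ７時の depends on 電車で
}
result = build(1, 5, 0, B, table2)
```

Here `result` is `[[私は][[７時の]電車で][会社に]行きます]`. Supporting ties would mean storing several lists per key and returning all expansions.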
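It goes without saying that the invention applies not only to Japanese but also to other languages, such as Korean, whose grammatical structure can be described by dependencies in the same way as Japanese.

[Operation]

According to the invention, for each subsequence $B_i, B_{i+1}, \ldots, B_j$ of the given sequence of bunsetsu sets $B_1, B_2, \ldots, B_N$, the optimal bunsetsu sequence with its last bunsetsu fixed, the optimal syntax over it, and its well-formedness are computed in order, starting from the shortest subsequences, and stored. They are then reused when the same quantities are computed for longer subsequences that contain those subsequences. The desired result is thus obtained efficiently without repeating the same computation.

[Embodiment]

The invention is described in detail below with reference to the drawings.

FIG. 1 shows one embodiment of an apparatus that implements the invention. In FIG. 1, SC is a buffer memory, such as RAM, that implements the table score used in step S5 of the flowchart of FIG. 2(B); it holds the certainty $S_j(x_{j,q})$ of each bunsetsu $x_{j,q}$ input from input terminal i1. BUF is a buffer memory, such as RAM, that holds the bunsetsu sets input from bunsetsu input terminal i2. For example, when the invention is used for speech recognition, the bunsetsu candidates output by the recognizer as recognition results are input from terminal i2, and the certainties attached to those bunsetsu are input from terminal i1. T1 and T2 are memories, such as RAM, that implement the tables table1 and table2 shown in FIGS. 3(A) and 3(B), respectively. COMBI is a device that generates the combinations for choosing several bunsetsu sets $B_{k(1)}, B_{k(2)}, \ldots, B_{k(m)}$ from among $B_1, B_2, \ldots, B_N$, and the combinations for choosing bunsetsu $x_{k(1),p(1)}, x_{k(2),p(2)}, \ldots, x_{k(m),p(m)}$ from the chosen sets. SEL1 is a device that selects from buffer memory BUF only the specific bunsetsu designated by the combination generator COMBI. PE is a device that computes $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ for the bunsetsu $x_1, x_2, \ldots, x_m, x$ read from BUF through selector SEL1. SEL2 is a device that selects from memory T1 only the specific information designated by COMBI. ADD1 is an adder that adds the output of the PEN calculator PE to the values read from memory T1 by selector SEL2. MIN is a minimum detector that detects the minimum output of adder ADD1 as COMBI generates the various combinations, together with the combination that attains it. ADD2 is an adder that adds the output of the minimum detector MIN to a specific value in buffer memory SC.

CONT is a controller that controls the order of operation of these parts. It has, for example, a central processing unit CPU, a memory MEM1 in ROM form that stores the control procedure for each part in advance, and a working memory MEM2 in RAM form. O1 and O2 are output terminals that output the computation results written to memories T1 and T2, respectively.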
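FIGS. 2(A) and 2(B) are flowcharts showing an example of the control procedure stored in advance in memory MEM1 of the embodiment of FIG. 1. The procedure successively obtains the well-formedness of the optimal syntax over the optimal bunsetsu sequence, and the sets of optimal split points and optimal bunsetsu numbers that determine the optimal bunsetsu sequence and the optimal syntax over it. It is explained below.

The flowcharts of FIGS. 2(A) and 2(B) require two three-dimensional tables, table1(i,j,q) and table2(i,j,q) ($1 \le i \le j \le N$, $1 \le q \le \mathrm{NUM}(j)$), as shown in FIGS. 3(A) and 3(B). Each table has as many rows and columns as the assumed bunsetsu sequence length $N$, and in column $j$ as many entries as the number $\mathrm{NUM}(j)$ of elements of the bunsetsu set. The indices of each table denote, from left to right, the row, the column, and the entry.

table1(i,j,q) stores the value of $\mathrm{OPT}(i,j;x_{j,q})$, and table2(i,j,q) stores the set of optimal split points and optimal bunsetsu numbers for $i, j, x_{j,q}$. The number $\mathrm{NUM}(k)$ of elements of the $k$-th bunsetsu set $B_k$ is input to and held in a one-dimensional table num(k). The certainty of the $p$-th bunsetsu $x_{k,p}$ in $B_k$ is input to and held in a two-dimensional table score(k,p). The function that computes $\mathrm{PEN}(x_{k_1,p_1}, x_{k_2,p_2}, \ldots, x_{k_m,p_m}, x_{j,q})$ is written pen((k1,p1),(k2,p2),…,(km,pm),(j,q)).

In the flowcharts of FIGS. 2(A) and 2(B), steps S1 through S13 increase the column number $j$ of each table from 1 to $N$ in steps of 1 and perform the following processing for column $j$.

Steps S2 through S11 decrease the row number $i$ of each table from $j$ down to 1 in steps of 1 and perform the following processing for row $i$.

Steps S3 through S9 increase the entry number $q$ of each table from 1 to num(j) in steps of 1 and perform the following processing for entry $q$.

(1) If step S4 determines that $i \ne j$, the procedure goes to step S5 and executes [F1], then executes [F2] in step S6.

[F1]

$$\mathrm{table1}(i,j,q) := \min_{\substack{i \le k_1 < k_2 < \cdots < k_m = j-1 \\ 1 \le m \le L \\ 1 \le p_1 \le \mathrm{num}(k_1),\ \ldots,\ 1 \le p_m \le \mathrm{num}(k_m)}} \bigl[ \mathrm{table1}(i,k_1,p_1) + \mathrm{table1}(k_1+1,k_2,p_2) + \cdots + \mathrm{table1}(k_{m-1}+1,k_m,p_m) + \mathrm{pen}((k_1,p_1),(k_2,p_2),\ldots,(k_m,p_m),(j,q)) \bigr] + \mathrm{score}(j,q)$$

[F2] table2(i,j,q) := the set of pairs $(k_1,p_1), (k_2,p_2), \ldots, (k_m,p_m)$ that attains the minimum in [F1]

(2) If step S4 determines that $i = j$, the procedure goes to S7 and executes [F3].

[F3] table1(i,j,q) := score(i,q)

The combinations over which the minimum in [F1] is taken, and whose minimizer is recorded by [F2], are generated by the combination generator COMBI of FIG. 1. The minimum over those combinations and the combination that attains it are detected by the minimum detector MIN. The bunsetsu needed by the PEN calculator PE are selected by selector SEL1, and the values table1(i,k_1,p_1), table1(k_1+1,k_2,p_2), …, table1(k_{m-1}+1,k_m,p_m) are read out by selector SEL2. The value of num(k) is held in buffer BUF. Through this processing, the above computation is applied to each row, column, and entry of table1 and table2, and the results are written successively into table1 and table2.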
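The loop structure of steps S1 to S13 with [F1] to [F3] can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the apparatus of FIG. 1: the tables are Python dictionaries, the candidate indices `q` and `p` are 0-based, and `score(k, q)` and `pen(deps, head)` are caller-supplied functions whose signatures are chosen here for convenience.

```python
from itertools import combinations, product

def fill_tables(B, score, pen, L):
    """Fill table1 (OPT values) and table2 (minimizing pairs) in flowchart order.

    B[k-1] is the k-th bunsetsu set. `deps` passed to pen is a list of (k, p)
    pairs and `head` is the pair (j, q), mirroring pen((k1,p1),...,(j,q)).
    """
    N = len(B)
    table1, table2 = {}, {}
    for j in range(1, N + 1):                  # S1..S13: columns 1..N
        for i in range(j, 0, -1):              # S2..S11: rows j down to 1
            for q in range(len(B[j - 1])):     # S3..S9: entries of column j
                if i == j:                     # [F3]
                    table1[i, j, q] = score(j, q)
                    continue
                best, arg = float("inf"), None
                for m in range(1, min(L, j - i) + 1):
                    # pick k_1 < ... < k_{m-1} from i..j-2; k_m is always j-1
                    for inner in combinations(range(i, j - 1), m - 1):
                        ks = inner + (j - 1,)
                        starts = (i,) + tuple(k + 1 for k in inner)
                        for ps in product(*(range(len(B[k - 1])) for k in ks)):
                            deps = list(zip(ks, ps))
                            val = sum(table1[s, k, p] for s, (k, p) in zip(starts, deps))
                            val += pen(deps, (j, q))
                            if val < best:
                                best, arg = val, deps
                table1[i, j, q] = best + score(j, q)   # [F1]
                table2[i, j, q] = arg                  # [F2]
    return table1, table2
```

Processing columns left to right and rows from `j` down to 1 guarantees that every `table1` entry read inside the innermost loop belongs to an earlier column and has already been written. Keeping all tied minimizers would only require appending to a list when `val == best`.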
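The computation ends when $j > N$ at step S13, at which point table1(1,N,q) holds $\mathrm{OPT}(1,N;x_{N,q})$ for $1 \le q \le \mathrm{NUM}(N)$. Since table2 holds the information on the optimal split points and optimal bunsetsu numbers, the optimal bunsetsu sequence and the optimal syntax can be constructed from this information by the method described in section (e).

In actual use, besides the apparatus of FIG. 1 and the flowcharts of FIGS. 2(A) and 2(B), a mechanism is needed that constructs the optimal bunsetsu sequence and the optimal syntax over it from the information in table2. The focus of the invention, however, is the computation of the contents of table1 and table2, and the mechanism that constructs the optimal bunsetsu sequence and syntax from this information is described only to the extent given above. Note that once the contents of table1 and table2 have been computed, the most expensive part of the computation needed to construct the optimal bunsetsu sequence and the optimal syntax over it from the given bunsetsu sets is already finished.

There may be several sets of pairs $((k_1,p_1), (k_2,p_2), \ldots, (k_m,p_m))$ that attain the minimum in [F1]. In that case, table2(i,j,q) should be able to store several such sets, and [F2] should store all of them in table2(i,j,q). Modifying the flowcharts of FIGS. 2(A) and 2(B) in this way leaves the amount of computation almost unchanged.

The embodiment above searches for minima because a smaller value of $S_k(x)$ was taken to mean a higher certainty of bunsetsu $x$, and a smaller value of PEN a higher dependency consistency. If instead a larger value of $S_k(x)$ means higher certainty and a larger value of PEN means higher consistency, maxima are sought in place of minima.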
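[Effects of the Invention]

As described above, according to the invention, for each subsequence $B_i, B_{i+1}, \ldots, B_j$ of the given bunsetsu sets $B_1, B_2, \ldots, B_N$, the optimal bunsetsu sequence with its last bunsetsu fixed, the optimal syntax over it, and its well-formedness are computed in order, starting from the shortest subsequences, and stored. They are then reused when the same quantities are computed for longer subsequences that contain those subsequences, so the desired result is obtained efficiently without repeating the same computation.

The consistency $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ with which several bunsetsu $x_1, x_2, \ldots, x_m$ simultaneously depend on one bunsetsu $x$ can be computed from statistical information such as the attributes of the words that make up each bunsetsu and the frequency of dependencies appearing in actual text. Its cost varies with factors such as how the language dictionary is organized, but as a guide, the numbers of additions and comparisons in the method of the invention and in the enumeration method are evaluated for the following case.

(1) Computing $\mathrm{PEN}(x,y)$ costs the equivalent of $J$ additions.

(2) Computing $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ costs the same as computing $\mathrm{PEN}(x_1,x) + \mathrm{PEN}(x_2,x) + \cdots + \mathrm{PEN}(x_m,x)$, that is, the equivalent of $(J+1) \cdot m - 1$ additions.

Further, every bunsetsu set is assumed to have $M$ elements. The parameters that determine the size of the problem are then, besides $J$, the following, already introduced above.

M: number of elements in each bunsetsu set
N: length of the bunsetsu sequence
L: upper limit on the number of bunsetsu that can depend on one bunsetsu simultaneously

Under these assumptions, the amount of computation is as follows.

(a) The present invention

(1) Additions: with ${}_iC_j$ denoting the binomial coefficient, the count is expressed through a function $f(n)$ [the defining formulas are not reproduced in this text].

(2) Comparisons: the count is expressed through a function $g(n)$ [the defining formulas are not reproduced in this text].

(b) Enumeration method

Let $\mathrm{knum}(n,L)$ be the number of dependency structures over a bunsetsu sequence of length $n$ in which at most $L$ bunsetsu depend on any one bunsetsu simultaneously.

(1) Additions: total number of additions $= \{\mathrm{knum}(N,L) + (J+1) \cdot (N-1) + (N-1)\} \cdot M^N$

(2) Comparisons: total number of comparisons $= \mathrm{knum}(N,L) \cdot M^N - 1$

$\mathrm{knum}(n,L)$ can be computed with a recurrence [not reproduced in this text].

Tables 2 and 3 list these total numbers of additions and comparisons for $J = 1$ and several values of $M$, $N$, and $L$.

These tables make the effect of the invention clear. For example, for $M = 5$, $N = 20$, $L = 5$, the amount of computation is reduced to about $1/10^{15}$ of that of the enumeration method.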
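The quantity knum(n, L) used in the enumeration counts can be computed by a short dynamic program. The sketch below is an independent counting routine written for this illustration and is not claimed to be the patent's recurrence, which this text does not reproduce. It counts syntaxes in the sense of [D1] in which no bunsetsu receives more than `L` dependents; the helper names `count` and `seqs` are hypothetical.

```python
from functools import lru_cache

def knum(n, L):
    """Count syntaxes over n bunsetsu where each bunsetsu receives at most L dependents."""

    @lru_cache(maxsize=None)
    def count(t):
        # Syntaxes over t bunsetsu: the last one receives m = 1..L sub-syntaxes.
        if t == 1:
            return 1
        return sum(seqs(t - 1, m) for m in range(1, min(L, t - 1) + 1))

    @lru_cache(maxsize=None)
    def seqs(t, m):
        # Ways to cover t consecutive bunsetsu with exactly m consecutive syntaxes.
        if m == 0:
            return 1 if t == 0 else 0
        return sum(count(a) * seqs(t - a, m - 1) for a in range(1, t + 1))

    return count(n)
```

With no effective limit, `knum(5, 4)` gives 14, the number of syntaxes over the five bunsetsu of [S1], while `knum(n, 1)` is always 1 because only the chain structure remains. Multiplying `knum(N, L)` by `M ** N` gives the number of candidates an enumeration would have to score.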
A language processing system for post-processing the output of the processor
Especially at some point in the sequence
When multiple phrase candidates are given, certainty of those candidates
And the degree at which multiple clauses are related to another
Considering the degree of consistency of
Phrase candidates at each time so that the optimal phrase sequence is constructed
One by one, select the clause and determine its syntax.
And the Japanese phrase or sentence in the bunsetsu sequence
Language processing system that calculates eligibility
It is. [Conventional technology]   Japanese speech recognizers and Japanese word processors
Order for use in language processing for post-processing
At the next few points,
The probability of those candidates and the dependency between the two phrases
Considering the degree of consistency of
Phrase candidates at each time so that the optimal phrase sequence is constructed
And select the clauses one by one, determine its syntax,
And the resulting Japanese phrase or sentence in the bunsetsu sequence
Algorithm to efficiently calculate eligibility as
Conventionally, for example, the following has been proposed.   Kazuhiko Ozeki: "Multi-stage decision a for selecting the optimal phrase sequence
Algorithm, "IEICE Speech Research Group, SP86-32
(July 1986) [Problems to be solved by the invention]   In Japanese, word order is said to be relatively free
However, in some cases, word order is a problem. Also,
Species cases cannot be associated with one predicate. like this
Phrasal sequence, taking into account the problem of word order and case conflicts
Is eligible as a Japanese sentence or phrase
Is calculated by considering the consistency of the dependency between the two phrases.
It is not enough to take into account that multiple clauses
We must use the consistency of the two clauses.
However, in the above-mentioned conventional method, Japanese phrases or sentences are used.
Eligibility is determined by the sum of the dependencies between the two phrases.
The solution could only be found when determined. Enumeration
In principle, this problem can be solved by using the method
But very difficult to apply to practical problems in terms of complexity
Met.   Therefore, an object of the present invention is to use Japanese sentences or phrases and
Eligibility is not the consistency of the dependency between the two phrases.
And that multiple clauses affect one other clause at the same time.
Even if based on the degree, at some point in the sequence
From the given phrase set,
Choose the appropriate phrase sequence, determine the syntax above it,
And the resulting Japanese phrase or sentence in the bunsetsu sequence
Language processing system that efficiently calculates eligibility
To provide. [Means to solve the problem] (A) Japanese structure in units of phrases   Before describing the configuration of the present invention,
This section describes the structure of Japanese in terms of.   A sentence in Japanese or a set of phrases is simply called a phrase.
Thought to be based on a broader qualifying relationship between positions
Can be For example, [S1] "I go to work on the 7 o'clock train." In the Japanese sentence "I am", "7 o'clock",
"By train", "To company", "Go" are phrases
And "I", "by train" and "to the company" are all
"Go" is qualified, "7 o'clock" is qualified by "train"
By doing so, one sentence is composed.   When clause x modifies clause y, x is related to y and y
Is said to receive x. In addition, such a modification relationship
It is called receiving.   The bunsetsu sequence forms a united phrase or sentence in Japanese
To satisfy the following conditions between the clauses:
It is considered necessary to have a dependency
You. [C1] Clauses other than the last one are at the end of the sentence
Pertains to one of the clauses. [C2] The dependency between two phrases is
Do not cross with the dependency.   Conditions [C1] and [C2] are defined as follows:
Sentence ". [D1] (1) When x is a clause, [x] is “syntax”
You. (2) X₁, X_Two,…, X_mIs "syntax" and x is a clause, [X₁X_Two… X_mx] Is "syntax". [D2] Clause sequence x₁x_Two… X_nWith proper parentheses and syntax
What you did, x₁x_Two… X_nThe above syntax is called. Phrase string x₁
x_Two… X_nThe whole syntax above K (x₁x_Two… X_n) Will be expressed as follows.   Syntax [X₁X_Two… X_mx], X₁= [… X₁], X_Two= [… X_Two],…,
X_m= [… X_m], X₁, x_Two,…, X_mIs related to x
Promise to represent that in the syntax in the above sense
Satisfies the conditions [C1] and [C2], and conversely, the condition [C1]
Dependency relations in a bunsetsu sequence that satisfies
It can be represented by the syntax in the above sense.   By the way, for one phrase string, there are multiple syntaxes
Exists. For example, [[I] [at 7 o'clock] [by train] [to the company] [go
You] [[[[[[I]] 7 o'clock] by train] go to company]] [[[I] [at 7 o'clock] by train] [go to work]] [[[I] [going to the company] [by the [7:00] train]] [[I] [going to the [train]] [to the company] Are all the syntax on the phrase string of the example sentence [S1]. This
Select a well-qualified clause sequence from many syntaxes
To do this, some sort of evaluation function is needed. There
First, the degree of consistency of the dependency is determined as follows
And [C3] Clause x₁, x_Two,…, X_mIs related to clause x
Is a non-negative function PEN (x₁, x_Two,…, X_m, x) Is represented by   The value of PEN is, for example, in the range of 0 to 100.
I promise that the consistency is high. How the function pen
It is a very important issue to determine
The explanation is omitted because it is not the main point.   With the above preparation, the eligibility P (X) of the syntax X is
Is determined recursively as follows. [D3] (1) When X = [x], where x is a clause, P
(X) = 0, (2) X = [X₁X_Two…… X_mx], X₁= [… X₁], X_Two= [… X_Two],…, X_m[… X_m]When, P (X) = P (X₁) + P (X_Two) + ... + P (X_m) + PEN (x₁, X_Two,…, X_m, x)   The value of P (X) defined in this way is
It is the sum of the PEN values for loose dependencies
I have. (B) Question setting   In the following, to make long formulas easier to readInstead of   Σ (i ≦ m ≦ j) [f (m)] Notation is used. The same applies to min, argmin, ∪, etc.
Express as follows. Parentheses [,] are omitted if there is no risk of confusion.
Sometimes.   Now consider the following situation: [J1] At each time point from 1 to N, a set of clauses B₁, B
_Two,…, B_NIs given. In addition, each phrase set B_k
For a function S that takes a nonnegative real value_kIs stipulated
RU:   S_k: B_k→ [0, ∞] S_k(X) is phrase set B_kNumerical value representing the certainty of clause x in
For example, assume that the value takes a value from 0 to 100.
It is assumed that the certainty is high. Taking voice recognition as an example, S_k
(X) is the recognition device, B_kThe recognition result of x in the
It is a numerical value that indicates whether it was recognized with certainty
Some speech recognizers output such values along with the recognition results.
It is designed to help. Also, the date of the kana-kanji conversion method
Taking a Japanese word processor as an example, there are homonyms
Output multiple phrase candidates with the same reading
However, each candidate has a kanji or idiom according to the frequency of use.
A numerical value indicating certainty can be attached. Of syntax X
The certainty and the clause x₁, x_Two,…, X_mThe certainty of
Based on the following definition.   S (X) = Σ (1 ≦ k ≦ m) [S_k(X_k)]   Also, the phrase set sequence A₁, A_Two,…, A_mAgainst   KB (A₁A_Two… A_m) = ｛X | X∈K (x₁x_Two… X_m), X₁∈A₁, x_Two∈A_Two,…, X_m∈A_m｝ Notation is prepared.   Under these circumstances, the problems addressed by the present invention are as follows:
It can be stated as follows. [P1] Find the following. (1) min (X∈KB (B₁B_Two… B_N)) [P (X) + S
(X)] (2) argmin (X∈KB (B₁B_Two… B_N)) [P (X) + S
(X)]   KB (B₁B_Two… B_N) Is a finite set, so the problem is
Physically, it can be solved by the enumeration method. That is,
Any X∈KB (B₁B_Two… B_N), Based on the definition formula   P (X) + S (X) Gives the minimum value and the minimum value by calculating
X can be determined. However, KB (B₁B_Two… B_N)
Increases rapidly with N, as shown in Table 1,
Applying the enumeration method to practical problems due to the huge number
It is extremely difficult to do.   In contrast, according to the present invention, this problem is
It can be solved within the complexity.   Function PEN is   PEN (x₁, x_Two,…, X_m, x)   = PEN (x₁, x) + PEN (x_Two, x) + ... + PEN (x_m, x) And the sum of the degrees of dependency between the two phrases
In such cases, this problem can be solved by the above-mentioned conventional method.
Can be solved efficiently. In contrast, the present invention provides
For cases where the number PEN cannot be decomposed into such a sum
Things. (C) Recursive equation   Here, recursion plays a fundamental role in the present invention.
The equation is described. Embedded in some syntax
Focusing on one syntax, the clauses in it
Can have a dependency relationship with the clause of
It is only a clause. With this in mind, the following definition was made:
You. [D4] 1 ≦ i ≦ j ≦ N and each x∈B_jAgainst (1) OPT (i, j; x) = min (X∈KB (B_iB_{i + 1}… B
_j-1{X}) [P (X) + S (X)] (2) OPK (i, j; x) = argmin (X∈KB (B_iB_{i + 1}… B
_j-1{X}) [P (X) + S (X)]   The following recursive equations hold for OPT and OPK. [E1] (1) OPT (i, i; x) = S_i(X) (2) For i <j   OPT (i, j; x)   = Min (i-1 = k (0) <k (1) <k (2) <...
<K (m) = j−1, x₁∈B_{k (1)}, x_Two∈B_{k (2)},…, X_m∈
B_{k (m)}) [OPT (k (0) + 1, k (1): x₁) + OPT (k (1) +1,
k (2); x_Two) + ... + OPT (k (m-1) + 1, k (m);
x_m) + PEN (x₁, X_Two,…, X_m, x)] + S_j(X) [E2] (1) OPK (i, i; x) = [x] (2) k that gives the minimum value in [E1] for i <i
(0), k (1), k (2), ..., k (m), x₁, x_Two,…, X_mTo
(0), (1), (2), ..., (),₁,_Two,
…,_mThen   OPK (i, j; x)   = [OPK ((0) +1, (1);₁) OPK (
(1) +1, (2);₂)… OPK ((-1) +1,
();_m) X] The proof of [E1] and [E2] is easy if we pay attention to the following facts.
Details are omitted here. [E3] KB (B_iB_{i + 1}… B_j-1{X})   = ∪ (i-1 = k (0) <k (1) <k (2) <... <
k (m) = j-1, x₁∈B_{k (1)}, x_Two∈B_{k (2)},…, X_m∈B_{k (m)}) ｛[X₁X_Two… X_mx] | X₁∈KB (B_{k (o) +1}… B_{k (1)}｛X₁｝), X_Two
∈KB (B_{k (1) +1}… B_{k (2)}｛X_Two｝),…, X_m∈KB (B_{k (m-1) +1}
… B_{k (m)}｛X_m｝)   Now, the syntax [X₁X_Two… X_mx], X₁= [… X₁], X_Two= […
x_Two],…, X_m[… X_m], X₁, x_Two,…, X_mIs related to x
You. That is, x is x₁, x_Two,…, X_mTo receive
It is considered that there is a certain limit to the number of clauses received by one clause.
May be. Then, when this upper limit is L,   1 ≦ m ≦ L The solution may be obtained in the range of. Imposing such restrictions
Then, [E1] becomes as follows. [E1 '] (1) OPT (i, i; x) = S_i(X) (2) For i <j   OPT (i, j; x)   = Min (i-1 = k (0) <k (1) <k (2) <...
<K (m) = j-1,1 ≦ m ≦ L, x₁∈B_{k (1)}, x_Two∈B_{k (2)},…,
x_m∈B_{k (m)}) [OPT (k (0) = 1, k (1); x₁) + OPT
(K (1) + 1, k (2); x_Two) + ... + OPT (k (m-1)
+1, k (m); x_m) + PEN (x₁, x_Two,…, X_m, x)] + S_j(X)
If L ≧ N−1, [E1 ′] and [E1] have the same value. You
That is, [E1 '] includes [E1]. Therefore, [E
[E1 '] is used instead of [1]. [E2] this
It doesn't change even if you put restrictions like. In [E1 ']
K (1), k (2), ..., k (m), x₁, x_Two,…,
x_mIs the optimal partition point for i, j, x and their partition points
Will be called the optimal clause.   Below is B_kLet NUM (k) be the original number of and B_kB to_k=
｛X_{k, 1}, x_{k, 2},…, X_{k, NUM (k)}｝ And enumerated. (D) OPT, optimal segmentation point, optimal clause number calculation method   (2) of [E1 '] is OPT (s, t; x when i <j_{t, p})
(I ≦ s ≦ t ≦ j−1,1 ≦ p ≦ NUM (t)) has already been calculated.
OPT (i, j; x_{j, q}) Can be calculated
ing. When (1) of [E1 '] is used, i = j
Sometimes OPT (i, i; x_{i, q}) = S_i(X_{i, q}) Determines the value
So, after all, OPT (i, j; x_{j, q}) To j
-Calculate starting from the part where i is 0 and gradually increasing
And at the same time determine the optimal segmentation point and optimal clause number
You can go. (E) Optimal syntax and its eligibility calculation method   For simplicity, the optimal breakpoint and optimal clause number are uniquely determined.
Will be described. First,   min (X∈KB (B₁B_Two… B_N)) [P (X) + S (X)]   min (1 ≦ q ≦ NUM (N)) min (x∈KB) (B₁B_Two… B
_N-1｛X_{N, q}｝) [P (X) + S (X)]   = Min (1 ≦ q ≦ NUM (N))] OPT (1, N; x_{N, q})] Because   q0 = argmin (1 ≦ q ≦ NUM (N)) [OPT (1, N;
x_{N, q})] Is calculated to obtain the last clause x of the optimal clause sequence._{N, q0}
And optimality of optimal syntax OPT (1, N; x_{N, q0}) Is determined.   In addition, the optimal clause sequence and the optimal syntax   OPK (1, N; x_{N, q0}) Given by To calculate this more specifically:
You can do it.   1, N, x_{N, q0}K1 (1), k1
(2), ..., k1 (m1), and the corresponding optimal clause number
If p1 (1), p1 (2), ..., p1 (m1), [E2]
By (2)   OPK (1, N; x_{N, q0})   = [OPK (1, k1 (1); x_{k1 (1), p1 (1)}) OPK (k1
(1) + 1, k1 (2); x_{k1 (2), p1 (2)})… OPK (k1
(M-1) + 1, k1 (m); x_{k1 (m), p1 (m)})
X_{N, q0}] Holds. OPK on the right side (k1 (r-1) +1, k1 (r); x
_{k1 (r), p1 (r)}) (1 ≦ r ≦ m1), for example, OPK
(1, k1 (1); x_{k1 (1), p1 (1)}), 1 = k1
(1)   OPK (1, k1 (1); x_{k1 (1), p1 (1)})   = [X_{k1 (1), p1 (1)}] , So OPK (1, N; x_{N, q0}) Is   OPK (1, N; x_{N, q0})   = [[X_{k1 (1), p1 (1)}] OPK (k1 (1) + 1, k1
(2);   x_{k1 (2), p1 (2)}) ... OPK (k1 (m-1) + 1, k1
(M);   x_{k1 (m), p1 (m)}) X_{N, q0}] Can be rewritten. If 1 ≠ k1 (1),
1, k1 (1), x_{k1 (1), p1 (1)}K is the optimal segment point for
2 (1), k2 (2), ..., k2 (m2), corresponding optimal clause number
The symbols are p2 (1), p2 (2), ..., p2 (m2)
When,   OPK (1, k1 (1); x_{k1 (1), p1 (1)})   = [OPK (1, k2 (1); x_{k2 (1), p2 (1)}) OPK (k2
(1) + 1, k2 (2); x_{k2 (2), p2 (2)})… OPK (k2
(M-1) + 1, k2 (m); x_{k2 (m), p2 (m)}) X
_{k1 (2), p1 (2)}] , So OPK (1, N; x_{N, q0}]   OPK (1, N; x_{N, q0})   = [[OPK (1, k2 (1); x_{k2 (1), p2 (1)}) OPK (k
2 (1) + 1, k2 (2); x_{k2 (2), p2 (2)})… OPK (k2
(M-1) + 1, k2 (m); x_{k2 (m), p2 (m)}) X
_{k1 (2), p1 (2)}] OPK [k1 (1) -1, k1 (2); x
_{k1 (2), p1 (2)}]   OPK (k1 (m-1) + 1, k1 (m);
x_{k1 (m), p1 (m)}) X_{N, q0}] Can be rewritten. Such operations appear
Repeat until all OPKs have a single clause syntax
If you return, OPK (1, N; x_{N, q0}), That is, the optimal phrase sequence and its
Can be determined at the same time.   i and j are phrase set numbers, and x is B_jIf the phrase belongs to
, K (1), k (2),..., K
(M), and optimal clause x₁, x_Two,…, X_mThere are multiple pairs of
At that time, then the above
Perform the above operation and enumerate all the obtained syntaxes.   The present invention is applicable not only to Japanese
Can be described in the same way as Japanese, like a word
Needless to say, it can be applied to foreign languages that have a grammatical structure.
Absent. [Action]   According to the present invention, a given phrase set sequence B₁, B_Two,…, B_N
Subsequence B of_i, B_{i + 1},…, B_jFixed the last clause for
Optimal bunsetsu sequence, optimal syntax on it, and its optimal
Degrees are determined sequentially from the ones corresponding to substrings with shorter lengths
Memorize it and include those substrings longer
Use them when calculating the same for subsequences.
Efficiency without having to repeat the same calculations
It is possible to obtain the expected result. [Example]   Hereinafter, the present invention will be described in detail with reference to the drawings.   FIG. 1 shows an embodiment of an apparatus for carrying out the present invention. No.
In FIG. 1, SC is a flowchart shown in FIG. 2 (B).
To achieve the table score in step S5 in
Buffer memory such as RAM of the
Each sentence x input from_{j, q}Of certainty S_j(X_{j, q}Hold)
It is for. BUF is input from the phrase input terminal i2.
Memory such as RAM that holds a set of clauses
It is. For example, when using the present invention for speech recognition,
Each phrase candidate output as a recognition result from the device is connected to terminal i2.
From the terminal i1 and the certainty associated with those clauses
Input from T1 and T2 are respectively shown in FIG.
Table B and table2 shown in (B) are realized.
This is a memory such as a RAM. COMBI is phrase set B₁, B
_Two,…, B_NSome phrase sets B from_{k (1)}, B_{k (2)},…, B
_{k (m)}And select it from the selected phrase set
Each phrase x_{k (1), p (1)}, x_{k (2), p (2)},…, X
_{k (m), p (m)}With a device that generates combinations to select
is there. SEL1 is the combination generator C from the buffer memory BUF
A device that selects only specific phrases specified by OMBI
is there. PE is transferred from the buffer memory BUF via the selector SEL1.
Sentence x read out₁, x_Two,…, X_m, x, PEN (x₁, x
_Two,…, X_m, x). SEL2 is memory T1
Specific information specified by the combination generator COMBI.
It is a device that selects only information. ADD1 is a PEN calculator PE
Output and read from memory T1 by selector SEL2
This is an adder for adding the calculated values. MIN occurs in combination
Adder ADD when device COMBI generates various combinations
Minimum value of output of 1 and minimum to detect the combination at that time
It is a value detector. ADD2 is connected to the output of the
This is an adder that adds a specific numerical value in the
You.   CONT is a control device for controlling the operation order of these parts.
For example, a central processing unit CPU and control means for each unit.
Memory MEM1 in the form of ROM for storing the order in advance
And a memory MEM2 in the form of a working RAM. 01 you
And 02 are the calculation results written to memories T1 and T2, respectively.
This is the output terminal that outputs the result.   2 (A) and (B) show the embodiment shown in FIG.
Control methods previously stored in the memory MEM1
Eligibility of optimal syntax on optimal clause sequence as an example of order
To determine the degree, optimal clause sequence, and optimal syntax
A procedure for sequentially finding the optimal segmentation point and optimal clause number pairs
It is a flowchart shown. This is described below.
You.   Attached to the flow charts of FIGS. 2 (A) and (B)
As shown in FIGS. 3A and 3B,
Number of rows and columns equal to the length of the clause column N, and the j-th column
With a term equal to the original number NUM (j) of the clause set
Two three-dimensional tables table1 (i, j, q) and table
2 (i, j, q) (1 ≦ i ≦ j ≦ N, 1 ≦ q ≦ NUM (j)) is required
It is. Subscripts in each table indicate rows, columns, and terms in order from the left.
table1(i, j, q) stores the value of $\mathrm{OPT}(i, j; x_{j,q})$, and table2(i, j, q) stores the set of optimal segmentation points and optimal clause numbers for i, j, $x_{j,q}$. The number of elements NUM(k) of the k-th clause set $B_k$ is input to and held in the one-dimensional table num(k). The certainty of the p-th clause $x_{k,p}$ in the k-th clause set $B_k$ is input to and held in the two-dimensional table score(k, p). Furthermore, $\mathrm{PEN}(x_{k_1,p_1}, x_{k_2,p_2}, \ldots, x_{k_m,p_m}, x_{j,q})$ is calculated as pen((k₁, p₁), (k₂, p₂), ..., (k_m, p_m), (j, q)).

In the flowcharts of FIGS. 2(A) and 2(B), steps S1 to S13 increase the column number j of each table by 1 from 1 to N and perform the following processing on the j-th column.

Steps S2 to S11 start the row number i of each table at j, decrease it by 1 down to 1, and perform the following processing on the i-th row.

Steps S3 to S9 start the term number q of each table at 1, increase it by 1 up to num(j), and perform the following processing on the q-th term.

(1) If step S4 determines that i ≠ j, the procedure proceeds to step S5 and executes [F1] below, and then executes [F2] in step S6.

[F1]
$$\mathrm{table1}(i,j,q) := \min\Bigl[\mathrm{table1}(i,k_1,p_1) + \mathrm{table1}(k_1+1,k_2,p_2) + \cdots + \mathrm{table1}(k_{m-1}+1,k_m,p_m) + \mathrm{pen}\bigl((k_1,p_1),(k_2,p_2),\ldots,(k_m,p_m),(j,q)\bigr)\Bigr] + \mathrm{score}(j,q)$$
where the minimum is taken over $i \le k_1 < k_2 < \cdots < k_m = j-1$, $1 \le m \le L$, $1 \le p_1 \le \mathrm{num}(k_1)$, $1 \le p_2 \le \mathrm{num}(k_2)$, ..., $1 \le p_m \le \mathrm{num}(k_m)$.

[F2]
table2(i, j, q) := the pairs $(k_1,p_1), (k_2,p_2), \ldots, (k_m,p_m)$ that give the minimum value in [F1].

(2) If step S4 determines that i = j, the procedure proceeds to step S7 and executes [F3] below.

[F3]
$$\mathrm{table1}(i,j,q) := \mathrm{score}(i,q)$$

The combinations in [F2] are generated by the combination generator COMBI shown in FIG. 1. The minimum value over those combinations, and the combination that gives it, are detected by the minimum value detector MIN. The clauses that the PEN calculator PE needs in order to calculate PEN are selected by the selector SEL1, and the values of table1(i, k₁, p₁), table1(k₁+1, k₂, p₂), ..., table1(k_{m−1}+1, k_m, p_m) are read out by the selector SEL2. The value of num(k) is held in memory BUF. Through the above processing, the calculation described above is applied to every row, column, and term of table1 and table2, and the results are written to table1 and table2.

The calculation ends when j > N in step S13.
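The loop order of steps S1 to S13 and the updates [F1] to [F3] can be sketched in software as follows. This is a minimal sketch rather than the hardware of FIG. 1: the names `fill_tables` and `splits`, the dictionary tables keyed by 1-based (i, j, q), and the call signature `pen(deps, head)` are all assumptions introduced for illustration, and the `pen` function used in the example is a toy penalty, not one defined in the specification.

```python
import math
from itertools import combinations, product


def splits(i, j, L):
    """Yield (k_1, ..., k_m) with i <= k_1 < ... < k_m = j - 1, 1 <= m <= L."""
    inner = range(i, j - 1)  # candidate values for k_1 .. k_{m-1}
    for m in range(1, min(L, j - i) + 1):
        for head in combinations(inner, m - 1):
            yield head + (j - 1,)


def fill_tables(score, pen, L):
    """Fill table1 and table2; score[k-1][p-1] is the certainty of x_{k,p}."""
    N = len(score)
    num = {k: len(score[k - 1]) for k in range(1, N + 1)}
    table1, table2 = {}, {}
    for j in range(1, N + 1):                    # steps S1 to S13: columns
        for i in range(j, 0, -1):                # steps S2 to S11: rows, j down to 1
            for q in range(1, num[j] + 1):       # steps S3 to S9: terms
                if i == j:                       # [F3]
                    table1[i, j, q] = score[i - 1][q - 1]
                    continue
                best, arg = math.inf, None
                for ks in splits(i, j, L):
                    # Sub-span n runs from row k_{n-1} + 1 to column k_n.
                    starts = (i,) + tuple(k + 1 for k in ks[:-1])
                    for ps in product(*(range(1, num[k] + 1) for k in ks)):
                        deps = tuple(zip(ks, ps))
                        total = sum(table1[s, k, p] for s, (k, p) in zip(starts, deps))
                        total += pen(deps, (j, q))
                        if total < best:         # keep the first minimum found
                            best, arg = total, deps
                table1[i, j, q] = best + score[j - 1][q - 1]   # [F1]
                table2[i, j, q] = arg                          # [F2]
    return table1, table2


# Toy input: three clause sets; the penalty grows with the number of dependents.
score = [[1, 5], [2], [3, 0]]
pen = lambda deps, head: 10 * (len(deps) - 1)
table1, table2 = fill_tables(score, pen, L=2)
```

Every entry read on the right-hand side of [F1] lies in a column smaller than j, so it has already been written by the time column j is processed; this is why the column loop is the outermost one.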
Once the calculation has ended, table1(1, N, q) holds $\mathrm{OPT}(1, N; x_{N,q})$ ($1 \le q \le \mathrm{NUM}(N)$). Since table2 holds the information on the optimal segmentation points and optimal clause numbers, the optimal clause sequence and the optimal syntax can be constructed from this information by the method described in (4)(e).

When the present invention is actually used, a mechanism that constructs the optimal clause sequence and the optimal syntax on it from the tables is needed in addition to the apparatus shown in FIG. 1 and the flowcharts shown in FIGS. 2(A) and 2(B). However, because the main point of the present invention lies in calculating the contents of table1 and table2, the mechanism for constructing the optimal clause sequence and the optimal syntax from this information is described only to the extent given above.
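One way such a construction mechanism could work is sketched below. It is an illustration consistent with what [F2] stores, not a reproduction of the method of (4)(e); the names `best_final` and `reconstruct`, the dictionary layout of the tables, and the representation of the syntax as a list of (dependent, head) pairs are assumptions made here.

```python
def best_final(table1, N, num_N):
    """Pick the term q of row 1, column N with the smallest table1 value."""
    return min(range(1, num_N + 1), key=lambda q: table1[1, N, q])


def reconstruct(table2, i, j, q):
    """Rebuild the clause sequence and dependencies for span i..j ending in x_{j,q}.

    table2[i, j, q] is assumed to hold the pairs (k_1, p_1), ..., (k_m, p_m)
    recorded by [F2]. Each pair closes a sub-span whose final clause
    x_{k_n, p_n} depends on x_{j,q}.
    """
    if i == j:
        return [(j, q)], []
    clauses, edges = [], []
    start = i
    for k, p in table2[i, j, q]:
        sub_clauses, sub_edges = reconstruct(table2, start, k, p)
        clauses += sub_clauses
        edges += sub_edges
        edges.append(((k, p), (j, q)))  # x_{k,p} depends on x_{j,q}
        start = k + 1
    clauses.append((j, q))
    return clauses, edges


# Hand-written example tables for N = 3 (hypothetical values).
table1 = {(1, 3, 1): 6, (1, 3, 2): 3}
table2 = {(1, 2, 1): ((1, 1),), (1, 3, 2): ((2, 1),)}
q_best = best_final(table1, N=3, num_N=2)
clauses, edges = reconstruct(table2, 1, 3, q_best)
```

Each table2 entry is visited at most once along the recursion, so this step is cheap compared with filling the tables.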
Note, however, that once the contents of table1 and table2 have been calculated, the computationally heavy part of the calculation needed to construct the optimal clause sequence and the optimal syntax on it is already complete.

In [F1] there may be more than one set of numerical pairs ((k₁, p₁), (k₂, p₂), ..., (k_m, p_m)) that gives the minimum value. In that case, table2(i, j, q) may be arranged so that it can store several sets of numerical pairs, and all of them may be stored in table2(i, j, q). Even if the flowcharts shown in FIGS. 2(A) and 2(B) are changed in this way, the amount of calculation hardly changes.

The embodiment described above performs processing that finds the minimum value. This is because it assumes that the smaller the value of $S_k(x)$, the higher the certainty of clause x, and the smaller the value of PEN, the higher the degree of consistency of the dependency. If instead a larger value of $S_k(x)$ means a higher certainty and a larger value of PEN means a higher degree of consistency, it suffices to perform processing that finds the maximum value instead of the minimum value.

[Effect of the Invention]

As described above, according to the present invention, for each subsequence $B_i, B_{i+1}, \ldots, B_j$ of the given sequence of clause sets $B_1, B_2, \ldots, B_N$, the optimal clause sequence with the final clause fixed, the optimal syntax on it, and its eligibility are obtained and stored sequentially, starting from those that correspond to short subsequences. When the same kind of calculation is performed for a longer subsequence that contains those subsequences, the stored results are reused, so the desired result is obtained efficiently without repeating the same calculation.

The degree of consistency $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ of a plurality of clauses $x_1, x_2, \ldots, x_m$ simultaneously depending on one clause x can be calculated from the attributes of the words that make up the clauses and from statistical information such as the dependencies that appear in actual sentences.
The amount of calculation also depends on, among other things, how the language dictionary is constructed. As a guide, however, the numbers of addition and comparison operations in the present invention and in the enumeration method are evaluated below, assuming the following.

(1) Calculating PEN(x, y) requires a total of J arithmetic operations.
(2) Calculating $\mathrm{PEN}(x_1, x_2, \ldots, x_m, x)$ requires the same amount of calculation as calculating $\mathrm{PEN}(x_1, x) + \mathrm{PEN}(x_2, x) + \cdots + \mathrm{PEN}(x_m, x)$, that is, $(J+1) \cdot m - 1$ additions.

Further assume that the number of elements of every clause set is equal to M. Then, in addition to J, the parameters that determine the size of the problem to be solved are the following.

M: number of elements in each clause set
N: length of the clause sequence
L: upper limit on the number of clauses that can depend on one clause at the same time

Under the above assumptions, the amount of calculation is as follows.

(A) The present invention
(1) Additions: the total is expressed using a function f(n), where ${}_iC_j$ denotes a binomial coefficient. [equations not reproduced in this text]
(2) Comparisons: the total is expressed using a function g(n). [equations not reproduced in this text]

(B) The enumeration method
Let knum(n, L) be the number of dependency structures on a clause sequence of length n in which the number of clauses depending on one clause at the same time is L or less. Then:
(1) Additions: total number of additions $= \{\mathrm{knum}(N, L) + (J+1) \cdot (N-1) + (N-1)\} \cdot M^N$
(2) Comparisons: total number of comparisons $= \mathrm{knum}(N, L) \cdot M^N - 1$

knum(n, L) can be calculated using a recurrence formula. [equation not reproduced in this text]

Tables 2 and 3 list the total numbers of additions and comparisons calculated with J = 1 for several values of M, N, and L.

These tables make the effect of the present invention clear. For example, when M = 5, N = 20, and L = 5, the amount of calculation is reduced to about $1/10^{15}$ of that of the enumeration method.
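To give a feel for how quickly the enumeration method grows, knum(n, L) can be counted with the same decomposition that [F1] uses: the last clause is the head, and the preceding clauses split into at most L contiguous sub-spans whose final clauses depend on it. The sketch below is not taken from the recurrence formula of the specification. It assumes that a dependency structure is one in which every clause except the last depends on exactly one later clause and dependencies do not cross, and the function names are hypothetical.

```python
from functools import lru_cache


def knum(n: int, L: int) -> int:
    """Count dependency structures on n >= 1 clauses with at most L dependents per clause."""

    @lru_cache(maxsize=None)
    def structures(length: int) -> int:
        # A single clause has exactly one (empty) structure.
        if length == 1:
            return 1
        # The last clause is the head; the rest splits into 1..L sub-spans.
        return sum(forests(length - 1, m) for m in range(1, L + 1))

    @lru_cache(maxsize=None)
    def forests(length: int, parts: int) -> int:
        # Ways to cut `length` clauses into exactly `parts` structured sub-spans.
        if parts == 0:
            return 1 if length == 0 else 0
        if length < parts:
            return 0
        return sum(
            structures(first) * forests(length - first, parts - 1)
            for first in range(1, length + 1)
        )

    return structures(n)


def enumeration_comparisons(M: int, N: int, L: int) -> int:
    """Total comparisons of the enumeration method: knum(N, L) * M**N - 1."""
    return knum(N, L) * M ** N - 1
```

Under these assumptions, when L does not restrict anything the counts for n = 1, ..., 5 are 1, 1, 2, 5, 14. The factor $M^N$ alone is already $5^{20} = 95{,}367{,}431{,}640{,}625$ for M = 5 and N = 20.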

[Brief Description of the Drawings]
FIG. 1 is a block diagram showing an embodiment of an apparatus for carrying out the present invention. FIGS. 2(A) and 2(B) are flowcharts showing an example of its control procedure. FIGS. 3(A) and 3(B) are table structure diagrams showing an example of the tables required when executing the flowcharts of FIG. 2.

SC: RAM for holding clause certainty
BUF: RAM for holding clause sets
T1: RAM for table1
T2: RAM for table2
SEL1: data selector
SEL2: data selector
PE: dependency consistency calculator
COMBI: combination generator
ADD1: adder
ADD2: adder
MIN: minimum value detector
CPU: central processing unit
MEM1: ROM for storing the control procedure
MEM2: working RAM for the CPU
CONT: control device that controls the operation order of each part
i1: clause certainty input terminal
i2: clause input terminal
01: output terminal for the results obtained in memory T1
02: output terminal for the results obtained in memory T2

(56) References
IEICE Technical Report, Vol. 86, No. 93 (July 16, 1986), pp. 41-48 (SP86-32)
IEICE Technical Report, Vol. 86, No. 221 (November 20, 1986), pp. 47-57 (COMP86-47)

Claims

(57) [Claims]
A language processing system which, given a sequence of clause sets and a numerical value representing the certainty of each clause in the sequence of clause sets, determines the optimal clause sequence under the condition that one clause is selected from each clause set, the optimal syntax on that clause sequence, and its degree of eligibility, by minimizing or maximizing the sum of the degree of consistency of a plurality of clauses simultaneously depending on one clause and the certainty of each clause, the language processing system comprising:

first buffer memory means for holding the certainty of each clause in each of the input clause sets;

second buffer memory means for holding each of the input clause sets;

a memory holding a first table and a second table in the form of two-dimensional upper triangular matrices having a number of rows and a number of columns equal to the length N of the sequence of clause sets, the first table and the second table being configured three-dimensionally by dividing each cell into a number of terms NUM(j) equal to the number of elements of the clause set whose number equals the column number j;

means for storing, in the i-th row, i-th column, r-th term of the first table, the certainty of the r-th clause in the i-th clause set;

combination generating means for generating, for a predetermined integer L smaller than N and integers i, j satisfying 1 ≦ i < j ≦ N, sets of integers (m, k(0), k(1), k(2), ..., k(m), p(1), p(2), ..., p(m)) consisting of an integer m satisfying 1 ≦ m ≦ L, integers k(0), k(1), k(2), ..., k(m) satisfying i−1 = k(0) < k(1) < k(2) < ... < k(m) = j−1, and, for each n with 1 ≦ n ≦ m, an integer p(n) satisfying 1 ≦ p(n) ≦ NUM(k(n));

first selecting means for selecting, for a set of integers (m, k(0), k(1), k(2), ..., k(m), p(1), p(2), ..., p(m)) generated by the combination generating means and for each integer n satisfying 1 ≦ n ≦ m, the p(n)-th clause of the k(n)-th clause set among the clause sets held in the second buffer memory means;

second selecting means for selecting, for a set of integers (m, k(0), k(1), k(2), ..., k(m), p(1), p(2), ..., p(m)) generated by the combination generating means and for each integer n satisfying 1 ≦ n ≦ m, the p(n)-th term in the (k(n−1)+1)-th row and k(n)-th column;

calculating means for calculating the degree of consistency of the p(n)-th clauses of the k(n)-th clause sets selected by the first selecting means simultaneously depending on the q-th clause in the j-th clause set held in the second buffer memory means;

first adding means for calculating the sum of the output of the calculating means and the numerical values held in the p(n)-th terms in the (k(n−1)+1)-th rows and k(n)-th columns selected by the second selecting means;

minimum value detecting means for obtaining the minimum value of the output of the first adding means over all of the sets of integers (m, k(0), k(1), k(2), ..., k(m), p(1), p(2), ..., p(m)), and the set of integers that gives the minimum value;

second adding means for adding, to the minimum value that is a first output of the minimum value detecting means, the certainty of the q-th clause in the j-th clause set held in the first buffer memory means;

means for storing the output of the second adding means in the i-th row, j-th column, q-th term of the first table;

means for storing the set of integers that is a second output of the minimum value detecting means in the i-th row, j-th column, q-th term of the second table;

calculation order control means for controlling the values of the integers i, j, and q so that each row, column, and term of the first and second tables is filled in turn with the calculated values;

means for obtaining, when the first table and the second table have been completely filled with the calculated values, the degree of eligibility of the optimal syntax on the optimal clause sequence and the number of the optimal clause in the final clause set, by calculating the minimum value among the terms in the first row and N-th column of the first table and the term number that gives the minimum value; and

means for reading the sets of integers required to construct the optimal syntax from each row, column, and term of the second table.