JP4956334B2

JP4956334B2 - Automaton determinizing method, finite state transducer determinizing method, automaton determinizing apparatus, and determinizing program

Info

Publication number: JP4956334B2
Application number: JP2007223025A
Authority: JP
Inventors: 学永尾
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-08-29
Filing date: 2007-08-29
Publication date: 2012-06-20
Anticipated expiration: 2027-08-29
Also published as: JP2009058989A

Abstract

PROBLEM TO BE SOLVED: To provide an automaton determinizing method that reduces the memory capacity required for determinization, together with a finite state transducer determinizing method, an automaton determinizing apparatus, and an automaton determinizing program.

SOLUTION: Each time the determinization processing is repeated, a subset of the transitions included in a non-deterministic model is selected as the determinization target, and the name of each determinized state, which consists of a combination of states generated while determinizing the selected transitions, is replaced with a distinct single name.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to the determinization of a finite state automaton (FSA) or of an automaton that extends the FSA (such as a finite state transducer, a weighted finite state automaton, or a weighted finite state transducer).

An FSA in which a transition from the current state on a given input can lead to more than one destination state is called a non-deterministic FSA. An FSA in which a transition from the current state on a given input leads to exactly one destination state is called a deterministic FSA. Converting a non-deterministic FSA into a deterministic FSA is called "determinization", and it is known that determinization can be achieved by a method called subset construction (see, for example, Non-Patent Document 1). Methods have also been proposed for determinizing non-deterministic automata that extend the finite state automaton, such as weighted finite state automata and finite state transducers (see, for example, Non-Patent Document 2).

Non-Patent Document 1: J. Hopcroft and J. Ullman, "Automata, Language Theory, and Theory of Computation I (2nd edition)" (in Japanese), Saiensu-sha
Non-Patent Document 2: Mehryar Mohri, "Finite-state transducers in language and speech processing", Computational Linguistics, Volume 23, Issue 2 (June 1997), pages 269-311

However, as the non-deterministic FSA or the non-deterministic automaton described above grows larger and more complex, the conventional determinizing methods must keep a growing number of elements in storage during determinization, so a large amount of memory is needed to carry out the determinization.

The present invention has been made in view of the above, and its object is to provide an automaton determinizing method, a finite state transducer determinizing method, an automaton determinizing apparatus, and an automaton determinizing program that can reduce the amount of memory required when determinization is executed.

To solve the problems described above and achieve the object, the present invention provides an automaton determinizing method, executed by an automaton determinizing apparatus that determinizes a non-deterministic automaton, for a finite state automaton or an automaton that extends the finite state automaton. The automaton determinizing apparatus includes storage means, and the method includes: a partial determinizing step in which partial determinizing means determinizes a subset of the transitions included in the non-deterministic automaton, the subset being selected as the determinization target; a renaming step in which renaming means replaces the names of the determinized states stored in the storage means by the determinization, each of which consists of a combination of states included in the non-deterministic automaton, with mutually distinct single names; and a repeating step in which repeating means causes the partial determinizing step to be executed repeatedly until all transitions included in the non-deterministic automaton have been determinized. In each repetition caused by the repeating step, the partial determinizing step selects a different transition or set of transitions as the determinization target.

The present invention also provides a weighted finite state transducer determinizing method, executed by an automaton determinizing apparatus that determinizes a non-deterministic finite state transducer, for a weighted finite state transducer used in a speech recognition apparatus. The automaton determinizing apparatus includes storage means, and the method includes: a partial determinizing step in which partial determinizing means determinizes a subset of the transitions included in the non-deterministic finite state transducer, the subset being selected as the determinization target; a renaming step in which renaming means replaces the names of the determinized states stored in the storage means by the determinization, each of which consists of a combination of states included in the non-deterministic finite state transducer or of tuples containing such states, with mutually distinct single names; and a repeating step in which repeating means causes the partial determinizing step to be executed repeatedly until all transitions included in the non-deterministic finite state transducer have been determinized. In each repetition caused by the repeating step, the partial determinizing step selects a different transition or set of transitions as the determinization target.

The present invention also provides an automaton determinizing apparatus that determinizes a non-deterministic automaton, for a finite state automaton or an automaton that extends the finite state automaton. The apparatus includes: partial determinizing means that determinizes a subset of the transitions included in the non-deterministic automaton, the subset being selected as the determinization target; storage means that stores the determinized states generated by the determinization, each of which consists of a combination of states included in the non-deterministic automaton; renaming means that replaces the names of the determinized states with mutually distinct single names; and repeating means that causes the partial determinization to be executed repeatedly until all transitions included in the non-deterministic automaton have been determinized. In each repetition caused by the repeating means, the partial determinizing means selects a different transition or set of transitions as the determinization target.

The present invention also provides a program, for a finite state automaton or an automaton that extends the finite state automaton, that runs on a computer which determinizes a non-deterministic automaton and which includes storage means functioning as a work area during the determinization. The program causes the computer to realize: a partial determinizing function that determinizes a subset of the transitions included in the non-deterministic automaton, the subset being selected as the determinization target; a renaming function that replaces the names of the determinized states stored in the storage means by the determinization, each of which consists of a combination of states included in the non-deterministic automaton, with mutually distinct single names; and a repeating function that repeats the partial determinization until all transitions included in the non-deterministic automaton have been determinized. In each repetition by the repeating function, the partial determinizing function selects a different transition or set of transitions as the determinization target.

According to the present invention, a subset of the transitions included in the non-deterministic automaton is selected as the determinization target, and determinization is repeated while this target is changed each time. This reduces the total number of states of the non-deterministic automaton that make up the names of the determinized states stored in a single round of determinization, and therefore reduces the amount of memory required when determinization is executed.

Preferred embodiments of the automaton determinizing apparatus, the determinizing method, and the determinizing program are described in detail below with reference to the accompanying drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the hardware configuration of an automaton determinizing apparatus 100 according to the present embodiment. As shown in FIG. 1, the automaton determinizing apparatus 100 includes a CPU (Central Processing Unit) 1, an operation unit 2, a display unit 3, a ROM (Read Only Memory) 4, a RAM (Random Access Memory) 5, a storage unit 6, and the like, and these units are connected by a bus 7. The automaton determinizing apparatuses 200, 300, and 400 described later are assumed to have the same hardware configuration as the automaton determinizing apparatus 100.

Using a predetermined area of the RAM 5 as a work area, the CPU 1 executes various processes in cooperation with control programs stored in advance in the ROM 4 or the storage unit 6, and centrally controls the operation of each unit of the automaton determinizing apparatus 100. The CPU 1 also realizes the functions of the functional units described later in cooperation with predetermined programs stored in advance in the ROM 4 or the storage unit 6.

The operation unit 2 is an input device such as a mouse or keyboard. It accepts information entered by the user as an instruction signal and outputs that signal to the CPU 1.

The display unit 3 consists of a display device such as an LCD (Liquid Crystal Display) and displays various kinds of information based on display signals from the CPU 1.

The ROM 4 stores, in non-rewritable form, programs and setting information used to control the automaton determinizing apparatus 100.

The RAM 5 is a volatile storage medium such as SDRAM and functions as the work area of the CPU 1. Specifically, it serves as a buffer that temporarily stores the values of the variables and parameters generated during the sequential determinizing process described later.

The storage unit 6 has a magnetically or optically recordable storage medium and stores, in rewritable form, programs and setting information used to control the automaton determinizing apparatus 100. The storage unit 6 also stores in advance various kinds of information on the automata to be processed by the sequential determinizing process described later, such as a non-deterministic FSA and the non-deterministic WFST described later.

FIG. 2 shows the functional configuration of the automaton determinizing apparatus 100, realized by the CPU 1 in cooperation with predetermined programs stored in advance in the ROM 4 or the storage unit 6. As shown in the figure, the automaton determinizing apparatus 100 includes a determinizing processing unit 11, a subset generation unit 12, a partial determinizing unit 13, and an iterative processing unit 14.

The determinizing processing unit 11 is a functional unit that determinizes a non-deterministic FSA by the known method called subset construction. Hereinafter, the determinizing method performed by the determinizing processing unit 11 is referred to as the conventional determinizing method.

The conventional determinizing method performed by the determinizing processing unit 11 is described here. FIG. 3 shows an example of a non-deterministic FSA A₁ = (Q₁, Σ, E₁, I₁, F₁). Q₁ is the set of states, Σ is the set of input symbols, E₁ is the set of transitions with E₁ ⊆ Q₁ × Σ × Q₁, I₁ is the set of initial states, and F₁ is the set of accepting states. For a transition δ that is an element of the transition set E₁, prev(δ) denotes its source state, next(δ) its destination state, and input(δ) its input symbol. Note that prev(δ) ∈ Q₁, next(δ) ∈ Q₁, and input(δ) ∈ Σ.

In the case of FIG. 3, Q₁ = {0, 1, 2, 3}, Σ = {a, b, c}, I₁ = {0}, and F₁ = {0}. The state names of the FSA are assumed to be expressed as integer values. In this example, the transition set E₁ can be expressed as the transition table shown in FIG. 4, which is the transition table of the non-deterministic FSA shown in FIG. 3. As shown in that figure, each row associates a "transition" entry identifying the transition, a "transition source state", a "transition destination state", and the "input symbol" attached to the transition. In this table, for example, row R1 represents the transition δ₀ from state 0 to state 1 on input symbol a.
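As an illustration of the tuple A₁ = (Q₁, Σ, E₁, I₁, F₁) and its transition table, the following minimal Python sketch stores each transition as a record with `prev`, `input`, and `next` fields. The sets Q₁, Σ, I₁, and F₁ are taken from the text. FIG. 4 is not reproduced here, so only the two transitions leaving state 0 on `a` are grounded in the description; the remaining three rows and the names `Transition`, `E1`, and `well_formed` are assumptions made for the example.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    prev: int     # prev(δ): transition source state
    input: str    # input(δ): input symbol
    next: int     # next(δ): transition destination state


Q1 = {0, 1, 2, 3}          # set of states (from the text)
SIGMA = {"a", "b", "c"}    # set of input symbols (from the text)
I1 = {0}                   # set of initial states (from the text)
F1 = {0}                   # set of accepting states (from the text)

# Only the first two rows are stated in the text (state 0 goes to
# state 1 or state 2 on "a"). The last three rows are hypothetical.
E1 = [
    Transition(0, "a", 1),
    Transition(0, "a", 2),
    Transition(1, "b", 0),
    Transition(2, "b", 3),
    Transition(3, "c", 0),
]

# Check the constraint E1 ⊆ Q1 × Σ × Q1.
well_formed = all(
    t.prev in Q1 and t.input in SIGMA and t.next in Q1 for t in E1
)
```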

Whether the FSA shown in FIG. 3 is non-deterministic can be determined by counting the destination states reachable on a single input symbol. For example, when the current state is the initial state 0 and the input symbol a is given to the FSA, the destination state is either state 1 or state 2, so it is not uniquely determined. This shows that the FSA of FIG. 3 is a non-deterministic FSA.
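The counting test described above can be sketched in a few lines of Python. The function name `is_nondeterministic` and the tuple layout `(source, symbol, destination)` are assumptions for illustration; the two sample transitions are the ones the text gives for state 0 and input symbol a.

```python
from collections import defaultdict


def is_nondeterministic(transitions):
    """Return True if some (state, symbol) pair has more than one destination."""
    destinations = defaultdict(set)
    for source, symbol, destination in transitions:
        destinations[(source, symbol)].add(destination)
    return any(len(d) > 1 for d in destinations.values())


# State 0 reaches state 1 or state 2 on "a", as described in the text.
from_text = [(0, "a", 1), (0, "a", 2)]
result = is_nondeterministic(from_text)
```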

To determinize the non-deterministic FSA shown in FIG. 3 by subset construction, the determinizing processing unit 11 executes the determinizing process shown in FIG. 5, which is described below. As a premise, let A₂ = (Q₂, Σ, E₂, i₂, F₂) be the deterministic FSA obtained by determinizing the non-deterministic FSA A₁ = (Q₁, Σ, E₁, I₁, F₁). Here i₂ is the initial state, and F₂, E₂, and Q₂ are assumed to be empty sets before determinization is executed by this process, that is, before step S11 is executed. Variables generated while the determinizing process runs, such as q_sub, q'_sub, x, and δ' described later, are temporarily stored in the RAM 5, which functions as the work area. In the figure, "φ" denotes the empty set.

First, the determinizing processing unit 11 assigns I₁ to the initial state i₂ of A₂ (step S11) and adds i₂, the initial state of A₂, to the queue S as a single element (step S12).

Next, the determinizing processing unit 11 determines whether S is an empty set (step S13). If S is determined not to be empty (step S13; Yes), the determinizing processing unit 11 takes one element out of S and assigns it to q_sub (step S14), and then determines whether steps S16 to S23 have been performed for every element (input symbol) of Σ with respect to the current q_sub (step S15).

If it is determined in step S15 that every element of Σ has been processed (step S15; Yes), the process returns to step S13. If it is determined in step S15 that not every element of Σ has been processed (step S15; No), the determinizing processing unit 11 assigns an input symbol that has not yet been processed to x (step S16).

The determinizing processing unit 11 then assigns to q'_sub the set of destination states of the transitions that leave the states contained in q_sub with input symbol x (step S17). Next, it adds to E₂ a transition δ' that goes from q_sub to q'_sub on input symbol x (step S18).

Next, the determinizing processing unit 11 determines whether q'_sub already exists in Q₂ (step S19). If it does (step S19; Yes), the process returns to step S15.

If, on the other hand, q'_sub is determined not to exist in Q₂ (step S19; No), the determinizing processing unit 11 adds q'_sub to Q₂ (step S20). It then determines whether an element of q'_sub is contained in F₁ (step S21). If no element of q'_sub is contained in F₁ (step S21; No), the process proceeds directly to step S23.

If it is determined in step S21 that an element of q'_sub is contained in F₁ (step S21; Yes), the determinizing processing unit 11 adds q'_sub to the set of accepting states F₂ (step S22) and proceeds to step S23.

In step S23, the determinizing processing unit 11 adds q'_sub to S, and the process returns to step S15.

If, on the other hand, S is determined to be an empty set in step S13 (step S13; No), the process proceeds to step S24, the states added to Q₂ are renamed (step S24), and the process ends. This renaming is described later.

States and transitions on paths that cannot reach an accepting state may be removed. Also, S need not be a queue. Any structure, such as a stack, will do as long as elements can be added and taken out one at a time and it can be checked whether the structure is empty.
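The flow of steps S11 to S23 can be sketched in Python as follows. This is an illustrative reading of the flowchart description, not the apparatus's actual implementation. Three points are assumptions: the initial state i₂ is placed in Q₂ (and in F₂ when it contains an accepting state) up front, empty destination sets are skipped rather than added as a dead state, and all transitions in `E1` other than the two leaving state 0 on `a` are hypothetical, chosen so that the result contains the states {0}, {1, 2}, and {0, 3} mentioned for FIG. 6. Renaming (step S24) is left out.

```python
from collections import deque


def determinize(sigma, e1, i1, f1):
    """Subset construction over transitions given as (prev, symbol, next)."""
    f1 = frozenset(f1)
    i2 = frozenset(i1)                      # S11: i2 <- I1
    q2, e2, f2 = {i2}, set(), set()         # assumption: i2 registered up front
    if i2 & f1:
        f2.add(i2)
    s = deque([i2])                         # S12: put i2 into S
    while s:                                # S13: continue while S is not empty
        q_sub = s.popleft()                 # S14: take one element out of S
        for x in sorted(sigma):             # S15/S16: each input symbol once
            # S17: destinations of transitions leaving q_sub on x
            q_next = frozenset(
                n for p, sym, n in e1 if p in q_sub and sym == x
            )
            if not q_next:                  # assumption: skip the empty set
                continue
            e2.add((q_sub, x, q_next))      # S18: add the merged transition
            if q_next in q2:                # S19: already known
                continue
            q2.add(q_next)                  # S20
            if q_next & f1:                 # S21: contains an accepting state
                f2.add(q_next)              # S22
            s.append(q_next)                # S23
    return q2, e2, i2, f2


# Only (0, "a", 1) and (0, "a", 2) come from the text; the rest are hypothetical.
E1 = [(0, "a", 1), (0, "a", 2), (1, "b", 0), (2, "b", 3), (3, "c", 0)]
Q2, E2, i2, F2 = determinize({"a", "b", "c"}, E1, {0}, {0})
```

Replacing `deque` with a list used as a stack changes only the order in which states are discovered, which matches the remark that S need not be a queue.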

FIG. 6 shows the deterministic FSA A₂ obtained by determinizing the non-deterministic FSA A₁ of FIG. 3 with the determinizing process described above. The transition set E₂ of FIG. 6 is represented by the transition table shown in FIG. 7. FIG. 6 shows the states before the renaming process (step S24).

In FIG. 6, the numbers enclosed in "{}" in each state are elements of Q₁, and each set enclosed in "{}" as a whole corresponds to a q'_sub added in step S20 described above. Specifically, each state is expressed as a subset of the set Q₁: state B11 is {0}, state B12 is {1, 2}, and state B13 is {0, 3}. Hereinafter, a state to which such a subset is assigned as its name, that is, a state contained in Q₂ before renaming, is called a "determinized state".

The renaming process performed in step S24 replaces the name of each determinized state, that is, a name expressed as a subset of the state set Q₁, with a single name. In the case of FIG. 6, for example, the elements of Q₂ before renaming, that is, the names of the determinized states, are expressed as sets of integers. Because holding state names as sets in this way is inefficient, a distinct integer is assigned to each set: {0} of state B11 is renamed to 0, {1, 2} of state B12 to 1, and {0, 3} of state B13 to 2. FIG. 8 shows an example of the result of renaming the states of FIG. 6. The names in the "transition source state" and "transition destination state" columns of FIG. 7, the transition table of FIG. 6, can be replaced in the same way.
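A minimal sketch of this renaming, assuming the determinized states are held as `frozenset` objects in the order B11, B12, B13. The function name `rename_states` is hypothetical, and the single sample transition ({0} to {1, 2} on `a`) is the one implied by the text; FIG. 7 itself is not reproduced.

```python
def rename_states(ordered_states, transitions):
    """Map each set-valued state name to a distinct integer and rewrite transitions."""
    name = {state: number for number, state in enumerate(ordered_states)}
    renamed = [(name[src], sym, name[dst]) for src, sym, dst in transitions]
    return name, renamed


B11, B12, B13 = frozenset({0}), frozenset({1, 2}), frozenset({0, 3})
name, renamed = rename_states([B11, B12, B13], [(B11, "a", B12)])
```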

Renaming the states in this way removes the need to hold state names as subsets of Q₁ and reduces the storage needed to hold the determinized FSA. More specifically, suppose an integer is represented in 4 bytes. Before renaming, storing the state names requires both the names themselves and the number of elements in each name: 20 bytes for the names (five integers across {0}, {1, 2}, and {0, 3}) and 12 bytes for the three element counts, 32 bytes in total. After renaming, 12 bytes suffice.
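The byte counts above can be checked with a short calculation. The variable names are illustrative; the 4-byte integer size and the three state names come from the text.

```python
INT_BYTES = 4  # size of one integer, as assumed in the text

names_before = [{0}, {1, 2}, {0, 3}]  # determinized state names before renaming

# Before renaming: every element of every name, plus one count per name.
name_bytes = sum(len(n) for n in names_before) * INT_BYTES
count_bytes = len(names_before) * INT_BYTES
total_before = name_bytes + count_bytes

# After renaming: one integer per state.
total_after = len(names_before) * INT_BYTES
```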

However, with the determinizing process of the determinizing processing unit 11 alone, the names expressed as subsets of Q₁ grow longer as the number of states and transitions increases, so a large amount of memory is needed to store the determinized states. In this embodiment, therefore, the subset generation unit 12, the partial determinizing unit 13, and the iterative processing unit 14 determinize the non-deterministic FSA by executing the sequential determinizing process described later, which reduces the number of states contained in each determinized state q'_sub and thus the amount of memory needed to store the determinized states.

An outline of the sequential determinizing process of this embodiment follows. How the states contained in q'_sub are produced is as described for step S17 of FIG. 5. Suppose q'_sub is the state reached on input symbol a from a determinized state q_sub contained in Q₂, where a is an input symbol in the set Σ. Then every state reached on input symbol a from a state contained in q_sub is contained in q'_sub. Consequently, the number of transitions in E₁ whose source state is contained in q_sub and whose input symbol is a is an upper bound on the number of states contained in q'_sub.
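The upper bound stated above can be illustrated with a small Python sketch. The function name `successor_bound` is hypothetical; the two sample transitions are those the text gives for state 0 and input symbol a.

```python
def successor_bound(e1, q_sub, symbol):
    """Count transitions leaving q_sub on symbol and collect their destinations."""
    matching = [(p, x, n) for p, x, n in e1 if p in q_sub and x == symbol]
    q_next = {n for _, _, n in matching}
    # len(matching) bounds len(q_next) from above, because each matching
    # transition contributes at most one new destination state.
    return len(matching), q_next


bound, q_next = successor_bound([(0, "a", 1), (0, "a", 2)], {0}, "a")
```

The bound is not always tight: if two matching transitions share a destination state, q'_sub has fewer states than there are transitions.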

Now let δ_sub be the set of transitions in E₁ whose source state is contained in q_sub, whose input symbol is a, and whose destination state is contained in q'_sub. Determinization can then be regarded as an operation that merges the transitions contained in δ_sub into a single transition. Indeed, in the determinized FSA A₂ there is exactly one transition from q_sub to q'_sub on input symbol a. Because the number of transitions contained in δ_sub is the upper bound on the number of states contained in q'_sub, splitting δ_sub into several sets lowers that upper bound. In other words, the storage for determinized states can be reduced by splitting δ_sub.

However, if all the transitions contained in δ_sub are split apart, those transitions may not end up determinized. For example, if the two transitions (δ₀, δ₁) on input symbol a contained in δ_sub are split, there are two destination states reachable from q_sub on input symbol a, which means determinization has not been achieved. In this embodiment, therefore, rather than splitting all the transitions contained in δ_sub, only some of them are split and the rest are left unsplit. Hereinafter, the transitions of δ_sub that are not split are called "transitions to be determinized". Splitting only some of the transitions contained in δ_sub and determinizing, that is, determinizing only part of the transitions of the non-deterministic FSA, is called "partial determinization". Determinizing all the transitions is also included in the concept of partial determinization.

また、部分決定化によって決定化済み状態を記憶するための記憶域を削減することができるが、部分決定化を１回行っただけでは非決定性ＦＳＡは決定化できない。そこで、繰り返し処理部１４は、全ての遷移が決定化されるまで決定化対象とする遷移を変えながら、部分決定化部１３により部分決定化を繰り返し実行させる。部分決定化では、図５で説明した処理のステップＳ２４と同様に、実行されるたびに状態の名前を付け替えるため、決定化済み状態を記憶するための記憶域は部分決定化が終了するたびに不要となる。 In addition, although the storage area for storing the determinized state can be reduced by partial determinization, nondeterministic FSA cannot be determinized only by performing partial determinization once. Therefore, the iterative processing unit 14 causes the partial determinizing unit 13 to repeatedly execute partial determinization while changing the transition to be determinized until all transitions are determinized. In partial determinization, as in step S24 of the process described with reference to FIG. 5, the state name is changed every time it is executed. Therefore, the storage area for storing the determinized state is stored every time partial determinization is completed. It becomes unnecessary.

つまり、この部分決定化を繰り返して実行する方法で必要となる決定化済み状態を記憶するためのメモリ量は、各回の部分決定化が必要となった決定化済み状態を記憶するためのメモリ量のうちで最も多いものと同値となる。すなわち、図５で説明した従来の決定化の方法よりも、決定化済み状態を記憶するためのメモリ量を削減することが可能となる。 In other words, the amount of memory needed to store the determinized states in this method of repeated partial determinization equals the largest of the amounts needed to store the determinized states in the individual partial determinizations. The amount of memory for storing the determinized states can therefore be made smaller than with the conventional determinizing method described with reference to FIG. 5.

以下、部分集合生成部１２、部分決定化部１３、繰り返し処理部１４について説明する。 Hereinafter, the subset generation unit 12, the partial determinization unit 13, and the iterative processing unit 14 will be described.

部分集合生成部１２は、決定化対象とする遷移の入力記号の部分集合Σ_i（ｉは１以上の整数）を、後述する決定化処理の繰り返し回数に応じて生成する機能部である。 The subset generation unit 12 is a functional unit that generates a subset Σ _i (i is an integer of 1 or more) of input symbols of transitions to be determinized according to the number of repetitions of determinization processing described later.

具体的に、部分集合生成部１２は、所定の規則に基づいて処理の対象となる非決定性ＦＳＡの集合Σに含まれた入力記号を配列し、配列後の入力記号のうち、一部の入力記号を抽出した部分集合Σ_iを、後述する部分決定化処理の繰り返し回数に応じて順次生成する。 Specifically, the subset generation unit 12 orders the input symbols included in the set Σ of the non-deterministic FSA to be processed according to a predetermined rule, and sequentially generates subsets Σ_i, each obtained by extracting some of the ordered input symbols, according to the number of repetitions of the partial determinizing process described later.

ここで、入力記号の配列決定時の基準となる所定の規則は、特に問わないものとする。なお、本実施形態では、下記式（１）で表した関数ｖ（ｘ）で得られる値が大きいほど、上位の順位となるよう定めた関係式ｒ（ｖ（ｘ））を所定の規則として用いるものとする。 The predetermined rule used as the criterion for ordering the input symbols is not particularly limited. In this embodiment, the predetermined rule is a relation r(v(x)) that assigns a higher rank to an input symbol the larger the value of the function v(x) given by expression (1) below.

ここで、「ｘ」は処理対象となる非決定性ＦＳＡの集合Σに含まれた何れかの入力記号であって、当該ｘを入力記号とする遷移の集合をＥ_xとしている。ただし、Ｅ_xはＥ₁に含まれているものとする。Ｑ_xはＥ_xに属する遷移の遷移元となる状態の集合である。さらに、Ｅ_xの要素数をＮ（Ｅ_x）、Ｑ_xの要素数をＮ（Ｑ_x）、Ｑ₁の要素数をＮ（Ｑ₁）としている。 Here, x is any one of the input symbols included in the set Σ of the non-deterministic FSA to be processed, and E_x is the set of transitions whose input symbol is x, where E_x is assumed to be included in E₁. Q_x is the set of source states of the transitions belonging to E_x. Further, N(E_x) is the number of elements of E_x, N(Q_x) is the number of elements of Q_x, and N(Q₁) is the number of elements of Q₁.

ここで、上記式（１）の計算結果に基づき、当該計算結果の値が大きいほど上位の順位となるよう各入力記号を配列することは、各入力記号に係る遷移の個数が多いものから配列することと同義である。なお、上記式（１）においてＮ(Ｑ₁)の部分を所定の定数値としてもよい。また、配列決定時の指標となる所定の規則は、上記例に限定されないものとする。 Ordering the input symbols so that a larger value of expression (1) gives a higher rank is equivalent to ordering them starting from the input symbol with the largest number of transitions. The N(Q₁) term in expression (1) may be replaced by a predetermined constant. The predetermined rule used as the ordering criterion is not limited to the above example.
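Expression (1) itself is not reproduced in this text. The worked values given for the FSA of FIG. 3, namely v(a) = (2+4)/(1+4), v(b) = (2+4)/(2+4) and v(c) = (1+4)/(1+4) with N(Q₁) = 4, are consistent with the following form. It should be read as a reconstruction inferred from those values and from the definitions of N(E_x), N(Q_x) and N(Q₁), not as the original expression:

```latex
v(x) = \frac{N(E_x) + N(Q_1)}{N(Q_x) + N(Q_1)}
```

Under this reading, v(x) grows when a symbol x has many transitions relative to the number of distinct source states, and replacing N(Q₁) by a constant only changes how strongly that ratio is damped.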

また、部分集合生成部１２により生成される部分集合Σ_iは、下記式（２）の条件を満たすものとする。ここで、「ｎ」は後述する部分決定化処理を繰り返す総数（繰り返し総数）を表しており、Σ_n＝Σである。 The subset Σ _i generated by the subset generation unit 12 satisfies the condition of the following formula (2). Here, “n” represents the total number of repetitions of the partial determinizing process described later (the total number of repetitions), and Σ _n = Σ.

集合Σの分割方法は、上記式（２）の条件を満たす限り特に問わないものとするが、本実施形態では、ｉの増加とともに、決定化の対象となる遷移の入力記号の数を略等間隔で増加させる態様を採用する。 The method of dividing the set Σ is not particularly limited as long as the condition of expression (2) is satisfied. In this embodiment, the number of input symbols of the transitions to be determinized is increased at substantially equal intervals as i increases.

以下、図９を参照して、部分集合生成部１２で実行される部分集合生成処理の動作を説明する。まず、部分集合生成部１２は、上述した所定の規則に基づいて、非決定性ＦＳＡの集合Σに含まれた各入力記号を配列する（ステップＳ３１）。 Hereinafter, the operation of the subset generation process executed by the subset generation unit 12 will be described with reference to FIG. First, the subset generation unit 12 arranges each input symbol included in the set Σ of non-deterministic FSA based on the predetermined rule described above (step S31).

次いで、部分集合生成部１２は、Ｎ（Σ）を入力記号の総数として、Σ_iにＮ(Σ)×i／n個だけ並べた順に入力記号を代入し、これをn回繰り返すことで繰り返しの回数ｉ毎のΣ_iを順次生成する（ステップＳ３２）。言い換えると、入力記号を引数に取り、順位(１以上の整数)を返す関数ｒとし、入力記号をｘとしたときｒ(ｘ)≦Ｎ(Σ)×i／nを満たす入力記号の集合をｉの値毎にΣ_iに代入する。 Next, with N(Σ) denoting the total number of input symbols, the subset generation unit 12 assigns the first N(Σ)×i/n input symbols, in the order arranged, to Σ_i, and repeats this n times to generate Σ_i sequentially for each repetition count i (step S32). In other words, with r a function that takes an input symbol as its argument and returns its rank (an integer of 1 or more), and x an input symbol, the set of input symbols satisfying r(x) ≤ N(Σ)×i/n is assigned to Σ_i for each value of i.

上述した部分集合生成処理の動作を、図３に示した非決定性ＦＳＡを用いて説明する。まず、部分集合生成部１２は、式（１）で示した関数ｖ（ｘ）を用いて、その値を入力記号毎に計算すると、ｖ（ａ）＝（２＋４）／（１＋４）＝１．２、ｖ（ｂ）＝（２＋４）／（２＋４）＝１．０、ｖ（ｃ）＝（１＋４）／（１＋４）＝１．０が得られる。ここで、関数ｖ（ｘ）の返す値が同じとなる場合には、入力記号が文字であれば割り当てられた文字コード等の値を基準に順序を決める等すればよいし、整数値であればその値自体を基準に順序を決めるなどすればよい。次いで、部分集合生成部１２は、ｖ（ｘ）の大きに応じて上位の順位となるよう定めた関係式ｒ（ｘ）を用いることで、ｒ（ｖ（ａ））＝１、ｒ（ｖ（ｂ））＝２、ｒ（ｖ（ｃ））＝３を導出する。 The operation of the subset generation process described above will be explained using the non-deterministic FSA shown in FIG. 3. First, the subset generation unit 12 calculates the value of the function v(x) of expression (1) for each input symbol, obtaining v(a) = (2+4)/(1+4) = 1.2, v(b) = (2+4)/(2+4) = 1.0, and v(c) = (1+4)/(1+4) = 1.0. When v(x) returns the same value for several input symbols, the order may be decided, for example, by the assigned character code if the input symbols are characters, or by the value itself if they are integers. Next, using the relation r(x), which assigns a higher rank the larger the value of v(x), the subset generation unit 12 derives r(v(a)) = 1, r(v(b)) = 2, and r(v(c)) = 3.

ここで、繰り返し総数ｎ＝３が設定されたものとすると、部分集合生成部１２は、部分集合Σ_iとして繰り返し回数ｉ（ｉ＝１〜３）毎に、Σ₁＝{ａ}、Σ₂＝{ａ,ｂ}、Σ₃＝{ａ,ｂ,ｃ}を生成する。 Assuming that the total number of repetitions is set to n = 3, the subset generation unit 12 generates, for each repetition count i (i = 1 to 3), the subsets Σ₁ = {a}, Σ₂ = {a, b}, and Σ₃ = {a, b, c}.
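The subset generation of steps S31 and S32 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes that v(x) has the form (N(E_x) + N(Q₁)) / (N(Q_x) + N(Q₁)), which is inferred from the worked values above because expression (1) is not reproduced in this text. The function names and the transition list `fsa` are hypothetical: the b and c transitions follow the description of FIG. 12, while the two a transitions out of state 0 are an assumption chosen to match N(E_a) = 2 and N(Q_a) = 1.

```python
from collections import defaultdict

def rank_symbols(transitions, num_states):
    """Step S31: order input symbols by v(x), largest first.

    Ties are broken by the symbol value itself (its character code).
    The form of v(x) is an assumption reconstructed from the worked example.
    """
    sources = defaultdict(list)
    for src, sym, _dst in transitions:
        sources[sym].append(src)

    def v(sym):
        n_e = len(sources[sym])        # N(E_x): transitions labelled x
        n_q = len(set(sources[sym]))   # N(Q_x): distinct source states
        return (n_e + num_states) / (n_q + num_states)

    return sorted(sources, key=lambda s: (-v(s), s))

def generate_subsets(ordered_symbols, n):
    """Step S32: Sigma_i holds the symbols whose rank r(x) <= N(Sigma) * i / n."""
    total = len(ordered_symbols)
    return [set(ordered_symbols[: total * i // n]) for i in range(1, n + 1)]

# Hypothetical FSA consistent with the counts quoted for FIG. 3 (4 states).
fsa = [(0, "a", 1), (0, "a", 2), (1, "b", 0), (2, "b", 3), (3, "c", 0)]
order = rank_symbols(fsa, num_states=4)
subsets = generate_subsets(order, n=3)
```

With n = 3 and three symbols, each pass adds one symbol, which matches the roughly equal-interval growth adopted in this embodiment. When N(Σ) is not a multiple of n, the integer division simply rounds the cut-off rank down.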

図２に戻り、部分決定化部１３は、部分集合生成部１２により生成されたΣ_iに基づいて、決定化の対象とする遷移を選択するとともに、決定化の対象とならない遷移に係る入力記号を異なる入力記号に置き換え、図５で説明した従来の決定化方法を利用することで、決定化の対象とする遷移についての決定化（部分決定化）を行う。 Returning to FIG. 2, the partial determinizing unit 13 selects the transitions to be determinized based on Σ_i generated by the subset generation unit 12, replaces the input symbols of the transitions that are not to be determinized with different input symbols, and then applies the conventional determinizing method described with reference to FIG. 5, thereby determinizing the selected transitions (partial determinization).

また、繰り返し処理部１４は、部分決定化部１３を制御し、繰り返し総数ｎに応じた回数だけ部分決定化処理を繰り返し実行させる。 In addition, the iterative processing unit 14 controls the partial determinizing unit 13 to repeatedly execute the partial determinizing process as many times as the number of repetitions n.

以下、図１０、１１を参照して、部分決定化部１３及び繰り返し処理部１４により実行される逐次決定化処理について説明する。なお、本処理の前提として、部分集合生成部１２により部分集合Σ_iが予め生成されているものとする。また、逐次決定化処理の実行時に生成される、後述するＥ_r、Ｅ_d、ｘ_new等の各種変数は作業エリアとして機能するＲＡＭ５に一時記憶されるものとする。 Hereinafter, the sequential determinizing process executed by the partial determinizing unit 13 and the iterative processing unit 14 will be described with reference to FIGS. 10 and 11. As a premise of this process, the subsets Σ_i are assumed to have been generated in advance by the subset generation unit 12. Variables generated during the sequential determinizing process, such as E_r, E_d, and x_new described later, are temporarily stored in the RAM 5, which functions as a work area.

図１０は、本実施形態における逐次決定化処理の手順を示したフローチャートである。まず、繰り返し処理部１４は、繰り返し回数を計数するための変数ｉを１に設定すると（ステップＳ４１）、このｉの値に応じた部分決定化処理（ステップＳ４２）を部分決定化部１３に実行させる。以下、図１１を参照して、ステップＳ４２の部分決定化処理について説明する。 FIG. 10 is a flowchart showing the procedure of the sequential determinizing process in this embodiment. First, the iterative processing unit 14 sets the variable i, which counts the number of repetitions, to 1 (step S41), and causes the partial determinizing unit 13 to execute the partial determinizing process corresponding to this value of i (step S42). The partial determinizing process of step S42 is described below with reference to FIG. 11.

図１１は、ステップＳ４２の部分決定化処理の手順を示したフローチャートである。まず、部分決定化部１３は、決定化の対象となる遷移、つまり遷移の入力記号がΣ_iに含まれている遷移の集合をＥ_rに代入する（ステップＳ５１）。次いで、部分決定化部１３は、決定化の対象とならない遷移、つまり遷移の入力記号がΣ_iに含まれていない遷移の集合をＥ_dに代入する（ステップＳ５２）。なお、Σ_iのｉは、ステップＳ４１又は後述するステップＳ４５で設定されたｉの値に対応する。 FIG. 11 is a flowchart showing the procedure of the partial determinizing process of step S42. First, the partial determinizing unit 13 assigns to E_r the set of transitions to be determinized, that is, the transitions whose input symbol is included in Σ_i (step S51). Next, the partial determinizing unit 13 assigns to E_d the set of transitions that are not to be determinized, that is, the transitions whose input symbol is not included in Σ_i (step S52). The i of Σ_i corresponds to the value of i set in step S41 or in step S45 described later.

続いて、部分決定化部１３は、Ｅ_dに属する全ての遷移δ_dについて、後述するステップＳ５４、５５の処理を施したか否かを判定する（ステップＳ５３）。ここで、未処理の遷移δ_dが存在すると判定した場合（ステップＳ５３；Ｎｏ）、部分決定化部１３は、下記式（３）を満たすｘ_new、即ち、Ｅ_dの遷移の入力記号が互いに異なる入力記号となり、且つ、Σ_iとも異なる入力記号となるような入力記号ｘ_newを生成する（ステップＳ５４）。 Subsequently, the partial determinizing unit 13 determines whether the processing of steps S54 and S55 described later has been applied to every transition δ_d belonging to E_d (step S53). If it determines that an unprocessed transition δ_d remains (step S53; No), the partial determinizing unit 13 generates an input symbol x_new satisfying expression (3) below, that is, an input symbol x_new such that the input symbols of the transitions in E_d all differ from one another and also differ from every input symbol in Σ_i (step S54).

次いで、部分決定化部１３は、この生成したｘ_newを遷移δ_dにかかる入力記号と置き換え（ステップＳ５５）、ステップＳ５３の処理へと再び戻る。 Next, the partial determinizing unit 13 replaces the input symbol of the transition δ_d with the generated x_new (step S55), and returns to step S53.

一方、ステップＳ５３において、Ｅ_dに属する全ての遷移δ_dについて、ステップＳ５４、５５の処理を施したと判定した場合には（ステップＳ５３；Ｙｅｓ）、ステップＳ５６の処理へと移行する。 If, on the other hand, it is determined in step S53 that the processing of steps S54 and S55 has been applied to every transition δ_d belonging to E_d (step S53; Yes), the process proceeds to step S56.

次に、部分決定化部１３は、Ｅ_dに属した遷移δ_dをＥ_rへと追加した後（ステップＳ５６）、このＥ_rを含む非決定性ＦＳＡＡ_r＝（Ｑ₁，Σ_r，Ｅ_r，Ｉ₁，Ｆ₁）の決定化処理を決定化処理部１１に実行させ、その結果をＡ'_r（Ａ'_r＝（Ｑ₂，Σ_r，Ｅ'_r，ｉ₂，Ｆ₂））に代入する（ステップＳ５７）。ここで、Σ_rは入力記号を置き換えた後の入力記号の集合である。なお、ステップＳ５７で行われる決定化処理の手順は、図５で説明したものと同様であるため、説明を省略する。 Next, the partial determinizing unit 13 adds the transitions δ_d belonging to E_d to E_r (step S56), causes the determinizing processing unit 11 to determinize the non-deterministic FSA A_r = (Q₁, Σ_r, E_r, I₁, F₁) containing this E_r, and assigns the result to A'_r (A'_r = (Q₂, Σ_r, E'_r, i₂, F₂)) (step S57). Here, Σ_r is the set of input symbols after the replacement. The determinizing process performed in step S57 is the same as the one described with reference to FIG. 5, so its description is omitted.

続いて、部分決定化部１３は、Ａ'_rの入力記号Σ_rをステップＳ５５で置き換える前の元の入力記号に戻した後、このＡ'_rをＡ₂＝（Ｑ₂，Σ，Ｅ₂，ｉ₂，Ｆ₂）に代入し（ステップＳ５８）、ステップＳ４３の処理へと移行する。ここでＡ₂は、Ａ₁のうちΣ_iに属する入力記号を持つ遷移のみを決定化したＦＳＡとなっている。 Subsequently, the partial determinizing unit 13 restores the input symbols Σ_r of A'_r to the original input symbols that were replaced in step S55, assigns the resulting A'_r to A₂ = (Q₂, Σ, E₂, i₂, F₂) (step S58), and proceeds to step S43. A₂ is now an FSA in which only the transitions of A₁ having an input symbol belonging to Σ_i have been determinized.

図１０に戻り、繰り返し処理部１４は、ｉの値が繰り返し回数の最大値であるｎを下回るか否かを判定する（ステップＳ４３）。ここで、ｉの値がｎを下回ると判定した場合には（ステップＳ４３；Ｙｅｓ）、繰り返し処理部１４は、ステップＳ４２で決定化されたＦＳＡＡ₂をＦＳＡＡ₁とし（ステップＳ４４）、ｉの値を１増やした後（ステップＳ４５）、ステップＳ４２の処理へと再び戻る。 Returning to FIG. 10, the iterative processing unit 14 determines whether the value of i is less than n, the maximum number of repetitions (step S43). If it determines that i is less than n (step S43; Yes), the iterative processing unit 14 sets the FSA A₂ determinized in step S42 as the FSA A₁ (step S44), increments i by 1 (step S45), and returns to step S42.

また、ステップＳ４３において、ｉの値がｎ以上と判定した場合には（ステップＳ４３；Ｎｏ）、本処理を終了する。ここで、最終的に得られたＦＳＡＡ₂は処理対象となった非決定性ＦＳＡＡ₁を決定化したものとなっている。 If it is determined in step S43 that the value of i is n or more (step S43; No), the process ends. The FSA A₂ finally obtained is the determinized form of the non-deterministic FSA A₁ that was given as the processing target.
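The flow of FIGS. 10 and 11 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patented implementation: `determinize` is a plain subset construction standing in for the conventional method of FIG. 5, the fresh symbols x_new are modelled as tuples `("x_new", k)` (which assumes the original symbols are not tuples of that shape), and all function names are hypothetical. The transition list `fsa` is also hypothetical: its b and c transitions follow the description of FIG. 12, while the two a transitions and the accepting state 3 are assumptions.

```python
from collections import deque

def determinize(transitions, initial, finals):
    """Subset construction; states are renumbered 0, 1, ... in discovery order."""
    start = frozenset(initial)
    ids = {start: 0}                      # determinized states held during this pass
    queue, out = deque([start]), []
    while queue:
        q_sub = queue.popleft()
        targets = {}
        for src, sym, dst in transitions:
            if src in q_sub:
                targets.setdefault(sym, set()).add(dst)
        for sym, dsts in targets.items():
            nxt = frozenset(dsts)
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            out.append((ids[q_sub], sym, ids[nxt]))
    new_finals = {k for q, k in ids.items() if q & set(finals)}
    stored = sum(len(q) for q in ids)     # total elements over all subsets
    return out, new_finals, stored

def partial_determinize(transitions, initial, finals, sigma_i):
    """Steps S51 to S58: relabel non-target transitions, determinize, restore."""
    relabeled, restore = [], {}
    for k, (src, sym, dst) in enumerate(transitions):
        if sym not in sigma_i:            # S52 to S55: give it a unique x_new
            fresh = ("x_new", k)
            restore[fresh] = sym
            sym = fresh
        relabeled.append((src, sym, dst))
    out, new_finals, stored = determinize(relabeled, initial, finals)  # S57
    restored = dict.fromkeys((s, restore.get(x, x), d) for s, x, d in out)  # S58
    return list(restored), {0}, new_finals, stored

def sequential_determinize(transitions, initial, finals, subsets):
    """Steps S41 to S45: one partial determinization per subset Sigma_i."""
    peak = 0
    for sigma_i in subsets:
        transitions, initial, finals, stored = partial_determinize(
            transitions, initial, finals, sigma_i)
        peak = max(peak, stored)
    return transitions, initial, finals, peak

fsa = [(0, "a", 1), (0, "a", 2), (1, "b", 0), (2, "b", 3), (3, "c", 0)]
subsets = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
result, start, accepting, peak = sequential_determinize(fsa, {0}, {3}, subsets)
direct, direct_accepting, direct_stored = determinize(fsa, {0}, {3})
```

For this hypothetical input, `peak` is the largest per-pass count of subset elements and `direct_stored` is the count for a single conventional pass, so comparing the two mirrors the memory argument made for the method. The sketch renumbers states as they are discovered instead of in a separate renaming step, which is a simplification.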

上述した逐次決定化処理の動作を、図３で示した非決定性ＦＳＡＡ₁を基に、図１２〜１７を用いて説明する。なお、繰り返し総数ｎは３とし、部分集合生成部１２により生成された部分集合Σ_iが、Σ₁＝｛ａ｝、Σ₂＝｛ａ,ｂ｝、Σ₃＝｛ａ,ｂ,ｃ｝であるものとする。 The operation of the sequential determinizing process described above will be explained with reference to FIGS. 12 to 17, based on the non-deterministic FSA A₁ shown in FIG. 3. The total number of repetitions n is 3, and the subsets Σ_i generated by the subset generation unit 12 are Σ₁ = {a}, Σ₂ = {a, b}, and Σ₃ = {a, b, c}.

まず、１回目（ｉ＝１）の決定化に関して、入力記号の名前を付け替えた時点（図１１のステップＳ５７の時点）でのＦＳＡＡ_rを図１２に示す。この例では、状態１から状態０へ遷移する遷移の入力記号ｂをＢと置き換え、状態２から状態３へ遷移する遷移の入力記号ｂをＣと置き換え、状態３から状態０へ遷移する遷移の入力記号ｃをＤと付け替えている。 First, for the first determinization (i = 1), FIG. 12 shows the FSA A_r at the point where the input symbols have been replaced (at step S57 in FIG. 11). In this example, the input symbol b of the transition from state 1 to state 0 is replaced with B, the input symbol b of the transition from state 2 to state 3 is replaced with C, and the input symbol c of the transition from state 3 to state 0 is replaced with D.

部分決定化部１３は、このＦＳＡＡ_rを図５で説明した決定化の方法によって決定化する。状態名の名前の付け替え前、つまり図５でのステップＳ２４を実行する直前のＦＳＡを図１３に示す。そして、決定化処理部１１が状態番号を振りなおし、部分決定化部１３が、入力記号を元に戻して１回目の決定化が完了する。この状態が図１４である。 The partial determinizing unit 13 determinizes this FSA A_r by the determinizing method described with reference to FIG. 5. FIG. 13 shows the FSA before the states are renamed, that is, immediately before step S24 of FIG. 5 is executed. The determinizing processing unit 11 then renumbers the states, and the partial determinizing unit 13 restores the input symbols, which completes the first determinization. FIG. 14 shows the result.

同様に２回目（ｉ＝２）の決定化処理中において、図１４で示した状態２から状態０へ遷移する遷移の入力記号ｃをＣに置き換えて決定化した後、状態名を付け替える直前の結果を図１５に示している。さらに状態番号を振りなおし、入力記号を元に戻したものが図１６である。 Similarly, during the second determinization process (i = 2), the input symbol c of the transition from state 2 to state 0 shown in FIG. 14 is replaced with C and determinized, and immediately before the state name is changed. The results are shown in FIG. FIG. 16 shows the state numbers reassigned and the input symbols restored.

既に決定化された状態ではあるが、部分決定化部１３は、繰り返し処理部１４の制御に応じて３回目（ｉ＝３）の決定化を行う。ここで、図１７は、状態名を付け替える直前の結果を示した図である。置き換えた入力記号はないので、入力記号については元に戻す処理は何も行わず、状態名のみ付け替えた最終結果は上述した図８と同様となる。 Although the FSA is already deterministic at this point, the partial determinizing unit 13 performs the third determinization (i = 3) under the control of the iterative processing unit 14. FIG. 17 shows the result immediately before the states are renamed. Since no input symbol was replaced, nothing needs to be restored; only the states are renamed, and the final result is the same as FIG. 8 described above.

ところで、図５で説明した従来の決定化の方法のみを用いた場合での、決定化済み状態に含まれた要素の合計数は、図６に示したように５個である。一方、図１０で説明した決定化の方法による１回目の決定化の結果において、決定化済み状態に含まれた要素の合計数は図１３から分かるように４個である。また、同様に２回目は図１５より４個、３回目は図１７より３個となる。 Incidentally, the total number of elements included in the determinized state when only the conventional determinizing method described in FIG. 5 is used is five as shown in FIG. On the other hand, in the result of the first determinization by the determinizing method described with reference to FIG. 10, the total number of elements included in the determinized state is four as can be seen from FIG. Similarly, the second time is four from FIG. 15, and the third time is three from FIG.

したがって、繰り返し決定化処理を行った中での最大の合計要素数は４個となり、決定化処理のみを用いた従来の決定化による方法よりも合計要素数の最大値を減らすことができる。つまり、決定化済み状態の記憶に必要なメモリ量を減らすことができる。 Therefore, the maximum total number of elements in the repeated determinizing process is four, and the maximum value of the total number of elements can be reduced as compared with the conventional determinizing method using only the determinizing process. That is, the amount of memory necessary for storing the determined state can be reduced.

以上のように、本実施形態によれば、非決定性ＦＳＡに含まれた遷移から、決定化の対象とする遷移を一部選び、この決定化対象とする遷移を毎回変えながら繰り返し決定化を行うことで、一回の決定化で記憶する決定化済み状態の名称のうち、それを構成する非決定性オートマトンに含まれた状態の延べ数を減少させることができるため、決定化の実行時に要するメモリ量を減少させることができる。 As described above, according to this embodiment, some of the transitions included in the non-deterministic FSA are selected as the transitions to be determinized, and determinization is repeated while changing the selected transitions each time. This reduces the total number of states of the non-deterministic automaton that make up the names of the determinized states stored during a single determinization, and therefore reduces the amount of memory required when determinization is executed.

なお、先に述べたオートマトン決定化装置１００における各処理を実行するプログラムを、インストール可能な形式又は実行可能な形式でＣＤ−ＲＯＭ、フロッピー（Ｒ）ディスク（ＦＤ）、ＤＶＤ等のコンピュータで読み取り可能な記録媒体に記録して提供する態様としてもよい。 The program that executes each process in the automaton determinizing apparatus 100 described above may be provided by recording it, in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a floppy (R) disk (FD), or a DVD.

また、オートマトン決定化装置１００における各処理を実行するプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成しても良い。 The program for executing each process in the automaton determinizing apparatus 100 may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network.

この場合には、プログラムは、オートマトン決定化装置１００において上記記録媒体から読み出して実行することによりＲＡＭ５上にロードされ、上記ソフトウェア構成で説明した各部がＲＡＭ５上に生成される。 In this case, the program is read from the recording medium and executed in the automaton determinizing apparatus 100, whereby it is loaded onto the RAM 5 and the units described in the software configuration above are generated on the RAM 5.

［第２の実施形態］ [Second Embodiment]
上述した第１の実施形態では、部分決定化処理を行う毎に入力記号の置き換えを行う態様を説明した。本実施形態では入力記号の置き換えを行うことなく、非決定性ＦＳＡの決定化を実現し、当該決定化に必要となるメモリ量を削減することが可能なオートマトン決定化装置２００について説明する。なお、上述した第１の実施形態と同様の構成については、同一の符号を付与しその説明を省略する。 In the first embodiment described above, the input symbols are replaced each time the partial determinizing process is performed. This embodiment describes an automaton determinizing apparatus 200 that determinizes a non-deterministic FSA without replacing input symbols and can still reduce the amount of memory required for the determinization. Components that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

図１８は、図１に示したＣＰＵ１とＲＯＭ４又は記憶部６に予め記憶された所定のプログラムとの協働により実現される、本実施形態のオートマトン決定化装置２００の機能的構成を示した図である。同図に示したように、オートマトン決定化装置２００は部分集合生成部１２、部分決定化部２１、繰り返し処理部１４を有している。 FIG. 18 is a diagram illustrating a functional configuration of the automaton determinizing apparatus 200 according to the present embodiment realized by cooperation of the CPU 1 illustrated in FIG. 1 and a predetermined program stored in the ROM 4 or the storage unit 6 in advance. It is. As shown in the figure, the automaton determinizing apparatus 200 includes a subset generation unit 12, a partial determinization unit 21, and an iterative processing unit 14.

ここで、部分決定化部２１は、処理対象の非決定性ＦＳＡに含まれた遷移にかかる入力記号から、一の入力記号を処理対象として順次選択し、この処理対象の入力記号に係る遷移を抽出した遷移の集合Ｅ_dを生成する。また、部分決定化部２１は、Ｅ_dに含まれる遷移から、決定化対象とする遷移を選択し、この選択した遷移について決定化を行う。ここで、Ｅ_dの生成に係る抽出方法は、特に問わないものとするが、本実施形態では、第１の実施形態で説明した、部分集合Σ_iに基づいて行うものとする。 Here, the partial determinator 21 sequentially selects one input symbol as a processing target from the input symbols related to the transition included in the non-deterministic FSA to be processed, and extracts the transition related to the input symbol to be processed. A set E _d of transitions is generated. Further, the partial determinizing unit 21 selects a transition to be determinized from the transitions included in E _d and performs determinizing for the selected transition. Here, the extraction method related to the generation of E _d is not particularly limited, but in this embodiment, the extraction method is performed based on the subset Σ _i described in the first embodiment.

以下、図１９〜２１を参照して、部分決定化部２１及び繰り返し処理部１４により実行される、本実施形態の逐次決定化処理について説明する。なお、本処理の前提として、部分集合生成部１２により部分集合Σ_iが予め生成されているものとする。また、逐次決定化処理の実行時に生成される、後述する各種変数は作業エリアとして機能するＲＡＭ５に一時記憶されるものとする。 Hereinafter, with reference to FIGS. 19 to 21, the sequential determinizing process of the present embodiment executed by the partial determinizing unit 21 and the iterative processing unit 14 will be described. As a premise of this process, it is assumed that the subset Σ _i is generated in advance by the subset generation unit 12. It is assumed that various variables, which will be described later, that are generated when the sequential determinizing process is executed are temporarily stored in the RAM 5 that functions as a work area.

図１９は、本実施形態における逐次決定化処理の手順を示したフローチャートである。まず、繰り返し処理部１４は、繰り返し回数を計数するための変数ｉを１に設定すると（ステップＳ６１）、このｉの値に応じた部分決定化処理（ステップＳ６２）を部分決定化部２１に実行させる。以下、図２０を参照して、ステップＳ６２の部分決定化処理について説明する。 FIG. 19 is a flowchart showing the procedure of the sequential determinizing process in this embodiment. First, the iterative processing unit 14 sets the variable i, which counts the number of repetitions, to 1 (step S61), and causes the partial determinizing unit 21 to execute the partial determinizing process corresponding to this value of i (step S62). The partial determinizing process of step S62 is described below with reference to FIG. 20.

図２０は、ステップＳ６２の部分決定化処理の手順を示したフローチャートである。まず、部分決定化部２１は、Ａ₂の初期状態ｉ₂にＩ₁を代入し（ステップＳ７１）、このｉ₂をキューＳに追加する（ステップＳ７２）。この時点でＳにはｉ₂のみが記憶されている。 FIG. 20 is a flowchart showing the procedure of the partial determinizing process in step S62. First, the partial determinator 21 substitutes I ₁ for the initial state i ₂ of A ₂ (step S71), and adds this i ₂ to the queue S (step S72). At this time, only i ₂ is stored in S.

次いで、部分決定化部２１は、Ｓが空集合か否かを判定し、空集合でないと判定した場合には（Ｓ７３；Ｙｅｓ）、Ｓから要素を一つ取り出しｑ_subに代入した後（ステップＳ７４）、現在のｑ_subに関してΣに含まれた要素（入力記号）全てに対して処理を行ったか否かを判定する（ステップＳ７５）。ここで、Σの要素全てに対して処理を行ったと判定した場合には（ステップＳ７５；Ｙｅｓ）、ステップＳ７３の処理へと再び戻る。 Next, the partial determinizing unit 21 determines whether S is the empty set. If it determines that S is not empty (step S73; Yes), it takes one element out of S and assigns it to q_sub (step S74), and then determines whether all the elements (input symbols) of Σ have been processed for the current q_sub (step S75). If it determines that all the elements of Σ have been processed (step S75; Yes), the process returns to step S73.

一方、ステップＳ７５において、Σの要素全てに対して処理を行っていないと判定した場合には（ステップＳ７５；Ｎｏ）、部分決定化部２１は、まだ処理していない入力記号のうちの１つをｘに代入する（ステップＳ７６）。 On the other hand, if it is determined in step S75 that processing has not been performed for all elements of Σ (step S75; No), the partial determinator 21 selects one of the input symbols that have not yet been processed. Is substituted for x (step S76).

続いて、部分決定化部２１は、入力記号ｘを伴うｑ_subに含まれる状態からの遷移の集合をＥ_dに代入した後（ステップＳ７７）、このＥ_dが空集合か否かを判定する（ステップＳ７８）。ここで、Ｅ_dを空集合と判定した場合（ステップＳ７８；Ｎｏ）、ステップＳ７５の処理へと再び戻る。 Subsequently, the partial determinizing unit 21 assigns a set of transitions from the state included in q _sub with the input symbol x to E _d (step S77), and then determines whether this E _d is an empty set. (Step S78). If it is determined that E _d is an empty set (step S78; No), the process returns to step S75 again.

一方、ステップＳ７８において、Ｅ_dを空集合でないと判定した場合（ステップＳ７８；Ｙｅｓ）、部分決定化部２１は、Ｅ_dから遷移を選択する遷移選択処理（ステップＳ７９）を実行する。以下、図２１を参照して、ステップＳ７９の遷移選択処理について説明する。 On the other hand, when it is determined in step S78 that E _d is not an empty set (step S78; Yes), the partial determinator 21 performs a transition selection process (step S79) for selecting a transition from E _d . Hereinafter, the transition selection process in step S79 will be described with reference to FIG.

まず、部分決定化部２１は、ステップＳ７６で代入した入力記号ｘが、部分集合Σ_iに含まれているか否かを判定する（ステップＳ７９１）。なお、Σ_iのｉは、ステップＳ６１又は後述するステップＳ６５で設定されたｉの値に対応する。 First, the partial determinizing unit 21 determines whether the input symbol x assigned in step S76 is included in the subset Σ_i (step S791). The i of Σ_i corresponds to the value of i set in step S61 or in step S65 described later.

ここで、入力記号ｘが部分集合Σ_iに含まれていると判定した場合には（ステップＳ７９１；Ｙｅｓ）、遷移の集合Ｅ_tにＥ_dを代入し（ステップＳ７９２）、図２０のステップＳ８０の処理へと移行する。 If it determines that the input symbol x is included in the subset Σ_i (step S791; Yes), it assigns E_d to the set of transitions E_t (step S792) and proceeds to step S80 of FIG. 20.

一方、ステップＳ７９１において、入力記号ｘが部分集合Σ_iに含まれないと判定した場合には（ステップＳ７９１；Ｎｏ）、Ｅ_dから任意の遷移を一つ取り出し、遷移の集合Ｅ_tに代入した後（ステップＳ７９３）、図２０のステップＳ８０の処理へと移行する。 On the other hand, if it is determined in step S791 that the input symbol x is not included in the subset Σ _i (step S791; No), one arbitrary transition is extracted from E _d and substituted into the transition set E _t . After (step S793), the process proceeds to step S80 in FIG.

図２０に戻り、部分決定化部２１は、ステップＳ７９で選択した遷移の集合Ｅ_tに属する遷移をＥ_dから除去する（ステップＳ８０）。続いて、部分決定化部２１は、Ｅ_tに含まれる遷移をδとしたとき、遷移δの遷移先の状態の集合ｑ_sをｑ'_subに代入する（ステップＳ８１）。 Returning to FIG. 20, the partial determinizing unit 21 removes from E_d the transitions belonging to the set E_t selected in step S79 (step S80). Subsequently, letting δ denote a transition included in E_t, the partial determinizing unit 21 assigns the set q_s of destination states of the transitions δ to q'_sub (step S81).

次いで、部分決定化部２１は、δ'の遷移元の状態がｑ_subであり、δ'の遷移先の状態がｑ'_subであり、δ'の入力記号がｘであるδ'を、Ｅ₂に追加した後（ステップＳ８２）、ｑ'_subがＱ₂に存在しているか否かを判定する（ステップＳ８３）。ここで、ｑ'_subが既にＱ₂に存在していると判定した場合には（ステップＳ８３；Ｙｅｓ）、ステップＳ７８の処理へと再び戻る。 Next, the partial determinator 21 determines that δ ′ in which the transition source state of δ ′ is q _sub , the transition destination state of δ ′ is q ′ _sub , and the input symbol of δ ′ is x is E After adding to ₂ (step S82), it is determined whether q ′ _sub exists in Q ₂ (step S83). If it is determined that q ′ _sub already exists in Q ₂ (step S83; Yes), the process returns to step S78 again.

一方、ステップＳ８３において、ｑ'_subがＱ₂に存在しないと判定した場合、つまりｑ'_subが新たな状態であると判定した場合には（ステップＳ８３；Ｎｏ）、部分決定化部２１は、Ｑ₂にｑ'_subを追加する（ステップＳ８４）。続いて部分決定化部２１は、ｑ'_subの要素がＦ₁に含まれているか否かを判定する（ステップＳ８５）。ここで、ｑ'_subの要素がＦ₁に含まれていないと判定した場合には（ステップＳ８５；Ｎｏ）、ステップＳ８７の処理へと直ちに移行する。 If, on the other hand, it is determined in step S83 that q'_sub does not exist in Q₂, that is, that q'_sub is a new state (step S83; No), the partial determinizing unit 21 adds q'_sub to Q₂ (step S84). The partial determinizing unit 21 then determines whether an element of q'_sub is included in F₁ (step S85). If it determines that no element of q'_sub is included in F₁ (step S85; No), the process proceeds directly to step S87.

また、ステップＳ８５において、ｑ'_subの要素がＦ₁に含まれていると判定した場合には（ステップＳ８５；Ｙｅｓ）、部分決定化部２１は、ｑ'_subを受理状態の集合Ｆ₂に追加した後（ステップＳ８６）、ステップＳ８７の処理へと移行する。 If it is determined in step S85 that the element of q ′ _sub is included in F ₁ (step S85; Yes), the partial determinator 21 sets q ′ _sub to the set of accepted states F ₂ . After the addition (step S86), the process proceeds to step S87.

続くステップＳ８７では、部分決定化部２１がＳにｑ'_subを追加した後（ステップＳ８７）、ステップＳ７８の処理へと再び戻る。 In subsequent step S87, after partial determinator 21 adds q ′ _sub to S (step S87), the process returns to step S78 again.

一方、ステップＳ７３において、Ｓを空集合と判定した場合には（ステップＳ７３；Ｎｏ）、ステップＳ８８へと移行し、Ｑ₂に追加された状態の名前を付け替えた後（ステップＳ８８）、図１９のステップＳ６３へと移行する。 If, on the other hand, S is determined to be the empty set in step S73 (step S73; No), the process proceeds to step S88, where the states added to Q₂ are renamed (step S88), and then proceeds to step S63 of FIG. 19.

図１９に戻り、繰り返し処理部１４は、ｉの値が繰り返し回数の最大値であるｎを下回るか否かを判定する（ステップＳ６３）。ここで、ｉの値がｎを下回ると判定した場合には（ステップＳ６３；Ｙｅｓ）、繰り返し処理部１４は、ステップＳ６２で導出されたＦＳＡＡ₂をＦＳＡＡ₁とし（ステップＳ６４）、ｉの値を１増やした後（ステップＳ６５）、ステップＳ６２の処理に再び移行する。 Returning to FIG. 19, the iterative processing unit 14 determines whether the value of i is less than n, the maximum number of repetitions (step S63). If it determines that i is less than n (step S63; Yes), the iterative processing unit 14 sets the FSA A₂ derived in step S62 as the FSA A₁ (step S64), increments i by 1 (step S65), and returns to step S62.

また、ステップＳ６３において、ｉの値がｎ以上と判定した場合には（ステップＳ６３；Ｎｏ）、本処理を終了する。ここで、最終的に得られたＦＳＡＡ₂は処理対象となった非決定性ＦＳＡＡ₁を決定化したものとなっている。 If it is determined in step S63 that the value of i is n or more (step S63; No), the process ends. The FSA A₂ finally obtained is the determinized form of the non-deterministic FSA A₁ that was given as the processing target.

また、本実施形態によれば、決定化の対象とする遷移を直接指定することができるため、入力記号による遷移の選択以外の方法でも決定化の対象となる遷移を選択することができる。 Further, according to the present embodiment, since the transition to be determinized can be directly specified, the transition to be determinized can be selected by a method other than the selection of the transition by the input symbol.
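The flow of FIGS. 19 to 21 can be sketched in Python as follows. This is a hedged sketch, not the patented implementation. The function names and the transition list `fsa` are hypothetical (its a transitions and accepting state are assumptions), the queue S is modelled with `collections.deque`, and the renaming of step S88 is folded into the numbering assigned when a state is first seen. Marking the start state as accepting when it contains a state of F₁ is not spelled out in the flow described above and is added here as an assumption.

```python
from collections import deque

def partial_determinize_direct(transitions, initial, finals, sigma, sigma_i):
    """One pass of steps S71 to S88, with no replacement of input symbols."""
    finals = set(finals)
    start = frozenset(initial)                 # S71: i2 <- I1
    ids = {start: 0}                           # Q2: subset -> new state name
    queue = deque([start])                     # S72: queue S
    e2 = []
    f2 = {0} if start & finals else set()      # assumption: keep a final start state
    while queue:                               # S73
        q_sub = queue.popleft()                # S74
        for x in sigma:                        # S75, S76
            # S77: transitions on x leaving any state contained in q_sub
            e_d = [t for t in transitions if t[0] in q_sub and t[1] == x]
            while e_d:                         # S78
                # S791 to S793: take all of E_d for a target symbol,
                # otherwise a single (here: the first) transition
                e_t = e_d if x in sigma_i else e_d[:1]
                e_d = [t for t in e_d if t not in e_t]         # S80
                q_next = frozenset(dst for _, _, dst in e_t)   # S81
                if q_next not in ids:          # S83
                    ids[q_next] = len(ids)     # S84
                    if q_next & finals:        # S85
                        f2.add(ids[q_next])    # S86
                    queue.append(q_next)       # S87
                e2.append((ids[q_sub], x, ids[q_next]))        # S82
    return list(dict.fromkeys(e2)), {0}, f2    # S88 is implicit in the numbering

def sequential_determinize_direct(transitions, initial, finals, sigma, subsets):
    """Steps S61 to S65: repeat the pass once per subset Sigma_i."""
    for sigma_i in subsets:
        transitions, initial, finals = partial_determinize_direct(
            transitions, initial, finals, sigma, sigma_i)
    return transitions, initial, finals

fsa = [(0, "a", 1), (0, "a", 2), (1, "b", 0), (2, "b", 3), (3, "c", 0)]
subsets = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
result, start, accepting = sequential_determinize_direct(
    fsa, {0}, {3}, ["a", "b", "c"], subsets)
```

Because the choice between "all of E_d" and "one transition" is made per transition set, the membership test `x in sigma_i` could be swapped for any other predicate on transitions, which is the flexibility this embodiment points out.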

［第３の実施形態］ [Third Embodiment]
第１の実施形態で説明した方法は、ＦＳＡだけでなくＦＳＡを拡張したオートマトンの決定化に対しても同様に適用することが可能である。本実施形態では、ＦＳＡを拡張したオートマトンの決定化を行うオートマトン決定化装置３００について説明する。なお、上述した第１の実施形態と同様の構成については、同一の符号を付与しその説明を省略する。 The method described in the first embodiment can be applied not only to FSAs but also, in the same way, to the determinization of automata that extend the FSA. This embodiment describes an automaton determinizing apparatus 300 that determinizes such extended automata. Components that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

図２２は、ＣＰＵ１とＲＯＭ４又は記憶部６に予め記憶された所定のプログラムとの協働により実現される、オートマトン決定化装置３００の機能的構成を示した図である。同図に示したように、オートマトン決定化装置３００は決定化処理部３１、部分集合生成部１２、部分決定化部３２、繰り返し処理部３３を備えている。 FIG. 22 is a diagram showing a functional configuration of the automaton determinizing apparatus 300 realized by the cooperation of the CPU 1 and a predetermined program stored in the ROM 4 or the storage unit 6 in advance. As shown in the figure, the automaton determinizing apparatus 300 includes a determinizing processing unit 31, a subset generation unit 12, a partial determinizing unit 32, and an iterative processing unit 33.

決定化処理部３１は、遷移先の状態が複数存在する重み付き有限状態オートマトン（Weighted Finite State Automaton；ＷＦＳＡ）、有限状態トランスデューサ（Finite State Transducer；ＦＳＴ）、重み付き有限状態トランスデューサ（Weighted Finite State Transducer；ＷＦＳＴ）等の非決定性ＦＳＡを拡張した非決定性状態にあるオートマトン（非決定性オートマトン）の決定化を行う。 The determinizing processing unit 31 determinizes automata in a non-deterministic state (non-deterministic automata) that extend the non-deterministic FSA and have a plurality of destination states, such as a weighted finite state automaton (WFSA), a finite state transducer (FST), or a weighted finite state transducer (WFST).

ここで、ＷＦＳＡとは、ＦＳＡの遷移に重みを加えたものである。したがって、ＷＦＳＡの遷移には、入力記号と重みとが割り当てられていることになる。なお、ここで「重み」とは、何らかの確率値やスコア、ペナルティ等であり、入力記号を受理する経路に沿って所定の規則（足し算、かけ算、最小値、最大値等）によって演算されるものである。 Here, WFSA is obtained by adding a weight to the transition of FSA. Therefore, an input symbol and a weight are assigned to the WFSA transition. Here, “weight” means any probability value, score, penalty, etc., and is calculated according to a predetermined rule (addition, multiplication, minimum value, maximum value, etc.) along the path for accepting the input symbol. It is.

An FST is an FSA in which each transition carries an output symbol in addition to its input symbol. Given a string of input symbols, it outputs a string of output symbols. FSTs are used, for example, for symbol string conversion.

A WFST is a model in which each transition carries an output symbol in addition to the input symbol and weight of a WFSA transition. In other words, each WFST transition is assigned three elements: an input symbol, an output symbol, and a weight. WFSTs are used, for example, to represent models in speech recognition.

Methods for determinizing these extended FSA models (in particular WFSAs and FSTs) are described in, for example, Mehryar Mohri, Finite-state transducers in language and speech processing, Computational Linguistics, Volume 23, Issue 2 (June 1997), pages 269-311. The determinizing processing unit 31 is a functional unit that performs determinization using these known methods. Hereinafter, the determinizing method performed by the determinizing processing unit 31 is referred to as the conventional determinizing method.

The conventional determinizing method performed by the determinizing processing unit 31 is described below. Here, a WFST is the processing target: the nondeterministic WFST to be determinized is T₁ = (Q₁, Σ, Δ, E₁, I₁, F₁, λ₁, ρ₁), and the deterministic WFST obtained by determinizing T₁ is T₂ = (Q₂, Σ, Δ, E₂, i₂, F₂, λ₂, ρ₂).

In the nondeterministic WFST T₁ above, Q₁ is the set of states, Σ is the set of input symbols, Δ is the set of output symbols, E₁ is the set of transitions with E₁ ⊆ Q₁ × Σ × Δ × K × Q₁, I₁ is the set of initial states, and F₁ is the set of accepting states. λ₁ is the initial weight function: it takes an initial state as its argument and returns the initial weight assigned to that state. ρ₁ is the end weight function: it takes an accepting state as its argument and returns the end weight assigned to that state. The same applies to Q₂, E₂, F₂, and ρ₂ of T₂, except that i₂ is the single initial state and λ₂ is its initial weight. K is the set of weights, for example the set of all integers, all positive integers, or all real numbers.

FIG. 23 is a flowchart showing the procedure of the determinizing process performed by the determinizing processing unit 31. First, the determinizing processing unit 31 sets F₂ to the empty set (step S91) and then assigns the smallest of the initial weights of all initial states to λ₂ (step S92).

Next, the determinizing processing unit 31 generates a set whose elements are triples of a state name (q) of T₁, a character string, and a weight, and assigns this set to i₂ as the initial state name of T₂ (step S93). The character string in a triple is called the “remainder string”; in step S93 it is set to the empty string ε. The weight in a triple is called the “remainder weight”; in step S93 it is set to λ₁(q) − λ₂.
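Steps S91 to S93 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the function name `initial_subset`, the dict that stands in for λ₁, and the use of the empty Python string for ε are all hypothetical choices made for the example.

```python
def initial_subset(initial_weights):
    """Build i2 and lambda2 from the initial states of T1.

    initial_weights: dict mapping each initial state q to its weight lambda1(q)
    (an assumed representation of I1 and lambda1).
    """
    lam2 = min(initial_weights.values())           # S92: smallest initial weight
    # S93: one triple (state, remainder string, remainder weight) per initial state
    i2 = frozenset((q, "", w - lam2) for q, w in initial_weights.items())
    return i2, lam2


if __name__ == "__main__":
    print(initial_subset({0: 0}))
```

With a single initial state 0 whose initial weight is 0, the sketch yields the triple set {(0, ε, 0)} and the initial weight 0.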

Subsequently, the determinizing processing unit 31 adds i₂ to the queue S (step S94) and then determines whether S is empty (step S95). If S is determined not to be empty (step S95; Yes), the determinizing processing unit 31 takes one element out of S and assigns it to q₂ (step S96).

Next, the determinizing processing unit 31 determines whether, among the triples (q, l, w) of state name, remainder string, and remainder weight contained in q₂, any q is included in the set F₁ of accepting states of T₁ (step S97). If no q is included in F₁ (step S97; No), the process proceeds directly to step S100.

If it is determined in step S97 that a q is included in the set F₁ of accepting states of T₁ (step S97; Yes), the determinizing processing unit 31 adds q₂ to the set F₂ of accepting states of T₂ (step S98).

Next, for every q that belongs to a triple (q, l, w) in q₂ and is included in the set F₁ of accepting states, the determinizing processing unit 31 computes w + ρ₁(q) and assigns the minimum of these values to ρ₂(q₂) as the end weight of q₂ (step S99).
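The acceptance test and end weight of steps S97 to S99 can be illustrated with a short Python sketch. The function name `end_weight` and the dict representation of ρ₁ (keyed by the accepting states in F₁) are assumptions made for the example.

```python
def end_weight(q2, rho1):
    """Return rho2(q2), or None when q2 contains no accepting state of T1.

    q2:   iterable of triples (q, l, w)
    rho1: dict mapping each accepting state of T1 to its end weight (assumed form)
    """
    # S97: keep only the triples whose state q is accepting in T1
    candidates = [w + rho1[q] for (q, _l, w) in q2 if q in rho1]
    if not candidates:
        return None              # S97; No: q2 is not accepting
    return min(candidates)       # S99: minimum of w + rho1(q)


if __name__ == "__main__":
    # States 1 and 2 with end weights 0 and 2, as in the FIG. 24 description
    print(end_weight({(1, "", 1), (2, "", 0)}, {1: 0, 2: 2}))
```

For the triple set {(1, ε, 1), (2, ε, 0)} with end weights 0 for state 1 and 2 for state 2, the candidates are 1 + 0 and 0 + 2, so the end weight is 1.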

Subsequently, the determinizing processing unit 31 determines whether steps S101 to S109, described below, have been executed for every element (input symbol) of Σ (step S100). If all elements of Σ have been processed (step S100; Yes), the process returns to step S95.

If it is determined in step S100 that an unprocessed element remains (step S100; No), the determinizing processing unit 31 assigns one of the unprocessed input symbols to x (step S101) and then determines whether Γ(q₂, x) is empty (step S102). Here, Γ(q₂, x) = {(q, l, w) ∈ q₂ | δ ∈ E₁, prev(δ) = q, input(δ) = x}. That is, Γ(q₂, x) is the set of triples (q, l, w) in q₂ for which E₁ contains a transition δ whose source state is q and whose input symbol is x.

If Γ(q₂, x) is determined to be empty in step S102 (step S102; No), the process returns to step S100. If Γ(q₂, x) is determined not to be empty (step S102; Yes), the determinizing processing unit 31 computes the weight of the transition whose source is q₂ and whose input symbol is x, and assigns the result to w₂ (step S103). The value assigned to w₂ is the smallest of the following values taken over all triples (q, l, w) in Γ(q₂, x): w plus the smallest weight among the transitions δ in E₁ whose source is q and whose input symbol is x. Here, weight(δ) denotes the weight of the transition δ.
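The computation of w₂ in step S103 can be sketched as follows. The tuple layout `(prev, input, output, weight, next)` for a transition and the function name `transition_weight` are assumptions made for this illustration; the toy transitions in the example are hypothetical and are not taken from the figures.

```python
def transition_weight(q2, E1, x):
    """Return w2 for subset state q2 and input x, or None if Gamma(q2, x) is empty.

    q2: iterable of triples (q, l, w)
    E1: iterable of transitions (prev, input, output, weight, next)
    """
    best = None
    for (q, _l, w) in q2:
        weights = [wt for (prev, inp, _out, wt, _nxt) in E1
                   if prev == q and inp == x]
        if not weights:                  # (q, l, w) is not in Gamma(q2, x)
            continue
        candidate = w + min(weights)     # w plus the cheapest matching transition
        best = candidate if best is None else min(best, candidate)
    return best


if __name__ == "__main__":
    E1 = [(1, "b", "B", 3, 3), (2, "b", "B", 1, 3), (2, "b", "C", 5, 0)]
    print(transition_weight({(1, "", 1), (2, "", 0)}, E1, "b"))
```

In the toy example the candidates are 1 + 3 for state 1 and 0 + 1 for state 2, so w₂ is 1.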

Subsequently, the determinizing processing unit 31 computes the output symbol of the transition whose source is q₂ and whose input symbol is x, and assigns the result to l₂ (step S104). The value assigned to l₂ is the longest common prefix of the following strings taken over all triples (q, l, w) in Γ(q₂, x): the string l followed by the longest common prefix of the output symbols of the transitions δ in E₁ whose source is q and whose input symbol is x. For example, suppose that Γ(q₂, x) contains only one triple (q, l, w), that two transitions δ satisfy the condition, that their output symbols output(δ) are AB and AC, and that the string l is P. The string assigned to l₂ is then PA.
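The PA example of step S104 can be reproduced with a small Python sketch. It assumes that every output symbol is a single character, so that a string prefix equals a symbol-sequence prefix, and it uses the standard-library function `os.path.commonprefix`, which compares strings character by character. The function name `output_label`, the tuple layout `(prev, input, output, weight, next)`, and the weights and destination states in the example are hypothetical.

```python
from os.path import commonprefix


def output_label(gamma_triples, E1, x):
    """Return l2 for the triples of Gamma(q2, x).

    gamma_triples: iterable of triples (q, l, w), each with a matching transition
    E1:            iterable of transitions (prev, input, output, weight, next)
    """
    candidates = []
    for (q, l, _w) in gamma_triples:
        outputs = [out for (prev, inp, out, _wt, _nxt) in E1
                   if prev == q and inp == x]
        # l followed by the longest common prefix of the matching outputs
        candidates.append(l + commonprefix(outputs))
    return commonprefix(candidates)


if __name__ == "__main__":
    E1 = [(0, "x", "AB", 1, 1), (0, "x", "AC", 1, 2)]
    print(output_label([(0, "P", 0)], E1, "x"))   # one triple, outputs AB and AC
```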

Next, the determinizing processing unit 31 generates the destination state q'₂ reached from q₂ by the input symbol x (step S105). To generate q'₂, a set of triples, each consisting of a state name of T₁, a remainder string, and a remainder weight, must be generated. The state in each triple is an element q' of ν(q₂, x), where ν(q₂, x) = {q' | (q, l, w) ∈ q₂, δ ∈ E₁, prev(δ) = q, input(δ) = x, next(δ) = q'}. That is, for the triples (q, l, w) in q₂, ν(q₂, x) is the set of destination states q' of the transitions δ in E₁ whose source state is q and whose input symbol is x.

The remainder string, the second value of the triple, is determined for q' as follows. For each quadruple (q, l, w, δ) in the set γ(q₂, x), the string formed by appending the output string of δ to l is taken and the string l₂ is removed from its front; the remainder string is the longest common prefix of the resulting strings. Here, γ(q₂, x) = {(q, l, w, δ) ∈ q₂ × E₁ | prev(δ) = q, input(δ) = x}. That is, γ(q₂, x) is the set of quadruples (q, l, w, δ) obtained by pairing each triple (q, l, w) in q₂ with a transition δ in E₁ whose input symbol is x and whose source state is q.

The remainder weight, the third value of the triple, is determined for q' as follows. For each quadruple (q, l, w, δ) in the set γ(q₂, x), the weight of δ is added to w and w₂ is subtracted; the remainder weight is the smallest of the resulting values. The set of triples generated by the above calculations is q'₂.

In the subsequent step S106, the determinizing processing unit 31 adds the transition from q₂ to q'₂ to E₂. The input symbol of the added transition is x, its output symbol is l₂, and its weight is w₂.
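Steps S105 and S106 can be sketched in Python as shown below. The sketch rests on several assumptions: transitions are tuples `(prev, input, output, weight, next)`, output symbols are single characters, the values l₂ and w₂ have already been computed as in steps S103 and S104 (so l₂ is a prefix of every l followed by output(δ)), and for each destination q' only the quadruples whose transition leads to q' are considered, which is one reading of the phrase "for q'". The function name `destination_state` and the example values are hypothetical.

```python
from os.path import commonprefix


def destination_state(q2, E1, x, l2, w2):
    """Build q2' as a frozenset of triples (q', remainder string, remainder weight)."""
    per_dest = {}
    for (q, l, w) in q2:
        for (prev, inp, out, wt, nxt) in E1:
            if prev == q and inp == x:             # (q, l, w, delta) in gamma(q2, x)
                rest = (l + out)[len(l2):]         # strip the emitted prefix l2
                per_dest.setdefault(nxt, []).append((rest, w + wt - w2))
    return frozenset(
        (nxt, commonprefix([s for s, _ in items]), min(r for _, r in items))
        for nxt, items in per_dest.items())


if __name__ == "__main__":
    q2 = frozenset({(0, "P", 0)})
    E1 = [(0, "x", "AB", 1, 1), (0, "x", "AC", 2, 2)]
    l2, w2 = "PA", 1                               # values from steps S104 and S103
    q2_next = destination_state(q2, E1, "x", l2, w2)
    E2 = [(q2, "x", l2, w2, q2_next)]              # S106: the new transition of T2
    print(sorted(q2_next))
```

In this toy case the transition of T₂ emits PA with weight 1, and the destination state keeps the remainders (1, B, 0) and (2, C, 1) so that the deferred output and weight can be emitted later.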

Next, the determinizing processing unit 31 determines whether the q'₂ generated in step S105 is included in Q₂ (step S107). If it is included (step S107; No), the process returns to step S100.

If it is determined in step S107 that q'₂ is not included in Q₂ (step S107; Yes), the determinizing processing unit 31 adds q'₂ to Q₂ (step S108), adds q'₂ to S (step S109), and then returns to step S100.

On the other hand, if S is determined to be empty in step S95 (step S95; No), the determinizing processing unit 31 renames the states belonging to Q₂, the state set of T₂ obtained by determinizing T₁ (step S110), and ends the process.

That is, before renaming, the name of each state in Q₂ is represented as a set of triples, each consisting of the name of a state in Q₁, a remainder output string, and a remainder weight. This representation is no longer needed once determinization is complete, so each name is replaced with a new one. The new names may, for example, be numbers assigned to the states in order starting from 0; this reduces the required storage. For extended FSA models such as WFSTs as well, a state in Q₂ before renaming is called a “determinized state”, as in the FSA case.
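The renaming of step S110 can be sketched as follows: each triple-set name is mapped to a consecutive integer starting from 0, after which the triple sets themselves can be discarded. The function name `rename_states`, the tuple layout of transitions, and the toy values in the example are assumptions made for illustration.

```python
def rename_states(i2, E2, rho2):
    """Replace triple-set state names with integers 0, 1, 2, ...

    i2:   initial state of T2 (a frozenset of triples)
    E2:   list of transitions (src, input, output, weight, dst)
    rho2: dict mapping accepting states of T2 to end weights
    """
    ids = {i2: 0}                                   # the initial state becomes 0
    for (src, _x, _out, _w, dst) in E2:
        for s in (src, dst):
            ids.setdefault(s, len(ids))             # next unused number
    for s in rho2:
        ids.setdefault(s, len(ids))
    new_E2 = [(ids[s], x, out, w, ids[d]) for (s, x, out, w, d) in E2]
    new_rho2 = {ids[s]: r for s, r in rho2.items()}
    return new_E2, new_rho2, ids


if __name__ == "__main__":
    i2 = frozenset({(0, "", 0)})
    s1 = frozenset({(1, "", 1), (2, "", 0)})
    print(rename_states(i2, [(i2, "a", "A", 1, s1)], {s1: 1})[:2])
```

After renaming, each state costs one integer regardless of how many triples its original name contained, which is the storage saving described above.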

An example of determinizing a WFST by the determinizing process described above is explained with reference to FIGS. 24 to 26. FIG. 24 shows the nondeterministic WFST T₁ before determinization. In the figure, the state labeled “0/0” is the initial state; the number to the left of “/” is the state number and the number to the right is the initial weight. The states “1/0” and “2/2”, drawn as double circles, are accepting states; the number to the left of “/” is the state number and the number to the right is the end weight. That is, the end weight of state 1 is 0 and the end weight of state 2 is 2. The label on each transition means “input symbol : output symbol / weight”. For example, the transition from state 0 to state 1 has input symbol a, output symbol A, and weight 1.

FIG. 25 shows the WFST immediately before the states are renamed in step S110 of the determinizing process. The state indicated by B21 is the initial state; its set of triples is {(0, ε, 0)} and its initial weight is 0. That is, the state before determinization that corresponds to B21 is state 0, the remainder output string is the empty string, and the remainder weight is 0.

The state indicated by B22 is an accepting state whose end weight is 1. The states before determinization that correspond to B22 are states 1 and 2. In the triple for state 1, the remainder string is the empty string and the remainder weight is 1. In the triple for state 2, the remainder string is the empty string and the remainder weight is 0. The other states B23 to B25 are read in the same way.

FIG. 26 shows the WFST after the states have been renamed in step S110 of the determinizing process, that is, the determinized WFST. Comparing the state names in FIG. 26 with those in FIG. 25 shows that renaming reduces the amount of memory required to store the state names. For example, the state indicated by B24 in FIG. 25 is “{(0, ε, 2), (1, A, 1), (2, A, 0)}/1”, which stores a set of three triples and an end weight. In contrast, the state indicated by B31 in FIG. 26, which corresponds to B24, is “3/1” and stores only a state number and an end weight.

However, when only the determinizing method of the determinizing processing unit 31 described above is used, the amount of memory needed to store the determinized states grows as the number of states and transitions of the WFST increases. In this embodiment, therefore, the subset generation unit 12, the partial determinizing unit 32, and the iterative processing unit 33 are provided to reduce the amount of memory for storing the determinized states. The partial determinizing unit 32 and the iterative processing unit 33 are described below.

Based on the subset Σ_i of input symbols corresponding to the iteration count i, the partial determinizing unit 32 selects the transitions to be determinized and replaces the input symbols of the transitions that are not to be determinized with different input symbols. It then applies the conventional determinizing method described with reference to FIG. 23, thereby determinizing only the selected transitions (partial determinization).

The iterative processing unit 33 controls the partial determinizing unit 32 so that the partial determinizing process is executed repeatedly, a number of times given by the total iteration count n.

Next, the sequential determinizing process executed by the partial determinizing unit 32 and the iterative processing unit 33 is described with reference to FIGS. 27 and 28. As a premise of this process, the subsets Σ_i are assumed to have been generated in advance by the subset generation unit 12. The various variables described below, which are generated while the sequential determinizing process runs, are temporarily stored in the RAM 5, which functions as a work area.

FIG. 27 is a flowchart showing the procedure of the sequential determinizing process in this embodiment. First, the iterative processing unit 33 sets the variable i, which counts the iterations, to 1 (step S121) and causes the partial determinizing unit 32 to execute the partial determinizing process for this value of i (step S122). The partial determinizing process of step S122 is described below with reference to FIG. 28.

FIG. 28 is a flowchart showing the procedure of the partial determinizing process of step S122. First, the partial determinizing unit 32 assigns to E_r the set of transitions to be determinized, that is, the transitions whose input symbols are included in Σ_i (step S131). Next, the partial determinizing unit 32 assigns to E_d the set of transitions that are not to be determinized, that is, the transitions whose input symbols are not included in Σ_i (step S132). The i in Σ_i corresponds to the value of i set in step S121 or in step S125 described below.

Subsequently, the partial determinizing unit 32 determines whether steps S134 and S135, described below, have been applied to every transition δ_d belonging to E_d (step S133). If an unprocessed transition δ_d remains (step S133; No), the partial determinizing unit 32 generates an input symbol x_new that satisfies equation (3) above, that is, an input symbol such that the input symbols of the transitions in E_d become mutually distinct and also distinct from those in Σ_i (step S134). The partial determinizing unit 32 then replaces the input symbol of the transition δ_d with the generated x_new (step S135) and returns to step S133.

On the other hand, if it is determined in step S133 that steps S134 and S135 have been applied to every transition δ_d belonging to E_d (step S133; Yes), the process proceeds to step S136.

Next, the partial determinizing unit 32 adds the transitions δ_d belonging to E_d to E_r (step S136), and then causes the determinizing processing unit 31 to determinize the nondeterministic WFST T_r = (Q₁, Σ_r, Δ, E_r, I₁, F₁, λ₁, ρ₁) containing this E_r, assigning the result to T'_r = (Q₂, Σ_r, Δ, E'_r, i₂, F₂, λ₂, ρ₂) (step S137). Here, Σ_r is the set of input symbols after the replacement. The determinizing procedure performed in step S137 is the same as that described with reference to FIG. 23, so its description is omitted.

Subsequently, the partial determinizing unit 32 restores the input symbols Σ_r of T'_r to the original input symbols they had before the replacement in step S135, assigns this T'_r to T₂ = (Q₂, Σ, Δ, E₂, i₂, F₂, λ₂, ρ₂) (step S138), and proceeds to step S123 in FIG. 27. At this point, T₂ is the WFST obtained by determinizing only those transitions of T₁ whose input symbols belong to Σ_i.
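The symbol replacement of steps S131 to S136 and the restoration of step S138 can be sketched in Python as follows. The determinization call of step S137 is deliberately left out. The function names `relabel_inputs` and `restore_inputs`, the tuple layout `(prev, input, output, weight, next)`, and the way fresh symbols are built (a tuple of the original symbol and a counter, which cannot collide with the string symbols of Σ_i) are assumptions made for the example, not the construction defined by equation (3).

```python
def relabel_inputs(E1, sigma_i):
    """Give every transition whose input is outside sigma_i a unique fresh input symbol."""
    restore = {}                      # fresh symbol -> original input symbol
    E_r = []
    for (prev, inp, out, w, nxt) in E1:
        if inp in sigma_i:            # S131: transition to be determinized
            E_r.append((prev, inp, out, w, nxt))
        else:                         # S132 to S136: hide it behind a fresh symbol
            x_new = (inp, len(restore))
            restore[x_new] = inp
            E_r.append((prev, x_new, out, w, nxt))
    return E_r, restore


def restore_inputs(E, restore):
    """S138: map fresh input symbols back to the original ones."""
    return [(p, restore.get(i, i), o, w, n) for (p, i, o, w, n) in E]


if __name__ == "__main__":
    E1 = [(0, "a", "A", 1, 1), (0, "b", "B", 1, 1), (0, "b", "B", 2, 2)]
    E_r, restore = relabel_inputs(E1, {"a"})
    print([t[1] for t in E_r])
    print(restore_inputs(E_r, restore) == E1)
```

Because each relabeled transition carries an input symbol that no other transition shares, a conventional determinization pass cannot merge it with anything, so only the transitions labeled from Σ_i are actually determinized.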

Returning to FIG. 27, the iterative processing unit 33 determines whether the value of i is less than n, the maximum number of iterations (step S123). If i is less than n (step S123; Yes), the iterative processing unit 33 sets the WFST T₂ determinized in step S122 as the WFST T₁ (step S124), increments i by 1 (step S125), and returns to step S122.

If it is determined in step S123 that the value of i is greater than or equal to n (step S123; No), the process ends. The WFST T₂ finally obtained is the determinized form of the nondeterministic WFST T₁ that was the processing target.
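The control flow of FIG. 27 can be summarized by the following Python sketch. The partial determinizing step is passed in as a callable, since its details belong to FIG. 28; the names `sequential_determinize` and `partial_determinize` are hypothetical, and the list of subsets is assumed to be non-empty with the last subset covering all input symbols.

```python
def sequential_determinize(T1, subsets, partial_determinize):
    """Apply partial determinization once per subset Sigma_1 .. Sigma_n."""
    n = len(subsets)
    i = 1                                              # S121
    while True:
        T2 = partial_determinize(T1, subsets[i - 1])   # S122
        if i < n:                                      # S123; Yes
            T1 = T2                                    # S124: feed the result back in
            i += 1                                     # S125
        else:                                          # S123; No
            return T2


if __name__ == "__main__":
    # Stub that only records which subset each pass received
    def record(T, sigma):
        return T + [sorted(sigma)]

    print(sequential_determinize([], [{"a"}, {"a", "b"}], record))
```

The stub in the example only records the subset used in each pass; a real pass would relabel, determinize, rename, and restore as described for steps S131 to S138.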

The operation of the sequential determinizing process described above is explained with reference to FIGS. 29 to 31, using the nondeterministic WFST T₁ shown in FIG. 24. The total iteration count n is 2, and the subsets Σ_i generated by the subset generation unit 12 are Σ₁ = {a} and Σ₂ = {a, b}.

First, for the first determinization (i = 1), FIG. 29 shows the WFST after determinization but before the state names are replaced (immediately before step S110 is executed). The figure shows that the transitions with input symbol a have been determinized. The input symbol b has been replaced with the symbols b₁ and b₂; restoring them yields b in both cases, so the transitions with input symbol b have not yet been determinized. Renaming the states in the subsequent step S110 and then restoring the input symbols of FIG. 29 to the original input symbols in step S138 yields the WFST shown in FIG. 30.

Next, the second determinization (i = 2) is executed; that is, the WFST in FIG. 30 is determinized. As a result, the WFST in FIG. 30 becomes the WFST in FIG. 31. FIG. 31 shows the WFST before the states are renamed. Renaming the states yields FIG. 26.

Incidentally, as described above, if the WFST in FIG. 24 is determinized using only the determinizing process described with reference to FIG. 23 (the conventional determinizing method), the result before the states are renamed is as shown in FIG. 25. Counting the triples (state name, remainder string, remainder weight) that must be stored at this point gives one for state B21, {(0, ε, 0)}; two for state B22, {(1, ε, 1), (2, ε, 0)}; two for state B23, {(0, ε, 0), (3, ε, 1)}; three for state B24, {(0, ε, 2), (1, A, 1), (2, A, 0)}; and three for state B25, {(0, ε, 0), (1, ε, 1), (2, ε, 0)}, for a total of 11.

In contrast, if the WFST in FIG. 24 is determinized by the sequential determinizing process described with reference to FIGS. 27 and 28, the first determinization gives FIG. 29, with one triple for state B41, {(0, ε, 0)}; two for state B42, {(1, ε, 1), (2, ε, 0)}; and one for state B43, {(3, ε, 0)}, for a total of 4. The second determinization gives FIG. 31, and counting in the same way gives a total of 8.

Therefore, with the sequential determinizing process, the number of triples that must be stored as state names during determinization is at most 8, so the maximum total number of elements is smaller than with the method that uses only the conventional determinizing process. In other words, the amount of memory needed to store the determinized states can be reduced.
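The comparison above can be checked with a few lines of Python. The triple sets are copied from the description of FIGS. 25 and 29; the total of 8 for the second pass is taken from the text as stated, because the individual states of FIG. 31 are not listed. The variable names are chosen for this illustration only.

```python
# Determinized states of FIG. 25 (conventional method), "" stands for the empty string
conventional = {
    "B21": {(0, "", 0)},
    "B22": {(1, "", 1), (2, "", 0)},
    "B23": {(0, "", 0), (3, "", 1)},
    "B24": {(0, "", 2), (1, "A", 1), (2, "A", 0)},
    "B25": {(0, "", 0), (1, "", 1), (2, "", 0)},
}
# Determinized states of FIG. 29 (first pass of the sequential method)
first_pass = {
    "B41": {(0, "", 0)},
    "B42": {(1, "", 1), (2, "", 0)},
    "B43": {(3, "", 0)},
}
second_pass_total = 8   # stated in the text for FIG. 31

conventional_total = sum(len(s) for s in conventional.values())
first_pass_total = sum(len(s) for s in first_pass.values())
# States are renamed after each pass, so only one pass is held in memory at a time
peak_sequential = max(first_pass_total, second_pass_total)

print(conventional_total, first_pass_total, peak_sequential)
```

The peak is the maximum over the passes rather than their sum because the triple sets of one pass are replaced by plain state numbers before the next pass starts.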

Next, as a modification of the automaton determinizing apparatus 300 according to this embodiment, a mode in which the determinizing method of this embodiment (the sequential determinizing process) is applied to a speech recognition apparatus is described.

FIG. 32 schematically shows the configuration of the speech recognition apparatus 500. As shown in FIG. 32, the speech recognition apparatus 500 includes a feature quantity extraction unit 501 that extracts the feature quantities needed for speech recognition from a speech signal input via a microphone or the like (not shown), a synthesis optimization unit 502 that composes and optimizes the WFSTs described below, and a decoder 503 that converts the extracted feature quantities into a character string based on the WFST optimized by the synthesis optimization unit 502.

The speech recognition apparatus 500 also stores an acoustic model 504, a word dictionary 505, and a language model 506 in advance in storage means (not shown). The acoustic model 504 holds information for determining which phoneme is closest to the input speech signal. The word dictionary 505 holds the phoneme sequence that makes up each word. The language model 506 holds information (scores) for determining which word sequences are likely in the language to be recognized.

In the speech recognition apparatus 500, the acoustic model 504, the word dictionary 505, and the language model 506 are assumed to be expressed as the WFSTs described above. An example of how to create the WFSTs used in such a speech recognition apparatus is described in Mehryar Mohri and Michael Riley, Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition, EUROSPEECH '99, Volume 2, pages 811-814.

Specifically, an HMM (Hidden Markov Model) is generally used for the acoustic model 504. In this case, the model can be expressed as a WFST by using, as the input symbol, the number of a function that computes the transition probability and output probability and, as the output symbol, a phoneme. However, the weights of this WFST are computed dynamically using the function whose number is specified by the input symbol, that is, when the decoder 503 receives the feature quantities.

The word dictionary 505 can be expressed as a WFST whose input symbols are phonemes and whose output symbols are words. The language model 506 can be expressed as a WFST whose input and output symbols are both words and whose weights are the language model scores.
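The two representations just described can be sketched as plain data. This is only an illustration: the `Arc` type, the toy pronunciation of "cat", the score 2.3, and the helper `phonemes_to_words` are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int       # source state
    dst: int       # destination state
    ilabel: str    # input symbol
    olabel: str    # output symbol ("" stands for the empty string)
    weight: float


# Word dictionary 505: phonemes on the input side, a word on the output side.
# The pronunciation below is a made-up toy entry.
lexicon_arcs = [
    Arc(0, 1, "k", "cat", 0.0),
    Arc(1, 2, "ae", "", 0.0),
    Arc(2, 0, "t", "", 0.0),
]

# Language model 506: the same word on both sides, weighted by the LM score.
lm_arcs = [Arc(0, 0, "cat", "cat", 2.3)]


def phonemes_to_words(arcs, phonemes, start=0):
    """Follow the (deterministic) arcs and collect the non-empty outputs."""
    state, words = start, []
    for p in phonemes:
        arc = next(a for a in arcs if a.src == state and a.ilabel == p)
        if arc.olabel:
            words.append(arc.olabel)
        state = arc.dst
    return words
```

Here `phonemes_to_words(lexicon_arcs, ["k", "ae", "t"])` walks the three dictionary arcs and collects the single word emitted on the first arc.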

Using a known technique for composing a plurality of WFSTs into one, the synthesis optimization unit 502 composes any or all of the WFST representing the acoustic model 504, the WFST representing the word dictionary 505, and the WFST representing the language model 506 into a single WFST.

The synthesis optimization unit 502 also optimizes each WFST, including the composed WFST, so that the processing load on the decoder 503 is reduced and the WFST is easy for the decoder 503 to process. The sequential determinizing process according to the present embodiment is one of the processes that perform this optimization. That is, the synthesis optimization unit 502 includes the automaton determinizing apparatus 300, which executes the sequential determinizing process of the present embodiment described above.

The decoder 503 uses the WFST composed and optimized by the synthesis optimization unit 502 to convert the feature values of the speech signal extracted by the feature extraction unit 501 into a character string, and outputs the string as the recognition result. A Viterbi search, for example, can be used for the conversion into a character string.

FIG. 33 shows how the required storage area changes when the sequential determinizing process described above is applied to the WFST that the synthesis optimization unit 502 obtains by composing the WFST representing the word dictionary 505 with the WFST representing the language model 506. The solid line shows the result of determinization by the method of the present embodiment (the sequential determinizing process). For comparison, the broken line shows the result of determinization by the conventional method.

The horizontal axis represents the number of times the processing from step S95 to step S109 in FIG. 23 was repeated, in units of ×1000. In other words, it shows how many of the determinized states placed in the queue S have been processed. The vertical axis represents the total number of triples (state name, remainder string, remainder weight) contained in the determinized states held in Q₂, in units of ×1000000. Before determinization, the WFST has 18688 states and 230930 transitions, and there are 60 input symbols.

In FIG. 33, the broken line shows the result of determinization by the conventional method. As the figure makes clear, with the conventional method the total number of triples contained in the determinized states grows rapidly as the number of processed states increases, and processing completes at the point marked C11.

The solid line shows the result of determinization by the method of the present embodiment (the sequential determinizing process). In this example the partial determinization is repeated six times, and the number of input symbols whose transitions are determinized in a single partial determinization is increased by 10 at each repetition. The total number of triples contained in the determinized states does not increase monotonically; it drops to zero each time the partial determinizing process of step S122 in FIG. 27 completes, because the names are replaced after each determinization.
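The schedule used in this experiment (60 input symbols, six repetitions, 10 more symbols per repetition) can be written out as a short sketch. It assumes that the subsets are nested and that symbols are taken in list order; the specification does not state which symbols enter each subset, and the names `growing_subsets` and `s0`, `s1`, ... are placeholders.

```python
def growing_subsets(symbols, step):
    """Return subsets of `symbols` whose size grows by `step` per repetition.

    Assumption: each subset contains the previous one (nested prefixes).
    """
    return [set(symbols[:end]) for end in range(step, len(symbols) + step, step)]


# Placeholder names for the 60 input symbols of the experiment.
symbols = [f"s{k}" for k in range(60)]
subsets = growing_subsets(symbols, 10)
sizes = [len(s) for s in subsets]   # 10, 20, ..., 60
```

Under this assumption the sixth subset covers every input symbol, so the sixth partial determinization leaves no transition undeterminized.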

In FIG. 33, C21 marks the point at which the first partial determinization in the series of processes completed, and C22 marks the point at which the second completed. Likewise, C23 to C25 mark the points at which the third to fifth partial determinizations completed, and C26 marks the point at which the sixth determinization completed (that is, the point at which the sequential determinizing process completed).

As FIG. 33 makes clear, the determinizing method according to the present embodiment (the sequential determinizing process) requires more repetitions than the conventional method but can reduce the storage area needed to hold the determinized states.

As described above, according to the present embodiment, a part of the transitions included in the non-deterministic WFST is selected as the determinizing target, and determinization is repeated while changing the target transitions each time. This reduces the number of triples that make up the determinized states stored during any single determinization, and therefore reduces the amount of memory required to execute the determinization.

Although the present embodiment has described the determinization of a non-deterministic WFST, the determinizing method of the present embodiment is not limited to this and can also be applied to other automata that extend the FSA, such as WFSAs and FSTs.

[Fourth Embodiment]
The third embodiment described above replaces input symbols each time the partial determinizing process is performed. The present embodiment describes an automaton determinizing apparatus 400 that determinizes a non-deterministic WFST without replacing input symbols while still reducing the amount of memory required for the determinization. Components that are the same as in the third embodiment are given the same reference numerals, and their description is omitted.

FIG. 34 shows the functional configuration of the automaton determinizing apparatus 400, which is realized by the CPU 1 shown in FIG. 1 working together with a predetermined program stored in advance in the ROM 4 or the storage unit 6. As shown in the figure, the automaton determinizing apparatus 400 includes a subset generation unit 12, a partial determinizing unit 41, and a repetition processing unit 33.

During the determinizing process described above, the partial determinizing unit 41 considers the transitions whose source is a state of the non-deterministic WFST contained in a determinized state and, from among those with the same input symbol, selects the transitions to be determinized by a predetermined transition selection method and generates a determinized state from the selected transitions. The transition selection method is not particularly restricted; in the present embodiment, selection is based on the subset Σ_i described in the first embodiment.

The sequential determinizing process of the present embodiment, executed by the partial determinizing unit 41 and the repetition processing unit 33, is described below with reference to FIGS. 35 to 37. As a precondition, the subsets Σ_i are assumed to have been generated in advance by the subset generation unit 12.

FIG. 35 is a flowchart showing the procedure of the sequential determinizing process in the present embodiment. First, the repetition processing unit 33 sets the variable i, which counts the number of repetitions, to 1 (step S141) and causes the partial determinizing unit 41 to execute the partial determinizing process (step S142) for this value of i. The partial determinizing process of step S142 is described below with reference to FIG. 36.

FIG. 36 is a flowchart showing the procedure of the partial determinizing process of step S142. First, the partial determinizing unit 41 sets F₂ to the empty set (step S151) and then assigns to λ₂ the smallest of the initial weights of all the initial states (step S152).

Next, the partial determinizing unit 41 generates a set whose elements are triples of a state name (q) of T₁, a string, and a weight, and assigns it to i₂ as the name of the initial state of T₂ (step S153). The string in each triple is called the "remainder string"; in step S153 it is set to the empty string ε. The weight in each triple is called the "remainder weight"; in this step it is set to λ₁(q) − λ₂.
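Steps S152 and S153 can be sketched as follows. The sketch assumes that q ranges over the initial states of T₁ and that the initial weights are supplied as a dictionary; the function name `initial_state` and the use of a `frozenset` as the state name are illustrative choices, not taken from the specification.

```python
def initial_state(initial_weights):
    """Build lambda2 and the initial state name i2 of T2.

    initial_weights maps each initial state q of T1 to its initial
    weight lambda1(q) (an assumed input format).
    """
    lam2 = min(initial_weights.values())                      # step S152
    # Step S153: one triple (q, remainder string, remainder weight) per q,
    # with the empty string "" standing for epsilon.
    i2 = frozenset((q, "", w - lam2) for q, w in initial_weights.items())
    return lam2, i2
```

For initial weights 1.5 and 0.5, λ₂ is 0.5 and the two remainder weights are 1.0 and 0.0, so the smallest initial weight is carried by λ₂ and only the differences stay in the triples.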

Subsequently, the partial determinizing unit 41 adds i₂ to the queue S (step S154) and then determines whether S is empty (step S155). If S is determined not to be empty (step S155; Yes), the partial determinizing unit 41 takes one element out of S and assigns it to q₂ (step S156).

Next, for the triples (q, l, w) of state name, remainder string, and remainder weight contained in q₂, the partial determinizing unit 41 determines whether q is included in F₁, the set of accepting states of T₁ (step S157). If q is determined not to be included in F₁ (step S157; No), the process proceeds immediately to step S160.

If it is determined in step S157 that q is included in F₁, the set of accepting states of T₁ (step S157; Yes), the partial determinizing unit 41 adds q₂ to F₂, the set of accepting states of T₂ (step S158).

Next, among the triples (q, l, w) belonging to q₂, the partial determinizing unit 41 calculates w + ρ₁(q) for every q included in the set of accepting states F₁, and assigns the minimum to ρ₂(q₂) as the final weight of q₂ (step S159).
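Steps S157 to S159 can be sketched as below. The sketch assumes that q₂ counts as accepting when at least one of its triples names an accepting state of T₁, and that ρ₁ is given as a dictionary whose keys are exactly the accepting states; `final_weight` and the `None` return value are illustrative.

```python
def final_weight(q2, final_weights):
    """Return rho2(q2), or None when q2 contains no accepting state of T1.

    q2 is a set of triples (q, remainder string, remainder weight);
    final_weights maps each accepting state q of T1 to rho1(q).
    """
    candidates = [w + final_weights[q] for (q, _l, w) in q2 if q in final_weights]
    return min(candidates) if candidates else None            # step S159
```

With triples (0, "", 1.0) and (1, "a", 0.25) and final weights ρ₁(0) = 0.5 and ρ₁(1) = 2.0, the candidates are 1.5 and 2.25, so the final weight of q₂ is 1.5.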

Subsequently, the partial determinizing unit 41 determines whether the processing of steps S161 to S172, described below, has been executed for every element (input symbol) of the set Σ (step S160). If it has been executed for every element of Σ (step S160; Yes), the process returns to step S155.

If it is determined in step S160 that an unprocessed element remains (step S160; No), the partial determinizing unit 41 assigns one of the input symbols not yet processed to x (step S161).

Next, for the set of triples (q, l, w) belonging to q₂, the partial determinizing unit 41 assigns to E_d the set of those transitions δ belonging to E₁ whose source is q and whose input symbol is x (step S162), and then determines whether E_d is empty (step S163). If E_d is determined to be empty (step S163; No), the process returns to step S160.

If it is determined in step S163 that E_d is not empty (step S163; Yes), the partial determinizing unit 41 executes the transition selection process (step S164), which selects transitions from E_d. The transition selection process of step S164 is described below with reference to FIG. 37.

FIG. 37 is a flowchart showing the procedure of the transition selection process of step S164. First, the partial determinizing unit 41 determines whether the input symbol x assigned in step S161 is included in the subset Σ_i (step S1641). The index i of Σ_i is the value of i set in step S141 or in step S145, described later.

If the input symbol x is determined to be included in the subset Σ_i (step S1641; Yes), E_d is assigned to the transition set E_t (step S1642), and the process proceeds to step S165 in FIG. 36.

If it is determined in step S1641 that the input symbol x is not included in the subset Σ_i (step S1641; No), any one transition is taken from E_d and assigned to the transition set E_t (step S1643), and the process proceeds to step S165 in FIG. 36.

Returning to FIG. 36, the partial determinizing unit 41 removes from E_d the transitions belonging to the transition set E_t selected in step S164 (step S165).
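The selection of FIG. 37 together with the removal in step S165 can be sketched as a small loop. Transitions are represented here by opaque labels, and taking the first element stands in for "any one transition"; both choices, and the helper name `batches`, are assumptions made for illustration.

```python
def select_transitions(x, sigma_i, e_d):
    """FIG. 37: return E_t for input symbol x; e_d must be non-empty."""
    if x in sigma_i:
        return list(e_d)      # step S1642: determinize all of them together
    return [e_d[0]]           # step S1643: any one transition (first, arbitrarily)


def batches(x, sigma_i, e_d):
    """List the successive E_t sets produced while E_d is being emptied."""
    remaining, out = list(e_d), []
    while remaining:                                          # step S163
        e_t = select_transitions(x, sigma_i, remaining)       # step S164
        remaining = [t for t in remaining if t not in e_t]    # step S165
        out.append(e_t)
    return out
```

When x belongs to Σ_i, all transitions on x are merged in one pass; otherwise each transition is handled on its own, so the transitions on x remain non-deterministic until a later repetition whose subset contains x.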

Subsequently, the partial determinizing unit 41 calculates the weight of the transition whose source is q₂ and whose input symbol is x, and assigns the result to w₂ (step S166). The value assigned to w₂ is the smallest, over all triples (q, l, w) belonging to Γ'(q₂, x), of the following value: w plus the smallest weight among the transitions δ in E_t whose source is q and whose input symbol is x. Here, Γ'(q₂, x) = {(q, l, w) ∈ q₂ | δ ∈ E_t, prev(δ) = q, input(δ) = x}.
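The calculation of w₂ in step S166 can be sketched as follows, assuming each transition is a tuple (source, input, output, weight, destination). That tuple layout and the function name `transition_weight` are illustrative and not part of the specification.

```python
def transition_weight(q2, e_t, x):
    """Step S166: smallest value of w + min weight over the triples of q2.

    q2 is a set of triples (q, remainder string, remainder weight);
    e_t is a list of transitions (src, inp, out, weight, dst).
    Returns None when no transition in e_t leaves a state of q2 on x.
    """
    best = None
    for (q, _l, w) in q2:
        weights = [t[3] for t in e_t if t[0] == q and t[1] == x]
        if weights:                       # (q, l, w) belongs to Gamma'(q2, x)
            candidate = w + min(weights)
            best = candidate if best is None else min(best, candidate)
    return best
```

With triples (0, "", 0.0) and (1, "", 0.5), transitions on "a" of weight 2.0 and 1.0 from state 0, and of weight 0.25 from state 1, the candidates are 1.0 and 0.75, so w₂ is 0.75.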

Next, the partial determinizing unit 41 calculates the output symbol of the transition whose source is q₂ and whose input symbol is x, and assigns the result to l₂ (step S167). The value assigned to l₂ is the longest common prefix, over all triples (q, l, w) belonging to Γ'(q₂, x), of the following string: the string l followed by the longest common prefix of the output symbols of the transitions δ in E_t whose source is q and whose input symbol is x. For example, suppose that Γ'(q₂, x) contains only one triple (q, l, w), that two transitions δ satisfy the condition with output symbols output(δ) of AB and AC, and that the string l is P. Then the string assigned to l₂ is PA.
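The longest-common-prefix calculation of step S167 can be sketched as follows. The sketch assumes that output strings are ordinary Python strings with one character per output symbol and that transitions are tuples (source, input, output, weight, destination); `transition_output` is an illustrative name. It relies on `os.path.commonprefix`, which compares strings character by character.

```python
from os.path import commonprefix


def transition_output(q2, e_t, x):
    """Step S167: output string l2 of the transition leaving q2 on x."""
    per_triple = []
    for (q, l, _w) in q2:
        outputs = [t[2] for t in e_t if t[0] == q and t[1] == x]
        if outputs:                       # (q, l, w) belongs to Gamma'(q2, x)
            # l followed by the common prefix of the matching outputs.
            per_triple.append(l + commonprefix(outputs))
    # Common prefix across all triples of Gamma'(q2, x).
    return commonprefix(per_triple)
```

Applied to the example in the text (one triple with l = P and two transitions with outputs AB and AC), the function yields PA.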

Next, the partial determinizing unit 41 generates the destination state q'₂ reached from q₂ by the input symbol x (step S168). Generating q'₂ requires generating a set of triples, each consisting of a state name of T₁, a remainder string, and a remainder weight. The state in each triple is an element q' of ν'(q₂, x), where ν'(q₂, x) = {q' | (q, l, w) ∈ q₂, δ ∈ E_t, prev(δ) = q, input(δ) = x, next(δ) = q'}. That is, for the triples (q, l, w) belonging to q₂, if q' denotes the destination state of a transition δ in E_t whose source state is q and whose input symbol is x, then ν'(q₂, x) is the set of all q' satisfying this condition.

The remainder string, the second value of the triple, is determined for q' as follows. For each 4-tuple (q, l, w, δ) belonging to the set γ'(q₂, x), the output string of δ is appended to l and the string l₂ is removed from the front of the result; the remainder string is the longest common prefix of the strings so obtained. Here, γ'(q₂, x) = {(q, l, w, δ) ∈ q₂ × E_t | prev(δ) = q, input(δ) = x}. That is, γ'(q₂, x) is the set of 4-tuples (q, l, w, δ) formed by pairing each triple (q, l, w) contained in q₂ with a transition δ belonging to E_t whose input symbol is x and whose source is q.

The remainder weight, the third value of the triple, is determined for q' as follows. For each 4-tuple (q, l, w, δ) belonging to the set γ'(q₂, x), the weight of δ is added to w and w₂ is subtracted; the remainder weight is the smallest of the values so obtained. The set of triples generated by the above calculations is q'₂.
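The construction of q'₂ in step S168 can be sketched as follows. One point is an interpretation rather than something the text states: for each destination q', only the 4-tuples of γ'(q₂, x) whose transition δ leads to q' are used. Transitions are again assumed to be tuples (source, input, output, weight, destination), l₂ is assumed to be a prefix of every l + output(δ), and `destination_state` is an illustrative name.

```python
from os.path import commonprefix


def destination_state(q2, e_t, x, l2, w2):
    """Step S168: build q'2 as a set of (q', remainder string, remainder weight)."""
    strings, weights = {}, {}
    for (q, l, w) in q2:
        for (src, inp, out, wt, dst) in e_t:
            if src == q and inp == x:
                # Append the output of delta to l, then strip l2 from the front.
                strings.setdefault(dst, []).append((l + out)[len(l2):])
                # Add the weight of delta to w, then subtract w2.
                weights.setdefault(dst, []).append(w + wt - w2)
    return frozenset(
        (dst, commonprefix(strings[dst]), min(weights[dst])) for dst in strings
    )
```

Continuing the PA example with transition weights 1.0 and 3.0 and w₂ = 1.0, the destination state holds the triples (1, "B", 0.0) and (2, "C", 2.0): what the outgoing transition could not emit or charge yet is carried forward in the triples.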

In the following step, the partial determinizing unit 41 adds the transition from q₂ to q'₂ to E₂ (step S169). The added transition has input symbol x, output symbol l₂, and weight w₂.

Next, the partial determinizing unit 41 determines whether the q'₂ generated in step S168 is included in Q₂. If it is determined to be included (step S170; No), the process returns to step S163.

If it is determined in step S170 that q'₂ is not included in Q₂ (step S170; Yes), the partial determinizing unit 41 adds q'₂ to Q₂ (step S171), adds q'₂ to S (step S172), and then returns to step S163.

If S is determined to be empty in step S155 (step S155; No), the partial determinizing unit 41 renames the states belonging to Q₂, the set of states of T₂ obtained by determinizing T₁ (step S173), and the process proceeds to step S143 in FIG. 35.
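The renaming of step S173 can be sketched as replacing each set-of-triples name with a single small integer. Using consecutive integers, and the names `rename_states` and `new_name`, are assumptions for illustration; the specification only requires that each determinized state receive a single distinct name.

```python
def rename_states(states, transitions):
    """Step S173: give every determinized state a single distinct name.

    states: iterable of hashable state names (for example, frozensets of triples);
    transitions: list of (src, inp, out, weight, dst) over those names.
    """
    new_name = {s: n for n, s in enumerate(states)}
    renamed = [(new_name[s], x, o, w, new_name[d]) for (s, x, o, w, d) in transitions]
    return new_name, renamed
```

After renaming, the triples that made up each state name no longer need to be kept, which is consistent with the count of stored triples returning to zero after each partial determinization.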

Returning to FIG. 35, the repetition processing unit 33 determines whether the value of i is less than n, the maximum number of repetitions (step S143). If i is determined to be less than n (step S143; Yes), the repetition processing unit 33 takes the WFST T₂ determinized in step S142 as the WFST T₁ (step S144), increments i by 1 (step S145), and returns to step S142.

If it is determined in step S143 that i is greater than or equal to n (step S143; No), the process ends. The WFST T₂ finally obtained is the determinized form of the non-deterministic WFST T₁ that was the processing target.
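The outer loop of FIG. 35 can be sketched as follows. The partial determinizing process is passed in as a function so that the sketch stays independent of how a WFST is stored; the parameter name `partial_determinize` and the list-of-subsets input are assumptions, and at least one subset is required.

```python
def sequential_determinize(t1, subsets, partial_determinize):
    """FIG. 35: repeat partial determinization once per subset Sigma_i."""
    n = len(subsets)                  # maximum number of repetitions
    i = 1                                                     # step S141
    while True:
        t2 = partial_determinize(t1, subsets[i - 1])          # step S142
        if i >= n:                                            # step S143; No
            return t2
        t1 = t2                                               # step S144
        i += 1                                                # step S145
```

Each repetition feeds the output of the previous one back in as the new T₁, so only one partially determinized WFST is live at a time.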

Embodiments of the invention have been described above, but the present invention is not limited to them, and various modifications, substitutions, additions, and the like are possible without departing from the spirit of the invention.

FIG. 1 is a diagram showing the hardware configuration of the automaton determinizing apparatus.
FIG. 2 is a diagram showing an example of the functional configuration of the automaton determinizing apparatus.
FIG. 3 is a diagram showing an example of a finite automaton.
FIG. 4 is a diagram showing an example of a transition table of a finite automaton.
FIG. 5 is a flowchart showing an example of the procedure of the determinizing process.
FIG. 6 is a diagram showing an example of a finite automaton.
FIG. 7 is a diagram showing an example of a transition table of a finite automaton.
FIG. 8 is a diagram showing an example of a finite automaton.
FIG. 9 is a flowchart showing an example of the procedure of the subset generation process.
FIG. 10 is a flowchart showing an example of the procedure of the sequential determinizing process.
FIG. 11 is a flowchart showing an example of the procedure of the partial determinizing process.
FIGS. 12 to 17 are diagrams for explaining the operation of the sequential determinizing process.
FIG. 18 is a diagram showing an example of the functional configuration of the automaton determinizing apparatus.
FIG. 19 is a flowchart showing an example of the procedure of the sequential determinizing process.
FIG. 20 is a flowchart showing an example of the procedure of the partial determinizing process.
FIG. 21 is a flowchart showing an example of the procedure of the transition selection process.
FIG. 22 is a diagram showing an example of the functional configuration of the automaton determinizing apparatus.
FIG. 23 is a flowchart showing an example of the procedure of the determinizing process.
FIGS. 24 to 26 are diagrams each showing an example of a weighted finite state transducer.
FIG. 27 is a flowchart showing an example of the procedure of the sequential determinizing process.
FIG. 28 is a flowchart showing an example of the procedure of the partial determinizing process.
FIGS. 29 to 31 are diagrams for explaining the operation of the sequential determinizing process.
FIG. 32 is a diagram schematically showing the configuration of the speech recognition apparatus.
FIG. 33 is a diagram showing the change in the storage area required during the sequential determinizing process.
FIG. 34 is a diagram showing an example of the functional configuration of the automaton determinizing apparatus.
FIG. 35 is a flowchart showing an example of the procedure of the sequential determinizing process.
FIG. 36 is a flowchart showing an example of the procedure of the partial determinizing process.
FIG. 37 is a flowchart showing an example of the procedure of the transition selection process.

Explanation of symbols

100 Automaton determinizing apparatus
200 Automaton determinizing apparatus
300 Automaton determinizing apparatus
400 Automaton determinizing apparatus
1 CPU
2 Operation unit
3 Display unit
4 ROM
5 RAM
6 Storage unit
7 Bus
11 Determinizing processing unit
12 Subset generation unit
13 Partial determinizing unit
14 Repetition processing unit
21 Partial determinizing unit
31 Determinizing processing unit
32 Partial determinizing unit
33 Repetition processing unit
41 Partial determinizing unit
500 Speech recognition apparatus
501 Feature extraction unit
502 Synthesis optimization unit
503 Decoder
504 Acoustic model
505 Word dictionary
506 Language model

Claims

An automaton determinizing method executed by an automaton determinator that determinizes a non-deterministic automaton, the automaton being a finite state automaton or an extension of a finite state automaton, wherein
the automaton determinator comprises a storage means, and
the method includes:
a partial determinizing step in which partial determinizing means performs determinization on a part of the transitions, selected as determinizing targets, among the transitions included in the non-deterministic automaton;
a name changing step in which name changing means replaces the name of each determinized state, which is stored in the storage means by the determinization and consists of any combination of states included in the non-deterministic automaton, with a single distinct name; and
a repeating step in which repeating means causes the partial determinizing step to be executed repeatedly until determinization has been performed for all transitions included in the non-deterministic automaton,
wherein, in the partial determinizing step, the transition or the combination of transitions selected as the determinizing target is made different for each repetition in the repeating step.

A subset generation step of sequentially generating a subset obtained by extracting some of the input symbols among the input symbols related to the transition included in the non-deterministic automaton according to the number of repetitions;
The automaton determinizing method according to claim 1, wherein the partial determinizing step sets a transition related to the input symbol as a determinizing target based on an input symbol included in the subset.

The partial determinizing step includes
Of the transitions included in the non-deterministic automaton, a replacement step of replacing input symbols related to transitions other than the determinizing target with different symbols,
Determinizing the non-deterministic automaton after the replacing step, and generating a deterministic automaton;
Among the transitions included in the deterministic automaton, a restoration step of returning the input symbol replaced in the replacement step to the original symbol;
Including
3. The repetition step causes the partial determinization step to process the deterministic automaton as a non-deterministic automaton until determinization is performed for all transitions included in the non-deterministic automaton. The determinating method of the automaton described in 1.

The partial determinizing step includes
An input symbol selection step of sequentially selecting one input symbol as a processing target from input symbols related to transitions included in the non-deterministic automaton;
Transition set extraction means for generating a transition set obtained by extracting transitions related to the input symbol to be processed among the transitions included in the non-deterministic automaton;
Selecting means for selecting a transition to be determinized from transitions included in the transition set;
The automaton determinizing method according to claim 1, wherein the automaton is deterministic.

The automaton determinizing method according to claim 2, wherein the subset generation step selects an input symbol to be extracted according to a type of an input symbol related to a transition included in the non-deterministic automaton.

The subset generation step selects an input symbol to be extracted according to the number of transitions related to each input symbol among the input symbols related to the transitions included in the non-deterministic automaton. Or, the automaton determinizing method according to 5.
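A minimal sketch of selecting input symbols by transition count is shown below. The claim only states that the selection depends on the number of transitions per input symbol; choosing the symbols with the most transitions first, and the names `rank_symbols_by_transition_count` and `pick_symbol_subset`, are assumptions made for illustration.

```python
from collections import Counter


def rank_symbols_by_transition_count(transitions):
    """Order input symbols by how many transitions carry them (most first)."""
    counts = Counter(sym for (_src, sym, _dst) in transitions)
    # Ties are broken alphabetically so the order is reproducible.
    return sorted(counts, key=lambda sym: (-counts[sym], sym))


def pick_symbol_subset(transitions, k):
    """Return the k symbols with the most transitions (assumed policy)."""
    return set(rank_symbols_by_transition_count(transitions)[:k])


EXAMPLE = {(0, "a", 0), (0, "a", 1), (2, "a", 0), (0, "b", 0), (1, "b", 2), (1, "c", 2)}
RANKING = rank_symbols_by_transition_count(EXAMPLE)
```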

The automaton determinizing method according to claim 6, wherein the subset generation step calculates, for each input symbol related to the transitions included in the non-deterministic automaton, a value obtained by dividing the sum of the number of transitions related to the input symbol and the number of states included in the non-deterministic automaton by the sum of the number of transition source states of the transitions related to the input symbol and the number of states included in the non-deterministic automaton, and selects the input symbols to be extracted according to the calculated value.

A weighted finite state transducer determinizing method executed by an automaton determinizing apparatus that determinizes a non-deterministic finite state transducer, the method relating to a weighted finite state transducer used in a speech recognition device,
The automaton determinizing apparatus comprises storage means,
A partial determinizing step of determinizing, by a partial determinizing unit, some of the transitions included in the non-deterministic finite state transducer that are selected as determinizing targets;
A name changing step of replacing, by name changing means, the name of each determinized state stored in the storage means by the determinization, the determinized state consisting of a combination of states included in the non-deterministic finite state transducer or of sets including those states, with a single name different from the others;
A repeating step of repeatedly performing the partial determinizing step until determinizing is performed for all transitions included in the non-deterministic finite state transducer by a repeating means;
Including
In the partial determinizing step, the transition or combination of transitions selected as the determinizing target is made different for each repetition in the repeating step.

An automaton determinizing apparatus for determinizing a non-deterministic automaton with respect to a finite state automaton or an automaton obtained by extending the finite state automaton,
Of the transitions included in the non-deterministic automaton, a partial determinizing unit that performs determinization on a part of transitions selected as determinization targets;
Storage means for storing a determinized state formed by a combination of any state included in the non-deterministic automaton generated by the determinization;
Name changing means for changing the name of the determinized state to a different single name;
Repeating means for causing the partial determinizing unit to repeatedly execute the determinization until determinization is performed for all transitions included in the non-deterministic automaton;
With
The automaton determinizing apparatus characterized in that the partial determinizing unit changes the transition or the set of transitions selected as the determinizing target for each repetition by the repeating means.

An automaton determinizing program for determinizing a non-deterministic automaton with respect to a finite state automaton or an automaton that is an extension of the finite state automaton, the program causing a computer having storage means that functions as a work area during the determinization to realize:
A partial determinizing function of performing determinization on some of the transitions included in the non-deterministic automaton that are selected as determinizing targets;
A name changing function of replacing the name of each determinized state, which is stored in the storage means by the determinization and consists of a combination of states included in the non-deterministic automaton, with a different single name; and
A repeating function of repeating the partial determinizing function until determinization is performed for all transitions included in the non-deterministic automaton,
The automaton determinizing program characterized in that the partial determinizing function changes the transition or the set of transitions selected as the determinizing target for each repetition by the repeating function.