JP5458640B2

JP5458640B2 - Rule processing method and apparatus

Info

Publication number: JP5458640B2
Application number: JP2009100574A
Authority: JP
Inventors: 友哉岩倉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-04-17
Filing date: 2009-04-17
Publication date: 2014-04-02
Anticipated expiration: 2029-04-17
Also published as: JP2010250642A

Description

The present technology relates to techniques for processing rules obtained in advance by learning.

For example, extracting proper nouns such as person names and place names from a word sequence has conventionally required the following processing. To simplify the explanation, assume that only person names (written "person") and everything else (written "O") are to be distinguished. Correct pairs of words and proper noun types, as shown in FIG. 1(a), are prepared, and pairs of a proper noun type and a feature set, as shown in FIG. 1(b), are generated from them. A feature represents a clue used for discrimination. In a feature, the attribute type "the word itself" is written W, and the (0), (1), and so on that follow the symbol give the relative appearance position measured from the position (0) of the word of interest. For example, focusing on the first word "Miyazaki" in FIG. 1(a) gives W(0) = Miyazaki. The clues used for discrimination are the word at the current position and the words immediately before and after it. With the first word as the current position, the word "origin" one position in the positive direction is therefore written W(1) = origin. When the current position moves to the second word, the current word "origin" is written W(0) = origin, the preceding word "Miyazaki" is written W(-1) = Miyazaki, and the following word "san" is written W(1) = san. Data such as that in FIG. 1(b) is generated in the same way from the word sequence and proper noun types of FIG. 1(a). In this application the notation is generalized as W(p) = w_i, where w_i is the word at the i-th position and p is the relative appearance position measured from the position of interest. Although the example is simplified, other attributes such as character type (for example, kanji or hiragana) or part of speech may also be used as attribute types, and multiple attribute types may be combined.
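The feature generation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `make_features`, the `window` parameter, and the string encoding of a feature are assumptions, and only the attribute type W (the word itself) is handled.

```python
def make_features(words, window=1):
    """For each position i, list the features W(p)=word for offsets |p| <= window.

    Hypothetical helper: the offset p is measured from the current position i,
    so W(0) is the current word, W(-1) the previous word, W(1) the next word.
    """
    rows = []
    for i in range(len(words)):
        feats = []
        for p in range(-window, window + 1):
            j = i + p
            if 0 <= j < len(words):  # skip offsets that fall outside the sequence
                feats.append(f"W({p})={words[j]}")
        rows.append(feats)
    return rows


words = ["Miyazaki", "origin", "san"]
features = make_features(words)
for word, feats in zip(words, features):
    print(word, feats)
```

Note that each word appears up to three times in the output, once per relative position, which is the source of the repeated lookups discussed in the prior-art description.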

Furthermore, applying machine learning by a known method (see Iwakura, Tomoya and Okamoto, Seishi, "A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking", Proc. of CoNLL 2008, pages 17-24) to data such as that in FIG. 1(b) generates rules such as those shown in FIG. 2. Each row of FIG. 2 is one rule, and each rule contains a condition, a proper noun type, and a score. A condition is expressed in terms of features. In the example of FIG. 2 each condition contains a single feature, but a condition may also contain a combination of features. The larger the score, the more likely the word is of the corresponding proper noun type. A rule with the condition "W(0) = Miyazaki", the proper noun type "person", and the score "10" means that, when W(0) = Miyazaki holds, the score for the current word "Miyazaki" being a "person" is 10.

Using rules such as those in FIG. 2, the processing for identifying a person name in the word sequence "Miyazaki", "san", "to", "play" shown in FIG. 3 proceeds as shown in FIGS. 4 and 5. To apply the rules to the current word "Miyazaki", the feature set "W(0) = Miyazaki" and "W(1) = san" is first generated from the current word "Miyazaki" and the next word "san" (step (1)), because features are generated from the word at the current position and the words before and after it. The rule conditions are then searched with the generated features (step (2)), and the proper noun type and score of each matching condition are extracted. Because a rule whose condition matches the feature W(0) = Miyazaki exists, the proper noun type "person" and the score "10" are obtained. Likewise, because a rule whose condition matches the feature W(1) = san exists, the proper noun type "person" and the score "30" are obtained. In total, the proper noun type "person" with the score "40 (= 10 + 30)" is registered for the current word "Miyazaki" (step (3)).

Next, the current position moves from "Miyazaki" to "san", and the feature set "W(-1) = Miyazaki", "W(0) = san", "W(1) = to" is generated from the word "san" and the words before and after it (step (4)). The rule conditions are searched with the generated features (step (5)), and the proper noun type and score of each matching condition are extracted. Because a rule whose condition matches the feature W(-1) = Miyazaki exists, the proper noun type "O" and the score "5" are obtained. Likewise, because a rule whose condition matches the feature W(0) = san exists, the proper noun type "O" and the score "20" are obtained. In total, the proper noun type "O" with the score "25 (= 5 + 20)" is registered for the current word "san" (step (6)).
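The conventional procedure of steps (1) to (6) can be sketched as below. The rule table reproduces the four rules that the text attributes to FIG. 2; the dictionary layout, the function name `classify_conventional`, and the lookup counter are assumptions added for illustration.

```python
# Rules as described for FIG. 2: condition -> (proper noun type, score).
RULES = {
    "W(0)=Miyazaki": ("person", 10),
    "W(-1)=Miyazaki": ("O", 5),
    "W(0)=san": ("O", 20),
    "W(1)=san": ("person", 30),
}


def classify_conventional(words, rules, window=1):
    """Generate features per position and look each one up in the rule table."""
    scores = [dict() for _ in words]
    lookups = 0
    for i in range(len(words)):
        for p in range(-window, window + 1):
            j = i + p
            if not 0 <= j < len(words):
                continue
            lookups += 1  # one rule search per generated feature
            hit = rules.get(f"W({p})={words[j]}")
            if hit is not None:
                label, score = hit
                scores[i][label] = scores[i].get(label, 0) + score
    return scores, lookups


words = ["Miyazaki", "san", "to", "play"]
scores, lookups = classify_conventional(words, RULES)
print(scores[0], scores[1], lookups)
```

With a window of one word on each side, the four-word input triggers ten rule searches, because every word is looked up again at each relative position in which it appears.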

As FIGS. 4 and 5 show, words such as "Miyazaki" and "san" appear in the generated features as distinct items at different relative positions such as "0", "-1", and "1", and each occurrence must be matched against the rules. This increases the number of searches and therefore slows processing.

Iwakura, Tomoya and Okamoto, Seishi, "A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking", Proc. of CoNLL 2008, pages 17-24

As described above, in the prior art the same word is applied to the rules many times, as a different feature at each position, so the number of rule searches increases and processing slows down. The example above uses only the word itself as the attribute type, but the number of searches grows further when part of speech or character type is also used as an attribute type, or when a rule condition is defined by a combination of features.

Accordingly, an object of the present technology is to provide a technique for speeding up the application of rules used to determine word types.

This rule processing method is executed by a computer that can access a rule data storage unit and an element sequence data storage unit. In the rule data storage unit, one or more score setting rules are registered in association with each condition or combination of conditions, where a condition is expressed as a combination of a word attribute type, a relative appearance position, and an attribute value, and each score setting rule contains a score together with the relative appearance position and the type to which that score is to be assigned. The element sequence data storage unit stores an element sequence in which elements, each containing the attribute values of a word for each attribute type, are arranged in order of appearance.

The method includes an extraction step and an update step. The extraction step extracts from the element sequence data storage unit, for each element, application candidates, each of which is an element condition or a combination of element conditions expressed as a combination of a word attribute type, a relative appearance position, and an attribute value. The extraction follows a candidate extraction condition determined by the maximum number of conditions contained in a combination of conditions and by the maximum distance from the reference position of the relative appearance positions contained in the score setting rules associated with the conditions or combinations of conditions. The update step searches the rule data storage unit with each application candidate and, when a matching condition or combination of conditions exists, extracts the one or more score setting rules associated with it. Using the type and score contained in each extracted score setting rule, it updates the score for that type of the element identified from the relative appearance position contained in the rule and the appearance position of the application candidate, and stores the result in the element sequence data storage unit.

The application of rules for determining word types is sped up.

FIG. 1 is a diagram for explaining the prior art.
FIG. 2 is a diagram showing rule data of the prior art.
FIG. 3 is a diagram showing an example of determination target data.
FIG. 4 is a schematic diagram for explaining the processing of the prior art.
FIG. 5 is a schematic diagram for explaining the processing of the prior art.
FIG. 6 is a diagram showing the conversion of rule data in the embodiment of the present technology.
FIG. 7 is a schematic diagram showing the application of rules in the embodiment.
FIG. 8 is a schematic diagram showing the application of rules in the embodiment.
FIG. 9 is a functional block diagram of the rule processing device.
FIG. 10 is a diagram showing the main processing flow in the embodiment.
FIG. 11 is a diagram showing the processing flow of the rule conversion process.
FIG. 12 is a diagram showing data representing the rule conversion process.
FIG. 13 is a diagram showing data representing the rule conversion process.
FIG. 14 is a diagram showing an example of generalized determination target data.
FIG. 15 is a diagram showing the processing flow of the determination process.
FIG. 16 is a diagram showing an example of data stored in the determination result storage unit.
FIG. 17 is a diagram showing the processing flow of the second rule conversion process.
FIG. 18 is a schematic diagram for explaining the second rule conversion process.
FIG. 19 is a diagram showing the processing flow of the second rule conversion process.
FIG. 20 is a diagram showing the processing flow of the second determination process.
FIG. 21 is a diagram showing an example of determination target data.
FIG. 22 is a diagram showing an example of check candidates.
FIG. 23 is a diagram showing an example of rule application in the second determination process.
FIG. 24 is a diagram showing the processing flow of the second determination process.
FIG. 25 is a functional block diagram of a computer.

[Outline of this embodiment]
In the present embodiment, rules such as those in FIG. 2 are converted into rules in a new format such as that in FIG. 6. Specifically, a new rule contains a condition, the position of the word to which a score is assigned, a proper noun type, and a score. Each condition, however, is associated with one or more score setting rules, each containing the position of the word to be scored, a proper noun type, and a score. Consequently, once a condition has been identified during rule application, all of its score setting rules can be extracted at once.

As shown in FIG. 7, when a word sequence such as that in FIG. 3 is processed, the new rules of FIG. 6 are searched with the current word "Miyazaki" to check whether a matching condition exists (step (10)). In the new rules of FIG. 6 the first rule is identified, and its two score setting rules are extracted at once. The first score setting rule contains the scoring position "0", the proper noun type "person", and the score "10", so, following the scoring position "0", the score "10" is set for the proper noun type "person" on the current word "Miyazaki" (step (11)). The second score setting rule contains the scoring position "1", the proper noun type "O", and the score "5", so, following the scoring position "1", the score "5" is set for the proper noun type "O" on the word "san" at the position after the current one (step (12)).

Then, as shown in FIG. 8, the current position moves to the next word, and the new rules of FIG. 6 are searched with the current word "san" to check whether a matching condition exists (step (13)). In the new rules of FIG. 6 the second rule is identified, and its two score setting rules are extracted at once. The first score setting rule contains the scoring position "0", the proper noun type "O", and the score "20", so, following the scoring position "0", the score "20" is added to the current value "5" for the proper noun type "O" on the current word "san", and "25" is registered (step (14)). Similarly, the second score setting rule contains the scoring position "-1", the proper noun type "person", and the score "30", so, following the scoring position "-1", the score "30" is added to the current value "10" for the proper noun type "person" on the preceding word "Miyazaki", and "40" is registered (step (15)).
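Steps (10) to (15) can be sketched as follows. The table reproduces the two new-format rules that the text attributes to FIG. 6; the data layout, the name `apply_new_rules`, and the lookup counter are assumptions for illustration.

```python
# New-format rules as described for FIG. 6:
# condition (word) -> list of (scoring position, proper noun type, score).
NEW_RULES = {
    "Miyazaki": [(0, "person", 10), (1, "O", 5)],
    "san": [(0, "O", 20), (-1, "person", 30)],
}


def apply_new_rules(words, new_rules):
    """Search the rules once per word, then scatter scores to the target positions."""
    scores = [dict() for _ in words]
    lookups = 0
    for i, word in enumerate(words):
        lookups += 1  # a single search retrieves every score setting rule
        for offset, label, score in new_rules.get(word, []):
            j = i + offset  # position of the word that receives the score
            if 0 <= j < len(words):
                scores[j][label] = scores[j].get(label, 0) + score
    return scores, lookups


words = ["Miyazaki", "san", "to", "play"]
scores, lookups = apply_new_rules(words, NEW_RULES)
print(scores[0], scores[1], lookups)
```

The accumulated scores are the same as in the prior-art walk-through ("person" 40 for "Miyazaki", "O" 25 for "san"), while the rule table is searched only once per word.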

In this way, a single search of the new rules per target word is enough to extract all score setting rules that must be applied, so the number of searches falls and processing becomes faster.

[Specific contents of this embodiment]
FIG. 9 shows a functional block diagram of the rule processing device. The rule processing device has a learning data input unit 1; a learning data storage unit 3 that stores the learning data input through the learning data input unit 1; a rule learning unit 5 that performs rule learning using the data stored in the learning data storage unit 3; a first rule data storage unit 7 that stores the processing results of the rule learning unit 5; a rule conversion unit 9 that converts the first rules stored in the first rule data storage unit 7 into second rules; a second rule data storage unit 11 that stores the second rule data generated by the rule conversion unit 9; a determination target data input unit 13; a determination target data storage unit 15 that stores the determination target data input through the determination target data input unit 13; a determination unit 17 that performs processing using the data stored in the second rule data storage unit 11 and the determination target data storage unit 15; a determination result storage unit 19 that stores the processing results of the determination unit 17; and an output unit 21 that outputs the data stored in the determination result storage unit 19.

Next, the processing performed by the rule processing device is described with reference to FIGS. 10 to 24. First, the learning data input unit 1 of the rule processing device accepts learning data from the user and stores it in the learning data storage unit 3 (FIG. 10: step S1). The learning data is, for example, data such as that in FIG. 1(a), that is, multiple pairs of a word and its correct proper noun type. The rule learning unit 5 then performs a known learning process on the learning data stored in the learning data storage unit 3 to generate first rules, and stores them in the first rule data storage unit 7 (step S3). For example, it generates data tables such as those shown in FIG. 1(b) and FIG. 2 from FIG. 1(a). This process is well known and is not described further.

Next, the rule conversion unit 9 performs the rule conversion process (step S5), which is described with reference to FIGS. 11 to 13. For now, the only word attribute type is assumed to be the word itself. The rule conversion unit 9 initializes r to 1 (step S21). It then separates the condition of the r-th rule in the first rule table (FIG. 2) stored in the first rule data storage unit 7 into the feature's appearance position information (= relative appearance position) p and its value f (step S23). When the first row of FIG. 2 is processed, the condition is W(0) = Miyazaki, so p = 0 and f = Miyazaki are obtained.

The rule conversion unit 9 then registers, in the second rule table in the second rule data storage unit 11, the value f as the condition and "-p" as the position of the word to which the score is assigned, together with the score and proper noun type of the r-th rule of the first rule table unchanged (step S25). For the first row of FIG. 2, data such as that in FIG. 12 is registered in the second rule table in the second rule data storage unit 11. That is, the condition "Miyazaki" is registered together with a score setting rule containing the scoring position "0", the proper noun type "person", and the score "10".

The rule conversion unit 9 then increments r by 1 and determines whether r is less than or equal to m, where m is the number of records in the first rule table of the first rule data storage unit 7 (step S29). If r is less than or equal to m, processing returns to step S23.

When r = 2, the condition is W(-1) = Miyazaki, so p = -1 and f = Miyazaki are obtained. Then, as indicated by the bold line in FIG. 13, f = Miyazaki is registered in the second rule table as the condition and -p = 1 as the position of the word to which the score is assigned, together with the score "5" and the proper noun type "O" of the r-th rule unchanged.

If, on the other hand, r exceeds m, the rule conversion unit 9 consolidates the rules by gathering the score setting rules that share the same condition (step S30). The score setting rules associated with the same condition are collected and converted into a form in which they are all registered under that single condition. This makes it possible to extract all applicable score setting rules easily with a single search. Processing then returns to the calling process.
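The conversion of steps S21 to S30 can be sketched as follows, under the stated restriction that the only attribute type is the word itself. The tuple layout, the regular expression, and the name `convert_rules` are assumptions; the sketch also groups rules by condition while iterating, rather than in a separate pass as in step S30.

```python
import re

# Matches a single-feature condition such as "W(-1)=Miyazaki".
FEATURE = re.compile(r"W\((-?\d+)\)=(.+)")


def convert_rules(first_rules):
    """Convert (condition, type, score) rows into {value f: [(-p, type, score), ...]}."""
    second = {}
    for condition, label, score in first_rules:
        match = FEATURE.fullmatch(condition)
        if match is None:
            raise ValueError(f"unsupported condition: {condition!r}")
        p, f = int(match.group(1)), match.group(2)  # S23: split into p and f
        # S25: the scoring position is -p; S30: rules sharing f are grouped together.
        second.setdefault(f, []).append((-p, label, score))
    return second


first_rules = [
    ("W(0)=Miyazaki", "person", 10),
    ("W(-1)=Miyazaki", "O", 5),
    ("W(0)=san", "O", 20),
    ("W(1)=san", "person", 30),
]
second_rules = convert_rules(first_rules)
print(second_rules)
```

The sign flip reflects the change of viewpoint: a rule that fires when "Miyazaki" is one position before the current word becomes, seen from "Miyazaki" itself, a score for the word one position after it.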

When this processing has been performed, the conversion shown in FIG. 6 is complete, and the second rule table is stored in the second rule data storage unit 11.

Returning to the processing of FIG. 10, separately from steps S1 to S3, or after step S5, the determination target data input unit 13 accepts determination target data from the user and stores it in the determination target data storage unit 15 (step S7). The determination target data is, for example, data such as that in FIG. 3. In this simple example, the word attribute type is the word itself. The attribute types are not necessarily limited to one, however; as shown in FIG. 14, for example, the word itself, the part of speech, and the character type may all be adopted as attribute types. The combination of the attribute values of all attribute types used for a single word is called an element w. If the word appears at the i-th position of an input, its element is written w_i.

The determination unit 17 then performs the determination process using the data stored in the second rule data storage unit 11 and the determination target data storage unit 15 (step S9). This process is described with reference to FIGS. 15 and 16. The description below first covers the case in which there is only one kind of condition, specifically the case in which the second rule table on the right side of FIG. 6 has been prepared.

First, the determination unit 17 initializes i to 1 (step S31). It then identifies one unprocessed attribute type of the element w_i in the determination target data storage unit 15 (step S33). As described above, the element w_i contains the attribute values of one or more attribute types of the i-th word in the determination target data storage unit 15. FIG. 3 is a very simple example in which the only attribute type is the word itself; the part of speech and character type mentioned above may also be combined.

The determination unit 17 then determines whether the attribute value of the identified attribute type of the element w_i satisfies the condition of any rule (step S35). In the example of FIG. 7, the attribute type is the word itself; the unit checks whether a rule with the word "Miyazaki" as its condition exists, and two matching score setting rules are obtained. If the attribute value of the identified attribute type of the element w_i matches no rule condition, processing moves to step S39. If it does satisfy the condition of some rule, scores are set with respect to the element w_i in accordance with the score setting rules of that rule, that is, the position of the word to be scored (relative appearance position), the type, and the score (step S37). In the example of FIG. 7, as in steps (11) and (12), the score "10" is set for the proper noun type "person" on the element of the word "Miyazaki", and the score "5" is set for the proper noun type "O" on the element of the word "san". If a score has already been registered, the score value contained in the current score setting rule is added to it. Processing then moves to step S39.

In step S39, the determination unit 17 determines whether the element w_i has an unprocessed attribute type, and if so returns to step S33. If no unprocessed attribute type remains, it increments i by 1 (step S41) and determines whether i is less than or equal to n, where n is the number of elements (step S43). If i is less than or equal to n, processing returns to step S33. If i exceeds n, the determination unit 17 identifies, for each element w_i in the determination target data storage unit 15, the type with the highest score (in the example of FIG. 7, person name or other) and stores it in the determination result storage unit 19 (step S45). As shown in FIG. 16, for example, a determination result is registered in the determination result storage unit 19 for each word. In this example, only "Miyazaki" is determined to be "person".

As described above, in the simple case where each condition concerns a single attribute type with no combinations, the rules are searched once per attribute type for each element when multiple attribute types exist. Even so, each element is processed only once, so the number of searches is reduced. If there is only one attribute type, a single search per element suffices.
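The determination loop of steps S31 to S45 can be sketched for elements with several attribute types as follows. Everything about the data layout is an assumption: elements are modeled as dictionaries, rule conditions are keyed by `(attribute type, value)`, the part-of-speech values are hypothetical placeholders, and elements that receive no score fall back to "O", which the text does not specify.

```python
def determine(elements, rules, default="O"):
    """Return the highest-scoring type for each element."""
    scores = [dict() for _ in elements]
    for i, element in enumerate(elements):            # S31, S41, S43: loop over w_i
        for attr_type, value in element.items():      # S33, S39: each attribute type
            # S35: one rule search per attribute type of the element
            for offset, label, score in rules.get((attr_type, value), []):
                j = i + offset
                if 0 <= j < len(elements):            # S37: add score at target position
                    scores[j][label] = scores[j].get(label, 0) + score
    # S45: pick the type with the highest accumulated score (assumed default if none)
    return [max(s, key=s.get) if s else default for s in scores]


RULES = {
    ("word", "Miyazaki"): [(0, "person", 10), (1, "O", 5)],
    ("word", "san"): [(0, "O", 20), (-1, "person", 30)],
}
elements = [
    {"word": "Miyazaki", "pos": "noun"},   # "pos" values are hypothetical
    {"word": "san", "pos": "suffix"},
    {"word": "to", "pos": "particle"},
    {"word": "play", "pos": "verb"},
]
labels = determine(elements, RULES)
print(labels)
```

With two attribute types per element, the rule table is searched twice per element, matching the observation that the number of searches equals the number of attribute types.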

Returning to the processing of FIG. 10, the output unit 21 outputs the determination results stored in the determination result storage unit 19 to an output device such as a display device or a printing device (step S21). If the rule processing device is connected to a network, the results may instead be sent to another computer connected to that network.

With the processing described above, the most likely type is identified for each word of the determination target data while the number of searches is reduced. In the example above, the determination is whether or not a word is a person name, but other types can be determined as well. For example, parts of speech can be determined. It is also possible to determine noun phrases, verb phrases, and the like (base phrase chunking), and to perform bunsetsu (phrase) estimation.

The description so far has covered the simple case in which a rule condition concerns a single attribute type with no combinations, but rules are not generally this simple. A more general process is described next, namely the case in which a condition is expressed as a combination of multiple features, with reference to FIGS. 17 to 19. The rule conversion unit 9 initializes r, which indicates the current position, to 1 (step S51). It then initializes the minimum relative appearance position MIN to "undefined" (step S53). Next, it identifies one unprocessed feature of the r-th rule in the first rule data storage unit 7 (step S55). For example, when the condition of each rule is expressed as a combination of two features, as in FIG. 18(a), processing the first rule identifies, say, W(0) = Miyazaki out of W(0) = Miyazaki and W(1) = san.

The rule conversion unit 9 then identifies the appearance relative position p of the feature from the identified feature (step S57). In the example above, p = 0. Thereafter, it determines whether MIN is undefined or p < MIN (step S59). If MIN is undefined or p < MIN, it sets MIN = p so that MIN holds the minimum (step S61), and the process proceeds to step S63. On the other hand, if MIN ≤ p, MIN need not be updated, and the process proceeds to step S63.

In step S63, the rule conversion unit 9 determines whether an unprocessed feature exists in the r-th rule. If an unprocessed feature exists, the process returns to step S55. In the example of FIG. 18(a), W(1) = san is identified as unprocessed, but since p = 1 and MIN = 0, MIN is fixed at 0. For the second rule, the features are W(−1) = Miyazaki and W(0) = san, so MIN = −1.

On the other hand, when no unprocessed feature remains in the r-th rule, the process proceeds to the processing in FIG. 19 via terminal A. Next, the rule conversion unit 9 again identifies one unprocessed feature from the r-th rule (step S65). The identified feature is then separated into the attribute type t, the appearance relative position p of the feature, and the value f (step S67). W(0) = Miyazaki, for example, is separated into attribute type t = W (that is, the word), p = 0, and f = Miyazaki. Then NI = p − MIN is set as the new appearance relative position, and the feature expressed by the attribute type t, the new appearance relative position NI, and the value f is added to the new condition nr (step S69). For W(0) = Miyazaki in the first rule, NI = 0 − 0 = 0, so the feature is unchanged. W(1) = san is likewise unchanged. On the other hand, for W(−1) = Miyazaki in the second rule, NI = −1 − (−1) = 0, so the feature is changed to W(0) = Miyazaki. For W(0) = san in the second rule, NI = 0 − (−1) = 1, so the feature is changed to W(1) = san. Thereafter, it is determined whether an unprocessed feature exists in the r-th rule (step S71). If an unprocessed feature exists, the process returns to step S65.

On the other hand, when no unprocessed feature remains in the r-th rule, the rule conversion unit 9 registers, in the second rule table in the second rule data storage unit 11, the condition "nr" together with a score setting rule that includes the position "−MIN" of the word to which the score is assigned and the type and score of the r-th rule (step S73). Then r is incremented by 1 (step S75), and it is determined whether r is less than or equal to m, where m is the number of rules (step S77). If r is less than or equal to m, the process returns to step S53 via terminal B. On the other hand, if r exceeds m, rules having the same condition are aggregated in the second rule table (step S79). As shown in FIG. 18(b), the first and second rules in FIG. 18(a) have the same condition after conversion, so they are associated such that two score setting rules can be extracted from one condition. In the example of FIG. 18(a), the third and fourth rules likewise share the same condition, so, as shown in FIG. 18(b), they are also associated such that two score setting rules can be extracted from one condition. The process then returns to the calling process.
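The rule conversion of steps S51 to S79 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `convert_rules`, the tuple layout of a feature `(attribute_type, relative_position, value)`, and the use of a dictionary keyed by a `frozenset` are assumptions made for the sketch. The two sample rules and their scores (20 for "person", 10 for "O") follow the first two rules discussed in the text; the third and fourth rules of FIG. 18(a) are not reproduced in the text and are therefore omitted.

```python
from collections import defaultdict

# Hypothetical layout: a feature is (attribute_type, relative_position, value),
# and a first-table rule is (features, type_label, score).

def convert_rules(rules):
    """Shift each rule so that its smallest relative position becomes 0,
    and group rules that end up with the same condition."""
    table = defaultdict(list)  # condition -> [(score_position, type, score), ...]
    for features, type_label, score in rules:
        # Steps S55 to S63: find MIN, the smallest relative position in the rule.
        min_pos = min(p for _, p, _ in features)
        # Steps S65 to S71: replace every position p by NI = p - MIN.
        condition = frozenset((t, p - min_pos, f) for t, p, f in features)
        # Step S73: the score is assigned to the word at position -MIN.
        table[condition].append((-min_pos, type_label, score))
    # Step S79: rules with identical conditions already share one key.
    return dict(table)

rules = [
    ([("W", 0, "Miyazaki"), ("W", 1, "san")], "person", 20),
    ([("W", -1, "Miyazaki"), ("W", 0, "san")], "O", 10),
]
second_table = convert_rules(rules)
key = frozenset({("W", 0, "Miyazaki"), ("W", 1, "san")})
print(second_table[key])  # [(0, 'person', 20), (1, 'O', 10)]
```

Keying the table by the shifted condition is what lets a single lookup return every score setting rule that shares that condition.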

By performing the processing described above, it is possible to handle conditions that contain a plurality of features as well as features that are composed over a plurality of attribute types.

Next, a more general discrimination process is described with reference to FIGS. 20 to 24. The second rule table in FIG. 18(b) is used. First, the number of elements w is set to n (step S81). For example, assume that data such as shown in FIG. 21 is stored in the discrimination target data storage unit 15. Here, the attribute types of a word include the part of speech P in addition to the word itself W. Since there are four elements, n = 4. Further, WR is set to the maximum distance, from the reference position, of the appearance relative positions of words in the rule conditions, and MAX is set to the maximum number of features included in a rule condition (step S83). These values are set by searching the second rule table in the second rule data storage unit 11. For the second rule table shown in FIG. 18(b), the maximum value of p in "W(p) = w_i" is 1, so WR = 1. For the purpose of the following explanation, however, WR = 2 is assumed. Since each rule includes at most two features, MAX = 2.
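As a rough illustration of step S83, WR and MAX can be derived by scanning the conditions of the second rule table. The function name `extraction_limits` and the table layout (conditions as sets of `(attribute_type, relative_position, value)` tuples) are assumptions for this sketch. It also assumes the conditions have already been shifted so that every relative position is 0 or greater, which makes the largest position equal to the largest distance from the reference position.

```python
def extraction_limits(second_table):
    """Return (WR, MAX) for a table whose keys are conditions (sets of features)."""
    # WR: largest relative position p found in any condition.
    wr = max(p for condition in second_table for _, p, _ in condition)
    # MAX: largest number of features in any single condition.
    max_features = max(len(condition) for condition in second_table)
    return wr, max_features

# One aggregated condition, matching the first two rules discussed in the text.
second_table = {
    frozenset({("W", 0, "Miyazaki"), ("W", 1, "san")}): [(0, "person", 20), (1, "O", 10)],
}
wr, max_features = extraction_limits(second_table)
print(wr, max_features)  # 1 2
```

This reproduces WR = 1 and MAX = 2 for the sample condition; the text then assumes WR = 2 purely for the sake of the walkthrough.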

Then, the determination unit 17 sets i = 1 (step S85), generates check candidates in accordance with WR and MAX with the element w_i as the reference, and stores them in AR (step S87). For each of the elements from the element w_i up to the element w_{i+WR}, whose appearance position is the maximum distance WR away from w_i, a feature is generated for each attribute type (for example, W or P) from the attribute type (W or P), the attribute value f of that attribute type, and the appearance relative position p_j from the element w_i. Check candidates are then generated by combining up to MAX features such that each candidate includes at least one of the features of the element w_i. In the example of FIG. 21, when the element w_1 is the processing target, the elements to be considered are the element w_1 for the word "Miyazaki", the element w_2 for the word "san", and the element w_3 for the word "to". For the element w_1, the following features are generated: W(0) = Miyazaki (that is, p_j = 0, f = Miyazaki), P(0) = noun (p_j = 0, f = noun), W(1) = san (p_j = 1, f = san), P(1) = suffix (p_j = 1, f = suffix), W(2) = to (p_j = 2, f = to), and P(2) = particle (p_j = 2, f = particle). Since MAX = 2, combining one or two features so as to include at least one of W(0) = Miyazaki and P(0) = noun yields the check candidates shown in FIG. 22.
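The candidate generation of step S87 can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `check_candidates`, the representation of an element as a dictionary from attribute type to value, and the 0-based index `i` are all hypothetical. Only the three elements named in the text are used, because the fourth element of FIG. 21 is not given there.

```python
from itertools import combinations

def check_candidates(elements, i, wr, max_features):
    """Build check candidates for elements[i] (0-based index).

    elements: list of dicts mapping attribute type -> value, in order of appearance.
    """
    # One feature per attribute type for each element from i to i + WR.
    features = [
        (attr, offset, value)
        for offset, element in enumerate(elements[i:i + wr + 1])
        for attr, value in element.items()
    ]
    # Features of the element being processed (relative position 0).
    anchors = {f for f in features if f[1] == 0}
    candidates = []
    for size in range(1, max_features + 1):
        for combo in combinations(features, size):
            # Keep only combinations containing at least one position-0 feature.
            if anchors.intersection(combo):
                candidates.append(frozenset(combo))
    return candidates

elements = [
    {"W": "Miyazaki", "P": "noun"},
    {"W": "san", "P": "suffix"},
    {"W": "to", "P": "particle"},
]
candidates = check_candidates(elements, 0, wr=2, max_features=2)
print(len(candidates))  # 11
```

With six features and MAX = 2 there are 2 single-feature candidates and 9 two-feature candidates that contain W(0) = Miyazaki or P(0) = noun. Because elements before position i are never consulted, each element's candidates depend only on a forward window.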

Then, the determination unit 17 identifies one unprocessed check candidate cr from AR (step S89). Thereafter, by searching the second rule table in the second rule data storage unit 11 with the check candidate cr, it determines whether a rule exists whose application condition is the check candidate cr (step S91). If no such rule exists, the process proceeds to the processing in FIG. 24 via terminal C. On the other hand, if such a rule exists, the corresponding scores in the discrimination target data storage unit 15 are updated based on the score setting rules for that rule (step S93). As schematically shown in FIG. 23, when the second rule table stored in the second rule data storage unit 11 is searched for each of the check candidates generated in step S87, the fourth check candidate matches the condition of the first rule. Two score setting rules are then extracted, and for each of them the score at the corresponding location in the discrimination target data storage unit 15 is updated. For the first score setting rule, the position of the word to which the score is assigned is "0", so the score of the type "person" for the element w_i is increased by "20". For the second score setting rule, the position of the word to which the score is assigned is "1", so the score of the type "O" for the element w_{i+1} is increased by "10". The table in the discrimination target data storage unit 15 is then in the state shown at the upper left of FIG. 23. The process proceeds to the processing in FIG. 24 via terminal C.

Turning to the processing in FIG. 24, the determination unit 17 determines whether any unprocessed check candidate remains in AR (step S97). If unprocessed check candidates remain, the process returns to step S89 in FIG. 20 via terminal E. On the other hand, if all check candidates in AR have been processed, i is incremented by 1 (step S99), and it is determined whether i is less than or equal to n, where n is the number of elements (step S101). If i is less than or equal to n, the process returns to step S87 in FIG. 20 via terminal D. On the other hand, if i exceeds n, the type with the maximum score is identified for each element in the discrimination target data storage unit 15 and stored in the determination result storage unit 19 (step S103). The process then returns to the calling process.
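The overall loop of steps S85 to S103 can be sketched as follows. This is a simplified illustration, not the patented implementation: the function name `classify`, the data layouts, and the choice to return `None` for an element that never receives a score are assumptions. The sketch generates each candidate at the moment it is looked up rather than storing all of them in AR first. The sample table holds only the one aggregated condition discussed in the text, and only the three elements named there are used.

```python
from collections import defaultdict
from itertools import combinations

def classify(elements, second_table, wr, max_features):
    """Score every element and return the highest-scoring type per element."""
    n = len(elements)
    scores = [defaultdict(int) for _ in range(n)]  # scores[k][type] -> total
    for i in range(n):                             # loop of S85, S99, S101
        window = elements[i:i + wr + 1]
        features = [(attr, off, val)
                    for off, element in enumerate(window)
                    for attr, val in element.items()]
        for size in range(1, max_features + 1):    # candidates as in S87
            for combo in combinations(features, size):
                # Skip combinations without a feature of the current element.
                if all(off != 0 for _, off, _ in combo):
                    continue
                # S91: one lookup returns every score setting rule for the condition.
                for pos, type_label, score in second_table.get(frozenset(combo), []):
                    target = i + pos
                    if 0 <= target < n:
                        scores[target][type_label] += score  # S93
    # S103: pick the type with the maximum score (None if nothing was scored).
    return [max(s, key=s.get) if s else None for s in scores]

second_table = {
    frozenset({("W", 0, "Miyazaki"), ("W", 1, "san")}): [(0, "person", 20), (1, "O", 10)],
}
elements = [
    {"W": "Miyazaki", "P": "noun"},
    {"W": "san", "P": "suffix"},
    {"W": "to", "P": "particle"},
]
labels = classify(elements, second_table, wr=2, max_features=2)
print(labels)  # ['person', 'O', None]
```

The single match at i = 0 adds 20 to "person" for the first element and 10 to "O" for the second, mirroring the update described for FIG. 23. With a fuller rule table, the third element would presumably receive scores as well.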

By performing such processing, generalized elements and rules can also be handled.

Although an embodiment of the present technology has been described above, the present technology is not limited to it. For example, the example uses only the types "person name" and "other", but other types may be used, and the technique can also handle cases where scores are set for many types.

Furthermore, as long as the same result is obtained, the order of the processing steps may be changed or steps may be executed in parallel. In particular, processing such as step S87 may be modified so that check candidates are not all generated at once but are instead generated each time the second rule table is searched.

The functional block diagram of the rule processing device is an example and does not necessarily match the actual program module configuration.

The rule processing device described above is a computer device in which, as shown in FIG. 25, a memory 2501, a CPU 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. The operating system (OS) and the application program for executing the processing in this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. As necessary, the CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 to perform the required operations. Data in the middle of processing is stored in the memory 2501 and, if necessary, in the HDD 2505. In an embodiment of the present technology, the application program for performing the processing described above is stored on and distributed via the computer-readable removable disk 2511 and is installed from the drive device 2513 onto the HDD 2505. The program may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer device realizes the various functions described above through the organic cooperation of hardware such as the CPU 2503 and the memory 2501 with the OS and the necessary application programs.

The present embodiment can be summarized as follows.

This rule processing method is executed by a computer that can access a rule data storage unit and an element string data storage unit. In the rule data storage unit, one or more score setting rules, each including a score and the appearance relative position and type to which the score is to be assigned, are registered in association with each condition or combination of conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value. The element string data storage unit stores an element string in which elements, each including the attribute value of each attribute type of a word, are arranged in order of appearance. The method includes an extraction step of extracting from the element string data storage unit, for each element, application candidates, each being an element condition (a feature in the embodiment) represented by a combination of a word attribute type, an appearance relative position, and an attribute value, or a combination of such element conditions. The extraction follows a candidate extraction condition specified by the maximum number of conditions included in the combinations of conditions and by the maximum distance, from a reference position, of the appearance relative positions included in the score setting rules associated with the conditions or combinations of conditions. The method further includes a step of searching the rule data storage unit with an application candidate and, when a matching condition or combination of conditions exists, extracting from the rule data storage unit the one or more score setting rules associated with the matching condition or combination of conditions, updating, with the type and score included in each extracted score setting rule, the score for that type of the element identified from the appearance relative position included in the extracted score setting rule and the appearance position of the application candidate, and storing the updated score in the element string data storage unit.

By preparing the rule data storage unit described above, the rule data storage unit is searched with an application candidate, and if a condition or combination of conditions matching the application candidate exists, the one or more score setting rules associated with it can be extracted at once. This reduces, among other things, the number of searches of the rule data storage unit and improves overall processing speed. There may be one or more attribute types; for example, the word itself, the part of speech, and the character type are envisaged.

The extraction step described above may include: a step of generating, for each of the elements from the appearance position of the element being processed up to the appearance position separated from it by the maximum distance, and for each attribute type, an element condition represented by a combination of the attribute type, the attribute value of that attribute type, and the appearance relative position from the appearance position of the element being processed; and a step of generating the application candidates by combining element conditions, within the maximum number of conditions, so that each candidate includes at least one of the element conditions for the element being processed. Because application candidates are not generated by reusing the data of elements that precede the element being processed in the element string, the processing is simplified and accelerated.

Furthermore, the rule processing method may further include a step of comparing, for each of the elements, the scores for the respective types and identifying the type for which the maximum score is set as the type of that element. For example, if the types are proper noun and other, it becomes possible to determine for each word whether or not it is a proper noun.

The computer may further be able to access a second rule data storage unit in which one second score setting rule, including a score and the type to which the score is to be assigned, is registered in association with each second condition or combination of second conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value. In that case, the method may further include: a step of, for each second condition or combination of second conditions stored in the second rule data storage unit, identifying the smallest value among the appearance relative positions included in the second condition or combination of second conditions as a reference value, converting the appearance relative position of each second condition included in the second condition or combination of second conditions into a new appearance relative position measured from the reference value to generate the condition or combination of conditions, and storing in the rule data storage unit, in association with the generated condition or combination of conditions, a score setting rule that includes, in addition to the second setting rule, the value obtained by multiplying the reference value by (−1) as the appearance relative position; and a step of extracting, in the rule data storage unit, the score setting rules whose conditions or combinations of conditions are the same and aggregating those conditions or combinations of conditions. For example, when the second rule data storage unit is prepared by a conventional technique, the data of the rule data storage unit required by the present technology can be prepared in this way.

A program for causing a computer to execute the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, a CD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk. Data in the middle of processing is temporarily stored in a storage device such as the memory of the computer.

The following appendices are further disclosed with respect to the embodiments including the above examples.

(Appendix 1)
A rule processing method executed by a computer that can access a rule data storage unit, in which one or more score setting rules each including a score and an appearance relative position and type to which the score is to be assigned are registered in association with each condition or combination of conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, and an element string data storage unit that stores an element string in which elements each including the attribute value of each attribute type of a word are arranged in order of appearance, the method including:
an extraction step of extracting from the element string data storage unit, for each of the elements, application candidates each being an element condition represented by a combination of a word attribute type, an appearance relative position, and an attribute value, or a combination of element conditions, in accordance with a candidate extraction condition specified by the maximum number of conditions included in the combinations of conditions and the maximum distance, from a reference position, of the appearance relative positions included in the score setting rules associated with the conditions or combinations of conditions; and
a step of searching the rule data storage unit with the application candidate and, when a matching condition or combination of conditions exists, extracting from the rule data storage unit the one or more score setting rules associated with the matching condition or combination of conditions, updating, with the type and the score included in the extracted one or more score setting rules, the score for that type of the element identified from the appearance relative position included in the extracted one or more score setting rules and the appearance position of the application candidate, and storing the updated score in the element string data storage unit.

(Appendix 2)
The rule processing method according to Appendix 1, wherein the extraction step includes:
a step of generating, for each of the elements from the appearance position of the element being processed up to the appearance position separated from it by the maximum distance, and for each attribute type, an element condition represented by a combination of the attribute type, the attribute value of that attribute type, and the appearance relative position from the appearance position of the element being processed; and
a step of generating the application candidates by combining the element conditions, within the maximum number of conditions, so as to include at least one of the element conditions for the element being processed.

(Appendix 3)
The rule processing method according to Appendix 1 or 2, further including a step of comparing, for each of the elements, the scores for the respective types and identifying the type for which the maximum score is set as the type of that element.

(Appendix 4)
The rule processing method according to any one of Appendices 1 to 3, wherein the computer can further access a second rule data storage unit in which one second score setting rule, including a score and the type to which the score is to be assigned, is registered in association with each second condition or combination of second conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, the method further including:
a step of, for each second condition or combination of second conditions stored in the second rule data storage unit, identifying the smallest value among the appearance relative positions included in the second condition or combination of second conditions as a reference value, converting the appearance relative position of each second condition included in the second condition or combination of second conditions into a new appearance relative position measured from the reference value to generate the condition or combination of conditions, and storing in the rule data storage unit, in association with the generated condition or combination of conditions, the score setting rule that includes, in addition to the second setting rule, the value obtained by multiplying the reference value by (−1) as the appearance relative position; and
a step of extracting, in the rule data storage unit, the score setting rules whose conditions or combinations of conditions are the same and aggregating those conditions or combinations of conditions.

(Appendix 5)
A program for causing a computer to execute the rule processing method according to any one of Appendices 1 to 4.

(Appendix 6)
A rule processing device including:
a rule data storage unit in which one or more score setting rules, each including a score and an appearance relative position and type to which the score is to be assigned, are registered in association with each condition or combination of conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value;
an element string data storage unit that stores an element string in which elements each including the attribute value of each attribute type of a word are arranged in order of appearance;
extraction means for extracting from the element string data storage unit, for each of the elements, application candidates each being an element condition represented by a combination of a word attribute type, an appearance relative position, and an attribute value, or a combination of element conditions, in accordance with a candidate extraction condition specified by the maximum number of conditions included in the combinations of conditions and the maximum distance, from a reference position, of the appearance relative positions included in the score setting rules associated with the conditions or combinations of conditions; and
means for searching the rule data storage unit with the application candidate and, when a matching condition or combination of conditions exists, extracting from the rule data storage unit the one or more score setting rules associated with the matching condition or combination of conditions, updating, with the type and the score included in the extracted one or more score setting rules, the score for that type of the element identified from the appearance relative position included in the extracted one or more score setting rules and the appearance position of the application candidate, and storing the updated score in the element string data storage unit.

Description of Symbols
1 Learning data input unit
3 Learning data storage unit
5 Rule learning unit
7 First rule data storage unit
9 Rule conversion unit
11 Second rule data storage unit
13 Discrimination target data input unit
15 Discrimination target data storage unit
17 Determination unit
19 Determination result storage unit
21 Output unit

Claims

A rule processing method executed by a computer that can access a rule data storage unit, in which one or more score setting rules, each including a score and the appearance relative position and type to which the score is to be given, are registered in association with each condition or combination of conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, and an element string data storage unit that stores an element string in which elements, each including an attribute value for each attribute type of a word, are arranged in order of appearance, the method comprising:
an extraction step of extracting, from the element string data storage unit and for each element, an application candidate that is an element condition or a combination of element conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, in accordance with a candidate extraction condition specified by the maximum number of conditions included in a combination of conditions and the maximum distance from the reference position of the appearance relative positions included in the score setting rules associated with the condition or combination of conditions; and
a step of searching the rule data storage unit with the application candidate and, if a matching condition or combination of conditions exists, extracting the one or more score setting rules registered in association with that condition or combination of conditions, updating, with the type and the score included in each extracted score setting rule, the score for that type of the element identified from the appearance relative position included in the extracted score setting rule and the appearance position associated with the application candidate, and storing the updated score in the element string data storage unit.

The rule processing method according to claim 1, wherein the extraction step comprises:
a step of generating, for each element from the appearance position of the element being processed to the appearance position separated from it by the maximum distance, and for each attribute type, an element condition represented by a combination of the attribute type, the attribute value of that attribute type, and the appearance relative position with respect to the appearance position of the element being processed; and
a step of generating the application candidates by combining the element conditions, up to the maximum number of conditions, so that each application candidate includes at least one of the element conditions for the element being processed.
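
As a non-limiting illustration, the two steps of this extraction might be sketched in Python as follows. The representation of an element as a dictionary from attribute type to attribute value, the function names, the forward-only scan direction, and the sample words are assumptions made for this sketch.

```python
from itertools import combinations

def element_conditions(elements, i, max_distance):
    """Build (attribute_type, relative_position, attribute_value) conditions
    for elements i .. i + max_distance, with positions relative to element i.
    Scanning only in the forward direction is an assumption of this sketch."""
    conditions = []
    for offset in range(max_distance + 1):
        j = i + offset
        if j >= len(elements):
            break  # ran past the end of the element string
        for attr, value in sorted(elements[j].items()):
            conditions.append((attr, offset, value))
    return conditions

def application_candidates(elements, i, max_distance, max_conditions):
    """Combine element conditions, up to max_conditions per candidate,
    keeping only combinations that involve the element being processed."""
    conditions = element_conditions(elements, i, max_distance)
    candidates = []
    for n in range(1, max_conditions + 1):
        for combo in combinations(conditions, n):
            if any(pos == 0 for _, pos, _ in combo):
                candidates.append(tuple(sorted(combo)))
    return candidates


# Hypothetical element string with a single attribute type W (the word itself).
elements = [{"W": "Miyazaki"}, {"W": "origin"}, {"W": "san"}]
candidates = application_candidates(elements, 0, max_distance=1, max_conditions=2)
```

Sorting each combination gives a canonical key, so a candidate can be looked up in a rule store directly, provided the store builds its keys in the same way.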

The rule processing method according to claim 1, further comprising a step of comparing, for each of the elements, the scores of the respective types and identifying the type with the maximum score as the type of that element.
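
As a non-limiting illustration, this comparison might be sketched in Python as follows. The per-element score mapping, the function name decide_types, and the handling of elements that received no score are assumptions made for this sketch.

```python
def decide_types(scores):
    """Pick, for each element, the type with the highest accumulated score.

    scores is a list of {type: score} mappings, one per element.
    An element with no scores gets None (an assumption of this sketch);
    ties go to the type that was inserted first.
    """
    return [max(s, key=s.get) if s else None for s in scores]


# Hypothetical scores for a three-element string.
types = decide_types([{"person": 1.2, "O": 0.3}, {"O": 0.5}, {}])
```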

A rule processing method executed by a computer that can access a first rule data storage unit in which one or more first score setting rules, each including a score and the type to which the score is to be given, are registered in association with each first condition or combination of first conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, the method comprising:
a step of, for each first condition or combination of first conditions stored in the first rule data storage unit, identifying as a reference value the smallest of the appearance relative positions included in the first condition or combination of first conditions, converting the appearance relative position of each first condition included in the first condition or combination of first conditions into a new appearance relative position measured from the reference value, generating a second condition or combination of second conditions represented by a combination of the word attribute type, the attribute value, and the new appearance relative position, and storing, in a second rule data storage unit in association with the second condition or combination of second conditions, a second score setting rule that includes, in addition to the first score setting rule, the value obtained by multiplying the reference value by (−1) as the appearance relative position to which the score is to be given; and
a step of extracting, in the second rule data storage unit, the second score setting rules whose second condition or combination of second conditions is identical, and aggregating them for each second condition or combination of second conditions.
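
As a non-limiting illustration, the conversion and aggregation might be sketched in Python as follows. The list-of-pairs form of the first rules, the dictionary form of the second rule store, the function name convert_rules, and the sample rule values are assumptions made for this sketch.

```python
from collections import defaultdict

def convert_rules(first_rules):
    """Convert first rules into position-normalized second rules.

    first_rules: list of (conditions, settings) pairs, where
      conditions = tuple of (attribute_type, relative_position, attribute_value)
      settings   = list of (type, score)
    Returns {second_conditions: [(target_relative_position, type, score), ...]}.
    """
    second_rules = defaultdict(list)
    for conditions, settings in first_rules:
        # Reference value: the smallest relative position in the combination.
        ref = min(pos for _, pos, _ in conditions)
        # New relative positions are measured from the reference value.
        shifted = tuple(sorted((attr, pos - ref, value)
                               for attr, pos, value in conditions))
        for label, score in settings:
            # The score goes to the element at (-1) * reference value.
            second_rules[shifted].append((-ref, label, score))
    # Rules whose converted conditions coincide end up under one key.
    return dict(second_rules)


# Two hypothetical first rules that describe the same word pair
# from the viewpoint of different elements.
first_rules = [
    ((("W", -1, "Miyazaki"), ("W", 0, "origin")), [("O", 0.5)]),
    ((("W", 0, "Miyazaki"), ("W", 1, "origin")), [("person", 1.2)]),
]
second_rules = convert_rules(first_rules)
```

In this sketch both first rules collapse onto one converted key, so a single match of that key yields the score setting rules for both elements.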

A program for causing a computer to execute the rule processing method according to any one of claims 1 to 4.

A rule processing device comprising:
a rule data storage unit in which one or more score setting rules, each including a score and the appearance relative position and type to which the score is to be given, are registered in association with each condition or combination of conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value;
an element string data storage unit that stores an element string in which elements, each including an attribute value for each attribute type of a word, are arranged in order of appearance;
extraction means for extracting, from the element string data storage unit and for each element, an application candidate that is an element condition or a combination of element conditions represented by a combination of a word attribute type, an appearance relative position, and an attribute value, in accordance with a candidate extraction condition specified by the maximum number of conditions included in a combination of conditions and the maximum distance from the reference position of the appearance relative positions included in the score setting rules associated with the condition or combination of conditions; and
means for searching the rule data storage unit with the application candidate and, if a matching condition or combination of conditions exists, extracting the one or more score setting rules registered in association with that condition or combination of conditions, updating, with the type and the score included in each extracted score setting rule, the score for that type of the element identified from the appearance relative position included in the extracted score setting rule and the appearance position associated with the application candidate, and storing the updated score in the element string data storage unit.