JPS63213061A

JPS63213061A - System for classifying declensional kana ending

Info

Publication number: JPS63213061A
Application number: JP62044107A
Authority: JP
Inventors: Takashi Nakamura; 俊中村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-02-28
Filing date: 1987-02-28
Publication date: 1988-09-05
Anticipated expiration: 2011-03-29
Also published as: JPH0833891B2

Abstract

PURPOSE: To eliminate the need to register all declensional KANA (Japanese syllabary) endings in a dictionary by classifying the relation between the index in the dictionary and the declensional KANA ending of the word to be retrieved when KANJI (Chinese characters)-KANA words are retrieved. CONSTITUTION: When no coincident index is detected in the dictionary during retrieval of a KANJI-KANA word, the relation between an index that is read out of the dictionary and includes the extracted KANJI and the declensional ending of the word to be retrieved is classified. The classes defined are a simple (A) type, where the KANA following the KANJI can be omitted; a KANJI word ending feed (B) type, where extra declensional KANA endings can be added; a continuous word ending addition (R) type, where the KANJI has a continuous word ending and can be used as a noun; and a declensional KANA ending coincidence enable (O<+>) type, where the index in the dictionary contains one or more HIRAGANA (cursive form of Japanese syllabary) characters in the form of a word with a declensional KANA ending omitted. Thus it is not required to register all declensional KANA endings in the dictionary.

Description

[Detailed Description of the Invention] [Summary] When a dictionary is searched to check whether a mixed kanji-kana word is registered, the word cannot be retrieved if its okurigana are written differently from those of the entry found in the dictionary. To solve this problem, the present invention classifies the relationship between the entry in the dictionary and the okurigana in the word into types such as the okurigana-matchable type, the simple type, and the kanji-ending-sent type, so that the desired word can be retrieved even when no entry with identical okurigana is found in the dictionary.

[Industrial application field]

The present invention relates to an okurigana classification system that classifies the relationship between an entry in a dictionary and the okurigana in the word being looked up into types, so that the desired word can be retrieved even when the dictionary contains no exactly matching entry.

[Prior art and problems to be solved by the invention] In natural language processing, particularly in machine translation and natural language interfaces (for example, database retrieval in natural language), a computer must be made to understand sentences. As the first stage, a dictionary is consulted to split each sentence into phrases (bunsetsu) and, at the same time, to obtain the meanings of the words.

Conventionally, when a dictionary is searched by computer, a word cannot be looked up if its okurigana are written differently from those of the entry in the dictionary. To make such a word retrievable regardless, every conceivable okurigana variant of the word would have to be registered in advance as an entry. This makes the dictionary extremely large and is impractical, and in any case not every okurigana variant can necessarily be registered. It is therefore desirable, instead of registering every conceivable okurigana variant of a word in advance, to classify the relationship between the entry in the dictionary and the okurigana of the word, so that the desired word can be retrieved even when the dictionary contains no exactly matching entry.

[Means for solving problems]

To solve the above problems, the present invention classifies the relationship between the entry in the dictionary and the okurigana in the word at least into the okurigana-matchable type, the simple type, and the kanji-ending-sent type, so that the desired word can be retrieved even when identical okurigana are not found in the dictionary.

FIG. 1 shows the principle configuration of the present invention. In the figure, search unit 1 searches dictionary 6 for an entry that matches the character string given as input data.

Start position detection unit 2 detects, in the input character string, the start position of a character string to which the classification system of the present invention is presumed to be applicable.

Kanji extraction unit 3 extracts only the kanji from the character string detected by start position detection unit 2.

Collation unit 4 finds in dictionary 6 all entries that contain the kanji extracted by kanji extraction unit 3.

Classification processing unit 5 classifies the okurigana of the entry word on the basis of the relationship between the character string detected by start position detection unit 2 and the entries of dictionary 6 found by collation unit 4.

[Operation]

The operation is described next.

In FIG. 1, search unit 1 searches dictionary 6 for an entry that matches the input character string. When matching entries are found, their entry data are output one after another from the beginning as output data. When no match is found, start position detection unit 2 locates the start position of the character string that is presumed to have become unretrievable because of a difference in okurigana. Starting from that position, kanji extraction unit 3 extracts only the kanji contained in a character string of the length presumed to constitute a word. Collation unit 4 collates the extracted kanji against the entries registered in advance in dictionary 6 and reads out the entries that contain them. Classification processing unit 5 then performs the classification from the relationship between the entries containing those kanji, read out of dictionary 6, and the okurigana and other kana in the word.
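The fallback path described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `extract_kanji`, `collate`, and `lookup` are hypothetical, kanji are recognized only by the basic CJK Unified Ideographs range, the start position detection of unit 2 is skipped (the whole input is treated as one word), and collation is assumed to mean "the entry contains every extracted kanji".

```python
from __future__ import annotations


def is_kanji(ch: str) -> bool:
    """Assumption: treat the basic CJK Unified Ideographs block as kanji."""
    return "\u4e00" <= ch <= "\u9fff"


def extract_kanji(text: str) -> str:
    """Role of kanji extraction unit 3: keep only the kanji of the string."""
    return "".join(ch for ch in text if is_kanji(ch))


def collate(kanji: str, dictionary: list[str]) -> list[str]:
    """Role of collation unit 4: entries that contain every extracted kanji."""
    if not kanji:
        return []
    return [entry for entry in dictionary if all(k in entry for k in kanji)]


def lookup(text: str, dictionary: list[str]) -> tuple[str, list[str]]:
    """Exact search first (unit 1); otherwise collect candidates to classify."""
    if text in dictionary:
        return ("exact", [text])
    return ("classify", collate(extract_kanji(text), dictionary))


if __name__ == "__main__":
    entries = ["必ず", "著し", "憤", "寒空"]  # entries quoted in the description
    print(lookup("必ず", entries))    # spelled as registered
    print(lookup("必らず", entries))  # okurigana differ, so candidates are collated
```

The candidates returned in the "classify" case are what classification processing unit 5 would then compare against the word.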

As described above, for a word containing a character string that does not match any entry in dictionary 6, the entries containing the extracted kanji are read out of dictionary 6 and classification is performed from the okurigana relationship between the two. This makes it possible to retrieve the word even when its spelling is not registered as an entry in dictionary 6.

[Example]

Next, the configuration and operation of one embodiment of the present invention are described in detail with reference to FIGS. 2 to 4.

FIG. 2 shows the operation of classification processing unit 5. The first decision state in the figure determines whether the case is the okurigana-matchable type (hereinafter, the O+ type). In the O+ type, the entry in the dictionary is registered in the form "□●□●...□●○" (□ is one kanji, ● is zero or more hiragana, ○ is one hiragana), and the word appears in the text with part of a "□" sent in excess as kana, or with part of "●○" absorbed into the immediately preceding "□". For example, as shown at (a) in FIG. 3 (ニ), the entry "必ず" in dictionary 6 consists of "□○", while the word "必らず" consists of "□○○", so one kana, "ら", is sent in excess. Because this "ら" is treated as a "●", whose presence or absence does not matter, "必ず" falls under the O+ type. If YES, the case is classified as the O+ type. If NO, the next decision state in the figure is executed.
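One way to approximate the okurigana-matchable (O+) test is a pattern built from the entry, as sketched below. The sketch assumes that the final kana (the "○") must agree between entry and word while any run of hiragana may follow each kanji (the "●" slots). It does not check that the extra kana really belong to the kanji's reading, which a full implementation would need. The name `is_o_plus` is hypothetical.

```python
import re

HIRAGANA = "[\u3041-\u3096]"  # character class covering the hiragana letters


def is_kanji(ch: str) -> bool:
    """Assumption: treat the basic CJK Unified Ideographs block as kanji."""
    return "\u4e00" <= ch <= "\u9fff"


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3096"


def is_o_plus(entry: str, word: str) -> bool:
    """True if `word` keeps the entry's kanji and final kana but varies the kana in between."""
    if word == entry or not entry or not is_hiragana(entry[-1]):
        return False
    kanji = [ch for ch in entry if is_kanji(ch)]
    if not kanji:
        return False
    # Each kanji may be followed by any run of hiragana (the "●" slots) ...
    body = "".join(re.escape(k) + HIRAGANA + "*" for k in kanji)
    # ... but the final kana (the "○") must be the one registered in the entry.
    return re.fullmatch(body + re.escape(entry[-1]), word) is not None


if __name__ == "__main__":
    print(is_o_plus("必ず", "必らず"))  # the example from FIG. 3
```

Because the pattern only constrains the kanji order and the last kana, it accepts both excess kana and kana absorbed into the kanji.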

The second decision state determines whether the entry ends in kana. This amounts to determining whether the case is the simple type (hereinafter, the A type): combined with the selection criterion implicit in the function of collation unit 4, it determines whether the entry read out of dictionary 6 has the form "□...□●○" and the word appears in the text with the "●○" part omitted. For example, as shown at (b) in FIG. 3 (イ), the entry "著し" in dictionary 6 consists of "□○", and the word "著い" omits the "し" that corresponds to the "○" of the entry, so this case falls under the A type. If YES, the case is classified as the A type. If NO, the next decision state in the figure is executed.
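The simple (A) type test can be sketched as a prefix comparison against the entry with its trailing kana removed. This is an illustration only: `kanji_stem` and `match_type_a` are hypothetical names, and the function assumes the okurigana-matchable test has already answered NO, as in the flow of FIG. 2. It returns the text left after the stem, which would be looked up as a separate word (for example an inflectional ending).

```python
from __future__ import annotations


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3096"


def kanji_stem(entry: str) -> str:
    """Drop the trailing run of hiragana (the "●○" part) from an entry."""
    end = len(entry)
    while end > 0 and is_hiragana(entry[end - 1]):
        end -= 1
    return entry[:end]


def match_type_a(entry: str, text: str) -> str | None:
    """Return the text after the stem if `text` fits the A type, else None."""
    stem = kanji_stem(entry)
    if not stem or stem == entry:   # the entry must end in kana and have a stem
        return None
    if text.startswith(entry):      # okurigana written as registered: no mismatch
        return None
    if not text.startswith(stem):   # the stem itself must be present
        return None
    return text[len(stem):]


if __name__ == "__main__":
    print(match_type_a("著し", "著い"))  # remainder after the stem "著"
```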

The third decision state determines whether the case is the kanji-ending-sent type (hereinafter, the B type). That is, it determines whether the entry in the dictionary has the form "★□" (★ is zero or more arbitrary characters) and the word appears in the text with the last sound of the "□" sent in excess as kana. For example, as shown at (c) in FIG. 3 (ロ), the entry "憤" in dictionary 6 consists of "□", and the word "憤おる" is that entry with the sound "お" sent in excess, followed by the inflectional ending "る" (in this example the inflectional ending is treated as a separate word), so this case falls under the B type. If YES, the case is classified as the B type. If NO, the next decision state in the figure is executed.
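Testing for the kanji-ending-sent (B) type requires the reading of the entry, since the excess kana is the last sound of the kanji. The sketch below assumes the reading is available alongside each entry. The description does not say how readings are stored, so the `reading` argument and the name `match_type_b` are hypothetical.

```python
from __future__ import annotations


def is_kanji(ch: str) -> bool:
    """Assumption: treat the basic CJK Unified Ideographs block as kanji."""
    return "\u4e00" <= ch <= "\u9fff"


def match_type_b(entry: str, reading: str, text: str) -> str | None:
    """Return the text after the excess kana if `text` fits the B type, else None.

    `reading` is the hiragana reading of `entry` (assumed to be supplied by
    the dictionary); its last kana is the sound that may be sent in excess.
    """
    if not entry or not reading or not is_kanji(entry[-1]):
        return None                  # the entry must have the form "★□"
    prefix = entry + reading[-1]     # the entry followed by its own last sound
    if not text.startswith(prefix):
        return None
    return text[len(prefix):]        # e.g. an inflectional ending


if __name__ == "__main__":
    # "憤" read "いきどお" (stem of いきどおる); the word writes the "お" again.
    print(match_type_b("憤", "いきどお", "憤おる"))
```

The returned remainder corresponds to the inflectional ending that the example treats as a separate word.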

The fourth decision state determines whether the case is the conjunctive-form ending addition type (hereinafter, the R type). In this type, the dictionary registers only a verb whose entry has the form "★□", and a word carrying a conjunctive-form ending is treated as a noun in the analysis stages that follow morphological analysis. For example, as shown at (d) in FIG. 3 (ハ), the entry "間" in dictionary 6 consists of "□" and is registered as a verb; when it appears in a word as the noun "間" (for example, "間ｌ"), the case is classified as the R type. If YES, the case is classified as the R type. If NO, the case is classified as the kana-omitted simple type (hereinafter, the O- type). In the O- type, the entry is registered in the dictionary in the form "□●□●...□●", and the "●" parts may be lengthened or shortened arbitrarily in the word. For example, as shown at (f) in FIG. 4 (ホ), the entry "寒空" in the dictionary consists of "□□", and the word "寒む空" is the same entry with the "む" part added, so it is classified as the O- type.
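The kana-omitted simple type, in which the kana between kanji may grow or shrink freely, can be approximated by comparing only the kanji skeleton, as sketched below. The name `is_o_minus` is hypothetical. Because this type is the final fallback of the flow, the function does not try to exclude the earlier types and would also accept words that an earlier decision state would have claimed. The R type is not sketched because it depends on part-of-speech information that the description does not detail.

```python
import re

HIRAGANA = "[\u3041-\u3096]"  # character class covering the hiragana letters


def is_kanji(ch: str) -> bool:
    """Assumption: treat the basic CJK Unified Ideographs block as kanji."""
    return "\u4e00" <= ch <= "\u9fff"


def is_o_minus(entry: str, word: str) -> bool:
    """True if `word` has the entry's kanji in order, with any hiragana after each."""
    if word == entry:
        return False
    kanji = [ch for ch in entry if is_kanji(ch)]
    if not kanji:
        return False
    # Every "●" slot may hold any number of hiragana, including none.
    pattern = "".join(re.escape(k) + HIRAGANA + "*" for k in kanji)
    return re.fullmatch(pattern, word) is not None


if __name__ == "__main__":
    print(is_o_minus("寒空", "寒む空"))  # the example from FIG. 4
```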

In addition, as shown in FIG. 4 (へ), there is an R-O- compound type, which combines the R type and the O- type.

As described above, the okurigana relationship between the entry in the dictionary and the word can be classified into the types described above, so the word can be retrieved even when no entry with that spelling is registered in advance in the dictionary.

In FIGS. 3 and 4, the left column shows words that could not be analyzed by the conventional method, and the center column shows the spelling that is decomposed correctly and is registered in advance in dictionary 6.

In the figures, "１" denotes a boundary between units registered as separate words in dictionary 6. The right column shows examples of the failure patterns produced with the conventional dictionary 6. In the figures, ■ denotes a wrong word, ◎ denotes that the correct word was retrieved by chance, and ■ denotes a word processed as an unregistered word.

[Effect of the invention]

As explained above, the present invention adopts a configuration that classifies the relationship between the entry in the dictionary and the okurigana in the word into the okurigana-matchable type, the simple type, the kanji-ending-sent type, and so on. Therefore, even when no entry containing identical okurigana is found in the dictionary, classification can be performed from the entry in the dictionary and the word, and the desired word can be retrieved.

[Brief explanation of the drawing]

FIG. 1 is a diagram of the principle configuration of the present invention, FIG. 2 is a flowchart explaining the operation of the present invention, and FIGS. 3 and 4 are diagrams explaining the classification of the present invention. In the figures, 1 is the search unit, 2 is the start position detection unit, 3 is the kanji extraction unit, 4 is the collation unit, 5 is the classification processing unit, and 6 is the dictionary.

Claims

[Claims] An okurigana classification system for classifying the okurigana of words contained in an input mixed kanji-kana sentence, characterized in that it is configured to classify at least into: an okurigana-matchable type (1), in which the entry in the dictionary is registered in the form □●□●...□●○ (□ denotes one kanji, ● denotes zero or more hiragana, and ○ denotes one hiragana; the same applies hereafter) and the word appears in the sentence with part of a □ sent in excess, or with part of ●○ absorbed into the immediately preceding □; a simple type (2), in which the entry in the dictionary is registered in the form □...□●○ and the word appears in the sentence with the ●○ part omitted; and a kanji-ending-sent type (3), in which the entry in the dictionary is registered in the form ★□ (★ denotes zero or more arbitrary characters) and the word appears in the sentence with the last sound of the □ sent in excess.