JPH0231422B2 - Google Patents

Info

Publication number
JPH0231422B2
JPH0231422B2 JP59081166A JP8116684A JPH0231422B2 JP H0231422 B2 JPH0231422 B2 JP H0231422B2 JP 59081166 A JP59081166 A JP 59081166A JP 8116684 A JP8116684 A JP 8116684A JP H0231422 B2 JPH0231422 B2 JP H0231422B2
Authority
JP
Japan
Prior art keywords
kana
independent word
notation
character string
independent
Legal status
Expired - Lifetime
Application number
JP59081166A
Other languages
Japanese (ja)
Other versions
JPS60225274A (en)
Inventor
Shunichi Fukushima
Current Assignee
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Application filed by Agency of Industrial Science and Technology
Priority to JP59081166A
Publication of JPS60225274A
Publication of JPH0231422B2
Granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to an independent word (自立語) detection method. More specifically, it relates to a method that detects independent words in a given kanji-kana mixed character string by matching that string against the notation part of each independent word in an independent word dictionary.

(Prior Art) Japanese text written in mixed kanji and kana has no convention of separating words with spaces, as English text does. It is normally written without any word breaks. When a Japanese kanji-kana mixed character string is analyzed mechanically on an electronic computer or similar device, words must therefore first be detected in the given string. This is done by matching the string against the notation part of each word in a word dictionary.

Adjuncts (付属語) are far fewer than independent words: there are several hundred to several thousand adjuncts, against tens of thousands or more independent words. Adjuncts also occur frequently. Conventional methods have therefore split the word dictionary into an independent word dictionary and an adjunct dictionary. Adjuncts are detected first by searching the adjunct dictionary. Independent words are then detected by matching the remaining character strings, those not recognized as adjuncts, against the independent word dictionary. This keeps searches of the time-consuming independent word dictionary to a minimum and improves processing efficiency.

FIGS. 1a and 1b show first and second examples of independent word detection by this conventional method. In the first example, the adjunct dictionary search recognizes the strings 「を」 (wo) and 「む」 (mu) as the adjuncts “を (case particle)” and “む (godan conjugational ending, terminal form)”. The remaining strings 「本」 (hon) and 「読」 (yo) were not recognized as adjuncts. They are matched against the independent word dictionary, which yields the independent words “本 (noun)” and “読 (godan verb stem)”. These independent words can connect to the adjacent “を (case particle)” and “む (godan conjugational ending, terminal form)”. The correct independent words are therefore regarded as detected (marked with a circle).
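The conventional pipeline can be sketched as follows. The sketch assumes that adjunct detection has already split the input into segments, and that every segment not tagged as an adjunct is looked up exactly as it stands. The dictionary contents, the `Segment` type and `conventional_detect` are hypothetical illustrations, not structures taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Toy independent-word dictionary (hypothetical contents).
INDEPENDENT: Dict[str, str] = {
    "本": "noun",
    "読": "godan verb stem",
    "例": "noun",
    "例えば": "adverb",
}

# (text, adjunct tag); the tag is None when the segment was not taken as an adjunct.
Segment = Tuple[str, Optional[str]]


def conventional_detect(segments: List[Segment]) -> List[Tuple[str, Optional[str]]]:
    """Look up each non-adjunct segment verbatim (no shifting of the range)."""
    return [(text, INDEPENDENT.get(text)) for text, tag in segments if tag is None]


# FIG. 1a: 本/を/読/む, where both residual strings are found.
print(conventional_detect([("本", None), ("を", "case particle"),
                           ("読", None), ("む", "godan ending")]))
# -> [('本', 'noun'), ('読', 'godan verb stem')]

# FIG. 1b: え and ば were taken as adjuncts, so only 例 is ever looked up.
print(conventional_detect([("例", None), ("え", "godan ending"),
                           ("ば", "conjunctive particle")]))
# -> [('例', 'noun')]
```

例えば is in the toy dictionary but is never probed. That is exactly the failure the following paragraph describes.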

The conventional method has a drawback. It sets the position where the given kanji-kana mixed string starts to be matched against the notation part of the independent word dictionary at the boundary between strings already recognized as words and strings not yet recognized. Some independent words are written with a mixture of kana and non-kana characters. If some of the kana characters that belong to such a word are recognized as adjuncts, as in the second example, the correct independent word cannot be detected (marked with a cross).

Consider the string 「例えば」 (tatoeba, "for example") of FIG. 1b. The part 「えば」 actually belongs to the independent word “例えば (adverb)”. It has instead been recognized as the adjunct string “え (godan conjugational ending, hypothetical form)” + “ば (conjunctive particle)”. Only the string 「例」 is then matched against the independent word dictionary, so the correct independent word is not detected.

In this case, the desired independent word could still be found by searching the dictionary while shifting the range of characters presumed to contain the independent word, as in 「例」 → 「例え」 → 「例えば」. For the string 「突然にわか雨が」 ("a sudden shower ...") in FIG. 2b, however, that range must be shifted blindly, as in 「わか雨」 → 「わか雨が」 → 「か雨が」 → 「か雨」 → …… → 「にわか雨」. This drastically lowers search efficiency.
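The cost of this blind fallback can be made concrete with a small sketch. It assumes that every contiguous substring of the unresolved region is tried. The patent does not fix an order, so shortest-first is an arbitrary choice here. `windows` and `blind_search` are hypothetical names. For an n-character region this takes n(n+1)/2 dictionary probes, which is 28 for the seven characters of 「突然にわか雨が」.

```python
from typing import Iterator, Set, Tuple


def windows(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for every contiguous substring, shortest first."""
    n = len(text)
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            yield start, start + length


def blind_search(text: str, dictionary: Set[str]) -> Tuple[Set[str], int]:
    """Probe the dictionary with every window; return the hits and the probe count."""
    hits: Set[str] = set()
    probes = 0
    for start, end in windows(text):
        probes += 1
        if text[start:end] in dictionary:
            hits.add(text[start:end])
    return hits, probes


# Hypothetical toy dictionary.
found, probes = blind_search("突然にわか雨が", {"突然", "にわか雨", "雨"})
print(sorted(found), probes)
# -> ['にわか雨', '突然', '雨'] 28
```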

(Object of the Invention) An object of the present invention is to eliminate the above drawbacks. The invention sets the matching start position at the last non-kana character of an independent word whose notation contains non-kana characters. This provides an independent word detection method that detects correctly and efficiently even independent words written with a mixture of kana and non-kana characters.

(Constitution of the Invention) The present invention provides an independent word detection method that detects independent words in a given kanji-kana mixed character string by matching the string against the notation part of each independent word in an independent word dictionary. In the dictionary, the notation part of each independent word whose notation contains a non-kana character is divided into two parts. The main notation part is the notation with its trailing kana string removed. The residual notation part is that trailing kana string. To detect, in the given string, independent words whose notation contains non-kana characters, the method comprises: matching start position setting means for setting, as the matching start position, a position at which the last non-kana character of such an independent word can occur; main notation part search means for searching the independent word dictionary for independent words whose main notation part matches the character string ending at the matching start position in the given string; and residual notation part matching means for matching, for each independent word found by the main notation part search means, the character string following the matching start position in the given string against the residual notation part. Independent words whose notation contains non-kana characters are thereby detected by starting the match from their last non-kana character.

(Embodiment) The present invention will now be described with reference to FIGS. 2 to 7.

FIG. 2 is a block diagram of an apparatus illustrating one embodiment of the independent word detection method of the present invention. FIG. 3 is a flowchart showing the processing performed by each unit in FIG. 2. FIG. 4 shows an example of the non-kana independent word dictionary used in the method. FIG. 5 is a block diagram showing an example configuration of the non-kana independent word detection unit in FIG. 2. FIGS. 6a and 6b show first and second examples of how the matching start position setting circuit in FIG. 5 determines the matching start position. FIG. 7 is a flowchart showing process 34 of FIG. 3 in detail.

In FIG. 2, the apparatus comprises the following units:
- a character string input unit 21, which receives the kanji-kana mixed character string given as the target of independent word detection;
- a character string storage unit 22, which stores the string from the input unit 21;
- an independent word determination unit 23, which determines whether the independent word to be detected in the string read from the storage unit 22 is written entirely in kana or contains non-kana characters;
- a kana independent word dictionary storage unit 24, which stores a kana independent word dictionary containing independent words written in kana;
- a kana independent word detection unit 25, which is activated when the determination unit 23 decides that a kana independent word should be detected. It detects independent words written in kana in the given string by matching the notation part of the kana independent word dictionary in storage unit 24 against partial kana strings of the given string from storage unit 22;
- a non-kana independent word dictionary storage unit 26, which stores a non-kana independent word dictionary containing independent words whose notation contains non-kana characters;
- a non-kana independent word detection unit 27, which is activated when the determination unit 23 decides that an independent word containing non-kana characters should be detected. It detects such independent words in the given string by matching the notation part of the non-kana independent word dictionary in storage unit 26 against partial strings of the given string from storage unit 22 that contain non-kana characters;
- an information output unit 28, which handles the independent words detected by detection unit 25 or 27. For each such word, it reads word information such as part of speech, reading and accent from the kana independent word dictionary in storage unit 24 or the non-kana independent word dictionary in storage unit 26, and outputs it;
- a control unit 29, which controls the operation of units 21 to 28.

The character string given as the target of detection is a kanji-kana mixed string. It has been converted into a character code string through an input device such as a keyboard, an OCR reader or a magnetic tape unit. It has then undergone adjunct detection by reference to an adjunct dictionary, and possibly partial independent word detection as well. The character string storage unit 22, the kana independent word dictionary storage unit 24 and the non-kana independent word dictionary storage unit 26 may be, for example, IC memory, magnetic disk units or magnetic tape units.

The independent word determination unit 23 examines the residual string, that is, the part of the given string stored in unit 22 that has not yet been recognized as words. If that residual string consists entirely of kana, it decides that a kana independent word should be detected. If the residual string contains a non-kana character, it decides that an independent word containing non-kana characters should be detected. For example, consider the string 「難し/い (adjective inflectional ending, attributive form)/本 (noun)」 ("a difficult book"), where the parenthesized tags mark strings already recognized as words. The residual string 「難し」 contains the non-kana character 「難」, so an independent word containing non-kana characters is to be detected. For the string 「おもしろ/い (adjective inflectional ending, attributive form)/本 (noun)」 ("an interesting book"), the residual string 「おもしろ」 consists entirely of kana, so a kana independent word is to be detected. These determination results are sent to the control unit 29.
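The decision made by unit 23 comes down to a character-class test on the residual string. The sketch below assumes that "kana" means hiragana only, since the text treats katakana as non-kana in the context of FIG. 6a. It reuses the labels A (kana) and B (non-kana) from Steps 2 and 3′. `is_kana` and `judge` are hypothetical names.

```python
def is_kana(ch: str) -> bool:
    """Treat only hiragana (U+3041..U+309F) as kana; katakana counts as non-kana."""
    return "\u3041" <= ch <= "\u309f"


def judge(residual: str) -> str:
    """Return 'A' (detect a kana independent word) or 'B' (one containing non-kana)."""
    return "A" if all(is_kana(ch) for ch in residual) else "B"


print(judge("難し"))      # -> B  (contains the kanji 難)
print(judge("おもしろ"))  # -> A  (all hiragana)
```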

The structure of the kana independent word dictionary stored in unit 24, and the kana independent word detection unit 25, are known. They are not features of the present method. The features of the present method are the structure of the non-kana independent word dictionary, which is used to detect independent words whose notation contains non-kana characters, and the non-kana independent word detection unit 27.

The kana independent word dictionary and the non-kana independent word dictionary need not be separate dictionaries. They may also be combined into a single independent word dictionary.

Referring now to FIG. 4, the non-kana independent word dictionary 4 contains one record for each independent word whose notation contains non-kana characters. Each record has a notation part 41 and a word information part 42. The notation part 41 consists of a main notation part 411 and a residual notation part 412. The main notation part 411 is the notation part 41 with its trailing kana string removed. The residual notation part 412 is that trailing kana string. The word information part 42 holds part-of-speech information used for grammatical checking or syntactic analysis of detected independent words. It also holds reading and accent information used for speech output. It is not necessarily used in independent word detection itself.
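Building a FIG. 4 record amounts to cutting the notation at the start of its trailing kana run. The sketch below uses the same hiragana-only notion of kana. `Record` and `make_record` are hypothetical names, and the `info` field only stands in for word information part 42.

```python
from dataclasses import dataclass


def is_kana(ch: str) -> bool:
    """Hiragana only; katakana is treated as non-kana."""
    return "\u3041" <= ch <= "\u309f"


@dataclass(frozen=True)
class Record:
    main: str       # main notation part 411
    residual: str   # residual notation part 412 (trailing kana, possibly empty)
    info: str = ""  # word information part 42 (free text in this sketch)


def make_record(notation: str, info: str = "") -> Record:
    """Split a notation at the start of its trailing kana run."""
    cut = len(notation)
    while cut > 0 and is_kana(notation[cut - 1]):
        cut -= 1
    if cut == 0:
        raise ValueError("no non-kana character: belongs in the kana dictionary")
    return Record(notation[:cut], notation[cut:], info)


for word in ["冷", "冷た", "冷やか", "例えば"]:
    r = make_record(word)
    print(word, "->", repr(r.main), repr(r.residual))
# 冷 -> '冷' ''
# 冷た -> '冷' 'た'
# 冷やか -> '冷' 'やか'
# 例えば -> '例' 'えば'
```

Every main notation part therefore ends in a non-kana character. This is what allows a search to be anchored at the last non-kana character of a word.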

In FIG. 5, the character string storage unit 22, the non-kana independent word dictionary storage unit 26, the information output unit 28 and the control unit 29 correspond to the units with the same reference numerals in FIG. 2. The non-kana independent word detection unit 27 consists of three circuits:
- a matching start position setting circuit 271, which sets the matching start position 221 in the given kanji-kana mixed string 220 stored in unit 22. This is a position at which the last non-kana character of an independent word whose notation contains non-kana characters can occur;
- a main notation part search circuit 272, which searches the non-kana independent word dictionary 4 in storage unit 26 for independent words whose main notation part 411 matches the string 222 ending at the matching start position 221 in the string 220;
- a residual notation part matching circuit 273, which takes each independent word found by circuit 272 and matches the string 223 following the matching start position 221 in the string 220 against its residual notation part 412 in the dictionary 4.

The main notation part search circuit 272 and the residual notation part matching circuit 273 can be implemented with known word dictionary search and character string matching techniques, respectively. Circuit 272 passes to circuit 273 the position, in the non-kana independent word dictionary 4, of each independent word it finds. Circuit 273 passes to the information output unit 28 the positions in dictionary 4 of the independent words detected by the detection unit 27.

Circuit 271 determines the possible positions of the last non-kana character of such an independent word. These matching start positions are marked with triangles (△) in FIGS. 6a and 6b. In the first example (FIG. 6a), the matching start position is the position of a non-kana character immediately preceding a kana character; katakana are treated as non-kana characters. In the second example (FIG. 6b), it is the position of a non-kana character immediately preceding a string already recognized as a word. In these figures, the characters in parentheses give part-of-speech information for words recognized by the word detection performed beforehand.
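The two rules of FIGS. 6a and 6b can be sketched as a single scan. The sketch assumes that the unrecognized part is a prefix of the string, since recognition proceeds from the end as in Step 1, and it returns all candidates. `start_positions` and its `recognized_from` parameter are hypothetical. The comments note which figure's rule each condition implements.

```python
from typing import List


def is_kana(ch: str) -> bool:
    """Hiragana only; katakana counts as non-kana, as stated for FIG. 6a."""
    return "\u3041" <= ch <= "\u309f"


def start_positions(text: str, recognized_from: int) -> List[int]:
    """Candidate matching start positions inside the unrecognized prefix.

    text[:recognized_from] has not been recognized yet; text[recognized_from:]
    holds words already recognized by earlier detection.
    """
    positions = []
    for i in range(recognized_from):
        if is_kana(text[i]):
            continue
        before_kana = i + 1 < recognized_from and is_kana(text[i + 1])  # FIG. 6a rule
        before_word = i + 1 == recognized_from                          # FIG. 6b rule
        if before_kana or before_word:
            positions.append(i)
    return positions


print(start_positions("突然冷たい雨が", 4))  # -> [2], the position of 冷
print(start_positions("例えば", 1))          # -> [0], 例 precedes the recognized えば
```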

Next, the operation of this embodiment is described along the flowcharts of FIGS. 3 and 7, with reference to FIGS. 2, 4 and 5. The example is the detection of independent words in the kanji-kana mixed string 「突然冷た/い (adjective inflectional ending, attributive form)/雨 (noun)/が (case particle)」 ("suddenly, cold rain ...").

Step 1 (process 31 in FIG. 3): The character string input unit 21 is activated. It writes the kanji-kana mixed string 「突然冷た/い (adjective inflectional ending, attributive form)/雨 (noun)/が (case particle)」, given as the target of independent word detection, into the character string storage unit 22. Some words in this string have already been recognized, working from the end.

Step 2 (process 32 in FIG. 3): The independent word determination unit 23 is activated and checks the character types of 「突然冷た」, the residual string not yet recognized as words. The residual string contains non-kana characters, so the unit decides that an independent word containing non-kana characters should be detected. It sends determination result B to the control unit 29.

Step 3 (process 34 in FIG. 3): On receiving determination result B, the non-kana independent word detection unit 27 is activated and performs Steps 3-1 to 3-3 below.

Step 3-1 (process 341 in FIG. 7): The matching start position setting circuit 271 is activated. It examines the residual string 「突然冷た」 of the given string 220 and finds the non-kana character immediately preceding a kana character, namely 「冷」. That position is taken as the position of the last non-kana character of an independent word containing non-kana characters, and is set as the matching start position 221.

Step 3-2 (process 342 in FIG. 7): The main notation part search circuit 272 is activated. It searches the non-kana independent word dictionary 4 for independent words whose main notation part 411 matches the string 222 「突然冷」, which ends at the matching start position 221 (the position of 「冷」) in the given string 220. This example assumes a dictionary 4 with the contents shown in FIGS. 4 to 6. No independent word has a main notation part 411 matching 「突然冷」, nor 「然冷」, obtained by dropping one leading character. Dropping one more character gives 「冷」, which matches the main notation part 411 of “冷”, “冷た” and “冷やか”, so these words are retrieved. Their positions in dictionary 4 are sent to the residual notation part matching circuit 273. If no independent word with a matching main notation part 411 is found, a search failure is sent to circuit 273 instead.
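Step 3-2 can be sketched with an ordinary hash index keyed by main notation part. The search tries the string ending at the start position, dropping one leading character per attempt. The sketch assumes that matches at every length are collected, because the example does not say whether the search stops at the first hit. `Entry`, `build_index` and `search_main` are hypothetical names; the patent leaves this search to any known dictionary technique.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (main notation part 411, residual notation part 412, word information 42)
Entry = Tuple[str, str, str]


def build_index(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Group dictionary entries by main notation part."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        index[entry[0]].append(entry)
    return dict(index)


def search_main(text: str, start: int, index: Dict[str, List[Entry]]) -> List[Entry]:
    """Try text[k:start+1] for k = 0, 1, ..., start (one leading char dropped each time)."""
    hits: List[Entry] = []
    for k in range(start + 1):
        hits.extend(index.get(text[k:start + 1], []))
    return hits  # an empty list plays the role of "search failure"


# Toy dictionary; the info strings are placeholders.
INDEX = build_index([
    ("突然", "", "adverb"),
    ("冷", "", "verb stem"),
    ("冷", "た", "adjective stem"),
    ("冷", "やか", "placeholder"),
])
print([main + res for main, res, _ in search_main("突然冷たい雨が", 2, INDEX)])
# -> ['冷', '冷た', '冷やか']
```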

Step 3-3 (process 343 in FIG. 7): The residual notation part matching circuit 273 is activated. For each of the independent words “冷”, “冷た” and “冷やか” found by circuit 272, it matches the string 223 「たい…」, which follows the matching start position 221 in the given string 220, against the residual notation part 412. Only as many characters are compared as the kana string in each residual notation part 412 contains. For “冷”, the residual notation part 412 is empty (blank), so the match succeeds. For “冷た”, the 「た」 of the residual notation part 412 equals the 「た」 in the string, so the match succeeds. For “冷やか”, the 「やか」 of the residual notation part 412 does not equal the 「たい」 in the string, so the match fails. The positions in dictionary 4 of the successfully matched words “冷” and “冷た” are therefore sent to the information output unit 28. If circuit 272 sent a search failure instead of dictionary positions, no residual notation part 412 is matched, and the search failure is passed on to the information output unit 28.
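The comparison in Step 3-3 is a prefix test. `str.startswith` compares exactly `len(residual)` characters, and it always succeeds for an empty residual, which matches the behaviour described for “冷”. An empty candidate list, standing for a search failure, simply yields an empty result. `match_residual` and the `Entry` layout are hypothetical names used for illustration.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (main notation 411, residual notation 412, info 42)


def match_residual(text: str, start: int, candidates: List[Entry]) -> List[Entry]:
    """Keep candidates whose residual notation equals the characters after `start`."""
    after = text[start + 1:]
    return [entry for entry in candidates if after.startswith(entry[1])]


candidates = [("冷", "", "verb stem"), ("冷", "た", "adjective stem"), ("冷", "やか", "placeholder")]
print([m + r for m, r, _ in match_residual("突然冷たい雨が", 2, candidates)])
# -> ['冷', '冷た']
```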

Step 4 (process 35 in FIG. 3): The information output unit 28 is activated. For the independent words “冷” and “冷た” received from the residual notation part matching circuit 273, it reads the word information needed in later analysis (grammatical checking, speech output, etc.) from the non-kana independent word dictionary 4 and outputs it.

In this example, the following word information is output:

“冷”: part of speech: verb stem; reading: ひや (hiya); accent: type 2
“冷た”: part of speech: adjective stem; reading: つめた (tsumeta); accent: type 0

If a search failure is received from the residual notation part matching circuit 273, an independent word detection failure is output instead.

Suppose a grammatical check is then applied to the independent words “冷” and “冷た” detected by the above steps. The word that can connect to the following string 「い (adjective inflectional ending, attributive form)/雨 (noun)/が (case particle)」 is “冷た (adjective stem)”, so it is selected as the correct analysis result. Word detection is then performed on the remaining string 「突然」.

In Step 2, the independent word determination unit 23 may instead decide that a kana independent word should be detected and send determination result A to the control unit 29. In that case the known processing of Step 3′ is performed, after which the flow proceeds to Step 4.

Step 3′ (process 33 in FIG. 3): On receiving determination result A, which indicates that an independent word written in kana should be detected, the kana independent word detection unit 25 is activated. It detects independent words written in kana in the given kanji/kana mixed character string and sends them to the information output unit 28.

As explained above, in this embodiment the position at which matching between the given kanji/kana mixed character string and the notation parts of the non-kana independent word dictionary begins is set not at the boundary between the string already recognized as words and the string not yet recognized, but at the last non-kana character of an independent word whose notation contains non-kana characters. The correct independent word is therefore obtained even in cases such as FIG. 1b, where part of an independent word has been wrongly recognized as an adjunct word and the conventional independent word detection method could not obtain the correct independent word. In the string 「例えば」 of FIG. 1b, 「えば」 has been recognized as an adjunct word, so the conventional method detects only “例” (noun). With this embodiment, by contrast, both “例” (noun) and “例えば” (adverb) can be detected, so the correct independent word is obtained.
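
The contrast can be illustrated with a simplified Python model. Both functions are assumptions made for illustration: `conventional` models the conventional method only as accepting words that end exactly at the boundary of the string already recognized as words, while `proposed` requires the main notation part to end at the last non-kana character and the residual notation part to begin right after it. The two-entry dictionary is hypothetical.

```python
# Hypothetical dictionary entries: (main notation part, residual notation part, part of speech)
ENTRIES = [("例", "", "noun"), ("例", "えば", "adverb")]


def conventional(text: str, boundary: int) -> list[str]:
    """Simplified conventional detection: the word must end exactly at
    `boundary`, where the string already recognized as an adjunct word begins."""
    head = text[:boundary]
    return [m + r for m, r, _ in ENTRIES if head.endswith(m + r)]


def proposed(text: str, start: int) -> list[str]:
    """Matching starts at `start`, the last non-kana character: the main part
    must end there and the residual part must begin right after it."""
    return [
        m + r
        for m, r, _ in ENTRIES
        if text[: start + 1].endswith(m) and text[start + 1 :].startswith(r)
    ]


text = "例えば"  # 「えば」 has been wrongly recognized as an adjunct word
print(conventional(text, 1))  # ['例']
print(proposed(text, 0))      # ['例', '例えば']
```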

Furthermore, independent words with different matching start positions, such as “例” and “例えば” or “雨” and “にわか雨”, previously could be obtained only by repeating the search several times, and sometimes blindly many times. In this embodiment the same matching start position, the last non-kana character of the independent word, is set for both, so they can be detected simultaneously and efficiently.
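
One way to see why a single start position suffices is to index dictionary entries by the last character of their main notation part, which by construction is the word's last non-kana character. The index, the `detect_at` function, and the four entries below are an assumed Python sketch, not the dictionary layout of FIG. 4.

```python
from collections import defaultdict

# Hypothetical entries: (main notation part, residual notation part)
ENTRIES = [("例", ""), ("例", "えば"), ("雨", ""), ("にわか雨", "")]

# Index by the last character of the main part, i.e. the last non-kana character.
INDEX = defaultdict(list)
for main, residual in ENTRIES:
    INDEX[main[-1]].append((main, residual))


def detect_at(text: str, pos: int) -> list[str]:
    """All words whose last non-kana character is at `pos`, found with one lookup:
    the main part is matched backward from `pos`, the residual part forward."""
    hits = []
    for main, residual in INDEX.get(text[pos], []):
        begin = pos + 1 - len(main)
        if begin >= 0 and text[begin : pos + 1] == main and text[pos + 1 :].startswith(residual):
            hits.append(main + residual)
    return hits


print(detect_at("突然にわか雨が降る", 5))  # ['雨', 'にわか雨']
print(detect_at("例えば", 0))              # ['例', '例えば']
```

Words of different lengths that share their last non-kana character land in the same index bucket, so one lookup at one start position yields all of them.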

In this embodiment, the kanji/kana mixed character string subject to independent word detection was assumed to have already undergone adjunct word detection. However, the independent word detection method of the present invention can be applied even when the target string has not undergone adjunct word detection, by taking as matching start positions the points where the character type changes from non-kana to kana. For example, for the kanji/kana mixed character string 「突然にわか雨が降る」 ("a sudden shower falls"), applying the method with the characters “然”, “雨”, and “降”, the points where a non-kana character changes to a kana character, successively as matching start positions detects independent words whose notation contains non-kana characters, such as “突然” (adverb), “にわか雨” (noun), “雨” (noun), and “降” (stem of a godan, or five-row, verb), without any adjunct word detection. Therefore, whether the target string has undergone adjunct word detection in advance is not a constraint on realizing the independent word detection method of the present invention.
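
The choice of matching start positions without prior adjunct word detection can be sketched in Python as below. Treating hiragana (U+3041 to U+309F) and katakana (U+30A0 to U+30FF) as the kana character set is an assumption of this sketch; every other character counts as non-kana.

```python
def is_kana(ch: str) -> bool:
    """True for hiragana (U+3041 to U+309F) or katakana (U+30A0 to U+30FF)."""
    return "\u3041" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"


def transition_points(text: str) -> list[int]:
    """Indices of non-kana characters that are immediately followed by a kana
    character, i.e. the non-kana to kana change points used as start positions."""
    return [
        i
        for i in range(len(text) - 1)
        if not is_kana(text[i]) and is_kana(text[i + 1])
    ]


text = "突然にわか雨が降る"
positions = transition_points(text)
print([text[i] for i in positions])  # ['然', '雨', '降']
```

Each returned index would then serve as a matching start position for the main and residual notation matching; note that this sketch returns no position for a non-kana character at the very end of the string.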

(Effects of the Invention) As is clear from the above explanation, the independent word detection method of the present invention starts matching from the last non-kana character of an independent word whose notation contains non-kana characters. Independent words written with a mixture of kana and non-kana characters can therefore also be detected correctly and efficiently.

[Brief Description of the Drawings]

FIGS. 1a and 1b show first and second examples, respectively, of independent word detection by a conventional independent word detection method. FIG. 2 is a block diagram of an embodiment of the independent word detection method of the present invention. FIG. 3 is a flowchart of the processing performed by each part in FIG. 2. FIG. 4 shows an example of the non-kana independent word dictionary used in the independent word detection method of the present invention. FIG. 5 is a block diagram of an example configuration of the non-kana independent word detection unit in FIG. 2. FIGS. 6a and 6b show first and second examples, respectively, of how the matching start position setting circuit in FIG. 5 determines the matching start position. FIG. 7 is a flowchart showing process 34 of FIG. 3 in detail.

In the figures:
21: character string input unit
22: character string storage unit
220: kanji/kana mixed character string
221: matching start position
222: character string preceding the matching start position
223: character string following the matching start position
23: independent word determination unit
24: kana independent word dictionary storage unit
25: kana independent word detection unit
26: non-kana independent word dictionary storage unit
27: non-kana independent word detection unit
271: matching start position setting circuit
272: main notation search circuit
273: residual notation matching circuit
28: information output unit
29: control unit
31, 32, 33, 34, 341, 342, 343, 35: processes
4: non-kana independent word dictionary
41: notation part
411: main notation part
412: residual notation part
42: word information part

Claims (1)

[Claims]
1. An independent word detection method that detects independent words in a given kanji/kana mixed character string by matching the given string against the notation part of each independent word in an independent word dictionary, wherein the notation part of each independent word stored in the independent word dictionary whose notation contains non-kana characters is provided with a main notation part, which is the notation part with its trailing kana character string removed, and a residual notation part, which is that trailing kana character string; the method comprising: matching start position setting means for setting, when an independent word whose notation contains non-kana characters is to be detected in the given kanji/kana mixed character string, a position at which the last non-kana character of such an independent word can occur as a matching start position; main notation search means for searching the independent word dictionary for independent words whose main notation part matches the character string extending backward from the matching start position in the given kanji/kana mixed character string; and residual notation matching means for matching, for each independent word retrieved by the main notation search means, the character string following the matching start position in the given kanji/kana mixed character string against the residual notation part; whereby an independent word whose notation contains non-kana characters is detected by starting the matching from the last non-kana character of that independent word.
JP59081166A 1984-04-24 1984-04-24 Independent word detecting system Granted JPS60225274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59081166A JPS60225274A (en) 1984-04-24 1984-04-24 Independent word detecting system


Publications (2)

Publication Number Publication Date
JPS60225274A (en) 1985-11-09
JPH0231422B2 (en) 1990-07-13

Family

ID=13738872

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59081166A Granted JPS60225274A (en) 1984-04-24 1984-04-24 Independent word detecting system

Country Status (1)

Country Link
JP (1) JPS60225274A (en)



Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term