JPS6118066A

JPS6118066A - Word extracting system

Info

Publication number: JPS6118066A
Application number: JP59139666A
Authority: JP
Inventors: Yasuyuki Numata; 泰之沼田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-07-05
Filing date: 1984-07-05
Publication date: 1986-01-25

Abstract

PURPOSE: To minimize the number of search strings used during dictionary lookup and thereby increase word extraction speed, by providing an attribute determination part for input character strings, an attribute determination part for the leading kanji sound, a search string setting part, and the like. CONSTITUTION: A kana character string corresponding to a Japanese sentence, supplied from a kana character input part 1 of the word extracting system, is stored temporarily in an input character string memory 2. A kanji-sound dividing part 3 refers to a kanji sound table 4 and divides the string into kanji sounds and kana as minimum units. An attribute determination part 5 for input character strings determines the attribute of the string from the arrangement of kanji sounds and kana. An attribute determination part 6 for the leading kanji sound determines the attribute of that kanji sound according to whether an independent word with the same reading exists. A search string setting part 7 sets the minimum number of search strings based on the determinations of parts 5 and 6. The search strings are passed to a dictionary retrieving part 8, which searches a word dictionary 9 and stores the retrieved words in a candidate word memory part 10. This increases the word extraction speed.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a kana-kanji conversion processing device, and more particularly to a word extraction method for a kana-kanji conversion processing device applicable to Japanese document preparation devices, computer systems, and the like.

Prior Art

A kana-kanji conversion processing device is provided with a word dictionary for kana-kanji conversion so that text entered in phonetic characters (hiragana, katakana, or Roman letters) can be converted into appropriate mixed kanji-kana text. The dictionary is searched by cutting a word out of the input kana character string to form a search string and matching that search string against the entry strings in the word dictionary. However, because Japanese grammar is complex and homophones are numerous, a dictionary search yields multiple candidate words.

To select one of these candidate words as the conversion result, the following processing has conventionally been performed. For each extracted candidate word, the possibility of connection with the preceding converted word (the previous conversion result) is judged. The connectable candidate words are then evaluated using parameters such as reading length, frequency of occurrence, and connection weight, and the candidate word with the highest evaluation is output as the conversion result.

Conventionally, to make the word dictionary search easier and to reduce misanalysis, the input character string is preprocessed using kanji sounds (漢字音, the readings of kanji written in kana).

Kanji sounds fall into three groups according to their length when written in kana: (1) 1-character kanji sounds, (2) 2-character kanji sounds, and (3) 3-character kanji sounds. For example:

(1) 1-character kanji sounds: most single kana, such as ア「亜」, イ「以、意、位、医、異…」, etc.
(2) 2-character kanji sounds: アイ「愛、挨、哀…」, アク「悪、握…」, etc.
(3) 3-character kanji sounds: シュウ「集、収、週、衆、習、修、周、就…」, ショウ「相、小、省、勝、少、商、証、消、正…」, etc.

In the 2-character and 3-character kanji sounds above, the kana that can occupy the second and third positions are limited to the following 18 types:

「イ、ウ、キ、ク、チ、ツ、ャ、ュ、ョ、ュウ、ョウ、ャク、ュク、ョク、ュツ、ュン、ッ、ン」

However, not all 18 combine with a given first kana to form a kanji sound. For example, when the first character is ア:

アイ: a kanji sound (see the example above)
アウ: not a kanji sound
アキ: not a kanji sound
アク: a kanji sound (see the example above)
アチ: not a kanji sound

When the input character string is divided using the kanji sounds of two or more characters described above, and search strings are built from the resulting units, something that is actually part of a kanji reading is no longer misanalyzed as a case particle or the like. Furthermore, these kanji sounds are never used alone; they are always used in combination with other kanji sounds. Therefore, when an input string of three or more characters is matched against the kanji sounds and the first character matches but the second and third do not, it can be inferred that the first character is most likely not a kanji sound but kana belonging to an attached word or the like. For this reason, the kanji sound table used for kanji sound lookup does not need to hold every kanji sound including the 1-character ones; it is enough to store the kanji sounds of two or more characters, as shown in Fig. 4(a), (b), and (c).

Conventionally, the following preprocessing is performed using a kanji sound table such as that shown in Fig. 4(a), (b), and (c).

For example, the kanji sound table is accessed on the basis of the input kana character string 「ぶんしょうのさくせいがひじょうによういである。」, and the string is divided into kanji sounds and kana as minimum units as follows:

「ブン／ショウ／の／サク／セイ／が／ひ／ジョウ／に／ヨウ／い／で／あ／る／。」

Here katakana denotes a kanji sound and hiragana denotes kana.
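The division step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the table contents, the function names, and the greedy longest-match strategy (3-kana readings tried before 2-kana readings) are all assumptions made for the example.

```python
# Hypothetical miniature kanji sound table. As in the conventional method,
# it holds only readings of two or more kana (no 1-character kanji sounds).
KANJI_SOUNDS = {"ぶん", "しょう", "さく", "せい", "じょう", "よう"}


def to_katakana(text):
    """Render hiragana as katakana (U+3041..U+3096 shifted by 0x60)."""
    return "".join(
        chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in text
    )


def segment(text, table=KANJI_SOUNDS):
    """Split text into (unit, is_kanji_sound) pairs.

    Assumption: greedy longest match, trying 3-kana readings before
    2-kana ones; anything unmatched becomes a single-kana unit.
    """
    units = []
    i = 0
    while i < len(text):
        for n in (3, 2):
            piece = text[i:i + n]
            if len(piece) == n and piece in table:
                units.append((piece, True))
                i += n
                break
        else:
            # No kanji sound starts here: emit one kana (or symbol).
            units.append((text[i], False))
            i += 1
    return units


def render(units):
    """Show kanji sounds in katakana and kana in hiragana, as in the text."""
    return "／".join(to_katakana(u) if is_ks else u for u, is_ks in units)


print(render(segment("ぶんしょうのさくせいがひじょうによういである。")))
# ブン／ショウ／の／サク／セイ／が／ひ／ジョウ／に／ヨウ／い／で／あ／る／。
```

With a full kanji sound table the greedy rule could pick different boundaries in ambiguous spots; the sketch only reproduces the example sentence.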

After this division, an attribute is assigned to the input character string according to the arrangement of kanji sounds and kana, as follows:

(kanji sound) + (kanji sound) + ... : TYPE 1
(kanji sound) + (kana) + ... : TYPE 2
(kana) + (kanji sound) + ... : TYPE 3
(kana) + (kana) + ... : TYPE 4

Since 「ブン＋ショウ＋…」 is (kanji sound) + (kanji sound) + ..., the example sentence above is TYPE 1.
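The four attributes depend only on whether the first two minimum units are kanji sounds, which can be sketched as below. The `(text, is_kanji_sound)` pair layout and the function name `classify` are hypothetical, and the handling of a string with a single unit is an assumption, since the text does not cover that case.

```python
def classify(units):
    """Return 1, 2, 3 or 4 for TYPE 1..TYPE 4.

    units is a list of (text, is_kanji_sound) pairs, a hypothetical
    representation of the divided input. Assumption: if there is only
    one unit, the missing second unit is treated as kana.
    """
    first = units[0][1]
    second = units[1][1] if len(units) > 1 else False
    if first:
        return 1 if second else 2
    return 3 if second else 4


# ブン／ショウ／の／... from the example sentence
print(classify([("ぶん", True), ("しょう", True), ("の", False)]))  # 1
# の／サク／セイ／... (the remainder after the first word is extracted)
print(classify([("の", False), ("さく", True), ("せい", True)]))    # 3
```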

Next, search strings are created according to the attribute (TYPE 1 to TYPE 4) of the input character string, as follows:

TYPE 1: ① (kanji sound) + (kanji sound), ② (kanji sound)
TYPE 2: ① (kanji sound) + (kana), ② (kanji sound)
TYPE 3: ① (kana) + (kanji sound), ② (kana)
TYPE 4: ① (kana), ② (kana) + (kana), ③ (kana) + (kana) + (kana)

The example sentence above is TYPE 1, so the search strings are set as follows.

[Search strings that are set] ① ぶんしょう ② ぶん

Next, the word dictionary is searched with these search strings, the resulting group of candidate words is evaluated, and the most suitable candidate word is selected.
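The conventional rules for building search strings can be sketched as follows. The function name and the `(text, is_kanji_sound)` pair layout are hypothetical; the order of the returned strings follows the ①, ②, ③ order given in the text.

```python
def conventional_search_strings(units):
    """Search strings of the conventional method.

    units is a list of (text, is_kanji_sound) pairs (hypothetical layout).
    """
    texts = [text for text, _ in units]
    first_is_sound = units[0][1]
    second_is_sound = len(units) > 1 and units[1][1]

    if not first_is_sound and not second_is_sound:
        # TYPE 4: (kana), (kana)+(kana), (kana)+(kana)+(kana)
        return ["".join(texts[:n]) for n in (1, 2, 3) if len(texts) >= n]

    # TYPE 1 to 3: first two units joined, then the first unit alone
    strings = []
    if len(texts) >= 2:
        strings.append(texts[0] + texts[1])
    strings.append(texts[0])
    return strings


# TYPE 1 example from the text: ブン／ショウ／の／...
print(conventional_search_strings(
    [("ぶん", True), ("しょう", True), ("の", False)]))
# ['ぶんしょう', 'ぶん']
```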

Suppose here that 「文章」 has been extracted as the optimal candidate word. The next character string to be analyzed is then 「のさくせいがひじょうによういである。」, so this string is again divided into kanji sounds and kana as minimum units.

「の／サク／セイ／が／ひ／ジョウ／に／ヨウ／い／で／あ／る。」

Since 「の＋サク＋…」 is (kana) + (kanji sound) + ..., the attribute of this input character string is TYPE 3. Creating search strings according to TYPE 3 gives the following.

[Search strings that are set] ① のさく ② の

In the same way, the remaining input character string is divided into kanji sounds and kana as minimum units, and appropriate search strings are created according to the attribute determined by the arrangement of kanji sounds and kana.

Search strings are built from two minimum units because most of the words registered in the word dictionary correspond to two or fewer minimum units, so this speeds up the dictionary search.

However, the above method has the following drawback.

Take the input character string 「がっこうでは…」 as an example.

In this case, the preprocessing with kanji sounds divides the string as 「ガッ／コウ／で／は…」. Since 「ガッ＋コウ＋…」 is (kanji sound) + (kanji sound) + ..., it belongs to TYPE 1, and the search strings ① がっこう and ② がっ are set.

The 「っ」 in ② がっ above is a geminate consonant (sokuon); it arises when the kanji sound 「がく（学）」 undergoes euphonic change. Thus 「がっ」 can exist as a kanji sound, but no independent word corresponds to it. Consequently, neither such an entry nor any word corresponding to such an entry is registered in the word dictionary.

In the conventional method, even when a reading has changed through euphonic change or the like, so that the changed kanji sound exists but no independent word corresponds to it, a search string consisting of that kanji sound alone is still created and a dictionary search is performed with it. This is wasted processing, and the problem must be solved in order to further improve the processing speed of word extraction.

Object

The object of the present invention is to solve the above problem of the prior art by setting the search strings used for dictionary search to the necessary minimum when words are extracted in a kana-kanji conversion processing device, thereby improving the processing speed of word extraction.

Constitution

To achieve the above object, the word extraction method according to the present invention is characterized in that a kana-kanji conversion processing device, having a kanji sound table and first means for dividing an input character string into kanji sounds and kana as minimum units using the kanji sound table, is provided with: second means for determining the arrangement of kanji sounds and kana in the input character string divided by the first means; third means for determining whether an independent word having the same reading as the leading kanji sound of the character string to be analyzed exists; and fourth means for setting, when the first minimum unit of the character string to be analyzed is a kanji sound and that kanji sound has the same reading as an independent word, only the combination of that kanji sound and the second minimum unit as the search string. [Text as filed; amendment (c) in the procedural amendment below corrects "has" to "does not have".]

Because the input character string is analyzed sequentially, the portion of the input that is to be analyzed at any given moment naturally changes from one step to the next.

In this specification, the input character string that is to be analyzed at the current moment is called the character string to be analyzed.

The configuration of the present invention is described in detail below by way of an embodiment.

Fig. 1 is a block diagram of a kana-kanji conversion processing device to which the word extraction method according to an embodiment of the present invention is applied.

In Fig. 1, 1 is a kana character input unit for entering the kana character string corresponding to the Japanese sentence to be created; 2 is an input character string storage unit that temporarily stores the entered kana character string; 3 is a kanji-sound division unit that divides the input character string into kanji sounds and kana as minimum units; 4 is a kanji sound table to which information has been added indicating whether an independent word with the same reading as each kanji sound exists; 5 is an input-string attribute determination unit that determines the attribute of the divided kana character string from the arrangement of kanji sounds and kana; 6 is a leading-kanji-sound attribute determination unit that determines the attribute of the kanji sound according to whether an independent word with the same reading as the leading kanji sound of the character string to be analyzed exists; 7 is a search string setting unit that sets the necessary minimum of search strings based on the information obtained from the input-string attribute determination unit 5 and the leading-kanji-sound attribute determination unit 6; 9 is a word dictionary; 8 is a dictionary search unit that searches the word dictionary 9 with the search strings; 10 is a candidate word storage unit that stores the candidate words obtained by the dictionary search unit 8; 11 is a candidate word evaluation unit that evaluates the candidate words and selects the most appropriate one; 12 is an optimal candidate word storage unit that stores the optimal candidate word selected by the candidate word evaluation unit 11; and 13 is a display unit that displays the optimal candidate word as the kana-kanji conversion result.

Based on the information obtained from the kanji sound table 4, the attribute determination unit 6 for the leading kanji sound determines which of the following two attributes the leading kanji sound of the character string to be analyzed belongs to.

Attribute A: an independent word with the same reading as the kanji sound exists.
Examples: あい (愛), あく (悪), etc.

Attribute B: no independent word with the same reading as the kanji sound exists.
Examples: がっ, ずう, ずん, せっ, ざっ, etc.

Fig. 2 shows part of the contents of the kanji sound table according to an embodiment of the present invention.

As shown in Fig. 2, in addition to the kanji sound column and the code column, this kanji sound table has an independence column that indicates whether an independent word with the same reading as the kanji sound exists. In this table, independence is expressed by a single bit, '1' or '0'. A '1' means the kanji sound has independence, that is, it is a kanji sound of attribute A; a '0' means it has no independence, that is, it is a kanji sound of attribute B.
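A table with an independence bit could be represented as in the sketch below. The dictionary layout and the names `KANJI_SOUND_TABLE` and `attribute` are hypothetical; the entries are kanji sounds that the description itself names as attribute A or B, and the code column of Fig. 2 is left out because its values are not given in the text.

```python
# Independence bit per kanji sound: 1 = attribute A, 0 = attribute B.
# Entries are examples named in the description; the code column of
# Fig. 2 is omitted because its values are not given.
KANJI_SOUND_TABLE = {
    "あい": 1, "あく": 1, "かい": 1,   # an independent word exists
    "がっ": 0, "せっ": 0, "ざっ": 0,   # euphonic forms, no independent word
}


def attribute(kanji_sound, table=KANJI_SOUND_TABLE):
    """Return 'A' if the independence bit is 1, otherwise 'B'."""
    return "A" if table[kanji_sound] == 1 else "B"


print(attribute("あい"))  # A
print(attribute("がっ"))  # B
```

Keeping the bit in the same table that drives the division means the attribute is available as soon as the kanji sound is matched, with no second lookup.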

Fig. 3 is a flowchart showing the operation of the word extraction method according to an embodiment of the present invention.

First, the kanji-sound division unit 3 divides the input kana character string sent from the input character string storage unit 2 by kanji sounds, using the kanji sound table 4 shown in Fig. 2 (301). At this time, the independence information of each kanji sound is also read from the kanji sound table 4.

The input-string attribute determination unit 5 determines whether the attribute of the divided input character string (the character string to be analyzed), as given by the arrangement of kanji sounds and kana, is TYPE 1 or TYPE 2 (302). If it is neither TYPE 1 nor TYPE 2, the search strings are set by the conventional method (303).

If the attribute is TYPE 1 or TYPE 2, the search string setting process specific to this embodiment, described below, is executed. The reason is that when the leading minimum unit is a kanji sound, as in

TYPE 1: (kanji sound) + (kanji sound) + ...
TYPE 2: (kanji sound) + (kana) + ...

that kanji sound may have lost its independence through euphonic change or the like.

When step 302 determines that the attribute is TYPE 1 or TYPE 2, it is next determined whether it is TYPE 1 (302, 304). If it is TYPE 1, the leading-kanji-sound attribute determination unit 6 determines whether the leading kanji sound (the first minimum unit) of the character string to be analyzed has attribute B (304, 305).

If the kanji sound has attribute B, the search string setting unit 7 sets only the combination of the two kanji sounds as the search string (305, 306). If it has attribute A, then as in the conventional method a search string consisting of the leading kanji sound alone is created in addition to the combination of the two kanji sounds (305, …A).

For example, for the input character string 「がっこうでは…」, division by kanji sounds gives 「ガッ／コウ／で／は…」, which is TYPE 1. However, as shown in Fig. 2, the independence column of the leading kanji sound 「ガッ」 is '0', meaning attribute B, and neither a corresponding entry nor anything registered under such an entry exists in the word dictionary 9. Therefore only the search string ① がっこう is set. By not creating search strings from kanji sounds that correspond to no word in the word dictionary 9, useless dictionary searches are eliminated and the processing speed of dictionary search is improved.

For the input character string 「かいさつにて…」, division by kanji sounds gives 「カイ／サツ／に／て…」, which is TYPE 1. Here the leading kanji sound 「カイ」 has attribute A, since corresponding independent words (会, 回, 界, 改, etc.) exist. Therefore, as in the conventional method, two search strings are created: ① かいさつ and ② かい.

If step 304 determines that the attribute is not TYPE 1, the character string to be analyzed must be TYPE 2. It is then determined whether the leading kanji sound has attribute B, and if so, only the combination of the leading kanji sound and the kana that forms the second minimum unit is set as the search string (304, 307, 308). If the attribute is A rather than B, then as in the conventional method a search string consisting of the leading kanji sound alone is created in addition to the combination of the leading kanji sound and the kana that forms the second minimum unit (307, …A).

For example, for the input character string 「ざっしにより…」, division by kanji sounds gives 「ザッ／し／に／よ／り…」, which is TYPE 2. The leading kanji sound 「ザッ」 is a euphonic variant of 「ザツ」, the reading of the kanji 「雑」, and no independent word corresponds to 「ザッ」. Therefore only ① ざっし is set as the search string.

Finally, the dictionary search unit 8 searches the word dictionary 9 with the search strings thus obtained (309).

In this way, when the leading kanji sound of a TYPE 1 or TYPE 2 character string to be analyzed has attribute B, that is, when no independent word corresponds to that kanji sound, only the combination of that kanji sound and the second minimum unit is used as the search string, which improves the processing speed of dictionary search.
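The search string setting of the embodiment can be sketched as follows. The function name, the `(text, is_kanji_sound)` pair layout, and the `INDEPENDENT` lookup are hypothetical; the attribute values are those the description gives for its three worked examples, and the step numbers in the comments refer to the flowchart steps mentioned above.

```python
# Hypothetical independence data for leading kanji sounds
# (True = attribute A, False = attribute B), taken from the examples.
INDEPENDENT = {"がっ": False, "かい": True, "ざっ": False}


def set_search_strings(units, independent=INDEPENDENT):
    """Set search strings for the first two minimum units.

    units: list of (text, is_kanji_sound) pairs; this sketch assumes
    the list holds at least two units.
    """
    texts = [text for text, _ in units]
    first_is_sound = units[0][1]
    second_is_sound = units[1][1]
    combined = texts[0] + texts[1]

    if first_is_sound:
        # TYPE 1 or TYPE 2: consult the attribute of the leading kanji sound.
        if independent[texts[0]]:
            return [combined, texts[0]]   # attribute A: same as conventional
        return [combined]                 # attribute B: combined string only
    if second_is_sound:
        return [combined, texts[0]]       # TYPE 3: conventional (step 303)
    # TYPE 4: conventional (step 303)
    return ["".join(texts[:n]) for n in (1, 2, 3) if len(texts) >= n]


print(set_search_strings([("がっ", True), ("こう", True), ("で", False)]))
# ['がっこう']
print(set_search_strings([("かい", True), ("さつ", True), ("に", False)]))
# ['かいさつ', 'かい']
print(set_search_strings([("ざっ", True), ("し", False), ("に", False)]))
# ['ざっし']
```

For attribute B the single-sound string is simply never generated, so the saving is one dictionary lookup per such leading unit.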

Effects

As described above, according to the word extraction method of the present invention, the search strings used for dictionary search during word extraction in a kana-kanji conversion processing device can be set to the necessary minimum, and the processing speed of word extraction can be improved.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a kana-kanji conversion processing device to which the word extraction method according to an embodiment of the present invention is applied. Fig. 2 is a diagram showing part of the contents of the kanji sound table according to an embodiment of the present invention. Fig. 3 is a flowchart showing the operation of the word extraction method according to an embodiment of the present invention. Fig. 4 is a diagram showing a conventional kanji sound table.

3: kanji-sound division unit, 4: kanji sound table, 5: input-string attribute determination unit, 6: leading-kanji-sound attribute determination unit, 7: search string setting unit, 8: dictionary search unit, 9: word dictionary.

Procedural Amendment (voluntary)

August 8, 1984 (Showa 59)

Indication of the case: Patent Application No. 139666 of 1984 (Showa 59)
Title of the invention: Word extraction method
Person making the amendment:
Relationship to the case: patent applicant
Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
Name: Ricoh Co., Ltd.
Representative: Hiroshi Hamada (浜田 広)
Number of inventions added by the amendment: none

Contents of the amendment

(a) The claims on page 1 of the specification are amended as follows.

"(1) A word extraction method for a kana-kanji conversion processing device having a kanji sound table and first means for dividing an input character string into kanji sounds and kana as minimum units using the kanji sound table, characterized in that an information column indicating whether each kanji sound has the same reading as an independent word is added to the kanji sound table, and in that there are provided: second means for determining the arrangement of kanji sounds and kana in the input character string divided by the first means; third means for determining whether an independent word having the same reading as the leading kanji sound of the character string to be analyzed exists; and fourth means for setting, when the leading minimum unit of the character string to be analyzed is a kanji sound and that kanji sound does not have the same reading as an independent word, only the combination of that leading kanji sound and the second minimum unit as the search string."

(b) On page 5, line 4 of the specification, 「である推測する」 is corrected to 「であると推測する」.

(c) On page 10 of the specification, line 7 from the bottom, 「持つ漢字音」 ("kanji sound that has") is corrected to 「持たない漢字音」 ("kanji sound that does not have").

Claims

[Claims]

(1) A word extraction method for a kana-kanji conversion processing device having a kanji sound table and first means for dividing an input character string into kanji sounds and kana as minimum units using the kanji sound table, characterized in that an information column indicating whether each kanji sound has the same reading as an independent word is added to the kanji sound table, and in that there are provided: second means for determining the arrangement of kanji sounds and kana in the input character string divided by the first means; third means for determining whether an independent word having the same reading as the leading kanji sound of the character string to be analyzed exists; and fourth means for setting, when the leading minimum unit of the character string to be analyzed is a kanji sound and that kanji sound does not have the same reading as an independent word, only the combination of that leading kanji sound and the second minimum unit as the search string.