JPH05189417A

JPH05189417A - Japanese word processor

Info

Publication number: JPH05189417A
Application number: JP4004115A
Authority: JP
Inventors: Shinji Kawamoto; 真司川本
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1992-01-13
Filing date: 1992-01-13
Publication date: 1993-07-30

Abstract

PURPOSE: To provide a Japanese word processor that selects a single candidate and obtains the optimum analysis result even when a plurality of homonym candidates exist during sentence analysis. CONSTITUTION: A related information dictionary 107 stores correspondences between prescribed words and their related words. When deciding the homonyms contained in a KANA (Japanese syllabary)-KANJI (Chinese character) character string, an evaluating means 106 retrieves information from the dictionary 107, counts the number of related words for each homonym candidate, and preferentially outputs the candidate with the largest count. The KANA-KANJI character string containing the decided homonym is held in an output buffer 108 and then output through an output device 109.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a Japanese sentence processor that converts a reading character string into a mixed kana-kanji character string.

[0002]

2. Description of the Related Art

Conventionally, in this type of device, when conversion to kanji is requested for a reading entered in kana or romaji, the kanji corresponding to the reading are looked up in a word dictionary and the reading is converted into a mixed kana-kanji sentence. When outputting Japanese, the device must use some evaluation means to select one candidate from among many candidate analysis results, including homonyms. One such means, applied when choosing among homonyms, is to preferentially output the homonym candidate that is related to another word in the same sentence. Another means is to attach semantic classification information to words, register related semantic classifications for each word, and use that information.

[0003]

Problems to Be Solved by the Invention

However, the conventional Japanese sentence processor described above cannot proceed with the analysis when the sentence being analyzed contains a plurality of homonym candidates that each have related words.

[0004] In particular, consider the method that, when selecting a homonym, preferentially outputs the homonym candidate related to another word in the same sentence. This method uses a dictionary (the related information dictionary) that stores each given word paired with its related words, that is, words related to it. If sentence analysis yields a plurality of homonym candidates that each have related words, the method cannot determine which candidate should take priority, and therefore cannot select the most suitable one.

[0005] An object of the present invention is to provide a Japanese sentence processor that selects only one candidate and obtains the optimum analysis result even when a plurality of homonym candidates exist during sentence analysis.

[0006]

Means for Solving the Problems

To achieve the above object, the first invention is a Japanese sentence processor for converting a reading character string into a mixed kana-kanji character string. It comprises a related information dictionary that stores prescribed words in association with related words, which indicate words related to them. It also comprises evaluation means that operates when determining the homonyms contained in the character string being converted. The evaluation means searches the related information dictionary, counts the number of related words associated with each homonym candidate, and preferentially outputs the homonym candidate with the largest count.

[0007] The second invention is a Japanese sentence processor for converting a reading character string into a mixed kana-kanji character string. It comprises a related information dictionary that stores prescribed words in association with related words, which indicate words related to them. It also comprises evaluation means that operates when determining the homonyms contained in the character string being converted. The evaluation means searches the related information dictionary. For each homonym candidate, it sums a value indicating ease of connection over the candidate's related words. Each value is calculated according to attributes that include the positional relationship, in the character string, between the homonym candidate and the related word. The evaluation means preferentially outputs the homonym candidate with the largest sum.

[0008] The third invention is a Japanese sentence processor for converting a reading character string into a mixed kana-kanji character string. It comprises a related information dictionary that stores each prescribed word together with a related word and a degree of association. The related word indicates a word related to the prescribed word, and the degree of association indicates how strongly the two are linked. It also comprises evaluation means that operates when determining the homonyms contained in the character string being converted. The evaluation means searches the related information dictionary, computes for each homonym candidate the sum of the degrees of association of its related words, and preferentially outputs the homonym candidate with the largest sum.

[0009] The fourth invention is a Japanese sentence processor for converting a reading character string into a mixed kana-kanji character string. It comprises a related information dictionary that stores each prescribed word together with a related word and a degree of association. The related word indicates a word related to the prescribed word, and the degree of association indicates how strongly the two are linked. It also comprises evaluation means that operates when determining the homonyms contained in the character string being converted. The evaluation means searches the related information dictionary and works through each homonym candidate in turn. It multiplies the degree of association of each of the candidate's related words by a value indicating ease of connection. That value is calculated according to attributes that include the positional relationship, in the character string, between the homonym candidate and the related word. The evaluation means then sums these products and preferentially outputs the homonym candidate with the largest sum.
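Read together, the four inventions differ only in the score assigned to each homonym candidate. The notation below is ours, not the patent's. R(h) is the set of related words of candidate h that appear in the character string. d(h, r) is the registered degree of association. w(h, r) is the ease-of-connection value computed from positional and other attributes. In every case the candidate with the largest score is output preferentially.

```latex
\begin{aligned}
S_1(h) &= |R(h)| && \text{(first invention: count of related words)}\\
S_2(h) &= \sum_{r \in R(h)} w(h, r) && \text{(second: sum of connection weights)}\\
S_3(h) &= \sum_{r \in R(h)} d(h, r) && \text{(third: sum of degrees of association)}\\
S_4(h) &= \sum_{r \in R(h)} d(h, r)\, w(h, r) && \text{(fourth: weighted sum of degrees)}\\
\hat{h} &= \arg\max_{h} S_k(h) && k \in \{1, 2, 3, 4\}
\end{aligned}
```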

[0010]

Operation

In the first invention, when determining the homonyms contained in the mixed kana-kanji character string, the evaluation means searches the related information dictionary and counts the related words of each homonym candidate. The candidate with the largest count is output with the highest priority. This invention can therefore obtain the optimum analysis result.

[0011] In the second invention, when determining the homonyms contained in the mixed kana-kanji character string, the evaluation means searches the related information dictionary. For each homonym candidate, it sums the values (weights) indicating ease of connection. Each weight is calculated for one related word, according to attributes that include the positional relationship between the homonym candidate and that related word in the character string. The homonym candidate with the largest sum is output preferentially. This invention can therefore obtain the optimum analysis result.

[0012] In the third invention, when determining the homonyms contained in the mixed kana-kanji character string, the evaluation means searches the related information dictionary. It computes, for each homonym candidate, the sum of the degrees of association of its related words. The candidate with the largest sum is output with the highest priority. This invention can therefore obtain the optimum analysis result.

[0013] In the fourth invention, when determining the homonyms contained in the mixed kana-kanji character string, the evaluation means searches the related information dictionary and works through each homonym candidate in turn. It multiplies the degree of association of each related word by a value (weight) indicating ease of connection. That weight is calculated according to attributes that include the positional relationship between the homonym candidate and the related word in the character string. The evaluation means then sums the products and preferentially outputs the homonym candidate with the largest sum. This invention can therefore obtain the optimum analysis result.
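The fourth invention's scoring can be sketched in Python under two stated assumptions. First, the dictionary is held as a mapping from each candidate to its related words and their degrees of association. Second, the ease-of-connection value is supplied as a function. `pick_homonym` and the sample weights are hypothetical, because the patent does not say how the connection value is computed. With a constant weight of 1 the score reduces to the third invention's plain sum of degrees.

```python
from typing import Callable, Dict, List


def pick_homonym(
    related: Dict[str, Dict[str, float]],  # candidate -> {related word: degree of association}
    sentence_words: List[str],             # words present in the analysis sentence
    ease: Callable[[str, str], float],     # hypothetical ease-of-connection weight w(h, r)
) -> str:
    """Return the candidate maximizing sum(degree * ease) over related words in the sentence."""

    def score(candidate: str) -> float:
        return sum(
            degree * ease(candidate, word)
            for word, degree in related[candidate].items()
            if word in sentence_words  # only related words that actually occur count
        )

    # max() keeps the first candidate encountered when scores tie.
    return max(related, key=score)


related = {"飼う": {"小鳥": 3}, "買う": {"小鳥": 3, "デパート": 1}}
words = ["私", "デパート", "小鳥"]
print(pick_homonym(related, words, lambda c, w: 1.0))  # 買う
```

This example uses the degrees of association from the department-store example. A hypothetical weight of 0.5 on the ＜買う＞/「小鳥」 pair would reverse the choice: ＜買う＞ scores 2.5 against 3 for ＜飼う＞. This shows how positional weighting can override the raw degrees.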

[0014]

Embodiments

Embodiments of the present invention are described below with reference to the accompanying drawings.

[0015] An embodiment of the first invention is described first.

[0016] FIG. 1 is a functional block diagram of an embodiment of the Japanese sentence processor according to the first invention.

[0017] As shown in the figure, the Japanese sentence processor comprises the following components:

- an input device 101 for inputting a reading character string;
- an input buffer 102 that temporarily stores the input reading string;
- a morphological analyzer 103 that divides a sentence into words in a grammatically correct way and outputs the division as an analysis result;
- a word dictionary 104 that stores information for each word;
- a grammar dictionary 105 that stores the grammatical information needed for sentence analysis;
- an evaluation device 106 that evaluates a plurality of analysis results and outputs an appropriate one;
- a related information dictionary 107 that stores related word information and related semantic classification information. The related word information registers, word by word, a given word and the words related to it (related words). The related semantic classification information registers the semantic classifications related to a given word.
- an output buffer 108 that temporarily stores the output result;
- an output device 109 that outputs the result;
- a control device 110 that controls each of the above devices.

[0018] FIG. 2 is a conceptual diagram showing the structure of the related information dictionary 107 according to the present invention.

[0019] As shown in FIG. 2, the dictionary 107 registers entries keyed by the reading 21 and notation 22 of a homonym, which serves as the given word. Each entry holds one of two kinds of related information for that word. The first kind is related word information: the reading 23 of a related word and the related word 24 itself, indicating a word related to the given word. The second kind is semantic classification information: a related classification 25, naming a semantic classification related to the given word. Each entry also holds a relation type 26, which indicates which of the two kinds is registered for that word. A relation type 26 of "meaning" indicates that semantic classification information is registered. A relation type of "word" indicates that related word information is registered.
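As a rough sketch, one row of the FIG. 2 dictionary could be modeled as a Python dataclass. Its `relation_type` field selects which of the related-word or semantic-classification fields is meaningful. The field names, the string values "word" and "meaning", and the sample rows are illustrative assumptions, not the patent's storage format. The sample rows borrow the chicken-and-egg example in [0028].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RelatedEntry:
    """One row of the FIG. 2 related information dictionary (illustrative field names)."""
    reading: str                           # 21: reading of the homonym
    notation: str                          # 22: written form of the homonym
    relation_type: str                     # 26: "word" or "meaning"
    related_reading: Optional[str] = None  # 23: reading of the related word
    related_word: Optional[str] = None     # 24: the related word
    related_class: Optional[str] = None    # 25: related semantic classification

    def related_info(self) -> Optional[str]:
        """Return whichever field the relation type says is registered."""
        if self.relation_type == "word":
            return self.related_word
        if self.relation_type == "meaning":
            return self.related_class
        raise ValueError(f"unknown relation type: {self.relation_type!r}")


def entries_for(dictionary: List[RelatedEntry], notation: str) -> List[RelatedEntry]:
    """All rows registered for one homonym notation, in dictionary order."""
    return [e for e in dictionary if e.notation == notation]


dictionary = [
    RelatedEntry("かえす", "孵す", "word", "たまご", "卵"),
    RelatedEntry("かえす", "孵す", "word", "にわとり", "鶏"),
    RelatedEntry("かえす", "返す", "word", "たまご", "卵"),
]
print([e.related_info() for e in entries_for(dictionary, "孵す")])  # ['卵', '鶏']
```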

[0020] Next, the basic processing of the Japanese sentence processor configured as above is described with reference to the flowchart in FIG. 3.

[0021] As shown in FIG. 3, a character string entered from the input device 101 is temporarily stored in the input buffer 102 (step 301). On an analysis request from the control device 110, the morphological analyzer 103 reads the sentence to be analyzed, a reading character string, from the input buffer 102 (step 302). It performs morphological analysis on that sentence while referring to the word dictionary 104 and the grammar dictionary 105. It then sends the candidate analysis results (analysis sentences) to the evaluation device 106 (step 303). The analysis sentence is the mixed kana-kanji string corresponding to the reading string. When it contains multiple homonym candidates, the number of candidate analysis results depends on the number of homonym candidates. The evaluation device 106 selects the optimum candidate from among them (step 304). It does so by an evaluation method, described in detail below, that uses information such as part of speech, frequency, and related word information. The selected candidate is temporarily stored in the output buffer 108 (step 305) and then output through the output device 109 (step 306).

[0022] Next, the procedure by which the evaluation device 106 evaluates candidate analysis results is described with reference to the flowchart in FIG. 4.

[0023] As shown in FIG. 4, the evaluation device 106 first decides whether the received candidate analysis sentences need to be evaluated using related information (related word information or semantic classification information) (step 401). Such evaluation is needed when there are several candidates for some word, generally a homonym. That word is contained in the analysis sentence (mixed kana-kanji string) corresponding to a single reading string. In other words, it is needed when there are several candidate analysis sentences.

[0024] The following steps apply when the analysis sentence (analysis result) must be evaluated using the related word information in the related information dictionary 107.

1. The evaluation device checks whether the analysis sentence contains a word candidate, for example a homonym, that has related information (step 402).
2. If such a candidate exists, it checks whether a related word of that candidate is present in the analysis sentence (step 403).
3. If a related word is present, it checks whether several word candidates have related words present (step 404).
4. If several do, it counts, for each word candidate, all of its related words present in the analysis sentence and compares the counts. It then preferentially uses, as the related information, the information of the candidate with the largest count, that is, the candidate with the most related words (step 405).
5. It performs the evaluation using this related information (step 406).
6. It checks whether an optimum candidate has been found (step 407). If one has, the evaluation ends. If not, an optimum candidate is found by other evaluation means (step 408).
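The step 402 to 408 flow can be sketched as follows. This is a minimal Python illustration, not the patent's implementation. It makes three assumptions. First, each candidate's related words have already been looked up in dictionary 107. Second, a tie in the count is treated as "no optimum found" at step 407. Third, the unspecified "other evaluation means" of step 408 is modeled as a caller-supplied `fallback` function. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set


def evaluate_by_related_count(
    candidates: Dict[str, Set[str]],
    sentence_words: Set[str],
    fallback: Callable[[List[str]], str],
) -> str:
    """Pick a homonym candidate by counting its related words present in the sentence.

    candidates:     homonym candidate -> related words registered in dictionary 107
    sentence_words: words present in the analysis sentence
    fallback:       hypothetical stand-in for the "other evaluation means" (step 408)
    """
    # Steps 402-403: count, per candidate, the related words that occur in the sentence.
    counts = {c: len(related & sentence_words) for c, related in candidates.items()}
    with_related = [c for c, n in counts.items() if n > 0]
    if not with_related:
        # No related information applies, so defer to the other means.
        return fallback(list(candidates))
    # Steps 404-405: prefer the candidate(s) with the largest count.
    best = max(counts[c] for c in with_related)
    winners = [c for c in with_related if counts[c] == best]
    # Step 407: a unique winner counts as the optimum; a tie does not (assumption).
    if len(winners) == 1:
        return winners[0]
    return fallback(winners)  # Step 408
```

Passing the tied candidates to `fallback` keeps the narrowing already achieved, so the secondary means only chooses among the best-scoring homonyms.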

[0025] Next, the processing of the embodiment of the first invention is described using a specific example.

[0026] Consider converting the character string 『にわとりはたまごをかえす』 (niwatori wa tamago o kaesu, "the chicken kaesu the egg") into a mixed kana-kanji sentence.

[0027] On receiving this character string from the input buffer 102, the morphological analyzer 103 performs morphological analysis using the word dictionary 104 and the grammar dictionary 105. It divides the string into the phrases 『にわとりは／たまごを／かえす』, where "／" marks a phrase boundary, and then looks up each word. Here several homophones of 「かえす」 (kaesu) are retrieved, such as ＜返す＞ (return), ＜帰す＞ (send home), and ＜孵す＞ (hatch). The analysis result is input to the evaluation device 106, which analyzes it using the part of speech, frequency, and so on of each word. That information alone, however, may not determine the first-candidate sentence. In this example two sentences remain as candidates: 『鶏は／卵を／返す』 ("the chicken returns the egg") and 『鶏は／卵を／孵す』 ("the chicken hatches the egg"). Both are grammatically correct, so part-of-speech or grammatical information alone cannot decide which should be the first candidate.

[0028] In such a case the information in the related information dictionary 107 is used. ＜返す＞ (return), in one of the two sentences, has the related word 「卵」 (egg; semantic classification = article). ＜孵す＞ (hatch), in the other sentence, has the related words 「卵」 (egg) and 「鶏」 (chicken; semantic classification = person/animal). When both candidates have related words like this, the mere presence or absence of related information cannot narrow down the homonym candidates. The present invention therefore compares the number of related words in each candidate's analysis sentence and makes the candidate with more related words the first candidate. In this example, the only related word of ＜返す＞ present in the analysis sentence is 「卵」. By contrast, two related words of ＜孵す＞, 「鶏」 and 「卵」, are present. The homonym candidate ＜孵す＞ is therefore used preferentially, and the mixed kana-kanji sentence 『鶏は卵を孵す』 ("the chicken hatches the egg") is selected as the first candidate.
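The count comparison in this example can be reproduced directly in Python. The related words come from the paragraph above. The word segmentation of the two candidate sentences is an illustrative assumption.

```python
# Related words registered in dictionary 107 for two homophones of かえす (from [0028]).
RELATED = {
    "返す": {"卵"},
    "孵す": {"卵", "鶏"},
}

# The two candidate sentences that survive the part-of-speech and grammar checks,
# split into words (the segmentation here is illustrative).
CANDIDATE_SENTENCES = {
    "返す": ["鶏", "は", "卵", "を", "返す"],
    "孵す": ["鶏", "は", "卵", "を", "孵す"],
}


def related_count(homonym: str) -> int:
    """Number of the homonym's related words that occur in its candidate sentence."""
    return len(RELATED[homonym] & set(CANDIDATE_SENTENCES[homonym]))


scores = {h: related_count(h) for h in RELATED}
first_candidate = max(scores, key=scores.get)
print(scores, first_candidate)  # {'返す': 1, '孵す': 2} 孵す
```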

[0029] As described above, this embodiment handles the case where sentence analysis finds several homonyms with related words in the analysis sentence and the candidates cannot be narrowed further. It compares the numbers of related words of the homonym candidates and makes the candidate with the most related words the first candidate. In this way the optimum analysis result (analysis sentence) can be selected.

[0030] Next, an embodiment of the second invention is described.

[0031] In this embodiment as well, the configuration of the Japanese sentence processor, its basic processing, and the procedure for evaluating analysis results are the same as in the embodiment of the first invention. The difference is that the processing of step 406 in FIG. 4 is changed to the following: "For each word candidate (each homonym), compute the sum of the weights (values indicating ease of connection). Each weight is calculated according to attributes that include the positional relationship between the word candidate and the related word in the analysis sentence. Preferentially use the word candidate with the largest sum as the related information." Besides the positional relationship, these attributes include how easily the word candidate (homonym) and the related word connect, among others. They also include the degree to which the particle that should link the word candidate and the related word is attached.
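The patent names the attributes behind the weight but gives no formula. Those attributes are positional relationship, ease of connection, and whether the expected particle is attached. The Python sketch below therefore assumes a formula purely for illustration. The weight falls off with the number of phrases separating the candidate and the related word. It is multiplied by a hypothetical factor of 1.5 when the expected particle is present. `Occurrence`, `connection_weight`, and `pick_by_weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Occurrence:
    """A related word of a candidate found in the analysis sentence (hypothetical fields)."""
    word: str
    distance: int      # phrases separating it from the candidate (0 = adjacent)
    particle_ok: bool  # True if the particle expected for this relation is attached


def connection_weight(occ: Occurrence) -> float:
    """Illustrative ease-of-connection weight; the patent specifies no formula."""
    base = 1.0 / (1 + occ.distance)                   # nearer words connect more easily
    return base * (1.5 if occ.particle_ok else 1.0)   # assumed particle bonus


def pick_by_weight(occurrences: Dict[str, List[Occurrence]]) -> str:
    """Return the candidate whose summed weights are largest (the first one wins a tie)."""
    return max(occurrences, key=lambda c: sum(connection_weight(o) for o in occurrences[c]))
```

Under these assumed weights, two distant related words without particles score 0.25 + 0.25 = 0.5. They therefore lose to a single adjacent, correctly linked word, which scores 1.5. A plain count, as in the first invention, would have preferred the two words.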

[0032] Therefore, this embodiment handles the case where sentence analysis finds several homonym candidates with related words in the analysis sentence and the candidates cannot be narrowed further. It computes, for each homonym candidate, the sum of the weights of its related words and makes the candidate with the largest sum the first candidate. In this way the optimum analysis result (analysis sentence) can be selected.

[0033] Next, an embodiment of the third invention is described.

[0034] In this embodiment, the configuration of the Japanese sentence processor is basically the same as that of the embodiment of the first invention shown in FIG. 1. Only the dictionary information in the related information dictionary 107 differs. FIG. 5 is a conceptual diagram of the structure of the related information dictionary 107 in this embodiment.

[0035] The dictionary 107 in this embodiment extends the data structure of the dictionary 107 of the first embodiment (FIG. 2) with a related particle and a degree of association. Its data structure therefore consists of the following fields:

- the reading 51 and notation 52 of the homonym;
- the reading 53 of the related word and the related word 54, as related word information;
- the related classification 55, as semantic classification information;
- the relation type 56;
- the related particle 57;
- the degree of association 58.

The related particle 57 registers the particle inserted between a word and its related word, or between a word and its related (semantic) classification. This is the particle attached after the related word. The degree of association 58 registers how strongly a word is linked to its related word or to its related (semantic) classification.
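A FIG. 5 row can be modeled by extending the FIG. 2 fields with the related particle 57 and the degree of association 58. The Python sketch below is illustrative only. Its field names, the integer type for the degree, and the `matches` helper are assumptions. `matches` checks that a related word appears with its registered particle. The sample row reuses the degree of 1 that the worked example in [0042] gives for 「デパート」, with an assumed particle で.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelatedEntryV2:
    """One row of the FIG. 5 dictionary (illustrative field names)."""
    reading: str                    # 51: reading of the homonym
    notation: str                   # 52: written form of the homonym
    relation_type: str              # 56: "word" or "meaning"
    related_reading: Optional[str]  # 53: reading of the related word
    related_word: Optional[str]     # 54: the related word
    related_class: Optional[str]    # 55: related semantic classification
    related_particle: str           # 57: particle attached after the related word
    degree: int                     # 58: strength of the association

    def matches(self, word: str, particle: str) -> bool:
        """True if `word` followed by `particle` realizes this word-type relation."""
        return (self.relation_type == "word"
                and word == self.related_word
                and particle == self.related_particle)


buy_store = RelatedEntryV2("かう", "買う", "word", "でぱーと", "デパート", None, "で", 1)
print(buy_store.matches("デパート", "で"), buy_store.degree)  # True 1
```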

[0036] The basic processing of the Japanese sentence processor is the same as in the embodiment of the first invention. The evaluation of analysis results by the evaluation device 106 is also basically the same as in that embodiment, except for part of the processing.

[0037] FIG. 6 shows the procedure by which the evaluation device 106 evaluates analysis results in this embodiment. This procedure (steps 601 to 608) is basically the same as that shown in FIG. 4, with one difference. The processing of step 405 in FIG. 4 is changed to the following (step 605): "For each word candidate, compute the sum of the degrees of association of all its related words present in the analysis sentence. Compare these sums, and preferentially use the word candidate with the largest sum as the related information." The other steps are the same.

[0038] Next, the processing of the embodiment of the third invention is described using a specific example.

[0039] Consider converting the sentence 『わたしはでぱーとでことりをかう』 (watashi wa depāto de kotori o kau, "I kau a small bird at the department store") into a mixed kana-kanji sentence.

[0040] On receiving the character string from the input buffer 102, the morphological analyzer 103 performs morphological analysis using the word dictionary 104 and the grammar dictionary 105. It divides the string into the phrases 『わたしは／でぱーとで／ことりを／かう』, where "／" marks a phrase boundary, and then looks up each word. In this example several homophones of 「かう」 (kau) are retrieved, such as ＜買う＞ (buy), ＜飼う＞ (keep as a pet), ＜交う＞ (cross), and ＜支う＞ (support). The analysis result is input to the evaluation device 106, which analyzes it using the part of speech, frequency, and so on of each word. That information alone, however, may not determine the first candidate. In this example two sentences remain as candidates: 『私は／デパートで／小鳥を／飼う』 ("I keep a small bird at the department store") and 『私は／デパートで／小鳥を／買う』 ("I buy a small bird at the department store"). Both are grammatically correct, so part-of-speech or grammatical information alone cannot decide which should be the first candidate.

[0041] In such a case the information in the related information dictionary 107 is used. ＜飼う＞, in one of the two sentences, has the related word 「小鳥」 (small bird; semantic classification = animal/living thing). ＜買う＞, in the other sentence, has the related words 「デパート」 (department store; semantic classification = store) and 「小鳥」 (small bird; semantic classification = animal/living thing). When both candidates share a related word like this, the evaluation uses the degree of association of each related word. Each word candidate (homonym) has an evaluation value. That value is the sum of the degrees of association of all the candidate's related words that are present in the analysis sentence. The evaluation values of the word candidates are compared, and the homonym with the largest value is preferentially used as the related information.

[0042] In this example, the only related word of <keep> present in the sentence is "small bird", so the evaluation value of <keep> is 3, the same as its degree of association with "small bird". The related words of <buy>, on the other hand, are the two words "small bird" and "department store", so its evaluation value is the sum of their degrees of association: 3 (with "small bird") + 1 (with "department store") = 4. Because the evaluation value 4 of <buy> is larger than the evaluation value 3 of <keep>, the related information of <buy> is used preferentially, and the kana-kanji mixed sentence 『私は／デパートで／小鳥を／買う』 ("I / at the department store / a small bird / buy") is selected as the first candidate.
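The calculation in paragraphs [0041] and [0042] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the dictionary layout `RELATED_INFO`, the function names `evaluate` and `choose`, and representing the analyzed sentence as a list of content words are all hypothetical. Only the degrees of association (3 and 1) come from the example.

```python
# Minimal sketch of the degree-of-association evaluation (third invention).
# RELATED_INFO is a hypothetical layout for the related information
# dictionary 107: candidate word -> {related word: degree of association}.
# The degrees are the ones given in the worked example.
RELATED_INFO = {
    "飼う": {"小鳥": 3},                 # <keep> (raise an animal)
    "買う": {"小鳥": 3, "デパート": 1},  # <buy>
}


def evaluate(candidate, sentence_words):
    """Sum the degrees of association of the candidate's related words
    that actually appear in the analyzed sentence."""
    related = RELATED_INFO.get(candidate, {})
    return sum(degree for word, degree in related.items() if word in sentence_words)


def choose(candidates, sentence_words):
    """Return the homonym candidate with the largest evaluation value.
    On a tie, max() keeps the first candidate in list order."""
    return max(candidates, key=lambda c: evaluate(c, sentence_words))


# Content words of 私は／デパートで／小鳥を (the bunsetsu other than the homonym).
words = ["私", "デパート", "小鳥"]
print({c: evaluate(c, words) for c in ("飼う", "買う")})  # {'飼う': 3, '買う': 4}
print(choose(["飼う", "買う"], words))                    # 買う
```

The patent text shown here does not say how ties are broken; the sketch simply inherits the behavior of `max`, which returns the first maximal element.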

[0043] As described above, according to this embodiment, the related information includes the degree of association, which expresses how strongly each related word is associated with the word. During sentence analysis, when the analyzed sentence contains several homonym candidates that have related words in it, each candidate is evaluated using the degree of association, and the candidate with the highest evaluation value is given priority, so that the optimal analysis result (analyzed sentence) can be selected.

[0044] Next, an embodiment of the fourth invention will be described.

[0045] In this embodiment, the configuration of the Japanese sentence processing device, its basic processing operation, and the procedure for evaluating analysis results are the same as in the embodiment of the third invention. The only difference is that the processing of step 605 in FIG. 6 is changed to the following: "For each word candidate, multiply the degree of association of every related word by a weight calculated from attributes that include the positional relationship between the word candidate and the related word in the analyzed sentence, compute the sum of these products, and preferentially use, as the related information, the word candidate whose sum is largest." These attributes include the same elements as those described in the embodiment of the second invention.
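As a formula (symbols introduced here for illustration only): let $R(c)$ be the related words stored for candidate $c$, $S$ the words of the analyzed sentence, $d(c, r)$ the degree of association, and $w(c, r)$ the weight computed from the attributes of $c$ and $r$, such as their positional relationship in the sentence. The modified step 605 computes

$$
E_w(c) = \sum_{r \in R(c) \cap S} w(c, r)\, d(c, r), \qquad c^{*} = \arg\max_{c} E_w(c)
$$

Setting every $w(c, r) = 1$ reduces this to the plain sum of degrees of association used in the third invention.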

[0046] In this embodiment, the evaluation value of an analysis result is not simply the sum of the degrees of association. Instead, each degree of association is weighted according to several factors, such as the distance between the candidate word and the related word, and the sum of these weighted degrees of association is computed. The word candidate is determined from this sum, and the optimal analysis result is selected accordingly.
Therefore, according to this embodiment, when the analyzed sentence contains several homonym candidates that have related words in it, each homonym candidate is evaluated using the weighted degree of association, and the candidate with the larger evaluation value is given priority, so that the optimal analysis result (analyzed sentence) can be selected.
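A minimal Python sketch of the weighted evaluation, under loudly stated assumptions: the passage names the distance between the candidate and a related word as one weighting factor but gives no formula, so the weight `1 / (1 + distance)`, with distance counted in bunsetsu, is a hypothetical choice. The names `weight`, `weighted_score`, and `positions` are likewise illustrative; the degrees of association are those of the worked example. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical related information dictionary 107 (degrees from the worked example).
RELATED_INFO = {
    "飼う": {"小鳥": 3},
    "買う": {"小鳥": 3, "デパート": 1},
}


def weight(cand_pos, rel_pos):
    """Hypothetical weight from one attribute, the positional relationship:
    1 / (1 + distance in bunsetsu), so closer related words count more."""
    return Fraction(1, 1 + abs(cand_pos - rel_pos))


def weighted_score(candidate, cand_pos, positions):
    """Sum of (degree of association x weight) over the candidate's related
    words that appear in the sentence. positions maps word -> bunsetsu index."""
    total = Fraction(0)
    for word, degree in RELATED_INFO.get(candidate, {}).items():
        if word in positions:
            total += degree * weight(cand_pos, positions[word])
    return total


# 私は(0) / デパートで(1) / 小鳥を(2) / かう(3): the homonym is bunsetsu 3.
positions = {"私": 0, "デパート": 1, "小鳥": 2}
scores = {c: weighted_score(c, 3, positions) for c in ("飼う", "買う")}
print(scores)                       # {'飼う': Fraction(3, 2), '買う': Fraction(11, 6)}
print(max(scores, key=scores.get))  # 買う
```

With these assumed weights the more distant "department store" contributes only 1/3 instead of 1, but <buy> still wins; a different weighting scheme could change the margin.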

[0047]

[Effects of the Invention] As described above, according to the present invention, even when the analyzed sentence contains several homonym candidates during sentence analysis, one homonym candidate is selected on the basis of one of the following values: the count of related words, the sum of the degrees of association of the related words, the sum of the weights of the related words, or the sum of the weighted degrees of association of the related words. An optimal analysis result (analyzed sentence) can therefore be obtained, and the desired kana-kanji character string can be obtained quickly. In other words, the efficiency of document creation and editing can be improved.
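The first criterion listed above, the count of related words (the first invention), needs no degree of association at all. The following Python sketch assumes a hypothetical table `RELATED_WORDS` that stores only which words are related, built from the related words named in the example of paragraph [0041], and, as in the degree-of-association variant, counts only related words that appear in the sentence.

```python
# Hypothetical related-word table for the first invention:
# candidate word -> set of related words (no degree of association).
RELATED_WORDS = {
    "飼う": {"小鳥"},
    "買う": {"小鳥", "デパート"},
}


def count_related(candidate, sentence_words):
    """Number of the candidate's related words that appear in the sentence."""
    return len(RELATED_WORDS.get(candidate, set()) & set(sentence_words))


def choose_by_count(candidates, sentence_words):
    """Return the candidate with the largest related-word count
    (ties keep the first candidate in list order)."""
    return max(candidates, key=lambda c: count_related(c, sentence_words))


words = ["私", "デパート", "小鳥"]
print(count_related("飼う", words), count_related("買う", words))  # 1 2
print(choose_by_count(["飼う", "買う"], words))                    # 買う
```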

[Brief description of drawings]

FIG. 1 is a functional block diagram showing an embodiment of the Japanese sentence processing device according to the first invention.

FIG. 2 is a conceptual diagram showing the structure of the related information dictionary in the embodiment according to the first invention.

FIG. 3 is a flowchart showing the basic processing operation of the Japanese sentence processing device in the embodiment according to the first invention.

FIG. 4 is a flowchart showing the evaluation processing procedure performed by the evaluation device in the embodiment according to the first invention.

FIG. 5 is a conceptual diagram showing the structure of the related information dictionary in the embodiment according to the third invention.

FIG. 6 is a flowchart showing the evaluation processing procedure performed by the evaluation device in the embodiment according to the third invention.

[Explanation of symbols]

101 ... Input device, 102 ... Input buffer, 103 ... Morphological analysis device, 104 ... Word dictionary, 105 ... Grammar dictionary,
106 ... Evaluation device, 107 ... Related information dictionary, 108 ... Output buffer, 109 ... Output device, 110 ... Control device.

Claims

[Claims]

1. A Japanese sentence processing device for converting a reading character string into a kana-kanji mixed character string, comprising: a related information dictionary that stores a predetermined word in association with related words, each indicating a word related to the predetermined word; and evaluation means that, when determining homonyms included in a character string subject to kana-kanji conversion, searches the information in the related information dictionary, counts, for each homonym candidate, the number of related words associated with it, and preferentially outputs the homonym candidate having the largest count.

2. A Japanese sentence processing device for converting a reading character string into a kana-kanji mixed character string, comprising: a related information dictionary that stores a predetermined word in association with related words, each indicating a word related to the predetermined word; and evaluation means that, when determining homonyms included in a character string subject to kana-kanji conversion, searches the information in the related information dictionary, calculates, for each homonym candidate, the sum over its related words of values indicating ease of connection, each value being calculated according to attributes that include the positional relationship between the homonym candidate and the related word in the character string, and preferentially outputs the homonym candidate having the largest calculated value.

3. A Japanese sentence processing device for converting a reading character string into a kana-kanji mixed character string, comprising: a related information dictionary that stores, in association with each other, a predetermined word, a related word indicating a word related to the predetermined word, and a degree of association indicating the strength of the connection between the two; and evaluation means that, when determining homonyms included in a character string subject to kana-kanji conversion, searches the information in the related information dictionary, calculates, for each homonym candidate, the sum of the degrees of association of its related words, and preferentially outputs the homonym candidate having the largest calculated value.

4. A Japanese sentence processing device for converting a reading character string into a kana-kanji mixed character string, comprising: a related information dictionary that stores, in association with each other, a predetermined word, a related word indicating a word related to the predetermined word, and a degree of association indicating the strength of the connection between the two; and evaluation means that, when determining homonyms included in a character string subject to kana-kanji conversion, searches the information in the related information dictionary, multiplies, for each homonym candidate, the degree of association of each of its related words by a value indicating ease of connection calculated according to attributes that include the positional relationship between the homonym candidate and the related word in the character string, sums the products, and preferentially outputs the homonym candidate having the largest sum.
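Claim 2 scores candidates by summing ease-of-connection values alone, without stored degrees of association. The Python sketch below is illustrative only: the claims do not define how the value is computed, so `ease_of_connection` uses a hypothetical positional rule, `1 / (1 + distance in bunsetsu)`, and the related-word table reuses the related words named in the example of paragraph [0041].

```python
from fractions import Fraction

# Hypothetical related-word table (no degrees of association, as in claim 2).
RELATED_WORDS = {
    "飼う": {"小鳥"},
    "買う": {"小鳥", "デパート"},
}


def ease_of_connection(cand_pos, rel_pos):
    """Hypothetical value from the positional attribute only:
    1 / (1 + distance in bunsetsu)."""
    return Fraction(1, 1 + abs(cand_pos - rel_pos))


def connection_sum(candidate, cand_pos, positions):
    """Sum of ease-of-connection values over related words present in the sentence."""
    return sum(
        (ease_of_connection(cand_pos, positions[w])
         for w in RELATED_WORDS.get(candidate, set()) if w in positions),
        Fraction(0),
    )


positions = {"私": 0, "デパート": 1, "小鳥": 2}  # the homonym かう is bunsetsu 3
scores = {c: connection_sum(c, 3, positions) for c in ("飼う", "買う")}
print(scores)                       # {'飼う': Fraction(1, 2), '買う': Fraction(5, 6)}
print(max(scores, key=scores.get))  # 買う
```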