JPH05324607A

JPH05324607A - Method and device for converting Japanese syllabary/Chinese character (kana-kanji conversion)

Info

Publication number: JPH05324607A
Application number: JP4123005A
Authority: JP
Inventors: Hiroki Abou (阿望 博喜); Minoru Nitta (新田 実)
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1992-05-15
Filing date: 1992-05-15
Publication date: 1993-12-07

Abstract

PURPOSE: To enable semantic analysis and improve the clause segmentation recognition rate by analyzing a sentence with semantic information, in addition to the formal information conventionally used, when narrowing down clause candidates during clause segmentation processing.
CONSTITUTION: A formal information reference unit 62b-1 of a clause segmentation processing unit 62b narrows down the clause candidates generated by a basic analysis unit 62a, using formal information such as word information 52, grammar information 51, and empirical rules. A semantic information reference unit 62b-2 then further narrows down the remaining clause candidates using semantic information from co-occurrence example information 53, and the clause boundary positions for the conversion candidate are decided. A notation selection unit 62c then decides the notation of each decided clause, and the result is used as the conversion candidate character string.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a kana-kanji conversion method and device for converting an input reading character string into a character string of mixed kana and kanji.

[0002]

[Prior Art] A conventional kana-kanji conversion method obtains a conversion result by the following procedure. First, the user inputs the reading character string to be converted and requests kana-kanji conversion with the conversion instruction key. When the conversion instruction is accepted, a conversion candidate is displayed to the user. If the intended conversion candidate has not been obtained, the user obtains it by, for example, displaying the next candidate. If the clause (bunsetsu) boundaries differ from those intended, the user re-segments the clauses and then displays the conversion candidates again. When the intended conversion candidate is displayed, the user confirms it as the conversion result with the confirmation instruction key. The confirmed conversion candidate is output as the conversion result.

[0003] Next, the data processing procedure from the conversion instruction to the display of the conversion candidates is described in more detail.

[0004] When a conversion instruction is input, a basic analysis of the reading character string is performed first. In this basic analysis, the dictionary is searched for words (stems) in the reading to be converted, using the word information. Then, from the retrieved words and the grammar information, all combinations of clause candidates that can stand as clauses are generated.

[0005] Next, the optimal combination of clauses is selected from the clauses obtained by the basic analysis. Here the clauses are narrowed down by referring to the part-of-speech information obtained from the word information, the priority information on dependencies between clauses obtained from the grammar information, empirical rules for narrowing down the combination candidates, and the like, and finally a single combination of clauses is extracted.

[0006] For the combination of clauses determined as described above, the notation of each clause is then determined. That is, the kanji notation is decided clause by clause. Here, matching against the co-occurrence example information is performed, and a notation for which the matching succeeds is adopted preferentially. This amounts to homophone processing by co-occurrence examples. For a clause for which matching against the co-occurrence example information did not succeed, the optimal notation is selected by referring to the word information and the learning information. That is, either the most recently used notation is adopted preferentially according to the learning information, or the notation is determined by the order registered in advance in the dictionary. The conversion candidate is determined in this way.
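The fallback order described in [0006] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_notation`, the dictionary layout, and the sample entries are all assumptions introduced here.

```python
# Hypothetical data: reading -> notations in pre-registered dictionary order.
DICTIONARY = {"あつい": ["暑い", "熱い", "厚い"]}


def select_notation(reading, dictionary, last_used, co_occurrence_choice=None):
    """Pick a notation for one clause, in the priority order of [0006]."""
    # 1. A notation fixed by a co-occurrence match takes priority.
    if co_occurrence_choice is not None:
        return co_occurrence_choice
    # 2. Otherwise prefer the most recently used notation (learning information).
    if reading in last_used:
        return last_used[reading]
    # 3. Otherwise fall back to the first notation in dictionary order.
    return dictionary[reading][0]
```

With an empty learning history, `select_notation("あつい", DICTIONARY, {})` falls through to the dictionary order; passing `{"あつい": "熱い"}` as `last_used` selects the learned notation instead.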

[0007] Here, co-occurrence example information is binary relation information between words that have a strong semantic connection. Besides simple links between two words, it includes co-occurrence examples expressed per attribute, such as "person" (人) or "flower" (花), and additional restriction information. The additional restriction information includes, for example, particle information such as 「を」 (o) and 「が」 (ga), and information on the direction in which the relation holds. Examples of co-occurrence examples are:
「暑い−夏」 (hot - summer), 「厚い−本」 (thick - book), 「熱い−お湯」 (hot - hot water) ... (links between words)
「人（彼，彼女，先生，恋人，等）に−会う」 (meet a person: he, she, teacher, lover, etc.) ... (co-occurrence example per attribute)
「花（チューリップ，菊，等）が−咲く」 (a flower blooms: tulip, chrysanthemum, etc.) ... (co-occurrence example per attribute)
「話を−聞く」 (listen to a story), 「薬が−効く」 (medicine takes effect), 「機転が−利く」 (be quick-witted) ... (particle restriction information)
「家庭−教育」 (home - education), 「教育−過程」 (education - process) ... (direction restriction information)
Co-occurrence example information of this kind is semantic information, and using it resolves the ambiguity of homophones.
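One way to picture the co-occurrence example information of [0007] is as a small table of records. The sketch below is only an assumed representation: the class `CoOccurrence`, the `ATTRIBUTES` table, the `directed` flag, and the undirected 「暑い−夏」 entry are hypothetical choices made for illustration, since the source does not specify a storage format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical attribute classes: a class name stands for all of its member words.
ATTRIBUTES = {
    "人": {"彼", "彼女", "先生", "恋人"},
    "花": {"チューリップ", "菊"},
}


@dataclass(frozen=True)
class CoOccurrence:
    left: str                       # a word, or an attribute class name
    right: str
    particle: Optional[str] = None  # particle restriction; None means no particle
    directed: bool = True           # direction restriction: left must come first


EXAMPLES = [
    CoOccurrence("厚い", "本"),
    CoOccurrence("暑い", "夏", directed=False),  # assumed to hold in either order
    CoOccurrence("人", "会う", particle="に"),
    CoOccurrence("花", "咲く", particle="が"),
    CoOccurrence("家庭", "教育"),
    CoOccurrence("教育", "過程"),
]


def _covers(slot, word):
    """True if the slot is the word itself or an attribute class containing it."""
    return slot == word or word in ATTRIBUTES.get(slot, ())


def matches(first, particle, second, examples=EXAMPLES):
    """Check whether 'first (+particle) second' agrees with any stored example."""
    for ex in examples:
        if ex.particle != particle:
            continue
        if _covers(ex.left, first) and _covers(ex.right, second):
            return True
        if not ex.directed and _covers(ex.left, second) and _covers(ex.right, first):
            return True
    return False
```

The direction restriction is what separates the homophones read "katei": `matches("家庭", None, "教育")` holds, while `matches("教育", None, "家庭")` does not.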

[0008] Next, homophone processing by co-occurrence example information is described. For example, when the reading 「あつい／ほん」 (atsui / hon) is converted, 「厚い」 (thick) is selected on the basis of the link 「厚い−本」 (thick - book) in the co-occurrence example information, and 「熱い」 (hot to the touch), 「暑い」 (hot weather), and the like are not selected. The conversion result 「厚い／本」 is therefore obtained immediately. In this way, co-occurrence example information is used to improve conversion efficiency.
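The homophone selection of [0008] can be sketched as a lookup over candidate pairs. This is an illustrative sketch under assumptions: the `HOMOPHONES` table, the `CO_OCCURRENCE` set, and the function `choose_pair` are invented here, and real co-occurrence matching would also honor the particle and direction restrictions.

```python
# Hypothetical mini-dictionary: reading -> candidate notations in dictionary order.
HOMOPHONES = {
    "あつい": ["暑い", "熱い", "厚い"],
    "ほん": ["本"],
    "なつ": ["夏"],
    "おゆ": ["お湯"],
}

# Word-to-word co-occurrence examples taken from the description.
CO_OCCURRENCE = {("暑い", "夏"), ("厚い", "本"), ("熱い", "お湯")}


def choose_pair(reading_a, reading_b):
    """Choose notations for two adjacent readings using co-occurrence examples."""
    candidates_a = HOMOPHONES[reading_a]
    candidates_b = HOMOPHONES[reading_b]
    for a in candidates_a:
        for b in candidates_b:
            if (a, b) in CO_OCCURRENCE:
                return a, b
    # No example matched: fall back to dictionary order (an assumption here).
    return candidates_a[0], candidates_b[0]
```

The same reading 「あつい」 then resolves differently depending on its neighbor, for example `choose_pair("あつい", "ほん")` versus `choose_pair("あつい", "おゆ")`.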

[0009]

[Problems to Be Solved by the Invention] In the conventional example described above, however, when the reading character string 「こどもにほんをよみきかせる」 (kodomo ni hon o yomikikaseru; intended as 「子供に本を読み聞かせる」, "read a book aloud to a child") is input and the conversion instruction is executed, the clause boundary positions may be decided as 「子供／日本を／読み／聞かせる」 ("child / Japan / read / let hear"). This happens because, when the clause boundary positions are decided, processing is based only on formal information such as rules from the grammar information (for example, giving priority to a link between a noun without a particle and a following noun) and empirical rules (for example, giving priority to candidates with fewer clauses). In other words, the clause boundary positions are decided from formal information alone, without analyzing the semantic connections between words, so an incorrect conversion candidate of this kind is displayed.

[0010] In such a case, the user can obtain the desired conversion result by manually re-segmenting the clauses and giving the conversion instruction again. Whenever the clause boundary positions are wrong, however, this re-segmentation work is unavoidable, and the operation is very troublesome.

[0011] Furthermore, in the conventional kana-kanji conversion method the homophone selection processing is executed after the clause segmentation processing. If the clause segmentation differs from what the user intended, as described above, all of the homophone selection processing is wasted.

[0012] The present invention has been made in view of the above problems. Its object is to provide a kana-kanji conversion method and device that enable semantic analysis and improve the clause segmentation recognition rate by analyzing the sentence with semantic information, in addition to the formal information conventionally used, when narrowing down clause candidates in the clause segmentation processing.

[0013]

[Means for Solving the Problems] To achieve the above object, a kana-kanji conversion device according to the present invention has the following configuration. That is, a kana-kanji conversion device for converting an input reading character string into a character string of mixed kana and kanji comprises: semantic information storage means for storing semantic information on semantic connections between words; generating means for generating clause candidates from the reading character string using word information and grammar information; first narrowing means for narrowing down the clause candidates using formal information; second narrowing means for narrowing down the clause candidates using the semantic information stored in the semantic information storage means; and notation determining means for determining the notation of a conversion candidate based on the clause candidates selected as a result of the narrowing by the first narrowing means and the second narrowing means.

[0014] To achieve the above object, a kana-kanji conversion method according to the present invention has the following steps. That is, a kana-kanji conversion method for converting an input reading character string into a character string of mixed kana and kanji comprises: semantic information storage means for storing semantic information on semantic connections between words; a generating step of generating clause candidates from the reading character string using word information and grammar information; a first narrowing step of narrowing down the clause candidates using formal information; a second narrowing step of narrowing down the clause candidates using the semantic information stored in the semantic information storage means; and a notation determining step of determining the notation of a conversion candidate based on the clause candidates selected as a result of the narrowing by the first narrowing step and the second narrowing step.

[0015] In the present invention, formal information is information on form that is used when narrowing down clause candidates. It refers to information with no semantic element, such as grammar information (whether a grammatical link between words or between clauses is valid, how strong it is, and so on) and empirical rules (for example, giving priority to candidates with fewer clauses).

[0016]

[Operation] In the above configuration, the generating means first generates a plurality of clause candidates. Next, the first narrowing means narrows down the clause candidates generated by the generating means, using priority rules based on formal information such as word information, grammar information, and empirical rules. The second narrowing means then further narrows down the clause candidates left by the first narrowing means on the basis of the semantic information, and decides the clause boundary positions for the conversion candidate. The notation determining means then determines the notation of each decided clause, and the result is used as the conversion candidate character string.

[0017]

[Embodiment] A preferred embodiment of the present invention is described below with reference to the accompanying drawings.

[0018] FIG. 1 is a block diagram showing the schematic configuration of the document processing apparatus of this embodiment. In the figure, 1 is a CPU, which executes the various processes of the document processing apparatus. 2 is a keyboard, used to input reading character strings and various instructions such as the conversion instruction and the confirmation instruction. 3 is a CRT, which displays the document being input, kana-kanji conversion candidates, and the like. 4 is a ROM, which stores the various control programs executed by the CPU 1. 5 is a dictionary, which stores the various information referred to during kana-kanji conversion processing. 6 is a RAM, which holds the work area used by the CPU 1 to execute the various processes, a kana-kanji conversion unit 60, and the like. The kana-kanji conversion unit 60 has an input/output interface unit 61 for kana-kanji conversion processing and a conversion engine unit 62 that executes the kana-kanji conversion. The above units are connected to a bus 7 and exchange data with one another.

[0019] FIG. 2 shows the details of the kana-kanji conversion unit 60 (the input/output interface unit 61 and the conversion engine unit 62) and the dictionary 5.

[0020] First, in the input/output interface unit 61, 61a is a determination unit, which determines whether an input from the keyboard 2 is a reading character string, a conversion instruction, a confirmation instruction, or the like. 61b is a storage unit, which stores the reading character string input from the keyboard 2, the conversion candidate character string input from the conversion engine unit 62, and the like. 61c is a display unit, which displays the reading character string, the conversion candidate character string, and the like stored in the storage unit 61b on the CRT 3.

[0021] In the conversion engine unit 62, 62a is a basic analysis unit, which obtains all possible combinations of clause boundary positions from the grammar information and the word information in the dictionary 5. 62b is a clause segmentation processing unit. It comprises a formal information reference unit 62b-1, which narrows down the clause segmentation candidates using formal information such as grammar information and word information, and a semantic information reference unit 62b-2, which narrows down the clause segmentations using co-occurrence example information and also executes homophone selection processing. 62c is a notation selection unit, which determines the notation of any clause whose notation was not determined by the clause segmentation processing unit 62b.

[0022] The dictionary 5 stores grammar information 51, word information 52, co-occurrence example information 53, and learning information 54. The grammar information 51 includes information on whether grammatical links between words are valid and on how strong those links are. For example, it stores formal information such as "noun (without particle) + suffix is a strong link" and "noun (without particle) + noun is a strong link". The word information 52 includes the part-of-speech information, conjugation information, and the like of each word. The co-occurrence example information 53 is binary relation information between words with a strong semantic connection; its details were described in connection with the conventional example. This co-occurrence example information is referred to as semantic information in the clause segmentation processing. A considerable number of co-occurrence examples are provided: this document processing apparatus contains several hundred thousand of them. The learning information 54 is information for preferentially adopting the most recently used notation.

[0023] The operation of the document processing apparatus configured as above is now described. FIG. 3 is a flowchart showing the procedure of the kana-kanji conversion operation of the document processing apparatus of this embodiment. This processing starts when any key is input from the keyboard 2.

[0024] In step S11, the determination unit 61a determines whether the key input from the keyboard 2 is a character input. If it is a character input, processing proceeds to step S12. In step S12, the input character is stored in the storage unit 61b as part of the reading character string, and processing proceeds to step S13. If it is determined in step S11 that the key input is not a character input, processing proceeds directly to step S13.

[0025] In step S13, the determination unit 61a determines whether the key input is a conversion instruction. If it is not, processing returns to step S11 and the above processing is repeated. If the key input is a conversion instruction, processing proceeds to step S14. In step S14, the reading character string stored in the storage unit 61b is passed to the conversion engine unit 62, which executes the conversion processing on it. In step S15, the conversion candidate character string obtained by the conversion engine unit 62 is stored in the storage unit 61b and displayed on the CRT 3 as a conversion candidate by the display unit 61c.

[0026] Then, in step S16, the determination unit 61a determines whether a key input instructing confirmation has been received from the keyboard 2. If there is no confirmation instruction, processing returns to step S14. If there is a confirmation instruction, processing proceeds to step S17 and the candidate is output as the conversion result. Although omitted from the figure, when the conversion candidate is not the desired one, the next candidate is obtained or the clauses are re-segmented in step S14.
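The key-handling flow of steps S11 to S17 can be sketched as a small loop. This is a simplified sketch under stated assumptions: `CONVERT` and `CONFIRM` are hypothetical tokens standing in for the instruction keys, `convert` is a caller-supplied stand-in for the conversion engine unit 62 that returns a list of candidates, and every non-confirmation key after conversion is treated as "show the next candidate" (re-segmentation is left out).

```python
CONVERT = "<convert>"  # hypothetical token for the conversion instruction key
CONFIRM = "<confirm>"  # hypothetical token for the confirmation instruction key


def run_input_loop(keys, convert):
    """Consume key events and return the confirmed conversion result, or None."""
    reading = []            # plays the role of the storage unit 61b
    events = iter(keys)
    for key in events:
        if key not in (CONVERT, CONFIRM):
            reading.append(key)          # S11 -> S12: store the character
            continue
        if key != CONVERT:
            continue                     # S13: not a conversion instruction -> S11
        candidates = convert("".join(reading))   # S14: run the conversion engine
        index = 0                                 # S15: first candidate is shown
        for next_key in events:
            if next_key == CONFIRM:
                return candidates[index]          # S16 -> S17: output the result
            index = (index + 1) % len(candidates)  # back to S14: next candidate
        return None
    return None
```

For instance, feeding the keys of 「あつい」 followed by `CONVERT`, `CONVERT`, `CONFIRM` confirms the second candidate that `convert` returned.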

[0027] Next, the conversion processing in step S14 is described in more detail with reference to the flowchart of FIG. 4. FIG. 4 is a flowchart showing the procedure of the kana-kanji conversion processing of the document processing apparatus of this embodiment.

[0028] In step S21, the basic analysis unit 62a generates clause candidates by referring to the grammar information 51 and the word information 52 in the dictionary 5. First, words (stems) are extracted from the reading character string by dictionary search. The part of speech of each extracted word is obtained from the word information 52, and inflectional endings and attached words are processed using this part-of-speech information and the grammar information 51. In this way, all clauses that can stand grammatically are extracted, and a plurality of candidate combinations of clauses are generated.
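Candidate generation in step S21 can be sketched as an exhaustive enumeration over dictionary hits. This is a much-reduced sketch and not the actual basic analysis: the `STEMS` and `PARTICLES` tables and the function `clause_candidates` are hypothetical, and inflectional endings are ignored so that a clause is just a stem plus at most one particle.

```python
# Hypothetical lexicon of stem readings and a tiny set of particles.
STEMS = {"こども", "ほん", "にほん", "よみ", "きかせる", "よみきかせる"}
PARTICLES = {"に", "を"}


def clause_candidates(reading):
    """Return every split of the reading into clauses of (stem, particle or None)."""
    if not reading:
        return [[]]
    results = []
    for end in range(1, len(reading) + 1):
        stem = reading[:end]
        if stem not in STEMS:
            continue
        rest = reading[end:]
        # Clause consisting of the stem alone.
        for tail in clause_candidates(rest):
            results.append([(stem, None)] + tail)
        # Clause consisting of the stem followed by one particle.
        if rest and rest[0] in PARTICLES:
            for tail in clause_candidates(rest[1:]):
                results.append([(stem, rest[0])] + tail)
    return results
```

For the reading 「こどもにほんをよみきかせる」, this enumeration yields both the 「こども／にほんを／...」 and the 「こどもに／ほんを／...」 readings, which is why a later narrowing stage is needed.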

[0029] In step S22, the formal information reference unit 62b-1 of the clause segmentation processing unit 62b adds points to each of the candidate clause combinations generated in step S21 using formal information, and calculates a priority. Here, formal information means information obtained from the word information 52 (the part of speech, conjugation, and so on of each word), information obtained from the grammar information 51 (whether grammatical links between words are valid, how strong they are, and so on), and empirical rules (for example, giving priority to candidates with fewer clauses). In step S23, only the clause candidates having a predetermined priority are extracted on the basis of the calculated priorities, thereby narrowing down the candidate clause combinations.

[0030] In step S24, the semantic information reference unit 62b-2 adds points to each of the candidate clause combinations narrowed down in step S23 using semantic information, and calculates a priority. Here, the co-occurrence example information is consulted to check for matches, and if a match is found, points are added to the priority so that the corresponding clause segmentation candidate becomes more likely to be chosen. In step S25, the candidate clause combination whose priority is now highest is adopted, which decides the clause boundary positions in the conversion candidate. In step S26, for any adopted clause that matches the co-occurrence example information, the notation of that clause is determined, which constitutes the homophone selection processing. In practice, the notation of such a clause is determined by the co-occurrence example information at the point in step S25 when the matching clause is detected.

[0031] Next, in step S27, the notation selection unit 62c determines the notation of each clause not determined in step S26 (that is, each clause whose notation was not determined by the homophone selection processing) by referring to the word information, the grammar information, and the learning information. In step S28, the result is output to the input/output interface unit 61 as the conversion candidate.

[0032] Next, the determination of a conversion candidate by the above processing is described using a concrete reading character string.

[0033] Consider, as shown in FIG. 5, the case where the sentence (reading character string) 101, 「こどもにほんをよみきかせる」 (kodomo ni hon o yomikikaseru), is input. It is assumed that the document processing apparatus holds an entry such as 「本を−読む」 (read a book) as co-occurrence example information. As a result of the narrowing by formal information in steps S21 to S23 of FIG. 4, candidate sentences 102 and 103 are extracted as candidate clause combinations. Conventional kana-kanji conversion processing decides the clause candidate from formal information alone, so if the formal information contains a rule giving priority to the link "noun (without particle) + noun", the link between 「子供」 (child) and 「日本」 (Japan) is treated as strong and clause candidate 102 is selected. As a result, conversion candidate sentence 104 is obtained.

[0034] In this document processing apparatus, however, the narrowing by semantic information in steps S24 and S25 finds a match with the co-occurrence example 「本を−読む」 (read a book), so clause candidate 103 is selected. Its notation is then determined in step S26, and conversion candidate sentence 105 is obtained.
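The two-stage narrowing of steps S22 to S25 can be sketched on the two candidates from this example. Everything numeric here is an assumption: the source gives no point values, so the bonuses, the `MARGIN` cut-off standing in for the "predetermined priority", the `Clause` record with its `lemma` field (used to relate 「読み聞かせる」 to the co-occurrence entry 「本を−読む」), and the restriction to adjacent clauses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    word: str
    pos: str                        # part of speech, e.g. "noun" or "verb"
    particle: Optional[str] = None
    lemma: Optional[str] = None     # assumed head word used for matching

    @property
    def key(self):
        return self.lemma or self.word


CO_OCCURRENCE = {("本", "を", "読む")}   # entry assumed in the description

NOUN_NOUN_BONUS = 2   # hypothetical points for "noun (no particle) + noun"
CLAUSE_PENALTY = 1    # hypothetical cost per clause (fewer clauses preferred)
MARGIN = 2            # hypothetical cut-off below the best formal score
SEMANTIC_BONUS = 5    # hypothetical points per co-occurrence match


def formal_score(clauses):
    """S22: score a candidate from formal information only."""
    score = -CLAUSE_PENALTY * len(clauses)
    for a, b in zip(clauses, clauses[1:]):
        if a.pos == "noun" and a.particle is None and b.pos == "noun":
            score += NOUN_NOUN_BONUS
    return score


def semantic_score(clauses):
    """S24: add points for adjacent clauses that match a co-occurrence example."""
    return SEMANTIC_BONUS * sum(
        (a.key, a.particle, b.key) in CO_OCCURRENCE
        for a, b in zip(clauses, clauses[1:])
    )


def choose_segmentation(candidates):
    """S22-S25: formal narrowing, then semantic scoring, then pick the best."""
    scored = [(formal_score(c), c) for c in candidates]
    best = max(score for score, _ in scored)
    shortlist = [(s, c) for s, c in scored if s >= best - MARGIN]   # S23
    return max(shortlist, key=lambda sc: sc[0] + semantic_score(sc[1]))[1]  # S25


verb = Clause("読み聞かせる", "verb", lemma="読む")
candidate_102 = [Clause("子供", "noun"), Clause("日本", "noun", "を"), verb]
candidate_103 = [Clause("子供", "noun", "に"), Clause("本", "noun", "を"), verb]
```

Under these assumed weights, formal scoring alone prefers `candidate_102`, while `choose_segmentation([candidate_102, candidate_103])` returns `candidate_103`, mirroring the outcome described for candidate sentences 104 and 105.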

[0035] As described above, the document processing apparatus of this embodiment selects the candidate clause combination during clause segmentation processing using co-occurrence example information in addition to formal information. Compared with deciding the clause boundary positions from formal information alone, this greatly improves the clause segmentation recognition rate and therefore greatly improves conversion efficiency. In addition, for a clause whose link has been confirmed by the co-occurrence example information, homophone selection has in effect already been performed on its notation, so clause segmentation processing and homophone selection processing are integrated. This integration eliminates the wasted conversion work of the conventional method, in which homophone selection is performed after an incorrect clause segmentation.

[0036] The apparatus may also be configured so that the user can register additional binary relations on top of the information registered in advance as co-occurrence example information, or so that co-occurrence example information is added by a learning function.

[0037]

[Effects of the Invention] As described above, according to the kana-kanji conversion method and device of the present invention, the sentence is analyzed in the clause segmentation processing using semantic information in addition to the formal information conventionally used. This makes semantic analysis possible and improves the clause segmentation recognition rate.

[0038]

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of the document processing apparatus of this embodiment.

FIG. 2 is a block diagram showing the functional configuration of the kana-kanji conversion unit of this embodiment.

FIG. 3 is a flowchart showing the procedure of the kana-kanji conversion operation of this embodiment.

FIG. 4 is a flowchart showing the procedure of the kana-kanji conversion processing of this embodiment.

FIG. 5 is a diagram showing an input example of a reading character string, the clause segmentation candidates, and the conversion results.

[Explanation of symbols]

1 CPU
2 Keyboard
3 CRT
4 ROM
5 Dictionary
51 Grammar information
52 Word information
53 Co-occurrence example information
54 Learning information
6 RAM
61 Input/output interface unit
62 Conversion engine unit

Claims

[Claims]

1. A kana-kanji conversion device for converting an input reading character string into a character string of mixed kana and kanji, comprising: semantic information storage means for storing semantic information on semantic connections between words; generating means for generating clause candidates from the reading character string using word information and grammar information; first narrowing means for narrowing down the clause candidates using formal information; second narrowing means for narrowing down the clause candidates using the semantic information stored in the semantic information storage means; and notation determining means for determining the notation of a conversion candidate based on the clause candidates selected as a result of the narrowing by the first narrowing means and the second narrowing means.

2. The kana-kanji conversion device according to claim 1, wherein the semantic information stored in the semantic information storage means is co-occurrence example information representing binary relation information between words having a strong semantic connection.

3. The kana-kanji conversion device according to claim 2, wherein the notation determining means preferentially adopts the notation in the co-occurrence example information adopted by the second narrowing means.

4. A kana-kanji conversion method for converting an input reading character string into a character string of mixed kana and kanji, comprising: semantic information storage means for storing semantic information on semantic connections between words; a generating step of generating clause candidates from the reading character string using word information and grammar information; a first narrowing step of narrowing down the clause candidates using formal information; a second narrowing step of narrowing down the clause candidates using the semantic information stored in the semantic information storage means; and a notation determining step of determining the notation of a conversion candidate based on the clause candidates selected as a result of the narrowing by the first narrowing step and the second narrowing step.

5. The kana-kanji conversion method according to claim 4, wherein the semantic information stored in the semantic information storage means is co-occurrence example information representing binary relation information between words having a strong semantic connection.

6. The kana-kanji conversion method according to claim 5, wherein the notation determining step preferentially adopts the notation in the co-occurrence example information adopted in the second narrowing step.