JPH04195253A

JPH04195253A - Continuous clause japanese-syllabary chinese-character conversion method and device

Info

Publication number: JPH04195253A
Application number: JP2323431A
Authority: JP
Inventors: Hiroshi Kaneko; 宏金子
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1990-11-28
Filing date: 1990-11-28
Publication date: 1992-07-15
Anticipated expiration: 2010-05-24
Also published as: JPH0748213B2

Abstract

PURPOSE: To learn clause segmentation efficiently by learning with an arbitrary-independent-word marker in place of the actual independent word when the independent word in the mis-segmented clause that contains the correct clause boundary is the same as the independent word in the correctly written clause immediately before that boundary. CONSTITUTION: Referring to a dictionary 2, a kana-kanji conversion unit 1 finds one or more conversion result candidates. A clause re-segmentation unit 6 saves the conversion result OUT from before the operator's re-segmentation in a pre-correction conversion result buffer 7, prepares segmentation designation data and stores it in a buffer 5, and calls the conversion unit 1 to correct the conversion result. It then saves the conversion result OUT after re-segmentation in a post-correction conversion result buffer 8 and calls a learning unit 9. The learning unit 9 compares buffer 7 with buffer 8, determines the kind of re-segmentation, and updates either a type-1 re-segmentation learning dictionary 3 or a type-2 re-segmentation learning dictionary 4: when the independent words in the clauses before and after correction are the same, the entry is registered in dictionary 3; when they differ, it is registered in dictionary 4.

Description

[Detailed Description of the Invention]

A. Field of Industrial Application

This invention relates to a continuous-clause kana-kanji conversion method and device, and in particular enables clause (bunsetsu) segmentation to be learned efficiently.

B. Prior Art

A continuous-clause kana-kanji conversion method converts an input kana character string spanning several clauses into a sentence of mixed kanji and kana. The input kana string is divided into clause candidates by a predetermined algorithm (clause segmentation), and the mixed kana-kanji strings corresponding to the clause candidates are generated and concatenated to form the conversion result. When the segmentation is wrong, systems are increasingly made to learn the correct segmentation so that the same segmentation error does not recur.

For example, when the kana string 「さいたまけんではけんざいせいをたてなおす」 is entered for the intended text 「埼玉県では県財政を立て直す」 ("Saitama Prefecture will rebuild its prefectural finances"), it may be misconverted to 「埼玉県で派遣財政を立て直す」. This happens because the string was segmented as 「埼玉県で／派遣」 where it should have been segmented as 「埼玉県では／県」. When learning is used to deal with such a segmentation error, segmentation information is stored at the time of the correction to 「埼玉県では／県」 so that segmentation is performed correctly from then on.

Conventional segmentation information for learning, however, is specific to the input string in which the segmentation error occurred; for example, the reading 「さいたまけんではけん」 is tied to the correct text 「埼玉県では県」. Similar errors therefore have to be learned individually. Even after 「埼玉県では県」 has been learned, entering 「ちばけんではけん...」 may still produce the segmentation error 「千葉県で派遣」, and that case must be learned separately.

Prior art related to this invention is described in JP-A-61-147365 and JP-A-61-147368.

The prior art of JP-A-61-147365 learns segmentation by storing the adjunct word that contained the correct boundary before re-segmentation, together with the kana string running from the start of that adjunct word to the correct boundary. For example, when the input 「いまのにほんは」 for 「今の日本は」 (correct text) is misconverted to 「今のに本は」, learning stores 「のに」 and 「の」, and from then on a clause boundary is placed after 「の」 whenever 「のに」 is entered. This prior art, however, can handle only cases in which the boundary falls inside an adjunct-word string. For example, when 「今日は医者に」 (correct text) is misconverted to 「今日歯医者に」, the misconversion 「今日歯−医者」 (the − mark shows the correct clause boundary) has no adjunct word containing the correct boundary, so this prior art cannot be applied.

This kind of segmentation learning also has many adverse effects (side effects).

If a boundary is placed after 「の」 whenever the adjunct word 「のに」 appears, there is no longer any point in registering 「のに」 in the adjunct-word dictionary.

The prior art of JP-A-61-147366 learns segmentation by storing the adjunct word of the pre-re-segmentation clause that contains the correct boundary, together with information on the text that includes the independent word following that clause. For example, when the input 「ひょうじしないようにちゅういし」 for 「表示し内容に注意し」 (correct text) is misconverted to 「表示しないように注意し」, learning stores 「しないようにちゅうい」 and 「し」, and from then on a clause boundary is placed after 「し」 whenever 「しないようにちゅうい」 is entered. This prior art has the same limitation as JP-A-61-147365. Its adverse effects are somewhat reduced, however, as can be seen from the fact that the example takes the text up to 「ちゅうい」 into account.

Although the adverse effects are reduced, the range of segmentation errors for which the learning is effective becomes narrower. In the example above, the learned entry does not apply when 「留意」, 「考慮」, 「反映」 or the like appears in place of 「注意」.

C. Problems to Be Solved by the Invention

This invention was made in view of the above circumstances. Its object is to provide a continuous-clause conversion technique in which a single round of learning segmentation information teaches not only the segmentation specific to the input string in question but also the whole class of segmentations that the input string represents, so that segmentation is learned efficiently, while the adverse effects of such class-wide learning are kept to a minimum.

D. Means for Solving the Problems

To achieve the above object, when segmentation is learned in this invention and the independent word in the mis-segmented clause that contains the correct clause boundary is the same as the independent word in the correctly written clause immediately before the correct boundary, learning is performed with an "arbitrary independent word" marker in place of that independent word. This makes comprehensive learning of segmentation possible. When, on the other hand, the two independent words differ, the independent word itself is used, comprehensive learning is withheld, and the adverse effects of comprehensive learning are avoided.

The terms independent word and adjunct word are defined here. As stated above, continuous-clause kana-kanji conversion begins with clause segmentation. Segmentation is performed by concatenating a word registered in a first dictionary (or dictionary part) with a word registered in a second dictionary (or dictionary part), where the latter may be an empty string as in the example below, to form a clause, and matching the clause against the input string. Words registered in the first dictionary are called independent words, and words registered in the second dictionary are called adjunct words. For example, the earlier 「埼玉県では県」 is composed of 「埼玉 (independent word) − empty string (adjunct word) − 県 (independent word) − では (adjunct word) − 県 (independent word)」.
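The clause model defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the two toy dictionaries and the function name `segmentations` are assumptions introduced here, not part of the patent. It enumerates every way of covering an input kana string with clauses of the form independent word + adjunct word (possibly empty).

```python
# Toy dictionaries (hypothetical contents chosen for the example in the text).
INDEPENDENT = ("さいたま", "けん", "はけん")   # first dictionary: independent words
ADJUNCT = ("", "で", "では")                   # second dictionary: adjunct words, incl. empty string


def segmentations(text, independents=INDEPENDENT, adjuncts=ADJUNCT):
    """Return every clause chain that exactly covers `text`.

    Each clause is a pair (independent word, adjunct word); the adjunct
    word may be the empty string.
    """
    if not text:
        return [[]]  # one way to cover the empty string: no clauses
    results = []
    for ind in independents:
        if not text.startswith(ind):
            continue
        rest = text[len(ind):]
        for adj in adjuncts:
            if rest.startswith(adj):
                for tail in segmentations(rest[len(adj):], independents, adjuncts):
                    results.append([(ind, adj)] + tail)
    return results


if __name__ == "__main__":
    for chain in segmentations("さいたまけんではけん"):
        print(" / ".join(ind + adj for ind, adj in chain))
```

With these toy dictionaries the input 「さいたまけんではけん」 yields both the correct chain (さいたま / けんでは / けん) and the erroneous one (さいたま / けんで / はけん), which is exactly the ambiguity that segmentation learning has to resolve.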

E. Embodiment

An embodiment of this invention is described below.

FIG. 1 shows the embodiment as a whole. Although FIG. 1 uses functional blocks to show a hardware implementation, the embodiment can of course be implemented in software. In FIG. 1, a kana-kanji conversion unit 1 receives kana input IN and performs continuous-clause kana-kanji conversion while referring to a dictionary 2. The conversion unit 1 first refers to the dictionary 2 (which includes an independent-word dictionary part and an adjunct-word dictionary part) to obtain one or more conversion result candidates. This processing is conventional, so a detailed description is omitted (an example of the operation is given later). The conversion unit 1 then determines the conversion result by applying a predetermined rule (for example, deciding according to the sum of the scores or penalties of the individual words and of the connections between words) and by additionally referring to a type-1 re-segmentation learning dictionary 3, a type-2 re-segmentation learning dictionary 4 and a segmentation designation data buffer 5.
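The "predetermined rule" mentioned above is not specified in detail. As a rough sketch, assuming that each candidate carries per-word penalties and per-connection penalties and that the candidate with the smallest total wins, the selection could look like the following. The class `Candidate`, the function names and all penalty values are hypothetical and chosen only so that the misconversion from the prior-art example is reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str                    # mixed kana-kanji string
    word_penalties: tuple        # one penalty per word (hypothetical values)
    connection_penalties: tuple  # one penalty per word-to-word connection


def total_penalty(candidate):
    """Sum of the word penalties and the connection penalties."""
    return sum(candidate.word_penalties) + sum(candidate.connection_penalties)


def choose(candidates):
    """Pick the candidate with the smallest total penalty (assumed rule)."""
    return min(candidates, key=total_penalty)


CANDIDATES = [
    Candidate("埼玉県で派遣", (1, 1, 1, 1), (0, 0, 0)),     # total 4
    Candidate("埼玉県では県", (1, 1, 2, 1), (0, 0, 1)),     # total 6
]

if __name__ == "__main__":
    print(choose(CANDIDATES).text)
```

Under these made-up penalties the erroneous segmentation is preferred, which illustrates why the learning dictionaries 3 and 4 and the designation buffer 5 are consulted in addition to the scoring rule.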

The clause re-segmentation unit 6 handles the case in which the operator indicates a segmentation error. It saves the conversion result OUT from before the operator's re-segmentation in a pre-correction conversion result buffer 7, creates segmentation designation data (stored in buffer 5), and calls the kana-kanji conversion unit 1 to correct the conversion result. It then saves the conversion result OUT after re-segmentation in a post-correction conversion result buffer 8 and calls a learning unit 9. Re-segmentation can be carried out, for example, by returning the misconverted result to the original kana, marking only the desired clause as the conversion region (by reverse video or the like), and converting again.

The learning unit 9 compares the pre-correction conversion result buffer 7 with the post-correction conversion result buffer 8, determines the kind of re-segmentation, and updates the type-1 re-segmentation learning dictionary 3 or the type-2 re-segmentation learning dictionary 4.

That is, as the examples below show, an entry is registered in the type-1 re-segmentation dictionary 3 when the independent word in the pre-correction clause that contains the correct boundary is the same as the independent word in the post-correction clause immediately before the correct boundary, and in the type-2 re-segmentation dictionary 4 when they differ.

Next, kana-kanji conversion, re-segmentation and reconversion are illustrated using input kana strings such as 「じっけんのないよう」.

Conversion before any re-segmentation learning is described first.

In FIG. 2, the independent-word search (step S1) stores independent words such as those shown in FIG. 5 in an independent-word buffer JB. The adjunct-word search (step S2) stores adjunct words such as those shown in FIG. 6 in an adjunct-word buffer FB. Clause construction (step S3) refers to these two buffers and stores grammatically valid clauses such as those shown in FIG. 7 in a clause buffer BB. Candidate construction (step S4) refers to the clause buffer BB and stores candidates such as those shown in FIG. 8 in a candidate buffer CB. The independent-word search, adjunct-word search, clause construction and candidate construction are hereafter referred to collectively as "candidate generation".

Conversion result determination (step S5) refers to the pre-correction conversion result buffer 7, the post-correction conversion result buffer 8 and the segmentation designation data buffer 5, selects one of the candidates stored in the candidate buffer CB, and updates the conversion result OUT. In this case the two re-segmentation learning dictionaries 3 and 4 contain no data and there is no segmentation designation data, so the conversion result is 「実験のないよう」 (scores and the like are assumed to be assigned so that candidates are selected in top-to-bottom order in FIG. 8).

Next, the process by which the operator re-segments the clauses to correct 「実験のないよう」 to 「実験の内容」 is described. In FIG. 3, the pre-correction conversion result is first saved in step S11: the kanji 「実験のないよう」 and the kana 「じっけんのないよう」 are stored in the pre-correction conversion result buffer 7. In the operator's segmentation designation (step S12), the operator specifies, for example, that the fifth character 「な」 is the start of a clause, and "5" is accordingly stored in the segmentation designation data buffer 5. The kana-kanji conversion instruction (step S13) then calls the kana-kanji conversion unit 1.

In the conversion called by the kana-kanji conversion instruction (step S13), candidate generation proceeds exactly as described above.

Conversion result determination (step S14) refers to the segmentation designation data "5" and selects the candidate in which the fifth character 「な」 starts a clause, namely 「実験の内容」 (FIG. 8).
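Selecting a candidate that respects an operator-designated boundary can be sketched as follows. The data layout (each candidate as a list of (independent word, adjunct word) kana pairs) and the function names are assumptions made for illustration. The sketch counts character offsets from zero, so that 「な」 in 「じっけんのないよう」 sits at offset 5; this indexing convention is a choice of the sketch and is not asserted to be the patent's.

```python
# Candidates for the kana input 「じっけんのないよう」 (hypothetical stand-in for FIG. 8).
CANDIDATES = [
    {"text": "実験のないよう", "clauses": [("じっけん", "のないよう")]},
    {"text": "実験の内容", "clauses": [("じっけん", "の"), ("ないよう", "")]},
]


def clause_starts(candidate):
    """Zero-based character offsets at which each clause of the candidate begins."""
    starts, pos = [], 0
    for independent, adjunct in candidate["clauses"]:
        starts.append(pos)
        pos += len(independent) + len(adjunct)
    return starts


def choose_with_boundary(candidates, forced_start):
    """Return the first candidate having a clause that starts at `forced_start`.

    Falls back to the first (best-ranked) candidate when none qualifies.
    """
    for candidate in candidates:
        if forced_start in clause_starts(candidate):
            return candidate
    return candidates[0]


if __name__ == "__main__":
    print(choose_with_boundary(CANDIDATES, 5)["text"])
```

Keeping the candidates in ranked order and scanning for the first one that satisfies the constraint mirrors the description that candidate generation is unchanged and only the final choice is affected by the designation data.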

Saving the post-correction conversion result OUT (step S14) stores the kanji 「実験の内容」 and the kana 「じっけんの−ないよう」 (「−」 marks a clause boundary) in the post-correction conversion result buffer 8. The learning instruction (step S15) then calls the learning unit 9.

Clause boundaries are learned as shown in FIG. 4. First, the independent words immediately before the boundary are compared (step S21): the first independent word in the pre-correction conversion result buffer 7 is compared with the first independent word in the post-correction conversion result buffer 8. Here both are 「実験」, so the comparison yields a match and control passes to type-1 learning (step S22). Type-1 learning deletes the matched independent word 「じっけん」 from the kana 「じっけんの−ないよう」 stored in the post-correction conversion result buffer 8 and adds 「の−ないよう」 to the type-1 re-segmentation learning dictionary 3 (FIG. 12).

After this learning, when the operator again converts the kana 「じっけんのないよう」, the kana-kanji conversion unit 1 is called and the processing of FIG. 2 is performed. Candidate generation is the same as above. Conversion result determination (step S5) selects, as the conversion result, the candidate 「実験の内容」, which matches the entry 「の−ないよう」 in the type-1 re-segmentation learning dictionary 3.

Now consider, after the above re-segmentation and learning, the conversion of the input kana string 「ちょうさのないよう」, which is prone to a misconversion similar to that of 「じっけんのないよう」. As in the conversion of 「じっけんのないよう」, candidates such as those shown in FIG. 9 are stored in the candidate buffer CB. Conversion result determination (step S5) selects, as the conversion result, the candidate 「調査の内容」, which matches the entry 「の−ないよう」 in the type-1 re-segmentation learning dictionary 3. The misconversion 「調査のないよう」 is thus avoided.
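How a single type-1 entry carries over to a different independent word can be sketched as below. The entry format (adjunct kana before the boundary, kana after the boundary), the candidate layout and the function names are assumptions made for this illustration; the patent does not prescribe a data format.

```python
# Type-1 entry learned from 「じっけんの−ないよう」 with the independent word removed:
# (adjunct kana before the boundary, kana that must follow the boundary).
TYPE1 = [("の", "ないよう")]

# Candidates for 「ちょうさのないよう」 in ranked order (hypothetical stand-in for FIG. 9).
CANDIDATES = [
    ("調査のないよう", [("ちょうさ", "のないよう")]),
    ("調査の内容", [("ちょうさ", "の"), ("ないよう", "")]),
]


def matches_type1(clauses, entry):
    """True if some clause ends in the entry's adjunct and the next clause
    begins with the entry's right-hand kana, whatever the independent word is."""
    left_adjunct, right_kana = entry
    for (_, adjunct), (next_ind, next_adj) in zip(clauses, clauses[1:]):
        if adjunct == left_adjunct and (next_ind + next_adj).startswith(right_kana):
            return True
    return False


def convert(candidates, type1_entries):
    """Prefer the first candidate consistent with a learned entry; otherwise
    return the top-ranked candidate."""
    for text, clauses in candidates:
        if any(matches_type1(clauses, entry) for entry in type1_entries):
            return text
    return candidates[0][0]


if __name__ == "__main__":
    print(convert(CANDIDATES, []))      # before learning
    print(convert(CANDIDATES, TYPE1))   # after learning from じっけんのないよう
```

Because the independent word is ignored during matching, the entry learned from 「じっけん」 applies unchanged to 「ちょうさ」, which is the point of type-1 (comprehensive) learning.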

Next, conversion, re-segmentation and reconversion are illustrated for the input kana string 「ごかんせい」.

Candidate generation is the same as in the conversion of 「じっけんのないよう」, and candidates such as those shown in FIG. 10 are stored in the candidate buffer CB. In conversion before re-segmentation learning, conversion result determination (step S5, FIG. 2) selects 「御完成」 as the conversion result.

Next, when the operator re-segments the clauses to correct 「御完成」 to 「互換性」, saving the pre-correction conversion result (S11, FIG. 3) stores the kanji 「御完成」 and the kana 「ご−かんせい」 in the pre-correction conversion result buffer 7. The operator's segmentation designation (S12) stores "4" in the segmentation designation data buffer 5 (ごかん−せい). The kana-kanji conversion instruction (S13) calls the kana-kanji conversion unit 1. Conversion result determination (S5, FIG. 2) refers to the segmentation designation data "4" and selects, from the candidates shown in FIG. 10, the candidate 「互換性」, in which the fourth character 「せ」 starts a clause. Saving the post-correction conversion result (S14, FIG. 3) stores the kanji 「互換性」 and the kana 「ごかん−せい」 in the post-correction conversion result buffer 8. The learning instruction (S15) calls the learning unit 9.

In the comparison of the independent words immediately before the boundary (step S21, FIG. 4), the first independent word in the pre-correction conversion result buffer 7 is compared with the first independent word in the post-correction conversion result buffer 8. Here they are 「御」 and 「互換」, so the comparison yields a mismatch and control passes to type-2 learning (S23). Type-2 learning (S23) adds the kana 「ごかん−せい」 stored in the post-correction conversion result buffer 8 to the type-2 re-segmentation learning dictionary 4 (FIG. 13).
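The two learning cases walked through above can be sketched together as one function. This is a hedged illustration: the clause representation (kana pairs of independent word and adjunct word), the zero-based boundary offset and the names `spans` and `learn` are assumptions of the sketch. It follows the general rule stated in the means section (compare the independent word of the pre-correction clause containing the corrected boundary with that of the post-correction clause ending at the boundary), whereas the embodiment's text simply compares the first independent words of the two buffers; both give the same outcome for the two examples. Comparing kana readings rather than kanji surface forms is a further simplification.

```python
def spans(clauses):
    """(start, end) character offsets of each clause in a chain."""
    out, pos = [], 0
    for independent, adjunct in clauses:
        end = pos + len(independent) + len(adjunct)
        out.append((pos, end))
        pos = end
    return out


def learn(pre, post, boundary):
    """Classify a correction and build the dictionary entry.

    pre / post: clause chains before and after the operator's correction.
    boundary:   zero-based offset of the corrected clause boundary.
    Returns ("type1", (adjunct_before, independent_after)) when the
    independent words match, else ("type2", (clause_before, independent_after)).
    """
    # Independent word of the pre-correction clause that contains the boundary.
    pre_independent = next(
        clause[0] for clause, (start, end) in zip(pre, spans(pre)) if start < boundary < end
    )
    # Post-correction clause that ends exactly at the boundary.
    index = next(i for i, (_, end) in enumerate(spans(post)) if end == boundary)
    left_independent, left_adjunct = post[index]
    right_independent = post[index + 1][0]
    if pre_independent == left_independent:
        return ("type1", (left_adjunct, right_independent))
    return ("type2", (left_independent + left_adjunct, right_independent))


if __name__ == "__main__":
    # 実験のないよう -> 実験の / 内容
    print(learn([("じっけん", "のないよう")], [("じっけん", "の"), ("ないよう", "")], 5))
    # 御 / 完成 -> 互換 / 性
    print(learn([("ご", ""), ("かんせい", "")], [("ごかん", ""), ("せい", "")], 3))
```

The first call yields a type-1 entry corresponding to 「の−ないよう」 and the second a type-2 entry corresponding to 「ごかん−せい」, matching the entries the text says are added to dictionaries 3 and 4.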

After this, when the operator again converts the kana 「ごかんせい」, the kana-kanji conversion unit 1 is called. Candidate generation is the same as above.

Conversion result determination (S5) selects, as the conversion result, the candidate 「互換性」, which matches the entry 「ごかん−せい」 in the type-2 re-segmentation learning dictionary 4.

Next, the conversion of the input kana string 「ききこうせい」, which resembles 「ごかんせい」, is described for the state after the above re-segmentation and learning. Candidates such as those shown in FIG. 11 are stored in the candidate buffer CB (FIG. 2). Because neither the type-1 re-segmentation learning dictionary 3 nor the type-2 re-segmentation learning dictionary 4 contains a matching entry, conversion result determination (S5) selects the candidate 「機器構成」, the same candidate as when nothing has been learned.

Note that if, when the clause boundary of 互換性 (「ごかん−せい」) was learned, the entry "(arbitrary independent word)−せい" (with an empty string as the first adjunct word) had been registered in the type-1 re-segmentation dictionary 3, 「機器甲性」 (the third candidate in FIG. 11) would be output erroneously. In this embodiment, the test on the first independent word ensures that such a clause boundary is learned individually rather than comprehensively, so this problem does not arise.
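The difference between the individual (type-2) entry and a hypothetical over-general entry can be made concrete with a small sketch. The entry format and the function names `forced_boundary` and `wildcard_boundary` are assumptions for illustration; offsets are zero-based.

```python
# Type-2 entry learned from 互換性: the full kana on both sides of the boundary.
TYPE2 = [("ごかん", "せい")]


def forced_boundary(kana, type2_entries):
    """Offset of the boundary to force, or None if no type-2 entry occurs in `kana`."""
    for left, right in type2_entries:
        position = kana.find(left + right)
        if position >= 0:
            return position + len(left)
    return None


def wildcard_boundary(kana, right):
    """What an over-general entry '(any independent word)-right' would do:
    force a boundary before `right` wherever it follows at least one character."""
    position = kana.find(right)
    return position if position > 0 else None


if __name__ == "__main__":
    print(forced_boundary("ごかんせい", TYPE2))      # boundary forced inside the learned string
    print(forced_boundary("ききこうせい", TYPE2))    # no entry applies
    print(wildcard_boundary("ききこうせい", "せい"))  # the over-general rule would split here
```

The type-2 entry leaves 「ききこうせい」 untouched, while the wildcard variant would force a boundary before 「せい」 and thereby favor the erroneous candidate, which is the side effect the independent-word test is meant to prevent.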

To confirm the effect of this embodiment, a text of 954 clauses (Science Council of Japan, 14th Term Activity Plan, November 1988) was put through kana-kanji conversion. Without the learning of this embodiment there were 21 boundary errors, as shown in FIG. 14. With the learning of this embodiment, seven of them were prevented: comprehensive learning of boundary error E02 avoids E07 and E15, E05 avoids E09, E11 avoids E16, and E18 avoids E19, E20 and E21.
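The reported figures can be tallied with a short check. The mapping below only restates the error labels given in the text; the variable names are assumptions of this sketch.

```python
from fractions import Fraction

TOTAL_ERRORS = 21  # boundary errors without learning (FIG. 14)

# Learned error -> later errors that its comprehensive learning avoids.
AVOIDED_BY = {
    "E02": ["E07", "E15"],
    "E05": ["E09"],
    "E11": ["E16"],
    "E18": ["E19", "E20", "E21"],
}

avoided = sum(len(errors) for errors in AVOIDED_BY.values())
rate = Fraction(avoided, TOTAL_ERRORS)

if __name__ == "__main__":
    print(avoided, rate)  # 7 1/3
```

In other words, learning four corrections removes seven further errors, one third of the 21 boundary errors observed in the 954-clause text.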

F. Effects of the Invention

As described above, when segmentation is learned in this invention and the independent word in the mis-segmented clause that contains the correct clause boundary is the same as the independent word in the correctly written clause immediately before the correct boundary, learning is performed with an arbitrary-independent-word marker in place of that independent word.

Comprehensive learning of segmentation thus becomes possible.

When, on the other hand, the independent word in the mis-segmented clause that contains the correct clause boundary differs from the independent word in the correctly written clause immediately before the correct boundary, the independent word itself is used, comprehensive learning is withheld, and the adverse effects of comprehensive learning are avoided.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of one embodiment of this invention; FIG. 2 is a flowchart explaining the operation of the kana-kanji conversion unit 1 of the embodiment of FIG. 1; FIG. 3 is a flowchart explaining the operation of the clause re-segmentation unit 6 of the embodiment; FIG. 4 is a flowchart explaining the operation of the learning unit 9 of the embodiment; FIGS. 5 to 13 are diagrams used to explain the operation of the embodiment; and FIG. 14 is a diagram used to explain the effect of the embodiment.

1: kana-kanji conversion unit; 2: dictionary; 3: type-1 re-segmentation learning dictionary; 4: type-2 re-segmentation learning dictionary; 5: segmentation designation data buffer; 6: clause re-segmentation unit; 7: pre-correction conversion result buffer; 8: post-correction conversion result buffer; 9: learning unit.

Applicant: International Business Machines Corporation
Sub-agent: Patent Attorney Toshio Sawada

[Figures not reproduced: FIG. 2, flowchart of the conversion unit; FIG. 7, clause buffer]

Claims

[Claims]

(1) A continuous-clause kana-kanji conversion method in which, to convert an input kana character string into a mixed kana-kanji sentence, clause units are generated in sequence, each formed by combining one word of a first group registered in a first dictionary part with a following word of a second group registered in a second dictionary part or with an empty string; one or more chains of clause units matching the input kana character string are constructed; and the single chain, or a chain selected from the plurality of chains according to a predetermined rule, is output as a mixed kana-kanji sentence, the method comprising:
a step of receiving correction information for a clause-unit boundary;
a step of reselecting one of the chains so as to conform to the correction information and outputting it again as a mixed kana-kanji sentence;
a step of comparing the first-group word of the clause unit that contains the corrected boundary in the mixed kana-kanji sentence output before the correction with the first-group word immediately before the corrected boundary in the re-output mixed kana-kanji sentence;
a step of, when the comparison shows a mismatch, registering as a first series of kana character strings the kana character string corresponding to the clause unit before the corrected boundary in the re-output mixed kana-kanji sentence together with the kana character string corresponding to at least the first-group word of the clause unit after the corrected boundary in the re-output mixed kana-kanji sentence, and registering information indicating the corrected boundary position within the first series of kana character strings;
a step of, when the comparison shows a match, registering as a second series of kana character strings the kana character string corresponding to the part, excluding the first-group word, of the clause unit before the corrected boundary in the re-output mixed kana-kanji sentence together with the kana character string corresponding to at least the first-group word of the clause unit after the corrected boundary in the re-output mixed kana-kanji sentence, and registering information indicating the corrected boundary position within the second series of kana character strings; and
a step of forcibly dividing the clause units at the registered corrected boundary position when the first series of kana character strings is input, or when the second series of kana character strings is input following any word of the first group.

(2) The continuous clause kana-kanji conversion method according to claim 1, wherein one of the kana character strings corresponding to the part of the bunsetsu unit before the corrected boundary, excluding the word of the first group, in the re-output kana-kanji mixed sentence is an empty kana character string.

(3) A continuous clause kana-kanji conversion method in which, in order to convert an input kana character string into a kana-kanji mixed sentence, bunsetsu are generated sequentially, each formed by combining one independent word with a following attached word or with an empty string, one or more chains of bunsetsu matching the input kana character string are constructed, and the single chain, or one chain selected from the plurality of chains according to a predetermined rule, is output as a kana-kanji mixed sentence, the method comprising:
a step of receiving correction information on a bunsetsu boundary;
a step of reselecting one of the chains so as to conform to the correction information on the bunsetsu boundary and re-outputting it as a kana-kanji mixed sentence;
a step of comparing the independent word included in the pre-correction bunsetsu that contains the corrected boundary with the independent word immediately before the corrected boundary in the re-output kana-kanji mixed sentence;
a step of, when the comparison result indicates a mismatch, registering, as a first series of kana character strings, the kana character string corresponding to the bunsetsu before the corrected boundary in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the independent word of the bunsetsu after the corrected boundary in the re-output kana-kanji mixed sentence, and registering information indicating the corrected boundary position in the first series of kana character strings;
a step of, when the comparison result indicates a match, registering, as a second series of kana character strings, the kana character string corresponding to the part of the bunsetsu before the corrected boundary, excluding the independent word, in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the independent word of the bunsetsu after the corrected boundary in the re-output kana-kanji mixed sentence, and registering information indicating the corrected boundary position in the second series of kana character strings; and
a step of forcibly dividing the bunsetsu at the registered corrected boundary position when the first series of kana character strings is input, or when the second series of kana character strings is input following any independent word.
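
The final step of claim 3, forced division at a registered boundary, might be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the function `forced_boundaries`, the dictionary contents, and the set of independent-word readings are hypothetical, a first series is matched anywhere in the input, and a second series is honoured only when some independent word ends exactly where the series starts.

```python
def forced_boundaries(kana, independent_words, type1, type2):
    """Return sorted offsets in `kana` where a bunsetsu boundary is forced.

    type1 / type2 map a registered kana series to the boundary offset
    inside that series (hypothetical layout of the learning dictionaries).
    """
    forced = set()

    # First series: applies whenever the series itself appears in the input.
    for series, offset in type1.items():
        start = kana.find(series)
        while start != -1:
            forced.add(start + offset)
            start = kana.find(series, start + 1)

    # Second series: applies only when it directly follows an independent word.
    for series, offset in type2.items():
        start = kana.find(series)
        while start != -1:
            if any(kana[:start].endswith(w) for w in independent_words):
                forced.add(start + offset)
            start = kana.find(series, start + 1)

    return sorted(forced)


# Hypothetical learned entries and independent-word readings.
type1 = {"ここではきもの": 4}
type2 = {"はいしゃ": 0}
words = {"きょう", "あした", "ここ"}

print(forced_boundaries("あしたはいしゃにいく", words, type1, type2))
print(forced_boundaries("ここではきものをぬぐ", words, type1, type2))
```

Because the second series is keyed without the preceding independent word, an entry learned after きょう also applies after あした in this sketch, whereas a first series applies only to the exact kana string that was registered.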

(4) A continuous clause kana-kanji conversion device in which, in order to convert an input kana character string into a kana-kanji mixed sentence, bunsetsu units are generated sequentially, each formed by combining one word of a first group registered in a first dictionary part with a following word of a second group registered in a second dictionary part or with an empty string, one or more chains of bunsetsu units matching the input kana character string are constructed, and the single chain, or one chain selected from the plurality of chains according to a predetermined rule, is output as a kana-kanji mixed sentence, the device comprising:
means for receiving correction information on a boundary of the bunsetsu units;
means for reselecting one of the chains so as to conform to the correction information on the boundary and re-outputting it as a kana-kanji mixed sentence;
means for comparing the word of the first group included in the pre-correction bunsetsu unit that contains the corrected boundary with the word of the first group immediately before the corrected boundary in the re-output kana-kanji mixed sentence;
first storage means for, when the comparison result indicates a mismatch, registering, as a first series of kana character strings, the kana character string corresponding to the bunsetsu unit before the corrected boundary in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the word of the first group in the bunsetsu unit after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the first series of kana character strings;
second storage means for, when the comparison result indicates a match, registering, as a second series of kana character strings, the kana character string corresponding to the part of the bunsetsu unit before the corrected boundary, excluding the word of the first group, in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the word of the first group in the bunsetsu unit after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the second series of kana character strings;
means for determining, by referring to the storage means, that the first series of kana character strings or the second series of kana character strings has been input; and
means for, in response to the determination, forcibly dividing the bunsetsu units at the registered corrected boundary position when the first series of kana character strings is input, or when the second series of kana character strings is input following any word of the first group.

(5) The continuous clause kana-kanji conversion device according to claim 4, wherein, when the comparison result indicates a match, one of the kana character strings corresponding to the part of the bunsetsu unit before the corrected boundary, excluding the word of the first group, in the re-output kana-kanji mixed sentence is an empty kana character string.

(6) A continuous clause kana-kanji conversion device in which, in order to convert an input kana character string into a kana-kanji mixed sentence, bunsetsu are generated sequentially, each formed by combining one independent word with a following attached word or with an empty string, one or more chains of bunsetsu matching the input kana character string are constructed, and the single chain, or one chain selected from the plurality of chains according to a predetermined rule, is output as a kana-kanji mixed sentence, the device comprising:
means for receiving correction information on a bunsetsu boundary;
means for reselecting one of the chains so as to conform to the correction information on the bunsetsu boundary and re-outputting it as a kana-kanji mixed sentence;
means for comparing the independent word included in the pre-correction bunsetsu that contains the corrected boundary with the independent word immediately before the corrected boundary in the re-output kana-kanji mixed sentence;
first storage means for, when the comparison result indicates a mismatch, registering, as a first series of kana character strings, the kana character string corresponding to the bunsetsu before the corrected boundary in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the independent word of the bunsetsu after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the first series of kana character strings;
second storage means for, when the comparison result indicates a match, registering, as a second series of kana character strings, the kana character string corresponding to the part of the bunsetsu before the corrected boundary, excluding the independent word, in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the independent word of the bunsetsu after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the second series of kana character strings;
means for determining that the first series of kana character strings or the second series of kana character strings has been input; and
means for forcibly dividing the bunsetsu at the registered corrected boundary position when the first series of kana character strings is input, or when the second series of kana character strings is input following any independent word.
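
The chain construction recited in the preamble of the claims, in which each bunsetsu is one independent word followed by one attached word or an empty string, can be pictured with the short sketch below. The function `enumerate_chains` and the toy word lists are hypothetical, the dictionaries are reduced to sets of kana readings, and no selection rule among the resulting chains is modelled.

```python
def enumerate_chains(kana, independent, adjuncts):
    """Yield every chain of (independent, adjunct) pairs that spells `kana`.

    independent: set of kana readings of independent words (toy first dictionary).
    adjuncts: set of kana readings of attached words (toy second dictionary).
    The adjunct of a bunsetsu may be the empty string.
    """
    if not kana:
        yield []  # the whole input has been consumed: one complete chain
        return
    for word in sorted(independent):
        if not kana.startswith(word):
            continue
        rest = kana[len(word):]
        # Try the empty adjunct first, then each registered attached word.
        for adj in [""] + sorted(adjuncts):
            if rest.startswith(adj):
                for tail in enumerate_chains(rest[len(adj):], independent, adjuncts):
                    yield [(word, adj)] + tail


# Toy dictionaries; real dictionaries would also carry kanji spellings.
independent = {"きょう", "いしゃ", "はいしゃ", "い"}
adjuncts = {"は", "に", "く"}

for chain in enumerate_chains("きょうはいしゃにいく", independent, adjuncts):
    print(chain)
```

With these toy dictionaries the input admits two chains, one dividing after きょう and one dividing after きょうは, which is the kind of ambiguity that the boundary correction and the two storage means are meant to resolve.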

(7) A data processing device that has a central processing unit, a storage device, a display device, and a character input device, that performs data processing, and in which, in order to convert an input kana character string into a kana-kanji mixed sentence, bunsetsu units are generated sequentially, each formed by combining one word of a first group registered in a first dictionary part with a following word of a second group registered in a second dictionary part or with an empty string, one or more chains of bunsetsu units matching the input kana character string are constructed, and the single chain, or one chain selected from the plurality of chains according to a predetermined rule, is output as a kana-kanji mixed sentence, the device comprising:
means for receiving correction information on a boundary of the bunsetsu units;
means for reselecting one of the chains so as to conform to the correction information on the boundary and re-outputting it as a kana-kanji mixed sentence;
means for comparing the word of the first group included in the pre-correction bunsetsu unit that contains the corrected boundary with the word of the first group immediately before the corrected boundary in the re-output kana-kanji mixed sentence;
first storage means for, when the comparison result indicates a mismatch, registering, as a first series of kana character strings, the kana character string corresponding to the bunsetsu unit before the corrected boundary in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the word of the first group in the bunsetsu unit after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the first series of kana character strings;
second storage means for, when the comparison result indicates a match, registering, as a second series of kana character strings, the kana character string corresponding to the part of the bunsetsu unit before the corrected boundary, excluding the word of the first group, in the re-output kana-kanji mixed sentence and the kana character string corresponding to at least the word of the first group in the bunsetsu unit after the corrected boundary in the re-output kana-kanji mixed sentence, and for registering information indicating the corrected boundary position in the second series of kana character strings;
means for determining, by referring to the storage means, that the first series of kana character strings or the second series of kana character strings has been input; and
means for, in response to the determination, forcibly dividing the bunsetsu units at the registered corrected boundary position when the first series of kana character strings is input, or when the second series of kana character strings is input following any word of the first group.