JPH0836571A

JPH0836571A - Document processor

Info

Publication number: JPH0836571A
Application number: JP6169250A
Authority: JP
Inventors: Takao Ikoma; 孝夫生駒
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1994-07-21
Filing date: 1994-07-21
Publication date: 1996-02-06
Anticipated expiration: 2019-08-11
Also published as: JP3552750B2

Abstract

PURPOSE: To convert a pen-input character string that mixes kanji and kana, or an already converted character string that mixes kanji and kana, without using a large-scale dictionary in which all combinations of kanji (Chinese characters) and kana (Japanese syllabic characters) are registered. CONSTITUTION: A conversion target character string 1 that mixes kanji and kana is converted by kanji-kana conversion 2 into an intermediate kana character string group 3 consisting only of kana. This group is then converted by kana-kanji conversion 4 to generate a primary candidate character string group 5. The primary candidate character string group 5 is collated with the conversion target character string 1 by collation processing 6, and the candidates that satisfy the condition become the final candidate character string group 7.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to Japanese text processing devices, such as word processors, computers, and personal digital assistants with a pen input function, that provide handwritten character input and handwritten character recognition and allow kanji to be entered directly. It also relates to document processing devices, such as general computers and word processors, that can reconvert Japanese text after it has been entered.

[0002]

[Prior Art] Many word processors, computers, and similar devices with a pen input function currently exist. FIG. 9 shows an example of a portable information terminal with a pen input function. The display unit A is a display-integrated tablet, and input is made on the display unit A with an input pen B, which serves as the input means. FIG. 10 shows an example of the screen during input. When the user writes in an entry frame C with the pen B, the handwritten kanji, kana, alphanumeric characters, and so on are recognized, and the recognized characters are displayed on the input line D. The user may write in any of the frames (C1 to C4). Because a recognized character disappears from its entry frame, the user simply keeps writing in whichever frame is empty, and the characters appear on the input line in the order in which they were written.

[0003] In the conventional keyboard input method, conversion is performed from hiragana to kanji, whereas in the pen input method kanji can also be entered directly. However, input that mixes in kanji, or the text of a kana-kanji phrase that has already been converted, cannot be converted. If the kana portions before and after the kanji are converted individually, for example when "公えん" is entered and only the hiragana part ("えん") of "公えん" is converted, the conversion unit is short and candidates that have nothing to do with the immediately preceding character "公" are also proposed, so the conversion accuracy is very poor.

[0004] Japanese Examined Patent Publication No. 4-35785 discloses a method that converts an entered mixed kana-kanji character string by using a dedicated dictionary that stores the corresponding kanji for every headword in which kanji and kana are mixed. FIG. 8 shows an example of the dictionary used in that method: a kana-kanji conversion dictionary whose headwords also include character strings containing kanji. However, if all combinations of kanji and kana are considered for the headwords of "日本語", seven headwords are needed: "にほんご", "日ほんご", "に本ご", "にほん語", "日本ご", "日ほん語", and "に本語". If R is the number of characters in the kanji string that results from conversion, the number of headwords is 2^R - 1, which grows enormously. A fairly large dictionary is therefore required, which causes problems with search efficiency, maintenance load, and the like.
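
The headword count can be checked by direct enumeration. The following Python sketch is illustrative only: the helper name `headwords` and the representation of "日本語" as (kanji, reading) pairs are assumptions made for this example and do not come from the cited publication.

```python
from itertools import product

def headwords(segments):
    """Enumerate every headword in which each kanji is either kept or
    replaced by its reading, excluding the all-kanji result itself.

    segments: list of (kanji, reading) pairs, one pair per kanji
    (a hypothetical input format chosen for this sketch).
    """
    result = "".join(kanji for kanji, _ in segments)
    forms = ("".join(choice) for choice in product(*segments))
    return [form for form in forms if form != result]

entries = headwords([("日", "に"), ("本", "ほん"), ("語", "ご")])
print(len(entries))  # 7, i.e. 2**3 - 1
print(entries)
```

Each additional kanji in the result doubles the number of forms, which is why a dictionary built this way grows so quickly.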

[0005]

[Problem to Be Solved by the Invention] Apart from pen input, keyboard input is the common input method. One way of entering Japanese text that mixes kana and kanji has been to type the sentence in hiragana on the keyboard and then convert the hiragana character string to kanji by an operation such as pressing a [Convert] key.

[0006] In the pen input method, kanji can be entered directly. To enter the character string "公園", for example, the user can write "公園" directly in the entry frame C described above and have it recognized. In practice, however, the kanji "園" has many strokes, and it would be very convenient for the user to enter "公えん", issue a conversion request, and have it converted to "公園". Even on devices that use ordinary input methods, there may be cases in which the user wants to convert part or all of already entered Japanese text into kanji again. It is also conceivable that the user wants to reconvert part of a text that was input from an OCR (Optical Character Reader). This requires that conversion be possible not only from character strings consisting solely of kana, as in the past, but also from mixed kana-kanji character strings that contain kanji.

[0007] Japanese Examined Patent Publication No. 4-35785 devises, as a method of converting mixed kana-kanji character strings, a technique that uses a dictionary in which character strings containing kanji are also listed as headwords of a conventional kana-kanji conversion dictionary. With this method, however, the number of dictionary entries becomes very large. Moreover, as the example above makes clear, including all possible combinations yields a dictionary full of headwords that will hardly ever be referenced (such as "に本ご"), while restricting registration to headwords that are likely to be used makes it difficult to decide where to draw the line. Creating such a dictionary is a heavy burden, and considerable further effort is likely to be needed to keep it consistent with other dictionaries when the dictionary is updated or when users register words.

[0008] The present invention converts mixed kana-kanji character strings without needing such a huge dictionary, by first converting the character string that contains kanji into a character string consisting only of hiragana.

[0009]

[Means for Solving the Problem] The invention of claim 1 comprises kanji-kana conversion means that converts the mixed kana-kanji character string to be converted into a character string containing no kanji, thereby obtaining an intermediate kana character string, and kana-kanji conversion means that converts the intermediate kana character string into a mixed kana-kanji character string, thereby obtaining a primary candidate character string. The primary candidate character string is then collated with the conversion target character string.

[0010] The invention of claim 2 comprises collation means that makes a primary candidate character string a final candidate character string when each partial character string of consecutive kanji in the conversion target character string is preserved as-is in the primary candidate character string, and each partial character string of consecutive kana in the conversion target character string corresponds to some non-empty character string in the primary candidate character string.

[0011]

[Operation] According to the invention of claim 1, the kanji-kana conversion means converts the entered conversion target, a character string in which kanji and kana are mixed, into a character string consisting only of kana. This yields the intermediate kana character string. The kana-kanji conversion means then converts it into a mixed kana-kanji character string. This yields the primary candidate character string.

[0012] According to the invention of claim 2, the collation means collates the conversion target character string with each primary candidate character string, eliminates every primary candidate other than those in which the kanji contained in the conversion target character string appear at the correct positions and in the same order, and outputs the remaining candidates to the user as candidates for the correct result.

[0013]

[Embodiment] FIG. 1 is a block diagram showing the flow of processing in the present invention. The conversion target character string 1 is processed by kanji-kana conversion 2 to obtain the intermediate kana character string group 3. Applying kana-kanji conversion 4 to the intermediate kana character string group 3 yields the primary candidate character string group 5. Collating the primary candidate character string group 5 with the conversion target character string 1 by collation processing 6 yields the final candidate character string group 7.

[0014] FIG. 2 is a block diagram of one embodiment of the present invention. Reference numeral 8 denotes a tablet, 9 a character recognition circuit, 10 a conversion target character string buffer, 11 a kanji-kana conversion circuit, 12 a kanji-kana conversion dictionary, 13 an intermediate kana character string buffer, 14 a kana-kanji conversion circuit, 15 a kana-kanji conversion dictionary, 16 a primary candidate character string buffer, 17 a character string collation circuit, 18 a final candidate character string buffer, 19 a display circuit, 20 a display device, and 21 a conversion process control circuit.

[0015] Each of the above blocks is described in detail below, following the order of processing. As shown at A in FIG. 9, the tablet 8 normally doubles as the display unit. Characters are written in the entry area C of FIG. 9 with the pen B. The entered characters are recognized by the character recognition circuit 9, which uses information such as the image of each entered character and the stroke information (pen movement) of the input. The recognized character string is first stored in the conversion target character string buffer 10. The contents of this buffer are converted by the kanji-kana conversion circuit 11, using the kanji-kana conversion dictionary 12, into a character string containing only kana, which is stored in the intermediate kana character string buffer 13.

[0016] The kana-only character string stored in the intermediate kana character string buffer 13 is converted into a character string that includes kanji by the kana-kanji conversion circuit 14, which uses the kana-kanji conversion dictionary 15, and the result is stored in the primary candidate character string buffer 16. The contents of the conversion target character string buffer 10 and of the primary candidate character string buffer 16 are compared by the collation circuit 17. Only when the contents of the two buffers are judged not to conflict are the contents of the primary candidate character string buffer 16 copied to the final candidate character string buffer 18.

[0017] The character string obtained in the final candidate character string buffer 18 is displayed to the user on the display device 20, through the display circuit 19, as a candidate conversion result. If the user approves it, it is confirmed. Otherwise, the next primary candidate character string or intermediate kana character string is obtained and the same processing is repeated.

[0018] Requests to start conversion and to present candidates are entered from the tablet 8, recognized by the character recognition circuit 9 as conversion commands, and passed to the conversion process control circuit 21, which proceeds according to the flowchart shown in FIG. 3. One embodiment is described in detail below with reference to FIGS. 3 and 4. As a concrete example, suppose that the character string "公えん" is entered from the tablet 8, recognized by the character recognition circuit 9, and then converted.

[0019] First, in step 22, the conversion target character string is set as S. S is the conversion target character string buffer, so here S holds "公えん". In step 23, the final candidate character string buffer RR is cleared by assigning it the empty set. In step 24, kanji-kana conversion is applied to S, that is, to "公えん", and the resulting set of character strings is stored as II. Here II holds "こうえん" and "きみえん". This is the intermediate kana character string buffer. In step 25, one element is taken from II and set as I. Here "こうえん" is taken as I. If no element remains to be taken, processing moves to step 30. In step 26, kana-kanji conversion is applied to I and the resulting character strings are set as CC. In the example, CC holds "公園", "公演", "後援", "講演", and "高遠". This is the primary candidate character string buffer. In step 27, one element of CC is taken and set as C. If no element remains, processing returns to step 25. Once all elements of II have been taken in step 25, processing moves to step 30. In step 28, C and S are collated. If C and S conflict, processing returns to step 27. If they do not conflict, C is added in step 29 as an element of the set RR (the final candidate character string buffer), and processing returns to step 27.

In the example, steps 27 and 28 collate the elements of CC, "公園", "公演", "後援", "講演", and "高遠", and those that do not conflict with S, "公園" and "公演", are stored in the final candidate character string buffer RR. Processing then returns to step 25, "きみえん" is taken as I, and kana-kanji conversion is applied in step 26. Because no candidates are obtained for "きみえん", processing returns to step 25 again. II now has no further I to take, so processing moves to step 30.

In step 30, one element is taken from RR and set as R. If no element remains, the process ends at step 31 with no candidate. In step 32, R is displayed to the user. In step 33, if the user adopts R, the process ends; otherwise processing returns to step 30. In this example, "公園" and "公演" are displayed as candidates.
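
The loop structure of steps 22 to 29 can be sketched in Python as follows. This is a minimal illustration, not the circuit of the embodiment: the function name `convert_mixed`, the toy tables, and `toy_consistent` are all assumptions introduced here. In particular, `toy_consistent` is a simplified placeholder that only checks that the kanji of S appear in C in the same order; the full condition is the one given in [0023] to [0030].

```python
def convert_mixed(s, kanji_to_kana, kana_to_kanji, consistent):
    """Eager conversion loop following steps 22 to 29 of FIG. 3."""
    rr = []                                        # step 23: clear RR
    for i in kanji_to_kana(s):                     # steps 24-25: build II, take each I
        for c in kana_to_kanji(i):                 # steps 26-27: build CC, take each C
            if consistent(s, c) and c not in rr:   # step 28: collate C with S
                rr.append(c)                       # step 29: add C to the set RR
    return rr                                      # steps 30-33 present RR to the user

# Toy stand-ins for the two dictionaries (hypothetical data taken from the example).
TOY_READINGS = {"公えん": ["こうえん", "きみえん"]}
TOY_WORDS = {"こうえん": ["公園", "公演", "後援", "講演", "高遠"]}

def toy_consistent(s, c):
    # Simplified placeholder: the kanji of S must occur in C in the same order.
    kanji = [ch for ch in s if "\u4e00" <= ch <= "\u9fff"]
    rest = iter(c)
    return all(ch in rest for ch in kanji)

final = convert_mixed(
    "公えん",
    lambda s: TOY_READINGS.get(s, []),
    lambda i: TOY_WORDS.get(i, []),
    toy_consistent,
)
print(final)  # ['公園', '公演']
```

Passing the two conversions and the collation as parameters mirrors the point made later in the text that each constituent technique can be replaced independently.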

[0020] The kanji-kana conversion in step 24 is a function that takes the character string S as its argument and returns the set of candidate character strings obtained by converting it to kana only. The kana-kanji conversion in step 26 is a function that takes the kana-only character string I as its argument and returns the set of character strings obtained by converting it to character strings that include kanji.
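
As a rough illustration of these two function signatures, the Python sketch below uses small lookup tables in place of dictionaries 12 and 15. The table contents, the names `kanji_to_kana` and `kana_to_kanji`, and the per-character treatment of readings are assumptions for this example; a real kanji-kana converter would work on words rather than single characters, and a kanji missing from the toy table is simply passed through unchanged here.

```python
from itertools import product

# Hypothetical toy tables; a real device would consult dictionaries 12 and 15.
KANJI_READINGS = {"公": ["こう", "きみ"], "行": ["こう", "ぎょう", "い"]}
KANA_TO_WORDS = {
    "こうえん": ["公園", "公演", "後援", "講演", "高遠"],
    "ひこうき": ["飛行機", "非行期"],
    "ひぎょうき": ["罷業期"],
    "ひいき": ["贔屓"],
}

def kanji_to_kana(s):
    """Step 24: return the candidate kana-only readings of s."""
    options = [KANJI_READINGS.get(ch, [ch]) for ch in s]
    return ["".join(parts) for parts in product(*options)]

def kana_to_kanji(i):
    """Step 26: return the candidate strings for the kana-only string i."""
    return list(KANA_TO_WORDS.get(i, []))

print(kanji_to_kana("公えん"))    # ['こうえん', 'きみえん']
print(kanji_to_kana("ひ行き"))    # ['ひこうき', 'ひぎょうき', 'ひいき']
print(kana_to_kanji("きみえん"))  # []
```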

[0021] In step 28 above, C and S are collated to determine whether they conflict. The method of determination is described in detail with reference to FIG. 5. In the preceding example, where the conversion target character string S is "公えん", suppose that "公園", "公演", "後援", "講演", and "高遠" are obtained as the primary candidate character string group CC. In "公園" and "公演", the kanji "公" appears at the same position as in the conversion target character string S, and the kana part "えん" has been replaced by some kanji. In "後援", "講演", and "高遠", by contrast, the kanji "公" does not appear, so these clearly differ from the character string the user intended to enter. As a result, "公園" and "公演" remain as final candidate character strings, and "後援", "講演", and "高遠" are rejected.

[0022] As another example, suppose that "ひ行き" is the character string to be converted. "ひ行き" is stored in S and kanji-kana conversion is performed. As a result, the intermediate kana character strings "ひこうき", "ひぎょうき", and "ひいき" are obtained as I (the portions "こう", "ぎょう", and "い" are where "行" was converted). Applying kana-kanji conversion to each of these intermediate kana character strings yields "飛行機", "非行期", "罷業期", "贔屓", and so on as the primary candidate character strings C. Of these, "罷業期", "贔屓", and the like, which do not contain the kanji "行" that was in the original conversion target character string, are eliminated. In the remaining "飛行機" and "非行期", some kanji appears in the parts corresponding to "ひ" and "き" of "ひ行き", so these are presented as candidates for the correct result.

[0023] In the collation processing above, whether a primary candidate character string is kept as a final candidate is decided as follows. The conversion target character string is divided into partial character strings in which only kana are consecutive and partial character strings in which only kanji are consecutive.

[0024] The division is written as

    S = S1 & S2 & ... & Sn

where either
    NK(Si) when i is odd and K(Si) when i is even,
or
    K(Si) when i is odd and NK(Si) when i is even.

Here S is the conversion target character string, Si (i = 1, 2, ..., n; n is a natural number) are its partial character strings, & is the character string concatenation operator, K(X) is a predicate meaning that the character string X contains only kanji, and NK(X) is a predicate meaning that the character string X contains only kana. Such a division is always uniquely determined.

[0025] For example, if "ひ行き" is the conversion target character string, then S = [ひ行き], S1 is [ひ], a partial character string of consecutive kana, S2 is [行], a partial character string of consecutive kanji, and S3 is likewise [き], a partial character string of consecutive kana. If the conversion target character string S is [公えん], then S1 is [公], a partial character string of consecutive kanji, and S2 is [えん], a partial character string of consecutive kana.
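
The division into alternating runs can be sketched in Python with `itertools.groupby`. The names `is_kanji` and `split_runs` are hypothetical, and two simplifications are assumed: a character counts as kanji if it lies in the CJK Unified Ideographs block (U+4E00 to U+9FFF), and every other character is treated as kana.

```python
from itertools import groupby

def is_kanji(ch):
    # Assumption: the CJK Unified Ideographs block stands in for "kanji".
    return "\u4e00" <= ch <= "\u9fff"

def split_runs(s):
    """Return [(is_kanji_run, substring), ...] for the maximal runs S1..Sn."""
    return [(flag, "".join(group)) for flag, group in groupby(s, key=is_kanji)]

print(split_runs("ひ行き"))  # [(False, 'ひ'), (True, '行'), (False, 'き')]
print(split_runs("公えん"))  # [(True, '公'), (False, 'えん')]
```

Because `groupby` merges adjacent characters with the same key, the runs are maximal and alternate between kana and kanji, which is why the division is unique.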

[0026] If a primary candidate character string can be divided into partial character strings that satisfy the following condition, it does not conflict with the conversion target character string and is kept as a final candidate character string. A primary candidate character string for which no such division is possible is rejected and excluded from the final candidate character strings.

[0027] The condition is

    T = T1 & T2 & ... & Tn    (n is the number of partial character strings into which the conversion target character string S was divided)

where either
    Ti is any non-empty character string when i is odd, and Ti = Si and K(Ti) when i is even,
or
    Ti = Si and K(Ti) when i is odd, and Ti is any non-empty character string when i is even.

Here T is a primary candidate character string and Ti (i = 1, 2, ..., n; n is a natural number) are its partial character strings. The first alternative applies when the odd-numbered Si are kana, and the second when they are kanji.

[0028] For example, if the conversion target character string S is "公えん", it is uniquely divided into S1 = "公" and S2 = "えん" (n = 2). If the primary candidate character string is "公園", it can be divided into T1 = "公" = S1 and T2 = "園", and the same holds for "公演". For "講演", "後援", and the like, however, no division exists in which T1 = S1 and T2 is non-empty, so they are excluded from the final candidate character strings.

[0029] Likewise, if the conversion target character string S is "ひ行き", it is divided into S1 = "ひ", S2 = "行", and S3 = "き". If the primary candidate character string is "飛行機", it can be divided into T1 = "飛", T2 = "行", and T3 = "機", and "非行期" can be divided in the same way, so both remain as final candidate character strings. For "罷業期" and "贔屓", no division exists in which T1 and T3 are non-empty and T2 = S2, so they are excluded.

[0030] In other words, as shown in FIG. 6, a primary candidate character string is kept as a final candidate character string if, and only if, each partial character string of consecutive kanji in the conversion target character string is preserved as-is in the primary candidate character string and each partial character string of consecutive kana in the conversion target character string corresponds to some non-empty character string in the primary candidate character string.
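
One possible way to express this condition in code is to turn the conversion target character string into a regular expression: each kanji run becomes a literal and each kana run becomes "one or more arbitrary characters". The Python sketch below takes that approach. It is an assumption of this example rather than the method of the embodiment, and the names `is_kanji` and `is_consistent` are hypothetical. As before, kanji are approximated by the CJK Unified Ideographs block.

```python
import re
from itertools import groupby

def is_kanji(ch):
    # Assumption: the CJK Unified Ideographs block stands in for "kanji".
    return "\u4e00" <= ch <= "\u9fff"

def is_consistent(s, t):
    """True if candidate t can be divided as T1 & ... & Tn with every kanji
    run of s preserved and every kana run of s matched by a non-empty string."""
    pattern = "".join(
        re.escape("".join(run)) if flag else ".+"
        for flag, run in groupby(s, key=is_kanji)
    )
    return re.fullmatch(pattern, t) is not None

cc = ["公園", "公演", "後援", "講演", "高遠"]
print([c for c in cc if is_consistent("公えん", c)])   # ['公園', '公演']

cc2 = ["飛行機", "非行期", "罷業期", "贔屓"]
print([c for c in cc2 if is_consistent("ひ行き", c)])  # ['飛行機', '非行期']
```

The regular expression engine's backtracking does the work of searching for "some division" of T, so the check accepts a candidate exactly when at least one valid division exists.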

[0031] In the embodiment above, when kanji-kana conversion and kana-kanji conversion are performed, all candidates are stored in buffers to generate the primary candidate character strings, and the collation means then presents the final candidate character strings. The following embodiment does not require large buffers. It is described with reference to FIG. 7.

[0032] First, in step 35, the conversion target character string is set as S. Next, in step 23, kanji-kana conversion is applied to S and the result is set as I. In step 24, if I is an empty character string, the process ends at step 25 with no candidate. If I is not an empty character string, kana-kanji conversion is applied to I in step 26 and the result is set as C. In step 27, if C is an empty character string, processing returns to step 23, takes the next candidate as I, and continues. If C is not an empty character string, S (the conversion target character string buffer) and C (the primary candidate character string buffer) are collated in step 28. If the collation finds a conflict, processing again returns to step 23, takes the next candidate as I, and continues. If there is no conflict, the contents of C (the primary candidate character string buffer) are assigned to R (the final candidate character string buffer) in step 29 and presented to the user in step 30. In step 31, if the user adopts the candidate, the process ends at step 32; if the user does not adopt it and requests the next candidate, processing returns to step 23. Candidates can thus also be generated by repeated backtracking.
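
A Python generator is one natural way to sketch this on-demand behaviour, since it suspends after each candidate and resumes only when the next one is requested. The sketch below is one reading of the flow under stated assumptions: the name `candidates`, the toy tables, and the placeholder collation (`c.startswith("公")`) are all hypothetical, and the list `seen` exists only to show how far conversion has progressed.

```python
def candidates(s, kanji_to_kana, kana_to_kanji, consistent):
    """Yield final candidates one at a time, computing nothing in advance."""
    for i in kanji_to_kana(s):        # next intermediate kana string I
        for c in kana_to_kanji(i):    # next primary candidate C
            if consistent(s, c):      # collate S with C
                yield c               # present R; resume only on request

# Toy stand-ins (hypothetical data); `seen` records which candidates were generated.
seen = []

def toy_kanji_to_kana(s):
    yield from {"公えん": ["こうえん", "きみえん"]}.get(s, [])

def toy_kana_to_kanji(i):
    for c in {"こうえん": ["公園", "公演", "後援", "講演", "高遠"]}.get(i, []):
        seen.append(c)
        yield c

gen = candidates("公えん", toy_kanji_to_kana, toy_kana_to_kanji,
                 lambda s, c: c.startswith("公"))  # placeholder collation
first = next(gen)           # the user is shown the first candidate
print(first, seen)          # 公園 ['公園']
second = next(gen, None)    # the user rejects it and asks for the next one
print(second, seen)         # 公演 ['公園', '公演']
```

Only the candidates generated so far are ever held in memory, which corresponds to the stated aim of avoiding large buffers.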

[0033]

[Effect of the Invention] On a device that can accept and recognize handwritten characters, kanji can be entered directly, so character strings that partly contain kanji can be entered. When the user wants to convert a character string in which part was entered in kana, for example because entering the kanji is tedious or difficult, or wants to reconvert Japanese text that has already been entered, making effective use of the kanji contained in the entered character string and of their positions eliminates unnecessary candidates and improves conversion accuracy.

[0034] The dictionary used to convert a mixed kanji-kana character string into a kana character string is a general-purpose one that can also be used for text-to-speech synthesis, for generating readings for sorting, and so on, so no special dedicated dictionary needs to be prepared. Conventional techniques can be used as they are for kana-kanji conversion. Because the invention is thus a combination of general-purpose software techniques, each constituent technique can easily be replaced individually.

[Brief description of drawings]

FIG. 1 is a flowchart showing the flow of processing in the present invention.

FIG. 2 is a block diagram of Embodiment 1 of the present invention.

FIG. 3 is a flowchart of Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the conversion process in an embodiment of the present invention.

FIG. 5 is a diagram showing examples of the intermediate kana character string group, the primary candidate character string group, and the final candidate character string group for example input character strings in the present invention.

FIG. 6 is a diagram showing an example of the condition under which the collation processing keeps a character string as a final candidate character string.

FIG. 7 is a flowchart of Embodiment 2 of the present invention.

FIG. 8 is a diagram showing an example of the dictionary in the conventional example.

FIG. 9 is a diagram showing a portable information terminal with a pen input function.

FIG. 10 is an example of the screen in pen handwriting input mode.

[Explanation of symbols]

1 Conversion target character string
2 Kanji-kana conversion processing
3 Intermediate kana character string group
4 Kana-kanji conversion processing
5 Primary candidate character string group
6 Collation processing
7 Final candidate character string group
8 Tablet
9 Character recognition circuit
10 Conversion target character string buffer
11 Kanji-kana conversion circuit
12 Kanji-kana conversion dictionary
13 Intermediate kana character string buffer
14 Kana-kanji conversion circuit
15 Kana-kanji conversion dictionary
16 Primary candidate character string buffer
17 Character string collation circuit
18 Final candidate character string buffer
19 Display circuit
20 Display device
21 Conversion process control circuit

Claims

[Claims]

1. A document processing apparatus comprising: kanji-kana conversion means for converting a mixed kana-kanji character string into a character string containing no kanji; kana-kanji conversion means for converting the character string containing no kanji obtained by said means into a mixed kana-kanji character string; and collation means for collating the mixed kana-kanji character string that is the conversion target with the mixed kana-kanji character string obtained by the conversion processing.

2. A document processing apparatus comprising: kanji-kana conversion means for converting a mixed kana-kanji character string that is a conversion target character string into a character string containing no kanji, thereby obtaining an intermediate kana character string; kana-kanji conversion means for converting the intermediate kana character string into a mixed kana-kanji character string, thereby obtaining a primary candidate character string; and collation means for collating the primary candidate character string with the conversion target character string and making the primary candidate character string a final candidate character string when each partial character string of consecutive kanji in the conversion target character string is preserved as-is in the primary candidate character string and each partial character string of consecutive kana in the conversion target character string corresponds to some non-empty character string in the primary candidate character string.