JPH1011432A

JPH1011432A - Character string estimation method and document preparing device using the estimation method

Info

Publication number: JPH1011432A
Application number: JP8161777A
Authority: JP
Inventors: Toshiya Tamura; 俊哉田村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-06-21
Filing date: 1996-06-21
Publication date: 1998-01-16

Abstract

PROBLEM TO BE SOLVED: To allow even an unregistered word, once it has been input, to be predicted from its first few characters on subsequent input, by determining whether a character string selected from the predicted character string candidates can connect to the character string stored in a character string buffer by the preceding search. SOLUTION: A connection determination unit 110 refers to character types or to grammar information stored in a grammar dictionary 111 and determines whether a candidate character string selected by the user, or a character string directly input by the user, can connect to the character string stored in a character string buffer 112 by the preceding search. When the unit 110 determines that connection is possible, a character string registration unit 113 stores the selected or directly input character string in the buffer 112. When the unit 110 determines that connection is not possible and the buffer 112 holds plural character strings, those strings are concatenated into a compound word, which is registered in a character string learning table 106. The buffer 112 is then cleared.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character string prediction method that, based on a character string input through an input device, predicts the character string that follows it, and to a document creation device that uses this character string prediction method.

[0002]

[Prior Art] Conventionally, in a document creation device such as a Japanese word processor, inputting a character string with a handwritten character recognition device or the like takes longer per character than keyboard input, which places a heavy burden on the person entering the text.

[0003] To lighten this burden, methods have been considered in which, once part of the intended character string has been input, the character string that follows is predicted from the part already input.

[0004] One such approach is called input prediction. For example, as reported in the literature (Proceedings of the 48th National Convention of the Information Processing Society of Japan, 4J-11, and others), a word dictionary is searched once the first few characters have been input, and character strings (words) beginning with those characters are extracted.

[0005] For example, when the character string "漢字辞書" (kanji dictionary) is to be input, as soon as the user has input "漢字" (kanji), "辞書" (dictionary) is predicted as the character string that follows, and "漢字辞書" is finally output.
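The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration: the dictionary contents, the function name `predict`, and the choice to exclude a word identical to the prefix are assumptions, not details taken from the cited literature.

```python
# Hypothetical dictionary contents; the source does not list actual entries.
WORD_DICTIONARY = ["漢字", "漢字辞書", "漢方", "感じ", "辞書"]


def predict(prefix, dictionary):
    """Return the words that begin with the prefix and are longer than it."""
    return [w for w in dictionary if w.startswith(prefix) and len(w) > len(prefix)]


candidates = predict("漢字", WORD_DICTIONARY)
print(candidates)  # ['漢字辞書']
```

A linear scan keeps the sketch short. A real dictionary would more likely use a sorted index or a trie for prefix lookup.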

[0006]

[Problems to Be Solved by the Invention] The character string prediction described above searches the word dictionary once the first few characters have been input and obtains character strings (words) beginning with those characters as prediction candidates, so it presupposes that the desired word is already registered in the word dictionary. The word dictionary holds many commonly used words, but compound words such as "特許提案書" (patent proposal) are not registered. Consequently, to input a compound word that is not in the word dictionary, the user has to input the first few characters of each component word, such as "特許" (patent) and "提案書" (proposal), and select the intended candidate from the plural prediction candidates for each word, which is tedious.

[0007] Moreover, character string prediction was naturally impossible not only for compound words but also for any other unregistered word. This was very inconvenient when such a word had to be input many times.

[0008] The present invention has been made in view of the above. Its object is to provide a character string prediction method that, once an unregistered word has been input, can predict that word from its first few characters on subsequent input, and a document creation device that uses this character string prediction method.

[0009]

[Means for Solving the Problems]

(1) The present invention has a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary. When a character string to be predicted is input, character strings that begin with the input character string are retrieved from the word dictionary or the character string learning table as predicted character string candidates. A connection determination is then made between the character string selected from these candidates and the character string stored in a character string buffer by the previous search. If connection is determined to be possible, the selected character string is stored in the character string buffer. If connection is determined to be impossible and plural character strings are stored in the character string buffer, the character string obtained by concatenating them is registered in the character string learning table as a compound word, and the character string buffer is then cleared.

[0010] With this configuration, the mutually connectable character strings among the successively selected predicted character strings are learned as a compound word, and the learned character string is thereafter retrieved as one of the prediction targets. Therefore, once a compound word has been input, it can be predicted from its first few characters on subsequent input.

[0011] (2) The present invention, in the configuration of (1) above, is characterized in that the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer by the previous search is made on the basis of character type.

[0012] With this configuration, the connection determination is made by character type. The selected character string is judged connectable when it consists of a single character type other than hiragana and has the same character type as the previous character string. Otherwise it is judged not connectable.

[0013] (3) The present invention, in the configuration of (1) above, has a grammar dictionary that stores grammar information for determining connections between character strings, and is characterized in that the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer by the previous search is made on the basis of the grammar information stored in the grammar dictionary.

[0014] With this configuration, the connection determination is made using grammar information. Whether the selected character string and the previous character string can be connected as a compound word is determined according to grammatical rules prepared in advance, for example "a noun and a verb do not connect directly" or "if a particle is present, the result is not a compound word".
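A grammar-based determination of this kind might look like the sketch below. The part-of-speech table, the rule set that allows only noun plus noun, and the names `PART_OF_SPEECH`, `ALLOWED_PAIRS` and `connectable_by_grammar` are all hypothetical. The patent does not specify the format of the grammar dictionary.

```python
# Hypothetical part-of-speech lookup standing in for the grammar dictionary.
PART_OF_SPEECH = {
    "特許": "noun",
    "提案書": "noun",
    "書く": "verb",
    "としては": "particle",
}

# Pairs of (previous, current) parts of speech that may be joined directly.
# Allowing only noun + noun is an assumption that reflects the two example rules.
ALLOWED_PAIRS = {("noun", "noun")}


def connectable_by_grammar(previous, current, pos=PART_OF_SPEECH):
    """Return True if the two strings may form a compound word under the rules."""
    pair = (pos.get(previous), pos.get(current))
    return pair in ALLOWED_PAIRS


print(connectable_by_grammar("特許", "提案書"))      # True
print(connectable_by_grammar("特許", "書く"))        # False
print(connectable_by_grammar("提案書", "としては"))  # False
```

Strings missing from the table are treated as not connectable here, which is a conservative choice rather than something the source prescribes.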

[0015] (4) The present invention has a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary. When a character string to be predicted is input, character strings that begin with the input character string are retrieved from the word dictionary or the character string learning table as predicted character string candidates. If this search does not yield the candidate the user intends, the character string directly input by the user is registered in the character string learning table.

[0016] With this configuration, when the user directly inputs a character string, that character string is learned as an unregistered word, and the learned character string is thereafter retrieved as one of the prediction targets. Therefore, once an unregistered word has been input, it can be predicted from its first few characters on subsequent input.
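The effect of learning a directly input string can be illustrated with a small sketch. The dictionary contents and the helper names `predict` and `learn_direct_input` are assumptions, and the duplicate check before registration is an added safeguard that the source does not mention.

```python
def predict(prefix, word_dictionary, learning_table):
    """Return strings from either source that begin with the prefix."""
    return [w for w in word_dictionary + learning_table if w.startswith(prefix)]


def learn_direct_input(text, word_dictionary, learning_table):
    """Register a directly input string unless it is already known (assumption)."""
    if text not in word_dictionary and text not in learning_table:
        learning_table.append(text)


word_dictionary = ["チャンス", "チャイム"]  # hypothetical entries
learning_table = []

before = predict("チャ", word_dictionary, learning_table)
learn_direct_input("チャールズ", word_dictionary, learning_table)
after = predict("チャ", word_dictionary, learning_table)

print(before)  # ['チャンス', 'チャイム']
print(after)   # ['チャンス', 'チャイム', 'チャールズ']
```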

[0017] (5) The present invention has a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary. When a character string to be predicted is input, character strings that begin with the input character string are retrieved from the word dictionary or the character string learning table as predicted character string candidates. A connection determination is then made between the character string selected from these candidates, or the character string directly input by the user, and the character string stored in a character string buffer by the previous search. If connection is determined to be possible, the selected character string or the directly input character string is stored in the character string buffer. If connection is determined to be impossible and plural character strings are stored in the character string buffer, the character string obtained by concatenating them is registered in the character string learning table as a compound word, and the character string buffer is then cleared.

[0018] With this configuration, the connectivity between the character string selected from the prediction candidates, or the character string directly input by the user, and the previous character string is determined. If they are connectable, the character strings are learned as a compound word, and the learned character string is thereafter retrieved as one of the prediction targets. Therefore, once a compound word that includes an unregistered word has been input, it can be predicted from its first few characters on subsequent input.

[0019] (6) The present invention, in the configuration of (5) above, is characterized in that the connection determination between the character string selected from the predicted character string candidates, or the character string directly input by the user, and the character string stored in the character string buffer by the previous search is made on the basis of character type.

[0020] With this configuration, the connection determination is made by character type. The selected character string or the directly input character string is judged connectable when it consists of a single character type other than hiragana and has the same character type as the previous character string. Otherwise it is judged not connectable.

[0021] (7) The present invention, in the configuration of (5) above, has a grammar dictionary that stores grammar information for determining connections between character strings, and is characterized in that the connection determination between the character string selected from the predicted character string candidates, or the character string directly input by the user, and the character string stored in the character string buffer by the previous search is made on the basis of the grammar information stored in the grammar dictionary.

[0022] With this configuration, the connection determination is made using grammar information. Whether the selected character string or the directly input character string can be connected to the previous character string as a compound word is determined according to grammatical rules prepared in advance, for example "a noun and a verb do not connect directly" or "if a particle is present, the result is not a compound word".

[0023]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a document creation device according to an embodiment of the present invention. The device is a Japanese word processor with a character string prediction function. In FIG. 1, an input device 101 is used to input the character string to be predicted. The input device 101 can input a mixed kanji and kana character string by, for example, recognizing handwritten characters from coordinates obtained from a handwriting tablet or the like, and outputs the input information as character codes. An output device 116 is a screen display device such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display), with which the user can carry out document creation interactively.

[0024] An input control unit 102 buffers the character data input from the input device 101 in an input buffer 103 to build up a character string and, when that character string becomes a prediction target, sends it to a character string search unit 104.

[0025] A word dictionary 105 stores a heading, a reading, and connection information for each word. A character string learning table 106 stores character strings that consist of any number of words and are not registered in the word dictionary 105. On receiving an input character string from the input control unit 102, the character string search unit 104 searches the word dictionary 105 or the character string learning table 106 for character strings that begin with the input character string and stores the resulting predicted character string candidates in a candidate buffer 108.
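The two data sources and the search over them could be modelled as in the sketch below. The `WordEntry` fields mirror the heading, reading and connection information named above, but their concrete types, the sample entries, the search order (dictionary first, then learned strings) and the duplicate handling are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    heading: str     # surface form of the word
    reading: str     # kana reading
    connection: str  # connection information, here just a part-of-speech tag


# Hypothetical contents of word dictionary 105 and learning table 106.
word_dictionary = [
    WordEntry("情報", "じょうほう", "noun"),
    WordEntry("情熱", "じょうねつ", "noun"),
    WordEntry("処理", "しょり", "noun"),
]
learning_table = ["情報処理学会"]


def search_candidates(prefix, dictionary, learned):
    """Collect strings from both sources that begin with the prefix."""
    found = [e.heading for e in dictionary if e.heading.startswith(prefix)]
    # Skip learned strings that the dictionary search already produced.
    found += [s for s in learned if s.startswith(prefix) and s not in found]
    return found


candidate_buffer = search_candidates("情", word_dictionary, learning_table)
print(candidate_buffer)  # ['情報', '情熱', '情報処理学会']
```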

[0026] A prediction candidate control unit 107 displays the predicted character strings found by the character string search unit 104, performs candidate selection processing by the user, and stores the predicted character string confirmed by the user in a character string buffer 112. The candidate buffer 108 is a memory for storing predicted character string candidates. A character string input unit 109 performs input processing for the case in which the user does not select a predicted character string candidate and instead directly inputs the remaining characters that follow the input character string. It stores the character string directly input by the user in the character string buffer 112.

[0027] A grammar dictionary 111 stores grammar information for determining connections between character strings. The character string buffer 112 stores the predicted character string that the user selected from the candidates, or the character string that the user input directly because the intended candidate was not obtained. A connection determination unit 110 refers to character types or to the grammar information stored in the grammar dictionary 111 and determines whether the candidate character string selected by the user, or the character string directly input by the user, can connect to the character string stored in the character string buffer 112 by the previous search. Depending on the grammar dictionary, the determination can be configured, for example, so that only combinations that form compound nouns are treated as connectable.

[0028] When the connection determination unit 110 determines that connection is possible, a character string registration unit 113 stores the character string selected by the user or directly input by the user in the character string buffer 112. When connection is determined to be impossible and plural character strings are stored in the character string buffer 112, the unit registers the character string obtained by concatenating them in the character string learning table 106 as a compound word and then clears the character string buffer 112.

[0029] An output control unit 114 displays the character string stored in an output buffer 115 on the output device 116. The output buffer 115 is a memory for storing the character string to be displayed on the output device 116.

[0030] Next, the operation of the embodiment is described. FIG. 2 is a flowchart showing the flow of processing in the embodiment from character input to output of the predicted character string. Character data input from the input device 101 is buffered in the input buffer 103 by the input control unit 102 until it becomes a prediction target, and forms a character string (steps 201 to 202).

[0031] When the input character string becomes a prediction target (Yes in step 203), the character string stored in the input buffer 103 is sent to the character string search unit 104. The character string search unit 104 searches the word dictionary 105 or the character string learning table 106 for character strings that begin with that character string, and the predicted character string candidates obtained by the search are stored in the candidate buffer 108 (steps 204 and 205).

[0032] The input character string becomes a prediction target (that is, prediction processing is performed) when, for example, the user presses a prediction button, or when a predetermined number of characters or a predetermined time has been reached.
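The trigger conditions listed above can be expressed as a single predicate. The threshold values and the parameter names in this sketch are hypothetical. The patent speaks only of a predetermined number of characters or a predetermined time.

```python
def should_predict(button_pressed, char_count, idle_seconds,
                   min_chars=2, max_idle_seconds=1.5):
    """Return True when any of the example trigger conditions holds.

    min_chars and max_idle_seconds are illustrative thresholds only.
    """
    return (
        button_pressed
        or char_count >= min_chars
        or idle_seconds >= max_idle_seconds
    )


print(should_predict(False, 1, 0.2))  # False
print(should_predict(True, 1, 0.2))   # True
print(should_predict(False, 2, 0.2))  # True
```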

[0033] After the search for predicted character strings is finished, the prediction candidate control unit 107 displays the predicted character string candidates and performs candidate selection processing by the user (step 206). If the user selects none of the candidates, for example because the intended candidate was not obtained, and instead directly inputs the character string that follows the input character string, the character string input unit 109 performs input processing for that character string (steps 208 to 210). That is, the character string input unit 109 stores in the input buffer 103 the character string that the user input directly through the input device 101 (all the characters constituting the desired word).

[0034] When a character string has been input directly in this way, it is judged to be an unregistered word and is registered in the character string learning table 106 by the character string registration unit 113 (step 211). The character string selected by the user or directly input by the user is temporarily stored in the output buffer 115 as the output character string (step 212).

[0035] Here, the prediction candidate control unit 107 checks the character string buffer 112. If a character string is already stored there (Yes in step 213), it instructs the connection determination unit 110 to perform connection determination processing (step 214) between the character string currently stored in the output buffer 115 (the character string selected from the predicted character string candidates or directly input) and the character string stored in the character string buffer 112 by the previous search (the character string stored last in the character string buffer 112).

[0036] This connection determination processing checks whether the character strings can be connected as a compound word. Specifically, it is performed on the basis of the character type of each character string or the grammar information stored in the grammar dictionary 111, as follows.

[0037] (a) In the connection determination method based on character type, the character string is judged connectable when it consists of a single character type other than hiragana and has the same character type as the previous character string. Otherwise it is judged not connectable. The restriction to a single character type other than hiragana is made because compound words are often input using only kanji or only katakana, whereas hiragana is often input as particles, which makes it difficult to judge whether hiragana forms part of a compound word.

[0038] (b) In the connection determination method based on grammar information, whether the character string and the previous character string can be connected as a compound word is determined according to grammatical rules prepared in advance, for example "a noun and a verb do not connect directly" or "if a particle is present, the result is not a compound word".

[0039] If this connection determination processing finds that connection is possible (Yes in step 215), the character string currently stored in the output buffer 115 (the character string selected from the predicted character string candidates or directly input) is stored at the end of the character string buffer 112 (step 216). If connection is found to be impossible (No in step 215), the prediction candidate control unit 107 checks the character string buffer 112. If plural character strings are stored there (Yes in step 217), it instructs the character string registration unit 113 to register the character string obtained by concatenating them in the character string learning table 106 as a compound word (step 218), and the character string buffer 112 is then cleared (step 219).

[0040] If no character string is stored in the character string buffer 112 in step 213, the character string stored in the output buffer 115 (the character string selected from the predicted character string candidates or directly input) is stored in the character string buffer 112 (step 220).
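The flow of steps 213 to 220 can be sketched as a small class. Several points here are assumptions that the text leaves open: the buffer is cleared even when it holds only one string, the non-connectable string itself is not kept in the buffer, and a compound already in the learning table is not registered twice. The class name `CompoundLearner` and the kanji-only predicate used in the demonstration are also hypothetical.

```python
class CompoundLearner:
    def __init__(self, is_connectable):
        self.is_connectable = is_connectable  # callable(previous, current) -> bool
        self.string_buffer = []               # stands in for character string buffer 112
        self.learning_table = []              # stands in for learning table 106

    def confirm(self, text):
        """Process one confirmed string (selected candidate or direct input)."""
        if not self.string_buffer:                               # step 213: No
            self.string_buffer.append(text)                      # step 220
        elif self.is_connectable(self.string_buffer[-1], text):  # steps 214, 215
            self.string_buffer.append(text)                      # step 216
        else:
            if len(self.string_buffer) > 1:                      # step 217
                compound = "".join(self.string_buffer)
                if compound not in self.learning_table:
                    self.learning_table.append(compound)         # step 218
            self.string_buffer.clear()                           # step 219 (assumed unconditional)


def all_kanji(s):
    """True if every character lies in the CJK Unified Ideographs block."""
    return all(0x4E00 <= ord(c) <= 0x9FFF for c in s)


def both_kanji(previous, current):
    """Simplified stand-in for the connection determination unit."""
    return all_kanji(previous) and all_kanji(current)


learner = CompoundLearner(both_kanji)
for confirmed in ["情報", "処理", "学会", "としては"]:
    learner.confirm(confirmed)

print(learner.learning_table)  # ['情報処理学会']
print(learner.string_buffer)   # []
```

Passing the connection predicate in as a callable keeps the buffer logic independent of whether character type or grammar information is used for the determination.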

[0041] After this, the output control unit 114 displays the character string stored in the output buffer 115 on the screen (step 221). Finally, the input buffer 103, the candidate buffer 108, and the output buffer 115 are initialized in preparation for the next input (step 222).

[0042] Next, the character string prediction processing described above is explained with a concrete example. First, conventional character string prediction processing is described with reference to FIG. 3. Conventionally, to input a compound word such as "情報処理学会" (Information Processing Society) that is not registered in the word dictionary 105, the user first inputs "情" and selects "情報" (information) from the prediction candidates, as shown in FIG. 3(a). Next, the user inputs "処" and selects "処理" (processing) from the prediction candidates, as shown in FIG. 3(b). Then the user inputs "学" and selects "学会" (society) from the prediction candidates, as shown in FIG. 3(c). Thus, every time "情報処理学会" is input, the above operation has to be repeated for each word.

[0043] In the present invention, by contrast, once the above operation has been performed, the desired character string can be obtained from the next input onward simply by inputting the first few characters of the compound word. The character string prediction processing of the present invention is described with reference to FIG. 4. To input a compound word such as "情報処理学会" that is not registered in the word dictionary 105, the user first inputs "情" and selects "情報" from the prediction candidates, as shown in FIG. 4(a). At this point the selected character string "情報" is stored in the character string buffer 112.

[0044] Next, the user inputs "処" and selects "処理" from the prediction candidates, as shown in FIG. 4(b). Here, a connection determination is made between the character string "情報" already stored in the character string buffer 112 and the character string "処理" selected this time. As a result, connection is determined to be possible, and the newly selected character string "処理" is appended to the end of the character string buffer 112.

[0045] Next, the user inputs "学" and selects "学会" from the prediction candidates, as shown in FIG. 4(c). Here, a connection determination is made between the character string "処理" stored last in the character string buffer 112 and the newly selected character string "学会", and "学会" is appended to the end of the character string buffer 112.

[0046] Next, suppose that 「と」 is input as shown in FIG. 4(d) and the attached word 「としては」 ("as for") is selected from the candidate character strings. Because the selected character string 「としては」 cannot form part of a compound word (compound noun), it is determined to be unconnectable. At this time, the character string buffer 112 holds the plural character strings 「情報」, 「処理」, and 「学会」, so the character string 「情報処理学会」 obtained by combining them is registered in the character string learning table 106. The input of the unconnectable character string 「としては」 also causes the character string buffer 112 to be cleared.
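The connection determination in this example can be illustrated with a minimal sketch based on character type. The function names `char_type` and `can_connect`, the Unicode ranges, and the rule that only kanji or katakana strings may extend a compound noun are assumptions made for illustration; the actual criteria used by the connection determination unit 110 are not specified at this level of detail.

```python
def char_type(ch: str) -> str:
    """Coarsely classify one character by Unicode block (illustrative only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def is_noun_like(text: str) -> bool:
    """Assumed rule: a noun-like string consists only of kanji or katakana."""
    return bool(text) and all(char_type(c) in ("kanji", "katakana") for c in text)


def can_connect(previous: str, current: str) -> bool:
    """Assumed rule: two strings connect only if both are noun-like."""
    return is_noun_like(previous) and is_noun_like(current)


print(can_connect("情報", "処理"))      # True: kanji followed by kanji
print(can_connect("学会", "としては"))  # False: 「としては」 is hiragana
```

Under these assumptions, 「としては」 is rejected because it consists of hiragana, which matches the outcome described for FIG. 4(d). A determination based on grammar information would replace `is_noun_like` with a dictionary lookup.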

[0047] In this way, the input of an unconnectable character string acts as a trigger, and the plural character strings stored in the character string buffer 112 up to that point are registered in the character string learning table 106 as a single compound word. Thereafter, when 「情」 is input again as shown in FIG. 4(e), 「情報処理学会」 is obtained as a prediction candidate by means of the character string learning table 106.
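The sequence of FIG. 4(a) to (e) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the class name `CompoundLearner`, its methods, the ordering of learned strings ahead of dictionary words, and the unconditional storage of the first string in an empty buffer are all assumptions. The connection determination is passed in as a callable, and the example uses a trivial stand-in for it.

```python
from typing import Callable, Iterable, List


class CompoundLearner:
    """Hypothetical model of the buffer 112 and learning table 106."""

    def __init__(self, word_dictionary: Iterable[str],
                 can_connect: Callable[[str, str], bool]) -> None:
        self.word_dictionary: List[str] = list(word_dictionary)
        self.learning_table: List[str] = []
        self.buffer: List[str] = []
        self.can_connect = can_connect

    def predict(self, prefix: str) -> List[str]:
        """Return strings beginning with prefix (learned strings first: an assumption)."""
        candidates: List[str] = []
        for word in self.learning_table + self.word_dictionary:
            if word.startswith(prefix) and word not in candidates:
                candidates.append(word)
        return candidates

    def confirm(self, text: str) -> None:
        """Handle a string selected from the candidates."""
        # Assumption: the first string goes into an empty buffer unconditionally.
        if not self.buffer or self.can_connect(self.buffer[-1], text):
            self.buffer.append(text)
            return
        # Unconnectable input: register the buffered strings as one compound word.
        if len(self.buffer) > 1:
            compound = "".join(self.buffer)
            if compound not in self.learning_table:
                self.learning_table.append(compound)
        self.buffer.clear()


# Stand-in connection determination: only the particle is unconnectable.
learner = CompoundLearner(
    ["情報", "処理", "学会", "としては"],
    can_connect=lambda previous, current: current != "としては",
)
for selected in ["情報", "処理", "学会", "としては"]:
    learner.confirm(selected)

print(learner.predict("情"))  # ['情報処理学会', '情報']
```

Registration happens only when the unconnectable string arrives, so the compound word is learned as a side effect of ordinary input and requires no separate registration step by the user.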

[0048] In the present invention, a character string that the user has directly input once also appears among the prediction candidates from the next input onward. FIG. 5 shows a specific example. As shown in FIG. 5(a), suppose the user inputs 「チャ」 intending the character string 「チャールズ」 ("Charles"), but 「チャールズ」 is not among the prediction candidates, that is, it is not registered in the word dictionary 105. In that case, the user inputs 「チャールズ」 directly, as shown in FIG. 5(b).

[0049] Here, the character string directly input by the user is stored in the character string buffer 112 and is registered in the character string learning table 106 as an unregistered word. Thereafter, when 「チャ」 is input again as shown in FIG. 5(c), 「チャールズ」 is obtained as a prediction candidate by means of the character string learning table 106.
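The learning of a directly input string in FIG. 5 can be sketched as below. The function names `predict` and `learn_direct_input`, the list-based tables, and the sample dictionary entry 「チャンス」 are hypothetical and chosen only for illustration.

```python
from typing import List


def predict(prefix: str, word_dictionary: List[str],
            learning_table: List[str]) -> List[str]:
    """Return strings beginning with prefix from either table (learned first: an assumption)."""
    candidates: List[str] = []
    for word in learning_table + word_dictionary:
        if word.startswith(prefix) and word not in candidates:
            candidates.append(word)
    return candidates


def learn_direct_input(text: str, word_dictionary: List[str],
                       learning_table: List[str]) -> None:
    """Register a directly input string as an unregistered word."""
    if text and text not in word_dictionary and text not in learning_table:
        learning_table.append(text)


word_dictionary = ["チャンス"]   # hypothetical dictionary content
learning_table: List[str] = []

print(predict("チャ", word_dictionary, learning_table))  # ['チャンス']
learn_direct_input("チャールズ", word_dictionary, learning_table)
print(predict("チャ", word_dictionary, learning_table))  # ['チャールズ', 'チャンス']
```

The membership check keeps the learning table limited to strings that are absent from the word dictionary, which is consistent with its description as a table for unregistered character strings.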

[0050] Furthermore, in the present invention, prediction is possible even when a compound word containing an unregistered word, such as 「チャールズ皇太子」 ("Prince Charles"), has been input. This is achieved by storing 「チャールズ」 and 「皇太子」 ("prince") in the character string buffer 112 and then performing the connection determination between the two.

[0051] The present invention is not limited to handwritten character input as described in the above embodiment and is also applicable to keyboard input. In that case as well, the effort of the user's input operations is reduced and document creation efficiency is improved in the same manner as described above.

[0052]

[Effects of the Invention] As described above, according to the present invention, the connectable character strings among the sequentially selected predicted character strings are learned as a compound word, and the learned character string is thereafter searched as one of the prediction targets. Therefore, even a compound word, once it has been input, can be predicted from its first few characters from the next input onward. This reduces the user's input operations and improves the efficiency of document creation.

[0053] Further, according to the present invention, when the user directly inputs a character string, that character string is learned as an unregistered word, and the learned character string is thereafter searched as one of the prediction targets. Therefore, even an unregistered word, once it has been input, can be predicted from its first few characters from the next input onward. This reduces the user's input operations and improves the efficiency of document creation.

[0054] Further, according to the present invention, the connectivity between the previous character string and either the character string selected from the prediction candidates or the character string directly input by the user is determined. When they are connectable, those character strings are learned as a compound word, and the learned character string is thereafter searched as one of the prediction targets. Therefore, even a compound word containing an unregistered word, once it has been input, can be predicted from its first few characters from the next input onward. This reduces the user's input operations and improves the efficiency of document creation.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the device configuration according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the character string prediction processing in the embodiment.

FIG. 3 is a specific example for explaining conventional character string prediction processing.

FIG. 4 is a specific example for explaining the character string prediction processing of the present invention.

FIG. 5 is a specific example for explaining the character string prediction processing of the present invention.

[Explanation of symbols]

101: input device
102: input control unit
103: input buffer
104: character string search unit
105: word dictionary
106: character string learning table
107: prediction candidate control unit
108: candidate buffer
109: character string input unit
110: connection determination unit
111: grammar dictionary
112: character string buffer
113: character string registration unit
114: output control unit
115: output buffer
116: output device

Claims


1. A character string prediction method using a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary, the method comprising: when a character string to be predicted is input, searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; performing a connection determination between the character string selected from the predicted character string candidates and the character string stored in a character string buffer at the previous search; when connection is determined to be possible, storing the selected character string in the character string buffer; and when connection is determined to be impossible, if a plurality of character strings are stored in the character string buffer, registering the character string obtained by combining them in the character string learning table as a compound word, and then clearing the character string buffer.

2. The character string prediction method according to claim 1, wherein the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer at the previous search is performed based on character type.

3. The character string prediction method according to claim 1, further using a grammar dictionary that stores grammar information for determining connection between character strings, wherein the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer at the previous search is performed based on the grammar information stored in the grammar dictionary.

4. A character string prediction method using a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary, the method comprising: when a character string to be predicted is input, searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; and when the search does not yield the candidate intended by the user, registering the character string directly input by the user in the character string learning table.

5. A character string prediction method using a word dictionary that stores words and a character string learning table that stores character strings not registered in the word dictionary, the method comprising: when a character string to be predicted is input, searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; performing a connection determination between either the character string selected from the predicted character string candidates or a character string directly input by the user and the character string stored in a character string buffer at the previous search; when connection is determined to be possible, storing the selected character string or the directly input character string in the character string buffer; and when connection is determined to be impossible, if a plurality of character strings are stored in the character string buffer, registering the character string obtained by combining them in the character string learning table as a compound word, and then clearing the character string buffer.

6. The character string prediction method according to claim 5, wherein the connection determination between either the character string selected from the predicted character string candidates or the character string directly input by the user and the character string stored in the character string buffer at the previous search is performed based on character type.

7. The character string prediction method according to claim 5, further using a grammar dictionary that stores grammar information for determining connection between character strings, wherein the connection determination between either the character string selected from the predicted character string candidates or the character string directly input by the user and the character string stored in the character string buffer at the previous search is performed based on the grammar information stored in the grammar dictionary.

8. A document creation device comprising: input means for inputting a character string to be predicted; a word dictionary storing words; a character string learning table storing character strings not registered in the word dictionary; character string search means for searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; a character string buffer for storing confirmed character strings; connection determination means for performing a connection determination between the character string selected from the predicted character string candidates found by the character string search means and the character string stored in the character string buffer at the previous search; and character string registration means for storing the selected character string in the character string buffer when the connection determination means determines that connection is possible, and, when connection is determined to be impossible, if a plurality of character strings are stored in the character string buffer, registering the character string obtained by combining them in the character string learning table as a compound word and then clearing the character string buffer.

9. The document creation device according to claim 8, wherein the connection determination means performs the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer at the previous search based on character type.

10. The document creation device according to claim 8, further comprising a grammar dictionary storing grammar information for determining connection between character strings, wherein the connection determination means performs the connection determination between the character string selected from the predicted character string candidates and the character string stored in the character string buffer at the previous search based on the grammar information stored in the grammar dictionary.

11. A document creation device comprising: input means for inputting a character string to be predicted; a word dictionary storing words; a character string learning table storing character strings not registered in the word dictionary; character string search means for searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; and character string registration means for registering a character string directly input by the user in the character string learning table when the character string search means does not yield the candidate intended by the user.

12. A document creation device comprising: input means for inputting a character string to be predicted; a word dictionary storing words; a character string learning table storing character strings not registered in the word dictionary; character string search means for searching the word dictionary or the character string learning table for character strings that begin with the input character string, as predicted character string candidates; a character string buffer for storing confirmed character strings; connection determination means for performing a connection determination between either the character string selected from the predicted character string candidates found by the character string search means or a character string directly input by the user and the character string stored in the character string buffer at the previous search; and character string registration means for storing the selected character string in the character string buffer when the connection determination means determines that connection is possible, and, when connection is determined to be impossible, if a plurality of character strings are stored in the character string buffer, registering the character string obtained by combining them in the character string learning table as a compound word and then clearing the character string buffer.

13. The document creation device according to claim 12, wherein the connection determination means performs the connection determination between either the character string selected from the predicted character string candidates or the character string directly input by the user and the character string stored in the character string buffer at the previous search based on character type.

14. The document creation device according to claim 12, further comprising a grammar dictionary storing grammar information for determining connection between character strings, wherein the connection determination means performs the connection determination between either the character string selected from the predicted character string candidates or the character string directly input by the user and the character string stored in the character string buffer at the previous search based on the grammar information stored in the grammar dictionary.