JP5326945B2

JP5326945B2 - Character input support device, program, and character input support method

Info

Publication number: JP5326945B2
Application number: JP2009201449A
Authority: JP
Inventors: 清司大倉; 友樹長瀬; 清志竹内; 宏志吉田; 裕一林
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-09-01
Filing date: 2009-09-01
Publication date: 2013-10-30
Anticipated expiration: 2029-09-01
Also published as: JP2011053866A

Abstract

PROBLEM TO BE SOLVED: To provide a character input support device that ranks character strings with frequently used meanings high among input prediction candidates by totaling the frequency of use for each group of character strings having the same meaning.

SOLUTION: The character input support device includes: an independent word extraction unit 205 that extracts independent words from partial character string data; a first detection unit 206 that selects partial character string data to be processed and detects other partial character string data containing the independent words contained in the processing target; a first selection unit 207 that selects partial character string data from among the detected other partial character string data and the processing target according to the length or the appearance frequency of the partial character string data, and stores the selected data in a RAM 103 as an input prediction candidate; and a frequency recalculation unit 204 that calculates the total of the appearance frequencies of the processing target and the detected other partial character string data, and records the total in the RAM 103 as the use frequency of the input prediction candidate.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a character input support device, a program, and a character input support method.

Character input support devices that assist a user's keyboard input are known. Such a device stores in a storage unit an input history of character strings entered from the keyboard in the past, refers to this history to predict the character string to be entered next, and displays the resulting input prediction candidates on a display unit. When the user designates one of the displayed input prediction candidates, the designated character string is entered into the input area. Technology of this kind is disclosed in Patent Documents 1 and 2.

Patent Document 1: Japanese Patent No. 3933952
Patent Document 2: JP 2007-334534 A

The same content may be expressed by different character strings. For example, “blood pressure: normal” (血圧：正常), “blood pressure is normal” (血圧は正常です), and “blood pressure normal” (血圧正常) all express the same content. Conventional character input support devices total the frequency for each individual character string and do not total it according to the meaning or content that the strings express. As a result, content that is used frequently but is expressed by several different character strings, or a character string expressing that content, is difficult to rank high among the input prediction candidates, which reduces user convenience.

The present invention has been made in view of the above circumstances. Its object is to provide a character input support device, a program, and a character input support method that can rank character strings with frequently used meanings high among the input prediction candidates by totaling the frequency of use for each group of character strings having the same meaning.

The character input support device disclosed in this specification comprises: conversion means that analyzes a plurality of character string data whose input was accepted by input accepting means and which are stored in storage means, and converts each character string data into a plurality of partial character string data, each containing at least two of the words or phrases contained in that character string data; frequency counting means that counts the appearance frequency, in the plurality of character string data, of the partial character string data produced by the conversion means; independent word extraction means that extracts independent words from the partial character string data produced by the conversion means; first detection means that selects partial character string data to be processed from among the partial character string data produced by the conversion means and, referring to the extraction result of the independent word extraction means, detects other partial character string data containing the independent words contained in the processing target as partial character string data having the same meaning as the processing target; first selection means that selects partial character string data, from among the other partial character string data detected by the first detection means and the processing target, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and stores the selected partial character string data in the storage means as an input prediction candidate that supports character string input; frequency calculation means that calculates the total of the appearance frequencies of the processing target and the other partial character string data detected by the first detection means, and stores the total in the storage means as the use frequency of the input prediction candidate; and selection means that, when the input accepting means accepts input of character string data, selects partial character string data containing the accepted character string data on the basis of the use frequency stored in the storage means, and causes display means to display the selected partial character string data.
The character input support device disclosed in this specification extracts independent words from partial character string data and detects partial character string data containing the same independent words as partial character string data having the same meaning. It then selects one of the partial character string data having the same meaning and assigns to it, as its use frequency, the sum of the use frequencies of all the partial character string data having that meaning. Use frequency is thus totaled for each group of partial character string data having the same meaning, so character strings with frequently used meanings can be ranked high among the input prediction candidates.
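
The grouping described above can be pictured with a minimal Python sketch. It is not the patented implementation: the function name `aggregate_by_meaning`, the input layout, the tie-breaking rule, and the counts in the usage example are all assumptions made for illustration. The source states only that the representative is chosen by length or appearance frequency.

```python
from collections import defaultdict


def aggregate_by_meaning(entries):
    """Group partial strings by their independent words and total the counts.

    entries: iterable of (text, independent_words, appearance_count).
    Returns {representative_text: use_frequency}.
    """
    groups = defaultdict(list)
    for text, independent_words, count in entries:
        # Strings sharing the same set of independent words are treated
        # as having the same meaning.
        groups[frozenset(independent_words)].append((text, count))

    candidates = {}
    for members in groups.values():
        # Assumption: keep the most frequent member, preferring the longer
        # string on a tie. The source only says length or frequency is used.
        representative = max(members, key=lambda m: (m[1], len(m[0])))[0]
        # The use frequency of the representative is the group total.
        candidates[representative] = sum(count for _, count in members)
    return candidates
```

With hypothetical counts of 3, 5, and 2 for “blood pressure: normal”, “blood pressure is normal”, and “blood pressure normal”, the three strings collapse into the single candidate “blood pressure is normal” with a use frequency of 10, which then outranks any individual string counted on its own.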

According to the character input support device disclosed in this specification, totaling the frequency of use for each group of character strings having the same meaning makes it possible to rank character strings with frequently used meanings high among the input prediction candidates.

FIG. 1 is a block diagram showing an example of the configuration of the computer device.
FIG. 2 is a diagram showing an example of the functional blocks configured in the control unit by program control.
FIG. 3 is a block diagram showing an example of the configuration of the input prediction candidate DB creation unit shown in FIG. 2.
FIG. 4 is a diagram showing an example of the record information recorded in the RAM, in which the phrase strings obtained by dividing character strings are recorded by number of phrases.
FIG. 5(A) shows the result of counting how often the two-phrase strings appear in the character string data recorded as input history; FIG. 5(B) shows the same count for the three-phrase strings; FIG. 5(C) shows the two-phrase strings of (A) sorted with the first character as the key; and FIG. 5(D) shows the three-phrase strings of (B) sorted with the first character as the key.
FIG. 6 is a diagram showing an example of the record information finally recorded in the RAM.
FIG. 7 is a flowchart showing the overall processing procedure of the control unit.
FIG. 8 is a flowchart showing the detailed procedure of step S3 shown in FIG. 7.
FIG. 9 is a flowchart showing the detailed procedure of step S4 shown in FIG. 7.
FIG. 10 is a flowchart showing the detailed procedure of step S5 shown in FIG. 7.
FIG. 11 is a diagram showing a first display example of input prediction candidates shown on the display.
FIG. 12 is a diagram showing a second display example of input prediction candidates shown on the display.
FIG. 13 is a diagram showing a third display example of input prediction candidates shown on the display: (A) is an example of a screen requesting input of a narrowing word that further narrows down the character strings to be displayed, and (B) shows the top input prediction candidates displayed according to the narrowing word entered.
FIG. 14 is a diagram showing a fourth display example of input prediction candidates shown on the display: (A) shows the character string entered by the user together with narrowing words for narrowing down the input prediction candidates, and (B) and (C) show the input prediction candidates displayed when one of the narrowing words is selected.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

First, the configuration of a computer device 1 equipped with the character input support device of the present invention is described with reference to FIG. 1.
The computer device 1 includes a control unit 100, a storage unit 104, a graphic board 111, interfaces (hereinafter abbreviated as I/F) 113 and 114, a network interface (hereinafter abbreviated as network I/F) 118, a display 112, and an operation unit 115. The control unit 100 includes a CPU 101, a ROM 102, and a RAM 103. The operation unit 115 includes a keyboard 116 and a mouse 117.

The control unit 100 is described first. The CPU (Central Processing Unit) 101 governs the operation of the entire device. The ROM (Read Only Memory) 102 stores in advance a control program for operating the computer device 1 as a character input support device, together with various other programs. The RAM (Random Access Memory) 103 stores data that the CPU 101 uses in its calculations, the results of those calculations, and the like.

The storage unit 104 is a storage device, such as an HDD (Hard Disk Drive), provided in the computer device 1. It stores a character string history database (hereinafter, database is written as DB) that records the history of character strings entered from the keyboard in the past. It also stores a kana-kanji conversion candidate DB, in which a dictionary for converting entered kana characters into kanji is registered, and an input prediction candidate DB for predicting the next candidates to be entered on the basis of the input history of previously entered character strings. In addition, it stores a dictionary needed for morphological analysis, a synonym dictionary for extracting synonyms of independent words, and a dictionary in which symbols representing the meanings of words are registered.
The storage unit 104 is not limited to an HDD. It may be a computer-readable portable storage medium such as a flexible disk (FD), DVD (Digital Versatile Disc), DVD-RAM, CD-R (Recordable)/RW (ReWritable), magneto-optical disk, or IC card.

The graphic board 111 controls the display of various information on the display 112. The I/F 113 sends character information accepted by the keyboard 116 to the CPU 101. The I/F 114 sends input information accepted by the mouse 117 to the CPU 101. The network I/F 118 sends and receives communication data to and from external devices connected to a network 119.

The CPU 101, ROM 102, RAM 103, storage unit 104, graphic board 111, I/Fs 113 and 114, and network I/F 118 are connected to a system bus 105. The CPU 101 can therefore access the ROM 102, RAM 103, and storage unit 104, display various information on the display 112 via the graphic board 111, receive information entered by operating the keyboard 116 and mouse 117, and send and receive communication data via the network I/F 118.

Next, the functional blocks of the control unit 100 realized by program control are described with reference to FIGS. 2 and 3. A functional block groups, by function, the processing realized through the cooperation of hardware in the control unit 100, such as the CPU 101 and RAM 103, with the control program stored in the ROM 102.
As shown in FIG. 2, the control unit 100 includes an input prediction candidate DB creation unit 200, a kana-kanji conversion processing unit 250, an input prediction candidate search unit 300, and a display control unit 350.

The input prediction candidate DB creation unit 200 creates the input prediction candidate DB stored in the storage unit 104. The storage unit 104 holds a history of the character strings that the user has entered from the keyboard 116 in the past. The input prediction candidate DB creation unit 200 refers to this history and selects the character strings to be shown in the display area of the display 112 as input prediction candidates. The selected character strings are recorded in the input prediction candidate DB as input prediction candidates. The details of the input prediction candidate DB creation unit 200 are described later with reference to FIG. 3.

The kana-kanji conversion processing unit 250 refers to the kana-kanji conversion candidate DB stored in the storage unit 104 and performs kanji conversion of the kana characters accepted by the keyboard 116. When kana characters are entered from the keyboard 116, the unit refers to the kana-kanji conversion candidate DB and selects conversion candidates for converting the entered kana into kanji. The selected conversion candidates are shown on the display 112 under the control of the display control unit 350.

The input prediction candidate search unit 300 refers to the input prediction candidate DB described above and selects input prediction candidates for the character string accepted by the keyboard 116. The selected candidates are output from the input prediction candidate search unit 300 to the display control unit 350 and shown on the display 112 under the control of the display control unit 350.
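
As a rough illustration of this lookup, the sketch below treats the input prediction candidate DB as a plain dictionary from candidate string to use frequency. The function name `search_candidates`, the `limit` parameter, the substring match, and the sample data are assumptions for illustration and are not taken from the source.

```python
def search_candidates(typed, candidate_db, limit=5):
    """Return candidates containing the typed text, most frequently used first.

    candidate_db: {candidate_text: use_frequency} (hypothetical layout).
    """
    matches = [(text, freq) for text, freq in candidate_db.items() if typed in text]
    # Rank by use frequency, highest first; sorted() is stable, so
    # candidates with equal frequency keep their DB order.
    ranked = sorted(matches, key=lambda match: match[1], reverse=True)
    return [text for text, _ in ranked[:limit]]
```

Because the stored use frequency is a group total rather than a per-string count, a candidate representing a frequently used meaning rises to the top even when each individual wording was typed only a few times.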

The display control unit 350 causes the display 112 to show the conversion candidates selected by the kana-kanji conversion processing unit 250 and the input prediction candidates selected by the input prediction candidate search unit 300. Examples of how conversion candidates and input prediction candidates appear on the display 112 are described later.

Next, the input prediction candidate DB creation unit 200 is described in detail with reference to FIG. 3. It includes a conversion unit 201, a frequency counting unit 202, a third detection unit 203, a frequency recalculation unit 204, an independent word extraction unit 205, a first detection unit 206, a first selection unit 207, a synonym extraction unit 208, a second detection unit 209, and a second selection unit 210.

The conversion unit 201 reads and analyzes the history of past keyboard input registered in a character string history DB 401, which is stored in the storage unit 104. Referring to the morphological analysis dictionary stored in a dictionary DB 402, which is also stored in the storage unit 104, the conversion unit 201 performs morphological analysis of each character string recorded as input history and, from the result, divides each character string into phrases. Symbols such as “:” and “?” are each split off as one phrase. Next, from the character string divided into phrases, the conversion unit 201 extracts runs of two phrases, three phrases, and so on (hereinafter called phrase strings; a phrase string corresponds to the partial character string of the present invention). Only phrases that are consecutive (connected) in the character string are extracted together, and the maximum number of phrases extracted equals the number of phrases in the target character string.
For example, suppose the character string is “Compared with last week's visit, blood pressure is a little high” (先週の受診時に比べて、血圧が少し高めです). This string contains six phrases: (last week's), (at the visit), (compared with), (blood pressure), (a little), and (rather high). As phrase strings of two phrases (hereinafter, two-phrase strings), the conversion unit 201 can therefore extract five: (last week's / at the visit), (at the visit / compared with), (compared with / blood pressure), (blood pressure / a little), and (a little / rather high). Similarly, as phrase strings of three phrases (hereinafter, three-phrase strings), it can extract four: (last week's / at the visit / compared with), (at the visit / compared with / blood pressure), (compared with / blood pressure / a little), and (blood pressure / a little / rather high). Repeating the same procedure, the conversion unit 201 extracts the two- to six-phrase strings. In this specification, “/” marks the boundary between phrases.
FIG. 4 shows the result of dividing the four character strings “blood pressure: normal”, “blood pressure is normal”, “blood pressure normal”, and “blood pressure: abnormal” into phrases, together with the phrase strings created from those phrases. The conversion unit 201 sends the generated phrase string data to the third detection unit 203, the frequency counting unit 202, the independent word extraction unit 205, and the synonym extraction unit 208. It also sends the result of the morphological analysis of the character strings to the independent word extraction unit 205, and stores the generated phrase strings in the RAM 103.
Although this description has the conversion unit 201 convert character strings into phrase strings, it may instead divide them into word strings, with the word as the unit. Like a phrase string, a word string is a run of several words that are consecutive (connected) in the character string.
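
The extraction of consecutive phrase strings amounts to taking every contiguous window over the list of phrases. The following Python sketch assumes the morphological analysis and phrase segmentation have already been done and that the phrases arrive as a list; the function names are hypothetical.

```python
def phrase_strings(phrases, n):
    """Return every run of n consecutive phrases as a tuple."""
    return [tuple(phrases[i:i + n]) for i in range(len(phrases) - n + 1)]


def all_phrase_strings(phrases):
    """Return {n: n-phrase strings} for n from 2 up to the number of phrases."""
    return {n: phrase_strings(phrases, n) for n in range(2, len(phrases) + 1)}


# English glosses of the six phrases in the example sentence.
phrases = [
    "last week's", "at the visit", "compared with",
    "blood pressure", "a little", "rather high",
]
by_length = all_phrase_strings(phrases)
print(len(by_length[2]), len(by_length[3]), len(by_length[6]))  # 5 4 1
```

A string of k phrases yields k - n + 1 phrase strings of length n, which matches the five two-phrase strings and four three-phrase strings in the example.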

The frequency counting unit 202 acquires the phrase strings that the conversion unit 201 extracted from the character strings, and acquires the input history character strings from the character string history DB 401. It counts how often each phrase string acquired from the conversion unit 201 appears in the character strings recorded as input history in the character string history DB 401, and records the counted appearance frequency of each phrase string in the RAM 103. FIG. 5(A) shows the result of counting the appearance frequency of the two-phrase strings shown in FIG. 4, and FIG. 5(B) shows the result for the three-phrase strings shown in FIG. 4. FIG. 5(C) shows the two-phrase strings sorted by first character, and FIG. 5(D) shows the three-phrase strings sorted by first character. The frequency counting unit 202 outputs the appearance frequency of each phrase string to the third detection unit 203, the first detection unit 206, and the second detection unit 209.
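
The counting and sorting steps can be sketched as follows, assuming the input history is already segmented into phrases. The function names and the segmentation used in the usage example are illustrative assumptions; the actual contents of FIGS. 4 and 5 are not reproduced here.

```python
from collections import Counter


def count_phrase_strings(segmented_history, n):
    """Count how often each n-phrase string appears across the history.

    segmented_history: list of phrase lists, one per recorded character string.
    """
    counts = Counter()
    for phrases in segmented_history:
        for i in range(len(phrases) - n + 1):
            counts[tuple(phrases[i:i + n])] += 1
    return counts


def sort_by_first_character(counts):
    """Sort (phrase_string, count) pairs using the first character as the key."""
    # sorted() is stable, so entries with the same first character keep
    # the order in which they were first counted.
    return sorted(counts.items(), key=lambda item: item[0][0][:1])
```

For a history segmented as `["blood pressure", ":", "normal"]`, `["blood pressure is", "normal"]`, `["blood pressure", "normal"]`, and `["blood pressure", ":", "abnormal"]`, the two-phrase string `("blood pressure", ":")` is counted twice and every other two-phrase string once.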

The third detection unit 203 acquires from the conversion unit 201 the phrase strings extracted from the character strings, and acquires from the frequency counting unit 202 the appearance frequency data of each phrase string. It first selects a phrase string to be processed from those acquired from the conversion unit 201. It then detects other phrase strings that contain more phrases than the processing target and that contain the phrases of the processing target. In the example of FIG. 4, the two-phrase string “blood pressure / :” is contained in the three-phrase strings “blood pressure / : / normal” and “blood pressure / : / abnormal”. Therefore, when “blood pressure / :” is selected as the processing target, the third detection unit 203 detects the three-phrase strings “blood pressure / : / normal” and “blood pressure / : / abnormal” as the other phrase strings.
On detecting another phrase string, the third detection unit 203 updates the record information in the RAM 103. FIG. 6 shows an example of this record information. The RAM 103 records, in association with one another, an identification number identifying the phrase string, the phrase string itself, its appearance frequency, the meaning it expresses, summary information, and a deletion flag. The meaning expressed by a phrase string is described later. Summary information indicates how the corresponding phrase string has been processed.
On detecting another phrase string, the third detection unit 203 records, in the record field of the processing-target phrase string, a deletion flag indicating that the target has been excluded from the input prediction candidates. As summary information for the processing target, it records that the target has been merged into the detected other phrase string; as summary information for the detected other phrase string, it records that this string has absorbed the processing target. The example in FIG. 6 shows the phrase strings with identification numbers 3 and 6 merged into the phrase string “blood pressure is / normal” (血圧は／正常です) with identification number 4. As FIG. 6 also shows, a deletion flag may be recorded for the phrase string with identification number 4 as well, and a direct copy of it newly recorded as the phrase string with identification number 7. In that case, the appearance frequency of the phrase string with identification number 7 is the total of the appearance frequencies of the phrase strings with identification numbers 3, 4, and 6. These rules are detailed later with reference to the flowcharts.
The third detection unit 203 performs the above processing for all phrase strings acquired from the conversion unit 201.
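
A hedged sketch of this detection and bookkeeping follows. The `Record` layout loosely mirrors the fields listed for FIG. 6, but the class, the function names, and the wording of the summary entries are hypothetical, and treating "contains" as "contains as a consecutive run" is an assumption drawn from the FIG. 4 example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    ident: int                      # identification number
    phrases: tuple                  # the phrase string
    frequency: int                  # appearance frequency
    deleted: bool = False           # deletion flag
    summary: list = field(default_factory=list)  # summary information


def contains(longer, shorter):
    """True if `longer` has more phrases and holds `shorter` as a consecutive run."""
    n = len(shorter)
    if len(longer) <= n:
        return False
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def fold_into_longer(target, records):
    """Flag `target` as deleted if longer phrase strings contain it."""
    found = [r for r in records if contains(r.phrases, target.phrases)]
    if found:
        target.deleted = True
        target.summary.append("merged into " + ", ".join(str(r.ident) for r in found))
        for r in found:
            r.summary.append(f"absorbed {target.ident}")
    return found
```

Running `fold_into_longer` with the record for `("blood pressure", ":")` as the target against records for `("blood pressure", ":", "normal")` and `("blood pressure", ":", "abnormal")` flags the target as deleted and notes the merge on all three records.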

The frequency recalculation unit 204 acquires from the third detection unit 203 the appearance frequency of the other phrase string detected by the third detection unit 203 and the appearance frequency of the processing-target phrase string. It adds the two and records the sum in the RAM 103 as the appearance frequency of the other phrase string. In the example of FIG. 6, the frequency recalculation unit 204 calculates the total of the appearance frequencies of the phrase strings with identification numbers 3, 4, and 6, and records that total as the appearance frequency of the phrase string with identification number 7.
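
The FIG. 6 case, in which the merged entry is written as a new record, could be sketched as below. The table layout and function name are assumptions; the identification numbers follow the text, while the frequencies and the phrase strings given for numbers 3 and 6 in the usage example are made up for illustration.

```python
def merge_into_new_record(table, member_ids, representative_id, new_id):
    """Total the members' frequencies and record the result under new_id.

    table: {identification_number: {"phrases", "frequency", "deleted"}}
    """
    total = sum(table[i]["frequency"] for i in member_ids)
    for i in member_ids:
        # Every merged member, including the representative, is flagged
        # so that only the new record remains a candidate.
        table[i]["deleted"] = True
    table[new_id] = {
        "phrases": table[representative_id]["phrases"],  # copied as-is
        "frequency": total,
        "deleted": False,
    }
    return total


# Hypothetical contents: only the identification numbers come from the text.
table = {
    3: {"phrases": ("blood pressure", ":", "normal"), "frequency": 3, "deleted": False},
    4: {"phrases": ("blood pressure is", "normal"), "frequency": 5, "deleted": False},
    6: {"phrases": ("blood pressure", "normal"), "frequency": 2, "deleted": False},
}
merge_into_new_record(table, member_ids=[3, 4, 6], representative_id=4, new_id=7)
```

After the call, record 7 holds the phrase string of record 4 with the combined frequency of records 3, 4, and 6, and the three original records carry the deletion flag.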

The independent word extraction unit 205 acquires, from the conversion unit 201, the result of the morphological analysis of the character string and the phrase strings that the conversion unit 201 extracted from the character string.
The independent word extraction unit 205 refers to the acquired morphological analysis result and extracts the independent words from each phrase string. It then obtains a symbol representing the meaning of each extracted independent word by referring to the dictionary DB 402, in which a dictionary recording independent words together with symbols indicating their meanings is registered. The symbol representing a meaning may be a word in another language such as English, or a concept identifier used in the EDR electronic dictionary of the Japan Electronic Dictionary Research Institute (EDR). A concept identifier is a numerical representation of the concept that a word expresses. The following description uses English words as the symbols representing the meanings of independent words. For example, for the phrase string “open the window”, the independent words are “window” and “open”, and the symbols representing their meanings are “WINDOW” and “OPEN”. The independent word extraction unit 205 sorts the obtained symbols in alphabetical order, sends the sorted result to the first detection unit 206 and the second detection unit 209, and also records the sorted result in the RAM 103. In the example illustrated in FIG. 6, the symbol representing the meaning of the phrase string “: / abnormal” is “ABNORMAL”.

The first detection unit 206 acquires, from the third detection unit 203, the phrase strings for which no deletion flag is recorded in the RAM 103. The first detection unit 206 also acquires the appearance frequency of each phrase string from the frequency counting unit 202, and acquires the phrase strings and the independent words they contain from the independent word extraction unit 205.
The first detection unit 206 selects, from the phrase strings acquired from the third detection unit 203, a phrase string to be processed (hereinafter referred to as the phrase string to be processed). Next, based on the information acquired from the independent word extraction unit 205, the first detection unit 206 identifies the independent words contained in the selected phrase string. Again using that information, it then detects, among the phrase strings acquired from the third detection unit 203, other phrase strings that contain the same independent words as the phrase string to be processed. When such other phrase strings are detected, the first detection unit 206 sends the phrase string to be processed, the detected other phrase strings, and the appearance frequencies of these phrase strings to the first selection unit 207.
The first detection unit 206 performs the above processing for all the phrase strings acquired from the third detection unit 203.

The first selection unit 207 acquires, from the first detection unit 206, the phrase string to be processed, the detected other phrase string, and the appearance frequencies of these phrase strings. According to the length or the appearance frequency of the phrase strings, the first selection unit 207 selects either the phrase string to be processed or the detected other phrase string, and records the selected phrase string in the RAM 103 as an input prediction candidate.
The first selection unit 207 first compares the length of the phrase string to be processed with the length of the detected other phrase string. It selects the longer phrase string as the input prediction candidate and selects the shorter one as the phrase string to be excluded from the input prediction candidates. If the two phrase strings have the same length, the first selection unit 207 selects the phrase string with the lower appearance frequency as the one to be excluded. In the following, a phrase string excluded from the input prediction candidates is referred to as the excluded phrase string, and a phrase string not excluded is referred to as the retained phrase string.
The first selection unit 207 rewrites the information recorded in the RAM 103 so that the record column of the excluded phrase string holds a deletion flag indicating that this phrase string has been excluded from the input prediction candidates. The first selection unit 207 also records, as summary information of the excluded phrase string, that it has been merged into the retained phrase string, and records, as summary information of the retained phrase string, that the excluded phrase string has been merged into it. The first selection unit 207 then sends the appearance frequency of the phrase string to be processed and the appearance frequency of the detected other phrase string to the frequency recalculation unit 204.
The frequency recalculation unit 204 adds the appearance frequency of the phrase string to be processed and the appearance frequency of the detected phrase string, and registers the total in the RAM 103 as the frequency of the retained phrase string.
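The selection rule of the first selection unit 207 and the subsequent frequency recalculation can be sketched roughly as follows. This is only an illustration: `PhraseRecord` and `select_and_merge` are hypothetical names, length is assumed to be measured in phrases, and the case where both length and frequency are equal (which the description leaves open) is resolved here by keeping the phrase string to be processed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhraseRecord:
    """One row of the table held in RAM 103 (hypothetical layout)."""
    ident: int
    phrases: List[str]
    frequency: int
    deleted: bool = False                 # deletion flag
    merged_into: Optional[int] = None     # summary info of an excluded record
    merged_from: List[int] = field(default_factory=list)  # summary info of a retained record


def select_and_merge(target: PhraseRecord, other: PhraseRecord) -> PhraseRecord:
    """Keep the longer phrase string; on equal length keep the more frequent one."""
    if len(target.phrases) != len(other.phrases):
        if len(target.phrases) > len(other.phrases):
            keep, drop = target, other
        else:
            keep, drop = other, target
    elif target.frequency >= other.frequency:
        # Assumption: a full tie keeps the phrase string to be processed.
        keep, drop = target, other
    else:
        keep, drop = other, target

    total = target.frequency + other.frequency  # frequency recalculation
    drop.deleted = True
    drop.merged_into = keep.ident
    keep.merged_from.append(drop.ident)
    keep.frequency = total
    return keep
```

The retained record ends up carrying the combined frequency, so a meaning that is typed often in several surface forms ranks higher than any single form would on its own.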

The synonym extraction unit 208 acquires the phrase strings and the independent words they contain from the independent word extraction unit 205. Referring to the dictionary DB 402, the synonym extraction unit 208 obtains the headword of each independent word acquired from the independent word extraction unit 205. A synonym dictionary is recorded in the dictionary DB 402. In the synonym dictionary, each headword is registered together with the words that are synonymous with it. For example, under the headword 「疲れ」 (tiredness), the words 「疲れ」, 「疲労」 (fatigue), and 「疲労感」 (feeling of fatigue) are registered as synonyms. Similarly, under the headword 「翻訳」 (translation), the synonyms 「翻訳」 and 「訳出」 (rendering a translation) are registered. According to this synonym dictionary, the two-phrase string 「翻訳で疲れた」 (“tired from translation”) and the two-phrase string 「訳出で疲労した」 (“fatigued from rendering a translation”) have the same headwords. These two phrase strings can therefore be judged to have the same headwords, that is, the same meaning. The synonym extraction unit 208 sends the headwords of the independent words to the second detection unit 209.
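As a rough illustration of the headword lookup, the sketch below inverts a small synonym dictionary and maps each independent word to its headword. The table contents come from the example above; the names `SYNONYMS`, `HEADWORD_OF`, and `headword_key` are hypothetical, and the independent words are assumed to have already been reduced to the forms that appear in the dictionary (for instance 「疲れ」 rather than 「疲れた」).

```python
# Headword -> registered synonyms, following the example in the description.
SYNONYMS = {
    "疲れ": ["疲れ", "疲労", "疲労感"],
    "翻訳": ["翻訳", "訳出"],
}

# Inverted index: any registered word -> its headword.
HEADWORD_OF = {word: head for head, words in SYNONYMS.items() for word in words}


def headword_key(independent_words):
    """Replace each independent word by its headword (unknown words map to themselves)."""
    return tuple(HEADWORD_OF.get(word, word) for word in independent_words)
```

With this table, `headword_key(["翻訳", "疲れ"])` and `headword_key(["訳出", "疲労"])` yield the same tuple, which is the condition under which two phrase strings are treated as having the same meaning.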

The second detection unit 209 acquires, from the third detection unit 203, the phrase strings for which no deletion flag is recorded in the RAM 103. The second detection unit 209 also acquires the appearance frequency of each phrase string from the frequency counting unit 202. From the independent word extraction unit 205 it acquires the phrase strings, the independent words they contain, and the symbols representing the meanings of those independent words. It further acquires the headwords of the independent words from the synonym extraction unit 208.
The second detection unit 209 first selects, from the phrase strings acquired from the third detection unit 203, a phrase string to be processed (hereinafter referred to as the phrase string to be processed). Next, based on the information acquired from the independent word extraction unit 205, it obtains the symbols representing the meanings of the independent words contained in the selected phrase string. It then detects, among the phrase strings acquired from the third detection unit 203, other phrase strings that contain the same symbols as those of the phrase string to be processed. When such other phrase strings are detected, the second detection unit 209 sends the phrase string to be processed, the detected other phrase strings, and the appearance frequencies of these phrase strings to the second selection unit 210.

In addition, based on the information acquired from the synonym extraction unit 208, the second detection unit 209 obtains the headwords of the independent words contained in the phrase string to be processed. It then detects, among the phrase strings acquired from the third detection unit 203, other phrase strings that contain the same headwords as those of the phrase string to be processed. When such other phrase strings are detected, the second detection unit 209 sends the phrase string to be processed, the detected other phrase strings, and the appearance frequencies of these phrase strings to the second selection unit 210.
The second detection unit 209 performs the above processing for all the phrase strings acquired from the third detection unit 203.
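The two detection criteria of the second detection unit 209 can be sketched as a single lookup. In this illustration, a phrase string is assumed to match when either its sorted meaning-symbol key or its headword key equals that of the phrase string to be processed; the function name and both mapping arguments are hypothetical.

```python
def detect_same_meaning(target_id, meaning_keys, headword_keys):
    """Return ids of other phrase strings whose meaning key or headword key matches.

    meaning_keys / headword_keys: dicts mapping a phrase-string id to its key
    (only phrase strings without a deletion flag are assumed to be present).
    """
    matches = []
    for ident in meaning_keys:
        if ident == target_id:
            continue
        same_symbols = meaning_keys[ident] == meaning_keys[target_id]
        same_headwords = headword_keys[ident] == headword_keys[target_id]
        if same_symbols or same_headwords:
            matches.append(ident)
    return matches
```

Each match, together with the phrase string to be processed and the two frequencies, would then be handed to the second selection unit 210.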

The second selection unit 210 acquires, from the second detection unit 209, the phrase string to be processed, the detected other phrase string, and the appearance frequencies of these phrase strings. According to the length or the appearance frequency of the phrase strings, the second selection unit 210 selects either the phrase string to be processed or the detected other phrase string, and records the selected phrase string in the RAM 103 as an input prediction candidate.
The second selection unit 210 first compares the length of the phrase string to be processed with the length of the detected other phrase string. It selects the longer phrase string as the input prediction candidate and selects the shorter one as the phrase string to be excluded from the input prediction candidates. If the two phrase strings have the same length, the second selection unit 210 selects the phrase string with the lower appearance frequency as the one to be excluded. Here too, a phrase string excluded from the input prediction candidates is referred to as the excluded phrase string, and a phrase string not excluded is referred to as the retained phrase string.
The second selection unit 210 rewrites the information recorded in the RAM 103 so that the record column of the excluded phrase string holds a deletion flag indicating that this phrase string has been excluded from the input prediction candidates. The second selection unit 210 also records, as summary information of the excluded phrase string, that it has been merged into the retained phrase string, and records, as summary information of the retained phrase string, that the excluded phrase string has been merged into it.
The second selection unit 210 then sends the appearance frequency of the phrase string to be processed and the appearance frequency of the detected other phrase string to the frequency recalculation unit 204. The frequency recalculation unit 204 adds the two appearance frequencies and registers the total as the frequency of the retained phrase string.

Next, the processing flow of the control unit 100 will be described with reference to the flowcharts shown in FIGS. 7 to 10.
First, the control unit 100 receives an input from the keyboard 116, reads the input history of character strings stored in the storage unit 104, and stores it in the RAM 103. The control unit 100 then analyzes the character strings stored in the RAM 103 one sentence at a time. It performs analysis such as morphological analysis on each character string and, based on the analysis result, divides the character string into phrases (or words) (step S1). The control unit 100 also treats symbols such as “:” and “?” as individual phrases. Next, the control unit 100 extracts multi-phrase strings from the character string divided into phrases (step S1). Only consecutive (adjacent) phrases are extracted as a multi-phrase string, and the maximum number of phrases in an extracted string equals the number of phrases contained in the target character string.
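The extraction of multi-phrase strings in step S1 amounts to enumerating contiguous runs of phrases. A minimal sketch, assuming the sentence has already been split into phrases by a morphological analyzer and that a multi-phrase string contains at least two phrases (`phrase_strings` and `min_len` are hypothetical names):

```python
def phrase_strings(phrases, min_len=2):
    """Enumerate every run of consecutive phrases, shortest runs first.

    The longest run is the whole sentence, so the maximum number of phrases
    in a run equals the number of phrases in the input.
    """
    n = len(phrases)
    return [
        tuple(phrases[start:start + length])
        for length in range(min_len, n + 1)
        for start in range(n - length + 1)
    ]
```

For the three phrases “blood pressure”, “:”, “normal”, this yields two two-phrase strings and one three-phrase string.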

Next, the control unit 100 calculates the appearance frequency of each phrase string (step S2) and records the calculated frequencies in the RAM 103. The control unit 100 also sorts the phrase strings of each length, such as two-phrase strings and three-phrase strings, using the first character as the key, and records them in the RAM 103.
The control unit 100 thus records in the RAM 103 the morphological analysis result of each character string recorded as input history, the phrase strings sorted by first character for each number of phrases, the appearance frequency of each phrase string, and so on.
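Step S2 can be illustrated with a counter and a per-length grouping. This sketch assumes the multi-phrase strings have already been extracted as tuples of phrases; `count_and_group` is a hypothetical name, and ordering by the Unicode code point of the first character stands in for whatever collation the device actually uses.

```python
from collections import Counter, defaultdict


def count_and_group(extracted):
    """Count each phrase string, then group by phrase count and sort by first character.

    extracted: iterable of phrase strings, each a tuple of phrases.
    Returns (frequency table, {phrase count: sorted list of phrase strings}).
    """
    freq = Counter(extracted)
    groups = defaultdict(list)
    for phrase_string in freq:
        groups[len(phrase_string)].append(phrase_string)
    for bucket in groups.values():
        # Sort key: first character of the first phrase (stable for ties).
        bucket.sort(key=lambda ps: ps[0][:1])
    return freq, dict(groups)
```

Feeding it the phrase strings of “blood pressure : normal” and “blood pressure : abnormal” gives a frequency of 2 for the two-phrase string “blood pressure / :” and 1 for each three-phrase string.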

Next, the control unit 100 detects the phrase strings to be excluded from the candidate phrase strings that are displayed on the display 112 as input prediction candidates when a character string is input from the keyboard 116 (step S3). Details of this processing will be described later with reference to the flowchart shown in FIG. 8. This processing is mainly executed by the third detection unit 203 and the frequency recalculation unit 204 shown in FIG. 3.

Next, the control unit 100 analyzes the meanings expressed by the phrase strings and collects phrase strings having the same meaning into one group (step S4). Details of this processing will also be described later with reference to the flowchart shown in FIG. 9. This processing is mainly executed by the second detection unit 209 and the second selection unit 210 shown in FIG. 3.

Next, the control unit 100 creates a simplified notation for each phrase string and collects phrase strings having the same simplified notation into one group (step S5). This processing will also be described later with reference to the flowchart shown in FIG. 10. The simplified notation is a notation consisting only of the independent words taken from the phrase string. This processing is mainly executed by the first detection unit 206 and the first selection unit 207 shown in FIG. 3.

Finally, the control unit 100 stores the information recorded in the RAM 103, shown in FIG. 6, in the storage unit 104 and ends the processing (step S6).

The procedure for detecting the phrase strings to be excluded from the candidate phrase strings displayed on the display 112 as input prediction candidates (details of step S3 shown in FIG. 7) will be described with reference to the flowchart shown in FIG. 8.
First, the control unit 100 sets the initial value of the variable n, which identifies the phrase strings by their number of phrases, to n = 2 (step S11). Next, the control unit 100 selects, from among the n-phrase strings (initially the two-phrase strings), a phrase string to be processed (hereinafter referred to as the phrase string to be processed). The control unit 100 then sets the variable α to the appearance frequency of the phrase string to be processed and sets the variable frec, which holds the total appearance frequency, to 0 (step S12). The variable frec accumulates the total appearance frequency of the phrase strings that are collected (grouped) by the processing described below. The appearance frequency of each phrase string has already been calculated in step S2 shown in FIG. 7.

Next, the control unit 100 sets a variable m, which identifies phrase strings other than the phrase string to be processed that contain more phrases than it does, to m = n + 1 (step S13). Here m represents the number of phrases contained in a phrase string. For example, if the phrase string to be processed is a two-phrase string (n = 2), the phrase strings examined start from the three-phrase strings, which contain more phrases than a two-phrase string. In the following, a phrase string with the number of phrases indicated by the variable m is referred to as an m-phrase string.
The control unit 100 detects, among the m-phrase strings, an m-phrase string that contains the phrase string to be processed (step S14). For example, if the phrase string to be processed is “blood pressure:” (a two-phrase string), the three-phrase strings “blood pressure: normal” and “blood pressure: abnormal” contain the phrase string to be processed. When such an m-phrase string is detected (step S14/YES), the control unit 100 adds the appearance frequency of the detected m-phrase string to the variable frec, which accumulates the total appearance frequency (step S15). Next, the control unit 100 determines whether there is another m-phrase string for which the determination of step S14 has not yet been made (step S16). If there is (step S16/YES), the control unit 100 repeats the processing from step S14. If there is not (step S16/NO), the control unit 100 adds 1 to the value of m (step S17). The control unit 100 then determines whether the value of m has become larger than the number of phrases in the phrase string with the largest number of phrases (step S18). If it has not (step S18/NO), the control unit 100 repeats the processing from step S14. If it has (step S18/YES), the control unit 100 determines whether the value of the variable α matches the value of the variable frec (step S19). If the values match (step S19/YES), the control unit 100 records a deletion flag in the record column of the phrase string to be processed in the RAM 103 (step S20).
A match between the variable α and the variable frec means that the appearance frequency of the phrase string to be processed equals the total appearance frequency of the detected m-phrase strings. In other words, whenever the user entered the phrase string to be processed, the user always went on to enter one of the detected m-phrase strings. For example, in the example shown in FIGS. 5C and 5D, the three-phrase strings containing the two-phrase string “blood pressure / :” are “blood pressure / : / abnormal” and “blood pressure / : / normal”. The total appearance frequency of these three-phrase strings is 2, and the appearance frequency of the two-phrase string “blood pressure / :” is also 2. This shows that, after entering “blood pressure:”, the user enters either “blood pressure: abnormal” or “blood pressure: normal”. The control unit 100 therefore records, in the record column of the phrase string “blood pressure / :” in the RAM 103, a deletion flag indicating that it is excluded from the candidate phrase strings displayed as input prediction candidates.
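The loop of steps S11 to S20 can be sketched as follows, under the assumption that the frequency table maps each phrase string (a tuple of phrases) to its appearance frequency. The names `contains` and `flag_subsumed` are hypothetical; `alpha` and `frec` mirror the variables α and frec of the flowchart.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a run of consecutive phrases inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def flag_subsumed(freq):
    """Return the phrase strings that would receive a deletion flag in step S20."""
    deleted = set()
    max_len = max((len(ps) for ps in freq), default=0)
    for target, alpha in freq.items():           # alpha: frequency of the target
        frec = 0                                  # total frequency of longer matches
        for m in range(len(target) + 1, max_len + 1):
            for other, count in freq.items():
                if len(other) == m and contains(other, target):
                    frec += count                 # step S15
        if frec == alpha:                         # step S19
            deleted.add(target)                   # step S20
    return deleted
```

With the frequencies from the “blood pressure” example, the two-phrase string “blood pressure / :” is flagged because its frequency (2) equals the summed frequency of the three-phrase strings that contain it, while the three-phrase strings themselves are kept.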

Next, the control unit 100 determines whether there is another phrase string with the same number of phrases as indicated by the variable n (step S21). For example, if the value of the variable n is 2, the control unit 100 determines whether there is another two-phrase string that has not yet been selected as the phrase string to be processed. If it determines that another such phrase string exists (step S21/YES), the control unit 100 selects it as the phrase string to be processed and repeats the processing from step S12. If it determines that no other phrase string exists (step S21/NO), the control unit 100 adds 1 to the value of the variable n (step S22) and determines whether the value of n now equals the number of phrases in the phrase string with the largest number of phrases (step S23). If they are equal (step S23/YES), the control unit 100 ends this flow. If they are not equal (step S23/NO), the control unit 100 repeats the processing from step S12.

Next, the processing of analyzing the meanings expressed by the phrase strings and collecting phrase strings having the same meaning into one group (details of step S4 shown in FIG. 7) will be described with reference to the flowchart shown in FIG. 9.
For the phrase strings other than those for which a deletion flag was recorded in the RAM 103 in step S20 shown in FIG. 8, the control unit 100 extracts the independent words from each phrase string (step S31). Since morphological analysis was performed in step S1 shown in FIG. 7 and its result was recorded in the RAM 103, the control unit 100 refers to that recorded result to extract the independent words from the phrase string (step S31).

Next, the control unit 100 obtains a symbol representing the meaning of each extracted independent word (step S32). A dictionary recording the meanings of independent words is registered in advance in the dictionary DB 402 of the storage unit 104, and the control unit 100 refers to this dictionary to obtain the symbols. Here, the case where English words are used as the symbols representing the meanings of independent words is described as an example. For example, for the phrase string “open the window”, the independent words are “window” and “open”, and the symbols representing their meanings are “WINDOW” and “OPEN”.

Next, the control unit 100 sorts the obtained symbols in alphabetical order (step S33). In the example above, the symbols are “WINDOW” and “OPEN”, so the sorted result is OPEN_WINDOW. In this specification, “_” is used as a symbol indicating that words are concatenated. The control unit 100 records the sorted result in the RAM 103.
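Steps S32 and S33 can be sketched as a dictionary lookup followed by a sort and a join. The two-entry `MEANING` table below stands in for the dictionary in the dictionary DB 402 and is purely illustrative, as is the function name `meaning_key`; a word missing from the table raises `KeyError` in this sketch.

```python
# Illustrative stand-in for the meaning dictionary: independent word -> symbol.
MEANING = {"ウィンドウ": "WINDOW", "開く": "OPEN"}


def meaning_key(independent_words, dictionary=MEANING):
    """Look up each word's symbol, sort alphabetically, and join with '_'."""
    symbols = sorted(dictionary[word] for word in independent_words)
    return "_".join(symbols)


print(meaning_key(["ウィンドウ", "開く"]))  # OPEN_WINDOW
```

Because the symbols are sorted before joining, the key does not depend on the order in which the independent words appear in the phrase string, so phrase strings with the same words in a different order compare equal.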

Next, the control unit 100 refers to the synonym dictionary stored in the storage unit 104 and obtains the headwords of the independent words extracted in step S31 (step S34). In the synonym dictionary, each headword is registered together with the words that are synonymous with it.

Next, the control unit 100 selects a phrase string to be processed (hereinafter referred to as phrase string A) (step S35). The control unit 100 then detects a phrase string (hereinafter referred to as phrase string B) that has the same meaning as phrase string A, or whose headwords match the headwords of the independent words contained in phrase string A (step S36). Since the symbols (English words) representing the meanings of the independent words in each phrase string were obtained and sorted alphabetically in steps S32 and S33, the control unit 100 compares the symbols of phrase string A with those of the other phrase strings and detects the phrase strings whose symbols match. Likewise, since the headwords of the independent words in each phrase string were obtained in step S34, the control unit 100 detects other phrase strings whose headwords match those of phrase string A. The number of phrase strings B is not limited to one.

If no phrase string B having the same meaning as phrase string A or matching its headwords can be detected (step S37/NO), the control unit 100 proceeds to step S43. In step S43, the control unit 100 determines whether, among the phrase strings for which no deletion flag is recorded, any phrase string remains that has not yet been selected as phrase string A. If such a phrase string remains (step S43/NO), the control unit 100 repeats the processing from step S35. If none remains (step S43/YES), the control unit 100 ends this flow.

If phrase string B is detected in step S37 (step S37/YES), the control unit 100 groups phrase string B and phrase string A together. In the grouping process, either phrase string A or phrase string B is selected, and the selected string becomes the phrase string that represents the group. A deletion flag is recorded in the RAM 103 record field of the string that was not selected. The appearance frequencies of phrase strings A and B are added together, and the sum is recorded in the appearance frequency field of the phrase string selected to represent the group.

The control unit 100 first compares the appearance frequencies of phrase string A and phrase string B (step S38). If the appearance frequency of the detected phrase string B is higher than that of phrase string A being processed (step S38/YES), the control unit 100 creates a record field for a phrase string C in the RAM 103 and copies the record information of phrase string B into it. The control unit 100 also records a deletion flag in the RAM 103 record fields of phrase strings A and B (step S39). It then adds the appearance frequencies of phrase strings A and B and records the sum in the RAM 103 as the appearance frequency of phrase string C (step S40). If the two appearance frequencies are equal, or the appearance frequency of phrase string A is higher than that of phrase string B (step S38/NO), the control unit 100 creates a record field for a phrase string D in the RAM 103 and copies the record information of phrase string A into it. The control unit 100 also records a deletion flag in the RAM 103 record fields of phrase strings A and B (step S41), then adds the two appearance frequencies and records the sum in the RAM 103 as the appearance frequency of phrase string D (step S42). The control unit 100 then proceeds to step S43 described above.

If phrase strings A and B have the same appearance frequency, the longer phrase string, that is, the one containing more phrases, may instead be kept as the input prediction candidate, and a deletion flag may be recorded for the one containing fewer phrases.
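
Steps S38 to S42, together with the optional tie-break by length, can be sketched as below. The record layout (a dict with `text`, `n_phrases`, `freq`, and `deleted`) and the names `merge_pair` and `prefer_longer_on_tie` are hypothetical choices made for this illustration, not part of the specification.

```python
def merge_pair(a, b, prefer_longer_on_tie=False):
    """Create the representative record (phrase string C or D) for A and B."""
    if b["freq"] > a["freq"]:
        source = b                                  # step S38/YES: copy B
    elif (prefer_longer_on_tie and b["freq"] == a["freq"]
          and b["n_phrases"] > a["n_phrases"]):
        source = b                                  # optional tie-break: keep the longer one
    else:
        source = a                                  # step S38/NO: copy A
    # The new record carries the summed appearance frequency (steps S40 and S42).
    representative = dict(source, freq=a["freq"] + b["freq"], deleted=False)
    a["deleted"] = True                             # steps S39 and S41
    b["deleted"] = True
    return representative


a = {"text": "ウィンドウを開く", "n_phrases": 2, "freq": 3, "deleted": False}
b = {"text": "窓を開ける", "n_phrases": 2, "freq": 5, "deleted": False}
c = merge_pair(a, b)  # text "窓を開ける", freq 8
```

Creating a new record rather than updating A or B in place follows the description above, in which both originals receive a deletion flag.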

Next, with reference to the flowchart shown in FIG. 10, the process of creating a simplified notation for each phrase string and grouping together the phrase strings that share the same simplified notation (the details of step S5 shown in FIG. 7) will be described.

First, the control unit 100 refers to the morphological analysis results recorded in the RAM 103 and records a deletion flag in the record field of every phrase string that ends with a comma and in which the word immediately before the comma is not a particle (step S51). For example, a deletion flag is recorded for a phrase string such as "血圧が正常であった、" ("blood pressure was normal,").

Next, the control unit 100 refers to the morphological analysis results recorded in the RAM 103 and, for each phrase string without a deletion flag, extracts the independent words from the phrase string (step S52). The control unit 100 then creates a simplified notation R by joining only the extracted independent words. For example, the simplified notation R of "ウィンドウを開く" ("open the window") is "ウィンドウ開く", and the simplified notation R of "血圧：正常" ("blood pressure: normal") is "血圧正常".
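
Steps S51 and S52 can be illustrated with a small sketch. It assumes that morphological analysis has already produced `(surface, part_of_speech)` pairs; the tag names, the `INDEPENDENT` tag set, and both function names are hypothetical and hand-picked for this example.

```python
INDEPENDENT = {"noun", "verb", "adjective"}  # assumed tags for independent words


def ends_with_comma_after_non_particle(tokens):
    """Step S51: the phrase string ends with a comma and the word before it is not a particle."""
    return (len(tokens) >= 2
            and tokens[-1][0] in ("、", ",")
            and tokens[-2][1] != "particle")


def simplified_notation(tokens):
    """Step S52: join only the independent words to form the simplified notation R."""
    return "".join(surface for surface, pos in tokens if pos in INDEPENDENT)


open_window = [("ウィンドウ", "noun"), ("を", "particle"), ("開く", "verb")]
notation = simplified_notation(open_window)
print(notation)  # ウィンドウ開く
```

The particle "を" is dropped because it is not an independent word, which is why "ウィンドウを開く" and "ウィンドウ開く" end up with the same notation.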

Next, the control unit 100 selects a phrase string to be processed (hereinafter, phrase string E) (step S53). The control unit 100 then detects a phrase string (hereinafter, phrase string F) whose simplified notation R matches that of phrase string E (step S54). More than one phrase string F may be detected. If a phrase string F with a matching simplified notation is detected (step S55/YES), the control unit 100 compares the appearance frequencies of phrase strings E and F, selects the one with the higher appearance frequency, and merges the other into the selected phrase string as a group. That is, the phrase string with the lower appearance frequency is excluded from the input prediction candidates, and its appearance frequency is added to that of the phrase string with the higher appearance frequency.

If the appearance frequency of phrase string F is higher than that of phrase string E (step S56/YES), the control unit 100 creates a record field for a phrase string G in the RAM 103 and copies the record information of phrase string F into it. The control unit 100 also records a deletion flag in the RAM 103 record fields of phrase strings E and F (step S57). It then adds the appearance frequencies of phrase strings E and F and records the sum in the RAM 103 as the appearance frequency of phrase string G (step S58). If the two appearance frequencies are equal, or the appearance frequency of phrase string E is higher than that of phrase string F (step S56/NO), the control unit 100 creates a record field for a phrase string H in the RAM 103 and copies the record information of phrase string E into it. The control unit 100 also records a deletion flag in the RAM 103 record fields of phrase strings E and F (step S59). It then adds the appearance frequencies of phrase strings E and F and records the sum in the RAM 103 as the appearance frequency of phrase string H (step S60).

The control unit 100 then proceeds to step S61, where it determines whether, among the phrase strings for which no deletion flag is recorded, any phrase string remains that has not yet been selected as phrase string E. If such a phrase string remains (step S61/NO), the control unit 100 repeats the processing from step S53. If none remains (step S61/YES), the control unit 100 ends this flow.
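
The outcome of steps S53 to S61 can be approximated with the sketch below. It is a simplification under stated assumptions: instead of recording deletion flags pair by pair, it collects all phrase strings with the same simplified notation in one pass and returns only the surviving representatives, and it compares the original (not yet summed) frequencies. The function name and the record fields are hypothetical.

```python
def group_by_notation(records):
    """Return one representative per simplified notation, with summed frequency."""
    groups = {}
    for rec in records:                       # dicts keep insertion order
        groups.setdefault(rec["notation"], []).append(rec)

    representatives = []
    for notation, members in groups.items():
        best = members[0]
        for rec in members[1:]:
            if rec["freq"] > best["freq"]:    # strict comparison: on a tie the earlier one stays
                best = rec
        representatives.append({
            "text": best["text"],
            "notation": notation,
            "freq": sum(r["freq"] for r in members),
        })
    return representatives


records = [
    {"text": "血圧：正常", "notation": "血圧正常", "freq": 4},
    {"text": "血圧は正常", "notation": "血圧正常", "freq": 6},
    {"text": "ウィンドウを開く", "notation": "ウィンドウ開く", "freq": 2},
]
grouped = group_by_notation(records)
# "血圧は正常" represents its group with freq 10; "ウィンドウを開く" keeps freq 2.
```

The strict comparison mirrors step S56, where the phrase string selected first is kept when the two frequencies are equal.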

If, in step S55, no phrase string F whose simplified notation matches that of phrase string E can be detected (step S55/NO), the control unit 100 proceeds to step S62. In step S62, the control unit 100 determines whether there is a phrase string (hereinafter, phrase string I) whose simplified notation begins with the simplified notation of phrase string E, that is, whether there is a phrase string that contains the simplified notation of phrase string E. When the control unit 100 detects such a phrase string I (step S63/YES), it creates a record field for a phrase string J in the RAM 103 and copies the record information of phrase string I into it (step S64). It also records a deletion flag in the RAM 103 record fields of phrase strings E and I (step S64). It then adds the appearance frequencies of phrase strings E and I and records the sum in the RAM 103 as the appearance frequency of phrase string J (step S65). The control unit 100 then proceeds to step S61 described above.
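
The prefix match of steps S62 to S65 can be sketched as follows. The record fields and the name `merge_by_prefix` are hypothetical, and stopping at the first matching phrase string I is an assumption, since the text does not say how several prefix matches would be handled.

```python
def merge_by_prefix(e, records):
    """Return phrase string J if some record's notation starts with E's notation, else None."""
    for cand in records:
        if cand is e or cand["deleted"]:
            continue
        if (cand["notation"] != e["notation"]               # exact matches belong to step S55
                and cand["notation"].startswith(e["notation"])):
            # Step S64: copy the record of I; step S65: store the summed frequency.
            j = dict(cand, freq=e["freq"] + cand["freq"], deleted=False)
            e["deleted"] = True
            cand["deleted"] = True
            return j
    return None


e = {"text": "血圧：正常", "notation": "血圧正常", "freq": 4, "deleted": False}
i = {"text": "血圧は正常範囲", "notation": "血圧正常範囲", "freq": 3, "deleted": False}
j = merge_by_prefix(e, [e, i])  # text "血圧は正常範囲", freq 7
```

Note that the longer phrase string I, not E, supplies the text of the representative, matching the description above.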

The control unit 100 stores the information recorded in the RAM 103 through the above procedure in the storage unit 104 as the input prediction candidate DB. When the control unit 100 receives a character string from the keyboard 116, it refers to the input prediction candidate DB stored in the storage unit 104 and displays the input prediction candidates for the received character string on the display 112.

Each phrase string recorded in the input prediction candidate DB as an input prediction candidate is obtained by grouping phrase strings that have the same meaning and selecting and registering one phrase string from the group. The frequency of the selected phrase string is the total of the use frequencies of the phrase strings in the group. Use frequencies can therefore be aggregated per group of character strings with the same meaning, and character strings belonging to a frequently used meaning group can be ranked high among the input prediction candidates.

A phrase string is generated by extracting a plurality of phrases from one sentence. In other words, the phrase strings displayed as input prediction candidates are not delimited by the character string entered between one press of the Enter key on the keyboard 116 and the next. Each sentence entered by the user is therefore treated as a unit, and useful, frequently occurring phrase strings can be extracted from within that sentence.

In the above description, a deletion flag is recorded for each phrase string excluded from the input prediction candidates so that it is not displayed as an input prediction candidate. However, when only a small number of candidates are available for display, the input prediction candidates may be searched for among all phrase strings, including those for which a deletion flag is recorded.
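
The lookup against the input prediction candidate DB, including the fallback to flagged phrase strings, can be sketched as below. The function name, the record fields, and the `limit` and `min_results` thresholds are hypothetical; the specification does not give concrete values for when the number of candidates counts as small.

```python
def predict(query, db, limit=5, min_results=2):
    """Return candidate texts containing `query`, ordered by descending frequency."""
    def search(include_deleted):
        hits = [r for r in db
                if query in r["text"] and (include_deleted or not r["deleted"])]
        return sorted(hits, key=lambda r: r["freq"], reverse=True)[:limit]

    hits = search(include_deleted=False)
    if len(hits) < min_results:          # too few candidates: also search flagged records
        hits = search(include_deleted=True)
    return [r["text"] for r in hits]


db = [
    {"text": "血圧異常あり", "freq": 7, "deleted": False},
    {"text": "血圧：正常", "freq": 5, "deleted": False},
    {"text": "血圧は正常", "freq": 2, "deleted": True},
]
candidates = predict("血圧", db)  # ['血圧異常あり', '血圧：正常']
```

With the query "正常" only one unflagged record matches, so the sketch falls back to the full set and also returns the flagged "血圧は正常".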

Next, display examples of the input prediction candidates shown on the display 112 will be described.

In the first display example, shown in FIG. 11, a main window 501 and a sub-window 502 are displayed side by side. The main window 501 displays the conversion candidates obtained when the kana-kanji conversion processing unit 250 converts the character string entered by the user ("けつあつ" (ketsuatsu) in the example of FIG. 11). When one of the conversion candidates displayed in the main window 501 (for example, "血圧" (blood pressure)) is selected with the mouse 117 or the like, the sub-window 502 displays the input prediction candidates for the selected conversion candidate. For example, when the conversion candidate displayed in the main window 501 by the kana-kanji conversion processing unit 250 is "血圧", the input prediction candidates "血圧異常あり" ("blood pressure abnormal") and "血圧：正常" ("blood pressure: normal") are displayed in the sub-window 502. These input prediction candidates are phrase strings that contain the characters "血圧" and rank high in appearance frequency.

In the second display example, shown in FIG. 12, the main window 501 is the only window displayed, and both the conversion candidate and the input prediction candidates are displayed in it. In the example of FIG. 12, only one conversion candidate produced by the kana-kanji conversion processing unit 250 is displayed, and the input prediction candidates for that conversion candidate are displayed below it.

In the third display example, shown in FIG. 13, the main window 501 is displayed as in the first display example of FIG. 11, but the sub-window 502 shows a display 503 that prompts the user to enter a narrowing word for narrowing down the input prediction candidates to be displayed. When a narrowing word is entered in the sub-window 502, the input prediction candidate search unit 300 searches the input prediction candidate DB using the entered narrowing word as a search key. From the phrase strings that contain both the conversion candidate displayed in the main window 501 and the narrowing word, the input prediction candidate search unit 300 extracts those that rank highest in appearance frequency (the top two in the example of FIG. 13) and displays them in the sub-window 502.
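
The narrowing search of the third display example can be sketched as follows. The sketch assumes the input prediction candidate DB is available as a mapping from phrase string to appearance frequency; the function name `narrow`, the sample data, and the frequencies are hypothetical.

```python
def narrow(db, candidate, narrowing_word, top=2):
    """Return the `top` most frequent phrase strings containing both words."""
    hits = [(text, freq) for text, freq in db.items()
            if candidate in text and narrowing_word in text]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [text for text, _ in hits[:top]]


db = {
    "血圧異常あり": 7,
    "血圧：正常": 5,
    "血圧正常範囲": 4,
    "血圧は正常": 2,
    "体温正常": 9,
}
shown = narrow(db, "血圧", "正常")  # ['血圧：正常', '血圧正常範囲']
```

"体温正常" has the highest frequency in the sample data but is excluded because it does not contain the conversion candidate "血圧".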

In the fourth display example, shown in FIG. 14(A), the main window 501 is displayed as in the first display example of FIG. 11. The sub-window 502 displays candidates for the words that were entered following the conversion candidate displayed in the main window 501 when that conversion candidate was entered. The input prediction candidate search unit 300 searches the input prediction candidate DB for phrase strings that contain the word of the conversion candidate. It extracts the independent words contained in the detected phrase strings and displays them in the sub-window 502 as narrowing words. The input prediction candidate search unit 300 also displays in the sub-window 502 the number of phrase strings that contain both the conversion candidate word and the narrowing word.

When one of the narrowing words displayed in the sub-window 502 is selected by operating the mouse 117 or the like, the phrase strings containing the selected narrowing word and the conversion candidate word are displayed. As shown in FIG. 14(B), a space may be opened within the sub-window 502 and a window 504 for displaying the phrase strings may be placed in that space. Alternatively, as shown in FIG. 14(C), the window 504 may be displayed on the side of the sub-window 502 that does not overlap the main window 501.
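
How the narrowing words and their counts in the fourth display example might be derived is sketched below. It assumes each record in the input prediction candidate DB already stores the independent words of its phrase string in a `words` list; that field, the function name, and ordering the result by count are assumptions of this illustration.

```python
from collections import Counter


def narrowing_words(db, candidate):
    """Count, per independent word, the phrase strings that contain it together with `candidate`."""
    counts = Counter()
    for rec in db:
        if candidate in rec["words"]:
            for word in set(rec["words"]) - {candidate}:
                counts[word] += 1
    return counts.most_common()


db = [
    {"text": "血圧異常あり", "words": ["血圧", "異常"]},
    {"text": "血圧：正常", "words": ["血圧", "正常"]},
    {"text": "血圧は正常", "words": ["血圧", "正常"]},
    {"text": "体温正常", "words": ["体温", "正常"]},
]
words = narrowing_words(db, "血圧")  # [('正常', 2), ('異常', 1)]
```

Each pair holds a narrowing word and the number of phrase strings that would be listed if that word were selected.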

The input prediction candidates displayed in the main window 501 or the sub-window 502 may also be displayed with a character font or background color that varies with the appearance frequency. For example, an input prediction candidate with an appearance frequency of 10 or more is displayed in a bold font, and its background in the sub-window is highlighted in red. An input prediction candidate with an appearance frequency of 2 or less is displayed in a gray font.
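
The frequency-dependent styling can be expressed as a small mapping. The thresholds 10 and 2 come from the example above; the function name and the style keys (`weight`, `color`, `background`) are hypothetical and not tied to any particular GUI toolkit.

```python
def style_for(freq):
    """Pick display attributes for a candidate from its appearance frequency."""
    if freq >= 10:
        return {"weight": "bold", "color": "default", "background": "red"}
    if freq <= 2:
        return {"weight": "normal", "color": "gray", "background": "default"}
    return {"weight": "normal", "color": "default", "background": "default"}


frequent = style_for(12)  # bold text on a red background
rare = style_for(1)       # gray text
```

Candidates with a frequency between 3 and 9 fall through to the default style in this sketch.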

The embodiment described above is a preferred embodiment of the present invention. The invention is not limited to it, however, and various modifications are possible without departing from the gist of the present invention.

［付記１］
入力受付手段によって入力を受け付け、記憶手段に記憶した複数の文字列データを解析して、各文字列データを、該各文字列データに含まれる単語又は文節を少なくとも２以上含んだ複数の部分文字列データに変換する変換手段と、
前記変換手段で変換された部分文字列データの、前記複数の文字列データ中での出現頻度を計数する頻度計数手段と、
前記変換手段により変換された部分文字列データから、自立語を抽出する自立語抽出手段と、
前記変換手段により変換された部分文字列データの中から処理対象の部分文字列データを選択し、前記自立語抽出手段の抽出結果を参照して、前記処理対象の部分文字列データに含まれる自立語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第１検出手段と、
前記第１検出手段で検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数手段で計数した出現頻度に応じて部分文字列データを選択し、選択した部分文字列データを、文字列の入力を支援する入力予測候補として前記記憶手段に記憶させる第１選択手段と、
前記処理対象の部分文字列データと、前記第１検出手段で検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させる頻度算出手段と、
前記入力受付手段により文字列データの入力を受け付けた場合に、前記記憶手段に記憶した使用頻度に基づいて、入力を受け付けた前記文字列データを含む部分文字列データを選出し、選出した部分文字列データを表示手段に表示させる選出手段と、
を備えることを特徴とする文字入力支援装置。
［付記２］
同じ意味を示す同義語を登録した辞書を記憶した辞書記憶手段と、
前記辞書記憶手段を参照して、前記自立語抽出手段で抽出された自立語の同義語を抽出する同義語抽出手段と、
前記変換手段で変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる自立語の同義語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第２検出手段と、
前記第２検出手段で検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数手段で計数した出現頻度に応じて部分文字列データを選択し、前記記憶手段に前記入力予測候補として記憶させる第２選択手段とを備え、
前記頻度算出手段は、前記処理対象の部分文字列データと、前記第２検出手段で検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記１記載の文字入力支援装置。
［付記３］
前記変換手段で変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる単語数又は文節数よりも多くの単語又は文節を含んだ部分文字列データであって、前記処理対象の部分文字列データに含まれる単語又は文節を含む他の部分文字列データを検出し、検出した他の部分文字列データを、前記入力予測候補として前記記憶手段に記憶させる第３検出手段を備え、
前記頻度算出手段は、前記第３検出手段で検出された他の部分文字列データと、前記処理対象の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記１又は２記載の文字入力支援装置。
［付記４］
コンピュータを、文字入力を支援する文字入力支援システムとして動作させるプログラムであって、
前記コンピュータを、
入力受付手段によって入力を受け付け、記憶手段に記憶した複数の文字列データを解析して、各文字列データを、該各文字列データに含まれる単語又は文節を少なくとも２以上含んだ複数の部分文字列データに変換する変換手段と、
前記変換手段で変換された部分文字列データの、前記複数の文字列データ中での出現頻度を計数する頻度計数手段と、
前記変換手段により変換された部分文字列データから、自立語を抽出する自立語抽出手段と、
前記変換手段により変換された部分文字列データの中から処理対象の部分文字列データを選択し、前記自立語抽出手段の抽出結果を参照して、前記処理対象の部分文字列データに含まれる自立語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第１検出手段と、
前記第１検出手段で検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数手段で計数した出現頻度に応じて部分文字列データを選択し、選択した部分文字列データを、文字列の入力を支援する入力予測候補として前記記憶手段に記憶させる第１選択手段と、
前記処理対象の部分文字列データと、前記第１検出手段で検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させる頻度算出手段と、
前記入力受付手段により文字列データの入力を受け付けた場合に、前記記憶手段に記憶した使用頻度に基づいて、入力を受け付けた前記文字列データを含む部分文字列データを選出し、選出した部分文字列データを表示手段に表示させる選出手段として機能させることを特徴とするプログラム。
［付記５］
前記コンピュータを、
同じ意味を示す同義語を登録した辞書を参照して、前記自立語抽出手段で抽出された自立語の同義語を抽出する同義語抽出手段と、
前記変換手段で変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる自立語の同義語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第２検出手段と、
前記第２検出手段で検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数手段で計数した出現頻度に応じて部分文字列データを選択し、前記記憶手段に前記入力予測候補として記憶させる第２選択手段として機能させ、
前記頻度算出手段は、前記処理対象の部分文字列データと、前記第２検出手段で検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記４記載のプログラム。
［付記６］
前記コンピュータを、
前記変換手段で変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる単語数又は文節数よりも多くの単語又は文節を含んだ部分文字列データであって、前記処理対象の部分文字列データに含まれる単語又は文節を含む他の部分文字列データを検出し、検出した他の部分文字列データを、前記入力予測候補として前記記憶手段に記憶させる第３検出手段として機能させ、
前記頻度算出手段は、前記第３検出手段で検出された他の部分文字列データと、前記処理対象の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記４又は５記載のプログラム。
［付記７］
文字入力の支援を行うコンピュータ装置で実行される文字入力支援方法であって、
入力受付手段によって入力を受け付け、記憶手段に記憶した複数の文字列データを解析して、各文字列データを、該各文字列データに含まれる単語又は文節を少なくとも２以上含んだ複数の部分文字列データに変換する変換ステップと、
前記変換ステップで変換された部分文字列データの、前記複数の文字列データ中での出現頻度を計数する頻度計数ステップと、
前記変換ステップにより変換された部分文字列データから、自立語を抽出する自立語抽出ステップと、
前記変換ステップにより変換された部分文字列データの中から処理対象の部分文字列データを選択し、前記自立語抽出ステップの抽出結果を参照して、前記処理対象の部分文字列データに含まれる自立語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第１検出ステップと、
前記第１検出ステップで検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数ステップで計数した出現頻度に応じて部分文字列データを選択し、選択した部分文字列データを、文字列の入力を支援する入力予測候補として前記記憶手段に記憶させる第１選択ステップと、
前記処理対象の部分文字列データと、前記第１検出ステップで検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させる頻度算出ステップと、
前記入力受付手段により文字列データの入力を受け付けた場合に、前記記憶手段に記憶した使用頻度に基づいて、入力を受け付けた前記文字列データを含む部分文字列データを選出し、選出した部分文字列データを表示手段に表示させる選出ステップとを前記コンピュータ装置が実行することを特徴とする文字入力支援方法。
［付記８］
前記コンピュータ装置は、
同じ意味を示す同義語を登録した辞書を参照して、前記自立語抽出ステップで抽出された自立語の同義語を抽出する同義語抽出ステップと、
前記変換ステップで変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる自立語の同義語を含む他の部分文字列データを、前記処理対象の部分文字列データと同じ意味を持った部分文字列データとして検出する第２検出ステップと、
前記第２検出ステップで検出した前記他の部分文字列データと前記処理対象の部分文字列データとのうち、部分文字列データの長さ、又は前記頻度計数ステップで計数した出現頻度に応じて部分文字列データを選択し、前記記憶手段に前記入力予測候補として記憶させる第２選択ステップとをさらに実行し、
前記頻度算出ステップは、前記処理対象の部分文字列データと、前記第２検出ステップで検出した前記他の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記７記載の文字入力支援方法。
［付記９］
前記コンピュータ装置は、
前記変換ステップで変換された部分文字列データの中から処理対象の部分文字列データを選択し、選択した部分文字列データに含まれる単語数又は文節数よりも多くの単語又は文節を含んだ部分文字列データであって、前記処理対象の部分文字列データに含まれる単語又は文節を含む他の部分文字列データを検出し、検出した他の部分文字列データを、前記入力予測候補として前記記憶手段に記憶させる第３検出ステップをさらに実行し、
前記頻度算出ステップは、前記第３検出ステップで検出された他の部分文字列データと、前記処理対象の部分文字列データとの出現頻度の合計値を算出し、算出した合計値を前記入力予測候補の使用頻度として前記記憶手段に記憶させることを特徴とする付記７又は８記載の文字入力支援方法。
[Appendix 1]
A character input support device comprising:
conversion means for analyzing a plurality of character string data items that have been received by input receiving means and stored in storage means, and converting each character string data item into a plurality of partial character string data items each containing at least two of the words or phrases included in that character string data item;
frequency counting means for counting the appearance frequency, in the plurality of character string data items, of the partial character string data converted by the conversion means;
independent word extraction means for extracting independent words from the partial character string data converted by the conversion means;
first detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means and, by referring to the extraction result of the independent word extraction means, detecting other partial character string data that includes an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
first selection means for selecting partial character string data, from among the other partial character string data detected by the first detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as an input prediction candidate that supports character string input;
frequency calculation means for calculating the total of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the first detection means, and storing the calculated total in the storage means as the use frequency of the input prediction candidate; and
selection means for, when input of character string data is received by the input receiving means, selecting partial character string data that includes the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.
[Appendix 2]
The character input support device according to appendix 1, further comprising:
dictionary storage means for storing a dictionary in which synonyms having the same meaning are registered;
synonym extraction means for referring to the dictionary storage means and extracting synonyms of the independent words extracted by the independent word extraction means;
second detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, and detecting other partial character string data that includes a synonym of an independent word included in the selected partial character string data, as partial character string data having the same meaning as the partial character string data to be processed; and
second selection means for selecting partial character string data, from among the other partial character string data detected by the second detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing it in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates the total of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the second detection means, and stores the calculated total in the storage means as the use frequency of the input prediction candidate.
[Appendix 3]
The character input support device according to appendix 1 or 2, further comprising third detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, detecting other partial character string data that contains more words or phrases than the selected partial character string data and that includes a word or phrase included in the partial character string data to be processed, and storing the detected other partial character string data in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates the total of the appearance frequencies of the other partial character string data detected by the third detection means and the partial character string data to be processed, and stores the calculated total in the storage means as the use frequency of the input prediction candidate.
[Appendix 4]
A program for causing a computer to operate as a character input support system that supports character input, the program causing the computer to function as:
Conversion means for analyzing a plurality of character string data that have been received by input receiving means and stored in storage means, and converting each character string data into a plurality of partial character string data each including at least two words or clauses contained in that character string data;
Frequency counting means for counting the appearance frequency, in the plurality of character string data, of the partial character string data converted by the conversion means;
Independent word extraction means for extracting independent words from the partial character string data converted by the conversion means;
First detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means and, with reference to the extraction result of the independent word extraction means, detecting other partial character string data that include an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
First selection means for selecting partial character string data, from among the other partial character string data detected by the first detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as an input prediction candidate that supports input of a character string;
Frequency calculation means for calculating a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the first detection means, and storing the calculated total value in the storage means as the use frequency of the input prediction candidate; and
Selection means for, when input of character string data is received by the input receiving means, selecting partial character string data that include the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.
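As an informal illustration of the processing recited in appendix 4, the following Python sketch strings the means together on whitespace-separated English text. It is not the implementation of the embodiment: the `FUNCTION_WORDS` set is a hypothetical stand-in for morphological analysis, all function names are invented for this example, and "same meaning" is simplified to "identical set of independent words".

```python
from collections import Counter

# Assumption: a tiny stop-word set replaces a real morphological analyser;
# every token not listed here is treated as an independent word.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "is", "for", "in", "on"}


def to_partial_strings(text, min_words=2):
    """Conversion: every contiguous run of at least `min_words` words."""
    words = text.split()
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + min_words, len(words) + 1)
    ]


def independent_words(partial):
    """Independent word extraction: the set of non-function words."""
    return frozenset(w for w in partial.split() if w not in FUNCTION_WORDS)


def build_candidates(texts):
    """Return {input prediction candidate: use frequency}."""
    # Frequency counting over all stored character strings.
    counts = Counter(p for t in texts for p in to_partial_strings(t))

    # First detection (simplified): partial strings that share the same
    # independent words are treated as having the same meaning.
    groups = {}
    for partial in counts:
        key = independent_words(partial)
        if key:  # skip strings made only of function words
            groups.setdefault(key, []).append(partial)

    candidates = {}
    for members in groups.values():
        # First selection: highest appearance frequency, then shortest,
        # then alphabetical order so the result is deterministic.
        best = min(members, key=lambda p: (-counts[p], len(p), p))
        # Frequency calculation: total over the whole group.
        candidates[best] = sum(counts[m] for m in members)
    return candidates


def predict(typed, candidates, limit=3):
    """Selection: candidates containing the typed text, by use frequency."""
    hits = [c for c in candidates if typed in c]
    return sorted(hits, key=lambda c: (-candidates[c], c))[:limit]


if __name__ == "__main__":
    history = ["send the report", "send report", "send the report"]
    print(predict("send", build_candidates(history)))
    # prints ['send the report', 'send the']
```

In this run "send report" is absorbed into "send the report", whose use frequency becomes 3 (2 + 1), so it is ranked above "send the" (frequency 2).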
[Appendix 5]
The program according to appendix 4, further causing the computer to function as:
Synonym extraction means for extracting synonyms of the independent words extracted by the independent word extraction means, with reference to a dictionary in which synonyms indicating the same meaning are registered;
Second detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, and detecting other partial character string data that include a synonym of an independent word included in the selected partial character string data, as partial character string data having the same meaning as the partial character string data to be processed; and
Second selection means for selecting partial character string data, from among the other partial character string data detected by the second detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the second detection means, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.
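The synonym-based grouping of appendix 5 can be sketched in the same spirit. The sketch below is only one possible reading: the `SYNONYMS` table is a hypothetical dictionary that maps each word to a single representative, `FUNCTION_WORDS` again replaces morphological analysis, and the function names are invented for the example.

```python
# Hypothetical synonym dictionary: word -> representative word.
SYNONYMS = {"document": "report", "paper": "report", "mail": "send"}

# Assumption: stop-word set used instead of a morphological analyser.
FUNCTION_WORDS = {"the", "a", "of", "to"}


def meaning_key(partial):
    """Independent words with synonyms folded onto one representative."""
    return frozenset(
        SYNONYMS.get(w, w) for w in partial.split() if w not in FUNCTION_WORDS
    )


def merge_by_synonym(counts):
    """Group partial strings by meaning key and total their frequencies.

    `counts` maps each partial string to its appearance frequency.
    Returns {selected candidate: use frequency}.
    """
    # Second detection: strings whose independent words are synonyms of
    # each other end up under the same key.
    groups = {}
    for partial in counts:
        groups.setdefault(meaning_key(partial), []).append(partial)

    merged = {}
    for members in groups.values():
        # Second selection: most frequent, then shortest, then alphabetical.
        best = min(members, key=lambda p: (-counts[p], len(p), p))
        # Frequency calculation: total appearance frequency of the group.
        merged[best] = sum(counts[m] for m in members)
    return merged


if __name__ == "__main__":
    counts = {
        "send the report": 4,
        "send the document": 2,
        "mail the paper": 1,
        "read the report": 3,
    }
    print(merge_by_synonym(counts))
    # prints {'send the report': 7, 'read the report': 3}
```

Mapping every synonym to one representative keeps the grouping a simple dictionary lookup; a dictionary with overlapping synonym sets would need a different grouping strategy.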
[Appendix 6]
The program according to appendix 4 or 5, further causing the computer to function as:
Third detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, detecting other partial character string data that include more words or clauses than the selected partial character string data and that include the words or clauses included in the partial character string data to be processed, and storing the detected other partial character string data in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates a total value of the appearance frequencies of the other partial character string data detected by the third detection means and the partial character string data to be processed, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.
[Appendix 7]
A character input support method executed by a computer device that supports character input, wherein the computer device executes:
A conversion step of analyzing a plurality of character string data that have been received by input receiving means and stored in storage means, and converting each character string data into a plurality of partial character string data each including at least two words or clauses contained in that character string data;
A frequency counting step of counting the appearance frequency, in the plurality of character string data, of the partial character string data converted in the conversion step;
An independent word extraction step of extracting independent words from the partial character string data converted in the conversion step;
A first detection step of selecting partial character string data to be processed from the partial character string data converted in the conversion step and, with reference to the extraction result of the independent word extraction step, detecting other partial character string data that include an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
A first selection step of selecting partial character string data, from among the other partial character string data detected in the first detection step and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted in the frequency counting step, and storing the selected partial character string data in the storage means as an input prediction candidate that supports input of a character string;
A frequency calculation step of calculating a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected in the first detection step, and storing the calculated total value in the storage means as the use frequency of the input prediction candidate; and
A selection step of, when input of character string data is received by the input receiving means, selecting partial character string data that include the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.
[Appendix 8]
The character input support method according to appendix 7, wherein the computer device further executes:
A synonym extraction step of extracting synonyms of the independent words extracted in the independent word extraction step, with reference to a dictionary in which synonyms indicating the same meaning are registered;
A second detection step of selecting partial character string data to be processed from the partial character string data converted in the conversion step, and detecting other partial character string data that include a synonym of an independent word included in the selected partial character string data, as partial character string data having the same meaning as the partial character string data to be processed; and
A second selection step of selecting partial character string data, from among the other partial character string data detected in the second detection step and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted in the frequency counting step, and storing the selected partial character string data in the storage means as the input prediction candidate,
and wherein the frequency calculation step calculates a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected in the second detection step, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.
[Appendix 9]
The character input support method according to appendix 7 or 8, wherein the computer device further executes:
A third detection step of selecting partial character string data to be processed from the partial character string data converted in the conversion step, detecting other partial character string data that include more words or clauses than the selected partial character string data and that include the words or clauses included in the partial character string data to be processed, and storing the detected other partial character string data in the storage means as the input prediction candidate,
and wherein the frequency calculation step calculates a total value of the appearance frequencies of the other partial character string data detected in the third detection step and the partial character string data to be processed, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.

DESCRIPTION OF SYMBOLS
1 Computer device
100 Control unit
104 Storage unit
112 Display
115 Operation unit
116 Keyboard
117 Mouse
201 Conversion unit
202 Frequency counting unit
203 Third detection unit
204 Frequency recalculation unit
205 Independent word extraction unit
206 First detection unit
207 First selection unit
208 Synonym extraction unit
209 Second detection unit
210 Second selection unit

Claims

A character input support device comprising:
Conversion means for analyzing a plurality of character string data that have been received by input receiving means and stored in storage means, and converting each character string data into a plurality of partial character string data each including at least two words or clauses contained in that character string data;
Frequency counting means for counting the appearance frequency, in the plurality of character string data, of the partial character string data converted by the conversion means;
Independent word extraction means for extracting independent words from the partial character string data converted by the conversion means;
First detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means and, with reference to the extraction result of the independent word extraction means, detecting other partial character string data that include an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
First selection means for selecting partial character string data, from among the other partial character string data detected by the first detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as an input prediction candidate that supports input of a character string;
Frequency calculation means for calculating a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the first detection means, and storing the calculated total value in the storage means as the use frequency of the input prediction candidate; and
Selection means for, when input of character string data is received by the input receiving means, selecting partial character string data that include the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.

The character input support device according to claim 1, further comprising:
Dictionary storage means for storing a dictionary in which synonyms indicating the same meaning are registered;
Synonym extraction means for extracting synonyms of the independent words extracted by the independent word extraction means, with reference to the dictionary storage means;
Second detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, and detecting other partial character string data that include a synonym of an independent word included in the selected partial character string data, as partial character string data having the same meaning as the partial character string data to be processed; and
Second selection means for selecting partial character string data, from among the other partial character string data detected by the second detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the second detection means, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.

The character input support device according to claim 1, further comprising:
Third detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means, detecting other partial character string data that include more words or clauses than the selected partial character string data and that include the words or clauses included in the partial character string data to be processed, and storing the detected other partial character string data in the storage means as the input prediction candidate,
wherein the frequency calculation means calculates a total value of the appearance frequencies of the other partial character string data detected by the third detection means and the partial character string data to be processed, and stores the calculated total value in the storage means as the use frequency of the input prediction candidate.

A program for causing a computer to operate as a character input support system that supports character input, the program causing the computer to function as:
Conversion means for analyzing a plurality of character string data that have been received by input receiving means and stored in storage means, and converting each character string data into a plurality of partial character string data each including at least two words or clauses contained in that character string data;
Frequency counting means for counting the appearance frequency, in the plurality of character string data, of the partial character string data converted by the conversion means;
Independent word extraction means for extracting independent words from the partial character string data converted by the conversion means;
First detection means for selecting partial character string data to be processed from the partial character string data converted by the conversion means and, with reference to the extraction result of the independent word extraction means, detecting other partial character string data that include an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
First selection means for selecting partial character string data, from among the other partial character string data detected by the first detection means and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted by the frequency counting means, and storing the selected partial character string data in the storage means as an input prediction candidate that supports input of a character string;
Frequency calculation means for calculating a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected by the first detection means, and storing the calculated total value in the storage means as the use frequency of the input prediction candidate; and
Selection means for, when input of character string data is received by the input receiving means, selecting partial character string data that include the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.

A character input support method executed by a computer device that supports character input, wherein the computer device executes:
A conversion step of analyzing a plurality of character string data that have been received by input receiving means and stored in storage means, and converting each character string data into a plurality of partial character string data each including at least two words or clauses contained in that character string data;
A frequency counting step of counting the appearance frequency, in the plurality of character string data, of the partial character string data converted in the conversion step;
An independent word extraction step of extracting independent words from the partial character string data converted in the conversion step;
A first detection step of selecting partial character string data to be processed from the partial character string data converted in the conversion step and, with reference to the extraction result of the independent word extraction step, detecting other partial character string data that include an independent word included in the partial character string data to be processed, as partial character string data having the same meaning as the partial character string data to be processed;
A first selection step of selecting partial character string data, from among the other partial character string data detected in the first detection step and the partial character string data to be processed, according to the length of the partial character string data or the appearance frequency counted in the frequency counting step, and storing the selected partial character string data in the storage means as an input prediction candidate that supports input of a character string;
A frequency calculation step of calculating a total value of the appearance frequencies of the partial character string data to be processed and the other partial character string data detected in the first detection step, and storing the calculated total value in the storage means as the use frequency of the input prediction candidate; and
A selection step of, when input of character string data is received by the input receiving means, selecting partial character string data that include the received character string data on the basis of the use frequency stored in the storage means, and displaying the selected partial character string data on display means.