JP2011257921A

JP2011257921A - Character string selection device, character string selection method, and program therefor

Info

Publication number: JP2011257921A
Application number: JP2010131071A
Authority: JP
Inventors: Akira Kitauchi; 啓北内
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2010-06-08
Filing date: 2010-06-08
Publication date: 2011-12-22
Anticipated expiration: 2030-06-08
Also published as: JP5560105B2

Abstract

PROBLEM TO BE SOLVED: To provide a character string selection device that can obtain candidates for the character string a user wants to input both appropriately and quickly.
SOLUTION: For each character contained in the terms stored in a term storage unit 131, a character index generation unit 122 stores a character index indicating the terms that contain that character. When calculating the LCS (Longest Common Subsequence) between a character string input by the user and the terms stored in the term storage unit 131, a processing unit 120 uses the character index to select the terms that contain a character of the input character string, and calculates the LCS for the selected terms. The character string selection device can thus narrow down the terms for which the LCS is calculated, and thereby obtain candidates for the character string the user wants to input appropriately and quickly.

Description

The present invention relates to a character string selection device, a character string selection method, and a program.

One term input support method for reducing the burden on a user who inputs terms (character strings) into a computer or the like works as follows: the computer selects, from a group of terms stored in advance, terms similar to the character string input so far, displays them as term candidates, and accepts the user's selection of one of the displayed candidates.
For example, when a disease name is entered in a medical certificate displayed on a computer screen, as shown in FIG. 12(a), in response to the user's input of "大腸" (colon), the computer selects terms containing "大腸", such as "大腸炎" (colitis) and "大腸癌" (colon cancer), as terms similar to the input and displays them just below the input field. When the user then selects "大腸ポリープ" (colon polyp) with the cursor keys or the mouse, the computer displays the selected "大腸ポリープ" in the input field.

If, on the other hand, the user does not select a term and goes on to input "癌", the computer selects and displays terms that contain two or more characters of the input character string "大腸癌", such as "大腸炎" and "大腸粘膜内癌" (intramucosal colon cancer), as shown in FIG. 12(b).
By displaying not only terms that contain the whole input character string but also terms that contain part of it, the term the user intends can be displayed even when the input contains an error, such as a wrong character (typing "大腸癌" for "大腸炎") or a missing character (typing "転移大腸癌" for "転移性大腸癌", metastatic colon cancer).

One conceivable way of selecting terms similar to the input character string as term candidates is the method disclosed in Patent Document 1. That document describes a method for acquiring spelling variants, that is, terms such as "leucocyte" and "leukocyte" that denote the same thing but are written differently. To collect the spelling variants of a term of interest, the method first narrows down similar terms by comparing, for each term stored in advance, the degree of N-gram match (an N-gram being a substring of N characters) and the similarity in string length with the term of interest (hereinafter, the "first narrowing"). It then narrows the terms down further on the basis of an edit distance weighted by a cost set for each kind of edit, for example 10 points for replacing an uppercase letter with a lowercase letter or vice versa and 100 points for replacing a digit (hereinafter, the "second narrowing").
Applied to term candidate selection, this method extracts character strings that share many substrings with the input character string, have a similar number of characters, and can be obtained from the input character string by low-cost conversion operations.

Patent Document 1: JP-A-2005-352888

However, when the method of Patent Document 1 is used for term candidate selection, terms with a small edit distance may be excluded in the first narrowing, which makes it more likely that the term the user wants to input can never be extracted.
One way to avoid this problem would be to relax the requirements of the first narrowing so that fewer terms are excluded. However, excluding fewer terms in the first narrowing increases the number of terms subject to the second narrowing, and with it the processing time. To spare the user the stress of waiting, term candidates must be selected and displayed quickly, for example within 0.1 seconds of a character string being input (in FIG. 12(b), for instance, the candidates shown must appear promptly after "癌" is input). The increased processing time may make this impossible.
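The trade-off described above can be illustrated with a small sketch. It is not the method of Patent Document 1: the bigram measure `ngram_match`, the unweighted `edit_distance`, and the value of `THRESHOLD` are all assumptions chosen only to show how an N-gram filter can discard a term that is a single substitution away from the input.

```python
def ngrams(text: str, n: int = 2) -> set:
    """Return the set of character n-grams of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_match(a: str, b: str, n: int = 2) -> float:
    """Share of a's n-grams that also occur in b (an assumed measure)."""
    grams_a = ngrams(a, n)
    if not grams_a:
        return 0.0
    return len(grams_a & ngrams(b, n)) / len(grams_a)


def edit_distance(a: str, b: str) -> int:
    """Plain (unweighted) Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


typed, intended = "大腸癌", "大腸炎"
THRESHOLD = 0.6  # hypothetical first-narrowing threshold
score = ngram_match(typed, intended)
print(score, edit_distance(typed, intended))  # 0.5 1
print("kept" if score >= THRESHOLD else "excluded by first narrowing")
```

With these assumed settings the intended term is dropped even though its edit distance is 1. Lowering `THRESHOLD` keeps it, but also passes more terms on to the more expensive second stage.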

The present invention has been made in view of these circumstances, and its object is to provide a character string selection device, a character string selection method, and a program that can obtain candidates for the character string a user wants to input both appropriately and quickly.

[1] The present invention has been made to solve the problem described above. A character string selection device according to one aspect of the present invention comprises: an acquisition unit that acquires a first character string; a character string group storage unit that stores a group of character strings with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string; a character index storage unit that stores a character index in which, for all characters contained in the second character strings of the group and for each distinct character, the character is associated with the character string identification information of the second character strings in the group that contain the character; and a character string selection unit that selects the second character strings identified by the character string identification information associated, in the character index, with the characters contained in the first character string.

[2] A character string selection device according to one aspect of the present invention is the character string selection device described above, wherein the character string selection unit generates character count information indicating the number of characters common to the characters contained in the selected second character string and the characters contained in the first character string.

[3] A character string selection device according to one aspect of the present invention is the character string selection device described above, wherein the character string selection unit generates character count information indicating the number of characters common to the characters contained in a partial character string of the first character string and the characters contained in the second character string, and stops generating the information indicating the similarity between the first character string and that second character string when a value determined from the character count information and from the difference between the number of characters in the first character string and the number of characters in the partial character string is less than a predetermined threshold.

[4] A character string selection method according to one aspect of the present invention is a character string selection method for a character string selection device comprising: a character string group storage unit that stores a group of character strings with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string; and a character index storage unit that stores a character index in which, for all characters contained in the second character strings of the group and for each distinct character, the character is associated with the character string identification information of the second character strings in the group that contain the character. The method comprises: an acquisition step in which an acquisition unit acquires a first character string; and a character string selection step in which a character string selection unit selects the second character strings identified by the character string identification information associated, in the character index, with the characters contained in the first character string.

[5] A program according to one aspect of the present invention causes a computer serving as a character string selection device, the device comprising a character string group storage unit that stores a group of character strings with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string, and a character index storage unit that stores a character index in which, for all characters contained in the second character strings of the group and for each distinct character, the character is associated with the character string identification information of the second character strings in the group that contain the character, to execute: an acquisition step of acquiring a first character string; and a character string selection step of selecting the second character strings identified by the character string identification information associated, in the character index, with the characters contained in the first character string.

According to this invention, the character string selection unit selects the second character strings identified by the character string identification information associated, in the character index, with the characters contained in the first character string.
For second character strings that share no character with the first character string, the character string selection unit therefore never has to determine whether they should be selected, so it can select second character strings quickly. As a result, second character strings can be obtained quickly as candidates for the character string the user wants to input.
Furthermore, by selecting the second character strings identified by the character string identification information associated, in the character index, with the characters contained in the first character string, the character string selection unit can select all of, and only, the second character strings that contain a character contained in the first character string. In this respect, second character strings can be obtained appropriately as candidates for the character string the user wants to input.
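As a rough illustration of this selection step, the sketch below looks up each character of the first character string in a character index and takes the union of the associated IDs. The dictionary layout, the function name `select_term_ids`, and the sample data are assumptions made for the example (only term ID 1, "小腸癌", appears in the text).

```python
# Hypothetical character index: character -> IDs of the terms containing it.
# The terms behind the IDs are 1: 小腸癌, 2: 大腸炎, 3: 大腸癌, 4: 胃潰瘍.
char_index = {
    "小": [1], "腸": [1, 2, 3], "癌": [1, 3], "大": [2, 3],
    "炎": [2], "胃": [4], "潰": [4], "瘍": [4],
}


def select_term_ids(first_string: str, index: dict) -> list:
    """Union of the ID lists of every character in first_string."""
    selected = set()
    for ch in set(first_string):          # look up each distinct character once
        selected.update(index.get(ch, []))
    return sorted(selected)


print(select_term_ids("大腸", char_index))  # [1, 2, 3]
```

Term 4 shares no character with the input, so it is never examined: the cost of the lookup depends on the input characters and their ID lists, not on the size of the whole term list.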

According to this invention, character count information is further generated for each selected second character string, indicating the number of characters common to the characters contained in the selected second character string and the characters contained in the first character string. A second character string with a large character count can thus be recognized as having a high degree of commonality with the first character string.

According to this invention, furthermore, after character count information indicating the number of characters common to the characters contained in a partial character string of the first character string and the characters contained in the second character string has been generated, generation of the information indicating the similarity between the first character string and that second character string is stopped when a value determined from the character count information and from the difference between the number of characters in the first character string and the number of characters in the partial character string is less than a predetermined threshold. Second character strings whose character count information would fall below the threshold can thus be excluded and character count information generated only for the remaining second character strings, so the character count information can be generated quickly.
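One possible reading of this cutoff is sketched below. It assumes that the character count information is the LCS length between a prefix of the first character string and the second character string, and that the "value" is the sum of that length and the number of characters not yet processed; this sum is an upper bound on the final LCS length. The text fixes neither choice, and the function name is hypothetical.

```python
from typing import Optional


def lcs_length_with_cutoff(first: str, second: str, threshold: int) -> Optional[int]:
    """LCS length of first and second, or None if the computation is stopped early.

    Assumption: the cutoff value is (LCS length so far) + (characters of
    `first` not yet processed), an upper bound on the final LCS length.
    """
    prev = [0] * (len(second) + 1)
    for i, ch in enumerate(first, start=1):
        curr = [0]
        for j, other in enumerate(second, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        remaining = len(first) - i            # characters not yet processed
        if curr[-1] + remaining < threshold:
            return None                       # the threshold is out of reach
        prev = curr
    return prev[-1]


print(lcs_length_with_cutoff("大腸粘液癌", "大腸粘膜内癌", 3))  # 4
print(lcs_length_with_cutoff("大腸粘液癌", "胃潰瘍", 3))        # None
```

In the second call the computation stops after three of the five input characters, because even if both remaining characters matched, the LCS length could not reach 3.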

FIG. 1 is a block diagram showing the schematic configuration of a character string input support system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the term list stored in the term storage unit 131 in the embodiment.
FIG. 3 is a diagram showing an example of the character index stored in the character index storage unit 132 in the embodiment.
FIG. 4 is a diagram showing examples of matrices used to calculate the LCS length.
FIG. 5 is a diagram showing an example of a matrix used to calculate character position information indicating the positions at which the LCS of an input character string and a term appears in the term.
FIG. 6 is a flowchart showing the procedure by which the character index generation unit 122 generates the character index in the embodiment.
FIG. 7 is a flowchart showing the procedure by which the processing unit 120 calculates the LCS length and generates the character position information in the embodiment.
FIG. 8 is a flowchart showing the procedure by which the processing unit 120 calculates the LCS length and generates the character position information in the embodiment.
FIG. 9 is a diagram showing an example of how the rank determination unit 125 ranks terms in the embodiment.
FIG. 10 is a diagram showing an example of the indicators generated by the rank determination unit 125 in the embodiment.
FIG. 11 is a diagram showing an example in which the display unit 230 displays a list of terms in the embodiment.
FIG. 12 is a diagram showing a display example of candidates for the character string being input when a user inputs a character string.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram showing the schematic configuration of a character string input support system according to an embodiment of the present invention. In the figure, the character string input support system 1 comprises a character string input support device (character string selection device) 100 and a terminal device 200. The character string input support device 100 comprises a communication unit (acquisition unit) 110, a processing unit (character string selection unit) 120, and a storage unit 130. The processing unit 120 comprises a processing control unit 121, a character index generation unit 122, an LCS length calculation unit 123, a character position information generation unit 124, and a rank determination unit 125. The storage unit 130 comprises a term storage unit (character string group storage unit) 131, a character index storage unit 132, an LCS length matrix storage unit 133, and a character position information matrix storage unit 134. The terminal device 200 comprises a communication unit 210, an input unit 220, and a display unit 230.

The character string input support system 1 displays candidates for the character string the user wants to input (character string candidates) on the basis of the character string the user has input so far (the first character string; hereinafter, the "input character string").
The character string input support device 100 acquires the input character string from the terminal device 200, determines the rank of each candidate on the basis of the acquired character string, generates a list in which the candidates are arranged in the determined order, and transmits the list to the terminal device 200.
The communication unit 110 transmits data to and receives data from the terminal device 200.
The storage unit 130 stores in advance the terms that serve as candidates for the character string the user wants to input. The storage unit 130 also functions as working memory when the character string input support device 100 determines the ranks of those terms. The storage unit 130 is implemented on a storage device included in the character string input support device 100.

The term storage unit 131 stores in advance, in list form, the terms (second character strings) that serve as candidates for the character string the user wants to input.
FIG. 2 is a diagram showing an example of the term list stored in the term storage unit 131. As shown in the figure, each row of the term list stores a candidate term in association with a term ID, which is the identification number of that term. The term list is generated, for example, by assigning a term ID to each term in a dictionary for a particular field. In the example in the figure, the term list is generated by assigning term IDs to disease names.
The term ID is information that identifies each term and also indicates the order of the term within the term list.
The number of terms stored in the term list is not limited to the seven shown in the figure and may be any number.

For each character contained in the terms stored in the term storage unit 131, the character index storage unit 132 stores a character index indicating the terms that contain that character.
FIG. 3 is a diagram showing an example of the character index stored in the character index storage unit 132. As shown in the figure, each row of the character index stores a character contained in some term in the term list, together with a term ID list indicating the terms that contain that character. The character index in the figure shows, by term ID, the terms containing each character that appears in the term list of FIG. 2. For example, the term "小腸癌" (small intestine cancer), which is associated with term ID "1" in the term list of FIG. 2, contains the characters "小", "腸", and "癌". Accordingly, in the character index of FIG. 3, the rows for the characters "小", "腸", and "癌" all include term ID "1" in their term ID lists.
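A character index of this kind could be built along the following lines. This is a minimal sketch, not the procedure of the character index generation unit 122: the function name and the dictionary-based layout are assumptions, and apart from term ID 1 ("小腸癌") the sample terms are hypothetical rather than taken from FIG. 2.

```python
def build_character_index(term_list: dict) -> dict:
    """Map each distinct character to the sorted IDs of the terms containing it."""
    index = {}
    for term_id in sorted(term_list):
        for ch in set(term_list[term_id]):     # count each character once per term
            index.setdefault(ch, []).append(term_id)
    return index


# Term ID 1 follows the text; the other entries are hypothetical.
term_list = {1: "小腸癌", 2: "大腸炎", 3: "大腸癌"}
index = build_character_index(term_list)
print(index["小"])  # [1]
print(index["腸"])  # [1, 2, 3]
print(index["癌"])  # [1, 3]
```

Iterating over the term IDs in ascending order keeps every ID list sorted, which is convenient if the lists are later merged or intersected.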

The LCS length matrix storage unit 133 is working memory that stores the LCS length matrix generated when the character string input support device 100 determines the ranks of the terms.
The LCS length matrix stored in the LCS length matrix storage unit 133 is described below with reference to FIG. 4.
FIG. 4 is a diagram showing examples of matrices used to calculate the LCS length. Parts (a) and (b) of the figure each show an example of such a matrix.
The matrix in FIG. 4(a) has as many rows as the number of characters in the input character string plus one, and as many columns as the number of characters in the term stored in the term storage unit 131 plus one. Each element holds a non-negative integer. In the following, the top row of the matrix in FIG. 4(a) (the row corresponding to the initial values) is called row 0, and the leftmost column (the column corresponding to the initial values) is called column 0.

Here, the LCS (Longest Common Subsequence) is the longest (that is, the one with the most characters) of the subsequences common to two character strings (common subsequences). A common subsequence need not appear contiguously in either character string, but its characters must appear in the same order in both. For "大腸粘膜内癌" (intramucosal colon cancer) and "大腸粘液癌" (mucinous colon cancer) in FIG. 4(a), the characters "大", "腸", "粘", and "癌" appear in the same order in both, so the LCS is "大腸粘癌" and the number of characters in the LCS (hereinafter, the "LCS length") is 4.
Because the LCS length thus indicates the number of characters the two character strings have in common, it can be used as information indicating their similarity.
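The "same order, not necessarily contiguous" condition can be checked with a few lines of code. The helper `is_subsequence` below is a hypothetical illustration and not part of the described device.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """True if candidate's characters appear in text in the same order."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator past the first match, so each
    # later character must be found after the previous one.
    return all(ch in remaining for ch in candidate)


a, b = "大腸粘液癌", "大腸粘膜内癌"
print(is_subsequence("大腸粘癌", a), is_subsequence("大腸粘癌", b))  # True True
print(is_subsequence("癌大", a))  # False: right characters, wrong order
```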

Each element of the matrix in FIG. 4(a) indicates the LCS length between the term, or a partial character string of it, and the input character string, or a partial character string of it, that correspond to the position of the element. For example, element a156 in row 5, column 6 of the matrix in FIG. 4(a) indicates the LCS length "4" between the input character string "大腸粘液癌" and the term "大腸粘膜内癌". Element a124 in row 2, column 4 indicates the number of characters, "2", in the LCS "大腸" between "大腸", the partial character string consisting of the first two characters of the input character string (that is, the input character string at the point when its first two characters have been entered), and "大腸粘膜", the partial character string consisting of the first four characters of the term.

The LCS of two character strings can be calculated recursively from the LCSs of their partial character strings.
When the two character strings end in the same character, the LCS is obtained by appending that character to the LCS of the partial character strings obtained by removing the last character from each string. For example, the input character string "大腸粘液癌" and the term "大腸粘膜内癌" both end in "癌". The LCS "大腸粘癌" of "大腸粘液癌" and "大腸粘膜内癌" is obtained by appending "癌" to "大腸粘", which is the LCS of "大腸粘液" (the input character string without "癌") and "大腸粘膜内" (the term without "癌"). Accordingly, the value "4" of element a156 in row 5, column 6 (the number of characters in "大腸粘癌") is the value "3" of element a145 in row 4, column 5 (the number of characters in "大腸粘") plus 1 (the one character "癌").

When the two character strings end in different characters, on the other hand, there are two LCSs to consider, each obtained by removing the last character from one of the two strings, and the LCS is identical to whichever of them has more characters (when the two have the same number of characters, either may be taken). For example, "大腸" and "大腸粘膜" end in the different characters "腸" and "膜". The LCS of "大腸" and "大腸粘" ("大腸粘膜" without its last character "膜") is "大腸". The LCS of "大" ("大腸" without its last character "腸") and "大腸粘膜" is "大". The LCS "大腸" of "大腸" and "大腸粘膜" is identical to the longer of these two LCSs, namely "大腸". Accordingly, the value "2" of element a124 in row 2, column 4 (the number of characters in "大腸") is equal to the larger of the value "2" of element a123 in row 2, column 3 (the number of characters in "大腸") and the value "1" of element a114 in row 1, column 4 (the number of characters in "大").
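The two cases can be written as a single recurrence. The notation is introduced here only for illustration and does not appear in the original: $x_i$ is the $i$-th character of the input character string, $y_j$ is the $j$-th character of the term, and $c(i,j)$ is the LCS length between the first $i$ characters of the input character string and the first $j$ characters of the term.

```latex
c(i,j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c(i-1,\,j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c(i-1,\,j),\; c(i,\,j-1)\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```

For the examples in the text, $c(5,6) = c(4,5) + 1 = 3 + 1 = 4$ (element a156), and $c(2,4) = \max\bigl(c(1,4),\, c(2,3)\bigr) = \max(1, 2) = 2$ (element a124).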

Thus, when the corresponding character in the input character string and the corresponding character in the term are the same, the value of each element of the matrix in FIG. 4A equals the value of the upper-left element plus 1. When the two characters differ, the value equals the larger of the value of the element to the left and the value of the element above (when the two values are equal, it equals that common value).
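The recurrence above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `lcs_length_matrix` and the variable names are assumptions, and row 0 and column 0 hold the initial values as in FIG. 4A.

```python
def lcs_length_matrix(input_str, term):
    """Return the (len(input_str)+1) x (len(term)+1) matrix of LCS lengths."""
    rows, cols = len(input_str) + 1, len(term) + 1
    a = [[0] * cols for _ in range(rows)]  # row 0 and column 0 are initial values
    for j in range(1, rows):
        for k in range(1, cols):
            if input_str[j - 1] == term[k - 1]:
                # Same character: upper-left element plus 1.
                a[j][k] = a[j - 1][k - 1] + 1
            else:
                # Different characters: larger of the left and upper elements.
                a[j][k] = max(a[j][k - 1], a[j - 1][k])
    return a

a = lcs_length_matrix("大腸粘液癌", "大腸粘膜内癌")
print(a[5][6], a[4][5], a[2][4])  # elements a156, a145, a124: 4 3 2
```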

Here, in the input character string “大腸粘液癌” of FIG. 4A, the character “液” does not occur in the term “大腸粘膜内癌” and is not included in the LCS “大腸粘癌”. Consequently, for the term “大腸粘膜内癌” and for every substring of it, the LCS length with the input substring “大腸粘液” is the same as the LCS length with “大腸粘”, which omits the character “液”. Take the term's substring “大腸粘膜内” as an example: the LCS of “大腸粘膜内” and “大腸粘液” is “大腸粘”, and the LCS length (the value of element a145) is “3”. The LCS of “大腸粘膜内” and “大腸粘” (“大腸粘液” without “液”) is also “大腸粘”, and the LCS length (the value of element a135) is “3”.
For this reason, the values in row 4 of the LCS matrix (the row corresponding to the character “液”) match the values in row 3 directly above it (the row corresponding to the character “粘”).

Thus, a row of the matrix in FIG. 4A that corresponds to a character of the input character string not contained in the term has the same values as the row directly above it. The LCS length matrix storage unit 133 therefore stores, as the LCS length matrix, the matrix of FIG. 4A with the rows corresponding to characters not contained in the term deleted. The value of the bottom-right element of this LCS length matrix matches the value of the bottom-right element of the matrix in FIG. 4A. That is, the value of the bottom-right element of the LCS length matrix indicates the LCS length of the input character string and the term.
For example, the LCS length matrix storage unit 133 stores the LCS length matrix of FIG. 4B, obtained by deleting from the matrix of FIG. 4A the row corresponding to the character “液”, which is not contained in the term. The value “4” of element a246 in the bottom-right corner of this matrix indicates the number of characters in the LCS “大腸粘癌” of the input character string “大腸粘液癌” and the term “大腸粘膜内癌”, that is, the LCS length.
As described later, the LCS length matrix is generated using the character index stored in the character index storage unit 132.
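The row deletion can be checked with a short Python sketch. It builds the full matrix first and then drops rows, purely for illustration; the embodiment generates the reduced matrix via the character index instead. The helper name `lcs_rows` is an assumption.

```python
def lcs_rows(input_str, term):
    """Full LCS length matrix: the initial row plus one row per input character."""
    a = [[0] * (len(term) + 1)]
    for ch in input_str:
        prev, row = a[-1], [0]
        for k, t in enumerate(term, start=1):
            # Upper-left plus 1 on a match, else the larger of left and upper.
            row.append(prev[k - 1] + 1 if ch == t else max(row[k - 1], prev[k]))
        a.append(row)
    return a

input_str, term = "大腸粘液癌", "大腸粘膜内癌"
full = lcs_rows(input_str, term)

# Keep the initial row, then only rows whose input character occurs in the term.
reduced = [full[0]] + [
    full[j] for j, ch in enumerate(input_str, start=1) if ch in term
]

print(full[4] == full[3])                       # the row for "液" repeats the row above
print(len(full), len(reduced), reduced[-1][-1])
```

The bottom-right element is unchanged by the deletion, which is why the reduced matrix still yields the LCS length.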

Returning to FIG. 1, the character position information matrix storage unit 134 is a working memory that stores the character position information matrices generated when the character string input support device 100 determines the ranking of terms.
The character position information matrix stored in the character position information matrix storage unit 134 is described below with reference to FIG. 5.
FIG. 5 shows examples of matrices used to calculate character position information, which indicates the positions at which the LCS of the input character string and a term appears in the term. FIGS. 5A and 5B each show an example of such a matrix.
The matrix in FIG. 5A has as many rows as the number of characters in the input character string plus 1 and as many columns as the number of characters in the term stored in the term storage unit 131 plus 1, and each element holds a bit string (binary number). In the following, the top row of the matrix in FIG. 5A (the row corresponding to the initial values) is called row 0, and the leftmost column (the column corresponding to the initial values) is called column 0.

Each element of the matrix in FIG. 5A holds character position information indicating the positions at which the characters of the LCS of the input character string and a term stored in the term storage unit 131 appear in that term. The character position information is a bit string whose bits, starting from the ones place (the rightmost bit), correspond in order to the character positions starting from the first (leading) character. For example, in element b156 in row 5, column 6 of the matrix in FIG. 5A, the rightmost bit corresponds to the position of the leading character “大”, the second bit from the right to the position of “腸”, the third to the position of “粘”, and so on, up to the sixth (leftmost) bit, which corresponds to the position of “癌”. Here, the characters of the LCS “大腸粘癌” of the input character string “大腸粘液癌” and the term “大腸粘膜内癌” appear at the first (“大”), second (“腸”), third (“粘”), and sixth (“癌”) characters of the term “大腸粘膜内癌”. Accordingly, in element b156 in row 5, column 6 of the matrix in FIG. 5A, the first, second, third, and sixth bits from the right are “1”, and the fourth and fifth bits are “0”.
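The bit-string encoding described above can be illustrated with a small Python sketch. The helper names `positions_to_bits` and `bits_to_positions` are hypothetical and serve only to show how 1-based character positions map to bits counted from the right.

```python
def positions_to_bits(positions):
    """Encode 1-based character positions as an integer bit string."""
    value = 0
    for p in positions:
        value |= 1 << (p - 1)  # position p maps to the p-th bit from the right
    return value

def bits_to_positions(value):
    """Decode the bit string back into the list of 1-based positions."""
    return [p for p in range(1, value.bit_length() + 1) if (value >> (p - 1)) & 1]

# LCS characters at the 1st, 2nd, 3rd and 6th characters of the term.
b156 = positions_to_bits([1, 2, 3, 6])
print(format(b156, "b"))        # 100111
print(bits_to_positions(b156))  # [1, 2, 3, 6]
```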

As described above, the LCS of two character strings can be calculated recursively from the LCSs of their substrings. The character position information can therefore also be generated recursively from the character position information of the substrings.
When two character strings (the input character string or a substring of it, and a term or a substring of it) end with the same character, their character position information is the character position information of the substrings obtained by removing that last character from each string, plus a number in which the digit corresponding to the position of that last character is 1. For example, in the matrix of FIG. 5A, the value “100111” (binary) of element b156 in row 5, column 6 is the value “111” (binary; the leading zeros of “00111” are omitted, and leading zeros of character position information are likewise omitted below) of element b145 in row 4, column 5, plus “100000” (binary), in which the sixth digit from the right, corresponding to the position of the last character “癌”, is 1.

On the other hand, when two character strings end with different characters, their character position information is the same as whichever of the two candidates, each obtained by removing the last character from one of the strings, has the longer LCS. For example, in the matrix of FIG. 5A, the value “11” (binary) of element b124 in row 2, column 4 is the same as the value “11” (binary) of element b123 in row 2, column 3, which, of b123 and element b114 in row 1, column 4 (value “1”, binary), corresponds to the strings with the longer LCS.
When the LCS length is the same whichever string loses its last character, either character position information may be used. In this embodiment, the one with the smaller value is used. As described later, this is so that terms in which the characters of the LCS appear closer to the beginning are ranked higher.

Thus, when the corresponding character in the input character string and the corresponding character in the term are the same, the value of each element of the matrix in FIG. 5A equals the value of the upper-left element plus a number in which the digit corresponding to the position of the term's last character is 1. When the two characters differ, the value equals whichever of the left element's value and the upper element's value has the longer LCS length (when both LCS lengths are the same, the smaller of the two values; when the two values are also the same, that common value).
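The rule above can be sketched in Python by filling the LCS length matrix and the character position information matrix side by side. This is an illustrative sketch under the tie-breaking rule of the embodiment (longer LCS first, then the smaller bit-string value); the function name `position_matrices` is an assumption.

```python
def position_matrices(input_str, term):
    """Return (a, b): LCS lengths and character position information."""
    rows, cols = len(input_str) + 1, len(term) + 1
    a = [[0] * cols for _ in range(rows)]  # LCS lengths
    b = [[0] * cols for _ in range(rows)]  # position bit strings
    for j in range(1, rows):
        for k in range(1, cols):
            if input_str[j - 1] == term[k - 1]:
                a[j][k] = a[j - 1][k - 1] + 1
                # Set the digit for term position k (k-th bit from the right).
                b[j][k] = b[j - 1][k - 1] + (1 << (k - 1))
            else:
                left = (a[j][k - 1], b[j][k - 1])
                up = (a[j - 1][k], b[j - 1][k])
                # Prefer the longer LCS; on a tie, the smaller bit-string value.
                a[j][k], b[j][k] = max(left, up, key=lambda e: (e[0], -e[1]))
    return a, b

a, b = position_matrices("大腸粘液癌", "大腸粘膜内癌")
print(format(b[5][6], "b"), format(b[4][5], "b"), format(b[2][4], "b"))
# elements b156, b145, b124: 100111 111 11
```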

Here, as explained for FIG. 4, the character “液” in the input character string “大腸粘液癌” does not occur in the term “大腸粘膜内癌” and is not included in the LCS “大腸粘癌”. Consequently, as with the LCS length described for FIG. 4, for the term “大腸粘膜内癌” and for every substring of it, the character position information with the input substring “大腸粘液” is the same as the character position information with “大腸粘”, which omits the character “液”. Take the term's substring “大腸粘膜内” as an example: the LCS of “大腸粘膜内” and “大腸粘液” is “大腸粘”, and the character position information (the value of element b145) is “111” (binary). The LCS of “大腸粘膜内” and “大腸粘” (“大腸粘液” without “液”) is also “大腸粘”, and the character position information (the value of element b135) is “111” (binary).
For this reason, the values in row 4 of the matrix in FIG. 5A (the row corresponding to the character “液”) match the values in row 3 directly above it (the row corresponding to the character “粘”).

Thus, a row of the matrix in FIG. 5A that corresponds to a character of the input character string not contained in the term has the same values as the row directly above it. The character position information matrix storage unit 134 therefore stores, as the character position information matrix, the matrix of FIG. 5A with the rows corresponding to characters not contained in the term deleted. The value of the bottom-right element of this character position information matrix matches the value of the bottom-right element of the matrix in FIG. 5A. That is, the value of the bottom-right element of the character position information matrix is the character position information indicating the positions at which the LCS of the input character string and the term appears in the term.
For example, the character position information matrix storage unit 134 stores the character position information matrix of FIG. 5B, obtained by deleting from the matrix of FIG. 5A the row corresponding to the character “液”, which is not contained in the term. The value “100111” of element b246 in the bottom-right corner of this matrix indicates the positions at which the characters of the LCS “大腸粘癌” of the input character string “大腸粘液癌” and the term “大腸粘膜内癌” appear in the term “大腸粘膜内癌”.
As described later, the character position information matrix is generated using the character index stored in the character index storage unit 132.

Returning to FIG. 1, the processing unit 120 acquires, from the terminal device 200 via the communication unit 110, the character string the user has entered so far and, based on the acquired character string, determines the rank of each candidate for the character string the user wants to enter. The processing unit 120 then generates a list in which the candidates are arranged according to the determined ranks and transmits the list to the terminal device 200 via the communication unit 110. The processing unit 120 is realized, for example, by a CPU of the character string input support device 100 reading a program from the storage unit 130 and executing it.
The processing control unit 121 controls each unit and causes it to perform processing. The character index generation unit 122 generates the character index from the term list stored in the term storage unit 131. The LCS length calculation unit 123 calculates the LCS length of the input character string and each term stored in the term storage unit 131. The character position information generation unit 124 generates the character position information of the input character string and each term stored in the term storage unit 131. The rank determination unit 125 determines the rank of each term stored in the term storage unit 131 based on the LCS length calculated by the LCS length calculation unit 123 and the character position information generated by the character position information generation unit 124.
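One possible reading of how the rank determination unit 125 could combine the two values is sketched below in Python. The exact ranking rule is not given in this passage, so the ordering used here (longer LCS first, then smaller character position information) is an assumption, as are the function name `rank_terms` and the sample terms other than “大腸粘膜内癌”.

```python
def rank_terms(scored_terms):
    """Sort (term, lcs_length, position_bits) tuples into display order.

    Assumed rule: longer LCS first; for equal LCS lengths, the smaller
    character position value (LCS characters nearer the head) first.
    """
    return sorted(scored_terms, key=lambda t: (-t[1], t[2]))

# Scores against the input character string "大腸粘液癌".
candidates = [
    ("大腸粘膜内癌", 4, 0b100111),  # LCS "大腸粘癌" at positions 1, 2, 3, 6
    ("大腸癌", 3, 0b111),           # LCS "大腸癌" at positions 1, 2, 3
    ("直腸粘液癌", 4, 0b11110),     # LCS "腸粘液癌" at positions 2, 3, 4, 5
]
ranked = [term for term, _, _ in rank_terms(candidates)]
print(ranked)
```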

The terminal device 200 accepts a character string entered by the user and transmits it to the character string input support device 100, and displays the terms transmitted from the character string input support device 100 (candidates for the character string the user is trying to enter) in the order determined by the character string input support device 100.
The communication unit 210 transmits and receives data to and from the character string input support device 100.
The input unit 220 includes a keyboard and a mouse, accepts a character string entered by the user, and transmits the entered character string to the character string input support device 100 via the communication unit 210. The display unit 230 includes a display screen such as a liquid crystal display, displays a character string input field, and displays the character string entered by the user in the input field. The display unit 230 also receives the ranked terms from the character string input support device 100 via the communication unit 210 and displays them, in rank order, just below the input field.

Next, the operation of the character string input support system 1 is described.
In the character string input support system 1, the character index generation unit 122 of the character string input support device 100 generates a character index in advance from the term list stored in the term storage unit 131 and writes it to the character index storage unit 132.
FIG. 6 is a flowchart showing the procedure by which the character index generation unit 122 generates the character index.
The character index generation unit 122 first starts loop L1, which processes each row of the term list stored in the term storage unit 131. In the following, the term being processed in loop L1 is denoted “W”, and the term ID of term W is denoted “i” (step S1).

The character index generation unit 122 then starts loop L2, which processes each character contained in the term W. In the following, the character being processed in loop L2 is denoted “R” (step S2).
The character index generation unit 122 then sets the variable k, which indicates a row of the character index, to 1 (step S3).

Next, the character index generation unit 122 determines whether the character index stored in the character index storage unit 132 has a k-th row (step S4). If it does (step S4: YES), the character index generation unit 122 determines whether the character contained in the k-th row of the character index matches the character R (step S5). If they do not match (step S5: NO), the character index generation unit 122 adds 1 to k, that is, moves on to the next row of the character index (step S31), and returns to step S4.

If, on the other hand, they match in step S5 (step S5: YES), the character index generation unit 122 determines whether i is included in the term ID list in the k-th row of the character index. The check for whether i is already included is made to avoid duplicate entries (step S6). If i is not included (step S6: NO), the character index generation unit 122 adds (writes) i to the term ID list in the k-th row of the character index stored in the character index storage unit 132 (step S7).

The character index generation unit 122 then determines whether loop L2 has been performed for all characters of the term W. If an unprocessed character remains, it returns to step S2 and continues loop L2 for that character. If all characters have been processed, it proceeds to step S9 (step S8).
The character index generation unit 122 then determines whether loop L1 has been performed for all rows of the term list. If an unprocessed row remains, it returns to step S1 and continues loop L1 for that row. If all rows have been processed, the processing of FIG. 6 ends (step S9).

On the other hand, if it is determined in step S4 that the character index stored in the character index storage unit 132 has no k-th row (step S4: NO), or if it is determined in step S6 that i is already included in the term ID list in the k-th row of the character index (step S6: YES), the processing proceeds to step S8.
In the former case (step S4: NO), before proceeding to step S8, the character index generation unit 122 appends (writes), at the end of the character index stored in the character index storage unit 132, a row containing the character R and a term ID list consisting of i.
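The flow of FIG. 6 described above can be sketched in Python as follows. The representation of the character index as a list of `[character, [term IDs]]` rows, the function name `build_character_index`, and the sample terms and IDs are assumptions made for illustration.

```python
def build_character_index(term_list):
    """term_list: list of (term_id, term). Returns rows of [character, [term IDs]]."""
    index = []
    for term_id, term in term_list:            # loop L1 over the term list
        for ch in term:                        # loop L2 over the characters of W
            for row in index:                  # scan rows k = 1, 2, ... (S4, S5)
                if row[0] == ch:
                    if term_id not in row[1]:  # S6: avoid duplicate entries
                        row[1].append(term_id) # S7
                    break
            else:
                # S4: NO, no row for this character yet, so append a new row.
                index.append([ch, [term_id]])
    return index

index = build_character_index([(1, "大腸炎"), (2, "大腸癌"), (3, "胃癌")])
for row in index:
    print(row)
```

The linear scan mirrors the flowchart; a hash map keyed by character would give the same index with faster lookups.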

Once generation of the character index is complete, the display unit 230 of the terminal device 200 displays the input field, and the input unit 220 waits for an input operation by the user. On receiving an input operation, the input unit 220 generates the input character string based on that operation. That is, the input unit 220 holds the input character string resulting from past input operations, and when it receives a new input operation such as adding or deleting a character, it updates (edits) the stored input character string accordingly. The input unit 220 outputs the updated input character string to the display unit 230 and the communication unit 210.
The display unit 230 displays the input character string output from the input unit 220 in the input field (or updates it if an input character string is already displayed). The communication unit 210 transmits the input character string output from the input unit 220 to the communication unit 110 of the character string input support device 100.
On receiving the input character string from the communication unit 210, the communication unit 110 outputs it to the processing unit 120.

The processing unit 120 calculates the LCS length of, and generates the character position information for, the input character string output from the communication unit 110 and each term stored in the term storage unit 131.
FIGS. 7 and 8 are flowcharts showing the procedure by which the processing unit 120 calculates the LCS length and generates the character position information. The processing unit 120 starts the processing of these figures when the input character string is output from the communication unit 110.
First, the processing control unit 121 of the processing unit 120 controls the LCS length calculation unit 123 to initialize the LCS length matrices in the LCS length matrix storage unit 133, and controls the character position information generation unit 124 to initialize the character position information matrices in the character position information matrix storage unit 134.

For each term stored in the term storage unit 131, the LCS length calculation unit 123 sets the number of rows of the LCS length matrix to the number of characters in the input character string plus 1 and the number of columns to the number of characters in the term plus 1. It then sets the value of every element in the top row (the row corresponding to “initial value” in the example of FIG. 4) and in the leftmost column (the column corresponding to “initial value” in the example of FIG. 4) to “0”. The values of the other elements are undetermined at this point.
Likewise, for each term stored in the term storage unit 131, the character position information generation unit 124 sets the number of rows of the character position information matrix to the number of characters in the input character string plus 1 and the number of columns to the number of characters in the term plus 1. It then sets the value of every element in the top row (the row corresponding to “initial value” in the example of FIG. 5) and in the leftmost column (the column corresponding to “initial value” in the example of FIG. 5) to “0”. The values of the other elements are undetermined at this point (step S101).
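The initialization in step S101 can be sketched in Python as follows. Representing undetermined elements with `None` and the function name `init_matrix` are assumptions made for illustration; the same shape applies to both the LCS length matrix and the character position information matrix.

```python
def init_matrix(input_str, term):
    """Create a matrix with row 0 and column 0 set to 0 and the rest undetermined."""
    rows, cols = len(input_str) + 1, len(term) + 1
    m = [[None] * cols for _ in range(rows)]  # None marks an undetermined element
    m[0] = [0] * cols                         # top row: initial values
    for row in m:
        row[0] = 0                            # leftmost column: initial values
    return m

m = init_matrix("大腸粘液癌", "大腸粘膜内癌")
print(len(m), len(m[0]))  # 6 7
```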

Next, the processing control unit 121 starts loop L11, which processes each character of the input character string in order from the beginning. In the following, the position of the character being processed in loop L11 is denoted “j” (1, 2, ... in order from the beginning) (step S102).
Next, the processing control unit 121 reads, from the character index stored in the character index storage unit 132, the term ID list stored in the same row as the j-th character of the input character string (an empty list if the character index has no such row) (step S103).
The processing control unit 121 then starts loop L12, which processes each term corresponding to a term ID in the term ID list acquired in step S103. In the following, the term being processed in loop L12 is denoted “W” (step S104).
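How loops L11 and L12 narrow down the terms can be sketched in Python as follows. The character index is represented here as a dictionary from character to term ID list, which is an assumption for brevity; the function name `visited_terms` and the sample index are likewise hypothetical.

```python
def visited_terms(input_str, char_index):
    """List (j, term_id) pairs in the order loops L11 and L12 would visit them."""
    visits = []
    for j, ch in enumerate(input_str, start=1):  # loop L11 over input characters
        for term_id in char_index.get(ch, []):   # loop L12; empty list if no row
            visits.append((j, term_id))
    return visits

# Hypothetical index for the terms 1="大腸炎", 2="大腸癌", 3="胃癌", 4="肺炎".
char_index = {
    "大": [1, 2], "腸": [1, 2], "炎": [1, 4],
    "癌": [2, 3], "胃": [3], "肺": [4],
}
visits = visited_terms("大腸粘液癌", char_index)
print(visits)
print(sorted({term_id for _, term_id in visits}))
```

Term 4 shares no character with the input character string, so it is never visited and no LCS calculation is spent on it.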

Next, the processing control unit 121 reads the LCS length calculated so far, that is, the value of the rightmost element of the row whose values have been determined in the LCS length matrix corresponding to the term W, among the LCS length matrices stored in the LCS length matrix storage unit 133. As described later, this calculated LCS length is the LCS length of a substring of the input character string and the term W. The processing control unit 121 also calculates the number of characters in the input character string that have not yet been processed (the number of characters in the input character string - j + 1). The processing control unit 121 then determines whether the sum of the calculated LCS length and the number of unprocessed characters in the input character string is greater than or equal to a predetermined threshold.
This threshold applies to the LCS length of the input character string and a term: only terms whose LCS length is greater than or equal to the threshold are displayed on the display unit 230 of the terminal device 200 as candidates for the character string the user wants to enter. The threshold is set in advance, for example by the user (step S105).
If the sum of the calculated LCS length and the number of unprocessed characters in the input character string is greater than or equal to the threshold (step S105: YES), the processing control unit 121 starts loop L13, which processes each character of the term W in order from the beginning. In the following, the position of the character being processed in loop L13 is denoted “k” (1, 2, ... in order from the beginning) (step S106).

そして、処理制御部１２１は、入力文字列のｊ番目の文字と、用語Ｗのｋ番目の文字とが同一か否かを判定する（ステップＳ１０７）。同一であると判定した場合（ステップＳ１０７：ＹＥＳ）、処理制御部１２１は、文字が同一であることを示す信号を、ＬＣＳ長算出部１２３に出力する。
文字が同一であることを示す信号が処理制御部１２１から出力されると、ＬＣＳ長算出部１２３は、ＬＣＳ長行列の第ｊ行第ｋ列の要素の値として、第ｊ−１行第ｋ−１列の要素の値＋１を書き込む。すなわち、図４で説明したように、左上の要素の値に１を加えた値とする（以上、ステップＳ１１１）。 Then, the process control unit 121 determines whether or not the jth character of the input character string is the same as the kth character of the term W (step S107). If it is determined that they are the same (step S107: YES), the processing control unit 121 outputs a signal indicating that the characters are the same to the LCS length calculation unit 123.
When the signal indicating that the characters are the same is output from the processing control unit 121, the LCS length calculation unit 123 writes, as the value of the element in the jth row and kth column of the LCS length matrix, the value of the element in the (j−1)th row and (k−1)th column plus 1. That is, as described with reference to FIG. 4, the value is the value of the upper-left element plus 1 (step S111).

また、ステップＳ１０７において文字が同一であると判定した（ステップＳ１０７：ＹＥＳ）処理制御部１２１は、文字が同一であることを示す信号を、文字位置情報生成部１２４にも出力する。
文字が同一であることを示す信号が処理制御部１２１から出力されると、文字位置情報生成部１２４は、文字位置情報行列の第ｊ行第ｋ列の要素の値として、第ｊ−１行第ｋ−１列の要素の値＋２^ｋ−１を書き込む。すなわち、図５で説明したように、左上の要素の値に、用語の末尾の位置に対応する桁を１とした数を加えた値とする（以上、ステップＳ１１２）。 Further, in step S107, it is determined that the characters are the same (step S107: YES), and the processing control unit 121 also outputs a signal indicating that the characters are the same to the character position information generation unit 124.
When the signal indicating that the characters are the same is output from the processing control unit 121, the character position information generation unit 124 writes, as the value of the element in the jth row and kth column of the character position information matrix, the value of the element in the (j−1)th row and (k−1)th column plus 2^(k−1). That is, as described with reference to FIG. 5, the value is the value of the upper-left element plus a number in which the digit corresponding to the last position of the term is 1 (step S112).

その後、処理制御部１２１は、用語Ｗの全ての文字に対してループＬ１３の処理を行ったか否かを判定する。未処理の文字があると判定した場合は、ステップＳ１０６に戻って、未処理の文字に対して引き続きループＬ１３の処理を行う。一方、全ての文字に対して処理を行ったと判定した場合は、次のステップＳ１４２に進む。
この、ループＬ１３の終了時点では、入力文字列の先頭からｊ文字目までの部分文字列と、用語ＷとのＬＣＳが算出されている。例えば、入力文字列が「大腸粘膜癌」、用語Ｗが「大腸粘膜内癌」で、ｊ＝２の場合、図４に示したＬＣＳ長行列のうち上３行（初期値の行と、入力文字列「大」および「腸」に対応する行）が生成されている。このＬＣＳ長行列の最下行最右列の要素の値「２」は、入力文字列の先頭から２文字の部分文字列「大腸」と用語Ｗ「大腸粘膜内癌」とのＣＬＳ長（ＣＬＳは「大腸」であり、ＣＬＳ長は「２」）を示している（以上、ステップＳ１４１）。 Thereafter, the process control unit 121 determines whether or not the process of the loop L13 has been performed for all the characters of the term W. If it is determined that there is an unprocessed character, the process returns to step S106, and the processing of loop L13 is continued for the unprocessed character. On the other hand, if it is determined that all characters have been processed, the process proceeds to the next step S142.
At the end of loop L13, the LCS between the partial character string from the beginning of the input character string up to the jth character and the term W has been calculated. For example, when the input character string is "大腸粘膜癌" (colonic mucosal cancer), the term W is "大腸粘膜内癌" (colonic intramucosal cancer), and j = 2, the top three rows of the LCS length matrix shown in FIG. 4 (the row of initial values and the rows corresponding to the input characters "大" and "腸") have been generated. The value "2" of the element in the bottom row and rightmost column of this LCS length matrix indicates the LCS length between the two-character partial string "大腸" (colon) at the beginning of the input character string and the term W "大腸粘膜内癌" (the LCS is "大腸", and the LCS length is "2") (step S141).

そして、処理制御部１２１は、用語記憶部１３１の記憶する全ての用語に対してループＬ１２の処理を行ったか否かを判定する。未処理の用語があると判定した場合は、ステップＳ１０４に戻って、未処理の用語に対して引き続きループＬ１２の処理を行う。一方、全ての用語に対して処理を行ったと判定した場合は、次のステップＳ１４３に進む（以上、ステップＳ１４２）。
そして、処理制御部１２１は、入力文字列の全ての文字に対してループＬ１１の処理を行ったか否かを判定する。未処理の文字があると判定した場合は、ステップＳ１０２に戻って、未処理の文字に対して引き続きループＬ１１の処理を行う。一方、全ての文字に対して処理を行ったと判定した場合は、同図の処理を終了する（以上、ステップＳ１４３）。 The processing control unit 121 then determines whether the processing of loop L12 has been performed for all the terms stored in the term storage unit 131. If it determines that there is an unprocessed term, it returns to step S104 and continues the processing of loop L12 for the unprocessed term. If it determines that all the terms have been processed, it proceeds to the next step S143 (step S142).
The processing control unit 121 then determines whether the processing of loop L11 has been performed for all the characters of the input character string. If it determines that there is an unprocessed character, it returns to step S102 and continues the processing of loop L11 for the unprocessed character. If it determines that all the characters have been processed, it ends the processing shown in the figure (step S143).

一方、ステップＳ１０５において、閾値未満であると判定した場合（ステップＳ１０５：ＮＯ）、処理制御部１２１は、ステップＳ１４１に進む。すなわち、入力文字列に含まれる文字のうち、まだＬＣＳ長算出に用いられていない文字が全てＬＣＳを構成すると仮定しても処理対象の用語のＬＣＳ長が閾値未満である場合、残りの文字について処理をするまでもなく、この用語のＬＣＳ長が最終的に閾値未満となることが明らかである。すなわち、この用語は端末装置２００の表示部２３０には表示されないことが確定している。そこで、処理部１２０は、この用語に対するＬＣＳ長の算出および文字位置情報の生成を中止して、全体の処理の迅速化を図る。 On the other hand, when it determines in step S105 that the sum is less than the threshold (step S105: NO), the processing control unit 121 proceeds to step S141. That is, if the LCS length of the term being processed would still be less than the threshold even under the assumption that all the characters of the input character string not yet used for the LCS length calculation form part of the LCS, it is clear, without processing the remaining characters, that the LCS length of this term will ultimately be less than the threshold. In other words, it is already settled that this term will not be displayed on the display unit 230 of the terminal device 200. The processing unit 120 therefore stops calculating the LCS length and generating the character position information for this term, which speeds up the overall processing.
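
The cutoff in step S105 amounts to an upper-bound check on the final LCS length. The following is a minimal Python sketch of that check only. The function name `can_reach_threshold` and the parameter `next_pos` are hypothetical names introduced here for illustration, and `next_pos` is assumed to be the 1-based position of the input character about to be processed.

```python
def can_reach_threshold(lcs_so_far: int, input_len: int,
                        next_pos: int, threshold: int) -> bool:
    """Return True if the term can still reach the threshold (step S105: YES)."""
    # Input characters not yet processed, counting the one at next_pos:
    # (number of characters in the input string) - next_pos + 1.
    remaining = input_len - next_pos + 1
    # Best case: every remaining character extends the LCS by one.
    return lcs_so_far + remaining >= threshold
```

When this returns False for a term, the remaining rows for that term can be skipped, because no continuation of the input can lift its LCS length to the threshold.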

また、ステップＳ１０７において文字が異なると判定した場合（ステップＳ１０７：ＮＯ）、処理制御部１２１は、ＬＣＳ長行列の第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以上か否かを判定する（ステップＳ１２１）。第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以上であると判定した場合（ステップＳ１２１：ＹＥＳ）、処理制御部１２１は、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以上であることを示す信号を、ＬＣＳ長算出部１２３に出力する。
当該信号が処理制御部１２１から出力されると、ＬＣＳ長算出部１２３は、ＬＣＳ長行列の第ｊ行第ｋ列の要素の値として、第ｊ−１行第ｋ列の要素の値を書き込む。すなわち、図４で説明したように、左隣の値と、上隣の値とのうち大きいほう（ここでは、上隣の要素）の値と同一（両者の値が等しい場合は、両者の値と同一）とする（以上、ステップＳ１２２）。 If it determines in step S107 that the characters differ (step S107: NO), the processing control unit 121 determines whether the value of the element in the (j−1)th row and kth column of the LCS length matrix is equal to or greater than the value of the element in the jth row and (k−1)th column (step S121). If so (step S121: YES), the processing control unit 121 outputs, to the LCS length calculation unit 123, a signal indicating that the value of the element in the (j−1)th row and kth column is equal to or greater than the value of the element in the jth row and (k−1)th column.
When this signal is output from the processing control unit 121, the LCS length calculation unit 123 writes the value of the element in the (j−1)th row and kth column as the value of the element in the jth row and kth column of the LCS length matrix. That is, as described with reference to FIG. 4, the value is set equal to the larger of the left-adjacent and upper-adjacent values (here, the upper-adjacent element); if the two values are equal, it is set equal to that common value (step S122).

また、ステップＳ１２１において、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以上であると判定した（ステップＳ１２１：ＹＥＳ）処理制御部１２１は、ＬＣＳ長行列の第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値と等しいか否かを判定する（ステップＳ１２３）。値が等しいと判定した場合（ステップＳ１２３：ＹＥＳ）、処理制御部１２１は、さらに、文字位置情報行列の第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以下か否かを判定する（ステップＳ１２４）。
ステップＳ１２３において両者の値が異なると判定した場合（ステップＳ１２３：ＮＯ）、および、ステップＳ１２４において、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値以下であると判定した場合（ステップＳ１２４：ＹＥＳ）、処理制御部１２１は、上隣の要素を書き込むよう指示する信号を、文字位置情報生成部１２４に出力する。
当該信号が処理制御部１２１から出力されると、文字位置情報生成部１２４は、文字位置情報行列の第ｊ行第ｋ列の要素の値として、第ｊ−１行第ｋ列の要素の値を書き込む。すなわち、図５で説明したように、左隣の要素の値と、上隣の要素の値とのうち、ＬＣＳ長が長いほう（ここでは、上隣の要素）の値と同一（両者のＬＣＳ長が同一の場合は、値が小さいほうの要素の値と同一。さらに両要素の値が同一の場合は、両要素の値と同一）とする（以上、ステップＳ１２５）。
その後、処理制御部１２１は、ステップＳ１４１に進む。 Having determined in step S121 that the value of the element in the (j−1)th row and kth column is equal to or greater than the value of the element in the jth row and (k−1)th column (step S121: YES), the processing control unit 121 also determines whether the value of the element in the (j−1)th row and kth column of the LCS length matrix is equal to the value of the element in the jth row and (k−1)th column (step S123). If it determines that the values are equal (step S123: YES), the processing control unit 121 further determines whether the value of the element in the (j−1)th row and kth column of the character position information matrix is equal to or less than the value of the element in the jth row and (k−1)th column (step S124).
If it determines in step S123 that the two values differ (step S123: NO), or determines in step S124 that the value of the element in the (j−1)th row and kth column is equal to or less than the value of the element in the jth row and (k−1)th column (step S124: YES), the processing control unit 121 outputs, to the character position information generation unit 124, a signal instructing it to write the upper-adjacent element.
When this signal is output from the processing control unit 121, the character position information generation unit 124 writes the value of the element in the (j−1)th row and kth column as the value of the element in the jth row and kth column of the character position information matrix. That is, as described with reference to FIG. 5, the value is set equal to that of whichever of the left-adjacent and upper-adjacent elements has the longer LCS length (here, the upper-adjacent element); if the two LCS lengths are the same, it is set equal to the smaller of the two values, and if the two values are also the same, to that common value (step S125).
Thereafter, the processing control unit 121 proceeds to step S141.

一方、ステップＳ１２１において、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値未満であると判定した場合（ステップＳ１２１：ＮＯ）、処理制御部１２１は、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値未満であることを示す信号を、ＬＣＳ長算出部１２３に出力する。
当該信号が処理制御部１２１から出力されると、ＬＣＳ長算出部１２３は、ＬＣＳ長行列の第ｊ行第ｋ列の要素の値として、第ｊ行第ｋ−１列の要素の値を書き込む。すなわち、図４で説明したように、左隣の値と、上隣の値とのうち大きいほう（ここでは、左隣の要素）の値と同一とする（以上、ステップＳ１３１）。 On the other hand, if it determines in step S121 that the value of the element in the (j−1)th row and kth column is less than the value of the element in the jth row and (k−1)th column (step S121: NO), the processing control unit 121 outputs, to the LCS length calculation unit 123, a signal indicating that the value of the element in the (j−1)th row and kth column is less than the value of the element in the jth row and (k−1)th column.
When this signal is output from the processing control unit 121, the LCS length calculation unit 123 writes the value of the element in the jth row and (k−1)th column as the value of the element in the jth row and kth column of the LCS length matrix. That is, as described with reference to FIG. 4, the value is set equal to the larger of the left-adjacent and upper-adjacent values (here, the left-adjacent element) (step S131).

また、ステップＳ１２１において、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値未満であると判定した場合（ステップＳ１２１：ＮＯ）、および、ステップＳ１２４において、第ｊ−１行第ｋ列の要素の値が、第ｊ行第ｋ−１列の要素の値より大きいと判定した場合（ステップＳ１２４：ＮＯ）、処理制御部１２１は、左隣の要素を書き込むよう指示する信号を、文字位置情報生成部１２４に出力する。
当該信号が処理制御部１２１から出力されると、文字位置情報生成部１２４は、文字位置情報行列の第ｊ行第ｋ列の要素の値として、第ｊ行第ｋ−１列の要素の値を書き込む。すなわち、図５で説明したように、左隣の要素の値と、上隣の要素の値とのうち、ＬＣＳ長が長いほう（ここでは、左隣の要素）の値と同一（両者のＬＣＳ長が同一の場合は、値が小さいほうの要素の値と同一）とする（以上、ステップＳ１３２）。
その後、処理制御部１２１は、ステップＳ１４１に進む。 If it determines in step S121 that the value of the element in the (j−1)th row and kth column is less than the value of the element in the jth row and (k−1)th column (step S121: NO), or determines in step S124 that the value of the element in the (j−1)th row and kth column is greater than the value of the element in the jth row and (k−1)th column (step S124: NO), the processing control unit 121 outputs, to the character position information generation unit 124, a signal instructing it to write the left-adjacent element.
When this signal is output from the processing control unit 121, the character position information generation unit 124 writes the value of the element in the jth row and (k−1)th column as the value of the element in the jth row and kth column of the character position information matrix. That is, as described with reference to FIG. 5, the value is set equal to that of whichever of the left-adjacent and upper-adjacent elements has the longer LCS length (here, the left-adjacent element); if the two LCS lengths are the same, it is set equal to the smaller of the two values (step S132).
Thereafter, the processing control unit 121 proceeds to step S141.
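
The cell-by-cell rules of steps S107 through S132 can be illustrated as follows. This is a minimal Python sketch, not the patented implementation. The function name `build_matrices` and the list-of-lists layout (with a leading row and column of zeros as the initial values) are assumptions made for illustration, and the threshold cutoff of step S105 is omitted.

```python
def build_matrices(input_str: str, term: str):
    """Return (lcs_len, pos_info); row j / column k are 1-based as in the text."""
    rows, cols = len(input_str) + 1, len(term) + 1
    lcs_len = [[0] * cols for _ in range(rows)]
    pos_info = [[0] * cols for _ in range(rows)]
    for j in range(1, rows):
        for k in range(1, cols):
            up_len, left_len = lcs_len[j - 1][k], lcs_len[j][k - 1]
            up_pos, left_pos = pos_info[j - 1][k], pos_info[j][k - 1]
            if input_str[j - 1] == term[k - 1]:
                # Steps S111 and S112: upper-left value plus 1, and
                # upper-left position value plus 2^(k-1).
                lcs_len[j][k] = lcs_len[j - 1][k - 1] + 1
                pos_info[j][k] = pos_info[j - 1][k - 1] + 2 ** (k - 1)
            elif up_len >= left_len:
                # Step S122: copy the upper-adjacent LCS length.
                lcs_len[j][k] = up_len
                if up_len != left_len or up_pos <= left_pos:
                    pos_info[j][k] = up_pos    # step S125
                else:
                    pos_info[j][k] = left_pos  # step S132 (tie, left is smaller)
            else:
                # Steps S131 and S132: copy the left-adjacent values.
                lcs_len[j][k] = left_len
                pos_info[j][k] = left_pos
    return lcs_len, pos_info
```

For the input "大腸癌" and the term "大腸粘膜内癌", the bottom-right elements are 3 and binary 100011, the character position information quoted for this pair in the description of FIG. 10.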

以上により、ＬＣＳ長行列および文字位置情報行列が完成する。そして、ＬＣＳ長行列の最上行最右列の要素の値が、対応する用語のＬＣＳ長を示し、文字位置情報行列の最上行最右列の要素の値が、対応する用語中における、ＬＣＳに含まれる文字の出現位置、すなわち文字位置情報を示している。 Thus, the LCS length matrix and the character position information matrix are completed. The value of the element in the top rightmost column of the LCS length matrix indicates the LCS length of the corresponding term, and the value of the element in the top rightmost column of the character position information matrix is the LCS in the corresponding term. The appearance position of the included character, that is, the character position information is shown.

なお、図７のステップＳ１０５で用語の絞込みを行わない場合（例えば、閾値の値が「０」または「１」に設定されている場合）は、処理部１２０が、ＬＣＳ長行列と文字位置情報行列とを常に新たに生成するのではなく、生成済みのＬＣＳ長行列および文字位置情報行列を更新するようにしてもよい。例えば、ユーザの操作入力により入力文字列の末尾に新たに１文字追加された場合、処理部１２０に入力される入力文字列のうち、末尾の文字を除いた部分文字列は、前回の入力文字列と同一である。そこで、ＬＣＳ長算出部１２３が、各ＬＣＳ長行列の末尾に行を追加し、文字位置情報生成部１２４が、各文字位置情報行列の末尾に行を追加するようにしてもよい。そして、処理部１２０は、新たに入力された入力文字列の末尾の文字に基づき、ループＬ１２の処理手順に従って、ＬＣＳ長行列および文字位置情報行列を完成させる。 When terms are not narrowed down in step S105 of FIG. 7 (for example, when the threshold is set to "0" or "1"), the processing unit 120 may update the already generated LCS length matrices and character position information matrices instead of always generating them anew. For example, when the user's operation input appends one new character to the end of the input character string, the partial character string obtained by removing the last character from the input character string supplied to the processing unit 120 is identical to the previous input character string. The LCS length calculation unit 123 may therefore append a row to the end of each LCS length matrix, and the character position information generation unit 124 may append a row to the end of each character position information matrix. The processing unit 120 then completes the LCS length matrices and the character position information matrices based on the last character of the newly input character string, following the processing procedure of loop L12.
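
The row-appending update described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the class name `IncrementalLcs` and its method `append` are hypothetical, only the LCS length matrix is tracked (a character position information matrix would be extended in the same way), and no threshold cutoff is applied.

```python
class IncrementalLcs:
    """Keeps the LCS length matrix for one term and grows it row by row."""

    def __init__(self, term: str):
        self.term = term
        # Row 0 holds the initial values (all zeros).
        self.rows = [[0] * (len(term) + 1)]

    def append(self, ch: str) -> int:
        """Add one row for a newly typed character; return the current LCS length."""
        prev = self.rows[-1]
        row = [0]
        for k, t in enumerate(self.term, start=1):
            if ch == t:
                row.append(prev[k - 1] + 1)            # upper-left value plus 1
            else:
                row.append(max(prev[k], row[k - 1]))   # larger of upper and left
        self.rows.append(row)
        return row[-1]
```

Each keystroke then costs one pass over the term instead of a recomputation of the whole matrix.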

ＬＣＳ長行列および文字位置情報行列が完成すると、処理制御部１２１は、用語記憶部１３１の記憶する各用語の順位を決定するように順位決定部１２５を制御する。
順位決定部１２５は、用語記憶部１３１の記憶する各用語の、ＬＣＳ長と、文字位置情報と、用語の長さ（文字数）と、用語リストにおける順序（用語ＩＤの値）とに基づいて、順位を決定し、決定した順位に基づいて並べられた用語のリストを生成する。
図９は、順位決定部１２５による用語の順位決定の例を示す図である。同図は、入力文字列が「大腸癌」である場合の例であり、図１に示す用語リストに含まれる用語のうちＬＣＳ長が３以上の用語と、各用語の順位と、順位を決定する基準となるＬＣＳ長と文字位置情報と用語の文字数と用語ＩＤとを示している。 When the LCS length matrix and the character position information matrix are completed, the process control unit 121 controls the rank determination unit 125 to determine the rank of each term stored in the term storage unit 131.
The rank determination unit 125 is based on the LCS length, the character position information, the term length (number of characters), and the order in the term list (term ID value) for each term stored in the term storage unit 131. The ranking is determined, and a list of terms arranged based on the determined ranking is generated.
FIG. 9 is a diagram showing an example of how the rank determination unit 125 determines the ranks of terms. The figure is an example for the input character string "大腸癌" (colon cancer) and shows, for the terms in the term list of FIG. 1 whose LCS length is 3 or more, the rank of each term together with the criteria that determine the rank: the LCS length, the character position information, the number of characters of the term, and the term ID.

順位決定部１２５は、まず、ＬＣＳ長が閾値以上の用語のみを、端末装置２００の表示部２３０に表示する用語として選択する。具体的には、順位決定部１２５は、ＬＣＳ長行列記憶部１３３が記憶する各ＬＣＳ長行列の最下行最右列の要素と閾値とを比較し、閾値以上の値を示すＬＣＳ長行列に対応する用語のみを選択する。ここで、ＬＣＳ長行列記憶部１３３が記憶するＬＣＳ長行列の中には、図７のステップＳ１０５における判定の結果、ＬＣＳ長算出処理が中止されているＬＣＳ長行列もある。このＬＣＳ長行列の最下行最右列の要素は、実際のＬＣＳ長よりも小さい値を示す場合がある。しかし、図７（ステップＳ１０５：ＮＯ）で説明したように、このＬＣＳ長行列に対応する用語は、ＬＣＳ長が閾値未満となることが確定している用語であり、順位決定部１２５の選択には影響しない。 The rank determination unit 125 first selects only the terms whose LCS length is equal to or greater than the threshold as the terms to be displayed on the display unit 230 of the terminal device 200. Specifically, it compares the element in the bottom row and rightmost column of each LCS length matrix stored in the LCS length matrix storage unit 133 with the threshold, and selects only the terms corresponding to LCS length matrices whose value is equal to or greater than the threshold. Some of the LCS length matrices stored in the LCS length matrix storage unit 133 are ones for which the LCS length calculation was stopped as a result of the determination in step S105 of FIG. 7. The element in the bottom row and rightmost column of such a matrix may indicate a value smaller than the actual LCS length. However, as described for FIG. 7 (step S105: NO), the term corresponding to such a matrix is one whose LCS length is already known to be less than the threshold, so this does not affect the selection by the rank determination unit 125.

次に順位決定部１２５は、選択した各用語に対して、ＬＣＳ長が長いほど上位とし（上位の用語ほど高い優先順位である、すなわち表示部２３０の入力欄に近い位置に表示されやすくなる）、ＬＣＳ長が同じ用語に対しては、文字位置情報の値が小さい用語を上位とする。さらに、ＬＣＳ長および文字位置情報の値が同じ用語に対しては、用語の文字数が少ない用語を上位とし、用語の文字数も同じである用語に対しては、用語ＩＤの小さい用語を上位とする。
上述したように、ＬＣＳ長は入力文字列と用語との類似度を示す情報として用いることができる。従って、ＬＣＳ長が長い用語ほど上位とすることで、入力文字列と類似度の高い用語を上位とすることができ、ユーザが入力したい文字列の候補を適切に提示できる。
また、ユーザは、通常、入力したい文字列の先頭から順に入力する。したがって、入力済みの文字列とのＬＣＳが先頭付近に出現する用語のほうが、末尾付近に出現する用語よりも、ユーザが入力したい文字列の候補として適切である。文字位置情報の値が小さい用語を上位とすることで、入力済みの文字列とのＬＣＳが先頭付近に出現する用語を上位とすることができ、ユーザが入力したい文字列の候補を適切に提示できる。 Next, the rank determination unit 125 ranks the selected terms so that a term with a longer LCS length is ranked higher (a higher-ranked term has higher priority, that is, it is more likely to be displayed close to the input field of the display unit 230); among terms with the same LCS length, a term with a smaller character position information value is ranked higher. Furthermore, among terms with the same LCS length and character position information value, a term with fewer characters is ranked higher, and among terms that also have the same number of characters, a term with a smaller term ID is ranked higher.
As described above, the LCS length can be used as information indicating the similarity between the input character string and a term. Ranking terms with a longer LCS length higher therefore ranks terms more similar to the input character string higher, so that candidates for the character string the user wants to input can be presented appropriately.
In addition, a user normally types a character string in order from its beginning. A term in which the LCS with the already entered character string appears near the beginning is therefore a more appropriate candidate for the character string the user wants to input than a term in which it appears near the end. Ranking terms with a smaller character position information value higher ranks terms in which the LCS with the entered character string appears near the beginning higher, so that candidates for the character string the user wants to input can be presented appropriately.
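
The four ranking criteria can be expressed as a single composite sort key. The following is a minimal Python sketch. The `Candidate` type, the `rank` function, and the sample values (including the term IDs) are hypothetical and are not taken from the term list in the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    term_id: int
    term: str
    lcs_len: int    # LCS length against the input string
    pos_info: int   # character position information


def rank(candidates, threshold):
    """Keep terms at or above the threshold, then order them by the four criteria."""
    selected = [c for c in candidates if c.lcs_len >= threshold]
    # Longer LCS first; then smaller position value, shorter term, smaller ID.
    return sorted(selected,
                  key=lambda c: (-c.lcs_len, c.pos_info, len(c.term), c.term_id))


# Hypothetical sample data for the input "大腸癌".
sample = [
    Candidate(4, "直腸癌", 2, 0b110),
    Candidate(5, "大腸菌", 2, 0b011),
    Candidate(2, "大腸粘膜内癌", 3, 0b100011),
    Candidate(3, "大腸炎", 2, 0b011),
    Candidate(1, "大腸癌", 3, 0b111),
]
ranked_ids = [c.term_id for c in rank(sample, threshold=2)]
```

With this sample, `ranked_ids` is `[1, 2, 3, 5, 4]`: the two terms with LCS length 3 come first, and the tie between IDs 3 and 5 is broken by term ID.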

順位決定部１２５は、例えば、順位を示す指標を用語毎に生成し、生成した指標に基づいて用語を並べ替える。
図１０は、順位決定部１２５が生成する指標の例を示す図である。
同図に示す指標は、ＬＣＳの長さ（二進数表示）と、小数点と、文字位置情報（二進数表示）の２の補数と、用語の長さ（二進数表示）の２の補数と、用語ＩＤ（二進数表示）の２の補数とが、この順に結合されて生成される。その際、順位決定部１２５は、文字位置情報（二進数表示）の桁数を、当該用語文字数に揃える。また、用語の長さ（二進数表示）の桁数を、用語記憶部１３１が記憶する用語のうち最長のものを表現可能な桁数に揃える。また、用語ＩＤ（二進数表示）の桁数を、用語記憶部１３１が記憶する用語ＩＤのうち最大のものを表現可能な桁数に揃える。 For example, the rank determination unit 125 generates an index indicating the rank for each term, and rearranges the terms based on the generated index.
FIG. 10 is a diagram illustrating an example of an index generated by the rank determining unit 125.
The index shown in the figure is generated by concatenating, in this order, the LCS length (in binary), a decimal point, the two's complement of the character position information (in binary), the two's complement of the term length (in binary), and the two's complement of the term ID (in binary). In doing so, the rank determination unit 125 pads the character position information (in binary) to a number of digits equal to the number of characters of the term. It also pads the term length (in binary) to the number of digits needed to represent the longest term stored in the term storage unit 131, and pads the term ID (in binary) to the number of digits needed to represent the largest term ID stored in the term storage unit 131.

また、同図の値は、入力文字列が「大腸癌」である場合に、用語「大腸粘膜内癌」に対して生成される指標の値を示している。
「大腸癌」の文字数は「３」であるため、同図に示す指標の整数部分は「１１」となっており、また、小数部分は、文字位置情報「１０００１１」の２の補数「０１１１００」と、用語の長さ「６」の二進数表示「０・・・０１１０」の２の補数「１・・・１００１」と、用語ＩＤ「５」の二進数表示「０・・・０１０１」の２の補数「１・・・１０１０」とを結合した値となっている。２の補数を取ることにより、元となる値が小さいほど大きい値の指標が生成される。 Also, the values in the figure indicate the index values generated for the term “colon mucosa cancer” when the input character string is “colon cancer”.
Since the number of characters of "大腸癌" (colon cancer) is "3", the integer part of the index shown in the figure is "11". The fractional part is the concatenation of the two's complement "011100" of the character position information "100011", the two's complement "1...1001" of the binary representation "0...0110" of the term length "6", and the two's complement "1...1010" of the binary representation "0...0101" of the term ID "5". Taking the two's complement means that the smaller the original value, the larger the resulting index.
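
The construction of this index can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the text: the function names `invert_bits` and `build_index` are hypothetical; the 4-bit widths used for the term length and term ID stand in for the digit counts that the text elides with "..."; and the complement is implemented as bit inversion within the field width, because that is what reproduces the quoted example values (100011 becomes 011100).

```python
def invert_bits(value: int, width: int) -> str:
    """Invert each bit of value within a fixed-width binary field."""
    mask = (1 << width) - 1
    return format(value ^ mask, f"0{width}b")


def build_index(lcs_len: int, pos_info: int, term_len: int, term_id: int,
                len_bits: int, id_bits: int) -> str:
    """Concatenate the index fields in the order described in the text."""
    return (format(lcs_len, "b") + "."
            + invert_bits(pos_info, term_len)   # padded to the term's length
            + invert_bits(term_len, len_bits)   # padded to fit the longest term
            + invert_bits(term_id, id_bits))    # padded to fit the largest ID


# Example from the text: LCS length 3, position 100011, term length 6, term ID 5.
index = build_index(3, 0b100011, 6, 5, len_bits=4, id_bits=4)
```

Under these assumptions `index` is `"11.01110010011010"`, which matches the integer part "11" and the three complemented fields given above.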

順位決定部１２５が、この指標の大きい順に用語を並べ替えることにより、各用語は、ＬＣＳの長さが長い順に並べられ、ＬＣＳの長さが同一の場合は、文字位置情報の値が小さい順に並べられ、文字位置情報の値も同一の場合は、用語の長さが短い順に並べられ、用語の長さも同一の場合は、用語ＩＤの値が小さい順に並べられる。
ここで、ＬＣＳに含まれる文字が用語に出現する位置のうち、最も後ろ（用語の末尾側）の位置が先頭に近い用語ほど、文字位置情報の値が小さくなる。また、ＬＣＳに含まれる文字が用語に出現する位置のうち、最も後ろの出現位置が同一の場合は、後ろから２番目の出現位置が先頭に近い用語ほど、文字位置情報の値が小さくなる。同様に、１〜ｉ−１（ｉはＬＣＳ長以下の正整数）番目の各出現位置が同一の場合は、ｉ番目の出現位置が先頭に近い用語ほど、文字位置情報の値が小さくなる。この点で、文字位置情報生成部１２４は、ＬＣＳに含まれる文字が用語中に出現する位置が先頭に近いほど小さい値を示す文字位置情報を生成する。 The order determination unit 125 rearranges the terms in the descending order of the index, so that the terms are arranged in the order of the LCS length, and when the LCS length is the same, the character position information values are in ascending order. When the values of the character position information are the same, the terms are arranged in ascending order, and when the terms have the same length, the terms are arranged in ascending order of the term ID values.
Here, the nearer to the beginning of the term the rearmost position (the position closest to the end of the term) at which a character of the LCS appears, the smaller the value of the character position information. When the rearmost appearance positions are the same, the nearer to the beginning the second appearance position from the rear, the smaller the value of the character position information. Similarly, when the 1st through (i−1)th appearance positions are the same (i being a positive integer no greater than the LCS length), the nearer to the beginning the ith appearance position, the smaller the value of the character position information. In this sense, the character position information generation unit 124 generates character position information whose value is smaller the nearer to the beginning of the term the characters of the LCS appear.

順位決定部１２５は、並べ替えた用語を並べ替えた順に含むリストを生成し、通信部１１０を介して端末装置２００に送信する。
端末装置２００の通信部２１０は、順位決定部１２５からの用語のリストを受信すると、受信したリストを表示部２３０に出力する。表示部２３０は、通信部２１０から出力される用語のリストを、入力欄の下方近傍に表示する。 The rank determination unit 125 generates a list including the sorted terms in the sorted order, and transmits the list to the terminal device 200 via the communication unit 110.
When the communication unit 210 of the terminal device 200 receives the list of terms from the order determination unit 125, the communication unit 210 outputs the received list to the display unit 230. The display unit 230 displays a list of terms output from the communication unit 210 near the lower part of the input field.

図１１は、表示部２３０が用語のリストを表示した例を示す図である。同図では、入力文字列が「大腸癌」である場合に、図２に示した用語リストの用語のうち、ＬＣＳ長が３以上の用語を表示した例が示されている。順位決定部１２５は、設定されている閾値「３」に基づいて用語の絞込みを行い、さらに、図９で示した順位に従って用語を並べ替え、並べ替えた順に用語を含むリストを生成する。そして、表示部２３０がこのリストを表示する。
ユーザが、このリストに含まれる用語のいずれかを、マウスでクリックする等により選択すると、入力部２２０は、選択された用語を入力文字列として表示部２３０と通信部２１０に出力する。以下、各部は上述した処理を行う。 FIG. 11 is a diagram illustrating an example in which the display unit 230 displays a list of terms. In the same figure, when the input character string is “colon cancer”, an example is shown in which terms having an LCS length of 3 or more are displayed among the terms in the term list shown in FIG. The rank determination unit 125 narrows down terms based on the set threshold value “3”, rearranges the terms according to the rank shown in FIG. 9, and generates a list including the terms in the rearranged order. Then, the display unit 230 displays this list.
When the user selects one of the terms included in this list by clicking with the mouse or the like, the input unit 220 outputs the selected term to the display unit 230 and the communication unit 210 as an input character string. Hereinafter, each unit performs the above-described processing.

以上のように、文字列入力支援装置１００の文字インデックス記憶部１３２は、用語記憶部１３１の記憶する用語に含まれる文字の各々について、当該文字を含む用語を示す文字インデックスを予め記憶しておく。そして、処理部１２０は、ユーザが入力したい文字列の候補のリストを生成する際に、文字インデックスに基づいて処理対象となる用語の選択（絞込み）を行う。これにより、高速に、かつ、ＬＣＳ長が所定の閾値以上の全ての用語を対象として適切に、ユーザの入力したい文字列の候補を得られる。 As described above, the character index storage unit 132 of the character string input support device 100 stores in advance a character index indicating a term including the character for each character included in the term stored in the term storage unit 131. . Then, when generating a list of candidate character strings that the user wants to input, the processing unit 120 selects (narrows) terms to be processed based on the character index. Thereby, the candidate of the character string which a user wants to input can be obtained appropriately for all terms whose LCS length is equal to or greater than a predetermined threshold at high speed.
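
The character index is in effect an inverted index from characters to term IDs. The following minimal Python sketch shows how such an index could be built and used to narrow the terms. The function names and the sample terms and IDs are hypothetical, and the selection shown (every term that shares at least one character with the input) is one reading of the narrowing step.

```python
from collections import defaultdict


def build_char_index(terms):
    """Map each character to the set of IDs of the terms that contain it."""
    index = defaultdict(set)
    for term_id, term in terms.items():
        for ch in set(term):
            index[ch].add(term_id)
    return dict(index)


def candidate_term_ids(index, input_str):
    """Return the IDs of terms sharing at least one character with the input."""
    ids = set()
    for ch in input_str:
        ids |= index.get(ch, set())
    return sorted(ids)


# Hypothetical term list: term ID -> term.
terms = {1: "大腸炎", 2: "胃癌", 3: "肺炎"}
char_index = build_char_index(terms)
```

Only the terms returned by `candidate_term_ids` need an LCS computation; terms with no character in common with the input have an LCS length of 0 and are never touched.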

また、処理部１２０は、入力文字列の部分文字列と用語とのＬＣＳ長を算出し、算出したＬＣＳ長に基づいて、当該用語に関する類似度を示す情報の生成を中止するか否かを決定する。具体的には、入力文字列のうち未処理の文字の数と、算出したＬＣＳ長との合計値が、予め定められた閾値未満の場合に、当該用語について、ＬＣＳ長の算出および文字位置情報の生成を中止する。これにより、高速に、かつ、ＬＣＳ長が閾値以上の全ての用語を対象として適切に、ユーザの入力したい文字列の候補を得られる。 In addition, the processing unit 120 calculates the LCS length between a partial character string of the input character string and a term, and decides, based on the calculated LCS length, whether to stop generating the information indicating the similarity for that term. Specifically, when the sum of the number of unprocessed characters in the input character string and the calculated LCS length is less than a predetermined threshold, it stops calculating the LCS length and generating the character position information for that term. As a result, candidates for the character string the user wants to input can be obtained quickly and appropriately, covering all terms whose LCS length is equal to or greater than the threshold.

なお、ＬＣＳ長算出部１２３が生成する類似度を示す情報は、上述したＬＣＳ長に限らず、ある文字が用語に含まれるか否かに基づいて再帰的に生成されるものであればよい。例えば、入力文字列と用語とに共通して出現する文字数（ＬＣＳ長の算出において、文字の並びが同順であるとの条件を不要としたもの）を、上述したＬＣＳ長の算出と同様に、入力文字列の部分文字列について再帰的に算出するようにしてもよい。 Note that the information indicating the degree of similarity generated by the LCS length calculation unit 123 is not limited to the LCS length described above, and may be any information that is recursively generated based on whether a certain character is included in the term. For example, the number of characters that appear in common in the input character string and the term (in which the condition that the character sequence is the same in the calculation of the LCS length is not required) is calculated in the same manner as in the calculation of the LCS length described above. The partial character string of the input character string may be calculated recursively.
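
The alternative similarity measure mentioned above, the number of characters common to the input and the term regardless of order, can be accumulated one input character at a time. The following is a minimal Python sketch. The function name `common_char_count` is hypothetical, and counting repeated characters as a multiset is an assumption the text does not spell out.

```python
from collections import Counter


def common_char_count(input_str: str, term: str) -> int:
    """Count characters shared by the input and the term, ignoring order."""
    remaining = Counter(term)   # occurrences of each character still unmatched
    count = 0
    for ch in input_str:        # one step per input character, as with the LCS rows
        if remaining[ch] > 0:
            remaining[ch] -= 1
            count += 1
    return count
```

Because the count after each input character depends only on the previous state and the new character, the same early cutoff and incremental update ideas apply to it.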

なお、処理制御部１２１が、用語リストに含まれる用語のうち、入力文字列に含まれる全ての用語を、ユーザの入力したい文字列の候補として選択するようにしてもよい。この場合、処理制御部１２１は、図７のループ１２において、用語ＩＤリストに基づいて得られる全ての用語を、ユーザの入力したい文字列の候補として選択する。 Note that the processing control unit 121 may select all the terms included in the input character string among the terms included in the term list as candidates for the character string that the user wants to input. In this case, the processing control unit 121 selects all the terms obtained based on the term ID list in the loop 12 of FIG. 7 as candidates for character strings to be input by the user.

なお、上述したように、文字列入力支援装置１００は、コンピュータによって実現するようにしてもよい。すなわち、文字列入力支援装置１００の全部または一部の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより各部の処理を行ってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。
また、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。
また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含むものとする。また上記プログラムは、前述した機能の一部を実現するためのものであっても良く、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであっても良い。 Note that, as described above, the character string input support device 100 may be realized by a computer. That is, a program for realizing all or part of the functions of the character string input support device 100 is recorded on a computer-readable recording medium, and the program recorded on the recording medium is read into a computer system and executed. The processing of each unit may be performed as necessary. Here, the “computer system” includes an OS and hardware such as peripheral devices.
Further, the “computer system” includes a homepage providing environment (or display environment) if a WWW system is used.
The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as the volatile memory inside a computer system that serves as a server or a client in that case. The program may implement only some of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described above in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and also includes design changes and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS

1 Character string input support system
100 Character string input support device
110 Communication unit
120 Processing unit
121 Processing control unit
122 Character index generation unit
123 LCS length calculation unit
124 Character position information generation unit
125 Rank determination unit
130 Storage unit
131 Term storage unit
132 Character index storage unit
133 LCS length matrix storage unit
134 Character position information matrix storage unit
200 Terminal device
210 Communication unit
220 Input unit
230 Display unit

Claims

A character string selection device comprising:
an acquisition unit that acquires a first character string;
a character string group storage unit that stores a character string group with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string;
a character index storage unit that stores a character index in which, for each distinct character among all the characters included in the second character strings of the character string group with identification information, the character is associated with the character string identification information of each second character string that includes the character and is included in the character string group with identification information; and
a character string selection unit that selects, based on the character index, the second character string identified by the character string identification information associated with a character included in the first character string.
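
For illustration only, the character index and the selection step recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names `build_character_index` and `select_candidates`, the use of a Python dictionary of sets, the sample terms, and the reading of the selection as a union over every character of the first character string are all hypothetical.

```python
from collections import defaultdict


def build_character_index(terms):
    """Build a character index (hypothetical helper).

    terms: dict mapping character string identification information (an ID)
           to a second character string.
    Returns a dict mapping each distinct character to the set of IDs of the
    strings that contain it.
    """
    index = defaultdict(set)
    for term_id, term in terms.items():
        for ch in set(term):  # each distinct character of the string
            index[ch].add(term_id)
    return dict(index)


def select_candidates(first_string, index):
    """Select the IDs of stored strings sharing at least one character
    with the first character string (assumed union semantics)."""
    selected = set()
    for ch in set(first_string):
        selected |= index.get(ch, set())
    return selected


# Sample data (made up for this sketch).
terms = {1: "colitis", 2: "colon cancer", 3: "asthma"}
index = build_character_index(terms)
candidates = select_candidates("co", index)  # strings containing "c" or "o"
```

Because only the strings returned by `select_candidates` need to be compared with the input, strings that share no character with the input are never examined.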

The character string selection device according to claim 1, wherein the character string selection unit generates character number information indicating the number of characters common to the characters included in the selected second character string and the characters included in the first character string.
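
One possible way to obtain such a count is the length of the longest common subsequence (LCS), which the abstract names as the similarity measure. The following is a minimal sketch of the standard row-by-row dynamic program; the function name `lcs_length` is hypothetical, and treating the "character number information" as the LCS length is an assumption made for this example.

```python
def lcs_length(first, second):
    """Length of the longest common subsequence of two strings.

    Standard dynamic programming, keeping only the previous row.
    Using the LCS length as the common-character count is an assumption.
    """
    prev = [0] * (len(second) + 1)
    for ca in first:
        curr = [0]
        for j, cb in enumerate(second, start=1):
            if ca == cb:
                # Extend the common subsequence that ends just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(second)]
```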

The character string selection device according to claim 1 or 2, wherein the character string selection unit generates character number information indicating the number of characters common to the characters included in a partial character string of the first character string and the characters included in the second character string, and stops generating information indicating the similarity between the first character string and the second character string if a value determined based on the character number information and on the difference between the number of characters included in the first character string and the number of characters included in the partial character string is less than a predetermined threshold.
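
The early termination described above can be illustrated with a short sketch. The assumptions here are loud ones: the partial character string is taken to be a prefix of the first character string, the common-character count is taken to be the LCS length, and the "value determined based on" the two quantities is taken to be their sum, which is an upper bound on the final LCS length. The name `lcs_length_with_cutoff` is hypothetical.

```python
def lcs_length_with_cutoff(first, second, threshold):
    """LCS length of first and second, abandoning early when hopeless.

    After processing the prefix first[:i], the final LCS length cannot
    exceed (LCS of the prefix with second) + (characters of first not yet
    processed). If that bound is below threshold, return None.
    The sum-based bound is an assumed reading of the claim.
    """
    prev = [0] * (len(second) + 1)
    for i, ca in enumerate(first, start=1):
        curr = [0]
        for j, cb in enumerate(second, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
        upper_bound = prev[-1] + (len(first) - i)
        if upper_bound < threshold:
            return None  # stop generating similarity information
    return prev[-1]
```

With a threshold of 3, comparing "xbc" against "abc" stops after the first character, since at most two more characters could still match.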

A character string selection method for a character string selection device comprising:
a character string group storage unit that stores a character string group with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string; and
a character index storage unit that stores a character index in which, for each distinct character among all the characters included in the second character strings of the character string group with identification information, the character is associated with the character string identification information of each second character string that includes the character and is included in the character string group with identification information,
the character string selection method comprising:
an acquisition step in which an acquisition unit acquires a first character string; and
a character string selection step in which a character string selection unit selects, based on the character index, the second character string identified by the character string identification information associated with a character included in the first character string.

A program for causing a computer serving as a character string selection device comprising:
a character string group storage unit that stores a character string group with identification information, in which each of one or more second character strings is associated with character string identification information identifying that second character string; and
a character index storage unit that stores a character index in which, for each distinct character among all the characters included in the second character strings of the character string group with identification information, the character is associated with the character string identification information of each second character string that includes the character and is included in the character string group with identification information,
to execute:
an acquisition step of acquiring a first character string; and
a character string selection step of selecting, based on the character index, the second character string identified by the character string identification information associated with a character included in the first character string.