JP5482236B2

JP5482236B2 - Program and information processing apparatus

Info

Publication number: JP5482236B2
Application number: JP2010016307A
Authority: JP
Inventors: 基行鷹合; 洋平山根; 博増市
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2010-01-28
Filing date: 2010-01-28
Publication date: 2014-05-07
Anticipated expiration: 2030-01-28
Also published as: JP2011154590A

Description

The present invention relates to a program and an information processing apparatus.

Techniques that assist text input are known. For example, Patent Document 1 discloses a technique that predicts the character string following a user-entered character string containing kanji or hiragana. In that technique, a dictionary holds a plurality of character strings consisting of hiragana, kanji, or mixed kanji and hiragana. The entered character string is used as a key to extract a character string from the dictionary, and the extracted string is output as the predicted continuation of the entered string.

Patent Document 2 discloses a technique for entering text on a pen-input computer. Based on input and/or already-confirmed character strings, one or more candidate words are retrieved from a dictionary storage unit that stores a plurality of words and a plurality of example sentences, the candidates are shown on a display unit, and the user enters text by selecting the desired word from among them.

Patent Document 3 discloses a technique that accepts a sentence containing an ellipsis symbol and fills in the character string that the ellipsis stands for. In that technique, candidate words for the omitted part are extracted from the characters before and after the ellipsis by referring to a word dictionary that stores words and their frequencies of use. One word is then chosen from the candidates based on a transition dictionary that stores transitions between words and their transition probabilities.

Other techniques support proofreading of text that has already been entered. For example, in the technique of Patent Document 4, the sentence to be processed is analyzed phrase by phrase, and the constituents of an idiomatic predicate phrase (one made up of several elements that are customarily used together in a sentence) are detected among the phrases. The detected constituents are then rearranged, helping the user revise the sentence into a form that is easier to understand.

Patent Document 1: JP-A-8-255158
Patent Document 2: JP-A-10-154033
Patent Document 3: JP-A-2000-330984
Patent Document 4: JP-A-7-234871

When correcting text that has already been entered, a user may want to insert a character string, or may want to replace part of the text with a different character string.

An object of the present invention is to provide a program and an information processing apparatus that support both text corrections involving insertion of a character string and text corrections involving replacement of a character string.

The invention according to claim 1 is a program that causes a computer to execute: a receiving step of receiving a designated position, which is a position in a target character string, the target character string being the character string to be processed; an estimating step of referring to dictionary storage means that stores a plurality of sentences, using a character string contained in the target character string up to the designated position as a search key, and estimating, in each of the stored sentences that contains the search-key string, the character string following the search-key string as a candidate character string, that is, a candidate for the string that follows the target character string up to the designated position; an analysis step of performing morphological analysis on the part of the target character string after the designated position; a decomposition step of decomposing, based on the result of the morphological analysis, the part of the target character string after the designated position into element character strings, each corresponding to a constituent of the sentence; an evaluation step of evaluating, for each pair of a candidate character string and an element character string and in accordance with a predetermined evaluation rule, the expression in which the candidate character string is followed by the portion of the target character string from that element character string onward, and obtaining an evaluation value representing the validity of the expression as natural language; and an output step of outputting each expression whose evaluation value is equal to or greater than a preset threshold.

The invention according to claim 2 is the invention according to claim 1, in which the program further causes the computer to execute a generation step of generating, in response to a user input selecting one of the expressions output in the output step, a character string in which the part of the target character string from the designated position onward is replaced with the selected expression.

The invention according to claim 3 is the invention according to claim 1 or 2, in which, when there are a plurality of expressions whose evaluation value is equal to or greater than the threshold, the output step further outputs display control information that causes a display unit to display expressions with higher evaluation values preferentially.

The invention according to claim 4 is the invention according to any one of claims 1 to 3, in which no evaluation value is obtained for an expression identical to an expression already contained in the target character string.

The invention according to claim 5 is the invention according to any one of claims 1 to 4, in which the output step further outputs display control information that causes a display unit to display, in each expression whose evaluation value is equal to or greater than the threshold, the candidate character string and the character strings taken from the target character string in different styles.

The invention according to claim 6 is the invention according to any one of claims 1 to 5, in which the output step further outputs display control information that causes a display unit to display, together with an expression whose evaluation value is equal to or greater than the threshold, the element character strings of the target character string that lie between the designated position and the element character string included in that expression.

The invention according to claim 7 is an information processing apparatus comprising: receiving means for receiving a designated position, which is a position in a target character string, the target character string being the character string to be processed; estimating means for referring to dictionary storage means that stores a plurality of sentences, using a character string contained in the target character string up to the designated position as a search key, and estimating, in each of the stored sentences that contains the search-key string, the character string following the search-key string as a candidate character string, that is, a candidate for the string that follows the target character string up to the designated position; analysis means for performing morphological analysis on the part of the target character string after the designated position; decomposition means for decomposing, based on the result of the morphological analysis, the part of the target character string after the designated position into element character strings, each corresponding to a constituent of the sentence; evaluation means for evaluating, for each pair of a candidate character string and an element character string and in accordance with a predetermined evaluation rule, the expression in which the candidate character string is followed by the portion of the target character string from that element character string onward, and obtaining an evaluation value representing the validity of the expression as natural language; and output means for outputting each expression whose evaluation value is equal to or greater than a preset threshold.

The invention according to claim 1 or 7 supports both text corrections that involve inserting a character string and text corrections that involve replacing a character string.

The invention according to claim 2 allows the text to be corrected with the expression selected by the user.

The invention according to claim 3 presents expressions with higher validity as natural language to the user first.

The invention according to claim 4 skips computing evaluation values for expressions identical to those already in the target character string.

The invention according to claim 5 presents to the user, in a distinguishable way, the candidate character string estimated to follow the designated position and the character strings originally contained in the target character string.

The invention according to claim 6 presents to the user, together with an expression generated from a candidate character string, the original character string in the target character string that the candidate replaces in that expression.

FIG. 1 is a block diagram schematically showing an example of the internal configuration of the information processing apparatus.
FIG. 2 shows an example of a display screen.
FIG. 3 shows an example of text decomposed into morphemes.
FIG. 4 shows an example of expressions formed by connecting a candidate for the character string following the designated position.
FIG. 5 shows an example of the procedure of processing performed by the information processing apparatus.
FIG. 6 shows an example of the procedure of the connected expression evaluation process.
FIG. 7 shows another example of expressions formed by connecting candidates for the character string following the designated position.
FIG. 8 shows part of an example result of the connected expression evaluation process.
FIG. 9 shows part of an example result of the connected expression evaluation process.
FIG. 10 shows part of an example result of the connected expression evaluation process.
FIG. 11 shows part of an example result of the connected expression evaluation process.
FIG. 12 shows an example of a display mode.
FIG. 13 shows another example of a display mode.
FIG. 14 shows yet another example of a display mode.
FIG. 15 shows an example of the hardware configuration of a computer.

FIG. 1 schematically shows the internal configuration of an information processing apparatus according to an example embodiment of the present invention. The information processing apparatus 10 in the example of FIG. 1 includes a document storage unit 100, a reference data storage unit 110, an input receiving unit 120, a target text acquisition unit 130, a subsequent character string estimation unit 140, a text decomposition unit 150, a connected expression evaluation unit 160, an output processing unit 170, and a corrected text generation unit 180.

The document storage unit 100 stores electronic documents. The processing performed by each unit of the information processing apparatus 10, described below, operates on text (character strings) in the electronic documents stored in the document storage unit 100.

The reference data storage unit 110 stores information referred to during processing by the information processing apparatus 10. It includes an analysis dictionary 112, an example sentence dictionary 114, and evaluation data 116, each of which is described in detail later.

The input receiving unit 120 receives user input via input devices such as a mouse and a keyboard. Depending on its content, it passes the received input to the target text acquisition unit 130, the subsequent character string estimation unit 140, or the corrected text generation unit 180.

The target text acquisition unit 130 acquires the text to be processed from an electronic document stored in the document storage unit 100. For example, it acquires the character strings before and after a position designated by the user in the document. Referring to FIG. 2, suppose that, on a display screen 200 on which a display device (not shown) shows an electronic document stored in the document storage unit 100, the user uses an input device to place the cursor at the position indicated by arrow C. In this case, the target text acquisition unit 130 obtains information indicating the cursor position via the input receiving unit 120, treats the cursor position as the position designated by the user, and acquires the character strings before and after it. The acquired character string may be, for example, the single sentence that contains the designated position. In the example of FIG. 2, the sentence 「右肺に高吸収域が見られます。」 ("A high-absorption area is seen in the right lung.") is acquired. The sentence containing the designated position can be obtained, for example, by examining the characters one at a time forward and backward from the designated position until a symbol marking a sentence break (a full stop, exclamation mark, question mark, or the like) is found in each direction, and taking the character string between the designated position and each such symbol.
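The sentence-extraction step described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation; the function name sentence_at and the exact delimiter set are assumptions. The designated position is treated as a caret index between characters, and the function also returns the caret offset within the sentence so that the text up to the designated position can be recovered.

```python
# Minimal sketch: find the sentence that contains a caret position.
# Assumption: the delimiters below (Japanese and ASCII full stop,
# exclamation mark, question mark) are the only sentence breaks.
SENTENCE_DELIMITERS = set("。！？.!?")


def sentence_at(text: str, pos: int) -> tuple[str, int]:
    """Return (sentence, offset) for a caret at index pos (0 <= pos <= len(text)).

    offset is the caret position relative to the start of the returned
    sentence, so sentence[:offset] is the text up to the designated position.
    """
    if not 0 <= pos <= len(text):
        raise ValueError("pos out of range")
    start = pos
    # Walk backward until the previous character is a delimiter (or text start).
    while start > 0 and text[start - 1] not in SENTENCE_DELIMITERS:
        start -= 1
    end = pos
    # Walk forward until a delimiter (or text end) is reached.
    while end < len(text) and text[end] not in SENTENCE_DELIMITERS:
        end += 1
    if end < len(text):
        end += 1  # keep the closing delimiter with the sentence
    return text[start:end], pos - start
```

For a document such as "検査を行いました。右肺に高吸収域が見られます。経過観察とします。" with the caret just after 「右肺に」, the function returns the middle sentence, and the slice up to the returned offset is 「右肺に」.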

In the description of this embodiment, the character strings "before and after" the designated position mean the strings to its left and right in horizontally written text, and the strings above and below it in vertically written text.

The subsequent character string estimation unit 140 estimates, in accordance with a predetermined estimation rule, candidates for the character string that follows the designated position in the text to be processed. For the target text in FIG. 2, it estimates candidates for the string that follows 「右肺に」 ("in the right lung"), the text up to the designated position. In this embodiment, the subsequent character string estimation unit 140 makes this estimate by referring to the example sentence dictionary 114 in the reference data storage unit 110. The example sentence dictionary 114 stores a plurality of sentences contained in the electronic documents stored in the document storage unit 100. For each sentence, it may additionally store grammatical information such as the word boundaries in the sentence and the part of speech of each word. The subsequent character string estimation unit 140, for example, searches the example sentence dictionary 114 using the character string up to the designated position in the target text as a search key, and identifies, in each sentence containing the search-key string, the word or phrase that follows that string. The identified words or phrases whose frequency of appearance is equal to or greater than a preset threshold are taken as candidates for the string that follows the designated position. The candidates may be ranked in descending order of frequency, or a score based on frequency may be assigned to each candidate. The method used by the subsequent character string estimation unit 140 is not limited to this example; various other conventionally known techniques may be used.
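The frequency-based estimation above can be sketched as follows. This is an illustrative sketch under stated assumptions: the example sentences are assumed to be stored already split into units (words or phrases), the search key is given as a list of such units, and the function name estimate_candidates and the default threshold min_freq are hypothetical.

```python
from collections import Counter


def estimate_candidates(key, example_sentences, min_freq=2):
    """Rank the units that follow key in the example sentences.

    key               -- list of units (words or phrases) ending at the designated position
    example_sentences -- list of sentences, each already split into units (assumption)
    min_freq          -- hypothetical frequency threshold
    Returns (unit, frequency) pairs in descending order of frequency.
    """
    n = len(key)
    counts = Counter()
    for units in example_sentences:
        # Every place where the key occurs and is followed by at least one unit.
        for i in range(len(units) - n):
            if units[i:i + n] == key:
                counts[units[i + n]] += 1
    # most_common() sorts by frequency; the threshold drops rare continuations.
    return [(unit, c) for unit, c in counts.most_common() if c >= min_freq]
```

With sentences split into phrases, for instance ["右肺に", "著明な", "高吸収域を", "認めます。"], the key ["右肺に"] yields 「著明な」 as a candidate once it occurs at least min_freq times after 「右肺に」. The returned frequencies can double as the ranking or score mentioned above.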

The text decomposition unit 150 analyzes the text to be processed, acquired by the target text acquisition unit 130, in accordance with a predetermined analysis rule, and decomposes it into the elements that make up the sentence. In this embodiment, the text decomposition unit 150 refers to the analysis dictionary 112 in the reference data storage unit 110 and decomposes the text into morphemes using a morphological analysis technique known in the field of natural language processing. The analysis dictionary 112 in this example stores words in association with information such as their grammatical roles, and also stores Japanese grammar rules. For the target text in FIG. 2, the text decomposition unit 150 refers to the analysis dictionary 112 and decomposes the sentence 「右肺に高吸収域が見られます。」 into words as illustrated in FIG. 3. The text decomposition unit 150 may also analyze and decompose the candidates estimated by the subsequent character string estimation unit 140. For example, if the subsequent character string estimation unit 140 estimates 「著明な」 ("marked") as a candidate for the string following the designated position in FIG. 2, the text decomposition unit 150 performs morphological analysis on that string and decomposes it into the words 「著明」 and 「な」.
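To make the decomposition concrete, the following toy sketch splits a sentence with greedy longest-match lookup in a small dictionary that maps surface forms to parts of speech. It is a stand-in, not the "known morphological analysis technique" the text refers to: practical analyzers use much larger dictionaries and cost-based selection among competing segmentations. The lexicon and the tag names are hypothetical and cover only the example sentence.

```python
# Toy lexicon: surface form -> part of speech (hypothetical tags, example only).
TOY_LEXICON = {
    "右肺": "noun", "高吸収": "noun", "域": "suffix", "著明": "adjectival_noun",
    "に": "particle", "が": "particle", "見られ": "verb", "ます": "auxiliary",
    "な": "auxiliary", "。": "symbol",
}


def segment(text, lexicon=TOY_LEXICON):
    """Greedy longest-match segmentation into (surface, pos) pairs."""
    max_len = max(len(word) for word in lexicon)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary entry starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                tokens.append((piece, lexicon[piece]))
                i += length
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens
```

With this lexicon, 「右肺に高吸収域が見られます。」 splits into 右肺 / に / 高吸収 / 域 / が / 見られ / ます / 。, and the candidate 「著明な」 splits into 著明 / な.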

For each pair consisting of a candidate estimated by the subsequent character string estimation unit 140 and an element obtained when the text decomposition unit 150 decomposed the part of the target text after the designated position, the connected expression evaluation unit 160 evaluates the validity, as natural language, of the expression formed by joining to the candidate the portion of the target text from that element onward. In other words, it evaluates how plausible each expression is.

A concrete example of the expressions evaluated by the connected expression evaluation unit 160 follows. Suppose the target text in FIG. 2 has been decomposed as in FIG. 3, and 「著明な」 has been estimated as a candidate for the string following 「右肺に」, the text up to the designated position. In this example, the connected expression evaluation unit 160 takes 「高吸収」 ("high absorption"), 「域」 ("area"), 「が」 (a subject particle), and 「見られ」 ("is seen") as evaluation targets among the words (elements) after the designated position. Referring to FIG. 4, each expression consists of 「右肺に」 followed by the candidate 「著明な」, followed in turn by the text from one of the evaluation-target words onward, and the validity of each expression is evaluated. In the example of FIG. 4, evaluation values are obtained for the four expressions connected by broken-line arrows a, b, c, and d: 「右肺に著明な高吸収域が見られます。」, 「右肺に著明な域が見られます。」, 「右肺に著明なが見られます。」, and 「右肺に著明な見られます。」. The first inserts the candidate at the designated position ("A marked high-absorption area is seen in the right lung."), while the other three replace 「高吸収」, 「高吸収域」, and 「高吸収域が」 with the candidate, respectively.

As the description with reference to FIG. 4 shows, in this embodiment the connected expression evaluation unit 160 generates not only the expression in which the candidate is inserted at the designated position in the target text (broken-line arrow a in FIG. 4) but also expressions in which part of the target text is replaced with the candidate (broken-line arrows b, c, and d in FIG. 4), and evaluates the validity of each generated expression.
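The generation of these insertion and replacement expressions can be sketched as follows. This is a minimal illustration assuming the text after the designated position is already split into elements. The function name connected_expressions and the keep_tail parameter are hypothetical: keep_tail keeps sentence-final elements (here 「ます」 and 「。」) out of the replaceable range so that exactly the four expressions of FIG. 4 are produced. The comparison with the original sentence mirrors the exclusion described for claim 4.

```python
def connected_expressions(prefix, elements, candidate, keep_tail=0):
    """Build the candidate-connected expressions for one candidate.

    prefix    -- target text up to the designated position
    elements  -- elements of the target text after the designated position
    candidate -- estimated candidate for the string following the position
    keep_tail -- hypothetical: number of trailing elements never replaced
    Returns (k, expression) pairs: k == 0 is a pure insertion, and k > 0
    replaces elements[:k] with the candidate.
    """
    original = prefix + "".join(elements)
    results = []
    for k in range(max(len(elements) - keep_tail, 0)):
        expression = prefix + candidate + "".join(elements[k:])
        # Skip expressions identical to the text already present (cf. claim 4).
        if expression != original:
            results.append((k, expression))
    return results
```

Called with prefix 「右肺に」, the elements 高吸収 / 域 / が / 見られ / ます / 。, candidate 「著明な」 and keep_tail=2, it returns the expressions for arrows a to d in order (k = 0 to 3).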

Referring again to FIG. 1, the connected expression evaluation unit 160 obtains an evaluation value for each expression in accordance with a predetermined evaluation rule, using the evaluation data 116 in the reference data storage unit 110. The evaluation data 116 is the data needed to compute the evaluation values. It is generated in advance and stored in the reference data storage unit 110, and its content depends on the evaluation rule used by the connected expression evaluation unit 160. Any technique conventionally used in natural language processing to evaluate the validity of an expression as natural language may serve as the evaluation rule. For example, a probabilistic language model such as an N-gram model may be built in advance, and the estimated occurrence probability of each expression under that model may be used as its evaluation value. In this case, the evaluation data 116 contains information defining the probabilistic language model, such as the frequency of each word and the frequency with which one word is followed by another. As another example, the evaluation rule may use parsing, which analyzes the syntax of a character string according to a grammar. In that case, the evaluation data 116 contains information representing the grammar rules of the language of the target text (for example, Japanese) and a dictionary of words in that language. The connected expression evaluation unit 160 parses each expression to be evaluated with reference to these grammar rules and the dictionary, and determines its evaluation value from the parsing result: for example, a value indicating that the expression is valid natural language if parsing succeeds, and a value indicating that it is not valid if parsing fails.
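The N-gram option can be sketched as a bigram model built from word and word-pair frequencies, which is the kind of information said to be stored in the evaluation data 116. The add-k smoothing and the length normalization (average log probability per word transition, so that expressions of different lengths can be compared against one threshold) are assumptions of this sketch, not details given in the text. The class name BigramModel is hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Add-k smoothed bigram language model over tokenized sentences."""

    BOS, EOS = "<s>", "</s>"

    def __init__(self, corpus, k=1.0):
        self.k = k
        self.context_counts = Counter()  # c(prev): how often each token starts a transition
        self.bigram_counts = Counter()   # c(prev, word)
        vocab = set()
        for words in corpus:
            seq = [self.BOS, *words, self.EOS]
            vocab.update(words)
            self.context_counts.update(seq[:-1])
            self.bigram_counts.update(zip(seq, seq[1:]))
        # Possible next tokens: every corpus word plus the end marker.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, prev, word):
        """Smoothed log P(word | prev)."""
        num = self.bigram_counts[(prev, word)] + self.k
        den = self.context_counts[prev] + self.k * self.vocab_size
        return math.log(num / den)

    def score(self, words):
        """Average log probability per transition; higher means more natural."""
        seq = [self.BOS, *words, self.EOS]
        pairs = list(zip(seq, seq[1:]))
        return sum(self.log_prob(p, w) for p, w in pairs) / len(pairs)
```

Trained on even a few tokenized report sentences, such a model tends to score the insertion 「右肺に著明な高吸収域が見られます。」 above replacements such as 「右肺に著明なが見られます。」, because transitions like な→が never occur in the training data.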

The output processing unit 170 outputs the results of processing in the information processing apparatus 10. For example, it selects the expressions to be output according to the evaluation values computed by the connected expression evaluation unit 160, and displays the selected expressions on a display device (not shown). The selected expressions may be displayed in descending order of evaluation value, that is, in descending order of validity as natural language. When the subsequent character string estimation unit 140 has estimated a plurality of candidates and has ranked them or assigned each a score, the output processing unit 170 may also take those ranks or scores into account when selecting the expressions to output. Specific examples of display modes are described later.
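The selection and ordering described above can be sketched as a threshold filter followed by a sort. The record type EvaluatedExpression and the rule of breaking ties on the evaluation value by candidate rank are hypothetical. The text only says the ranks or scores "may also be taken into account", and this is one simple way of doing so.

```python
from typing import NamedTuple


class EvaluatedExpression(NamedTuple):
    expression: str
    candidate_rank: int  # 0 = highest-ranked candidate from the estimation step
    value: float         # evaluation value (higher = more natural)


def select_for_output(items, threshold):
    """Keep expressions whose value is >= threshold, best first.

    Ties on the evaluation value are broken by candidate rank (hypothetical
    way of also taking the candidate ranking into account).
    """
    kept = [item for item in items if item.value >= threshold]
    return sorted(kept, key=lambda item: (-item.value, item.candidate_rank))
```

Sorting on the negated value keeps the ordering in descending order of evaluation value while still breaking ties in ascending order of candidate rank.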

When the user selects one of the expressions that the output processing unit 170 output based on the evaluation results of the connected expression evaluation unit 160, the corrected text generation unit 180 identifies the selected expression via the input receiving unit 120 and replaces the text to be processed with it. As a result, the target text in the electronic document containing it is corrected to the selected expression.

An example of processing by the information processing apparatus 10 follows. FIG. 5 is a flowchart showing an example of the procedure performed by the information processing apparatus 10. The information processing apparatus 10 starts the procedure of FIG. 5, for example, when the user makes an input designating a particular position in an electronic document.

First, the input receiving unit 120 acquires the position in the electronic document that the user designated with an input device (step S10).

The target text acquisition unit 130, having obtained the designated position from the input receiving unit 120, acquires the character strings before and after the designated position as the processing target text (step S12). In this embodiment, the target text acquisition unit 130 acquires in step S12 the character string that forms the single sentence containing the designated position. For example, that sentence can be obtained by taking the character string that lies between the designated position and the first sentence delimiter (a Japanese full stop, exclamation mark, question mark, or the like) that appears on each side of it.
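The sentence extraction of step S12 can be sketched in Python as follows. This is an illustration only, assuming the designated position is a caret index between two characters and that the delimiter set below (a hypothetical choice, not taken from the source) covers the sentence delimiters in use.

```python
# Sentence delimiters assumed for this sketch: the Japanese full stop plus
# full-width and half-width exclamation and question marks.
DELIMITERS = set("。！？!?")


def sentence_around(document: str, pos: int) -> str:
    """Return the sentence of `document` that contains caret index `pos`.

    `pos` lies between document[pos - 1] and document[pos]; the returned
    sentence keeps its closing delimiter, if there is one.
    """
    start = pos
    # Scan left until the character before `start` is a delimiter.
    while start > 0 and document[start - 1] not in DELIMITERS:
        start -= 1
    end = pos
    # Scan right to the first delimiter at or after `pos`.
    while end < len(document) and document[end] not in DELIMITERS:
        end += 1
    if end < len(document):
        end += 1  # include the delimiter itself
    return document[start:end]
```

For a document such as “検査を行いました。右肺に高吸収域が見られます。経過観察とします。”, a position inside the second sentence yields “右肺に高吸収域が見られます。”.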

The text decomposition unit 150 analyzes the processing target text acquired in step S12 and decomposes it into the elements that make up the sentence (step S14). In this embodiment, the text decomposition unit 150 performs morphological analysis in step S14 with reference to the analysis dictionary 112 and splits the processing target text into words.

The subsequent character string estimation unit 140, having obtained the designated position from the input receiving unit 120, estimates candidates for the character string that follows the character string up to the designated position (hereinafter simply "subsequent candidates") (step S16). In this embodiment, in step S16 the subsequent character string estimation unit 140 first searches the example sentence dictionary 114 using the character string of the word or phrase immediately before the designated position as the search key. Then, in the sentences of the example sentence dictionary 114 that contain the search-key character string, it identifies the word or phrase that follows the search-key character string, and takes as subsequent candidates those identified words or phrases whose frequency of occurrence is equal to or greater than a preset threshold. The search-key character string may be determined from the result of the decomposition performed by the text decomposition unit 150 in step S14. For example, the part of speech of each word is examined in order, starting from the word immediately before the designated position, and the search key is the character string consisting of the first independent word encountered (a noun, verb, or other word that can form a phrase on its own) together with the words located between that independent word and the designated position. When several subsequent candidates are obtained, the subsequent character string estimation unit 140 may rank them, for example according to the frequency of occurrence of each. Alternatively, it may determine a score for each subsequent candidate based on its frequency of occurrence; the score expresses how likely that candidate is to follow the character string up to the designated position. The subsequent character string estimation unit 140 may also ask the text decomposition unit 150 to decompose the obtained subsequent candidates into words.
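The search-key construction and frequency-based candidate estimation of step S16 are sketched below. This is a simplified illustration, not the patented implementation: it assumes the example sentences are already split into words, treats each candidate as a single word rather than a phrase, and uses hypothetical part-of-speech labels and a hypothetical frequency threshold. The score is the candidate's share of all observed successors, one possible reading of "a score based on the frequency of occurrence".

```python
from collections import Counter

# Hypothetical part-of-speech labels treated as independent words in this sketch.
INDEPENDENT = {"noun", "verb", "adjective", "adverb"}


def search_key(prefix):
    """Build the search key from the (surface, pos) words before the designated position.

    Scans backwards to the first independent word and returns it together with
    every word between it and the designated position.
    """
    for i in range(len(prefix) - 1, -1, -1):
        if prefix[i][1] in INDEPENDENT:
            return [surface for surface, _ in prefix[i:]]
    return [surface for surface, _ in prefix]  # no independent word found


def guess_successors(key, examples, min_freq=2):
    """Return (word, score) pairs for words that follow `key` in tokenized examples.

    Only words seen at least `min_freq` times are kept, most frequent first; each
    score is that word's share of all observed successors of `key`.
    """
    counts = Counter()
    n = len(key)
    for sentence in examples:
        for i in range(len(sentence) - n):
            if sentence[i:i + n] == key:
                counts[sentence[i + n]] += 1
    total = sum(counts.values())
    return [(word, c / total) for word, c in counts.most_common() if c >= min_freq]
```

With the words before the designated position given as `[("右肺", "noun"), ("に", "particle")]`, the search key becomes `["右肺", "に"]`, and the returned list is already ordered for ranking.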

The connected expression evaluation unit 160 selects one of the subsequent candidates estimated in step S16 (step S18) and performs the connected expression evaluation process on the selected candidate (step S20). FIG. 6 shows an example of the detailed procedure of the connected expression evaluation process of step S20 in FIG. 5. When step S20 of FIG. 5 starts, the procedure of FIG. 6 begins.

Referring to FIG. 6, the connected expression evaluation unit 160 first identifies, among the elements after the designated position in the processing target text, the elements to be evaluated (step S200). In this embodiment, the evaluation targets are the words forming the first phrase after the designated position and the word immediately following that phrase. For the decomposed processing target text illustrated in FIG. 3, the evaluation targets are therefore the words contained in the phrase “高吸収域が” (“high-absorption area” followed by the subject particle “ga”) after the designated position, namely “高吸収” (high absorption), “域” (area), and “が” (ga), together with the word “見られ” (“seen”) that immediately follows the phrase. In another embodiment, a preset number of elements after the designated position may be evaluated. In yet another embodiment, all elements after the designated position may be evaluated.
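Step S200 can be sketched as below, under the simplifying assumption that a phrase (bunsetsu) is one independent word followed by the dependent words after it. The part-of-speech labels are hypothetical; real phrase chunking of Japanese is more involved.

```python
# Hypothetical part-of-speech labels treated as independent words in this sketch.
INDEPENDENT = {"noun", "verb", "adjective", "adverb"}


def evaluation_targets(after_words):
    """Step S200: indices of the words to try as connection points.

    `after_words` is a list of (surface, pos) pairs for the text after the
    designated position. The targets are the words of the first phrase
    (one independent word plus the dependent words that follow it) and the
    first word after that phrase, if there is one.
    """
    if not after_words:
        return []
    end = 1
    # Extend the first phrase while the next word is not independent.
    while end < len(after_words) and after_words[end][1] not in INDEPENDENT:
        end += 1
    return list(range(min(end + 1, len(after_words))))
```

For the FIG. 3 example, tagging “域” as a suffix rather than a noun makes the sketch return the four words “高吸収”, “域”, “が”, and “見られ”.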

Next, one element to be evaluated is selected (step S202), and an expression is generated by appending the character string from the selected element onward to the subsequent candidate (step S204). In the example above referring to FIG. 3, suppose that “高吸収” (high absorption), one of the evaluation target words, is selected in step S202 and that the subsequent candidate is “著明な” (“marked”). The connected expression evaluation unit 160 then appends the character string from “高吸収” onward to “著明な” (see broken-line arrow a in FIG. 4) and generates the expression “右肺に著明な高吸収域が見られます。” (“A marked high-absorption area is seen in the right lung.”).

After step S204, the connected expression evaluation unit 160 determines whether the generated expression is identical to the original character string (step S206).

For example, for the processing target text of FIG. 3, if the subsequent candidate is “高吸収” and the word “域” is selected in step S202, the expression generated in step S204, “右肺に高吸収域が見られます。” (“A high-absorption area is seen in the right lung.”), is identical to the original processing target text, as shown in FIG. 7. In such a case the determination in step S206 is YES, the process returns to step S202, one of the unprocessed evaluation target elements is selected, and step S204 and the subsequent steps are repeated.
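Steps S204 and S206 amount to a string splice followed by an identity check. The minimal Python sketch below assumes the text after the designated position is available as a list of words; the function name `connect` is hypothetical.

```python
def connect(prefix, candidate, suffix_words, k):
    """Step S204: the candidate followed by the suffix from word k onward.

    Returns None when the result equals the original text (step S206 = YES),
    in which case no evaluation value needs to be computed.
    """
    original = prefix + "".join(suffix_words)
    expression = prefix + candidate + "".join(suffix_words[k:])
    return None if expression == original else expression
```

With `prefix = "右肺に"` and `suffix_words = ["高吸収", "域", "が", "見られ", "ます", "。"]`, connecting “著明な” at word 0 gives “右肺に著明な高吸収域が見られます。”, while connecting “高吸収” at “域” (word 1) reproduces the original sentence and is skipped.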

If, on the other hand, the generated expression differs from the original processing target text, as with the expression “右肺に著明な高吸収域が見られます。” in the example above referring to FIGS. 3 and 4, the connected expression evaluation unit 160 obtains an evaluation value representing the validity, as natural language, of the expression generated in step S204 (step S208). In step S208, the connected expression evaluation unit 160 refers to the evaluation data 116 in the reference data storage unit 110 and obtains the evaluation value of the expression generated in step S204 according to a predetermined evaluation rule. In the example using the probabilistic language model described above, the expression “右肺に著明な高吸収域が見られます。” illustrated in FIG. 4 can be evaluated by, for example, using the evaluation data 116 to obtain the probability that the character string “高吸収域が” appears after the subsequent candidate “著明な”, and taking that probability as the evaluation value. In the example using syntactic analysis described above, the expression “右肺に著明な高吸収域が見られます。” is parsed using the evaluation data 116, which includes grammar rules and a dictionary, and the evaluation value of the expression is determined according to whether the parse succeeds.
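The source does not specify the language model. As one possible reading of the probabilistic example, the sketch below scores only the junction between the candidate and the first appended word, using maximum-likelihood bigram estimates from a tokenized corpus. The corpus and function names are hypothetical, and a practical model would score the whole expression and apply smoothing.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count words and adjacent word pairs in a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def junction_probability(candidate, next_word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(next_word | candidate); 0.0 if candidate is unseen."""
    if unigrams[candidate] == 0:
        return 0.0
    return bigrams[(candidate, next_word)] / unigrams[candidate]
```

For the FIG. 4 example, `candidate` would be “著明な” and `next_word` would be “高吸収”, the first word of the appended string “高吸収域が”.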

After step S208, if all the evaluation target elements identified in step S200 have been processed (YES in step S210), the connected expression evaluation process of FIG. 6 ends. If not all have been processed (NO in step S210), the process returns to step S202, an unprocessed evaluation target element is selected, and step S204 and the subsequent steps are repeated.

Returning to FIG. 5, the selection of a subsequent candidate (step S18) and the connected expression evaluation process (step S20) are repeated until the connected expression evaluation process (step S20, FIG. 6) has been performed for every subsequent candidate estimated in step S16 (that is, until the determination in step S22 is YES).
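The two nested loops of FIGS. 5 and 6 can be sketched as follows. The evaluation rule of step S208 is passed in as a hypothetical callback `score`, so any model (probabilistic or parser-based) could be plugged in; this is a sketch, not the patented implementation.

```python
def evaluate_all(prefix, suffix_words, candidates, targets, score):
    """Steps S18-S22 wrapped around the FIG. 6 inner loop (S202-S210).

    `score(candidate, k, expression)` stands in for the evaluation rule of
    step S208. Returns {(candidate, k): evaluation value}; pairs whose
    expression equals the original text are skipped, as in step S206.
    """
    original = prefix + "".join(suffix_words)
    results = {}
    for cand in candidates:              # S18, repeated until S22 = YES
        for k in targets:                # S202, repeated until S210 = YES
            expression = prefix + cand + "".join(suffix_words[k:])  # S204
            if expression == original:   # S206
                continue
            results[(cand, k)] = score(cand, k, expression)  # S208
    return results
```

The resulting dictionary corresponds to the per-candidate tables of FIGS. 8A to 8D, with missing entries marking skipped pairs.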

When the connected expression evaluation process has been completed for all subsequent candidates (YES in step S22), the output processing unit 170 determines the expressions to be output (step S24). In step S24, the output processing unit 170 determines the output expressions using the evaluation values obtained in the connected expression evaluation process for each combination of a subsequent candidate and an element evaluated by the connected expression evaluation unit 160. When the subsequent candidates estimated in step S16 have been ranked, or a score has been determined for each of them, the output processing unit 170 may also take those ranks or scores into account when determining the output expressions.

A specific example of step S24 is described below with reference to FIGS. 8A to 8D. FIGS. 8A to 8D show example evaluation values obtained for each subsequent candidate in the connected expression evaluation process (step S20, FIG. 6) when four subsequent candidates, “著明な” (marked), “高吸収” (high absorption), “低吸収” (low absorption), and “腫瘍” (tumor), are estimated in step S16. FIG. 8A shows the evaluation values of the expressions containing the subsequent candidate “著明な”, with each value placed on the corresponding broken-line arrow in a diagram similar to FIG. 4. FIG. 8B shows the evaluation values of the expressions containing the subsequent candidate “高吸収”, that is, the expressions formed by appending to “高吸収” the character string from each of the words “高吸収”, “が”, and “見られ” of the processing target text onward. As FIG. 8B shows, no evaluation value is computed for the expression formed by appending the string from the word “域” onward to the subsequent candidate “高吸収”: that expression is identical to the original processing target text, so the determination in step S206 of the connected expression evaluation process of FIG. 6 is YES and the evaluation value is not calculated (step S208 is skipped). FIG. 8C shows the evaluation values of the expressions formed by appending to the subsequent candidate “低吸収” the character string from each of the words “高吸収”, “域”, “が”, and “見られ” of the processing target text onward. Similarly, FIG. 8D shows the evaluation values of the expressions formed by appending to the subsequent candidate “腫瘍” the character string from each word of the processing target text onward.

When the ranks and scores of the subsequent candidates are not taken into account in determining the output expressions, the output processing unit 170 may, for example, select as output expressions those whose evaluation value is equal to or greater than a preset threshold. For instance, with a threshold of 0.4: for the subsequent candidate “著明な”, the expression “右肺に著明な高吸収域が見られます。” with an evaluation value of 0.6 is selected (FIG. 8A); for the subsequent candidate “高吸収”, no expression has an evaluation value of 0.4 or more, so none is selected for output (FIG. 8B); for the subsequent candidate “低吸収”, the expression “右肺に低吸収域が見られます。” (“A low-absorption area is seen in the right lung.”) with an evaluation value of 0.6 is selected (FIG. 8C); and for the subsequent candidate “腫瘍”, the expression “右肺に腫瘍が見られます。” (“A tumor is seen in the right lung.”) with an evaluation value of 0.5 is selected (FIG. 8D).
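The threshold-based selection of step S24, combined with the descending-order display suggested for step S26, can be sketched as below. The input mapping from expression to evaluation value is a hypothetical data layout chosen for this illustration.

```python
def select_outputs(evaluations, threshold=0.4):
    """Keep expressions whose evaluation value is >= threshold, best first.

    `evaluations` maps expression -> evaluation value.
    """
    kept = [(expr, v) for expr, v in evaluations.items() if v >= threshold]
    # Descending by value; ties keep the input order because sorted() is stable.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Fed the values 0.6, 0.6, and 0.5 given for FIGS. 8A, 8C, and 8D, this yields the three expressions in that order, and any expression scoring below 0.4 is dropped.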

When the ranks or scores of the subsequent candidates are taken into account in determining the output expressions, the output processing unit 170 may, for example, weight the evaluation value of each expression according to the rank or score of its subsequent candidate and determine the output expressions according to whether the weighted value is equal to or greater than a preset threshold.
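The source leaves the weighting scheme open. The sketch below assumes, purely for illustration, that each evaluation value is multiplied by the score of its subsequent candidate (for example, the relative frequency from step S16) and that the threshold is applied to the product.

```python
def select_weighted(evaluations, candidate_scores, threshold):
    """Threshold and rank expressions by score-weighted evaluation values.

    evaluations: list of (candidate, expression, value) triples.
    candidate_scores: candidate -> score from step S16 (assumed to lie in [0, 1]).
    Returns (expression, weighted value) pairs at or above threshold, best first.
    """
    weighted = [
        (expr, value * candidate_scores.get(cand, 0.0))
        for cand, expr, value in evaluations
    ]
    kept = [pair for pair in weighted if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

A rank-based weight, such as 1 / rank, could be substituted in the same place; which weighting suits a given dictionary is a design choice the source does not settle.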

Having determined the output expressions, the output processing unit 170 outputs them (step S26). In this embodiment, step S26 displays the output expressions on a display device (not shown), for example together with the display screen showing the electronic document that contains the processing target text. The output expressions may, for example, be displayed as a list in descending order of evaluation value, or in descending order of evaluation values weighted according to the ranks or scores of the subsequent candidates. The output processing unit 170 carries out step S26 by, for example, generating display control information that specifies a display mode such as those in the examples described below and outputting it to the display device.

FIGS. 9 to 11 show various modes of displaying the output expressions. They are specific examples of the display in step S26 when, in the example of step S24 described with reference to FIGS. 8A to 8D, the expressions “右肺に著明な高吸収域が見られます。” (FIG. 8A), “右肺に低吸収域が見られます。” (FIG. 8C), and “右肺に腫瘍が見られます。” (FIG. 8D) are selected for output.

FIGS. 9 and 10 show examples that list the output expressions while displaying, in mutually different styles, the subsequent candidate, the character strings taken from the original processing target text, and the character string of the processing target text that has been replaced by the subsequent candidate. In the example of FIG. 9, the subsequent candidate is displayed in bold, the character string replaced by the subsequent candidate is displayed in parentheses after the subsequent candidate, and the other character strings from the processing target text are displayed in a thinner font than the subsequent candidate. In the example of FIG. 10, the character string replaced by the subsequent candidate is displayed with a line drawn through it; the other character strings are displayed as in FIG. 9. The display screen of FIG. 9 or FIG. 10 may be shown on the display device together with the display screen showing the electronic document that contains the original processing target text (see FIG. 2). For example, the display screen of FIG. 9 or FIG. 10 may be shown near the position where the processing target text appears on the electronic document screen of FIG. 2.
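The FIG. 9 and FIG. 10 styles can be sketched as HTML markup, assuming (hypothetically) that the display device renders HTML: `<b>` for the bold candidate, and either parentheses or `<s>` (strikethrough) for the replaced string.

```python
from html import escape


def render(prefix, candidate, replaced, rest, style="paren"):
    """Mark up one output expression.

    prefix: text up to the designated position; candidate: subsequent candidate;
    replaced: original text removed by the candidate (may be empty);
    rest: remaining original text. style="paren" mimics FIG. 9,
    style="strike" mimics FIG. 10.
    """
    marked = f"<b>{escape(candidate)}</b>"
    if replaced:
        if style == "paren":
            marked += f"({escape(replaced)})"
        else:
            marked += f"<s>{escape(replaced)}</s>"
    return escape(prefix) + marked + escape(rest)
```

For the tumor example, `render("右肺に", "腫瘍", "高吸収域", "が見られます。")` produces `右肺に<b>腫瘍</b>(高吸収域)が見られます。`; a pure insertion such as “著明な” has an empty replaced string and gets no parentheses.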

In the example of FIG. 11, the character string of the processing target text and the character strings of the output expressions are displayed with their common portion aligned. In this example, the character string common to the processing target text and all the output expressions is “が見られ” (“ga mirare”, “is seen”). In the processing target text and in each output expression, the string “が見られ” is displayed at approximately the same horizontal position on the display screen. For the output expressions, the character string before the subsequent candidate (that is, the string up to the designated position) and the character string after the common portion “が見られ” are omitted. In addition, each subsequent candidate in the output expressions is displayed in a bolder font than the other character strings. In the example of FIG. 11, the character strings of an output expression other than the subsequent candidate may also be displayed in a lighter (paler) color than the subsequent candidate and the processing target text.
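The FIG. 11 alignment can be sketched as left-padding each row so that the common portion starts in the same column, then trimming the text after it. This assumes every row contains the common string and that all characters are full-width, so padding with ideographic spaces (U+3000) keeps the columns aligned in a monospaced font; proportional fonts would need pixel-based layout instead.

```python
FULL_WIDTH_SPACE = "\u3000"


def align_on(rows, anchor):
    """Left-pad each row so `anchor` starts in the same column; drop text after it.

    Assumes every row contains `anchor` and consists of full-width characters.
    """
    cols = [row.index(anchor) for row in rows]
    width = max(cols)
    return [
        FULL_WIDTH_SPACE * (width - col) + row[: col + len(anchor)]
        for row, col in zip(rows, cols)
    ]
```

Calling `align_on` on the original “高吸収域が見られます。” and the three output fragments with the anchor “が見られ” places the anchor at column 7 in every row.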

Display modes other than those described with reference to FIGS. 9 to 11 may also be used. For example, the subsequent candidate, the character string of the processing target text replaced by the subsequent candidate, and the other character strings may each be displayed in a different color. Alternatively, the output expressions may simply be displayed without distinguishing these character strings.

When the output process described with reference to FIGS. 9 to 11 (step S26 in FIG. 5) ends, the procedure of FIG. 5 ends.

The information processing apparatus 10 may also accept an instruction from a user who has viewed the display screen produced by the output process of step S26 in FIG. 5, and correct the processing target text in the electronic document using an expression on that screen. For example, when the user selects the expression on the third row of the display screen of any of FIGS. 9 to 11 with an input device, the input receiving unit 120 of the information processing apparatus 10 notifies the corrected text generation unit 180 of the selected expression. On receiving this notification, the corrected text generation unit 180 generates the expression “右肺に腫瘍が見られます。” corresponding to the sentence on the third row and, in the electronic document containing the processing target text, replaces the processing target text with the generated text. As a result, the original processing target text “右肺に高吸収域が見られます。” in the electronic document is corrected to “右肺に腫瘍が見られます。” (“A tumor is seen in the right lung.”).
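The replacement performed by the corrected text generation unit 180 can be sketched as below, assuming that the processing target text is the sentence surrounding the designated position (as in step S12) and that the same hypothetical delimiter set applies. Replacing by character offsets, rather than by searching for the sentence text, avoids touching an identical sentence elsewhere in the document.

```python
# Sentence delimiters assumed for this sketch.
DELIMITERS = set("。！？!?")


def sentence_span(document, pos):
    """(start, end) offsets of the sentence containing caret index pos, delimiter included."""
    start = pos
    while start > 0 and document[start - 1] not in DELIMITERS:
        start -= 1
    end = pos
    while end < len(document) and document[end] not in DELIMITERS:
        end += 1
    if end < len(document):
        end += 1
    return start, end


def apply_correction(document, pos, expression):
    """Replace the sentence around `pos` with the user-selected expression."""
    start, end = sentence_span(document, pos)
    return document[:start] + expression + document[end:]
```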

In the processing of the embodiment described above, the generated expressions include not only those in which a new word or phrase is inserted at the designated position of the processing target text, but also those in which part of the character string after the designated position is replaced by another character string. Outputting these expressions presents the user with candidate corrections to the processing target text.

The information processing apparatus 10 illustrated above is typically realized by having a general-purpose computer execute a program that describes the functions or processing of each unit of the information processing apparatus 10. As shown in FIG. 12, such a computer has, as hardware, a circuit configuration in which a CPU (central processing unit) 80, a memory (primary storage) 82, various I/O (input/output) interfaces 84, and the like are connected via a bus 86. A hard disk drive (HDD) 88 and a disk drive 90 for reading portable non-volatile recording media of various standards, such as CDs, DVDs, and flash memory, are connected to the bus 86, for example via the I/O interface 84. The drive 88 or 90 functions as an external storage device for the memory. A program describing the processing of the embodiment is stored in a fixed storage device such as the HDD 88, either via a recording medium such as a CD or DVD or via a network, and is installed in the computer. The processing of the embodiment is realized when the program stored in the fixed storage device is read into the memory and executed by the CPU.

Although an embodiment in which the information processing apparatus 10 is realized by a single computer has been described above, the various functions of the information processing apparatus 10 described above may instead be distributed across a plurality of computers.

10 information processing apparatus, 80 CPU, 82 memory, 84 I/O interface, 86 bus, 88 HDD, 90 disk drive, 100 document storage unit, 110 reference data storage unit, 120 input receiving unit, 130 target text acquisition unit, 140 subsequent character string estimation unit, 150 text decomposition unit, 160 connected expression evaluation unit, 170 output processing unit, 180 corrected text generation unit, 200 display screen.

Claims

1. A program that causes a computer to execute:
a receiving step of receiving a designated position, which is a position within a target character string to be processed;
an estimating step of referring to dictionary storage means storing a plurality of sentences, using a character string included up to the designated position in the target character string as a search key, and estimating, in those of the stored sentences that contain the search-key character string, the character string following the search-key character string as a candidate character string, that is, a candidate for the character string that follows the character string up to the designated position in the target character string;
an analyzing step of performing morphological analysis on the character string after the designated position in the target character string;
a decomposing step of decomposing, based on the result of the morphological analysis, the character string after the designated position in the target character string into element character strings, each corresponding to a component of a sentence;
an evaluating step of evaluating, for each pair of a candidate character string and an element character string and in accordance with a predetermined evaluation rule, an expression in which the candidate character string is followed by the portion of the target character string from the element character string onward, thereby obtaining an evaluation value representing the validity of the expression as natural language; and
an outputting step of outputting each expression whose evaluation value is equal to or greater than a preset threshold.

2. The program according to claim 1, further causing the computer to execute:
a generating step of generating, in response to a user input selecting one of the expressions output in the outputting step, a character string in which the character string after the designated position in the target character string is replaced with the selected expression.

3. The program according to claim 1 or 2, wherein,
in the outputting step, when there are a plurality of expressions whose evaluation values are equal to or greater than the threshold, display control information is output that causes display means to display expressions with higher evaluation values preferentially.

4. The program according to any one of claims 1 to 3, wherein,
in the evaluating step, no evaluation value is obtained for an expression identical to the expression contained in the target character string.

5. The program according to any one of claims 1 to 4, wherein,
in the outputting step, display control information is output that causes display means to display the candidate character string in an expression whose evaluation value is equal to or greater than the threshold in a manner different from the character string taken from the target character string.

6. The program according to any one of claims 1 to 5, wherein,
in the outputting step, display control information is further output that causes display means to display, together with an expression whose evaluation value is equal to or greater than the threshold, the portion of the target character string located between the designated position and the element character string included in that expression.

7. An information processing apparatus comprising:
receiving means for receiving a designated position, which is a position within a target character string to be processed;
estimating means for referring to dictionary storage means storing a plurality of sentences, using a character string included up to the designated position in the target character string as a search key, and estimating, in those of the stored sentences that contain the search-key character string, the character string following the search-key character string as a candidate character string, that is, a candidate for the character string that follows the character string up to the designated position in the target character string;
analyzing means for performing morphological analysis on the character string after the designated position in the target character string;
decomposing means for decomposing, based on the result of the morphological analysis, the character string after the designated position in the target character string into element character strings, each corresponding to a component of a sentence;
evaluating means for evaluating, for each pair of a candidate character string and an element character string and in accordance with a predetermined evaluation rule, an expression in which the candidate character string is followed by the portion of the target character string from the element character string onward, thereby obtaining an evaluation value representing the validity of the expression as natural language; and
output means for outputting each expression whose evaluation value is equal to or greater than a preset threshold.