JP2005174003A

JP2005174003A - Summary preparing method and program

Info

Publication number: JP2005174003A
Application number: JP2003413649A
Authority: JP
Inventors: Hiromitsu Kawajiri; 博光川尻
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2003-12-11
Filing date: 2003-12-11
Publication date: 2005-06-30
Anticipated expiration: 2023-12-11
Also published as: JP4036824B2

Abstract

PROBLEM TO BE SOLVED: To provide a summary preparing method and program that present only the main text concisely and effectively.

SOLUTION: When document information (for example, a patient's electronic medical chart) is input, it is morphologically analyzed, and it is determined whether part of a sentence matches the whole of another sentence. If it does, the partially matching character string is set as a simplified sentence candidate; otherwise, the sentence is set as a simplified sentence candidate as it is. Even when there is a partial match, if the matching character string has fewer than M characters or fewer than N morphemes, it is not set as a simplified sentence candidate, and the sentence itself is set as the simplified sentence candidate. From the generated simplified sentence candidates, those that include a keyword are extracted and set as summary candidates. The parts of the input document that correspond to the summary candidates are then marked, and the summary is generated.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a summary generation method for generating a summary from document information such as an electronic medical record, and to a program therefor.

When a single file contains many pieces of document information, summaries are often generated so that the content of each piece can be checked easily. For example, important passages are excerpted from the document information into a separate summary document, or only the important passages are underlined or highlighted, so that the content of each piece of document information can be grasped quickly. This makes it easy to pick out the desired document information from the file.

When a summary is generated from a document in which the same expressions appear repeatedly, such as an electronic medical record, it is effective to generate the summary by extracting the sentences that contain specific keywords. For example, in Patent Document 1 below, sentences containing important expressions prepared in advance are extracted to create a summary of an e-mail.
Patent Document 1: Japanese Patent Laid-Open No. 11-316762

However, when sentences containing specific keywords are extracted in this way, sentences that have the same content but carry a date, a conjunction, or the like before or after it are each extracted separately. In a summary, dates and conjunctions have no particular significance; on the contrary, including them makes the summary harder to read. A summary is easier to read and understand when only the central sentence, stripped of dates and conjunctions, is stated concisely.

Accordingly, an object of the present invention is to provide a summary creation method, and a program therefor, that can display only the main sentences concisely and effectively.

In view of the above problem, the present invention has the following features.

The invention of claim 1 is a summary generation method for generating a summary from document information, comprising: an important sentence extraction step of extracting, from the sentences contained in the document information, the sentences that include a keyword; a matching determination step of comparing one extracted important sentence with the other important sentences and determining whether part of that important sentence matches another important sentence; a summary candidate setting step of setting a summary candidate according to the result of the matching determination step; and a summary generation step of extracting, from the document information, the portions corresponding to the summary candidates set in the summary candidate setting step and generating a summary. In the summary candidate setting step, when the matching determination step determines that part of the important sentence matches another important sentence, the character string of the matching part is set as a summary candidate; when it determines that no part of the important sentence matches another important sentence, the important sentence itself is set as a summary candidate.

The invention of claim 2 is the summary generation method according to claim 1, wherein the matching determination step determines whether part of the important sentence matches the whole of another important sentence.

The invention of claim 3 is the summary generation method according to claim 1 or 2, wherein, when the matching determination step determines that part of the important sentence matches another important sentence, the summary candidate setting step compares the number of characters in the matching part with a threshold, sets the character string of the matching part as a summary candidate when the number of characters is equal to or greater than the threshold, and sets the important sentence itself as a summary candidate when the number of characters is less than the threshold.

The invention of claim 4 is the summary generation method according to any one of claims 1 to 3, wherein, when the matching determination step determines that part of the important sentence matches another important sentence, the summary candidate setting step compares the number of morphemes in the matching part with a threshold, sets the character string of the matching part as a summary candidate when the number of morphemes is equal to or greater than the threshold, and sets the important sentence itself as a summary candidate when the number of characters is less than the threshold.

The invention of claim 5 is the summary generation method according to any one of claims 1 to 4, wherein the summary generation step displays the full text of the document information and marks the character string portions of the summary candidates set in the summary candidate setting step.

The invention of claim 6 is a summary generation method for generating a summary from document information, comprising: a matching determination step of comparing one sentence contained in the document information with the other sentences and determining whether part of that sentence matches another sentence; a simplified sentence candidate setting step of setting a simplified sentence candidate according to the result of the matching determination step; a summary candidate setting step of extracting, from the simplified sentence candidates set in the simplified sentence candidate setting step, those that include a keyword and setting them as summary candidates; and a summary generation step of extracting, from the document information, the portions corresponding to the summary candidates set in the summary candidate setting step and generating a summary. In the simplified sentence candidate setting step, when the matching determination step determines that part of the sentence matches another sentence, the character string of the matching part is set as a simplified sentence candidate; when it determines that no part of the sentence matches another sentence, the sentence itself is set as a simplified sentence candidate.

The invention of claim 7 is the summary generation method according to claim 6, wherein the matching determination step determines whether part of the sentence matches the whole of another sentence.

The invention of claim 8 is the summary generation method according to claim 6 or 7, wherein, when the matching determination step determines that part of the sentence matches another sentence, the simplified sentence candidate setting step compares the number of characters in the matching part with a threshold, sets the character string of the matching part as a simplified sentence candidate when the number of characters is equal to or greater than the threshold, and sets the sentence itself as a simplified sentence candidate when the number of characters is less than the threshold.

The invention of claim 9 is the summary generation method according to any one of claims 6 to 8, wherein, when the matching determination step determines that part of the sentence matches another sentence, the simplified sentence candidate setting step compares the number of morphemes in the matching part with a threshold, sets the character string of the matching part as a simplified sentence candidate when the number of morphemes is equal to or greater than the threshold, and sets the sentence itself as a simplified sentence candidate when the number of characters is less than the threshold.

The invention of claim 10 is the summary generation method according to any one of claims 6 to 9, wherein the summary generation step displays the full text of the document information and marks the character string portions of the summary candidates set in the summary candidate setting step.

The invention of claim 11 is a program that gives a computer a summary generation function conforming to the summary generation method according to any one of claims 1 to 10.

In the above, a “sentence” broadly includes not only a character string delimited by one full stop “。” and the next full stop “。”, but also a character string delimited by one line-break mark and the next, and the like. “Marking” broadly includes not only emphasizing a character string by underlining or highlighting it, but also differentiating its display by changing the weight, size, color, or the like of the characters.

The features of the present invention will become clearer from the following description of embodiments. The embodiments below are, however, merely examples of the present invention, and the meanings of the terms used for the invention and its constituent elements are not limited to those described in the embodiments.

According to the present invention, among the sentences that include a keyword, a sentence to which a date, a conjunction, or the like has been added (for example “その後”, “thereafter”) is simplified to a sentence without such additions and is set as a summary candidate. An effective summary free of unnecessary expressions such as dates and conjunctions can therefore be generated.

Furthermore, according to the inventions of claims 3, 4, 8, and 9, when the number of characters in the partially matching part is less than the minimum number of characters, or its number of morphemes is less than the minimum number of morphemes, the partially matching character string is not used as a summary candidate and the sentence itself is used instead. The summary candidates are therefore not oversimplified, and a moderately simplified summary can be generated.

Embodiments of the present invention will be described below with reference to the drawings.

First, FIG. 1 shows the configuration of a summary creation device according to an embodiment.

In terms of hardware, the summary creation device of this embodiment can be realized by the CPU, memory, and other LSIs of any computer. In terms of software, it is realized by, for example, a program with a recording control function loaded into memory. FIG. 1 shows the functional blocks of the summary creation device realized by such hardware and software. Needless to say, these functional blocks can be realized in various forms: by hardware alone, by software alone, or by a combination of the two.

As shown in the figure, the summary creation device includes a text input unit 101, a morphological analysis unit 102, a keyword setting unit 103, a keyword dictionary 104, an important sentence extraction unit 105, a summary candidate setting unit 106, and a summary output unit 107.

The text input unit 101 receives document information, such as an electronic medical record, from an input port, a disk drive, or the like. The morphological analysis unit 102 has a database for morphological analysis. It analyzes the document information (one piece of document information) received from the text input unit 101, divides it into morphemes, and outputs the document information to the keyword setting unit 103 and the important sentence extraction unit 105 together with the delimiter information and information indicating whether each morpheme is an independent word or an ancillary word.

The keyword setting unit 103 detects the frequency of occurrence of each independent word in the document information and stores those whose frequency is equal to or greater than a predetermined threshold in a memory (not shown) as keyword candidates. A score corresponding to the frequency of occurrence is set for each keyword candidate and stored in the memory with it.
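The frequency-based selection of keyword candidates can be sketched as follows. This is only an illustration: the function name `keyword_candidates`, the parameter `min_frequency`, and the sample words are hypothetical, the input is assumed to be a list of independent words already produced by morphological analysis, and the score is assumed to be the raw occurrence count, which the description does not specify.

```python
from collections import Counter


def keyword_candidates(independent_words, min_frequency):
    """Return {word: score} for independent words seen at least min_frequency times.

    Assumption: the score is simply the raw occurrence count.
    """
    counts = Counter(independent_words)
    return {word: n for word, n in counts.items() if n >= min_frequency}


# Hypothetical independent words taken from one analyzed chart.
words = ["経過", "順調", "症状", "特に", "経過", "痛み", "特に", "経過"]
candidates = keyword_candidates(words, min_frequency=2)
print(candidates)
```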

The keyword dictionary 104 stores keyword candidates set in advance by the user through input means such as a keyboard. When setting a keyword candidate, the user also sets its importance. The keyword dictionary 104 stores a score corresponding to that importance in association with each keyword.

The keyword setting unit 103 generates an important word table from the keyword candidates stored in the memory and the keyword candidates registered in the keyword dictionary 104. The important sentence extraction unit 105 refers to this table when extracting important sentences.

The important word table is generated from, for example, all the keyword candidates registered in the keyword dictionary 104 plus the top several keyword candidates by score among those stored in the memory. Alternatively, it may be generated from the top several keyword candidates registered in the keyword dictionary 104 and the top several keyword candidates stored in the memory. It is preferable that the user can set how many top-ranked candidates are taken into the important word table.
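A minimal sketch of the first variant (all dictionary entries plus the top-ranked frequency-based candidates) might look like the following. The name `build_important_word_table`, the `top_n` parameter, and the sample scores are hypothetical, and letting the dictionary score override the frequency score for a word present in both sources is an assumption made only for this illustration.

```python
def build_important_word_table(dictionary_scores, frequency_scores, top_n):
    """Merge all dictionary keywords with the top_n frequency-based candidates.

    Assumption: when a word appears in both sources, the user-assigned
    dictionary score wins.
    """
    # Rank by descending score; ties are broken by the word itself for determinism.
    ranked = sorted(frequency_scores.items(), key=lambda item: (-item[1], item[0]))
    table = dict(ranked[:top_n])
    table.update(dictionary_scores)
    return table


dictionary_scores = {"特に": 5}                       # registered by the user
frequency_scores = {"経過": 3, "症状": 2, "痛み": 1}  # detected from the document
table = build_important_word_table(dictionary_scores, frequency_scores, top_n=1)
print(table)
```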

The important sentence extraction unit 105 extracts, from the sentences in the input document, those that contain as a morpheme an important word (keyword candidate) from the important word table set by the keyword setting unit 103, and outputs them to the summary candidate setting unit 106 as important sentence candidates. Here, for example, the text from one full stop “。” to the next is treated as one sentence. Alternatively, the text from one line-break mark to the next may be treated as one sentence.
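The sentence delimiting and extraction described here can be sketched as below. The sketch assumes the document has already been split into a flat list of morphemes (the segmentation shown is illustrative, and the sentence “食欲はある” is a made-up filler); the function names are hypothetical. Note that the keyword must equal a whole morpheme rather than merely occur as a substring.

```python
def split_sentences(morphemes, delimiters=("。", "\n")):
    """Group a flat morpheme list into sentences, dropping the delimiters."""
    sentences, current = [], []
    for morpheme in morphemes:
        if morpheme in delimiters:
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(morpheme)
    if current:
        sentences.append(current)
    return sentences


def extract_important_sentences(sentences, important_words):
    """Keep sentences in which some morpheme is itself an important word."""
    return [s for s in sentences if any(m in important_words for m in s)]


morphemes = ["経過", "は", "順調", "で", "ある", "。",
             "食欲", "は", "ある", "。",
             "症状", "は", "特に", "なし", "。"]
important = extract_important_sentences(split_sentences(morphemes), {"経過", "特に"})
print(["".join(s) for s in important])
```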

The summary candidate setting unit 106 compares each important sentence candidate received from the important sentence extraction unit 105 with the other important sentence candidates. When the candidate partially contains another important sentence candidate, the character string of the matching part is set as a summary candidate; when it does not, the important sentence candidate is set as a summary candidate as it is. However, if the number of characters in the matching character string is less than a preset minimum number of characters M, or its number of morphemes is less than a preset minimum number of morphemes N, the matching character string is not used as a summary candidate and the important sentence candidate itself is set as the summary candidate.

The summary output unit 107 generates a summary from the document information and displays it on a monitor. For example, it displays the full text of the input document information and adds marking, such as underlining or highlighting, to the character strings that match the summary candidates set by the summary candidate setting unit 106. Alternatively, a separate summary form may be prepared and the character strings that match the summary candidates transferred into it.

FIG. 2 shows the processing flow of the summary creation device in this embodiment.

When document information is input from the text input unit 101 (S101), the morphological analysis unit 102 performs morphological analysis on it (S102). The keyword setting unit 103 then counts the frequency of each independent word and sets a score for each according to its frequency (S103). An important word table is generated from the independent words (keyword candidates) whose score is equal to or greater than a threshold K and the independent words (keyword candidates) registered in the keyword dictionary 104 (S104). The important sentence extraction unit 105 then extracts, as important sentence candidates, the sentences that contain as a morpheme an important word from the generated table (S105).

Once important sentence candidates have been extracted from the input document in this way, the summary candidate setting unit 106 executes the summary candidate setting process (S106 to S111). First, the important sentence candidate under examination is compared with the other important sentence candidates to determine whether it partially contains another important sentence candidate, that is, whether there is a partial match (S106). If there is no partial match, the candidate under examination is set as a summary candidate as it is (S109).

If there is a partial match, it is determined whether the number of characters in the partially matching character string is less than the set value M (S107). If it is less than M, the process proceeds to S109 and the candidate under examination is set as a summary candidate as it is. If the number of characters is M or more, it is next determined whether the number of morphemes in the partially matching character string is less than the set value N (S108). If it is less than N, the process proceeds to S109 and the candidate under examination is set as a summary candidate as it is. If the number of morphemes is N or more, the partially matching character string is set as the summary candidate (S110).
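The decision sequence above (partial match, then the character-count check against M, then the morpheme-count check against N) can be sketched as follows. Sentences are assumed to be lists of morphemes; the names `contains_whole` and `set_summary_candidate` are hypothetical, matching is done on contiguous morpheme runs, and choosing the longest match when several candidates are contained is an assumption not stated in the description.

```python
def contains_whole(sentence, other):
    """True if `other` occurs as a contiguous, strictly shorter run inside `sentence`."""
    n = len(other)
    if n == 0 or n >= len(sentence):
        return False
    return any(sentence[i:i + n] == other for i in range(len(sentence) - n + 1))


def set_summary_candidate(sentence, others, min_chars, min_morphemes):
    """Return the summary candidate (a morpheme list) for one important sentence."""
    matches = [o for o in others if contains_whole(sentence, o)]
    if not matches:
        return sentence                      # no partial match: keep the sentence
    # Assumption: if several candidates are contained, use the longest one.
    match = max(matches, key=lambda o: len("".join(o)))
    if len("".join(match)) < min_chars:
        return sentence                      # fewer than M characters
    if len(match) < min_morphemes:
        return sentence                      # fewer than N morphemes
    return match                             # matched string becomes the candidate


s1 = ["その後", "、", "経過", "は", "順調", "で", "ある"]
s2 = ["経過", "は", "順調", "で", "ある"]
kept_short = set_summary_candidate(s1, [s2], min_chars=4, min_morphemes=2)
kept_whole = set_summary_candidate(s1, [s2], min_chars=10, min_morphemes=2)
print("".join(kept_short))
print("".join(kept_whole))
```

With M = 4 the eight-character match replaces the longer sentence; raising M to 10 makes the same match too short, so the original sentence is kept.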

The processing of S106 to S110 is repeated until the summary candidate setting process has been completed for all important sentence candidates (S111). When it has, the summary output unit 107 executes the summary output process based on the summary candidates (S112). For example, it displays the full text of the input document information and adds marking, such as underlining or highlighting, to the character strings that match the summary candidates set in S106 to S111.
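As an illustration of the full-text display with marking, the sketch below wraps every occurrence of a summary candidate in plain-text markers. The function name `mark_summary` and the `[[`/`]]` markers are hypothetical stand-ins for underlining or highlighting on a real display, and the sample text is assembled from the example sentences of this description.

```python
import re


def mark_summary(full_text, summary_candidates, open_mark="[[", close_mark="]]"):
    """Return full_text with each occurrence of a summary candidate wrapped in markers."""
    # Longest first, so that a longer candidate is preferred where two could start.
    candidates = sorted({c for c in summary_candidates if c}, key=len, reverse=True)
    if not candidates:
        return full_text
    pattern = re.compile("|".join(re.escape(c) for c in candidates))
    # A single pass over the text, so marked spans never nest or overlap.
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, full_text)


text = "経過は順調である。その後、経過は順調である。症状は特になし。"
marked = mark_summary(text, ["経過は順調である", "症状は特になし"])
print(marked)
```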

FIG. 3 shows a concrete example of the processing performed when summary candidates are set.

When one unit of document information (for example, an electronic medical record) is input to the input unit, it is morphologically analyzed as shown in FIG. 3(a). In the figure, “/” marks the boundaries between morphemes. If the keywords “progress” (経過) and “particularly” (特に) are set in the important word table, only the sentences in the document that contain “経過” or “特に” as a morpheme are extracted, producing the important sentence candidates shown in FIG. 3(b).

Next, for each generated important sentence candidate, it is determined whether part of it matches another important sentence candidate (partial matching); if so, the partially matching character string is set as a summary candidate. For example, among the candidates in FIG. 3(b), “Thereafter, progress is satisfactory” (その後、経過は順調である) partially matches another important sentence candidate in its “progress is satisfactory” (経過は順調である) part, as shown in FIG. 3(d), so “progress is satisfactory” is set as the summary candidate.

When there is no partial match, the important sentence candidate is set as a summary candidate as it is. For example, in FIG. 3(b), “no particular symptoms” (症状は特になし) and “no particular pain” (痛みは特になし) share the part “nothing in particular” (特になし), but the former does not contain the whole of “痛みは特になし” as a part of itself, so it is judged not to partially match. Accordingly, as shown in FIG. 3(c), “症状は特になし” is set as a summary candidate as it is. The same applies to “痛みは特になし”.
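The distinction drawn here, between two sentences merely sharing a run of characters and one sentence containing the whole of the other, can be checked with Python's standard library. This is only an illustration on raw character strings; the variable names are arbitrary, and `difflib` is used simply to show the shared run, not as the matching method of the embodiment.

```python
from difflib import SequenceMatcher

a = "症状は特になし"
b = "痛みは特になし"

# Longest run of characters that the two sentences have in common.
block = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
shared = a[block.a:block.a + block.size]
print(shared)

# Neither sentence contains the whole of the other, so there is no partial match.
print(b in a, a in b)

# Here the shorter sentence is contained whole, so there is a partial match.
print("経過は順調である" in "その後、経過は順調である")
```

The shared run includes the “特になし” mentioned in the text, yet both containment tests fail, which is why both sentences remain summary candidates unchanged.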

Thus, in this embodiment, among the sentences that include a keyword (important sentence candidates), a sentence to which a date, a conjunction, or the like has been added (for example “その後”, “thereafter”) is simplified to a sentence without such additions and is set as a summary candidate. A summary free of unnecessary expressions such as dates and conjunctions can therefore be generated and output.

Although not illustrated in FIG. 3, when the number of characters in the partially matching part is less than the minimum number of characters M, or its number of morphemes is less than the minimum number of morphemes N, the partially matching character string is not used as a summary candidate and the important sentence candidate itself is set as the summary candidate. This prevents the important sentence candidates from being oversimplified, so that a summary simplified only to the extent that its content can still be grasped is generated and output.

The minimum number of characters M and the minimum number of morphemes N are set, for example, at the design stage: summary generation is tried with various values, and the values that the designer judges to produce the most effective summary are adopted. Alternatively, the user may be allowed to set these values as appropriate.

In Embodiment 1 above, important sentence candidates are first extracted on the basis of keywords (important words) and are then simplified to set the summary candidates. In the present embodiment, the sentences in the input document are simplified first, and the sentences that include a keyword are then extracted and set as summary candidates.

FIG. 4 shows the configuration of the summary generation device according to this embodiment.

In the figure, the functions of the text input unit 101, the morphological analysis unit 102, the keyword setting unit 103, the keyword dictionary 104, and the summary output unit 107 are the same as those shown in FIG. 1. In this embodiment, a simplified sentence extraction unit 110 and a summary candidate setting unit 111 are used in place of the important sentence extraction unit 105 and the summary candidate setting unit 106 of Embodiment 1.

The simplified sentence extraction unit 110 compares each sentence in the input document with the other sentences. When the sentence partially matches another sentence, the character string of the matching part is set as a simplified sentence candidate; when it does not, the sentence is set as a simplified sentence candidate as it is. However, if the number of characters in the matching character string is less than a preset minimum number of characters M, or its number of morphemes is less than a preset minimum number of morphemes N, the matching character string is not used as a simplified sentence candidate and the sentence itself is set as the simplified sentence candidate.

The summary candidate setting unit 111 extracts, from the generated simplified sentence candidates, those that contain as a morpheme an important word (keyword candidate) from the important word table set by the keyword setting unit 103, and sets them as summary candidates.
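The order of operations in this embodiment, simplification of every sentence followed by keyword filtering, can be sketched as below. Sentences are assumed to be lists of morphemes; all function names are hypothetical, the sentence “食欲はある” is a made-up filler, and both the choice of the longest match and the removal of duplicate candidates are assumptions added for the illustration.

```python
def contains_whole(sentence, other):
    """True if `other` occurs as a contiguous, strictly shorter run inside `sentence`."""
    n = len(other)
    if n == 0 or n >= len(sentence):
        return False
    return any(sentence[i:i + n] == other for i in range(len(sentence) - n + 1))


def simplify(sentence, all_sentences, min_chars, min_morphemes):
    """Simplified sentence candidate for one sentence (the role of unit 110)."""
    matches = [o for o in all_sentences if contains_whole(sentence, o)]
    if not matches:
        return sentence
    match = max(matches, key=lambda o: len("".join(o)))  # assumption: longest match
    if len("".join(match)) < min_chars or len(match) < min_morphemes:
        return sentence
    return match


def summary_candidates(sentences, important_words, min_chars, min_morphemes):
    """Simplify first, then keep candidates containing an important word (unit 111)."""
    simplified = [simplify(s, sentences, min_chars, min_morphemes) for s in sentences]
    unique = []
    for s in simplified:
        # Assumption: identical candidates are kept only once.
        if s not in unique and any(m in important_words for m in s):
            unique.append(s)
    return unique


sentences = [
    ["その後", "、", "経過", "は", "順調", "で", "ある"],
    ["経過", "は", "順調", "で", "ある"],
    ["食欲", "は", "ある"],
    ["症状", "は", "特に", "なし"],
]
result = summary_candidates(sentences, {"経過", "特に"}, min_chars=4, min_morphemes=2)
print(["".join(s) for s in result])
```

Because simplification runs over all sentences rather than only keyword-bearing ones, a sentence without any keyword can still serve as the basis for shortening another sentence in this ordering.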

FIG. 5 shows the processing flow of the summary creation device in this embodiment.

In the processing flow shown in the figure, S101 to S104 are the same as in the processing flow of FIG. 2 in Embodiment 1 above, so their description is omitted.

When the important word table has been generated in S104, one sentence (the sentence candidate) among the sentences in the input document is compared with the other sentences, and it is determined whether the sentence candidate partially contains another sentence, that is, whether there is a partial match (S121). If there is no partial match, the sentence candidate itself is set as the simplified sentence candidate (S124).

If there is a partial match, it is determined whether the number of characters in the partially matching character string is less than the set value M (S122). If the number of characters is less than M, the process proceeds to S124 and the sentence candidate itself is set as the simplified sentence candidate. If the number of characters is M or more, it is then determined whether the number of morphemes in the partially matching character string is less than the set value N (S123). If the number of morphemes is less than N, the process proceeds to S124 and the sentence candidate itself is set as the simplified sentence candidate. If the number of morphemes is N or more, the partially matching character string is set as the simplified sentence candidate (S125).
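
The decision sequence S121 to S125 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: sentences are assumed to arrive already split into morphemes (the output of the morphological analysis unit 102), a partial match is taken to mean that another sentence appears in full as a contiguous, strictly shorter run of morphemes, and the longest such match is chosen when several exist. The function names `find_partial_match` and `simplified_candidate` and the morpheme segmentation in the demo are hypothetical.

```python
from typing import List, Optional, Sequence


def find_partial_match(sentence: Sequence[str],
                       others: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """S121: return the longest other sentence that occurs inside `sentence`
    as a contiguous, strictly shorter run of morphemes, or None."""
    best: Optional[List[str]] = None
    for other in others:
        n = len(other)
        if n == 0 or n >= len(sentence):
            continue  # only a proper part of the sentence counts (assumption)
        for i in range(len(sentence) - n + 1):
            if list(sentence[i:i + n]) == list(other):
                if best is None or n > len(best):
                    best = list(other)
                break
    return best


def simplified_candidate(sentence: Sequence[str],
                         others: Sequence[Sequence[str]],
                         min_chars: int,
                         min_morphemes: int) -> str:
    """Return the simplified sentence candidate for one sentence."""
    whole = "".join(sentence)
    match = find_partial_match(sentence, others)
    if match is None:
        return whole                      # S124: no partial match
    if len("".join(match)) < min_chars:   # S122: fewer than M characters
        return whole
    if len(match) < min_morphemes:        # S123: fewer than N morphemes
        return whole
    return "".join(match)                 # S125: use the matching part


if __name__ == "__main__":
    # Hypothetical segmentation of sentences like those in the example.
    doc = [
        ["その後", "、", "経過", "は", "順調", "で", "ある"],
        ["経過", "は", "順調", "で", "ある"],
        ["異常", "なし"],
        ["なし"],
    ]
    for i, s in enumerate(doc):  # S126: repeat for every sentence
        print(simplified_candidate(s, doc[:i] + doc[i + 1:], 4, 2))
```

Checking the character count before the morpheme count mirrors the order of S122 and S123; since both tests lead to the same fallback (S124), the order does not affect the result.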

The processing of S121 to S125 is repeated until simplified sentence candidates have been generated for all sentences (S126). Once this is complete, the simplified sentence candidates that contain, as a morpheme, an important word in the important word table generated in S104 are extracted and set as summary candidates (S127). Based on the set summary candidates, the summary output unit 107 then executes summary output processing (S128). For example, the input document information is displayed in full, and marking such as underlining or highlighting is added to the character strings that match the summary candidates set in S121 to S127.
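
The marking form of the output step S128 might look like the following Python sketch. It is only an illustration: the function name `mark_summary` and the bracket markers standing in for underlining or highlighting are assumptions, and merging overlapping matches into a single marked span is a design choice that the text does not specify.

```python
from typing import Iterable, List, Tuple


def mark_summary(text: str,
                 summary_candidates: Iterable[str],
                 open_mark: str = "[[",
                 close_mark: str = "]]") -> str:
    """Return the full text with every occurrence of a summary candidate
    wrapped in markers (a stand-in for underlining or highlighting)."""
    spans: List[Tuple[int, int]] = []
    for cand in set(summary_candidates):
        if not cand:
            continue
        start = text.find(cand)
        while start != -1:
            spans.append((start, start + len(cand)))
            start = text.find(cand, start + 1)

    # Merge overlapping or touching spans so markers never nest.
    spans.sort()
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))

    # Rebuild the text, inserting markers around each merged span.
    out: List[str] = []
    pos = 0
    for s, e in merged:
        out.append(text[pos:s])
        out.append(open_mark + text[s:e] + close_mark)
        pos = e
    out.append(text[pos:])
    return "".join(out)
```

Because the whole input text is returned with markers added, the reader still sees the unmarked parts in context, which corresponds to the full-text display described above.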

FIG. 6 shows a specific processing example for setting summary candidates.

When one unit of document information (for example, an electronic medical record) is input to the input unit, the document information is morphologically analyzed as shown in FIG. 6(a). Next, for each morphologically analyzed sentence, it is determined whether part of it matches another sentence (partial matching); if so, the partially matching character string is set as the simplified sentence candidate. If not, the sentence itself is set as the simplification candidate.

For example, in FIG. 6(a), the portion "経過は順調である" ("progress is good") of the sentence "その後、経過は順調である" ("since then, progress is good") partially matches another sentence, so "経過は順調である" is set as the simplified sentence candidate.

Note that "異常なし" ("no abnormality"), "症状は特になし" ("no particular symptoms"), and "痛みは特になし" ("no particular pain") in FIG. 6(a) partially match "なし" ("none"). However, the matching part "なし" has fewer characters than the minimum value M (for example, M = 4), so, as shown in FIG. 6(d), the simplified sentence candidate for these sentences is not set to "なし". Instead, "異常なし", "症状は特になし", and "痛みは特になし" are each used as simplification candidates as they are.

Next, from the generated simplification candidates, those containing a keyword are extracted and set as summary candidates. For example, if the keywords "経過" ("progress") and "特に" ("in particular") are set in the important word table, only the simplification candidates in FIG. 6(b) that contain "経過" or "特に" as a morpheme are extracted and, as shown in FIG. 6(c), set as summary candidates.
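
The keyword filtering step (S127) can be illustrated with a short Python sketch. It assumes each simplification candidate is still available as a list of morphemes; the function name `select_summary_candidates`, the candidate list, and its segmentation are hypothetical and are built only from the example strings quoted above, not from the actual contents of FIG. 6.

```python
from typing import Iterable, List, Sequence


def select_summary_candidates(candidates: Iterable[Sequence[str]],
                              keywords: Iterable[str]) -> List[str]:
    """Keep the candidates that contain at least one keyword as a whole
    morpheme, and return them as plain strings in their original order."""
    keyword_set = set(keywords)
    selected: List[str] = []
    for morphemes in candidates:
        # Compare whole morphemes, not substrings of the joined text.
        if keyword_set.intersection(morphemes):
            selected.append("".join(morphemes))
    return selected


if __name__ == "__main__":
    # Hypothetical segmentation of the simplification candidates.
    simplification_candidates = [
        ["経過", "は", "順調", "で", "ある"],
        ["異常", "なし"],
        ["症状", "は", "特に", "なし"],
        ["痛み", "は", "特に", "なし"],
    ]
    for s in select_summary_candidates(simplification_candidates,
                                       ["経過", "特に"]):
        print(s)
```

Matching on whole morphemes rather than substrings means that a keyword such as "経過" does not select a candidate merely because a longer morpheme happens to contain those characters.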

Thus, as in Embodiment 1, this embodiment can generate and output a summary free of unnecessary expressions such as dates and conjunctions. In addition, setting the minimum number of characters M and the minimum number of morphemes N prevents excessive simplification, so that an effectively simplified summary can be generated and output.

Needless to say, the present invention is not limited to the embodiments described above, and various other modifications are possible. The embodiments of the present invention may be modified as appropriate within the scope of the technical idea set forth in the claims.

FIG. 1 is a diagram showing the configuration of the summary creation device according to Embodiment 1.
FIG. 2 is a flowchart showing the processing operation of the summary creation device according to Embodiment 1.
FIG. 3 is a diagram showing a specific example of the summary creation operation according to Embodiment 1.
FIG. 4 is a diagram showing the configuration of the summary creation device according to Embodiment 2.
FIG. 5 is a flowchart showing the processing operation of the summary creation device according to Embodiment 2.
FIG. 6 is a diagram showing a specific example of the summary creation operation according to Embodiment 2.

Explanation of symbols

103 Keyword setting unit
104 Keyword dictionary
105 Important sentence extraction unit
106 Summary candidate setting unit
107 Summary output unit
110 Simplified sentence extraction unit
111 Summary candidate setting unit

Claims

A summary generation method for generating a summary from document information, comprising:
an important sentence extraction step of extracting, as important sentences, sentences that include a keyword from the sentences included in the document information;
a matching determination step of comparing one extracted important sentence with another important sentence and determining whether a part of the important sentence matches the other important sentence;
a summary candidate setting step of setting a summary candidate according to the determination result of the matching determination step; and
a summary generation step of generating a summary by extracting, from the document information, the portion corresponding to the summary candidate set in the summary candidate setting step,
wherein, in the summary candidate setting step,
when it is determined in the matching determination step that a part of the important sentence matches another important sentence, the character string of the matching part is set as a summary candidate, and
when it is determined in the matching determination step that no part of the important sentence matches another important sentence, the important sentence itself is set as a summary candidate.

The summary generation method according to claim 1,
wherein, in the matching determination step, it is determined whether a part of the important sentence matches the whole of another important sentence.

The summary generation method according to claim 1 or 2,
wherein, in the summary candidate setting step, when it is determined in the matching determination step that a part of the important sentence matches another important sentence, the number of characters in the matching part is compared with a threshold value; when the number of characters is equal to or greater than the threshold value, the character string of the matching part is set as a summary candidate, and when the number of characters is less than the threshold value, the important sentence is set as a summary candidate.

The summary generation method according to any one of claims 1 to 3,
wherein, in the summary candidate setting step, when it is determined in the matching determination step that a part of the important sentence matches another important sentence, the number of morphemes in the matching part is compared with a threshold value; when the number of morphemes is equal to or greater than the threshold value, the character string of the matching part is set as a summary candidate, and when the number of morphemes is less than the threshold value, the important sentence is set as a summary candidate.

The summary generation method according to any one of claims 1 to 4,
wherein the summary generation step displays the full text of the document information and marks the character string portions corresponding to the summary candidates set in the summary candidate setting step.

A summary generation method for generating a summary from document information, comprising:
a matching determination step of comparing one sentence included in the document information with another sentence and determining whether a part of the sentence matches the other sentence;
a simplified sentence candidate setting step of setting a simplified sentence candidate according to the determination result of the matching determination step;
a summary candidate setting step of extracting, from the simplified sentence candidates set in the simplified sentence candidate setting step, a simplified sentence candidate that includes a keyword and setting it as a summary candidate; and
a summary generation step of generating a summary by extracting, from the document information, the portion corresponding to the summary candidate set in the summary candidate setting step,
wherein, in the simplified sentence candidate setting step,
when it is determined in the matching determination step that a part of the sentence matches another sentence, the character string of the matching part is set as a simplified sentence candidate, and
when it is determined in the matching determination step that no part of the sentence matches another sentence, the sentence itself is set as a simplified sentence candidate.

The summary generation method according to claim 6,
wherein, in the matching determination step, it is determined whether a part of the sentence matches the whole of another sentence.

The summary generation method according to claim 6 or 7,
wherein, in the simplified sentence candidate setting step, when it is determined in the matching determination step that a part of the sentence matches another sentence, the number of characters in the matching part is compared with a threshold value; when the number of characters is equal to or greater than the threshold value, the character string of the matching part is set as a simplified sentence candidate, and when the number of characters is less than the threshold value, the sentence is set as a simplified sentence candidate.

The summary generation method according to any one of claims 6 to 8,
wherein, in the simplified sentence candidate setting step, when it is determined in the matching determination step that a part of the sentence matches another sentence, the number of morphemes in the matching part is compared with a threshold value; when the number of morphemes is equal to or greater than the threshold value, the character string of the matching part is set as a simplified sentence candidate, and when the number of morphemes is less than the threshold value, the sentence is set as a simplified sentence candidate.

The summary generation method according to any one of claims 6 to 9,
wherein the summary generation step displays the full text of the document information and marks the character string portions corresponding to the summary candidates set in the summary candidate setting step.

A program that gives a computer a summary generation function in accordance with the summary generation method according to claim 1.