JP4106470B2

JP4106470B2 - Solution data editing processing apparatus and processing method

Info

Publication number: JP4106470B2
Application number: JP2006222723A
Authority: JP
Inventors: 真樹村田
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2006-08-17
Filing date: 2006-08-17
Publication date: 2008-06-25
Anticipated expiration: 2022-02-22
Also published as: JP2006318509A

Description

The present invention relates to the editing of editable solution data in processing that automatically summarizes text by a machine learning method, and to automatic summarization processing that uses a machine learning method trained on such solution data.

In recent years, with the development of information technology, automatic summarization of text by computer has become widespread. However, the kind of summary that is desired is likely to differ with personal preference and with the intended use of the summary.

For example, Non-Patent Document 1 below reports in its Table 4 the recall and precision obtained when summaries produced by several people through important-sentence extraction are evaluated against one another. As that table shows, when 20 sentences are extracted, the agreement between evaluators A, B, and C (person-to-person recall and precision) is only 50 to 70%, which is not particularly high, so it can be inferred that the evaluation of summaries differs from person to person.
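The mutual evaluation described above can be illustrated with a minimal sketch. It assumes each evaluator's summary is represented as a set of extracted sentence indices; the function name and the sample indices are hypothetical and are not data from Non-Patent Document 1.

```python
def recall_precision(reference, extracted):
    """Return (recall, precision) of `extracted` measured against `reference`.

    Both arguments are sets of sentence indices chosen as important.
    """
    if not reference or not extracted:
        return 0.0, 0.0
    overlap = len(reference & extracted)
    return overlap / len(reference), overlap / len(extracted)


# Hypothetical selections by two evaluators (illustrative only).
evaluator_a = {0, 2, 3, 7, 9}
evaluator_b = {0, 3, 4, 7, 8}

# Treat evaluator A as the reference and score evaluator B against it.
recall, precision = recall_precision(evaluator_a, evaluator_b)
```

When both evaluators extract the same number of sentences, as in the 20-sentence setting, recall and precision coincide, so a single agreement figure describes each pair of evaluators.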

Non-Patent Document 2 below shows in its Table 4 that, in important-sentence extraction with a Support Vector Machine, accuracy is highest under cross-validation for each of sets A, B, and C. The cross-validation shown in that table can be regarded as equivalent to processing by a single evaluator. Although it is not known whether sets A, B, and C were created by the same person, the result indicates that accuracy is better when the training data are created at least at the same time or by the same person.

Non-Patent Document 1: Yamahiko Ito et al., "Important sentence extraction for lecture texts", Proceedings of the 7th Annual Meeting of the Association for Natural Language Processing, Association for Natural Language Processing, 2001, pp. 305-308
Non-Patent Document 2: Tsutomu Hirao et al., "Important sentence extraction by Support Vector Machine", Information Society (情報学会) Basic Paper 63-16, Information Society, 2001, pp. 121-127

Since the evaluation of summaries can thus be expected to vary between individuals and between uses, automatic summarization using a machine learning method should not summarize every text by one fixed standard of evaluation; it must be able to produce summaries specialized for the user. To that end, the user must be able to freely edit the solution data that serve as the supervised training data.

An object of the present invention is to realize solution data editing processing in which the user can freely edit a summary, or the evaluation of a summary, that serves as solution data for the machine learning method. A further object is to realize an automatic summarization technique that, by machine learning from this solution data, can produce summaries specialized for each user.

To achieve these objects, the present invention lets the user edit summaries and their evaluations, so that information about which summaries the user rated highly is fed back into machine learning processing that uses solution data prepared in advance. Through this feedback of user-edited solution data, the machine learning processing learns the characteristics of each user and can produce summaries specialized for that user.

The present invention is a solution data editing processing apparatus that edits the solution data used in processing that automatically summarizes text, which is document data, by a machine learning method, the apparatus comprising: 1) text storage means for storing text that is document data; 2) summary display means for displaying text acquired from the text storage means on a display device, extracting from the text the sentence data in a range specified by the user, and displaying it as a user-specified summary of the text; 3) evaluation assigning means for displaying items for entering an evaluation value for each of a plurality of properties, and for accepting the user's evaluation value for each property with respect to the user-specified summary, the properties being information that indicates features of a summary used in evaluating it and including at least two of the following: a short-sentence property indicating whether short sentences are valued in the summary, a quantitative-expression property indicating whether the inclusion of expressions about quantities is valued, a method property indicating whether the inclusion of expressions about methods is valued, a style property indicating whether the writing style of the summary is valued, and a readability property indicating whether ease of reading is valued; 4) solution data storage means for storing solution data each consisting of a problem and a solution; and 5) evaluation customization means that takes the text and the user-specified summary as a problem and attaches the evaluation values entered by the user as its solution to generate solution data; that generates summary candidates for the text by performing any one of important-sentence selection processing, which extracts sentences from the text and treats every possible selection of sentences as a summary candidate, important-part selection processing, which extracts phrases (bunsetsu) from the text and treats every possible selection of phrases as a summary candidate, and transformation processing, which transforms the sentences of the text according to predetermined transformation rules and treats each transformed state as a summary candidate; that takes as a problem the text together with a summary candidate consisting of parts other than the user-specified summary, and attaches as its solution a bad evaluation indicating that the candidate is not the user-specified summary to generate solution data; and that outputs to the solution data storage means both the solution data whose solution is the evaluation values entered by the user and the solution data whose solution is the bad evaluation.
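The way the evaluation customization means builds solution data can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it covers only the important-sentence selection variant, it interprets "a summary candidate consisting of parts other than the user-specified summary" as any candidate that differs from the user-specified summary, and the names `SolutionData`, `BAD`, and `build_solution_data` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SolutionData:
    text: tuple        # problem, part 1: the sentences of the source text
    summary: tuple     # problem, part 2: a summary (selection of sentences)
    solution: object   # evaluation attached as the solution


BAD = "bad"  # hypothetical label meaning "not the user-specified summary"


def sentence_selection_candidates(sentences):
    """Yield every non-empty selection of sentences, keeping text order."""
    for size in range(1, len(sentences) + 1):
        yield from combinations(sentences, size)


def build_solution_data(sentences, user_summary, user_ratings):
    """Create one user-rated item plus bad-evaluation items for other candidates."""
    sentences = tuple(sentences)
    user_summary = tuple(user_summary)
    # The user-specified summary carries the user's per-property ratings.
    data = [SolutionData(sentences, user_summary, dict(user_ratings))]
    # Assumption: every candidate that differs from it gets the bad evaluation.
    for candidate in sentence_selection_candidates(sentences):
        if candidate != user_summary:
            data.append(SolutionData(sentences, candidate, BAD))
    return data


text = ("s0", "s1", "s2")
ratings = {"short_sentences": 4, "readability": 5}  # hypothetical rating scale
solution_data = build_solution_data(text, ("s0", "s2"), ratings)
```

Enumerating every selection grows exponentially with the number of sentences, so a practical system would presumably cap or sample the candidates; the source does not say how.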

The present invention is also an automatic summarization processing apparatus that automatically summarizes text, which is document data, using a machine learning method, the apparatus comprising: 1) text storage means for storing a text and a summary of the text; 2) summary display processing means for displaying the summary on a display device; 3) evaluation setting processing means for accepting input of the user's evaluation of the summary and setting it as the evaluation of the summary; 4) solution data output processing means for storing, in solution data storage means, solution data generated by attaching the evaluation as the solution to a problem consisting of the text and the summary; 5) machine learning processing means for extracting from the solution data pairs of the problem's feature set and the solution, learning from those pairs which solution is likely for which features, and storing the resulting learning result data in learning result data storage means; 6) summary candidate generation processing means for receiving a text to be summarized and generating summary candidates from the input text; 7) summary candidate-estimated solution pair generation processing means for extracting a feature set from the input text and each summary candidate, estimating from the learning result data which solution is likely for that feature set, and generating pairs of a summary candidate and its estimated solution; and 8) summary selection processing means for selecting, from the summary candidate-estimated solution pairs, the pair whose estimated solution is a predetermined good evaluation and whose confidence is highest, and taking the summary candidate of that pair as the summary.

In this configuration, a summary of the text stored in the text storage means is displayed on a display device, and input of the user's evaluation of the summary is accepted and set as the evaluation of the summary. Solution data generated by attaching that evaluation as the solution to a problem consisting of the text and the summary are stored in the solution data storage means. Pairs of the problem's feature set and the solution are then extracted from the solution data, and learning result data, obtained by learning from those pairs which solution is likely for which features, are stored in the learning result data storage means.

Thereafter, a text to be summarized is input and summary candidates are generated from it. A feature set is extracted from the input text and each summary candidate, the solution likely for that feature set is estimated from the learning result data, and pairs of a summary candidate and its estimated solution (summary candidate-estimated solution pairs) are generated. From these pairs, the one whose estimated solution is a predetermined good evaluation and whose confidence is highest is selected, and its summary candidate is taken as the summary.
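The final selection step can be sketched as below. The sketch assumes the estimation step has already produced, for each candidate, an estimated solution label and a confidence score; the `CandidatePair` type, the `"good"` label, and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class CandidatePair(NamedTuple):
    candidate: str            # summary candidate text
    estimated_solution: str   # evaluation estimated by the learned model
    confidence: float         # confidence of that estimate


def select_summary(pairs, good="good") -> Optional[str]:
    """Return the candidate estimated as `good` with the highest confidence."""
    good_pairs = [p for p in pairs if p.estimated_solution == good]
    if not good_pairs:
        return None  # no candidate was estimated to be a good summary
    return max(good_pairs, key=lambda p: p.confidence).candidate


pairs = [
    CandidatePair("candidate A", "good", 0.61),
    CandidatePair("candidate B", "bad", 0.95),
    CandidatePair("candidate C", "good", 0.83),
]
chosen = select_summary(pairs)
```

Candidate B has the highest confidence overall but is estimated as bad, so it is filtered out before the maximum is taken. Returning `None` when no candidate is estimated as good is a choice made for this sketch; the source does not specify that case.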

As a result, the summaries that the user considers good are learned by machine learning from the user's evaluations of displayed summaries, and texts input afterwards can be given summaries specialized for the user.

Alternatively, the present invention comprises: 1) text storage means for storing a text; 2) text display processing means for displaying the text on a display device; 3) summary editing processing means for extracting from the text the sentence data in a range specified by the user and taking it as a user-specified summary of the text; 4) solution data output processing means for generating a summary of the text by automatic summary generation processing, which is either processing that generates a summary from predetermined rules or processing that generates a summary by a machine learning method, and for storing in solution data storage means both solution data generated by attaching, to a problem consisting of the text and the user-specified summary, a predetermined good evaluation indicating that it is a good summary chosen by the user, and solution data generated by attaching, to a problem consisting of the text and an automatically generated summary made up of parts other than the user-specified summary, a predetermined bad evaluation indicating that it is not the user-specified summary; 5) machine learning processing means for extracting from the solution data pairs of the problem's feature set and the solution, learning from those pairs which solution is likely for which features, and storing the resulting learning result data in learning result data storage means; 6) summary candidate generation processing means for receiving a text to be summarized and generating summary candidates from the input text; 7) summary candidate-estimated solution pair generation processing means for extracting a feature set from the input text and each summary candidate, estimating from the learning result data which solution is likely for that feature set, and generating pairs of a summary candidate and its estimated solution; and 8) summary selection processing means for selecting, from the summary candidate-estimated solution pairs, the pair whose estimated solution is a predetermined good evaluation and whose confidence is highest, and taking the summary candidate of that pair as the summary.

In this configuration, the text stored in the text storage means is displayed on a display device, and the sentence data in a range specified by the user are extracted from the text and taken as a user-specified summary of the text. A summary of the text is then generated by automatic summary generation processing, which is either processing that generates a summary from predetermined rules or processing that generates a summary by a machine learning method. Two kinds of solution data are stored in the solution data storage means: solution data generated by attaching, to a problem consisting of the text and the user-specified summary, a predetermined good evaluation indicating that it is a good summary chosen by the user; and solution data generated by attaching, to a problem consisting of the text and an automatically generated summary made up of parts other than the user-specified summary, a predetermined bad evaluation indicating that it is not the user-specified summary. Pairs of the problem's feature set and the solution are then extracted from the solution data, and learning result data, obtained by learning from those pairs which solution is likely for which features, are stored in the learning result data storage means.

Thereafter, a text to be summarized is input and summary candidates are generated from it. A feature set is extracted from the input text and each summary candidate, the solution likely for that feature set is estimated from the learning result data, and pairs of a summary candidate and its estimated solution (summary candidate-estimated solution pairs) are generated. From these pairs, the one whose estimated solution is a predetermined good evaluation and whose confidence is highest is selected, and its summary candidate is taken as the summary.

As a result, the part that the user extracted from the displayed text is learned by machine learning as a summary that the user rated as good, and texts input afterwards can be given summaries specialized for the user.

Alternatively, the present invention comprises: 1) text storage means for storing a text and a summary of the text; 2) summary display processing means for displaying the summary on a display device; 3) evaluation setting processing means for accepting input of the user's evaluation of the summary and setting it as the evaluation of the summary; 4) solution data output processing means for generating solution data in which the evaluation set by the user is attached as the solution to a problem consisting of the text and the summary, and storing them in solution data storage means; 5) feature-solution pair / feature-solution candidate pair extraction processing means for treating as solution candidates those of the predetermined evaluations other than the evaluation that became the solution, extracting from the solution data pairs of the problem's feature set with the solution or with a solution candidate, and taking each pair of feature set and solution as a positive example and each pair of feature set and solution candidate as a negative example; 6) machine learning processing means for using the extracted pairs as the teacher signal, learning what the probability of being a positive example or a negative example is for which combination of a solution or solution candidate and a feature set, and storing the resulting learning result data in learning result data storage means; 7) summary candidate generation processing means for receiving a text to be summarized and generating summary candidates from the input text; 8) summary candidate-estimated solution pair generation processing means for taking the predetermined evaluations as candidate solutions, generating pairs of a feature set and a candidate solution from the input text and each summary candidate, estimating from the learning result data the probability that each such pair is a positive example or a negative example, and, with the estimation result as the estimated solution, generating pairs that couple each combination of summary candidate and candidate solution with its estimated solution; and 9) summary selection processing means for selecting, from these pairs, the pair whose candidate solution is a predetermined good evaluation and whose estimated probability of being a positive example is highest, and taking the summary candidate of that pair as the summary.
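The positive and negative example extraction in item 5) can be sketched as follows. The sketch assumes each solution data item has already been reduced to a feature set and the evaluation the user gave; the evaluation labels, the feature names, and the function name `expand_examples` are hypothetical.

```python
EVALUATIONS = ("good", "fair", "bad")  # hypothetical predetermined evaluations


def expand_examples(solution_data, evaluations=EVALUATIONS):
    """Turn (feature_set, solution) items into labelled training examples.

    The evaluation the user actually gave yields a positive example; every
    other predetermined evaluation is a solution candidate and yields a
    negative example.
    """
    examples = []
    for features, solution in solution_data:
        for evaluation in evaluations:
            is_positive = evaluation == solution
            examples.append((frozenset(features), evaluation, is_positive))
    return examples


# One item: a summary with these (hypothetical) features was rated "good".
data = [({"short", "has_number"}, "good")]
examples = expand_examples(data)
```

A binary classifier trained on such examples outputs a probability of being a positive example for each combination of feature set and evaluation, which is the quantity the selection step in item 9) maximizes over the candidates paired with a good evaluation.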

Alternatively, the present invention comprises: 1) text storage means for storing a text; 2) text display processing means for displaying the text on a display device; 3) summary editing processing means for extracting from the text the sentence data in a range specified by the user and taking it as a user-specified summary of the text; 4) solution data output processing means for generating a summary of the text by automatic summary generation processing, which is either processing that generates a summary from predetermined rules or processing that generates a summary by a machine learning method, and for storing in solution data storage means both solution data generated by attaching, to a problem consisting of the text and the user-specified summary, a predetermined good evaluation indicating that it is a good summary chosen by the user, and solution data generated by attaching, to a problem consisting of the text and an automatically generated summary made up of parts other than the user-specified summary, a predetermined bad evaluation indicating that it is not the user-specified summary; 5) feature-solution pair / feature-solution candidate pair extraction processing means for treating as solution candidates those of the predetermined evaluations other than the evaluation that became the solution, extracting from the solution data pairs of the problem's feature set with the solution or with a solution candidate, and taking each pair of feature set and solution as a positive example and each pair of feature set and solution candidate as a negative example; 6) machine learning processing means for using the extracted pairs as the teacher signal, learning what the probability of being a positive example or a negative example is for which combination of a solution or solution candidate and a feature set, and storing the resulting learning result data in learning result data storage means; 7) summary candidate generation processing means for receiving a text to be summarized and generating summary candidates from the input text; 8) summary candidate-estimated solution pair generation processing means for taking the predetermined evaluations as candidate solutions, generating pairs of a feature set and a candidate solution from the input text and each summary candidate, estimating from the learning result data the probability that each such pair is a positive example or a negative example, and, with the estimation result as the estimated solution, generating pairs that couple each combination of summary candidate and candidate solution with its estimated solution; and 9) summary selection processing means for selecting, from these pairs, the pair whose candidate solution is a predetermined good evaluation and whose estimated probability of being a positive example is highest, and taking the summary candidate of that pair as the summary.

Alternatively, the present invention comprises: 1) text storage means for storing a text; 2) text display processing means for displaying the text on a display device; 3) summary editing processing means for extracting from the text the sentence data in a range specified by the user and taking it as a user-specified summary of the text; 4) solution data output processing means for generating solution data in which the text is the problem and the user-specified summary is the solution to that problem, and storing them in solution data storage means; 5) machine learning processing means for extracting from the solution data pairs of the problem's feature set and the solution, learning from those pairs which solution is likely for which features, and storing the resulting learning result data in learning result data storage means; 6) solution estimation processing means for receiving a text to be summarized, extracting a feature set from the input text, and estimating from the learning result data which solution is likely for that feature set; and 7) summary selection processing means for outputting the solution estimated by the solution estimation processing means as the summary of the input text.

As a result, the summarization processing is learned by machine learning from the user's evaluations of displayed summaries, and an input text can be given a summary specialized for the user.

Alternatively, the present invention comprises: 1) text storage means for storing a text; 2) text display processing means for displaying the text on a display device; 3) summary editing processing means for extracting from the text the sentence data in a range specified by the user and taking it as a user-specified summary of the text; 4) solution data output processing means for generating solution data in which the text is the problem and the user-specified summary is the solution to that problem, and storing them in solution data storage means; 5) feature-solution pair / feature-solution candidate pair extraction processing means for generating a summary of the text by automatic summary generation processing, which is either processing that generates a summary from predetermined rules or processing that generates a summary by a machine learning method, treating as a solution candidate any such summary made up of parts other than the user-specified summary, extracting from the solution data pairs of the solution or a solution candidate with the problem's feature set, and taking each pair of feature set and solution as a positive example and each pair of feature set and solution candidate as a negative example; 6) machine learning processing means for using the extracted pairs as the teacher signal, learning what the probability of being a positive example or a negative example is for which combination of a solution or solution candidate and a feature set, and storing the resulting learning result data in learning result data storage means; 7) summary candidate generation processing means for receiving a text to be summarized and generating summary candidates from the input text; 8) summary candidate-estimated solution pair generation processing means for taking the summary candidates as candidate solutions, generating pairs of a feature set and a candidate solution from the input text and each summary candidate, estimating from the learning result data the probability that each such pair is a positive example or a negative example, and, with the estimation result as the estimated solution, generating pairs of a summary candidate and its estimated solution; and 9) summary selection processing means for selecting, from the summary candidate-estimated solution pairs, the pair whose estimated probability of being a positive example is highest, and taking the summary candidate of that pair as the summary.

As a result, the summarization processing is learned by machine learning with the part extracted from the displayed text treated as a summary that the user rated as good, and an input text can be given a summary specialized for the user.

Each means, function, or element of the processing apparatus according to the present invention can also be realized by a computer-executable program. The program can be stored on a suitable computer-readable recording medium such as a portable memory medium, semiconductor memory, or hard disk, and is provided either recorded on such a medium or by transmission and reception over various communication networks via a communication interface.

According to the present invention, the user can freely set the evaluation of the summaries that serve as solution data for machine learning. Automatic summarization by computer is therefore not confined to a single type of summary and can produce summaries specialized for the user.

機械学習法を用いた自動要約処理においても，同じ評価にもとづいた要約を行なうのではなく，ユーザに特化した要約を可能にするために，教師となる解データをユーザが自由に編集できる。 In the automatic summarization process using the machine learning method, the user can freely edit the solution data as a teacher in order to enable the summarization specific to the user, instead of performing the summarization based on the same evaluation.

また，同一人物であっても要約の評価が変化することが考えられるが，本発明によれば，同一人物であっても随時要約結果に対する評価を設定でき，新たな解データを用いて機械学習し直すことにより，新しい評価態度に合わせた要約を行なうことが可能となる。 Although it is considered that the evaluation of the summary changes even for the same person, according to the present invention, the evaluation for the summary result can be set at any time even for the same person, and machine learning is performed using new solution data. By re-doing, it is possible to perform summarization in accordance with the new evaluation attitude.

〔第１の実施の形態〕 [First Embodiment]
図１に，第１の実施の形態における本発明の処理装置の構成例を示す。 FIG. 1 shows a configuration example of the processing apparatus of the present invention in the first embodiment.

自動要約処理装置１０は，評価カスタマイズ手段１１０と，解データ記憶部１２０と，解−素性対抽出部１２１と，機械学習部１２２と，学習結果データ記憶部１２３と，要約候補生成部１２４と，素性抽出部１２５と，要約候補−推定解対生成部１２６と，要約選択部１２８とを備える。 The automatic summary processing device 10 includes an evaluation customizing means 110, a solution data storage unit 120, a solution-feature pair extraction unit 121, a machine learning unit 122, a learning result data storage unit 123, a summary candidate generation unit 124, a feature extraction unit 125, a summary candidate-estimated solution pair generation unit 126, and a summary selection unit 128.

評価カスタマイズ手段１１０は，解データ編集処理を実現する処理手段である。また，解データ記憶部１２０と，解−素性対抽出部１２１と，機械学習部１２２と，学習結果データ記憶部１２３とは，特許請求の範囲に示す自動要約処理装置の機械学習処理手段を実現する処理手段である。 The evaluation customizing means 110 is a processing means that realizes the solution data editing processing. The solution data storage unit 120, the solution-feature pair extraction unit 121, the machine learning unit 122, and the learning result data storage unit 123 are processing means that realize the machine learning processing means of the automatic summarization processing device recited in the claims.

評価カスタマイズ手段１１０は，要約結果やその評価をユーザごとにカスタマイズする手段であって，要約表示部１１１と，評価付与部１１２とを備える。 The evaluation customizing means 110 is means for customizing the summary result and its evaluation for each user, and includes a summary display unit 111 and an evaluation assigning unit 112.

要約表示部１１１は，予め用意されたテキスト・要約４の要約結果を表示装置（図１に図示しない）に表示する手段である。 The summary display unit 111 is a means for displaying a summary result of the prepared text / summary 4 on a display device (not shown in FIG. 1).

テキスト・要約４は，テキストとその要約結果からなる。テキストは，一または複数の記事などからなる文書データである。要約結果は，テキストを要約した文書データである。要約結果としては，人手で生成したもの，自動要約処理装置１０が入力したテキスト２に対して出力した要約３もしくは要約候補生成部１２４が生成し解データ記憶部１２０に記憶した要約候補であってもよい。 Text / summary 4 consists of a text and its summary result. The text is document data consisting of one or more articles or the like. The summary result is document data that summarizes the text. The summary result may be one produced manually, the summary 3 that the automatic summary processing device 10 output for an input text 2, or a summary candidate generated by the summary candidate generation unit 124 and stored in the solution data storage unit 120.

評価付与部１１２は，要約表示部１１１が表示した要約結果に対してユーザが入力した評価を付与し，または，要約結果に予め与えられている評価をユーザが入力した評価に変更する手段である。 The evaluation assigning unit 112 is a unit that assigns an evaluation input by the user to the summary result displayed by the summary display unit 111 or changes an evaluation given in advance to the summary result to an evaluation input by the user. .

解データ記憶部１２０は，機械学習部１２２が機械学習法を実行する際に教師とする解データを記憶する手段である。解データ記憶部１２０には，解データとして，テキストおよびその要約結果とからなる問題と要約結果に対する評価である解との組である事例が記憶される。 The solution data storage unit 120 is a unit that stores solution data to be used as a teacher when the machine learning unit 122 executes the machine learning method. The solution data storage unit 120 stores, as solution data, a case that is a set of a problem including a text and its summary result and a solution that is an evaluation of the summary result.

解−素性対抽出部１２１は，解データ記憶部１２０に記憶されている事例ごとに解と素性の集合との組を抽出する手段である。 The solution-feature pair extraction unit 121 is a unit that extracts a set of a solution and a set of features for each case stored in the solution data storage unit 120.

素性とは，解析に用いる情報の細かい１単位を意味し，ここでは，１）文のなめらかさを示す情報，２）内容をよく表しているかどうかを示す情報，および，３）自動要約処理で用いられる特徴的な情報などである。 A feature is one fine-grained unit of information used in the analysis. Here the features are 1) information indicating the smoothness of the sentences, 2) information indicating whether the content is well represented, and 3) characteristic information used in automatic summarization processing.

機械学習部１２２は，解−素性対抽出部１２１により抽出された解と素性の集合との組から，どのような素性の集合のときにどのような解になりやすいかを機械学習法により学習し，学習結果を学習結果データ記憶部１２３に保存する手段である。機械学習部１２２は，解データを用いた機械学習法であればどのような手法で処理を行ってもよい。手法としては，例えば，決定木法，サポートベクトル法，パラメータチューニング法，シンプルベイズ法，最大エントロピー法，決定リスト法などがある。 The machine learning unit 122 is a means that learns, by a machine learning method, which solution is likely for which feature set from the sets of a solution and a feature set extracted by the solution-feature pair extraction unit 121, and stores the learning result in the learning result data storage unit 123. The machine learning unit 122 may use any machine learning method that uses solution data. Examples include the decision tree method, the support vector method, the parameter tuning method, the simple Bayes method, the maximum entropy method, and the decision list method.

学習結果データ記憶部１２３は，機械学習部１２２の学習結果データを記憶する手段である。 The learning result data storage unit 123 is a unit that stores the learning result data of the machine learning unit 122.

要約候補生成部１２４は，入力されたテキスト２から，所定の方法にもとづいて要約候補を生成する手段である。要約候補生成部１２４は，重要文選択モデル，重要箇所選択モデル，変形規則を利用したモデル，ランダムジェネレーションを利用したモデルなどの種々のモデルを用いて要約候補を生成する。 The summary candidate generating unit 124 is a means for generating summary candidates from the input text 2 based on a predetermined method. The summary candidate generation unit 124 generates summary candidates using various models such as an important sentence selection model, an important part selection model, a model using a deformation rule, and a model using random generation.

素性抽出部１２５は，テキスト２および要約候補生成部１２４で生成された要約候補について素性の集合を抽出して要約候補−推定解対生成部１２６へ渡す手段である。 The feature extraction unit 125 is a means that extracts a set of features for the text 2 and the summary candidates generated by the summary candidate generation unit 124, and passes it to the summary candidate-estimated solution pair generation unit 126.

要約候補−推定解対生成部１２６は，学習結果データ記憶部１２３の学習結果データを参照して，素性抽出部１２５から渡された素性の集合の場合に，どのような解になりやすいかを推定して，要約候補と推定解との対（要約候補−推定解対）１２７を生成する手段である。要約候補−推定解対生成部１２６は，さらに，各要約候補−推定解対１２７に，その推定解である確信度（確率）を求めて付与しておく。 The summary candidate-estimated solution pair generation unit 126 is a means that refers to the learning result data in the learning result data storage unit 123, estimates which solution is likely for the feature set passed from the feature extraction unit 125, and generates a pair 127 of a summary candidate and its estimated solution (summary candidate-estimated solution pair). The unit 126 also computes, for each summary candidate-estimated solution pair 127, the certainty (probability) of its estimated solution and attaches it to the pair.

要約選択部１２８は，要約候補−推定解対１２７を受け取り，確信度の値が最も高い要約候補−推定解対１２７を選択し，その要約候補を要約３とする手段である。 The summary selection unit 128 is means for receiving the summary candidate-estimated solution pair 127, selecting the summary candidate-estimated solution pair 127 having the highest certainty factor, and setting the summary candidate as the summary 3.

第１の実施の形態における評価カスタマイズ処理を説明するため，３人のユーザＡ，Ｂ，Ｃが要約結果をカスタマイズする場合を考える。 To describe the evaluation customization process in the first embodiment, consider a case where three users A, B, and C customize the summary result.

ユーザＡは要約結果に精度に関する記載が含まれていることを重視して評価すると仮定する。ユーザＢは要約結果に手法に関する記載が含まれていることを重視し，ユーザＣは，要約結果に手法と精度の両方に関する記載が含まれていることを重視して評価すると仮定する。また，要約結果の評価を３段階に分けて，評価１＝よい，評価２＝どちらでもない，評価３＝悪い，のいずれかの分類先（評価）を与えるとする。 Assume that user A values summary results that include statements about accuracy. Assume that user B values summary results that include statements about the method, and that user C values summary results that include statements about both the method and the accuracy. The evaluation of a summary result has three levels, and one of the following classification destinations (evaluations) is assigned: evaluation 1 = good, evaluation 2 = neither, evaluation 3 = bad.

図２に，第１の実施の形態における評価カスタマイズ処理の流れを示す。 FIG. 2 shows the flow of evaluation customization processing in the first embodiment.

まず，テキスト・要約４が用意されているとする。図３にテキスト・要約４のテキストの例を示し，図４に要約結果の例を示す。図４（Ａ）〜（Ｃ）のそれぞれに，３つの要約結果ｒ１，ｒ２，ｒ３を示す。 First, assume that text / summary 4 is prepared. FIG. 3 shows an example of text / summary 4 text, and FIG. 4 shows an example of summary results. Each of FIGS. 4A to 4C shows three summary results r1, r2, and r3.

要約表示部１１１は，テキスト・要約４から取り出した要約結果を表示画面に表示する（ステップＳ１）。そして，評価付与部１１２は，ユーザが入力した評価を受け付け，その入力された評価を表示された要約結果の解（評価）とする（ステップＳ２）。 The summary display unit 111 displays the summary result taken from the text / summary 4 on the display screen (step S1). The evaluation assigning unit 112 then accepts the evaluation input by the user and sets it as the solution (evaluation) of the displayed summary result (step S2).

ここで，ユーザＡが自動要約処理装置１０を使用する場合を想定する。ユーザＡは，図４（Ａ）の要約結果ｒ１に対して，精度に関係することが要約結果として抽出されているため，評価１をつける。すると，評価付与部１１２は，ユーザの入力（評価１）を受け付けて，事例ｃ１の解として評価１を設定する。 Here, assume that user A uses the automatic summary processing device 10. User A gives evaluation 1 to the summary result r1 in FIG. 4(A), because accuracy-related content has been extracted as the summary result. The evaluation assigning unit 112 then accepts the user's input (evaluation 1) and sets evaluation 1 as the solution of case c1.

次に，要約表示部１１１が図４（Ｂ）に示す事例ｃ２の要約結果ｒ２を表示した場合には，要約結果ｒ２は精度に関係することが抽出されていないため，ユーザＡは，要約結果ｒ２に対して評価３をつけ，評価付与部１１２は，事例ｃ２の解として評価３を設定する。 Next, when the summary display unit 111 displays the summary result r2 of case c2 shown in FIG. 4(B), user A gives evaluation 3 to the summary result r2 because no accuracy-related content has been extracted, and the evaluation assigning unit 112 sets evaluation 3 as the solution of case c2.

さらに，要約表示部１１１が図４（Ｃ）に示す事例ｃ３の要約結果ｒ３を表示した場合には，要約結果ｒ３は精度に関係するところが抽出されているが若干冗長であるため，ユーザＡは評価２をつけ，評価付与部１１２は事例ｃ３の解として評価２を設定する。 Further, when the summary display unit 111 displays the summary result r3 of case c3 shown in FIG. 4(C), user A gives evaluation 2 because accuracy-related content has been extracted but the summary result r3 is somewhat redundant, and the evaluation assigning unit 112 sets evaluation 2 as the solution of case c3.

同様に，ユーザＢの場合を想定する。ユーザＢは，図４（Ａ）に示す要約結果ｒ１に対して手法に関係するところが抽出されていないために評価３をつけ，図４（Ｂ）に示す要約結果ｒ２に対して手法に関係するところが抽出されていることから評価１をつけ，図４（Ｃ）に示す要約結果ｒ３に対して手法に関係するところが抽出されているが若干冗長であるため評価２をつける。 Similarly, consider user B. User B gives evaluation 3 to the summary result r1 shown in FIG. 4(A) because no method-related content has been extracted, gives evaluation 1 to the summary result r2 shown in FIG. 4(B) because method-related content has been extracted, and gives evaluation 2 to the summary result r3 shown in FIG. 4(C) because method-related content has been extracted but the result is somewhat redundant.

また，同様に，ユーザＣの場合を想定する。ユーザＣは，図４（Ａ）に示す要約結果ｒ１に対して精度に関係するところが抽出されているが手法に関係するところが抽出されれていないため評価２をつけ，図４（Ｂ）に示す要約結果ｒ２に対して手法に関係するところが抽出されているが精度に関係するところが抽出されていないため評価２をつけ，図４（Ｃ）に示す要約結果ｒ３について手法および精度のいずれにも関係するところが抽出されているが若干冗長であるため評価１をつける。 Likewise, consider user C. User C gives evaluation 2 to the summary result r1 shown in FIG. 4(A) because accuracy-related content has been extracted but method-related content has not, gives evaluation 2 to the summary result r2 shown in FIG. 4(B) because method-related content has been extracted but accuracy-related content has not, and gives evaluation 1 to the summary result r3 shown in FIG. 4(C) because content related to both the method and the accuracy has been extracted, although the result is somewhat redundant.

評価付与部１１２は，ユーザＢおよびユーザＣごとに要約結果ｒ１〜ｒ３に対する入力評価を，それぞれの事例ｃ１〜ｃ３の解（評価）として設定する。 The evaluation assigning unit 112 sets the input evaluation for the summary results r1 to r3 for each of the users B and C as solutions (evaluations) of the respective cases c1 to c3.

そして，評価カスタマイズ手段１１０は，テキスト・要約４で与えられたテキストとその要約結果と解とを事例として解データ記憶部１２０に記憶する（ステップＳ３）。 Then, the evaluation customizing means 110 stores the text given in the text / summary 4, its summary result, and the solution in the solution data storage unit 120 as a case (step S3).

図５に，機械学習処理および自動要約処理の流れを示す。 FIG. 5 shows the flow of machine learning processing and automatic summarization processing.

解−素性対抽出部１２１は，解データ記憶部１２０から，事例ごとに解と素性の集合との組を抽出する（ステップＳ１１）。 The solution-feature pair extraction unit 121 extracts a set of a solution and a feature set for each case from the solution data storage unit 120 (step S11).

解−素性対抽出部１２１は，例えば，１）文のなめらかさを示す情報として，ｋ−ｇｒａm 形態素列のコーパスでの存在，かかりうけ文節間の意味的整合度などを，また，２）内容をよく表しているかどうかを示す情報として，要約前のテキストにあったキーフレーズの包含率などを，また，３）自動要約で用いられる情報として，その文の位置やリード文かどうか，ＴＦ／ＩＤＦ（ＴＦは文書中でのその語の出現回数もしくは頻度を示す値，ＩＤＦはあらかじめ持っている多数の文書群のうち，その語が出現する文書数の逆数をいう。），文の長さ，固有表現・接続詞・機能語などの手がかり表現の存在などを，素性として抽出する。 The solution-feature pair extraction unit 121 extracts as features, for example: 1) as information indicating the smoothness of the sentences, whether each k-gram morpheme sequence occurs in a corpus, the degree of semantic consistency between phrases in a dependency relation, and the like; 2) as information indicating whether the content is well represented, the inclusion rate of key phrases that appeared in the text before summarization, and the like; and 3) as information used in automatic summarization, the position of the sentence and whether it is a lead sentence, TF/IDF (TF is a value indicating the number of occurrences or the frequency of the word in the document; IDF is the reciprocal of the number of documents, among a large document collection held in advance, in which the word appears), the sentence length, and the presence of cue expressions such as named entities, conjunctions, and function words.
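
As a rough illustration of the kinds of features listed above, the following Python sketch computes a few of them for one candidate summary: the key-phrase inclusion rate, the length, whether the lead sentence is kept, and a TF/IDF score using the definition given in the text (IDF as the reciprocal of the document count). The function names, the whitespace tokenizer, and the document-frequency table are assumptions made for illustration; the specification does not prescribe an implementation.

```python
def tf_idf(word, doc_tokens, doc_freq):
    """TF x IDF, with IDF taken as 1 / (number of documents containing the word)."""
    tf = doc_tokens.count(word)
    df = doc_freq.get(word, 0)
    return tf / df if df else 0.0


def extract_features(text_sentences, summary_sentences, key_phrases, doc_freq):
    """Return a small feature dictionary for one (text, summary) pair."""
    text_tokens = " ".join(text_sentences).split()   # naive whitespace tokenizer (assumption)
    summary = " ".join(summary_sentences)
    summary_tokens = summary.split()
    hits = sum(1 for phrase in key_phrases if phrase in summary)
    return {
        # 2) how well the content is represented
        "key_phrase_inclusion": hits / len(key_phrases) if key_phrases else 0.0,
        # 3) information commonly used in automatic summarization
        "length_in_tokens": len(summary_tokens),
        "contains_lead_sentence": text_sentences[0] in summary_sentences,
        "tfidf_sum": sum(tf_idf(w, text_tokens, doc_freq) for w in set(summary_tokens)),
    }
```

Smoothness features such as k-gram occurrence in a corpus would need a morphological analyzer and a reference corpus, so they are left out of this sketch.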

次に，機械学習部１２２は，解と素性の集合との組から，どのような素性の集合のときにどのような解になりやすいかを機械学習法により学習し，学習結果を学習結果データ記憶部１２３に記憶する（ステップＳ１２）。 Next, the machine learning unit 122 learns, by a machine learning method, which solution is likely for which feature set from the sets of a solution and a feature set, and stores the learning result in the learning result data storage unit 123 (step S12).

ここでユーザＡの処理の場合に，解データ記憶部１２０に記憶される解データの「事例：問題→解」は，
事例ｃ１：テキスト−要約結果ｒ１→評価１，
事例ｃ２：テキスト−要約結果ｒ２→評価３，
事例ｃ３：テキスト−要約結果ｒ３→評価２
となり，機械学習部１２２は，これらの解データをもとに，どのような場合に評価１〜評価３になるかを機械学習で学習する。例えば，事例ｃ１→評価１や事例ｃ３→評価２から，機械学習部１２２は，精度の表現，例えば「数字＋［％］」の表現が出現すると評価が高くなるなどを学習する。ここで，「数字＋［％］」の表現は，学習に用いる素性の例である。
Here, for user A, the "case: problem → solution" entries of the solution data stored in the solution data storage unit 120 are
Case c1: text - summary result r1 → evaluation 1,
Case c2: text - summary result r2 → evaluation 3,
Case c3: text - summary result r3 → evaluation 2
and the machine learning unit 122 learns by machine learning, from this solution data, in which cases evaluations 1 to 3 are given. For example, from case c1 → evaluation 1 and case c3 → evaluation 2, the machine learning unit 122 learns that the evaluation becomes higher when an expression of accuracy, for example the expression "number + [%]", appears. Here, the expression "number + [%]" is an example of a feature used for learning.
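
The solution data for user A and the "number + [%]" feature can be pictured with the following Python sketch. The `Case` class, the regular expression, and in particular the three summary strings are hypothetical stand-ins (the summary results r1 to r3 of FIG. 4 are not reproduced here); only the evaluations 1, 3, and 2 come from the text.

```python
import re
from dataclasses import dataclass


@dataclass
class Case:
    text: str        # original text (part of the "problem")
    summary: str     # summary result shown to the user (part of the "problem")
    evaluation: int  # the "solution": 1 = good, 2 = neither, 3 = bad


# Hypothetical stand-ins for summary results r1, r2, r3 as rated by user A.
user_a_cases = [
    Case("<text>", "The accuracy was 90%.", 1),
    Case("<text>", "The method uses cue expressions.", 3),
    Case("<text>", "The method uses cue expressions, and the accuracy was 90%.", 2),
]

# "number + [%]" feature: a number followed by a percent sign.
PERCENT = re.compile(r"\d+(?:\.\d+)?\s*[%％]")


def has_accuracy_expression(summary: str) -> bool:
    """True if the summary contains an expression such as '90%' or '85.2 %'."""
    return PERCENT.search(summary) is not None
```

With these stand-ins the feature fires for c1 and c3 but not for c2, which is the pattern a learner would pick up for user A.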

また，ユーザＢの処理の場合に，「事例：問題→解」は，
事例ｃ１：テキスト−要約結果ｒ１→評価３，
事例ｃ２：テキスト−要約結果ｒ２→評価１，
事例ｃ３：テキスト−要約結果ｒ３→評価２
となり，機械学習部１２２は，「手がかり表現」や「用例」などの手法に相当する専門用語が出現すると評価が高くなるように学習する。
For user B, the "case: problem → solution" entries are
Case c1: text - summary result r1 → evaluation 3,
Case c2: text - summary result r2 → evaluation 1,
Case c3: text - summary result r3 → evaluation 2
and the machine learning unit 122 learns that the evaluation becomes higher when technical terms corresponding to the method, such as "cue expression" or "example", appear.

また，ユーザＣの処理の場合に，「事例：問題→解」は，
「事例ｃ１：テキスト−要約結果ｒ１→評価２，
事例ｃ２：テキスト−要約結果ｒ２→評価２，
事例ｃ３：テキスト−要約結果ｒ３→評価１」
となり，機械学習部１２２は，精度の表現または手法に相当する表現の両方が出現すると評価が高くなるように学習する。
For user C, the "case: problem → solution" entries are
"Case c1: text - summary result r1 → evaluation 2,
Case c2: text - summary result r2 → evaluation 2,
Case c3: text - summary result r3 → evaluation 1"
and the machine learning unit 122 learns that the evaluation becomes higher when both an expression of accuracy and an expression corresponding to the method appear.

また，要約結果として出力される文章は短いほどよいので，それぞれの処理の場合において，文章の長さが短いほど評価が高くなるように学習する。 Also, the shorter the sentence output as the summary result, the better. In each processing, learning is performed such that the shorter the sentence length, the higher the evaluation.

機械学習の手法としては，例えば，シンプルベイズ法，決定リスト法，最大エントロピー法，サポートベクトルマシン法などを用いる。 As a machine learning method, for example, a simple Bayes method, a decision list method, a maximum entropy method, a support vector machine method, or the like is used.

シンプルベイズ法は，ベイズの定理にもとづいて各分類になる確率を推定し，その確率値が最も大きい分類を求める分類とする方法である。 The Simple Bayes method is a method for estimating the probability of each classification based on Bayes' theorem and obtaining the classification having the largest probability value.
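
A minimal sketch of this idea in Python follows. It assumes binary features (a feature is either present or absent), a Bernoulli naive Bayes model, and add-one smoothing; these choices and the function names are illustrative assumptions, since the text only states that class probabilities are estimated with Bayes' theorem and the most probable class is chosen.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (set_of_features, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)   # label -> feature -> count
    vocabulary = set()
    for features, label in examples:
        feature_counts[label].update(features)
        vocabulary |= set(features)
    return label_counts, feature_counts, vocabulary


def classify(model, features):
    """Return the label with the largest posterior probability."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        log_p = math.log(n / total)                        # prior P(label)
        for f in vocabulary:
            p = (feature_counts[label][f] + 1) / (n + 2)   # smoothed P(f | label)
            log_p += math.log(p if f in features else 1.0 - p)
        scores[label] = log_p
    return max(scores, key=scores.get)
```

Log probabilities are summed instead of multiplying raw probabilities so that many features do not cause numerical underflow.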

決定リスト法は，素性と分類先の組とを規則とし，それらをあらかじめ定めた優先順序でリストに蓄えおき，検出する対象となる入力が与えられたときに，リストで優先順位の高いところから入力のデータと規則の素性とを比較し，素性が一致した規則の分類先をその入力の分類先とする方法である。 In the decision list method, features and pairs of classification targets are set as rules, and they are stored in a list in a predetermined priority order. When an input to be detected is given, the list starts with the highest priority. This is a method in which the input data is compared with the feature of the rule, and the classification destination of the rule having the same feature is set as the classification destination of the input.
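
The lookup step of a decision list is small enough to sketch directly. The rule format (a list of feature/label pairs in priority order) and the `default` argument are assumptions for illustration; how the priority order is determined is outside this sketch.

```python
def classify_with_decision_list(rules, features, default=None):
    """rules: list of (feature, label) pairs, highest priority first.

    The label of the first rule whose feature occurs in the input is returned;
    if no rule matches, `default` is returned.
    """
    for feature, label in rules:
        if feature in features:
            return label
    return default
```

For example, with the rules `[("percent", 1), ("method", 3)]`, an input containing both features is classified as 1 because the "percent" rule has higher priority.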

最大エントロピー法は，あらかじめ設定しておいた素性ｆj （１≦ｊ≦ｋ）の集合をＦとするとき，所定の条件式を満足しながらエントロピーを意味する式を最大にするときの確率分布を求め，その確率分布にしたがって求まる各分類の確率のうち，もっとも大きい確率値を持つ分類を求める分類とする方法である。 In the maximum entropy method, when a set of preset features fj (1≤j≤k) is F, a probability distribution when maximizing an expression that means entropy while satisfying a predetermined conditional expression is obtained. This is a method of obtaining a classification having the largest probability value among the probabilities of each classification obtained according to the probability distribution.

サポートベクトルマシン法は，空間を超平面で分割することにより，２つの分類からなるデータを分類する手法である。 The support vector machine method is a method of classifying data composed of two classifications by dividing a space by a hyperplane.

決定リスト法および最大エントロピー法については，以下の参考文献１に，サポートベクトルマシン法については，以下の参考文献２および参考文献３に説明されている。 The decision list method and the maximum entropy method are described in Reference 1 below, and the support vector machine method is described in References 2 and 3 below.
［参考文献１：村田真樹，内山将夫，内元清貴，馬青，井佐原均，種々の機械学習法を用いた多義解消実験，電子情報通信学会言語理解とコミュニケーション研究会，NCL2001-2, (2001)］ [Reference 1: Masaki Murata, Masao Uchiyama, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara, Experiments on Word Sense Disambiguation Using Various Machine Learning Methods, IEICE Language Understanding and Communication Study Group, NCL2001-2 (2001)]
[Reference 2: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (Cambridge University Press, 2000)]
[Reference 3: Taku Kudoh, TinySVM: Support Vector Machines (http://cl.aist-nara.ac.jp/taku-ku//software/TinySVM/index.html, 2000)]

その後，要約を求めたいテキスト２が入力されると（ステップＳ１３），要約候補生成部１２４は，例えば以下に示すような処理モデルを用いて，テキスト２から要約候補を作成する（ステップＳ１４）。 Thereafter, when a text 2 to be summarized is input (step S13), the summary candidate generation unit 124 creates summary candidates from the text 2 using, for example, the processing models described below (step S14).

１）重要文選択モデル 1) Important sentence selection model
重要文選択モデルとは，文を単位に要約し，重要と思われる文のみを選択して残すことにより要約を実現するモデルである。このモデルの場合には，あらゆる文選択の状態をすべて解の候補とするとよい。また，すべてを解の候補とすると計算速度に支障が生じる場合には，予め備えておいた選択規則を用いて，この選択規則を満足する文の選択状態のみを解の候補とする。すなわち，所定の選択規則により候補数を減少させて処理の負荷を軽減する。なお，選択規則は，人手による規則であってもよい。 The important sentence selection model summarizes in units of sentences: it realizes a summary by selecting and keeping only the sentences considered important. With this model, every possible sentence-selection state may be treated as a solution candidate. If treating all of them as solution candidates makes computation too slow, selection rules prepared in advance are used, and only the sentence-selection states that satisfy the rules are treated as solution candidates. That is, the predetermined selection rules reduce the number of candidates and lighten the processing load. The selection rules may be hand-written.
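
The candidate enumeration described for the important sentence selection model can be sketched as follows. The function name and the `rule` callback are assumptions for illustration, and the empty selection is left out on the assumption that an empty summary is never wanted.

```python
from itertools import combinations


def sentence_selection_candidates(sentences, rule=None):
    """Yield every non-empty selection of sentences, keeping the original order.

    rule: optional predicate on a candidate (a list of sentences); only
    candidates for which it returns True are yielded, which reduces the number
    of candidates that must be scored.
    """
    n = len(sentences)
    for k in range(1, n + 1):
        for indices in combinations(range(n), k):
            candidate = [sentences[i] for i in indices]
            if rule is None or rule(candidate):
                yield candidate
```

Without a rule, a text of n sentences yields 2^n - 1 candidates, which is why the text suggests selection rules when computation time becomes a problem; a rule such as `lambda c: len(c) <= 3` keeps only short selections.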

２）重要箇所選択モデル 2) Important part selection model
重要箇所選択モデルとは，文よりも小さいものを要約の単位として，不要なものを削除することにより要約を実現するモデルである。単位を文より小さいものとすること以外については，上記１）重要文選択モデルと同様である。文よりも小さいものとして，例えば文節を用いる。すなわち，文節を単位として不要な文節を消していくことにより要約を実現する。この重要箇所選択モデルの場合は，あらゆる文節の選択の状態をすべて解の候補とする。また，すべてを解の候補とすると計算速度に支障が生じる場合には，上記１）重要文選択モデルと同様に，予め選択規則を用意しておき，この選択規則を満足する文の選択状態のみを解の候補とする。 The important part selection model uses a unit smaller than a sentence as the unit of summarization and realizes a summary by deleting unnecessary units. Apart from using a unit smaller than a sentence, it is the same as 1) the important sentence selection model. The smaller unit is, for example, a phrase (bunsetsu); that is, the summary is produced by deleting unnecessary phrases one phrase at a time. With this model, every possible phrase-selection state is treated as a solution candidate. If treating all of them as solution candidates makes computation too slow, selection rules are prepared in advance, as in 1) the important sentence selection model, and only the selection states that satisfy the rules are treated as solution candidates.

３）変形規則を利用したモデル 3) Model using transformation rules
変形規則を利用したモデルとは，予め用意した変形規則を利用して要約結果を生成するモデルである。変形規則は，自動処理により獲得するか，または人手で作成しておいたものを利用する。例えば，「Ｘして，Ｙした。」を「Ｘした。」もしくは「Ｙした。」に書き換えるような変形規則を作っておき，この変形規則に従って入力「Ａして，Ｂした。」が与えられたときに「Ａした。」や「Ｂした。」という要約候補を生成する。 The model using transformation rules generates summary results by applying transformation rules prepared in advance. The transformation rules are either acquired by automatic processing or written by hand. For example, a transformation rule is prepared that rewrites 「Ｘして，Ｙした。」 ("did X and did Y.") as 「Ｘした。」 ("did X.") or 「Ｙした。」 ("did Y."); given the input 「Ａして，Ｂした。」, this rule generates the summary candidates 「Ａした。」 and 「Ｂした。」.
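
The example rule above can be written as a regular expression. This is only a sketch of that single rule, with the pattern and function name chosen for illustration; a real rule set would be larger and would normally operate on parsed text, not raw strings.

```python
import re

# "X して，Y した。" -> "X した。" or "Y した。"
RULE = re.compile(r"^(.+)して[，、,](.+)した。$")


def apply_rule(sentence):
    """Return the summary candidates produced by the rule, or [] if it does not apply."""
    match = RULE.match(sentence)
    if match is None:
        return []
    x, y = match.groups()
    return [f"{x}した。", f"{y}した。"]
```

Each rule application yields several candidates, so the transformation model feeds the same scoring step as the selection models.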

４）ランダムジェネレーションを利用したモデル 4) Model using random generation
ランダムジェネレーションを利用したモデルは，例えば，入力「・・・Ｘ・・・」があったときに「・・・Ｙ・・・」を要約候補とするようなモデルである。このとき，置き換えられるＸはランダムに選ばれてもよいし，予め用意しておいた置換規則によって指定してもよい。置換規則は，人手によって生成されたものや，自動獲得したものなどを用いる。また，置き換えた先の表現Ｙは，ある辞書の単語もしくは文字列の集合からランダムに選ばれてもよいし，予め用意しておいた変換規則によって指定してもよい。変換規則は，置換規則と同様，人手によって生成されたものや，自動獲得したものなどを用いる。このとき，ＸやＹをランダムに選ばずに，変換規則にもとづいて選ぶとすると，変形規則を利用したモデルと同じようなものになる。 The model using random generation is, for example, a model that, given an input "... X ...", produces "... Y ..." as a summary candidate. The expression X to be replaced may be chosen at random or specified by replacement rules prepared in advance. The replacement rules may be hand-written or automatically acquired. The replacement expression Y may be chosen at random from the words of some dictionary or from a set of character strings, or specified by conversion rules prepared in advance. Like the replacement rules, the conversion rules may be hand-written or automatically acquired. If X and Y are chosen according to the rules and not at random, the model becomes much the same as the model using transformation rules.
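
A minimal sketch of the purely random variant follows, assuming the input is already tokenized and that X is a single token; the function name, the `seed` parameter, and the token-level granularity are assumptions for illustration.

```python
import random


def random_replacement_candidates(tokens, vocabulary, n_candidates, seed=0):
    """Generate candidates by replacing one randomly chosen token X with a random Y.

    tokens:     the input "... X ..." as a list of tokens
    vocabulary: list of words or strings from which Y is drawn
    """
    rng = random.Random(seed)   # seeded so that runs are reproducible
    candidates = []
    for _ in range(n_candidates):
        position = rng.randrange(len(tokens))   # which X to replace
        replacement = rng.choice(vocabulary)    # which Y to insert
        candidates.append(tokens[:position] + [replacement] + tokens[position + 1:])
    return candidates
```

Most randomly generated candidates will be poor summaries; the approach relies on the learned evaluation to pick out the few good ones.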

素性抽出部１２５は，解−素性対抽出部１２１とほぼ同様の処理によって，入力したテキスト２および要約候補から素性の集合を抽出し，要約候補−推定解対生成部１２６へ渡す（ステップＳ１５）。 The feature extraction unit 125 extracts a set of features from the input text 2 and the summary candidate by substantially the same processing as the solution-feature pair extraction unit 121, and passes it to the summary candidate-estimated solution pair generation unit 126 (step S15). .

そして，要約候補−推定解対生成部１２６は，受け取った素性の集合の場合にどのような解になりやすいかを，学習結果データをもとに推定し，すなわち，複数の要約候補のそれぞれの解（評価）とその確信度を学習結果データにもとづき算出し，要約候補と推定解との対（要約候補−推定解対）１２７を生成する（ステップＳ１６）。 Then, the summary candidate-estimated solution pair generation unit 126 estimates, based on the learning result data, what kind of solution is likely to occur in the case of the received feature set, that is, each of the plurality of summary candidates. The solution (evaluation) and its certainty are calculated based on the learning result data, and a pair of summary candidate and estimated solution (summary candidate-estimated solution pair) 127 is generated (step S16).

そして，要約選択部１２８は，生成された要約候補−推定解対１２７から，推定解の確信度の値が最もよい要約候補−推定解対１２７を選択し，その要約候補を要約３とする（ステップＳ１７）。 Then, the summary selection unit 128 selects the summary candidate-estimated solution pair 127 having the highest certainty value of the estimated solution from the generated summary candidate-estimated solution pair 127, and sets the summary candidate as the summary 3 ( Step S17).
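
Steps S16 and S17 can be sketched as below. The `estimate` callback is a hypothetical stand-in for the learned model: it is assumed to return the estimated solution (evaluation) and its certainty for one summary candidate.

```python
def select_summary(candidates, estimate):
    """Pair each candidate with its estimated solution, then pick the most certain pair.

    estimate(candidate) -> (estimated_solution, certainty)
    Returns the selected summary and the full list of
    (candidate, estimated_solution, certainty) tuples.
    """
    pairs = [(cand,) + tuple(estimate(cand)) for cand in candidates]   # step S16
    best = max(pairs, key=lambda pair: pair[2])                        # step S17
    return best[0], pairs
```

The sketch follows the text literally and compares certainty values only; whether the certainty should be restricted to pairs whose estimated solution is evaluation 1 (good) is not stated here and is left open.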

図６に，第１の実施の形態における本発明の処理装置の別の構成例を示す。 FIG. 6 shows another configuration example of the processing apparatus of the present invention in the first embodiment.

本形態では，解（分類先）として多数の候補が考えられるが，分類先の種類数が多くなり過ぎて，一般の機械学習法で処理ができない場合が生じうる。このような場合に，図６に示す自動要約処理装置２０では，機械学習部１３２は，実際の機械学習処理において正例と負例の二種類の解（分類先）のみを考える機械学習手法を用いることにより処理が可能となる。 In this embodiment, a large number of candidates can be considered as solutions (classification destinations), but there may be a case where the number of types of classification destinations becomes too large to be processed by a general machine learning method. In such a case, in the automatic summarization processing apparatus 20 shown in FIG. 6, the machine learning unit 132 uses a machine learning method that considers only two types of solutions (classification destinations) of positive examples and negative examples in actual machine learning processing. By using it, processing becomes possible.

また，図６に示す自動要約処理装置２０では，機械学習部１３２の学習の素性に評価という情報を用いることもできる。 In the automatic summarization processing apparatus 20 shown in FIG. 6, information called evaluation can be used as the learning feature of the machine learning unit 132.

自動要約処理装置２０は，評価カスタマイズ手段１１０と，解データ記憶部１３０と，素性−解対・素性−解候補対抽出部１３１と，機械学習部１３２と，学習結果データ記憶部１３３と，要約候補生成部１３４と，素性−解候補抽出部１３５と，要約候補−推定解対生成部１３６と，要約選択部１３８とを備える。 The automatic summarization processing device 20 includes an evaluation customizing means 110, a solution data storage unit 130, a feature-solution pair / feature-solution candidate pair extraction unit 131, a machine learning unit 132, a learning result data storage unit 133, a summary candidate generation unit 134, a feature-solution candidate extraction unit 135, a summary candidate-estimated solution pair generation unit 136, and a summary selection unit 138.

解データ記憶部１３０と，素性−解対・素性−解候補対抽出部１３１と，機械学習部１３２と，学習結果データ記憶部１３３とは，特許請求の範囲に示す自動要約処理装置の機械学習処理手段を実現する処理手段である。また，要約候補生成部１３４と，素性−解候補抽出部１３５と，要約候補−推定解対生成部１３６とは，特許請求の範囲に示す要約候補生成処理手段を実現する処理手段である。 The solution data storage unit 130, the feature-solution pair / feature-solution candidate pair extraction unit 131, the machine learning unit 132, and the learning result data storage unit 133 are machine learning of the automatic summarization processing device shown in the claims. It is a processing means for realizing the processing means. The summary candidate generation unit 134, the feature-solution candidate extraction unit 135, and the summary candidate-estimated solution pair generation unit 136 are processing units that implement the summary candidate generation processing unit shown in the claims.

評価カスタマイズ手段１１０および要約候補生成部１３４は，図１に示す自動要約処理装置１０の評価カスタマイズ手段１１０および要約候補生成部１２４と同様の処理を行う。 The evaluation customizing unit 110 and the summary candidate generating unit 134 perform the same processing as the evaluation customizing unit 110 and the summary candidate generating unit 124 of the automatic summary processing apparatus 10 shown in FIG.

素性−解対・素性−解候補対抽出部１３１は，解データ記憶部１３０に記憶されている事例ごとに，解もしくは解候補と素性の集合との組を抽出する手段である。ここでは，解の候補は解以外の解の候補を意味し，ユーザが設定した評価を解とする。また，解と素性の集合の組を正例とし，解の候補と素性の集合との組を負例とする。 The feature-solution pair / feature-solution candidate pair extraction unit 131 is a means for extracting a set of a solution or a solution candidate and a set of features for each case stored in the solution data storage unit 130. Here, the solution candidate means a solution candidate other than the solution, and the evaluation set by the user is the solution. A pair of a solution and a feature set is a positive example, and a pair of a solution candidate and a feature set is a negative example.
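
The conversion of one stored case into positive and negative examples can be sketched as follows, assuming the three evaluations introduced earlier are the possible solutions; the function name and the triple layout are illustrative assumptions.

```python
EVALUATIONS = (1, 2, 3)   # 1 = good, 2 = neither, 3 = bad


def expand_case(features, user_evaluation):
    """Turn one case into binary training examples.

    The evaluation set by the user is the solution and gives a positive example;
    every other evaluation is a solution candidate and gives a negative example.
    Returns a list of (features, evaluation, is_positive) triples.
    """
    return [(features, e, e == user_evaluation) for e in EVALUATIONS]
```

A learner trained on these triples only has to distinguish two classes (positive and negative), whatever the number of possible solutions.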

機械学習部１３２は，解もしくは解の候補と素性の集合との組から，どのような解もしくは解の候補と素性の集合のときに正例である確率や負例である確率を学習し，その学習結果を学習結果データ記憶部１３３に記憶する手段である。 The machine learning unit 132 is a means that learns, from the sets of a solution or solution candidate and a feature set, the probability that a given solution or solution candidate together with a given feature set is a positive example or a negative example, and stores the learning result in the learning result data storage unit 133.

素性−解候補抽出部１３５は，素性−解対・素性−解候補対抽出部１３１と同様の処理により，入力されたテキストおよび要約候補について，解の候補と素性の集合との組を抽出する手段である。 The feature-solution candidate extraction unit 135 extracts a set of solution candidates and feature sets for the input text and summary candidate by the same processing as the feature-solution pair / feature-solution candidate pair extraction unit 131. Means.

要約候補−推定解対生成部１３６は，渡された解の候補と素性の集合との組の場合に正例である確率や負例である確率を求め，正例である確率が最も大きい解を推定解として，その場合の要約候補と推定解との対（要約候補−推定解対）１３７を生成する手段である。 The summary candidate-estimated solution pair generation unit 136 is a means that obtains the probability of being a positive example or a negative example for each passed set of a solution candidate and a feature set, takes the solution with the highest probability of being a positive example as the estimated solution, and generates the corresponding pair 137 of summary candidate and estimated solution (summary candidate-estimated solution pair).

要約選択部１３８は，要約候補−推定解対１３７の要約候補を要約３とする手段である。 The summary selection unit 138 is means for setting the summary candidate of the summary candidate-estimated solution pair 137 as the summary 3.

図７に，自動要約処理装置２０の機械学習処理および自動要約処理の流れを示す。 FIG. 7 shows a flow of machine learning processing and automatic summarization processing of the automatic summarization processing device 20.

素性−解対・素性−解候補対抽出部１３１は，解データ記憶部１３０から，各事例ごとに解もしくは解の候補と素性の集合との組を抽出する（ステップＳ２１）。そして，機械学習部１３２は，解もしくは解の候補と素性の集合との組から，どのような解もしくは解の候補と素性の集合のときに，正例である確率や負例である確率を機械学習法により学習し，学習結果を学習結果データ記憶部１３３に記憶する（ステップＳ２２）。 The feature-solution pair / feature-solution candidate pair extraction unit 131 extracts, for each case, a set of a solution or solution candidate and a feature set from the solution data storage unit 130 (step S21). The machine learning unit 132 then learns by a machine learning method, from these sets, the probability that a given solution or solution candidate together with a given feature set is a positive example or a negative example, and stores the learning result in the learning result data storage unit 133 (step S22).

その後，要約を求めたいテキスト２が入力されると（ステップＳ２３），要約候補生成部１３４は，所定の方法でテキスト２から要約候補を生成する（ステップＳ２４）。そして，素性−解候補抽出部１３５は，入力したテキスト２および要約候補から素性の集合と解の候補との組を抽出し，要約候補−推定解対生成部１３６へ渡す（ステップＳ２５）。 Thereafter, when the text 2 for which a summary is desired is input (step S23), the summary candidate generating unit 134 generates a summary candidate from the text 2 by a predetermined method (step S24). Then, the feature-solution candidate extraction unit 135 extracts a set of feature sets and solution candidates from the input text 2 and summary candidates, and passes them to the summary candidate-estimated solution pair generation unit 136 (step S25).

The summary candidate-estimated solution pair generation unit 136 estimates, based on the learning result data, the probability that each received pair of a solution candidate and a feature set is a positive example or a negative example, takes the solution candidate with the highest probability of being a positive example as the estimated solution, and generates the summary candidate-estimated solution pair 137 (step S26). The summary selection unit 138 sets the summary candidate of the summary candidate-estimated solution pair 137 as the summary 3 (step S27).
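As a minimal sketch of steps S26 and S27, assuming the learned model is available as a function that returns the probability of a candidate being a positive example, the selection could look like the following in Python. The names `select_summary` and `toy_positive_probability` are hypothetical, and the toy scoring rule only stands in for the learning result data; the specification does not prescribe any of them.

```python
from typing import Callable, Sequence


def select_summary(
    candidates: Sequence[str],
    positive_probability: Callable[[str], float],
) -> str:
    """Return the candidate with the highest estimated positive-example probability."""
    if not candidates:
        raise ValueError("at least one summary candidate is required")
    return max(candidates, key=positive_probability)


def toy_positive_probability(candidate: str) -> float:
    # Hypothetical stand-in for the learned model: favors candidates with digits.
    return 0.9 if any(ch.isdigit() for ch in candidate) else 0.2


candidates = [
    "Verbs are sometimes omitted in natural language.",
    "Recall was 84% and precision was 82% on the test sample.",
]
best = select_summary(candidates, toy_positive_probability)
print(best)
```

Because only the positive-example probability is compared, the number of distinct summaries never has to be treated as a set of classes, which is the point of the two-class formulation.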

In the first embodiment, the user simply uses the automatic summarization processing device 1 whenever needed and assigns one of evaluations 1 to 3 to the summary result output during that use. The user can therefore customize the evaluation of summary results without feeling any operational burden.

[Second Embodiment]

FIG. 8 shows a configuration example of the processing device of the present invention in the second embodiment. The automatic summarization processing device 30 shown in FIG. 8 includes evaluation customizing means 140 in place of the evaluation customizing means 110 of the automatic summarization processing device 10 shown in FIG. 1, and otherwise includes the processing means that constitute the automatic summarization processing device 10 other than the evaluation customizing means 110.

The evaluation customizing means 140 includes a text display unit 141 and a summary editing unit 142.

The text display unit 141 is means for displaying the text 5 prepared in advance on a display device (not shown in FIG. 8).

The summary editing unit 142 is means for editing a summary by extracting the portion that the user designates as a summary from the text 5 displayed by the text display unit 141, or by changing expressions within the designated portion.

FIG. 9 shows the flow of the evaluation customization processing in the second embodiment.

The text display unit 141 reads in the text 5 prepared in advance and displays it on the display device (step S31). The user is asked to designate, on the displayed text 5, a portion considered suitable as a summary result, and the range designated by the user is accepted and extracted (step S32). If the designated portion is edited, the edits are accepted and the designated portion after editing is taken as the summary result (step S33).

The user designates the range to be used as the summary on the displayed text, for example by dragging with a pointing device such as a mouse or by specifying a start position and an end position with the cursor keys. The text display unit 141 displays the designated range so that it is distinguished from the undesignated range, for example by inverse video or marking.

FIG. 10 shows an example of the displayed text and an example of the range designated by user A. User A designates the portion enclosed by the broken line, "When an experiment was conducted on novels, analysis was possible with a recall of 84% and a precision of 82% on the test sample.", as a portion suitable as a summary. The summary editing unit 142 takes the portion of the text in FIG. 10 indicated by the broken-line rectangle as the summary result.

As shown in FIG. 11, user B designates the portion of the text enclosed by the broken line, "In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. In this study, therefore, we complement these omitted verbs using surface expressions (clue words) and examples.", as suitable as a summary. In the case of user C, as shown in FIG. 12, two portions enclosed by broken-line rectangles are designated as suitable as a summary: the portion "In natural language, verbs are sometimes omitted. Restoring these omitted verbs is indispensable for realizing dialogue systems and high-quality machine translation systems. In this study, therefore, we complement these omitted verbs using surface expressions (clue words) and examples." and the portion "When an experiment was conducted on novels, analysis was possible with a recall of 84% and a precision of 82% on the test sample." The summary editing unit 142 takes the portions indicated by the broken-line rectangles in the text shown in FIGS. 11 and 12 as the respective summary results.

The range designated by the user may also be displayed separately from the text so that the user can edit its content by deleting arbitrary parts of, or changing, the expressions within the designated range. As shown in FIG. 13, the range designated on the text is displayed separately from the text so that words and phrases within the designated range can be deleted, added, and corrected. When the summary decision button is selected by an operation such as clicking, the summary editing unit 142 accepts the selection and takes the content of the designated range as the summary result. When the cancel button is selected, the content of the designated range is cleared.

The summary editing unit 142 then stores the text 5 and the summary result in the solution data storage unit 130 together with a predetermined solution (good evaluation) (step S34). In addition, the evaluation customizing means 140 also stores in the solution data storage unit 130 solution data in which a predetermined solution (bad evaluation) is assigned to summaries other than the one designated by the user, such as a summary generated by the automatic summarization processing device 20 in the first embodiment, summary candidates generated by the summary candidate generation unit 124 of the automatic summarization processing device 20, or summaries generated manually at random.
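The collection of solution data in steps S32 to S34 can be sketched as follows. This is only an illustration under stated assumptions: the `Case` record, the `build_cases` function, the use of character offsets for the designated range, and the labels "good" and "bad" are hypothetical choices, not the format used by the solution data storage unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Case:
    text: str      # the source text (problem)
    summary: str   # a summary of that text
    solution: str  # "good" or "bad" evaluation (assumed labels)


def build_cases(
    text: str,
    start: int,
    end: int,
    other_summaries: Sequence[str],
    edited: Optional[str] = None,
) -> List[Case]:
    """Label the user-designated (optionally edited) range good, other summaries bad."""
    designated = text[start:end]                 # step S32: extract the range
    summary = edited if edited is not None else designated  # step S33: apply edits
    cases = [Case(text, summary, "good")]        # step S34: good evaluation
    # Summaries other than the user's own receive the bad evaluation.
    cases += [Case(text, s, "bad") for s in other_summaries if s != summary]
    return cases


text = "A. B. C."
cases = build_cases(text, 3, 5, ["A.", "B."])
for case in cases:
    print(case.summary, case.solution)
```

Skipping any other summary that is identical to the user's own avoids storing the same summary with contradictory evaluations.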

Thereafter, the flow of the machine learning processing and the automatic summarization processing is the same as the flow shown in FIG. 5. Here, the machine learning unit 122 learns, for each user, from cases each consisting of the text shown in FIG. 3, one of the summary results shown in FIGS. 10 to 12 (that is, the user-designated range), and a solution.

FIG. 14 shows another configuration example of the processing device of the present invention in the second embodiment. In this embodiment as well, the number of kinds of solutions (classification targets) may become too large to be handled by an ordinary machine learning method.

For this reason, in the automatic summarization processing device 40 shown in FIG. 14, the machine learning unit 132 makes the processing feasible by using a machine learning method that considers only two kinds of solutions (classification targets), positive examples and negative examples, in the actual machine learning processing.

The automatic summarization processing device 40 includes processing means similar to those constituting the automatic summarization processing device 20 shown in FIG. 6, and includes the evaluation customizing means 140 in place of the evaluation customizing means 110.

In this embodiment, the user is asked to designate on the text the range suitable as a summary, so the burden on the user is greater than in the first embodiment. However, because something closer to the summary result the user wants can be used as solution data (teacher data), the device can learn to output the summary result the user desires sooner.

[Third Embodiment]

FIG. 15 shows a configuration example of the processing device of the present invention in the third embodiment. The automatic summarization processing device 50 shown in FIG. 15 includes evaluation customizing means 150 in place of the evaluation customizing means 110 of the automatic summarization processing device 10 shown in FIG. 1, and, as its other processing means, includes processing means similar to those constituting the automatic summarization processing device 10.

The evaluation customizing means 150 includes a summary display unit 151 and a property information setting unit 152.

The summary display unit 151 is means for displaying the summary results of the text/summary 4 prepared in advance on a display device (not shown in FIG. 15).

The property information setting unit 152 is means for generating a plurality of pieces of property information relating to the evaluation of a summary result and setting an evaluation for each piece of property information.

Property information is information on the various properties that make up the evaluation of a summary result. Examples include whether short sentences are emphasized (short-sentence emphasis), whether the presence of expressions about quantities in the summary result is emphasized (quantity-expression emphasis), whether the presence of expressions about methods in the summary result is emphasized (method emphasis), whether the style of the summary result is emphasized (style emphasis), and whether the readability of the summary result is emphasized (readability emphasis).

So that the machine learning unit 122 can learn each of the plurality of properties relating to the evaluation of summary results, the evaluation customizing means 150 allows the user to set freely, as the need arises, a plurality of pieces of property information relating to the evaluation, and defines the user's evaluation of a summary result using that plurality of pieces of property information.

In this embodiment, a solution data storage unit 120 is prepared for each piece of property information set by the property information setting unit 152, and machine learning is performed for each piece of property information. Accordingly, the solution-feature pair extraction unit 121, the machine learning unit 122, the learning result data storage unit 123, and the summary candidate-estimated solution pair generation unit 126 are each provided in a number corresponding to the number of pieces of property information.

FIG. 16 shows the flow of the evaluation customization processing.

The summary display unit 151 displays a summary result taken from the text/summary 4 (step S41). The property information setting unit 152 displays a plurality of property information items for the displayed summary result, prompts the user to set the value of each item, to define new items, and so on, and accepts the user's input (step S42).

FIGS. 17 and 18 show examples of the property information setting screen. On the property information setting screen, a slide bar is provided for each of the plurality of pieces of property information. The user can specify the evaluation for each piece of property information by placing the slide button at any position, to the right or to the left, on the corresponding slide bar. For example, for each item of property information such as "short-sentence emphasis, quantity-expression emphasis, method emphasis, style emphasis, readability emphasis", the user moves the slide button on the slide bar to set how highly the displayed summary result is evaluated. In FIGS. 17 and 18, the evaluation is assumed to increase from the left end of the slide bar toward the right end. The user can also freely define what a slide bar means by entering arbitrary property information next to it.

When the summary display unit 151 displays the summary result r1 shown in FIG. 4(A), user A, as shown in FIG. 17, moves the "short-sentence emphasis" slide button to the right because the summary result r1 is a short sentence, moves the "quantity-expression emphasis" slide button to the right because it contains an expression about quantity, moves the "method emphasis" slide button to the left because it does not mention the method, and places the "style emphasis" and "readability emphasis" slide buttons to the right because the style and readability are not that bad.

When the summary display unit 151 displays the summary result r2 shown in FIG. 4(B), user A, as shown in FIG. 18, moves the "short-sentence emphasis" slide button to the left because the summary result r2 is not very short, and moves the slide buttons of the other property information to the right because the other properties are reasonably good.

The property information setting unit 152 then takes the value entered for each piece of property information as a solution, and stores that solution, the text, and the summary result as a case in the solution data storage unit 120 for the corresponding property information (step S43).
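Step S43 can be illustrated with a short sketch that keeps one store per property. It assumes, purely for illustration, that each slide button position is encoded as a number between 0.0 (left end) and 1.0 (right end); the specification describes positions only as left or right, and the function name `store_property_cases` and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def store_property_cases(
    stores: DefaultDict[str, List[dict]],
    text: str,
    summary: str,
    slider_values: Dict[str, float],
) -> None:
    """Append one case per property, using the slider value as that property's solution."""
    for prop, value in slider_values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"slider value for {prop!r} must be within [0.0, 1.0]")
        # Each property has its own store, mirroring one storage unit per property.
        stores[prop].append({"text": text, "summary": summary, "solution": value})


stores: DefaultDict[str, List[dict]] = defaultdict(list)
store_property_cases(
    stores,
    text="source text",
    summary="summary result r1",
    slider_values={"short-sentence emphasis": 1.0, "method emphasis": 0.0},
)
print(sorted(stores))
```

Keeping the stores separate lets a learner for one property be trained without touching the cases collected for the others.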

Thereafter, the flow of the machine learning processing and the automatic summarization processing is almost the same as the flow shown in FIG. 5. Here, the machine learning unit 122 provided for each piece of property information uses the cases stored in the solution data storage unit 120 of the corresponding property information as solution data (teacher data), and learning is performed separately for each piece of property information. For example, for the property information "short-sentence emphasis", the solution for case c1 (summary result r1) is "solution = rightmost" and the solution for case c2 (summary result r2) is "solution = left side". Using these solution data as teacher data, the machine learning unit 122 learns under what circumstances a summary is evaluated as emphasizing short sentences. The same learning is performed for the other property information.

In this embodiment, after the machine learning processing, the summary candidate generation unit 124 generates summary candidates from the input text 2 by a predetermined method, and the feature extraction unit 125 extracts a feature set from the input text 2 and the summary candidates.

The summary candidate-estimated solution pair generation unit 126 corresponding to each piece of property information then estimates, based on the learning result data, what solution is likely for the received feature set, and generates a pair of a summary candidate and an estimated solution (summary candidate-estimated solution pair) 127. For example, the summary candidate-estimated solution pair generation unit 126 calculates the estimated solution and its certainty factor for each of the plurality of summary candidates based on the learning result data, and generates the summary candidate-estimated solution pairs 127 for each piece of property information.

The summary selection unit 128 accepts user evaluation setting information 7, which specifies how much weight is given to each piece of property information in evaluating a summary result, compares the evaluation value of each piece of property information in the summary candidate-estimated solution pairs 127 with the user evaluation setting information 7, selects the summary candidate-estimated solution pair that is most similar to, or best suited to, the user evaluation setting information 7, and sets the summary candidate of that pair as the summary 3.
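The specification does not say how "most similar" is measured. As one possible reading, offered only as an assumption, the sketch below encodes both the user evaluation setting information and the estimated property values as numbers between 0.0 and 1.0 and picks the candidate with the smallest squared Euclidean distance. The function names and the sample values are hypothetical.

```python
from typing import Dict


def squared_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Squared Euclidean distance over the properties listed in `a`."""
    return sum((a[prop] - b[prop]) ** 2 for prop in a)


def most_similar_candidate(
    user_settings: Dict[str, float],
    estimated: Dict[str, Dict[str, float]],
) -> str:
    """Return the name of the candidate whose estimated values are closest to the settings."""
    return min(estimated, key=lambda name: squared_distance(user_settings, estimated[name]))


user_settings = {"short-sentence emphasis": 1.0, "method emphasis": 0.0}
estimated = {
    "candidate 1": {"short-sentence emphasis": 0.9, "method emphasis": 0.2},
    "candidate 2": {"short-sentence emphasis": 0.1, "method emphasis": 0.1},
}
chosen = most_similar_candidate(user_settings, estimated)
print(chosen)
```

Other distance or similarity measures would fit the description equally well; the distance used here is chosen only for simplicity.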

The summary selection unit 128 may display a property information setting screen such as that shown in FIG. 17 and prompt the user to set the user evaluation setting information 7, that is, the properties of the summary result the user currently needs, by changing the position of the slide button on the slide bar of each property information item.

For example, moving the "short-sentence emphasis", "quantity-expression emphasis", and "method emphasis" slide buttons to the far right and the "style emphasis" and "readability emphasis" slide buttons to the far left yields user evaluation setting information 7 meaning that the user requests a summary 3 that is as short as possible and does not omit quantity expressions or the method, while giving little weight to style and readability.

As a simple method of selecting a summary candidate-estimated solution pair 127, the summary selection unit 128 may, for example, use the following formula to obtain the combined value Total_Score of all the solutions.

Total_Score = a(short-sentence emphasis) × score(short-sentence emphasis)
+ a(quantity-expression emphasis) × score(quantity-expression emphasis)
+ a(method emphasis) × score(method emphasis)
+ a(style emphasis) × score(style emphasis)
+ a(readability emphasis) × score(readability emphasis)

Here, a(X) is a value obtained from the position of the slide button on the slide bar of the property information X designated by the user, and is larger the further to the right the slide button is positioned. score(X) is the evaluation value of the property information X calculated based on the learning result data. The summary selection unit 128 selects the summary candidate-estimated solution pair 127 with the largest combined value Total_Score and outputs its summary candidate as the summary 3.
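The Total_Score selection can be sketched in a few lines of Python. The weights a(X) and the per-candidate values score(X) below are made-up illustrative numbers, the 0.0 to 1.0 encoding of slider positions is an assumption, and the function names are hypothetical.

```python
from typing import Dict


def total_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Sum a(X) * score(X) over every property X that has a weight."""
    return sum(weights[prop] * scores[prop] for prop in weights)


def select_by_total_score(
    weights: Dict[str, float],
    candidate_scores: Dict[str, Dict[str, float]],
) -> str:
    """Return the name of the candidate with the largest Total_Score."""
    return max(candidate_scores, key=lambda name: total_score(weights, candidate_scores[name]))


# a(X): sliders at the far right give 1.0, at the far left 0.0 (assumed encoding).
weights = {"short": 1.0, "quantity": 1.0, "method": 1.0, "style": 0.0, "readability": 0.0}

# score(X): illustrative values standing in for the learned estimates.
candidate_scores = {
    "r1": {"short": 0.9, "quantity": 0.8, "method": 0.1, "style": 0.6, "readability": 0.7},
    "r2": {"short": 0.2, "quantity": 0.7, "method": 0.8, "style": 0.8, "readability": 0.8},
}

best = select_by_total_score(weights, candidate_scores)
print(best, total_score(weights, candidate_scores[best]))
```

With these numbers, r1 totals about 1.8 and r2 about 1.7, so r1 is selected; properties whose sliders are at the far left contribute nothing to the total.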

This embodiment adopts a processing method close to that of the first embodiment, in which summary results used by the machine learning unit 122 are displayed and evaluated by the user. However, as in the machine learning approach of the second embodiment, the user may instead be asked to designate in the text the range considered suitable as a summary result and then to evaluate the plurality of properties relating to the evaluation using slide bars or the like, thereby collecting the solution data that serve as the teacher signal. With such processing, it becomes possible to cope with situations in which even the same user wants a different type of summary each time processing is performed. In addition, because a plurality of pieces of property information can be learned at the same time, the overall burden on the user of giving evaluations (solutions) can also be reduced.

This embodiment may also be implemented as an automatic summarization processing device 60 having the configuration of processing means shown in FIG. 19. The automatic summarization processing device 60 of FIG. 19 includes processing means similar to those constituting the automatic summarization processing device 20 shown in FIG. 6, and includes the evaluation customizing means 150 in place of the evaluation customizing means 110.

By using, in the actual machine learning processing, a machine learning method that considers only two kinds of solutions (classification targets), positive examples and negative examples, the automatic summarization processing device 60 can avoid an excessive processing load in machine learning.

The present invention has been described above by way of its embodiments, but various modifications are possible within the scope of its gist. For example, any of the first to third embodiments may be implemented in combination.

[Fourth Embodiment]

FIG. 20 shows a configuration example of the processing device of the present invention in the fourth embodiment. The automatic summarization processing device 70 shown in FIG. 20 includes a solution data storage unit 120, a solution-feature pair extraction unit 121, a machine learning unit 122, a learning result data storage unit 123, a feature extraction unit 125, a solution estimation unit 160, and evaluation customizing means 140.

The solution data storage unit 120, the solution-feature pair extraction unit 121, the machine learning unit 122, the learning result data storage unit 123, the feature extraction unit 125, and the evaluation customizing means 140 of the automatic summarization processing device 70 perform substantially the same processing as the processing means bearing the same reference numerals shown in FIG. 8.

The solution estimation unit 160 is means for referring to the learning result data in the learning result data storage unit 123, estimating what solution is likely for the feature set passed from the feature extraction unit 125, and setting the estimated solution 161 as the summary 3.

In this embodiment, the solution data storage unit 120 stores solution data in which a text is the problem and a summary result of the text is the solution, and the machine learning unit 122 performs machine learning using solution-feature pairs extracted from such solution data. The feature extraction unit 125 extracts the features of the input text 2 and passes them to the solution estimation unit 160.

FIG. 21 shows the flow of the machine learning processing and the automatic summarization processing in the fourth embodiment.

The solution-feature pair extraction unit 121 extracts, for each case, a pair of a solution and a feature set from the solution data storage unit 120 (step S51). Next, the machine learning unit 122 learns by a machine learning method, from the pairs of solutions and feature sets, what solution is likely for what feature set, and stores the learning result in the learning result data storage unit 123 (step S52). The processing in steps S51 and S52 is the same as that in steps S11 and S12 shown in FIG. 5.

Thereafter, when the text 2 to be summarized is input (step S53), the feature extraction unit 125 extracts a feature set from the input text 2 by substantially the same processing as the solution-feature pair extraction unit 121 and passes it to the solution estimation unit 160 (step S54). The solution estimation unit 160 then estimates, based on the learning result data, what solution is likely for the received feature set, and sets the estimated solution 161 as the summary 3 (step S55).

In this embodiment, machine learning is performed using solution data whose solutions are summary results of texts, and the estimated solution to be used as the summary is obtained directly in the solution estimation processing that refers to the learning result.

The present invention has been described above by way of its embodiments, but it goes without saying that various modifications are possible within the scope of its gist.

Although the present invention has been described as implemented as a processing program read and executed by a computer, the processing program that realizes the present invention can be stored in an appropriate computer-readable recording medium such as a portable medium memory, a semiconductor memory, or a hard disk, and is provided either recorded on such a recording medium or by transmission and reception over various communication networks via a communication interface.

FIG. 1 is a diagram showing a configuration example of the processing device of the present invention in the first embodiment.
FIG. 2 is a diagram showing the flow of the evaluation customization processing in the first embodiment.
FIG. 3 is a diagram showing an example of a target text.
FIG. 4 is a diagram showing examples of summary results.
FIG. 5 is a diagram showing the flow of the machine learning processing and the automatic summarization processing in the processing device shown in FIG. 1.
FIG. 6 is a diagram showing another configuration example of the processing device of the present invention in the first embodiment.
FIG. 7 is a diagram showing the flow of the machine learning processing and the automatic summarization processing in the processing device shown in FIG. 6.
FIG. 8 is a diagram showing a configuration example of the processing device of the present invention in the second embodiment.
FIG. 9 is a diagram showing the flow of the evaluation customization processing in the second embodiment.
FIG. 10 is a diagram showing an example of the displayed text and an example of the range designated by user A.
FIG. 11 is a diagram showing an example of the displayed text and an example of the range designated by user B.
FIG. 12 is a diagram showing an example of the displayed text and an example of the range designated by user C.
FIG. 13 is a diagram showing an example of the display of the range designated by the user.
FIG. 14 is a diagram showing another configuration example of the processing device of the present invention in the second embodiment.
FIG. 15 is a diagram showing a configuration example of the processing device of the present invention in the third embodiment.
FIG. 16 is a diagram showing the flow of the evaluation customization processing in the third embodiment.
FIG. 17 is a diagram showing an example of the property information setting screen.
FIG. 18 is a diagram showing an example of the property information setting screen.
FIG. 19 is a diagram showing another configuration example of the processing device of the present invention in the third embodiment.
FIG. 20 is a diagram showing a configuration example of the processing device of the present invention in the fourth embodiment.
FIG. 21 is a diagram showing the flow of the machine learning processing and the automatic summarization processing in the processing device shown in FIG. 20.

Explanation of symbols

10 Automatic summarization processing apparatus
110 Evaluation customization means
111 Summary display unit
112 Evaluation assigning unit
120 Solution data storage unit
121 Solution-feature pair extraction unit
122 Machine learning unit
123 Learning result data storage unit
124 Summary candidate generation unit
125 Feature extraction unit
126 Summary candidate-estimated solution pair generation unit
127 Summary candidate-estimated solution pair
128 Summary selection unit
130 Solution data storage unit
131 Feature-solution pair / feature-solution candidate pair extraction unit
132 Machine learning unit
133 Learning result data storage unit
134 Summary candidate generation unit
135 Feature-solution candidate extraction unit
136 Summary candidate-estimated solution pair generation unit
137 Summary candidate-estimated solution pair
138 Summary selection unit
140 Evaluation customization means
141 Text display unit
142 Summary editing unit
150 Evaluation customization means
151 Summary display unit
152 Property information setting unit
160 Solution estimation unit
161 Estimated solution
2 Text
3 Summary
4 Text and summary
5 Text
7 User evaluation setting information

Claims

A solution data editing processing apparatus for editing solution data used in processing that automatically summarizes text, which is document data, by a machine learning method, the apparatus comprising:
text storage means for storing text that is document data;
summary display means for displaying text acquired from the text storage means on a display device, extracting sentence data in a range specified by a user from the text, and displaying the extracted data as a user-specified summary of the text;
evaluation assigning means for displaying items for inputting an evaluation value for each of a plurality of properties, the properties being information that indicates characteristics of a summary and is used as an evaluation of the summary, and including two or more of: a short-sentence emphasis property indicating whether importance is placed on short sentences as a summary; a quantity-expression emphasis property indicating whether importance is placed on the summary including expressions about quantities; a method emphasis property indicating whether importance is placed on the summary including expressions about methods; a style emphasis property indicating whether importance is placed on the style of the summary; and a readability emphasis property indicating whether importance is placed on the summary being easy to read, and for accepting input of an evaluation value of each of the properties for the user-specified summary;
solution data storage means for storing solution data composed of a problem and a solution; and
evaluation customization means for: generating solution data by taking the text and the user-specified summary as a problem and assigning the evaluation values input by the user to the problem as a solution; generating summary candidates for the text by performing any one of important sentence selection processing, in which sentences are extracted from the text and any selection state of the sentences is taken as a summary candidate, important part selection processing, in which phrases are extracted from the text and any selection state of the phrases is taken as a summary candidate, and transformation processing, in which a sentence of the text is transformed according to a predetermined transformation rule and the transformed state is taken as a summary candidate; generating solution data by taking the text and a summary candidate other than the user-specified summary as a problem and assigning to the problem, as a solution, a bad evaluation indicating that the summary candidate is not the user-specified summary; and outputting, to the solution data storage means, the solution data whose solution is the evaluation values input by the user and the solution data whose solution is the bad evaluation.
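
The solution data generation recited above can be illustrated with a minimal sketch. It assumes the important sentence selection processing as the candidate generator and treats every non-empty selection of sentences as a summary candidate. The names `SolutionData`, `BAD`, `split_sentences`, `sentence_selection_candidates`, and `build_solution_data`, the period-based sentence splitter, and the numeric evaluation scale are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from itertools import combinations

BAD = "bad"  # hypothetical label standing in for the "bad evaluation"


@dataclass
class SolutionData:
    text: str         # problem, part 1: the source text
    summary: str      # problem, part 2: a summary (user-specified or candidate)
    solution: object  # dict of property -> evaluation value, or BAD


def split_sentences(text):
    # Naive period-based splitter, assumed for illustration only.
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def sentence_selection_candidates(text):
    # Important sentence selection: each non-empty selection state of the
    # sentences (kept in document order) is one summary candidate.
    sentences = split_sentences(text)
    for k in range(1, len(sentences) + 1):
        for chosen in combinations(sentences, k):
            yield " ".join(chosen)


def build_solution_data(text, user_summary, evaluations):
    # The user-specified summary receives the user's evaluation values.
    data = [SolutionData(text, user_summary, dict(evaluations))]
    # Every other candidate receives the bad evaluation.
    for candidate in sentence_selection_candidates(text):
        if candidate != user_summary:
            data.append(SolutionData(text, candidate, BAD))
    return data


if __name__ == "__main__":
    text = "A is big. B costs 3 yen. C is new."
    ratings = {"short_sentences": 3, "quantity_expressions": 5}
    for item in build_solution_data(text, "B costs 3 yen.", ratings):
        print(item.summary, "->", item.solution)
```

Enumerating every selection state grows exponentially with the number of sentences, so a practical implementation would presumably restrict or sample the candidates; the sketch leaves that choice open.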

The solution data editing processing apparatus according to claim 1,
wherein the evaluation customization means includes summary editing means for accepting input of a change to the wording of the text portion specified by the user and using the changed portion as the user-specified summary.

The solution data editing processing apparatus according to claim 1,
wherein the evaluation customization means displays items for inputting an evaluation value of each of the properties for a summary candidate that was generated by the summary generation processing and consists of a portion other than the user-specified summary, accepts input of the user's evaluation value for each of the items, and generates the solution data by assigning a combination of the input evaluation values, as a solution, to the problem consisting of the text and the displayed summary candidate.
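
As a hedged illustration of this dependent claim, the sketch below labels a displayed summary candidate with the combination of evaluation values the user entered, one per property. The function name `label_candidate`, the dictionary layout, and the property names are hypothetical; the claim does not prescribe a data format.

```python
def label_candidate(text, candidate, ratings, properties):
    """Build one solution-data record for a displayed summary candidate.

    `ratings` maps each property name to the evaluation value the user
    entered; the solution is the combination of those values, ordered
    as in `properties`.
    """
    missing = [p for p in properties if p not in ratings]
    if missing:
        # Every displayed item needs a value before a solution can be formed.
        raise ValueError(f"no evaluation value entered for: {missing}")
    return {
        "problem": (text, candidate),
        "solution": tuple(ratings[p] for p in properties),
    }


if __name__ == "__main__":
    props = ("short_sentences", "readability")
    record = label_candidate(
        "A is big. B costs 3 yen.",
        "A is big.",
        {"short_sentences": 2, "readability": 5},
        props,
    )
    print(record)
```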

A solution data editing processing method by which a computer comprising text storage means, summary display means, evaluation assigning means, evaluation customization means, and solution data storage means edits solution data used in processing that automatically summarizes text, which is document data, by a machine learning method, the method comprising:
a process in which the summary display means accesses the text storage means, which stores text that is document data, acquires the text, displays the text on a display device, extracts sentence data in a range specified by a user from the text, and displays the extracted data as a user-specified summary of the text;
a process in which the evaluation assigning means displays items for inputting an evaluation value for each of a plurality of properties, the properties being information that indicates characteristics of a summary and is used as an evaluation of the summary, and including two or more of: a short-sentence emphasis property indicating whether importance is placed on short sentences as a summary; a quantity-expression emphasis property indicating whether importance is placed on the summary including expressions about quantities; a method emphasis property indicating whether importance is placed on the summary including expressions about methods; a style emphasis property indicating whether importance is placed on the style of the summary; and a readability emphasis property indicating whether importance is placed on the summary being easy to read, and accepts input of an evaluation value of each of the properties for the user-specified summary; and
a process in which the evaluation customization means: generates solution data by taking the text and the user-specified summary as a problem and assigning the evaluation values input by the user to the problem as a solution; generates summary candidates for the text by performing any one of important sentence selection processing, in which sentences are extracted from the text and any selection state of the sentences is taken as a summary candidate, important part selection processing, in which phrases are extracted from the text and any selection state of the phrases is taken as a summary candidate, and transformation processing, in which a sentence of the text is transformed according to a predetermined transformation rule and the transformed state is taken as a summary candidate; generates solution data by taking the text and a summary candidate other than the user-specified summary as a problem and assigning to the problem, as a solution, a bad evaluation indicating that the summary candidate is not the user-specified summary; and outputs, to the solution data storage means, the solution data whose solution is the evaluation values input by the user and the solution data whose solution is the bad evaluation.

The solution data editing processing method according to claim 4,
wherein the computer comprises summary editing means, and
in the process executed by the evaluation customization means, the summary editing means accepts input of a change to the wording of the text portion specified by the user and sets the changed portion as the user-specified summary.

The solution data editing processing method according to claim 4,
wherein, in the process executed by the evaluation customization means, items for inputting an evaluation value of each of the properties are displayed for a summary candidate that was generated by the summary generation processing and consists of a portion other than the user-specified summary, input of the user's evaluation value for each of the items is accepted, and the solution data is generated by assigning a combination of the input evaluation values, as a solution, to the problem consisting of the text and the displayed summary candidate.