JPH1145289A

JPH1145289A - Document processor, storage medium storing document processing program and document processing method

Info

Publication number: JPH1145289A
Application number: JP9218230A
Authority: JP
Inventors: Naoyuki Nomura (野村直之)
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 1997-07-28
Filing date: 1997-07-28
Publication date: 1999-02-16

Abstract

PROBLEM TO BE SOLVED: To provide a document processing apparatus, a storage medium storing a document processing program, and a document processing method that can create a summary reflecting a user's preferences, such as the purpose of use. SOLUTION: Keywords and their importance are obtained from the contents of documents processed in the past, and a GP (Group Personalize) matrix is obtained in which one of a plurality of users and the keywords forms the rows, the other forms the columns, and each element value is the importance of a keyword to a user. Important words a, b, ... are obtained from the summary target document, together with importances derived from their appearance frequency and the like, and a term vector V whose elements are these importances is shifted by the GP matrix to obtain a preference term vector V'. Preference important sentences F(Z) are extracted from the summary target document on the basis of the elements of V' (the preference importances) and arranged in their order of appearance in that document to form the preference summary.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention
The present invention relates to a document processing apparatus, a storage medium storing a document processing program, and a document processing method, and more particularly to the creation of a summary that reflects a user's preferences, such as the purpose for which the document will be used.

[0002]

2. Description of the Related Art
Computers have conventionally been used to create summaries (including abstracts) of various documents, such as books, papers, and reports, automatically. One method of automatic summarization is proposed in "Extraction and Processing of Semantic Information from Full-Text Information" (Proceedings of the 38th National Convention of the Information Processing Society of Japan, p. 222, 1989). In this method, important words are first extracted from the document using information such as character type and verbs, and the most important word is then determined from the appearance frequency of the important words. Important sentences are then selected according to whether the important words and the most important word appear in them, so that a summary can be created automatically. A method that creates summaries more accurately by reflecting the characteristics of the document's paragraphs has also been proposed in Japanese Patent Application Laid-Open No. 3-191475.
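The related-art procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the cited paper's method: the character-type/verb based extraction of important words is replaced by a caller-supplied word list, and the sentence score (one point per important word present, plus a hypothetical bonus for the most important word) is an assumption.

```python
from collections import Counter


def baseline_summary(sentences, important_words, top_n=2):
    """Frequency-based extractive summary in the spirit of the related art.

    Hypothetical helper: `important_words` stands in for the paper's
    character-type/verb based extraction, which is not reproduced here.
    """
    if not important_words:
        return []
    # Count occurrences of each important word across the whole document.
    freq = Counter()
    for s in sentences:
        for w in important_words:
            freq[w] += s.count(w)
    # The most frequent important word is taken as the "most important word"
    # (the first one listed wins a tie).
    most_important = max(important_words, key=lambda w: freq[w])
    scores = []
    for s in sentences:
        score = sum(1 for w in important_words if w in s)
        if most_important in s:
            score += 1  # assumed bonus for containing the most important word
        scores.append(score)
    # Take the top_n sentences (earlier sentence wins ties), in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:top_n])]
```

Note that the result is the same for every reader, which is exactly the limitation raised in the next paragraph.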

[0003]

Problems to be Solved by the Invention
However, even for the same document, the important parts differ when the purpose of use (for example, sales material versus technical reference material) or other user preferences differ. Summaries created by the conventional document processing described above cannot reflect these preferences.

[0004]

SUMMARY OF THE INVENTION
The present invention has been made to solve the above problems, and its object is to provide a document processing apparatus, a storage medium storing a document processing program, and a document processing method capable of automatically creating summaries that reflect a user's preferences, such as the purpose of use.

[0005]

Means for Solving the Problems
According to the invention of claim 1, the above object is achieved by providing a document processing apparatus comprising: document acquisition means for acquiring a document composed of a plurality of sentences; important phrase extraction means for acquiring important phrases and their importance from the document acquired by the document acquisition means; preference important part selection means for selecting, from the document and on the basis of the important phrases, preference important parts that reflect the user's preferences; and preference summary creation means for creating a summary of the document on the basis of the preference important parts selected by the preference important part selection means.

According to the invention of claim 2, in the document processing apparatus of claim 1, the important phrase extraction means has candidate phrase acquisition means for acquiring, from the document acquired by the document acquisition means, candidate phrases for the important phrases together with their importance, and preference acquisition means for acquiring either a preference vector whose element values are the importances of a plurality of keywords representing the user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing each user's preferences forms the rows and the other forms the columns, each element value being the importance of the corresponding keyword to the corresponding user. The important phrase extraction means extracts the important phrases from importances obtained by shifting, with the preference vector or GP matrix acquired by the preference acquisition means, the importances of the candidate phrases acquired by the candidate phrase acquisition means, and the preference important part selection means selects the preference important parts according to the important phrases and their importances. The above object is achieved by providing this document processing apparatus.

According to the invention of claim 3, in the document processing apparatus of claim 1, the important phrase extraction means acquires candidate phrases for the important phrases and their importance from the document acquired by the document acquisition means and extracts the important phrases according to the importance of the candidate phrases. The preference important part selection means has preference acquisition means for acquiring either a preference vector whose element values are the importances of a plurality of keywords representing the user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing the respective users' preferences forms the rows and the other forms the columns, each element value being the importance of the corresponding keyword to the corresponding user, and it selects the important parts according to importances obtained by shifting, with the acquired preference vector or GP matrix, the importances of the important phrases obtained by the important phrase extraction means. The above object is achieved by providing this document processing apparatus.

According to the invention of claim 4, the above object is achieved by providing a storage medium storing a computer-readable document processing program that causes a computer to implement: a document acquisition function for acquiring a document composed of a plurality of sentences; an important phrase extraction function for acquiring important phrases and their importance from the document acquired by the document acquisition function; a preference important part selection function for selecting, from the document and on the basis of the important phrases, preference important parts that reflect the user's preferences; and a preference summary creation function for creating a summary of the document on the basis of the preference important parts selected by the preference important part selection function.

According to the invention of claim 5, in the storage medium of claim 4, the important phrase extraction function has a candidate phrase acquisition function for acquiring, from the document acquired by the document acquisition function, candidate phrases for the important phrases together with their importance, and a preference acquisition function for acquiring either a preference vector whose element values are the importances of a plurality of keywords representing the user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing each user's preferences forms the rows and the other forms the columns, each element value being the importance of the corresponding keyword to the corresponding user. The important phrase extraction function extracts the important phrases from importances obtained by shifting, with the preference vector or GP matrix acquired by the preference acquisition function, the importances of the candidate phrases acquired by the candidate phrase acquisition function, and the preference important part selection function selects the preference important parts according to the important phrases and their importances. The above object is achieved by providing a storage medium storing this document processing program.

According to the invention of claim 6, in the storage medium of claim 4, the important phrase extraction function acquires candidate phrases for the important phrases and their importance from the document acquired by the document acquisition function and extracts the important phrases according to the importance of the candidate phrases. The preference important part selection function has a preference acquisition function for acquiring either a preference vector whose element values are the importances of a plurality of keywords representing the user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing the respective users' preferences forms the rows and the other forms the columns, each element value being the importance of the corresponding keyword to the corresponding user, and it selects the important parts according to importances obtained by shifting, with the acquired preference vector or GP matrix, the importances of the important phrases obtained by the important phrase extraction function. The above object is achieved by providing a storage medium storing this document processing program.

According to the invention of claim 7, the above object is achieved by providing a document processing method that acquires a document composed of a plurality of sentences, acquires important phrases and their importance from the acquired document, selects from the document, on the basis of the important phrases, preference important parts that reflect the user's preferences, and creates a summary of the document on the basis of the selected preference important parts.
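Claims 2 and 3 differ mainly in where the preference shift is applied: before the important phrases are extracted (claim 2) or after they are extracted (claim 3). The Python sketch below contrasts the two orders. The multiplicative shift and the top-k extraction rule are illustrative assumptions, not formulas taken from the specification, and all function names are hypothetical.

```python
def shift(importance, pref):
    """Hypothetical shift: scale each importance by (1 + preference weight)."""
    return {w: g * (1.0 + pref.get(w, 0.0)) for w, g in importance.items()}


def top_k(importance, k):
    """Keep the k phrases with the highest importance (ties broken alphabetically)."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def claim2_style(candidates, pref, k):
    # Shift the candidate importances first, then extract the important phrases.
    return top_k(shift(candidates, pref), k)


def claim3_style(candidates, pref, k):
    # Extract the important phrases first, then shift only their importances.
    return shift(top_k(candidates, k), pref)
```

With candidates {"sales": 2.0, "circuit": 3.0, "price": 2.5} and a preference weight of 1.0 on "sales", the claim 2 order lets "sales" displace "price", whereas the claim 3 order can only reweight phrases that were already extracted.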

[0006]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Preferred embodiments of the document processing apparatus, the storage medium storing a document processing program, and the document processing method according to the present invention are described in detail below with reference to FIGS. 1 to 7.

(1) Overview of the Embodiment
In this embodiment, keywords and their importance are obtained from the contents of documents processed in the past, and a GP matrix is obtained in which one of a plurality of users and the keywords forms the rows, the other forms the columns, and each element value is the importance of a keyword to a user. Important words a, b, ... and their importances g(p), g(q), ..., derived from appearance frequency and the like, are obtained from the summary target document, and the term vector V = (g(p), g(q), ...), whose elements are these importances, is shifted by the GP matrix to obtain a preference term vector V'. On the basis of the elements of V' (the preference importances) g'(p), g'(q), ..., preference important sentences are extracted from the summary target document and arranged in their order of appearance in that document to form the preference summary.
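The overview can be made concrete with a short sketch. It assumes that the shift from V to V' multiplies each g(term) by (1 + w), where w is a hypothetical per-term preference weight (for example, read from one user's column of the GP matrix), and that a sentence's score is the sum of the preference importances of the terms it contains; neither rule is stated in this section, and the function name is hypothetical.

```python
def preference_summary(sentences, term_importance, pref_weight, n_sentences):
    """Sketch of the overview under stated assumptions.

    sentences       -- the summary target document, already split into sentences
    term_importance -- term vector V as {term: g(term)}
    pref_weight     -- hypothetical per-term preference weights (e.g. one user's
                       column of the GP matrix); not the patent's formula
    """
    # Shift V into the preference term vector V' (assumed multiplicative shift).
    v_prime = {t: g * (1.0 + pref_weight.get(t, 0.0)) for t, g in term_importance.items()}
    # Score each sentence by the preference importances of the terms it contains.
    scores = [sum(g for t, g in v_prime.items() if t in s) for s in sentences]
    # Take the highest-scoring sentences (the earlier sentence wins ties) ...
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:n_sentences])
    # ... and arrange them in their order of appearance in the document.
    return [sentences[i] for i in chosen]
```

With no preference weights the same document yields the "circuit" sentences, while a user whose weights favour "sales" and "price" gets the sales and price sentences instead, which is the behaviour the embodiment aims for.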

[0007] (2) Details of the Embodiment
FIG. 1 is a block diagram showing the configuration of a computer that constitutes an embodiment of the document processing apparatus of the present invention and that has read the document processing program from an embodiment of the storage medium of the present invention. As shown in FIG. 1, the document processing apparatus (computer) includes a control unit 11 that controls the entire apparatus. Connected to the control unit 11 via a bus line 21, such as a data bus, are a keyboard 12 and a mouse 13 as input devices, a display device 14, a printing device 15, a storage device 16, a storage medium drive 17, a communication control device 18, an input/output I/F 19, and a character recognition device 20. The control unit 11 includes a CPU 111, a ROM 112, and a RAM 113. The ROM 112 is a read-only memory that stores in advance the programs and data the CPU 111 uses for various controls and calculations.

[0008] The RAM 113 is a random access memory used by the CPU 111 as working memory. As areas for the preference summary processing of this embodiment, the RAM 113 reserves a target document storage area 1131, a summary parameter storage area 1132, an important word storage area 1133, a term vector storage area 1134, a matrix storage area 1135, a preference term vector storage area 1136, a summary storage area 1137, and various other areas.

[0009] The target document storage area 1131 stores the document for which a preference summary is to be created. The summary parameter storage area 1132 stores summary parameter values obtained from operator input or the like, or default summary parameter values read from the data storage unit 163 described later. Summary parameters that the operator can enter include, for example: the ratio of the summary to the whole document (1 to 99%); whether to give priority to quantities such as dates and times, price information, and physical quantities (size, weight, temperature, and so on); whether to give priority to URLs (Uniform Resource Locators); whether to give priority to long simple sentences; and whether to select between the "desu/masu" and "de aru" sentence styles. The term vector storage area 1134 stores the term vector, obtained in this embodiment, of the document for which the preference summary is created. The summary storage area 1137 stores the important sentences obtained in this embodiment, in their order within the preference summary target document.

[0010] The keyboard 12 has various keys, such as kana keys and a numeric keypad for entering kana characters, function keys for executing various functions, and cursor keys. The mouse 13 is a pointing device used to specify a function by left-clicking the corresponding key, icon, or other item displayed on the display device 14. A CRT or a liquid crystal display, for example, is used as the display device 14, which displays the contents of the document for which a preference summary is to be created, the preference summary created by this embodiment, and so on. The printing device 15 prints text displayed on the display device 14, documents stored in the document database 165 of the storage device 16, and the like. Various printers, such as laser, dot, inkjet, page, thermal, and thermal-transfer printers, can be used as the printing device.

[0011] The storage device 16 consists of a readable and writable storage medium and a drive for reading and writing programs, data, and other information on that medium. A hard disk is mainly used as its storage medium, but any readable and writable medium among the media used in the storage medium drive 17 described later may also be used. The storage device 16 has a kana-kanji conversion dictionary 161, a program storage unit 162, a data storage unit 163, an important word database 164, a document database 165, a matrix database 168, and other storage units not shown (for example, one for backing up the programs and data stored in the storage device 16). The program storage unit 162 stores various programs, including the preference summary creation program of this embodiment and a kana-kanji conversion program that uses the kana-kanji conversion dictionary 161 to convert entered kana strings into text mixing kanji and kana. The data storage unit 163 stores various data, such as the default values of the summary parameters. These defaults are, for example: ratio of the summary to the whole document = 25%; emphasis on quantities such as dates and times, price information, and physical quantities (size, weight, temperature, and so on) = no; emphasis on URLs (Uniform Resource Locators) = no; emphasis on long simple sentences = no; and selection between the "desu/masu" and "de aru" styles = no.
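The summary parameters and their example defaults map naturally onto a small configuration object. In the sketch below the class and field names are hypothetical, and the rule for turning the ratio into a sentence count (round up, keep at least one sentence) is an assumption; the specification does not say whether the ratio is applied to sentences or characters.

```python
import math
from dataclasses import dataclass


@dataclass
class SummaryParams:
    """Summary parameters with the example defaults from [0011] (names hypothetical)."""
    ratio_percent: int = 25                        # summary size relative to the whole document
    emphasize_quantities: bool = False             # dates/times, prices, physical quantities
    emphasize_urls: bool = False
    emphasize_long_simple_sentences: bool = False
    select_desu_masu_de_aru: bool = False          # choose "desu/masu" vs. "de aru" style

    def __post_init__(self):
        # The operator may enter 1 to 99 percent.
        if not 1 <= self.ratio_percent <= 99:
            raise ValueError("ratio_percent must be between 1 and 99")


def sentence_budget(params, n_sentences):
    """Assumed rule: round the ratio up, keep at least one sentence, never exceed the document."""
    wanted = math.ceil(n_sentences * params.ratio_percent / 100)
    return min(n_sentences, max(1, wanted))
```

For a 10-sentence document the default 25% gives a budget of 3 sentences under this rounding rule.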

[0012] In this embodiment, the important word database 164 stores keywords (processing important words) obtained from documents processed during a predetermined past period, together with the importance of each keyword (processing importance). The document database 165 stores documents created with the kana-kanji conversion program and documents created on other devices and read in through the storage medium drive 17 or the communication control device 18. The format of the stored documents is not particularly limited; documents in various formats, such as plain text, HTML (HyperText Markup Language), and JIS-format documents, can be stored. The document database 165 also stores, for each document, the users who processed it and the number of times it was processed. This processing count is reset to 0 at every predetermined period.

[0013] The matrix database 168 stores matrices Ga, Gb, and Gc obtained from the contents of the document processing performed during a predetermined past period. A GP (Group Personalize) matrix is obtained from the matrices Ga, Gb, and Gc, and the GP matrix is used to shift (that is, convert) the importance of the important words (including phrases) of the document to be summarized. FIGS. 2(a) to 2(c) are explanatory diagrams showing examples of the matrices Ga, Gb, and Gc.

[0014] As shown in FIG. 2(a), the matrix Ga has as its rows the processing important words extracted from the documents processed during the predetermined past period and as its columns those processed documents; each element is the processing importance f(x) of a processing important word. As shown in FIG. 2(b), the matrix Gb has the processed documents as rows and the users as columns; each element is the number of times a user processed a document during the predetermined period. As shown in FIG. 2(c), both the rows and the columns of the matrix Gc correspond to users, and its elements give each user's importance coefficient. The matrices Ga and Gb are rewritten at every predetermined period, and the matrix Gc is rewritten as needed from operator input.
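This section says only that the GP matrix is obtained from Ga, Gb, and Gc. One composition consistent with their shapes is the product GP = Ga·Gb·Gc: Ga is keywords × documents, Gb is documents × users, and Gc is users × users, so the product is a keywords × users matrix. Its (keyword, user) element sums the keyword's importance over the documents that user processed, weighted by the processing count, and then combines the result through the user coefficients in Gc (a plain per-user scaling if Gc is diagonal). The sketch assumes this product; treat it as an interpretation, not the specification's definition.

```python
def matmul(a, b):
    """Plain-Python product of an n x m matrix a and an m x p matrix b."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gp_matrix(ga, gb, gc):
    """Assumed composition GP = Ga . Gb . Gc, giving a keywords x users matrix."""
    return matmul(matmul(ga, gb), gc)
```

For example, with Ga = [[3, 0], [1, 2]], Gb = [[1, 0], [2, 1]], and a diagonal Gc = [[1, 0], [0, 2]], the result is [[3, 0], [5, 4]]: the first keyword occurs only in a document the second user never processed, so its weight for that user is 0.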

[0015] The storage medium drive 17 is a drive with which the CPU 111 reads computer programs and data, including documents, from external storage media. The computer programs stored on such media include programs for the various processes executed by the document processing apparatus of this embodiment, as well as the dictionaries, data, and so on that they use. Here, a storage medium means any medium on which computer programs, data, and the like are stored. Specifically, it includes magnetic media such as floppy disks, hard disks, and magnetic tape; semiconductor media such as memory chips and IC cards; optically read media such as CD-ROMs, MOs, and PDs (phase-change rewritable optical disks); media using paper, such as paper cards and paper tape (and media with equivalent functions); and media storing computer programs and the like by various other methods. The storage media mainly used in the document processing apparatus of this embodiment are CD-ROMs and floppy disks. Besides reading computer programs from these media, the storage medium drive 17 can write data stored in the RAM 113 or the storage device 16 to writable media such as floppy disks.

[0016] In the document processing apparatus of this embodiment, the CPU 111 of the control unit 11 reads a computer program from an external storage medium set in the storage medium drive 17 and stores (installs) it in the relevant parts of the storage device 16. When various processes of this embodiment, such as similarity calculation, are executed, the corresponding program is read from the storage device 16 into the RAM 113 and executed. Alternatively, the program can be read directly from the external storage medium into the RAM 113 through the storage medium drive 17 and executed, without going through the storage device 16. Depending on the apparatus, the preference summary creation program and other programs of this embodiment may also be stored in the ROM 112 in advance and executed by the CPU 111.

[0017] The communication control device 18 can send and receive documents in various formats, such as text and HTML, and other data, such as bitmap data, to and from other personal computers, word processors, and the like. The input/output I/F 19 is an interface for connecting devices such as speakers that output voice, music, and the like. The character recognition device 20 recognizes characters written on paper or the like and outputs them in various formats, such as text or HTML; it consists of an image scanner, a character recognition program, and so on.

[0018] In this embodiment, various documents can be acquired as the target document (document acquisition means): documents created by input on the keyboard 12 (stored in a predetermined area of the RAM 113); documents created externally, stored on a storage medium, and read through the storage medium drive 17; documents stored in advance in the document database; documents downloaded through the communication control device 18; and documents recognized by the character recognition device 20.

[0019] Next, the preference summary creation process performed by the document processing apparatus configured as above, which is one embodiment of the document processing method of the present invention, is described with reference to FIGS. 3 to 7.

[0020] In this embodiment, at every predetermined period, new processing important words and processing importances are obtained from the contents of the document processing performed during that period, and the matrices Ga and Gb in the matrix database 168 are rewritten.

[0021] FIG. 3 is a flowchart showing the rewriting of the matrices Ga and Gb. The CPU 111 sequentially retrieves the documents processed during the predetermined period (processed documents) from the document database 165, stores them in a predetermined work area of the RAM 113 (step 11), and obtains for each document its keywords (processing important words, including phrases) and their importance (processing importance) (step 12).

[0022] FIG. 4 is a flowchart showing the acquisition of the processing important words and processing importances for each document. As shown in FIG. 4, the CPU 111 performs morphological analysis on a document retrieved from the document database 165 to extract its content words (step 121), and also extracts candidate words (phrases), including noun phrases and compound noun phrases, from the processed document (step 122). It then obtains the processing importance f(x) of each candidate word (phrase) from its appearance frequency in the processed document and an evaluation function (step 123). The evaluation function may, for example, weight predetermined important words when such words have been specified in advance, or weight candidates according to their type, such as word, noun phrase, or compound noun phrase.

[0023] The CPU 111 then selects processing important words a, b, c, ... from the candidate words (phrases) on the basis of their processing importance f(x) (step 124). It stores the processing important words a, b, c, ... and their processing importances f(a), f(b), f(c), ... in the important word database 164. Once the processing important words and their processing importances have been acquired for all processed documents, the process returns to the matrix Ga, Gb rewriting routine shown in FIG. 3.
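The scoring in steps 123 and 124 can be sketched as follows. This is a minimal illustration, not the patented evaluation function. It assumes that candidates arrive as (text, type) pairs from the morphological analysis of steps 121 and 122. The names `processing_importance` and `select_important_words`, the weights in `TYPE_WEIGHTS`, the `PRESET_BOOST` multiplier, and the selection threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical weights per candidate type; the text names the types, not the values.
TYPE_WEIGHTS = {"word": 1.0, "noun_phrase": 1.5, "compound_noun_phrase": 2.0}
# Hypothetical multiplier for important words specified in advance.
PRESET_BOOST = 2.0


def processing_importance(candidates, preset_words=frozenset()):
    """Step 123 (sketch): f(x) = frequency * type weight * optional preset boost.

    `candidates` is a list of (text, type) pairs, one per occurrence.
    """
    scores = {}
    for (text, kind), freq in Counter(candidates).items():
        score = freq * TYPE_WEIGHTS.get(kind, 1.0)
        if text in preset_words:
            score *= PRESET_BOOST
        # If the same text was extracted under several types, keep the best score.
        scores[text] = max(scores.get(text, 0.0), score)
    return scores


def select_important_words(scores, threshold):
    """Step 124 (sketch): keep candidates whose f(x) reaches the threshold."""
    return {word: s for word, s in scores.items() if s >= threshold}
```

Frequency times weight is only one plausible evaluation function. The text leaves its exact form open.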

[0024] Next, the CPU 111 rewrites the matrix Ga in the matrix database 168 (step 13) as follows:
- rows: the processing important words a, b, c, ...
- columns: the documents processed during the predetermined period
- elements: the processing importances f(x)

The CPU 111 then obtains the number of times each document has been processed from the document database 165 (step 14). It rewrites the matrix Gb so that its rows are the documents processed within the predetermined period and its elements are those processing counts, and ends the matrix Ga, Gb rewriting process.
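Steps 13 and 14 amount to laying out two tables. The sketch assumes that the per-document f(x) values from FIG. 4 are already available as dictionaries. It also assumes that Gb has one column per user-group member. This is inferred from paragraph [0030], where GP = Ga · Gb · Gc has members as its columns. `build_ga`, `build_gb`, and the input shapes are hypothetical.

```python
def build_ga(doc_scores):
    """Ga (sketch): rows = processing important words, columns = processed documents.

    `doc_scores` maps a document id to {important word: f(x)}; a word that
    does not occur in a document gets 0 in that column.
    """
    docs = sorted(doc_scores)
    words = sorted({w for scores in doc_scores.values() for w in scores})
    ga = [[doc_scores[d].get(w, 0.0) for d in docs] for w in words]
    return words, docs, ga


def build_gb(docs, members, counts):
    """Gb (sketch): rows = processed documents, columns = members.

    `counts` maps (document id, member) to how many times that member
    processed the document during the period; missing pairs count as 0.
    """
    return [[counts.get((d, m), 0) for m in members] for d in docs]
```

Sorting the word and document labels only keeps the row and column order deterministic. The patent does not prescribe an order.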

[0025] FIG. 5 is a flowchart showing the main operation of the preference summary creation process. When creating a summary, the CPU 111 acquires the document to be summarized (summary target document) and stores it in the target document storage area 1131 of the RAM 113 (step 21). As directed by the user, the summary target document is obtained from one of the following:
- the RAM 113
- the document database 165 of the storage device 16
- the storage medium drive 17
- the communication control device 18

Next, if the user has entered summary parameters from the keyboard 12 or the like, the CPU 111 acquires the entered values. Otherwise, it acquires the default summary parameters stored in the data storage unit 163. In either case, the parameters are stored in the summary parameter storage area 1132 (step 22).

[0026] Next, the CPU 111 obtains a term vector V for the summary target document stored in the target document storage area 1131 (step 23). FIG. 6 is a flowchart showing the term vector acquisition process. The CPU 111 first performs morphological analysis to extract the independent words contained in the summary target document (step 231). It also extracts candidate words (phrases), including noun phrases and compound noun phrases, from the summary target document and stores them in a predetermined work area of the RAM 113 (step 232).

Next, the CPU 111 determines the objective importance g(y) of each candidate (step 233). It does so from the summary parameters stored in the summary parameter storage area 1132 of the RAM 113, the appearance frequency of the extracted candidate words (phrases) in the summary target document, an evaluation function, and the like. The evaluation function may, for example, give extra weight to predetermined important words specified in advance, or weight candidates according to their type (word, noun phrase, compound noun phrase, and so on).

[0027] The CPU 111 then selects important words p, q, r, ... on the basis of the objective importance g(y) (step 234). It obtains a term vector V whose elements are their objective importances g(p), g(q), g(r), ... (step 235), and returns to the preference summary creation routine shown in FIG. 5.

[0028] Subsequently, the CPU 111 acquires the matrix Ga from the matrix database 168 and matches the dimensions of the term vector V and the matrix Ga (step 24). Both the number of dimensions of V and the number of rows of Ga are set to the size of the union of two sets: the important words of the summary target document, and the processing important words represented by the rows of Ga. Elements with no counterpart are defined as 0:
- the elements of Ga for important words contained only in V
- the elements of V for important words contained only in the rows of Ga

[0029] For example, suppose the following:
- The important words of the summary target document are "important," "important word," "importance," ...
- The processing important words represented by the rows of Ga are "important," ..., "politics," ...
- The term vector of the summary target document is V = (1, 18, 19, ...).
- One column of Ga is (18, ..., 21, ...).

After dimension matching, the term vector becomes V = (1, 18, 19, ..., 0, ...) and that column of Ga becomes (18, 0, 0, ..., 21, ...). The dimension-matched matrix Ga is stored in the matrix storage area 1135 of the RAM 113, and the term vector V in the term vector storage area 1134.
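The dimension matching of [0028] is a zero-padded union. The sketch below reproduces the worked example above, keeping only the terms the example actually names; the elided "..." entries are left out. Placing V's terms first and the Ga-only terms after them is an assumption, since the text does not fix an order. `match_dimensions` is a hypothetical name.

```python
def match_dimensions(v_words, v, ga_words, ga):
    """Align term vector V and matrix Ga on the union of their important words.

    Words found only in V get an all-zero row in Ga; words found only in Ga
    get a 0 entry in V.
    """
    v_index = {w: i for i, w in enumerate(v_words)}
    ga_index = {w: i for i, w in enumerate(ga_words)}
    # Assumed order: V's words first, then Ga-only words in their Ga order.
    union = list(v_words) + [w for w in ga_words if w not in v_index]
    n_cols = len(ga[0]) if ga else 0
    v_matched = [v[v_index[w]] if w in v_index else 0 for w in union]
    ga_matched = [list(ga[ga_index[w]]) if w in ga_index else [0] * n_cols
                  for w in union]
    return union, v_matched, ga_matched


# The example from [0029], with Ga reduced to the single column it quotes.
union, v_matched, ga_matched = match_dimensions(
    ["important", "important word", "importance"], [1, 18, 19],
    ["important", "politics"], [[18], [21]],
)
print(v_matched)                         # [1, 18, 19, 0]
print([row[0] for row in ga_matched])    # [18, 0, 0, 21]
```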

[0030] Next, the CPU 111 acquires the matrices Gb and Gc from the matrix database 168. It obtains a GP matrix from the dimension-matched matrix Ga and the matrices Gb and Gc (step 25) according to the following equation:

GP = Ga · Gb · Gc

The GP matrix in this embodiment therefore takes the dimension-matched rows of Ga as its rows and each member of the user group as its columns. Each element of the GP matrix is the processing importance f(x) of a processing important word in that member's past document processing, weighted by the member's importance.
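Step 25 is an ordinary chain of matrix products. This section does not spell out the layout of Gc, which is shown in FIG. 2. The sketch therefore assumes that Gc is a diagonal matrix holding one importance weight per member. This matches the description of the GP elements as f(x) values weighted by member importance. Pure-Python lists keep the example dependency-free. `matmul`, `diagonal`, and `gp_matrix` are hypothetical helpers.

```python
def matmul(a, b):
    """Product of an r x n matrix and an n x c matrix given as nested lists."""
    if a and len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for row in a]


def diagonal(weights):
    """Square matrix with `weights` on the diagonal and 0 elsewhere."""
    n = len(weights)
    return [[weights[i] if i == j else 0 for j in range(n)] for i in range(n)]


def gp_matrix(ga, gb, member_weights):
    """GP = Ga . Gb . Gc (sketch): rows = important words, columns = members.

    Assumes Gc = diag(member_weights), one importance weight per member.
    """
    return matmul(matmul(ga, gb), diagonal(member_weights))


# Two words x two documents, two members; the second member is weighted 0.5.
print(gp_matrix([[1.0, 2.0], [3.0, 0.0]], [[2, 0], [0, 1]], [1.0, 0.5]))
# [[2.0, 1.0], [6.0, 0.0]]
```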

[0031] Once the GP matrix has been obtained, the CPU 111 calculates a GP vector from it (step 26).

[0032] FIG. 7 is an explanatory diagram conceptually illustrating how the GP vector is calculated from the GP matrix. Consider the elements gij of the GP matrix, where i = 1 to m (the number of members) and j = 1 to k (the number of words in the union of the important words of the summary target document and the processing important words).

First, the CPU 111 averages the elements of each row, that is, over all members for each word. This yields a column vector, the total GP vector (FIG. 7, (1) → (2)). Each element of the total GP vector reflects how often the corresponding important word appeared in past document processing across the whole user group. It also takes into account the predetermined weights of important words and the importance of each member.

The CPU 111 then divides each element of the total GP vector by the total number of document processings to obtain a one-column GP vector (FIG. 7, (2) → (3)). This division is needed because the matrix Gb contains document processing counts as its elements. Without it, the GP vector would keep growing as the number of processings increased.
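The two steps of FIG. 7 can be sketched as a row mean followed by a division. The sketch assumes that "the total number of document processings" is the sum of all processing counts in Gb; the text does not say exactly which total is used. `total_processings` and `gp_vector` are hypothetical names.

```python
def total_processings(gb):
    """Assumed normaliser: sum of every processing count in Gb."""
    return sum(sum(row) for row in gb)


def gp_vector(gp, n_processings):
    """FIG. 7 (sketch): (1) GP matrix -> (2) total GP vector -> (3) GP vector."""
    if n_processings <= 0:
        raise ValueError("n_processings must be positive")
    # (1) -> (2): average each row (one important word) over all members.
    total_gp = [sum(row) / len(row) for row in gp]
    # (2) -> (3): divide by the processing total so the vector does not
    # grow simply because more documents have been processed.
    return [g / n_processings for g in total_gp]


gb = [[2, 0], [0, 1]]
print(gp_vector([[2.0, 1.0], [6.0, 0.0]], total_processings(gb)))  # [0.5, 1.0]
```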

[0033] The CPU 111 then multiplies each element of the GP vector by the corresponding element of the term vector V to obtain a preference term vector V′ (step 27). Each element of V′ is a preference importance g′(y). It is the objective importance g(y) weighted by the user's preference for that term.
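Step 27 is an element-wise (Hadamard) product, not a matrix product. Here is a minimal sketch, assuming both vectors have already been dimension-matched. `preference_term_vector` is a hypothetical name.

```python
def preference_term_vector(gp_vec, v):
    """Step 27 (sketch): g'(y) = GP element * g(y), word by word."""
    if len(gp_vec) != len(v):
        raise ValueError("dimension-match V and the GP vector first")
    return [g * x for g, x in zip(gp_vec, v)]


print(preference_term_vector([0.5, 1.0, 0.0], [4, 2, 19]))  # [2.0, 2.0, 0.0]
```

The element-wise form has one consequence. A word that occurs in the summary target document but never in past processing gets a zero row in Ga after dimension matching, and therefore a zero GP element. Its preference importance g′(y) is then 0 regardless of g(y), as the third element above shows.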

[0034] Next, the CPU 111 uses the preference importances g′(y) of the important words to obtain a preference part importance (preference sentence importance F(Z)) for each part of the summary target document (step 28). It ranks the parts (sentences) from the highest F(Z) downward. The parts that fall within the summary ratio set by the summary parameters are listed as preference important parts (preference important sentences) (step 29). An example ratio is the top 25% of all sentences in the summary target document. Arranging the listed sentences in their order of appearance in the summary target document yields the preference summary of that document. The summary is stored in the summary storage area 1137 of the RAM 113 (step 30), and the preference summary creation process of this embodiment ends.
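Steps 28 to 30 can be sketched as four operations: score, rank, cut, and restore document order. The text does not define how F(Z) is computed from g′(y). The sketch therefore makes three assumptions:
- F(Z) is the sum of g′(y) over the important words that occur in sentence Z.
- Occurrence is tested by plain substring matching.
- The summary ratio is rounded down, keeping at least one sentence.

`preference_summary` is a hypothetical name.

```python
import math


def preference_summary(sentences, pref_importance, ratio=0.25):
    """Return the top `ratio` share of sentences, in their original order.

    Assumed scoring: F(Z) = sum of g'(y) for each important word y found in Z.
    Substring matching is a simplification (it also matches inside longer words).
    """
    scores = [sum(g for word, g in pref_importance.items() if word in s)
              for s in sentences]
    keep = max(1, math.floor(len(sentences) * ratio))
    # Highest F(Z) first; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = sorted(ranked[:keep])  # step 30: restore order of appearance
    return [sentences[i] for i in chosen]


sentences = ["budget talks began", "the weather was mild",
             "tax rates fell", "budget and tax deal agreed"]
print(preference_summary(sentences, {"budget": 2.0, "tax": 2.0}, ratio=0.5))
# ['budget talks began', 'budget and tax deal agreed']
```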

[0035] As described above, this embodiment creates the summary in three stages:
- The user's preferences for important words are learned from their appearance frequency and the like in past document processing.
- The objective importance g(y) of each important word acquired from the summary target document is converted into a preference importance g′(y) weighted to reflect those preferences.
- Important sentences are selected on the basis of g′(y) to create the summary.

A summary reflecting the user's preferences is therefore produced. This embodiment obtains a term vector V whose elements are the objective importances of the important words. It then converts V, using a GP matrix that reflects the user's preferences, into a preference term vector V′ whose elements are the preference importances. As a result, the computation is simple, and the method can easily be applied to an ordinary document processing apparatus whose core engine uses the vector space method.

[0036] In this embodiment, the GP matrix that shifts the term vector V to the preference term vector V′ is obtained by multiplying the matrices Ga, Gb, and Gc. Each of these matrices is built from a simple viewpoint corresponding to one feature to be expressed. A GP matrix that takes various features into account can therefore be constructed easily and used to shift V.

Each column of the GP matrix reflects the interests of one user. A GP matrix for a subgroup, obtained by dividing a group of users into several groups, or a GP matrix (vector) for an individual user can therefore also be obtained easily.

Furthermore, the GP matrix is derived from the matrices Ga, Gb, and Gc, which are rewritten every predetermined period from the documents the users have processed. The term vector V is therefore shifted to a preference term vector V′ that tracks changes in the users' preferences over time, and the resulting preference summary follows those changes.
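Because each GP column belongs to one member, a subgroup or individual GP matrix is just a column selection. Here is a minimal sketch; `member_columns` and the member labels are hypothetical. The result could then be reduced to a GP vector in the same way as the full matrix.

```python
def member_columns(gp, members, selected):
    """Keep only the GP columns of the `selected` members, in that order."""
    cols = [members.index(m) for m in selected]
    return [[row[c] for c in cols] for row in gp]


gp = [[2.0, 1.0], [6.0, 0.0]]
print(member_columns(gp, ["alice", "bob"], ["bob"]))  # [[1.0], [0.0]]
```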

[0037] The present invention is not limited to the embodiment described above and may be modified as appropriate without departing from its gist. For example, the above embodiment uses a computer as the document processing apparatus, but the apparatus is not limited to a computer and may be a word processor or the like.

[0038] Preference importances may also be acquired for all important word candidates acquired from the summary target document, and the important words may then be selected from those candidates on the basis of the preference importance. When the preference importance is derived from the objective importance, it may be obtained by processing each objective importance individually, without vectorizing them. Even when the objective importances are vectorized, the term vector need not be converted into the preference term vector by means of a GP matrix. A processing important word extracting means that extracts processing important words from the summary target document may also serve as the candidate word extracting means and the important word acquiring means.

[0039] In the above embodiment, the GP matrix is obtained from three matrices:
- the appearance frequency (importance) of the important words in each document (matrix Ga)
- each user's past document processing counts (matrix Gb)
- the importance of each user (matrix Gc)

However, it may be obtained from matrices Ga and Gb alone. It may also take into account, for example, the processing time of each document or the number of times each document has been cited in creating other documents. Further, the GP matrix may be obtained by multiplying matrices such as Ga to Gc, as in the above embodiment. In that case, the elements of each matrix need only be values reflecting the appearance frequency of the important words in the documents or the number of times a user has processed each document. They need not directly represent the appearance frequency or the processing count themselves.

[0040] The following modifications of the above embodiment are also possible:
- The matrices Ga to Gc are obtained from the contents of past document processing in the above embodiment. They may instead be obtained from the user and stored in the matrix database 168.
- The matrices Ga to Gc are rewritten every predetermined period in the above embodiment. They may instead be rewritten after every document processing, or whenever an operator or the like decides.
- A GP vector display means that displays the GP vector on a display device may be provided, so that the preferences of the whole user group or of individual users can be grasped visually. In this case, GP vectors may be stored in chronological order in the matrix database or in a dedicated GP vector database, so that changes over time can also be seen.
- In the above embodiment, importance is compared sentence by sentence according to the preference importance of the important words, and preference important sentences are selected as the preference important parts. Alternatively, the importance of paragraphs or titles may be compared, so that preference important paragraphs or preference important titles are selected as the preference important parts.

[0041]

EFFECTS OF THE INVENTION As described above, the present invention obtains a preference importance that takes the user's preferences into account for each important word in the summary target document. It selects important parts on the basis of this preference importance and creates a summary from those parts. The resulting summary therefore reflects the user's preferences, such as interests, points of attention, and purpose.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a computer. The computer constitutes one embodiment of the document processing apparatus of the present invention and has read the document processing program of the present invention from one embodiment of a storage medium storing that program.

FIG. 2 is an explanatory diagram showing an example of the matrices Ga, Gb, and Gc in the embodiment of FIG. 1.

FIG. 3 is a flowchart showing the matrix Ga, Gb rewriting process in the embodiment of FIG. 1.

FIG. 4 is a flowchart showing the processing important word / processing importance acquisition process for each document in the embodiment of FIG. 1.

FIG. 5 is a flowchart showing the main operation of the preference summary creation process in the embodiment of FIG. 1.

FIG. 6 is a flowchart showing the term vector acquisition process in the embodiment of FIG. 1.

FIG. 7 is an explanatory diagram conceptually illustrating how the GP vector is obtained from the GP matrix in the embodiment of FIG. 1.

[Explanation of symbols]

11 Control unit
112 ROM
113 RAM
1131 Target document storage area
1132 Summary parameter storage area
1133 Important word storage area
1134 Term vector storage area
1135 Matrix storage area
1136 Preference term vector storage area
1137 Summary storage area
12 Keyboard
13 Mouse
14 Display device
15 Printing device
16 Storage device
161 Kana-kanji conversion dictionary
162 Program storage unit
163 Data storage unit
164 Important word database
165 Document database
168 Matrix database

Claims


1. A document processing apparatus comprising: document acquisition means for acquiring a document composed of a plurality of sentences; important word extraction means for acquiring important words and their importance from the document acquired by the document acquisition means; preference important part selecting means for selecting, from the document, a preference important part reflecting a user's preference; and preference summary creating means for creating a summary of the document on the basis of the preference important part selected by the preference important part selecting means.

2. The document processing apparatus according to claim 1, wherein the important word extraction means comprises: candidate word acquisition means for acquiring candidate words for the important words, and their importance, from the document acquired by the document acquisition means; and preference acquisition means for acquiring either a preference vector whose element values are the importance values of a plurality of keywords representing a user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing each user's preferences forms the rows and the other forms the columns, and whose element values are the importance of each keyword for each user; wherein the important word extraction means extracts the important words on the basis of an importance obtained by shifting the importance of the candidate words acquired by the candidate word acquisition means using the preference vector or the GP matrix acquired by the preference acquisition means; and wherein the preference important part selecting means selects the preference important part on the basis of the important words and their importance.

3. The document processing apparatus according to claim 1, wherein the important word extraction means acquires candidate words for the important words, and their importance, from the document acquired by the document acquisition means, and extracts the important words on the basis of the importance of the candidate words; and wherein the preference important part selecting means has preference acquisition means for acquiring either a preference vector whose element values are the importance values of a plurality of keywords representing a user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing the preferences of those users forms the rows and the other forms the columns, and whose element values are the importance of each keyword for each user, and selects the important part according to an importance obtained by shifting the importance of the important words extracted by the important word extraction means using the preference vector or the GP matrix acquired by the preference acquisition means.

4. A computer-readable storage medium storing a document processing program that causes a computer to realize: a document acquisition function for acquiring a document including a plurality of sentences; an important word extraction function for acquiring important words and their importance from the document acquired by the document acquisition function; a preference important part selection function for selecting, from the document, a preference important part reflecting a user's preference; and a preference summary creation function for creating a summary of the document on the basis of the preference important part selected by the preference important part selection function.

5. The storage medium storing the document processing program according to claim 4, wherein the important word extraction function includes: a candidate word acquisition function for acquiring candidate words for the important words, and their importance, from the document acquired by the document acquisition function; and a preference acquisition function for acquiring either a preference vector whose element values are the importance values of a plurality of keywords representing a user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing each user's preferences forms the rows and the other forms the columns, and whose element values are the importance of each keyword for each user; wherein the important word extraction function extracts the important words on the basis of an importance obtained by shifting the importance of the candidate words acquired by the candidate word acquisition function using the preference vector or the GP matrix acquired by the preference acquisition function; and wherein the preference important part selection function selects the preference important part on the basis of the important words and their importance.

6. The storage medium storing the document processing program according to claim 4, wherein the important word extraction function acquires candidate words for the important words, and their importance, from the document acquired by the document acquisition function, and extracts the important words on the basis of the importance of the candidate words; and wherein the preference important part selection function has a preference acquisition function for acquiring either a preference vector whose element values are the importance values of a plurality of keywords representing a user's preferences, or a GP matrix in which one of a plurality of users and a plurality of keywords representing the preferences of those users forms the rows and the other forms the columns, and whose element values are the importance of each keyword for each user, and selects the important part according to an importance obtained by shifting the importance of the important words extracted by the important word extraction function using the preference vector or the GP matrix acquired by the preference acquisition function.

7. A document processing method comprising: acquiring a document including a plurality of sentences; acquiring important words and their importance from the acquired document; selecting, from the document and on the basis of the important words, a preference important part reflecting a user's preference; and creating a summary of the document on the basis of the selected preference important part.