JPH08161348A

JPH08161348A - Document filtering method and document processor

Info

Publication number: JPH08161348A
Application number: JP6298237A
Authority: JP
Inventors: Takanari Ueda; 隆也上田; Makoto Hirota; 誠廣田; Shiro Ito; 史朗伊藤; Shogo Shibata; 昇吾柴田; Yuji Ikeda; 裕治池田; Minoru Fujita; 稔藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-12-01
Filing date: 1994-12-01
Publication date: 1996-06-21

Abstract

PURPOSE: To provide a document filtering method and a document processor that filter documents by considering not only how well the content of a document matches the user's interest but also the newness of the document. CONSTITUTION: In a document filtering method and document processor that filter and present received documents, the documents to be presented are selected on the basis of a preset identifier and the elapsed time of each received document (108), and the selected documents are presented (110). In selecting the documents to present, the degree of matching between the identifier of the received document and the preset identifier is calculated (104), and whether to present the received document is judged by comparing the matching degree with a threshold set according to the elapsed time. Alternatively, the received document is given a score according to the matching degree and the elapsed time, and whether to present it is judged by comparing the score with a predetermined threshold. The elapsed time is measured from the creation of the document, from its reception, or from both.

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a document processing apparatus, and more particularly to a document filtering method and a document processing apparatus that select, from among the documents arriving for a user, those the user is interested in and output the result.

[0002]

Description of the Related Art: In recent years, the larger capacity and lower cost of recording media, together with the spread of word processors, have increased the amount of electronic documents. In addition, as networks have been developed, the amount of electronic documents delivered directly to users through media such as electronic mail and electronic news has also grown. As a result, users now receive more documents than they can process, and the so-called "information flood", in which truly necessary information is buried in unnecessary information, has become a problem.

[0003] As a countermeasure, "document filtering" techniques have come into use. These techniques automatically select, by keyword, the documents a user is likely to be interested in. In document filtering, the user's interest is set in advance as keywords or document content, and this setting is compared with the content of each incoming document. If they match, the document is shown to the user; if they do not, it is withheld. Document filtering therefore spares users from ever seeing documents that do not interest them, which lets them use their time effectively.

[0004]

Problems to be Solved by the Invention: This works well while the user regularly reads the filtered documents. However, if the user cannot read them for a while and filtered documents accumulate, the number of documents grows and important information may end up being overlooked.

[0005] One measure against overlooking important information is to sort the filtered documents by their degree of matching (matching degree) with the user's keywords or document content of interest. The user can then read the best-matching documents first. This approach, however, ignores the "freshness" of documents; for example, the most recent necessary document may appear late in the list.

[0006] Freshness matters for documents that reach users daily, such as electronic mail, and in general the newer a document, the more valuable its information. Many documents carry important information when they arrive but become less important as time passes. Document filtering that considers only document content cannot reflect this.

[0007] Conventional apparatuses thus have the various problems described above. An object of the present invention is to solve these problems. To that end, it provides a document filtering method and a document processing apparatus that filter documents by considering not only the matching degree between document content and the user's interest but also the freshness (newness) of each document.

[0008]

Means for Solving the Problems: To solve the above problems, the present invention provides a document filtering method that filters and presents received documents. The method is characterized in that the documents to be presented are selected on the basis of a preset identifier and the elapsed time of each received document, and the selected documents are presented.

[0009] Here, selecting the documents to be presented includes two steps. The first step calculates the matching degree between the identifier of a received document and the preset identifier. The second step determines whether to present the received document by comparing the matching degree with a threshold set according to the elapsed time. The threshold may additionally be set according to the number of documents.

Alternatively, the selection includes three steps. The first step calculates the matching degree between the identifier of a received document and the preset identifier. The second step gives the received document a score according to the matching degree and the elapsed time. The third step determines whether to present the received document by comparing the score with a predetermined threshold.

The identifier is represented by a feature vector, and the matching degree is the inner product of the feature vector of the received document and the preset feature vector. The feature vector consists of a plurality of keywords or sentences. The elapsed time is measured from the creation of the received document, from its reception, or from both.

[0010] The present invention also provides a document processing apparatus that filters and presents received documents. The apparatus is characterized by comprising document holding means, selecting means, and presenting means. The document holding means holds each received document together with its elapsed time since creation and/or since reception. The selecting means selects the documents to be presented on the basis of a preset identifier and the elapsed time of each received document. The presenting means presents the selected documents.

[0011] Here, the selecting means includes matching degree calculating means, threshold setting means, and determining means. The matching degree calculating means calculates the matching degree between the identifier of a received document and the preset identifier. The threshold setting means sets a threshold corresponding to the elapsed time. The determining means determines whether to present the received document by comparing the threshold with the matching degree. The threshold setting means may also set the threshold according to the number of documents.

Alternatively, the selecting means includes matching degree calculating means, score calculating means, and determining means. The matching degree calculating means calculates the matching degree between the identifier of a received document and the preset identifier. The score calculating means gives the received document a score according to the matching degree and the elapsed time. The determining means determines whether to present the received document by comparing the score with a predetermined threshold.

The identifier is represented by a feature vector, and the matching degree is the inner product of the feature vector of the received document and the preset feature vector. The feature vector consists of a plurality of keywords or sentences.

[0012]

Operation: With the above configuration, a document is judged necessary during document filtering when the matching degree between its document feature and the selection feature exceeds a threshold determined by the elapsed time. The document is then presented to the user. This makes it possible to filter documents with their freshness taken into account.

[0013]

Embodiments: Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the logical configuration of the processing performed by the document processing apparatus of this embodiment. In FIG. 1, 101 is a document database storing the documents that have arrived for the user. 102 is a document holding unit that holds the document 102-1 to be processed, its document feature 102-2, and its document arrival time 102-3. 103 is a selection feature holding unit that holds the document feature of documents matching the user's interest (the selection feature). 104 is a matching degree calculation unit that calculates the matching degree between the document feature of the document being processed and the selection feature. 105 is a time holding unit that holds the current time. 106 is an elapsed time calculation unit that calculates the time elapsed since the document arrived. 107 is a threshold calculation unit that calculates a threshold from the elapsed time. 108 is a document selection unit that selects documents according to the relation between the matching degree and the threshold. 109 is a selected document holding unit that holds the documents selected by the document selection unit 108. 110 is a document display unit that displays the documents held in the selected document holding unit 109.
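The holding units of FIG. 1 can be pictured as simple records. Below is a minimal Python sketch under stated assumptions. A document feature is assumed to be stored as a sparse mapping from keyword to weight. The class and field names (`HeldDocument`, `FilterState`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HeldDocument:
    """One entry of the document holding unit 102 (hypothetical layout)."""
    text: str                  # 102-1: the document to be processed
    feature: dict[str, float]  # 102-2: document feature, keyword -> weight
    arrival_time: datetime     # 102-3: document arrival time


@dataclass
class FilterState:
    """Hypothetical grouping of the other holding units of FIG. 1."""
    selection_feature: dict[str, float]                         # unit 103
    selected: list[HeldDocument] = field(default_factory=list)  # unit 109
```

The current time (unit 105) is not stored in this sketch; it would simply be read from the clock when needed.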

[0014] FIG. 2 shows the hardware configuration of the document processing apparatus of this embodiment. In FIG. 2, 201 is a control memory that stores the control procedure 300 shown in FIGS. 1 and 4; it may be a ROM or a RAM. 202 is a central processing unit that performs processing in accordance with the control procedure stored in the control memory 201. 203 is a memory containing the document holding unit 102, the selection feature holding unit 103, and the selected document holding unit 109. The time holding unit 105, which holds the current time, may be implemented in hardware. 204 is a keyboard and 208 is a pointing device, both operated by the operator. 205 is a disk containing the document database 101. 206 is a display used to show documents; it may be a CRT or a liquid crystal display. 207 is a bus connecting the components. 209 is a floppy disk.

[0015] FIG. 3 shows in more detail the configuration of the control procedure 300 in the control memory 201 of FIG. 2. The control procedure includes the matching degree calculation unit 104, the elapsed time calculation unit 106, the threshold calculation unit 107, the document selection unit 108, and the document display unit 110. FIG. 4 is a flowchart of the operating procedure corresponding to the processing units shown in FIG. 3. The operation of one embodiment of the present invention is described below with reference to FIG. 4. In this embodiment, the well-known vector space model is used to represent document features.

[0016] In the vector space model, N keywords are prepared to express document features, and a weight is set for each keyword in each document. The weights can be regarded as a vector in an N-dimensional space, which is normalized to length 1. The matching degree between two document features is expressed as the inner product of their vectors. First, in step S301, the matching degree calculation unit 104 calculates the matching degree between the document feature 102-2 held in the document holding unit 102 and the selection feature held in the selection feature holding unit 103. As described above, the matching degree is the inner product of the vectors representing the features. If the document feature is the vector d and the selection feature is the vector s, the matching degree is therefore (d · s).
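Step S301 can be sketched as follows. The sketch assumes, although the patent does not say so, that a feature vector is stored sparsely as a mapping from keyword to weight. Under that assumption, a keyword missing from a document implicitly has weight 0. The function names are hypothetical.

```python
import math


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale a keyword-weight vector to Euclidean length 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return dict(weights)  # an all-zero vector cannot be normalized
    return {k: w / norm for k, w in weights.items()}


def matching_degree(d: dict[str, float], s: dict[str, float]) -> float:
    """Inner product (d . s) of the two normalized vectors (step S301)."""
    d, s = normalize(d), normalize(s)
    # Keywords absent from either vector have weight 0 and add nothing.
    return sum(w * s[k] for k, w in d.items() if k in s)
```

For example, d = {"printer": 3, "laser": 4} normalizes to (0.6, 0.8). Its matching degree with s = {"printer": 1} is therefore 0.6. The keywords are illustrative.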

[0017] In step S302, the elapsed time calculation unit 106 calculates the time elapsed since the document arrived. This is the difference between the current time held in the time holding unit 105 and the document arrival time 102-3 held in the document holding unit 102. In step S303, the threshold calculation unit 107 calculates a threshold from the elapsed time computed in step S302. The threshold is determined by the elapsed time t and may be defined in any way, provided it is a function f(t) that increases as t increases. For example, t may be expressed in days (fractions discarded), with f(t) = 1 - 1/(t + 2).
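Steps S302 and S303 with the example threshold can be sketched as below. The sketch assumes that arrival and current times are Python `datetime` values; the function names are hypothetical. For non-negative elapsed times, `timedelta.days` discards the fractional day, which matches the "fractions discarded" rule above.

```python
from datetime import datetime


def elapsed_days(now: datetime, arrival: datetime) -> int:
    """Step S302: whole days since arrival, fractions discarded."""
    return (now - arrival).days


def threshold(t: int) -> float:
    """Step S303: example threshold f(t) = 1 - 1/(t + 2), increasing in t."""
    return 1.0 - 1.0 / (t + 2)
```

A document that arrived 1 day 18 hours ago has t = 1 and threshold f(1) = 2/3.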

[0018] In step S304, the matching degree calculated in step S301 is compared with the threshold calculated in step S303. If the matching degree does not exceed the threshold, processing ends. If it does exceed the threshold, processing proceeds to step S305. There, the document selection unit 108 selects the document and stores it in the selected document holding unit 109, and processing then ends.
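Below is a minimal sketch of the comparison in steps S304 and S305. It assumes that the matching degree p and the elapsed days t have already been computed. The selected document holding unit 109 is modelled as a plain list of document IDs, which is a hypothetical stand-in. The comparison is strict because a document is selected only when its matching degree exceeds the threshold.

```python
def threshold(t: int) -> float:
    """Example threshold from step S303: f(t) = 1 - 1/(t + 2)."""
    return 1.0 - 1.0 / (t + 2)


def filter_step(doc_id: str, p: float, t: int, selected: list[str]) -> bool:
    """Steps S304-S305 for one document.

    p is the matching degree (S301) and t the elapsed days (S302).
    """
    if p > threshold(t):         # S304: does the matching degree exceed f(t)?
        selected.append(doc_id)  # S305: keep it in the selected-document store
        return True
    return False                 # otherwise end without selecting
```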

[0019] For example, suppose the inner product p of a document's feature vector and the selection feature vector is 0.78. Two days after the document arrives, substituting into f(t) = 1 - 1/(t + 2) gives f(2) = 0.75. Since p > f(2), the document is selected. After three days, f(3) = 0.8 and p < f(3), so the document is no longer selected.
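The threshold f(t) = 1 - 1/(t + 2) rises toward 1. A document with a fixed matching degree p < 1 therefore stays selected only while t < 1/(1 - p) - 2. The hypothetical helper below is an illustration, not part of the patent. It finds the last whole day on which the document is still selected by stepping t forward. For p = 0.78 the bound is about 2.55, so the helper returns 2, which agrees with the example above.

```python
import math


def last_selected_day(p: float) -> float:
    """Largest whole day t with p > 1 - 1/(t + 2).

    Returns -1 if the document is never selected (p <= f(0) = 0.5) and
    math.inf if it is always selected (p >= 1, since f(t) < 1 for all t).
    """
    if p >= 1.0:
        return math.inf
    t = -1
    # f(t) approaches 1, so for p < 1 this loop always terminates.
    while p > 1.0 - 1.0 / ((t + 1) + 2):
        t += 1
    return t
```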

[0020] Next, the document display unit 110 displays the selected documents. In the above embodiment, both the matching degree and the threshold are calculated when documents are presented to the user. Alternatively, the matching degree may be calculated when a document arrives and its value saved. Only the threshold then needs to be calculated when documents are presented, which shortens the processing time for presenting documents to the user.
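The variant in [0020] can be sketched as a small cache. The inner product is computed once in `on_arrival`, and `present` evaluates only the elapsed-time threshold. Several details are illustrative choices rather than details from the patent. These are the class and method names, the example threshold f(t) = 1 - 1/(t + 2) with t in whole days, and the assumption that both feature vectors are already normalized.

```python
from datetime import datetime


class CachedFilter:
    """Matching degree computed on arrival; threshold computed on display."""

    def __init__(self, selection_feature: dict[str, float]) -> None:
        # Assumed to be already normalized to length 1.
        self.selection_feature = selection_feature
        self.store: dict[str, tuple[float, datetime]] = {}

    def on_arrival(self, doc_id: str, feature: dict[str, float],
                   arrival: datetime) -> None:
        # Inner product with the selection feature, saved for later use.
        p = sum(w * self.selection_feature.get(k, 0.0)
                for k, w in feature.items())
        self.store[doc_id] = (p, arrival)

    def present(self, now: datetime) -> list[str]:
        # Only the cheap threshold f(t) is evaluated at presentation time.
        return [doc_id for doc_id, (p, arrival) in self.store.items()
                if p > 1.0 - 1.0 / ((now - arrival).days + 2)]
```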

[0021] The above embodiment uses the vector space model for document features, but the invention is not limited to this representation. It suffices that a matching degree between the document feature and the selection feature can be calculated and that a threshold for it can be defined as a function of the elapsed time t. Likewise, the embodiment expresses the elapsed time in days when calculating the threshold, but any unit, such as seconds, minutes, or hours, may be used. The threshold function is also not limited to the one given above.
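[0021] requires only a computable matching degree and a threshold defined as a function of elapsed time. The hypothetical sketch below captures that generality by taking both as parameters, together with the time unit. Two parts of the example do not come from the patent. The Jaccard keyword-set overlap is a swapped-in alternative to the vector space model. The hourly threshold is just the day-based example f(t) rescaled to hours.

```python
from datetime import datetime, timedelta
from typing import Any, Callable


def make_filter(match: Callable[[Any, Any], float],
                threshold: Callable[[int], float],
                unit: timedelta) -> Callable[[Any, Any, datetime, datetime], bool]:
    """Build a filter from any matching-degree function and any threshold
    function of the elapsed time t, measured in whole multiples of `unit`."""
    def keep(doc_feature: Any, selection_feature: Any,
             arrival: datetime, now: datetime) -> bool:
        t = (now - arrival) // unit  # elapsed whole units, fractions discarded
        return match(doc_feature, selection_feature) > threshold(t)
    return keep


def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword-set overlap, used here as an alternative matching degree."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Threshold over elapsed hours h: the day-based example with t = h / 24.
hourly = make_filter(jaccard, lambda h: 1.0 - 1.0 / (h / 24 + 2),
                     timedelta(hours=1))
```

Consider a document overlapping the selection keywords with Jaccard value 2/3. It passes after 12 hours, when the threshold is 0.6. It fails after 36 hours, when the threshold is about 0.714.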

[0022] The above embodiment prepares a threshold determined as a function of the elapsed time t and compares it with the matching degree. Alternatively, the document selection score may be defined as the sum of two parts: the matching degree with the selection feature and a score determined by the elapsed time t. The score itself then reflects not only the matching degree of the document features but also the elapsed time, and it should be defined to decrease as t increases. Documents can then be selected by applying a fixed threshold to this score.
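Below is a hypothetical sketch of the score variant in [0022]. The patent requires only that the time-dependent term decrease as t grows. The linear term g(t) = -0.05 t and the fixed threshold 0.6 are therefore illustrative assumptions.

```python
def score(p: float, t: float, decay: float = 0.05) -> float:
    """Matching degree p plus a time term g(t) = -decay * t.

    The linear g is an assumed example; any function that decreases
    with the elapsed time t would fit the description.
    """
    return p + (-decay * t)


def keep_by_score(p: float, t: float, fixed_threshold: float = 0.6) -> bool:
    """Present the document when its score exceeds a constant threshold."""
    return score(p, t) > fixed_threshold
```

With p = 0.78, the score is 0.68 after 2 days, so the document is kept. After 4 days the score is 0.58, so the document is dropped.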

[0023] The above embodiment does not take the number of accumulated documents into account. However, the number of documents may also be reflected when calculating the threshold, so that the threshold rises as the number of documents grows. In this way, a consistent number of documents can be presented to the user even when many documents have accumulated. In addition, the embodiment uses the time elapsed since a document's arrival. When the creation time is known, the time elapsed since creation may be used instead, or the creation and reception times may be used together.
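[0023] leaves the form of the count-dependent threshold open. The sketch below assumes one possible form, f(t, n) = 1 - 1/((t + 2)(1 + n/n0)). This form reduces to the example f(t) when n = 0 and rises toward 1 as more documents accumulate. The scale n0 = 50 is an arbitrary illustrative value. The sketch also measures elapsed time from the creation time when that time is known. The patent does not specify how creation and reception times would be combined, so that case is not modelled.

```python
from datetime import datetime
from typing import Optional


def count_aware_threshold(t: int, n_docs: int, n0: int = 50) -> float:
    """Assumed f(t, n): equals 1 - 1/(t + 2) when n_docs is 0 and rises
    toward 1 as the number of accumulated documents grows."""
    return 1.0 - 1.0 / ((t + 2) * (1 + n_docs / n0))


def reference_time(arrival: datetime, created: Optional[datetime]) -> datetime:
    """Measure elapsed time from creation when known, else from arrival."""
    return created if created is not None else arrival
```

With t = 2, the threshold is 0.75 when no documents have accumulated and 0.875 when 50 have.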

[0024] The present invention may be applied to a system consisting of a plurality of devices or to an apparatus consisting of a single device. It may also be applied where the invention is achieved by supplying a program to a system or apparatus.

[0025]

Effects of the Invention: As described above, the present invention performs document filtering that treats newer information as more important. The user can therefore make efficient use of information in a limited amount of time.

[Brief description of drawings]

FIG. 1 is a block diagram showing the logical configuration of the processing of the document processing apparatus of this embodiment.

FIG. 2 is a diagram showing the hardware configuration of the document processing apparatus of this embodiment.

FIG. 3 is a diagram showing the configuration of the control procedure of this embodiment.

FIG. 4 is a flowchart showing the operating procedure of this embodiment.

[Explanation of symbols]

101 Document database
102 Document holding unit
102-1 Document to be processed
102-2 Document feature
102-3 Document arrival time
103 Selection feature holding unit
104 Matching degree calculation unit
105 Time holding unit
106 Elapsed time calculation unit
107 Threshold calculation unit
108 Document selection unit
109 Selected document holding unit
110 Document display unit
201 Control memory
202 Central processing unit
203 Memory
204 Keyboard
205 Disk
206 Display
207 Bus
208 Pointing device
209 Floppy disk

Continuation of the front page
(51) Int.Cl.⁶; identification code; internal reference number; FI; technical indication: 9194-5L 15/40 310 F
(72) Inventor: Shogo Shibata, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo
(72) Inventor: Yuji Ikeda, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo
(72) Inventor: Minoru Fujita, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo

Claims


1. A document filtering method for filtering and presenting received documents, characterized by selecting a document to be presented on the basis of a preset identifier and the elapsed time of the received document, and presenting the selected document.

2. The document filtering method according to claim 1, wherein selecting the document to be presented includes a step of calculating a matching degree between an identifier of a received document and the preset identifier, and a step of determining whether to present the received document on the basis of a comparison between the matching degree and a threshold set according to the elapsed time.

3. The document filtering method according to claim 2, wherein the threshold value is further set according to the number of documents.

4. The document filtering method according to claim 1, wherein selecting the document to be presented includes a step of calculating a matching degree between an identifier of a received document and the preset identifier, a step of giving the received document a score according to the matching degree and the elapsed time, and a step of determining whether to present the received document on the basis of a comparison between the score and a predetermined threshold.

5. The document filtering method according to claim 2, wherein the identifier is represented by a feature vector, and the matching degree is represented by the inner product of the feature vector of a received document and the preset feature vector.

6. The document filtering method according to claim 5, wherein the feature vector includes a plurality of keywords or sentences.

7. The document filtering method according to claim 1, wherein the elapsed time is measured from the creation of the received document, from its reception, or from both.

8. A document processing apparatus for filtering and presenting received documents, comprising: document holding means for holding a received document together with its elapsed time since creation and/or its elapsed time since reception; selecting means for selecting a document to be presented on the basis of a preset identifier and the elapsed time of the received document; and presenting means for presenting the selected document.

9. The document processing apparatus according to claim 8, wherein the selecting means comprises matching degree calculating means for calculating a matching degree between an identifier of a received document and the preset identifier, threshold setting means for setting a threshold corresponding to the elapsed time, and determining means for determining whether to present the received document on the basis of a comparison between the threshold and the matching degree.

10. The document processing apparatus according to claim 9, wherein the threshold setting unit further sets a threshold according to the number of documents.

11. The document processing apparatus according to claim 8, wherein the selecting means comprises matching degree calculating means for calculating a matching degree between an identifier of a received document and the preset identifier, score calculating means for giving the received document a score according to the matching degree and the elapsed time, and determining means for determining whether to present the received document on the basis of a comparison between the score and a predetermined threshold.

12. The document processing apparatus, wherein the identifier is represented by a feature vector, and the matching degree is represented by the inner product of the feature vector of a received document and the preset feature vector.

13. The document processing apparatus according to claim 12, wherein the feature vector includes a plurality of keywords or sentences.