JP2004234516A

JP2004234516A - Document retrieval device

Info

Publication number: JP2004234516A
Application number: JP2003024524A
Authority: JP
Inventors: Takuka Tan; 澤華譚
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-01-31
Filing date: 2003-01-31
Publication date: 2004-08-19
Anticipated expiration: 2023-01-31
Also published as: JP4309144B2

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieval device that enables a user to retrieve a desired document without performing any special setting operation, even if the user's preferences change over time.
SOLUTION: The creators and viewers of the various documents stored in a Web server 3 are compared with the search user to determine weighting coefficients for the documents. For each document stored in the Web server 3, the number of words and phrases that a collation part 18 has recognized as matching is multiplied by the weighting coefficient to calculate the document's evaluation point, and the documents are presented in descending order of evaluation point.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は、キーワードに合致するドキュメントを検索するドキュメント検索装置に関するものである。
【０００２】
【従来の技術】
ドキュメント検索装置は、膨大なドキュメントの中から利用者が必要としているドキュメントを検索するものであるが、従来のドキュメント検索装置は、例えば、予め利用者の興味のある情報をプロファイルし、そのプロファイルを参酌してユーザの興味のあるドキュメントを検索するようにしている（以下の特許文献１を参照）。
【０００３】
【特許文献１】
特開２０００−９９５２５公報（段落番号［００３４］から［００５７］、図２）
【０００４】
【発明が解決しようとする課題】
従来のドキュメント検索装置は以上のように構成されているので、プロファイルを参酌すれば、ユーザの興味のあるドキュメントを検索することができるが、時間の経過に伴ってユーザの業務や嗜好が変化すると、ユーザが自らプロファイルを更新しなければ、興味のあるドキュメントを検索することができなくなるなどの課題があった。
【０００５】
この発明は上記のような課題を解決するためになされたもので、時間の経過に伴ってユーザの嗜好等が変化しても、ユーザが特別な設定操作等を行うことなく、所望のドキュメントを検索することができるドキュメント検索装置を得ることを目的とする。
【０００６】
【課題を解決するための手段】
この発明に係るドキュメント検索装置は、サーバに保存されている各種のドキュメントの作成者及び閲覧者と検索利用者を比較して、各種のドキュメントに係る加重係数を決定する一方、サーバに保存されている各種のドキュメント毎に、照合手段により一致が認定された語句の個数と当該加重係数を乗算してドキュメントの評価点を計算し、評価点が高いドキュメントから順番に提示するようにしたものである。
【０００７】
【発明の実施の形態】
以下、この発明の実施の一形態を説明する。
実施の形態１．
図１はこの発明の実施の形態１によるドキュメント検索装置を示す構成図であり、図において、利用者端末１は例えばインターネット４などの通信回線に接続され、ドキュメントの検索を依頼する際にキーワードをドキュメント検索装置５に送信する。なお、利用者端末１としては例えばパソコン，ＰＤＡ，携帯電話等が該当する。ＬＤＡＰ（ＬｉｇｈｔｗｅｉｇｈｔＤｉｒｅｃｔｏｒｙＡｃｃｅｓｓＰｒｏｔｏｃｏｌ）サーバ２は各利用者の個人属性情報（例えば、利用者の会社名、部門名、業種、職務、性別、氏名、就業時間（曜日、時間）、休日情報）が記録され、各利用者の個人属性情報をドキュメント検索装置５に送信する。Ｗｅｂサーバ３は各種のドキュメントを保存するとともに、ドキュメントの作成者に関する情報やドキュメントの閲覧者に関する情報を記憶している。
【０００８】
ドキュメント検索装置５の情報収集部１１はＬＤＡＰサーバ２に記録されている個人属性情報を収集するとともに、Ｗｅｂサーバ３に記憶されているドキュメントの作成者や閲覧者に関する情報等を収集し、また、ある利用者端末１から送信されたキーワードを受信する。個人情報記録部１２は情報収集部１１により収集された個人属性情報を記録し、閲覧ドキュメント記録部１３は情報収集部１１により収集された閲覧者に関する情報を記録し、作成ドキュメント記録部１４は情報収集部１１により収集された作成者に関する情報を記録する。
なお、情報収集部１１はキーワード受信手段を構成し、また、情報収集部１１と閲覧ドキュメント記録部１３と作成ドキュメント記録部１４から記録手段が構成されている。
【０００９】
加重係数決定部１５は作成ドキュメント記録部１４に記憶されているドキュメントの作成者と検索利用者を比較して、各種のドキュメントに係る作成係数（加重係数）を決定するとともに、閲覧ドキュメント記録部１３に記憶されているドキュメントの閲覧者と検索利用者を比較して、各種のドキュメントに係る閲覧係数（加重係数）を決定する。なお、加重係数決定部１５は係数決定手段を構成している。
ドキュメント特徴分析部１６は情報収集部１１がＷｅｂサーバ３に保存されている各種のドキュメントを収集すると、各種のドキュメントの段落毎に特徴分析を実施して、各段落に存在する語句を抽出する。ドキュメント特徴記憶部１７はドキュメント特徴分析部１６により抽出された各段落に存在する語句を記録している。照合部１８はドキュメント特徴記憶部１７に記録されている語句と情報収集部１１により収集されたキーワードを照合する。なお、ドキュメント特徴分析部１６、ドキュメント特徴記憶部１７及び照合部１８から照合手段が構成されている。
【００１０】
ドキュメント評価部１９はＷｅｂサーバ３に保存されている各種のドキュメント毎に、照合部１８により一致が認定された語句の個数と加重係数決定部１５により決定された作成係数を乗算するとともに、その語句の個数と閲覧係数を乗算し、双方の乗算結果の合計値をドキュメントの評価点として求める。なお、ドキュメント評価部１９は評価手段を構成している。
検索結果提示部２０はドキュメント評価部１９により計算された評価点が高いドキュメントから順番に利用者端末１に提示する。なお、検索結果提示部２０は検索結果提示手段を構成している。
制御部２１はドキュメント検索装置５を構成する各部の動作を制御する。
【００１１】
次に動作について説明する。
利用者がＷｅｂサーバ３に保存されているドキュメントの閲覧等を希望する場合、予め、自己の個人属性情報（例えば、利用者の会社名、部門名、業種、職務、性別、氏名、就業時間（曜日、時間）、休日情報）をＬＤＡＰサーバ２に登録する必要がある。
よって、利用者は、Ｗｅｂサーバ３に保存されているドキュメントの閲覧等を希望する場合、自己の利用者端末１を操作して、自己の個人属性情報をＬＤＡＰサーバ２に登録する。なお、図２はＬＤＡＰサーバ２に登録されている各利用者の個人属性情報である。
ＬＤＡＰサーバ２は、一定期間毎に、登録している各利用者の個人属性情報をドキュメント検索装置５に送信する。あるいは、新たに利用者の個人属性情報が登録されたとき、その個人属性情報をドキュメント検索装置５に送信する。
【００１２】
ドキュメント検索装置５の情報収集部１１は、ＬＤＡＰサーバ２から個人属性情報が送信されると、その個人属性情報を収集して個人情報記録部１２に記録する。
また、情報収集部１１は、Ｗｅｂサーバ３にはドキュメントのアクセス履歴（ドキュメントの作成者や閲覧者に関する情報）が残されているので、制御部２１の指示の下、一定期間毎に、そのアクセス履歴を収集する。そして、そのアクセス履歴の中からドキュメントの閲覧者に関する情報を抽出して、その閲覧者に関する情報を閲覧ドキュメント記録部１３に記録する（図３を参照）。また、そのアクセス履歴の中からドキュメントの作成者に関する情報を抽出して、その作成者に関する情報を作成ドキュメント記録部１４に記録する（図４を参照）。
【００１３】
さらに、情報収集部１１は、制御部２１の指示の下、Ｗｅｂサーバ３に保存されている各種のドキュメントを収集し、各種のドキュメントをドキュメント特徴分析部１６に出力する。
ドキュメント特徴分析部１６は、情報収集部１１からＷｅｂサーバ３に保存されている各種のドキュメントを受けると、各種のドキュメントの段落毎に特徴分析を実施して、各段落に存在する語句を抽出し、その語句をドキュメント特徴記憶部１７に記録する。
図５はドキュメント特徴記憶部１７の記録内容を示す一例であるが、図５の例では、識別番号“０００１”のドキュメントは、ｈｔｔｐ：／／ａａ．ｂｂ．ｃｃのアドレスに格納され、そのドキュメントの１番目の段落には、“企業”、“効率”、“経営”、“戦略”等の語句が存在することを示している。
【００１４】
例えば、利用者Ａがドキュメントの検索を依頼する場合、利用端末１を操作して、検索に利用するキーワード（例えば、経営、期間、効率、生産）を入力すると、利用端末１が利用者Ａの識別番号“Ａ”と当該キーワードをドキュメント検索装置５に送信する。なお、同一の利用者でも、就業時間内のアクセス傾向と、就業時間外のアクセス傾向とが異なる場合が多いので、この例では、説明の便宜上、就業時間内のアクセス傾向に基づいてドキュメントを検索するものとする。したがって、この場合、利用端末１は付加的な検索条件（就業時間内のアクセスに限定する条件）をドキュメント検索装置５に送信する。
【００１５】
ドキュメント検索装置５の照合部１８は、情報収集部１１が利用端末１から送信された利用者Ａの識別番号“Ａ”とキーワードと付加的な検索条件を受信すると、制御部２１の指示の下、ドキュメント特徴記憶部１７に記録されている語句と当該キーワードを照合する。
例えば、キーワードが“経営”、“期間”、“効率”、“生産”である場合、図６に示すように、識別番号“０００１”のドキュメントには、それらのキーワードと一致する語句の個数が５個あるので（１番目の段落では、“効率”、“経営”が一致、２番目の段落では、“期間”、“効率”、“生産”が一致）、識別番号“０００１”のドキュメントの照合点は“５”になる。
図６の例では、識別番号“０００２”のドキュメントの照合点は“３”、識別番号“０００３”のドキュメントの照合点は“２”、識別番号“０００４”のドキュメントの照合点は“３”になる。
【００１６】
なお、照合部１８は、語句とキーワードが完全に一致していない場合でも、類似関係がある場合には一致を認定するようにしてもよい。例えば、キーワードが“生産”で、語句が“製造”である場合、両者は意味的に略同一であるので、一致を認定するようようにする。この場合、照合部１８は、曖昧検索に用いる辞書等を保持するようにすればよい。
【００１７】
加重係数決定部１５は、制御部２１の指示の下、情報収集部１１により収集された利用者Ａの識別番号“Ａ”をキーにして、閲覧ドキュメント記録部１３から利用者Ａが就業時間内に閲覧したことがあるドキュメントを検索する。図３の例では、利用者Ａが就業時間内に閲覧したことがあるドキュメントは識別番号が“０００３”のドキュメントのみであり、識別番号が“０００１”，“０００２”，“０００４”のドキュメントは利用者Ａに閲覧されていないと判断される。
加重係数決定部１５は、利用者Ａが就業時間内に閲覧したことがあるドキュメントには、図６に示すように、閲覧係数として“３”を与え、利用者Ａが就業時間内に閲覧したことがないドキュメントには、閲覧係数として“１”を与えるようにする。ただし、ここでは閲覧係数として“３”又は“１”を与えているが、これに限るものではなく、例えば、“５”又は“２”を与えようにしてもよい。
なお、利用者Ａの就業時間は、個人情報記録部１２に記録されている個人属性情報（図２を参照）から得ることができる。
【００１８】
また、加重係数決定部１５は、制御部２１の指示の下、情報収集部１１により収集された利用者Ａの識別番号“Ａ”をキーにして、作成ドキュメント記録部１４から利用者Ａが就業時間内に作成したドキュメントを検索する。図４の例では、利用者Ａが就業時間内に作成したドキュメントは識別番号が“０００１”のドキュメントのみであり、識別番号が“０００２”，“０００３”，“０００４”のドキュメントは利用者Ａに作成されていないと判断される。
加重係数決定部１５は、利用者Ａが就業時間内に作成したドキュメントには、図６に示すように、作成係数として“５”を与え、利用者Ａが就業時間内に作成していないドキュメントには、作成係数として“１”を与えるようにする。ただし、ここでは作成係数として“５”又は“１”を与えているが、これに限るものではなく、例えば、“７”又は“３”を与えようにしてもよい。
【００１９】
ドキュメント評価部１９は、上記のようにして、照合部１８が各種のドキュメントの照合点を求め、加重係数決定部１５が各種のドキュメントの加重係数（閲覧係数、作成係数）を決定すると、各種のドキュメント毎に、照合部１８により求められた照合点と加重係数決定部１５により決定された作成係数を乗算するとともに、その照合点と閲覧係数を乗算し、双方の乗算結果の合計値をドキュメントの評価点として求める。
識別番号“０００１”のドキュメントの評価点＝５×５＋５×１＝３０
識別番号“０００２”のドキュメントの評価点＝３×１＋３×１＝６
識別番号“０００３”のドキュメントの評価点＝２×１＋２×３＝８
識別番号“０００４”のドキュメントの評価点＝３×１＋３×１＝６
【００２０】
検索結果提示部２０は、ドキュメント評価部１９が各種のドキュメントの評価点を求めると、評価点が高いドキュメントから順番に並べた一覧表（例えば、評価点が高いドキュメントほど上部に配置し、評価点が最も低いドキュメントを最下部に配置する）等を作成し、その一覧表等を利用者端末１に送信する。
これにより、利用者端末１のディスプレイには、評価点が高いドキュメントから順番に並べられた一覧表等が表示されるので、例えば、一番上に配置されたドキュメントを選択すれば、その選択情報がＷｅｂサーバ３に送信されることにより、そのドキュメントの閲覧が可能になる。
【００２１】
なお、この実施の形態１では、利用者Ａが検索を依頼する場合について示したが、例えば、利用者Ｃが同じキーワードで検索を依頼した場合、照合部１８により求められる照合点は変わらないが、図７に示すように、加重係数決定部１５により決定される加重係数（閲覧係数、作成係数）が変わるため、ドキュメント評価部１９により計算される評価点も変わる。
【００２２】
以上で明らかなように、この実施の形態１によれば、Ｗｅｂサーバ３に保存されている各種のドキュメントの作成者及び閲覧者と検索利用者を比較して、各種のドキュメントに係る加重係数を決定する一方、Ｗｅｂサーバ３に保存されている各種のドキュメント毎に、照合部１８により一致が認定された語句の個数と当該加重係数を乗算してドキュメントの評価点を計算し、評価点が高いドキュメントから順番に提示するように構成したので、時間の経過に伴ってユーザの嗜好等が変化しても、ユーザが特別な設定操作等を行うことなく、所望のドキュメントを検索することができる効果を奏する。
即ち、利用者のプロファイルを参照して、所望のドキュメントを検索するのではなく、Ｗｅｂサーバ３に自動的に残されるドキュメントのアクセス履歴を参照して、所望のドキュメントを検索するようにしているので、利用者が嗜好の変化が起こる毎に自己のプロファイルを更新することなく、適切なドキュメントを検索することができる。
【００２３】
また、この実施の形態１によれば、語句とキーワードが完全に一致していない場合でも、類似関係がある場合には照合部１８が一致を認定するように構成したので、実用的な照合結果が得られる効果を奏する。
また、この実施の形態１によれば、各種のドキュメント毎に、照合部１８により求められた照合点と加重係数決定部１５により決定された作成係数を乗算するとともに、その照合点と閲覧係数を乗算し、双方の乗算結果の合計値をドキュメントの評価点として求めるように構成したので、検索利用者の興味に合うドキュメントを検索することができる効果を奏する。
【００２４】
さらに、この実施の形態１によれば、情報収集部１１がＷｅｂサーバ３に残されているドキュメントのアクセス履歴を一定期間毎に収集して、閲覧ドキュメント記録部１３及び作成ドキュメント記録部１４の記録内容（ドキュメントの閲覧者と作成者に関する情報）を更新するように構成したので、利用者の嗜好の変化を速やかに反映することができる効果を奏する。
なお、Ｗｅｂサーバ３に新規のドキュメントが登録されたとき、あるいは、Ｗｅｂサーバ３に保存されているドキュメントが閲覧されたとき、Ｗｅｂサーバ３がアクセス履歴をドキュメント検索装置５に送信するようにして、閲覧ドキュメント記録部１３及び作成ドキュメント記録部１４の記録内容（ドキュメントの閲覧者と作成者に関する情報）を更新するようにしてもよい。この場合、常に最新のアクセス状況を把握することができる。
【００２５】
実施の形態２．
図８はこの発明の実施の形態２によるドキュメント検索装置を示す構成図であり、図において、図１と同一符号は同一または相当部分を示すので説明を省略する。
加重係数決定部２２は図１の加重係数決定部１５と同様にして、各種のドキュメントに係る作成係数と閲覧係数を決定するとともに、検索利用者の個人属性情報と他の利用者の個人属性情報とを比較して、その検索利用者と他の利用者の類似度を把握し、その類似度に所定値を乗算して作成類似係数及び閲覧類似係数を求める。なお、加重係数決定部２２は係数決定手段を構成している。
ドキュメント評価部２３はＷｅｂサーバ３に保存されている各種のドキュメント毎に、照合部１８により求められた照合点と作成係数の乗算、その照合点と閲覧係数の乗算、その照合点と作成類似係数の乗算、その照合点と閲覧類似係数の乗算を実施し、各乗算結果の合計値をドキュメントの評価点として求める。なお、ドキュメント評価部２３は評価手段を構成している。
【００２６】
次に動作について説明する。
加重係数決定部２２は、図１の加重係数決定部１５と同様にして、各種のドキュメントに係る作成係数と閲覧係数を決定すると、検索利用者の個人属性情報と他の利用者の個人属性情報とを比較して、その検索利用者と他の利用者の類似度を把握する。
【００２７】
即ち、加重係数決定部２２は、利用者Ａがドキュメントの検索を依頼する場合、利用者Ａと利用者Ｂ，Ｃ，Ｄの個人属性情報を比較する（図２を参照）。
例えば、利用者Ａと利用者Ｂの個人属性情報（会社名、部門名、業種、職務、性別）を比較すると、“会社名”、“部門名”、“業種”、“職務”、“性別”の５項目が一致しているので、図９に示すように、利用者Ａと利用者Ｂの類似度が“５”であると判断する。
また、利用者Ａと利用者Ｃの個人属性情報を比較すると、“性別”の１項目のみが一致しているので、図９に示すように、利用者Ａと利用者Ｃの類似度が“１”であると判断する。
さらに、利用者Ａと利用者Ｄの個人属性情報を比較すると、全ての項目が不一致であるので、図９に示すように、類似度が“０”であると判断する。
なお、言うまでもないが、利用者Ａと利用者Ａの個人属性情報を比較した場合、全部の項目が一致するので、図９に示すように、利用者Ａと利用者Ａの類似度が“５”であると判断する。
【００２８】
加重係数決定部２２は、上記のようにして、利用者Ａと利用者Ｂ，Ｃ，Ｄの類似度を把握すると、次のようにして、作成類似係数と閲覧類似係数を求める。
まず、識別番号“０００１”のドキュメントは、図４に示すように、利用者Ａが作成者であるので、利用者Ａと利用者Ａの類似度である“５”に所定値“０．５”を乗算し、その乗算結果である“２．５”を作成類似係数とする（図１０を参照）。ここで、所定値“０．５”を乗算しているのは、利用者の類似度に基づく評価は上記実施の形態１における評価を補足するものであるので、加重を低くするためである。ただし、所定値は“０．５”に限るものではない。
【００２９】
また、識別番号“０００２”のドキュメントは、利用者Ｂが作成者であるので、利用者Ａと利用者Ｂの類似度である“５”に所定値“０．５”を乗算し、その乗算結果である“２．５”を作成類似係数とする。
また、識別番号“０００３”のドキュメントは、利用者Ｃが作成者であるので、利用者Ａと利用者Ｃの類似度である“１”に所定値“０．５”を乗算し、その乗算結果である“０．５”を作成類似係数とする。
さらに、識別番号“０００４”のドキュメントは、利用者Ｄが作成者であるので、利用者Ａと利用者Ｄの類似度である“０”に所定値“０．５”を乗算し、その乗算結果である“０”を作成類似係数とする。
【００３０】
次に、加重係数決定部２２は、識別番号“０００１”のドキュメントは、図３に示すように、就業時間内では利用者Ｃが閲覧者であるので、利用者Ａと利用者Ｃの類似度である“１”に所定値“０．３”を乗算し、その乗算結果である“０．３”を閲覧類似係数とする。ここで、所定値“０．３”を乗算しているのは、利用者の類似度に基づく評価は上記実施の形態１における評価を補足するものであるので、加重を低くするためである。ただし、所定値は“０．３”に限るものではない。
【００３１】
また、識別番号“０００２”のドキュメントは、就業時間内では利用者Ｂが閲覧者であるので、利用者Ａと利用者Ｂの類似度である“５”に所定値“０．３”を乗算し、その乗算結果である“１．５”を閲覧類似係数とする。
また、識別番号“０００３”のドキュメントは、就業時間内では利用者Ａが閲覧者であるので、利用者Ａと利用者Ａの類似度である“５”に所定値“０．３”を乗算し、その乗算結果である“１．５”を閲覧類似係数とする。
さらに、識別番号“０００４”のドキュメントは、就業時間内では利用者Ｄが閲覧者であるので、利用者Ａと利用者Ｄの類似度である“０”に所定値“０．３”を乗算し、その乗算結果である“０”を閲覧類似係数とする。
【００３２】
ドキュメント評価部２３は、加重係数決定部２２が各種のドキュメントの加重係数（閲覧係数、作成係数）と作成類似係数及び閲覧類似係数を決定すると、各種のドキュメント毎に、照合部１８により求められた照合点と作成係数の乗算、その照合点と閲覧係数の乗算、その照合点と作成類似係数の乗算、その照合点と閲覧類似係数の乗算を実施し、各乗算結果の合計値をドキュメントの評価点として求める（図１０を参照）。
識別番号“０００１”のドキュメントの評価点
＝５×５＋５×１＋５×２．５＋５×０．３＝４４
識別番号“０００２”のドキュメントの評価点
＝３×１＋３×１＋３×２．５＋３×１．５＝１８
識別番号“０００３”のドキュメントの評価点
＝２×１＋２×３＋２×０．５＋２×１．５＝１２
識別番号“０００４”のドキュメントの評価点
＝３×１＋３×１＋３×０＋３×０＝６
【００３３】
なお、この実施の形態２では、利用者Ａが検索を依頼する場合について示したが、例えば、利用者Ｃが同じキーワードで検索を依頼した場合、照合部１８により求められる照合点は変わらないが、図１１に示すように、利用者Ｃと利用者Ａ，Ｂ，Ｄの類似度が変わり、加重係数決定部２２により決定される加重係数（閲覧係数、作成係数）と作成類似係数及び閲覧類似係数が変わるため、ドキュメント評価部２２により計算される評価点も変わる。
【００３４】
以上で明らかなように、この実施の形態２によれば、各種のドキュメント毎に、照合部１８により求められた照合点と作成係数の乗算、その照合点と閲覧係数の乗算、その照合点と作成類似係数の乗算、その照合点と閲覧類似係数の乗算を実施し、各乗算結果の合計値をドキュメントの評価点として求めるように構成したので、検索利用者の業務等に合うドキュメントを検索することができる効果を奏する。
【００３５】
実施の形態３．
上記実施の形態１，２では、就業時間内のアクセス傾向に基づいてドキュメントを検索するものについて示したが、就業時間外のアクセス傾向に基づいてドキュメントを検索するようにしてもよい。図１２は就業時間外のアクセス傾向に基づいてドキュメントを検索する場合の評価点等を示している。
【００３６】
【発明の効果】
以上のように、この発明によれば、サーバに保存されている各種のドキュメントの作成者及び閲覧者と検索利用者を比較して、各種のドキュメントに係る加重係数を決定する一方、サーバに保存されている各種のドキュメント毎に、照合手段により一致が認定された語句の個数と当該加重係数を乗算してドキュメントの評価点を計算し、評価点が高いドキュメントから順番に提示するように構成したので、時間の経過に伴ってユーザの嗜好等が変化しても、ユーザが特別な設定操作等を行うことなく、所望のドキュメントを検索することができる効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１によるドキュメント検索装置を示す構成図である。
【図２】ＬＤＡＰサーバに登録されている各利用者の個人属性情報を示す説明図である。
【図３】閲覧ドキュメント記録部に記録されている閲覧者に関する情報を示す説明図である。
【図４】作成ドキュメント記録部に記録されている作成者に関する情報を示す説明図である。
【図５】各段落に存在する語句を示す説明図である。
【図６】ドキュメントの照合点や評価点等を示す説明図である。
【図７】ドキュメントの照合点や評価点等を示す説明図である。
【図８】この発明の実施の形態２によるドキュメント検索装置を示す構成図である。
【図９】利用者間の類似度を示す説明図である。
【図１０】ドキュメントの照合点や評価点等を示す説明図である。
【図１１】利用者間の類似度を示す説明図である。
【図１２】ドキュメントの照合点や評価点等を示す説明図である。
【符号の説明】
１利用者端末、２ＬＤＡＰサーバ、３Ｗｅｂサーバ、４インターネット、５ドキュメント検索装置、１１情報収集部（キーワード受信手段、記録手段）、１２個人情報記録部、１３閲覧ドキュメント記録部（記録手段）、１４作成ドキュメント記録部（記録手段）、１５加重係数決定部（係数決定手段）、１６ドキュメント特徴分析部（照合手段）、１７ドキュメント特徴記憶部（照合手段）、１８照合部（照合手段）、１９ドキュメント評価部（評価手段）、２０検索結果提示部（検索結果提示手段）、２１制御部、２２加重係数決定部（係数決定手段）、２３ドキュメント評価部（評価手段）。
[0001]
[Technical field of the invention]
The present invention relates to a document search device that searches for a document that matches a keyword.
[0002]
[Prior art]
A document search device searches an enormous number of documents for the documents a user needs. A conventional document search device, for example, profiles the information a user is interested in beforehand and refers to that profile to search for documents of interest to the user (see Patent Document 1 below).
[0003]
[Patent Document 1]
JP 2000-99525 A (paragraph numbers [0034] to [0057], FIG. 2)
[0004]
[Problems to be solved by the invention]
Because the conventional document search device is configured as described above, it can find documents of interest to the user by referring to the profile. However, when the user's work or preferences change over time, the device can no longer find documents of interest unless the user updates the profile personally.
[0005]
The present invention has been made to solve the above problem. Its object is to obtain a document search device that can find a desired document without the user performing any special setting operation, even if the user's preferences change over time.
[0006]
[Means for Solving the Problems]
A document search device according to the present invention compares the creators and viewers of the various documents stored in a server with the search user to determine weighting coefficients for the documents. For each document stored in the server, it calculates an evaluation score by multiplying the number of words and phrases that the matching means has recognized as matching by the weighting coefficient, and it presents the documents in descending order of evaluation score.
[0007]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described.
Embodiment 1.
FIG. 1 is a block diagram showing a document search device according to a first embodiment of the present invention. In the figure, a user terminal 1 is connected to a communication line such as the Internet 4 and transmits keywords to the document search device 5 when requesting a document search. The user terminal 1 is, for example, a personal computer, a PDA, or a mobile phone. An LDAP (Lightweight Directory Access Protocol) server 2 records the personal attribute information of each user (for example, the user's company name, department name, industry, job, gender, name, working hours (days of the week, times), and holiday information) and transmits that information to the document search device 5. The Web server 3 stores various documents, together with information on the creators and the viewers of those documents.
[0008]
The information collection unit 11 of the document search device 5 collects the personal attribute information recorded in the LDAP server 2, collects the information on document creators and viewers stored in the Web server 3, and receives keywords transmitted from a user terminal 1. The personal information recording unit 12 records the personal attribute information collected by the information collection unit 11, the browsed document recording unit 13 records the collected information on viewers, and the created document recording unit 14 records the collected information on creators.
The information collection unit 11 constitutes the keyword receiving means, and the information collection unit 11, the browsed document recording unit 13, and the created document recording unit 14 together constitute the recording means.
[0009]
The weighting coefficient determination unit 15 compares the creators of the documents recorded in the created document recording unit 14 with the search user to determine a creation coefficient (a weighting coefficient) for each document. It also compares the viewers of the documents recorded in the browsed document recording unit 13 with the search user to determine a browsing coefficient (a weighting coefficient) for each document. The weighting coefficient determination unit 15 constitutes the coefficient determination means.
When the information collection unit 11 collects the documents stored in the Web server 3, the document feature analysis unit 16 performs a feature analysis on each paragraph of each document and extracts the words and phrases present in that paragraph. The document feature storage unit 17 records the extracted words and phrases for each paragraph. The matching unit 18 compares the words and phrases recorded in the document feature storage unit 17 with the keywords received by the information collection unit 11. The document feature analysis unit 16, the document feature storage unit 17, and the matching unit 18 together constitute the matching means.
[0010]
For each document stored in the Web server 3, the document evaluation unit 19 multiplies the number of words and phrases that the matching unit 18 has recognized as matching by the creation coefficient determined by the weighting coefficient determination unit 15, multiplies the same number by the browsing coefficient, and takes the sum of the two products as the document's evaluation score. The document evaluation unit 19 constitutes the evaluation means.
The search result presentation unit 20 presents the documents to the user terminal 1 in descending order of the evaluation score calculated by the document evaluation unit 19. The search result presentation unit 20 constitutes the search result presentation means.
The control unit 21 controls the operation of each unit constituting the document search device 5.
[0011]
Next, the operation will be described.
A user who wishes to view documents stored in the Web server 3 must register his or her personal attribute information (for example, company name, department name, industry, job, gender, name, working hours (days of the week, times), and holiday information) in the LDAP server 2 beforehand.
The user therefore operates his or her user terminal 1 to register this personal attribute information in the LDAP server 2. FIG. 2 shows the personal attribute information of each user registered in the LDAP server 2.
The LDAP server 2 transmits the personal attribute information of each registered user to the document search device 5 at regular intervals. Alternatively, when the personal attribute information of the user is newly registered, the personal attribute information is transmitted to the document search device 5.
[0012]
When the personal attribute information is transmitted from the LDAP server 2, the information collecting unit 11 of the document search device 5 collects the personal attribute information and records it in the personal information recording unit 12.
Because the Web server 3 retains the document access history (information on the creators and viewers of documents), the information collection unit 11 collects that access history at regular intervals under the instruction of the control unit 21. It extracts the information on document viewers from the access history and records it in the browsed document recording unit 13 (see FIG. 3). It likewise extracts the information on document creators and records it in the created document recording unit 14 (see FIG. 4).
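The split of the access history into viewer and creator records can be sketched as follows. This is only an illustration: the record layout (document ID, user ID, action, working-hours flag) and the function name `split_history` are assumptions, since the source does not specify the format of the access history.

```python
from collections import defaultdict

# Hypothetical access-history records:
# (document id, user id, action, accessed during working hours)
access_history = [
    ("0001", "A", "create", True),
    ("0001", "C", "view", True),
    ("0003", "A", "view", True),
]


def split_history(history):
    """Split access records into viewer and creator tables keyed by document id."""
    viewers = defaultdict(list)
    creators = defaultdict(list)
    for doc_id, user_id, action, in_working_hours in history:
        target = creators if action == "create" else viewers
        target[doc_id].append((user_id, in_working_hours))
    return dict(viewers), dict(creators)


viewers, creators = split_history(access_history)
```

Here `viewers` plays the role of the browsed document recording unit 13 and `creators` that of the created document recording unit 14.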
[0013]
Further, the information collection unit 11 collects various documents stored in the Web server 3 under the instruction of the control unit 21 and outputs the various documents to the document feature analysis unit 16.
On receiving the documents stored in the Web server 3 from the information collection unit 11, the document feature analysis unit 16 performs a feature analysis on each paragraph of each document, extracts the words and phrases present in each paragraph, and records them in the document feature storage unit 17.
FIG. 5 shows an example of the contents recorded in the document feature storage unit 17. In this example, the document with the identification number “0001” is stored at the address http://aa.bb.cc, and its first paragraph contains words and phrases such as “company”, “efficiency”, “management”, and “strategy”.
[0014]
For example, when user A requests a document search, user A operates the user terminal 1 and enters the keywords to be used for the search (for example, management, period, efficiency, production). The user terminal 1 then transmits user A's identification number “A” and the keywords to the document search device 5. Because even the same user often shows different access tendencies during and outside working hours, this example assumes, for convenience of explanation, that documents are searched on the basis of the access tendency during working hours. In this case, the user terminal 1 therefore also transmits an additional search condition (a condition restricting the search to accesses made during working hours) to the document search device 5.
[0015]
When the information collection unit 11 receives user A's identification number “A”, the keywords, and the additional search condition transmitted from the user terminal 1, the matching unit 18 of the document search device 5, under the instruction of the control unit 21, compares the words and phrases recorded in the document feature storage unit 17 with the keywords.
For example, if the keywords are “management”, “period”, “efficiency”, and “production”, the document with the identification number “0001” contains five words and phrases that match those keywords, as shown in FIG. 6 (“efficiency” and “management” match in the first paragraph; “period”, “efficiency”, and “production” match in the second paragraph). The matching score of the document with the identification number “0001” is therefore “5”.
In the example of FIG. 6, the matching score is “3” for the document with the identification number “0002”, “2” for the document “0003”, and “3” for the document “0004”.
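The per-paragraph count described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `matching_score` is hypothetical, and the contents of the second paragraph are assumed to be exactly the three matching words, because the source lists only the matches.

```python
def matching_score(paragraphs, keywords):
    """Count the phrases that match a keyword, paragraph by paragraph."""
    wanted = set(keywords)
    return sum(len(wanted & set(phrases)) for phrases in paragraphs)


# Document "0001": paragraph 1 follows the FIG. 5 description;
# paragraph 2 is an assumed list containing only the matching words.
doc_0001 = [
    ["company", "efficiency", "management", "strategy"],
    ["period", "efficiency", "production"],
]
keywords = ["management", "period", "efficiency", "production"]

score = matching_score(doc_0001, keywords)  # 2 matches + 3 matches = 5
```

A keyword that appears in two paragraphs (here “efficiency”) is counted once per paragraph, which is how the example reaches a score of 5 from four keywords.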
[0016]
Even when a phrase and a keyword do not match exactly, the matching unit 18 may recognize a match if the two are similar. For example, if the keyword is “production” and the phrase is “manufacturing”, the two have substantially the same meaning, so a match is recognized. In this case, the matching unit 18 may hold a dictionary or the like for use in fuzzy search.
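One simple way to realize such a dictionary-based match is sketched below. The dictionary contents, the name `SYNONYMS`, and the function `is_match` are all assumptions for illustration; the source says only that a dictionary for fuzzy search may be held.

```python
# Hypothetical synonym dictionary: keyword -> phrases treated as equivalent.
SYNONYMS = {
    "production": {"manufacturing"},
}


def is_match(keyword, phrase, synonyms=SYNONYMS):
    """Return True on an exact match or when the phrase is a listed synonym."""
    return phrase == keyword or phrase in synonyms.get(keyword, set())
```

With this dictionary, `is_match("production", "manufacturing")` is recognized as a match, while unrelated pairs are still rejected.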
[0017]
Under the instruction of the control unit 21, the weighting coefficient determination unit 15 uses user A's identification number “A”, received by the information collection unit 11, as a key to search the browsed document recording unit 13 for documents that user A has viewed during working hours. In the example of FIG. 3, the only document that user A has viewed during working hours is the document with the identification number “0003”; the documents with the identification numbers “0001”, “0002”, and “0004” are judged not to have been viewed by user A.
As shown in FIG. 6, the weighting coefficient determination unit 15 gives a browsing coefficient of “3” to documents that user A has viewed during working hours and a browsing coefficient of “1” to documents that user A has not viewed during working hours. The values “3” and “1” are only examples; “5” and “2”, for instance, may be used instead.
The working hours of the user A can be obtained from personal attribute information (see FIG. 2) recorded in the personal information recording unit 12.
[0018]
Likewise, under the instruction of the control unit 21, the weighting coefficient determination unit 15 uses user A's identification number “A”, received by the information collection unit 11, as a key to search the created document recording unit 14 for documents that user A created during working hours. In the example of FIG. 4, the only document that user A created during working hours is the document with the identification number “0001”; the documents with the identification numbers “0002”, “0003”, and “0004” are judged not to have been created by user A.
As shown in FIG. 6, the weighting coefficient determination unit 15 gives a creation coefficient of “5” to documents that user A created during working hours and a creation coefficient of “1” to documents that user A did not create during working hours. The values “5” and “1” are only examples; “7” and “3”, for instance, may be used instead.
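The two coefficient rules can be sketched together as follows. The function name `weighting_coefficients` and its parameter names are hypothetical; the weights 5, 3, and 1 are the example values from the text and are exposed as parameters because the source states that other values may be used.

```python
def weighting_coefficients(doc_id, viewed_docs, created_docs,
                           created_weight=5, viewed_weight=3, default=1):
    """Return (creation coefficient, browsing coefficient) for one document."""
    creation = created_weight if doc_id in created_docs else default
    browsing = viewed_weight if doc_id in viewed_docs else default
    return creation, browsing


# User A during working hours, per the FIG. 3 and FIG. 4 examples.
viewed_by_a = {"0003"}
created_by_a = {"0001"}

coefficients = {
    doc_id: weighting_coefficients(doc_id, viewed_by_a, created_by_a)
    for doc_id in ["0001", "0002", "0003", "0004"]
}
```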
[0019]
Once the matching unit 18 has obtained the matching score of each document and the weighting coefficient determination unit 15 has determined the weighting coefficients (browsing coefficient and creation coefficient) of each document as described above, the document evaluation unit 19 multiplies, for each document, the matching score by the creation coefficient, multiplies the matching score by the browsing coefficient, and takes the sum of the two products as the document's evaluation score.
Evaluation score of document with identification number “0001” = 5 × 5 + 5 × 1 = 30
Evaluation score of document with identification number “0002” = 3 × 1 + 3 × 1 = 6
Evaluation score of document with identification number “0003” = 2 × 1 + 2 × 3 = 8
Evaluation score of document with identification number “0004” = 3 × 1 + 3 × 1 = 6
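The four calculations above follow one formula, score = matching score × creation coefficient + matching score × browsing coefficient. A short sketch that reproduces them is given below; the function name `evaluation_score` and the dictionary layout are assumptions, while the input values are those stated for FIG. 6.

```python
def evaluation_score(matching, creation, browsing):
    """Sum of the matching score weighted by each coefficient."""
    return matching * creation + matching * browsing


# (matching score, creation coefficient, browsing coefficient) for user A.
fig6_values = {
    "0001": (5, 5, 1),
    "0002": (3, 1, 1),
    "0003": (2, 1, 3),
    "0004": (3, 1, 1),
}

scores = {doc_id: evaluation_score(*values)
          for doc_id, values in fig6_values.items()}
```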
[0020]
When the document evaluation unit 19 has obtained the evaluation scores of the documents, the search result presentation unit 20 creates a list or the like in which the documents are arranged in descending order of evaluation score (for example, documents with higher evaluation scores are placed nearer the top, and the document with the lowest evaluation score is placed at the bottom) and transmits it to the user terminal 1.
As a result, the display of the user terminal 1 shows the list or the like with the documents arranged in descending order of evaluation score. If, for example, the user selects the document at the top, the selection information is transmitted to the Web server 3, and the document can then be viewed.
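Building the ordered list amounts to a descending sort on the evaluation score, as sketched below with the scores calculated for user A. How ties are ordered is not stated in the source; this sketch simply keeps tied documents in their original order, which is an assumption.

```python
# Evaluation scores for user A from the first embodiment.
scores = {"0001": 30, "0002": 6, "0003": 8, "0004": 6}

# Sort by score, highest first. Python's sort is stable, so documents
# with equal scores keep their original relative order (an assumption
# about tie handling, not something the source specifies).
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for rank, (doc_id, score) in enumerate(ranked, start=1):
    print(f"{rank}. document {doc_id} (evaluation score {score})")
```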
[0021]
The first embodiment has been described for the case where user A requests a search. If, for example, user C requests a search with the same keywords, the matching scores obtained by the matching unit 18 do not change, but, as shown in FIG. 7, the weighting coefficients (browsing coefficient and creation coefficient) determined by the weighting coefficient determination unit 15 change, so the evaluation scores calculated by the document evaluation unit 19 also change.
[0022]
As is clear from the above, according to the first embodiment, the creators and viewers of the documents stored in the Web server 3 are compared with the search user to determine weighting coefficients for the documents, the evaluation score of each document stored in the Web server 3 is calculated by multiplying the number of words and phrases that the matching unit 18 has recognized as matching by the weighting coefficient, and the documents are presented in descending order of evaluation score. Consequently, even if the user's preferences change over time, the user can find a desired document without performing any special setting operation.
That is, the desired document is searched for by referring not to the user's profile but to the document access history that is left automatically in the Web server 3, so the user can find appropriate documents without updating a profile every time his or her preferences change.
[0023]
Further, according to the first embodiment, the matching unit 18 is configured to recognize a match when a phrase and a keyword are similar even if they do not match exactly, so practical matching results are obtained.
Further, according to the first embodiment, for each document, the matching score obtained by the matching unit 18 is multiplied by the creation coefficient determined by the weighting coefficient determination unit 15, the matching score is multiplied by the browsing coefficient, and the sum of the two products is taken as the document's evaluation score, so documents that match the search user's interests can be found.
[0024]
Further, according to the first embodiment, the information collecting unit 11 collects the access history of the document left in the Web server 3 at regular intervals, and records the access history in the browsed document recording unit 13 and the created document recording unit 14. Since the content (information on the document viewer and the creator) is configured to be updated, there is an effect that the change in the user's preference can be promptly reflected.
When a new document is registered in the Web server 3 or when a document stored in the Web server 3 is browsed, the Web server 3 transmits an access history to the document search device 5, The recorded contents (information on the document viewer and the creator) of the browsed document recording unit 13 and the created document recording unit 14 may be updated. In this case, the latest access status can always be grasped.
[0025]
Embodiment 2.
FIG. 8 is a configuration diagram showing a document search apparatus according to Embodiment 2 of the present invention. In the figure, the same reference numerals as those in FIG. 1 denote the same or corresponding parts, and a description thereof will be omitted.
The weighting coefficient determination unit 22 determines the creation coefficient and the browsing coefficient for each document in the same manner as the weighting coefficient determination unit 15 in FIG. 1. It also compares the personal attribute information of the search user with that of the other users to obtain the similarity between the search user and each other user, and multiplies that similarity by a predetermined value to obtain a creation similarity coefficient and a browsing similarity coefficient. The weighting coefficient determination unit 22 constitutes the coefficient determination means.
For each document stored in the Web server 3, the document evaluation unit 23 multiplies the matching score obtained by the matching unit 18 by the creation coefficient, by the browsing coefficient, by the creation similarity coefficient, and by the browsing similarity coefficient, and takes the sum of the four products as the document's evaluation score. The document evaluation unit 23 constitutes the evaluation means.
[0026]
Next, the operation will be described.
After determining the creation coefficient and the browsing coefficient for each document in the same manner as the weighting coefficient determination unit 15 in FIG. 1, the weighting coefficient determination unit 22 compares the personal attribute information of the search user with that of the other users to obtain the similarity between the search user and each other user.
[0027]
That is, when the user A requests a document search, the weighting factor determination unit 22 compares the personal attribute information of the user A with the personal attribute information of the users B, C, and D (see FIG. 2).
For example, when the personal attribute information (company name, department name, business type, job, and gender) of user A and user B is compared, all five items (“company name”, “department name”, “business type”, “job”, and “gender”) match, so the similarity between user A and user B is determined to be “5”, as shown in FIG. 9.
When the personal attribute information of user A and user C is compared, only the one item “gender” matches, so the similarity between user A and user C is determined to be “1”, as shown in FIG. 9.
When the personal attribute information of user A and user D is compared, no items match, so the similarity is determined to be “0”, as shown in FIG. 9.
Needless to say, when the personal attribute information of user A is compared with itself, all items match, so the similarity between user A and user A is determined to be “5”, as shown in FIG. 9.
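The attribute comparison described above amounts to counting matching items. The following is a minimal sketch under stated assumptions: the attribute values are hypothetical placeholders (the actual values are those of FIG. 2, which are not reproduced here), chosen only so that the resulting similarities agree with the values “5”, “1”, and “0” given in the text. The names `ATTRIBUTES`, `USERS`, and `similarity` are illustrative.

```python
# Attribute items compared between users, as listed in the text.
ATTRIBUTES = ("company", "department", "business_type", "job", "gender")

# Hypothetical placeholder records (NOT the real contents of FIG. 2).
USERS = {
    "A": {"company": "X", "department": "D1", "business_type": "T1", "job": "J1", "gender": "g1"},
    "B": {"company": "X", "department": "D1", "business_type": "T1", "job": "J1", "gender": "g1"},
    "C": {"company": "Y", "department": "D2", "business_type": "T2", "job": "J2", "gender": "g1"},
    "D": {"company": "Z", "department": "D3", "business_type": "T3", "job": "J3", "gender": "g2"},
}


def similarity(a: dict, b: dict) -> int:
    """Return the number of attribute items on which two users agree."""
    return sum(1 for key in ATTRIBUTES if a[key] == b[key])


# Similarity of search user A to every user, including A itself.
similarity_to_a = {name: similarity(USERS["A"], attrs) for name, attrs in USERS.items()}
```

With these placeholder records, the similarity of user A to users A, B, C, and D is 5, 5, 1, and 0 respectively, matching the values the text attributes to FIG. 9.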
[0028]
When the weighting coefficient determination unit 22 grasps the similarity between the user A and the users B, C, and D as described above, the weighting coefficient determination unit 22 obtains the creation similarity coefficient and the browsing similarity coefficient as follows.
First, as shown in FIG. 4, user A is the creator of the document with the identification number “0001”, so the similarity “5” between user A and user A is multiplied by the predetermined value “0.5”, and the multiplication result “2.5” is used as the creation similarity coefficient (see FIG. 10). The similarity is multiplied by the predetermined value “0.5” because the evaluation based on user similarity only supplements the evaluation of the first embodiment and is therefore given a smaller weight. However, the predetermined value is not limited to “0.5”.
[0029]
Since the document with the identification number “0002” was created by user B, the similarity “5” between user A and user B is multiplied by the predetermined value “0.5”, and the multiplication result “2.5” is used as the creation similarity coefficient.
Since the document with the identification number “0003” was created by user C, the similarity “1” between user A and user C is multiplied by the predetermined value “0.5”, and the multiplication result “0.5” is used as the creation similarity coefficient.
Since the document with the identification number “0004” was created by user D, the similarity “0” between user A and user D is multiplied by the predetermined value “0.5”, and the multiplication result “0” is used as the creation similarity coefficient.
[0030]
Next, as shown in FIG. 3, user C is the viewer during working hours of the document with the identification number “0001”, so the weighting coefficient determination unit 22 multiplies the similarity “1” between user A and user C by the predetermined value “0.3” and sets the multiplication result “0.3” as the browsing similarity coefficient. The similarity is multiplied by the predetermined value “0.3” because the evaluation based on user similarity only supplements the evaluation of the first embodiment and is therefore given a smaller weight. However, the predetermined value is not limited to “0.3”.
[0031]
For the document with the identification number “0002”, user B is the viewer during working hours, so the similarity “5” between user A and user B is multiplied by the predetermined value “0.3”, and the multiplication result “1.5” is set as the browsing similarity coefficient.
For the document with the identification number “0003”, user A is the viewer during working hours, so the similarity “5” between user A and user A is multiplied by the predetermined value “0.3”, and the multiplication result “1.5” is set as the browsing similarity coefficient.
For the document with the identification number “0004”, user D is the viewer during working hours, so the similarity “0” between user A and user D is multiplied by the predetermined value “0.3”, and the multiplication result “0” is set as the browsing similarity coefficient.
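The derivation of the creation similarity coefficients and browsing similarity coefficients can be sketched as follows. The similarities, creators, working-hours viewers, and the predetermined values 0.5 and 0.3 are taken from the text; the function and variable names are illustrative assumptions. The example assumes exactly one creator and one working-hours viewer per document, as in the text, which does not state how several viewers would be combined.

```python
CREATION_FACTOR = 0.5   # predetermined value for creators (the text notes it may differ)
BROWSING_FACTOR = 0.3   # predetermined value for viewers (the text notes it may differ)

# Similarity between search user A and each user, as given in the text.
SIMILARITY_TO_A = {"A": 5, "B": 5, "C": 1, "D": 0}

# Creator and working-hours viewer of each document, as given in the text.
CREATOR = {"0001": "A", "0002": "B", "0003": "C", "0004": "D"}
VIEWER = {"0001": "C", "0002": "B", "0003": "A", "0004": "D"}


def similarity_coefficients(user_by_doc: dict, similarity: dict, factor: float) -> dict:
    """Scale the search user's similarity to each document's user by a fixed factor."""
    return {doc: similarity[user] * factor for doc, user in user_by_doc.items()}


creation_similarity = similarity_coefficients(CREATOR, SIMILARITY_TO_A, CREATION_FACTOR)
browsing_similarity = similarity_coefficients(VIEWER, SIMILARITY_TO_A, BROWSING_FACTOR)
```

Up to floating-point rounding, this yields creation similarity coefficients of 2.5, 2.5, 0.5, and 0 and browsing similarity coefficients of 0.3, 1.5, 1.5, and 0 for documents “0001” to “0004”, in agreement with the values in the text.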
[0032]
When the weighting coefficient determination unit 22 has determined the weighting coefficients (browsing coefficient and creation coefficient), the creation similarity coefficient, and the browsing similarity coefficient of the various documents, the document evaluation unit 23 multiplies, for each document, the collation point calculated by the collation unit 18 by the creation coefficient, by the browsing coefficient, by the creation similarity coefficient, and by the browsing similarity coefficient, and obtains the total of the multiplication results as the evaluation point (see FIG. 10).
Evaluation score of document with identification number "0001"
= 5 × 5 + 5 × 1 + 5 × 2.5 + 5 × 0.3 = 44
Evaluation score of document with identification number "0002"
= 3 × 1 + 3 × 1 + 3 × 2.5 + 3 × 1.5 = 18
Evaluation score of document with identification number "0003"
= 2 × 1 + 2 × 3 + 2 × 0.5 + 2 × 1.5 = 12
Evaluation score of document with identification number "0004"
= 3 × 1 + 3 × 1 + 3 × 0 + 3 × 0 = 6
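The four evaluation scores above, and the descending-order presentation that follows from them, can be checked with a short sketch. The collation points and coefficients are read directly from the four expressions above; the type `DocCoefficients` and the other names are illustrative assumptions.

```python
from typing import NamedTuple


class DocCoefficients(NamedTuple):
    collation: float      # collation point from the collation unit 18
    creation: float       # creation coefficient
    browsing: float       # browsing coefficient
    creation_sim: float   # creation similarity coefficient
    browsing_sim: float   # browsing similarity coefficient


# Values taken term by term from the worked expressions in the text.
DOCS = {
    "0001": DocCoefficients(5, 5, 1, 2.5, 0.3),
    "0002": DocCoefficients(3, 1, 1, 2.5, 1.5),
    "0003": DocCoefficients(2, 1, 3, 0.5, 1.5),
    "0004": DocCoefficients(3, 1, 1, 0.0, 0.0),
}


def evaluation_point(d: DocCoefficients) -> float:
    """Sum of the collation point multiplied by each of the four coefficients."""
    coefficients = (d.creation, d.browsing, d.creation_sim, d.browsing_sim)
    return sum(d.collation * c for c in coefficients)


scores = {doc_id: evaluation_point(d) for doc_id, d in DOCS.items()}

# Documents are presented in descending order of evaluation point.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Up to floating-point rounding, the scores are 44, 18, 12, and 6, so for search user A the documents would be presented in the order “0001”, “0002”, “0003”, “0004”.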
[0033]
The second embodiment has been described for the case where user A requests a search. When, for example, user C requests a search with the same keyword, the collation points obtained by the collation unit 18 do not change. However, as shown in FIG. 11, the similarity between user C and users A, B, and D changes, so the weighting coefficients (browsing coefficient and creation coefficient), the creation similarity coefficient, and the browsing similarity coefficient determined by the weighting coefficient determination unit 22 change, and the evaluation points calculated by the document evaluation unit 23 change accordingly.
[0034]
As is clear from the above, according to the second embodiment, for each document the collation point calculated by the collation unit 18 is multiplied by the creation coefficient, by the browsing coefficient, by the creation similarity coefficient, and by the browsing similarity coefficient, and the total of the multiplication results is obtained as the evaluation point of the document. This provides the effect that documents matching the work of the search user can be retrieved.
[0035]
Embodiment 3.
In the first and second embodiments, the case where the document is searched based on the access tendency during the working hours has been described. However, the document may be searched based on the access tendency outside the working hours. FIG. 12 shows evaluation points and the like when a document is searched based on an access tendency outside working hours.
[0036]
[Effect of the invention]
As described above, according to the present invention, the creators and viewers of the various documents stored in the server are compared with the search user to determine the weighting coefficients for the various documents, and, for each of the documents stored in the server, the evaluation point of the document is calculated by multiplying the number of words and phrases recognized as matching by the matching means by the weighting coefficient, and the documents are presented in descending order of evaluation point. Therefore, even if the user's preferences or the like change over time, the user can retrieve a desired document without performing any special setting operation.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing a document search device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing personal attribute information of each user registered in an LDAP server.
FIG. 3 is an explanatory diagram showing information on a viewer recorded in a browsed document recording unit.
FIG. 4 is an explanatory diagram showing information about a creator recorded in a created document recording unit.
FIG. 5 is an explanatory diagram showing words and phrases present in each paragraph.
FIG. 6 is an explanatory diagram showing collation points, evaluation points, and the like of a document.
FIG. 7 is an explanatory diagram showing collation points, evaluation points, and the like of a document.
FIG. 8 is a configuration diagram showing a document search device according to a second embodiment of the present invention.
FIG. 9 is an explanatory diagram showing the similarity between users.
FIG. 10 is an explanatory diagram showing collation points, evaluation points, and the like of a document.
FIG. 11 is an explanatory diagram showing similarity between users.
FIG. 12 is an explanatory diagram showing collation points, evaluation points, and the like of a document.
[Explanation of symbols]
1 user terminal, 2 LDAP server, 3 Web server, 4 Internet, 5 document search device, 11 information collection unit (keyword receiving means, recording means), 12 personal information recording unit, 13 browsed document recording unit (recording means), 14 created document recording unit (recording means), 15 weighting coefficient determination unit (coefficient determination means), 16 document characteristic analysis unit (collation means), 17 document characteristic storage unit (collation means), 18 collation unit (collation means), 19 document evaluation unit (evaluation means), 20 search result presentation unit (search result presentation means), 21 control unit, 22 weighting coefficient determination unit (coefficient determination means), 23 document evaluation unit (evaluation means).

Claims

1. A document search device comprising: recording means for recording the creators and viewers of various documents stored in a server; keyword receiving means for receiving a keyword transmitted from a user terminal of a search user; coefficient determining means for comparing the recorded contents of the recording means with the search user to determine weighting coefficients for the various documents; matching means for extracting words and phrases from the various documents stored in the server and matching the words and phrases against the keyword received by the keyword receiving means; evaluation means for calculating, for each document stored in the server, an evaluation point of the document by multiplying the number of words and phrases recognized as matching by the matching means by the weighting coefficient determined by the coefficient determining means; and search result presentation means for presenting the documents in descending order of the evaluation points calculated by the evaluation means.

2. The document search device according to claim 1, wherein the matching means recognizes a match when a word or phrase and the keyword are similar, even if they do not match completely.

3. The document search device according to claim 1, wherein, when a creation coefficient and a browsing coefficient are determined by the coefficient determining means as the weighting coefficients for a document, the evaluation means multiplies the number of words and phrases recognized as matching by the matching means by the creation coefficient, multiplies that number by the browsing coefficient, and uses the total of both multiplication results as the evaluation point of the document.

4. The document search device according to any one of claims 1 to 6, wherein, when the coefficient determining means determines, in addition to the weighting coefficient for a document, a creation similarity coefficient based on the similarity between the search user and another user, the evaluation means multiplies the number of words and phrases recognized as matching by the matching means by the weighting coefficient, multiplies that number by the creation similarity coefficient, and uses the total of both multiplication results as the evaluation point of the document.

5. The document search device according to claim 4, wherein the coefficient determining means compares the personal attribute information of the search user with the personal attribute information of other users to determine the similarity between the search user and the other users, and obtains the creation similarity coefficient by multiplying the similarity by a predetermined value.

6. The document search device according to any one of claims 1 to 6, wherein, when the coefficient determining means determines, in addition to the weighting coefficient for a document, a browsing similarity coefficient based on the similarity between the search user and another user, the evaluation means multiplies the number of words and phrases recognized as matching by the matching means by the weighting coefficient, multiplies that number by the browsing similarity coefficient, and uses the total of both multiplication results as the evaluation point of the document.

7. The document search device according to claim 6, wherein the coefficient determining means compares the personal attribute information of the search user with the personal attribute information of other users to determine the similarity between the search user and the other users, and obtains the browsing similarity coefficient by multiplying the similarity by a predetermined value.

8. The document search device according to any one of claims 1 to 7, wherein the recording means updates the recorded contents of the creators and viewers at regular intervals.

9. The document search device according to any one of claims 1 to 7, wherein the recording means updates the recorded contents of the creators and viewers when a new document is registered in the server or when a document stored in the server is browsed.