JP2005010848A

JP2005010848A - Information retrieval device, information retrieval method, information retrieval program and recording medium

Info

Publication number: JP2005010848A
Application number: JP2003170997A
Authority: JP
Inventors: Hiroyuki Kanza; 浩幸勘座
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-06-16
Filing date: 2003-06-16
Publication date: 2005-01-13

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval device that achieves both high precision and high recall when retrieving information relevant to the concept expressed by an input word.

SOLUTION: An information retrieval device 100 includes: a word extending part 106 which collects, from a concept dictionary storing part 104, extended words having a hierarchical conceptual relation to an input word indicated by a character string acquired by a character string acquiring part 102; a data retrieval part 110 which searches a database 108 for keywords matching an extended retrieval key composed of the input word and the extended words; a rank calculation part 114 which decides the priority order of the retrieved information based on the similarity between the attributes of the input word and the attributes of the matched keywords; and a data selection part 116 which selects the information to be output according to the priority order decided by the rank calculation part 114 and makes an output part 118 output it.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

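[0001]
[Technical Field of the Invention]
The present invention relates to an information retrieval device that retrieves predetermined information from among a plurality of pieces of information. In particular, it relates to an information retrieval device that performs a wide-ranging search in response to a query and selects and outputs information suited to the user.
[0002]
[Prior Art]
At present, the commonly used way to retrieve information is to search for information containing a word that matches an input word. This method yields search results with few omissions when the user can think of a word that succinctly expresses the content of the desired information and a large amount of information containing that word exists.
[0003]
However, if the user cannot think of such a word and enters one that is off the mark, the information obtained is also off the mark. Moreover, when only a small amount of information contains the word the user thought of, that small amount is unlikely to include the information the user wants. In these cases, the user is forced to repeat the search, entering other words associated with the original word, until satisfactory results are obtained.
[0004]
To solve such problems, Patent Document 1 discloses a technique that describes the semantic relations between words in a hierarchical structure and uses that information to search documents. The retrieval method described in Patent Document 1 refers to dictionary data storing the superordinate and subordinate relations between concepts expressed by words, extracts words expressing concepts positioned below the word indicated by the input character string, and searches documents using the extracted words as search keys.
[0005]
FIG. 18 schematically shows dictionary data storing the superordinate and subordinate relations between concepts expressed by words. Referring to FIG. 18, words 902, 904, ..., 918 are each placed at a node of this tree diagram. A word expressing a superordinate concept and a word expressing a subordinate concept belonging to it are connected by a path 900, which indicates that the concepts expressed by those words are related to each other.
[0006]
In the document retrieval method of Patent Document 1, when a character string indicating the word "fishing" is input, for example, the documents are searched using as search keys not only the word 908 "fishing" but also the words expressing its subordinate concepts, namely the word 916 "rock fishing" and the word 918 "mountain stream fishing". Searching in this way makes it possible to find more specific information in a large volume of documents.
[0007]
Patent Document 1 also discloses a retrieval method that extracts words expressing concepts positioned above the word indicated by the input character string, and searches documents using as search keys the extracted words or words expressing concepts positioned below the concepts expressed by the extracted words. For example, when a character string indicating the word "fishing" is input, referring to FIG. 18, the documents are searched using as search keys not only the word 908 "fishing" but also the word 904 "outdoor", which expresses its superordinate concept, the word 902 "recreation", which expresses a still higher concept, and words expressing subordinate concepts of these, such as "camping" 910, "travel" 906, "hot spring" 912, and "gourmet" 914. Searching in this way makes it possible to find related information over a wider range.
[0008]
[Patent Document 1]
Japanese Unexamined Patent Application Publication No. H04-10062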
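[Problems to Be Solved by the Invention]
Retrieving information with the method disclosed in Patent Document 1 has the following problems. Suppose, for example, that the list of information obtained from a database by searching with any of the words 902 to 918 in the dictionary data of FIG. 18 as a search key is as shown in FIG. 19. Referring to FIG. 19, this list tabulates information 940, 942, and 944 together with the search keys 946, 948, and 950 used to obtain them.
[0009]
With the retrieval method of Patent Document 1, searching with words that express concepts subordinate to the concept expressed by the given word makes it possible to find more specific information in a large amount of information. However, searching with subordinate-concept words cannot always be expected to yield sufficient results.
[0010]
Suppose, for example, that the word indicated by the character string input by the user is the word 908 "fishing". Referring to FIG. 18, the words expressing subordinate concepts of the word 908 "fishing" are "rock fishing" 916 and "mountain stream fishing" 918. The database holds the information 940 "camping gear bargain information", which the user may well regard as related to "fishing", but this information cannot be found. Thus information that the user may consider relevant can be missed from the search results even though the database holds it.
[0011]
Conversely, when a search is performed with words expressing superordinate concepts and words expressing their subordinate concepts, the information obtained may be logically close to what the user wants and yet evoke an image entirely different from the image the user has of the desired information.
[0012]
Suppose, for example, that the word indicated by the character string input by the user is "fishing". This retrieval method searches using words expressing superordinate concepts and words expressing their subordinate concepts as search keys. Referring to FIG. 19, the search then yields the information 940 "camping gear bargain information", the information 942 "nearby hot spring facilities", and the information 944 "exploration club". The user, however, may associate a "relaxed" image with "fishing" and "hot spring" but not with "camping" or "exploration". In that case the user cannot see how the input word "fishing" relates to the information 940 "camping gear bargain information" or to the information 944 "exploration club", and may feel that this information is useless.
[0013]
Therefore, an object of the present invention is to provide an information retrieval device that retrieves information related to the concept expressed by a word input by the user and that offers both high precision and high recall.
[0014]
Another object of the present invention is to provide an information retrieval device that retrieves information related to the concept expressed by a word the user inputs for a search, and that can retrieve, from a wide range of information, information highly relevant to the input word.
[0015]
Still another object of the present invention is to provide such an information retrieval device that can retrieve, from a wide range of information, information estimated through multifaceted evaluation to be highly relevant to the input word.
[0016]
Still another object of the present invention is to provide such an information retrieval device that can retrieve, from a wide range of information, information estimated to be highly relevant to the input word with respect to the properties of information that the user values.
[0017]
Still another object of the present invention is to provide such an information retrieval device that, by estimating the properties of information that the user values, can retrieve, from a wide range of information, information estimated to be highly relevant to the input word.
[0018]
An additional object of the present invention is to provide such an information retrieval device that can retrieve, from a wide range of information, information that evokes in the user an image similar to the image the input word evokes.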
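[0019]
[Means for Solving the Problems]
An information retrieval device according to a first aspect of the present invention includes: character string acquisition means for acquiring a character string indicating a first word; concept information holding means for holding concept information indicating hierarchical relations between the concepts of a plurality of words; word collection means for collecting from the concept information holding means, based on the concept information, a second word expressing a concept related to the concept expressed by the first word; a database for holding the information to be searched; extraction means for extracting from the database, using the first word and the second word as search keys, a keyword that matches either the first word or the second word together with the information held in the database in association with that matching keyword; means for acquiring information indicating the attributes of words for a plurality of words; rank determination means for determining the priority order of the information extracted by the extraction means on the basis of the similarity between the attributes of the first word and the attributes of the matching keyword; and output means for outputting the information extracted by the extraction means in accordance with the priority order determined by the rank determination means.
[0020]
By searching with not only the first word indicated by the character string input by the user but also the second word conceptually related to it as search keys, information can be retrieved from a wide range of information. This improves recall. Furthermore, verifying the search results against a separate criterion, namely similarity to the attributes of the first word, makes it possible to retrieve information highly relevant to the word the user input. This improves precision.
[0021]
Preferably, the rank determination means includes: score calculation means for calculating a score indicating the similarity between the attributes of the first word and the attributes of the matching keyword, based on the information indicating the attributes of the first word and the information indicating the attributes of the matching keyword acquired by the means for acquiring information indicating the attributes of words; and determination means for determining the priority order of the information extracted by the extraction means on the basis of the score calculated by the score calculation means.
[0022]
Calculating a score indicating the similarity between the attributes of the first word and the attributes of the keyword allows the priority order to be determined on a concrete basis. Information highly relevant to the word the user input can therefore be retrieved.
[0023]
Preferably, the means for acquiring information indicating the attributes of words includes attribute information holding means for holding, for each of the plurality of words, attribute information that indicates the attributes of the word by attribute values preset for each attribute; and the rank determination means includes mental distance calculation means for calculating the mental distance between the first word and the matching keyword based on the attribute information on the first word and the attribute information on the matching keyword held in the attribute information holding means, and determination means for determining the priority order of the information extracted by the extraction means on the basis of the mental distance calculated by the mental distance calculation means.
[0024]
Calculating the mental distance between the attributes of the first word and the attributes of the keyword allows the priority order to be determined on an even more concrete basis. Information highly relevant to the word the user input can therefore be retrieved.
[0025]
More preferably, the rank determination means further includes: concept distance calculation means for calculating the concept distance between the first word and the matching keyword based on the concept information on the first word and the concept information on the matching keyword held in the concept information holding means; and means for creating, for each combination of the first word and a matching keyword, a reference value that integrates the mental distance and the concept distance; and the determination means includes means for determining the priority order of the information extracted by the extraction means on the basis of that integrated reference value.
[0026]
The relatedness between the concept expressed by the first word and the concept expressed by the keyword is made concrete as a concept distance, and the information is evaluated from multiple angles using this concept distance together with the mental distance. Information estimated to be highly relevant to the word the user input can thereby be retrieved from a wide range of information.
[0027]
The mental distance calculation means may include importance setting means for setting the importance of each attribute, and means for calculating the mental distance between the first word and the matching keyword based on the attribute information on the first word and the attribute information on the matching keyword held in the attribute information holding means and on the importance of each attribute set by the importance setting means.
[0028]
The settings made by the importance setting means make concrete which properties of information the user values. Calculating the mental distance with these settings taken into account therefore makes it possible to retrieve information that is highly relevant to the input word with respect to the properties the user values.
[0029]
The importance setting means may include history recording means for recording a history of the information output by the output means, and means for setting the importance of each attribute based on the history recorded by the history recording means.
[0030]
The means for setting the importance of each attribute may include preference estimation means for estimating the user's preference for information by checking the history recorded by the history recording means against the database, and means for setting the importance of each attribute based on the preference estimated by the preference estimation means and the attribute information held by the attribute information holding means.
[0031]
The preference estimation means may include means for checking the history recorded by the history recording means against the database and calculating, for each keyword, the frequency with which the information held in the database in association with that keyword has been output, and means for estimating the user's preference for information based on this per-keyword frequency and the attributes of the keywords held in the attribute information holding means.
[0032]
Calculating the importance from information output in the past makes it possible to estimate which properties of information the user values, and hence to retrieve information highly relevant to the input word with respect to those properties. Information that evokes in the user an image similar to the one evoked by the input word can therefore be retrieved.
[0033]
The output means may include means for outputting the extracted information, in the order indicated by the priority order determined by the rank determination means, down to a predetermined rank.
[0034]
Narrowing down the output makes it possible to present as search results the information likely to be of particular interest to the user. The search results thus become more satisfying for the user.
[0035]
An information retrieval method according to a second aspect of the present invention includes the steps of: acquiring a character string indicating a first word; collecting a second word expressing a concept related to the concept expressed by the first word indicated by the acquired character string; searching for information held in a database in association with keywords, using the first word and the second word as search keys; determining the priority order of the information to be output as search results on the basis of the similarity between the attributes of the first word and the attributes of the keywords associated with the information obtained as search results in the searching step; and outputting the information obtained as search results in accordance with the priority order determined in the determining step.
[0036]
Using this information retrieval method when searching makes it possible to retrieve, from a wide range of information, information highly relevant to the input word.
[0037]
An information retrieval program according to a third aspect of the present invention, when executed on a computer, causes the computer to operate as the information retrieval device according to the first aspect of the present invention.
[0038]
Executing this information retrieval program makes it possible to realize on a computer the operation and effects of the invention according to the first aspect described above.
[0039]
A recording medium according to a fourth aspect of the present invention is a computer-readable recording medium on which the information retrieval program according to the third aspect of the present invention is recorded.
[0040]
Reading the information retrieval program recorded on this recording medium with a computer and executing it realizes the operation and effects of the invention according to the first aspect described above.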
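[0041]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings. In the drawings used in the following description, identical components are given identical reference numerals. Their names and functions are also identical, so detailed descriptions of them are not repeated.
[0042]
[Embodiment 1]
An overview of the information retrieval device according to Embodiment 1 of the present invention is given first. Like the technique described in Patent Document 1, the information retrieval device of Embodiment 1, before searching for information, gathers for a given word the words that express concepts related to the concept expressed by that word (hereinafter, "the concept of the word"). It then searches for information using those words as search keys.
[0043]
The information retrieval device of Embodiment 1, however, additionally determines the priority order for outputting the search results based on the properties possessed by words (hereinafter, "the attributes of a word"), and outputs the search results in that priority order.
[0044]
The attributes of a word include emotional properties, such as the image that the user of the information retrieval device has of the word or of the thing it expresses, and properties whose relatedness is hard to capture with logical concepts, such as the historical or regional background the word expresses. Accordingly, when the information retrieval device of Embodiment 1 searches with words of related concepts as search keys and obtains several kinds of information, it preferentially outputs the information most likely to evoke an impression similar to that of the information the user wants.
[0045]
FIG. 1 is a block diagram of the information retrieval device according to Embodiment 1. Referring to FIG. 1, the information retrieval device 100 includes: a character string acquiring part 102, which consists of an input device such as a keyboard, mouse, or touch panel and acquires a character string indicating a search key; a concept dictionary storing part 104, which stores a concept dictionary indicating the hierarchical relations between the concepts of words; and a word extending part 106, which is connected to the character string acquiring part 102 and the concept dictionary storing part 104, obtains from the concept dictionary storing part 104 the words of concepts related to the word formed by the acquired character string (the latter is hereinafter called the "input word" and the related words "extended words"), and creates an extended retrieval key consisting of the input word and the extended words.
[0046]
The information retrieval device 100 further includes a database 108, which holds the data to be searched in association with the keywords used to search for it, and a data retrieval part 110, which is connected to the word extending part 106 and the database 108 and searches the data held in the database 108 using the extended retrieval key supplied by the word extending part 106.
[0047]
The information retrieval device 100 further includes: an attribute dictionary storing part 112, which stores an attribute dictionary made up of attribute information indicating the attributes of words; a rank calculation part 114, which is connected to the data retrieval part 110 and the attribute dictionary storing part 112, verifies the search results of the data retrieval part 110 against the attribute information stored in the attribute dictionary storing part 112, and assigns a priority order to the retrieved data; a data selection part 116, which is connected to the database 108 and the rank calculation part 114, obtains the priority order of the retrieved data from the rank calculation part 114, and fetches data from the database 108 in that order; and an output part 118, which outputs the data fetched by the data selection part 116.
[0048]
FIG. 2 shows the relatedness of word concepts stored in the concept dictionary storing part 104. Referring to FIG. 2, the relatedness of word concepts is represented schematically as a tree diagram. In this tree diagram, words 132, 134, ..., 148, ... are each placed at a node. A word expressing a superordinate concept and a word expressing a subordinate concept belonging to it are connected by a path 130, which indicates that the concepts expressed by those words are related to each other. For example, the concept of the word 138 "fishing" is superordinate to the concepts of the word 146 "rock fishing" and the word 148 "mountain stream fishing", and subordinate to the concept of the word 134 "outdoor". Also, for example, the concept of the word 138 "fishing" is more closely related to the concept of the word 140 "paraglider" than to the concept of the word 142 "hot spring".
[0049]
FIG. 3 shows the structure of the concept dictionary stored in the concept dictionary storing part 104. Referring to FIG. 3, the concept dictionary 160 contains a large number of entries 162, 164, ..., 182, .... Each entry contains a word 186, a unique word number 188 identifying the word, and concept information 190 indicating how the concept expressed by the word 186 is related to other concepts. The concept information 190 corresponds to the paths 130 in the tree diagram of FIG. 2. It contains the word numbers 192 of the words expressing superordinate concepts and the word numbers 194 of the words expressing subordinate concepts.
[0050]
The relatedness of the concepts expressed by the words in the entries of the concept dictionary 160 is indicated by the superordinate word numbers 192 and subordinate word numbers 194 stored in the concept information 190. For example, the word number of the word expressing the superordinate concept of "fishing", listed in entry 168, is "00123". The word with word number "00123" is "outdoor", listed in entry 164. That is, "outdoor" is the word expressing the superordinate concept of "fishing". Conversely, the word numbers of the words expressing subordinate concepts of "outdoor" (word number "00123") are "01734" and "02495". That is, the words expressing subordinate concepts of "outdoor" include "paraglider" in addition to "fishing".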
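The concept dictionary entries described above can be sketched as a small data structure. This is only an illustration: the class and function names are hypothetical, and the text does not say which of the word numbers "01734" and "02495" belongs to "fishing" and which to "paraglider", so the assignment below is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptEntry:
    """One entry of the concept dictionary (word, word number, concept information)."""
    word: str
    word_number: str
    broader: List[str] = field(default_factory=list)   # word numbers of superordinate words
    narrower: List[str] = field(default_factory=list)  # word numbers of subordinate words


# Assumption: "01734" is "fishing" and "02495" is "paraglider".
ENTRIES = [
    ConceptEntry("outdoor", "00123", narrower=["01734", "02495"]),
    ConceptEntry("fishing", "01734", broader=["00123"]),
    ConceptEntry("paraglider", "02495", broader=["00123"]),
]
BY_NUMBER = {entry.word_number: entry for entry in ENTRIES}
BY_WORD = {entry.word: entry for entry in ENTRIES}


def broader_words(word):
    """Resolve the superordinate word numbers of a word to words."""
    return [BY_NUMBER[number].word for number in BY_WORD[word].broader]


def narrower_words(word):
    """Resolve the subordinate word numbers of a word to words."""
    return [BY_NUMBER[number].word for number in BY_WORD[word].narrower]
```

Storing word numbers rather than words in the concept information mirrors the description: following a relation is a lookup by word number.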
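[0051]
FIG. 4 shows the structure of the extended retrieval key that the word extending part 106 creates from the concept dictionary 160 of FIG. 3. Referring to FIG. 4, the extended retrieval key 200 contains an input word 202 and extended words 204. The input word 202 is kept distinct from the extended words 204 because the rank calculation part 114 of FIG. 1 uses it later to rank the search results.
[0052]
FIG. 5 shows an example of the data held in the database 108. Referring to FIG. 5, the data 210 held in the database 108 contains a plurality of data items 212, 214, 216, 218, .... Each data item contains the stored data 220, an item number 222 identifying the data item, and keywords 224 that the data retrieval part 110 refers to when searching the data. The words chosen as keywords 224 are related to the content of the data 220. They may, for example, be words that succinctly indicate the content of the data 220, or words extracted from those appearing in the data 220.
[0053]
Although the database in FIG. 5 is shown as holding the keywords together with the data, this is only for convenience of explanation; the database is not limited to this form.
[0054]
The data retrieval part 110 of FIG. 1 searches the database 108 for data items whose keywords include a word identical to one contained in the extended retrieval key supplied by the word extending part 106, and creates retrieval result information for passing the results to the rank calculation part 114. FIG. 6 shows an example of the retrieval result information that the data retrieval part 110 creates and passes to the rank calculation part 114. Referring to FIG. 6, the retrieval result information 240 contains a plurality of entries 242, 244, 246, .... Each contains the item number 248 of a retrieved data item and the keyword 250 that matched the extended retrieval key during the search.
[0055]
FIG. 7 shows the structure of the attribute dictionary stored in the attribute dictionary storing part 112 (see FIG. 1). Referring to FIG. 7, the attribute dictionary 260 contains a large number of entries 262, 264, 266, 268, .... Each entry contains a word 270, a word number 272 identifying the word, and attribute information 274 indicating the attributes of the word. The attribute information 274 contains attribute values that quantify the attributes of the word for each of the attribute items 276, 278, 280, ..., which indicate kinds of attribute. In the attribute dictionary 260 of FIG. 7, a word is given the attribute value "1" if it has the attribute indicated by the attribute item and "0" if it does not. These attribute values may be set on the basis of a survey, or set by the user.
[0056]
The rank calculation part 114 of FIG. 1 assigns a priority order to the data items to be output, based on the supplied retrieval result information and with reference to the attribute dictionary stored in the attribute dictionary storing part 112. An example of how it does so is described below.
[0057]
Suppose, for example, that the attribute dictionary stored in the attribute dictionary storing part 112 is the attribute dictionary 260 of FIG. 7, that the word "fishing" is given to the rank calculation part 114 as the input word, and that the retrieval result information 240 of FIG. 6 is given as the retrieval result information. The keyword of entry 244 in the retrieval result information 240 of FIG. 6 is "hot spring". Referring to FIG. 7, entry 262, which contains the word "hot spring", and entry 266, which contains the input word "fishing", have matching attribute values for the attribute item "relaxed" 276 and for the attribute item "Japanese-style" 280. The score of entry 244 in the retrieval result information 240 is therefore 2 points. The priority order of the data items to be output is determined in descending order of the scores calculated in this way.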
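The scoring rule in the paragraph above, one point per attribute item on which the input word and the keyword have the same binary value, can be sketched as follows. The attribute values are hypothetical: FIG. 7 is not reproduced in the text, so they were chosen only so that "fishing" and "hot spring" agree on "relaxed" and "Japanese-style" and score 2 points. The function names and item numbers are likewise illustrative.

```python
# Attribute items, in order: relaxed, thrilling, Japanese-style.
# Hypothetical binary attribute values (1 = has the attribute, 0 = does not).
ATTRIBUTES = {
    "hot spring": (1, 0, 1),
    "fishing": (1, 1, 1),
    "paraglider": (0, 1, 0),
}


def match_score(input_word, keyword, attributes=ATTRIBUTES):
    """Count the attribute items on which the two words have the same value."""
    pairs = zip(attributes[input_word], attributes[keyword])
    return sum(1 for a, b in pairs if a == b)


def rank_by_score(input_word, results, attributes=ATTRIBUTES):
    """Sort (item_number, keyword) pairs in descending order of score."""
    return sorted(
        results,
        key=lambda result: match_score(input_word, result[1], attributes),
        reverse=True,
    )


ranked = rank_by_score("fishing", [(2, "paraglider"), (1, "hot spring")])
```

Note that under this rule two words also earn a point when both lack an attribute, since the comparison is on equality of values.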
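[0058]
Referring to FIGS. 1 to 7, the information retrieval device 100 according to this embodiment operates as follows.
[0059]
Referring to FIG. 1, when the user enters an input word using the character string acquiring part 102, the character string acquiring part 102 passes the input word to the word extending part 106.
[0060]
On receiving the input word, the word extending part 106 finds the entry containing the input word in the concept dictionary 160 of FIG. 3 stored in the concept dictionary storing part 104. It then refers to the concept information 190 of that entry and obtains the words of the superordinate and subordinate concepts of the input word as extended words. The word extending part 106 then repeats, a predetermined number of times (three in this embodiment), the operation of referring to the concept information 190 of the extended words already obtained and obtaining the words of their superordinate and subordinate concepts. It creates the extended retrieval key from the input word and the extended words obtained. The created extended retrieval key is passed to the data retrieval part 110.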
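A minimal sketch of this expansion step, reading the description as one initial lookup followed by three repetitions. The tree below is an assumption: the text names "outdoor", "fishing", "paraglider", "hot spring", "rock fishing", and "mountain stream fishing" for FIG. 2, while "recreation" and "travel" are borrowed from the prior-art example to fill the unnamed nodes. All function names are illustrative.

```python
# Hypothetical concept tree: superordinate word -> subordinate words.
CHILDREN = {
    "recreation": ["outdoor", "travel"],
    "outdoor": ["fishing", "paraglider"],
    "travel": ["hot spring"],
    "fishing": ["rock fishing", "mountain stream fishing"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def neighbors(word):
    """Words one path away: the superordinate word, then the subordinate words."""
    upper = [PARENT[word]] if word in PARENT else []
    return upper + CHILDREN.get(word, [])


def expand(input_word, repeats=3):
    """Build an extended retrieval key: one initial lookup plus `repeats` more rounds."""
    seen = {input_word}
    extended = []
    frontier = [input_word]
    for _ in range(1 + repeats):
        next_frontier = []
        for word in frontier:
            for neighbor in neighbors(word):
                if neighbor not in seen:
                    seen.add(neighbor)
                    extended.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    # The input word is kept separate because ranking uses it later.
    return {"input_word": input_word, "extended_words": extended}


key = expand("fishing")
```

With this assumed tree, "hot spring" is four paths away from "fishing", so it is reached only in the last round.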
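[0061]
The data retrieval part 110 searches the data stored in the database 108 using the supplied extended retrieval key. It reads from the database 108 the item numbers of the data items that satisfy the condition of containing a keyword matching any of the words in the extended retrieval key, together with the matched keywords.
[0062]
Suppose, for example, that the database 108 stores the data 210 of FIG. 5 and that the data retrieval part 110 is given the extended retrieval key 200 of FIG. 4. The data retrieval part 110 then reads, as search results, the item numbers 222 and keywords 224 of the data items 212, 214, and 216, whose keywords match words in the extended retrieval key 200. It creates retrieval result information from the item numbers and keywords read, and passes it to the rank calculation part 114 together with the input word.
[0063]
On receiving the input word and the retrieval result information, the rank calculation part 114 reads from the attribute dictionary storing part 112 the attribute information of the input word and of the keywords contained in the retrieval result information. For each attribute item of the attribute information read, it compares the attribute value of the input word with that of the keyword and awards the retrieval result one point for each attribute item on which the two values match. It does this for every keyword in the retrieval result information to obtain the score of each retrieval result. The rank calculation part 114 creates a rank table listing the item numbers in the retrieval result information in descending order of score, and passes it to the data selection part 116.
[0064]
The data selection part 116 reads a predetermined number of item numbers from the top of the rank table. It then reads the data items with those item numbers from the database 108 and passes them to the output part 118. The output part 118 outputs the data it receives.
[0065]
The information retrieval device 100 of Embodiment 1 can be realized by a general-purpose computer or a portable information terminal and a computer program executed on it. The control structure of a program for realizing the desired functions of the information retrieval device 100 is described below.
[0066]
FIG. 8 is a flowchart of the program executed by the information retrieval device 100. Referring to FIG. 8, when the information retrieval device 100 starts the program, control proceeds to step 302 (hereinafter, "step" is abbreviated "S"). In S302, the information retrieval device 100 acquires a character string to serve as the search key. Control proceeds to S304.
[0067]
In S304, the acquired character string is taken as the input word, and an extended retrieval key is created from it. Control proceeds to S306.
[0068]
In S306, the data held in the database is searched using the input word and the extended retrieval key created in S304, and the item numbers of the retrieved data items and the keywords associated with those data items are obtained. Control proceeds to S308.
[0069]
In S308, the attribute information of the keywords obtained in S306 is compared with that of the input word, and each data item is given a score. Control proceeds to S310.
[0070]
In S310, a predetermined number of data items are selected for output in descending order of score. In the following S312, the data of the selected data items is output. The program ends once this control is complete.
[0071]
As described above, the information retrieval device 100 of Embodiment 1 creates an extended retrieval key from the given input word and searches the data with it, so it can retrieve information matching a wide range of concepts that includes the concept expressed by the input word. In addition, it ranks the retrieved information based on the attributes of words and outputs the results by rank, so information expressing properties similar to those of the input word is output preferentially. Information highly relevant to the input word can therefore be output as the search result.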
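[0072]
[Embodiment 2]
The information retrieval device 100 of Embodiment 1 widens the search range based on the concepts expressed by words and determines the output priority of the search results based on the attributes of words. The present invention, however, is not limited to such an embodiment.
[0073]
When determining the output priority of the search results, the information retrieval device of Embodiment 2 uses a criterion that takes into account not only the similarity of attributes between the input word and the keywords of the retrieved data items but also the conceptual relatedness between the input word and those keywords.
[0074]
FIG. 9 is a block diagram of the information retrieval device according to Embodiment 2. Referring to FIG. 9, the information retrieval device 400 of Embodiment 2 includes the same character string acquiring part 102, concept dictionary storing part 104, word extending part 106, database 108, data retrieval part 110, data selection part 116, and output part 118 as the information retrieval device 100 of Embodiment 1 shown in FIG. 1.
[0075]
In place of the attribute dictionary storing part 112 of Embodiment 1 shown in FIG. 1, the information retrieval device 400 further includes an attribute dictionary storing part 412. Unlike the attribute dictionary storing part 112, it stores an attribute dictionary made up of attribute information whose attribute values quantify the degree to which a word has each attribute.
[0076]
The information retrieval device 400 further includes: a concept distance calculation part 402, which is connected to the data retrieval part 110 and the concept dictionary storing part 104 and calculates, based on the concept information stored in the concept dictionary storing part 104, a concept distance, that is, a value indicating the conceptual relatedness between the keyword of a retrieved data item and the input word; and a mental distance calculation part 404, which is connected to the data retrieval part 110 and the attribute dictionary storing part 412 and calculates, based on the attribute information stored in the attribute dictionary storing part 412, a mental distance, that is, a value indicating the similarity between the emotional impression that a retrieved data item gives the user and the emotional impression that the input word gives the user.
[0077]
In place of the rank calculation part 114 of FIG. 1, the information retrieval device 400 further includes a rank calculation part 414, which is connected to the concept distance calculation part 402, the mental distance calculation part 404, and the data selection part 116 and ranks the retrieved data items based on the concept distance quantified by the concept distance calculation part 402 and the mental distance quantified by the mental distance calculation part 404.
[0078]
The concept distance calculation part 402 calculates the concept distance based on the concept information stored in the concept dictionary storing part. An example of the calculation is given below.
[0079]
Suppose the concepts of words are modeled by the tree diagram of FIG. 2. Referring to FIG. 2, the concept distance calculation part 402 takes the number of paths 130 connecting the node of word W_i and the node of word W_j on this tree as the concept distance d(W_i, W_j) between the two words. With this method, the concept distance d("fishing", "outdoor") between the word 138 "fishing" and the word 134 "outdoor" is 1, for example, and the concept distance d("fishing", "hot spring") between the word 138 "fishing" and the word 142 "hot spring" is 4. When the input word and the keyword are the same word, the concept distance is 0. With concept distances calculated in this way, the more closely related the concepts of two words, the smaller the concept distance.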
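Counting the paths between two nodes of the tree can be sketched with a breadth-first search. The edge list is an assumption: the nodes between "outdoor" and "hot spring" are not named in the text, so "recreation" and "travel" are hypothetical placeholders chosen so that the two distances stated above (1 and 4) come out. The function names are illustrative.

```python
from collections import deque

# Hypothetical edges (superordinate word, subordinate word) of the concept tree.
EDGES = [
    ("recreation", "outdoor"),
    ("recreation", "travel"),
    ("outdoor", "fishing"),
    ("outdoor", "paraglider"),
    ("travel", "hot spring"),
    ("fishing", "rock fishing"),
    ("fishing", "mountain stream fishing"),
]


def build_adjacency(edges):
    """Turn the edge list into an undirected adjacency map."""
    adjacency = {}
    for upper, lower in edges:
        adjacency.setdefault(upper, []).append(lower)
        adjacency.setdefault(lower, []).append(upper)
    return adjacency


def concept_distance(word_i, word_j, edges=EDGES):
    """Number of paths (edges) on the shortest route between two words, or None."""
    if word_i == word_j:
        return 0
    adjacency = build_adjacency(edges)
    queue = deque([(word_i, 0)])
    seen = {word_i}
    while queue:
        word, distance = queue.popleft()
        for neighbor in adjacency.get(word, ()):
            if neighbor == word_j:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # the words are not connected
```

In a tree the route between two nodes is unique, so the breadth-first search simply finds it; the same function would return the shortest route if the dictionary were a more general graph.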
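[0080]
FIG. 10 shows the structure of the attribute dictionary stored by the attribute dictionary storing part 412 of Embodiment 2. Referring to FIG. 10, the attribute dictionary 440 stored in the attribute dictionary storing part 412 contains a large number of entries 442, 444, 446, 448, .... Each entry has the same structure as in the attribute dictionary 260 of Embodiment 1 shown in FIG. 7, with the following difference. Whereas the attribute values of the attribute dictionary 260 of FIG. 7 indicate in binary form whether or not a word has each attribute, the attribute values of the attribute dictionary 440 of Embodiment 2 are integers from "0" to "10" indicating the degree to which a word has each attribute. In all other respects the attribute dictionary 260 of Embodiment 1 and the attribute dictionary 440 of Embodiment 2 are the same. The attribute values of the attribute dictionary 440 may be values calculated statistically from a survey or values set by the user.
[0081]
The mental distance calculation part 404 calculates the mental distance based on the attribute information shown in FIG. 10. An example of the calculation is described below.
[0082]
Suppose the attribute information of the words stored in the attribute dictionary storing part 412 (see FIG. 9) is defined by the attribute dictionary 440 of FIG. 10. As noted above, the attribute information contains, for each attribute item, an attribute value indicating the degree to which the word has that attribute. Let n be the total number of attribute items defined, and let a_{i,k} be the attribute value of a word W_i for attribute item A_k (1 ≤ k ≤ n). The attribute information of W_i is then expressed as an n-dimensional attribute information vector whose components are the a_{i,k}. That is, the attributes of W_i are defined by an attribute information vector in an n-dimensional vector space. Call this vector w_i. The square of the Euclidean distance between the attributes of W_i and the attributes of W_j is calculated by Equation 1 below, and the mental distance s(W_i, W_j) between W_i and W_j is calculated from it.
[0083]
[Equation 1]
s(W_i, W_j)^2 = \sum_{k=1}^{n} (a_{i,k} - a_{j,k})^2

For example, referring to FIG. 10, let the first attribute item A_1 be the attribute item "relaxed" 276, the second attribute item A_2 the attribute item "thrilling" 278, and the third attribute item A_3 the attribute item "Japanese-style" 280, and let W_i be the word "fishing" and W_j the word "paraglider". Referring to entries 446 and 448, the attribute information vectors of W_i and W_j are w_i = (10, 2, 6) and w_j = (0, 10, 1), respectively. The square of the mental distance between W_i and W_j is therefore
[0084]
[Equation 2]
s(W_i, W_j)^2 = (10 - 0)^2 + (2 - 10)^2 + (6 - 1)^2 = 189

and so the mental distance s("fishing", "paraglider") between "fishing" and "paraglider" is
[0085]
[Equation 3]
s("fishing", "paraglider") = \sqrt{189} \approx 13.7

With mental distances calculated in this way, the more similar the attributes of two words, the smaller the mental distance between them.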
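The Euclidean mental distance can be checked with a few lines of code, using the attribute vectors given in the text for "fishing" (10, 2, 6) and "paraglider" (0, 10, 1). The function name is illustrative.

```python
import math


def mental_distance(w_i, w_j):
    """Euclidean distance between two attribute information vectors."""
    if len(w_i) != len(w_j):
        raise ValueError("attribute vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_i, w_j)))


fishing = (10, 2, 6)      # relaxed, thrilling, Japanese-style
paraglider = (0, 10, 1)

distance = mental_distance(fishing, paraglider)  # square root of 189, about 13.75
```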
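[0086]
The information retrieval device 400 of Embodiment 2 operates as follows.
[0087]
Referring to FIG. 9, as in the information retrieval device 100 of Embodiment 1, when the user enters an input word the character string acquiring part 102 passes it to the word extending part 106. The word extending part 106 creates an extended retrieval key in the same way as in Embodiment 1 and passes it to the data retrieval part 110. The data retrieval part 110, in the same way as in Embodiment 1, searches the data stored in the database 108 with the extended retrieval key and creates retrieval result information. It outputs the input word and the retrieval result information, which are passed to the concept distance calculation part 402 and the mental distance calculation part 404.
[0088]
On receiving the input word and the retrieval result information, the concept distance calculation part 402 calculates, based on the concept dictionary stored in the concept dictionary storing part 104, the concept distance between the concept of the input word and the concept of the keyword of each data item in the retrieval result information. It attaches the calculated concept distance between the input word and the keyword to each entry of the retrieval result information and passes it to the rank calculation part 414.
[0089]
Meanwhile, referring to FIG. 9, on receiving the input word and the retrieval result information from the data retrieval part 110, the mental distance calculation part 404 calculates, based on the attribute information stored in the attribute dictionary storing part 412, the mental distance between the input word and the keyword of each data item. It attaches the calculated mental distance between the input word and the keyword to each entry of the retrieval result information and passes it to the rank calculation part 414.
[0090]
When both conditions are satisfied, namely that retrieval result information with concept distances attached has been received from the concept distance calculation part 402 and that retrieval result information with mental distances attached has been received from the mental distance calculation part 404, the rank calculation part 414 calculates the sum of the concept distance and the mental distance for each data item listed in the retrieval result information.
[0091]
As stated earlier, a small concept distance indicates that the concept of the input word and the concept of the keyword are closely related, and a small mental distance indicates that the attributes of the input word and of the keyword are highly similar. Therefore, so that data items closely related to the concept of the input word or having attributes similar to those of the input word are output preferentially, the rank calculation part 414 creates a rank table listing the item numbers of the data items in ascending order of the sum of concept distance and mental distance, and passes it to the data selection part 116.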
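The ranking step, sorting by the sum of concept distance and mental distance in ascending order and then taking a predetermined number of items from the top, can be sketched as follows. The record layout, function names, and distance values are hypothetical and serve only to illustrate the ordering.

```python
def build_rank_table(results):
    """Return item numbers sorted by concept distance + mental distance, smallest first."""
    ordered = sorted(
        results,
        key=lambda r: r["concept_distance"] + r["mental_distance"],
    )
    return [r["item_number"] for r in ordered]


def select_top(rank_table, count):
    """Take a predetermined number of item numbers from the top of the rank table."""
    return rank_table[:count]


# Hypothetical retrieval results with both distances attached.
results = [
    {"item_number": 1, "concept_distance": 4, "mental_distance": 5.0},
    {"item_number": 2, "concept_distance": 2, "mental_distance": 13.7},
    {"item_number": 3, "concept_distance": 1, "mental_distance": 3.0},
]

rank_table = build_rank_table(results)   # sums are 9.0, 15.7, and 4.0
top_two = select_top(rank_table, 2)
```

Because the two distances are simply added, whichever has the larger numeric range dominates the order; in this made-up data, item 2 is conceptually closer than item 1 but still ranks last because of its large mental distance.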
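[0092]
The data selection part 116 reads a predetermined number of item numbers from the top of the rank table, reads the data items identified by those item numbers from the database 108, and passes them to the output part 118. The output part 118 outputs the data it receives.
[0093]
Like the information retrieval device 100 of Embodiment 1, the information retrieval device 400 of Embodiment 2 can be realized by a general-purpose computer or a portable information terminal and a computer program executed on it. The control structure of a program for realizing the desired functions of the information retrieval device 400 is described below.
[0094]
FIG. 11 is a flowchart of the program executed by the information retrieval device 400 of Embodiment 2. Referring to FIG. 11, when the information retrieval device 400 starts the program, it first acquires the input word in S302 and creates an extended retrieval key from it in S304. In the following S306, it searches the data held in the database 108 using the extended retrieval key and obtains the search results and the keywords associated with the retrieved data items.
[0095]
In the program executed by the information retrieval device 400 of Embodiment 2, control proceeds to S508 after the processing of S306 is complete.
[0096]
In S508, it is determined whether the sum of concept distance and mental distance has been calculated for all of the keywords obtained. If so, control proceeds to S516. If any search result remains for which it has not been calculated, control proceeds to S510.
[0097]
In S510, the concept distance between the input word and the keyword associated with the retrieved data item is calculated. In S512, the mental distance between the input word and the keyword is calculated. In the following S514, the sum of the concept distance and the mental distance is calculated. Control returns to S508.
[0098]
In S516, the retrieved data items are ranked so that results with a smaller sum of concept distance and mental distance rank higher. Control proceeds to S310.
[0099]
In S310, as in the program of Embodiment 1 shown in FIG. 8, a predetermined number of data items are selected in order from the highest rank, and in the following S312 the selected data items are output. The program ends once this control is complete.
[0100]
As described above, the information retrieval device 400 of Embodiment 2 ranks the retrieved information based on both the similarity of attributes and the relatedness of concepts, and outputs the search results by rank. Information highly relevant to the input word both conceptually and in its properties can therefore be output as the search result.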
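[0101]
[Embodiment 3]
Embodiment 2 illustrated a function that determines the priority order of search results from the concept distance and the mental distance. The present invention, however, is not limited to such an embodiment.
[0102]
The information retrieval device of this embodiment additionally calculates the importance of each attribute item from the history of the information output, and calculates the mental distance with the attribute items weighted by the calculated importance.
[0103]
FIG. 12 is a block diagram of the information retrieval device according to Embodiment 3. Referring to FIG. 12, the information retrieval device 600 of Embodiment 3 includes, in addition to the character string acquiring part 102, concept dictionary storing part 104, word extending part 106, database 108, data retrieval part 110, data selection part 116, and output part 118 of the information retrieval device 400 of Embodiment 2 shown in FIG. 9: a history recording part 602, which is connected to the data selection part 116, obtains the item numbers of the data items selected by the data selection part 116, and records them as a history of the data items output; a preference extraction part 604, which is connected to the history recording part 602 and the database 108 and creates preference data indicating the user's preferences; and a weight adjustment part 606, which is connected to the attribute dictionary storing part 412 and the preference extraction part 604 and calculates the importance of each attribute item of the attribute dictionary from the preference data and the attribute information of the attribute dictionary.
[0104]
In place of the mental distance calculation part 404 of Embodiment 2 shown in FIG. 9, the information retrieval device 600 of Embodiment 3 further includes a mental distance calculation part 608, which is connected to the data retrieval part 110, the attribute dictionary storing part 412, the rank calculation part 414, and the weight adjustment part 606 and calculates the mental distance between two words from the information stored in the attribute dictionary storing part 412 and the importance calculated by the weight adjustment part 606.
[0105]
FIG. 13 shows the structure of the history information recorded by the history recording part 602 of Embodiment 3. Referring to FIG. 13, the history information 620 contains a plurality of history entries. Each history entry contains the item number 622 of a data item that the information retrieval device 600 output in the past and the date and time 624 at which the data item was output. The item number 622 is used when the preference extraction part 604 (see FIG. 12) creates the preference data.
[0106]
FIG. 14 shows the structure of the preference data created by the preference extraction part 604. Referring to FIG. 14, each entry of the preference data 640 contains an output keyword 642, which is a keyword associated with data items output so far, and an output frequency 644, which is the number of times data items associated with that keyword have been output. The output frequency 644 is incremented whenever the output keyword 642 is among the keywords associated with a data item that is output. A high output frequency for an output keyword indicates that the user of the information retrieval device 600 tends to like the attributes of that keyword.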
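Building the preference data amounts to counting, for every keyword, how often a data item associated with it appears in the output history. The sketch below assumes the history is a list of (item number, date and time) pairs and that the database can be viewed as a mapping from item number to keywords; the function name and sample data are hypothetical.

```python
from collections import Counter


def build_preference_data(history, keywords_by_item):
    """Count, per keyword, how many times an associated data item was output."""
    frequency = Counter()
    for item_number, _output_time in history:
        for keyword in keywords_by_item.get(item_number, ()):
            frequency[keyword] += 1
    return dict(frequency)


# Hypothetical history entries and database keywords.
history = [
    (1, "2003-06-01 10:00"),
    (2, "2003-06-02 09:30"),
    (1, "2003-06-03 18:45"),
]
keywords_by_item = {
    1: ["hot spring", "travel"],
    2: ["fishing"],
}

preference_data = build_preference_data(history, keywords_by_item)
```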
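[0107]
As stated earlier, the weight adjustment part 606 of FIG. 12 calculates the importance of the attribute items from the preference data created by the preference extraction part 604 and the attribute information stored in the attribute dictionary storing part 412. An example of how the weight adjustment part 606 calculates the importance is described below.
[0108]
Suppose the given preference data contains a total of h output keywords K_m (1 ≤ m ≤ h). Let f_m denote the output frequency of output keyword K_m, and let a_{l,m} be the attribute value, for attribute item A_l, of the word matching output keyword K_m. The importance I_l of attribute item A_l is then calculated by Equation 4 below.
[0109]
[Equation 4]

Consider, for example, the case where the weight adjustment part 606 is given the preference data 640 of FIG. 14 and calculates the importance of each attribute item from the attribute information in the attribute dictionary 440 of FIG. 10. Letting I_1, I_2, and I_3 be the importances of the attribute items 276, 278, and 280, respectively, they are given by
[0110]
[Equation 5]

[0111]
An example of how the mental distance calculation part 608 of Embodiment 3 calculates the mental distance is described below.
[0112]
Let n be the total number of attribute items defined, a_{i,k} the attribute value of word W_i for attribute item A_k (1 ≤ k ≤ n), and I_k the importance of attribute item A_k. The mental distance calculation part 608 calculates, by Equation 6 below, the square of the weighted Euclidean distance between W_i and W_j with the reciprocal of I_k as the weight, and from it calculates the mental distance s(W_i, W_j) between W_i and W_j.
[0113]
[Equation 6]
s(W_i, W_j)^2 = \sum_{k=1}^{n} \frac{1}{I_k} (a_{i,k} - a_{j,k})^2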
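A minimal sketch of the weighted mental distance, using the reciprocal of each attribute item's importance as the weight as the text describes. The importance values are hypothetical, because the equation for computing them is not reproduced in the text; the attribute vectors are the "fishing" and "paraglider" values from Embodiment 2, and the function name is illustrative.

```python
import math


def weighted_mental_distance(w_i, w_j, importance):
    """Weighted Euclidean distance with weight 1 / importance per attribute item."""
    if not (len(w_i) == len(w_j) == len(importance)):
        raise ValueError("vectors and importance list must have the same length")
    if any(value <= 0 for value in importance):
        raise ValueError("importance values must be positive")
    squared = sum(
        (a - b) ** 2 / weight for a, b, weight in zip(w_i, w_j, importance)
    )
    return math.sqrt(squared)


fishing = (10, 2, 6)
paraglider = (0, 10, 1)

# Hypothetical importances: the first attribute item ("relaxed") is valued most.
importance = (4.0, 1.0, 1.0)
distance = weighted_mental_distance(fishing, paraglider, importance)
# squared distance: 100 / 4 + 64 / 1 + 25 / 1 = 114
```

With this weighting, a difference on an attribute item of high importance contributes less to the distance than the same difference on an item of low importance, and with all importances equal to 1 the result reduces to the unweighted distance of Embodiment 2.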

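[0114]
The information retrieval device 600 of Embodiment 3 operates as follows.
[0115]
Referring to FIG. 12, when the information retrieval device 600 starts up, the preference extraction part 604 reads the history information recorded in the history recording part 602. It then reads from the database 108 the keywords associated with the data items whose item numbers appear in the history information. Taking the keywords read as output keywords, it calculates their output frequencies from the history information and creates the preference data. The preference data is passed to the weight adjustment part 606.
[0116]
On receiving the preference data, the weight adjustment part 606 reads from the attribute dictionary storing part 412 the attribute information of the words matching the output keywords in the preference data. It calculates the importance of each attribute item from the preference data and the attribute information read, and passes the importances to the mental distance calculation part 608.
[0117]
Meanwhile, when the user enters a word, the character string acquiring part 102 passes the input word to the word extending part 106. The word extending part 106 creates an extended retrieval key with reference to the concept dictionary stored in the concept dictionary storing part 104 and passes it to the data retrieval part 110. The data retrieval part 110 searches the data stored in the database 108 with the extended retrieval key and creates retrieval result information. It outputs the input word and the retrieval result information, which are passed to the concept distance calculation part 402 and the mental distance calculation part 608.
[0118]
As in Embodiment 2, the concept distance calculation part 402 calculates the concept distance between the concept of the input word and the concept of the keyword of each data item in the retrieval result information, attaches the concept distance to the input word and keyword, and passes them to the rank calculation part 414.
[0119]
Meanwhile, on receiving the input word and the retrieval result information from the data retrieval part 110, the mental distance calculation part 608 calculates the mental distance between the input word and the keyword of each data item listed in the retrieval result information, based on the attribute information stored in the attribute dictionary storing part 412 and the importance of each attribute item supplied by the weight adjustment part 606.
[0120]
Referring to FIG. 12, the mental distance calculation part 608 attaches the calculated mental distance between the input word and the keyword to each entry of the retrieval result information and passes it to the rank calculation part 414.
[0121]
When both conditions are satisfied, namely that retrieval result information containing concept distances has been received from the concept distance calculation part 402 and that retrieval result information containing mental distances has been received from the mental distance calculation part 608, the rank calculation part 414 reads the concept distances and mental distances attached to the retrieval result information and calculates their sum for each data item in the retrieval result information.
[0122]
The rank calculation part 414 sorts the item numbers of the data items in ascending order of the sum of concept distance and mental distance to create a rank table, and passes it to the data selection part 116. The data selection part 116 reads a predetermined number of item numbers from the top of the rank table. It reads the data items with those item numbers from the database 108 and passes them to the output part 118, and also passes the item numbers of the data items read to the history recording part 602. The output part 118 outputs the data it receives, and the history recording part 602 adds the item numbers it receives to the history information, thereby updating it.
[0123]
Like the information retrieval devices of Embodiments 1 and 2, the information retrieval device 600 of Embodiment 3 can be realized by a general-purpose computer or a portable information terminal and a computer program executed on it. The control structure of a program for realizing the desired functions of the information retrieval device 600 is described below.
[0124]
FIG. 15 is a flowchart of the program executed by the information retrieval device 600 of Embodiment 3. Referring to FIG. 15, first, in S702, the recorded history information is read and checked against the database to extract the output keywords. In S704, the history information is checked against the database to calculate the output frequencies of the output keywords extracted in S702. In the following S706, the importance of each attribute item in the attribute dictionary is calculated from the attribute information of the words matching the output keywords extracted in S702 and the output frequencies calculated in S704. Control proceeds to S302.
[0125]
As in the program of Embodiment 2 shown in FIG. 11, the input word is acquired in S302 and an extended retrieval key is created from it in S304. In the following S306, the data held in the database is searched using the extended retrieval key created in S304, and the search results and the keywords associated with the retrieved data items are obtained. Control proceeds to S508.
[0126]
In S508, it is determined whether the sum of the concept distance and the mental distance to the input word has been calculated for all of the keywords obtained. If so, control proceeds to S516. If any search result remains for which it has not been calculated, control proceeds to S510.
[0127]
In S510, the concept distance between the input word and the keyword associated with the retrieved data item is calculated. Control proceeds to S708.
[0128]
In S708, the mental distance between the input word and the keyword is calculated with the reciprocal of the importance of each attribute item calculated in S706 as the weight. In the following S514, the sum of the concept distance and the mental distance is calculated in the same way as in the program of Embodiment 2 shown in FIG. 11. Control returns to S508.
[0129]
In S516, the retrieved data items are sorted in ascending order of the sum of concept distance and mental distance and ranked. Control proceeds to S310, where, as in Embodiment 1 shown in FIG. 8, a predetermined number of data items are selected in order from the top. In the following S312, the selected data items are output. Control proceeds to S710.
[0130]
In S710, the history information is updated based on the item numbers of the data items selected in S310. The program ends once this control is complete.
[0131]
The information retrieval device 600 of Embodiment 3 estimates the user's preferences from the history of the data items output and calculates the importance of the attribute items. An attribute item with a high calculated importance is considered to be a property the user likes. By calculating the mental distance with this importance taken into account, the device estimates which properties of information the user values and can output as the search result information that is highly relevant to the input word with respect to those properties.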
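[0132]
In illustrating Embodiments 1 to 3, one example of obtaining extended words was given: obtaining the words expressing the superordinate and subordinate concepts of the input word, and then obtaining the words expressing the superordinate and subordinate concepts of the words obtained. The method of obtaining extended words is not limited to this, however. For example, all words expressing subordinate concepts of the concept of the input word may be obtained.
[0133]
In illustrating Embodiments 1 to 3, the concept dictionary stored in the concept dictionary storing part was described as indicating superordinate and subordinate relations between the concepts of words, but the relatedness between words indicated by the concept dictionary is not limited to such relations. It may, for example, be something like a thesaurus that associates a word with its synonyms. The concept dictionary may take any form so long as it indicates relatedness between words.
[0134]
The attribute dictionary of Embodiment 1 may be a dictionary of graded attribute values like the attribute dictionary of Embodiment 2 or Embodiment 3. Furthermore, the rank calculation part of Embodiment 1 may calculate the mental distance of Embodiment 2 and rank on the basis of it.
[0135]
The attribute values in the attribute dictionaries of Embodiments 1 to 3 may take any values and be determined by any method, so long as they quantify the properties of words for each kind of property.
[0136]
Embodiments 2 and 3 illustrated calculating the mental distance with the Euclidean distance or the weighted Euclidean distance, but the method of calculating the mental distance is not limited to these. For example, the mental distance may be calculated as the city block distance between the attribute vectors of the words.
[0137]
Embodiments 2 and 3 showed examples of ranking data items by the sum of the mental distance and the concept distance. The method of calculating the value used for ranking is not limited to this, however. For example, the mental distance and the concept distance may each be normalized by multiplying them by predetermined coefficients, and the sum of the normalized mental distance and the normalized concept distance used as the ranking criterion. Furthermore, the coefficients applied to the concept distance and the mental distance may be set according to the user's preference and applied when the two distances are normalized. The product of the mental distance and the concept distance may also be used as the ranking criterion.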
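The alternative ranking criteria mentioned above can be sketched side by side. The coefficient names and values are hypothetical; the text only says that each distance is multiplied by a predetermined (or user-chosen) coefficient, or that the product of the two distances is used.

```python
def weighted_sum_criterion(concept_distance, mental_distance,
                           concept_coefficient=1.0, mental_coefficient=1.0):
    """Sum of the two distances after scaling each by its coefficient."""
    return (concept_coefficient * concept_distance
            + mental_coefficient * mental_distance)


def product_criterion(concept_distance, mental_distance):
    """Product of the two distances."""
    return concept_distance * mental_distance


# Hypothetical coefficients that bring both distances to a comparable scale.
value = weighted_sum_criterion(4, 10.0, concept_coefficient=0.25, mental_coefficient=0.1)
```

One property worth noting about the product: it is 0 whenever either distance is 0, so a keyword identical to the input word (concept distance 0) always ties for the top rank regardless of its mental distance.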
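[0138]
Embodiments 1 to 3 illustrated, as one example, a configuration in which the parts of the information retrieval device are integrated. The present invention is not limited to such embodiments, however. For example, the parts making up the information retrieval device may be divided among two or more housings, provided that they can communicate with one another.
[0139]
As noted above, each of the embodiments illustrated can be realized by a computer and software running on it. Of course, some or all of the functions described may instead be realized in hardware rather than software.
[0140]
FIG. 16 shows the external appearance of a computer system 800 used in the embodiments, and FIG. 17 is a block diagram of the computer system 800. The computer system 800 shown here is only an example; various other configurations are possible.
[0141]
Referring to FIG. 16, the computer system 800 includes a computer 820, a monitor 822, a keyboard 826, and a pad-type pointing device 828. The computer 820 also has a built-in CD-ROM (Compact Disc Read-Only Memory) drive 830.
[0142]
Referring to FIG. 17, the computer system 800 further includes a printer 824 connected to the computer 820, which is not shown in FIG. 16. The computer 820 further includes a bus 846 connected to the CD-ROM drive 830 and, all connected to the bus 846, a central processing unit (CPU) 836, a ROM (Read-Only Memory) 838 storing the boot-up program of the computer system 800 and the like, a RAM (Random Access Memory) 840 providing a work area used by the CPU 836 and a storage area for programs executed by the CPU 836, and a hard disk 834 storing the database, the concept dictionary, the attribute dictionary, and the like.
[0143]
The software that realizes the operation of the information retrieval devices illustrated in Embodiments 1 to 3 is, for example, distributed recorded on a recording medium such as a CD-ROM 842, read into the computer 820 via a reading device such as the CD-ROM drive 830, and stored on the hard disk 834. When the CPU 836 executes the program, it reads the program from the hard disk 834 into the RAM 840 and reads and executes instructions from the address specified by a program counter (not shown). The CPU 836 reads the data to be processed from the hard disk 834 and stores the processing results on the hard disk 834 as well.
[0144]
The operation of the computer system 800 itself is well known, so its details are not repeated here.
[0145]
The form in which the software is distributed is not limited to being fixed on a recording medium as described above. For example, it may be distributed by receiving data from another computer connected via a network. A form of distribution is also possible in which part of the software is stored on the hard disk 834 and the rest is fetched to the hard disk 834 via the network and integrated at run time.
[0146]
Modern programs generally achieve their purpose by using general-purpose functions provided by the computer's operating system (OS) or by so-called third parties, and executing them in a form organized for the desired purpose. Accordingly, even a program (or group of programs) that does not itself contain the general-purpose functions provided by the OS or third parties among the functions illustrated in Embodiments 1 to 3, and only specifies the combination and order in which those general-purpose functions are executed, clearly falls within the technical scope of the present invention so long as it has a control structure that uses them to achieve the desired purpose as a whole.
[0147]
The embodiments disclosed here are merely illustrative, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim of the claims, taking the detailed description of the invention into consideration, and includes all modifications within the meaning and scope equivalent to the wording of the claims.
[0148]
[Effects of the Invention]
As described above, according to the first aspect of the present invention, information can be retrieved from a wide range of information, and information highly relevant to the input word can be retrieved. Both precision and recall can therefore be improved.
[0149]
In addition, by verifying results against concrete criteria and evaluating information from multiple angles, information estimated to be highly relevant to the input word can be retrieved from a wide range of information.
[0150]
Furthermore, information highly relevant to the input word with respect to the properties the user values can be retrieved. It also becomes possible to estimate which properties of information the user values, so that information evoking in the user an image similar to the one evoked by the input word can be retrieved.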
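[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of the information retrieval device 100 according to Embodiment 1 of the present invention.
[FIG. 2] A schematic diagram showing the relatedness of word concepts stored in the concept dictionary storing part 104 according to the embodiments of the present invention.
[FIG. 3] A diagram showing the structure of the concept dictionary stored in the concept dictionary storing part 104 according to the embodiments of the present invention.
[FIG. 4] A diagram showing the structure of the extended retrieval key created by the word extending part 106 according to Embodiment 1 of the present invention.
[FIG. 5] A diagram showing the structure of the data held in the database 108 according to the embodiments of the present invention.
[FIG. 6] A diagram showing the structure of the retrieval result information created by the data retrieval part 110 according to Embodiment 1 of the present invention.
[FIG. 7] A diagram showing the structure of the attribute dictionary stored in the attribute dictionary storing part 112 according to Embodiment 1 of the present invention.
[FIG. 8] A flowchart of the program executed by the information retrieval device 100 according to Embodiment 1 of the present invention.
[FIG. 9] A block diagram showing the configuration of the information retrieval device 400 according to Embodiment 2 of the present invention.
[FIG. 10] A diagram showing the structure of the attribute dictionary stored by the attribute dictionary storing part 412 according to Embodiment 2 of the present invention.
[FIG. 11] A flowchart of the program executed by the information retrieval device 400 according to Embodiment 2 of the present invention.
[FIG. 12] A block diagram showing the configuration of the information retrieval device 600 according to Embodiment 3 of the present invention.
[FIG. 13] A diagram showing the structure of the history information recorded by the history recording part 602 according to Embodiment 3 of the present invention.
[FIG. 14] A diagram showing the structure of the preference data created by the preference extraction part 604 according to Embodiment 3 of the present invention.
[FIG. 15] A flowchart of the program executed by the information retrieval device 600 according to Embodiment 3 of the present invention.
[FIG. 16] An external view of the computer system 800 used in the embodiments of the present invention.
[FIG. 17] A block diagram showing the configuration of the computer system 800 used in the embodiments of the present invention.
[FIG. 18] A schematic diagram of dictionary data in the prior art.
[FIG. 19] A diagram showing an example of the information obtained in the prior art when information is searched for using the words in the dictionary data as search keys.
[Explanation of Reference Numerals] 100, 400, 600: information retrieval device; 102: character string acquiring part; 104: concept dictionary storing part; 106: word extending part; 108: database; 110: data retrieval part; 112, 412: attribute dictionary storing part; 114, 414: rank calculation part; 116: data selection part; 118: output part; 160: concept dictionary; 260, 440: attribute dictionary; 402: concept distance calculation part; 404, 608: mental distance calculation part; 602: history recording part; 604: preference extraction part; 606: weight adjustment part; 620: history information; 640: preference data; 800: computer system; 820: computer; 822: monitor; 824: printer; 826: keyboard; 828: pad-type pointing device; 830: CD-ROM drive; 834: hard disk; 836: CPU; 838: ROM; 840: RAM; 842: CD-ROM; 846: bus
[0001]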
BACKGROUND OF THE INVENTION
The present invention relates to an information search apparatus for searching for predetermined information from a plurality of information. In particular, the present invention relates to an information search apparatus that performs a wide range search for an inquiry and selects and outputs information suitable for a user.
[0002]
[Prior art]
Currently, as a method of searching for information, a method of searching for information including a word that matches an input word is generally used. This information retrieval method obtains a search result with few omissions when a user comes up with a word that expresses the content of desired information in a simple manner and there is a large amount of information including the word that the user has come up with. Can do.
[0003]
However, if the user cannot come up with a word that expresses the desired information in a straightforward manner and enters a word that is inappropriate for obtaining the desired information, the obtained information is also inappropriate. In addition, when there is only a small amount of information including a word that the user has come up with, the possibility that the information desired by the user is included in the small amount of information is low. In these cases, the user is required to repeat the search by inputting another word associated with the word that has been conceived until a satisfactory search result is obtained.
[0004]
In order to solve such a problem, Japanese Patent Application Laid-Open No. H10-228561 discloses a technique for describing a semantic relationship between words in a hierarchical structure and performing a document search using the information. The search method described in Patent Literature 1 refers to dictionary data that stores upper and lower relationships between concepts expressed by words, and expresses a concept positioned below the word indicated by the input character string. This is a search method for extracting a word to be searched and searching for a document using the extracted word as a search key.
[0005]
FIG. 18 is a schematic diagram of dictionary data that stores upper and lower relationships between concepts expressed by words. Referring to FIG. 18, in this tree diagram,

words

902, 904,..., 918 are respectively arranged at nodes of the tree diagram. A word expressing a superordinate concept and a word expressing a subordinate concept belonging to the superordinate concept are connected by a path 900 indicating that the concepts expressed by these words are related to each other.
[0006]
In the document search method described in Patent Document 1, for example, when a character string indicating the word “fishing” is input, not only the word 908 “fishing” but also “carp fishing”, which is a word expressing its subordinate concept. The document is searched using the word 916 and the word 918 “mountain stream fishing” as search keys. By performing a search in this manner, more specific information can be found out from a large amount of documents.
[0007]
Further, in Patent Document 1, a word expressing a concept positioned higher than the word indicated by the input character string is extracted, and a concept positioned lower than the concept expressed by the extracted word or the extracted word is expressed. A search method for searching a document using a search word as a search key is also disclosed. For example, when a character string indicating the word “fishing” is input, referring to FIG. 18, not only the word 908 “fishing”, but also the word 914 “outdoor” that is a word expressing its superordinate concept; Furthermore, the word “recreation” 902 expressing a higher level concept, the words “camp” 910, “travel” 906, “hot spring” 912, “gourmet” 914, etc. expressing the lower level concept of the concept expressed by these words A document is searched using a word as a search key. By performing a search in this way, relevant information can be searched over a wider range.
[0008]
[Patent Document 1]
JP-A-4-10062
[Problems to be solved by the invention]
When information is searched for using the method disclosed in Patent Document 1, the following problems arise. For example, assume that the list of information obtained from the database by searching with any of the words 902 to 918 in the dictionary data shown in FIG. 18 as a search key is as shown in FIG. 19. Referring to FIG. 19, this list consists of information 940, 942, and 944 and search keys 946, 948, and 950 for obtaining that information.
[0009]
With the search method described in Patent Document 1, searching with words that express concepts subordinate to the concept expressed by a given word makes it possible to find more specific information in a large amount of information. However, there are cases in which a search using words that express subordinate concepts cannot be expected to produce sufficient results.
[0010]
For example, assume that the word indicated by the character string input by the user is the word 908 “fishing”. Referring to FIG. 18, the words expressing subordinate concepts of the word 908 “fishing” are the word 916 “carp fishing” and the word 918 “mountain stream fishing”. In this case, the database stores information 940 “camping equipment bargain information”, which the user may regard as related to “fishing”, but this information cannot be found. Thus, information may be omitted from the search results even though it is held in the database and the user may consider it relevant.
[0011]
Conversely, when information is searched for using words expressing superordinate concepts together with words expressing their subordinate concepts, the retrieved information, even if logically close to the information the user desires, may evoke an image completely different from the image the user holds of the content of the desired information.
[0012]
For example, assume that the word indicated by the character string input by the user is the word “fishing”. In this search method, a search is performed using words expressing superordinate concepts and words expressing their subordinate concepts as search keys. Then, referring to FIG. 19, information 940 “camping equipment bargain information”, information 942 “suburban hot spring facility”, and information 944 “exploration club” are obtained. However, there are cases where the user has a “relaxed” image of “fishing” and “hot spring” but does not have a “relaxed” image of “camping” or “exploration”. In such a case, the relationship between the word “fishing” entered by the user and the information 940 “camping equipment bargain information” obtained as a search result, and the relationship between the word “fishing” and the information 944 “exploration club”, are unclear to the user, who may feel that this information is useless.
[0013]
Therefore, an object of the present invention is to provide an information retrieval apparatus that retrieves information related to a concept expressed by a word input by a user and that has both high retrieval accuracy and high reproducibility.
[0014]
Another object of the present invention is to provide an information search apparatus that searches for information related to a concept expressed by a word that a user inputs for searching, and that can retrieve, from a wide range of information, information highly relevant to the input word.
[0015]
Still another object of the present invention is to provide an information search apparatus that searches for information related to a concept expressed by a word that a user inputs for searching, and that can retrieve, from a wide range of information, information presumed to be highly relevant to the input word.
[0016]
Still another object of the present invention is to provide an information search apparatus that searches for information related to a concept expressed by a word that a user inputs for searching, and that can retrieve, from a wide range of information, information presumed to be highly relevant to the input word with respect to the nature of information that the user emphasizes.
[0017]
Still another object of the present invention is to provide an information search apparatus that searches for information related to a concept expressed by a word that a user inputs for searching, and that, by estimating the nature of information that the user emphasizes, can retrieve from a wide range of information the information presumed to be highly relevant to the input word.
[0018]
An additional object of the present invention is to provide an information search apparatus that searches for information related to a concept expressed by a word that a user inputs for searching, and that can retrieve, from a wide range of information, information that evokes for the user an image similar to the image that the input word evokes.
[0019]
[Means for Solving the Problems]
An information search apparatus according to a first aspect of the present invention includes: a character string acquisition unit for acquiring a character string indicating a first word; concept information holding means for holding, for a plurality of words, concept information indicating the hierarchical relationships between word concepts; word collection means for collecting, from the concept information holding means and based on the concept information, a second word expressing a concept related to the concept expressed by the first word; a database for holding information to be searched; extracting means for extracting from the database, using the first word and the second word as search keys, a keyword that matches either the first word or the second word and the information held in the database in correspondence with the matching keyword; means for acquiring information indicating word attributes for a plurality of words; rank determining means for determining the priority of the information extracted by the extracting means, based on the similarity between the attribute of the first word and the attribute of the matching keyword; and output means for outputting the information extracted by the extracting means according to the priority determined by the rank determining means.
[0020]
By searching for information using as search keys not only the first word indicated by the character string input by the user of this information search apparatus but also the second word conceptually related to the first word, information can be retrieved from a wide range of information. This improves the reproducibility of the search. Further, by verifying the search results against a separate criterion, namely similarity to the attribute of the first word, information highly relevant to the word input by the user can be retrieved. The search accuracy is therefore improved.
[0021]
Preferably, the rank determining means includes: score calculation means for calculating a score indicating the similarity between the attribute of the first word and the attribute of the matching keyword, based on the information indicating the attribute of the first word and the information indicating the attribute of the matching keyword acquired by the means for acquiring information indicating word attributes; and determining means for determining the priority of the information extracted by the extracting means based on the score calculated by the score calculation means.
[0022]
By calculating a score indicating the similarity between the attribute of the first word and the attribute of the keyword, the priority order can be determined based on specific criteria. Therefore, it is possible to search for information having high relevance with the word input by the user.
[0023]
Preferably, the means for acquiring information indicating word attributes includes attribute information holding means for holding, for each of the plurality of words, attribute information that indicates the attributes of the word by an attribute value set in advance for each attribute. The rank determining means includes: mental distance calculating means for calculating a mental distance between the first word and the matching keyword based on the attribute information relating to the first word and the attribute information relating to the matching keyword held in the attribute information holding means; and determining means for determining the priority of the information extracted by the extracting means with reference to the mental distance calculated by the mental distance calculating means.
[0024]
By calculating the mental distance between the attribute of the first word and the attribute of the keyword, it is possible to determine the priority based on a more specific criterion. Therefore, it is possible to search for information having high relevance with the word input by the user.
[0025]
More preferably, the rank determining means further includes: conceptual distance calculating means for calculating a conceptual distance between the first word and the matching keyword based on the concept information relating to the first word and the concept information relating to the matching keyword held in the concept information holding means; and means for creating, for each combination of the first word and a matching keyword, a reference value that integrates the mental distance and the conceptual distance. The determining means includes means for determining the priority of the information extracted by the extracting means on the basis of the reference value that integrates the mental distance and the conceptual distance.
[0026]
The relationship between the concept expressed by the first word and the concept expressed by the keyword is made concrete by the conceptual distance, and the information is then evaluated from multiple angles using both the conceptual distance and the mental distance. Thus, information presumed to be highly relevant to the word input by the user can be retrieved from a wide range of information.
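As a rough illustration of how a conceptual distance and an integrated reference value might be computed, the following Python sketch measures the conceptual distance as the number of paths between two words in a concept tree and combines it with a mental distance by a weighted sum. Both choices are assumptions made for illustration; this passage does not specify the actual formulas. The tree, the `weight` parameter, and the function names are hypothetical.

```python
from collections import deque

# Undirected links of a small concept tree (assumed for illustration).
NEIGHBORS = {
    "recreation": ["outdoor", "travel"],
    "outdoor": ["recreation", "fishing", "paragliding"],
    "travel": ["recreation", "hot spring"],
    "fishing": ["outdoor"],
    "paragliding": ["outdoor"],
    "hot spring": ["travel"],
}


def concept_distance(first_word, keyword):
    """Number of paths on the shortest route between the two words."""
    queue = deque([(first_word, 0)])
    seen = {first_word}
    while queue:
        word, distance = queue.popleft()
        if word == keyword:
            return distance
        for neighbor in NEIGHBORS.get(word, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # the two words are not connected


def reference_value(concept_d, mental_d, weight=0.5):
    """Weighted sum of the two distances; smaller means higher priority."""
    return weight * concept_d + (1.0 - weight) * mental_d


print(concept_distance("fishing", "paragliding"))
print(concept_distance("fishing", "hot spring"))
```

With this tree, “paragliding” is conceptually closer to “fishing” than “hot spring” is, so a keyword such as “hot spring” can only rank first if its mental distance is sufficiently small.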
[0027]
The mental distance calculating means may include: importance setting means for setting the importance of each attribute; and means for calculating the mental distance between the first word and the matching keyword based on the attribute information relating to the first word and the attribute information relating to the matching keyword held in the attribute information holding means, and on the importance of each attribute set by the importance setting means.
[0028]
The nature of the information that the user attaches importance to can be embodied by setting by the importance setting means. Therefore, by calculating the mental distance in consideration of this setting, it is possible to search for information having a high relevance to the word input by the user with respect to the property emphasized by the user.
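One way to picture an importance-weighted mental distance is a weighted sum of per-attribute differences, sketched below in Python. The weighted absolute difference, the attribute names, and the numeric values are all assumptions for illustration; the text here states only that the distance depends on the attribute information and the importance of each attribute.

```python
def mental_distance(first_attrs, keyword_attrs, importance=None):
    """Sum of per-attribute differences, each scaled by its importance.

    With no importance given, every attribute counts equally.
    """
    total = 0.0
    for name, value in first_attrs.items():
        weight = 1.0 if importance is None else importance.get(name, 0.0)
        total += weight * abs(value - keyword_attrs[name])
    return total


# Hypothetical attribute values between 0 and 1.
fishing = {"loosely": 1.0, "Japanese style": 0.5}
hot_spring = {"loosely": 0.5, "Japanese style": 0.25}

print(mental_distance(fishing, hot_spring))
print(mental_distance(fishing, hot_spring, {"loosely": 2.0, "Japanese style": 0.0}))
```

Raising the importance of one attribute makes differences in that attribute dominate the distance, which is how the property the user emphasizes would influence the ranking in this sketch.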
[0029]
The importance setting means may include history recording means for recording a history of the information output by the output means, and means for setting the importance of each attribute based on the history recorded by the history recording means.
[0030]
The means for setting the importance of each attribute may include: preference estimating means for estimating the user's preference for information by collating the history recorded by the history recording means with the database; and means for setting the importance of each attribute based on the preference estimated by the preference estimating means and the attribute information held by the attribute information holding means.
[0031]
The preference estimating means may include: means for collating the history recorded by the history recording means with the database and calculating, for each keyword, the frequency with which the information held in the database in correspondence with that keyword has been output; and means for estimating the user's preference for information based on the frequency for each keyword and the keyword attributes held in the attribute information holding means.
[0032]
By calculating the importance from information output in the past, the nature of the information that the user emphasizes can be estimated, and information highly relevant to the input word with respect to that nature can be retrieved. It is therefore possible to retrieve information that evokes for the user an image similar to the image that the input word evokes.
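The history-based estimation described above can be sketched in Python as follows. The sketch counts how often information for each keyword was output and turns those counts into per-attribute importance by a frequency-weighted, normalized sum. The normalization, the attribute name “active”, and the sample data are assumptions for illustration, not the method stated in the text.

```python
from collections import Counter


def estimate_importance(history, item_keywords, attributes):
    """Estimate per-attribute importance from previously output items."""
    # Frequency with which information for each keyword was output.
    frequency = Counter(
        keyword for item in history for keyword in item_keywords[item]
    )
    # Preference per attribute: frequency-weighted sum of attribute values.
    preference = Counter()
    for keyword, count in frequency.items():
        for name, value in attributes[keyword].items():
            preference[name] += count * value
    total = sum(preference.values())
    if total == 0:
        return {name: 0.0 for name in preference}
    return {name: value / total for name, value in preference.items()}


# Hypothetical history of output item numbers and supporting tables.
history = [2, 2, 2, 1]
item_keywords = {1: ["camp"], 2: ["hot spring"]}
attributes = {
    "camp": {"loosely": 0, "active": 1},
    "hot spring": {"loosely": 1, "active": 0},
}

print(estimate_importance(history, item_keywords, attributes))
```

Because hot spring information was output three times out of four in this sample, the attribute it carries receives three quarters of the total importance.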
[0033]
The output means may include means for outputting the extracted information, in the order indicated by the priority determined by the rank determining means, up to a predetermined rank.
[0034]
By narrowing down the information to be output, information that seems to be of particular interest to the user can be presented as a search result. Therefore, the information search result is enriched for the user.
[0035]
An information search method according to a second aspect of the present invention includes the steps of: acquiring a character string indicating a first word; collecting a second word expressing a concept related to the concept expressed by the first word indicated by the acquired character string; searching, using the first word and the second word as search keys, for information held in a database in correspondence with keywords; determining the priority of the information that is the search result to be output, based on the similarity between the attribute of the first word and the attribute of the keyword that corresponds to the information obtained in the searching step and that matched a search key; and outputting the information that is the search result according to the priority determined in the determining step.
[0036]
By using this information search method when searching for information, it is possible to search for information having high relevance with the input word from a wide range of information.
[0037]
When the information search program according to the third aspect of the present invention is executed on a computer, the information search program causes the computer to operate as the information search device according to the first aspect of the present invention.
[0038]
By executing this information search program, the operation and effect of the invention according to the first aspect described above can be realized by a computer.
[0039]
A recording medium according to a fourth aspect of the present invention is a computer-readable recording medium on which an information search program according to the third aspect of the present invention is recorded.
[0040]
By reading and executing the information retrieval program recorded on the recording medium with a computer, the operation and effect of the invention according to the first aspect described above can be realized.
[0041]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings used for the following description, the same parts are denoted by the same reference numerals. Their names and functions are also the same. Therefore, detailed description thereof will not be repeated.
[0042]
[Embodiment 1]
An outline of the information search apparatus according to Embodiment 1 of the present invention will be described. As in the technique described in Patent Document 1, the information search apparatus according to the first embodiment collects words expressing concepts related to the concept expressed by a given word (hereinafter, this concept is referred to as the “concept of the word”). Information is then searched for using such words as search keys.
[0043]
However, the information search apparatus according to the first embodiment further determines the priority order for outputting the search result based on the property of the word (hereinafter, this property is referred to as “word attribute”). The search result is output according to the determined priority order.
[0044]
The attribute of a word is a property whose relevance is difficult to express through logical concepts, such as an emotional property (for example, the image that the user of the information search apparatus according to the present embodiment holds of the word or of the object it expresses) or the historical or regional background of the object expressed by the word. Therefore, the information search apparatus according to the first embodiment can perform a search using words of related concepts as search keys and, when multiple pieces of information are obtained, can preferentially output the information most likely to give an impression similar to that of the information the user desires.
[0045]
FIG. 1 shows the configuration of the information search apparatus according to the first embodiment in the form of a block diagram. Referring to FIG. 1, an information search device 100 includes: a character string acquisition unit 102 that includes an input device such as a keyboard, a mouse, or a touch panel and acquires a character string indicating a search key; a concept dictionary storage unit 104 that stores a concept dictionary indicating the hierarchical relationships between word concepts; and a word expansion unit 106 that is connected to the character string acquisition unit 102 and the concept dictionary storage unit 104, acquires from the concept dictionary storage unit 104 words whose concepts are related to the word indicated by the character string acquired by the character string acquisition unit 102 (hereinafter, this word is referred to as the “input word”, and the related words as “extended words”), and creates an extended search key composed of the input word and the extended words.
[0046]
The information search apparatus 100 further includes: a database 108 that stores the data to be searched in association with the keywords used when searching the data; and a data search unit 110 that is connected to the word expansion unit 106 and the database 108 and searches the data held in the database 108 using the extended search key given from the word expansion unit 106.
[0047]
The information search apparatus 100 further includes: an attribute dictionary storage unit 112 that stores an attribute dictionary composed of attribute information indicating the attributes of words; a rank calculation unit 114 that is connected to the data search unit 110 and the attribute dictionary storage unit 112, verifies the search results based on the attribute information stored in the attribute dictionary storage unit 112, and assigns priorities to the data that are the search results; a data selection unit 116 that is connected to the database 108 and the rank calculation unit 114, acquires from the rank calculation unit 114 the priorities of the data that are the search results, and acquires data from the database 108 in accordance with the acquired priorities; and an output unit 118 that outputs the data acquired by the data selection unit 116.
[0048]
FIG. 2 shows the relevance between the concepts of the words stored in the concept dictionary storage unit 104. Referring to FIG. 2, the relevance between word concepts is typically expressed by a tree diagram. In this tree diagram, words 132, 134, ..., 148, ... are arranged at the respective nodes. A word that expresses a superordinate concept and a word that expresses a subordinate concept belonging to that superordinate concept are connected by a path 130, which indicates that the concepts expressed by these words are related to each other. For example, the concept of the word 138 “fishing” is a superordinate concept of the concept of the word 146 “fish fishing” and the concept of the word 148 “mountain fishing”. Further, the concept of the word 138 “fishing” is a subordinate concept of the concept of the word 134 “outdoor”. Further, for example, the relevance between the concept of the word 138 “fishing” and the concept of the word 140 “paragliding” is higher than the relevance between the concept of the word 138 “fishing” and the concept of the word 142 “hot spring”.
[0049]
FIG. 3 shows the configuration of the concept dictionary stored in the concept dictionary storage unit 104. Referring to FIG. 3, the concept dictionary 160 includes a number of items 162, 164, .... Each item includes a word 186, a unique word number 188 for identifying the word, and concept information 190 indicating the relationships of the concept expressed by the word 186. The concept information 190 is information corresponding to the path 130 in the tree diagram shown in FIG. 2. The concept information 190 includes a word number 192 of a word expressing a higher concept and a word number 194 of a word expressing a lower concept.
[0050]
The relevance of the concept expressed by the words described in each item of the concept dictionary 160 is indicated by the word number 192 of the higher concept word and the word number 194 of the lower concept word stored in the concept information 190. For example, the word number of the word expressing the superordinate concept of the word “fishing” described in the item 168 is “00123”. The word with the word number “00123” is the word “outdoor” described in the item 164. That is, the word “outdoor” is a word expressing the superordinate concept of the word “fishing”. Conversely, the word numbers of the words expressing the subordinate concept of the word “outdoor” of the word number “00123” are “01734” and “02495”. That is, the word expressing the subordinate concept of the word “outdoor” includes the word “paragliding” in addition to the word “fishing”.
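The item layout of the concept dictionary can be pictured with the following minimal Python sketch. The word number “00123” for “outdoor” and its two lower-concept numbers come from the description above; which of the two numbers belongs to “fishing” and which to “paragliding” is an assumption, as are the class and function names. The upper concept of “outdoor” is left empty here because the text does not give its word number.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptItem:
    word: str                                   # word 186
    number: str                                 # word number 188
    upper: list = field(default_factory=list)   # word numbers 192
    lower: list = field(default_factory=list)   # word numbers 194


# Assumed assignment: "01734" -> "fishing", "02495" -> "paragliding".
ITEMS = [
    ConceptItem("outdoor", "00123", [], ["01734", "02495"]),
    ConceptItem("fishing", "01734", ["00123"], []),
    ConceptItem("paragliding", "02495", ["00123"], []),
]
BY_NUMBER = {item.number: item for item in ITEMS}
BY_WORD = {item.word: item for item in ITEMS}


def upper_words(word):
    """Resolve the upper-concept word numbers of `word` to words."""
    return [BY_NUMBER[number].word for number in BY_WORD[word].upper]


def lower_words(word):
    """Resolve the lower-concept word numbers of `word` to words."""
    return [BY_NUMBER[number].word for number in BY_WORD[word].lower]


print(upper_words("fishing"))
print(lower_words("outdoor"))
```

Storing word numbers rather than words in the concept information keeps each item small and lets the same lookup table resolve links in both directions.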
[0051]
FIG. 4 shows a configuration of an extended search key created by the word expansion unit 106 based on the concept dictionary 160 shown in FIG. Referring to FIG. 4, extended search key 200 includes an input word 202 and an extended word 204. The input word 202 is distinguished from the extended word 204 because the rank calculation unit 114 shown in FIG. 1 uses it for ranking search results in the subsequent operation.
[0052]
FIG. 5 shows an example of data held in the database 108. Referring to FIG. 5, data 210 held in database 108 includes a plurality of data items 212, 214, 216, 218, .... Each data item includes stored data 220, an item number 222 for identifying the data item, and a keyword 224 that the data search unit 110 refers to when searching for data. The word selected as the keyword 224 is a word related to the content of the data 220. For example, it may be a word that briefly indicates the content of the data 220, or it may be a word extracted from those appearing in the data 220.
[0053]
The database shown in FIG. 5 is shown in a format that holds keywords together with data, but this is for convenience of explanation and is not limited to this format.
[0054]
The data search unit 110 shown in FIG. 1 has a function of searching the database 108 for data items that include the same word as a word included in the extended search key given from the word expansion unit 106, and of creating search result information to be given to the rank calculation unit 114. FIG. 6 shows an example of search result information created by the data search unit 110 and given to the rank calculation unit 114. Referring to FIG. 6, search result information 240 includes a plurality of items 242, 244, 246, .... Each of these includes the item number 248 of a data item that is a search result and the keyword 250 that matched the extended search key at the time of the search.
[0055]
FIG. 7 shows the configuration of the attribute dictionary stored in the attribute dictionary storage unit 112 (see FIG. 1). Referring to FIG. 7, the attribute dictionary 260 includes a number of items 262, 264, 266, 268, .... Each item includes a word 270, a word number 272 for identifying the word, and attribute information 274 indicating the attributes of the word. The attribute information 274 includes attribute values obtained by quantifying the attributes of the word for attribute items 276, 278, 280, .... In the attribute dictionary 260 shown in FIG. 7, an attribute value “1” is given when a word has the attribute shown in the attribute item, and an attribute value “0” is given when it does not. These attribute values may be set based on a survey, or may be set by the user himself or herself.
[0056]
The rank calculation unit 114 shown in FIG. 1 has a function of referring to an attribute dictionary stored in the attribute dictionary storage unit 112 on the basis of given search result information and assigning priorities to data items to be output. An example of a method for assigning priorities to data items output by the rank calculation unit 114 will be described.
[0057]
For example, assume that the attribute dictionary stored in the attribute dictionary storage unit 112 is the attribute dictionary 260 shown in FIG. 7, that the word “fishing” is given to the rank calculation unit 114 as the input word, and that the search result information 240 shown in FIG. 6 is given as the search result information. At this time, the keyword of the item 244 described in the search result information 240 shown in FIG. 6 is “hot spring”. Referring to FIG. 7, the item 262 including the word “hot spring” and the item 266 including the input word “fishing” have matching attribute values for the attribute item “loosely” 276 and for the attribute item “Japanese style” 280. Since two attribute values match, the score of the item 244 of the search result information 240 shown in FIG. 6 is 2. The priority order of the data items to be output is determined in descending order of the scores calculated in this way.
[0058]
With reference to FIG. 1 to FIG. 7, the information search device 100 according to the present embodiment operates as follows.
[0059]
Referring to FIG. 1, in response to the user inputting an input word using character string acquisition unit 102, character string acquisition unit 102 provides the input word to word expansion unit 106.
[0060]
The word expansion unit 106, given the input word, searches the concept dictionary 160 shown in FIG. 3, which is stored in the concept dictionary storage unit 104, for the item containing the input word. Next, the word expansion unit 106 refers to the concept information 190 of the item in which the input word is described, and acquires the upper concept words and lower concept words of the input word as extended words. The word expansion unit 106 then refers to the concept information 190 of each acquired extended word and repeats, a predetermined number of times (three times in the present embodiment), the operation of acquiring the upper concept words and lower concept words of the extended words, thereby acquiring further extended words. It creates an extended search key from the input word and the acquired extended words. The created extended search key is given to the data search unit 110.
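The expansion procedure can be sketched in Python as a level-by-level traversal of the concept links. The sketch assumes that the first acquisition from the input word is followed by `repeats` further passes (three by default); the text could also be read as three passes in total. The toy concept table, including the placement of “hot spring” under “travel”, and the function name are illustrative assumptions.

```python
# word -> (upper-concept words, lower-concept words); toy data, assumed.
CONCEPTS = {
    "recreation": ([], ["outdoor", "travel"]),
    "outdoor": (["recreation"], ["fishing", "paragliding"]),
    "travel": (["recreation"], ["hot spring"]),
    "fishing": (["outdoor"], []),
    "paragliding": (["outdoor"], []),
    "hot spring": (["travel"], []),
}


def expand(input_word, repeats=3):
    """Build an extended search key from upper and lower concept words."""
    extended = []             # extended words in order of discovery
    seen = {input_word}
    frontier = [input_word]
    # One initial pass from the input word, then `repeats` further passes.
    for _ in range(1 + repeats):
        next_frontier = []
        for word in frontier:
            upper, lower = CONCEPTS.get(word, ([], []))
            for related in upper + lower:
                if related not in seen:
                    seen.add(related)
                    extended.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    # The input word is kept apart from the extended words, as in FIG. 4.
    return {"input_word": input_word, "extended_words": extended}


key = expand("fishing")
print(key)
```

Keeping the input word in its own field mirrors the extended search key 200, where the rank calculation unit later needs to know which word the user actually entered.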
[0061]
The data search unit 110 searches the data stored in the database 108 using the given extended search key. The data search unit 110 reads from the database 108 the item number of each data item that includes a keyword matching one of the words included in the extended search key, together with the matching keyword.
[0062]
For example, assume that the data 210 shown in FIG. 5 is stored in the database 108 and the extended search key 200 shown in FIG. 4 is given to the data search unit 110. At this time, the data search unit 110 reads out, as the search results, the item numbers 222 and the keywords 224 of the data items 212, 214, and 216, which include keywords that match words included in the extended search key 200. The data search unit 110 creates search result information based on the read item numbers and keywords. The created search result information is given to the rank calculation unit 114 together with the input word.
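The keyword matching performed by the data search unit can be illustrated with the following Python sketch, which returns pairs of item number and matching keyword in the spirit of the search result information of FIG. 6. The database records, item numbers, and extended search key shown here are invented for illustration and do not reproduce FIG. 4 or FIG. 5.

```python
# Toy stand-in for the data held in the database. All records are invented.
DATABASE = [
    {"item": 1, "keywords": ["camp", "equipment"],
     "data": "camping equipment bargain information"},
    {"item": 2, "keywords": ["hot spring"],
     "data": "suburban hot spring facility"},
    {"item": 3, "keywords": ["fishing"], "data": "riverside fishing spots"},
    {"item": 4, "keywords": ["cooking"], "data": "weeknight recipes"},
]

EXTENDED_KEY = {
    "input_word": "fishing",
    "extended_words": ["outdoor", "camp", "hot spring"],
}


def search(database, extended_key):
    """Return (item number, matching keyword) pairs for matching items."""
    words = {extended_key["input_word"], *extended_key["extended_words"]}
    results = []
    for record in database:
        for keyword in record["keywords"]:
            if keyword in words:
                results.append((record["item"], keyword))
                break  # this sketch records one matching keyword per item
    return results


search_result_information = search(DATABASE, EXTENDED_KEY)
print(search_result_information)
```

Only the item number and keyword are carried forward at this stage; the stored data itself is read later, once the ranking has decided which items to output.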
[0063]
In response to being given the input word and the search result information, the rank calculation unit 114 reads the attribute information of the input word and the attribute information of the keywords included in the search result information from the attribute dictionary storage unit 112. For each attribute item of the read attribute information, the rank calculation unit 114 compares the attribute value of the input word with the attribute value of the keyword, and gives the corresponding item of the search result information one point for each attribute item whose values match. This operation is performed for all keywords included in the search result information, so that every item of the search result information is scored. The rank calculation unit 114 creates a rank table in which the item numbers described in the search result information are arranged in descending order of the calculated score, and gives the rank table to the data selection unit 116.
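The scoring and rank table creation can be sketched in Python as follows. The attribute items “loosely” and “Japanese style” appear in the description of FIG. 7; the third item “outdoorsy”, all attribute values, and the item numbers are hypothetical. The sketch counts every attribute item whose values are equal, including items where both values are 0, which is one reading of the text.

```python
ATTRIBUTE_ITEMS = ["loosely", "outdoorsy", "Japanese style"]

# Hypothetical binary attribute values (1 = has the attribute, 0 = does not).
ATTRIBUTES = {
    "fishing": {"loosely": 1, "outdoorsy": 1, "Japanese style": 1},
    "hot spring": {"loosely": 1, "outdoorsy": 0, "Japanese style": 1},
    "camp": {"loosely": 0, "outdoorsy": 1, "Japanese style": 0},
}


def score(input_word, keyword):
    """Number of attribute items whose values match for the two words."""
    first, second = ATTRIBUTES[input_word], ATTRIBUTES[keyword]
    return sum(1 for item in ATTRIBUTE_ITEMS if first[item] == second[item])


def rank_table(input_word, search_results):
    """Item numbers arranged in descending order of score."""
    scored = [(score(input_word, keyword), item)
              for item, keyword in search_results]
    # sort() is stable, so items with equal scores keep their search order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


results = [(1, "camp"), (2, "hot spring"), (3, "fishing")]
print(score("fishing", "hot spring"))
print(rank_table("fishing", results))
```

With these values “hot spring” matches “fishing” on two attribute items, in line with the worked example for item 244, and therefore ranks above “camp”.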
[0064]
The data selection unit 116 reads a predetermined number of item numbers from the top of the given rank table. The data selection unit 116 further reads out the data item of the read item number from the database 108 and gives the read data item to the output unit 118. The output unit 118 outputs the given data.
[0065]
The information search device 100 according to the first embodiment can be realized by a general computer or a portable information terminal device and a computer program executed on them. Hereinafter, a control structure of a program for realizing a desired function related to the information search apparatus 100 will be described.
[0066]
FIG. 8 shows a flowchart of a program executed by the information search apparatus 100. Referring to FIG. 8, when information retrieval apparatus 100 starts a program, control proceeds to step 302 (hereinafter, step is simply referred to as “S”). In S302, the information search apparatus 100 acquires a character string that serves as a search key. Control proceeds to S304.
[0067]
In S304, the acquired search string is used as an input word, and an extended search key is created based on the input word. Control proceeds to S306.
[0068]
In S306, the data stored in the database is searched using the input word and the extended search key created in S304, and the item number of each data item that is a search result and the keyword associated with that data item are acquired. Control proceeds to S308.
[0069]
In S308, the attribute information of the keyword acquired by the control in S306 is compared with the attribute information of the input word, and each data item is scored. Control proceeds to S310.
[0070]
In S310, a predetermined number of data items are selected as the data to be output, in descending order of score. In subsequent S312, the data of the selected data items is output. After the above control ends, this program ends.
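The control flow of S304 to S312 can be condensed into a single Python function, shown below as a sketch rather than the program of FIG. 8. The concept links, database records, attribute values, and the association of the keywords “camp” and “hot spring” with the two pieces of information are assumptions for illustration. S302 (acquiring the character string) is represented by the `input_word` argument.

```python
def run_search(input_word, concepts, database, attributes, repeats=3, top_n=1):
    # S304: extended search key = input word + related concept words.
    words = [input_word]
    frontier = [input_word]
    for _ in range(1 + repeats):
        next_frontier = []
        for word in frontier:
            for related in concepts.get(word, []):
                if related not in words:
                    words.append(related)
                    next_frontier.append(related)
        frontier = next_frontier

    # S306: item numbers and keywords of the matching data items.
    hits = [(item, keyword) for item, keyword, _ in database if keyword in words]

    # S308: score = number of attribute items whose values agree.
    base = attributes[input_word]

    def score(keyword):
        return sum(1 for name, value in base.items()
                   if attributes[keyword].get(name) == value)

    # S310: select a predetermined number of items, highest score first.
    selected = sorted(hits, key=lambda hit: score(hit[1]), reverse=True)[:top_n]

    # S312: output the stored data of the selected items.
    stored = {item: data for item, _, data in database}
    return [stored[item] for item, _ in selected]


# Toy inputs (assumed): undirected concept links, records, attribute values.
concepts = {
    "fishing": ["outdoor"],
    "outdoor": ["recreation", "fishing", "camp"],
    "recreation": ["outdoor", "travel"],
    "travel": ["recreation", "hot spring"],
}
database = [
    (1, "camp", "camping equipment bargain information"),
    (2, "hot spring", "suburban hot spring facility"),
]
attributes = {
    "fishing": {"loosely": 1, "Japanese style": 1},
    "camp": {"loosely": 0, "Japanese style": 0},
    "hot spring": {"loosely": 1, "Japanese style": 1},
}

print(run_search("fishing", concepts, database, attributes))
```

In this toy run both records are reached through the extended search key, but only the one whose keyword shares the attributes of “fishing” is selected for output.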
[0071]
As described above, since the information search apparatus 100 according to the first embodiment creates an extended search key from a given input word and searches for data, it can retrieve information that matches a wide range of concepts including the concept expressed by the input word. In addition, the information obtained as a result of the search is ranked based on the attributes of the words, and the search results are output based on that rank, so that information expressing properties similar to those of the input word is output preferentially. Therefore, information highly relevant to the input word can be output as a search result.
[0072]
[Embodiment 2]
The information search apparatus 100 according to Embodiment 1 extends the search range of data based on the concepts expressed by words, and determines the priority for outputting search results based on the attributes of words. However, the present invention is not limited to such an embodiment.
[0073]
When determining the priority order for outputting search results, the information search apparatus according to Embodiment 2 uses a criterion that takes into account not only the similarity of word attributes between the input word and the keyword of each data item found by the search, but also the relevance of word concepts between the input word and the keyword.
[0074]
FIG. 9 shows the configuration of the information search apparatus according to the second embodiment in the form of a block diagram. Referring to FIG. 9, information search device 400 according to the second embodiment includes, in common with information search device 100 according to the first embodiment shown in FIG. 1, character string acquisition unit 102, concept dictionary storage unit 104, word expansion unit 106, database 108, data search unit 110, data selection unit 116, and output unit 118.
[0075]
In place of the attribute dictionary storage unit 112 according to the first embodiment shown in FIG. 1, the information search device 400 further includes an attribute dictionary storage unit 412. Unlike the attribute dictionary storage unit 112, the attribute dictionary storage unit 412 stores an attribute dictionary whose attribute information consists of attribute values quantifying how strongly each word has each attribute.
[0076]
The information search device 400 further includes a concept distance calculation unit 402 and a mental distance calculation unit 404. The concept distance calculation unit 402 is connected to the data search unit 110 and the concept dictionary storage unit 104 and, based on the concept information stored in the concept dictionary storage unit 104, calculates a conceptual distance, which is a value indicating the conceptual relevance between the input word and the keyword of each data item found by the search. The mental distance calculation unit 404 is connected to the data search unit 110 and the attribute dictionary storage unit 412 and, based on the attribute information stored in the attribute dictionary storage unit 412, calculates a mental distance, which is a value indicating the similarity between the emotional impression given to the user by a data item found by the search and the emotional impression given to the user by the input word.
[0077]
In place of the rank calculation unit 114 shown in FIG. 1, the information search apparatus 400 further includes a rank calculation unit 414 that is connected to the conceptual distance calculation unit 402, the mental distance calculation unit 404, and the data selection unit 116. The rank calculation unit 414 ranks the retrieved data items based on the conceptual distance calculated by the conceptual distance calculation unit 402 and the mental distance calculated by the mental distance calculation unit 404.
[0078]
The concept distance calculation unit 402 has a function of calculating a concept distance based on the concept information stored in the concept dictionary storage unit. Below, an example of the calculation method of a conceptual distance is shown.
[0079]
Assume that the word concepts are modeled by the tree diagram of the concept dictionary. The concept distance calculation unit 402 refers to this tree diagram and takes the number of paths 130 connecting the node where a word W_i is located and the node where a word W_j is located as the conceptual distance d(W_i, W_j) between the words W_i and W_j. Using this calculation method, for example, the conceptual distance d(“fishing”, “outdoor”) between the word 138 “fishing” and the word 134 “outdoor” is calculated as 1. Likewise, the conceptual distance d(“fishing”, “hot spring”) between the word 138 “fishing” and the word 142 “hot spring” is calculated as 4. When the input word and the keyword are the same word, the conceptual distance is set to zero. With the conceptual distance calculated in this way, the higher the relevance between word concepts, the smaller the conceptual distance.
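The path-counting calculation can be sketched as a breadth-first search over the tree. This is only an illustration: the tree below is hypothetical, and the nodes “leisure” and “travel” are assumed solely so that the two distances stated in the text (1 and 4) are reproduced; the function names are likewise introduced for this example.

```python
from collections import deque

# Hypothetical concept tree stored as child -> parent links. Only "fishing",
# "outdoor", and "hot spring" appear in the text; "leisure" and "travel" are
# assumed intermediate nodes.
PARENT = {
    "outdoor": "leisure",
    "travel": "leisure",
    "fishing": "outdoor",
    "hot spring": "travel",
}


def neighbours(word):
    """Words directly connected to `word` by a single path (edge) of the tree."""
    linked = [child for child, parent in PARENT.items() if parent == word]
    if word in PARENT:
        linked.append(PARENT[word])
    return linked


def conceptual_distance(w_i, w_j):
    """Number of edges between the nodes of two words; 0 for identical words."""
    queue, seen = deque([(w_i, 0)]), {w_i}
    while queue:
        word, dist = queue.popleft()
        if word == w_j:
            return dist
        for nxt in neighbours(word):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{w_j!r} is not connected to {w_i!r}")
```

In this assumed tree, “fishing” and “hot spring” are joined through “outdoor”, “leisure”, and “travel”, which gives the four paths counted in the text.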
[0080]
FIG. 10 shows the configuration of the attribute dictionary stored in the attribute dictionary storage unit 412 according to the second embodiment. Referring to FIG. 10, attribute dictionary 440 stored in attribute dictionary storage unit 412 according to the second embodiment includes a large number of items 442, 444, 446, 448, and so on. Each item has the same configuration as that of the attribute dictionary 260 according to the first embodiment shown in FIG. 7, but differs in the following point. The attribute value of the attribute dictionary 260 shown in FIG. 7 is a binary value indicating, for each attribute item, whether or not a word has the attribute. In contrast, the attribute value of the attribute dictionary 440 is an integer from “0” to “10” indicating, for each attribute item, how strongly the word has the attribute. In other respects, the attribute dictionary 260 according to the first embodiment and the attribute dictionary 440 according to the second embodiment are the same. Note that the attribute values of the attribute dictionary 440 may be values calculated statistically from a survey or values set by the user.
[0081]
The mental distance calculation unit 404 has a function of calculating a mental distance based on the attribute information shown in FIG. 10. An example of the method of calculating the mental distance is described below.
[0082]
Assume that the attribute information of words stored in the attribute dictionary storage unit 412 (see FIG. 9) is defined by the attribute dictionary 440 shown in FIG. 10. Referring to FIG. 10, the attribute information includes, as described above, an attribute value indicating how strongly the word has the attribute for each attribute item. Let the total number of defined attribute items be n, and let a_{i,k} be the attribute value of a word W_i for attribute item A_k (1 ≦ k ≦ n). The attribute information of the word W_i is then expressed as an n-dimensional attribute information vector whose components are a_{i,k}. That is, the attribute of the word W_i is defined by an attribute information vector in an n-dimensional vector space. Let this vector be w_i. The mental distance s(W_i, W_j) between word W_i and word W_j is defined so that its square equals the square of the Euclidean distance between the attribute of word W_i and the attribute of word W_j.
[0083]
[Expression 1]
$$ s(W_i, W_j)^2 = \lVert w_i - w_j \rVert^2 = \sum_{k=1}^{n} \left( a_{i,k} - a_{j,k} \right)^2 $$
For example, referring to FIG. 10, let the first attribute item A₁ be the attribute item “Leisurely” 276, the second attribute item A₂ be the attribute item “Thrilling” 278, and the third attribute item A₃ be the attribute item “Japanese style” 280, and let the word “fishing” be word W_i and the word “paraglider” be word W_j. Referring to item 446 and item 448, respectively, the attribute information vectors of word W_i and word W_j are w_i = (10, 2, 6) and w_j = (0, 10, 1). Thus, the square of the mental distance between word W_i and word W_j is
[0084]
[Expression 2]
$$ s(W_i, W_j)^2 = (10 - 0)^2 + (2 - 10)^2 + (6 - 1)^2 = 189 $$
Therefore, the mental distance s(“fishing”, “paraglider”) between the word “fishing” and the word “paraglider” is
[0085]
[Expression 3]
$$ s(\text{``fishing''}, \text{``paraglider''}) = \sqrt{189} \approx 13.7 $$
According to the mental distance calculated as described above, the more similar the attributes of two words are, the smaller the mental distance between them.
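The worked example above can be checked with a few lines of code. The sketch below assumes that the mental distance is the Euclidean distance between the attribute information vectors, consistent with the description; the function names are introduced only for this illustration, and the two vectors are the values (10, 2, 6) and (0, 10, 1) given in the text.

```python
import math


def mental_distance_squared(w_i, w_j):
    """Sum of squared differences between corresponding attribute values."""
    return sum((a - b) ** 2 for a, b in zip(w_i, w_j))


def mental_distance(w_i, w_j):
    """Euclidean distance between two attribute information vectors."""
    return math.sqrt(mental_distance_squared(w_i, w_j))


# Attribute information vectors of "fishing" and "paraglider" from the text.
fishing = (10, 2, 6)
paraglider = (0, 10, 1)

print(mental_distance_squared(fishing, paraglider))  # 189
print(round(mental_distance(fishing, paraglider), 1))  # 13.7
```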
[0086]
The information search apparatus 400 according to the second embodiment operates as follows.
[0087]
Referring to FIG. 9, as in information search device 100 according to Embodiment 1, when the user inputs an input word, character string acquisition unit 102 gives the input word to word expansion unit 106. The word expansion unit 106 creates an extended search key by the same operation as in the first embodiment, and gives it to the data search unit 110. The data search unit 110 performs the same operation as in the first embodiment, searches the data stored in the database 108 using the extended search key, and creates search result information. The data search unit 110 outputs the input word and the created search result information. These are given to the conceptual distance calculation unit 402 and the mental distance calculation unit 404.
[0088]
In response to being given the input word and the search result information, the concept distance calculation unit 402 calculates, based on the concept dictionary stored in the concept dictionary storage unit 104, the conceptual distance between the concept of the input word and the concept of the keyword of each data item included in the search result information. It attaches the calculated conceptual distance between the input word and the keyword to each item of the given search result information, and gives the result to the rank calculation unit 414.
[0089]
On the other hand, referring to FIG. 9, in response to being given the input word and the search result information from the data search unit 110, the mental distance calculation unit 404 calculates the mental distance between the input word and the keyword of each data item based on the attribute information stored in the attribute dictionary storage unit 412. It attaches the calculated mental distance between the input word and the keyword to each item of the given search result information, and gives the result to the rank calculation unit 414.
[0090]
When both of the following conditions are satisfied, namely that the search result information with the conceptual distance attached has been given from the concept distance calculation unit 402 and that the search result information with the mental distance attached has been given from the mental distance calculation unit 404, the rank calculation unit 414 calculates the sum of the conceptual distance and the mental distance for each data item described in the search result information.
[0091]
As described above, a small conceptual distance indicates that the concept of the input word and the concept of the keyword are highly related. A small mental distance indicates that the attributes of the input word and the attributes of the keyword are highly similar. For this reason, so that data items highly relevant to the concept of the input word, or data items having attributes similar to those of the input word, are output preferentially, the rank calculation unit 414 creates a rank table in which the item numbers of the data items are arranged in ascending order of the sum of the conceptual distance and the mental distance, and gives the rank table to the data selection unit 116.
[0092]
The data selection unit 116 reads out a predetermined number of item numbers from the top of the given rank table, reads out the data item identified by the read item number from the database 108, and gives it to the output unit 118. The output unit 118 outputs the given data.
[0093]
Similar to the information search device 100 according to the first embodiment, the information search device 400 according to the second embodiment can be realized by a general computer or a portable information terminal and a computer program executed on it. Hereinafter, a control structure of a program for realizing the desired functions of the information search apparatus 400 will be described.
[0094]
FIG. 11 shows a flowchart of a program executed by the information search apparatus 400 according to the second embodiment. Referring to FIG. 11, when information search device 400 according to the second embodiment starts a program, first, in S302, an input word is acquired, and in S304, an extended search key is created based on the acquired input word. In subsequent S306, the data stored in the database 108 is searched using the created extended search key, and the search result and the keyword associated with the data item that is the search result are acquired.
[0095]
In the program executed by the information search apparatus 400 according to the second embodiment, after the process of S306 is completed, the control proceeds to S508.
[0096]
In S508, it is determined whether or not the sum of the conceptual distance and the mental distance has been calculated for every acquired keyword. If all the sums of the conceptual distance and the mental distance have been calculated, control proceeds to S516. If there is a search result for which the sum has not been calculated, control proceeds to S510.
[0097]
In S510, the conceptual distance between the input word and the keyword associated with the data item of the search result is calculated. In S512, a mental distance between the input word and the keyword is calculated. In subsequent S514, the sum of the conceptual distance and the mental distance is calculated. Control returns to S508.
[0098]
In S516, the data items that are the search results are ranked so that the search result having the smaller sum of the conceptual distance and the mental distance is higher. Control proceeds to S310.
[0099]
In S310, as in the control by the program according to the first embodiment shown in FIG. 8, a predetermined number of data items are selected in order from the data item with the higher rank, and in S312, the selected data item is output. After the above control ends, this program ends.
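The loop of S508 to S514 and the ranking of S516 and S310 can be sketched as follows. This is a minimal illustration, not the program of the embodiment: the function `rank_items` and its parameters are hypothetical, and the two distance functions are passed in as arguments so that any concrete calculation method can be supplied.

```python
def rank_items(input_word, hits, conceptual_distance, mental_distance, limit):
    """Return the item numbers to output, best first.

    hits: list of (item_number, keyword) pairs from the search (S306).
    conceptual_distance, mental_distance: callables taking (input_word, keyword).
    """
    sums = {}
    for item_no, keyword in hits:                      # S508: repeat until every sum exists
        d = conceptual_distance(input_word, keyword)   # S510
        s = mental_distance(input_word, keyword)       # S512
        sums[item_no] = d + s                          # S514
    rank_table = sorted(sums, key=sums.get)            # S516: ascending order of the sum
    return rank_table[:limit]                          # S310: take a predetermined number
```

Because `sorted` is stable, data items with equal sums keep the order in which the search returned them; the source does not state a tie-breaking rule, so this is only a consequence of the sketch.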
[0100]
As described above, the information search apparatus 400 according to Embodiment 2 ranks the information obtained as a result of the search based on the similarity of attributes and the relevance of concepts, and outputs the search result based on the rank. Therefore, it can output, as a search result, information that is highly relevant to the input word both in concept and in nature.
[0101]
[Embodiment 3]
In the second embodiment, the function of determining the priority order of search results based on the conceptual distance and the mental distance is exemplified. However, the present invention is not limited to such an embodiment.
[0102]
The information search apparatus according to the present embodiment further has a function of calculating the importance of each attribute item based on the history of output information, and calculating the mental distance by weighting the attribute items according to the calculated importance.
[0103]
FIG. 12 shows the configuration of the information search apparatus according to the third embodiment in a block diagram format. Referring to FIG. 12, information search device 600 according to the third embodiment includes, in addition to character string acquisition unit 102, concept dictionary storage unit 104, word expansion unit 106, data search unit 110, data selection unit 116, and output unit 118 of information search device 400 according to the second embodiment shown in FIG. 9, the following units: a history recording unit 602 that is connected to the data selection unit 116, acquires the item numbers of the data items selected by the data selection unit 116, and records them as a history of the output data items; a preference extraction unit 604 that is connected to the history recording unit 602 and the database 108 and creates preference data indicating the user's preferences; and a weight adjustment unit 606 that is connected to the attribute dictionary storage unit 412 and the preference extraction unit 604 and calculates the importance of each attribute item of the attribute dictionary based on the preference data and the attribute information of the attribute dictionary.
[0104]
In place of the mental distance calculation unit 404 of the second embodiment shown in FIG. 9, the information search device 600 according to the third embodiment further includes a mental distance calculation unit 608. The mental distance calculation unit 608 is connected to the data search unit 110, the attribute dictionary storage unit 412, the rank calculation unit 414, and the weight adjustment unit 606, and calculates the mental distance between two words based on the information stored in the attribute dictionary storage unit 412 and the importance calculated by the weight adjustment unit 606.
[0105]
FIG. 13 shows a configuration of history information recorded by the history recording unit 602 according to the third embodiment. Referring to FIG. 13, history information 620 includes a plurality of history items. Each history item includes the item number 622 of the data item output by the information search apparatus 600 in the past and the date and time 624 when the data item was output. The item number 622 is used when the preference extraction unit 604 (see FIG. 12) creates preference data.
[0106]
FIG. 14 shows the configuration of preference data created by the preference extraction unit 604. Referring to FIG. 14, each item of preference data 640 includes an output keyword 642, which is a keyword associated with a data item output so far, and an output frequency 644, which indicates the number of times a data item associated with that keyword has been output. The output frequency 644 is incremented when the output keyword 642 is included in the keyword group associated with an output data item. A high output frequency for an output keyword indicates that the user of the information search apparatus 600 tends to prefer the attributes of that keyword.
[0107]
As described above, the weight adjustment unit 606 illustrated in FIG. 12 has a function of calculating the importance of each attribute item based on the preference data created by the preference extraction unit 604 and the attribute information stored in the attribute dictionary storage unit 412. An example of the method by which the weight adjustment unit 606 calculates the importance is described below.
[0108]
Assume that the given preference data contains a total of h output keywords K_m (1 ≦ m ≦ h). Let f_m denote the output frequency of output keyword K_m, and let a_{l,m} be the attribute value, for attribute item A_l, of the word that matches output keyword K_m. Then the importance I_l of attribute item A_l is calculated by Expression 4 below.
[0109]
[Expression 4]

For example, consider the case where the preference data 640 shown in FIG. 14 is given to the weight adjustment unit 606 and the importance of each attribute item is calculated based on the attribute information described in the attribute dictionary 440 shown in FIG. 10. Let the importances of the attribute items 236, 238, and 240 be I₁, I₂, and I₃, respectively. These importances are then as follows.
[0110]
[Equation 5]

[0111]
An example of a mental distance calculation method performed by the mental distance calculation unit 608 according to the third embodiment will be described below.
[0112]
Let the total number of defined attribute items be n, let a_{i,k} be the attribute value of word W_i for attribute item A_k (1 ≦ k ≦ n), and let I_k be the importance of attribute item A_k. The mental distance calculation unit 608 uses the reciprocal of I_k as the weight and obtains the mental distance s(W_i, W_j) between word W_i and word W_j from the square of the weighted Euclidean distance between word W_i and word W_j.
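The weighted calculation can be sketched as follows. This is one reading of the description, not the expression of the embodiment: it assumes that the weight 1/I_k multiplies the squared difference of each attribute value, as in the usual weighted Euclidean distance, and that every importance is positive. The function name and the importance values used in the usage note are hypothetical.

```python
import math


def weighted_mental_distance(w_i, w_j, importance):
    """Weighted Euclidean distance with weight 1 / I_k for attribute item A_k.

    Assumes every importance value in `importance` is positive.
    """
    total = sum(
        (a - b) ** 2 / imp
        for a, b, imp in zip(w_i, w_j, importance)
    )
    return math.sqrt(total)
```

With all importances equal to 1, this reduces to the unweighted distance of the second embodiment. For the vectors (10, 2, 6) and (0, 10, 1) and assumed importances (4, 1, 1), the squared distance is 100/4 + 64 + 25 = 114, so a difference on the first attribute item contributes less than it would without weighting.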
[0113]
[Formula 6]

[0114]
The information search apparatus 600 according to the third embodiment operates as follows.
[0115]
Referring to FIG. 12, when information search apparatus 600 is activated, preference extraction unit 604 reads the history information recorded in history recording unit 602. The preference extraction unit 604 further reads, from the database 108, the keywords associated with the data items whose item numbers are described in the history information. Using the read keywords as output keywords, it calculates the output frequency of each based on the history information and creates preference data. The created preference data is given to the weight adjustment unit 606.
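The creation of preference data from the history can be sketched as a frequency count. This is a minimal illustration under stated assumptions: the function name, the representation of history items as (item number, date and time) pairs, and the mapping from item numbers to keyword lists standing in for the database are all introduced for this example.

```python
from collections import Counter


def build_preference_data(history, item_keywords):
    """Count how often each keyword was attached to an output data item.

    history: iterable of (item_number, output_datetime) history items.
    item_keywords: item_number -> list of keywords associated with that item
        (a stand-in for looking the item up in the database).
    Returns a Counter mapping each output keyword to its output frequency.
    """
    frequency = Counter()
    for item_no, _when in history:
        # Each output of a data item counts once for every keyword attached to it.
        frequency.update(set(item_keywords.get(item_no, [])))
    return frequency
```

For example, if item 1 (keywords “fishing” and “outdoor”) was output twice and item 2 (keyword “fishing”) once, the resulting output frequencies are 3 for “fishing” and 2 for “outdoor”.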
[0116]
In response to the preference data being given, the weight adjustment unit 606 reads the attribute information of the word that matches the output keyword included in the preference data from the attribute dictionary storage unit 412. The weight adjustment unit 606 calculates the importance of each attribute item based on the given preference data and the read attribute information. The calculated importance of each attribute item is given to the mental distance calculation unit 608.
[0117]
On the other hand, when the user inputs a word, the character string acquisition unit 102 gives the input word to the word expansion unit 106. The word expansion unit 106 creates an extended search key with reference to the concept dictionary stored in the concept dictionary storage unit 104, and gives it to the data search unit 110. The data search unit 110 searches the data stored in the database 108 using the extended search key, and creates search result information. The data search unit 110 outputs the input word and the created search result information. The output input word and the created search result information are given to the conceptual distance calculation unit 402 and the mental distance calculation unit 608.
[0118]
As in the second embodiment, the concept distance calculation unit 402 calculates the conceptual distance between the concept of the input word and the concept of the keyword of each data item included in the search result information. It attaches the calculated conceptual distance to the search result information and gives the result to the rank calculation unit 414.
[0119]
On the other hand, in response to being given the input word and the search result information from the data search unit 110, the mental distance calculation unit 608 calculates the mental distance between the attributes of the input word and the attributes of the keyword of each data item described in the search result information, based on the attribute information stored in the attribute dictionary storage unit 412 and the importance of each attribute item given from the weight adjustment unit 606.
[0120]
Referring to FIG. 12, mental distance calculation unit 608 attaches the calculated mental distance between the input word and the keyword to each item of the given search result information, and gives the result to rank calculation unit 414.
[0121]
When both of the following conditions are satisfied, namely that the search result information including the conceptual distance has been given from the concept distance calculation unit 402 and that the search result information including the mental distance has been given from the mental distance calculation unit 608, the rank calculation unit 414 reads the conceptual distance and the mental distance attached to the search result information and calculates the sum of the conceptual distance and the mental distance for each data item included in the search result information.
[0122]
The rank calculation unit 414 creates a rank table by sorting the item numbers of the data items in ascending order of the sum of the conceptual distance and the mental distance, and gives the rank table to the data selection unit 116. The data selection unit 116 reads a predetermined number of data item numbers from the top of the given rank table. The data selection unit 116 reads the data item of the read item number from the database 108 and gives it to the output unit 118, and also gives the item number of the read data item to the history recording unit 602. The output unit 118 outputs the given data, and the history recording unit 602 adds the given item number to the history information and updates the history information.
[0123]
Similar to the information search device according to the first or second embodiment, the information search device 600 according to the third embodiment can be realized by a general computer or a portable information terminal and a computer program executed on it. Hereinafter, a control structure of a program for realizing the desired functions of the information search apparatus 600 will be described.
[0124]
FIG. 15 shows a flowchart of a program executed by the information search apparatus 600 according to the third embodiment. Referring to FIG. 15, first, in S702, the recorded history information is read out and collated with the database, and output keywords are extracted. In S704, the history information and the database are collated, and the output frequency of each output keyword extracted in S702 is calculated. In subsequent S706, the importance of each attribute item described in the attribute dictionary is calculated based on the attribute information of the words that match the output keywords extracted in S702 and the output frequencies calculated in S704. Control proceeds to S302.
[0125]
Similar to the program according to the second embodiment shown in FIG. 11, an input word is acquired in S302, and an extended search key is created based on the acquired input word in S304. In subsequent S306, data stored in the database is searched using the extended search key created in S304, and a search result and a keyword associated with the data item that is the search result are acquired. Control proceeds to S508.
[0126]
In S508, it is determined whether or not the sum of the conceptual distance and the mental distance between the input word and the keyword has been calculated for every acquired keyword. If all the sums of the conceptual distance and the mental distance have been calculated, control proceeds to S516. If there is a search result for which the sum has not been calculated, control proceeds to S510.
[0127]
In S510, the conceptual distance between the input word and the keyword associated with the data item of the search result is calculated. Control proceeds to S708.
[0128]
In S708, the mental distance between the input word and the keyword is calculated using the reciprocal of the importance of each attribute item calculated in S706 as a weight. In subsequent S514, the sum of the conceptual distance and the mental distance is calculated by the same control as the control by the program according to the second embodiment shown in FIG. Control returns to S508.
[0129]
In S516, the data items as the search results are sorted and ranked in ascending order of the sum of the conceptual distance and the mental distance. Control proceeds to S310, and a predetermined number of data items are selected in order from the upper data item, as in the first embodiment shown in FIG. In subsequent S312, the selected data item is output. Control proceeds to S710.
[0130]
In S710, the history information is updated based on the item numbers of the data items selected in S310. After the above control ends, this program ends.
[0131]
The information search apparatus 600 according to the third embodiment estimates the user's preference based on the history of output data items and calculates the importance of each attribute item. An attribute item with a high calculated importance is considered to represent a property that the user likes. By calculating the mental distance in consideration of the importance of the attribute items, the nature of the information that the user emphasizes is estimated, and information that has the nature emphasized by the user and is highly relevant to the input word can be output as a search result.
[0132]
In Embodiments 1 to 3, as an example of a method for acquiring extended words, a method of acquiring a word expressing a superordinate concept of the input word and words expressing subordinate concepts of the input word was illustrated. However, the method for acquiring extended words is not limited to such a method. For example, all the words expressing subordinate concepts of the concept of the input word may be acquired.
[0133]
In the first to third embodiments, the concept dictionary stored in the concept dictionary storage unit indicates the superordinate and subordinate relationships between word concepts. However, the relationship between words indicated by the concept dictionary is not limited to such superordinate and subordinate relationships. For example, a synonym dictionary that associates a word with its synonyms may be used. The concept dictionary may be in any form as long as it shows the relationship between one word and another.
[0134]
Further, the attribute dictionary according to the first embodiment may be a dictionary describing attribute values like the attribute dictionary according to the second or third embodiment. Furthermore, the rank calculation unit according to the first embodiment may calculate the mental distance according to the second embodiment and perform ranking based on the mental distance.
[0135]
In addition, the attribute values included in the attribute dictionary according to the first to third embodiments may be any values, determined by any method, as long as they quantify the properties of a word for each type of property.
[0136]
In the second embodiment and the third embodiment, calculation using the Euclidean distance or the weighted Euclidean distance is exemplified as the mental distance calculation method. However, the mental distance calculation method is not limited to such methods. For example, the mental distance may be calculated as the city block distance between the attribute vectors of the words.
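For comparison, the city block distance can be sketched as follows. The function name is introduced only for this illustration; the vectors in the usage note are the attribute information vectors (10, 2, 6) and (0, 10, 1) used in the example of the second embodiment.

```python
def city_block_distance(w_i, w_j):
    """Sum of absolute differences between corresponding attribute values."""
    return sum(abs(a - b) for a, b in zip(w_i, w_j))
```

For the two vectors above, the city block distance is |10 - 0| + |2 - 10| + |6 - 1| = 23, whereas the Euclidean distance is about 13.7. As with the Euclidean distance, the value becomes smaller as the attributes of the two words become more similar.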
[0137]
In the second embodiment and the third embodiment, an example in which the data items are ranked based on the sum of the mental distance and the conceptual distance is shown. However, the method for calculating the value used for ranking is not limited to such a method. For example, the mental distance and the conceptual distance may each be normalized by multiplying it by a predetermined coefficient, and the sum of the normalized mental distance and the normalized conceptual distance may be used as the reference value for ranking. Furthermore, the coefficient by which the conceptual distance is multiplied and the coefficient by which the mental distance is multiplied may be set according to the user's preference, and the set coefficients may be used when the conceptual distance and the mental distance are normalized. Alternatively, the product of the mental distance and the conceptual distance may be used as the reference value for ranking.
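These alternatives for the ranking value can be sketched in a single function. The function name, the parameter names, and the default coefficients of 1.0 are assumptions made for this illustration; the source does not specify how the coefficients are chosen.

```python
def ranking_value(conceptual, mental, c_coef=1.0, m_coef=1.0, use_product=False):
    """Reference value for ranking; smaller values rank higher.

    c_coef and m_coef are the coefficients applied to the conceptual distance
    and the mental distance, respectively. With use_product=True the two scaled
    distances are multiplied instead of added.
    """
    d = c_coef * conceptual
    s = m_coef * mental
    return d * s if use_product else d + s
```

One consequence of the product variant is that the value is zero whenever either distance is zero, so a keyword identical to the input word (conceptual distance zero) receives the smallest possible value regardless of its mental distance.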
[0138]
As an example of the information search device according to the first to third embodiments, a configuration in which the units included in the information search device are integrated is illustrated. However, the present invention is not limited to such an embodiment. For example, the units constituting the information search device may be divided among two or more housings. In that case, however, the units need to be able to communicate with each other.
[0139]
Each embodiment illustrated above can also be realized by a computer and software operating on the computer, as described above. Of course, some or all of the functions described below can be realized by hardware instead of software.
[0140]
FIG. 16 shows an external view of a computer system 800 used in this embodiment, and FIG. 17 shows a block diagram of the computer system 800. Note that the computer system 800 shown here is merely an example, and various other configurations are possible.
[0141]
Referring to FIG. 16, a computer system 800 includes a computer 820, a monitor 822, a keyboard 826, and a pad type pointing device 828. The computer 820 further includes a CD-ROM (Compact Disc Read-Only Memory) drive 830.
[0142]
Referring to FIG. 17, computer system 800 further includes a printer 824 connected to computer 820, which is not shown in FIG. 16. The computer 820 further includes a bus 846 connected to the CD-ROM drive 830, a central processing unit (CPU) 836 connected to the bus 846, a ROM (Read-Only Memory) 838 that stores a boot-up program of the computer system 800 and the like, a RAM (Random Access Memory) 840 that provides a work area used by the CPU 836 and a storage area for programs executed by the CPU 836, and a hard disk 834 that stores a database, a concept dictionary, an attribute dictionary, and the like.
[0143]
The software that realizes the operation of the information search apparatus exemplified in the first to third embodiments is recorded on a recording medium such as a CD-ROM 842 and distributed, is read into the computer 820 via a reading device such as the CD-ROM drive 830, and is stored in the hard disk 834. When the CPU 836 executes this program, the program is read from the hard disk 834 and stored in the RAM 840, and instructions are read from the address designated by a program counter (not shown) and executed. The CPU 836 reads data to be processed from the hard disk 834 and stores the processing results in the hard disk 834 as well.
[0144]
Since the operation of computer system 800 is well known, details thereof will not be repeated here.
[0145]
The form of software distribution is not limited to a recording medium as described above. For example, the software may be distributed by being received from another computer connected through a network. Alternatively, part of the software may be stored on the hard disk 834 in advance, with the remaining part taken onto the hard disk 834 via a network and integrated at the time of execution.
[0146]
Modern programs commonly achieve their purpose by using general-purpose functions provided by the computer's operating system (OS) or by so-called third parties and executing them in an organized form according to the desired purpose. Accordingly, even a program (or group of programs) that does not itself include the general-purpose functions provided by the OS or a third party among the functions exemplified in the first to third embodiments, and that specifies only the combination and execution order of those general-purpose functions, is obviously included in the technical scope of the present invention as long as it has a control structure that uses those functions to achieve the desired object as a whole.
[0147]
The embodiments disclosed herein are merely examples, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim, read in light of the detailed description of the invention, and is intended to include all modifications within the meaning and scope equivalent to the wording of the claims.
[0148]
[Effects of the invention]
As described above, according to the first aspect of the present invention, a wide range of information can be searched and information highly relevant to the input word can be retrieved. Search precision and recall can therefore be improved.
[0149]
In addition, by performing the comparison based on specific criteria and evaluating information from multiple angles, information presumed to be highly relevant to the input word can be retrieved from a wide range of information.
[0150]
Furthermore, it is possible to retrieve information that is highly relevant to the input word with respect to the properties the user emphasizes. It is also possible to estimate which properties of information the user regards as important, and to retrieve information that evokes in the user an image similar to the one evoked by the word the user entered.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an information search apparatus 100 according to Embodiment 1 of the present invention.
FIG. 2 is a schematic diagram showing relevance of word concepts stored in a concept dictionary storage unit 104 according to an embodiment of the present invention.
FIG. 3 is a diagram showing a configuration of a concept dictionary stored in a concept dictionary storage unit 104 according to the embodiment of the present invention.
FIG. 4 is a diagram showing a configuration of an extended search key created by the word expansion unit 106 according to Embodiment 1 of the present invention.
FIG. 5 is a diagram showing a configuration of data held in a database 108 according to the embodiment of the present invention.
FIG. 6 is a diagram showing a configuration of search result information created by the data search unit 110 according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing a configuration of an attribute dictionary stored in an attribute dictionary storage unit 112 according to Embodiment 1 of the present invention.
FIG. 8 is a flowchart of a program executed by the information search apparatus 100 according to Embodiment 1 of the present invention.
FIG. 9 is a block diagram showing a configuration of an information search apparatus 400 according to Embodiment 2 of the present invention.
FIG. 10 is a diagram showing a configuration of an attribute dictionary stored in an attribute dictionary storage unit 412 according to Embodiment 2 of the present invention.
FIG. 11 is a flowchart of a program executed by the information search apparatus 400 according to Embodiment 2 of the present invention.
FIG. 12 is a block diagram showing a configuration of an information search apparatus 600 according to Embodiment 3 of the present invention.
FIG. 13 is a diagram showing a structure of history information recorded by a history recording unit 602 according to Embodiment 3 of the present invention.
FIG. 14 is a diagram showing a configuration of preference data created by a preference extraction unit 604 according to Embodiment 3 of the present invention.
FIG. 15 is a flowchart of a program executed by the information search apparatus 600 according to Embodiment 3 of the present invention.
FIG. 16 is an external view of a computer system 800 used in the embodiment of the present invention.
FIG. 17 is a block diagram showing a configuration of a computer system 800 used in the embodiment of the present invention.
FIG. 18 is a schematic diagram of dictionary data in the prior art.
FIG. 19 is a diagram illustrating an example of information obtained when information is searched using a word described in dictionary data as a search key in a conventional technique.
[Explanation of Symbols] 100, 400, 600 information search apparatus, 102 character string acquisition unit, 104 concept dictionary storage unit, 106 word expansion unit, 108 database, 110 data search unit, 112, 412 attribute dictionary storage unit, 114, 414 rank calculation unit, 116 data selection unit, 118 output unit, 160 concept dictionary, 260, 440 attribute dictionary, 402 concept distance calculation unit, 404, 608 mental distance calculation unit, 602 history recording unit, 604 preference extraction unit, 606 weight adjustment unit, 620 history information, 640 preference data, 800 computer system, 820 computer, 822 monitor, 824 printer, 826 keyboard, 828 pad type pointing device, 830 CD-ROM drive, 834 hard disk, 836 CPU, 838 ROM, 840 RAM, 842 CD-ROM, 846 bus

Claims

An information search apparatus comprising:
character string acquisition means for acquiring a character string indicating a first word;
concept information holding means for holding, for a plurality of words, concept information indicating a hierarchical relationship between the concepts of the words;
word collection means for collecting, from the concept information holding means and based on the concept information, a second word representing a concept related to the concept represented by the first word indicated by the character string;
a database holding information to be searched;
extraction means for extracting, using the first word and the second word as search keys, a keyword in the database that matches either the first word or the second word, together with the information held in the database in association with the matching keyword;
means for acquiring, for a plurality of words, information indicating the attributes of the words;
rank determination means for determining the priority order of the information extracted by the extraction means on the basis of the similarity between the attributes of the first word and the attributes of the matching keyword; and
output means for outputting the extracted information in accordance with the priority order determined by the rank determination means.
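The word collection and extraction elements recited above can be illustrated with a minimal sketch. It assumes the concept information is a child-to-parent map, that "related" concepts are the ancestors and descendants of the first word, and that the database is a list of (keyword, information) pairs. The names `CONCEPT_PARENT`, `DATABASE`, `expand_word`, and `extract`, as well as all sample words, are hypothetical and do not appear in the specification.

```python
from collections import defaultdict

# Hypothetical concept dictionary: child word -> parent word (assumed acyclic).
CONCEPT_PARENT = {
    "apple": "fruit",
    "orange": "fruit",
    "fruit": "food",
    "bread": "food",
}

# Hypothetical database: (keyword, information) records.
DATABASE = [
    ("apple", "Apple pie recipe"),
    ("fruit", "Seasonal fruit guide"),
    ("bread", "Bakery listing"),
    ("orange", "Orange juice brands"),
]


def expand_word(first_word, parent_of):
    """Collect second words: all ancestors and descendants of first_word."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    related = []
    # Walk up the hierarchy to broader concepts.
    node = first_word
    while node in parent_of:
        node = parent_of[node]
        related.append(node)
    # Walk down the hierarchy to narrower concepts.
    stack = list(children[first_word])
    while stack:
        node = stack.pop()
        related.append(node)
        stack.extend(children[node])
    return related


def extract(first_word, second_words, database):
    """Return records whose keyword matches the first word or any second word."""
    keys = {first_word, *second_words}
    return [(keyword, info) for keyword, info in database if keyword in keys]


second_words = expand_word("fruit", CONCEPT_PARENT)
hits = extract("fruit", second_words, DATABASE)
```

In this sketch a sibling concept such as "bread" is not collected for the input "fruit", because only direct ancestors and descendants are treated as related; a different notion of relatedness would change the extended search key.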

The information search apparatus according to claim 1, wherein the rank determination means includes:
score calculation means for calculating, based on the information indicating the attributes of the first word acquired by the means for acquiring and the information indicating the attributes of the matching keyword, a score indicating the similarity between the attributes of the first word and the attributes of the matching keyword; and
determination means for determining the priority order of the information extracted by the extraction means on the basis of the score calculated by the score calculation means.

The information search apparatus according to claim 1, wherein the means for acquiring includes attribute information holding means for holding, for each of the plurality of words, attribute information indicating the attributes of the word by an attribute value set in advance for each attribute, and
the rank determination means includes:
mental distance calculation means for calculating a mental distance between the first word and the matching keyword based on the attribute information on the first word and the attribute information on the matching keyword held in the attribute information holding means; and
determination means for determining the priority order of the information extracted by the extraction means based on the mental distance calculated by the mental distance calculation means.

The information search apparatus according to claim 3, wherein the rank determination means further includes:
conceptual distance calculation means for calculating a conceptual distance between the first word and the matching keyword based on the concept information on the first word and the concept information on the matching keyword held in the concept information holding means; and
means for creating, for each combination of the first word and the matching keyword, a reference value integrating the mental distance and the conceptual distance, and
the determination means includes means for determining the priority order of the information extracted by the extraction means with reference to the reference value.
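As an illustration of ranking by a reference value that integrates a mental distance and a conceptual distance, the following is a minimal sketch under stated assumptions: the mental distance is taken to be the Euclidean distance between attribute value vectors, the conceptual distance is the number of hierarchy edges between two words via their nearest common ancestor, and the reference value is a weighted sum with a mixing factor `alpha`. These formulas, the names, and the sample values are all assumptions for illustration; the claims do not fix them.

```python
import math

# Hypothetical attribute dictionary: word -> {attribute name: attribute value}.
ATTRIBUTES = {
    "apple": {"sweet": 0.8, "cheap": 0.6},
    "orange": {"sweet": 0.7, "cheap": 0.6},
    "fruit": {"sweet": 0.5, "cheap": 0.2},
}

# Hypothetical concept dictionary: child word -> parent word.
PARENT = {"apple": "fruit", "orange": "fruit", "fruit": "food"}


def mental_distance(a, b, attrs):
    """Assumed form: Euclidean distance over the attributes both words share."""
    shared = attrs[a].keys() & attrs[b].keys()
    return math.sqrt(sum((attrs[a][k] - attrs[b][k]) ** 2 for k in shared))


def ancestors(word, parent_of):
    """Return the chain [word, parent, grandparent, ...]."""
    chain = [word]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def concept_distance(a, b, parent_of):
    """Assumed form: edge count between a and b via their nearest common ancestor."""
    depth_b = {w: i for i, w in enumerate(ancestors(b, parent_of))}
    for i, w in enumerate(ancestors(a, parent_of)):
        if w in depth_b:
            return i + depth_b[w]
    raise ValueError("words share no common ancestor")


def reference_value(a, b, attrs, parent_of, alpha=0.5):
    """Assumed integration: weighted sum of the two distances (smaller is closer)."""
    return (alpha * mental_distance(a, b, attrs)
            + (1 - alpha) * concept_distance(a, b, parent_of))


def rank(first_word, keywords, attrs, parent_of):
    """Order matching keywords by ascending reference value."""
    return sorted(keywords,
                  key=lambda kw: reference_value(first_word, kw, attrs, parent_of))


ranking = rank("apple", ["orange", "fruit", "apple"], ATTRIBUTES, PARENT)
```

With these sample values "fruit" ranks ahead of "orange" for the input "apple" because it is one hierarchy edge away rather than two, even though its attributes are less similar; lowering `alpha` toward 0 or raising it toward 1 shifts the balance between the two distances.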

The information search apparatus according to claim 3, wherein the mental distance calculation means includes:
importance setting means for setting the importance of each attribute; and
means for calculating the mental distance between the first word and the matching keyword based on the attribute information on the first word and the attribute information on the matching keyword held in the attribute information holding means, and on the importance of each attribute set by the importance setting means.
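One way to let a per-attribute importance influence the mental distance is a weighted Euclidean distance, sketched below. The weighting formula, the function name `weighted_mental_distance`, and the sample attribute values are assumptions made for illustration only.

```python
import math


def weighted_mental_distance(attrs_a, attrs_b, importance):
    """Assumed form: sqrt(sum_k w_k * (a_k - b_k)^2) over shared attributes.

    attrs_a, attrs_b: {attribute name: attribute value} for the two words.
    importance: {attribute name: non-negative weight}; missing names count as 0.
    """
    shared = attrs_a.keys() & attrs_b.keys()
    return math.sqrt(sum(
        importance.get(k, 0.0) * (attrs_a[k] - attrs_b[k]) ** 2 for k in shared
    ))


# Hypothetical attribute values for a first word and a matching keyword.
first_word_attrs = {"sweet": 0.8, "cheap": 0.6}
keyword_attrs = {"sweet": 0.5, "cheap": 0.2}

# Emphasizing a single attribute makes the distance depend on that attribute only.
d_sweet_only = weighted_mental_distance(
    first_word_attrs, keyword_attrs, {"sweet": 1.0, "cheap": 0.0})
d_cheap_only = weighted_mental_distance(
    first_word_attrs, keyword_attrs, {"sweet": 0.0, "cheap": 1.0})
```

Under this assumption, an attribute with importance 0 has no effect on the ranking, so the same pair of words can appear close or far apart depending on which property is emphasized.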

The information search apparatus according to claim 5, wherein the importance setting means includes:
history recording means for recording a history of the information output by the output means; and
means for setting the importance of each attribute based on the history recorded by the history recording means.

The information search apparatus according to claim 6, wherein the means for setting includes:
preference estimation means for estimating the user's preference for information by comparing the history with the database; and
means for setting the importance of each attribute based on the preference estimated by the preference estimation means and the attribute information held in the attribute information holding means.

The information search apparatus according to claim 7, wherein the preference estimation means includes:
means for collating the history with the database and calculating, for each keyword, the frequency with which information held in the database in association with that keyword has been output; and
means for estimating the user's preference for information based on the frequency for each keyword and the attributes of the keyword held in the attribute information holding means.
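The chain from output history to per-keyword frequency to estimated preference to attribute importance can be sketched as follows. This is only one plausible reading: the history is assumed to be a list of output information identifiers, the preference for an attribute is assumed to be the frequency-weighted average of that attribute's values over the output keywords, and the importance is assumed to be the preference normalized to sum to 1. All names and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical database: information identifier -> keyword.
DATABASE = {"doc1": "apple", "doc2": "orange", "doc3": "bread"}

# Hypothetical attribute dictionary: keyword -> {attribute name: attribute value}.
ATTRIBUTES = {
    "apple": {"sweet": 0.8, "cheap": 0.6},
    "orange": {"sweet": 0.7, "cheap": 0.6},
    "bread": {"sweet": 0.2, "cheap": 0.9},
}

# Hypothetical history: identifiers of information output so far.
HISTORY = ["doc1", "doc1", "doc3", "doc1"]


def keyword_frequency(history, database):
    """Collate the history with the database and count outputs per keyword."""
    return Counter(database[item] for item in history if item in database)


def estimate_preference(frequency, attrs):
    """Assumed form: frequency-weighted average attribute value per attribute."""
    total = sum(frequency.values())
    preference = {}
    for keyword, count in frequency.items():
        for name, value in attrs.get(keyword, {}).items():
            preference[name] = preference.get(name, 0.0) + value * count / total
    return preference


def importance_from_preference(preference):
    """Assumed form: normalize preferences so the importances sum to 1."""
    total = sum(preference.values())
    return {name: value / total for name, value in preference.items()} if total else {}


frequency = keyword_frequency(HISTORY, DATABASE)
preference = estimate_preference(frequency, ATTRIBUTES)
importance = importance_from_preference(preference)
```

The resulting importance map has the same shape as the per-attribute weights used when computing an importance-weighted mental distance, so the output of this step could feed directly into such a calculation.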

The information search apparatus according to claim 1, wherein the output means includes means for outputting the extracted information, in the order indicated by the priority order determined by the rank determination means, up to a predetermined rank.

An information search method comprising the steps of:
acquiring a character string indicating a first word;
collecting a second word representing a concept related to the concept represented by the first word indicated by the character string;
searching, using the first word and the second word as search keys, for information held in a database in association with a keyword matching the search keys;
determining the priority order of the information obtained as a search result in the searching step, on the basis of the similarity between the attributes of the keyword corresponding to that information and the attributes of the first word; and
outputting the information obtained as the search result in accordance with the priority order determined in the determining step.

A computer-executable information search program that, when executed on a computer, causes the computer to operate as the information search apparatus according to any one of claims 1 to 9.

A computer-readable recording medium on which the information search program according to claim 11 is recorded.