JP6613833B2

JP6613833B2 - Information processing apparatus, information processing system, and program

Info

Publication number: JP6613833B2
Application number: JP2015221548A
Authority: JP
Inventors: 和久大野
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2015-11-11
Filing date: 2015-11-11
Publication date: 2019-12-04
Anticipated expiration: 2035-11-11
Also published as: JP2017091270A

Description

The present invention relates to an information processing apparatus, an information processing system, and a program, and more particularly to the acquisition and presentation of information using the link structure between documents.

In recent years, the spread of the Internet has made it easy for users to search for and obtain a wide variety of information. For example, a search engine on a Web site can be used to find and browse Web pages that contain a keyword entered by the user. With hypertext in which links are embedded, the user can also jump to a related link destination while browsing a Web page and easily obtain further information.

To display sentences that are more likely to interest the user as search results, Patent Document 1, for example, describes a technique for presenting unexpected sentences as search results. Specifically, the technique of Patent Document 1 classifies documents into categories, calculates for each word extracted from a document a word score that serves as an index of the word's unexpectedness based on its appearance frequency within the category, and calculates for each sentence extracted from the document a sentence score that serves as the sentence's unexpectedness score based on the word scores. Words with a high word score within a category, that is, rare words, are thus treated as highly unexpected, which allows unexpected sentences to be extracted from documents.

JP 2011-95905 A

However, the technique of Patent Document 1 calculates unexpectedness from word appearance frequencies with categories taken into account. Consequently, when the same content also appears frequently in other categories, the index of unexpectedness cannot be calculated correctly.

The present invention has been made in view of this problem. Its object is to provide an information processing apparatus, an information processing system, and a program that can acquire and present, regardless of category, unexpected information that users would find difficult to discover on their own from among generally recognized related documents.

A first invention for solving the above problem is an information processing apparatus comprising: input means for inputting a keyword; storage means for storing phrases with which link destinations are associated as link relations, and documents relating to those phrases; link phrase acquisition means for acquiring phrases that have a link relation with the input keyword; document extraction means for extracting a document relating to the input keyword and documents relating to the phrases that have a link relation with the input keyword; and calculation means for calculating the unexpectedness between documents using the similarity between the document relating to the keyword and the documents relating to the phrases having the link relation.

According to the first invention, the information processing apparatus has storage means for storing phrases with which link destinations are associated as link relations, and documents relating to those phrases. It acquires phrases that have a link relation with the input keyword, extracts a document relating to the input keyword and documents relating to the phrases that have a link relation with the input keyword, and calculates the unexpectedness between documents using the similarity between the document relating to the keyword and the documents relating to the phrases having the link relation. This makes it easier for the user to discover unexpected information that is hard to find through ordinary text browsing. The invention can be applied to comprehensive information discovery, facilitation of automated AI dialogue, idea generation support, and the like.

The first invention desirably further comprises output means for outputting at least part of a document relating to a phrase having the link relation, in accordance with the calculation result of the calculation means. This allows unexpected portions to be extracted and output from a vast amount of related documents, so the user can obtain unexpected information efficiently.

In the first invention, it is also desirable that the storage means stores phrases that are in a mutual associative relation in association with one another, that the apparatus further comprises associative word acquisition means for acquiring, as associative words, phrases that have an associative relation with the input keyword, and that the calculation means further calculates the unexpectedness for the documents relating to the associative words as well, using their inter-document similarity to the documents relating to the phrases acquired by the link phrase acquisition means. Because associative words, which are expected to have high similarity, are included in the input keyword group when the unexpectedness (dissimilarity) with respect to phrases linked to the input keyword is calculated, the accuracy of extracting unexpected words can be improved.

In the first invention, it is also desirable that the storage means stores the total number of times each document has been referenced, and that the calculation means obtains the unexpectedness by weighting the inter-document dissimilarity between the document relating to the input keyword and the documents relating to the phrases acquired by the link phrase acquisition means, based on the reference counts of the documents. This makes it possible to extract unexpected information from documents that are more widely recognized, and to output information that is more likely to interest the user.

In the first invention, it is also desirable that the storage means stores phrases in association with categories, that the calculation means calculates the unexpectedness for each phrase category, and that the output means outputs at least part of a document relating to a phrase whose category is unexpected with respect to the input keyword. This allows the unexpectedness of each category to be calculated and used when presenting information, so that information more surprising to the user can be presented.

A second invention is an information processing system comprising: a server having storage means for storing phrases with which link destinations are associated as link relations, and documents relating to those phrases; and an information processing apparatus having input means for inputting a keyword, link phrase acquisition means for acquiring from the server phrases that have a link relation with the input keyword, document extraction means for extracting from the server a document relating to the input keyword and documents relating to the phrases that have a link relation with the input keyword, and calculation means for calculating the unexpectedness between documents using the similarity between the document relating to the keyword and the documents relating to the phrases having the link relation.

According to the second invention, a server is provided that has storage means for storing phrases with which link destinations are associated as link relations, and documents relating to those phrases. The information processing apparatus acquires from the server phrases that have a link relation with the input keyword, extracts from the server a document relating to the input keyword and documents relating to the phrases that have a link relation with the input keyword, and calculates the unexpectedness between documents using the similarity between the document relating to the keyword and the documents relating to the phrases having the link relation. This makes it easier for the user to discover unexpected information that is hard to find through ordinary text browsing. The invention can be applied to comprehensive information discovery, facilitation of automated AI dialogue, idea generation support, and the like.

A third invention is a program that causes a computer to function as the information processing apparatus of the first invention. The third invention makes it possible for a computer to function as the information processing apparatus of the first invention.

The present invention can provide an information processing apparatus, an information processing system, and a program that can acquire and present, regardless of category, unexpected information that users would find difficult to discover on their own from among generally recognized related documents.

System configuration diagram of the information processing system 1
Hardware configuration diagram of the information processing apparatus 2
Diagram explaining the document DB 3
Diagram explaining the link structure DB 4
Example of the data contents of the associative word DB 5
Diagram explaining the generation of the associative word DB 5
Flowchart explaining the flow of the information acquisition process executed by the information processing apparatus 2
Diagram explaining the input keyword group 52
Example of calculating the dissimilarity (unexpectedness)
Example of calculating the similarity using document vectors
Diagram explaining the output of unexpected information
Diagram explaining the calculation of the unexpectedness based on reference counts
Diagram explaining the unexpectedness of categories

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
First, a first embodiment of the present invention is described. FIG. 1 shows the system configuration of an information processing system 1 that uses an information processing apparatus 2 according to the present invention. In the information processing system 1, the information processing apparatus 2 is communicatively connected via a network 10 to a document DB 3, a link structure DB 4, an associative word DB 5, and the like. The network 10 includes a LAN (Local Area Network), a WAN (Wide Area Network) that provides communication over a wider area, public communication lines such as the Internet, base stations, and the like. Communication connections in the network 10 may be wired or wireless.

The information processing apparatus 2 is a device, such as a computer, on which arbitrary application programs can be installed and executed. Examples include smartphones, tablets, game consoles, and various other information terminals.

FIG. 2 shows an example of the hardware configuration of the information processing apparatus 2. As shown in FIG. 2, the information processing apparatus 2 is configured by connecting a control unit 21, a storage unit 22, an input unit 23, a display unit 24, a communication I/F 25, a media input/output unit 26, a peripheral device I/F unit 27, and the like via a bus 29.

The control unit 21 comprises a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU loads programs stored in the storage unit 22, the ROM, a recording medium, or the like into a work memory area on the RAM and executes them, thereby driving and controlling each unit connected via the bus 29.

The CPU of the control unit 21 executes an information acquisition process that acquires and outputs unexpected information for an arbitrary input keyword. The details of this information acquisition process are described later.

The ROM permanently holds programs such as the boot program and BIOS of the information processing apparatus 2, as well as data and the like. The RAM temporarily holds loaded programs and data and provides a work area that the control unit 21 uses to perform various processes.

The storage unit 22 stores the programs executed by the control unit 21, the data needed to execute them, an OS (operating system), and the like. The control unit 21 reads this program code as needed and transfers it to the RAM, from which the CPU reads and executes it.

The input unit 23 is an input device such as a keyboard, a mouse, or a touch panel, and outputs the input data to the control unit 21.
The display unit 24 comprises a display device such as a liquid crystal panel and a logic circuit that performs display processing in cooperation with the display device, and displays on the display device the display information input under the control of the control unit 21. When the input unit 23 is a touch panel, the touch panel is formed integrally with the display of the display unit 24.

The communication I/F (interface) 25 includes an antenna, a communication control circuit, and the like, and is an interface that mediates communication with the network 10.
The media input/output unit 26 is a media input/output device such as a CD drive or a DVD drive, and reads data from and writes data to media under the control of the control unit 21.

The peripheral device I/F (interface) unit 27 is a port for connecting peripheral devices; data is exchanged with peripheral devices via the peripheral device I/F unit 27. The connection to a peripheral device may be wired or wireless.
The bus 29 is a path that mediates the exchange of control signals, data signals, and the like.

The document DB 3 is a database that stores a document group (a plurality of documents) as shown in FIG. 3(a), and is held, for example, in a storage area of a Web server or the like on the network 10 as shown in FIG. 1. In this specification, a "document" means text data that describes or explains a certain item. Some of the phrases contained in a document are assumed to have links (hyperlinks) to other items. A "document group" is a collection of such documents. For example, the free encyclopedia Wikipedia is referred to as a document group, and each of its articles as a document. The document group is not limited to Wikipedia; other dictionaries, glossaries, informational articles, news articles, and the like may also be included.

As shown in FIG. 3(b), the document DB 3 stores words such as "chocolate" as heading items and, for each heading item, the corresponding article as body text, for example "Chocolate is made mainly from cacao mass obtained by fermenting and roasting cacao seeds, ...".

The link structure DB 4 is a database that stores link structure data for the document group stored in the document DB 3. Link structure data is obtained by extracting the link relations between documents in which links are embedded; for each word, the words set as its link destinations and link sources are aggregated and stored. A single word may have multiple link destinations and link sources. A link is a hyperlink embedded in a document, that is, information for moving to another location in the same document, to another document, or to an entirely different site.

For example, as shown in FIG. 4, the word "ice cream" has the articles (documents) on "Natsume Soseki", "Yamato-class battleship", "chocolate", "vanilla", and "crepe" as its link sources. In other words, a link to "ice cream" is embedded in each of those articles. The "ice cream" document in turn contains embedded links for jumping to "vanilla", "crepe", and so on as link destinations. In this case, "vanilla" and "crepe" are linked bidirectionally with "ice cream", whereas "Natsume Soseki", "Yamato-class battleship", and "chocolate" are one-way links.
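The link relations in the FIG. 4 example can be pictured with a small data structure. The following is a minimal sketch only: the dictionary layout, `LINK_STRUCTURE`, and `classify_links` are hypothetical names chosen for illustration, not the actual schema of the link structure DB 4.

```python
# Hypothetical in-memory stand-in for the link structure DB 4,
# populated with the "ice cream" example described for FIG. 4.
LINK_STRUCTURE = {
    "ice cream": {
        # Articles that contain a link TO "ice cream".
        "sources": ["Natsume Soseki", "Yamato-class battleship",
                    "chocolate", "vanilla", "crepe"],
        # Articles that the "ice cream" article links to.
        "destinations": ["vanilla", "crepe"],
    }
}


def classify_links(word, db=LINK_STRUCTURE):
    """Split the words linked with `word` into bidirectional and one-way links."""
    entry = db[word]
    sources = set(entry["sources"])
    destinations = set(entry["destinations"])
    bidirectional = sources & destinations
    one_way = (sources | destinations) - bidirectional
    return sorted(bidirectional), sorted(one_way)


bidirectional, one_way = classify_links("ice cream")
print(bidirectional)  # words linked in both directions
print(one_way)        # words linked in one direction only
```

Storing sources and destinations separately keeps the direction of each link available, while their union gives the full set of words that have a link relation with the keyword.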

One example of such a link structure DB 4 is DBpedia, which aggregates the link structure of Wikipedia. DBpedia is a community project that extracts information from Wikipedia and publishes it as LOD (Linked Open Data). The link structure data used in the present invention is not limited to DBpedia; various other LOD sources may be used.

The associative word DB 5 stores a plurality of associative words for each word. As shown in FIG. 5, for the word "ice cream", for example, "pudding", "cake", "caramel", and so on are stored in the associative word DB 5 as its associative words. The associative words for a given word are determined from the co-occurrence counts of words contained in various document groups.

FIG. 6 illustrates an example of how the associative word DB 5 is generated. As shown in FIG. 6, the co-occurrence counts of words extracted from a wide variety of documents are tallied, and words whose co-occurrence count exceeds a preset threshold are set as associative words. For example, as shown by the co-occurrence counter 51 in FIG. 6, the words that appear in the same document as "ice cream" are counted as follows: "pudding" 100 times, "cake" 80 times, "caramel" 70 times, and "Yokohama" 20 times. Of these, the words whose count exceeds the threshold, namely "pudding", "cake", "caramel", and so on, are stored in the associative word DB 5 as associative words of "ice cream". The source documents for the associative word DB 5 include Wikipedia articles as well as documents from general Web sites.
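The co-occurrence counting and thresholding described for FIG. 6 can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, documents are assumed to be already tokenized into word lists, and the threshold of 50 is an invented value (the text only says the threshold is preset).

```python
from collections import Counter


def cooccurrence_counts(target, documents):
    """Count, for each word, how many documents it shares with `target`."""
    counts = Counter()
    for words in documents:
        unique = set(words)
        if target in unique:
            counts.update(unique - {target})
    return counts


def select_associative_words(counts, threshold):
    """Keep words whose co-occurrence count exceeds the threshold, most frequent first."""
    return [word for word, n in counts.most_common() if n > threshold]


# Toy corpus (hypothetical) to show the counting step.
toy_documents = [
    ["ice cream", "pudding"],
    ["ice cream", "pudding", "cake"],
    ["cake", "tea"],
]
toy_counts = cooccurrence_counts("ice cream", toy_documents)

# Counts quoted in the text for the co-occurrence counter 51 (FIG. 6).
fig6_counts = Counter({"pudding": 100, "cake": 80, "caramel": 70, "Yokohama": 20})
associative = select_associative_words(fig6_counts, threshold=50)  # threshold is assumed
print(associative)
```

With the assumed threshold, "Yokohama" (20 co-occurrences) is excluded while the three dessert words are kept, matching the outcome described in the text.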

Next, the flow of the information acquisition process in the information processing system 1 is described. FIG. 7 is a flowchart showing the flow of the information acquisition process executed by the information processing apparatus 2. As shown in the figure, when the user inputs an arbitrary keyword into the information processing apparatus 2 (step S1), the control unit 21 of the information processing apparatus 2 acquires the link structure data for the input keyword (step S2). The link structure data is stored in the link structure DB 4. For example, when the input keyword is "ice cream", the control unit 21 acquires the link structure data for "ice cream" from the link structure DB 4. As shown in FIG. 4, the link structure data for "ice cream" associates it with "Natsume Soseki", "Yamato-class battleship", "chocolate", "vanilla", and "crepe" as link sources and with "vanilla" and "crepe" as link destinations. The control unit 21 acquires these link source and link destination words, "Natsume Soseki", "Yamato-class battleship", "chocolate", "vanilla", and "crepe", and holds them in the RAM.

Next, the control unit 21 acquires from the associative word DB 5 the associative words of the keyword "ice cream" input in step S1 (step S3). As shown in FIG. 8, assume that "pudding", "cake", "caramel", and so on are stored in the associative word DB 5 as associative words of the keyword "ice cream". The control unit 21 acquires "pudding", "cake", and "caramel" as associative words of the input keyword "ice cream" and holds them in the RAM, together with the input keyword, as an input keyword group 52.

Associative words are added to the input keyword group 52 because phrases whose association is easy to anticipate lack unexpectedness; including them lowers the unexpectedness (dissimilarity) score for such easily associated words and improves the accuracy of the output. Accordingly, when simplifying the system or reducing the amount of computation matters more than the accuracy of the output, step S3 may be omitted.
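Step S3 and its optional omission can be sketched as below. The names `ASSOCIATIVE_WORD_DB`, `build_input_keyword_group`, and the `use_associative_words` flag are hypothetical and stand in for the associative word DB 5 and the control flow described above.

```python
# Hypothetical stand-in for the associative word DB 5 (contents from the FIG. 8 example).
ASSOCIATIVE_WORD_DB = {"ice cream": ["pudding", "cake", "caramel"]}


def build_input_keyword_group(keyword, assoc_db, use_associative_words=True):
    """Return the input keyword group: the keyword plus, optionally, its associative words.

    Passing use_associative_words=False corresponds to omitting step S3,
    trading output accuracy for a simpler, cheaper computation.
    """
    group = [keyword]
    if use_associative_words:
        group.extend(assoc_db.get(keyword, []))
    return group


full_group = build_input_keyword_group("ice cream", ASSOCIATIVE_WORD_DB)
keyword_only = build_input_keyword_group(
    "ice cream", ASSOCIATIVE_WORD_DB, use_associative_words=False
)
print(full_group)
print(keyword_only)
```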

Next, the control unit 21 calculates the unexpectedness (dissimilarity) between the documents for the input keyword and its associative words (the input keyword group 52) and the documents for the linked phrases acquired in step S2 (step S4). The method of calculating the unexpectedness is described here with reference to FIGS. 9 and 10. FIG. 9 illustrates the calculation of the unexpectedness (dissimilarity), and FIG. 10 shows an example of calculating the similarity.

As shown in FIG. 9, the control unit 21 first reads from the document DB 3 the documents 7a, 7b, 7c, ... for the words in the input keyword group 52 and the documents 8a, ... for the words that have a link relation according to the link structure data acquired in step S2, and calculates the similarity between each pair of these documents. The similarity can be calculated using, for example, cosine similarity.

As shown in FIG. 10, the cosine similarity is calculated by Expression (1) using document vectors A and B, which are indexes representing the characteristics of each document.
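Expression (1) appears only in FIG. 10 and is not reproduced in the text. Assuming it is the standard definition of cosine similarity, for document vectors A = (a_1, ..., a_n) and B = (b_1, ..., b_n) it would read:

```latex
\cos(A, B) \;=\; \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
\;=\; \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

Because TF-IDF and TF values are non-negative, this quantity lies between 0 and 1 for such vectors, which is what allows a dissimilarity to be formed by subtracting it from 1.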

A document vector is generated from the appearance frequencies of the words extracted from a document. For example, document vector A is obtained by computing the TF-IDF value of each word extracted from the document 7a on the input keyword "ice cream" ("ice cream", "milk", "raw material", "air", ...) from its appearance frequency, giving (1.2, 2.0, 1.25, 0.55, ...), and using these values as the vector elements. TF values may be used instead of TF-IDF values. Similarly, document vector B is obtained by computing the TF-IDF value of each word extracted from the document 8a on the linked word "chocolate" ("chocolate", "cacao", "seed", "fermentation", ...), giving (1.4, 1.5, 0.6, 0.2, ...), and using these values as the vector elements. The elements of document vectors A and B are ordered so that the same extracted word occupies the same position in both vectors.
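The construction of aligned document vectors and their cosine similarity can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the text does not specify the TF-IDF variant, so the sketch assumes raw term counts multiplied by `log(N / df) + 1`, and the token lists are invented toy data.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build aligned TF-IDF vectors for `docs` (a dict of name -> list of tokens).

    All vectors share one sorted vocabulary, so the same word always occupies
    the same position. The IDF form log(N / df) + 1 is an assumption.
    """
    vocab = sorted({w for words in docs.values() for w in words})
    n_docs = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    vectors = {}
    for name, words in docs.items():
        tf = Counter(words)
        vectors[name] = [tf[w] * (math.log(n_docs / df[w]) + 1.0) for w in vocab]
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy token lists (hypothetical) standing in for documents 7a and 8a.
docs = {
    "ice cream": ["ice cream", "milk", "raw material", "air", "milk"],
    "chocolate": ["chocolate", "cacao", "seed", "fermentation", "milk"],
}
vocab, vectors = tf_idf_vectors(docs)
similarity = cosine_similarity(vectors["ice cream"], vectors["chocolate"])
print(round(similarity, 3))
```

Using one shared vocabulary is a simple way to satisfy the alignment requirement stated above; words absent from a document simply receive a zero element.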

As shown in FIG. 9, the control unit 21 obtains a document vector for each document of the input keyword group 52, namely the document 7a for "ice cream", the document 7b for "pudding" (associative word), the document 7c for "cake" (associative word), and so on, and calculates the similarity of each to document vector B of the document 8a for the linked word "chocolate". It then averages the calculated similarities and subtracts the average from 1 to obtain the dissimilarity.
This yields the dissimilarity between the input keyword group 52 and the linked word "chocolate".

In the same way, the dissimilarity to the input keyword group 52 is obtained for each of the other linked words, "Natsume Soseki", "Yamato-class battleship", "vanilla", "crepe", and so on. In the first embodiment, the dissimilarity itself is used as the unexpectedness.
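The averaging step of the first embodiment (unexpectedness = 1 minus the mean similarity over the input keyword group) can be sketched as below. The function names and the two-dimensional example vectors are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def unexpectedness(group_vectors, linked_vector):
    """Return 1 minus the mean similarity between each keyword-group document
    vector and the document vector of one linked word."""
    if not group_vectors:
        raise ValueError("the input keyword group must contain at least one document")
    similarities = [cosine_similarity(v, linked_vector) for v in group_vectors]
    return 1.0 - sum(similarities) / len(similarities)


# Hypothetical vectors: one group document identical to the linked document,
# one completely unrelated to it.
group = [[1.0, 0.0], [0.0, 1.0]]
linked = [1.0, 0.0]
score = unexpectedness(group, linked)
print(score)
```

In this toy case the similarities are 1 and 0, their mean is 0.5, and the resulting unexpectedness is 0.5. A linked word whose document resembles the associative words as well as the keyword is pulled toward a low score, which is the effect the associative words are meant to have.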

From the calculation in step S4, the control unit 21 identifies several words that have a high unexpectedness with respect to the input keyword group 52. It then acquires information on the words with high unexpectedness (step S5) and outputs it (step S6).

As shown in FIG. 11, assume that an unexpectedness value has been obtained for each word that has a link relation with the input keyword group 52. For example, assume the unexpectedness with respect to the input keyword group 52 (such as "chocolate") is 0.8 for "Natsume Soseki", 0.64 for "Ogasawara Islands", 0.5 for "Tanaka Kakuei", 0.4 for "Abe Shinzo", and so on. The control unit 21 acquires one or more words whose unexpectedness score is above a preset threshold, or whose unexpectedness score ranks highest, and acquires the documents 9 on these highly unexpected words from the document DB 3. It then extracts from the document 9 on a highly unexpected word a sentence that contains the input keyword "ice cream", for example: "He had a great sweet tooth, and while convalescing he once troubled those around him by asking for ice cream, which was a luxury at the time."
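Steps S5 and the sentence extraction can be sketched as follows, using the unexpectedness values quoted for FIG. 11. The function names, the threshold of 0.6, and the regex-based sentence splitter are assumptions for illustration; the text does not say how sentences are delimited, and the article text below is an invented stand-in for document 9.

```python
import re


def select_unexpected_words(scores, threshold=None, top_k=None):
    """Rank words by unexpectedness, then keep those above `threshold`
    and/or the `top_k` highest, whichever filters are given."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(word, s) for word, s in ranked if s > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [word for word, _ in ranked]


def sentences_containing(document, keyword):
    """Split on sentence-ending punctuation (a naive assumption) and
    return the sentences that mention `keyword`."""
    sentences = re.split(r"(?<=[.!?。])\s*", document)
    return [s for s in sentences if s and keyword in s]


# Unexpectedness values quoted in the text for FIG. 11.
scores = {
    "Natsume Soseki": 0.8,
    "Ogasawara Islands": 0.64,
    "Tanaka Kakuei": 0.5,
    "Abe Shinzo": 0.4,
}
above_threshold = select_unexpected_words(scores, threshold=0.6)  # threshold is assumed
top_one = select_unexpected_words(scores, top_k=1)

# Invented stand-in for the article on the selected word.
article = (
    "Natsume Soseki was a novelist. "
    "He had a great sweet tooth and once asked for ice cream while convalescing."
)
hits = sentences_containing(article, "ice cream")
print(above_threshold, top_one, hits)
```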

The control unit 21 outputs the extracted sentence as information on the high-unexpectedness word. Output means display on the display unit 24, storage in the storage unit 22, and the like. It may also be printing on a printing apparatus connected to the peripheral device I/F unit 27, or audio output from an audio output unit (not shown). When outputting, the control unit 21 follows a predetermined output format. For example, as shown in FIG. 11, the unexpected information is merged into the predetermined format and output as: "Speaking of 'ice cream', they say that 'Natsume Soseki' 'had a great sweet tooth, and while convalescing he once troubled those around him by asking for ice cream, which was a luxury at the time.'"
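Merging the extracted sentence into a predetermined output format amounts to simple template filling. The sketch below assumes an English template that loosely mirrors the FIG. 11 example; the template string and the name `format_output` are illustrative, not taken from the patent.

```python
# Hypothetical output template with three slots: the input keyword,
# the high-unexpectedness word, and the sentence extracted for it.
OUTPUT_TEMPLATE = "Speaking of '{keyword}', they say this about '{word}': {sentence}"

def format_output(keyword, word, sentence, template=OUTPUT_TEMPLATE):
    """Fill the output template with the three pieces of information."""
    return template.format(keyword=keyword, word=word, sentence=sentence)

print(format_output(
    "ice cream",
    "Natsume Soseki",
    "He had a great sweet tooth and once asked for ice cream while convalescing.",
))
```

Keeping the template as a parameter lets the same function serve display, printing, or speech output with different wording.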

As described above, when an arbitrary keyword is input via the input unit 23, the control unit 21 of the information processing apparatus 2 acquires the phrases that have a link relationship with the input keyword based on the link structure data of the document group, calculates the inter-document unexpectedness between the document 7a relating to the input keyword and the document 8a relating to each link-related phrase, and, based on the calculated unexpectedness, acquires and outputs unexpected information about the input keyword. This makes it easier for the user to discover surprising information that is hard to find through ordinary document search or browsing. The technique can be applied to exhaustive information discovery, promotion of automated AI dialogue, idea support, and the like.

In addition, the control unit 21 acquires associative words of the input keyword and also calculates the inter-document unexpectedness between the documents relating to those associative words and the documents of the link-related words. This reliably lowers the unexpectedness (dissimilarity) score of words whose association is easily anticipated, so that accurate output results can be obtained.

In the above description, as shown in FIG. 1, the information processing apparatus 2 acquires documents, link structure data, associative words, and the like from the document DB 3, the link structure DB 4, and the associative word DB 5, which are connected for communication via the network 10. However, the invention is not limited to this configuration. The document DB 3, the link structure DB 4, and the associative word DB 5 may instead be stored in the storage unit 22 of the information processing apparatus 2, with the control unit 21 acquiring the document data, link structure data, associative words, and the like from the storage unit 22.

The link structure data contains both phrases in a bidirectional link relationship, where each is a link source and a link destination of the other, and phrases in a one-way link relationship, where a phrase is only a link source or only a link destination. Phrases joined by a bidirectional link may be presumed to be similar to each other and excluded from the unexpectedness calculation.
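Excluding bidirectionally linked phrases can be sketched with set operations. This assumes the link structure is available as a mapping from each phrase to the set of phrases it links to; that representation, the function name `one_way_linked_words`, and the sample links are all hypothetical.

```python
def one_way_linked_words(links, keyword):
    """Return phrases linked to `keyword` in exactly one direction.

    `links` maps each phrase to the set of phrases it links to.
    Phrases linked in both directions are presumed similar and dropped.
    """
    outgoing = set(links.get(keyword, set()))
    incoming = {src for src, dsts in links.items() if keyword in dsts}
    bidirectional = outgoing & incoming
    return (outgoing | incoming) - bidirectional

# Hypothetical link structure, for illustration only.
links = {
    "ice cream": {"chocolate", "vanilla"},
    "chocolate": {"ice cream"},
    "Natsume Soseki": {"ice cream"},
}
print(sorted(one_way_linked_words(links, "ice cream")))
```

Here "chocolate" is dropped because it links to and from "ice cream", while "vanilla" and "Natsume Soseki" remain as candidates.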

[Second Embodiment]
Next, a second embodiment of the present invention will be described. In the second embodiment, when the information processing apparatus 2 calculates the unexpectedness, it takes the reference count of each document into account in addition to the dissimilarity described in the first embodiment. The reference count is the number of times a document has been referenced and can be obtained from the access log of the corresponding article (document). The reference count may be acquired via the network 10 each time the information acquisition process is executed, or it may be stored in advance in the storage unit 22 of the information processing apparatus 2.

As shown in the unexpectedness calculation table 6 of FIG. 12, in the second embodiment the unexpectedness is the dissimilarity multiplied by the reference count (normalized reference count). The dissimilarity is obtained by the same procedure as in the first embodiment. The normalized reference count is the reference count of a word normalized by the total reference count (a value from 0 to 1).

For example, with respect to the input keyword group 52, if the word "chocolate" has a dissimilarity of 0.2 and a normalized reference count of 0.7, its unexpectedness is 0.14. If the word "Natsume Soseki" has a dissimilarity of 0.8 and a normalized reference count of 0.8, its unexpectedness is 0.64, and if the word "Yamato-class battleship" has a dissimilarity of 0.9 and a normalized reference count of 0.3, its unexpectedness is 0.27.

The control unit 21 obtains the dissimilarity between the documents for the input keyword group 52 and the document for each link-related word by the same procedure as in the first embodiment, then multiplies that dissimilarity by the normalized reference count of the document for the link-related word to obtain the unexpectedness. After that, as in the first embodiment, it acquires information on the high-unexpectedness words from the document group (document DB 3 and the like) and outputs it.
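The second-embodiment score is a plain product, which the following sketch reproduces with the values quoted for FIG. 12. The helper `normalize_reference_count` is an assumption: the description only says the count is normalized by the total into the range 0 to 1, so dividing by a caller-supplied total is one possible reading.

```python
def normalize_reference_count(count, total):
    """Scale a raw reference count into 0..1.
    Assumption: normalization is a division by a caller-supplied total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total

def unexpectedness(dissimilarity, normalized_refs):
    """Second embodiment: dissimilarity weighted by normalized reference count."""
    return dissimilarity * normalized_refs

# (dissimilarity, normalized reference count) pairs quoted in the text.
examples = {
    "chocolate": (0.2, 0.7),
    "Natsume Soseki": (0.8, 0.8),
    "Yamato-class battleship": (0.9, 0.3),
}
for word, (d, r) in examples.items():
    # Rounded to two decimals to hide floating-point noise.
    print(word, round(unexpectedness(d, r), 2))
```

The product rewards words that are both dissimilar and widely referenced: "Yamato-class battleship" is the most dissimilar of the three but ends up below "Natsume Soseki" because its document is referenced less.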

As described above, the information processing apparatus 2 of the second embodiment calculates the unexpectedness of a word while taking the reference count of its document into account. Considering the reference count makes it possible to extract unexpected information from among words that are more widely known, which is more likely to attract the user's interest.

[Third Embodiment]
Next, a third embodiment of the present invention will be described. In the third embodiment, when the information processing apparatus 2 calculates the unexpectedness of each word, it also calculates an unexpectedness for each phrase category and outputs information with the category taken into account.

For example, as shown in FIG. 13, the control unit 21 acquires the phrases that have a link relationship with the input keyword from the link structure data and classifies them by category. Information on which category each phrase belongs to is assigned in advance to the phrase or to the category. The control unit 21 also obtains the unexpectedness of each phrase relative to the input keyword group 52, averages the unexpectedness within each category, and uses that average as the unexpectedness of the category. Then, for example, it acquires and outputs information on a high-unexpectedness phrase from within a high-unexpectedness category.

FIG. 13 shows the words that have a link relationship with the input keyword classified into (a) the category "confectionery", (b) the category "person", and (c) the category "place". As shown in FIG. 13(a), the category "confectionery" contains "chocolate", "cream puff", "cheesecake", and so on. As shown in FIG. 13(b), the category "person" contains "Natsume Soseki", "Tanaka Kakuei", "Abe Shinzo", and so on. As shown in FIG. 13(c), the category "place" contains "Ogasawara Islands", "Utah State University", "Tamachi Station", and so on.

The unexpectedness relative to the input keyword group 52 is then obtained for each word. For example, as shown in FIG. 13(a), in the category "confectionery" the word "chocolate" has an unexpectedness of 0.14, the word "cream puff" 0.3, and the word "cheesecake" 0.3. The unexpectedness of the other words in the category "confectionery" is obtained in the same way, and the unexpectedness of the category, here 0.3, is obtained as the average of these values.

Likewise, as shown in FIG. 13(b), the unexpectedness of the category "person" is obtained as 0.6, and as shown in FIG. 13(c), the unexpectedness of the category "place" is obtained as 0.5. In this case the category "person" of FIG. 13(b) has the highest unexpectedness among the categories, so the control unit 21 outputs, as the unexpected information, information on the word "Natsume Soseki", which has a high unexpectedness within the category "person".
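The category-level procedure (average per category, pick the highest category, then the highest word inside it) can be sketched as below. The function names are hypothetical, and the sample scores are illustrative: only some of them appear in the text, and FIG. 13 covers more words per category than are listed here, so the averages differ from the 0.3, 0.6, and 0.5 quoted above.

```python
from collections import defaultdict
from statistics import mean

def category_unexpectedness(word_scores, word_category):
    """Average the word-level unexpectedness within each category."""
    grouped = defaultdict(list)
    for word, score in word_scores.items():
        grouped[word_category[word]].append(score)
    return {cat: mean(vals) for cat, vals in grouped.items()}

def most_surprising(word_scores, word_category):
    """Pick the highest-scoring category, then the highest-scoring word in it."""
    cat_scores = category_unexpectedness(word_scores, word_category)
    best_cat = max(cat_scores, key=cat_scores.get)
    in_cat = {w: s for w, s in word_scores.items() if word_category[w] == best_cat}
    return best_cat, max(in_cat, key=in_cat.get)

# Illustrative data; category assignments follow FIG. 13 as described.
word_category = {
    "chocolate": "confectionery", "cream puff": "confectionery",
    "cheesecake": "confectionery",
    "Natsume Soseki": "person", "Tanaka Kakuei": "person", "Abe Shinzo": "person",
    "Ogasawara Islands": "place", "Utah State University": "place",
    "Tamachi Station": "place",
}
word_scores = {
    "chocolate": 0.14, "cream puff": 0.3, "cheesecake": 0.3,
    "Natsume Soseki": 0.8, "Tanaka Kakuei": 0.5, "Abe Shinzo": 0.4,
    "Ogasawara Islands": 0.64, "Utah State University": 0.5,
    "Tamachi Station": 0.4,  # the two place scores after the first are made up
}
print(most_surprising(word_scores, word_category))
```

To support the user-selected category described next, `best_cat` would simply be replaced by the user's choice.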

Alternatively, the user may be allowed to select the category. In this case, the control unit 21 displays a category selection screen or the like on the display unit 24 as an interface for selecting a category. When the user selects a category, the control unit 21 acquires information on a high-unexpectedness phrase within the selected category and outputs it as the unexpected information.

As described above, the information processing apparatus 2 of the third embodiment builds on the information processing apparatus 2 of the first or second embodiment by additionally calculating an unexpectedness for each category and using it when presenting information. This makes it possible to present information that is even more surprising to the user.

Preferred embodiments of the information processing apparatus, the information processing system, and the program according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope of the technical idea disclosed in the present application, and these are naturally understood to fall within the technical scope of the present invention.

1 …… Information processing system
2 …… Information processing apparatus
21 …… Control unit
22 …… Storage unit
23 …… Input unit
24 …… Display unit
25 …… Communication I/F
26 …… Media input/output unit
27 …… Peripheral device I/F unit
29 …… Bus
3 …… Document DB
4 …… Link structure DB
5 …… Associative word DB
51 …… Co-occurrence counter
6 …… Unexpectedness calculation table
52 …… Input keyword group
7a, 7b, 7c …… Documents
8a …… Document
9 …… Document for a word with high unexpectedness
10 …… Network

Claims

An input means for inputting a keyword;
Storage means for storing a phrase associated with a link destination as a link relation and a document related to the phrase;
Link phrase acquisition means for acquiring a phrase that is linked to the input keyword,
Document extracting means for extracting a document related to the input keyword and a document related to a phrase linked to the input keyword;
Computing means for calculating the degree of surprise between documents using the similarity between the document related to the keyword and the document related to the phrase related to the link;
An information processing apparatus comprising:

The information processing apparatus according to claim 1, further comprising: an output unit that outputs at least a part of a document related to the phrase having the link relationship according to a calculation result of the calculation unit.

The storage means stores words associated with each other in association with each other,
An associative word acquisition means for acquiring, as an associative word, a phrase associated with the input keyword;
The calculation means further calculates the degree of unexpectedness for a document related to the associative word, using its inter-document similarity to the document related to the phrase acquired by the link phrase acquisition means. The information processing apparatus according to claim 2.

The storage means stores the total number referred to for each document;
The calculation means obtains the degree of unexpectedness by weighting the dissimilarity between the document related to the input keyword and the document related to the phrase acquired by the link phrase acquisition means, based on the number of references of the document. The information processing apparatus according to claim 1.

The storage means stores a phrase in association with each category,
The computing means calculates the unexpectedness for each category of words,
The information processing apparatus according to claim 2, wherein the output unit outputs at least a part of a document related to a phrase whose category is unexpected for the input keyword.

A server having storage means for storing a phrase associated with a link destination as a link relation and a document related to the phrase;
An input means for inputting a keyword;
Link phrase acquisition means for acquiring a phrase linked to the input keyword from the server,
Document extracting means for extracting from the server a document related to the input keyword and a document related to a phrase linked to the input keyword;
An information processing apparatus comprising: an operation unit that calculates a degree of unexpectedness between documents using a similarity between the document related to the keyword and the document related to the phrase related to the link;
An information processing system comprising:

A program for causing a computer to function as the information processing apparatus according to claim 1.