JP2009086944A

JP2009086944A - Information processor and information processing program

Info

Publication number: JP2009086944A
Application number: JP2007254855A
Authority: JP
Inventors: Mamiko Oka (岡満美子)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2009-04-23
Anticipated expiration: 2027-09-28
Also published as: JP5151368B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processor that obtains evaluation information about a text, rather than evaluation information such as reputation or criticism about a specific word (keyword). The evaluation information is obtained together with its relationship to the information (text) being evaluated.

SOLUTION: The information processor 10 includes the following parts:
- a first information acquisition part 11 that acquires first information including text;
- a second information acquisition part 12 that acquires second information, which includes text and is the subject of evaluation;
- an identification part 13 that identifies, in the first information, a quoted part and a related part related to the quoted part;
- a similarity determination part 14 that determines the similarity in content between the quoted part and the second information;
- a related description amount determination part 15 that determines a related description amount, indicating either the amount of description in the related part or the amount of evaluation expressions in it;
- an output part 16 that outputs the first information based on the results of determining the similarity and the related description amount.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an information processing apparatus and an information processing program.

Techniques are known for extracting, from a document, evaluations of a target (see, for example, Patent Documents 1 and 2).

Patent Document 1 describes a technique for extracting pairs from document data. Each pair consists of an impression expression word, which expresses an impression of a target, and the noun phrase to which that word is linked. For example, when the target is a lipstick, the pair ("good", "feel when applied") is extracted as an impression expression word and its noun phrase.

Patent Document 2 describes a technique for extracting reputation information about a search term from a document. The extraction is based on where the search term and an evaluation expression appear in the document. For example, the technique extracts the reputation information "Mobile Gear is far more useful." This sentence contains the character string between the search term "Mobile Gear" and the evaluation expression "useful". Patent Document 2 also describes prioritizing the extracted pieces of reputation information in descending order of the probability that each one is reputation information about the search term.

The website "blogWatcher" (http://blogwatcher.pi.titech.ac.jp/) provides a service that extracts reputation information about keywords such as product names from a large collection of blogs. The site is run by the Okumura Laboratory at the Precision and Intelligence Laboratory, Tokyo Institute of Technology.

CyberAgent's website "Ameba Word-of-Mouth Reputation Search" (http://buzz.ameba.jp/) provides a service for specific keywords being discussed on blogs. It graphs and displays measures such as each keyword's number of occurrences and favorability.

Patent Document 3 describes a technique for finding, on the Internet, sites that contain a large amount of information matching the purpose of a survey (for example, rumors or reputation concerning a company or product).

Patent Document 1: Japanese Patent No. 3787318
Patent Document 2: Japanese Patent No. 3820878
Patent Document 3: Japanese Patent Laid-Open No. 2004-280569

An object of the present invention is to provide an information processing apparatus or information processing program that obtains evaluation information about a text. This differs from evaluation information such as reputation or criticism about a specific word (keyword). The evaluation information is obtained together with its relevance to the information (text) being evaluated.

An information processing apparatus according to the present invention includes:
- first information acquisition means for acquiring first information including text;
- second information acquisition means for acquiring second information that includes text and is the subject of evaluation;
- identification means for identifying, in the first information, a quoted part and a related part related to the quoted part;
- similarity determination means for determining the similarity in content between the quoted part and the second information;
- related description amount determination means for determining a related description amount, indicating the amount of description in the related part or the amount of evaluation expressions in it;
- output means for outputting the first information based on the results of determining the similarity and the related description amount.

In one aspect of the present invention, the similarity determination means divides the second information into a plurality of parts. It determines the similarity in content between the quoted part and each part of the second information. Based on the result, it determines which part of the second information the quoted part is similar to. The output means outputs the result of this determination.
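The part-wise matching in this aspect can be illustrated with a minimal sketch. It rests on three assumptions:
- the "parts" of the second information are sentences;
- similarity is cosine similarity over bag-of-words term counts;
- the tokenizer is a crude English-only stand-in (Japanese text would need a morphological analyzer).

The function names are hypothetical and not taken from the source.

```python
import math
import re
from collections import Counter


def terms(text: str) -> Counter:
    """Hypothetical tokenizer: lowercase alphanumeric words only."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_parts(second_info: str) -> list[str]:
    """Assumption: a part is a sentence ending in '.', '!', '?' or '。'."""
    return [p.strip() for p in re.split(r"(?<=[.!?。])\s*", second_info) if p.strip()]


def most_similar_part(quoted: str, second_info: str) -> tuple[int, float]:
    """Return (index of the most similar part, its similarity).

    Raises ValueError if the second information has no parts.
    """
    q = terms(quoted)
    scores = [cosine(q, terms(p)) for p in split_parts(second_info)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For example, if the second information is "Acme released a new phone today. The phone costs 500 dollars. Sales start in May.", the quote "the new phone costs 500 dollars" is matched to the second sentence (index 1).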

In one aspect of the present invention, the apparatus further includes display control means. The display control means causes a display device to display the related part of the first information output by the output means when both of the following hold: the similarity is equal to or greater than a predetermined value, and the related description amount is equal to or greater than a predetermined value.

In one aspect of the present invention, the similarity determination means divides the second information into a plurality of parts. It determines the similarity in content between the quoted part and each part of the second information. Based on the result, it determines which part of the second information the quoted part is similar to. The display control means displays each related part in association with the part of the second information to which the corresponding quoted part is similar.

An information processing program according to the present invention causes a computer to execute the following steps:
- acquiring first information including text;
- acquiring second information that includes text and is the subject of evaluation;
- identifying, in the first information, a quoted part and a related part related to the quoted part;
- determining the similarity in content between the quoted part and the second information;
- determining a related description amount, indicating the amount of description in the related part or the amount of evaluation expressions in it;
- outputting the first information based on the results of determining the similarity and the related description amount.

In one aspect of the present invention, the step of determining the similarity proceeds as follows. The second information is divided into a plurality of parts. The similarity in content between the quoted part and each part of the second information is determined. Based on the result, it is determined which part of the second information the quoted part is similar to. In the outputting step, the result of this determination is output.

In one aspect of the present invention, the program further includes a step of causing a display device to display the related part of the first information output in the outputting step. The related part is displayed when the similarity is equal to or greater than a predetermined value and the related description amount is equal to or greater than a predetermined value.

In one aspect of the present invention, the step of determining the similarity proceeds as follows. The second information is divided into a plurality of parts. The similarity in content between the quoted part and each part of the second information is determined. Based on the result, it is determined which part of the second information the quoted part is similar to. In the displaying step, each related part is displayed in association with the part of the second information to which the corresponding quoted part is similar.

According to the invention of claim 1, evaluation information about a text that is the subject of evaluation can be obtained in association with the information (text) being evaluated.

According to the invention of claim 2, evaluation information can be obtained for individual parts of the information being evaluated.

According to the invention of claim 3, evaluation information can be displayed in association with the information being evaluated.

According to the invention of claim 4, evaluation information can be displayed in association with individual parts of the information being evaluated.

According to the invention of claim 5, evaluation information about a text that is the subject of evaluation can be obtained in association with the information (text) being evaluated.

According to the invention of claim 6, evaluation information can be obtained for individual parts of the information being evaluated.

According to the invention of claim 7, evaluation information can be displayed in association with the information being evaluated.

According to the invention of claim 8, evaluation information can be displayed in association with individual parts of the information being evaluated.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing an example configuration of an information processing apparatus 10 according to the present embodiment.

In one form, the information processing apparatus 10 is implemented through the cooperation of hardware resources and software; it is, for example, a computer. Specifically, its functions are implemented by loading an information processing program recorded on a recording medium into main memory and executing it on a CPU (Central Processing Unit). The program may be provided on a computer-readable recording medium such as a CD-ROM, or by communication as a data signal. The functions may instead be implemented by hardware alone.

The information processing apparatus 10 may be implemented by a single physical device or by a plurality of devices. In one specific form, it is a server on the Internet.

In FIG. 1, the information processing apparatus 10 includes a first information acquisition unit 11, a second information acquisition unit 12, an identification unit 13, a similarity determination unit 14, a related description amount determination unit 15, and an output unit 16.

The first information acquisition unit 11 acquires first information including text.

Specifically, the first information is document information containing text, for example articles collected from websites such as blogs.

The first information acquisition unit 11 acquires the first information from, for example, a storage device such as a hard disk drive of the information processing apparatus 10, or from a device on the Internet.

The first information acquisition unit 11 may acquire the first information from a predetermined storage area. Alternatively, it may acquire, as the first information, information designated externally (by another device, another program module, a user, or the like).

In one form, the first information acquisition unit 11 acquires a first information group containing a plurality of pieces of first information.

The second information acquisition unit 12 acquires second information, which includes text and is the subject of evaluation.

Specifically, the second information is document information containing text. More specifically, it is natural-language text describing factual information, such as a news article or a press release.

In one form, the second information acquisition unit 12 receives, from a client device via the Internet, information designated by a user as the second information. Examples of such user-designated information include:
- text entered by the user into the client device through an input device such as a keyboard;
- a document file on the client device designated by the user;
- text obtained by scanning a paper document provided by the user and applying character recognition.

The user may also use a news article or press release on the Internet directly as the second information.

The second information acquisition unit 12 may acquire the second information in other ways. For example, it may accept text input from the user through an input device of the information processing apparatus 10. It may also acquire a document file in the information processing apparatus 10 or on the Internet in response to an instruction from the user (or from another device or program module).

The identification unit 13 identifies, in the first information acquired by the first information acquisition unit 11, a quoted part and a related part related to the quoted part.

In one form, the identification unit 13 first identifies the quoted part in the first information and then identifies the related part based on that result.

In one form, the identification unit 13 identifies a quoted part and a related part for each piece of first information in the first information group acquired by the first information acquisition unit 11.

The quoted part is, for example, a passage in which factual information, such as a news article or press release, is quoted.

The identification unit 13 may identify one or more quoted parts in the first information and, for each of them, identify the related part related to it.

Methods for identifying a quoted part include (a1) to (a3) below. These methods may be used alone or in combination.

(a1) The quoted part is identified from additional information (for example, a tag) that is attached to text in the first information and marks a quotation. For example, if the first information is an HTML (HyperText Markup Language) document, text enclosed in a <BLOCKQUOTE> tag is identified as a quoted part.
(a2) The quoted part is identified from symbols (for example, brackets) that enclose specific text in the first information. For example, opening brackets (such as "「") and closing brackets (such as "」") are extracted from the text. If the text between a bracket pair contains at least a predetermined number of characters (for example, 20 or more), that text is identified as a quoted part.
(a3) Text in the first information to which a link to other information is attached is identified as a quoted part. For example, in an HTML document, an entire sentence or paragraph that is anchor text is likely to be a link to a news article or similar item on another site, so that anchor text is identified as a quoted part.
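Rules (a1) to (a3) can be sketched with regular expressions over an HTML string. This is an illustration under simplifying assumptions, not the patent's implementation:
- regex-based HTML handling is crude;
- only the Japanese corner brackets 「」 are treated as (a2) delimiters;
- (a3) only accepts a paragraph that consists of exactly one link with plain anchor text.

The function names and the MIN_BRACKET_CHARS constant are hypothetical.

```python
import re

MIN_BRACKET_CHARS = 20  # example threshold given in (a2)


def strip_tags(fragment: str) -> str:
    """Remove HTML tags and surrounding whitespace (crude, for illustration)."""
    return re.sub(r"<[^>]+>", "", fragment).strip()


def find_quoted_parts(html: str) -> list[str]:
    """Return candidate quoted parts found by rules (a1), (a2) and (a3), in that order."""
    parts = []
    # (a1) Text enclosed in a <blockquote> element.
    for m in re.finditer(r"<blockquote(?:\s[^>]*)?>(.*?)</blockquote>", html, re.I | re.S):
        parts.append(strip_tags(m.group(1)))
    # (a2) Text between corner brackets with at least MIN_BRACKET_CHARS characters.
    for m in re.finditer(r"「([^「」]*)」", html):
        text = strip_tags(m.group(1))
        if len(text) >= MIN_BRACKET_CHARS:
            parts.append(text)
    # (a3) A paragraph whose entire content is one link's anchor text
    #      (anchor text containing nested tags is not handled here).
    for m in re.finditer(r"<p(?:\s[^>]*)?>\s*<a\s[^>]*href[^>]*>([^<]*)</a>\s*</p>", html, re.I):
        parts.append(m.group(1).strip())
    return parts
```

A production system would more likely use a real HTML parser and language-aware bracket handling. The regex version only keeps the three rules easy to compare with the text.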

One way to identify the related part is based on the structure of the text. This approach exploits a common pattern: blog posts and similar writing often quote an article describing the original factual information, and then follow it with opinions or evaluations of it. Methods for identifying the related part on this basis include (b1) to (b3) below. These methods may be used alone or in combination.

(b1) The text from immediately after a quoted part up to the next quoted part is identified as the related part of the former quoted part.
(b2) When text between brackets has been identified as a quoted part, the rest of the paragraph containing that quoted part is identified as its related part.
(b3) When the first information contains only one quoted part, all text in the first information other than that quoted part is identified as its related part.
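Rules (b1) and (b3) can be sketched as follows. The sketch assumes the first information has already been split into an ordered list of segments, each flagged as quoted or not. The Segment type and the function name are hypothetical, and rule (b2) is omitted because it needs paragraph boundaries.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    is_quote: bool  # True if the segment was identified as a quoted part


def related_parts(segments: list[Segment]) -> list[tuple[str, str]]:
    """Pair each quoted part with its related part using rules (b1) and (b3)."""
    quote_idx = [i for i, s in enumerate(segments) if s.is_quote]
    if len(quote_idx) == 1:
        # (b3) A single quote: everything else in the document is related text.
        i = quote_idx[0]
        rest = " ".join(s.text for j, s in enumerate(segments) if j != i)
        return [(segments[i].text, rest)]
    pairs = []
    for k, i in enumerate(quote_idx):
        # (b1) From just after this quote up to the next quote (or the end).
        end = quote_idx[k + 1] if k + 1 < len(quote_idx) else len(segments)
        rest = " ".join(s.text for s in segments[i + 1:end])
        pairs.append((segments[i].text, rest))
    return pairs
```

If no quoted part exists, the function returns an empty list. This matches the later handling of first information without a quoted part as a separate case.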

The related part may also be identified on a basis other than text structure. For example, a portion of the first information whose content is highly similar to the quoted part (for example, by similarity under a vector space model) may be identified as the related part.

The similarity determination unit 14 determines the similarity in content between the quoted part identified by the identification unit 13 and the second information acquired by the second information acquisition unit 12.

In one form, the similarity determination unit 14 handles each of the one or more quoted parts identified in the first information by the identification unit 13. For each one, it determines the similarity in content between that quoted part and the second information.

In one form, the similarity determination unit 14 handles each piece of first information in the first information group. For each one, it determines the similarity in content between each of its quoted parts and the second information.

In some forms, the similarity determination unit 14 determines the similarity after the related description amount determination unit 15 (described below) has determined the related description amount. In such forms, the similarity determination unit 14 may restrict itself to quoted parts whose related parts have a related description amount equal to or greater than a predetermined value (for example, related parts determined to contain a description).

The similarity may be expressed in two levels (similar or not similar) or in three or more levels.

The similarity determination unit 14, for example, extracts important words from the second information and from the quoted part. It then computes the similarity between them using a vector space model or a similar technique. Important words are words that matter for expressing facts, such as content words, named entities (proper nouns, date expressions, price expressions, and so on), and technical terms. The computed similarity may be used directly as the final content similarity, or the final content similarity may be derived from it. The similarity determination unit 14 may also determine the similarity by another method.
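The vector space computation can be sketched as cosine similarity between word-frequency vectors. In this sketch, "important word" extraction is approximated by dropping a small hypothetical stop-word list. The source instead keeps content words, named entities and technical terms, which would require a morphological analyzer and named-entity recognizer. The 0.5 threshold in the two-level variant is also an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list standing in for "non-important" words.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "to", "from", "and", "is", "are"}


def important_words(text: str) -> Counter:
    """Frequency vector of the words that survive the stop-word filter."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def vector_space_similarity(quoted: str, second_info: str) -> float:
    """Cosine similarity of the important-word frequency vectors (0.0 to 1.0)."""
    a, b = important_words(quoted), important_words(second_info)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_similar(quoted: str, second_info: str, threshold: float = 0.5) -> bool:
    """Two-level form of the similarity; the threshold value is an assumption."""
    return vector_space_similarity(quoted, second_info) >= threshold
```

Term weighting such as TF-IDF could replace raw counts without changing the overall structure.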

When the quoted part is anchor text, the similarity may be computed in either of two ways. It may be computed between the anchor text and the second information (or the title of the second information). Alternatively, it may be computed between the full text of the page the anchor links to and the second information.

The related description amount determination unit 15 determines a related description amount. This amount indicates either the amount of description in the related part identified by the identification unit 13 or the amount of evaluation expressions in that related part.

In one form, the related description amount determination unit 15 determines the related description amount for each of the one or more related parts identified in the first information by the identification unit 13.

In one form, the related description amount determination unit 15 determines the related description amount for each related part of each piece of first information in the first information group.

In some forms, the related description amount is determined after the similarity determination unit 14 has determined the similarity. In such forms, the related description amount determination unit 15 may restrict itself to related parts whose quoted parts have a similarity equal to or greater than a predetermined value (for example, quoted parts determined to be similar).
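Either determination can run first, with the second one skipped for pairs that already fail the first threshold. The sketch below makes this ordering concrete. The similarity and amount functions are passed in as callables, so any method (such as the ones described in this section) can be plugged in. The function name, the thresholds and the use of None for "not computed" are all assumptions made for illustration.

```python
from typing import Callable, Optional

Pair = tuple[str, str]  # (quoted part, related part)


def evaluate_pairs(
    pairs: list[Pair],
    second_info: str,
    similarity: Callable[[str, str], float],
    related_amount: Callable[[str], float],
    sim_min: float = 0.5,
    amount_min: float = 1,
    similarity_first: bool = True,
) -> list[tuple[str, str, Optional[float], Optional[float]]]:
    """Return (quoted, related, similarity, amount); None marks a skipped determination."""
    results = []
    for quoted, related in pairs:
        if similarity_first:
            sim = similarity(quoted, second_info)
            # Skip the amount when the quoted part is not similar enough.
            amount = related_amount(related) if sim >= sim_min else None
        else:
            amount = related_amount(related)
            # Skip the similarity when the related part says too little.
            sim = similarity(quoted, second_info) if amount >= amount_min else None
        results.append((quoted, related, sim, amount))
    return results
```

The order to prefer depends on which determination is cheaper and which one filters out more pairs.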

The related description amount may be expressed in two levels (description present or absent) or in three or more levels.

Methods for determining the related description amount include (c1) to (c3) below.

(c1) A value corresponding to the amount of description in the related part is used as the related description amount. For example, if the amount is at or below a predetermined amount (such as one sentence or fewer, or 10 characters or fewer), the related part is judged to contain no description (for example, related description amount = 0). Otherwise, it is judged to contain a description (for example, related description amount = 1).
(c2) A value corresponding to the amount of evaluation expressions in the related part is used as the related description amount. Evaluation expressions include verbs expressing opinions ("think", "consider", "regard as", "expect", and so on) and words expressing evaluations ("good", "wonderful", "disappointing", and so on).
(c3) The related description amount is derived from both the amount of description in the related part and the amount of evaluation expressions in it. For example, it is calculated as the product or sum of the value for the amount of description and the value for the amount of evaluation expressions.
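Methods (c1) to (c3) can be sketched as three small functions. The evaluation-expression list is an English stand-in for the Japanese examples given in (c2). Simple word matching does not catch inflected forms such as "thinks", and the function names are hypothetical.

```python
import re

# Illustrative English stand-ins for the evaluation expressions in (c2).
EVALUATION_EXPRESSIONS = {"think", "consider", "regard", "expect",
                          "good", "great", "wonderful", "disappointing"}
MAX_EMPTY_CHARS = 10  # (c1) example: 10 characters or fewer counts as "no description"


def amount_by_length(related: str) -> int:
    """(c1) Two-level amount: 0 if the related part is very short, otherwise 1."""
    return 0 if len(related.strip()) <= MAX_EMPTY_CHARS else 1


def amount_by_expressions(related: str) -> int:
    """(c2) Number of evaluation expressions found in the related part."""
    words = re.findall(r"[a-z]+", related.lower())
    return sum(1 for w in words if w in EVALUATION_EXPRESSIONS)


def amount_combined(related: str, use_product: bool = True) -> int:
    """(c3) Combine (c1) and (c2) by product (default) or by sum."""
    a, b = amount_by_length(related), amount_by_expressions(related)
    return a * b if use_product else a + b
```

For example, "I think this is a good and great phone." passes the length check (c1 gives 1) and contains three listed expressions (c2 gives 3). The product form of (c3) therefore gives 3 and the sum form gives 4.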

The similarity determination by the similarity determination unit 14 and the related description amount determination by the related description amount determination unit 15 are both performed on first information in which the identification unit 13 has identified a quoted part. Some first information has no identified quoted part (hereinafter, "first information without a quoted part"). Such information may be excluded from processing or processed differently.

In one form, the information processing apparatus 10 includes an information evaluation unit 19, which processes first information without a quoted part as follows.

For each piece of first information without a quoted part, the information evaluation unit 19 determines how valuable that first information is as evaluation information about the second information. Evaluation information is information containing opinions, evaluations, reputation, commentary, or the like about factual information.

Specifically, for each piece of first information without a quoted part, the information evaluation unit 19 determines the similarity in content between that first information and the second information. The determination method may be the same as that of the similarity determination unit 14, or a different one.

The information evaluation unit 19 may also determine, for each piece of first information without a quoted part, the amount or proportion of predetermined evaluation expressions it contains.

The information evaluation unit 19 may use the determined similarity itself as the value of the first information as evaluation information. Alternatively, it may derive that value from the determined similarity together with the amount or proportion of evaluation expressions.
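The two options described for the information evaluation unit 19 are sketched below. The source does not say how the similarity and the expression amount are combined. Here the similarity is simply multiplied by the proportion of evaluation expressions, which is an assumed rule. The word list and function names are hypothetical, and the similarity is passed in precomputed.

```python
import re

# Hypothetical English evaluation expressions (compare the examples in (c2)).
EVALUATION_EXPRESSIONS = {"think", "consider", "expect", "good", "great", "disappointing"}


def expression_ratio(text: str) -> float:
    """Proportion of words in the text that are evaluation expressions."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in EVALUATION_EXPRESSIONS for w in words) / len(words)


def information_value(similarity: float, text: str, use_expressions: bool = True) -> float:
    """Value of quote-free first information as evaluation information.

    With use_expressions=False the similarity alone is the value; otherwise the
    similarity is weighted by the evaluation-expression proportion (an assumed
    combination rule, not one specified in the source).
    """
    if not use_expressions:
        return similarity
    return similarity * expression_ratio(text)
```

For example, with a similarity of 0.8, the text "I think it is good" has 2 expressions in 5 words (a proportion of 0.4), so its value under this rule is 0.32.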

The output unit 16 outputs the first information based on the determination results of the similarity determination unit 14 and the related description amount determination unit 15, that is, the results of determining the similarity and the related description amount.

In one form, the information processing apparatus 10 includes a display control unit 17. The output unit 16 outputs the first information to the display control unit 17, which causes a display device (not shown) to display it.

In one variant of this aspect, the display control unit 17 causes the display device to display the related part of the first information output by the output unit 16 when the similarity is equal to or greater than a predetermined value and the related description amount is also equal to or greater than a predetermined value.

Alternatively, the output unit 16 may output the first information to a storage device for storage, or to a printing device for printing.

The output unit 16 may also perform the following processing in addition to, instead of, or as one form of the processing described above.

For example, the output unit 16 may output the determination results for the similarity and the related description amount. In this case, the output unit 16 may output the determination results to a storage device for storage, to a display device for on-screen display, or to a printing device for printing.

The output unit 16 may also output, for each of one or more pairs of a quoted part and its related part, the similarity and the related description amount in association with that pair.

The output unit 16 may also output the identification result of the identification unit 13. For example, the output unit 16 may output each identified pair of a quoted part and related part in association with the similarity and related description amount corresponding to that pair.

The output unit 16 may also calculate, from the similarity and the related description amount, an evaluation value representing the value of a related part as evaluation information for the second information, and output that evaluation value as a determination result. The evaluation value is set, for example, as follows. The higher the similarity between a quoted part and the second information, the more likely it is that the related part corresponding to that quoted part is evaluation information about the second information; the evaluation value of a related part is therefore set to increase with the similarity of its corresponding quoted part. Furthermore, when the quoted part is similar to the second information to some degree, the larger the related description amount determined for the corresponding related part, the more valuable that related part is considered to be as evaluation information for the second information; the evaluation value is therefore also set to increase with the related description amount. For example, the evaluation value is the product of the similarity and the related description amount.

Based on the determination results of the similarity determination unit 14 and the related description amount determination unit 15, that is, the determined similarity and related description amount, the output unit 16 may also output display information that gives display priority, among the related parts identified by the identification unit 13, to related parts whose corresponding quoted part has a similarity equal to or greater than a predetermined value and whose related description amount is equal to or greater than a predetermined value.

The output unit 16 may output this display information to a display device for display, to a printing device for printing, or to a storage device for storage.

The output unit 16 may also calculate, for each identified related part, a related-part evaluation value representing the value of that related part as evaluation information for the second information, based on the similarity of the corresponding quoted part and the related description amount of the related part, and create the display information based on these values. The related-part evaluation value is set, for example, in the same way as the evaluation value described above: it increases with the similarity of the corresponding quoted part, because a related part whose quoted part closely matches the second information is more likely to be evaluation information about it, and it increases with the related description amount, because when the quoted part is similar to the second information to some degree, a related part with more description is more valuable as evaluation information. For example, the related-part evaluation value is the product of the similarity and the related description amount.

The output unit 16 may create the display information so that related parts are displayed in descending order of related-part evaluation value. The output unit 16 may also select a predetermined number of related parts in descending order of that value, or select the related parts whose value is equal to or greater than a predetermined value, and create display information so that the selected related parts are displayed.
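A minimal Python sketch of the related-part evaluation value and the three selection modes just described, assuming the example rule (similarity multiplied by related description amount) and that both quantities are already available as numbers. The `RelatedPart` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelatedPart:
    text: str
    similarity: float          # similarity of the corresponding quoted part
    description_amount: float  # related description amount of this part


def evaluation_value(part: RelatedPart) -> float:
    # Example rule from the text: product of similarity and description amount.
    return part.similarity * part.description_amount


def rank_all(parts):
    """All related parts, highest evaluation value first."""
    return sorted(parts, key=evaluation_value, reverse=True)


def top_n(parts, n):
    """The n related parts with the highest evaluation values."""
    return rank_all(parts)[:n]


def at_least(parts, threshold):
    """Related parts whose evaluation value is >= threshold, best first."""
    return [p for p in rank_all(parts) if evaluation_value(p) >= threshold]
```

Because the value is a product, a related part with a zero similarity or a zero description amount always ends up at the bottom, which matches the intent that both factors must be present.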

The display information may cause the entire piece of first information that contains a related part to be displayed.

The output unit 16 may also calculate, for each piece of first information, an evaluation value representing the value of that first information as evaluation information for the second information, based on the similarities of the quoted parts in that first information and the related description amounts of its related parts, and create the display information based on this evaluation value.

The output unit 16 may create the display information so that the pieces of first information are displayed in descending order of evaluation value, or so that related parts are displayed starting with those belonging to the first information with the highest evaluation value. The output unit 16 may also select a predetermined number of pieces of first information in descending order of evaluation value, or select the pieces of first information whose evaluation value is equal to or greater than a predetermined value, and create display information so that the related parts belonging to the selected first information are displayed.

Related parts other than those described above (related parts whose corresponding quoted part has a similarity equal to or greater than the predetermined value and whose related description amount is equal to or greater than the predetermined value) may be omitted from the display information, or may be included so that they are displayed after the related parts that meet both thresholds.
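The preferential display described above, with thresholds on both the similarity and the related description amount and the remaining related parts either dropped or appended afterwards, can be sketched as follows. The function name, the `(text, similarity, description_amount)` tuple layout, and the `include_rest` flag are illustrative assumptions.

```python
def build_display_list(parts, sim_min, amount_min, include_rest=True):
    """parts: list of (text, similarity, description_amount) tuples.

    Parts meeting both thresholds come first, in their original order.
    The remaining parts follow them, or are dropped if include_rest is False.
    """
    def qualifies(part):
        _, similarity, amount = part
        return similarity >= sim_min and amount >= amount_min

    preferred = [p for p in parts if qualifies(p)]
    rest = [p for p in parts if not qualifies(p)] if include_rest else []
    return [text for text, _, _ in preferred + rest]
```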

The output unit 16 may include, in the display information, first information that does not contain a quoted part. For example, based on the values determined by the information evaluation unit 19, the output unit 16 may select a predetermined number of pieces of first information without quoted parts in descending order of value, and create the display information so that the selected first information is displayed.

Instead of outputting the display information, the output unit 16 may output various processing results, such as identification results, determination results, selection results, and calculation results. For example, the output unit 16 may output the selected related parts to a storage device in association with their related-part evaluation values or their ranks by that value. It may likewise output the selected first information to the storage device in association with its evaluation values or ranks, and, for first information without quoted parts, output the selected first information in association with its values or ranks.

In one aspect of the information processing apparatus 10 configured as above, the first information acquisition unit 11 acquires a group of first information as follows.

The first information acquisition unit 11 analyzes the second information acquired by the second information acquisition unit 12 and extracts key terms from it.

These key terms are words that are important for expressing the facts, such as named entities and technical terms.

When extracting key terms, the first information acquisition unit 11 may give priority to particular word classes such as person names and place names, to frequently occurring terms, or to terms that appear in the first sentence.
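As one possible combination of these priorities, the sketch below ranks candidate terms lexicographically: preferred word class first, then appearance in the first sentence, then frequency. It assumes the candidate terms and their classes come from some external named-entity or terminology extractor, which is not shown. The ordering of the three criteria is a design choice made here; the text allows any one of them on its own.

```python
from collections import Counter


def rank_key_terms(candidates, first_sentence_terms, preferred_types, k):
    """candidates: (term, word_class) occurrences in document order,
    assumed to come from an external extractor (not shown).

    Returns up to k distinct terms ranked by (preferred class,
    appears in first sentence, frequency), ties kept in order of first occurrence.
    """
    freq = Counter(term for term, _ in candidates)
    term_class = {}
    for term, word_class in candidates:
        term_class.setdefault(term, word_class)

    def priority(term):
        return (term_class[term] in preferred_types,
                term in first_sentence_terms,
                freq[term])

    # sorted() stays stable even with reverse=True.
    return sorted(freq, key=priority, reverse=True)[:k]
```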

The first information acquisition unit 11 performs a search using the extracted key terms and acquires information containing those key terms as first information.

For example, the first information acquisition unit 11 searches the Internet for, and collects, recently posted articles that contain the extracted key terms and were published within a predetermined period (for example, the last three days).

In a more specific example, the first information acquisition unit 11 obtains RSS feeds from RSS (RDF Site Summary) URLs (Uniform Resource Locators) that are specified in advance or collected by a crawler, and, based on the creation date and time of the new entries described in each feed, obtains the URLs of recent articles (for example, those registered within the last three days). The first information acquisition unit 11 then accesses those URLs to retrieve the articles and searches them for articles containing the extracted key terms, using an AND search, an OR search, or the like. The first information acquisition unit 11 may rank the retrieved articles, select a predetermined number of articles from the top of the ranking, and use them as the first information group. In this ranking, for example, articles containing more key terms are ranked higher, and among articles containing the same number of key terms, newer articles are ranked higher.

Collecting and searching recent articles in this way can be implemented by various methods, for example by using an API (Application Program Interface) published by Technorati Japan Co., Ltd. or a similar provider.

The first information acquisition unit 11 may also acquire the first information group by other methods; for example, it may acquire a first information group stored in advance in a storage device.

FIG. 2 is a flowchart showing an example of the operating procedure of the information processing apparatus 10 according to this embodiment. The operation of the information processing apparatus 10 is described below with reference to FIG. 2.

In step S11, the information processing apparatus 10 acquires second information designated by the user.

Next, in step S12, the information processing apparatus 10 acquires first information from a predetermined storage device.

Next, in step S13, the information processing apparatus 10 identifies, in the acquired first information, one or more quoted parts and the related part associated with each quoted part.

Next, in step S14, the information processing apparatus 10 determines, for each identified quoted part, the similarity in content between that quoted part and the acquired second information. Here the similarity takes one of three levels, "0", "1", and "2", where "0" means not similar.

Next, in step S15, for each identified related part whose quoted part was judged similar to the second information (that is, whose quoted part has a similarity of "1" or "2"), the information processing apparatus 10 determines the related description amount, which indicates the amount of text in the related part or the amount of evaluation expressions in it. Here the related description amount takes one of four levels, "0" to "3", where "0" means there is no description.

Next, in step S16, the information processing apparatus 10 causes the display device to display the first information based on the determined similarity and related description amount.

If no quoted part can be identified in step S13, the information processing apparatus 10, for example, outputs information indicating that no quoted part could be identified and ends the process.

In one aspect of this embodiment, the information processing apparatus 10 is configured as follows.

In this aspect, the similarity determination unit 14 divides the second information into a plurality of parts, determines the similarity in content between a quoted part and each part of the second information, and, based on the result, determines which part of the second information the quoted part resembles.

The similarity determination unit 14 may perform this determination for each of the one or more identified quoted parts: it determines the similarity between the quoted part and each part of the second information and, based on the result, determines which part of the second information that quoted part resembles.

The similarity determination unit 14 may likewise perform this determination for each quoted part of each piece of first information in the first information group.

The similarity determination unit 14 may divide the second information, for example, into paragraphs or into chunks of a predetermined number of sentences. It may also divide the second information into parts based on what is written in each of its regions.

For example, based on the similarity in content between a quoted part and each part of the second information, the similarity determination unit 14 determines that the quoted part resembles the part with the highest similarity to it. In this case, if that highest similarity is less than a predetermined value, the quoted part may be judged not to resemble any part.

In one aspect, the output unit 16 outputs the result of determining which part of the second information each quoted part resembles.

In one aspect, the display control unit 17 displays a related part of the first information in association with the part of the second information that the corresponding quoted part resembles.

In one aspect, based on the determination result of the similarity determination unit 14, the output unit 16 outputs the display information so that each related part is displayed in association with the part of the second information that its corresponding quoted part resembles.

FIG. 3 is a flowchart showing another example of the operating procedure of the information processing apparatus 10 according to this embodiment. The operation of the information processing apparatus 10 is described below with reference to FIG. 3.

(Step S31)
When a user wants to obtain evaluation information about a particular press release, the user specifies, on a client device, the HTML file that contains the press release. The client device sends the specified HTML file over the Internet to the information processing apparatus 10, which acts as a server.

The information processing apparatus 10 receives the HTML file from the client device as the second information (hereinafter called the "seed document"). The HTML file includes a title, the body HTML text, and the body text (the body HTML text with its tags removed).

(Step S32)
Next, the information processing apparatus 10 analyzes the seed document by text analysis and extracts the key terms (here, person names, organization names, and technical terms) contained in the first three sentences of its body text.

(Step S33)
Next, the information processing apparatus 10 searches the Internet for, and acquires, recent articles that contain the extracted key terms and were posted within the last three days. Specifically, it obtains RSS feeds from the RSS URLs of search targets specified in advance, collects the articles registered within the last three days based on those feeds, and searches the collected articles for articles containing key terms by an OR search that uses the extracted key terms as search terms. It then ranks the retrieved articles in descending order of the number of search terms they contain; articles containing the same number of search terms are ranked by creation date and time, newest first. The information processing apparatus 10 acquires the top 50 articles in the ranking (specifically, their HTML files) as the search result and holds them in association with their URLs. Each article (HTML file) includes a title, the body HTML text, and the body text (the body HTML text with its tags removed). The acquired articles (HTML files) correspond to the first information.
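Assuming the articles have already been collected from the RSS feeds (feed fetching and parsing are not shown), the date filter, OR search, and ranking of step S33 can be sketched as follows. Counting distinct key terms per article and using plain substring matching are assumptions made here; the `Article` class and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    url: str
    created: datetime
    text: str


def search_recent_articles(articles, key_terms, now, days=3, limit=50):
    """OR search over articles created within `days` of `now`.

    Ranked by the number of distinct key terms contained (descending),
    then by creation time (newest first); at most `limit` results.
    """
    cutoff = now - timedelta(days=days)
    hits = []
    for article in articles:
        if article.created < cutoff:
            continue  # outside the recency window
        count = sum(1 for term in key_terms if term in article.text)
        if count > 0:  # OR search: at least one key term must appear
            hits.append((count, article.created, article))
    hits.sort(key=lambda h: (h[0], h[1]), reverse=True)
    return [article for _, _, article in hits[:limit]]
```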

(Step S34)
Next, for each acquired article, the information processing apparatus 10 analyzes the body HTML text to identify quoted parts, and divides the 50 articles in the search result into articles that contain quoted parts and articles that do not.

Here, quoted parts are identified as follows. If the body HTML text contains a span enclosed in <blockquote> tags, the text of that span is identified as a quoted part. If there is no <blockquote> tag but there is a paragraph consisting entirely of anchor text, that anchor text is identified as a quoted part.
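The quoted-part rule of step S34 can be sketched with Python's standard `html.parser` module. This sketch assumes reasonably well-formed HTML, compares paragraph text with anchor text while ignoring whitespace, and falls back to anchor-only paragraphs only when no `<blockquote>` was found; the class and function names are illustrative.

```python
from html.parser import HTMLParser


def _squash(text):
    """Drop all whitespace so spacing between anchors does not matter."""
    return "".join(text.split())


class QuoteFinder(HTMLParser):
    """Collects <blockquote> texts and <p> paragraphs made up only of anchor text."""

    def __init__(self):
        super().__init__()
        self.blockquotes = []
        self.anchor_paragraphs = []
        self._bq_depth = 0      # nesting depth of <blockquote>
        self._bq_buf = []
        self._in_p = False
        self._p_text = []       # all text inside the current <p>
        self._p_anchor = []     # only text inside <a> within the current <p>
        self._a_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            if self._bq_depth == 0:
                self._bq_buf = []
            self._bq_depth += 1
        elif tag == "p":
            self._in_p, self._p_text, self._p_anchor = True, [], []
        elif tag == "a":
            self._a_depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._bq_depth:
            self._bq_depth -= 1
            if self._bq_depth == 0:
                self.blockquotes.append("".join(self._bq_buf).strip())
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._p_text).strip()
            if text and _squash(text) == _squash("".join(self._p_anchor)):
                self.anchor_paragraphs.append(text)
        elif tag == "a" and self._a_depth:
            self._a_depth -= 1

    def handle_data(self, data):
        if self._bq_depth:
            self._bq_buf.append(data)
        if self._in_p:
            self._p_text.append(data)
            if self._a_depth:
                self._p_anchor.append(data)


def find_quoted_parts(body_html):
    """<blockquote> texts if any exist, otherwise anchor-only paragraphs."""
    finder = QuoteFinder()
    finder.feed(body_html)
    finder.close()
    return finder.blockquotes or finder.anchor_paragraphs
```

For messy real-world blog markup, a more tolerant HTML parser may be preferable; the rule itself stays the same.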

(Step S35)
Next, for each article containing quoted parts, the information processing apparatus 10 identifies the related part corresponding to each quoted part in the article and holds each pair of quoted part and related part.

Here, the information processing apparatus 10 treats the text between one quoted part and the next as the related part of the former quoted part, and the text between the last quoted part and the end of the article as the related part of that last quoted part.
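The pairing rule of step S35 can be sketched as below, assuming the article body has already been segmented into an ordered list of blocks, each flagged as quoted or not. How text before the first quoted part is handled is not specified here; this sketch simply ignores it.

```python
def pair_quotes_with_related(blocks):
    """blocks: the article body as an ordered list of (is_quote, text) pairs.

    Text after each quote, up to the next quote or the end of the article,
    becomes that quote's related part. Text before the first quote is ignored.
    """
    pairs = []
    for is_quote, text in blocks:
        if is_quote:
            pairs.append((text, []))      # start a new quote/related pair
        elif pairs:
            pairs[-1][1].append(text)     # extend the current related part
    return [(quote, "\n".join(related)) for quote, related in pairs]
```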

(Step S36)
Next, for each article containing quoted parts, the information processing apparatus 10 determines the similarity between each quoted part in the article and the seed document. In doing so, it also determines which part of the seed document each quoted part resembles.

Specifically, the information processing apparatus 10 makes this determination as follows.

Using cues such as the <p> tags in the body HTML text of the seed document, the information processing apparatus 10 divides the seed document into paragraphs and extracts the text of each paragraph. In the example shown in FIG. 4, the seed document is divided into three parts, areas A, B, and C.

Next, the information processing apparatus 10 extracts key terms from the text of each area of the seed document, weights them using tf-idf or a similar scheme, and creates a word vector for each area. The key terms extracted here may be the same as or different from those extracted in step S32. The information processing apparatus 10 likewise creates a word vector for each quoted part of each article.

Next, based on the word vectors of the areas of the seed document and of the quoted parts of each article, the information processing apparatus 10 uses the vector space model to calculate the similarity between each area of the seed document and each quoted part of each article. Here the similarity is normalized so that the similarity between two identical documents is 100.
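A pure-Python sketch of the tf-idf weighting and vector space similarity of step S36, scaled so that identical texts score 100. It assumes key-term extraction has already produced a token list per area and per quoted part, and computes IDF over that same small collection. It uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, so that terms shared by every text keep a nonzero weight; the text itself only specifies "tf*IDF or the like", so this weighting is an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """One {term: weight} dict per token list, weight = tf * smoothed idf."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_100(u, v):
    """Cosine similarity scaled to 0-100 (identical vectors give 100)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return 100.0 * dot / (nu * nv)


def similarity_table(area_tokens, quote_tokens):
    """Rows: quoted parts; columns: seed-document areas."""
    vectors = tfidf_vectors(area_tokens + quote_tokens)
    areas = vectors[:len(area_tokens)]
    quotes = vectors[len(area_tokens):]
    return [[cosine_100(q, a) for a in areas] for q in quotes]
```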

Next, based on the calculated similarities, the information processing apparatus 10 determines which area of the seed document each quoted part of each article resembles. For a given quoted part, it judges that the quoted part resembles the area with the highest similarity to it. However, if that highest similarity is below a predetermined value (40 here), the quoted part is judged not to resemble the seed document.
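The area assignment with the threshold of 40 can be sketched as follows; the input dictionary layout and the function name are assumptions.

```python
def assign_area(similarities, threshold=40.0):
    """similarities: area label -> similarity (0-100) for one quoted part.

    Returns (best_area, best_similarity). best_area is None when even the
    best similarity is below the threshold (the quote is judged not similar).
    """
    if not similarities:
        return None, 0.0
    area, best = max(similarities.items(), key=lambda kv: kv[1])
    return (area if best >= threshold else None), best
```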

FIG. 4 illustrates one article, which contains quoted parts Q1 and Q2 and related parts R1 and R2. If the similarities between areas A, B, and C of the seed document and quoted parts Q1 and Q2 are as shown in FIG. 5, then, as shown in FIG. 4, quoted part Q1 is judged to resemble area A and quoted part Q2 to resemble area C.

Next, using the correspondence table between similarity and similarity score shown in FIG. 6, the information processing apparatus 10 obtains, for each quoted part of each article, a similarity score from the highest similarity determined for that quoted part. A similarity score of "0" indicates that the quoted part does not resemble the seed document.
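FIG. 6 is not reproduced in this text, so the band boundaries below are hypothetical placeholders, chosen only so that the threshold of 40 from step S36 maps to the first nonzero score. The sketch shows how such a correspondence table can be applied with the standard `bisect` module.

```python
from bisect import bisect_right

# Hypothetical lower bounds standing in for the table of FIG. 6 (not
# reproduced here): similarity < 40 -> 0, 40 to < 70 -> 1, >= 70 -> 2.
SIMILARITY_BOUNDS = [40, 70]


def similarity_score(max_similarity, bounds=SIMILARITY_BOUNDS):
    """Map the largest similarity of a quoted part (0-100) to a discrete score."""
    return bisect_right(bounds, max_similarity)
```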

Then, as shown in FIG. 7, the information processing apparatus 10 holds, for each quoted part of each article, its similarity score and information indicating the area it resembles.

(Step S37)
Next, for the related part corresponding to each quoted part judged in step S36 to resemble the seed document (a quoted part whose similarity score is not "0"), the information processing apparatus 10 determines the related description amount based on the amount of text in the related part.

Specifically, using the correspondence table between the amount of text in a related part and the related description amount score shown in FIG. 8, the information processing apparatus 10 obtains the related description amount score of each related part from its amount of text (number of sentences).
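Likewise, FIG. 8 is not reproduced here, so the sentence-count bands below are hypothetical; only the four-level scale (0 to 3) follows the text. The sketch counts sentences with a naive split on Japanese and ASCII sentence-final punctuation and maps the count to a score through a lookup table, as step S37 describes.

```python
import re
from bisect import bisect_right

# Hypothetical stand-in for the table of FIG. 8:
# 0 sentences -> 0, 1-2 -> 1, 3-5 -> 2, 6 or more -> 3.
SENTENCE_BOUNDS = [1, 3, 6]


def count_sentences(text):
    # Naive split on sentence-final punctuation (Japanese and ASCII).
    return len([s for s in re.split(r"[。！？!?.]+", text) if s.strip()])


def description_amount_score(related_text, bounds=SENTENCE_BOUNDS):
    """Map the sentence count of a related part to a discrete score."""
    return bisect_right(bounds, count_sentences(related_text))
```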

Then, as shown in FIG. 7, the information processing apparatus 10 holds, for each quoted part of each article, the related description amount score of the corresponding related part.

(Step S38)
Next, for each article that does not contain a quoted part, the information processing apparatus 10 calculates the similarity in content between that article and the seed document.

Specifically, as in step S36, the information processing apparatus 10 extracts key terms from the seed document and the article and calculates the similarity between them.

(Step S39)
Next, for the articles that contain cited parts, the information processing apparatus 10 scores each article based on the determination results of steps S36 and S37 and selects a predetermined number of articles (seven in this example) based on their scores. Specifically, it selects the predetermined number of articles in descending order of score.

Here, the score SC of an article is calculated by the following formula (1).

In formula (1), N is the total number of cited parts included in the article, S_i is the similarity score corresponding to the i-th cited part of the article, and R_i is the related description amount score corresponding to the i-th cited part of the article.
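Formula (1) itself does not survive in this text, so the sketch below does not reproduce it. It assumes, purely hypothetically, that SC is the sum of S_i × R_i over i = 1..N, and it takes the combining function as a parameter so the real formula (1) can be substituted. The selection of the seven highest-scoring articles uses `heapq.nlargest`.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# One (S_i, R_i) pair per cited part of an article, i = 1..N.
CitedScores = Sequence[Tuple[int, int]]


def assumed_formula_1(pairs: CitedScores) -> float:
    """HYPOTHETICAL stand-in for formula (1): SC = sum over i=1..N of S_i * R_i.

    The actual formula (1) is not reproduced in this text; replace as needed.
    """
    return float(sum(s * r for s, r in pairs))


def select_top_articles(
    articles: Dict[str, CitedScores],
    k: int = 7,
    score_fn: Callable[[CitedScores], float] = assumed_formula_1,
) -> List[Tuple[str, float]]:
    """Score every article and return the k highest-scoring (url, SC) pairs."""
    scored = [(url, score_fn(pairs)) for url, pairs in articles.items()]
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

With `{"u1": [(3, 2), (1, 1)], "u2": [(2, 3)], "u3": [(0, 0)]}` and `k=2`, the assumed formula gives 7.0, 6.0 and 0.0, so the result is `[("u1", 7.0), ("u2", 6.0)]`.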

As shown in FIG. 7, the information processing apparatus 10 holds the calculated score of each article.

The information processing apparatus 10 also holds the selected articles. Specifically, it holds each selected article (an HTML file) together with information about that article: its URL, its cited parts and the related parts associated with them, the result of determining which area of the seed document each cited part is similar to, and its score or score rank.
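The information held per selected article can be modeled as a small record type. The class and field names below are hypothetical and only mirror the items the text lists (HTML file, URL, cited and related parts, the similar seed-document area, score and rank); articles without citations can reuse the same record with an empty `cited_parts` list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CitedPartRecord:
    cited_text: str
    related_text: str
    similar_area: Optional[str]  # seed-document area the cited part resembles
    similarity_score: int        # S_i, from the FIG. 6 table
    description_score: int       # R_i, from the FIG. 8 table


@dataclass
class HeldArticle:
    url: str
    html: str                    # contents of the selected article's HTML file
    score: float                 # article score SC
    rank: int = 0                # 1-based rank by score, set by assign_ranks
    cited_parts: List[CitedPartRecord] = field(default_factory=list)


def assign_ranks(articles: List[HeldArticle]) -> List[HeldArticle]:
    """Sort by descending score and record each article's 1-based rank."""
    ordered = sorted(articles, key=lambda a: a.score, reverse=True)
    for position, article in enumerate(ordered, start=1):
        article.rank = position
    return ordered
```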

(Step S40)
Next, for the articles that contain no cited part, the information processing apparatus 10 scores each article based on the determination result of step S38 and selects a predetermined number of articles (three in this example) based on their scores. For example, the information processing apparatus 10 obtains, for each article, a score corresponding to the similarity obtained in step S38 and selects the predetermined number of articles in descending order of score.
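Steps S39 and S40 rank two separate pools, so articles with citations never compete with articles without them. The sketch below makes that explicit. It assumes, as a simplification, that the step S40 score is the step S38 similarity itself (the text only says the score "corresponds to" the similarity); the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_urls(scores: Dict[str, float], k: int) -> List[str]:
    """Return the URLs of the k highest-scoring articles, best first."""
    return [url for url, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]


def select_articles(
    cited_sc: Dict[str, float],
    uncited_similarity: Dict[str, float],
    k_cited: int = 7,
    k_uncited: int = 3,
) -> Tuple[List[str], List[str]]:
    """Select from each pool independently (seven and three in the example)."""
    # Step S39: articles with cited parts, ranked by their score SC.
    with_citations = top_urls(cited_sc, k_cited)
    # Step S40 (assumed): use the step S38 similarity directly as the score.
    without_citations = top_urls(uncited_similarity, k_uncited)
    return with_citations, without_citations
```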

The information processing apparatus 10 then holds the selected articles. Specifically, it holds each selected article (an HTML file) together with the article's URL and its score or score rank.

(Step S41)
Next, based on the various pieces of information held as described above, the information processing apparatus 10 creates display information in which the seed document and the articles selected in steps S39 and S40 are laid out, transmits the display information to the client apparatus via the Internet, and causes it to be displayed on the screen of the client apparatus.

FIG. 9 shows an example of the display. In FIG. 9, the areas of the seed document are displayed separately, and below each area the related parts associated with cited parts similar to that area are listed. The related parts for each area are displayed in descending order of the score of the article to which they belong. Below the seed document and the related parts, articles that contain no citations are displayed in descending order of score.
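The FIG. 9 layout amounts to grouping related parts by seed-document area, ordering each group by the score of the owning article, and listing uncited articles separately. The sketch below builds that ordering as plain data; the input tuple shapes and the function name are hypothetical, and rendering the result to HTML for the client apparatus is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def layout_by_area(
    area_order: List[str],
    related_parts: Iterable[Tuple[str, str, float]],
    uncited_articles: Iterable[Tuple[str, float]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group related parts under seed-document areas, as in the FIG. 9 layout.

    related_parts: (area, related_text, score of the article it belongs to)
    uncited_articles: (url, score) for articles without citations
    Areas not listed in area_order are ignored in this sketch.
    """
    grouped: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for area, text, article_score in related_parts:
        grouped[area].append((article_score, text))
    # Keep the seed document's area order; within an area, highest article score first.
    by_area = {
        area: [text for _, text in sorted(grouped.get(area, []), key=lambda p: p[0], reverse=True)]
        for area in area_order
    }
    uncited = [url for url, _ in sorted(uncited_articles, key=lambda p: p[1], reverse=True)]
    return by_area, uncited
```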

The present invention is not limited to the embodiment described above and can be modified in various ways without departing from the gist of the invention.

FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus according to the embodiment.
FIG. 2 is a flowchart showing an example of the operation procedure of the information processing apparatus according to the embodiment.
FIG. 3 is a flowchart showing another example of the operation procedure of the information processing apparatus according to the embodiment.
FIG. 4 is a conceptual diagram showing an example of a seed document and articles.
FIG. 5 is a diagram showing an example of the results of determining the similarity between each area of the seed document and each cited part.
FIG. 6 is a diagram showing the correspondence table between similarity and similarity score.
FIG. 7 is a diagram showing an example of the contents held by the information processing apparatus.
FIG. 8 is a diagram showing the correspondence table between the amount of description of a related part and the related description amount score.
FIG. 9 is a diagram showing an example of the display.

Explanation of symbols

10 information processing apparatus, 11 first information acquisition part, 12 second information acquisition part, 13 identification part, 14 similarity determination part, 15 related description amount determination part, 16 output part, 17 display control part.

Claims

1. An information processing apparatus comprising:
first information acquisition means for acquiring first information including sentences;
second information acquisition means for acquiring second information that includes sentences and is to be evaluated;
identification means for identifying, in the first information, a cited part and a related part related to the cited part;
similarity determination means for determining the similarity in content between the cited part and the second information;
related description amount determination means for determining a related description amount, which indicates the amount of description in the related part or the amount of evaluative expressions in the related part; and
output means for outputting the first information based on the results of determining the similarity and the related description amount.

2. The information processing apparatus according to claim 1, wherein
the similarity determination means divides the second information into a plurality of parts, determines the similarity in content between the cited part and each part of the second information, and determines, based on the results, which part of the second information the cited part is similar to; and
the output means outputs the result of determining which part of the second information the cited part is similar to.

3. The information processing apparatus according to claim 1, further comprising
display control means for causing a display device to display the related part of the first information output by the output means when the similarity is equal to or greater than a predetermined value and the related description amount is equal to or greater than a predetermined value.

4. The information processing apparatus according to claim 3, wherein
the similarity determination means divides the second information into a plurality of parts, determines the similarity in content between the cited part and each part of the second information, and determines, based on the results, which part of the second information the cited part is similar to; and
the display control means causes the related part to be displayed in association with the part of the second information to which the cited part corresponding to that related part is similar.

5. An information processing program for causing a computer to execute:
a step of acquiring first information including sentences;
a step of acquiring second information that includes sentences and is to be evaluated;
a step of identifying, in the first information, a cited part and a related part related to the cited part;
a step of determining the similarity in content between the cited part and the second information;
a step of determining a related description amount, which indicates the amount of description in the related part or the amount of evaluative expressions in the related part; and
a step of outputting the first information based on the results of determining the similarity and the related description amount.

6. The information processing program according to claim 5, wherein
in the step of determining the similarity, the second information is divided into a plurality of parts, the similarity in content between the cited part and each part of the second information is determined, and which part of the second information the cited part is similar to is determined based on the results; and
in the outputting step, the result of determining which part of the second information the cited part is similar to is output.

7. The information processing program according to claim 5, further causing the computer to execute
a step of causing a display unit to display the related part of the first information output in the outputting step when the similarity is equal to or greater than a predetermined value and the related description amount is equal to or greater than a predetermined value.

8. The information processing program according to claim 7, wherein
in the step of determining the similarity, the second information is divided into a plurality of parts, the similarity in content between the cited part and each part of the second information is determined, and which part of the second information the cited part is similar to is determined based on the results; and
in the displaying step, the related part is displayed in association with the part of the second information to which the cited part corresponding to that related part is similar.