JP2009116457A

JP2009116457A - Method and device for analyzing internet site information

Info

Publication number: JP2009116457A
Application number: JP2007286278A
Authority: JP
Inventors: Masakazu Hori (堀 雅和); Kyohei Kawazoe (川添 恭平)
Original assignee: INTEC SYSTEMS Inst Inc
Current assignee: INTEC SYSTEMS Inst Inc
Priority date: 2007-11-02
Filing date: 2007-11-02
Publication date: 2009-05-28

Abstract

PROBLEM TO BE SOLVED: To provide an analysis device and an analysis method that, based on the large volume of site information published on the Internet, effectively and accurately identify the meanings, background and trends contained in that information.

SOLUTION: The device includes: a crawler 14 serving as information collection means that collects text information from Web sites 12; word division means, implemented by analyzers 18, that divides the text information into words; evaluation information extraction means, implemented by the analyzers 18, that extracts from the word groups evaluation information composed of an evaluation object, an evaluation axis indicating a characteristic of the evaluation object, and an evaluation expression; per-axis evaluation classification means, implemented by the analyzers 18, that classifies the evaluation information by evaluation axis; an evaluation expression dictionary group database 20 holding a plurality of evaluation expression dictionaries, each composed of evaluation expressions and scores indicating their degree; evaluation score calculation means, implemented by the analyzers 18, that calculates a score for each item of evaluation information; and evaluation analysis output means, implemented by a portal server 24, that outputs a relative comparison of the scores on any different evaluation axes.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an Internet site information analysis method and apparatus that analyze information published on Internet websites and acquire and provide trend information and the like.

On the Internet, an enormous amount of information published by all kinds of people is stored on websites, and the amount is still growing. Here, a website means an information source such as a bulletin board (BBS: Bulletin Board System), a homepage, or a weblog (Web Log), commonly called a blog.

In recent years, there have been active efforts to obtain new trend information by analyzing articles accumulated on websites. Services based on a variety of techniques are already offered, for example: reputation analysis, which analyzes how positive or negative the content of an opinion is; techniques that evaluate, over time, trends in the frequency of occurrence and the degree of attention (burst degree) of a given keyword; and search engine optimization, which causes a specific website to appear near the top of search engine results.

As a specific example, Patent Document 1 discloses a reputation information processing apparatus that acquires a plurality of pieces of text information from websites; extracts from each piece of text information reputation information consisting of a set of an evaluation target, an evaluation axis representing a property of that target, an evaluation expression representing an evaluation on that axis, and an evaluation score preset for that expression; and aggregates (sums) all the evaluation scores of the reputation information by a predetermined method, thereby obtaining knowledge about the object under analysis. In an embodiment of that apparatus, evaluation scores are assigned to evaluation expressions, for example 0.2 points for "cool", 0.3 points for "light", and 0.1 points for "good", and an analysis method is shown that quantifies how negative or positive the reputation information is.

Patent Document 2 discloses a reputation information processing apparatus that acquires a plurality of pieces of text information from websites, acquires reputation information in which an evaluation score is attached to each individual reputation, and aggregates the evaluation scores of the reputation information by a predetermined calculation method, thereby obtaining knowledge about the object under analysis. In particular, a predetermined weighting value is set for each evaluation axis such as "price" or "design". When, for example, the "design" axis is attracting public attention, the evaluation scores for "design" are strongly reflected in the aggregate result even if the number of pieces of reputation information about "design" is small.

Furthermore, Patent Document 3 discloses a trend prediction apparatus that accesses websites and collects text information (word-of-mouth information) at predetermined intervals, quantifies the usage of the collected keywords, and monitors the quantified usage to select trend keywords from among the extracted keywords. It thereby predicts in real time the trend keywords likely to be used in search engines in the near future and provides information related to them. Compared with obtaining trend keywords by analyzing the actual usage record of search keywords entered into a search engine, this approach offers better real-time performance.

Patent Document 1: JP 2007-172051 A
Patent Document 2: JP 2006-53685 A
Patent Document 3: JP 2006-227965 A

However, in the reputation information processing apparatus of Patent Document 1, evaluation expressions such as "cool", "light", and "good", together with the evaluation score set for each, are used in common even when evaluating different evaluation axes such as "camera" and "battery life". Consequently, it cannot perform an evaluation suited to each evaluation axis. In addition, because the evaluation scores of many pieces of reputation information are ultimately handled as an overall evaluation value aggregated per evaluation expression or per evaluation axis, the correlation among a single poster's opinions on different evaluation axes is buried in the overall value. Thus, even when there are articles by two posters, one saying "the design is good, so the price feels low" and the other "the design is good, but the price feels high", the analysis cannot go so far as to determine how each individual poster sets priorities or whether there is any tendency in that sense of balance.

The reputation information processing apparatus of Patent Document 2 is designed so that the value of individual pieces of evaluation information is not buried in the overall evaluation value obtained by aggregating the evaluation scores. Like Patent Document 1, however, it cannot analyze the multiple demands held by each individual poster in enough depth to reveal how priorities are assigned or what tendencies exist in the sense of balance.

The trend prediction apparatus of Patent Document 3 analyzes, for a keyword under analysis, the positivity or negativity of the text information containing the keyword and the frequency with which the keyword occurs, and then decides whether to select that keyword itself as a trend keyword. It cannot capture in concrete terms how posters' interests change as they spread out around that keyword.

Correlation analysis has also been proposed as a way of collecting information surrounding a specific keyword, and services such as associative search, which retrieves other keywords associated with a given keyword, are available. Like Patent Document 3, however, these cannot capture in practical terms how posters' interests change as they spread out around that keyword.

There is also a demand to know, in real time, which websites are active and yield useful information. No method or service capable of meeting this demand has been proposed, and none has yet been put into practical use.

The present invention has been made in view of the above background art. It provides an analysis apparatus and an analysis method that, based on the large volume of site information published on the Internet, make it possible to understand effectively and accurately the meaning, background, and tendencies of that information. In particular, its object is to provide an Internet site information analysis method and apparatus that perform three analyses: reputation analysis, which comparatively evaluates the multiple demands expressed in a single article; website activity analysis, which extracts in real time the active websites from which useful information can be obtained; and co-occurrence information analysis, which captures in practical terms how posters' interests change as they spread out around a given keyword. With these analyses, new trends can be accurately identified from the articles accumulated on websites.

This invention is an Internet site information analysis method that accesses websites on the Internet, collects their text information, and analyzes it, the method comprising: an information collection step of collecting the text information of the websites; a word division step of dividing the text information into words; a reputation information extraction step of extracting, from the resulting word group, reputation information consisting of a set of an evaluation target, an evaluation axis representing a property of the evaluation target, and an evaluation expression representing an evaluation on that axis; a per-axis classification step of classifying the reputation information by evaluation axis; a rating calculation step of calculating ratings based on the evaluation expression dictionary that corresponds to the evaluation axis under analysis, selected from a plurality of evaluation expression dictionaries provided in advance for the respective evaluation axes, each consisting of pairs of an evaluation expression and a score indicating its degree; and a reputation analysis output step of outputting, for each piece of text information, a relative comparison of the ratings on any different evaluation axes.

This invention is also an Internet site information analysis method that accesses websites on the Internet, collects their text information, and analyzes it, the method comprising: an information collection step of collecting the text information and the update date information of each website; a word division step of dividing the text information into words; a related-post count calculation step of extracting from the word group the words identical or similar to a predetermined keyword and calculating the number of pieces of text information containing such a word; a related-post rate calculation step of calculating, for each website, the ratio of the related-post count to the number of pieces of text information collected from that website; an update frequency calculation step of calculating the update frequency of each website from the reference date of the analysis and the update date information; and a site activity analysis output step of outputting a relative comparison of the related-post rate and the update frequency of each website.
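The two per-site quantities in this method can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Site` class, the substring-based keyword match (the text matches divided words that are identical or similar to the keyword), and the definition of update frequency as updates per day within a 30-day window before the reference date are all assumptions, since this passage does not give formulas.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Site:
    """Hypothetical container for what the crawler collected from one website."""
    name: str
    posts: List[str]          # collected pieces of text information
    update_dates: List[date]  # update dates collected for the site


def related_post_rate(site: Site, keyword: str) -> float:
    """Share of the site's collected posts that mention the keyword."""
    if not site.posts:
        return 0.0
    # Assumption: plain substring match stands in for word-level matching.
    related = sum(1 for post in site.posts if keyword in post)
    return related / len(site.posts)


def update_frequency(site: Site, reference_date: date,
                     window_days: int = 30) -> float:
    """Assumed definition: updates per day in the window before reference_date."""
    recent = [d for d in site.update_dates
              if 0 <= (reference_date - d).days < window_days]
    return len(recent) / window_days


def site_activity(sites: List[Site], keyword: str, reference_date: date):
    """One (name, related-post rate, update frequency) row per site."""
    return [(s.name,
             related_post_rate(s, keyword),
             update_frequency(s, reference_date))
            for s in sites]
```

Each returned row gives one point for a plot of related-post rate against update frequency, which is how the two values are compared across sites.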

This invention is also an Internet site information analysis method that accesses websites on the Internet, collects their text information, and analyzes it, the method comprising: an information collection step of collecting the text information of the websites each time a predetermined period elapses; a word division step of dividing the text information into words; a survey target information extraction step of extracting from the word group the text information that contains a word identical or similar to the keyword under survey; a co-occurrence keyword extraction step of extracting co-occurrence keywords, which are keywords other than the survey keyword contained in the words making up the survey target information; a co-occurrence keyword rating calculation step of calculating a rating for each co-occurrence keyword based on how frequently it appears in the survey target information; a sorting step of creating a co-occurrence information list by sorting the co-occurrence keywords in order of their ratings; and a co-occurrence information analysis output step of outputting, in time series, the co-occurrence information lists obtained for the respective periods.
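A minimal sketch of the co-occurrence steps is given below, under stated assumptions: posts are already divided into word lists, the rating of a co-occurrence keyword is taken to be the number of survey target posts in which it appears (the text only says the rating is based on frequency), and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence_list(tokenized_posts: List[List[str]],
                      keyword: str) -> List[Tuple[str, int]]:
    """Rank the words that co-occur with `keyword` in one collection period."""
    counts: Counter = Counter()
    for words in tokenized_posts:
        if keyword in words:                        # survey target information
            counts.update(set(words) - {keyword})   # co-occurrence keywords
    # Sort by rating (descending), then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def cooccurrence_timeline(posts_by_period: Dict[str, List[List[str]]],
                          keyword: str) -> Dict[str, List[Tuple[str, int]]]:
    """One ranked co-occurrence list per period, in chronological key order."""
    return {period: cooccurrence_list(posts, keyword)
            for period, posts in sorted(posts_by_period.items())}
```

Comparing the ranked lists of successive periods shows which words rise or fall alongside the survey keyword over time.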

This invention is also an Internet site information analysis apparatus that is constituted by a computer system and that accesses websites on the Internet, collects their text information, and analyzes it, the apparatus comprising: information collection means for collecting the text information of the websites; word division means for dividing the text information into words; reputation information extraction means for extracting, from the word group, reputation information consisting of a set of an evaluation target, an evaluation axis representing a property of the evaluation target, and an evaluation expression representing an evaluation on that axis; per-axis classification means for classifying the reputation information by evaluation axis; an evaluation expression dictionary group database holding a plurality of evaluation expression dictionaries, one for each evaluation axis, each consisting of pairs of an evaluation expression and a score indicating its degree; rating calculation means for retrieving from the evaluation expression dictionary group database the evaluation expression dictionary corresponding to the evaluation axis under analysis and calculating a rating for each piece of reputation information based on that dictionary; and reputation analysis output means for outputting, for each piece of text information, a relative comparison of the ratings on any different evaluation axes.

The reputation analysis output means presents its output as a two-dimensional graph based on the ratings on any two different evaluation axes, or as a three-dimensional graph based on the ratings on any three different evaluation axes.

This invention is also an Internet site information analysis apparatus that is constituted by a computer system and that accesses websites on the Internet, collects their text information, and analyzes it, the apparatus comprising: information collection means for collecting the text information of the websites; word division means for dividing the text information into words; related-post count calculation means for extracting from the word group the words identical or similar to a predetermined keyword and calculating the number of pieces of text information containing such a word; related-post rate calculation means for calculating, for each website, the ratio of the related-post count to the number of pieces of text information collected from that website; update date information collection means for collecting the update dates of each website; update frequency calculation means for calculating the update frequency of each website from the reference date of the analysis and the update date information; and site activity analysis output means for outputting a relative comparison of the related-post rate and the update frequency of each website.

The activity analysis output means presents the related-post rate and the update frequency as a two-dimensional graph.

This invention is also an Internet site information analysis apparatus that accesses websites on the Internet, collects their text information, and analyzes it, the apparatus comprising: information collection means for collecting the text information of the websites each time a predetermined period elapses; word division means for dividing the text information into words; survey target information extraction means for extracting from the word group the text information that contains a word identical or similar to the keyword under survey; co-occurrence keyword extraction means for extracting co-occurrence keywords, which are the words contained in the survey target information other than words identical or similar to the survey keyword; co-occurrence keyword rating calculation means for calculating a rating for each co-occurrence keyword based on how frequently it appears in the survey target information; sorting means for creating a co-occurrence information list by sorting the co-occurrence keywords in order of their ratings; and co-occurrence information analysis output means for outputting, in time series, the co-occurrence information lists obtained for the respective periods.

According to this invention, the enormous amount of information published and accumulated on websites can be analyzed, and accurate trend information can be obtained easily.

In particular, according to the inventions of claims 1, 4, and 5, providing an evaluation expression dictionary and evaluation scores for each evaluation axis makes it possible to rate posters' opinions accurately. It also enables reputation analysis that goes as far as how each individual poster assigns priorities among multiple evaluation axes (demands) and what tendencies exist in that sense of balance.

According to the inventions of claims 2, 6, and 7, site activity analysis, which calculates for each website the posting rate of information related to a predetermined keyword and the update frequency, makes it easy to identify the websites that are actively publishing information. By concentrating information collection on those websites as noteworthy information sources, useful trend information can be obtained efficiently.

Furthermore, according to the inventions of claims 3 and 8, co-occurrence information analysis, which analyzes in time series the changes in other keywords that co-occur with a predetermined keyword, makes it possible to capture in practical terms how posters' interests change as they spread out around that keyword.

An embodiment of a network system in which the Internet site information analysis apparatus 10 of the present invention is deployed is described below with reference to FIG. 1. In this network system, the following are arranged on the Internet: websites 12, on which many people publish information such as impressions and opinions; a crawler 14, which periodically collects text information in RSS (Rich Site Summary) format from designated websites; an article database 16, which stores the information collected by the crawler 14 as a database; an analyzer 18, which has programs for performing the three analyses described later; an evaluation expression dictionary group database 20, in which evaluation expressions corresponding to the evaluation axes under analysis and their evaluation scores are defined; an analysis result database 22, which stores the results of the analyses; and a portal server 24, to which the personal computer of a user 26 is connected and which extracts the desired analysis results from the analysis result database 22 and delivers them to the user 26.

An Internet site information analysis method that performs reputation analysis according to a first embodiment of the present invention is described with reference to FIGS. 2 to 6. First, an overview is given based on the flow shown in FIG. 2. In step S110, text information is collected from each website, here a blog. In the Internet site information analysis apparatus 10, the crawler 14, the article database 16, and the reputation analysis analyzer 18a serve as this information collection means. Next, in step S120, the collected text information is broken down into words (parts of speech). In the Internet site information analysis apparatus 10, the reputation analysis analyzer 18a serves as this word division means. In step S130, reputation information consisting of a set of an evaluation target, an evaluation axis representing a property of the evaluation target, and an evaluation expression representing an evaluation on that axis is extracted from the divided word group. In the Internet site information analysis apparatus 10, the reputation analysis analyzer 18a serves as this reputation information extraction means.

Then, in step S140, the extracted reputation information is classified by evaluation axis. In the Internet site information analysis apparatus 10, the reputation analysis analyzer 18a serves as this per-axis classification means. In step S150, the evaluation expression dictionary corresponding to the evaluation axis under analysis is retrieved from the evaluation expression dictionary group database 20, the rating of each piece of reputation information is calculated, and the results are stored. In the Internet site information analysis apparatus 10, the reputation analysis analyzer 18a and the reputation analysis result database 22a serve as this rating calculation means. Then, in step S160, in response to a request from the user 26, each piece of text information that has ratings on any different evaluation axes is displayed on a graph and output. In the Internet site information analysis apparatus 10, the reputation analysis result database 22a and the reputation analysis display framework 24a of the portal server 24 serve as this reputation analysis output means.

Next, each step of the reputation analysis is described in detail. As shown in FIG. 3, each website contains a plurality of pieces of text information. In step S110, text information divided into predetermined sentence units is collected, for example text information a1: "Company A's car is expensive, and the steering wheel is hard to use." In step S120, these pieces of text information are further broken down into words (parts of speech) such as nouns, particles, adjectives, and verbs.

In step S130, the words are related to one another using known techniques such as Japanese morphological analysis and dependency parsing. By analyzing, for example, the context in which nouns and particles appear in a sentence, Japanese morphological analysis can establish relationships such as "expensive" corresponding to "price" and "hard to use" corresponding to "steering wheel" for the words in text information a1. In this way, reputation information, which is a set of an evaluation target, an evaluation axis representing a property of the evaluation target, and an evaluation expression representing an evaluation on that axis, is extracted from the text information. For text information a1, for example, two pieces of reputation information are extracted: "car, car price, expensive" and "car, car operability, hard to use".

In step S140, the reputation information is classified by evaluation axis, such as "car price" and "car operability". This is a preparatory step that organizes the information so that the calculation in step S150, described later, can be performed efficiently.
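The classification in step S140 amounts to grouping reputation tuples by their evaluation axis. The sketch below assumes a hypothetical `Reputation` record that also carries the identifier of the originating text; the field names are illustrative and do not come from the patent.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reputation(NamedTuple):
    """Hypothetical record for one piece of reputation information."""
    text_id: str      # which piece of text information it came from
    target: str       # evaluation target, e.g. "car"
    axis: str         # evaluation axis, e.g. "car price"
    expression: str   # evaluation expression, e.g. "expensive"


def classify_by_axis(items: List[Reputation]) -> Dict[str, List[Reputation]]:
    """Group reputation information by evaluation axis (step S140)."""
    groups: Dict[str, List[Reputation]] = defaultdict(list)
    for item in items:
        groups[item.axis].append(item)
    return dict(groups)
```

With the two tuples extracted from text information a1, this yields one group for "car price" and one for "car operability", each ready to be rated against its own dictionary.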

In step S150, the rating of each piece of reputation information is calculated using the plurality of evaluation expression dictionaries held in the evaluation expression dictionary group database 20. An example of the evaluation expression dictionaries is described with reference to FIG. 4. An evaluation expression dictionary is provided for each evaluation axis. For example, evaluation expression dictionary A corresponds to the evaluation axis "car price": it lists the expressions that can plausibly be used to evaluate a car's price, such as "expensive", "overpriced", and "affordable", and defines a positive or negative level for each as its evaluation score. Evaluation expression dictionary B, on the other hand, corresponds to the evaluation axis "car operability": it lists a different set of expressions suited to evaluating operability and likewise defines a positive or negative level for each as its evaluation score.

Step S150, which uses the evaluation expression dictionaries described above, is explained in more detail with reference to FIG. 5. In step S151, one evaluation expression dictionary in the evaluation expression dictionary group database 20 is selected, and the group of reputation information corresponding to that dictionary is extracted. For example, when evaluation expression dictionary A is selected, the reputation information whose evaluation axis is "car price" is extracted.

In step S152, the pieces of reputation information that contain an evaluation expression listed in that dictionary are extracted in turn. In step S153, the score of the evaluation expression is assigned to the reputation information as its score, and the result is stored in the reputation analysis result database 22a. For example, reputation information containing the evaluation expression "high" is assigned an evaluation score of "+1", reputation information containing "affordable" is assigned "+4", and the results are saved. In determination step S154, it is determined whether scores have been calculated for all evaluation expression dictionaries. If NO, steps S151 to S153 are repeated for the next evaluation expression dictionary; when the result becomes YES, step S150 ends. Thus, for text information a1, for example, step S150 assigns scores on two evaluation axes: "car price = +1" and "car operability = +1".
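The loop of steps S151 to S154 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `score_reputation`, the tuple layout, and the dictionary structure are hypothetical, and only the three scores shown are taken from the description above.

```python
# Per-axis evaluation expression dictionaries (expression -> evaluation score).
# Only these three scores appear in the description; real dictionaries would
# list many more expressions (assumption: a plain dict per evaluation axis).
EVALUATION_DICTIONARIES = {
    "car price": {"high": 1, "affordable": 4},
    "car operability": {"difficult to use": 1},
}


def score_reputation(reputation_items, dictionaries=EVALUATION_DICTIONARIES):
    """Assign each (object, axis, expression) triple the score of its expression."""
    scored = []
    # S151 / S154: process one evaluation expression dictionary at a time.
    for axis, dictionary in dictionaries.items():
        for obj, item_axis, expression in reputation_items:
            # S152: keep items on this axis whose expression is listed.
            if item_axis == axis and expression in dictionary:
                # S153: the expression's score becomes the item's score.
                scored.append((obj, axis, dictionary[expression]))
    return scored


# Reputation information extracted from text information a1.
a1 = [
    ("car", "car price", "high"),
    ("car", "car operability", "difficult to use"),
]
scores_a1 = score_reputation(a1)
```

With these inputs, `scores_a1` holds one score per evaluation axis, matching the "car price = +1" and "car operability = +1" result described for text information a1.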

In step S160, the text information scored in step S150 is provided as reputation analysis information plotted on a graph against any chosen set of different evaluation axes. For example, as shown in FIG. 6, when "car price" and "car operability" are selected as the evaluation axes, a scatter plot is drawn with one point per piece of text information. This reputation analysis information is provided in a PULL style (the user 26 retrieves information as needed). When three evaluation axes are selected, a three-dimensional graph may be drawn instead.

According to the Internet site information analysis method for reputation analysis (steps S110 to S160) of the first embodiment described above, an evaluation expression dictionary and evaluation scores are provided for each evaluation axis, so the scores can be calculated in a way that accurately captures the posters' opinions. Moreover, rather than simply tallying opinions across many evaluation items, the method analyzes the meaning of each piece of text information and plots it relative to different evaluation axes. This makes it possible to analyze reputation, and trends in reputation, at the level of individual items for the many requests held by a large number of unspecified posters, going beyond the overall tendency.

Next, an Internet site information analysis method that performs site activity analysis according to a second embodiment of the present invention is described with reference to FIGS. 7 to 11. An outline is given first, based on the flow shown in FIG. 7. In step S210, text information and update date information are collected from each Web site. In the Internet site information analysis apparatus 10, the crawler 14, the article database 16, and the site activity analysis analyzer 18b serve as the information collection means and the update date information acquisition means. Next, in step S220, the collected text information is divided into words (parts of speech); the site activity analysis analyzer 18b serves as the word dividing means. In step S230, the number of related information posts is calculated, that is, the number of pieces of text information whose word group includes a word identical or similar to a predetermined keyword; the site activity analysis analyzer 18b serves as the related information post count calculation means. In step S240, the related information posting rate, which is the ratio of the number of related information posts to the number of pieces of text information collected from the Web site, is calculated and the result is stored; the site activity analysis analyzer 18b and the site activity analysis result database 22b serve as the related information posting rate calculation means.

In step S250, the update frequency of each Web site is calculated from the reference date of the analysis and the update date information, and the result is stored. In the Internet site information analysis apparatus 10, the site activity analysis analyzer 18b and the site activity analysis result database 22b serve as the update frequency calculation means.

In step S260, in response to a request from the user 26, each Web site, to which the two calculated values (related information posting rate and update frequency) have been assigned, is displayed on a graph and output. In the Internet site information analysis apparatus 10, the site activity analysis result database 22b and the site activity analysis display framework 24b of the portal server 24 serve as the site analysis output means.

Next, each step of the site activity analysis is described in detail. As shown in FIG. 8, each Web site holds a plurality of pieces of text information together with the site's last update date. In step S210, for Web site 1 for example, text information a1 and a2 and the update date information "last update date: September 11" are collected. In step S220, this text information is divided into words (parts of speech) such as nouns, adjectives, and verbs.

Step S230 is described in more detail with reference to FIG. 9. In step S231, when a predetermined keyword to be investigated is given, a group of words similar to that keyword is extracted using, for example, a thesaurus (a kind of synonym dictionary). In step S232, for each Web site, the text information containing the keyword or any of the similar words, that is, the related information, is extracted. In step S233, the extracted related text information is counted and the count is accumulated. In determination step S234, it is determined whether the calculation has been performed for all Web sites. If NO, steps S232 and S233 are repeated for the next Web site; when the result becomes YES, step S230 ends and processing moves to step S240.

Thus, in step S230, when the keyword "car" is given, for example, step S231 extracts colloquial names, abbreviations, formal names, and other words such as "kei car" (軽四), "hybrid car", and "automobile" as similar words. The content of the related information can therefore be analyzed, and its post count calculated, without omission.

In step S240, the related information posting rate is calculated for each Web site, and the result is stored in the site activity analysis result database 22b. The related information posting rate is obtained by division, with the total number of pieces of text information collected from the Web site as the denominator and the number of those pieces that are related information for the predetermined keyword as the numerator. For example, Web site 1 shown in FIG. 8 has two pieces of text information, a1 and a2, of which one, a1, which contains the word "hybrid car", is related information for the keyword "car". The related information posting rate of Web site 1 for the keyword "car" is therefore calculated as 0.5. The rate is calculated in this way for each keyword and each Web site, and the results are stored systematically in the site activity analysis result database 22b, as in the list of FIG. 10.
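As a rough sketch of steps S231 to S233 and S240 for a single Web site, the posting rate can be computed as below. The synonym table `SIMILAR_WORDS`, the function name, and the word lists are hypothetical stand-ins (the description only states that a thesaurus or similar resource is used and that a1 contains "hybrid car"); the handling of a site with no posts is likewise an assumption.

```python
# Hypothetical synonym table standing in for the thesaurus lookup of step S231.
SIMILAR_WORDS = {"car": {"car", "kei car", "hybrid car", "automobile"}}


def related_posting_rate(posts, keyword, similar_words=SIMILAR_WORDS):
    """Return (posts containing the keyword or a similar word) / (all posts).

    `posts` is a list of word lists, one per piece of text information.
    """
    if not posts:
        return 0.0  # assumption: a site with no collected posts gets rate 0
    targets = similar_words.get(keyword, {keyword})
    # S232 / S233: count the posts that share at least one word with the targets.
    related = sum(1 for words in posts if targets & set(words))
    # S240: related information posts divided by all posts of the site.
    return related / len(posts)


# Web site 1: a1 mentions "hybrid car"; the other words are illustrative only.
site1 = [["hybrid car", "price", "high"], ["camera", "performance"]]
rate_site1 = related_posting_rate(site1, "car")
```

Here `rate_site1` is 1 related post out of 2, which corresponds to the value 0.5 given for Web site 1.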

In step S250, the update frequency of each Web site is calculated, and the result is stored in the site activity analysis result database 22b. In the calculation example of FIG. 8, the update frequency is defined as the reciprocal of the number of days between the reference date of the analysis and the last update date of the Web site, plus 1. Under this definition, Web site 1 has an update frequency of 1.0, because its last update date and the reference date are both September 11. For Web site 2, the same calculation gives an update frequency of 0.011. That is, Web site 1, which was updated recently, receives a high update frequency, while Web site 2, which has been left without updates for a long period, receives a low one.
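The update frequency definition can be written down directly. The sketch below is illustrative only: the function name is hypothetical, the year 2007 is assumed, and because the description does not state the last update date of Web site 2, the 90-day gap used here is merely one gap that yields 0.011 after rounding.

```python
from datetime import date


def update_frequency(reference_date, last_update):
    """Reciprocal of (days elapsed since the last update + 1)."""
    elapsed_days = (reference_date - last_update).days
    if elapsed_days < 0:
        raise ValueError("last update is later than the reference date")
    return 1.0 / (elapsed_days + 1)


reference = date(2007, 9, 11)  # the year is an assumption

# Web site 1: updated on the reference date itself.
freq_site1 = update_frequency(reference, date(2007, 9, 11))

# Hypothetical site last updated 90 days before the reference date.
freq_90_days = update_frequency(reference, date(2007, 6, 13))
```

Adding 1 to the elapsed days keeps the value finite for a site updated on the reference date and bounds the update frequency to the range (0, 1].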

In step S260, the relative position of each Web site is plotted on a graph, with the related information posting rate assigned for the predetermined keyword in step S240 on the horizontal axis and the update frequency assigned in step S250 on the vertical axis, and the graph is provided as site activity analysis information in a PULL style (the user 26 retrieves information as needed). For example, as shown in FIG. 11, the Web sites in the upper right of the graph carry a large amount of information about "cars" and are updated frequently, so accessing Web sites that actively publish information, such as Web sites 1, 2, and 5, is likely to yield useful information about "cars". Conversely, the Web sites in the lower left of the graph carry little information about "cars" and are updated infrequently, so accessing a Web site with low activity, such as Web site 8, is unlikely to yield useful information about "cars".

According to the Internet site information analysis method for site activity analysis (steps S210 to S260) of the second embodiment described above, Web sites that actively publish information can be identified easily, and by concentrating information collection on those Web sites as noteworthy information sources, useful trend information can be obtained efficiently.

Next, an Internet site information analysis method that performs co-occurrence information analysis according to a third embodiment of the present invention is described with reference to FIGS. 12 to 15, beginning with an outline. In step S310 of the flow shown in FIG. 12, text information is collected from each Web site every time a predetermined period elapses. In the Internet site information analysis apparatus 10, the crawler 14, the article database 16, and the co-occurrence information analysis analyzer 18c serve as the information collection means. Next, in step S320, the collected text information is divided into words (parts of speech); the co-occurrence information analysis analyzer 18c serves as the word dividing means. In step S330, the text information whose word group includes a word identical or similar to a predetermined keyword, that is, the investigation target information, is extracted; the co-occurrence information analysis analyzer 18c serves as the investigation target information extraction means. In step S340, the words other than those identical or similar to the predetermined keyword, that is, the co-occurrence keywords, are extracted from the words that make up the extracted investigation target information; the co-occurrence information analysis analyzer 18c serves as the co-occurrence keyword extraction means.

In step S350, a score is calculated for each co-occurrence keyword based on the frequency with which the extracted co-occurrence keyword appears in the investigation target information. In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c serves as the co-occurrence keyword score calculation means. Next, in step S360, a co-occurrence information list is created by sorting the co-occurrence keywords in order of their scores, and the list is stored; the co-occurrence information analysis analyzer 18c and the co-occurrence information analysis result database 22c serve as the sorting means. In step S370, in response to a request from the user 26, the co-occurrence information lists created each time the predetermined period elapses are output in time series; the co-occurrence information analysis result database 22c and the co-occurrence information analysis display framework 24c of the portal server 24 serve as the co-occurrence information analysis output means.

Next, each step of the co-occurrence information analysis is described in detail. As shown in FIG. 13, each Web site holds a plurality of pieces of text information. In step S310, for Web site 1 for example, text information a1 and a2 is collected. In step S320, this text information is divided into words (parts of speech) such as nouns, adjectives, and verbs.

In step S330, when a predetermined keyword to be investigated is given, a group of words similar to that keyword is extracted using, for example, a thesaurus (a kind of synonym dictionary), and the text information containing the keyword or any of the similar words, that is, the investigation target information, is extracted. In the example of FIG. 13, when the keyword "digital camera" is given, colloquial names, abbreviations, formal names, and other words such as "digicam" (デジカメ), "digital still camera", and "digital video camera" are extracted as similar words. Text information a1, a2, and b2, which contains "digital camera" or one of its similar words, is then extracted as the investigation target information. In this way, the information that ought to be investigated can be extracted without omission.

In step S340, the words other than those identical or similar to the predetermined keyword, that is, the co-occurrence keywords, are extracted from the words that make up the extracted investigation target information. For text information a1, for example, "Company A", "Company B", and "performance" are the co-occurrence keywords. If variants such as "（株）Ａ", "株式会社Ａ", "Ａ", and "Ａ社" (all ways of writing "Company A") are extracted as separate co-occurrence keywords, and there is no problem in treating them all as synonyms of "Company A", they may be merged into a single co-occurrence keyword before proceeding to the next step.
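One simple way to realize the merging of variants in step S340 is an alias table, sketched below. The table `ALIASES`, the function name, and the word list for a1 are hypothetical; the description does not specify how synonymous forms are detected, so a hand-written mapping is assumed here.

```python
# Hypothetical alias table: every surface form maps to one canonical keyword.
ALIASES = {
    "（株）Ａ": "Company A",
    "株式会社Ａ": "Company A",
    "Ａ": "Company A",
    "Ａ社": "Company A",
}


def cooccurrence_keywords(words, target_words, aliases=ALIASES):
    """Drop the keyword and its similar words, then merge aliases (order kept)."""
    result = []
    for word in words:
        if word in target_words:
            continue  # identical or similar to the predetermined keyword
        canonical = aliases.get(word, word)
        if canonical not in result:
            result.append(canonical)
    return result


targets = {"digital camera", "digicam", "digital still camera", "digital video camera"}
# Illustrative word list for text information a1 (two variants of Company A).
a1_words = ["digicam", "Ａ社", "Company B", "performance", "株式会社Ａ"]
keywords_a1 = cooccurrence_keywords(a1_words, targets)
```

For this input, `keywords_a1` contains "Company A", "Company B", and "performance", each exactly once.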

Step S350 is described in more detail based on the flow of FIG. 14. In step S351, the total number of pieces of collected text information is counted. In the example of FIG. 13, if information was collected only from Web sites 1 and 2, the total is counted as 5. In step S352, the number of pieces of text information that are investigation target information is counted. In the example of FIG. 13, of the 5 pieces of text information, 3 are investigation target information for the keyword "digital camera". In step S353, the number of pieces of text information containing the same word as a co-occurrence keyword extracted in step S340 is counted for each co-occurrence keyword. In the example of FIG. 13, of the 5 pieces of text information, 3 contain the co-occurrence keyword "Company A". In step S354, the number of pieces of investigation target information containing the same word as a co-occurrence keyword extracted in step S340 is counted for each co-occurrence keyword. In the example of FIG. 13, of the 3 pieces of investigation target information for "digital camera", 2 contain the co-occurrence keyword "Company A".
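The four counts of steps S351 to S354 can be gathered in one pass, as in the following sketch. The function name and the document contents are hypothetical: only the totals (5, 3, 3, 2), the fact that a1, a2, and b2 are investigation target information, and the co-occurrence keywords of a1 come from the description; the remaining words are invented so that the totals match.

```python
def cooccurrence_counts(documents, target_words, keyword):
    """Return (n_all, n_target, n_keyword, n_both) for one co-occurrence keyword."""
    n_all = len(documents)                                           # S351
    is_target = [bool(target_words & set(doc)) for doc in documents]
    has_keyword = [keyword in doc for doc in documents]
    n_target = sum(is_target)                                        # S352
    n_keyword = sum(has_keyword)                                     # S353
    n_both = sum(t and k for t, k in zip(is_target, has_keyword))    # S354
    return n_all, n_target, n_keyword, n_both


targets = {"digital camera", "digicam", "digital still camera", "digital video camera"}
documents = [
    ["digital camera", "Company A", "Company B", "performance"],  # a1
    ["digicam", "Company A"],                                     # a2 (illustrative)
    ["LCD TV", "Company A"],                                      # b1 (illustrative)
    ["digital still camera", "price"],                            # b2 (illustrative)
    ["printer"],                                                  # b3 (illustrative)
]
counts_company_a = cooccurrence_counts(documents, targets, "Company A")
```

With this toy corpus, `counts_company_a` reproduces the counts 5, 3, 3, and 2 given for the co-occurrence keyword "Company A".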

In step S355, the score of each co-occurrence keyword is calculated. The score is preferably defined as the base-2 logarithm of the quotient obtained by dividing the product of the counts from steps S354 and S351 by the product of the counts from steps S352 and S353. In the example of FIG. 13, the score of the co-occurrence keyword "Company A" is calculated from the count of 5 from step S351, the count of 3 from step S352, the count of 3 from step S353, and the count of 2 from step S354, giving a score of 0.152. In determination step S356, it is determined whether the calculation has been performed for all co-occurrence keywords. If NO, steps S353 to S355 are repeated for the next co-occurrence keyword; when the result becomes YES, step S350 ends.
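Written out, the score is log2((N354 × N351) / (N352 × N353)), where N351 to N354 denote the counts from the corresponding steps; this has the same form as pointwise mutual information estimated from document counts. The short sketch below evaluates it for the "Company A" example. The function and parameter names are hypothetical, and rejecting non-positive counts is an assumption (the description does not discuss that case).

```python
import math


def cooccurrence_score(n_all, n_target, n_keyword, n_both):
    """log2 of (n_both * n_all) / (n_target * n_keyword).

    n_all     -- all collected text information          (step S351)
    n_target  -- investigation target information        (step S352)
    n_keyword -- text information containing the keyword (step S353)
    n_both    -- target information containing it        (step S354)
    """
    if min(n_all, n_target, n_keyword, n_both) <= 0:
        # Assumption: the logarithm is undefined here, so reject the input.
        raise ValueError("all counts must be positive")
    return math.log2((n_both * n_all) / (n_target * n_keyword))


# Counts for the co-occurrence keyword "Company A" in the example of FIG. 13.
score_company_a = cooccurrence_score(n_all=5, n_target=3, n_keyword=3, n_both=2)
```

The value is log2(10/9), which rounds to the 0.152 stated above. A score of 0 would mean the keyword appears in the investigation target information exactly as often as in the collection as a whole.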

Step S350 of this embodiment is characterized in that it also incorporates chained correlations into the score calculation. In principle, when there is a correlation between the keyword "digital camera" and the co-occurrence keyword "Company A", correlations should also be considered to exist between "Company A" and the keywords other than "digital camera" with which it co-occurs. However, following such chains of correlation makes the amount of computation enormous, so this has generally not been done. In this embodiment, including the counts from steps S351 and S353 in the formula allows each score to reflect, in relative terms, not only the strength of the correlation between "digital camera" and "Company A" but also differences in the strength of other correlations, such as that between "LCD TV" and "Company A".

In step S360, a co-occurrence information list is created by sorting the co-occurrence keywords in order of their scores, and the list is stored in the co-occurrence information analysis result database 22c. In step S370, in response to a request from the user 26, the co-occurrence information lists created each time the predetermined period elapses are presented in time series and provided as co-occurrence information analysis output in a PULL style (the user 26 retrieves information as needed). FIG. 15 shows an example in which scores were calculated for all Web sites, including Web sites 1 and 2. For the keyword "digital camera", for example, the co-occurrence keyword "product W" had a low score and was unranked as of July 2007, but rose to 2nd place in August 2007. This shows that "product W" is becoming a central topic among posters in the field of "digital cameras". The co-occurrence keyword "Company B", in contrast, was ranked 2nd as of July 2007 but fell to 5th in August 2007, which shows that posters are paying less attention to "Company B".
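The sorting of step S360 and the period-to-period comparison read from FIG. 15 might look like the sketch below. All scores are invented for illustration (the description gives only the ranks of "product W" and "Company B"), the function names are hypothetical, and an unranked keyword is modeled simply as one absent from the earlier list.

```python
def rank_keywords(scores):
    """Sort co-occurrence keywords by score, highest first; return {keyword: rank}."""
    ordered = sorted(scores, key=lambda kw: (-scores[kw], kw))
    return {kw: position + 1 for position, kw in enumerate(ordered)}


def rank_changes(previous, current):
    """Previous rank minus current rank; positive means the keyword moved up.

    Keywords that were not ranked in the previous period map to None.
    """
    return {
        kw: (previous[kw] - rank if kw in previous else None)
        for kw, rank in current.items()
    }


# Hypothetical scores for the keyword "digital camera" in two periods.
july = {"Company A": 0.9, "Company B": 0.8, "performance": 0.5,
        "price": 0.4, "design": 0.3}
august = {"Company A": 0.9, "product W": 0.85, "performance": 0.6,
          "price": 0.5, "Company B": 0.2}

july_ranks = rank_keywords(july)
august_ranks = rank_keywords(august)
changes = rank_changes(july_ranks, august_ranks)
```

In this toy data, "product W" enters the August list in 2nd place and "Company B" drops from 2nd to 5th, mirroring the movement described for FIG. 15.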

According to the Internet site information analysis method for co-occurrence information analysis (steps S310 to S370) of the third embodiment described above, analyzing in time series how the other keywords that co-occur with a predetermined keyword change makes it possible to grasp how posters' interests surrounding that keyword are actually shifting.

The present invention is not limited to the embodiments described above. The Internet site information analysis apparatus 10 may be any analysis apparatus, and the method any analysis method, that has the means or steps for at least one of reputation analysis, site activity analysis, and co-occurrence information analysis.

The display frames for the analysis results output in steps S160 and S260 need only allow the relative relationship among the several characteristic values of each data item to be recognized visually, and are not limited to the graph images illustrated in the embodiments. Measures that make the display more visually effective, such as using a logarithmic scale for the graph or overlaying several analysis results with a legend, should be adopted as appropriate.

The formula for the related information posting rate defined in step S240, the formula for the update frequency defined in step S250, and the formula for the co-occurrence keyword score defined in step S350 are not limited to those of the embodiments, provided that they are defined in view of the subject of the investigation and the particular circumstances of its field. For the co-occurrence keyword score defined in step S350, for example, a different formula may be defined so that the aspect to be analyzed in detail stands out in the characteristic value, such as by changing the base of the logarithm or by substituting the square of a particular count.

The series of processing operations of the first to third embodiments may be built as a program, installed on a server computer used as the Internet site information analysis apparatus 10, and executed by a control means such as a CPU; the program may also be distributed over a network. The program may also be stored on a storage device or portable storage medium, such as a hard disk device, a flexible disk, or a CD-ROM, connected to any of the various computers used as the Internet site information analysis apparatus 10, and then installed on the computer and executed.

FIG. 1 is a diagram showing the configuration of an entire network system in which an embodiment of the Internet site information analysis apparatus of the present invention is deployed.
FIG. 2 is a flowchart of reputation analysis, the first embodiment of the Internet site information analysis method of the present invention.
FIG. 3 is a diagram showing an example of extracting reputation information from text information in that embodiment.
FIG. 4 is a diagram showing an example of the evaluation expression dictionaries used in that embodiment.
FIG. 5 is a flowchart of the score calculation step in that embodiment.
FIG. 6 is a graph showing an example of the output format of the reputation analysis results in that embodiment.
FIG. 7 is a flowchart of site activity analysis, the second embodiment of the Internet site information analysis method of the present invention.
FIG. 8 is a diagram showing an example of text information and the update frequency calculation in that embodiment.
FIG. 9 is a flowchart of the step of calculating the number of related article posts in that embodiment.
FIG. 10 is a diagram showing an example of the list of calculated related article posting rates in that embodiment.
FIG. 11 is a graph showing an example of the output format of the site activity analysis results in that embodiment.
FIG. 12 is a flowchart of co-occurrence information analysis, the third embodiment of the Internet site information analysis method of the present invention.
FIG. 13 is a diagram showing an example of extracting investigation target information from text information in that embodiment.
FIG. 14 is a flowchart of the step of calculating the score of each co-occurrence keyword in that embodiment.
FIG. 15 is a diagram showing an example of the output format of the co-occurrence information analysis results in that embodiment.

Explanation of symbols

10 Internet site information analysis apparatus
12 Web site
14 Crawler
16 Article database
18 Analyzer
20 Evaluation expression dictionary group database
22 Analysis result database
24 Portal server
26 User

Claims

1. An Internet site information analysis method for accessing websites on the Internet, collecting text information, and analyzing it, the method comprising:
An information collecting step for collecting text information of the websites;
A word dividing step for dividing the text information into words;
A reputation information extraction step for extracting, from the word group, reputation information consisting of a set of an evaluation object, an evaluation axis representing a property of the evaluation object, and an evaluation expression representing an evaluation with respect to the evaluation axis;
A per-evaluation-axis classification step for classifying the reputation information by evaluation axis;
A score calculation step in which a plurality of evaluation expression dictionaries, each consisting of sets of an evaluation expression and a score indicating its degree, are provided in advance for each evaluation axis, and a score is calculated based on the evaluation expression dictionary corresponding to the evaluation axis to be analyzed; and
A reputation analysis output step for outputting, for each item of text information, a relative comparison of the scores of any different evaluation axes.
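
The per-axis score calculation recited in the claim above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary contents, the name `score_reputation`, and the choice to sum scores per evaluation object and axis do not come from the patent text, which only requires that a score be calculated from the dictionary matching the evaluation axis.

```python
from collections import defaultdict

# Hypothetical evaluation expression dictionaries, one per evaluation axis.
# The expressions and scores are illustrative values, not taken from the patent.
EVALUATION_DICTIONARIES = {
    "price": {"cheap": 2, "reasonable": 1, "expensive": -2},
    "design": {"stylish": 2, "plain": -1, "ugly": -2},
}


def score_reputation(reputation_items):
    """Sum dictionary scores per (evaluation object, evaluation axis).

    Each item is an (object, axis, expression) triple, as a reputation
    information extraction step (not shown here) might produce.
    """
    totals = defaultdict(int)
    for obj, axis, expression in reputation_items:
        # Pick the dictionary that belongs to this item's evaluation axis.
        dictionary = EVALUATION_DICTIONARIES.get(axis, {})
        # Expressions missing from the dictionary contribute a score of 0.
        totals[(obj, axis)] += dictionary.get(expression, 0)
    return dict(totals)


items = [
    ("Camera A", "price", "cheap"),
    ("Camera A", "design", "plain"),
    ("Camera A", "price", "reasonable"),
    ("Camera B", "price", "expensive"),
]
scores = score_reputation(items)
```

Plotting `scores` with one evaluation axis per graph axis would give the kind of relative comparison the output step describes.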

2. An Internet site information analysis method for accessing websites on the Internet, collecting text information, and analyzing it, the method comprising:
An information collecting step of collecting the text information and update date information of each website;
A word dividing step for dividing the text information into words;
A related information post count calculating step of extracting, from the word group, words that are the same as or similar to a predetermined keyword and counting the items of text information that include such words;
A related information posting rate calculating step for calculating, for each website, the ratio of the number of related information posts to the number of items of text information collected from that website;
An update frequency calculating step for calculating the update frequency of each website based on a reference date for analysis and the update date information; and
A site activity analysis output step of outputting a relative comparison of the related information posting rate and the update frequency of each website.
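
The two quantities compared in the claim above, the related information posting rate and the update frequency, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function names, the 30-day window, and the definition of update frequency as updates per day within that window are hypothetical, and only exact keyword matches are counted (the "similar word" matching of the claim is omitted).

```python
from datetime import date


def related_posting_rate(articles, keyword):
    """Fraction of a site's collected articles whose words include the keyword."""
    if not articles:
        return 0.0
    related = sum(1 for words in articles if keyword in words)
    return related / len(articles)


def update_frequency(update_dates, reference_date, window_days=30):
    """Updates per day in the window ending at the reference date.

    The window length and this definition are assumptions for illustration.
    """
    recent = [
        d for d in update_dates
        if 0 <= (reference_date - d).days < window_days
    ]
    return len(recent) / window_days


# One hypothetical site: articles already divided into words.
site_articles = [
    ["new", "camera", "review"],
    ["lunch", "menu"],
    ["camera", "lens"],
    ["travel"],
]
site_updates = [date(2007, 10, 30), date(2007, 10, 20), date(2007, 8, 1)]

rate = related_posting_rate(site_articles, "camera")
freq = update_frequency(site_updates, date(2007, 11, 1))
```

Computing the pair `(rate, freq)` for each website gives one point per site, which is what a relative comparison of the two measures needs.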

3. An Internet site information analysis method for accessing websites on the Internet, collecting text information, and analyzing it, the method comprising:
An information collecting step of collecting text information of the websites every time a predetermined period elapses;
A word dividing step for dividing the text information into words;
A survey target information extracting step for extracting, as survey target information, text information that includes a word the same as or similar to a survey target keyword from the word group;
A co-occurrence keyword extraction step for extracting a co-occurrence keyword that is included in the words constituting the survey target information and is a word other than the survey target keyword;
A co-occurrence keyword score calculation step for calculating a score for each co-occurrence keyword based on the frequency of occurrence of the co-occurrence keyword in the survey target information;
A sorting step of rearranging the co-occurrence keywords in order of their co-occurrence keyword scores to create a co-occurrence information list; and
A co-occurrence information analysis output step for outputting, in time series, the co-occurrence information lists obtained for each predetermined period.
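
The extraction, scoring, and sorting steps of the claim above can be illustrated with a short Python sketch. It makes several assumptions that the claim does not fix: the score is taken to be the raw occurrence count, only exact matches of the survey target keyword are handled, and the names `co_occurrence_list` and `periods` as well as the sample data are hypothetical.

```python
from collections import Counter


def co_occurrence_list(articles, target_keyword):
    """Rank words that appear in articles containing the target keyword."""
    counts = Counter()
    for words in articles:
        if target_keyword not in words:
            continue  # not survey target information
        # Count every word except the survey target keyword itself.
        counts.update(w for w in words if w != target_keyword)
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical articles collected in two periods, already divided into words.
periods = {
    "2007-10": [["camera", "lens", "zoom"], ["camera", "lens"], ["lunch", "menu"]],
    "2007-11": [["camera", "battery"], ["camera", "battery", "lens"]],
}

# One co-occurrence information list per period, kept in period order.
time_series = {
    period: co_occurrence_list(articles, "camera")
    for period, articles in periods.items()
}
```

Reading `time_series` period by period shows how the words associated with the keyword shift over time, which is the purpose of the time-series output step.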

4. An Internet site information analysis apparatus that is configured by a computer system and that accesses websites on the Internet, collects text information, and analyzes it, the apparatus comprising:
Information collecting means for collecting text information of the websites;
Word dividing means for dividing the text information into words;
Reputation information extraction means for extracting, from the word group, reputation information consisting of a set of an evaluation object, an evaluation axis representing a property of the evaluation object, and an evaluation expression representing an evaluation with respect to the evaluation axis;
Per-evaluation-axis classification means for classifying the reputation information by evaluation axis;
An evaluation expression dictionary group database in which a plurality of evaluation expression dictionaries, each consisting of sets of an evaluation expression and a score indicating its degree, are provided for each evaluation axis;
Score calculation means for retrieving, from the evaluation expression dictionary group database, the evaluation expression dictionary corresponding to the evaluation axis to be analyzed and calculating a score for each piece of reputation information based on that dictionary; and
Reputation analysis output means for outputting, for each item of text information, a relative comparison of the scores of any different evaluation axes.

5. The Internet site information analysis apparatus according to claim 4, wherein the reputation analysis output means outputs a two-dimensional graph based on the scores of any two different evaluation axes or a three-dimensional graph based on the scores of any three different evaluation axes.

6. An Internet site information analysis apparatus that is configured by a computer system and that accesses websites on the Internet, collects text information, and analyzes it, the apparatus comprising:
Information collecting means for collecting text information of the websites;
Word dividing means for dividing the text information into words;
Related information post count calculating means for extracting, from the word group, words that are the same as or similar to a predetermined keyword and counting the items of text information that include such words;
Related information posting rate calculating means for calculating, for each website, the ratio of the number of related information posts to the number of items of text information collected from that website;
Update date information collecting means for collecting the update dates of each website;
Update frequency calculation means for calculating the update frequency of each website based on a reference date for analysis and the update date information; and
Site activity analysis output means for outputting a relative comparison of the related information posting rate and the update frequency of each website.

7. The Internet site information analysis apparatus according to claim 6, wherein the site activity analysis output means outputs the related information posting rate and the update frequency as a two-dimensional graph.

8. An Internet site information analysis apparatus that accesses and analyzes information of websites on the Internet, the apparatus comprising:
Information collecting means for collecting text information of the websites every time a predetermined period elapses;
Word dividing means for dividing the text information into words;
Survey target information extracting means for extracting, as survey target information, text information that includes a word the same as or similar to a survey target keyword from the word group;
Co-occurrence keyword extracting means for extracting co-occurrence keywords, which are words included in the survey target information other than words the same as or similar to the survey target keyword;
Co-occurrence keyword score calculation means for calculating a score for each co-occurrence keyword based on the frequency of occurrence of the co-occurrence keyword in the survey target information;
Sorting means for creating a co-occurrence information list by rearranging the co-occurrence keywords in order of their co-occurrence keyword scores; and
Co-occurrence information analysis output means for outputting, in time series, the co-occurrence information lists obtained for each predetermined period.