JP2009116469A

JP2009116469A - Information extraction program and information extraction device

Info

Publication number: JP2009116469A
Application number: JP2007286537A
Authority: JP
Inventors: Kazunari Kawai; 一成川合
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-11-02
Filing date: 2007-11-02
Publication date: 2009-05-28
Anticipated expiration: 2027-11-02
Also published as: JP5088096B2

Abstract

PROBLEM TO BE SOLVED: To provide users with information that is valuable to them by properly managing the information posted by each user.

SOLUTION: An information management server 100 acquires articles (including blogs and the like) created within a prescribed period from a storage device and extracts keywords included in the acquired articles as feature keywords. On the basis of the feature keywords and the articles created by a target user or by a community composed of a plurality of users, the information management server 100 calculates the feature quantity (first TF/IDF value) of each feature keyword for the user or the community, and extracts each feature keyword whose calculated feature quantity is equal to or greater than a threshold as a profile keyword indicating the feature (profile) of the user or the community.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an information extraction program and an information extraction apparatus that extract predetermined information from a storage device storing article information.

In recent years, companies have built out their network environments, and information tools such as in-house SNS (Social Network Service), as disclosed in Non-Patent Document 1, and groupware are used on a daily basis. By using such information tools, users (employees and the like) publish articles they have created (including blogs and the like) to other employees and refer to articles created by other users.

Patent Document 1 discloses a technique intended to make article searches more efficient for users: a plurality of blog articles are analyzed, words that serve as keywords are extracted, and when one of the keywords is selected, related keywords are extracted from a predetermined number of the most recent blog articles for the selected keyword.

Patent Document 1: JP 2007-233438 A
Non-Patent Document 1: Fujitsu Software Technologies, "SNS 'Chisou Kuukan' (知創空間)", [online], retrieved October 26, 2007, Internet <URL: http://jp.fujitsu.com/group/fst/services/chisokukan/>

However, although information tools such as in-house SNS and groupware are used on a daily basis, there is no mechanism for efficiently reusing the information they hold (for example, articles created by users). As a result, valuable information exists but is buried among other information without ever being reused by users.

This problem is considered to arise because users can post articles and blogs freely, so the posted information is not managed. Conventionally, however, it has not been possible to determine uniquely how this information should be managed to be effective, and at present it is not managed properly.

The present invention has been made to solve the above-described problems of the prior art, and an object thereof is to provide an information extraction program and an information extraction apparatus that can properly manage the information posted by each user and provide users with information that is valuable to them.

To solve the above-described problems and achieve the object, this information extraction program causes a computer to execute: a feature keyword extraction procedure that acquires articles created within a predetermined period from a storage device and extracts keywords included in the acquired articles as feature keywords; a feature quantity calculation procedure that calculates the feature quantity of each feature keyword for a user, or for a community composed of a plurality of users, on the basis of the feature keywords and the articles created by the user or the community; and a profile keyword extraction procedure that extracts each feature keyword whose feature quantity is equal to or greater than a threshold as a profile keyword indicating a feature of the user or the community.

Further, in the above information extraction program, the feature quantity calculation procedure executes: a first counting procedure that extracts the articles created by the user or community from the storage device and counts a first value indicating the number of occurrences of the feature keyword in the extracted articles; a second counting procedure that counts a second value indicating the total number of articles created by the user or community; a third counting procedure that counts a third value indicating the number of articles, among those created by the user or community, that contain the feature keyword; and a calculation procedure that calculates the feature quantity on the basis of the first, second, and third values.
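The three counts and the threshold test can be sketched as follows. This is a minimal illustration, not the formula of the specification: this passage does not state how the first, second, and third values are combined, so the sketch assumes a conventional smoothed TF/IDF, first × (ln(second / third) + 1). The function names and the representation of an article as a list of keyword strings are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def count_values(articles: List[List[str]], keyword: str) -> Tuple[int, int, int]:
    """Return the (first, second, third) values for one feature keyword.

    articles: the poster's articles, each given as a list of keywords.
    """
    first = sum(article.count(keyword) for article in articles)   # occurrences
    second = len(articles)                                         # total articles
    third = sum(1 for article in articles if keyword in article)  # articles containing it
    return first, second, third


def feature_quantity(first: int, second: int, third: int) -> float:
    """Assumed combination: TF times a smoothed IDF (not taken from the source)."""
    if third == 0:
        return 0.0
    return first * (math.log(second / third) + 1.0)


def profile_keywords(
    articles: List[List[str]], feature_keywords: List[str], threshold: float
) -> Dict[str, float]:
    """Keep the feature keywords whose feature quantity is at or above the threshold."""
    result: Dict[str, float] = {}
    for keyword in feature_keywords:
        score = feature_quantity(*count_values(articles, keyword))
        if score >= threshold:
            result[keyword] = score
    return result
```

With three articles `[["server", "network", "server"], ["server", "backup"], ["network"]]` and a threshold of 2.5, "server" and "network" pass while "backup" does not. The smoothing term is a design choice made here so that a keyword appearing in every one of the poster's articles does not score zero.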

Further, the above information extraction program causes the computer to additionally execute a similarity calculation procedure. This procedure calculates a similarity on the basis of a reference profile keyword group, which associates each profile keyword indicating a feature of a reference user or community with the number of times that profile keyword appears in articles, and another profile keyword group, which contains each profile keyword indicating a feature of another user or community together with the number of times that profile keyword appears in articles. On the basis of the similarity, the procedure extracts other users or communities that are similar to the reference user or community.
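As an illustration of comparing two profile keyword groups, the sketch below treats each group as a mapping from profile keyword to appearance count and uses cosine similarity. The choice of cosine similarity, the threshold-based selection, and all names are assumptions made for this example; this passage does not specify the similarity measure.

```python
import math
from typing import Dict, List, Tuple


def similarity(base: Dict[str, int], other: Dict[str, int]) -> float:
    """Cosine similarity between two keyword -> appearance-count mappings."""
    dot = sum(count * other.get(keyword, 0) for keyword, count in base.items())
    norm_base = math.sqrt(sum(c * c for c in base.values()))
    norm_other = math.sqrt(sum(c * c for c in other.values()))
    if norm_base == 0 or norm_other == 0:
        return 0.0
    return dot / (norm_base * norm_other)


def similar_profiles(
    base: Dict[str, int],
    candidates: Dict[str, Dict[str, int]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (profile ID, similarity) pairs at or above the threshold, best first."""
    scored = [(pid, similarity(base, group)) for pid, group in candidates.items()]
    kept = [(pid, score) for pid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

A profile that shares no keywords with the reference scores 0, and one with proportional counts scores 1, so a single threshold separates similar from dissimilar profiles.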

Further, the above information extraction program causes the computer to additionally execute: a fourth counting procedure that extracts the articles within a predetermined period from the storage device and counts a fourth value indicating the number of occurrences of the feature keyword in the extracted articles; a fifth counting procedure that counts a fifth value indicating the total number of articles within the predetermined period; a sixth counting procedure that counts a sixth value indicating the number of articles within the predetermined period that contain the feature keyword; and a trend keyword extraction procedure that calculates a second feature quantity of each feature keyword on the basis of the fourth, fifth, and sixth values and extracts each feature keyword whose second feature quantity is equal to or greater than a threshold as a trend keyword.
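The trend keyword procedure can be sketched in the same way, restricted to articles inside a period. The combining formula, fourth × (ln(fifth / sixth) + 1), is an assumption (this passage does not give it), and the `(update time, keyword list)` article representation and function name are hypothetical.

```python
import math
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical article representation: (content update time, keywords in the body).
Article = Tuple[datetime, List[str]]


def trend_keywords(
    articles: List[Article],
    feature_keywords: List[str],
    start: datetime,
    end: datetime,
    threshold: float,
) -> Dict[str, float]:
    """Score feature keywords over the articles updated within [start, end]."""
    in_period = [keywords for updated, keywords in articles if start <= updated <= end]
    fifth = len(in_period)  # total number of articles in the period
    result: Dict[str, float] = {}
    for keyword in feature_keywords:
        fourth = sum(k.count(keyword) for k in in_period)   # occurrences in the period
        sixth = sum(1 for k in in_period if keyword in k)   # articles containing it
        if sixth == 0:
            continue
        score = fourth * (math.log(fifth / sixth) + 1.0)    # assumed formula
        if score >= threshold:
            result[keyword] = score
    return result
```

Articles outside the period contribute nothing, so a keyword that was frequent only in older posts is not reported as a trend.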

Further, this information extraction apparatus manages articles created by a user or by a community composed of a plurality of users and extracts predetermined information from the articles. The apparatus comprises: article storage means for storing the articles created by the user or community; feature keyword extraction means for acquiring articles created within a predetermined period from the article storage means and extracting keywords included in the acquired articles as feature keywords; feature quantity calculation means for calculating the feature quantity of each feature keyword for the user or community on the basis of the feature keywords and the articles created by the user or community; and profile keyword extraction means for extracting each feature keyword whose feature quantity is equal to or greater than a threshold as a profile keyword indicating a feature of the user or community.

According to this information extraction program, articles (including blogs and the like) created within a predetermined period are acquired from the storage device, and keywords included in the acquired articles are extracted as feature keywords. The feature quantity (first TF/IDF value) of each feature keyword for a target user, or for a community composed of a plurality of users, is then calculated on the basis of the feature keywords and the articles created by that user or community, and each feature keyword whose calculated feature quantity is equal to or greater than a threshold is extracted as a profile keyword indicating the feature (profile) of the user or community. Information useful to users can therefore be provided to them.

Further, according to this information extraction program, the articles created by the user or community are extracted from the storage device, and three values are counted: a first value indicating the number of occurrences of the feature keyword in the extracted articles, a second value indicating the total number of articles created by the user or community, and a third value indicating the number of those articles that contain the feature keyword. Because the feature quantity is calculated on the basis of the first, second, and third values, profile keywords that better represent the user's features can be extracted efficiently.

Further, according to this information extraction program, a similarity is calculated on the basis of a reference profile keyword group, which associates each profile keyword indicating a feature of a reference user or community with the number of times that profile keyword appears in articles, and another profile keyword group, which contains each profile keyword indicating a feature of another user or community together with the number of times that profile keyword appears in articles. Because other users or communities similar to the reference user or community are extracted on the basis of this similarity, users or communities similar to the reference can be extracted more accurately.

Further, according to this information extraction program, the articles within a predetermined period are extracted from the storage device, and three values are counted: a fourth value indicating the number of occurrences of the feature keyword in the extracted articles, a fifth value indicating the total number of articles within the predetermined period, and a sixth value indicating the number of articles within the predetermined period that contain the feature keyword. A second feature quantity of each feature keyword is calculated on the basis of the fourth, fifth, and sixth values, and each feature keyword whose second feature quantity is equal to or greater than a threshold is extracted as a trend keyword, so trend keywords can be extracted efficiently.

Preferred embodiments of the information extraction program and the information extraction apparatus according to the present invention are described in detail below with reference to the accompanying drawings.

First, the features of the information sharing system according to the present embodiment are described. In this system, the information management server acquires articles (including blogs and the like) created within a predetermined period from the storage device and extracts keywords included in the acquired articles as feature keywords. The information management server then calculates the feature quantity (first TF/IDF value) of each feature keyword for a target user, or for a community composed of a plurality of users, on the basis of the feature keywords and the articles created by that user or community, and extracts each feature keyword whose calculated feature quantity is equal to or greater than a threshold as a profile keyword indicating the feature (profile) of the user or community.

In this way, in the information sharing system according to the present embodiment, the information management server extracts feature keywords and then extracts profile keywords on the basis of the extracted feature keywords and the articles created by the target user or community, so information useful to users can be provided to them.

Further, in the information sharing system according to the present embodiment, a similarity is calculated on the basis of the profile information of each user (including each community), and other users and communities similar to a given user (similar profiles) are extracted on the basis of the calculated similarity, so information useful to users can be provided to them.

Further, in the information sharing system according to the present embodiment, articles created within a predetermined period are extracted, a feature quantity (second TF/IDF value) is calculated on the basis of the extracted articles and the feature keywords, and each keyword whose calculated feature quantity is equal to or greater than a threshold is extracted as a trend keyword, so information useful to users can be provided to them.

Next, the configuration of the information sharing system according to the present embodiment is described. FIG. 1 is a block diagram showing the configuration of the information sharing system according to the present embodiment. As shown in the figure, the information sharing system includes user terminals 10 to 30 used by users and an information management server 100 that manages various kinds of information. The user terminals 10 to 30 and the information management server 100 are connected to one another via a network 50. Although only the user terminals 10 to 30 are shown here for convenience of explanation, the information sharing system may include other user terminals.

The user terminals 10 to 30 are devices that perform data communication with the information management server 100 and output information on articles created by users to the information management server 100. The user terminals 10 to 30 also acquire, from the information management server 100, information for displaying a screen that includes information on articles created by users or communities together with the above-described profile keywords, trend keywords, similar profiles, and the like, and display that screen on a monitor.

FIG. 2 is a diagram showing an example of a screen image displayed on the monitor of the user terminals 10 to 30. As shown in the figure, this screen image displays the profile keywords of Taro Fuji, the profiles similar to that of Taro Fuji, and the trend keywords. By referring to this screen, a user can efficiently obtain various kinds of useful information without running keyword searches personally, and no longer overlooks important information.

The information management server 100 is a device that acquires and manages article information from the user terminal 10, extracts the above-described feature keywords, profile keywords, similar profiles, and trend keywords, and provides the user terminals 10 to 30 with the information to be displayed on the screen. FIG. 3 is a functional block diagram showing the configuration of the information management server 100 according to the present embodiment. As shown in the figure, the information management server 100 includes an input unit 110, an output unit 120, a communication control IF unit 130, an input/output control IF unit 140, a storage unit 150, and a control unit 160.

Of these, the input unit 110 is means for inputting various kinds of information and is composed of a keyboard, a mouse, a microphone, and the like. For example, it accepts and inputs update information for the tables 150a to 150l stored in the storage unit 150 described later. The monitor described later (output unit 120) also realizes a pointing device function in cooperation with the mouse.

The output unit 120 is output means for outputting various kinds of information and is composed of a monitor (or a display or touch panel), a speaker, and the like. For example, it outputs the information in the tables 150a to 150l stored in the storage unit 150 described later.

The communication control IF unit 130 is means for controlling data communication, mainly with the user terminals 10 to 30 via the network 50. The input/output control IF unit 140 is means for controlling the input and output of data by the input unit 110, the output unit 120, the communication control IF unit 130, the storage unit 150, and the control unit 160.

The storage unit 150 is storage means that stores the data and programs required for the various processes performed by the control unit 160. As the elements most closely related to the present invention, the storage unit 150 holds, as shown in FIG. 3, an article management table 150a, an article information table 150b, an execution management table 150c, a synonym/compound word table 150d, a keyword table 150e, an article keyword table 150f, an unnecessary word table 150g, an article space management table 150h, a profile information table 150i, a profile keyword table 150j, a trend keyword table 150k, and a similar profile table 150l.

The tables 150a to 150l stored in the storage unit 150 are described in order below. The article management table 150a is a table for managing articles and information on the users or communities that created them. FIG. 4 is a diagram showing an example of the data structure of the article management table 150a.

As shown in FIG. 4, the article management table 150a has an article ID (Identification) that identifies an article, a poster (a user or a community), a profile ID that identifies the poster, a posting destination, a content update date and time indicating when the article body was updated, the article body, a total comment count indicating the number of comments on the article, and a deletion flag. The deletion flag indicates whether the information in the corresponding row is valid: "on" indicates that the information in the row is valid, and "off" indicates that it is invalid. The deletion flags of the tables described below work the same way, so their description is omitted hereafter.
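One way to picture a row of the article management table 150a is the record type below. It is only a sketch: the field names and types are hypothetical, chosen to mirror the columns listed above, and the deletion flag follows the convention stated in the text, where "on" means the row is valid.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ArticleRecord:
    """Hypothetical row of the article management table 150a."""
    article_id: str
    poster: str                 # user or community name
    profile_id: str
    posting_destination: str
    content_updated_at: datetime
    body: str
    comment_count: int
    delete_flag: bool           # True ("on") = row is valid, per the description


def valid_rows(rows: List[ArticleRecord]) -> List[ArticleRecord]:
    """Keep only the rows whose deletion flag marks them as valid."""
    return [row for row in rows if row.delete_flag]
```

Because the flag's polarity is the reverse of what the name suggests, isolating the check in one helper such as `valid_rows` keeps the convention from being misapplied elsewhere.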

The article information table 150b is a table for managing, for each article, whether feature keywords have been extracted. FIG. 5 is a diagram showing an example of the data structure of the article information table 150b. As shown in the figure, the article information table 150b has an article ID, a poster (a user or a community), a posting destination, a content update date and time, a keyword extraction flag, and a deletion flag. The keyword extraction flag indicates whether feature keywords have already been extracted from the article identified by the article ID of the corresponding row: "on" indicates that feature keywords have been extracted from the article, and "off" indicates that they have not.

The execution management table 150c is a table that stores the execution start date and time of the process of extracting feature keywords from articles (feature keyword extraction process) and of the process of extracting the profile keywords of users and communities (profile keyword extraction process). FIG. 6 is a diagram showing an example of the data structure of the execution management table 150c. As shown in the figure, the execution management table 150c has a process type and an execution start date and time.

The synonym/compound word table 150d is a table that manages various synonyms and compound words and stores each replacement-source keyword in association with its replacement keyword. FIG. 7 is a diagram showing an example of the data structure of the synonym/compound word table 150d. As shown in the figure, the synonym/compound word table 150d has a replacement-source keyword, a replacement keyword, and an update date and time. According to the synonym/compound word table 150d shown in FIG. 7, for example, the keyword "コンピューター" extracted from an article is replaced with "コンピュータ" (two spelling variants of "computer").
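The replacement step can be illustrated with a simple lookup. The dictionary below stands in for the synonym/compound word table 150d and contains only the one pair mentioned in the text; the function name and the second sample keyword are hypothetical.

```python
from typing import Dict, List

# Stand-in for table 150d: replacement-source keyword -> replacement keyword.
SYNONYMS: Dict[str, str] = {"コンピューター": "コンピュータ"}


def replace_synonyms(keywords: List[str], table: Dict[str, str]) -> List[str]:
    """Replace each keyword that has an entry in the table; leave others as-is."""
    return [table.get(keyword, keyword) for keyword in keywords]
```

Applying the replacement before counting means both spellings contribute to the same keyword's totals.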

The keyword table 150e is a table that stores various kinds of information on the keywords (feature keywords) contained in articles. FIG. 8 is a diagram showing an example of the data structure of the keyword table 150e. As shown in the figure, the keyword table 150e has a keyword ID, a keyword (feature keyword), a normalized keyword, a total keyword appearance count, a containing-content count, and an unnecessary word flag.

Of these, the normalized keyword is the keyword after normalization. For example, "ａｂｃ本店" is normalized to "ＡＢＣ本店". The total keyword appearance count is the total number of times a given keyword appears in the bodies of all articles in the article management table 150a. In the example shown in the first row of FIG. 8, the keyword "コンピュータ" is recorded as appearing 10000 times in article bodies.
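The text gives only one normalization example, so the rule itself is not stated. The sketch below assumes that normalization simply upper-cases letters, which reproduces that example; the function name is hypothetical, and a real implementation might apply further rules (for instance width folding) that the source does not describe.

```python
def normalize_keyword(keyword: str) -> str:
    """Assumed normalization: upper-case any cased letters, keep everything else."""
    return keyword.upper()
```

Python's `str.upper` applies Unicode case mappings, so full-width Latin letters are upper-cased as well while kanji are left unchanged.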

The containing-content count is the number of articles in the article management table 150a that contain a given keyword. In the example shown in the first row of FIG. 8, the number of articles containing the keyword "コンピュータ" is recorded as 50.

The unnecessary word flag indicates whether the keyword in the corresponding row is eligible to be a feature keyword. When the flag is "on", the keyword in the row is not a candidate for a feature keyword; when it is "off", the keyword in the row is a candidate for a feature keyword.

The article keyword table 150f is a table that stores the keyword ID of each feature keyword in association with the articles that contain that feature keyword. FIG. 9 is a diagram showing an example of the data structure of the article keyword table 150f. As shown in the figure, the article keyword table 150f has an article ID, a keyword ID, a keyword appearance count, and a deletion flag.

The keyword identified by a keyword ID in FIG. 9 is a feature keyword. The keyword appearance count is the number of times the feature keyword identified by the keyword ID appears in the article identified by the article ID.
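To show how per-article rows relate to the per-keyword totals held in the keyword table 150e, the sketch below aggregates hypothetical article keyword rows into a total appearance count and a containing-content count. It assumes one row per (article ID, keyword ID) pair; the tuple layout, the sample IDs, and the function name are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row: (article ID, keyword ID, keyword appearance count).
ArticleKeywordRow = Tuple[str, str, int]


def aggregate_keywords(rows: List[ArticleKeywordRow]) -> Dict[str, Tuple[int, int]]:
    """Map keyword ID -> (total appearance count, number of containing articles)."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for _article_id, keyword_id, count in rows:
        totals[keyword_id][0] += count  # sum of appearances over all articles
        totals[keyword_id][1] += 1      # one more article contains the keyword
    return {keyword_id: (t[0], t[1]) for keyword_id, t in totals.items()}
```

For rows `("a0001", "w0001", 3)`, `("a0002", "w0001", 2)`, and `("a0001", "w0002", 1)`, keyword w0001 has 5 appearances across 2 articles and w0002 has 1 appearance in 1 article.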

The unnecessary word table 150g is a table that stores keywords that are not candidates for feature keywords (unnecessary words). FIG. 10 is a diagram showing an example of the data structure of the unnecessary word table 150g. As shown in the figure, the unnecessary word table 150g stores various unnecessary words.
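The relationship between the unnecessary word table and feature keyword candidates can be sketched as a set-membership filter. The sample words and function names below are hypothetical; the flag polarity follows the description of the unnecessary word flag, where "on" (True here) means the keyword is not a candidate.

```python
from typing import Dict, List, Set


def unnecessary_word_flags(keywords: List[str], unnecessary: Set[str]) -> Dict[str, bool]:
    """Map each keyword to its unnecessary word flag (True = not a candidate)."""
    return {keyword: keyword in unnecessary for keyword in keywords}


def feature_keyword_candidates(keywords: List[str], unnecessary: Set[str]) -> List[str]:
    """Keep only the keywords that are not listed as unnecessary words."""
    return [keyword for keyword in keywords if keyword not in unnecessary]
```

Holding the unnecessary words in a set keeps each lookup constant-time even when the table grows.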

The article space management table 150h is a table for managing the personal information and the like of posters (users and communities). FIG. 11 is a diagram showing an example of the data structure of the article space management table 150h. As shown in the figure, the article space management table 150h has a profile ID that identifies a poster, article space identification information, a name, an update date and time indicating when the corresponding row was updated, and a deletion flag. The article space identification information indicates whether the poster corresponding to the profile ID is an individual (a single user) or a community.

The profile information table 150i is a table that stores various kinds of information corresponding to each profile ID. FIG. 12 is a diagram showing an example of the data structure of the profile information table 150i. As shown in the figure, the profile information table 150i has a profile ID, article space identification information, a keyword extraction flag, a similarity calculation flag, a total article count indicating the number of articles created by the poster, a total comment count for the articles created by the poster, and a deletion flag.

図１２におけるキーワード抽出フラグは、該当ラインの投稿者に対応するプロファイルキーワードを抽出済みか否かを示す情報である。キーワード抽出フラグが「オン」の場合には、プロファイルキーワードを抽出済みであることを示し、「オフ」の場合には、プロファイルキーワードを抽出していないことを示す。 The keyword extraction flag in FIG. 12 is information indicating whether or not the profile keyword corresponding to the poster on the corresponding line has been extracted. When the keyword extraction flag is “on”, it indicates that the profile keyword has been extracted, and when it is “off”, it indicates that the profile keyword has not been extracted.

類似度算出フラグは、該当ラインの投稿者に対応する類似度（類似プロファイルを抽出する場合に利用される値であり、詳細は後述する）を算出したか否かを示す情報である。類似度算出フラグが「オン」の場合には、類似度を算出済みであることを示し、「オフ」の場合には、類似度を算出していないことを示す。 The similarity calculation flag is information indicating whether or not a similarity (a value used when a similar profile is extracted and details will be described later) corresponding to the poster of the corresponding line has been calculated. When the similarity calculation flag is “on”, it indicates that the similarity has been calculated, and when it is “off”, it indicates that the similarity is not calculated.

プロファイルキーワードテーブル１５０ｊは、投稿者（利用者、コミュニティを含む）のプロファイルキーワードを管理するテーブルである。図１３は、プロファイルキーワードテーブル１５０ｊのデータ構造の一例を示す図である。同図に示すように、このプロファイルキーワードテーブル１５０ｊは、プロファイルＩＤ、記事空間識別情報、キーワードＩＤ、第１TF/IDF値、キーワード出現総数、含有コンテンツ数、不要語フラグ、削除フラグを有する。 The profile keyword table 150j is a table for managing profile keywords of contributors (including users and communities). FIG. 13 is a diagram illustrating an example of the data structure of the profile keyword table 150j. As shown in the figure, the profile keyword table 150j has a profile ID, article space identification information, keyword ID, first TF / IDF value, total number of keyword appearances, number of contained contents, unnecessary word flag, and deletion flag.

図１３に含まれるキーワードＩＤは、投稿者に対するプロファイルキーワードのキーワードＩＤを示す。例えば、プロファイルＩＤ「ｐ０００１」によって識別される投稿者のプロファイルキーワードは、キーワードＩＤ「ｗ０００１」によって識別されるキーワードおよび「ｗ０００２」によって識別されるキーワードとなる。 The keyword ID included in FIG. 13 indicates the keyword ID of the profile keyword for the poster. For example, the profile keyword of the poster identified by the profile ID “p0001” is the keyword identified by the keyword ID “w0001” and the keyword identified by “w0002”.

第１TF/IDF値は、投稿者（利用者、コミュニティを含む）に対する特徴キーワードの特徴量を示し、第１TF/IDF値の値が大きいほど、より該当投稿者の特徴を良く表す特徴キーワードとなる。第１TF/IDF値の算出方法は後述する。なお、キーワード出現総数、含有コンテンツ数は上記と同様である。 The first TF/IDF value indicates the feature amount of a feature keyword for the poster (including users and communities); the larger the first TF/IDF value, the better the feature keyword represents the characteristics of that poster. A method for calculating the first TF/IDF value will be described later. The total number of keyword appearances and the number of contained contents are the same as described above.

トレンドキーワードテーブル１５０ｋは、トレンドキーワードを管理するテーブルである。図１４は、トレンドキーワードテーブル１５０ｋのデータ構造の一例を示す図である。同図に示すように、このトレンドキーワードテーブル１５０ｋは、キーワードＩＤ、第２TF/IDF値、キーワード出現総数、含有コンテンツ数を有する。 The trend keyword table 150k is a table for managing trend keywords. FIG. 14 is a diagram illustrating an example of a data structure of the trend keyword table 150k. As shown in the figure, the trend keyword table 150k has a keyword ID, a second TF / IDF value, a total number of keyword appearances, and the number of contained contents.

図１４に含まれるキーワードＩＤは、トレンドキーワードとなるキーワードのキーワードＩＤを示す。例えば、キーワードＩＤ「ｗ０００１」によって識別されるキーワードは、トレンドキーワードである。第２TF/IDF値は、対応するトレンドキーワードの特徴量を示し、第２TF/IDF値の値が大きいほど、より流行しているキーワードとなる。なお、キーワード出現総数、含有コンテンツ数は上記と同様である。 The keyword ID included in FIG. 14 indicates a keyword ID of a keyword that becomes a trend keyword. For example, the keyword identified by the keyword ID “w0001” is a trend keyword. The second TF / IDF value indicates the feature amount of the corresponding trend keyword. The larger the second TF / IDF value, the more popular the keyword. The total number of keyword appearances and the number of contained contents are the same as described above.

類似プロファイルテーブル１５０ｌは、投稿者（利用者、コミュニティを含む）に類似する利用者およびコミュニティ（類似プロファイル）を管理するテーブルである。図１５は、類似プロファイルテーブル１５０ｌのデータ構造の一例を示す図である。同図に示すように、この類似プロファイルテーブル１５０ｌは、プロファイルＩＤ、情報の順序を表すシーケンス番号、類似プロファイルＩＤ、類似度を有する。 The similar profile table 150l is a table for managing users and communities (similar profiles) similar to posters (including users and communities). FIG. 15 is a diagram illustrating an example of a data structure of the similar profile table 150l. As shown in the figure, the similar profile table 150l has a profile ID, a sequence number indicating the order of information, a similar profile ID, and a similarity.

このうち類似プロファイルＩＤは、プロファイルＩＤによって識別される投稿者に類似する利用者およびコミュニティのプロファイルＩＤを示す。例えば、プロファイルＩＤ「ｐ０００１」によって識別される投稿者の類似プロファイルは、プロファイルＩＤ（類似プロファイルＩＤ）「ｐ００２２」，「ｐ００８７」によって識別される投稿者（利用者あるいはコミュニティ）となる。類似度は、プロファイルＩＤの投稿者に対して類似プロファイルＩＤの投稿者がどれほど類似しているか否かを示す情報であり、数値が大きいほどより類似していることになる。 Of these, the similar profile ID indicates the profile ID of the user and community similar to the poster identified by the profile ID. For example, the similar profile of the poster identified by the profile ID “p0001” is the poster (user or community) identified by the profile ID (similar profile ID) “p0022”, “p0087”. The degree of similarity is information indicating how similar a poster with a similar profile ID is to a poster with a profile ID, and the greater the numerical value, the more similar.

図３の説明に戻ると、制御部１６０は、各種の処理手順を規定したプログラムや制御データを格納するための内部メモリを有し、これらによって種々の処理を実行する制御手段であり、特に本発明に密接に関連するものとしては、図３に示すように、情報管理部１６０ａと、特徴キーワード抽出処理部１６０ｂと、プロファイルキーワード抽出処理部１６０ｃと、類似度算出処理部１６０ｄと、トレンドキーワード抽出処理部１６０ｅと、サービス提供処理部１６０ｆとを備える。 Returning to the description of FIG. 3, the control unit 160 is a control means that has an internal memory for storing programs that define various processing procedures and control data, and executes various processes using them. As the components particularly closely related to the present invention, it includes, as shown in FIG. 3, an information management unit 160a, a feature keyword extraction processing unit 160b, a profile keyword extraction processing unit 160c, a similarity calculation processing unit 160d, a trend keyword extraction processing unit 160e, and a service provision processing unit 160f.

情報管理部１６０ａは、記憶部１５０に記憶された各テーブル１５０ａ〜１５０ｌを管理する手段である。例えば、情報管理部１６０ａは、各テーブル１５０ａ〜１５０ｌに対する更新データを取得した場合に、取得した更新データによって各テーブル１５０ａ〜１５０ｌに記憶された情報を更新する。 The information management unit 160 a is a unit that manages the tables 150 a to 150 l stored in the storage unit 150. For example, when the update data for the tables 150a to 150l is acquired, the information management unit 160a updates the information stored in the tables 150a to 150l with the acquired update data.

特徴キーワード抽出処理部１６０ｂは、記憶部１５０に記憶された各テーブル１５０ａ〜１５０ｌの情報を用いて特徴キーワードを抽出する手段である。図１６および図１７は、特徴キーワード抽出処理部１６０ｂの具体的な処理を示す図である。 The feature keyword extraction processing unit 160 b is a means for extracting feature keywords using information in the tables 150 a to 150 l stored in the storage unit 150. 16 and 17 are diagrams illustrating specific processing of the feature keyword extraction processing unit 160b.

まず、図１６から説明すると、特徴キーワード抽出処理部１６０ｂは、実行管理テーブル１５０ｃから処理種別「特徴キーワード抽出処理」に対応する実行開始時間＜Ａ＞を取得し（図１６の（１）参照）、記事管理テーブル１５０ａからコンテンツ更新日時が＜Ａ＞よりも新しい各種記事情報（記事ＩＤ、投稿者、投稿先、削除フラグ、コンテンツ更新日時）を取得する（図１６の（２）参照）。 First, referring to FIG. 16, the feature keyword extraction processing unit 160b acquires the execution start time <A> corresponding to the process type “feature keyword extraction processing” from the execution management table 150c (see (1) in FIG. 16). Then, various pieces of article information (article ID, contributor, posting destination, deletion flag, content update date and time) whose content update date is newer than <A> are acquired from the article management table 150a (see (2) in FIG. 16).

そして、特徴キーワード抽出処理部１６０ｂは、図１６の（２）で取得した各種記事情報を記事情報テーブル１５０ｂに登録し（図１６の（３）参照）、今回の処理における実行開始日時を、実行管理テーブル１５０ｃに登録する（図１６の（４）参照）。 Then, the feature keyword extraction processing unit 160b registers the various article information acquired in (2) of FIG. 16 in the article information table 150b (see (3) in FIG. 16), and registers the execution start date and time of the current process in the execution management table 150c (see (4) in FIG. 16).

その後、特徴キーワード抽出処理部１６０ｂは、同義語・結合語テーブル１５０ｄの更新日時と、記事情報テーブル１５０ｂのコンテンツ更新日時とを比較して、コンテンツ更新日時が同義語・結合語テーブル１５０ｄの更新日時以降となる記事ＩＤのキーワード抽出フラグを「オフ」に設定する（図示略）。 Thereafter, the feature keyword extraction processing unit 160b compares the update date and time of the synonym/joint word table 150d with the content update date and time in the article information table 150b, and sets to “off” the keyword extraction flag of each article ID whose content update date and time is on or after the update date and time of the synonym/joint word table 150d (not shown).
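The incremental acquisition in steps (1), (2), and (4) of FIG. 16 can be pictured with a minimal sketch. The dictionary-based tables, the field name `content_updated`, and the function names below are hypothetical stand-ins introduced only for illustration; the source does not specify how the tables are implemented.

```python
from datetime import datetime


def select_updated_articles(articles, last_run):
    """Return the articles whose content update time is newer than last_run."""
    return [a for a in articles if a["content_updated"] > last_run]


def run_incremental(articles, execution_log, process_type, now):
    """Fetch articles updated since the previous run, then record this run's start time.

    `execution_log` plays the role of the execution management table:
    it maps a process type to the start time of its most recent run.
    """
    last_run = execution_log.get(process_type, datetime.min)  # step (1)
    updated = select_updated_articles(articles, last_run)      # step (2)
    execution_log[process_type] = now                          # step (4)
    return updated
```

Recording the start time per process type lets the feature keyword and profile keyword extraction processes keep independent checkpoints.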

続いて、図１７の説明に移ると、特徴キーワード抽出処理部１６０ｂは、記事情報テーブル１５０ｂからキーワード抽出フラグが「オフ」の記事ＩＤ一覧を取得し（図１７の（１）参照）、図１７の（１）で取得した各記事ＩＤに対応する記事本文を取得する（図１７の（２）参照）。 Subsequently, turning to FIG. 17, the feature keyword extraction processing unit 160b acquires from the article information table 150b a list of article IDs whose keyword extraction flag is “off” (see (1) in FIG. 17), and acquires the article text corresponding to each article ID acquired in (1) of FIG. 17 (see (2) in FIG. 17).

特徴キーワード抽出処理部１６０ｂは、記事本文からキーワードを抽出する（図１７の（３））。なお、記事本文からキーワードを抽出する場合には、周知技術である形態素解析等を実行すればよい。 The feature keyword extraction processing unit 160b extracts keywords from the article text ((3) in FIG. 17). In addition, when extracting a keyword from an article text, a morphological analysis or the like that is a well-known technique may be executed.

続いて、特徴キーワード抽出処理部１６０ｂは、同義語・結合語テーブル１５０ｄから、置換元キーワード・置換キーワードの一覧を取得し、取得した一覧と図１７の（３）で抽出したキーワードとを比較することにより、各キーワードを置換する（図１７の（４）参照）。 Subsequently, the feature keyword extraction processing unit 160b acquires a list of replacement source keywords / replacement keywords from the synonym / joint word table 150d, and compares the acquired list with the keywords extracted in (3) of FIG. Thus, each keyword is replaced (see (4) in FIG. 17).

そして、特徴キーワード抽出処理部１６０ｂは、キーワード（置換したキーワード）に対応する各種情報（正規済キーワード、品詞タグ、キーワード出現総数）を作成する（図１７の（５）参照）。なお、正規済キーワードは、当該正規キーワードとキーワードとを対応付けたテーブル（図示略）を特徴キーワード抽出処理部１６０ｂが保持しており、かかるテーブルを利用して、特徴キーワード抽出処理部１６０ｂは、キーワードを正規化する。また、特徴キーワード抽出処理部１６０ｂがキーワードと、記事管理テーブル１５０ａに記憶された記事本文とを比較することによりキーワード出現総数を計数するものとする。 Then, the feature keyword extraction processing unit 160b creates various information (normalized keyword, part-of-speech tag, total number of keyword appearances) corresponding to each keyword (replaced keyword) (see (5) in FIG. 17). For the normalized keyword, the feature keyword extraction processing unit 160b holds a table (not shown) that associates normalized keywords with keywords, and uses this table to normalize each keyword. The feature keyword extraction processing unit 160b counts the total number of keyword appearances by comparing the keyword with the article text stored in the article management table 150a.

続いて、特徴キーワード抽出処理部１６０ｂは、図１７の（５）で作成した各種情報をキーワードテーブル１５０ｅに登録する（図１７の（６）参照）。キーワードテーブル１５０ｅに登録される各キーワードが、特徴キーワードとなる。なお、特徴キーワード抽出処理部１６０ｂは、キーワードと記事管理テーブル１５０ａに記憶された記事本文とを比較することにより含有コンテンツ数を計数するものとする。 Subsequently, the feature keyword extraction processing unit 160b registers various information created in (5) of FIG. 17 in the keyword table 150e (see (6) of FIG. 17). Each keyword registered in the keyword table 150e is a feature keyword. Note that the feature keyword extraction processing unit 160b counts the number of contained contents by comparing the keyword and the article text stored in the article management table 150a.

また、特徴キーワード抽出処理部１６０ｂは、記事ＩＤと、当該記事ＩＤから抽出したキーワード（特徴キーワード）とを対応づけた各種情報（記事ＩＤ、キーワードＩＤ、キーワード出現総数、削除フラグ）を記事キーワードテーブル１５０ｆに登録し（図１７の（７）参照）、処理が終了した記事情報テーブルのキーワード抽出フラグを「オン」に設定する（図１７の（８）参照）。 The feature keyword extraction processing unit 160b also registers, in the article keyword table 150f, various information (article ID, keyword ID, total number of keyword appearances, deletion flag) that associates each article ID with the keywords (feature keywords) extracted from that article (see (7) in FIG. 17), and sets to “on” the keyword extraction flag in the article information table for each article whose processing has finished (see (8) in FIG. 17).

その後、特徴キーワード抽出処理部１６０ｂは、不要語テーブル１５０ｇに記憶された各不要語と、キーワードテーブル１５０ｅの各キーワード（あるいは正規済キーワード）とを比較し、不要語と一致するキーワードのラインに対応する不要語フラグを「オン」に設定する（図示略）。 Thereafter, the feature keyword extraction processing unit 160b compares each unnecessary word stored in the unnecessary word table 150g with each keyword (or regularized keyword) in the keyword table 150e, and corresponds to a keyword line that matches the unnecessary word. The unnecessary word flag to be set is set to “ON” (not shown).

図３の説明に戻ると、プロファイルキーワード抽出処理部１６０ｃは、投稿者（利用者、コミュニティ含む）によって作成された記事と特徴キーワードとを基にして、投稿者の特徴を示すプロファイルキーワードを抽出する手段である。図１８および図１９は、プロファイルキーワード抽出処理部１６０ｃの具体的な処理を示す図である。 Returning to the description of FIG. 3, the profile keyword extraction processing unit 160c is a means for extracting profile keywords indicating the characteristics of a poster, based on the articles created by the poster (including users and communities) and the feature keywords. FIGS. 18 and 19 are diagrams illustrating specific processing of the profile keyword extraction processing unit 160c.

まず、図１８から説明すると、プロファイルキーワード抽出処理部１６０ｃは、実行管理テーブル１５０ｃから処理種別「プロファイルキーワード抽出処理」に対応する実行開始時間＜Ａ＞を取得し（図１８の（１）参照）、記事空間管理テーブル１５０ｈから更新日時が＜Ａ＞よりも新しい各種プロファイル情報（プロファイルＩＤ、記事空間識別情報、削除フラグ）を取得する（図１８の（２）参照）。 First, referring to FIG. 18, the profile keyword extraction processing unit 160c acquires the execution start time <A> corresponding to the processing type “profile keyword extraction processing” from the execution management table 150c (see (1) in FIG. 18). Then, various profile information (profile ID, article space identification information, deletion flag) whose update date is newer than <A> is acquired from the article space management table 150h (see (2) in FIG. 18).

そして、プロファイルキーワード抽出処理部１６０ｃは、図１８の（２）で取得した各種プロファイル情報をプロファイル情報テーブル１５０ｉに登録し（図１８の（３）参照）、今回の処理における実行開始日時を実行管理テーブル１５０ｃに登録する（図１８の（４）参照）。 Then, the profile keyword extraction processing unit 160c registers the various profile information acquired in (2) of FIG. 18 in the profile information table 150i (see (3) in FIG. 18), and registers the execution start date and time of the current process in the execution management table 150c (see (4) in FIG. 18).

また、プロファイルキーワード抽出処理部１６０ｃは、実行管理テーブル１５０ｃの「プロファイルキーワード抽出処理」に対応する実行開始時間＜Ａ＞以降に更新されたプロファイルＩＤのキーワード抽出フラグを「オフ」に設定する（図示略）。 The profile keyword extraction processing unit 160c also sets to “off” the keyword extraction flag of each profile ID updated on or after the execution start time <A> corresponding to “profile keyword extraction processing” in the execution management table 150c (not shown).

続いて、図１９の説明に移ると、プロファイルキーワード抽出処理部１６０ｃは、プロファイル情報テーブル１５０ｉからキーワード抽出フラグが「オフ」のプロファイルＩＤ一覧を取得し（図１９の（１）参照）、記事管理テーブル１５０ａからプロファイルＩＤに対応する記事総数（プロファイルＩＤによって識別される投稿者が作成した記事の総数）および被コメント総数を取得し（図１９の（２）参照）、図１９の（２）で取得した各種情報、すなわち、プロファイルＩＤと記事総数と被コメント総数とを対応付けてプロファイル情報テーブル１５０ｉに登録する（図１９の（３）参照）。 Subsequently, turning to FIG. 19, the profile keyword extraction processing unit 160c acquires from the profile information table 150i a list of profile IDs whose keyword extraction flag is “off” (see (1) in FIG. 19), acquires from the article management table 150a the total number of articles corresponding to each profile ID (the total number of articles created by the poster identified by the profile ID) and the total number of comments received (see (2) in FIG. 19), and registers the information acquired in (2) of FIG. 19, that is, the profile ID, the total number of articles, and the total number of comments received, in association with one another in the profile information table 150i (see (3) in FIG. 19).

そして、プロファイルキーワード抽出処理部１６０ｃは、図１９の（１）で取得したプロファイルＩＤに紐付く記事（記事ＩＤ）のキーワードを記事キーワードテーブル１５０ｆと記事情報テーブル１５０ｂから取得する（図１９の（４）参照）。なお、プロファイルＩＤに紐付く記事ＩＤは、例えば、記事管理テーブル１５０ａを参照することにより、判定することができる。 Then, the profile keyword extraction processing unit 160c acquires, from the article keyword table 150f and the article information table 150b, the keywords of the articles (article IDs) linked to the profile IDs acquired in (1) of FIG. 19 (see (4) in FIG. 19). The article IDs linked to a profile ID can be determined, for example, by referring to the article management table 150a.

続いて、プロファイルキーワード抽出処理部１６０ｃは、図１９の（４）で取得した各キーワード（特徴キーワード）の第１TF/IDF値を算出する（図１９の（５）参照）。第１TF/IDF値の具体的な算出式は、
第１TF/IDF値＝（プロファイルＩＤ（投稿者）の記事のキーワードの出現回数＜キーワード出現総数＞）×ｌｏｇ｛（プロファイルＩＤ（投稿者）によって作成された記事の記事総数）／（プロファイルＩＤ（投稿者）によって作成された記事の内でキーワードを含む記事数＜含有コンテンツ数＞）｝
によって表すことができる。なお、上式の各値は、プロファイルキーワード抽出処理部１６０ｃが各テーブル１５０ａ〜１５０ｌを参照して、予め計数しておくものとする。
Subsequently, the profile keyword extraction processing unit 160c calculates the first TF/IDF value of each keyword (feature keyword) acquired in (4) of FIG. 19 (see (5) in FIG. 19). The specific formula for calculating the first TF/IDF value can be expressed as
First TF/IDF value = (number of appearances of the keyword in the articles of the profile ID (poster) <total number of keyword appearances>) × log{(total number of articles created by the profile ID (poster)) / (number of articles created by the profile ID (poster) that contain the keyword <number of contained contents>)}
Each value in the above formula is counted in advance by the profile keyword extraction processing unit 160c with reference to the tables 150a to 150l.
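As an illustration of the first TF/IDF formula, the following is a minimal sketch rather than the implementation described in the source. The function name and argument names are hypothetical, and the natural logarithm is an assumption, since the source does not state the base of the logarithm.

```python
import math


def first_tf_idf(keyword_total, article_total, containing_articles):
    """First TF/IDF value of one keyword for one poster (user or community).

    keyword_total       -- total appearances of the keyword in the poster's articles
    article_total       -- total number of articles created by the poster
    containing_articles -- number of the poster's articles that contain the keyword
    """
    if article_total <= 0 or containing_articles <= 0:
        return 0.0  # assumption: treat an undefined ratio as "no feature"
    # Natural log is assumed here; the source does not specify the base.
    return keyword_total * math.log(article_total / containing_articles)
```

With this formula, a keyword that appears in every article of a poster scores 0 regardless of how often it appears, because log(1) = 0.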

そして、プロファイルキーワード抽出処理部１６０ｃは、プロファイルＩＤにかかる各種情報（プロファイルＩＤ、記事空間識別情報、キーワードＩＤ、第１TF/IDF値、キーワード出現総数、含有コンテンツ数、不要語フラグ、削除フラグ）をプロファイルキーワードテーブル１５０ｊに登録し（図１９の（６）参照）、プロファイルキーワードテーブル１５０ｊに登録したプロファイルＩＤのキーワード抽出フラグ（プロファイル情報テーブル１５０ｉのキーワード抽出フラグ）を「オン」に設定する（図１９の（７）参照）。 Then, the profile keyword extraction processing unit 160c registers various information related to the profile ID (profile ID, article space identification information, keyword ID, first TF/IDF value, total number of keyword appearances, number of contained contents, unnecessary word flag, deletion flag) in the profile keyword table 150j (see (6) in FIG. 19), and sets to “on” the keyword extraction flag (the keyword extraction flag in the profile information table 150i) of each profile ID registered in the profile keyword table 150j (see (7) in FIG. 19).

その後、プロファイルキーワード抽出処理部１６０ｃは、不要語テーブル１５０ｇに記憶された各不要語と、プロファイルキーワードテーブル１５０ｊの各キーワード（キーワードＩＤによって識別されるキーワード）とを比較し、不要語と一致するキーワードのラインに対応する不要語フラグを「オン」に設定する（図示略）。 Thereafter, the profile keyword extraction processing unit 160c compares each unnecessary word stored in the unnecessary word table 150g with each keyword (keyword identified by the keyword ID) in the profile keyword table 150j, and matches the unnecessary word. The unnecessary word flag corresponding to the line is set to “ON” (not shown).

プロファイルキーワードテーブル１５０ｊに記憶された各キーワード（キーワードＩＤによって識別されるキーワード＜特徴キーワード＞）のうち、第１TF/IDF値が閾値以上となるキーワードが、該当プロファイルＩＤに対応するプロファイルキーワードとして抽出されることになる。 Of each keyword (keyword <characteristic keyword> identified by keyword ID) stored in the profile keyword table 150j, a keyword whose first TF / IDF value is equal to or greater than a threshold is extracted as a profile keyword corresponding to the corresponding profile ID. Will be.
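The threshold-based selection described above might be sketched as follows. The function name, the dictionary input, and the optional `limit` parameter are hypothetical; the ordering by descending score is an assumption made for the sketch.

```python
def select_profile_keywords(scores, threshold, limit=None):
    """Return keyword IDs whose first TF/IDF value is at or above the threshold.

    scores -- mapping of keyword ID to first TF/IDF value for one profile ID
    limit  -- optional cap on the number of keywords returned (assumption)
    """
    kept = [(kw, s) for kw, s in scores.items() if s >= threshold]
    # Highest score first; ties broken by keyword ID for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in kept[:limit]]
```

For example, `select_profile_keywords({"w0001": 12.0, "w0002": 3.5}, threshold=5.0)` keeps only `"w0001"`.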

図３の説明に戻ると、類似度算出処理部１６０ｄは、各プロファイルＩＤのプロファイルキーワードを基にして類似度を算出し、算出した類似度に基づいて、類似するプロファイルＩＤを抽出する手段である。図２０は、類似度算出処理部１６０ｄの具体的な処理を示す図である。 Returning to the description of FIG. 3, the similarity calculation processing unit 160 d is a means for calculating a similarity based on the profile keyword of each profile ID and extracting a similar profile ID based on the calculated similarity. . FIG. 20 is a diagram illustrating specific processing of the similarity calculation processing unit 160d.

図２０に示すように、類似度算出処理部１６０ｄは、プロファイル情報テーブル１５０ｉから類似度算出フラグが「オフ」のプロファイルＩＤ一覧を取得し（図２０の（１）参照）、図２０の（１）で取得したプロファイルＩＤの各プロファイルキーワード一覧（プロファイルＩＤ、キーワードＩＤ、第１TF/IDF値、キーワード出現総数、含有コンテンツ数、不要語フラグ）をプロファイルキーワードテーブル１５０ｊから取得する（図２０の（２）参照）。 As illustrated in FIG. 20, the similarity calculation processing unit 160d acquires from the profile information table 150i a list of profile IDs whose similarity calculation flag is “off” (see (1) in FIG. 20), and acquires from the profile keyword table 150j the profile keyword list (profile ID, keyword ID, first TF/IDF value, total number of keyword appearances, number of contained contents, unnecessary word flag) of each profile ID acquired in (1) of FIG. 20 (see (2) in FIG. 20).

続いて、類似度算出処理部１６０ｄは、プロファイル情報テーブル１５０ｉから全てのプロファイルＩＤ一覧を取得し（図２０の（３）参照）、図２０の（３）で取得したプロファイルＩＤの各プロファイルキーワード一覧（プロファイルＩＤ、キーワードＩＤ、第１TF/IDF値、キーワード出現総数、含有コンテンツ数、不要語フラグ）をプロファイルキーワードテーブル１５０ｊから取得する（図２０の（４）参照）。 Subsequently, the similarity calculation processing unit 160d acquires a list of all profile IDs from the profile information table 150i (see (3) in FIG. 20), and acquires from the profile keyword table 150j the profile keyword list (profile ID, keyword ID, first TF/IDF value, total number of keyword appearances, number of contained contents, unnecessary word flag) of each profile ID acquired in (3) of FIG. 20 (see (4) in FIG. 20).

そして、類似度算出処理部１６０ｄは、図２０の（２）、（４）で取得したプロファイルキーワード一覧から各プロファイル間の類似度を算出する（図２０の（５）参照）。ここで、一例として、各プロファイルＩＤによって識別される投稿者Ａさん、Ｂさん、Ｃさんが存在し、Ａさんのプロファイルキーワードが（キーワード１＜１０＞、キーワード２＜５＞、キーワード３＜８＞）、Ｂさんのプロファイルキーワードが（キーワード１＜１８＞、キーワード２＜５６＞、キーワード９＜６＞）、Ｃさんのプロファイルキーワードが（キーワード２＜１２＞、キーワード７＜６＞）であり、かつ、Ａさんに対するＢさん、Ｃさんの類似度を算出する場合について説明する（＜＞内の数値は、キーワード出現総数）。 Then, the similarity calculation processing unit 160d calculates the similarity between the profiles from the profile keyword list acquired in (2) and (4) in FIG. 20 (see (5) in FIG. 20). Here, as an example, there are posters A, B, and C identified by each profile ID, and A's profile keywords are (keyword 1 <10>, keyword 2 <5>, keyword 3 <8). >), B's profile keyword is (Keyword 1 <18>, Keyword 2 <56>, Keyword 9 <6>), and C's profile keyword is (Keyword 2 <12>, Keyword 7 <6>) In addition, a case where the similarity between Mr. B and Mr. C with respect to Mr. A is calculated will be described (the numerical values in <> are the total number of keyword appearances).

まず、類似度算出処理部１６０ｄは、Ａさんを基点として、Ａさん、Ｂさん、Ｃさんのベクトルを算出すると、
ＡさんのベクトルＡ＝（１０、５、８）、
ＢさんのベクトルＢ＝（１８、５６、０）、
ＣさんのベクトルＣ＝（０、１２、０）となる。
First, taking Mr. A as the base point, the similarity calculation processing unit 160d calculates the vectors of Mr. A, Mr. B, and Mr. C as follows:
Mr. A's vector A = (10, 5, 8),
Mr. B's vector B = (18, 56, 0),
Mr. C's vector C = (0, 12, 0).

そして、類似度算出処理部１６０ｄは、ベクトル間の距離を類似度として算出し、算出した類似度が閾値以上である場合には、各ベクトルの投稿者は類似していると判定する。例えば、ベクトルＡとベクトルＢとの距離が閾値以上である場合には、ＡさんとＢさんとは類似していることとなり、Ａさんの類似プロファイルにＢさんのプロファイルＩＤが設定される。この場合、Ｂさんを対象とする類似キーワードは、Ｂさんの持つキーワードのうち最大のキーワード出現総数となるキーワード２となる。 Then, the similarity calculation processing unit 160d calculates the distance between the vectors as the similarity, and determines that the poster of each vector is similar when the calculated similarity is equal to or greater than the threshold. For example, when the distance between the vector A and the vector B is equal to or greater than the threshold value, Mr. A and Mr. B are similar, and the profile ID of Mr. B is set in the similar profile of Mr. A. In this case, the similar keyword targeting Mr. B is the keyword 2 that is the maximum total number of keyword appearances among the keywords of Mr. B.

一方、ベクトルＡとベクトルＣとの距離が閾値未満の場合には、ＡさんとＣさんとは類似していないこととなり、Ａさんの類似プロファイルにＣさんのプロファイルＩＤは設定されない。 On the other hand, when the distance between vector A and vector C is less than the threshold, Mr. A and Mr. C are not similar, and Mr. C's profile ID is not set in Mr. A's similar profile.
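The vector construction in the example above can be sketched as follows. Each vector has one component per keyword of the base poster (Mr. A), filled with the other poster's total number of appearances for that keyword, or 0 when the poster does not have the keyword. The source only states that a "distance between vectors" is used as the similarity and that larger values mean more similar; the cosine similarity used here is an assumption swapped in for illustration, and all function names are hypothetical.

```python
import math


def build_vector(base_keywords, counts):
    """Project a poster's keyword counts onto the base poster's keywords."""
    return [counts.get(k, 0) for k in base_keywords]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Keyword counts from the example (keyword -> total number of appearances).
a = {"keyword1": 10, "keyword2": 5, "keyword3": 8}
b = {"keyword1": 18, "keyword2": 56, "keyword9": 6}
c = {"keyword2": 12, "keyword7": 6}

base = list(a)                 # Mr. A is the base point
vec_a = build_vector(base, a)  # [10, 5, 8]
vec_b = build_vector(base, b)  # [18, 56, 0]
vec_c = build_vector(base, c)  # [0, 12, 0]

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
```

Under this assumed measure, Mr. B scores higher than Mr. C against Mr. A, which is consistent with the example in which only Mr. B is set as a similar profile.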

類似度算出処理部１６０ｄは、算出した類似度（およびそれに付随する情報、すなわち、プロファイルＩＤ、シーケンス番号、類似プロファイルＩＤ、類似度）を類似プロファイルテーブル１５０ｌに設定する（図２０の（６）参照）。ここで、プロファイルＩＤは、基点となった投稿者（上記の例ではＡさん）のプロファイルＩＤであり、類似プロファイルＩＤは、類似度が閾値以上となる投稿者（上記の例ではＢさん）のプロファイルＩＤである。そして、類似度算出処理部１６０ｄは、処理したプロファイルＩＤの類似度算出フラグを「オン」に設定する（図２０の（７）参照）。 The similarity calculation processing unit 160d sets the calculated similarity (together with its accompanying information, that is, profile ID, sequence number, similar profile ID, and similarity) in the similar profile table 150l (see (6) in FIG. 20). Here, the profile ID is the profile ID of the poster used as the base point (Mr. A in the above example), and the similar profile ID is the profile ID of a poster whose similarity is equal to or greater than the threshold (Mr. B in the above example). Then, the similarity calculation processing unit 160d sets the similarity calculation flag of the processed profile ID to “on” (see (7) in FIG. 20).

図３の説明に戻ると、トレンドキーワード抽出処理部１６０ｅは、記憶部１５０に記憶された各テーブルを用いて流行しているキーワード（すなわち、トレンドキーワード）を抽出する処理部である。図２１は、トレンドキーワード抽出処理部の具体的な処理を示す図である。 Returning to the description of FIG. 3, the trend keyword extraction processing unit 160 e is a processing unit that extracts popular keywords (that is, trend keywords) using each table stored in the storage unit 150. FIG. 21 is a diagram illustrating specific processing of the trend keyword extraction processing unit.

図２１に示すように、トレンドキーワード抽出処理部１６０ｅは、記事情報テーブル１５０ｂから登録済みの記事総件数を取得し（図２１の（１）参照）、記事情報テーブル１５０ｂからコンテンツ更新日時が所定範囲内の記事ＩＤを取得する（図２１の（２）参照）。 As shown in FIG. 21, the trend keyword extraction processing unit 160e acquires the total number of registered articles from the article information table 150b (see (1) in FIG. 21), and the content update date / time from the article information table 150b is within a predetermined range. Article ID is acquired (see (2) in FIG. 21).

そして、トレンドキーワード抽出処理部１６０ｅは、図２１の（２）で取得した記事ＩＤに該当する記事のキーワードを記事キーワードテーブル１５０ｆから取得し（図２１の（３）参照）、取得した各キーワードの第２TF/IDF値を算出する（図２１の（４）参照）。 Then, the trend keyword extraction processing unit 160e acquires, from the article keyword table 150f, the keywords of the articles corresponding to the article IDs acquired in (2) of FIG. 21 (see (3) in FIG. 21), and calculates the second TF/IDF value of each acquired keyword (see (4) in FIG. 21).

ここで、第２TF/IDF値の具体的な算出式は、
第２TF/IDF値＝（キーワード出現総数）×ｌｏｇ（記事総数／含有コンテンツ数）
によって表すことができる。なお、上式の各値は、トレンドキーワード抽出処理部１６０ｅが各テーブル１５０ａ〜１５０ｌを参照して、予め計数しておくものとする。
Here, the specific formula for calculating the second TF/IDF value can be expressed as
Second TF/IDF value = (total number of keyword appearances) × log(total number of articles / number of contained contents)
Each value in the above formula is counted in advance by the trend keyword extraction processing unit 160e with reference to the tables 150a to 150l.

トレンドキーワード抽出処理部１６０ｅは、トレンドキーワードに関する各種情報（キーワードＩＤ、第２TF/IDF値、キーワード出現総数、含有コンテンツ数）をトレンドキーワードテーブル１５０ｋに登録する（図２１の（５）参照）。 The trend keyword extraction processing unit 160e registers various information related to the trend keyword (keyword ID, second TF / IDF value, keyword appearance total number, content content count) in the trend keyword table 150k (see (5) in FIG. 21).

トレンドキーワードテーブル１５０ｋに記憶された各キーワード（キーワードＩＤによって識別されるキーワード＜特徴キーワード＞）のうち、第２TF/IDF値が閾値以上となるキーワードが、トレンドキーワードとして抽出されることとなる。 Of each keyword (keyword <characteristic keyword> identified by keyword ID) stored in the trend keyword table 150k, a keyword having a second TF / IDF value equal to or greater than a threshold value is extracted as a trend keyword.
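A minimal sketch of the second TF/IDF calculation over the articles in the target period is given below. The nested-dictionary input and the function name are hypothetical, the natural logarithm is an assumption, and counting the number of contained contents only over the supplied articles is also an assumption, since the source does not state the scope of that count.

```python
import math
from collections import defaultdict


def trend_scores(article_keywords, total_articles):
    """Second TF/IDF value per keyword.

    article_keywords -- {article_id: {keyword: appearances}} for articles whose
                        content update time falls within the target period
    total_articles   -- total number of registered articles
    """
    totals = defaultdict(int)      # total number of keyword appearances
    containing = defaultdict(int)  # number of articles containing the keyword
    for counts in article_keywords.values():
        for keyword, n in counts.items():
            totals[keyword] += n
            containing[keyword] += 1
    return {
        kw: totals[kw] * math.log(total_articles / containing[kw])
        for kw in totals
    }
```

Keywords whose score is at or above a chosen threshold would then be treated as trend keywords.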

図３の説明に戻ると、サービス提供処理部１６０ｆは、利用者端末１０〜３０からのサービス要求に応答して各種のサービスを提供する手段であり、特に、本発明に密接に関連するものとしては、所定の投稿者（利用者、コミュニティを含む）を指定された場合に、指定された投稿者に対応するプロファイルキーワード、類似プロファイルおよびトレンドキーワードを出力する（例えば、図２において説明した画面情報をサービス要求元となる利用者端末に出力する）。 Returning to the description of FIG. 3, the service provision processing unit 160 f is a means for providing various services in response to service requests from the user terminals 10 to 30, and particularly as closely related to the present invention. When a predetermined poster (including a user and a community) is designated, a profile keyword, a similar profile, and a trend keyword corresponding to the designated poster are output (for example, the screen information described in FIG. 2) Is output to the user terminal that is the service request source).

具体的に、サービス提供処理部１６０ｆが、投稿者のプロファイルキーワードを抽出する場合には、指定された投稿者のプロファイルＩＤと、プロファイルキーワードテーブル１５０ｊと、キーワードテーブル１５０ｅとを比較することによって、投稿者のプロファイルキーワードを抽出する。なお、サービス提供処理部１６０ｆは、第１TF/IDF値の値が閾値以上となるキーワードをプロファイルキーワードとして抽出して出力する。 Specifically, when extracting a poster's profile keywords, the service provision processing unit 160f extracts them by comparing the designated poster's profile ID with the profile keyword table 150j and the keyword table 150e. The service provision processing unit 160f extracts and outputs, as profile keywords, the keywords whose first TF/IDF value is equal to or greater than a threshold.

また、サービス提供処理部１６０ｆが、投稿者の類似プロファイルを抽出する場合には、指定された投稿者のプロファイルＩＤと、類似プロファイルテーブル１５０ｌと、記事管理テーブル１５０ａとを比較することによって、投稿者の類似プロファイル（指定された投稿者に類似する他の利用者、コミュニティ）を抽出して出力する。 When extracting a poster's similar profiles, the service provision processing unit 160f compares the designated poster's profile ID with the similar profile table 150l and the article management table 150a, and thereby extracts and outputs the poster's similar profiles (other users and communities similar to the designated poster).

また、サービス提供処理部１６０ｆが、トレンドキーワードを抽出する場合には、トレンドキーワードテーブル１５０ｋを参照し、第２TF/IDF値が閾値以上となるキーワードをトレンドキーワードとして抽出し、出力する。 When the service provision processing unit 160f extracts a trend keyword, the service keyword processing unit 160f refers to the trend keyword table 150k, extracts a keyword whose second TF / IDF value is equal to or greater than a threshold value as a trend keyword, and outputs it.

次に、本実施例にかかる情報管理サーバ１００の処理手順について説明する。図２２は、本実施例にかかる情報管理サーバ１００の処理手順を示すフローチャートである。同図に示すように、情報管理サーバ１００は、記憶部１５０に記憶された記事が更新されたか否かを判定し（ステップＳ１０１）、更新されていない場合には（ステップＳ１０２，Ｎｏ）、処理を終了する。 Next, the processing procedure of the information management server 100 according to the present embodiment will be described. FIG. 22 is a flowchart illustrating the processing procedure of the information management server 100 according to the present embodiment. As shown in the figure, the information management server 100 determines whether or not an article stored in the storage unit 150 has been updated (step S101), and if no article has been updated (step S102, No), ends the processing.

一方、記事が更新されている場合には（ステップＳ１０２，Ｙｅｓ）、特徴キーワード抽出処理部１６０ｂが特徴キーワード抽出処理を実行し（ステップＳ１０３）、プロファイルキーワード抽出処理部１６０ｃがプロファイルキーワード抽出処理を実行する（ステップＳ１０４）。 On the other hand, if the article has been updated (step S102, Yes), the feature keyword extraction processing unit 160b executes the feature keyword extraction processing (step S103), and the profile keyword extraction processing unit 160c executes the profile keyword extraction processing. (Step S104).

そして、類似度算出処理部１６０ｄが、類似度算出処理を実行し（ステップＳ１０５）、トレンドキーワード抽出処理部１６０ｅがトレンドキーワード抽出処理を実行する（ステップＳ１０６）。 Then, the similarity calculation processing unit 160d executes similarity calculation processing (step S105), and the trend keyword extraction processing unit 160e executes trend keyword extraction processing (step S106).
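The overall flow of steps S101 to S106 can be sketched as a simple gated pipeline. The function name and the representation of each processing unit as a named callable are hypothetical; the sketch only shows that the four processes run in a fixed order and only when an article has been updated.

```python
def run_update_cycle(articles_updated, steps):
    """Run the processing steps in order, but only if an article was updated.

    articles_updated -- result of the update check (steps S101/S102)
    steps            -- ordered list of (name, callable) pairs for S103 to S106
    Returns the names of the steps that were executed.
    """
    if not articles_updated:
        return []  # S102, No: end processing without running anything
    executed = []
    for name, step in steps:
        step()
        executed.append(name)
    return executed
```

Keeping the order explicit matters because each later process reads the tables written by the earlier ones.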

次に、図２２のステップＳ１０３で示した特徴キーワード抽出処理について説明する。図２３は、特徴キーワード抽出処理の処理手順を示すフローチャートである。同図に示すように、特徴キーワード抽出処理部１６０ｂは、更新された記事を取得し（ステップＳ２０１）、記事本文を形態素解析し、「語」の単位に品詞分解する（ステップＳ２０２）。 Next, the feature keyword extraction process shown in step S103 of FIG. 22 will be described. FIG. 23 is a flowchart illustrating the processing procedure of the feature keyword extraction process. As shown in the figure, the feature keyword extraction processing unit 160b acquires the updated article (step S201), morphologically analyzes the article text, and decomposes the part of speech into units of “words” (step S202).

そして、特徴キーワード抽出処理部１６０ｂは、特徴キーワードの対象となる「語」を抽出し（ステップＳ２０３）、特徴キーワードの対象となる「語」から不要語テーブル１５０ｇに登録されている「語」を除外する（ステップＳ２０４）。 Then, the feature keyword extraction processing unit 160b extracts the “word” that is the target of the feature keyword (step S203), and the “word” that is registered in the unnecessary word table 150g from the “word” that is the target of the feature keyword. Exclude (step S204).

続いて、特徴キーワード抽出処理部１６０ｂは、特徴キーワードの対象となる「語」を同義語・結合語テーブル１５０ｄに登録されている「語」に置換し（ステップＳ２０５）、特徴キーワードをキーワードテーブル１５０ｅに登録し（ステップＳ２０６）、特徴キーワードと記事とを対応付けて記事キーワードテーブル１５０ｆに登録する（ステップＳ２０７）。 Subsequently, the feature keyword extraction processing unit 160b replaces the “word” that is the target of the feature keyword with the “word” registered in the synonym / joint word table 150d (step S205), and the feature keyword is replaced with the keyword table 150e. (Step S206), the characteristic keyword and the article are associated with each other and registered in the article keyword table 150f (step S207).
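Steps S202 to S205 can be illustrated with the following minimal sketch. The source relies on morphological analysis to split the article text into words; the regular-expression tokenizer below is only a stand-in for that step and would not segment Japanese text. The function name, the `stop_words` set (standing in for the unnecessary word table), and the `synonyms` mapping (standing in for the synonym/joint word table) are hypothetical.

```python
import re
from collections import Counter


def extract_feature_keywords(text, stop_words, synonyms):
    """Return a Counter of feature keywords and their appearance counts."""
    # S202 stand-in: split into words (the source uses morphological analysis).
    words = re.findall(r"\w+", text.lower())
    # S204: exclude words registered as unnecessary words.
    candidates = [w for w in words if w not in stop_words]
    # S205: replace each word with its registered replacement, if any.
    normalized = [synonyms.get(w, w) for w in candidates]
    return Counter(normalized)


counts = extract_feature_keywords(
    "The server and the servers store the article",
    stop_words={"the", "and"},
    synonyms={"servers": "server"},
)
```

The resulting counts correspond to the per-article total number of keyword appearances that would be registered alongside each keyword.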

Next, the profile keyword extraction process shown in step S104 of FIG. 22 will be described. FIG. 24 is a flowchart showing the processing procedure of the profile keyword extraction process. As shown in the figure, the profile keyword extraction processing unit 160c extracts the updated articles from the article keyword table 150f and extracts the feature keywords associated with those articles (step S301).

Then, the profile keyword extraction processing unit 160c calculates the first TF/IDF value based on, for each target person (user) or community, the total number of occurrences of the feature keyword in the articles, the total number of articles, and the number of articles containing the keyword (step S302).

The profile keyword extraction processing unit 160c sets the n feature keywords whose first TF/IDF value is equal to or greater than a threshold as profile keywords (step S303), and registers the profile keywords in the profile keyword table 150j (step S304).
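Steps S302 and S303 can be illustrated with the sketch below. The passage names the three counts that feed the first TF/IDF value but does not restate the formula, so the code assumes a common smoothed form, tf × (log(N / df) + 1), where tf is the total number of occurrences, N is the number of articles of the user or community, and df is the number of those articles containing the keyword. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(articles: List[List[str]]) -> Dict[str, float]:
    """Score each feature keyword over the articles of one user or community."""
    n_articles = len(articles)          # total number of articles
    tf: Counter = Counter()             # total occurrences of each keyword
    df: Counter = Counter()             # number of articles containing each keyword
    for keywords in articles:
        tf.update(keywords)
        df.update(set(keywords))
    # Assumed formula: tf * (log(N / df) + 1); the source does not fix this form here.
    return {k: tf[k] * (math.log(n_articles / df[k]) + 1.0) for k in tf}


def select_keywords(scores: Dict[str, float], threshold: float, n: int) -> List[str]:
    """Keep at most n keywords whose score is at or above the threshold (S303)."""
    passing = [(k, s) for k, s in scores.items() if s >= threshold]
    passing.sort(key=lambda ks: (-ks[1], ks[0]))  # highest score first, ties by name
    return [k for k, _ in passing[:n]]
```

For example, `select_keywords(tfidf_scores(articles), threshold, n)` yields the candidate profile keywords for one profile, where `articles` is a list of per-article keyword lists.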

Next, the similarity calculation process shown in step S105 of FIG. 22 will be described. FIG. 25 is a flowchart showing the processing procedure of the similarity calculation process. As shown in the figure, the similarity calculation processing unit 160d acquires, from the profile keyword table 150j, the profile keywords of the user (including a community) for which similarity is to be calculated (step S401), and acquires, from the same table, the profile keywords of the users to be compared (step S402). The users to be compared are all users other than the user for which similarity is calculated.

Then, the similarity calculation processing unit 160d calculates the similarity based on the profile keywords of the calculation target user and the profile keywords of each comparison target user (step S403), and extracts each profile ID whose similarity is equal to or greater than a threshold as a similar profile (step S404). The similarity calculation processing unit 160d registers the extracted similar profiles in the similar profile table 150l (step S405).

Note that in step S403, the top n profile keywords of the calculation target user (in descending order of first TF/IDF value) are used as dimensions, and the cosine distance between two vectors, whose values are the total numbers of occurrences of those profile keywords for the calculation target user and for the comparison target user, is calculated as the similarity.
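Steps S403 and S404 can be sketched as follows. The text calls the measure a "cosine distance" and then keeps profiles at or above a threshold, so the sketch assumes the usual cosine similarity (larger means more similar). The function names and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(dimensions: List[str],
                      target_counts: Dict[str, int],
                      other_counts: Dict[str, int]) -> float:
    """Cosine of the angle between two occurrence-count vectors.

    `dimensions` is the top-n profile keyword list of the calculation target.
    """
    a = [target_counts.get(k, 0) for k in dimensions]
    b = [other_counts.get(k, 0) for k in dimensions]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no shared basis for comparison
    return dot / (norm_a * norm_b)


def similar_profiles(dimensions: List[str],
                     target_counts: Dict[str, int],
                     others: Dict[str, Dict[str, int]],
                     threshold: float) -> List[str]:
    """Return the profile IDs whose similarity meets the threshold (S404)."""
    return [pid for pid, counts in others.items()
            if cosine_similarity(dimensions, target_counts, counts) >= threshold]
```

Because the dimensions come only from the calculation target's keywords, the measure is not symmetric between two users; keywords that appear only in the comparison target's profile do not contribute.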

Next, the trend keyword extraction process shown in step S106 of FIG. 22 will be described. FIG. 26 is a flowchart showing the processing procedure of the trend keyword extraction process. As shown in the figure, the trend keyword extraction processing unit 160e extracts the latest n articles (or the articles within a predetermined period) from the article keyword table, and extracts the feature keywords associated with those articles (step S501).

Then, the trend keyword extraction processing unit 160e calculates the second TF/IDF value based on the total number of occurrences of the feature keyword in the articles within the target period, the total number of articles within the target period, and the number of articles within the target range that contain the feature keyword (step S502).

The trend keyword extraction processing unit 160e sets the n feature keywords whose second TF/IDF value is equal to or greater than a threshold as trend keywords (step S503), and registers the trend keywords in the trend keyword table 150k (step S504).
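Steps S501 to S503 differ from the profile case mainly in how the articles are selected: by period rather than by author. The sketch below illustrates this under the same assumed scoring form, tf × (log(N / df) + 1), which the passage does not spell out. The `Article` tuple layout and the function name are hypothetical.

```python
import math
from collections import Counter
from datetime import date
from typing import List, Tuple

# Hypothetical layout: (creation date, feature keywords of the article)
Article = Tuple[date, List[str]]


def trend_keywords(articles: List[Article], start: date, end: date,
                   threshold: float, n: int) -> List[str]:
    """Return up to n keywords scoring at or above the threshold within the period."""
    # S501: restrict to articles created within the target period.
    window = [keywords for created, keywords in articles if start <= created <= end]
    tf: Counter = Counter()   # total occurrences within the period
    df: Counter = Counter()   # articles within the period containing the keyword
    for keywords in window:
        tf.update(keywords)
        df.update(set(keywords))
    total = len(window)
    # S502: assumed scoring form (not stated in this passage).
    scores = {k: tf[k] * (math.log(total / df[k]) + 1.0) for k in tf}
    # S503: threshold, then keep the n highest-scoring keywords.
    ranked = sorted((k for k in scores if scores[k] >= threshold),
                    key=lambda k: (-scores[k], k))
    return ranked[:n]
```

Selecting the latest n articles instead of a date range would only change how `window` is built.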

In this way, the feature keyword extraction processing unit 160b extracts feature keywords, and the extracted feature keywords are used to extract profile keywords, similar profiles, and trend keywords, so that information useful to the user can be extracted efficiently.

As described above, the information management server 100 according to the present embodiment acquires articles (including blogs and the like) created within a predetermined period from the storage unit 150, and extracts keywords included in the acquired articles as feature keywords. Then, based on the articles created by a target user, or by a community composed of a plurality of users, and on the feature keywords, the information management server 100 calculates the feature amount (first TF/IDF value) of each feature keyword for that user or community, and extracts the feature keywords whose calculated feature amount is equal to or greater than a threshold as profile keywords indicating the characteristics (profile) of the user or community. Information useful to the user can therefore be provided to the user.

In addition, the information management server 100 according to the present embodiment calculates similarity based on the profile keywords of each user (including each community), and extracts other users and communities similar to the user (similar profiles) based on the calculated similarity, so that information useful to the user can be provided to the user.

In addition, the information management server 100 according to the present embodiment extracts articles created within a predetermined period, calculates a feature amount (second TF/IDF value) based on the extracted articles and the feature keywords, and extracts keywords whose calculated feature amount is equal to or greater than a threshold as trending keywords (trend keywords), so that information useful to the user can be provided to the user.

In the present embodiment, when a poster (a user or a community) is specified, the service provision processing unit 160f extracts the profile keywords and similar profiles linked to the profile ID of that poster and outputs them to the user terminal; however, the present invention is not limited to this.

For example, when a given profile keyword or trend keyword is specified, the service provision processing unit 160f may extract the profile IDs and similar profile information linked to that profile keyword or trend keyword and output them to the user terminal.
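The keyword-driven lookup described here can be sketched as a reverse search over the profile keyword data. This is only an illustration: the in-memory dictionaries stand in for the profile keyword table and the similar profile table, and the function name is hypothetical.

```python
from typing import Dict, List, Set


def lookup_by_keyword(keyword: str,
                      profile_keywords: Dict[str, Set[str]],
                      similar: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each profile ID that has the keyword to its registered similar profiles."""
    return {pid: similar.get(pid, [])
            for pid, keywords in sorted(profile_keywords.items())
            if keyword in keywords}
```

The result pairs every matching profile ID with its similar profiles, which corresponds to the information the passage says may be output to the user terminal.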

Of the processes described in the present embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

The components of the information management server 100 shown in FIG. 3 are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to that shown in the figure, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device can be realized by a CPU and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic.

FIG. 27 is a diagram showing the hardware configuration of a computer constituting the information management server according to the embodiment. As shown in FIG. 27, this computer (information management server) 200 is configured by connecting, via a bus 209, an input device 201, a monitor 202, a RAM (Random Access Memory) 203, a ROM (Read Only Memory) 204, a medium reading device 205 that reads data from a storage medium, an interface 206 that transmits and receives data to and from other devices (user terminals), a CPU (Central Processing Unit) 207, and an HDD (Hard Disk Drive) 208.

The HDD 208 stores an information management program 208b that provides the same functions as the information management server 100 described above. When the CPU 207 reads and executes the information management program 208b, an information management process 207a is started. This information management process 207a corresponds to the information management unit 160a, the feature keyword extraction processing unit 160b, the profile keyword extraction processing unit 160c, the similarity calculation processing unit 160d, the trend keyword extraction processing unit 160e, and the service provision processing unit 160f shown in FIG. 3.

The HDD 208 also stores various data 208a corresponding to the tables 150a to 150l shown in FIG. 3. The CPU 207 reads the various data 208a stored in the HDD 208, stores it in the RAM 203, and uses the various data 203a stored in the RAM 203 to extract feature keywords, profile keywords, similar profiles, and trend keywords.

The information management program 208b shown in FIG. 27 does not necessarily have to be stored in the HDD 208 from the beginning. For example, the information management program 208b may be stored on a "portable physical medium" inserted into the computer, such as a flexible disk (FD), CD-ROM, DVD disk, magneto-optical disk, or IC card; on a "fixed physical medium" such as a hard disk drive (HDD) provided inside or outside the computer; or on "another computer (or server)" connected to the computer via a public line, the Internet, a LAN, a WAN, or the like, and the computer may read the information management program 208b from any of these and execute it.

(Supplementary note 1) An information extraction program that causes a computer to execute:
a feature keyword extraction procedure of acquiring articles created within a predetermined period from a storage device and extracting keywords included in the acquired articles as feature keywords;
a feature amount calculation procedure of calculating a feature amount of each feature keyword for a user, or for a community composed of a plurality of users, based on the articles created by the user or the community and on the feature keywords; and
a profile keyword extraction procedure of extracting a feature keyword whose feature amount is equal to or greater than a threshold as a profile keyword indicating a characteristic of the user or the community.

(Supplementary note 2) The information extraction program according to supplementary note 1, wherein the feature amount calculation procedure executes: a first counting procedure of extracting the articles created by the user or community from the storage device and counting a first value indicating the number of the feature keywords included in the extracted articles; a second counting procedure of counting a second value indicating the total number of articles created by the user or community; a third counting procedure of counting a third value indicating the number of articles, among the articles created by the user or community, that include the feature keyword; and a calculation procedure of calculating the feature amount based on the first, second, and third values.

(Supplementary note 3) The information extraction program according to supplementary note 1 or 2, further causing the computer to execute a similarity calculation procedure of calculating a similarity based on a reference profile keyword group, in which each profile keyword indicating a characteristic of a reference user or community is associated with the number of times that profile keyword appears in articles, and on another profile keyword group, which includes each profile keyword indicating a characteristic of another user or community and the number of times that profile keyword appears in articles, and extracting another user or community similar to the reference user or community based on the similarity.

(Supplementary note 4) The information extraction program according to supplementary note 1, 2, or 3, further causing the computer to execute: a fourth counting procedure of extracting articles within a predetermined period from the storage device and counting a fourth value indicating the number of the feature keywords included in the extracted articles; a fifth counting procedure of counting a fifth value indicating the total number of articles within the predetermined period; a sixth counting procedure of counting a sixth value indicating the number of articles, among the articles within the predetermined period, that include the feature keyword; and a trend keyword extraction procedure of calculating a second feature amount of each feature keyword based on the fourth, fifth, and sixth values and extracting a feature keyword whose calculated second feature amount is equal to or greater than a threshold as a trending keyword.

(Supplementary note 5) The information extraction program according to supplementary note 3, wherein the similarity calculation procedure executes a vector operation using the reference profile keyword group as a first vector and the other profile keyword group as a second vector, and calculates the distance between the first and second vectors as the similarity.

(Supplementary note 6) The information extraction program according to supplementary note 1, wherein the storage device further stores unnecessary keywords that are excluded from being feature keywords, and the feature keyword extraction procedure removes, from the extracted feature keywords, any feature keyword that matches one of the unnecessary keywords.

(Supplementary note 7) An information extraction device that manages articles created by a user or by a community composed of a plurality of users and extracts predetermined information from the articles, the device comprising:
article storage means for storing the articles created by the user or community;
feature keyword extraction means for acquiring articles created within a predetermined period from the article storage means and extracting keywords included in the acquired articles as feature keywords;
feature amount calculation means for calculating a feature amount of each feature keyword for the user or community based on the articles created by the user or community and on the feature keywords; and
profile keyword extraction means for extracting a feature keyword whose feature amount is equal to or greater than a threshold as a profile keyword indicating a characteristic of the user or community.

As described above, the information extraction program and the information extraction device according to the present invention are useful for systems that share various types of information, and are particularly suitable when useful information must be provided to a user from a vast amount of information.

FIG. 1 is a block diagram showing the configuration of the information sharing system according to the present embodiment.
FIG. 2 is a diagram showing an example of a screen image displayed on the monitor of a user terminal.
FIG. 3 is a functional block diagram showing the configuration of the information management server according to the present embodiment.
FIG. 4 is a diagram showing an example of the data structure of the article management table.
FIG. 5 is a diagram showing an example of the data structure of the article information table.
FIG. 6 is a diagram showing an example of the data structure of the execution management table.
FIG. 7 is a diagram showing an example of the data structure of the synonym/combined word table.
FIG. 8 is a diagram showing an example of the data structure of the keyword table.
FIG. 9 is a diagram showing an example of the data structure of the article keyword table.
FIG. 10 is a diagram showing an example of the data structure of the unnecessary word table.
FIG. 11 is a diagram showing an example of the data structure of the article space management table.
FIG. 12 is a diagram showing an example of the data structure of the profile information table.
FIG. 13 is a diagram showing an example of the data structure of the profile keyword table.
FIG. 14 is a diagram showing an example of the data structure of the trend keyword table.
FIG. 15 is a diagram showing an example of the data structure of the similar profile table.
FIG. 16 is a diagram (1) showing specific processing of the feature keyword extraction processing unit.
FIG. 17 is a diagram (2) showing specific processing of the feature keyword extraction processing unit.
FIG. 18 is a diagram (1) showing specific processing of the profile keyword extraction processing unit.
FIG. 19 is a diagram (2) showing specific processing of the profile keyword extraction processing unit.
FIG. 20 is a diagram showing specific processing of the similarity calculation processing unit.
FIG. 21 is a diagram showing specific processing of the trend keyword extraction processing unit.
FIG. 22 is a flowchart showing the processing procedure of the information management server according to the present embodiment.
FIG. 23 is a flowchart showing the processing procedure of the feature keyword extraction process.
FIG. 24 is a flowchart showing the processing procedure of the profile keyword extraction process.
FIG. 25 is a flowchart showing the processing procedure of the similarity calculation process.
FIG. 26 is a flowchart showing the processing procedure of the trend keyword extraction process.
FIG. 27 is a diagram showing the hardware configuration of a computer constituting the information management server according to the embodiment.

Explanation of symbols

10, 20, 30 User terminal
50 Network
100 Information management server
110 Input unit
120 Output unit
130 Communication control IF unit
140 Input/output control IF unit
150 Storage unit
150a Article management table
150b Article information table
150c Execution management table
150d Synonym/combined word table
150e Keyword table
150f Article keyword table
150g Unnecessary word table
150h Article space management table
150i Profile information table
150j Profile keyword table
150k Trend keyword table
150l Similar profile table
160 Control unit
160a Information management unit
160b Feature keyword extraction processing unit
160c Profile keyword extraction processing unit
160d Similarity calculation processing unit
160e Trend keyword extraction processing unit
160f Service provision processing unit
200 Computer
201 Input device
202 Monitor
203 RAM
203a, 208a Various data
204 ROM
205 Medium reading device
206 Interface
207 CPU
207a Information management process
208 HDD
208b Information management program
209 Bus

Claims

An information extraction program that causes a computer to execute:
a feature keyword extraction procedure of acquiring articles created within a predetermined period from a storage device and extracting keywords included in the acquired articles as feature keywords;
a feature amount calculation procedure of calculating a feature amount of each feature keyword for a user, or for a community composed of a plurality of users, based on the articles created by the user or the community and on the feature keywords; and
a profile keyword extraction procedure of extracting a feature keyword whose feature amount is equal to or greater than a threshold as a profile keyword indicating a characteristic of the user or the community.

The information extraction program according to claim 1, wherein the feature amount calculation procedure executes: a first counting procedure of extracting the articles created by the user or community from the storage device and counting a first value indicating the number of the feature keywords included in the extracted articles; a second counting procedure of counting a second value indicating the total number of articles created by the user or community; a third counting procedure of counting a third value indicating the number of articles, among the articles created by the user or community, that include the feature keyword; and a calculation procedure of calculating the feature amount based on the first, second, and third values.

The information extraction program according to claim 1 or 2, further causing the computer to execute a similarity calculation procedure of calculating a similarity based on a reference profile keyword group, in which each profile keyword indicating a characteristic of a reference user or community is associated with the number of times that profile keyword appears in articles, and on another profile keyword group, which includes each profile keyword indicating a characteristic of another user or community and the number of times that profile keyword appears in articles, and extracting another user or community similar to the reference user or community based on the similarity.

The information extraction program according to claim 1, 2, or 3, further causing the computer to execute: a fourth counting procedure of extracting articles within a predetermined period from the storage device and counting a fourth value indicating the number of the feature keywords included in the extracted articles; a fifth counting procedure of counting a fifth value indicating the total number of articles within the predetermined period; a sixth counting procedure of counting a sixth value indicating the number of articles, among the articles within the predetermined period, that include the feature keyword; and a trend keyword extraction procedure of calculating a second feature amount of each feature keyword based on the fourth, fifth, and sixth values and extracting a feature keyword whose calculated second feature amount is equal to or greater than a threshold as a trending keyword.

An information extraction device that manages articles created by a user or by a community composed of a plurality of users and extracts predetermined information from the articles, the device comprising:
article storage means for storing the articles created by the user or community;
feature keyword extraction means for acquiring articles created within a predetermined period from the article storage means and extracting keywords included in the acquired articles as feature keywords;
feature amount calculation means for calculating a feature amount of each feature keyword for the user or community based on the articles created by the user or community and on the feature keywords; and
profile keyword extraction means for extracting a feature keyword whose feature amount is equal to or greater than a threshold as a profile keyword indicating a characteristic of the user or community.