JP2011003158A

JP2011003158A - Summary preparation device and method

Info

Publication number: JP2011003158A
Application number: JP2009148038A
Authority: JP
Inventors: Wataru Shoji; 渉庄司
Original assignee: HOWS KK
Current assignee: HOWS KK
Priority date: 2009-06-22
Filing date: 2009-06-22
Publication date: 2011-01-06

Abstract

PROBLEM TO BE SOLVED: To automatically prepare a summary of the content described in a text such as a blog. SOLUTION: A summary preparation device includes a blog data storage part 15 that stores a file including text containing a plurality of sentences; a delimiting processing part 232 that, upon receiving a request to prepare a summary of the text in a file stored in the blog data storage part 15, divides the text in the file into individual sentences; a sentence selection processing part 233 that selects, from the plurality of sentences, a sentence ending with a prescribed sentence-end character determined according to an attribute of the text; and a network interface 11 that outputs the selected sentence as a reply to the summary preparation request.

Description

The present invention relates to a technique for automatically generating a summary of a text.

Many blog sites and websites (bulletin boards) on the Internet carry text describing consumers' opinions and impressions of various products and services, so-called word-of-mouth information. Recently, this word-of-mouth information has come to influence the sales of products and services.

For this reason, grasping word-of-mouth information on the Internet about its own products and those of its competitors has become important to a company's marketing strategy. Accordingly, techniques such as those of Patent Documents 1 and 2 have been proposed.

Patent Document 1: JP 2004-185572 A
Patent Document 2: JP 2008-262520 A

For example, there is a need to quickly grasp what is being written about a given product or the like on blogs and bulletin boards. In addition, because information on the Internet is added continually, the analysis should be performed in real time.

Morphological analysis, dependency parsing, and the like are widely known as conventional Japanese text analysis processes, but these processes require considerable time.

Accordingly, an object of the present invention is to automatically create a summary of the content described in a text such as a blog.

A summary creation device according to one embodiment of the present invention includes: a text storage unit that stores a file including text containing a plurality of sentences; a delimiter processing unit that, upon receiving a request to create a summary of the text in a file stored in the text storage unit, divides the text in the file into individual sentences; a selection processing unit that selects, from the plurality of sentences divided by the delimiter processing unit, a sentence ending with a predetermined sentence-end character determined according to an attribute of the text; and output means that outputs the sentence selected by the selection processing unit as a response to the summary creation request.

In a preferred embodiment, the selection processing unit may select, from the plurality of sentences divided by the delimiter processing unit, a sentence containing a word that appears in the text with at least a predetermined frequency.

In a preferred embodiment, when the text is a blog describing a product or service, the sentence-end characters may include at least 「足」, 「得」, 「念」, 「す」, 「い」, 「り」, 「る」, 「よ」, and 「ん」.

In a preferred embodiment, when the text is a newspaper article, the sentence-end characters may include at least 「す」, 「た」, 「る」, and 「んだ」.

In a preferred embodiment, when the text is a newspaper article, the selection processing unit may select a sentence that begins with 「（」 and ends with 「）」.

In a preferred embodiment, when the text is a newspaper article, the sentence immediately preceding a sentence that begins with any of 「この中で」 ("among these"), 「ただ」 ("however"), and 「このほか」 ("in addition") and ends with any of 「す」, 「た」, 「る」, and 「んだ」 may be selected.
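The basic selection rule described above (divide the text into sentences, then keep the sentences whose ending matches a list chosen by text attribute) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the attribute keys "blog" and "news", and the choice of 「。」, 「！」, 「？」 and line breaks as sentence delimiters are hypothetical and are not specified in the text. The additional newspaper rules (parenthesized sentences, preceding-sentence selection) and the frequent-word rule are not covered.

```python
import re

# Sentence-end characters per text attribute, copied from the embodiments above.
# The keys "blog" and "news" are hypothetical labels for the two attributes.
SENTENCE_END_CHARS = {
    "blog": ("足", "得", "念", "す", "い", "り", "る", "よ", "ん"),
    "news": ("す", "た", "る", "んだ"),
}

def split_sentences(text):
    """Divide text into sentences.

    Assumption: a sentence ends at '。', '！', '？' or a line break.
    The delimiter itself is dropped.
    """
    parts = re.split(r"[。！？\n]+", text)
    return [p.strip() for p in parts if p.strip()]

def select_summary_sentences(text, attribute):
    """Keep only the sentences whose ending matches the attribute's list."""
    endings = SENTENCE_END_CHARS[attribute]
    # str.endswith accepts a tuple and returns True if any ending matches.
    return [s for s in split_sentences(text) if s.endswith(endings)]

review = "この冷蔵庫は静かです。音が気になるかも。買って満足。"
print(select_summary_sentences(review, "blog"))
```

Because the rule needs only string splitting and a suffix check, it avoids morphological or dependency analysis, which matches the stated aim of fast processing.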

FIG. 1 is a schematic diagram of an information providing system including an information providing device according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the blog analysis server 1.
FIG. 3 is a flowchart showing the processing procedure of the blog data collection unit 13.
FIG. 4 is a schematic diagram showing an example of the data structure of the blog data storage unit 15.
FIG. 5 is a configuration diagram of the advertisement server 4.
FIG. 6 is a flowchart showing the information provision processing procedure in this embodiment.
FIGS. 7 to 10 each show an example of a screen displayed on the user terminal device 3.
FIG. 11 is a flowchart showing word-of-mouth information analysis request processing.
FIG. 12 is a flowchart showing the detailed procedure of the text analysis processing in step S35.
FIG. 13 is a flowchart showing item analysis request processing.
FIG. 14 is a flowchart showing summary generation processing.

Hereinafter, an information providing system including an information providing device according to an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a configuration diagram of an information providing system according to an embodiment of the present invention. As shown in the figure, the system has a blog analysis server 1, a plurality of user terminal devices 3, an advertisement server 4, and blog servers 5, which are connected via a network 9 such as the Internet.

Here, the blog analysis server 1 and the advertisement server 4 constitute an information providing device that provides information to the user terminal devices 3. The information providing device consisting of the blog analysis server 1 and the advertisement server 4 may be implemented on a single computer or on a plurality of computers.

The blog analysis server 1, the user terminal device 3, the advertisement server 4, and the blog server 5 are each configured by, for example, a general-purpose computer system, and the individual components or functions within the blog analysis server 1, the user terminal device 3, the advertisement server 4, and the blog server 5 described below are realized, for example, by executing a computer program. This computer program can be stored on a computer-readable recording medium.

The blog analysis server 1 collects blog data from a plurality of blog servers 5 using a program called a crawler, and provides a blog data analysis service to the plurality of user terminal devices 3. In this embodiment, the blog analysis server 1 collects, from the blog servers 5, blog data containing so-called word-of-mouth information about various products and services (categories). In response to a request from a user terminal device 3, it provides the results of analyzing the word-of-mouth information about the products and services. Furthermore, in response to a request from a user terminal device 3, the blog analysis server 1 creates and provides a summary of the word-of-mouth information about the products and services. In other words, the blog analysis server 1 functions both as a text analysis device that analyzes text such as blog data and as a summary creation device that summarizes that text.

Although this embodiment is described with particular reference to blog data, data from websites hosted on web servers connected to the network can also be targeted.

The user terminal device 3 is a computer that can access the network 9 and has a web browser installed. The various screens described later are displayed using, for example, the web browser.

The advertisement server 4 provides various information to the user terminal devices 3, including information on advertisements for products and services. For example, the advertisement server 4 outputs, to a user terminal device 3, display screen data for displaying an advertisement image for an advertised product or service.

The blog server 5 stores data containing various texts written by many Internet users, and blog sites based on that data can be viewed by other Internet users.

FIG. 2 is a configuration diagram of the blog analysis server 1.

The blog analysis server 1 acquires blog data from the blog servers 5, analyzes or summarizes it, and provides the result to the user terminal devices 3. To do so, the blog analysis server 1 has the following configuration: a network interface unit 11, a blog data collection unit 13, a blog data storage unit 15, a text analysis unit 17, an extracted character string storage unit 19, an item ranking processing unit 21, and a summary generation unit 23.

The network interface unit 11 communicates via the network 9 with other devices on the network 9, such as the user terminal devices 3 and the blog servers 5. For example, on receiving from a user terminal device 3 a word-of-mouth information analysis request (a blog data analysis request) that specifies a category, the network interface unit 11 outputs the analysis result for that category. On receiving from a user terminal device 3 an item analysis request that specifies a character string based on that analysis result, the network interface unit 11 outputs the analysis result for the items based on the specified character string. Further, on receiving a summary creation request from a user terminal device 3, the network interface unit 11 outputs a summary of the text to which the request relates. In this embodiment, a category corresponds to a product or service. The word-of-mouth information analysis request, the item analysis request, and the summary creation request are described in detail later.

The blog data collection unit 13 collects blog data from each blog server 5 via the network 9. The blog data collection unit 13 classifies the collected blog data and stores it in the blog data storage unit 15.

FIG. 3 is a flowchart showing the processing procedure of the blog data collection unit 13. The processing of the blog data collection unit 13 is described in detail with reference to this figure.

First, the blog data collection unit 13 collects blog data from a plurality of blog servers 5 via the network 9 (S11).

The blog data collected here can be divided into predetermined blocks, for example entries (articles). The blog data collection unit 13 therefore performs the following processing for each predetermined processing unit, such as an entry. First, the blog data collection unit 13 analyzes the text in one entry and determines the category and item of that entry (S13).

For example, the blog data collection unit 13 takes one entry from the blog data, extracts from its text character strings that indicate predetermined categories and items, and thereby determines the category and item. Here, one or more categories and one or more items are identified for each entry. An entry from which at least one category and at least one item could not be extracted is excluded from the subsequent processing.

In this embodiment, the type of product or service is determined as the category. A product type may be the general name of a specific product, such as "refrigerator", "washing machine", or "air conditioner"; a service type may be the general name of a specific service, such as "pachinko" or "theme park". Also in this embodiment, identification information that specifies an individual product or service within the product or service type is determined as the item. For example, an item may be specified, in the case of a product, by the name of the manufacturer that makes or supplies it and its model name (or model number), and, in the case of a service, by the name of the individual store or facility that provides it and the specific service name.

Next, the blog data collection unit 13 determines whether a folder for the category determined in step S13 already exists in the blog data storage unit 15 (S15). If no folder exists for that category (S15: No), a category-specific folder for that category is created (S17). If a folder for the target category already exists in the blog data storage unit 15 (S15: Yes), step S17 is skipped.

In other words, when a new category is detected, the blog data collection unit 13 creates a category-specific folder corresponding to that category in the blog data storage unit 15. For example, the category name may be used as the folder name. In this embodiment, folders named "refrigerator", "washing machine", "air conditioner", and so on are created (see FIG. 4).

The blog data collection unit 13 determines whether an item-specific file corresponding to the item determined in step S13 already exists in the folder of the category determined in step S13 (S19). If the item-specific file does not exist (S19: No), the file is created and the text of the entry being processed is saved in it (S21). If an item-specific file corresponding to the item determined in step S13 already exists (S19: Yes), the text of the target entry is appended to the existing file (S23).

For example, when a new item is detected, the blog data collection unit 13 creates a text file for that item in the corresponding category-specific folder in the blog data storage unit 15. The item name may be used as the file name of the item-specific file; in this embodiment, for example, the file name may take the form "manufacturer name_model name". When different entries describe the same item, the texts of those entries are stored in the same file.

The processing from step S13 onward is repeated for every entry of the blog data acquired in step S11 (S25).
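Steps S13 to S25 can be sketched roughly as follows. This is a minimal illustration, not the implementation in the specification: the keyword table `KNOWN_ITEMS`, the function names, and the use of an in-memory dictionary in place of folders and files are all assumptions made for the example.

```python
# Hypothetical keyword table: category name -> item file name -> keywords
# that must all appear in an entry for the item to be recognized.
KNOWN_ITEMS = {
    "冷蔵庫": {"A社_xxx": ("A社", "xxx")},
    "洗濯機": {"B社_yyy": ("B社", "yyy")},
}

def classify_entry(text):
    """Step S13: return the (category, item) pairs mentioned in one entry."""
    hits = []
    for category, items in KNOWN_ITEMS.items():
        if category not in text:
            continue
        for item_name, keywords in items.items():
            if all(k in text for k in keywords):
                hits.append((category, item_name))
    return hits

def store_entries(entries, storage=None):
    """Steps S15 to S25. storage maps category -> item -> list of entry texts."""
    storage = {} if storage is None else storage
    for text in entries:  # S25: repeat for every collected entry
        # Entries with no recognized category and item yield no pairs
        # and are therefore skipped.
        for category, item_name in classify_entry(text):
            folder = storage.setdefault(category, {})      # S15 / S17
            folder.setdefault(item_name, []).append(text)  # S19 / S21 / S23
    return storage

entries = ["A社の冷蔵庫xxxを買いました。", "今日は晴れです。", "冷蔵庫はA社のxxxが静か。"]
print(store_entries(entries))
```

Using `setdefault` folds the "exists or create" checks of S15/S17 and S19/S21 into a single call; a file-based version would check for the folder and file on disk instead.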

The blog data collection unit 13 executes the above processing for collecting and classifying blog data periodically or at irregular intervals.

In this way, the collected blog data is classified according to the categories and items described in each entry. That is, the blog data storage unit 15 stores files containing text. FIG. 4 is a schematic diagram of an example of the data structure of the blog data storage unit 15.

As shown in FIG. 4, the blog data storage unit 15 includes one or more category-specific folders 150 (150a to 150c). Each folder 150 stores one or more text files 152. Each text file 152 contains item-specific text about a different item belonging to the category corresponding to the folder 150 in which the file is stored.

For example, blog data (entries) containing descriptions of company A's refrigerator model xxx are stored in the text file 152 named "Company A_xxx" in the "refrigerator" folder 150a.

Referring again to FIG. 1, when the text analysis unit 17 receives from a user terminal device 3 a first request (word-of-mouth information analysis request) containing information indicating the category of the advertised product or service shown in an advertisement image, it refers to the text storage unit (blog data storage unit 15) and extracts, from the text describing products or services of the category contained in the first request, a plurality of frequent character strings that appear with at least a predetermined frequency or number of times. The text analysis unit 17 is described in more detail below.

The text analysis unit 17 performs text analysis of the blog data stored in the blog data storage unit 15. For this purpose, the text analysis unit 17 includes a specific character string processing unit 171, an erase character string processing unit 173, an unnecessary character string processing unit 175, and a cut-out processing unit 177.

The specific character string processing unit 171 extracts predetermined specific character strings from the text in a file stored in the blog data storage unit 15 and stores the extracted specific character strings in the extracted character string storage unit 19. In doing so, the specific character string processing unit 171 replaces each extracted specific character string with a blank. The specific character string processing unit 171 may also count the number of occurrences or the frequency of each extracted specific character string and store the count in the blog data storage unit 15 as well. For example, the specific character string processing unit 171 holds a specific character string dictionary in advance, in which a plurality of specific character strings are stored. The specific character string processing unit 171 matches the specific character strings against the text and replaces them with blanks in order from the longest specific character string.
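The longest-first matching and blank replacement described above might look like the following sketch. The function name and the sample dictionary entries are hypothetical; the specification does not give the dictionary contents.

```python
from collections import Counter

def extract_specific_strings(text, dictionary):
    """Extract dictionary terms from text, longest term first.

    Returns the text with each matched term replaced by a single blank,
    together with a Counter of how often each term occurred.
    """
    counts = Counter()
    # Longest terms first, so that a short term cannot consume part of a
    # longer term that contains it.
    for term in sorted(dictionary, key=len, reverse=True):
        n = text.count(term)
        if n:
            counts[term] += n
            text = text.replace(term, " ")
    return text, counts

# Hypothetical dictionary for illustration only.
terms = ["ドラム式", "ドラム式洗濯機"]
print(extract_specific_strings("ドラム式洗濯機とドラム式の違い", terms))
```

Processing the longest strings first matters here: if 「ドラム式」 were replaced first, the longer entry 「ドラム式洗濯機」 could never match.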

A specific character string dictionary may be provided for each attribute of the target blog data (text). That is, the specific character strings stored in the dictionary may differ according to the attribute of the blog data (text). For example, because "product word-of-mouth information" and "newspaper articles" have different attributes, the specific character strings making up their respective dictionaries may differ.

The erase character string processing unit 173 replaces with blanks, in the text in a text file stored in the blog data storage unit 15, erase characters that include at least punctuation marks and either all hiragana or all hiragana other than predetermined excluded characters.

The inventor of the present application found that, in text in general, including blogs, most of the expressions that are meaningful in relation to the context of the text are written in kanji. In this embodiment, therefore, character strings composed of kanji are extracted as the character strings (words, keywords) that characterize what a text describes.

The specific character string dictionary described above contains, as specific character strings, character strings that are important in relation to the attribute of the target blog data (text). This is because important expressions (character strings) exist even among character strings that contain components other than kanji; the specific character string processing unit 171 therefore extracts those important expressions, registered as specific character strings, before the erase character string processing unit 173 erases the erase characters.

The erase characters may, for example, include all hiragana, or may include all hiragana except one or more of 「の」, 「が」, 「い」, and 「く」 (the excluded characters). In the latter case, one or more of 「の」, 「が」, 「い」, and 「く」 are not included in the erase characters. This is because these characters can serve to join character strings composed of kanji, and a kanji string joined by these hiragana can carry a definite meaning. For the same reason, 「・」 and 「−」 need not be included in the erase characters.

In addition, the erase characters may include punctuation marks, commas, periods, alphabetic letters, and various symbols.

An example of the characters included in the erase characters is shown below.
Erase characters = ［「あ」,「い」,「う」,「え」,「お」,「か」,「き」,「く」,「け」,「こ」,「さ」,「し」,「す」,「せ」,「そ」,「た」,「ち」,「つ」,「て」,「と」,「な」,「に」,「ぬ」,「ね」,「は」,「ひ」,「ふ」,「へ」,「ほ」,「ま」,「み」,「む」,「め」,「も」,「や」,「ゆ」,「よ」,「ら」,「り」,「る」,「れ」,「ろ」,「わ」,「を」,「が」,「ぎ」,「ぐ」,「げ」,「ご」,「ざ」,「じ」,「ず」,「ぜ」,「ぞ」,「だ」,「ぢ」,「づ」,「で」,「ど」,「ば」,「び」,「ぶ」,「べ」,「ぼ」,「ぱ」,「ぴ」,「ぷ」,「ぺ」,「ぽ」,「ゃ」, 「ゅ」,「ょ」,「っ」,「ん」,「。」,「、」,「.」,「（」,「）」,「｛」, 「｝」,「「」,「」」,「〜」,「〕」,「”」,「”」,「＜」,「＞」,「『」, 「』」,「■」,「*」,「！」,「＝」,「※」,「!」,「(」,「)」,「／」,「〔」, 「+」,「￥」,「$」,「＆」,「&」, 「@」,「＠」,「＊」,「…」,「a」,「b」,「c」,「d」,「e」,「f」,「g」,「h」,「i」,「j」,「k」,「l」,「m」,「n」,「o」,「p」,「q」,「r」,「s」,「t」,「u」,「v」,「w」,「x」,「y」,「z」,「A」,「B」,「C」,「D」,「E」,「F」,「G」,「H」,「I」,「J」,「K」,「L」,「M」,「N」,「O」,「P」,「Q」,「R」,「S」,「T」,「U」,「V」,「W」,「X」,「Y」,「Z」,「!」,「#」,「$」,「%」,「&」,「(」,「)」,「^」,「=」,「~」,「|」,「{」,「}」,「[」,「]」,「:」,「;」,「+」,「*」,「}」,「_」,「?」,「/」,「.」,「<」,「>」,「,」,「\\」,「\t」,「\b」,「\」」,「\"」,「\r」,「\n」,「，」,「：」, 「（墨付かっこ）」,「（墨付かっこ閉じる）」 ,「［」,「］」,「「」,「。」,「、」］
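The erase step can be sketched as a per-character filter. The sketch below is a simplification under stated assumptions: it handles only hiragana (detected by Unicode range rather than by an explicit list) and a handful of punctuation marks, whereas the example list above also covers alphabetic letters and many symbols. The function name and the `keep` parameter are hypothetical.

```python
# Hiragana that may be kept because they can join kanji strings.
EXCLUDED_CHARS = "のがいく"
# A small, illustrative subset of the punctuation in the erase list.
PUNCTUATION = "。、.,，"

def erase_characters(text, keep=EXCLUDED_CHARS):
    """Replace erase characters with blanks, one blank per character."""
    out = []
    for ch in text:
        # U+3041 to U+3096 covers the hiragana letters.
        is_hiragana = "\u3041" <= ch <= "\u3096"
        if (is_hiragana and ch not in keep) or ch in PUNCTUATION:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

sentence = "冷蔵庫の音が気になります。"
print(erase_characters(sentence).split())           # excluded characters kept
print(erase_characters(sentence, keep="").split())  # all hiragana erased
```

With the excluded characters kept, 「冷蔵庫の音が気」 survives as one string; with all hiragana erased, it breaks into separate kanji strings, which illustrates why the joining characters may be worth excluding.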

The unnecessary character string processing unit 175 extracts one or more predetermined unnecessary character strings and replaces them with blanks. For example, words that are used frequently on the Internet but carry no meaning for text analysis (such as "tag", "page", and "search") may be treated as unnecessary character strings.

The cut-out processing unit 177 performs cut-out processing: from the processed text produced by at least one of the specific character string processing unit 171, the erase character string processing unit 173, and the unnecessary character string processing unit 175, it extracts the character strings that lie between blanks and stores them in the extracted character string storage unit 19. If the first or last character of a cut-out character string is something other than kanji, the cut-out processing unit 177 removes that character and stores the resulting character string in the extracted character string storage unit 19. This is because hiragana, symbols, and the like that are not included in the erase characters may end up at the beginning or end of a string. As described later, the extracted character string storage unit 19 stores each extracted character string in association with its number of occurrences or frequency. Accordingly, when the cut-out processing unit 177 extracts the same character string more than once, it counts the number of occurrences or the frequency of that character string.
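A rough sketch of the cut-out step follows. It assumes that "kanji" can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF plus the iteration mark 「々」, and that any run of non-kanji characters at either end of a string is removed; the specification does not define either point precisely, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed approximation of "kanji" for this sketch.
KANJI = r"\u4e00-\u9fff々"
# Matches a run of non-kanji characters at the start or end of a string.
_TRIM = re.compile(rf"^[^{KANJI}]+|[^{KANJI}]+$")

def cut_out(processed_text):
    """Extract blank-delimited strings, trim non-kanji ends, count repeats."""
    counts = Counter()
    for token in processed_text.split():
        token = _TRIM.sub("", token)
        if token:  # strings with no kanji at all become empty and are dropped
            counts[token] += 1
    return counts

print(cut_out("冷蔵庫の音が  静音設計い の 冷蔵庫の音が"))
```

Interior joining characters such as 「の」 are kept, so 「冷蔵庫の音」 is counted as one string, while a trailing 「が」 or 「い」 is trimmed.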

The extracted character string storage unit 19 stores the character strings extracted from the text files in the blog data storage unit 15. It stores each character string in association with its number of occurrences or frequency.

When a category is specified, for example by a word-of-mouth information analysis request, the text analysis unit 17 reads one or more text files 152 in the folder 150 of the specified category and may apply to the text in those files the specific character string replacement by the specific character string processing unit 171, the erasure of erase characters (replacement with blanks) by the erase character string processing unit 173, and the removal of unnecessary character strings by the unnecessary character string processing unit 175. The cut-out processing unit 177 then performs the cut-out processing described above on the processed text, and the cut-out character strings are stored in the extracted character string storage unit 19. For example, when "washing machine" is specified as the category by a word-of-mouth information analysis request from a user terminal device 3, the above processing is performed on the files 152 stored in the washing machine folder 150a, and the character strings that characterize what blogs say about washing machines are extracted.

ネットワークインタフェース部１１は、上記の処理によって抽出文字列記憶部１９に格納された複数の文字列を、口コミ情報解析リクエストをしたユーザ端末装置３に対して出力する。例えば、ネットワークインタフェース部１１は、上記の処理により抽出した複数の頻出文字列（抽出文字列）を表示するための表示画面のデータをユーザ端末装置３へ向けて出力する。ネットワークインタフェース部１１から出力される複数の文字列は、それぞれの出現回数または出現頻度によってソートされた文字列リストでもよい。 The network interface unit 11 outputs the plurality of character strings stored in the extracted character string storage unit 19 by the above processing to the user terminal device 3 that has made the word-of-mouth information analysis request. For example, the network interface unit 11 outputs data of a display screen for displaying a plurality of frequent character strings (extracted character strings) extracted by the above process to the user terminal device 3. The plurality of character strings output from the network interface unit 11 may be a character string list sorted according to the number of appearances or the appearance frequency.

When the item ranking processing unit 21 receives, from the user terminal device 3, a second request (item analysis request) containing data that indicates one selected frequent character string chosen from the plurality of frequent character strings output by the network interface unit 11, it refers to the text storage unit (blog data storage unit 15) and counts, for each product or service, the appearance frequency or number of appearances of the selected frequent character string in the text describing the products or services of the category of the first request (word-of-mouth information analysis request). For example, on receiving an item analysis request in which one of the character strings output as the response to the word-of-mouth information analysis request has been selected, the item ranking processing unit 21 counts the number of appearances or the appearance frequency of the selected character string in each file of the category targeted by that request. In other words, the item ranking processing unit 21 counts, item by item, the number of appearances or the appearance frequency of the selected character string in the blog data stored in the blog data storage unit 15. The item ranking processing unit 21 then generates a ranking by product or service (an item list in which item names are sorted) based on the counting result.

この計数結果は、ネットワークインタフェース部１１によって、アイテム解析リクエストをしたユーザ端末装置３へ送信される。例えば、ネットワークインタフェース部１１は、上述のアイテム名リストを出力してもよい。また、ネットワークインタフェース部１１は、アイテムランキング処理部２１によって生成されたランキングを表示するための表示画面のデータをユーザ端末装置３へ向けて出力してもよい。 The count result is transmitted by the network interface unit 11 to the user terminal device 3 that has requested the item analysis. For example, the network interface unit 11 may output the item name list described above. Further, the network interface unit 11 may output display screen data for displaying the ranking generated by the item ranking processing unit 21 to the user terminal device 3.

The summary generation unit 23 generates a summary of the text of the blog data stored in the blog data storage unit 15. To generate the summary, the summary generation unit 23 includes an appearance count unit 231, a delimiter processing unit 232, and a sentence selection processing unit 233.

When the summary generation unit 23 receives, from the user terminal device 3, a third request (summary generation request) containing information that indicates a product or service selected from the ranking, it refers to the text storage unit (blog data storage unit 15) and generates a summary of the text describing the selected product or service.

The appearance count unit 231 counts the number of appearances of each word (character string) in the text of a file stored in the blog data storage unit 15. For example, the appearance count unit 231 performs the counting on the text before the delimiter processing unit 232 performs the delimiter processing. The counting result is used by the sentence selection processing unit 233. For extracting character strings and counting their number of appearances or appearance frequency, the appearance count unit 231 may perform the same processing as the text analysis unit 17.

On receiving a request to create a summary of the text in a file stored in the blog data storage unit 15, the delimiter processing unit 232 performs delimiter processing that divides the text in the file into individual sentences. For example, it reads the text in the file corresponding to the item specified in the summary creation request from the user terminal device 3 and separates the sentences in that text from one another, for example by detecting full stops (「。」).

The sentence selection processing unit 233 performs selection processing that selects, from the sentences delimited by the delimiter processing unit 232, those that end with a predetermined sentence-end character determined according to an attribute of the text. There may be several kinds of predetermined sentence-end characters, and each kind may be a single character or multiple characters. For example, when the text is a blog describing a product or service, the sentence-end characters may include at least 「足」, 「得」, 「念」, 「す」, 「い」, 「り」, 「る」, 「よ」, and 「ん」. When the text is a newspaper article, the sentence-end characters may include at least 「す」, 「た」, 「る」, and 「んだ」. When the text is a newspaper article, the sentence selection processing unit 233 may also select a sentence that begins with 「（」 and ends with 「）」. Further, when the text is a newspaper article and a sentence begins with 「この中で」 (“among these”), 「ただ」 (“however”), or 「このほか」 (“in addition”) and ends with 「す」, 「た」, 「る」, or 「んだ」, the sentence selection processing unit 233 may select the sentence immediately before it. This is because newspaper articles contain sentences that refer back to the preceding sentence and can be unintelligible without it. Also in the case of a newspaper article, when a full stop appears inside a character string enclosed in corner brackets (「 and 」), the above processing may be performed after replacing that full stop (「。」) with a comma (「、」).
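The ending-based selection can be illustrated with a minimal Python sketch. The ending lists are those given above; the attribute keys `"blog"` and `"news"`, the table name, and the function name are hypothetical, and the handling of a trailing full stop is an assumption rather than something the description specifies.

```python
# Sentence-end characters per text attribute, as listed in the description.
# The attribute keys "blog" and "news" are hypothetical labels.
SENTENCE_ENDINGS = {
    "blog": ("足", "得", "念", "す", "い", "り", "る", "よ", "ん"),
    "news": ("す", "た", "る", "んだ"),
}


def ends_with_summary_ending(sentence: str, attribute: str) -> bool:
    """Return True if the sentence ends with an ending listed for its attribute."""
    # Assumption: ignore a trailing full stop or space left over from splitting.
    body = sentence.rstrip("。 \u3000")
    # str.endswith accepts a tuple and handles multi-character endings such as 「んだ」.
    return body.endswith(SENTENCE_ENDINGS[attribute])
```

The same sentence can be selected under one attribute and rejected under another, which is the point of keeping a separate ending list per attribute.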

The inventor of the present application found that the endings of sentences that can serve as a summary of a text differ according to the attribute of the text, as described above. The sentence selection processing using sentence-end characters is based on this finding.

The sentence selection processing unit 233 may also select, from the sentences delimited by the delimiter processing unit 232, sentences containing a frequent word, that is, a word whose number of appearances in the text as counted by the appearance count unit 231 is at or above a predetermined value. For example, a sentence containing a character string that appears at least a predetermined number of times in the text of the target file is selected regardless of its sentence-end character. A character string may be treated as a frequent word if it appears three or more times when the text is shorter than 3000 characters, or four or more times when the text is shorter than 4000 characters. Beyond that, the threshold may be raised by one for every additional 1000 characters.
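As a rough illustration, the length-dependent threshold can be written as a small Python function. The function names are hypothetical, and the closed-form expression is one reading of “raised by one for every additional 1000 characters” that reproduces the two stated cases (3 below 3000 characters, 4 below 4000 characters).

```python
def frequent_word_threshold(text_length: int) -> int:
    """Minimum number of appearances for a string to count as a frequent word."""
    # Below 3000 characters the threshold is 3; each further 1000 characters adds 1.
    return max(3, text_length // 1000 + 1)


def frequent_words(counts: dict[str, int], text_length: int) -> set[str]:
    """Return the strings whose appearance count reaches the threshold."""
    threshold = frequent_word_threshold(text_length)
    return {word for word, n in counts.items() if n >= threshold}
```

Scaling the threshold with text length keeps long texts from flagging nearly every recurring string as frequent.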

ネットワークインタフェース部１１は、上述の処理によって生成された要約を表示するための表示画面のデータをユーザ端末装置３へ向けて出力する。 The network interface unit 11 outputs data of a display screen for displaying the summary generated by the above process to the user terminal device 3.

図５は、広告サーバ４の構成図である。 FIG. 5 is a configuration diagram of the advertisement server 4.

As shown in the figure, the advertisement server 4 includes a network interface unit 41, a web server 43, a web page data storage unit 45, and an advertisement data storage unit 47.

The web page data storage unit 45 stores web page data, including objects such as images to be placed on web pages, scripts, and the like. The web page data is written in, for example, HTML (HyperText Markup Language) or XML (Extensible Markup Language).

The advertisement data storage unit 47 stores a plurality of advertisement data entities for a plurality of products or services. The advertisement data is written in, for example, HTML (HyperText Markup Language) or XML (Extensible Markup Language). The advertisement data may include, for example, image data. The advertisement data includes identification information on the category and item of each product or service. For example, each advertisement data entity includes identification information of its category, such as “refrigerator” or “washing machine”, and identification information of the item given by manufacturer name and model name (or model number).

ネットワークインタフェース部４１は、ネットワーク９を介してユーザ端末装置３などネットワーク９に接続されている他の装置と通信を行う。例えば、ネットワークインタフェース部４１は、ユーザ端末装置３からのＨＴＴＰリクエストなどを受け付けて、そのＨＴＴＰリクエストなどに対するレスポンスとしてウェブページデータ及び広告データなどを送信する。 The network interface unit 41 communicates with other devices connected to the network 9 such as the user terminal device 3 via the network 9. For example, the network interface unit 41 receives an HTTP request from the user terminal device 3, and transmits web page data, advertisement data, and the like as a response to the HTTP request.

The web server 43 receives requests from the user terminal device 3 and processes them. For example, based on an HTTP (HyperText Transfer Protocol) request from the user terminal device 3, the web server 43 acquires web page data from the web page data storage unit 45 and acquires, from the advertisement data storage unit 47, advertisement image data for the product or service to be advertised. The web server 43 outputs display screen data for displaying advertisement images for one or more advertised products or services to the user terminal device 3 via the network interface unit 41. Here, the advertised products or services may be, for example, one or more of the products or services for which the blog analysis server 1 has collected blog data.

次に、上記のような構成を備える情報提供システムにおける処理手順の一例を、フローチャートを用いて説明する。 Next, an example of a processing procedure in the information providing system having the above configuration will be described using a flowchart.

First, as a premise for the following processing, the blog data collection unit 13 of the blog analysis server 1 has performed the processing shown in FIG. 3 in advance, so that blog data is stored in the blog data storage unit 15.

FIG. 6 shows the information provision processing procedure in the present embodiment. This procedure is described together with the screen transitions on the user terminal device 3 shown in FIGS. 7 to 10.

まず、広告サーバ４が、ウェブページデータ記憶部４５に記憶されているウェブページデータ及び広告データ記憶部４７に記憶されている広告データを取得して、ユーザ端末装置３へ送信する（Ｓ４００）。このデータ送信は、例えば、ユーザ端末装置３からのリクエストなどを契機として行っても良い。 First, the advertisement server 4 acquires the web page data stored in the web page data storage unit 45 and the advertisement data stored in the advertisement data storage unit 47 and transmits them to the user terminal device 3 (S400). This data transmission may be performed, for example, triggered by a request from the user terminal device 3.

ユーザ端末装置３では、広告サーバ４から送られてきたウェブページデータ及び広告データを受信して、ウェブブラウザを用いて広告画像を含むウェブページを表示する（Ｓ４２１）。 The user terminal device 3 receives the web page data and advertisement data sent from the advertisement server 4 and displays a web page including an advertisement image using a web browser (S421).

FIG. 7 shows an example of this web page 500. As shown in the figure, the web page 500 has a page content display area 501 and an advertisement image display area 502. In the present embodiment, the web page 500 also has a user selection area 503. When a user who wants to see word-of-mouth information about the advertisement in the advertisement image display area 502 selects the user selection area 503, the analysis result of the word-of-mouth information is provided according to the procedure described below.

While the web page 500 is displayed, when the user terminal device 3 accepts a predetermined user operation, such as the user selecting the user selection area 503, it sends a word-of-mouth information analysis request to the blog analysis server 1 (S423). The word-of-mouth information analysis request contains information indicating the category of the advertised product or service in the advertisement displayed in the advertisement image display area 502. When the category can be identified from the identification information of the advertised product or service (item identification information), that identification information may be included in the word-of-mouth information analysis request.

On receiving the word-of-mouth information analysis request, the blog analysis server 1 executes blog analysis processing (S441). This processing extracts frequent character strings, that is, strings that appear at least a predetermined number of times or with at least a predetermined frequency, from the blog data (word-of-mouth information) written about the advertised product or service and about products or services in the same category. Details of the blog analysis processing are described later. The blog analysis server 1 generates a word-of-mouth information analysis result screen that displays the frequent character strings extracted by the blog analysis processing, and transmits the screen data to the user terminal device 3 (S443).

ユーザ端末装置３では、ブログ解析サーバ１から送信された頻出文字列を含む口コミ情報解析結果画面を表示する（Ｓ４２５）。 The user terminal device 3 displays a word-of-mouth information analysis result screen including the frequent character string transmitted from the blog analysis server 1 (S425).

図８は、口コミ情報解析結果画面６００の一例を示す。同図に示すように、口コミ情報解析結果画面６００は、口コミ情報の解析を行ったカテゴリを示すカテゴリ表示６０１、及び口コミ情報の解析結果である頻出文字列リスト６０２を含む。頻出文字列リスト６０２では、出現回数または頻度により文字列がソートされている。 FIG. 8 shows an example of the word-of-mouth information analysis result screen 600. As shown in the figure, the word-of-mouth information analysis result screen 600 includes a category display 601 indicating a category for which word-of-mouth information has been analyzed, and a frequent character string list 602 that is the result of word-of-mouth information analysis. In the frequent character string list 602, character strings are sorted by the number of appearances or the frequency.

これにより、ユーザは、ブログに記載されている口コミ情報の中で、広告商品またはサービスと同じカテゴリの商品またはサービスについて、どのようなキーワードが多く使用されているのかを知ることができる。つまり、このキーワードにより、広告商品またはサービスに関して、消費者が何について高い関心を持っているかを知ることができる。例えば、図８の例では、消費者は、冷蔵庫の特性のうち「静か」、「満足」、「デザイン」、及び「機能」などに関心を持っていることがわかる。 Thereby, the user can know what keywords are frequently used for the product or service in the same category as the advertisement product or service in the word-of-mouth information described in the blog. In other words, it is possible to know what a consumer is highly interested in regarding an advertisement product or service by using this keyword. For example, in the example of FIG. 8, it can be seen that the consumer is interested in “quiet”, “satisfied”, “design”, “function”, and the like among the characteristics of the refrigerator.

口コミ情報解析結果画面６００が表示されているときに、例えば、ユーザが頻出文字列リスト６０２の中から一の頻出文字列を選択すると、ユーザ端末装置３はそれを受け付ける（Ｓ４２７）。この選択に基づいて、ユーザ端末装置３はブログ解析サーバ１へアイテム解析リクエストを送信する。このアイテム解析リクエストには、カテゴリ表示６０１に表示されているカテゴリ及び選択文字列を示す情報を含む。 When the word-of-mouth information analysis result screen 600 is displayed, for example, when the user selects one frequent character string from the frequent character string list 602, the user terminal device 3 accepts it (S427). Based on this selection, the user terminal device 3 transmits an item analysis request to the blog analysis server 1. This item analysis request includes information indicating the category displayed in the category display 601 and the selected character string.

ブログ解析サーバ１は、ユーザ端末装置３からのアイテム解析リクエストを受け付けて、アイテム解析処理を行う（Ｓ４４５）。アイテム解析では、アイテム解析リクエストに含まれているカテゴリの各アイテムの口コミ情報において、アイテム別に選択文字列が出現する回数または頻度を計数し、その結果に応じてアイテムランキングを生成する。アイテム解析の詳細は後述する。ブログ解析サーバ１は、アイテム解析結果に基づいて、アイテムランキング表示画面を生成し、この画面データをユーザ端末装置３へ送信する（Ｓ４４７）。 The blog analysis server 1 receives an item analysis request from the user terminal device 3 and performs an item analysis process (S445). In the item analysis, in the word-of-mouth information of each item in the category included in the item analysis request, the number or frequency of appearance of the selected character string for each item is counted, and the item ranking is generated according to the result. Details of the item analysis will be described later. The blog analysis server 1 generates an item ranking display screen based on the item analysis result, and transmits this screen data to the user terminal device 3 (S447).

ユーザ端末装置３では、ブログ解析サーバ１から送信されたアイテムランキングを含むアイテム解析結果画面を表示する（Ｓ４２９）。 The user terminal device 3 displays an item analysis result screen including the item ranking transmitted from the blog analysis server 1 (S429).

図９は、アイテム解析結果画面７００の一例を示す。同図に示すように、アイテム解析結果画面７００は、アイテム解析の対象となったカテゴリを示すカテゴリ表示７０１と、アイテム解析に用いた選択文字列を示す表示７０２と、アイテム解析結果であるアイテムランキング７０３とを含む。 FIG. 9 shows an example of the item analysis result screen 700. As shown in the figure, an item analysis result screen 700 includes a category display 701 indicating a category subjected to item analysis, a display 702 indicating a selected character string used for item analysis, and an item ranking which is an item analysis result. 703.

これにより、ユーザは、広告商品またはサービスと同じカテゴリの商品またはサービスに関してブログに記載されている口コミ情報の中で、自らが選択したキーワードが多く含まれている商品またはサービスを知ることができる。例えば、図９の例では、「冷蔵庫」の「デザイン」に関する口コミ情報が多いメーカ及び機種が何であるかを知ることができる。 Thereby, the user can know a product or service in which many keywords selected by the user are included in the word-of-mouth information described in the blog regarding the product or service in the same category as the advertisement product or service. For example, in the example of FIG. 9, it is possible to know what manufacturers and models have a lot of word-of-mouth information related to “design” of “refrigerator”.

アイテム解析結果画面７００が表示されているときに、例えば、ユーザがアイテムランキング７０３の中から一のアイテムを選択すると、ユーザ端末装置３はそれを受け付ける（Ｓ４３１）。この選択に基づいて、ユーザ端末装置３はブログ解析サーバ１へ、選択されたアイテムに関する口コミ情報の要約生成リクエストを送信する。要約生成リクエストには、選択されたアイテムを示す情報が含まれる。 When the item analysis result screen 700 is displayed, for example, when the user selects one item from the item ranking 703, the user terminal device 3 accepts it (S431). Based on this selection, the user terminal device 3 transmits to the blog analysis server 1 a request for generating a summary of word-of-mouth information related to the selected item. The summary generation request includes information indicating the selected item.

ブログ解析サーバ１は、この要約生成リクエストを受け付けると、要約生成処理を行う（Ｓ４４９）。この要約生成処理は、要約生成リクエストに含まれているアイテムについての口コミ情報の要約を生成する。要約生成処理の詳細は後述する。ブログ解析サーバ１は、生成した要約を表示するための要約表示画面を生成し、この画面データをユーザ端末装置３へ送る（Ｓ４５１）。 Upon receiving this summary generation request, the blog analysis server 1 performs summary generation processing (S449). In this summary generation process, a summary of word-of-mouth information about items included in the summary generation request is generated. Details of the summary generation process will be described later. The blog analysis server 1 generates a summary display screen for displaying the generated summary, and sends this screen data to the user terminal device 3 (S451).

ユーザ端末装置３は、この画面データを受け付けて、要約表示画面を表示させる。 The user terminal device 3 receives this screen data and displays a summary display screen.

FIG. 10 shows an example of the summary display screen 900. The summary display screen 900 includes a display area 901 for the summary generated by the summary generation processing and a display area 902 for the frequent words in the word-of-mouth information that was summarized.

これにより、ユーザは、自らが指定した商品またはサービスに関するブログの内容の要約（抜粋）を見ることができる。つまり、ユーザは、自らが指定した商品またはサービスに関するブログの全体を読む必要がなく、その重要な部分を抜粋して読むことができる。 Thereby, the user can see the summary (extract) of the content of the blog regarding the product or service designated by the user. That is, the user does not need to read the entire blog regarding the product or service designated by the user, and can extract and read the important part.

図１１は、口コミ情報解析リクエスト（ブログデータ解析リクエスト）処理を示すフローチャートである。 FIG. 11 is a flowchart showing word-of-mouth information analysis request (blog data analysis request) processing.

まず、ネットワークインタフェース部１１が、ユーザ端末装置３から、広告商品またはサービスのカテゴリを含む口コミ情報解析リクエストを受け付ける（Ｓ３１）。 First, the network interface unit 11 receives a word-of-mouth information analysis request including a category of advertisement product or service from the user terminal device 3 (S31).

テキスト解析部１７は、ブログデータ記憶部１５に生成されているフォルダのうち、口コミ情報解析リクエストにかかるカテゴリのフォルダに格納されている全ファイルを読み込む（Ｓ３３）。 The text analysis unit 17 reads all the files stored in the folder of the category related to the word-of-mouth information analysis request among the folders generated in the blog data storage unit 15 (S33).

そして、テキスト解析部１７は、ここで読み込んだファイルのテキストに対して、テキスト解析処理を行う（Ｓ３５）。テキスト解析処理により抽出された文字列が、抽出文字列記憶部１９に格納される。このテキスト解析処理の詳細な処理手順は後述する。 Then, the text analysis unit 17 performs a text analysis process on the text of the file read here (S35). The character string extracted by the text analysis process is stored in the extracted character string storage unit 19. The detailed processing procedure of this text analysis processing will be described later.

The network interface unit 11 outputs, to the user terminal device 3 that sent the word-of-mouth information analysis request, the data of the word-of-mouth information analysis result screen 600, which displays an extracted character string list in which the character strings extracted in step S35 are sorted by number of appearances (S37).

これによって、ユーザは、口コミ情報を含むブログデータから、広告商品またはサービスのカテゴリに関する記述の中で頻繁に使用されている文字列（キーワード）をリアルタイムで知ることができる。なお、ここで抽出される文字列の多くは名詞である。 Thereby, the user can know in real time the character string (keyword) frequently used in the description about the category of the advertising product or service from the blog data including the word-of-mouth information. Note that many of the character strings extracted here are nouns.

次に、図１２は、図１１のステップＳ３５のテキスト解析処理の詳細な手順を示すフローチャートである。 Next, FIG. 12 is a flowchart showing a detailed procedure of the text analysis process in step S35 of FIG.

まず、特定文字列処理部１７１が、読み込んだテキストの中から、予め定められている一以上の特定文字列を抽出し、それぞれの出現回数を計数する。そして、特定文字列処理部１７１は、ここで抽出された特定文字列及びそれぞれの出現回数を抽出文字列記憶部１９に保存する（Ｓ５１）。特定文字列処理部１７１は、さらに、読み込んだテキストにおいて、ここで抽出された特定文字列を空白に置換する（Ｓ５３）。 First, the specific character string processing unit 171 extracts one or more predetermined specific character strings from the read text, and counts the number of appearances of each. Then, the specific character string processing unit 171 stores the specific character string extracted here and the number of appearances thereof in the extracted character string storage unit 19 (S51). The specific character string processing unit 171 further replaces the specific character string extracted here with a blank in the read text (S53).

次に、消去文字列処理部１７３は、特定文字列が空白に置換されたテキストにおいて、予め定められている一以上の消去文字を空白に置換する（Ｓ５５）。 Next, the erased character string processing unit 173 replaces one or more predetermined erase characters with a blank in the text in which the specific character string is replaced with a blank (S55).

次に、不要文字列処理部１７５は、特定文字列及び消去文字が空白に置換されたテキストにおいて、予め定められている一以上の不要文字列を空白に置換する（Ｓ５７）。 Next, the unnecessary character string processing unit 175 replaces one or more predetermined unnecessary character strings with blanks in the text in which the specific character string and the erased character are replaced with blanks (S57).

Then, on the text in which the specific character strings, erase characters, and unnecessary character strings have been replaced with blanks, the cut-out processing unit 177 separates the character strings delimited by blanks (S59). If a character string separated in step S59 has characters other than kanji at its beginning or end, the cut-out processing unit 177 removes them (S61). It then counts the number of appearances of each resulting character string and stores each character string with its number of appearances in the extracted character string storage unit 19 (S63).
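Steps S51 to S63 can be sketched in Python as follows. This is only an illustration under stated assumptions: the three word lists are hypothetical examples (the description says only that they are predetermined), and “kanji” is approximated by the basic CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
import re
from collections import Counter

# Hypothetical example lists; the description only says they are predetermined.
SPECIFIC_STRINGS = ["省エネ"]
ERASE_STRINGS = ["、", "。"]
UNNECESSARY_STRINGS = ["です", "ます", "の", "が", "は", "を", "に"]

# Assumption: kanji are approximated by the range U+4E00..U+9FFF.
NON_KANJI_EDGES = re.compile(r"^[^\u4e00-\u9fff]+|[^\u4e00-\u9fff]+$")


def extract_strings(text: str) -> Counter:
    """Return extracted character strings with their numbers of appearances."""
    counts = Counter()
    # S51, S53: count each specific string, then replace it with a blank.
    for s in SPECIFIC_STRINGS:
        n = text.count(s)
        if n:
            counts[s] += n
        text = text.replace(s, " ")
    # S55, S57: replace erase strings and unnecessary strings with blanks.
    for s in ERASE_STRINGS + UNNECESSARY_STRINGS:
        text = text.replace(s, " ")
    # S59: separate the strings delimited by blanks.
    for token in text.split():
        # S61: remove non-kanji characters at the beginning and end.
        token = NON_KANJI_EDGES.sub("", token)
        if token:
            counts[token] += 1  # S63: count each resulting string
    return counts
```

Calling `extract_strings(...).most_common()` yields the strings sorted by number of appearances, which corresponds to the sorted list output in step S37.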

Through the processing of FIGS. 11 and 12, the character strings extracted from the blog data for the category specified by the user are output to the user terminal device 3 as the response to the word-of-mouth information analysis request (blog analysis request). The item analysis request processing described next handles a request containing a selected character string chosen from the extracted character strings output to the user terminal device 3.

次に、図１３は、アイテム解析リクエスト処理を示すフローチャートである。 Next, FIG. 13 is a flowchart showing item analysis request processing.

まず、ネットワークインタフェース部１１が、ユーザ端末装置３から、ユーザが抽出文字列リストの中から選択した文字列及び口コミ情報解析の対象カテゴリを含むアイテム解析リクエストを受け付ける（Ｓ７１）。 First, the network interface unit 11 receives from the user terminal device 3 an item analysis request including a character string selected by the user from the extracted character string list and a category for word-of-mouth information analysis (S71).

アイテムランキング処理部２１は、アイテム解析リクエストにかかるカテゴリのフォルダ１５０に含まれているテキストファイル１５２を読み込む（Ｓ７３）。そして、アイテムランキング処理部２１は、ここで読み込んだテキストの中から、ユーザが選択した選択文字列の出現回数を、ファイル別に計数する（Ｓ７５）。 The item ranking processing unit 21 reads the text file 152 included in the folder 150 of the category related to the item analysis request (S73). Then, the item ranking processing unit 21 counts the number of appearances of the selected character string selected by the user from the text read here for each file (S75).

According to this counting result, the item ranking processing unit 21 sorts the item names corresponding to the files, generates the data of the item analysis result screen 700 that displays the item ranking, and outputs it to the user terminal device 3 that sent the item analysis request (S77).

これにより、ユーザは、どのアイテムで、選択文字列が多く使用されているかを知ることができる。 Thereby, the user can know in which item the selected character string is frequently used.
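The per-file counting and sorting of steps S73 to S77 can be sketched as below. The sketch assumes, for illustration only, that each file has already been read into a mapping from item name to text; the function name and the sample item names in the usage are hypothetical.

```python
def rank_items(files: dict[str, str], selected: str) -> list[tuple[str, int]]:
    """Rank items by how often the selected string appears in each item's text."""
    # S75: count the appearances of the selected string file by file.
    counts = {item: text.count(selected) for item, text in files.items()}
    # S77: sort item names by count, highest first (ties keep input order).
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
```

For example, `rank_items({"A社 X100": "デザインが良い。デザイン最高。", "B社 Y200": "デザインは普通。"}, "デザイン")` places “A社 X100” first with a count of 2.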

図１４は、要約文作成処理手順を示すフローチャートである。 FIG. 14 is a flowchart showing a summary sentence creation processing procedure.

まず、ネットワークインタフェース部１１が、ユーザ端末装置３から、ユーザが選択したアイテム（商品またはサービス）を含む要約作成リクエストを受け付ける（Ｓ８１）。 First, the network interface unit 11 receives a summary creation request including an item (product or service) selected by the user from the user terminal device 3 (S81).

要約生成部２３は、ブログデータ記憶部１５に生成されているフォルダのうち要約作成リクエストにかかるアイテムのファイルを読み込む（Ｓ８３）。 The summary generation unit 23 reads the file of the item related to the summary creation request among the folders generated in the blog data storage unit 15 (S83).

The appearance count unit 231 divides the read text into words and counts the number of appearances of each word (S84).

The delimiter processing unit 232 replaces predetermined symbols contained in the read text (S85). The symbols to be replaced may be, for example, the blank, asterisk 「＊」, colon 「：」, comma 「、」, middle dot 「・」, circles 「●」 and 「○」, double circle 「◎」, squares 「■」 and 「□」, and lenticular brackets, and may be either half-width or full-width symbols. In the present embodiment, each symbol to be replaced is replaced with a full stop 「。」.

The delimiter processing unit 232 divides the text, with the symbols replaced, into individual sentences (S87). That is, the delimiter processing unit 232 detects full stops in the text and divides the text at each detected full stop.
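Steps S85 and S87 can be sketched in Python as follows. The symbol set is an assumption assembled from the examples in the description (with 【 and 】 standing in for the lenticular brackets), and the function name is hypothetical.

```python
# Symbols to replace, in half-width and full-width forms where both exist.
# The exact set is an assumption based on the examples in the description.
DELIMITER_SYMBOLS = " \u3000*＊:：,、・●○◎■□【】"

# Translation table mapping every delimiter symbol to the full stop 「。」.
_TO_FULL_STOP = str.maketrans({ch: "。" for ch in DELIMITER_SYMBOLS})


def split_sentences(text: str) -> list[str]:
    """Replace delimiter symbols with full stops, then split into sentences."""
    normalized = text.translate(_TO_FULL_STOP)  # S85
    # S87: split at each full stop and drop empty fragments.
    return [s for s in normalized.split("。") if s]
```

Normalizing every delimiter to a single character first means the splitting step needs to look for only one symbol, which suits blog text where authors often use bullets or spaces instead of full stops.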

The sentence selection processing unit 233 identifies one sentence to process from among the sentences delimited in step S87 (S89). It then determines whether the target sentence contains a frequent word (S91). As noted above, the occurrence-count threshold that defines a frequent word may vary with the text length.
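
One way to derive the frequent words from the word counts is sketched below. The threshold rule is purely hypothetical: the text only states that the threshold may vary with the text length, so `base`, `per_chars`, and the formula itself are assumptions chosen for illustration.

```python
def frequent_words(counts, text_length, base=2, per_chars=1000):
    """Return the words whose count reaches a length-dependent threshold.

    The rule `base + text_length // per_chars` is a hypothetical example;
    the source only says the threshold may vary with the text length.
    """
    threshold = base + text_length // per_chars
    return {word for word, n in counts.items() if n >= threshold}


# With a 500-character text the threshold is 2, so only "battery" qualifies.
print(frequent_words({"battery": 3, "color": 1}, text_length=500))
```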

When the target sentence contains no frequent word (S91: No), it is determined whether the target sentence ends with one of the specific sentence endings predetermined for each text attribute (S93). Examples of the sentence endings for each text attribute are as described above.

When a frequent word is included (S91: Yes), or when the target sentence ends with a specific sentence ending for the text attribute (S93: Yes), the target sentence is selected as a sentence to be included in the summary (S95).

When the target sentence does not end with a specific sentence ending for the text attribute (S93: No), processing of that sentence ends, and it is determined whether all sentences in the read text have been processed (S97). If not all sentences have been processed (S97: No), the process returns to step S89 and repeats. Once all sentences have been processed (S97: Yes), the blog analysis server 1 outputs the one or more sentences selected in step S95, as the summary, to the user terminal device 3 that sent the summary creation request (S99).

In the processing described above, either step S91 or step S93 may be omitted. That is, the sentence selection processing unit 233 may select only sentences containing a frequent word, or only sentences having a specific sentence ending. Step S85 may also be omitted.
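
The selection loop of steps S89 to S97, including the option of omitting S91 or S93, can be sketched as follows. The function name `select_sentences`, its flags, and the English sample sentences and endings are hypothetical. The actual sentence endings are Japanese characters defined per text attribute elsewhere in the description.

```python
def select_sentences(sentences, frequent_words, endings,
                     use_frequent_words=True, use_endings=True):
    """Pick the sentences that go into the summary (steps S89 to S97).

    A sentence is kept if it contains a frequent word (S91) or ends with
    one of the endings for the text attribute (S93). Either check can be
    switched off, mirroring the note that S91 or S93 may be omitted.
    """
    selected = []
    for sentence in sentences:  # S89: take the next target sentence
        has_frequent = use_frequent_words and any(
            word in sentence for word in frequent_words)  # S91
        has_ending = use_endings and sentence.endswith(tuple(endings))  # S93
        if has_frequent or has_ending:
            selected.append(sentence)  # S95
    return selected  # the caller would output this as the summary (S99)


sentences = ["The battery lasts long", "I bought it last week",
             "Overall I am satisfied"]
print(select_sentences(sentences, {"battery"}, ["recommended", "satisfied"]))
# prints ['The battery lasts long', 'Overall I am satisfied']
```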

According to the embodiment of the present invention described above, character strings that characterize the content described in a text such as a blog can be extracted quickly.

Further, according to the embodiment of the present invention, character strings extracted from text such as blogs can be processed statistically.

Further, according to the embodiment of the present invention, the advertising effect can be enhanced by using word-of-mouth information.

Further, according to the embodiment of the present invention, word-of-mouth information about an advertised product or service can be provided.

Further, according to the embodiment of the present invention, information that is useful to the user can be selected from the word-of-mouth information about an advertised product or service and provided to the user.

For example, when an Internet user sees an advertisement, the user can efficiently learn the word-of-mouth information about the advertised product or service, which helps the user decide whether to purchase it. In addition, if the seller can objectively show that the advertised product has a large amount of positive word-of-mouth information, the seller can give users a strong motivation to purchase and thereby increase the advertising effect.

The embodiments of the present invention described above are examples for explaining the present invention and are not intended to limit its scope to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from its gist.

For example, although the embodiment described above deals with text processing of blog data, the present invention is also applicable to text other than blogs.

DESCRIPTION OF SYMBOLS
1 Blog analysis server
3 User terminal device
4 Advertisement server
5 Blog server
11 Network interface unit
13 Blog data collection unit
15 Blog data storage unit
17 Text analysis unit
19 Extracted character string storage unit
21 Item ranking processing unit
23 Summary generation unit
41 Network interface unit
43 Web server
45 Page data storage unit
47 Advertisement data storage unit
171 Specific character string processing unit
173 Erase character string processing unit
175 Unnecessary character string processing unit
177 Cutout processing unit
231 Appearance count unit
232 Delimiter processing unit
233 Sentence selection processing unit
500 Web page
600 Word-of-mouth information analysis result screen
700 Item analysis result screen
900 Summary display screen

Claims

1. A summary creation device comprising:
a text storage unit that stores a file including text containing a plurality of sentences;
a delimiter processing unit that, upon receiving a summary creation request for the text in a file stored in the text storage unit, divides the text in the file into individual sentences;
a selection processing unit that selects, from among the plurality of sentences delimited by the delimiter processing unit, a sentence ending with a predetermined end-of-sentence character determined according to an attribute of the text; and
output means for outputting the sentence selected by the selection processing unit as a response to the summary creation request.

2. The summary creation device according to claim 1, wherein the selection processing unit selects, from among the plurality of sentences delimited by the delimiter processing unit, a sentence containing a frequent word that appears in the text a predetermined number of times or more.

3. The summary creation device according to claim 1, wherein, when the text is a blog describing a product or service, the end-of-sentence characters include at least “foot”, “profit”, “mind”, “su”, “i”, “ri”, “ru”, “ru”, “yo”, and “n”.

4. The summary creation device according to claim 1, wherein, when the text is a newspaper article, the end-of-sentence characters include at least “su”, “ta”, “ru”, and “dan”.

5. The summary creation device according to claim 4, wherein, when the text is a newspaper article, the selection processing unit selects a sentence that begins with “(” and ends with “”.

6. The summary creation device according to claim 4, wherein, when the text is a newspaper article, the selection processing unit selects the sentence immediately preceding a sentence that begins with any one of “in this”, “just”, and “other” and ends with any one of “s”, “ta”, “RU”, and “DA”.

7. A method of creating a text summary, performed by a computer, the method comprising:
storing a file including text containing a plurality of sentences in a text storage unit;
upon receiving a summary creation request for the text in a file stored in the text storage unit, dividing the text in the file into individual sentences;
selecting, from among the plurality of divided sentences, a sentence ending with a predetermined end-of-sentence character determined according to an attribute of the text; and
outputting the selected sentence as a response to the summary creation request.

8. A computer program for creating a text summary, the program causing a computer to execute the steps of:
storing a file including text containing a plurality of sentences in a text storage unit;
upon receiving a summary creation request for the text in a file stored in the text storage unit, dividing the text in the file into individual sentences;
selecting, from among the plurality of divided sentences, a sentence ending with a predetermined end-of-sentence character determined according to an attribute of the text; and
outputting the selected sentence as a response to the summary creation request.