JP5642058B2

JP5642058B2 - Attention word analysis method and attention word analysis system

Info

Publication number: JP5642058B2
Application number: JP2011284329A
Authority: JP
Inventors: 裕也小松; 豊久森田; 英志木村
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2011-12-26
Filing date: 2011-12-26
Publication date: 2014-12-17
Anticipated expiration: 2031-12-26
Also published as: JP2013134612A

Description

The present invention relates to an attention word analysis method and an attention word analysis system for analyzing words that are attracting attention in collected text data, and in particular to the analysis of text data, such as Web pages, whose collected volume changes over time.

One approach to extracting useful information from a text data group is to apply text mining and obtain the degree of attention of each word within the text data group.

A widely used technique counts the number of appearances of each word in the text data group and treats words with a high count as the attention words of that text data group.

In addition, some techniques evaluate words using not only the raw number of appearances but also additional information contained in the same document. For example, Japanese Patent Application Laid-Open No. 2011-70252 (Patent Document 1) presents a technique for document data accumulated in CGM (Consumer Generated Media), such as user-written blogs and SNS (Social Networking Service) posts. It determines an evaluation for each word by considering the number of appearances of the word and its degree of adjacency to words, prepared in advance, that relate to evaluations and impressions, and then analyzes the needs of the market as a whole and how they change.

Japanese Patent Application Laid-Open No. 2005-258678 (Patent Document 2) counts, for Web pages collected in a given period and in the preceding period, the appearance frequency of each word and the number of documents in which the word appears, and calculates the weight of the word in a document from that appearance frequency and document count. It then calculates a topic degree from the amount of change in the word weight between the given period and the preceding period, and describes a technique for obtaining a list of topical words based on the topic degree.

Patent Document 1: JP 2011-70252 A
Patent Document 2: JP 2005-258678 A

Patent Document 1 shows a method that collects text data on the Web and obtains information on user preferences and market needs by supplementing the number of appearances of each word with the relationship between words that carry a previously prepared evaluation and the words appearing near them.

However, when text data updated on the Web during a given period is collected and analyzed, the number of text data items updated in each period varies greatly. The results are therefore affected by the number of updated text data items, and simply counting word appearance frequencies can hardly be said to yield the words attracting attention in the market.

Patent Document 2 can obtain a list of topical words at a given point in time. However, the index calculated to extract the topical word list uses word appearance frequency and is therefore affected by the number of documents analyzed. Consequently, when the number of documents collected in each period changes, the time-series change of topical words cannot be evaluated.

Accordingly, an object of the present invention is to provide an attention word analysis method and an attention word analysis system that can evaluate the time-series change in the degree of attention of a word, without being affected by the number of text data items, for a collection in which the number of text data items to be analyzed changes from period to period.

The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

A representative aspect of the invention disclosed in the present application is briefly outlined as follows.

Namely, in a representative aspect, a text data analysis unit acquires an analysis target word list that a user has entered through an input unit, counts the appearance frequencies of the plurality of words in the analysis target word list within a text data group, calculates a word appearance ratio from the appearance frequencies across those words, and analyzes which words are attracting relatively more attention.

A further representative aspect comprises an input unit through which a user enters an analysis target word list, and a text data analysis unit that counts the appearance frequencies of the plurality of words in the analysis target word list within a text data group, calculates a word appearance ratio from the appearance frequencies across those words, and analyzes which words are attracting relatively more attention.

The effect obtained by a representative aspect of the invention disclosed in the present application is briefly described as follows.

Namely, when analyzing attention words in a text data group, the influence of the number of text data items is reduced, making it possible to evaluate the time-series change in the degree of attention of a word.

FIG. 1 is a block diagram showing the configuration of an attention word analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the data flow when the attention word analysis system according to an embodiment of the present invention performs text data analysis.
FIG. 3 is a flowchart showing the processing of the analysis target text data set determination unit of the attention word analysis system according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the attention word and comparison word list used in the attention word analysis system according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of the analysis target word list used in the attention word analysis system according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the analysis target text data set used in the attention word analysis system according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the processing of the word frequency calculation unit of the attention word analysis system according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the processing of the relative word appearance ratio calculation unit of the attention word analysis system according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example of the tables created by the word frequency calculation unit of the attention word analysis system according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of calculated word appearance ratios in the attention word analysis system according to an embodiment of the present invention.
FIG. 11 is a diagram showing an output example of the attention word analysis system according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. In all drawings used to describe the embodiments, the same members are in principle given the same reference signs, and repeated description of them is omitted.

<Configuration of the attention word analysis system>
The configuration of the attention word analysis system according to an embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the attention word analysis system according to an embodiment of the present invention.

In FIG. 1, the attention word analysis system 101 runs on a computer system made up of one or more computers having computing capability, such as a central processing unit, and comprises an analysis management subsystem 200, a text data analysis subsystem 300 serving as the text data analysis unit, and a text data group 400.

The analysis management subsystem 200 comprises an input unit 201 and a display unit 202. It sends requests received from the user 100 through the input unit 201 to the text data analysis subsystem 300 and displays the results on the display unit 202.

The text data analysis subsystem 300 comprises an analysis target text data set determination unit 301, a word frequency calculation unit 302, and a relative word appearance ratio calculation unit 303. It analyzes the words attracting attention in the text data in response to a request entered by the user 100.

The analysis target text data set determination unit 301 extracts from the text data group 400 the set of text data that contains the attention word and the comparison words (the analysis target text data set). The word frequency calculation unit 302 counts the appearance frequency of the words contained in the analysis target text data set. The relative word appearance ratio calculation unit 303 calculates the word appearance ratio from the appearance frequency of the attention word and the appearance frequencies of the comparison words.

The text data group 400 is a database that stores text data in association with its update time, and is the source from which the analysis target text data set determination unit 301 extracts text data sets.
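A minimal sketch of how the text data group 400 might be represented, assuming each record simply pairs an update date with its text. The names `TextRecord` and `group_by_period`, the `YYYY/M` period key, and the sample sentences are illustrative assumptions and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TextRecord:
    """One entry of the text data group: an update date linked to text."""
    updated: date
    text: str


def group_by_period(records):
    """Bucket records by update year/month, e.g. '2011/4' (assumed key format)."""
    buckets = defaultdict(list)
    for record in records:
        key = f"{record.updated.year}/{record.updated.month}"
        buckets[key].append(record)
    return dict(buckets)


# Invented sample records for illustration only.
records = [
    TextRecord(date(2011, 4, 3), "Tried a new black coffee today."),
    TextRecord(date(2011, 4, 20), "Sweetened coffee is still my favourite."),
    TextRecord(date(2011, 5, 1), "Black coffee again this morning."),
]
periods = group_by_period(records)
```

Grouping by month up front keeps each later step a per-period operation, which matches the monthly periods used in the embodiment's examples.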

<Processing of the attention word analysis system>
Next, the processing of the attention word analysis system according to an embodiment of the present invention is described with reference to FIGS. 2 to 11. FIG. 2 is a diagram showing the data flow when the attention word analysis system performs text data analysis. FIG. 3 is a flowchart showing the processing of the analysis target text data set determination unit. FIG. 4 is a diagram showing an example of the attention word and comparison word list used in the attention word analysis system. FIG. 5 is a diagram showing an example of the analysis target word list. FIG. 6 is a diagram showing an example of the analysis target text data set.

FIG. 7 is a flowchart showing the processing of the word frequency calculation unit. FIG. 8 is a flowchart showing the processing of the relative word appearance ratio calculation unit. FIG. 9 is a diagram showing an example of the tables created by the word frequency calculation unit. FIG. 10 is a diagram showing an example of calculated word appearance ratios. FIG. 11 is a diagram showing an output example of the attention word analysis system.

First, regarding the data flow of the attention word analysis system as a whole, as shown in FIG. 2, the input unit 201 sends the attention word and comparison word list received from the user 100 to the analysis target text data set determination unit 301 (S201).

The analysis target text data set determination unit 301 extracts the analysis target text data set from the text data group 400 based on the received attention word and comparison word list (S202).

The analysis target text data set determination unit 301 then sends the analysis target text data set to the word frequency calculation unit 302 (S203). The word frequency calculation unit 302 counts the word appearance frequencies in the received analysis target text data set and sends them to the relative word appearance ratio calculation unit 303 (S204). The relative word appearance ratio calculation unit 303 sends the word appearance ratios to the display unit 202 (S205).

The details of each process are described below.

First, as shown in FIG. 3, the analysis target text data set determination unit 301 acquires the attention word and one or more comparison words (the comparison word list) from the user 100 through the input unit 201 (S301).

An example of the attention word and comparison word list to be entered is shown in FIG. 4. In FIG. 4, as an example, the attention word 401 is "black coffee" and the list of comparison words 402 is "sweetened coffee" and "low-sugar coffee". "Black coffee" is the product name whose degree of market attention the user wants to know, and "sweetened coffee" and "low-sugar coffee" are product names that the user regards as competitors of "black coffee" in the coffee market.

The attention word and the comparison word list are then combined and held as the analysis target word list. The analysis target word list also records, for each word, whether it is an attention word or a comparison word. An example of the analysis target word list is shown in FIG. 5. The list shown in FIG. 5 contains "black coffee", "sweetened coffee", and "low-sugar coffee", together with the classification of each word as an attention word or a comparison word.
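The combined list with its classification column could be built along the following lines. This is only a sketch: `TargetWord` and `build_target_word_list` are hypothetical names, and the check for at least one comparison word reflects the "one or more comparison words" wording of step S301.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetWord:
    """One row of the analysis target word list (hypothetical structure)."""
    word: str
    classification: str  # "attention word" or "comparison word"


def build_target_word_list(attention_word, comparison_words):
    """Merge the attention word and comparison words into one classified list."""
    if not comparison_words:
        raise ValueError("at least one comparison word is required")
    entries = [TargetWord(attention_word, "attention word")]
    entries += [TargetWord(w, "comparison word") for w in comparison_words]
    return entries


word_list = build_target_word_list(
    "black coffee", ["sweetened coffee", "low-sugar coffee"]
)
```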

Next, text data containing words in the analysis target word list is extracted from the text data group 400 (S302). The extracted text data forms the analysis target text data set. An example of the analysis target text data set is shown in FIG. 6. Here, text data is extracted with "black coffee" as the attention word and "sweetened coffee" and "low-sugar coffee" as the comparison word list.

In the example shown in FIG. 6, analysis target text data sets are extracted from two text data groups, one updated in 2011/4 and one updated in 2011/5. In FIG. 6, 601 is the analysis target text data set for 2011/4, and 602 is the analysis target text data set for 2011/5.

Finally, the analysis target text data set for each period is sent to the word frequency calculation unit 302 (S303).
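Steps S302 and S303 can be sketched as a per-period filter. The patent does not say how a word is matched against a text, so the case-insensitive substring test below is an assumption (Japanese text would more likely need morphological analysis), and `extract_target_set` and the sample texts are invented for illustration.

```python
def extract_target_set(texts, target_words):
    """Keep only texts that mention at least one target word (S302).

    Matching is a plain case-insensitive substring test, which is an
    assumption made for this sketch.
    """
    lowered_words = [w.lower() for w in target_words]
    return [t for t in texts if any(w in t.lower() for w in lowered_words)]


target_words = ["black coffee", "sweetened coffee", "low-sugar coffee"]

# Invented sample texts, grouped by update month.
texts_by_period = {
    "2011/4": [
        "I love black coffee.",
        "The weather is nice today.",
        "Sweetened coffee after lunch.",
    ],
    "2011/5": [
        "Low-sugar coffee is trending.",
        "Bought a new bike.",
    ],
}

# One analysis target text data set per period, ready to hand on (S303).
target_sets = {
    period: extract_target_set(texts, target_words)
    for period, texts in texts_by_period.items()
}
```

Texts that mention none of the listed words are dropped before any counting happens, so unrelated posts in a busy month do not enter the later ratio.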

As shown in FIG. 7, the word frequency calculation unit 302, having received the analysis target text data set from the analysis target text data set determination unit 301, counts the number of times each word appears in that set (the word appearance frequency) (S701). The word appearance frequency is counted separately for each word in the analysis target word list.

The word appearance frequencies are then sent to the relative word appearance ratio calculation unit 303 (S702).
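A rough sketch of the counting in S701 follows. It assumes that every occurrence of a word counts, including repeats within one text; the source text does not settle whether occurrences or containing texts are counted, so that choice, like the name `count_word_frequencies` and the sample texts, is an assumption.

```python
def count_word_frequencies(texts, target_words):
    """Count occurrences of each target word across the given texts (S701)."""
    counts = {word: 0 for word in target_words}
    for text in texts:
        lowered = text.lower()
        for word in target_words:
            # str.count tallies non-overlapping occurrences of the substring.
            counts[word] += lowered.count(word.lower())
    return counts


# Invented analysis target text data set for a single period.
texts = [
    "Black coffee, then more black coffee.",
    "Sweetened coffee for me.",
    "Black coffee beats low-sugar coffee.",
]
counts = count_word_frequencies(
    texts, ["black coffee", "sweetened coffee", "low-sugar coffee"]
)
```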

As shown in FIG. 8, the relative word appearance ratio calculation unit 303, having received the word appearance frequencies from the word frequency calculation unit 302, first creates an analysis target word table and a word appearance frequency table for calculating the word appearance ratio (S801).

An example of the analysis target word table and the word appearance frequency table is shown in FIG. 9. In FIG. 9, the analysis target word table 901 has the attributes analysis target word list ID, word name, classification, and word ID, and the word appearance frequency table 902 has the attributes analysis target word list ID, word ID, measurement period, and appearance frequency.

The analysis target word list ID stores a value that uniquely identifies an analysis target word list, such as "1". The word name stores an attention word or comparison word entered by the user, such as "black coffee", "sweetened coffee", or "low-sugar coffee". The classification stores either "attention word" or "comparison word". The word ID stores a value that uniquely identifies a word, such as "1", "2", or "3".

The measurement period stores the year and month in which the analyzed text data was updated, such as "2011/4" or "2011/5". The appearance frequency stores the word appearance frequency obtained from the word frequency calculation unit 302, such as "1", "2", "3", or "5".
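The two tables can be mirrored as simple row types. In this sketch the field names follow the attributes listed for tables 901 and 902, while `WordRow`, `FrequencyRow`, `build_tables`, and all frequency values are assumptions made for illustration; the actual contents of FIG. 9 are not reproduced.

```python
from typing import NamedTuple


class WordRow(NamedTuple):
    """Row of the analysis target word table 901."""
    list_id: int
    word_name: str
    classification: str
    word_id: int


class FrequencyRow(NamedTuple):
    """Row of the word appearance frequency table 902."""
    list_id: int
    word_id: int
    period: str
    frequency: int


def build_tables(list_id, classified_words, counts_by_period):
    """Create both tables (S801) from a classified word list and per-period counts."""
    word_table = [
        WordRow(list_id, name, classification, word_id)
        for word_id, (name, classification) in enumerate(classified_words, start=1)
    ]
    ids = {row.word_name: row.word_id for row in word_table}
    frequency_table = [
        FrequencyRow(list_id, ids[name], period, frequency)
        for period, counts in counts_by_period.items()
        for name, frequency in counts.items()
    ]
    return word_table, frequency_table


# Invented counts for two measurement periods.
word_table, frequency_table = build_tables(
    1,
    [
        ("black coffee", "attention word"),
        ("sweetened coffee", "comparison word"),
        ("low-sugar coffee", "comparison word"),
    ],
    {
        "2011/4": {"black coffee": 5, "sweetened coffee": 3, "low-sugar coffee": 2},
        "2011/5": {"black coffee": 3, "sweetened coffee": 1, "low-sugar coffee": 1},
    },
)
```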

Next, the word appearance ratio is calculated (S802).

The word appearance ratio of the attention word is calculated as $|A| / |A \cup B_1 \cup B_2 \cup \dots \cup B_N|$, where $A$ is the set of text data containing the attention word and, for $N$ comparison words, $B_1, B_2, \dots, B_N$ are the sets of text data containing each comparison word.

For example, the word appearance ratio of "black coffee" is calculated as $|A| / |A \cup B_1 \cup B_2|$, where $A$ is the set of text data containing "black coffee", $B_1$ is the set of text data containing "sweetened coffee", and $B_2$ is the set of text data containing "low-sugar coffee".
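The set expression above translates directly into set operations. The sketch below uses integer document IDs as stand-ins for text data items; `document_share` and the sample sets are illustrative assumptions.

```python
def document_share(attention_docs, comparison_doc_sets):
    """Compute |A| / |A ∪ B_1 ∪ ... ∪ B_N| over sets of document IDs."""
    a = set(attention_docs)
    union = a.union(*comparison_doc_sets)
    if not union:
        return 0.0  # no text mentions any listed word
    return len(a) / len(union)


# Invented document IDs: A for "black coffee", B1 and B2 for the comparison words.
A = {1, 2, 3}
B1 = {3, 4}
B2 = {5}
ratio = document_share(A, [B1, B2])  # |A| = 3, union = {1, 2, 3, 4, 5}
```

Document 3 mentions both "black coffee" and a comparison word, and the union counts it only once in the denominator.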

An example of calculating the word appearance ratios and creating the table shown in FIG. 10 is described here.

The word name 1001 and classification 1002 columns store the word names and classifications extracted from the analysis target word table 901 shown in FIG. 9 for which the analysis target word list ID is 1. The word appearance ratio 1003 column stores the value obtained by dividing the word appearance frequency, extracted from the word appearance frequency table 902 using the analysis target word list ID, word ID, and measurement period as the key, by the total of the appearance frequencies extracted using the analysis target word list ID and measurement period as the key.
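The table-based calculation just described, dividing one word's frequency by the total for the same list and period, can be sketched as follows. The tuple layout mirrors the attributes of table 902; `appearance_ratios` and the sample frequencies are assumptions for illustration.

```python
from collections import defaultdict


def appearance_ratios(freq_rows):
    """Divide each frequency by the total for its (list ID, period) group.

    freq_rows is a list of (list_id, word_id, period, frequency) tuples.
    """
    totals = defaultdict(int)
    for list_id, _word_id, period, frequency in freq_rows:
        totals[(list_id, period)] += frequency

    ratios = {}
    for list_id, word_id, period, frequency in freq_rows:
        total = totals[(list_id, period)]
        ratios[(list_id, word_id, period)] = frequency / total if total else 0.0
    return ratios


# Invented frequencies: word IDs 1 to 3 over two measurement periods.
rows = [
    (1, 1, "2011/4", 5), (1, 2, "2011/4", 3), (1, 3, "2011/4", 2),
    (1, 1, "2011/5", 3), (1, 2, "2011/5", 1), (1, 3, "2011/5", 1),
]
ratios = appearance_ratios(rows)
```

In this invented data the raw count for word 1 falls from 5 to 3 while its share rises from 0.5 to 0.6, which is the kind of difference the ratio is meant to expose.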

Finally, the word appearance ratios are sent to the display unit 202 (S803).

The display unit 202 can output the table shown in FIG. 9 created by the relative word appearance ratio calculation unit 303. It can also create a time-series graph from the word appearance ratios of text data groups updated in different periods, visualizing how the word appearance ratio changes over time. The word appearance frequencies of the comparison words can be output together with that of the attention word.

An output example of the word appearance ratio is shown in FIG. 11. In the example shown in FIG. 11, the appearance frequency ratio is calculated for the text data groups collected in each month (2011/4, 2011/5, and 2011/6), and the results are output as a time-series graph.

In the example shown in FIG. 11, the appearance frequencies of the comparison words "sweetened coffee" and "low-sugar coffee" are output together with the appearance frequency of the attention word "black coffee".
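As a stand-in for the graph of FIG. 11, which is not reproduced here, the following sketch renders per-period ratios as a plain-text bar chart. The function `render_series`, the bar width, and every ratio value are invented for illustration and do not reflect the figure's actual data.

```python
def render_series(ratios_by_period, width=20):
    """Render word appearance ratios per period as text bars."""
    lines = []
    for period, ratios in ratios_by_period.items():
        lines.append(period)
        for word, ratio in ratios.items():
            bar = "#" * round(ratio * width)
            lines.append(f"  {word:<18}{bar} {ratio:.0%}")
    return "\n".join(lines)


# Invented ratios for three monthly periods.
series = {
    "2011/4": {"black coffee": 0.5, "sweetened coffee": 0.25, "low-sugar coffee": 0.25},
    "2011/5": {"black coffee": 0.75, "sweetened coffee": 0.25, "low-sugar coffee": 0.0},
    "2011/6": {"black coffee": 0.25, "sweetened coffee": 0.5, "low-sugar coffee": 0.25},
}
chart = render_series(series)
print(chart)
```

Showing the comparison words next to the attention word in each period makes it visible which competitor gained whenever the attention word's share dropped.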

As described above, in the present embodiment, text data containing the attention word and the comparison words is extracted from the text data group, and the relative degree of attention of each word is obtained by comparing word appearance ratios. This makes it possible to evaluate the time-series change in the degree of attention of user-specified words without being affected by the number of text data items, which changes from period to period.
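A small numeric illustration of this point, using invented counts: if the volume of collected text quadruples from one month to the next while the mix of words stays the same, the raw frequencies quadruple but the ratios are unchanged. The helper `shares` is a hypothetical name.

```python
def shares(counts):
    """Convert raw word counts into shares of the total."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


# Invented counts: the busy month has four times the text volume.
quiet_month = {"black coffee": 50, "sweetened coffee": 30, "low-sugar coffee": 20}
busy_month = {word: count * 4 for word, count in quiet_month.items()}

raw_change = busy_month["black coffee"] - quiet_month["black coffee"]  # 150
share_change = shares(busy_month)["black coffee"] - shares(quiet_month)["black coffee"]
```

A ranking based on raw counts would report a jump of 150 mentions for "black coffee", whereas the share-based view reports no change in relative attention.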

The invention made by the present inventors has been specifically described above based on the embodiment. It goes without saying, however, that the present invention is not limited to the above embodiment and can be modified in various ways without departing from its gist.

The present invention relates to an attention word analysis method and an attention word analysis system for analyzing words attracting attention in collected text data. It is widely applicable to devices and systems that evaluate changes in the degree of market attention of specific keywords, such as product names, when a company conducts market research by collecting text data from social media on the Web such as blogs and SNS.

DESCRIPTION OF REFERENCE SIGNS: 100: user; 101: attention word analysis system; 200: analysis management subsystem; 201: input unit; 202: display unit; 300: text data analysis subsystem; 301: analysis target text data set determination unit; 302: word frequency calculation unit; 303: relative word appearance ratio calculation unit; 400: text data group; 401: attention word; 402: comparison word; 901: analysis target word table; 902: word appearance frequency table.

Claims

An attention word analysis method for obtaining information on words attracting attention from a text data group, wherein
a text data analysis unit acquires an analysis target word list entered by a user through an input unit, counts the appearance frequencies of a plurality of words in the analysis target word list within the text data group, calculates a word appearance ratio from the appearance frequencies across the plurality of words, and analyzes which words are attracting relatively more attention, and
the processing of analyzing the words attracting relatively more attention targets only text data that includes at least one of an attention word, whose degree of attention in the text data group is to be measured, and a plurality of comparison words to be compared in relation to the attention word, and is performed based on the union of the targeted text data and the sets of text data including the attention word or a comparison word.

The attention word analysis method according to claim 1, wherein
the analysis target word list includes the attention word and the plurality of comparison words, and
the text data analysis unit extracts, from the text data group, text data including the attention word and the comparison words from the comparison word list, counts the appearance frequencies of the attention word and the comparison words in the extracted text data, calculates a word appearance ratio from the appearance frequencies of the attention word and the comparison words, and analyzes the relative degree of attention of the attention word.

An attention word analysis method for obtaining information on words attracting attention from a text data group, wherein
an attention word, whose degree of attention in the text data group is to be measured, and comparison words to be compared in relation to the attention word are entered by a user through an input unit as an analysis target word list,
an analysis target text data set determination unit extracts text data including the attention word and the comparison words from the text data group,
a word frequency calculation unit counts the appearance frequencies of the attention word and the comparison words in the text data extracted by the analysis target text data set determination unit,
a relative word appearance ratio calculation unit calculates a word appearance ratio from the appearance frequencies of the attention word and the comparison words and displays the calculation result on a display unit, and
the processing by the analysis target text data set determination unit, the word frequency calculation unit, and the relative word appearance ratio calculation unit targets only text data that includes at least one of the attention word and the plurality of comparison words, and is performed based on the union of the targeted text data and the sets of text data including the attention word or a comparison word.

An attention word analysis system that obtains information on words attracting attention from a text data group, comprising:
an input unit through which a user enters an analysis target word list; and
a text data analysis unit that counts the appearance frequencies of a plurality of words in the analysis target word list within the text data group, calculates a word appearance ratio from the appearance frequencies across the plurality of words, and analyzes which words are attracting relatively more attention,
wherein the processing by the text data analysis unit of analyzing the words attracting relatively more attention targets only text data that includes at least one of an attention word, whose degree of attention in the text data group is to be measured, and a plurality of comparison words to be compared in relation to the attention word, and is performed based on the union of the targeted text data and the sets of text data including the attention word or a comparison word.

The attention word analysis system according to claim 4, wherein
the analysis target word list includes the attention word and the plurality of comparison words, and
the text data analysis unit extracts, from the text data group, text data including the attention word and the comparison words from the comparison word list, counts the appearance frequencies of the attention word and the comparison words in the extracted text data, calculates a word appearance ratio from the appearance frequencies of the attention word and the comparison words, and analyzes the relative degree of attention of the attention word.

An attention word analysis system that obtains information on words attracting attention from a text data group, comprising:
an input unit through which a user enters, as an analysis target word list, an attention word whose degree of attention in the text data group is to be measured and comparison words to be compared in relation to the attention word;
an analysis target text data set determination unit that extracts text data including the attention word and the comparison words from the text data group;
a word frequency calculation unit that counts the appearance frequencies of the attention word and the comparison words in the text data extracted by the analysis target text data set determination unit;
a relative word appearance ratio calculation unit that calculates a word appearance ratio from the appearance frequencies of the attention word and the comparison words; and
a display unit that displays the calculation result of the relative word appearance ratio calculation unit,
wherein the processing by the analysis target text data set determination unit, the word frequency calculation unit, and the relative word appearance ratio calculation unit targets only text data that includes at least one of the attention word and the comparison words, and is performed based on the union of the targeted text data and the sets of text data including the attention word or a comparison word.