JP7091700B2

JP7091700B2 - Information processing program, message analysis program, information processing device and information processing method

Info

Publication number: JP7091700B2
Application number: JP2018029175A
Authority: JP
Inventors: 悟志鯉渕; 大亮山岡; 祐冨田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-02-21
Filing date: 2018-02-21
Publication date: 2022-06-28
Anticipated expiration: 2038-02-21
Also published as: JP2019144905A

Description

The present invention relates to an information processing program, a message analysis program, an information processing apparatus, and an information processing method.

Event estimation systems, which estimate information about a specific event by analyzing content posted on Twitter or similar services, have become widespread. By analyzing posted content, such a system can, for example, analyze disaster-related information and identify the area where a disaster has occurred.

Japanese Unexamined Patent Publication No. 2012-243032

When analyzing content posted on Twitter or the like, an event estimation system extracts information about a specific event based on keywords. To acquire that information accurately, however, appropriate keywords for capturing the specific event must be selected.

In one aspect, an object is to provide an information processing program and the like that can select keywords appropriate for acquiring information about a specific event.

In one embodiment, a computer is caused to execute a process of identifying the number of occurrences of a predetermined word contained in a plurality of posts in a predetermined period. Further, when the identified number of occurrences of the predetermined word is equal to or greater than a predetermined number, the computer is caused to execute a process of extracting, from the posts containing the predetermined word, one or more words different from the predetermined word. The computer is then caused to execute a process of outputting the one or more words depending on whether the number of occurrences of each extracted word exceeds its number of occurrences in a period different from the predetermined period by at least a specific threshold.

In one aspect, keywords appropriate for acquiring information about a specific event can be selected.

FIG. 1 is an explanatory diagram showing an example of the event estimation system of this embodiment.
FIG. 2 is an explanatory diagram showing an example of the hardware configuration of the information processing apparatus.
FIG. 3 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus of the first embodiment.
FIG. 4 is an explanatory diagram showing an example of processing when a child keyword is output from a tweet group containing a parent keyword.
FIG. 5 is an explanatory diagram showing an example of the candidate word storage unit.
FIG. 6 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the output processing.
FIG. 7 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the child keyword selection processing.
FIG. 8 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus of the second embodiment.
FIG. 9 is an explanatory diagram showing an example of event-related word combinations obtained from a tweet group containing the keyword "earthquake" when an earthquake occurred in Kagoshima.
FIG. 10 is an explanatory diagram showing an example of the target conditions.
FIG. 11 is an explanatory diagram showing an example of important words for the current tweet group (tweets posted when an earthquake occurred in Kagoshima) and the comparison target tweet group (past normal-time tweets about earthquakes).
FIG. 12 is an explanatory diagram showing an example of the number of occurrences of each co-occurrence word associated with an important word in the tweet group posted when an earthquake occurred in Kagoshima.
FIG. 13 is an explanatory diagram showing an example of event estimation from a tweet group posted during a traffic jam.
FIG. 14 is an explanatory diagram showing an example of event estimation from a tweet group posted during a traffic jam.
FIG. 15 is an explanatory diagram showing an example of event estimation from a tweet group posted when an earthquake occurred in Kagoshima.
FIG. 16 is an explanatory diagram showing an example of event estimation from a tweet group posted when earthquake relief donations were solicited.
FIG. 17 is an explanatory diagram showing an example of event estimation from a tweet group posted when a power failure occurred in Kitami City, Hokkaido.
FIG. 18 is an explanatory diagram showing an example of event estimation from a tweet group posted when a power failure occurred in Yokohama City, Kanagawa Prefecture.
FIG. 19 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the first estimation processing.
FIG. 20 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the first output processing.
FIG. 21 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus of the third embodiment.
FIG. 22 is an explanatory diagram showing an example of the number of occurrences of each co-occurrence word associated with an important word in the tweet group posted when an earthquake occurred in Kagoshima.
FIG. 23 is an explanatory diagram showing an example of event estimation from a tweet group posted when an earthquake occurred in Kagoshima.
FIG. 24 is an explanatory diagram showing an example of event estimation from a tweet group posted when earthquake relief donations were solicited.
FIG. 25 is an explanatory diagram showing an example of event estimation from a tweet group posted when a power failure occurred in Kitami City, Hokkaido.
FIG. 26 is an explanatory diagram showing an example of event estimation from a tweet group posted when a power failure occurred in Yokohama City, Kanagawa Prefecture.
FIG. 27 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the second estimation processing.
FIG. 28 is a flowchart showing an example of the processing operation of the control unit in the information processing apparatus related to the second output processing.
FIG. 29 is an explanatory diagram showing an example of an information processing apparatus that executes the information processing program.

Hereinafter, embodiments of the information processing program and the like disclosed in the present application will be described in detail with reference to the drawings. The disclosed technology is not limited by these embodiments. The embodiments described below may also be combined as appropriate to the extent that no contradiction arises.

FIG. 1 is an explanatory diagram showing an example of the event estimation system 1 of this embodiment. The event estimation system 1 shown in FIG. 1 has a plurality of terminal devices 2, an information processing apparatus 3, and a communication network 4. The event estimation system 1 is, for example, an SSS (Social Sensor System) that analyzes the content of tweets with AI (Artificial Intelligence) technology and, based on the analysis result, estimates, for example, a case in which a disaster is occurring. The event estimation system 1 may also estimate the location where a disaster is occurring and display it on a map. The terminal devices 2 are information terminals such as smartphones, tablet terminals, and personal computers that connect to the communication network 4 and use services such as an SNS (Social Network Service) through it. The information processing apparatus 3 is, for example, a computer that connects to the communication network 4 and estimates an event from the keywords used in tweets on the SNS. The SNS is preferably a posting service such as Twitter, where posts by an unspecified large number of users can be read, but is not limited to this; for example, it may be a posting service such as Facebook, where posts by a specified large number of users can be read, and this can be changed as appropriate.

FIG. 2 is an explanatory diagram showing an example of the hardware configuration of the information processing apparatus 3. The information processing apparatus 3 shown in FIG. 2 has a communication device 11, an HDD (Hard Disk Drive) 12, a ROM (Read Only Memory) 13, a RAM (Random Access Memory) 14, a CPU (Central Processing Unit) 15, and a bus 16. The communication device 11 is a device that establishes a communication connection with the communication network 4. The HDD 12 is a storage area that stores various kinds of information. The ROM 13 is a storage area that stores information such as various programs. The RAM 14 is a storage area that stores various kinds of information. The CPU 15 controls the entire information processing apparatus 3. The bus 16 consists of communication lines, control lines, and the like that carry data among the communication device 11, the HDD 12, the ROM 13, the RAM 14, and the CPU 15.

FIG. 3 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus 3. The information processing apparatus 3 shown in FIG. 3 has a control unit 3A and a storage unit 3B. The control unit 3A corresponds to, for example, the CPU 15. The control unit 3A, for example, loads the information processing program stored in the ROM 13 into the RAM 14 and executes the loaded program as an information processing process, thereby executing, for example, an identification unit 21, an extraction unit 22, a comparison unit 23, an output unit 24, and an estimation unit 25 as functions. The storage unit 3B corresponds to, for example, the HDD 12, the ROM 13, and the RAM 14. The storage unit 3B has a keyword storage unit 31 and a candidate word storage unit 32. The keyword storage unit 31 is an area that stores keywords such as parent keywords and child keywords. A parent keyword is a word used to extract tweets that express a specific event. A child keyword is a word that, in addition to the parent keyword, assists in extracting tweets that express the specific event. The candidate word storage unit 32 is, for example, an area that stores candidate words. A candidate word is a word that is a candidate for a child keyword.

The identification unit 21 counts the number of tweets per unit time that contain the parent keyword and determines that a burst has occurred when that number reaches t times the normal-time number of tweets. The extraction unit 22 performs morphological analysis on the text of the tweet group that contains the parent keyword during the burst period and splits the tweet group into words. After the word split, the extraction unit 22 ranks the words that appear frequently in the tweet group by number of occurrences. The extraction unit 22 then extracts the words ranked in the top N by number of occurrences as candidate words and stores them in the candidate word storage unit 32. The extraction unit 22 extracts from the candidate word storage unit 32 the first candidate words, which are the candidate words of the current burst period, and the second candidate words, which are the candidate words of the past X burst periods.

The comparison unit 23 compares the first candidate words with the second candidate words. The comparison unit 23 determines whether, among the first and second candidate words, there are a predetermined number M or more of candidate words that are the same word. When there are, the comparison unit 23 selects those candidate words as child keywords.

The output unit 24 stores the selected child keyword in the keyword storage unit 31 in association with the parent keyword. The estimation unit 25 estimates the content of a case from the tweet group using the parent keyword stored in the keyword storage unit 31 and the child keywords associated with it.

FIG. 4 is an explanatory diagram showing an example of processing when a child keyword is output from a tweet group containing a parent keyword. When the information processing apparatus 3 detects a burst in the tweet group containing the parent keyword "power failure", it compares the tweet group of the current burst period containing "power failure" with the tweet groups of the past X burst periods containing "power failure". Based on the comparison result, the information processing apparatus 3 determines that "breaker", which appears both as a first candidate word and as a second candidate word, is a candidate word that meets the predetermined number M, and outputs "breaker" as a child keyword.

FIG. 5 is an explanatory diagram showing an example of the candidate word storage unit 32. The candidate word storage unit 32 shown in FIG. 5 manages its entries per burst period of the tweet group containing the parent keyword, and manages each candidate word and its number of appearances in association with the ascending-order rank of the number of appearances. The candidate words consist of first candidate words and second candidate words: the first candidate words are the candidate words of the current burst period, and the second candidate words are the candidate words of the past X burst periods. By referring to the candidate word storage unit 32, the control unit 3A can recognize the candidate words and their numbers of appearances in the current burst period containing the parent keyword, as well as the candidate words and their numbers of appearances in past burst periods containing the parent keyword. The comparison unit 23 refers to the candidate word storage unit 32 shown in FIG. 5 and selects, for example, "all right", "seismic intensity", and "power failure" as child keywords.
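The layout described for FIG. 5 can be pictured with a small in-memory structure. The following is only a sketch under stated assumptions: the names `CandidateEntry` and `CandidateWordStore` are hypothetical, and rank 1 is assumed to mean the most frequent word, which the text does not state explicitly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateEntry:
    rank: int   # assumed convention: 1 = most frequent word in the burst period
    word: str   # candidate word
    count: int  # number of appearances in that burst period


class CandidateWordStore:
    """Hypothetical stand-in for the candidate word storage unit 32."""

    def __init__(self):
        # parent keyword -> list of burst periods, oldest first
        self._bursts = defaultdict(list)

    def add_burst(self, parent, word_counts):
        """Store one burst period given (word, count) pairs, ranked by count."""
        ordered = sorted(word_counts, key=lambda wc: wc[1], reverse=True)
        entries = [CandidateEntry(i + 1, w, c) for i, (w, c) in enumerate(ordered)]
        self._bursts[parent].append(entries)

    def current(self, parent):
        """First candidate words: entries of the most recent burst period."""
        return self._bursts[parent][-1]

    def past(self, parent, x):
        """Second candidate words: up to x burst periods before the current one."""
        if x <= 0:
            return []
        return self._bursts[parent][:-1][-x:]
```

Keeping one ranked list per burst period makes it straightforward to read back both the current period and the past X periods for the same parent keyword.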

FIG. 6 is a flowchart showing an example of the processing operation of the control unit 3A in the information processing apparatus 3 related to the output processing. In FIG. 6, the identification unit 21 in the control unit 3A collects, at fixed intervals, tweets containing the parent keyword from the tweets posted on the SNS (step S11).

The identification unit 21 counts the number of tweets containing the parent keyword acquired within the unit time (step S12). The identification unit 21 determines whether the number of tweets containing the parent keyword has exceeded a predetermined number (step S13). The predetermined number is, for example, t times the normal-time number of tweets. When the number of tweets containing the parent keyword has exceeded the predetermined number (Yes in step S13), the identification unit 21 determines that a burst of the tweet group containing the parent keyword has occurred (step S14). After the burst is determined, the control unit 3A executes the child keyword selection processing shown in FIG. 7, described later (step S15). A child keyword is a keyword for appropriately estimating the event of the parent keyword. The output unit 24 in the control unit 3A outputs the child keywords selected in the child keyword selection processing (step S16) and ends the processing operation shown in FIG. 6. The control unit 3A stores the output child keywords in the keyword storage unit 31. When the number of tweets has not exceeded the predetermined number (No in step S13), the identification unit 21 returns to step S11 to acquire tweets containing the parent keyword.
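The burst check in steps S11 to S14 can be sketched as follows. This is an illustration only: the function names, the idea of passing precomputed per-window counts, and the `baseline` parameter (the normal-time tweet count per unit time) are assumptions, and the strict "exceeds" comparison follows the wording of step S13.

```python
def is_burst(count, baseline, t):
    """Step S13: True when the count exceeds t times the normal-time count."""
    return count > t * baseline


def first_burst_window(window_counts, baseline, t):
    """Scan per-unit-time tweet counts (steps S11 to S13) and return the index
    of the first window judged to be a burst (step S14), or None."""
    for index, count in enumerate(window_counts):
        if is_burst(count, baseline, t):
            return index
    return None
```

For example, with a normal-time count of 10 tweets per window and t = 3, the windows `[8, 12, 45, 50]` would yield a burst at index 2.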

As a result, the estimation unit 25 in the control unit 3A can estimate the content of a case from the tweet group with high accuracy by using the parent keyword stored in the keyword storage unit 31 and the child keywords associated with that parent keyword.

FIG. 7 is a flowchart showing an example of the processing operation of the control unit 3A in the information processing apparatus 3 related to the child keyword selection processing. In FIG. 7, the extraction unit 22 in the control unit 3A performs morphological analysis on the text containing the parent-keyword tweet group of the burst period (step S21) and splits the analyzed text into words (step S22).

The extraction unit 22 counts the number of occurrences of each word (step S23) and stores the words ranked in the top N by number of occurrences in the candidate word storage unit 32 as the current candidate words (step S24). In addition to the candidate words of the current burst period, the candidate word storage unit 32 holds previously stored candidate words that ranked in the top N in past periods. The extraction unit 22 extracts the candidate words of the current burst period from the candidate word storage unit 32 as the first candidate words (step S25). The extraction unit 22 further extracts the candidate words of the past X periods from the candidate word storage unit 32 as the second candidate words (step S26).
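Steps S21 to S24 amount to a tokenize, count, and top-N ranking. The sketch below assumes a trivial whitespace `tokenize` as a placeholder for the morphological analysis, which a real Japanese-language system would need; the function names are hypothetical. Excluding the parent keyword itself follows the summary's wording that the extracted words differ from the predetermined word.

```python
from collections import Counter


def tokenize(text):
    """Placeholder for morphological analysis and word splitting (steps S21, S22)."""
    return text.lower().split()


def top_candidates(tweets, parent, n):
    """Steps S23 and S24: count words across the burst-period tweets and
    return the top n (word, count) pairs, excluding the parent keyword."""
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in tokenize(tweet) if w != parent)
    return counts.most_common(n)
```

`Counter.most_common` returns pairs sorted by descending count, which matches the ranking by number of occurrences described above.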

The comparison unit 23 in the control unit 3A compares the first candidate words with the second candidate words (step S27). The comparison unit 23 determines whether, among the first and second candidate words, there are a predetermined number or more of candidate words that are the same word (step S28). When there are (Yes in step S28), the comparison unit 23 selects those candidate words as child keywords (step S29) and ends the processing operation shown in FIG. 7. When there are not (No in step S28), the control unit 3A ends the processing operation shown in FIG. 7.
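The comparison in steps S27 to S29 can be sketched in code, but the text leaves the exact meaning of the predetermined number M open. The sketch below assumes one reading: a current candidate word is selected when it also appears in the top-N lists of at least M of the past X burst periods. That reading, and the name `select_child_keywords`, are assumptions rather than the patent's stated method.

```python
def select_child_keywords(first, past, m):
    """Steps S27 to S29 under the assumed reading of M.

    first: current-burst candidate words (first candidate words)
    past:  list of word lists, one per past burst period (second candidate words)
    m:     assumed minimum number of past periods that must contain the word
    """
    past_sets = [set(words) for words in past]
    selected = []
    for word in first:
        hits = sum(1 for words in past_sets if word in words)
        if hits >= m:
            selected.append(word)
    return selected
```

Under this reading, a word such as "breaker" that recurs across several past power-failure bursts is selected, while a word that appears only in the current burst is not.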

Because the control unit 3A compares the top-N candidate words of the current burst period with the top-N candidate words of the past X burst periods, it can realize keyword extraction that adapts to, for example, changes in wording. N, X, and M can be adjusted according to the event and the operation.

The information processing apparatus 3 of the first embodiment can select child keywords appropriate for estimating an event. Because candidate words with a high number of occurrences are likely to be words that represent the event, the information processing apparatus 3 can select more appropriate child keywords by comparing against the past X cases. Comparing against the past X cases also allows the information processing apparatus 3 to handle changes over time and wording peculiar to Twitter. Moreover, because events are estimated using appropriate keywords, the accuracy of event estimation is improved.

When the information processing apparatus 3 of the first embodiment detects a burst of the tweet group containing the parent keyword, it extracts candidate words from that tweet group. The information processing apparatus 3 outputs a candidate word as a child keyword depending on whether the number of appearances of that candidate word, compared with the number of appearances of candidate words in the past X burst periods, is equal to or greater than the predetermined number M. As a result, candidate words that rank high in number of occurrences in the comparison with the past X cases are selected as child keywords appropriate for estimating the event, which improves event estimation.

The comparison unit 23 of the first embodiment uses the numbers of appearances of candidate words in the past X burst periods as the comparison cases. However, the comparison is not limited to past burst periods; it may instead use the past X periods in which the number of occurrences was below the predetermined number and no burst was detected, and this can be changed as appropriate.

The comparison unit 23 of the first embodiment selects, as child keywords, the same candidate words that rank high in number of appearances, according to the numbers of appearances of the first candidate words and the second candidate words. Alternatively, the comparison unit 23 may determine whether a first rate matches or is similar to a second rate, where the first rate is the ratio of the number of appearances of a candidate word in the current burst period to its number of appearances in the past X tweet groups, and the second rate is the ratio of the number of appearances of the parent keyword in the current burst period to its number of appearances in the past X tweet groups. When the first rate matches or is similar to the second rate, the comparison unit 23 may output that candidate word as a child keyword; this can be changed as appropriate.
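The rate comparison can be written out as a short check. The text does not define what counts as "similar", so the relative `tolerance` below is an assumption introduced for illustration, as is the function name `rates_similar`.

```python
def rates_similar(cand_now, cand_past, parent_now, parent_past, tolerance=0.2):
    """Compare the candidate word's growth rate with the parent keyword's.

    first rate  = cand_now / cand_past      (candidate word, current vs. past)
    second rate = parent_now / parent_past  (parent keyword, current vs. past)
    The rates are treated as similar when they differ by at most `tolerance`
    (an assumed relative margin) of the second rate.
    """
    if cand_past <= 0 or parent_past <= 0:
        return False  # a rate is undefined without past occurrences
    first_rate = cand_now / cand_past
    second_rate = parent_now / parent_past
    return abs(first_rate - second_rate) <= tolerance * second_rate
```

A candidate word whose occurrences grow roughly in step with the parent keyword passes the check, while a word whose frequency stays flat during the burst does not.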

The comparison unit 23 of the first embodiment selects, as child keywords, the same candidate words that rank high in number of appearances, according to the numbers of appearances of the first candidate words and the second candidate words. However, child keywords may instead be selected according to the numbers of appearances of the first candidate words and the second candidate words themselves.

The comparison unit 23 may output candidate words according to their number of appearances, or according to an increasing trend in their number of appearances; this can be changed as appropriate.

The identification unit 21 of the first embodiment determines that a burst of the tweet group containing the parent keyword has occurred when the number of tweets containing the parent keyword exceeds a predetermined number equal to t times the normal-time number of tweets. However, the criterion is not limited to the number of tweets containing the parent keyword; the number of occurrences of the parent keyword may be used instead, and this can be changed as appropriate.

The event estimation system 1 of the first embodiment estimates a specific event from the tweet group using parent keywords and child keywords. However, important words and co-occurrence words may be extracted from the tweet group containing the parent keyword and child keywords, and the specific event may be estimated using those important words and co-occurrence words. Such an embodiment is described below as the second embodiment. Components identical to those of the event estimation system 1 of the first embodiment are given the same reference numerals, and descriptions of the overlapping configuration and operation are omitted.

FIG. 8 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus 3 of the second embodiment. The information processing apparatus 3 shown in FIG. 8 has a control unit 3C and a storage unit 3D. The control unit 3C corresponds to, for example, the CPU 15. The control unit 3C, for example, loads the information processing program stored in the ROM 13 into the RAM 14 and executes the loaded program as an information processing process, thereby executing, for example, an identification unit 41, an extraction unit 42, an important word identification unit 43, a co-occurrence word identification unit 44, an output unit 45, and an estimation unit 46 as functions. The storage unit 3D corresponds to, for example, the HDD 12, the ROM 13, and the RAM 14.

The storage unit 3D has a keyword storage unit 51, an important word storage unit 52, and a co-occurrence word storage unit 53. The keyword storage unit 51 is, for example, an area that stores keywords such as parent keywords and child keywords. The important word storage unit 52 is, for example, an area that stores important words. The co-occurrence word storage unit 53 is, for example, an area that stores co-occurrence words.

特定部４１は、キーワード等の一定の条件で一定時間毎にツイートを収集する。特定部４１は、一定条件のツイート群のバースト発生を検知する。尚、一定条件のツイート群のバースト発生とは、例えば、所定のワードが含まれるメッセージ発信の一時的な増加である。抽出部４２は、バースト期間中の現在ツイート群を含む第１の文書を取得する。更に、抽出部４２は、過去の平常時のツイート群の内、対象条件に適合した比較対象ツイート群を含む第２の文書を取得する。尚、対象条件は、例えば、日単位、時間単位、曜日単位、時間帯単位やツイート数単位でツイート群から比較対象ツイート群を抽出するための条件である。また、対象条件は、バースト期間の内外の期間の比較対象ツイート群を抽出するための条件である。日単位の抽出条件は、バーストを検知した日のツイート群と比較する場合、過去Ｄ日分のツイート群を比較対象とし、例えば、コンサートイベント等の事象を日単位で区別する場合に適している。時間単位の抽出条件は、バーストを検知した時間Ｈの過去のツイート群と比較する場合、過去Ｄ×Ｈ時間分のツイート群を比較対象とし、例えば、渋滞等の事象が時間単位で区別する場合に適している。曜日単位の抽出条件は、バーストを検知した曜日の過去のツイート群を比較対象とし、例えば、列車遅延等の曜日特性の影響が大きい事象を区別する場合に適している。時間帯単位の抽出条件は、バーストを検知した時間帯の過去のツイート群を比較対象とし、例えば、台風時の通勤状況等の時間帯の影響が大きい事象を区別する場合に適している。ツイート数単位の抽出条件は、バーストを検知した間のツイート数と同数の過去のツイート群を比較対象とし、例えば、水道管破裂等の一般的な事象ではなく、平常時は頻繁に投稿されず、十分なツイート数が得られない事象を区別する場合に適している。 The identification unit 41 collects tweets at fixed intervals under a fixed condition such as a keyword. The identification unit 41 detects the occurrence of a burst in the tweet group matching that condition. A burst of the tweet group is, for example, a temporary increase in the transmission of messages containing a predetermined word. The extraction unit 42 acquires a first document containing the current tweet group from the burst period. The extraction unit 42 further acquires a second document containing a comparison target tweet group that, among past tweets from normal times, matches a target condition. The target condition is, for example, a condition for extracting the comparison target tweet group from a tweet group by day, by hour, by day of the week, by time slot (time-of-day window), or by number of tweets. The target condition is also a condition for extracting comparison target tweets from periods inside and outside the burst period. The per-day condition compares the tweets of the day on which the burst was detected against the tweets of the past D days, and suits events that are distinguished by day, such as concert events. The per-hour condition compares the tweets of the hour H in which the burst was detected against the tweets of the past D × H hours, and suits events that are distinguished by hour, such as traffic jams. The day-of-week condition uses past tweets from the same day of the week as the burst, and suits events strongly affected by day-of-week characteristics, such as train delays. The time-slot condition uses past tweets from the same time slot as the burst, and suits events strongly affected by the time of day, such as commuting conditions during a typhoon. The tweet-count condition uses the same number of past tweets as were posted during the burst, and suits events, such as a burst water pipe, that are not everyday topics, are rarely posted about in normal times, and therefore yield too few tweets.

重要単語特定部４３は、現在のツイート群を含む第１の文書及び比較対象ツイート群を含む第２の文書を形態素解析して単語分割する。重要単語特定部４３は、（数１）、（数２）及び（数３）を使用して、単語毎にＴＦ－ＩＤＦ値を算出する。重要単語特定部４３は、上位ＴＦ－ＩＤＦ値の単語を重要単語として重要単語記憶部５２に記憶する。 The important word identification unit 43 morphologically analyzes the first document including the current tweet group and the second document including the comparison target tweet group, and divides the words into words. The important word identification unit 43 calculates the TF-IDF value for each word using (Equation 1), (Equation 2) and (Equation 3). The important word identification unit 43 stores a word having a higher TF-IDF value as an important word in the important word storage unit 52.

尚、（数３）に使用する単語種別の補正値は、重要単語であるにも関わらず、ＩＤＦ値が低い場合、例えば、普段から数が少ないものの、出現自体はしており、バースト時に多く出現する単語に適用する際の補正値である。補正値は、場所、固有名詞、動詞、形容詞（状態）等の知りたい情報に応じて調整しても良い。例えば、場所、固有名詞及び状態を知りたい場合は、補正値として「場所」を１、「固有名詞」を１、「動詞」を０、「形容詞」を０．５に設定する。つまり、補正値が大きくなるに連れてＴＦ－ＩＤＦ値が上位となる。 In addition, the correction value of the word type used in (Equation 3) is an important word, but when the IDF value is low, for example, although the number is small from usual, it appears itself and is large at the time of burst. This is a correction value when applied to the words that appear. The correction value may be adjusted according to the information to be known such as a place, a proper noun, a verb, and an adjective (state). For example, if you want to know the place, proper noun and state, set "place" to 1, "proper noun" to 1, "verb" to 0, and "adjective" to 0.5 as correction values. That is, as the correction value increases, the TF-IDF value becomes higher.
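The effect of the word-type correction value can be illustrated with a small sketch. Because (Equation 1) to (Equation 3) are not reproduced in this text, the code assumes the standard TF-IDF definition and assumes, purely for illustration, that the correction value is added to the IDF term. The names `TYPE_CORRECTION` and `corrected_tfidf` are hypothetical, and the tokens are invented sample data.

```python
import math
from collections import Counter

# Correction values per word type, taken from the example in the text.
TYPE_CORRECTION = {"place": 1.0, "proper_noun": 1.0, "verb": 0.0, "adjective": 0.5}


def corrected_tfidf(docs, target_index, word_types):
    """Score each word of docs[target_index] as tf * (idf + correction).

    docs: list of token lists, one per document (already word-segmented).
    word_types: maps a word to its word type; unknown words get correction 0.
    The additive form is an assumption, not the patent's (Equation 3).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    target = docs[target_index]
    tf = Counter(target)
    scores = {}
    for word, count in tf.items():
        idf = math.log(n_docs / df[word])
        correction = TYPE_CORRECTION.get(word_types.get(word, ""), 0.0)
        scores[word] = (count / len(target)) * (idf + correction)
    return scores


# Every word also appears in the baseline, so the plain IDF is 0 for all of them.
burst_doc = ["Kagoshima", "earthquake", "Kagoshima", "shake"]
baseline_doc = ["earthquake", "Kagoshima", "shake"]
word_types = {"Kagoshima": "place", "shake": "verb"}
scores = corrected_tfidf([burst_doc, baseline_doc], 0, word_types)
```

In this toy input only the place name keeps a non-zero score, which is the situation the correction value is described as handling: a word that also appears in normal times but matters during the burst.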

共起単語特定部４４は、現在のツイート群を含む第１の文書では、ＴＦ－ＩＤＦ値が高くなる単語ほど、その事象を表す重要な単語となる。共起単語特定部４４は、第１の文書に対して１ツイート単位でＴＦ－ＩＤＦ値上位の単語、すなわち重要単語と共起する共起単語を特定し、重要単語の共起単語の共起回数をカウントする。共起単語特定部４４は、重要単語の共起回数が上位の共起単語を共起単語記憶部５３に記憶する。 In the first document containing the current tweet group, the higher a word's TF-IDF value, the more important that word is in representing the event. For the first document, the co-occurrence word identification unit 44 identifies, tweet by tweet, the co-occurrence words that co-occur with the words having the highest TF-IDF values, that is, the important words, and counts how many times each co-occurrence word co-occurs with an important word. The co-occurrence word identification unit 44 stores the co-occurrence words with the highest co-occurrence counts in the co-occurrence word storage unit 53.

出力部４５は、重要単語記憶部５２に記憶中の重要単語と、共起単語記憶部５３に記憶中の重要単語に対応した共起単語との組合せを推定部４６に出力する。推定部４６は、重要単語及び共起単語の組合せに応じて事例を推定する。 The output unit 45 outputs to the estimation unit 46 a combination of the important word stored in the important word storage unit 52 and the co-occurrence word corresponding to the important word stored in the co-occurrence word storage unit 53. The estimation unit 46 estimates the case according to the combination of the important word and the co-occurrence word.

図９は、鹿児島で地震が発生した際のキーワード「地震」を含むツイート群から事象に関わる単語組合せの一例を示す説明図である。重要単語特定部４３は、「地震」を含むツイート群からＴＦ－ＩＤＦ上位の重要単語「鹿児島」を特定する。更に、共起単語特定部４４は、「地震」を含む今回のツイート群及び、「地震」を含む過去のツイート群から重要単語「鹿児島」に共起する単語の内、出現回数上位の共起単語「震度５強」、「大きい」及び「震源地」を特定する。そして、推定部４６は、重要単語「鹿児島」、共起単語「震度５強」、「大きい」及び「震源地」の組合せで「鹿児島で震度５強の地震が発生した」との事象を推定する。 FIG. 9 is an explanatory diagram showing an example of word combinations related to an event, taken from a group of tweets containing the keyword "earthquake" when an earthquake occurred in Kagoshima. The important word identification unit 43 identifies the important word "Kagoshima", which ranks high in TF-IDF, from the tweet group containing "earthquake". The co-occurrence word identification unit 44 then identifies, among the words that co-occur with the important word "Kagoshima" in the current tweet group containing "earthquake" and in the past tweet group containing "earthquake", the most frequent co-occurrence words: "seismic intensity 5 upper", "large", and "epicenter". The estimation unit 46 then combines the important word "Kagoshima" with the co-occurrence words "seismic intensity 5 upper", "large", and "epicenter" to estimate the event "an earthquake with a seismic intensity of 5 upper occurred in Kagoshima".

図１０は、対象条件の一例を示す説明図である。制御部３Ｃ内の抽出部４２は、過去のツイート群から時間帯単位の対象条件に適合した比較対象ツイート群を抽出する。図１０に示す対象条件は、時間帯単位の抽出条件である。抽出部４２は、今回のバースト期間のツイート群が８月３日の１０時～１８時の時間帯で対象条件が時間帯単位とした場合、例えば、過去の８月２日、８月１日、７月３１日…の１０時～１８時の時間帯のツイート群を比較対象ツイート群として過去のツイート群から抽出する。 FIG. 10 is an explanatory diagram showing an example of the target condition. The extraction unit 42 in the control unit 3C extracts, from the past tweet group, a comparison target tweet group that matches a target condition set per time slot (time-of-day window). The target condition shown in FIG. 10 is the per-time-slot extraction condition. When the tweet group of the current burst period falls in the 10:00 to 18:00 time slot on August 3 and the target condition is per time slot, the extraction unit 42 extracts from the past tweet group, as the comparison target tweet group, the tweets posted in the 10:00 to 18:00 time slot on, for example, August 2, August 1, July 31, and so on.
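The selection of comparison tweets from the same time-of-day window on earlier days can be sketched as follows. This is a minimal illustration, not the extraction unit's actual implementation: the function name `select_time_slot_baseline`, the `(timestamp, text)` tuple format, the `days_back` parameter, and the year 2018 in the sample data are all assumptions.

```python
from datetime import datetime, timedelta


def select_time_slot_baseline(past_tweets, burst_start, burst_end, days_back):
    """Return past tweets posted in the same clock window as the burst.

    past_tweets: list of (timestamp, text) tuples (hypothetical storage format).
    days_back: how many previous days to include (illustrative parameter).
    Assumes the burst window does not cross midnight.
    """
    slot_start, slot_end = burst_start.time(), burst_end.time()
    earliest_day = burst_start.date() - timedelta(days=days_back)
    selected = []
    for posted_at, text in past_tweets:
        # Only days before the burst day, within the look-back range.
        in_day_range = earliest_day <= posted_at.date() < burst_start.date()
        in_slot = slot_start <= posted_at.time() < slot_end
        if in_day_range and in_slot:
            selected.append((posted_at, text))
    return selected


# Burst window: August 3, 10:00 to 18:00 (the year is arbitrary).
burst_start = datetime(2018, 8, 3, 10, 0)
burst_end = datetime(2018, 8, 3, 18, 0)
past_tweets = [
    (datetime(2018, 8, 2, 12, 0), "a"),    # same slot, one day earlier: kept
    (datetime(2018, 8, 2, 20, 0), "b"),    # outside the slot: dropped
    (datetime(2018, 8, 1, 10, 30), "c"),   # kept
    (datetime(2018, 7, 31, 17, 59), "d"),  # kept
    (datetime(2018, 7, 30, 11, 0), "e"),   # older than days_back: dropped
    (datetime(2018, 8, 3, 11, 0), "f"),    # burst day itself: dropped
]
baseline = select_time_slot_baseline(past_tweets, burst_start, burst_end, days_back=3)
```

A day-of-week condition could be sketched the same way by comparing `posted_at.weekday()` with `burst_start.weekday()` instead of bounding the look-back by consecutive days.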

図１１は、現在ツイート群（鹿児島で地震が発生した際のツイート群）及び比較対象ツイート群（過去の平常時の地震に関するツイート群）に関わる重要単語の一例を示す説明図である。現在ツイート群は、今回のバースト期間のツイート群であって、鹿児島で地震が発生した際のツイート群である。比較対象ツイート群は、抽出部４２で抽出した過去の平常時の地震に関するツイート群である。制御部３Ｃ内の重要単語特定部４３は、今回のツイート群の第１の文書及び比較対象ツイート群の第２の文書を形態素解析して単語分割する。更に、重要単語特定部４３は、分割単語毎にＴＦ－ＩＤＦ値を算出し、上位ＴＦ－ＩＤＦ値の単語を重要単語として抽出する。 FIG. 11 is an explanatory diagram showing an example of important words for the current tweet group (tweets posted when an earthquake occurred in Kagoshima) and the comparison target tweet group (past tweets about earthquakes in normal times). The current tweet group is the tweet group of the current burst period, that is, the tweets posted when the earthquake occurred in Kagoshima. The comparison target tweet group is the group of past tweets about earthquakes in normal times extracted by the extraction unit 42. The important word identification unit 43 in the control unit 3C morphologically analyzes the first document, containing the current tweet group, and the second document, containing the comparison target tweet group, and splits them into words. The important word identification unit 43 then calculates a TF-IDF value for each word and extracts the words with the highest TF-IDF values as important words.

重要単語特定部４３は、現在ツイート群及び比較対象ツイート群の上位ＴＦ－ＩＤＦの重要単語として、図１１に示すように、第１位が「鹿児島」、第２位が「停電」、第３位が「土砂崩れ」とランキング化して重要単語記憶部５２に記憶している。図１１に示す重要単語は、地震発生時に共通して現れる単語よりも、場所やその地震発生時特有の単語がＴＦ－ＩＤＦ上位に出現しやすくなる傾向にある。 As shown in FIG. 11, the important word identification unit 43 ranks first as "Kagoshima", second as "power outage", and third as important words of the upper TF-IDF of the current tweet group and the comparison target tweet group. The rank is ranked as "landslide" and stored in the important word storage unit 52. As for the important words shown in FIG. 11, there is a tendency that a place or a word peculiar to the occurrence of an earthquake is more likely to appear at the upper level of TF-IDF than a word commonly appearing at the time of an earthquake.

図１２は、鹿児島で地震が発生した際のツイート群の重要単語に関わる共起単語毎の出現回数の一例を示す説明図である。制御部３Ｃ内の共起単語特定部４４は、今回の現在ツイート群から重要単語の共起単語を特定する。そして、共起単語特定部４４は、重要単語毎の共起単語の出現回数である共起回数をカウントする。共起単語特定部４４は、図１２に示すように、例えば、重要単語「鹿児島」の共起単語として、「震源地」の共起回数が２回、「土砂崩れ」の共起回数が２回、「停電」の共起回数が１回として共起単語記憶部５３に記憶する。 FIG. 12 is an explanatory diagram showing an example of the number of occurrences of each co-occurrence word for the important words in the tweet group posted when an earthquake occurred in Kagoshima. The co-occurrence word identification unit 44 in the control unit 3C identifies the co-occurrence words of each important word from the current tweet group. The co-occurrence word identification unit 44 then counts, for each important word, the co-occurrence count, that is, the number of times each co-occurrence word appears together with it. As shown in FIG. 12, for the important word "Kagoshima", for example, the co-occurrence word identification unit 44 stores in the co-occurrence word storage unit 53 a co-occurrence count of 2 for "epicenter", 2 for "landslide", and 1 for "power outage".
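The per-tweet co-occurrence counting can be sketched as below. The function name `count_cooccurrences` is hypothetical, the sample tweets are invented so that the totals match the counts quoted for FIG. 12, and counting each word at most once per tweet is an assumption about what counting "tweet by tweet" means.

```python
from collections import Counter


def count_cooccurrences(tweets, important_words):
    """For each important word, count the tweets in which each other word appears with it.

    tweets: list of token lists, one list per tweet (already word-segmented).
    """
    counts = {word: Counter() for word in important_words}
    for tokens in tweets:
        unique = set(tokens)  # count each word once per tweet
        for word in important_words:
            if word in unique:
                counts[word].update(unique - {word})
    return counts


# Invented tweets, chosen only to reproduce the counts in the FIG. 12 example.
tweets = [
    ["Kagoshima", "epicenter"],
    ["Kagoshima", "epicenter", "landslide"],
    ["Kagoshima", "landslide", "power outage"],
    ["power outage", "landslide"],  # no important word, so it adds nothing
]
cooccurrence = count_cooccurrences(tweets, ["Kagoshima"])
```

Here `cooccurrence["Kagoshima"]` holds 2 for "epicenter", 2 for "landslide", and 1 for "power outage".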

図１３は、渋滞時のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「渋滞」で、重要単語「基山」、「亀山」及び「一宮」と特定したとする。また、共起単語特定部４４は、重要単語「基山」の共起単語が「付近」、「上り」、「下り」及び「事故」、重要単語「亀山」の共起単語が「付近」、「ＩＣ」、「上り」及び「下り」、重要単語「一宮」の共起単語が「付近」、「ＩＣ」及び「上り」と特定したとする。推定部４６は、これら重要単語及び共起単語の組合せから「基山、亀山で上り下り渋滞、一宮で上り渋滞、また、事故が発生」との事例を推定できる。 FIG. 13 is an explanatory diagram showing an example of event estimation from a group of tweets posted during a traffic jam. Assume that, for the keyword "traffic jam", the important word identification unit 43 has identified the important words "Kiyama", "Kameyama", and "Ichinomiya". Assume also that the co-occurrence word identification unit 44 has identified "near", "up", "down", and "accident" as co-occurrence words of the important word "Kiyama"; "near", "IC", "up", and "down" as co-occurrence words of "Kameyama"; and "near", "IC", and "up" as co-occurrence words of "Ichinomiya". From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate the case "traffic jams in both the up and down directions at Kiyama and Kameyama, a traffic jam in the up direction at Ichinomiya, and an accident has occurred".

図１４は、渋滞時のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「渋滞」で、重要単語が「アクアライン」、「木更津」及び「帰宅」と特定したとする。更に、共起単語特定部４４は、重要単語「アクアライン」の共起単語が「時間」、「上り」及び「帰り」、重要単語「木更津」の共起単語が「上り」、「ＩＣ」及び「付近」、重要単語「帰宅」の共起単語が「時間」と特定したとする。推定部４６は、これら重要単語及び共起単語の組合せから「アクアライン上り、木更津付近で、帰宅時間に渋滞が発生」との事例を推定できる。 FIG. 14 is an explanatory diagram showing an example of event estimation from a group of tweets during a traffic jam. It is assumed that the important word identification unit 43 specifies that the keyword is "traffic jam" and the important words are "aqua line", "Kisarazu" and "return home". Further, in the co-occurrence word identification unit 44, the co-occurrence words of the important word "Aqualine" are "time", "up" and "return", and the co-occurrence words of the important word "Kisaratsu" are "up" and "IC". And, it is assumed that the co-occurrence word of "nearby" and the important word "homecoming" is specified as "time". From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate a case of "traffic jam occurs at the time of returning home near the aqua line and Kisarazu".

図１５は、鹿児島で地震が発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「地震」、重要単語が「鹿児島」、「停電」、「土砂崩れ」及び「雨」と特定したとする。更に、共起単語特定部４４は、重要単語「鹿児島」の共起単語が「震度」、「九州」、「大丈夫」、「心配」、「停電」及び「震源」、重要単語「停電」の共起単語が「鹿児島」、「九州」及び「南」と特定したとする。更に、共起単語特定部４４は、重要単語「土砂崩れ」の共起単語が「九州」、「鹿児島」及び「心配」、重要単語「雨」の共起単語が「九州」、「大丈夫」、「鹿児島」及び「心配」と特定したとする。推定部４６は、これら重要単語及び共起単語の組合せから「鹿児島を震源とした地震が発生、停電も発生、雨も重なり土砂崩れの心配」との事例を推定できる。 FIG. 15 is an explanatory diagram showing an example of event estimation from a group of tweets posted when an earthquake occurred in Kagoshima. Assume that the important word identification unit 43 has identified, for the keyword "earthquake", the important words "Kagoshima", "power outage", "landslide", and "rain". Assume also that the co-occurrence word identification unit 44 has identified "seismic intensity", "Kyushu", "OK", "worry", "power outage", and "epicenter" as co-occurrence words of the important word "Kagoshima", and "Kagoshima", "Kyushu", and "south" as co-occurrence words of the important word "power outage". Assume further that the co-occurrence word identification unit 44 has identified "Kyushu", "Kagoshima", and "worry" as co-occurrence words of the important word "landslide", and "Kyushu", "OK", "Kagoshima", and "worry" as co-occurrence words of the important word "rain". From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate the case "an earthquake with its epicenter in Kagoshima has occurred, a power outage has also occurred, and with rain on top of that there is concern about landslides".

図１６は、地震義援金を募集した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「地震」、重要単語が「スマ」、「支援」、「義援金」及び「応援」と特定したとする。更に、共起単語特定部４４は、重要単語「スマ」の共起単語が「復興」、「地震」、「東日本」、「熊本」及び「大震災」、重要単語の「支援」が「復興」、「東日本」、「大震災」、「熊本」及び「地震」と仮定する。更に、重要単語「義援金」の共起単語が「復興」、「熊本」、「地震」、「東日本」及び「お願い」、重要単語「応援」の共起単語が「復興」、「大震災」、「東日本」、「熊本」及び「地震」と特定したとする。推定部４６は、これら重要単語及び共起単語の組合せから「地震が発生したわけではなく、熊本・東日本における震災に対する支援」との事例を推定できる。 FIG. 16 is an explanatory diagram showing an example of event estimation from a group of tweets when soliciting earthquake relief funds. It is assumed that the important word identification unit 43 identifies the keywords as "earthquake" and the important words as "suma", "support", "donation" and "support". Furthermore, in the co-occurrence word identification unit 44, the co-occurrence words of the important word "suma" are "reconstruction", "earthquake", "eastern Japan", "Kumamoto" and "great earthquake", and the important word "support" is "reconstruction". , "Eastern Japan", "Great Earthquake", "Kumamoto" and "Earthquake". Furthermore, the co-occurrence words of the important word "donation" are "reconstruction", "Kumamoto", "earthquake", "eastern Japan" and "request", and the co-occurrence words of the important word "support" are "reconstruction", "great earthquake", It is assumed that "Eastern Japan", "Kumamoto" and "earthquake" are identified. From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate the case of "the earthquake did not occur, but the support for the earthquake in Kumamoto and eastern Japan".

図１７は、北海道北見市で停電を発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「停電」、重要単語が「雷」、「北見」、「一瞬」及び「信号」と特定したとする。更に、共起単語特定部４４は、重要単語「雷」の共起単語が「笑」及び、「北見」、重要単語「北見」の共起単語が「市」、「情報」及び「笑」、重要単語「一瞬」の共起単語が「雷」、「笑」及び「電気」、重要単語「信号」の共起単語が「電気」、「中」、「家」及び「復活」と特定したとする。推定部４６は、これら重要単語及び共起単語の組合せから「北海道北見市で雷が原因の停電が発生し、瞬間的な停電で、信号等も影響を受けた」との事例を推定できる。 FIG. 17 is an explanatory diagram showing an example of event estimation from a group of tweets when a power failure occurs in Kitami City, Hokkaido. It is assumed that the important word identification unit 43 specifies that the keyword is "power failure" and the important words are "thunder", "Kitami", "moment" and "signal". Further, in the co-occurrence word identification unit 44, the co-occurrence words of the important word "thunder" are "laughs" and "Kitami", and the co-occurrence words of the important words "Kitami" are "city", "information" and "laughs". , The co-occurrence words of the important word "moment" are identified as "thunder", "laugh" and "electricity", and the co-occurrence words of the important word "signal" are identified as "electricity", "middle", "house" and "revival". Suppose you did. From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate the case that "a power outage caused by lightning occurred in Kitami City, Hokkaido, and the signal and the like were affected by the momentary power outage."

図１８は、神奈川県横浜市で停電を発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「停電」、重要単語が「中」、「横浜」及び「うち」と特定したとする。更に、共起単語特定部４４は、重要単語「中」の共起単語が「市」、「区」及び「横浜」、重要単語「横浜」の共起単語が「中」、「市」及び「区」、重要単語「うち」の共起単語が「近所」及び「一帯」と特定したとする。推定部４６は、重要単語及び共起単語の組合せから「横浜市中区で停電が発生」との事例を推定できる。 FIG. 18 is an explanatory diagram showing an example of event estimation from a group of tweets when a power failure occurs in Yokohama City, Kanagawa Prefecture. It is assumed that the important word identification unit 43 specifies that the keyword is "power outage" and the important words are "middle", "Yokohama" and "uchi". Further, in the co-occurrence word identification unit 44, the co-occurrence words of the important word "middle" are "city", "ward" and "Yokohama", and the co-occurrence words of the important word "Yokohama" are "middle", "city" and It is assumed that the co-occurrence words of "ward" and the important word "uchi" are specified as "neighborhood" and "area". The estimation unit 46 can estimate the case of "a power outage has occurred in Naka-ku, Yokohama" from the combination of important words and co-occurrence words.

図１９は、第１の推定処理に関わる情報処理装置３内の制御部３Ｃの処理動作の一例を示すフロー図である。図１９において制御部３Ｃ内の特定部４１は、ＳＮＳに投稿されたツイートから一定条件のツイートを収集する（ステップＳ３１）。特定部４１は、単位時間内の一定条件のツイート数をカウントする（ステップＳ３２）。特定部４１は、一定条件のツイート数が所定回数を超えたか否かを判定する（ステップＳ３３）。尚、所定回数は、例えば、平常時のツイート数のｔ倍とする。特定部４１は、一定条件のツイート数が所定回数を超えた場合（ステップＳ３３肯定）、メッセージ発信の一時的な増加、すなわち一定条件のツイート群のバースト発生と判断する（ステップＳ３４）。制御部３Ｃは、図２０に示す第１の出力処理を実行する（ステップＳ３５）。 FIG. 19 is a flowchart showing an example of the processing operation of the control unit 3C in the information processing apparatus 3 for the first estimation process. In FIG. 19, the identification unit 41 in the control unit 3C collects tweets matching a fixed condition from the tweets posted on the SNS (step S31). The identification unit 41 counts the number of tweets matching the condition within a unit time (step S32). The identification unit 41 determines whether the number of matching tweets has exceeded a predetermined number (step S33). The predetermined number is, for example, t times the number of tweets in normal times. When the number of matching tweets has exceeded the predetermined number (step S33: Yes), the identification unit 41 determines that a temporary increase in message transmission, that is, a burst of the tweet group matching the condition, has occurred (step S34). The control unit 3C then executes the first output process shown in FIG. 20 (step S35).

制御部３Ｃ内の推定部４６は、第１の出力処理を実行した後、重要単語と共起単語との組み合わせに応じて事象を推定し（ステップＳ３６）、図１９に示す処理動作を終了する。推定部４６は、一定条件のツイート数が所定回数を超えなかった場合（ステップＳ３３否定）、一定条件のツイートを取得すべく、ステップＳ３１に移行する。 After the first output process has been executed, the estimation unit 46 in the control unit 3C estimates an event according to the combination of the important word and the co-occurrence words (step S36), and the processing operation shown in FIG. 19 ends. When the number of tweets under the fixed condition does not exceed the predetermined number (step S33: No), the estimation unit 46 returns to step S31 to acquire tweets under the fixed condition.
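The burst decision of steps S32 to S34 reduces to comparing the tweet count in a unit time with t times the normal count. The sketch below assumes the per-window counts are already available; the function names and the sample numbers are hypothetical.

```python
def is_burst(current_count, normal_count, t):
    """Return True when the count in the unit time exceeds t times the normal count (step S33)."""
    return current_count > t * normal_count


def first_burst_window(counts_per_window, normal_count, t):
    """Scan per-window tweet counts and return the index of the first burst, or None."""
    for index, count in enumerate(counts_per_window):
        if is_burst(count, normal_count, t):
            return index  # corresponds to the burst decision in step S34
    return None  # step S33 stayed negative for every window


# Invented counts: with a normal count of 50 and t = 3, the threshold is 150.
burst_index = first_burst_window([40, 55, 180, 90], normal_count=50, t=3)
```

Because the text says the count must exceed the predetermined number, a count exactly equal to the threshold is not treated as a burst in this sketch.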

図２０は、第１の出力処理に関わる情報処理装置３内の制御部３Ｃの処理動作の一例を示すフロー図である。図２０において制御部３Ｃ内の抽出部４２は、バースト期間中の現在ツイート群を含む第１の文書を取得する（ステップＳ４１）。抽出部４２は、過去の平常時のツイート群の内、対象条件に適合した比較対象ツイート群を含む第２の文書を取得する（ステップＳ４２）。 FIG. 20 is a flow chart showing an example of the processing operation of the control unit 3C in the information processing apparatus 3 related to the first output processing. In FIG. 20, the extraction unit 42 in the control unit 3C acquires the first document including the current tweet group during the burst period (step S41). The extraction unit 42 acquires a second document including a comparison target tweet group that matches the target condition from the past normal tweet group (step S42).

制御部３Ｃ内の重要単語特定部４３は、第１の文書及び第２の文書の形態素解析で単語分割し（ステップＳ４３）、単語毎にＴＦ－ＩＤＦ値を算出する（ステップＳ４４）。重要単語特定部４３は、単語毎のＴＦ－ＩＤＦ値が上位Ｎ位までの単語を重要単語として特定する（ステップＳ４５）。尚、重要単語特定部４３は、特定した重要単語を重要単語記憶部５２に記憶する。 The important word identification unit 43 in the control unit 3C divides words by morphological analysis of the first document and the second document (step S43), and calculates a TF-IDF value for each word (step S44). The important word specifying unit 43 identifies words having a TF-IDF value up to the highest N rank for each word as important words (step S45). The important word specifying unit 43 stores the specified important word in the important word storage unit 52.

制御部３Ｃ内の共起単語特定部４４は、重要単語毎に共起する単語を共起単語として特定し、共起単語毎の共起回数をカウントする（ステップＳ４６）。尚、共起単語は、例えば、重要単語に共起した名詞単語である。共起単語特定部４４は、特定した共起単語を共起単語記憶部５３に記憶する。 The co-occurrence word specifying unit 44 in the control unit 3C identifies a co-occurrence word for each important word as a co-occurrence word, and counts the number of co-occurrence words for each co-occurrence word (step S46). The co-occurrence word is, for example, a noun word co-occurring with an important word. The co-occurrence word specifying unit 44 stores the specified co-occurrence word in the co-occurrence word storage unit 53.

制御部３Ｃ内の出力部４５は、重要単語の共起単語の内、共起回数が昇順で上位Ｎ位までの共起単語を重要単語と組み合わせて出力し（ステップＳ４７）、図２０に示す処理動作を終了する。 The output unit 45 in the control unit 3C outputs the co-occurrence words up to the top N in ascending order of the number of co-occurrence words among the co-occurrence words of the important words in combination with the important words (step S47), and is shown in FIG. End the processing operation.
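Step S47 can be sketched as ranking each important word's co-occurrence words and keeping the first N. The sketch sorts from the largest count to the smallest, on the reading that the "top N" words are the most frequent ones; that reading, the function name `top_combinations`, and the alphabetical tie-break are assumptions.

```python
def top_combinations(cooccurrence_counts, n):
    """Pair each important word with its n most frequent co-occurrence words.

    cooccurrence_counts: maps an important word to a {co-occurrence word: count} mapping.
    """
    result = {}
    for important, counts in cooccurrence_counts.items():
        # Largest count first; ties are broken alphabetically for a stable order.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        result[important] = [word for word, _ in ranked[:n]]
    return result


# Counts taken from the FIG. 12 example in the text.
fig12_counts = {"Kagoshima": {"epicenter": 2, "landslide": 2, "power outage": 1}}
combinations = top_combinations(fig12_counts, n=2)
```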

実施例２の情報処理装置３は、事象特有の重要単語を特定すると共に、他の同条件で取得した比較対象ツイート群と比較して、その事象発生時に特に顕著に現れた共起単語を特定する。情報処理装置３は、重要単語と同時に呟かれた回数が多い共起単語との組合せに応じて事象を推定する。その結果、事象特有の現象及び原因の要点を把握できる。 The information processing apparatus 3 of the second embodiment identifies important words peculiar to an event and, by comparison with a comparison target tweet group acquired under otherwise the same conditions, identifies co-occurrence words that appeared especially prominently when the event occurred. The information processing apparatus 3 estimates the event according to the combination of an important word and the co-occurrence words most often tweeted together with it. As a result, the key points of the phenomena and causes peculiar to the event can be grasped.

情報処理装置３は、重要単語が含まれるメッセージ発信の一時的な増加を検知すると、該一時的な増加に対応する期間に発信されたメッセージについて重要単語と共起する共起単語を抽出する。情報処理装置３は、重要単語及び共起単語の組合せを出力し、これら重要単語及び共起単語に応じて事象を推定する。その結果、事象特有の現象及び原因の要点を把握できる。 When the information processing apparatus 3 detects a temporary increase in message transmission including important words, the information processing apparatus 3 extracts co-occurrence words that co-occur with the important words for the messages transmitted in the period corresponding to the temporary increase. The information processing apparatus 3 outputs a combination of important words and co-occurrence words, and estimates an event according to these important words and co-occurrence words. As a result, it is possible to grasp the main points of the phenomenon and the cause peculiar to the event.

尚、上記実施例２の情報処理装置３では、重要単語毎に共起する共起単語を名詞で例示したが、これに限定されるものではなく、例えば、共起単語が形容詞であっても良く、その実施の形態につき、実施例３として以下に説明する。 In the information processing apparatus 3 of the second embodiment, the co-occurrence words co-occurring for each important word are exemplified by nouns, but the present invention is not limited to this, and for example, even if the co-occurrence words are adjectives. Well, the embodiment thereof will be described below as Example 3.

図２１は、実施例３の情報処理装置３の機能構成の一例を示す説明図である。尚、実施例２の情報処理装置３と同一の構成には同一符号を付すことで、その重複する構成及び動作の説明については省略する。 FIG. 21 is an explanatory diagram showing an example of the functional configuration of the information processing apparatus 3 of the third embodiment. The same configuration as that of the information processing apparatus 3 of the second embodiment is designated by the same reference numeral, and the description of the overlapping configuration and operation will be omitted.

図２１に示す情報処理装置３は、制御部３Ｅと、記憶部３Ｆとを有する。制御部３Ｅは、例えば、ＣＰＵ１５に対応する。制御部３Ｅは、例えば、ＲＯＭ１３に格納された情報処理プログラムをＲＡＭ１４上に展開し、ＲＡＭ１４上に展開された情報処理プログラムを情報処理プロセスとして実行することで、例えば、特定部４１、抽出部４２、重要単語特定部４３、共起単語特定部４４Ａ、出力部４５及び推定部４６を機能として実行する。記憶部３Ｆは、例えば、ＨＤＤ１２、ＲＯＭ１３及びＲＡＭ１４等に対応する。 The information processing apparatus 3 shown in FIG. 21 has a control unit 3E and a storage unit 3F. The control unit 3E corresponds to, for example, the CPU 15. The control unit 3E, for example, loads the information processing program stored in the ROM 13 into the RAM 14 and executes the loaded program as an information processing process, thereby implementing, for example, the identification unit 41, the extraction unit 42, the important word identification unit 43, the co-occurrence word identification unit 44A, the output unit 45, and the estimation unit 46 as functions. The storage unit 3F corresponds to, for example, the HDD 12, the ROM 13, the RAM 14, and the like.

記憶部３Ｆは、キーワード記憶部５１と、重要単語記憶部５２と、共起単語記憶部５３とを有する。キーワード記憶部５１は、例えば、親キーワード及び子キーワード等のキーワードを記憶する領域である。重要単語記憶部５２は、例えば、重要単語を記憶する領域である。共起単語記憶部５３は、例えば、形容詞の共起単語を記憶する領域である。 The storage unit 3F has a keyword storage unit 51, an important word storage unit 52, and a co-occurrence word storage unit 53. The keyword storage unit 51 is an area for storing keywords such as a parent keyword and a child keyword, for example. The important word storage unit 52 is, for example, an area for storing important words. The co-occurrence word storage unit 53 is, for example, an area for storing co-occurrence words of adjectives.

共起単語特定部４４Ａは、重要単語に共起する単語の内、形容詞の共起単語を特定する。共起単語特定部４４Ａは、形容詞の共起単語の内、共起回数が上位の共起単語を共起単語記憶部５３に記憶する。 The co-occurrence word identification unit 44A identifies, among the words that co-occur with an important word, the co-occurrence words that are adjectives. Among those adjective co-occurrence words, the co-occurrence word identification unit 44A stores the ones with the highest co-occurrence counts in the co-occurrence word storage unit 53.

図２２は、鹿児島で地震が発生した際のツイート群の重要単語に関わる共起単語毎の出現回数の一例を示す説明図である。共起単語特定部４４Ａは、今回の現在ツイート群から重要単語の形容詞の共起単語を特定する。そして、共起単語特定部４４Ａは、重要単語毎の共起単語の共起回数をカウントする。共起単語特定部４４Ａは、図２２に示すように、例えば、重要単語「鹿児島」の共起単語として、「大きい」の共起回数が２回、「怖い」の共起回数が２回、「多い」の共起回数が１回として共起単語記憶部５３に記憶する。 FIG. 22 is an explanatory diagram showing an example of the number of occurrences of each co-occurrence word related to an important word in a tweet group when an earthquake occurs in Kagoshima. The co-occurrence word identification unit 44A identifies the co-occurrence word of the adjective of the important word from the current tweet group this time. Then, the co-occurrence word specifying unit 44A counts the number of co-occurrence words for each important word. As shown in FIG. 22, the co-occurrence word identification unit 44A has, for example, as a co-occurrence word of the important word "Kagoshima", the number of co-occurrence of "large" is 2 times and the number of co-occurrence of "scary" is 2 times. The number of co-occurrence of "many" is set to 1 and stored in the co-occurrence word storage unit 53.
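Restricting co-occurrence words to adjectives can be sketched by filtering on a part-of-speech tag before counting. The sketch assumes the tweets have already been morphologically analyzed into `(word, part_of_speech)` pairs; the tag string "adjective", the function name, and the sample tweets (invented to reproduce the counts quoted for FIG. 22) are all hypothetical.

```python
from collections import Counter


def count_adjective_cooccurrences(tagged_tweets, important_word):
    """Count adjectives that share a tweet with important_word.

    tagged_tweets: list of tweets, each a list of (word, part_of_speech) pairs.
    The tag "adjective" is a placeholder for whatever the analyzer emits.
    """
    counts = Counter()
    for tweet in tagged_tweets:
        words = {word for word, _ in tweet}
        if important_word not in words:
            continue  # only tweets containing the important word contribute
        adjectives = {
            word for word, pos in tweet
            if pos == "adjective" and word != important_word
        }
        counts.update(adjectives)  # each adjective counted once per tweet
    return counts


# Invented tagged tweets, chosen only to reproduce the FIG. 22 counts.
tagged_tweets = [
    [("Kagoshima", "noun"), ("large", "adjective"), ("scary", "adjective")],
    [("Kagoshima", "noun"), ("large", "adjective"), ("epicenter", "noun")],
    [("Kagoshima", "noun"), ("scary", "adjective"), ("many", "adjective")],
    [("Tokyo", "noun"), ("scary", "adjective")],  # lacks the important word
]
adjective_counts = count_adjective_cooccurrences(tagged_tweets, "Kagoshima")
```

Nouns such as "epicenter" are ignored here, so the result contains only "large" (2), "scary" (2), and "many" (1).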

図２３は、鹿児島で地震が発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「地震」、重要単語が「鹿児島」、「最近」及び「停電」と特定したとする。更に、共起単語特定部４４Ａは、重要単語「鹿児島」の共起単語が「大きい」、「多い」、「強い」及び「怖い」、重要単語「最近」の共起単語が「多い」、「怖い」及び「大きい」、重要単語「停電」の共起単語が「暑い」と特定したとする。推定部４６は、これらの重要単語及び共起単語の組合せから「鹿児島で大きい地震が最近多発して停電で暑くなっている」との事例を推定できる。 FIG. 23 is an explanatory diagram showing an example of event estimation from a group of tweets when an earthquake occurs in Kagoshima. It is assumed that the important word identification unit 43 specifies that the keyword is "earthquake" and the important words are "Kagoshima", "recently" and "power outage". Further, in the co-occurrence word identification unit 44A, the co-occurrence words of the important word "Kagoshima" are "large", "many", "strong" and "scary", and the co-occurrence words of the important word "recent" are "many". Suppose that the co-occurrence words "scary" and "big" and the important word "power failure" are identified as "hot". From the combination of these important words and co-occurrence words, the estimation unit 46 can estimate the case that "a large earthquake has recently occurred frequently in Kagoshima and it has become hot due to a power outage."

図２４は、地震義援金を募集した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「地震」、重要単語が「復興」、「熊本」、「東日本」及び「支援」と特定したとする。更に、共起単語特定部４４Ａは、重要単語「復興」の共起単語が「宜しく」、「温かい」及び「深」、重要単語「熊本」の共起単語が「宜しく」、「温かい」及び「暖かい」、重要単語「東日本」の共起単語が「宜しく」、「温かい」及び「詳しく」、重要単語「支援」の共起単語が「宜しく」、「温かい」及び「暖かい」と特定したとする。推定部４６は、重要単語及び共起単語の組合せから「地震が発生したわけではなく、熊本と東日本の地震に関しての温かい支援をお願いしている」との事例を推定できる。 FIG. 24 is an explanatory diagram showing an example of event estimation from a group of tweets when soliciting earthquake relief funds. It is assumed that the important word identification unit 43 identifies the keywords as "earthquake" and the important words as "reconstruction", "Kumamoto", "eastern Japan" and "support". Further, in the co-occurrence word identification unit 44A, the co-occurrence words of the important word "reconstruction" are "nice", "warm" and "deep", and the co-occurrence words of the important word "Kumamoto" are "nice", "warm" and The co-occurrence words of "warm" and the important word "East Japan" were identified as "nice", "warm" and "details", and the co-occurrence words of the important word "support" were identified as "nice", "warm" and "warm". And. From the combination of important words and co-occurrence words, the estimation unit 46 can estimate the case that "the earthquake did not occur and we are requesting warm support for the earthquakes in Kumamoto and eastern Japan."

図２５は、北海道北見市で停電を発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「停電」、重要単語が「雷」、「雨」及び「北見」と特定したとする。更に、共起単語特定部４４Ａは、重要単語「雷」の共起単語が「すごい」、「やばい」及び「怖い」、重要単語「雨」の共起単語が「すごい」、「やばい」及び「怖い」、重要単語「北見」の共起単語が「広い」、「危ない」及び「幅広い」と特定したとする。推定部４６は、重要単語及び共起単語の組合せから「雷が原因で停電が発生し、雨も降っており、北見市の広い範囲で停電している」との事例を推定できる。 FIG. 25 is an explanatory diagram showing an example of event estimation from a group of tweets posted when a power outage occurred in Kitami City, Hokkaido. Assume that the important word identification unit 43 has identified, for the keyword "power outage", the important words "thunder", "rain", and "Kitami". Assume also that the co-occurrence word identification unit 44A has identified "amazing", "terrible", and "scary" as co-occurrence words of the important word "thunder"; "amazing", "terrible", and "scary" as co-occurrence words of the important word "rain"; and "wide", "dangerous", and "extensive" as co-occurrence words of the important word "Kitami". From the combination of the important words and co-occurrence words, the estimation unit 46 can estimate the case "a power outage has occurred because of lightning, it is also raining, and the outage covers a wide area of Kitami City".

図２６は、神奈川県横浜市で停電を発生した際のツイート群から事象推定の一例を示す説明図である。重要単語特定部４３は、キーワードが「停電」、重要単語が「横浜」及び「情報」と特定したとする。更に、共起単語特定部４４Ａは、重要単語「横浜」の共起単語が「珍しい」及び「恐ろし」、重要単語「情報」の共起単語が「なし」と特定したとする。推定部４６は、重要単語及び共起単語の組合せから「横浜市で珍しく停電が発生し、情報が無い」との事例を推定する。 FIG. 26 is an explanatory diagram showing an example of event estimation from a group of tweets posted when a power outage occurred in Yokohama City, Kanagawa Prefecture. Assume that the important word identification unit 43 identifies the keyword as "power outage" and the important words as "Yokohama" and "information". Assume further that the co-occurrence word identification unit 44A identifies the co-occurrence words of the important word "Yokohama" as "rare" (珍しい) and "dreadful" (恐ろし), and the co-occurrence word of the important word "information" as "none" (なし). From the combination of the important words and the co-occurrence words, the estimation unit 46 estimates the case that "unusually, a power outage has occurred in Yokohama City, and no information is available."

図２７は、第２の推定処理に関わる情報処理装置３内の制御部３Ｅの処理動作の一例を示すフロー図である。図２７において制御部３Ｅは、ステップＳ３４にて一定条件のツイート群のバースト発生と判断した後、図２８に示す第２の出力処理を実行する（ステップＳ３５Ａ）。推定部４６は、第２の出力処理を実行した後、重要単語と共起単語（形容詞）との組み合わせに応じて事象を推定し（ステップＳ３６Ａ）、図２７に示す処理動作を終了する。尚、共起単語は、例えば、形容詞である。 FIG. 27 is a flow chart showing an example of the processing operation of the control unit 3E in the information processing apparatus 3 related to the second estimation process. In FIG. 27, after determining in step S34 that a burst has occurred in the group of tweets satisfying a certain condition, the control unit 3E executes the second output process shown in FIG. 28 (step S35A). After the second output process has been executed, the estimation unit 46 estimates the event according to the combination of the important words and the co-occurrence words (adjectives) (step S36A), and the processing operation shown in FIG. 27 ends. The co-occurrence words are, for example, adjectives.

図２８は、第２の出力処理に関わる情報処理装置３内の制御部３Ｅの処理動作の一例を示すフロー図である。図２８において制御部３Ｅ内の重要単語特定部４３は、ステップＳ４５にて単語毎のＴＦ－ＩＤＦから上位ＴＦ－ＩＤＦ値の単語を重要単語として抽出する。制御部３Ｅ内の共起単語特定部４４Ａは、重要単語毎に共起する形容詞の共起単語の共起回数をカウントする（ステップＳ４６Ａ）。尚、共起単語は、例えば、形容詞である。 FIG. 28 is a flow chart showing an example of the processing operation of the control unit 3E in the information processing apparatus 3 related to the second output process. In FIG. 28, the important word identification unit 43 in the control unit 3E extracts, in step S45, the words with the highest TF-IDF values from the per-word TF-IDF values as important words. The co-occurrence word identification unit 44A in the control unit 3E counts, for each important word, the number of times each adjective co-occurrence word co-occurs with it (step S46A). The co-occurrence words are, for example, adjectives.

出力部４５は、重要単語の共起単語の内、共起回数が昇順で上位Ｎ位までの形容詞の共起単語を重要単語と組み合わせて出力し（ステップＳ４７Ａ）、図２８に示す処理動作を終了する。 The output unit 45 outputs, in combination with the important word, the adjective co-occurrence words ranked in the top N when the co-occurrence words of the important word are ordered by co-occurrence count in ascending order (step S47A), and ends the processing operation shown in FIG. 28.
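The counting and top-N selection of steps S46A and S47A can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the function name `top_adjective_cooccurrences`, the representation of each tweet as a list of tokens, and the externally supplied set of adjectives are all assumptions made for illustration, and the TF-IDF step S45 is assumed to have already produced the important words. The sketch ranks adjectives from the highest co-occurrence count downward, which is also an assumption about the intended ordering.

```python
from collections import Counter


def top_adjective_cooccurrences(tweets, important_words, adjectives, n=3):
    """Return {important_word: [(adjective, count), ...]} with at most n entries.

    tweets          -- list of token lists (hypothetical pre-tokenized input)
    important_words -- words assumed to have been selected by TF-IDF (step S45)
    adjectives      -- set of tokens treated as adjectives (assumed given)
    """
    result = {}
    for word in important_words:
        counts = Counter()
        for tokens in tweets:
            token_set = set(tokens)
            if word in token_set:
                # Count each adjective at most once per tweet (step S46A).
                counts.update(a for a in token_set if a in adjectives)
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[word] = ranked[:n]  # top-N selection (step S47A)
    return result


tweets = [
    ["thunder", "scary"],
    ["thunder", "scary", "loud"],
    ["rain", "scary"],
    ["thunder", "loud"],
    ["thunder", "scary"],
]
pairs = top_adjective_cooccurrences(
    tweets, ["thunder", "rain"], {"scary", "loud"}, n=1
)
```

With this toy input, `pairs` maps "thunder" to its single most frequent adjective and "rain" to the only adjective it co-occurs with. In practice the adjective set would come from a part-of-speech tagger, which the sketch leaves out.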

情報処理装置３は、事象特有の重要単語を特定すると共に、他の同条件で取得した比較対象ツイート群と比較して、その事象発生時に特に顕著に現れた形容詞の共起単語を特定する。情報処理装置３は、重要単語と同時に呟かれた回数が多い形容詞の共起単語との組合せに応じて事象を推定する。その結果、事象特有の現象及び原因の要点を把握できる。 The information processing apparatus 3 identifies important words peculiar to an event and, by comparison with comparison-target tweet groups acquired under the same conditions, identifies adjective co-occurrence words that appeared especially prominently when the event occurred. The information processing apparatus 3 estimates the event according to the combination of each important word and the adjective co-occurrence words that were frequently tweeted together with it. As a result, the main points of the phenomena and causes peculiar to the event can be grasped.

情報処理装置３は、重要単語が含まれるツイート発信の一時的な増加を検知すると、該一時的な増加に対応する期間に発信されたツイートについて重要単語と共起する形容詞の共起単語を特定する。情報処理装置３は、特定された形容詞の共起単語を含む期間の内外におけるツイートの発信状況に基づいて期間を評価する。その結果、重要単語及び形容詞の共起単語に応じて今回のツイート群の期間を評価できる。事象毎に頻出する形容詞に応じて言葉の変化を登録できる。 When the information processing apparatus 3 detects a temporary increase in the posting of tweets containing an important word, it identifies, for the tweets posted in the period corresponding to the temporary increase, the adjective co-occurrence words that co-occur with the important word. The information processing apparatus 3 evaluates the period based on how tweets containing the identified adjective co-occurrence words were posted inside and outside the period. As a result, the period of the current tweet group can be evaluated according to the important words and the adjective co-occurrence words. Changes in wording can also be registered according to the adjectives that appear frequently for each event.
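One way to picture the evaluation of a period from the posting status inside and outside it is to compare how often an adjective co-occurrence word appears in each. The following Python sketch is only an illustration under stated assumptions: the function name `adjective_rate_inside_outside`, the `(timestamp, tokens)` tweet representation, and the use of a simple per-tweet rate as the evaluation measure are all hypothetical and are not specified in the description.

```python
def adjective_rate_inside_outside(tweets, start, end, adjective):
    """Return (rate_inside, rate_outside) for one adjective.

    tweets    -- list of (timestamp, tokens) pairs (hypothetical format)
    start/end -- half-open period [start, end) of the temporary increase
    adjective -- adjective co-occurrence word to evaluate
    """
    inside = [tokens for t, tokens in tweets if start <= t < end]
    outside = [tokens for t, tokens in tweets if not (start <= t < end)]

    def rate(group):
        # Share of tweets in the group that contain the adjective.
        if not group:
            return 0.0
        return sum(adjective in tokens for tokens in group) / len(group)

    return rate(inside), rate(outside)


tweets = [
    (0, ["calm"]),
    (1, ["scary", "thunder"]),
    (2, ["scary"]),
    (3, ["thunder"]),
    (4, ["calm"]),
]
rate_in, rate_out = adjective_rate_inside_outside(tweets, 1, 4, "scary")
```

A markedly higher rate inside the period than outside it would suggest that the adjective is characteristic of that period; any decision threshold on top of these two rates would be a further design choice.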

また、図示した各部の各構成要素は、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各部の分散・統合の具体的形態は図示のものに限られず、その全部又は一部を、各種の負荷や使用状況等に応じて、任意の単位で機能的又は物理的に分散・統合して構成することができる。 Further, the components of each illustrated unit do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the units is not limited to the illustrated one, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions and the like.

更に、各装置で行われる各種処理機能は、ＣＰＵ（Central Processing Unit）（又はＭＰＵ（Micro Processing Unit）、ＭＣＵ（Micro Controller Unit）等のマイクロ・コンピュータ）上で、その全部又は任意の一部を実行するようにしても良い。また、各種処理機能は、ＣＰＵ（又はＭＰＵ、ＭＣＵ等のマイクロ・コンピュータ）で解析実行するプログラム上、又はワイヤードロジックによるハードウェア上で、その全部又は任意の一部を実行するようにしても良いことは言うまでもない。 Further, all or any part of the various processing functions performed by each apparatus may be executed on a CPU (Central Processing Unit) (or on a microcomputer such as an MPU (Micro Processing Unit) or an MCU (Micro Controller Unit)). Needless to say, all or any part of the various processing functions may also be executed on a program analyzed and executed by a CPU (or a microcomputer such as an MPU or MCU), or on hardware using wired logic.

ところで、本実施例で説明した各種の処理は、予め用意されたプログラムを情報処理装置で実行することで実現できる。そこで、以下では、上記実施例と同様の機能を有するプログラムを実行する情報処理装置の一例を説明する。図２９は、情報処理プログラムを実行する情報処理装置の一例を示す説明図である。 By the way, various processes described in this embodiment can be realized by executing a program prepared in advance in the information processing apparatus. Therefore, in the following, an example of an information processing apparatus that executes a program having the same function as that of the above embodiment will be described. FIG. 29 is an explanatory diagram showing an example of an information processing apparatus that executes an information processing program.

図２９に示す情報処理プログラムを実行する情報処理装置１００では、通信装置１１０と、表示装置１２０と、操作装置１３０と、ＨＤＤ１４０と、ＲＯＭ１５０と、ＲＡＭ１６０と、ＣＰＵ１７０と、バス１８０とを有する。通信装置１１０は、図示せぬ通信網と接続し、通信網内のＳＮＳの投稿を収集する。 The information processing device 100 that executes the information processing program shown in FIG. 29 includes a communication device 110, a display device 120, an operation device 130, an HDD 140, a ROM 150, a RAM 160, a CPU 170, and a bus 180. The communication device 110 connects to a communication network (not shown) and collects SNS posts in the communication network.

そして、ＲＯＭ１５０には、上記実施例と同様の機能を発揮する情報処理プログラムが予め記憶されている。尚、情報処理プログラムは、必ずしも最初からＲＯＭ１５０に記憶させておかなくても良く、図示せぬドライブで読取可能な記録媒体に情報処理プログラムが記録されていても良い。また、記録媒体としては、例えば、フレキシブルディスク（ＦＤ）、ＣＤ－ＲＯＭ、ＤＶＤディスク、ＵＳＢメモリ、ＳＤカードやＩＣカード等の可搬型記録媒体、フラッシュメモリ等の半導体メモリ等でも良い。情報処理装置１００が記録媒体に記憶中の情報処理プログラムを読み出して実行するようにしても良い。また、情報処理プログラムとしては、図２９に示すように、特定プログラム１５０Ａ、抽出プログラム１５０Ｂ、比較プログラム１５０Ｃ及び出力プログラム１５０Ｄが含まれる。尚、プログラム１５０Ａ～１５０Ｄについては、適宜統合又は分散しても良い。 The ROM 150 stores in advance an information processing program that provides the same functions as those of the above embodiment. The information processing program does not necessarily have to be stored in the ROM 150 from the beginning; it may instead be recorded on a recording medium readable by a drive (not shown). The recording medium may be, for example, a flexible disk (FD), a CD-ROM, a DVD disc, a USB memory, a portable recording medium such as an SD card or an IC card, or a semiconductor memory such as a flash memory. The information processing apparatus 100 may read the information processing program from the recording medium and execute it. As shown in FIG. 29, the information processing program includes an identification program 150A, an extraction program 150B, a comparison program 150C and an output program 150D. The programs 150A to 150D may be integrated or distributed as appropriate.

そして、ＣＰＵ１７０は、これらのプログラム１５０Ａ～１５０ＤをＲＯＭ１５０から読み出し、これら読み出された各プログラムをＲＡＭ１６０のワークエリア上に展開する。そして、ＲＡＭ１６０は、展開した各プログラム１５０Ａ～１５０Ｄを、特定プロセス１６０Ａ、抽出プロセス１６０Ｂ、比較プロセス１６０Ｃ及び出力プロセス１６０Ｄとして機能する。 The CPU 170 then reads the programs 150A to 150D from the ROM 150 and expands each of the read programs onto the work area of the RAM 160. On the RAM 160, the expanded programs 150A to 150D function as an identification process 160A, an extraction process 160B, a comparison process 160C and an output process 160D, respectively.

ＣＰＵ１７０は、所定期間における複数の投稿に含まれる所定のワードの出現回数を特定する。ＣＰＵ１７０は、特定した所定のワードの出現回数が所定の回数以上である場合に、所定のワードを含む投稿から、所定のワードとは異なる１または複数のワードを抽出する。ＣＰＵ１７０は、抽出した１または複数のワードそれぞれの出現回数が、所定期間と異なる期間における出現回数と比較して特定の閾値以上多いか否かに応じて、１または複数のワードを出力する。その結果、特定の事象に関する情報を取得するのに適切なキーワードを選定できる。 The CPU 170 specifies the number of occurrences of a predetermined word included in a plurality of posts in a predetermined period. When the number of appearances of the specified predetermined word is equal to or greater than the predetermined number of times, the CPU 170 extracts one or a plurality of words different from the predetermined word from the posts containing the predetermined word. The CPU 170 outputs one or a plurality of words depending on whether or not the number of appearances of each of the extracted one or a plurality of words is greater than or equal to a specific threshold value as compared with the number of appearances in a period different from a predetermined period. As a result, it is possible to select appropriate keywords for acquiring information on a specific event.
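The sequence of processing performed by the CPU 170 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `select_keywords` and its parameters are hypothetical, posts are assumed to be pre-tokenized lists of words, the "number of occurrences" is assumed to be the number of posts containing the word, both periods are assumed to be restricted to posts containing the predetermined word, and "greater by a specific threshold or more" is assumed to mean a difference in counts.

```python
from collections import Counter


def select_keywords(target_posts, baseline_posts, keyword, min_count, threshold):
    """Return candidate words for the target period, sorted alphabetically.

    target_posts   -- token lists posted in the predetermined period
    baseline_posts -- token lists posted in a different period
    keyword        -- the predetermined word
    min_count      -- predetermined number of occurrences required to proceed
    threshold      -- required surplus of occurrences over the baseline period
    """
    # Identify: count posts in the target period that contain the keyword.
    hits = [post for post in target_posts if keyword in post]
    if len(hits) < min_count:
        return []  # keyword not frequent enough; nothing is extracted

    # Extract: words other than the keyword in posts containing the keyword.
    target_counts = Counter(
        w for post in hits for w in set(post) if w != keyword
    )
    # Compare: the same count taken over the different (baseline) period.
    baseline_counts = Counter(
        w
        for post in baseline_posts
        if keyword in post
        for w in set(post)
        if w != keyword
    )
    # Output: words whose count exceeds the baseline by the threshold or more.
    return sorted(
        w for w, c in target_counts.items() if c - baseline_counts[w] >= threshold
    )


target = [
    ["outage", "thunder", "rain"],
    ["outage", "thunder"],
    ["outage", "thunder", "the"],
    ["sunny"],
]
baseline = [["outage", "the"], ["outage", "the", "rain"]]
words = select_keywords(target, baseline, "outage", min_count=3, threshold=2)
```

Comparing against a baseline period filters out words such as "the" that accompany the keyword at all times, leaving words that rose together with it. A ratio-based comparison would be an equally plausible reading of the threshold condition.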

１ 事象推定システム 1 Event estimation system
３ 情報処理装置 3 Information processing apparatus
２１ 特定部 21 Identification unit
２２ 抽出部 22 Extraction unit
２３ 比較部 23 Comparison unit
２４ 出力部 24 Output unit
４１ 特定部 41 Identification unit
４２ 抽出部 42 Extraction unit
４３ 重要単語特定部 43 Important word identification unit
４４ 共起単語特定部 44 Co-occurrence word identification unit
４４Ａ 共起単語特定部 44A Co-occurrence word identification unit
４５ 出力部 45 Output unit

Claims

Identify the number of occurrences of a predetermined word relating to a specific event in a plurality of posts within a predetermined period,
when the identified number of occurrences of the predetermined word is equal to or greater than a predetermined number, extract, from the posts containing the predetermined word, one or more co-occurrence words that differ from the predetermined word and co-occur with the predetermined word,
output the one or more co-occurrence words depending on whether the number of occurrences of each of the extracted one or more co-occurrence words is greater, by a specific threshold or more, than its number of occurrences in a period different from the predetermined period, and
estimate the specific event using the predetermined word and the output one or more co-occurrence words:
an information processing program characterized by causing a computer to execute the above processing.

The information processing program according to claim 1, wherein the period different from the predetermined period is a period in which the number of appearances of the specified predetermined word is less than the predetermined number of times.

The information processing program according to claim 1 or 2, characterized by causing the computer to execute a process of outputting the one or more co-occurrence words according to their numbers of occurrences.

The information processing program according to claim 1 or 2, characterized by causing the computer to execute a process of outputting the one or more co-occurrence words according to an increasing tendency of their numbers of occurrences.

The information processing program according to any one of claims 1 to 4, characterized by causing the computer to execute a process of outputting, according to its number of occurrences, a word that differs from the one or more co-occurrence words and is included in posts containing each of the one or more co-occurrence words.

Identify the number of occurrences of a predetermined word relating to a specific event in a plurality of posts within a predetermined period,
when the identified number of occurrences of the predetermined word is equal to or greater than a predetermined number, extract, from the posts containing the predetermined word, one or more words different from the predetermined word,
output the one or more words when the ratio of the number of occurrences of each of the one or more words in the predetermined period to its number of occurrences in a period different from the predetermined period is the same as or similar to the ratio of the number of occurrences of the predetermined word in the predetermined period to its number of occurrences in the period different from the predetermined period, and
estimate the specific event using the predetermined word and the output one or more words:
an information processing program characterized by causing a computer to execute the above processing.

When a temporary increase in the transmission of messages containing a predetermined word relating to a specific event is detected, extract, for the messages transmitted in the period corresponding to the temporary increase, a co-occurrence word that co-occurs with the predetermined word,
output the predetermined word and the extracted co-occurrence word, and
estimate the specific event using the predetermined word and the output co-occurrence word:
a message analysis program characterized by causing a computer to execute the above processing.

When a temporary increase in the transmission of messages containing a predetermined word relating to a specific event is detected, extract, for the messages transmitted in the period corresponding to the temporary increase, a co-occurrence word that co-occurs with the predetermined word and is an adjective,
evaluate the period based on the transmission status, inside and outside the period, of messages containing the extracted adjective co-occurrence word, and
estimate the specific event using the predetermined word and the co-occurrence word:
a message analysis program characterized by causing a computer to execute the above processing.

An identification unit that identifies the number of occurrences of a predetermined word relating to a specific event in a plurality of posts within a predetermined period,
an extraction unit that, when the identified number of occurrences of the predetermined word is equal to or greater than a predetermined number, extracts, from the posts containing the predetermined word, one or more co-occurrence words that differ from the predetermined word and co-occur with the predetermined word,
an output unit that outputs the one or more co-occurrence words depending on whether the number of occurrences of each of the extracted one or more co-occurrence words is greater, by a specific threshold or more, than its number of occurrences in a period different from the predetermined period, and
an estimation unit that estimates the specific event using the predetermined word and the output one or more co-occurrence words:
an information processing device characterized by having the above units.

A computer
identifies the number of occurrences of a predetermined word relating to a specific event in a plurality of posts within a predetermined period,
when the identified number of occurrences of the predetermined word is equal to or greater than a predetermined number, extracts, from the posts containing the predetermined word, one or more co-occurrence words that differ from the predetermined word and co-occur with the predetermined word,
outputs the one or more co-occurrence words depending on whether the number of occurrences of each of the extracted one or more co-occurrence words is greater, by a specific threshold or more, than its number of occurrences in a period different from the predetermined period, and
estimates the specific event using the predetermined word and the output one or more co-occurrence words:
an information processing method characterized in that the computer executes the above processing.