JP2017204169A

JP2017204169A - Event determination device

Info

Publication number: JP2017204169A
Application number: JP2016095938A
Authority: JP
Inventors: 柊高橋; Hiiragi Takahashi; 悠菊地; Yu Kikuchi; 健榎園; Ken Enokizono; 佑介深澤; Yusuke Fukazawa
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2016-05-12
Filing date: 2016-05-12
Publication date: 2017-11-16

Abstract

PROBLEM TO BE SOLVED: To provide an event determination device capable of determining the rarity of an event. SOLUTION: An event determination device 1 comprises: a word acquisition part 2; a position information acquisition part 3; a time information acquisition part 4; a first group acquisition part 5 for acquiring a first group of documents on the basis of the occurrence position and occurrence time of an event; a second group acquisition part 6 for acquiring, on the basis of the occurrence position of the event, a second group of documents different from the first group of documents; a similarity calculation part 7 for calculating a similarity between the documents of the two groups; an appearance frequency calculation part 8 for calculating the appearance frequency, in the first group of documents, of a word indicating the event; a determination part 9 for determining the rarity of the event on the basis of the similarity and the appearance frequency; and an output part 10 for outputting the determination result of the determination part 9. SELECTED DRAWING: Figure 1

Description

The present invention relates to an event determination device.

Conventionally, for microblog services, there is a technique for extracting documents belonging to an arbitrary category from posted documents by keyword matching against a dictionary prepared in advance. For example, Patent Document 1 describes a device that refers to a place name database and extracts, from posted documents, the documents that contain place name information.

JP 2014-137632 A

It is desirable to detect the occurrence of a rare event, for example one related to public security, from documents created by users and posted to a microblog service. The technique described above can extract, for example, documents that indicate a security-related event. However, the rarity of an event differs depending on the place where the event occurs.

An object of the present invention is to provide an event determination device capable of determining the rarity of an event.

To achieve the above object, an event determination device according to the present invention is an event determination device that determines the rarity of an event, and comprises: word acquisition means for acquiring a preset word indicating the event; position information acquisition means for acquiring position information indicating the position where the event occurred; time information acquisition means for acquiring time information indicating the time when the event occurred; first group acquisition means for acquiring a first group of documents, from documents for which a position and a time are set, based on the position indicated by the position information and the time indicated by the time information; second group acquisition means for acquiring, based on the position indicated by the position information, a second group of documents different from the first group of documents; similarity calculation means for calculating a similarity indicating how similar the first group of documents and the second group of documents are; appearance frequency calculation means for calculating the appearance frequency of the word in the first group of documents; determination means for determining the rarity of the event based on the similarity and the appearance frequency; and output means for outputting the determination result of the determination means.

The event determination device according to the present invention calculates the similarity between the first group of documents, acquired based on the position and time at which the event occurred, and the second group of documents, acquired based on the same position. It also calculates the appearance frequency, in the first group of documents, of the preset word indicating the event, and determines the rarity of the event based on the similarity and the appearance frequency. When a highly rare event occurs, the word indicating the event can be expected to appear more frequently in documents at the position where the event occurred. The content of documents at that position can also be expected to differ between the period around the time of the event and other periods. The similarity captures this change in the content of documents at the position where the event occurred. Because the rarity of the event is determined based on this appearance frequency and similarity, it can be determined in a way that depends on the position where the event occurred.

In the event determination device according to the present invention, the appearance frequency calculation means may calculate the appearance frequency of preset reaction words in the first group of documents and the appearance frequency of the reaction words in the second group of documents, and the determination means may additionally use these appearance frequencies of the reaction words in the first and second groups of documents to determine the rarity of the event. If an event is highly rare, documents at the time and position of the event can be expected to contain many reaction words. Therefore, by additionally using the appearance frequency of the reaction words, the rarity of the event can be determined more reliably.

In the event determination device according to the present invention, the second group acquisition means may acquire the second group of documents based on the time indicated by the time information. In this case, for example, by using as the second group the documents whose set times immediately precede those of the first group of documents, the change in the documents before and after the occurrence of the event is easy to capture through the similarity. This allows the rarity of the event to be determined more reliably.

According to the present invention, the rarity of an event can be determined.

FIG. 1 is a block diagram of an event determination device according to an embodiment.
FIG. 2 is a diagram showing the hardware configuration of the event determination device of FIG. 1.
FIG. 3 is a diagram showing an example of data stored in the posting database.
FIG. 4 is a diagram showing an example of data stored in the security-related event database.
FIG. 5 is a diagram showing an example of data stored in the reaction word database.
FIG. 6 is a diagram showing an example of data stored in the rare security-related event database.
FIG. 7 is a diagram showing an example of the first group of documents and the second group of documents.
FIG. 8 is a flowchart showing the operation of the event determination device.

Hereinafter, an embodiment of the event determination device according to the present invention will be described in detail with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted.

FIG. 1 is a block diagram of an event determination device 1 according to the embodiment. The event determination device 1 is a device for determining the rarity of an event, that is, whether or not the event is one that occurs with only a very low probability. For an event to be highly rare, it must occur in a place where such an event does not normally occur. In the present embodiment, the events to be determined by the event determination device 1 are events extracted in advance from microblogs, that is, from documents created by users and posted to a microblog service such as Twitter (registered trademark), and they are specific-category-related events related to a specific category. A microblog is a blog written in short text of about 150 characters in a single line. In the following description, the specific-category-related events are assumed to be security-related events, that is, events related to public security.

As shown in FIG. 1, the event determination device 1 includes a word acquisition unit 2, a position information acquisition unit 3, a time information acquisition unit 4, a first group acquisition unit 5, a second group acquisition unit 6, a similarity calculation unit 7, an appearance frequency calculation unit 8, a determination unit 9, and an output unit 10. The event determination device 1 is connected to each of a posting database 21, a security-related event database 22, a reaction word database 23, and a rare security-related event database 24 via a network such as the Internet.

FIG. 2 is a diagram showing the hardware configuration of the event determination device. As shown in FIG. 2, the event determination device 1 is configured to include a computer having one or more CPUs (Central Processing Units) 101, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage devices, a communication module 104 for performing communication, and hardware 105 such as an auxiliary storage device, for example a hard disk. The functions of the functional elements of the event determination device 1 in FIG. 1 are realized by operating these components under a program or the like. The event determination device 1 may also be configured as a computer system made up of a plurality of computers. The functional elements and databases shown in FIG. 1 are described below.

The posting database 21 is a device that stores documents for which a position and a time are set, namely documents created and posted by users. As shown in FIG. 3, the posting database 21 stores, in association with one another, a document ID identifying a document, the document itself, posting position information indicating the posting position (the position from which the document was posted), and posting time information indicating the posting time (the time at which the document was posted). The posting position information may be, for example, a latitude and longitude indicating a point, or a mesh ID indicating a mesh. A mesh is a section obtained by dividing an area on a map. If a posted document is not accompanied by such posting position information, the posting position may be set from what is written in the document. For example, a dictionary in which place names are registered in advance is prepared, and if keyword matching against the dictionary finds a place name in the document, that place name is set as the posting position. In this case, for the document "大門のあたり雨すごい！" ("The rain around Daimon is intense!") in FIG. 3, "大門" (Daimon) is set as the posting position.
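The record layout and the place-name fallback described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `Post` class, its field names, the `PLACE_NAMES` dictionary contents, the timestamp, and the use of plain substring matching are all hypothetical choices for illustration, not details given in the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Post:
    doc_id: str               # document ID
    text: str                 # the posted document
    position: Optional[str]   # mesh ID or place name; None if not attached
    posted_at: datetime       # posting time

# Hypothetical place-name dictionary prepared in advance.
PLACE_NAMES = ["大門", "映画館Ａ"]

def assign_position(post: Post, place_names=PLACE_NAMES) -> Post:
    """If no posting position is attached, set it from the first
    dictionary place name found in the text (substring matching)."""
    if post.position is None:
        for name in place_names:
            if name in post.text:
                post.position = name
                break
    return post

# The timestamp below is made up for the example.
post = assign_position(
    Post("d1", "大門のあたり雨すごい！", None, datetime(2016, 5, 12, 17, 50))
)
print(post.position)  # 大門
```

A real system would more likely tokenize the text first, since substring matching can produce false hits on place names that appear inside longer words.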

The security-related event database 22 is a device that stores information indicating the security-related events to be determined by the event determination device 1. As shown in FIG. 4, the security-related event database 22 stores, in association with one another, a security-related event ID identifying a security-related event, a security-related keyword (a preset word indicating the security-related event), occurrence position information indicating the occurrence position (the position where the security-related event occurred), and occurrence time information indicating the occurrence time (the time when the security-related event occurred). Like the posting position information, the occurrence position information may be, for example, a latitude and longitude or a mesh ID.

The method of extracting from microblogs the security-related events stored in the security-related event database 22 is not particularly limited. For example, security-related events may be extracted from the documents stored in the posting database 21 by keyword matching against the security-related keywords. Examples of security-related keywords include explosion, fire, accident, and terrorism. Such extraction processing is performed, for example, every hour, so that security-related events are extracted in near real time. The extraction may be performed by the event determination device 1 or by another device. When security-related events are extracted from the documents stored in the posting database 21, the posting position and posting time stored in the posting database 21 are treated as the occurrence position and occurrence time, respectively. The contents of the security-related event database 22 are updated, for example, each time security-related events are extracted. As shown in FIG. 3, the document "映画館Ａで火事？" ("Fire at cinema A?") stored in the posting database 21 contains the security-related keyword "火事" (fire). Therefore, the security-related event "fire" is extracted based on this document.
The posting position and posting time of this document are treated as the occurrence position and occurrence time of the security-related event and, as shown in FIG. 4, are stored in the security-related event database 22 together with the security-related keyword "fire". The event determination device 1 starts the processing for determining the rarity of security-related events, for example, when the contents of the security-related event database 22 are updated. Based on the information stored in the security-related event database 22, the event determination device 1 determines the rarity of the security-related events one at a time, in order.
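The keyword-matching extraction described above can be sketched as follows. This is an illustrative sketch only: the dictionary keys, the event ID format, the mesh IDs, and the timestamps are hypothetical, and the description explicitly leaves the extraction method open.

```python
from datetime import datetime

# Security-related keywords named in the description:
# explosion, fire, accident, terrorism.
SECURITY_KEYWORDS = ["爆破", "火事", "事故", "テロ"]

def extract_events(posts, keywords=SECURITY_KEYWORDS):
    """Return one event record per (post, keyword) match. The posting
    position and time are reused as the occurrence position and time."""
    events = []
    for post in posts:
        for keyword in keywords:
            if keyword in post["text"]:
                events.append({
                    "event_id": f"e{len(events) + 1}",  # hypothetical ID scheme
                    "keyword": keyword,
                    "position": post["position"],
                    "time": post["time"],
                })
    return events

# Positions and times are made-up values for the example.
posts = [
    {"doc_id": "d1", "text": "大門のあたり雨すごい！",
     "position": "mesh-001", "time": datetime(2016, 5, 12, 17, 50)},
    {"doc_id": "d2", "text": "映画館Ａで火事？",
     "position": "mesh-002", "time": datetime(2016, 5, 12, 18, 0)},
]
events = extract_events(posts)
print(events[0]["keyword"], events[0]["position"])  # 火事 mesh-002
```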

The reaction word database 23 is a device that stores a reaction word set consisting of a plurality of preset reaction words. FIG. 5 is a diagram showing an example of data stored in the reaction word database 23. Reaction words are characters and symbols that are likely to be used in documents posted near the occurrence position when a highly rare security-related event occurs. Reaction words can be set, for example, by analyzing after the fact the documents that were posted near the occurrence position when a highly rare security-related event actually occurred.

The rare security-related event database 24 is a device that stores information indicating rare security-related events, that is, security-related events that the event determination device 1 has determined to be highly rare. As shown in FIG. 6, the rare security-related event database 24 stores, in association with one another, a rare security-related event ID identifying a rare security-related event, a security-related keyword, occurrence position information, and occurrence time information.

The word acquisition unit 2 is word acquisition means for acquiring a preset word indicating an event. The word acquisition unit 2 acquires, from the security-related event database 22, the security-related keyword associated with the security-related event ID of the security-related event to be determined. The word acquisition unit 2 sends the acquired security-related keyword to the appearance frequency calculation unit 8.

The position information acquisition unit 3 is position information acquisition means for acquiring position information indicating the position where an event occurred. The position information acquisition unit 3 acquires, from the security-related event database 22, the occurrence position information associated with the security-related event ID of the security-related event to be determined. The position information acquisition unit 3 sends the acquired occurrence position information to the first group acquisition unit 5 and the second group acquisition unit 6.

The time information acquisition unit 4 is time information acquisition means for acquiring time information indicating the time when an event occurred. The time information acquisition unit 4 acquires, from the security-related event database 22, the occurrence time information associated with the security-related event ID of the security-related event to be determined. The time information acquisition unit 4 sends the acquired occurrence time information to the first group acquisition unit 5 and the second group acquisition unit 6.

The first group acquisition unit 5 is first group acquisition means for acquiring a first group of documents, from documents for which a position and a time are set, based on the position indicated by the position information and the time indicated by the time information. The first group acquisition unit 5 receives the occurrence position information from the position information acquisition unit 3 and the occurrence time information from the time information acquisition unit 4. On receiving the occurrence position information and the occurrence time information, the first group acquisition unit 5 acquires the first group of documents from the posting database 21 based on the occurrence position indicated by the occurrence position information and the occurrence time indicated by the occurrence time information. When the security-related event was extracted from the posting database, the first group therefore includes the document containing the security-related keyword on which the extraction was based.

The second group acquisition unit 6 is second group acquisition means for acquiring, based on the position indicated by the position information, a second group of documents different from the first group of documents. The second group acquisition unit 6 additionally uses the occurrence time indicated by the occurrence time information to acquire the second group of documents. The second group acquisition unit 6 receives the occurrence position information from the position information acquisition unit 3 and the occurrence time information from the time information acquisition unit 4. On receiving the occurrence position information and the occurrence time information, the second group acquisition unit 6 acquires the second group of documents from the posting database 21 based on the occurrence position indicated by the occurrence position information and the occurrence time indicated by the occurrence time information.

FIG. 7 is a diagram for explaining how the first group and second group of documents are acquired. The horizontal axis in FIG. 7 indicates time. The first group consists, for example, of documents posted within the occurrence position range, which is a position range based on the occurrence position, during a first period of preset length centered on the occurrence time. When the occurrence position information is given as a mesh ID, the occurrence position range is, for example, the mesh indicated by that mesh ID. When the occurrence position information is given as a latitude and longitude, the occurrence position range is, for example, a preset range (for example, a radius of 100 m) centered on the point indicated by that latitude and longitude. The first period is, for example, 2 hours. That is, if the occurrence time of the security-related event is 18:00, the first period runs from 17:00 to 19:00, and the first group consists of the documents posted within the occurrence position range during this first period. The first group of documents is the set of documents examined to judge whether their content has changed because of a security-related event. The first period can therefore be regarded as the period under determination ("determination time"), that is, the period for which it is judged whether a highly rare security-related event has occurred. The first group of documents can likewise be regarded as the determination-time posting group.

The second group consists, for example, of documents posted within the occurrence position range during a second period that precedes the first period. That is, if the occurrence time of the security-related event is 18:00, the second period is the period before 17:00, and the second group consists of either a preset number of randomly selected documents or all of the documents posted within the occurrence position range during this second period. The second period can be regarded as a period not under determination ("non-determination time"), that is, a period for which it is not being judged whether a highly rare security-related event has occurred. The second group of documents can likewise be regarded as the non-determination-time posting group. The documents of the first group and the second group are all documents posted within the same position range (POI (Point of Interest) related posts).
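The acquisition of the two groups described above can be sketched as follows. This is a minimal sketch, assuming that posts are dictionaries with hypothetical `position` and `time` keys, that the occurrence position range is a single mesh matched by ID, that the boundaries of the first period are inclusive, and that all earlier posts form the second group (the description also allows a preset number of randomly selected documents).

```python
from datetime import datetime, timedelta

def split_groups(posts, position, occurred_at, first_period=timedelta(hours=2)):
    """Split posts from one position range into the first group (posted
    within the first period centered on the occurrence time) and the
    second group (posted before the first period)."""
    half = first_period / 2
    start, end = occurred_at - half, occurred_at + half
    same_place = [p for p in posts if p["position"] == position]
    first_group = [p for p in same_place if start <= p["time"] <= end]
    second_group = [p for p in same_place if p["time"] < start]
    return first_group, second_group

# Made-up posts; the event occurs at 18:00 in mesh-002.
day = datetime(2016, 5, 12)
posts = [
    {"doc_id": "d1", "position": "mesh-002", "time": day.replace(hour=16, minute=30)},
    {"doc_id": "d2", "position": "mesh-002", "time": day.replace(hour=17, minute=30)},
    {"doc_id": "d3", "position": "mesh-002", "time": day.replace(hour=18, minute=45)},
    {"doc_id": "d4", "position": "mesh-002", "time": day.replace(hour=19, minute=30)},
    {"doc_id": "d5", "position": "mesh-001", "time": day.replace(hour=18)},
]
first_group, second_group = split_groups(posts, "mesh-002", day.replace(hour=18))
print([p["doc_id"] for p in first_group])   # ['d2', 'd3']
print([p["doc_id"] for p in second_group])  # ['d1']
```

With latitude and longitude instead of mesh IDs, the equality test on `position` would be replaced by a distance check against the preset radius.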

The similarity calculation unit 7 is similarity calculation means for calculating a similarity indicating how similar the first group of documents and the second group of documents are. The similarity calculation unit 7 receives the first group of documents from the first group acquisition unit 5, the second group of documents from the second group acquisition unit 6, and the security-related keyword from the word acquisition unit 2. The similarity calculation unit 7 calculates the similarity between the acquired first group of documents and second group of documents by a known method. The similarity calculation unit 7 sends the calculated similarity to the determination unit 9.

The similarity calculation unit 7 may, for example, convert the first group of documents into a single feature vector, convert the second group of documents into a single feature vector, and calculate the cosine similarity between these two feature vectors as the similarity between the first group of documents and the second group of documents. A feature vector is a vector whose elements are the appearance frequencies of the words appearing in the documents; for example, it is a bag-of-words representation of the documents of each group. Each word can be, for example, a morpheme extracted by morphological analysis. The value of each element of the feature vector, that is, the appearance frequency of a word, may be the raw number of appearances (appearance count), or it may be the number of appearances after normalization (for example, dividing the number of appearances of each word by the number of appearances of all words), giving an appearance distribution. Normalization allows the similarity to be calculated without being affected by the raw number of word appearances. The closer the cosine similarity is to 1, the more similar the first group of documents and the second group of documents are; the closer it is to 0, the less similar they are.
Alternatively, the similarity calculation unit 7 may calculate, as the similarity between the first group of documents and the second group of documents, the Jaccard coefficient between the set of words appearing in the first group of documents and the set of words appearing in the second group of documents. The closer the Jaccard coefficient is to 1, the more similar the first group of documents and the second group of documents are; the closer it is to 0, the less similar they are.
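Both similarity measures described above can be sketched in a few lines. This is a minimal sketch, assuming each group has already been tokenized into a flat list of words (for Japanese text this would typically come from a morphological analyzer, which is not shown). The function names and the return value of 0.0 for empty input are assumptions, not part of the description.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words count vectors of two groups."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty group is treated as dissimilar
    return dot / (norm_a * norm_b)

def jaccard_coefficient(tokens_a, tokens_b):
    """Jaccard coefficient between the word sets of two groups."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # assumption: two empty groups are treated as dissimilar
    return len(set_a & set_b) / len(set_a | set_b)

# Made-up token lists standing in for the first and second groups.
first_tokens = ["fire", "cinema", "fire", "smoke"]
second_tokens = ["movie", "cinema", "popcorn"]
print(round(cosine_similarity(first_tokens, second_tokens), 4))  # 0.2357
print(jaccard_coefficient(first_tokens, second_tokens))          # 0.2
```

Because cosine similarity is unchanged when a vector is multiplied by a positive constant, dividing each group's counts by that group's total word count gives the same value as using raw counts in this particular sketch.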

The appearance frequency calculation unit 8 is appearance frequency calculation means for calculating the appearance frequency, in the first group of documents, of the preset word indicating the event. The appearance frequency calculation unit 8 also calculates the appearance frequency of the preset reaction words in the first group of documents and the appearance frequency of the reaction words in the second group of documents. The appearance frequency calculation unit 8 receives the first group of documents from the first group acquisition unit 5, the second group of documents from the second group acquisition unit 6, the reaction words from the reaction word database 23, and the security-related keyword from the word acquisition unit 2.

The appearance frequency calculation unit 8 calculates the appearance frequency of security-related keywords in the first group of documents, the appearance frequency of reaction words in the first group of documents, and the appearance frequency of reaction words in the second group of documents. For example, the appearance frequency calculation unit 8 divides the number of appearances of security-related keywords in the first group of documents by the number of appearances of all words in the first group of documents, and uses the result as the appearance frequency of security-related keywords in the first group of documents. Likewise, it divides the total number of appearances of the plurality of reaction words in the first group of documents by the number of appearances of all words in the first group of documents, and uses the result as the appearance frequency of reaction words in the first group of documents. It further divides the total number of appearances of the plurality of reaction words in the second group of documents by the number of appearances of all words in the second group of documents, and uses the result as the appearance frequency of reaction words in the second group of documents. The number of appearances of all words in each group of documents can be, for example, the total number of morphemes extracted by morphological analysis. The appearance frequency calculation unit 8 sends the calculated appearance frequencies to the determination unit 9.
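The frequency calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the documents of a group have already been split into tokens (for example by a morphological analyzer), and the function name `appearance_frequency`, the sample tokens, and the choice to return 0 for an empty group are all hypothetical.

```python
from typing import Iterable, Sequence


def appearance_frequency(tokens: Sequence[str], targets: Iterable[str]) -> float:
    """Occurrences of any target word divided by the total token count."""
    if not tokens:
        return 0.0  # assumption: an empty group yields frequency 0
    target_set = set(targets)
    hits = sum(1 for token in tokens if token in target_set)
    return hits / len(tokens)


# Hypothetical, already-tokenized contents of the first group of documents.
first_group_tokens = ["fire", "wow", "station", "fire",
                      "scary", "train", "today", "wow"]

# Frequency of the security-related keyword: 2 hits out of 8 tokens.
pk = appearance_frequency(first_group_tokens, ["fire"])
# Frequency of reaction words: 3 hits out of 8 tokens.
p1 = appearance_frequency(first_group_tokens, ["wow", "scary"])
```

The same function would be applied to the tokens of the second group to obtain the reaction-word frequency there.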

The determination unit 9 is determination means that determines the rarity of an event based on the similarity and the appearance frequency. The determination unit 9 further uses the appearance frequencies of reaction words in the first group and second group of documents to determine the rarity of the event. The determination unit 9 receives the similarity between the first group of documents and the second group of documents from the similarity calculation unit 7, and receives the appearance frequency of security-related keywords in the first group of documents, the appearance frequency of reaction words in the first group of documents, and the appearance frequency of reaction words in the second group of documents from the appearance frequency calculation unit 8. For example, the determination unit 9 obtains the rarity R by formula (1) below. Here, sim is the similarity between the first group of documents and the second group of documents, pk is the appearance frequency of security-related keywords in the first group of documents, p1 is the appearance frequency of reaction words in the first group of documents (where p1 is set to 1 if that frequency is 0), p2 is the appearance frequency of reaction words in the second group of documents (where p2 is set to 1 if that frequency is 0), and α, β, and γ are arbitrary positive parameters. Note that pk > 0, because a security-related keyword always exists in the first group of documents.
R = α(1/sim) · β(p1/p2) · γ·pk   (1)
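Formula (1) can be written out as a small function. This is a sketch under stated assumptions: the function name `rarity` and the default parameter values of 1.0 are hypothetical, and raising an error for a non-positive similarity is an assumption, since the text does not say how sim = 0 is handled. The substitution of 1 for a zero reaction-word frequency follows the description.

```python
def rarity(sim: float, pk: float, p1: float, p2: float,
           alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Rarity R = alpha*(1/sim) * beta*(p1/p2) * gamma*pk, as in formula (1)."""
    if sim <= 0:
        # Assumption: the description does not cover a similarity of 0.
        raise ValueError("sim must be positive")
    # Per the description, a reaction-word frequency of 0 is replaced by 1.
    p1 = 1.0 if p1 == 0 else p1
    p2 = 1.0 if p2 == 0 else p2
    return alpha * (1.0 / sim) * beta * (p1 / p2) * gamma * pk


# Hypothetical inputs: low similarity and doubled reaction-word frequency.
r = rarity(sim=0.5, pk=0.25, p1=0.5, p2=0.25)
```

With these inputs the three factors are 2, 2, and 0.25, so a lower similarity or a higher reaction-word ratio each raise R proportionally.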

The determination unit 9 determines the rarity of the security-related event based on the rarity R obtained by formula (1) above. For example, the determination unit 9 determines that the rarity of the security-related event is high if the rarity R is greater than a predetermined threshold, and that the rarity is low if the rarity R is less than or equal to the threshold. The determination unit 9 sends the determination result to the output unit 10.

The output unit 10 is output means that outputs the determination result of the determination means. The output unit 10 receives the determination result from the determination unit 9. On receiving a determination result indicating high rarity, the output unit 10 acquires the security-related keyword from the word acquisition unit 2, the occurrence position information from the position information acquisition unit 3, and the occurrence time information from the time information acquisition unit 4. The output unit 10 then stores the acquired security-related keyword, occurrence position information, and occurrence time information in the rare security-related event database 24. On receiving a determination result indicating low rarity, the output unit 10 performs none of this processing. The output unit 10 may also output the determination result by other means; for example, it may display a map on which the occurrence time and the security-related keyword are shown at the position corresponding to the occurrence position of the rare security-related event.
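The threshold judgment and the conditional storage described above might look like the following. This is only an illustrative sketch: the names `RareEventRecord`, `is_rare`, and `output_result` are hypothetical, a plain list stands in for the rare security-related event database 24, and the position is assumed to be a (latitude, longitude) pair.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class RareEventRecord:
    keyword: str
    position: Tuple[float, float]  # assumed (latitude, longitude)
    occurred_at: datetime


def is_rare(rarity_value: float, threshold: float) -> bool:
    """High rarity only when R is strictly greater than the threshold."""
    return rarity_value > threshold


def output_result(rare: bool, keyword: str, position: Tuple[float, float],
                  occurred_at: datetime, rare_event_db: List[RareEventRecord]) -> None:
    """Store the event when rarity is high; do nothing when it is low."""
    if not rare:
        return
    rare_event_db.append(RareEventRecord(keyword, position, occurred_at))


# Hypothetical usage with a list standing in for database 24.
db: List[RareEventRecord] = []
when = datetime(2016, 5, 12, 9, 30)
output_result(is_rare(0.8, threshold=1.0), "fire", (35.68, 139.76), when, db)
output_result(is_rare(1.5, threshold=1.0), "fire", (35.68, 139.76), when, db)
```

Only the second call stores a record, because an R equal to or below the threshold counts as low rarity.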

FIG. 8 is a flowchart showing the operation of the event determination device. As shown in FIG. 8, the event determination device 1 first acquires the security-related keyword, the occurrence position information, and the occurrence time information from the security-related event database 22 by means of the word acquisition unit 2, the position information acquisition unit 3, and the time information acquisition unit 4 (S11). Next, the event determination device 1 acquires the first group of documents and the second group of documents from the posting database 21 by means of the first group acquisition unit 5 and the second group acquisition unit 6 (S12). Next, the event determination device 1 calculates the similarity between the first group of documents and the second group of documents by means of the similarity calculation unit 7 (S13). Next, the event determination device 1 calculates, by means of the appearance frequency calculation unit 8, the appearance frequency of security-related keywords in the first group of documents, the appearance frequency of reaction words in the first group of documents, and the appearance frequency of reaction words in the second group of documents (S14). Next, the event determination device 1 determines the rarity of the security-related event by means of the determination unit 9 (S15). Finally, the event determination device 1 outputs the determination result of the determination unit 9 by means of the output unit 10 (S16).
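Steps S13 to S16 of the flowchart can be strung together roughly as below. This is a sketch under several assumptions: this passage does not specify how the similarity calculation unit 7 computes the similarity, so cosine similarity over raw term counts is swapped in purely for illustration; the inputs of S11 and S12 are passed in as ready-made token lists; treating a similarity of 0 as unbounded rarity is an assumption; and all function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Assumed similarity measure: cosine of raw term-count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * counts_b[word] for word, count in counts_a.items())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def frequency(tokens, targets):
    """Occurrences of the target words divided by the total token count."""
    target_set = set(targets)
    return sum(t in target_set for t in tokens) / len(tokens) if tokens else 0.0


def judge_event(keyword, first_group, second_group, reaction_words,
                threshold, alpha=1.0, beta=1.0, gamma=1.0):
    """Run S13 to S15 on token lists already obtained in S11 and S12."""
    sim = cosine_similarity(first_group, second_group)       # S13
    pk = frequency(first_group, [keyword])                    # S14
    p1 = frequency(first_group, reaction_words) or 1.0        # 0 is replaced by 1
    p2 = frequency(second_group, reaction_words) or 1.0       # 0 is replaced by 1
    # Assumption: sim == 0 is not covered by the text; treat it as unbounded.
    inverse_sim = 1.0 / sim if sim > 0 else math.inf
    r = alpha * inverse_sim * beta * (p1 / p2) * gamma * pk   # formula (1)
    return r, r > threshold                                   # S15


first = ["fire", "wow", "station", "fire"]           # hypothetical first group
second = ["station", "lunch", "station", "coffee"]   # hypothetical second group
r, rare = judge_event("fire", first, second, ["wow"], threshold=0.3)
print(r, rare)                                        # S16: output the result
```

In this toy input the two groups share only the word "station", so the similarity is about one third and the inverse-similarity factor is about 3.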

As described above, within the occurrence position range of the security-related event, the event determination device 1 acquires documents posted during a first period, which is a period before and after the occurrence time and centered on it, as the first group of documents, and acquires documents posted during a second period, which precedes the first period, as the second group of documents. The event determination device 1 calculates the similarity between the first group of documents and the second group of documents, and also calculates the appearance frequency of security-related keywords in the first group of documents. When a highly rare security-related event occurs, it is expected that the appearance frequency of security-related keywords in documents posted within the occurrence position range increases, and that the content of documents posted within that range differs between the first period and the second period. The more the content of the documents changes, the lower the similarity. The similarity therefore indicates the degree of change in document content within the occurrence position range. Because the event determination device 1 determines the rarity of the security-related event based on this similarity and appearance frequency, it can determine the rarity of a security-related event in accordance with its occurrence position. In addition, using the two parameters of similarity and appearance frequency allows the rarity of the security-related event to be determined more reliably than using either parameter alone.
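The selection of the two groups by position and period can be illustrated as follows. This is a hedged sketch only: the `Post` type, the square latitude/longitude box used as the occurrence position range, and the default window lengths are all hypothetical choices made for the example and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    text: str
    lat: float
    lon: float
    posted_at: datetime


def in_range(post, center, half_width_deg):
    """Assumed position range: a square box around the occurrence position."""
    return (abs(post.lat - center[0]) <= half_width_deg
            and abs(post.lon - center[1]) <= half_width_deg)


def split_groups(posts, center, occurred_at, half_width_deg=0.01,
                 half_window=timedelta(hours=1), lookback=timedelta(days=1)):
    """Return (first group, second group) for one event."""
    # First period: centered on the occurrence time.
    first_start = occurred_at - half_window
    first_end = occurred_at + half_window
    # Second period: the stretch immediately before the first period.
    second_start = first_start - lookback
    nearby = [p for p in posts if in_range(p, center, half_width_deg)]
    first = [p for p in nearby if first_start <= p.posted_at <= first_end]
    second = [p for p in nearby if second_start <= p.posted_at < first_start]
    return first, second


t0 = datetime(2016, 5, 12, 12, 0)
posts = [
    Post("a", 35.0, 139.0, t0),                        # nearby, at event time
    Post("b", 35.0, 139.0, t0 - timedelta(hours=5)),   # nearby, earlier
    Post("c", 36.0, 139.0, t0),                        # outside the range
]
first, second = split_groups(posts, center=(35.0, 139.0), occurred_at=t0)
```

Because the second period ends where the first begins, no post can fall into both groups.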

The event determination device 1 also calculates the appearance frequency of reaction words in the first group of documents and the appearance frequency of reaction words in the second group of documents, and further uses the ratio of the former to the latter (= p1/p2) to determine the rarity of the security-related event. Specifically, the event determination device 1 further uses the reaction coefficient expressed as β(p1/p2) to determine the rarity of the security-related event. If the rarity of the security-related event is high, documents posted within the occurrence position range during the first period are expected to contain more reaction words than documents posted within that range during the second period, so that the reaction coefficient becomes greater than 1. By further using the reaction coefficient, the event determination device 1 can therefore determine the rarity of the security-related event more reliably.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment, and may be modified or applied to other uses without departing from the gist described in each claim.

For example, the determination unit 9 may determine the rarity of the event without using the reaction coefficient. In this case, the determination unit 9 obtains the rarity R by formula (2) below.
R = α(1/sim) · γ·pk   (2)

The first period may be set so that the numbers of documents in the first group and the second group are sufficient to maintain the accuracy of the similarity calculation. The first group may consist of only one document (that is, the document containing the security-related keyword), and likewise the second group may consist of only one document. The first group of documents and the second group of documents are not limited to documents posted to a microblog service, and may be any documents for which a position and a time are set. The second group of documents need only be documents whose set position falls within the occurrence position range of the security-related event being judged and which differ from the first group of documents. For example, the second group of documents is not limited to documents posted before the first group of documents, and may be documents posted after them. The number of documents in the first group need not match the number of documents in the second group. Furthermore, the events judged by the event determination device 1 are not limited to security-related events.

DESCRIPTION OF SYMBOLS: 1 ... event determination device, 2 ... word acquisition unit, 3 ... position information acquisition unit, 4 ... time information acquisition unit, 5 ... first group acquisition unit, 6 ... second group acquisition unit, 7 ... similarity calculation unit, 8 ... appearance frequency calculation unit, 9 ... determination unit, 10 ... output unit.

Claims

An event determination device for determining the rarity of an event, comprising:
word acquisition means for acquiring a preset word indicating the event;
position information acquisition means for acquiring position information indicating a position where the event occurred;
time information acquisition means for acquiring time information indicating a time when the event occurred;
first group acquisition means for acquiring, from documents for which a position and a time are set, a first group of documents based on the position indicated by the position information and the time indicated by the time information;
second group acquisition means for acquiring a second group of documents different from the first group of documents based on the position indicated by the position information;
similarity calculation means for calculating a similarity between the first group of documents and the second group of documents;
appearance frequency calculation means for calculating an appearance frequency of the word in the first group of documents;
determination means for determining the rarity of the event based on the similarity and the appearance frequency; and
output means for outputting a determination result of the determination means.

The event determination device according to claim 1, wherein the appearance frequency calculation means calculates an appearance frequency of a preset reaction word in the first group of documents and an appearance frequency of the reaction word in the second group of documents, and
the determination means determines the rarity of the event by further using the appearance frequencies of the reaction word in the first group of documents and the second group of documents.

The event determination device according to claim 1, wherein the second group acquisition means acquires the second group of documents based on the time indicated by the time information.