JP2018142131A - Information determination model learning device, information determination device and program therefor

Info

Publication number: JP2018142131A
Application number: JP2017035283A
Authority: JP
Inventors: Yuka Takei; Atsushi Goto; Taro Miyazaki; Ichiro Yamada
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-02-27
Filing date: 2017-02-27
Publication date: 2018-09-13
Anticipated expiration: 2037-02-27
Also published as: JP6806589B2

Abstract

PROBLEM TO BE SOLVED: To provide an information determination device which determines with high accuracy whether or not social media information is information relevant to an actually occurring event. SOLUTION: An information determination device 1 comprises: vectorizing means 12 which generates a distributed representation vector of a post unit from social media information; phrase determination means 13 which determines whether a phrase indicating that no actually occurring event is represented is included in the social media information; vector extension means 14 which vectorizes the presence/absence of the phrase and extends the distributed representation vector of the post unit to generate an extended distributed representation vector; learning means 15 which generates an information determination model by machine learning of the extended distributed representation vector; and determination means 16 which uses the information determination model to determine whether or not the social media information is information relevant to an actually occurring event. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information determination technique for determining whether or not social media information indicates an event that is actually occurring.

In recent years, the development of social networking services (SNS) has made it possible for individuals to transmit information easily and in real time. Such social big data transmitted by individuals has become a powerful source of information and is used to solve various social problems.
For example, at a broadcasting station, staff constantly monitor SNS to acquire information on incidents, accidents, disasters, and the like. This allows the broadcasting station to broadcast information on such incidents almost in real time.
However, manually obtaining useful information from a huge amount of social big data requires a great deal of labor.
Therefore, to acquire useful information efficiently, a technique has been disclosed in which words and phrases that can be dangerous expressions depending on a specific theme are learned by a neural network and then extracted from social big data (see Patent Document 1).

Japanese Patent Laying-Open No. 2015-72614

Because the conventional technique only learns words and phrases corresponding to dangerous expressions, it also extracts information on incidents and the like that have not actually occurred.
For example, posts such as “We cannot be optimistic and treat the overseas cases as a fire on the opposite bank,” “A fire would be bad, so let's take out fire insurance,” and “The great fire scene in the Taiga drama was reproduced realistically” all contain the word “fire,” which relates to incidents and accidents, yet no fire has actually occurred. Nevertheless, the conventional technique extracts information related to dangerous expressions regardless of whether an incident or the like has actually occurred.
As a result, using the extracted information as a source for news or the like requires the additional effort of determining whether each event has actually occurred, which is a problem.

Accordingly, an object of the present invention is to provide an information determination model learning device, an information determination device, and programs therefor for determining with high accuracy whether or not social media information is information related to an actually occurring event.

In order to solve the above problem, the information determination model learning device according to the present invention learns an information determination model for determining whether or not social media information to be determined is information indicating an actually occurring event. As teacher data it uses a plurality of pieces of social media information, each of which is text data in post units for which it is known whether or not it indicates an actually occurring event. The device comprises vectorization means, phrase determination means, vector extension means, and learning means.

In this configuration, the information determination model learning device receives the teacher data through the vectorization means and generates a distributed representation vector for each post by averaging the distributed representation vectors of the words constituting the posted sentence. The per-word distributed representation vectors are learned in advance by a technique such as word2vec and stored in storage means. They are numerical vectors assigned on the basis of word distribution so that words with more similar meanings have closer vectors. The vectorization means thus generates a vector that reflects the semantic content of the posted sentence itself.

Next, through the phrase determination means, the information determination model learning device determines whether or not the social media information corresponding to the per-post distributed representation vector generated by the vectorization means contains, as words, any of a plurality of predetermined phrases indicating that no actually occurring event is represented. Such phrases include idioms related to the event, conditional (hypothetical) expressions, and proper nouns such as the names of program performers and game characters.

Next, through the vector extension means, the information determination model learning device vectorizes the presence or absence of each phrase as determined by the phrase determination means and appends the result to the per-post distributed representation vector to generate an extended distributed representation vector. In addition to the features of the semantic content of the posted sentence itself, this extended distributed representation vector carries features indicating that no event has actually occurred.

Next, through the learning means, the information determination model learning device generates the information determination model by machine learning on the extended distributed representation vectors generated by the vector extension means. The learning means learns two states: one from the extended distributed representation vectors obtained when the teacher data indicates an actually occurring event, and the other from those obtained when it does not.
In this way, the information determination model learning device learns an information determination model for determining whether or not arbitrary social media information indicates an actually occurring event.
The information determination model learning device can be operated by an information determination model learning program that causes a computer to function as each of the means described above.

Further, in order to solve the above problem, the information determination device according to the present invention uses the information determination model learned by the information determination model learning device to determine whether or not unknown data, which is the social media information to be determined, is information indicating an actually occurring event. The device comprises vectorization means, phrase determination means, vector extension means, and determination means.

In this configuration, the information determination device receives the unknown data through the vectorization means and generates a distributed representation vector for each post by averaging the distributed representation vectors of the words constituting the posted sentence, using the per-word distributed representation vectors stored in advance in storage means.
Next, through the phrase determination means, the information determination device determines whether or not the social media information corresponding to the per-post distributed representation vector generated by the vectorization means contains, as words, any of a plurality of predetermined phrases indicating that no actually occurring event is represented.

Next, through the vector extension means, the information determination device vectorizes the presence or absence of each phrase as determined by the phrase determination means and appends the result to the per-post distributed representation vector to generate an extended distributed representation vector.
Then, through the determination means, the information determination device applies the information determination model to the extended distributed representation vector generated by the vector extension means to determine whether or not the unknown data is information indicating an actually occurring event.

Further, in order to solve the above problem, another information determination device according to the present invention learns an information determination model using as teacher data a plurality of pieces of social media information, each of which is text data in post units for which it is known whether or not it indicates an actually occurring event, and determines whether or not unknown data, which is the social media information to be determined, is information indicating an actually occurring event. The device comprises vectorization means, phrase determination means, vector extension means, learning means, and determination means.

In this configuration, the vectorization means of the information determination device receives the teacher data in a learning mode, in which the information determination model is learned, and receives the unknown data in an evaluation mode, in which determination is performed using the information determination model. In either mode it generates a distributed representation vector for each post from the per-word distributed representation vectors stored in advance in storage means.
Next, through the phrase determination means, the information determination device determines whether or not the social media information corresponding to the per-post distributed representation vector generated by the vectorization means contains, as words, any of a plurality of predetermined phrases indicating that no actually occurring event is represented.
Further, through the vector extension means, the information determination device vectorizes the presence or absence of each phrase as determined by the phrase determination means and appends the result to the per-post distributed representation vector to generate an extended distributed representation vector.

Then, in the learning mode, the learning means of the information determination device generates the information determination model by machine learning on the extended distributed representation vectors generated from the social media information corresponding to the teacher data.
In the evaluation mode, the determination means of the information determination device applies the information determination model to the extended distributed representation vector generated from the social media information corresponding to the unknown data to determine whether or not the unknown data is information indicating an actually occurring event.
The information determination device can be operated by an information determination program that causes a computer to function as each of the means described above.

The present invention provides the following advantageous effects.
According to the present invention, it is possible to determine with high accuracy whether or not social media information is information related to an event that is actually occurring.
As a result, the present invention makes it possible to use social big data transmitted by individuals on SNS effectively as an information source for news and the like.

A block diagram showing the configuration of the information determination device according to an embodiment of the present invention.
A diagram for explaining the processing of the vectorization means, in which (a) shows an example of dividing media information into words and (b) shows an example of calculating the distributed representation vector of a posted sentence from the distributed representation vectors of its words.
A diagram showing examples of phrases stored in the feature phrase storage means, in which (a) shows idioms, (b) shows conditional expressions, and (c) shows designated proper nouns.
An explanatory diagram for explaining the dependency relation of a conditional expression.
A data structure diagram showing an example of the extended distributed representation vector generated by the vector extension means.
A diagram showing the configuration of a feedforward neural network, which is an example of the information determination model.
A flowchart showing the operation of the information determination device according to the embodiment of the present invention in the learning mode.
A flowchart showing the operation of the information determination device according to the embodiment of the present invention in the evaluation mode.
A data structure diagram showing another example of the extended distributed representation vector generated by the vector extension means.
A block diagram showing the configuration of an information determination model learning device according to another embodiment of the present invention.
A block diagram showing the configuration of an information determination device according to another embodiment of the present invention.

Embodiments of the present invention will be described below with reference to the drawings.
[Configuration of information determination device]
First, the configuration of the information determination device 1 according to the embodiment of the present invention will be described with reference to FIG. 1.

The information determination device 1 includes a control unit 10 and a storage unit 20.
The information determination device 1 determines whether or not information transmitted via SNS (text data in post units, such as a Tweet [registered trademark]) is information related to a predetermined event that is actually occurring.

As shown in FIG. 1, the control unit 10 includes distributed representation vector generation means 11, vectorization means 12, phrase determination means 13, vector extension means 14, learning means 15, and determination means 16.
The control unit 10 controls the operation of the information determination device 1 and operates in two modes. One is a learning mode, in which an information determination model is learned from social media information (hereinafter simply "media information") for which it is known whether or not it relates to an actually occurring event; the model is used to determine whether or not unknown media information relates to an actually occurring event. The other is an evaluation mode, in which the learned information determination model is used to determine whether or not unknown media information relates to an actually occurring event.

In the present embodiment, "fire," which accounts for the largest share of media information related to incidents, accidents, disasters, and the like, is used as the example of an actually occurring event. The event is of course not limited to fire; any predetermined event that occurs in reality, such as a traffic accident, a railway accident, or a weather disaster, may be used.

The distributed representation vector generation means 11 generates a distributed representation vector for each word from a large amount of training data (distributed representation training data), such as existing media information. A distributed representation vector represents a word as a numerical vector of finite high dimensionality (for example, 200 dimensions), such that words with similar meanings (similar distributional characteristics) in the distributed representation training data are mapped to nearby vectors.

The distributed representation vector generation means 11 divides the distributed representation training data into morphemes (words) and generates distributed representation vectors for the words obtained from the entire training data. Techniques for generating distributed representation vectors are known; for example, general techniques such as word2vec or GloVe (Global Vectors for Word Representation) can be used. A detailed description of the generation is omitted here.
The distributed representation vector generation means 11 stores each generated distributed representation vector in the distributed representation vector storage means 21 in association with its word.

The vectorization means 12 converts media information into a distributed representation vector.
In the learning mode, the vectorization means 12 receives media information (teacher data) for which it is known whether or not it relates to the predetermined event (here, "fire"). In addition to the text data, the teacher data includes information indicating whether or not the text relates to the predetermined event (positive example or negative example); the learning means 15 described later receives this information (for example, "1" or "0").
In the evaluation mode, the vectorization means 12 receives media information for which it is unknown whether or not it relates to the predetermined event (here, "fire").
The vectorization means 12 then generates a distributed representation vector for each post by averaging the distributed representation vectors of the words constituting the posted sentence, using the per-word distributed representation vectors stored in the distributed representation vector storage means 21.

Specifically, the vectorization means 12 receives media information, which is text data, one post at a time and divides the posted sentence into words by morphological analysis. It then reads the distributed representation vector corresponding to each word from the distributed representation vector storage means 21 and sums these vectors.
The vectorization means 12 then normalizes the summed vector by dividing it by the number of words in the posted sentence, yielding the distributed representation vector of the posted sentence (the sentence distributed representation vector). The vectorization means 12 outputs the input media information to the phrase determination means 13 and outputs the generated sentence distributed representation vector to the vector extension means 14.
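The sum-and-divide step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sentence_vector`, the dictionary `word_vectors`, and the toy two-dimensional vectors are assumptions, the morphological analysis is assumed to have been done already, and skipping out-of-vocabulary words while still counting them in the divisor is a choice made here, not something the text specifies.

```python
from typing import Dict, List, Sequence


def sentence_vector(words: Sequence[str],
                    word_vectors: Dict[str, List[float]]) -> List[float]:
    """Sum the word vectors of one post and divide by the number of words."""
    if not words:
        raise ValueError("post contains no words")
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            # Assumption: a word without a stored vector contributes zeros.
            continue
        for i, value in enumerate(vec):
            total[i] += value
    # Divide by the number of words in the post, as in the description.
    return [t / len(words) for t in total]


# Toy 2-dimensional vectors (real vectors would have e.g. 200 dimensions).
toy_vectors = {"民家": [1.0, 2.0], "火事": [3.0, 4.0]}
print(sentence_vector(["民家", "火事"], toy_vectors))  # [2.0, 3.0]
```

Averaging keeps the sentence vector at the same dimensionality as the word vectors regardless of post length, which is what allows a fixed-size model input later.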

The sentence distributed representation vector generated by the vectorization means 12 will now be described with reference to FIG. 2 (and FIG. 1 as appropriate).
As shown in FIG. 2(a), when the posted sentence in the media information is, for example, 「隣町で民家が火事だ。」 ("A house is on fire in the neighboring town."), the vectorization means 12 divides it into 「隣／町／で／民家／が／火事／だ／。」.

The vectorization means 12 then reads the distributed representation vector corresponding to each word from the distributed representation vector storage means 21. For example, as shown in FIG. 2(b), for the word 「隣」 ("neighboring") it reads the n-dimensional (for example, 200-dimensional) distributed representation vector "0.1, 0.3, 0.4, 0.1, 0.8, 0.9, 0.2, ..., 0.9".
The vectorization means 12 then sums the distributed representation vectors of all the words constituting the posted sentence to obtain the all-word total ("6.4, 1.6, 2.4, 3.2, 3.2, 6.4, 4.0, ..., 5.6" in the example of FIG. 2(b)).

The vectorization means 12 then divides the all-word total by the number of words constituting the posted sentence (eight in the example of FIG. 2) to obtain the sentence distributed representation vector ("0.8, 0.2, 0.3, 0.4, 0.4, 0.8, 0.5, ..., 0.7" in the example of FIG. 2(b)).
In this way, the vectorization means 12 generates a sentence distributed representation vector for each posted sentence in the media information.
Returning to FIG. 1, the description of the configuration of the information determination device 1 continues.

The phrase determination means 13 determines whether or not the input media information contains a characteristic phrase (feature phrase) from which it can be predicted that the predetermined event has not occurred. Specifically, it determines whether or not the media information input via the vectorization means 12 contains any of the feature phrases stored in the feature phrase storage means 22.

Feature phrases from which it can be predicted that the predetermined event has not occurred include idioms (including proverbs), as illustrated in FIG. 3(a). When the media information is 「喧嘩を止めるつもりが、『火に油を注ぐ』結果になってしまった。」 ("I meant to stop the fight, but ended up adding fuel to the fire."), it contains 「火」 ("flame") or 「火事」 ("fire"), yet no fire has actually occurred.
Therefore, the phrase determination means 13 determines that a feature phrase is contained when the media information contains an idiom that includes a word related to the predetermined event (here, 「火」 or 「火事」).

Feature phrases from which it can be predicted that the predetermined event has not occurred also include conditional (hypothetical) expressions, as illustrated in FIG. 3(b). When the media information is 「『火事』になったら、どこへ逃げたら、いいだろう。」 ("If a fire broke out, where should I escape to?"), it contains 「火」 or 「火事」, yet no fire has actually occurred. In this case, the phrase determination means 13 performs dependency analysis on the media information and determines that the media information contains a feature phrase (conditional expression) when a conditional expression is in the same phrase (bunsetsu) as a word related to the predetermined event (here, 「火」 or 「火事」) or is in a dependency relation with it.

For example, as shown in FIG. 4, when 「なったら」, which is in a dependency relation with 「火事に」, contains a conditional expression (〜たら), the phrase determination means 13 determines that the media information contains a feature phrase. In the example of FIG. 4, 「逃げたら」, which is in a dependency relation with 「どこへ」, also contains a conditional expression (〜たら), but it is excluded because it has no dependency relation with 「火事に」.
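The dependency-based check can be illustrated with a small Python sketch. It assumes the post has already been parsed into phrases (bunsetsu) by some dependency analyzer; the `Chunk` structure, the function names, and the head indices given for the FIG. 4 sentence are hypothetical, and only the 〜たら marker from the text is included in the marker list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

EVENT_WORDS = ("火事", "火")      # event-related words from the example
CONDITIONAL_MARKERS = ("たら",)   # only the "-tara" form appears in the text


@dataclass
class Chunk:
    text: str             # surface form of one phrase (bunsetsu)
    head: Optional[int]   # index of the phrase it depends on; None for the root


def has_conditional(chunk: Chunk) -> bool:
    return any(marker in chunk.text for marker in CONDITIONAL_MARKERS)


def event_is_hypothetical(chunks: Sequence[Chunk]) -> bool:
    """True if a conditional marker is in the event phrase or linked to it."""
    for i, chunk in enumerate(chunks):
        if not any(word in chunk.text for word in EVENT_WORDS):
            continue
        if has_conditional(chunk):                 # same phrase as the event word
            return True
        related = [c for c in chunks if c.head == i]   # phrases depending on it
        if chunk.head is not None:
            related.append(chunks[chunk.head])         # phrase it depends on
        if any(has_conditional(c) for c in related):
            return True
    return False


# Hand-written parse of the FIG. 4 sentence; the head indices are assumed.
fig4 = [
    Chunk("火事に", 1),
    Chunk("なったら、", 4),
    Chunk("どこへ", 3),
    Chunk("逃げたら、", 4),
    Chunk("いいだろう。", None),
]
print(event_is_hypothetical(fig4))  # True
```

Restricting the check to phrases directly linked to the event word is what keeps an unrelated conditional elsewhere in the post, such as 「逃げたら」 here, from triggering the feature on its own.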

Feature phrases from which it can be predicted that the predetermined event has not occurred also include proper nouns designated in advance (designated proper nouns), such as the titles, performers, and characters of television programs, as illustrated in FIG. 3(c). When the media information is 「精霊の△△人の火事のシーンはどうやって撮影しているのかな。」 ("I wonder how they shoot the fire scenes in 精霊の△△人."), it contains 「火」 or 「火事」, yet no fire has actually occurred.
Therefore, the phrase determination means 13 determines that a feature phrase is contained when the media information contains one of the designated proper nouns.
When conditional expressions are used as feature phrases, the predetermined event, such as "fire," is set in the phrase determination means 13 from outside. Alternatively, the predetermined event may be stored in advance in storage means, for example the feature phrase storage means 22, and referred to by the phrase determination means 13.
The designated proper nouns are not necessarily limited to proper nouns related to programs; they may be, for example, titles or character names related to movies or games.

この語句判定手段１３は、メディア情報に特徴語句が含まれていると判定した場合、特徴語句を識別する予め定めた情報（固有の識別子）を、ベクトル拡張手段１４に出力する。また、語句判定手段１３は、メディア情報に特徴語句が含まれていない場合、含まれていないことを示す予め定めた識別子（例えば、ＮＵＬＬ）を、ベクトル拡張手段１４に出力する。 If it is determined that the media information includes a feature word / phrase, the word / phrase determination unit 13 outputs predetermined information (unique identifier) for identifying the feature word / phrase to the vector expansion unit 14. Also, the phrase determination unit 13 outputs a predetermined identifier (for example, NULL) indicating that the media information does not include the feature phrase to the vector expansion unit 14 when the media information does not include the feature phrase.
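The phrase determination described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `detect_feature_phrases`, the identifier strings such as `"hypothetical:0"`, and the representation of a parsed post as `(chunk_text, head_index)` pairs are all hypothetical, and the dependency parse is assumed to come from an external parser.

```python
from typing import List, Optional, Tuple

NULL_ID = None  # stands in for the "NULL" identifier mentioned in the text


def detect_feature_phrases(
    text: str,
    chunks: List[Tuple[str, int]],    # (chunk text, index of the chunk it depends on, -1 if none)
    idioms: List[str],
    hypothetical_endings: List[str],  # e.g. ["たら"]
    proper_nouns: List[str],
    event_words: List[str],           # e.g. ["火事"]
) -> Optional[List[str]]:
    """Return identifiers of the feature phrases found, or NULL_ID if none."""
    found: List[str] = []

    # Idiomatic phrases: plain substring match (an assumed simplification).
    for n, idiom in enumerate(idioms):
        if idiom in text:
            found.append(f"idiom:{n}")

    # Hypothetical expressions: counted only when the chunk carrying the
    # expression is the head of a chunk that contains an event word.
    for chunk_text, head in chunks:
        if head < 0 or not any(w in chunk_text for w in event_words):
            continue
        head_text = chunks[head][0]
        for n, ending in enumerate(hypothetical_endings):
            if head_text.endswith(ending):
                found.append(f"hypothetical:{n}")

    # Designated proper nouns: plain substring match.
    for n, noun in enumerate(proper_nouns):
        if noun in text:
            found.append(f"proper_noun:{n}")

    return found or NULL_ID
```

With hand-written chunks in which “火事に” depends on “なったら” and “どこへ” depends on “逃げたら”, only the first 〜たら is reported, because “どこへ” contains no event word.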

ベクトル拡張手段１４は、ベクトル化手段１２で生成されたメディア情報の分散表現ベクトル（文分散表現ベクトル）に対して、語句判定手段１３で判定された特徴語句の有無を示すベクトルを拡張するものである。
ベクトル拡張手段１４は、図５に示すように、次元数がｎ個の文分散表現ベクトルに対して、慣用句の個数（ｍ個）、仮定形表現の個数（ｋ個）、指定固有名詞の個数（ｉ個）に応じた次元数だけ、ベクトルを拡張する。 The vector expansion unit 14 extends the distributed expression vector of the media information generated by the vectorization unit 12 (the sentence distributed expression vector) with a vector indicating the presence or absence of the feature phrases determined by the phrase determination unit 13.
As shown in FIG. 5, the vector expansion means 14 extends the n-dimensional sentence distributed expression vector by a number of dimensions corresponding to the number of idiomatic phrases (m), the number of hypothetical expressions (k), and the number of designated proper nouns (i).

ここで、拡張する慣用句のｍ個分のベクトルは、特徴語句記憶手段２２に記憶されている個々の慣用句ごとに、メディア情報に含まれているか否かを示す。メディア情報に含まれている慣用句については、その位置に対応する要素の値を“１”、含まれていない慣用句については、その位置に対応する要素の値を“０”とする。
また、拡張する仮定形表現のｋ個分のベクトルは、特徴語句記憶手段２２に記憶されている個々の仮定形表現ごとに、メディア情報に含まれているか否かを示す。メディア情報に含まれている仮定形表現については、その位置に対応する要素の値を“１”、含まれていない仮定形表現については、その位置に対応する要素の値を“０”とする。
また、拡張する指定固有名詞のｉ個分のベクトルは、特徴語句記憶手段２２に記憶されている個々の指定固有名詞ごとに、メディア情報に含まれているか否かを示す。メディア情報に含まれている指定固有名詞については、その位置に対応する要素の値を“１”、含まれていない指定固有名詞については、その位置に対応する要素の値を“０”とする。 Here, m vectors of idiomatic phrases to be expanded indicate whether or not each idiomatic phrase stored in the feature phrase storage means 22 is included in the media information. For an idiom included in the media information, the value of the element corresponding to the position is “1”, and for an idiom not included, the value of the element corresponding to the position is “0”.
Similarly, the k elements added for hypothetical expressions indicate, for each hypothetical expression stored in the feature phrase storage unit 22, whether or not it is included in the media information. For a hypothetical expression included in the media information, the value of the element at the corresponding position is “1”; for one not included, the value is “0”.
Likewise, the i elements added for designated proper nouns indicate, for each designated proper noun stored in the feature phrase storage means 22, whether or not it is included in the media information. For a designated proper noun included in the media information, the value of the element at the corresponding position is “1”; for one not included, the value is “0”.
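The extension by m + k + i presence flags can be sketched as below. This is a minimal sketch; the function name `extend_vector` and the `(category, index)` encoding of detected phrases are assumptions made for illustration and do not appear in the source.

```python
from typing import Iterable, List, Sequence, Tuple


def extend_vector(
    sentence_vec: Sequence[float],
    detected: Iterable[Tuple[str, int]],  # e.g. {("idiom", 0), ("proper_noun", 1)}
    m: int,  # number of stored idiomatic phrases
    k: int,  # number of stored hypothetical expressions
    i: int,  # number of stored designated proper nouns
) -> List[float]:
    """Append m + k + i presence flags to an n-dimensional sentence vector."""
    # Layout assumed here: idioms first, then hypothetical expressions,
    # then designated proper nouns.
    offsets = {"idiom": 0, "hypothetical": m, "proper_noun": m + k}
    extension = [0.0] * (m + k + i)
    for category, index in detected:
        extension[offsets[category] + index] = 1.0
    return list(sentence_vec) + extension
```

A post with no feature phrase simply gets an all-zero extension, so every extended vector has the same n + m + k + i dimensions.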

このように、ベクトル拡張手段１４は、文分散表現ベクトルに、特徴語句（慣用句、仮定形表現、指定固有名詞）が含まれているか否か示すベクトルを拡張した拡張分散表現ベクトルを生成する。この拡張分散表現ベクトルは、メディア情報が有する投稿文そのものの特徴に加え、当該メディア情報が所定の事象（ここでは、「火事」）に関連する情報ではないことを示す特徴量となる。
ベクトル拡張手段１４は、学習モードにおいては、拡張分散表現ベクトルを学習手段１５に出力する。また、ベクトル拡張手段１４は、評価モードにおいては、拡張分散表現ベクトルを判定手段１６に出力する。 In this way, the vector expansion unit 14 generates an extended distributed expression vector obtained by extending a vector indicating whether or not a feature word (idiomatic phrase, hypothetical expression, designated proper noun) is included in the sentence distributed expression vector. This extended distributed expression vector is a feature amount indicating that the media information is not information related to a predetermined event (here, “fire”) in addition to the feature of the posted text itself included in the media information.
The vector expansion means 14 outputs the extended distributed expression vector to the learning means 15 in the learning mode. Further, the vector expansion means 14 outputs the extended distributed expression vector to the determination means 16 in the evaluation mode.

学習手段１５は、学習モードにおいて、ベクトル拡張手段１４で生成される複数の拡張分散表現ベクトルから、メディア情報が現実に発生している予め定めた所定の事象に関連する情報であるか否かを判定するモデル（情報判定モデル）を学習するものである。
この学習手段１５に入力される拡張分散表現ベクトルは、現実に発生している事象に関連しているか否かが既知（正例または負例かが既知）の教師データである。 In the learning mode, the learning unit 15 determines whether or not the media information is information related to a predetermined event that is actually generated from a plurality of extended distributed expression vectors generated by the vector expansion unit 14. A model for determination (information determination model) is learned.
The extended distributed expression vector input to the learning means 15 is teacher data in which it is known (whether it is a positive example or a negative example) whether or not it is related to an actually occurring event.

学習手段１５は、例えば、ニューラルネットワークにより情報判定モデルを学習する。
具体的には、学習手段１５は、図６に示す入力層Ｌ１、隠れ層Ｌ２、出力層Ｌ３で構成される順伝播ニューラルネットワーク（Feed Forward Neural Network：ＦＦＮＮ）により情報判定モデルを学習する。
図６に示すＦＦＮＮは、入力層Ｌ１に、文分散表現ベクトルと拡張ベクトルとからなる拡張分散表現ベクトルを入力する。そして、ＦＦＮＮは、隠れ層Ｌ２において、入力層Ｌ１に入力された拡張分散表現ベクトルの各要素の値に重みを付加して伝搬させて、出力層Ｌ３から、判定結果を出力する。ここで、出力層Ｌ３は、例えば、次元数を２とし、一方のノードが、拡張分散表現ベクトルが現実に発生している事象に関連する投稿文のベクトルであることを示す確率を正規化して出力する。また、他方のノードが、拡張分散表現ベクトルが現実に発生している事象に関連する投稿文のベクトルではないことを示す確率を正規化して出力する。 The learning unit 15 learns the information determination model using, for example, a neural network.
Specifically, the learning unit 15 learns an information determination model by a forward propagation neural network (FFNN) including an input layer L1, a hidden layer L2, and an output layer L3 shown in FIG.
In the FFNN shown in FIG. 6, the extended distributed expression vector, composed of the sentence distributed expression vector and the extension vector, is input to the input layer L1. In the hidden layer L2, the FFNN weights the value of each element of the extended distributed expression vector input to the input layer L1 and propagates it forward, and the determination result is output from the output layer L3. Here, the output layer L3 has, for example, two dimensions: one node outputs the normalized probability that the extended distributed expression vector is the vector of a post related to an actually occurring event, and the other node outputs the normalized probability that it is not.
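A forward pass through such a three-layer network might look like the following sketch. The source does not state the activation function or the normalization used, so the `tanh` hidden activation and the softmax output here are assumptions, as are the function and parameter names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize raw outputs into probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def forward(
    x: Sequence[float],          # extended distributed expression vector (input layer L1)
    w1: Sequence[Sequence[float]], b1: Sequence[float],  # hidden layer L2 weights and biases
    w2: Sequence[Sequence[float]], b2: Sequence[float],  # output layer L3 weights and biases
) -> List[float]:
    """Return [p_related, p_unrelated] for one post."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w2, b2)
    ]
    return softmax(logits)
```

Because the two outputs are normalized together, they can be read as complementary probabilities for "related" and "not related".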

そして、学習手段１５は、教師データが正例の場合、一方のノードの出力が、拡張分散表現ベクトルが現実に発生している事象に関連する投稿文のベクトルであることを示す確率値“１”、他方のノードの出力が確率値“０”となるように、各層の重みを情報判定モデルのパラメータとして学習する。また、教師データが負例の場合、一方のノードの出力が“０”、他方のノードの出力が“１” となるように、各層の重みを情報判定モデルのパラメータとして学習する。なお、ＦＦＮＮの学習には、例えば、誤差逆伝播法（Back Propagation）を用いる。
この学習手段１５は、教師データを用いた学習を所定回数行うか、パラメータ誤差が予め定めた誤差内に収束した段階で学習を終了する。
学習手段１５は、学習した情報判定モデルを、情報判定モデル記憶手段２３に書き込み記憶する。 When the teacher data is a positive example, the learning means 15 learns the weights of each layer as the parameters of the information determination model so that the output of one node, which indicates that the extended distributed expression vector is the vector of a post related to an actually occurring event, becomes the probability value “1” and the output of the other node becomes the probability value “0”. When the teacher data is a negative example, the weights of each layer are learned so that the output of the one node becomes “0” and the output of the other node becomes “1”. The FFNN is trained using, for example, error backpropagation.
The learning unit 15 performs learning using the teacher data a predetermined number of times, or ends the learning when the parameter error converges within a predetermined error.
The learning unit 15 writes and stores the learned information determination model in the information determination model storage unit 23.
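The training procedure described above (targets of 1/0 for positive examples and 0/1 for negative examples, backpropagation, and stopping after a fixed number of passes or on convergence) can be sketched as follows. The cross-entropy loss, `tanh` activation, learning rate, hidden size, and the use of mean loss as the convergence measure are all assumptions chosen for illustration; the source only names backpropagation and the two stopping conditions.

```python
import math
import random


def forward(params, x):
    """Return (hidden activations, [p_related, p_unrelated])."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return h, [v / total for v in e]


def train(samples, n_hidden=4, lr=0.5, max_epochs=500, tol=1e-3, seed=0):
    """samples: list of (extended_vector, label), label 1 = positive, 0 = negative."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)]
    b2 = [0.0, 0.0]
    params = (w1, b1, w2, b2)
    epochs_run = 0
    for _ in range(max_epochs):              # stop condition 1: fixed number of passes
        epochs_run += 1
        loss = 0.0
        for x, label in samples:
            h, p = forward(params, x)
            target = [1.0, 0.0] if label == 1 else [0.0, 1.0]
            loss -= math.log(max(p[0] if label == 1 else p[1], 1e-12))
            # Backpropagation: gradient of cross-entropy w.r.t. the output logits.
            dz = [pi - ti for pi, ti in zip(p, target)]
            dh = [sum(dz[o] * w2[o][j] for o in range(2)) * (1.0 - h[j] ** 2)
                  for j in range(n_hidden)]
            for o in range(2):
                for j in range(n_hidden):
                    w2[o][j] -= lr * dz[o] * h[j]
                b2[o] -= lr * dz[o]
            for j in range(n_hidden):
                for k in range(n_in):
                    w1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]
        if loss / len(samples) < tol:        # stop condition 2: error has converged
            break
    return params, epochs_run
```

The returned `params` tuple plays the role of the stored information determination model; persisting it to disk is left out of the sketch.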

判定手段１６は、メディア情報が、現実に発生している事象に関連する情報であるか否かを判定するものである。
判定手段１６は、評価モードにおいて、現実に発生している事象に関連する情報であることが未知のメディア情報を入力する。また、判定手段１６は、そのメディア情報から、ベクトル化手段１２およびベクトル拡張手段１４を介して生成される拡張分散表現ベクトルを入力する。 The determination unit 16 determines whether or not the media information is information related to an actually occurring event.
In the evaluation mode, the determination unit 16 inputs media information for which it is unknown whether it is related to an actually occurring event. The determination unit 16 also inputs the extended distributed expression vector generated from that media information via the vectorization unit 12 and the vector expansion unit 14.

判定手段１６は、情報判定モデル記憶手段２３に記憶されている情報判定モデルを用いて、入力した拡張分散表現ベクトルが、現実に発生している事象に関連する情報に対応するベクトルであるか否かを判定する。具体的には、判定手段１６は、図６に示したＦＦＮＮの入力層Ｌ１に拡張分散表現ベクトルを入力し、出力層Ｌ３から出力される結果に基づいて判定を行う。図６の例では、判定手段１６は、出力層Ｌ３の一方のノードの出力である現実に発生している事象に関連する確率値から、他方のノードから出力される確率値を減算し、正であれば、メディア情報が、現実に発生している事象に関連する情報であると判定する。一方、負であれば、判定手段１６は、メディア情報が、現実に発生している事象に関連する情報ではないと判定する。
これによって、判定手段１６は、メディア情報が現実に発生している事象に関連する情報か否かを判定することができる。判定手段１６は、この判定結果を外部に出力する。 The determination unit 16 uses the information determination model stored in the information determination model storage unit 23 to determine whether or not the input extended distributed expression vector is a vector corresponding to information related to an actually occurring event. Specifically, the determination unit 16 inputs the extended distributed expression vector to the input layer L1 of the FFNN illustrated in FIG. 6 and makes the determination based on the result output from the output layer L3. In the example of FIG. 6, the determination unit 16 subtracts the probability value output from the other node from the probability value output by the one node of the output layer L3, which indicates relevance to an actually occurring event. If the difference is positive, the determination unit 16 determines that the media information is related to an actually occurring event; if it is negative, the determination unit 16 determines that the media information is not related to an actually occurring event.
Accordingly, the determination unit 16 can determine whether the media information is information related to an event that actually occurs. The determination unit 16 outputs the determination result to the outside.
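The decision rule itself is a sign test on the difference of the two output probabilities. A minimal sketch follows; the function name `judge` is hypothetical, and treating a difference of exactly zero as "not related" is an assumption, since the text only covers the positive and negative cases.

```python
def judge(p_related: float, p_unrelated: float) -> bool:
    """Return True when the post is judged to concern an actually occurring event."""
    # Positive difference -> related; negative -> not related.
    # A difference of exactly 0 is treated as "not related" here (assumption).
    return (p_related - p_unrelated) > 0.0
```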

記憶部２０は、分散表現ベクトル記憶手段２１と、特徴語句記憶手段２２と、情報判定モデル記憶手段２３と、を備える。記憶部２０は、情報判定装置１の動作で使用または生成する各種データを記憶するものである。
これら各記憶手段は、ハードディスク、半導体メモリ等の一般的な記憶装置で構成することができる。なお、ここでは、記憶部２０において、各記憶手段を個別に設けているが、１つの記憶装置の記憶領域を複数に区分して各記憶手段としてもよい。また、記憶部２０を外部記憶装置として、情報判定装置１の構成から省いてもよい。 The storage unit 20 includes a distributed expression vector storage unit 21, a feature phrase storage unit 22, and an information determination model storage unit 23. The storage unit 20 stores various data used or generated in the operation of the information determination apparatus 1.
Each of these storage means can be implemented with a general storage device such as a hard disk or a semiconductor memory. Here, each storage means is provided individually in the storage unit 20, but the storage area of a single storage device may instead be divided into a plurality of areas that serve as the respective storage means. The storage unit 20 may also be an external storage device and be omitted from the configuration of the information determination apparatus 1.

分散表現ベクトル記憶手段２１は、分散表現ベクトル生成手段１１で生成される分散表現ベクトルを単語に対応付けて記憶するものである。 The distributed expression vector storage unit 21 stores the distributed expression vector generated by the distributed expression vector generation unit 11 in association with a word.

特徴語句記憶手段２２は、予め定めた所定の事象が発生していないと予測される特徴的な語句（特徴語句）を記憶するものである。この特徴語句記憶手段２２は、所定の事象が発生していないと予測される慣用句（図３（ａ）参照）、仮定形表現（図３（ｂ）参照）、指定固有名詞（図３（ｃ）参照）を予め記憶しておく。
この情報判定装置１は、図示を省略した通信手段を備え、電子番組表を提供するサーバから、電子番組表を取得し、番組名、出演者等を特徴語句記憶手段２２に記憶することとしてもよい。 The feature phrase storage unit 22 stores characteristic phrases (feature phrases) from which it is predicted that a predetermined event has not occurred. The feature phrase storage unit 22 stores in advance idiomatic phrases (see FIG. 3(a)), hypothetical expressions (see FIG. 3(b)), and designated proper nouns (see FIG. 3(c)) from which it is predicted that the predetermined event has not occurred.
The information determination apparatus 1 may include communication means (not shown), acquire an electronic program guide from a server that provides it, and store program names, performers, and the like in the feature phrase storage means 22.

情報判定モデル記憶手段２３は、学習手段１５で学習した情報判定モデルを記憶するものである。この情報判定モデル記憶手段２３に記憶される情報判定モデルは、判定手段１６が参照する。 The information determination model storage unit 23 stores the information determination model learned by the learning unit 15. The information determination model stored in the information determination model storage unit 23 is referred to by the determination unit 16.

以上説明したように情報判定装置１を構成することで、情報判定装置１は、教師データである予め定めた所定の事象に関連する情報であるか否かが既知のメディア情報から、情報判定モデルを学習することができる。
そして、情報判定装置１は、情報判定モデルを用いて、未知のメディア情報が現実に発生している事象に関連する情報であるか否かを判定することができる。
なお、情報判定装置１は、一般的なコンピュータを、前記した制御部１０の各手段として機能させるプログラム（情報判定プログラム）で動作させることができる。 By configuring the information determination apparatus 1 as described above, the information determination apparatus 1 can learn an information determination model from teacher data, that is, media information for which it is known whether or not it is related to a predetermined event.
The information determination apparatus 1 can then use the information determination model to determine whether or not unknown media information is related to an actually occurring event.
The information determination apparatus 1 can be realized by running, on a general computer, a program (information determination program) that causes the computer to function as each means of the control unit 10 described above.

［情報判定装置の動作］
次に、図７，図８を参照して、本発明の実施形態に係る情報判定装置１の動作について説明する。なお、特徴語句記憶手段２２には、予め慣用句、仮定形表現、指定固有名詞が記憶されているものとする。ここでは、情報判定装置１の動作を、学習モードと評価モードとに分けて説明する。 [Operation of information judgment device]
Next, the operation of the information determination apparatus 1 according to the embodiment of the present invention will be described with reference to FIGS. 7 and 8. It is assumed that idiomatic phrases, hypothetical expressions, and designated proper nouns are stored in the feature phrase storage means 22 in advance. Here, the operation of the information determination apparatus 1 will be described separately for the learning mode and the evaluation mode.

（学習モード）
まず、図７を参照（構成については適宜図１参照）して、情報判定装置１の学習モードの動作について説明する。
ステップＳ１において、情報判定装置１の分散表現ベクトル生成手段１１は、既存のメディア情報等の大量の学習データ（分散表現学習データ）から、単語ごとの分散表現ベクトルを生成する。この単語ごとの分散表現ベクトルは、分散表現ベクトル記憶手段２１に記憶される。 (Learning mode)
First, referring to FIG. 7 (refer to FIG. 1 as appropriate for the configuration), the operation of the learning mode of the information determination apparatus 1 will be described.
In step S1, the distributed expression vector generation means 11 of the information determination apparatus 1 generates a distributed expression vector for each word from a large amount of learning data (distributed expression learning data) such as existing media information. The distributed expression vector for each word is stored in the distributed expression vector storage unit 21.

そして、ステップＳ２において、情報判定装置１のベクトル化手段１２は、所定の事象（ここでは、「火事」）に関連する情報であるか否かが既知のメディア情報（教師データ）を投稿ごとに入力する。
そして、ステップＳ３において、情報判定装置１のベクトル化手段１２は、ステップＳ２で入力した投稿文に含まれる単語に対応するステップＳ１で生成された分散表現ベクトルを単語数分だけ加算する。
さらに、ステップＳ４において、情報判定装置１のベクトル化手段１２は、ステップＳ３で加算された分散表現ベクトルを、投稿文に含まれる単語数で除算することで、投稿文ごとの正規化したベクトル（文分散表現ベクトル）を生成する。 Then, in step S2, the vectorization means 12 of the information determination apparatus 1 inputs, post by post, media information (teacher data) for which it is known whether or not it is related to a predetermined event (here, “fire”).
In step S3, the vectorization unit 12 of the information determination apparatus 1 sums the distributed expression vectors generated in step S1 for the words included in the post input in step S2, over all of those words.
Furthermore, in step S4, the vectorization means 12 of the information determination apparatus 1 divides the summed distributed expression vector from step S3 by the number of words included in the post, thereby generating a normalized vector for each post (the sentence distributed expression vector).
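Steps S3 and S4 amount to averaging the word vectors of a post. A minimal sketch is shown below; the function name `sentence_vector` is hypothetical, and the handling of words without a stored vector (they contribute nothing to the sum but still count in the divisor) is an assumption, since the source does not discuss unknown words.

```python
from typing import Dict, List, Sequence


def sentence_vector(
    words: Sequence[str],
    word_vectors: Dict[str, List[float]],  # per-word distributed expression vectors from step S1
    dim: int,
) -> List[float]:
    """Sum the word vectors of a post (S3) and divide by its word count (S4)."""
    if not words:
        raise ValueError("post contains no words")
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # unknown word: contributes a zero vector (assumption)
        for idx, value in enumerate(vec):
            total[idx] += value
    return [value / len(words) for value in total]
```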

ここで、ステップＳ５において、情報判定装置１の語句判定手段１３は、ステップＳ２で入力されたメディア情報に、特徴語句記憶手段２２に記憶されている特徴語句（慣用句、仮想的表現、指定固有名詞）が含まれているか否かを判定する。
このステップＳ５で、メディア情報に特徴語句が含まれていると判定された場合（Ｙｅｓ）、情報判定装置１のベクトル拡張手段１４は、ステップＳ６において、ステップＳ４で生成された文分散表現ベクトルに、ステップＳ５でメディア情報内に含まれていると判定された特徴語句に対応するベクトルの位置に値“１”を設定したベクトルを拡張して、拡張分散表現ベクトルを生成する（図５参照）。そして、情報判定装置１は、ステップＳ７に動作を進める。 In step S5, the phrase determination unit 13 of the information determination apparatus 1 determines whether or not the media information input in step S2 includes any of the feature phrases (idiomatic phrases, hypothetical expressions, designated proper nouns) stored in the feature phrase storage unit 22.
If it is determined in step S5 that the media information includes a feature phrase (Yes), then in step S6 the vector expansion unit 14 of the information determination apparatus 1 extends the sentence distributed expression vector generated in step S4 with a vector in which the value “1” is set at the positions corresponding to the feature phrases determined in step S5 to be included in the media information, thereby generating an extended distributed expression vector (see FIG. 5). The information determination apparatus 1 then proceeds to step S7.

一方、ステップＳ５で、メディア情報に特徴語句が含まれていないと判定された場合（Ｎｏ）、情報判定装置１は、ステップＳ７に動作を進める。ただし、厳密には、ベクトル拡張手段１４は、ステップＳ６で拡張するベクトルと同次数で要素の値をすべて“０”とする空のベクトルを文分散表現ベクトルに付加して拡張分散表現ベクトルとする。
ステップＳ７において、情報判定装置１の学習手段１５は、拡張分散表現ベクトルと、ステップＳ２で入力した教師データとから、メディア情報が現実に発生している事象に関連する情報であるか否かを判定する情報判定モデルを学習する。 On the other hand, when it is determined in step S5 that the media information does not include a feature phrase (No), the information determination apparatus 1 proceeds to step S7. Strictly speaking, however, the vector expansion unit 14 appends to the sentence distributed expression vector a zero vector that has the same number of dimensions as the vector appended in step S6 and whose elements are all “0”, and uses the result as the extended distributed expression vector.
In step S7, the learning unit 15 of the information determination apparatus 1 learns, from the extended distributed expression vector and the teacher data input in step S2, an information determination model for determining whether or not media information is related to an actually occurring event.

そして、ステップＳ８において、情報判定装置１の学習手段１５は、教師データを用いた学習を所定回数行うか、情報判定モデルのパラメータ誤差が収束したかにより、学習が終了したか否かを判定する。
このステップＳ８で、学習が終了していないと判定された場合（Ｎｏ）、情報判定装置１は、ステップＳ２に戻って学習動作を継続する。
一方、ステップＳ８で、学習が終了したと判定された場合（Ｙｅｓ）、情報判定装置１は、ステップＳ９において、学習した情報判定モデルを、情報判定モデル記憶手段２３に書き込む。 In step S8, the learning unit 15 of the information determination apparatus 1 determines whether learning has been completed based on whether the learning using the teacher data is performed a predetermined number of times or the parameter error of the information determination model has converged. .
If it is determined in step S8 that learning has not ended (No), the information determination apparatus 1 returns to step S2 and continues the learning operation.
On the other hand, when it is determined in step S8 that the learning is finished (Yes), the information determination apparatus 1 writes the learned information determination model in the information determination model storage unit 23 in step S9.

以上の動作によって、情報判定装置１は、教師データから、未知のメディア情報が現実に発生している事象に関連する情報であるか否かを判定するための情報判定モデルを生成することができる。 Through the above operation, the information determination apparatus 1 can generate, from the teacher data, an information determination model for determining whether or not unknown media information is related to an actually occurring event.

（評価モード）
次に、図８を参照（構成については適宜図１参照）して、情報判定装置１の評価モードの動作について説明する。この評価モードの動作は、図７で説明した学習モードの動作の後に行われる。
ステップＳ１０において、情報判定装置１のベクトル化手段１２は、現実に発生している事象に関連する情報であることが未知のメディア情報を投稿ごとに入力する。 (Evaluation mode)
Next, referring to FIG. 8 (refer to FIG. 1 as appropriate for the configuration), the operation in the evaluation mode of the information determination apparatus 1 will be described. The operation in the evaluation mode is performed after the operation in the learning mode described with reference to FIG.
In step S10, the vectorization means 12 of the information determination apparatus 1 inputs, post by post, media information for which it is unknown whether it is related to an actually occurring event.

そして、ステップＳ１１において、情報判定装置１のベクトル化手段１２は、ステップＳ１０で入力した投稿文に含まれる単語に対応する分散表現ベクトル記憶手段２１に記憶されている分散表現ベクトルを単語数分だけ加算する。
さらに、ステップＳ１２において、情報判定装置１のベクトル化手段１２は、ステップＳ１１で加算された分散表現ベクトルを、投稿文に含まれる単語数で除算することで、投稿文ごとの正規化したベクトル（文分散表現ベクトル）を生成する。 In step S11, the vectorization unit 12 of the information determination apparatus 1 sums the distributed expression vectors stored in the distributed expression vector storage unit 21 for the words included in the post input in step S10, over all of those words.
Furthermore, in step S12, the vectorization means 12 of the information determination apparatus 1 divides the summed distributed expression vector from step S11 by the number of words included in the post, thereby generating a normalized vector for each post (the sentence distributed expression vector).

ここで、ステップＳ１３において、情報判定装置１の語句判定手段１３は、ステップＳ１０で入力されたメディア情報に、特徴語句記憶手段２２に記憶されている特徴語句（慣用句、仮想的表現、指定固有名詞）が含まれているか否かを判定する。
このステップＳ１３で、メディア情報に特徴語句が含まれていると判定された場合（Ｙｅｓ）、情報判定装置１のベクトル拡張手段１４は、ステップＳ１４において、ステップＳ１２で生成された文分散表現ベクトルに、ステップＳ１３でメディア情報内に含まれていると判定された特徴語句に対応するベクトルの位置に値“１”を設定したベクトルを拡張して、拡張分散表現ベクトルを生成する（図５参照）。そして、情報判定装置１は、ステップＳ１５に動作を進める。 In step S13, the phrase determination unit 13 of the information determination apparatus 1 determines whether or not the media information input in step S10 includes any of the feature phrases (idiomatic phrases, hypothetical expressions, designated proper nouns) stored in the feature phrase storage unit 22.
If it is determined in step S13 that the media information includes a feature phrase (Yes), then in step S14 the vector expansion unit 14 of the information determination apparatus 1 extends the sentence distributed expression vector generated in step S12 with a vector in which the value “1” is set at the positions corresponding to the feature phrases determined in step S13 to be included in the media information, thereby generating an extended distributed expression vector (see FIG. 5). The information determination apparatus 1 then proceeds to step S15.

一方、ステップＳ１３で、メディア情報に特徴語句が含まれていないと判定された場合（Ｎｏ）、情報判定装置１は、ステップＳ１５に動作を進める。ただし、厳密には、ベクトル拡張手段１４は、ステップＳ１４で拡張するベクトルと同次数で要素の値をすべて“０”とする空のベクトルを文分散表現ベクトルに付加して拡張分散表現ベクトルとする。 On the other hand, when it is determined in step S13 that the media information does not include a feature phrase (No), the information determination apparatus 1 proceeds to step S15. Strictly speaking, however, the vector expansion unit 14 appends to the sentence distributed expression vector a zero vector that has the same number of dimensions as the vector appended in step S14 and whose elements are all “0”, and uses the result as the extended distributed expression vector.

ステップＳ１５において、情報判定装置１の判定手段１６は、情報判定モデル記憶手段２３に記憶されている情報判定モデルを用いて、拡張分散表現ベクトルが、現実に発生している事象に関連する情報に対応するベクトルであるか否かを判定する。さらに、ステップＳ１６において、情報判定装置１の判定手段１６は、ステップＳ１５で判定した結果を外部に出力する。 In step S15, the determination unit 16 of the information determination apparatus 1 uses the information determination model stored in the information determination model storage unit 23 to determine whether or not the extended distributed expression vector is a vector corresponding to information related to an actually occurring event. Furthermore, in step S16, the determination means 16 of the information determination apparatus 1 outputs the result determined in step S15 to the outside.

ステップＳ１７において、情報判定装置１は、さらにメディア情報が入力されるか否かにより、評価モードの動作の終了を判定する。
このステップＳ１７で、さらにメディア情報が入力され、評価モードの動作が終了していない場合（Ｎｏ）、情報判定装置１は、ステップＳ１０に動作を戻って、判定動作を継続する。
一方、ステップＳ１７で、新たなメディア情報が入力されず、評価モードの動作が終了した場合（Ｙｅｓ）、動作を終了する。
以上の動作によって、情報判定装置１は、未知のメディア情報が、現実に発生している事象に関連する情報であるか否かを判定することができる。 In step S17, the information determination apparatus 1 determines the end of the evaluation mode operation based on whether or not media information is further input.
If further media information is input in step S17 and the evaluation mode operation has not ended (No), the information determination apparatus 1 returns to step S10 and continues the determination operation.
On the other hand, if no new media information is input in step S17 and the evaluation mode operation ends (Yes), the operation ends.
With the above operation, the information determination apparatus 1 can determine whether the unknown media information is information related to an event that actually occurs.

以上、本発明の実施形態に係る情報判定装置１の構成および動作について説明したが、本発明は、この実施形態に限定されるものではない。
ここでは、情報判定装置１は、特徴語句記憶手段２２に記憶する特徴語句として、慣用句、仮定形表現、指定固有名詞のすべてを用いた。
しかし、情報判定装置１は、特徴語句として、慣用句、仮定形表現、指定固有名詞の少なくとも１つを用いることとしてもよい。このように、限定して特徴語句を用いても、従来に比べて、ニュース素材となるメディア情報の候補を減らすことができ、最終的に人がメディア情報をニュース素材として活用することができるか否かの判定作業を減らすことができる。 The configuration and operation of the information determination apparatus 1 according to the embodiment of the present invention have been described above, but the present invention is not limited to this embodiment.
Here, the information determination apparatus 1 uses all of idiomatic phrases, hypothetical expressions, and designated proper nouns as feature words to be stored in the feature word storage means 22.
However, the information determination apparatus 1 may use at least one of idiomatic phrases, hypothetical expressions, and designated proper nouns as feature phrases. Even when the feature phrases are limited in this way, the number of media information candidates for news material can be reduced compared with conventional approaches, which ultimately reduces the human work of judging whether media information can be used as news material.

また、ここでは、ベクトル拡張手段１４が、特徴語句記憶手段２２に記憶されている特徴語句のそれぞれの特徴語句が含まれているか否かを示すベクトルを分散表現ベクトルに追加した（図５参照）。
しかし、ベクトル拡張手段１４は、慣用句、仮定形表現、指定固有名詞ごとに、いずれかの特徴語句が含まれているか否かを示すベクトルを分散表現ベクトルに追加してもよい。また、仮定形表現については、仮定形表現が、予め定めた事象に関連する単語（例えば、「火」、「火事」）を含む文節からの距離ごとに拡張するベクトルを生成してもよい。 Here, the vector expansion unit 14 adds a vector indicating whether or not each of the feature words stored in the feature word storage unit 22 is included in the distributed expression vector (see FIG. 5). .
However, the vector expansion unit 14 may instead add to the distributed expression vector a vector indicating, for each category (idiomatic phrases, hypothetical expressions, designated proper nouns), whether or not any feature phrase of that category is included. For hypothetical expressions, the vector expansion unit 14 may generate an extension vector with one element for each distance between the hypothetical expression and the phrase containing a word related to the predetermined event (for example, “火” or “火事”, both meaning “fire”).

例えば、図９に示すように、ベクトル拡張手段１４は、慣用句については、特徴語句記憶手段２２に記憶されているいずれかの慣用句が含まれている場合、ベクトルに１次元の要素を割り当てる。また、ベクトル拡張手段１４は、指定固有名詞についても同様に、特徴語句記憶手段２２に記憶されているいずれかの指定固有名詞が含まれている場合、ベクトルに１次元の要素を割り当てる。
また、ベクトル拡張手段１４は、仮定形表現については、予め定めた事象に関連する単語（例えば、「火」、「火事」）を含む文節からの距離として、例えば、“−３”〜“３”までの７次元の要素を割り当てる。
これによって、図５に示した拡張分散表現ベクトルよりも次元数を抑えることができ、演算コストを抑えることができる。 For example, as shown in FIG. 9, for idiomatic phrases the vector expansion unit 14 assigns a single element to the vector, which indicates whether any of the idiomatic phrases stored in the feature phrase storage unit 22 is included. Similarly, for designated proper nouns, the vector expansion unit 14 assigns a single element indicating whether any of the designated proper nouns stored in the feature phrase storage unit 22 is included.
For hypothetical expressions, the vector expansion unit 14 assigns, for example, seven elements covering distances from “−3” to “3” from the phrase containing a word related to the predetermined event (for example, “火” or “火事”, both meaning “fire”).
As a result, the number of dimensions can be reduced as compared with the extended distributed expression vector shown in FIG. 5, and the calculation cost can be reduced.
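The compact layout of FIG. 9 (one element for idioms, seven for hypothetical-expression distances, one for designated proper nouns) could be built as in the sketch below. The element order, the function name `compact_extension`, and the choice to ignore distances outside −3 to 3 are assumptions; the source gives only the dimension counts.

```python
from typing import Iterable, List

MAX_DISTANCE = 3  # distances -3..3 give the seven elements mentioned in the text


def compact_extension(
    has_idiom: bool,
    hypothetical_distances: Iterable[int],  # chunk distance from the event-word phrase
    has_proper_noun: bool,
) -> List[float]:
    """Build the 1 + 7 + 1 = 9 element extension vector."""
    distance_slots = [0.0] * (2 * MAX_DISTANCE + 1)
    for d in hypothetical_distances:
        if -MAX_DISTANCE <= d <= MAX_DISTANCE:  # out-of-range distances are ignored (assumption)
            distance_slots[d + MAX_DISTANCE] = 1.0
    return (
        [1.0 if has_idiom else 0.0]
        + distance_slots
        + [1.0 if has_proper_noun else 0.0]
    )
```

Whatever the sizes of the stored phrase lists, the extension stays at nine dimensions, which is where the reduction in computation comes from.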

また、ここでは、現実に発生している事象として、事件、事故等を例に説明したが、この事象は、現実に発生する事象であればなんでもよい。例えば、メディア情報が、現実の「風邪」に関する情報であるか否かを判定する場合、ドラマの演技上の風邪に関する情報を除外することができる。また、メディア情報が、現実の「交通情報」に関する情報であるか否かを判定する場合、ゲーム上で発生する交通情報に関する情報を除外することができる。 In addition, here, an event, an accident, or the like has been described as an example of an actually occurring event. However, this event may be any event that actually occurs. For example, when it is determined whether or not the media information is information related to an actual “cold”, information related to a cold in the drama performance can be excluded. Further, when it is determined whether or not the media information is information regarding actual “traffic information”, information regarding traffic information generated in the game can be excluded.

また、ここでは、情報判定装置１は、情報判定モデルを学習する学習動作と、情報判定モデルを用いて、未知のメディア情報が、現実に発生している事象に関連する情報であるか否かを判定する判定動作との２つの動作を１つの装置で行うものとした。
しかし、これらの動作は、別々の装置で動作させても構わない。 In addition, here, the information determination apparatus 1 uses the learning operation for learning the information determination model and the information determination model to determine whether the unknown media information is information related to an actually occurring event. It is assumed that one device performs two operations, a determination operation for determining whether or not.
However, these operations may be performed by separate devices.

具体的には、情報判定モデルを学習する学習動作を実現する装置は、図１０に示す情報判定モデル学習装置３として構成することができる。
情報判定モデル学習装置３は、図１０に示すように、図１で説明した情報判定装置１から、判定手段１６を省いて構成すればよい。この構成は、図１で説明した情報判定装置１と同じ、情報判定モデルを学習する学習動作のみを行う。なお、情報判定モデル学習装置３の動作は、図７で説明した動作と同じである。
この情報判定モデル学習装置３は、コンピュータを前記した各手段として機能させるためのプログラム（情報判定モデル学習プログラム）で動作させることができる。 Specifically, an apparatus for realizing a learning operation for learning an information determination model can be configured as an information determination model learning apparatus 3 shown in FIG.
As shown in FIG. 10, the information determination model learning device 3 may be configured by omitting the determination unit 16 from the information determination device 1 described in FIG. This configuration performs only the learning operation for learning the information determination model, which is the same as the information determination apparatus 1 described in FIG. The operation of the information determination model learning device 3 is the same as the operation described with reference to FIG.
The information determination model learning device 3 can be operated by a program (information determination model learning program) for causing a computer to function as each of the above-described means.

また、情報判定モデルを用いて、未知のメディア情報が、現実に発生している事象に関連する情報であるか否かを判定する判定動作を実現する装置は、図１１に示す情報判定装置１Ｂとして構成することができる。
情報判定装置１Ｂは、図１１に示すように、図１で説明した情報判定装置１から、分散表現ベクトル生成手段１１と学習手段１５を省いて構成すればよい。この構成は、図１で説明した情報判定装置１と同じ、未知のメディア情報が、現実に発生している事象に関連する情報を判定する判定動作のみを行う。なお、情報判定装置１Ｂの動作は、図８で説明した動作と同じである。
この情報判定装置１Ｂは、コンピュータを前記した各手段として機能させるためのプログラム（情報判定プログラム）で動作させることができる。
このように、学習動作と判定動作とを、異なる装置で動作させることで、１つの情報判定モデル学習装置３で学習した情報判定モデルを、複数の情報判定装置１Ｂで利用することが可能になる。 An apparatus that implements the determination operation, which uses the information determination model to determine whether or not unknown media information is related to an actually occurring event, can be configured as the information determination apparatus 1B shown in FIG. 11.
As shown in FIG. 11, the information determination apparatus 1B may be configured by omitting the distributed expression vector generation unit 11 and the learning unit 15 from the information determination apparatus 1 described with reference to FIG. 1. This configuration performs only the determination operation, identical to that of the information determination apparatus 1 described with reference to FIG. 1, of determining whether or not unknown media information is related to an actually occurring event. The operation of the information determination apparatus 1B is the same as the operation described with reference to FIG. 8.
This information determination apparatus 1B can be operated by a program (information determination program) for causing a computer to function as each means described above.
In this way, by operating the learning operation and the determination operation with different devices, the information determination model learned with one information determination model learning device 3 can be used with a plurality of information determination devices 1B. .
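The split between one learning device and several determination devices can be sketched as follows. This is only an illustrative sketch under stated assumptions: the JSON serialization format, the names `learn_model` and `DeterminationDevice`, and the toy three-dimensional extended vectors are all hypothetical, and a simple perceptron stands in for the neural network described in the text.

```python
import json
from typing import List

Vector = List[float]


def learn_model(samples: List[Vector], labels: List[int], epochs: int = 20) -> str:
    """Learning side (device 3): fit a linear model on extended vectors.

    A perceptron is used here purely as a small stand-in for the neural
    network in the text. Labels are 1 (actual event) or 0 (not an event).
    The model is returned as a JSON string, an assumed storage format.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            error = y - (1 if score > 0 else 0)
            if error:  # update only on misclassified samples
                weights = [w + error * v for w, v in zip(weights, x)]
                bias += error
    return json.dumps({"weights": weights, "bias": bias})


class DeterminationDevice:
    """Determination side (device 1B): no training code, only a loaded model."""

    def __init__(self, serialized_model: str) -> None:
        model = json.loads(serialized_model)
        self.weights: Vector = model["weights"]
        self.bias: float = model["bias"]

    def is_real_event(self, extended_vector: Vector) -> bool:
        score = sum(w * v for w, v in zip(self.weights, extended_vector)) + self.bias
        return score > 0


# Toy extended vectors: two averaged embedding dimensions plus one phrase flag.
TRAIN_X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.9, 0.1, 1.0], [0.1, 0.9, 0.0]]
TRAIN_Y = [1, 1, 0, 0]

# One learning run produces a model that several determination devices share.
shared_model = learn_model(TRAIN_X, TRAIN_Y)
device_a = DeterminationDevice(shared_model)
device_b = DeterminationDevice(shared_model)
```

Because the determination side depends only on the serialized model, any number of `DeterminationDevice` instances can be created from a single learning run.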

Here, the information determination model learned by the learning means 15 is a neural network trained by supervised learning. However, other general machine learning methods can be used for this supervised learning, for example a support vector machine (SVM) or a conditional random field (CRF).
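Since the learner only consumes the extended distributed representation vector, it can be treated as a replaceable component. The sketch below illustrates this under several assumptions: the word vectors, the phrase list, and the names `extended_vector`, `Learner`, and `NearestCentroid` are hypothetical, and a nearest-centroid classifier is used only as a small, dependency-free stand-in. An SVM or a neural network exposing the same `fit`/`predict` interface would plug in the same way.

```python
from typing import Dict, List, Protocol, Sequence

Vector = List[float]

# Hypothetical toy word vectors standing in for the stored per-word
# distributed representations.
WORD_VECTORS: Dict[str, Vector] = {
    "fire": [1.0, 0.0], "smoke": [0.8, 0.2], "station": [0.6, 0.4],
    "lunch": [0.0, 1.0], "tasty": [0.1, 0.9],
    "if": [0.5, 0.5], "drill": [0.7, 0.3],
}
# Hypothetical phrases that indicate "not an actually occurring event".
NON_EVENT_PHRASES = ["if", "drill"]


def extended_vector(words: Sequence[str]) -> Vector:
    """Average the word vectors of a post, then append one flag per phrase."""
    dim = 2
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if known:
        avg = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    else:
        avg = [0.0] * dim
    flags = [1.0 if phrase in words else 0.0 for phrase in NON_EVENT_PHRASES]
    return avg + flags


class Learner(Protocol):
    """Interface any supervised learner would satisfy in this sketch."""
    def fit(self, xs: List[Vector], ys: List[int]) -> None: ...
    def predict(self, x: Vector) -> int: ...


class NearestCentroid:
    """Stand-in learner: assigns the label of the closest class centroid."""

    def __init__(self) -> None:
        self.centroids: Dict[int, Vector] = {}

    def fit(self, xs: List[Vector], ys: List[int]) -> None:
        for label in set(ys):
            rows = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, x: Vector) -> int:
        def sq_dist(centroid: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train(learner: Learner, posts: List[List[str]], labels: List[int]) -> Learner:
    """Learning step: the learner sees only extended vectors."""
    learner.fit([extended_vector(p) for p in posts], labels)
    return learner


# Toy teacher data: 1 = actual event, 0 = not an actual event.
POSTS = [["fire", "smoke"], ["fire", "station", "smoke"],
         ["fire", "drill"], ["if", "fire", "smoke"], ["lunch", "tasty"]]
LABELS = [1, 1, 0, 0, 0]

model = train(NearestCentroid(), POSTS, LABELS)
```

In this toy data the posts containing "drill" or "if" have averaged word vectors close to those of genuine fire reports, so it is the appended phrase flags that separate them, which is the role the vector extension plays in the text.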

DESCRIPTION OF SYMBOLS
1, 1B Information determination device
11 Distributed representation vector generation means
12 Vectorization means
13 Phrase determination means
14 Vector extension means
15 Learning means
16 Determination means
21 Distributed representation vector storage means
22 Feature phrase storage means
23 Information determination model storage means
3 Information determination model learning device

Claims

An information determination model learning device that learns an information determination model for determining whether social media information to be determined is information indicating an actually occurring event, using as teacher data a plurality of pieces of social media information, each being post-unit text data for which it is known whether or not it indicates an actually occurring event, the device comprising:
vectorization means that receives the teacher data and generates a post-unit distributed representation vector by averaging, from the per-word distributed representation vectors stored in advance in storage means, the distributed representation vectors of the words constituting the posted text;
phrase determination means that determines whether the social media information corresponding to the post-unit distributed representation vector generated by the vectorization means includes, as words, any of a plurality of predetermined phrases indicating that the information is not an actually occurring event;
vector extension means that generates an extended distributed representation vector by vectorizing the presence or absence of the phrases determined to be included by the phrase determination means and appending the result to the post-unit distributed representation vector; and
learning means that generates the information determination model by machine learning on the extended distributed representation vector generated by the vector extension means.

The information determination model learning device according to claim 1, wherein the phrase determination means determines whether the social media information includes at least one type of phrase group among idiomatic phrases related to the occurrence event, hypothetical expressions, and proper nouns designated in advance as phrases not related to the occurrence event.

An information determination device that determines, using the information determination model learned by the information determination model learning device according to claim 1 or 2, whether unknown data, which is social media information to be determined, is information indicating an actually occurring event, the device comprising:
vectorization means that receives the unknown data and generates a post-unit distributed representation vector by averaging, from the per-word distributed representation vectors stored in advance in storage means, the distributed representation vectors of the words constituting the posted text;
phrase determination means that determines whether the social media information corresponding to the post-unit distributed representation vector generated by the vectorization means includes, as words, any of a plurality of predetermined phrases indicating that the information is not an actually occurring event;
vector extension means that generates an extended distributed representation vector by vectorizing the presence or absence of the phrases determined to be included by the phrase determination means and appending the result to the post-unit distributed representation vector; and
determination means that determines, using the information determination model and based on the extended distributed representation vector generated by the vector extension means, whether the unknown data is information indicating an actually occurring event.

An information determination device that learns an information determination model using, as teacher data, a plurality of pieces of social media information, each being post-unit text data for which it is known whether or not it indicates an actually occurring event, and that determines whether unknown data, which is social media information to be determined, is information indicating an actually occurring event, the device comprising:
vectorization means that receives the teacher data in a learning mode for learning the information determination model and receives the unknown data in an evaluation mode for performing determination with the information determination model, and that generates a post-unit distributed representation vector by averaging, from the per-word distributed representation vectors stored in advance in storage means, the distributed representation vectors of the words constituting the posted text;
phrase determination means that determines whether the social media information corresponding to the post-unit distributed representation vector generated by the vectorization means includes, as words, any of a plurality of predetermined phrases indicating that the information is not an actually occurring event;
vector extension means that generates an extended distributed representation vector by vectorizing the presence or absence of the phrases determined to be included by the phrase determination means and appending the result to the post-unit distributed representation vector;
learning means that, in the learning mode, generates the information determination model by machine learning on the extended distributed representation vector generated from the social media information corresponding to the teacher data; and
determination means that, in the evaluation mode, determines, using the information determination model and based on the extended distributed representation vector generated from the social media information corresponding to the unknown data, whether the unknown data is information indicating an actually occurring event.

An information determination model learning program for causing a computer to function as the information determination model learning device according to claim 1.

An information determination program for causing a computer to function as the information determination apparatus according to claim 3.