JP2019133565A

JP2019133565A - News material classification apparatus, program and learning model

Info

Publication number: JP2019133565A
Application number: JP2018017228A
Authority: JP
Inventors: Kiminobu Makino; Taro Miyazaki; Atsushi Goto; Yuka Takei
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2018-02-02
Filing date: 2018-02-02
Publication date: 2019-08-08
Anticipated expiration: 2038-02-02
Also published as: JP7181693B2

Abstract

PROBLEM TO BE SOLVED: To accurately classify newsworthy post information extracted from social media information.

SOLUTION: A sequence generation unit 10 of a news material classification apparatus 1-1 generates a character vector and a word vector from newsworthy post information extracted by a news material extraction apparatus 100. A sentence classification unit 11 uses a learning model whose input data are the character vector and the word vector of the newsworthy post information and whose output data is previously-posted/not-previously-posted information. It performs the model's computation and generates and outputs that information, so that the newsworthy post information is classified as previously posted or not previously posted. The learning model consists of a character NN 21, which takes the character vector of the newsworthy post information as input data; a word NN 22, which takes the word vector of the newsworthy post information as input data; and an output NN 23, which takes as input data the vector obtained by concatenating the computation result of the character NN 21 and the computation result of the word NN 22, and produces the previously-posted/not-previously-posted information as output data.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a news material classification apparatus and program that extract information that can serve as news material from a large volume of social media information and classify the extracted information, and to a learning model used by the news material classification apparatus.

Conventionally, social media such as Twitter (registered trademark), in which individuals publish information over the Internet and thereby form various information exchange services, have been widely used. On social media, highly immediate posts (hereinafter, "social media information") are sent by individuals who happen to be at the scene of, for example, a fire or an accident.

Broadcasters increasingly extract information that can serve as news material (hereinafter, "newsworthy post information") from individual items of social media information and use it in producing news or programs.

Extracting newsworthy post information from social media information by hand would require enormous labor and is not realistic. To reduce this manual labor, methods for performing the extraction automatically have been proposed (see, for example, Patent Document 1 and Non-Patent Document 1).

FIG. 22 is a diagram outlining a conventional news material extraction apparatus. This apparatus incorporates, for example, the method of Patent Document 1.

In a learning phase, the news material extraction apparatus 100 receives social media information (posts), generates the features to be used as the feature values of a feature vector, and generates a learning model by machine learning. In a determination phase, the apparatus receives social media information and uses the trained learning model to determine whether the input is newsworthy post information. It then outputs the newsworthy post information and discards all other information.

The news material extraction apparatus 100 basically extracts linguistic information from the input social media information and generates a learning model by machine learning with that linguistic information as input data. Such machine learning methods may take a word sequence, a character sequence, or a combination of the two as input data (see, for example, Non-Patent Documents 2 to 6).

Methods for detecting predetermined events, such as train delays, from social media information have also been proposed (see, for example, Patent Documents 2 and 3).

Patent Document 1: JP 2017-201437 A
Patent Document 2: JP 2013-168021 A
Patent Document 3: JP 2014-2551 A

Non-Patent Document 1: Miyazaki et al., "Multi-classification of learning data for extracting useful information from Twitter", IFAT, 2017, pages 1-6.
Non-Patent Document 2: Yoon Kim, et al., "Character-Aware Neural Language Models", AAAI 2016.
Non-Patent Document 3: Rafal Jozefowicz, et al., "Exploring the limits of language modeling", CoRR, 2016, abs/1602.02410.
Non-Patent Document 4: Lyan Verwimp, et al., "Character-Word LSTM Language Models", EACL 2017, pages 417-427.
Non-Patent Document 5: Rupesh K. Srivastava, et al., "Training Very Deep Networks", NIPS, 2015, pages 2377-2385.
Non-Patent Document 6: X. Ma and E. Hovy, "End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF", in Proceedings of the 54th ACL, pages 1064-1074, Berlin, Germany, August 2016.

In general, a very large volume of social media information is posted, and the newsworthy post information extracted from it is likewise voluminous. Consequently, even if the news material extraction apparatus 100 shown in FIG. 22 automatically extracts newsworthy post information from social media information, it is difficult for a user to monitor all of it.

How newsworthy post information is used also varies greatly with the user, purpose, application, and situation. A user is likely to want only the desired posts out of the many items of newsworthy post information extracted from a large volume of social media information. It is therefore desirable to classify newsworthy post information in a way that meets the user's needs.

For example, when a user wants first-report newsworthy post information, posts associated with news organizations or aggregator sites, and posts quoting them, are not needed, because they are previously posted posts rather than first reports. Here, a previously posted post is a post that has already appeared on social media, that is, one that has already been tweeted on social media, or a post that quotes such a post.

In general, once a post appears about news on which many people want to voice an opinion, the number of previously posted posts rises sharply. It is therefore difficult for a user monitoring newsworthy post information in real time to find the not-previously-posted posts that constitute first reports. Here, a not-previously-posted post is a post that has not appeared on social media before and is appearing for the first time.

Similarly, when a user wants newsworthy post information consisting of opinions on the news and follow-up reports, the not-previously-posted posts that constitute first reports are not needed.

Thus, for a user to obtain the not-previously-posted posts that constitute first reports, it is desirable that, after newsworthy post information has been automatically extracted from social media information, it be automatically classified as previously posted or not previously posted. The same holds when the user wants the previously posted posts that carry opinions on the news and follow-up reports.

Some previously posted posts state their source explicitly and can be classified easily by filtering on the source's name as a keyword. Others are recognizable to a human reader as previously posted from their writing style and similar cues, but are difficult to classify with simple processing such as keyword filtering. On social media, the text of even a quotation is easy to alter, and the person quoting may delete the attribution.

For this reason, when a user wants first-report newsworthy post information, it is desirable to classify posts not by simple keyword filtering but by machine learning or a similar technique that takes various sentence-level conditions into account.

Now consider, from a machine learning standpoint, performing two processes as one combined process: extracting newsworthy post information from social media information, and classifying that information into not-previously-posted posts (first reports) and previously posted posts (everything else). The former is called the extraction process and the latter the classification process.

In such a combined process, classification is applied not only to the newsworthy post information selected by extraction but also to information that extraction should discard (information that is not newsworthy post information and therefore needs no classification). When machine learning is applied to the combined process, the training data grows, the processing load rises, and classification accuracy falls.

Moreover, changing the classification scheme to support classifications other than previously posted versus not previously posted requires rebuilding the entire system that performs the combined process and retraining it, including the large-scale training for extraction.

Thus, when extraction and classification are performed as one combined process, the whole model must be retrained every time a classification is added to meet a user's needs, which is not realistic.

Meanwhile, when a learning model is generated by machine learning with linguistic information as input data, using a character sequence and using a word sequence as the input each have distinct advantages and disadvantages, as follows.

When a character sequence is used as input data, characters have fewer variations than words, so each character can readily be assigned to a node of the learning model's input layer with a small number of nodes. This reduces the number of unknown (out-of-vocabulary) items that cannot be fed in as nodes, lets the characters making up a sentence be handled accurately as input data, and raises the accuracy of determinations made with the learning model. By comparison, a word consists of multiple characters that can be combined in many ways, so words produce more unknown items than characters do, and with a word sequence it is harder to assign each word to an input-layer node. Characters, having no such combinations, produce fewer unknown items than words.

On the other hand, a single character can carry more possible meanings than a single word and provides little information relative to the whole sentence, so its meaning is hard to pin down. As a result, the learning model may make the same determination even when a character means different things in different sentences, which lowers accuracy. This is because a character on its own is ambiguous: the sense in which it is used in one sentence is hard to distinguish from the sense in which it is used in another, yet the character is assigned to a single node of the learning model's input layer.

By contrast, when a word sequence is used as input data, a single word has a narrower range of possible meanings than a single character and its meaning is easier to identify. The resulting learning model therefore makes different determinations for different words, which raises accuracy.

On the other hand, words have more variations than characters, and limiting the number of words used (or the number of nodes used) increases the number of unknown words. Each word then becomes harder to assign to a node of the learning model's input layer, the words making up a sentence may not be handled accurately as input data, and the accuracy of determinations made with the learning model falls.
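
The trade-off between character and word inputs described above can be illustrated with a small sketch. It is not taken from the patent: the function names, the toy English posts, and the whitespace tokenizer are all assumptions chosen for illustration. It builds a size-limited vocabulary from a few training posts and measures what fraction of an unseen post falls outside it.

```python
def build_vocab(texts, tokenize, max_size):
    """Keep the max_size most frequent tokens; all others count as unknown."""
    counts = {}
    for text in texts:
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return set(ranked[:max_size])

def unknown_rate(text, tokenize, vocab):
    """Fraction of tokens in text that are not in the vocabulary."""
    toks = tokenize(text)
    return sum(1 for t in toks if t not in vocab) / len(toks)

def char_tokens(text):
    # Characters, ignoring whitespace (illustrative choice).
    return [c for c in text if not c.isspace()]

def word_tokens(text):
    # Whitespace split; Japanese text would need a morphological analyzer instead.
    return text.split()

train_posts = ["fire near the station", "train delayed near the bridge"]
new_post = "smoke near the harbor"

char_vocab = build_vocab(train_posts, char_tokens, max_size=30)
word_vocab = build_vocab(train_posts, word_tokens, max_size=30)

char_unk = unknown_rate(new_post, char_tokens, char_vocab)
word_unk = unknown_rate(new_post, word_tokens, word_vocab)
```

In this toy setting only two of the eighteen characters in the new post are unseen, while half of its words are, which mirrors the point that word inputs suffer more from unknown items under a limited vocabulary.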

In addition, for a language such as Japanese, which does not put spaces between words, the result depends heavily on the performance of the sequence generator that converts text into a word sequence. Because no ideal sequence generator exists, training and determination accuracy suffer.

It is therefore desirable to perform machine learning that exploits the advantages of both character sequences and word sequences. Non-Patent Documents 3 to 6 cited above do perform machine learning with character sequences and word sequences as input data, but in each case one merely complements the other and the two are treated as a single sequence, so the advantages of each are not fully exploited.

The present invention has been made to solve the above problems. Its object is to provide a news material classification apparatus, a program, and a learning model that can accurately classify newsworthy post information extracted from social media information when a user seeks desired information from a large volume of social media information.

To solve the above problems, the news material classification apparatus of claim 1 receives, as newsworthy post information, post information that can serve as news material from among a large volume of social media information, and classifies the newsworthy post information according to a user's request. The apparatus comprises: a sequence generation unit that receives the newsworthy post information, extracts the characters and the words it contains, and generates a character vector consisting of the sequence of characters and a word vector consisting of the sequence of words; a learning model storage unit that stores a machine-learned learning model; and a classification unit that reads the learning model from the learning model storage unit and, using the learning model, generates and outputs classification information for the newsworthy post information according to the user's request, based on the character vector and the word vector generated by the sequence generation unit. The learning model consists of: a character NN (neural network) that takes the character vector of the newsworthy post information as input data and produces a computation-result vector as output data; a word NN that takes the word vector of the newsworthy post information as input data and produces a computation-result vector as output data; and an output NN that takes as input data the vector obtained by concatenating the computation-result vector of the character NN and the computation-result vector of the word NN, and produces the classification information, which is its computation result, as output data. The classification unit performs an NN computation on the character vector of the newsworthy post information using the character NN, performs an NN computation on the word vector of the newsworthy post information using the word NN, performs an NN computation on the concatenated vector using the output NN, and outputs the computation result as the classification information.
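
The three-part structure of claim 1 (character NN, word NN, and an output NN applied to the concatenated result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each NN is reduced to a single dense layer, the weights are random and untrained, and the layer sizes, the `tanh` and softmax functions, and all names are hypothetical.

```python
import math
import random

def init_layer(rng, n_out, n_in):
    """Random weights and zero biases for a dense layer (untrained, for illustration)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(layer, x, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    weights, bias = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(char_vec, word_vec, model):
    char_out = dense(model["char_nn"], char_vec, math.tanh)    # character NN
    word_out = dense(model["word_nn"], word_vec, math.tanh)    # word NN
    joined = char_out + word_out                               # concatenated vector
    logits = dense(model["output_nn"], joined, lambda v: v)    # output NN
    return softmax(logits)                                     # class scores

rng = random.Random(0)
model = {
    "char_nn": init_layer(rng, 4, 6),    # 6-dim character vector -> 4 dims
    "word_nn": init_layer(rng, 4, 5),    # 5-dim word vector -> 4 dims
    "output_nn": init_layer(rng, 2, 8),  # 4 + 4 concatenated dims -> 2 classes
}
probs = classify([1, 0, 0, 1, 0, 1], [0, 1, 0, 0, 1], model)
```

The point of the sketch is the data flow: the two branches never share an input sequence, and only their result vectors are joined before the final computation.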

The news material classification apparatus of claim 2 is the apparatus of claim 1, wherein the classification information is previously-posted/not-previously-posted information indicating whether the newsworthy post information has been previously posted or has not been previously posted.

The news material classification apparatus of claim 3 is the apparatus of claim 2, further comprising a second classification unit and a determination unit. The classification unit outputs the previously-posted/not-previously-posted information as a first classification result. The second classification unit receives the newsworthy post information, extracts the agent information attached to it for identifying the device from which it was posted, generates previously-posted/not-previously-posted information based on the agent information, and outputs it as a second classification result. The determination unit generates and outputs new previously-posted/not-previously-posted information based on the first classification result output by the classification unit and the second classification result output by the second classification unit.
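
As a rough sketch of how the second classification unit and the determination unit of claim 3 might fit together: the claim does not state how agent information maps to a result or how the two results are combined, so the agent list, the `agent` field name, and the logical-OR rule below are all assumptions made purely for illustration.

```python
# Hypothetical agent names that are assumed to indicate automated reposting.
KNOWN_REPOST_AGENTS = {"news-bot", "auto-feed"}

def second_classification(post):
    """Return True (previously posted) if the post's agent is on the assumed list."""
    return post.get("agent") in KNOWN_REPOST_AGENTS

def determine(first_result, second_result):
    """Combine both results; treating either True as 'previously posted' is an assumed rule."""
    return first_result or second_result

post = {"text": "Breaking: fire near the station", "agent": "news-bot"}
first_result = False                       # stand-in for the learning model's output
second_result = second_classification(post)
final_result = determine(first_result, second_result)
```

Other combination rules (for example, trusting the learning model only when the agent is unknown) would fit the same interface.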

The news material classification apparatus of claim 4 is the apparatus of claim 2 or 3, wherein the sequence generation unit receives the newsworthy post information, extracts the characters and the words it contains, arranges the one-hot column vectors corresponding to the characters to generate a character one-hot vector sequence, and arranges the one-hot column vectors corresponding to the words to generate a word one-hot vector sequence. The learning model consists of: a character input-layer FFNN (feedforward neural network) that takes the character one-hot vector sequence generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; a character RNN (recurrent neural network) that takes the computation-result vector of the character input-layer FFNN as input data and produces an RNN computation-result vector as output data; a word input-layer FFNN that takes the word one-hot vector sequence generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; a word RNN that takes the computation-result vector of the word input-layer FFNN as input data and produces an RNN computation-result vector as output data; an intermediate-layer FFNN that takes as input data the vector obtained by concatenating the computation-result vector of the character RNN and the computation-result vector of the word RNN, and produces an FFNN computation-result vector as output data; and an output-layer FFNN that takes the computation-result vector of the intermediate-layer FFNN as input data and produces the previously-posted/not-previously-posted information, which is its FFNN computation result, as output data. The classification unit performs an FFNN computation on the character one-hot vector sequence using the character input-layer FFNN; performs an RNN computation on the computation-result vector of the character input-layer FFNN using the character RNN; performs an FFNN computation on the word one-hot vector sequence using the word input-layer FFNN; performs an RNN computation on the computation-result vector of the word input-layer FFNN using the word RNN; performs an FFNN computation on the concatenation of the computation-result vectors of the character RNN and the word RNN using the intermediate-layer FFNN; performs an FFNN computation on the computation-result vector of the intermediate-layer FFNN using the output-layer FFNN; and outputs the computation result as the previously-posted/not-previously-posted information.
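
One branch of the claim 4 model (one-hot sequence, input-layer FFNN, then RNN) can be sketched as below. This is an illustration under assumptions, not the patent's implementation: the extra "unknown" slot in the one-hot vectors, the linear input layer, the simple Elman-style recurrence, the choice of the final hidden state as the RNN result, and all weights and names are hypothetical.

```python
import math

def one_hot_sequence(tokens, vocab):
    """One one-hot vector per token; the last slot is an assumed 'unknown' slot."""
    index = {tok: i for i, tok in enumerate(vocab)}
    unknown = len(vocab)
    sequence = []
    for tok in tokens:
        vec = [0] * (len(vocab) + 1)
        vec[index.get(tok, unknown)] = 1
        sequence.append(vec)
    return sequence

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def rnn_last_state(inputs, w_in, w_rec):
    """Simple recurrent layer: h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the final h."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        from_input = matvec(w_in, x)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(p + q) for p, q in zip(from_input, from_state)]
    return hidden

vocab = ["f", "i", "r", "e"]
seq = one_hot_sequence(list("fire!"), vocab)   # "!" falls into the unknown slot

# Illustrative fixed weights: input layer maps 5 dims -> 3, RNN maps 3 -> 2.
w_embed = [[0.2, -0.1, 0.0, 0.3, 0.1],
           [0.0, 0.4, -0.2, 0.1, 0.0],
           [-0.3, 0.1, 0.2, 0.0, 0.1]]
w_in = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]

embedded = [matvec(w_embed, x) for x in seq]   # input-layer FFNN applied per step
char_state = rnn_last_state(embedded, w_in, w_rec)
```

The word branch would run the same steps on a word one-hot vector sequence, after which the two resulting vectors are concatenated and passed to the intermediate-layer and output-layer FFNNs.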

The news material classification apparatus of claim 5 is the apparatus of claim 2 or 3, wherein the sequence generation unit receives the newsworthy post information, extracts the characters and the words it contains, arranges the one-hot column vectors corresponding to the characters to generate a character one-hot vector sequence, and arranges the one-hot column vectors corresponding to the words to generate a word one-hot vector sequence. The learning model consists of: a character input-layer FFNN that takes the character one-hot vector sequence generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; a character CNN (convolutional neural network) that takes the computation-result vector of the character input-layer FFNN as input data and produces a CNN computation-result vector as output data; a character pooling layer that takes the computation-result vector of the character CNN as input data and produces a pooling computation-result vector as output data; a word input-layer FFNN that takes the word one-hot vector sequence generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; a word CNN that takes the computation-result vector of the word input-layer FFNN as input data and produces a CNN computation-result vector as output data; a word pooling layer that takes the computation-result vector of the word CNN as input data and produces a pooling computation-result vector as output data; an intermediate-layer FFNN that takes as input data the vector obtained by concatenating the computation-result vector of the character pooling layer and the computation-result vector of the word pooling layer, and produces an FFNN computation-result vector as output data; and an output-layer FFNN that takes the computation-result vector of the intermediate-layer FFNN as input data and produces the previously-posted/not-previously-posted information, which is its FFNN computation result, as output data. The classification unit performs an FFNN computation on the character one-hot vector sequence using the character input-layer FFNN; performs a CNN computation on the computation-result vector of the character input-layer FFNN using the character CNN; performs a pooling computation on the computation-result vector of the character CNN using the character pooling layer; performs an FFNN computation on the word one-hot vector sequence using the word input-layer FFNN; performs a CNN computation on the computation-result vector of the word input-layer FFNN using the word CNN; performs a pooling computation on the computation-result vector of the word CNN using the word pooling layer; performs an FFNN computation on the concatenation of the computation-result vectors of the character pooling layer and the word pooling layer using the intermediate-layer FFNN; performs an FFNN computation on the computation-result vector of the intermediate-layer FFNN using the output-layer FFNN; and outputs the computation result as the previously-posted/not-previously-posted information.
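
The CNN and pooling stages of claim 5 can be illustrated with a one-dimensional convolution over a vector sequence followed by pooling over positions. The claim does not specify the kernel width, the activation, or the pooling type, so the single width-2 kernel, the ReLU, and max pooling used here are assumptions, as are the toy numbers.

```python
def conv1d(sequence, kernel):
    """Slide one kernel over the sequence; returns one ReLU feature per window."""
    width = len(kernel)
    features = []
    for start in range(len(sequence) - width + 1):
        total = 0.0
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], sequence[start + offset]))
        features.append(max(0.0, total))   # ReLU (assumed activation)
    return features

def max_pool(features):
    """Max pooling over all positions gives a length-independent value."""
    return max(features)

# Four time steps of 2-dimensional vectors (e.g. outputs of an input-layer FFNN).
sequence = [[1, 0], [0, 1], [1, 1], [0, 0]]
kernel = [[1, -1], [0.5, 0.5]]             # width-2 kernel, one row per offset

features = conv1d(sequence, kernel)        # one feature per window position
pooled = max_pool(features)                # single value for this kernel
```

With several kernels, the pooled values form the vector that the character (or word) pooling layer passes on for concatenation.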

The news material classification apparatus of claim 6 is the apparatus of claim 2 or 3, wherein the sequence generation unit receives the newsworthy post information, extracts the characters and the words it contains, generates a character BOW vector corresponding to all of the extracted characters, and generates a word BOW vector corresponding to all of the extracted words. The learning model consists of: a character input-layer FFNN that takes the character BOW vector generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; a word input-layer FFNN that takes the word BOW vector generated by the sequence generation unit as input data and produces an FFNN computation-result vector as output data; an intermediate-layer FFNN that takes as input data the vector obtained by concatenating the computation-result vector of the character input-layer FFNN and the computation-result vector of the word input-layer FFNN, and produces an FFNN computation-result vector as output data; and an output-layer FFNN that takes the computation-result vector of the intermediate-layer FFNN as input data and produces the previously-posted/not-previously-posted information, which is its FFNN computation result, as output data. The classification unit performs an FFNN computation on the character BOW vector using the character input-layer FFNN; performs an FFNN computation on the word BOW vector using the word input-layer FFNN; performs an FFNN computation on the concatenation of the computation-result vectors of the character input-layer FFNN and the word input-layer FFNN using the intermediate-layer FFNN; performs an FFNN computation on the computation-result vector of the intermediate-layer FFNN using the output-layer FFNN; and outputs the computation result as the previously-posted/not-previously-posted information.

The news material classification device of claim 7 is a news material classification device that receives, as news posting information, post information that can serve as news material from among a large volume of social media information, and classifies the news posting information as already-appeared or not-yet-appeared, the device comprising: a sequence generation unit that receives the news posting information, extracts the characters or the words contained in the news posting information, and generates a character vector consisting of the sequence of characters or a word vector consisting of the sequence of words; a learning model storage unit that stores a machine-learned learning model composed of an NN that takes the character vector or the word vector of the news posting information as input data and outputs, as its operation result, already-appeared/not-yet-appeared information indicating whether the news posting information is already-appeared or not-yet-appeared; a classification unit that reads the learning model from the learning model storage unit and, using the learning model, generates the already-appeared/not-yet-appeared information based on the character vector or the word vector generated by the sequence generation unit and outputs it as a first classification result; and a second classification unit that receives the news posting information, extracts agent information attached to the news posting information for identifying the device from which the post was made, generates the already-appeared/not-yet-appeared information based on the agent information, and outputs it as a second classification result; wherein new already-appeared/not-yet-appeared information is generated and output based on the first classification result output by the classification unit and the second classification result output by the second classification unit.

Furthermore, the program of claim 8 causes a computer to function as the news material classification device according to any one of claims 1 to 7.

Furthermore, the learning model of claim 9 is a learning model for causing a computer to function so as to classify news posting information extracted from a large volume of social media information according to a user's request and to output classification information. The learning model comprises a character NN that takes as input data a character vector consisting of the sequence of characters extracted from the news posting information and outputs a vector of operation results, a word NN that takes as input data a word vector consisting of the sequence of words extracted from the news posting information and outputs a vector of operation results, and an output NN that takes as input data the vector obtained by concatenating the output vector of the character NN and the output vector of the word NN and outputs the classification information as its operation result. The learning model holds the weight coefficients of the character NN, the word NN, and the output NN, which were machine-learned with the character vector and the word vector of the news posting information as the input data of the learning model and the classification information of the news posting information as its output data. The learning model causes the computer to function so as to: perform, on the character vector input to the input layer of the character NN, an NN operation based on the weight coefficients of the character NN and output the resulting vector from its output layer; perform, on the word vector input to the input layer of the word NN, an NN operation based on the weight coefficients of the word NN and output the resulting vector from its output layer; and perform, on the vector input to the input layer of the output NN, in which the vector output from the output layer of the character NN and the vector output from the output layer of the word NN are concatenated, an NN operation based on the weight coefficients of the output NN and output the classification information, which is the operation result, from its output layer.

As described above, according to the present invention, when a user acquires desired information from a large volume of social media information, the news posting information extracted from that social media information can be classified with high accuracy.

FIG. 1 is a diagram outlining an overall configuration example that includes the news material classification device of Example 1.
FIG. 2 is a block diagram showing a configuration example of the news material classification device of Example 1.
FIG. 3 is a diagram outlining an overall configuration example that includes the news material classification device of Example 2.
FIG. 4 is a block diagram showing a configuration example of the news material classification device of Example 2.
FIG. 5 is a flowchart showing a processing example of the agent classification unit.
FIG. 6 is a flowchart showing a first processing example (processing example 1) of the determination unit.
FIG. 7 is a flowchart showing a second processing example (processing example 2) of the determination unit.
FIG. 8 is a block diagram showing a configuration example of the learning device.
FIG. 9 is a diagram explaining the schematic structure of the learning model.
FIG. 10 is a diagram explaining the schematic structure of a learning model using FFNN and LSTM (specific example 1).
FIG. 11 is a diagram explaining the number of nodes and the input/output data in specific example 1.
FIG. 12 is a diagram explaining the schematic structure of a learning model using FFNN and CNN (specific example 2).
FIG. 13 is a diagram explaining the number of nodes and the input/output data in specific example 2.
FIG. 14 is a diagram explaining the schematic structure of a learning model using FFNN (specific example 3).
FIG. 15 is a diagram explaining the number of nodes and the input/output data in specific example 3.
FIG. 16 is a diagram showing an example of a character sequence and an example of a word sequence generated from news posting information.
FIG. 17 is a diagram showing an example of the character one-hot vector sequence {x_char}.
FIG. 18 is a diagram showing an example of the word one-hot vector sequence {x_word}.
FIG. 19 is a diagram showing an example of the character BOW vector x_char.
FIG. 20 is a diagram showing an example of the word BOW vector x_word.
FIG. 21 is a diagram explaining the experimental results.
FIG. 22 is a diagram outlining a conventional news material extraction device.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. Separately from the process of extracting news posting information from social media information, the present invention classifies the news posting information into information that meets the user's request (for example, not-yet-appeared information, that is, a first report) and other information (for example, already-appeared information). Already-appeared information is information that has already appeared; not-yet-appeared information is information that has not appeared in the past and is appearing for the first time.

The processing by which a user acquires desired information from a large volume of social media information is divided into two stages. The first process extracts news posting information from the social media information and is implemented with existing technology.

The second process is the one provided by the present invention: it uses machine learning to classify the news posting information extracted by the first process into information that meets the user's request and other information.

In the second process, the present invention generates a character vector and a word vector from the news posting information and, using a learning model that has been machine-learned in advance, feeds the character vector and the word vector to the input layers as input data. The present invention then concatenates the intermediate data of the character vector and the intermediate data of the word vector in an intermediate layer, determines whether the information meets the user's request or is other information, and outputs this determination result from the output layer as output data.

Hereinafter, the present invention will be described concretely with reference to Examples 1 and 2. Examples 1 and 2 allow a user to obtain first-report, not-yet-appeared information from a large volume of social media information. Example 1 uses a learning model to classify news posting information, based on its text (body), into first-report not-yet-appeared information and other already-appeared information, and generates a determination result.

In addition to the processing of Example 1, Example 2 classifies news posting information into first-report not-yet-appeared information and other already-appeared information by filtering on agent information other than the text of the post, such as the originating application and OS, and generates a determination result. Example 2 then generates a final determination result based on the determination result obtained with the learning model by the processing of Example 1 and the determination result obtained by filtering.

The types of learning model used in Examples 1 and 2 are described below as specific examples 1, 2, and 3. Specific example 1 uses an FFNN (FeedForward Neural Network) and an LSTM (Long Short-Term Memory) as the learning model. Specific example 2 uses an FFNN and a CNN (Convolutional Neural Network), and specific example 3 uses an FFNN only.

[Example 1]
First, Example 1 will be described. As described above, Example 1 uses a learning model to classify news posting information, based on its text, into first-report not-yet-appeared information and other already-appeared information, and generates a determination result.

FIG. 1 is a diagram outlining an overall configuration example that includes the news material classification device of Example 1. This system allows a user to obtain first-report, not-yet-appeared information from a large volume of social media information, and comprises a news material extraction device 100 and a news material classification device 1-1.

As described with reference to FIG. 22, the news material extraction device 100 receives social media information, which is post information, and extracts news posting information from it. The news material extraction device 100 outputs the extracted news posting information to the news material classification device 1-1 and discards the rest of the social media information.

The news material extraction device 100 is implemented with existing technology, such as the method of Non-Patent Document 1 mentioned above, which uses a neural network learning model, or a keyword matching method. For example, the news material extraction device 100 performs machine learning on training data in which newsworthy social media information actually extracted by hand in a newsroom serves as positive examples and randomly sampled social media information serves as negative examples, and thereby generates a learning model. The machine learning takes the character one-hot vector sequence of the social media information as input data and is performed using, for example, a bi-directional LSTM. The news material extraction device 100 then uses the machine-learned learning model to extract news posting information from social media information automatically.

The news material classification device 1-1 receives news posting information from the news material extraction device 100 and, using a learning model, classifies it into first-report not-yet-appeared information and other already-appeared information.

The news material classification device 1-1 treats the information classified as first-report not-yet-appeared information as news posting information labeled not-yet-appeared, and the information classified as other already-appeared information as news posting information labeled already-appeared. The news material classification device 1-1 then outputs the label information indicating already-appeared or not-yet-appeared for the news posting information as the already-appeared/not-yet-appeared information. Details of the news material classification device 1-1 are described later.

FIG. 2 is a block diagram showing a configuration example of the news material classification device 1-1 of Example 1 shown in FIG. 1. The news material classification device 1-1 comprises a sequence generation unit 10, a text classification unit 11, and a learning model storage unit 12.

The sequence generation unit 10 receives the news posting information, extracts the characters it contains as a character sequence, and extracts the words it contains as a word sequence. The sequence generation unit 10 then generates a character vector from the character sequence and a word vector from the word sequence, and outputs the character vector and the word vector of the news posting information to the text classification unit 11.

For example, the sequence generation unit 10 extracts the characters contained in the news posting information and arranges the one-hot column vectors corresponding to those characters to generate a character one-hot vector sequence. Likewise, it extracts the words contained in the news posting information and arranges the one-hot column vectors corresponding to those words to generate a word one-hot vector sequence. As another example, the sequence generation unit 10 generates a character BOW (Bag of Words) vector from the extracted character sequence and a word BOW vector from the extracted word sequence. Details of the character one-hot vector sequence, the word one-hot vector sequence, the character BOW vector, and the word BOW vector are described later.
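The two kinds of encoding mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, whitespace splitting stands in for whatever word segmentation the device actually uses, the vocabulary is built from the single sample post, unknown tokens map to all-zero vectors, and the BOW vector is taken to hold occurrence counts.

```python
def build_vocab(tokens):
    """Map each distinct token to an index, in first-seen order."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot_sequence(tokens, vocab):
    """One one-hot vector per token, so token order is preserved.

    Tokens missing from the vocabulary become all-zero vectors (assumption).
    """
    seq = []
    for tok in tokens:
        vec = [0] * len(vocab)
        if tok in vocab:
            vec[vocab[tok]] = 1
        seq.append(vec)
    return seq


def bow_vector(tokens, vocab):
    """A single count vector over the vocabulary; token order is discarded."""
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


post = "fire near the station fire trucks"
words = post.split()                  # word sequence (whitespace split: assumption)
chars = list(post.replace(" ", ""))   # character sequence

word_vocab = build_vocab(words)
char_vocab = build_vocab(chars)

x_word_seq = one_hot_sequence(words, word_vocab)  # word one-hot vector sequence
x_char_seq = one_hot_sequence(chars, char_vocab)  # character one-hot vector sequence
x_word_bow = bow_vector(words, word_vocab)        # word BOW vector
x_char_bow = bow_vector(chars, char_vocab)        # character BOW vector
```

The one-hot sequences grow with the length of the post, which suits sequence models such as the LSTM or CNN of specific examples 1 and 2, whereas a BOW vector has a fixed length equal to the vocabulary size, which suits the FFNN-only model of specific example 3.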

The text classification unit 11 receives the character vector and the word vector of the news posting information from the sequence generation unit 10. Using the learning model stored in the learning model storage unit 12, the text classification unit 11 generates label information indicating already-appeared or not-yet-appeared based on the character vector and the word vector, and outputs it as the already-appeared/not-yet-appeared information. This information indicates whether the news posting information received by the sequence generation unit 10 is already-appeared or not-yet-appeared.

The learning model storage unit 12 stores a learning model that has been machine-learned by the learning device 2 of FIG. 8, described later.

FIG. 9 is a diagram explaining the schematic structure of the learning model stored in the learning model storage unit 12 shown in FIG. 2. This learning model comprises a character NN (Neural Network) 21, a word NN 22, and an output NN 23. The character NN 21 and the word NN 22 are connected to the output NN 23 so that the output data of the character NN 21 and the output data of the word NN 22 are input to the output NN 23.

This learning model is generated by machine learning in the learning device 2 shown in FIG. 8, described later. It causes a computer to take the character vector and the word vector of news posting information as input data and to output already-appeared/not-yet-appeared information indicating whether that news posting information is already-appeared or not-yet-appeared. The learning model holds the machine-learned weight coefficients of the character NN 21, the word NN 22, and the output NN 23.

The text classification unit 11 reads the learning model shown in FIG. 9 from the learning model storage unit 12 and stores it in a memory not shown in FIG. 2. The text classification unit 11 inputs the character vector generated by the sequence generation unit 10 to the input layer of the character NN 21 and performs NN operations, such as applying the machine-learned weight coefficients to the value of each element of the character vector. The text classification unit 11 then outputs the resulting vector from the output layer of the character NN 21 and propagates it to the output NN 23.

Likewise, the text classification unit 11 inputs the word vector generated by the sequence generation unit 10 to the input layer of the word NN 22 and performs NN operations, such as applying the machine-learned weight coefficients to the value of each element of the word vector. The text classification unit 11 then outputs the resulting vector from the output layer of the word NN 22 and propagates it to the output NN 23.

In this case, the vector output from the character NN 21 and the vector output from the word NN 22 are concatenated, and the concatenated vector is propagated to the output NN 23.

The text classification unit 11 inputs the resulting vector from the character NN 21 and the resulting vector from the word NN 22 to the input layer of the output NN 23, and performs NN operations, such as applying the machine-learned weight coefficients to the value of each element of the input vector. The text classification unit 11 applies the softmax function to the operation result to obtain probability values, applies the argmax function to the probability values to obtain binary information indicating already-appeared or not-yet-appeared, and outputs this binary information from the output layer of the output NN 23 as the already-appeared/not-yet-appeared information.
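The softmax-then-argmax step described above can be sketched as follows. The two-element score vector, the label strings, and the index order of the labels are assumptions made for illustration; the source only states that softmax yields probability values and argmax yields the binary result.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Index order of the two classes is an assumption.
LABELS = ("already-appeared", "not-yet-appeared")


def to_label(scores):
    """Return the binary label together with the class probabilities."""
    probs = softmax(scores)
    return LABELS[argmax(probs)], probs


label, probs = to_label([0.2, 1.5])
```

Because softmax is monotonic, taking argmax of the raw scores would give the same label; the probabilities are still useful when a confidence value is wanted alongside the binary decision.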

Note that the text classification unit 11 applies the ReLU (Rectified Linear Unit) activation function when outputting the operation result from each layer of the character NN 21, the word NN 22, and the output NN 23. The same applies to the learning models shown in FIGS. 10 to 15, described later.

As described above, in the news material classification device 1-1 of Example 1, the sequence generation unit 10 generates a character vector and a word vector from the news posting information extracted by the news material extraction device 100. The text classification unit 11 uses the learning model to perform an operation that takes the character vector and the word vector of the news posting information as input data and yields the already-appeared/not-yet-appeared information as output data, and outputs that information. In this way, the news posting information is classified as not-yet-appeared information or already-appeared information.

The learning model is composed of the character NN 21, which takes the character vector of the news posting information as input data; the word NN 22, which takes the word vector of the news posting information as input data; and the output NN 23, which takes as input data the vector obtained by concatenating the operation result of the character NN 21 and the operation result of the word NN 22 and outputs the already-appeared/not-yet-appeared information.
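The two-branch structure described above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: each NN is reduced to a single fully connected layer, the weights are hand-picked toy values rather than learned ones, the function and variable names are hypothetical, and ReLU is applied to the two branch outputs but not to the final scores.

```python
def relu(vec):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in vec]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x_char, x_word, params):
    """Character branch and word branch run independently, then are concatenated."""
    h_char = relu(dense(x_char, *params["char"]))  # stands in for character NN 21
    h_word = relu(dense(x_word, *params["word"]))  # stands in for word NN 22
    joined = h_char + h_word                       # vector concatenation
    return dense(joined, *params["out"])           # stands in for output NN 23


# Toy weights (assumption): 3 character inputs -> 2 nodes, 2 word inputs -> 1 node,
# 3 concatenated values -> 2 output scores.
toy_params = {
    "char": ([[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]], [0.0, 0.1]),
    "word": ([[0.5, 0.5]], [-0.2]),
    "out": ([[1.0, 0.0, -1.0], [-0.5, 1.0, 2.0]], [0.0, 0.0]),
}

scores = forward([1, 0, 2], [1, 1], toy_params)
```

The point of the design is visible in `forward`: the character input and the word input never share weights before the concatenation, so each branch can learn features suited to its own granularity.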

Because the text classification unit 11 thus processes the character vector and the word vector independently, the determination can exploit the respective advantages of the character information and the word information in the news posting information, and highly accurate already-appeared/not-yet-appeared information can be obtained. As described above, the advantages here are as follows: character information produces fewer unknown tokens than word information when, for example, the input layer of the character NN 21 has the same number of nodes as the input layer of the word NN 22; and word information is less ambiguous than character information, because the meanings a single element can carry are more limited.

In other words, the learning model used by the news material classification device 1-1 is machine-learned so as to reflect both the strength of characters, which yield few unknown tokens, and the strength of words, whose individual elements carry a limited range of meanings. The news posting information is therefore judged, from the standpoint of its text, in a way that reflects the advantages of both characters and words, and highly accurate already-appeared/not-yet-appeared information is obtained as a result.

Thus, when a user obtains first-report not-yet-appeared information or already-appeared information from a large volume of social media information, using the news material classification device 1-1 makes it possible to separate already-appeared information from not-yet-appeared information with high accuracy.

In addition, the news material classification device 1-1 of Example 1 operates independently of, and uncorrelated with, the news material extraction device 100. The learning model of the news material classification device 1-1 therefore does not need to be trained on the large volume of social media information that the news material extraction device 100 receives; it only needs the news posting information, which is smaller in number.

Because less training data is needed, the hardware burden of machine-learning the learning model is reduced, and complex processing becomes feasible even with limited computing resources.

Furthermore, because the news material classification device 1-1 of Example 1 operates independently of the news material extraction device 100, the extraction process of the news material extraction device 100 and the classification process of the news material classification device 1-1 are almost uncorrelated. The classification process of the news material classification device 1-1 can therefore be changed with little concern for its relationship to the extraction process, which makes such changes easier to carry out.

In addition, according to experiments by the inventors of the present application, when the true proportions of previously reported information and unreported information were 20% and 80%, respectively, simple conventional filtering classified posts into previously reported information and unreported information with an accuracy of about 50%. That is, when attempting to extract the 20% of newsworthy post information that is previously reported, 34% of the truly previously reported information can be obtained with an accuracy of 94.1%. In contrast, the news material classification device 1-1 of Example 1 classified posts into previously reported information and unreported information with an accuracy of about 80% or more. That is, when attempting to extract the 20% of newsworthy post information that is previously reported, 80% of the truly previously reported information can be obtained with an accuracy of 90%. Details are described later with reference to FIG. 21.

[Example 2]
Next, Example 2 will be described. As described above, Example 2 is an example in which a determination result using the learning model is generated by the processing of Example 1; a second determination result is generated by classifying newsworthy post information, through filtering using agent information, into first-report unreported information and other, previously reported information; and a final determination result is generated based on the determination result using the learning model and the determination result by filtering.

In general, a social media post carries not only its body text but also agent information, such as the application or OS from which it was posted, as well as posting-time information, and in some cases users can make use of this information.

News-related posts sent from portable communication devices such as mobile terminals are likely to be first-report unreported information. In contrast, posts sent from stationary servers or personal computers, which are not portable communication devices, are often posted automatically by a news organization or website, or are posted after the poster obtained some information on an indoor personal computer. Such posts are unlikely to be first-report unreported information and are likely to be previously reported information. Although the variety of agent information is vast, reported and unreported posts can be distinguished using only the information on whether or not the post was made from a mobile phone.

Therefore, in Example 2, unreported information and previously reported information are determined by filtering using agent information, and a final determination result is generated based on the determination result using the learning model and the determination result by filtering.

FIG. 3 is a diagram outlining an example of the overall configuration including the news material classification device of Example 2. This system is for a user to obtain first-report unreported information from a large volume of social media information, and comprises the news material extraction device 100 and a news material classification device 1-2.

The news material extraction device 100 is the same as the news material extraction device 100 shown in FIG. 1, so its description is omitted here.

The news material classification device 1-2 receives newsworthy post information from the news material extraction device 100 and, using the learning model, classifies it into unreported information and previously reported information based on its text. The news material classification device 1-2 also classifies the newsworthy post information into unreported information and previously reported information based on its agent information, by filtering using the agent information. The news material classification device 1-2 then generates a final determination result from the text-based classification and the agent-information-based classification.

As the final determination result, the news material classification device 1-2 outputs label information indicating whether the newsworthy post information is reported or unreported, as reported/unreported information. Details of the news material classification device 1-2 are described later.

FIG. 4 is a block diagram showing a configuration example of the news material classification device 1-2 of Example 2 shown in FIG. 3. The news material classification device 1-2 comprises a sequence generation unit 10, a sentence classification unit 11, a learning model storage unit 12, an agent classification unit 13, and a determination unit 14.

The sequence generation unit 10, the sentence classification unit 11, and the learning model storage unit 12 are the same as those shown in FIG. 2, so their description is omitted here. The sentence classification unit 11 outputs reported/unreported information a (the first determination result) to the determination unit 14.

The agent classification unit 13 receives newsworthy post information from the news material extraction device 100, determines, based on the agent information attached to the newsworthy post information, whether the newsworthy post information is reported or unreported, and generates the corresponding label information. The agent classification unit 13 then outputs the generated label information to the determination unit 14 as reported/unreported information b (the second determination result).

Agent information is information for identifying the device from which a post was made, and includes, for example, the type of the posting device and the application or OS used for posting on that device.

FIG. 5 is a flowchart showing a processing example of the agent classification unit 13. The agent classification unit 13 receives newsworthy post information from the news material extraction device 100 (step S501) and extracts agent information from it (step S502). Newsworthy post information may have agent information attached, in which case the agent classification unit 13 can extract that agent information from the newsworthy post information.

Based on the agent information, the agent classification unit 13 determines whether the newsworthy post information is reported or unreported (step S503) and generates the corresponding label information. The agent classification unit 13 outputs the label information to the determination unit 14 as reported/unreported information b (step S504).

Specifically, when the agent information is the device name of a mobile terminal, or the name of an application or OS used on a mobile terminal, the agent classification unit 13 judges that the post originated from a mobile terminal and determines that the newsworthy post information is unreported. Conversely, when the agent information is none of these (that is, when it is the device name of a stationary computer, or the name of an application or OS used on a stationary computer), the agent classification unit 13 judges that the post did not originate from a mobile terminal and determines that the newsworthy post information is previously reported.

For example, when the agent information is "Twitter for iPhone (registered trademark)" or "Twitter for Android (registered trademark)", the agent classification unit 13 determines that the post is unreported. When the agent information is anything other than "Twitter for iPhone (registered trademark)" and "Twitter for Android (registered trademark)", the agent classification unit 13 determines that the post is previously reported.
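The agent-based rule of step S503 can be illustrated with a minimal Python sketch. The two agent strings come from the example in the description; the function name, the label strings "reported" and "unreported", and the handling of a missing agent (treated like any non-mobile agent) are assumptions made only for illustration.

```python
# Agent strings that the description's example treats as mobile-terminal posts.
MOBILE_AGENTS = {"Twitter for iPhone", "Twitter for Android"}


def classify_by_agent(agent):
    """Return the label for reported/unreported information b.

    Hypothetical helper: a post from a mobile agent is labeled "unreported"
    (likely a first report); every other agent, including a missing one
    (None), is labeled "reported". The None handling is an assumption.
    """
    if agent in MOBILE_AGENTS:
        return "unreported"
    return "reported"


label_b = classify_by_agent("Twitter for iPhone")
```

A set lookup keeps the rule easy to extend if further mobile agent names are to be treated the same way.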

Returning to FIG. 4, the determination unit 14 receives reported/unreported information a from the sentence classification unit 11 and reported/unreported information b from the agent classification unit 13, and generates and outputs reported/unreported information based on reported/unreported information a and b.

FIG. 6 is a flowchart showing a first processing example (processing example 1) of the determination unit 14. The determination unit 14 receives reported/unreported information a and b from the sentence classification unit 11 and the agent classification unit 13, respectively (step S601).

The determination unit 14 determines whether reported/unreported information a indicates "reported" (step S602). If reported/unreported information a indicates "reported" (step S602: Y), the determination unit 14 proceeds to step S603. If it does not (that is, if it indicates "unreported") (step S602: N), the determination unit 14 proceeds to step S605.

Having proceeded from step S602 (Y), the determination unit 14 determines whether reported/unreported information b indicates "reported" (step S603). If reported/unreported information b indicates "reported" (step S603: Y), the determination unit 14 determines "reported" and generates label information indicating "reported" (step S604). If it does not (that is, if it indicates "unreported") (step S603: N), the determination unit 14 proceeds to step S605.

Having proceeded from step S602 (N) or step S603 (N), the determination unit 14 determines "unreported" and generates label information indicating "unreported" (step S605). Having proceeded from step S604 or step S605, the determination unit 14 outputs the label information as reported/unreported information (step S606).

Thus, in processing example 1 shown in FIG. 6, the determination unit 14 outputs reported/unreported information indicating "reported" when both reported/unreported information a and b indicate "reported", and otherwise outputs reported/unreported information indicating "unreported".
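The flow of FIG. 6 amounts to a logical AND of the two determination results. The following Python sketch mirrors steps S602 to S605; the function name and the use of booleans (True meaning "reported") are assumptions made for illustration.

```python
def decide_processing_example_1(a_reported, b_reported):
    """Final label under processing example 1 (True means "reported").

    a_reported: result from the learning model (information a).
    b_reported: result from agent filtering (information b).
    """
    if a_reported:          # step S602
        if b_reported:      # step S603
            return True     # step S604: both agree on "reported"
    return False            # step S605: otherwise "unreported"


final_label = decide_processing_example_1(True, False)  # "unreported"
```

Under this rule a post is discarded as previously reported only when both classifiers agree, so fewer first reports are lost at the cost of keeping more previously reported posts.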

FIG. 7 is a flowchart showing a second processing example (processing example 2) of the determination unit 14. Steps S701 and S706 shown in FIG. 7 are the same as steps S601 and S606 shown in FIG. 6, so their description is omitted here.

The determination unit 14 determines whether reported/unreported information a indicates "reported" (step S702). If reported/unreported information a indicates "reported" (step S702: Y), the determination unit 14 proceeds to step S704. If it does not (that is, if it indicates "unreported") (step S702: N), the determination unit 14 proceeds to step S703.

Having proceeded from step S702 (N), the determination unit 14 determines whether reported/unreported information b indicates "reported" (step S703). If reported/unreported information b indicates "reported" (step S703: Y), the determination unit 14 proceeds to step S704. If it does not (that is, if it indicates "unreported") (step S703: N), the determination unit 14 proceeds to step S705.

Having proceeded from step S702 (Y) or step S703 (Y), the determination unit 14 determines "reported" and generates label information indicating "reported" (step S704). Having proceeded from step S703 (N), the determination unit 14 determines "unreported" and generates label information indicating "unreported" (step S705).

Thus, in processing example 2 shown in FIG. 7, the determination unit 14 outputs reported/unreported information indicating "reported" when at least one of reported/unreported information a and b indicates "reported", and otherwise outputs reported/unreported information indicating "unreported".
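The flow of FIG. 7 amounts to a logical OR of the two determination results. The following Python sketch mirrors steps S702 to S705; the function name and the use of booleans (True meaning "reported") are assumptions made for illustration.

```python
def decide_processing_example_2(a_reported, b_reported):
    """Final label under processing example 2 (True means "reported").

    a_reported: result from the learning model (information a).
    b_reported: result from agent filtering (information b).
    """
    if a_reported:      # step S702
        return True     # step S704
    if b_reported:      # step S703
        return True     # step S704
    return False        # step S705: neither result indicates "reported"


final_label = decide_processing_example_2(False, True)  # "reported"
```

Under this rule a post is kept as unreported only when both classifiers agree that it is unreported, which gives a stricter selection of first-report candidates.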

As described above, according to the news material classification device 1-2 of Example 2, the sequence generation unit 10 generates a character vector and a word vector from the newsworthy post information extracted by the news material extraction device 100. Using the learning model, the sentence classification unit 11 performs a computation that takes the character vector and the word vector of the newsworthy post information as input data and reported/unreported information as output data, and generates reported/unreported information a. The agent classification unit 13 generates reported/unreported information b based on the agent information included in the newsworthy post information. The determination unit 14 generates reported/unreported information for the newsworthy post information based on reported/unreported information a and b.

This provides the same effects as Example 1. In particular, in addition to a determination based on the text of the newsworthy post information that exploits the respective strengths of character information and word information, a determination based on the agent information included in the newsworthy post information is performed, so reported/unreported information of even higher accuracy can be obtained.

Thus, when a user wants to obtain first-report unreported information, or previously reported information, from a large volume of social media information, using the news material classification device 1-2 makes it possible to classify previously reported information and unreported information with even higher accuracy.

[Modification of Example 2]
Next, a modification of Example 2 will be described. Instead of a learning model machine-learned with both character vectors and word vectors as input data, the modification of Example 2 uses a learning model machine-learned with only character vectors or only word vectors as input data. That is, the modification of Example 2 is an example in which a determination result is generated using a learning model machine-learned with only character vectors or only word vectors, a determination result is generated by filtering using agent information, and a final determination result is generated based on the determination result using the learning model and the determination result by filtering.

In the configuration of FIG. 4, the news material classification device 1-2 in the modification of Example 2 comprises a sequence generation unit 10, a sentence classification unit 11, and a learning model storage unit 12 that differ from those of Example 2, and an agent classification unit 13 and a determination unit 14 that are the same as those of Example 2.

The sequence generation unit 10 of the modification of Example 2 receives newsworthy post information and, based on it, either extracts the characters included in the newsworthy post information as a character sequence or extracts the words included in the newsworthy post information as a word sequence. The sequence generation unit 10 then generates a character vector consisting of the character sequence, or a word vector consisting of the word sequence, and outputs the character vector or the word vector to the sentence classification unit 11.

The sentence classification unit 11 receives the character vector or the word vector from the sequence generation unit 10, generates reported/unreported information a based on the character vector or the word vector using the learning model stored in the learning model storage unit 12, and outputs it to the determination unit 14.

The learning model storage unit 12 stores a conventional learning model machine-learned by a conventional learning device. This learning model is composed of, for example, a character NN and an output NN, or a word NN and an output NN.

As described above, according to the news material classification device 1-2 of the modification of Example 2, the sentence classification unit 11 uses the learning model to perform a computation that takes the character vector or the word vector of the newsworthy post information as input data and reported/unreported information as output data, and generates reported/unreported information a. The agent classification unit 13 generates reported/unreported information b based on the agent information included in the newsworthy post information. The determination unit 14 generates reported/unreported information for the newsworthy post information based on reported/unreported information a and b.

As a result, because a determination based on the agent information included in the newsworthy post information is performed in addition to a determination based on the text of the newsworthy post information, reported/unreported information of higher accuracy can be obtained than with a conventional determination based on the text alone.

Thus, when a user wants to obtain first-report unreported information, or previously reported information, from a large volume of social media information, using the news material classification device 1-2 of the modification of Example 2 makes it possible to classify previously reported information and unreported information with high accuracy.

[Learning device]
Next, a learning device that machine-learns the learning model stored in the learning model storage unit 12 shown in FIGS. 2 and 4 will be described.

FIG. 8 is a block diagram showing a configuration example of the learning device. The learning device 2 comprises a sequence generation unit 10, a learning unit 20, and a learning model storage unit 12.

The sequence generation unit 10 and the learning model storage unit 12 are the same as those shown in FIGS. 2 and 4, so their description is omitted here. The learning model stored in the learning model storage unit 12 is generated by the learning unit 20.

The learning unit 20 receives the character vector and the word vector of the newsworthy post information from the sequence generation unit 10, and also receives reported/unreported information indicating whether the newsworthy post information is reported or unreported.

The learning unit 20 performs machine learning using training data in which the character vector and the word vector of the newsworthy post information are the input data and the reported/unreported information is the output data. The learning unit 20 then generates the learning model shown in FIG. 9 and stores it in the learning model storage unit 12. For example, the learning unit 20 generates the learning models shown in FIGS. 10 to 15, described later. Learning by the learning unit 20 is performed by, for example, backpropagation, and the weighting coefficients between nodes and the like are updated each time machine learning is performed.

The training data for the reported/unreported information is generated by human annotators. An annotator determines whether the newsworthy post information extracted by the news material extraction device 100 shown in FIGS. 1 and 3 contains a news article. If the annotator determines that the newsworthy post information contains a quotation from a news article, the annotator labels the newsworthy post information as reported and generates reported/unreported information indicating "reported". If the annotator determines that the newsworthy post information does not contain a quotation from a news article, the annotator labels the newsworthy post information as unreported and generates reported/unreported information indicating "unreported".

In this way, the learning device 2 generates, for example, the learning models shown in FIGS. 10 to 15, described later, and the learning model is stored in the learning model storage unit 12 shown in FIG. 2 or FIG. 4.

[Learning model]
Next, the learning model that is generated by the learning device 2 shown in FIG. 8 and used in the news material classification device 1-1 of Example 1 shown in FIGS. 1 and 2, or in the news material classification device 1-2 of Example 2 shown in FIGS. 3 and 4, will be described.

The learning model generated by the learning device 2 causes a computer to function so as to take the character vector and the word vector of newsworthy post information as input data and to output, for that input data, reported/unreported information indicating whether the newsworthy post information is reported or unreported. It holds machine-learned weighting coefficients. The sentence classification unit 11 shown in FIGS. 2 and 4 functions as a computation unit that reads the learning model stored in the learning model storage unit 12, loads it into memory, and performs computation using the learning model.

Specific example 1 is a learning model using an FFNN and an LSTM, specific example 2 is a learning model using an FFNN and a CNN, and specific example 3 is a learning model using an FFNN. Specific examples 1 to 3 are concrete forms of the learning model shown in FIG. 9.

(Specific example 1)
First, the learning model of specific example 1 will be described. FIG. 10 is a diagram illustrating the schematic structure of the learning model using an FFNN and an LSTM (specific example 1), and FIG. 11 is a diagram illustrating the numbers of nodes and the input/output data in specific example 1.

This learning model comprises a character input layer FFNN 24, a word input layer FFNN 25, a character LSTM 26, a word LSTM 27, an intermediate layer FFNN 28, and an output layer FFNN 29. In this learning model, the character input layer FFNN 24 is connected to the character LSTM 26 so that the output data of the character input layer FFNN 24 is input to the character LSTM 26. The word input layer FFNN 25 is connected to the word LSTM 27 so that the output data of the word input layer FFNN 25 is input to the word LSTM 27. The character LSTM 26 and the word LSTM 27 are connected to the intermediate layer FFNN 28 so that the output data of the character LSTM 26 and the output data of the word LSTM 27 are input to the intermediate layer FFNN 28. The intermediate layer FFNN 28 is connected to the output layer FFNN 29 so that the output data of the intermediate layer FFNN 28 is input to the output layer FFNN 29.

This learning model is a neural network that takes a character one-hot vector sequence {x_char} and a word one-hot vector sequence {x_word} as input data, combines the two in the intermediate layer, and outputs binary reported/unreported information as output data. The learning model holds the weighting coefficients of the character input layer FFNN 24 through the output layer FFNN 29, machine-learned by the learning device 2 shown in FIG. 8.

The sentence classification unit 11 reads the learning model shown in FIG. 10 from the learning model storage unit 12. The sentence classification unit 11 inputs the character one-hot vector sequence {x_char} generated by the sequence generation unit 10 to the character input layer FFNN 24 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of the character one-hot vector sequence {x_char}. The sentence classification unit 11 then outputs the resulting vector sequence {h_char^in} from the character input layer FFNN 24 and propagates it to the character LSTM 26. Since the FFNN computation is well known, its description is omitted here.

FIG. 16 shows an example of a character sequence and a word sequence generated from news post information. When the news post information is 「新宿駅西口付近で火事１１月１５日」 ("Fire near the west exit of Shinjuku Station, November 15"), the sequence generation unit 10 extracts from it the character sequence 「新，宿，駅，西，口，付，近，で，火，事，１，１，月，１，５，日」. This character sequence consists of the individual characters 「新」, 「宿」, ..., 「日」.

The sequence generation unit 10 also extracts from this news post information the word sequence 「新宿駅西口，付近，で，火事，１１月１５日」. This word sequence consists of the words 「新宿駅西口」 (Shinjuku Station west exit), 「付近」 (near), ..., 「１１月１５日」 (November 15).

FIG. 17 shows an example of the character one-hot vector sequence {x_char}. When the character sequence is the one shown in FIG. 16, 「新，宿，駅，西，口，付，近，で，火，事，１，１，月，１，５，日」, the sequence generation unit 10 generates the 6300-row × 150-column character one-hot vector sequence {x_char} shown in FIG. 17. The column count of 150 corresponds to the maximum number of characters in a post. The row count of 6300 corresponds to the maximum number of character types used in Embodiments 1 and 2, and is set in advance by the designer of the learning model in consideration of, for example, the frequency of the characters appearing in the training data.

Each column of {x_char} corresponds to one character of the character sequence (「新」, 「宿」, ..., 「日」): of the 6300 row positions, the one corresponding to that character is set to "1" and all others are set to "0". When the character sequence is shorter than 150 characters, the remaining columns are filled with "0" as null data.
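The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_hot_sequence`, the toy vocabulary built from the example post itself, and the handling of out-of-vocabulary characters (left as all-zero columns) are not taken from the description, which uses a fixed set of 6300 character types.

```python
def one_hot_sequence(chars, vocab, max_len=150):
    """Return max_len columns, each a 0/1 list with one entry per vocabulary row."""
    row_of = {ch: row for row, ch in enumerate(vocab)}
    columns = [[0] * len(vocab) for _ in range(max_len)]
    for position, ch in enumerate(chars[:max_len]):
        row = row_of.get(ch)
        if row is not None:  # assumption: out-of-vocabulary characters stay all-zero
            columns[position][row] = 1
    return columns


post = "新宿駅西口付近で火事１１月１５日"
toy_vocab = sorted(set(post))  # 14 character types here; the description uses 6300
x_char = one_hot_sequence(post, toy_vocab)
```

With the example post, the first 16 columns each contain exactly one "1", and the remaining 134 columns are the all-zero null data.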

Referring to FIG. 11, in the example of FIGS. 16 and 17, when the news post information is 「新宿駅西口付近で火事１１月１５日」, the text classification unit 11 inputs the 6300-row × 150-column character one-hot vector sequence {x_char} into the character input layer FFNN 24 and performs the FFNN computation. In this case, the input layer of the character input layer FFNN 24 has 6300 nodes.

The text classification unit 11 then generates a 200-row × 150-column vector sequence {h_char^in} as the result of the FFNN computation, outputs it from the character input layer FFNN 24, and propagates it to the character LSTM 26. In this case, the output layer of the character input layer FFNN 24 has 200 nodes.
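One way to see what the input-layer step does to a single column: multiplying a weight matrix by a one-hot vector simply selects one column of that matrix. The sketch below demonstrates this with toy sizes. It assumes a purely linear layer (no bias, no activation function), which the description does not specify; the names `dense` and `W` are illustrative.

```python
import random


def dense(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector x of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


rng = random.Random(0)
vocab_size, hidden = 8, 4  # toy sizes; the description uses 6300 and 200
W = [[rng.uniform(-1, 1) for _ in range(vocab_size)] for _ in range(hidden)]

x = [0] * vocab_size
x[3] = 1  # one-hot column for the character at vocabulary row 3

h_in = dense(W, x)                 # result of the matrix-vector product
column_3 = [row[3] for row in W]   # the same values, read directly from W
```

Applied to each of the 150 columns in turn, a 6300-node input and a 200-node output turn the 6300 × 150 sequence into a 200 × 150 sequence.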

Returning to FIG. 10, the text classification unit 11 inputs the vector sequence {h_char^in} from the character input layer FFNN 24 into the character LSTM 26 and performs the LSTM computation, such as applying the machine-learned weighting coefficients to the value of each element of {h_char^in}. The text classification unit 11 then outputs the resulting vector h_char^LSTM from the character LSTM 26, concatenates h_char^LSTM with the vector h_word^LSTM described later, and propagates the result to the intermediate layer FFNN 28. Since the LSTM computation is known, its description is omitted here.

Referring to FIG. 11, in the example of FIGS. 16 and 17, the text classification unit 11 inputs the 200-row × 150-column vector sequence {h_char^in} from the character input layer FFNN 24 into the character LSTM 26 and performs the LSTM computation. In this case, the input layer of the character LSTM 26 has 200 nodes.

The text classification unit 11 then generates a 200-row × 1-column vector h_char^LSTM as the result of the LSTM computation, outputs it from the character LSTM 26, and propagates it to the intermediate layer FFNN 28. In this case, the output layer of the character LSTM 26 has 200 nodes.
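The description omits the LSTM computation as known. For orientation, the following is a minimal sketch of a standard LSTM that reads a vector sequence and returns a single vector. It assumes the commonly used gate equations and assumes that the single output vector is the hidden state after the last step; the description states only that a 200 × 150 sequence becomes a 200 × 1 vector. All names and the random toy weights are illustrative.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(gate, x, h):
    """Compute W x + U h + b for one gate."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def lstm_last_state(sequence, params, hidden):
    """Run a standard LSTM over the sequence and return the final hidden vector."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in sequence:
        i = [sigmoid(v) for v in affine(params["i"], x, h)]    # input gate
        f = [sigmoid(v) for v in affine(params["f"], x, h)]    # forget gate
        o = [sigmoid(v) for v in affine(params["o"], x, h)]    # output gate
        g = [math.tanh(v) for v in affine(params["g"], x, h)]  # candidate cell
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def random_gate(rng, hidden, inputs):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(inputs)] for _ in range(hidden)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    return W, U, [0.0] * hidden


rng = random.Random(0)
inputs, hidden, steps = 5, 3, 7  # toy sizes; the description uses 200, 200, 150
params = {name: random_gate(rng, hidden, inputs) for name in "ifog"}
sequence = [[rng.uniform(-1, 1) for _ in range(inputs)] for _ in range(steps)]
h_lstm = lstm_last_state(sequence, params, hidden)
```

In this sketch every column, including zero-padded ones, is fed to the LSTM; how padding is treated in the actual model is not stated in the description.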

Returning to FIG. 10, the text classification unit 11 inputs the word one-hot vector sequence {x_word} generated by the sequence generation unit 10 into the word input layer FFNN 25 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of {x_word}. The text classification unit 11 then outputs the resulting vector sequence {h_word^in} from the word input layer FFNN 25 and propagates it to the word LSTM 27.

FIG. 18 shows an example of the word one-hot vector sequence {x_word}. When the word sequence is the one shown in FIG. 16, 「新宿駅西口，付近，で，火事，１１月１５日」, the sequence generation unit 10 generates the 72000-row × 150-column word one-hot vector sequence {x_word} shown in FIG. 18. The column count of 150 corresponds to the maximum number of characters in a post. The row count of 72000 corresponds to the maximum number of word types used in Embodiments 1 and 2, and is set in advance by the designer of the learning model in consideration of, for example, the frequency of the words appearing in the training data.

Each column of {x_word} corresponds to one word of the word sequence (「新宿駅西口」, 「付近」, ..., 「１１月１５日」): of the 72000 row positions, the one corresponding to that word is set to "1" and all others are set to "0". When the word sequence falls short of 150 characters, the remaining columns are filled with "0" as null data.

Referring to FIG. 11, in the example of FIGS. 16 and 18, when the news post information is 「新宿駅西口付近で火事１１月１５日」, the text classification unit 11 inputs the 72000-row × 150-column word one-hot vector sequence {x_word} into the word input layer FFNN 25 and performs the FFNN computation. In this case, the input layer of the word input layer FFNN 25 has 72000 nodes.

The text classification unit 11 then generates a 200-row × 150-column vector sequence {h_word^in} as the result of the FFNN computation, outputs it from the word input layer FFNN 25, and propagates it to the word LSTM 27. In this case, the output layer of the word input layer FFNN 25 has 200 nodes.

Returning to FIG. 10, the text classification unit 11 inputs the vector sequence {h_word^in} from the word input layer FFNN 25 into the word LSTM 27 and performs the LSTM computation, such as applying the machine-learned weighting coefficients to the value of each element of {h_word^in}. The text classification unit 11 then outputs the resulting vector h_word^LSTM from the word LSTM 27, concatenates the vector h_char^LSTM with the vector h_word^LSTM, and propagates the result to the intermediate layer FFNN 28.

Referring to FIG. 11, in the example of FIGS. 16 and 18, the text classification unit 11 inputs the 200-row × 150-column vector sequence {h_word^in} from the word input layer FFNN 25 into the word LSTM 27 and performs the LSTM computation. In this case, the input layer of the word LSTM 27 has 200 nodes.

The text classification unit 11 then generates a 200-row × 1-column vector h_word^LSTM as the result of the LSTM computation, outputs it from the word LSTM 27, and propagates it to the intermediate layer FFNN 28. In this case, the output layer of the word LSTM 27 has 200 nodes.

Returning to FIG. 10, the text classification unit 11 inputs the vector h^LSTM, formed by concatenating the vector h_char^LSTM from the character LSTM 26 and the vector h_word^LSTM from the word LSTM 27, into the intermediate layer FFNN 28 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of h_char^LSTM and h_word^LSTM. The text classification unit 11 then outputs the resulting vector h^int from the intermediate layer FFNN 28 and propagates it to the output layer FFNN 29.

Referring to FIG. 11, in the example of FIGS. 16 to 18, the text classification unit 11 inputs the vector h^LSTM, formed by concatenating the 200-row × 1-column vector h_char^LSTM from the character LSTM 26 and the 200-row × 1-column vector h_word^LSTM from the word LSTM 27, into the intermediate layer FFNN 28 and performs the FFNN computation. In this case, the input layer of the intermediate layer FFNN 28 has 400 nodes.

The text classification unit 11 then generates a 200-row × 1-column vector h^int as the result of the FFNN computation, outputs it from the intermediate layer FFNN 28, and propagates it to the output layer FFNN 29. In this case, the output layer of the intermediate layer FFNN 28 has 200 nodes.
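The node counts above (200 + 200 = 400 inputs, 200 outputs) can be traced with a small sketch of the concatenation followed by one fully connected layer. The tanh activation, the zero bias, the random weights, and the name `ffnn_layer` are assumptions made for illustration; the description does not specify them.

```python
import math
import random


def ffnn_layer(W, b, x):
    """One fully connected layer; the tanh activation is an assumption."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]


rng = random.Random(1)
h_char_lstm = [rng.uniform(-1, 1) for _ in range(200)]  # stand-in for h_char^LSTM
h_word_lstm = [rng.uniform(-1, 1) for _ in range(200)]  # stand-in for h_word^LSTM

h_lstm = h_char_lstm + h_word_lstm  # concatenation: 400 values

W = [[rng.uniform(-0.05, 0.05) for _ in range(400)] for _ in range(200)]
b = [0.0] * 200
h_int = ffnn_layer(W, b, h_lstm)    # 200 values passed on to the output layer
```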

Returning to FIG. 10, the text classification unit 11 inputs the vector h^int from the intermediate layer FFNN 28 into the output layer FFNN 29 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of h^int. The text classification unit 11 then computes probability values from the resulting vector h^out using the softmax function, computes binary information from the probability values using the argmax function, and outputs this as the already-appeared/not-yet-appeared information.

Referring to FIG. 11, in the example of FIGS. 16 to 18, the text classification unit 11 inputs the 200-row × 1-column vector h^int from the intermediate layer FFNN 28 into the output layer FFNN 29 and performs the FFNN computation. In this case, the input layer of the output layer FFNN 29 has 200 nodes.

The text classification unit 11 then generates a 2-row × 1-column vector h^out as the result of the FFNN computation, applies the function processing to h^out, and outputs the already-appeared/not-yet-appeared information. In this case, the output layer of the output layer FFNN 29 has 2 nodes.
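The final function processing (softmax followed by argmax on the two-element vector h^out) can be illustrated as follows. The example values of `h_out` are made up, and which of the two indices stands for "already appeared" is not stated in the description, so the resulting label is left uninterpreted here.

```python
import math


def softmax(values):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda k: values[k])


h_out = [0.3, 1.7]       # hypothetical 2-row output of the output layer FFNN
probs = softmax(h_out)   # two probability values
label = argmax(probs)    # binary information: index of the more probable class
```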

In the learning model of Specific Example 1 using an FFNN and an LSTM, shown in FIGS. 10 and 11, an attention mechanism may be added to each of the character LSTM 26 and the word LSTM 27. Adding an attention mechanism to the character LSTM 26 gives a high weighting coefficient to characters that influence the character sequence of the news post information as a whole, and a low weighting coefficient to characters that have little influence. Likewise, adding an attention mechanism to the word LSTM 27 gives a high weighting coefficient to words that influence the word sequence of the news post information as a whole, and a low weighting coefficient to words that have little influence.
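As a rough illustration of the idea of weighting sequence elements by influence, the sketch below scores each per-step state against a query vector, normalizes the scores with softmax, and forms a weighted sum. This is a generic dot-product attention, chosen here for brevity; the specific form of the attention mechanism is not given in the description, and the `query` vector and the toy states are hypothetical.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Weight each state by softmax(dot(query, state)) and sum the states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return weights, context


states = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]  # toy per-step hidden states
query = [1.0, 0.0]                             # hypothetical learned query vector
weights, context = attention_pool(states, query)
```

Here the third state has the largest score, so it receives the highest weight and dominates the pooled vector.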

The attention mechanism is known; for details, see the following reference.
Miyazaki et al., "Automatic extraction method of tweet useful for news production" (“ニュース制作に役立つｔｗｅｅｔの自動抽出手法”), The Association for Natural Language Processing (言語処理学会), Proceedings of the 23rd Annual Meeting (March 2017), pp. 418-421

Furthermore, in the learning model of Specific Example 1 using an FFNN and an LSTM, shown in FIGS. 10 and 11, another RNN (Recurrent Neural Network), such as a GRU (Gated Recurrent Unit), may be used in place of the LSTM. Specifically, a character RNN and a word RNN are used in place of the character LSTM 26 and the word LSTM 27, respectively. That is, the present invention is applicable to learning models using an FFNN and an RNN.

(Specific Example 2)
Next, the learning model of Specific Example 2 is described. FIG. 12 illustrates the schematic structure of a learning model using an FFNN and a CNN (Specific Example 2), and FIG. 13 illustrates the number of nodes and the input/output data in Specific Example 2.

This learning model comprises a character input layer FFNN 30, a word input layer FFNN 31, a character CNN 32, a word CNN 33, pooling layers 40 and 41 (character pooling layer 40 and word pooling layer 41), an intermediate layer FFNN 34, and an output layer FFNN 35. The components are coupled as follows: the character input layer FFNN 30 is coupled to the character CNN 32 so that the output data of the character input layer FFNN 30 is input to the character CNN 32; the character CNN 32 is coupled to the pooling layer 40 so that the output data of the character CNN 32 is input to the pooling layer 40; the word input layer FFNN 31 is coupled to the word CNN 33 so that the output data of the word input layer FFNN 31 is input to the word CNN 33; the word CNN 33 is coupled to the pooling layer 41 so that the output data of the word CNN 33 is input to the pooling layer 41; the pooling layers 40 and 41 are coupled to the intermediate layer FFNN 34 so that the output data of both pooling layers is input to the intermediate layer FFNN 34; and the intermediate layer FFNN 34 is coupled to the output layer FFNN 35 so that the output data of the intermediate layer FFNN 34 is input to the output layer FFNN 35. The computation in the character CNN 32 and the pooling layer 40 is performed repeatedly, as is the computation in the word CNN 33 and the pooling layer 41.

This learning model is a neural network that takes the character one-hot vector sequence {x_char} and the word one-hot vector sequence {x_word} as input data, combines the two in the intermediate layer, and outputs binary already-appeared/not-yet-appeared information. The learning model also holds the weighting coefficients of each layer, from the character input layer FFNN 30 through the output layer FFNN 35 and including the pooling layers 40 and 41, machine-learned by the learning device 2 shown in FIG. 8.

The text classification unit 11 reads the learning model shown in FIG. 12 from the learning model storage unit 12. The text classification unit 11 inputs the character one-hot vector sequence {x_char} generated by the sequence generation unit 10 into the character input layer FFNN 30 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of {x_char}. The text classification unit 11 then outputs the resulting vector sequence {h_char^in} from the character input layer FFNN 30 and propagates it to the character CNN 32.

Referring to FIG. 13, in the example of FIGS. 16 and 17, the text classification unit 11 inputs the 6300-row × 150-column character one-hot vector sequence {x_char} into the character input layer FFNN 30 and performs the FFNN computation, as in Specific Example 1 of FIG. 11. In this case, the input layer of the character input layer FFNN 30 has 6300 nodes.

As in Specific Example 1 of FIG. 11, the text classification unit 11 then generates a 200-row × 150-column vector sequence {h_char^in} as the result of the FFNN computation, outputs it from the character input layer FFNN 30, and propagates it to the character CNN 32. In this case, the output layer of the character input layer FFNN 30 has 200 nodes.

Returning to FIG. 12, the text classification unit 11 inputs the vector sequence {h_char^in} from the character input layer FFNN 30 into the character CNN 32 and performs the CNN computation, such as applying the machine-learned weighting coefficients to the value of each element of {h_char^in}. The text classification unit 11 then outputs the resulting vector from the character CNN 32 and propagates it to the pooling layer 40. The text classification unit 11 inputs the resulting vector from the character CNN 32 into the pooling layer 40 and performs the pooling computation, such as applying the machine-learned weighting coefficients to the value of each element of the vector. Here, the text classification unit 11 repeats the computations of the character CNN 32 and the pooling layer 40 a predetermined number of times, outputs the resulting vector h_char^CNN from the pooling layer 40, concatenates h_char^CNN with the vector h_word^CNN described later, and propagates the result to the intermediate layer FFNN 34.

Here, the CNN computation refers to a convolution operation, which extracts local features of the input vector sequence {h_char^in}. The pooling process aggregates the results of the CNN computation, reducing the size of the data while preserving the local features of {h_char^in}. Since the details of the CNN and pooling computations are known, their description is omitted here.

Referring to FIG. 13, in the example of FIGS. 16 and 17, the text classification unit 11 inputs the 200-row × 150-column vector sequence {h_char^in} from the character input layer FFNN 30 into the character CNN 32, performs the CNN computation, inputs the resulting vector into the pooling layer 40, performs the pooling computation, and repeats the CNN and pooling computations. In this case, the input layer of the character CNN 32 has 200 nodes.

The text classification unit 11 then generates a 200-row × 1-column vector h_char^CNN as the result of the CNN and pooling computations, outputs it from the pooling layer 40, and propagates it to the intermediate layer FFNN 34. In this case, the output layer of the pooling layer 40 has 200 nodes.
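To make the convolution and pooling steps concrete, the sketch below applies a one-dimensional convolution over the sequence and then keeps the maximum response of each filter. It is a simplification under stated assumptions: a single convolution and a single max-over-time pooling step (the description repeats the two a predetermined number of times), a ReLU nonlinearity, and made-up kernel width, filter count, and weights.

```python
import random


def conv1d_relu(sequence, kernels):
    """Slide each kernel over the sequence; returns one feature vector per window."""
    width = len(kernels[0])
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        outputs.append([
            max(0.0, sum(k_row[d] * x_row[d]
                         for k_row, x_row in zip(kernel, window)
                         for d in range(len(x_row))))
            for kernel in kernels
        ])
    return outputs


def max_over_time(feature_vectors):
    """Keep, for each filter, its largest response across all window positions."""
    return [max(column) for column in zip(*feature_vectors)]


rng = random.Random(2)
steps, dim, width, filters = 6, 2, 3, 4  # toy sizes; the description has 150 steps of 200 values
h_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
           for _ in range(filters)]

features = conv1d_relu(h_in, kernels)  # local features, one vector per window
h_cnn = max_over_time(features)        # fixed-size summary, one value per filter
```

With 200 filters, the same two steps would reduce a 200 × 150 sequence to a 200 × 1 vector, matching the sizes given above.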

Returning to FIG. 12, the text classification unit 11 inputs the word one-hot vector sequence {x_word} generated by the sequence generation unit 10 into the word input layer FFNN 31 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of {x_word}. The text classification unit 11 then outputs the resulting vector sequence {h_word^in} from the word input layer FFNN 31 and propagates it to the word CNN 33.

Referring to FIG. 13, in the example of FIGS. 16 and 18, the text classification unit 11 inputs the 72000-row × 150-column word one-hot vector sequence {x_word} into the word input layer FFNN 31 and performs the FFNN computation, as in Specific Example 1 of FIG. 11. In this case, the input layer of the word input layer FFNN 31 has 72000 nodes.

The text classification unit 11 then generates a 200-row × 150-column vector sequence {h_word^in} as the result of the FFNN computation, outputs it from the word input layer FFNN 31, and propagates it to the word CNN 33. In this case, the output layer of the word input layer FFNN 31 has 200 nodes.

Returning to FIG. 12, the text classification unit 11 inputs the vector sequence {h_word^in} from the word input layer FFNN 31 into the word CNN 33 and performs the CNN computation, such as applying the machine-learned weighting coefficients to the value of each element of {h_word^in}. The text classification unit 11 then outputs the resulting vector from the word CNN 33 and propagates it to the pooling layer 41. The text classification unit 11 inputs the resulting vector from the word CNN 33 into the pooling layer 41 and performs the pooling computation, such as applying the machine-learned weighting coefficients to the value of each element of the vector. Here, the text classification unit 11 repeats the computations of the word CNN 33 and the pooling layer 41 a predetermined number of times, outputs the resulting vector h_word^CNN from the pooling layer 41, concatenates the vector h_char^CNN with the vector h_word^CNN, and propagates the result to the intermediate layer FFNN 34.

Referring to FIG. 13, in the example of FIGS. 16 and 18, the text classification unit 11 inputs the 200-row × 150-column vector sequence {h_word^in} from the word input layer FFNN 31 into the word CNN 33, performs the CNN computation, inputs the resulting vector into the pooling layer 41, performs the pooling computation, and repeats the CNN and pooling computations. In this case, the input layer of the word CNN 33 has 200 nodes.

The text classification unit 11 then generates a 200-row × 1-column vector h_word^CNN as the result of the CNN and pooling computations, outputs it from the pooling layer 41, and propagates it to the intermediate layer FFNN 34. In this case, the output layer of the pooling layer 41 has 200 nodes.

Returning to FIG. 12, the text classification unit 11 inputs the vector h^CNN, formed by concatenating the vector h_char^CNN from the pooling layer 40 and the vector h_word^CNN from the pooling layer 41, into the intermediate layer FFNN 34. The text classification unit 11 then performs the same processing as for the intermediate layer FFNN 28 and the output layer FFNN 29 shown in FIG. 10 and outputs the already-appeared/not-yet-appeared information.

The intermediate layer FFNN 34 and the output layer FFNN 35 are the same as the intermediate layer FFNN 28 and the output layer FFNN 29 shown in FIG. 10, respectively, including the number of nodes, the vector sizes, and the computation, so their description is omitted.

(Specific Example 3)
Next, the learning model of Specific Example 3 is described. FIG. 14 illustrates the schematic structure of a learning model using an FFNN (Specific Example 3), and FIG. 15 illustrates the number of nodes and the input/output data in Specific Example 3.

This learning model comprises a character input layer FFNN 36, a word input layer FFNN 37, an intermediate layer FFNN 38, and an output layer FFNN 39. The character input layer FFNN 36 and the word input layer FFNN 37 are coupled to the intermediate layer FFNN 38 so that the output data of both input layers is input to the intermediate layer FFNN 38, and the intermediate layer FFNN 38 is coupled to the output layer FFNN 39 so that the output data of the intermediate layer FFNN 38 is input to the output layer FFNN 39.

This learning model is a neural network that takes the character BOW vector x_char and the word BOW vector x_word as input data, combines the two in the intermediate layer, and outputs binary already-appeared/not-yet-appeared information. The learning model also holds the weighting coefficients of the character input layer FFNN 36 through the output layer FFNN 39, machine-learned by the learning device 2 shown in FIG. 8.

The text classification unit 11 reads the learning model shown in FIG. 14 from the learning model storage unit 12. The text classification unit 11 inputs the character BOW vector x_char generated by the sequence generation unit 10 into the character input layer FFNN 36 and performs the FFNN computation, such as applying the machine-learned weighting coefficients to the value of each element of x_char. The text classification unit 11 then outputs the resulting vector h_char^in from the character input layer FFNN 36 and propagates it to the intermediate layer FFNN 38.

FIG. 19 is a diagram showing an example of the character BOW vector x_char. When the character sequence is the one shown in FIG. 16, 「新，宿，駅，西，口，付，近，で，火，事，１，１，月，１，５，日」 (the individual characters of "Fire near Shinjuku Station West Exit, November 15"), the sequence generation unit 10 generates the 6300-row × 1-column character BOW vector x_char shown in FIG. 19. The row count of 6300 corresponds to the maximum number of characters that appear across all character sequences.

Of the 6300 row positions, "1" is set at the row positions corresponding to the characters 「新」, 「宿」, ..., 「日」 that make up the character sequence, and "0" is set at all other row positions. In other words, each character present in the sequence marks its own row position with "1".
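The construction of the character BOW vector described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_bow_vector`, the 16-entry toy vocabulary (the document's vector has 6300 rows), and the choice to ignore out-of-vocabulary characters are all assumptions made for the example. ASCII digits stand in for the full-width digits of the original sequence.

```python
def build_bow_vector(tokens, vocabulary):
    """Return a binary bag-of-words column vector (as a flat list) over `vocabulary`."""
    index = {token: row for row, token in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokens:
        row = index.get(token)
        if row is not None:   # assumption: tokens outside the vocabulary are ignored
            vector[row] = 1   # presence flag, so a repeated token still yields 1
    return vector


# Hypothetical toy vocabulary; the last two entries do not occur in the post.
char_vocabulary = ["新", "宿", "駅", "西", "口", "付", "近", "で",
                   "火", "事", "月", "日", "1", "5", "雨", "雪"]
char_sequence = list("新宿駅西口付近で火事11月15日")
x_char = build_bow_vector(char_sequence, char_vocabulary)
```

The same helper applies unchanged to a word sequence and a word vocabulary, which is how the word BOW vector x_word would be obtained under the same assumptions.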

Referring to FIG. 15, in the example of FIGS. 16 and 19, the sentence classification unit 11 inputs the 6300-row × 1-column character BOW vector x_char to the character input layer FFNN 36 and performs the FFNN operation. In this case, the input layer of the character input layer FFNN 36 has 6300 nodes.

The sentence classification unit 11 then generates a 200-row × 1-column vector h_char^in as the result of the FFNN operation, outputs it from the character input layer FFNN 36, concatenates h_char^in with the vector h_word^in described below, and propagates the result to the intermediate layer FFNN 38. In this case, the output layer of the character input layer FFNN 36 has 200 nodes.

Returning to FIG. 14, the sentence classification unit 11 inputs the word BOW vector x_word generated by the sequence generation unit 10 to the word input layer FFNN 37 and performs the FFNN operation, such as applying the machine-learned weight coefficients to the value of each element of x_word. The sentence classification unit 11 then outputs the resulting vector h_word^in from the word input layer FFNN 37 and propagates it to the intermediate layer FFNN 38.

FIG. 20 is a diagram showing an example of the word BOW vector x_word. When the word sequence is the one shown in FIG. 16, 「新宿駅西口，付近，で，火事，１１月１５日」 ("Shinjuku Station West Exit", "near", "at", "fire", "November 15"), the sequence generation unit 10 generates the 72000-row × 1-column word BOW vector x_word shown in FIG. 20. The row count of 72000 corresponds to the maximum number of words that appear across all word sequences.

Of the 72000 row positions, "1" is set at the row positions corresponding to the words 「新宿駅西口」, 「付近」, ..., 「１１月１５日」 that make up the word sequence, and "0" is set at all other row positions. In other words, each word present in the sequence marks its own row position with "1".

Referring to FIG. 15, in the example of FIGS. 16 and 20, when the news posting information is 「新宿駅西口付近で火事１１月１５日」 ("Fire near Shinjuku Station West Exit, November 15"), the sentence classification unit 11 inputs the 72000-row × 1-column word BOW vector x_word to the word input layer FFNN 37 and performs the FFNN operation. In this case, the input layer of the word input layer FFNN 37 has 72000 nodes.

The sentence classification unit 11 then generates a 200-row × 1-column vector h_word^in as the result of the FFNN operation, outputs it from the word input layer FFNN 37, and propagates it to the intermediate layer FFNN 38. In this case, the output layer of the word input layer FFNN 37 has 200 nodes.

Returning to FIG. 14, the sentence classification unit 11 inputs to the intermediate layer FFNN 38 the vector h^in obtained by concatenating the vector h_char^in from the character input layer FFNN 36 and the vector h_word^in from the word input layer FFNN 37. The sentence classification unit 11 then performs the same processing as for the intermediate layer FFNN 28 and the output layer FFNN 29 shown in FIG. 10 and outputs the already-reported/not-yet-reported information.

The intermediate layer FFNN 38 and the output layer FFNN 39 are the same as the intermediate layer FFNN 28 and the output layer FFNN 29 shown in FIG. 10, respectively, with the same number of nodes, vector sizes, and operations, so their description is omitted.
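The data flow through the four FFNNs described above (two input branches, concatenation, intermediate layer, output layer) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patent's implementation: the tanh activations, the softmax at the output, the random stand-in weights, the intermediate layer size, and the names `dense`, `random_layer`, and `classify` are all hypothetical. The toy sizes replace the document's 6300 → 200 and 72000 → 200 branches, whose outputs concatenate to 400 elements.

```python
import math
import random


def dense(weights, bias, vector, activation):
    """Fully connected layer: activation(W v + b), with W given as a list of rows."""
    return [activation(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def random_layer(n_out, n_in, rng):
    """Stand-in for machine-learned weight coefficients (random, illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x_char, x_word, params):
    """Forward pass: two input branches, concatenation, intermediate and output layers."""
    w, b = params["char_in"]
    h_char = dense(w, b, x_char, math.tanh)      # character input layer FFNN 36
    w, b = params["word_in"]
    h_word = dense(w, b, x_word, math.tanh)      # word input layer FFNN 37
    h_in = h_char + h_word                       # list concatenation of both branches
    w, b = params["middle"]
    h_mid = dense(w, b, h_in, math.tanh)         # intermediate layer FFNN 38
    w, b = params["out"]
    scores = dense(w, b, h_mid, lambda s: s)     # output layer FFNN 39 (raw scores)
    return softmax(scores)                       # two-class probabilities


# Toy sizes, all assumed for the example.
CHAR_VOCAB, WORD_VOCAB, BRANCH, MIDDLE = 16, 8, 4, 3
rng = random.Random(0)
params = {
    "char_in": random_layer(BRANCH, CHAR_VOCAB, rng),
    "word_in": random_layer(BRANCH, WORD_VOCAB, rng),
    "middle": random_layer(MIDDLE, 2 * BRANCH, rng),
    "out": random_layer(2, MIDDLE, rng),
}
x_char = [1, 1, 0, 1] * 4            # binary character BOW vector, 16 rows
x_word = [1, 0, 1, 0, 0, 1, 0, 0]    # binary word BOW vector, 8 rows
probabilities = classify(x_char, x_word, params)
```

With trained weights in place of `random_layer`, the larger of the two probabilities would decide between the already-reported and not-yet-reported classes; which output index maps to which class is a convention the sketch leaves open.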

[Experimental Results]
Next, experimental results obtained by computer simulation are described. FIG. 21 is a diagram explaining the experimental results. (1) shows the results for the prior art using keyword filtering, (2) shows the results for the prior art using an NN with characters only as input data, and (3) shows the results for the prior art using an NN with words only as input data. (4) shows the results for Example 1, which uses an NN with both characters and words as input data.

For (2), (3), and (4), results are shown for the LSTM, CNN, and FFNN learning models respectively. In (4), LSTM is the learning model of Specific Example 1 shown in FIGS. 10 and 11, CNN is the learning model of Specific Example 2 shown in FIGS. 12 and 13, and FFNN is the learning model of Specific Example 3 shown in FIGS. 14 and 15.

Precision is an accuracy measure indicating how well the determination results, that is, the already-reported/not-yet-reported information, match the actual correct data. Recall is a coverage measure indicating how much of the actual correct data the determination results cover. The F value is the harmonic mean of precision and recall.
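The three measures defined above can be computed as in the following sketch. The function name `precision_recall_f`, the label strings, and the five-item toy data are assumptions for illustration. Which class the document treats as the positive class is not stated in this passage, so the example picks the not-yet-reported class arbitrarily.

```python
def precision_recall_f(predicted, actual, positive):
    """Return (precision, recall, F value) for the class labelled `positive`."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == positive and a == positive)
    false_pos = sum(1 for p, a in pairs if p == positive and a != positive)
    false_neg = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F value: harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels: "new" = not-yet-reported, "old" = already-reported (hypothetical names).
predicted = ["new", "new", "old", "old", "new"]
actual = ["new", "old", "old", "new", "new"]
precision, recall, f_value = precision_recall_f(predicted, actual, positive="new")
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and the F value all equal 2/3.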

As training data for machine learning of each learning model, 44,670 items of news posting information from a predetermined period were used. Of these, 9,300 were already-reported information and 35,370 were not-yet-reported information. As test data for obtaining the experimental results shown in FIG. 21, 10,000 items randomly sampled from the news posting information of a predetermined period were used. Of these, 2,028 were already-reported information and 7,972 were not-yet-reported information.

According to the experimental results in FIG. 21, the F value of (1) is 50.0%. The F values of LSTM, CNN, and FFNN are 87.3%, 86.1%, and 85.0% in (2), and 86.2%, 85.4%, and 84.0% in (3), respectively. The F values of LSTM, CNN, and FFNN in (4) are 88.1%, 88.4%, and 85.9%, respectively. This shows that the learning models of (2) to (4) give better results than the prior art of (1).

Furthermore, for each of the LSTM, CNN, and FFNN learning models, the learning model of Example 1 in (4) gives better results than either the character-only learning model of (2) or the word-only learning model of (3).
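The comparison above can be checked with simple arithmetic on the F values reported for FIG. 21. The sketch below only restates those reported numbers; the dictionary layout and the names `f_values` and `gains` are assumptions for illustration.

```python
# F values (%) as reported for FIG. 21: (2) characters only, (3) words only,
# (4) characters and words combined.
f_values = {
    "LSTM": {"char_only": 87.3, "word_only": 86.2, "char_and_word": 88.1},
    "CNN": {"char_only": 86.1, "word_only": 85.4, "char_and_word": 88.4},
    "FFNN": {"char_only": 85.0, "word_only": 84.0, "char_and_word": 85.9},
}

# Gain of the combined model over the better single-input model, in points.
gains = {
    model: round(scores["char_and_word"]
                 - max(scores["char_only"], scores["word_only"]), 1)
    for model, scores in f_values.items()
}
```

Under these numbers the combined model gains 0.8 points for LSTM, 2.3 points for CNN, and 0.9 points for FFNN over the better of the two single-input models.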

The present invention has been described above with reference to Examples 1 and 2 and Specific Examples 1, 2, and 3. However, the present invention is not limited to these examples and can be modified in various ways without departing from its technical idea.

For example, in Examples 1 and 2 and Specific Examples 1, 2, and 3, the news material classification devices 1-1 and 1-2 use the learning model to classify news posting information into not-yet-reported information, which is a first report, and already-reported information, which is everything else. The present invention is not limited to this and is also applicable to other classifications according to the user's request. For example, the news material classification devices 1-1 and 1-2 may classify news posting information by whether the poster is male or female, or by whether the post contains dialect. In that case, the learning device 2 performs machine learning of the learning model using classification information, such as information indicating male or female or information indicating whether dialect is included, instead of the already-reported/not-yet-reported information.

An ordinary computer can be used as the hardware configuration of the news material classification device 1-1 according to Example 1, the news material classification device 1-2 according to Example 2, and the learning device 2. The news material classification devices 1-1 and 1-2 and the learning device 2 are each configured by a computer having a CPU (or GPU), a volatile storage medium such as RAM, a non-volatile storage medium such as ROM, an interface, and the like.

The functions of the sequence generation unit 10, the sentence classification unit 11, and the learning model storage unit 12 of the news material classification device 1-1 are each realized by having the CPU execute a program describing those functions. Likewise, the functions of the sequence generation unit 10, the sentence classification unit 11, the learning model storage unit 12, the agent classification unit 13, and the determination unit 14 of the news material classification device 1-2 are each realized by having the CPU execute a program describing those functions. The functions of the sequence generation unit 10, the learning unit 20, and the learning model storage unit 12 of the learning device 2 are realized in the same way.

These programs are stored in the storage medium described above and are read out and executed by the CPU. The programs can also be distributed on a storage medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and can be transmitted and received over a network.

The learning models shown in FIGS. 9 to 15 are trained models obtained by machine learning. They cause a computer to classify news posting information extracted from a large amount of social media information as already-reported or not-yet-reported and to output the resulting already-reported/not-yet-reported information. Their configurations are as shown in the respective figures.

These learning models are used as program modules that form part of artificial intelligence software and are used in a computer having a CPU and a memory. Specifically, in the case of the learning model shown in FIG. 9, for example, the CPU of the computer operates, in accordance with instructions from the learning model stored in the memory, to input the character vector to the character NN 21, perform an NN operation based on the machine-learned weight coefficients on the character vector input to the input layer of the character NN 21, and output the resulting vector from the output layer to the output NN 23. The CPU likewise operates to input the word vector to the word NN 22, perform an NN operation based on the machine-learned weight coefficients on the word vector input to the input layer of the word NN 22, and output the resulting vector from the output layer to the output NN 23. The CPU then operates to perform an NN operation based on the machine-learned weight coefficients on the vector input to the input layer of the output NN 23, which is the concatenation of the vector output from the output layer of the character NN 21 and the vector output from the output layer of the word NN 22, and to output the resulting already-reported/not-yet-reported information from the output layer. The learning models shown in FIGS. 10 to 15 operate in the same way.

DESCRIPTION OF SYMBOLS
1 News material classification device
2 Learning device
10 Sequence generation unit
11 Sentence classification unit
12 Learning model storage unit
13 Agent classification unit
14 Determination unit
20 Learning unit
21 Character NN (neural network)
22 Word NN
23 Output NN
24, 30, 36 Character input layer FFNN (feed-forward neural network)
25, 31, 37 Word input layer FFNN
26 Character LSTM (long short-term memory unit)
27 Word LSTM
28, 34, 38 Intermediate layer FFNN
29, 35, 39 Output layer FFNN
32 Character CNN (convolutional neural network)
33 Word CNN
40, 41 Pooling
100 News material extraction device
a, b Already-reported/not-yet-reported information
{x_char} Character one-hot vector sequence
{x_word} Word one-hot vector sequence
x_char Character BOW vector
x_word Word BOW vector

Claims

In a news material classification device that inputs post information that can be news material among a large number of social media information as news post information, and classifies the news post information according to a user's request,
A sequence that inputs the news posting information, extracts characters and words included in the news posting information, generates a character vector composed of the character sequence, and generates a word vector composed of the word sequence A generator,
A learning model storage unit for storing a machine-learned learning model;
Reading the learning model from the learning model storage unit, and using the learning model, based on the character vector and the word vector generated by the sequence generation unit, to the user's request for the news posting information A classification unit that generates and outputs corresponding classification information,
The learning model is
Character NN (neural network) having the character vector of the news posting information as input data and the vector of the operation result as output data,
The word NN having the word vector of the news posting information as input data and the vector of the operation result as output data, and
A vector obtained by combining the vector of the calculation result of the character NN and the vector of the calculation result of the word NN as input data, and an output NN using the classification information as the calculation result as output data;
The classification unit includes:
Using the character NN, NN is calculated based on the character vector of the news posting information,
Using the word NN, NN is calculated based on the word vector of the news posting information,
A news material classification device, wherein the output NN is used to perform an NN operation based on a vector obtained by combining the operation result vector of the character NN and the operation result vector of the word NN, and the operation result is output as the classification information.

The news material classification device according to claim 1,
A news material classification device, wherein the classification information is already-reported/not-yet-reported information indicating whether the news posting information has already been reported or has not yet been reported.

The news material classification device according to claim 2,
Furthermore, a second classification unit and a determination unit are provided,
The classification unit
Outputs the already-reported/not-yet-reported information as a first classification result,
The second classification unit
Inputs the news posting information, extracts agent information that is added to the news posting information and identifies the posting source device, generates already-reported/not-yet-reported information based on the agent information, and outputs it as a second classification result,
The determination unit
Generates and outputs new already-reported/not-yet-reported information based on the first classification result output by the classification unit and the second classification result output by the second classification unit.

In the news material classification device according to claim 2 or 3,
The sequence generation unit
Input the news posting information, extract characters and words included in the news posting information, generate a character one-hot vector sequence by arranging one-hot column vectors corresponding to the characters, and generate a word one-hot vector sequence by arranging one-hot column vectors corresponding to the words,
The learning model is
FFNN for a character input layer having the character one-hot vector sequence generated by the sequence generation unit as input data and a vector of a calculation result of FFNN (feed forward neural network) as output data;
A character RNN having a vector of calculation results of the character input layer FFNN as input data and a vector of calculation results of RNN (recurrent neural network) as output data,
FFNN for a word input layer having the word one-hot vector sequence generated by the sequence generation unit as input data and a vector of the operation result of FFNN as output data,
A word RNN having a vector of the calculation result of the word input layer FFNN as input data and a vector of the calculation result of the RNN as output data;
FFNN for intermediate layer having a vector obtained by combining the vector of the calculation result of the character RNN and the vector of the calculation result of the word RNN as input data and the vector of the calculation result of FFNN as output data, and
An output layer FFNN having the vector of the calculation result of the intermediate layer FFNN as input data and the already-reported/not-yet-reported information, which is the calculation result of FFNN, as output data,
The classification unit includes:
Based on the character one-hot vector sequence generated by the sequence generation unit using the character input layer FFNN, FFNN is calculated,
Using the character RNN, the RNN is calculated based on the vector of the calculation result of the character input layer FFNN,
Based on the word one-hot vector sequence generated by the sequence generation unit using the word input layer FFNN, FFNN is calculated,
Using the word RNN, RNN is calculated based on a vector of calculation results of the word input layer FFNN,
Using the intermediate layer FFNN, based on a vector obtained by combining the vector of the character RNN and the vector of the word RNN, the FFNN is calculated.
A news material classification device, wherein the output layer FFNN is used to perform an FFNN operation based on a vector of operation results of the intermediate layer FFNN, and the operation result is output as the already-reported/not-yet-reported information.

In the news material classification device according to claim 2 or 3,
The sequence generation unit
Input the news posting information, extract characters and words included in the news posting information, generate a character one-hot vector sequence by arranging one-hot column vectors corresponding to the characters, and generate a word one-hot vector sequence by arranging one-hot column vectors corresponding to the words,
The learning model is
FFNN for a character input layer having the character one-hot vector sequence generated by the sequence generation unit as input data and a vector of the operation result of FFNN as output data,
A character CNN having a vector of calculation results of the character input layer FFNN as input data and a vector of calculation results of CNN (convolutional neural network) as output data;
A character pooling layer having the vector of the calculation result of the character CNN as input data and the vector of the calculation result of pooling as output data;
FFNN for a word input layer having the word one-hot vector sequence generated by the sequence generation unit as input data and a vector of the operation result of FFNN as output data,
A word CNN having a vector of calculation results of the word input layer FFNN as input data and a vector of calculation results of CNN as output data,
A word pooling layer having a vector of calculation results of the word CNN as input data and a vector of calculation results of pooling as output data;
An intermediate layer FFNN having a vector obtained by combining the vector of the calculation result of the character pooling layer and the vector of the calculation result of the word pooling layer as input data, and a vector of the calculation result of FFNN as output data, and
An output layer FFNN having the vector of the calculation result of the intermediate layer FFNN as input data and the already-reported/not-yet-reported information, which is the calculation result of FFNN, as output data,
The classification unit includes:
Based on the character one-hot vector sequence generated by the sequence generation unit using the character input layer FFNN, FFNN is calculated,
Using the character CNN, the CNN is calculated based on the vector of the calculation result of the character input layer FFNN,
Using the character pooling layer, a pooling operation is performed based on a vector of the operation results of the character CNN,
Based on the word one-hot vector sequence generated by the sequence generation unit using the word input layer FFNN, FFNN is calculated,
Using the word CNN, CNN is calculated based on the calculation result vector of the word input layer FFNN,
Using the word pooling layer, a pooling calculation is performed based on a vector of calculation results of the word CNN,
Based on a vector obtained by combining the vector of the result of the character pooling layer and the vector of the result of the operation of the word pooling layer using the intermediate layer FFNN, the FFNN is calculated.
A news material classification device, wherein the output layer FFNN is used to perform an FFNN operation based on a vector of operation results of the intermediate layer FFNN, and the operation result is output as the already-reported/not-yet-reported information.

In the news material classification device according to claim 2 or 3,
The sequence generation unit
Input the news posting information, extract characters and words included in the news posting information, generate a character BOW vector corresponding to all the extracted characters, and generate a word BOW vector corresponding to all the extracted words,
The learning model is
FFNN for character input layer having the character BOW vector generated by the sequence generation unit as input data and the vector of the operation result of FFNN as output data,
FFNN for word input layer having the word BOW vector generated by the sequence generation unit as input data and the vector of the operation result of FFNN as output data,
An intermediate layer FFNN having a vector obtained by combining the vector of the calculation result of the character input layer FFNN and the vector of the calculation result of the word input layer FFNN as input data, and a vector of the calculation result of FFNN as output data, and
An output layer FFNN having the vector of the calculation result of the intermediate layer FFNN as input data and the already-reported/not-yet-reported information, which is the calculation result of FFNN, as output data,
The classification unit includes:
Based on the character BOW vector generated by the sequence generation unit using the character input layer FFNN, the FFNN is calculated,
Based on the word BOW vector generated by the sequence generation unit using the word input layer FFNN, FFNN is calculated,
Using the intermediate layer FFNN, FFNN is calculated based on a vector obtained by combining the vector of the calculation result of the character input layer FFNN and the vector of the calculation result of the word input layer FFNN,
A news material classification device, wherein the output layer FFNN is used to perform an FFNN operation based on a vector of operation results of the intermediate layer FFNN, and the operation result is output as the already-reported/not-yet-reported information.

In a news material classification device that inputs post information that can be news material among a large amount of social media information as news posting information, and classifies the news posting information as already reported or not yet reported,
A sequence generation unit that inputs the news posting information, extracts characters or words included in the news posting information, and generates a character vector composed of the character sequence or a word vector composed of the word sequence,
A learning model storage unit that stores a machine-learned learning model having the character vector or the word vector of the news posting information as input data and, as output data, already-reported/not-yet-reported information, which is a calculation result indicating whether the news posting information has already been reported or has not yet been reported,
A classification unit that reads the learning model from the learning model storage unit and, using the learning model, generates the already-reported/not-yet-reported information based on the character vector or the word vector generated by the sequence generation unit and outputs it as a first classification result,
A second classification unit that inputs the news posting information, extracts agent information that is added to the news posting information and identifies the posting source device, generates already-reported/not-yet-reported information based on the agent information, and outputs it as a second classification result, and
A determination unit that generates and outputs new already-reported/not-yet-reported information based on the first classification result output by the classification unit and the second classification result output by the second classification unit.

A program for causing a computer to function as the news material classification device according to any one of claims 1 to 7.

A learning model for causing a computer to function so as to classify news posting information extracted from a large number of social media information according to a user's request and output classification information,
A character NN having a character vector composed of a character sequence extracted from the news posting information as input data, and a vector of the operation result as output data;
A word NN having a word vector consisting of a series of words extracted from the news posting information as input data, and a vector of an operation result as output data; and
By an output NN using a vector obtained by combining the vector of the calculation result of the character NN and the vector of the calculation result of the word NN as input data, and using the classification information that is the calculation result as output data,
The weighting coefficient of the character NN machine-learned using the character vector and the word vector of the news posting information as input data of the learning model, and the classification information of the news posting information as output data of the learning model, A weighting factor for the word NN and the weighting factor for the output NN;
An NN operation based on the weight coefficient of the character NN is performed on the character vector input to the input layer of the character NN, and an operation result vector is output from the output layer.
An NN operation is performed on the word vector input to the input layer of the word NN based on the weight coefficient of the word NN, and an operation result vector is output from the output layer,
For a vector obtained by combining the vector of the operation result output from the output layer of the character NN and the vector of the operation result output from the output layer of the word NN, which are input to the input layer of the output NN A learning model that causes the computer to function so as to perform an NN operation based on a weight coefficient of the output NN and to output the classification information as an operation result from an output layer.