JP7235329B2

JP7235329B2 - Economic indicator estimation system and its program

Info

Publication number: JP7235329B2
Application number: JP2020097567A
Authority: JP
Inventors: 達也奥野; 明生高橋; 昭彦黒野
Original assignee: Xenodata Lab Co Ltd
Current assignee: Xenodata Lab Co Ltd
Priority date: 2020-06-04
Filing date: 2020-06-04
Publication date: 2023-03-08
Anticipated expiration: 2040-06-04
Also published as: JP2021189995A

Description

The present invention relates to an economic indicator estimation system that estimates economic indicators, and to a program therefor.

Conventionally, methods are known for predicting trends in the economic environment and financial markets by using various text data such as news articles. For example, Non-Patent Document 1 discloses a method that converts daily distributed economic news into an index to nowcast business conditions, and uses the resulting business sentiment news index to forecast asset price volatility. This method has two features: first, it uses a convolutional neural network (CNN), a type of deep learning model, to estimate business sentiment from economic news; second, it measures business conditions on a daily basis by indexing daily news. To construct the news index, a CNN is first trained by supervised learning on the collection of reasons for business condition judgments in the Economy Watchers Survey published by the Cabinet Office, yielding a learner that classifies text. Next, the trained learner assigns a business sentiment score to each sentence of Japanese-language economic news articles. Finally, the scored sentences are aggregated monthly and daily to construct the news index.

Non-Patent Document 2 discloses a method for converting the text of financial reports into numerical values (a sentiment index) and aggregating them at low cost and high speed, by automatically determining the economic sentiment of text with a recurrent neural network (RNN). The RNN is trained on the task of predicting economic sentiment (positive/negative) from the text of the Economy Watchers Survey and is then used to determine the economic sentiment of documents. Non-Patent Document 2 also reports that when this sentiment estimation model was used to estimate and index the sentiment of monthly reports issued by the government and the Bank of Japan, the fluctuations of the resulting index tracked macroeconomic fluctuations well, and that its correlation with the Nikkei Stock Average was higher than that of the Bank of Japan Tankan and the Economy Watchers Index, which are already widely used as investment indicators.

In addition, Non-Patent Document 3 discloses a method of indexing the Bank of Japan's sentiment toward the economy, decomposed by topic, by applying a topic model and a neural network to texts issued by the Bank of Japan.

Furthermore, Patent Document 1 discloses an information processing device that predicts economic indicators regardless of whether news information or the like has been published. This device has a model storage unit, an acquisition unit, and a prediction unit. The model storage unit stores a prediction model set based on terminal location information. The acquisition unit acquires location information of one or more terminals. The prediction unit predicts a specified economic indicator by applying the location information acquired by the acquisition unit to the prediction model stored in the model storage unit.

Patent Document 1: JP 2019-46376 A

Non-Patent Document 1: Keiichi Goshima (五島圭一) and two others, "Construction of a business sentiment news index by natural language processing and its application to volatility forecasting", [online], January 2019, IMES Discussion Paper Series, Japan, Institute for Monetary and Economic Studies, Bank of Japan, [retrieved May 28, 2020], Internet <URL: http://www.imes.boj.or.jp/research/papers/japanese/19-J-03.pdf>
Non-Patent Document 2: Hiroki Yamamoto (山本裕樹) and one other, "Indexing financial reports using deep learning on the Economy Watchers Survey", [online], June 6, 2016, 30th Annual Conference (2016), Japan, Japanese Society for Artificial Intelligence, [retrieved May 28, 2020], Internet <URL: https://www.ai-gakkai.or.jp/jsai2016/webprogram/2016/pdf/219.pdf>
Non-Patent Document 3: Kyoto Yono (余野京登) and one other, "Real-time prediction of Bank of Japan sentiment from financial reports and macroeconomic indices", [online], May 23, 2017, 31st Annual Conference (2017), Japan, Japanese Society for Artificial Intelligence, [retrieved May 28, 2020], Internet <URL: https://www.jstage.jst.go.jp/article/pjsai/JSAI2017/0/JSAI2017_2D13/_pdf/-char/ja>

An object of the present invention is to provide a novel technique for estimating economic indicators at an arbitrary time resolution.

To solve this problem, a first invention provides an economic indicator estimation system that estimates an economic indicator at a desired time resolution based on a discrete time series of economic indicator values representing a specific economic indicator and on economic events that affect the economic indicator values. The system has a digest generation unit, an economic event database, a frequency aggregation unit, a vector estimation unit, a vector aggregation unit, and a learning processing unit. The digest generation unit generates economic event digests, each of which structures the content of an economic event extracted from a group of externally collected news items into a plurality of predetermined items. The economic event database stores the economic event digests. The frequency aggregation unit takes the group of economic event digests stored in the economic event database as the aggregation target and, for each predetermined aggregation time unit, classifies the digests into economic event patterns that share common content and counts the frequency of occurrence of each pattern. The vector estimation unit estimates, for each economic event pattern in each aggregation time unit, an economic event frequency vector of non-negative real values mapped onto a common space, such that the vectors reproduce the frequency of occurrence of each pattern in each aggregation time unit. The vector aggregation unit aggregates, for each first time unit corresponding to the time resolution of the economic indicator values, the economic event frequency vectors belonging to that first time unit, and generates a first economic event aggregation vector representing the degree of occurrence of each economic event pattern within the first time unit. The learning processing unit trains a regression model for estimating the economic indicator so that, given a first economic event aggregation vector as input, the model responds with the temporally corresponding economic indicator value.

In the first invention, a news filter unit may be provided that extracts, from the news group, the items whose media names appear in a preset news media list and outputs them to the digest generation unit. A narrowing processing unit may also be provided that extracts, from the economic event digests generated by the digest generation unit, those matching a preset narrowing condition and stores them in the economic event database. In this case, the narrowing processing unit preferably extracts economic event digests by referring, as the narrowing condition, to a performance factor list describing the patterns of economic event digests that affect the specific economic indicator value to be estimated.

In the first invention, the vector aggregation unit preferably generates the first economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that its components sum to 1, and normalizing that sum by the frequency of occurrence in the first time unit.
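The aggregation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the wording "normalizing by the frequency of occurrence" is read here as weighting each normalized vector by its pattern's count and dividing by the total count in the time unit, which is an assumption. The names `normalize`, `aggregate`, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def normalize(vec):
    """Scale a non-negative vector so that its components sum to 1."""
    total = sum(vec)
    if total == 0:
        return [0.0] * len(vec)
    return [v / total for v in vec]


def aggregate(records, r):
    """Build one aggregation vector per time unit.

    records: iterable of (time_unit, frequency_vector, count).
    Assumed reading: each normalized vector is weighted by its pattern's
    count, and the weighted sum is divided by the total count in the unit.
    """
    sums = defaultdict(lambda: [0.0] * r)
    totals = defaultdict(float)
    for unit, vec, count in records:
        unit_vec = normalize(vec)
        for d in range(r):
            sums[unit][d] += count * unit_vec[d]
        totals[unit] += count
    return {u: [s / totals[u] for s in sums[u]] for u in sums if totals[u] > 0}


# Hypothetical data: two patterns observed in the same month.
records = [
    ("2018-12", [2.0, 2.0], 3),  # pattern seen 3 times
    ("2018-12", [1.0, 3.0], 1),  # pattern seen once
]
monthly = aggregate(records, r=2)
```

Under this reading the components of each aggregation vector again sum to 1, so vectors built for time units of different lengths stay on the same scale.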

In the first invention, an estimation processing unit may be provided that outputs the response of the trained regression model to a second economic event aggregation vector as the estimated value of the economic indicator temporally corresponding to that vector. In this case, the vector aggregation unit preferably aggregates the economic event frequency vectors belonging to a second time unit, whose time resolution differs from that of the economic indicator values, and generates the second economic event aggregation vector representing the degree of occurrence of each economic event pattern within the second time unit. The vector aggregation unit also preferably generates the second economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that its components sum to 1, and normalizing that sum by the frequency of occurrence in the second time unit.

A second invention provides an economic indicator estimation program that estimates an economic indicator at a desired time resolution based on a discrete time series of economic indicator values representing a specific economic indicator and on economic events that affect the economic indicator values. The program causes a computer to execute processing having the following first to sixth steps. In the first step, economic event digests are generated, each structuring the content of an economic event extracted from a group of externally collected news items into a plurality of predetermined items. In the second step, the economic event digests are stored in an economic event database. In the third step, the group of economic event digests stored in the economic event database is taken as the aggregation target and, for each predetermined aggregation time unit, the digests are classified into economic event patterns that share common content and the frequency of occurrence of each pattern is counted. In the fourth step, for each economic event pattern in each aggregation time unit, an economic event frequency vector of non-negative real values mapped onto a common space is estimated, such that the vectors reproduce the frequency of occurrence of each pattern in each aggregation time unit. In the fifth step, for each first time unit corresponding to the time resolution of the economic indicator values, the economic event frequency vectors belonging to that first time unit are aggregated to generate a first economic event aggregation vector representing the degree of occurrence of each economic event pattern within the first time unit. In the sixth step, a regression model for estimating the economic indicator is trained so that, given a first economic event aggregation vector as input, the model responds with the temporally corresponding economic indicator value.

In the second invention, the first step may include a step of extracting, from the news group, the items whose media names appear in a preset news media list. The first step may also include a step of extracting, from the economic event digests, those matching a preset narrowing condition and storing them in the economic event database. In this case, the first step preferably extracts economic event digests by referring, as the narrowing condition, to a performance factor list describing the patterns of economic event digests that affect the specific economic indicator value to be estimated.

In the second invention, the third step preferably generates the first economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that its components sum to 1, and normalizing that sum by the frequency of occurrence in the first time unit.

In the second invention, the computer may be caused to execute, in addition to the steps described above, processing having the following seventh and eighth steps. In the seventh step, the response of the trained regression model to a second economic event aggregation vector is output as the estimated value of the economic indicator temporally corresponding to that vector. In the eighth step, the economic event frequency vectors belonging to a second time unit, whose time resolution differs from that of the economic indicator values, are aggregated to generate the second economic event aggregation vector representing the degree of occurrence of each economic event pattern within the second time unit. In this case, the eighth step preferably generates the second economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that its components sum to 1, and normalizing that sum by the frequency of occurrence in the second time unit.

In the first and second inventions, the second time unit may have a higher time resolution than the economic indicator values.

According to the present invention, the regression model is trained so that, given a first economic event aggregation vector as input, it responds with the temporally corresponding economic indicator value. The first economic event aggregation vector represents the degree of occurrence of each economic event pattern within the first time unit. As a result of training, economic events and economic indicator values are associated in a way that takes into account the frequency of occurrence of the patterned economic events, in other words, the degree to which the economic events influence a given economic indicator value. By using a regression model built in this way, the economic indicator can be estimated at any time resolution, as the response of the regression model to an input of that time resolution.
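The idea of training at one time resolution and estimating at another can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify the type of regression model, so a plain linear least-squares fit is swapped in, and all numbers and function names (`solve`, `fit`, `predict`) are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                for j in range(col, n + 1):
                    m[i][j] -= f * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs, ys, ridge=1e-9):
    """Least-squares weights w minimizing sum((w . x - y)^2).
    A tiny ridge term keeps the normal equations well conditioned."""
    r = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(r)] for i in range(r)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(r)]
    return solve(xtx, xty)


def predict(w, x):
    """Response of the linear model to one aggregation vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Training: monthly aggregation vectors paired with monthly indicator values
# (made-up numbers).
monthly_x = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
monthly_y = [46.0, 50.0, 56.0]
w = fit(monthly_x, monthly_y)

# Estimation: a daily aggregation vector built the same way as the monthly ones.
daily_estimate = predict(w, [0.6, 0.4])
```

Because the aggregation vectors are normalized the same way for every time unit, a model fitted on monthly inputs can accept a daily input without rescaling; that is the property this sketch relies on.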

FIG. 1: Block diagram of the economic indicator estimation system
FIG. 2: Example of a time series of economic indicator values
FIG. 3: Example of a news group
FIG. 4: Example of economic event digests
FIG. 5: Example of a normalization dictionary
FIG. 6: Example of a narrowing condition
FIG. 7: Example of a company list
FIG. 8: Example of company performance factor data
FIG. 9: Block diagram of the indicator estimation unit
FIG. 10: Example of the result of counting the frequency of occurrence of economic event patterns on a daily basis
FIG. 11: Example of the latent vector θd for dates
FIG. 12: Example of the latent vector θi for names (item)
FIG. 13: Example of the latent vector θj for elements (element)
FIG. 14: Example of the latent vector θk for changes (predicate)
FIG. 15: Example of daily economic event aggregation vectors
FIG. 16: Example of the relationship between input variables and the response variable in the regression model
FIG. 17: Example of smoothed estimates of the economic indicator

FIG. 1 is a block diagram of the economic indicator estimation system according to this embodiment. The economic indicator estimation system 1 estimates an economic indicator at a desired time resolution based on known economic indicator values and on economic events extracted from a known news group. Here, an "economic indicator value" represents a specific economic indicator and is provided as a discrete time series (economic indicator data), for example daily or monthly. An "economic event" is an occurrence that affects the specific economic indicator value to be estimated. Economic events and economic indicator values are causally related in some way: when economic events (not necessarily just one) occur within a certain period, the temporally corresponding economic indicator value may change.

FIG. 2 shows, as an example of a time series of economic indicator values, the time series of the Economy Watchers manufacturing index, which is published monthly. The time resolution of this economic indicator value is one month, and it is obtained from an external source as data to be input to the economic indicator estimation system 1. Besides the Economy Watchers manufacturing index, any economic indicator value that is causally related to economic events can be used, including the Bank of Japan Tankan, the index of industrial production, the unemployment rate, automobile sales, and housing start statistics. The publication cycle (time resolution) of the economic indicator value may be quarterly or semimonthly, and may even be irregular or random.

FIG. 3 shows an example of a news group from which economic events are extracted. The news group is a set of news articles (possibly a single article), each consisting of an article ID, a media name, a distribution date and time, and the article body, and is collected as needed from various external sources such as those on the Internet.

The economic indicator estimation system 1 mainly consists of a news filter unit 2, a digest generation unit 3, a narrowing processing unit 4, an economic event database 5, and an indicator estimation unit 6.

The news filter unit 2 extracts from the input news group only the articles from media that are likely to distribute news about economic events, and excludes the rest. These media names are set in advance as a news media list; articles from media not on the list are regarded as noise inherently unrelated to economic events. In the example of FIG. 3, news articles not on the news media list, such as the entertainment newspaper article with article ID "1005" and the agricultural newspaper article with article ID "1006", are excluded. Filtering by news source in this way reduces the load of subsequent processing.
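The source-based filtering described above can be sketched as follows. This is an illustrative sketch only: the `Article` fields mirror the four items named for FIG. 3, while the media names, article "1001", and the function name `filter_news` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    article_id: str
    media_name: str
    published_at: str
    body: str


# Hypothetical news media list; the actual list contents are not given in the text.
NEWS_MEDIA_LIST = {"Economic Daily", "Business Times"}


def filter_news(articles, media_list=NEWS_MEDIA_LIST):
    """Keep only articles whose media name appears in the news media list."""
    return [a for a in articles if a.media_name in media_list]


articles = [
    Article("1001", "Economic Daily", "2018-12-14T09:00", "Crude oil prices fell."),
    Article("1005", "Entertainment News", "2018-12-14T10:00", "A new film opened."),
    Article("1006", "Agricultural News", "2018-12-14T11:00", "Harvest season began."),
]
kept = filter_news(articles)
```

Using a set for the media list keeps each membership check constant-time, which suits a filter whose stated purpose is to reduce downstream load.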

The digest generation unit 3 extracts economic events from the news group filtered by the news filter unit 2. A single news article may yield several economic events, or none at all. The content of each extracted economic event is output in the form of an economic event digest. FIG. 4 shows an example of economic event digests extracted from a news group. An economic event digest structures the content of an economic event by dividing it into a plurality of predetermined items, and it expresses the characteristics (features) of the economic event concisely, without redundancy. As an example, an economic event digest can consist of a set of "name (item)", "element (element)", and "change (predicate)". The "name (item)" is an item representing the name of the economic event, such as "crude oil" or "gasoline". The "element (element)" is an item representing a quantity or tendency of the economic event, such as "price" or "demand". The "change (predicate)" is an item representing the direction of change (+/-) of the economic event (its "element"), such as "rise" or "increase". Of these three items, the most important for characterizing an economic event are the "element (element)" and the "change (predicate)", that is, "what" did "what" (for example, the "price" "fell"). The "element (element)" and "change (predicate)" are therefore indispensable, whereas the "name (item)" may be adopted as needed, and other items may also be added.
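The digest structure described above can be sketched as a small record type. This is an illustrative assumption about representation, not the system's actual data model: `element` and `predicate` are required and `item` is optional, matching the statement that only the first two are indispensable. The class and method names are hypothetical.

```python
from typing import NamedTuple, Optional


class EconomicEventDigest(NamedTuple):
    element: str                 # what is measured, e.g. "price" or "demand"
    predicate: str               # direction of change, e.g. "increase"
    item: Optional[str] = None   # name of the economic event, e.g. "crude oil"

    def pattern(self):
        """Key used to decide whether two digests share the same pattern."""
        return (self.item, self.element, self.predicate)


oil = EconomicEventDigest(item="crude oil", element="price", predicate="decline")
unnamed = EconomicEventDigest(element="price", predicate="decline")
```

An immutable tuple type is hashable, so digests can be used directly as dictionary keys or set members when patterns are counted or matched later.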

To eliminate variation in wording, the "name (item)", "element (element)", and "change (predicate)" are preferably normalized with a normalization dictionary that converts each extracted text to a canonical text, so that expressions are unified. FIG. 5 shows an example of a normalization dictionary. For the change (predicate), for example, extracted texts such as "surge", "increase", and "many" are converted to the canonical text "increase", whose direction of change is positive, and extracted texts such as "fall", "decline", and "plunge" are converted to the canonical text "decrease", whose direction of change is negative. As a result, extracted texts with different wording can be handled uniformly by the system as having the same meaning.
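The dictionary lookup described above can be sketched as follows. The entries are English stand-ins for the examples in the text; passing unknown terms through unchanged is an assumption, since the text does not say how unlisted expressions are handled. The names `PREDICATE_DICTIONARY` and `normalize_term` are hypothetical.

```python
# Extracted text -> canonical text, for the "change (predicate)" item.
PREDICATE_DICTIONARY = {
    "surge": "increase",
    "increase": "increase",
    "many": "increase",
    "fall": "decrease",
    "decline": "decrease",
    "plunge": "decrease",
}


def normalize_term(text, dictionary):
    """Return the canonical form of an extracted text.
    Assumption: terms missing from the dictionary are returned unchanged."""
    return dictionary.get(text, text)


canonical = [normalize_term(t, PREDICATE_DICTIONARY) for t in ("surge", "plunge", "flat")]
```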

The narrowing processing unit 4 individually evaluates the economic event digests generated by the digest generation unit 3 according to a preset narrowing condition, and extracts the economic event digests that are relevant to the specific economic indicator value to be estimated. FIG. 6 shows an example of a narrowing condition. In this embodiment, the narrowing condition is defined as a performance factor list describing the patterns of economic event digests that affect the specific economic indicator value to be estimated. Of the economic event digests generated by the digest generation unit 3, those matching a pattern in the performance factor list are extracted, and those that do not match are removed as unrelated to the specific economic indicator value to be estimated.

The performance factor list can be created from two kinds of data, for example a company list and company performance factor data. As shown in FIG. 7, the company list enumerates the names of companies that affect the specific economic indicator value to be estimated. A company may be identified by its name, but for listed companies, using a securities code or the like removes ambiguity. The company list itself can be created easily by using classification data such as the 33 industry categories of the Tokyo Stock Exchange. As shown in FIG. 8, the performance factor data lists sets of "company name", "economic event digest", and "impact", describing, as past results, which economic event digests had which effects on which companies. In FIG. 8, for example, the economic event digest "white goods / demand / increase" had the impact "revenue increase" on "Iroha Denki", and the economic event digest "dollar-yen / exchange rate / decline" had the impact "revenue decrease" on "Iroha Denki". The performance factor data is assumed to be obtained by the method described in JP 2020-24689 A, previously proposed by the present applicant; refer to it if necessary. From the company performance factor data shown in FIG. 8, the entries relating to the companies shown in FIG. 7 are extracted, and this yields the performance factor list shown in FIG. 6.
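Building the performance factor list and applying it as a narrowing condition can be sketched as follows. This is a minimal illustration: the "Iroha Denki" rows follow the examples in the text, while "Hoheto Foods", the tuple layout, and the function names are hypothetical.

```python
# Company list: companies that affect the indicator to be estimated.
company_list = {"Iroha Denki"}

# Performance factor data: (company name, digest pattern, impact).
performance_factor_data = [
    ("Iroha Denki", ("white goods", "demand", "increase"), "revenue increase"),
    ("Iroha Denki", ("dollar-yen", "exchange rate", "decline"), "revenue decrease"),
    ("Hoheto Foods", ("wheat", "price", "increase"), "revenue decrease"),  # hypothetical
]


def build_performance_factor_list(companies, factor_data):
    """Collect the digest patterns recorded for companies on the company list."""
    return {pattern for company, pattern, _impact in factor_data if company in companies}


def narrow(digests, factor_list):
    """Keep only digests whose pattern appears in the performance factor list."""
    return [d for d in digests if d in factor_list]


factor_list = build_performance_factor_list(company_list, performance_factor_data)
digests = [
    ("white goods", "demand", "increase"),
    ("wheat", "price", "increase"),
]
selected = narrow(digests, factor_list)
```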

The economic event digests extracted by the narrowing processing unit 4 are newly added to the economic event database 5. The economic event database 5 stores not only the economic event digests added this time but also the economic event digests extracted previously.

The indicator estimation unit 6 estimates the economic indicator at a desired time resolution based on the economic indicator values described above and the economic event digests stored in the economic event database 5. FIG. 9 is a block diagram of the indicator estimation unit 6. The unit mainly consists of a frequency aggregation unit 6a, a vector estimation unit 6b, a vector aggregation unit 6c, a learning processing unit 6d, an estimation processing unit 6e, and a regression model 6f. The frequency aggregation unit 6a and the vector estimation unit 6b perform preprocessing on the group of economic event digests read from the economic event database 5. The vector aggregation unit 6c and the learning processing unit 6d train the regression model 6f, associating economic event digests with economic indicator values. The vector aggregation unit 6c and the estimation processing unit 6e use the trained regression model 6f to estimate the economic indicator at a desired time resolution and output the estimated value.

The frequency aggregation unit 6a takes the group of economic event digests as the aggregation target and, for each predetermined aggregation time unit, classifies the digests into economic event patterns that share common content. It then counts, for each aggregation time unit, the frequency of occurrence of each classified economic event pattern. The aggregation time unit is daily in this embodiment, but it is not limited to this and can be set arbitrarily, for example to weekly or monthly; it corresponds to the highest time resolution available for indicator estimation.

FIG. 10 shows an example of the result of counting the appearance frequency of economic event patterns on a daily basis. As described above, an economic event pattern is defined as a set of "name (item) x element (element) x change (predicate)", and digests that share the same set are counted as the same economic event pattern. As a result, for December 14, 2018 (daily), economic event pattern A (crude oil x price x decline) appears three times, economic event pattern B (automobiles x sales x increase) appears twice, and economic event patterns C (mobile phones x exports x firm), D (dollar x price x decline), E (housing x demand x strong), and F (gasoline x demand x increase) each appear once. The appearance frequency of each of the economic event patterns A to F correlates with the economic indicator value that corresponds to it in time (for example, the economic indicator value dated December 14, 2018, or the economic indicator value for a predetermined period that includes that day). A pattern with a higher appearance frequency is regarded as having a greater degree of influence on this economic indicator value.
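The daily count described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the tuple layout `(date, item, element, predicate)`, the function name `count_patterns`, and the sample rows (a subset of the FIG. 10 example) are assumptions made for the sketch.

```python
from collections import Counter

# Hypothetical digests as (date, item, element, predicate) tuples.
digests = [
    ("2018-12-14", "crude oil", "price", "decline"),
    ("2018-12-14", "crude oil", "price", "decline"),
    ("2018-12-14", "crude oil", "price", "decline"),
    ("2018-12-14", "automobiles", "sales", "increase"),
    ("2018-12-14", "automobiles", "sales", "increase"),
    ("2018-12-14", "mobile phones", "exports", "firm"),
]


def count_patterns(rows):
    """Count occurrences of each (item, element, predicate) pattern per date."""
    counts = {}
    for date, item, element, predicate in rows:
        # Digests sharing the same triple are counted as the same pattern.
        counts.setdefault(date, Counter())[(item, element, predicate)] += 1
    return counts


daily = count_patterns(digests)
```

Switching to weekly or monthly aggregation would only require replacing the date key with a week or month key.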

For each economic event pattern in each aggregation time unit, the vector estimation unit 6b estimates an economic event frequency vector, mapped into a common space, such that the vector reproduces the appearance frequency of that pattern in that aggregation time unit. The purpose of estimating the economic event frequency vector is to convert discrete data such as news into continuous features. Rather than mapping, for example, "automobile" and "oil", or "sales" and "demand", to discrete symbols, they are mapped into a common r-dimensional space (r ≥ 2) and expressed on the same scale. This makes it possible to measure differences and similarities in meaning between objects. By contrast, a discrete-symbol representation such as "automobile" = id1 and "oil" = id2 does not permit mathematical operations such as the four arithmetic operations, so the processing of the vector aggregation unit 6c described later could not be performed.

The estimation of the economic event frequency vector is based on the assumption expressed by Equation 1. Here, the symbol "~" means that the left-hand side follows the probability distribution on the right-hand side. x_dijk is the appearance frequency of a specific economic event pattern in a specific aggregation time unit (date). θ_d, θ_i, θ_j, and θ_k are r-dimensional latent vectors, that is, parameters holding continuous numerical values for the objects date (aggregation time unit), name (item), element (element), and change (predicate), respectively. θ is non-negative.

Specifically, the economic event frequency vector is estimated based on Equation 2. Here, x_dijk is the appearance frequency (number of observations) of economic event pattern ijk on date d, and θ_* denotes the parameters of the economic event frequency vector (the quantities to be estimated). r is the dimension index of the economic event frequency vector, and the number of dimensions is set as appropriate. Further, d is the subscript for the date, m_d is the subscript for the month that contains date d, i is the subscript for the name (item) of the economic event pattern, j is the subscript for its element (element), and k is the subscript for its change (predicate). Equation 2 assumes that the appearance frequency of an economic event pattern within an aggregation time unit follows a Poisson distribution. Provided that non-negativity is satisfied, θ may take the form of a function such as e^θ or a nonlinear function such as a neural network.

The parameters θ_d, θ_i, θ_j, and θ_k in Equation 2 are the result of mapping discrete symbols into a common space. If these parameters can be estimated (learned) so that they reproduce the appearance frequency x_dijk of each economic event pattern well, they express which economic events (economic event patterns) are appearing within the aggregation time unit (daily), in other words the degree of appearance of each economic event pattern.
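Equations 1 and 2 are not reproduced in this text, so the following is only a sketch of one common way to realize the description: a Poisson model whose rate is the sum over dimensions of the product of the four non-negative latent vectors. The rate form, the function names, and the numbers are assumptions, and the sketch omits the month subscript m_d that the description mentions.

```python
import math


def poisson_rate(theta_d, theta_i, theta_j, theta_k):
    """Assumed rate: sum over dimensions r of theta_d[r]*theta_i[r]*theta_j[r]*theta_k[r]."""
    return sum(a * b * c * e for a, b, c, e in zip(theta_d, theta_i, theta_j, theta_k))


def poisson_log_likelihood(x, rate):
    """Log-probability of observing count x under a Poisson distribution (rate > 0)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


# Hypothetical 2-dimensional latent vectors for one date, item, element, and predicate.
rate = poisson_rate([1.0, 0.5], [2.0, 0.0], [1.0, 1.0], [1.5, 2.0])

# An observed count of 3 is better explained by rate 3.0 than by rate 1.0.
better = poisson_log_likelihood(3, rate)
worse = poisson_log_likelihood(3, 1.0)
```

Estimating the latent vectors would then amount to maximizing the sum of these log-likelihoods over all observed counts subject to non-negativity. The optimizer is left open here because the source does not specify one.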

As the estimation result based on Equations 1 and 2, the economic event frequency vector consists of non-negative real values. FIGS. 11 to 14 show an example of the economic event frequency vector expressed as decomposed into the four parameters θ_* (results of θ_* for r = 10). Because each dimension of the economic event frequency vector θ_* has a meaning common to all objects, even different names (items), for example, can be compared on the same scale as non-negative real values. Moreover, when the estimated parameter values of two names (items) tend to be similar, economic events that include those names are more likely to be observed at the same time.

The vector estimation unit 6b also normalizes the economic event frequency vector so that the sum of its components (non-negative real values) across dimensions is 1. This normalization allows economic event frequency vectors to be evaluated against the same numerical standard. Specifically, normalization is performed by applying Equation 3 to the economic event frequency vector estimated for each economic event pattern appearing on a specific date. Because the values of θ_* are non-negative real numbers, the normalized economic event frequency vector Λ_dijkr of each economic event pattern always sums to 1. This result can be interpreted as the latent appearance rate of the economic event pattern, which makes it possible to add up appearance frequencies pattern by pattern.
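The normalization step can be illustrated as follows. Equation 3 is not reproduced in this text, so the sketch assumes that the unnormalized component for dimension r is the product θ_d[r]·θ_i[r]·θ_j[r]·θ_k[r] and that normalization divides by the sum of these products. The function name and the numbers are hypothetical.

```python
def normalize_pattern_vector(theta_d, theta_i, theta_j, theta_k):
    """Return per-dimension shares that sum to 1 (assumed form of the normalized vector)."""
    products = [a * b * c * e for a, b, c, e in zip(theta_d, theta_i, theta_j, theta_k)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one dimension must have a positive product")
    return [p / total for p in products]


# Hypothetical 2-dimensional latent vectors; the products are 3.0 and 1.0.
lam = normalize_pattern_vector([1.0, 0.5], [2.0, 1.0], [1.0, 1.0], [1.5, 2.0])
```

Because every component is non-negative and the components sum to 1, the result can be read as the share of each latent dimension in that pattern.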

The economic event frequency vectors generated by the vector estimation unit 6b are stored in a storage device (not shown). The vector aggregation unit 6c reads the stored economic event frequency vectors as needed when aggregating them.

For each predetermined time unit, the vector aggregation unit 6c aggregates the economic event frequency vectors belonging to that time unit to generate an economic event aggregation vector. When the regression model 6f is being trained, the time unit is the one corresponding to the time resolution of the economic indicator values (monthly in this embodiment). When the regression model 6f is used for indicator estimation, the time unit differs from the time resolution of the economic indicator values and is typically a unit with a higher time resolution than the economic indicator values (daily in this embodiment). The economic event aggregation vector represents the degree of appearance of each economic event pattern within the time unit. That is, during training, monthly aggregation yields an economic event aggregation vector representing the degree of appearance of each economic event pattern in one month (the first economic event aggregation vector). During indicator estimation, daily aggregation yields an economic event aggregation vector representing the degree of appearance of each economic event pattern in one day (the second economic event aggregation vector).

Specifically, the economic event aggregation vector is calculated by Equation 4. Under Equation 3 the sum of Λ_dijkr is always 1, and x_dijk is the appearance frequency of economic event pattern ijk on day d. The economic event aggregation vector for day d can therefore be calculated by summing x_dijk Λ_dijkr over all names (item) i and elements (element) j for date d and normalizing by the appearance frequency in the time unit.

This processing is performed separately for each change (predicate) k. Instead of the change (predicate), it may be performed for each "impact" in the corporate performance factor data shown in FIG. 8. Using "impact" has two advantages: first, the behavior of the regression coefficient values can be expressed well, and second, the regression coefficients can be constrained to be always positive for a revenue increase and always negative for a revenue decrease. The sum may also be taken using an arbitrary function, or under the assumption that past economic event aggregation vectors affect the current economic event aggregation vector. This processing makes it possible to interpret, for date d, events at which positions in the r-dimensional space are likely to appear.
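The aggregation for a single change (predicate) k can be sketched as a frequency-weighted average of normalized vectors. Equation 4 is not reproduced in this text, so the normalization by the total count within the time unit, the function name `aggregate`, and the sample values are assumptions made for illustration.

```python
def aggregate(counts, lambdas):
    """Frequency-weighted mean of normalized pattern vectors within one time unit.

    counts:  {pattern: appearance frequency x in the time unit}
    lambdas: {pattern: normalized vector whose components sum to 1}
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no economic events in this time unit")
    dims = len(next(iter(lambdas.values())))
    summed = [0.0] * dims
    for pattern, x in counts.items():
        for r, lam in enumerate(lambdas[pattern]):
            summed[r] += x * lam  # weight each pattern's vector by its count
    return [value / total for value in summed]


# Hypothetical counts and normalized vectors for two patterns sharing one predicate.
counts = {"A": 3, "B": 1}
lambdas = {"A": [0.75, 0.25], "B": [0.25, 0.75]}
vector = aggregate(counts, lambdas)
```

Under this assumed normalization the output again sums to 1, whether the counts cover one day or one month. That is what lets daily and monthly vectors share a scale.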

FIG. 15 shows an example of daily economic event aggregation vectors. In the values shown, each dimension represents a characteristic of economic events, and the values indicate which kinds of economic events are likely to appear on a given day d. In other words, as the degree of appearance of each economic event pattern on day d, they express tendencies such as dimension 1 being strongly characteristic of oil-related events or dimension 2 being strongly characteristic of automobile-related events. This is not guaranteed by the calculation. It is a possible outcome of a human interpreting the computed results.

To calculate a monthly economic event aggregation vector, the economic event frequency vectors of the economic event patterns belonging to each month are aggregated month by month according to Equation 4. The event aggregation vectors used to train the regression model 6f are monthly so that they match the time resolution of the economic indicator values. Accordingly, if the time resolution of the economic indicator values were quarterly, the training event aggregation vectors would also be generated quarterly.

The regression model 6f is a model that estimates the relationship between two variables by statistical methods, and it is used to estimate the economic indicator at an arbitrary time resolution. Any model may be used as the regression model 6f, including ridge regression, Lasso, Gaussian process regression, XGBoost, neural networks, and support vector machines (SVM).

The learning processing unit 6d trains the regression model 6f so that, given the monthly economic event aggregation vector generated by the vector aggregation unit 6c as input, the model responds with the economic indicator value that corresponds to it in time. FIG. 16 shows an example of the relationship between the input variables and the response variable in the regression model 6f. For example, the regression model 6f is trained so that the economic indicator value for January 2014 (57.7) is the response to the input of the economic event aggregation vector for that same month.

The estimation processing unit 6e uses the daily economic event aggregation vector generated by the vector aggregation unit 6c to estimate the economic indicator value that corresponds to it in time. Specifically, the economic event aggregation vector is input to the trained regression model 6f, and the response of the regression model 6f to this input is output as the estimated value of the economic indicator. Because the daily and monthly economic event aggregation vectors are normalized by the same method, the sum over the r dimensions is constrained to 1 in both cases, so estimation accuracy can be maintained even when a daily economic indicator is estimated from a model trained on monthly data. Estimated values of the economic indicator can be output whether the time resolution is higher or lower than daily, and they can be output for past periods as well as for the present.
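The train-on-monthly, estimate-on-daily flow can be sketched with a small ridge regression, one of the model families the description lists. Everything concrete here is an assumption for illustration: the gradient-descent fitting routine, the absence of an intercept, the hyperparameters, and the made-up vectors and indicator values.

```python
def fit_ridge(X, y, alpha=1e-6, lr=0.5, steps=2000):
    """Fit ridge regression weights by plain gradient descent (no intercept)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - target
                     for row, target in zip(X, y)]
        grad = [sum(res * row[j] for res, row in zip(residuals, X)) + alpha * w[j]
                for j in range(n_features)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Response of the trained linear model to one aggregation vector."""
    return sum(wj * xj for wj, xj in zip(w, x))


# Hypothetical monthly aggregation vectors (components sum to 1) and indicator values.
monthly_vectors = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
monthly_values = [60.0, 50.0, 40.0]
weights = fit_ridge(monthly_vectors, monthly_values)

# A hypothetical daily aggregation vector, normalized the same way as the monthly ones.
daily_estimate = predict(weights, [0.6, 0.4])
```

The daily input can be fed to a monthly-trained model only because both kinds of vector are normalized to the same scale, which is the point the paragraph above makes.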

As needed, the estimation processing unit 6e also applies processing such as smoothing or seasonal adjustment to the estimated values of the economic indicator output by the regression model 6f. FIG. 17 shows an example of estimated values of the economic indicator that have been smoothed with a Kalman filter. Whereas the original economic indicator values shown in FIG. 2 are monthly, the estimated values shown in FIG. 17 are daily, so they have a higher time resolution than the original economic indicator values and change smoothly.
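The source does not state which state-space model is used for the Kalman filter, so the following sketch assumes the simplest one, a one-dimensional local-level model, and runs only the forward filtering pass. The function name, the variance parameters, and the sample series are hypothetical.

```python
def kalman_filter_local_level(observations, process_var=0.01, obs_var=1.0):
    """Forward Kalman filter for a 1-D local-level (random walk plus noise) model."""
    if not observations:
        return []
    estimate = observations[0]  # initialize the state at the first observation
    variance = obs_var
    filtered = [estimate]
    for z in observations[1:]:
        variance += process_var                 # predict: the state may drift
        gain = variance / (variance + obs_var)  # Kalman gain
        estimate += gain * (z - estimate)       # update toward the new observation
        variance *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# Hypothetical noisy daily estimates from the regression model.
raw = [50.0, 54.0, 46.0, 52.0]
smoothed = kalman_filter_local_level(raw)
```

A smaller `process_var` relative to `obs_var` gives a smoother series. A backward smoothing pass could be added if estimates for past days should also use later observations.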

As described above, according to this embodiment, the regression model 6f is trained so that the economic indicator value corresponding in time is the response to the input of the economic event aggregation vector. The economic event aggregation vector represents the degree of appearance of each economic event pattern within a time unit such as a month. As a result of training the regression model 6f, economic events and economic indicator values are associated in a form that takes into account the appearance frequency of patterned economic events, in other words the degree of influence of each economic event on a given economic indicator value. By using the regression model 6f constructed in this way, the economic indicator at a given time resolution can be estimated accurately as the response to an economic event aggregation vector of that time resolution.

Furthermore, the present invention can be understood as an economic indicator estimation program that equivalently implements the functional blocks shown in FIGS. 1 and 9. In outline, this economic indicator estimation program causes a computer to execute the following processing. First, an economic event digest is generated for each economic event extracted from the news group. Next, economic event digests related to the economic indicator values are extracted according to preset narrowing conditions, and the extracted digests are stored in the economic event database 5. Next, taking the group of economic event digests stored in the economic event database 5 as the aggregation target, the digests are classified by economic event pattern for each predetermined aggregation time unit, and the appearance frequency of each economic event pattern is counted. Next, for each economic event pattern in each aggregation time unit, an economic event frequency vector mapped into a common space is estimated so that it reproduces the appearance frequency of that pattern in that aggregation time unit. Next, for each month, the economic event frequency vectors belonging to that month are aggregated to generate an economic event aggregation vector (monthly). Next, the regression model 6f is trained so that the economic indicator value (monthly) corresponding in time is the response to the input of the economic event aggregation vector (monthly). Next, the economic event frequency vectors belonging to each day are aggregated to generate an economic event aggregation vector (daily). Finally, the response of the trained regression model 6f to the input of the economic event aggregation vector (daily) is output as the estimated value (daily) of the economic indicator corresponding in time to that vector.

1 Economic indicator estimation system
2 News filter unit
3 Digest generation unit
4 Narrowing processing unit
5 Economic event database
6 Index estimation unit
6a Frequency aggregation unit
6b Vector estimation unit
6c Vector aggregation unit
6d Learning processing unit
6e Estimation processing unit
6f Regression model

Claims

1. An economic indicator estimation system that estimates an economic indicator at a desired time resolution based on a discrete time series of economic indicator values representing a specific economic indicator and on economic events that affect the economic indicator values, the system comprising:
a digest generation unit that generates an economic event digest in which the content of an economic event extracted from a news group collected from outside is structured into a plurality of predetermined items;
an economic event database that stores the economic event digest;
a frequency aggregation unit that, taking the group of economic event digests stored in the economic event database as the aggregation target, classifies the economic event digests, for each predetermined aggregation time unit, into economic event patterns having commonality in content and counts the appearance frequency of each economic event pattern;
a vector estimation unit that estimates, for each economic event pattern in each aggregation time unit, an economic event frequency vector composed of non-negative real values and mapped into a common space, such that the vector reproduces the appearance frequency of each economic event pattern in each aggregation time unit;
a vector aggregation unit that, for each first time unit corresponding to the time resolution of the economic indicator values, aggregates the economic event frequency vectors belonging to the first time unit to generate a first economic event aggregation vector representing the degree of appearance of each economic event pattern within the first time unit; and
a learning processing unit that trains a regression model for estimating the economic indicator so that the economic indicator value corresponding in time is the response to the input of the first economic event aggregation vector.

2. The economic indicator estimation system according to claim 1, further comprising a news filter unit that extracts, from the news group, news items from media whose names are listed in a preset news media list and outputs the extracted news items to the digest generation unit.

3. The economic indicator estimation system according to claim 1, further comprising a narrowing processing unit that extracts, from the economic event digests generated by the digest generation unit, those that match preset narrowing conditions and stores them in the economic event database.

4. The economic indicator estimation system according to claim 3, wherein the narrowing processing unit extracts the economic event digests by referring to, as the narrowing conditions, a performance factor list describing patterns of economic event digests that affect the specific economic indicator values to be estimated.

5. The economic indicator estimation system according to claim 1, wherein the vector aggregation unit generates the first economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that the sum of its components across dimensions is 1, and normalizing that sum by the appearance frequency in the first time unit.

6. The economic indicator estimation system according to claim 1, further comprising an estimation processing unit that outputs the response of the trained regression model to the input of a second economic event aggregation vector as an estimated value of the economic indicator corresponding in time to the second economic event aggregation vector,
wherein the vector aggregation unit aggregates the economic event frequency vectors belonging to a second time unit, whose time resolution differs from that of the economic indicator values, to generate the second economic event aggregation vector representing the degree of appearance of each economic event pattern within the second time unit.

7. The economic indicator estimation system according to claim 6, wherein the vector aggregation unit generates the second economic event aggregation vector by calculating the sum of the economic event frequency vectors, each normalized so that the sum of its components across dimensions is 1, and normalizing that sum by the appearance frequency in the second time unit.

8. The economic indicator estimation system according to claim 6, wherein the second time unit has a higher time resolution than the economic indicator values.

9. An economic indicator estimation program that estimates an economic indicator at a desired time resolution based on a discrete time series of economic indicator values representing a specific economic indicator and on economic events that affect the economic indicator values, the program causing a computer to execute processing comprising:
a first step of generating an economic event digest in which the content of an economic event extracted from a news group collected from outside is structured into a plurality of predetermined items;
a second step of storing the economic event digest in an economic event database;
a third step of, taking the group of economic event digests stored in the economic event database as the aggregation target, classifying the economic event digests, for each predetermined aggregation time unit, into economic event patterns having commonality in content and counting the appearance frequency of each economic event pattern;
a fourth step of estimating, for each economic event pattern in each aggregation time unit, an economic event frequency vector composed of non-negative real values and mapped into a common space, such that the vector reproduces the appearance frequency of each economic event pattern in each aggregation time unit;
a fifth step of, for each first time unit corresponding to the time resolution of the economic indicator values, aggregating the economic event frequency vectors belonging to the first time unit to generate a first economic event aggregation vector representing the degree of appearance of each economic event pattern within the first time unit; and
a sixth step of training a regression model for estimating the economic indicator so that the economic indicator value corresponding in time is the response to the input of the first economic event aggregation vector.

10. The economic indicator estimation program according to claim 9, wherein the first step includes a step of extracting, from the news group, news items from media whose names are listed in a preset news media list.

11. The economic indicator estimation program according to claim 9, wherein the first step includes a step of extracting economic event digests that match preset narrowing conditions and storing them in the economic event database.

12. The economic indicator estimation program according to claim 11, wherein the first step extracts the economic event digests by referring to, as the narrowing conditions, a performance factor list describing patterns of economic event digests that affect the specific economic indicator values to be estimated.

13. The economic indicator estimation program according to claim 9, wherein, in the third step, the first economic event aggregation vector is generated by calculating the sum of the economic event frequency vectors, each normalized so that the sum of its components across dimensions is 1, and normalizing that sum by the appearance frequency in the first time unit.

14. The economic indicator estimation program according to claim 9, further comprising:
a seventh step of outputting the response of the trained regression model to the input of a second economic event aggregation vector as an estimated value of the economic indicator corresponding in time to the second economic event aggregation vector; and
an eighth step of aggregating the economic event frequency vectors belonging to a second time unit, whose time resolution differs from that of the economic indicator values, to generate the second economic event aggregation vector representing the degree of appearance of each economic event pattern within the second time unit.

15. The economic indicator estimation program according to claim 14, wherein, in the eighth step, the second economic event aggregation vector is generated by calculating the sum of the economic event frequency vectors, each normalized so that the sum of its components across dimensions is 1, and normalizing that sum by the appearance frequency in the second time unit.

16. The economic indicator estimation program according to claim 14, wherein the second time unit has a higher time resolution than the economic indicator values.