JP6723673B2

JP6723673B2 - Causal relationship extraction system and causal relationship extraction program

Info

Publication number: JP6723673B2
Application number: JP2019135001A
Authority: JP
Inventors: 洋二郎関
Original assignee: Xenodata Lab Co Ltd
Current assignee: Xenodata Lab Co Ltd
Priority date: 2018-07-26
Filing date: 2019-07-23
Publication date: 2020-07-15
Anticipated expiration: 2039-07-23
Also published as: JP2020024689A

Description

The present invention relates to a causal relationship extraction system and a causal relationship extraction program that extract causal relationships between economic events and corporate performance.

For example, Patent Literature 1 discloses a stock selection support device that makes it possible to search for stocks of interest using information based on a large volume of text data, so that large amounts of content such as news articles can be used effectively for investment. Specifically, news articles related to a keyword are first read from a news article set means that holds news articles. Next, stocks that are highly relevant to the keyword are searched for and extracted in descending order of relevance. Then, news articles about the stocks that are highly relevant to the keyword given by the user are extracted by searching the news article set means, and the extracted news articles are provided to the user.

Non-Patent Document 1 discloses a method of extracting causal relationship chains using vector representations, in which chains of economic and financial events are extracted from text data as causal relationships. Specifically, clue expressions that indicate a causal relationship (causal relationship expressions) are first extracted from the text of earnings summaries (kessan tanshin). For this extraction, the syntactic patterns in which a causal relationship can occur are enumerated in advance, and the clue expressions are used to extract causal relationship expressions regardless of whether the sentence is simple, complex, or compound. From the extracted causal relationship nodes, those that describe market conditions or corporate performance are selected manually and used as the terminal nodes of causal relationship chains. Next, causal relationship nodes that are candidates for addition to a chain ending at such a terminal node are extracted from newspaper text. They are extracted with the same method as for the earnings summaries, and a node becomes a candidate if it is older than the terminal node and, basically, falls within a search period measured from the terminal node. Finally, the similarity between causal relationship nodes is calculated to extend the chain: the candidate nodes to be added are extracted, and the causal relationship chain is thereby constructed.

Patent Literature 1: JP 2003-162639 A
Non-Patent Document 1: Nishimura et al., "Extraction of Causal Relationship Chains Using Vector Representations", [online], March 20, 2018, 20th Meeting of the Special Interest Group on Financial Informatics (SIG-FIN), Japanese Society for Artificial Intelligence, Japan, [retrieved February 15, 2019], Internet <URL:http://sigfin.org/?plugin=attach&refer=020-09&openfile=SIG-FIN-020-09.pdf>

Needless to say, economic events that affect economic activity also have a major effect on corporate performance, including stock prices. Such events are therefore of great interest to users such as institutional investors, securities companies, and individual investors, many of whom are constantly trying to predict how a given economic event will affect a company's performance. However, because a wide variety of economic events occur constantly and in large numbers, reading through an enormous amount of news to analyze how those events affect corporate performance places a heavy burden on users.

In this regard, Non-Patent Document 1 describes a method that represents economic and financial events as numerical vectors and automatically constructs causal relationships between them. However, the similarity between events during that construction is judged with general-purpose techniques such as IDF and cosine similarity, and the document describes no measures tailored to constructing causal relationships among economic and financial events.

The present invention has been made in view of these circumstances, and its object is to use computer processing to efficiently automate the extraction of causal relationships between economic events and corporate performance.

To solve this problem, a first invention provides a causal relationship extraction system that extracts the correlation between economic events and corporate performance as causal relationships. The system has a causal relationship database, a news analysis unit, an earnings analysis unit, a node registration unit, and a causal relationship registration unit.

The causal relationship database holds a plurality of event nodes, each with a digest of an economic event attached; a plurality of performance nodes, each with a digest of corporate performance attached; connections between nodes for specific pairs of economic events; and connections between nodes for specific pairs consisting of an economic event and a corporate performance. The news analysis unit analyzes the content of news to extract economic events and generates digests of those events. The earnings analysis unit analyzes the content of earnings-related materials to extract economic events and corporate performance, and generates digests of both. For each extracted economic event, the node registration unit registers an event node with the digest of that event attached in the causal relationship database, on condition that no event node with the same digest is already registered. Likewise, for each extracted corporate performance, the node registration unit registers a performance node with the digest of that performance attached, on condition that no performance node with the same digest is already registered.

When a causal relationship between distinct specific economic events is extracted from news by either a first extraction method, which uses specific words or phrases indicating a causal relationship as clues, or a second extraction method, which uses machine learning with training data consisting of known pairs of factors for which the presence or absence of a causal relationship is clear, the causal relationship registration unit registers the connection between the nodes for that pair of economic events in the causal relationship database on the basis of the extracted relationship. In addition, when a causal relationship between a specific corporate performance and a specific economic event is extracted from earnings-related materials by the first extraction method, the second extraction method, or a third extraction method, the causal relationship registration unit registers the connection between the nodes for that pair of economic event and corporate performance in the causal relationship database on the basis of the extracted relationship. The third extraction method compares a morpheme sequence abstracted by attribute labels with factor patterns and, for a morpheme sequence that matches a factor pattern, extracts the portion designated by that pattern as a factor.

Here, the digest of an economic event is structured by dividing the content of the event into a plurality of predetermined items, which include an item representing the quantity or trend of the economic event and an item representing its direction of change (but not the numerical amount of the change). The digest of corporate performance is likewise structured by dividing the content of the performance into a plurality of predetermined items, which include an item representing an account item and an item representing its direction of change.
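The digest structure and the register-only-if-absent rule described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class names, field names, the `company` field, and the ID numbering scheme are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class EventDigest:
    """Digest of an economic event (field names are illustrative)."""
    subject: str    # quantity or trend, e.g. "Indian automobile production"
    direction: str  # direction of change only; no numerical amount


@dataclass(frozen=True)
class PerformanceDigest:
    """Digest of corporate performance (field names are illustrative)."""
    company: str    # hypothetical extra item identifying the company
    account: str    # account item, e.g. "operating profit"
    direction: str  # direction of change


Digest = Union[EventDigest, PerformanceDigest]


class CausalDB:
    """In-memory stand-in for the causal relationship database."""

    def __init__(self) -> None:
        self.nodes: Dict[Digest, str] = {}         # digest -> node ID
        self.edges: Set[Tuple[str, str]] = set()   # (cause ID, result ID)

    def register_node(self, digest: Digest) -> str:
        """Register a node only if no node with the same digest exists."""
        if digest not in self.nodes:
            prefix = "a" if isinstance(digest, EventDigest) else "b"
            count = sum(1 for i in self.nodes.values() if i.startswith(prefix))
            self.nodes[digest] = f"{prefix}{count + 1}"
        return self.nodes[digest]

    def register_edge(self, cause: Digest, result: Digest) -> None:
        """Register a directed connection from cause to result."""
        self.edges.add((self.register_node(cause), self.register_node(result)))


db = CausalDB()
cause = EventDigest("Indian automobile production", "decrease")
result = PerformanceDigest("○○ Paint", "operating profit", "decrease")
db.register_edge(cause, result)
# A second extraction of the same event yields the same digest, so no new node is added.
db.register_node(EventDigest("Indian automobile production", "decrease"))
```

Because the digests are frozen dataclasses, two extractions with identical items compare equal and hash alike, which is what makes the duplicate check a simple dictionary lookup in this sketch.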

In the first invention, the node registration unit preferably refers to a name-identification dictionary, in which multiple expression patterns to be treated as the same digest are registered, to determine whether an event node or a performance node with the same digest as the extracted economic event or corporate performance is already registered in the causal relationship database.
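One way to picture the name-identification dictionary is as a mapping from variant expressions to a canonical form, applied before digests are compared. The sketch below assumes that design; the dictionary entries and the function names `canonical` and `same_digest` are hypothetical and do not come from the patent.

```python
from typing import Tuple

# Hypothetical entries: each variant expression maps to one canonical expression.
NAME_DICT = {
    "fall": "decrease",
    "decline": "decrease",
    "drop": "decrease",
    "rise": "increase",
    "growth": "increase",
}


def canonical(term: str) -> str:
    """Return the canonical form of a digest item (lower-cased if unknown)."""
    key = term.lower()
    return NAME_DICT.get(key, key)


def same_digest(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """Treat two digests as identical if all items match after canonicalization."""
    return tuple(map(canonical, a)) == tuple(map(canonical, b))


print(same_digest(("copper production", "fall"), ("Copper production", "Decrease")))
```

With this kind of lookup, "fall" and "decrease" lead to the same registered node instead of creating two nodes for one event.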

In the first invention, the causal relationship registration unit may take a series of registered nodes that are registered in the causal relationship database and connected to one another as event nodes and performance nodes, copy them with their directions of change inverted, and register the copies in the causal relationship database as predicted event nodes and predicted performance nodes. In this case, the causal relationship registration unit preferably refers to an inversion dictionary, in which multiple sets of mutually inverse expressions are registered, and registers the predicted event nodes and predicted performance nodes on condition that an expression corresponding to the inversion of the direction of change in the registered node exists.
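The inversion step can be illustrated with a small Python sketch. It assumes that a chain is a list of (subject, direction) pairs and that the whole chain is skipped when any direction has no entry in the inversion dictionary; both the data layout and that all-or-nothing policy are assumptions, and the dictionary entries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inversion dictionary built from pairs of mutually inverse expressions.
INVERSION_PAIRS = [("increase", "decrease"), ("closure", "reopening")]
INVERSION_DICT: Dict[str, str] = {}
for first, second in INVERSION_PAIRS:
    INVERSION_DICT[first] = second
    INVERSION_DICT[second] = first


def invert_chain(
    chain: List[Tuple[str, str]]
) -> Optional[List[Tuple[str, str, str]]]:
    """Copy a connected chain of (subject, direction) nodes with directions inverted.

    Returns None when some direction has no inverse expression, so that no
    predicted nodes are registered for that chain.
    """
    predicted = []
    for subject, direction in chain:
        inverse = INVERSION_DICT.get(direction)
        if inverse is None:
            return None
        predicted.append((subject, inverse, "predicted"))
    return predicted


registered = [
    ("Indian automobile production", "decrease"),
    ("○○ Paint operating profit", "decrease"),
]
print(invert_chain(registered))
```

The copies carry a "predicted" marker so they can be told apart from nodes that were extracted from actual news or earnings-related materials.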

A second invention provides a causal relationship extraction program that extracts the correlation between economic events and corporate performance as causal relationships by causing a computer to execute the following steps. In a first step, the content of news is analyzed to extract economic events, and digests of the economic events are generated. In a second step, the content of earnings-related materials is analyzed to extract economic events and corporate performance, and digests of both are generated. In a third step, for each extracted economic event, an event node with the digest of that event attached is registered in the causal relationship database, on condition that no event node with the same digest is already registered. In a fourth step, for each extracted corporate performance, a performance node with the digest of that performance attached is registered in the causal relationship database, on condition that no performance node with the same digest is already registered.

In a fifth step, when a causal relationship between distinct specific economic events is extracted from news by either a first extraction method, which uses specific words or phrases indicating a causal relationship as clues, or a second extraction method, which uses machine learning with training data consisting of known pairs of factors for which the presence or absence of a causal relationship is clear, the connection between the nodes for that pair of economic events is registered in the causal relationship database on the basis of the extracted relationship. In a sixth step, when a causal relationship between a specific corporate performance and a specific economic event is extracted from earnings-related materials by the first extraction method, the second extraction method, or a third extraction method, the connection between the nodes for that pair of economic event and corporate performance is registered in the causal relationship database on the basis of the extracted relationship. The third extraction method compares a morpheme sequence abstracted by attribute labels with factor patterns and, for a morpheme sequence that matches a factor pattern, extracts the portion designated by that pattern as a factor.

Here, the digest of an economic event is structured by dividing the content of the event into a plurality of predetermined items, which include an item representing the quantity or trend of the economic event and an item representing its direction of change (but not the numerical amount of the change). The digest of corporate performance is likewise structured by dividing the content of the performance into a plurality of predetermined items, which include an item representing an account item and an item representing its direction of change.
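The third extraction method is only outlined here, so the following Python sketch is a loose illustration of the idea of matching a label-abstracted morpheme sequence against factor patterns. The attribute labels (`ITEM`, `DIRECTION`, `CAUSE`, `ACCOUNT`), the pattern format, and the hand-labeled tokens are all assumptions; no real morphological analyzer is used.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    surface: str  # the morpheme (or phrase) as written
    label: str    # hypothetical attribute label that abstracts it


class FactorPattern(NamedTuple):
    labels: Tuple[str, ...]       # label sequence the sentence must match
    factor_span: Tuple[int, int]  # [start, end) token range designated as the factor


def extract_factor(tokens: List[Token], patterns: List[FactorPattern]) -> Optional[str]:
    """Return the part designated as the factor if any pattern matches, else None."""
    labels = tuple(t.label for t in tokens)
    for pattern in patterns:
        if labels == pattern.labels:
            start, end = pattern.factor_span
            return " ".join(t.surface for t in tokens[start:end])
    return None


PATTERNS = [
    FactorPattern(("ITEM", "DIRECTION", "CAUSE", "ACCOUNT", "DIRECTION"), (0, 2)),
]

sentence = [
    Token("raw material prices", "ITEM"),
    Token("rose", "DIRECTION"),
    Token("so", "CAUSE"),
    Token("operating profit", "ACCOUNT"),
    Token("decreased", "DIRECTION"),
]
print(extract_factor(sentence, PATTERNS))
```

Because matching is done on labels rather than surface strings, one pattern covers every sentence that has the same abstract shape, whatever the concrete item or direction is.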

In the second invention, the third and fourth steps preferably include a step of referring to a name-identification dictionary, in which multiple expression patterns to be treated as the same digest are registered, to determine whether an event node or a performance node with the same digest as the extracted economic event or corporate performance is already registered in the causal relationship database.

The second invention may further include a seventh step in which a series of registered nodes that are registered in the causal relationship database and connected to one another as event nodes and performance nodes are copied with their directions of change inverted, and the copies are registered in the causal relationship database as predicted event nodes and predicted performance nodes. In this case, the seventh step preferably refers to an inversion dictionary, in which multiple sets of mutually inverse expressions are registered, and registers the predicted event nodes and predicted performance nodes on condition that an expression corresponding to the inversion of the direction of change in the registered node exists.

According to the present invention, causal relationships are extracted on a digest basis that eliminates redundant information, and this includes the judgment of whether content is identical when economic events and corporate performance are registered in the causal relationship database. The digest of an economic event is structured so that it characterizes the event concisely, taking into account the way news is written, and has an item representing the quantity or trend of the event and an item representing its direction of change. The digest of corporate performance is structured so that it characterizes the performance concisely, taking into account the way earnings-related materials are written, and has an item representing an account item and an item representing its direction of change. This makes it possible to use computer processing to efficiently automate the extraction of causal relationships between economic events and corporate performance.

FIG. 1: Conceptual diagram of the service model according to this embodiment
FIG. 2: Example of a display screen for news analysis
FIG. 3: Example of a correlation graph for news analysis
FIG. 4: Block diagram of the correlation graph display device
FIG. 5: Conceptual diagram of the causal relationship database
FIG. 6: Example of a correlation graph for profit impact factor analysis
FIG. 7: Block diagram of the correlation graph generation system
FIG. 8: Structure of the news database
FIG. 9: Structure of the earnings-related database
FIG. 10: Block diagram of the causal relationship extraction system
FIG. 11: Explanatory diagram of causal relationship extraction using text structure analysis
FIG. 12: Explanatory diagram of the causal relationship extraction method using machine learning
FIG. 13: Economic events extracted from news
FIG. 14: Event nodes extracted from news
FIG. 15: Causal relationships between event nodes extracted from news
FIG. 16: Causal relationships between event nodes and performance nodes extracted from earnings-related materials
FIG. 17: Explanatory diagram of the correlation graph when new news is inverted
FIG. 18: Explanatory diagram of the process for generating predicted event nodes and predicted performance nodes

FIG. 1 is a conceptual diagram of the service model according to this embodiment. An analysis server, managed by the operator of the service or by a party entrusted by the operator, acquires news distributed from external sources such as news companies. The analysis server also acquires earnings-related materials distributed from external sources such as stock exchanges and companies, that is, disclosure documents such as earnings summaries and annual securities reports, and analyzes them. The analysis server then provides users such as institutional investors, securities companies, and individual investors with various services, including the news analysis and profit impact factor analysis described later. A characteristic service of the analysis server is the provision of a correlation graph that visualizes the correlation between economic events and corporate performance. In this specification, "economic event" is a broad concept covering any event that can affect economic activity: it includes not only economic events in the narrow sense but also natural events that affect the economy, such as a cool summer, a major earthquake, or a major flood.

FIG. 2 shows an example of a display screen for news analysis. The display screen 1 is displayed on a display device of a client operated by the user. The display screen 1 has a search candidate display area 1a, a correlation graph display area 1b, a news text display area 1c, and an affected company display area 1d, each provided in a different region of the screen. The search candidate display area 1a lists the search candidates from which the user can choose. The candidates shown depend on the service; for news analysis, the titles of various news items are listed. When the user selects a search target (a news item) from the displayed candidates, the correlation graph display area 1b displays the corresponding correlation graph. The news text display area 1c displays the text of the selected news item. The affected company display area 1d lists the companies that may be affected by the economic event described in the selected news item.

FIG. 3 shows an example of a correlation graph for news analysis. The correlation graph visualizes the correlation between economic events and corporate performance and contains at least one event node and at least one performance node. Structurally, the graph may be a tree with branches or a simple linear chain. In the following description, a node for an economic event, such as "Indian copper plant closure" or "decrease in Indian copper product production", is called an "event node", and a node for corporate performance, such as "operating profit decrease at ○○ Paint" or "operating profit decrease at △△ Automobile", is called a "performance node".

In the correlation graph, specific pairs of nodes are joined by directed edges. A directed edge indicates a causal relationship between the two nodes it joins: its start point is the "factor" and its end point is the "result". When two event nodes are joined by a directed edge, the economic event at the start point caused the economic event at the end point. When an event node and a performance node are joined by a directed edge, the economic event at the start point caused the corporate performance at the end point. In this specification, "directed edge" is a broad concept covering any symbol that lets the user tell the factor side from the result side. The typical form is an arrow from the factor side to the result side, but other forms are possible, such as a triangle whose base is on the factor side and whose apex is on the result side, or a line segment labeled "factor" at one end and "result" at the other.

The example in the figure shows the case where the user has selected "Indian copper plant closure" as the search target. The selected item becomes the top node, and the other economic events and corporate performances that have a direct or indirect causal relationship with it are shown as lower nodes. The economic event "Indian copper plant closure" is the direct cause of the economic event "decrease in Indian copper product production". That decrease in turn causes two economic events: "decrease in Indian automobile production" and "increase in Indian copper imports". Both have the immediately preceding "decrease in Indian copper product production" as their direct cause and the top-level "Indian copper plant closure", which is causally linked to it, as an indirect cause. Further, "decrease in Indian automobile production" is the direct cause of two corporate performances: "operating profit decrease at ○○ Paint" and "operating profit decrease at △△ Automobile". These, too, are indirectly caused by the top-level "Indian copper plant closure", which is causally linked to "decrease in Indian automobile production".

It can therefore be seen that the economic event "Indian copper plant closure" has caused changes in the performance of each company: "operating profit decrease at ○○ Paint", "operating profit decrease at △△ Automobile", "operating profit increase at □□ Material", and "operating profit increase at ×× Electric Wire". In this way, when a particular news item is selected as the search target, the correlation graph visualizes the series of economic events that have a direct or indirect causal relationship with the economic event described in that news item, together with the corporate performances that have a direct or indirect causal relationship with it.

In the example in the figure, several corporate performances are lined up at the lowest level of the correlation graph, but this is only an example; depending on the case, a corporate performance may branch off from the top node or from an intermediate node. A performance node is, however, basically the terminal node of its branch in the correlation graph. A performance node may also consist of multiple nodes, one for each business domain of the company. For example, the company "□□ Material" may be subdivided into performance nodes such as "cement business", "metals business", "aluminum business", and "electronic materials business". When the number of performance nodes to be displayed in the correlation graph is at or above a predetermined number, companies with a higher degree of impact may be displayed preferentially. The degree of impact is preferably determined quantitatively, using the strength of the wording in the text, the frequency of specific expressions, statistical analysis, and the like. The number of levels of the correlation graph displayed from the search target may be preset by the system or set freely by the user. In addition, the search candidate display area 1a may be limited, by user selection or specification, to news about a specific company.
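Preferential display by degree of impact amounts to ranking performance nodes by a score and keeping the top entries. The Python sketch below assumes the scores have already been computed; how the wording strength, expression frequency, and statistical analysis are combined into a score is not specified here, so the numbers and the function name are purely illustrative.

```python
from typing import Dict, List


def select_performance_nodes(impact: Dict[str, float], max_nodes: int) -> List[str]:
    """Return at most max_nodes node names, highest impact score first."""
    ranked = sorted(impact.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:max_nodes]]


# Hypothetical impact scores for four performance nodes.
scores = {
    "○○ Paint": 0.42,
    "△△ Automobile": 0.91,
    "□□ Material": 0.67,
    "×× Electric Wire": 0.15,
}
print(select_performance_nodes(scores, max_nodes=2))
```

The `max_nodes` argument plays the role of the predetermined number of nodes; it could equally be a user setting.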

FIG. 4 is a block diagram of the correlation graph display device according to the present embodiment. In this embodiment, the correlation graph display device 10 is built on the client side, which has a browser. In addition to a display device that displays the search candidate display area 1a, the correlation graph display area 1b, and so on, the correlation graph display device 10 includes a display switching unit 2. The display switching unit 2 accepts the user's selection from the search candidate group displayed in the search candidate display area 1a and displays, in the correlation graph display area 1b, the correlation graph corresponding to the selected search target. The correlation graph is constructed by referring to the causal relationship database 3 on the analysis server side. To this end, the display switching unit 2 sends a search request for the search target to the analysis server and receives the search result (the correlation graph) from it. The display switching unit 2 can be implemented as a function of a web browser installed on the client.

FIG. 5 is a conceptual diagram of the causal relationship database 3. The causal relationship database 3 holds the information needed to construct a correlation graph. Specifically, it holds a plurality of event nodes, a plurality of performance nodes, the connections between specific pairs of event nodes, and the connections between specific pairs of an event node and a performance node. Each node held in the causal relationship database 3 is given node-specific identification information (a node ID), and individual nodes are managed by this node ID. As described above, there are event nodes, which correspond to economic events, and performance nodes, which correspond to corporate performance. To distinguish the two, the node IDs of event nodes are written "a1" to "a7" and those of performance nodes "b1" to "b3". When there is a causal relationship between a pair of economic events, the connection between the corresponding pair of event nodes is held in a form that distinguishes cause from result (corresponding to a directed edge of the correlation graph). Likewise, when there is a causal relationship between an economic event and a corporate performance item, the connection between the event node for the former and the performance node for the latter is also held. As described later, the causal relationship database 3 also holds node information (a digest) for each node.
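As a rough illustration of the structure described for the causal relationship database 3, the sketch below stores nodes by node ID and connections as directed (cause, result) pairs. The class and method names are hypothetical, the example edges are illustrative only, and the restriction that only event nodes may act as causes is an assumption drawn from the statement that performance nodes are basically terminal nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class CausalityDatabase:
    # node ID -> {"kind": "event" or "performance", "digest": str}
    nodes: Dict[str, dict] = field(default_factory=dict)
    # directed edges stored as (cause node ID, result node ID)
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def add_node(self, node_id, kind, digest):
        if kind not in ("event", "performance"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = {"kind": kind, "digest": digest}

    def connect(self, cause_id, result_id):
        if cause_id not in self.nodes or result_id not in self.nodes:
            raise KeyError("both nodes must be registered first")
        # Assumption: only event nodes act as causes.
        if self.nodes[cause_id]["kind"] != "event":
            raise ValueError("the cause must be an event node")
        self.edges.add((cause_id, result_id))


db = CausalityDatabase()
db.add_node("a1", "event", "increase in Indian copper product imports")
db.add_node("a2", "event", "increase in Japanese copper product production and exports")
db.add_node("b1", "performance", "increase in operating income of □□ Material")
db.connect("a1", "a2")  # event -> event
db.connect("a2", "b1")  # event -> performance
```

Storing each edge as an ordered pair keeps cause and result distinguishable, which is what allows the same data to be traversed in either direction.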

The display switching unit 2 switches the correlation graph shown in the correlation graph display area 1b whenever the user changes the search target. That is, when a first news item is selected as the search target from the news group displayed as the search candidate group in the search candidate display area 1a, a first correlation graph is displayed. In the first correlation graph, the first economic event, which corresponds to the first news item, is the top node, and the other economic events and corporate performance correlated with the first economic event are lower nodes. When a second news item different from the first is selected as the search target, a second correlation graph different from the first is displayed. In the second correlation graph, the second economic event, which corresponds to the second news item, is the top node, and the other economic events and corporate performance correlated with the second economic event are lower nodes.

Next, profit influence factor analysis is described. In profit influence factor analysis, unlike news analysis, the search candidate display area 1a lists the corporate performance items that the user can select. When the user selects a search target (a corporate performance item) from the search candidate group (the corporate performance group), the corresponding correlation graph is displayed in the correlation graph display area 1b.

FIG. 6 shows an example of a correlation graph in profit influence factor analysis. The example shows the case where "increase in operating income of □□ Material" is selected as the search target. In this case, "increase in operating income of □□ Material" is the lowest node, and the economic events that are direct or indirect causes of this corporate performance are displayed as upper nodes. The correlation graph shows that the corporate performance "increase in operating income of □□ Material" has "increase in Japanese copper product production and exports" as its direct cause, and "increase in Indian copper product imports", "decrease in Indian copper product production", and "closure of Indian copper plant" as indirect causes. Thus, when a specific corporate performance item is selected as the search target, the correlation graph visualizes the series of economic events that have a direct or indirect causal relationship with it.

In profit influence factor analysis, as in the news analysis described above, the display switching unit 2 switches the correlation graph shown in the correlation graph display area 1b whenever the user changes the search target. That is, when a first corporate performance item is selected as the search target from the corporate performance group displayed as the search candidate group in the search candidate display area 1a, a third correlation graph is displayed. In the third correlation graph, the first corporate performance item is the lowest node, and the economic events correlated with it are upper nodes. When a second corporate performance item different from the first is selected, a fourth correlation graph different from the third is displayed. In the fourth correlation graph, the second corporate performance item is the lowest node, and the economic events correlated with it are upper nodes.

When the causal relationship database 3 is searched, the connections are followed in opposite directions in news analysis and in profit influence factor analysis. In news analysis, the connections are followed along the directed edges, from cause (upper) to result (lower). In profit influence factor analysis, they are followed against the directed edges, from result (lower) back to cause (upper).
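The two traversal directions can be sketched with a breadth-first search over (cause, result) pairs. This is a minimal sketch, not the system's actual search routine: the function name `collect_subgraph`, the `direction` values, and the sample node IDs are assumptions made for illustration.

```python
from collections import deque


def collect_subgraph(edges, start, direction):
    """Collect the nodes and edges reachable from `start`.

    edges: iterable of (cause_id, result_id) pairs.
    direction: "forward" follows cause -> result (news analysis);
               "backward" follows result -> cause (profit influence factor analysis).
    """
    forward = direction == "forward"
    adjacency = {}
    for cause, result in edges:
        src, dst = (cause, result) if forward else (result, cause)
        adjacency.setdefault(src, []).append(dst)

    visited = {start}
    kept_edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            # Always report edges in their original cause -> result orientation.
            kept_edges.append((node, nxt) if forward else (nxt, node))
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited, kept_edges


sample_edges = [("a1", "a2"), ("a2", "a3"), ("a3", "b1"), ("a2", "b2")]
news_nodes, news_edges = collect_subgraph(sample_edges, "a2", "forward")
profit_nodes, profit_edges = collect_subgraph(sample_edges, "b1", "backward")
```

Starting from an event node and going forward yields the downstream events and performance nodes; starting from a performance node and going backward yields only the chain of causes.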

FIG. 7 is a functional block diagram of the correlation graph generation system that provides correlation graphs to the correlation graph display device 10 described above. In this embodiment, the correlation graph generation system 20 is built on the analysis server side. The generation system 20 can also be realized equivalently by installing, on the analysis server, a computer program (a correlation graph generation program) that causes a computer to function and operate as blocks 4 to 6.

In addition to the causal relationship database 3 described above, the correlation graph generation system 20 has a correlation graph generation unit 4, a causal relationship extraction system 5, a news database 7, and a settlement-related database 8.

Based on a search request from the client side (which includes the designation of the search target), the correlation graph generation unit 4 searches the causal relationship database 3 using the event node or performance node of the search target as the search key. As a result, at least one node connected directly or indirectly to the search key is extracted, together with the connections between nodes. The correlation graph generation unit 4 then generates a correlation graph from the search result. The correlation graph contains the node used as the search key and the at least one node extracted by the search. The directed edges of the correlation graph are formed from the connections between the nodes extracted by the search.

In news analysis, the causal relationship database 3 is searched using, as the search key, the event node corresponding to the specific news item selected by the user. From the search result, a correlation graph is generated that contains this event node and the other event nodes and performance nodes connected to it directly or indirectly. In profit influence factor analysis, the causal relationship database 3 is searched using, as the search key, the performance node corresponding to the specific corporate performance item selected by the user. From the search result, a correlation graph is generated that contains this performance node and the event nodes connected to it directly or indirectly. The correlation graph generated by the correlation graph generation unit 4 is sent to the client side as the result of the search request.

FIG. 8 shows the structure of the news database 7. The news database 7 holds a large number of news items acquired from outside. It holds many records, each consisting of a "news ID", "news information", and a "node ID". The "news ID" is unique identification information for identifying an individual news item. It is assigned each time a news item acquired from outside is registered in the news database 7. The "news information" is the content of the news item itself, including its title, body, and other accompanying information (such as the date). The "node ID" is the node ID of the event node corresponding to an economic event extracted from the news item. A single news item may have multiple node IDs recorded for it, as with news IDs "N002" to "N004". This means that multiple economic events were extracted from one news item and that the event node corresponding to each of them is registered in the causal relationship database 3. Conversely, the same node ID (here "a1") may be recorded for multiple news items, as with news IDs "N001" and "N002". This means that an economic event was extracted from each news item separately, but the events were judged to be the same, so only one event node a1 is registered in the causal relationship database 3.

FIG. 9 shows the structure of the settlement-related database 8. The settlement-related database 8 holds a large number of settlement-related materials acquired from outside. It holds many records, each consisting of a "settlement ID", "settlement-related information", and a "node ID". The "settlement ID" is unique identification information for identifying an individual settlement-related material. It is assigned each time a settlement-related material acquired from outside is registered in the settlement-related database 8. The "settlement-related information" is the content of the settlement-related material itself. The "node ID" gives the node ID of the event node corresponding to an economic event extracted from the material (for example, a2) and the node ID of the performance node corresponding to the extracted corporate performance (for example, b1). For example, for settlement ID "K001", the economic event extracted from the material exists as event node "a2" and the corporate performance exists as performance node "b1".

FIG. 10 is a functional block diagram of the causal relationship extraction system 5. The causal relationship extraction system 5 analyzes the news read from the news database 7 and the settlement-related materials read from the settlement-related database 8, extracts the correlation between economic events and corporate performance as causal relationships, and registers the necessary information in the causal relationship database 3. The causal relationship extraction system 5 has a news analysis unit 5a, a settlement analysis unit 5b, a node registration unit 5c, and a causal relationship registration unit 5d. In this embodiment, the causal relationship extraction system 5 is built on the analysis server side, but it can also be realized equivalently by installing, on the analysis server, a computer program (a causal relationship extraction program) that causes a computer to function and operate as blocks 5a to 5d.

The news analysis unit 5a reads a news item stored in the news database 7, extracts economic events by analyzing its content, and generates a digest of each economic event. The digest is structured by dividing the content of the economic event into a plurality of predetermined items, and it expresses the characteristics (features) of the economic event concisely, without redundancy. In this embodiment, to characterize the content concisely while taking into account how news is written, four items are used to define the digest of an economic event: "region (area)", "name (item)", "element (element)", and "fluctuation (predicate)". The "region" is the name of the region where the economic event occurred, such as "Southeast Asia" or "Japan". The "name" is the name of the subject of the economic event, such as "soft drinks" or "crude oil". The "element" expresses the quantity or tendency of the economic event, such as "price", "sales volume", "demand", "tax rate", "economy", "market conditions", "trade friction", "business condition", or "regulation". The "fluctuation" expresses the direction in which the economic event (the "element") changes, such as "fall", "decrease", "increase", "intensification", "deterioration", or "strengthening". For example, for the economic event "increase in Indian copper product imports", a digest is generated with "region" = India, "name" = copper products, "element" = import volume, and "fluctuation" = increase. Of these four items, the most important for characterizing an economic event are the "element" and the "fluctuation", that is, "what" did "what" (for example, the "price" "fell"). The "element" and "fluctuation" are therefore indispensable, whereas the "region" and "name" may be adopted as needed, and other items may be added. Depending on the content of the economic event, the "region", "name", "element", and "fluctuation" are not always all clearly present. Digests with fewer than the four items may therefore be permitted as a digest pattern.

The node registration unit 5c assigns a node ID to each extracted economic event and registers it in the causal relationship database 3 as a new event node, on condition that no economic event with the same content is already registered there (elimination of duplicate registration). Whether two economic events are the same must be judged with notational variation in mind. A name identification dictionary, in which many patterns of expressions regarded as identical are registered, is therefore consulted to standardize the expressions before identity is judged (data cleansing). For example, economic events such as "rise in crude oil prices", "oil price up", and "high crude oil prices" are all standardized to "name" = crude oil, "element" = price, "fluctuation" = rise, and are therefore judged to be the same. If, for example, the name identification dictionary is used to judge that a digest containing "consumption tax hike" is the same as a digest containing "consumption tax" and "increase", identity can also be judged between digests with different numbers of items.

The causal relationship database 3 stores, as the node information of an event node, the digest whose expressions have been standardized by the name identification dictionary. The same applies to the digests of corporate performance extracted from settlement-related materials.
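The digest structure and the duplicate check can be sketched together as follows. This is a simplified illustration, not the described implementation: `EventDigest`, `NAME_DICTIONARY`, and `NodeRegistry` are hypothetical names, the dictionary maps whole expressions directly to standardized digests (the real analysis of news text is not shown), and the ID scheme "a1", "a2", ... simply mirrors the notation used in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventDigest:
    element: str                 # required: what changed (e.g. price)
    predicate: str               # required: direction of the change (fluctuation)
    area: Optional[str] = None   # optional: region
    item: Optional[str] = None   # optional: name


CRUDE_OIL_RISE = EventDigest(item="crude oil", element="price", predicate="rise")

# Hypothetical name identification dictionary: expression -> standardized digest.
NAME_DICTIONARY = {
    "rise in crude oil prices": CRUDE_OIL_RISE,
    "oil price up": CRUDE_OIL_RISE,
    "high crude oil prices": CRUDE_OIL_RISE,
    "fall in soft drink sales volume": EventDigest(
        item="soft drinks", element="sales volume", predicate="fall"
    ),
}


class NodeRegistry:
    """Registers event nodes, skipping digests that are already present."""

    def __init__(self):
        self._ids = {}  # standardized digest -> node ID

    def register(self, expression):
        digest = NAME_DICTIONARY[expression]  # KeyError for unknown expressions
        if digest in self._ids:
            return self._ids[digest]          # same event: reuse the node
        node_id = f"a{len(self._ids) + 1}"    # assign a new node ID
        self._ids[digest] = node_id
        return node_id


registry = NodeRegistry()
ids = [registry.register(e) for e in (
    "rise in crude oil prices", "oil price up", "high crude oil prices",
)]
other = registry.register("fall in soft drink sales volume")
```

Because the digest is a frozen dataclass, two digests with identical fields compare equal and hash equally, so the standardized digest itself can serve as the duplicate-detection key.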

The settlement analysis unit 5b reads a settlement-related material stored in the settlement-related database 8 and extracts economic events and corporate performance by analyzing its content. The node registration unit 5c registers the extracted economic events and corporate performance in the causal relationship database 3 as new event nodes and new performance nodes, on condition that no economic event or corporate performance with the same content is already registered there. The settlement analysis unit 5b also generates a digest of each extracted economic event and a digest of each extracted corporate performance item. The digest of an economic event is the same as in the case of news. The digest of corporate performance is structured by dividing the content of the corporate performance into a plurality of predetermined items, and it expresses the characteristics (features) of the corporate performance concisely, without redundancy. In this embodiment, to characterize the content concisely while taking into account how settlement-related materials are written, two items are used to define the digest of corporate performance: the "account item" and the "fluctuation". The "account item" identifies the account item, and the "fluctuation" expresses the direction in which the account item changes. As with news, the economic events and corporate performance extracted from settlement-related materials are assigned node IDs and registered in the causal relationship database 3, on condition that identical ones are not already registered.

The digests of economic events and of corporate performance registered in the causal relationship database 3 are used as node information in the correlation graph. That is, the digest of an economic event read from the causal relationship database 3 is attached as the node information of the corresponding event node in the correlation graph, and the digest of a corporate performance item read from the database is attached as the node information of the corresponding performance node.

The causal relationship registration unit 5d reads news stored in the news database 7 and extracts causal relationships between different economic events. Basically, a causal relationship is recognized when one news item contains multiple economic events and a pair of them is judged to stand in a cause-and-result relation. Specifically, each economic event is first identified on the basis of the structured digest described above. When two or more digests are extracted from one news item, pairs of digests between which a causal relationship is recognized are extracted on the basis of the wording of the news text. For example, given the news "... this year in Japan, due to a record cool summer, soft drink sales volume fell ...", the phrase "により" ("due to") indicates that the pair of economic events "record cool summer" and "fall in soft drinks" are causally related. To prevent economic events with widely different dates from being linked, temporal information may be taken into account when judging whether a causal relationship exists.
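The cue-phrase approach can be illustrated on the Japanese original of the example sentence. The sketch below only splits a sentence into a cause span and a result span around a cue phrase; mapping those spans to digests is not shown. `CUE_PHRASES` and `split_on_cue` are hypothetical names, and the single-entry cue list is an assumption for illustration.

```python
# Hypothetical list of cue phrases; in Japanese the cause precedes "により".
CUE_PHRASES = ("により",)


def split_on_cue(sentence):
    """Return (cause_span, result_span), or None if no cue phrase is found."""
    for cue in CUE_PHRASES:
        cause, found, result = sentence.partition(cue)
        cause = cause.strip("、 ")
        result = result.strip("、 。")
        if found and cause and result:
            return cause, result
    return None


# "This year in Japan, due to a record cool summer, soft drink sales volume fell"
pair = split_on_cue("今年日本では記録的冷夏により、清涼飲料水販売量が下落し")
no_pair = split_on_cue("清涼飲料水販売量が下落した。")
```

A date check could be added before accepting a pair, in line with the remark that temporal information may be considered.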

Besides the simple approach that uses specific phrases as cues, the approaches below allow causal relationships between economic events to be extracted more accurately. FIG. 11 illustrates the first approach, causal relationship extraction using text structure analysis. To identify the sentences in the text, paragraph numbers 1 to 5 are assigned to the individual sentences for convenience. The first approach determines whether two factors are located in paragraphs that stand in a cause-and-result relation within the structure of the text as a whole. Specifically, the relations between clauses, sentences, and paragraphs are estimated by text structure analysis, and for those relations that are cause-and-result relations, the "name", "element", and "fluctuation" contained in them become candidates for a causal relationship. This generalizes the cue-phrase approach into a broader, more systematic form.

FIG. 12 illustrates the second approach, causal relationship extraction using machine learning. In the second approach, training data (a set of known pairs of factors for which the presence or absence of a causal relationship is known) is converted into factor vectors in advance. A factor vector is a sequence of numbers obtained, for example, by using the technique called word2vec to vectorize the words of the "name", "element", and "fluctuation" individually and concatenating the results. A model is then trained on the training data by machine learning and kept as a trained database. A two-class classification method, such as a support vector machine (SVM) or a random forest, can be used as the machine learning method. Whether a causal relationship exists can then be judged by computing factor vectors from a pair of "region", "name", "element", and "fluctuation" sets and inputting them into the trained model.
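The construction of a factor vector by concatenation can be sketched as below. This is not a word2vec implementation: `TOY_EMBEDDINGS` is a hand-written, hypothetical stand-in for vectors that a trained word2vec model would supply, and joining the cause and result vectors into one feature row in `pair_features` is an assumption about how a pair would be fed to a two-class classifier.

```python
DIM = 2  # toy dimensionality; real word vectors are much longer

# Hypothetical stand-in for word vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "crude oil": [0.1, 0.9],
    "price": [0.8, 0.2],
    "rise": [0.5, 0.5],
    "gasoline": [0.2, 0.7],
}


def factor_vector(name, element, fluctuation):
    """Concatenate the word vectors of name, element, and fluctuation."""
    vector = []
    for word in (name, element, fluctuation):
        # Unknown words fall back to a zero vector in this sketch.
        vector.extend(TOY_EMBEDDINGS.get(word, [0.0] * DIM))
    return vector


def pair_features(cause, result):
    """Feature row for one (cause, result) candidate pair."""
    return factor_vector(*cause) + factor_vector(*result)


cause_vec = factor_vector("crude oil", "price", "rise")
row = pair_features(("crude oil", "price", "rise"), ("gasoline", "price", "rise"))
```

Rows built this way, labeled with whether a causal relationship is known to exist, could then be used to train a two-class classifier such as an SVM or a random forest.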

In the third approach, a summary of the news item is generated by document summarization as preprocessing, and the first or second approach is applied to the summary. Summarization can be extractive or abstractive. Extractive summarization summarizes using the original words and phrases; because redundant sentence structure is removed, it may improve the detection of the "name", "element", and "fluctuation" described above. Abstractive summarization generates words and phrases in the summary that are not in the original, so it may yield a "name", "element", "fluctuation", and causal relationships that cannot be obtained directly from the original text.

The causal relationship registration unit 5d also reads the settlement-related materials stored in the settlement-related database 8. When it determines that a settlement-related material contains an economic event that affected corporate performance (an account item and its variation), it registers in the causal relationship database 3 the connection between the performance node corresponding to that settlement-related information and the event node corresponding to the influencing factor. The causal relationship between corporate performance and an economic event may be extracted with the same methods used for causal relationships between economic events, but it is preferable to use the extraction method of Japanese Patent No. 6155409, already obtained by the present applicant. That patent discloses a method of analyzing settlement-related materials to extract the factors behind accounting events. Specifically, the text of the settlement-related material is first split into sentences, and morphological analysis is performed on each sentence to generate a morpheme sequence. Next, an attribute label unique to each attribute, covering at least the attributes that classify account items and amounts, is assigned to each morpheme or combination of morphemes in the sequence. The morpheme sequence abstracted by the attribute labels is then compared with factor patterns to determine whether it matches one. Finally, for a morpheme sequence that matches a factor pattern, the portion designated by the pattern is extracted as the factor, and the extracted factor is linked to the account item and amount information in that sequence.
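The label-and-match procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tokens are pre-segmented English phrases rather than the output of a Japanese morphological analyzer, and the label names, the lexicons, and the `FACTOR+` slot notation are hypothetical and are not taken from Japanese Patent No. 6155409.

```python
import re

# Hypothetical lexicons; a real system would rely on a morphological
# analyzer and much richer dictionaries.
ACCOUNTS = {"operating profit", "net sales"}
DIRECTIONS = {"increased", "decreased"}
CAUSE_MARKERS = {"due to", "because of"}
AMOUNT_RE = re.compile(r"^\d+(\.\d+)?%?$")

def label(token):
    """Assign an attribute label to one token (assumed labels)."""
    if token in ACCOUNTS:
        return "ACCOUNT"
    if token in DIRECTIONS:
        return "DIRECTION"
    if token in CAUSE_MARKERS:
        return "CAUSE"
    if AMOUNT_RE.match(token):
        return "AMOUNT"
    return "WORD"

# Hypothetical factor pattern: "FACTOR+" captures one or more WORD tokens.
PATTERN = ["ACCOUNT", "DIRECTION", "AMOUNT", "CAUSE", "FACTOR+"]

def extract_factor(tokens, pattern=PATTERN):
    """Return the captured slots if the labeled sequence matches, else None."""
    labels = [label(t) for t in tokens]
    i = 0
    result = {}
    for slot in pattern:
        if slot == "FACTOR+":
            start = i
            while i < len(labels) and labels[i] == "WORD":
                i += 1
            if i == start:
                return None  # the factor slot must capture at least one token
            result["factor"] = " ".join(tokens[start:i])
        else:
            if i >= len(labels) or labels[i] != slot:
                return None
            result[slot.lower()] = tokens[i]
            i += 1
    # Require the pattern to consume the whole sequence.
    return result if i == len(labels) else None

tokens = ["operating profit", "decreased", "5%", "due to",
          "higher", "aluminum", "prices"]
match = extract_factor(tokens)
```

In this sketch the extracted factor stays attached to the account item and amount simply because all of them are returned in the same dictionary.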

A causal relationship between a pair of economic events is registered in the causal relationship database 3 in a form indicating that the event node corresponding to the factor is connected to the event node corresponding to the result. Likewise, a causal relationship between an economic event and a corporate performance is registered in the causal relationship database 3 in a form indicating that the event node corresponding to the factor is connected to the performance node corresponding to the result.

The flow of processing in the correlation graph generation system 20 is described below in detail with reference to FIGS. 13 to 16. First, the news analysis unit 5a analyzes each news item individually and extracts the economic events it contains. As shown in FIG. 13, economic event 1 is extracted from news “N001”, economic events 2 and 3 from news “N002”, economic events 4 to 6 from news “N003”, economic events 7 and 8 from news “N004”, and economic events 9 to 11 from news “N005”.

Next, the node registration unit 5c registers the economic events extracted from the news in the causal relationship database 3 as new event nodes, on the condition that duplicate registration is prohibited. As shown in FIG. 14, economic event 1 extracted from news “N001” is not yet registered, so it is registered as a new event node “a1”. Two economic events, 2 and 3, are extracted from news “N002”, but economic event 2 is identical to the already registered economic event 1. Economic event 2 is therefore assigned the event node “a1” corresponding to economic event 1, and only economic event 3 is registered as a new event node “a2”. Three economic events, 4 to 6, are extracted from news “N003”, but economic event 4 is identical to the already registered economic event 1. Economic event 4 is therefore assigned the event node “a1”, and only economic events 5 and 6 are registered as new event nodes “a3” and “a4”. Two economic events, 7 and 8, are extracted from news “N004”, but economic event 7 is identical to the already registered economic event 6. Economic event 7 is therefore assigned the event node “a4” corresponding to economic event 6, and only economic event 8 is registered as a new event node “a5”. Three economic events, 9 to 11, are extracted from news “N005”, but economic event 9 is identical to the already registered economic event 5. Economic event 9 is therefore assigned the event node “a3” corresponding to economic event 5, and only economic events 10 and 11 are registered as new event nodes “a6” and “a7”.
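The duplicate-free registration walked through for FIG. 14 can be reproduced with a small sketch. The digest strings (“dA” to “dG”) are placeholders chosen only so that events 2 and 4 share a digest with event 1, event 7 with event 6, and event 9 with event 5; the function name and data layout are likewise assumptions made for illustration.

```python
def register_events(news_events):
    """Register each distinct digest once and map every event to its node.

    news_events: dict mapping a news ID to a list of (event_no, digest).
    Returns (node_of_digest, node_of_event).
    """
    node_of_digest = {}  # digest -> event node ID
    node_of_event = {}   # event number -> event node ID
    for events in news_events.values():
        for event_no, digest in events:
            if digest not in node_of_digest:
                # New digest: number a new node a1, a2, ... in order of arrival.
                node_of_digest[digest] = f"a{len(node_of_digest) + 1}"
            node_of_event[event_no] = node_of_digest[digest]
    return node_of_digest, node_of_event

# Placeholder digests mirroring the identities stated in the text.
news_events = {
    "N001": [(1, "dA")],
    "N002": [(2, "dA"), (3, "dB")],
    "N003": [(4, "dA"), (5, "dC"), (6, "dD")],
    "N004": [(7, "dD"), (8, "dE")],
    "N005": [(9, "dC"), (10, "dF"), (11, "dG")],
}
node_of_digest, node_of_event = register_events(news_events)
```

Keying the lookup on the digest rather than on the raw news text is what makes the identity check cheap in this sketch.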

Next, the causal relationship registration unit 5d extracts causal relationships between economic events from the news and registers them in the causal relationship database 3 as connections in which factor and result can be distinguished. As shown in FIG. 15, a causal relationship between economic event 2 (factor) and economic event 3 (result) is extracted from news “N002”. As a result, a connection starting at the event node “a1” corresponding to economic event 2 and ending at the event node “a2” corresponding to economic event 3 is registered. From news “N003”, a causal relationship between economic event 4 (factor) and economic event 5 (result) and a causal relationship between economic event 5 (factor) and economic event 6 (result) are extracted. For the former, a connection starting at the event node “a1” corresponding to economic event 4 and ending at the event node “a3” corresponding to economic event 5 is registered. For the latter, a connection starting at the event node “a3” corresponding to economic event 5 and ending at the event node “a4” corresponding to economic event 6 is registered. From news “N004”, a causal relationship between economic event 7 (factor) and economic event 8 (result) is extracted. As a result, a connection starting at the event node “a4” corresponding to economic event 7 and ending at the event node “a5” corresponding to economic event 8 is registered. From news “N005”, a causal relationship between economic event 9 (factor) and economic event 10 (result) and a causal relationship between economic event 10 (factor) and economic event 11 (result) are extracted. For the former, a connection starting at the event node “a3” corresponding to economic event 9 and ending at the event node “a6” corresponding to economic event 10 is registered. For the latter, a connection starting at the event node “a6” corresponding to economic event 10 and ending at the event node “a7” corresponding to economic event 11 is registered.
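As a rough illustration of the FIG. 15 walkthrough, the sketch below turns (factor event, result event) pairs into directed connections between event nodes. The event-to-node mapping is copied from the description; the function name is hypothetical, and skipping an already registered connection is an added assumption that the text does not state.

```python
# Event number -> event node ID, as assigned in the FIG. 14 description.
node_of_event = {
    1: "a1", 2: "a1", 3: "a2", 4: "a1", 5: "a3", 6: "a4",
    7: "a4", 8: "a5", 9: "a3", 10: "a6", 11: "a7",
}

# (factor event, result event) pairs extracted from news N002 to N005.
causal_pairs = [(2, 3), (4, 5), (5, 6), (7, 8), (9, 10), (10, 11)]

def register_connections(pairs, node_of_event):
    """Map each causal pair to a directed (start node, end node) connection."""
    connections = []
    for factor, result in pairs:
        edge = (node_of_event[factor], node_of_event[result])
        if edge not in connections:  # assumption: no duplicate connections
            connections.append(edge)
    return connections

connections = register_connections(causal_pairs, node_of_event)
```

Because events that share a digest share a node, chains found in different news items join up automatically; here “a3” becomes a branch point leading to both “a4” and “a6”.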

Then, the causal relationship registration unit 5d extracts causal relationships between event nodes and performance nodes from the settlement-related materials and registers them in the causal relationship database 3 as connections in which factor and result can be distinguished. As shown in FIG. 16, suppose that a causal relationship between an economic event (factor) and a corporate performance (result) is extracted from settlement-related material “K001”, that the former is identical to the economic event of event node “a2”, and that the latter is not yet registered. In that case, a new performance node “b1” corresponding to the corporate performance is registered, together with a connection starting at event node “a2” and ending at performance node “b1”. Suppose likewise that a causal relationship between an economic event (factor) and a corporate performance (result) is extracted from settlement-related material “K002”, that the former is identical to the economic event of event node “a4”, and that the latter is not yet registered. A new performance node “b2” is then registered, together with a connection starting at event node “a4” and ending at performance node “b2”. Finally, suppose that a causal relationship between an economic event (factor) and a corporate performance (result) is extracted from settlement-related material “K003”, that the former is identical to the economic event of event node “a5”, and that the latter is not yet registered. A new performance node “b3” is then registered, together with a connection starting at event node “a5” and ending at performance node “b3”.
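Once the performance nodes are attached, the factors behind a given corporate performance can be read off by walking the connections backwards. The sketch below is one possible way to do that, assuming the connections from the FIG. 15 and FIG. 16 descriptions are held as a plain list of (start, end) pairs; the `upstream` helper is hypothetical and is not described in the source.

```python
from collections import deque

# Event-to-event connections (FIG. 15) plus event-to-performance
# connections (FIG. 16), as (start node, end node) pairs.
connections = [
    ("a1", "a2"), ("a1", "a3"), ("a3", "a4"),
    ("a4", "a5"), ("a3", "a6"), ("a6", "a7"),
    ("a2", "b1"), ("a4", "b2"), ("a5", "b3"),
]

def upstream(node, connections):
    """Return all nodes that lead to `node`, nearest factors first."""
    parents = {}
    for start, end in connections:
        parents.setdefault(end, []).append(start)
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

factors_of_b3 = upstream("b3", connections)
factors_of_b1 = upstream("b1", connections)
```

A breadth-first walk is used here so that direct factors appear before more distant ones; any other traversal order would serve equally well for building the graph itself.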

Through the series of processes described above, a causal relationship such as the one shown in FIG. 5, for example, is built up as the content held in the causal relationship database 3.

Finally, as an extension of the correlation graph, a predicted correlation graph that predicts causal relationships between economic events and corporate performance is described. The upper part of FIG. 17 shows an example of a correlation graph generated by the extraction process described above. With event node a11, a “decrease” of “record cold summer” in “Japan”, as the root, the graph passes through a “decrease” in “soft drink sales volume” in “Japan” (event node a12), a “decrease” in “aluminum demand” in “Japan” (event node a13), and a “decrease” in “aluminum price” in “Japan” (event node a14), and reaches performance node b10, which indicates that a specific company manufacturing products from aluminum (○○ Material) recorded an “increase in operating profit”.

Here, as a hypothetical case, consider that the direction of change of event node a11 is reversed from “decrease” to “increase”. In that case, the correlation graph above suggests that the directions of change of the downstream event nodes a12 to a14 and of performance node b10 are also reversed. That is, with predicted event node a51, an “increase” of “record cold summer” in “Japan”, as the root, the graph passes through an “increase” in “soft drink sales volume” in “Japan” (predicted event node a52), an “increase” in “aluminum demand” in “Japan” (predicted event node a53), and a “rise” in “aluminum price” in “Japan” (predicted event node a54), and finally reaches predicted performance node b50; it can reasonably be expected that the specific company manufacturing products from aluminum (○○ Material) will see a “decrease in operating profit”. In this way, even when an economic event or corporate performance has not actually occurred, predicting it as a reversed event or reversed performance based on what did occur, and visualizing the result as a predicted correlation graph, provides useful information to the user.

FIG. 18 is an explanatory diagram of the process of generating predicted event nodes and predicted performance nodes. First, for a series of registered nodes a11 to a14 and b10 that are registered in the causal relationship database 3 and connected to one another as event nodes and a performance node, the causal relationship registration unit 5d assigns new node IDs a51 to a54 and b50 and registers them in the causal relationship database 3 as predicted nodes (predicted event nodes and a predicted performance node). Each predicted node is a copy of the corresponding registered node, including its connections, with the direction of change reversed (for example, “decrease” → “increase”). This produces the predicted correlation graph shown in the lower part of FIG. 17.
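The copy-and-reverse step can be sketched as below. The node contents follow the FIG. 17 example, but the data layout, the `INVERSE` table, and the `make_predicted` helper are assumptions made for illustration; the sketch also simplifies every direction to “increase” or “decrease”, whereas the description uses “rise” for the aluminum price.

```python
# Hypothetical table of reversed directions of change.
INVERSE = {"decrease": "increase", "increase": "decrease"}

# Registered chain from the upper part of FIG. 17: ID -> (subject, direction).
registered = {
    "a11": ("record cold summer", "decrease"),
    "a12": ("soft drink sales volume", "decrease"),
    "a13": ("aluminum demand", "decrease"),
    "a14": ("aluminum price", "decrease"),
    "b10": ("operating profit", "increase"),
}
connections = [("a11", "a12"), ("a12", "a13"), ("a13", "a14"), ("a14", "b10")]

# Newly numbered IDs for the predicted nodes, as in the example.
new_ids = {"a11": "a51", "a12": "a52", "a13": "a53", "a14": "a54", "b10": "b50"}

def make_predicted(registered, connections, new_ids):
    """Copy nodes and connections under new IDs with directions reversed."""
    predicted_nodes = {
        new_ids[node_id]: (subject, INVERSE[direction])
        for node_id, (subject, direction) in registered.items()
    }
    predicted_connections = [
        (new_ids[start], new_ids[end]) for start, end in connections
    ]
    return predicted_nodes, predicted_connections

predicted_nodes, predicted_connections = make_predicted(
    registered, connections, new_ids
)
```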

When predicted nodes are registered, an inversion dictionary holding multiple sets of mutually inverse expressions is consulted, and registration requires that an expression corresponding to the reversed direction of change of the registered node exists (for example, “increase” for “decrease”). Accordingly, no predicted node is registered for a registered node that has no inverse expression. For example, if it is determined that no inverse expression exists for registered node a13, predicted node a53 is not registered. As a result, the two mutually connected predicted nodes a51 and a52 and the two mutually connected predicted nodes a54 and b50 are registered as separate fragments.
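The effect of a missing inverse expression can be illustrated with a short sketch. It assumes a hypothetical inversion dictionary and gives node a13 a made-up direction expression (“sluggish”) that has no registered inverse; connections touching the dropped node are discarded, and the surviving predicted nodes are grouped into connected fragments.

```python
# Hypothetical inversion dictionary: sets of mutually inverse expressions.
INVERSION_SETS = [("decrease", "increase"), ("fall", "rise")]
INVERSE = {}
for first, second in INVERSION_SETS:
    INVERSE[first] = second
    INVERSE[second] = first

# Directions of the registered nodes; "sluggish" is a made-up expression
# with no inverse, standing in for the a13 case in the text.
directions = {"a11": "decrease", "a12": "decrease", "a13": "sluggish",
              "a14": "decrease", "b10": "increase"}
connections = [("a11", "a12"), ("a12", "a13"), ("a13", "a14"), ("a14", "b10")]
new_ids = {"a11": "a51", "a12": "a52", "a13": "a53", "a14": "a54", "b10": "b50"}

def predict(directions, connections, new_ids):
    """Register predicted nodes only where an inverse expression exists."""
    nodes = {new_ids[n]: INVERSE[d] for n, d in directions.items() if d in INVERSE}
    edges = [(new_ids[s], new_ids[e]) for s, e in connections
             if new_ids[s] in nodes and new_ids[e] in nodes]
    return nodes, edges

def fragments(nodes, edges):
    """Group nodes into connected fragments, ignoring edge direction."""
    adjacent = {n: set() for n in nodes}
    for start, end in edges:
        adjacent[start].add(end)
        adjacent[end].add(start)
    seen, result = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacent[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result

predicted_nodes, predicted_edges = predict(directions, connections, new_ids)
predicted_fragments = fragments(predicted_nodes, predicted_edges)
```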

As described above, according to the present embodiment, the correlation between economic events and corporate performance is visualized through the correlation graph, and the factors behind changes in corporate performance can be grasped easily, which improves user convenience.

Further, according to the present embodiment, the user can easily obtain a desired correlation graph simply by selecting from the search candidates displayed in the search candidate display area 1a, without any complicated operation. The correlation graph can also be switched easily by changing the search target. Providing different analysis services, such as news analysis and profit impact factor analysis, further enhances user convenience.

Further, according to the present embodiment, displaying digests of economic events and corporate performance as the node information of the nodes constituting the correlation graph enhances user convenience still further.

Furthermore, according to the present embodiment, causal relationships are extracted on a digest basis that eliminates informational redundancy, including when judging whether the content of an economic event or corporate performance is identical to one already registered in the causal relationship database 3. The digest of an economic event is structured so as to characterize the event concisely, taking into account how news is written, and has an item representing a quantity or tendency of the economic event and an item representing its direction of change. Likewise, the digest of a corporate performance is structured so as to characterize the performance concisely, taking into account how settlement-related materials are written, and has an item representing an account item and an item representing its direction of change. This makes it possible to automate efficiently, by computer processing, the extraction of causal relationships between economic events and corporate performance.
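One possible way to represent such digests is sketched below. The field names, the example values, and the small synonym table standing in for a dictionary of expressions treated as identical are all assumptions made for illustration; the source only states which kinds of items a digest contains.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EventDigest:
    name: str       # assumed: where or to whom the event applies
    element: str    # quantity or tendency of the economic event
    direction: str  # direction of change (no amount of change is stored)

@dataclass(frozen=True)
class PerformanceDigest:
    company: str    # assumed field identifying the company
    account: str    # account item
    direction: str  # direction of change

# Hypothetical table of direction expressions regarded as identical.
SAME_DIRECTION = {"fall": "decrease", "drop": "decrease", "rise": "increase"}

def normalize(digest):
    """Map the direction to its canonical expression before comparison."""
    canonical = SAME_DIRECTION.get(digest.direction, digest.direction)
    return replace(digest, direction=canonical)

first = normalize(EventDigest("Japan", "aluminum price", "fall"))
second = normalize(EventDigest("Japan", "aluminum price", "decrease"))
distinct_digests = {first, second}

performance = normalize(PerformanceDigest("○○ Material", "operating profit", "rise"))
```

Frozen dataclasses are hashable, so two digests with the same items compare equal and collapse to a single entry in a set or dictionary, which is the property a duplicate check needs.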

1 Display screen
1a Search candidate display area
1b Correlation graph display area
1c News body display area
1d Affected company display area
2 Display switching unit
3 Causal relationship database
4 Correlation graph generation unit
5 Causal relationship extraction system
5a News analysis unit
5b Settlement analysis unit
5c Node registration unit
5d Causal relationship registration unit
7 News database
8 Settlement-related database
10 Correlation graph display device
20 Correlation graph generation system

Claims

A causal relationship extraction system that extracts causal relationships between economic events and corporate performance, comprising:
a causal relationship database that holds a plurality of event nodes to which digests of economic events are attached, a plurality of performance nodes to which digests of corporate performance are attached, connections between nodes for pairs of specific economic events, and connections between nodes for pairs of a specific economic event and a specific corporate performance;
a news analysis unit that analyzes the content of news to extract economic events and generates digests of the economic events;
a settlement analysis unit that analyzes the content of settlement-related materials to extract economic events and corporate performance and generates digests of the economic events and digests of the corporate performance;
a node registration unit that, for an extracted economic event, registers in the causal relationship database an event node to which the digest of the economic event is attached, on the condition that no event node with the same digest is registered, and that, for an extracted corporate performance, registers in the causal relationship database a performance node to which the digest of the corporate performance is attached, on the condition that no performance node with the same digest is registered; and
a causal relationship registration unit that, when a causal relationship between different specific economic events is extracted from news by a first extraction method that uses specific words or phrases indicating a causal relationship as clues, or by a second extraction method based on machine learning using training data consisting of sets of two factors known to have a causal relationship, registers in the causal relationship database the connection between the nodes for the pair of economic events based on the extracted causal relationship, and that, when a causal relationship between a specific corporate performance and a specific economic event is extracted from settlement-related materials by the first extraction method, the second extraction method, or a third extraction method that compares a morpheme sequence abstracted by attribute labels with a factor pattern and, for a morpheme sequence matching the factor pattern, extracts the portion designated by the factor pattern as a factor, registers in the causal relationship database the connection between the nodes for the pair of the economic event and the corporate performance based on the extracted causal relationship,
wherein the digest of an economic event is structured by dividing the content of the economic event into a plurality of predetermined items, the plurality of items including an item representing a quantity or tendency of the economic event and an item representing its direction of change (but not including the amount of change), and
wherein the digest of a corporate performance is structured by dividing the content of the corporate performance into a plurality of predetermined items, the plurality of items including an item representing an account item and an item representing its direction of change.

The causal relationship extraction system according to claim 1, wherein the node registration unit refers to a name identification dictionary in which a plurality of expression patterns regarded as the same digest are registered, and determines, for the extracted economic event and the extracted corporate performance, whether an event node with the same digest and a performance node with the same digest are registered in the causal relationship database.

The causal relationship extraction system according to claim 1, wherein the causal relationship registration unit copies a series of registered nodes that are registered in the causal relationship database and connected as event nodes and performance nodes, with their directions of change reversed, and registers the copies in the causal relationship database as predicted event nodes and predicted performance nodes.

The causal relationship extraction system according to claim 3, wherein the causal relationship registration unit refers to an inversion dictionary in which a plurality of sets of mutually inverse expressions are registered, and registers the predicted event nodes and the predicted performance nodes on the condition that an expression corresponding to the reversal of the direction of change of the registered node exists.

A causal relationship extraction program that extracts causal relationships between economic events and corporate performance, the program causing a computer to execute processing comprising:
a first step of analyzing the content of news to extract economic events and generating digests of the economic events;
a second step of analyzing the content of settlement-related materials to extract economic events and corporate performance and generating digests of the economic events and digests of the corporate performance;
a third step of registering in a causal relationship database, for an extracted economic event, an event node to which the digest of the economic event is attached, on the condition that no event node with the same digest is registered;
a fourth step of registering in the causal relationship database, for an extracted corporate performance, a performance node to which the digest of the corporate performance is attached, on the condition that no performance node with the same digest is registered;
a fifth step of, when a causal relationship between different specific economic events is extracted from news by a first extraction method that uses specific words or phrases indicating a causal relationship as clues, or by a second extraction method based on machine learning using training data consisting of sets of two factors known to have a causal relationship, registering in the causal relationship database the connection between the nodes for the pair of economic events based on the extracted causal relationship; and
a sixth step of, when a causal relationship between a specific corporate performance and a specific economic event is extracted from settlement-related materials by the first extraction method, the second extraction method, or a third extraction method that compares a morpheme sequence abstracted by attribute labels with a factor pattern and, for a morpheme sequence matching the factor pattern, extracts the portion designated by the factor pattern as a factor, registering in the causal relationship database the connection between the nodes for the pair of the economic event and the corporate performance based on the extracted causal relationship,
wherein the digest of an economic event is structured by dividing the content of the economic event into a plurality of predetermined items, the plurality of items including an item representing a quantity or tendency of the economic event and an item representing its direction of change (but not including the amount of change), and
wherein the digest of a corporate performance is structured by dividing the content of the corporate performance into a plurality of predetermined items, the plurality of items including an item representing an account item and an item representing its direction of change.

The causal relationship extraction program according to claim 5, wherein the third and fourth steps include a step of referring to a name identification dictionary in which a plurality of expression patterns regarded as the same digest are registered, and determining, for the extracted economic event and the extracted corporate performance, whether an event node with the same digest and a performance node with the same digest are registered in the causal relationship database.

The causal relationship extraction program according to claim 5, further comprising a seventh step of copying a series of registered nodes that are registered in the causal relationship database and connected as event nodes and performance nodes, with their directions of change reversed, and registering the copies in the causal relationship database as predicted event nodes and predicted performance nodes.

The causal relationship extraction program according to claim 7, wherein the seventh step refers to an inversion dictionary in which a plurality of sets of mutually inverse expressions are registered, and registers the predicted event nodes and the predicted performance nodes on the condition that an expression corresponding to the reversal of the direction of change of the registered node exists.