JP2017211950A

JP2017211950A - Data correlating device and method

Info

Publication number: JP2017211950A
Application number: JP2016106688A
Authority: JP
Inventors: Makoto Iwayama; Bin Tong
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-05-27
Filing date: 2016-05-27
Publication date: 2017-11-30
Anticipated expiration: 2036-05-27
Also published as: JP6623119B2

Abstract

PROBLEM TO BE SOLVED: To provide a data association device and method capable of associating data with high accuracy.
SOLUTION: A data association device and method learn a correspondence model between first and second series data obtained from the same data source and, on the basis of the learned model, associate target data belonging to one of the first and second series data with data belonging to the other. The first and second series data obtained from the same data source are each vectorized, and the correspondence model is learned from the vectorized first and second series data. A degree of correlation, i.e. the degree to which data acquired from any two different data sources are correlated with each other, is defined in advance, and the correspondence model is learned using this degree of correlation.
SELECTED DRAWING: Figure 10

Description

The present invention relates to a data association apparatus and method, and is suitable for application to, for example, a data association apparatus that associates sensor data obtained during shale oil and gas extraction with the text data of reports.

Conventionally, in shale oil and gas extraction, various values such as the gamma-ray dose are measured periodically during drilling by sensors attached to the tip of the drill, while rock samples are collected at fixed intervals and their features (color, hardness, presence or absence of oil stains, etc.) are reported as text. The operator decides on the next operation based on these pieces of information.

Sensor data is numerical and therefore difficult to interpret, but it has the advantage that it can be collected fully automatically. Reports, on the other hand, are text and therefore easy to interpret, but they are costly to produce. As a result, there are points where sensor data exists but no report does.

If appropriate report text could be selected from the existing reports and associated with the sensor data at points where no report exists, the two kinds of data would compensate for each other's weaknesses, and the result could serve as valuable material for the operator when deciding on the next operation.

Techniques for associating different types of data are disclosed in Patent Documents 1 and 2. The technique of Patent Document 1 associates numerical data obtained from sensors with text describing human movement (for example, "walking slowly"). The technique of Patent Document 2 uses photographs of dishes and their recipes to associate dish photographs with ingredients.

Patent Document 1: JP 2013-250862 A
Patent Document 2: JP 2015-41225 A

In general, a data association apparatus that associates different types of data learns the correspondence between data from two series of data obtained from the same data source; when new data of one series is given, it uses the learned correspondence to output the data of the other series that corresponds to the new data.

When such an apparatus learns the correspondence between two series of data, it determines the formula that maps the two series onto each other so as to minimize the distance between data obtained from the same data source. There are, however, other distances that should also be taken into account.

In shale oil and gas drilling, for example, when text data and numerical data such as those described above are acquired in physically close formations, the distance between those data should also be small. Conventional methods for computing the correspondence do not take such distances between data sources into account, and therefore cannot associate data with high accuracy.

The present invention has been made in view of the above points and proposes a data association apparatus and method capable of associating data with high accuracy.

To solve this problem, the present invention provides a data association apparatus that learns a correspondence model between first and second series data obtained from the same data source and, based on the learned correspondence model, associates target data belonging to one of the first and second series data with data belonging to the other. The apparatus comprises a vectorization unit that vectorizes each of the first and second series data obtained from the same data source, and a correspondence model learning unit that learns the correspondence model of the first and second series data based on the vectorized first and second series data. A degree of correlation, which is the degree of correlation between data of the first and second series data acquired from any two different data sources, is defined in advance, and the correspondence model learning unit learns the correspondence model using this degree of correlation.
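This section does not give the form of the learning objective, so the following is only a minimal sketch of one way a predefined degree of correlation could enter it. It assumes a linear correspondence model (hypothetical matrices `w_text` and `w_num`), treats each depth range segment as a data source identified by its midpoint depth, and uses a hypothetical exponential decay `correlation_degree` with an assumed 500 m scale; none of these choices comes from the source.

```python
import math


def correlation_degree(depth_a, depth_b, scale=500.0):
    """Hypothetical degree of correlation between two data sources.

    Assumption: correlation decays exponentially with the distance between
    segment midpoint depths. The source only says the degree is predefined.
    """
    return math.exp(-abs(depth_a - depth_b) / scale)


def project(matrix, vector):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weighted_objective(text_vecs, num_vecs, depths, w_text, w_num, lam=0.1):
    """Loss of a candidate linear correspondence model (hypothetical form).

    Term 1: text/numerical pairs from the same source should map close together.
    Term 2: pairs from different sources are also pulled together, weighted
    by their degree of correlation and by the assumed factor `lam`.
    """
    pt = [project(w_text, x) for x in text_vecs]
    pn = [project(w_num, y) for y in num_vecs]
    loss = 0.0
    for i in range(len(depths)):
        loss += squared_distance(pt[i], pn[i])
        for j in range(len(depths)):
            if i != j:
                r = correlation_degree(depths[i], depths[j])
                loss += lam * r * squared_distance(pt[i], pn[j])
    return loss
```

A learner would minimize `weighted_objective` over `w_text` and `w_num`; with `lam=0` it reduces to the conventional same-source-only criterion described above.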

The present invention also provides a data association method executed in a data association apparatus that learns a correspondence model between first and second series data obtained from the same data source and, based on the learned correspondence model, associates target data belonging to one of the first and second series data with data belonging to the other. The method comprises a first step in which the data association apparatus vectorizes each of the first and second series data obtained from the same data source, and a second step in which the data association apparatus learns the correspondence model of the first and second series data based on the vectorized first and second series data. A degree of correlation between data of the first and second series data acquired from any two different data sources is defined in advance, and in the second step the data association apparatus learns the correspondence model using this degree of correlation.

With this data association apparatus and method, a more accurate correspondence model can be learned by taking the degree of correlation between data sources into account.

According to the present invention, data can be associated with higher accuracy.

FIG. 1 is a block diagram showing the hardware configuration of the data association apparatus according to the first and second embodiments.
FIG. 2 is a block diagram showing the logical configuration of the data association apparatus according to the first and second embodiments.
FIG. 3 is a conceptual diagram outlining shale oil and gas drilling.
FIG. 4 is a conceptual diagram showing an example structure of text data.
FIG. 5 is a conceptual diagram showing an example structure of vectorized text data in the first embodiment.
FIG. 6 is a flowchart showing the procedure of the text data vectorization process according to the first embodiment.
FIG. 7 is a conceptual diagram showing an example structure of numerical data.
FIG. 8 is a conceptual diagram showing an example structure of vectorized numerical data in the first embodiment.
FIG. 9 is a flowchart showing the procedure of the numerical data vectorization process according to the first embodiment.
FIG. 10 is a conceptual diagram outlining the processing of the correspondence model learning unit in the first embodiment.
FIG. 11 is a conceptual diagram explaining the degree of correlation based on physical distance.
FIG. 12 is a flowchart showing the procedure of the correspondence model learning process according to the first embodiment.
FIG. 13 is a flowchart showing the procedure of the correspondence data search process performed by the correspondence data search unit.
FIG. 14 is a conceptual diagram showing an example structure of vectorized text data in the second embodiment.
FIG. 15 is a conceptual diagram showing an example structure of vectorized numerical data in the second embodiment.
FIG. 16 is a flow diagram outlining the processing of the correspondence model learning unit according to the second embodiment.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(1) First Embodiment
(1-1) Configuration of the Data Association Apparatus According to This Embodiment
In FIG. 1, reference numeral 1 denotes the data association apparatus according to this embodiment as a whole, and the figure shows its hardware configuration. The data association apparatus 1 comprises a processor 2, a memory 3, an auxiliary storage device 4, and an input/output interface 5.

The processor 2 is a device that controls the operation of the data association apparatus 1 as a whole. The memory 3 is composed of, for example, semiconductor memory and is used mainly to hold programs and data temporarily. A data vectorization program 10, a correspondence model learning program 11, and a correspondence data search program 12, described later, are also stored in the memory 3.

The auxiliary storage device 4 is a large-capacity storage device such as a hard disk drive or an SSD (Solid State Drive) and is used to hold programs and data for long periods. Programs stored in the auxiliary storage device 4 are loaded into the memory 3 at startup or when needed, and the processor 2 executes them to carry out the various processes of the data association apparatus 1 as a whole.

The input/output interface 5 is an interface for connecting peripheral devices to the data association apparatus 1, such as an input device 13 (for example, a keyboard and mouse) and a display device 14 (for example, a liquid crystal or organic EL display). The input device 13 is a hardware device with which the user inputs instructions and information into the data association apparatus 1, and the display device 14 is a hardware device that displays various input/output screens.

FIG. 2 shows the logical configuration of the data association apparatus 1 according to this embodiment. The data association apparatus 1 comprises a data vectorization unit 20, a correspondence model learning unit 21, a correspondence data search unit 22, a data storage unit 23, and a correspondence model storage unit 24.

The data vectorization unit 20 is a functional unit realized by the processor 2 executing the data vectorization program 10 (FIG. 1) loaded into the memory 3. In this embodiment it consists of a text data vectorization unit 25 and a numerical data vectorization unit 26.

The text data vectorization unit 25 is a functional unit that vectorizes text data 27. In the learning mode described later, it stores the vectorized text data 27 (hereinafter called a text data vector) together with the original text data 27 in the data storage unit 23; in the correspondence data search mode described later, it outputs the text data and its text data vector to the correspondence data search unit 22.

Similarly, the numerical data vectorization unit 26 is a functional unit that vectorizes numerical data 28. In the learning mode, it stores the vectorized numerical data 28 (hereinafter called a numerical data vector) together with the original numerical data 28 in the data storage unit 23; in the correspondence data search mode, it outputs the numerical data 28 and its numerical data vector to the correspondence data search unit 22.

The correspondence model learning unit 21 is a functional unit realized by the processor 2 executing the correspondence model learning program 11 (FIG. 1) loaded into the memory 3. It learns the correspondence between the text data 27 and the numerical data 28 from the text data vectors and numerical data vectors stored in the data storage unit 23, and stores the resulting correspondence model in the correspondence model storage unit 24.

The correspondence data search unit 22 is a functional unit that operates in the correspondence data search mode. When text data given as target data 29 has been vectorized by the data vectorization unit 20, it searches the numerical data 28 stored in the data storage unit 23 for the numerical data corresponding to that text data; conversely, when numerical data given as target data 29 has been vectorized, it searches the stored text data 27 for the corresponding text data. In both cases it refers to the correspondence model stored in the correspondence model storage unit 24.

The data storage unit 23 and the correspondence model storage unit 24 are storage areas reserved in advance in the memory 3 (FIG. 1) or the auxiliary storage device 4 (FIG. 1).

In the data association apparatus 1 configured as above, in the learning mode, in which the correspondence model between the text data 27 and the numerical data 28 is learned, multiple pairs of text data 27 and numerical data 28 that have already been associated with each other are given sequentially as training data.

When such training data is given, the data association apparatus 1 vectorizes the text data 27 in the text data vectorization unit 25 and the numerical data 28 in the numerical data vectorization unit 26, and stores the resulting text data vectors and numerical data vectors, together with the original text data 27 and numerical data 28, in the data storage unit 23.

The correspondence model learning unit 21 then generates a correspondence model of the text data 27 and the numerical data 28 from the already-associated pairs of text data vectors and numerical data vectors stored in the data storage unit 23, and stores the generated model in the correspondence model storage unit 24.

Once this is done, the data association apparatus 1 can execute the correspondence data search process: when text data or numerical data is given as target data 29, it searches the numerical data or text data stored in the data storage unit 23 for the data corresponding to it.

The operation mode of the data association apparatus 1 is then switched to the correspondence data search mode, in which this process is executed. When target data 29 to be associated is given, the apparatus vectorizes it in the text data vectorization unit 25 if it is text data, or in the numerical data vectorization unit 26 if it is numerical data. The vectorized target data 29 (hereinafter called the target data vector) is then passed to the correspondence data search unit 22.

On receiving the target data vector from the data vectorization unit 20, the correspondence data search unit 22 reads the correspondence model stored in the correspondence model storage unit 24 and uses it to search the data storage unit 23 for the text data vector or numerical data vector most suitable for association with the target data vector. When the search finds such a vector, the correspondence data search unit 22 outputs the original text data or numerical data from which that vector was derived as correspondence data 30. Predetermined information about the correspondence data 30 is then displayed on the display device 14 as the search result for the target data 29.
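As an illustration of the search step, the following minimal sketch assumes that the correspondence model consists of two hypothetical linear projections into a shared space and that "most suitable" means highest cosine similarity there. The source does not specify either choice, and `search_corresponding` and its arguments are hypothetical names.

```python
import math


def project(matrix, vector):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def search_corresponding(target_vec, w_source, stored, w_other):
    """Return (key, score) of the stored vector of the other series closest to the target.

    stored: dict mapping an identifier of the original data (e.g. a report
    text) to its vector; w_source / w_other: hypothetical projections for the
    target's series and the other series.
    """
    t = project(w_source, target_vec)
    best_key, best_score = None, -math.inf
    for key, vec in stored.items():
        score = cosine(t, project(w_other, vec))
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score
```

The returned key plays the role of the correspondence data 30: it identifies the original record to display, not the vector itself.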

(1-2) Target Data of This Embodiment
FIG. 3 outlines the shale oil and gas drilling targeted by this embodiment. In shale oil and gas drilling, the oil well 40 is drilled downward, and once the shale layer is reached, the drilling direction is turned horizontal. Reference numeral 41 denotes the drilling path. In the shale layer, fractures 42 are created in the rock by hydraulic fracturing, and oil and gas are extracted through these fractures.

During such drilling, two types of information are collected as logs. The first is numerical data collected from sensors mounted on the drill; for example, the gamma-ray dose is measured to estimate the carbon content. The second is text data describing the features of sampled rocks, such as their color, hardness, and the presence or absence of oil stains, written in a predetermined format. Both are acquired at regular intervals along the drilling path 41. From these two types of log information, the operator decides where to switch to horizontal drilling and where to perform hydraulic fracturing.

Text data is easy for the operator to understand, but sampling and writing it up is costly, so it is acquired at relatively long intervals. Numerical data, by contrast, is acquired automatically and therefore at short intervals, but interpreting it requires specialist knowledge.

In this embodiment, therefore, the data association apparatus 1 described above with reference to FIGS. 1 and 2 learns the correspondence between text data and numerical data obtained at the same points along the drilling path 41 and generates one from the other automatically (more precisely, it associates with the given data the most plausible item among the text data or numerical data obtained so far), so that each type of data compensates for the other's weaknesses. As a result, even at points where no rock was sampled, the learned correspondence can be used to select, from the existing text data, the text data estimated to be closest to the numerical data at that point and to associate it with that numerical data.

(1-3) Processing of the Text Data Vectorization Unit
FIG. 4 shows an example structure of the text data 27 according to this embodiment. In the example of FIG. 4, the text data 27 consists of depth range data parts 50, each a character string representing the range of depths at which a rock was sampled (hereinafter called a first depth range), and detailed description data parts 51, each a text description of the rock sampled in the formation within the corresponding first depth range. Here, "depth" means the length of the drilling path from the entrance of the oil well 40.

Each detailed description data part 51 describes the features of the rock sampled in the corresponding first depth range according to a fixed rule. FIG. 4 shows an example in which the features are written following the rule "list color, hardness, surface smoothness, presence or absence of oil stains, and so on, in that order, separated by commas". The detailed description data parts 51 are entered manually. The text data 27 is thus series data describing the rock features for each first depth range.

FIG. 5 shows an example structure of a text data vector 52 obtained by vectorizing such text data 27 in the text data vectorization unit 25 (FIG. 2). In this embodiment, the text data vectorization unit 25 divides the text data 27 into fixed depth intervals each containing several first depth ranges (for example, every 500 m in the case of FIG. 4; hereinafter called depth range segments), and for each depth range segment it vectorizes together the text data of all first depth ranges belonging to that segment.

The elements of the text data vector 52 are the frequencies with which each value of each attribute (feature) of the sampled rocks appears in the texts of the text data 27 for the first depth ranges belonging to the corresponding depth range segment. In the example of FIG. 5, the listed rock attributes include "color", "hardness", and "surface smoothness", and the frequencies of the values are listed for each attribute. For the attribute "color", for example, the figure shows that in the detailed description data parts 51 of the text data 27 for the first depth ranges in the corresponding segment, the value "red" appeared 10 times, "yellow" 0 times, and "brown" 5 times, and that, across three consecutive first depth ranges, the color sequence "red-red-red" appeared 0 times, "red-red-yellow" 2 times, and "brown-brown-brown" 3 times.
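The frequency features described for FIG. 5 can be sketched as follows, assuming the attribute values have already been extracted for each first depth range and are listed in depth order. The function name and the `"attribute:value"` key format are hypothetical, and computing three-value sequences for every attribute (FIG. 5 shows them only for color) is an assumption.

```python
from collections import Counter


def text_feature_vector(samples, attributes=("color", "hardness"), n=3):
    """Count attribute values and runs of n consecutive values in one segment.

    samples: list of dicts, one per first depth range in depth order,
    e.g. {"color": "red", "hardness": "hard"} (hypothetical format).
    """
    features = Counter()
    for attr in attributes:
        values = [s[attr] for s in samples]
        # Single-value frequencies, e.g. "color:red" -> 10 in FIG. 5.
        for v in values:
            features[f"{attr}:{v}"] += 1
        # Sequences across n consecutive depth ranges, e.g. "color:red-red-yellow".
        for i in range(len(values) - n + 1):
            features[f"{attr}:{'-'.join(values[i:i + n])}"] += 1
    return features
```

Keeping the features in a `Counter` means values that never appear (such as "yellow" with 0 occurrences) simply read back as 0 when the vector is later laid out over a fixed vocabulary.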

FIG. 6 shows the procedure of the text data vectorization process executed by the text data vectorization unit 25 in the learning mode when a series of text data 27, each already associated with numerical data 28, is given in order as training data. The text data vectorization unit 25 vectorizes the text data 27 according to this procedure.

Specifically, when text data 27 is given, the text data vectorization unit 25 starts the text data vectorization process of FIG. 6 and first determines whether the given text data 27 is a document image (a scanned image of a document) (SP1).

If the result of this determination is negative, the text data vectorization unit 25 proceeds to step SP3. If it is positive, the unit performs OCR (Optical Character Recognition) on the document image to recognize each character string written in it (SP2).

Next, the text data vectorization unit 25 performs layout processing to extract each block of text from the text data 27 (SP3). In the example of FIG. 4, it extracts each depth range data part 50 and each detailed description data part 51 as a text block, and associates each depth range data part 50 with its corresponding detailed description data part 51.

The text data vectorization unit 25 then divides the depth range data parts 50 and detailed description data parts 51 extracted in step SP3 according to the depth range segments described above, such as 0 to 500 m, 500 to 1000 m, 1000 to 1500 m, and so on (SP4).

For each depth range segment, the text data vectorization unit 25 then splits the text of the detailed description data part 51 for each first depth range in that segment into phrases (SP5). In the example of FIG. 4, the rock features are written in the detailed description data parts 51 under the rule "list color, hardness, surface smoothness, presence or absence of oil stains, and so on, in that order, separated by commas", as described above, so the text can be split into phrases at each comma. In this case, each phrase represents the value of some attribute of the rock.

Next, for each depth range section, the text data vectorization unit 25 matches each phrase obtained in step SP5 against a predetermined dictionary of attribute values and extracts the attribute value that each phrase represents (SP6). It then tallies, for each attribute, the frequency of occurrence of the attribute values extracted in step SP6 (SP7), and ends the text data vectorization process.
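Steps SP5 to SP7 for one depth range section might look like the sketch below. The text does not give the dictionary, so its contents, the attribute names and the function name `vectorize_section` are hypothetical. Skipping phrases that are not in the dictionary is also an assumption.

```python
from collections import Counter

# Hypothetical attribute-value dictionary (phrase -> attribute).
ATTRIBUTE_DICTIONARY = {
    "gray": "color", "brown": "color",
    "hard": "hardness", "soft": "hardness",
    "smooth": "smoothness", "rough": "smoothness",
    "oil stain": "oil_stain", "no oil stain": "oil_stain",
}


def vectorize_section(descriptions, dictionary=ATTRIBUTE_DICTIONARY):
    """Return {attribute: Counter(value -> count)} for one depth range section."""
    counts = {}
    for text in descriptions:
        # SP5: the description rule separates attribute values with commas.
        for phrase in (p.strip().lower() for p in text.split(",")):
            attribute = dictionary.get(phrase)  # SP6: dictionary lookup
            if attribute is None:
                continue  # unknown phrase: ignored in this sketch
            counts.setdefault(attribute, Counter())[phrase] += 1  # SP7: tally
    return counts
```

For example, the two descriptions "Gray, hard, smooth, oil stain" and "brown, hard, rough, no oil stain" yield a hardness count of 2 for "hard" and a count of 1 for each of the other values.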

(1-4) Processing of the Numerical Data Vectorization Unit
FIG. 7 shows an example of the numerical data 28: the gamma dose at each predetermined depth, measured by a sensor mounted on the drill. The numerical data vectorization unit 26 divides such numerical data 28 into the same depth range sections as those used for the text data 27 described above (for example, 0-500 m, 500-1000 m, 1000-1500 m, and so on). For each depth range section, it then vectorizes together all the items of numerical data 28 belonging to that section.

FIG. 8 shows an example of the result 53 (numerical data vector) obtained when the numerical data vectorization unit 26 vectorizes together the items of numerical data 28 in FIG. 7 that belong to the 0-500 m depth range section. The numerical data vector 53 has two kinds of elements. The first is the average value 53A of the numerical values in the corresponding depth range section. The second is the frequency 53B of the symbolized numerical data sequence. The numerical data is symbolized using, for example, the SAX (Symbolic Aggregate Approximation) method. SAX converts each numerical value into one character, so the converted numerical data 28 becomes a character string. In this embodiment, the frequency of each run of three consecutive characters is used as the frequency of the numerical data sequence. For example, "35" in FIG. 8 is the frequency of the symbol string "aaa".
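The symbolization and trigram counting described above can be sketched as follows. The sketch rests on several assumptions:

- a four-letter SAX alphabet with the standard Gaussian breakpoints;
- z-normalization within the section;
- one symbol per reading, with no piecewise aggregate averaging (matching "each numerical value is converted into one character");
- overlapping three-character windows.

The alphabet size and the function names are hypothetical.

```python
from collections import Counter

import numpy as np

# Standard-normal breakpoints that split N(0, 1) into four equiprobable bins.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def sax_symbols(values):
    """Map each reading to one letter after z-normalisation (no PAA step)."""
    x = np.asarray(values, dtype=float)
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Number of breakpoints <= z gives the bin index 0..3.
    idx = np.searchsorted(BREAKPOINTS, z, side="right")
    return "".join(ALPHABET[i] for i in idx)


def numeric_section_vector(values, n=3):
    """Return (section mean, Counter of overlapping length-n symbol runs)."""
    symbols = sax_symbols(values)
    grams = Counter(symbols[i:i + n] for i in range(len(symbols) - n + 1))
    return float(np.mean(values)), grams
```

For the readings [1, 1, 1, 1, 9, 9, 9, 9], the section mean is 5.0 and the symbol string is "aaaadddd". The trigram counts are aaa: 2, aad: 1, add: 1 and ddd: 2.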

FIG. 9 shows the numerical data vectorization process that the numerical data vectorization unit 26 executes when it is given, as training data, a series of numerical data 28 each associated with text data 27. The numerical data vectorization unit 26 vectorizes this numerical data 28 according to the procedure shown in FIG. 9.

In practice, when given the numerical data 28, the numerical data vectorization unit 26 starts the numerical data vectorization process shown in FIG. 9. It first divides the numerical data 28 into the depth range sections described above (SP10).

Subsequently, for each depth range section, the numerical data vectorization unit 26 calculates the average value of the numerical data 28 belonging to that section (SP11). It then symbolizes the numerical data 28 belonging to that section using the SAX method (SP12).

Next, for each depth range section, the numerical data vectorization unit 26 counts the frequency of occurrence of each symbol obtained by the symbolization in step SP12 (SP13), and then ends the numerical data vectorization process.

(1-5) Processing of the Correspondence Model Learning Unit
FIG. 10 outlines the processing of the correspondence model learning unit 21. In the figure, 60 denotes the vector space of the text data 27 (hereinafter, the text vector space) and 61 denotes the vector space of the numerical data 28 (hereinafter, the numerical vector space). To learn a correspondence model between the text data 27 and the numerical data 28, two sets of vectors are projected onto a common vector space 62 (hereinafter, the common space):

- the vectors of the text data 27 in the text vector space 60 (text data vectors);
- the vectors of the numerical data 28 in the numerical vector space 61 (numerical data vectors).

Reference numerals 63 and 64 denote the transformation matrices L_x and L_y that perform this projection for the text data vectors and the numerical data vectors, respectively.

The correspondence model learning unit 21 learns transformation matrices L_x and L_y such that the arrangement of the vectors in the common space 62 has the following two properties.

The first property concerns any two vector pairs in the common space 62. In FIG. 10, these are the pair of text data vector L_x^T x_i and numerical data vector L_y^T y_i, and the pair of text data vector L_x^T x_j and numerical data vector L_y^T y_j. The distance between the text data vector and the numerical data vector within the same pair (for example, L_x^T x_i and L_y^T y_i) is minimized. At the same time, the distance between the numerical data vector of one pair and the text data vector of the other pair (for example, L_y^T y_i and L_x^T x_j) is maximized. This is equivalent to minimizing A given by equation (1).

In equation (1), x_i denotes the text data vector (in the text vector space 60) of the text data 27 acquired at point "i", and y_i denotes the numerical data vector of the numerical data 28 acquired at the same point. Likewise, x_j and y_j denote the text data vector and the numerical data vector acquired at point "j". Further, x_i^T, x_j^T, L_x^T and L_y^T denote the transposes of the text data vectors x_i and x_j and of the transformation matrices L_x and L_y, respectively.

Accordingly, the correspondence model learning unit 21 learns transformation matrices L_x and L_y that minimize A given by equation (1). That is, it learns matrices that make the inner product in the common space 62 larger for corresponding data pairs (i = j) and smaller for non-corresponding data pairs (i ≠ j).
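The image of equation (1) is not reproduced here, so the following sketch assumes one plausible inner-product form consistent with the description:

A = -Σ_i s_ii + (1/(n-1)) Σ_{i≠j} s_ij, where s_ij is the inner product of L_x^T x_i and L_y^T y_j.

This form is an assumption, not the patent's formula. Rows of X and Y hold the vectors x_i and y_i, so X @ Lx stacks the projected vectors L_x^T x_i. The function name `a_term` is hypothetical.

```python
import numpy as np


def a_term(X, Y, Lx, Ly):
    """Hypothetical stand-in for A in equation (1).

    X: (n, dx) text data vectors as rows; Y: (n, dy) numerical data vectors.
    Lx: (dx, k) and Ly: (dy, k) project both onto a k-dimensional common space.
    Requires n >= 2.
    """
    PX = X @ Lx            # row i is L_x^T x_i
    PY = Y @ Ly            # row i is L_y^T y_i
    S = PX @ PY.T          # S[i, j] = <L_x^T x_i, L_y^T y_j>
    n = S.shape[0]
    matched = np.trace(S)              # corresponding pairs (i == j)
    unmatched = S.sum() - matched      # non-corresponding pairs (i != j)
    return float(-matched + unmatched / (n - 1))
```

Aligned projections give a lower A than crossed ones. Note, however, that a loss that is linear in the inner products can be lowered without limit by scaling L_x and L_y. A practical version would therefore need a norm constraint or a regularizer.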

The second property is that when point "i" and point "j" are physically close in the formation, the data pair consisting of the data acquired at point "i" and the data acquired at point "j" is also close in the common space 62. This is equivalent to minimizing B given by equation (2).

In equation (2), the first term on the right-hand side, "L_x^T x_i - L_x^T x_j", represents the distance in the common space 62 between two projected text data vectors:

- L_x^T x_i, obtained by projecting the text data vector x_i of one item of text data 27 onto the common space 62;
- L_x^T x_j, obtained by projecting the text data vector x_j of the other item of text data 27.

Likewise, the second term, "L_y^T y_i - L_y^T y_j", represents the distance in the common space 62 between two projected numerical data vectors:

- L_y^T y_i, obtained by projecting the numerical data vector y_i of the numerical data 28 that corresponds to the first item of text data 27;
- L_y^T y_j, obtained by projecting the numerical data vector y_j of the numerical data 28 that corresponds to the other item of text data 27.

Also in equation (2), W_ij is a matrix parameterized by the distance in the formation between point "i" and point "j". Each of its elements takes a value close to 1 when the two points are close and a value close to 0 when they are far apart. In other words, W_ij expresses the degree of correlation between the data acquired at the two points: the text data 27 and numerical data 28 from point "i" versus those from point "j". More precisely, it expresses the degree of similarity of that text data 27 and numerical data 28; it is hereinafter called the correlation degree. The closer the two points are physically (geographically), the more strongly their Euclidean distance in the common space 62 is weighted.
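Since equation (2) itself is not reproduced in this text, the sketch of B below makes two assumptions: it uses squared Euclidean distances, and it sums over all ordered pairs (i, j), so each unordered pair is counted twice. Whether the original squares the distances is not stated here. The function name `b_term` is hypothetical.

```python
import numpy as np


def b_term(X, Y, Lx, Ly, W):
    """Sketch of B: sum_ij W_ij * (|PX_i - PX_j|^2 + |PY_i - PY_j|^2).

    X: (n, dx), Y: (n, dy) with rows x_i, y_i; Lx: (dx, k), Ly: (dy, k);
    W: (n, n) correlation degrees between acquisition points.
    """
    PX, PY = X @ Lx, Y @ Ly
    # Pairwise squared distances between projected vectors via broadcasting.
    dx = ((PX[:, None, :] - PX[None, :, :]) ** 2).sum(axis=-1)
    dy = ((PY[:, None, :] - PY[None, :, :]) ** 2).sum(axis=-1)
    return float((W * (dx + dy)).sum())
```

Pairs with W_ij close to 0 contribute almost nothing, so only geographically close points are pulled together in the common space.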

The matrix W_ij is defined by taking into account both the vertical and the horizontal distance between the two points. In shale oil and gas drilling, points at different vertical distances (depths) are likely to lie in different strata, and different strata have different rock properties. Therefore, when the vertical distance exceeds a certain threshold, the correlation degree between the two points is first set to 0. For example, in FIG. 11, the point indicated by reference numeral 71A in oil well 71 and the point indicated by reference numeral 72A in a different oil well 72 differ greatly in vertical distance (depth). The value of W_ij in equation (2) is therefore set to 0 for these two points.

In other cases (when the vertical distances do not differ greatly), the correlation degree between point "i" and point "j" is determined by the horizontal distance between them. For example, in FIG. 11, W_ij for the point indicated by reference numeral 70A in oil well 70 and the point indicated by reference numeral 71A in a different oil well 71 is determined by the horizontal distance between these two locations. W_ij is chosen so that the correlation degree takes its maximum value of 1 when the horizontal distance is 0 and its minimum value of 0 when the horizontal distance is infinite. An example setting of W_ij is given by equation (3).

In equation (3), "dist(i, j)" represents the distance in latitude and longitude between point "i" and point "j".
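The rule for W_ij can be sketched as below. The text only requires three things: W_ij = 0 for large depth differences, W_ij = 1 at zero horizontal distance, and W_ij approaching 0 as the horizontal distance grows. The exponential decay exp(-dist / scale), the default depth threshold and scale values, and the function name are illustrative assumptions, not the contents of equation (3).

```python
import math


def correlation_degree(depth_i, depth_j, horizontal_dist,
                       depth_threshold=100.0, scale=1000.0):
    """Hypothetical W_ij for two acquisition points.

    Returns 0 when the depth difference exceeds depth_threshold (the points
    are likely in different strata). Otherwise decays from 1 at zero
    horizontal distance towards 0 as the distance grows. Depths must share
    the unit of depth_threshold, and horizontal_dist the unit of scale.
    """
    if abs(depth_i - depth_j) > depth_threshold:
        return 0.0
    return math.exp(-horizontal_dist / scale)
```

Under the variant described later in "Other Embodiments", the same shape could be reused with the difference in drilling time in place of horizontal_dist.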

Taking the above into account, in this embodiment the correspondence model learning unit 21 learns the correspondence model by finding, with an iterative method, the transformation matrices L_x and L_y that minimize the sum (A + B) of A given by equation (1) and B given by equation (2).

FIG. 12 shows the procedure for learning this correspondence model. Following this procedure, the correspondence model learning unit 21 obtains the transformation matrices L_x and L_y that minimize the sum of A and B.

In practice, in the learning mode, the correspondence model learning unit 21 waits until two vectorizations are complete: the text data vectorization unit 25 has vectorized the series data of the text data 27, and the numerical data vectorization unit 26 has vectorized the series data of the numerical data 28. It then starts the correspondence model learning process shown in FIG. 12 and first initializes the transformation matrices L_x and L_y. Any initial values may be used. In this embodiment, the correspondence model learning unit 21 initializes L_x and L_y by generating random numbers for the value of each of their elements (SP20).

Subsequently, the correspondence model learning unit 21 fixes one transformation matrix, L_y, finds the matrix that minimizes the objective, and updates the other transformation matrix, L_x, to that matrix (SP21). In this embodiment, the objective is the sum of A from equation (1) and B from equation (2). The updated L_x can therefore be obtained by setting the partial derivative of the objective with respect to L_x equal to zero and solving the resulting equation. In the same way, the correspondence model learning unit 21 then obtains the updated transformation matrix L_y (SP22).

Next, the correspondence model learning unit 21 calculates the difference between the transformation matrices L_x and L_y before and after the update. One example is the sum, over corresponding matrix elements, of the absolute values or the squares of their differences. It compares this difference with a preset threshold (hereinafter, the learning end determination threshold) (SP23). Based on the comparison result of step SP23, the correspondence model learning unit 21 then determines whether the difference is less than the learning end determination threshold (SP24).

If the result of this determination is negative, the correspondence model learning unit 21 returns to step SP21 and repeats steps SP21 to SP24 until an affirmative result is obtained in step SP24. Eventually the difference between the transformation matrices L_x and L_y before and after the update falls below the learning end determination threshold. An affirmative result is then obtained in step SP24, and the correspondence model learning unit 21 ends the correspondence model learning process.
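The control flow of steps SP20 to SP24 can be sketched as a generic alternating-minimization loop. The closed-form block updates depend on the exact form of equations (1) and (2), which are not reproduced here, so they are passed in as functions. The names `alternate`, `update_x` and `update_y`, and the toy objective in the usage lines, are hypothetical.

```python
import numpy as np


def alternate(update_x, update_y, Lx, Ly, tol=1e-6, max_iter=1000):
    """Alternately update L_x and L_y until the matrices stop changing.

    update_x(Ly) returns the L_x that minimises the objective with L_y fixed
    (SP21); update_y(Lx) does the same for L_y (SP22). Returns the final
    matrices and the number of iterations used.
    """
    for iteration in range(1, max_iter + 1):
        new_Lx = update_x(Ly)                  # SP21
        new_Ly = update_y(new_Lx)              # SP22
        # SP23: sum of absolute element-wise differences before/after update
        change = np.abs(new_Lx - Lx).sum() + np.abs(new_Ly - Ly).sum()
        Lx, Ly = new_Lx, new_Ly
        if change < tol:                       # SP24: learning ends
            break
    return Lx, Ly, iteration


# Toy usage: minimise (a - b)**2 + (a - 1)**2 + (b - 1)**2, whose block
# minimisers are a = (b + 1) / 2 and b = (a + 1) / 2.
rng = np.random.default_rng(0)
Lx, Ly, iterations = alternate(lambda Ly: (Ly + 1) / 2,
                               lambda Lx: (Lx + 1) / 2,
                               rng.random((1, 1)), rng.random((1, 1)))  # SP20
```

In the toy problem, each round shrinks the error by a factor of four, so the loop stops after a few iterations with both matrices close to 1.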

(1-6) Processing of the Corresponding Data Search Unit
Using the correspondence model learned as described above, corresponding data 30 (FIG. 2) can be obtained for any target data 29 (FIG. 2). For example, text data 27 can be obtained from numerical data 28. In this embodiment, the closest data in the common space 62 is searched for, and the search result is output as the corresponding data 30.

FIG. 13 shows the corresponding data search process executed by the corresponding data search unit 22. Following this procedure, the corresponding data search unit 22 searches for the data (text data 27 or numerical data 28) to be associated with the target data 29 vectorized by the data vectorization unit 20 (FIG. 2).

In practice, the corresponding data search unit 22 starts the corresponding data search process shown in FIG. 13 when it receives the vectorized target data 29 (that is, the target data vector) from the data vectorization unit 20. It first projects the target data vector onto the common space 62 (FIG. 10) using the transformation matrix L_x or L_y of the correspondence model stored in the correspondence model storage unit 24 (SP30).

Subsequently, the corresponding data search unit 22 searches the text data 27 or numerical data 28 stored in the data storage unit 23 for the data that minimizes the sum of equations (1) and (2) in the common space 62 (SP31). If the target data 29 is text data, numerical data is searched for; if it is numerical data, text data is searched for. The unit outputs the text data 27 or numerical data 28 found by this search as the corresponding data 30 (SP32), and then ends the corresponding data search process.
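The search can be approximated by the nearest-neighbour lookup below, following the "closest data in the common space 62" description in (1-6). It uses plain Euclidean distance rather than the full criterion of equations (1) and (2). The function name `find_corresponding` and the row-vector layout are assumptions.

```python
import numpy as np


def find_corresponding(query_vec, L_query, stored_vecs, L_stored):
    """Return the index of the stored item closest to the query in the common space.

    query_vec: (dq,) target data vector; L_query: (dq, k) its transformation matrix.
    stored_vecs: (m, ds) candidate vectors of the other data type as rows;
    L_stored: (ds, k) their transformation matrix.
    """
    q = np.asarray(query_vec, dtype=float) @ L_query        # SP30: L^T q
    P = np.asarray(stored_vecs, dtype=float) @ L_stored     # each row: L^T v
    dists = np.linalg.norm(P - q, axis=1)                   # SP31
    return int(np.argmin(dists))                            # SP32: best match
```

For numerical target data, L_query would be L_y and the candidates would be the stored text data vectors projected with L_x, and the reverse for text target data.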

(1-7) Effects of This Embodiment
As described above, in the data association apparatus 1 of this embodiment, the correspondence model learning unit 21 defines a correlation degree (matrix) whose elements approach 1 as two points become physically closer. Using this correlation degree, it learns a correspondence model that minimizes the sum of B, defined as in equation (2), and A, defined as in equation (1).

Text data 27 and numerical data 28 acquired at two physically close points in the formation can be expected to be more similar the closer the two points are. Equation (2) of this embodiment exploits this: the learned correspondence model minimizes the sum of two distances in the common space 62 for any two physically (geographically) close points. The first is the distance between the text data vectors of the text data 27 acquired at those points. The second is the distance between the numerical data vectors of the numerical data 28 acquired at the same points. A more accurate correspondence model can be learned in this way, and data can thus be associated with higher accuracy.

(2) Second Embodiment
In FIGS. 1 and 2, reference numeral 80 denotes the data association apparatus according to this embodiment. The data association apparatus 80 is configured in the same way as the data association apparatus 1 of the first embodiment, except for two points:

- The text data vectorization unit 91 and the numerical data vectorization unit 92 vectorize the text data 27 and the numerical data 28 by different methods. Both units belong to the data vectorization unit 90, which is implemented by the processor 2 executing the data vectorization program 81.
- The correspondence model learning unit 93 learns the correspondence model by deep learning. This unit is implemented by the processor 2 executing the correspondence model learning program 82.

Specifically, in the data association apparatus 80 of this embodiment, the text data vectorization unit 91 converts the text data 27 into a two-dimensional vector structured as shown in FIG. 14. The first dimension of this two-dimensional vector is the "depth range section", which is the same as the depth range section of the first embodiment.

From the text data 27 for each first depth range contained in a depth range section, a one-dimensional "frequency vector" is constructed for each stone attribute ("color", "hardness", and so on). The frequency vectors of all attributes in the same depth range section are then concatenated into a single one-dimensional vector, which forms the second dimension of the two-dimensional vector of the text data 27. The frequency vector is similar to that of FIG. 5, except that in FIG. 14 it does not include sequence information. Instead of a simple frequency vector, a distributed representation of the text (phrase2vec) may be used; such a representation is generated by an existing method.
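Assembling this text matrix (one row per depth range section, each row the concatenated per-attribute frequency vectors) might look like the sketch below. The fixed attribute vocabularies and the function name `text_matrix` are hypothetical. The input format matches a per-section dictionary of value counts.

```python
import numpy as np

# Hypothetical fixed vocabularies; each attribute gets one frequency vector.
VOCAB = {
    "color": ["gray", "brown"],
    "hardness": ["hard", "soft"],
}


def text_matrix(section_counts, vocab=VOCAB):
    """Build the (n_sections, total_vocab) document two-dimensional vector.

    section_counts: one {attribute: {value: count}} mapping per depth range
    section, in depth order. Values missing from a section count as 0.
    """
    rows = []
    for counts in section_counts:
        row = []
        for attribute, values in vocab.items():
            attr_counts = counts.get(attribute, {})
            # Concatenate this attribute's frequency vector onto the row.
            row.extend(attr_counts.get(value, 0) for value in values)
        rows.append(row)
    return np.array(rows, dtype=float)
```

Fixing the vocabulary order keeps each column's meaning identical across sections, which the later convolution along the depth direction relies on.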

The numerical data vectorization unit 92, for its part, converts the numerical data 28 into the two-dimensional vector shown in FIG. 15. The first dimension of this two-dimensional vector is the depth range section, which is again the same as in the first embodiment. For each depth range section, a Fourier transform is used to construct a vector whose elements are the strengths at each frequency (the "one-dimensional vector of length M" in FIG. 15). This forms the second dimension of the two-dimensional vector of the numerical data 28.
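The frequency-domain vectorization can be sketched with NumPy's real FFT. Several choices here are assumptions:

- the magnitude of `numpy.fft.rfft` is taken as the per-frequency strength;
- each section is assumed to contain equally spaced readings;
- spectra are truncated or zero-padded to length M.

The function name is hypothetical.

```python
import numpy as np


def numeric_matrix(sections, m):
    """Build the (n_sections, m) numerical two-dimensional vector.

    sections: sequence of 1-D arrays of readings, one per depth range section.
    Each row holds the magnitudes of the first m real-FFT bins of a section.
    """
    rows = []
    for readings in sections:
        spectrum = np.abs(np.fft.rfft(np.asarray(readings, dtype=float)))
        row = np.zeros(m)
        k = min(m, spectrum.size)
        row[:k] = spectrum[:k]   # truncate or zero-pad to length m
        rows.append(row)
    return np.vstack(rows)
```

For example, a constant section [1, 1, 1, 1] puts all its strength in the zero-frequency bin (value 4), while [1, 0, -1, 0] puts strength 2 in the first non-zero frequency bin.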

FIG. 16 shows the flow of the correspondence model learning process executed by the correspondence model learning unit 93 (FIG. 2) of this embodiment. As described above, in this embodiment the correspondence model learning unit 93 learns the correspondence model by deep learning. In FIG. 16, the document two-dimensional vector 100 is the two-dimensional vector of the text data 27 for each depth range section, as described with reference to FIG. 14. The numerical two-dimensional vector 101 is the two-dimensional vector of the numerical data 28 for each depth range section, as described with reference to FIG. 15. The two are input independently into multiple stages of convolution layers 102 and 103, respectively.

Each convolution layer 102 or 103 convolves its input along the depth direction, a predetermined number of vectors at a time according to that layer. The input is the document or numerical two-dimensional vectors, or the output of the preceding convolution layer 102 or 103. For example, suppose there are 1000 document two-dimensional vectors 100. The convolution layer 102 labeled "convolution layer 1" in FIG. 16 merges them five at a time into a total of 200 document two-dimensional vectors. The next convolution layer 102, labeled "convolution layer 2", merges those 200 vectors five at a time into 40, and so on. Both the document two-dimensional vectors 100 and the numerical two-dimensional vectors 101 therefore take the depth sequence into account. Finally, both are passed through the fully connected layers 104 and 105. The purpose of these layers is to make the dimensions of the document learning path (left side of FIG. 16) and the numerical learning path (right side of FIG. 16) equal.
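The depth-wise merging can be sketched with non-overlapping windows (kernel size and stride both 5), which reproduces the 1000 to 200 to 40 reduction in the example. The text does not specify the layer internals, so the random weights, the ReLU activation and the function name `depth_conv` are assumptions.

```python
import numpy as np


def depth_conv(x, weights, bias=0.0):
    """Merge each run of k consecutive depth rows into one output row.

    x: (n_rows, features); weights: (k, features, out_features).
    Windows do not overlap (stride == kernel size k); trailing rows that do
    not fill a whole window are dropped.
    """
    k = weights.shape[0]
    n = x.shape[0] // k
    windows = x[: n * k].reshape(n, k, x.shape[1])
    out = np.einsum("nkf,kfo->no", windows, weights) + bias
    return np.maximum(out, 0.0)  # ReLU (assumed activation)


rng = np.random.default_rng(0)
doc = rng.standard_normal((1000, 8))                 # 1000 depth rows, 8 features
h1 = depth_conv(doc, rng.standard_normal((5, 8, 16)))   # "convolution layer 1"
h2 = depth_conv(h1, rng.standard_normal((5, 16, 16)))   # "convolution layer 2"
```

The same function applies unchanged to the numerical path. Only the input feature width (M instead of the text vocabulary size) differs.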

Thereafter, the document two-dimensional vectors 100 and numerical two-dimensional vectors 101 that have passed through the fully connected layers 104 and 105 are substituted into the evaluation function E defined by equation (4), which is optimized by deep learning. The correspondence model is learned by finding the parameters (weight matrices and biases) of all layers that minimize E.

In equation (4), x denotes the document two-dimensional vector 100 and y denotes the numerical two-dimensional vector 101. The first term on the right-hand side of equation (4) is an evaluation measure reflecting the distance between an associated x and y. Here, "y-" denotes a y that is not associated with x and is selected at random. An example of the first term of equation (4) is given by equation (5).

In equation (5), φ is the transformation function implemented by the convolution layers 102 and 103 and the fully connected layers 104 and 105; its output is a vector with a common number of dimensions. Minimizing equation (5) does two things: it minimizes the distance (maximizes the inner product) between associated x and y, and it maximizes the distance (minimizes the inner product) between non-associated x and y.
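Equation (5) itself is not reproduced in this text. The sketch below therefore uses one simple form that matches its description: the batch mean of (s_neg - s_pos). Here s_pos is the inner product of φ(x) and φ(y) for an associated pair, and s_neg is the inner product of φ(x) with a randomly chosen non-associated y-. The input arrays stand in for the outputs of φ. The exact form and the function name are assumptions.

```python
import numpy as np


def matching_loss(ex, ey, rng):
    """Hypothetical first term of equation (4).

    ex, ey: (n, d) outputs of phi for associated pairs (row i matches row i).
    For each x, a non-associated y- is drawn at random from the other rows.
    Requires n >= 2.
    """
    n = ex.shape[0]
    pos = np.sum(ex * ey, axis=1)                 # <phi(x), phi(y)>
    shift = rng.integers(1, n, size=n)            # shift in 1..n-1, never 0
    neg_idx = (np.arange(n) + shift) % n          # index of y-, never i
    neg = np.sum(ex * ey[neg_idx], axis=1)        # <phi(x), phi(y-)>
    return float(np.mean(neg - pos))
```

Perfectly separated embeddings, such as identity rows, reach a loss of -1. A hinge with a margin is a common alternative that keeps this term bounded.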

The second and third terms on the right-hand side of equation (4) are evaluation measures reflecting the distance in the formation. An example is given by equation (6).

W_ij in equation (6) is the same as W_ij in equation (3), described above for the first embodiment. Minimizing equation (6) corresponds to minimizing the difference between the geographic correlation degree (W_ij) and the similarity.
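Under the assumptions that "similarity" is the cosine similarity of the embedded vectors and that the difference is squared, the geographic term could be sketched as below. Neither choice is confirmed by this text, and the function name `geo_term` is hypothetical. Applying the function once to the text embeddings and once to the numerical embeddings would give the second and third terms.

```python
import numpy as np


def geo_term(emb, W):
    """Hypothetical geographic term: sum_ij (W_ij - cos_sim(e_i, e_j))**2.

    emb: (n, d) embedded vectors (outputs of phi), one row per acquisition point.
    W: (n, n) correlation degrees, as in equation (3).
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T                   # pairwise cosine similarities
    return float(np.sum((W - sim) ** 2))
```

The term is zero when embedding similarity matches W_ij exactly, for instance orthogonal embeddings with W equal to the identity.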

In deep learning, the parameters of the network structure of FIG. 16 are determined by minimizing the evaluation function of equation (4). An existing deep learning method is used for the actual optimization.

As described above, in the data association apparatus 80 of this embodiment, the correspondence model learning unit 93 learns the correspondence model by deep learning, finding the parameters of all layers that minimize the evaluation function E defined by equation (4). Equation (4) includes terms reflecting the distance in the formation. Like the first embodiment, the data association apparatus 80 can therefore learn a highly accurate correspondence model and thus associate data with higher accuracy.

(3) Other Embodiments
In the first and second embodiments described above, the correlation degree between any two data sources (point "i" and point "j") takes into account the physical distance between them. However, the present invention is not limited to this. The correlation degree between two data sources may also be defined by the difference in their drilling times, either in addition to or instead of the distance between them. Equation (7) gives an example of the correlation degree W_ij that uses the difference in drilling time between two data sources instead of their distance.

Here, “dist” in Equation (7) denotes the difference in excavation time. Accordingly, the correlation degree W_ij defined by Equation (7) takes a larger value the closer the excavation time of point “i” is to that of point “j”. The two data sources may lie on the drilling path of the same oil well or on the drilling paths of different oil wells.
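Equation (7) itself is not reproduced in this text, so its exact functional form is not known here. As a minimal sketch, assume the correlation degree is a Gaussian kernel of the difference, W_ij = exp(-dist^2 / (2σ^2)), where σ is a hypothetical bandwidth parameter. The same kernel is applied either to the difference in excavation time or to the spatial distance between two data sources (horizontal and vertical offsets), and, when both are considered, the two are combined by multiplication. The kernel form, the σ values, the coordinate convention, and the product combination are all illustrative assumptions, not the definition used in the patent.

```python
import math


def gaussian_correlation(dist: float, sigma: float) -> float:
    """Assumed correlation degree: 1.0 at dist == 0, decaying toward 0 as dist grows.

    The Gaussian form and the bandwidth `sigma` are illustrative assumptions;
    this is not the patent's Equation (7).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-(dist ** 2) / (2.0 * sigma ** 2))


def time_correlation(t_i: float, t_j: float, sigma_t: float) -> float:
    """Correlation degree from the difference in excavation time (e.g. in hours)."""
    return gaussian_correlation(abs(t_i - t_j), sigma_t)


def spatial_correlation(pos_i, pos_j, sigma_d: float) -> float:
    """Correlation degree from the Euclidean distance between two data sources.

    pos_i and pos_j are assumed (horizontal, vertical) coordinates, e.g. in metres.
    """
    dx = pos_i[0] - pos_j[0]
    dz = pos_i[1] - pos_j[1]
    return gaussian_correlation(math.hypot(dx, dz), sigma_d)


def combined_correlation(pos_i, pos_j, t_i, t_j, sigma_d, sigma_t) -> float:
    """Hypothetical combination when both distance AND excavation time are considered."""
    return spatial_correlation(pos_i, pos_j, sigma_d) * time_correlation(t_i, t_j, sigma_t)


if __name__ == "__main__":
    # Sources excavated 1 h apart correlate more strongly than sources 10 h apart.
    print(time_correlation(0.0, 1.0, sigma_t=5.0))
    print(time_correlation(0.0, 10.0, sigma_t=5.0))
```

Any monotonically decreasing function of the difference would satisfy the property stated for Equation (7); the Gaussian is chosen here only because it yields 1.0 for identical excavation times and decays smoothly.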

In the first embodiment described above, the correspondence model is learned by finding the transformation matrices Lx and Ly that minimize the sum A + B, where A is calculated by Equation (1) and B is calculated by Equation (2). However, the present invention is not limited to this. For example, the correspondence model may instead be learned by finding the transformation matrices Lx and Ly that minimize A + αB, that is, the sum of A and the product of B and a constant α.
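Equations (1) and (2) are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes that A sums, over all data sources, the squared common-space distance between the projected first-series vector (e.g. text) and second-series vector (e.g. numerical) obtained from the same source, and that B is the correlation-weighted term described in the claims: for every pair of sources i and j, W_ij times the distance between their first-series projections plus W_ij times the distance between their second-series projections. The function name `weighted_objective`, the use of squared Euclidean distances, and the array shapes are hypothetical. The sketch only evaluates A + αB; the minimization itself, and any constraint needed to rule out the trivial solution Lx = Ly = 0, are not shown.

```python
import numpy as np


def weighted_objective(Lx, Ly, X, Y, W, alpha=1.0):
    """Evaluate an assumed objective A + alpha * B.

    X:  (n, dx) first-series vectors, one row per data source.
    Y:  (n, dy) second-series vectors from the same n data sources.
    Lx: (k, dx) and Ly: (k, dy) project both series into a k-dim common space.
    W:  (n, n) correlation degrees between data sources.
    The exact forms of A and B are illustrative assumptions.
    """
    Px = X @ Lx.T  # (n, k) projections of the first series
    Py = Y @ Ly.T  # (n, k) projections of the second series

    # A: squared distance between the two series obtained from the same source.
    A = np.sum((Px - Py) ** 2)

    # Pairwise squared distances within each series in the common space.
    dPx = np.sum((Px[:, None, :] - Px[None, :, :]) ** 2, axis=-1)  # (n, n)
    dPy = np.sum((Py[:, None, :] - Py[None, :, :]) ** 2, axis=-1)  # (n, n)

    # B: correlation-weighted distances, summed over all ordered pairs (i, j).
    B = np.sum(W * dPx) + np.sum(W * dPy)
    return A + alpha * B
```

Under these assumptions, alpha = 1.0 reproduces the unweighted sum A + B of the first embodiment, while a larger alpha places more emphasis on keeping highly correlated data sources close together in the common space.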

The present invention is applicable not only to a data association apparatus that associates sensor data obtained during shale oil and gas mining with the text data of reports, but also to various other data association apparatuses.

DESCRIPTION OF SYMBOLS 1, 80 ... data association apparatus, 2 ... processor, 3 ... memory, 10, 81 ... data vectorization program, 11, 82 ... correspondence model learning program, 12 ... corresponding data search program, 21, 93 ... correspondence model learning unit, 22 ... corresponding data search unit, 25, 91 ... text data vectorization unit, 26, 92 ... numerical data vectorization unit, 27 ... text data, 28 ... numerical data, 29 ... target data, 30 ... corresponding data, 60 ... text vector space, 61 ... numerical vector space, 62 ... common space, 63, 64, Lx, Ly ... transformation matrices.

Claims

A data association apparatus that learns a correspondence model of first and second series data obtained from the same data source and, based on the learned correspondence model, associates target data belonging to one of the first and second series data with data belonging to the other, the data association apparatus comprising:
a vectorization unit that vectorizes each of the first and second series data obtained from the same data source; and
a correspondence model learning unit that learns the correspondence model of the first and second series data based on the vectorized first and second series data,
wherein a correlation degree, which is the degree of correlation between data of the first and second series data acquired from any two different data sources, is predefined, and
the correspondence model learning unit learns the correspondence model using the correlation degree.

The data association apparatus according to claim 1, wherein the correlation degree is defined according to the distance between any two different data sources so as to increase as the distance decreases.

The data association apparatus, wherein the correspondence model learning unit learns the correspondence model so as to minimize the sum of: the distance, in a common space onto which the vectors of the first and second series data obtained from any two of the data sources are projected, between the vector of the first series data obtained from one of the two data sources and the vector of the first series data obtained from the other, multiplied by the correlation degree; and the distance between the vector of the second series data obtained from the one data source and the vector of the second series data obtained from the other, multiplied by the correlation degree.

The data association apparatus according to claim 3, wherein the distance is a distance between the data sources in a vertical direction and a horizontal direction.

The data association apparatus according to claim 1, wherein the correlation degree is defined according to the excavation times of any two different data sources so as to increase as their excavation times become closer.

A data association method executed in a data association apparatus that learns a correspondence model of first and second series data obtained from the same data source and, based on the learned correspondence model, associates target data belonging to one of the first and second series data with data belonging to the other, the data association method comprising:
a first step in which the data association apparatus vectorizes each of the first and second series data obtained from the same data source; and
a second step in which the data association apparatus learns the correspondence model of the first and second series data based on the vectorized first and second series data,
wherein a correlation degree, which is the degree of correlation between data of the first and second series data acquired from any two different data sources, is predefined, and
in the second step, the data association apparatus learns the correspondence model using the correlation degree.

The data association method according to claim 6, wherein the correlation degree is defined according to the distance between any two different data sources so as to increase as the distance decreases.

The data association method, wherein the correspondence model learning unit learns the correspondence model so as to minimize the sum of: the distance, in a common space onto which the vectors of the first and second series data obtained from any two of the data sources are projected, between the vector of the first series data obtained from one of the two data sources and the vector of the first series data obtained from the other, multiplied by the correlation degree; and the distance between the vector of the second series data obtained from the one data source and the vector of the second series data obtained from the other, multiplied by the correlation degree.

The data association method according to claim 8, wherein each of the data sources exists within a formation, and the distance is the vertical distance between the data sources.

The data association method according to claim 6, wherein the correlation degree is defined according to the excavation times of any two different data sources so as to increase as their excavation times become closer.