JP4177997B2

JP4177997B2 - Database search apparatus, database search method, computer program, and computer-readable recording medium

Info

Publication number: JP4177997B2
Application number: JP2002127554A
Authority: JP
Inventors: 修一岩田 (Shuichi Iwata); ヴィラスピエール (Pierre Villars)
Original assignee: 大和寛
Priority date: 2002-04-26
Filing date: 2002-04-26
Publication date: 2008-11-05
Anticipated expiration: 2022-04-26
Also published as: JP2003323450A

Description

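[0001]
[Technical Field of the Invention]
The present invention relates to data mining technology for discovering specific patterns in databases of complex phenomena, such as scientific and technical data or economic data. More specifically, it relates to a database search apparatus, a database search method, a computer program, and a computer-readable recording medium that operate on a compound database made up of records having many fields. To systematically find the causal relationships implicitly contained in that database, they expand each compound into its constituent elements, perform operations on the attributes of those constituents, and reorganize the data.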
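[0002]
[Prior Art]
The behavior of complex systems cannot be expressed completely and explicitly because of constraints in observation means, measurement methods, data description, and so on. For complex phenomena, the data stored in an event database differ from the actual phenomena and are only approximations. Consequently, a fragmentary database is built by combining numerical data obtainable from the literature, descriptions of the measurement methods and measurement conditions used to obtain those data (hereinafter "related parameters"), and bibliographic information on the documents (sources) that report them. Every attempt to build an event database this way has failed: an event database built according to a raw data model, consisting of combinations of data and related parameters, is extremely unlikely to earn a satisfactory evaluation from its users.
[0003]
In general, when a database is used, the user enters several search conditions or selects the desired ones from a menu of search conditions. The database creates a record set matching each search condition, extracts the desired group of records by logical operations on these record sets, and displays or prints the result.
[0004]
Through such processing, a bibliographic database yields literature information and bibliographic data that match the purpose, while an event database yields attribute values such as numerical values, text, diagrams, and photographs. In either case, the user obtains a subset of the stored records together with their attribute values.
[0005]
Furthermore, with either a bibliographic or an event database, a search for a phenomenon that is not stored as a record returns an empty set. Because data compilation is difficult, an event database built according to the raw data model often returns an empty set in response to a search request.
[0006]
In other words, to build a database search system that satisfies its users, one must start from the premise that an exact description of the observed facts, with nothing missing and nothing superfluous, is the exception, and it is important to consider how to compensate for missing information.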
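[0007]
[Problems to Be Solved by the Invention]
At present, to overcome the "ambiguity of words" inherent in natural language, keywords are maintained for each field, and keyword-based searching is widely used. However, strict definitions and usage of keywords cannot be enforced. Support from thesauri or natural language processing is therefore also limited, and search results inevitably contain noise or miss relevant items. Judging the validity of the search results is consequently left to the user.
[0008]
In addition, when the quantity or quality of data in an event database is insufficient, the search results inevitably carry little information. With too little data it is difficult to draw valid conclusions; with poor-quality data it is easy to be led to wrong conclusions. Building an event database that is adequate in both quality and quantity requires an enormous amount of very high-quality work, so drawing valid conclusions has been difficult.
[0009]
Knowledge processing technology is meant to overcome these limitations. However, knowledge processing concerns the reuse of knowledge that is logically defined and established, and building a knowledge base for complex phenomena is nearly impossible. For complex phenomena, new knowledge must be acquired from data whose logical relationships are defined only implicitly, and that knowledge must be used on the spot. Conventional knowledge processing technology is therefore ineffective for solution-search problems in which knowledge acquisition and knowledge use proceed simultaneously, such as finding design solutions, business strategies, or preventive maintenance plans.
[0010]
In view of these problems, an object of the present invention is to provide a database search apparatus, a database search method, a computer program, and a computer-readable recording medium that make it easy to search for a solution suited to the problem of interest.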
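[0011]
[Means for Solving the Problems]
To achieve this object, the invention according to claim 1 is a database search apparatus comprising: storage means storing first records, each having a plurality of fields for describing one of many compounds, and second records, each having a plurality of attributes for describing an element constituting the compounds; extraction means for extracting, from the first records, records in which the attribute value of a field of interest selected from the plurality of fields describing the compounds is non-null and less than a threshold, and records in which the attribute value of that field of interest is non-null and equal to or greater than the threshold; setting means for setting at least two trial functions that take as variables the attribute values of element attributes selected from the second records stored in the storage means; classification means for classifying the compounds by expanding the compound of each record extracted by the extraction means into its elements, executing the calculation of the trial functions set by the setting means on the resulting combination of elements, placing the calculation results on a graph whose coordinate axes are the values obtained from the at least two trial functions, and determining whether the attribute value of the field of interest in each record extracted from the first records by the extraction means exceeds the threshold; and display means for displaying the graph on which the calculation results are placed, changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.
[0012]
The invention according to claim 2 is the database search apparatus according to claim 1, further comprising creation means for creating a new attribute based on the attributes stored in the storage means.
[0013]
The invention according to claim 3 is the database search apparatus according to claim 1 or 2, wherein the setting means comprises: first accepting means for accepting a selection of the attribute; second accepting means for accepting input of a mathematical operation; and means for creating the trial function based on the attribute accepted by the first accepting means and the mathematical operation accepted by the second accepting means.
[0014]
The invention according to claim 4 is the database search apparatus according to any one of claims 1 to 3, wherein the classification means comprises: calculation means for executing the calculation of the trial functions set by the setting means; placement means for placing the results calculated by the calculation means on a graph whose coordinate axes are the values obtained from the at least two trial functions; and comparison means for comparing the value of the field of interest corresponding to a reference point selected from the calculation results placed by the placement means with the values of the field of interest corresponding to other calculation results located around the reference point.
[0015]
The invention according to claim 5 is a database search method in a database search apparatus storing first records, each having a plurality of fields for describing one of many compounds, and second records, each having a plurality of attributes for describing an element constituting the compounds, wherein: extraction means of the apparatus extracts, from the first records, records in which the attribute value of a field of interest selected from the fields describing the compounds is non-null and less than a threshold, and records in which that attribute value is non-null and equal to or greater than the threshold; setting means of the apparatus sets at least two trial functions that take as variables the attribute values of element attributes selected from the second records; classification means of the apparatus classifies the compounds by expanding the compound of each record extracted by the extraction means into its elements, executing the calculation of the trial functions set by the setting means on the resulting combination of elements, placing the calculation results on a graph whose coordinate axes are the values obtained from the at least two trial functions, and determining whether the attribute value of the field of interest in each record extracted from the first records exceeds the threshold; and display means of the apparatus displays the graph on which the calculation results are placed, changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.
[0016]
The invention according to claim 6 is a computer program that causes a computer, which stores first records each having a plurality of fields for describing one of many compounds and second records each having a plurality of attributes for describing an element constituting the compounds, to function as: extraction means for extracting, from the first records, records in which the attribute value of a field of interest selected from the fields describing the compounds is non-null and less than a threshold, and records in which that attribute value is non-null and equal to or greater than the threshold; setting means for setting at least two trial functions that take as variables the attribute values of element attributes selected from the second records; classification means for classifying the compounds by expanding the compound of each record extracted by the extraction means into its elements, executing the calculation of the trial functions set by the setting means on the resulting combination of elements, placing the calculation results on a graph whose coordinate axes are the values obtained from the at least two trial functions, and determining whether the attribute value of the field of interest in each record extracted from the first records exceeds the threshold; and display means for displaying the graph on which the calculation results are placed, changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.
[0017]
The invention according to claim 7 is a computer-readable recording medium on which the computer program according to claim 6 is recorded.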
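[0018]
In the present invention, a solution-search problem, such as finding a design solution for a compound, is solved by using a compound database to perform an exhaustive search of the solution space that combines trial functions with a solution-sorting function. Design is the act of deriving a set of facts that satisfies every condition, starting from the required specifications, the available resources, and the axioms and theorems used to synthesize those resources and predict their attributes. The solution space of a design solution is the space derived from the records corresponding to the facts involved in this act.
[0019]
Using the compound database, the problem is solved by an exhaustive search of the solution space that combines the execution of trial functions, each built from operations on multiple fields, with the sorting function; this search determines a system of trial functions that approximates the space in which the solution exists.
[0020]
In addition, the history of the solution-space search can be recorded as a trial-and-error history of the solution search and, after protocol analysis, accumulated as a reusable knowledge base.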
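[0021]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings. In the following description, a "compound" means a recognized entity consisting of one or more attributes (fields). A "data model" means a model for formalizing, independently of the algorithms of individual application programs, the rules that arise when information is abstracted: how data are held in the abstracted world (for example, the units of data representation), how relationships between data are represented, and the system of operations on the whole.
[0022]
FIG. 1 is a block diagram showing the hardware configuration of a database search apparatus according to an embodiment of the present invention.
[0023]
The database search apparatus 200 shown in the figure consists of a system memory 204, a CPU (Central Processing Unit) 206, a compound database (DB) 208 made up of a set of compound data, and a ROM (Read Only Memory) 210, all connected to a system bus 221.
[0024]
The system memory 204 stores a DB control program 212 and data 213 referenced by that program.
[0025]
The DB control program 212 registers, updates, deletes, and otherwise processes data in the compound database 208. The compound database 208 is the set of data registered by executing the DB control program 212. This data set is stored on an internal or external storage device such as a hard disk, magnetic disk, optical disk, or magneto-optical disk.
[0026]
A display 230, a speaker 232, a keyboard 234, and a mouse 236 are also connected to the system bus 221 as peripheral devices.
[0027]
The display 230 displays images edited by the CPU 206; an LCD (Liquid Crystal Display), a CRT (Cathode-Ray Tube), or the like can be used. The speaker 232 converts electrical signals into sound. The keyboard 234 and the mouse 236 are input devices used to control the cursor on the display 230 and to enter commands to the CPU 206. The display 230, speaker 232, keyboard 234, and mouse 236 are all connected to the system bus 221 through an input/output (I/O) interface (not shown).
[0028]
The functions of the present invention are achieved either by the CPU 206 reading and executing a program stored in the ROM 210 or the system memory 204, or by supplying the database search apparatus 200 with a computer-readable recording medium storing a program of the present invention. In the latter case, the CPU 206 of the database search apparatus 200 reads and executes the program stored on the recording medium.
[0029]
Recording media for supplying the program to the database search apparatus 200 include flexible disks, hard disks, magneto-optical disks, optical disks, CD-ROMs, CD-Rs, magnetic tape, nonvolatile memory cards, ROMs, and flash memory such as CompactFlash (registered trademark); however, the present invention is not limited to these media.
[0030]
The term "program" as used in this embodiment means a program written in a programming language such as a high-level language (for example, BASIC, C, or C++), assembly language, or machine language; however, the present invention is not limited to these languages.
[0031]
Next, the contents of the compound database installed in the database search apparatus 200 of this embodiment are described with reference to FIG. 2.
[0032]
The compound database 208 stores compound data 601 for compounds consisting of two elements, and element data 602. The compound data 601 is a set of records having a plurality of fields (attributes) describing the compounds, such as melting point, density, hardness, lattice constant, precipitate size, dislocation density, grain size, neutron cross section, tensile strength, and yield strength. The element data 602 is a set of records having a plurality of fields (attributes) describing the elements that make up the compounds, such as Mendeleev number, elastic modulus, Young's modulus, elastic constants, and electronegativity. In the following description, to avoid confusing the two, fields of the compound data are called "fields" and fields of the element data are called "attributes".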
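[0033]
Next, the database search method of this embodiment is described with reference to the flowchart of FIG. 3.
[0034]
In this embodiment, for compounds of two elements, the method searches for hidden correlations between properties of the compounds, such as crystal structure or melting point, and attributes of their constituents (elements), such as Mendeleev number or electronegativity. The attributes of the two elements are combined by a mathematical operation (for example, taking the maximum of their atomic numbers); such a combination is called a trial function. The user selects the attributes and the mathematical operations applied to them. The database search apparatus 200 combines the selected attributes with the mathematical operations to set trial functions, and executes the trial functions for all elements to search for a solution. In the following description, the aim of the search is to find a solution space in which a boundary cleanly separates different compound properties, such as crystal structure.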
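As an illustration of the trial-function idea in [0034], the Python sketch below builds a trial function from one element attribute and one binary operation. It is a minimal sketch under stated assumptions, not the patent's implementation: the helper `make_trial_function`, the attribute key `atomic_number`, and the smaller-over-larger ratio convention are hypothetical choices; only the atomic numbers of Ni (28) and Ti (22) are real values.

```python
from typing import Callable, Dict

# Hypothetical element table; only the atomic numbers are real values.
ELEMENTS: Dict[str, Dict[str, float]] = {
    "Ni": {"atomic_number": 28},
    "Ti": {"atomic_number": 22},
}

# A trial function maps the two constituent elements of a binary compound to one number.
TrialFunction = Callable[[Dict[str, float], Dict[str, float]], float]


def make_trial_function(attribute: str,
                        operation: Callable[[float, float], float]) -> TrialFunction:
    """Combine the same attribute of both elements with a binary operation."""
    def trial(e1: Dict[str, float], e2: Dict[str, float]) -> float:
        return operation(e1[attribute], e2[attribute])
    return trial


# "Maximum of the atomic numbers", the example given in the text.
max_atomic_number = make_trial_function("atomic_number", max)
# A ratio-type trial function; smaller-over-larger is an assumed convention.
atomic_number_ratio = make_trial_function("atomic_number", lambda a, b: min(a, b) / max(a, b))

print(max_atomic_number(ELEMENTS["Ni"], ELEMENTS["Ti"]))  # 28
```

Making the ratio symmetric (smaller over larger) means the result does not depend on which element of the pair is listed first; the source does not specify an order.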
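[0035]
First, in step S102, the database search apparatus 200 searches the compound database 208 and treats the field of interest as the dependent variable (DV). It then extracts the records in which the fields of interest that are correlated with one another are not null, and creates a set of the extracted records.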
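Read together with the extraction means of claim 1 (non-null records split at a threshold), step S102 might look like the sketch below. The record layout, the field name `melting_point`, the threshold, and the function `extract_by_threshold` are all hypothetical, and the melting points are placeholders rather than data.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def extract_by_threshold(records: List[Record], field: str,
                         threshold: float) -> Tuple[List[Record], List[Record]]:
    """Split records whose `field` is non-null into (< threshold, >= threshold) groups."""
    below: List[Record] = []
    at_or_above: List[Record] = []
    for rec in records:
        value: Optional[float] = rec.get(field)
        if value is None:  # null values are excluded, as the claims require
            continue
        (below if value < threshold else at_or_above).append(rec)
    return below, at_or_above


# Hypothetical compound records; the melting points are placeholders.
compounds = [
    {"formula": "AB", "melting_point": 1200.0},
    {"formula": "CD", "melting_point": None},
    {"formula": "EF", "melting_point": 1800.0},
]
low, high = extract_by_threshold(compounds, "melting_point", 1500.0)
print([r["formula"] for r in low], [r["formula"] for r in high])  # ['AB'] ['EF']
```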
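[0036]
The field of interest is a field related to the problem the user wants to analyze. For example, when the strength of a material (chemical substance) is the target property of the compound being designed, that is, when material strength is the dependent variable, the tensile strength, yield strength, and hardness fields can serve as fields of interest. Likewise, the fields corresponding to parameters that describe the coherence of a two-phase interface (lattice constant, precipitate size, dislocation density, and grain size) can serve as fields of interest. By contrast, a field unrelated to material strength, such as the neutron cross section, is not a field of interest.
[0037]
Fields are said to be correlated when changing the environment, for example by raising the temperature of the compound, causes the attribute value of the field assumed to be the dependent variable to change in a definite relationship. This covers relationships with respect to physical changes as well as changes over time.
[0038]
Whether a correlation exists can be determined as follows. Suppose the user's problem is to find, on the basis of the properties of the two elements, a correlation between the crystal structure of their compounds (NaCl-type or CsCl-type) and the attribute values of the constituent elements. To test for a correlation with temperature, a set of records of compound data at room temperature (data set 1) is created for all compounds that crystallize in the NaCl or CsCl structure. Next, for all of those compounds, a set of records of compound data at a changed temperature (data set 2) is created. Finally, data set 2 is subtracted from data set 1, and the difference between data sets 1 and 2 is evaluated.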
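The text does not say how data set 2 is "subtracted" from data set 1. One possible reading, sketched below purely as an assumption, treats each data set as a set of (formula, crystal structure) pairs and takes the set difference, so that what remains are the compounds whose recorded structure changes with temperature. The function `structure_changes` and all formulas and structures are placeholders.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, str]  # (formula, crystal structure)


def structure_changes(dataset1: Set[Entry], dataset2: Set[Entry]) -> List[str]:
    """Return formulas whose room-temperature entry is absent at the changed temperature."""
    difference = dataset1 - dataset2  # "data set 1 minus data set 2" as a set difference
    return sorted(formula for formula, _ in difference)


# Placeholder data sets (not real measurements).
dataset1 = {("AB", "NaCl"), ("CD", "CsCl"), ("EF", "NaCl")}  # room temperature
dataset2 = {("AB", "NaCl"), ("CD", "NaCl"), ("EF", "NaCl")}  # changed temperature

print(structure_changes(dataset1, dataset2))  # ['CD']
```

An empty difference under this reading would suggest that, for these compounds, the structure field does not respond to the temperature change.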
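[0039]
The field of interest can be set as the dependent variable by displaying a list of all fields of the compound data on the display 230 and accepting a selection from that list. In this case, the dependent variable is determined by the user's selection with an input device.
[0040]
Alternatively, problems of interest (for example, material strength or stability of substances) may be displayed on the display 230 as a selection menu, and the field corresponding to the problem the user selects with an input device may be set as the dependent variable.
[0041]
Next, in step S104, the database search apparatus 200 expands the compound into its elements.
[0042]
Because the compound database 208 stores compounds of two elements, each compound is here expanded into its basic constituents, namely its elements. For example, NiTi, which is used as a shape-memory alloy, is expanded into Ni and Ti. The database search apparatus 200 then retrieves the element data of the expanded elements from the compound database 208.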
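Assuming, hypothetically, that each compound is stored as a plain formula string, the expansion in step S104 can be sketched with a regular expression that picks out element symbols. The function `expand_compound` is illustrative only; it ignores stoichiometric digits and does not handle parentheses, hydrates, or three-letter placeholder symbols.

```python
import re
from typing import List

# One uppercase letter optionally followed by one lowercase letter, e.g. "Ni", "Ti", "C".
ELEMENT_SYMBOL = re.compile(r"[A-Z][a-z]?")


def expand_compound(formula: str) -> List[str]:
    """Expand a formula such as 'NiTi' into its element symbols; digits are ignored."""
    return ELEMENT_SYMBOL.findall(formula)


print(expand_compound("NiTi"))  # ['Ni', 'Ti']
```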
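[0043]
At the same time, the database search apparatus 200 computes as needed, from the attribute values already stored in the element data, new attribute values that may serve as independent variables for the dependent variable of interest. For example, when material strength is the dependent variable, candidate independent variables include the total energy of the atom and its first and second derivatives; if the compound database stores the total energy of each element as an attribute value, the first and second derivatives can be calculated from the stored data.
[0044]
Next, in step S106, the database search apparatus 200 creates, for all of the expanded elements, a set of records made up of attribute values that may serve as independent variables for the dependent variable and that are not null.
[0045]
The database search apparatus 200 combines into one record the fields that have a value for every element and that are assumed to be independent variables for the dependent variable, and thereby creates a set of element records (the element database).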
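Steps S106 and [0045] keep only attributes that have a value for every element. A minimal sketch, assuming the element data is held as a dictionary of per-element rows in which missing values are `None`; the function `build_element_database`, the attribute keys, and the numbers are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def build_element_database(rows: Dict[str, Dict[str, Optional[float]]],
                           candidates: List[str]) -> Dict[str, Dict[str, float]]:
    """Keep only the candidate attributes that are non-null for every element."""
    complete = [a for a in candidates
                if all(row.get(a) is not None for row in rows.values())]
    return {sym: {a: row[a] for a in complete} for sym, row in rows.items()}


# Hypothetical element rows; values are placeholders, not real data.
rows = {
    "A": {"electronegativity": 1.9, "bulk_modulus": None, "mendeleev_number": 10},
    "B": {"electronegativity": 1.5, "bulk_modulus": 110.0, "mendeleev_number": 20},
}
db = build_element_database(rows, ["electronegativity", "bulk_modulus", "mendeleev_number"])
print(db["A"])  # {'electronegativity': 1.9, 'mendeleev_number': 10}
```

Dropping an attribute as soon as one element lacks it keeps every later trial-function evaluation defined, at the cost of discarding partially filled attributes.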
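[0046]
Next, in step S108, the database search apparatus 200 accepts a selection of specific attributes from those constituting the element database, together with input of mathematical operations on the selected attributes. It then sets trial functions based on the accepted attributes (independent variables) and mathematical operations.
[0047]
Here, to relate the dependent variable (the field of interest) to the independent variables (the attributes of each element), the user selects at least one desired attribute from the element database and at least two mathematical operations on the selected attribute. The database search apparatus 200 sets and stores trial functions that combine the attributes and mathematical operations selected by the user.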
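The combination step in [0047] amounts to pairing each selected attribute with each selected operation. The sketch below assumes a hypothetical operation menu: "Sum" and "Difference" are named for menu 302 in [0060], and "Ratio" and "Maximum" follow the examples in step S110, but the symmetric definitions and the function `set_trial_functions` are assumptions.

```python
import operator
from itertools import product
from typing import Callable, Dict, List

ElementRow = Dict[str, float]
TrialFunction = Callable[[ElementRow, ElementRow], float]

# Hypothetical operation menu; the symmetric forms are assumptions.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "Sum": operator.add,
    "Difference": lambda a, b: abs(a - b),
    "Ratio": lambda a, b: min(a, b) / max(a, b),
    "Maximum": max,
}


def set_trial_functions(attributes: List[str], op_names: List[str]) -> Dict[str, TrialFunction]:
    """Create one trial function per (attribute, operation) pair selected by the user."""
    if not attributes or len(op_names) < 2:
        raise ValueError("select at least one attribute and at least two operations")
    trials: Dict[str, TrialFunction] = {}
    for attr, name in product(attributes, op_names):
        op = OPERATIONS[name]
        # Default arguments bind the current attr/op, avoiding late binding in the loop.
        trials[f"{name}({attr})"] = lambda e1, e2, attr=attr, op=op: op(e1[attr], e2[attr])
    return trials


trials = set_trial_functions(["mendeleev_number"], ["Ratio", "Maximum"])
a, b = {"mendeleev_number": 10.0}, {"mendeleev_number": 20.0}  # placeholder values
print({name: f(a, b) for name, f in trials.items()})
# {'Ratio(mendeleev_number)': 0.5, 'Maximum(mendeleev_number)': 20.0}
```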
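[0048]
The attribute values assumed to be independent variables for the dependent variable of interest can be set by displaying a list of candidate attribute values on the display 230 and accepting a selection from that list; in this case, they are set according to the user's selection, made with an input device, from among all the attribute values. Mathematical operations can likewise be set by displaying a list of candidates on the display 230 and accepting a selection from it.
[0049]
Finally, in step S110, to classify the set by the value of the dependent variable, the database search apparatus 200 performs an exhaustive search that combines systematic computation of the trial functions with sorting by the value of the dependent variable. This operation searches for the trial function best suited to distinguishing the values of the dependent variable on the basis of a threshold.
[0050]
FIG. 4 is a flowchart showing the processing of step S110 in more detail. The database search apparatus 200 executes the trial functions set in step S108 on a given combination of elements (S801). Specifically, if the trial functions are "ratio of the Mendeleev numbers of the two elements" and "maximum of the Mendeleev numbers of the two elements", the database search apparatus 200 picks two elements and computes these trial functions for them.
[0051]
For each calculation result, the database search apparatus 200 determines whether the value of the dependent variable for the compound consisting of the two elements used in the calculation exceeds a threshold (classification of the compound). This determination divides the calculation results into those with the property that the dependent variable is below the threshold and those with the property that it is at or above the threshold (S802).
[0052]
The calculation result is then placed in the solution space defined by the trial functions (S803), that is, in a solution space whose x-axis is "ratio of the Mendeleev numbers of the two elements" and whose y-axis is "maximum of the Mendeleev numbers of the two elements". This calculation and placement is performed exhaustively for all elements (S804), until all calculation results have been placed in the solution space (sorting).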
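Steps S801 to S804 can be sketched as a loop that evaluates the two example trial functions for each binary compound, classifies it against the threshold, and records the resulting point. This is a sketch under assumptions: it iterates over compounds that have a dependent-variable value rather than over all element combinations, the smaller-over-larger ratio is an assumed convention, and the Mendeleev numbers and dependent-variable values are placeholders.

```python
from typing import Dict, List, Tuple

# Placeholder Mendeleev numbers and dependent-variable values, not real data.
ELEMENTS: Dict[str, float] = {"A": 10.0, "B": 20.0, "C": 40.0}
COMPOUNDS: Dict[Tuple[str, str], float] = {("A", "B"): 0.3, ("B", "C"): 0.9}

PlacedPoint = Tuple[float, float, bool]  # (x, y, dependent variable >= threshold)


def place_in_solution_space(compounds: Dict[Tuple[str, str], float],
                            mendeleev: Dict[str, float],
                            threshold: float) -> List[PlacedPoint]:
    points: List[PlacedPoint] = []
    for (sym1, sym2), dv in compounds.items():
        m1, m2 = mendeleev[sym1], mendeleev[sym2]
        x = min(m1, m2) / max(m1, m2)  # S801: ratio of the Mendeleev numbers (assumed order)
        y = max(m1, m2)                # S801: maximum of the Mendeleev numbers
        points.append((x, y, dv >= threshold))  # S802 classify, S803 place
    return points  # S804: every compound has been processed


print(place_in_solution_space(COMPOUNDS, ELEMENTS, threshold=0.5))
# [(0.5, 20.0, False), (0.5, 40.0, True)]
```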
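[0053]
Next, the database search apparatus 200 selects one of the points placed in the solution space as a reference point (S805) and determines its property. It then finds the one or more points nearest to the reference point, compares their properties with that of the reference point, and determines whether they are the same (S806). It then calculates the probability that the point nearest to the reference point has the same property as the reference point.
[0054]
Next, it finds the one or more points second nearest to the reference point and, in the same way, obtains the probability that those points have the same property as the reference point. This processing is repeated for all points placed in the solution space (S807).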
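One way to compute the probabilities in S805 to S807 is to treat every point in turn as the reference point, sort the remaining points by Euclidean distance, and record for each neighbour rank k how often the k-th nearest point shares the reference point's class. The function `agreement_profile` is a hypothetical sketch: it breaks distance ties arbitrarily, whereas the text speaks of "one or more" points at each rank. Because `math.dist` accepts any dimension, the same code also covers the three-dimensional spaces mentioned in [0073].

```python
import math
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[Sequence[float], bool]


def agreement_profile(points: List[LabeledPoint], max_rank: int) -> List[float]:
    """P[k-1] = fraction of reference points whose k-th nearest neighbour shares their class."""
    if len(points) <= max_rank:
        raise ValueError("need more points than max_rank")
    hits = [0] * max_rank
    for i, (ref, ref_label) in enumerate(points):
        # Distances from this reference point to every other point, nearest first.
        others = sorted((math.dist(ref, p), label)
                        for j, (p, label) in enumerate(points) if j != i)
        for k in range(max_rank):
            if others[k][1] == ref_label:
                hits[k] += 1
    return [h / len(points) for h in hits]


# Two placeholder clusters with opposite classes.
points = [((0, 0), True), ((0, 1), True), ((1, 0), True),
          ((10, 10), False), ((10, 11), False), ((11, 10), False)]
print(agreement_profile(points, max_rank=3))  # [1.0, 1.0, 0.0]
```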
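[0055]
Next, based on the probabilities obtained, the database search apparatus 200 determines whether the properties of the points are well "separated" beyond some distance from the reference point. Specifically, it finds the boundary in the solution space at which the probability obtained above decreases most sharply. The solution space with the best separation is then identified as the solution space suited to the problem of interest.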
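The boundary criterion in [0055] can be approximated on the neighbour-rank profile: the rank after which agreement falls by the largest amount marks where the classes stop being separated. This sketch assumes, as window 401 does, that neighbour rank stands in for distance; the function `sharpest_drop` and the profile values are hypothetical.

```python
from typing import List, Tuple


def sharpest_drop(profile: List[float]) -> Tuple[int, float]:
    """Return (rank k, drop) where P(k) - P(k+1) is largest; ranks are 1-based."""
    if len(profile) < 2:
        raise ValueError("profile needs at least two ranks")
    drops = [profile[k] - profile[k + 1] for k in range(len(profile) - 1)]
    k = max(range(len(drops)), key=drops.__getitem__)
    return k + 1, drops[k]


rank, drop = sharpest_drop([1.0, 0.95, 0.9, 0.3, 0.25])  # placeholder profile
print(rank, round(drop, 2))  # 3 0.6
```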
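[0056]
In step S110, the trial-function results classified by the value of the dependent variable (that is, the solutions) are presented to the user by displaying each point placed in the solution space, color-coded according to the determination result.
[0057]
In this way, combining systematic computation of trial functions with sorting by the values of the independent variables to perform an exhaustive search for solutions makes it possible to identify a solution space that classifies the compounds better.
[0058]
This embodiment is particularly effective when no record matches the search conditions and systematic sorting based only on the attribute values of the constituent elements cannot identify a solution space describing the correlation between the macroscopic properties of substances and their basic constituents (elements).
[0059]
Next, examples of screens displayed during the database search method of this embodiment are described with reference to FIGS. 5 to 7. In the following description, the user's problem is compound stability, which is analyzed by determining whether a given combination is a compound former or a compound non-former (the latter being the case in which the constituent elements occupy lattice sites randomly or become amorphous). This example is used as a sample because its data have been thoroughly examined; the choice of dependent variable is not limited to it.
[0060]
FIG. 5 shows an example of the input box displayed in step S108. In the figure, the input box 300 contains a menu 301 of independent variables and a menu 302 of mathematical operations. Menu 301 lists selectable independent variables, that is, attribute values such as bulk modulus, Young's modulus, modulus of rigidity, and Mendeleev number. Menu 302 lists operations such as Sum and Difference. The database search apparatus 200 searches the solution space using the trial functions set from these menu selections.
[0061]
The user can operate the mouse 236 to select the desired independent variable from menu 301 and the desired mathematical operations from menu 302 of the input box 300. In the figure, "M1 Mendeleev number" is selected in menu 301, and all rows of menu 302 are selected. By pressing the "Next (Weiter)" button 303, the user sets the trial functions. This realizes the processing of accepting the selection of an element attribute, accepting mathematical operations, and combining the accepted attribute and operations to create trial functions.
[0062]
FIG. 6 shows an example of the display of calculation results in step S110.
[0063]
Once the trial-function calculations are complete, the screen shown in the figure is displayed. The left window 401 shows the probability that a point has the same property as the reference point, as a function of the distance from the reference point (Neighbours). For example, "20" on the horizontal axis indicates a distance of 20 units from the reference point in the solution space.
[0064]
The lower-right window 403 lists the calculation results for each trial function, sorted by the integral of the curve at the nearest distance from the reference point (that is, 1 unit from the reference point).
[0065]
Finally, the upper-right window 402 lists the trial functions. Window 402 has three columns, from left to right: "Profile", "1", and "50". The "Profile" column shows the name of the trial function. The number in parentheses in column "1" is the probability that the point nearest to the reference point has the same property as the reference point, and the number in parentheses in column "50" is the probability that the 50th-nearest point has the same property as the reference point.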
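A listing in the style of window 402 could be produced from per-trial-function agreement profiles as sketched below. The sort order used here (agreement at rank 1 first, agreement at rank 50 as tie-breaker) is an assumption, since [0064] describes the sort key only as an integral of the nearest-distance curve; the function `profile_table`, the names, and the values are placeholders.

```python
from typing import Dict, List, Tuple


def profile_table(profiles: Dict[str, List[float]],
                  far_rank: int = 50) -> List[Tuple[str, float, float]]:
    """Rows of (trial function, P at rank 1, P at rank `far_rank`), best first."""
    rows = [(name, p[0], p[far_rank - 1]) for name, p in profiles.items()]
    # Assumed ordering: nearest-neighbour agreement first, far-rank agreement as tie-breaker.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Placeholder profiles (50 ranks each); names and values are illustrative only.
profiles = {
    "f_a": [0.8] * 50,
    "f_b": [1.0] * 49 + [0.92],
    "f_c": [1.0] * 49 + [0.7],
}
for name, p1, p50 in profile_table(profiles):
    print(f"{name}  1 ({p1:.2f})  50 ({p50:.2f})")
```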
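[0066]
According to window 402, in the solution space defined by the trial function "38-03/38-06/00-00", 100% of the points at a distance of 1 unit from the reference point, and about 92% of the points at a distance of 50 units, have the same crystal structure as the compound corresponding to the reference point. Curve 404 in the left window 401 shows the separation achieved by this trial function. These results show a very good "separation" between the two crystal structure patterns, compound former and compound non-former. This trial function therefore classifies the compounds best, and it is highly likely to correctly predict the structure of new compounds whose dependent-variable values are not in the database.
[0067]
FIG. 7 shows a graphical representation of the solution space for this trial function. In the left window 501, the trial-function results are plotted with the ratio of the Mendeleev numbers on the x-axis and the maximum Mendeleev number on the y-axis. Black points indicate that the substance made of the elements used in the calculation forms a compound; white points indicate that it does not. As the figure shows, the crystal structure patterns (white and black points) are well separated.
[0068]
By identifying the solution space in this way, the designer can predict the crystal structure pattern even of substances for which no data exist. The technique is not limited to crystal structure patterns.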
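The text does not say how a prediction is read off the identified solution space. One simple possibility, swapped in here as an assumption, is a k-nearest-neighbour majority vote among the placed points. The function `predict_pattern`, the choice k = 3, and the data are hypothetical; because the axes have very different scales, rescaling them before measuring distance would be a reasonable design choice in practice.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[Sequence[float], str]


def predict_pattern(points: List[LabeledPoint], query: Sequence[float], k: int = 3) -> str:
    """Majority label among the k placed points nearest to `query` (k-NN vote)."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Placeholder solution-space points: (ratio, maximum) -> pattern.
placed = [((0.2, 80.0), "former"), ((0.25, 82.0), "former"), ((0.3, 79.0), "former"),
          ((0.9, 20.0), "non-former"), ((0.85, 22.0), "non-former")]
print(predict_pattern(placed, (0.22, 81.0)))  # former
```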
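[0069]
Also, because each point corresponds to technical literature, research and development trends for compounds can be grasped visually from the points in the solution space; for example, a dense cluster of points suggests an area of active research.
[0070]
In the example shown in the figure, solutions are distinguished by the color of the points, but other display forms, such as shape, may be used.
[0071]
Although embodiments of the present invention have been described in detail above, the present invention is of course not limited to them and can be implemented in various other forms.
[0072]
For example, the conventional orthodox approach to elucidating the mechanisms of complex compounds requires careful deliberation in setting and executing trial functions, and a compound database system that applies conventional search methods requires enormous computation time and resources. The rate-limiting calculations in the database search process described above may therefore be performed by a separate calculation server, with the results registered in the compound database as calculated values.
[0073]
The space represented by the trial functions is not limited to two dimensions. FIG. 8 shows an example of a solution space when the space represented by the trial functions is three-dimensional.
[0074]
In addition, by limiting the mathematical operations of the trial functions to a simple function system consisting only of the four arithmetic operations and comparison operations, the time needed to verify a working hypothesis is cut to a few minutes, giving the system practical value as an information processing apparatus that accelerates the user's thinking during problem solving.
[0075]
Besides the design solutions described above, the present invention can also be applied to searching for solutions in business strategy, where the required specifications of the design are business goals, and in preventive maintenance, where the required specification is maintaining the soundness of artifacts.
[0076]
[Effects of the Invention]
As described above, according to the present invention, a solution space suited to the problem of interest can be identified easily.
[0077]
This makes it possible to build an information system platform that supports both the process of acquiring ideas by making full use of a compound database, which is a collection of facts, and the process of realizing the acquired ideas as facts.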
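[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the hardware configuration of the database search apparatus according to an embodiment of the present invention.
[FIG. 2] A diagram showing an example of the compound data and element data stored in the compound database.
[FIG. 3] A flowchart showing the procedure of the database search method according to an embodiment of the present invention.
[FIG. 4] A flowchart showing the processing of step S110 in more detail.
[FIG. 5] A diagram showing an example of the input box displayed in step S108.
[FIG. 6] A diagram showing an example of the display of calculation results in step S110.
[FIG. 7] A diagram showing an example of the graphical representation of the solution space displayed in step S110.
[FIG. 8] A diagram showing an example of a solution space when the space represented by the trial functions is three-dimensional.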
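[Explanation of Reference Signs]
200 Database search apparatus
204 System memory
206 CPU
208 Compound database (DB)
210 ROM
212 DB control program
213 Data
221 System bus
232 Speaker
234 Keyboard
236 Mouse
300 Input box
301, 302 Menus
303 "Next" button
401, 402, 403, 501 Windows
404 Curve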
BACKGROUND OF THE INVENTION
  The present invention relates to a data mining technique for finding a specific pattern from a database of complex events such as scientific data and economic data, and more specifically, a record group having a large number of fields.CompoundFor databases,CompoundTo systematically find causal relationships that are implicitly implied in the database,CompoundThe present invention relates to a database search device, a database search method, a computer program, and a computer-readable recording medium that expands a component into a component and performs operations between attributes of the component and data reorganization.
[0002]
[Prior art]
  Complex system behavior cannot be expressed completely and explicitly due to constraints such as observation means, measurement method, and data description. For complex events, the data stored in the event database is different from actual events. Therefore, numerical data that can be obtained from the literature, a measurement method for obtaining the numerical data, a description related to the measurement conditions (hereinafter, this description is referred to as “related parameters”), and a document that describes these contents ( A fragmentary database will be constructed by combining bibliographic items related to (source). Attempts to construct an event database using such methods have failed. An event database constructed in accordance with an original data model composed of a combination of data and related parameters is extremely difficult to obtain a satisfactory evaluation from the user.
[0003]
  In general, in using a database, a user inputs a plurality of search conditions or selects a desired search condition from a search condition menu. The database creates a record set that matches each search condition, extracts a desired record group by a logical operation of the record set, and executes display or printing.
[0004]
  By such processing, in the case of a document database, document information and bibliographic items suitable for the purpose can be obtained. In the case of an event database, attribute values such as numerical values, text, diagrams, and photographs can be obtained. In either case of the document database or the event database, a record group as a subset of the accumulated record group can be obtained together with the attribute value.
[0005]
  Further, in both case of the literature database and the event database, the search result for the event not stored as a record in the database is an empty set. An event database constructed in accordance with the original data model often returns an empty set in response to a search request due to the difficulty of data editing.
[0006]
  In other words, in order to build a database search system that can satisfy users, it is assumed that the fact that the observed facts can be accurately described without excess or deficiency is an exception. It is important to consider whether to supplement.
[0007]
[Problems to be solved by the invention]
  Currently, in order to overcome the “language ambiguity” which is one of the attributes of natural language, keywords are prepared for each field, and search using such keywords is widely used. However, the exact definition and usage of keywords cannot be enforced. For this reason, there is a limit to support by a thesaurus or natural language processing, and the search result necessarily includes noise or a search leak occurs. Therefore, the validity of the search result is left to the user.
[0008]
  In addition, the event database inevitably has a small amount of information in search results when the data amount and data quality are insufficient. In the former case, it is difficult to draw a valid conclusion, and in the latter case, it is easy to lead to a wrong conclusion. That is, there is a problem that it is difficult to draw an effective conclusion because an extremely high-quality work is required to construct an event database with both quality and quantity.
[0009]
  The use of knowledge processing technology overcomes these limitations. However, knowledge processing technology is a technology related to the reuse of knowledge that is logically defined and established, and it is almost impossible to build a knowledge base for complex events. In the case of complex events, it is necessary to acquire new knowledge from a data group in which a logical relationship is defined only implicitly, and consider using the knowledge acquired on the spot. Therefore, the conventional knowledge processing technology has a problem that it is not effective for the problem of searching for solutions in which knowledge acquisition and knowledge use proceed simultaneously, such as solutions for design solutions, business strategies and preventive maintenance. .
[0010]
  In view of such a problem, an object of the present invention is to provide a database search apparatus, a database search method, a computer program, and a computer-readable recording medium that can easily search for a solution suitable for the problem of interest. And
[0011]
[Means for Solving the Problems]
  In order to achieve such an object, the invention described in claim 1 is a database search device,Many compoundsA first record having a plurality of fields for describingCompoundConfigureelementStorage means for storing a second record having a plurality of attributes for describingThe attribute value of the field of interest selected from the plurality of fields for describing the compound is not a null value and is less than a certain threshold, and the attribute value of the field of interest is not a null value. A record that is equal to or greater than a threshold value from the first record.Extraction means for extracting and stored in the storage meansAboveSelected from the second recordOf the elementsattributeAttribute valueWith variablesTrySetting means for setting at least two trial functions to:The compound of the record extracted by the extraction unit is expanded into elements, and the combination of the expanded elements is combined with the setting unit.SetAboveTrial functionAnd the results of the calculation are arranged on a graph with coordinate values as numerical values obtained by at least two trial functions, andBy the extraction meansFrom the first recordExtractedAboveOf the field of interest in the recordattributevalueBy determining whether or not exceeds a certain threshold,Classification means to classify andThe graph in which the calculation result is arranged is changed by changing the display form of the calculation result of each compound according to the determination result by the classification means as to whether or not the attribute value of the field of interest exceeds a certain threshold value. Display means for displaying;It is provided with.
[0012]
  The invention described in claim 2 is the database search apparatus according to claim 1, further comprising creating means for creating a new attribute based on the attribute stored in the storage means.
[0013]
  Further, the invention according to claim 3 is the database search device according to claim 1 or 2, wherein the setting means includes a first accepting means for accepting selection of the attribute and an input of a mathematical operation. And a means for creating the trial function based on the attribute accepted by the first accepting means and the mathematical operation accepted by the second accepting means. And
[0014]
  Further, the invention according to claim 4 is the database search device according to any one of claims 1 to 3, wherein the classification means includes calculation means for executing calculation of a trial function set by the setting means;Graph with coordinate values as numerical values obtained by at least two trial functionsAbove, an arrangement means for arranging the calculation result executed by the calculation means, a value of the field of interest corresponding to a reference point selected from the calculation results arranged by the arrangement means, and the reference point And comparison means for comparing the value of the field of interest corresponding to other calculation results existing around the.
[0015]
  Claims5The invention described inMany compoundsA first record having a plurality of fields for describingCompoundConfigureelementA database search method in a database search device storing a second record having a plurality of attributes for describingThe extraction means of the database search device comprises:SaidThe attribute value of the field of interest selected from the plurality of fields for describing the compound is not a null value and is less than a certain threshold, and the attribute value of the field of interest is not a null value. A record that is equal to or greater than a threshold value from the first record.Extract andThe setting means of the database search device includes theSelected from the second recordOf the elementsattributeAttribute valueWith variablesTrySet at least two trial functions toThe categorizing means of the database search device expands the compound of the record extracted by the extracting means into elements, and for the combination of the expanded elements, the setting meansSetAboveTrial functionThe calculation results are arranged on a graph having coordinate values as numerical values obtained by the at least two trial functions, and extracted from the first record by the extraction means.ExtractedAboveOf the field of interest in the recordattributevalueBy determining whether or not exceeds a certain threshold,ClassificationThe calculation result of each compound according to the determination result by the classification means whether or not the display means of the database search device exceeds a certain threshold value of the attribute value of the field of interest Change the display form of the above, and display the graph with the same calculation resultIt is characterized by.
[0016]
  Claims6The invention described in is a computer program,Many compoundsA first record having a plurality of fields for describingCompoundConfigureelementStoring a second record having a plurality of attributes for describingIn addition,SaidThe attribute value of the field of interest selected from the plurality of fields for describing the compound is not a null value and is less than a certain threshold, and the attribute value of the field of interest is not a null value. A record that is equal to or greater than a threshold value from the first record.ExtractionExtracting means forSelected from the second recordOf the elementsattributeAttribute valueWith variablesTrySet at least two trial functionsAnd setting means for expanding the compound of the record extracted by the extracting means into elements, and for the combination of the expanded elements, the setting meansSetAboveTrial functionThe calculation results are arranged on a graph having coordinate values as numerical values obtained by the at least two trial functions, and extracted from the first record by the extraction means.ExtractedAboveOf the field of interest in the recordattributevalueBy determining whether or not exceeds a certain threshold,ClassificationThe calculation result is arranged by changing the display form of the calculation result of each compound according to the determination result by the classification means and whether the attribute value of the field of interest exceeds a threshold value or not. And function as a display means for displaying the graph.It is characterized by that.
[0017]
  Claims7The invention described in claim 1 is a computer-readable recording medium.6The computer program described in 1 is recorded.
[0018]
  In the present invention, a search problem of a solution such as a design solution of a compound isCompoundUsing a database, the problem is solved by exhaustive search of the solution space combining trial functions and solution sorting functions. Design is an act of deriving a set of facts that satisfy any condition from requirements specifications and available resources, axioms and theorems for composition and attribute prediction of resources. The solution space of the design solution is a space derived from the record corresponding to the fact about this action.
[0019]
  CompoundThe database determines the function system of the trial function for approximating the space where the solution exists, by exhaustive search of the solution space that combines the execution of the trial function consisting of multiple field operations and the sorting function, Try to solve the problem.
[0020]
  Moreover, the search history of the solution space can be recorded as a trial and error history of solution search, and can be accumulated as a reusable knowledge base after protocol analysis.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
  Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the following explanation,CompoundMeans a recognized individual composed of one or more attributes (fields).The data model is a process of abstracting information, and is a method of holding data such as the data representation format unit in the abstracted world, a method of representing data interrelationships, and a system of these overall operation methods. This is a model for making rules independent of the algorithm of each application program.
[0022]
  FIG. 1 is a block diagram showing a hardware configuration of a database search apparatus according to an embodiment of the present invention.
[0023]
  The database search apparatus 200 shown in the figure includes a system memory 204, a CPU (Central Processing Unit) 206, and a plurality ofCompoundConsist of a collection of dataCompoundA database (DB) 208 and a ROM (Random Access Memory) 210 are connected to the system bus 221.
[0024]
  The system memory 204 stores a DB control program 212 and data 213 referred to by this program.
[0025]
  The DB control program 212CompoundThis is a program for performing processing such as data registration, update, and deletion for the database 208.CompoundThe database 208 is a set of data registered by executing the DB control program 212. These sets of data are registered on an internal or external storage device such as a hard disk, a magnetic disk, an optical disk, or a magneto-optical disk.
[0026]
  Further, a display 230, a speaker 232, a keyboard 234, and a mouse 236 are connected to the system bus 221 as peripheral devices.
[0027]
  The display 230 displays an image edited by the CPU 206, and an LCD (Liquid Crystal Display), a CRT (Cathode-Ray Tube), or the like can be used. The speaker 232 converts the electrical signal into sound and outputs it. The keyboard 234 and the mouse 236 are input devices used for controlling the cursor on the display 230 and inputting commands to the CPU 206. The display 230, the speaker 232, the keyboard 234, and the mouse 236 are all connected to the system bus 221 via an input / output (I / O) interface (not shown).
[0028]
  The functions related to the present invention are as follows. The CPU 206 reads out and executes a program stored in the ROM 210 or the system memory 204, or stores a computer-readable recording medium storing the program related to the present invention in the database search apparatus 200. This is achieved by supplying. In the latter case, the CPU 206 of the database search apparatus 200 reads and executes the program stored in the recording medium.
[0029]
  As a recording medium for supplying a program to the database search device 200, specifically, a flexible disk, a hard disk, a magneto-optical disk, an optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, and a ROM, A flash memory such as a compact flash (registered trademark) can also be used, but the present invention is not limited to these media.
[0030]
  Further, the term “program” used in the present embodiment refers to what is described in a program language such as a high-level programming language such as BASIC, C language, or C ++, an assembly language, or a machine language. It is not limited to these languages.
[0031]
  Next, referring to FIG. 2, the database retrieval apparatus 200 according to the present embodiment is installed.CompoundDescribe the contents of the database.
[0032]
  CompoundDatabase 208 contains compounds of two kinds of elementsCompoundData 601;elementData 602 is stored.CompoundData 601 isCompoundIs a set of records having a plurality of fields (attributes) for describing.CompoundThe records that make up the data are melting point, density, hardness, lattice constant, precipitate size, dislocation density, grain size, neutron cross section, tensile strength, yield strength (yield strength) ) Etc.elementData 602 isCompoundConfigureelementIs a set of records having a plurality of fields (attributes) for describing, and includes fields such as a Mendeleev number, an elastic modulus, a Young's modulus, an elastic constant, and an electronegativity. In the following description:CompoundData andelementTo avoid confusion with data fields,CompoundThe field of data is “field”,elementThe data field is described as “attribute”.
[0033]
  Next, the database search method according to the present embodiment will be described with reference to the flowchart of FIG. 3.
[0034]
  In the present embodiment, for compounds composed of two elements, the apparatus searches for hidden correlations between a property of the compound, such as its crystal structure or melting point, and attributes of its components (elements), such as the Mendeleev number or electronegativity. The attributes of the two elements are combined by mathematical operations (e.g., taking the maximum of the two elements' values), and such a combination is called a trial function. The user selects the attributes and a plurality of mathematical operations on them. The database search apparatus 200 combines the selected attributes with the mathematical operations to set trial functions, and evaluates the trial functions for all element combinations to search for a solution. In the following description, the purpose of the search is to find a solution space in which a boundary cleanly separates compounds with different properties, such as different crystal structures.
[0035]
  First, in step S102, the database search device 200 searches the compound database 208 and takes the field of interest as the dependent variable (DV). It then extracts the records in which the field of interest, and any field correlated with it, are not null values, and creates a set of the extracted records.
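Step S102 amounts to a null-aware filter over the compound records. The Python sketch below is a hedged illustration only: the record layout, the field names such as `melting_point`, the helper `extract_records`, and the placeholder values are all hypothetical and are not taken from the compound database 208.

```python
from typing import Any, Optional

# Hypothetical record layout: field name -> value, with None standing in for a null value.
CompoundRecord = dict[str, Optional[Any]]


def extract_records(records: list[CompoundRecord], field_of_interest: str,
                    correlated_fields: tuple[str, ...] = ()) -> list[CompoundRecord]:
    """Return the records whose field of interest (the dependent variable)
    and every listed correlated field hold non-null values."""
    required = (field_of_interest, *correlated_fields)
    return [rec for rec in records if all(rec.get(name) is not None for name in required)]


# Placeholder records for illustration only (values are not real measurements).
compounds = [
    {"formula": "NiTi", "melting_point": 1.0, "hardness": None},
    {"formula": "NaCl", "melting_point": 2.0, "hardness": 3.0},
    {"formula": "CsCl", "melting_point": None, "hardness": 4.0},
]

print([r["formula"] for r in extract_records(compounds, "melting_point")])
# ['NiTi', 'NaCl']
print([r["formula"] for r in extract_records(compounds, "melting_point", ("hardness",))])
# ['NaCl']
```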
[0036]
  The field of interest is a field related to the problem that the user is trying to analyze. For example, when the strength of a material (chemical substance) is the property of interest for a compound being designed, that is, when material strength is the dependent variable, the tensile strength, yield strength, and hardness fields are set as “fields of interest”. Similarly, fields corresponding to parameters that describe the coherency of a two-phase interface (lattice constant, precipitate size, dislocation density, and grain size) can be set as “fields of interest”. On the other hand, a field unrelated to material strength, such as the neutron cross section, is not a “field of interest”.
[0037]
  A correlation between fields means, for example, that when the environment is changed, such as by raising the temperature of the compound, the attribute value of the field taken as the dependent variable changes according to some relationship. This covers relationships to physical changes as well as to changes over time.
[0038]
  Whether a correlation exists can be determined as follows. Suppose the problem the user wants to analyze is to find, for compounds of two elements, a correlation between the crystal structure of those compounds (whether NaCl or CsCl type) and the attribute values of their constituent elements. To determine a temperature correlation, a set of compound data records (data set 1) is first created for all compounds that crystallize in the NaCl or CsCl structure. Then, for all of those compounds, a set of compound data records at a different temperature (data set 2) is created. Finally, data set 2 is subtracted from data set 1, and the difference between the two data sets is evaluated.
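One possible reading of “subtracting data set 2 from data set 1” is to pair the records for the same compound and keep only those whose field of interest changed with temperature. The sketch below assumes that interpretation; the compound labels `AB`, `CD`, `EF`, the functions `structure_changes` and `changed_fraction`, and the structure values are hypothetical.

```python
def structure_changes(data_set_1: dict[str, str],
                      data_set_2: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Compare two data sets keyed by compound and return the compounds whose
    recorded value (here a crystal structure type) differs between them."""
    common = data_set_1.keys() & data_set_2.keys()
    return {c: (data_set_1[c], data_set_2[c]) for c in common
            if data_set_1[c] != data_set_2[c]}


def changed_fraction(data_set_1: dict[str, str], data_set_2: dict[str, str]) -> float:
    """Fraction of compounds present in both sets whose value changed;
    0.0 would suggest no temperature correlation for this field."""
    common = data_set_1.keys() & data_set_2.keys()
    if not common:
        raise ValueError("the data sets share no compounds")
    return len(structure_changes(data_set_1, data_set_2)) / len(common)


# Hypothetical compounds and structures at two temperatures, for illustration only.
base_temperature = {"AB": "CsCl", "CD": "NaCl", "EF": "NaCl"}
changed_temperature = {"AB": "NaCl", "CD": "NaCl", "EF": "NaCl"}
print(structure_changes(base_temperature, changed_temperature))  # {'AB': ('CsCl', 'NaCl')}
print(changed_fraction(base_temperature, changed_temperature))
```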
[0039]
  The field of interest can be set as the dependent variable by displaying on the display 230 a list of all fields contained in the compound data and accepting the selection of a field from the list. In this case, the dependent variable is determined by the user's selection operation with the input device.
[0040]
  Alternatively, problems of interest (for example, material strength or substance stability) may be displayed on the display 230 as a selection menu, and the field corresponding to the problem selected by the user with the input device may be set as the dependent variable.
[0041]
  Next, in step S104, the database search device 200 expands each compound into its elements.
[0042]
  Because the compound database 208 stores compounds composed of two kinds of elements, each compound is expanded into its basic components, that is, its elements. For example, NiTi, which is used as a shape memory alloy, is expanded into Ni and Ti. The database search device 200 then extracts the element data of the expanded elements from the compound database 208.
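The expansion of a binary compound into its elements can be sketched as a small formula parser followed by a table lookup. This is a hypothetical illustration, assuming formulas are written with standard one- or two-letter element symbols and optional counts; `expand_binary_compound`, `element_data_for`, and the placeholder element table are not from the patent.

```python
import re

# One element symbol (capital letter plus optional lowercase letter) with an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole formula is one or more such tokens and nothing else.
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def expand_binary_compound(formula: str) -> tuple[str, str]:
    """Expand a two-element formula such as 'NiTi' into its element symbols."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    # dict.fromkeys drops repeated symbols while keeping first-seen order.
    symbols = list(dict.fromkeys(sym for sym, _count in TOKEN.findall(formula)))
    if len(symbols) != 2:
        raise ValueError(f"{formula!r} is not a two-element compound")
    return symbols[0], symbols[1]


def element_data_for(formula: str,
                     element_table: dict[str, dict[str, float]]) -> list[dict[str, float]]:
    """Look up the element records of both constituents (KeyError if one is missing)."""
    return [element_table[symbol] for symbol in expand_binary_compound(formula)]


# Hypothetical element table with placeholder attribute values.
elements = {"Ni": {"mendeleev": 1.0}, "Ti": {"mendeleev": 2.0}}
print(expand_binary_compound("NiTi"))      # ('Ni', 'Ti')
print(expand_binary_compound("Al2O3"))     # ('Al', 'O')
print(element_data_for("NiTi", elements))  # [{'mendeleev': 1.0}, {'mendeleev': 2.0}]
```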
[0043]
  At the same time, the database search apparatus 200 calculates, as necessary, new attribute values that can serve as independent variables for the target dependent variable, based on attribute values stored in advance in the element data. For example, when material strength is the dependent variable, candidate independent variables include the total energy of the atom and the first and second derivatives of the total energy. If the compound database stores the total energy of each element as an attribute value in advance, the first and second derivatives can be calculated from the stored data.
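The text does not say with respect to which variable the derivatives are taken. The sketch below assumes, purely for illustration, that the total energy is stored as samples on a uniform grid of some structural parameter (for example an interatomic distance) and derives the first and second derivatives by central finite differences; the function `energy_derivatives` and the quadratic test curve are hypothetical.

```python
def energy_derivatives(grid: list[float], energy: list[float], i: int) -> tuple[float, float]:
    """First and second derivatives of the sampled total energy at interior
    grid point i, using central finite differences on a uniform grid."""
    if len(grid) != len(energy):
        raise ValueError("grid and energy must have the same length")
    if not 0 < i < len(grid) - 1:
        raise ValueError("i must be an interior grid point")
    h = grid[i + 1] - grid[i]
    if abs((grid[i] - grid[i - 1]) - h) > 1e-9 * abs(h):
        raise ValueError("grid must be uniform around point i")
    first = (energy[i + 1] - energy[i - 1]) / (2 * h)
    second = (energy[i + 1] - 2 * energy[i] + energy[i - 1]) / (h * h)
    return first, second


# Check on a hypothetical curve E(r) = (r - 2)^2, whose exact derivatives are 2(r - 2) and 2.
grid = [1.0, 1.5, 2.0, 2.5, 3.0]
energy = [(r - 2.0) ** 2 for r in grid]
print(energy_derivatives(grid, energy, 1))  # (-1.0, 2.0)
```

Central differences are exact for quadratic curves, which makes this a convenient sanity check; real energy curves would carry truncation error of order h².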
[0044]
  Next, in step S106, the database search device 200 creates, for all of the expanded elements, a set of records composed of attribute values that can serve as independent variables for the dependent variable and that are not null values.
[0045]
  For each element, the database search device 200 combines into one record the fields that have a value and are assumed to be independent variables for the above dependent variable, and thus creates a set of records on the elements (an element database).
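Steps S106 and the paragraph above can be sketched as building one record per element from only the selected attributes and dropping any element with a null value. This is an assumption-laden illustration: `build_element_database`, the attribute names, and the placeholder values are hypothetical.

```python
from typing import Optional


def build_element_database(element_records: dict[str, dict[str, Optional[float]]],
                           attributes: list[str]) -> dict[str, dict[str, float]]:
    """Combine, for each element, the selected attributes into one record,
    skipping elements for which any selected attribute is null (None)."""
    table: dict[str, dict[str, float]] = {}
    for symbol, record in element_records.items():
        values = [record.get(name) for name in attributes]
        if all(v is not None for v in values):
            table[symbol] = {name: float(v) for name, v in zip(attributes, values)}
    return table


# Hypothetical element records with placeholder values; None marks a null value.
raw = {
    "Ni": {"mendeleev": 1.0, "electronegativity": 2.0},
    "Ti": {"mendeleev": 3.0, "electronegativity": 4.0},
    "Na": {"mendeleev": 5.0, "electronegativity": None},
}
print(sorted(build_element_database(raw, ["mendeleev", "electronegativity"])))
# ['Ni', 'Ti']
```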
[0046]
  Next, in step S108, the database search device 200 accepts the selection of specific attributes from among the attributes making up the element database, and accepts the input of mathematical operations on the selected attributes. It then sets trial functions based on the accepted attributes (independent variables) and mathematical operations.
[0047]
  In this step, to relate the dependent variable (field of interest) to independent variables (attributes of each element), the user selects at least one desired attribute from the element database and at least two mathematical operations on the selected attribute. The database search apparatus 200 combines the attribute selected by the user with the mathematical operations to set and store the trial functions.
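Combining one selected attribute with at least two operations can be sketched as building a table of named callables. This is a hypothetical sketch: the operation menu, the names such as `ratio(mendeleev)`, and `make_trial_functions` are assumptions. As a design choice, each operation here is symmetric in the two elements (absolute difference, smaller-over-larger ratio), so the result does not depend on element order.

```python
from typing import Callable

ElementRecord = dict[str, float]
TrialFunction = Callable[[ElementRecord, ElementRecord], float]

# Hypothetical operation menu; every entry is symmetric in its two arguments.
OPERATIONS: dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "difference": lambda a, b: abs(a - b),
    "ratio": lambda a, b: min(a, b) / max(a, b),  # assumes positive values
    "max": max,
    "min": min,
}


def make_trial_functions(attribute: str, operations: list[str]) -> dict[str, TrialFunction]:
    """Combine one selected element attribute with at least two operations."""
    if len(operations) < 2:
        raise ValueError("select at least two mathematical operations")
    trial: dict[str, TrialFunction] = {}
    for name in operations:
        op = OPERATIONS[name]
        # Bind op as a default argument so each lambda keeps its own operation.
        trial[f"{name}({attribute})"] = (
            lambda e1, e2, op=op: op(e1[attribute], e2[attribute]))
    return trial


functions = make_trial_functions("mendeleev", ["ratio", "max"])
first, second = {"mendeleev": 20.0}, {"mendeleev": 80.0}
print(functions["ratio(mendeleev)"](first, second))  # 0.25
print(functions["max(mendeleev)"](first, second))    # 80.0
```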
[0048]
  The attributes to be used as independent variables can be set by displaying on the display 230 a list of candidate attribute values that can serve as independent variables and accepting a selection from this list; the setting then follows the user's selection, made with the input device, from among all attribute values. Similarly, the mathematical operations can be set by displaying a list of candidates on the display 230 and accepting a selection from the list.
[0049]
  Finally, in step S110, to classify the extracted record set according to the values of the dependent variable, the database search device 200 combines systematic calculation of the trial functions with sorting of the dependent-variable values and performs an exhaustive search for a classification. This operation searches for the optimal trial function for distinguishing values of the dependent variable relative to a given threshold.
[0050]
  FIG. 4 is a flowchart showing the processing of step S110 in more detail. The database search apparatus 200 evaluates the plurality of trial functions set in step S108 for a given combination of elements (S801). Specifically, when the trial functions set are “ratio of the Mendeleev numbers of the two elements” and “maximum of the Mendeleev numbers of the two elements”, the database search apparatus 200 picks two elements and performs the trial function calculations on them.
[0051]
  For each calculation result, the database search apparatus 200 determines whether the value of the dependent variable for the compound composed of the two elements used in the calculation reaches a given threshold (classification of the compound). Based on this determination, the calculation results are divided into those whose dependent-variable value is below the threshold and those whose value is equal to or above it (S802).
[0052]
  The calculation results are then arranged in a solution space defined by the trial functions (S803). That is, the calculation results are placed in a solution space whose x-axis is “ratio of the Mendeleev numbers of the two elements” and whose y-axis is “maximum of the Mendeleev numbers of the two elements”. This calculation and arrangement are performed exhaustively for all element combinations (S804), so that in the end all calculation results are arranged in the solution space (sorting).
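Steps S801 to S804 can be sketched as one loop that evaluates two trial functions for every element pair, labels the pair by the threshold test, and returns points in the solution space. The element labels `A` to `D`, the attribute values, the dependent-variable values, and `arrange_points` are hypothetical; the ratio is taken as smaller over larger, which is an assumption.

```python
from typing import Callable

Point = tuple[float, float, bool]


def arrange_points(compounds: dict[tuple[str, str], float],
                   attribute: dict[str, float],
                   fx: Callable[[float, float], float],
                   fy: Callable[[float, float], float],
                   threshold: float) -> list[Point]:
    """Evaluate two trial functions for each element pair (S801), label the pair
    by whether its dependent variable reaches the threshold (S802), and return
    (x, y, label) points in the solution space (S803-S804)."""
    points: list[Point] = []
    for (a, b), dependent in compounds.items():
        va, vb = attribute[a], attribute[b]
        points.append((fx(va, vb), fy(va, vb), dependent >= threshold))
    return points


def ratio(a: float, b: float) -> float:
    """Smaller value divided by larger value (assumes positive inputs)."""
    return min(a, b) / max(a, b)


# Hypothetical element attribute values and dependent-variable values.
mendeleev = {"A": 10.0, "B": 40.0, "C": 50.0, "D": 80.0}
dependent_values = {("A", "B"): 1.0, ("C", "D"): 0.0, ("A", "D"): 1.0}
print(arrange_points(dependent_values, mendeleev, ratio, max, threshold=0.5))
# [(0.25, 40.0, True), (0.625, 80.0, False), (0.125, 80.0, True)]
```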
[0053]
  Next, the database search device 200 selects one of the points arranged in the solution space as a reference point (S805) and determines the property of that reference point. It then finds the one or more points closest to the reference point, compares their property with that of the reference point, and determines whether the properties are the same (S806). From this, it calculates the probability that the property of the closest point matches the property of the reference point.
[0054]
  Next, the one or more points that are next closest to the reference point are found, and the probability that their property matches that of the reference point is obtained in the same way. This processing is repeated for all points arranged in the solution space (S807).
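Steps S805 to S807 resemble a nearest-neighbour agreement profile. The sketch below assumes that the “distance” plotted later (window 401, and the “50th position closest” in window 402) is a neighbour rank, that points are compared with Euclidean distance on unnormalized axes, and that each point serves once as the reference; `neighbor_profile` and the test clusters are hypothetical. Distance ties are broken arbitrarily by the sort, and axes on very different scales may need normalizing first.

```python
import math

Point = tuple[float, float, bool]


def neighbor_profile(points: list[Point], max_rank: int) -> list[float]:
    """For k = 1..max_rank, estimate the probability that the k-th nearest
    neighbour of a point has the same label as that point, averaging over
    every point used in turn as the reference point."""
    n = len(points)
    if not 0 < max_rank < n:
        raise ValueError("max_rank must be between 1 and len(points) - 1")
    matches = [0] * max_rank
    for i, (xi, yi, label) in enumerate(points):
        # Distances to all other points, nearest first (Euclidean distance assumed).
        others = sorted(
            (math.dist((xi, yi), (xj, yj)), other_label)
            for j, (xj, yj, other_label) in enumerate(points) if j != i)
        for k in range(max_rank):
            if others[k][1] == label:
                matches[k] += 1
    return [m / n for m in matches]


# Two hypothetical, well-separated clusters.
pts = [(0.0, 0.0, True), (0.0, 1.0, True), (0.0, 2.0, True),
       (10.0, 0.0, False), (10.0, 1.0, False), (10.0, 2.0, False)]
print(neighbor_profile(pts, 3))  # [1.0, 1.0, 0.0]
```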
[0055]
  Next, based on the obtained probabilities, the database search apparatus 200 determines whether points with different properties are well “separated” at some distance from the reference point. Specifically, this determination is made by finding the boundary in the solution space at which the probability obtained above falls most sharply. The solution space with the best separation is then identified as the solution space suited to the problem of interest.
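Locating where the probability “falls most sharply” and choosing the best-separated space can be sketched on top of neighbour-rank profiles. This is one possible scoring rule, not necessarily the patent's: it assumes each candidate solution space is summarized by a profile list, and that the best space is the one with the largest single drop. The names `steepest_drop`, `best_solution_space`, and the profile keys are hypothetical.

```python
def steepest_drop(profile: list[float]) -> tuple[int, float]:
    """Return (rank, drop): the 1-based neighbour rank after which the
    same-label probability falls most sharply, and the size of that fall."""
    if len(profile) < 2:
        raise ValueError("profile needs at least two ranks")
    drops = [profile[k] - profile[k + 1] for k in range(len(profile) - 1)]
    k = max(range(len(drops)), key=drops.__getitem__)
    return k + 1, drops[k]


def best_solution_space(profiles: dict[str, list[float]]) -> str:
    """Pick the trial-function pair whose profile shows the sharpest fall."""
    return max(profiles, key=lambda name: steepest_drop(profiles[name])[1])


profiles = {
    "ratio/max": [1.0, 1.0, 0.0],  # sharp boundary after the 2nd neighbour
    "sum/min": [0.6, 0.5, 0.5],    # weak, gradual separation
}
print(steepest_drop(profiles["ratio/max"]))  # (2, 1.0)
print(best_solution_space(profiles))         # ratio/max
```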
[0056]
  In step S110, the calculation results (i.e., solutions) of the trial functions, classified by the value of the dependent variable, are presented to the user on the display, with each point in the solution space color-coded according to the determination result.
[0057]
  In this way, by combining systematic calculation of the trial functions with sorting of the values of the dependent variable and performing an exhaustive search for solutions, a solution space that better classifies the compounds can be identified.
[0058]
  Note that this embodiment is particularly effective when no record matches the search conditions and the solution space for the correlation between the macroscopic properties of a substance and its basic constituents (elements) can be identified only through systematic sorting based on the attribute values of the elements that make up the substance.
[0059]
  Next, examples of screens displayed on the display in the database search method according to the present embodiment will be described with reference to FIGS. 5 to 7. In the following description, the user's problem is the stability of compounds, and it is analyzed by determining whether each element pair is a compound former or a compound non-former (in which the components are arranged randomly at lattice positions or the material becomes amorphous). This example is used as a sample because its data have been thoroughly examined, but the choice of dependent variable is not limited to it.
[0060]
  FIG. 5 shows an example of the input box displayed on the display in step S108. In the figure, the input box 300 includes an independent variable menu 301 and a mathematical operation menu 302. The independent variable menu 301 lists attribute values such as bulk modulus (Bulk Modulus), Young's modulus (Young's Modulus), shear modulus (modulus of rigidity), and Mendeleev number as selectable independent variables. The mathematical operation menu 302 lists operations such as addition (Sum) and subtraction (Difference). The database search apparatus 200 searches for a solution space using trial functions set from the results of these menu selections.
[0061]
  The user can operate the mouse 236 to select a desired independent variable from the menu 301 in the input box 300 and a desired mathematical operation from the menu 302. In the figure, “M1 Mendeleev number” in the menu 301 is selected for every row of the menu 302. The user sets the trial functions by pressing the “Next” button 303. In this way, the processes of accepting the selection of element attributes, accepting mathematical operations, and creating trial functions by combining the accepted attributes and operations are realized.
[0062]
  FIG. 6 is a diagram illustrating an example of display of the calculation result in step S110.
[0063]
  When the trial function calculations are complete, the screen shown in the figure is displayed. The left window 401 shows the probability that a point has the same property as the reference point, plotted as a function of distance from the reference point (Neighbors). For example, “20” on the horizontal axis indicates a distance of 20 units from the reference point in the solution space.
[0064]
  The lower right window 403 displays a list of calculation results for each trial function sorted according to the integral of the curve at the closest distance from the reference point (ie, 1 unit from the reference point).
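Ordering the trial functions for such a results list can be sketched as a sort over their profiles. For a discrete neighbour-rank profile, the area up to rank 1 is just the rank-1 probability; breaking ties by the area under the whole profile is an added assumption. `rank_trial_functions` and the profile values are hypothetical.

```python
def rank_trial_functions(profiles: dict[str, list[float]]) -> list[str]:
    """Order trial functions primarily by the same-label probability at the
    nearest neighbour, then by the area under the whole profile."""
    return sorted(profiles,
                  key=lambda name: (profiles[name][0], sum(profiles[name])),
                  reverse=True)


profiles = {"a": [0.9, 0.8], "b": [1.0, 0.5], "c": [0.9, 0.9]}
print(rank_trial_functions(profiles))  # ['b', 'c', 'a']
```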
[0065]
  Finally, the upper right window 402 displays a list of trial functions. The window 402 has three columns, “Profile”, “1”, and “50”, from left to right. The “Profile” column shows the name of the trial function. The number in parentheses in the “1” column is the probability that the property of the point closest to the reference point matches that of the reference point. The number in parentheses in the “50” column is the probability that the property of the 50th-closest point matches that of the reference point.
[0066]
  According to the window 402, in the solution space defined by the trial function “38-03 / 38-06 / 00-00”, 100% of the points at a distance of 1 unit from the reference point, and about 92% of the points at a distance of 50 units, have the same crystal structure pattern as the compound corresponding to the reference point. The curve 404 in the left window 401 shows the separation for the trial function “38-03 / 38-06 / 00-00”. These results show very good “separation” between the two patterns, compound former and compound non-former. Therefore, this trial function classifies the compounds best and is highly likely to predict correctly the structure of new compounds whose dependent-variable values are not in the database.
[0067]
  FIG. 7 shows a graphical representation of the solution space for this trial function. In the left window 501 of FIG. 7, the calculation results of the trial function are plotted with the ratio of the Mendeleev numbers on the horizontal (x) axis and the maximum of the Mendeleev numbers on the vertical axis. A black point indicates that the elements used in the calculation form a compound, and a white point indicates that they do not. As the figure shows, good separation between the two patterns (white and black points) is obtained.
[0068]
  By identifying the solution space in this way, the designer can predict a crystal structure pattern even for a substance without data. The application of this method is not limited to the crystal structure pattern.
[0069]
  In addition, because each point corresponds to technical literature, regions where points are dense in the solution space can be inferred to be areas of active research, which makes it possible to visualize research and development trends.
[0070]
  In the example shown in the figure, the solutions are distinguished by point color, but they may also be displayed using other forms of representation, such as point shape.
[0071]
  Although an embodiment of the present invention has been described in detail above, it goes without saying that the present invention is not limited to that embodiment and can be implemented in various other forms.
[0072]
  For example, the conventional orthodox approach to elucidating the mechanisms of complex compounds requires careful consideration in setting and executing trial functions, and a compound database system that applies conventional search methods requires enormous calculation time and resources. Therefore, the calculation process, which is the rate-determining step in the above database search process, may be performed on a separate calculation server, with the calculated values registered in the compound database.
[0073]
  Further, the space represented by the trial function is not limited to two dimensions. FIG. 8 shows an example of the solution space when the space represented by the trial function is three-dimensional.
[0074]
  In addition, by restricting the mathematical operations of the trial functions to a simple functional system consisting only of the four arithmetic operations and comparison operations, the time required to verify a working hypothesis is reduced significantly, to a few minutes. This gives the system practical value as an information processing device that accelerates the user's thinking during problem solving.
[0075]
  In addition to the design problems described above, the present invention can be applied to business strategy, in which the required specifications of a design correspond to business targets, and to searches for solutions related to preventive maintenance that keeps artifacts sound.
[0076]
【The invention's effect】
  As described above, according to the present invention, it is possible to easily identify a solution space suitable for the problem of interest.
[0077]
  Therefore, an information system platform is built that supports both the process of acquiring ideas using a compound database, which is a collection of facts, and the process of adding the acquired ideas as facts.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a hardware configuration of a database search apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the contents of the compound data and element data stored in the compound database.
FIG. 3 is a flowchart showing a procedure of a database search method according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the process in step S110 in more detail.
FIG. 5 is a diagram showing an example of an input box displayed on the display in step S108.
FIG. 6 is a diagram showing an example of display of calculation results in step S110.
FIG. 7 is a diagram showing an example of a graphical representation of the solution space displayed on the display in step S110.
FIG. 8 is a diagram illustrating an example of a solution space when a space expressed by a trial function is three-dimensional.
[Explanation of symbols]
  200 Database search device
  204 System memory
  206 CPU
  208 Compound database (DB)
  210 ROM
  212 DB control program
  213 data
  221 System bus
  232 Speaker
  234 Keyboard
  236 Mouse
  300 input box
  301, 302 Menu
  303 “Next” button
  401, 402, 403, 501 windows
  404 Curve

Claims

A database search device comprising:
Storage means for storing first records having a plurality of fields describing a large number of compounds and second records having a plurality of attributes describing the elements constituting the compounds;
Extraction means for extracting, from the first records, records in which the attribute value of a field of interest selected from the plurality of fields describing the compounds is not a null value and is less than a certain threshold, and records in which that attribute value is not a null value and is equal to or greater than the threshold;
Setting means for setting at least two trial functions that take as variables the attribute values of element attributes selected from the second records stored in the storage means;
Classification means for expanding the compounds of the records extracted by the extraction means into elements, performing the calculations of the trial functions set by the setting means for each combination of the expanded elements, arranging the calculation results on a graph whose coordinate axes are the numerical values obtained from at least two of the trial functions, and classifying the compounds by determining whether the attribute value of the field of interest in each record extracted from the first records by the extraction means exceeds a certain threshold; and
Display means for displaying the graph on which the calculation results are arranged, while changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.

The database search apparatus according to claim 1, further comprising a creation unit that creates a new attribute based on the attribute stored in the storage unit.

The database search device according to claim 1 or 2, wherein the setting means includes:
First accepting means for accepting the selection of the attribute;
Second accepting means for accepting the input of a mathematical operation; and
Creation means for creating the trial functions based on the attribute accepted by the first accepting means and the mathematical operation accepted by the second accepting means.

The database search device according to claim 1, wherein the classification means includes:
Calculation means for performing the calculations of the trial functions set by the setting means;
Arrangement means for arranging the calculation results produced by the calculation means on a graph whose coordinate values are the numerical values obtained from at least two of the trial functions; and
Comparison means for comparing the value of the field of interest corresponding to a reference point selected from the calculation results arranged by the arrangement means with the values of the field of interest corresponding to other calculation results located around the reference point.

A database search method in a database search apparatus storing first records having a plurality of fields describing a large number of compounds and second records having a plurality of attributes describing the elements constituting the compounds, wherein:
the extraction means of the database search apparatus extracts, from the first records, records in which the attribute value of a field of interest selected from the plurality of fields describing the compounds is not a null value and is less than a certain threshold, and records in which that attribute value is not a null value and is equal to or greater than the threshold;
the setting means of the database search apparatus sets at least two trial functions that take as variables the attribute values of element attributes selected from the second records;
the classification means of the database search apparatus expands the compounds of the records extracted by the extraction means into elements, performs the calculations of the trial functions set by the setting means for each combination of the expanded elements, arranges the calculation results on a graph whose coordinate axes are the numerical values obtained from at least two of the trial functions, and classifies the compounds by determining whether the attribute value of the field of interest in each record extracted from the first records by the extraction means exceeds a certain threshold; and
the display means of the database search apparatus displays the graph on which the calculation results are arranged, while changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.

A computer program for causing a computer that stores first records having a plurality of fields describing a large number of compounds and second records having a plurality of attributes describing the elements constituting the compounds to function as:
Extraction means for extracting, from the first records, records in which the attribute value of a field of interest selected from the plurality of fields describing the compounds is not a null value and is less than a certain threshold, and records in which that attribute value is not a null value and is equal to or greater than the threshold;
Setting means for setting at least two trial functions that take as variables the attribute values of element attributes selected from the second records;
Classification means for expanding the compounds of the records extracted by the extraction means into elements, performing the calculations of the trial functions set by the setting means for each combination of the expanded elements, arranging the calculation results on a graph whose coordinate axes are the numerical values obtained from at least two of the trial functions, and classifying the compounds by determining whether the attribute value of the field of interest in each record extracted from the first records by the extraction means exceeds a certain threshold; and
Display means for displaying the graph on which the calculation results are arranged, while changing the display form of each compound's calculation result according to the classification means' determination of whether the attribute value of the field of interest exceeds the threshold.

A computer-readable recording medium on which the computer program according to claim 6 is recorded.