JP4328031B2

JP4328031B2 - Numerical field dividing apparatus, program, recording medium, and numerical field dividing method

Info

Publication number: JP4328031B2
Application number: JP2001051103A
Authority: JP
Inventors: 一穂前田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2001-02-26
Filing date: 2001-02-26
Publication date: 2009-09-09
Anticipated expiration: 2021-02-26
Also published as: JP2002259358A

Description

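[0001]
[Technical field of the invention]
The present invention relates to a numerical field dividing apparatus, a program, a recording medium, and a numerical field dividing method.
[0002]
[Prior art]
In computer-based data processing, data containing both category fields and numerical fields is often analyzed. Taking personal purchase data as an example, purchase trends may be analyzed from data that totals the purchase amount (a numerical field) for each personal ID (a category field). When analyzing such data, it is sometimes necessary to divide the data into several units according to a numerical field. For example, the numerical field must be divided into multiple regions when preprocessing data for a method that handles only discrete values, when splitting a field into multiple regions for a branch in a decision tree, or when region division is used to aid human understanding through visualization.
[0003]
In these cases, the numerical field is divided intuitively by a person, divided by setting an objective function suited to the purpose and finding the division that optimizes it, or divided by a combination of the two. Having a person divide every numerical field intuitively places a burden on the operator and is not appropriate. It is therefore common to generate automatically a division that optimizes some objective function. In that case, the division has usually been obtained either by solving the optimization problem for the objective function over the whole field, or by repeating local improvements to reach a local optimum and dividing accordingly.
For example, pages 443 to 450 of "An Empirical Comparison of Discretization Methods", 10th International Symposium on Computer and Information Science, 1995, disclose a numerical field dividing method that first divides the field into a suitable number of segments and then repeatedly merges the locally best pair of adjacent segments to obtain the optimal segmentation.
[0004]
[Problems to be solved by the invention]
However, when the numerical field is divided by solving the optimization problem for the objective function over the whole field, the computation time becomes very long as the amount of data grows.
[0005]
When the numerical field is divided by repeating local improvements to reach a local optimum, the resulting division is not necessarily appropriate.
[0006]
The present invention has been made in view of these problems. Its object is to provide a numerical field dividing apparatus, a program, a recording medium, and a numerical field dividing method that, when dividing a continuous-valued field, first divide the field into suitably fine regions and then apply dynamic programming to them to obtain the optimal solution, thereby obtaining a near-optimal division in a relatively short time.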
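[0007]
[Means for solving the problems]
To achieve this object, the numerical field dividing apparatus according to claim 1 comprises: initial division generating means for generating an initial division for data including a numerical field; evaluation information generating means for generating evaluation information based on the initial division generated by the initial division generating means, the data, an evaluation function, and an evaluation correction function; evaluation value calculating means for executing dynamic programming based on the initial division generated by the initial division generating means, the evaluation function, a maximum number of divisions, and the evaluation information generated by the evaluation information generating means, and generating pairs of a division and an evaluation value; evaluation value correcting means for correcting the evaluation values generated by the evaluation value calculating means in accordance with the evaluation information and the evaluation correction function; and division selecting means for selecting a pair of a division and an evaluation value corrected by the evaluation value correcting means.
[0008]
According to this apparatus, an initial division is generated for data including a numerical field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected in accordance with the evaluation information and the evaluation correction function; and a pair of a division and an evaluation value is selected. A near-optimal division of the numerical field can therefore be obtained in a short time even for large-scale data.
[0009]
The present invention also relates to a program that causes a numerical field dividing apparatus to execute a numerical field dividing method. The program according to claim 2 includes: an initial division generating step of generating an initial division for data including a numerical field; an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function; an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value; an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.
[0010]
According to this program, an initial division is generated for data including a numerical field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected in accordance with the evaluation information and the evaluation correction function; and a pair of a division and an evaluation value is selected. A near-optimal division of the numerical field can therefore be obtained in a short time even for large-scale data.
[0011]
The present invention also relates to a computer-readable recording medium on which is recorded a program that causes a numerical field dividing apparatus to execute a numerical field dividing method. The program recorded on the recording medium according to claim 3 includes: an initial division generating step of generating an initial division for data including a numerical field; an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function; an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value; an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.
[0012]
According to this recording medium, an initial division is generated for data including a numerical field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected in accordance with the evaluation information and the evaluation correction function; and a pair of a division and an evaluation value is selected. A near-optimal division of the numerical field can therefore be obtained in a short time even for large-scale data.
[0013]
The present invention also relates to a numerical field dividing method. The numerical field dividing method according to claim 4 includes: an initial division generating step of generating an initial division for data including a numerical field; an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function; an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value; an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.
[0014]
According to this method, an initial division is generated for data including a numerical field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected in accordance with the evaluation information and the evaluation correction function; and a pair of a division and an evaluation value is selected. A near-optimal division of the numerical field can therefore be obtained in a short time even for large-scale data.
[0015]
The program according to claim 5 is the program according to claim 2 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step further includes an output step of selecting and outputting one division.
[0016]
This shows one example of division selection more concretely. According to this program, one division is selected and output, so the optimal division can be selected and output automatically.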
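[0017]
[Embodiments of the invention]
Embodiments of the numerical field dividing apparatus, program, recording medium, and numerical field dividing method according to the present invention are described in detail below with reference to the drawings. The invention is not limited by these embodiments.
First, the configuration of the numerical field dividing apparatus 100 is described. FIG. 1 is a block diagram showing an example of the configuration of the numerical field dividing apparatus 100 to which the present invention is applied, and conceptually shows only the parts of the configuration that relate to the invention. In FIG. 1, the numerical field dividing apparatus 100 broadly comprises a control unit 102, such as a CPU, that controls the apparatus as a whole; an input/output control interface unit 104 connected to input/output devices (not shown); and a storage unit 106 that stores various data (target data 106a through evaluation correction function 106e).
[0018]
In FIG. 1, the control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data, and performs information processing for executing various processes by means of these programs. Functionally, the control unit 102 comprises an initial division unit 102a, an evaluation information generation unit 102b, an evaluation value calculation unit 102c, an evaluation value correction unit 102d, and a division selection unit 102e.
[0019]
Of these, the initial division unit 102a is the initial division generating means that generates an initial division for data including a numerical field. The evaluation information generation unit 102b is the evaluation information generating means that generates evaluation information based on the initial division generated by the initial division generating means, the data, the evaluation function, and the evaluation correction function. The evaluation value calculation unit 102c is the evaluation value calculating means that executes dynamic programming based on the initial division generated by the initial division generating means, the evaluation function, the maximum number of divisions, and the evaluation information generated by the evaluation information generating means, and generates pairs of a division and an evaluation value. The evaluation value correction unit 102d is the evaluation value correcting means that corrects the evaluation values generated by the evaluation value calculating means in accordance with the evaluation information and the evaluation correction function. The division selection unit 102e is the division selecting means that selects a pair of a division and an evaluation value corrected by the evaluation value correcting means. The processing performed by each of these units is described in detail later.
[0020]
In FIG. 1, the input/output control interface unit 104 controls the input devices and output devices. A monitor (including a home television) or a speaker can be used as the output device (below, the output device is described as a monitor). A keyboard, a mouse, a microphone, and the like can be used as the input device. The monitor also realizes a pointing device function in cooperation with the mouse.
[0021]
In FIG. 1, the storage unit 106, which holds the various data (target data 106a through evaluation correction function 106e), is storage means such as a fixed disk device, and stores the programs, tables, files, databases, and the like used in the various processes.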
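[0022]
(Principle of the apparatus)
FIG. 2 is a conceptual diagram explaining the principle of the apparatus. The apparatus divides the numerical field into a suitably fine initial division and then uses dynamic programming to find the optimal solution under that initial division, so that a division close to the optimal solution can be obtained in a relatively short time even for large-scale data.
[0023]
102a in the figure is the initial division unit, which takes the data and the initial number of divisions as input and generates the initial division.
[0024]
102b in the figure is the evaluation information generation unit, which takes the initial division, the data, and the evaluation function as input and outputs the information (evaluation information) required by the evaluation value calculation unit 102c and the evaluation value correction unit 102d.
[0025]
102c in the figure is the evaluation value calculation unit, which takes the initial division, the evaluation function, the maximum number of divisions, and the evaluation information as input, executes dynamic programming internally, and outputs pairs of a division and an evaluation value.
[0026]
102d in the figure is the evaluation value correction unit, which takes the evaluation information and the evaluation correction function as input and corrects each evaluation value in the pairs of a division and an evaluation value output by the evaluation value calculation unit 102c.
[0027]
102e in the figure is the division selection unit, which presents the pairs of a division and an evaluation value output by the evaluation value correction unit 102d to the user and has the user select the optimal division.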
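[0028]
(System processing)
Next, an example of the processing of the apparatus configured as above in this embodiment is described in detail with reference to FIGS. 3 to 14.
[0029]
(Numerical field dividing process)
The numerical field dividing process, which is the method carried out using the apparatus configured as above, is described in detail with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the numerical field dividing process of the apparatus in this embodiment.
[0030]
First, the initial division unit 102a takes the data as input and outputs the initial division (step SA-1). The initial division is, for example, given by the user.
[0031]
Here, a "division" is something that determines each divided region uniquely. One possible method of initial division is to derive a suitable interval from the maximum and minimum values of the field and divide using that interval.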
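The interval-based initial division described above can be sketched as equal-width binning. This is a minimal illustration, not the apparatus's implementation; the function names `equal_width_edges` and `bin_index` and the half-open bin convention are assumptions introduced here.

```python
def equal_width_edges(values, num_bins):
    """Return boundaries of num_bins equal-width bins spanning [min, max]."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate field: a single bin holds every record.
        return [lo, hi]
    width = (hi - lo) / num_bins
    # Derive each edge from lo so rounding error does not accumulate.
    edges = [lo + i * width for i in range(num_bins)]
    edges.append(hi)
    return edges


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value; the last bin is closed."""
    last = len(edges) - 2
    for i in range(last):
        if value < edges[i + 1]:
            return i
    return last
```

Because every value falls into exactly one bin, each divided region is determined uniquely, as the definition of "division" requires.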
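[0032]
Next, the evaluation information generation unit 102b takes the initial division, the data, and the evaluation function as input and outputs the information (evaluation information) required by the evaluation value calculation unit 102c and the evaluation value correction unit 102d (step SA-2). The computation time of the evaluation information generation unit 102b must be roughly proportional to the number of records.
[0033]
Next, the evaluation value calculation unit 102c takes the initial division, the evaluation function, the maximum number of divisions, and the evaluation information as input, executes dynamic programming internally, and outputs pairs of a division and an evaluation value (step SA-3).
[0034]
Here, "dynamic programming" is a technique that finds the optimal solution by filling in, in order, a table of size (initial number of divisions) × (maximum number of divisions). For example, when the evaluation function for a division t is
[Equation 1]

the optimal value when the initial divisions up to the m-th are divided into n parts can be calculated by
[Equation 2]

Therefore, when the initial number of divisions is M, T(M, n) is the optimal value for an n-way division.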
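The table-filling procedure can be sketched as follows. The sketch assumes that the evaluation function is a sum of per-segment scores, so that T(m, n) is the maximum over α of T(α, n-1) plus the score of the segment made of initial divisions α+1 through m; this additive form is an assumption consistent with the surrounding description, not a transcription of the equations. The names `best_partitions` and `seg_score` are hypothetical; `T` and `P` follow the table names used in the text.

```python
import math


def best_partitions(num_bins, max_parts, seg_score):
    """Fill the (num_bins x max_parts) table and return {n: (score, cut_points)}.

    seg_score(b, g) is the assumed additive score of one segment that merges
    initial divisions b..g (1-based, inclusive).
    """
    neg = -math.inf
    T = [[neg] * (max_parts + 1) for _ in range(num_bins + 1)]
    P = [[-1] * (max_parts + 1) for _ in range(num_bins + 1)]
    T[0][0] = 0.0
    for n in range(1, max_parts + 1):
        for m in range(n, num_bins + 1):
            for a in range(n - 1, m):
                if T[a][n - 1] == neg:
                    continue
                cand = T[a][n - 1] + seg_score(a + 1, m)
                if cand > T[m][n]:
                    T[m][n] = cand
                    P[m][n] = a  # last initial division of the previous segment
    results = {}
    for n in range(1, max_parts + 1):
        if P[num_bins][n] < 0:
            continue  # no n-way division exists
        bounds, m = [], num_bins
        for k in range(n, 0, -1):
            m = P[m][k]
            bounds.append(m)
        # bounds ends with 0 (start of the first segment); drop it.
        results[n] = (T[num_bins][n], list(reversed(bounds[:-1])))
    return results
```

Each cut point is the index of the last initial division in a segment, so `[2]` means "cut after the second initial division". The triple loop evaluates `seg_score` on the order of M²·L/2 times, which is why its cost per call matters.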
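[0035]
FIGS. 4 and 5 are flowcharts showing an example of the processing executed by the evaluation value calculation unit 102c.
In FIGS. 4 and 5, L is the maximum number of divisions, and f(β, γ) denotes the calculation of
[Equation 3]

The value T[M][n] for which P[M][n] >= 0 is the evaluation value of the n-way division.
[0036]
Returning to FIG. 3, the evaluation value correction unit 102d takes the evaluation information and the evaluation correction function as input and corrects each evaluation value in the pairs of a division and an evaluation value output by the evaluation value calculation unit 102c (step SA-4). This correction is needed, for example, when a different value must be added to the evaluation value for each number of divisions.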
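The correction step can be sketched as adding a term that depends only on the number of divisions. The dictionary layout `{n: (score, cut_points)}` and the name `apply_correction` are assumptions made for this illustration.

```python
def apply_correction(results, correction):
    """Add correction(n) to the evaluation value of each n-way division.

    results maps the number of divisions n to (score, cut_points).
    """
    return {
        n: (score + correction(n), cuts)
        for n, (score, cuts) in results.items()
    }


# Hypothetical penalty that grows with the number of divisions,
# in the style of an information-criterion correction.
corrected = apply_correction(
    {1: (-16.0, []), 2: (0.0, [2])},
    lambda n: -2.0 * n,
)
```

Keeping the correction outside the dynamic programming table lets the table be computed once and reused with different correction functions.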
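[0037]
Next, the division selection unit 102e presents the pairs of a division and an evaluation value output by the evaluation value correction unit 102d to the user and has the user select the optimal division (step SA-5).
[0038]
FIG. 7 shows the configuration of one embodiment of the division selection unit 102e. The division presentation unit 102f presents the divisions and their evaluation values to the user, and the user selection unit 102g outputs the division selected by the user.
The component whose computation time may grow faster than in proportion to the number of records is the evaluation value calculation unit 102c. If one evaluation of the evaluation function takes b seconds and one comparison and update of evaluation values takes c seconds, this computation time is
[Equation 4]

seconds. If the evaluation information allows the evaluation function to be computed in constant time (independent of the amount of data), this time is determined independently of the number of records. The above operations can therefore be carried out quickly (in time roughly proportional to the number of records) even for large-scale data. This completes the numerical field dividing process.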
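One way evaluation information can make each evaluation independent of the data volume is a cumulative frequency table built in a single pass over the initial divisions. The sketch below is an illustration under that assumption; `prefix_counts` and `range_count` are hypothetical names.

```python
from itertools import accumulate


def prefix_counts(bin_counts):
    """Cumulative record counts; entry i covers initial divisions 1..i."""
    return [0] + list(accumulate(bin_counts))


def range_count(prefix, b, g):
    """Records in initial divisions b..g (1-based, inclusive) in O(1)."""
    return prefix[g] - prefix[b - 1]
```

Building the per-division counts takes time proportional to the number of records, after which every segment query costs one subtraction regardless of how many records the segment holds.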
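[0039]
(Division selection process in the division selection unit 102e)
Next, the division selection process in the division selection unit 102e is described in detail with reference to FIGS. 8 and 9. FIG. 8 shows the configuration of another embodiment of the division selection unit 102e. In this embodiment, the division selection unit 102e automatically selects a single division and outputs it to the user. One possible selection method is to select automatically the division with the best evaluation value. The automatic division selection unit 102h automatically selects a single division and outputs it.
[0040]
FIG. 9 shows the configuration of yet another embodiment of the division selection unit 102e. In this figure, the division selection unit 102e automatically selects multiple divisions. One possible selection method is to select every division whose evaluation value is at least δ times the best evaluation value (δ is specified as, for example, 0.9).
[0041]
If a single division is selected as a result, it is output. If multiple divisions are selected, they are presented to the user, who selects the optimal division from among them.
[0042]
The automatic division selection unit 102h automatically selects one or more divisions and, if there is only one, outputs it. If there are several, the division presentation unit 102f presents the divisions and their evaluation values to the user, and the user selection unit 102g outputs the division selected by the user. This completes the division selection process in the division selection unit 102e.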
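The δ-threshold selection can be sketched as below. The sketch assumes larger evaluation values are better and that the best value is positive, since a ratio threshold is not meaningful otherwise; the function name and the sample scores are hypothetical.

```python
def select_candidates(results, delta=0.9):
    """Keep every division whose score is at least delta times the best score.

    results maps the number of divisions n to (score, cut_points).
    """
    best = max(score for score, _ in results.values())
    return {n: rc for n, rc in results.items() if rc[0] >= delta * best}


# Made-up scores for illustration only.
sample = {1: (0.0, []), 2: (7.5, [2]), 3: (10.0, [1, 3])}
shortlist = select_candidates(sample, delta=0.7)
```

If the returned dictionary holds one entry, it can be output directly; otherwise the entries would be shown to the user for a final choice.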
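[0043]
(Evaluation value calculation process in the evaluation value calculation unit 102c)
Next, the evaluation value calculation process in the evaluation value calculation unit 102c is described in detail with reference to FIG. 6.
The evaluation value calculation unit 102c takes the initial division, the evaluation function, the maximum number of divisions, the evaluation information, and a division constraint as input, executes dynamic programming internally, and outputs pairs of a division that satisfies the constraint and its evaluation value.
[0044]
Examples of a "division constraint" include the condition that each division must hold at least 10 records, or that each division must account for at least 5% of the total frequency. This can be realized by changing the dynamic programming calculation as follows.
[Equation 5]

FIG. 6 is a flowchart showing an example of the evaluation value calculation process in the evaluation value calculation unit 102c of the apparatus in this embodiment.
[0045]
Here, Cond(β, γ) indicates that
[Equation 6]

satisfies the condition. The value T[M][n] for which P[M][n] >= 0 is the evaluation value of the n-way division. If P[M][n] < 0, no suitable n-way division exists. This completes the evaluation value calculation process in the evaluation value calculation unit 102c.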
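A constrained variant of the table-filling procedure can be sketched by skipping any candidate segment that fails the constraint. As before, an additive per-segment score is assumed, and `constrained_scores`, `seg_score`, and `cond` are hypothetical names; `cond(b, g)` plays the role of Cond(β, γ).

```python
import math


def constrained_scores(num_bins, max_parts, seg_score, cond):
    """Return {n: best score} for every n that admits a valid n-way division.

    seg_score(b, g): assumed additive score of the segment of initial
    divisions b..g (1-based, inclusive).
    cond(b, g): True when that segment satisfies the division constraint.
    """
    neg = -math.inf
    T = [[neg] * (max_parts + 1) for _ in range(num_bins + 1)]
    P = [[-1] * (max_parts + 1) for _ in range(num_bins + 1)]
    T[0][0] = 0.0
    for n in range(1, max_parts + 1):
        for m in range(n, num_bins + 1):
            for a in range(n - 1, m):
                # Skip unreachable states and segments violating the constraint.
                if T[a][n - 1] == neg or not cond(a + 1, m):
                    continue
                cand = T[a][n - 1] + seg_score(a + 1, m)
                if cand > T[m][n]:
                    T[m][n], P[m][n] = cand, a
    # P[M][n] < 0 means no n-way division satisfies the constraint.
    return {
        n: T[num_bins][n]
        for n in range(1, max_parts + 1)
        if P[num_bins][n] >= 0
    }
```

For a minimum-record constraint, `cond` can be a one-line check on the segment's record count, for example `lambda b, g: sum(counts[b - 1:g]) >= 10` for a hypothetical list `counts` of per-division record counts.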
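[0046]
(Initial division process in the initial division unit 102a)
Next, the initial division process in the initial division unit 102a is described in detail with reference to FIG. 10.
The initial division unit 102a automatically derives the initial number of divisions to output from the maximum number of divisions and other information. For example, the other information may be the execution time of the evaluation value calculation unit 102c (specified by the user), the time required to compute the evaluation function, and the time required to compare and update evaluation values; the unit then sets the initial number of divisions so that the execution time of the evaluation value calculation unit 102c stays within the value specified by the user.
[0047]
If the total computation time is specified as a seconds, the computation time of the evaluation value calculation unit 102c can be expressed as
[Equation 7]

so the initial number of divisions can be obtained from
[Equation 8]

[0048]
FIG. 10 shows the configuration of one embodiment of the initial division unit 102a. The initial division count calculation unit 102i calculates the initial number of divisions, and the initial division calculation unit 102j generates and outputs the initial division. This completes the initial division process in the initial division unit 102a.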
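[0049]
(Evaluation value calculation process in the evaluation value calculation unit 102c)
Next, the evaluation value calculation process in the evaluation value calculation unit 102c is described in detail. The evaluation value calculation unit 102c distributes the calculation across different CPUs along the division-count direction. The information needed to obtain T(m, n) is
[Equation 9]

so by assigning the work to the CPUs along the division-count direction and computing in order from the smallest index in the initial-division direction, multiple CPUs can compute at the same time and the evaluation values are obtained quickly.
[0050]
For example, consider distributing the work across two CPUs (or CPUs capable of parallel processing) when M = 2S.
Here, for example, CPU 1 computes
[Equation 10]

and CPU 2 computes
[Equation 11]

By computing while increasing α from 1, both CPUs can compute at the same time, and the division is obtained quickly. This completes the evaluation value calculation process in the evaluation value calculation unit 102c.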
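[0051]
(Numerical field dividing process) (1)
Next, the numerical field dividing process is described in detail.
Another field, a category field, is specified, and the numerical field is divided so as to maximize the chi-squared value between the numerical field and that category field. The chi-squared value is a measure of how far a distribution deviates from uniform (see 「統計学入門」 (Introduction to Statistics), University of Tokyo Press, 1991, pp. 245-247) and is expressed by the following formula.
[Equation 12]

Here, t is a division of the numerical field, c is a value of the other category field, N(t, c) is the frequency of records that are in t and have value c, N(t) is the frequency of t, N(c) is the frequency of c, and N is the total frequency.
[0052]
In this case, the evaluation information is the frequency for each combination of an initial division of the numerical field and a value of the category field.
The evaluation function is
[Equation 13]

and no evaluation correction function is needed (it is always 0). This completes the numerical field dividing process.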
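A per-segment chi-squared score can be sketched from the frequency table using the symbols defined above. The sketch uses the standard Pearson statistic, in which each division t contributes the sum over c of (N(t, c) - N(t)N(c)/N)² / (N(t)N(c)/N); whether this matches the evaluation function in the original equation image is an assumption. The name `chi2_segment_score` and the table layout are hypothetical.

```python
def chi2_segment_score(freq):
    """Build seg_score(b, g) from freq[i][c], the count for initial
    division i (0-based row) and category value c (column)."""
    num_cats = len(freq[0])
    cat_totals = [sum(row[c] for row in freq) for c in range(num_cats)]  # N(c)
    total = sum(cat_totals)                                              # N
    # Column-wise cumulative counts so each segment is read in O(#categories).
    prefix = [[0] * num_cats]
    for row in freq:
        prefix.append([p + x for p, x in zip(prefix[-1], row)])

    def seg_score(b, g):
        # Observed counts N(t, c) for the segment of initial divisions b..g.
        obs = [prefix[g][c] - prefix[b - 1][c] for c in range(num_cats)]
        n_t = sum(obs)                                                   # N(t)
        score = 0.0
        for c in range(num_cats):
            expected = n_t * cat_totals[c] / total
            if expected > 0:
                score += (obs[c] - expected) ** 2 / expected
        return score

    return seg_score
```

Because the statistic is a sum of one term per division, it fits the additive form that the dynamic programming table needs, and no correction by number of divisions is required.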
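[0053]
(Numerical field dividing process) (2)
Next, another example of the numerical field dividing process is described. In this embodiment, the numerical field is divided so as to maximize the maximum log-likelihood (MLL) after division.
[0054]
Here, the "maximum log-likelihood (MLL)" indicates how plausible the original distribution is given the division (see 「情報量基準による統計解析入門」, Kodansha Scientific, 1995, pp. 66-85) and is expressed by the following formula.
[Equation 14]

In this case, the evaluation information is the frequency of each initial division.
[0055]
The evaluation function is
[Equation 15]

and no evaluation correction function is needed (it is always 0). This completes the numerical field dividing process.
[0056]
(Numerical field dividing process) (3)
Next, another example of the numerical field dividing process is described.
The numerical field is divided so as to minimize Akaike's information criterion (AIC) for the division.
Here, "Akaike's information criterion (AIC)" is the maximum log-likelihood with a correction derived from the number of divisions (see 「情報量統計学」, Kyoritsu Shuppan, 1983, pp. 80-91) and is expressed by the following formula.
[Equation 16]

Here, n is the number of divisions. In this case, the evaluation information is the frequency of each initial division. Instead of finding the minimum of AIC(t), the maximum of -AIC(t) is found.
The evaluation function is
[Equation 17]

and the evaluation correction function is
[Equation 18]

This completes the numerical field dividing process.
[0057]
(Numerical field dividing process) (4)
Next, another example of the numerical field dividing process is described.
Another field, a category field, is specified, and the numerical field is divided so as to maximize the maximum log-likelihood (MLL) given by the numerical field and that category field.
[0058]
In this case, the "maximum log-likelihood (MLL)" indicates how plausible the original distribution is given the division, with the category field also taken into account, and is expressed by the following formula.
[Equation 19]

The "evaluation information" is the frequency for each combination of an initial division of the numerical field and a value of the category field.
The evaluation function is
[Equation 20]

and no evaluation correction function is needed (it is always 0). This completes the numerical field dividing process.
[0059]
(Numerical field dividing process) (5)
Next, another example of the numerical field dividing process is described.
Another field, a category field, is specified, and the numerical field is divided so as to minimize Akaike's information criterion (AIC) for the numerical field and that category field.
[0060]
In this case, "Akaike's information criterion (AIC)" is the maximum log-likelihood with a correction derived from the number of divisions, and is expressed by the following formula.
[Equation 21]

The "evaluation information" is the frequency for each combination of an initial division of the numerical field and a value of the category field.
[0061]
Instead of finding the minimum of AIC(t, c), the maximum of -AIC(t, c) is found, and the evaluation function is
[Equation 22]

The evaluation correction function is
[Equation 23]

This completes the numerical field dividing process.
[0062]
(Numerical field dividing process) (6)
Next, another example of the numerical field dividing process is described.
Another field, a category field, is specified, and the numerical field is divided so as to maximize the mutual information (MIC) between the numerical field and that category field.
[0063]
Here, "mutual information (MIC)" is one index of how much information two fields share (see 「現代数理科学辞典」, Osaka Shuppan, pp. 771-772) and is expressed by the following formula.
[Equation 24]

The "evaluation information" is the frequency for each combination of an initial division of the numerical field and a value of the category field.
[0064]
The "evaluation function" is
[Equation 25]

and no evaluation correction function is needed (it is always 0). This completes the numerical field dividing process.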
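[0065]
(Numerical field dividing process) (7)
Next, another example of the numerical field dividing process is described.
The numerical field is divided so as to maximize the between-stratum variance after division.
[0066]
Here, the "between-stratum variance" is one measure of how spread out the divisions are after division (see 「現代数理科学辞典」, Osaka Shuppan, p. 546) and is expressed by the following formula.
[Equation 26]

In this case, the "evaluation information" is the total of the field values in each initial division.
[0067]
The "evaluation function" is
[Equation 27]

where s(t) is the sum of the field values contained in division t. No evaluation correction function is needed (it is always 0). This completes the numerical field dividing process.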
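The role of s(t) can be illustrated with a standard identity: maximizing the between-stratum variance is equivalent to maximizing the sum over divisions of s(t)²/N(t), because the remaining terms do not depend on the division. The sketch below uses s(t)²/N(t) as the per-segment score; that this is the evaluation function in the original equation image is an assumption, as are the function name and the use of per-division record counts alongside the sums.

```python
def between_variance_segment_score(bin_sums, bin_counts):
    """Build seg_score(b, g) = s(t)^2 / N(t) for the segment t made of
    initial divisions b..g (1-based, inclusive)."""
    prefix_sum, prefix_cnt = [0.0], [0]
    for s, n in zip(bin_sums, bin_counts):
        prefix_sum.append(prefix_sum[-1] + s)
        prefix_cnt.append(prefix_cnt[-1] + n)

    def seg_score(b, g):
        s = prefix_sum[g] - prefix_sum[b - 1]   # s(t): sum of field values
        n = prefix_cnt[g] - prefix_cnt[b - 1]   # N(t): number of records
        return s * s / n if n else 0.0

    return seg_score
```

With eight records, four equal to 1 and four equal to 5, splitting between the two groups raises the total score from 72 to 104; dividing the difference by the record count gives 4, which is the between-stratum variance of that split.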
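[0068]
(Example)
An example of the present invention is described in detail below with reference to FIGS. 11 to 14.
In this example, the evaluation function is the one described above that maximizes the chi-squared value. The category field takes two values, C1 and C2, and the data has the distribution shown in FIG. 11. In FIG. 11, the leftmost column is the value of the numerical field, and the two columns to its right are the frequencies of C1 and C2.
[0069]
The initial number of divisions input to the initial division unit 102a is 5. Assuming the initial division is in steps of 1, the output of the evaluation information generation unit 102b is the frequency table shown in FIG. 12.
[0070]
The maximum number of divisions input to the evaluation value calculation unit 102c is 3. Since the evaluation function input to the evaluation value calculation unit 102c is
[Equation 28]

dynamic programming generates the table shown in FIG. 13 from the frequency table shown in FIG. 12. The values in parentheses in the table of FIG. 13 denote the α adopted in
[Equation 29]

(P[m][n] in FIGS. 5 and 6).
[0071]
The output of the evaluation value calculation unit 102c is therefore as shown in the table of FIG. 14.
Here, the evaluation value correction unit 102d does nothing, since no correction is needed.
The division selection unit 102e may present the three divisions above to the user and have the user select the optimal one. Alternatively, the division selection unit 102e may, for example, select the 3-way division, which has the largest evaluation value, and output it as the optimal division. The division selection unit 102e may also, for example, select the divisions whose evaluation value is at least 0.7 times the maximum evaluation value, present the resulting 2-way and 3-way divisions to the user, and have the user select the optimal one.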
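[0072]
(Other embodiments)
Embodiments of the present invention have been described above, but the invention may be carried out in various embodiments other than those described, within the scope of the technical idea set out in the claims.
[0073]
Of the processes described in the embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods.
In addition, the processing procedures, control procedures, specific names, information including parameters such as registered data and search conditions, screen examples, and database configurations shown in the text and drawings may be changed as desired unless otherwise noted.
[0074]
The components of the numerical field dividing apparatus 100 shown in the drawings are functional and conceptual, and need not be physically configured as shown. For example, all or any part of the processing functions of the numerical field dividing apparatus 100, in particular those performed by the control unit, may be realized by a CPU (Central Processing Unit) and a program interpreted and executed by that CPU, or may be realized as hardware using wired logic. The program is recorded on a recording medium described later and is mechanically read by the numerical field dividing apparatus 100 as needed.
[0075]
The numerical field dividing apparatus 100 may further include an input device (not shown) consisting of pointing devices such as a mouse, a keyboard, an image scanner, a digitizer, and the like; a display device (not shown) used to monitor input data; a clock generation unit (not shown) that generates the system clock; and an output device (not shown) such as a printer that outputs processing results and other data. The input device, display device, and output device may each be connected to the control unit 102 through the input/output control interface unit 104.
[0076]
The numerical field dividing apparatus 100 may also be realized by connecting peripheral devices such as a printer, monitor, and image scanner to an information processing apparatus such as a known personal computer or workstation, and installing on that apparatus software (including programs and data) that implements the method of the invention.
[0077]
Furthermore, the specific form of distribution and integration of the numerical field dividing apparatus 100 is not limited to that shown; all or part of it may be functionally or physically distributed or integrated in any units according to the load and other factors. For example, each database may be configured independently as a separate database apparatus, and part of the processing may be realized using CGI (Common Gateway Interface).
[0078]
The program according to the invention may also be stored on a computer-readable recording medium. Here, "recording medium" includes any "portable physical medium" such as a floppy disk, magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO, or DVD; any "fixed physical medium" such as a ROM, RAM, or HD built into a computer system; and a "communication medium" that holds the program for a short time, such as a communication line or carrier wave used when the program is transmitted over a network such as a LAN, WAN, or the Internet.
[0079]
A "program" is a data processing method written in any language or notation, regardless of form such as source code or binary code. A "program" is not necessarily a single unit; it includes programs distributed as multiple modules or libraries and programs that achieve their function in cooperation with a separate program such as an OS (Operating System). Known configurations and procedures may be used for the specific configuration for reading the recording medium in each apparatus described in the embodiments, for the reading procedure, and for the installation procedure after reading.
[0080]
The program may also be recorded on an application program server connected to the numerical field dividing apparatus 100 through any network, and all or part of it may be downloaded as needed. Alternatively, all or any part of each control unit may be realized as hardware using wired logic or the like.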
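[0081]
(Supplementary Note 1) A numerical field dividing apparatus comprising: initial division generating means for generating an initial division for data including a numerical field;
evaluation information generating means for generating evaluation information based on the initial division generated by the initial division generating means, the data, an evaluation function, and an evaluation correction function;
evaluation value calculating means for executing dynamic programming based on the initial division generated by the initial division generating means, the evaluation function, a maximum number of divisions, and the evaluation information generated by the evaluation information generating means, and generating pairs of a division and an evaluation value;
evaluation value correcting means for correcting the evaluation values generated by the evaluation value calculating means in accordance with the evaluation information and the evaluation correction function; and
division selecting means for selecting a pair of a division and an evaluation value corrected by the evaluation value correcting means.
[0082]
(Supplementary Note 2) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means further comprises output means for selecting and outputting one division.
[0083]
(Supplementary Note 3) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means further comprises:
output means for selecting and outputting a plurality of divisions; and
selecting means for having a user select one division from among the plurality of divisions output by the output means.
[0084]
(Supplementary Note 4) The numerical field dividing apparatus according to Supplementary Note 1, wherein the evaluation value calculating means executes dynamic programming with a division constraint added, based on the initial division, the data, the evaluation function, and the division constraint, and generates pairs of a division satisfying the division constraint and an evaluation value.
[0085]
(Supplementary Note 5) The numerical field dividing apparatus according to Supplementary Note 1, wherein the initial division generating means derives the initial number of divisions from information that includes the maximum number of divisions.
[0086]
(Supplementary Note 6) The numerical field dividing apparatus according to Supplementary Note 1, wherein the evaluation value calculating means generates a plurality of evaluation values simultaneously by parallel processing.
[0087]
(Supplementary Note 7) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means designates another field that is a category field and selects the division of the numerical field so as to maximize the chi-squared value between the numerical field and the category field.
[0088]
(Supplementary Note 8) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means selects the division of the numerical field so that the maximum log-likelihood is maximized.
[0089]
(Supplementary Note 9) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means selects the division of the numerical field so that Akaike's information criterion is minimized.
[0090]
(Supplementary Note 10) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means designates another field that is a category field and selects the division of the numerical field that maximizes the maximum log-likelihood given by the numerical field and the category field.
[0091]
(Supplementary Note 11) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means designates another field that is a category field and selects the division of the numerical field that minimizes Akaike's information criterion given by the numerical field and the category field.
[0092]
(Supplementary Note 12) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means designates another field that is a category field and selects the division of the numerical field that maximizes the mutual information between the numerical field and the category field.
[0093]
(Supplementary Note 13) The numerical field dividing apparatus according to Supplementary Note 1, wherein the division selecting means selects the division of the numerical field that maximizes the between-stratum variance after division.
[0094]
(Supplementary Note 14) A program that causes a numerical field dividing apparatus to execute a numerical field dividing method, the program including: an initial division generating step of generating an initial division for data including a numerical field;
an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function;
an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value;
an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and
a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.
[0095]
(Supplementary Note 15) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step further includes an output step of selecting and outputting one division.
[0096]
(Supplementary Note 16) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step further includes:
an output step of selecting and outputting a plurality of divisions; and
a selecting step of having a user select one division from among the plurality of divisions output in the output step.
[0097]
(Supplementary Note 17) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the evaluation value calculating step executes dynamic programming with a division constraint added, based on the initial division, the data, the evaluation function, and the division constraint, and generates pairs of a division satisfying the division constraint and an evaluation value.
[0098]
(Supplementary Note 18) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the initial division generating step derives the initial number of divisions from information that includes the maximum number of divisions.
[0099]
(Supplementary Note 19) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the evaluation value calculating step generates a plurality of evaluation values simultaneously by parallel processing.
[0100]
(Supplementary Note 20) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step designates another field that is a category field and selects the division of the numerical field so as to maximize the chi-squared value between the numerical field and the category field.
[0101]
(Supplementary Note 21) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step selects the division of the numerical field so that the maximum log-likelihood is maximized.
[0102]
(Supplementary Note 22) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step selects the division of the numerical field so that Akaike's information criterion is minimized.
[0103]
(Supplementary Note 23) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step designates another field that is a category field and selects the division of the numerical field that maximizes the maximum log-likelihood given by the numerical field and the category field.
[0104]
(Supplementary Note 24) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step designates another field that is a category field and selects the division of the numerical field that minimizes Akaike's information criterion given by the numerical field and the category field.
[0105]
(Supplementary Note 25) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step designates another field that is a category field and selects the division of the numerical field that maximizes the mutual information between the numerical field and the category field.
[0106]
(Supplementary Note 26) The program according to Supplementary Note 14 that causes a numerical field dividing apparatus to execute a numerical field dividing method, wherein the division selecting step selects the division of the numerical field that maximizes the between-stratum variance after division.
[0107]
(Supplementary Note 27) A computer-readable recording medium on which is recorded a program that causes a numerical field dividing apparatus to execute a numerical field dividing method, the program including: an initial division generating step of generating an initial division for data including a numerical field;
an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function;
an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value;
an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and
a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.
[0108]
(Supplementary Note 28) A numerical field dividing method including: an initial division generating step of generating an initial division for data including a numerical field;
an evaluation information generating step of generating evaluation information based on the initial division generated in the initial division generating step, the data, an evaluation function, and an evaluation correction function;
an evaluation value calculating step of executing dynamic programming based on the initial division generated in the initial division generating step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generating step, and generating pairs of a division and an evaluation value;
an evaluation value correcting step of correcting the evaluation values generated in the evaluation value calculating step in accordance with the evaluation information and the evaluation correction function; and
a division selecting step of selecting a pair of a division and an evaluation value corrected in the evaluation value correcting step.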
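[0109]
The invention described in Supplementary Note 2 is as follows. It shows one example of division selection more concretely. According to this apparatus, one division is selected and output, so the optimal division can be selected and output automatically.
[0110]
The invention described in Supplementary Note 3 is as follows. It shows one example of division selection more concretely. According to this apparatus, a plurality of divisions are selected and output, and the user selects one division from among them, so the user can choose the optimal division from several candidates.
[0111]
The invention described in Supplementary Note 4 is as follows. It shows one example of evaluation value calculation more concretely. According to this apparatus, dynamic programming with a division constraint added is executed based on the initial division, the data, the evaluation function, and the division constraint, and pairs of a division satisfying the constraint and an evaluation value are generated, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0112]
The invention described in Supplementary Note 5 is as follows. It shows one example of the initial number of divisions more concretely. According to this apparatus, the initial number of divisions is derived from information that includes the maximum number of divisions, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0113]
The invention described in Supplementary Note 6 is as follows. It shows one example of evaluation value calculation more concretely. According to this apparatus, a plurality of evaluation values are generated simultaneously by parallel processing, so by computing evaluation values simultaneously on a multiprocessor system or parallel processing system, a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0114]
The invention described in Supplementary Note 7 is as follows. It shows one example of division selection more concretely. According to this apparatus, another field that is a category field is designated and the division of the numerical field is selected so as to maximize the chi-squared value between the numerical field and the category field, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0115]
The invention described in Supplementary Note 8 is as follows. It shows one example of division selection more concretely. According to this apparatus, the division of the numerical field is selected so that the maximum log-likelihood is maximized, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0116]
The invention described in Supplementary Note 9 is as follows. It shows one example of division selection more concretely. According to this apparatus, the division of the numerical field is selected so that Akaike's information criterion is minimized, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0117]
The invention described in Supplementary Note 10 is as follows. It shows one example of division selection more concretely. According to this apparatus, another field that is a category field is designated and the division of the numerical field that maximizes the maximum log-likelihood given by the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0118]
The invention described in Supplementary Note 11 is as follows. It shows one example of division selection more concretely. According to this apparatus, another field that is a category field is designated and the division of the numerical field that minimizes Akaike's information criterion given by the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0119]
The invention described in Supplementary Note 12 is as follows. It shows one example of division selection more concretely. According to this apparatus, another field that is a category field is designated and the division of the numerical field that maximizes the mutual information between the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0120]
The invention described in Supplementary Note 13 is as follows. It shows one example of division selection more concretely. According to this apparatus, the division of the numerical field that maximizes the between-stratum variance after division is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0121]
The invention described in Supplementary Note 16 is as follows. It shows one example of division selection more concretely. According to this program, a plurality of divisions are selected and output, and the user selects one division from among them, so the user can choose the optimal division from several candidates.
[0122]
The invention described in Supplementary Note 17 is as follows. It shows one example of evaluation value calculation more concretely. According to this program, dynamic programming with a division constraint added is executed based on the initial division, the data, the evaluation function, and the division constraint, and pairs of a division satisfying the constraint and an evaluation value are generated, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0123]
The invention described in Supplementary Note 18 is as follows. It shows one example of the initial number of divisions more concretely. According to this program, the initial number of divisions is derived from information that includes the maximum number of divisions, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0124]
The invention described in Supplementary Note 19 is as follows. It shows one example of evaluation value calculation more concretely. According to this program, a plurality of evaluation values are generated simultaneously by parallel processing, so by computing evaluation values simultaneously on a multiprocessor system or parallel processing system, a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0125]
The invention described in Supplementary Note 20 is as follows. It shows one example of division selection more concretely. According to this program, another field that is a category field is designated and the division of the numerical field is selected so as to maximize the chi-squared value between the numerical field and the category field, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0126]
The invention described in Supplementary Note 21 is as follows. It shows one example of division selection more concretely. According to this program, the division of the numerical field is selected so that the maximum log-likelihood is maximized, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0127]
The invention described in Supplementary Note 22 is as follows. It shows one example of division selection more concretely. According to this program, the division of the numerical field is selected so that Akaike's information criterion is minimized, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0128]
The invention described in Supplementary Note 23 is as follows. It shows one example of division selection more concretely. According to this program, another field that is a category field is designated and the division of the numerical field that maximizes the maximum log-likelihood given by the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0129]
The invention described in Supplementary Note 24 is as follows. It shows one example of division selection more concretely. According to this program, another field that is a category field is designated and the division of the numerical field that minimizes Akaike's information criterion given by the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0130]
The invention described in Supplementary Note 25 is as follows. It shows one example of division selection more concretely. According to this program, another field that is a category field is designated and the division of the numerical field that maximizes the mutual information between the numerical field and the category field is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.
[0131]
The invention described in Supplementary Note 26 is as follows. It shows one example of division selection more concretely. According to this program, the division of the numerical field that maximizes the between-stratum variance after division is selected, so a near-optimal division of the numerical field can be obtained in an even shorter time even for large-scale data.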
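[0132]
[Effects of the invention]
As described in detail above, according to the present invention, an initial division is generated for data including a numerical field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected in accordance with the evaluation information and the evaluation correction function; and a pair of a division and an evaluation value is selected. The invention can therefore provide a numerical field dividing apparatus, a program, a recording medium, and a numerical field dividing method that obtain a near-optimal division of the numerical field in a short time even for large-scale data.
[Brief description of the drawings]
[FIG. 1] A block diagram showing an example of the configuration of the numerical field dividing apparatus 100 to which the present invention is applied.
[FIG. 2] A conceptual diagram explaining the principle of the apparatus.
[FIG. 3] A flowchart showing an example of the numerical field dividing process of the apparatus in this embodiment.
[FIG. 4] A flowchart showing an example of the processing executed by the evaluation value calculation unit 102c.
[FIG. 5] A flowchart showing an example of the processing executed by the evaluation value calculation unit 102c.
[FIG. 6] A flowchart showing an example of the evaluation value calculation process in the evaluation value calculation unit 102c of the apparatus in this embodiment.
[FIG. 7] A configuration diagram of one embodiment of the division selection unit 102e.
[FIG. 8] A configuration diagram of another embodiment of the division selection unit 102e.
[FIG. 9] A configuration diagram of another embodiment of the division selection unit 102e.
[FIG. 10] A configuration diagram of one embodiment of the initial division unit 102a.
[FIG. 11] A diagram showing the distribution of the data.
[FIG. 12] A diagram showing the frequency table of the data.
[FIG. 13] A diagram showing the result of the dynamic programming.
[FIG. 14] A diagram showing the output of the evaluation value calculation unit 102c.
[Explanation of reference numerals]
100 Numerical field dividing apparatus
102 Control unit
102a Initial division unit
102b Evaluation information generation unit
102c Evaluation value calculation unit
102d Evaluation value correction unit
102e Division selection unit
102f Division presentation unit
102g User selection unit
102h Automatic division selection unit
104 Input/output control interface unit
106 Storage unit
106a Target data
106b Initial number of divisions
106c Maximum number of divisions
106d Evaluation function
106e Evaluation correction function
[0001]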
BACKGROUND OF THE INVENTION
The present invention relates to a numerical field dividing device, a program, a recording medium, and a numerical field dividing method.
[0002]
[Prior art]
In computer-based data processing, data containing both category fields and numerical fields is often analyzed. Taking personal purchase data as an example, purchase trends may be analyzed from data that totals the purchase amount (a numerical field) for each personal ID (a category field). When analyzing such data, it is sometimes necessary to divide the data into several units according to a numerical field. For example, the numerical field must be divided into multiple regions when preprocessing data for a method that handles only discrete values, when splitting a field into multiple regions for a branch in a decision tree, or when region division is used to aid human understanding through visualization.
[0003]
In these cases, the numerical field is divided intuitively by a person, divided by setting an objective function suited to the purpose and finding the division that optimizes it, or divided by a combination of the two. Having a person divide every numerical field intuitively places a burden on the operator and is not appropriate. It is therefore common to generate automatically a division that optimizes some objective function. In that case, the division has usually been obtained either by solving the optimization problem for the objective function over the whole field, or by repeating local improvements to reach a local optimum and dividing accordingly.
For example, pages 443 to 450 of "An Empirical Comparison of Discretization Methods", 10th International Symposium on Computer and Information Science, 1995, disclose a numerical field dividing method that first divides the field into a suitable number of segments and then repeatedly merges the locally best pair of adjacent segments to obtain the optimal segmentation.
[0004]
[Problems to be solved by the invention]
However, when the numerical field is divided by solving the optimization problem for the objective function over the whole field, the computation time becomes very long as the amount of data grows.
[0005]
When the numerical field is divided by repeating local improvements to reach a local optimum, the resulting division is not necessarily appropriate.
[0006]
The present invention has been made in view of these problems. Its object is to provide a numerical field dividing apparatus, a program, a recording medium, and a numerical field dividing method that, when dividing a continuous-valued field, first divide the field into suitably fine regions and then apply dynamic programming to them to obtain the optimal solution, thereby obtaining a near-optimal division in a relatively short time.
[0007]
[Means for Solving the Problems]
In order to achieve such an object, the numerical field dividing apparatus according to claim 1 includes an initial division generating unit that generates an initial division for data including a numeric field, and the initial division generated by the initial division generating unit. Based on the division, the data, the evaluation function, and the evaluation correction function, evaluation information generation means for generating evaluation information, the initial division generated by the initial division generation means, and the evaluation function And evaluation value calculation means for executing dynamic programming based on the maximum number of divisions and the evaluation information generated by the evaluation information generation means, and generating a set of division and evaluation values; In accordance with the evaluation information and the evaluation correction function, the evaluation value correction means for correcting the evaluation value generated by the evaluation value calculation means, the division and the evaluation value corrected by the evaluation value correction means Characterized in that a division selection means for selecting.
[0008]
According to this apparatus, an initial division is generated for data including a numeric field, and evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function. Based on the initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information, dynamic programming is executed to generate a combination of the division and the evaluation value, and according to the evaluation information and the evaluation correction function Since the generated evaluation value is corrected and a combination of the division and the evaluation value is selected, it is possible to obtain an optimum division of the numerical field in a short time even for large-scale data.
[0009]
The present invention also relates to a program for causing a numerical field dividing apparatus to execute a numerical field dividing method, wherein the program according to claim 2 includes an initial division generating step for generating an initial division for data including a numerical field; Based on the initial division generated in the initial division generation step, the data, the evaluation function, and the evaluation correction function, the evaluation information generation step for generating evaluation information, and the initial division generation step Based on the generated initial division, the evaluation function, the maximum number of divisions, and the evaluation information generated in the evaluation information generation step, dynamic programming is executed, and the division and evaluation values are executed. An evaluation value calculating step for generating a set of the above, and the evaluation value generated in the evaluation value calculating step according to the evaluation information and the evaluation correction function And evaluation value correction step of correcting the value, characterized in that it comprises a split selection step of selecting a set of the divided and the evaluation value correction the evaluation value corrected in step.
[0010]
According to this program, an initial division is generated for data including a numeric field, and evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function. Based on the initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information, dynamic programming is executed to generate a combination of the division and the evaluation value, and according to the evaluation information and the evaluation correction function Since the generated evaluation value is corrected and a combination of the division and the evaluation value is selected, it is possible to obtain an optimum division of the numerical field in a short time even for large-scale data.
[0011]
The present invention also relates to a computer-readable recording medium that records a program for causing a numerical field dividing device to execute a numerical field dividing method. The recording medium according to claim 3 is an initial dividing method for data including numerical fields. Generating information for evaluation based on the initial division generated in the initial division generation step, the data, the evaluation function, and the evaluation correction function. Based on the step, the initial division generated in the initial division generation step, the evaluation function, the maximum number of divisions, and the evaluation information generated in the evaluation information generation step. According to the evaluation value calculation step that executes the programming method and generates a combination of the division and the evaluation value, and the evaluation information and the evaluation correction function Including an evaluation value correction step for correcting the evaluation value generated in the evaluation value calculation step, and a division selection step for selecting the combination of the evaluation value corrected in the division and the evaluation value correction step. It is characterized by.
[0012]
According to this recording medium, an initial division is generated for data including a numeric field, and evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function. Dynamic programming is then executed based on the initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate sets of a division and an evaluation value. The generated evaluation values are corrected according to the evaluation information and the evaluation correction function, and a set of a division and an evaluation value is selected. As a result, a division of the numerical field close to the optimum can be obtained in a short time even for large-scale data.
[0013]
The present invention also relates to a numerical field dividing method. The numerical field dividing method according to claim 4 includes: an initial division generation step of generating an initial division for data including a numerical field; an evaluation information generation step of generating evaluation information based on the initial division generated in the initial division generation step, the data, an evaluation function, and an evaluation correction function; an evaluation value calculation step of executing dynamic programming based on the initial division generated in the initial division generation step, the evaluation function, a maximum number of divisions, and the evaluation information generated in the evaluation information generation step, thereby generating sets of a division and an evaluation value; an evaluation value correction step of correcting the evaluation value generated in the evaluation value calculation step in accordance with the evaluation information and the evaluation correction function; and a division selection step of selecting a set of the division and the evaluation value corrected in the evaluation value correction step.
[0014]
According to this method, an initial division is generated for data including a numeric field, and evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function. Dynamic programming is then executed based on the initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate sets of a division and an evaluation value. The generated evaluation values are corrected according to the evaluation information and the evaluation correction function, and a set of a division and an evaluation value is selected. As a result, a division of the numerical field close to the optimum can be obtained in a short time even for large-scale data.
[0015]
The program according to claim 5 is the program according to claim 2 for causing a numerical field dividing apparatus to execute the numerical field dividing method, wherein the division selection step further includes an output step of selecting and outputting one of the divisions.
[0016]
This more specifically shows an example of division selection. According to this program, since one division is selected and output, the optimum division can be automatically selected and output.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of a numerical field dividing device, a program, a recording medium, and a numerical field dividing method according to the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments.
First, the configuration of the numerical field dividing device 100 will be described. FIG. 1 is a block diagram showing an example of the configuration of the numerical field dividing device 100 to which the present invention is applied, and conceptually shows only the portions related to the present invention. In FIG. 1, the numerical field dividing device 100 schematically comprises a control unit 102, such as a CPU, that controls the entire device in an integrated manner; an input/output control interface unit 104 connected to input/output devices (not shown); and a storage unit 106 that stores various data (target data 106a to evaluation correction function 106e).
[0018]
In FIG. 1, the control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and necessary data, and performs information processing for executing various processes by means of these programs. In terms of functional concept, the control unit 102 includes an initial division unit 102a, an evaluation information generation unit 102b, an evaluation value calculation unit 102c, an evaluation value correction unit 102d, and a division selection unit 102e.
[0019]
Among these, the initial division unit 102a is initial division generation means that generates an initial division for data including a numeric field. The evaluation information generation unit 102b is evaluation information generation means that generates evaluation information based on the initial division generated by the initial division generation means, the data, the evaluation function, and the evaluation correction function. The evaluation value calculation unit 102c is evaluation value calculation means that executes dynamic programming based on the initial division generated by the initial division generation means, the evaluation function, the maximum number of divisions, and the evaluation information generated by the evaluation information generation means, and generates sets of a division and an evaluation value. The evaluation value correction unit 102d is evaluation value correction means that corrects the evaluation value generated by the evaluation value calculation means in accordance with the evaluation information and the evaluation correction function. The division selection unit 102e is division selection means that selects a set of the division and the evaluation value corrected by the evaluation value correction means. Details of the processing performed by each of these units will be described later.
[0020]
In FIG. 1, an input / output control interface unit 104 controls an input device and an output device. Here, as an output device, a speaker can be used in addition to a monitor (including a home television) (hereinafter, the output device is described as a monitor). As the input device, a keyboard, a mouse, a microphone, or the like can be used. The monitor also realizes a pointing device function in cooperation with the mouse.
[0021]
Also, in FIG. 1, the storage unit 106, which stores the various data (target data 106a to evaluation correction function 106e), is storage means such as a fixed disk device, and stores various programs, tables, files, databases, and the like.
[0022]
(Principle of this device)
FIG. 2 is a conceptual diagram illustrating the principle of this apparatus. The apparatus first divides a numerical field into suitably fine initial divisions and then applies dynamic programming to those initial divisions to obtain an optimal solution, so that a division close to the optimum can be obtained in a relatively short time even for large-scale data.
[0023]
Reference numeral 102a in the figure denotes an initial division unit that receives data and the number of initial divisions and generates an initial division.
[0024]
Reference numeral 102b in the figure denotes an evaluation information generation unit, which receives the initial division, the data, and an evaluation function, and outputs the information (evaluation information) needed by the evaluation value calculation unit 102c and the evaluation value correction unit 102d.
[0025]
Reference numeral 102c in the figure denotes an evaluation value calculation unit, which receives the initial division, the evaluation function, the maximum number of divisions, and evaluation information, executes dynamic programming internally, and outputs a set of division and evaluation values.
[0026]
Reference numeral 102d in the figure denotes an evaluation value correction unit, which receives the evaluation information and an evaluation correction function, applies them to the sets of division and evaluation value output by the evaluation value calculation unit 102c, and corrects each evaluation value.
[0027]
Reference numeral 102e in the figure denotes a division selection unit, which presents a set of divisions and evaluation values output by the evaluation value correction unit 102d to the user and allows the user to select an optimal division.
[0028]
(System processing)
Next, an example of processing of the present apparatus configured as described above will be described in detail with reference to FIGS.
[0029]
(Numeric field split processing)
Next, details of the numerical value field dividing process as the present method performed using the present apparatus configured as described above will be described with reference to FIG. FIG. 3 is a flowchart showing an example of numerical field division processing of the apparatus according to the present embodiment.
[0030]
First, the initial division unit 102a receives data and outputs an initial division (step SA-1). The initial division is given by the user, for example.
[0031]
Here, “division” means that each division area can be uniquely determined. As an initial division method, for example, an appropriate interval may be obtained from the maximum value and the minimum value of the field, and division may be performed using that.
[0032]
Next, the evaluation information generation unit 102b receives the initial division, the data, and the evaluation function, and outputs the information (evaluation information) needed by the evaluation value calculation unit 102c and the evaluation value correction unit 102d (step SA-2). Here, the calculation time of the evaluation information generation unit 102b must be proportional to the number of records.
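As an illustration of steps SA-1 and SA-2, the following Python sketch builds an equal-width initial division from the minimum and maximum of the field and then counts frequencies in a single pass over the records, so that its running time is proportional to the number of records. The function names (`equal_width_bins`, `bin_index`, `frequency_table`) and the record layout `(value, category)` are assumptions made for this example and do not appear in the embodiment.

```python
from collections import Counter

def equal_width_bins(values, num_bins):
    """Return num_bins + 1 boundaries spanning [min, max] at equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins)] + [hi]

def bin_index(x, bounds):
    """Index of the initial division containing x (the last bin includes the maximum)."""
    num_bins = len(bounds) - 1
    lo, hi = bounds[0], bounds[-1]
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * num_bins)
    return min(i, num_bins - 1)

def frequency_table(records, bounds):
    """Count (initial division, category) pairs in one pass over the records."""
    counts = Counter()
    for value, category in records:
        counts[(bin_index(value, bounds), category)] += 1
    return counts
```

Because the table is indexed by initial division rather than by record, later steps can work from it without rereading the data.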
[0033]
Next, the evaluation value calculation unit 102c receives the initial division, the evaluation function, the maximum number of divisions, and the evaluation information, executes dynamic programming internally, and outputs sets of a division and an evaluation value (step SA-3).
[0034]
Here, “dynamic programming” is a method for obtaining an optimal solution by sequentially filling a table of size (initial division number) × (maximum division number). For example, when the evaluation function for a division t is
[Expression 1]

the optimal value obtained when the initial divisions up to the m-th are divided into n can be calculated by
[Expression 2]

Therefore, when the initial division number is M, T (M, n) is the optimum value for division into n.
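The table-filling procedure can be sketched in Python as follows. Since the expressions are not reproduced in this text, the sketch assumes that the overall evaluation is the sum of per-region scores and that the recurrence has the usual form T(m, n) = max over α of T(α, n-1) + F(α+1, m); the names `best_divisions`, `recover`, and `seg_score` are hypothetical.

```python
import math

def best_divisions(M, L, seg_score):
    """Fill T[m][n] and P[m][n] for m initial divisions merged into n regions.

    seg_score(b, g) is the score of one region made of initial divisions
    b..g (1-based, inclusive). The total is assumed to be the sum of the
    region scores, and it is maximised.
    """
    NEG = -math.inf
    T = [[NEG] * (L + 1) for _ in range(M + 1)]
    P = [[-1] * (L + 1) for _ in range(M + 1)]
    T[0][0], P[0][0] = 0.0, 0
    for n in range(1, L + 1):
        for m in range(n, M + 1):
            for a in range(n - 1, m):  # a = last initial division of region n-1
                if T[a][n - 1] == NEG:
                    continue
                cand = T[a][n - 1] + seg_score(a + 1, m)
                if cand > T[m][n]:
                    T[m][n], P[m][n] = cand, a
    return T, P

def recover(P, M, n):
    """Follow the back pointers and return the last initial division of each region."""
    cuts = []
    m = M
    while n > 0:
        cuts.append(m)
        m, n = P[m][n], n - 1
    return cuts[::-1]
```

The triple loop calls `seg_score` on the order of M × M × L times, so the cost depends on the number of initial divisions rather than on the number of records, provided each call takes constant time.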
[0035]
FIGS. 4 and 5 are flowcharts illustrating an example of processing executed by the evaluation value calculation unit 102c.
In FIGS. 4 and 5, L is the maximum number of divisions, and F (β, γ) denotes the calculation of
[Equation 3]

Further, T [M] [n] such that P [M] [n] >= 0 is the evaluation value for division into n.
[0036]
Returning to FIG. 3, the evaluation value correction unit 102d receives the evaluation information and the evaluation correction function, applies them to the sets of division and evaluation value output by the evaluation value calculation unit 102c, and corrects each evaluation value (step SA-4). This correction is needed, for example, when a different value must be added to the evaluation value for each number of divisions.
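A minimal Python sketch of step SA-4 follows, assuming that the correction depends only on the number of divisions n and is added to the evaluation value. The function name `correct_evaluations` and the layout of `results` (a mapping from n to a pair of division and evaluation value) are assumptions for illustration.

```python
def correct_evaluations(results, correction):
    """Add a per-division-number correction to each evaluation value.

    results maps the number of divisions n to (division, evaluation value);
    correction(n) stands in for the evaluation correction function.
    """
    return {n: (division, value + correction(n))
            for n, (division, value) in results.items()}
```

For instance, passing `lambda n: -2.0 * n` would penalise each additional division by a fixed amount, in the spirit of an information criterion; a correction that always returns 0 leaves the values unchanged.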
[0037]
Next, the division selection unit 102e presents to the user the combination of the division and the evaluation value output by the evaluation value correction unit 102d, and causes the user to select an optimal division (step SA-5).
[0038]
FIG. 7 shows a configuration diagram of an embodiment of the division selection unit 102e. The division presentation unit 102f presents the division and its evaluation value to the user, and the user selection unit 102g outputs the division selected by the user.
Here, the only unit whose calculation time may grow faster than in proportion to the number of records is the evaluation value calculation unit 102c. If the time required to calculate the evaluation function is b seconds and the time required to compare and update an evaluation value is c seconds, this calculation time is
[Expression 4]

seconds. If the evaluation function can be calculated in constant time (independent of the amount of data) by using the evaluation information, this time is determined regardless of the number of records. Therefore, the above operations as a whole can be performed at high speed (in time approximately proportional to the number of records) even for large-scale data. This completes the numerical field dividing process.
[0039]
(Division selection process in the division selection unit 102e)
Next, details of the division selection processing in the division selection unit 102e will be described with reference to FIGS. FIG. 8 shows a configuration diagram of another embodiment of the division selection unit 102e. The division selection unit 102e automatically selects a single division and outputs the division to the user. As a selection method, for example, it is conceivable to automatically select a division having an optimum evaluation value. The automatic division selection unit 102h automatically selects a single division and outputs the division.
[0040]
FIG. 9 shows a configuration diagram of another embodiment of the division selection unit 102e. In this figure, the division selection unit 102e automatically selects a plurality of divisions. As the selection method, for example, all divisions whose evaluation value is at least δ times the optimal evaluation value may be selected (for example, δ is specified as 0.9).
[0041]
As a result, if only one division is selected, it is output. In addition, when a plurality of divisions are selected, they are presented to the user, and the user is allowed to select an optimal division from among them.
[0042]
The automatic division selection unit 102h automatically selects one or a plurality of divisions, and outputs if the division is the only division. On the other hand, in the case of a plurality of divisions, the division presentation unit 102f presents the division and its evaluation value to the user, and the user selection unit 102g outputs the division selected by the user. Thus, the division selection process in the division selection unit 102e ends.
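The selection logic of FIGS. 8 and 9 might be sketched in Python as below. The sketch assumes that larger evaluation values are better and that they are non-negative, so that a threshold of δ times the best value is meaningful; `select_divisions`, `choose`, and the `ask_user` callback are hypothetical names.

```python
def select_divisions(results, delta=None):
    """Return the division numbers that pass automatic selection.

    results maps n to (division, evaluation value). With delta=None only
    the best n is returned; otherwise every n whose value is at least
    delta times the best value is returned.
    """
    best_n = max(results, key=lambda n: results[n][1])
    if delta is None:
        return [best_n]
    threshold = delta * results[best_n][1]
    return sorted(n for n, (_, value) in results.items() if value >= threshold)

def choose(results, delta, ask_user):
    """Output a sole candidate directly; otherwise let the user pick one."""
    candidates = select_divisions(results, delta)
    if len(candidates) == 1:
        return candidates[0]
    return ask_user(candidates)
```

Here `ask_user` plays the role of the division presentation unit 102f together with the user selection unit 102g: it receives the candidate division numbers and returns the one the user picked.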
[0043]
(Evaluation value calculation process in the evaluation value calculation unit 102c)
Next, details of the evaluation value calculation process in the evaluation value calculation unit 102c will be described with reference to FIG.
The evaluation value calculation unit 102c receives the initial division, the evaluation function, the maximum number of divisions, the evaluation information, and a division restriction, executes dynamic programming internally, and outputs sets of a division and an evaluation value that satisfy the division restriction.
[0044]
Here, as the “division restriction”, for example, a condition that each division must contain at least 10 records, or at least 5% of the total frequency, can be considered. This can be realized by changing the calculation process of the dynamic programming as follows.
[Equation 5]

FIG. 6 is a flowchart showing an example of the evaluation value calculation process in the evaluation value calculation unit 102c of the present apparatus in the present embodiment.
[0045]
Here, Cond (β, γ) indicates that
[Formula 6]

satisfies the condition. Further, T [M] [n] such that P [M] [n] >= 0 is the evaluation value for division into n. When P [M] [n] < 0, there is no appropriate division into n. This completes the evaluation value calculation process in the evaluation value calculation unit 102c.
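A division restriction can be folded into the dynamic programming by skipping any candidate region that fails the condition, as in the following Python sketch. It assumes an additive evaluation that is maximised, as in the unrestricted case; the names `best_divisions_with_limit`, `seg_score`, and `cond` are hypothetical, with `cond(b, g)` standing in for Cond (β, γ).

```python
import math

def best_divisions_with_limit(M, L, seg_score, cond):
    """Dynamic programming in which every region b..g must satisfy cond(b, g).

    P[m][n] stays negative when the first m initial divisions cannot be
    split into n regions that all satisfy the restriction.
    """
    NEG = -math.inf
    T = [[NEG] * (L + 1) for _ in range(M + 1)]
    P = [[-1] * (L + 1) for _ in range(M + 1)]
    T[0][0], P[0][0] = 0.0, 0
    for n in range(1, L + 1):
        for m in range(n, M + 1):
            for a in range(n - 1, m):
                # Skip infeasible prefixes and regions that break the restriction.
                if P[a][n - 1] < 0 or not cond(a + 1, m):
                    continue
                cand = T[a][n - 1] + seg_score(a + 1, m)
                if P[m][n] < 0 or cand > T[m][n]:
                    T[m][n], P[m][n] = cand, a
    return T, P
```

With a minimum-record restriction, `cond` can be answered in constant time from prefix sums of the per-division frequencies, so the restriction does not change the order of the running time.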
[0046]
(Initial division processing in the initial division unit 102a)
Next, details of the initial division processing in the initial division unit 102a will be described with reference to FIG.
The initial division unit 102a automatically determines the initial division number to be output from the maximum division number and other information. The other information may be, for example, the execution time of the evaluation value calculation unit 102c (specified by the user), the time required to calculate the evaluation function, and the time required to compare and update an evaluation value. In that case, the initial division unit 102a may have a function of determining the initial division number so that the execution time of the evaluation value calculation unit 102c falls within the value specified by the user.
[0047]
When the overall calculation time is specified as a seconds, the calculation time of the evaluation value calculation unit 102c is
[Expression 7]

so the initial number of divisions can be obtained as
[Equation 8]

[0048]
FIG. 10 shows a configuration diagram of an embodiment of the initial division unit 102a. The initial division number calculation unit 102i calculates the initial division number, and the initial division calculation unit 102j generates and outputs the initial division. This completes the initial division processing in the initial division unit 102a.
[0049]
(Evaluation Value Calculation Processing in Evaluation Value Calculation Unit 102c)
Next, details of the evaluation value calculation process in the evaluation value calculation unit 102c will be described. The evaluation value calculation unit 102c distributes the calculation across different CPUs in the division number direction. The information necessary to obtain T (m, n) is
[Equation 9]

Therefore, by assigning the calculation to the CPUs in the division number direction and calculating sequentially from the smallest subscript in the initial division number direction, the CPUs can calculate simultaneously and the evaluation values can be obtained at high speed.
[0050]
For example, when M = 2S, consider allocation to two CPUs (or to a CPU capable of parallel processing).
Here, for example,
[Expression 10]

is calculated by CPU 1, and
[Expression 11]

is calculated by CPU 2. If α is increased sequentially from 1, both CPUs can calculate simultaneously, and the division can be obtained at high speed. This completes the evaluation value calculation process in the evaluation value calculation unit 102c.
[0051]
(Numeric field division processing) (1)
Next, details of the numerical field division processing will be described.
Another category field is specified, and the numeric field is divided so that the chi-square value between the numeric field and the category field is maximized. The chi-square value is one of the measures indicating how far a distribution deviates from uniformity (see “Introduction to Statistics”, The University of Tokyo Press, 1991, pages 245 to 247), and is expressed by the following equation.
[Expression 12]

Here, t is a division of the numeric field, c is a value of the other category field, N (t, c) is the frequency of records that belong to t and have the value c, N (t) is the frequency of t, N (c) is the frequency of c, and N is the total frequency.
[0052]
In this case, the evaluation information is a frequency for each combination of the initial division of the numerical field and the value of the category field.
The evaluation function is
[Formula 13]

The evaluation correction function is not necessary (always 0). This completes the numerical field dividing process.
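Because the expressions are not reproduced in this text, the following Python sketch uses the standard Pearson chi-square statistic, in which each division t contributes the sum over c of (N(t, c) - N(t)N(c)/N)² / (N(t)N(c)/N). The contribution of a division depends only on that division's own frequencies and on global totals, which is what allows it to serve as an additive per-region score. The function name `chi_square_region` and its argument layout are assumptions for this example.

```python
def chi_square_region(region_counts, category_totals, total):
    """Contribution of one division t to the chi-square value.

    region_counts[c]   = N(t, c)
    category_totals[c] = N(c)
    total              = N
    """
    n_t = sum(region_counts.values())
    score = 0.0
    for c, n_c in category_totals.items():
        expected = n_t * n_c / total
        if expected > 0:
            observed = region_counts.get(c, 0)
            score += (observed - expected) ** 2 / expected
    return score
```

Summing this function over all divisions of a candidate partition gives the chi-square value of that partition; the counts for a merged region are obtained by adding the frequency-table rows of its initial divisions.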
[0053]
(Numeric field division processing) (2)
Next, another example of numerical field division processing will be described. In the present embodiment, numerical field division is performed to maximize the maximum log likelihood (MLL) after the numerical field is divided.
[0054]
Here, “maximum log likelihood (MLL)” indicates the likelihood of the original distribution given the division (see “Introduction to Statistical Analysis Based on Information Criteria”, Kodansha Scientific, 1995, pp. 66 to 85), and is expressed by the following equation.
[Expression 14]

In this case, the evaluation information is the frequency of each initial division.
[0055]
The evaluation function is
[Expression 15]

The evaluation correction function is not necessary (always 0). This completes the numerical field dividing process.
[0056]
(Numeric field division processing) (3)
Next, another example of numerical field division processing will be described.
Numerical field division is performed to minimize Akaike's information criterion (AIC) in the numerical field division.
Here, “Akaike's information criterion (AIC)” is obtained by adding a correction derived from the number of divisions to the maximum log likelihood (see “Information Statistics”, 1983, pages 80 to 91), and is represented by the following formula.
[Expression 16]

Here, n is the number of divisions. In this case, the evaluation information is the frequency of each initial division. Instead of obtaining the minimum of AIC (t), the maximum of -AIC (t) is obtained.
The evaluation function is
[Expression 17]

and the evaluation correction function is
[Formula 18]

This completes the numerical field dividing process.
[0057]
(Numeric field division processing) (4)
Next, another example of numerical field division processing will be described.
Another category field is designated, and the numeric field is divided so that the maximum log likelihood (MLL) based on the numeric field and the category field is maximized.
[0058]
In this case, “maximum log likelihood (MLL)” indicates the likelihood of the original distribution in consideration of the category field with reference to the division, and is expressed by the following equation.
[Equation 19]

The “evaluation information” is a frequency for each combination of the initial division of the numeric field and the value of the category field.
The evaluation function is
[Expression 20]

The evaluation correction function is not necessary (always 0). This completes the numerical field dividing process.
[0059]
(Numeric field division processing) (5)
Next, another example of numerical field division processing will be described.
Another category field is designated, and the numeric field is divided so that Akaike's information criterion (AIC) based on the numeric field and the category field is minimized.
[0060]
In this case, “Akaike's information amount (AIC)” is obtained by adding the correction obtained from the number of divisions to the maximum log likelihood and is expressed by the following equation.
[Expression 21]

The “evaluation information” is a frequency for each combination of the initial division of the numeric field and the value of the category field.
[0061]
Further, instead of obtaining the minimum of AIC (t, c), the maximum of -AIC (t, c) is obtained. The evaluation function is
[Expression 22]

and the evaluation correction function is
[Expression 23]

This completes the numerical field dividing process.
[0062]
(Numeric field division processing) (6)
Next, another example of numerical field division processing will be described.
Another category field is designated, and the numeric field division is performed so as to maximize the mutual information (MIC) between the numeric field and the category field.
[0063]
Here, “mutual information (MIC)” is one of the indexes indicating the degree of information sharing between the two fields (see “Modern Mathematical Science Dictionary”, Osaka Publishing, pages 771 to 772). It is expressed by the following formula.
[Expression 24]

The “evaluation information” is a frequency for each combination of the initial division of the numeric field and the value of the category field.
[0064]
The “evaluation function” is
[Expression 25]

The evaluation correction function is not necessary (always 0). This completes the numerical field dividing process.
[0065]
(Numeric field division processing) (7)
Next, another example of numerical field division processing will be described.
The numerical field is divided so as to maximize the interlayer dispersion after the numerical field is divided.
[0066]
Here, “interlayer dispersion” is one of the scales indicating the degree of dispersion of each division after division (see “Modern Mathematical Science Dictionary”, Osaka Publishing, page 546), and is expressed by the following equation.
[Equation 26]

In this case, “evaluation information” is the total value of the field values of each initial division.
[0067]
The “evaluation function” is
[Expression 27]

where s (t) is the sum of the field values included in the division t. No evaluation correction function is required (always 0). This completes the numerical field dividing process.
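The expressions above are not reproduced in this text, but if “interlayer dispersion” is read as the usual between-group variance, a standard identity shows why per-division sums suffice: the between-group variance equals (1/N) Σ s(t)²/N(t) minus the square of the overall mean, and the second term does not depend on the division. The following Python sketch relies on that reading; the names `region_term` and `between_variance` are hypothetical, and the sketch also uses the frequency N(t) of each division in addition to s(t).

```python
def region_term(s_t, n_t):
    """Per-division term s(t)^2 / N(t); only this part depends on the division."""
    return s_t * s_t / n_t

def between_variance(sums, counts):
    """Between-group variance computed from per-division sums and frequencies."""
    total_n = sum(counts)
    mean = sum(sums) / total_n
    return sum(region_term(s, n) for s, n in zip(sums, counts)) / total_n - mean * mean
```

Under this reading, maximising the sum of `region_term` over the divisions is equivalent to maximising the between-group variance, so `region_term` can be used directly as the additive per-region score in the dynamic programming.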
[0068]
(Example)
Hereinafter, embodiments of the present invention will be described in detail with reference to FIGS.
In the present embodiment, the case where the evaluation function maximizes the above-described chi-square value will be described as an example. Further, it is assumed that there are two types of category field values, C1 and C2, and the data has the distribution shown in FIG. In FIG. 11, the leftmost column is the value of the numerical field, and the right two columns indicate the respective frequencies of C1 and C2.
[0069]
The initial number of divisions that is input to the initial division unit 102a is five. Assuming that the initial division is performed in increments of one, the output of the evaluation information generation unit 102b is a frequency table shown in FIG.
[0070]
Here, the maximum number of divisions that is input to the evaluation value calculation unit 102c is three. The evaluation function that is input to the evaluation value calculation unit 102c is
[Expression 28]

Therefore, the table shown in FIG. 13 is generated from the frequency table shown in FIG. 12 by dynamic programming. The values in parentheses in that table represent the α adopted in
[Expression 29]

(P [m] [n] in FIGS. 5 and 6).
[0071]
Therefore, the output of the evaluation value calculation unit 102c is as shown in the table of FIG.
Here, no correction is needed, so the evaluation value correction unit 102d does nothing.
Note that the division selection unit 102e may present the above three divisions to the user and have the user select the optimum division. Alternatively, the division selection unit 102e may, for example, select the division into three, which has the maximum evaluation value, and output it as the optimum division. Alternatively, the division selection unit 102e may, for example, select the divisions whose evaluation value is at least 0.7 times the maximum evaluation value, present the division into two and the division into three to the user, and have the user select the optimum division.
[0072]
(Other embodiments)
Although the embodiments of the present invention have been described so far, the present invention may be implemented in various different embodiments other than those described above, within the scope of the technical idea described in the claims.
[0073]
In addition, among the processes described in the embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by a known method.
In addition, the processing procedures, control procedures, specific names, information including parameters such as various registration data and search conditions, screen examples, and database configurations shown in the above description and drawings can be changed arbitrarily unless otherwise specified.
[0074]
In addition, each illustrated component of the numerical field dividing device 100 is functionally conceptual and does not necessarily need to be physically configured as illustrated. For example, all or any part of the processing functions provided in the numerical field dividing device 100, particularly the processing functions performed by the control unit, can be realized by a CPU (Central Processing Unit) and a program interpreted and executed by the CPU, or can be realized as hardware by wired logic. The program is recorded on a recording medium to be described later, and is mechanically read by the numerical field dividing device 100 as necessary.
[0075]
Further, the numerical field dividing apparatus 100 may include, as further components, an input device (not shown) including various pointing devices such as a mouse, a keyboard, an image scanner, a digitizer, and the like; a display device (not shown) used for monitoring input data; a clock generator (not shown) for generating a system clock; and an output device (not shown) such as a printer for outputting various processing results and other data. The input device, the display device, and the output device may each be connected to the control unit 102 via the input/output control interface unit 104.
[0076]
The numerical field dividing apparatus 100 may be realized by connecting peripheral devices such as a printer, a monitor, and an image scanner to an information processing apparatus such as a known personal computer, workstation, or other information processing terminal, and installing on the information processing apparatus software (including programs, data, and the like) that realizes the method of the present invention.
[0077]
Furthermore, the specific form of distribution and integration of the numerical field dividing apparatus 100 is not limited to that shown in the figure, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads and the like. For example, each database may be configured independently as a separate database device, and part of the processing may be realized by using CGI (Common Gateway Interface).
[0078]
The program according to the present invention can also be stored in a computer-readable recording medium. Here, the “recording medium” includes any “portable physical medium” such as a floppy disk, a magneto-optical disk, a ROM, an EPROM, an EEPROM, a CD-ROM, an MO, or a DVD; any “fixed physical medium” such as a ROM, RAM, or HD incorporated in various computer systems; and any “communication medium” that holds the program for a short time, such as a communication line or carrier wave used when transmitting the program via a network such as a LAN, WAN, or the Internet.
[0079]
The “program” is a data processing method described in an arbitrary language or description method, and may be in any format such as source code or binary code. The “program” is not necessarily limited to a single unit; it includes programs distributed as a plurality of modules or libraries, and programs that achieve their function in cooperation with a separate program such as an OS (Operating System). Note that well-known configurations and procedures can be used for the specific configuration for reading the recording medium, the reading procedure, the installation procedure after reading, and the like in each device described in the embodiment.
[0080]
Further, this program may be recorded on an application program server connected to the numerical field dividing device 100 via an arbitrary network, and all or part of it can be downloaded as necessary. Alternatively, all or any part of each control unit can be realized as hardware such as wired logic.
[0081]
(Supplementary Note 1) Initial division generation means for generating an initial division for data including a numeric field;
Evaluation information generation means for generating evaluation information based on the initial division generated by the initial division generation means, the data, the evaluation function, and the evaluation correction function;
Evaluation value calculation means for executing dynamic programming based on the initial division generated by the initial division generation means, the evaluation function, the maximum number of divisions, and the evaluation information generated by the evaluation information generation means, and generating a pair of a division and an evaluation value;
Evaluation value correction means for correcting the evaluation value generated by the evaluation value calculation means according to the evaluation information and the evaluation correction function;
Division selection means for selecting a pair of the division and the evaluation value corrected by the evaluation value correction means;
A numerical field dividing apparatus comprising:
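As an illustration only, the dynamic programming step of Supplementary Note 1 might be sketched as follows. The sketch assumes that the evaluation function is additive over segments of consecutive initial bins; the names `best_divisions`, `score`, and `neg_sse` are hypothetical and do not come from the specification.

```python
from typing import Callable, Dict, List, Tuple


def best_divisions(
    m: int,
    max_parts: int,
    score: Callable[[int, int], float],
) -> Dict[int, Tuple[float, List[int]]]:
    """For each n <= min(max_parts, m), return (total score, cut positions).

    Initial bins are numbered 0..m-1. score(i, j) evaluates the segment made
    of initial bins i..j-1; the total score is assumed to be additive.
    """
    limit = min(max_parts, m)
    neg_inf = float("-inf")
    # best[n][j]: best total score when bins 0..j-1 form exactly n segments.
    best = [[neg_inf] * (m + 1) for _ in range(limit + 1)]
    back = [[0] * (m + 1) for _ in range(limit + 1)]
    best[0][0] = 0.0
    for n in range(1, limit + 1):
        for j in range(n, m + 1):
            for i in range(n - 1, j):
                if best[n - 1][i] == neg_inf:
                    continue
                cand = best[n - 1][i] + score(i, j)
                if cand > best[n][j]:
                    best[n][j] = cand
                    back[n][j] = i
    result = {}
    for n in range(1, limit + 1):
        # Walk the back-pointers to recover where each segment starts.
        starts, j = [], m
        for k in range(n, 0, -1):
            j = back[k][j]
            starts.append(j)
        result[n] = (best[n][m], sorted(starts)[1:])  # drop the leading 0
    return result


# Hypothetical data: one representative value per initial bin.
vals = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]


def neg_sse(i: int, j: int) -> float:
    """Illustrative score: negative sum of squared deviations in bins i..j-1."""
    seg = vals[i:j]
    mean = sum(seg) / len(seg)
    return -sum((v - mean) ** 2 for v in seg)


divisions = best_divisions(len(vals), 3, neg_sse)
```

Here `divisions[n]` holds one pair of evaluation value and division per number of divisions n, which is the form that the later correction and selection stages consume.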
[0082]
(Supplementary Note 2) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means further comprises output means for selecting and outputting one of the divisions.
[0083]
(Supplementary Note 3) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means further comprises:
output means for selecting and outputting a plurality of the divisions; and
selection means for allowing the user to select one of the plurality of divisions output by the output means.
[0084]
(Supplementary Note 4) The numerical field dividing device according to Supplementary Note 1, wherein the evaluation value calculation means executes dynamic programming with a division restriction, based on the initial division, the data, the evaluation function, and the division restriction, and generates a pair of the division and the evaluation value that satisfies the division restriction.
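A minimal sketch of dynamic programming with a division restriction is given below. The specification does not state the form of the restriction in this passage, so the sketch assumes one possible restriction, namely that every segment must contain at least `min_count` records. The function name `constrained_division` and the sample data are hypothetical.

```python
def constrained_division(counts, n_parts, min_count, score):
    """Best division into n_parts segments, each holding >= min_count records.

    counts[k] is the number of records in initial bin k. score(i, j) is an
    additive evaluation of the segment made of bins i..j-1. Returns
    (total score, cut positions), or None if no division is feasible.
    """
    m = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n_parts + 1)]
    back = [[-1] * (m + 1) for _ in range(n_parts + 1)]
    best[0][0] = 0.0
    for n in range(1, n_parts + 1):
        for j in range(1, m + 1):
            for i in range(j):
                if best[n - 1][i] == neg_inf:
                    continue
                if prefix[j] - prefix[i] < min_count:
                    continue  # this segment would violate the restriction
                cand = best[n - 1][i] + score(i, j)
                if cand > best[n][j]:
                    best[n][j], back[n][j] = cand, i
    if best[n_parts][m] == neg_inf:
        return None
    cuts, j = [], m
    for n in range(n_parts, 1, -1):
        j = back[n][j]
        cuts.append(j)
    return best[n_parts][m], cuts[::-1]


# Hypothetical data: representative value and record count per initial bin.
bin_vals = [0.0, 0.0, 0.0, 10.0]
bin_counts = [3, 3, 3, 1]


def bin_neg_sse(i, j):
    """Illustrative score: negative squared deviation of bin values i..j-1."""
    seg = bin_vals[i:j]
    mean = sum(seg) / len(seg)
    return -sum((v - mean) ** 2 for v in seg)


unrestricted = constrained_division(bin_counts, 2, 0, bin_neg_sse)
restricted = constrained_division(bin_counts, 2, 2, bin_neg_sse)
```

Without the restriction the last bin is split off on its own; with `min_count=2` that segment is too small, so the cut moves one bin to the left.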
[0085]
(Supplementary Note 5) The numerical field dividing device according to Supplementary Note 1, wherein the initial division generation means determines the initial number of divisions from information including the maximum number of divisions.
[0086]
(Supplementary Note 6) The numerical field dividing device according to Supplementary Note 1, wherein the evaluation value calculation means generates a plurality of the evaluation values simultaneously by parallel processing.
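One way to picture the parallel generation of evaluation values is to compute the score of every candidate segment concurrently before the dynamic programming runs. The sketch below uses Python's standard `concurrent.futures` module; the function name `segment_scores` and the toy score are hypothetical, and the specification does not prescribe this particular decomposition.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def segment_scores(m, score, workers=4):
    """Evaluate score(i, j) for every segment 0 <= i < j <= m concurrently."""
    pairs = list(combinations(range(m + 1), 2))  # all (i, j) with i < j
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda p: score(*p), pairs))
    return dict(zip(pairs, values))


# Toy score for illustration: the number of initial bins in the segment.
table = segment_scores(3, lambda i, j: j - i)
```

In CPython, threads do not run pure-Python arithmetic on several cores at once, so a process pool (`ProcessPoolExecutor` with a picklable score function) may be the closer analogue of the multiprocessor system mentioned in the text.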
[0087]
(Supplementary Note 7) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the chi-square value between the numeric field and the category field.
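As a hedged illustration, the chi-square value between a divided numeric field and a category field can be computed from a contingency table whose rows are segments and whose columns are category values. The helper names `merge_bins` and `chi_square` and the sample counts are assumptions made for this sketch.

```python
def merge_bins(bin_table, cuts):
    """Sum the rows of the initial bins into segments delimited by cuts."""
    bounds = [0] + list(cuts) + [len(bin_table)]
    return [
        [sum(col) for col in zip(*bin_table[a:b])]
        for a, b in zip(bounds, bounds[1:])
    ]


def chi_square(table):
    """Pearson chi-square statistic of a segments-by-categories table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / total
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical frequencies: four initial bins, two category values.
bin_table = [[10, 0], [10, 0], [0, 10], [0, 10]]
chi_mid = chi_square(merge_bins(bin_table, [2]))   # cut between bins 1 and 2
chi_left = chi_square(merge_bins(bin_table, [1]))  # cut between bins 0 and 1
```

Because the column totals are fixed by the data, each segment's contribution depends only on its own row, so the statistic is a sum over segments and fits the additive form that dynamic programming needs.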
[0088]
(Supplementary Note 8) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means selects the division of the numeric field that maximizes the maximum log likelihood.
[0089]
(Supplementary Note 9) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means selects the division of the numeric field that minimizes the Akaike information criterion (AIC).
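The exact likelihood model is not given in this passage. As one plausible reading, the sketch below treats a division as a histogram density (constant within each segment), computes its maximum log likelihood, and derives the AIC by penalizing the number of free parameters. The function names and the parameter count `n_segments - 1` are assumptions of this sketch.

```python
import math


def histogram_log_likelihood(counts, widths):
    """Maximum log likelihood of a piecewise-constant density.

    counts[k] records fall in segment k, whose width is widths[k]; the
    fitted density in segment k is counts[k] / (total * widths[k]).
    """
    total = sum(counts)
    return sum(
        c * math.log(c / (total * w))
        for c, w in zip(counts, widths)
        if c > 0
    )


def aic(log_likelihood, n_segments):
    """Akaike information criterion with n_segments - 1 free parameters."""
    return -2.0 * log_likelihood + 2.0 * (n_segments - 1)


# Evenly spread data: splitting adds a parameter without improving the fit.
aic_one = aic(histogram_log_likelihood([100], [2.0]), 1)
aic_even = aic(histogram_log_likelihood([50, 50], [1.0, 1.0]), 2)
# Unevenly spread data: the split improves the fit enough to pay for itself.
aic_skew = aic(histogram_log_likelihood([90, 10], [1.0, 1.0]), 2)
```

Under these assumptions the log likelihood plays the role of an evaluation function, and the penalty term plays the role of a correction applied once per number of divisions.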
[0090]
(Supplementary Note 10) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the maximum log likelihood of the numeric field and the category field.
[0091]
(Supplementary Note 11) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means designates a category field other than the numeric field, and selects the division of the numeric field that minimizes the Akaike information criterion of the numeric field and the category field.
[0092]
(Supplementary Note 12) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the mutual information between the numeric field and the category field.
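For reference, the mutual information between a divided numeric field and a category field can be computed from the same kind of contingency table (rows are segments, columns are category values). The function name `mutual_information` and the sample tables are hypothetical; natural logarithms are assumed.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) of a segments-by-categories table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    mi = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            if obs > 0:
                mi += (obs / total) * math.log(obs * total / (rt * ct))
    return mi


# A division that separates the categories perfectly, and one that does not.
mi_separated = mutual_information([[20, 0], [0, 20]])
mi_mixed = mutual_information([[10, 10], [10, 10]])
```

A division that separates two equally frequent categories perfectly yields log 2, while a division whose segments have the same category mix yields zero.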
[0093]
(Supplementary Note 13) The numerical field dividing device according to Supplementary Note 1, wherein the division selection means selects the division of the numeric field that maximizes the between-class variance after division.
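The variance between segments after division can be illustrated as follows. The sketch assumes one representative value and one record count per initial bin; the function name `between_class_variance` and the sample data are hypothetical.

```python
def between_class_variance(values, counts, cuts):
    """Weighted variance of the segment means around the overall mean.

    values[k] is the representative value of initial bin k and counts[k]
    its number of records; cuts lists the bin indices where segments start.
    """
    total = sum(counts)
    grand_mean = sum(v * c for v, c in zip(values, counts)) / total
    bounds = [0] + list(cuts) + [len(values)]
    variance = 0.0
    for a, b in zip(bounds, bounds[1:]):
        n = sum(counts[a:b])
        if n == 0:
            continue  # an empty segment contributes nothing
        mean = sum(v * c for v, c in zip(values[a:b], counts[a:b])) / n
        variance += (n / total) * (mean - grand_mean) ** 2
    return variance


sample_values = [1.0, 2.0, 9.0, 10.0]
sample_counts = [1, 1, 1, 1]
var_mid = between_class_variance(sample_values, sample_counts, [2])
var_left = between_class_variance(sample_values, sample_counts, [1])
```

Cutting between the two clusters of values gives a larger between-segment variance than cutting inside a cluster, which is the behavior this criterion rewards.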
[0094]
(Supplementary note 14) An initial division generation step for generating an initial division for data including a numeric field;
An evaluation information generation step for generating evaluation information based on the initial division generated in the initial division generation step, the data, the evaluation function, and the evaluation correction function;
An evaluation value calculation step of executing dynamic programming based on the initial division generated in the initial division generation step, the evaluation function, the maximum number of divisions, and the evaluation information generated in the evaluation information generation step, and generating a pair of a division and an evaluation value;
An evaluation value correction step of correcting the evaluation value generated in the evaluation value calculation step according to the evaluation information and the evaluation correction function;
A division selection step of selecting a pair of the division and the evaluation value corrected in the evaluation value correction step;
A program for causing a numerical field dividing device to execute a numerical field dividing method.
[0095]
(Supplementary Note 15) The program according to Supplementary Note 14, wherein the division selection step further includes an output step of selecting and outputting one of the divisions.
[0096]
(Supplementary Note 16) The program according to Supplementary Note 14, wherein the division selection step further includes:
an output step of selecting and outputting a plurality of the divisions; and
a selection step of allowing the user to select one of the plurality of divisions output in the output step.
[0097]
(Supplementary Note 17) The program according to Supplementary Note 14, wherein the evaluation value calculation step executes dynamic programming with a division restriction, based on the initial division, the data, the evaluation function, and the division restriction, and generates a pair of the division and the evaluation value that satisfies the division restriction.
[0098]
(Supplementary Note 18) The program according to Supplementary Note 14, wherein the initial division generation step determines the initial number of divisions from information including the maximum number of divisions.
[0099]
(Supplementary Note 19) The program according to Supplementary Note 14, wherein the evaluation value calculation step generates a plurality of the evaluation values simultaneously by parallel processing.
[0100]
(Supplementary Note 20) The program according to Supplementary Note 14, wherein the division selection step designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the chi-square value between the numeric field and the category field.
[0101]
(Supplementary Note 21) The program according to Supplementary Note 14, wherein the division selection step selects the division of the numeric field that maximizes the maximum log likelihood.
[0102]
(Supplementary Note 22) The program according to Supplementary Note 14, wherein the division selection step selects the division of the numeric field that minimizes the Akaike information criterion (AIC).
[0103]
(Supplementary Note 23) The program according to Supplementary Note 14, wherein the division selection step designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the maximum log likelihood of the numeric field and the category field.
[0104]
(Supplementary Note 24) The program according to Supplementary Note 14, wherein the division selection step designates a category field other than the numeric field, and selects the division of the numeric field that minimizes the Akaike information criterion of the numeric field and the category field.
[0105]
(Supplementary Note 25) The program according to Supplementary Note 14, wherein the division selection step designates a category field other than the numeric field, and selects the division of the numeric field that maximizes the mutual information between the numeric field and the category field.
[0106]
(Supplementary Note 26) The program according to Supplementary Note 14, wherein the division selection step selects the division of the numeric field that maximizes the between-class variance after division.
[0107]
(Supplementary note 27) An initial division generation step for generating an initial division for data including a numeric field;
An evaluation information generation step for generating evaluation information based on the initial division generated in the initial division generation step, the data, the evaluation function, and the evaluation correction function;
An evaluation value calculation step of executing dynamic programming based on the initial division generated in the initial division generation step, the evaluation function, the maximum number of divisions, and the evaluation information generated in the evaluation information generation step, and generating a pair of a division and an evaluation value;
An evaluation value correction step of correcting the evaluation value generated in the evaluation value calculation step according to the evaluation information and the evaluation correction function;
A division selection step of selecting a pair of the division and the evaluation value corrected in the evaluation value correction step;
A computer-readable recording medium recording a program for causing a numerical field dividing device to execute a numerical field dividing method.
[0108]
(Supplementary note 28) An initial division generation step for generating an initial division for data including a numeric field;
An evaluation information generation step for generating evaluation information based on the initial division generated in the initial division generation step, the data, the evaluation function, and the evaluation correction function;
An evaluation value calculation step of executing dynamic programming based on the initial division generated in the initial division generation step, the evaluation function, the maximum number of divisions, and the evaluation information generated in the evaluation information generation step, and generating a pair of a division and an evaluation value;
An evaluation value correction step of correcting the evaluation value generated in the evaluation value calculation step according to the evaluation information and the evaluation correction function;
A division selection step of selecting a pair of the division and the evaluation value corrected in the evaluation value correction step;
A numerical field dividing method characterized by comprising:
[0109]
The invention described in Supplementary Note 2 is explained below. It shows a more specific example of division selection. According to this device, one division is selected and output, so the optimum division can be selected and output automatically.
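A minimal sketch of automatic selection is shown below: each candidate pair of evaluation value and division is corrected, and the pair with the largest corrected value is output. The function name `select_division`, the candidate values, and the penalty of 2.0 per division are all hypothetical choices made for illustration.

```python
def select_division(candidates, correct):
    """Pick the division whose corrected evaluation value is largest.

    candidates maps the number of divisions n to (raw score, cut positions);
    correct(raw_score, n) returns the corrected score.
    """
    corrected = {
        n: (correct(raw, n), cuts) for n, (raw, cuts) in candidates.items()
    }
    best_n = max(corrected, key=lambda n: corrected[n][0])
    return best_n, corrected[best_n]


# Hypothetical output of the dynamic programming stage, one entry per n.
candidates = {1: (-96.0, []), 2: (0.0, [3]), 3: (0.0, [1, 3])}


def penalize(raw, n):
    """Illustrative correction: charge a fixed cost per division."""
    return raw - 2.0 * n


chosen_n, (chosen_score, chosen_cuts) = select_division(candidates, penalize)
```

Without the correction, n = 2 and n = 3 tie on the raw score; the correction breaks the tie in favor of the division with fewer segments.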
[0110]
The invention described in Supplementary Note 3 is explained below. It shows a more specific example of division selection. According to this device, a plurality of divisions are selected and output, and the user selects one of the output divisions, so the user can choose the optimum division from among the plurality of divisions.
[0111]
The invention described in Supplementary Note 4 is explained below. It shows a more specific example of evaluation value calculation. According to this device, dynamic programming with a division restriction is executed based on the initial division, the data, the evaluation function, and the division restriction, and a pair of a division and an evaluation value that satisfies the division restriction is generated. As a result, a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0112]
The invention described in Supplementary Note 5 is explained below. It shows a more specific example of the initial number of divisions. According to this device, the initial number of divisions is determined from information including the maximum number of divisions, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0113]
The invention described in Supplementary Note 6 is explained below. It shows a more specific example of evaluation value calculation. According to this device, a plurality of evaluation values are generated simultaneously by parallel processing, so the evaluation values can be calculated at the same time using a multiprocessor system or a parallel processing system, and a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0114]
The invention described in Supplementary Note 7 is explained below. It shows a more specific example of division selection. According to this device, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the chi-square value between the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0115]
The invention described in Supplementary Note 8 is explained below. It shows a more specific example of division selection. According to this device, the division of the numeric field that maximizes the maximum log likelihood is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0116]
The invention described in Supplementary Note 9 is explained below. It shows a more specific example of division selection. According to this device, the division of the numeric field that minimizes the Akaike information criterion is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0117]
The invention described in Supplementary Note 10 is explained below. It shows a more specific example of division selection. According to this device, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the maximum log likelihood of the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0118]
The invention described in Supplementary Note 11 is explained below. It shows a more specific example of division selection. According to this device, a category field other than the numeric field is designated, and the division of the numeric field that minimizes the Akaike information criterion of the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0119]
The invention described in Supplementary Note 12 is explained below. It shows a more specific example of division selection. According to this device, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the mutual information between the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0120]
The invention described in Supplementary Note 13 is explained below. It shows a more specific example of division selection. According to this device, the division of the numeric field that maximizes the between-class variance after division is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0121]
The invention described in Supplementary Note 16 is explained below. It shows a more specific example of division selection. According to this program, a plurality of divisions are selected and output, and the user selects one of the output divisions, so the user can choose the optimum division from among the plurality of divisions.
[0122]
The invention described in Supplementary Note 17 is explained below. It shows a more specific example of evaluation value calculation. According to this program, dynamic programming with a division restriction is executed based on the initial division, the data, the evaluation function, and the division restriction, and a pair of a division and an evaluation value that satisfies the division restriction is generated. As a result, a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0123]
The invention described in Supplementary Note 18 is explained below. It shows a more specific example of the initial number of divisions. According to this program, the initial number of divisions is determined from information including the maximum number of divisions, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0124]
The invention described in Supplementary Note 19 is explained below. It shows a more specific example of evaluation value calculation. According to this program, a plurality of evaluation values are generated simultaneously by parallel processing, so the evaluation values can be calculated at the same time using a multiprocessor system or a parallel processing system, and a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0125]
The invention described in Supplementary Note 20 is explained below. It shows a more specific example of division selection. According to this program, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the chi-square value between the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0126]
The invention described in Supplementary Note 21 is explained below. It shows a more specific example of division selection. According to this program, the division of the numeric field that maximizes the maximum log likelihood is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0127]
The invention described in Supplementary Note 22 is explained below. It shows a more specific example of division selection. According to this program, the division of the numeric field that minimizes the Akaike information criterion is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0128]
The invention described in Supplementary Note 23 is explained below. It shows a more specific example of division selection. According to this program, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the maximum log likelihood of the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0129]
The invention described in Supplementary Note 24 is explained below. It shows a more specific example of division selection. According to this program, a category field other than the numeric field is designated, and the division of the numeric field that minimizes the Akaike information criterion of the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0130]
The invention described in Supplementary Note 25 is explained below. It shows a more specific example of division selection. According to this program, a category field other than the numeric field is designated, and the division of the numeric field that maximizes the mutual information between the numeric field and the category field is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0131]
The invention described in Supplementary Note 26 is explained below. It shows a more specific example of division selection. According to this program, the division of the numeric field that maximizes the between-class variance after division is selected, so a near-optimal division of the numeric field can be obtained in a shorter time even for large-scale data.
[0132]
[Effect of the invention]
As described above in detail, according to the present invention, an initial division is generated for data including a numeric field; evaluation information is generated based on the generated initial division, the data, the evaluation function, and the evaluation correction function; dynamic programming is executed based on the generated initial division, the evaluation function, the maximum number of divisions, and the generated evaluation information to generate pairs of a division and an evaluation value; the generated evaluation values are corrected according to the evaluation information and the evaluation correction function; and a pair of a division and a corrected evaluation value is selected. It is therefore possible to provide a numerical field dividing device, a program, a recording medium, and a numerical field dividing method that can obtain a near-optimal division of a numeric field in a short time even for large-scale data.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of the configuration of a numerical value field dividing device 100 to which the present invention is applied.
FIG. 2 is a conceptual diagram illustrating the principle of the apparatus.
FIG. 3 is a flowchart showing an example of numerical field division processing of the apparatus according to the present embodiment.
FIG. 4 is a flowchart illustrating an example of processing executed by an evaluation value calculation unit 102c.
FIG. 5 is a flowchart illustrating an example of processing executed by an evaluation value calculation unit 102c.
FIG. 6 is a flowchart showing an example of the evaluation value calculation process in the evaluation value calculation unit 102c of the apparatus according to the present embodiment.
FIG. 7 is a configuration diagram of an embodiment of a division selection unit 102e.
FIG. 8 is a configuration diagram of another embodiment of a division selection unit 102e.
FIG. 9 is a configuration diagram of another embodiment of a division selection unit 102e.
FIG. 10 is a configuration diagram of an embodiment of an initial division unit 102a.
FIG. 11 is a diagram showing a distribution of data.
FIG. 12 is a diagram showing a data frequency table.
FIG. 13 is a diagram illustrating a processing result of dynamic programming.
FIG. 14 is a diagram illustrating an output of an evaluation value calculation unit 102c.
[Explanation of symbols]
100 Numerical field dividing device
102 Control unit
102a Initial division unit
102b Evaluation information generation unit
102c Evaluation value calculation unit
102d Evaluation value correction unit
102e Division selection unit
102f Division presentation unit
102g User selection unit
102h Automatic division selection unit
104 I/O control interface
106 Storage unit
106a Target data
106b Initial number of divisions
106c Maximum number of divisions
106d Evaluation function
106e Evaluation correction function

Claims

Initial dividing means for reading a data table that stores, in association with each other, a frequency value for each combination of a numeric value of one numeric field composed of a plurality of numeric values and a category value of zero or one category field composed of a plurality of category values, and for initially dividing the numeric field into M pieces according to an input M (M is a natural number);
Evaluation information generating means for calculating, for each combination of one of the M initial divisions of the numeric field produced by the initial dividing means and a category value of the category field, an aggregate value of the frequency values included in that combination, and storing the aggregate values in an evaluation information table as evaluation information;
Evaluation value calculation means for applying, in accordance with dynamic programming, an evaluation function to the evaluation information generated by the evaluation information generating means for every natural number n that exceeds neither the input maximum number of divisions nor M, sequentially calculating evaluation values of the numeric field initially divided into M pieces and storing them in an evaluation value storage table for each n until a predetermined calculation end condition is satisfied, and generating, for each n, a combination of the evaluation value calculated last before the predetermined calculation end condition was satisfied and the division corresponding to that evaluation value;
Evaluation value correcting means for correcting, for each n, the evaluation value of the combination generated by the evaluation value calculation means by applying an evaluation correction function corresponding to the evaluation function;
Division selection means for selecting and outputting, from the combinations for each n whose evaluation values have been corrected by the evaluation value correcting means, the combination having the maximum evaluation value;
A numerical field dividing device characterized by comprising the above means.
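The initial division and the evaluation information table recited above might be pictured as follows. The claim does not say how the M initial pieces are chosen, so this sketch assumes that the sorted distinct numeric values are cut into M contiguous groups of nearly equal size. The function name `evaluation_table`, the dictionary layout of the data table, and the sample frequencies are hypothetical.

```python
def evaluation_table(freq, m):
    """Aggregate frequencies per (initial bin, category value).

    freq maps (numeric value, category value) to a frequency. The sorted
    distinct numeric values are split into m contiguous initial bins, and
    the frequencies are summed per bin and category.
    Returns (bin boundaries as indices into the sorted values,
             sorted category values, table[bin][category index]).
    """
    values = sorted({v for v, _ in freq})
    cats = sorted({c for _, c in freq})
    if not 1 <= m <= len(values):
        raise ValueError("m must be between 1 and the number of distinct values")
    bounds = [k * len(values) // m for k in range(m + 1)]
    bin_of = {}
    for b in range(m):
        for v in values[bounds[b]:bounds[b + 1]]:
            bin_of[v] = b
    table = [[0] * len(cats) for _ in range(m)]
    for (v, c), count in freq.items():
        table[bin_of[v]][cats.index(c)] += count
    return bounds, cats, table


# Hypothetical data table: purchase amount and customer category.
freq = {
    (100, "A"): 3,
    (200, "A"): 1,
    (200, "B"): 2,
    (300, "B"): 4,
    (400, "B"): 1,
}
bounds, cats, table = evaluation_table(freq, 2)
```

When no category field is present, a single placeholder category value can be used for every record so that the table has one column.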

An initial division procedure of reading a data table that stores, in association with each other, a frequency value for each combination of a numeric value of one numeric field composed of a plurality of numeric values and a category value of zero or one category field composed of a plurality of category values, and initially dividing the numeric field into M pieces according to an input M (M is a natural number);
An evaluation information generation procedure of calculating, for each combination of one of the M initial divisions of the numeric field produced by the initial division procedure and a category value of the category field, an aggregate value of the frequency values included in that combination, and storing the aggregate values in an evaluation information table as evaluation information;
An evaluation value calculation procedure of applying, in accordance with dynamic programming, an evaluation function to the evaluation information generated by the evaluation information generation procedure for every natural number n that exceeds neither the input maximum number of divisions nor M, sequentially calculating evaluation values of the numeric field initially divided into M pieces and storing them in an evaluation value storage table for each n until a predetermined calculation end condition is satisfied, and generating, for each n, a combination of the evaluation value calculated last before the predetermined calculation end condition was satisfied and the division corresponding to that evaluation value;
An evaluation value correction procedure of correcting, for each n, the evaluation value of the combination generated by the evaluation value calculation procedure by applying an evaluation correction function corresponding to the evaluation function;
A division selection procedure of selecting and outputting, from the combinations for each n whose evaluation values have been corrected by the evaluation value correction procedure, the combination having the maximum evaluation value;
A numerical field dividing program characterized by causing a numerical field dividing device to execute the above procedures.

An initial division procedure of reading a data table that stores, in association with each other, a frequency value for each combination of a numeric value of one numeric field composed of a plurality of numeric values and a category value of zero or one category field composed of a plurality of category values, and initially dividing the numeric field into M pieces according to an input M (M is a natural number);
An evaluation information generation procedure of calculating, for each combination of one of the M initial divisions of the numeric field produced by the initial division procedure and a category value of the category field, an aggregate value of the frequency values included in that combination, and storing the aggregate values in an evaluation information table as evaluation information;
An evaluation value calculation procedure of applying, in accordance with dynamic programming, an evaluation function to the evaluation information generated by the evaluation information generation procedure for every natural number n that exceeds neither the input maximum number of divisions nor M, sequentially calculating evaluation values of the numeric field initially divided into M pieces and storing them in an evaluation value storage table for each n until a predetermined calculation end condition is satisfied, and generating, for each n, a combination of the evaluation value calculated last before the predetermined calculation end condition was satisfied and the division corresponding to that evaluation value;
An evaluation value correction procedure of correcting, for each n, the evaluation value of the combination generated by the evaluation value calculation procedure by applying an evaluation correction function corresponding to the evaluation function;
A division selection procedure of selecting and outputting, from the combinations for each n whose evaluation values have been corrected by the evaluation value correction procedure, the combination having the maximum evaluation value;
A computer-readable recording medium on which is recorded a numerical field dividing program that causes a numerical field dividing device to execute the above procedures.

An initial division step of reading a data table that stores, in association with each other, a frequency value for each combination of a numeric value of one numeric field composed of a plurality of numeric values and a category value of zero or one category field composed of a plurality of category values, and initially dividing the numeric field into M pieces according to an input M (M is a natural number);
An evaluation information generation step of calculating, for each combination of one of the M initial divisions of the numeric field produced by the initial division step and a category value of the category field, an aggregate value of the frequency values included in that combination, and storing the aggregate values in an evaluation information table as evaluation information;
An evaluation value calculation step of applying, in accordance with dynamic programming, an evaluation function to the evaluation information generated by the evaluation information generation step for every natural number n that exceeds neither the input maximum number of divisions nor M, sequentially calculating evaluation values of the numeric field initially divided into M pieces and storing them in an evaluation value storage table for each n until a predetermined calculation end condition is satisfied, and generating, for each n, a combination of the evaluation value calculated last before the predetermined calculation end condition was satisfied and the division corresponding to that evaluation value;
An evaluation value correction step of correcting, for each n, the evaluation value of the combination generated by the evaluation value calculation step by applying an evaluation correction function corresponding to the evaluation function; and
A division selection step of selecting and outputting, from the combinations for each n whose evaluation values have been corrected by the evaluation value correction step, the combination having the maximum evaluation value;
A numerical field dividing method characterized in that a numerical field dividing device executes the above steps.

The numerical field dividing program according to claim 2, wherein the division selection procedure selects and outputs, from the combinations for each n whose evaluation values have been corrected by the evaluation value correction procedure, the combination that includes the maximum evaluation value.