JP2004086350A

JP2004086350A - Text information analysis system and presentation method of analysis result

Info

Publication number: JP2004086350A
Application number: JP2002243973A
Authority: JP
Inventors: Ryohei Orihara; 折原　良平; Kazuhiko Atsumi; 渥美　一彦; Kouichi Sasaki; 笹氣　光一
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-08-23
Filing date: 2002-08-23
Publication date: 2004-03-18
Anticipated expiration: 2022-08-23
Also published as: JP3831319B2

Abstract

PROBLEM TO BE SOLVED: To provide a text information analysis system used for presenting a plurality of results by analysis techniques different from one another by systematically combining them with one another.

SOLUTION: A knowledge analysis part 12 executes, for text information stored in a knowledge database 13, an analysis by a clustering part 121 and an analysis by a text mining part 122, and stores the results thereof (a clustering result 141 and a text mining result 142) in an analysis result storing database 14. A user interface part 11 receives, from a user, selections of a cluster of the clustering result 141 disposed on the longitudinal axis and the lateral axis, and of a category of the text mining result 142 by an analysis axis selection part 111, and carries out counting of the analysis results based on the selections by an analysis result counting part 112 to present the counting result to the user.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
この発明は、例えばＬＡＮ（Ｌｏｃａｌ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）やイントラネット経由で収集・蓄積されたアンケートや日報などのテキスト情報を分析するテキスト情報分析システムおよび同システムに適用される分析結果の提示方法に関する。
【０００２】
【従来の技術】
近年、ＬＡＮやイントラネットを敷設し、各社員がもつ情報、例えば業務上で発生するアンケートや日報などの非定型情報を部門を越えて収集・蓄積する企業が増えつつある。この収集・蓄積された情報は、全社員の知識として共有・活用されることを目的に、様々な分析が施されるのが一般的である。そして、その分析手法として、現在では、クラスタリング分析とテキストマイニング分析とがよく知られている。
【０００３】
クラスタリング分析は、例えば特開２００２−１４９６７０号公報に記載されているように、各単語の出現頻度や複数の単語間の関連度により、収集・蓄積された情報を分類するものである。ここで、複数の単語間の関連度とは、共起性の有無をいい、例えば「私はＡとＢを購入した。」といった、「Ａ」と「Ｂ」を共に含むテキスト情報が多数存在する場合、この「Ａ」と「Ｂ」は共起性があると判断する。
【０００４】
その結果、「Ａ」という単語の出現頻度が高い情報だけが同じクラスタに属するものとして取り扱われるだけでなく、「Ｂ」という単語の出現頻度が高い情報も同じクラスタに属するものとして取り扱われ、絞り込みを適切に行った精度の高い分類が自動的に実行されることになる。
【０００５】
一方、テキストマイニング分析は、例えば特開２００１−１４７９３７号公報に記載されているように、収集・蓄積された情報を利用者が望むカテゴリに分類するものである。例えば「Ｃ」、「Ｄ」、「Ｆ」製品に関する情報をそれぞれカテゴリに纏めたい場合、利用者は、どのような記述を含む場合に、その情報を各カテゴリに属するものと判断するのか、その条件を指定する。
【０００６】
このクラスタリング分析およびテキストマイニング分析によれば、無秩序に収集・蓄積された大量の情報から何らかの傾向を掴むことが可能となる等、知識の共有・活用が有効に図られることになる。
【０００７】
【発明が解決しようとする課題】
ところで、前述したクラスタリング分析およびテキストマイニング分析は、どちらもいずれのクラスタおよびカテゴリにも属さない情報を数多く発生させてしまうという欠点をもっている。したがって、いずれの分析手法を採用した場合であっても、極めて重要な情報を抽出することができずに、「その他」の多数の情報の中に埋もれさせてしまうおそれがあった。
【０００８】
また、たとえ両方の分析手法を備える場合であっても、それらの分析結果を個別に参照するだけでは、例えば一方の分析で埋もれてしまった情報のみを対象とした傾向を他方の分析で認識することは難しく、また、いわゆる相乗効果を期待することもできない。
【０００９】
この発明は、このような事情を考慮してなされたものであり、互いに異なる分析手法による複数の分析結果を有機的に結合させて提示するテキスト情報分析システムおよび同システムに適用される分析結果の提示方法を提供することを目的とする。
【００１０】
【課題を解決するための手段】
前述した目的を達成するために、この発明は、収集・蓄積された大量のテキスト情報を分析するテキスト情報分析システムにおいて、各単語の出現頻度および複数の単語間の関連度に基づき、前記収集・蓄積された大量のテキスト情報を分析するクラスタリング分析手段と、任意に指定される条件に基づき、前記収集・蓄積された大量のテキスト情報を分析するテキストマイニング分析手段と、同一のテキスト情報群に対する前記クラスタリング分析手段の分析結果と前記テキストマイニング分析手段の分析結果とを有機的に結合させて提示する分析結果提示手段とを具備することを特徴とする。
【００１１】
この発明のテキスト情報分析システムにおいては、クラスタリング分析の結果とテキストマイニング分析の結果とを、例えばそれぞれ縦軸と横軸とに割り当てた２次元配列の表形式で提示する等、この２つの分析結果を有機的に結合させて提示する。これにより、例えば一方の分析結果で埋もれてしまった情報のみを対象とした傾向を他方の分析結果で簡単に把握できるといった、それぞれの分析結果のみでは得られない付加価値の高い有益な分析結果の提示を実現する。
【００１２】
【発明の実施の形態】
以下、図面を参照してこの発明の実施形態を説明する。
【００１３】
図１は、この発明の実施形態に係る知識分析システムのネットワーク構成を示す図である。
【００１４】
この知識分析システム１は、サーバ機などと称される高性能のコンピュータ上に構築され、複数のクライアントコンピュータ２とＬＡＮやイントラネットなどのネットワーク３を介して接続される。そして、知識分析システム１は、クライアントコンピュータ２からの分析要求を受け付け、その要求に基づく分析の結果を返却する。
【００１５】
図２は、この知識分析システム１の機能ブロックを示す図である。図２に示すように、この知識分析システム１は、ユーザインタフェース部１１および知識分析部１２の処理部と、知識データベース１３および分析結果格納データベース１４のデータ部とを有している。なお、処理部は、この知識分析システム１が構築されるコンピュータに搭載されたＣＰＵの動作手順を記述するプログラムにより構成されるものであり、データ部は、同コンピュータが備える磁気ディスク装置などの記憶媒体上に構成されるものである。
【００１６】
ユーザインタフェース部１１は、クライアントコンピュータ２の利用者に対する窓口の役割を担うものであり、分析軸選択部１１１および分析結果集計部１１２を有している。分析軸選択部１１１は、クライアントコンピュータ２からの指示の一部を受け付けるものであり、その詳細は後述する。一方、分析結果集計部１１２は、この分析軸選択部１１１により受け付けた指示に基づき、分析結果の集計を行ってクライアントコンピュータ２に返却するものである。この詳細についても後述する。
【００１７】
知識分析部１２は、例えば業務上で発生するアンケートや日報など、知識データベース１３に蓄積された大量のテキスト情報を分析し、その結果を分析結果格納データベース１４に格納するものであり、クラスタリング部１２１およびテキストマイニング部１４２を有している。クラスタリング部１２１は、各単語の出現頻度や複数の単語間の関連度により、知識データベース１３のテキスト情報をクラスタに分類するものであり（クラスタリング分析）、これにより得られたクラスタリング結果１４１を分析結果格納データベース１４に格納する。一方、テキストマイニング部１４２は、利用者から指定された条件に基づき、知識データベース１３のテキスト情報を利用者が望むカテゴリに分類するものであり（テキストマイニング分析）、これにより得られたテキストマイニング結果１４２を分析結果格納データベース１４に格納する。
【００１８】
ここで、図３乃至図５を参照して、この知識分析システム１の特徴である分析結果の提示方法についての概略を説明する。
【００１９】
いま、知識データベース１３には、アンケートや日報、メールなどのテキスト情報が大量に蓄積されているものとする（図３のＡ）。そして、この同一のテキスト情報群に対して、一方では、クラスタリング部１２１がクラスタリング分析を実行し、クラスタリング結果１４１を得て（図３のＢ２）、他方では、テキストマイニング部１２２がテキストマイニング分析を実行し、テキストマイニング結果１４２を得たとする（図３のＢ１）。
【００２０】
まず、クラスタリング結果１４１に着目すると、テキスト情報群は、Ｃ１，Ｃ２，Ｃ３，…と分類されているが、これらのいずれにも属さないテキスト情報も大量に発生する。同様に、テキストマイニング結果１４２に着目すると、テキスト情報群は、Ｔ１，Ｔ２，Ｔ３，…と分類されているが、これらのいずれにも属さないテキスト情報も大量に発生する。したがって、このままでは、いずれにも属さないテキスト情報は、「その他」の多数のテキスト情報と共にただ埋もれてしまうことになる。
【００２１】
そこで、この知識分析システム１では、この２つの分析結果を有機的に連結させて、より具体的には、例えばクラスタリング結果１４１を縦軸、テキストマイニング結果１４２を横軸に割り当てた２次元配列の表形式に集計して、利用者に提示するようにした（図３のＣ）。図中、ｎ１１は、クラスタリング部１２１によるクラスタリング分析によってクラスタＣ１に属するとともに、テキストマイニング部１２２によるテキストマイニング分析によってカテゴリＴ１に属するテキスト情報の件数を示している。
【００２２】
これにより、例えばクラスタリング結果１４１では「その他」として纏められたテキスト情報群を、テキストマイニング結果１４２のＴ１，Ｔ２，Ｔ３，…の分類で参照することができ（ｎｘ１，ｎｘ２，ｎｘ３，…）、同様に、テキストマイニング結果１４２では「その他」として纏められたテキスト情報群を、クラスタリング結果１４２におけるＣ１，Ｃ２，Ｃ３，…の分類で参照することができるようになる（ｎ１ｙ，ｎ２ｙ，ｎ３ｙ，…）。また、視点の異なる２つの分析結果を有機的に結合させることにより、一方の分析結果のみからでは得られない新たな発見を促すなど、いわゆる相乗効果を期待することもできる。
【００２３】
また、このクラスタリング部１２１のクラスタリング分析により得られるクラスタリング結果１４１と、テキストマイニング部１４２のテキストマイニング分析により得られるテキストマイニング結果１４２は、テキスト情報群を多階層のクラスタまたはカテゴリに分類されているのが一般的である。図５に、多階層のカテゴリに分類されたテキストマイニング結果１４２の一例を示す。そこで、この知識分析システム１では、縦軸および横軸の項目として配置するクラスタリング結果１４１およびテキストマイニング結果１４２のクラスタおよびカテゴリの階層を、利用者の指示に応じて各軸ごとに上下に移動できるようにした。
【００２４】
例えば、図３に示した表（Ｃ）において、テキストマイニング結果１４２をＴ１に絞ってさらに詳細に参照したいという要求に対して、この知識分析システム１では、図５に示すように、横軸の項目として配置されたテキストマイニング結果１４２のカテゴリの階層を一段下に移動させるべく再集計して提示する。この移動は、その階層が続く限り可能であり、また、逆に下から上への移動も当然に可能である。
【００２５】
次に、図６乃至図１０を参照して、この知識分析システム１が分析結果の提示を行う際の動作原理について説明する。
【００２６】
ネットワーク３を介して接続されるクライアントコンピュータ２に対して知識分析サービスを提供する際、ユーザインタフェース部１１は、まず、図６に示す画面を表示させるための画面データを送信する。この画面には、テキストマイニング分析の実行を指示するボタンａ１と、クラスタリング分析の実行を指示するボタンａ２と、分析軸の選択作業に移行するためのボタンａ３と、この分析軸の選択後に分析結果の集計を開始させるためのボタンａ４とが配置される。この画面の提示を受けた利用者は、クライアントコンピュータ２が備えるマウス等のポインティングデバイスを操作し、所望のボタンを選択する。
【００２７】
ボタンａ１の選択が通知されると、ユーザインタフェース部１１は、テキストマイニング分析の実行を知識分析部１２に指示する。一方、この指示を受けた知識分析部１２は、テキストマイニング部１２２が、知識データベース１３に蓄積された最新のテキスト情報群を対象にテキストマイニング分析を実行し、その分析結果、つまりテキストマイニング結果１４２を分析結果格納データベース１４に格納する。
【００２８】
同様に、ボタンａ２の選択が通知されると、ユーザインタフェース部１１は、クラスタリング分析の実行を知識分析部１２に指示する。一方、この指示を受けた知識分析部１２は、クラスタリング部１２１が、知識データベース１３に蓄積された最新のテキスト情報群を対象にテキストマイニング分析を実行し、その分析結果、つまりクラスタリング結果１４１を分析結果格納データベース１４に格納する。
【００２９】
また、ボタンａ３の選択が通知されると、ユーザインタフェース部１１は、図７に示す画面を表示させるための画面データを送信する。この画面には、クラスタリング結果１４１が割り当てられる表の縦軸の選択作業に移行するためのボタンｂ１と、テキストマイニング結果１４２が割り当てられる表の横軸の選択作業に移行するためのボタンｂ２とが追加配置される。そして、このボタンｂ１またはボタンｂ２の選択が通知されると、ユーザインタフェース部１１は、その通知された分析軸の選択処理を開始する。
【００３０】
いま、ボタンｂ１の選択が通知されたとすると、ユーザインタフェース部１１は、分析結果格納データベース１４に格納されたクラスタリング結果１４１におけるクラスタの階層構造を分析軸選択部１１１に取得させる。そして、ユーザインタフェース部１１は、その取得させたクラスタの階層構造を示した画面を表示させるための画面データを作成して送信する。図８に、この時に利用者に提示される画面を例示する。
【００３１】
図８の例では、クラスタリング結果１４１におけるクラスタの階層構造は、最上位層にＣ１，Ｃ２，…が存在し、また、Ｃ１の１つ下の層には、Ｃ１１，Ｃ１２，…が存在する。さらに、Ｃ１１の１つ下の層には、Ｃ１１１，Ｃ１１２，Ｃ１１３，Ｃ１１４，Ｃ１１５が存在する。そして、この画面の提示を受けた利用者が、この中からＣ１１を選択する場合、クライアントコンピュータ２が備えるマウス等のポインティングデバイスを操作し、Ｃ１１を選択した状態でボタンｃ１を選択する。一方、このＣ１１の選択を通知されたユーザインタフェース部１１は、図９に示す画面を表示させるための画面データを編集して送信する。図９に示すように、利用者が選択したＣ１１が、表の縦軸として選択された旨が示されている（図９のｄ１）。
【００３２】
また、同様に、利用者は、ボタンａ３およびボタンｂ２を選択し、テキストマイニング結果１４２が割り当てられる表の横軸の選択作業を行う。そして、その作業完了後、利用者は、ボタンａ４を選択し、分析結果の集計を開始させる。
【００３３】
このボタンａ４の選択が通知されると、ユーザインタフェース部１１は、クラスタリング結果１４１とテキストマイニング結果１４２とを有機的に結合させるための集計を分析結果集計部１１２に行わせる。
【００３４】
分析結果格納データベース１４に格納されるクラスタリング結果１４１およびテキストマイニング結果１４２には、各クラスタおよび各カテゴリにどのテキスト情報が属しているのかを識別するための情報が含まれている。したがって、この情報を突き合わせることにより、クラスタリング結果１４１の任意のクラスタとテキストマイニング結果１４２の任意のカテゴリの双方に属するテキスト情報の件数を集計することができる。分析結果集計部１１２は、このような突き合わせを行っていくことにより、クラスタリング結果１４１を縦軸、テキストマイニング結果１４２を横軸に割り当てた分析結果の集計を実行する。そして、ユーザインタフェース部１１は、この分析結果集計部１１２に集計させた分析結果を提示する画面を表示させるための画面データを作成して送信する。図１０に、この時に利用者に提示される画面を例示する。
【００３５】
図１０に示すように、画面の上部には、利用者が選択した縦軸および横軸のクラスタおよびカテゴリがそれぞれ表示される（ｅ１）。ここでは、表の縦軸にクラスタＣ１、表の横軸にカテゴリＴ１１が選択されている。そして、この選択に基づき、画面の中央部には、クラスタＣ１の１つ下の階層のクラスタＣ１１，Ｃ１２，Ｃ１３，…を縦軸の項目として配置し、カテゴリＴ１１の１つ下の階層のカテゴリＴ１１１，Ｔ１１２，Ｔ１１３，Ｔ１１４，Ｔ１１５，…を横軸の項目として配置した表形式で集計されたクラスタリング結果１４１およびテキストマイニング結果１４２が表示される（Ｅ２）。なお、この表は、下方向および右方向にそれぞれスクロール可能であり、その末端には、いずれのクラスタおよびカテゴリにも属さないテキスト情報の件数がそれぞれ集計されて表示される。
【００３６】
また、この縦軸の項目として配置されたクラスタ、または横軸の項目として配置されたカテゴリのいずれかを選択すると、その選択されたクラスタまたはカテゴリの１つ下の階層のクラスタまたはカテゴリを各軸に配置した状態で、クラスタリング結果１４１およびテキストマイニング結果１４２が再集計されて表示される（ドリルダウン）。例えば、クラスタＣ１２が選択されたとすると、縦軸はクラスタＣ１２１，Ｃ１２２，…に置き換わり、表内の件数も更新される。
【００３７】
さらに、画面の下部には、縦軸の項目として配置されたクラスタ、または横軸の項目として配置されたカテゴリの階層を１つ上のクラスタまたはカテゴリに移動させる（ドリルアップ）ためのボタンが配置される（ｅ３）。例えば、図１０の状態で表の横軸をドリルアップさせる旨が指示されると、横軸の項目として配置されるカテゴリは、カテゴリＴ１１，Ｔ１２，Ｔ１３，…に置き換わり、表内の件数も更新される。
【００３８】
図１１は、この知識分析システム１が分析結果の提示を行う際の動作手順を示すフローチャートである。
【００３９】
ユーザインタフェース部１１は、まず、クライアントコンピュータ２の利用者が作業を選択するためのタスク選択画面を表示させる画面データを送信する（ステップＡ１）。次に、この画面の提示を受けた利用者が、「分析軸の選択」を選択すると（ステップＡ２のＹＥＳ）、ユーザインタフェース部１１は、縦軸および横軸のいずれかを選択するための選択画面を表示させる画像データを送信する（ステップＡ３）。そして、この画面の提示を受けた利用者が、「縦軸」を選択した場合（ステップＡ４のＹＥＳ）、ユーザインタフェース部１１は、分析軸選択部１１１を用いてクラスタ選択処理を実行し（ステップＡ５）、「横軸」を選択した場合には（ステップＡ４のＮＯ）、分析軸選択部１１１を用いてカテゴリ選択処理を実行する（ステップＡ６）。
【００４０】
また、「分析スタート」が選択された場合（ステップＡ２のＮＯ，ステップＡ７のＹＥＳ）、ユーザインタフェース部１１は、分析結果集計部１１２を用いて選択されたクラスタおよびカテゴリを分析軸とした集計処理を実行し（ステップＡ８）、その集計結果を提示した分析結果画面を表示させる画像データを送信する（ステップＡ９）。
【００４１】
さらに、この分析結果画面上で分析軸の階層移動が指示されると（ステップＡ１０のＹＥＳ）、ユーザインタフェース部１１は、分析結果集計部１１２を用いて移動後のクラスタおよびカテゴリを分析軸とした集計処理を再実行する（ステップＡ８）。
【００４２】
以上の手順により、この知識分析システム１は、クラスタリング部１２１のクラスタリング結果１４１とテキストマイニング部１２２のテキストマイニング結果１４２とを有機的に結合させて提示し、また、分析対象のクラスタまたはカテゴリの階層を指示に応じて上下に移動させる。これにより、例えば一方の分析で埋もれた情報の傾向を他方の分析で把握すること等を可能とし、また、一方の分析結果のみからでは得られない新たな発見を促すなど、いわゆる相乗効果を期待することもできる。
【００４３】
なお、ここでは、視点の異なる２つの分析結果を有機的に結合させる方法として、２次元配列の表形式に集計する例を示したが、この発明は、これに限られるものではなく、互いの関係を表現できれば、どのような形式を適用することも可能である。
【００４４】
また、ここでは、図４に示すとおり、分析結果が多階層に整理されていることを前提に説明を行ったが、これは必ずしも必須ではなく、複数の分類観点を無理やりひとつの階層に押し込むことを強制するものではない。複数の分類観点は、それぞれ独立した平坦な分類体系として扱うことができ、たとえば表の２軸を利用してそれらを有機的に組み合わせることが可能である。
【００４５】
つまり、本願発明は、前記実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。更に、前記実施形態には種々の段階の発明が含まれており、開示される複数の構成要件における適宜な組み合わせにより種々の発明が抽出され得る。たとえば、実施形態に示される全構成要件から幾つかの構成要件が削除されても、発明が解決しようとする課題の欄で述べた課題が解決でき、発明の効果の欄で述べられている効果が得られる場合には、この構成要件が削除された構成が発明として抽出され得る。
【００４６】
【発明の効果】
以上のように、この発明によれば、互いに異なる分析手法による複数の分析結果を有機的に結合させて提示するテキスト情報分析システムおよび同システムに適用される分析結果の提示方法を提供することが可能となる。
【図面の簡単な説明】
【図１】この発明の実施形態に係る知識分析システムのネットワーク構成を示す図。
【図２】同実施形態の知識分析システムの機能ブロックを示す図。
【図３】同実施形態の知識分析システムが実行する分析結果の提示方法についての概略を説明するための第１の図。
【図４】同実施形態の知識分析システムが実行する分析結果の提示方法についての概略を説明するための第２の図。
【図５】同実施形態の知識分析システムが実行する分析結果の提示方法についての概略を説明するための第３の図。
【図６】同実施形態の知識分析システムで表示される画面を例示する第１の図。
【図７】同実施形態の知識分析システムで表示される画面を例示する第２の図。
【図８】同実施形態の知識分析システムで表示される画面を例示する第３の図。
【図９】同実施形態の知識分析システムで表示される画面を例示する第４の図。
【図１０】同実施形態の知識分析システムで表示される画面を例示する第５の図。
【図１１】同実施形態の知識分析システムが分析結果の提示を行う際の動作手順を示すフローチャート。
【符号の説明】
１…知識分析システム
２…クライアントコンピュータ
３…ネットワーク
１１…ユーザインタフェース
１２…知識分析部
１３…知識データベース
１４…分析結果格納データベース
１１１…分析軸選択部
１１２…分析結果集計部
１２１…クラスタリング部
１２２…テキストマイニング部
１４１…クラスタリング結果
１４２…テキストマイニング結果
[0001]
[Technical field of the invention]
The present invention relates to a text information analysis system for analyzing text information such as questionnaires and daily reports collected and stored via a LAN (Local Area Network) or an intranet, and a method of presenting analysis results applied to the system.
[0002]
[Prior art]
In recent years, a growing number of companies have installed a LAN or an intranet to collect and accumulate, across departments, the information held by each employee, for example unstructured information such as questionnaires and daily reports generated in the course of business. The collected and accumulated information is generally subjected to various analyses so that it can be shared and used as knowledge by all employees. Clustering analysis and text mining analysis are currently well known as such analysis methods.
[0003]
In clustering analysis, as described in, for example, JP-A-2002-149670, the collected and accumulated information is classified according to the appearance frequency of each word and the degree of association between a plurality of words. Here, the degree of association between a plurality of words refers to the presence or absence of co-occurrence. For example, when many pieces of text information contain both "A" and "B", as in "I purchased A and B.", "A" and "B" are judged to co-occur.
[0004]
As a result, not only information in which the word "A" appears frequently but also information in which the word "B" appears frequently is treated as belonging to the same cluster, so that a highly accurate, appropriately narrowed classification is performed automatically.
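As a rough illustration of the co-occurrence idea described above, the following Python sketch counts how many texts contain each pair of words. The function name `cooccurrence_counts`, the whitespace tokenization, and the sample texts are assumptions made for illustration only; the clustering method of the cited publication is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts):
    """Count, for every unordered word pair, the number of texts containing both words."""
    pair_counts = Counter()
    for text in texts:
        # Hypothetical tokenization: lowercase and split on whitespace.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            pair_counts[pair] += 1
    return pair_counts


texts = [
    "i purchased a and b",
    "a and b arrived today",
    "c was returned",
]
counts = cooccurrence_counts(texts)
# Number of texts that contain both "a" and "b".
print(counts[("a", "b")])
```

A pair with a high count relative to the number of texts would be a candidate for being treated as co-occurring; where to set that threshold is left open here.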
[0005]
Text mining analysis, on the other hand, classifies the collected and accumulated information into categories desired by the user, as described in, for example, JP-A-2001-147937. For example, to group information on products "C", "D", and "F" into respective categories, the user specifies the conditions, that is, what kind of description a piece of information must contain in order to be judged as belonging to each category.
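The user-specified conditions described above could, under simple assumptions, be expressed as keyword rules. The sketch below is illustrative only: the function `assign_categories`, the rule format (category name mapped to trigger keywords), and the sample sentences are hypothetical and are not taken from the cited publication.

```python
def assign_categories(text, rules):
    """Return the categories whose keywords appear in the text, or ["Other"] if none match."""
    lowered = text.lower()
    matched = [
        category
        for category, keywords in rules.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched or ["Other"]


# Hypothetical user-specified conditions: category name -> keywords that trigger it.
rules = {
    "Product C": ["product c"],
    "Product D": ["product d"],
    "Product F": ["product f"],
}

print(assign_categories("Product C stopped working after a week", rules))
print(assign_categories("The delivery was late", rules))
```

Texts that satisfy no condition fall into "Other", which is the group the later paragraphs are concerned with.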
[0006]
Clustering analysis and text mining analysis make it possible, for example, to grasp trends in a large amount of information that has been collected and accumulated without any particular order, so that knowledge can be shared and used effectively.
[0007]
[Problems to be solved by the invention]
However, both the clustering analysis and the text mining analysis described above share the drawback of producing a large amount of information that belongs to no cluster or category. Whichever analysis method is adopted, extremely important information may therefore fail to be extracted and remain buried among the many "other" items.
[0008]
Moreover, even when both analysis methods are available, merely referring to their results separately makes it difficult, for example, to recognize in one analysis a trend among only the information buried by the other, and no so-called synergistic effect can be expected.
[0009]
The present invention has been made in view of these circumstances, and its object is to provide a text information analysis system that organically combines and presents a plurality of analysis results obtained by mutually different analysis techniques, and a method of presenting analysis results applied to the system.
[0010]
[Means for Solving the Problems]
In order to achieve the above object, the present invention provides a text information analysis system for analyzing a large amount of collected and accumulated text information, comprising: clustering analysis means for analyzing the large amount of collected and accumulated text information based on the appearance frequency of each word and the degree of association between a plurality of words; text mining analysis means for analyzing the large amount of collected and accumulated text information based on arbitrarily specified conditions; and analysis result presentation means for organically combining and presenting the analysis result of the clustering analysis means and the analysis result of the text mining analysis means for the same group of text information.
[0011]
In the text information analysis system of the present invention, the result of the clustering analysis and the result of the text mining analysis are organically combined and presented, for example in a two-dimensional table in which they are assigned to the vertical axis and the horizontal axis, respectively. This realizes the presentation of useful, high-value-added analysis results that neither analysis result alone can provide; for example, a trend among only the information buried in one analysis result can easily be grasped through the other analysis result.
[0012]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0013]
FIG. 1 is a diagram showing a network configuration of a knowledge analysis system according to an embodiment of the present invention.
[0014]
The knowledge analysis system 1 is built on a high-performance computer called a server or the like, and is connected to a plurality of client computers 2 via a network 3 such as a LAN or an intranet. Then, the knowledge analysis system 1 receives an analysis request from the client computer 2 and returns an analysis result based on the request.
[0015]
FIG. 2 is a diagram showing functional blocks of the knowledge analysis system 1. As shown in FIG. 2, the knowledge analysis system 1 has processing units, namely a user interface unit 11 and a knowledge analysis unit 12, and data units, namely a knowledge database 13 and an analysis result storage database 14. The processing units are implemented by programs that describe the operation procedure of the CPU mounted on the computer on which the knowledge analysis system 1 is built, and the data units are provided on a storage medium of that computer, such as a magnetic disk device.
[0016]
The user interface unit 11 serves as a contact point for the user of the client computer 2, and has an analysis axis selection unit 111 and an analysis result totaling unit 112. The analysis axis selection unit 111 receives a part of the instruction from the client computer 2, and the details will be described later. On the other hand, the analysis result totaling unit 112 totals the analysis results based on the instruction received by the analysis axis selecting unit 111 and returns the result to the client computer 2. This will be described later in detail.
[0017]
The knowledge analysis unit 12 analyzes the large amount of text information accumulated in the knowledge database 13, such as questionnaires and daily reports generated in the course of business, and stores the results in the analysis result storage database 14. It has a clustering unit 121 and a text mining unit 122. The clustering unit 121 classifies the text information in the knowledge database 13 into clusters based on the appearance frequency of each word and the degree of association between a plurality of words (clustering analysis), and stores the resulting clustering result 141 in the analysis result storage database 14. The text mining unit 122, on the other hand, classifies the text information in the knowledge database 13 into categories desired by the user based on conditions specified by the user (text mining analysis), and stores the resulting text mining result 142 in the analysis result storage database 14.
[0018]
Here, an outline of the method of presenting analysis results, which is a feature of the knowledge analysis system 1, will be described with reference to FIGS. 3 to 5.
[0019]
Assume that a large amount of text information such as questionnaires, daily reports, and e-mails is stored in the knowledge database 13 (A in FIG. 3). Assume further that, for this same group of text information, the clustering unit 121 executes clustering analysis and obtains a clustering result 141 (B2 in FIG. 3), while the text mining unit 122 executes text mining analysis and obtains a text mining result 142 (B1 in FIG. 3).
[0020]
Looking first at the clustering result 141, the group of text information is classified into C1, C2, C3, ..., but a large amount of text information that belongs to none of them also arises. Likewise, in the text mining result 142, the group of text information is classified into T1, T2, T3, ..., but a large amount of text information that belongs to none of them also arises. Left as it is, text information that belongs to none of them would simply be buried among the many "other" pieces of text information.
[0021]
Therefore, the knowledge analysis system 1 organically links these two analysis results. More specifically, it totals them into, for example, a two-dimensional table in which the clustering result 141 is assigned to the vertical axis and the text mining result 142 to the horizontal axis, and presents the table to the user (C in FIG. 3). In the figure, n11 indicates the number of pieces of text information that belong to cluster C1 according to the clustering analysis by the clustering unit 121 and also belong to category T1 according to the text mining analysis by the text mining unit 122.
[0022]
As a result, for example, the group of text information lumped together as "others" in the clustering result 141 can be viewed under the classification T1, T2, T3, ... of the text mining result 142 (nx1, nx2, nx3, ...). Similarly, the group of text information lumped together as "others" in the text mining result 142 can be viewed under the classification C1, C2, C3, ... of the clustering result 141 (n1y, n2y, n3y, ...). In addition, organically combining two analysis results from different viewpoints can be expected to yield a so-called synergistic effect, such as prompting new discoveries that cannot be obtained from either analysis result alone.
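The table cells described above can be written compactly in set notation. The symbols D (the whole group of text information) and C_i, T_j (the sets of texts assigned to cluster Ci and category Tj) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
n_{ij} &= \lvert C_i \cap T_j \rvert, \\
n_{xj} &= \Bigl\lvert \Bigl( D \setminus \bigcup_{i} C_i \Bigr) \cap T_j \Bigr\rvert, \\
n_{iy} &= \Bigl\lvert C_i \cap \Bigl( D \setminus \bigcup_{j} T_j \Bigr) \Bigr\rvert .
\end{align*}
```

Under this reading, n_{xj} counts the texts left in "others" by the clustering analysis but placed in category Tj, and n_{iy} counts the texts in cluster Ci that the text mining analysis left in "others".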
[0023]
The clustering result 141 obtained by the clustering analysis of the clustering unit 121 and the text mining result 142 obtained by the text mining analysis of the text mining unit 122 generally classify the group of text information into multi-level clusters or categories. FIG. 4 shows an example of a text mining result 142 classified into multi-level categories. The knowledge analysis system 1 therefore allows the level of the clusters and categories of the clustering result 141 and the text mining result 142 that are arranged as items on the vertical and horizontal axes to be moved up and down for each axis in accordance with the user's instructions.
[0024]
For example, in response to a request to narrow the text mining result 142 in the table (C) shown in FIG. 3 down to T1 and view it in more detail, the knowledge analysis system 1 re-totals and presents the results with the level of the categories of the text mining result 142 arranged as items on the horizontal axis moved down by one, as shown in FIG. 5. This movement is possible as long as lower levels exist, and movement in the opposite direction, from a lower level to a higher one, is naturally also possible.
[0025]
Next, with reference to FIGS. 6 to 10, an operation principle when the knowledge analysis system 1 presents an analysis result will be described.
[0026]
When providing the knowledge analysis service to a client computer 2 connected via the network 3, the user interface unit 11 first transmits screen data for displaying the screen shown in FIG. 6. This screen contains a button a1 for instructing execution of text mining analysis, a button a2 for instructing execution of clustering analysis, a button a3 for moving on to analysis axis selection, and a button a4 for starting the totaling of analysis results after the analysis axes have been selected. The user presented with this screen operates a pointing device such as a mouse provided in the client computer 2 and selects the desired button.
[0027]
When notified that the button a1 has been selected, the user interface unit 11 instructs the knowledge analysis unit 12 to execute text mining analysis. On receiving this instruction, the knowledge analysis unit 12 has the text mining unit 122 execute text mining analysis on the latest group of text information accumulated in the knowledge database 13, and stores the analysis result, that is, the text mining result 142, in the analysis result storage database 14.
[0028]
Similarly, when notified that the button a2 has been selected, the user interface unit 11 instructs the knowledge analysis unit 12 to execute clustering analysis. On receiving this instruction, the knowledge analysis unit 12 has the clustering unit 121 execute clustering analysis on the latest group of text information accumulated in the knowledge database 13, and stores the analysis result, that is, the clustering result 141, in the analysis result storage database 14.
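The handling of the buttons a1 and a2 can be pictured as a small dispatch table. The sketch below is only an illustration under stated assumptions: `run_clustering`, `run_text_mining`, `on_button`, the storage keys, and the use of plain Python lists and dictionaries in place of the knowledge database 13 and the analysis result storage database 14 are all hypothetical, and the two analysis functions are placeholders that do no real analysis.

```python
# Hypothetical stand-ins for the clustering unit 121 and the text mining unit 122.
def run_clustering(texts):
    return {"kind": "clustering", "size": len(texts)}


def run_text_mining(texts):
    return {"kind": "text mining", "size": len(texts)}


# Button id -> (analysis function, key under which the result is stored).
HANDLERS = {
    "a1": (run_text_mining, "text_mining_result_142"),
    "a2": (run_clustering, "clustering_result_141"),
}


def on_button(button, knowledge_db, result_db):
    """Run the analysis bound to the button on the current texts and store its result."""
    analyze, key = HANDLERS[button]
    # A snapshot of the texts held at the moment the button is pressed.
    result_db[key] = analyze(list(knowledge_db))
    return key


knowledge_db = ["daily report 1", "questionnaire 7"]
result_db = {}
on_button("a1", knowledge_db, result_db)
on_button("a2", knowledge_db, result_db)
print(sorted(result_db))
```

Because each press re-reads the texts currently held, both stored results refer to the latest group of text information at the time the corresponding button was pressed.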
[0029]
When notified that the button a3 has been selected, the user interface unit 11 transmits screen data for displaying the screen shown in FIG. 7. On this screen, a button b1 for moving on to selection of the vertical axis of the table, to which the clustering result 141 is assigned, and a button b2 for moving on to selection of the horizontal axis of the table, to which the text mining result 142 is assigned, are additionally arranged. When notified that the button b1 or the button b2 has been selected, the user interface unit 11 starts the selection processing for the corresponding analysis axis.
[0030]
Now, assuming that the selection of the button b1 has been notified, the user interface unit 11 causes the analysis axis selection unit 111 to acquire the hierarchical structure of the cluster in the clustering result 141 stored in the analysis result storage database 14. Then, the user interface unit 11 creates and transmits screen data for displaying a screen showing the hierarchical structure of the acquired cluster. FIG. 8 illustrates a screen presented to the user at this time.
[0031]
In the example of FIG. 8, the hierarchical structure of the clusters in the clustering result 141 has C1, C2, ... at the top level and C11, C12, ... one level below C1. Further, C111, C112, C113, C114, and C115 exist one level below C11. To select C11 from among these, the user presented with this screen operates a pointing device such as a mouse provided in the client computer 2 and, with C11 selected, selects the button c1. The user interface unit 11, notified of the selection of C11, edits and transmits screen data for displaying the screen shown in FIG. 9. As shown in FIG. 9, the screen indicates that C11, selected by the user, has been chosen as the vertical axis of the table (d1 in FIG. 9).
[0032]
Similarly, the user selects the button a3 and the button b2, and performs the operation of selecting the horizontal axis of the table to which the text mining result 142 is assigned. Then, after the work is completed, the user selects the button a4 to start counting the analysis results.
[0033]
When notified of the selection of the button a4, the user interface unit 11 causes the analysis result totaling unit 112 to perform totalizing for organically combining the clustering result 141 and the text mining result 142.
[0034]
The clustering result 141 and the text mining result 142 stored in the analysis result storage database 14 include information identifying which pieces of text information belong to each cluster and each category. By matching this information, the number of pieces of text information that belong to both an arbitrary cluster of the clustering result 141 and an arbitrary category of the text mining result 142 can be totaled. Through such matching, the analysis result totaling unit 112 totals the analysis results with the clustering result 141 assigned to the vertical axis and the text mining result 142 to the horizontal axis. The user interface unit 11 then creates and transmits screen data for displaying a screen that presents the analysis results totaled by the analysis result totaling unit 112. FIG. 10 illustrates the screen presented to the user at this time.
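The matching described above amounts to intersecting sets of text identifiers. The following Python sketch shows one possible way to do this, assuming each result is available as a mapping from a label to the set of text IDs assigned to it; the function `cross_tabulate`, the "Other" label, and the sample IDs are hypothetical and not taken from the original text.

```python
def cross_tabulate(all_ids, clusters, categories):
    """Count texts per (cluster, category) pair, adding an "Other" row and column.

    clusters and categories each map a label to the set of text IDs assigned to it.
    """
    all_ids = set(all_ids)
    rows = dict(clusters)
    cols = dict(categories)
    # Texts assigned to no cluster or to no category go into "Other".
    rows["Other"] = all_ids - set().union(*clusters.values())
    cols["Other"] = all_ids - set().union(*categories.values())
    return {
        (row, col): len(row_ids & col_ids)
        for row, row_ids in rows.items()
        for col, col_ids in cols.items()
    }


# Hypothetical membership information for eight texts with IDs 1 to 8.
clusters = {"C1": {1, 2, 3}, "C2": {4, 5}}
categories = {"T1": {1, 2, 6}, "T2": {3, 4, 7}}
table = cross_tabulate(range(1, 9), clusters, categories)

for (row, col), count in table.items():
    print(row, col, count)
```

In this sample, text 6 belongs to no cluster but does belong to category T1, so it shows up in the cell ("Other", "T1") instead of disappearing into an undifferentiated "other" group.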
[0035]
As shown in FIG. 10, the cluster and category selected by the user for the vertical and horizontal axes are displayed at the top of the screen (e1). Here, cluster C1 is selected for the vertical axis of the table and category T11 for the horizontal axis. Based on this selection, the center of the screen displays the clustering result 141 and the text mining result 142 totaled in a table in which clusters C11, C12, C13, ..., one level below cluster C1, are arranged as items on the vertical axis and categories T111, T112, T113, T114, T115, ..., one level below category T11, are arranged as items on the horizontal axis (e2). The table can be scrolled downward and to the right, and at the end of each axis the number of pieces of text information that belong to none of the clusters or categories is totaled and displayed.
[0036]
When one of the clusters arranged as items on the vertical axis or one of the categories arranged as items on the horizontal axis is selected, the clustering result 141 and the text mining result 142 are re-totaled and displayed with the clusters or categories one level below the selected cluster or category arranged on the corresponding axis (drill-down). For example, if cluster C12 is selected, the vertical axis is replaced with clusters C121, C122, ..., and the counts in the table are updated accordingly.
[0037]
Further, a button for moving the level of the clusters arranged as items on the vertical axis, or of the categories arranged as items on the horizontal axis, up by one (drill-up) is arranged at the bottom of the screen (e3). For example, when a drill-up of the horizontal axis of the table is instructed in the state of FIG. 10, the categories arranged as items on the horizontal axis are replaced with categories T11, T12, T13, ..., and the counts in the table are updated accordingly.
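Drill-down and drill-up can be viewed as moving a "selected node" pointer within a tree, with the axis always showing that node's children. The sketch below illustrates this for the category axis under several assumptions: the `CHILDREN` table, the artificial "ROOT" node above T1 and T2, and the function names are hypothetical, and only part of a hierarchy is spelled out.

```python
# Hypothetical hierarchy: each node maps to its child nodes (leaves are omitted).
CHILDREN = {
    "ROOT": ["T1", "T2"],
    "T1": ["T11", "T12", "T13"],
    "T11": ["T111", "T112", "T113"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def axis_items(selected):
    """Items shown on an axis: the children of the currently selected node."""
    return CHILDREN.get(selected, [])


def drill_down(selected, item):
    """Select one of the displayed items, if it has a lower level to show."""
    if item in axis_items(selected) and item in CHILDREN:
        return item
    return selected


def drill_up(selected):
    """Move the selection to the parent node; the root stays where it is."""
    return PARENT.get(selected, selected)


selected = "T11"                 # horizontal axis shows T111, T112, T113
selected = drill_up(selected)    # selection moves to T1
print(axis_items(selected))      # axis now shows T11, T12, T13
```

After either move, the table counts would be totaled again for the new axis items; that step is outside this sketch.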
[0038]
FIG. 11 is a flowchart showing an operation procedure when the knowledge analysis system 1 presents an analysis result.
[0039]
First, the user interface unit 11 transmits screen data for displaying a task selection screen on which the user of the client computer 2 selects an operation (step A1). Next, when the user presented with this screen selects "select analysis axis" (YES in step A2), the user interface unit 11 transmits image data for displaying a selection screen for choosing either the vertical axis or the horizontal axis (step A3). When the user presented with this screen selects "vertical axis" (YES in step A4), the user interface unit 11 executes cluster selection processing using the analysis axis selection unit 111 (step A5); when "horizontal axis" is selected (NO in step A4), it executes category selection processing using the analysis axis selection unit 111 (step A6).
[0040]
When "start analysis" is selected (NO in step A2, YES in step A7), the user interface unit 11 uses the analysis result totaling unit 112 to execute totaling processing with the selected cluster and category as the analysis axes (step A8), and transmits image data for displaying an analysis result screen presenting the totaled results (step A9).
[0041]
Further, when movement of an analysis axis to another level is instructed on this analysis result screen (YES in step A10), the user interface unit 11 uses the analysis result totaling unit 112 to re-execute the totaling processing with the cluster and category after the move as the analysis axes (step A8).
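The branching in steps A1 to A10 can be traced with a small event loop. This is a sketch under stated assumptions and not the flowchart itself: the function `run_session` and the event names are hypothetical, and showing the result screen again (step A9) after a level move is assumed, since the text only states that step A8 is re-executed.

```python
def run_session(events):
    """Replay user events through the described flow and return the steps taken.

    events is a list of tuples such as ("select_axis", "vertical"),
    ("start_analysis",) or ("move_level",); these event names are hypothetical.
    """
    steps = ["A1: send task selection screen"]
    for event in events:
        if event[0] == "select_axis":            # step A2: YES
            steps.append("A3: send axis selection screen")
            if event[1] == "vertical":           # step A4: YES
                steps.append("A5: cluster selection")
            else:                                # step A4: NO
                steps.append("A6: category selection")
        elif event[0] == "start_analysis":       # step A7: YES
            steps.append("A8: total by selected axes")
            steps.append("A9: send analysis result screen")
        elif event[0] == "move_level":           # step A10: YES
            steps.append("A8: total by selected axes")
            # Assumption: the updated table is shown again after re-totaling.
            steps.append("A9: send analysis result screen")
    return steps


session = run_session([
    ("select_axis", "vertical"),
    ("select_axis", "horizontal"),
    ("start_analysis",),
    ("move_level",),
])
for step in session:
    print(step)
```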
[0042]
According to the above procedure, the knowledge analysis system 1 organically combines and presents the clustering result 141 of the clustering unit 121 and the text mining result 142 of the text mining unit 122, and moves the level of the clusters or categories under analysis up and down as instructed. This makes it possible, for example, to grasp through one analysis the trends in information buried by the other, and a so-called synergistic effect, such as prompting new discoveries that cannot be obtained from either analysis result alone, can also be expected.
[0043]
Although totaling into a two-dimensional table has been shown here as an example of a method of organically combining two analysis results from different viewpoints, the present invention is not limited to this, and any format can be applied as long as it can express the relationship between the two.
[0044]
Also, as shown in FIG. 4, the description here has assumed that the analysis results are arranged in multiple hierarchies. However, this is not essential, and there is no requirement to force a plurality of classification viewpoints into a single hierarchy. A plurality of classification viewpoints can be treated as independent flat classification systems, and can be organically combined using, for example, the two axes of a table.
[0045]
That is, the present invention is not limited to the above-described embodiment, and can be variously modified in an implementation stage without departing from the gist of the invention. Furthermore, the embodiment includes inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of the disclosed constituent elements. For example, if the problem described in the section on the problem to be solved by the invention can be solved and the effects described in the section on the effect of the invention are obtained even when some components are deleted from all the components shown in the embodiment, a configuration from which those components are deleted can be extracted as an invention.
[0046]
[Effect of the invention]
As described above, according to the present invention, it is possible to provide a text information analysis system that organically combines and presents a plurality of analysis results obtained by different analysis methods, and a method of presenting analysis results applied to the system.
[Brief description of the drawings]
FIG. 1 is a diagram showing a network configuration of a knowledge analysis system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram showing functional blocks of the knowledge analysis system according to the embodiment.
FIG. 3 is an exemplary first diagram for explaining an outline of a method of presenting an analysis result executed by the knowledge analysis system of the embodiment.
FIG. 4 is an exemplary second diagram for describing an outline of a method of presenting an analysis result executed by the knowledge analysis system of the embodiment.
FIG. 5 is an exemplary third diagram illustrating an outline of a method of presenting an analysis result executed by the knowledge analysis system according to the embodiment.
FIG. 6 is an exemplary first diagram illustrating a screen displayed by the knowledge analysis system according to the embodiment.
FIG. 7 is an exemplary second diagram illustrating a screen displayed by the knowledge analysis system according to the embodiment.
FIG. 8 is an exemplary third diagram illustrating a screen displayed by the knowledge analysis system according to the embodiment.
FIG. 9 is an exemplary fourth diagram illustrating a screen displayed by the knowledge analysis system according to the embodiment.
FIG. 10 is an exemplary fifth diagram illustrating a screen displayed by the knowledge analysis system of the embodiment.
FIG. 11 is an exemplary flowchart illustrating the operation procedure when the knowledge analysis system according to the embodiment presents an analysis result.
[Explanation of symbols]
1 ... Knowledge analysis system
2 ... Client computer
3 ... Network
11 ... User interface unit
12 ... Knowledge analysis unit
13 ... Knowledge database
14 ... Analysis result storage database
111 ... Analysis axis selection unit
112 ... Analysis result totaling unit
121 ... Clustering unit
122 ... Text mining unit
141 ... Clustering result
142 ... Text mining result

Claims

1. A text information analysis system that analyzes a large amount of collected and accumulated text information, comprising:
first and second analysis means; and
analysis result presenting means for organically combining and presenting two analysis results of the first and second analysis means for the same text information group.

2. A text information analysis system that analyzes a large amount of collected and accumulated text information, comprising:
clustering analysis means for analyzing the large amount of collected and accumulated text information based on the appearance frequency of each word and the degree of association between a plurality of words;
text mining analysis means for analyzing the large amount of collected and accumulated text information based on arbitrarily specified conditions; and
analysis result presentation means for organically combining and presenting the analysis result of the clustering analysis means and the analysis result of the text mining analysis means for the same text information group.

3. The text information analysis system according to claim 2, wherein the condition specified by the text mining analysis means is a condition for classifying the text information group into a desired category.

4. The text information analysis system according to claim 2, wherein the analysis result presentation means presents the analysis result of the clustering analysis means and the analysis result of the text mining analysis means in a two-dimensional array table format in which they are assigned to a vertical axis and a horizontal axis, respectively.

5. The text information analysis system according to claim 4, wherein the clustering analysis means and the text mining analysis means classify the text information group into multi-level clusters and categories, and the analysis result presentation means includes means for moving the hierarchy of the clusters and categories arranged as items of the vertical axis and the horizontal axis up and down for each axis.

6. A program for causing a computer that operates as a text information analysis system that analyzes a large amount of collected and accumulated text information to function as:
first and second analysis means; and
analysis result presenting means for presenting two analysis results of the first and second analysis means for the same text information group in an organically combined manner.

7. A program for causing a computer that operates as a text information analysis system that analyzes a large amount of collected and accumulated text information to function as:
clustering analysis means for analyzing the large amount of collected and accumulated text information based on the appearance frequency of each word and the degree of association between a plurality of words;
text mining analysis means for analyzing the large amount of collected and accumulated text information based on arbitrarily specified conditions; and
analysis result presentation means for presenting the analysis result of the clustering analysis means and the analysis result of the text mining analysis means for the same text information group in an organically combined manner.

8. The program according to claim 7, wherein the condition specified by the text mining analysis means is a condition for classifying the text information group into a desired category.

9. The program according to claim 7, wherein the analysis result presentation means presents the analysis result of the clustering analysis means and the analysis result of the text mining analysis means in a two-dimensional array table format in which they are assigned to a vertical axis and a horizontal axis, respectively.

10. The program according to claim 9, wherein the clustering analysis means and the text mining analysis means classify the text information group into multi-level clusters and categories, and the analysis result presentation means includes means for moving the hierarchy of the clusters and categories arranged as items of the vertical axis and the horizontal axis up and down for each axis.

11. A method of presenting an analysis result applied to a text information analysis system that analyzes a large amount of collected and accumulated text information, the method comprising:
first and second analysis steps; and
an analysis result presentation step of organically combining and presenting the two analysis results of the first and second analysis steps for the same text information group.

12. A method of presenting an analysis result applied to a text information analysis system that analyzes a large amount of collected and accumulated text information, the method comprising:
a clustering analysis step of analyzing the large amount of collected and accumulated text information based on the appearance frequency of each word and the degree of association between a plurality of words;
a text mining analysis step of analyzing the large amount of collected and accumulated text information based on arbitrarily specified conditions; and
an analysis result presentation step of organically combining and presenting the analysis result of the clustering analysis step and the analysis result of the text mining analysis step for the same text information group.

13. The method according to claim 12, wherein the condition specified in the text mining analysis step is a condition for classifying the text information group into a desired category.

14. The method of presenting an analysis result according to claim 12, wherein the analysis result presentation step presents the analysis result of the clustering analysis step and the analysis result of the text mining analysis step in a two-dimensional array table format in which they are assigned to a vertical axis and a horizontal axis, respectively.

15. The method of presenting an analysis result according to claim 14, wherein the clustering analysis step and the text mining analysis step classify the text information group into multi-level clusters and categories, and the analysis result presentation step includes a step of moving the hierarchy of the clusters and categories arranged as items of the vertical axis and the horizontal axis up and down for each axis.