JP2004206477A - Device and method for analyzing text mining, program and recording medium

Info

Publication number: JP2004206477A
Application number: JP2002375457A
Authority: JP
Inventors: Naoyuki Horai (蓬莱尚幸); Kiyoshi Nitta (新田清)
Original assignee: Celestar Lexico Sciences Inc
Current assignee: Celestar Lexico Sciences Inc
Priority date: 2002-12-25
Filing date: 2002-12-25
Publication date: 2004-07-22

Abstract

PROBLEM TO BE SOLVED: To provide a text mining analysis device, a text mining analysis method, a program, and a recording medium that allow concepts and views to be assigned flexibly in text mining analysis.

SOLUTION: The method comprises: step SA-1, in which a new concept is assigned without using an existing category; step SA-2, in which the category structure is changed using the assigned new concept; step SA-3, in which an analysis target concept to be subjected to text mining analysis is selected and a view is assigned by setting, from the concepts located below the analysis target concept in the category structure, the constituent concepts that form a view cut line; and step SA-4, in which text mining analysis is performed using the assigned view.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、テキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体に関し、特に、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることのできるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体に関する。
【０００２】
【従来の技術】
近年、論文などの各種の技術文献を蓄積した文献データベースが構築され、インターネットなどを介して広く利用されている。例えば、米国国立バイオテクノロジーセンター（ＮＣＢＩ）が米国国立医学図書館（ＮＬＭ）等の文献データを提供するＰｕｂＭｅｄなどが存在する（例えば、非特許文献１参照。）。
【０００３】
従来の文献データベースの検索サービスにおいては、検索効率の向上などを図るために、各用語の正規形と表記形との対応を取るための「表記辞書」や、各用語についてカテゴリ分類するための「カテゴリ辞書」などが用いられている。
【０００４】
例えば、既存の表記辞書やカテゴリ辞書を用いたテキストマイニングシステムとして、ＩＢＭ（会社名）のＴＡＫＭＩ（製品名）が存在する（例えば、非特許文献２参照。）。
【０００５】
ここで、既存のテキストマイニングシステムでは、図１に示す４種類の情報（文書、コンセプト、カテゴリ、ビュー）を使用して分析を進める。図１は、テキストマイニングシステムにおけるテキストマイニング分析で扱う情報である文書、コンセプト、カテゴリ、および、ビューのそれぞれの概念を示す図である。以下に、これらの情報について図１を参照して説明する。
【０００６】
（１）文書
「文書」は、テキストマイニングの分析対象のテキストデータを意味する（図１におけるｄ０１〜ｄ１２が対応する）。各文書は、一般的にフィールドにより分割されている。
【０００７】
（２）コンセプト
「コンセプト」は、特定の概念に含まれる文書の集合を意味する（図１におけるｃ１〜ｃ６が対応する）。既存システムにおいては、コンセプトは、同義語辞書とその正規形の集合によって決定され、ある特定の概念について述べている文献の集合を保持する。
【０００８】
（３）カテゴリ
「カテゴリ」は、構造を持ったコンセプトの集合を意味する（図１におけるルート、および、その下位に属するコンセプトｃ１〜ｃ６からなる木構造が対応する）。既存システムにおいては、カテゴリは、カテゴリ辞書により決定され、文鎮型あるいは木構造型などの構造によりコンセプト集合を保持する。
【０００９】
（４）ビュー
「ビュー（視点）」は、カテゴリ中のコンセプトからなる順序つき集合を意味する。既存システムにおいては、ビューは、カテゴリが文鎮型構造の場合は、そのカテゴリに含まれる全コンセプト集合に、アルファベットなど（例えば、コンセプトのＩＤ）、出現頻度、または出現頻度倍率で順序をつけた集合として決定される。
【００１０】
一方、カテゴリが木構造の場合は、ビューは、ユーザによるテキストマイニング分析対象となるコンセプトノードの指示により決定され、指示されたコンセプトのカテゴリ木構造上の子の集合に、アルファベットなど（例えば、コンセプトのＩＤ）、出現頻度、または出現頻度倍率で順序をつけた集合として保持する。
【００１１】
ここで図１は、カテゴリが木構造であり、ユーザにより指示されたテキストマイニング分析対象となるコンセプトノードがｃ２であるときのｃ１とｃ３からなる第1のビューと、ユーザにより指示されたコンセプトノードがルートであるときのｃ２とｃ４とｃ５とｃ６からなる第２のビューとが示されている。
【００１２】
【非特許文献１】
インターネット、ＰｕｂＭｅｄのＵＲＬ：ｈｔｔｐ：／／ｗｗｗ．ｎｃｂｉ．ｎｌｍ．ｇｏｖ／ｅｎｔｒｅｚ／
【非特許文献２】
インターネット、ＩＢＭ東京基礎研究所のテキストマイニング技術紹介のホームページのＵＲＬ：ｈｔｔｐ：／／ｗｗｗ．ｔｒｌ．ｉｂｍ．ｃｏｍ／ｐｒｏｊｅｃｔｓ／ｓ７７１０／ｔｍ／ｉｎｄｅｘ．ｈｔｍ、ＴＡＫＭＩ紹介のホームページのＵＲＬ：ｈｔｔｐ：／／ｗｗｗ．ｔｒｌ．ｉｂｍ．ｃｏｍ／ｐｒｏｊｅｃｔｓ／ｓ７７１０／ｔｍ／ｔａｋｍｉ／ｔａｋｍｉ．ｈｔｍ
【００１３】
【発明が解決しようとする課題】
しかしながら、既存のテキストマイニングシステムにおいては、コンセプトのアサイン方法や、カテゴリへのビューのアサイン方法に制約があるというシステム構造上の基本的問題点を有していた。
以下、この問題点の内容について、一層具体的に説明する。
【００１４】
既存のテキストマイニングシステムにおけるコンセプトのアサイン方法は、コンセプトを同義語辞書とその正規形の集合によって決定するアサイン方法である。従って、同義語辞書とカテゴリ辞書によって定義されていないコンセプトを扱うことができないため、新規な概念に対応するコンセプトを作成することができないという問題点がある。
【００１５】
また、既存のテキストマイニングシステムにおけるビューのアサイン方法によると、カテゴリが文鎮型構造の場合は、そのカテゴリに含まれる全コンセプト集合に特定の順序をつけた集合としてビューが決定され、一方、カテゴリが木構造の場合はユーザによるコンセプトノードの指示により、その下位概念に対応するコンセプトの集合としてビューが決定される。従って、いずれの場合においてもビューに余分なコンセプトが入ってしまう場合があるという問題点がある。
【００１６】
また、既存のテキストマイニングシステムにおけるビューのアサイン方法においては、構造上で兄弟関係にないコンセプトをビューとして並べることができないという問題点がある。ここで、図２は、本問題点を説明する概念図である。図２に示すように、既存のテキストマイニングシステムにおけるビューのアサイン方法は、まず、カテゴリの中から分析したいコンセプト（分析対象コンセプト）を選択する（ＭＡ−１）。そして、そのコンセプトの「子コンセプト」（すなわち、構造上で直接１パスで下位に接続されたコンセプト）についてビューをアサインする（ＭＡ−２）。このように、従来のビューのアサイン方法では、カテゴリ中で兄弟関係にあるコンセプトのみをビューとして設定することができるため、兄弟関係に限定されたコンセプトを比較することしかできなかった。
【００１７】
このように、従来のシステム等は予見的に用意されたコンセプトおよびカテゴリしか使用することができないため、利用場面に応じて臨機応変にコンセプトをアサインしたり、カテゴリとは無関係にビューをアサインしたりすることができないという問題点を有しており、その結果、システムの利用者および管理者のいずれにとっても、利便性が悪く、また、利用効率が悪いものであった。
【００１８】
本発明は上記問題点に鑑みてなされたもので、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることのできる、テキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することを目的としている。
【００１９】
【課題を解決するための手段】
このような目的を達成するため、請求項１に記載のテキストマイニング分析装置は、既存カテゴリを利用せずに新規コンセプトのアサインを実行するコンセプトアサイン手段と、上記コンセプトアサイン手段によりアサインされた上記新規コンセプトにより、カテゴリの構造を変更するカテゴリ変更手段と、テキストマイニング分析の対象となる分析対象コンセプトを選択し、上記カテゴリの構造において当該分析対象コンセプトの下位に存在する上記コンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインするビューアサイン手段と、上記ビューアサイン手段にてアサインされた上記ビューを用いてテキストマイニング分析を実行するテキストマイニング分析手段とを備えたことを特徴とする。
【００２０】
この装置によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行し、アサインされた新規コンセプトにより、カテゴリの構造を変更し、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることができるようになる。
【００２１】
すなわち、この装置によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行するので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００２２】
また、この装置によれば、アサインされた新規コンセプトにより、カテゴリの構造を変更するので、利用場面に応じて臨機応変にコンセプトをカテゴリ上にアサインすることができるようになる。
【００２３】
また、この装置によれば、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、カテゴリとは無関係にビューをアサインしたりすることができるようになり、兄弟関係ではないコンセプトを柔軟に指定してビューを構成することができるようになる。
【００２４】
また、請求項２に記載のテキストマイニング分析装置は、請求項１に記載のテキストマイニング分析装置において、上記コンセプトアサイン手段は、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第１コンセプトアサイン手段、上記検索条件と探索対象となるフィールドを指定し、上記文書の上記フィールド内に上記検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第２コンセプトアサイン手段、および、既存コンセプトについて論理集合演算を行うことにより取得した上記文書の集合を新規コンセプトとしてアサインする第３コンセプトアサイン手段のうち少なくとも一つの手段をさらに備えたことを特徴とする。
【００２５】
これはコンセプトアサイン手段の一例を一層具体的に示すものである。この装置によれば、コンセプトアサイン手段は、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第１コンセプトアサイン手段、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第２コンセプトアサイン手段、および、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする第３コンセプトアサイン手段のうち少なくとも一つの手段をさらに備えたので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００２６】
また、請求項３に記載のテキストマイニング分析装置は、請求項１または２に記載のテキストマイニング分析装置において、上記カテゴリ変更手段は、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更手段、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更手段のうち少なくとも一つの手段をさらに備えたことを特徴とする。
【００２７】
これはカテゴリ変更手段の一例を一層具体的に示すものである。この装置によれば、カテゴリ変更手段は、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更手段、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更手段のうち少なくとも一つの手段をさらに備えたので、利用場面に応じて臨機応変にコンセプトを既存または新規のカテゴリ上にアサインすることができるようになる。
【００２８】
また、請求項４に記載のテキストマイニング分析装置は、請求項１から３のいずれか一つに記載のテキストマイニング分析装置において、上記ビューアサイン手段は、上記構成コンセプトに対応する属性を設定する属性設定手段をさらに備え、上記テキストマイニング分析手段は、上記属性設定手段にて設定された上記構成コンセプトの上記属性に従って、上記テキストマイニング分析を実行することを特徴とする。
【００２９】
これはビューアサイン手段の一例を一層具体的に示すものである。この装置によれば、ビューアサイン手段は、構成コンセプトに対応する属性を設定する属性設定手段をさらに備え、テキストマイニング分析手段は、設定された構成コンセプトの属性に従って、テキストマイニング分析を実行するので、各種の属性（例えば、「選択（構成コンセプトとして選択する）」、「飛ばし（構成コンセプトとして選択しない）、「その他（他の構成コンセプトとは別のグループに分けて分析を行う）」など）を設定することにより、柔軟にビューをアサインすることができるようになる。
【００３０】
また、本発明はテキストマイニング分析方法に関するものであり、請求項５に記載のテキストマイニング分析方法は、既存カテゴリを利用せずに新規コンセプトのアサインを実行するコンセプトアサインステップと、上記コンセプトアサインステップによりアサインされた上記新規コンセプトにより、カテゴリの構造を変更するカテゴリ変更ステップと、テキストマイニング分析の対象となる分析対象コンセプトを選択し、上記カテゴリの構造において当該分析対象コンセプトの下位に存在する上記コンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインするビューアサインステップと、上記ビューアサインステップにてアサインされた上記ビューを用いてテキストマイニング分析を実行するテキストマイニング分析ステップとを含むことを特徴とする。
【００３１】
この方法によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行し、アサインされた新規コンセプトにより、カテゴリの構造を変更し、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることができるようになる。
【００３２】
すなわち、この方法によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行するので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００３３】
また、この方法によれば、アサインされた新規コンセプトにより、カテゴリの構造を変更するので、利用場面に応じて臨機応変にコンセプトをカテゴリ上にアサインすることができるようになる。
【００３４】
また、この方法によれば、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、カテゴリとは無関係にビューをアサインしたりすることができるようになり、兄弟関係ではないコンセプトを柔軟に指定してビューを構成することができるようになる。
【００３５】
また、請求項６に記載のテキストマイニング分析方法は、請求項５に記載のテキストマイニング分析方法において、上記コンセプトアサインステップは、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第１コンセプトアサインステップ、上記検索条件と探索対象となるフィールドを指定し、上記文書の上記フィールド内に上記検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第２コンセプトアサインステップ、および、既存コンセプトについて論理集合演算を行うことにより取得した上記文書の集合を新規コンセプトとしてアサインする第３コンセプトアサインステップのうち少なくとも一つのステップをさらに含むことを特徴とする。
【００３６】
これはコンセプトアサインステップの一例を一層具体的に示すものである。この方法によれば、コンセプトアサインステップは、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第１コンセプトアサインステップ、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第２コンセプトアサインステップ、および、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする第３コンセプトアサインステップのうち少なくとも一つのステップをさらに含むので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００３７】
また、請求項７に記載のテキストマイニング分析方法は、請求項５または６に記載のテキストマイニング分析方法において、上記カテゴリ変更ステップは、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更ステップ、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更ステップのうち少なくとも一つのステップをさらに含むことを特徴とする。
【００３８】
これはカテゴリ変更ステップの一例を一層具体的に示すものである。この方法によれば、カテゴリ変更ステップは、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更ステップ、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更ステップのうち少なくとも一つのステップをさらに含むので、利用場面に応じて臨機応変にコンセプトを既存または新規のカテゴリ上にアサインすることができるようになる。
【００３９】
また、請求項８に記載のテキストマイニング分析方法は、請求項５から７のいずれか一つに記載のテキストマイニング分析方法において、上記ビューアサインステップは、上記構成コンセプトに対応する属性を設定する属性設定ステップをさらに含み、上記テキストマイニング分析ステップは、上記属性設定ステップにて設定された上記構成コンセプトの上記属性に従って、上記テキストマイニング分析を実行することを特徴とする。
【００４０】
これはビューアサインステップの一例を一層具体的に示すものである。この方法によれば、ビューアサインステップは、構成コンセプトに対応する属性を設定する属性設定ステップをさらに含み、テキストマイニング分析ステップは、設定された構成コンセプトの属性に従って、テキストマイニング分析を実行するので、各種の属性（例えば、「選択（構成コンセプトとして選択する）」、「飛ばし（構成コンセプトとして選択しない）、「その他（他の構成コンセプトとは別のグループに分けて分析を行う）」など）を設定することにより、柔軟にビューをアサインすることができるようになる。
【００４１】
また、本発明はプログラムに関するものであり、請求項９に記載のプログラムは、既存カテゴリを利用せずに新規コンセプトのアサインを実行するコンセプトアサインステップと、上記コンセプトアサインステップによりアサインされた上記新規コンセプトにより、カテゴリの構造を変更するカテゴリ変更ステップと、テキストマイニング分析の対象となる分析対象コンセプトを選択し、上記カテゴリの構造において当該分析対象コンセプトの下位に存在する上記コンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインするビューアサインステップと、上記ビューアサインステップにてアサインされた上記ビューを用いてテキストマイニング分析を実行するテキストマイニング分析ステップとを含むテキストマイニング分析方法をコンピュータに実行させることを特徴とする。
【００４２】
このプログラムによれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行し、アサインされた新規コンセプトにより、カテゴリの構造を変更し、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることができるようになる。
【００４３】
すなわち、このプログラムによれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行するので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００４４】
また、このプログラムによれば、アサインされた新規コンセプトにより、カテゴリの構造を変更するので、利用場面に応じて臨機応変にコンセプトをカテゴリ上にアサインすることができるようになる。
【００４５】
また、このプログラムによれば、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、カテゴリとは無関係にビューをアサインしたりすることができるようになり、兄弟関係ではないコンセプトを柔軟に指定してビューを構成することができるようになる。
【００４６】
また、請求項１０に記載のプログラムは、請求項９に記載のプログラムにおいて、上記コンセプトアサインステップは、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第１コンセプトアサインステップ、上記検索条件と探索対象となるフィールドを指定し、上記文書の上記フィールド内に上記検索条件を満たす上記文字列や数値が存在する上記文書の集合を新規コンセプトとしてアサインする第２コンセプトアサインステップ、および、既存コンセプトについて論理集合演算を行うことにより取得した上記文書の集合を新規コンセプトとしてアサインする第３コンセプトアサインステップのうち少なくとも一つのステップをさらに含むことを特徴とする。
【００４７】
これはコンセプトアサインステップの一例を一層具体的に示すものである。このプログラムによれば、コンセプトアサインステップは、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第１コンセプトアサインステップ、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第２コンセプトアサインステップ、および、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする第３コンセプトアサインステップのうち少なくとも一つのステップをさらに含むので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるようになる。
【００４８】
また、請求項１１に記載のプログラムは、請求項９または１０に記載のプログラムにおいて、上記カテゴリ変更ステップは、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更ステップ、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更ステップのうち少なくとも一つのステップをさらに含むことを特徴とする。
【００４９】
これはカテゴリ変更ステップの一例を一層具体的に示すものである。このプログラムによれば、カテゴリ変更ステップは、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更ステップ、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更ステップのうち少なくとも一つのステップをさらに含むので、利用場面に応じて臨機応変にコンセプトを既存または新規のカテゴリ上にアサインすることができるようになる。
【００５０】
また、請求項１２に記載のプログラムは、請求項９から１１のいずれか一つに記載のプログラムにおいて、上記ビューアサインステップは、上記構成コンセプトに対応する属性を設定する属性設定ステップをさらに含み、上記テキストマイニング分析ステップは、上記属性設定ステップにて設定された上記構成コンセプトの上記属性に従って、上記テキストマイニング分析を実行することを特徴とする。
【００５１】
これはビューアサインステップの一例を一層具体的に示すものである。このプログラムによれば、ビューアサインステップは、構成コンセプトに対応する属性を設定する属性設定ステップをさらに含み、テキストマイニング分析ステップは、設定された構成コンセプトの属性に従って、テキストマイニング分析を実行するので、各種の属性（例えば、「選択（構成コンセプトとして選択する）」、「飛ばし（構成コンセプトとして選択しない）、「その他（他の構成コンセプトとは別のグループに分けて分析を行う）」など）を設定することにより、柔軟にビューをアサインすることができるようになる。
【００５２】
また、本発明は記録媒体に関するものであり、請求項１３に記載の記録媒体は、上記請求項９から１２のいずれか一つに記載されたプログラムを記録したことを特徴とする。
【００５３】
この記録媒体によれば、当該記録媒体に記録されたプログラムをコンピュータに読み取らせて実行することによって、請求項９から１２のいずれか一つに記載されたプログラムをコンピュータを利用して実現することができ、これら各方法と同様の効果を得ることができる。
【００５４】
【発明の実施の形態】
以下に、本発明にかかるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体の実施の形態を図面に基づいて詳細に説明する。なお、この実施の形態によりこの発明が限定されるものではない。
【００５５】
[本発明の概要]
以下、本発明の概要について説明し、その後、本発明の構成および処理等について詳細に説明する。図３は本発明の基本原理を示すフローチャートである。
本発明は、概略的に、以下の基本的特徴を有する。
【００５６】
本発明は、既存カテゴリを利用せずに新規コンセプトのアサインを実行する（ステップＳＡ−１）。ここで、ステップＳＡ−１の処理の詳細について、図４〜図６を用いて説明する。
【００５７】
まず、図４は、全文検索によるコンセプト構成を行う場合の一例を示す図である。図４に示すように、まず、文字列や数値に関する検索条件（図４の例では文字列 ”ｒｅｇｅｘｐ１”により表現される正規表現による検索条件であり、例えば、検索条件が”＾［Ｂｂ］ｒａｉｎ．＊”である場合、文書中に、”Ｂｒａｉｎ．．．”、”ｂｒａｉｎ．．．”、”ｂｒａｉｎ−ｉｓｃｈｅｍａ．．．”等がある場合にはヒットする）を指定し、文書（図４の例では、ｄ０１〜ｄ１２）内に検索条件を満たす文字列や数値が存在する文書を検索し、該当する文書（図４の例では、ｄ０１、ｄ０４、ｄ０６、ｄ０７、ｄ０８、ｄ１１）の集合を新規コンセプト（図４の例ではｃ１）としてアサインしてもよい。
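The full-text concept assignment of paragraph [0057] (FIG. 4) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `assign_concept_full_text` and the document texts are hypothetical, and the sketch assumes that `^` in the example pattern `^[Bb]rain.*` anchors at the start of a line.

```python
import re

# Hypothetical corpus: document ID -> full text (contents are invented).
documents = {
    "d01": "Brain imaging study of adult patients",
    "d02": "Liver enzyme assay",
    "d03": "brain-ischemia model in rats",
    "d04": "Effects on the brain stem",
}

def assign_concept_full_text(docs, pattern):
    """Return the set of document IDs whose text matches the search condition."""
    # MULTILINE lets '^' match at the start of every line (an assumption).
    regex = re.compile(pattern, re.MULTILINE)
    return {doc_id for doc_id, text in docs.items() if regex.search(text)}

# The new concept is simply the set of matching documents.
c1 = assign_concept_full_text(documents, r"^[Bb]rain.*")
print(sorted(c1))  # ['d01', 'd03']
```

Representing a concept as a plain set of document IDs follows the definition of a concept as a set of documents; d04 is not matched because "brain" does not appear at the start of a line.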
【００５８】
また、図５は、フィールド検索によるコンセプト構成を行う場合の一例を示す図である。図５に示すように、まず、文字列や数値に関する検索条件（図５の例では文字列”ｒｅｇｅｘｐ２” により表現される正規表現による検索条件）と探索対象となるフィールド（図５の例ではｆ１のフィールド）を指定し、文書（図５の例では、ｄ０１〜ｄ１２）のフィールド内に検索条件を満たす文字列や数値が存在する文書を検索し、該当する文書（図５の例では、ｄ０２、ｄ０３、ｄ０５、ｄ０６、ｄ０８、ｄ１２）の集合を新規コンセプト（図５の例ではｃ２）としてアサインしてもよい。
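The field-restricted variant of paragraph [0058] (FIG. 5) differs only in that the search condition is applied to one designated field. A hedged sketch, assuming each document is stored as a mapping from field ID to text; the function name, field contents, and pattern are hypothetical:

```python
import re

# Hypothetical corpus: document ID -> {field ID -> text}.
documents = {
    "d01": {"f1": "Kinase inhibitor screening", "f2": "No relevant terms here"},
    "d02": {"f1": "kinase activity in liver", "f2": "Methods"},
    "d03": {"f1": "Protein folding", "f2": "kinase mentioned only in this field"},
}

def assign_concept_by_field(docs, field, pattern):
    """Return the IDs of documents whose given field matches the search condition."""
    regex = re.compile(pattern)
    return {
        doc_id
        for doc_id, fields in docs.items()
        # A document without the field is treated as having empty text.
        if regex.search(fields.get(field, ""))
    }

c2 = assign_concept_by_field(documents, "f1", r"[Kk]inase")
print(sorted(c2))  # ['d01', 'd02']
```

Document d03 is excluded because the match occurs only in field f2, which is outside the designated search target.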
【００５９】
また、図６は、論理集合演算によるコンセプト構成を行う場合の一例を示す図である。図６に示すように、まず、既存コンセプト（図６の例ではｃ１、ｃ２）について論理集合演算（例えば、ＡＮＤ、ＯＲ、ＳＵＢなど）を行うことにより取得した文書の集合を新規コンセプト（図６の例ではｃ３）としてアサインしてもよい。
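The logical set operations of paragraph [0059] map directly onto set operations over document IDs. The sketch below reuses the document sets given for c1 (FIG. 4) and c2 (FIG. 5); it assumes that SUB means set difference, and it does not claim to reproduce the particular c3 drawn in FIG. 6. The function name is hypothetical.

```python
# Document sets of the existing concepts, as listed for FIG. 4 and FIG. 5.
c1 = {"d01", "d04", "d06", "d07", "d08", "d11"}
c2 = {"d02", "d03", "d05", "d06", "d08", "d12"}

# Assumed mapping of the operation names onto Python set operations.
OPERATIONS = {
    "AND": set.intersection,
    "OR": set.union,
    "SUB": set.difference,
}

def assign_concept_by_set_op(op, left, right):
    """Return a new concept (document set) derived from two existing concepts."""
    return OPERATIONS[op](left, right)

print(sorted(assign_concept_by_set_op("AND", c1, c2)))  # ['d06', 'd08']
print(sorted(assign_concept_by_set_op("SUB", c1, c2)))  # ['d01', 'd04', 'd07', 'd11']
```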
【００６０】
再び、図３に戻り、本発明は、アサインされた新規コンセプトにより、カテゴリの構造を変更する（ステップＳＡ−２）。ここで、ステップＳＡ−２の処理の詳細について、図７および図８を用いて説明する。
【００６１】
まず、図７は、既存カテゴリへのコンセプト配置を行う場合の一例を示す図である。図７に示すように、既存カテゴリの任意のコンセプト（図７の例ではｃ３）の下に新規コンセプト（図７の例ではｃ７）を配置してもよい。
【００６２】
また、図８は、新規カテゴリへのコンセプト配置を行う場合の一例を示す図である。図８に示すように、新規コンセプト（図８の例ではｃ８、ｃ９、ｃ１０、ｃ１１、および、ｃ１２）からなる新規カテゴリ（図８の例では文鎮型の構造を持つカテゴリ）を構成してもよい。
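The two category changes of paragraphs [0061] and [0062] can be sketched with a category stored as a mapping from each concept to its list of children. This representation, the function names, and the root name `new_root` are assumptions made for illustration; the existing tree is the one described for FIG. 1, which FIG. 7 is assumed to share.

```python
def place_under(category, parent, new_concept):
    """First category change: attach a new concept below an existing concept."""
    if parent not in category:
        raise KeyError(f"unknown parent concept: {parent}")
    category[parent].append(new_concept)
    category[new_concept] = []

def new_flat_category(root, concepts):
    """Second category change: build a new flat (paperweight-type) category."""
    category = {root: list(concepts)}
    for concept in concepts:
        category[concept] = []
    return category

# Existing tree-structured category (root -> c2, c4, c5, c6; c2 -> c1, c3).
existing = {
    "root": ["c2", "c4", "c5", "c6"],
    "c2": ["c1", "c3"],
    "c1": [], "c3": [], "c4": [], "c5": [], "c6": [],
}
place_under(existing, "c3", "c7")  # as in FIG. 7: c7 is placed under c3
flat = new_flat_category("new_root", ["c8", "c9", "c10", "c11", "c12"])  # as in FIG. 8
```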
【００６３】
再び、図３に戻り、本発明は、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインする（ステップＳＡ−３）。ここで、ステップＳＡ−３の処理の詳細について、図９〜図１２を用いて説明する。
【００６４】
図９は、本発明によるビューのアサイン手法の概要を示す図である。図９に示すように、まず、テキストマイニング分析の対象となる分析対象コンセプト（図９における二重丸で示すコンセプト）を選択し（ＭＢ−１）、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ライン（図９における太い実線）を構成する構成コンセプト（図９における黒丸で示すコンセプト）を設定することによりビューをアサインする（ＭＢ−２）。
【００６５】
例えば、会社Ｘに関連した論文を絞る場合に、分析したい分析対象コンセプトとして「病気」を選択し（ＭＢ−１）、「病気」の下位のコンセプトについて、会社Ｘに特徴的な部分については、ドリルダウンして詳細な下位のコンセプトまでを構成コンセプトとして設定したり、あまり会社Ｘと関係のないコンセプトの部分は、上位のコンセプトのみを構成コンセプトとして設定したりすることができるようになる。
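The view cut line of paragraphs [0064] and [0065] can be read as a frontier through the subtree below the analysis target concept: branches of interest are drilled down, while the others stop at a higher-level concept. The sketch below is one possible reading under that assumption; the function name, the `expanded` set, and the disease hierarchy are hypothetical.

```python
def view_cut_line(category, target, expanded):
    """Return the constituent concepts of the cut line below `target`.

    A concept listed in `expanded` is replaced by its children (drill-down);
    every other concept is itself placed on the cut line.
    """
    line = []

    def walk(node):
        children = category.get(node, [])
        if node in expanded and children:
            for child in children:
                walk(child)
        else:
            line.append(node)

    for child in category.get(target, []):
        walk(child)
    return line

# Hypothetical category below the analysis target concept "disease".
category = {
    "disease": ["cancer", "infection", "metabolic disease"],
    "cancer": ["lung cancer", "gastric cancer"],
    "infection": ["viral infection", "bacterial infection"],
}

# Drill down only into "cancer"; keep the other branches at the higher level.
cut_line = view_cut_line(category, "disease", expanded={"cancer"})
print(cut_line)  # ['lung cancer', 'gastric cancer', 'infection', 'metabolic disease']
```

The resulting view mixes concepts from different depths, which a view restricted to the children of a single node cannot express.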
【００６６】
また、図１０〜図１２は、本発明によるビューのアサイン手法における属性を用いた付加的機能の概要を示す図である。
【００６７】
図１０は、ビュー切り口ラインを構成する構成コンセプトの中に「その他」の属性を設定する場合を説明する概念図である。図１０は、ビュー切り口ラインを構成する構成コンセプトの中に、属性として「その他」が設定された構成コンセプト（図１０における灰色の丸で示すコンセプト）を示している。「その他」の属性が設定された構成コンセプトについては、テキストマイニング分析において、「その他」の属性を持たない通常の構成コンセプトとは別のグループに分類されることになる。ここで、「その他」の属性は、複数種類設定することができる。
【００６８】
また、図１１は、ビュー切り口ラインを構成する構成コンセプトの中に「飛ばし」の属性を設定する場合を説明する概念図である。図１１は、ビュー切り口ラインを構成する構成コンセプトの中に、属性として「飛ばし」が設定された構成コンセプト（図１０における四角で示すコンセプト）を示している。「飛ばし」の属性が設定された構成コンセプトについては、テキストマイニング分析において、分析対象とされないことになる。
【００６９】
また、図１２は、ビュー切り口ラインを構成する構成コンセプトの中に「飛ばし」と「その他」の属性を混在させて設定する場合を説明する概念図である。これにより、分析対象コンセプトをルートとして、全てのコンセプトについて「その他」と「飛ばし」の属性をそれぞれ適宜設定することにより、あらゆるビューを設定することができるようになる。
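How the "skip" and "other" attributes of paragraphs [0067] to [0069] might affect an analysis can be sketched with a simple per-concept document count. This is only an illustration: the attribute names are English renderings, the data are invented, and pooling all "other" concepts into a single group counted by the union of their documents is an assumption (the description also allows several kinds of "other").

```python
def count_documents(view, concept_docs):
    """Count documents per constituent concept, honoring the attributes.

    `view` is a list of (concept ID, attribute) pairs, where the attribute
    is "select", "skip", or "other".
    """
    counts = {}
    other_docs = set()
    for concept, attribute in view:
        docs = concept_docs.get(concept, set())
        if attribute == "skip":
            continue              # excluded from the analysis
        if attribute == "other":
            other_docs |= docs    # pooled into a separate group
        else:
            counts[concept] = len(docs)
    if other_docs:
        counts["other"] = len(other_docs)
    return counts

# Hypothetical view and concept contents.
view = [
    ("lung cancer", "select"),
    ("gastric cancer", "select"),
    ("infection", "other"),
    ("metabolic disease", "other"),
    ("injury", "skip"),
]
concept_docs = {
    "lung cancer": {"d01", "d04"},
    "gastric cancer": {"d06"},
    "infection": {"d02", "d03"},
    "metabolic disease": {"d03", "d05"},
    "injury": {"d07"},
}
result = count_documents(view, concept_docs)
print(result)  # {'lung cancer': 2, 'gastric cancer': 1, 'other': 3}
```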
【００７０】
再び、図３に戻り、本発明は、アサインされたビューを用いてテキストマイニング分析を実行する（ステップＳＡ−４）。ここで、分析の対象となるのは、ビュー切り口ラインを構成する構成コンセプトであるが、各構成コンセプトについて属性が設定されている場合には、その属性に応じた分析が行われる。
【００７１】
［システム構成］
次に、本システムの構成について説明する。図１５は、本発明が適用される本システムの構成の一例を示すブロック図であり、該構成のうち本発明に関係する部分のみを概念的に示している。本システムは、概略的に、テキストマイニング分析装置１００と、論文などの各種の技術文献を蓄積した文献データベース等の外部データベースや各種の分析・検索サービス等を実行する外部プログラム等を提供する外部システム２００とを、ネットワーク３００を介して通信可能に接続して構成されている。
【００７２】
図１５においてネットワーク３００は、テキストマイニング分析装置１００と外部システム２００とを相互に接続する機能を有し、例えば、インターネット等である。
【００７３】
図１５において外部システム２００は、ネットワーク３００を介して、テキストマイニング分析装置１００と相互に接続され、利用者に対して文献データベース等の外部データベースや、各種の分析・検索などのサービス用の外部プログラムを実行するウェブサイトを提供する機能を有する。
【００７４】
ここで、外部システム２００は、ＷＥＢサーバやＡＳＰサーバ等として構成してもよく、そのハードウェア構成は、一般に市販されるワークステーション、パーソナルコンピュータ等の情報処理装置およびその付属装置により構成してもよい。また、外部システム２００の各機能は、外部システム２００のハードウェア構成中のＣＰＵ、ディスク装置、メモリ装置、入力装置、出力装置、通信制御装置等およびそれらを制御するプログラム等により実現される。
【００７５】
図１５においてテキストマイニング分析装置１００は、概略的に、テキストマイニング分析装置１００の全体を統括的に制御するＣＰＵ等の制御部１０２、通信回線等に接続されるルータ等の通信装置（図示せず）に接続される通信制御インターフェース部１０４、入力装置１１２や出力装置１１４に接続される入出力制御インターフェース部１０８、および、各種のデータベースやテーブルなどを格納する記憶部１０６を備えて構成されており、これら各部は任意の通信路を介して通信可能に接続されている。さらに、このテキストマイニング分析装置１００は、ルータ等の通信装置および専用線等の有線または無線の通信回線を介して、ネットワーク３００に通信可能に接続されている。
【００７６】
記憶部１０６に格納される各種のデータベースやテーブル（文書ファイル１０６ａ〜分析結果ファイル１０６ｅ）は、固定ディスク装置等のストレージ手段であり、各種処理に用いる各種のプログラムやテーブルやファイルやデータベースやウェブページ用ファイル等を格納する。
【００７７】
これら記憶部１０６の各構成要素のうち、文書ファイル１０６ａは、論文などの各種の技術文献などの文書に関する情報（例えば、文書ＩＤ、フィールドＩＤ、テキストデータ、画像データなど）を格納した文書情報格納手段である。ここで、文書ファイル１０６ａに格納される各文書データは、それぞれフィールドに分割されていてもよい。
【００７８】
また、コンセプトファイル１０６ｂは、コンセプトに関する情報（例えば、コンセプトＩＤ、当該コンセプトの有する概念、当該コンセプトに含まれる文書の検索条件や論理集合演算指示等）を格納するコンセプト情報格納手段である。ここで、コンセプトファイル１０６ｂは、当該コンセプトに含まれる文書ＩＤを格納してもよい。
【００７９】
また、カテゴリファイル１０６ｃは、カテゴリに関する情報（例えば、カテゴリＩＤ、カテゴリに含まれるノード（カテゴリ）とエッジ（カテゴリ間の関係）に関する構造データ等を格納するカテゴリ情報格納手段である。
【００８０】
また、ビューファイル１０６ｄは、ビューに関する情報（例えば、ビューＩＤ、ビュー切り口ラインを構成する構成コンセプトのコンセプトＩＤ、当該構成コンセプトの属性等）を格納するビュー情報格納手段である。
【００８１】
また、分析結果ファイル１０６ｅは、テキストマイニング分析の分析結果に関する情報等を格納する分析結果格納手段である。
【００８２】
また、図１５において、通信制御インターフェース部１０４は、テキストマイニング分析装置１００とネットワーク３００（またはルータ等の通信装置）との間における通信制御を行う。すなわち、通信制御インターフェース部１０４は、他の端末と通信回線を介してデータを通信する機能を有する。
【００８３】
また、図１５において、入出力制御インターフェース部１０８は、入力装置１１２や出力装置１１４の制御を行う。ここで、出力装置１１４としては、モニタ（家庭用テレビを含む）の他、スピーカを用いることができる（なお、以下においては出力装置１１４をモニタとして記載する場合がある）。また、入力装置１１２としては、キーボード、マウス、および、マイク等を用いることができる。また、モニタも、マウスと協働してポインティングデバイス機能を実現する。
【００８４】
また、図１５において、制御部１０２は、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）等の制御プログラム、各種の処理手順等を規定したプログラム、および所要データを格納するための内部メモリを有し、これらのプログラム等により、種々の処理を実行するための情報処理を行う。制御部１０２は、機能概念的に、コンセプトアサイン部１０２ａ、カテゴリ変更部１０２ｂ、ビューアサイン部１０２ｃ、テキストマイニング分析部１０２ｄ、第１コンセプトアサイン部１０２ｅ、第２コンセプトアサイン部１０２ｆ、第３コンセプトアサイン部１０２ｇ、第１カテゴリ変更部１０２ｈ、第２カテゴリ変更部１０２ｉ、および、属性設定部１０２ｊを備えて構成されている。
【００８５】
このうち、コンセプトアサイン部１０２ａは、既存カテゴリを利用せずに新規コンセプトのアサインを実行するコンセプトアサイン手段である。ここで、図１６は、コンセプトアサイン部１０２ａの構成の一例を説明するブロック図である。図１６に示すように、コンセプトアサイン部１０２ａは、第１コンセプトアサイン部１０２ｅ、第２コンセプトアサイン部１０２ｆ、および、第３コンセプトアサイン部１０２ｇを備えて構成されている。
【００８６】
ここで、第１コンセプトアサイン部１０２ｅは、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第１コンセプトアサイン手段である。
【００８７】
また、第２コンセプトアサイン部１０２ｆは、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第２コンセプトアサイン手段である。
【００８８】
また、第３コンセプトアサイン部１０２ｇは、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする第３コンセプトアサイン手段である。
【００８９】
再び図１５に戻り、カテゴリ変更部１０２ｂは、コンセプトアサイン手段によりアサインされた新規コンセプトにより、カテゴリの構造を変更するカテゴリ変更手段である。ここで、図１７は、カテゴリ変更部１０２ｂの構成の一例を説明するブロック図である。図１７に示すように、カテゴリ変更部１０２ｂは、第１カテゴリ変更部１０２ｈ、および、第２カテゴリ変更部１０２ｉを備えて構成されている。
【００９０】
また、第１カテゴリ変更部１０２ｈは、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更手段である。
【００９１】
また、第２カテゴリ変更部１０２ｉは、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更手段である。
【００９２】
再び図１５に戻り、ビューアサイン部１０２ｃは、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインするビューアサイン手段である。ここで、図１８は、ビューアサイン部１０２ｃの構成の一例を説明するブロック図である。図１８に示すように、ビューアサイン部１０２ｃは、属性設定部１０２ｊを備えて構成されている。
【００９３】
ここで、属性設定部１０２ｊは、構成コンセプトに対応する属性を設定する属性設定手段である。
【００９４】
再び図１５に戻り、テキストマイニング分析部１０２ｄは、ビューアサイン手段にてアサインされたビューを用いてテキストマイニング分析を実行するテキストマイニング分析手段である。また、テキストマイニング分析部１０２ｄは、属性設定手段（属性設定部１０２ｊ）にて設定された構成コンセプトの属性に従って、テキストマイニング分析を実行する機能を有する。
【００９５】
なお、これら各部によって行なわれる処理の詳細については、後述する。
【００９６】
[システムの処理]
次に、このように構成された本実施の形態における本システムの処理の一例について、以下に図１９〜図２１等を参照して詳細に説明する。
【００９７】
[メイン処理]
本発明のテキストマイニング分析装置１００により実行されるメイン処理の詳細について図１９等を参照して説明する。図１９は、本実施形態における本システムのメイン処理の一例を示すフローチャートである。
【００９８】
まず、テキストマイニング分析装置１００は、コンセプトアサイン部１０２ａの処理により、既存カテゴリを利用せずに新規コンセプトのアサインを実行するコンセプトアサイン処理を行う（ステップＳＢ−１）。
【００９９】
ここで、コンセプトアサイン部１０２ａにより実行されるコンセプトアサイン処理について、図２０を参照して以下に説明する。
【０１００】
[コンセプトアサイン処理]
図２０は、本実施形態における本システムのコンセプトアサイン処理の一例を示すフローチャートである。
【０１０１】
コンセプトアサイン処理は、以下に詳細に説明する、第１コンセプトアサイン処理、第２コンセプトアサイン処理、および、第３コンセプトアサイン処理のうちいずれかを単独または任意の順番で組み合せて実行することができる。
【０１０２】
（第１コンセプトアサイン処理）
コンセプトアサイン部１０２ａは、第１コンセプトアサイン部１０２ｅの処理により、図４を用いて上述したように、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする（ステップＳＣ−１）。
【０１０３】
（第２コンセプトアサイン処理）
コンセプトアサイン部１０２ａは、第２コンセプトアサイン部１０２ｆの処理により、図５を用いて上述したように、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする（ステップＳＣ−２）。
【０１０４】
（第３コンセプトアサイン処理）
コンセプトアサイン部１０２ａは、第３コンセプトアサイン部１０２ｇの処理により、図６を用いて上述したように、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする（ステップＳＣ−３）。
【０１０５】
これにて、コンセプトアサイン処理が終了する。
【０１０６】
再び図１９に戻り、テキストマイニング分析装置１００は、カテゴリ変更部１０２ｂの処理により、アサインされた新規コンセプトにより、カテゴリの構造を変更するカテゴリ変更処理を実行する（ステップＳＢ−２）。
【０１０７】
ここで、カテゴリ変更部１０２ｂにより実行されるカテゴリ変更処理について、図２１を参照して以下に説明する。
【０１０８】
[カテゴリ変更処理]
次に、カテゴリ変更処理の詳細について図２１を参照して説明する。図２１は、本実施形態における本システムのカテゴリ変更処理の一例を示すフローチャートである。
カテゴリ変更処理は、以下に詳細に説明する、第１カテゴリ変更処理、および、第２カテゴリ変更処理のうちいずれかを単独または任意の順番で組み合せて実行することができる。
【０１０９】
（第１カテゴリ変更処理）
カテゴリ変更部１０２ｂは、第１カテゴリ変更部１０２ｈの処理により、図７を用いて上述したように、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する（ステップＳＤ−１）。
【０１１０】
（第２カテゴリ変更処理）
カテゴリ変更部１０２ｂは、第２カテゴリ変更部１０２ｉの処理により、図８を用いて上述したように、新規コンセプトからなる新規カテゴリを構成する（ステップＳＤ−２）。
【０１１１】
これにて、カテゴリ変更処理が終了する。
【０１１２】
再び図１９に戻り、テキストマイニング分析装置１００は、ビューアサイン部１０２ｃの処理により、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインする（ステップＳＢ−３）。
【０１１３】
すなわち、ビューアサイン部１０２ｃは、図１３に示すビュー設定画面を出力装置１１４に表示し、利用者に、分析対象コンセプトと、ビュー切り口ラインを構成する構成コンセプトを設定させる。
【０１１４】
ここで、図１３は、テキストマイニング分析装置１００の出力装置１１４に表示されるビュー設定画面の一例を示す図である。図１３に示す例では、ルートから３つの子コンセプト（コンセプトＩＤが、ｃａｔＡ、ｃａｔＢ、ｃａｔＣ）が接続され、それぞれに複数の子コンセプトが接続された木構造のカテゴリを一例に説明する。
【０１１５】
図１３においてコンセプトＩＤの左側に未展開コンセプト（「＋」で表示される）、展開コンセプト（「−」で表示される）、または、末端コンセプト（「無」で表示される）を区別するための標識が表示される。利用者は、未展開コンセプトについては、当該「＋」の標識部分をマウスなどでクリックなどすることにより、入力装置１１２を用いて指定すると、下位のコンセプトが表示された後、当該未展開コンセプトは、展開コンセプトの表示（「−」）に切り替わる。すなわち、利用者が当該標識を用いて、下位のコンセプトを開いたり閉じたりすることにより、任意のコンセプトを切り口ラインを構成する構成コンセプトとして指定することができる。
【０１１６】
また、未展開コンセプトと末端コンセプトのコンセプトＩＤの右側には、属性を設定するための選択領域が表示される。利用者が、所望の属性を入力装置１１２を用いて「選択（構成コンセプトとして選択する）」、「飛ばし（構成コンセプトとして選択しない）、「その他（他の構成コンセプトとは別のグループに分けて分析を行う）」のいずれかを指定すると、属性設定部１０２ｊは、当該指定された構成コンセプトの属性値をビューファイル１０６ｄの所定の記憶領域に格納する。
【０１１７】
再び図１９に戻り、テキストマイニング分析装置１００は、テキストマイニング分析部１０２ｄの処理により、アサインされたビューを用いてテキストマイニング分析を実行する（ステップＳＢ−４）。
【０１１８】
ここで、図１４は、テキストマイニング分析結果を表示する画面の一例を示す図である。図１４は、ビューに指定されたコンセプト毎に文書数を表示する場合を一例に説明する。
【０１１９】
図１４に示すように、構成コンセプト毎にその属する文書数を表示する。また、コンセプトＢについては、「その他」の属性を持つコンセプトをまとめて分けて表示している。
【０１２０】
これにて、メイン処理が終了する。
【０１２１】
[他の実施の形態]
さて、これまで本発明の実施の形態について説明したが、本発明は、上述した実施の形態以外にも、上記特許請求の範囲に記載した技術的思想の範囲内において種々の異なる実施の形態にて実施されてよいものである。
【０１２２】
例えば、テキストマイニング分析装置１００がスタンドアローンの形態で処理を行う場合を一例に説明したが、テキストマイニング分析装置１００とは別筐体で構成されるクライアント端末からの要求に応じて処理を行い、その処理結果を当該クライアント端末に返却するように構成してもよい。
【０１２３】
また、実施形態において説明した各処理のうち、自動的に行なわれるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行なわれるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。
【０１２４】
この他、上記文書中や図面中で示した処理手順、制御手順、具体的名称、各種の登録データや検索条件等のパラメータを含む情報、画面例、データベース構成については、特記する場合を除いて任意に変更することができる。
【０１２５】
また、テキストマイニング分析装置１００に関して、図示の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。
【０１２６】
例えば、テキストマイニング分析装置１００の各部または各装置が備える処理機能、特に制御部１０２にて行なわれる各処理機能については、その全部または任意の一部を、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）および当該ＣＰＵにて解釈実行されるプログラムにて実現することができ、あるいは、ワイヤードロジックによるハードウェアとして実現することも可能である。なお、プログラムは、後述する記録媒体に記録されており、必要に応じてテキストマイニング分析装置１００に機械的に読み取られる。
【０１２７】
すなわち、ＲＯＭまたはＨＤなどの記憶部１０６などには、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）と協働してＣＰＵに命令を与え、各種処理を行うためのコンピュータプログラムが記録されている。このコンピュータプログラムは、ＲＡＭ等にロードされることによって実行され、ＣＰＵと協働して制御部１０２を構成する。また、このコンピュータプログラムは、テキストマイニング分析装置１００に対して任意のネットワーク３００を介して接続されたアプリケーションプログラムサーバに記録されてもよく、必要に応じてその全部または一部をダウンロードすることも可能である。
【０１２８】
また、本発明にかかるプログラムを、コンピュータ読み取り可能な記録媒体に格納することもできる。ここで、この「記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＥＰＲＯＭ、ＥＥＰＲＯＭ、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ等の任意の「可搬用の物理媒体」や、各種コンピュータシステムに内蔵されるＲＯＭ、ＲＡＭ、ＨＤ等の任意の「固定用の物理媒体」、あるいは、ＬＡＮ、ＷＡＮ、インターネットに代表されるネットワークを介してプログラムを送信する場合の通信回線や搬送波のように、短期にプログラムを保持する「通信媒体」を含むものとする。
【０１２９】
また、「プログラム」とは、任意の言語や記述方法にて記述されたデータ処理方法であり、ソースコードやバイナリコード等の形式を問わない。なお、「プログラム」は必ずしも単一的に構成されるものに限られず、複数のモジュールやライブラリとして分散構成されるものや、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）に代表される別個のプログラムと協働してその機能を達成するものをも含む。なお、実施の形態に示した各装置において記録媒体を読み取るための具体的な構成、読み取り手順、あるいは、読み取り後のインストール手順等については、周知の構成や手順を用いることができる。
【０１３０】
また、テキストマイニング分析装置１００は、さらなる構成要素として、マウス等の各種ポインティングデバイスやキーボードやイメージスキャナやデジタイザ等から成る入力装置（図示せず）、入力データのモニタに用いる表示装置（図示せず）、システムクロックを発生させるクロック発生部（図示せず）、および、各種処理結果その他のデータを出力するプリンタ等の出力装置（図示せず）を備えてもよく、また、入力装置、表示装置および出力装置は、それぞれ入出力インターフェースを介して制御部１０２に接続されてもよい。
【０１３１】
また、テキストマイニング分析装置１００は、既知のパーソナルコンピュータ、ワークステーション等の情報処理端末等の情報処理装置にプリンタやモニタやイメージスキャナ等の周辺装置を接続し、該情報処理装置に本発明の方法を実現させるソフトウェア（プログラム、データ等を含む）を実装することにより実現してもよい。
【０１３２】
さらに、テキストマイニング分析装置１００等の分散・統合の具体的形態は明細書および図面に示すものに限られず、その全部または一部を、各種の負荷等に応じた任意の単位で、機能的または物理的に分散・統合して構成することができる（例えば、グリッド・コンピューティングなど）。例えば、各データベースを独立したデータベース装置として独立に構成してもよく、また、処理の一部をＣＧＩ（ＣｏｍｍｏｎＧａｔｅｗａｙＩｎｔｅｒｆａｃｅ）を用いて実現してもよい。
【０１３３】
また、ネットワーク３００は、テキストマイニング分析装置１００と外部システム２００とを相互に接続する機能を有し、例えば、インターネットや、イントラネットや、ＬＡＮ（有線／無線の双方を含む）や、ＶＡＮや、パソコン通信網や、公衆電話網（アナログ／デジタルの双方を含む）や、専用回線網（アナログ／デジタルの双方を含む）や、ＣＡＴＶ網や、ＩＭＴ２０００方式、ＧＳＭ方式またはＰＤＣ／ＰＤＣ―Ｐ方式等の携帯回線交換網／携帯パケット交換網や、無線呼出網や、Ｂｌｕｅｔｏｏｔｈ等の局所無線網や、ＰＨＳ網や、ＣＳ、ＢＳまたはＩＳＤＢ等の衛星通信網等のうちいずれかを含んでもよい。すなわち、本システムは、有線・無線を問わず任意のネットワークを介して、各種データを送受信することができる。
【０１３４】
【発明の効果】
以上詳細に説明したように、本発明によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行し、アサインされた新規コンセプトにより、カテゴリの構造を変更し、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、テキストマイニング分析において柔軟にコンセプトとビューをアサインすることができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１３５】
また、本発明によれば、既存カテゴリを利用せずに新規コンセプトのアサインを実行するので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１３６】
また、本発明によれば、アサインされた新規コンセプトにより、カテゴリの構造を変更するので、利用場面に応じて臨機応変にコンセプトをカテゴリ上にアサインすることができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１３７】
また、本発明によれば、テキストマイニング分析の対象となる分析対象コンセプトを選択し、カテゴリの構造において当該分析対象コンセプトの下位に存在するコンセプトからビュー切り口ラインを構成する構成コンセプトを設定することによりビューをアサインし、アサインされたビューを用いてテキストマイニング分析を実行するので、カテゴリとは無関係にビューをアサインしたりすることができるようになり、兄弟関係ではないコンセプトを柔軟に指定してビューを構成することができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１３８】
また、本発明によれば、コンセプトアサイン手段（または「コンセプトアサインステップ」以下同様。）は、文字列や数値に関する検索条件を指定し、文書内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第１コンセプトアサイン手段、検索条件と探索対象となるフィールドを指定し、文書のフィールド内に検索条件を満たす文字列や数値が存在する文書の集合を新規コンセプトとしてアサインする第２コンセプトアサイン手段、および、既存コンセプトについて論理集合演算を行うことにより取得した文書の集合を新規コンセプトとしてアサインする第３コンセプトアサイン手段のうち少なくとも一つの手段をさらに備えたので、既存の同義語辞書とカテゴリ辞書によって定義されていない新規な概念に対応するコンセプトを作成することができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１３９】
また、本発明によれば、カテゴリ変更手段（または「カテゴリ変更ステップ」以下同様。）は、既存カテゴリの任意のコンセプトの下に新規コンセプトを配置する第１カテゴリ変更手段、および、新規コンセプトからなる新規カテゴリを構成する第２カテゴリ変更手段のうち少なくとも一つの手段をさらに備えたので、利用場面に応じて臨機応変にコンセプトを既存または新規のカテゴリ上にアサインすることができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【０１４０】
さらに、本発明によれば、ビューアサイン手段（または「ビューアサインステップ」以下同様。）は、構成コンセプトに対応する属性を設定する属性設定手段をさらに備え、テキストマイニング分析手段は、設定された構成コンセプトの属性に従って、テキストマイニング分析を実行するので、各種の属性（例えば、「選択（構成コンセプトとして選択する）」、「飛ばし（構成コンセプトとして選択しない）、「その他（他の構成コンセプトとは別のグループに分けて分析を行う）」など）を設定することにより、柔軟にビューをアサインすることができるテキストマイニング分析装置、テキストマイニング分析方法、プログラム、および、記録媒体を提供することができる。
【図面の簡単な説明】
【図１】テキストマイニングシステムにおけるテキストマイニング分析で扱う情報である文書、コンセプト、カテゴリ、および、ビューのそれぞれの概念を示す図である。
【図２】従来技術の問題点を説明する概念図である。
【図３】本発明の基本原理を示すフローチャートである。
【図４】全文検索によるコンセプト構成を行う場合の一例を示す図である。
【図５】フィールド検索によるコンセプト構成を行う場合の一例を示す図である。
【図６】論理集合演算によるコンセプト構成を行う場合の一例を示す図である。
【図７】既存カテゴリへのコンセプト配置を行う場合の一例を示す図である。
【図８】新規カテゴリへのコンセプト配置を行う場合の一例を示す図である。
【図９】本発明によるビューのアサイン手法の概要を示す図である。
【図１０】ビュー切り口ラインを構成する構成コンセプトの中に「その他」の属性を設定する場合を説明する概念図である。
【図１１】ビュー切り口ラインを構成する構成コンセプトの中に「飛ばし」の属性を設定する場合を説明する概念図である。
【図１２】ビュー切り口ラインを構成する構成コンセプトの中に「飛ばし」と「その他」の属性を混在させて設定する場合を説明する概念図である。
【図１３】テキストマイニング分析装置１００の出力装置１１４に表示されるビュー設定画面の一例を示す図である。
【図１４】テキストマイニング分析結果を表示する画面の一例を示す図である。
【図１５】本発明が適用される本システムの構成の一例を示すブロック図である。
【図１６】コンセプトアサイン部１０２ａの構成の一例を説明するブロック図である。
【図１７】カテゴリ変更部１０２ｂの構成の一例を説明するブロック図である。
【図１８】ビューアサイン部１０２ｃの構成の一例を説明するブロック図である。
【図１９】本実施形態における本システムのメイン処理の一例を示すフローチャートである。
【図２０】本実施形態における本システムのコンセプトアサイン処理の一例を示すフローチャートである。
【図２１】本実施形態における本システムのカテゴリ変更処理の一例を示すフローチャートである。
【符号の説明】
１００テキストマイニング分析装置
１０２制御部
１０２ａコンセプトアサイン部
１０２ｂカテゴリ変更部
１０２ｃビューアサイン部
１０２ｄテキストマイニング分析部
１０２ｅ第１コンセプトアサイン部
１０２ｆ第２コンセプトアサイン部
１０２ｇ第３コンセプトアサイン部
１０２ｈ第１カテゴリ変更部
１０２ｉ第２カテゴリ変更部
１０２ｊ属性設定部
１０４通信制御インターフェース部
１０６記憶部
１０６ａ文書ファイル
１０６ｂコンセプトファイル
１０６ｃカテゴリファイル
１０６ｄビューファイル
１０６ｅ分析結果ファイル
１０８入出力制御インターフェース部
１１２入力装置
１１４出力装置
２００外部システム
３００ネットワーク

[0001]
[Technical field of the invention]
The present invention relates to a text mining analysis device, a text mining analysis method, a program, and a recording medium, and more particularly to a text mining analysis device, a text mining analysis method, a program, and a recording medium that can flexibly assign concepts and views in text mining analysis.
[0002]
[Prior art]
In recent years, literature databases storing various technical documents such as papers have been constructed and are widely used via the Internet and the like. For example, there is PubMed, through which the US National Center for Biotechnology Information (NCBI) provides literature data from the US National Library of Medicine (NLM) and other sources (see, for example, Non-Patent Document 1).
[0003]
In conventional literature database search services, a "notation dictionary" that maps the canonical form of each term to its written variants, a "category dictionary" that classifies each term into categories, and the like are used in order to improve search efficiency.
[0004]
For example, TAKMI (product name) from IBM (company name) is a text mining system that uses existing notation dictionaries and category dictionaries (see, for example, Non-Patent Document 2).
[0005]
Here, an existing text mining system carries out its analysis using the four types of information shown in FIG. 1 (documents, concepts, categories, and views). FIG. 1 is a diagram illustrating the notions of document, concept, category, and view, which are the information handled in text mining analysis in a text mining system. This information is described below with reference to FIG. 1.
[0006]
(1) Document
A "document" is the text data to be analyzed by text mining (corresponding to d01 to d12 in FIG. 1). Each document is generally divided into fields.
[0007]
(2) Concept
A "concept" is the set of documents that fall under a specific notion (corresponding to c1 to c6 in FIG. 1). In existing systems, a concept is determined by a synonym dictionary and the set of its canonical forms, and holds the set of documents that discuss a particular notion.
[0008]
(3) Category
A "category" is a structured set of concepts (corresponding to the tree structure in FIG. 1 consisting of the root and the concepts c1 to c6 below it). In existing systems, a category is determined by a category dictionary and holds its set of concepts in a structure such as a paperweight type (a flat structure in which every concept hangs directly under the root) or a tree structure.
[0009]
(4) View
A "view" (viewpoint) is an ordered set of concepts in a category. In existing systems, when the category has a paperweight-type structure, the view is determined as the set of all concepts in the category, ordered alphabetically or similarly (for example, by concept ID), by appearance frequency, or by appearance-frequency ratio (magnification).
[0010]
On the other hand, when the category has a tree structure, the view is determined by the user designating the concept node to be subjected to text mining analysis, and is held as the set of children of the designated concept in the category tree, ordered alphabetically or similarly (for example, by concept ID), by appearance frequency, or by appearance-frequency ratio (magnification).
[0011]
FIG. 1 shows the case in which the category has a tree structure: a first view consisting of c1 and c3, obtained when the concept node designated by the user for text mining analysis is c2, and a second view consisting of c2, c4, c5, and c6, obtained when the designated concept node is the root.
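The conventional, tree-structured case can be sketched as follows: the view is the ordered list of children of the designated concept node. The tree is the one described for FIG. 1; the function name `conventional_view` and the document counts used for the frequency ordering are hypothetical.

```python
# Category tree described for FIG. 1: root -> c2, c4, c5, c6; c2 -> c1, c3.
category = {
    "root": ["c2", "c4", "c5", "c6"],
    "c2": ["c1", "c3"],
}

# Hypothetical appearance frequencies (number of documents per concept).
doc_counts = {"c1": 2, "c2": 3, "c3": 5, "c4": 7, "c5": 1, "c6": 4}

def conventional_view(category, node, order="id", counts=None):
    """Return the children of `node`, ordered by concept ID or by frequency."""
    children = category.get(node, [])
    if order == "frequency":
        return sorted(children, key=lambda c: counts[c], reverse=True)
    return sorted(children)

first_view = conventional_view(category, "c2")
second_view = conventional_view(category, "root")
by_frequency = conventional_view(category, "root", order="frequency", counts=doc_counts)
print(first_view)    # ['c1', 'c3']
print(second_view)   # ['c2', 'c4', 'c5', 'c6']
print(by_frequency)  # ['c4', 'c6', 'c2', 'c5']
```

Because the view is always the full child list of one node, it can contain only sibling concepts.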
[0012]
[Non-patent document 1]
Internet, PubMed URL: http://www.ncbi.nlm.gov/entrez/
[Non-patent document 2]
Internet, URL of the IBM Tokyo Research Laboratory home page introducing its text mining technology: http://www.trl.ibm.com/projects/s7710/tm/index.htm; URL of the home page introducing TAKMI: http://www.trl.ibm.com/projects/s7710/tm/takmi/takmi.htm
[0013]
[Problems to be solved by the invention]
However, existing text mining systems have a fundamental structural problem: the methods of assigning concepts and of assigning views to categories are restricted.
This problem is described more specifically below.
[0014]
In the concept assignment method of an existing text mining system, a concept is determined by a set consisting of the synonyms in a synonym dictionary and their normal form. A concept that is not defined by the synonym dictionary and the category dictionary therefore cannot be handled, so there is a problem that a concept corresponding to a new notion cannot be created.
[0015]
According to the view assignment method in the existing text mining system, when a category has a paperweight type structure, the view is determined as a set in which a specific order is assigned to all concept sets included in the category. In the case of a tree structure, a view is determined as a set of concepts corresponding to the lower-level concepts according to an instruction of a concept node by a user. Therefore, in any case, there is a problem that an extra concept may be included in the view.
[0016]
Also, in the view assignment method of the existing text mining system, there is a problem that concepts having no sibling relationship in the structure cannot be arranged as a view. FIG. 2 is a conceptual diagram illustrating this problem. As shown in FIG. 2, in the view assignment method of an existing text mining system, first, a concept to be analyzed (analysis target concept) is selected from the category (MA-1). Then, the “child concepts” of that concept (that is, the concepts directly connected to it one level below by a single path in the structure) are assigned as the view (MA-2). As described above, according to the conventional view assignment method, only concepts having a sibling relationship in a category can be set as a view, so that only concepts limited to a sibling relationship can be compared.
[0017]
As described above, since the conventional system and the like can use only concepts and categories prepared in advance, concepts cannot be assigned flexibly according to the use situation, and views cannot be assigned independently of the category. As a result, the system is inconvenient for both the user and the administrator of the system, and the utilization efficiency is low.
[0018]
The present invention has been made in view of the above problems, and an object thereof is to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can flexibly assign concepts and views in text mining analysis.
[0019]
[Means for Solving the Problems]
In order to achieve such an object, a text mining analyzer according to claim 1 comprises: concept assigning means for executing assignment of a new concept without using an existing category; category changing means for changing the structure of the category according to the new concept assigned by the concept assigning means; view assigning means for assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure; and text mining analysis means for performing text mining analysis using the view assigned by the view assigning means.
[0020]
According to this device, a new concept is assigned without using an existing category, the structure of the category is changed according to the assigned new concept, an analysis target concept to be subjected to text mining analysis is selected, a view is assigned by setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure, and text mining analysis is performed using the assigned view. Concepts and views can therefore be assigned flexibly.
[0021]
That is, according to this device, a new concept is assigned without using an existing category, so that a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary can be created.
[0022]
Further, according to this device, the structure of the category is changed according to the assigned new concept, so that the concept can be assigned to the category flexibly according to the use scene.
[0023]
Further, according to this device, an analysis target concept to be subjected to text mining analysis is selected, and configuration concepts that constitute a view cut line are set from the concepts existing below the analysis target concept in the category structure. Since a view is assigned in this way and text mining analysis is performed using the assigned view, views can be assigned independently of categories, and a view can be configured flexibly by designating concepts that are not siblings.
[0024]
Further, the text mining analysis device according to claim 2 is the text mining analysis device according to claim 1, wherein the concept assigning means further comprises at least one of: first concept assigning means for designating a search condition relating to a character string or a numerical value and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; second concept assigning means for designating the search condition and a field to be searched and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and third concept assigning means for assigning, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts.
[0025]
This more specifically shows an example of the concept assigning means. According to this device, the concept assigning means further includes at least one of: first concept assigning means that designates a search condition relating to a character string or a numerical value and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; second concept assigning means that designates a search condition and a field to be searched and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and third concept assigning means that assigns, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts. It therefore becomes possible to create a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary.
[0026]
The text mining analyzer according to claim 3 is the text mining analyzer according to claim 1 or 2, wherein the category changing means further comprises at least one of first category changing means for arranging a new concept under an arbitrary concept of an existing category and second category changing means for forming a new category composed of new concepts.
[0027]
This shows one example of the category changing means more specifically. According to this device, the category changing means includes at least one of the first category changing means for arranging a new concept under an arbitrary concept of the existing category and the second category changing means for forming a new category including the new concept. Since one means is further provided, it becomes possible to assign a concept to an existing or new category according to the use situation.
[0028]
The text mining analyzer according to claim 4 is the text mining analyzer according to any one of claims 1 to 3, wherein the view assigning means further comprises attribute setting means for setting an attribute corresponding to the configuration concept, and the text mining analysis means executes the text mining analysis according to the attribute of the configuration concept set by the attribute setting means.
[0029]
This more specifically shows an example of the view assigning means. According to this device, the view assigning means further includes attribute setting means for setting an attribute corresponding to the configuration concept, and the text mining analysis means executes the text mining analysis according to the set attribute of the configuration concept. By setting various attributes (for example, “selection” (select as a configuration concept), “skip” (do not select as a configuration concept), and “other” (analyze in a group separate from the other configuration concepts)), the view can be assigned flexibly.
[0030]
In addition, the present invention relates to a text mining analysis method. The text mining analysis method according to claim 5 comprises: a concept assigning step of executing assignment of a new concept without using an existing category; a category change step of changing the structure of the category according to the new concept assigned in the concept assigning step; a view assigning step of assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure; and a text mining analysis step of performing text mining analysis using the view assigned in the view assigning step.
[0031]
According to this method, a new concept is assigned without using an existing category, the structure of the category is changed according to the assigned new concept, an analysis target concept to be subjected to text mining analysis is selected, a view is assigned by setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure, and text mining analysis is performed using the assigned view. Concepts and views can therefore be assigned flexibly.
[0032]
That is, according to this method, a new concept is assigned without using an existing category, so that a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary can be created.
[0033]
Further, according to this method, the structure of the category is changed according to the assigned new concept, so that the concept can be assigned to the category flexibly according to the use situation.
[0034]
In addition, according to this method, an analysis target concept to be subjected to text mining analysis is selected, and configuration concepts that constitute a view cut line are set from the concepts existing below the analysis target concept in the category structure. Since a view is assigned in this way and text mining analysis is performed using the assigned view, views can be assigned independently of categories, and a view can be configured flexibly by designating concepts that are not siblings.
[0035]
The text mining analysis method according to claim 6 is the text mining analysis method according to claim 5, wherein the concept assigning step further includes at least one of: a first concept assigning step of designating a search condition relating to a character string or a numerical value and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; a second concept assigning step of designating the search condition and a field to be searched and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and a third concept assigning step of assigning, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts.
[0036]
This more specifically shows an example of the concept assigning step. According to this method, the concept assigning step further includes at least one of: a first concept assigning step of designating a search condition relating to a character string or a numerical value and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; a second concept assigning step of designating a search condition and a field to be searched and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and a third concept assigning step of assigning, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts. It therefore becomes possible to create a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary.
[0037]
Further, the text mining analysis method according to claim 7 is the text mining analysis method according to claim 5 or 6, wherein the category change step further includes at least one of a first category change step of arranging a new concept under an arbitrary concept of an existing category and a second category change step of forming a new category composed of new concepts.
[0038]
This more specifically shows an example of the category change step. According to this method, the category change step further includes at least one of a first category change step of arranging a new concept under an arbitrary concept of the existing category and a second category change step of forming a new category composed of new concepts, so that a concept can be assigned to an existing or new category according to the use situation.
[0039]
The text mining analysis method according to claim 8 is the text mining analysis method according to any one of claims 5 to 7, wherein the view assigning step further includes an attribute setting step of setting an attribute corresponding to the configuration concept, and the text mining analysis step performs the text mining analysis according to the attribute of the configuration concept set in the attribute setting step.
[0040]
This more specifically shows an example of the view assigning step. According to this method, the view assigning step further includes an attribute setting step of setting an attribute corresponding to the configuration concept, and the text mining analysis step performs the text mining analysis according to the set attribute of the configuration concept. By setting various attributes (for example, “selection” (select as a configuration concept), “skip” (do not select as a configuration concept), and “other” (analyze in a group separate from the other configuration concepts)), the view can be assigned flexibly.
[0041]
The present invention also relates to a program. The program according to claim 9 causes a computer to execute a text mining analysis method comprising: a concept assigning step of executing assignment of a new concept without using an existing category; a category change step of changing the structure of the category according to the new concept assigned in the concept assigning step; a view assigning step of assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure; and a text mining analysis step of performing text mining analysis using the view assigned in the view assigning step.
[0042]
According to this program, a new concept is assigned without using an existing category, the structure of the category is changed according to the assigned new concept, an analysis target concept to be subjected to text mining analysis is selected, a view is assigned by setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure, and text mining analysis is performed using the assigned view. Concepts and views can therefore be assigned flexibly.
[0043]
That is, according to this program, a new concept is assigned without using an existing category, so that a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary can be created.
[0044]
Further, according to this program, the structure of the category is changed according to the assigned new concept, so that the concept can be assigned to the category flexibly according to the use situation.
[0045]
Further, according to this program, an analysis target concept to be subjected to text mining analysis is selected, and configuration concepts that constitute a view cut line are set from the concepts existing below the analysis target concept in the category structure. Since a view is assigned in this way and text mining analysis is performed using the assigned view, views can be assigned independently of categories, and a view can be configured flexibly by designating concepts that are not siblings.
[0046]
The program according to claim 10 is the program according to claim 9, wherein the concept assigning step further includes at least one of: a first concept assigning step of designating a search condition relating to a character string or a numerical value and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; a second concept assigning step of designating the search condition and a field to be searched and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and a third concept assigning step of assigning, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts.
[0047]
This more specifically shows an example of the concept assigning step. According to this program, the concept assigning step further includes at least one of: a first concept assigning step of designating a search condition relating to a character string or a numerical value and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; a second concept assigning step of designating a search condition and a field to be searched and assigning, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field; and a third concept assigning step of assigning, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts. It therefore becomes possible to create a concept corresponding to a new notion that is not defined by the existing synonym dictionary and category dictionary.
[0048]
The program according to claim 11 is the program according to claim 9 or 10, wherein the category change step further includes at least one of a first category change step of arranging a new concept under an arbitrary concept of an existing category and a second category change step of forming a new category composed of new concepts.
[0049]
This more specifically shows an example of the category change step. According to this program, the category change step includes at least one of a first category change step of arranging a new concept under an arbitrary concept of an existing category and a second category change step of forming a new category including the new concept. Since the method further includes one step, the concept can be assigned to an existing or new category according to the usage situation.
[0050]
The program according to claim 12 is the program according to any one of claims 9 to 11, wherein the view assigning step further includes an attribute setting step of setting an attribute corresponding to the configuration concept, and the text mining analysis step performs the text mining analysis according to the attribute of the configuration concept set in the attribute setting step.
[0051]
This more specifically shows an example of the view assigning step. According to this program, the view assigning step further includes an attribute setting step of setting an attribute corresponding to the configuration concept, and the text mining analysis step performs the text mining analysis according to the set attribute of the configuration concept. By setting various attributes (for example, “selection” (select as a configuration concept), “skip” (do not select as a configuration concept), and “other” (analyze in a group separate from the other configuration concepts)), the view can be assigned flexibly.
[0052]
The present invention also relates to a recording medium, wherein a recording medium according to a thirteenth aspect records the program according to any one of the ninth to twelfth aspects.
[0053]
According to this recording medium, the program recorded on the recording medium is read and executed by a computer, whereby the program described in any one of claims 9 to 12 can be realized using the computer, and the same effects as those of the respective methods can be obtained.
[0054]
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of a text mining analysis device, a text mining analysis method, a program, and a recording medium according to the present invention will be described in detail with reference to the drawings. It should be noted that the present invention is not limited by the embodiment.
[0055]
[Overview of the present invention]
Hereinafter, the outline of the present invention will be described, and then the configuration, processing, and the like of the present invention will be described in detail. FIG. 3 is a flowchart showing the basic principle of the present invention.
The present invention generally has the following basic features.
[0056]
According to the present invention, a new concept is assigned without using an existing category (step SA-1). Here, the details of the process of step SA-1 will be described with reference to FIGS. 4 to 6.
[0057]
First, FIG. 4 is a diagram illustrating an example of a case where a concept is configured by a full-text search. As shown in FIG. 4, first, a search condition relating to a character string or a numerical value is designated (in the example of FIG. 4, a search condition based on a regular expression, represented by the character string “regexp1”, that matches strings such as “Brain ...”, “brain ...”, and “brain-ischema ...”). The documents (d01 to d12 in the example of FIG. 4) are then searched for those in which a character string or numerical value satisfying the search condition exists, and the set of matching documents (d01, d04, d06, d07, d08, and d11 in the example of FIG. 4) may be assigned as a new concept (c1 in the example of FIG. 4).
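The full-text concept assignment can be sketched as follows. This is only an illustration under stated assumptions: the sample documents, the function name assign_concept_by_full_text, and the pattern standing in for “regexp1” are hypothetical, since the actual regular expression and document contents are not given in the text.

```python
import re

# Hypothetical document collection: document ID -> full text.
DOCUMENTS = {
    "d01": "Brain imaging in adult patients",
    "d02": "Liver enzyme kinetics",
    "d03": "Models of the brain-ischema response",
    "d04": "Cardiac output measurement",
}


def assign_concept_by_full_text(documents, pattern):
    """Return the set of document IDs whose text matches the pattern.

    The returned set of documents is what gets assigned as a new concept.
    """
    regex = re.compile(pattern)
    return {doc_id for doc_id, text in documents.items() if regex.search(text)}


# Assumed stand-in for "regexp1"; it matches "Brain...", "brain...", etc.
c1 = assign_concept_by_full_text(DOCUMENTS, r"[Bb]rain\S*")
```

Here the new concept is defined purely by the search condition, so no synonym dictionary or category dictionary entry is needed.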
[0058]
FIG. 5 is a diagram illustrating an example of a case where a concept is configured by a field search. As shown in FIG. 5, first, a search condition relating to a character string or a numerical value (in the example of FIG. 5, a search condition based on a regular expression represented by the character string “regexp2”) and a field to be searched (f1 in the example of FIG. 5) are designated. The documents (d01 to d12 in the example of FIG. 5) are then searched for those in which a character string or numerical value satisfying the search condition exists in the designated field, and the set of matching documents (d02, d03, d05, d06, d08, and d12 in the example of FIG. 5) may be assigned as a new concept (c2 in the example of FIG. 5).
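A field search differs from the full-text case only in that the condition is evaluated against one designated field. The following sketch assumes documents are stored as a mapping from field ID to text; the sample data, the function name assign_concept_by_field, and the pattern standing in for “regexp2” are hypothetical.

```python
import re

# Hypothetical documents divided into fields: document ID -> {field ID: text}.
DOCUMENTS = {
    "d01": {"f1": "rat model", "f2": "brain tissue"},
    "d02": {"f1": "brain slices", "f2": "mouse"},
    "d03": {"f2": "brain"},  # this document has no field f1
}


def assign_concept_by_field(documents, field_id, pattern):
    """Return the IDs of documents whose designated field matches the pattern.

    Documents lacking the field are treated as non-matching.
    """
    regex = re.compile(pattern)
    return {
        doc_id
        for doc_id, fields in documents.items()
        if field_id in fields and regex.search(fields[field_id])
    }


# Assumed stand-in for "regexp2", restricted to field f1.
c2 = assign_concept_by_field(DOCUMENTS, "f1", r"brain")
```

Only d02 is selected, although d01 and d03 contain the same string in another field.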
[0059]
FIG. 6 is a diagram illustrating an example of a case where a concept is configured by a logical set operation. As shown in FIG. 6, a set of documents obtained by performing a logical set operation (for example, AND, OR, SUB, etc.) on existing concepts (c1 and c2 in the example of FIG. 6) may be assigned as a new concept (c3 in the example of FIG. 6).
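Because each concept is a set of documents, the logical set operations map directly onto set intersection, union, and difference. The sketch below reuses the document sets given for c1 (FIG. 4) and c2 (FIG. 5); the function name and the operation table are assumptions, and which operation FIG. 6 actually uses for c3 is not stated in the text.

```python
# Document sets of the existing concepts, as listed for FIG. 4 and FIG. 5.
c1 = {"d01", "d04", "d06", "d07", "d08", "d11"}
c2 = {"d02", "d03", "d05", "d06", "d08", "d12"}

# Assumed mapping of the named operations onto Python set operators.
SET_OPERATIONS = {
    "AND": lambda a, b: a & b,  # documents in both concepts
    "OR": lambda a, b: a | b,   # documents in either concept
    "SUB": lambda a, b: a - b,  # documents in the first but not the second
}


def assign_concept_by_set_operation(operation, left, right):
    """Return the document set produced by the named logical set operation."""
    return SET_OPERATIONS[operation](left, right)


c3_and = assign_concept_by_set_operation("AND", c1, c2)
c3_sub = assign_concept_by_set_operation("SUB", c1, c2)
```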
[0060]
Referring back to FIG. 3, the present invention changes the structure of the category according to the assigned new concept (step SA-2). Here, the details of the process of step SA-2 will be described with reference to FIGS. 7 and 8.
[0061]
First, FIG. 7 is a diagram illustrating an example of a case where a concept is arranged in an existing category. As shown in FIG. 7, a new concept (c7 in the example of FIG. 7) may be arranged below an arbitrary concept (c3 in the example of FIG. 7) of the existing category.
[0062]
FIG. 8 is a diagram illustrating an example of a case where concepts are arranged in a new category. As shown in FIG. 8, a new category (a category having a paperweight type structure in the example of FIG. 8) composed of new concepts (c8, c9, c10, c11, and c12 in the example of FIG. 8) may be formed.
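The two kinds of category change can be sketched as follows, assuming a category is held as a mapping from each concept to its list of children. The function names, the sample category, and the root name "new_root" are hypothetical; the sketch also assumes that a paperweight type structure means a single root with every concept directly beneath it.

```python
def place_under(category, parent, new_concept):
    """Return a copy of the category with new_concept added under parent."""
    if parent not in category:
        raise KeyError(parent)
    updated = {node: list(children) for node, children in category.items()}
    updated[parent].append(new_concept)
    updated.setdefault(new_concept, [])
    return updated


def new_flat_category(root, concepts):
    """Build a new category in which every concept hangs directly off the root."""
    category = {root: list(concepts)}
    for concept in concepts:
        category[concept] = []
    return category


# Hypothetical existing category; c7 is placed under c3 as in FIG. 7.
existing = {"root": ["c2", "c3"], "c2": [], "c3": []}
changed = place_under(existing, "c3", "c7")

# New category made only of new concepts, as in FIG. 8.
created = new_flat_category("new_root", ["c8", "c9", "c10", "c11", "c12"])
```

Returning a copy rather than modifying the existing category in place is a design choice of this sketch, made so that the original dictionary-derived structure stays available.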
[0063]
Referring back to FIG. 3, the present invention selects an analysis target concept to be subjected to text mining analysis, and assigns a view by setting configuration concepts that constitute a view cut line from the concepts existing below the analysis target concept in the category structure (step SA-3). Here, the details of the process of step SA-3 will be described with reference to FIG. 9 and the figures that follow.
[0064]
FIG. 9 is a diagram showing an outline of a view assignment method according to the present invention. As shown in FIG. 9, first, an analysis target concept (the concept indicated by a double circle in FIG. 9) to be subjected to text mining analysis is selected (MB-1). Then, a view is assigned by setting configuration concepts (the concepts indicated by black circles in FIG. 9) that constitute a view cut line (the thick solid line in FIG. 9) from the concepts existing below the analysis target concept in the category structure (MB-2).
[0065]
For example, when narrowing down the papers related to Company X, “Illness” is selected as the analysis target concept (MB-1). For the parts of the category that are closely related to Company X, detailed lower-level concepts can be set as configuration concepts by drilling down, while for the parts that have little relation to Company X, only higher-level concepts are set as configuration concepts.
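The view assignment of steps MB-1 and MB-2 can be sketched as follows. The category below is hypothetical apart from the “Illness” analysis target concept, and the function names are assumptions; the only rule enforced is the one stated in the text, namely that every configuration concept must exist somewhere below the analysis target concept.

```python
# Hypothetical category: concept -> list of child concepts.
CATEGORY = {
    "Illness": ["Cancer", "Infection"],
    "Cancer": ["Lung cancer", "Gastric cancer"],
    "Infection": ["Viral", "Bacterial"],
    "Lung cancer": [], "Gastric cancer": [], "Viral": [], "Bacterial": [],
}


def descendants(category, concept):
    """Collect every concept located below the given concept."""
    found = set()
    stack = list(category.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(category.get(node, []))
    return found


def assign_view(category, target, configuration_concepts):
    """Return the view if all configuration concepts lie below the target."""
    below = descendants(category, target)
    outside = [c for c in configuration_concepts if c not in below]
    if outside:
        raise ValueError(f"not below {target}: {outside}")
    return list(configuration_concepts)


# Drill down under "Cancer" but keep "Infection" at the higher level.
view = assign_view(CATEGORY, "Illness",
                   ["Lung cancer", "Gastric cancer", "Infection"])
```

The resulting view mixes concepts from different levels that are not siblings, which the conventional child-only method cannot express.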
[0066]
FIGS. 10 to 12 are diagrams showing an outline of additional functions using attributes in the view assignment method according to the present invention.
[0067]
FIG. 10 is a conceptual diagram illustrating a case where an attribute of “other” is set in a configuration concept constituting a view cut line. FIG. 10 shows, among the configuration concepts constituting the view cut line, a configuration concept in which “other” is set as an attribute (the concept indicated by a gray circle in FIG. 10). Configuration concepts to which the “other” attribute is set are classified, in the text mining analysis, into a group different from that of normal configuration concepts that do not have the “other” attribute. Here, a plurality of types of “other” attributes can be set.
[0068]
FIG. 11 is a conceptual diagram illustrating a case where an attribute of “skip” is set in a configuration concept constituting a view cut line. FIG. 11 shows, among the configuration concepts constituting the view cut line, a configuration concept in which “skip” is set as an attribute (the concept shown by a square in FIG. 11). A configuration concept to which the “skip” attribute is set is not analyzed in the text mining analysis.
[0069]
FIG. 12 is a conceptual diagram illustrating a case where the attributes “skip” and “other” are mixed and set in the configuration concept of the view cut line. Thus, all views can be set by appropriately setting the “other” and “skip” attributes for all the concepts, with the analysis target concept as the root.
[0070]
Referring back to FIG. 3, the present invention performs a text mining analysis using the assigned view (step SA-4). Here, the analysis target is a configuration concept that forms the view cut line. When an attribute is set for each configuration concept, analysis is performed according to the attribute.
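How the attributes might steer the analysis can be sketched as follows. The text does not specify the analysis itself, so this sketch assumes a simple document count per configuration concept; the function name, the "(other)" label, and the sample data are hypothetical, and only a single type of “other” attribute is handled.

```python
def analyze_view(view, concept_documents):
    """Count documents per configuration concept according to its attribute.

    view: list of (concept ID, attribute) pairs, where the attribute is
    "select", "skip", or "other".
    """
    counts = {}
    other_documents = set()
    has_other = False
    for concept_id, attribute in view:
        documents = concept_documents.get(concept_id, set())
        if attribute == "skip":
            continue  # skipped concepts are not analyzed
        if attribute == "other":
            has_other = True
            other_documents |= documents  # pooled into one separate group
        elif attribute == "select":
            counts[concept_id] = len(documents)
        else:
            raise ValueError(f"unknown attribute: {attribute}")
    if has_other:
        counts["(other)"] = len(other_documents)
    return counts


view = [("c1", "select"), ("c2", "skip"), ("c3", "other"), ("c4", "other")]
concept_documents = {
    "c1": {"d01", "d02"},
    "c2": {"d03"},
    "c3": {"d04"},
    "c4": {"d04", "d05"},
}
result = analyze_view(view, concept_documents)
```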
[0071]
[System configuration]
Next, the configuration of the present system will be described. FIG. 15 is a block diagram illustrating an example of the configuration of the present system to which the present invention is applied, and conceptually illustrates only the portions of the configuration related to the present invention. In this system, schematically, a text mining analysis apparatus 100 and an external system 200, which provides external databases such as a literature database storing various technical documents such as papers and external programs for executing various analysis and search services, are communicably connected via a network 300.
[0072]
In FIG. 15, a network 300 has a function of interconnecting the text mining analyzer 100 and the external system 200, and is, for example, the Internet.
[0073]
In FIG. 15, the external system 200 is interconnected with the text mining analyzer 100 via the network 300, and has a function of providing the user with external databases such as a literature database and with websites that execute external programs for services such as various analyses and searches.
[0074]
Here, the external system 200 may be configured as a WEB server, an ASP server, or the like, and its hardware may be configured by an information processing device such as a commercially available workstation or personal computer and its accompanying devices. Each function of the external system 200 is realized by a CPU, a disk device, a memory device, an input device, an output device, a communication control device, and the like in the hardware configuration of the external system 200, a program for controlling them, and the like.
[0075]
In FIG. 15, the text mining analyzer 100 generally includes a control unit 102 such as a CPU that comprehensively controls the entire text mining analyzer 100, a communication control interface unit 104 connected to a communication device (not shown) such as a router connected to a communication line or the like, an input/output control interface unit 108 connected to the input device 112 and the output device 114, and a storage unit 106 for storing various databases and tables. These units are communicably connected via an arbitrary communication path. Furthermore, the text mining analyzer 100 is communicably connected to the network 300 via a communication device such as a router and a wired or wireless communication line such as a dedicated line.
[0076]
The various databases and tables (document file 106a to analysis result file 106e) stored in the storage unit 106 are held in storage means such as a fixed disk device, which stores various programs, tables, files, databases, web page files, and the like used for various processes.
[0077]
Among the constituent elements of the storage unit 106, the document file 106a is document information storage means that stores information (for example, a document ID, a field ID, text data, and image data) on documents such as various technical documents including papers. Here, each document data stored in the document file 106a may be divided into fields.
[0078]
The concept file 106b is a concept information storage unit that stores information about the concept (for example, a concept ID, a concept of the concept, a search condition of a document included in the concept, a logical set operation instruction, and the like). Here, the concept file 106b may store a document ID included in the concept.
[0079]
The category file 106c is a category information storage unit that stores information on a category (for example, a category ID, structural data on a node (category) included in the category, and an edge (relation between categories)).
[0080]
The view file 106d is a view information storage unit that stores information about a view (for example, a view ID, a concept ID of a configuration concept forming a view cut line, an attribute of the configuration concept, and the like).
[0081]
The analysis result file 106e is an analysis result storage unit that stores information and the like regarding the analysis result of the text mining analysis.
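The records held in the document file 106a, the concept file 106b, the category file 106c, and the view file 106d could be modelled roughly as follows. This is only a minimal sketch: the class and attribute names are assumptions made for illustration, image data is omitted, and the specification does not prescribe any particular schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """One record of the document file 106a (hypothetical layout)."""
    doc_id: str
    fields: dict[str, str]  # field ID -> text data (image data omitted)


@dataclass
class Concept:
    """One record of the concept file 106b (hypothetical layout)."""
    concept_id: str
    description: str
    search_condition: Optional[str] = None  # condition selecting the documents
    set_operation: Optional[str] = None     # logical set operation instruction
    doc_ids: set[str] = field(default_factory=set)  # optional member documents


@dataclass
class Category:
    """One record of the category file 106c: nodes plus edges between them."""
    category_id: str
    nodes: set[str] = field(default_factory=set)              # concept IDs
    edges: set[tuple[str, str]] = field(default_factory=set)  # (parent, child)


@dataclass
class View:
    """One record of the view file 106d (hypothetical layout)."""
    view_id: str
    # concept ID of each constituent concept on the cut line -> its attribute
    cut_line: dict[str, str] = field(default_factory=dict)


doc = Document("d1", {"title": "p53 and apoptosis", "abstract": "..."})
concept = Concept("c1", "documents mentioning p53", search_condition="p53")
concept.doc_ids.add(doc.doc_id)
```

Keeping the member document IDs on the concept is optional here, mirroring the statement that the concept file may store them.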
[0082]
In FIG. 15, the communication control interface unit 104 controls communication between the text mining analyzer 100 and the network 300 (or a communication device such as a router). That is, the communication control interface unit 104 has a function of communicating data with another terminal via a communication line.
[0083]
In FIG. 15, the input / output control interface unit 108 controls the input device 112 and the output device 114. Here, as the output device 114, in addition to a monitor (including a home television), a speaker can be used (in the following, the output device 114 may be described as a monitor). As the input device 112, a keyboard, a mouse, a microphone, and the like can be used. The monitor also realizes a pointing device function in cooperation with the mouse.
[0084]
In FIG. 15, the control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data, and performs information processing for executing various processes using these programs. The control unit 102 conceptually includes a concept assigning unit 102a, a category changing unit 102b, a view assigning unit 102c, a text mining analysis unit 102d, a first concept assigning unit 102e, a second concept assigning unit 102f, a third concept assigning unit 102g, a first category changing unit 102h, a second category changing unit 102i, and an attribute setting unit 102j.
[0085]
Among them, the concept assigning unit 102a is a concept assigning unit that executes assignment of a new concept without using an existing category. Here, FIG. 16 is a block diagram illustrating an example of the configuration of the concept assigning unit 102a. As shown in FIG. 16, the concept assigning unit 102a includes a first concept assigning unit 102e, a second concept assigning unit 102f, and a third concept assigning unit 102g.
[0086]
Here, the first concept assigning unit 102e is first concept assigning means that designates a search condition on a character string or numerical value and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists.
[0087]
The second concept assigning unit 102f is second concept assigning means that designates a search condition and a field to be searched and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document.
[0088]
The third concept assigning unit 102g is a third concept assigning unit that assigns a set of documents acquired by performing a logical set operation on an existing concept as a new concept.
[0089]
Referring back to FIG. 15, the category changing unit 102b is a category changing unit that changes the structure of the category according to the new concept assigned by the concept assigning unit. Here, FIG. 17 is a block diagram illustrating an example of the configuration of the category changing unit 102b. As shown in FIG. 17, the category change unit 102b includes a first category change unit 102h and a second category change unit 102i.
[0090]
The first category changing unit 102h is a first category changing unit that arranges a new concept under an arbitrary concept of an existing category.
[0091]
The second category changing unit 102i is a second category changing unit that forms a new category including a new concept.
[0092]
Returning to FIG. 15 again, the view assigning unit 102c is view assigning means that assigns a view by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the category structure, the constituent concepts that form a view cut line. Here, FIG. 18 is a block diagram illustrating an example of the configuration of the view assigning unit 102c. As shown in FIG. 18, the view assigning unit 102c includes an attribute setting unit 102j.
[0093]
Here, the attribute setting unit 102j is an attribute setting unit that sets an attribute corresponding to the configuration concept.
[0094]
Referring back to FIG. 15, the text mining analysis unit 102d is text mining analysis means that performs text mining analysis using the view assigned by the view assigning unit 102c. The text mining analysis unit 102d also has a function of executing the text mining analysis according to the attributes of the constituent concepts set by the attribute setting unit 102j.
[0095]
The details of the processing performed by these units will be described later.
[0096]
[System processing]
Next, an example of the processing of the present system according to the present embodiment, configured as described above, will be described in detail below with reference to the drawings.
[0097]
[Main processing]
Details of the main processing executed by the text mining analyzer 100 of the present invention will be described with reference to FIG. 19. FIG. 19 is a flowchart illustrating an example of main processing of the present system in the present embodiment.
[0098]
First, through the processing of the concept assigning unit 102a, the text mining analyzer 100 executes the concept assignment process, which assigns a new concept without using an existing category (step SB-1).
[0099]
Here, the concept assignment process executed by the concept assigning unit 102a will be described below with reference to FIG. 20.
[0100]
[Concept assignment processing]
FIG. 20 is a flowchart illustrating an example of the concept assignment process of the present system in the present embodiment.
[0101]
The concept assignment process can be executed by any one of a first concept assignment process, a second concept assignment process, and a third concept assignment process, which will be described in detail below, or in any combination.
[0102]
(First concept assignment process)
As described above with reference to FIG. 4, through the processing of the first concept assigning unit 102e, the concept assigning unit 102a designates a search condition on a character string or numerical value and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists (step SC-1).
[0103]
(Second concept assignment process)
As described above with reference to FIG. 5, through the processing of the second concept assigning unit 102f, the concept assigning unit 102a designates a search condition and a field to be searched and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document (step SC-2).
[0104]
(Third concept assignment process)
As described above with reference to FIG. 6, through the processing of the third concept assigning unit 102g, the concept assigning unit 102a assigns, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts (step SC-3).
[0105]
Thus, the concept assignment process ends.
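The three concept assignment processes (steps SC-1 to SC-3) can be sketched as follows. This is an illustrative sketch only: the in-memory document store, the function names, and the use of regular expressions as the search condition are all assumptions, and conditions on numerical values are not covered.

```python
import re

# Hypothetical in-memory document store: document ID -> {field ID: text}.
DOCS = {
    "d1": {"title": "p53 and apoptosis",
           "abstract": "p53 induces apoptosis in tumor cells."},
    "d2": {"title": "Kinase inhibitors",
           "abstract": "A review that mentions p53 only briefly."},
    "d3": {"title": "Yeast genetics",
           "abstract": "Budding yeast as a model organism."},
}


def assign_by_full_text(docs, pattern):
    """SC-1: documents in which any field satisfies the search condition."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {doc_id for doc_id, fields in docs.items()
            if any(rx.search(text) for text in fields.values())}


def assign_by_field(docs, pattern, field_id):
    """SC-2: documents in which the designated field satisfies the condition."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {doc_id for doc_id, fields in docs.items()
            if rx.search(fields.get(field_id, ""))}


def assign_by_set_operation(op, left, right):
    """SC-3: logical set operation on two existing concepts."""
    operations = {
        "and": set.intersection,
        "or": set.union,
        "not": set.difference,  # documents in `left` but not in `right`
    }
    return operations[op](left, right)


p53_anywhere = assign_by_full_text(DOCS, r"p53")               # SC-1
p53_in_title = assign_by_field(DOCS, r"p53", "title")          # SC-2
p53_body_only = assign_by_set_operation(                       # SC-3
    "not", p53_anywhere, p53_in_title)
```

Each new concept is simply a set of document IDs, so none of the three processes needs to consult an existing category.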
[0106]
Referring back to FIG. 19, through the processing of the category changing unit 102b, the text mining analyzer 100 executes the category change process, which changes the category structure according to the assigned new concept (step SB-2).
[0107]
Here, the category change process executed by the category changing unit 102b will be described below with reference to FIG. 21.
[0108]
[Category change process]
Next, details of the category change process will be described with reference to FIG. 21. FIG. 21 is a flowchart illustrating an example of the category change process of the present system in the present embodiment.
The category change process can be executed by either one of the first category change process and the second category change process, which are described in detail below, or by combining them in an arbitrary order.
[0109]
(First category change processing)
As described above with reference to FIG. 7, the category changing unit 102b arranges a new concept under an arbitrary concept of an existing category by the processing of the first category changing unit 102h (step SD-1).
[0110]
(Second category change processing)
As described above with reference to FIG. 8, the category changing unit 102b configures a new category including a new concept by the processing of the second category changing unit 102i (step SD-2).
[0111]
Thus, the category change process ends.
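The two category change processes (steps SD-1 and SD-2) can be illustrated with a small tree structure. The sketch below is an assumption-laden example: the `CategoryTree` class, its method names, and the concept IDs are hypothetical, and a new category is built here as a flat tree under a root supplied by the caller.

```python
class CategoryTree:
    """A category held as a tree of concept IDs (hypothetical helper)."""

    def __init__(self, root):
        self.root = root
        self.children = {root: []}  # concept ID -> list of child concept IDs

    def add_under(self, parent, concept_id):
        """SD-1: place a concept under an arbitrary existing concept."""
        if parent not in self.children:
            raise KeyError(f"unknown parent concept: {parent}")
        if concept_id in self.children:
            raise ValueError(f"concept already in category: {concept_id}")
        self.children[parent].append(concept_id)
        self.children[concept_id] = []


def new_category(root, concept_ids):
    """SD-2: form a new category consisting of the given new concepts."""
    tree = CategoryTree(root)
    for concept_id in concept_ids:
        tree.add_under(root, concept_id)
    return tree


# SD-1: hang a newly assigned concept below a concept of an existing category.
existing = CategoryTree("genes")
existing.add_under("genes", "tumor_suppressors")
existing.add_under("tumor_suppressors", "p53_in_title")

# SD-2: build a separate category out of new concepts only.
fresh = new_category("my_analysis", ["p53_in_title", "p53_body_only"])
```

Because the same concept ID can appear in both trees, the two processes can be combined freely, as the text above allows.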
[0112]
Returning to FIG. 19 again, through the processing of the view assigning unit 102c, the text mining analyzer 100 assigns a view by selecting the analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the category structure, the constituent concepts that form a view cut line (step SB-3).
[0113]
That is, the view assigning unit 102c displays the view setting screen shown in FIG. 13 on the output device 114 and prompts the user to set the analysis target concept and the constituent concepts that form the view cut line.
[0114]
Here, FIG. 13 is a diagram showing an example of a view setting screen displayed on the output device 114 of the text mining analyzer 100. The example illustrated in FIG. 13 uses a tree-structured category in which three child concepts (concept IDs catA, catB, and catC) are connected to the root and a plurality of child concepts are connected to each of them.
[0115]
In FIG. 13, a sign displayed to the left of each concept ID distinguishes an unexpanded concept ("+"), an expanded concept ("-"), and a terminal concept (no sign). When the user designates an unexpanded concept with the input device 112, for example by clicking its "+" sign with the mouse, its lower concepts are displayed and its sign switches to that of an expanded concept ("-"). That is, by opening and closing lower concepts with these signs, the user can designate arbitrary concepts as the constituent concepts that form the cut line.
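One way to read this screen behaviour is that the cut line is the set of concepts currently visible at the frontier of the tree: collapsed concepts and terminal concepts. The sketch below follows that reading, which is an interpretation rather than something the specification states outright. The tree mirrors the shape described for FIG. 13, but the grandchild concept IDs and the function name are made up.

```python
# Hypothetical category tree in the shape described for FIG. 13:
# concept ID -> child concept IDs (grandchild IDs are invented).
TREE = {
    "root": ["catA", "catB", "catC"],
    "catA": ["catA1", "catA2"],
    "catB": ["catB1", "catB2"],
    "catC": ["catC1", "catC2"],
}


def cut_line(tree, target, expanded):
    """Return the constituent concepts below `target` for an expansion state.

    A concept lies on the cut line if it is a terminal concept or is
    currently collapsed ("+"); an expanded concept ("-") is replaced by
    the cut line of its own children.
    """
    line = []
    for child in tree.get(target, []):
        if child in expanded and tree.get(child):
            line.extend(cut_line(tree, child, expanded))
        else:
            line.append(child)
    return line


print(cut_line(TREE, "root", expanded=set()))
# ['catA', 'catB', 'catC']
print(cut_line(TREE, "root", expanded={"catB"}))
# ['catA', 'catB1', 'catB2', 'catC']
```

The second call shows a cut line that mixes concepts from different levels of the tree, such as catA and catB1, which are not siblings.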
[0116]
To the right of the concept IDs of unexpanded concepts and terminal concepts, a selection area for setting an attribute is displayed. When the user uses the input device 112 to choose "select" (select as a constituent concept), "skip" (do not select as a constituent concept), or "other" (place in an "other" group separate from the other constituent concepts), the attribute setting unit 102j stores the attribute value of the designated constituent concept in a predetermined storage area of the view file 106d.
[0117]
Returning to FIG. 19 again, through the processing of the text mining analysis unit 102d, the text mining analyzer 100 executes the text mining analysis using the assigned view (step SB-4).
[0118]
Here, FIG. 14 is a diagram illustrating an example of a screen that displays a text mining analysis result. FIG. 14 illustrates an example in which the number of documents is displayed for each concept specified in the view.
[0119]
As shown in FIG. 14, the number of documents belonging to each constituent concept is displayed. Under concept B, the concepts whose attribute is "other" are displayed together as a single group.
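A per-concept document count that honours the three attributes could look like the sketch below. It rests on stated assumptions: the function name and sample data are hypothetical, "skip" concepts are simply left out of the tally, and all "other" concepts are merged into one group, whereas FIG. 14 appears to group them under their parent concept.

```python
def count_by_view(cut_line_attrs, concept_docs):
    """Tally documents per constituent concept on a view cut line.

    cut_line_attrs: concept ID -> "select" | "skip" | "other"
    concept_docs:   concept ID -> set of document IDs
    """
    counts = {}
    other_docs = set()
    has_other = False
    for concept_id, attr in cut_line_attrs.items():
        docs = concept_docs.get(concept_id, set())
        if attr == "select":
            counts[concept_id] = len(docs)
        elif attr == "other":
            has_other = True
            other_docs |= docs  # union, so a shared document counts once
        elif attr != "skip":
            raise ValueError(f"unknown attribute: {attr}")
    if has_other:
        counts["other"] = len(other_docs)
    return counts


# Hypothetical view and concept membership.
view = {"catA": "select", "catB1": "select",
        "catB2": "other", "catB3": "other", "catC": "skip"}
docs = {"catA": {"d1", "d2"}, "catB1": {"d3"},
        "catB2": {"d4", "d5"}, "catB3": {"d5", "d6"}, "catC": {"d7"}}

result = count_by_view(view, docs)
print(result)  # {'catA': 2, 'catB1': 1, 'other': 3}
```

Document d5 belongs to both catB2 and catB3 but is counted once in the "other" group, because the group is built as a set union.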
[0120]
Thus, the main process ends.
[0121]
[Other embodiments]
Although the embodiments of the present invention have been described above, the present invention is not limited to these embodiments and may be implemented in various different embodiments within the scope of the technical idea described in the claims.
[0122]
For example, the case where the text mining analyzer 100 performs the processing in a stand-alone form has been described as an example, but the processing may instead be performed in response to a request from a client terminal configured separately from the text mining analyzer 100, with the processing result returned to that client terminal.
[0123]
Further, of the processes described in the embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method.
[0124]
In addition, the processing procedures, control procedures, specific names, information including parameters such as various registration data and search conditions, screen examples, and database configurations shown in the above description and drawings can be changed arbitrarily unless otherwise noted.
[0125]
Also, regarding the text mining analysis device 100, the illustrated components are functionally conceptual and do not necessarily need to be physically configured as illustrated.
[0126]
For example, all or any part of the processing functions provided in each unit or each device of the text mining analyzer 100, in particular the processing functions performed by the control unit 102, can be realized by a CPU (Central Processing Unit) and a program interpreted and executed by the CPU, or can be realized as hardware by wired logic. The program is recorded on a recording medium described later and is mechanically read by the text mining analyzer 100 as needed.
[0127]
That is, a computer program for giving instructions to the CPU in cooperation with an OS (Operating System) and performing various processes is recorded in the storage unit 106 such as a ROM or an HD. This computer program is executed by being loaded into a RAM or the like, and configures the control unit 102 in cooperation with the CPU. This computer program may also be recorded in an application program server connected to the text mining analyzer 100 via an arbitrary network 300, and all or part of it may be downloaded as necessary.
[0128]
Further, the program according to the present invention can be stored in a computer-readable recording medium. Here, the "recording medium" includes any "portable physical medium" such as a flexible disk, a magneto-optical disk, a ROM, an EPROM, an EEPROM, a CD-ROM, an MO, or a DVD; any "fixed physical medium" such as a ROM, RAM, or HD built into various computer systems; and a "communication medium" that holds the program for a short term, such as a communication line or carrier wave used when the program is transmitted via a network represented by a LAN, a WAN, or the Internet.
[0129]
The "program" is a data processing method described in an arbitrary language or description method, and may be in any format such as source code or binary code. The "program" is not necessarily limited to a single program; it includes programs configured in a distributed manner as a plurality of modules or libraries and programs that achieve their functions in cooperation with a separate program represented by an OS (Operating System). Note that known configurations and procedures can be used for the specific configuration for reading the recording medium, the reading procedure, the installation procedure after reading, and the like in each apparatus described in the embodiments.
[0130]
The text mining analyzer 100 may further include, as additional components, an input device (not shown) including various pointing devices such as a mouse, a keyboard, an image scanner, a digitizer, and the like; a display device (not shown) used to monitor input data; a clock generator (not shown) for generating a system clock; and an output device (not shown) such as a printer for outputting various processing results and other data. The output device may be connected to the control unit 102 via an input/output interface.
[0131]
Further, the text mining analyzer 100 may be realized by connecting peripheral devices such as a printer, a monitor, and an image scanner to an information processing device such as a known personal computer, workstation, or other information processing terminal, and installing on that information processing device software (including programs, data, and the like) that realizes the method of the present invention.
[0132]
Furthermore, the specific form of distribution and integration of the text mining analyzer 100 and the like is not limited to that shown in the description and the drawings; the analyzer can be physically distributed or integrated (for example, by grid computing). For example, each database may be configured independently as a separate database device, and part of the processing may be realized using CGI (Common Gateway Interface).
[0133]
The network 300 has a function of interconnecting the text mining analyzer 100 and the external system 200, and may include, for example, any of the Internet, an intranet, a LAN (both wired and wireless), a VAN, a personal computer communication network, a public telephone network (both analog and digital), a private line network (both analog and digital), a CATV network, a mobile circuit-switched or mobile packet-switched network such as an IMT2000, GSM, or PDC/PDC-P system, a radio paging network, a local radio network such as Bluetooth, a PHS network, and a satellite communication network such as CS, BS, or ISDB. That is, the present system can transmit and receive various data via any network, wired or wireless.
[0134]
[Effects of the invention]
As described above in detail, according to the present invention, a new concept is assigned without using an existing category; the structure of the category is changed according to the assigned new concept; a view is assigned by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the category structure, the constituent concepts that form a view cut line; and text mining analysis is performed using the assigned view. It is therefore possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can flexibly assign concepts and views in text mining analysis.
[0135]
Further, according to the present invention, a new concept is assigned without using an existing category, so it is possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can create a concept corresponding to a new concept not defined in the existing synonym dictionary or category dictionary.
[0136]
Further, according to the present invention, the structure of a category is changed according to the assigned new concept, so it is possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can flexibly assign a concept to a category according to the use situation.
[0137]
Further, according to the present invention, a view is assigned by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the category structure, the constituent concepts that form a view cut line, and text mining analysis is performed using the assigned view. It is therefore possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can assign views independently of categories and can flexibly designate non-sibling concepts.
[0138]
Further, according to the present invention, the concept assigning means (or the "concept assigning step"; the same applies hereinafter) further includes at least one of: first concept assigning means that designates a search condition on a character string or numerical value and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists; second concept assigning means that designates a search condition and a field to be searched and assigns, as a new concept, the set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document; and third concept assigning means that assigns, as a new concept, the set of documents obtained by performing a logical set operation on existing concepts. It is therefore possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can create a concept corresponding to a new concept not defined in the existing synonym dictionary or category dictionary.
[0139]
Further, according to the present invention, the category changing means (or the "category changing step"; the same applies hereinafter) further includes at least one of first category changing means that places a new concept under an arbitrary concept of an existing category and second category changing means that forms a new category including the new concept. It is therefore possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can flexibly assign a concept to an existing or new category according to the use situation.
[0140]
Further, according to the present invention, the view assigning means (or the "view assigning step"; the same applies hereinafter) further includes attribute setting means that sets an attribute corresponding to each constituent concept, and the text mining analysis means performs the text mining analysis according to the set attributes of the constituent concepts. It is therefore possible to provide a text mining analysis apparatus, a text mining analysis method, a program, and a recording medium that can flexibly assign views and perform analysis according to various attributes (for example, "select" (select as a constituent concept), "skip" (do not select as a constituent concept), and "other" (place in an "other" group separate from the other constituent concepts)).
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the concepts of a document, a concept, a category, and a view, which are information handled in text mining analysis in a text mining system.
FIG. 2 is a conceptual diagram illustrating a problem of the related art.
FIG. 3 is a flowchart showing the basic principle of the present invention.
FIG. 4 is a diagram illustrating an example of constructing a concept by full-text search.
FIG. 5 is a diagram illustrating an example of constructing a concept by field search.
FIG. 6 is a diagram illustrating an example of constructing a concept by a logical set operation.
FIG. 7 is a diagram illustrating an example of a case where a concept is arranged in an existing category.
FIG. 8 is a diagram showing an example of a case where a concept is arranged in a new category.
FIG. 9 is a diagram showing an outline of a view assigning method according to the present invention.
FIG. 10 is a conceptual diagram illustrating a case where an attribute “other” is set in a configuration concept forming a view cut line.
FIG. 11 is a conceptual diagram illustrating a case where an attribute of “skip” is set in a configuration concept forming a view cut line.
FIG. 12 is a conceptual diagram illustrating a case where attributes “skip” and “other” are mixed and set in a configuration concept forming a view cut line.
FIG. 13 is a diagram showing an example of a view setting screen displayed on the output device 114 of the text mining analyzer 100.
FIG. 14 is a diagram showing an example of a screen displaying a text mining analysis result.
FIG. 15 is a block diagram showing an example of the configuration of the present system to which the present invention is applied.
FIG. 16 is a block diagram illustrating an example of a configuration of a concept assigning unit 102a.
FIG. 17 is a block diagram illustrating an example of a configuration of a category changing unit 102b.
FIG. 18 is a block diagram illustrating an example of a configuration of a view assigning unit 102c.
FIG. 19 is a flowchart illustrating an example of main processing of the present system in the present embodiment.
FIG. 20 is a flowchart illustrating an example of a concept assignment process of the present system in the present embodiment.
FIG. 21 is a flowchart illustrating an example of a category change process of the present system in the present embodiment.
[Explanation of symbols]
100 Text mining analyzer
102 Control unit
102a Concept assigning unit
102b Category changing unit
102c View assigning unit
102d Text mining analysis unit
102e First concept assigning unit
102f Second concept assigning unit
102g Third concept assigning unit
102h First category changing unit
102i Second category changing unit
102j Attribute setting unit
104 Communication control interface unit
106 Storage unit
106a Document file
106b Concept file
106c Category file
106d View file
106e Analysis result file
108 Input/output control interface unit
112 Input device
114 Output device
200 External system
300 Network

Claims

A text mining analyzer, comprising:
concept assigning means for assigning a new concept without using an existing category;
category changing means for changing the structure of a category according to the new concept assigned by the concept assigning means;
view assigning means for assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the structure of the category, constituent concepts that form a view cut line; and
text mining analysis means for performing text mining analysis using the view assigned by the view assigning means.

The text mining analyzer according to claim 1, wherein the concept assigning means further comprises at least one of:
first concept assigning means for designating a search condition on a character string or numerical value and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists;
second concept assigning means for designating the search condition and a field to be searched and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document; and
third concept assigning means for assigning, as a new concept, a set of documents obtained by performing a logical set operation on existing concepts.

The text mining analyzer according to claim 1, wherein the category changing means further comprises at least one of:
first category changing means for placing the new concept under an arbitrary concept of an existing category; and
second category changing means for forming a new category that includes the new concept.

The text mining analyzer according to any one of claims 1 to 3, wherein
the view assigning means further comprises attribute setting means for setting an attribute corresponding to each constituent concept, and
the text mining analysis means performs the text mining analysis according to the attribute of the constituent concept set by the attribute setting means.

A text mining analysis method, comprising:
a concept assigning step of assigning a new concept without using an existing category;
a category changing step of changing the structure of a category according to the new concept assigned in the concept assigning step;
a view assigning step of assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the structure of the category, constituent concepts that form a view cut line; and
a text mining analysis step of performing text mining analysis using the view assigned in the view assigning step.

The text mining analysis method according to claim 5, wherein the concept assigning step further comprises at least one of:
a first concept assigning step of designating a search condition on a character string or numerical value and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists;
a second concept assigning step of designating the search condition and a field to be searched and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document; and
a third concept assigning step of assigning, as a new concept, a set of documents obtained by performing a logical set operation on existing concepts.

The text mining analysis method according to claim 5, wherein the category changing step further comprises at least one of:
a first category changing step of placing the new concept under an arbitrary concept of an existing category; and
a second category changing step of forming a new category that includes the new concept.

The text mining analysis method according to any one of claims 5 to 7, wherein
the view assigning step further comprises an attribute setting step of setting an attribute corresponding to each constituent concept, and
the text mining analysis step performs the text mining analysis according to the attribute of the constituent concept set in the attribute setting step.

A program for causing a computer to execute a text mining analysis method comprising:
a concept assigning step of assigning a new concept without using an existing category;
a category changing step of changing the structure of a category according to the new concept assigned in the concept assigning step;
a view assigning step of assigning a view by selecting an analysis target concept to be subjected to text mining analysis and setting, from among the concepts located below the analysis target concept in the structure of the category, constituent concepts that form a view cut line; and
a text mining analysis step of performing text mining analysis using the view assigned in the view assigning step.

The program according to claim 9, wherein the concept assigning step further comprises at least one of:
a first concept assigning step of designating a search condition on a character string or numerical value and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists;
a second concept assigning step of designating the search condition and a field to be searched and assigning, as a new concept, a set of documents in which a character string or numerical value satisfying the search condition exists in that field of the document; and
a third concept assigning step of assigning, as a new concept, a set of documents obtained by performing a logical set operation on existing concepts.

The program according to claim 9, wherein the category changing step further comprises at least one of:
a first category changing step of placing the new concept under an arbitrary concept of an existing category; and
a second category changing step of forming a new category that includes the new concept.

The program according to any one of claims 9 to 11, wherein
the view assigning step further comprises an attribute setting step of setting an attribute corresponding to each constituent concept, and
the text mining analysis step performs the text mining analysis according to the attribute of the constituent concept set in the attribute setting step.

A computer-readable recording medium on which the program according to any one of claims 9 to 12 is recorded.