JP3955522B2

JP3955522B2 - Data analysis apparatus and method, and program

Info

Publication number: JP3955522B2
Application number: JP2002326698A
Authority: JP
Inventors: 真佐野
Original assignee: 株式会社ジャストシステム (JustSystems Corporation)
Priority date: 2002-11-11
Filing date: 2002-11-11
Publication date: 2007-08-08
Anticipated expiration: 2022-11-11
Also published as: JP2004164079A

Description

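[0001]
[Technical Field of the Invention]
The present invention relates to a data analysis apparatus, a data analysis method, and a program that use text mining techniques.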
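[0002]
[Prior Art]
In recent years, formulating management or marketing strategies effectively has come to require data analysis that derives the trends and patterns, up to the present, that matter for management or marketing. Data mining is applied as a technique for deriving such trends and patterns. When data mining handles customer information, the analysis centers on quantitative customer information (attribute information) grounded in unambiguous, concrete facts, such as questionnaire results.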
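[0003]
For example, data mining applied to purchase histories takes as input numerical information such as customer age and purchase date and time, together with attribute data on the purchased items, and mines the causal relationships among the attribute data to perform cross-sell/up-sell analysis. As an application in product development, taking notebook PC development as an example, perceived quality is evaluated with techniques such as conjoint analysis from combinations of attribute data such as battery life and the presence or absence of a DVD drive.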
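[0004]
Analysis by data mining thus deals with attribute information grounded in unambiguous, concrete facts and does not address the subjective intention of the customer behind them. This leads to problems such as the following: (1) the place, time, and circumstances in which the customer made an evaluation are not taken into account; (2) it is unclear what thinking led the customer to a purchase; (3) it is unclear why the customer bought the product in that situation; and (4) it is unclear on what grounds the customer rated a particular model highly (or poorly).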
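[0005]
To go beyond analyzing unambiguous, concrete information as data mining does, text mining has been proposed, which analyzes, for example, customers' free-text answers. The customer answers targeted by text mining may contain the customer's subjective intention, so text mining is expected to yield information more useful for management and marketing.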
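[0006]
[Problems to Be Solved by the Invention]
However, many text mining techniques to date have merely followed data mining techniques. They treat the words contained in a text as keywords, that is, as unambiguous, concrete data, and analyze them only in the same way as data mining, leaving the customer's subjective intention behind those words almost entirely unanalyzed.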
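[0007]
As a result, information about the customer's subjective intention was missing from the data generated for management or marketing analysis. Moreover, text mining that merely follows data mining techniques produces results of no better quality than data mining, so formulating management or marketing strategies from those results relied heavily on the experience and intuition of the strategist.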
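[0008]
An object of the present invention is to provide a data analysis apparatus, a data analysis method, and a program that capture subjective information by text mining techniques and thereby perform accurate data analysis.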
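[0009]
Another object of the present invention is to provide a data analysis apparatus, a data analysis method, and a program that provide analysis results which make it easier to formulate management strategies, marketing strategies, and the like.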
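[0010]
[Means for Solving the Problems]
To achieve the above objects, a data analysis apparatus according to a first aspect of the present invention comprises:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can appear in a text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values, each level value indicating whether the modifier phrase is semantically favorable or unfavorable;
noun phrase extraction means for, when the start of text analysis is instructed, referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting noun phrases from each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting the modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
affect degree calculation means for, for each description part separated by the separation means, totaling the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means for each noun phrase extracted by the noun phrase extraction means, and calculating the totaled level value as the affect degree of that noun phrase; and
affect degree output means for outputting the affect degree of each noun phrase calculated by the affect degree calculation means.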
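[0011]
The data analysis apparatus according to the first aspect may further comprise noun phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each noun phrase extracted by the noun phrase extraction means in each description part of the text to be analyzed.
In this case, the affect degree output means may output, for each noun phrase, the affect degree calculated by the affect degree calculation means visually associated with the frequency of appearance calculated by the noun phrase frequency calculation means.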
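[0012]
The data analysis apparatus according to the first aspect may further comprise:
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted by the modifier phrase extraction means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, an aggregation degree for each modifier phrase, the aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
aggregation degree output means for outputting the aggregation degree of each modifier phrase calculated by the aggregation degree calculation means.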
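[0013]
In this case, the data analysis apparatus according to the first aspect may further comprise modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each modifier phrase extracted by the modifier phrase extraction means in each description part of the text to be analyzed.
The aggregation degree output means may then output, for each modifier phrase, the aggregation degree calculated by the aggregation degree calculation means visually associated with the frequency of appearance calculated by the modifier phrase frequency calculation means.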
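[0014]
To achieve the above objects, a data analysis apparatus according to a second aspect of the present invention comprises:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can appear in a text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
modifier phrase extraction means for, when the start of text analysis is instructed, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted by the modifier phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, an aggregation degree for each modifier phrase, the aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
aggregation degree output means for outputting the aggregation degree of each modifier phrase calculated by the aggregation degree calculation means.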
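[0015]
The data analysis apparatus according to the second aspect may further comprise modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each modifier phrase extracted by the modifier phrase extraction means in each description part of the text to be analyzed.
In this case, the aggregation degree output means may output, for each modifier phrase, the aggregation degree calculated by the aggregation degree calculation means visually associated with the frequency of appearance calculated by the modifier phrase frequency calculation means.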
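[0018]
To achieve the above objects, a data analysis method according to a third aspect of the present invention is performed in a computer device that comprises dictionary storage means storing a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically favorable or unfavorable, and that has storage means used as a work area. The method comprises:
when the start of analysis is instructed, referring to the noun phrase dictionary stored in the dictionary storage means, extracting noun phrases from each description part of the text to be analyzed, the text being stored in the storage means and including a description part concerning a current state and a description part concerning a future state related to the current state, and storing the extracted noun phrases in the storage means;
referring to the modifier phrase dictionary stored in the dictionary storage means, extracting the modifier phrases that modify the noun phrases extracted and stored in the storage means, and storing the extracted modifier phrases in the storage means;
separating, in the text to be analyzed stored in the storage means, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a frequent phrase dictionary further stored in the dictionary storage means, and storing the separated description parts in the storage means;
for each separated description part, totaling the level values registered in the modifier phrase dictionary for the modifier phrases extracted and stored in the storage means for each noun phrase extracted and stored in the storage means, calculating the totaled level value as the affect degree of that noun phrase, and storing the calculated affect degree in the storage means; and
outputting the affect degree of each noun phrase, calculated and stored in the storage means, from an output device of the computer device.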
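[0020]
To achieve the above objects, a data analysis method according to a fourth aspect of the present invention is performed in a computer device that comprises dictionary storage means storing a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, and that has storage means used as a work area. The method comprises:
when the start of analysis is instructed, referring to the modifier phrase dictionary stored in the dictionary storage means, extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed, the text being stored in the storage means and including a description part concerning a current state and a description part concerning a future state related to the current state, and storing the extracted modifier phrases in the storage means;
referring to the noun phrase dictionary stored in the dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted and stored in the storage means;
separating, in the text to be analyzed stored in the storage means, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a frequent phrase dictionary further stored in the dictionary storage means, and storing the separated description parts in the storage means;
for each separated description part, calculating, on the basis of the noun phrase extraction results, an aggregation degree for each modifier phrase indicating the degree to which the modifier phrases extracted and stored in the storage means modify the same noun phrase, and storing the calculated aggregation degree in the storage means; and
outputting the aggregation degree of each modifier phrase, calculated and stored in the storage means, from an output device of the computer device.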
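[0021]
The data analysis method according to the fourth aspect may further comprise calculating the frequency of appearance of each extracted modifier phrase in the text to be analyzed.
In this case, the aggregation degree of each modifier phrase may be output visually, associated with the calculated frequency of appearance.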
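[0022]
To achieve the above objects, a program according to a fifth aspect of the present invention causes a computer device comprising dictionary storage means that stores a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically favorable or unfavorable, to function as:
noun phrase extraction means for, when the start of text analysis is instructed, referring to the noun phrase dictionary stored in the dictionary storage means and extracting noun phrases from each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the dictionary storage means and extracting the modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
affect degree calculation means for, for each description part separated by the separation means, totaling the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means for each noun phrase extracted by the noun phrase extraction means, and calculating the totaled level value as the affect degree of that noun phrase; and
affect degree output means for causing an output device of the computer device to output the affect degree of each noun phrase calculated by the affect degree calculation means.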
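[0024]
To achieve the above objects, a program according to a sixth aspect of the present invention causes a computer device comprising dictionary storage means that stores a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, to function as:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can appear in a text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
modifier phrase extraction means for, when the start of text analysis is instructed, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted by the modifier phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, an aggregation degree for each modifier phrase, the aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
aggregation degree output means for causing an output device of the computer device to output the aggregation degree of each modifier phrase calculated by the aggregation degree calculation means.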
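[0025]
The program according to the sixth aspect may further cause the computer device to function as modifier phrase frequency calculation means for calculating the frequency of appearance of each modifier phrase extracted by the modifier phrase extraction means in the text to be analyzed.
In this case, the aggregation degree output means may output, for each modifier phrase, the aggregation degree calculated by the aggregation degree calculation means visually associated with the frequency of appearance calculated by the modifier phrase frequency calculation means.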
[0026]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to the accompanying drawings.
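[0027]
FIG. 1 is a block diagram showing the configuration of a system centered on the data analysis apparatus according to this embodiment. The data analysis apparatus 1 is connected via a LAN (Local Area Network) 3 to terminal devices 2 (personal computers, workstations, and the like) used by employees of the management department. It is also connected via a gateway 5 and the Internet 6 to customers' terminal devices 4 (personal computers, mobile phones, and the like). The data analysis apparatus 1 comprises a CPU (Central Processing Unit) 11, a storage device 12, a communication device 13, and a file device 14.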
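[0028]
The CPU 11 executes programs stored in the storage device 12 and performs data analysis by text mining techniques, as described later. The storage device 12 includes a main storage device and an auxiliary storage device; it stores the programs executed by the CPU 11 and serves as the work area of the CPU 11. The communication device 13 exchanges information with the terminal devices 2 and 4 and other devices via the LAN 3 and the Internet 6.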
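[0029]
The file device 14 includes a questionnaire file 14a, a knowledge dictionary 14b, and an analysis result database 14c. Physically, the file device 14 is part of the auxiliary storage device of the storage device 12, but it is shown as a separate component because it holds the files and the database that play an important role in the present invention.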
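[0030]
The questionnaire file 14a records the text data of questionnaire responses that customers entered about a product as natural-language sentences. Each questionnaire is filled in on a terminal device 4 according to an entry form 100 such as that shown in FIG. 2, sent to the data analysis apparatus 1 via the Internet 6, and recorded in the questionnaire file 14a.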
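[0031]
The entry form 100 of FIG. 2 is divided into an entry field 101 for the points the customer finds convenient about the surveyed product (satisfaction field), an entry field 102 for what the customer expects in order to further enhance those convenient points (enhancement expectation field), an entry field 103 for the points the customer finds inconvenient (dissatisfaction field), and an entry field 104 for how the customer would like the inconvenient points to be improved (improvement expectation field). The current state is thus written in entry fields 101 and 103, and the desired future state in entry fields 102 and 104.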
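[0032]
The knowledge dictionary 14b includes a word dictionary, a grammar dictionary, and part-of-speech dictionaries. The word dictionary and the grammar dictionary are substantially the same as the dictionaries conventionally used for morphological analysis and syntactic analysis of text. The part-of-speech dictionaries register not words in the grammatical sense, as the word dictionary does, but the phrases that are objects of evaluation (a single word, or several consecutive words that together carry one unit of meaning). Separate part-of-speech dictionaries are prepared for nouns, adjectives, and verbs, but they are classified not by grammatical function but by whether a phrase functions semantically as a noun, an adjective, or a verb.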
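[0033]
The noun phrase dictionary registers the noun phrases that are modified by adjective phrases and verb phrases. The adjective phrase dictionary and the verb phrase dictionary register, respectively, the adjective phrases and the verb phrases that modify noun phrases. The modifier/modified relationship here need not be grammatical modification; a semantic dependency relationship is sufficient. A modifier may also either precede or follow the phrase it modifies. In particular, for the evaluation analysis described later, the adjective phrase dictionary registers each adjective phrase (expression word and normalized expression), as shown in FIG. 3, together with a level value indicating whether the phrase expresses a favorable or an unfavorable evaluation.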
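[0034]
The analysis result database 14c registers the results of the data analysis, described later, that is performed on the questionnaire texts recorded in the questionnaire file 14a with reference to the knowledge dictionary 14b. The analysis result database 14c may also register the intermediate results obtained before the final output of the data analysis (the results of the morphological and syntactic analyses and of the topic analysis, all described later).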
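[0035]
The processing in the data analysis apparatus 1 according to this embodiment is described below. Here it is assumed that the questionnaires to be text-mined have already been sent from terminal devices 4 and recorded in the questionnaire file 14a but have not yet been analyzed. FIG. 4 is a flowchart of the processing in the data analysis apparatus 1. The processing starts when an instruction to start processing is sent from a terminal device 2.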
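[0036]
When processing starts, the CPU 11 first refers to the word dictionary and the grammar dictionary of the knowledge dictionary 14b, performs morphological analysis on the text of each questionnaire recorded in the questionnaire file 14a (step S11), and then performs syntactic analysis (step S12). Morphological and syntactic analysis are performed by conventional techniques, and the syntactic analysis results make it possible to identify the modifier/modified relationships described later.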
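[0037]
After the morphological and syntactic analyses, the CPU 11 performs topic analysis (step S13). In the topic analysis, the CPU 11 divides each questionnaire recorded in the questionnaire file 14a into subdocuments and extracts noun phrases from each subdocument with reference to the noun phrase dictionary of the knowledge dictionary 14b. For each extracted noun phrase, the CPU 11 obtains statistics such as its frequency of appearance in the subdocument and its distribution across all the questionnaires, and uses these statistics to convert each subdocument into a vector representation.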
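[0038]
From the vector representations of the subdocuments, the CPU 11 generates a vector representation for each questionnaire, computes the similarity between the questionnaire vectors, and groups questionnaires whose similarity exceeds a predetermined value into one cluster. The CPU 11 then computes the similarity between the cluster vectors and merges clusters whose similarity exceeds a predetermined value as belonging to the same topic. A single questionnaire may be classified under more than one topic.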
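The description does not specify the vector weighting, the similarity measure, or the clustering procedure of the topic analysis (step S13). The following Python sketch fills those gaps with common assumptions, each labeled as such: TF-IDF-style weights stand in for the "frequency and distribution statistics", a questionnaire vector is the sum of its subdocument vectors, similarity is cosine similarity, and "grouping items whose similarity exceeds a predetermined value" is read as single-linkage grouping. Unlike the described method, this sketch assigns each questionnaire to exactly one topic. All function names are hypothetical.

```python
import math
from collections import Counter


def questionnaire_vectors(questionnaires):
    """questionnaires: list of questionnaires, each a list of subdocuments,
    each subdocument a list of extracted noun phrases."""
    subdocs = [sd for q in questionnaires for sd in q]
    n = len(subdocs)
    df = Counter()  # number of subdocuments containing each noun phrase
    for sd in subdocs:
        df.update(set(sd))
    vectors = []
    for q in questionnaires:
        vec = {}
        for sd in q:
            for term, tf in Counter(sd).items():
                # Assumption: smoothed TF-IDF stands in for the unspecified statistics.
                idf = math.log((1 + n) / (1 + df[term])) + 1
                # Assumption: questionnaire vector = sum of its subdocument vectors.
                vec[term] = vec.get(term, 0.0) + tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def threshold_clusters(vectors, threshold):
    """Single-linkage grouping: two items are joined whenever similarity > threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def topics(questionnaires, q_threshold=0.5, c_threshold=0.5):
    """Two stages: questionnaires -> clusters -> topics (thresholds are assumptions)."""
    vectors = questionnaire_vectors(questionnaires)
    clusters = threshold_clusters(vectors, q_threshold)
    cluster_vecs = []
    for members in clusters:
        acc = {}  # assumption: cluster vector = sum of member vectors
        for i in members:
            for t, w in vectors[i].items():
                acc[t] = acc.get(t, 0.0) + w
        cluster_vecs.append(acc)
    merged = threshold_clusters(cluster_vecs, c_threshold)
    return [sorted(i for c in group for i in clusters[c]) for group in merged]


survey = [
    [["conversion accuracy", "dictionary"], ["conversion accuracy"]],
    [["conversion accuracy"], ["dictionary"]],
    [["installation", "manual"]],
    [["installation"], ["manual"]],
]
print(topics(survey))  # [[0, 1], [2, 3]]
```

Union-find keeps the grouping linear in the number of similar pairs found; a method that lets one questionnaire fall under several topics would need overlapping clusters instead.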
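[0039]
When the topic analysis is complete, the CPU 11 performs data analysis on the topics obtained and extracts the problems and other points described in the questionnaires (step S14). According to the viewpoint used to extract such points, the data analysis consists of evaluation analysis (step S14-1), sensibility analysis (step S14-2), and functional requirement analysis (step S14-3).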
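[0040]
This data analysis can be performed separately on descriptions of the current state (entries in fields 101 and 103 of the entry form 100) and descriptions of the future state (entries in fields 102 and 104 of the entry form 100). The topics classified by the topic analysis of step S13 are displayed, for example, as a simplified dendrogram as shown in FIG. 5, and the operator selects from it the topic to be analyzed.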
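[0041]
In the evaluation analysis of step S14-1, the CPU 11 refers to the noun phrase dictionary, extracts the noun phrases contained in the selected topic, and obtains the frequency of each noun phrase. Referring further to the syntactic analysis results and the adjective phrase dictionary, the CPU 11 extracts the adjective phrases modifying each noun phrase and obtains the affect degree of each noun phrase from the level values of the extracted adjective phrases. More specifically, when an adjective phrase modifying a noun phrase is extracted, the level value registered for that adjective phrase in the adjective phrase dictionary of the knowledge dictionary 14b is obtained. These level values are totaled for each noun phrase according to Equation 1 below (the total may be negative), and the result is the affect degree of that noun phrase.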
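[0042]
[Equation 1]
Affect degree = Σ((level value of adjective phrase) × (frequency of the phrase formed by the modified noun phrase and the adjective phrase)) ÷ (frequency of phrases containing the modified noun phrase)
where Σ is the sum of the weighted level values calculated for each adjective phrase typified by reducing on the modified noun phrase.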
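Equation 1 can be read as a frequency-weighted mean of the level values of the adjective phrases attached to each noun phrase. The minimal Python sketch below computes it for one description part under two assumptions not stated in the description: the input is a list of already extracted (adjective phrase, noun phrase) occurrences, and the denominator "frequency of phrases containing the modified noun phrase" is the number of such extracted occurrences for that noun phrase. The function name and the level values in the test data are hypothetical, except for the FIG. 6 case ("tedious" = -1).

```python
from collections import Counter, defaultdict


def affect_degrees(pairs, levels):
    """Equation 1 for one description part (current or future state).

    pairs  -- (adjective phrase, noun phrase) occurrences in a modifier/modified
              relation, as extracted in step S14-1
    levels -- adjective phrase -> level value (adjective phrase dictionary)
    """
    # Frequency of each adjective-noun phrase; unregistered adjectives are ignored.
    phrase_freq = Counter(p for p in pairs if p[0] in levels)
    weighted = defaultdict(float)  # sum of (level value x phrase frequency) per noun
    total = Counter()              # frequency of phrases containing each noun
    for (adjective, noun), freq in phrase_freq.items():
        weighted[noun] += levels[adjective] * freq
        total[noun] += freq
    return {noun: weighted[noun] / total[noun] for noun in total}


# FIG. 6: "word registration" is modified four times by "tedious" (level -1).
print(affect_degrees([("tedious", "word registration")] * 4, {"tedious": -1}))
# {'word registration': -1.0}
```

Dividing by the noun's phrase count keeps frequently discussed and rarely discussed noun phrases on the same scale, which is what lets the frequency be plotted on a separate axis.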
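[0043]
Here, "reduce" means aggregating, among multiple items to be processed, the portions that are common or duplicated under a specified condition. As a result of the aggregation, the common or duplicated portions become a single (reduced) item, and the portions that are not common are typified as difference information. For example, reducing the three phrases (adjective phrase-noun phrase) "beautiful-color", "pretty-color", and "nice-color" on the modified noun phrase yields "color-beautiful/pretty/nice".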
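The reduce operation amounts to grouping phrases on the shared element (here the modified noun phrase) and keeping the differing elements, with their counts, as the typified difference information; those counts are the phrase frequencies that Equation 1 weights. A minimal sketch; the function names are hypothetical.

```python
from collections import Counter


def reduce_by_noun(phrases):
    """Group (adjective, noun) phrases on the modified noun phrase.

    Returns noun -> Counter of the adjective phrases that modify it.
    """
    reduced = {}
    for adjective, noun in phrases:
        reduced.setdefault(noun, Counter())[adjective] += 1
    return reduced


def show(reduced):
    # Render as "noun-adj1/adj2/..." with adjectives in first-seen order.
    return ["{}-{}".format(noun, "/".join(adjs)) for noun, adjs in reduced.items()]


phrases = [("beautiful", "color"), ("pretty", "color"), ("nice", "color")]
print(show(reduce_by_noun(phrases)))  # ['color-beautiful/pretty/nice']
```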
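[0044]
In the sensibility analysis of step S14-2, the CPU 11 refers to the adjective phrase dictionary, extracts the adjective phrases contained in the selected topic, and obtains the frequency of each adjective phrase. Referring further to the syntactic analysis results and the noun phrase dictionary, the CPU 11 obtains an evaluation aggregation degree indicating how fixed the noun phrases modified by each adjective phrase are. The evaluation aggregation degree is obtained by Equation 2 below; it is high for an adjective phrase when the variety of noun phrases in a modifier/modified relationship with it is small relative to the frequency of that adjective phrase.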
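[0045]
[Equation 2]
Evaluation aggregation degree = -log(number of distinct noun phrases / frequency of the adjective phrase)
Only noun phrases and adjective phrases that are in a modifier/modified relationship are counted.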
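[0046]
In the functional requirement analysis of step S14-3, the CPU 11 refers to the verb phrase dictionary, extracts the verb phrases that modify noun phrases contained in the selected topic, and obtains the frequency of each verb phrase. Referring further to the syntactic analysis results and the noun phrase dictionary, the CPU 11 obtains an opinion aggregation degree indicating how fixed the noun phrases modified by each verb phrase are. The opinion aggregation degree is obtained by Equation 3 below; it is high for a verb phrase when the variety of noun phrases in a modifier/modified relationship with it is small relative to the frequency of that verb phrase.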
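[0047]
[Equation 3]
Opinion aggregation degree = -log(number of distinct noun phrases / frequency of the verb phrase)
Only noun phrases and verb phrases that are in a modifier/modified relationship are counted.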
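Equations 2 and 3 have the same form, so one function can serve both: it counts how often each modifier (an adjective phrase or a verb phrase) occurs in a modifier/modified pair and how many distinct noun phrases it modifies. The description does not state the base of the logarithm, so the sketch below takes it as a parameter, with the natural logarithm as an assumed default. The value is 0 when every occurrence modifies a different noun phrase and reaches log(frequency) when all occurrences modify the same one. The function name and example pairs are hypothetical.

```python
import math
from collections import defaultdict


def aggregation_degrees(pairs, base=math.e):
    """Equations 2 and 3: -log(distinct noun phrases / modifier frequency).

    pairs -- (modifier phrase, noun phrase) occurrences in a modifier/modified
             relation; the modifier is an adjective phrase (evaluation
             aggregation degree) or a verb phrase (opinion aggregation degree)
    base  -- logarithm base; not stated in the description (assumption)
    """
    freq = defaultdict(int)
    nouns = defaultdict(set)
    for modifier, noun in pairs:
        freq[modifier] += 1
        nouns[modifier].add(noun)
    # log(freq / distinct) equals -log(distinct / freq); this form avoids -0.0.
    return {m: math.log(freq[m] / len(nouns[m]), base) for m in freq}


requests = [("install", "driver")] * 3 + [("install", "manual"), ("set", "dictionary")]
print(aggregation_degrees(requests, base=2))  # {'install': 1.0, 'set': 0.0}
```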
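[0048]
The extraction of noun phrases, adjective phrases, and verb phrases in the evaluation analysis of step S14-1, the sensibility analysis of step S14-2, and the functional requirement analysis of step S14-3 is described in detail later.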
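[0049]
When the data analysis of the selected topic is complete, the CPU 11 registers the analysis results in the analysis result database 14c (step S15). The CPU 11 also causes the communication device 13 to send the analysis results to the terminal device 2 that issued the processing start instruction, where they are output on the display device of that terminal device 2 (step S16). This completes the processing in the data analysis apparatus 1. The analysis results registered in the analysis result database 14c can be retrieved at any time on request from a terminal device 2.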
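[0050]
Next, the extraction of noun phrases, adjective phrases, and verb phrases in the evaluation analysis of step S14-1, the sensibility analysis of step S14-2, and the functional requirement analysis of step S14-3 is described. As mentioned above, this extraction is the prerequisite for calculating the affect degree, the evaluation aggregation degree, and the opinion aggregation degree. The output examples shown here are results of data analysis of descriptions of the current state (entries in fields 101 and 103 of the entry form 100).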
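[0051]
FIG. 6 illustrates the extraction of noun phrases and adjective phrases in the evaluation analysis. Here, the noun phrase "word registration" (単語登録), shown highlighted, is extracted first. Next, the adjective phrase "tedious" (面倒), shown boxed, is extracted as the adjective phrase modifying "word registration". In this case, if the frequency of the noun phrase "word registration" is 4, the level value of the adjective phrase "tedious" is -1, and no other adjective phrase modifies "word registration", the affect degree is obtained as (-1 × 4) ÷ 4 = -1.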
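[0052]
FIG. 7 illustrates the extraction of adjective phrases and noun phrases in the sensibility analysis. Here, the adjective phrase "correct" (正しい), shown highlighted, is extracted first. Next, the noun phrase "Japanese" (日本語), shown boxed, is extracted as the noun phrase modified by "correct". In this case, the frequency of the adjective phrase "correct" is 4 and its evaluation aggregation degree is obtained as 1.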
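[0053]
FIG. 8 illustrates the extraction of verb phrases and noun phrases in the functional requirement analysis. Here, the verb phrase "set" (設定), shown highlighted, is extracted first. Next, the noun phrase "dictionary" (辞書), shown boxed, is extracted as the noun phrase modified by "set". In this case, the frequency of the verb phrase "set" is 4 and its opinion aggregation degree is obtained as 1.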
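[0054]
Next, output examples of the results of the evaluation analysis of step S14-1, the sensibility analysis of step S14-2, and the functional requirement analysis of step S14-3 are described with specific examples. Here again, descriptions of the current state (entries in fields 101 and 103 of the entry form 100) are analyzed.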
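[0055]
FIG. 9 shows an output example of the affect degree of each noun phrase resulting from the evaluation analysis of step S14-1. In this figure, the vertical axis shows the affect degree of a noun phrase and the horizontal axis its frequency. A positive affect degree (a favorable evaluation) is plotted in the positive region of the vertical axis (upper part of the figure), and a negative affect degree (an unfavorable evaluation) in the negative region (lower part of the figure).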
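[0056]
Referring to FIG. 9, for example, "conversion accuracy" (変換精度) has both a high frequency and a high affect degree, so it is easy to see that customers are already quite satisfied with it in the current product. In contrast, "word registration" has a high frequency but a negative affect degree, so it is easy to see that customer dissatisfaction with it is strong in the current product and that it should be improved in future product development.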
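[0057]
FIG. 10 shows an output example of the evaluation aggregation degree of each adjective phrase resulting from the sensibility analysis of step S14-2. In this figure, the vertical axis shows the evaluation aggregation degree of an adjective phrase and the horizontal axis its frequency. Referring to FIG. 10, for example, "difficult" (難しい) and "correct" both have a high frequency and a high evaluation aggregation degree, so it is easy to see that many customers find the same aspect of the same product "difficult" or "correct".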
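[0058]
FIG. 11 shows an output example of the opinion aggregation degree of each verb phrase resulting from the functional requirement analysis of step S14-3. In this figure, the vertical axis shows the opinion aggregation degree of a verb phrase and the horizontal axis its frequency. Referring to FIG. 11, for example, "install" (インストール) has both a high frequency and a high opinion aggregation degree, so it is easy to see that many customers have the same request about the same aspect of installation.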
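[0059]
As described above, in the evaluation analysis the data analysis apparatus 1 of this embodiment focuses on each noun phrase and calculates its affect degree by totaling the level values of the adjective phrases extracted for it. The affect degree quantifies whether customers evaluate the aspect denoted by the noun phrase favorably or unfavorably. Because the affect degree of each noun phrase shows at a glance how customers evaluate the various aspects of a product, the results of the evaluation analysis can easily be used to formulate management strategies, marketing strategies, and the like.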
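[0060]
In the sensibility analysis, the apparatus focuses on each adjective phrase and calculates an evaluation aggregation degree from the noun phrases it modifies. The evaluation aggregation degree quantifies whether the impression expressed by the adjective phrase is concentrated on one aspect or spread across various aspects. Because the evaluation aggregation degree of each adjective phrase shows at a glance whether the same impression is concentrated on the same aspect, the results of the sensibility analysis can easily be used to formulate management strategies, marketing strategies, and the like.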
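[0061]
In the functional requirement analysis, the apparatus focuses on each verb phrase and calculates an opinion aggregation degree from the noun phrases it modifies. The opinion aggregation degree quantifies whether the request expressed by the verb phrase is concentrated on one aspect or spread across various aspects. Because the opinion aggregation degree of each verb phrase shows at a glance whether the same request is concentrated on the same aspect, the results of the functional requirement analysis can easily be used to formulate management strategies, marketing strategies, and the like.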
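[0062]
In addition, the affect degree, evaluation aggregation degree, and opinion aggregation degree resulting from the evaluation, sensibility, and functional requirement analyses are output as graphs with these values on the vertical axis and the frequency of the corresponding phrase on the horizontal axis. What many customers feel or request can therefore be grasped easily from the visual representation, which makes it easy to use the results of the data analysis to formulate management strategies, marketing strategies, and the like.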
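[0063]
Furthermore, the questionnaires analyzed in this embodiment are written according to the entry form 100, which has entry fields 101 to 104. Descriptions of the current state are entered in fields 101 and 103 and descriptions of the future state in fields 102 and 104, so the two are kept separate. Analysis of current problems and analysis of the state the user wants once those problems are solved can therefore be performed separately, which makes it easy to use the results of the data analysis to formulate management strategies, marketing strategies, and the like.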
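[0064]
The present invention is not limited to the above embodiment, and various modifications and applications are possible. Modifications of the above embodiment that are applicable to the present invention are described below.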
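[0065]
In the above embodiment, the entry form 100 of FIG. 2 was prepared for the customer questionnaire so that the current state and the future state were entered separately. However, questionnaires are not always filled in with the current and future states clearly separated. Some customers may also wish to answer freely, without being bound by the entry form 100 of FIG. 2. Without an entry method such as the entry form 100 of FIG. 2, the current and future states are not separated in the questionnaire in the first place.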
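[0066]
Therefore, the data analysis can be performed after separating the current state from the future state by any one of the following four methods, or by any combination of them. Questionnaire text entered according to the entry form 100 of FIG. 2 is presumed to have the current and future states already separated, but the separation may be finalized only after processing by methods such as the following.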
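[0067]
In a first method, parts of the text written in the present or past tense are judged to describe the current state and parts written in the future tense to describe the future state, thereby separating the two. In a second method, because descriptions of the current state normally appear before descriptions of the future state in temporal order, the current and future states are separated according to the order in which the descriptions appear.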
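[0068]
In a third method, focusing on what is written as inconvenient points, the affect degree obtained in the evaluation analysis of step S14-1 is used: descriptions with a low affect degree are taken as the current state. If the same text contains a part with a high affect degree for the same noun phrase, that part is taken as the future state, thereby separating the two.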
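A minimal sketch of the third method, assuming each response has already been split into segments (for example, sentences) and the affect degree of each noun phrase has been computed per segment with Equation 1. The description only says "low" and "high"; treating "low" as a negative affect degree is an assumption. The function name is hypothetical.

```python
def split_by_affect(segments, low=0.0):
    """Third method: label the segments of one response 'current' or 'future'.

    segments -- list of dicts, one per segment, mapping noun phrase -> affect degree
    low      -- affect degrees below this value count as "low" (assumption)
    Returns 'current', 'future', or None (undecided) for each segment.
    """
    labels = [None] * len(segments)
    lowest = {}  # noun phrase -> lowest affect degree seen in a 'current' segment
    for i, affects in enumerate(segments):
        negatives = {n: a for n, a in affects.items() if a < low}
        if negatives:
            labels[i] = "current"
            for n, a in negatives.items():
                lowest[n] = min(a, lowest.get(n, a))
    # A higher affect degree for the same noun phrase elsewhere marks the future state.
    for i, affects in enumerate(segments):
        if labels[i] is None and any(n in lowest and a > lowest[n]
                                     for n, a in affects.items()):
            labels[i] = "future"
    return labels


response = [{"word registration": -1.0}, {"word registration": 1.0}, {"manual": 0.5}]
print(split_by_affect(response))  # ['current', 'future', None]
```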
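[0069]
In a fourth method, a dictionary such as that shown in FIG. 12 is additionally prepared as part of the knowledge dictionary 14b. This dictionary registers phrases frequently used to describe a future state. Descriptions in the questionnaire text that contain a phrase registered in the dictionary of FIG. 12 are taken as the future state. If the same text contains a corresponding description with the same noun phrase, that description is taken as the current state, thereby separating the two.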
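The fourth method can be sketched in the same way. The contents of the FIG. 12 dictionary are not reproduced in the text, so the marker phrases below are hypothetical English stand-ins, and the noun phrase extractor is passed in as a function (here a toy substring matcher). As an option, the sketch also applies the second method (current-state descriptions normally come first) to undecided segments that precede the first future-state segment; this is one of the combinations [0066] permits, not a procedure the description prescribes.

```python
# Hypothetical stand-ins for the FIG. 12 dictionary of future-state phrases.
FUTURE_MARKERS = ("i want", "i would like", "it would be nice", "should")


def split_by_markers(segments, nouns_of, markers=FUTURE_MARKERS, use_position=False):
    """Fourth method (optionally combined with the second) for one response.

    segments -- segment strings in their original order
    nouns_of -- function returning the set of noun phrases in a segment
    Returns 'current', 'future', or None (undecided) for each segment.
    """
    labels = [None] * len(segments)
    future_nouns = set()
    for i, seg in enumerate(segments):
        if any(m in seg.lower() for m in markers):
            labels[i] = "future"
            future_nouns |= nouns_of(seg)
    # A segment sharing a noun phrase with a future-state segment is the current state.
    for i, seg in enumerate(segments):
        if labels[i] is None and nouns_of(seg) & future_nouns:
            labels[i] = "current"
    # Second method: undecided segments before the first future one are current.
    if use_position and "future" in labels:
        for i in range(labels.index("future")):
            if labels[i] is None:
                labels[i] = "current"
    return labels


KNOWN_NOUNS = ("word registration", "dictionary", "manual")


def nouns(segment):
    # Toy noun phrase extractor standing in for the noun phrase dictionary lookup.
    return {n for n in KNOWN_NOUNS if n in segment.lower()}


text = ["The manual is thin.",
        "Word registration is tedious.",
        "I want word registration to be automatic."]
print(split_by_markers(text, nouns))                     # [None, 'current', 'future']
print(split_by_markers(text, nouns, use_position=True))  # ['current', 'current', 'future']
```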
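[0070]
By clearly separating the current state from the future state with methods such as these before analyzing the questionnaire text, customers' satisfaction or dissatisfaction with the surveyed product and their requests can each be grasped separately and clearly. Those who formulate management or marketing strategies can then easily devise accurate strategies for improving customer satisfaction.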
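[0071]
In the above embodiment, the texts analyzed were questionnaires entered according to the entry form 100 of FIG. 2, which customers recorded in the questionnaire file 14a by accessing the data analysis apparatus 1 from their own terminal devices 4. The texts to be analyzed are not limited to those that customers send to the data analysis apparatus 1, however; they may also be collected from bulletin boards, articles, and other data posted on web servers on the Internet 6. Because such texts do not separate the current state from the future state, the separation techniques described above are particularly effective for them.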
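[0072]
In the above embodiment, the data analysis apparatus 1 was connected to the terminal devices 2 via the LAN 3, analyzed the questionnaires recorded in the questionnaire file 14a according to instructions from a terminal device 2, and returned the results to the instructing terminal device 2. That is, the present invention was implemented in a client-server system. Alternatively, the data analysis apparatus 1 may have its own input device and display device so that the present invention is implemented in a standalone system.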
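[0073]
In the above embodiment, the program with which the data analysis apparatus 1 analyzes the questionnaires was described as being stored in the storage device 12 in advance. However, the program may be stored on a computer-readable recording medium such as a CD-ROM or DVD-ROM and distributed independently of the hardware. The program may also be stored on a fixed disk device of a web server on the Internet and distributed over the Internet.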
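[0074]
[Effects of the Invention]
As described above, the present invention makes it possible to capture subjective information easily by text mining techniques and to perform accurate data analysis. The analysis results also make it easier to formulate management strategies, marketing strategies, and the like.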
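[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a system centered on the data analysis apparatus according to the embodiment of the present invention.
[FIG. 2] A diagram showing the entry form of a questionnaire recorded in the questionnaire file.
[FIG. 3] A diagram showing an example of the knowledge dictionary.
[FIG. 4] A flowchart showing the processing in the data analysis apparatus according to the embodiment of the present invention.
[FIG. 5] A diagram showing an example of a simplified dendrogram display of the topics classified by the topic analysis.
[FIG. 6] A diagram illustrating the extraction of noun phrases and adjective phrases in the evaluation analysis.
[FIG. 7] A diagram illustrating the extraction of adjective phrases and noun phrases in the sensibility analysis.
[FIG. 8] A diagram illustrating the extraction of verb phrases and noun phrases in the functional requirement analysis.
[FIG. 9] A diagram showing an output example of the affect degree of each noun phrase resulting from the evaluation analysis.
[FIG. 10] A diagram showing an output example of the evaluation aggregation degree of each adjective phrase resulting from the sensibility analysis.
[FIG. 11] A diagram showing an output example of the opinion aggregation degree of each verb phrase resulting from the functional requirement analysis.
[FIG. 12] A diagram showing an example of a dictionary added to the knowledge dictionary in a modification.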
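[Explanation of Reference Numerals]
1 Data analysis apparatus
2 Terminal device
3 LAN
4 Terminal device
5 Gateway
6 Internet
11 CPU
12 Storage device
13 Communication device
14 File device
14a Questionnaire file
14b Knowledge dictionary
14c Analysis result database

[0001]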
[Technical Field of the Invention]
The present invention relates to a data analysis apparatus, a data analysis method, and a program that use text mining techniques.
[0002]
[Prior art]
In recent years, formulating management or marketing strategies effectively has come to require data analysis that derives the trends and patterns, up to the present, that matter for management or marketing. Data mining is applied as a technique for deriving such trends and patterns. When data mining handles customer information, the analysis centers on quantitative customer information (attribute information) grounded in unambiguous, concrete facts, such as questionnaire results.
[0003]
For example, data mining applied to purchase histories takes as input numerical information such as customer age and purchase date and time, together with attribute data on the purchased items, and mines the causal relationships among the attribute data to perform cross-sell/up-sell analysis. As an application in product development, taking notebook PC development as an example, perceived quality is evaluated with techniques such as conjoint analysis from combinations of attribute data such as battery life and the presence or absence of a DVD drive.
[0004]
Analysis by data mining thus deals with attribute information grounded in unambiguous, concrete facts and does not address the subjective intention of the customer behind them. This leads to problems such as the following: (1) the place, time, and circumstances in which the customer made an evaluation are not taken into account; (2) it is unclear what thinking led the customer to a purchase; (3) it is unclear why the customer bought the product in that situation; and (4) it is unclear on what grounds the customer rated a particular model highly (or poorly).
[0005]
To go beyond analyzing unambiguous, concrete information as data mining does, text mining has been proposed, which analyzes, for example, customers' free-text answers. The customer answers targeted by text mining may contain the customer's subjective intention, so text mining is expected to yield information more useful for management and marketing.
[0006]
[Problems to be solved by the invention]
However, many text mining techniques to date have merely followed data mining techniques. They treat the words contained in a text as keywords, that is, as unambiguous, concrete data, and analyze them only in the same way as data mining, leaving the customer's subjective intention behind those words almost entirely unanalyzed.
[0007]
As a result, information about the customer's subjective intention was missing from the data generated for management or marketing analysis. Moreover, text mining that merely follows data mining techniques produces results of no better quality than data mining, so formulating management or marketing strategies from those results relied heavily on the experience and intuition of the strategist.
[0008]
It is an object of the present invention to provide a data analysis apparatus and method, and a program capable of accurately analyzing data by capturing subjective information using a text mining technique.
[0009]
It is another object of the present invention to provide a data analysis apparatus and method, and a program that can provide an analysis result that facilitates the formulation of a management strategy, a marketing strategy, and the like.
[0010]
[Means for Solving the Problems]
To achieve the above objects, a data analysis apparatus according to a first aspect of the present invention comprises:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can appear in a text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values, each level value indicating whether the modifier phrase is semantically favorable or unfavorable;
noun phrase extraction means for, when the start of text analysis is instructed, referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting noun phrases from each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting the modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
affect degree calculation means for, for each description part separated by the separation means, totaling the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means for each noun phrase extracted by the noun phrase extraction means, and calculating the totaled level value as the affect degree of that noun phrase; and
affect degree output means for outputting the affect degree of each noun phrase calculated by the affect degree calculation means.
[0011]
The data analysis apparatus according to the first aspect may further comprise noun phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each noun phrase extracted by the noun phrase extraction means in each description part of the text to be analyzed.
In this case, the affect degree output means may output, for each noun phrase, the affect degree calculated by the affect degree calculation means visually associated with the frequency of appearance calculated by the noun phrase frequency calculation means.
[0012]
The data analysis apparatus according to the first aspect may further comprise:
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted by the modifier phrase extraction means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, an aggregation degree for each modifier phrase, the aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
aggregation degree output means for outputting the aggregation degree of each modifier phrase calculated by the aggregation degree calculation means.
[0013]
In this case, the data analysis apparatus according to the first aspect may further comprise modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each modifier phrase extracted by the modifier phrase extraction means in each description part of the text to be analyzed.
The aggregation degree output means may then output, for each modifier phrase, the aggregation degree calculated by the aggregation degree calculation means visually associated with the frequency of appearance calculated by the modifier phrase frequency calculation means.
[0014]
To achieve the above objects, a data analysis apparatus according to a second aspect of the present invention comprises:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can appear in a text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
modifier phrase extraction means for, when the start of text analysis is instructed, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed, the text including a description part concerning a current state and a description part concerning a future state related to the current state;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted by the modifier phrase extraction means;
separation means for separating, in the text to be analyzed, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a dictionary stored in storage means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, an aggregation degree for each modifier phrase, the aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
aggregation degree output means for outputting the aggregation degree of each modifier phrase calculated by the aggregation degree calculation means.
[0015]
The data analysis apparatus according to the second aspect may further comprise modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the frequency of appearance of each modifier phrase extracted by the modifier phrase extraction means in each description part of the text to be analyzed.
In this case, the aggregation degree output means may output, for each modifier phrase, the aggregation degree calculated by the aggregation degree calculation means visually associated with the frequency of appearance calculated by the modifier phrase frequency calculation means.
[0018]
To achieve the above objects, a data analysis method according to a third aspect of the present invention is performed in a computer device that comprises dictionary storage means storing a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically favorable or unfavorable, and that has storage means used as a work area. The method comprises:
when the start of analysis is instructed, referring to the noun phrase dictionary stored in the dictionary storage means, extracting noun phrases from each description part of the text to be analyzed, the text being stored in the storage means and including a description part concerning a current state and a description part concerning a future state related to the current state, and storing the extracted noun phrases in the storage means;
referring to the modifier phrase dictionary stored in the dictionary storage means, extracting the modifier phrases that modify the noun phrases extracted and stored in the storage means, and storing the extracted modifier phrases in the storage means;
separating, in the text to be analyzed stored in the storage means, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a frequent phrase dictionary further stored in the dictionary storage means, and storing the separated description parts in the storage means;
for each separated description part, totaling the level values registered in the modifier phrase dictionary for the modifier phrases extracted and stored in the storage means for each noun phrase extracted and stored in the storage means, calculating the totaled level value as the affect degree of that noun phrase, and storing the calculated affect degree in the storage means; and
outputting the affect degree of each noun phrase, calculated and stored in the storage means, from an output device of the computer device.
[0020]
To achieve the above objects, a data analysis method according to a fourth aspect of the present invention is performed in a computer device that comprises dictionary storage means storing a noun phrase dictionary, in which noun phrases that can appear in a text to be analyzed are registered, and a modifier phrase dictionary, in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, and that has storage means used as a work area. The method comprises:
when the start of analysis is instructed, referring to the modifier phrase dictionary stored in the dictionary storage means, extracting modifier phrases that modify noun phrases in each description part of the text to be analyzed, the text being stored in the storage means and including a description part concerning a current state and a description part concerning a future state related to the current state, and storing the extracted modifier phrases in the storage means;
referring to the noun phrase dictionary stored in the dictionary storage means and extracting the noun phrases modified by each modifier phrase extracted and stored in the storage means;
separating, in the text to be analyzed stored in the storage means, the description part concerning the current state from the description part concerning the future state according to whether the tense used is the past or present tense or the future tense, whether the description comes earlier or later, or whether the description contains a phrase frequently used for the future state that is registered in a frequent phrase dictionary further stored in the dictionary storage means, and storing the separated description parts in the storage means;
for each separated description part, calculating, on the basis of the noun phrase extraction results, an aggregation degree for each modifier phrase indicating the degree to which the modifier phrases extracted and stored in the storage means modify the same noun phrase, and storing the calculated aggregation degree in the storage means; and
outputting the aggregation degree of each modifier phrase, calculated and stored in the storage means, from an output device of the computer device.
[0021]
The data analysis method according to the fourth aspect may further include calculating the appearance frequency of each extracted modifier phrase in the sentences to be analyzed. In this case, the degree of aggregation for each modifier phrase may be output in visual association with the calculated appearance frequency.
[0022]
  In order to achieve the above object, a program according to the fifth aspect of the present invention
  causes a computer device, which is provided with dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in a sentence to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically preferable or not preferable, to function as:
  Noun phrase extraction means for, in response to an instruction to start sentence analysis, referring to the noun phrase dictionary stored in the dictionary storage means and extracting noun phrases from each description part of the sentences to be analyzed, the sentences including a description part relating to the current state and a description part relating to a future state related to the current state;
  Modifier phrase extraction means for referring to the modifier phrase dictionary stored in the dictionary storage means and extracting the modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
  Separation means for separating, among the sentences to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is the past, present, or future tense, whether the description appears earlier or later, or whether it contains a frequently used future-state phrase registered in the dictionary stored in the storage means;
  Degree-of-effect calculation means for, for each description part separated by the separation means, totaling for each noun phrase extracted by the noun phrase extraction means the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means, and calculating the total level value of each noun phrase as its degree of effect; and
  Degree-of-effect output means for outputting the degree of effect for each noun phrase calculated by the degree-of-effect calculation means from an output device included in the computer device.
[0024]
  In order to achieve the above object, a program according to the sixth aspect of the present invention
  causes a computer device, which is provided with dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in a sentence to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, to function as:
  Noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can be included in the sentences to be analyzed are registered;
  Modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
  Modifier phrase extraction means for, in response to an instruction to start sentence analysis, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting the modifier phrases that modify noun phrases in each description part of the sentences to be analyzed, the sentences including a description part relating to the current state and a description part relating to a future state related to the current state;
  Noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrase modified by each modifier phrase extracted by the modifier phrase extraction means;
  Separation means for separating, among the sentences to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is the past, present, or future tense, whether the description appears earlier or later, or whether it contains a frequently used future-state phrase registered in the dictionary stored in the storage means;
  Degree-of-aggregation calculation means for calculating, for each description part separated by the separation means and on the basis of the noun phrases extracted by the noun phrase extraction means, a degree of aggregation for each modifier phrase indicating the extent to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase; and
  Degree-of-aggregation output means for outputting the degree of aggregation for each modifier phrase calculated by the degree-of-aggregation calculation means from an output device included in the computer device.
[0025]
The program according to the sixth aspect may further cause the computer device to function as modifier phrase frequency calculation means for calculating the appearance frequency, in the sentences to be analyzed, of each modifier phrase extracted by the modifier phrase extraction means. In this case, the degree-of-aggregation output means may output the degree of aggregation calculated for each modifier phrase by the degree-of-aggregation calculation means in visual association with the appearance frequency calculated by the modifier phrase frequency calculation means.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the accompanying drawings.
[0027]
FIG. 1 is a block diagram showing a system configuration centered on the data analysis apparatus according to this embodiment. The data analysis device 1 is connected via a LAN (Local Area Network) 3 to a terminal device 2 (such as a personal computer or a workstation) used by employees in a management department. It is also connected, through the gateway 5 and the Internet 6, to customer terminal devices 4 (such as personal computers or mobile phones). The data analysis device 1 includes a CPU (Central Processing Unit) 11, a storage device 12, a communication device 13, and a file device 14.
[0028]
The CPU 11 executes a program stored in the storage device 12 and performs data analysis by a text mining technique as will be described later. The storage device 12 includes a main storage device and an auxiliary storage device, stores a program executed by the CPU 11, and is used as a work area for the CPU 11. The communication device 13 transmits / receives information to / from the terminal devices 2 and 4 via the LAN 3 and the Internet 6.
[0029]
The file device 14 includes a questionnaire file 14a, a knowledge dictionary 14b, and an analysis result database 14c. The file device 14 is physically part of the auxiliary storage device of the storage device 12, but it is described separately here because it contains the files and database that play an important role in the present invention.
[0030]
The questionnaire file 14a records the text data of questionnaire responses about a product that customers have entered as natural-language sentences. Each response is entered on the terminal device 4 using the entry form 100 shown in FIG. 2, transmitted to the data analysis apparatus 1 via the Internet 6, and recorded in the questionnaire file 14a.
[0031]
The entry form 100 in FIG. 2 consists of an entry field 101 (satisfaction entry field) for points the customer finds convenient about the product covered by the questionnaire, an entry field 102 (progress expectation entry field) for what the customer expects in order to extend those convenient points further, an entry field 103 (dissatisfaction entry field) for points the customer finds inconvenient, and an entry field 104 (improvement expectation entry field) for how those inconveniences should be improved. Fields 101 and 103 are filled in with descriptions of the current state, and fields 102 and 104 with descriptions of the future state.
[0032]
The knowledge dictionary 14b includes a word dictionary, a grammar dictionary, and a part-of-speech dictionary. For the word dictionary and the grammar dictionary, substantially the same dictionaries as those conventionally used for morphological analysis and syntactic analysis of sentences are applied. The part-of-speech dictionary does not register words by their grammatical meaning, as the word dictionary does; instead, it registers the expressions to be evaluated (single words, or phrases in which several words are concatenated to carry a single meaning). The part-of-speech dictionary is divided into noun, adjective, and verb dictionaries, but this division is not by grammatical function: it reflects whether an expression functions semantically as a noun, as an adjective, or as a verb.
[0033]
The noun phrase dictionary registers noun phrases that are modified by adjective phrases and verb phrases. The adjective phrase dictionary and the verb phrase dictionary register, respectively, the adjective phrases and verb phrases that modify noun phrases. The modifier–modified relationship here does not require a grammatical modifier–modified construction; a semantic dependency is sufficient. Nor does it matter whether the modifier precedes or follows the modified word. In particular, for the evaluation analysis described later, the adjective phrase dictionary registers each adjective phrase (its expression and normalized expression), as shown in FIG. 3, in association with a level value indicating whether the phrase expresses a favorable or an unfavorable evaluation.
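A minimal sketch of the adjective phrase dictionary described above, assuming a simple in-memory mapping from each expression to its normalized expression and level value. Only the "troublesome" entry with level −1 is taken from the worked example given for FIG. 6; the other entries, and the names `AdjectiveEntry`, `ADJECTIVE_DICTIONARY`, `normalize` and `lookup_level`, are hypothetical, since the actual contents of FIG. 3 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdjectiveEntry:
    expression: str  # surface form as it appears in the text
    normalized: str  # normalized expression shared by variant forms
    level: int       # > 0: favorable evaluation, < 0: unfavorable evaluation


# Hypothetical contents; only "troublesome" = -1 comes from the document's example.
_ENTRIES = [
    AdjectiveEntry("troublesome", "troublesome", -1),
    AdjectiveEntry("bothersome", "troublesome", -1),
    AdjectiveEntry("convenient", "convenient", 1),
]
ADJECTIVE_DICTIONARY = {e.expression: e for e in _ENTRIES}


def normalize(expression: str) -> Optional[str]:
    """Map an expression to its normalized form, or None if it is not registered."""
    entry = ADJECTIVE_DICTIONARY.get(expression)
    return entry.normalized if entry is not None else None


def lookup_level(expression: str) -> Optional[int]:
    """Return the level value of a registered adjective phrase, or None."""
    entry = ADJECTIVE_DICTIONARY.get(expression)
    return entry.level if entry is not None else None
```

Keying on the surface expression while storing the normalized form lets variant wordings share one level value and be counted together later.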
[0034]
The analysis result database 14c is a database for registering the results of data analysis described later with reference to the knowledge dictionary 14b for the questionnaire text recorded in the questionnaire file 14a. The analysis result database 14c may also register the analysis / analysis results (the results of morpheme analysis and syntactic analysis, which will be described later, and the results of subject analysis) before being output as the final results of data analysis.
[0035]
Hereinafter, processing in the data analysis apparatus 1 according to this embodiment will be described. Here, it is assumed that the questionnaire to be subjected to text mining has already been sent from the terminal device 4 and recorded in the questionnaire file 14a, but has not been analyzed yet. FIG. 4 is a flowchart showing processing in the data analysis apparatus 1. The process of this flowchart is started when an instruction to start processing is sent from the terminal device 2.
[0036]
When processing starts, the CPU 11 first refers to the word dictionary and grammar dictionary of the knowledge dictionary 14b, performs morphological analysis on the sentences of each questionnaire recorded in the questionnaire file 14a (step S11), and then performs syntax analysis (step S12). Both analyses use the same methods as conventional techniques, and the syntax analysis result makes it possible to identify the modifier–modified relationships described later.
[0037]
When the morphological and syntax analyses are completed, the CPU 11 performs subject analysis (step S13). In the subject analysis, the CPU 11 divides each questionnaire recorded in the questionnaire file 14a into sub-documents, refers to the noun phrase dictionary in the knowledge dictionary 14b, and extracts noun phrases from each sub-document. For each extracted noun phrase, the CPU 11 obtains statistical information such as its appearance frequency within the sub-document and its distribution over all questionnaires, and uses this information to convert each sub-document into a vector representation.
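The text says only that statistics such as the in-sub-document frequency and the distribution over all questionnaires are used. A minimal sketch, assuming TF-IDF as that weighting (a common choice, but not one the original names), with the hypothetical function name `tfidf_vectors`:

```python
import math
from collections import Counter


def tfidf_vectors(subdocs):
    """subdocs: one list of extracted noun phrases per sub-document.
    Returns one sparse vector {noun_phrase: weight} per sub-document."""
    n_docs = len(subdocs)
    # Distribution over the collection: number of sub-documents containing each phrase
    doc_freq = Counter(phrase for doc in subdocs for phrase in set(doc))
    vectors = []
    for doc in subdocs:
        term_freq = Counter(doc)  # appearance frequency within this sub-document
        vectors.append({
            phrase: count * math.log(n_docs / doc_freq[phrase])
            for phrase, count in term_freq.items()
        })
    return vectors
```

Under this weighting a noun phrase that appears in every sub-document gets weight zero, because it does nothing to distinguish one sub-document from another.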
[0038]
The CPU 11 generates a vector representation of each questionnaire from the vector representations of its sub-documents, calculates the similarity between the questionnaire vectors, and groups questionnaires whose similarity exceeds a predetermined value into one cluster. The CPU 11 then calculates the similarity between the cluster vectors and merges clusters whose similarity exceeds a predetermined value into the same subject. A single questionnaire may be classified into a plurality of subjects.
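A minimal sketch of the threshold grouping, assuming cosine similarity between sparse vectors and single-link merging (any pair above the threshold ends up in the same group). The similarity measure, the merge rule and the names `cosine` and `threshold_clusters` are assumptions; the original specifies only that items whose similarity exceeds a predetermined value are grouped.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_clusters(vectors, threshold):
    """Group item indices whose pairwise similarity exceeds threshold
    (single-link grouping implemented with union-find)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The same routine could be applied a second time to cluster-level vectors (for example, the sum of each cluster's member vectors, again an assumption) to merge clusters into subjects. It produces a hard partition, so it does not model the case noted above in which one questionnaire belongs to several subjects.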
[0039]
When the subject analysis ends, the CPU 11 performs data analysis on a subject obtained by the subject analysis and extracts the problems and similar matters described in the questionnaires (step S14). Depending on the viewpoint taken as the starting point for extracting such problems, the data analysis consists of evaluation analysis (step S14-1), sensitivity analysis (step S14-2), and function requirement analysis (step S14-3).
[0040]
This data analysis can be carried out separately for descriptions relating to the current state (items entered in entry fields 101 and 103 of the entry form 100) and descriptions relating to the future state (items entered in entry fields 102 and 104). The subjects classified by the subject analysis in step S13 are displayed, for example, as a simple dendrogram as shown in FIG. 5, and the operator selects the subject to be analyzed.
[0041]
In the evaluation analysis in step S14-1, the CPU 11 refers to the noun phrase dictionary, extracts the noun phrases included in the selected subject, and obtains the appearance frequency of each noun phrase. The CPU 11 further refers to the syntax analysis result and the adjective phrase dictionary, extracts the adjective phrases that modify each noun phrase, and obtains the degree of effect of each noun phrase from the level values of the extracted adjective phrases. In more detail, when an adjective phrase modifying a noun phrase is extracted, the level value registered for that adjective phrase in the adjective phrase dictionary of the knowledge dictionary 14b is obtained. These level values are then totaled for each noun phrase according to Expression 1 below, and the result (which may be negative) is the degree of effect of that noun phrase.
[0042]
[Expression 1]
Degree of effect = Σ((level value of adjective phrase) × (frequency of the phrase consisting of the modified noun phrase and that adjective phrase)) ÷ (frequency of phrases containing the modified noun phrase)
where Σ is the sum of the weighted level values over the adjective phrases categorized by reducing on the modified noun phrase.
[0043]
Here, reducing means summing up or aggregating, according to specified conditions, the parts that are common to or shared by several processing targets. As a result of the aggregation, the common parts become unique (are reduced), and the parts that are not common are categorized as difference information. For example, reducing the three phrases (adjective phrase–noun phrase) "beautiful–color", "beautiful–color", and "nice–color" on the modified noun phrase yields "color: beautiful / beautiful / nice".
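Expression 1 together with the reduction step can be sketched as follows. This assumes the denominator counts the adjective–noun phrases that contain the modified noun phrase, so the degree of effect becomes a frequency-weighted average of the level values. The function name `degree_of_effect` is hypothetical, and level values other than "troublesome" = −1 are illustrative only.

```python
from collections import Counter, defaultdict


def degree_of_effect(pairs, levels):
    """pairs: (adjective_phrase, noun_phrase) tuples in a modifier-modified relationship.
    levels: adjective phrase -> level value from the adjective phrase dictionary.
    Returns {noun_phrase: degree of effect} following Expression 1."""
    # Reduce on the modified noun phrase: noun -> Counter of its adjective phrases
    reduced = defaultdict(Counter)
    for adjective, noun in pairs:
        if adjective in levels:  # only registered adjective phrases carry a level
            reduced[noun][adjective] += 1

    result = {}
    for noun, adjective_counts in reduced.items():
        total = sum(adjective_counts.values())  # phrases containing this noun phrase
        weighted = sum(levels[a] * count for a, count in adjective_counts.items())
        result[noun] = weighted / total
    return result


pairs = [("troublesome", "word registration")] * 4
print(degree_of_effect(pairs, {"troublesome": -1}))  # {'word registration': -1.0}
```

With four "troublesome–word registration" phrases and no other modifier, the result is −1, matching the example given for FIG. 6.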
[0044]
In the sensitivity analysis in step S14-2, the CPU 11 refers to the adjective phrase dictionary, extracts the adjective phrases included in the selected subject, and obtains the appearance frequency of each adjective phrase. The CPU 11 further refers to the syntax analysis result and the noun phrase dictionary, and obtains the degree of evaluation aggregation (evaluation intensity), which indicates how concentrated the noun phrases modified by each adjective phrase are. The evaluation intensity is obtained by Expression 2 below: the fewer distinct noun phrases an adjective phrase modifies relative to its frequency, the higher its evaluation intensity.
[0045]
[Expression 2]
Evaluation intensity = -log (number of different noun phrases / frequency of adjective phrases)
Here, only noun phrases and adjective phrases that are in a modifier–modified relationship are counted.
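Expression 2 can be sketched as follows, assuming the modifier–modified pairs have already been extracted. The original does not specify the base of the logarithm, so it is a parameter here (natural logarithm by default), and the name `aggregation_degree` is hypothetical. Because Expression 3 has the same form, the same function applies unchanged to (verb phrase, noun phrase) pairs.

```python
import math
from collections import defaultdict


def aggregation_degree(pairs, base=math.e):
    """pairs: (modifier_phrase, noun_phrase) tuples already known to be in a
    modifier-modified relationship. Returns {modifier: (frequency, intensity)}."""
    nouns_by_modifier = defaultdict(list)
    for modifier, noun in pairs:
        nouns_by_modifier[modifier].append(noun)

    result = {}
    for modifier, nouns in nouns_by_modifier.items():
        freq = len(nouns)          # frequency of the modifier phrase
        distinct = len(set(nouns))  # number of different noun phrases it modifies
        # -log(distinct / freq), written as log(freq / distinct) to avoid -0.0
        result[modifier] = (freq, math.log(freq / distinct, base))
    return result
```

Returning the frequency alongside the intensity gives both coordinates needed for a frequency-versus-intensity plot of the kind shown in FIGS. 10 and 11.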
[0046]
In the function requirement analysis in step S14-3, the CPU 11 refers to the verb phrase dictionary, extracts the verb phrases that modify noun phrases included in the selected subject, and obtains the appearance frequency of each verb phrase. The CPU 11 further refers to the syntax analysis result and the noun phrase dictionary, and obtains the degree of opinion aggregation (opinion intensity), which indicates how concentrated the noun phrases modified by each verb phrase are. It is obtained by Expression 3 below: the fewer distinct noun phrases a verb phrase modifies relative to its frequency, the higher its degree of opinion aggregation.
[0047]
[Expression 3]
Opinion intensity = -log (number of different noun phrases / frequency of verb phrases)
Here, only noun phrases and verb phrases that are in a modifier–modified relationship are counted.
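Expressions 2 and 3 share one form, so their range can be worked out together. Writing f for the frequency of a modifier phrase and n for the number of distinct noun phrases it modifies, and assuming each counted occurrence contributes exactly one modifier–modified pair (so 1 ≤ n ≤ f), the intensity is bounded as follows. This bound is derived here and is not stated in the original; the log base is left unspecified, as in the original.

```latex
% f = frequency of the modifier phrase (adjective phrase or verb phrase)
% n = number of distinct noun phrases it modifies, with 1 \le n \le f
I = -\log\frac{n}{f} = \log\frac{f}{n},
\qquad
0 = \log\frac{f}{f} \;\le\; I \;\le\; \log\frac{f}{1} = \log f
```

The lower bound is reached when every occurrence modifies a different noun phrase, and the upper bound when all occurrences modify the same one. Because the upper bound grows with f, the intensity is best read together with the frequency, as in the graphs of FIGS. 10 and 11.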
[0048]
The extraction of noun phrases, adjective phrases, or verb phrases in the evaluation analysis in step S14-1, the sensitivity analysis in step S14-2, and the function requirement analysis in step S14-3 will be described in detail later.
[0049]
When the data analysis for the selected subject is completed, the CPU 11 registers the analysis result in the analysis result database 14c (step S15). Further, the CPU 11 causes the communication device 13 to transmit the analysis result to the terminal device 2 that has instructed to start processing, and displays and outputs the analysis result on the display device of the terminal device 2 (step S16). This completes the processing in the data analysis device 1. The analysis result registered in the analysis result database 14c can be taken out at any time by making a request from the terminal device 2.
[0050]
Next, extraction of noun phrases, adjective phrases, or verb phrases in the evaluation analysis in step S14-1, the sensitivity analysis in step S14-2, and the function requirement analysis in step S14-3 will be described. As described above, the extraction of the noun phrase, the adjective phrase, or the verb phrase here is a premise for calculating the degree of effect, the degree of evaluation aggregation, or the degree of opinion aggregation. Here, an output example of the result of data analysis for a description relating to the current state (items entered in the entry fields 101 and 103 of the entry form 100) is shown.
[0051]
FIG. 6 is a diagram explaining the extraction of noun phrases and adjective phrases in the evaluation analysis. As shown in reverse video, the noun phrase "word registration" is extracted first. Next, as shown in a box, the adjective phrase "troublesome" is extracted as an adjective phrase modifying "word registration". If the frequency of "word registration" is 4, the level value of "troublesome" is −1, and no other adjective phrase modifies "word registration", the degree of effect is obtained as −1.
[0052]
FIG. 7 is a diagram for explaining extraction of adjective phrases and noun phrases in sensitivity analysis. Here, as shown in reverse video, the adjective phrase “correct” is first extracted. Next, as a noun phrase modified by the adjective phrase “correct”, a noun phrase “Japanese” is extracted as shown in a box. In this case, it can be seen that the frequency of the adjective phrase “correct” is 4 and the evaluation intensity is 1.
[0053]
FIG. 8 is a diagram for explaining the extraction of verb phrases and noun phrases in the function request analysis. Here, the verb phrase “setting” is first extracted as shown in reverse video. Next, as a noun phrase modified by the verb phrase “setting”, a noun phrase “dictionary” is extracted as shown in a box. In this case, it is understood that the frequency of the verb phrase “setting” is 4 and the opinion aggregation level is 1.
[0054]
Next, output examples of the results of the evaluation analysis in step S14-1, the sensitivity analysis in step S14-2, and the function requirement analysis in step S14-3 are described with specific examples. Here again, the descriptions relating to the current state (items entered in entry fields 101 and 103 of the entry form 100) are used.
[0055]
FIG. 9 is a diagram showing an output example of the degree of effect of each noun phrase, which is the result of the evaluation analysis in step S14-1. The vertical axis shows the degree of effect of each noun phrase, and the horizontal axis shows its frequency. A positive degree of effect (favorable evaluation) is plotted in the positive region of the vertical axis (upper part of the figure), and a negative degree of effect (unfavorable evaluation) in the negative region (lower part of the figure).
[0056]
Referring to FIG. 9, for example, "conversion accuracy" has both a high frequency and a high degree of effect, so it is easy to see that customers are satisfied with this aspect of the current product. In contrast, "word registration" has a high frequency but a negative degree of effect, so it is easy to see that many customers are dissatisfied with this aspect of the current product and that it should be improved in future product development.
[0057]
FIG. 10 is a diagram showing an output example of the evaluation intensity of each adjective phrase, which is the result of the sensitivity analysis in step S14-2. The vertical axis shows the evaluation intensity of each adjective phrase, and the horizontal axis shows its frequency. Referring to FIG. 10, for example, "difficult" and "correct" have both a high frequency and a high evaluation intensity, so it is easy to see that many customers use "difficult" or "correct" about the same aspect of the same product.
[0058]
FIG. 11 is a diagram showing an output example of the degree of opinion aggregation of each verb phrase, which is the result of the function requirement analysis in step S14-3. The vertical axis shows the degree of opinion aggregation of each verb phrase, and the horizontal axis shows its frequency. Referring to FIG. 11, for example, "installation" has both a high frequency and a high degree of opinion aggregation, so it is easy to see that many customers make the same request about the same aspect of installation.
[0059]
As described above, in the evaluation analysis performed by the data analysis apparatus 1 of this embodiment, attention is paid to each noun phrase, and the level values of the adjective phrases extracted for each noun phrase are totaled to calculate its degree of effect. The degree of effect numerically indicates whether customers evaluate the aspect named by the noun phrase favorably or unfavorably. Because the degree of effect of each noun phrase shows at a glance how customers evaluate the various aspects of the product, the results of the evaluation analysis are easy to use in formulating management and marketing strategies.
[0060]
In the sensitivity analysis, attention is paid to each adjective phrase, and its evaluation intensity is calculated from the noun phrases it modifies. The evaluation intensity numerically indicates whether the impression expressed by the adjective phrase is concentrated on one aspect or spread over many. Because the evaluation intensity of each adjective phrase shows immediately whether the same impression is concentrated on the same aspect, the results of the sensitivity analysis are easy to use in formulating management and marketing strategies.
[0061]
In the function requirement analysis, attention is paid to each verb phrase, and its degree of opinion aggregation is calculated from the noun phrases it modifies. The degree of opinion aggregation numerically indicates whether the requests expressed by the verb phrase are concentrated on one aspect or spread over many. Because the degree of opinion aggregation of each verb phrase shows immediately whether the same request is concentrated on the same aspect, the results of the function requirement analysis are easy to use in formulating management and marketing strategies.
[0062]
In addition, the degree of effect, evaluation intensity, and degree of opinion aggregation obtained by the evaluation analysis, sensitivity analysis, and function requirement analysis are each output as a graph with the value on the vertical axis and the frequency of the corresponding phrase on the horizontal axis. This visual presentation makes it easy to grasp what many customers feel and request, and makes the results of the data analysis easy to use in formulating management and marketing strategies.
[0063]
Furthermore, the questionnaires analyzed in this embodiment are written according to the entry form 100, which has entry fields 101 to 104. Descriptions relating to the current state are entered in fields 101 and 103, and descriptions relating to the future state in fields 102 and 104, so the two are kept separate. This makes it possible to analyze current problems separately from the state the user desires once those problems are solved, which makes the results of the data analysis easier to use in formulating management and marketing strategies.
[0064]
The present invention is not limited to the above-described embodiment, and various modifications and applications are possible. Hereinafter, modifications of the above-described embodiment applicable to the present invention will be described.
[0065]
In the above embodiment, customers fill in the questionnaire using the entry form 100 shown in FIG. 2, which keeps the current state and the future state separate. However, questionnaires are not always filled in with the current state and future state clearly separated, and some customers may prefer to answer freely without being bound by the entry form 100 of FIG. 2. Unless an entry method such as the entry form 100 of FIG. 2 is used, questionnaires are not filled in with the current state and future state separated in the first place.
[0066]
Therefore, data analysis can be performed after separating the current state from the future state by any one, or any combination, of the following four methods. Even for questionnaire text entered according to the entry form 100 of FIG. 2, where the current and future states are presumed to be separated, the final separation may be decided after processing by these methods.
[0067]
As a first method, parts of the text written in the present or past tense are judged to describe the current state, and parts written in the future tense are judged to describe the future state, so that the two can be separated. As a second method, because descriptions of the current state usually appear before descriptions of the future state, the two can be separated according to the order in which they are described.
[0068]
As a third method, focusing on descriptions of inconveniences, the degree of effect obtained by the evaluation analysis in step S14-1 is used: a description with a low degree of effect is treated as the current state. If the same sentence contains a description with a high degree of effect for the same noun phrase, that description can be treated as the future state, thereby separating the current state from the future state.
[0069]
As a fourth method, a dictionary such as that shown in FIG. 12 is prepared as part of the knowledge dictionary 14b. This dictionary registers expressions frequently used to describe future states. In the questionnaire sentences, a description containing an expression registered in the dictionary of FIG. 12 is treated as the future state. If the same sentence contains a corresponding description with the same noun phrase, that description can be treated as the current state, thereby separating the current state from the future state.
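A minimal sketch combining the first and fourth methods on English sentences. Real questionnaires would be in Japanese, with tense taken from the morphological analysis of step S11 and phrases from the FIG. 12 dictionary. The regular expression, the phrase list and the name `split_current_future` below are hypothetical stand-ins.

```python
import re

# Hypothetical stand-ins for tense detection (method 1) and the
# frequent future-state phrase dictionary of FIG. 12 (method 4).
FUTURE_TENSE = re.compile(r"\b(will|shall|going to)\b", re.IGNORECASE)
FUTURE_PHRASES = ("i want", "i hope", "it would be nice if", "should be")


def split_current_future(sentences):
    """Label each sentence 'future' if it shows a future-tense marker or contains
    a registered future-state phrase; otherwise label it 'current'."""
    labels = []
    for sentence in sentences:
        text = sentence.lower()
        is_future = bool(FUTURE_TENSE.search(sentence)) or any(
            phrase in text for phrase in FUTURE_PHRASES
        )
        labels.append("future" if is_future else "current")
    return labels
```

The second method could be layered on top, for example by falling back on a sentence's position when neither signal fires. The third method needs the degree-of-effect values from step S14-1 and is not shown.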
[0070]
By clearly distinguishing the current state from the future state in this way before analyzing the questionnaire text, customer satisfaction, dissatisfaction, and requests regarding the product covered by the questionnaire can easily be understood separately. As a result, those who formulate management and marketing strategies can easily devise accurate strategies for improving customer satisfaction.
[0071]
In the above embodiment, the text to be analyzed was a questionnaire entered according to the entry form 100 of FIG. 2, which the customer sent from his or her terminal device 4 to the data analysis apparatus 1 for recording in the questionnaire file 14a. However, the text to be analyzed is not limited to data sent by customers to the data analysis apparatus 1; it may also be collected data such as bulletin-board posts and articles on Web servers on the Internet 6. Because such text does not separate the current state from the future state, the separation techniques described above are particularly effective for it.
[0072]
In the above embodiment, the data analysis device 1 is connected to the terminal device 2 via the LAN 3, and performs data analysis on the questionnaire recorded in the questionnaire file 14a in accordance with an instruction from the terminal device 2, and the analysis result is obtained. It was supposed to be returned to the terminal device 2 of the instruction source. That is, the present invention is realized in a system having a client-server configuration. On the other hand, the present invention may be realized in a stand-alone system assuming that the data analysis apparatus 1 has an input device and a display device.
[0073]
In the above-described embodiment, the program for the data analysis apparatus 1 to perform data analysis on the questionnaire has been described as being stored in the storage device 12 in advance. However, this program may be stored in a computer-readable recording medium such as a CD-ROM or DVD-ROM and distributed independently of the hardware. Alternatively, these processing programs may be stored in a fixed disk device of a Web server device on the Internet and distributed through the Internet.
[0074]
[Effects of the invention]
As described above, according to the present invention, subjective information can easily be captured by a text mining technique and accurate data analysis can be performed. The analysis results also make it easier to formulate management and marketing strategies.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a system configuration centered on a data analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a questionnaire entry form recorded in a questionnaire file.
FIG. 3 is a diagram illustrating an example of a knowledge dictionary.
FIG. 4 is a flowchart showing processing in the data analysis apparatus according to the embodiment of the present invention.
FIG. 5 is a diagram showing an example of a simple dendrogram display of the subjects classified by subject analysis.
FIG. 6 is a diagram illustrating the extraction of noun phrases and adjective phrases in evaluation analysis.
FIG. 7 is a diagram illustrating the extraction of adjective phrases and noun phrases in sensitivity analysis.
FIG. 8 is a diagram illustrating the extraction of verb phrases and noun phrases in function request analysis.
FIG. 9 is a diagram illustrating an output example of the effect degree of each noun phrase obtained as a result of the evaluation analysis.
FIG. 10 is a diagram illustrating an output example of the evaluation intensity of each adjective phrase obtained as a result of the sensitivity analysis.
FIG. 11 is a diagram illustrating an output example of the opinion aggregation degree of each verb phrase obtained as a result of the function request analysis.
FIG. 12 is a diagram showing an example of a dictionary added to the knowledge dictionary in a modification.
[Description of reference numerals]
1 Data analysis apparatus
2 Terminal device
3 LAN
4 Terminal device
5 Gateway
6 Internet
11 CPU
12 Storage device
13 Communication device
14 File device
14a Questionnaire file
14b Knowledge dictionary
14c Analysis result database

Claims

A data analysis apparatus comprising:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically preferable or not preferable;
noun phrase extraction means for, when the start of analysis is instructed, referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting noun phrases from each description part included in the text to be analyzed, the text including a description part relating to the current state and a description part relating to a future state related to the current state;
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
separation means for separating, within the text to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a dictionary stored in the storage means;
effect degree calculation means for, for each description part separated by the separation means, totaling, for each noun phrase extracted by the noun phrase extraction means, the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means, and calculating the totaled level value as the effect degree of that noun phrase; and
effect degree output means for outputting the effect degree calculated for each noun phrase by the effect degree calculation means.
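The effect degree calculation of the apparatus above, together with the per-noun-phrase frequency that the next dependent claim pairs with it, can be sketched in Python. The sketch assumes that extraction has already produced (noun phrase, modifier phrase) pairs for one separated description part; the level values in `MODIFIER_LEVELS` are invented for illustration, and counting frequency over the extracted pairs rather than over the raw text is a simplification.

```python
from collections import Counter, defaultdict

# Hypothetical modifier phrase dictionary: positive = semantically preferable,
# negative = not preferable. Values are illustrative only.
MODIFIER_LEVELS = {"good": 2, "convenient": 1, "bad": -2, "heavy": -1}


def effect_degrees(pairs, modifier_levels=MODIFIER_LEVELS):
    """Total modifier level values per noun phrase for one description part.

    `pairs` is a list of (noun_phrase, modifier_phrase) tuples, assumed to be
    the output of the noun phrase and modifier phrase extraction steps.
    Returns {noun_phrase: (effect_degree, frequency)}, where frequency counts
    how many extracted pairs mention the noun phrase.
    """
    totals = defaultdict(int)
    frequency = Counter()
    for noun, modifier in pairs:
        frequency[noun] += 1
        # Modifiers missing from the dictionary contribute nothing.
        totals[noun] += modifier_levels.get(modifier, 0)
    return {noun: (totals[noun], frequency[noun]) for noun in frequency}
```

Called on `[("battery", "bad"), ("battery", "heavy"), ("keyboard", "good"), ("battery", "convenient")]`, the sketch gives "battery" an effect degree of -2 from three mentions and "keyboard" +2 from one. Running it once per separated part keeps current-state evaluations apart from future-state ones.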

The data analysis apparatus according to claim 1, further comprising noun phrase frequency calculation means for calculating, for each description part separated by the separation means, the appearance frequency of each noun phrase extracted by the noun phrase extraction means in each description part included in the text to be analyzed,
wherein the effect degree output means outputs the effect degree calculated for each noun phrase by the effect degree calculation means and the appearance frequency calculated by the noun phrase frequency calculation means in visual association with each other.

The data analysis apparatus according to claim 1, further comprising:
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part included in the text to be analyzed;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrase modified by each modifier phrase extracted by the modifier phrase extraction means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and for each modifier phrase, an aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase, based on the result of the noun phrase extraction by the noun phrase extraction means; and
aggregation degree output means for outputting the aggregation degree calculated for each modifier phrase by the aggregation degree calculation means.

The data analysis apparatus according to claim 1, further comprising modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the appearance frequency of each modifier phrase extracted by the modifier phrase extraction means in each description part included in the text to be analyzed,
wherein the aggregation degree output means outputs the aggregation degree calculated for each modifier phrase by the aggregation degree calculation means and the appearance frequency calculated by the modifier phrase frequency calculation means in visual association with each other.

A data analysis apparatus comprising:
noun phrase dictionary storage means for storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered;
modifier phrase dictionary storage means for storing a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
modifier phrase extraction means for, when the start of analysis is instructed, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part included in the text to be analyzed, the text including a description part relating to the current state and a description part relating to a future state related to the current state;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrase modified by each modifier phrase extracted by the modifier phrase extraction means;
separation means for separating, within the text to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a dictionary stored in the storage means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and for each modifier phrase, an aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase, based on the result of the noun phrase extraction by the noun phrase extraction means; and
aggregation degree output means for outputting the aggregation degree calculated for each modifier phrase by the aggregation degree calculation means.
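The claims do not give a numerical formula for the aggregation degree, so the following Python sketch adopts an assumed definition: the share of a modifier phrase's occurrences that fall on its single most frequent noun phrase, so that 1.0 means every occurrence modifies the same noun phrase. It also returns each modifier phrase's frequency, as in the dependent claims. Treat it as an illustration of the idea under that assumption, not as the patent's formula.

```python
from collections import Counter, defaultdict


def aggregation_degrees(pairs):
    """Per-modifier aggregation degree for one description part.

    Assumed definition (not taken from the patent text): occurrences of the
    modifier's most frequent noun phrase divided by all its occurrences.
    `pairs` is a list of (noun_phrase, modifier_phrase) tuples.
    Returns {modifier_phrase: (aggregation_degree, frequency)}.
    """
    # modifier phrase -> Counter of the noun phrases it modifies
    targets = defaultdict(Counter)
    for noun, modifier in pairs:
        targets[modifier][noun] += 1
    result = {}
    for modifier, counts in targets.items():
        total = sum(counts.values())
        top = counts.most_common(1)[0][1]
        result[modifier] = (top / total, total)
    return result
```

With `[("PC", "lighter"), ("PC", "lighter"), ("charger", "lighter"), ("startup", "faster")]`, "lighter" scores 2/3 over three occurrences and "faster" scores 1.0 over one, which would flag "faster" as a request concentrated on a single target.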

The data analysis apparatus according to claim 1, further comprising modifier phrase frequency calculation means for calculating, for each description part separated by the separation means, the appearance frequency of each modifier phrase extracted by the modifier phrase extraction means in each description part included in the text to be analyzed,
wherein the aggregation degree output means outputs the aggregation degree calculated for each modifier phrase by the aggregation degree calculation means and the appearance frequency calculated by the modifier phrase frequency calculation means in visual association with each other.

A data analysis method in a computer apparatus that has dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically preferable or not preferable, and that has storage means used as a work area, the method comprising:
when the start of analysis is instructed, referring to the noun phrase dictionary stored in the dictionary storage means, extracting noun phrases from each description part included in the text to be analyzed, which is stored in the storage means and includes a description part relating to the current state and a description part relating to the future state, and storing the extracted noun phrases in the storage means;
referring to the modifier phrase dictionary stored in the dictionary storage means, extracting modifier phrases that modify the noun phrases extracted and stored in the storage means, and storing the extracted modifier phrases in the storage means;
separating, within the text to be analyzed stored in the storage means, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a frequent phrase dictionary stored in the dictionary storage means, and storing the separated parts in the storage means;
for each separated description part, totaling, for each noun phrase extracted and stored in the storage means, the level values registered in the modifier phrase dictionary for the modifier phrases extracted and stored in the storage means, calculating the totaled level value as the effect degree of that noun phrase, and storing the calculated effect degree in the storage means; and
outputting the effect degree calculated for each noun phrase and stored in the storage means from an output device of the computer apparatus.

A data analysis method in a computer apparatus that has dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, and that has storage means used as a work area, the method comprising:
when the start of analysis is instructed, referring to the modifier phrase dictionary stored in the dictionary storage means, extracting modifier phrases that modify noun phrases in each description part included in the text to be analyzed, which is stored in the storage means and includes a description part relating to the current state and a description part relating to the future state, and storing the extracted modifier phrases in the storage means;
referring to the noun phrase dictionary stored in the dictionary storage means, extracting the noun phrase modified by each modifier phrase extracted and stored in the storage means;
separating, within the text to be analyzed stored in the storage means, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a frequent phrase dictionary stored in the dictionary storage means, and storing the separated parts in the storage means;
for each separated description part, calculating for each modifier phrase, based on the result of the noun phrase extraction, an aggregation degree indicating the degree to which the modifier phrase extracted and stored in the storage means modifies the same noun phrase, and storing the calculated aggregation degree in the storage means; and
outputting the aggregation degree calculated for each modifier phrase and stored in the storage means from an output device of the computer apparatus.

A program for causing a computer apparatus, which has dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered in association with level values indicating whether each modifier phrase is semantically preferable or not preferable, to function as:
noun phrase extraction means for, when the start of analysis is instructed, referring to the noun phrase dictionary stored in the dictionary storage means and extracting noun phrases from each description part included in the text to be analyzed, the text including a description part relating to the current state and a description part relating to a future state related to the current state;
modifier phrase extraction means for referring to the modifier phrase dictionary stored in the dictionary storage means and extracting modifier phrases that modify the noun phrases extracted by the noun phrase extraction means;
separation means for separating, within the text to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a dictionary stored in the storage means;
effect degree calculation means for, for each description part separated by the separation means, totaling, for each noun phrase extracted by the noun phrase extraction means, the level values registered in the modifier phrase dictionary for the modifier phrases extracted by the modifier phrase extraction means, and calculating the totaled level value as the effect degree of that noun phrase; and
effect degree output means for outputting the effect degree calculated for each noun phrase by the effect degree calculation means from an output device of the computer apparatus.

A program for causing a computer apparatus, which has dictionary storage means storing a noun phrase dictionary in which noun phrases that can be included in text to be analyzed are registered and a modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered, to function as:
noun phrase dictionary storage means for storing the noun phrase dictionary in which noun phrases that can be included in the text to be analyzed are registered;
modifier phrase dictionary storage means for storing the modifier phrase dictionary in which modifier phrases that can modify the noun phrases registered in the noun phrase dictionary are registered;
modifier phrase extraction means for, when the start of analysis is instructed, referring to the modifier phrase dictionary stored in the modifier phrase dictionary storage means and extracting modifier phrases that modify noun phrases in each description part included in the text to be analyzed, the text including a description part relating to the current state and a description part relating to a future state related to the current state;
noun phrase extraction means for referring to the noun phrase dictionary stored in the noun phrase dictionary storage means and extracting the noun phrase modified by each modifier phrase extracted by the modifier phrase extraction means;
separation means for separating, within the text to be analyzed, the description part relating to the current state from the description part relating to the future state, according to whether the tense used is past, present, or future, whether the part is described earlier or later, or whether it contains a future-state frequent phrase registered in a dictionary stored in the storage means;
aggregation degree calculation means for calculating, for each description part separated by the separation means and for each modifier phrase, an aggregation degree indicating the degree to which the modifier phrase extracted by the modifier phrase extraction means modifies the same noun phrase, based on the result of the noun phrase extraction by the noun phrase extraction means; and
aggregation degree output means for outputting the aggregation degree calculated for each modifier phrase by the aggregation degree calculation means from an output device of the computer apparatus.