JP2004021605A

JP2004021605A - Information sorting device, method, and program

Info

Publication number: JP2004021605A
Application number: JP2002175625A
Authority: JP
Inventors: Hideaki Taruguchi (樽口 秀昭)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2002-06-17
Filing date: 2002-06-17
Publication date: 2004-01-22
Anticipated expiration: 2022-06-17
Also published as: JP4161171B2

Abstract

PROBLEM TO BE SOLVED: To provide an information sorting device, method, and program for easily and accurately sorting information into groups and for informing a user of the groups of the information with sounds.

SOLUTION: The information sorting device is provided with a sorting means for sorting information into groups based on a keyword dictionary, a dictionary updating means for updating the keyword dictionary based on constituent elements extracted from the information, and a sound reproducing means for reproducing a sound according to the group into which the information is sorted. Because the dictionary updating means updates the keyword dictionary based on the constituent elements extracted from the information (S315), the information can be sorted into groups easily and accurately. Because the sound reproducing means reproduces a sound according to the sorted group, the user can be informed of the group of the information by that sound.

COPYRIGHT: (C)2004,JPO

Description

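[0001]
[Technical field of the invention]
The present invention relates to an information classification device, method, and program.
[0002]
[Prior art]
Techniques for classifying information using keywords registered in advance are widely known. The information to be classified is wide-ranging: e-mail, and text, images, audio, and the like distributed over the Internet. Applying such classification techniques makes it possible to quickly extract the items that matter to a user from a large body of information, or, when information is being received continually over a communication line, to learn immediately that an item important to the recipient has arrived.
[0003]
Notifying the user of a classification result by sound makes it likely that the result reaches the user immediately, regardless of whether the user is paying attention to the information. In addition, when the result is conveyed by sound, the user can take in the information visually while classifying it by ear, which makes the information easier to grasp.
[0004]
Japanese Patent Application Laid-Open No. 2001-282635 discloses a communication device that, when a keyword registered in advance is present in an object within an e-mail, outputs through a speaker, upon arrival of that e-mail, a melody registered in advance in association with the keyword. With this communication device, an operator who has registered keywords in advance can learn immediately that an important e-mail has arrived.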
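[0005]
[Problems to be solved by the invention]
However, the communication device disclosed in Japanese Patent Application Laid-Open No. 2001-282635 has the following problems. The first is the difficulty and effort of registering appropriate keywords. To distinguish information that is important to the user from information that is not, appropriate keywords must be registered in advance. Yet classifying important and unimportant information accurately requires registering a large number of keywords together with appropriate combination conditions. The second is that the importance of information varies over time. What counts as important to the user changes as time passes. Consequently, to distinguish the information that is important at a given time from the information that is unimportant at that time, the keywords must be updated continually.
[0006]
The present invention was created to solve these problems. Its object is to provide an information classification device, method, and program that classify information into groups easily and accurately and notify the user of the group of the information by sound.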
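[0007]
[Means for solving the problems]
To achieve the above object, an information classification device according to the present invention comprises: classification means for classifying information into groups based on a keyword dictionary; dictionary updating means for updating the keyword dictionary based on constituent elements extracted from information; and sound reproducing means for reproducing a sound according to the group into which the information is classified.
[0008]
The information that the dictionary updating means uses to update the keyword dictionary is not information entered for the direct purpose of classification. Specifically, it is, for example, a document created by text input or a document received over a communication line. That is, the constituent elements that the dictionary updating means uses to update the keyword dictionary are, for example, text corresponding to relatively small linguistic units (titles, clauses, phrases, words, and the like) entered when composing an e-mail, text corresponding to such units within a received e-mail, or text corresponding to such units contained in an HTML file received over the Internet.
[0009]
In the information classification device according to the present invention, the dictionary updating means updates the keyword dictionary based on constituent elements extracted from information, so information can be classified into groups easily and accurately. The sound reproducing means reproduces a sound according to the resulting group, so the user can be notified of the group of the information by sound.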
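[0010]
Further, the dictionary updating means of the information classification device according to the present invention is characterized in that, when information is created by text input, it extracts constituent elements of the information at each segment of the text input. In general, the keywords registered in a keyword dictionary must be common nouns, proper nouns, and the like whose frequency of appearance rises in a particular context. Extracting appropriate keywords from text information therefore requires breaking sentences into words. During text input, on the other hand, the text is often divided into words, clauses, phrases, and so on for character-type conversion (for example into mixed kanji-kana text), or words are separated from each other by spaces. Accordingly, having the dictionary updating means extract constituent elements at each segment of the text input simplifies the processing needed to extract appropriate keywords for updating the keyword dictionary.
[0011]
Further, having the dictionary updating means extract constituent elements from received information allows information to be classified into groups accurately. This is because users generally collect information that is important to them actively, so important information is received more frequently than unimportant information.
[0012]
Further, the dictionary updating means extracts from information a constituent element that has a strong co-occurrence relationship with a keyword already registered in the keyword dictionary, and registers it in the keyword dictionary as a keyword paired with the group that is paired with that registered keyword. Information can thereby be classified more accurately. Here, "paired" means associated with each other.
[0013]
Further, the sound reproducing means reproduces, when information is received, a sound according to the group into which that information is classified, so that important information can be used immediately upon receipt.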
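[0014]
The information classification device according to the present invention further comprises weighting means for registering weights for keywords in the keyword dictionary, and the classification means classifies information into groups based on the weights registered for the keywords. Using weights allows information to be classified more accurately.
[0015]
Further, the sound reproducing means changes the control information for reproducing the sound according to the frequency with which keywords registered in the keyword dictionary appear in the displayed portion of the information. The user can thereby easily tell, for example during scrolling, that an important part of the information is on screen, and can use the important parts of the information efficiently. Here, the displayed portion of the information means the part currently shown of information that cannot be displayed in its entirety on one screen.
[0016]
Further, the information classification device according to the present invention further comprises mapping means for displaying, on a coordinate plane, a position according to the correlation between the information and the groups, so that the tendency of information correlated with several groups can be conveyed accurately to the user.
[0017]
Further, the dictionary updating means additionally registers, in the keyword dictionary, constituent elements of information classified by manual classification means as keywords paired with the group designated by the operator, so that the operator's intention is reflected directly in the classification processing.
[0018]
The functions of the plurality of means provided in the information classification device according to the present invention are realized by any combination of hardware resources whose functions are determined by their configuration itself and hardware resources whose functions are determined by a program. The functions of these means are not limited to those realized by hardware resources that are physically independent of one another.
[0019]
The present invention can be specified not only as a device invention but also as a program invention, as an invention of a recording medium on which the program is recorded, and as a method invention.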
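[0020]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to the drawings.
FIG. 2 shows the information classification device 1, an embodiment of the present invention, connected to a communication network N and an e-mail delivery network M. The information classification device 1 is configured as a personal computer, a personal digital assistant (PDA), a mobile phone, or the like. It receives HTML files from a WWW (World Wide Web) server 2 through the communication network N, such as the Internet, and exchanges e-mail with an e-mail device 3 through the e-mail delivery network M, such as a telephone line. The communication network N and the e-mail delivery network M may be the same network.
[0021]
FIG. 3 is a block diagram showing the hardware configuration of the information classification device 1. As shown, the information classification device 1 comprises a CPU 11, a ROM 12, a RAM 13, an operating device 14, a communication unit 15, a sound control unit 16, a speaker 17, a display control unit 18, a display device 19, and an external storage device 20.
[0022]
The CPU 11 executes programs stored in the ROM 12 to control each part of the information classification device 1. It also executes the processing program to perform, among others, the following: updating the keyword dictionary W based on constituent elements extracted from information such as HTML files and e-mail; classifying information into groups based on the keyword dictionary W; reproducing a sound according to the group into which the information is classified; displaying on a coordinate plane a position according to the correlation between the information and the groups; and registering weights for keywords in the keyword dictionary W based on constituent elements of the information.
[0023]
The ROM 12 is a memory that stores in advance the minimum control programs and data needed for the CPU 11 to operate, the processing program, an e-mail program, a Web browser, and so on. The RAM 13 is a memory that temporarily stores programs and various data. These programs and data may be downloaded via the communication unit 15 and stored in a predetermined area of the RAM 13 or the external storage device 20. They may also be read from a computer-readable storage medium such as a compact disc (not shown) and stored in a predetermined area of the RAM 13 or the external storage device 20.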
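[0024]
The operating device 14 is, for example, a keyboard and mouse in the case of a personal computer or dial buttons in the case of a mobile phone. The operator of the information classification device 1 uses it to give various instructions and to enter text.
[0025]
The communication unit 15 is a so-called network interface card, a modem, or the like, and is configured to be connectable to the communication network N and the e-mail delivery network M.
The sound control unit 16 generates an audio signal based on a sound file that describes control information for reproducing a sound, and outputs this signal to the speaker 17 for playback. The sound control unit 16 can also adjust the loudness of the speaker 17, that is, the playback volume. When the CPU 11 outputs control information representing a playback volume setting, the sound control unit 16 adjusts the playback volume based on that control information.
[0026]
Under the control of the CPU 11, the display control unit 18 outputs information such as HTML files received through the communication network N and e-mail received through the e-mail delivery network M to the display device 19, which consists of a liquid crystal display panel (LCD), a CRT, or the like.
[0027]
The external storage device 20 includes a hard disk, a flash memory, or the like, and stores the keyword dictionary, co-occurrence keyword dictionary, kana-kanji conversion dictionary, keyword candidate dictionary, group table, melody table, sound files, and so on, which are described later. The sound file data may be digitally encoded data such as the MIDI format, or waveform sample data such as PCM, DPCM, or ADPCM.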
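[0028]
FIG. 4 is a data flow diagram of the information classification device 1. The processing program creates a communication process 31, a classification process 32, a melody reproduction process 33, a mapping process 34, a morphological analysis process 35, a kana-kanji conversion process 36, a filter process 37, and a dictionary update process 38.
[0029]
The communication process 31 receives HTML files from the WWW server 2 according to a predetermined protocol and also sends and receives e-mail with the e-mail device 3. On receiving information such as an HTML file or an e-mail, the communication process 31 outputs it to the morphological analysis process 35 and the classification process 32.
[0030]
The classification process 32 classifies the information output from the communication process 31 into groups based on the keyword dictionary W and outputs, for each item of information, the group number, the group evaluation values, and so on to the melody reproduction process 33 and the mapping process 34.
[0031]
FIG. 5 shows an example of the keyword dictionary W. A record of the keyword dictionary W includes the fields group number, keyword, setting value, and song number. The group number and the keyword in the same record are managed in the keyword dictionary W as a pair. Group numbers are those assigned to each group in the group table G described later, and the group to which a keyword belongs is determined by this group number. Registering a group number and a keyword as a pair in the keyword dictionary W is therefore substantially the same as registering a group and a keyword as a pair. Words whose frequency of appearance becomes particularly high in a specific context are registered as keywords. The setting value represents the weight of the keyword, and received information is classified into groups based on the keywords and the weights represented by their setting values. Song numbers are those assigned to each sound file name in the melody table M described later. Setting a song number for a keyword is optional. If none is set, the sound file to be played is determined by the song number set for the group to which the keyword belongs. A keyword's song number can be used, for example, when a particular keyword appears extremely often in an item of information and the sound file set for that keyword should be output in preference to the sound file set for the group to which the keyword belongs.
[0032]
FIG. 6 shows an example of the group table G. The group table G stores the names of the groups into which information is classified. Its records consist of four fields: group number, group name, song number, and set coordinates. When information is classified into a group, the sound file corresponding to the song number of that group is played. Among the group names, "preference" is a group for keywords that represent the operator's preferences, and keywords belonging to the group "preference" are registered by a different procedure from those of the other groups, as detailed later. The set coordinates are values used to determine the position of a mark that is displayed on a coordinate plane at a position according to the correlation between the information and the groups, as detailed later.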
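[0033]
Using the keyword dictionary W and the group table G described above, the classification process 32 classifies information, for example, as follows. It extracts text information such as the title, body, and file name from the information to be classified, calculates an evaluation value of each group for that information by the following formula, and classifies the information into the group with the largest group evaluation value.
Group evaluation value = Σ (number of occurrences of a keyword belonging to the group × setting value of that keyword)
Here, the setting value of a keyword is the "setting value" stored in the keyword dictionary W, and the number of occurrences of a keyword is the count of how many times that keyword appears in the information being classified. The classification process 32 is not limited to text information. It can also classify image information, sound information, and the like by using their file names and so on.
[0034]
For example, suppose the text information has the following content.
(Information): "バイクのプラモを買いました" ("I bought a plastic model of a motorcycle")
Using the keyword dictionary W shown in FIG. 5, the group evaluation values are calculated as follows.
Group "hobby": 5 ("バイク" × 1) + 2 ("プラモ" × 1) = 7
Group "work": 0
Group "preference": 0
The text information "バイクのプラモを買いました" is therefore classified into the group "hobby".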
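The evaluation formula and the worked example above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `KEYWORD_DICTIONARY`, `group_scores`, and `classify` are hypothetical, keyword occurrences are counted by plain substring matching, and only the two "hobby" entries implied by the example are included because FIG. 5 is not reproduced here.

```python
# Hypothetical in-memory form of the keyword dictionary W:
# (group name, keyword, setting value). Only the entries implied by the
# worked example are listed; the remaining rows of FIG. 5 are unknown.
KEYWORD_DICTIONARY = [
    ("hobby", "バイク", 5),  # "motorcycle"
    ("hobby", "プラモ", 2),  # "plastic model"
]
GROUPS = ["hobby", "work", "preference"]


def group_scores(text, dictionary=KEYWORD_DICTIONARY, groups=GROUPS):
    """Return {group: sum of (keyword occurrences x setting value)}."""
    scores = {group: 0 for group in groups}
    for group, keyword, weight in dictionary:
        # str.count gives the number of non-overlapping occurrences.
        scores[group] += text.count(keyword) * weight
    return scores


def classify(text, dictionary=KEYWORD_DICTIONARY, groups=GROUPS):
    """Return the group with the largest evaluation value.

    Tie handling is not described in the source; max() simply keeps the
    first group in `groups` order, which is an assumption of this sketch.
    """
    scores = group_scores(text, dictionary, groups)
    return max(scores, key=scores.get)


print(group_scores("バイクのプラモを買いました"))
print(classify("バイクのプラモを買いました"))
```

Passing a dictionary whose setting values are all 1 gives the unweighted variant in which only occurrence counts are summed per group.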
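[0035]
Information may also be classified without weights: for each group, the occurrences of all keywords belonging to that group are totaled and the information is classified into the group with the largest total. Alternatively, the information may simply be classified into the group to which the most frequently occurring keyword belongs.
[0036]
The melody reproduction process 33 plays a sound file according to the group into which the classification process 32 has classified the information. When the classification process 32 outputs the group number of that group, the melody reproduction process 33 obtains the song number set for the group from the group table G and plays the sound file identified by that song number in the melody table.
[0037]
FIG. 7 shows an example of the melody table M. The melody table M stores the file names of the sound files and includes fields for the song number and for the file name that uniquely identifies a sound file.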
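The lookup chain described above (keyword record, group table G, melody table M) can be sketched as follows. The record layout, the table contents, and the function name `resolve_sound_file` are assumptions made for illustration. Only group number 0 for "preference" is stated in the source; the other group numbers and all file names here are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeywordRecord:
    """One record of the keyword dictionary W (field names are hypothetical)."""
    group_no: int
    keyword: str
    setting_value: int
    song_no: Optional[int] = None  # optional per-keyword song number


# Hypothetical group table G (group number -> song number) and
# melody table M (song number -> sound file name).
GROUP_SONG_NO = {0: 3, 1: 1, 2: 2}
MELODY_TABLE = {1: "song1.mmf", 2: "song2.mmf", 3: "song3.mmf"}


def resolve_sound_file(group_no, record=None):
    """Pick the sound file for a classified item.

    A song number set on the keyword record takes precedence; otherwise
    the song number of the group is used.
    """
    song_no = record.song_no if record is not None else None
    if song_no is None:
        song_no = GROUP_SONG_NO[group_no]
    return MELODY_TABLE[song_no]


ski = KeywordRecord(group_no=1, keyword="スキー", setting_value=5)
bike = KeywordRecord(group_no=1, keyword="バイク", setting_value=5, song_no=2)
print(resolve_sound_file(1, ski))   # falls back to the group's song
print(resolve_sound_file(1, bike))  # keyword-specific song wins
```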
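[0038]
The timing of sound file playback can be set, for example, to when information is received or when it is viewed: specifically, when an e-mail arrives, when a received e-mail is viewed, or when an HTML file is viewed in the Web browser. For playback during viewing, the volume or the part of the sound file being played can also be set to vary with the frequency of keywords in the portion currently displayed. For example, if the device is set to raise the volume, or to play the chorus of the tune, when keywords appear frequently in the displayed portion, then while scrolling through the information the user can easily notice changes in keyword frequency in the displayed portion from the volume changing or from playback switching to the chorus partway through.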
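One way to read the volume rule above is as a function of the keyword count in the currently visible text. The source gives no concrete mapping, so the linear mapping and the values of `base`, `step`, and `maximum` below are purely illustrative assumptions, as are the function names.

```python
def visible_keyword_count(visible_text, keywords):
    """Count occurrences of registered keywords in the displayed portion."""
    return sum(visible_text.count(keyword) for keyword in keywords)


def playback_volume(visible_text, keywords, base=40, step=15, maximum=100):
    """Map keyword frequency in the visible text to a volume setting.

    The volume rises by `step` per keyword occurrence from `base` and is
    capped at `maximum`. All three values are hypothetical.
    """
    count = visible_keyword_count(visible_text, keywords)
    return min(maximum, base + step * count)


registered = ["スキー", "ゲレンデ"]
print(playback_volume("今日の天気は晴れです", registered))
print(playback_volume("ゲレンデはスキーに最適です", registered))
```

A viewer would call `playback_volume` again each time the scroll position changes and pass the result to the sound control unit as the new playback volume setting.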
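[0039]
The melody reproduction process 33 may also play the sound file set for each keyword in the keyword dictionary W. For example, it may play the sound file set for the keyword that appears most frequently in the information. Or, when information containing several keywords is viewed by scrolling, it may change the sound file being played according to the keyword that appears most often in the portion displayed at each moment.
[0040]
The playback volume may also be preset for each group according to the importance of the group. In addition, when an e-mail arrives and the number of unread e-mails previously received and classified into the same group is at or above a predetermined number, the playback volume may be raised to alert the operator.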
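[0041]
As shown in FIG. 8, the mapping process 34 displays a star mark Z on a coordinate plane at a position according to the correlation between the information and the groups. Specifically, it proceeds, for example, as follows. First, the coordinates for each group are obtained by the following formula.
Σ (set coordinates of the group × group evaluation value / total of the evaluation values)
Here, the set coordinates of the group are the set coordinates of each group shown in the group table G, and the group evaluation value is the value calculated by the classification process 32. The total of the evaluation values is the sum of the group evaluation values of all groups.
[0042]
Next, the x-coordinates obtained for each group by the above formula are summed to give the x-coordinate of the star mark Z, and the y-coordinates obtained for each group are summed to give its y-coordinate.
Next, a pop-up window is displayed on the display device 19. In the pop-up window, the star mark Z is displayed on an xy coordinate plane whose origin is the center of the window, together with three axes connecting the origin to the set coordinates of the groups "preference", "hobby", and "work". The window is displayed, for example, when an e-mail arrives, or when an HTML file requested from the WWW server 2 has been received and displayed in the Web browser.
[0043]
For example, suppose the group evaluation values for an incoming e-mail are as follows.
Group "hobby": 5
Group "work": 3
Group "preference": 2
In this case, the coordinates calculated for each group by the above formula are as follows.
Group "hobby": (160, -100)
Group "work": (-96, -60)
Group "preference": (0, 40)
Summing these coordinates separately for x and y gives (64, -120). As a result, when the e-mail arrives, a pop-up window showing the star mark Z at the position corresponding to (64, -120) on the coordinate plane is displayed on the display device 19.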
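The position calculation can be checked with a short sketch. The set coordinates below are not quoted from the source (FIG. 6 is not reproduced); they are back-solved from the worked example, for instance (160, -100) divided by the share 5/10 gives (320, -200) for "hobby". The function name `star_position` and the handling of an all-zero score are assumptions.

```python
# Set coordinates per group, inferred from the worked example rather than
# read from FIG. 6.
GROUP_COORDS = {
    "hobby": (320, -200),
    "work": (-320, -200),
    "preference": (0, 200),
}


def star_position(scores, coords=GROUP_COORDS):
    """Weighted mean of the group set coordinates.

    Each group contributes its set coordinates scaled by its share of the
    total evaluation value; the contributions are summed per axis.
    """
    total = sum(scores.values())
    if total == 0:
        # Assumption: with no keyword hits, place the mark at the origin.
        return (0.0, 0.0)
    x = sum(coords[group][0] * score / total for group, score in scores.items())
    y = sum(coords[group][1] * score / total for group, score in scores.items())
    return (x, y)


print(star_position({"hobby": 5, "work": 3, "preference": 2}))
```

With the example scores 5, 3, and 2 this reproduces the per-group contributions (160, -100), (-96, -60), and (0, 40) and their sum (64, -120).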
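[0044]
The morphological analysis process 35 extracts words, as constituent elements, from the information output by the communication process 31 and outputs them to the filter process 37. When the communication process 31 outputs information, the morphological analysis process 35 breaks the text in the information into words by morphological analysis. For example, it breaks the text "静岡県西部のゲレンデはスキーに最適です" ("The ski slopes in western Shizuoka Prefecture are ideal for skiing") into the nine words "静岡県／西部／の／ゲレンデ／は／スキー／に／最適／です".
[0045]
The kana-kanji conversion process 36 converts text entered by the user from kana to kanji based on the kana-kanji conversion dictionary and, at each conversion instruction, outputs the constituent element of the text being converted to the morphological analysis process 35. For example, suppose that to enter the text "静岡県西部のゲレンデはスキーに最適です" the operator instructs conversion in five segments, "しずおかけん／せいぶの／げれんでは／すきーに／さいてきです". The kana-kanji conversion process 36 then converts each segment into mixed kanji-kana text, "静岡県／西部の／ゲレンデは／スキーに／最適です", and outputs the five converted constituent elements to the morphological analysis process 35. Because the morphological analysis process 35 thus receives relatively small constituent elements, the amount of processing it needs to break them into words is reduced, which simplifies and speeds up the decomposition of information into words.
[0046]
The kana-kanji conversion process 36 may also accumulate, for each word, the number of times the word has been converted to kanji, that is, how often the word is used when composing text, and output the word to the morphological analysis process 35 once its usage count reaches a predetermined number.
[0047]
The filter process 37 filters the words output from the morphological analysis process 35 using the keyword candidate dictionary, excluding words that appear regardless of context and extracting as keywords the nouns whose rate of appearance rises only in specific contexts. For example, from the nine words "静岡県／西部／の／ゲレンデ／は／スキー／に／最適／です" it extracts the three keywords "静岡県" (Shizuoka Prefecture), "ゲレンデ" (ski slope), and "スキー" (skiing).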
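The filtering step can be sketched on the already segmented example. The source does not describe the internal structure of the keyword candidate dictionary, so this sketch treats it as a simple allow-list of context-specific nouns. That representation, its contents beyond the three example keywords, and the name `filter_keywords` are assumptions. Morphological analysis itself is not implemented here; the word list is taken directly from the example.

```python
# Output of the morphological analysis step for the example sentence.
WORDS = ["静岡県", "西部", "の", "ゲレンデ", "は", "スキー", "に", "最適", "です"]

# Hypothetical keyword candidate dictionary, modelled as an allow-list.
KEYWORD_CANDIDATES = {"静岡県", "ゲレンデ", "スキー", "雪", "バイク", "プラモ"}


def filter_keywords(words, candidates=KEYWORD_CANDIDATES):
    """Keep only words listed in the candidate dictionary, in input order."""
    return [word for word in words if word in candidates]


print(filter_keywords(WORDS))
```

Particles such as "の" and "は", and general words such as "西部" and "最適", are dropped because they are not in the candidate list, leaving the three keywords of the example.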
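[0048]
The dictionary update process 38 updates the co-occurrence keyword dictionary V based on the keywords output from the filter process 37 and on the keyword dictionary W, and updates the keyword dictionary W based on the co-occurrence keyword dictionary V. FIG. 9 shows an example of the co-occurrence keyword dictionary V. One record of the co-occurrence keyword dictionary V consists of a group number, a registered keyword, a co-occurring keyword, a co-occurrence count, and an occurrence count. The registered keyword and the co-occurring keyword in one record are registered in the co-occurrence keyword dictionary V as a pair.
[0049]
A co-occurring keyword is a keyword that has appeared (co-occurred), under certain conditions, within a single item of information together with a keyword registered in the keyword dictionary W as belonging to a group other than "preference" (a registered keyword). A co-occurring keyword is also a word that is a candidate for new registration in the keyword dictionary W. Examples of the extraction condition are that the word appears within information containing the registered keyword, that it appears within the paragraph containing the registered keyword, or that it appears within n keywords before or after the registered keyword (n being an arbitrary value). Co-occurring keywords are not extracted for keywords belonging to the group "preference". The occurrence count is the number of items of information, among those classified since a keyword was stored as a co-occurring keyword, that contained the registered keyword paired with that co-occurring keyword. The co-occurrence count is the number of items of information, among those classified since the keyword was stored as a co-occurring keyword, from which that already stored co-occurring keyword was extracted again as a co-occurring keyword of the registered keyword.
[0050]
Alternatively, the occurrence count may be the number of items of information, among those classified since a keyword was stored as a co-occurring keyword, that contained that co-occurring keyword. The occurrence count and the co-occurrence count may also be counted by the number of appearances within the information.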
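Of the extraction conditions listed above, the "within n keywords before or after the registered keyword" variant can be sketched as a sliding window over the ordered keyword list of one item. The function name `cooccurring_keywords` is hypothetical, and skipping words that are already registered is an assumption of this sketch. The caller is expected to pass only registered keywords from groups other than "preference".

```python
def cooccurring_keywords(keywords, registered, n):
    """Return (registered keyword, candidate) pairs found within a window.

    keywords   -- ordered keywords extracted from one item of information
    registered -- registered keywords from groups other than "preference"
    n          -- window size in keywords on each side
    """
    pairs = set()
    for index, keyword in enumerate(keywords):
        if keyword not in registered:
            continue
        start = max(0, index - n)
        for other in keywords[start:index + n + 1]:
            # Assumption: already registered words are not candidates.
            if other != keyword and other not in registered:
                pairs.add((keyword, other))
    return pairs


item = ["静岡県", "ゲレンデ", "スキー"]
print(cooccurring_keywords(item, {"スキー"}, n=1))
print(cooccurring_keywords(item, {"スキー"}, n=2))
```

Setting `n` to the length of the keyword list gives the loosest condition, where any keyword in the same item counts as co-occurring.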
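[0051]
The dictionary update process 38 registers in the keyword dictionary W a co-occurring keyword that has a strong co-occurrence relationship with a registered keyword, as a keyword belonging to the group of that registered keyword. The strength of the co-occurrence relationship between a registered keyword and a co-occurring keyword is indicated, for example, by how high the probability expressed as co-occurrence count / occurrence count is, or by how large the co-occurrence count is. For example, suppose a co-occurring keyword is registered in the keyword dictionary W when it satisfies the conditions co-occurrence count / occurrence count > 0.7 and co-occurrence count > 9. Given the co-occurrence keyword dictionary V shown in FIG. 9, if a newly received e-mail contains the text "山形県のスキー場には雪が多く残っている。" ("A lot of snow remains at the ski resorts in Yamagata Prefecture.") and "雪" (snow) is extracted as a co-occurring keyword of the registered keyword "スキー" (skiing), then "雪" is registered in the keyword dictionary W as a keyword of group number "1" and is deleted from the co-occurrence keyword dictionary V. The setting value registered in the keyword dictionary W at this time may be, for example, the same value as that of the registered keyword with which the co-occurring keyword was paired, or that setting value multiplied by the probability expressed as co-occurrence count / occurrence count. The song number registered may be, for example, the same as that of the registered keyword, or it may be left unset if the sound file set for the group of the registered keyword is the one to be played.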
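The promotion rule in the example above can be written as a small predicate. The thresholds 0.7 and 9 come from the text. The record layout, the counts in the usage lines (FIG. 9 is not reproduced), the setting value 5 for "スキー", and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CooccurrenceRecord:
    """One record of the co-occurrence keyword dictionary V (names assumed)."""
    group_no: int
    registered: str
    candidate: str
    co_count: int   # co-occurrence count
    occ_count: int  # occurrence count


def should_promote(record, min_ratio=0.7, min_count=9):
    """True when co_count / occ_count > min_ratio and co_count > min_count."""
    if record.occ_count == 0:
        return False
    ratio = record.co_count / record.occ_count
    return ratio > min_ratio and record.co_count > min_count


def new_dictionary_entry(record, registered_weight, scale_by_ratio=False):
    """Build the (group number, keyword, setting value) entry to add to W.

    The setting value is either copied from the registered keyword or
    scaled by the co-occurrence probability; the source permits both.
    """
    weight = registered_weight
    if scale_by_ratio:
        weight = registered_weight * record.co_count / record.occ_count
    return (record.group_no, record.candidate, weight)


snow = CooccurrenceRecord(1, "スキー", "雪", co_count=10, occ_count=13)
if should_promote(snow):
    print(new_dictionary_entry(snow, registered_weight=5))
```

After promotion the record would also be removed from V, which this sketch leaves to the caller.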
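[0052]
The dictionary update process 38 accumulates, separately from the co-occurrence keyword dictionary V, the number of items of information containing each keyword output to it. When that number satisfies a certain condition, it adds the keyword to the keyword dictionary W as a keyword belonging to the group "preference", that is, as a keyword with group number "0". When the operator is interested in information containing a particular keyword, the operator actively collects information containing it, so that keyword is likely to appear often across the information the operator receives. Keywords belonging to the group "preference" can therefore be said to represent the operator's preferences. Because such keywords are classified into the group "preference", the operator can easily tell from the sound played when an e-mail arrives that the information matches their own preferences.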
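The per-keyword document counter for the "preference" group can be sketched as below. The source says only that the keyword is added when the count "satisfies a certain condition", so the fixed `threshold` used here is an assumption, as are the class and method names.

```python
from collections import Counter


class PreferenceTracker:
    """Count items containing each keyword and promote frequent ones."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical promotion condition
        self.document_counts = Counter()
        self.preference_keywords = set()  # keywords with group number 0

    def observe(self, keywords):
        """Record the keywords of one item of information."""
        for keyword in set(keywords):  # count each item once per keyword
            self.document_counts[keyword] += 1
            if self.document_counts[keyword] >= self.threshold:
                self.preference_keywords.add(keyword)
        return self.preference_keywords


tracker = PreferenceTracker(threshold=2)
tracker.observe(["スキー", "スキー", "雪"])
print(tracker.observe(["スキー", "ゲレンデ"]))
```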
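[0053]
The operator may also be allowed to specify the group of the information. Specifically, the processing is, for example, as follows. When the device has been set in advance so that the operator specifies the group, and the filter process 37 outputs keywords for an item of information, the dictionary update process 38 displays on the display device 19 an instruction screen on which the operator designates the group for that information, and has the operator specify the group. When the operator specifies a group, the dictionary update process 38 additionally registers in the keyword dictionary W, as keywords belonging to the designated group, those extracted keywords whose number of occurrences is at or above a predetermined number. The operator may also be allowed to select the keywords to be additionally registered.
[0054]
The configuration of the information classification device 1 has been described above. Its operation is described below.
FIG. 1 is a flowchart showing the flow of processing in the information classification device 1.
[0055]
First, the operation that begins at "Start 4" in the figure, the initial setup of the melody table M, the group table G, and the keyword dictionary W, is described. The "Start 4" processing is always performed when the information classification device 1 is used for the first time and is performed thereafter as needed. The dictionary update process 38 displays a screen for initializing the keyword dictionary W and asks the operator to enter a keyword, the group number to which the keyword belongs, and a setting value. When the operator enters the keyword, the group number to which it belongs, the setting value, and a song number, the dictionary update process 38 registers them in the keyword dictionary W (S405). Next, the dictionary update process 38 lists on the screen the sound file names stored in the melody table M and has the operator select a sound file name for each group. When the operator selects the sound file names, the dictionary update process 38 registers the song number of each selected sound file name in the "song number" field of the group table G (S410). Next, the dictionary update process 38 sets the set coordinates in the group table G according to the number of groups (S415).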
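[0056]
Next, the operation that begins at "Start 1" in the figure, updating the dictionaries when an e-mail is sent, is described. The operator starts the e-mail program on the information classification device 1, enters the text to be sent, and, by marking a segment boundary with a predetermined keyboard operation, instructs kana-kanji conversion of the text entered so far (S105). When conversion is instructed, the kana-kanji conversion process 36 converts the designated text into mixed kanji-kana text and outputs the converted constituent elements to the morphological analysis process 35 (S110). The morphological analysis process 35 extracts words and outputs them to the filter process 37, and the filter process 37 filters those words to extract keywords and outputs them to the dictionary update process 38 (S310). When the keywords are output, the dictionary update process 38 updates the keyword dictionary W and the co-occurrence keyword dictionary V based on them (S315). Steps S310 to S315 are repeated each time the operator instructs kana-kanji conversion. When the operator has finished entering all the text and performs the operation to send the e-mail, the communication process 31 sends the e-mail to the e-mail device 3 (S115).
[0057]
Next, the operation that begins at "Start 3" in the figure, the operation when an e-mail is received, is described. When the communication process 31 receives an e-mail from the e-mail device 3 via the communication unit 15, it outputs the text information of the received e-mail to the morphological analysis process 35 (S305). The morphological analysis process 35 breaks the text information into words and outputs them to the filter process 37. The filter process 37 outputs the keywords extracted from the words to the dictionary update process 38 (S310). When the keywords are input, the dictionary update process 38 updates the keyword dictionary W and the co-occurrence keyword dictionary V based on them (S315). Next, the classification process 32 classifies the text information of the e-mail output from the communication process 31 into a group and outputs the result to the melody reproduction process 33 and the mapping process 34 (S320). The melody reproduction process 33 plays a sound according to the group. For example, if the group is "hobby", the melody reproduction process 33 obtains the song number set for the group "hobby" from the group table G, looks up the corresponding file name "曲名１．ｍｍｆ" in the melody table M, and obtains that sound file from the external storage device 20. It then outputs the sound file to the sound control unit 16. The sound control unit 16 plays the sound file, so that a sound corresponding to the group into which the e-mail was classified is reproduced. Meanwhile, the mapping process 34 displays on the display device 19 a pop-up window showing a star mark at the position according to the correlation between the information and the groups (S325). Next, the e-mail program displays the e-mail on the display device 19. While playing the sound file, the melody reproduction process 33 controls the playback volume according to the frequency of registered keywords in the portion of the text information currently displayed.
[0058]
The operation that begins at "Start 2" in the figure, browsing the Web, that is, receiving an HTML file, is substantially the same as the operation for "Start 3", so its description is omitted.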
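[0059]
The effects of this embodiment are described below.
The information classification device 1 of this embodiment newly registers in the keyword dictionary W a co-occurring keyword that has a strong co-occurrence relationship with a registered keyword. A strong co-occurrence relationship means that the co-occurring keyword appears frequently in information in which the registered keyword appears, and that when the co-occurring keyword appears in an item of information, that item is likely to belong to the group of the registered keyword. Moreover, even if the operator's interests shift over time, the operator is likely to take an interest in future in information related to information that interested them in the past. Once a co-occurring keyword has been registered as a keyword in the keyword dictionary W, information in which the original registered keyword does not appear and only the newly registered keyword appears can be classified into the group of the original registered keyword. Information can therefore be classified into groups accurately.
[0060]
Furthermore, because information is classified into the group with the largest group evaluation value calculated from the weight of each keyword, it can be classified into groups still more accurately. When the device is not set so that the operator specifies the group, the operator does not need to register keywords or classify information personally, so information can be classified into groups easily. For example, even when the importance of information changes over time, the information classification device 1 registers a word as a keyword once that word comes to appear more frequently, so highly important information can be classified into a particular group without the operator continually updating the keyword dictionary W. In addition, because the melody reproduction process 33 plays a sound according to the group, the operator can be notified of the group of the information by sound. The operator can judge the group from the sound and, in the case of e-mail for example, can know immediately that an important e-mail has arrived.
[0061]
Furthermore, when information is created by text input, the information classification device 1 of this embodiment extracts constituent elements of the information at each segment of the text input, so the processing in the morphological analysis process 35 is simplified for the converted constituent elements output from the kana-kanji conversion process 36.
[0062]
Furthermore, the information classification device 1 of this embodiment displays on a coordinate plane a position according to the correlation between the information and the groups. Thus, when one item of information contains several keywords that belong to different groups, the degree of its relationship to each of those groups is expressed visually. The tendency of information correlated with several groups can therefore be conveyed accurately to the operator.
[0063]
Furthermore, the information classification device 1 of this embodiment classifies information into groups according to the operator's instructions, so the operator's intention can be reflected directly in the classification processing.
[0064]
Although this embodiment has been described using e-mail and HTML files as examples of the information to be classified, the information may also be an image file or a sound file. For example, a sound file may be classified into a group based on its file name.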
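[Brief description of the drawings]
FIG. 1 is a flowchart showing the flow of processing in an information classification device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the information classification device according to an embodiment of the present invention connected to a network.
FIG. 3 is a block diagram showing the hardware configuration of the information classification device according to an embodiment of the present invention.
FIG. 4 is a data flow diagram of the information classification device according to an embodiment of the present invention.
FIG. 5 is a diagram showing the keyword dictionary of the information classification device according to an embodiment of the present invention.
FIG. 6 is a diagram showing the group table of the information classification device according to an embodiment of the present invention.
FIG. 7 is a diagram showing the melody table of the information classification device according to an embodiment of the present invention.
FIG. 8 is an example in which the information classification device according to an embodiment of the present invention displays, on a coordinate plane, a position according to the correlation between information and groups.
FIG. 9 is a diagram showing the co-occurrence keyword dictionary of the information classification device according to an embodiment of the present invention.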
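[Description of reference signs]
1 Information classification device
2 WWW server
3 E-mail device
11 CPU (sound reproducing means, dictionary updating means, manual classification means, weighting means, classification means, mapping means)
12 ROM
13 RAM (sound reproducing means, dictionary updating means, manual classification means, weighting means, classification means, mapping means)
14 Operating device (manual classification means)
15 Communication unit
16 Sound control unit (sound reproducing means)
17 Speaker (sound reproducing means)
18 Display control unit (manual classification means, weighting means, mapping means)
19 Display device (manual classification means, weighting means, mapping means)
20 External storage device (sound reproducing means, dictionary updating means, weighting means)
31 Communication process
32 Classification process
33 Melody reproduction process
34 Mapping process
35 Morphological analysis process
36 Kana-kanji conversion process
37 Filter process
38 Dictionary update process
[0001]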
[Technical field of the invention]
The present invention relates to an information classification device, a method, and a program.
[0002]
[Prior art]
Techniques for classifying information using keywords registered in advance are widely known. The information to be classified is wide-ranging: e-mail, and text, images, audio, and the like distributed over the Internet. Applying such classification techniques makes it possible to quickly extract the items that matter to a user from a large body of information, or, when information is being received continually over a communication line, to learn immediately that an item important to the recipient has arrived.
[0003]
Notifying the user of a classification result by sound makes it likely that the result reaches the user immediately, regardless of whether the user is paying attention to the information. In addition, when the result is conveyed by sound, the user can take in the information visually while classifying it by ear, which makes the information easier to grasp.
[0004]
Japanese Patent Application Laid-Open No. 2001-282635 discloses a communication device that, when a keyword registered in advance is present in an object within an e-mail, outputs through a speaker, upon arrival of that e-mail, a melody registered in advance in association with the keyword. With this communication device, an operator who has registered keywords in advance can learn immediately that an important e-mail has arrived.
[0005]
[Problems to be solved by the invention]
However, the communication device disclosed in Japanese Patent Application Laid-Open No. 2001-282635 has the following problems. The first is the difficulty and effort of registering appropriate keywords. To distinguish information that is important to the user from information that is not, appropriate keywords must be registered in advance. Yet classifying important and unimportant information accurately requires registering a large number of keywords together with appropriate combination conditions. The second is that the importance of information varies over time. What counts as important to the user changes as time passes. Consequently, to distinguish the information that is important at a given time from the information that is unimportant at that time, the keywords must be updated continually.
[0006]
The present invention was created to solve these problems. Its object is to provide an information classification device, method, and program that classify information into groups easily and accurately and notify the user of the group of the information by sound.
[0007]
[Means for Solving the Problems]
To achieve the above object, an information classification device according to the present invention comprises: classification means for classifying information into groups based on a keyword dictionary; dictionary updating means for updating the keyword dictionary based on constituent elements extracted from information; and sound reproducing means for reproducing a sound according to the group into which the information is classified.
[0008]
The information that the dictionary updating means uses to update the keyword dictionary is not information entered for the direct purpose of classification. Specifically, it is, for example, a document created by text input or a document received over a communication line. That is, the constituent elements that the dictionary updating means uses to update the keyword dictionary are, for example, text corresponding to relatively small linguistic units (titles, clauses, phrases, words, and the like) entered when composing an e-mail, text corresponding to such units within a received e-mail, or text corresponding to such units contained in an HTML file received over the Internet.
[0009]
In the information classification device according to the present invention, the dictionary updating means updates the keyword dictionary based on constituent elements extracted from information, so information can be classified into groups easily and accurately. The sound reproducing means reproduces a sound according to the resulting group, so the user can be notified of the group of the information by sound.
[0010]
Further, the dictionary updating means of the information classification device according to the present invention is characterized in that, when information is created by text input, it extracts constituent elements of the information at each segment of the text input. In general, the keywords registered in a keyword dictionary must be common nouns, proper nouns, and the like whose frequency of appearance rises in a particular context. Extracting appropriate keywords from text information therefore requires breaking sentences into words. During text input, on the other hand, the text is often divided into words, clauses, phrases, and so on for character-type conversion (for example into mixed kanji-kana text), or words are separated from each other by spaces. Accordingly, having the dictionary updating means extract constituent elements at each segment of the text input simplifies the processing needed to extract appropriate keywords for updating the keyword dictionary.
[0011]
Further, having the dictionary updating means extract constituent elements from received information allows information to be classified into groups accurately. This is because users generally collect information that is important to them actively, so important information is received more frequently than unimportant information.
[0012]
Further, the dictionary updating means extracts from information a constituent element that has a strong co-occurrence relationship with a keyword already registered in the keyword dictionary, and registers it in the keyword dictionary as a keyword paired with the group that is paired with that registered keyword. Information can thereby be classified more accurately. Here, "paired" means associated with each other.
[0013]
Further, the sound reproducing means reproduces, when information is received, a sound according to the group into which that information is classified, so that important information can be used immediately upon receipt.
[0014]
The information classification device according to the present invention further comprises weighting means for registering weights for keywords in the keyword dictionary, and the classification means classifies information into groups based on the weights registered for the keywords. Using weights allows information to be classified more accurately.
[0015]
Further, the sound reproducing means changes the control information for reproducing the sound according to the frequency with which keywords registered in the keyword dictionary appear in the displayed portion of the information. The user can thereby easily tell, for example during scrolling, that an important part of the information is on screen, and can use the important parts of the information efficiently. Here, the displayed portion of the information means the part currently shown of information that cannot be displayed in its entirety on one screen.
[0016]
Further, the information classification device according to the present invention further comprises mapping means for displaying, on a coordinate plane, a position according to the correlation between the information and the groups, so that the tendency of information correlated with several groups can be conveyed accurately to the user.
[0017]
Further, the dictionary updating means additionally registers, in the keyword dictionary, constituent elements of information classified by manual classification means as keywords paired with the group designated by the operator, so that the operator's intention is reflected directly in the classification processing.
[0018]
The functions of the plurality of means provided in the information classification device according to the present invention are realized by any combination of hardware resources whose functions are determined by their configuration itself and hardware resources whose functions are determined by a program. The functions of these means are not limited to those realized by hardware resources that are physically independent of one another.
[0019]
Further, the present invention can be specified not only as an invention of an apparatus, but also as an invention of a program, an invention of a recording medium on which the program is recorded, and an invention of a method.
[0020]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to the drawings.
FIG. 2 shows the information classification device 1, an embodiment of the present invention, connected to a communication network N and an e-mail delivery network M. The information classification device 1 is configured as a personal computer, a personal digital assistant (PDA), a mobile phone, or the like. It receives HTML files from a WWW (World Wide Web) server 2 through the communication network N, such as the Internet, and exchanges e-mail with an e-mail device 3 through the e-mail delivery network M, such as a telephone line. The communication network N and the e-mail delivery network M may be the same network.
[0021]
FIG. 3 is a block diagram showing the hardware configuration of the information classification device 1. As shown, the information classification device 1 comprises a CPU 11, a ROM 12, a RAM 13, an operating device 14, a communication unit 15, a sound control unit 16, a speaker 17, a display control unit 18, a display device 19, and an external storage device 20.
[0022]
The CPU 11 executes programs stored in the ROM 12 to control each part of the information classification device 1. It also executes the processing program to perform, among others, the following: updating the keyword dictionary W based on constituent elements extracted from information such as HTML files and e-mail; classifying information into groups based on the keyword dictionary W; reproducing a sound according to the group into which the information is classified; displaying on a coordinate plane a position according to the correlation between the information and the groups; and registering weights for keywords in the keyword dictionary W based on constituent elements of the information.
[0023]
The ROM 12 is a memory that stores in advance the minimum control programs and data needed for the CPU 11 to operate, the processing program, an e-mail program, a Web browser, and so on. The RAM 13 is a memory that temporarily stores programs and various data. These programs and data may be downloaded via the communication unit 15 and stored in a predetermined area of the RAM 13 or the external storage device 20. They may also be read from a computer-readable storage medium such as a compact disc (not shown) and stored in a predetermined area of the RAM 13 or the external storage device 20.
[0024]
The operating device 14 is, for example, a keyboard or mouse in the case of a personal computer, or a dial button in the case of a mobile phone, and is used by the operator of the information classification device 1 to input various instructions and text.
[0025]
The communication unit 15 is a so-called network interface card, a modem, or the like, and is configured to be connectable to the communication network N and the electronic mail delivery network M.
The sound control unit 16 generates an audio signal based on a sound file that describes control information for reproducing a sound, and outputs this signal to the speaker 17 for playback. The sound control unit 16 can also adjust the loudness of the speaker 17, that is, the playback volume. When the CPU 11 outputs control information representing a playback volume setting, the sound control unit 16 adjusts the playback volume based on that control information.
[0026]
Under the control of the CPU 11, the display control unit 18 outputs information such as HTML files received through the communication network N and e-mail received through the e-mail delivery network M to the display device 19, which consists of a liquid crystal display panel (LCD), a CRT, or the like.
[0027]
The external storage device 20 includes a hard disk, a flash memory, and the like, and stores a keyword dictionary, a co-occurrence keyword dictionary, a kana-kanji conversion dictionary, a keyword candidate dictionary, a group table, a melody table, a sound file, and the like, which will be described later. The data format of the sound file may be digitally encoded data such as the MIDI format, or may be data based on a waveform sample data system such as PCM, DPCM, ADPCM, or the like.
[0028]
FIG. 4 is a data flow diagram of the information classification device 1. The processing program generates a communication process 31, a classification process 32, a melody reproduction process 33, a mapping process 34, a morphological analysis process 35, a kana-kanji conversion process 36, a filter process 37, and a dictionary update process 38.
[0029]
The communication process 31 receives an HTML file from the WWW server 2 according to a predetermined protocol, and transmits and receives an e-mail with the e-mail device 3. When the communication process 31 receives information such as an HTML file and an e-mail, it outputs them to the morphological analysis process 35 and the classification process 32.
[0030]
The classification process 32 classifies information output from the communication process 31 into groups based on the keyword dictionary W, and outputs a group number, a group evaluation value, and the like for each information to the melody reproduction process 33 and the mapping process 34.
[0031]
FIG. 5 is a diagram illustrating an example of the keyword dictionary W. The record of the keyword dictionary W includes fields of a group number, a keyword, a setting value, and a song number. The group numbers and keywords included in the same record are managed in the keyword dictionary W in pairs. The group number is assigned to each group in a group table G described later, and the group to which the keyword belongs is determined by the group number. Therefore, registering a group number and a keyword in pairs in the keyword dictionary W is substantially equivalent to registering a group and a keyword in pairs in the keyword dictionary W. Words that appear particularly frequently in a specific context are registered as keywords. The set value indicates the weight for the keyword, and the received information is classified into groups based on the keyword and the weight indicated by the set value. The song number is assigned to each sound file name in a melody table M described later. The setting of the song number for the keyword is optional, and if not set, the sound file to be reproduced is determined by the song number set for the group to which the keyword belongs. The song number is set so that, for example, when the frequency of occurrence of a specific keyword for certain information is extremely high, the sound file set for the keyword is output in preference to the sound file set for the group to which the keyword belongs. It can be used in such cases.
[0032]
FIG. 6 is a diagram illustrating an example of the group table G. The group table G stores the names of the groups into which information is classified. Each record of this table has four fields: a group number, a group name, a song number, and set coordinates. When information is classified into a group, the sound file corresponding to the song number of that group is played. The group named “preference” holds keywords representing the preferences of the operator, and keywords belonging to the group “preference” are registered by a different method from those of the other groups, as described in detail later. The set coordinates are values used to calculate the position of a mark displayed on the coordinate plane at a position corresponding to the correlation between the information and the groups; they are also described in detail later.
[0033]
The classification process 32 classifies information using the above-described keyword dictionary W and group table G, for example, as follows. The classification process 32 extracts text information such as a title, a body text, and a file name from the information to be classified, calculates an evaluation value of each group for that information by the following formula, and classifies the information into the group with the largest group evaluation value.
Group evaluation value = Σ (number of appearances of a keyword belonging to the group × set value of that keyword)
Here, the set value of the keyword is the “set value” stored in the keyword dictionary W. The number of appearances of a keyword is the number of times the keyword occurs in the information to be classified. The classification process 32 can classify not only text information but also image information, sound information, and the like, using a file name or the like.
[0034]
For example, assume that the text information has the following contents.
(Information): “I bought a plastic model of a motorcycle”
When the keyword dictionary W shown in FIG. 5 is used, the group evaluation value of each group is calculated as follows.
Group “hobby”: 5 (“motorcycle” × 1) + 2 (“plastic model” × 1) = 7
Group “work”: 0
Group “preference”: 0
Therefore, the text information “I bought a plastic model of a motorcycle” is classified into the group “hobby”.
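By way of illustration only, the weighted evaluation described above might be sketched in Python as follows. The dictionary contents are hypothetical stand-ins for FIG. 5: only the two weights 5 and 2 come from the worked example (its keywords are written here in English as "motorcycle" and "plastic model"), while the "work" entry, the group numbers for "hobby" and "work", the substring-based counting, and the tie-breaking rule are assumptions of this sketch.

```python
# Hypothetical stand-in for the keyword dictionary W:
# keyword -> (group number, set value). Only the weights 5 and 2 of the
# two "hobby" keywords are taken from the worked example.
KEYWORD_DICTIONARY = {
    "motorcycle": (1, 5),
    "plastic model": (1, 2),
    "meeting": (2, 4),  # invented entry for the group "work"
}

# Group numbers 1 and 2 are assumptions; 0 is the "preference" group.
GROUP_NAMES = {0: "preference", 1: "hobby", 2: "work"}


def group_evaluation_values(text, dictionary=KEYWORD_DICTIONARY):
    """Return {group number: sum of (appearance count x set value)}."""
    scores = {number: 0 for number in GROUP_NAMES}
    lowered = text.lower()
    for keyword, (group, set_value) in dictionary.items():
        # Simple substring counting; a real device would count words
        # produced by morphological analysis instead.
        scores[group] += lowered.count(keyword) * set_value
    return scores


def classify(text, dictionary=KEYWORD_DICTIONARY):
    """Return the name of the group with the largest evaluation value.

    Ties (including the all-zero case) fall to the lowest group number,
    which is a choice of this sketch, not of the description.
    """
    scores = group_evaluation_values(text, dictionary)
    best = max(scores, key=scores.get)
    return GROUP_NAMES[best]
```

Calling `classify("I bought a plastic model of a motorcycle")` with these tables yields the group "hobby", with an evaluation value of 7 for that group and 0 for the others.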
[0035]
Alternatively, information may be classified without weighting, by summing for each group the number of appearances of all keywords belonging to the group and classifying the information into the group with the largest sum, or simply by classifying the information into the group to which the keyword with the highest appearance frequency belongs.
[0036]
The melody reproduction process 33 reproduces a sound file according to the group into which the information has been classified by the classification process 32. When the group number of the group into which the information is classified is output from the classification process 32, the melody reproduction process 33 acquires the song number set for that group from the group table G and plays the sound file specified by that song number in the melody table M.
[0037]
FIG. 7 is a diagram illustrating an example of the melody table M. The melody table M is a table for storing the file name of a sound file, and includes a song number and a file name field for uniquely identifying the sound file.
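As an illustrative sketch only, the lookup from group number to song number to sound file name, including the optional keyword-level song number that takes precedence over the group's song number, might look as follows in Python. The table contents are hypothetical except for the file name "song name 1.mmf" associated with the group "hobby" elsewhere in this description; the function name and parameters are likewise assumptions.

```python
# Hypothetical contents of the group table G (group number -> record).
GROUP_TABLE = {
    1: {"name": "hobby", "song": 1},
    2: {"name": "work", "song": 2},
}

# Hypothetical contents of the melody table M (song number -> file name).
MELODY_TABLE = {
    1: "song name 1.mmf",
    2: "song name 2.mmf",
    3: "song name 3.mmf",
}


def resolve_sound_file(group_number, keyword_song=None):
    """Return the sound file name to play for a classified group.

    If a song number is set for the dominant keyword, it is used in
    preference to the song number set for the group.
    """
    if keyword_song is not None:
        song = keyword_song
    else:
        song = GROUP_TABLE[group_number]["song"]
    return MELODY_TABLE[song]
```

With these tables, `resolve_sound_file(1)` returns the file set for the group "hobby", while `resolve_sound_file(1, keyword_song=3)` returns the file set for the keyword.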
[0038]
The reproduction timing of the sound file can be set, for example, to the time when information is received or the time when information is browsed. Specifically, the sound file may be reproduced when an electronic mail arrives, when a received electronic mail is browsed, or when an HTML file is browsed with a Web browser. For reproduction at the time of browsing, the reproduction volume or the reproduced part of the sound file can be set to change according to the appearance frequency of keywords in the part being displayed. For example, if keywords appear frequently in the displayed part, the volume is increased or the chorus (hook) of the song is reproduced. When the information is scrolled while being viewed, the volume then changes or playback switches to the chorus partway through, so the operator can easily tell that the appearance frequency of keywords in the displayed part has changed.
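The description gives no concrete rule for how volume or the reproduced part depends on keyword frequency, so the following Python sketch is purely an assumption: volume is an integer percentage that rises linearly with the number of keyword hits in the visible text, and playback switches to a part labelled "chorus" once a threshold is reached. All names, default values, and the label "intro" are hypothetical.

```python
def playback_settings(visible_text, keywords,
                      base_volume=40, step=10, chorus_threshold=3):
    """Return (volume percent, part name) for the text currently on screen.

    The linear volume rule and the threshold are assumptions of this
    sketch; the description only says that volume or the reproduced part
    may change with keyword frequency in the displayed portion.
    """
    lowered = visible_text.lower()
    # Substring counting, so "ski" also matches inside "skiing".
    hits = sum(lowered.count(keyword.lower()) for keyword in keywords)
    volume = min(100, base_volume + step * hits)
    part = "chorus" if hits >= chorus_threshold else "intro"
    return volume, part
```

Re-evaluating this function each time the view is scrolled would make the sound track the keyword density of the visible portion.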
[0039]
The melody reproduction process 33 may reproduce a sound file set for each keyword in the keyword dictionary W. For example, a sound file set for a keyword having the highest frequency of appearance in the information may be played back. The sound file to be reproduced may be changed according to the keyword that appears most frequently in the section.
[0040]
The reproduction volume may be set in advance for each group according to the importance of the group. Also, when an e-mail arrives, if the number of unconfirmed e-mails among the previously received e-mails classified into the same group as the arriving e-mail is equal to or greater than a predetermined number, the reproduction volume may be increased to warn the operator.
[0041]
The mapping process 34 displays a star Z at a position on the coordinate plane according to the correlation between the information and the groups, as shown in FIG. 8. Specifically, for example, the processing is performed as follows. First, coordinates are obtained for each group, each group contributing one term of the following sum.
Σ (set coordinates of group x group evaluation value / total of evaluation values)
Here, the set coordinates of the group are the set coordinates of each group shown in the group table G, and the group evaluation value is a value calculated in the classification process 32. The total of the evaluation values is the total of the group evaluation values of each group.
[0042]
Next, the value obtained by summing the x coordinates for each group is calculated as the x coordinate of the star Z by the above formula, and the value obtained by summing the y coordinates for each group is calculated as the y coordinate of the star Z.
Next, a pop-up window is displayed on the display device 19. In the pop-up window, the star Z is displayed on an xy coordinate plane whose origin is at the center of the window, together with three axes connecting the origin to the set coordinates of the groups “preference”, “hobby”, and “work”. The display is performed, for example, when an electronic mail arrives, or when the transmission of an HTML file is requested from the WWW server 2 and the received HTML file is displayed in a Web browser.
[0043]
For example, it is assumed that the group evaluation value of each group of the received electronic mail is as follows.
Group “hobby”: 5
Group “work”: 3
Group “preference”: 2
In this case, the coordinates calculated for each group by the above equation have the following values.
Group “hobby”: (160, -100)
Group “work”: (-96, -60)
Group “preference”: (0, 40)
The sum of these coordinates for each of the x and y coordinates is (64, -120). As a result, a pop-up window displaying the star Z at a position corresponding to (64, -120) on the coordinate plane of the display device 19 is displayed on the display device 19 when an electronic mail arrives.
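The calculation above can be checked with a short Python sketch. The set coordinates used here are not quoted from the group table G; they are back-calculated from the worked example (each per-group coordinate divided by that group's share of the total evaluation value) and are therefore an assumption, as are the function name, the group keys hobby, work, and preference, and the choice to place the star at the origin when every evaluation value is zero.

```python
# Set coordinates inferred from the worked example, not read from FIG. 6:
# e.g. hobby contributes (160, -100) with a share of 5/10, so its set
# coordinates would be (320, -200).
SET_COORDINATES = {
    "hobby": (320, -200),
    "work": (-320, -200),
    "preference": (0, 200),
}


def star_position(evaluation_values, set_coordinates=SET_COORDINATES):
    """Return the (x, y) position of the star Z.

    Each group contributes its set coordinates weighted by
    group evaluation value / total of evaluation values.
    """
    total = sum(evaluation_values.values())
    if total == 0:
        # Assumption of this sketch: no keyword hit maps to the origin.
        return (0.0, 0.0)
    x = sum(set_coordinates[group][0] * value / total
            for group, value in evaluation_values.items())
    y = sum(set_coordinates[group][1] * value / total
            for group, value in evaluation_values.items())
    return (x, y)
```

With evaluation values of 5, 3, and 2 for hobby, work, and preference, the function returns the position (64, -120) given in the example.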
[0044]
The morphological analysis process 35 extracts words, as components, from the information output from the communication process 31 and outputs the extracted words to the filter process 37. When information is output from the communication process 31, the morphological analysis process 35 decomposes the text in the information into words by morphological analysis. Specifically, for example, the Japanese text meaning “Slope in western Shizuoka is best for skiing” is broken down into the nine words “Shizuoka / western part / no / slope / wa / ski / ni / best / desu” (the particles no, wa, and ni and the copula desu are left in romanized Japanese).
[0045]
The kana-kanji conversion process 36 converts text input by the user into kanji-kana mixed text based on the kana-kanji conversion dictionary, and outputs the converted components of the text to the morphological analysis process 35 each time conversion is instructed. Specifically, for example, when the operator inputs the text “Slope in the western part of Shizuoka is best for skiing.” divided into five segments and instructs conversion for each segment, the kana-kanji conversion process 36 converts each segment into kanji-kana mixed text, “Shizuoka / western part / slope / ski / best”, and outputs the segments to the morphological analysis process 35. Because relatively small components are input to the morphological analysis process 35, the amount of processing required to decompose them into words is reduced; as a result, the process of decomposing information into words is simplified and made faster.
[0046]
The kana-kanji conversion process 36 may also accumulate, for each word, the number of times the word has been converted into kanji, that is, the frequency with which the word is used when creating text, and output it to the morphological analysis process 35.
[0047]
The filter process 37 filters the words output from the morphological analysis process 35 using the keyword candidate dictionary: it excludes words that appear irrespective of context and extracts, as keywords, nouns whose appearance rate is high only in a specific context. Specifically, for example, the three keywords “Shizuoka”, “slope”, and “ski” are extracted from the nine words “Shizuoka / western part / no / slope / wa / ski / ni / best / desu”.
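The internal structure of the keyword candidate dictionary is not specified, so the following Python sketch simply assumes it is a set of context-specific nouns against which each word is checked. The set contents and the function name are hypothetical; the nine input words follow the example above, written in English with the particles left in romanized Japanese.

```python
# Hypothetical keyword candidate dictionary: nouns assumed to have a high
# appearance rate only in specific contexts.
KEYWORD_CANDIDATES = {"shizuoka", "slope", "ski", "snow"}


def filter_keywords(words, candidates=KEYWORD_CANDIDATES):
    """Keep only words found in the candidate dictionary, without repeats."""
    keywords = []
    for word in words:
        if word.lower() in candidates and word not in keywords:
            keywords.append(word)
    return keywords


# The nine words of the example sentence.
WORDS = ["Shizuoka", "western part", "no", "slope", "wa",
         "ski", "ni", "best", "desu"]
```

Passing `WORDS` to `filter_keywords` leaves the three keywords "Shizuoka", "slope", and "ski".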
[0048]
The dictionary update process 38 updates the co-occurrence keyword dictionary V based on the keywords output from the filter process 37 and on the keyword dictionary W, and updates the keyword dictionary W based on the co-occurrence keyword dictionary V. FIG. 9 is a diagram illustrating an example of the co-occurrence keyword dictionary V. Each record of the co-occurrence keyword dictionary V consists of a group number, a registered keyword, a co-occurrence keyword, a co-occurrence count, and an occurrence count. The registered keyword and the co-occurrence keyword in one record are registered in the co-occurrence keyword dictionary V as a pair.
[0049]
A co-occurrence keyword is a keyword that appears (co-occurs) in one piece of information together with a keyword (registered keyword) that is registered in the keyword dictionary W as belonging to a group other than “preference”. Co-occurrence keywords are candidates for keywords to be newly registered in the keyword dictionary W. The extraction condition for a co-occurrence keyword is, for example, that it appears in information that includes the registered keyword, that it appears in the same paragraph as the registered keyword within such information, or that it appears within n keywords (“n” is an arbitrary value) before or after the registered keyword within such information. Note that co-occurrence keywords are not extracted for keywords belonging to the group “preference”. The occurrence count is the number of pieces of information, among the information classified after a certain keyword was stored as a co-occurrence keyword, that include the registered keyword paired with that co-occurrence keyword. The co-occurrence count is the number of pieces of information, among the information classified after the keyword was stored as a co-occurrence keyword, from which the already stored co-occurrence keyword was extracted again as a co-occurrence keyword of the registered keyword.
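As one possible reading of the third extraction condition (appearing within n keywords of the registered keyword), the following Python sketch collects the keywords lying at most n positions before or after each occurrence of a registered keyword in the ordered keyword list of one piece of information. The function name and the sample keywords are hypothetical.

```python
def cooccurring_within(keywords, registered, n):
    """Return keywords found within n positions of the registered keyword.

    `keywords` is the ordered list of keywords extracted from one piece
    of information; the result preserves first-seen order without repeats.
    """
    result = []
    for index, keyword in enumerate(keywords):
        if keyword != registered:
            continue
        start = max(0, index - n)
        neighbours = keywords[start:index] + keywords[index + 1:index + 1 + n]
        for neighbour in neighbours:
            if neighbour != registered and neighbour not in result:
                result.append(neighbour)
    return result
```

For the keyword list `["yamagata", "ski", "resort", "snow"]` and the registered keyword "ski", a window of n = 1 yields "yamagata" and "resort", while n = 2 also yields "snow".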
[0050]
Note that the number of occurrences may include the number of pieces of information that include the co-occurrence keyword among the information to be classified after the certain keyword is stored as the co-occurrence keyword. Further, the number of occurrences and the number of co-occurrences may be counted based on the number of appearances appearing in the information.
[0051]
The dictionary update process 38 registers, in the keyword dictionary W, a co-occurrence keyword having a strong co-occurrence relationship with a registered keyword as a keyword belonging to the group to which the registered keyword belongs. A strong co-occurrence relationship between the registered keyword and the co-occurrence keyword means, for example, that the probability given by co-occurrence count / occurrence count is high, or that the co-occurrence count is large. Specifically, suppose for example that a co-occurrence keyword is registered in the keyword dictionary W when both conditions, co-occurrence count / occurrence count > 0.7 and co-occurrence count > 9, are satisfied. With the co-occurrence keyword dictionary V shown in FIG. 9, if a newly received e-mail contains the text “Snow is remaining in ski areas in Yamagata Prefecture.” and “snow” is extracted as a co-occurrence keyword of the registered keyword “ski”, then “snow” is registered in the keyword dictionary W as a keyword of group number “1” and is deleted from the co-occurrence keyword dictionary V. The set value registered in the keyword dictionary W at this time may be, for example, the same value as that of the registered keyword paired with the co-occurrence keyword, or the set value of the registered keyword multiplied by the probability given by co-occurrence count / occurrence count. The song number to be registered may be, for example, the same song number as that of the registered keyword, or may be left unset if the sound file set for the group to which the registered keyword belongs should be reproduced.
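A minimal Python sketch of this update, under stated assumptions, is given below. It treats one piece of information as a set of extracted keywords, uses the loosest extraction condition (any keyword in the same information), copies the registered keyword's set value to the promoted keyword, and starts new pairs with counts of zero. The data layout, the function name, and the initial counts of 9 and 10 in the usage note are hypothetical and are not taken from FIG. 9.

```python
PREFERENCE_GROUP = 0  # keywords of this group get no co-occurrence pairs


def update_dictionaries(keywords, W, V, min_ratio=0.7, min_count=9):
    """Update V from one piece of information and promote strong pairs to W.

    W: keyword -> {"group": int, "set_value": number}
    V: (registered keyword, co-occurrence keyword) ->
       {"group": int, "cooccurrences": int, "occurrences": int}
    Returns the list of keywords newly registered in W.
    """
    present = set(keywords)

    # 1. Count this information for the pairs already stored in V.
    for (registered, candidate), counts in V.items():
        if registered in present:
            counts["occurrences"] += 1
            if candidate in present:
                counts["cooccurrences"] += 1

    # 2. Store new pairs for keywords seen alongside a registered keyword.
    for registered in present:
        entry = W.get(registered)
        if entry is None or entry["group"] == PREFERENCE_GROUP:
            continue
        for candidate in present:
            if candidate not in W and (registered, candidate) not in V:
                V[(registered, candidate)] = {
                    "group": entry["group"],
                    "cooccurrences": 0,
                    "occurrences": 0,
                }

    # 3. Promote pairs meeting both thresholds, then drop them from V.
    promoted = []
    for (registered, candidate), counts in list(V.items()):
        if counts["occurrences"] == 0:
            continue
        ratio = counts["cooccurrences"] / counts["occurrences"]
        if ratio > min_ratio and counts["cooccurrences"] > min_count:
            W[candidate] = {
                "group": counts["group"],
                "set_value": W[registered]["set_value"],
            }
            del V[(registered, candidate)]
            promoted.append(candidate)
    return promoted
```

For instance, if V holds the pair ("ski", "snow") with a co-occurrence count of 9 and an occurrence count of 10, processing one more piece of information containing both keywords raises the counts to 10 and 11; both thresholds are then exceeded and "snow" is moved into W under the group of "ski".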
[0052]
Separately from the co-occurrence keyword dictionary V, the dictionary update process 38 accumulates, for each output keyword, the number of pieces of information that include the keyword. When that number satisfies a certain condition, the keyword is added to the keyword dictionary W as a keyword belonging to the group “preference”, that is, a keyword whose group number is “0”. When the operator is interested in information including a certain keyword, the operator actively collects such information, so the keyword is likely to be included in a large share of the information the operator receives. Keywords belonging to the group “preference” can therefore be said to represent the preferences of the operator. Because such keywords are assigned to the group “preference”, the operator can easily tell from the sound played when an e-mail is received that the information matches his or her preferences.
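The "certain condition" is not defined in the description. The Python sketch below assumes the simplest possible one, a fixed threshold on the number of pieces of information containing the keyword; the threshold, the default set value, and the function name are all hypothetical.

```python
from collections import Counter

PREFERENCE_GROUP = 0  # group number of "preference" per the description


def register_preferences(keyword_lists, W, threshold=5, default_set_value=1):
    """Add frequently seen keywords to W as "preference" keywords.

    keyword_lists: one list of extracted keywords per piece of information.
    A keyword is counted once per piece of information, however often it
    appears inside it. The fixed threshold is an assumption of this sketch.
    """
    counts = Counter()
    for keywords in keyword_lists:
        counts.update(set(keywords))

    added = []
    for keyword, count in counts.items():
        if count >= threshold and keyword not in W:
            W[keyword] = {"group": PREFERENCE_GROUP,
                          "set_value": default_set_value}
            added.append(keyword)
    return added
```

Keywords already registered in W under another group are left untouched in this sketch; whether the device would behave that way is not stated.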
[0053]
Further, the operator can specify the group of information himself or herself. Specifically, for example, the processing is performed as follows. If the device has been set in advance so that the operator specifies groups, then when certain information is output from the filter process 37, the dictionary update process 38 displays on the display device 19 an instruction screen for specifying the group into which the information is to be classified, and the operator specifies the group. When the operator specifies a group, the dictionary update process 38 additionally registers in the keyword dictionary W, as keywords belonging to the specified group, those extracted keywords whose appearance frequency is equal to or greater than a predetermined frequency. The operator may also be allowed to select the keywords to be additionally registered.
[0054]
The configuration of the information classification device 1 has been described above. Hereinafter, the operation of the information classification device 1 will be described.
FIG. 1 is a flowchart showing a flow of processing of the information classification device 1.
[0055]
First, an operation for initial setting of the melody table M, the group table G, and the keyword dictionary W starting from “Start 4” in the figure will be described. The process of “Start 4” is always performed when the information classification device 1 is used for the first time, and thereafter, is performed as needed. The dictionary update process 38 displays a screen for performing the initial setting of the keyword dictionary W, and requests the operator to input a keyword, a group number to which the keyword belongs, and a set value. When the operator inputs a keyword, a group number to which the keyword belongs, a set value, and a song number, the dictionary update process 38 registers them in the keyword dictionary W (S405). Next, the dictionary update process 38 displays a list of sound file names stored in the melody table M on the screen, and allows the user to select a sound file name for each group. When the operator selects a sound file name, the dictionary update process 38 registers the song number of the selected sound file name in the “song number” field of the group table G (S410). Next, the dictionary update process 38 sets the set coordinates of the group table G according to the number of groups (S415).
[0056]
Next, the operation of updating the dictionary when sending an e-mail, starting from “Start 1” in the figure, will be described. The operator activates the e-mail program in the information classification device 1, inputs text to be transmitted, and designates a segment break by a predetermined keyboard operation, thereby instructing kana-kanji conversion of the text input so far (S105). When the kana-kanji conversion is instructed, the kana-kanji conversion process 36 converts the designated text into kanji-kana mixed text and outputs the converted component to the morphological analysis process 35 (S110). The morphological analysis process 35 extracts words and outputs them to the filter process 37. The filter process 37 filters the words to extract keywords, and outputs the keywords to the dictionary update process 38 (S310). When the keywords are output, the dictionary update process 38 updates the keyword dictionary W and the co-occurrence keyword dictionary V based on the keywords (S315). The above steps S310 to S315 are repeated each time the operator instructs kana-kanji conversion. When the operator finishes inputting all the text and performs the operation of transmitting the e-mail, the communication process 31 transmits the e-mail to the e-mail device 3 (S115).
[0057]
Next, the operation when an e-mail is received, starting from “Start 3” in the figure, will be described. When receiving an electronic mail from the electronic mail device 3 via the communication unit 15, the communication process 31 outputs the text information of the received electronic mail to the morphological analysis process 35 (S305). The morphological analysis process 35 decomposes the text information into words and outputs the words to the filter process 37. The filter process 37 outputs the keywords extracted from the words to the dictionary update process 38 (S310). When the keywords are input, the dictionary update process 38 updates the keyword dictionary W and the co-occurrence keyword dictionary V based on the keywords (S315). Next, the classification process 32 classifies the text information of the e-mail output from the communication process 31 into a group, and outputs the group to the melody reproduction process 33 and the mapping process 34 (S320). The melody reproduction process 33 reproduces a sound according to the classified group. For example, if the classified group is “hobby”, the melody reproduction process 33 acquires the file name “song name 1.mmf” corresponding to the group “hobby” from the group table G, acquires that sound file from the external storage device 20, and outputs the acquired sound file to the sound control unit 16. As a result, the sound file is reproduced by the sound control unit 16, and the sound corresponding to the group into which the electronic mail is classified is heard. Meanwhile, the mapping process 34 displays on the display device 19 a pop-up window in which a star is displayed at a position corresponding to the correlation between the information and the groups (S325). Next, the e-mail program displays the e-mail on the display device 19.
The melody reproduction process 33 reproduces the sound file and controls the reproduction volume in accordance with the appearance frequency of the registered keyword included in the part of the text information being displayed.
[0058]
The operation when browsing the Web starting from “Start 2” in the figure, that is, when receiving the HTML file, is substantially the same as the operation when “Start 3”, so the description is omitted.
[0059]
Hereinafter, effects of the present embodiment will be described.
According to the information classification device 1 of the present embodiment, a co-occurrence keyword having a strong co-occurrence relationship with a registered keyword is newly registered in the keyword dictionary W. A strong co-occurrence relationship means that the co-occurrence keyword frequently appears in information in which the registered keyword appears, so when the co-occurrence keyword appears in certain information, that information is highly likely to belong to the group of the registered keyword. Further, even if the user's interests shift over time, the user is highly likely to remain interested in information related to information that interested him or her in the past. Once a co-occurrence keyword is registered as a keyword in the keyword dictionary W, information in which the original registered keyword does not appear but the newly registered keyword does can be classified into the group to which the original registered keyword belongs. Therefore, information can be correctly classified into groups.
[0060]
Further, information is classified into the group with the largest group evaluation value calculated from the weight of each keyword, so information can be classified into groups more accurately. Further, when the device is not set so that the operator designates groups, the operator does not need to register keywords or classify information personally, so information can be classified into groups easily. For example, even if the importance of information changes over time, as long as the appearance frequency of a specific word rises over time, the information classification device 1 registers the word as a keyword, so highly important information can be classified into a specific group without the operator continually updating the keyword dictionary W. Further, since the melody reproduction process 33 reproduces a sound according to the classified group, the operator can be notified of the group of the information by sound. The operator can thus judge the group of the information from the sound and, for example in the case of e-mail, can immediately know that an important e-mail has arrived.
[0061]
Further, because the information classification device 1 according to the present embodiment extracts the components of information for each segment of input text when the information is created by text input, the morphological analysis process 35 only has to process the relatively small components output by the kana-kanji conversion process 36, which simplifies its processing.
[0062]
Further, the information classification device 1 of the present embodiment displays a position corresponding to the correlation between the information and the group on a coordinate plane. Accordingly, when one information includes a plurality of keywords and the groups to which the keywords belong are different, the degree of the relationship between the information and the plurality of groups is visually expressed. Therefore, the tendency of the information correlated with the plurality of groups can be accurately transmitted to the operator.
[0063]
Furthermore, according to the information classification device 1 of the present embodiment, the information is classified into groups according to the instruction of the operator, so that the intention of the operator can be directly reflected in the classification processing.
[0064]
In the present embodiment, the e-mail and the HTML file are described as examples of the information to be classified. However, the information may be an image file or a sound file. For example, sound files may be classified into groups based on their file names.
[Brief description of the drawings]
FIG. 1 is a flowchart illustrating a flow of processing of an information classification device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a state where an information classification device according to an embodiment of the present invention is connected to a network.
FIG. 3 is a block diagram illustrating a hardware configuration of an information classification device according to an embodiment of the present invention.
FIG. 4 is a data flow diagram of the information classification device according to one embodiment of the present invention.
FIG. 5 is a diagram showing a keyword dictionary included in the information classification device according to one embodiment of the present invention.
FIG. 6 is a diagram showing a group table provided in the information classification device according to one embodiment of the present invention.
FIG. 7 is a diagram showing a melody table provided in the information classification device according to one embodiment of the present invention.
FIG. 8 is an example in which an information classification device according to an embodiment of the present invention displays a position corresponding to a correlation between information and a group on a coordinate plane.
FIG. 9 is a diagram showing a co-occurrence keyword dictionary included in the information classification device according to one embodiment of the present invention.
[Explanation of symbols]
1 Information classification device
2 WWW server
3 E-mail device
11 CPU (sound reproducing means, dictionary updating means, manual classifying means, weighting means, classifying means, mapping means)
12 ROM
13 RAM (sound reproducing means, dictionary updating means, manual classifying means, weighting means, classifying means, mapping means)
14 Actuator (manual classification means)
15 Communication unit
16 Sound control unit (sound reproducing means)
17 Speaker (sound reproducing means)
18 Display control unit (manual classification means, weighting means, mapping means)
19 Display device (manual classification means, weighting means, mapping means)
20 External storage device (sound reproducing means, dictionary updating means, weighting means)
31 Communication Process
32 Classification Process
33 Melody Playback Process
34 Mapping Process
35 Morphological Analysis Process
36 Kana-Kanji conversion process
37 Filter Process
38 Dictionary update process

Claims

An information classification device comprising:
classification means for classifying information into groups based on a keyword dictionary;
dictionary updating means for updating the keyword dictionary based on components extracted from information; and
sound reproducing means for reproducing sound according to the group into which information is classified.

2. The information classification apparatus according to claim 1, wherein the dictionary updating unit extracts a component of the information for each section of the text input when the information is created by the text input.

3. The information classification device according to claim 1, wherein the dictionary updating unit extracts a component of the information from the received information.

The information classification device according to claim 1, 2 or 3, wherein the dictionary updating means extracts from the information a component having a strong co-occurrence relationship with a keyword registered in the keyword dictionary, and registers the component in the keyword dictionary as a keyword paired with the group that is paired with the registered keyword.

The information classification device according to any one of claims 1 to 4, wherein, when the information is received, the sound reproduction unit reproduces a sound according to a group into which the information is classified.

The information classification device according to any one of claims 1 to 5, further comprising weighting means for registering a weight for a keyword in the keyword dictionary, wherein the classification unit classifies information into groups based on the weights registered for the keywords.

The information classification device according to any one of claims 1 to 6, wherein the sound reproducing unit changes control information for reproducing a sound in accordance with the appearance frequency, in the displayed portion of the information, of a keyword registered in the keyword dictionary.

The information classification device according to any one of claims 1 to 7, further comprising mapping means for displaying a position corresponding to the correlation between the information and the group on a coordinate plane.

The information classification device according to any one of claims 1 to 8, wherein the dictionary update unit additionally registers, in the keyword dictionary, a component extracted from the information as a keyword paired with a group specified by an operator.

An information classification method comprising:
a classification step of classifying information into groups based on a keyword dictionary;
a dictionary update step of updating the keyword dictionary based on components extracted from information; and
a sound reproduction step of reproducing a sound according to the group into which information is classified.

A program for causing a computer to function as an information classification device, the program causing the computer to function as:
classification means for classifying information into groups based on a keyword dictionary;
dictionary updating means for updating the keyword dictionary based on components extracted from information; and
sound reproducing means for reproducing sound according to the group into which information is classified.