JP4238642B2 - Word registration device, word registration method, and word registration program

Info

Publication number: JP4238642B2
Application number: JP2003163203A
Authority: JP
Inventors: 宏顕木曽
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2003-06-09
Filing date: 2003-06-09
Publication date: 2009-03-18
Anticipated expiration: 2023-06-09
Also published as: JP2004362496A

Description

【０００１】
【発明の属する技術分野】
本発明は、単語文字列及びその読みを関連付けて格納し、読みから単語文字列（表記）への変換に使用される単語辞書に対して、未登録の単語文字列及びその読みを登録する機能を備える携帯電話機、携帯電子メール端末、ＰＤＡ、パーソナルコンピュータなどの単語登録装置に関し、さらには、その単語登録方法及び単語登録プログラムに関する。
【０００２】
【従来の技術】
文書入力が行われる携帯電話機、携帯電子メール端末、ＰＤＡ、パーソナルコンピュータなどの装置には、キーボードなどから入力された読み（かな、ローマ字など）を、単語文字列（漢字、カタカナなど）へ変換する変換プログラム（ＦＥＰ、ＩＭＥ）が搭載されている。
【０００３】
この種の変換プログラムは、読み入力後の変換操作に応じて、単語辞書から読みに対応する文字列を検索し、これを未確定状態で表示する。変換された文字列が正しい場合は、確定操作に応じて、変換文字列の入力を確定する一方、変換文字列が正しくない場合は、変換候補切り換え操作、文節区切り変更操作などに応じて、変換文字列の修正を行う。
【０００４】
通常、上記のような変換プログラムは、その変換精度を向上させるために、単語登録機能を備えている。単語登録機能は、未登録単語を単語辞書に追加登録する機能であり、その際には、単語文字列及びその読みが入力される。したがって、ユーザは、未登録単語を逐次又は一括して単語辞書に追加登録することにより、精度の高い文字列変換を行うことが可能になる。
【０００５】
ところで、近年においては、電子メールなどの普及に伴い、他のユーザが作成した文書を参照しながら、文書入力を行うケースが増えている。このようなケースでは、参照する文書の一部を引用することがあり、このときユーザは、引用部分の読みを入力し、漢字などの文字列に変換することになる。
しかしながら、上記の引用部分に、ユーザが通常使用しない単語文字列が含まれ、かつ、その単語文字列が辞書登録されていない場合には、正しい文字列変換が行われず、文章入力の効率が低下するという問題がある。
【０００６】
そこで、既に文字列に変換された文章から未登録の単語文字列を自動的に抽出するもの（例えば、特許文献１、２参照。）、文書ファイルに含まれる未登録の単語文字列を変換候補として表示可能にするもの（例えば、特許文献３参照。）、未登録の単語文字列及びその読みを単語辞書に自動的に登録するもの（例えば、特許文献４参照。）などが提案されている。
【０００７】
【特許文献１】
特開平７−２１１７０号公報（第３頁、第２図）
【特許文献２】
特開２００２−２２９９９９号公報（第９頁、第４図）
【特許文献３】
特開平１０−３３４０９０号公報（第４頁、第７図）
【特許文献４】
特開平１１−８５７６１号公報（第９頁、第５図）
【０００８】
【発明が解決しようとする課題】
しかしながら、特許文献１、２に示されるものは、既に文字列に変換された文章から未登録の単語文字列を自動的に抽出するものの、抽出した単語文字列を単語辞書に登録する際には、ユーザが読みを入力する必要がある。
なお、特許文献１、２では、文章から単語文字列を自動的に抽出するための具体的な方法が開示されていない。
【０００９】
また、特許文献３に示されるものは、文書入力時に未登録の単語文字列及びその読みを抽出するとともに、これらの情報を文書ファイルに付加しておき、その文書ファイルを他のユーザが開いて編集／修正する際、付加情報にもとづいて未登録の単語文字列も変換候補として表示するものである。つまり、特許文献３に示されるものでは、文書ファイルに特殊な情報を付加することが前提となっており、同じ変換プログラムを使用するユーザ間でなければ、効果が発揮されないという問題がある。
【００１０】
また、特許文献４に示されるものは、ユーザが入力した文字列を対象とし、そこに含まれる未登録の単語文字列を、その読みとともに単語辞書に自動的に登録するものであり、既に文字列に変換された電子メールなどの文章から単語文字列を抽出するものではない。
また、特許文献４に示されるものでは、複雑な形態素解析ルーチンを用いて文字列の抽出を行っているため、仮に、電子メールなどの変換済み文書を対象とした場合には、装置の処理負担が増大するという問題がある。特に、ハードウエアリソースに制限がある携帯電話機などの小型装置では、処理能力の不足により実施困難となる可能性が高い。
【００１１】
本発明は、上記の事情にかんがみなされたものであり、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書に登録することにより、ユーザによる単語登録作業を軽減でき、しかも、単語文字列の抽出処理及び読みの生成処理を可及的に簡略化することにより、ハードウエアリソースに制限がある携帯電話機などの小型装置でも実施することができる単語登録装置、単語登録方法及び単語登録プログラムの提供を目的とする。
【００１２】
【課題を解決するための手段】
上記目的を達成するため本発明の単語登録装置は、単語文字列及びその読みを関連付けて格納し、読みから単語文字列への変換に使用される単語辞書に対して、未登録の単語文字列及びその読みを登録する単語登録装置であって、既に文字列に変換された文章を対象とし、単語文字列抽出用キーワードを用いて、前記単語辞書の登録対象となる単語文字列を抽出する単語文字列抽出手段と、抽出した前記単語文字列を文字単位に分解するとともに、分解した各文字の読みを前記単語辞書から検索し、検索した複数の前記読みを結合して、抽出した前記単語文字列の読みを生成する読み生成手段と、抽出した前記単語文字列及び生成した前記読みを関連付けて前記単語辞書に登録する単語登録手段と、を備える構成としてある。
【００１３】
単語登録装置をこのように構成すれば、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書に登録することが可能になる。これにより、ユーザによる単語登録作業を軽減できるとともに、既に文字列に変換された文章を参照しながら文書入力を行う際、文章入力の効率を向上させることができる。
【００１４】
また、単語文字列抽出用キーワードを用いて、単語文字列を抽出することにより、複雑な形態素解析ルーチンを用いる場合に比べ、単語文字列の抽出処理を簡素化することができる。
また、抽出した単語文字列の読みを文字単位で検索し、検索した複数の読みを結合して単語文字列の読みを生成することにより、読みの生成処理も簡略化することができる。
その結果、装置の処理負担を軽減して、迅速な単語登録処理が可能になるだけでなく、ハードウエアリソースに制限がある携帯電話機などの小型装置でも実施することが可能になる。
【００１５】
また、本発明の単語登録装置は、前記単語文字列抽出手段が、前記単語抽出用キーワードに挟まれた単語文字列及び／又は記述記号と前記単語抽出用キーワードに挟まれた単語文字列を抽出し、抽出した単語文字列のうち、前記単語辞書に未登録のものを登録対象とする構成としてある。
単語登録装置をこのように構成すれば、単語文字列の抽出処理が更に簡素化されるため、処理負担の軽減効果を高めることができるだけでなく、処理速度を更に向上させることができる。
【００１６】
また、本発明の単語登録装置は、前記単語文字列抽出手段が、漢字のみで構成された単語文字列及び／又はカタカナのみで構成された単語文字列を抽出する構成としてある。
単語登録装置をこのように構成すれば、単語文字列の抽出処理が更に簡素化されるため、処理負担の軽減効果を高めることができるだけでなく、処理速度を更に向上させることができる。
【００１７】
また、本発明の単語登録装置は、前記単語辞書が、単語文字列の読みを、単語文字列の文字単位で区切って格納する構成としてある。
単語登録装置をこのように構成すれば、読みから単語文字列へ変換するための辞書データと、抽出した単語文字列の読みを生成するための辞書データを兼用化し、単語辞書容量を小さくすることができる。これにより、ハードウエアリソースに制限がある携帯電話機などでの実施が更に容易となる。
【００１８】
また、本発明の単語登録装置は、前記単語辞書が、読みから単語文字列への変換に使用されるオリジナル辞書及びユーザ辞書を備え、前記読み生成手段が、抽出した前記単語文字列の読み検索を、前記オリジナル辞書を用いて行い、前記単語登録手段が、前記ユーザ辞書に対して単語登録を行う構成としてある。
単語登録装置をこのように構成すれば、ユーザ辞書に登録された辞書データに影響を受けることなく、抽出した単語文字列の読みを精度良く生成することができる。
【００１９】
また、本発明の単語登録装置は、前記単語登録手段が、抽出した前記単語文字列及び生成した前記読みを表示し、その修正及び／又は登録確認を要求する構成としてある。
単語登録装置をこのように構成すれば、誤って抽出された単語文字列や、誤って生成された読みが、単語辞書に登録されることを防止できるだけでなく、単語登録する単語文字列及びその読みをユーザが認識し、効率の良い文書入力を行うことができる。
【００２０】
また、本発明の単語登録装置は、小型の通信用端末機器に内蔵した構成としてある。
本発明の単語登録装置は、単語文字列の抽出処理及び読みの生成処理が簡略化されるため、ハードウエアリソースに制限がある携帯電話機などの小型の通信用端末機器でも実施することが可能になる。
【００２１】
また、上記目的を達成するため本発明の単語登録方法は、単語文字列及びその読みを関連付けて格納し、読みから単語文字列への変換に使用される単語辞書に対して、未登録の単語文字列及びその読みを登録する単語登録方法であって、既に文字列に変換された文章を対象とし、単語文字列抽出用キーワードを用いて、前記単語辞書の登録対象となる単語文字列を抽出し、抽出した前記単語文字列を文字単位に分解するとともに、分解した各文字の読みを前記単語辞書から検索し、検索した複数の前記読みを結合して、抽出した前記単語文字列の読みを生成し、抽出した前記単語文字列及び生成した前記読みを関連付けて前記単語辞書に登録する方法としてある。
【００２２】
また、本発明の単語登録方法は、小型の通信用端末機器における単語登録において実施するようにしてある。
このようにすれば、単語文字列の抽出処理及び読みの生成処理を簡単に行うことができるので、ハードウエアリソースが制限されている携帯電話機などの小型の通信用端末機器でも実施することが可能になる。
【００２３】
単語登録方法をこのような方法にすれば、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書に登録することにより、ユーザによる単語登録作業を軽減できるとともに、効率の良い文書入力を行うことができる。
しかも、単語文字列の抽出処理及び読みの生成処理を可及的に簡略化することにより、ハードウエアリソースに制限がある携帯電話機などの小型装置でも実施することが可能になる。
【００２４】
また、上記目的を達成するため本発明の単語登録プログラムは、単語文字列及びその読みを関連付けて格納し、読みから単語文字列への変換に使用される単語辞書に対して、未登録の単語文字列及びその読みを登録する単語登録プログラムであって、単語登録装置に、既に文字列に変換された文章を対象とし、単語文字列抽出用キーワードを用いて、前記単語辞書の登録対象となる単語文字列を抽出させ、抽出した前記単語文字列を文字単位に分解するとともに、分解した各文字の読みを前記単語辞書から検索し、検索した複数の前記読みを結合して、抽出した前記単語文字列の読みを生成させ、抽出した前記単語文字列及び生成した前記読みを関連付けて前記単語辞書に登録させる構成としてある。
【００２５】
単語登録プログラムをこのように構成すれば、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書に登録するため、ユーザによる単語登録作業を軽減できる。
【００２６】
また、本発明の単語登録プログラムは、小型の通信用端末機器において単語登録を実行させるようにしてある。
このようにすれば、ハードウエアリソースが制限されている携帯電話機などの小型の通信用端末機器でも単語登録を容易に実現できる。
【００２７】
【発明の実施の形態】
以下、本発明の実施形態について、図面を参照して説明する。
【００２８】
［第一実施形態］
まず、本発明の第一実施形態について、図１〜図３を参照して説明する。
図１は、本発明の第一実施形態に係る文書入力装置（単語登録装置）のハードウエア構成を示すブロック図である。
【００２９】
この図に示される文書入力装置１は、機能的に本発明の単語登録装置を備えるものであり、例えば、携帯電話機、携帯電子メール端末、ＰＤＡ、パーソナルコンピュータなどの文書入力機能を有する装置によって構成されている。
例えば、文書入力装置１がパーソナルコンピュータである場合は、ハードウエアとして、キーボードなどの入力部２と、液晶ディスプレイなどの表示部３と、ＬＡＮなどの通信部４と、ハードディスクなどの記憶部５と、ＣＰＵなどの制御部６とを備えて構成される。
【００３０】
図２は、本発明の第一実施形態に係る文書入力装置（単語登録装置）の機能構成を示すブロック図である。
この図に示すように、文書入力装置１は、記憶部５に格納されるプログラムにより、文書データベース１０、日本語変換装置２０、単語登録装置３０などを機能的に構成している。
文書データベース１０には、ユーザが作成した文書データや、他のユーザが作成した文書データが格納されている。他のユーザが作成した文書データとしては、例えば、受信メールが挙げられる。
【００３１】
日本語変換装置２０は、単語文字列とその読みを関連付けて格納する単語辞書２１を用い、読みから単語文字列への変換を行う変換エンジン２２を備えている。具体的に説明すると、変換エンジン２２は、入力部２における読み入力後の変換操作に応じて、単語辞書２１から読みに対応する文字列を検索し、これを未確定状態で表示部３に表示する。変換された文字列が正しい場合は、入力部２における確定操作に応じて、変換文字列の入力を確定する一方、変換文字列が正しくない場合は、入力部２における変換候補切り換え操作、文節区切り変更操作などに応じて、変換文字列の修正を行う。
【００３２】
単語辞書２１には、オリジナル辞書２１ａ及びユーザ辞書２１ｂが含まれる。オリジナル辞書２１ａは、文書入力装置１に標準装備される単語辞書であり、ユーザ辞書２１ｂは、ユーザによる単語の追加登録が許容される単語辞書である。本実施形態のオリジナル辞書２１ａでは、後述する抽出単語文字列の読み検索を行うために、単語文字列の読みが、単語文字列の文字単位に区切って格納されている。例えば、「単語文字列：計算機、読み：けいさんき」という単語辞書データにおいては、内部的に「単語文字列：計／算／機、読み：けい／さん／き」と構成されており、「文字：計、読み：けい」「文字：算、読み：さん」「文字：機、読み：き」という単漢字辞書データとしても使用することが可能となっている。
【００３３】
単語登録装置３０は、単語文字列抽出手段３１、読み生成手段３２及び単語登録手段３３を備えている。
単語文字列抽出手段３１は、既に文字列に変換された文書データベース１０内の文章データを対象とし、後述する単語文字列抽出用キーワードを用いて、単語辞書２１（ユーザ辞書２１ｂ）の登録対象となる単語文字列を自動的に抽出する機能的な構成部分である。
本実施形態の単語文字列抽出手段３１は、抽出対象とする文書データの条件や、単語文字列の抽出条件を予め設定する機能を備えている。
【００３４】
読み生成手段３２は、抽出した単語文字列を文字単位に分解するとともに、分解した各文字の読みを単語辞書２１（オリジナル辞書２１ａ）から検索し、検索した複数の読みを結合して、抽出した単語文字列の読みを生成する機能的な構成部分である。
また、単語登録手段３３は、抽出した単語文字列及び生成した読みを関連付けて単語辞書２１（ユーザ辞書２１ｂ）に登録する機能的な構成部分である。
本実施形態の単語登録手段３３は、抽出した単語文字列及び生成した読みを表示部３に表示し、その修正や登録確認をユーザに要求する機能を備える。
【００３５】
つぎに、本実施形態における単語登録装置３０の動作について、図３及び図４を参照して説明する。
図３は、本発明の第一実施形態に係る単語登録装置の動作を示すフローチャート、図４は、本発明の第一実施形態に係る単語登録装置が単語文字列の抽出に用いる単語文字列抽出用キーワードを示す説明図である。
【００３６】
図３に示すように、まず、ユーザが入力部２から単語文字列抽出条件の入力を行う（Ｓ１０１）。単語文字列抽出手段３１は、入力部２から入力された単語文字列抽出条件にしたがって、文書データベース１０の抽出対象となる文書データ選択や単語抽出数などの条件設定を行い、その設定に応じて、抽出対象となる文書データから単語文字列を抽出する（Ｓ１０２）。
この単語文字列抽出処理は、図４に示すような単語文字列抽出用キーワードを用いて行われる。例えば、単語文字列抽出用キーワードには、格助詞、格助詞相当、提題助詞、取り立て助詞、格助詞＋取り立て助詞、取り立て助詞＋格助詞、接続助詞、判定助詞などが含まれる。
【００３７】
具体的には、単語文字列抽出用キーワードの後に続き、単語文字列抽出用キーワードの前までの文字列と、記述記号（句読点、疑問符、感嘆符など）の後に続き、単語文字列抽出用キーワードの前までの文字列が抽出される。
また、抽出する単語文字列は、漢字のみで構成された単語文字列と、カタカナのみで構成された単語文字列に限定しており、単語辞書２１（ユーザ辞書２１ｂ）の登録対象とする単語文字列は、単語辞書２１（オリジナル辞書２１ａ及びユーザ辞書２１ｂ）に登録されていない単語文字列である。
【００３８】
つぎに、抽出した単語文字列の読みを生成する（Ｓ１０３）。抽出された単語文字列がカタカナであれば、それに対応した読みとし、抽出された単語文字列が漢字であれば、単語辞書２１（オリジナル辞書２１ａ）を用いて、単語文字列の読みを生成する。
オリジナル辞書２１ａでは、前述したように、読みが単語文字列の文字単位で区切られているため、抽出した単語文字列を文字単位に分解し、各文字の読みをオリジナル辞書２１ａにて検索する。そして、検索した複数の読みを結合して、抽出した単語文字列の読みとする。また、読みの候補が複数存在するときは、ヒット件数が最も多い読みを採用する。
【００３９】
つぎに、抽出した単語文字列及び生成した読みを表示部３に表示し（Ｓ１０４）、ユーザに修正又は登録確認を要求する（Ｓ１０５）。ここで、修正が不要な場合は、抽出した単語文字列及び生成した読みを単語辞書２１（ユーザ辞書２１ｂ）に登録し（Ｓ１０６）、修正が必要な場合は、ユーザが手動で単語文字列又は読みを修正した後（Ｓ１０７）、単語文字列及び読みを単語辞書２１（ユーザ辞書２１ｂ）に登録する。
その後は、単語辞書２１（ユーザ辞書２１ｂ）に登録した上記の読みを入力すれば、上記の単語文字列へ変換することが可能になる（Ｓ１０８）。
【００４０】
つぎに、本発明の第一実施形態に係る単語登録装置の具体的な動作例について説明する。
例えば、文書データベース１０に、「将来、道州制が必要。」という文字列を含む受信メールがあり、オリジナル辞書２１ａに、「単語文字列：道／路、読み：どう／ろ」、「単語文字列：本／州、読み：ほん／しゅう」、「単語文字列：制／度、読み：せい／ど」という辞書データがある場合を考える。まず、ユーザが該当する受信メールを選択し（Ｓ１０１）、その受信メールに対して、単語文字列の抽出処理を実施する（Ｓ１０２）。この単語文字列抽出ステップでは、受信メールに含まれる文「将来、道州制が必要。」に対して、読点の後に続き、単語文字列抽出用キーワード「が」の前までの文字列で、かつ、文字種が漢字のみで構成されている文字列として「道州制」を抽出する。
【００４１】
つぎに、抽出した文字列「道州制」の読みを生成する（Ｓ１０３）。この読み生成処理ステップでは、文字列「道州制」を文字単位に分解し、各文字「道」、「州」、「制」の読みをオリジナル辞書２１ａにて検索する。本例では、「文字：道、読み：どう」、「文字：州、読み：しゅう」、「文字：制、読み：せい」が検索され、これらを結合して辞書登録候補データ「文字列：道州制、読み：どうしゅうせい」とする。
【００４２】
つぎに、抽出した単語文字列及びその読みをユーザに表示し（Ｓ１０４）、修正の有無を確認する（Ｓ１０５）。この場合は、修正が不要であるため、ユーザの確認操作に応じて、辞書登録候補データ「文字列：道州制、読み：どうしゅうせい」をユーザ辞書２１ｂへ登録する（Ｓ１０６）。
以上のステップを実行することにより、つぎに文書入力を行う際、変換エンジン２２は、入力された読み「どうしゅうせい」に対し、単語文字列「道州制」という新しい登録単語をユーザ辞書２１ｂから検索し、変換候補として表示することが可能となる（Ｓ１０８）。
【００４３】
以上のように構成された本実施形態によれば、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書２１に登録することが可能になる。これにより、ユーザによる単語登録作業を軽減できるとともに、既に文字列に変換された文章を参照しながら文書入力を行う際、文章入力の効率を向上させることができる。
【００４４】
また、単語文字列抽出用キーワードを用いて、単語文字列を抽出することにより、複雑な形態素解析ルーチンを用いる場合に比べ、単語文字列の抽出処理を簡素化することができる。また、抽出した単語文字列の読みを文字単位で検索し、検索した複数の読みを結合して単語文字列の読みを生成することにより、読みの生成処理も簡略化することができる。その結果、装置の処理負担を軽減して、迅速な単語登録処理が可能になるだけでなく、ハードウエアリソースに制限がある携帯電話機などの小型装置でも実施することが可能になる。
【００４５】
また、文字列の抽出処理では、単語抽出用キーワード又は記述記号の後に続き、単語抽出用キーワードの前までの単語文字列で、かつ、漢字又はカタカナのみで構成された単語文字列を抽出し、抽出した単語文字列のうち、ユーザ辞書２１ｂに未登録のものを登録対象とするため、単語文字列の抽出処理を更に簡素化し、処理負担の軽減効果を高めることができるだけでなく、処理速度を更に向上させることができる。
【００４６】
また、単語辞書２１（オリジナル辞書２１ａ）は、単語文字列の読みを、単語文字列の文字単位で区切って格納するため、読みから単語文字列へ変換するための辞書データと、抽出した単語文字列の読みを生成するための辞書データを兼用化し、単語辞書容量を小さくすることができる。これにより、ハードウエアリソースに制限がある携帯電話機などでの実施が更に容易となる。
【００４７】
また、単語辞書２１は、読みから単語文字列への変換に使用されるオリジナル辞書２１ａ及びユーザ辞書２１ｂを備え、抽出した単語文字列の読み検索は、オリジナル辞書２１ａを用いて行い、単語登録は、ユーザ辞書２１ｂに対して行うようにしたので、ユーザ辞書２１ｂに登録された辞書データに影響を受けることなく、抽出した単語文字列の読みを精度良く生成することができる。
【００４８】
また、抽出した単語文字列及び生成した読みを表示し、その修正又は登録確認を要求するため、誤って抽出された単語文字列や、誤って生成された読みが、単語辞書２１（ユーザ辞書２１ｂ）に登録されることを防止できるだけでなく、単語登録する単語文字列及びその読みをユーザが認識し、効率の良い文書入力を行うことができる。
【００４９】
［第二実施形態］
つぎに、本発明の第二実施形態について、図５を参照して説明する。
図５は、本発明の第二実施形態に係る単語登録装置の動作を示すフローチャートである。
この図に示される第二実施形態は、文書データベース１０の抽出対象となる文書データをユーザが直接選択する点と、抽出した単語文字列及びその読みをユーザに確認することなく単語辞書２１（ユーザ辞書２１ｂ）に登録する点が前記実施形態と相違している。
【００５０】
図５に示すように、第二実施形態では、まず、ユーザが文書データベース１０の文書データを選択する（Ｓ２０１）。それ以降は、単語文字列の抽出処理（Ｓ２０２）と、抽出した単語文字列の読み生成処理（Ｓ２０３）と、抽出した単語文字列及び生成した読みの辞書登録処理（Ｓ２０４）とが自動的に実行される。その後は、単語辞書２１（ユーザ辞書２１ｂ）に登録した上記の読みを入力すれば、上記の単語文字列へ変換することが可能になる（Ｓ２０５）。
【００５１】
なお、本実施形態では、ユーザに対する修正要求や登録確認を行わないため、単語文字列の抽出処理及び単語文字列の読み生成処理では、抽出条件や読み生成条件を厳しくし、正解率が高い辞書データのみを抽出、生成することが好ましい。例えば、単語文字列の抽出処理では、図４に示す単語文字列抽出用キーワードにおいて、正解率の高いもののみを使用し、また、読み生成処理では、単語文字列の文字単位の読み検索において、同じ読み候補のヒット件数が、所定の閾値を超えるものだけを使用するなどの条件を加えることにより、精度が高い候補を選択するようにする。
【００５２】
つぎに、第二実施形態の具体的な動作について説明する。
例えば、文書データベース１０には、「明日は、道州制について議論する。」という文を含む受信メールがあり、オリジナル辞書２１ａには、「文字列：道／路、読み：どう／ろ」、「文字列：歩／道、読み：ほ／どう」、「文字列：本／州、読み：ほん／しゅう」、「文字列：九／州、読み：きゅう／しゅう」、「文字列：制／約、読み：せい／やく」という辞書データがある場合を考える。
【００５３】
ユーザが該当する受信メールを選択すると（Ｓ２０１）、その受信メールに対して単語文字列の自動抽出処理が実施される（Ｓ２０２）。このとき、受信メールに含まれる文「明日は、道州制について議論する。」においては、読点の後に続き、単語文字列抽出用キーワード「について」の前までの文字列で、かつ、文字種が漢字で構成されている文字列として「道州制」が辞書登録候補として抽出される。ここでは、単語文字列抽出用キーワードとして精度の高い「について」を適用したことにより、抽出精度が高められた。
【００５４】
つぎに、抽出した単語文字列の文字単位の読みをオリジナル辞書２１ａにて検索する。本例では、単語文字列の文字単位の読みとして同一の読みが２候補以上存在することを条件として加える。そして、この条件を満たすものとして、「文字：道、読み：どう」、「文字：州、読み：しゅう」、「文字：制、読み：せい」が検索され、「文字列：道州制、読み：どうしゅうせい」が辞書登録候補として作成される。
【００５５】
以上のように構成された本実施形態によれば、第一実施形態の効果に加え、文書データを選択するだけで単語文字列の抽出、単語文字列の読み生成、及び単語文字列及び読みの辞書登録を自動的に実施できるという効果が得られる。
しかも、単語文字列の抽出条件や単語文字列の読み生成条件を厳しくすることにより、登録辞書データの精度低下も回避することができる。
【００５６】
［第三実施形態］
つぎに、本発明の第三実施形態について、図６を参照して説明する。
図６は、本発明の第三実施形態に係る単語登録装置の動作を示すフローチャートである。
この図に示される第三実施形態は、ユーザが文書データを直接選択することなく、文書データの選択条件を設定する点が第二実施形態と相違している。
ユーザが文書データの選択条件を入力すると（Ｓ３０１）、その条件に合う文書データを対象とし、単語文字列の抽出処理（Ｓ３０２）と、読み生成処理（Ｓ３０３）と、単語文字列及び読みの登録処理（Ｓ３０４）とが自動的に実施され、その後、上記読みから上記単語文字列への変換が可能となる（Ｓ３０５）。
【００５７】
上記のように構成された第三実施形態によれば、最初に文書の選択条件を設定するだけで、文書選択を含む全ての単語登録処理を自動化することが可能になる。これにより、ユーザは、受信メールなどで使用されている未登録単語を意識することなく日本語変換することができ、本格的な文脈依存型の日本語変換処理が可能となる。
【００５８】
【発明の効果】
以上のように、本発明によれば、既に文字列に変換された文章を対象とし、未登録の単語文字列を抽出するとともに、その読みを生成し、抽出した単語文字列及び生成した読みを単語辞書に登録することにより、ユーザによる単語登録作業を軽減でき、しかも、単語文字列の抽出処理及び読みの生成処理を可及的に簡略化することにより、ハードウエアリソースに制限がある携帯電話機などの小型の通信用端末機器でも実施することができる。
【図面の簡単な説明】
【図１】本発明の第一実施形態に係る文書入力装置（単語登録装置）のハードウエア構成を示すブロック図である。
【図２】本発明の第一実施形態に係る文書入力装置（単語登録装置）の機能構成を示すブロック図である。
【図３】本発明の第一実施形態に係る単語登録装置の動作を示すフローチャートである。
【図４】本発明の第一実施形態に係る単語登録装置が単語文字列の抽出に用いる単語文字列抽出用キーワードを示す説明図である。
【図５】本発明の第二実施形態に係る単語登録装置の動作を示すフローチャートである。
【図６】本発明の第三実施形態に係る単語登録装置の動作を示すフローチャートである。
【符号の説明】
１文書入力装置
２入力部
３表示部
４通信部
５記憶部
６制御部
１０文書データベース
２０日本語変換装置
２１単語辞書
２１ａオリジナル辞書
２１ｂユーザ辞書
２２変換エンジン
３０単語登録装置
３１単語文字列抽出手段
３２読み生成手段
３３単語登録手段

[0001]
[Technical field to which the invention belongs]
The present invention relates to a word registration device, such as a mobile phone, a portable e-mail terminal, a PDA, or a personal computer, that has a function of registering an unregistered word character string and its reading in a word dictionary. The word dictionary stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string (notation). The invention further relates to a corresponding word registration method and word registration program.
[0002]
[Prior art]
Devices on which documents are entered, such as mobile phones, portable e-mail terminals, PDAs, and personal computers, are equipped with a conversion program (FEP, IME) that converts a reading (kana, romaji, etc.) entered from a keyboard or the like into a word character string (kanji, katakana, etc.).
[0003]
This type of conversion program retrieves a character string corresponding to the reading from the word dictionary in response to a conversion operation performed after the reading is entered, and displays it in an unconfirmed state. If the converted character string is correct, its input is confirmed in response to a confirmation operation. If it is incorrect, the converted character string is corrected in response to a conversion candidate switching operation, a phrase boundary change operation, or the like.
[0004]
Usually, the conversion program as described above has a word registration function in order to improve the conversion accuracy. The word registration function is a function for additionally registering unregistered words in the word dictionary. At this time, a word character string and its reading are input. Therefore, the user can perform highly accurate character string conversion by additionally registering unregistered words sequentially or collectively in the word dictionary.
[0005]
By the way, in recent years, with the spread of e-mail and the like, cases of inputting a document while referring to a document created by another user are increasing. In such a case, a part of the document to be referred to may be quoted. At this time, the user inputs the reading of the quoted part and converts it into a character string such as kanji.
However, if the quoted part includes a word character string that the user does not normally use, and that word character string is not registered in the dictionary, correct character string conversion is not performed and the efficiency of text input decreases.
[0006]
Accordingly, the following have been proposed: techniques that automatically extract unregistered word character strings from text that has already been converted into character strings (see, for example, Patent Documents 1 and 2); a technique that makes unregistered word character strings contained in a document file available as conversion candidates (see, for example, Patent Document 3); and a technique that automatically registers unregistered word character strings and their readings in a word dictionary (see, for example, Patent Document 4).
[0007]
[Patent Document 1]
Japanese Patent Laid-Open No. 7-21170 (page 3, FIG. 2)
[Patent Document 2]
Japanese Patent Laid-Open No. 2002-229999 (page 9, FIG. 4)
[Patent Document 3]
Japanese Patent Application Laid-Open No. 10-334090 (page 4, FIG. 7)
[Patent Document 4]
Japanese Patent Laid-Open No. 11-85761 (page 9, FIG. 5)
[0008]
[Problems to be solved by the invention]
However, although the techniques of Patent Documents 1 and 2 automatically extract unregistered word character strings from text that has already been converted into character strings, the user must enter the reading when an extracted word character string is registered in the word dictionary.
Note that Patent Documents 1 and 2 do not disclose a specific method for automatically extracting a word character string from a sentence.
[0009]
The technique of Patent Document 3 extracts unregistered word character strings and their readings when a document is entered and attaches this information to the document file. When another user opens the document file to edit or modify it, the unregistered word character strings are also displayed as conversion candidates on the basis of the attached information. In other words, the technique of Patent Document 3 presupposes that special information is attached to the document file, so it has the problem that it is effective only between users who use the same conversion program.
[0010]
The technique of Patent Document 4 targets character strings entered by the user and automatically registers the unregistered word character strings they contain, together with their readings, in the word dictionary. It does not extract word character strings from text, such as e-mail, that has already been converted into character strings.
In addition, because the technique of Patent Document 4 extracts character strings using a complicated morphological analysis routine, applying it to already-converted documents such as e-mail would increase the processing load on the device. In particular, small devices with limited hardware resources, such as mobile phones, are likely to lack the processing capacity to implement it.
[0011]
The present invention has been made in view of the above circumstances. Its object is to provide a word registration device, a word registration method, and a word registration program that target text already converted into character strings, extract unregistered word character strings, generate their readings, and register the extracted word character strings and the generated readings in a word dictionary, thereby reducing the user's word registration work. A further object is that, by simplifying the word character string extraction process and the reading generation process as far as possible, the invention can be implemented even on small devices with limited hardware resources, such as mobile phones.
[0012]
[Means for Solving the Problems]
To achieve the above object, the word registration device of the present invention registers an unregistered word character string and its reading in a word dictionary that stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string. The device comprises: word character string extraction means that targets text already converted into character strings and uses word character string extraction keywords to extract a word character string to be registered in the word dictionary; reading generation means that decomposes the extracted word character string into individual characters, searches the word dictionary for the reading of each character, and combines the retrieved readings to generate the reading of the extracted word character string; and word registration means that registers the extracted word character string and the generated reading in the word dictionary in association with each other.
[0013]
If the word registration device is configured in this way, it becomes possible to take text that has already been converted into character strings as the target, extract unregistered word character strings, generate their readings, and register the extracted word character strings and the generated readings in the word dictionary. This reduces the user's word registration work and improves the efficiency of text input when a document is entered while referring to text that has already been converted into character strings.
[0014]
Also, by extracting a word character string using a word character string extraction keyword, the word character string extraction process can be simplified as compared with the case of using a complicated morpheme analysis routine.
Also, the reading generation processing can be simplified by searching for the reading of the extracted word character string in character units and combining the searched plural readings to generate the reading of the word character string.
As a result, the processing load on the device is reduced and word registration can be performed quickly. In addition, the invention can be implemented even on small devices with limited hardware resources, such as mobile phones.
[0015]
In the word registration device of the present invention, the word character string extraction means extracts word character strings sandwiched between two word extraction keywords and/or word character strings sandwiched between a descriptive symbol and a word extraction keyword. Of the extracted word character strings, those not registered in the word dictionary are treated as registration targets.
If the word registration device is configured in this way, the extraction process of the word character string is further simplified, so that not only the processing load can be reduced, but also the processing speed can be further improved.
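The keyword-delimited extraction described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the keyword list and symbol list are short hypothetical samples (the actual keywords are those of FIG. 4, which is not reproduced here), and the function name `extract_candidates` is an assumption. The example sentences are the ones used in the description.

```python
import re

# Hypothetical, abbreviated samples; the real keyword table is FIG. 4 of the patent.
KEYWORDS = ["について", "が", "を", "に", "は", "の", "で", "と"]
SYMBOLS = ["、", "。", "？", "！"]

def extract_candidates(text, keywords=KEYWORDS, symbols=SYMBOLS):
    """Return strings that follow a keyword or symbol and precede a keyword."""
    # Try longer keywords first so that "について" wins over "に".
    ordered = sorted(keywords, key=len, reverse=True)
    delimiter = re.compile("|".join(re.escape(d) for d in ordered + symbols))
    candidates = []
    prev_end = None  # end offset of the previous delimiter (None: none seen yet)
    for match in delimiter.finditer(text):
        # A candidate must start after a delimiter and end at a keyword.
        if prev_end is not None and match.group() in keywords:
            span = text[prev_end:match.start()]
            if span:
                candidates.append(span)
        prev_end = match.end()
    return candidates

print(extract_candidates("将来、道州制が必要。"))
print(extract_candidates("明日は、道州制について議論する。"))
```

A plain delimiter scan like this will also cut ordinary words wherever a one-character particle happens to occur inside them, which is one reason the description additionally restricts candidates by character type and checks them against the dictionary.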
[0016]
In the word registration device of the present invention, the word character string extraction means extracts word character strings composed only of kanji and/or word character strings composed only of katakana.
If the word registration device is configured in this way, the extraction process of the word character string is further simplified, so that not only the processing load can be reduced, but also the processing speed can be further improved.
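A character-type check of this kind could look like the sketch below. The Unicode ranges are an assumption made for illustration (the CJK Unified Ideographs block plus the iteration mark 々 for kanji, and katakana letters plus the long-vowel mark ー for katakana); the patent does not specify a character encoding, and `script_of` is a hypothetical helper name.

```python
import re

# Assumed ranges: CJK Unified Ideographs (U+4E00-U+9FFF) plus the iteration mark U+3005.
KANJI_ONLY = re.compile(r"[\u4E00-\u9FFF\u3005]+")
# Assumed ranges: katakana letters (U+30A1-U+30FA) plus the long-vowel mark U+30FC.
KATAKANA_ONLY = re.compile(r"[\u30A1-\u30FA\u30FC]+")

def script_of(candidate):
    """Classify a candidate as 'kanji', 'katakana', or None (rejected)."""
    if KANJI_ONLY.fullmatch(candidate):
        return "kanji"
    if KATAKANA_ONLY.fullmatch(candidate):
        return "katakana"
    return None  # mixed scripts, hiragana, Latin letters, or empty string

print(script_of("道州制"))
print(script_of("メール"))
print(script_of("道州制が"))
```

Only candidates for which `script_of` returns a non-`None` value would be passed on to the dictionary lookup in this sketch.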
[0017]
In the word registration device of the present invention, the word dictionary stores the reading of each word character string segmented per character of that word character string.
If the word registration device is configured in this way, the dictionary data used to convert a reading into a word character string also serves as the dictionary data used to generate the reading of an extracted word character string, so the capacity of the word dictionary can be reduced. This further facilitates implementation on mobile phones and other devices with limited hardware resources.
[0018]
In the word registration device of the present invention, the word dictionary includes an original dictionary and a user dictionary, both used for conversion from a reading to a word character string. The reading generation means searches for the reading of the extracted word character string using the original dictionary, and the word registration means registers words in the user dictionary.
If the word registration device is configured in this way, reading of the extracted word character string can be generated with high accuracy without being affected by the dictionary data registered in the user dictionary.
[0019]
In the word registration device of the present invention, the word registration means displays the extracted word character string and the generated reading, and requests correction and/or registration confirmation.
If the word registration device is configured in this way, erroneously extracted word character strings and erroneously generated readings are prevented from being registered in the word dictionary. In addition, the user becomes aware of the word character strings and readings being registered and can therefore enter documents efficiently.
[0020]
Moreover, the word registration device of the present invention is configured to be built in a small communication terminal device.
Because the word character string extraction process and the reading generation process are simplified, the word registration device of the present invention can be implemented even in a small communication terminal device with limited hardware resources, such as a mobile phone.
[0021]
To achieve the above object, the word registration method of the present invention registers an unregistered word character string and its reading in a word dictionary that stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string. The method targets text already converted into character strings and uses word character string extraction keywords to extract a word character string to be registered in the word dictionary; decomposes the extracted word character string into individual characters, searches the word dictionary for the reading of each character, and combines the retrieved readings to generate the reading of the extracted word character string; and registers the extracted word character string and the generated reading in the word dictionary in association with each other.
[0022]
Further, the word registration method of the present invention is implemented in word registration in a small communication terminal device.
In this way, the word character string extraction process and the reading generation process can be performed simply, so the method can be carried out even in a small communication terminal device with limited hardware resources, such as a mobile phone.
[0023]
With this word registration method, text that has already been converted into character strings is targeted, unregistered word character strings are extracted, their readings are generated, and the extracted word character strings and the generated readings are registered in the word dictionary. This reduces the user's word registration work and allows efficient document input.
In addition, by simplifying the word character string extraction process and the reading generation process as much as possible, it is possible to implement even a small device such as a mobile phone with limited hardware resources.
[0024]
To achieve the above object, the word registration program of the present invention registers an unregistered word character string and its reading in a word dictionary that stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string. The program causes a word registration device to: target text already converted into character strings and use word character string extraction keywords to extract a word character string to be registered in the word dictionary; decompose the extracted word character string into individual characters, search the word dictionary for the reading of each character, and combine the retrieved readings to generate the reading of the extracted word character string; and register the extracted word character string and the generated reading in the word dictionary in association with each other.
[0025]
If the word registration program is configured in this way, a sentence already converted to a character string is targeted, an unregistered word character string is extracted, its reading is generated, and the extracted word character string and the generated reading are read. Since it is registered in the word dictionary, the word registration work by the user can be reduced.
[0026]
The word registration program of the present invention is configured to execute word registration in a small communication terminal device.
In this way, word registration can be easily realized even in a small communication terminal device such as a mobile phone in which hardware resources are limited.
[0027]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0028]
[First embodiment]
First, a first embodiment of the present invention will be described with reference to FIGS. 1 to 3.
FIG. 1 is a block diagram showing a hardware configuration of a document input device (word registration device) according to the first embodiment of the present invention.
[0029]
The document input device 1 shown in this figure functionally includes the word registration device of the present invention and is constituted by a device having a document input function, such as a mobile phone, a portable e-mail terminal, a PDA, or a personal computer.
For example, when the document input device 1 is a personal computer, its hardware comprises an input unit 2 such as a keyboard, a display unit 3 such as a liquid crystal display, a communication unit 4 such as a LAN interface, a storage unit 5 such as a hard disk, and a control unit 6 such as a CPU.
[0030]
FIG. 2 is a block diagram showing a functional configuration of the document input device (word registration device) according to the first embodiment of the present invention.
As shown in this figure, the document input device 1 functionally configures a document database 10, a Japanese language conversion device 20, a word registration device 30, and the like by a program stored in the storage unit 5.
The document database 10 stores document data created by users and document data created by other users. An example of document data created by another user is a received mail.
[0031]
The Japanese conversion device 20 includes a conversion engine 22 that converts a reading into a word character string using a word dictionary 21, which stores word character strings and their readings in association with each other. More specifically, in response to a conversion operation performed on the input unit 2 after a reading is entered, the conversion engine 22 searches the word dictionary 21 for a character string corresponding to the reading and displays it on the display unit 3 in an unconfirmed state. If the converted character string is correct, its input is confirmed in response to a confirmation operation on the input unit 2. If it is incorrect, the converted character string is corrected in response to a conversion candidate switching operation, a phrase boundary change operation, or the like on the input unit 2.
[0032]
The word dictionary 21 includes an original dictionary 21a and a user dictionary 21b. The original dictionary 21a is the word dictionary provided as standard with the document input device 1, and the user dictionary 21b is a word dictionary to which the user is allowed to add words. In the original dictionary 21a of this embodiment, so that the reading of an extracted word character string can be searched for as described later, the reading of each word character string is stored segmented per character of that word character string. For example, the word dictionary data "word character string: 計算機, reading: けいさんき" is internally structured as "word character string: 計/算/機, reading: けい/さん/き", so it can also be used as single-kanji dictionary data: "character: 計, reading: けい", "character: 算, reading: さん", and "character: 機, reading: き".
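To make the dual use of the segmented entries concrete, the sketch below derives a per-character reading table from segmented word entries. The storage format (a list of tuples with an ASCII "/" as separator), the name `build_char_readings`, and the second entry 計/画 are assumptions added for illustration; only the 計/算/機 entry comes from the description.

```python
from collections import Counter, defaultdict

# Segmented entries: (surface split per character, reading split to match).
ORIGINAL_DICTIONARY = [
    ("計/算/機", "けい/さん/き"),  # example entry from the description
    ("計/画", "けい/かく"),        # hypothetical extra entry for illustration
]

def build_char_readings(entries):
    """Map each character to a Counter of its readings (hit counts)."""
    table = defaultdict(Counter)
    for surface, reading in entries:
        chars = surface.split("/")
        parts = reading.split("/")
        if len(chars) != len(parts):
            continue  # skip entries whose segmentation does not line up
        for ch, rd in zip(chars, parts):
            table[ch][rd] += 1
    return table

table = build_char_readings(ORIGINAL_DICTIONARY)
print(table["計"])  # reading counts collected for the character 計
```

Keeping a hit count per reading is a design choice of this sketch; it lets a later step prefer the reading that occurs most often when a character has several candidates.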
[0033]
The word registration device 30 includes a word character string extraction means 31, a reading generation means 32, and a word registration means 33.
The word character string extraction means 31 is a functional component that operates on sentence data in the document database 10 that has already been converted into character strings and, using the word character string extraction keywords described later, automatically extracts word character strings to be registered in the word dictionary 21 (user dictionary 21b).
The word character string extraction means 31 of this embodiment also has a function for presetting the conditions that select the document data to be processed and the conditions for extracting word character strings.
[0034]
The reading generation means 32 is a functional component that decomposes the extracted word character string into character units, searches the word dictionary 21 (original dictionary 21a) for the reading of each decomposed character, and combines the retrieved readings to generate the reading of the extracted word character string.
The word registration means 33 is a functional component that registers the extracted word character string and the generated reading in association with each other in the word dictionary 21 (user dictionary 21b).
The word registration means 33 of the present embodiment has a function of displaying the extracted word character string and the generated reading on the display unit 3 and asking the user to correct them or to confirm registration.
[0035]
Next, the operation of the word registration device 30 in the present embodiment will be described with reference to FIGS. 3 and 4.
FIG. 3 is a flowchart showing the operation of the word registration device according to the first embodiment of the present invention, and FIG. 4 is an explanatory diagram showing the word character string extraction keywords that the word registration device according to the first embodiment uses to extract word character strings.
[0036]
As shown in FIG. 3, the user first inputs word character string extraction conditions from the input unit 2 (S101). According to these conditions, the word character string extraction means 31 makes settings such as which document data in the document database 10 is to be processed and how many words are to be extracted, and then extracts word character strings from the selected document data in accordance with those settings (S102).
This word character string extraction processing is performed using word character string extraction keywords such as those shown in FIG. 4. The word character string extraction keywords include, for example, case particles, case particle equivalents, ramification particles, collection particles, case particles + collection particles, collection particles + case particles, connection particles, and judgment particles.
[0037]
Specifically, two kinds of character string are extracted: a character string that follows a word character string extraction keyword and runs up to the next word character string extraction keyword, and a character string that follows a descriptive symbol (punctuation mark, question mark, exclamation mark, etc.) and runs up to a word character string extraction keyword.
Moreover, the word character strings to be extracted are limited to those composed only of kanji and those composed only of katakana, and the word character strings to be registered in the word dictionary 21 (user dictionary 21b) are those not already registered in the word dictionary 21 (original dictionary 21a and user dictionary 21b).
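The extraction rule above can be sketched with a regular expression. This is only an illustration under stated assumptions: the keyword list is a small hypothetical subset (the actual keywords are those of FIG. 4), the Unicode ranges used for "kanji only" and "katakana only" are a simplification, the sample sentences are hypothetical Japanese renderings, and the names `KEYWORDS`, `SYMBOLS`, and `extract_candidates` are not from the specification.

```python
import re

# Hypothetical subset of extraction keywords (the full list is in FIG. 4).
KEYWORDS = ["について", "が", "を", "に", "は", "の", "で", "と"]
SYMBOLS = "、。？！"                      # descriptive symbols
KANJI_RUN = r"[\u4e00-\u9fff]+"           # CJK unified ideographs only
KATAKANA_RUN = r"[\u30a1-\u30fa\u30fc]+"  # katakana plus the long-vowel mark

# Longest keywords first so that a multi-character keyword wins over a prefix.
_kw = "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
# Left boundary: keyword or symbol. Right boundary: keyword, as a lookahead,
# so the same keyword can also open the next candidate.
_PATTERN = re.compile(
    rf"(?:{_kw}|[{SYMBOLS}])({KANJI_RUN}|{KATAKANA_RUN})(?={_kw})"
)


def extract_candidates(text, registered):
    """Return unregistered kanji-only or katakana-only strings, in text order."""
    found = []
    for match in _PATTERN.finditer(text):
        candidate = match.group(1)
        if candidate not in registered and candidate not in found:
            found.append(candidate)
    return found


# "必要" is treated as already registered, so only "道州制" remains.
print(extract_candidates("今後は、道州制が必要となる。", {"必要"}))
```

A pattern match of this kind needs no morphological analysis, which is the simplification the embodiment relies on; the cost is that it can pick up strings that are not words, which is why the dictionary check and the later confirmation step matter.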
[0038]
Next, a reading of the extracted word character string is generated (S103). If the extracted word character string is katakana, the corresponding reading is used as it is. If the extracted word character string is kanji, its reading is generated using the word dictionary 21 (original dictionary 21a).
As described above, in the original dictionary 21a the reading is divided into units corresponding to the characters of the word character string. The extracted word character string is therefore decomposed into character units, and the reading of each character is searched in the original dictionary 21a. The retrieved readings are then combined to obtain the reading of the extracted word character string. When a character has a plurality of reading candidates, the reading with the largest number of hits is adopted.
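The per-character lookup and the "largest number of hits" rule can be sketched as below. The data layout and the names `build_char_readings` and `generate_reading` are assumptions made for illustration, readings are romanized, and only the kanji case is shown.

```python
from collections import Counter, defaultdict

# Segmented entries in a hypothetical layout, using the dictionary data of
# the operation example (readings romanized).
ORIGINAL_DICTIONARY = [
    (("道", "路"), ("dou", "ro")),
    (("本", "州"), ("hon", "shuu")),
    (("制", "度"), ("sei", "do")),
]


def build_char_readings(dictionary):
    """Count, for every character, how often each reading occurs."""
    table = defaultdict(Counter)
    for chars, readings in dictionary:
        for char, reading in zip(chars, readings):
            table[char][reading] += 1
    return table


def generate_reading(word, char_readings):
    """Combine the most frequent reading of each character.

    Returns None when some character has no reading in the dictionary.
    Ties are broken by first occurrence, which is an arbitrary choice here.
    """
    parts = []
    for char in word:
        hits = char_readings.get(char)
        if not hits:
            return None
        parts.append(hits.most_common(1)[0][0])
    return "".join(parts)


table = build_char_readings(ORIGINAL_DICTIONARY)
print(generate_reading("道州制", table))
```

Picking the most frequent reading per character is a heuristic: it cannot see which reading a character takes in a particular compound, so the result is a candidate rather than a guaranteed reading.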
[0039]
Next, the extracted word character string and the generated reading are displayed on the display unit 3 (S104), and the user is asked to correct them or to confirm registration (S105). When no correction is necessary, the extracted word character string and the generated reading are registered in the word dictionary 21 (user dictionary 21b) (S106). When correction is necessary, the user manually corrects the word character string or the reading (S107), after which the word character string and the reading are registered in the word dictionary 21 (user dictionary 21b).
Thereafter, if the above reading registered in the word dictionary 21 (user dictionary 21b) is input, it can be converted into the above word character string (S108).
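Steps S104 to S108 can be sketched as a small confirmation flow. The `review` callback is a hypothetical stand-in for the display unit 3 and input unit 2, the option of skipping registration by returning `None` is an assumption of this sketch, and the dictionary is modeled as a plain mapping from reading to candidate words.

```python
def register_with_confirmation(word, reading, user_dictionary, review):
    """S104-S107: present the candidate, apply any correction, then register.

    `review` receives the candidate pair and returns the pair to store
    (possibly corrected by the user), or None to skip registration.
    """
    decision = review(word, reading)
    if decision is None:
        return False
    confirmed_word, confirmed_reading = decision
    bucket = user_dictionary.setdefault(confirmed_reading, [])
    if confirmed_word not in bucket:
        bucket.append(confirmed_word)
    return True


def convert(reading, user_dictionary):
    """S108: return the conversion candidates for an input reading."""
    return user_dictionary.get(reading, [])


user_dictionary = {}
# The user accepts the candidate unchanged.
register_with_confirmation("道州制", "doushuusei", user_dictionary,
                           lambda word, reading: (word, reading))
print(convert("doushuusei", user_dictionary))
```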
[0040]
Next, a specific operation example of the word registration device according to the first embodiment of the present invention will be described.
For example, consider a case in which the document database 10 contains a received mail including a sentence meaning “In the future, a Doshu system (道州制) will be required.”, and the original dictionary 21a contains dictionary data such as “word character string: 道 / 路, reading: dou / ro”, “word character string: 本 / 州, reading: hon / shuu”, and “word character string: 制 / 度, reading: sei / do”. First, the user selects the received mail in question (S101), and the word character string extraction processing is performed on it (S102). In this step, “道州制” is extracted from the above sentence because it follows a punctuation mark, precedes the word character string extraction keyword “ga”, and is composed only of kanji.
[0041]
Next, a reading of the extracted character string “道州制” is generated (S103). In this reading generation step, the character string “道州制” is decomposed into character units, and the readings of the characters “道”, “州”, and “制” are searched in the original dictionary 21a. In this example, “character: 道, reading: dou”, “character: 州, reading: shuu”, and “character: 制, reading: sei” are retrieved, and these are combined to obtain the dictionary registration candidate data “character string: 道州制, reading: doushuusei”.
[0042]
Next, the extracted word character string and its reading are displayed to the user (S104), and the user is asked whether any correction is needed (S105). In this case no correction is necessary, so the dictionary registration candidate data “character string: 道州制, reading: doushuusei” is registered in the user dictionary 21b in accordance with the user's confirmation operation (S106).
Once the above steps have been executed, the next time a document is input the conversion engine 22 can retrieve the newly registered word “道州制” from the user dictionary 21b for the input reading “doushuusei” and display it as a conversion candidate (S108).
[0043]
According to the present embodiment configured as described above, an unregistered word character string can be extracted from a sentence that has already been converted into a character string, its reading can be generated, and the extracted word character string and the generated reading can be registered in the word dictionary 21. This reduces the word registration work required of the user and improves the efficiency of sentence input when a document is input while referring to a sentence that has already been converted into a character string.
[0044]
Also, by extracting word character strings using word character string extraction keywords, the extraction processing is simpler than when a complicated morphological analysis routine is used. The reading generation processing is likewise simplified, because the reading of the extracted word character string is searched in character units and the retrieved readings are combined to generate the reading of the word character string. As a result, the processing load on the apparatus is reduced and word registration is performed quickly, and the invention can also be implemented on a small apparatus with limited hardware resources, such as a mobile phone.
[0045]
In addition, the character string extraction processing extracts only word character strings consisting solely of kanji or solely of katakana that follow a word extraction keyword or a descriptive symbol and run up to a word extraction keyword, and only those extracted word character strings not registered in the user dictionary 21b are to be registered. The word character string extraction processing is thereby further simplified and the processing load further reduced.
[0046]
Further, because the word dictionary 21 (original dictionary 21a) stores the reading of each word character string divided into units corresponding to the characters of the word character string, the dictionary data used to convert readings into word character strings can also serve as the dictionary data used to generate readings of extracted word character strings, and the capacity of the word dictionary can be reduced. This further facilitates implementation on a cellular phone or the like that has limited hardware resources.
[0047]
The word dictionary 21 includes the original dictionary 21a and the user dictionary 21b, both of which are used for conversion from readings to word character strings. Because the reading search for the extracted word character string is performed on the original dictionary 21a while word registration is performed on the user dictionary 21b, the reading of the extracted word character string can be generated with high accuracy without being affected by the dictionary data registered in the user dictionary 21b.
[0048]
In addition, since the extracted word character string and the generated reading are displayed and the user is asked to correct them or to confirm registration, a word character string extracted by mistake or an erroneously generated reading can be prevented from being registered in the word dictionary 21 (user dictionary 21b). Moreover, the user is made aware of the word character string to be registered and its reading, and can therefore input documents efficiently.
[0049]
[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG.
FIG. 5 is a flowchart showing the operation of the word registration device according to the second embodiment of the present invention.
The second embodiment shown in this figure differs from the above embodiment in that the user directly selects the document data to be processed from the document database 10, and in that the extracted word character string and its reading are registered in the word dictionary 21 (user dictionary 21b) without asking the user to confirm them.
[0050]
As shown in FIG. 5, in the second embodiment the user first selects document data in the document database 10 (S201). Thereafter, the word character string extraction processing (S202), the reading generation processing for the extracted word character string (S203), and the dictionary registration processing for the extracted word character string and the generated reading (S204) are executed automatically. After that, when the reading registered in the word dictionary 21 (user dictionary 21b) is input, it can be converted into the corresponding word character string (S205).
[0051]
In this embodiment, the user is not asked for corrections or registration confirmation. It is therefore preferable to make the extraction conditions and the reading generation conditions stricter in the word character string extraction processing and the reading generation processing, so that only dictionary data with a high accuracy rate is extracted and generated. For example, the word character string extraction processing uses only those word character string extraction keywords shown in FIG. 4 that have a high accuracy rate. In the reading generation processing, when the reading of the word character string is searched in character units, a condition is added such as adopting a reading candidate only when the number of hits for the same reading exceeds a predetermined threshold, so that only highly accurate candidates are selected.
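The stricter reading generation condition can be sketched as a hit-count threshold. The hit counts below are hypothetical and not taken from the specification, the function name `generate_reading_strict` and the parameter `min_hits` are assumptions, and the threshold is treated as inclusive ("at least `min_hits` hits").

```python
from collections import Counter


def generate_reading_strict(word, char_readings, min_hits=2):
    """Generate a reading only if every character's best reading has
    at least `min_hits` hits; otherwise return None (no registration)."""
    parts = []
    for char in word:
        hits = char_readings.get(char)
        if not hits:
            return None
        reading, count = hits.most_common(1)[0]
        if count < min_hits:
            return None
        parts.append(reading)
    return "".join(parts)


# Hypothetical per-character hit counts (readings romanized).
CHAR_READINGS = {
    "道": Counter({"dou": 2}),
    "州": Counter({"shuu": 2}),
    "制": Counter({"sei": 2}),
    "庁": Counter({"chou": 1}),
}

print(generate_reading_strict("道州制", CHAR_READINGS))  # every character passes
print(generate_reading_strict("道庁", CHAR_READINGS))    # "庁" has only one hit
```

Raising `min_hits` trades coverage for accuracy: fewer words are registered automatically, but those that are registered rest on better-attested readings.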
[0052]
Next, a specific operation of the second embodiment will be described.
For example, consider a case in which the document database 10 contains a received mail including a sentence meaning “Tomorrow we will discuss the Doshu system (道州制)”, and the original dictionary 21a contains the dictionary data “character string: 道 / 路, reading: dou / ro”, “character string: 歩 / 道, reading: ho / dou”, “character string: 本 / 州, reading: hon / shuu”, “character string: 九 / 州, reading: kyuu / shuu”, and “character string: 制 / 約, reading: sei / yaku”.
[0053]
When the user selects the received mail in question (S201), the word character string extraction processing is performed on it automatically (S202). At this time, “道州制” is extracted from the above sentence as a dictionary registration candidate, because it follows a punctuation mark, precedes the word character string extraction keyword “about”, and is composed only of kanji. Here, the extraction accuracy is improved by applying “about”, a keyword with a high accuracy rate, as the word character string extraction keyword.
[0054]
Next, the original dictionary 21a is searched for the reading of each character of the extracted word character string. In this example, the added condition is that there be two or more hits for the same reading candidate of each character. Then “character: 道, reading: dou”, “character: 州, reading: shuu”, and “character: 制, reading: sei” are retrieved, and “character string: 道州制, reading: doushuusei” is created as a dictionary registration candidate.
[0055]
According to the present embodiment configured as described above, in addition to the effects of the first embodiment, the extraction of word character strings, the generation of their readings, and the dictionary registration of the word character strings and readings can all be performed automatically simply by selecting document data.
In addition, making the word character string extraction conditions and the reading generation conditions stricter prevents the accuracy of the registered dictionary data from being lowered.
[0056]
[Third embodiment]
Next, a third embodiment of the present invention will be described with reference to FIG.
FIG. 6 is a flowchart showing the operation of the word registration device according to the third embodiment of the present invention.
The third embodiment shown in this figure differs from the second embodiment in that the user sets selection conditions for document data instead of directly selecting the document data.
When the user inputs document data selection conditions (S301), the word character string extraction processing (S302), the reading generation processing (S303), and the registration processing for word character strings and readings (S304) are performed automatically on the document data that meets the conditions, after which the registered readings can be converted into the corresponding word character strings (S305).
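The flow S301 to S304 can be sketched as a loop over the documents that satisfy a selection condition. The specification does not say what the selection conditions look like, so the `Document` fields, the predicate-style `condition`, and the `extract` and `make_reading` callbacks are all hypothetical placeholders for the processing of the earlier steps.

```python
from dataclasses import dataclass


@dataclass
class Document:
    sender: str  # hypothetical metadata used by the selection condition
    body: str


def auto_register(documents, condition, extract, make_reading, user_dictionary):
    """Run extraction, reading generation, and registration on every
    document that satisfies `condition` (S301-S304)."""
    for doc in documents:
        if not condition(doc):
            continue
        for word in extract(doc.body):
            reading = make_reading(word)
            if reading is None:
                continue  # no reliable reading, so nothing is registered
            bucket = user_dictionary.setdefault(reading, [])
            if word not in bucket:
                bucket.append(word)
    return user_dictionary


# Stub callbacks standing in for the extraction and reading generation steps.
docs = [Document("a@example.com", "道州制"), Document("b@example.com", "道州制")]
result = auto_register(
    docs,
    condition=lambda doc: doc.sender == "a@example.com",
    extract=lambda body: ["道州制"] if "道州制" in body else [],
    make_reading={"道州制": "doushuusei"}.get,
    user_dictionary={},
)
print(result)
```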
[0057]
According to the third embodiment configured as described above, all word registration processing including document selection can be automated only by first setting the document selection condition. As a result, the user can perform Japanese conversion without being aware of unregistered words used in received mails, and full-scale context-dependent Japanese conversion processing is possible.
[0058]
[Effect of the Invention]
As described above, according to the present invention, an unregistered word character string is extracted from a sentence that has already been converted into a character string, its reading is generated, and the extracted word character string and the generated reading are registered in the word dictionary. This reduces the word registration work required of the user. Furthermore, because the word character string extraction processing and the reading generation processing are simplified as far as possible, the invention can also be implemented on a small communication terminal device or the like with limited hardware resources.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a hardware configuration of a document input device (word registration device) according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a functional configuration of a document input device (word registration device) according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the word registration device according to the first embodiment of the present invention.
FIG. 4 is an explanatory diagram showing a word character string extraction keyword used by the word registration device according to the first embodiment of the present invention to extract a word character string.
FIG. 5 is a flowchart showing the operation of the word registration device according to the second embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the word registration device according to the third embodiment of the present invention.
[Explanation of symbols]
1 Document input device
2 Input unit
3 Display unit
4 Communication unit
5 Storage unit
6 Control unit
10 Document database
20 Japanese conversion device
21 Word dictionary
21a Original dictionary
21b User dictionary
22 Conversion engine
30 Word registration device
31 Word character string extraction means
32 Reading generation means
33 Word registration means

Claims

A word registration device that registers an unregistered word character string and its reading in a word dictionary that stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string, the device comprising:
word character string extraction means for extracting, from a sentence that has already been converted into a character string, a word character string to be registered in the word dictionary using word character string extraction keywords;
reading generation means for decomposing the extracted word character string into character units, searching the word dictionary for the reading of each decomposed character, and combining the plurality of retrieved readings to generate the reading of the extracted word character string; and
word registration means for registering the extracted word character string and the generated reading in the word dictionary in association with each other,
wherein
the word character string extraction keywords consist of predetermined keywords including case particles, case particle equivalents, ramification particles, collection particles, case particles + collection particles, collection particles + case particles, connection particles, and judgment particles, and
the word character string extraction means extracts a word character string to be registered in the word dictionary by extracting a word character string sandwiched between the word character string extraction keywords.

The word registration device according to claim 1, wherein the word character string extraction means extracts a word character string that follows a descriptive symbol and precedes the word character string extraction keyword, and, among the extracted word character strings, those not registered in the word dictionary are to be registered.

The word registration device according to claim 1 or 2, wherein the word character string extraction means extracts, from the word character strings extracted using the word character string extraction keywords, a word character string composed only of kanji and/or a word character string composed only of katakana.

The word registration device according to claim 1, wherein the word dictionary stores the reading of each word character string divided into units corresponding to the characters of the word character string, so that the reading of the word character string extracted by the word character string extraction means can be searched in character units.

The word registration device according to claim 1, wherein the word dictionary includes an original dictionary and a user dictionary that are used for conversion from a reading to a word character string, the reading generation means performs the reading search for the extracted word character string using the original dictionary, and the word registration means performs word registration on the user dictionary.

The word registration device according to claim 1, wherein the word registration means displays the extracted word character string and the generated reading, and requests correction and/or registration confirmation thereof.

The word registration device according to claim 1, which is built in a small communication terminal device.

A word registration method in which a computer registers an unregistered word character string and its reading in a word dictionary that is stored in a storage unit, stores word character strings and their readings in association with each other, and is used for conversion from a reading to a word character string, wherein
the computer:
extracts, from a sentence that has already been converted into a character string, a word character string to be registered in the word dictionary by extracting a word character string sandwiched between word character string extraction keywords, using word character string extraction keywords that are stored in the storage unit and consist of predetermined keywords including case particles, case particle equivalents, ramification particles, collection particles, case particles + collection particles, collection particles + case particles, connection particles, and judgment particles;
decomposes the extracted word character string into character units, searches the word dictionary for the reading of each decomposed character, and combines the plurality of retrieved readings to generate the reading of the extracted word character string; and
registers the extracted word character string and the generated reading in the word dictionary in association with each other.

The word registration method according to claim 8, which is performed for word registration in a small communication terminal device.

A word registration program for causing a word registration device to register an unregistered word character string and its reading in a word dictionary that stores word character strings and their readings in association with each other and is used for conversion from a reading to a word character string,
the program causing the word registration device to execute:
extracting, from a sentence that has already been converted into a character string, a word character string to be registered in the word dictionary by extracting a word character string sandwiched between word character string extraction keywords, using word character string extraction keywords consisting of predetermined keywords including case particles, case particle equivalents, ramification particles, collection particles, case particles + collection particles, collection particles + case particles, connection particles, and judgment particles;
decomposing the extracted word character string into character units, searching the word dictionary for the reading of each decomposed character, and combining the plurality of retrieved readings to generate the reading of the extracted word character string; and
registering the extracted word character string and the generated reading in the word dictionary in association with each other.

The word registration program according to claim 10, which executes word registration in a small communication terminal device.