JP2005044103A

JP2005044103A - Document creation device and method and program

Info

Publication number: JP2005044103A
Application number: JP2003202559A
Authority: JP
Inventors: Masato Yajima (矢島 真人); Yukihiro Fukunaga (福永 幸弘); Hisayoshi Nagae (永江 尚義)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-07-28
Filing date: 2003-07-28
Publication date: 2005-02-17

Abstract

PROBLEM TO BE SOLVED: To provide a document creation device and method that, when a document is created by recognizing input speech, make it possible to efficiently select the optimal character string from a plurality of character string candidates corresponding to a word recognized from the input speech.

SOLUTION: Each of a plurality of character strings stored in storage means, which stores the character strings together with at least the reading corresponding to each of them, is set as either a learning target or a non-learning target. For each of a plurality of words recognized from the input speech, a plurality of candidate character strings whose reading is the same as or similar to that of the word are selected from the stored character strings. The non-learning-target character strings are excluded from the candidate character strings for each word, and the character string for each recognized word is selected from the remaining candidates to create the document.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、入力音声を音声認識して文書を作成する文書作成装置に関する。
【０００２】
【従来の技術】
従来の文書作成装置において、学習機能とはオペレータが選択した単語あるいは文節、またはその前後の単語あるいは文節をテーブルに記憶し、後の入力作業において該単語あるいは文節が候補中に出現した場合、テーブル中に記憶された時間等を考慮して候補順位を調整していた。さらに、学習においては付属語のような特定の品詞や無変換平仮名候補、カタカナ候補などは設定により学習非対象とすることもできた。
【０００３】
一般的に、文書作成装置における学習機能は、入力された音声あるいは文字列に対する認識あるいは変換候補のなかからユーザにより選択された候補（単語あるいは文節）を逐次記憶しておき、新たな音声あるいは文字列が入力した際には、当該入力音声あるいは入力文字列に対応する候補に、記憶された単語あるいは文節が含まれる場合、より直近に学習された候補を優先して出力することにより、認識精度あるいは変換精度を向上させる技術である。
【０００４】
しかし、入力音声あるいは文字列に対応する認識候補グループあるいは変換候補グループに、一般的に利用する語と稀にしか利用されない語が存在し、後者が選択されると、次に同様の認識候補グループ／変換候補グループが得られた場合に、従来の学習機能では、一般的な語よりも学習された語が優先されてしまうため、候補選択処理が必要となる場合が多かった。
【０００５】
特に、かな漢字変換の場合には、変換対象である入力文字列の読み、変換結果の表現、品詞などに応じて、「学習なし」、「一時的最新使用学習」、「一時的頻度学習」、「永久的最新使用学習」、「永久的頻度学習」などの学習方式に変更し、利用状況に応じて最適な学習方式の選択を可能にする技術がある（例えば、特許文献１参照）
【０００６】
【特許文献１】
特開平９−６２６６８号公報
【０００７】
【発明が解決しようとする課題】
入力音声を文字認識する場合、入力音声に対する候補単語から、希にしか利用されない単語が選択されたとき、従来の学習機能では、次に同じ音声が再び入力されたときに、当該音声に対する候補単語として、当該選択された単語が優先的に選択あるいは表示されてしまい、その結果、当該入力音声に対応する候補単語の一覧の中から最適な単語を選択し直さなければならず、効率よく文書作成を行えないという問題点があった。
【０００８】
そこで、本発明は上記問題点に鑑み、入力音声を音声認識して文書を作成する際に、入力音声から認識された単語に対応する複数の文字列候補のなかから最適な文字列を効率よく選択することのできる文書作成装置および方法を提供することを目的とする。
【０００９】
【課題を解決するための手段】
本発明は、入力音声を音声認識して文書を作成するためのものであって、（ａ）複数の文字列と少なくとも当該複数の文字列のそれぞれに対応する読みを記憶する記憶手段に記憶された前記複数の文字列のそれぞれを、学習対象と学習非対象のうちのいずれか一方に設定し、（ｂ）前記入力音声から認識された複数の単語のそれぞれについて、前記記憶手段に記憶された複数の文字列のなかから、当該単語と読みが同じかあるいは類似する複数の候補文字列を選択し、（ｃ）前記入力音声から認識された複数の単語のそれぞれに対応する文字列を、各単語に対応する前記複数の候補文字列から前記学習非対象の文字列を除く残りの候補文字列のなかから選択して、前記文書を作成し、（ｄ）前記複数の単語のうちの１つである第１の単語に対応する前記複数の候補文字列のうちユーザにより選択された候補文字列で、前記文書中の前記第１の単語に対応する文字列を置き換え、（ｅ）前記複数の単語のそれぞれに対応する文字列として前記各単語に対応する複数の候補文字列のなかからそれぞれ選択された前記学習対象の文字列と前記学習非対象の文字列のうち、前記学習対象の文字列のみ、当該各単語に対応する複数の候補文字列のなかから優先的に選択することにより、入力音声を音声認識して文書を作成する際に、入力音声から認識された単語に対応する複数の文字列候補のなかから最適な文字列を効率よく選択することができる。
【００１０】
【発明の実施の形態】
以下、本発明の実施形態について、図面を参照して説明する。
【００１１】
図１は、本発明の文書作成装置の実施形態にかかる文書作成装置１の構成例を示したもので、大きく分けて、文書作成部１１と入力部１２と出力部１３とから構成されている。
【００１２】
入力部１２は、例えばユーザの発言などの音声を入力するためのマイク１３１と、マイク１３１から入力された音声（入力音声）を音声認識して文字コードに変換する音声認識部１３２と、ユーザが文書作成装置１への各種編集指示を入力するための編集指示入力部１３３とを有するものである。入力部１２からは、入力音声を音声認識した結果得られた文字コードと、ユーザにより入力された編集指示を表す制御信号（辞書メンテナンスを指示する信号、文書作成を指示する信号、候補選択を指示する信号など）が出力され、文書作成部１１へ入力する。
【００１３】
文書作成部１１は、入力制御部１１２と辞書メンテナンス部１１４と候補選択部１１５と単語学習部１１６と言語解析部１１８と候補展開部１１９と出力制御部１２４と、入力部１２での音声認識により得られた文字コードを基に言語解析部１１８での言語解析を行う際に用いる解析ルールを記憶する解析ルール記憶部１２１と、学習テーブルを記憶する学習テーブル記憶部１１７と、認識候補を一時記憶する候補バッファ１２０と、単語辞書を記憶する単語辞書記憶部１２２と、文書を一時記憶する文書バッファ１２５とを有する。
【００１４】
なお、以下の説明において、単語辞書記憶部１２２に記憶されている単語辞書を簡単に単語辞書１２２と呼び、学習テーブル記憶部１１７に記憶されている学習テーブルを簡単に学習テーブル１１７と呼ぶ。解析ルール記憶部１２１に記憶されている解析ルールを簡単に解析ルール１２１と呼ぶ。
【００１５】
文書作成部１１は、入力音声から認識された複数の単語のそれぞれに対応する文字列を、単語辞書１２２の複数の文字列のうちの当該単語と読みが同じかあるいは類似する複数の候補文字列から上記学習非対象の候補文字列を除く、残りの候補文字列のなかから選択して、文書を作成する。作成された文書は文書バッファ１２５に格納されている。また、当該文書中のユーザにより指定された第１の文字列を、当該第１の文字列に対応する複数の候補文字列のうちユーザにより選択された候補文字列で置き換えて上記文書を編集する。
【００１６】
入力制御部１１２には、入力部１２から出力された文字コードや制御信号が入力する。単語辞書１２２のメンテナンスを指示する信号が入力したときには、辞書メンテナンス部１１４を起動し、候補選択を指示する信号が入力したときには、候補選択部１１５を起動する。
【００１７】
辞書メンテナンス部１１４は、単語辞書１２２に記憶されている各単語が、学習非対象と学習対象のいずれかであるかを区別するために各単語に属性情報を対応つけて記憶している。なお、ここでは、例えば、学習対象と学習非対象の単語のうち、学習非対象の単語に対してのみ、学習非対象属性情報として「１」を対応つけて記憶するようになっている。
【００１８】
単語辞書１２２中の各単語は初期状態では、全て学習対象に設定されている（すなわち、学習非対象属性情報「１」が登録されていない）としてもよいが、一般的に希にしか使用しない単語に対しては予め学習非対象に設定されていてもよい（すなわち、学習非対象属性情報「１」が登録されている）。
【００１９】
入力制御部１１２は、入力部１２から文字コードが入力したときには、それを入力バッファ１１３に格納するとともに、言語解析部１１８を起動する。ここで文字コードは一意に決定されていても、複数の候補を持っていてもよい。
【００２０】
言語解析部１１８では、入力音声の入力時間間隔や入力バッファ１１３に格納された文字数から解析対象となった場合、単語辞書１２２、解析ルール１２１を参照し、入力バッファ１１３に格納されている文字コードを順次組み合わせて、単語系列を作成し、当該単語系列を候補展開部１１９へ渡す。
【００２１】
単語辞書１２２は、複数の単語のそれぞれについて、各単語を識別するための識別情報（辞書番号あるいは辞書ＩＤとも呼ぶ）と、その表記、読み、品詞、使用頻度などが予め登録されており、さらに、ユーザにより指定された単語を学習対象から学習非対象に設定する場合には、辞書メンテナンス部１１４により、学習非対象属性情報が当該単語の情報として書き込まれ、また、ユーザにより指定された単語を学習非対象から学習対象に設定する場合には、辞書メンテナンス部１１４は、当該単語に対応付けた学習非対象属性を削除する。
【００２２】
解析ルール１２１は、品詞間の接続可否や、単語間の繋がり易さなどが記述されているものである。
【００２３】
候補展開部１１９では、言語解析部１１８から出力される単語系列中の各単語あるいは文節に対し、単語辞書１２２から同音語を取得し、取得した同音語の辞書番号および表記などを候補バッファ１２０に格納する。更に候補展開部１１９は候補バッファ１２０に格納した、上記単語系列中の単語に対応する同音語に対し候補順位を付与する。その際、学習テーブル１１７に記憶されている最も直近に使われた単語に最も高い候補順位を付与する。候補展開部１１９は候補順位を決定した後、１位候補の辞書番号、表記などを文書バッファ１２５に格納する。文書バッファ１２５では候補バッファ１２０に格納されている同音語候補に対して該１位候補から関連がわかるようにリンクを張る。
【００２４】
出力制御部１２４は、文書バッファ１２５に格納された文書中の各文字列を出力部１３に表示するとともに、候補バッファ１２０に格納されている当該文字列中の単語にリンクされている当該単語の同音語候補の一覧を出力部１３に表示する。
【００２５】
一方、入力制御部１１２は、入力部１２から入力された制御信号が、出力部１３に表示された文字列中の単語に対する次候補選択を指示するものであるとき、候補選択部１１５を起動する。候補選択部１１５は、候補バッファ１２０から当該単語にリンクされている当該単語の同音語の一覧を出力部１３に表示する。
【００２６】
ユーザが上記一覧から所望の単語を選択した場合、候補選択部１１５は、単語辞書１２２を参照して、当該選択された単語に学習非対象語であることを示す属性情報が付されているか否かを調べ、当該属性情報が付されていないときには（学習非対象語でなかった場合のみ）、単語学習部１１６に対して、当該選択された単語に確定されたことを通知する。
【００２７】
単語学習部１１６では、当該通知を受けて学習テーブル１１７に当該確定された単語（学習非対象語ではない学習対象語）の情報および直前・直後の単語の情報などを候補選択の履歴情報として記憶する。
【００２８】
また、候補選択部１１５は、選択された単語を候補バッファ１２０より取り出し、当該選択された単語で、現在候補の一覧を表示している文書バッファ１２５中の単語を置換するとともに、画面表示を更新する。すなわち、候補選択部１１５は、文書バッファ１２５内の文書の編集を行うようになっている。
【００２９】
図２は、図１の文書作成装置１の処理動作を説明するためのフローチャートであり、単語辞書１２２に登録されている学習対象の単語を学習非対象に変更したり、あるいは、学習非対象の単語を学習対象に変更し、また、単語辞書１２２中の単語の学習非対象属性情報の有無に応じて文書作成を行う際の処理動作を示している。
【００３０】
入力制御部１１２は、入力部１２から単語辞書のメンテナンスを指示する信号が入力された後（ステップＳ１）、ユーザにより、処理対象とする単語を指定するための情報が入力されたとき（ステップＳ２）、辞書メンテナンス部１１４は、入力された情報に一致する単語を単語辞書１２２より取得する（ステップＳ３）。ここで、単語を指定するための情報とは、単語の表記・読み・品詞などの全部または一部の情報いずれであってもよい。ステップＳ３では取得した単語の一覧が表示されるので、この一覧から新たに学習非対象とする単語や、既に学習非対象となっている単語のうち学習対象とする単語を選択する（ステップＳ４）。
【００３１】
辞書メンテナンス部１１４は、学習非対象とすべく選択された（学習対象の）単語は学習非対象属性であることを表す属性情報を付与するためにテンポラリーファイル一時記憶するとともに（ステップＳ５）、学習テーブル１１７から当該選択された単語を削除すべく、単語学習部１１６に当該選択された単語を通知する。単語学習部１１６は、当該通知された単語が学習テーブル１１７に記憶されているときには、学習テーブル１１７から当該通知された単語を削除する（ステップＳ６）。また、学習対象にすべく選択された（学習非対象の）単語は、学習非対象属性であることを表す属性情報を削除するためにテンポラリーファイルに一時記憶する（ステップＳ７）。
【００３２】
単語辞書１２２に登録されている単語に対する学習非対象属性情報の付与や削除のための作業が終了するまで、ステップＳ２からステップＳ８を繰り返す。辞書メンテナンス部１１４では、メンテナンス終了時に、テンポラリーファイルに記録された内容を基に、単語辞書１２２を更新する（ステップＳ９）。すなわち、単語辞書１２２中の学習非対象にすべく選択された単語には、学習非対象属性情報を当該単語辞書１２２に記録し、単語辞書１２２中の学習対象にすべく選択された単語には、学習非対象属性情報を当該単語辞書１２２から削除する。
【００３３】
ステップＳ１において、入力部１２から文書作成を指示する信号が入力された後、入力音声を音声認識した結果の文字コードが入力した場合（ステップＳ１０）、入力制御部１１２は、入力した文字コードを順次、入力バッファ１１３へ格納する（ステップＳ１１）。
【００３４】
言語解析部１１８では、入力バッファ１１３に格納された文字コードを監視し、入力時間間隔や文字数等から解析対象となると判断された場合（ステップＳ１２）、単語辞書１２２および解析ルール１２１を参照して言語解析を行う（ステップＳ１３）。
【００３５】
ここで言語解析とは、解析対象の文字コードを時系列的に組み合わせ、単語辞書１２２や解析ルール１２１の情報から、最も言語的に正しくなる単語系列を作成するものである。言語解析部１１８では、解析結果の単語系列を候補展開部１１９を介して候補バッファ１２０に格納する（ステップＳ１４）。
【００３６】
候補展開部１１９では、候補バッファ１２０に格納された各単語について、当該単語の同音語（読みが同じか類似する語）であって、当該単語の変換候補となる単語を単語辞書１２２から求め、候補バッファ１２０に加える（ステップＳ１５）。更に候補展開部１１９では、候補バッファ１２０に格納した各単語が学習テーブル１１７に記録されているかを調べる。
【００３７】
学習テーブル１１７には、最近選択された単語に関する履歴情報（例えば、当該単語の単語ＩＤなど）が記録されているので、候補バッファ１２０に格納した単語のなかに、学習テーブル１１７に履歴として記録されている単語がある場合には、最も新しく学習テーブル１１７に記録されている履歴に対応する単語の順位が最も高くなるように、候補バッファ１２０に格納した単語の候補順位を定める（ステップＳ１６）。候補展開部１１９は、言語解析部１１８で求めた単語系列の各単語を候補順位が１位の単語に置き換えてなる単語系列を文書バッファ１２５に格納する（ステップＳ１７）。
【００３８】
出力制御部１２４は、文書バッファ１２５に格納された候補順位が１位の単語からなる１位候補を出力部１３上に表示する（ステップＳ１８）。ユーザは、画面表示された１位候補を確認し、期待した表記と異なる場合、次候補の選択を行う。
【００３９】
次候補選択は、どの単語の候補を選択するかを、例えばマウスのクリック等の手段で入力部１２の編集指示入力部１３３から指示することで開始する（ステップＳ１９）。候補選択部１１５は、出力部１３に表示された文字列中のユーザにより指定された単語にリンクされている当該指定された単語の同音語（候補バッファ１２０に格納されている候補単語）の一覧を出力部１３に表示する（ステップＳ２０）。
【００４０】
ユーザが候補の一覧から所望の候補を選択すると（ステップ２１）、候補選択部１１５は、単語辞書１２２を参照して、当該選択された単語に学習非対象属性情報が付与されているかを調べ、付与されていない場合には、当該選択された単語に関する情報（当該選択された単語の単語ＩＤや、当該単語の前後にある単語の単語ＩＤまでなど）を単語学習部１１６に通知する（ステップＳ２２）。
【００４１】
単語学習部１１６は、学習テーブル１１７に当該選択された単語に関する情報を候補選択の履歴情報として記録する（ステップＳ２３）。更に候補選択部１１５では、文書バッファ１２５に格納されている選択前の単語を当該選択された単語で書換え（ステップＳ２４）、画面表示を更新する（ステップＳ１８）。すなわち、次候補選択操作で選択された単語に置き換えた結果が画面に表示される。
【００４２】
なお、入力制御部１１２は、入力部１２から入力された制御信号が、辞書メンテナンスを指示する制御信号、文書作成を指示するための制御信号、候補選択を指示する制御信号のいずれでもない場合には、当該制御信号にて指示された処理を行って入力待ちとなる（ステップＳ２５）。
【００４３】
図３は、単語辞書１２２の一例を示したものである。単語辞書１２２には、各単語について、当該単語を検索する際にキーとなる単語ＩＤと、言語解析部１１８で言語解析を行う際に必要な読み・表記・品詞・頻度等の情報が格納されている。この情報は、入力の種類によって適宜増減する。更に、各単語には、当該単語が学習非対象であるか否かを表す属性情報も対応付けて記憶されている。
【００４４】
ユーザが、図２のステップＳ１〜ステップＳ９の処理を行うことで、例えば図３（ａ）に単語のうち、単語ＩＤが「ｘｘ０８」および「ｘｘ０ａ」の単語を学習非対象語に設定した場合、辞書メンテナンス部１１４は、単語辞書１２２中の単語ＩＤが「ｘｘ０８」と「ｘｘ０ａ」の単語に対応する学習非対象属性情報として「１」を書き込む（図３（ｂ）参照）。
【００４５】
候補選択部１１５は、図２のステップＳ２２において、図３（ｂ）に示したような単語辞書１２２を参照して、次候補として選択された単語の学習非対象属性情報として「１」が書き込まれているときには、当該選択された単語は、学習テーブル１１７に記録しないようにする。すなわち、当該選択された単語を学習しない。
【００４６】
次に、単語辞書１２２中のある単語に学習非対象属性を付与した場合としなかった場合で、入力音声に対応する文字列（文書作成部１１から出力されて出力部１３から表示される文字列）にどの様な違いが生ずるかを具体例を挙げて説明する。
【００４７】
入力部１２からの入力音声を音声認識した結果、図４（ａ）に示すように、「ひろさわさんにおあいします」といった文字コードの列が得られ、これが入力バッファ１１３に格納されると（図２のステップＳ１、ステップＳ１０〜ステップＳ１１）、言語解析部１１８で言語解析を行った結果、「ひろさわ／さん／に／お／あい／し／ます」といった単語系列が出力される。なお、図４において「／」は単語の切れ目を示す。この単語系列は、候補バッファ１２０に格納される（ステップＳ１２〜ステップＳ１４）。候補展開部１１９は、単語辞書１２２を用いて、この単語系列の各単語の同音語を求める。初期状態（学習がされていない状態）では、単語辞書１２２に予め格納されている各単語の使用頻度などから同音語の順位付けが行われ（ステップＳ１５、ステップＳ１６）、図４（ｂ）に示すように、「広沢／さん／に／お／会い／し／ます」という、各単語に対する候補順位が第１位の単語からなる単語系列が得られる（ステップＳ１７〜ステップＳ１８）。ここで、「ひろさわ」の次候補の一覧を表示させると（ステップＳ１９）、図４（ｃ）に示すような変換候補の一覧が得られる（ステップＳ２０）。この一覧から、次候補２番目の「廣沢」を選択する（ステップＳ２１）。
【００４８】
当該語が学習対象であるときには、当該語は学習テーブルに記録される（ステップＳ２２、ステップＳ２３）。なお、このとき、出力制御部１２４は、「広沢」を選択された候補「廣沢」に置き換えて図４（ｄ）に示すように表示する（ステップＳ２４、ステップＳ１８）。
【００４９】
その後、入力部１２から、図４（ｅ）に示すように、「ひろさわさんかられんらくがありました」といった音声入力があったとき、上記同様にして、これが入力バッファ１１３に格納されると、言語解析部１１８で言語解析を行った結果、「ひろさわ／さん／から／れんらく／が／あり／まし／た」といった単語系列が出力されて、この単語系列は、候補バッファ１２０に格納される（ステップＳ１２〜ステップＳ１４）。候補展開部１１９は、単語辞書１２２を用いて、この単語系列の各単語の同音語を求める。その際、学習テーブル１１７には、前回候補選択した「廣沢」が記録されているので、「ひろさわ」に対応する同音語候補の中で、この語の候補順位が最も高くなり、図４（ｆ）に示すように、「廣沢／さん／から／連絡／が／あり／まし／た」という、各単語に対する候補順位が第１位の単語からなる単語系列が得られる（ステップＳ１６〜ステップＳ１８）。
【００５０】
一方、「廣沢」という語が、学習非対象であるとき（例えば、学習非対象属性情報の値が「１」であるとき）、ステップＳ２１で、当該語が候補一覧から選択されても、ステップＳ２３はスキップして、学習テーブル１１７には、その履歴が記録されない。よって、ステップＳ１６では、「ひろさわ」に対応する同音語候補の中で、使用頻度の最も高い「広沢」が最も候補順位が高くなり、図４（ｇ）に示すように、「広沢／さん／から／連絡／が／あり／まし／た」という単語系列が得られる（ステップＳ１６〜ステップＳ１８）。
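The contrast between paragraphs [0049] and [0050] can be traced with a small Python sketch. It is only an illustration under stated assumptions: the function names, the tuple layout, and the frequency values are hypothetical, and the learning table is reduced to a list of notations in selection order.

```python
# (notation, reading, frequency); the frequency values are made up.
DICTIONARY = [
    ("広沢", "ひろさわ", 50),
    ("廣沢", "ひろさわ", 5),
]


def top_candidate(reading, learned, dictionary=DICTIONARY):
    """Return the first-ranked notation for a reading (step S16)."""
    homophones = [w for w in dictionary if w[1] == reading]
    # The most recently learned homophone wins, if there is one.
    for notation in reversed(learned):
        if any(w[0] == notation for w in homophones):
            return notation
    # Otherwise fall back to the highest dictionary frequency.
    return max(homophones, key=lambda w: w[2])[0]


def select(notation, learned, non_learning):
    """Record a user selection unless the word is a non-learning target."""
    if notation not in non_learning:
        learned.append(notation)  # step S23
    # For a non-learning target, step S23 is skipped.


# Case of [0049]: 廣沢 is a learning target, so it is learned.
learned_a = []
select("廣沢", learned_a, non_learning=set())
result_learning = top_candidate("ひろさわ", learned_a)

# Case of [0050]: 廣沢 is a non-learning target, so nothing is recorded.
learned_b = []
select("廣沢", learned_b, non_learning={"廣沢"})
result_non_learning = top_candidate("ひろさわ", learned_b)
```

In the first case the next input of 「ひろさわ」 yields 「廣沢」; in the second it yields the more frequent 「広沢」, matching FIG. 4(f) and FIG. 4(g) as described above.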
【００５１】
なお、上記実施形態の文書作成装置１では、ステップＳ１９以下で次候補選択を行う場合、同時に辞書メンテナンス部１１４をも起動し、出力部１３で表示された文字列中のユーザにより指定された語に対応する変換候補の一覧を表示した際に（ステップＳ２０）、当該一覧中の単語に学習非対象属性を付与する（学習非対象に設定する）こともできる。すなわち、例えば図４（ｂ）に示したような単語系列が出力部１３から表示されているときに、当該単語系列中の「広沢」という語が指定されて候補選択部１１５が起動し、当該語の変換候補の一覧が図５（ａ）に示すように表示される（ステップＳ２０）。この変換候補の一覧は、図５（ａ）に示すように候補順位の高いものから順に選択肢の文字列が並ぶ。各候補文字列の右側のボタンＢ１〜Ｂ４は、該単語を学習非対象に設定することを指示したり、あるいは既に学習非対象に設定されている場合には、学習対象への変更を指示するためのものである。また、各ボタンＢ１〜Ｂ４には、当該単語に既に学習非対象属性情報が付与されている場合には、その旨を表示するインジケータも兼ねている。ただし、第５候補の「ひろさわ」という文字列は無変換候補のためボタンは存在しない。
【００５２】
図５（ａ）に示した一覧において、第３候補の「廣沢」を選択する場合、ボタンＢ３を選択すると（ステップＳ２１）、候補選択の指示と同時に学習非対象属性情報の付与（学習非対象に設定すること）を指示することを意味し、現在表示されている「広沢」という表記は「廣沢」に置き換わるものの、ステップＳ２３をスキップして、「廣沢」という文字列は学習テーブル１１７には記録されない（ステップＳ２４、ステップＳ１８）。この場合、辞書メンテナンス部１１４は、単語辞書１２２には「廣沢」という単語に対応する学習非対象属性情報として「１」が書き込む（図３（ｂ）参照）。
【００５３】
次に、「ひろさわ」の入力に対する変換候補の一覧を表示した場合は（ステップＳ１９、ステップＳ２０）、単語辞書１２２には「廣沢」という単語に対応する学習非対象属性情報に「１」が書き込まれているので、図５（ｂ）に示すように、変換候補一覧中の「廣沢」に対応するボタンＢ３には、学習非対象に設定されていることを示すチェックマークが表示されている。
【００５４】
一方、図５（ａ）に示したような候補一覧が表示されている状態において、第３候補の「廣沢」を選択する場合、ボタンＢ３を除く部分（例えば、「廣沢」という文字列や、当該文字列に付されている候補順位を表す番号「３」）を選択すると（ステップＳ２１）、候補選択の指示のみ（学習非対象属性の付与の指示は含まれない）を意味し、現在表示されている「広沢」という表記は「廣沢」に置き換わり、ステップＳ２３では、「廣沢」という文字列が学習テーブル１１７に記録される（ステップＳ２４、ステップＳ１８）。
【００５５】
また、図５（ｂ）に示した候補一覧が表示されている状態において、第３候補の「廣沢」を選択する場合、ボタンＢ３を除く部分（例えば、「廣沢」という文字列や、当該文字列に付されている候補順位を表す番号「３」）を選択すると（ステップＳ２１）、学習非対象に設定されたままで、候補選択の指示のみを意味し、現在表示されている「広沢」という表記は「廣沢」に置き換わるものの、ステップＳ２３をスキップして、「廣沢」という文字列は学習テーブル１１７には記録されない（ステップＳ２４、ステップＳ１８）。
【００５６】
また、図５（ｂ）に示した候補一覧が表示されている状態において、第３候補の「廣沢」を選択する場合、チェックマークが表示されているボタンＢ３を選択すると（ステップＳ２１）、それは、候補選択の指示と学習非対象属性情報の削除（学習対象に設定すること）を意味し、現在表示されている「広沢」という表記は「廣沢」に置き換わり、ステップＳ２３では、「廣沢」という文字列が学習テーブル１１７に記録される（ステップＳ２４、ステップＳ１８）。すなわち、以後、「廣沢」という文字列は学習対象となる。
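The four selection cases described in paragraphs [0052] to [0056] (button or text, checked or unchecked) reduce to one small rule, sketched below in Python. The function name `handle_selection` and the use of a set and a list for the dictionary flags and the learning table are assumptions for illustration. The sketch does not remove earlier history when a word is flagged from the list, because the text above does not say whether that happens.

```python
def handle_selection(word, clicked_button, non_learning, learned):
    """Apply one selection from the candidate list of FIG. 5.

    clicked_button -- True if the user clicked the button (B1 to B4),
                      False if the user clicked the text or rank number.
    Returns the word that replaces the displayed text (step S24).
    """
    if clicked_button:
        # The button toggles the non-learning-target flag.
        if word in non_learning:
            non_learning.discard(word)
        else:
            non_learning.add(word)
    # Step S23 runs only for learning-target words.
    if word not in non_learning:
        learned.append(word)
    return word


non_learning, learned = set(), []
handle_selection("廣沢", True, non_learning, learned)   # set flag, no learning
after_button = (set(non_learning), list(learned))
handle_selection("廣沢", False, non_learning, learned)  # flag kept, no learning
after_text = (set(non_learning), list(learned))
handle_selection("廣沢", True, non_learning, learned)   # flag cleared, learned
after_uncheck = (set(non_learning), list(learned))
```

In every case the displayed 「広沢」 is replaced by 「廣沢」; only the flag and the learning table differ.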
【００５７】
次に、単語辞書１２２中のある単語に学習非対象属性を付与した場合としなかった場合で、入力音声に対応する文字列（文書作成部１１から出力されて出力部１３から表示される文字列）にどの様な違いが生ずるかを別の具体例を挙げて説明する。
【００５８】
例えば、政治討論の場において、入力音声を音声認識した結果、「せーふ」という文字コードの列が得られ、これに対し候補展開部１１９は、単語辞書１２２を用いて、この単語系列の各単語の同音語を求めた結果、当該単語に対する候補順位が第１位の単語として「セーフ」が出力されたところ（ステップＳ１７〜ステップＳ１８）、次の入力音声から「せーふ」という文字コードが得られたとき、再び「セーフ」が出力されてしまう。しかし、ここは政治討論の場であるから、「せーふ」は「政府」であるはずである。このような場合、最初に、辞書メンテナンス部１１４を起動して、「セーフ」という文字列に対し、学習非対象属性を付与することにより、「セーフ」は出力されず、常に候補順位第２位の「政府」を出力することができる。
【００５９】
以上説明したように、上記実施形態によれば、入力音声に対応する候補文字列の一覧から選択によって文字列を学習テーブル１１７に記録しておき、次に同様な音声が入力したときに学習テーブル１１７に記録した文字列を優先的に出力する機能を、ユーザの意志により単語辞書中の単語単位にオンオフできる。ユーザは、希にしか利用しない単語を候補一覧から選択した場合に、次の文書作成作業中に当該単語が第１候補として出力されために別の頻繁に使用される他の候補を選択し直す必要がなくなる。また、例えば差別用語など予め出力を制限したい単語に、文書作成前に上記学習非対象属性を付与しておくことにより、意図的に該単語を選択したときにのみ表示されるように制御することも可能となる。
【００６０】
単語辞書に格納されている各単語に対して学習非対象とする属性を付与することにより、入力音声から認識された単語に対する複数の候補の一覧から学習非対象の単語が選択された場合でも、次回は別のより使われやすい候補が優先的に表示されるようになり、不必要な選択操作を行うことなく、効率よく文書を作成することができる。更に、文書作成中に、図５に示したような候補の一覧から所望の単語を学習非対象に設定することにより、予め単語辞書に格納されている単語に対し、学習非対象属性情報を付与する作業を省略することができる。
【００６１】
本発明の実施の形態に記載した本発明の手法（例えば、図２のフローチャートに示す処理動作）は、コンピュータに実行させることのできるプログラムとして、磁気ディスク（フレキシブルディスク、ハードディスクなど）、光ディスク（ＣＤ−ＲＯＭ、ＤＶＤなど）、半導体メモリなどの記録媒体に格納して頒布することもできる。
【００６２】
なお、本発明は上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態にわたる構成要素を適宜組み合わせてもよい。
【００６３】
【発明の効果】
以上説明したように、本発明によれば、入力音声を音声認識して文書を作成する際に、入力音声から認識された単語に対応する複数の文字列候補のなかから最適な文字列を効率よく選択することができる。
【図面の簡単な説明】
【図１】本発明の実施形態にかかる文書作成装置の構成例を示した図。
【図２】文書作成装置の処理動作を説明するためのフローチャート。
【図３】単語辞書の一例を示した図。
【図４】図１の文書作成装置の動作を具体的に説明するための図。
【図５】候補一覧の表示例を示した図。
【符号の説明】
１…文書作成装置、１１…文書作成部、１２…入力部、１３…出力部、１１２…入力制御部、１１３…入力バッファ、１１４…辞書メンテナンス部、１１５…候補選択部、１１６…単語学習部、１１７…学習テーブル記憶部、１１８…言語解析部、１１９…候補展開部、１２０…候補バッファ、１２１…解析ルール記憶部、１２２…単語辞書記憶部、１２４…出力制御部、１２５…文書バッファ、１３１…マイク、１３２…音声認識部、１３３…編集指示入力部。

[0001]
[Technical field of the invention]
The present invention relates to a document creation device that recognizes input speech and creates a document.
[0002]
[Prior art]
In a conventional document creation device, the learning function stores in a table a word or phrase selected by the operator, or the words or phrases before and after it. When that word or phrase appears among the candidates in a later input operation, the candidate ranking is adjusted taking into account, for example, the time at which it was stored in the table. In addition, specific parts of speech such as function words, unconverted hiragana candidates, katakana candidates, and the like could be excluded from learning by a setting.
[0003]
In general, the learning function of a document creation device sequentially stores the candidates (words or phrases) that the user selects from the recognition or conversion candidates for input speech or an input character string. When new speech or a new character string is input and the candidates for that input include a stored word or phrase, the most recently learned candidate is output preferentially, which improves recognition or conversion accuracy.
[0004]
However, a recognition candidate group or conversion candidate group for input speech or a character string may contain both commonly used words and rarely used words. If a rarely used word is selected, then the next time the same candidate group is obtained, the conventional learning function gives the learned word priority over the common word, so the user often has to perform candidate selection again.
[0005]
In particular, for kana-kanji conversion, there is a technique that switches among learning methods such as "no learning", "temporary most-recent-use learning", "temporary frequency learning", "permanent most-recent-use learning", and "permanent frequency learning" according to the reading of the input character string to be converted, the notation of the conversion result, the part of speech, and so on, so that the optimal learning method can be selected for the usage situation (see, for example, Patent Document 1).
[0006]
[Patent Document 1]
Japanese Patent Laid-Open No. 9-62668
[0007]
[Problems to be solved by the invention]
When input speech is recognized and a rarely used word is selected from the candidate words for that speech, the conventional learning function preferentially selects or displays the selected word as the candidate the next time the same speech is input. As a result, the user must reselect the optimal word from the list of candidate words for that input speech, and documents cannot be created efficiently.
[0008]
In view of the above problem, an object of the present invention is to provide a document creation device and method that, when creating a document by recognizing input speech, make it possible to efficiently select the optimal character string from among a plurality of character string candidates corresponding to a word recognized from the input speech.
[0009]
[Means for Solving the Problems]
The present invention creates a document by recognizing input speech. It (a) sets each of a plurality of character strings, stored in storage means that stores the character strings and at least the reading corresponding to each of them, as either a learning target or a non-learning target; (b) for each of a plurality of words recognized from the input speech, selects from the stored character strings a plurality of candidate character strings whose reading is the same as or similar to that of the word; (c) creates the document by selecting the character string for each recognized word from the candidate character strings for that word that remain after the non-learning-target character strings are excluded; (d) replaces the character string in the document corresponding to a first word, which is one of the plurality of words, with the candidate character string selected by the user from the candidate character strings for the first word; and (e) of the learning-target and non-learning-target character strings that have each been selected from the candidates as the character string for a word, thereafter preferentially selects only the learning-target character strings from the candidates for that word. As a result, when a document is created by recognizing input speech, the optimal character string can be efficiently selected from among the plurality of character string candidates corresponding to a word recognized from the input speech.
[0010]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0011]
FIG. 1 shows a configuration example of a document creation device 1 according to an embodiment of the present invention. It is broadly composed of a document creation unit 11, an input unit 12, and an output unit 13.
[0012]
The input unit 12 has a microphone 131 for inputting speech such as a user's utterance, a speech recognition unit 132 that recognizes the speech input from the microphone 131 (the input speech) and converts it into character codes, and an editing instruction input unit 133 with which the user inputs various editing instructions to the document creation device 1. The input unit 12 outputs the character codes obtained by recognizing the input speech and control signals representing the editing instructions input by the user (a signal instructing dictionary maintenance, a signal instructing document creation, a signal instructing candidate selection, and so on), and these are input to the document creation unit 11.
[0013]
The document creation unit 11 has an input control unit 112, a dictionary maintenance unit 114, a candidate selection unit 115, a word learning unit 116, a language analysis unit 118, a candidate expansion unit 119, and an output control unit 124. It also has an analysis rule storage unit 121 that stores the analysis rule used when the language analysis unit 118 performs language analysis on the character codes obtained by speech recognition in the input unit 12, a learning table storage unit 117 that stores a learning table, a candidate buffer 120 that temporarily stores recognition candidates, a word dictionary storage unit 122 that stores a word dictionary, and a document buffer 125 that temporarily stores a document.
[0014]
In the following description, a word dictionary stored in the word dictionary storage unit 122 is simply referred to as a word dictionary 122, and a learning table stored in the learning table storage unit 117 is simply referred to as a learning table 117. The analysis rule stored in the analysis rule storage unit 121 is simply referred to as an analysis rule 121.
[0015]
The document creation unit 11 creates a document by selecting the character string for each of the words recognized from the input speech from among the candidate character strings in the word dictionary 122 whose reading is the same as or similar to that of the word, excluding the non-learning-target candidate character strings. The created document is stored in the document buffer 125. The unit also edits the document by replacing a first character string designated by the user in the document with the candidate character string that the user selects from among the candidate character strings corresponding to that first character string.
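As an illustration only, the exclusion step in this paragraph might be sketched in Python as follows. The function name `pick_first_candidate`, the tuple layout, the frequency values, and the fallback used when every candidate is excluded are assumptions made for this sketch, not details from the specification. The 「セーフ」/「政府」 pair is taken from the example in paragraph [0058] of the Japanese text.

```python
def pick_first_candidate(candidates, non_learning):
    """Pick the notation placed in the document for one recognized word.

    candidates   -- list of (notation, frequency) homophones (hypothetical layout)
    non_learning -- set of notations flagged as non-learning targets
    """
    # Exclude non-learning-target strings from the candidates.
    eligible = [c for c in candidates if c[0] not in non_learning]
    # Assumption: if nothing remains, fall back to the full candidate list.
    pool = eligible or candidates
    # Assumption: rank the remaining candidates by dictionary frequency.
    return max(pool, key=lambda c: c[1])[0]


candidates = [("セーフ", 80), ("政府", 60)]  # made-up frequencies
first = pick_first_candidate(candidates, non_learning={"セーフ"})
```

With 「セーフ」 flagged, the sketch returns 「政府」 even though its assumed frequency is lower.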
[0016]
The character codes and control signals output from the input unit 12 are input to the input control unit 112. When a signal instructing maintenance of the word dictionary 122 is input, the input control unit 112 activates the dictionary maintenance unit 114; when a signal instructing candidate selection is input, it activates the candidate selection unit 115.
[0017]
The dictionary maintenance unit 114 stores attribute information in association with each word in the word dictionary 122 in order to distinguish whether the word is a learning target or a non-learning target. Here, for example, only non-learning-target words are stored with "1" as non-learning-target attribute information.
[0018]
In the initial state, all the words in the word dictionary 122 may be set as learning targets (that is, the non-learning-target attribute information "1" is not registered), but words that are generally rarely used may instead be set as non-learning targets in advance (that is, the non-learning-target attribute information "1" is registered for them).
[0019]
When the character code is input from the input unit 12, the input control unit 112 stores it in the input buffer 113 and activates the language analysis unit 118. Here, the character code may be uniquely determined or may have a plurality of candidates.
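The routing performed by the input control unit 112 in paragraphs [0016] and [0019] can be pictured as a simple dispatch. The following Python sketch is hypothetical: the `Input` enum, the `route` function, and the returned strings are illustrative names, and raising an error for other inputs is a choice made here for brevity.

```python
from enum import Enum, auto


class Input(Enum):
    DICTIONARY_MAINTENANCE = auto()  # control signal
    CANDIDATE_SELECTION = auto()     # control signal
    CHARACTER_CODE = auto()          # speech recognition result


def route(kind):
    """Return the unit that the input control unit 112 would activate."""
    if kind is Input.DICTIONARY_MAINTENANCE:
        return "dictionary maintenance unit 114"
    if kind is Input.CANDIDATE_SELECTION:
        return "candidate selection unit 115"
    if kind is Input.CHARACTER_CODE:
        # Character codes are stored in the input buffer 113 first,
        # then language analysis is started.
        return "language analysis unit 118"
    raise ValueError(f"unhandled input: {kind}")
```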
[0020]
When the language analysis unit 118 determines, from the input time interval of the input speech or the number of characters stored in the input buffer 113, that the buffered input is ready for analysis, it refers to the word dictionary 122 and the analysis rule 121, sequentially combines the character codes stored in the input buffer 113 to create a word sequence, and passes the word sequence to the candidate expansion unit 119.
[0021]
In the word dictionary 122, identification information for identifying each word (also called a dictionary number or dictionary ID) and the word's notation, reading, part of speech, frequency of use, and so on are registered in advance for each of a plurality of words. When a word designated by the user is changed from a learning target to a non-learning target, the dictionary maintenance unit 114 writes non-learning-target attribute information into the information for that word; when a word designated by the user is changed from a non-learning target to a learning target, the dictionary maintenance unit 114 deletes the non-learning-target attribute associated with that word.
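One way to picture a dictionary record and the write/delete behaviour of the attribute is the following Python sketch. The field names, the helper functions, and the sample values (including the ID `w0001`) are hypothetical; only the idea that the attribute value "1" is present for non-learning targets and absent otherwise comes from the description.

```python
def make_entry(word_id, reading, notation, part_of_speech, frequency):
    """Build one word-dictionary record (field names are hypothetical)."""
    return {
        "word_id": word_id,
        "reading": reading,
        "notation": notation,
        "part_of_speech": part_of_speech,
        "frequency": frequency,
        # No "non_learning" key: the word is a learning target.
    }


def set_non_learning(entry):
    """Write the attribute value "1", marking a non-learning target."""
    entry["non_learning"] = "1"


def set_learning(entry):
    """Delete the attribute, returning the word to learning-target status."""
    entry.pop("non_learning", None)


def is_non_learning(entry):
    """True if the record carries the non-learning-target attribute."""
    return entry.get("non_learning") == "1"


entry = make_entry("w0001", "ひろさわ", "廣沢", "noun", 5)  # made-up values
set_non_learning(entry)
```

Storing the attribute only for non-learning targets keeps the default case (every word is learnable) free of extra data.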
[0022]
The analysis rule 121 describes whether connection between parts of speech is possible, ease of connection between words, and the like.
[0023]
For each word or phrase in the word sequence output from the language analysis unit 118, the candidate expansion unit 119 acquires homophones from the word dictionary 122 and stores the dictionary numbers, notations, and so on of the acquired homophones in the candidate buffer 120. The candidate expansion unit 119 then assigns candidate ranks to the homophones stored in the candidate buffer 120 for each word in the word sequence, giving the highest rank to the most recently used word stored in the learning table 117. After determining the ranks, it stores the dictionary number, notation, and so on of the first-ranked candidate in the document buffer 125. In the document buffer 125, the first-ranked candidate is linked to its homophone candidates stored in the candidate buffer 120 so that the relationship between them can be traced.
[0024]
The output control unit 124 displays each character string of the document stored in the document buffer 125 on the output unit 13, and also displays on the output unit 13 the list of homophone candidates, stored in the candidate buffer 120, that are linked to a word in that character string.
[0025]
Meanwhile, when the control signal input from the input unit 12 instructs next-candidate selection for a word in the character string displayed on the output unit 13, the input control unit 112 activates the candidate selection unit 115. The candidate selection unit 115 displays on the output unit 13 the list of homophones linked to that word, taken from the candidate buffer 120.
[0026]
When the user selects a desired word from the list, the candidate selection unit 115 refers to the word dictionary 122 and checks whether the selected word carries attribute information indicating that it is a non-learning-target word. Only when the attribute information is absent (that is, the word is not a non-learning-target word) does it notify the word learning unit 116 that the selected word has been confirmed.
[0027]
On receiving the notification, the word learning unit 116 stores in the learning table 117, as candidate selection history information, information on the confirmed word (a learning-target word) together with information on the words immediately before and after it.
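The check-then-record behaviour of paragraphs [0026] and [0027] might look like the Python sketch below. The function name, the record layout, and the word IDs are assumptions for illustration; the source only states that information on the confirmed word and its neighbouring words is stored as history.

```python
def on_candidate_selected(selected_id, prev_id, next_id,
                          non_learning_ids, learning_table):
    """Record a confirmed selection unless the word is a non-learning target.

    Returns True if a history record was written to the learning table.
    """
    if selected_id in non_learning_ids:
        # Non-learning target: the word learning unit is not notified.
        return False
    learning_table.append(
        {"word_id": selected_id, "prev_id": prev_id, "next_id": next_id}
    )
    return True


learning_table = []
non_learning_ids = {"w2"}  # hypothetical word IDs
recorded_w1 = on_candidate_selected("w1", "w0", "w9", non_learning_ids, learning_table)
recorded_w2 = on_candidate_selected("w2", "w0", "w9", non_learning_ids, learning_table)
```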
[0028]
The candidate selection unit 115 also retrieves the selected word from the candidate buffer 120, replaces with it the word in the document buffer 125 whose candidate list is currently displayed, and updates the screen display. In other words, the candidate selection unit 115 edits the document in the document buffer 125.
[0029]
FIG. 2 is a flowchart for explaining the processing operation of the document creation device 1 of FIG. 1. It shows the processing for changing a learning-target word registered in the word dictionary 122 to a non-learning target, or a non-learning-target word to a learning target, and for creating a document according to whether each word in the word dictionary 122 has non-learning-target attribute information.
[0030]
After a signal instructing maintenance of the word dictionary is input from the input unit 12 to the input control unit 112 (step S1), and the user inputs information designating the words to be processed (step S2), the dictionary maintenance unit 114 acquires the words that match the input information from the word dictionary 122 (step S3). The information designating a word may be all or part of the word's notation, reading, part of speech, and so on. In step S3 a list of the acquired words is displayed, and from this list the user selects words to be newly set as non-learning targets, or words already set as non-learning targets that are to become learning targets (step S4).
[0031]
For a (learning-target) word selected to become a non-learning target, the dictionary maintenance unit 114 temporarily stores the word in a temporary file so that attribute information indicating the non-learning-target attribute can be assigned to it (step S5), and notifies the word learning unit 116 of the selected word so that it is deleted from the learning table 117. If the notified word is stored in the learning table 117, the word learning unit 116 deletes it from the learning table 117 (step S6). A (non-learning-target) word selected to become a learning target is temporarily stored in the temporary file so that its non-learning-target attribute information can be deleted (step S7).
[0032]
Steps S2 to S8 are repeated until the work of assigning or deleting non-learning-target attribute information for the words registered in the word dictionary 122 is finished. When maintenance ends, the dictionary maintenance unit 114 updates the word dictionary 122 based on the contents recorded in the temporary file (step S9). That is, non-learning-target attribute information is recorded in the word dictionary 122 for the words selected to become non-learning targets, and is deleted from the word dictionary 122 for the words selected to become learning targets.
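The staged update in steps S5 to S9 can be sketched in Python as follows. This is a simplified model, not the device's implementation: an in-memory dict named `pending` stands in for the temporary file, the learning table is reduced to a list of word IDs, and all names and IDs are hypothetical.

```python
def stage_change(pending, learning_table, word_id, make_non_learning):
    """Steps S5 to S7: stage one change; `pending` stands in for the temporary file."""
    pending[word_id] = make_non_learning
    if make_non_learning:
        # Step S6: remove any selection history for the word right away.
        learning_table[:] = [w for w in learning_table if w != word_id]


def commit(pending, non_learning_ids):
    """Step S9: apply all staged changes to the dictionary flags."""
    for word_id, make_non_learning in pending.items():
        if make_non_learning:
            non_learning_ids.add(word_id)
        else:
            non_learning_ids.discard(word_id)
    pending.clear()


non_learning_ids = {"w3"}            # hypothetical word IDs
learning_table = ["w1", "w2", "w1"]  # selection history, oldest first
pending = {}
stage_change(pending, learning_table, "w1", make_non_learning=True)
stage_change(pending, learning_table, "w3", make_non_learning=False)
flags_before_commit = set(non_learning_ids)
commit(pending, non_learning_ids)
```

Note the asymmetry that the description implies: the history is deleted as soon as a word is staged (step S6), while the dictionary flags change only at the end of maintenance (step S9).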
[0033]
If, in step S1, a signal instructing document creation is input from the input unit 12 and character codes resulting from speech recognition of the input speech are then input (step S10), the input control unit 112 sequentially stores the input character codes in the input buffer 113 (step S11).
[0034]
The language analysis unit 118 monitors the character codes stored in the input buffer 113, and when it determines from the input time interval, the number of characters, and so on that they are ready for analysis (step S12), it performs language analysis with reference to the word dictionary 122 and the analysis rule 121 (step S13).
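The decision in step S12 could be modelled as a simple predicate, as in the Python sketch below. The specification names only the criteria (input time interval, number of characters); the function name, the threshold values, and the way the two criteria are combined are assumptions chosen purely for illustration.

```python
def ready_for_analysis(buffered_chars, seconds_since_last_input,
                       max_chars=40, pause_seconds=0.8):
    """Decide whether the buffered character codes should be analyzed (step S12).

    max_chars and pause_seconds are hypothetical thresholds.
    """
    if buffered_chars == 0:
        return False  # nothing to analyze
    # Analyze when the buffer is long enough or the speaker has paused.
    return buffered_chars >= max_chars or seconds_since_last_input >= pause_seconds
```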
[0035]
Here, language analysis means combining the character codes to be analyzed in time-series order and creating, from the information in the word dictionary 122 and the analysis rule 121, the word sequence that is linguistically most plausible. The language analysis unit 118 stores the resulting word sequence in the candidate buffer 120 via the candidate expansion unit 119 (step S14).
[0036]
For each word stored in the candidate buffer 120, the candidate expansion unit 119 obtains from the word dictionary 122 the homophones of the word (words having the same or a similar reading) that are conversion candidates for the word, and adds them to the candidate buffer 120 (step S15). The candidate expansion unit 119 also checks whether each word stored in the candidate buffer 120 is recorded in the learning table 117.
[0037]
The learning table 117 records history information about recently selected words (for example, the word IDs of those words). If any of the words stored in the candidate buffer 120 is recorded as history in the learning table 117, the candidate ranks of the words stored in the candidate buffer 120 are determined so that the word corresponding to the most recent history recorded in the learning table 117 ranks highest (step S16). The candidate expansion unit 119 then stores in the document buffer 125 a word sequence obtained by replacing each word of the word sequence produced by the language analysis unit 118 with the word whose candidate rank is first (step S17).
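The ranking of step S16 can be sketched as follows. This is a hedged illustration, not the embodiment itself: the function name rank_candidates is hypothetical, words are identified by notation rather than word ID, and the rule that every learned word outranks every unlearned word (with dictionary frequency deciding among unlearned words) is an assumption.

```python
def rank_candidates(candidates, frequency, history):
    """Order homophone candidates: most recently learned first, then by frequency.

    candidates: list of notations sharing a reading
    frequency:  dict notation -> usage frequency from the dictionary
    history:    list of learned notations, oldest first
    """
    # Position of the latest occurrence of each word in the history.
    last_seen = {word: i for i, word in enumerate(history)}

    def key(word):
        # Learned words sort ahead of unlearned ones; later history wins.
        return (last_seen.get(word, -1), frequency.get(word, 0))

    return sorted(candidates, key=key, reverse=True)
```

With an empty history the function falls back to the frequency order alone, which corresponds to the initial state in which no learning has been performed.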
[0038]
The output control unit 124 displays on the output unit 13 the first-candidate word sequence stored in the document buffer 125 (step S18). The user checks the first candidate displayed on the screen and, if it differs from the intended notation, selects the next candidate.
[0039]
Next-candidate selection is started when the user indicates, through the editing instruction input unit 123 of the input unit 12 (for example, by a mouse click), the word for which a candidate is to be selected (step S19). The candidate selection unit 115 displays on the output unit 13 a list of homophones of the designated word, that is, the candidate words stored in the candidate buffer 120 and linked to the word the user designated in the character string displayed on the output unit 13 (step S20).
[0040]
When the user selects a desired candidate from the list of candidates (step S21), the candidate selection unit 115 refers to the word dictionary 122 to check whether learning non-target attribute information is given to the selected word. If it is not, the candidate selection unit 115 notifies the word learning unit 116 of information on the selected word (such as the word ID of the selected word and the word IDs of the words before and after it) (step S22).
[0041]
The word learning unit 116 records information on the selected word in the learning table 117 as candidate selection history information (step S23). Further, the candidate selection unit 115 rewrites the word before selection stored in the document buffer 125 with the selected word (step S24), and updates the screen display (step S18). That is, the result of replacement with the word selected by the next candidate selection operation is displayed on the screen.
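Steps S22 to S24 can be summarized in the following Python sketch. It is a simplified illustration under stated assumptions: the function name select_candidate and the record layout are hypothetical, and notations are used in place of the word IDs mentioned in the text.

```python
def select_candidate(word_index, chosen, document, dictionary, learning_table):
    """Apply a next-candidate selection to the document buffer.

    document:       list of notations, one per word (document buffer 125)
    dictionary:     notation -> {"non_target": bool} (word dictionary 122)
    learning_table: list of history records (learning table 117)
    """
    # Step S22: only learning targets are reported to the learner.
    if not dictionary[chosen]["non_target"]:
        before = document[word_index - 1] if word_index > 0 else None
        after = document[word_index + 1] if word_index + 1 < len(document) else None
        # Step S23: record the selection together with its neighbouring words.
        learning_table.append({"word": chosen, "before": before, "after": after})
    # Step S24: the replacement happens regardless of the attribute.
    document[word_index] = chosen
    return document
```

The point of the sketch is that the attribute check gates only the history record; the replacement in the document buffer is unconditional.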
[0042]
Note that when a control signal input from the input unit 12 is not a control signal instructing dictionary maintenance, document creation, or candidate selection, the input control unit 112 performs the processing instructed by that control signal and then waits for input (step S25).
[0043]
FIG. 3 shows an example of the word dictionary 122. For each word, the word dictionary 122 stores a word ID that serves as the key when searching for the word, together with information needed for language analysis by the language analysis unit 118, such as the reading, notation, part of speech, and frequency. The items stored may be increased or decreased as appropriate for the type of input. Attribute information indicating whether the word is a learning non-target is also stored in association with each word.
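One possible in-memory shape for an entry of the word dictionary 122 is sketched below. The field names, the DictionaryEntry class, and the homophones helper are assumptions made for illustration; the sketch also matches readings exactly, whereas the embodiment allows similar readings as well.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    word_id: str              # search key
    reading: str              # pronunciation used to find homophones
    notation: str             # written form
    part_of_speech: str
    frequency: int            # usage frequency for the initial ranking
    non_target: bool = False  # True corresponds to the attribute value "1"


def homophones(entries, reading):
    """Return the entries with the given reading, most frequent first."""
    same = [e for e in entries if e.reading == reading]
    return sorted(same, key=lambda e: e.frequency, reverse=True)
```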
[0044]
Suppose that the user performs the processing from step S1 to step S9 in FIG. 2 and, for example, sets the words with the word IDs “xx08” and “xx0a” among the words in FIG. 3 as learning non-targets. The dictionary maintenance unit 114 then writes “1” in the word dictionary 122 as the learning non-target attribute information corresponding to the words having the word IDs “xx08” and “xx0a” (see FIG. 3B).
[0045]
In step S22 of FIG. 2, the candidate selection unit 115 refers to the word dictionary 122. If, as shown in FIG. 3B, “1” is written as the learning non-target attribute information of the word selected as the next candidate, the selected word is not recorded in the learning table 117. That is, the selected word is not learned.
[0046]
Next, the character string corresponding to the input speech (the character string output from the document creation unit 11 and displayed on the output unit 13), depending on whether or not a learning non-target attribute is given to a certain word in the word dictionary 122, will be described with a specific example.
[0047]
As a result of speech recognition of the speech input from the input unit 12, a character code string such as “Meet Hirosawa” is obtained, as shown in FIG. 4A, and stored in the input buffer 113 (step S1 and steps S10 to S11 in FIG. 2). As a result of the language analysis by the language analysis unit 118, a word sequence such as “Hirosawa / san / ni / o / ai / shi / masu” is output. In FIG. 4, “/” indicates a break between words. This word sequence is stored in the candidate buffer 120 (steps S12 to S14). The candidate expansion unit 119 uses the word dictionary 122 to obtain homophones for each word in this word sequence. In the initial state (a state in which no learning has been performed), the homophones are ranked based on the frequency of use of each word stored in advance in the word dictionary 122 (steps S15 and S16), and, as shown in FIG. 4B, a word sequence consisting of the words with the highest candidate rank for each word, such as “Hirosawa / san / ni / o / meet / shi / masu”, is obtained (steps S17 to S18). When display of the next candidates for “Hirosawa” is requested (step S19), a list of conversion candidates as shown in FIG. 4C is obtained (step S20). From this list, the second candidate “Serizawa” is selected (step S21).
[0048]
When the word “Serizawa” is a learning target, it is recorded in the learning table 117 (steps S22 and S23). At this time, the output control unit 124 replaces “Hirosawa” with the selected candidate “Serizawa” and displays the result as shown in FIG. 4D (steps S24 and S18).
[0049]
After that, when there is a voice input such as “There was a lotus from Hirosawa” from the input unit 12, as shown in FIG. 4E, it is stored in the input buffer 113 in the same manner as described above. As a result of the language analysis by the language analysis unit 118, a word sequence such as “Hirosawa / san / kara / renraku / ga / arashi / masashi / ta” is output, and this word sequence is stored in the candidate buffer 120 (steps S12 to S14). The candidate expansion unit 119 uses the word dictionary 122 to obtain homophones for each word in this word sequence. At that time, since “Serizawa”, which was previously selected as the next candidate, is recorded in the learning table 117, the candidate rank of this word is the highest among the homophones corresponding to “Hirosawa”, and, as shown in FIG. 4F, a word sequence consisting of the words having the first candidate rank for each word, such as “Serizawa / san / kara / contact / gai / yes / masashi / ta”, is obtained (steps S16 to S18).
[0050]
On the other hand, when the word “Serizawa” is a learning non-target (for example, when the value of its learning non-target attribute information is “1”), step S23 is skipped even if the word is selected from the candidate list in step S21, and no history is recorded in the learning table 117. Therefore, in step S16, “Hirosawa”, which is the most frequently used among the homophones corresponding to “Hirosawa”, has the highest candidate rank, and a word sequence such as “/ from / contact / ga // yes / better / ta” is obtained, as shown in FIG. 4 (steps S16 to S18).
[0051]
In the document creation device 1 of the above embodiment, the dictionary maintenance unit 114 may be activated at the same time as next-candidate selection in step S19 and subsequent steps, so that when the list of conversion candidates corresponding to the word designated by the user in the character string displayed on the output unit 13 is displayed (step S20), a learning non-target attribute can be given to a word in the list (the word can be set as a learning non-target). For example, when a word sequence as shown in FIG. 4B is displayed on the output unit 13 and the word “Hirosawa” in the word sequence is designated, the candidate selection unit 115 is activated and a list of conversion candidates for the word is displayed as shown in FIG. 5A (step S20). In the list of conversion candidates, as shown in FIG. 5A, the candidate character strings are arranged in descending order of candidate rank. The buttons B1 to B4 to the right of the candidate character strings are used to instruct that the word be set as a learning non-target or, if it is already set as a learning non-target, that it be changed to a learning target. Each of the buttons B1 to B4 also serves as an indicator showing that learning non-target attribute information has already been given to the word. The fifth candidate, the character string “Hirosawa”, is a non-conversion candidate and therefore has no button.
[0052]
When the third candidate “Serizawa” is selected in the list shown in FIG. 5A by selecting the button B3 (step S21), the selection means both a candidate selection instruction and an instruction to give learning non-target attribute information (to set the word as a learning non-target). The currently displayed notation “Hirosawa” is replaced with “Serizawa”, but step S23 is skipped and the character string “Serizawa” is not recorded in the learning table 117 (step S24, step S18). In this case, the dictionary maintenance unit 114 writes “1” in the word dictionary 122 as the learning non-target attribute information corresponding to the word “Serizawa” (see FIG. 3B).
[0053]
The next time a list of conversion candidates for the input “Hirosawa” is displayed (steps S19 and S20), “1” has been written in the learning non-target attribute information corresponding to the word “Serizawa” in the word dictionary 122. Therefore, as shown in FIG. 5B, a check mark indicating that the word is set as a learning non-target is displayed on the button B3 corresponding to “Serizawa” in the conversion candidate list.
[0054]
On the other hand, when the third candidate “Serizawa” is selected while the candidate list shown in FIG. 5A is displayed by selecting a portion other than the button B3 (for example, the character string “Serizawa” or the number “3” representing the candidate rank attached to the character string) (step S21), the selection means only a candidate selection instruction (it does not include an instruction to give the learning non-target attribute). The currently displayed notation “Hirosawa” is replaced with “Serizawa”, and in step S23 the character string “Serizawa” is recorded in the learning table 117 (steps S24 and S18).
[0055]
Further, when the third candidate “Serizawa” is selected while the candidate list shown in FIG. 5B is displayed by selecting a portion other than the button B3 (for example, the character string “Serizawa” or the number “3” representing the candidate rank attached to the character string) (step S21), the selection means only a candidate selection instruction, and the word remains set as a learning non-target. The currently displayed notation “Hirosawa” is replaced with “Serizawa”, but step S23 is skipped and the character string “Serizawa” is not recorded in the learning table 117 (step S24, step S18).
[0056]
Further, when the third candidate “Serizawa” is selected while the candidate list shown in FIG. 5B is displayed by selecting the button B3 on which the check mark is displayed (step S21), the selection means both candidate selection and deletion of the learning non-target attribute information (setting the word as a learning target). The currently displayed notation “Hirosawa” is replaced with “Serizawa”, and in step S23 the character string “Serizawa” is recorded in the learning table 117 (step S24, step S18). That is, the character string “Serizawa” is a learning target from then on.
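The four selection cases described for the candidate lists of FIG. 5A and FIG. 5B reduce to a small decision rule, sketched here in Python. The function name handle_list_click and its boolean interface are hypothetical and serve only to make the case analysis explicit.

```python
def handle_list_click(is_non_target, clicked_button):
    """Return (new_is_non_target, record_in_learning_table).

    The selected candidate replaces the displayed word in every case.
    """
    # Clicking the button toggles the attribute; clicking the text keeps it.
    new_is_non_target = (not is_non_target) if clicked_button else is_non_target
    # Step S23 runs only when the word ends up as a learning target.
    return new_is_non_target, not new_is_non_target
```

Read this way, whether step S23 runs depends only on the attribute value after the click, not on which part of the list entry was clicked.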
[0057]
Next, the character string corresponding to the input speech (the character string output from the document creation unit 11 and displayed on the output unit 13), depending on whether or not a learning non-target attribute is given to a certain word in the word dictionary 122, will be described with another specific example.
[0058]
For example, suppose that a character code string “Sefu” is obtained as a result of speech recognition of speech input in a political discussion. The candidate expansion unit 119 uses the word dictionary 122 to obtain the homophones of each word in this word sequence, and “safe” is output as the word having the first candidate rank (steps S17 to S18). When the character code “Sefu” is obtained from the next input speech, “safe” is output again. However, since this is a political debate, “Sefu” should be “government”. In such a case, the dictionary maintenance unit 114 is first activated and the learning non-target attribute is given to the character string “safe”, so that “safe” is no longer output and “government”, whose candidate rank is second, can always be output.
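The behaviour in this example, where a learning non-target word is passed over when the first candidate is output, can be sketched as below. The function name first_candidate is hypothetical, and treating the attribute as a filter on the ranked list is an interpretation of this example rather than a statement of how the embodiment is implemented.

```python
def first_candidate(ranked, non_target):
    """Return the highest-ranked candidate that is not a learning non-target.

    ranked:     candidate notations in descending order of candidate rank
    non_target: set of notations carrying the learning non-target attribute
    """
    for word in ranked:
        if word not in non_target:
            return word
    # Every candidate is a learning non-target.
    return None
```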
[0059]
As described above, according to the above embodiment, the function of recording a character string in the learning table 117 when it is selected from the list of candidate character strings corresponding to the input speech, and of preferentially outputting the character string recorded in the learning table 117 the next time similar speech is input, can be turned on and off for each word in the word dictionary at the user's discretion. As a result, after selecting a rarely used word from the candidate list, the user does not have to reselect another, frequently used candidate during the next document creation operation because the rarely used word was output as the first candidate. In addition, by giving the learning non-target attribute in advance to a word whose output should be restricted, such as a discriminatory term, it is also possible to exercise control so that the word is displayed only when it is intentionally selected.
[0060]
Because an attribute that makes a word a learning non-target can be given to each word stored in the word dictionary, even when a learning non-target word is selected from the list of candidates for a word recognized from the input speech, another, more commonly used candidate is preferentially displayed the next time, and a document can be created efficiently without unnecessary selection operations. Furthermore, because a desired word can be set as a learning non-target from the candidate list shown in FIG. 5 during document creation, the work of giving learning non-target attribute information in advance to words stored in the word dictionary can be omitted.
[0061]
The method of the present invention described in the embodiment (for example, the processing operation shown in the flowchart of FIG. 2) can also be stored, as a program executable by a computer, on a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and distributed.
[0062]
Note that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage. In addition, various inventions can be formed by appropriately combining a plurality of components disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, constituent elements over different embodiments may be appropriately combined.
[0063]
[Effect of the invention]
As described above, according to the present invention, when a document is created by recognizing input speech, an optimum character string can be selected efficiently from among a plurality of character string candidates corresponding to words recognized from the input speech.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a document creation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining a processing operation of the document creation apparatus.
FIG. 3 is a diagram showing an example of a word dictionary.
FIG. 4 is a diagram for specifically explaining the operation of the document creation apparatus in FIG. 1.
FIG. 5 is a view showing a display example of a candidate list.
[Explanation of symbols]
1 ... Document creation apparatus, 11 ... Document creation unit, 12 ... Input unit, 13 ... Output unit, 112 ... Input control unit, 113 ... Input buffer, 114 ... Dictionary maintenance unit, 115 ... Candidate selection unit, 116 ... Word learning unit, 117 ... Learning table storage unit, 118 ... Language analysis unit, 119 ... Candidate expansion unit, 120 ... Candidate buffer, 121 ... Analysis rule storage unit, 122 ... Word dictionary storage unit, 124 ... Output control unit, 125 ... Document buffer, 131 ... Microphone, 132 ... Voice recognition unit, 133 ... Editing instruction input unit.

Claims

A document creation apparatus that recognizes input speech and creates a document, comprising:
storage means for storing a plurality of character strings and at least a reading corresponding to each of the plurality of character strings;
means for setting each of the plurality of character strings stored in the storage means as either a learning target or a learning non-target;
means for selecting, for each of a plurality of words recognized from the input speech, a plurality of candidate character strings having the same or a similar reading as the word from among the plurality of character strings stored in the storage means;
creating means for creating the document by selecting a character string corresponding to each of the plurality of words recognized from the input speech from the candidate character strings that remain after the learning non-target character strings are excluded from the plurality of candidate character strings corresponding to each word;
editing means for editing the document by replacing the character string corresponding to a first word, which is one of the plurality of words, in the document with a candidate character string selected by the user from among the plurality of candidate character strings corresponding to the first word; and
means for preferentially selecting, from among the plurality of candidate character strings corresponding to each word, only the learning target character strings among the learning target character strings and the learning non-target character strings selected by the creating means and the editing means as character strings corresponding to the plurality of words.

The document creation apparatus according to claim 1, further comprising:
display means for displaying a list of the plurality of candidate character strings corresponding to the first word when the character string in the document corresponding to the first word is designated by the user; and
means for setting a learning target candidate character string among the plurality of candidate character strings displayed in the list as the learning non-target.

The document creation apparatus according to claim 2, further comprising means for setting, as the learning target, a learning non-target candidate character string among the plurality of candidate character strings displayed in the list.

The document creation apparatus according to claim 1, wherein the storage means stores attribute information representing the learning non-target in association with each learning non-target character string among the plurality of character strings.

A document creation method for recognizing input speech to create a document, comprising:
a first step of setting each of a plurality of character strings, stored in storage means that stores the plurality of character strings and at least a reading corresponding to each of the plurality of character strings, as either a learning target or a learning non-target;
a second step of selecting, for each of a plurality of words recognized from the input speech, a plurality of candidate character strings having the same or a similar reading as the word from among the plurality of character strings stored in the storage means;
a third step of creating the document by selecting a character string corresponding to each of the plurality of words recognized from the input speech from the candidate character strings that remain after the learning non-target character strings are excluded from the plurality of candidate character strings corresponding to each word;
a fourth step of replacing the character string corresponding to a first word, which is one of the plurality of words, in the document with a candidate character string selected by the user from among the plurality of candidate character strings corresponding to the first word; and
a fifth step of preferentially selecting, from among the plurality of candidate character strings corresponding to each word, only the learning target character strings among the learning target character strings and the learning non-target character strings selected in the third step and the fourth step as character strings corresponding to the plurality of words.

The document creation method according to claim 5, further comprising a sixth step of setting, as the learning non-target, a learning target candidate character string among the plurality of candidate character strings corresponding to the first word.

The document creation method according to claim 5, further comprising a seventh step of setting, as the learning target, a learning non-target candidate character string among the plurality of candidate character strings corresponding to the first word.

A program for recognizing input speech to create a document, the program causing a computer to execute:
a first step of setting each of a plurality of character strings, stored in storage means that stores the plurality of character strings and at least a reading corresponding to each of the plurality of character strings, as either a learning target or a learning non-target;
a second step of selecting, for each of a plurality of words recognized from the input speech, a plurality of candidate character strings having the same or a similar reading as the word from among the plurality of character strings stored in the storage means;
a third step of creating the document by selecting a character string corresponding to each of the plurality of words recognized from the input speech from the candidate character strings that remain after the learning non-target character strings are excluded from the plurality of candidate character strings corresponding to each word;
a fourth step of replacing the character string corresponding to a first word, which is one of the plurality of words, in the document with a candidate character string selected by the user from among the plurality of candidate character strings corresponding to the first word; and
a fifth step of preferentially selecting, from among the plurality of candidate character strings corresponding to each word, the learning target character strings selected in the third step and the fourth step as character strings corresponding to the plurality of words.