JP2005215726A

JP2005215726A - Information presenting system for speaker, and program

Info

Publication number: JP2005215726A
Application number: JP2004017948A
Authority: JP
Inventors: Takeshi Moriwaki (森脇 健); Toshihiro Shiren (枝連 俊宏)
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2004-01-27
Filing date: 2004-01-27
Publication date: 2005-08-11

Abstract

PROBLEM TO BE SOLVED: To enable an information processing device to extract predetermined words from a person's speech and, using speech recognition, to present information to the person who is speaking.

SOLUTION: The information presentation system comprises a voice input means 7 that inputs the speaker's uttered speech, an action data storage means 10 that stores conditions corresponding to predetermined keywords in association with related action information, and an information presentation processing means 9 that presents information using speech recognition. The information presentation processing means 9 performs speech recognition on the uttered speech, detects keywords stored in the action data storage means 10 in the recognition result, determines whether the detected keywords satisfy the conditions, extracts the corresponding action information from the action data storage means 10 when the conditions are satisfied, and executes processing based on the extracted action information.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an information presentation system and program using speech recognition, in which a person's natural speech is input and recognized, an information processing device extracts predetermined words from that speech, and information is presented to the person who is speaking.

Japanese Patent Laid-Open No. 2003-208439 (hereinafter, the "conventional invention") describes an invention that recognizes natural human speech, extracts keywords from it, and provides the speaker with information corresponding to those keywords.
The conventional invention is intended for use in a setting such as a call center, where operators respond to inquiries that users make by telephone. Words that operators are expected to use when answering are extracted in advance, and each word is stored in the operator terminal in association with information that can serve as one or more responses when that word appears in the conversation between the user and the operator. Software installed on the operator terminal performs speech recognition on the conversation between the user and the operator, matches the recognition result against the pre-extracted words, extracts candidate response information according to the degree of matching, and displays it on the screen of the operator terminal. The operator can relay the displayed information to the user as is, or obtain the appropriate information by selecting one of the displayed items. With this conventional invention, operators can answer users' questions quickly and efficiently.
JP 2003-208439 A

The basic concept of the present invention is shared with the conventional invention described above. Building on it, the object of the present invention is to improve the information presentation service for users by enriching the quality of the information provided to speakers, ensuring that information is presented to speakers promptly, and improving speech recognition accuracy.

A first invention is an information presentation system for a speaker, comprising: a voice input means that inputs the speaker's uttered speech; an action data storage means that stores keyword/action correspondence data in which conditions corresponding to predetermined keywords are associated with related action information; and an information presentation processing means that presents information using speech recognition. The information presentation processing means performs speech recognition on the speaker's uttered speech, detects, in the result of that processing, keywords stored in the action data storage means, determines whether the detected keywords satisfy the conditions, extracts the corresponding action information from the action data storage means when the conditions are satisfied, and executes processing based on the extracted action information.

When a predetermined keyword is detected in the speaker's speech, the system takes an action that presents information to the speaker. The action may be displaying the contents of a loaded file, executing a program, or browsing an Internet site. The information that identifies the file or site on which the action is based is the "action information".
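As a rough illustration of the flow in the first invention (recognize, detect keywords, test conditions, extract action information), the following Python sketch assumes the recognizer has already produced text, detects keywords by plain substring search, and models each condition as a predicate over the set of detected keywords. The names `ActionEntry` and `present_information` and the `example.invalid` URL are hypothetical, not taken from the patent; executing the extracted action is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ActionEntry:
    """One row of hypothetical keyword/action correspondence data."""
    condition: Callable[[Set[str]], bool]  # predicate over the detected keywords
    action_info: str                       # file identifier or URL


def present_information(text: str, keywords: List[str],
                        entries: List[ActionEntry]) -> List[str]:
    """Return the action information of every entry whose condition holds.

    `text` stands in for the speech recognition result; a real system
    would obtain it from a recognizer.
    """
    # Keyword detection: a keyword is "valid" if it occurs in the text.
    detected = {kw for kw in keywords if kw in text}
    # Condition check, then extraction of the associated action information.
    return [e.action_info for e in entries if e.condition(detected)]


if __name__ == "__main__":
    entries = [
        ActionEntry(lambda d: "new microwave oven" in d,
                    r"c:\product\new_microwave.jpg"),
        ActionEntry(lambda d: "refrigerator" in d,
                    "http://example.invalid/refrigerator"),  # hypothetical URL
    ]
    text = "you are asking about the price of the new microwave oven"
    print(present_information(text, ["new microwave oven", "refrigerator"], entries))
```

Keeping the condition as a callable leaves the condition language open; a structured and/or/not representation would be an equally valid choice.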

A second invention is the first invention further comprising an output means that the speaker can view, wherein the processing based on the extracted action information causes the output means to display information.

A third invention is the first or second invention wherein the information presentation processing means converts the input speech into text by a method using dictation processing and detects, in this text, the keywords stored in the action data storage means.

A fourth invention is the third invention further comprising a user dictionary storage means that stores user words, each registered with its written form and its reading, wherein, when the text obtained by the dictation processing contains a user word stored in the user dictionary storage means, information indicating the head position of that user word in the text is output, and the information presentation processing means searches for keywords only from the head positions of user words.

A "user word" is a word that serves as the basis for detecting a keyword registered in the action data storage means. A keyword whose leading part, at least, is not registered as a user word is not detected, because there is no indication of the position in the text from which to search for it.

A fifth invention is the fourth invention further comprising a user word registration processing means that stores each keyword to be stored in the action data storage means as a user word in the user dictionary storage means.

A sixth invention is any one of the first to fifth inventions wherein, if the action information associated in the action data storage means with a keyword condition is information identifying a file's type and name, the information presentation processing means opens that file and performs processing appropriate to its type, and if the action information is the URL of a site on the Internet, the information presentation processing means requests the Web page of that site.

A seventh invention is the sixth invention wherein, in the processing that opens the file and acts according to its type, when the file is a program file, the speech recognition result is passed to the program as an argument.

An eighth invention is any one of the first to seventh inventions wherein the voice input means inputs the speech of a conversation between a questioner and a responder who handles questions received over a telephone line, and passes the speech of that conversation to the information presentation processing means.

A ninth invention is the eighth invention wherein the voice input means comprises a dedicated microphone that picks up only the speech of the responder handling questions received over the telephone line, and the speech input to this microphone becomes input data of the information presentation processing means without passing through the telephone line.

A tenth invention is a computer program that, when read and executed by a computer, causes the computer to operate as an information presentation system that presents information corresponding to keywords detected by speech recognition in a speaker's uttered speech. The program causes the computer to function as: a means for registering, in an action data storage means, action data in which conditions corresponding to predetermined keywords are associated with related file information; a means for performing speech recognition on the input speech of the speaker; a means for detecting, in the speech recognition result, keywords stored in the action data storage means, determining whether the detected keywords satisfy the conditions, extracting the corresponding file information from the action data storage means when the conditions are satisfied, and executing processing based on the extracted file information; and a means for outputting the result of that execution.

According to the present invention, the system recognizes continuous speech and can present the speaker with information suited to the detected keywords. If the user of the system is, for example, a call center responder whose job is answering telephone inquiries, the system presents information in a timely manner as the conversation progresses, so the responder can answer the other party quickly and accurately.

The present invention is also characterized in that it handles not only a single keyword but also cases in which several keywords appear. It evaluates a logical value from the logical relationships (and, or, not) among multiple keywords. This amounts to extracting the sentence itself rather than a mere string of words, so higher-quality information can be provided.

Furthermore, according to the present invention, keyword candidates are registered as user words, and keywords are searched for only from the head positions of user words in the text produced by dictation processing, so keywords can be detected quickly. As a result, information can be presented in step with the speaker's natural speech. Moreover, because a keyword and the corresponding user word are registered in a single operation, the burden of data entry is reduced and inconsistencies between the two sets of data are prevented.

Furthermore, according to the present invention, the speaker's voice is picked up by a dedicated microphone, so sound quality is not degraded and highly accurate speech recognition is possible.

An embodiment in which the information presentation system of the present invention is applied to a call center staffed with responders who handle telephone inquiries from customers is described below.
FIG. 1 is a diagram showing an example configuration of the entire system of this embodiment.
Each responder works at a location where an information processing apparatus 1 and a telephone 2 are installed. The responder's telephone 2 is connected to the customer's telephone 3 through the public telephone network N1, providing two-way communication. The responder's information processing apparatus 1 is connected through a communication line N2 to the information processing apparatus 4 of the manager who supervises the responders. The information processing apparatus 1 can also connect through the Internet N3 to a homepage server (hereinafter "HP server") 5.

FIG. 2 is a block diagram showing the configuration of the responder-side information processing apparatus 1.
The information processing apparatus 1 includes a computer main body 6, a voice input means 7, and an output means 8.

The computer main body 6 includes an information presentation processing means 9, an action data storage means 10, a user dictionary storage means 11, a file storage means 12, and a user word registration processing means 13. The computer main body 6 also has a communication interface (not shown) through which it connects to the voice input means 7 and the output means 8 and, over a LAN, the Internet, or the like, to the manager-side information processing apparatus 4, the HP server 5, and so on.

The voice input means 7 picks up the speaker's speech; any means capable of capturing audio may be used. The telephone 2 could therefore be connected to the computer main body 6 and used as the voice input means. In this embodiment, however, a dedicated microphone separate from the telephone 2 picks up the responder's speech before it enters the telephone line, and this microphone serves as the voice input means. Once speech passes through a telephone line, some degradation of sound quality is unavoidable; capturing it beforehand prevents this and allows highly accurate speech recognition.

Because the audio collected by the voice input means 7 is an analog signal, it is converted to digital data before being input to the information presentation processing means 9. Hereinafter, the speech data subject to speech recognition by the information presentation processing means 9 refers to this A/D-converted data.
Details of the data conversion and the like are not essential to the present invention and are omitted.

The output means 8 outputs to the responder information that has a predetermined relationship to the keywords extracted from the speech recognition result. A display that the speaker can view is one possibility, but the output means is not limited to a display; anything through which the responder can learn the presented information may be used.

The information presentation processing means 9 performs speech recognition on the speech input by the voice input means 7, detects keywords in the recognition result, performs predetermined processing when the condition corresponding to a keyword is satisfied, and sends the processing result to the output means 8 for output. These processes are carried out by loading a computer program stored in an external auxiliary storage device or ROM into memory.

In this embodiment, speech recognition by the information presentation processing means 9 is divided into a part that converts speech into text by a method using dictation and a part that detects predetermined keywords in that text. For convenience, the former is called dictation processing and the latter keyword detection processing. Dictation processing recognizes ordinary sentences, including grammatical particles such as "te, ni, o, ha". Compared with a method using a rule grammar, which recognizes only the words in a list prepared in advance, this is harder because it must choose among far more possibilities. To ease this difficulty, modeling with N-grams, a known technique, is used.
Taking N = 3 (a trigram) as an example, the N-gram model estimates the occurrence probability $P(w_1 w_2 \cdots w_n)$ of a given word string $w_1 w_2 \cdots w_n$ with the approximation

$$P(w_1 w_2 \cdots w_n) \approx P(w_1 w_2) \prod_{i=3}^{n} P(w_i \mid w_{i-2}, w_{i-1})$$

where $P(w_i \mid w_{i-2}, w_{i-1})$ on the right-hand side is the conditional probability that $w_i$ comes next after the words $w_{i-2}, w_{i-1}$. The product of all the factors $P(w_i \mid w_{i-2}, w_{i-1})$ is computed, and the combination of words for which $P(w_1 w_2 \cdots w_n)$ takes the largest value is chosen as the recognition result.
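The trigram scoring described above can be checked numerically. The sketch below sums log probabilities instead of multiplying raw probabilities (to avoid floating-point underflow on long word strings, an implementation choice not stated in the source) and picks the highest-scoring candidate. The probability tables, the toy English tokens, and the `unseen` floor for missing entries are made-up values for illustration; real recognizers use trained, smoothed models.

```python
import math
from typing import Dict, List, Sequence, Tuple


def trigram_log_prob(words: Sequence[str],
                     p_first_two: Dict[Tuple[str, str], float],
                     p_tri: Dict[Tuple[str, str, str], float],
                     unseen: float = 1e-9) -> float:
    """log P(w1..wn) ~= log P(w1 w2) + sum_{i=3..n} log P(w_i | w_{i-2}, w_{i-1})."""
    if len(words) < 2:
        raise ValueError("need at least two words")
    logp = math.log(p_first_two.get((words[0], words[1]), unseen))
    # 0-based index i here corresponds to w_{i+1} in the 1-based formula.
    for i in range(2, len(words)):
        logp += math.log(p_tri.get((words[i - 2], words[i - 1], words[i]), unseen))
    return logp


def best_candidate(candidates: List[Sequence[str]], p_first_two, p_tri):
    """Choose the word string with the largest estimated probability."""
    return max(candidates, key=lambda ws: trigram_log_prob(ws, p_first_two, p_tri))


if __name__ == "__main__":
    p_first_two = {("new", "microwave"): 0.01, ("new", "micro"): 0.002}
    p_tri = {("new", "microwave", "oven"): 0.5, ("new", "micro", "wave"): 0.1}
    cands = [["new", "microwave", "oven"], ["new", "micro", "wave"]]
    print(best_candidate(cands, p_first_two, p_tri))  # ['new', 'microwave', 'oven']
```

With these toy values the first candidate scores 0.01 × 0.5 = 0.005 and the second 0.002 × 0.1 = 0.0002, so the first is chosen.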

Because dictation processing converts speech into text, keywords can easily be detected by a character-string search of that text.
In addition, because the speech recognition result is obtained as text, it can be saved as a record for later reference. A further side benefit is that responders can easily prepare daily reports by editing this text.

The action data storage means 10 stores keyword/action correspondence data, in which the conditions corresponding to keywords are associated with related action information. This correspondence data is described in detail later.
The user dictionary storage means 11 stores a user dictionary, also described in detail later.
The file storage means 12 stores the files and other resources registered as action information.
The action data storage means 10, the user dictionary storage means 11, and the file storage means 12 are implemented on an auxiliary storage device of the computer main body 6, such as a CD-ROM or a hard disk.

The user word registration processing means 13 stores each keyword to be stored in the action data storage means 10 as a user word in the user dictionary storage means 11.
The information presentation processing means 9 and the user word registration processing means 13 are realized by a CPU in the computer main body 6 reading a predetermined computer program from ROM, an auxiliary storage device, or the like, and executing it.

Next, the keyword/action correspondence data stored in the action data storage means 10 is described. FIG. 3 shows an example of the correspondence between keyword conditions and action information. In the example of FIG. 3, the keywords are the words that appear in the keyword condition column 14 (hereinafter "condition column"): "新型電子レンジ" (new microwave oven), "電子レンジ" (microwave oven), "冷蔵庫" (refrigerator), "価格" (kakaku, price), "値段" (nedan, price), and "わかりました" (wakarimashita, "I understand"). A keyword detected in the text output by dictation processing is valid; a keyword that is not detected is invalid.
For convenience, "valid" is treated as having the logical value "true" and "invalid" as having the logical value "false".
The condition corresponding to keywords is the result of a logical operation on the values of the keywords; when it is "true", the condition is judged to be satisfied. When a condition in the condition column 14 is satisfied, processing based on the file identifier or URL in the corresponding action information column 15 is executed.

The conditions corresponding to keywords are described in detail below.
Condition column 14a means that processing based on action information column 15a is executed on the condition that the keyword "new microwave oven" is true, that is, that "new microwave oven" has been detected in the speaker's speech.
Condition column 14b means that processing based on the corresponding action information is executed when the keyword "microwave oven" is true and "not new microwave oven" is true, that is, when the keyword "new microwave oven" is false.
Condition column 14c means that processing based on the corresponding action information is executed when the keyword "new microwave oven" is true and "価格 or 値段" is true, that is, when either of the keywords "価格" (kakaku) or "値段" (nedan), both meaning "price", is true.
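The three conditions of FIG. 3 can be written as small expression trees and evaluated against the set of detected keywords. In this sketch a bare string is a keyword (true if detected) and a tuple such as `("and", a, b)` applies and/or/not to its operands. This tuple encoding and the names `evaluate` and `COND_14A` to `COND_14C` are assumptions for illustration; the patent does not specify how the condition column is stored.

```python
from typing import Set, Tuple, Union

# A condition is either a keyword or a tuple (operator, operand, ...).
Condition = Union[str, Tuple]


def evaluate(cond: Condition, detected: Set[str]) -> bool:
    """Return the logical value of a keyword condition."""
    if isinstance(cond, str):
        return cond in detected  # detected keyword = true, otherwise false
    op, *args = cond
    if op == "and":
        return all(evaluate(a, detected) for a in args)
    if op == "or":
        return any(evaluate(a, detected) for a in args)
    if op == "not":
        (arg,) = args
        return not evaluate(arg, detected)
    raise ValueError(f"unknown operator: {op!r}")


# Conditions 14a to 14c of FIG. 3.
COND_14A = "新型電子レンジ"                                  # new microwave oven
COND_14B = ("and", "電子レンジ", ("not", "新型電子レンジ"))    # microwave oven, not the new model
COND_14C = ("and", "新型電子レンジ", ("or", "価格", "値段"))   # new model and either price word

if __name__ == "__main__":
    detected = {"新型電子レンジ", "電子レンジ", "値段"}
    print([evaluate(c, detected) for c in (COND_14A, COND_14B, COND_14C)])
    # [True, False, True]
```

With "新型電子レンジ", "電子レンジ", and "値段" detected, conditions 14a and 14c hold while 14b does not, because its "not" branch fails.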

The action information column 15 contains file identifiers and URLs. The file identifier "c:\product\new_microwave.jpg" in action information column 15a gives the file name, its storage location, and its type. Because the extension is "jpg", the file is an image file: when the keyword "new microwave oven" appears, the information presentation processing means 9 reads this file from the file storage means 12 and outputs the image data to the output means 8.

Other file extensions include "exe", denoting an executable file, and "doc", denoting a document created with Word (a Microsoft product), and the information presentation processing means 9 performs processing appropriate to each file type. For example, action information column 15b specifies a file containing a program named "warning". When the action information column 15 specifies a program file in this way, the result of speech recognition can be passed to it as an argument. Here, the speech recognition result means the text obtained by dictation processing, the keywords obtained by keyword detection processing, and the like. When arguments are passed, there may be more than one.

Action information column 15c contains the URL "http://pricelist/new_microwave". When condition column 14c is true, the information presentation processing means 9 requests the HP server 5 to send the content of the site at this URL.
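A minimal sketch of how action information might be dispatched by type: URLs go to a Web page request, and file identifiers are routed by their extension. The handler labels in `HANDLERS` and the function name `classify_action` are hypothetical, and actually opening files, running programs, or fetching pages is omitted. `PureWindowsPath` is used because the example identifier is a Windows path; `urlparse` reports `c` as the scheme of a path like `c:\...`, which is why only `http` and `https` are treated as URLs.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the kind of processing.
HANDLERS = {
    ".jpg": "display_image",
    ".exe": "run_program",
    ".doc": "open_document",
}


def classify_action(action_info: str) -> str:
    """Decide what kind of processing a piece of action information calls for."""
    if urlparse(action_info).scheme in ("http", "https"):
        return "request_web_page"
    suffix = PureWindowsPath(action_info).suffix.lower()
    return HANDLERS.get(suffix, "open_with_default_application")


if __name__ == "__main__":
    for info in (r"c:\product\new_microwave.jpg", "http://pricelist/new_microwave"):
        print(info, "->", classify_action(info))
```

Unknown extensions fall back to a default handler rather than raising, on the assumption that the operating system's file association is an acceptable last resort.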

As described above, a program can be specified in the action information column 15, and it is not limited to one that gives the responder information for answering a customer's question. It may instead serve the on-the-job training (OJT) of responders.
For example, suppose that "warning", specified in action information column 15b, is a program that issues a warning when inappropriate wording is detected and displays the correct wording on the screen. When a responder uses inappropriate wording ("わかりました", "I understand", in the example of FIG. 3), the phrase is passed as an argument to the program "warning", which can then show the responder the correct wording (for example, "かしこまりました", kashikomarimashita, a politer "certainly"), thereby coaching the responder. Alternatively, the program could forward the utterances of responders who frequently use inappropriate wording to the manager's computer 4.
In short, this embodiment places no restriction on which keywords are registered as action data, under which conditions processing is executed, or what that processing is.
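Passing recognition results to a program file, together with the "warning" example, can be sketched as below. The command-line layout (program path followed by the detected phrases), the `CORRECT_WORDING` table, the message text, and the names `build_command` and `warning` are all hypothetical; the patent only says that recognition results may be passed as one or more arguments. `build_command` only constructs the argument list, which could then be handed to `subprocess.run`; that step is not performed here.

```python
import sys
from typing import Dict, List

# Hypothetical table: inappropriate wording -> recommended wording.
CORRECT_WORDING: Dict[str, str] = {
    "わかりました": "かしこまりました",  # "I understand" -> politer "certainly"
}


def build_command(program: str, recognition_results: List[str]) -> List[str]:
    """Argument vector for a program file: its path, then the recognition results."""
    return [program, *recognition_results]


def warning(phrases: List[str]) -> List[str]:
    """Body of a hypothetical 'warning' program: advise on each flagged phrase."""
    return [f'Warning: say "{CORRECT_WORDING[p]}" instead of "{p}".'
            for p in phrases if p in CORRECT_WORDING]


if __name__ == "__main__":
    # When launched as a separate program, the phrases arrive in sys.argv.
    for message in warning(sys.argv[1:]):
        print(message)
```

Phrases without an entry in the table are ignored, so the same program could receive the full keyword list without producing spurious warnings.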

Next, the user dictionary stored in the user dictionary storage means 11 is described.
FIG. 4 shows an example of the user dictionary. The keywords registered in the action data storage means 10 are registered in this dictionary, each with its written form and its reading.
Keyword detection for information presentation, described in detail later, considers only the words registered in this user dictionary. Narrowing the set of keywords to be detected in this way yields a processing speed that can keep up with the speaker's natural speech.

Registering action data and registering user dictionary entries separately would be laborious.
In this embodiment, when a user registers a keyword in the action data storage means 10 using an input means such as a keyboard or mouse (not shown), the user word registration processing means 13 registers the written form of the keyword in the user dictionary. The corresponding reading can be registered without user input, because the user word registration processing means 13 obtains it by consulting a system dictionary (not shown) or similar means. Deriving a reading from a word's written form is a known process, and this embodiment follows that known process.
The "user" here means anyone who uses the system of this embodiment; both call center managers and responders are users.

The point of this embodiment is that once the user performs an operation such as keyboard input to register a keyword in the action data storage means 10, no separate operation is needed to register it in the user dictionary. If registration in the user dictionary required a separate user operation, forgetting that operation could leave the action data and the user dictionary out of sync. Because the user word registration processing means 13 performs the dictionary registration without the user having to think about it, this problem does not arise.
This does not, however, preclude the user from registering entries in the user dictionary manually. For a keyword whose reading is hard to infer, such as "WYSIWYG", the user may still register the reading "うぃずぃうぃぐ" (roughly "wiziwig") in the user dictionary by keyboard input or the like.
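A minimal sketch of the automatic registration described above, assuming plain Python dictionaries stand in for the action data storage means 10, the user dictionary storage means 11 and the system dictionary. The class and method names (`KeywordRegistry`, `register_keyword`, `register_reading`) and the `pending` set are hypothetical, and the exact-match reading lookup is only a stand-in for the known notation-to-reading process the text refers to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeywordRegistry:
    # Hypothetical stand-ins: system dictionary (notation -> reading),
    # action data storage means 10, user dictionary storage means 11.
    system_dictionary: dict[str, str]
    action_data: dict[str, list[str]] = field(default_factory=dict)
    user_dictionary: dict[str, str] = field(default_factory=dict)
    pending: set[str] = field(default_factory=set)  # keywords still lacking a reading

    def register_keyword(self, notation: str, action: str) -> None:
        """Store action data and, as a side effect, register the user word."""
        self.action_data.setdefault(notation, []).append(action)
        if notation in self.user_dictionary:
            return  # already registered, possibly with a manual reading
        reading = self.system_dictionary.get(notation)
        if reading is None:
            self.pending.add(notation)  # e.g. "WYSIWYG": needs a manual reading
        else:
            self.user_dictionary[notation] = reading

    def register_reading(self, notation: str, reading: str) -> None:
        """Optional manual registration, which the automatic path does not preclude."""
        self.user_dictionary[notation] = reading
        self.pending.discard(notation)
```

For instance, registering 値段 when the system dictionary knows its reading ねだん fills the user dictionary with no extra step, while WYSIWYG stays in `pending` until a reading such as うぃずぃうぃぐ is entered by hand.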

The processing by the information presentation processing means 9 runs continuously during call center operating hours. The processing by the user word registration processing means 13, by contrast, is performed before the system goes into operation and during maintenance after it has started, and does not run constantly. Moreover, it may be more appropriate to assign a specific person to the registration work. The user word registration processing means 13 may therefore be provided in the administrator-side information processing apparatus 4 instead of the responder-side information processing apparatus 1.

Next, the operation of this embodiment is described.
Take as an example a call center responder talking with a customer who has phoned in with an inquiry, during which the responder says, "Certainly. So you are inquiring about the price of purchasing the new microwave oven." The examples shown in FIGS. 3 and 4 are used in the following description.
The responder's speech is input to the information presentation processing means 9 via the voice input means 7.

The information presentation processing means 9 performs dictation processing using the user dictionary stored in the user dictionary storage means 11 and a system dictionary (not shown), converting the input speech into text. Suppose dictation yields the recognition result 「貸し困りました。☆新型☆電子レンジを五個乳されるためのお☆値段のお問い合わせでございますね」. Here 「貸し困りました」 ('had trouble lending') and 「五個乳」 ('five pieces of milk') are sound-alike misrecognitions of 「かしこまりました」 ('certainly') and 「ご購入」 ('purchase'), while 「新型」 ('new model'), 「電子レンジ」 ('microwave oven') and 「値段」 ('price') are recognized correctly. The symbol ☆ stands for "information indicating the start position of a user word". Any representation of the start position will do, such as the corresponding address in the memory holding the text or the index of the character.
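The start-position information can be carried as character offsets into the recognized text, one of the options the text allows. The sketch below assumes, purely for illustration, that the recognizer emits ☆ in-band before each user word (in the patent ☆ is only notation) and converts that into plain text plus a list of offsets; `split_marks` is a hypothetical helper.

```python
from __future__ import annotations

MARK = "☆"  # assumed in-band marker placed before each user word


def split_marks(marked: str) -> tuple[str, list[int]]:
    """Return (plain text, character offsets at which user words start)."""
    chars: list[str] = []
    starts: list[int] = []
    for ch in marked:
        if ch == MARK:
            starts.append(len(chars))  # offset of the next real character
        else:
            chars.append(ch)
    return "".join(chars), starts


text, starts = split_marks(
    "貸し困りました。☆新型☆電子レンジを五個乳されるためのお☆値段のお問い合わせでございますね"
)
print(starts)  # [8, 10, 26]
```

Offsets 8, 10 and 26 point at 新型, 電子レンジ and 値段 in the marker-free text.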

Next, the information presentation processing means 9 processes the text sequentially as the responder speaks. For example, once the text up to 「貸し困りました。☆新型☆電子レンジを」 has been received, it searches the action data storage means 10 for keywords, taking as targets only the strings that begin where the start-position information marks a user word: 「新型電子レンジを」, which begins at 「新」, and 「電子レンジを」, which begins at 「電」.
The text 「貸し困りました。」, which precedes 「新」, need not be searched for keywords. Because the range searched for keywords is narrowed in this way, processing is fast.
In the user dictionary example shown in FIG. 4, 「新型電子レンジ」 ('new microwave oven') is not itself registered as a user word. However, because 「新型」 is registered, it is known that the string beginning at 「新」 must be searched. Consequently, the keyword 「新型電子レンジ」 in the example of FIG. 3 is not missed.
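The narrowing step can be sketched as follows, assuming the text received so far is a plain string and the user-word starts are given as character offsets into it. Only starts that have already arrived produce candidate strings, and nothing before the first start is ever examined. `candidate_strings` is a hypothetical name.

```python
from __future__ import annotations


def candidate_strings(received: str, starts: list[int]) -> list[str]:
    """Strings to search for keywords, one per user-word start received so far.

    Text before the first start (here the misrecognized 貸し困りました。) is
    never passed on to the keyword search.
    """
    return [received[s:] for s in starts if s < len(received)]


# Text received up to 「貸し困りました。☆新型☆電子レンジを」, markers removed;
# the third start (26, before 値段) has not arrived yet.
received = "貸し困りました。新型電子レンジを"
print(candidate_strings(received, [8, 10, 26]))  # ['新型電子レンジを', '電子レンジを']
```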

The keyword search detects the keywords 「新型電子レンジ」 and 「電子レンジ」, which are then marked as valid. The search checks whether each left-anchored prefix of the candidate string (「新」, 「新型」, 「新型電」, ...) matches an element of the keyword set, which may contain tens of thousands of words. Matching one string against this many keywords can be done quickly with a known method such as a hash index. And because the search starts only from user-word start positions, the number of lookups stays small and processing is efficient.
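A sketch of the left-anchored prefix search, using a Python `set` (a hash table) as the hash index. The keyword set below is assumed to mirror keywords mentioned in the text, and the `max_len` cutoff, which stops extending a prefix once it is longer than any keyword, is an added assumption; `find_keywords` is a hypothetical name.

```python
from __future__ import annotations


def find_keywords(candidate: str, keywords: set[str], max_len: int) -> list[str]:
    """Return every keyword that matches a left-anchored prefix of candidate."""
    found: list[str] = []
    for end in range(1, min(len(candidate), max_len) + 1):
        prefix = candidate[:end]  # 新, 新型, 新型電, ...
        if prefix in keywords:  # average O(1) hash lookup, regardless of set size
            found.append(prefix)
    return found


KEYWORDS = {"新型電子レンジ", "電子レンジ", "値段", "価格", "冷蔵庫"}
MAX_LEN = max(len(k) for k in KEYWORDS)  # 7

for cand in ["新型電子レンジを", "電子レンジを"]:
    print(find_keywords(cand, KEYWORDS, MAX_LEN))
# ['新型電子レンジ']
# ['電子レンジ']
```

Each candidate costs at most `max_len` hash lookups, however large the keyword set grows.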

This efficiency means the system can keep up with the responder's natural speech. If the search were inefficient, the responder might be forced to speak unnaturally, which would defeat the purpose of supporting the responder's work.

Since the keyword 「新型電子レンジ」 has been detected, looking up the action data storage means 10 for entries whose conditions on this keyword are satisfied yields 「新型電子レンジ c:\product\new_microwave.jpg」. The information presentation processing means 9 therefore opens the file c:\product\new_microwave.jpg and sends the image data it reads to the output means 8 for display.
When processing continues up to 「…五個乳されるためのお☆値段の」, the keyword 「値段」 is detected and marked valid. The preceding 「五個乳されるためのお」 need not be searched for keywords.
At this point 「新型電子レンジ」 and 「値段」 have both been detected and are both valid, so the condition of the action data 「新型電子レンジ and (価格 or 値段) http://pricelist/new_microwave」 is satisfied (価格 and 値段 both mean 'price'). A request to browse the content at that URL is therefore sent to the site.
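The condition check and the resulting action can be sketched as below. The condition grammar (keywords combined with `and`, `or` and parentheses, with `and` binding tighter) and the rule that a target beginning with `http://` is browsed while anything else is opened as a file are assumptions inferred from the FIG. 3 examples quoted in the text. `evaluate` and `dispatch` are hypothetical names, and `dispatch` only lists what it would do rather than opening files or sending requests.

```python
from __future__ import annotations

import re


def evaluate(condition: str, valid: set[str]) -> bool:
    """Evaluate e.g. '新型電子レンジ and (価格 or 値段)' against the valid keywords."""
    tokens = re.findall(r"\(|\)|[^\s()]+", condition)
    pos = 0

    def peek() -> str | None:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of condition: {condition!r}")
        pos += 1
        return tokens[pos - 1]

    def or_expr() -> bool:
        value = and_expr()
        while peek() == "or":
            take()
            rhs = and_expr()  # always parse, so no tokens are skipped
            value = value or rhs
        return value

    def and_expr() -> bool:
        value = atom()
        while peek() == "and":
            take()
            rhs = atom()
            value = value and rhs
        return value

    def atom() -> bool:
        tok = take()
        if tok == "(":
            value = or_expr()
            if take() != ")":
                raise ValueError(f"missing ')' in {condition!r}")
            return value
        if tok in (")", "and", "or"):
            raise ValueError(f"unexpected {tok!r} in {condition!r}")
        return tok in valid  # a keyword counts only while it is valid

    result = or_expr()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {condition!r}")
    return result


def dispatch(action_data: list[tuple[str, str]], valid: set[str]) -> list[tuple[str, str]]:
    """List the actions whose conditions hold: ('browse', url) or ('open', path)."""
    actions = []
    for condition, target in action_data:
        if evaluate(condition, valid):
            kind = "browse" if target.startswith(("http://", "https://")) else "open"
            actions.append((kind, target))
    return actions


ACTION_DATA = [
    ("新型電子レンジ", r"c:\product\new_microwave.jpg"),
    ("新型電子レンジ and (価格 or 値段)", "http://pricelist/new_microwave"),
    ("冷蔵庫 and (価格 or 値段)", "http://pricelist/refregerator"),
]

print(dispatch(ACTION_DATA, {"新型電子レンジ", "値段"}))
```

With only 新型電子レンジ valid, just the image is listed; once 値段 also becomes valid, the price-list URL is added. A real system would presumably also remember which actions have already fired so the image is not reopened; that bookkeeping is omitted here.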

When the responder finishes speaking and the customer starts, the keyword detection part of the information presentation processing means 9 receives no text (the output of the dictation part) for a set period, for example 10 seconds. The keyword detection part therefore invalidates the keywords detected so far, 「新型電子レンジ」 and 「値段」. If, after the customer's utterance, the responder then says something like "So you have a separate question about the refrigerator" and the keyword 「冷蔵庫」 ('refrigerator') is detected, the keyword 「値段」 has already been invalidated. This prevents the action data 「冷蔵庫 and (価格 or 値段) http://pricelist/refregerator」 from being wrongly judged to satisfy its condition.

In this embodiment, the keywords detected so far are invalidated when the keyword detection part detects that no dictation text has been received for a set period. The condition for invalidating keywords is not limited to this, however. For example, keywords may also be invalidated when the responder operates the keyboard or mouse, or when the call is disconnected.
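A minimal sketch of keyword invalidation, assuming the keyword detection part is given the current time whenever dictation text arrives. For simplicity it checks the idle period lazily, when the next text arrives, rather than with a timer; `KeywordState`, `on_text` and `invalidate_all` are hypothetical names, and the 10-second default is the example value from the text. `invalidate_all` stands in for the alternative triggers (keyboard or mouse operation, call disconnection).

```python
from __future__ import annotations


class KeywordState:
    """Valid-keyword set with idle-timeout invalidation (hypothetical sketch)."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout
        self.valid: set[str] = set()
        self.last_text_at: float | None = None

    def on_text(self, detected: list[str], now: float) -> set[str]:
        """Call whenever dictation text arrives; returns the currently valid keywords."""
        if self.last_text_at is not None and now - self.last_text_at >= self.timeout:
            self.valid.clear()  # no text for the whole period: drop the old topic
        self.last_text_at = now  # any text, even without keywords, resets the clock
        self.valid.update(detected)
        return set(self.valid)

    def invalidate_all(self) -> None:
        """Alternative trigger, e.g. keyboard/mouse operation or call disconnection."""
        self.valid.clear()
```

In the scenario above, 新型電子レンジ and 値段 arrive within seconds of each other, the customer then talks for more than 10 seconds, and by the time 冷蔵庫 is detected only 冷蔵庫 remains valid.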

In the embodiment above, only the responder's speech is used as input to speech recognition. Alternatively, the telephone 2 may be connected to the responder-side information processing apparatus 1 so that the conversation between the responder and the customer is input from the telephone 2 to the information presentation processing means 9. Or only the responder's voice may be input from the telephone 2 to the information presentation processing means 9.

Also, in the embodiment above, the information processing apparatus 1 is connected to the external telephone 3, the homepage (HP) server 5 and so on via the public communication network N1, the communication network N2 and the Internet N3. The invention can, however, also be applied to a standalone information processing apparatus 1, so it has potential applications outside call centers as well.

Further, the embodiment above was described assuming the output means 8 is a display device viewed visually. The output means 8 may, however, present information audibly. For example, if a visually impaired person asks a question by voice and receives an appropriate spoken response, the system can be applied in fields such as welfare and education.

Furthermore, in the embodiment above, keywords are detected by dictation. The invention is not limited to dictation, however; any other method may of course be used as long as it can detect keywords in continuous speech.

FIG. 1 is a diagram showing the configuration of an information presentation system to which the present invention is applied.
FIG. 2 is a functional block diagram of the responder-side information processing apparatus.
FIG. 3 is a diagram showing an example of correspondence data between keywords and actions.
FIG. 4 is a diagram showing an example of data stored in the user dictionary storage means.

Explanation of symbols

2 Voice input means (telephone)
7 Voice input means (dedicated microphone)
8 Output means
9 Information presentation processing means
10 Action data storage means
11 User dictionary storage means
13 User word registration processing means

Claims

1. An information presentation system for a speaker, comprising: voice input means for inputting the speaker's voice; action data storage means for storing keyword-action correspondence data in which a condition on predetermined keywords is associated with action information related to that condition; and information presentation processing means for presenting information using speech recognition, wherein the information presentation processing means performs speech recognition processing on the speaker's voice, detects, based on the result of that processing, keywords stored in the action data storage means, determines whether the detected keywords satisfy the condition, and, if the condition is satisfied, extracts the corresponding action information from the action data storage means and executes processing based on the extracted action information.

2. The information presentation system for a speaker according to claim 1, further comprising output means viewable by the speaker, wherein the processing based on the extracted action information causes the output means to display information.

3. The information presentation system for a speaker as described above, wherein the information presentation processing means converts the input voice into text by dictation processing and detects, in that text, the keywords stored in the action data storage means.

4. The information presentation system for a speaker according to claim 3, further comprising user dictionary storage means for storing user words, each stored in association with its notation and reading, wherein, when the text obtained by the dictation processing contains a user word stored in the user dictionary storage means, information indicating the start position of that user word in the text is output, and the information presentation processing means performs the keyword search only from the start positions of user words.

5. The information presentation system for a speaker according to claim 4, further comprising user word registration processing means for storing the keywords stored in the action data storage means as user words in the user dictionary storage means.

6. The information presentation system for a speaker according to any one of claims 1 to 5, wherein, if the action information associated with the condition on the keywords in the action data storage means is information specifying a file type and file name, the information presentation processing means opens the file and performs processing according to its type, and, if the action information is the URL of a site on the Internet, the information presentation processing means requests browsing of the web page of that site.

7. The information presentation system for a speaker according to claim 6, wherein, when the processing performed upon opening the file according to its type is that for a program file, the speech recognition result is used as an argument of the program.

8. The information presentation system for a speaker according to any one of claims 1 to 7, wherein the voice input means inputs the voice of a conversation, over a telephone line, between a questioner and a responder who answers the questions, and passes the voice of the conversation to the information presentation processing means.

9. The information presentation system for a speaker according to claim 8, wherein the voice input means includes a dedicated microphone that picks up only the voice of the responder who answers questions over the telephone line, and passes the voice input to the microphone to the information presentation processing means without going through the telephone line.

10. A computer program that, when read and executed by a computer, causes the computer to operate as an information presentation system that presents information corresponding to keywords detected by speech recognition in a speaker's voice, the program causing the computer to function as: means for registering, in action data storage means, action data in which a condition on predetermined keywords is associated with file information related to that condition; means for performing speech recognition processing on the input voice of the speaker; means for detecting, in the speech recognition result, keywords stored in the action data storage means, determining whether the detected keywords satisfy the condition, and, if the condition is satisfied, extracting the corresponding file information from the action data storage means and executing processing based on the extracted file information; and means for outputting the execution result.