JP2005181358A

JP2005181358A - Speech recognition and synthesis system

Info

Publication number: JP2005181358A
Application number: JP2003417388A
Authority: JP
Inventors: Hiroaki Iso; 浩明磯
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2003-12-16
Filing date: 2003-12-16
Publication date: 2005-07-07

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition and synthesis system that enables equipment control by voice input close to conversation between people, by allowing voice commands to be given as the kind of sentences people use in daily life.
SOLUTION: In addition to the functions of a conventional speech recognition/synthesis device, the system is provided with: a history database 107 that stores the history of users' operations of prescribed equipment; a means 108 for analyzing the database to generate a script; a means for identifying the user, when there are a plurality of users, based on the recognition result of a speech recognition processing means 102; and means (101, 110, 107, and 108) for analyzing the operation histories stored in the history database user by user, associating specified keywords with one another, and generating or updating a script of the operation procedure of the prescribed equipment from the associated information.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a system that uses speech recognition and speech synthesis to let a user input commands by voice in an interactive manner in order to control a device.

Currently, in the field of computers and electronic devices, there are devices that recognize a user's spoken commands to control the device, and devices that accept character input by voice. Such devices not only recognize speech but also synthesize it, using voice to confirm the content of a command to the user and to announce the start and end of an operation.

A series of device control operations is described below, taking a car navigation system (hereinafter "car navigation") as an example.
The following is an example of the exchange between the user and the car navigation when the destination is input to the car navigation by voice.
(Car navigation) "Please enter your destination."
(User) "Tokyo."
(Car navigation) "Where in Tokyo?"
(User) "Setagaya Ward."
(Car navigation) "Where in Setagaya Ward?"
...
Through such an exchange with the device, the user can input the destination by voice alone, without using the hands while driving, and can have the car navigation display a map of the destination or search for a route to it. In recent years, advances in speech recognition technology have made it possible to recognize not only isolated words but also continuous sequences of words. For example, in destination entry, a route search to the destination can now be started by inputting the sentence "Go to Setagaya 1-chome, Setagaya-ku, Tokyo."

In a system that allows a device to be operated by voice input in this way, what matters is the definition of the exchange between the user and the device, that is, a scenario for the operation procedure: which device operation is performed next when a given word is input, for example waiting for the input of a further word, or starting a route search while simultaneously replying to the user through speech synthesis. Such a scenario can be implemented programmatically within an application, but there is now also technology in which the scenario is defined as a text-file-based script, and a browser program equipped with a speech recognition engine and a speech synthesis engine reads the script file and executes it step by step. For example, in the technology disclosed in Patent Document 1 below (title of invention: voice command system, voice command device, voice command method, and voice command program), the script is written on an HTML basis. In addition, as disclosed in Non-Patent Document 1 below, the W3C has standardized voiceXML, a specification for writing such scripts on an XML basis.
A modified speech recognition/synthesis system that handles a specific natural language is also widely known, as described in Patent Document 2 below.
Patent Document 1: JP 2002-366344 A
Patent Document 2: Japanese Translation of PCT International Application Publication No. 2003-521750 (claims 77 and 78)
Non-Patent Document 1: Voice eXtensible Markup Language (VoiceXML) version 1.0, W3C Note, 05 May 2000 (http://www.w3.org/TR/2000/NOTE-voicexml-20000505)

In the speech recognition and synthesis system using a script file as described above, the device operates according to procedures defined in script files prepared in advance for the various operation scenes a user is expected to encounter. Consequently, to have the device perform the same operation again, the user must go through the same series of voice exchanges. For example, to go again to "Setagaya 1-chome, Setagaya-ku, Tokyo" as in the conventional example above, the user must input "Setagaya 1-chome, Setagaya-ku, Tokyo" once more. Making the user repeat the same thing over and over is obviously inconvenient. Moreover, a device that asks the same questions again and again is far removed from natural conversation between people, which has also hindered the spread of speech recognition and synthesis systems.
Further, the system described in Patent Document 2 is a speech system and cannot be applied to the transmission of commands.

Accordingly, an object of the present invention is to provide a speech recognition and synthesis system in which processing corresponding to words that people commonly use in daily life, such as "usual" and "previous", is reflected in the script file as appropriate on the basis of the user's past voice input history and device operation history. This allows voice commands to be input as everyday sentences such as "go to the usual place" or "go to the shop from last time", enabling device control through voice input closer to conversation between people.

As a means for achieving the above object, the present invention provides a speech recognition and synthesis system that includes a speech recognition processing unit and a speech synthesis processing unit, sends to a controlled device an operation command for executing the contents of a script defining an operation procedure in the controlled device corresponding to a voice command input by a user, and causes the controlled device to execute a desired operation according to the user's voice command, the system comprising:
means for identifying the user, when there are a plurality of users, based on a recognition result of the speech recognition processing unit;
a database that stores the operation history of the controlled device by the plurality of users in association with each user identified by the identifying means;
means for analyzing the database and generating a script; and
means for analyzing, for each identified user, the operation history stored in the database, associating specific keywords with one another, and generating or updating a script of the operation procedure of the controlled device from the associated information.

In the speech recognition and synthesis system of the present invention, the user's past operation history is analyzed, and keywords related to the most frequently performed operation or to the most recent operation are associated with special keywords such as "usual" and "previous". This makes it possible to respond to command inputs from the user such as "the usual XXX" or "the previous XXX", so the system no longer has to ask the user to repeat the same voice command, such as entering the same address again and again.

Furthermore, by extracting speaker characteristics during speech recognition to identify the speaker, and generating a script that reflects each user's history, the system can respond optimally to commands such as "usual" and "previous" for each individual user.
In addition, because commands close to the sentences people use in daily life can be used, the reluctance users feel about talking to a machine is further reduced.
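The patent does not say how the speech recognition engine 102 identifies the speaker. As a minimal sketch, assuming each enrolled user has a fixed-length voice feature vector (a hypothetical "template"), identification can pick the enrolled user whose template has the highest cosine similarity to the features of the current utterance, rejecting the match below a threshold. The function names, the vector values, and the 0.8 threshold are illustrative assumptions, not the method of the patent.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(features: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching enrolled user, or None if no match clears the threshold."""
    best_user: Optional[str] = None
    best_score = -1.0
    for user, template in enrolled.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


# Hypothetical enrolled templates for users A and B.
enrolled = {"A": [1.0, 0.1, 0.0], "B": [0.0, 1.0, 0.2]}
print(identify_speaker([0.9, 0.2, 0.0], enrolled))  # A
print(identify_speaker([0.0, 0.0, 1.0], enrolled))  # None
```

Returning None for an unrecognized speaker lets the caller fall back to non-personalized script behavior; that fallback policy is likewise an assumption.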

The speech recognition and synthesis system of the present invention analyzes the past operation history of each user and associates keywords related to the most frequently performed operation or to the most recent operation with special keywords such as "usual" and "previous", which makes it possible to respond to command inputs from the user such as "the usual XXX" or "the previous XXX".
FIG. 1 is a block diagram showing an embodiment of the speech recognition and synthesis system according to the present invention. In this embodiment, a car navigation system is assumed as an example of the device to be operated.

The voiceXML interpreter 101 reads and executes the voiceXML script 109, which is stored on a rewritable medium in the system such as a hard disk or RAM. The speech recognition engine 102 recognizes the speech that the user inputs through the microphone 104 and sends it to the voiceXML interpreter 101 as a character string. The speech recognition engine 102 also identifies the speaker and sends the speaker information to the user management unit 110. The speech synthesis engine 103 converts character strings sent from the voiceXML interpreter 101 into speech and outputs it through the speaker 105 to respond to the user.

For example, suppose the voiceXML script 109 specifies that the system outputs the speech "Where is the destination?", then waits for voice input from the user, and, once a destination has been input, searches for a route to it. In that case, the voiceXML interpreter 101 first sends the character string "Where is the destination?" to the speech synthesis engine 103 and has it output the speech.
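The script behavior given as an example above (prompt, wait for the recognized destination, then start a route search) can be sketched as a single function. This is a hypothetical illustration, assuming the speech synthesis engine 103, the speech recognition engine 102, and the device operation processing unit 106 are exposed as plain callables; the names `speak`, `listen`, and `route_search` are not from the patent, and a real voiceXML interpreter would derive this sequence from script 109 rather than hard-code it.

```python
from typing import Callable, List

# Hypothetical stand-ins for the components in FIG. 1.
Speak = Callable[[str], None]        # speech synthesis engine 103
Listen = Callable[[], str]           # speech recognition engine 102
RouteSearch = Callable[[str], str]   # device operation processing unit 106


def destination_dialog(speak: Speak, listen: Listen, route_search: RouteSearch) -> str:
    """Prompt for a destination, wait for the recognized string, then start a route search."""
    speak("Where is the destination?")   # interpreter 101 sends the prompt text for synthesis
    destination = listen().strip()       # waits for the recognized text
    return route_search(destination)     # interpreter issues the route search instruction


spoken: List[str] = []
result = destination_dialog(
    speak=spoken.append,
    listen=lambda: "Setagaya 1-chome, Setagaya-ku, Tokyo",
    route_search=lambda d: f"route to {d}",
)
print(spoken)  # ['Where is the destination?']
print(result)  # route to Setagaya 1-chome, Setagaya-ku, Tokyo
```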

The interpreter then waits for a character string from the speech recognition engine 102. When a character string arrives from the speech recognition engine 102, the interpreter interprets the words and issues a route search instruction to the device operation processing unit 106. After completing processing such as displaying the route search result to the user, the device operation processing unit 106 stores the operation history, together with information from the user management unit 110, in the history database 107. The history analyzer 108 then starts an analysis based on the information stored in the updated history database 107. For example, it searches, for each user, for the place most recently used as a route search target and the place searched most often in the past. It performs similar searches for each attribute, such as genre. Based on the results, it associates these places with special keywords such as "the usual store", "the store I went to last time", and "the usual place" for each user, and updates the voiceXML script 109. When the voiceXML script 109 is subsequently executed again and the user inputs a command such as "the usual store" or "the store I went to last time", the voiceXML interpreter 101 refers to the speaker information from the user management unit 110 and to the contents of the voiceXML script 109, and performs a route search appropriate to that user.
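A minimal in-memory stand-in for the history database 107 might keep one row per user and place, incrementing the search count and keeping the latest search time each time the device operation processing unit 106 completes a route search. The field names follow the items listed for FIG. 2; the class names, the dictionary storage, and the attribute values in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class HistoryRecord:
    """One row of the history database; fields follow the items listed for FIG. 2."""
    place: str               # search location
    last_searched: datetime  # search date and time (latest)
    attribute1: str
    attribute2: str
    count: int               # number of searches
    user: str                # user name from the user management unit


class HistoryDatabase:
    """Hypothetical in-memory stand-in for history database 107."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], HistoryRecord] = {}

    def record_search(self, user: str, place: str, when: datetime,
                      attribute1: str, attribute2: str) -> HistoryRecord:
        """Called after a completed route search: add a row or update the existing one."""
        key = (user, place)
        row = self._rows.get(key)
        if row is None:
            row = HistoryRecord(place, when, attribute1, attribute2, 0, user)
            self._rows[key] = row
        row.count += 1
        row.last_searched = max(row.last_searched, when)
        return row

    def rows_for(self, user: str) -> List[HistoryRecord]:
        """All rows belonging to one user, as the history analyzer would read them."""
        return [r for r in self._rows.values() if r.user == user]


db = HistoryDatabase()
db.record_search("A", "XX Convenience", datetime(2003, 11, 1, 9, 0), "store", "convenience store")
db.record_search("A", "XX Convenience", datetime(2003, 11, 20, 18, 30), "store", "convenience store")
print(db.rows_for("A")[0].count)  # 2
```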

The history database is described in more detail with reference to the example in FIG. 2. The history database stores items such as the search location, the search date and time, attribute 1, attribute 2, the number of searches (search history), and the user name (user information). Referring to this database, for user A the keyword associated with "the usual store" is "XX Convenience", the search location with the largest number of searches among items having the attribute "store". The keyword associated with "the previous store" is "XX Restaurant", the search location with the most recent search date and time among items having the attribute "store". For user B, the keyword associated with "the usual store" is "BB Supermarket", the search location with the largest number of searches among items having the attribute "store", and the keyword associated with "the previous store" is "Ramen BB", the search location with the most recent search date and time among items having the attribute "store".
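The selection rule described for FIG. 2 (highest search count for "the usual store", most recent search date and time for "the previous store", both restricted to one user and to rows with the attribute "store") can be sketched as follows. FIG. 2 itself is not reproduced here, so the rows below are hypothetical values chosen only so that the result matches the outcome stated in the text; the "AA Station" row is an invented non-store entry showing why the attribute filter matters.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    user: str
    place: str
    attribute: str          # e.g. "store"
    last_searched: datetime
    count: int


def usual_and_previous(rows: List[Row], user: str,
                       attribute: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (place for "the usual ...", place for "the previous ...") for one user and attribute."""
    candidates = [r for r in rows if r.user == user and r.attribute == attribute]
    if not candidates:
        return None, None
    usual = max(candidates, key=lambda r: r.count).place             # most searches
    previous = max(candidates, key=lambda r: r.last_searched).place  # most recent search
    return usual, previous


# Hypothetical rows; the actual values of FIG. 2 are not given in the text.
rows = [
    Row("A", "XX Convenience", "store", datetime(2003, 11, 20), 12),
    Row("A", "XX Restaurant", "store", datetime(2003, 12, 10), 2),
    Row("A", "AA Station", "station", datetime(2003, 12, 15), 30),
    Row("B", "BB Supermarket", "store", datetime(2003, 11, 30), 8),
    Row("B", "Ramen BB", "store", datetime(2003, 12, 12), 1),
]
print(usual_and_previous(rows, "A", "store"))  # ('XX Convenience', 'XX Restaurant')
print(usual_and_previous(rows, "B", "store"))  # ('BB Supermarket', 'Ramen BB')
```

When counts or dates tie, `max` keeps the first matching row; the text does not state how ties are broken.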

The history database is analyzed in this way, and on the basis of the resulting associations, the existing voiceXML script is updated to a voiceXML script defined so that, when the user issues the voice command "the usual store", the system guides user A to "XX Convenience" and user B to "BB Supermarket".
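One way the history analyzer 108 could write the per-user associations back into the script is to generate the branch logic of a voiceXML form. The sketch below builds such a fragment with Python's standard `xml.etree.ElementTree`. The element names (`vxml`, `form`, `var`, `field`, `prompt`, `filled`, `if`, `elseif`, `else`, `assign`) are VoiceXML elements, but the variable `user` (assumed to be supplied from the speaker information of the user management unit 110), the field name `dest`, and the overall layout are assumptions. It is not the script of FIG. 4, and a usable field would also need a grammar, which is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_destination_form(associations: Dict[str, Dict[str, str]]) -> str:
    """Render per-user keyword -> place associations as voiceXML branch logic.

    associations[user][keyword] is the place to guide that user to, for example
    {"A": {"the usual store": "XX Convenience"}}. Keywords and places must not
    contain single quotes, because they are embedded in ECMAScript string literals.
    """
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="destination")
    ET.SubElement(form, "var", name="target")  # destination handed on to route search
    field = ET.SubElement(form, "field", name="dest")
    ET.SubElement(field, "prompt").text = "Where is the destination?"
    filled = ET.SubElement(field, "filled")

    if_el = None
    for user, mapping in sorted(associations.items()):
        for keyword, place in sorted(mapping.items()):
            # `user` is assumed to hold the speaker identified by user management unit 110.
            cond = f"user == '{user}' && dest == '{keyword}'"
            if if_el is None:
                if_el = ET.SubElement(filled, "if", cond=cond)
            else:
                ET.SubElement(if_el, "elseif", cond=cond)
            ET.SubElement(if_el, "assign", name="target", expr=f"'{place}'")

    # No keyword matched: treat the utterance itself as the destination.
    if if_el is None:
        ET.SubElement(filled, "assign", name="target", expr="dest")
    else:
        ET.SubElement(if_el, "else")
        ET.SubElement(if_el, "assign", name="target", expr="dest")
    return ET.tostring(vxml, encoding="unicode")


print(build_destination_form({
    "A": {"the usual store": "XX Convenience", "the previous store": "XX Restaurant"},
    "B": {"the usual store": "BB Supermarket"},
}))
```

ElementTree escapes the `&&` in each `cond` attribute as `&amp;&amp;`, so the output stays well-formed XML.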

FIG. 3 shows an example of the voiceXML script before the update. After outputting the speech "Where is the destination?", the system waits for voice input from the user, and once the input is received the route search process starts.
FIG. 4, on the other hand, shows an example of the updated voiceXML script. It is defined so that, when the user's voice input contains a keyword such as "the usual store" or "the previous store", the system operates according to that user's operation history.
By reflecting the past operation history in the voiceXML script in this way, when the device asks "Where is the destination?" and the user replies "the usual store" or "the previous store", the system can guide the user to a store chosen according to the past operation history.
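At run time the effect of the updated script amounts to a per-user lookup with a fallback to the literal utterance. The following is a hypothetical in-process equivalent, assuming the associations are held as a nested mapping from user to keyword to place and that recognized text is compared case-insensitively; neither assumption comes from the patent.

```python
from typing import Mapping, Optional


def resolve_destination(utterance: str, user: Optional[str],
                        associations: Mapping[str, Mapping[str, str]]) -> str:
    """Map a special keyword to the identified user's place; otherwise keep the utterance."""
    text = utterance.strip()
    if user is not None:
        place = associations.get(user, {}).get(text.lower())
        if place is not None:
            return place
    return text  # ordinary destination, or no identified speaker


associations = {
    "A": {"the usual store": "XX Convenience", "the previous store": "XX Restaurant"},
    "B": {"the usual store": "BB Supermarket", "the previous store": "Ramen BB"},
}
print(resolve_destination("The usual store", "A", associations))  # XX Convenience
print(resolve_destination("The usual store", "B", associations))  # BB Supermarket
```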

Although this embodiment assumes a route search in car navigation, the invention can also be applied to other commands and to systems in other fields that use speech recognition and synthesis. Likewise, although voiceXML is used as the script in this embodiment, HTML or other scripts can also be used.

In the embodiment shown in FIG. 1, the device operation processing unit 106 is provided, so that part of the device whose operation is controlled by the speech recognition and synthesis system of the present invention is built into the system. In a configuration that does not include any part of the controlled device, the device operation processing unit 106 in FIG. 1 can be replaced by, for example, an interface that sends control signals for controlling the controlled device and receives signals indicating the state of the controlled device.
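The interface that replaces the device operation processing unit 106 can be sketched as a small structural type with one method for sending control signals and one for reading the device state. `ControlledDevice`, `ExternalDeviceInterface`, the command name `route_search`, and the status strings are all hypothetical; a real implementation would drive an actual bus or serial link where the comments indicate.

```python
from typing import List, Protocol, Tuple


class ControlledDevice(Protocol):
    """What the interpreter needs from the controlled device: send commands, read state."""

    def send_command(self, command: str, argument: str) -> None: ...

    def read_status(self) -> str: ...


class ExternalDeviceInterface:
    """Hypothetical interface replacing device operation processing unit 106."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []
        self._status = "idle"

    def send_command(self, command: str, argument: str) -> None:
        # A real interface would emit a control signal to the external device here.
        self.sent.append((command, argument))
        self._status = f"busy:{command}"

    def read_status(self) -> str:
        # A real interface would decode the state signal received from the device.
        return self._status


def start_route_search(device: ControlledDevice, destination: str) -> str:
    """Issue a route search through whichever device implementation is plugged in."""
    device.send_command("route_search", destination)
    return device.read_status()


device = ExternalDeviceInterface()
print(start_route_search(device, "XX Convenience"))  # busy:route_search
```

Because `ControlledDevice` is a structural `Protocol`, a built-in unit and an external interface could be swapped without changing the calling code; this separation is a design suggestion, not something the patent specifies.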

The parts of the embodiment shown in FIG. 1, other than the microphone 104 and the speaker 105, can be built from computer components such as a CPU (central processing unit), RAM, ROM, interfaces, and bus lines. Each function of the apparatus according to the present invention described in the above embodiment can therefore be implemented as a computer program and executed by a computer. Such a computer program may be supplied recorded on a predetermined recording medium and loaded into the computer, or transmitted over a communication network such as the Internet and loaded into the computer.

Because the speech recognition and synthesis system of the present invention can respond to command inputs from the user such as "the usual XXX" or "the previous XXX", and no longer has to ask the user to repeat the same voice command, such as entering the same address again and again, it is useful not only in the car navigation system described in this embodiment but in a wide range of systems in which a user operates a device through speech recognition and synthesis.

FIG. 1 is a block diagram showing an embodiment of the speech recognition and synthesis system of the present invention.
FIG. 2 is a diagram showing an example of the structure of the history database used in the embodiment of the speech recognition and synthesis system of the present invention.
FIG. 3 is an example of the voiceXML script (before update) used in the embodiment of the speech recognition and synthesis system of the present invention.
FIG. 4 is an example of the voiceXML script (after update) used in the embodiment of the speech recognition and synthesis system of the present invention.

Explanation of symbols

101 voiceXML interpreter (together with the user management unit, the history database, and the history analyzer, constitutes the means for generating or updating a script)
102 Speech recognition engine (speech recognition processing unit)
103 Speech synthesis engine
104 Microphone
105 Speaker
106 Device operation processing unit
107 History database
108 History analyzer (means for generating a script)
109 voiceXML script
110 User management unit

Claims

A speech recognition and synthesis system that includes a speech recognition processing unit and a speech synthesis processing unit, sends to a controlled device an operation command for executing the contents of a script defining an operation procedure in the controlled device corresponding to a voice command input by a user, and causes the controlled device to execute a desired operation according to the user's voice command, the system comprising:
means for identifying the user, when there are a plurality of users, based on a recognition result of the speech recognition processing unit;
a database that stores the operation history of the controlled device by the plurality of users in association with each user identified by the identifying means;
means for analyzing the database and generating a script; and
means for analyzing, for each identified user, the operation history stored in the database, associating specific keywords with one another, and generating or updating a script of the operation procedure of the controlled device from the associated information.