JP6123121B2

JP6123121B2 - Voice control system and program

Info

Publication number: JP6123121B2
Application number: JP2011227492A
Authority: JP
Inventors: 岳史藤田
Original assignee: ヴイアールアイ株式会社
Priority date: 2011-10-14
Filing date: 2011-10-14
Publication date: 2017-05-10
Anticipated expiration: 2031-10-14
Also published as: JP2013088535A

Description

The present invention relates to a system that controls the operation of a device in response to a natural sentence input by voice.

JP 2000-56944 A discloses a conventional example of this kind. According to the invention disclosed in that publication, a natural sentence can be input by voice, as described in its paragraph 0090. For example, when the natural sentence "set the video channel to 1" given in paragraph 0048 is input, a plurality of commands, namely "turn the power on" and "set the channel to 1", can be executed in succession, as described in paragraph 0051.

However, the natural sentences that the invention of that publication can interpret are limited to requests corresponding to individual buttons of a conventional remote control, such as "turn the power on" or "set the channel to 1". It cannot respond to a natural-sentence request such as "I want to watch the news".

JP 2000-56944 A

An object of the present invention is to enable a voice control system that can execute a plurality of commands from a single natural sentence to accept natural sentences with a wider range of expression than before, thereby improving user convenience.

To solve this problem, the present invention is configured as follows.
1. A voice control system that, in response to a request expressed by a single natural sentence input by voice, generates a plurality of consecutive control commands instructing a control target device to operate in accordance with the request, and outputs the plurality of consecutive control commands to the control target device, wherein
a command conversion dictionary that directly links words extracted from the natural sentence to the control commands is held in storage means, and conversion auxiliary information for converting words that are not directly linked to the control commands into parameters for generating the control commands is stored in storage means, and
the voice control system comprises command conversion means that, for a word not directly linked to a control command, converts the word into a parameter for generating the control command by referring to the conversion auxiliary information, and generates the control command.

According to the present invention, a voice control system that can execute a plurality of commands from a single natural sentence can accept natural sentences with a wider range of expression than before, which improves user convenience.

FIG. 1 is a block diagram showing a first embodiment of the present invention.
FIG. 2 is a data structure diagram showing an example of the command conversion dictionary stored in the storage means of FIG. 1.
FIG. 3 is a block diagram showing a second embodiment of the present invention.

[First Embodiment]

The first embodiment of the present invention is described below. FIG. 1 is a configuration diagram of this embodiment. The voice input means 101 includes a microphone and accepts voice input of a natural sentence. The voice input means 101 also converts the input voice signal into digital form and passes it to the voice recognition means 102. The voice recognition means 102 extracts a plurality of words from the voice of the natural sentence using known voice recognition processing and passes them to the command conversion means 103. Japanese Patent No. 3581044 is one conventional example of extracting words from a natural sentence by voice recognition processing. The command conversion means 103 checks the words extracted by the voice recognition means 102 against a command conversion dictionary and determines the plurality of commands to be executed for the natural sentence from which the words were taken. The command conversion dictionary is stored in the command conversion dictionary storage means 104. The command output means 105 converts the commands that the command conversion means 103 has decided to execute into signals representing those commands and outputs them to the control target device.

When determining the commands corresponding to a natural sentence, the command conversion means 103 also refers to conversion auxiliary information. Conversion auxiliary information is auxiliary information for linking, to a command for the control target device, a word that cannot be linked directly to such a command even by referring to the command conversion dictionary. It includes, for example, program guide information for deriving, from the word "news", the broadcast channel that is currently broadcasting news. The conversion auxiliary information is stored in the conversion auxiliary information storage means 106.

In determining the commands to execute for a natural sentence, the command conversion means 103 also refers to information input from the auxiliary signal input means 107. The auxiliary signal input means 107 acquires signals other than the voice uttered by the user and includes a sensor suited to the type of signal. For example, the auxiliary signal input means 107 includes a camera and captures the posture and motion of the user's hand. The auxiliary signal input means 107 converts the captured auxiliary signal into digital form and passes it to the auxiliary signal recognition means 108. The auxiliary signal recognition means 108 recognizes the pattern of the information input from the auxiliary signal input means 107 and passes the type of the recognized pattern to the command conversion means 103. For example, it recognizes that the user's motion captured by the camera of the auxiliary signal input means 107 is a predetermined gesture and notifies the command conversion means 103 of the recognition result. Japanese Patent No. 4457983 is one conventional example of capturing a user's motion with a camera, recognizing the motion, and using it as a control input.

Learning instruction means 109 is connected to the command conversion means 103. The learning instruction means 109 includes controls that the user can operate. These are the controls ordinarily used to control the control target device, for example the buttons of the remote control for that device. When the command conversion means 103, after acquiring the words extracted from a natural sentence by the voice recognition means 102, detects that a control of the learning instruction means 109 has been operated, it associates the acquired words with the command corresponding to the operated control and registers them in the command conversion dictionary.
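The learning step described above can be sketched as follows. This is only an illustrative sketch: the class name `LearningDictionary`, the button identifiers, and the command names are hypothetical, and the choice to learn only words that are not yet registered is an assumption that the description does not state.

```python
# Hypothetical button-to-command table; button and command names are illustrative.
BUTTON_TO_COMMAND = {
    "power_button": "POWER_ON",
    "tv_power_button": "TV_POWER_ON",
}


class LearningDictionary:
    """Word-to-command dictionary that learns from remote-control button presses."""

    def __init__(self):
        self.word_to_command = {"turn on": "POWER_ON"}
        self.pending_words = []

    def on_words(self, words):
        # Assumption: only words not yet registered are kept as learning candidates.
        self.pending_words = [w for w in words if w not in self.word_to_command]

    def on_button(self, button):
        # Link every pending word to the command of the pressed button.
        command = BUTTON_TO_COMMAND.get(button)
        if command is not None:
            for word in self.pending_words:
                self.word_to_command[word] = command
            self.pending_words = []
        return command
```

In this sketch, calling `on_words(["light up"])` and then `on_button("power_button")` registers "light up" as a new word for the power-on command.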

Output destination storage means 110 is connected to the command output means 105. An identifier of the control target device, indicating which device a command should be output to, is set in the output destination storage means 110. In accordance with the identifier set in the output destination storage means 110, the command output means 105 converts the command into a signal that the control target device can interpret, such as an infrared signal, and outputs it. The command output means 105 includes a command signal output device that matches the specifications of the control target device.

In FIG. 1, the voice recognition means 102, the command conversion means 103, the command output means 105, and the auxiliary signal recognition means 108 may each be implemented by a dedicated processor, or may be realized by a general-purpose processor executing a program. The storage means 104, 106, and 110 need only provide a storage area capable of holding data; the type of element or medium is not limited.

FIG. 2 shows an example of the command conversion dictionary. The command conversion dictionary associates words with commands. Each command is associated with an execution priority and with the corresponding control of the learning instruction means. A plurality of words can be associated with one command. For example, the Japanese words for "switch on" and "on" and the English phrase "Turn on" are all associated with the power-on command. The Japanese word for "see" and the English word "Watch" are associated with the command "TV power on", so that a single verb both selects a specific control target device (the television) from among a plurality of control target devices and switches it on. Words that are not directly linked to a command for a control target device, such as "news", "music", and the English "Music", are linked to the start of a "channel selection" process performed by the command conversion means 103. Words that denote one of the plurality of control target devices, such as "TV", "radio", and "video", are linked to the start of a "device selection" process performed by the command conversion means 103.

Each command or process is given a priority: device selection is first, power-on is second, and channel selection is third. The power-on command is associated with the power button control; as described above, when the power button is operated after a word has been input, that word is newly registered in the command conversion dictionary and linked to the power-on command. Similarly, the TV power-on command is linked to the TV power-on button.
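One possible in-memory layout for such a dictionary is sketched below. The `Entry` type, the action names, and the button identifiers are hypothetical, and the keys are English renderings of the example words; only the word groupings and the priorities 1 to 3 come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    action: str                   # command or process name (illustrative)
    priority: int                 # lower number is executed first
    button: Optional[str] = None  # associated learning control, if any


DEVICE_SELECT = Entry("device_select", 1)
POWER_ON = Entry("power_on", 2, "power_button")
TV_POWER_ON = Entry("tv_power_on", 2, "tv_power_button")
CHANNEL_SELECT = Entry("channel_select", 3)

# Several words may map to the same command or process.
COMMAND_DICTIONARY = {
    "switch on": POWER_ON, "on": POWER_ON, "turn on": POWER_ON,
    "see": TV_POWER_ON, "watch": TV_POWER_ON,
    "news": CHANNEL_SELECT, "music": CHANNEL_SELECT,
    "tv": DEVICE_SELECT, "radio": DEVICE_SELECT, "video": DEVICE_SELECT,
}


def lookup(words):
    """Return (entry, word) pairs for known words, ordered by priority."""
    hits = [
        (COMMAND_DICTIONARY[w.lower()], w)
        for w in words
        if w.lower() in COMMAND_DICTIONARY
    ]
    return sorted(hits, key=lambda hit: hit[0].priority)
```

With this layout, `lookup(["news", "watch"])` yields the TV power-on entry before the channel selection entry, regardless of the order in which the words were spoken.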

When the system is set to its operating state and the user inputs the voice of a natural sentence through the voice input means 101, the voice recognition means 102 extracts words from the input natural sentence and passes them to the command conversion means 103. For example, when the user says "Turn on the TV", the words "TV" and "turn on" are recognized and passed to the command conversion means 103. The command conversion means 103 refers to the command conversion dictionary, starts the device selection process, which has priority 1, based on the word "TV", and sets a specific television as the control target device in the output destination storage means 110. Next, based on the word "turn on", it instructs the command output means 105 to issue the power-on command as priority 2. Based on the setting in the output destination storage means 110, the command output means 105 converts the power-on command into a signal that the television, as the control target device, can interpret and outputs it to that device. The television is thereby switched on.
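The "Turn on the TV" flow can be sketched roughly as below. The signal strings, the word sets, and the names `CommandOutput` and `handle_utterance` are all hypothetical; an actual implementation would emit device-specific infrared codes rather than strings.

```python
# Hypothetical signal codes per device; real codes depend on the device specification.
SIGNAL_TABLE = {
    "tv": {"power_on": "IR:TV:01"},
    "radio": {"power_on": "IR:RADIO:01"},
}
DEVICE_WORDS = {"tv", "radio"}
POWER_WORDS = {"turn on", "on"}


class CommandOutput:
    """Stand-in for command output means 105 plus output destination storage 110."""

    def __init__(self):
        self.destination = None  # identifier of the control target device
        self.sent = []           # signals that would be emitted

    def send(self, command):
        signal = SIGNAL_TABLE[self.destination][command]
        self.sent.append((self.destination, signal))


def handle_utterance(words, output):
    # Collect (priority, action, word) steps, then run them in priority order.
    steps = []
    for word in words:
        if word in DEVICE_WORDS:
            steps.append((1, "device_select", word))
        elif word in POWER_WORDS:
            steps.append((2, "power_on", word))
    for _, action, word in sorted(steps, key=lambda step: step[0]):
        if action == "device_select":
            output.destination = word  # priority 1: set the output destination
        else:
            output.send(action)        # priority 2: issue the power-on command
```

Because device selection runs first, the power-on command is converted with the television already set as the destination, even if the user says "turn on" before "TV". The sketch assumes a device word is always present; it does not handle an unset destination.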

As another example, when the user says "I want to watch the news", the words "news" and "watch" are recognized and passed to the command conversion means 103. The command conversion means 103 refers to the command conversion dictionary and, based on the word "watch", executes the TV power-on command, which has priority 2. That is, it sets a specific television as the control target device in the output destination storage means 110 and then instructs the command output means 105 to issue the power-on command. Based on the setting in the output destination storage means 110, the command output means 105 converts the power-on command into a signal that the specific television, as the control target device, can interpret and outputs it to that device. The television is thereby switched on. Next, based on the word "news", the command conversion means 103 executes the "channel selection" process, which has priority 3.

Specifically, the command conversion means 103 first refers to the conversion auxiliary information and obtains the current time stored as conversion auxiliary information. It then refers to the day's television program guide, also stored as conversion auxiliary information, and determines the television channel on which news is being broadcast at the current time. Each program in the program guide is assumed to carry an embedded program type, such as news, movie, or soccer. The command conversion means 103 then passes to the command output means 105 a command for switching the television to the channel that is broadcasting news. The command output means 105 refers to the output destination storage means 110, converts the command into a signal that the television, as the control target device, can interpret, and outputs it to that device. The television channel is thereby switched and a news program is displayed.
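A minimal sketch of the channel selection step follows. The `Program` record, the sample guide entries, and the function name `select_channel` are assumptions made for illustration; the description only requires that each program in the guide carries a type and that the current time is available.

```python
from datetime import time
from typing import List, NamedTuple, Optional


class Program(NamedTuple):
    channel: int
    start: time
    end: time
    genre: str  # program type embedded in the guide, e.g. "news"


# Hypothetical excerpt of the day's program guide.
GUIDE = [
    Program(1, time(18, 0), time(19, 0), "news"),
    Program(4, time(18, 0), time(20, 0), "movie"),
    Program(6, time(19, 0), time(19, 30), "news"),
]


def select_channel(genre: str, now: time, guide: List[Program]) -> Optional[int]:
    """Return the first channel airing the given genre at the given time."""
    for program in guide:
        if program.genre == genre and program.start <= now < program.end:
            return program.channel
    return None  # nothing of that genre is on air right now
```

The returned channel number serves as the parameter of the channel-switching command. The sketch does not handle programs that run past midnight, and it simply takes the first match when several channels qualify.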

The command conversion means 103 may decide whether to execute a command based on the input from the auxiliary signal input means 107. For example, it may execute the command only when the user's gesture input from the auxiliary signal input means 107 matches a predetermined gesture; when the predetermined gesture is absent, it may judge that the system has merely picked up conversational speech and that the user has no intention of controlling the control target device, and cancel execution of the command. As another example, the power-on command may be identified from the voice, while the control target device to be switched on is determined from a gesture or the like input from the auxiliary signal input means 107.
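Both uses of the auxiliary signal can be sketched in a few lines. The gesture labels and the functions `gate_commands` and `resolve_target` are hypothetical; the description leaves the concrete gestures open.

```python
# Hypothetical gesture labels as reported by the auxiliary signal recognition means.
CONFIRM_GESTURE = "raised_hand"
POINTING_TO_DEVICE = {"point_left": "tv", "point_right": "radio"}


def gate_commands(commands, gesture):
    """Execute commands only when the predetermined gesture accompanies the voice."""
    if gesture == CONFIRM_GESTURE:
        return list(commands)
    return []  # treated as ordinary conversation: nothing is executed


def resolve_target(gesture):
    """Pick the control target device from a pointing gesture, if recognized."""
    return POINTING_TO_DEVICE.get(gesture)
```

For instance, `gate_commands(["power_on"], None)` returns an empty list, so speech picked up without the gesture does not switch anything on.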

When the control target device is running a web browser, the command output means 105 converts the commands into a script that the web browser can interpret and outputs it. In this case, the plurality of commands to be executed in succession are written in a single script. When the control target device is on a computer network, the command output means 105 is configured to include a LAN-compatible network adapter.
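The idea of writing several consecutive commands into one script can be sketched as follows. The target function `device.execute` in the generated JavaScript is purely hypothetical; the description does not specify the script's API.

```python
import json


def build_script(commands):
    """Render (name, arguments) pairs as one JavaScript snippet, in execution order."""
    lines = [
        # device.execute is a hypothetical browser-side entry point.
        f"device.execute({json.dumps(name)}, {json.dumps(args)});"
        for name, args in commands
    ]
    return "\n".join(lines)


script = build_script([("power_on", {}), ("set_channel", {"channel": 1})])
```

Here `script` holds two statements, power-on followed by channel selection, so the browser receives the whole sequence in a single output.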

The voice control system of this embodiment can be built into a remote control for the control target device. It can also be built into a portable terminal device.

According to the embodiment described above, uttering a single natural sentence by voice is enough to perform control such as switching on the television and tuning it to a channel that is currently broadcasting news. Thus, a voice control system that can execute a plurality of commands from a single natural sentence can accept natural sentences with a wider range of expression than before, which improves user convenience.

[Second Embodiment]

Next, a second embodiment of the present invention is described. FIG. 3 is a configuration diagram of this embodiment. In this embodiment, the voice control system consists of a device 100 and a server 200. The basic configuration of the device 100 is almost the same as in the first embodiment. It differs from the first embodiment in that the device 100 includes communication means 111 for communicating with the server 200, and in that the command conversion means 103 delegates processing to the server when it cannot determine a command from the words locally. The operation of the other means is the same as in the first embodiment.

In this embodiment, when the command conversion means 103 of the device 100 cannot determine locally the command corresponding to a word, it asks the server 200 to convert the voice into commands. This occurs, for example, when a word not registered in the command conversion dictionary is extracted, or when a foreign language not registered in the command conversion dictionary is extracted. In this case, the command conversion means 103 obtains from the voice recognition means 102 an audio file recording the voice of the natural sentence input to the voice input means 101. The command conversion means 103 also obtains from the auxiliary signal recognition means 108 an information file recording the auxiliary signal input from the auxiliary signal input means 107. The command conversion means 103 then outputs the audio file of the natural sentence and the information file of the auxiliary signal to the server 200 via the communication means 111 and asks the server 200 to convert them into commands. The communication means 111 of the device and the communication means 211 of the server each include a communication device compatible with an IP network.

The server 200, for its part, includes the communication means 211 and command conversion means 203. The server 200 also includes command conversion dictionary storage means 204 and conversion auxiliary information storage means 206. The command conversion means 203 has functions equivalent to those of the command conversion means 103 of the device 100 and additionally has the functions of the voice recognition means 102 and the auxiliary signal recognition means 108. The command conversion dictionary storage means 204 and the conversion auxiliary information storage means 206 store, in addition to the command conversion dictionary information and conversion auxiliary information held by the device 100, a large amount of information that the device 100 does not hold.

When the device 100 has been unable to determine a command from the words and the auxiliary signal information, the command conversion means 103 of the device 100 transmits the audio file acquired in the device 100 and the auxiliary signal information file to the server 200 via the communication means 111. The server 200 receives the audio file and the auxiliary signal information file from the device 100 via the communication means 211 and stores them in storage means. The command conversion means 203 of the server 200 refers to a command conversion dictionary and conversion auxiliary information that hold more information than those of the device 100, and converts the voice and auxiliary signal information that the device 100 could not convert into appropriate commands. The commands converted by the server 200 are transmitted via the communication means 211 and received by the communication means 111 of the device 100.

The command conversion means 103 of the device 100 passes the commands received from the server 200 to the command output means 105. The command output means 105 refers to the output destination set in the local output destination storage means 110, converts the commands received from the server 200 into command signals that the control target device at that output destination can interpret, and outputs them to that device. The operation of the control target device is thereby controlled in accordance with the natural sentence input by voice and the auxiliary signal.
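The division of work between the device and the server can be sketched as below. Everything here is a stand-in: `fake_server` replaces the network round trip and the server-side speech recognition, the byte strings stand for audio files, and the command and signal formats are hypothetical.

```python
# Small local dictionary held by device 100 (illustrative contents).
LOCAL_DICTIONARY = {"turn on": "power_on", "tv": "select_tv"}

# Stand-in for server 200: maps a hypothetical audio payload to commands.
SERVER_RESULTS = {b"audio-0001": ["select_tv", "power_on"]}


def fake_server(audio, aux_info):
    # A real server would run speech recognition and its larger dictionary here.
    return SERVER_RESULTS.get(audio, [])


def convert(words, audio, aux_info, ask_server=fake_server):
    """Convert locally when every word is known; otherwise delegate to the server."""
    commands = []
    for word in words:
        command = LOCAL_DICTIONARY.get(word)
        if command is None:
            # Unknown word: send the whole utterance and auxiliary data.
            return ask_server(audio, aux_info)
        commands.append(command)
    return commands


def to_signal(command, destination):
    # Device-specific conversion stays on the device, so the server never
    # learns which concrete equipment is being controlled.
    return f"{destination}:{command}"
```

In this sketch, the server returns only abstract commands; the mapping to a concrete device happens in `to_signal` on the device side, using the locally stored output destination.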

According to the second embodiment, in addition to the effects of the first embodiment, voice and auxiliary signal information that the device 100 alone cannot convert into commands can be converted using the larger amount of information held by the server. Moreover, because the conversion of commands into signals for the control target device is handled locally, private information about which specific devices are being controlled can be kept hidden from the server.

In the description above, the commands converted by the server 200 are returned to the device 100 that requested the conversion. However, the server 200 may instead transmit the converted commands to a device other than the device 100. In that case, the device 100 may be configured to indicate the IP address of the destination device to the server 200.

100 Device
101 Voice input means
102 Voice recognition means
103 Command conversion means
104 Command conversion dictionary storage means
105 Command output means
106 Conversion auxiliary information storage means
107 Auxiliary signal input means
108 Auxiliary signal recognition means
109 Learning instruction means
110 Output destination storage means
200 Server
203 Command conversion means
204 Command conversion dictionary storage means
206 Conversion auxiliary information storage means
211 Communication means

Claims

1. A voice control system that generates, in response to a request expressed by a single natural sentence input by voice, a plurality of consecutive control commands instructing a control target device to operate in accordance with the request, and outputs the plurality of consecutive control commands to the control target device, wherein
a command conversion dictionary that directly links words extracted from the natural sentence to the control commands is held in storage means, and conversion auxiliary information for converting words that are not directly linked to the control commands into parameters for generating the control commands is stored in storage means,
the voice control system comprises command conversion means that, for a word not directly linked to a control command, refers to the conversion auxiliary information, converts the word into a parameter for generating the control command, and generates the control command,
the command conversion dictionary registers the plurality of control commands in association with execution priorities, and
the command conversion means instructs command output means to output the plurality of consecutive control commands generated from the single natural sentence in accordance with the priorities.

2. The voice control system according to claim 1,
further comprising auxiliary signal input means for inputting an auxiliary signal other than the voice uttered by a user, wherein the command conversion means generates the control command based on the word and the auxiliary signal.

3. A voice control system comprising the voice control system according to claim 1 or 2 as a device, and a server that communicates with the device, wherein
the server stores, in storage means, a server-side command conversion dictionary whose information amount exceeds that of the command conversion dictionary stored in the device, and server-side conversion auxiliary information whose information amount exceeds that of the conversion auxiliary information stored in the device,
when the device cannot generate the control command from the word using the command conversion dictionary or the conversion auxiliary information stored in the device, the device transmits audio information of the natural sentence to the server,
the server extracts the word from the audio information of the natural sentence received from the device, converts the word into a control command by referring to the server-side command conversion dictionary and the server-side conversion auxiliary information, and transmits the control command to the device, and
the device outputs the control command received from the server to the control target device.

4. The voice control system according to claim 3, wherein
when transmitting the audio information to the server, the device also transmits to the server information based on the auxiliary signal input from the auxiliary signal input means, and
the server generates the control command based on both the word and the information based on the auxiliary signal.

5. The voice control system according to claim 3 or 4, wherein
after receiving a control command from the server, the device converts the control command into a command signal that the control target device can interpret and outputs the command signal to the control target device.

6. The voice control system according to any one of claims 1 to 5, wherein
when the control target device executes scripts, the voice control system writes the plurality of consecutive control commands in a single script and outputs the script to the control target device.

7. A voice control program for a voice control system that generates, in response to a request expressed by a single natural sentence input by voice, a plurality of consecutive control commands instructing a control target device to operate in accordance with the request, and outputs the plurality of consecutive control commands to the control target device, wherein
a command conversion dictionary that directly links words extracted from the natural sentence to the control commands is held in storage means, and conversion auxiliary information for converting words that are not directly linked to the control commands into parameters for generating the control commands is stored in storage means, and
the command conversion dictionary registers the plurality of control commands in association with execution priorities,
the program causing a computer to execute:
a process of, for a word not directly linked to a control command, referring to the conversion auxiliary information, converting the word into a parameter for generating the control command, and generating the control command; and
a process of instructing output of the plurality of consecutive control commands generated from the single natural sentence in accordance with the priorities.