JP2020074043A

JP2020074043A - Electronic apparatus and control method for the same

Info

Publication number: JP2020074043A
Application number: JP2020018508A
Authority: JP
Inventors: 秀人井澤; Hideto Izawa; 玲子嘉和知; Reiko Kawachi; 邦朗本沢; Kuniaki Motosawa; 弘之野本; Hiroyuki Nomoto
Original assignee: Toshiba Visual Solutions Corp
Current assignee: Toshiba Visual Solutions Corp
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2020-05-14
Anticipated expiration: 2036-10-12
Also published as: JP6858334B2

Abstract

To provide an electronic apparatus that controls devices connected through a network. SOLUTION: An electronic apparatus 332 determines the execution of control of individual devices 310, 320, 340 on the basis of the content of a second voice that is input after the input of a first voice, depending on the content of the first voice input from the outside. The apparatus includes: management means that creates and manages voice data for determination, used for determining that the first voice is a desired voice, and that determines that the first voice is the desired voice by using the voice data for determination; and control means for executing the control of the devices on the basis of the content of the second voice. The control means executes the control of the devices in accordance with a determination result obtained by the management means, has an output part for outputting, in voice, the determination result obtained by the management means, and, in accordance with the determination result obtained by the management means, outputs the result from the output part. The management means has a determination criterion 2 with a plurality of criteria, and changes a content output from the output part in accordance with any of the plurality of criteria of the determination criterion 2 satisfied by the determination result. SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an electronic device that controls a plurality of devices by voice, and to a control method for the same, in the field of home automation for homes, offices, and small business establishments.

In the field of home automation, voice recognition devices and methods already exist for operating and controlling various devices in homes, offices, and small business establishments by voice input.

Such a voice recognition device and method analyze voice input by a user to determine whether the input voice is one that turns on a function of the device and, when it is, analyze the content of the voice that follows and perform processing based on the analysis result. Some also recognize the characteristics of the input voice to identify the user who spoke and perform processing suited to that user.

International Publication No. 2015/029379
International Publication No. 2015/033523

In one form of home automation system, the individual devices are connected to one another over a home network, and a host device that controls the connected devices as a whole is also connected to the network. In this case, the host device controls the operation of each networked device and collects information about each device, managing it so that the user can view it in one place.

By giving instructions to the host device, for example by voice, the user can control each device connected to the host device over the network and view information about each connected device in one place.

Because a home automation system of this form makes it easy to connect controlled devices to the network, the number and variety of connected devices tend to grow. Devices also tend to join the network, change settings, and leave the network frequently as controlled devices are added, changed, upgraded, relocated, or disposed of. In addition, because the connected devices vary widely in operation and specification, home automation systems tend to be used by people of all ages and genders, both at home and in the office. This tendency has become increasingly pronounced with the recent miniaturization of devices and sensors offering a wide variety of functions.

Conventional home automation systems, however, have not been adequate at controlling such a wide variety of devices or at serving such a broad range of users. For example, when a home automation system is used at home, it cannot be said that devices are controlled in a way that sufficiently matches the lifestyle of each family member.

The present embodiment has been made in view of the above problem, and its object is to propose an electronic device, and a control method for the same, that controls a wide variety of devices connected over a network so as to better match each user's individual lifestyle.

The electronic device of the embodiment is an electronic device that determines, depending on the content of a first voice input from outside, whether to execute control of one or more devices based on the content of a second voice input after the first voice. The electronic device comprises: management means that creates and manages determination voice data, used to determine that the first voice is a desired voice, from voices input from outside a plurality of times, and that uses the managed determination voice data to determine that the first voice is the desired voice; and control means that executes control of the one or more devices based on the content of the second voice. When the management means determines, using the determination voice data, that the first voice is the desired voice, the control means executes control of the one or more devices based on the content of the second voice. The electronic device has an output unit that outputs the determination result of the management means as voice, and when the management means determines, using the determination voice data, that the first voice is the desired voice, the output unit outputs that fact. When determining, using the determination voice data, that the first voice is the desired data, the management means has a determination criterion 2 comprising a plurality of criteria, and changes the content output from the output unit according to which of the plurality of criteria of the determination criterion 2 the determination result satisfies.

FIG. 1 is a diagram showing an example of an overview of a home automation system according to an embodiment.
FIG. 2 is a list showing other examples of sensors according to the embodiment.
FIG. 3 is a diagram showing examples of a host device according to the embodiment.
FIG. 4 is a functional block diagram of the host device according to the embodiment.
FIG. 5A is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 5B is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 6A is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 6B is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 7A is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 7B is a diagram showing an example of a processing sequence for registering a reserved word according to the embodiment.
FIG. 8A is a diagram showing an example of a processing sequence for recognizing a reserved word according to the embodiment.
FIG. 8B is a diagram showing an example of a processing sequence for recognizing a reserved word according to the embodiment.
FIG. 9A is a diagram showing an example of a processing sequence for recognizing a reserved word according to the embodiment.
FIG. 9B is a diagram showing an example of a processing sequence for recognizing a reserved word according to the embodiment.
FIG. 10A is a diagram showing an example of a processing sequence in which, after a reserved word is recognized, the corresponding device or sensor is controlled based on the words for controlling devices and sensors that the user subsequently utters.
FIG. 10B is a diagram showing an example of a processing sequence in which, after a reserved word is recognized, the corresponding device or sensor is controlled based on the words for controlling devices and sensors that the user subsequently utters.
FIG. 11A is a diagram showing an example of a processing sequence for the case where, after a reserved word is recognized, the words the user subsequently utters to control devices and sensors continue within a fixed time.
FIG. 11B is a diagram showing an example of a processing sequence for the case where, after a reserved word is recognized, the words the user subsequently utters to control devices and sensors continue within a fixed time.
FIG. 12A is a diagram showing an example of a processing sequence for the case where, after a reserved word is recognized, the words the user subsequently utters to control devices and sensors continue beyond the fixed time.
FIG. 12B is a diagram showing an example of a processing sequence for the case where, after a reserved word is recognized, the words the user subsequently utters to control devices and sensors continue beyond the fixed time.
FIG. 13 is a list showing in detail the content of the control information used when controlling devices and sensors after a reserved word is recognized.
FIG. 14 is a list showing examples of operation content that is changed according to each of a plurality of reserved words.
FIG. 15A is a diagram showing an example of a processing sequence in which, when a plurality of reserved words are registered, the operation content to be changed according to each reserved word is registered together with it.
FIG. 15B is a diagram showing an example of a processing sequence in which, when a plurality of reserved words are registered, the operation content to be changed according to each reserved word is registered together with it.
FIG. 16A is a diagram showing an example of a processing sequence in which, when a reserved word is recognized, operation content is set according to that reserved word.
FIG. 16B is a diagram showing an example of a processing sequence in which, when a reserved word is recognized, operation content is set according to that reserved word.
FIG. 17 is a list showing examples of operation content set according to the words that follow a reserved word.
FIG. 18A is a diagram showing an example of a processing sequence in which, when a registered reserved word is recognized, operation content is set according to the words that follow the reserved word.
FIG. 18B is a diagram showing an example of a processing sequence in which, when a registered reserved word is recognized, operation content is set according to the words that follow the reserved word.
FIG. 18C is a diagram showing an example of a processing sequence in which, when a registered reserved word is recognized, operation content is set according to the words that follow the reserved word.
FIG. 18D is a diagram showing another example of a processing sequence in which, when a registered reserved word is recognized, operation content is set according to the words that follow the reserved word.
FIG. 18E is a diagram showing another example of a processing sequence in which, when a registered reserved word is recognized, operation content is set according to the words that follow the reserved word.
FIG. 19A is a diagram showing an example of a processing sequence in which, when a reserved word is recognized, operation content is set according to the words that follow the recognized reserved word.
FIG. 19B is a diagram showing an example of a processing sequence in which, when a reserved word is recognized, operation content is set according to the words that follow the recognized reserved word.
FIG. 20 is a list showing examples of the types of voice recognition dictionary used according to the reserved word when a plurality of reserved words are recognized.
FIG. 21A is a diagram showing an example of a processing sequence for changing the type of voice recognition dictionary used according to the reserved word when a plurality of reserved words are recognized.
FIG. 21B is a diagram showing an example of a processing sequence for changing the type of voice recognition dictionary used according to the reserved word when a plurality of reserved words are recognized.
FIG. 22 is a list showing examples in which, when a plurality of reserved words are recognized, the operation content set according to the reserved word and the words that follow it, and the type of voice recognition dictionary used, are changed.
FIG. 23 is a list showing examples in which the type of voice recognition dictionary is changed according to content other than a reserved word.
FIG. 24 is a diagram showing a processing sequence for registering the type of voice recognition dictionary to be switched to according to content other than a reserved word.
FIG. 25 is a diagram showing a processing sequence for the case where the type of voice recognition dictionary registered according to content other than a reserved word is changed.
FIG. 26 is a list showing examples of reserved words (for relief) used to display reserved words when the user has forgotten a registered reserved word, together with the corresponding range of reserved words displayed.
FIG. 27 is a functional block diagram of the host device according to the embodiment.
FIG. 28 is a diagram showing an example of the passage of time when the host device 332 records audio or video of a registration scene or a recognition scene, that is, when a scene in which a reserved word, an additional word, or additional information is registered occurs, or when a scene in which a reserved word or an additional word is recognized occurs.
FIG. 29 is a diagram showing an example of how the data to be played back is displayed when the recorded audio or video data of each scene is played back.

FIG. 1 is a diagram showing an example of the overall configuration of the home automation system according to the present embodiment. The home automation system consists of a cloud server 1 made up of a group of servers placed in the cloud; a home 3 in which various sensors 310, various facility devices 320, and various home appliances 340 are arranged and connected to one another by a network 333 via a host device 332 having an HGW (Home Gateway) function; and the Internet 2, which connects the cloud server 1 and the host device 332.

The home 3 is a home, office, or small business establishment, of any scale, in which the various sensors 310, various facility devices 320, and various home appliances 340 are arranged and connected to one another by the in-home network 333 via the host device 332 having the HGW function.

The host device 332 has functions for controlling the devices and sensors connected to the network 333, based on preset information and on information reported by the sensors connected to the network 333, and for centrally managing information about each device and sensor.

The host device 332 also has a microphone and can capture words spoken by the user 331. When the host device 332 recognizes a predetermined keyword (hereinafter, a reserved word) among the words spoken by the user 331, it captures the words that the user 331 speaks after the reserved word and analyzes their content. Based on the analysis result, it returns a response to the user 331 or controls the devices and sensors connected to the network 333.

Conversely, unless the host device 332 recognizes a reserved word among the words spoken by the user 331, it does not go on to capture what the user 331 says. This prevents the host device 332 from picking up unwanted ambient sound and acting on it.

Recognition of the reserved word is performed within the host device 332. The words that the user 331 speaks after the reserved word are captured continuously, and analysis of their content is performed in the cloud server 1. The functions of the host device 332 are described in detail later.
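The gating behavior described above can be pictured with the following minimal sketch, which is illustrative only and not part of the disclosed embodiment. It assumes that utterances arrive as text rather than audio, that the reserved word is matched by simple equality, and that a single command follows each reserved word; the names `ReservedWordGate` and `analyze_remotely` are hypothetical, with `analyze_remotely` standing in for the analysis performed in the cloud server 1.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReservedWordGate:
    """Forwards utterances to an analyzer only after a reserved word is heard."""
    reserved_word: str                      # hypothetical: text stands in for audio
    analyze_remotely: Callable[[str], str]  # stand-in for analysis in cloud server 1
    armed: bool = False
    responses: List[str] = field(default_factory=list)

    def on_utterance(self, utterance: str) -> None:
        if not self.armed:
            # The reserved word is recognized locally on the host device.
            # Anything else is dropped, so ambient speech never triggers control.
            self.armed = (utterance == self.reserved_word)
            return
        # Words following the reserved word are captured and analyzed remotely.
        self.responses.append(self.analyze_remotely(utterance))
        self.armed = False  # assumption: one command per reserved word


gate = ReservedWordGate("hello host", lambda text: f"analyzed: {text}")
for spoken in ["turn on the light", "hello host", "turn on the light"]:
    gate.on_utterance(spoken)
print(gate.responses)  # ['analyzed: turn on the light']
```

The first "turn on the light" is ignored because no reserved word preceded it; only the second one reaches the analyzer.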

For convenience of explanation, the various facility devices 320 are devices that are not easy to move, and the various home appliances 340 are devices that are relatively easy to move. The names of the facility devices and home appliances given as examples do not limit the capabilities or functions of the individual devices.

Specific examples of the various sensors 310 are a security camera 311, a fire alarm 312, a motion sensor 313, and a temperature sensor 314. Specific examples of the various facility devices 320 are an intercom 325, lighting 326, an air conditioner 327, and a water heater 328. Specific examples of the various home appliances 340 are a washing machine 341, a refrigerator 342, a microwave oven 343, a fan 344, a rice cooker 345, and a television 346.

FIG. 2 shows other examples of the various sensors 310 shown in FIG. 1.

FIG. 3 shows various examples of the host device 332 shown in FIG. 1.

The host device 332-1 is the host device 332 shown in FIG. 1 and is an example of a stationary type with a built-in HGW function. The host device 332-1 is connected through the network 333 to the other devices and sensors arranged in the home 3, and is connected through the Internet 2 to the cloud server 1. Being a stationary type, the host device 332-1 is an example that has no means of autonomous movement, such as a motor.

The host device 332-2 is an example of a stationary type without a built-in HGW function. The host device 332-2 is therefore connected to an HGW 330 through the network 333. Via the HGW 330, the host device 332-2 is connected through the network 333 to the other devices and sensors arranged in the home 3, and through the Internet 2 to the cloud server 1. Being a stationary type, the host device 332-2 is an example that has no means of autonomous movement, such as a motor.

The host device 332-3 is an example of a movable type with a built-in HGW function. The host device 332-3 is connected through the network 333 to the other devices and sensors, and is connected through the Internet 2 to the cloud server 1. Being a movable type, the host device 332-3 is an example equipped with a means of autonomous movement, such as a motor.

The host device 332-4 is an example of a movable type without a built-in HGW function. The host device 332-4 is therefore connected to the HGW 330 through the network 333. Via the HGW 330, the host device 332-4 is connected through the network 333 to the other devices and sensors, and through the Internet 2 to the cloud server 1. Being a movable type, the host device 332-4 is an example equipped with a means of autonomous movement, such as a motor.
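The four host device variants differ along two axes: whether the HGW function is built in, and whether the device is stationary or movable. As an illustrative summary only, the sketch below models the variants as records and derives the path from each variant to the cloud server 1. The class name `HostVariant` and the method `path_to_cloud` are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostVariant:
    name: str
    builtin_hgw: bool  # True: the host itself provides the HGW function
    movable: bool      # True: has a means of autonomous movement (e.g. a motor)

    def path_to_cloud(self) -> List[str]:
        """Hops between this host device and cloud server 1."""
        hops = [self.name]
        if not self.builtin_hgw:
            # Without a built-in HGW, traffic first reaches HGW 330 over network 333.
            hops += ["network 333", "HGW 330"]
        return hops + ["Internet 2", "cloud server 1"]


VARIANTS = [
    HostVariant("332-1", builtin_hgw=True, movable=False),
    HostVariant("332-2", builtin_hgw=False, movable=False),
    HostVariant("332-3", builtin_hgw=True, movable=True),
    HostVariant("332-4", builtin_hgw=False, movable=True),
]

for variant in VARIANTS:
    print(variant.name, "->", " / ".join(variant.path_to_cloud()[1:]))
```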

FIG. 4 shows the functional blocks of the host device 332 shown in FIG. 1. The host device 332 has a system controller 402 that controls all internal processing, a control management unit 401 that controls each function under it, a trigger setting unit 403, a trigger recognition unit 405, an input management unit 420, and a network I/F 427 for connecting to the network 333. The control management unit 401 consists of an APP-Mg 401-1, which manages a plurality of applications for controlling the various operations of the host device 332, and a CONF-Mg 401-2, which manages settings such as the initial settings, state settings, and operation settings of each functional block of the host device 332.

As interfaces (I/F) with the user 331, the host device 332 also has a microphone 421 for capturing words spoken by the user 331, a speaker 423 for outputting responses to the user 331 by voice, and a display unit 425 for notifying the user 331 of the state of the host device 332.

The microphone 421 is connected to the input management unit 420. Depending on a state that it manages internally, the input management unit 420 controls whether voice data input from the microphone 421 is sent to the trigger setting unit 403, the trigger recognition unit 405, or a voice processing unit 407. The display unit 425 notifies the user 331 of the state of the host device 332 and is, for example, an LED (Light Emitting Diode) or an LCD (Liquid Crystal Display).
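The routing role of the input management unit 420 can be sketched as a small state-to-destination table. This is a rough illustration under stated assumptions: the state names in `InputState` and the particular mapping in `ROUTES` are hypothetical, chosen only to show how one internally managed state can select among the three destinations named above.

```python
from enum import Enum, auto
from typing import Tuple


class InputState(Enum):
    REGISTERING = auto()  # hypothetical: reserved word registration in progress
    LISTENING = auto()    # hypothetical: waiting for a reserved word
    COMMAND = auto()      # hypothetical: reserved word recognized, capturing a command


# Assumed mapping from internal state to destination unit, for illustration only.
ROUTES = {
    InputState.REGISTERING: "trigger setting unit 403",
    InputState.LISTENING: "trigger recognition unit 405",
    InputState.COMMAND: "voice processing unit 407",
}


class InputManager:
    """Stand-in for input management unit 420."""

    def __init__(self, state: InputState = InputState.LISTENING) -> None:
        self.state = state

    def route(self, audio: bytes) -> Tuple[str, bytes]:
        # The destination depends only on the internally managed state.
        return ROUTES[self.state], audio


manager = InputManager()
print(manager.route(b"\x00\x01")[0])  # trigger recognition unit 405
manager.state = InputState.REGISTERING
print(manager.route(b"\x00\x01")[0])  # trigger setting unit 403
```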

The memory 410 is divided into three areas: an operation mode storage area 410-1, a reserved word storage area 410-2, and a voice storage area 410-3. The information stored in each area is described later.

As described above, when the host device 332 recognizes a reserved word among the words spoken by the user 331, it captures the words the user 331 speaks after that reserved word, analyzes their content, and, according to the analysis result, returns a response to the user 331 or controls the operation of the devices and sensors connected through the network 333.

To realize these functions, the host device 332 performs four main processes. The first is registration of a reserved word. The second is recognition of a reserved word. The third is registration of the control content for the devices and sensors whose operation is to be controlled. The fourth is control of the devices and sensors whose control content has been registered.

First, registration of a reserved word, the first process, is described.
The host device 332 has a function for registering a reserved word in the host device 332. For this purpose, the host device 332 has a mode for registering reserved words (hereinafter, reserved word registration mode).

FIGS. 5A and 5B show an example of the processing sequence of the host device 332, from the start to the completion of reserved word registration, in a state where the host device 332 has transitioned to the reserved word registration mode in order to register a reserved word.

The host device 332 may allow the mode to be changed by recognizing words that the user 331 speaks in an order predetermined for changing modes. Alternatively, a menu screen may be displayed on the display unit 425 so that the user 331 can change the mode by operating it. Alternatively, the user 331 may change the mode by operating a menu screen for changing the mode of the host device 332 that is displayed on a smartphone or tablet connected via the network I/F 427.

When the user 331 utters a word to be registered as a reserved word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S501). The input management unit 420 has a function of determining the transfer destination of the input voice data according to an internally managed state. When the host device 332 is in the reserved word registration mode, the input management unit 420 transfers the received voice data to the trigger setting unit 403 (S502). The trigger setting unit 403 stores the received voice data in the voice storage area 410-3 of the memory 410 (S503) and checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S504).

If this check shows that the number of times the voice of the user 331 has been captured has not reached the prescribed number, the trigger setting unit 403 displays a prompt for the user 331 to utter the word to be registered (S507) and sends an input continuation notification to the input management unit 420 (S506). Upon receiving the input continuation notification, the input management unit 420 transitions its internal state to waiting for voice input from the microphone (S500).

The prompt for the user 331 to input the word to be registered is preferably given in a form that the user 331 can perceive. For example, the trigger setting unit 403 sends a registration incomplete notification to the display device 425 (S505), and the display device 425, upon receiving it, blinks a light emitting diode (LED) in red (S507). The user 331 may instead be prompted by voice rather than by display. In this case, the trigger setting unit 403 sends the registration incomplete notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "Please input again." Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly rotates through, for example, a fixed angular range.

If the check shows that the number of times the voice of the user 331 has been captured has reached the prescribed number, the trigger setting unit 403 reads out the voice data stored so far in the voice storage area 410-3 (S508) and sends it via the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 on the cloud server 1 (S509).

The recognition data conversion unit 101-1 converts the voice data sent from the trigger setting unit 403 into recognition data used to recognize the reserved word (S510). When the conversion is complete, the recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 via the Internet 2 (S511). Upon receiving the recognition data, the trigger setting unit 403 stores it in the reserved word storage area 410-2 of the memory 410 (S512).

The trigger setting unit 403 then displays an indication informing the user 331 that registration of the reserved word is complete (S514). This indication is preferably given in a form that the user 331 can perceive. For example, the trigger setting unit 403 sends a registration completion notification to the display device 425 (S514), and the display device 425, upon receiving it, lights the LED in green. Alternatively, the trigger setting unit 403 may notify the user 331 by voice rather than by display. In this case, the trigger setting unit 403 sends the registration completion notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "Registration is complete." Alternatively, the trigger setting unit 403 may use both the display method and the voice method to notify the user 331 that registration of the reserved word is complete. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly moves in a straight line over, for example, a fixed distance.

As described above, the trigger setting unit 403 is responsible for managing the flow of data during reserved word registration.
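
The batch registration flow of FIGS. 5A and 5B can be sketched as follows. This is a minimal illustration rather than the implementation described in this specification: the class name `TriggerSetter`, the `convert` callback standing in for the recognition data conversion unit 101-1, and the returned status strings are all hypothetical, and clearing the buffer after registration is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TriggerSetter:
    """Hypothetical stand-in for the trigger setting unit 403."""
    required_count: int                       # the prescribed number of utterances
    convert: Callable[[List[bytes]], bytes]   # stand-in for conversion unit 101-1
    voice_storage: List[bytes] = field(default_factory=list)        # area 410-3
    reserved_word_store: List[bytes] = field(default_factory=list)  # area 410-2

    def on_voice(self, sample: bytes) -> str:
        self.voice_storage.append(sample)                    # S503: store the utterance
        if len(self.voice_storage) < self.required_count:    # S504: count check
            return "registration_incomplete"                 # S505-S507: prompt again
        # S508-S511: send every stored utterance for conversion in one batch
        recognition_data = self.convert(list(self.voice_storage))
        self.reserved_word_store.append(recognition_data)    # S512: save the result
        self.voice_storage.clear()                           # assumption: reset buffer
        return "registration_complete"                       # S514: notify the user
```

In this sketch the caller would map `"registration_incomplete"` to the red blinking LED and `"registration_complete"` to the green LED.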

FIGS. 6A and 6B show another example of the sequence from the start of reserved word registration to its completion. The voice data captured by the host device 332 may be insufficient for registration as a reserved word. This example shows the processing performed when the captured data is insufficient.

The processes S600 to S615 shown in FIGS. 6A and 6B are the same as the processes S500 to S515 shown in FIGS. 5A and 5B, respectively. The processing in FIGS. 6A and 6B differs from that in FIGS. 5A and 5B in that the processes S616 to S619 are added in FIG. 6B.

The trigger setting unit 403 checks whether the number of times the words uttered by the user 331 have been captured has reached the prescribed number (S604). If it determines that the prescribed number has been reached, it reads out the voice data stored so far in the voice storage area 410-3 (S608) and sends it via the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 on the cloud server 1 (S609).

If the trigger setting unit 403 determines that the number of times the words uttered by the user 331 have been captured has not reached the prescribed number, it displays a prompt for the user 331 to utter the word to be registered (S607) and sends an input continuation notification to the input management unit 420 (S606). Upon receiving the input continuation notification, the input management unit 420 transitions its internal state to waiting for voice input from the microphone (S600).

The prompt for the user 331 to input the word to be registered is preferably given in a form that the user 331 can perceive. For example, the trigger setting unit 403 sends a registration incomplete notification to the display device 425 (S605), and the display device 425, upon receiving it, blinks the LED in red (S607). The user 331 may instead be prompted by voice rather than by display. In this case, the trigger setting unit 403 sends the registration incomplete notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "Please input again." Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly rotates through, for example, a fixed angular range.

When converting all the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the voice data can be converted into recognition data (S616). If it determines that some of the voice data cannot be converted into recognition data, the recognition data conversion unit 101-1 sends a voice data addition request to the trigger setting unit 403 via the Internet 2 (S617). Upon receiving the voice data addition request, the trigger setting unit 403 sets the number of additional times the user 331 is to input the word to be registered as a reserved word (S618) and sends an input continuation notification to the input management unit 420 (S619).

At the point when the trigger setting unit 403 has set the number of additional inputs required from the user 331 (S618), the display unit 425, for example its LED, remains lit in red. Following this indication, the user 331 utters the word to be registered as a reserved word the number of times additionally set in S618.

Upon receiving the input continuation notification (S619), the input management unit 420 transitions its internal state to waiting for input (S600) and waits for the user 331 to utter a word.
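
The additional-input loop of FIGS. 6A and 6B can be sketched as follows. The specification does not say how the number of additional inputs is chosen in S618, so this sketch assumes it equals the number of utterances that could not be converted. The function name `register_with_retries` and the `is_convertible` predicate are hypothetical.

```python
from typing import Callable, Iterator, List


def register_with_retries(
    utterances: Iterator[bytes],
    required_count: int,
    is_convertible: Callable[[bytes], bool],
) -> List[bytes]:
    """Collect utterances until `required_count` of them are convertible."""
    usable: List[bytes] = []
    needed = required_count
    while needed > 0:
        # S600-S609: capture `needed` utterances and send them as one batch
        batch = [next(utterances) for _ in range(needed)]
        # S616: the conversion side checks which samples can be converted
        usable.extend(sample for sample in batch if is_convertible(sample))
        # S617-S618: request as many extra utterances as were rejected (assumed rule)
        needed = required_count - len(usable)
    return usable
```

Each pass of the `while` loop corresponds to one round trip to the cloud server, and the loop ends once no further utterances are requested.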

In the processes shown in FIGS. 5A and 5B and in FIGS. 6A and 6B, the captured voice data is sent in a batch to the recognition data conversion unit 101-1 on the cloud server 1 after the number of times the input management unit 420 has captured the voice uttered by the user 331 reaches the prescribed number. Instead, the captured voice data may be sent to the recognition data conversion unit 101-1 each time the input management unit 420 captures a voice uttered by the user 331. FIGS. 7A and 7B show an example of the sequence in which, each time the input management unit 420 captures a voice uttered by the user 331, the captured voice data is sent in turn to the recognition data conversion unit 101-1 on the cloud server 1 and converted into recognition data.

The processes S700 to S702 shown in FIG. 7A are the same as the processes S500 to S502 shown in FIG. 5A, respectively. The processes S703 and S704 shown in FIG. 7A are the same as the processes S505 and S507 shown in FIG. 5A, respectively.

When the user 331 utters a word to be registered as a reserved word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S701). Because the host device 332 is in the reserved word registration mode, the input management unit 420 transfers the received voice data to the trigger setting unit 403 (S702). Each time the trigger setting unit 403 receives voice data, it sends that data to the recognition data conversion unit 101-1 on the cloud server 1 (S706). When converting the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the voice data can be converted into recognition data (S707).

If the recognition data conversion unit 101-1 determines that the voice data cannot be converted into recognition data, it sends a voice data addition request to the trigger setting unit 403 via the Internet 2 (S708). Upon receiving the voice data addition request (S708), the trigger setting unit 403 checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S714). If this check shows that the prescribed number has not been reached, the trigger setting unit 403 continues to display the prompt for the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), thereby transitioning the input management unit 420 to waiting for voice input from the microphone (S700). Upon receiving the input continuation notification (S715), the input management unit 420 transitions its internal state to waiting for input (S700) and waits for the user 331 to utter a word.

If the recognition data conversion unit 101-1 determines that the voice data can be converted into recognition data (S707), it converts the voice data into recognition data (S709). After this conversion (S709), the recognition data conversion unit 101-1 determines whether all the recognition data, including data converted earlier, together ensure sufficient accuracy for recognizing voice data input from the microphone 421 as the reserved word (S710).

If the recognition data conversion unit 101-1 determines that all the recognition data together ensure sufficient accuracy for recognizing voice data input from the microphone 421 as the reserved word, it sends to the trigger setting unit 403, via the Internet 2, the recognition data with added information indicating that the recognition data is sufficient (recognition data with a sufficiency notification), so that the user 331 can stop uttering the word to be registered as a reserved word (S711). Upon receiving the recognition data with the sufficiency notification, the trigger setting unit 403 recognizes that the recognition data received so far is sufficient for recognizing voice data input from the microphone 421 as the reserved word, and stops prompting the user 331 to input the word to be registered, even if the number of times the voice of the user 331 has been captured has not reached the prescribed number (S712). The trigger setting unit 403 stores all the recognition data received so far in the reserved word storage area 410-2 (S716) and sends a registration completion notification to the input management unit 420, the display unit 425, and the recognition data conversion unit 101-1 (S717, S718, S719). In this way, depending on the accuracy of the converted recognition data, the user 331 can stop uttering the word to be registered as a reserved word before the number of captures reaches the prescribed number, which makes reserved word registration more flexible. The prescribed number can be changed by the user 331 as a setting value of the host device 332, and can also be changed as one item of the additional information described later.

If the recognition data conversion unit 101-1 determines that the recognition data created so far does not ensure sufficient accuracy for recognizing voice data input from the microphone 421 as the reserved word, it sends only the converted recognition data to the trigger setting unit 403 (S713). Upon receiving the recognition data, the trigger setting unit 403 checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S714). If this check shows that the prescribed number has not been reached, the trigger setting unit 403 continues to display the prompt for the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), thereby transitioning the input management unit 420 to waiting for voice input from the microphone (S700).

The prompt for the user 331 to input the word to be registered is preferably given in a form that the user 331 can perceive. For example, the trigger setting unit 403 sends a registration incomplete notification to the display device 425 (S703), and the display device 425, upon receiving it, blinks the LED in red (S704). The user 331 may instead be prompted by voice rather than by display. In this case, the trigger setting unit 403 sends the registration incomplete notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "Please input again." Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly rotates through, for example, a fixed angular range.

If the trigger setting unit 403, having received the recognition data, determines in the check (S714) that the prescribed number has been reached, it sends a registration completion notification to the input management unit 420, the display unit 425, and the recognition data conversion unit 101-1 (S717, S718, S719). Upon receiving the registration completion notification (S718), the recognition data conversion unit 101-1 clears the converted recognition data that it had temporarily stored in order to perform the process of S710.
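
The per-utterance flow of FIGS. 7A and 7B, including the early stop when the recognition data is already accurate enough, can be sketched as follows. The names `register_incrementally`, `convert`, and `is_accurate_enough` are hypothetical, and returning the collected data when the prescribed number is reached is an assumption about what is saved in that case.

```python
from typing import Callable, Iterable, List, Optional


def register_incrementally(
    utterances: Iterable[bytes],
    max_count: int,
    convert: Callable[[bytes], Optional[bytes]],
    is_accurate_enough: Callable[[List[bytes]], bool],
) -> List[bytes]:
    """Send each utterance for conversion as it arrives; stop early if possible."""
    recognition_data: List[bytes] = []
    captured = 0
    for sample in utterances:
        captured += 1                        # S701-S706: capture and send one sample
        converted = convert(sample)          # S707/S709: None means "cannot convert"
        if converted is not None:
            recognition_data.append(converted)
            if is_accurate_enough(recognition_data):   # S710: accuracy check
                break                        # S711-S712: stop prompting the user
        if captured >= max_count:            # S714: prescribed number reached
            break
    return recognition_data                  # S716: data to store in area 410-2
```

Compared with the batch flow, this variant trades more network round trips for the chance to finish registration with fewer utterances.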

Next, recognition of reserved words, which is the second process of the host device 332, will be described.

When the host device 332 recognizes a reserved word among the words uttered by the user 331, it analyzes the content of the words that the user 331 utters next and controls devices and sensors based on the analysis result. To recognize the reserved word and then control devices and sensors, the host device 332 has a mode for recognizing reserved words and controlling devices and sensors (hereinafter referred to as the operation mode).

FIGS. 8A and 8B show an example of the processing sequence of the host device 332 in the operation mode, up to the point where a word uttered by the user 331 is recognized as one of the registered reserved words.

When the user 331 utters a word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S801). When the host device 332 is in the operation mode, the input management unit 420 transfers the received voice data to the trigger recognition unit 405 (S802). Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 determines whether the voice data is a reserved word by comparing it (S804) with the recognition data read from the reserved word storage area 410-2 of the memory 410 (S803).

If the trigger recognition unit 405 determines that the input voice data cannot be recognized as a reserved word (S805), it displays a prompt for the user 331 to utter the reserved word (S808) and sends an input continuation notification to the input management unit 420 (S807). The prompt is preferably given in a form that the user 331 can perceive. For example, the trigger recognition unit 405 sends a recognition incomplete notification to the display unit 425 (S806), and the display unit 425, upon receiving it, blinks the LED in red (S808). The trigger recognition unit 405 may instead prompt the user 331 for voice input by voice rather than by display. In this case, the trigger recognition unit 405 sends the recognition incomplete notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "I couldn't hear you." Alternatively, the trigger recognition unit 405 may use both the display method and the voice method to prompt the user 331 for voice input. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly rotates through, for example, a fixed angular range.

If the trigger recognition unit 405 is able to recognize the input voice data as a reserved word (S805), it displays an indication that the voice uttered by the user 331 has been recognized as a reserved word (S810). This indication is preferably given in a form that the user 331 can perceive. For example, the trigger recognition unit 405 sends a recognition completion notification to the display device 425 (S809), and the display device 425, upon receiving it, lights the LED in green (S810). The trigger recognition unit 405 may instead notify the user 331 by voice rather than by display that the uttered voice has been recognized as a reserved word. In this case, the trigger recognition unit 405 sends the recognition completion notification to the speaker 423, and the speaker 423, upon receiving it, announces to the user 331, for example, "Yes, yes" or "I heard you." Alternatively, the trigger recognition unit 405 may use both the display method and the voice method to indicate that the voice uttered by the user 331 has been recognized as a reserved word. Alternatively, if the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332 repeatedly moves in a straight line over, for example, a fixed distance.
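
The mode-dependent routing performed by the input management unit 420 and the two outcomes of FIGS. 8A and 8B can be sketched as follows. The names `Mode`, `InputManager`, and `make_recognizer`, and the status strings, are hypothetical, and the `matches_reserved_word` predicate stands in for the comparison with the stored recognition data.

```python
from enum import Enum
from typing import Callable, Dict


class Mode(Enum):
    REGISTRATION = "reserved word registration mode"
    OPERATION = "operation mode"


class InputManager:
    """Hypothetical stand-in for the input management unit 420."""

    def __init__(self, handlers: Dict[Mode, Callable[[bytes], str]]) -> None:
        self.mode = Mode.OPERATION
        self.handlers = handlers

    def on_voice(self, voice: bytes) -> str:
        # S502 / S802: the transfer destination depends on the current mode
        return self.handlers[self.mode](voice)


def make_recognizer(matches_reserved_word: Callable[[bytes], bool]) -> Callable[[bytes], str]:
    """Build a stand-in for the trigger recognition unit 405."""
    def recognize(voice: bytes) -> str:
        if matches_reserved_word(voice):      # S803-S805: compare with stored data
            return "recognition_complete"     # S809-S810: e.g. LED lit in green
        return "recognition_incomplete"       # S806-S808: e.g. LED blinking red
    return recognize
```

A caller would construct `InputManager` with one handler per mode and switch `mode` when the user changes modes.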

FIGS. 9A and 9B show another example of the processing sequence of the host device 332 in the operation mode, up to the point where a word uttered by the user 331 is recognized as one of the registered reserved words.

The sequence example of FIGS. 9A and 9B differs from that of FIGS. 8A and 8B in that a recognition probability is taken into account in the process of recognizing a reserved word. The recognition probability is the level to which the two match when the recognition data is compared with feature points, such as frequency components and intensity, of the voice data transferred from the input management unit 420. The processes S900 to S912 shown in FIGS. 9A and 9B are the same as the processes S800 to S812, respectively. The processing in FIGS. 9A and 9B differs from that in FIGS. 8A and 8B in that the processes S913 to S916 are added.

Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S903) and compares it with the voice data transferred from the input management unit 420 (S904).

If the trigger recognition unit 405 determines that the input voice data has been recognized as a reserved word (S905), it proceeds to the recognition probability determination process (S913).

In the voice recognition process performed here, the trigger recognition unit 405 compares the recognition data read from the reserved word storage area 410-2 of the memory 410 with the feature points, such as frequency components and intensity, of the voice data transferred from the input management unit 420. When the two match at or above a certain level, it determines that the voice data transferred from the input management unit 420 is the recognition data.

When comparing the recognition data with the feature points, such as frequency components and intensity, of the voice data transferred from the input management unit 420, the host device 332 may also use a plurality of thresholds for judging the level at which the two match. In this way, when recognizing a reserved word among the words uttered by the user, the host device 332 is not limited to the two-way decision "reserved word recognized / reserved word not recognized". It can add a decision for words that are close to a reserved word but are not the correct reserved word, for example "reserved word recognized / reserved word not recognized / reserved word cannot be said to have been recognized". Providing a plurality of recognition probability thresholds has the following advantage. When, for example, the user 331 does not remember the reserved word exactly and repeatedly utters words close to it, the host device 332 that captures those words responds according to the decision "reserved word cannot be said to have been recognized", and the user 331 who sees the response can work toward the correct reserved word.

FIGS. 9A and 9B show an example in which two recognition probability thresholds are provided. Let threshold 1 be the threshold at or above which a reserved word is recognized, and threshold 0 be the threshold below which a reserved word is not recognized. If the comparison in S904 yields a recognition probability equal to or greater than threshold 1, the decision is that the reserved word was recognized. If the recognition probability is equal to or greater than threshold 0 and less than threshold 1, the decision is that the reserved word cannot be said to have been recognized. If the recognition probability is less than threshold 0, the decision is that the reserved word was not recognized. Accordingly, the process of S905 compares the recognition probability with threshold 0, and the process of S913 compares the recognition probability with threshold 1.
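The two-threshold decision described above can be sketched as follows. This is only an illustrative sketch: the names `Recognition` and `classify`, and any numeric threshold values passed in, are assumptions and do not appear in the specification.

```python
from enum import Enum


class Recognition(Enum):
    RECOGNIZED = "recognized"                 # probability >= threshold 1
    NEAR_MISS = "cannot be said recognized"   # threshold 0 <= probability < threshold 1
    NOT_RECOGNIZED = "not recognized"         # probability < threshold 0


def classify(probability: float, threshold0: float, threshold1: float) -> Recognition:
    """Three-way decision built from the S905 and S913 comparisons."""
    if threshold0 > threshold1:
        raise ValueError("threshold0 must not exceed threshold1")
    if probability < threshold0:   # S905: compare with threshold 0
        return Recognition.NOT_RECOGNIZED
    if probability < threshold1:   # S913: compare with threshold 1
        return Recognition.NEAR_MISS
    return Recognition.RECOGNIZED
```

With hypothetical thresholds of 0.4 and 0.8, a probability of 0.5 falls in the middle band, which is the case that triggers the prompt to repeat the reserved word.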

When the host device 332 determines that the recognition probability is equal to or greater than threshold 0 and less than threshold 1 (S913), it displays an indication prompting the user 331 to utter the reserved word (S915) and sends an input continuation notification to the input management unit 420 (S916). For this indication, the trigger recognition unit 405 sends an insufficient-recognition notification to the display unit 425 (S914), and the display unit 425 that receives it desirably uses a display method the user 331 can recognize, for example blinking an LED in green (S915).

When the recognition probability is low in this way, making the indication that prompts the user 331 to utter the reserved word different from the indication shown when recognition fails (S908) and the indication shown when recognition succeeds (S910) lets the user 331 recognize that the uttered word was close to the reserved word but was not the correct reserved word.

The trigger setting unit 403 may also prompt the user 331 for voice input by voice instead of by display. In this case, the trigger recognition unit 405 sends the insufficient-recognition notification to the speaker 423 (S914), and the speaker 423 that receives it may announce to the user 331, for example, "Did you call?". Alternatively, the trigger recognition unit 405 may use both display and voice to prompt the user 331 for voice input. Alternatively, when the host device 332 is movable, the trigger setting unit 403 may instruct a moving unit (not shown) to make the host device 332 rotate back and forth repeatedly over, for example, a fixed angular range.

Next, the third process of the host device 332, registration of the control contents of the devices and sensors whose operation is to be controlled, and the fourth process, control of the devices and sensors whose control contents have been registered, will be described.

First, an overview of the control of devices and sensors using the host device 332 will be described.

When the host device 332 recognizes a reserved word among the words uttered by the user 331, it continues to capture the words the user utters after the reserved word is recognized, and controls devices and sensors by analyzing the content of the captured words.

FIGS. 10A and 10B show an example of the processing sequence in which, after recognition of the reserved word is complete, the host device controls a device or sensor based on the content of voice data that is captured from the microphone 421 and contains control content for the device or sensor. Because recognition of the reserved word is complete, the internal state of the input management unit 420 has transitioned to "recognized" (S1000).

When the user 331 utters words containing content for controlling a device or sensor, the host device 332 captures the voice data (control content) into the input management unit 420 (S1002) through the microphone 421 (S1001). Because its internal state is "recognized", the input management unit 420 transfers the input voice data (control content) to the voice processing unit 407 (S1002). The voice processing unit 407 sends the transferred voice data (control content) over the Internet 2 to the voice-to-text conversion unit 101-2 in the voice recognition cloud 101 on the cloud server 1.

The voice-to-text conversion unit 101-2 converts the voice data sent over the Internet 2 into text data (S1004). Through this process, the voice uttered by the user 331 and originally captured through the microphone 421 is converted into text data.

When the conversion to text data is complete, the voice-to-text conversion unit 101-2 stores the converted text data internally and sends a conversion completion notification to the voice processing unit 407 (S1005).

Upon receiving the conversion completion notification, the voice processing unit 407 sends a text analysis request to the voice-to-text conversion unit 101-2 (S1006). Upon receiving the text analysis request, the voice-to-text conversion unit 101-2 sends it to the text analysis unit 102-1 together with the internally stored text data (S1007). Upon receiving the text analysis request (S1007), the text analysis unit 102-1 analyzes the content of the accompanying text data (S1008). When the analysis is complete, the text analysis unit 102-1 sends the result to the response/action generation unit 102-2 as a text analysis result notification (S1009). Upon receiving the text analysis result (S1009), the response/action generation unit 102-2 generates, based on its content, the target device and a command for controlling that device (S1010), and sends the generated command to the voice processing unit 407 as a response/action generation result notification (S1011).

Upon receiving the response/action generation result notification (S1011), the voice processing unit 407 identifies the device or sensor to be controlled and its control content from the content of the notification (S1012). The voice processing unit 407 converts the identified target and control content into a format that the target device or sensor can recognize, and sends it to the target device or sensor as an action notification through the network 333 at the required timing (S1013).

Upon receiving the action notification (S1013), the device or sensor to be controlled, which is the destination of the action notification, operates based on the control content contained in it (S1014).

When the user 331 speaks continuously, the host device 332 treats the continuous speech as one series of utterances and can capture it without requiring the user 331 to utter the reserved word partway through. Conversely, when the user 331 speaks again after a certain interval, the host device 332 requests input of the reserved word again. Each case will be described with reference to FIGS. 11A and 11B and FIGS. 12A and 12B.

FIGS. 11A and 11B show an example of the processing sequence when, after recognition of the reserved word is complete, the user 331 speaks continuously within the time T0. When the host device 332 captures voice data (control content) input from the microphone 421 into the input management unit 420 (S1101), the input management unit 420 starts the input interval confirmation timer T. When the next voice data (control content) uttered by the user 331 is captured into the input management unit 420 through the microphone 421 at a time T1 before the input interval confirmation timer T expires (= T0) (S1121), the input management unit 420 transfers the captured voice data (control content) to the voice processing unit 407 (S1122). At the same time, it restarts the running input interval confirmation timer T. The voice processing unit 407 sends the transferred voice data (control content) over the Internet 2 to the voice-to-text conversion unit 101-2 in the voice recognition cloud 101 on the cloud server 1 (S1123). Thereafter, the voice recognition cloud 101 continues processing the voice data sent to it (S1123) in the same way as in S1104 to S1110.

The input interval confirmation timer T is started when the input management unit 420 captures the voice data input from the microphone 421, but this is not a limitation. For example, it may be started when the input management unit 420 transfers the data sent from the microphone 421 to the trigger setting unit 403 or the voice processing unit 407. It may also be started when the internal state of the input management unit 420 transitions to "recognized" (S1100).

FIGS. 12A and 12B show an example in which the user 331 does not speak continuously within the time T0. When the host device 332 captures voice data (control content) input from the microphone 421 into the input management unit 420 (S1201), the input management unit 420 starts the input interval confirmation timer T. Once the expiry time (= T0) of the input interval confirmation timer T has passed, the input management unit 420 transitions its internal state to "waiting for input" (S1220).

When the host device 332 captures the next voice data input from the microphone 421 after the expiry time (= T0) of the input interval confirmation timer T has passed (S1224), it does not control any device or sensor based on the captured voice data, and instead displays an indication prompting the user 331 to utter the reserved word.

When the input interval confirmation timer T expires, the input management unit 420 transitions its internal state to "waiting for input" (S1220) and sends a timeout notification to the voice processing unit 407 (S1221). Upon receiving the timeout notification, the voice processing unit 407 sends a recognition-incomplete notification to the display unit 425 (S1222), and the display unit 425 that receives it displays an indication prompting the user 331 to utter the reserved word, for example by blinking an LED in red (S1223).

When the next voice data input from the microphone 421 is captured after the input interval confirmation timer T has expired (S1224), the input management unit 420 transitions its internal state to "recognizing" (S1225) and transfers the captured voice data to the trigger recognition unit 405 (S1226). Thereafter, the host device 332 performs the processes of S803 to S812 in FIGS. 8A and 8B or the processes of S903 to S916 in FIGS. 9A and 9B, and recognizes the reserved word again.
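The timer behavior of FIGS. 11A to 12B can be illustrated with a small state machine. This is a simplified sketch under stated assumptions: the class `InputManager`, its string-valued states and return values, and the use of plain numbers for time are all hypothetical. It also checks expiry lazily when the next voice data arrives, whereas the specification describes a timeout notification being sent at the moment the timer expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputManager:
    t0: float                                  # expiry time T0 of timer T (value is an assumption)
    state: str = "recognized"                  # state after reserved word recognition (S1100)
    timer_started_at: Optional[float] = None   # None until timer T is first started

    def on_voice(self, now: float) -> str:
        """Return the unit that voice data arriving at time `now` is routed to."""
        expired = (
            self.timer_started_at is not None
            and now - self.timer_started_at > self.t0
        )
        if self.state == "recognized" and expired:
            self.state = "waiting for input"   # S1220
        if self.state == "recognized":
            self.timer_started_at = now        # start or restart timer T
            return "voice processing unit"     # S1122
        # After a timeout the reserved word must be recognized again (S1225, S1226).
        self.state = "recognizing"
        self.timer_started_at = None
        return "trigger recognition unit"
```

Each utterance that arrives within T0 of the previous one restarts the timer, so a user who keeps speaking is never asked for the reserved word again.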

Next, registration of the control contents used to control devices and sensors with the host device 332, and control of the devices and sensors based on the registered control contents, will be described.

FIG. 13 shows a specific example of the control information that the host device 332 uses when, after recognizing a reserved word, it controls the various sensors 310, the various facility devices 320, and the various home appliances 340 as shown in the sequence diagrams of FIGS. 10A and 10B.

Item 1 is a specific example of the information for controlling the various sensors 310, facility devices 320, and home appliances 340 (hereinafter referred to as response/action information) that is contained in the response/action generation result notification sent from the response/action generation unit 102-2. This response/action information consists of a "target", which is the device, sensor, or the like that the host device 332 controls, and a "command", which expresses how that target is to be controlled. Upon receiving the response/action generation result notification, the host device 332 extracts the response/action information contained in it and controls the target device based on its content.

Examples of the "command" include a "start command" that starts (operates) the device to be controlled, a "stop command" that ends (stops) it, an "operation change command" that changes the current operation (mode), and a "setting change command" that changes content (a mode) preset in the target device.

For the response/action generation unit 102-2 to generate the response/action information included in the response/action generation result notification, the user 331 must register in advance, in the response/action generation unit 102-2 as an initial setting of the host device 332, each combination of a device to be controlled, its control content, and the words to be spoken to the host device 332 to make it control that device. Registration of the response/action information in the initial setting of the host device 332 is described below using the example of FIG. 13.

Item 2 is the "target", the device controlled through the host device 332. The "target" is the identification name of a device or sensor included in the various sensors 310, facility devices 320, or home appliances 340; air conditioner 1 is given as a specific example.

Item 3 is the "command", the control content for the device shown in item 2. As specific examples, the commands for the air conditioner 1 given in item 2 are listed: a "start command" that starts the air conditioner, a "stop command" that stops it, an "operation change command" that changes its operation, and a "setting change command" that changes its settings.

The product specifications of each device and sensor for items 2 and 3 are stored in advance on a product specification cloud server (not shown) that holds product specification information. The user 331 obtains from the product specification cloud server the product specification information for items 2 and 3 of the target device or sensor to be controlled through the host device 332.

Next, the user 331 decides item 4, the "phrase", which is the words spoken to the host device 332 to execute the control content of items 2 and 3 through the host device 332. The "phrase" desirably corresponds to the command for the air conditioner 1 listed in item 3. The examples given are "エアコンつけて" ("Turn on the air conditioner") for the "start command" that starts the air conditioner, "エアコンけして" ("Turn off the air conditioner") for the "stop command" that stops it, "ドライにして" ("Set it to dry") for the "operation change command" that changes the operation from "cooling" to "dry", and "夜１０時にエアコンつけて" ("Turn on the air conditioner at 10 p.m.") for the "setting change command" that changes the operation start time setting to "start operation at 10 p.m.".

The user 331 creates the combinations (target, command, phrase) decided above as the initial setting of the host device 332. The user 331 does the same for every device to be controlled through the host device 332, and finally generates a response/action information list that gathers the (target, command, phrase) entries for all devices to be controlled. The created response/action information list is registered in the response/action generation unit 102-2 through the host device 332.
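One way to picture the response/action information list is as a flat list of (target, command, phrase) records. The sketch below is illustrative only: the type `ResponseAction`, the helper `phrases_for`, and the English command and phrase strings are assumptions standing in for the entries of FIG. 13.

```python
from typing import List, NamedTuple


class ResponseAction(NamedTuple):
    target: str    # item 2: identification name of the device or sensor
    command: str   # item 3: control content for that target
    phrase: str    # item 4: words the user speaks to the host device


# Entries for "air conditioner 1", following the examples given for FIG. 13
# (phrases shown as English glosses of the Japanese utterances).
RESPONSE_ACTION_LIST: List[ResponseAction] = [
    ResponseAction("air conditioner 1", "start", "turn on the air conditioner"),
    ResponseAction("air conditioner 1", "stop", "turn off the air conditioner"),
    ResponseAction("air conditioner 1", "change operation: dry", "set it to dry"),
    ResponseAction("air conditioner 1", "change setting: start at 22:00",
                   "turn on the air conditioner at 10 p.m."),
]


def phrases_for(target: str) -> List[str]:
    """Return every registered phrase for one target device."""
    return [entry.phrase for entry in RESPONSE_ACTION_LIST if entry.target == target]
```

Entries for further devices would simply be appended to the same list before it is registered.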

Once the response/action information list is registered in the response/action generation unit 102-2, the host device 332 can, as shown in FIGS. 10A and 10B, control devices and sensors by continuing to capture and analyze the words uttered by the user 331 after recognition of the reserved word is complete.

For example, when the user 331 says "エアコンつけて" ("Turn on the air conditioner"), the voice-to-text conversion unit 101-2 converts the input voice data into the text "えあこんつけて", and the text analysis unit 102-1 analyzes the text data "えあこんつけて" as meaning "エアコンつけて". Based on this analysis result, the response/action generation unit 102-2 refers to the already registered response/action information list and searches for the response/action information corresponding to the analyzed "phrase" "エアコンつけて". It thereby extracts the response/action information (target = air conditioner 1, command = start operation), sets it in the response/action generation result notification, and sends the notification to the voice processing unit 407.
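The lookup step performed by the response/action generation unit can be sketched as a dictionary search keyed by the analyzed phrase. This is a hypothetical illustration: the table `REGISTERED`, the function `generate_action`, and the English phrase strings are assumptions, and the speech-to-text and text analysis stages are taken as already done.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registered list: analyzed phrase -> (target, command).
REGISTERED: Dict[str, Tuple[str, str]] = {
    "turn on the air conditioner": ("air conditioner 1", "start operation"),
    "turn off the air conditioner": ("air conditioner 1", "stop operation"),
}


def generate_action(analyzed_text: str) -> Optional[Tuple[str, str]]:
    """Return (target, command) for an analyzed phrase, or None if it is not registered."""
    # Normalizing case and surrounding whitespace is an assumption of this sketch.
    return REGISTERED.get(analyzed_text.strip().lower())
```

A `None` result corresponds to a phrase that was never registered, in which case no action notification would be produced in this sketch.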

The voice processing unit 407 refers to the response/action information set in the received response/action generation result notification and controls the corresponding device or sensor among the various sensors 310, facility devices 320, and home appliances 340.

Next, cases will be described in which, when devices and sensors are controlled using the host device 332, the control content for the devices and sensors or the operation of the host device 332 is changed according to various conditions.

FIG. 14 is a list of examples of the operations that the host device 332 performs, when a plurality of reserved words are registered in it, according to the reserved word it has recognized after recognizing a word uttered by the user 331 as one of those reserved words.

A plurality of reserved words can be registered in the host device 332, and for each of them an operation to be performed when that reserved word is recognized (hereinafter referred to as additional information 1) can be set.

As shown in FIG. 14, assume that three reserved words are registered in the host device 332, for example "いろは" (Iroha), "オレ様だ" (Ore-sama da), and "息子や" (Musuko ya). When the host device 332 recognizes a word uttered by the user 331 as the reserved word "いろは", it does not change the operation already set. When it recognizes the word as the reserved word "オレ様だ", it changes its operation so that from then on, whenever it recognizes words uttered by the user 331, it announces "ご主人様喜んで" ("With pleasure, Master") through the speaker 423. When it recognizes the word as the reserved word "息子や", the host device 332 determines that the user 331 is a senior user and, because seniors tend to speak slowly, changes the setting so that the expiry time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B is longer than the normal setting.
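The per-reserved-word behavior of FIG. 14 can be sketched as a function that adjusts the host device settings. Everything concrete here is an assumption made for illustration: the names `HostSettings` and `apply_additional_info`, the romanized reserved words used as keys, the default T0 of 5 seconds, and the factor of 2 used for "longer than normal".

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_T0 = 5.0  # hypothetical normal expiry time of timer T, in seconds


@dataclass
class HostSettings:
    t0: float = DEFAULT_T0
    reply_announcement: Optional[str] = None  # announced after each recognized utterance


def apply_additional_info(reserved_word: str, settings: HostSettings) -> HostSettings:
    """Apply additional information 1 for the recognized reserved word."""
    if reserved_word == "Ore-sama da":
        # Announce a fixed reply through the speaker from now on.
        settings.reply_announcement = "With pleasure, Master"
    elif reserved_word == "Musuko ya":
        # Senior user assumed: allow longer pauses between utterances.
        settings.t0 = DEFAULT_T0 * 2
    # "Iroha" (or any other word): leave the current settings unchanged.
    return settings
```

A table mapping each reserved word to its additional information would serve equally well; the branch form is used here only to keep the three cases visible.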

The example of FIG. 14 shows the host device 332 changing its own operation, but this is not a limitation; it may also control the operation of devices and sensors connected to the host device 332 through the network 333.

To change the operation of the host device 332 according to a plurality of reserved words, the additional information 1 for each reserved word must be registered in the host device 332 in advance.

The host device 332 has a mode in which, when a reserved word is registered in the host device 332, the additional information 1 corresponding to that reserved word is registered along with it (hereinafter referred to as the reserved word registration (additional information 1) mode).

FIGS. 15A and 15B show an example of the processing sequence of the host device 332, from the start of reserved word registration to the completion of registration of the additional information 1, in a state where the host device 332 has transitioned to the "reserved word registration (additional information 1) mode" in order to register a reserved word together with its corresponding additional information 1. The processes of S1500 to S1514 shown in FIGS. 15A and 15B are the same as the processes of S500 to S514 shown in FIGS. 5A and 5B, respectively. The processing of FIGS. 15A and 15B differs from that of FIGS. 5A and 5B in that S1515 differs from S515 and in that S1516 to S1523 are added.

The trigger setting unit 403 displays an indication notifying the user 331 that registration of the reserved word is complete (S1514). For this indication (S1515), the trigger setting unit 403 sends a registration completion notification to the display device 425 (S1514), and the display device 425 that receives it desirably uses a display method the user 331 can recognize, for example blinking an LED in green. This allows the trigger setting unit 403 to prompt the user 331 to register the additional information 1.

The user 331 who sees that the LED is blinking in green (S1515) can set the additional information 1 corresponding to the reserved word whose registration was completed in S1511.

The additional information 1 may be set by having the host device 332 capture the voice uttered by the user 331 through the microphone 421 and analyze the captured voice data. Alternatively, a menu for setting the additional information 1 may be displayed on the display device 425, and the user 331 may register it by operating according to that menu. Alternatively, an external device connected via the network I/F 427 shown in FIG. 4, for example a smartphone or tablet, may be used: a menu for setting the additional information 1 corresponding to the reserved word is displayed on the screen of the smartphone or tablet, and the user 331 registers it by operating according to the displayed menu screen. FIGS. 15A and 15B show an example of the processing sequence in which a menu for setting the additional information 1 is displayed on the display unit 425 and the user 331 registers the additional information 1 by operating according to that menu.

ユーザ３３１に付加情報１の入力を促すためにＬＥＤが緑色に点滅する（Ｓ１５１５）と、表示部４２５に付加情報１を登録するためのメニューが表示される。ユーザ３３１は、表示されたメニュー画面に従って操作することで、付加情報１を作成する。作成が完了した付加情報１は、入力管理部４２０に取り込まれる（Ｓ１５１７）。入力管理部４２０は、取り込んだ付加情報１をトリガー設定部４０３に転送する。トリガー設定部４０３は、転送された付加情報１をメモリ４１０の予約語保存エリア４１０−２に保存する（Ｓ１５１９）。 When the LED blinks in green to prompt the user 331 to input the additional information 1 (S1515), a menu for registering the additional information 1 is displayed on the display unit 425. The user 331 creates the additional information 1 by operating according to the displayed menu screen. The additional information 1 that has been created is taken into the input management unit 420 (S1517). The input management unit 420 transfers the captured additional information 1 to the trigger setting unit 403. The trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1519).

なおトリガー設定部４０３は、付加情報１をメモリ４１０の予約語保存エリア４１０−２に保存する際にはＳ１５１３で登録した予約語と関連付けて保存する。 Note that the trigger setting unit 403 stores the additional information 1 in association with the reserved word registered in S1513 when storing the additional information 1 in the reserved word storage area 410-2 of the memory 410.

また、音声処理部４０７は、付加情報１の登録が完了したことをユーザ３３１に対して知らせる表示（Ｓ１５２２）を行う。付加情報１の登録が完了したことをユーザ３３１に対して知らせる表示（Ｓ１５２２）は、音声処理部４０７が表示装置４２５に対して登録完了通知を送信（Ｓ１５２０）し、その登録完了通知を受信した表示装置４２５が例えばＬＥＤを緑色で点灯させる、というようにユーザ３３１が認識できる表示方法で行うことが望ましい。 The voice processing unit 407 also performs a display (S1522) informing the user 331 that the registration of the additional information 1 has been completed. As for the display notifying the user 331 that the registration of the additional information 1 is completed (S1522), the voice processing unit 407 transmits a registration completion notification to the display device 425 (S1520), and receives the registration completion notification. It is desirable to use a display method that the user 331 can recognize, for example, the display device 425 turns on the LED in green.

図１６Ａおよび図１６Ｂは、図１５Ａおよび図１５Ｂに示す処理によりメモリ４１０の予約語保存エリア４１０−２に付加情報１が保存された場合に、ユーザ３３１が発した言葉の中から予約語の認識し、その認識した予約語の付加情報１を予約語保存エリア４１０−２から読み出して、ホスト機器３３２に対して動作を設定する場合のシーケンスの例である。 FIGS. 16A and 16B show an example of a sequence in which, after the additional information 1 has been stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 15A and 15B, a reserved word is recognized from among the words uttered by the user 331, the additional information 1 for the recognized reserved word is read from the reserved word storage area 410-2, and the corresponding operation is set in the host device 332.

図１６Ａおよび図１６Ｂに示すＳ１６００からＳ１６１２の処理は、それぞれ図８Ａおよび図８Ｂに示すＳ８００からＳ８１２の処理と同一である。図１６Ａおよび図１６Ｂの処理における図８Ａおよび図８Ｂの処理との違いは、Ｓ１６１３とＳ１６１４の処理が追加されている点である。 The processes of S1600 to S1612 shown in FIGS. 16A and 16B are the same as the processes of S800 to S812 shown in FIGS. 8A and 8B, respectively. The difference between the processes of FIGS. 16A and 16B and the processes of FIGS. 8A and 8B is that the processes of S1613 and S1614 are added.

ユーザ３３１が発した言葉を予約語として認識すると（Ｓ１６０５）、トリガー認識部４０５は、該当する予約語に対応した付加情報１をメモリ４１０の予約語保存エリア４１０−２から読み出す。付加情報１を読み出したトリガー認識部４０５は、読み出した付加情報１（Ｓ１６１３）の内容の動作をホスト機器３３２に設定する（Ｓ１６１４）。図１４に示されている例の内容が予約語保存エリア４１０−２に保存されている場合、Ｓ１６０５で予約語として「息子や」を認識した場合、トリガー認識部４０５は、Ｓ１６１４にて入力間隔確認タイマＴの満了時間Ｔ０を、通常の値をより長くするように設定する。 When a word uttered by the user 331 is recognized as a reserved word (S1605), the trigger recognition unit 405 reads the additional information 1 corresponding to that reserved word from the reserved word storage area 410-2 of the memory 410. Having read the additional information 1 (S1613), the trigger recognition unit 405 sets the operation described by it in the host device 332 (S1614). For example, if the content shown in FIG. 14 is stored in the reserved word storage area 410-2 and “Musuko-ya” is recognized as the reserved word in S1605, the trigger recognition unit 405 sets, in S1614, the expiration time T0 of the input interval confirmation timer T to a value longer than the normal value.
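The lookup-and-apply step of S1613 and S1614 might be sketched as follows. This is only an illustrative sketch: the class names, the `t0_scale` field, and the default timer value are assumptions made for the example and do not appear in the specification, which does not state a concrete value for T0.

```python
from dataclasses import dataclass, field

DEFAULT_T0_SECONDS = 5.0  # hypothetical default; the source gives no value for T0


@dataclass
class HostSettings:
    """Hypothetical stand-in for the operating settings of host device 332."""
    input_interval_t0: float = DEFAULT_T0_SECONDS


@dataclass
class ReservedWordStore:
    """Hypothetical model of reserved word storage area 410-2."""
    additional_info_1: dict = field(default_factory=dict)

    def register(self, reserved_word, info):
        # Save additional information 1 in association with the reserved word.
        self.additional_info_1[reserved_word] = info

    def lookup(self, reserved_word):
        # Corresponds to reading additional information 1 (S1613).
        return self.additional_info_1.get(reserved_word)


def apply_additional_info_1(store, settings, reserved_word):
    """Set the operation described by additional information 1 (S1614)."""
    info = store.lookup(reserved_word)
    if info is None:
        return settings  # nothing registered: keep the current settings
    if "t0_scale" in info:  # "t0_scale" is an assumed field name
        settings.input_interval_t0 = DEFAULT_T0_SECONDS * info["t0_scale"]
    return settings


store = ReservedWordStore()
store.register("息子や", {"t0_scale": 2.0})
settings = apply_additional_info_1(store, HostSettings(), "息子や")
```

Keying the store by reserved word mirrors the statement that additional information 1 is saved in association with the registered reserved word.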

図１７（Ａ）は、ユーザ３３１が発した言葉を、ホスト機器３３２に登録されている予約語として認識した場合、その認識した予約語に継続するユーザ３３１が発した言葉に応じて、ホスト機器３３２が特定の動作をする動作内容の例の一覧である。 FIG. 17A is a list of examples of operations that the host device 332 performs, when a word uttered by the user 331 is recognized as a reserved word registered in the host device 332, according to the word the user 331 utters immediately after that reserved word.

ホスト機器３３２は、ユーザ３３１が発した言葉を、登録されている予約語であると認識した場合、その認識した予約語に継続してユーザ３３１が発した言葉（以降付加語と呼ぶ）の内容に応じて動作内容（以降付加情報２と呼ぶ）を設定することが出来る。 When the host device 332 recognizes a word uttered by the user 331 as a registered reserved word, it can set operation content (hereinafter referred to as additional information 2) according to the word the user 331 utters immediately after the recognized reserved word (hereinafter referred to as an additional word).

例えば図１７（Ａ）に示すように、予約語として「いろは」が登録されているとする。この場合、ホスト機器３３２は、予約語「いろは」を認識した場合、この予約語「いろは」に続くユーザ３３１の発した言葉を認識しない場合は、既に設定されている動作内容を変更しない。ホスト機器３３２は、予約語「いろは」に続くユーザ３３１の発した言葉として「ちゃん」を認識した場合は、ユーザ３３１の機嫌がよいと判定し、スピーカ４２３を通して応答する場合は、応答する際のトーンを上げるように動作内容を変更する。また、ホスト機器３３２は、予約語「いろは」に続くユーザ３３１の発した言葉として「や」を認識した場合は、ユーザ３３１がシニアユーザであると推定し、ユーザ３３１がゆっくりと話す傾向にあるため、図１１Ａおよび図１１Ｂに示す入力間隔確認タイマの満了時間Ｔ０を通常の設定時間より長くするように変更する。またホスト機器３３２は、予約語「いろは」に続くユーザ３３１の発した言葉として「おい」を認識した場合は、ユーザ３３１が怒っていると判定し、「申し訳ございません」とスピーカ４２３を通じてすぐにアナウンスするようにする。 For example, as shown in FIG. 17A, assume that “Iroha” is registered as a reserved word. If the host device 332 recognizes the reserved word “Iroha” but does not recognize any word uttered by the user 331 after it, the host device 332 does not change the operation content that is already set. If the host device 332 recognizes “chan” following the reserved word “Iroha”, it determines that the user 331 is in a good mood and changes the operation content so that the tone of any response given through the speaker 423 is raised. If the host device 332 recognizes “ya” following “Iroha”, it estimates that the user 331 is a senior user and, because the user 331 tends to speak slowly, changes the expiration time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B to be longer than the normal setting. If the host device 332 recognizes “oi” following “Iroha”, it determines that the user 331 is angry and immediately announces “I am sorry” through the speaker 423.

図１７（Ａ）の例は、１つの予約語に対して複数の付加語を設定し予約語に対する複数の付加語の組み合わせごとに付加情報２を設定することで、ホスト機器３３２が付加情報２の内容に基づいて動作内容を変える例を示しているが、複数の予約語と複数の付加語との組み合わせごとに付加情報２を設定することも可能である。図１７（Ｂ）に示すように、例えばホスト機器３３２が予約語として「いろは」と「おおきに」「あーしんど」の３つを登録しているとする。この場合、各予約語に対して付加語を定義し、その予約語＋付加語の組み合わせごとに付加情報２を設定してもよい。 The example of FIG. 17A sets a plurality of additional words for one reserved word and sets additional information 2 for each combination of the reserved word and an additional word, so that the host device 332 changes its operation based on the content of the additional information 2. It is also possible to set additional information 2 for each combination of a plurality of reserved words and a plurality of additional words. As shown in FIG. 17B, assume for example that the host device 332 has registered three reserved words: “Iroha”, “Ookini”, and “Aa shindo”. In this case, additional words may be defined for each reserved word, and additional information 2 may be set for each combination of reserved word and additional word.

また、ユーザによっては、予約語を発するだけで、ある特定の動作をしてほしいときがある。例えば、ある個人の口癖がある場合、その口癖を予約語としてホスト機器３３２に登録し、併せてこの予約語に対応した動作をホスト機器３３２に登録することで、その個人の特性にあった機器やセンサの動作の制御を簡易に実行することができる。図１７（Ｂ）の予約語「あーしんど」の例では、「あーしんど」という予約語をホスト機器３３２が認識した場合に、ホスト機器３３２がユーザ３３１の発した言葉の中から予約語を認識しただけで、ネットワーク３３３に接続されている冷蔵庫の中に保存されているビールの情報をスピーカ４２３を通してアナウンスする、ということも可能である。 Some users may also want a specific operation to be performed simply by uttering a reserved word. For example, if an individual has a habitual phrase, registering that phrase in the host device 332 as a reserved word, together with an operation corresponding to it, makes it easy to control devices and sensors in a way suited to that individual. In the example of the reserved word “Aa shindo” in FIG. 17B, the host device 332, merely by recognizing this reserved word among the words uttered by the user 331, can announce through the speaker 423 information about the beer stored in a refrigerator connected to the network 333.
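The combinations of reserved word and additional word described for FIGS. 17A and 17B can be pictured as a lookup table keyed by the pair. The sketch below is illustrative only: the action identifiers are hypothetical labels invented for the example, and treating an unregistered additional word the same as no additional word is an assumption drawn from the statement that the host device leaves its settings unchanged when it recognizes no word after the reserved word.

```python
# Hypothetical table of additional information 2, keyed by
# (reserved word, additional word). None means "no additional word".
ADDITIONAL_INFO_2 = {
    ("いろは", None): "keep_current_settings",
    ("いろは", "ちゃん"): "raise_response_tone",
    ("いろは", "や"): "lengthen_timer_t0",
    ("いろは", "おい"): "announce_apology",
    ("あーしんど", None): "announce_refrigerator_beer_info",
}


def resolve_action(reserved_word, additional_word=None):
    """Return the action label for a recognized reserved word and additional word."""
    key = (reserved_word, additional_word)
    if key in ADDITIONAL_INFO_2:
        return ADDITIONAL_INFO_2[key]
    # Assumption: an unregistered additional word is treated as not recognized,
    # so the entry for the reserved word alone applies (None if there is none).
    return ADDITIONAL_INFO_2.get((reserved_word, None))
```

For instance, `resolve_action("いろは", "や")` selects the entry that lengthens T0, while `resolve_action("あーしんど")` selects the reserved-word-only entry.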

ホスト機器３３２は、予約語に対する付加語の内容に応じて動作を変えるために、予約語に対応した付加語と、この予約語と付加語の組み合わせに対する動作内容である付加情報２、の組み合わせを予めホスト機器３３２に登録しておく必要がある。このためホスト機器３３２は、登録済み予約語に対して、対応する付加語や付加情報を追加登録するモードを有している。ホスト機器３３２に既に登録されている予約語に対して、付加情報１を追加するモードを付加情報１追加登録モード、付加語と付加情報２を追加するモードを付加情報２追加登録モードと呼ぶこととする。 To change its operation according to the additional word that follows a reserved word, the host device 332 must have registered in advance the additional word corresponding to the reserved word together with the additional information 2, which is the operation content for that combination of reserved word and additional word. The host device 332 therefore has modes for additionally registering additional words and additional information for an already registered reserved word. The mode for adding additional information 1 to a reserved word already registered in the host device 332 is called the additional information 1 additional registration mode, and the mode for adding an additional word and additional information 2 is called the additional information 2 additional registration mode.

付加情報２の設定方法は、付加情報１の設定同様にユーザ３３１が発した音声をマイク４２１を通じてホスト機器３３２が取り込み、その取り込んだ音声データを解析することで、登録できるようにしてもよい。或いはまた表示装置４２５に、付加情報２を設定するメニューを表示させ、ユーザ３３１がその表示されたメニューに従って操作することで登録できるようにしてもよい。或いは図４に示すネットワークＩ／Ｆ４２７を経由して接続されている外部のデバイス、例えばスマートフォンやタブレットを用いて、そのスマートフォンやタブレットの表示画面に予約語および付加語に対応した付加情報２を設定するメニューを表示させ、ユーザ３３１がその表示されたメニュー画面に従って操作することで登録できるようにしてもよい。 Similarly to the setting of the additional information 1, the additional information 2 may be registered by the host device 332 capturing the voice uttered by the user 331 through the microphone 421 and analyzing the captured voice data. Alternatively, a menu for setting the additional information 2 may be displayed on the display device 425 so that the user 331 can perform registration by operating according to the displayed menu. Alternatively, by using an external device connected via the network I / F 427 shown in FIG. 4, for example, a smartphone or tablet, additional information 2 corresponding to the reserved word and the additional word is set on the display screen of the smartphone or tablet. It is also possible to display a menu to be displayed and allow the user 331 to perform registration by operating in accordance with the displayed menu screen.

図１８Ａ、図１８Ｂおよび図１８Ｃは、図１７（Ａ）（Ｂ）に示す登録済みの予約語に対して、付加語の登録とその付加語に対する動作内容（付加情報２）の登録を行う場合の処理シーケンスの例である。 FIGS. 18A, 18B, and 18C show an example of a processing sequence for registering, for the registered reserved words shown in FIGS. 17A and 17B, an additional word and the operation content (additional information 2) for that additional word.

登録済みの予約語に対する付加語を追加登録するために、ユーザ３３１はホスト機器３３２を「付加情報２追加登録モード」に変更する。ホスト機器を「付加情報２追加登録モード」に変更すると、ユーザ３３１は、ホスト機器３３２に登録済みの予約語と、その予約語に対して登録したい付加語を発する。ホスト機器３３２は、ユーザ３３１の発した言葉の中から、最初に予約語の認識を行う（Ｓ１８０５）。 In order to additionally register the additional word for the registered reserved word, the user 331 changes the host device 332 to the “additional information 2 additional registration mode”. When the host device is changed to the “additional information 2 additional registration mode”, the user 331 issues a reserved word registered in the host device 332 and an additional word to be registered for the reserved word. The host device 332 first recognizes a reserved word from words spoken by the user 331 (S1805).

ホスト機器３３２は、ユーザ３３１が発した言葉をマイク４２１を通じて入力管理部４２０に取り込む（Ｓ１８０１）。入力管理部４２０は、音声データを取り込むと内部で管理する内部状態を認識中（予約語）に遷移させる（Ｓ１８０２）とともに、入力された音声データをトリガー認識部４０５に転送する（Ｓ１８０３）。 The host device 332 takes in the words spoken by the user 331 into the input management unit 420 through the microphone 421 (S1801). When the input management unit 420 takes in the voice data, the input management unit 420 changes the internal state internally managed to being recognized (reserved word) (S1802), and transfers the input voice data to the trigger recognition unit 405 (S1803).

トリガー認識部４０５は、入力管理部４２０から転送されてきた音声データを受け取ると、メモリ４１０の予約語保存エリア４１０−２から認識用データを読み出し（Ｓ１８０４）、入力管理部４２０から転送されてきた音声データとの比較を行う（Ｓ１８０５）。トリガー認識部４０５は、入力された音声データが予約語と認識出来た場合、入力管理部４２０に認識完了通知（Ｓ１８０６）を通知する。認識完了通知を受け取った入力管理部４２０は、内部で管理する内部状態を認識中（予約語）から入力待ち（付加語）に遷移（Ｓ１８０７）させる。 When the trigger recognition unit 405 receives the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S1804), and is transferred from the input management unit 420. It is compared with the audio data (S1805). When the input voice data can be recognized as a reserved word, the trigger recognition unit 405 notifies the input management unit 420 of a recognition completion notification (S1806). Upon receiving the recognition completion notification, the input management unit 420 causes the internal state internally managed to change from recognizing (reserved word) to waiting for input (additional word) (S1807).

ホスト機器３３２は、ユーザ３３１が予約語に続いて発した言葉をマイク４２１を通じて入力管理部４２０に取り込む（Ｓ１８０８）。入力管理部４２０は、内部で管理する内部状態が入力待ち（付加語）である（Ｓ１８０７）ので、入力された音声データをトリガー設定部４０３に転送する（Ｓ１８０９）。以降、図５Ａおよび図５Ｂで説明した予約語の登録同様に、トリガー設定部４０３は、受信した音声データをメモリ４１０の音声蓄積エリア４１０−３に保存（Ｓ１８１０）しながら、規定回数の付加語の取り込みを行う（Ｓ１８１１）。 The host device 332 takes the word uttered by the user 331 after the reserved word into the input management unit 420 through the microphone 421 (S1808). Because its internally managed state is waiting for input (additional word) (S1807), the input management unit 420 transfers the input voice data to the trigger setting unit 403 (S1809). Thereafter, as in the registration of the reserved word described with reference to FIGS. 5A and 5B, the trigger setting unit 403 captures the additional word a specified number of times (S1811) while storing the received voice data in the voice storage area 410-3 of the memory 410 (S1810).
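The routing behavior of the input management unit 420 in S1801 to S1809 can be read as a small state machine. The following sketch is one possible interpretation, not the implementation of the embodiment: the `IDLE` state, the class name, and the string labels for the destination units are assumptions added for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()                       # hypothetical initial state (not named in the source)
    RECOGNIZING_RESERVED_WORD = auto()  # "recognizing (reserved word)", S1802
    WAITING_ADDITIONAL_WORD = auto()    # "waiting for input (additional word)", S1807


class InputManager:
    """Hypothetical model of input management unit 420 in registration mode."""

    def __init__(self):
        self.state = State.IDLE

    def on_voice_data(self, data):
        """Return a label for the unit the voice data is forwarded to."""
        if self.state is State.WAITING_ADDITIONAL_WORD:
            # Voice following a recognized reserved word goes to
            # the trigger setting unit 403 (S1809).
            return "trigger_setting_unit"
        # Otherwise start reserved word recognition (S1802) and forward
        # the data to the trigger recognition unit 405 (S1803).
        self.state = State.RECOGNIZING_RESERVED_WORD
        return "trigger_recognition_unit"

    def on_recognition_complete(self):
        # Recognition completion notification received (S1806 -> S1807).
        if self.state is State.RECOGNIZING_RESERVED_WORD:
            self.state = State.WAITING_ADDITIONAL_WORD
```

Keeping the destination decision inside the input manager reflects the description that the transfer target depends on the internally managed state.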

トリガー設定部４０３は、規定回数に達しているかの確認の結果規定回数に達していないと判定した場合、登録する付加語の音声の入力をユーザ３３１に促す表示を行う（Ｓ１８１２）と共に、入力管理部４２０に入力継続通知を送信する（Ｓ１８１４）。なお、付加語として登録する音声の入力をユーザ３３１に対して促す表示（Ｓ１８１３）は、トリガー設定部４０３が表示装置４２５に対して登録未完了通知を送信（Ｓ１８１２）し、その登録未完了通知を受信した表示装置４２５が例えばＬＥＤを赤色で点滅させる、というようにユーザ３３１が認識できる表示方法で行うことが望ましい。また表示による方法の代わりに音声による方法を用いて、登録する音声の入力をユーザ３３１に促してもよい。この場合トリガー設定部４０３は、スピーカ４２３に対して登録未完了通知を送信し、この登録未完了通知を受け取ったスピーカ４２３は、たとえば「もう一度入力してください」とユーザ３３１に対してアナウンスする方法でもよい。或いはトリガー設定部４０３は、ユーザ３３１に対して登録する音声の入力を促すのに、表示による方法と音声による方法の両方を用いてもよい。 If the trigger setting unit 403 determines, after checking, that the specified number of times has not been reached, it performs a display prompting the user 331 to input the voice of the additional word to be registered (S1812) and transmits an input continuation notification to the input management unit 420 (S1814). The display prompting the user 331 to input the voice to be registered as an additional word (S1813) is desirably performed in a manner the user 331 can recognize: for example, the trigger setting unit 403 transmits a registration incomplete notification to the display device 425 (S1812), and the display device 425 that receives it blinks an LED in red. A voice method may be used instead of the display method to prompt the user 331 to input the voice to be registered. In this case, the trigger setting unit 403 may transmit a registration incomplete notification to the speaker 423, and the speaker 423 that receives it may announce to the user 331, for example, “Please input again”. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the voice to be registered.

トリガー設定部４０３は、規定回数に達しているかの確認の結果規定回数に達していると判定した場合、それまでに音声蓄積エリア４１０−３に保存している音声データを読み出し（Ｓ１８１５）、インターネット２を通じてクラウドサーバ１にある音声認識クラウド１０１の中の認識用データ変換部１０１−１に送付する（Ｓ１８１６）。 If the trigger setting unit 403 determines, after checking, that the specified number of times has been reached, it reads the voice data stored so far in the voice storage area 410-3 (S1815) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 on the cloud server 1 (S1816).

認識用データ変換部１０１−１は、トリガー設定部４０３から送られてきた音声データを、付加語を認識するための認識用データに変換する（Ｓ１８１７）。認識用データへの変換が完了すると、認識用データ変換部１０１−１は、インターネット２を通じて認識用データをトリガー設定部４０３に送付（Ｓ１８１８）する。付加語を認識するための認識用データ（以降認識用データ（付加語）と呼ぶ）を受信したトリガー設定部４０３は、受信したデータをメモリ４１０の予約語保存エリア４１０−２に保存する（Ｓ１８１９）。トリガー設定部４０３は、認識用データ（付加語）を保存する際には、Ｓ１８０６で認識した予約語と関連づけて保存する。これにより、Ｓ１８０６で認識した予約語に関連付けされて認識用データ（付加語）を保存することが可能となる。 The recognition data conversion unit 101-1 converts the voice data sent from the trigger setting unit 403 into recognition data for recognizing an additional word (S1817). When the conversion to the recognition data is completed, the recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 via the Internet 2 (S1818). The trigger setting unit 403 that has received the recognition data for recognizing the additional word (hereinafter referred to as the recognition data (additional word)) stores the received data in the reserved word storage area 410-2 of the memory 410 (S1819). ). When saving the recognition data (additional word), the trigger setting unit 403 saves it in association with the reserved word recognized in S1806. This makes it possible to store the recognition data (additional word) in association with the reserved word recognized in S1806.
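The collect-then-convert flow of S1810 to S1819 might look like the sketch below. It is a simplified illustration under stated assumptions: the required sample count of 3 is hypothetical (the specification says only "a specified number of times"), the `convert` callable is a local stand-in for the recognition data conversion unit 101-1 reached over the Internet, and the returned strings are invented labels for the registration incomplete and registration complete notifications.

```python
REQUIRED_SAMPLES = 3  # hypothetical; the source does not state the specified number


class AdditionalWordRegistrar:
    """Hypothetical model of trigger setting unit 403 during additional word registration."""

    def __init__(self, convert, required=REQUIRED_SAMPLES):
        self.convert = convert        # stand-in for recognition data conversion unit 101-1
        self.required = required
        self.samples = []             # models voice storage area 410-3
        self.recognition_data = {}    # models reserved word storage area 410-2

    def add_sample(self, reserved_word, voice_data):
        self.samples.append(voice_data)  # store the received voice data (S1810)
        if len(self.samples) < self.required:
            # Specified count not reached: the user is prompted to speak again (S1812).
            return "registration_incomplete"
        # Specified count reached: read the stored samples and convert them
        # into recognition data (S1815 to S1818).
        data = self.convert(list(self.samples))
        # Save the recognition data in association with the reserved word (S1819).
        self.recognition_data.setdefault(reserved_word, []).append(data)
        self.samples.clear()
        return "registration_complete"
```

A caller could pass any function as `convert`, for example one that concatenates the samples, to exercise the flow without a network connection.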

また、トリガー設定部４０３は、付加語の登録が完了したことをユーザ３３１に対して知らせる表示（Ｓ１８２２）を行う。予約語の登録が完了したことをユーザ３３１に対して知らせる表示（Ｓ１８２２）は、トリガー設定部４０３が表示装置４２５に対して登録完了通知を送信（Ｓ１８２１）し、その登録完了通知を受信した表示装置４２５が例えばＬＥＤを緑色で点滅させる（Ｓ１８２２）、というようにユーザ３３１が認識できる表示方法で行うことが望ましい。或いはトリガー設定部４０３は、予約語の登録が完了したことをユーザ３３１に対して通知するのに、表示による方法の代わりに音声による方法を用いてもよい。この場合トリガー設定部４０３は、スピーカ４２３に対して登録完了通知を送信し（Ｓ１８２１）、この登録完了通知を受け取ったスピーカ４２３が例えば「登録が完了しました」とユーザ３３１に対してアナウンスする方法でもよい。或いはトリガー設定部４０３は、予約語の登録が完了したことをユーザ３３１に対して通知するのに、表示による方法と音声による方法の両方を用いてもよい。これにより、ユーザ３３１は、付加語に対応した動作内容である付加情報２の内容を言葉で発するタイミングを知ることができる。 The trigger setting unit 403 also performs a display (S1822) notifying the user 331 that registration of the additional word is complete. The display notifying the user 331 that registration of the reserved word is complete (S1822) is desirably performed in a manner the user 331 can recognize: for example, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S1821), and the display device 425 that receives it blinks an LED in green (S1822). Alternatively, the trigger setting unit 403 may use a voice method instead of the display method to notify the user 331 that registration of the reserved word is complete. In this case, the trigger setting unit 403 may transmit a registration completion notification to the speaker 423 (S1821), and the speaker 423 that receives it may announce to the user 331, for example, “Registration is complete”. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to notify the user 331 that registration of the reserved word is complete. The user 331 can thereby know when to utter the content of the additional information 2, which is the operation content corresponding to the additional word.

ユーザ３３１に付加情報２の入力を促すためにＬＥＤが緑色に点滅させる（Ｓ１８２２）と、表示部４２５に付加情報２を登録するためのメニューが表示される。ユーザ３３１は、表示されたメニュー画面に従って操作することで、付加情報２を作成する。作成が完了し付加情報２は、入力管理部４２０に取り込まれる（Ｓ１８２４）。入力管理部４２０は、取り込んだ付加情報２をトリガー設定部４０３に転送する（Ｓ１８２５）。トリガー設定部４０３は、転送された付加情報２をメモリ４１０の予約語保存エリア４１０−２に保存する（Ｓ１８２６）。 When the LED blinks green to prompt the user 331 to input the additional information 2 (S1822), a menu for registering the additional information 2 is displayed on the display unit 425. The user 331 creates the additional information 2 by operating according to the displayed menu screen. After the creation is completed, the additional information 2 is fetched by the input management unit 420 (S1824). The input management unit 420 transfers the captured additional information 2 to the trigger setting unit 403 (S1825). The trigger setting unit 403 stores the transferred additional information 2 in the reserved word storage area 410-2 of the memory 410 (S1826).

なおトリガー設定部４０３は、付加情報２をメモリ４１０の予約語保存エリア４１０−２に保存する際にはＳ１８０６で認識した予約語と関連付けて保存する。これにより、Ｓ１８０６で認識した予約語に関連付けされ、かつＳ１８１９で保存された付加語に関連付けされた動作内容（付加情報２）を保存することが可能となる。 The trigger setting unit 403 stores the additional information 2 in association with the reserved word recognized in S1806 when storing the additional information 2 in the reserved word storage area 410-2 of the memory 410. This makes it possible to save the operation content (additional information 2) associated with the reserved word recognized in S1806 and with the additional word stored in S1819.

登録済みの予約語に対して、付加情報だけをあとから追加することも可能である。 It is also possible to add only the additional information to the registered reserved word later.

図１８Ｄおよび図１８Ｅは、図１８Ａ、図１８Ｂおよび図１８Ｃとは異なり登録済みの予約語に対して、付加情報だけを追加する場合の処理シーケンスの例である。 18D and 18E are an example of a processing sequence in the case of adding only additional information to a registered reserved word, which is different from FIGS. 18A, 18B and 18C.

図１８Ｄに示すＳ１８５０からＳ１８５６の処理は、それぞれ図１８Ａに示すＳ１８００からＳ１８０６の処理と同一である。また、図１８Ｄおよび図１８Ｅに示すＳ１８７１からＳ１８８０の処理は、それぞれ図１８Ｃに示すＳ１８２１からＳ１８３０の処理と同一である。図１８Ａ、図１８Ｂおよび図１８Ｃのシーケンス例と図１８Ｄおよび図１８Ｅとのシーケンス例との違いは、図１８Ａ、図１８Ｂおよび図１８ＣのＳ１８０７からＳ１８２０の付加語登録処理に対応する処理が、図１８Ｄおよび図１８Ｅには無い点である。 The processes of S1850 to S1856 shown in FIG. 18D are the same as the processes of S1800 to S1806 shown in FIG. 18A, respectively. The processes of S1871 to S1880 shown in FIGS. 18D and 18E are the same as the processes of S1821 to S1830 shown in FIG. 18C, respectively. The sequence example of FIGS. 18D and 18E differs from that of FIGS. 18A, 18B, and 18C in that it has no processing corresponding to the additional word registration processing of S1807 to S1820 in FIGS. 18A, 18B, and 18C.

ユーザ３３１に付加情報１の入力を促すためにＬＥＤが緑色に点滅させる（Ｓ１８７１）と、表示部４２５に付加情報１を登録するためのメニューが表示される。ユーザ３３１は、表示されたメニュー画面に従って操作することで、付加情報１を作成する。作成が完了し付加情報１は、入力管理部４２０に取り込まれる（Ｓ１８７４）。入力管理部４２０は、取り込んだ付加情報１をトリガー設定部４０３に転送する（Ｓ１８７５）。トリガー設定部４０３は、転送された付加情報１をメモリ４１０の予約語保存エリア４１０−２に保存する（Ｓ１８７６）。 When the LED blinks in green to prompt the user 331 to input the additional information 1 (S1871), a menu for registering the additional information 1 is displayed on the display unit 425. The user 331 creates the additional information 1 by operating according to the displayed menu screen. After the creation is completed, the additional information 1 is fetched by the input management unit 420 (S1874). The input management unit 420 transfers the captured additional information 1 to the trigger setting unit 403 (S1875). The trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1876).

なおトリガー設定部４０３は、付加情報１をメモリ４１０の予約語保存エリア４１０−２に保存する際にはＳ１８５６で認識した予約語と関連付けて保存する。これにより、Ｓ１８５６で認識した予約語に関連付けされた動作内容を保存することが可能となる。 The trigger setting unit 403 stores the additional information 1 in association with the reserved word recognized in S1856 when storing the additional information 1 in the reserved word storage area 410-2 of the memory 410. As a result, it becomes possible to save the operation content associated with the reserved word recognized in S1856.

図１９Ａおよび図１９Ｂは、図１８Ａ、図１８Ｂおよび図１８Ｃに示す処理によりメモリ４１０の予約語保存エリア４１０−２に付加語及び付加情報２が保存された場合に、ユーザ３３１が発した言葉の中から予約語と付加語を認識し、その認識した予約語と付加語の組み合わせに対応する付加情報２を予約語保存エリア４１０−２から読み出して、ホスト機器３３２に対して動作を設定する場合のシーケンス例である。 FIGS. 19A and 19B show the words issued by the user 331 when the additional word and the additional information 2 are stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 18A, 18B, and 18C. When a reserved word and an additional word are recognized from the inside, the additional information 2 corresponding to the recognized combination of the reserved word and the additional word is read from the reserved word storage area 410-2, and the operation is set to the host device 332. Is an example of the sequence.

図１９Ａに示すＳ１９００からＳ１９０８の処理は、それぞれ図１６Ａに示すＳ１６００からＳ１６０８の処理と同一である。図１９Ａおよび図１９Ｂの処理における処理の図１６Ａおよび図１６Ｂの処理との違いは、Ｓ１９０９からＳ１９１１の付加語の認識の処理が追加されている点と、Ｓ１９１２からＳ１９１３の付加情報２の読み出し処理を行う点である。 The processes of S1900 to S1908 shown in FIG. 19A are the same as the processes of S1600 to S1608 shown in FIG. 16A, respectively. The processing of FIGS. 19A and 19B differs from that of FIGS. 16A and 16B in that the additional word recognition processing of S1909 to S1911 is added and in that the processing of reading the additional information 2 in S1912 to S1913 is performed.

ユーザ３１１が発した言葉を取り込んだデータに対して、図１９ＡのＳ１９０５において予約語の認識が成功すると、トリガー認識部４２０は、ユーザ３１１が発した言葉を取り込んだデータに対して、認識に成功した予約語に継続して入力された音声データが、付加語であるかの判定を判定するために、メモリ４１０の予約語保存エリア４１０−２から読み出した認識用データ（付加語）との比較を行う（Ｓ１９１１）。予約語に継続する音声データが付加語であると認識した場合、トリガー認識部４０５は、該当する予約語と付加語に対応した付加情報２をメモリ４１０の予約語保存エリア４１０−２から読み出す（Ｓ１９１２）。付加情報２を読み出したトリガー認識部４０５は、読み出した付加情報２の内容の動作をホスト機器３３２に設定する（Ｓ１９１３）。 When the reserved word is successfully recognized in S1905 of FIG. 19A in the data capturing the words uttered by the user 311, the trigger recognition unit 420 compares the voice data input immediately after the recognized reserved word with the recognition data (additional word) read from the reserved word storage area 410-2 of the memory 410, in order to determine whether that voice data is an additional word (S1911). If the voice data following the reserved word is recognized as an additional word, the trigger recognition unit 405 reads the additional information 2 corresponding to that reserved word and additional word from the reserved word storage area 410-2 of the memory 410 (S1912). Having read the additional information 2, the trigger recognition unit 405 sets the operation described by it in the host device 332 (S1913).

以上のように、ホスト機器３３２に予約語、付加語、付加情報を登録することで、ホスト機器３３２は、ホスト機器３３２の動作や、ホスト機器３３２とネットワークで接続されている機器やセンサに対する動作を自由に制御することが出来、個々人の生活スタイルにあった機器やセンサの制御が可能となる。 As described above, by registering reserved words, additional words, and additional information in the host device 332, the operation of the host device 332 itself and the operation of the devices and sensors connected to the host device 332 via the network can be controlled freely, making it possible to control devices and sensors in a way that suits each individual's lifestyle.

図２０は、ホスト機器３３２に予約語が複数登録された場合、ユーザ３３１が発した言葉の中から予約語のいずれかであると認識した場合、その認識した予約語に応じて、音声認識クラウド１０１の音声テキスト変換部１０１−２で用いる音声認識辞書を変更する例の一覧である。 FIG. 20 is a list of examples in which, when a plurality of reserved words are registered in the host device 332 and a word uttered by the user 331 is recognized as one of them, the voice recognition dictionary used by the voice text conversion unit 101-2 of the voice recognition cloud 101 is changed according to the recognized reserved word.

ホスト機器３３２は、複数の予約語を登録することが可能である。ホスト機器３３２は、ユーザ３３１が発した言葉を、登録された複数の予約語のいずれかであると認識した場合、その認識した予約語に応じて音声認識クラウド１０１の音声テキスト変換部１０１−２で用いる音声からテキストに変換するための音声認識辞書を変更することができる。例えば図２１Ａおよび図２１Ｂに示すように、ホスト機器３３２は、予約語として「こんにちは」「Ｈｅｌｌｏ」「おおきに」の３つを登録しているものとする。この場合ホスト機器３３２は、予約語「こんにちは」を認識した場合は、音声認識クラウド１０１の音声テキスト変換部１０１−２で用いる音声認識辞書を日本語辞書に変更するように命令を出すことができる。また、予約語「Ｈｅｌｌｏ」を認識した場合は、ホスト機器３３２は、音声認識クラウド１０１の音声テキスト変換部１０１−２に対して、音声認識辞書の種類を英語辞書に変更するように命令を出すことができる。さらにまた、予約語「おおきに」を認識した場合は、ホスト機器３３２は、音声認識クラウド１０１の音声テキスト変換部１０１−２で用いる音声認識辞書の種類を方言辞書（関西弁）に変更するように命令を出すことができる。 The host device 332 can register a plurality of reserved words. When the host device 332 recognizes a word uttered by the user 331 as one of the registered reserved words, it can change, according to the recognized reserved word, the voice recognition dictionary that the voice text conversion unit 101-2 of the voice recognition cloud 101 uses to convert voice into text. For example, as shown in FIGS. 21A and 21B, assume that the host device 332 has registered three reserved words: “Konnichiwa”, “Hello”, and “Ookini”. When the host device 332 recognizes the reserved word “Konnichiwa”, it can issue a command to change the voice recognition dictionary used by the voice text conversion unit 101-2 of the voice recognition cloud 101 to a Japanese dictionary. When it recognizes the reserved word “Hello”, it can issue a command to the voice text conversion unit 101-2 of the voice recognition cloud 101 to change the type of voice recognition dictionary to an English dictionary. When it recognizes the reserved word “Ookini”, it can issue a command to change the type of voice recognition dictionary used by the voice text conversion unit 101-2 of the voice recognition cloud 101 to a dialect dictionary (Kansai dialect).
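The selection of a voice recognition dictionary from the recognized reserved word amounts to a simple mapping. The sketch below is illustrative only: the dictionary identifiers and the shape of the command message are assumptions, since the specification does not define the format of the command sent to the voice text conversion unit 101-2.

```python
# Hypothetical mapping from reserved word to voice recognition dictionary type.
DICTIONARY_BY_RESERVED_WORD = {
    "こんにちは": "japanese",
    "Hello": "english",
    "おおきに": "dialect_kansai",
}


def dictionary_change_command(reserved_word):
    """Build a (hypothetical) command asking unit 101-2 to switch dictionaries.

    Returns None when no dictionary type is registered for the reserved word.
    """
    dictionary = DICTIONARY_BY_RESERVED_WORD.get(reserved_word)
    if dictionary is None:
        return None
    return {"command": "change_dictionary", "dictionary": dictionary}
```

For example, `dictionary_change_command("おおきに")` yields a command naming the Kansai dialect dictionary, and an unregistered word yields `None` so that the current dictionary is left untouched.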

ホスト機器３３２が認識した予約語に応じて音声認識クラウド１０１の音声テキスト変換部１０１−２で用いる音声認識辞書の種類を変えるためには、ユーザ３３１は、ホスト機器３３２に対して予約語を登録する際に、予約語に対応して音声テキスト変換部１０１−２で使用する音声認識辞書の種類（以降付加情報３と呼ぶ）をあわせて登録する必要がある。 In order to change the type of the voice recognition dictionary used by the voice text conversion unit 101-2 of the voice recognition cloud 101 according to the reserved word recognized by the host device 332, the user 331 registers the reserved word in the host device 332. In doing so, it is necessary to register the type of the voice recognition dictionary used by the voice text conversion unit 101-2 (hereinafter referred to as additional information 3) in association with the reserved word.

予約語に対応する音声認識辞書の種類（付加情報３）を、予約語の登録とあわせて登録する処理シーケンスは、図１５Ａおよび図１５Ｂに示す予約語に対して付加情報１を登録する処理シーケンスと同一であり、表示部４２５に表示されるメニュー画面で付加情報１を入力する（Ｓ１５１６）代わりに、付加情報３の入力画面を選択して入力すればよい。以降、図１５ＢのＳ１５１４以降の処理を用いて、付加情報３を登録する処理の流れについて説明する。図１５ＢのＳ１５１４以降に記載されている付加情報１は、付加情報３と読み替えて説明する。 The processing sequence for registering the type of voice recognition dictionary corresponding to a reserved word (additional information 3) together with the registration of the reserved word is the same as the processing sequence, shown in FIGS. 15A and 15B, for registering additional information 1 for a reserved word; instead of entering additional information 1 on the menu screen displayed on the display unit 425 (S1516), the user selects the input screen for additional information 3 and enters it there. The flow of processing for registering the additional information 3 is described below using the processing from S1514 onward in FIG. 15B, with the additional information 1 shown from S1514 onward in FIG. 15B read as additional information 3.

ユーザ３３１に付加情報３の入力を促すためにＬＥＤが緑色点滅する（Ｓ１５１４）と、表示部４２５に付加情報３を登録するためのメニューが表示される。ユーザ３３１は、表示されたメニュー画面に従って付加情報３の入力操作することで、付加情報３として辞書の種類を選択することができる。作成が完了し付加情報３は、入力管理部４２０に取り込まれる（Ｓ１５１６）。入力管理部４２０は、取り込んだ付加情報３をトリガー設定部４０３に転送する。トリガー設定部４０３は、転送された付加情報３をメモリ４１０の予約語保存エリア４１０−２に保存する。 When the LED blinks in green to prompt the user 331 to input the additional information 3 (S1514), a menu for registering the additional information 3 is displayed on the display unit 425. The user 331 can select the type of dictionary as the additional information 3 by inputting the additional information 3 according to the displayed menu screen. When the creation is completed, the additional information 3 is taken into the input management unit 420 (S1516). The input management unit 420 transfers the captured additional information 3 to the trigger setting unit 403. The trigger setting unit 403 stores the transferred additional information 3 in the reserved word storage area 410-2 of the memory 410.

When storing additional information 3 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word registered in S1513.
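The association between a reserved word and its dictionary type can be pictured with a minimal sketch. The class names, method names, and the example values below are hypothetical and are not taken from the specification; the sketch only assumes that the reserved word storage area holds one record per reserved word and that additional information 3 is attached to an already registered word.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservedWordEntry:
    """One record in a stand-in for the reserved word storage area 410-2."""
    reserved_word: str
    dictionary_type: Optional[str] = None  # plays the role of additional information 3


class ReservedWordStore:
    """Hypothetical in-memory model of the storage area (assumption, not the patent's design)."""

    def __init__(self) -> None:
        self._entries: Dict[str, ReservedWordEntry] = {}

    def register_word(self, word: str) -> None:
        # Corresponds loosely to registering the reserved word itself (S1513).
        self._entries[word] = ReservedWordEntry(word)

    def attach_dictionary_type(self, word: str, dictionary_type: str) -> None:
        # The dictionary type is stored in association with an existing reserved word.
        if word not in self._entries:
            raise KeyError(f"reserved word not registered: {word}")
        self._entries[word].dictionary_type = dictionary_type

    def dictionary_for(self, word: str) -> Optional[str]:
        entry = self._entries.get(word)
        return entry.dictionary_type if entry else None


store = ReservedWordStore()
store.register_word("ookini")                               # example value only
store.attach_dictionary_type("ookini", "dialect (Kansai)")  # example value only
```

Keeping the dictionary type on the same record as the reserved word means a single lookup at recognition time yields the dictionary to request.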

FIGS. 21A and 21B show an example of a sequence in which, when a plurality of reserved words are registered in the host device 332 as shown in FIG. 20, the type of voice recognition dictionary used by the voice text conversion unit 101-2 is changed each time one of the reserved words is recognized by the host device 332. The processes of S2100 to S2113 shown in FIGS. 21A and 21B are the same as the processes of S1600 to S1613 shown in FIGS. 16A and 16B, respectively. The difference is as follows. In the processing of FIGS. 16A and 16B, after the trigger recognition unit 403 reads additional information 1, it sets the operation of the host device 332 based on the content of that additional information 1 (S1614). In the processing of FIGS. 21A and 21B, after the trigger recognition unit 403 reads additional information 3, it exchanges messages with the voice text conversion unit 101-2 in order to change the type of voice recognition dictionary used by the voice text conversion unit 101-2 based on the content of that additional information 3 (S2114-1 to S2114-3).

The indication informing the user that recognition of the reserved word and the change of the voice recognition dictionary are complete is preferably given in a form the user 331 can perceive: for example, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S2109), and the display device 425 that has received the notification lights the LED green. Alternatively, the trigger recognition unit 405 may send a recognition completion notification to the speaker 423, and the speaker 423 that has received the notification may make a voice announcement to the user 331 such as "Yes, yes, what can I do for you? By the way, I have switched the voice recognition dictionary to the dialect dictionary (Kansai dialect)." Alternatively, to notify the user 331 that recognition of the reserved word and the change to the voice recognition dictionary corresponding to the recognized reserved word are complete, the trigger recognition unit 405 may use both the display-based method using the display device 425 and the voice-based method using the speaker 423.

Note that the operation content corresponding to a reserved word shown in FIG. 14 (additional information 1), the operation content for each additional word attached to a reserved word shown in FIGS. 17(A) and 17(B) (additional information 2), and the type of voice recognition dictionary for a reserved word shown in FIG. 20 (additional information 3) can be registered in combination.

FIG. 22 is a list of combinations for the case where the registration of operation content corresponding to a reserved word shown in FIG. 14, the registration of additional words for a reserved word shown in FIG. 17(A), the registration of operation content for each additional word, and the registration of the type of voice recognition dictionary for a reserved word shown in FIG. 20 are performed in combination. For example, for the reserved word "konnichiwa" ("hello"), the host device 332 is set to use the Japanese dictionary as the type of voice recognition dictionary. The host device 332 also registers "chan", "ya", and "oi" as additional words for the reserved word "konnichiwa". When the additional word is "chan", the host device 332 changes its operation so as to raise the tone of its response; when the additional word is "ya", it changes its settings so as to lengthen the expiration time T0 of the input interval confirmation timer T; and when the additional word is "oi", it changes its operation so as to immediately announce "I am very sorry."
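The combined registration described for FIG. 22 can be sketched as a nested lookup table. The table layout, the function name `resolve`, and the romanized keys are assumptions made for illustration; the specification does not state how the combination is stored.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: one entry per reserved word, holding the dictionary type
# (additional information 3) and per-additional-word actions (additional information 2).
COMBINED_SETTINGS: Dict[str, dict] = {
    "konnichiwa": {
        "dictionary_type": "Japanese",
        "additional_words": {
            "chan": "raise the tone of the response",
            "ya": "lengthen expiration time T0 of input interval confirmation timer T",
            "oi": "immediately announce an apology",
        },
    },
}


def resolve(reserved_word: str,
            additional_word: Optional[str] = None) -> Tuple[Optional[str], Optional[str]]:
    """Return (dictionary type, action for the additional word), or None where not registered."""
    entry = COMBINED_SETTINGS.get(reserved_word)
    if entry is None:
        return (None, None)
    action = entry["additional_words"].get(additional_word) if additional_word else None
    return (entry["dictionary_type"], action)
```

A call such as `resolve("konnichiwa", "ya")` returns both the dictionary to request and the host-side setting change, which mirrors how the two kinds of additional information apply to the same utterance.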

FIG. 23 is a list of examples in which the type of voice recognition dictionary used by the text conversion unit 101-2 is changed according to something other than a reserved word (hereinafter referred to as a change condition). For example, FIG. 23(A) is an example in which time of day is set as the change condition. It shows the host device 332 instructing the text conversion unit 101-2 of the voice recognition cloud 101 to change the type of voice recognition dictionary it uses when converting voice data into text according to the time at which the dictionary is used.

For example, the host device 332 instructs the text conversion unit 101-2 via the Internet 2 to use the family general dictionary from 05:00 to 08:00, the wife's dictionary from 08:00 to 16:00, the family general dictionary from 16:00 to 20:00, and the adult dictionary from 20:00 to 05:00.
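The time-of-day rule above can be sketched as follows. The schedule values come from the example in the text; treating each window as half-open (start inclusive, end exclusive) and the function names are assumptions, since the text does not say which dictionary applies exactly at a boundary. The last window wraps past midnight, which the helper handles explicitly.

```python
from datetime import time
from typing import List, Tuple

# (start, end, dictionary) taken from the example; boundaries are assumed half-open.
SCHEDULE: List[Tuple[time, time, str]] = [
    (time(5, 0), time(8, 0), "family general"),
    (time(8, 0), time(16, 0), "wife"),
    (time(16, 0), time(20, 0), "family general"),
    (time(20, 0), time(5, 0), "adult"),  # wraps past midnight
]


def in_window(now: time, start: time, end: time) -> bool:
    """True if start <= now < end, allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def dictionary_for_time(now: time) -> str:
    for start, end, dictionary in SCHEDULE:
        if in_window(now, start, end):
            return dictionary
    raise ValueError("schedule does not cover this time")
```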

FIG. 23(B) is an example in which the change condition is the operation status of the host device 332. The host device 332 can instruct the text conversion unit 101-2 to change the type of voice recognition dictionary it uses according to the operation status of the host device 332 at the time the dictionary is used.

For example, the host device 332 instructs the text conversion unit 101-2 via the Internet 2 to use the time and route search dictionary when the operation status is "about to leave for work", the general dictionary when the operation status is "out", and the refresh dictionary when the operation status is "night mode".
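A status-based rule is a plain mapping from operation status to dictionary type. The enum and function names below are hypothetical, and the status values are English renderings of the three examples in the text; returning `None` for an unmapped status is an assumption, as the text does not describe that case.

```python
from enum import Enum
from typing import Dict, Optional


class OperationStatus(Enum):
    LEAVING_FOR_WORK = "about to leave for work"
    OUT = "out"
    NIGHT_MODE = "night mode"
    IDLE = "idle"  # extra status with no rule, added only to show the fallback


STATUS_TO_DICTIONARY: Dict[OperationStatus, str] = {
    OperationStatus.LEAVING_FOR_WORK: "time and route search",
    OperationStatus.OUT: "general",
    OperationStatus.NIGHT_MODE: "refresh",
}


def dictionary_for_status(status: OperationStatus) -> Optional[str]:
    """Return the dictionary type for a status, or None if no rule is registered."""
    return STATUS_TO_DICTIONARY.get(status)
```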

The host device 332 has a mode for registering change condition type information, which is information on the type of voice recognition dictionary to be used under each condition (hereinafter referred to as the change condition registration mode).

To use different types of voice recognition dictionary according to change conditions, the user 331 must register the change condition type information in the host device 332 in advance.

Registration for using different types of voice recognition dictionary according to change conditions may be performed by having the host device 332 capture the voice uttered by the user 331 through the microphone 421 and analyze the captured voice data. Alternatively, a menu for setting additional information 1 may be displayed on the display device 425 so that the user 331 can register by operating according to that menu. Alternatively, an external device connected via the network I/F 427 shown in FIG. 4, such as a smartphone or tablet, may be used: a menu for setting additional information 1 for a reserved word is displayed on the screen of the smartphone or tablet, and the user 331 registers by operating according to the displayed menu screen.

FIG. 24 is an example of a processing sequence in which a menu for setting change condition type information is displayed on the display unit 425 and the user 331 operates according to that menu to register the types of voice recognition dictionary to be used under the respective change conditions. The processing of S2417 to S2423 shown in FIG. 24 is the same as the processing of S1517 to S1523 of FIG. 15B, which is the registration sequence for additional information 1.

The user 331 operates according to the displayed menu screen to input the types of voice recognition dictionary to be used under the respective change conditions. Once input is complete, the change condition type information is taken into the input management unit 420 (S2417). The input management unit 420 transfers the captured change condition type information to the trigger setting unit 403 (S2418). The trigger setting unit 403 stores the transferred change condition type information in the reserved word storage area 410-2 of the memory 410 (S2419).

FIG. 25 is an example of a processing sequence in which, when change condition type information for changing the type of voice recognition dictionary according to a change condition as shown in FIG. 23 is stored in the reserved word storage area 410-2 of the memory 410, the host device 332 notifies the voice text conversion unit 101-2 of a change of voice recognition dictionary according to the content of the stored change condition type information.

The processing of FIG. 25 is preferably performed immediately after, for example, the reserved word recognition processing shown in FIG. 9B has finished (S911). Alternatively, it is preferably performed at the timing (S1001) when, after a reserved word has been recognized, the host device 332 captures words that the user 331 has uttered to the host device 332 in order to control a device or sensor, as shown in FIGS. 10A and 10B.

FIG. 25 is an example of the latter case: the determination of whether to change the voice recognition dictionary and the notification of the result are performed at the timing (S1001) when the host device 332 captures words that the user 331 has uttered to the host device 332 in order to control a device or sensor, as shown in FIGS. 10A and 10B.

When recognition of the reserved word is complete, the host device 332 continues to capture the voice uttered by the user into the input management unit 420 through the microphone 421 (S2501). At the timing of capturing the voice data, the input management unit 420 transmits a read request (change condition type information) to the voice processing unit 407 in order to read the change condition type information (S2502), and suspends processing of the captured voice data. Upon receiving the read request (change condition type information), the voice processing unit 407 reads, from the reserved word storage area 410-2 of the memory 410, the change condition type information, which contains combinations of a change condition and a type of voice recognition dictionary (S2503). The voice processing unit 407 analyzes the "change condition" in the read change condition type information and determines whether it matches the state of the host device 332 (S2504). If it is determined to match, the voice processing unit 407 reads the "type of voice recognition dictionary" corresponding to the "change condition" and notifies the voice text conversion unit 101-2 of the new type of voice recognition dictionary via the Internet 2 by means of a voice recognition dictionary type notification (S2505). Upon receiving the voice recognition dictionary type notification, the voice text conversion unit 101-2 refers to the notified type and changes the voice recognition dictionary currently in use to the notified type (S2506). When the change of the type of voice recognition dictionary is complete, the voice text conversion unit 101-2 sends a voice recognition dictionary change completion notification to the voice processing unit 407 (S2507).

Upon receiving the voice recognition dictionary change completion notification (S2507), the voice processing unit 407 transmits a read completion notification to the input management unit 420 to indicate that reading of the change condition type information is complete (S2508). Upon receiving the read completion notification (S2508), the input management unit 420 resumes processing of the voice data captured in S2501.
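The S2501 to S2508 exchange can be sketched as a single function that pauses, evaluates the stored rules, notifies the conversion side, and resumes. Everything here is an illustrative assumption: the stub class, the rule representation as (predicate, dictionary type) pairs, and the behavior when no condition matches (the sketch simply resumes without a change, which the text does not spell out). A real system would send the notification over the network rather than call a method.

```python
from typing import Callable, Dict, List, Tuple

HostState = Dict[str, str]
Rule = Tuple[Callable[[HostState], bool], str]  # (change condition, dictionary type)


class TextConversionStub:
    """Stands in for the voice text conversion unit 101-2 on the cloud side."""

    def __init__(self, dictionary: str = "general") -> None:
        self.dictionary = dictionary

    def change_dictionary(self, dictionary_type: str) -> bool:
        self.dictionary = dictionary_type  # S2506: switch the dictionary in use
        return True                        # S2507: change completion notification


def handle_captured_voice(voice_data: bytes, host_state: HostState,
                          stored_rules: List[Rule],
                          converter: TextConversionStub,
                          log: List[str]) -> bytes:
    log.append("S2501 voice captured; processing suspended")
    # S2502/S2503: read the stored change condition type information.
    for condition, dictionary_type in stored_rules:
        # S2504: does the change condition match the current host state?
        if condition(host_state):
            # S2505 to S2507: notify the new type and wait for completion.
            converter.change_dictionary(dictionary_type)
            log.append(f"S2505 notified dictionary type: {dictionary_type}")
            break
    log.append("S2508 read complete; processing resumed")
    return voice_data  # handed on to the normal processing path unchanged


rules: List[Rule] = [(lambda s: s.get("status") == "night mode", "refresh")]
converter = TextConversionStub()
events: List[str] = []
handle_captured_voice(b"pcm", {"status": "night mode"}, rules, converter, events)
```

Suspending and resuming around the rule check ensures the captured utterance is converted with the newly selected dictionary rather than the previous one.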

The user 331 may forget a reserved word registered in the host device 332. To prepare for such cases, it is desirable that the user 331 be able to check the registered reserved words in a simple way.

FIG. 26 shows a list of examples of reserved words (hereinafter referred to as rescue reserved words) for notifying the user 331 of some or all of the registered reserved words, together with the display content (display range) for each, for the case where the user 331, having registered reserved words through the processing sequence example shown in FIGS. 5A and 5B, has forgotten them. For example, for the reserved word "I don't know", all reserved words registered in the host device 332 are displayed on the display unit 425 or in the display area of an external device connected to the host device 332. For the reserved word "tell me a little", a predetermined subset of the reserved words registered in the host device 332 is displayed on the display unit 425 or in the display area of an external device connected to the host device 332. For the reserved word "the unused ones", those reserved words registered in the host device 332 that have no usage history in the past year are displayed on the display unit 425 or in the display area of an external device connected to the host device 332. The external device connected to the host device 332 is preferably a device with a relatively large display screen that lets the user view many reserved words at once, such as a smartphone, tablet, or LCD television.
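The three display ranges described for FIG. 26 amount to three filters over the registered reserved words. The sketch below is illustrative only: the `featured` flag standing in for the "predetermined" subset, the English keys for the rescue reserved words, the 365-day approximation of one year, and the choice to count never-used words as unused are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RegisteredWord:
    text: str
    last_used: Optional[datetime]  # None means no usage history at all
    featured: bool = False         # hypothetical marker for the predetermined subset


def words_to_display(rescue_word: str, registered: List[RegisteredWord],
                     now: datetime) -> List[str]:
    """Return the reserved words to show for a given rescue reserved word."""
    if rescue_word == "I don't know":
        return [w.text for w in registered]
    if rescue_word == "tell me a little":
        return [w.text for w in registered if w.featured]
    if rescue_word == "the unused ones":
        cutoff = now - timedelta(days=365)  # one year, approximated as 365 days
        return [w.text for w in registered
                if w.last_used is None or w.last_used < cutoff]
    return []  # not a rescue reserved word
```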

A reserved word for displaying the registered reserved words in this way can be registered by changing the mode of the host device to the setting mode (reserved word (for display)) and following the reserved word registration processing sequence shown in FIGS. 5A and 5B.

In the above example, the corresponding reserved words are displayed as soon as the user utters a "rescue reserved word" shown in FIG. 26. However, the host device 332 may ask the user 331 for a password before displaying the corresponding reserved words. For example, after the user utters the "rescue reserved word", the host device 332 says "yama" ("mountain") through the speaker 423, and displays the corresponding reserved words when the user 331 responds with "kawa" ("river").
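The password exchange can be sketched as a challenge and response gate in front of the display step. The function name and the `speak` and `listen` callbacks are hypothetical stand-ins for the speaker 423 and the microphone path; exact string comparison after trimming whitespace is an assumption, since a real device would compare recognized speech.

```python
from typing import Callable, List


def reveal_reserved_words(words: List[str], challenge: str, expected_reply: str,
                          speak: Callable[[str], None],
                          listen: Callable[[], str]) -> List[str]:
    """Speak the challenge and return the words only if the reply matches."""
    speak(challenge)               # e.g. the host says "yama"
    reply = listen().strip()       # e.g. the user answers "kawa"
    if reply != expected_reply:
        return []                  # wrong password: display nothing
    return list(words)


spoken: List[str] = []
shown = reveal_reserved_words(["ookini"], "yama", "kawa",
                              spoken.append, lambda: "kawa")
```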

Furthermore, the host device 332 can make an audio or video recording of the scene in which it captures words uttered by the user 331 and registers a reserved word, an additional word, or additional information. It can likewise make an audio or video recording of the scene in which it recognizes a reserved word or an additional word.

FIG. 27 shows a functional block diagram of the host device 332 for the case where it makes an audio or video recording of the scene in which it captures words uttered by the user 331 and registers a reserved word, an additional word, or additional information, or recognizes a reserved word or an additional word. The differences from FIG. 4 are that the host device 2700 has a camera 2702 for video-recording the scene in which a reserved word, an additional word, or additional information is registered, or the scene in which a reserved word or an additional word is recognized; that the control management unit 2701 has an EVT-Mg2701-3 in addition to the APP-Mg2701-1 and the CONF-Mg2701-2; and that the system controller 402 has a playback control function for playing back the data of recorded scenes. The EVT-Mg2701-3 has a function of performing the audio or video recording described below, triggered by the occurrence of a scene in which a reserved word, an additional word, or additional information is registered, or a scene in which a reserved word or an additional word is recognized. The following describes the flow of processing in which the host device 332 captures words uttered by the user 331 and makes an audio or video recording of the scene in which a reserved word, an additional word, or additional information is registered, and the flow of processing in which it makes an audio or video recording of the scene in which a reserved word or an additional word is recognized.

FIG. 28 shows the passage of time when the host device 332 makes an audio or video recording of a registration scene or a recognition scene, triggered by the occurrence of a scene in which a reserved word, an additional word, or additional information is registered, or a scene in which a reserved word or an additional word is recognized.

Suppose that at time t1 the host device 332 starts registering words uttered by the user as a reserved word. The start of reserved word registration may be taken to be, for example, the timing at which the input management unit 420 performs the processing of S502 in the reserved word registration sequence of FIGS. 5A and 5B. On recognizing the start of reserved word registration, the input management unit 420 notifies the EVT-Mg2701-3 to that effect. On receiving the notification that reserved word registration has started, the EVT-Mg2701-3 makes an audio recording of the reserved word registration scene through the microphone 421 as Rec1, or a video recording of it through the camera 2702 as Rec1. The end of reserved word registration may be taken to be, for example, the timing at which the input management unit 420 receives the registration completion notification of S512 in the reserved word registration sequence of FIGS. 5A and 5B. On recognizing the end of reserved word registration, the input management unit 420 notifies the EVT-Mg2701-3 to that effect. On receiving the notification that reserved word registration is complete, the EVT-Mg2701-3 ends the audio recording of the reserved word registration scene made through the microphone 421, or ends the video recording of it made through the camera 2702.

Similarly, suppose that at time t2 the host device 332 starts recognizing words uttered by the user as a reserved word. The start of reserved word recognition may be taken to be, for example, the timing at which the input management unit 420 performs the processing of S802 in the reserved word recognition sequence of FIGS. 8A and 8B. On recognizing the start of reserved word recognition, the input management unit 420 notifies the EVT-Mg2701-3 to that effect. On receiving the notification that reserved word recognition has started, the EVT-Mg2701-3 makes an audio recording of the reserved word recognition scene through the microphone 421 as Rec2, or a video recording of it through the camera 2702 as Rec2. The end of reserved word recognition may be taken to be, for example, the timing at which the input management unit 420 receives the recognition completion notification of S811 in the reserved word recognition sequence of FIGS. 8A and 8B. On recognizing the end of reserved word recognition, the input management unit 420 notifies the EVT-Mg2701-3 to that effect. On receiving the notification that reserved word recognition is complete, the EVT-Mg2701-3 ends the audio recording of the reserved word recognition scene made through the microphone 421, or ends the video recording of it made through the camera 2702.

Similarly, the registration or recognition events that occur at t3 and t4 are audio- or video-recorded.
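The start and end notifications that bracket each recording can be sketched as a small event recorder. The class and method names are hypothetical and do not describe the EVT-Mg's actual interface; the sketch assumes one recording at a time and labels recordings Rec1, Rec2, and so on in the order they start, with timestamps supplied by the caller in place of real audio or video capture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recording:
    label: str                      # "Rec1", "Rec2", ...
    kind: str                       # "registration" or "recognition"
    word: str
    started_at: float
    ended_at: Optional[float] = None


class EventRecorder:
    """Hypothetical stand-in for the recording role of the EVT-Mg."""

    def __init__(self) -> None:
        self.recordings: List[Recording] = []
        self._active: Optional[Recording] = None

    def on_event_start(self, kind: str, word: str, timestamp: float) -> Recording:
        # Called when the input management unit reports that an event has started.
        if self._active is not None:
            raise RuntimeError("a recording is already in progress")
        label = f"Rec{len(self.recordings) + 1}"
        self._active = Recording(label, kind, word, timestamp)
        return self._active

    def on_event_end(self, timestamp: float) -> Recording:
        # Called when the completion notification for the event is reported.
        if self._active is None:
            raise RuntimeError("no recording in progress")
        finished = self._active
        finished.ended_at = timestamp
        self.recordings.append(finished)
        self._active = None
        return finished


recorder = EventRecorder()
recorder.on_event_start("registration", "ookini", 1.0)  # t1
recorder.on_event_end(2.0)
recorder.on_event_start("recognition", "ookini", 3.0)   # t2
last = recorder.on_event_end(4.5)
```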

The host device 332 can play back the audio- or video-recorded registration scenes and recognition scenes.

FIG. 29 shows an example of how the data available for playback is displayed when the data of each recorded scene is to be played back. In the example of FIG. 29, icons for four items of playback data are displayed, corresponding to the events occurring along the time axis of FIG. 28. These icons may be displayed on the display unit 425, for example. Alternatively, they may be displayed on an external device connected to the host device 332, such as a smartphone, tablet, or LCD television.

Each displayed icon shows the date and time of the audio or video recording and the content of the recorded data. For example, an icon reading 'reserved word registration "ookini"' indicates that the recorded data is the scene in which "ookini" was registered as a reserved word. Similarly, an icon reading 'reserved word recognition "ookini"' indicates that the recorded data is the scene in which "ookini" was recognized as a reserved word.
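An icon caption of this kind combines the recording time with the event type and the reserved word. The function name and the exact caption format below are assumptions; the text only says that the icon shows the date and time and the content of the data.

```python
from datetime import datetime


def icon_label(kind: str, word: str, recorded_at: datetime) -> str:
    """Build a caption such as: 2020-02-06 09:30 reserved word registration "ookini"."""
    return f'{recorded_at:%Y-%m-%d %H:%M} reserved word {kind} "{word}"'
```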

By selecting the icon of the data to be played back, the user 331 can check the audio- or video-recorded content of that data.

Furthermore, the host device 332 may instruct cameras and microphones connected via the network 333 to make an audio or video recording of the registration scene or recognition scene when a scene in which a reserved word, an additional word, or additional information is registered occurs, or when a scene in which a reserved word or an additional word is recognized occurs.

As already described, by recognizing a reserved word among the words uttered by the user 331, the host device 332 can control devices and sensors connected via the network based on the content of the additional information corresponding to that reserved word. The control of such devices and sensors may require a high level of security. For example, suppose that a reserved word whose additional information specifies the opening and closing of a safe door is registered in the host device 332 so that the safe door can be controlled using the host device. In this case, when the host device 332 recognizes that reserved word, it opens or closes the safe door and also uses microphones and cameras near the safe to make an audio or video recording of the surroundings of the safe, which is the device being controlled, thereby maintaining the security of the door operation. The user 331 can check the content of data recorded using network-connected microphones and cameras in the same way as data recorded using the microphone and camera built into the host device 332. When the control performed by the host device 332 on a device or sensor requires a high level of security, the host device 332 may additionally, before carrying out the control, use audio or video recorded with microphones and cameras near the controlled device or sensor to verify the legitimacy of the person who uttered the recorded voice or who appears in the recorded video. Before executing the control content of specific additional information, the host device 332 may compare pre-registered feature points of a specific person, such as voice and face, with the sound collected or video captured by microphones and cameras near the controlled device or sensor, and execute the control content only when the person's legitimacy is confirmed.
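The verify-then-execute step can be sketched as a gate that runs the control action only when observed features are close enough to the enrolled ones. The specification does not say how feature points are compared; cosine similarity over fixed-length feature vectors and the 0.9 threshold are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def execute_if_verified(enrolled: Sequence[float], observed: Sequence[float],
                        action: Callable[[], None],
                        threshold: float = 0.9) -> bool:
    """Run the control action only if the observed features match the enrolled ones."""
    if len(enrolled) != len(observed):
        raise ValueError("feature vectors must have the same length")
    if cosine_similarity(enrolled, observed) >= threshold:
        action()      # e.g. open the safe door
        return True
    return False      # legitimacy not confirmed: do nothing


opened = []
ok = execute_if_verified([1.0, 0.0], [0.99, 0.05], lambda: opened.append("open"))
```

Placing the comparison before the action, rather than only recording afterwards, means an unverified speaker never triggers the high-security control at all.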

The above embodiments have been described on the assumption that the recognition data conversion unit 101-1, the voice text conversion unit 101-2, the text analysis unit 102-1, and the response/action generation unit 102-2 all reside in the cloud server 1, but some or all of them may reside in the host device 332. In that case as well, the operation sequences of the processes are the same as the examples already described.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are also included in the invention described in the claims and its equivalents.

1 ... cloud server, 2 ... Internet, 3 ... home, 101 ... speech recognition cloud, 102 ... response action generation cloud, 310 ... various sensors, 320 ... various facility equipment, 330 ... HGW (Home Gateway), 331 ... user, 332 ... host device, 340 ... various home appliances.

Claims

An electronic device that determines, according to the content of a first voice input from outside, whether to control one or more devices based on the content of a second voice input after the input of the first voice, the electronic device comprising:
management means that creates and manages, from voice input from outside a plurality of times, determination voice data for determining whether the first voice is a desired voice, and that uses the created and managed determination voice data to determine that the first voice is the desired voice; and
control means that controls the one or more devices based on the content of the second voice, wherein, when the management means determines, using the determination voice data, that the first voice is the desired voice, the control means controls the one or more devices based on the content of the second voice,
an output unit is provided that outputs the determination result of the management means by voice, and when the management means determines, using the determination voice data, that the first voice is the desired voice, an indication to that effect is output from the output unit, and
the management means has a determination criterion 2 comprising a plurality of criteria for determining, using the determination voice data, that the first voice is the desired voice, and changes the content output from the output unit according to which of the plurality of criteria of the determination criterion 2 is satisfied by the determination result.

The electronic device according to claim 1, wherein the management means can create and manage a plurality of pieces of the determination voice data.

A control method for determining, according to the content of a first voice input from outside, whether to control one or more devices based on the content of a second voice input after the input of the first voice, wherein:
management means creates and manages, from voice input from outside a plurality of times, determination voice data for determining whether the first voice is a desired voice, and uses the created and managed determination voice data to determine that the first voice is the desired voice;
control means controls the one or more devices based on the content of the second voice, and, when the management means determines, using the determination voice data, that the first voice is the desired voice, the control means controls the one or more devices based on the content of the second voice;
an output unit outputs the determination result of the management means by voice, and when the management means determines, using the determination voice data, that the first voice is the desired voice, an indication to that effect is output from the output unit; and
the management means has a determination criterion 2 comprising a plurality of criteria for determining, using the determination voice data, that the first voice is the desired voice, and the content output from the output unit is changed according to which of the plurality of criteria of the determination criterion 2 is satisfied by the determination result.