WO2019198186A1 - Electronic device and control method for same - Google Patents

Electronic device and control method for same

Info

Publication number
WO2019198186A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
sound
user
host device
data
Prior art date
Application number
PCT/JP2018/015306
Other languages
French (fr)
Japanese (ja)
Inventor
秀人 井澤
玲子 嘉和知
邦朗 本沢
弘之 野本
Original Assignee
Toshiba Visual Solutions Corporation (東芝映像ソリューション株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Visual Solutions Corporation (東芝映像ソリューション株式会社)
Priority to PCT/JP2018/015306 priority Critical patent/WO2019198186A1/en
Priority to CN201880077613.1A priority patent/CN111656314A/en
Publication of WO2019198186A1 publication Critical patent/WO2019198186A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • Embodiments of the present invention relate to an electronic device that controls a plurality of devices by voice, and a control method thereof, in the field of home automation in homes, offices, and small-scale business establishments.
  • This voice recognition apparatus and method analyze voice input from a user to determine whether the input voice is a voice that turns on a function of the apparatus; when it is determined that it is, the content of the speech that follows is analyzed and processing based on the analysis result is performed.
  • There is also a method of identifying the user who uttered the voice by recognizing characteristics of the voice input by the user and performing processing suited to that user.
  • Each device is connected to the others via a home network, and a host device that centrally controls the plurality of connected devices is connected to the network.
  • the host device controls the operation of each device connected via the network, or manages information so that the user can centrally browse and collect information about each device.
  • The user can control each device connected to the host device via the network by, for example, instructing the host device by voice, and can centrally browse information on each connected device.
  • devices to be controlled can be easily connected via a network, so the number and types of connected devices tend to be large.
  • New participation in the network, setting changes, and withdrawal from the network occur frequently due to the addition, modification, version upgrade, relocation, and disposal of installed devices.
  • Home automation systems tend to be used in homes and offices by all kinds of users, regardless of age or gender. In particular, with the recent miniaturization of devices and sensors having a wide variety of functions, this trend has become more prominent.
  • The present embodiment has been made in view of the above problems, and its objective is to propose an electronic device, and a control method thereof, that controls various devices connected by a network in a way that matches each user's lifestyle.
  • The electronic device includes control means for controlling one or a plurality of devices based on the content of a second sound input after the input of a first sound from the outside.
  • Determination sound data for determining that the first sound is the desired sound is created and managed from sound input from the outside a plurality of times.
  • When the first sound matches the managed determination sound data, the control means controls the one or plurality of devices based on the content of the second sound.
  • FIG. 1 is a diagram illustrating an example of an overall image of a home automation system according to an embodiment.
  • FIG. 2 is a list showing another example of the sensor according to the embodiment.
  • FIG. 3 is a diagram illustrating an example of a host device according to an embodiment.
  • FIG. 4 is a functional block diagram of the host device according to the embodiment.
  • FIG. 5A is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 5B is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 6A is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 6B is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 7A is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 7B is a diagram illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
  • FIG. 8A is a diagram showing an example of a processing sequence in reserved word recognition according to an embodiment.
  • FIG. 8B is a diagram illustrating an example of a processing sequence in reserved word recognition according to an embodiment.
  • FIG. 9A is a diagram showing an example of a processing sequence in reserved word recognition according to an embodiment.
  • FIG. 9B is a diagram showing an example of a processing sequence in reserved word recognition according to an embodiment.
  • FIG. 10A is a diagram illustrating an example of a processing sequence for controlling a corresponding device or sensor based on words for controlling the device or sensor uttered continuously by a user after recognition of a reserved word according to an embodiment.
  • FIG. 10B is a diagram illustrating an example of a processing sequence for controlling a corresponding device or sensor based on words for controlling the device or sensor uttered continuously by the user after recognition of a reserved word according to an embodiment.
  • FIG. 11A is a diagram illustrating an example of a processing sequence in a case where words for controlling devices and sensors, uttered continuously by a user after a reserved word according to an embodiment, continue within a certain period of time.
  • FIG. 11B is a diagram illustrating an example of a processing sequence in a case where words for controlling devices and sensors, uttered continuously by a user after a reserved word according to an embodiment, continue within a certain period of time.
  • FIG. 12A is a diagram illustrating an example of a processing sequence in a case where words for controlling devices and sensors, uttered continuously by a user after a reserved word according to an embodiment, continue for a certain period of time.
  • FIG. 12B is a diagram illustrating an example of a processing sequence in a case where words for controlling devices and sensors, uttered continuously by a user after a reserved word according to an embodiment, continue for a certain period of time.
  • FIG. 13 is a list specifically showing the contents of control information used when controlling devices and sensors after recognizing a reserved word according to an embodiment.
  • FIG. 14 is a list showing examples of operation contents to be changed according to a plurality of reserved words according to an embodiment.
  • FIG. 15A is a diagram illustrating an example of a processing sequence for registering a plurality of reserved words according to an embodiment, in which operation contents to be changed according to each reserved word are also registered.
  • FIG. 15B is a diagram illustrating an example of a processing sequence for registering a plurality of reserved words according to an embodiment, in which operation contents to be changed according to each reserved word are also registered.
  • FIG. 16A is a diagram illustrating an example of a processing sequence for setting the operation content according to each reserved word in the recognition of the reserved word according to the embodiment.
  • FIG. 16B is a diagram illustrating an example of a processing sequence for setting operation contents according to each reserved word in the recognition of the reserved word according to the embodiment.
  • FIG. 17 is a list showing examples of operation contents set in accordance with words continuing to the reserved words in the reserved words according to the embodiment.
  • FIG. 18A is a diagram illustrating an example of a processing sequence for setting the operation content according to a word continuing to a reserved word in recognition of a registered reserved word according to an embodiment.
  • FIG. 18B is a diagram showing an example of a processing sequence for setting the operation content according to words that continue to the reserved word in the recognition of the registered reserved word according to the embodiment.
  • FIG. 18C is a diagram showing an example of a processing sequence for setting the operation content according to a word continuing to the reserved word in the recognition of the registered reserved word according to the embodiment.
  • FIG. 18D is a diagram illustrating another example of a processing sequence for setting operation contents according to words that continue to the reserved word in recognition of a registered reserved word according to an embodiment.
  • FIG. 18E is a diagram showing another example of a processing sequence for setting operation content according to words that continue to the reserved word in recognition of a registered reserved word according to an embodiment.
  • FIG. 19A is a diagram illustrating an example of a processing sequence for setting operation contents according to words that continue to the recognized reserved word in the recognition of the reserved word according to the embodiment.
  • FIG. 19B is a diagram showing an example of a processing sequence for setting the operation content according to a word continuing to the recognized reserved word in the recognition of the reserved word according to the embodiment.
  • FIG. 20 is a list showing examples of types of speech recognition dictionaries used in accordance with reserved words in the recognition of a plurality of reserved words according to an embodiment.
  • FIG. 21A is a diagram showing an example of a processing sequence for changing the type of speech recognition dictionary used in accordance with a reserved word in the recognition of a plurality of reserved words according to an embodiment.
  • FIG. 21B is a diagram showing an example of a processing sequence for changing the type of speech recognition dictionary used in accordance with a reserved word in the recognition of a plurality of reserved words according to an embodiment.
  • FIG. 22 is a list showing an example in which, in the recognition of a plurality of reserved words according to one embodiment, the words that continue to the reserved words, the operation contents set according to the reserved words, and the type of the speech recognition dictionary to be used are changed.
  • FIG. 23 is a list showing an example of changing the type of the speech recognition dictionary according to the embodiment according to contents other than reserved words.
  • FIG. 24 is a diagram showing a sequence of processing for registering the type of the speech recognition dictionary to be changed according to the content other than the reserved word in the change of the type of the speech recognition dictionary according to the embodiment.
  • FIG. 25 is a diagram showing a processing sequence when changing the type of the speech recognition dictionary to be registered according to the contents other than the reserved word in the change of the type of the speech recognition dictionary according to the embodiment.
  • FIG. 26 is a list showing an example of reserved words and corresponding relief reserved words used to display a reserved word when the user has forgotten a registered reserved word in the processing according to an embodiment.
  • FIG. 27 is a functional block diagram of a host device according to an embodiment.
  • FIG. 28 is a diagram illustrating a case where, in the processing according to an embodiment, the host device 332 detects the occurrence of a scene for registering a reserved word, additional word, or additional information, or a scene for recognizing a reserved word or additional word.
  • FIG. 29 is a diagram illustrating an example of a state in which data to be reproduced is displayed when reproducing each data of a recorded or recorded scene according to an embodiment.
  • FIG. 1 is a diagram showing an example of the overall configuration of the home automation system according to the present embodiment.
  • The home automation system includes a cloud server 1 comprising a group of servers placed in the cloud; a home 3 in which various sensors 310, various equipment devices 320, and various home appliances 340 are connected to each other via a network 333 through a host device 332 having an HGW (Home Gateway) function; and the Internet 2, which connects the cloud server 1 and the host device 332.
  • The home 3 is a home, office, or small-scale business establishment in which various sensors 310, various equipment devices 320, and various home appliances 340 are connected to each other via the home network 333 through the host device 332 having the HGW function; its size is not limited.
  • The host device 332 controls the devices and sensors connected via the network 333 based on information set in advance and on information notified from the connected sensors, and also has a function of centrally managing information about each device and sensor.
  • the host device 332 has a microphone and can capture words uttered by the user 331.
  • When the host device 332 recognizes a predetermined keyword (hereinafter referred to as a reserved word) among the words uttered by the user 331, it captures the words uttered by the user 331 following the reserved word and analyzes their content; it then returns a response corresponding to the analysis result to the user 331, or controls the devices and sensors connected via the network 333 according to the analysis result.
  • Unless the host device 332 recognizes a reserved word among the words uttered by the user 331, it does not continuously capture the user's words. This prevents the host device 332 from picking up unnecessary surrounding sounds and operating on them.
  • the recognition of reserved words is performed in the host device 332, the words uttered by the user 331 following the reserved words are continuously captured, and the contents of the captured words are analyzed in the cloud server 1. Details of the function of the host device 332 will be described later.
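The division of labor described above, in which reserved-word recognition runs locally on the host device while the words that follow are analyzed in the cloud, can be sketched roughly as follows. This is a minimal illustration only; the class, method, and return-value names (`HostDevice`, `on_utterance`, the wake phrase) are assumptions, not identifiers from the embodiment:

```python
from dataclasses import dataclass, field

@dataclass
class HostDevice:
    """Illustrative sketch: reserved-word (wake word) recognition is local;
    only speech following a recognized reserved word goes to the cloud."""
    reserved_words: set = field(default_factory=lambda: {"hello robot"})
    capturing: bool = False  # capture follow-up speech only after a reserved word

    def on_utterance(self, text: str) -> str:
        if not self.capturing:
            if text in self.reserved_words:
                self.capturing = True      # reserved word recognized locally
                return "wake"
            return "ignored"               # unnecessary surrounding sound is dropped
        self.capturing = False
        return self.analyze_in_cloud(text) # follow-up words are analyzed in the cloud

    def analyze_in_cloud(self, text: str) -> str:
        # Placeholder for the cloud-server analysis (assumed interface).
        return f"cloud:{text}"

host = HostDevice()
assert host.on_utterance("turn on the light") == "ignored"
assert host.on_utterance("hello robot") == "wake"
assert host.on_utterance("turn on the light") == "cloud:turn on the light"
```

The key design point the patent describes is visible here: no surrounding speech reaches the cloud until a reserved word has been recognized on-device.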
  • For convenience of explanation, the various equipment devices 320 refer to devices that are not easily moved, while the various home appliances 340 refer to devices that are relatively easy to move.
  • the names of the exemplified equipment and home appliances do not limit the capabilities and functions of the individual devices.
  • Specific examples of the various sensors 310 include a security camera 311, a fire alarm 312, a human sensor 313, and a temperature sensor 314.
  • Specific examples of the various equipment devices 320 include an interphone 325, an illumination 326, an air conditioner 327, and a water heater 328.
  • Specific examples of the various home appliances 340 include a washing machine 341, a refrigerator 342, a microwave oven 343, a fan 344, a rice cooker 345, and a television 346.
  • FIG. 2 shows another example of the various sensors 310 shown in FIG. 1.
  • FIG. 3 shows various examples of the host device 332 shown in FIG. 1.
  • the host device 332-1 is the host device 332 shown in FIG. 1, and is an example of a stationary type having a built-in HGW function.
  • The host device 332-1 is connected to the other devices and sensors arranged in the home 3 through the network 333, and is connected to the cloud server 1 through the Internet 2. Since the host device 332-1 is a stationary type, no autonomous moving means such as a motor is mounted.
  • the host device 332-2 is an example of a stationary type that does not have a built-in HGW function. Therefore, the host device 332-2 is connected to the HGW 330 through the network 333.
  • The host device 332-2 is connected to the other devices and sensors disposed in the home 3 via the network 333 through the HGW 330, and is connected to the cloud server 1 via the Internet 2 through the HGW 330. Since the host device 332-2 is a stationary type, no autonomous moving means such as a motor is mounted.
  • the host device 332-3 is a movable example having a built-in HGW function.
  • the host device 332-3 is connected to other devices and sensors through the network 333, and is connected to the cloud server 1 through the Internet 2. Since the host device 332-3 is a movable type, it is an example in which means for autonomously moving, such as a motor, is mounted.
  • the host device 332-4 is an example of a movable type that does not have a built-in HGW function. Therefore, the host device 332-4 is connected to the HGW 330 through the network 333.
  • the host device 332-4 is connected to other devices and sensors via the network 333 via the HGW 330, and is connected to the cloud server 1 via the Internet 2 via the HGW 330. Since the host device 332-4 is movable, it is an example in which means for autonomously moving, such as a motor, is mounted.
  • FIG. 4 shows functional blocks of the host device 332 shown in FIG.
  • The host device 332 includes a system controller 402 that controls the entire internal processing, along with a control management unit 401, a trigger setting unit 403, a trigger recognition unit 405, an input management unit 420, and an interface for connecting to the network 333, each of whose functions is controlled by the system controller 402.
  • The control management unit 401 internally comprises a plurality of applications for controlling various operations of the host device 332, and a CONF-Mg 401-2 that manages setting contents such as initial settings, various state settings, and operation settings of the functional blocks of the host device 332.
  • The host device 332 also has interfaces (I/F) with the user 331, such as a microphone 421 for capturing words uttered by the user 331, a speaker 423 for outputting voice responses to the user 331, and a display unit 425 for notifying the user 331 of the status of the host device 332.
  • the microphone 421 is connected to the input management unit 420.
  • the input management unit 420 controls whether the voice data input from the microphone 421 is sent to the trigger setting unit 403, the trigger recognition unit 405, or the voice processing unit 407 according to the state managed internally.
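The state-dependent routing performed by the input management unit 420 might be sketched as below. The mode names and return values are purely illustrative assumptions; the patent only states that routing depends on an internally managed state:

```python
from enum import Enum, auto

class Mode(Enum):
    RESERVED_WORD_REGISTRATION = auto()  # e.g. the "reserved word registration mode"
    RESERVED_WORD_RECOGNITION = auto()
    COMMAND = auto()

class InputManager:
    """Illustrative sketch of input management unit 420: route microphone
    audio to the trigger setting unit, trigger recognition unit, or voice
    processing unit depending on the internal state."""
    def __init__(self, mode: Mode):
        self.mode = mode

    def route(self, audio: bytes) -> str:
        if self.mode is Mode.RESERVED_WORD_REGISTRATION:
            return "trigger_setting_unit"      # cf. unit 403
        if self.mode is Mode.RESERVED_WORD_RECOGNITION:
            return "trigger_recognition_unit"  # cf. unit 405
        return "voice_processing_unit"         # cf. unit 407

assert InputManager(Mode.RESERVED_WORD_REGISTRATION).route(b"...") == "trigger_setting_unit"
```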
  • the display unit 425 notifies the state of the host device 332 to the user 331, and is, for example, an LED (Light Emitting Diode) or an LCD (Liquid Crystal Display).
  • the memory 410 is divided into three areas: an operation mode storage area 410-1, a reserved word storage area 410-2, and a voice storage area 410-3. The contents of the information stored in each area will be described later.
  • When the host device 332 recognizes a reserved word among the words uttered by the user 331, it captures the words uttered by the user 331 following the reserved word and analyzes their content; it then has the function of returning a response corresponding to the analysis result to the user 331 and controlling the operation of the devices and sensors connected through the network 333.
  • the host device 332 performs four processes.
  • the first process is registration of reserved words.
  • the second process is recognition of reserved words.
  • the third process is registration of control contents of devices and sensors that control operations.
  • the fourth process is control of devices and sensors for which control contents are registered.
  • the host device 332 has a function of registering a reserved word in the host device 332.
  • the host device 332 has a mode for registering a reserved word (hereinafter referred to as a reserved word registration mode).
  • FIG. 5A and FIG. 5B show an example of the processing sequence of the host device 332 from the start of reserved word registration to the completion of registration, with the host device 332 placed in the "reserved word registration mode" in order to register a reserved word.
  • The host device 332 may change the mode by recognizing words uttered by the user 331 in a predetermined order.
  • a menu screen may be displayed on the display unit 425, and the mode may be changed by the user 331 operating the menu screen.
  • the mode may be changed by the user 331 operating a menu screen for changing the mode of the host device 332 displayed on the smartphone or tablet connected via the network I / F 427.
  • the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S501).
  • the input management unit 420 has a function of determining a transfer destination of input audio data according to a state managed internally.
  • the input management unit 420 transfers the received audio data to the trigger setting unit 403 (S502).
  • The trigger setting unit 403 stores the received voice data in the voice storage area 410-3 of the memory 410 (S503), and confirms whether the number of times the voice of the user 331 has been captured has reached the specified number (S504).
  • When the trigger setting unit 403 determines, as a result of this check, that the specified number has not been reached, it displays a prompt for the user 331 to utter the word to be registered (S507) and sends an input continuation notification to the input management unit 420 (S506). Upon receiving the input continuation notification, the input management unit 420 changes its internal state to a state waiting for voice input from the microphone (S500).
  • To display the prompt, the trigger setting unit 403 transmits a registration-incomplete notification to the display unit 425 (S505), and the display unit 425 that has received the notification desirably uses a display method the user 331 can recognize, for example blinking the LED (Light Emitting Diode) in red (S507). Instead of the display method, a voice method may be used to prompt the user 331 to input the word to be registered: the trigger setting unit 403 transmits the registration-incomplete notification to the speaker 423, and the speaker 423 that has received it announces to the user 331, for example, "Please input again."
  • the trigger setting unit 403 may use both a display method and a voice method to prompt the user 331 to input words to be registered.
  • The trigger setting unit 403 may also issue an instruction to a moving unit (not shown) so that the host device 332, for example, repeatedly rotates back and forth through a certain angle.
  • When the trigger setting unit 403 determines, as a result of the check, that the specified number has been reached, it reads the voice data stored so far in the voice storage area 410-3 (S508) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 of the cloud server 1 (S509).
  • the recognition data conversion unit 101-1 converts the voice data sent from the trigger setting unit 403 into recognition data for recognition as a reserved word (S510).
  • The recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 through the Internet 2 (S511).
  • Upon receiving the recognition data, the trigger setting unit 403 stores the received data in the reserved word storage area 410-2 of the memory 410 (S512).
  • The trigger setting unit 403 then displays an indication informing the user 331 that registration of the reserved word is complete (S514): it transmits a registration completion notification to the display unit 425, and the display unit 425 that has received the notification uses a display method the user 331 can recognize, such as lighting the LED in green.
  • the trigger setting unit 403 may use a voice method instead of the display method to notify the user 331 that registration of the reserved word has been completed.
  • The trigger setting unit 403 may transmit a registration completion notification to the speaker 423, and the speaker 423 that has received it may announce to the user 331, for example, "Registration is complete."
  • the trigger setting unit 403 may use both a display method and a voice method to notify the user 331 that registration of the reserved word has been completed.
  • The trigger setting unit 403 may also issue an instruction to a moving unit (not shown) so that the host device 332, for example, repeatedly moves linearly back and forth over a certain distance.
  • the trigger setting unit 403 has a role of managing the flow of data in registering reserved words.
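A minimal sketch of the registration sequence S501-S515 described above, assuming a fixed specified number of captures and stubbing out the cloud-side conversion; all identifiers and return strings are illustrative, not from the embodiment:

```python
class TriggerSettingUnit:
    """Illustrative sketch of reserved-word registration (S501-S515):
    accumulate a specified number of utterances, then send the batch to
    the cloud for conversion into recognition data."""
    def __init__(self, required_captures: int = 3):
        self.required = required_captures
        self.voice_store: list = []          # cf. voice storage area 410-3
        self.reserved_word_store: list = []  # cf. reserved word storage area 410-2

    def on_voice(self, audio: bytes) -> str:
        self.voice_store.append(audio)             # S503: store the captured voice
        if len(self.voice_store) < self.required:  # S504: specified number reached?
            return "registration-incomplete"       # S505/S507: e.g. blink LED red
        recognition = self.convert_in_cloud(self.voice_store)  # S508-S510
        self.reserved_word_store.append(recognition)           # S512
        self.voice_store.clear()
        return "registration-complete"             # S514: e.g. light LED green

    def convert_in_cloud(self, samples: list) -> str:
        # Placeholder for recognition data conversion unit 101-1 (assumed API).
        return f"recognition-data({len(samples)} samples)"

unit = TriggerSettingUnit(required_captures=2)
assert unit.on_voice(b"a") == "registration-incomplete"
assert unit.on_voice(b"b") == "registration-complete"
```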
  • FIGS. 6A and 6B show another sequence example from the start of reserved word registration to its completion. In some cases, the voice data captured by the host device 332 is insufficient for registration as a reserved word; an example of the processing in that case is described here.
  • the processing from S600 to S615 shown in FIGS. 6A and 6B is the same as the processing from S500 to S515 shown in FIGS. 5A and 5B, respectively.
  • the difference between the process in FIGS. 5A and 5B and the process in FIGS. 6A and 6B is that the process of S616 is added to the process of FIG. 6B.
  • When the trigger setting unit 403 confirms whether the number of times the words uttered by the user 331 have been captured has reached the specified number (S604) and determines that it has, the trigger setting unit 403 reads the voice data stored in the voice storage area 410-3 (S608) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 of the cloud server 1 (S609).
  • When the trigger setting unit 403 determines that the number of captures has not reached the specified number, it displays a message prompting the user 331 to utter the word to be registered (S607) and sends an input continuation notification to the input management unit 420 (S606).
  • the input management unit 420 that has received the input continuation notification transitions the internal state to a state waiting for input of sound from the microphone (S600).
  • To display the prompt, the trigger setting unit 403 transmits a registration-incomplete notification to the display unit 425 (S605), and the display unit 425 that has received the notification desirably uses a display method the user 331 can recognize, for example blinking the LED in red (S607). Instead of the display method, a voice method may be used: the trigger setting unit 403 transmits the registration-incomplete notification to the speaker 423, and the speaker 423 that has received it announces to the user 331, for example, "Please input again."
  • the trigger setting unit 403 may use both a display method and a voice method to prompt the user 331 to input words to be registered.
  • The trigger setting unit 403 may also issue an instruction to a moving unit (not shown) so that the host device 332, for example, repeatedly rotates back and forth through a certain angle.
  • When converting all the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the received voice data can be converted (S616). If it determines that some of the transmitted voice data cannot be converted into recognition data, the recognition data conversion unit 101-1 transmits a voice data addition request to the trigger setting unit 403 through the Internet 2 (S617). Upon receiving the voice data addition request, the trigger setting unit 403 sets the number of additional times the user 331 must input the word to be registered as a reserved word (S618) and notifies the input management unit 420 to continue input (S619).
  • While the trigger setting unit 403 sets the additional number of inputs for the user 331 (S618), the LED of the display unit 425 remains lit in red, for example.
  • The user 331 utters the word to be registered as a reserved word the additional number of times set in S618.
  • When receiving the input continuation notification (S619), the input management unit 420 changes its internal state to input waiting (S600) and waits for words uttered by the user 331.
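The additional-input handling of S616-S619 amounts to a retry loop: attempt conversion, and if some samples are unusable, request that many additional utterances and try again. A hedged sketch with the conversion and prompting steps stubbed out (all function names are assumptions for illustration):

```python
def convert_with_retry(samples, convert, request_more):
    """Illustrative sketch of S616-S619: `convert` returns (ok, failed_count);
    when conversion fails, `request_more(failed_count)` stands in for asking
    the user for that many additional utterances (S618/S619)."""
    while True:
        ok, failed_count = convert(samples)        # S616: can all data be converted?
        if ok:
            return len(samples)                    # registration can proceed
        samples = samples + request_more(failed_count)  # S617/S618: add more input

# Demo: the first conversion attempt reports one unusable sample.
calls = {"n": 0}
def fake_convert(samples):
    calls["n"] += 1
    return (calls["n"] > 1, 1)   # fail once (1 sample short), then succeed

def fake_request_more(count):
    return [b"extra"] * count    # user utters `count` additional times

assert convert_with_retry([b"a", b"b", b"c"], fake_convert, fake_request_more) == 4
```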
  • The processing shown in FIGS. 5A and 5B and in FIGS. 6A and 6B is an example in which the voice data acquired by the input management unit 420 is collected until the number of captures of the voice uttered by the user 331 reaches the specified number, and is then transmitted together to the recognition data conversion unit 101-1 in the cloud server 1. Alternatively, each time the input management unit 420 captures the voice uttered by the user 331, it may transmit the captured voice data to the recognition data conversion unit 101-1.
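  • The two transmission strategies just described can be sketched as follows; `send` stands in for the transfer to the recognition data conversion unit 101-1, and all names are illustrative, not taken from the actual implementation.

```python
def collect_then_send(captures, specified_number, send):
    """Batch mode (FIGS. 5A/5B, 6A/6B): buffer captured voice data until
    the specified number is reached, then send everything at once."""
    buffer = []
    for voice_data in captures:
        buffer.append(voice_data)
        if len(buffer) == specified_number:
            send(list(buffer))   # one transmission containing every capture
            buffer.clear()

def send_each(captures, send):
    """Sequential mode (FIGS. 7A/7B): forward each capture as soon as it
    is taken in."""
    for voice_data in captures:
        send([voice_data])       # one transmission per capture
```

The sequential mode lets the cloud side start converting (and ask for additional input) before the specified number of captures is reached.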
  • FIGS. 7A and 7B show an example of a sequence in which, every time the input management unit 420 captures the voice uttered by the user 331, the captured voice data is sequentially sent to the recognition data conversion unit 101-1 in the cloud server 1 and converted into recognition data.
  • The processing from S700 to S702 shown in FIG. 7A is the same as the processing from S500 to S502 shown in FIG. 5A. The processes of S703 and S704 shown in FIG. 7A are the same as the processes of S505 and S507 shown in FIG. 5A, respectively.
  • the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S701). Since the mode of the host device 332 is the reserved word registration mode, the input management unit 420 transfers the received voice data to the trigger setting unit 403 (S702).
  • Each time it receives voice data, the trigger setting unit 403 sequentially transmits the received voice data to the recognition data conversion unit 101-1 in the cloud server 1 (S706).
  • When converting the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the received voice data can be converted into recognition data (S707).
  • If the voice data cannot be converted into recognition data, the recognition data conversion unit 101-1 transmits a voice data addition request to the trigger setting unit 403 via the Internet 2 (S708).
  • Upon receiving the voice data addition request (S708), the trigger setting unit 403 checks whether the number of times the voice of the user 331 has been captured has reached the specified number (S714).
  • If the check shows that the specified number has not been reached, the trigger setting unit 403 displays a message prompting the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), thereby causing the input management unit 420 to transition to the state of waiting for voice input from the microphone (S700).
  • The input management unit 420 changes its internal state to input waiting (S700) and waits for the word uttered by the user 331.
  • If the voice data can be converted, the recognition data conversion unit 101-1 converts it into recognition data (S709).
  • As a result of the conversion into recognition data (S709), the recognition data conversion unit 101-1 determines, using all of the recognition data including the data already converted, whether sufficient accuracy is secured to recognize the voice data input from the microphone 421 as a reserved word (S710).
  • The user 331 utters the word to be registered as a reserved word.
  • If sufficient accuracy is secured, the recognition data conversion unit 101-1 notifies the trigger setting unit 403 via the Internet 2 of the recognition data, with information attached indicating that the recognition data is sufficient (recognition data sufficient notification) (S711).
  • From this notification, the trigger setting unit 403 that has received the recognition data knows that the recognition data received up to this point is sufficient to recognize the voice data input from the microphone 421 as a reserved word.
  • The trigger setting unit 403 stores all the recognition data received up to this point in the reserved word storage area 410-2 (S716), and sends a registration completion notification to the input management unit 420, the display unit 425, and the recognition data conversion unit 101-1 (S717) (S718) (S719).
  • In this way, depending on the accuracy of the converted recognition data, the user 331 can stop uttering the word to be registered as a reserved word even before the number of captures reaches the specified number. This makes it possible to register reserved words with a higher degree of freedom.
  • The specified number of times can be changed by the user 331 as a setting value of the host device 332, and can also be changed as one piece of the additional information described later.
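  • The early-stop behavior of S709 to S716 can be sketched as the following loop, in which `accurate_enough` abstracts the accuracy check of S710; all names are hypothetical and only illustrate the sequence described above.

```python
def register_reserved_word(capture_voice, convert, accurate_enough, specified_number):
    """Return the accumulated recognition data once it is judged
    sufficient (S711/S716), or None if the specified number of captures
    is exhausted first (so the user must be prompted again)."""
    recognition_data = []
    for _ in range(specified_number):
        voice = capture_voice()                   # voice taken in from the microphone
        recognition_data.append(convert(voice))   # S709: convert to recognition data
        if accurate_enough(recognition_data):     # S710: accuracy check over all data
            return recognition_data               # S711/S716: sufficient, stop early
    return None
```

Because the check runs after every conversion, the user may stop uttering the word before the specified number of captures is reached.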
  • If the recognition data conversion unit 101-1 determines that the recognition data created up to this point does not secure sufficient accuracy to recognize the voice data input from the microphone 421 as a reserved word, it sends only the converted recognition data to the trigger setting unit 403 (S713).
  • The trigger setting unit 403 that has received the recognition data checks whether the number of times the voice of the user 331 has been captured has reached the specified number (S714). If the trigger setting unit 403 determines that the specified number has not been reached, it continues the display prompting the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), thereby shifting the input management unit 420 to the state of waiting for voice input from the microphone (S700).
  • For the display prompting the user 331 to input the word to be registered, the trigger setting unit 403 transmits a registration incomplete notification to the display unit 425 (S703), and the display unit 425 that has received the registration incomplete notification uses a display method the user 331 can recognize, for example blinking the LED in red (S704). Instead of the display method, a voice method may be used to prompt the user 331 to input the word to be registered. In this case, the trigger setting unit 403 transmits a registration incomplete notification to the speaker 423, and the speaker 423 that has received it may announce to the user 331, for example, "Please input again."
  • the trigger setting unit 403 may use both a display method and a voice method to prompt the user 331 to input words to be registered.
  • The trigger setting unit 403 may also instruct a moving unit (not shown) so that, for example, the host device 332 repeatedly rotates back and forth through a certain angular width.
  • A registration completion notification is sent to the input management unit 420, the display unit 425, and the recognition data conversion unit 101-1 (S717) (S718) (S719).
  • Upon receiving the registration completion notification, the recognition data conversion unit 101-1 clears the converted recognition data temporarily stored for the processing of S710.
  • When the host device 332 recognizes a reserved word among the words uttered by the user 331, it has a function of analyzing the content of the words uttered by the user 331 and controlling devices and sensors based on the analysis result. In order to recognize the reserved word and then control devices and sensors, the host device 332 has a mode for recognizing the reserved word and controlling devices and sensors (hereinafter referred to as the operation mode).
  • FIGS. 8A and 8B show an example of the processing sequence of the host device 332 in the operation mode until the word uttered by the user 331 is recognized as one of the registered reserved words.
  • the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S801).
  • the input management unit 420 transfers the received voice data to the trigger recognition unit 405 (S802).
  • Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 determines whether the transferred voice data is a reserved word by comparing it with the recognition data read from the reserved word storage area 410-2 of the memory 410 (S803) (S804).
  • If the trigger recognition unit 405 determines that the input voice data cannot be recognized as a reserved word (S805), it displays a message prompting the user 331 to utter the reserved word (S808) and sends an input continuation notification to the input management unit 420 (S807).
  • For the display prompting the user 331 to utter the reserved word, the trigger recognition unit 405 transmits a recognition incomplete notification to the display unit 425 (S806), and the display unit 425 that has received the recognition incomplete notification uses a display method the user 331 can recognize, for example blinking the LED in red (S808).
  • Instead of the display method, the trigger recognition unit 405 may prompt the user 331 to input voice by a voice method. In this case, the trigger recognition unit 405 transmits a recognition incomplete notification to the speaker 423, and the speaker 423 that has received the recognition incomplete notification may announce to the user 331, for example, "I did not hear you."
  • the trigger recognition unit 405 may use both a display method and a voice method to prompt the user 331 to input a voice.
  • The trigger setting unit 403 may also instruct a moving unit (not shown) so that, for example, the host device 332 repeatedly rotates back and forth through a certain angular width.
  • When the input voice data is recognized as a reserved word, the trigger recognition unit 405 displays that the voice uttered by the user 331 has been recognized as a reserved word (S810).
  • For the display indicating that the voice uttered by the user 331 has been recognized as a reserved word, the trigger recognition unit 405 transmits a recognition completion notification to the display unit 425 (S809), and the display unit 425 that has received the recognition completion notification uses a display method the user 331 can recognize, for example lighting the LED in green (S810).
  • the trigger recognizing unit 405 may notify that the voice uttered by the user 331 is recognized as a reserved word by using a voice method instead of the display method.
  • The trigger recognition unit 405 transmits a recognition completion notification to the speaker 423, and the speaker 423 that has received the recognition completion notification may announce to the user 331, for example, "Yes" or "I heard you."
  • the trigger recognition unit 405 may use both a display method and a voice method to indicate that the voice uttered by the user 331 is recognized as a reserved word.
  • The trigger setting unit 403 may also instruct a moving unit (not shown) so that, for example, the host device 332 repeatedly moves linearly back and forth over a certain movement width.
  • FIG. 9A and FIG. 9B show another example of the processing sequence of the host device 332 until the word uttered by the user 331 is recognized as one of registered reserved words in the operation mode.
  • The difference between the sequence example of FIGS. 9A and 9B and that of FIGS. 8A and 8B is that the recognition probability is taken into consideration in the process of recognizing the reserved word.
  • Here, the recognition probability is the degree to which the recognition data and the voice data transferred from the input management unit 420 match when feature points such as frequency components and intensity are compared.
  • The processes from S900 to S912 shown in FIGS. 9A and 9B are the same as the processes from S800 to S812, respectively. FIGS. 9A and 9B differ from FIGS. 8A and 8B in that the processes from S913 to S916 are added.
  • Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S903) and compares it with the voice data transferred from the input management unit 420 (S904).
  • If the trigger recognition unit 405 determines that the input voice data is recognized as a reserved word (S905), it proceeds to the recognition probability determination process (S913).
  • In the speech recognition processing performed by the trigger recognition unit 405, feature points such as the frequency components and intensity of the recognition data read from the reserved word storage area 410-2 of the memory 410 and of the voice data transferred from the input management unit 420 are compared, and when the two match at a certain level or more, the voice data transferred from the input management unit 420 is determined to match the recognition data.
  • When comparing the recognition data with feature points such as the frequency components and intensity of the voice data transferred from the input management unit 420, the host device 332 may also provide a plurality of threshold values for determining the level at which the two match. In this way, when recognizing a reserved word among the words uttered by the user, the host device 332 is not limited to the two-way determination "reserved word recognized / reserved word not recognized"; for example, it can add an intermediate determination such as "close to a reserved word but not a correct reserved word", giving "reserved word recognized / close but not a correct reserved word / reserved word not recognized".
  • In this case, the host device 332 responds according to the determination result "close but not a correct reserved word", and there is an advantage that the user 331 who sees the response content can get closer to the correct reserved word.
  • FIGS. 9A and 9B are an example in which two recognition probability thresholds are provided. Let the threshold for recognizing a reserved word be threshold 1, and the threshold below which a word is not treated as a reserved word at all be threshold 0. As a result of the comparison in S904, if the recognition probability is greater than or equal to threshold 1, it is determined that the reserved word has been recognized. If the recognition probability is greater than or equal to threshold 0 and less than threshold 1, it is determined that the word is close to a reserved word but cannot be recognized as one. If the recognition probability is less than threshold 0, it is determined that the reserved word is not recognized. Therefore, the process of S905 compares the recognition probability with threshold 0, and the process of S913 compares the recognition probability with threshold 1.
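  • The two-threshold determination of S905 and S913 amounts to a small three-way classifier; `threshold_0` and `threshold_1` correspond to threshold 0 and threshold 1 above, and the function and label names are illustrative only.

```python
def classify_recognition(probability, threshold_0, threshold_1):
    """Three-way determination used in FIGS. 9A/9B:
    >= threshold 1             -> reserved word recognized
    [threshold 0, threshold 1) -> close, but not a correct reserved word
    <  threshold 0             -> not recognized"""
    if probability >= threshold_1:         # S913: passes the upper threshold
        return "recognized"
    if probability >= threshold_0:         # S905 passed, S913 failed
        return "close_but_incorrect"       # triggers the insufficient recognition prompt (S914-S916)
    return "not_recognized"                # S905 failed
```

The intermediate label is what lets the host device give a distinct response (e.g. a green blinking LED) that nudges the user toward the correct reserved word.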
  • If the host device 332 determines that the recognition probability is greater than or equal to threshold 0 and less than threshold 1 (S913), it displays a message prompting the user 331 to utter the reserved word (S915) and sends an input continuation notification to the input management unit 420 (S916).
  • For the display prompting the user 331 to utter the reserved word, the trigger recognition unit 405 sends an insufficient recognition notification to the display unit 425 (S914), and the display unit 425 that has received the insufficient recognition notification uses a display method the user 331 can recognize, for example blinking the LED in green (S915).
  • If the display prompting the user 331 to utter the reserved word is made different from the display on recognition failure (S908) and from the display on recognition success (S910), the user 331 can recognize that his or her words are close to the reserved word but that the reserved word has not been uttered correctly.
  • the trigger setting unit 403 may prompt the user 331 to input a voice by using a voice method instead of the display method.
  • The trigger recognition unit 405 transmits an insufficient recognition notification to the speaker 423 (S914), and the speaker 423 that has received this insufficient recognition notification may, for example, announce to the user 331, "Did you call me?"
  • the trigger recognition unit 405 may use both a display method and a voice method to prompt the user 331 to input a voice.
  • The trigger setting unit 403 may also instruct a moving unit (not shown) so that, for example, the host device 332 repeatedly rotates back and forth through a certain angular width.
  • When the host device 332 recognizes a reserved word among the words uttered by the user 331, it continues to capture the words the user utters after the reserved word, analyzes the content of the captured words, and controls devices and sensors based on the result.
  • FIGS. 10A and 10B show an example of the processing sequence in which, after the reserved word is recognized, the host device controls a device or sensor based on the content of voice data, captured from the microphone 421, that contains the control content for the device or sensor.
  • the internal state of the input management unit 420 transitions to recognized (S1000) since reserved word recognition has been completed.
  • The host device 332 takes the voice data (control content) input from the microphone 421 into the input management unit 420 (S1002). Since the internal state is already recognized, the input management unit 420 transfers the input voice data (control content) to the voice processing unit 407 (S1002). The voice processing unit 407 sends the transferred voice data (control content) to the voice text conversion unit 101-2 in the voice recognition cloud 101 in the cloud server 1 via the Internet 2.
  • the voice text conversion unit 101-2 performs a process of converting voice data sent through the Internet 2 into text data (S1004). By this process, the voice uttered by the user 331 originally captured through the microphone 421 is converted into text data.
  • the speech text conversion unit 101-2 stores the converted text data therein and sends a conversion completion notification to the speech processing unit 407 (S1005).
  • Upon receiving the conversion completion notification, the voice processing unit 407 transmits a text analysis request to the voice text conversion unit 101-2 (S1006).
  • The voice text conversion unit 101-2 sends the text analysis request, together with the internally stored converted text data, to the text analysis unit 102-1 (S1007).
  • When the text analysis unit 102-1 receives the text analysis request (S1007), it analyzes the content of the accompanying text data (S1008).
  • the text analysis unit 102-1 sends the analysis result to the response / action generation unit 102-2 as a text analysis result notification (S1009).
  • Upon receiving the text analysis result (S1009), the response / action generation unit 102-2 generates a target device and a command for controlling the device based on the content (S1010), and sends the generated command to the voice processing unit 407 as a response / action generation result notification (S1011).
  • When the voice processing unit 407 receives the response / action generation result notification (S1011), it specifies the device or sensor to be controlled and the control content from the notification (S1012). The voice processing unit 407 converts the specified control target device or sensor and its control content into a format the control target can recognize, and sends an action notification to the target device or target sensor through the network 333 at the necessary timing (S1013).
  • Upon receiving the action notification (S1013), the control target device or sensor that is the notification destination performs an operation based on the control content included therein (S1014).
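  • The flow from S1004 to S1014 can be sketched as one pipeline; the helper names (`to_text`, `analyze`, `generate_action`, `notify`) merely stand in for the voice text conversion unit 101-2, the text analysis unit 102-1, the response / action generation unit 102-2, and the action notification, and are illustrative assumptions, not the actual interfaces.

```python
def handle_utterance(voice_data, to_text, analyze, generate_action, notify):
    """Sketch of S1004-S1014: convert voice to text, analyze the text,
    generate the (target, command) pair, and send the action notification."""
    text = to_text(voice_data)                  # S1004: voice data -> text data
    meaning = analyze(text)                     # S1008: text analysis
    target, command = generate_action(meaning)  # S1010: response / action generation
    notify(target, command)                     # S1013: action notification over the network
    return target, command
```

Each stage is a separate component in the sequence diagrams, so passing them in as callables mirrors the request/notification hand-offs between the host device and the cloud server.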
  • When the user 331 utters voice continuously within a certain time, the host device 332 determines that the continuous voice is a series of utterances and can capture it without requesting the user 331 to utter the reserved word again along the way. Conversely, when the user 331 utters voice after the certain time has elapsed, the host device 332 requests input of the reserved word again.
  • FIG. 11A and FIG. 11B are examples of processing sequences in the case where the user 331 continuously utters words within the time T0 after the recognition of the reserved word is completed.
  • the input management unit 420 starts the input interval confirmation timer T.
  • The input management unit 420 transfers the captured voice data (control content) to the voice processing unit 407 (S1122).
  • At this time, the input interval confirmation timer T that was already started is restarted.
  • The voice processing unit 407 sends the transferred voice data (control content) to the voice text conversion unit 101-2 in the voice recognition cloud 101 in the cloud server 1 via the Internet 2 (S1123). Thereafter, similarly to the processes from S1104 to S1110, the processing of the transmitted voice data (S1123) continues in the voice recognition cloud 101.
  • In this example, the input interval confirmation timer T is started at the timing when the input management unit 420 takes in the voice data input from the microphone 421, but the timing is not limited to this; for example, the timer may be started at the timing when the input management unit 420 transfers the data sent from the microphone 421 to the trigger setting unit 403 or the voice processing unit 407.
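  • The input interval check of FIGS. 11A/11B and 12A/12B can be sketched with timestamps instead of a real timer; `InputSession` and its member names are hypothetical and only model the restart-on-capture behavior described above.

```python
class InputSession:
    """After a reserved word is recognized, utterances arriving within
    t0 seconds of the previous capture are treated as a continuing
    series; a longer gap means the reserved word must be uttered again."""

    def __init__(self, t0_seconds):
        self.t0 = t0_seconds
        self.last_capture = None   # time of the previous capture (start of timer T)

    def on_capture(self, now):
        """Return True if this capture continues the series, False if it
        is the first capture or the gap exceeded t0 (timeout)."""
        within = (self.last_capture is not None
                  and now - self.last_capture <= self.t0)
        self.last_capture = now    # restart the input interval confirmation timer T
        return within
```

A real implementation would run timer T asynchronously and raise the timeout notification of S1221 when it expires; the timestamp comparison above captures the same decision.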
  • FIGS. 12A and 12B show an example in which the user 331 does not utter voice continuously within the time T0.
  • the input management unit 420 starts the input interval confirmation timer T.
  • When the timer expires, the input management unit 420 changes its internal state to input waiting (S1220) and sends a timeout notification to the voice processing unit 407 (S1221).
  • The voice processing unit 407 that has received the timeout notification transmits a recognition incomplete notification to the display unit 425 (S1222), and the display unit 425 that has received the recognition incomplete notification gives a display prompting the user 331 to utter the reserved word, for example blinking the LED in red (S1223).
  • The input management unit 420 changes its internal state to recognizing (S1225) and transfers the captured voice data to the trigger recognition unit 405 (S1226).
  • the host device 332 performs the processing from S803 to S812 in FIGS. 8A and 8B or the processing from S903 to S916 in FIGS. 9A and 9B, and recognizes the reserved word again.
  • FIG. 13 shows a specific example of the contents of the control information used when the host device 332 recognizes a reserved word and controls the various sensors 310, various equipment devices 320, and various home appliances 340 as shown in the sequence diagrams of FIGS. 10A and 10B.
  • Item 1 is a specific example of the information (hereinafter, response / action generation information) for controlling the various sensors 310, various equipment devices 320, and various home appliances 340 that is included in the response / action generation result notification transmitted from the response / action generation unit 102-2.
  • This response / action generation information includes a "target", such as a device or sensor controlled by the host device 332, and a "command" indicating the content with which the control target is controlled.
  • When the host device 332 receives the response / action generation result notification, it extracts the action information contained therein and controls the target device based on the content of the action information.
  • Commands include a "start command" for starting (operating) the target device to be controlled, a "stop command" for ending (stopping) it, an "operation change command" for changing the content (mode) during operation, and a "setting change command" for changing the content (mode) set in advance in the target device.
  • In order for the response / action generation unit 102-2 to generate the response / action information included in the response / action generation result notification, the user 331 must register in advance, in the response / action generation unit 102-2, combinations of the device to be controlled, its control content, and the words uttered to the host device 332 to control the device, as an initial setting of the host device 332.
  • the registration of response / action information in the initial setting of the host device 332 will be described with reference to the example of FIG.
  • Item 2 is the "target", which is a device controlled through the host device 332.
  • the “target” is an identification name of devices and sensors included in the various sensors 310, the various equipment devices 320, and the various home appliances 340, and the air conditioner 1 is described as a specific example.
  • Item 3 is the "command", which is the control content of the device shown in item 2. As specific examples, commands of the air conditioner 1 listed in item 2 are described, such as an "operation change command" for changing its operating content and a "setting change command" for changing the setting contents of the air conditioner.
  • The product specifications of the devices and sensors in item 2 and item 3 are stored in advance in a product specification cloud server (not shown) that holds product specification information.
  • The user 331 obtains, from the product specification cloud server, the product specification information of item 2 and item 3 for the target devices and target sensors to be controlled through the host device 332.
  • The "phrase" is a word that the user 331 utters to the host device 332. This "phrase" preferably has content corresponding to the commands of the air conditioner 1 listed in item 3; for example, "turn on the air conditioner" for the "start command" that starts the air conditioner, "stop the air conditioner" for the "stop command" that stops it, "change to dry" for the "operation change command" that changes the air conditioner operation from "cooling" to "dry", and "turn on the air conditioner at 10 o'clock" for the "setting change command" that changes the operation start time setting to "start operation at 10 o'clock" are described as examples.
  • The user 331 creates the combinations (target, command, phrase) determined as described above as an initial setting of the host device 332.
  • The user 331 performs the same creation for all devices to be controlled through the host device 332, and finally generates a response / action information list in which the (target, command, phrase) combinations of all devices to be controlled are collected into one.
  • the created response / action information list is registered in the response / action generation unit 102-2 through the host device 332.
  • When the response / action information list is registered in the response / action generation unit 102-2, the host device 332 can, as shown in FIGS. 10A and 10B, control devices and sensors by continuously capturing and analyzing the words the user 331 utters after recognition of the reserved word is completed.
  • For example, when the user 331 says "turn on the air conditioner", the voice text conversion unit 101-2 converts the input voice data into the text "turn on the air conditioner", and the text analysis unit 102-1 analyzes that this text data has the content "turn on the air conditioner".
  • The response / action generation unit 102-2 refers to the list of registered response / action information and searches for the response / action information whose "phrase" corresponds to the analysis result "turn on the air conditioner".
  • The voice processing unit 407 refers to the response / action information set in the received response / action generation result notification and controls the corresponding device or sensor among the various sensors 310, the various equipment devices 320, and the various home appliances 340.
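  • A minimal sketch of the response / action information list of FIG. 13 and the phrase lookup described above, under the assumption that each entry is a (target, command, phrase) record; the entries and function names are illustrative, not taken from the actual implementation.

```python
# Each entry combines (target, command, phrase), as registered in the
# initial setting of the host device 332.
RESPONSE_ACTION_LIST = [
    {"target": "air conditioner 1", "command": "start command",
     "phrase": "turn on the air conditioner"},
    {"target": "air conditioner 1", "command": "stop command",
     "phrase": "stop the air conditioner"},
    {"target": "air conditioner 1", "command": "operation change command",
     "phrase": "change to dry"},
]

def generate_action(analyzed_phrase):
    """Search the registered list for the entry whose phrase matches the
    text analysis result and return its (target, command)."""
    for entry in RESPONSE_ACTION_LIST:
        if entry["phrase"] == analyzed_phrase:
            return entry["target"], entry["command"]
    return None   # no matching response / action information registered
```

The returned (target, command) pair corresponds to the action information that the voice processing unit 407 converts into a device-recognizable format before sending the action notification.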
  • FIG. 14 is a list of examples of the operation contents performed in accordance with a recognized reserved word when a plurality of reserved words are registered in the host device 332 and the host device 332 recognizes the word uttered by the user 331 as one of them.
  • A plurality of reserved words can be registered in the host device 332, and for each of the plurality of reserved words, the operation content corresponding to the recognized reserved word (hereinafter referred to as additional information 1) can be set.
  • In this example, three reserved words, for example "Iroha", "I like you", and "Sonya", are registered in the host device 332.
  • When the host device 332 recognizes the word uttered by the user 331 as the reserved word "Iroha", the operation content already set is not changed. When the word uttered by the user 331 is recognized as the reserved word "I like you", the operation is changed so that, when words uttered by the user 331 are recognized thereafter, an announcement such as "the master is pleased" is reliably made through the speaker 423.
  • When the host device 332 recognizes the reserved word "Sonya", it determines that the user 331 is a senior user, and the setting is changed so that the expiration time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B becomes longer than the normal setting time.
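  • The additional information 1 of FIG. 14 can be modeled as a table keyed by reserved word; the table contents and names below are a hypothetical sketch of the behavior described above, not the actual data format of the host device 332.

```python
# Hypothetical table pairing each registered reserved word with its
# "additional information 1" (operation content changes, per FIG. 14).
ADDITIONAL_INFO_1 = {
    "Iroha": {},                                          # keep current operation content
    "I like you": {"announce": "the master is pleased"},  # announce via the speaker 423
    "Sonya": {"timer_t0_scale": 2.0},                     # lengthen T0 for a senior user
}

def apply_additional_info(reserved_word, settings):
    """Merge the operation changes tied to the recognized reserved word
    into the host device's current settings (returned as a new dict)."""
    updated = dict(settings)
    updated.update(ADDITIONAL_INFO_1.get(reserved_word, {}))
    return updated
```

Keeping the table keyed by reserved word means each recognition event needs only one lookup to decide which operation changes to apply.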
  • FIG. 14 shows an example in which the host device 332 changes its own operation content, but the invention is not limited to this; the operation of devices and sensors connected to the host device 332 via the network 333 may be controlled instead.
  • To change the operation of the host device 332 according to a plurality of reserved words, the additional information 1 for each reserved word needs to be registered in the host device 332 in advance.
  • For registering a reserved word together with the additional information 1 corresponding to it, the host device 332 has a dedicated mode (hereinafter referred to as the reserved word registration (additional information 1) mode).
  • FIGS. 15A and 15B show an example of the processing sequence of the host device 332, from the start of registration to the completion of registration of the additional information 1, when a reserved word and the additional information 1 corresponding to it are registered while the host device 332 is in the "reserved word registration (additional information 1) mode".
  • the processing from S1500 to S1514 shown in FIGS. 15A and 15B is the same as the processing from S500 to S514 shown in FIGS. 5A and 5B, respectively.
  • FIGS. 15A and 15B differ from FIGS. 5A and 5B in that S1515 differs from S515 and in that S1516 to S1523 are added.
  • The trigger setting unit 403 causes a display informing the user 331 that registration of the reserved word has been completed (S1515). For this display, the trigger setting unit 403 transmits a registration completion notification to the display unit 425 (S1514), and the display unit 425 that has received the registration completion notification uses a display method the user 331 can recognize, for example blinking the LED in green. In this way, the trigger setting unit 403 can prompt the user 331 to register the additional information 1.
  • the user 331 who recognizes that the LED is blinking in green (S1515) can set the additional information 1 corresponding to the reserved word registered in S1511.
  • As a setting method for the additional information 1, the host device 332 may capture the voice uttered by the user 331 through the microphone 421 and register the additional information 1 by analyzing the captured voice data.
  • a menu for setting the additional information 1 may be displayed on the display device 425 so that the user 331 can perform registration in accordance with the menu.
  • Alternatively, using an external device connected via the network I / F 427 shown in FIG. 4, such as a smartphone or tablet, a menu for setting the additional information 1 corresponding to the reserved word may be displayed on the display screen of the smartphone or tablet, and the user 331 may perform registration by operating in accordance with the displayed menu screen.
  • FIGS. 15A and 15B show an example of a processing sequence in which a menu for setting the additional information 1 is displayed on the display unit 425 and the user 331 registers the additional information 1 by operating in accordance with the menu.
  • a menu for registering the additional information 1 is displayed on the display unit 425.
  • the user 331 creates additional information 1 by operating according to the displayed menu screen.
  • the additional information 1 that has been created is taken into the input management unit 420 (S1517).
  • the input management unit 420 transfers the acquired additional information 1 to the trigger setting unit 403.
  • the trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1519).
  • the trigger setting unit 403 stores the additional information 1 in association with the reserved word registered in S1513 when storing the additional information 1 in the reserved word storage area 410-2 of the memory 410.
  • the voice processing unit 407 performs a display (S1522) informing the user 331 that the registration of the additional information 1 has been completed.
  • FIGS. 16A and 16B show a sequence example in which, with the additional information 1 stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 15A and 15B, a reserved word is recognized among the words uttered by the user 331, the additional information 1 of the recognized reserved word is read from the reserved word storage area 410-2, and an operation is set for the host device 332.
  • the processing from S1600 to S1612 shown in FIGS. 16A and 16B is the same as the processing from S800 to S812 shown in FIGS. 8A and 8B, respectively.
  • the difference between the processing of FIGS. 16A and 16B and the processing of FIGS. 8A and 8B is that the processing of S1613 and S1614 is added.
  • When a word uttered by the user 331 is recognized as a reserved word (S1605), the trigger recognition unit 405 reads the additional information 1 corresponding to that reserved word from the reserved word storage area 410-2 of the memory 410 (S1613). The trigger recognition unit 405 that has read the additional information 1 then sets the operation described by the read additional information 1 in the host device 332 (S1614). For example, when the contents of the example shown in FIG. 14 are stored in the reserved word storage area 410-2 and "Son" is recognized as a reserved word in S1605, the trigger recognition unit 405 sets, in S1614, the expiration time T0 of the input interval confirmation timer T to a value longer than normal.
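The lookup and setting steps S1613 and S1614 can be sketched as a table keyed by reserved word. The following is an illustrative reconstruction only; the table contents, class names, and the concrete timer value are assumptions, not part of the disclosed embodiment.

```python
# Hypothetical sketch of S1613-S1614: read the "additional information 1" for a
# recognized reserved word and apply it to the host device settings.
DEFAULT_T0 = 5.0  # assumed normal expiration time of input interval confirmation timer T (s)

# Reserved word storage area 410-2 modeled as: reserved word -> operation content
ADDITIONAL_INFO_1 = {
    "Son": {"timer_T0": DEFAULT_T0 * 2},  # senior speaker: allow longer pauses
}

class HostSettings:
    def __init__(self):
        self.timer_T0 = DEFAULT_T0

    def apply(self, info):
        # S1614: set the operation described by the read additional information 1
        self.timer_T0 = info.get("timer_T0", self.timer_T0)

def on_reserved_word(word, settings):
    info = ADDITIONAL_INFO_1.get(word)  # S1613: read additional information 1
    if info is not None:
        settings.apply(info)
    return settings

settings = on_reserved_word("Son", HostSettings())
```

A reserved word without a registered entry simply leaves the host settings unchanged.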
  • FIG. 17A is a list of examples of operation contents in which, when a word uttered by the user 331 is recognized as a reserved word registered in the host device 332, the host device 332 performs a specific operation according to the word uttered by the user 331 following the recognized reserved word.
  • When the host device 332 recognizes that a word uttered by the user 331 is a registered reserved word, it can set operation contents (hereinafter referred to as additional information 2) according to the content of the word uttered by the user 331 following the recognized reserved word (hereinafter referred to as an additional word).
  • For example, when the host device 332 recognizes "ya" as the word uttered by the user 331 following the reserved word "Iroha", it estimates that the user 331 is a senior user who tends to speak slowly, and therefore changes the expiration time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B to a value longer than the normal set time.
  • When the host device 332 recognizes "oi" as the word uttered by the user 331 following the reserved word "Iroha", it determines that the user 331 is angry and immediately announces "sorry" through the speaker 423.
  • The host device 332 can also set a plurality of additional words for one reserved word and set the additional information 2 for each combination of the reserved word and one of those additional words, or set the additional information 2 for each combination of a plurality of reserved words and a plurality of additional words.
  • In FIG. 17B, for example, it is assumed that the host device 332 has registered "Iroha", "Ookini", and "Ashindo" as reserved words.
  • an additional word may be defined for each reserved word, and additional information 2 may be set for each combination of the reserved word and the additional word.
  • Some users may want a specific action to be performed simply by uttering a reserved word. For example, if a certain individual has a characteristic phrase, that phrase can be registered in the host device 332 as a reserved word, and an operation corresponding to the reserved word can likewise be registered in the host device 332; control of the operation of devices and sensors can then be executed easily.
  • Like the reserved word "Ashindo" in FIG. 17B, when the host device 332 recognizes the reserved word "Ashindo" among the words uttered by the user 331, it can, for example, announce through the speaker 423 the beer information stored in a refrigerator connected to the network 333.
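The operation contents of FIGS. 17A and 17B can be modeled as a table keyed by a (reserved word, additional word) pair, with reserved-word-only entries keyed by a missing additional word. This is a minimal sketch; the entries and action names are illustrative assumptions drawn from the examples above, not the embodiment's actual data format.

```python
# Illustrative model of "additional information 2": operation content keyed by
# (reserved word, additional word); (reserved word, None) marks a reserved-word-only action.
ADDITIONAL_INFO_2 = {
    ("Iroha", "ya"):   "lengthen_timer_T0",   # senior user: extend input interval timer
    ("Iroha", "oi"):   "announce_sorry",      # user sounds angry: apologize at once
    ("Ashindo", None): "announce_beer_info",  # reserved word alone triggers an action
}

def lookup_action(reserved_word, additional_word=None):
    """Return the operation content for the recognized combination, if any."""
    key = (reserved_word, additional_word)
    if key in ADDITIONAL_INFO_2:
        return ADDITIONAL_INFO_2[key]
    # fall back to a reserved-word-only entry
    return ADDITIONAL_INFO_2.get((reserved_word, None))
```

Keying on the pair allows one reserved word to carry several additional words, and several reserved words to share the table, matching both variants described in the text.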
  • In order for the host device 332 to use combinations of reserved words and additional words, the additional words corresponding to each reserved word, and the additional information 2 that is the operation content for each combination of reserved word and additional word, must be registered in the host device 332 in advance. Therefore, the host device 332 has a mode for additionally registering corresponding additional words and additional information with respect to already registered reserved words.
  • Hereinafter, the mode for adding the additional information 1 to a reserved word already registered in the host device 332 is called the additional information 1 additional registration mode, and the mode for adding additional words and the additional information 2 is called the additional information 2 additional registration mode.
  • As with the setting of the additional information 1, the additional information 2 may be set by having the host device 332 capture the voice uttered by the user 331 through the microphone 421 and analyzing the captured voice data to perform registration.
  • a menu for setting the additional information 2 may be displayed on the display device 425 so that the user 331 can perform registration in accordance with the displayed menu.
  • Alternatively, using an external device connected via the network I / F 427 shown in FIG. 4, for example a smartphone or tablet, a menu for setting the additional information 2 corresponding to the reserved word and the additional word may be displayed on the display screen of the smartphone or tablet, and the user 331 may perform registration by operating in accordance with the displayed menu screen.
  • FIGS. 18A, 18B, and 18C show an example of a processing sequence in which an additional word is registered for a registered reserved word shown in FIGS. 17A and 17B, and the corresponding operation content (additional information 2) is registered.
  • First, the user 331 changes the host device 332 to the "additional information 2 additional registration mode". With the host device in this mode, the user 331 utters a reserved word registered in the host device 332 and an additional word to be registered for that reserved word.
  • the host device 332 first recognizes a reserved word from the words uttered by the user 331 (S1805).
  • the host device 332 captures the words uttered by the user 331 into the input management unit 420 through the microphone 421 (S1801).
  • the input management unit 420 changes the internal state managed internally to being recognized (reserved word) (S1802), and transfers the input voice data to the trigger recognition unit 405 (S1803).
  • Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S1804) and compares it with the voice data transferred from the input management unit 420 (S1805). If the input voice data can be recognized as a reserved word, the trigger recognition unit 405 sends a recognition completion notification to the input management unit 420 (S1806). Receiving the recognition completion notification, the input management unit 420 changes its internally managed state from recognizing (reserved word) to waiting for input (additional word) (S1807).
  • Next, the host device 332 takes in the word uttered by the user 331 following the reserved word into the input management unit 420 through the microphone 421 (S1808). Since the internally managed state is waiting for input (additional word) (S1807), the input management unit 420 transfers the input voice data to the trigger setting unit 403 (S1809). Thereafter, as in the reserved word registration described with reference to FIGS. 5A and 5B, the trigger setting unit 403 saves the received voice data in the voice storage area 410-3 of the memory 410 (S1810) and checks whether the specified number of additional word inputs has been taken in (S1811).
  • If, as a result of checking whether the specified number has been reached, the trigger setting unit 403 determines that it has not, the trigger setting unit 403 prompts the user 331 to input the additional word to be registered (S1812, S1813) and transmits an input continuation notification to the input management unit 420 (S1814).
  • The display prompting the user 331 to input the voice to be registered as an additional word (S1813) is performed by the trigger setting unit 403 transmitting a registration incomplete notification to the display device 425 (S1812); the display device 425 that receives the registration incomplete notification desirably uses a display method that the user 331 can recognize, for example blinking its LED in red.
  • Alternatively, the user 331 may be prompted to input the voice to be registered by a voice method instead of the display method. In this case, the trigger setting unit 403 transmits a registration incomplete notification to the speaker 423, and the speaker 423 that receives the registration incomplete notification announces to the user 331, for example, "Please input again".
  • the trigger setting unit 403 may use both a display method and a voice method to prompt the user 331 to input a voice to be registered.
  • If, as a result of checking whether the specified number has been reached, the trigger setting unit 403 determines that it has, the trigger setting unit 403 reads the voice data stored so far in the voice storage area 410-3 (S1815) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 in the cloud server 1 (S1816).
  • The recognition data conversion unit 101-1 converts the voice data sent from the trigger setting unit 403 into recognition data for recognizing the additional word (S1817). When the conversion is complete, the recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 through the Internet 2 (S1818). Upon receiving the recognition data for recognizing the additional word (hereinafter referred to as recognition data (additional word)), the trigger setting unit 403 stores the received data in the reserved word storage area 410-2 of the memory 410 (S1819). When saving the recognition data (additional word), the trigger setting unit 403 saves it in association with the reserved word recognized in S1806, so that the recognition data (additional word) is stored linked to that reserved word.
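The two-phase association performed in this sequence (additional word stored under a reserved word in S1819, operation content attached later in S1826) can be modeled as a small per-reserved-word record. The class, method, and field names below are hypothetical illustrations of the storage relationship, not the embodiment's implementation.

```python
# Hypothetical model of the reserved word storage area 410-2 during the
# registration sequence of FIGS. 18A-18C.
class ReservedWordStore:
    def __init__(self):
        self._entries = {}  # reserved word -> {"additional_words": {word: info2}}

    def register_reserved_word(self, word):
        self._entries.setdefault(word, {"additional_words": {}})

    def save_additional_word(self, reserved_word, additional_word):
        # S1819: store recognition data (additional word) linked to the reserved word
        self._entries[reserved_word]["additional_words"][additional_word] = None

    def save_additional_info_2(self, reserved_word, additional_word, info):
        # S1826: attach operation content to the reserved word / additional word pair
        self._entries[reserved_word]["additional_words"][additional_word] = info

    def info_for(self, reserved_word, additional_word):
        return self._entries[reserved_word]["additional_words"].get(additional_word)

store = ReservedWordStore()
store.register_reserved_word("Iroha")
store.save_additional_word("Iroha", "ya")
store.save_additional_info_2("Iroha", "ya", "lengthen_timer_T0")
```

Keeping the additional word and its operation content under the same reserved word entry is what lets the later recognition sequence retrieve the operation from the recognized pair.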
  • the trigger setting unit 403 performs a display (S1822) informing the user 331 that the registration of additional words has been completed.
  • This display informing the user 331 that the registration of the additional word has been completed is performed by the trigger setting unit 403 transmitting a registration completion notification to the display device 425 (S1821); the display device 425 that receives the registration completion notification desirably uses a display method that the user 331 can recognize, for example blinking its LED in green (S1822).
  • The trigger setting unit 403 may also use a voice method instead of the display method to notify the user 331 that the registration of the additional word has been completed. In this case, the trigger setting unit 403 transmits a registration completion notification to the speaker 423 (S1821), and the speaker 423 that receives the registration completion notification announces to the user 331, for example, "Registration is complete".
  • The trigger setting unit 403 may also use both the display method and the voice method to notify the user 331 that the registration is completed. Thereby, the user 331 can know the timing at which to input the content of the additional information 2, which is the operation content corresponding to the additional word.
  • a menu for registering the additional information 2 is displayed on the display unit 425.
  • the user 331 creates additional information 2 by operating according to the displayed menu screen.
  • When the creation is completed, the additional information 2 is taken into the input management unit 420 (S1824).
  • the input management unit 420 transfers the acquired additional information 2 to the trigger setting unit 403 (S1825).
  • the trigger setting unit 403 stores the transferred additional information 2 in the reserved word storage area 410-2 of the memory 410 (S1826).
  • When storing the additional information 2 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word recognized in S1806. As a result, the operation content (additional information 2) can be saved in association with the reserved word recognized in S1806 and the additional word saved in S1819.
  • FIGS. 18D and 18E are examples of processing sequences in which, unlike FIGS. 18A, 18B, and 18C, only additional information is added to a registered reserved word.
  • the processing from S1850 to S1856 shown in FIG. 18D is the same as the processing from S1800 to S1806 shown in FIG. 18A, respectively.
  • the processing from S1871 to S1880 shown in FIGS. 18D and 18E is the same as the processing from S1821 to S1830 shown in FIG. 18C, respectively.
  • The difference between the sequence example of FIGS. 18A, 18B, and 18C and that of FIGS. 18D and 18E is that FIGS. 18D and 18E have no processing corresponding to the additional word registration processing from S1807 to S1820 in FIGS. 18A, 18B, and 18C.
  • a menu for registering the additional information 1 is displayed on the display unit 425.
  • the user 331 creates additional information 1 by operating according to the displayed menu screen.
  • When the creation is completed, the additional information 1 is taken into the input management unit 420 (S1874).
  • the input management unit 420 transfers the acquired additional information 1 to the trigger setting unit 403 (S1875).
  • the trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1876).
  • When storing the additional information 1 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word recognized in S1856. As a result, the operation content can be saved in association with the reserved word recognized in S1856.
  • FIGS. 19A and 19B show a sequence example in which, with the additional word and the additional information 2 stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 18A, 18B, and 18C, a reserved word and an additional word are recognized among the words uttered by the user 331, the additional information 2 corresponding to the recognized combination of reserved word and additional word is read from the reserved word storage area 410-2, and an operation is set for the host device 332.
  • the processing from S1900 to S1908 shown in FIG. 19A is the same as the processing from S1600 to S1608 shown in FIG. 16A, respectively.
  • The processing of FIGS. 19A and 19B differs from that of FIGS. 16A and 16B in that the additional word recognition processing from S1909 to S1911 is added and in the additional information 2 reading processing from S1912 to S1913.
  • After successfully recognizing the reserved word in the data captured from the user 331, the trigger recognition unit 405 compares the voice data input following the reserved word with the recognition data (additional word) read from the reserved word storage area 410-2 of the memory 410 in order to determine whether that voice data is an additional word (S1911). When the voice data following the reserved word is recognized as an additional word, the trigger recognition unit 405 reads the additional information 2 corresponding to the combination of the reserved word and the additional word from the reserved word storage area 410-2 of the memory 410 (S1912). The trigger recognition unit 405 that has read the additional information 2 then sets the operation described by the read additional information 2 in the host device 332 (S1913).
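The recognition flow of FIGS. 19A and 19B, a reserved word first and the following utterance then checked as an additional word, can be sketched as a two-state matcher. This is an illustrative reconstruction; the class name, state labels, and table contents are assumptions, not the embodiment's implementation.

```python
# Illustrative two-state sketch of the flow in FIGS. 19A/19B: the state moves
# from waiting for a reserved word to waiting for an additional word, and a
# matched pair yields the operation content (additional information 2).
KNOWN = {"Iroha": {"ya": "lengthen_timer_T0", "oi": "announce_sorry"}}

class TriggerRecognizer:
    def __init__(self, table):
        self.table = table
        self.state = "waiting_reserved_word"
        self.reserved = None

    def feed(self, word):
        """Feed one recognized word; return operation content when a pair completes."""
        if self.state == "waiting_reserved_word":
            if word in self.table:  # reserved word recognized
                self.reserved = word
                self.state = "waiting_additional_word"
            return None
        # waiting_additional_word: compare against recognition data (cf. S1911)
        action = self.table[self.reserved].get(word)  # cf. S1912: read additional info 2
        self.state = "waiting_reserved_word"
        self.reserved = None
        return action  # cf. S1913: caller sets this operation in the host device

rec = TriggerRecognizer(KNOWN)
```

A word that is not a registered reserved word is ignored, and an unmatched follow-up word simply resets the matcher.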
  • In this way, using combinations of reserved words and additional words, the user can freely control the operation of the host device 332 and of devices and sensors connected to the host device 332 through the network, making it possible to control devices and sensors suited to the individual's lifestyle.
  • FIG. 20 is a list of examples in which, when a plurality of reserved words are registered in the host device 332 and one of them is recognized from the speech of the user 331, the speech recognition dictionary used in the voice text conversion unit 101-2 of the voice recognition cloud 101 is changed according to the recognized reserved word.
  • the host device 332 can register a plurality of reserved words.
  • When one of the registered reserved words is recognized, the host device 332 can change, according to the recognized reserved word, the speech recognition dictionary used for converting voice to text in the voice text conversion unit 101-2 of the voice recognition cloud 101. For example, as shown in FIGS. 21A and 21B, assume that the host device 332 has registered the three reserved words "Konnichiwa", "Hello", and "Ookini". In this case, when the host device 332 recognizes the reserved word "Konnichiwa", it can issue an instruction to change the speech recognition dictionary used in the voice text conversion unit 101-2 of the voice recognition cloud 101 to the Japanese dictionary. When the reserved word "Hello" is recognized, the host device 332 can instruct the voice text conversion unit 101-2 of the voice recognition cloud 101 to change the type of the speech recognition dictionary to the English dictionary. Furthermore, when the reserved word "Ookini" is recognized, the host device 332 can issue an instruction to change the type of the speech recognition dictionary used in the voice text conversion unit 101-2 of the voice recognition cloud 101 to a dialect dictionary (Kansai dialect).
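The per-reserved-word dictionary choice described above reduces to a simple mapping. The sketch below is illustrative only, assuming, as in FIGS. 21A/21B, reserved words mapped to Japanese, English, and Kansai dialect dictionaries; the function name and default fallback are assumptions.

```python
# Sketch of FIG. 20: dictionary type ("additional information 3") per reserved word.
DICTIONARY_BY_RESERVED_WORD = {
    "Konnichiwa": "japanese",
    "Hello": "english",
    "Ookini": "kansai_dialect",
}

def dictionary_for(reserved_word, default="japanese"):
    """Return the speech recognition dictionary type to request from the
    voice text conversion unit for this reserved word."""
    return DICTIONARY_BY_RESERVED_WORD.get(reserved_word, default)
```

The host device would send the returned type to the voice text conversion unit as the dictionary change instruction.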
  • When registering a reserved word in the host device 332, the user 331 therefore needs to also register the type of speech recognition dictionary (hereinafter referred to as additional information 3) to be used in the voice text conversion unit 101-2 in correspondence with that reserved word.
  • The processing sequence for registering the type of speech recognition dictionary (additional information 3) corresponding to a reserved word together with the registration of the reserved word is the same as the processing sequence for registering the additional information 1 for a reserved word shown in FIGS. 15A and 15B, except that on the menu screen displayed on the display unit 425 the additional information 3 is selected and input instead of the additional information 1 (S1516).
  • Here, the flow of the processing for registering the additional information 3 will be described using the processing after S1514 in FIG. 15B, with the additional information 1 described after S1514 read as the additional information 3.
  • a menu for registering the additional information 3 is displayed on the display unit 425.
  • the user 331 can select a type of dictionary as the additional information 3 by performing an input operation of the additional information 3 in accordance with the displayed menu screen.
  • When the input is completed, the additional information 3 is taken into the input management unit 420 (S1516).
  • the input management unit 420 transfers the acquired additional information 3 to the trigger setting unit 403.
  • the trigger setting unit 403 stores the transferred additional information 3 in the reserved word storage area 410-2 of the memory 410.
  • the trigger setting unit 403 stores the additional information 3 in association with the reserved word registered in S1513 when storing the additional information 3 in the reserved word storage area 410-2 of the memory 410.
  • FIGS. 21A and 21B show a sequence example in which, when a plurality of reserved words are registered in the host device 332 as shown in FIG. 20, the type of speech recognition dictionary to be used is changed each time the host device 332 recognizes one of the reserved words.
  • the processing from S2100 to S2113 shown in FIGS. 21A and 21B is the same as the processing from S1600 to S1613 shown in FIGS. 16A and 16B, respectively.
  • The difference between the processing in FIGS. 21A and 21B and the processing in FIGS. 16A and 16B is that, in the case of FIGS. 16A and 16B, after the trigger recognition unit 405 reads out the additional information 1, the operation of the host device 332 is set based on its contents (S1614), whereas in the case of FIGS. 21A and 21B, after the trigger recognition unit 405 reads out the additional information 3 (S2113), it instructs the voice text conversion unit 101-2 of the voice recognition cloud 101 to change the type of speech recognition dictionary according to the read additional information 3.
  • Thereafter, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S2109) to notify the user 331 that the recognition of the reserved word and the change of the speech recognition dictionary have been completed. The display device 425 that receives the notification desirably uses a display method that the user 331 can recognize, for example lighting its LED in green.
  • Instead of the display method, a voice method may be used: the trigger recognition unit 405 sends a recognition completion notification to the speaker 423, and the speaker 423 that receives the recognition completion notification announces to the user 331 by voice, for example, that the speech recognition dictionary has been changed.
  • To notify the user 331 that the reserved word has been recognized and that the speech recognition dictionary corresponding to the recognized reserved word has been changed, the trigger recognition unit 405 may also use both the display method using the display device 425 and the voice method using the speaker 423.
  • The operation content corresponding to a reserved word (additional information 1) shown in FIG. 14, the operation content for each additional word (additional information 2) shown in FIGS. 17A and 17B, and the type of speech recognition dictionary for a reserved word (additional information 3) shown in FIG. 20 can also be registered in combination.
  • FIG. 22 is a list of combinations in which the registration of the operation content corresponding to a reserved word shown in FIG. 14, the registration of an additional word for a reserved word and of the operation content for that additional word shown in FIG. 17A, and the registration of the type of speech recognition dictionary for a reserved word shown in FIG. 20 are combined.
  • In the example of FIG. 22, the host device 332 is configured, for example, to use the Japanese dictionary as the type of speech recognition dictionary for the reserved word "Konnichiwa". The host device 332 also registers "chan", "ya", and "oi" as additional words for the reserved word "Konnichiwa", and is set so as to raise the tone of its response when the additional word is "chan", to increase the expiration time T0 of the input interval confirmation timer T when the additional word is "ya", and to immediately announce "Sorry, there is no action" when the additional word is "oi".
  • FIG. 23 is a list of examples in which the type of speech recognition dictionary used in the voice text conversion unit 101-2 is changed according to conditions other than reserved words (hereinafter referred to as change conditions).
  • FIG. 23A shows an example in which time is set as the change condition.
  • In this example, the host device 332 instructs the voice text conversion unit 101-2 of the voice recognition cloud 101 to change the type of speech recognition dictionary used when converting voice data into text, depending on the time at which the dictionary is used.
  • For example, the host device 332 instructs the voice text conversion unit 101-2 through the Internet 2 to use the family general dictionary from 05:00 to 08:00, the wife dictionary from 08:00 to 16:00, the family general dictionary from 16:00 to 20:00, and the adult dictionary from 20:00 to 05:00.
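The time-based change condition of FIG. 23A can be sketched as a schedule lookup. The schedule below follows the example in the text; the helper name and the representation of the wrap-around night interval are illustrative assumptions.

```python
# Illustrative sketch of FIG. 23A: choosing a speech recognition dictionary by time of day.
from datetime import time

# (start, end, dictionary type); the last interval wraps past midnight
SCHEDULE = [
    (time(5, 0),  time(8, 0),  "family_general"),
    (time(8, 0),  time(16, 0), "wife"),
    (time(16, 0), time(20, 0), "family_general"),
    (time(20, 0), time(5, 0),  "adult"),  # 20:00 -> 05:00 (wraps past midnight)
]

def dictionary_for_time(now):
    for start, end, name in SCHEDULE:
        if start < end:
            if start <= now < end:
                return name
        else:  # interval wraps past midnight
            if now >= start or now < end:
                return name
    return "family_general"  # assumed fallback (unreachable with a full schedule)
```

The host device would evaluate this at the moment voice data is captured and send the result to the voice text conversion unit as the change instruction.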
  • The host device 332 can also instruct the voice text conversion unit 101-2 through the Internet 2 to change the type of speech recognition dictionary it uses depending on the operating status of the host device 332 at the time the dictionary is used.
  • For this purpose, the host device 332 has a mode (hereinafter referred to as a change condition registration mode) for registering change condition type information, which is information on the type of speech recognition dictionary to be used in accordance with each condition.
  • the user 331 needs to register the change condition type information in the host device 332 in advance in order to use the type of the voice recognition dictionary according to the change condition.
  • As a registration method for using different types of speech recognition dictionaries according to the change condition, the host device 332 may capture the voice uttered by the user 331 through the microphone 421 and perform registration by analyzing the captured voice data.
  • Alternatively, a menu for setting the change condition type information may be displayed on the display device 425 so that the user 331 can perform registration in accordance with the menu, or, using an external device connected via the network I / F 427 shown in FIG. 4, for example a smartphone or tablet, a menu for setting the change condition type information may be displayed on the display screen of the smartphone or tablet and the user 331 may perform registration by operating in accordance with the displayed menu screen.
  • FIG. 24 is an example of a processing sequence in which a menu for setting the change condition type information is displayed on the display unit 425 and the user 331 registers, by operating in accordance with the menu, the type of speech recognition dictionary to be used depending on the change condition.
  • the processing from S2417 to S2423 shown in FIG. 24 is the same as the processing from S1517 to S1523 in FIG.
  • the user 331 inputs the type of voice recognition dictionary to be used according to the change condition by operating according to the displayed menu screen.
  • the change condition type information for which input has been completed is taken into the input management unit 420 (S2417).
  • the input management unit 420 transfers the captured change condition type information to the trigger setting unit 403 (S2418).
  • the trigger setting unit 403 stores the transferred change condition type information in the reserved word storage area 410-2 of the memory 410 (S2419).
  • FIG. 25 shows an example of a processing sequence in which, with the change condition type information for changing the type of speech recognition dictionary according to the change condition stored in the reserved word storage area 410-2 of the memory 410 as shown in FIG. 23, the host device 332 notifies the voice text conversion unit 101-2 of a change of the speech recognition dictionary according to the content of the stored change condition type information.
  • FIG. 25 is an example in which the determination of whether to change the speech recognition dictionary and the notification of the result are performed at the timing (S1001) at which the host device 332 takes in the words uttered by the user 331 to control devices and sensors, as shown in FIGS. 10A and 10B.
  • the host device 332 When the recognition of the reserved word is completed, the host device 332 continuously captures the voice uttered by the user into the input management unit 420 through the microphone 421 (S2501).
  • At the timing when the voice data is acquired, the input management unit 420 transmits a read request (change condition type information) to the voice processing unit 407 in order to read the change condition type information (S2502), and pauses the processing of the captured voice data.
  • Upon receiving the read request (change condition type information), the voice processing unit 407 reads the change condition type information, consisting of combinations of change conditions and speech recognition dictionary types, from the reserved word storage area 410-2 of the memory 410 (S2503).
  • The voice processing unit 407 analyzes the "change condition" of the read change condition type information and determines whether its content matches the state of the host device 332 (S2504). If it does, the voice processing unit 407 reads the "type of speech recognition dictionary" corresponding to the "change condition" and sends the type of speech recognition dictionary after the change to the voice text conversion unit 101-2 through the Internet 2 as a speech recognition dictionary type notification (S2505). Upon receiving the speech recognition dictionary type notification, the voice text conversion unit 101-2 refers to the notified dictionary type and changes the type of speech recognition dictionary currently in use to the notified type (S2506). When the change of the dictionary type is completed, the voice text conversion unit 101-2 sends a speech recognition dictionary change completion notification to the voice processing unit 407 (S2507).
• Upon receiving the voice recognition dictionary change completion notification (S2507), the voice processing unit 407 transmits a read completion notification to the input management unit 420 as notification that the change condition type information has been read (S2508).
• Upon receiving the read completion notification (S2508), the input management unit 420 resumes the processing of the voice data captured in S2501.
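The pause/resume handshake of S2501–S2508 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all class and method names (`InputManager`, `VoiceProcessor`, `change_condition_table`, the `"guest_mode"` condition, and the return strings) are hypothetical stand-ins for the units 420, 407, and 101-2.

```python
# Minimal sketch of the S2501-S2508 handshake: the input manager pauses the
# processing of captured voice data until the voice processor confirms that
# the change-condition/dictionary-type information has been read and applied.
# All names are illustrative, not from the patent.

class VoiceTextConverter:
    """Stands in for the voice text conversion unit 101-2 in the cloud."""
    def __init__(self):
        self.dictionary_type = "standard"

    def change_dictionary(self, new_type):            # S2506
        self.dictionary_type = new_type
        return "change_complete"                      # S2507

class VoiceProcessor:
    """Stands in for the voice processing unit 407."""
    def __init__(self, converter, change_condition_table):
        self.converter = converter
        # e.g. {"guest_mode": "polite_dictionary"}, read from memory 410 (S2503)
        self.table = change_condition_table

    def handle_read_request(self, host_state):        # S2502 -> S2508
        for condition, dict_type in self.table.items():   # S2504
            if condition == host_state:
                ack = self.converter.change_dictionary(dict_type)  # S2505-S2507
                assert ack == "change_complete"
        return "read_complete"                        # S2508

class InputManager:
    """Stands in for the input management unit 420."""
    def __init__(self, processor):
        self.processor = processor
        self.pending = []

    def capture_voice(self, voice_data, host_state):  # S2501
        self.pending.append(voice_data)               # processing is paused here
        if self.processor.handle_read_request(host_state) == "read_complete":
            return self.pending.pop()                 # resume processing

converter = VoiceTextConverter()
processor = VoiceProcessor(converter, {"guest_mode": "polite_dictionary"})
manager = InputManager(processor)
resumed = manager.capture_voice("turn on the light", host_state="guest_mode")
print(resumed, converter.dictionary_type)
```

The point of the handshake is ordering: the captured utterance is only handed onward once the dictionary change has been acknowledged, so it is always recognized with the intended dictionary.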
• The user 331 may forget a reserved word registered in the host device 332. In preparation for such a case, it is desirable that the user 331 be able to confirm the registered reserved words by a simple method.
• FIG. 26 shows, for the case where the user 331 who registered reserved words in the example of the processing sequence shown in FIGS. 5A and 5B has forgotten a registered reserved word, a list of example reserved words that cause part or all of the registered reserved words to be shown to the user 331 (hereinafter referred to as “relief reserved words”), together with the display contents (display range) used to notify the user. For example, for the relief reserved word “I don't know”, the list shows the case where all of the reserved words registered in the host device 332 are displayed on the display unit 425 or in the display area of an external device connected to the host device 332.
• Likewise, the list shows the case where, for another relief reserved word, a predetermined part of the reserved words registered in the host device 332 is displayed on the display unit 425 or in the display area of an external device connected to the host device 332.
• For the relief reserved word “unused”, the list shows the case where reserved words that have not been used in the past year, among the reserved words registered in the host device 332, are displayed on the display unit 425 or in the display area of an external device connected to the host device 332.
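The relief-word examples above amount to a mapping from a trigger phrase to a display range. The sketch below illustrates that mapping; the sample reserved words, the stored last-used dates, and the exact one-year threshold are assumptions made for illustration only.

```python
# Illustrative mapping from a "relief" reserved word to the range of
# registered reserved words to display, following the examples around FIG. 26.
# The registered words, dates, and the 365-day threshold are assumptions.
from datetime import datetime, timedelta

registered = {
    "OOKINI":   datetime(2018, 3, 1),   # reserved word -> when it was last used
    "HEY":      datetime(2016, 1, 15),
    "KONCHIWA": datetime(2018, 2, 20),
}

def words_to_display(relief_word, now):
    if relief_word == "I don't know":           # show all registered words
        return sorted(registered)
    if relief_word == "unused":                 # not used in the past year
        cutoff = now - timedelta(days=365)
        return sorted(w for w, last in registered.items() if last < cutoff)
    return []

print(words_to_display("unused", datetime(2018, 4, 1)))
```

The returned list would then be rendered on the display unit 425 or on an external device.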
• The external device connected to the host device 332 is preferably a device with a relatively large display screen, such as a smartphone, tablet, or liquid crystal television, so that the user can refer to many reserved words at once.
• The registration of relief reserved words for displaying registered reserved words may be performed by changing the mode of the host device to the setting mode (reserved word (for display)) and following the reserved word registration processing sequence shown in FIGS. 5A and 5B.
• The above is an example in which the corresponding reserved words are displayed immediately when the user utters a relief reserved word shown in FIG. 26.
• Alternatively, the host device 332 may challenge the user 331 with a secret before displaying the corresponding reserved words. After the user utters a relief reserved word, the host device 332 emits a voice such as “mountain” through the speaker 423, and only when the user 331 responds with the expected answer, for example “river”, are the corresponding reserved words displayed.
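The optional challenge above can be sketched as a simple prompt/response gate. The “mountain”/“river” pair reuses the example from the text; the data structure and function names are illustrative assumptions.

```python
# Sketch of the optional secret challenge before displaying registered
# reserved words: the host utters a prompt and reveals the list only when
# the user gives the expected response. Names and structure are illustrative.
SECRETS = {"mountain": "river"}   # prompt spoken by the host -> expected reply

def reveal_reserved_words(prompt, user_response, registered_words):
    if SECRETS.get(prompt) == user_response:
        return registered_words    # display on unit 425 or an external device
    return None                    # keep the reserved words hidden

print(reveal_reserved_words("mountain", "river", ["OOKINI", "HEY"]))
print(reveal_reserved_words("mountain", "lake", ["OOKINI", "HEY"]))
```

A real device would of course match spoken audio rather than strings, but the gating logic is the same.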
• By taking in words uttered by the user 331, the host device 332 can record the audio or video of the scene in which a reserved word, additional word, or additional information is registered, and of the scene in which a reserved word or additional word is recognized.
• FIG. 27 shows a functional block diagram of the host device 332 for the case where the host device 332 takes in words uttered by the user 331 and records the audio or video of a scene in which a reserved word, additional word, or additional information is registered, or of a scene in which a reserved word or additional word is recognized.
• The differences from FIG. 4 are that the host device 2700 has a camera 2702 for recording video of a scene in which a reserved word, additional word, or additional information is registered, or of a scene in which a reserved word or additional word is recognized, and that the control management unit 2701 has EVT-Mg 2701-3 in addition to APP-Mg 2701-1 and CONF-Mg 2701-2, together with a playback control function for playing back the recorded audio or video scene data.
• EVT-Mg 2701-3 has a function, described later, of recording audio or video upon the occurrence of a scene in which a reserved word, additional word, or additional information is registered, or of a scene in which a reserved word or additional word is recognized.
• Next, the flow of the process in which the host device 332 takes in words uttered by the user 331 and records the audio or video of a scene of registering a reserved word, additional word, or additional information, or of a scene of recognizing a reserved word or additional word, will be described.
• FIG. 28 shows an example of the passage of time when the host device 332 records the audio or video of a registration scene or a recognition scene, upon occurrence of a scene of registering a reserved word, additional word, or additional information, or of a scene of recognizing a reserved word or additional word.
• First, the host device 332 starts registering words uttered by the user as reserved words.
• The start of reserved word registration may be, for example, the timing at which the input management unit 420 performs the processing of S502 in the reserved word registration sequence of FIGS. 5A and 5B.
• Upon grasping the start of reserved word registration, the input management unit 420 notifies EVT-Mg 2701-3 to that effect.
• EVT-Mg 2701-3, having received the notification that reserved word registration has started, records the audio of the reserved word registration scene as Rec1 through the microphone 421, or records the video of the reserved word registration scene as Rec1 through the camera 2702.
• The end of reserved word registration may be, for example, the timing at which the input management unit 420 receives the registration completion notification of S512 in the reserved word registration sequence of FIGS. 5A and 5B.
• Upon grasping the end of reserved word registration, the input management unit 420 notifies EVT-Mg 2701-3 to that effect.
• EVT-Mg 2701-3, having received the reserved word registration completion, ends the audio recording of the reserved word registration scene performed through the microphone 421, or ends the video recording of the reserved word registration scene performed through the camera 2702.
• Next, the host device 332 starts recognizing words uttered by the user as reserved words.
• The start of reserved word recognition may be, for example, the timing at which the input management unit 420 performs the processing of S802 in the reserved word recognition sequence of FIGS. 8A and 8B.
• Upon grasping the start of reserved word recognition, the input management unit 420 notifies EVT-Mg 2701-3 to that effect.
• EVT-Mg 2701-3, having received the notification of the start of reserved word recognition, records the audio of the reserved word recognition scene as Rec2 through the microphone 421, or records the video of the reserved word recognition scene as Rec2 through the camera 2702.
• The end of reserved word recognition may be, for example, the timing at which the input management unit 420 receives the recognition completion notification of S811 in the reserved word recognition sequence of FIGS. 8A and 8B.
• Upon grasping the end of reserved word recognition, the input management unit 420 notifies EVT-Mg 2701-3 to that effect.
• EVT-Mg 2701-3, having received this notification, ends the audio recording of the reserved word recognition scene performed through the microphone 421, or ends the video recording of the reserved word recognition scene performed through the camera 2702.
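The start/end notifications that bracket Rec1 and Rec2 in FIG. 28 can be sketched as a small event manager. This is an illustrative model only: the class name `EvtMg`, the method names, and the tuple representation of a clip are assumptions, not the patent's implementation.

```python
# Sketch of the notification-driven recording of FIG. 28: the input manager
# signals EVT-Mg at the start and end of a registration or recognition scene,
# and EVT-Mg brackets an audio (microphone 421) or video (camera 2702)
# recording between the two notifications. Names are illustrative.
class EvtMg:
    def __init__(self):
        self.recordings = []   # finished clips, e.g. ("Rec1", "registration")
        self.active = None     # the clip currently being recorded, if any

    def on_scene_start(self, label, scene_type):   # e.g. the S502/S802 timing
        self.active = (label, scene_type)

    def on_scene_end(self):                        # e.g. the S512/S811 timing
        if self.active is not None:
            self.recordings.append(self.active)
            self.active = None

evt = EvtMg()
evt.on_scene_start("Rec1", "reserved word registration")
evt.on_scene_end()
evt.on_scene_start("Rec2", "reserved word recognition")
evt.on_scene_end()
print(evt.recordings)
```

Each finished clip then appears as one selectable item in the playback view of FIG. 29.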
• The host device 332 can play back the audio or video of a scene recorded in this way.
• FIG. 29 shows an example of a state in which the data to be reproduced is displayed when reproducing each item of recorded audio or video scene data.
• Four icons of data to be reproduced are displayed in a form corresponding to the occurrence of events along the time axis of FIG. 28.
• The icons of the data to be reproduced may be displayed, for example, on the display unit 425, or on an external device connected to the host device 332, such as a smartphone, tablet, or liquid crystal television.
• Each displayed icon represents the date and time of the recording and the content of the recorded data. For example, if the display content of the icon is reserved word registration “OOKINI”, it indicates that the content of the recorded audio or video data is a scene in which “OOKINI” was registered as a reserved word. Similarly, if the display content of the icon is reserved word recognition “OOKINI”, it indicates that the content of the recorded data is a scene in which “OOKINI” was recognized as a reserved word.
• The user 331 can confirm the recorded audio or video content of the target data by selecting the icon of the data to be reproduced.
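The icon view of FIG. 29 is essentially a list of clips keyed by date/time and content, from which selection hands one clip to the playback control. The sketch below illustrates that; the dictionary layout, field names, and sample dates are assumptions for illustration.

```python
# Sketch of the FIG. 29 playback view: each recorded clip is shown as an
# icon carrying the recording date/time and its content (registration vs.
# recognition of a reserved word); selecting an icon returns the clip to
# hand to the playback control. The data layout is an assumption.
clips = [
    {"id": 1, "when": "2018-04-01 09:00", "label": "reserved word registration 'OOKINI'"},
    {"id": 2, "when": "2018-04-02 19:30", "label": "reserved word recognition 'OOKINI'"},
]

def icon_text(clip):
    """Text shown on the icon: date/time plus the recorded content."""
    return f"{clip['when']}  {clip['label']}"

def select_icon(clip_id):
    """Selecting an icon yields the clip for playback, or None if absent."""
    for clip in clips:
        if clip["id"] == clip_id:
            return clip
    return None

print([icon_text(c) for c in clips])
print(select_icon(2)["label"])
```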
• Note that the host device 332 may also issue instructions to cameras and microphones connected via the network 333 so that, when a scene of registering a reserved word, additional word, or additional information occurs, or when a scene of recognizing a reserved word or additional word occurs, those cameras and microphones record the audio or video of the registration scene or recognition scene.
• As described above, the host device 332 recognizes a reserved word among the words uttered by the user 331, and can control a device or sensor connected via the network based on the content of the additional information corresponding to the reserved word.
• The control contents for the target devices and sensors may require high security.
• For example, suppose that a reserved word in which the opening/closing operation of a safe door is set as additional information is registered in the host device 332, so that the opening/closing of the safe door can be controlled through the host device.
• In this case, when opening and closing the door of the safe, the host device 332 records the audio or video of the surroundings of the safe, the control target device, using a microphone and camera near the safe.
• The user 331 can confirm the contents of data recorded not only with the microphone or camera built into the host device 332 but also with a microphone or camera connected to the network.
• Before executing control contents that require high security, the host device 332 may further confirm the validity of the person who produced the recorded voice, or of the person in the recorded video, using the audio recorded with a microphone near the control target device or sensor, or the video recorded with a camera near it.
• That is, before executing the control content in specific additional information, the host device 332 may compare pre-registered feature points, such as the voice or face of a specific person, against the sound collected by a microphone or the video captured by a camera near the control target device or sensor, and execute the control content only when the validity of the corresponding person is confirmed.
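The high-security gate described above can be sketched as a comparison of captured features against pre-registered feature points, with execution allowed only on a match. This is a toy model: the feature sets, the Jaccard-style similarity function, the threshold of 0.6, and the person name are all assumptions standing in for a real voice/face matcher.

```python
# Sketch of the high-security gate: before executing a sensitive control
# (e.g. opening a safe door), the host compares sound or video captured near
# the target device against pre-registered feature points of an authorized
# person. The similarity function and threshold are illustrative stand-ins
# for real biometric matching.
AUTHORIZED_FEATURES = {"alice": {0.9, 0.8, 0.7}}   # pre-registered features

def similarity(features_a, features_b):
    # Toy Jaccard similarity standing in for voice/face matching.
    return len(features_a & features_b) / len(features_a | features_b)

def execute_control(command, captured_features, threshold=0.6):
    for person, reference in AUTHORIZED_FEATURES.items():
        if similarity(reference, captured_features) >= threshold:
            return f"executed '{command}' for {person}"
    return "rejected: person could not be validated"

print(execute_control("open safe door", {0.9, 0.8, 0.7}))
print(execute_control("open safe door", {0.1, 0.2}))
```

The design point is that the validity check runs *before* the control content, so an unverified utterance never reaches the device.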
• In the above description, the recognition data conversion unit 101-1, the voice text conversion unit 101-2, the text analysis unit 102-1, and the response/action generation unit 102-2 have all been described as existing in the cloud server 1, but some or all of these may exist in the host device 332. In this case as well, the examples of the operation sequences of each process are the same as those already described.

Abstract

An objective of the present embodiment is to provide an electronic device, and a control method for the same, the electronic device controlling various devices connected by a network so as to match the lifestyle of individual users. An electronic device according to the present embodiment determines, from the content of an externally input first utterance, to control one or a plurality of devices on the basis of the content of a second utterance input after the first utterance. The electronic device comprises: a management means which uses a plurality of externally input utterances to create and manage determining utterance data for determining that the first utterance is a desired utterance, and which uses the created and managed determining utterance data to determine that the first utterance is the desired utterance; and a control means that controls the one or plurality of devices on the basis of the content of the second utterance. When the management means has determined that the first utterance is the desired utterance by using the determining utterance data, the control means controls the one or plurality of devices on the basis of the content of the second utterance.

Description

Electronic device and control method thereof
Embodiments of the present invention relate to an electronic device that controls a plurality of devices by voice, and a control method thereof, in the field of home automation in homes, offices, and small-scale business establishments.
Conventionally, in the field of home automation, there are voice recognition apparatuses and methods for operating and controlling various devices in homes, offices, and small-scale business establishments by voice input.
Such a voice recognition apparatus and method analyzes the voice input by a user to determine whether the input voice is a voice that turns on a function of the apparatus; when it is determined that the voice is one that turns on the function, the content of the subsequent voice is analyzed and processing based on the analysis result is performed. There are also apparatuses that identify the user who uttered the voice by recognizing features of the input voice, and perform processing suited to that user.
International Publication No. 2015-029379
International Publication No. 2015-033523
As one form of home automation system, there is a type in which the individual devices are connected to each other via a home network, and a host device that controls the plurality of connected devices as a whole is also connected to the network. In this case, the host device controls the operation of each device connected via the network, and collects and manages information about each device so that the user can browse it centrally.
For example, by instructing the host device by voice, the user can control each device connected to the host device via the network, and can centrally browse information about each connected device.
In such a home automation system, devices to be controlled can easily be connected via the network, so the number and types of connected devices tend to be large. In addition, new participation in the network, setting changes, and withdrawal from the network tend to occur frequently, accompanying the addition, replacement, version upgrade, relocation, and disposal of controlled devices. Furthermore, since the connected devices span many kinds of operations and specifications, home automation systems tend to be used in homes and offices by people of all ages and genders. This trend has become all the more pronounced with the recent miniaturization of devices and sensors having a wide variety of functions.
However, conventional home automation systems have not been sufficient in controlling such a wide variety of devices or in accommodating such a wide range of users. For example, when a home automation system is used in a home, devices have not been controlled in a way that sufficiently matches the lifestyle of each family member.
The present embodiment has been made in view of the above problems, and its object is to propose an electronic device that controls a wide variety of devices connected by a network so as to better match each user's individual lifestyle, and a control method for the same.
The electronic device of the embodiment is an electronic device that determines, from the content of a first voice input from the outside, whether to execute control of one or a plurality of devices based on the content of a second voice input after the first voice. The electronic device comprises: management means that creates and manages determination voice data for determining that the first voice is a desired voice, from voices input from the outside a plurality of times, and that determines, using the created and managed determination voice data, that the first voice is the desired voice; and control means that executes control of the one or plurality of devices based on the content of the second voice. When the management means determines, using the determination voice data, that the first voice is the desired voice, the control means executes control of the one or plurality of devices based on the content of the second voice.
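The claimed structure above can be sketched as two cooperating objects: a management means that builds determination data from several registrations of the first voice, and a control means that runs only after the first voice matches. The averaging "model", tolerance value, and all names are toy assumptions standing in for real acoustic matching.

```python
# Sketch of the claimed structure: determination voice data for the first
# voice (the reserved word) is built from several utterances, and device
# control based on the second voice runs only after the first voice is
# judged to be the desired voice. The numeric "features" are a toy stand-in
# for real acoustic features.
class ManagementMeans:
    def __init__(self):
        self.samples = []             # determination voice data (toy: numbers)

    def register(self, utterance_feature):
        self.samples.append(utterance_feature)   # built over multiple inputs

    def is_desired(self, feature, tolerance=0.1):
        if not self.samples:
            return False
        mean = sum(self.samples) / len(self.samples)
        return abs(feature - mean) <= tolerance

class ControlMeans:
    def control(self, second_voice):
        return f"device command: {second_voice}"

mgmt, ctrl = ManagementMeans(), ControlMeans()
for f in (0.50, 0.52, 0.48):          # three registrations of the first voice
    mgmt.register(f)

def handle(first_voice_feature, second_voice):
    if mgmt.is_desired(first_voice_feature):     # judged by management means
        return ctrl.control(second_voice)        # executed by control means
    return None                                  # second voice is ignored

print(handle(0.51, "turn on the air conditioner"))
print(handle(0.90, "turn on the air conditioner"))
```

Building the determination data from multiple utterances is what lets the device adapt the trigger to an individual user's way of speaking.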
FIG. 1 is a diagram illustrating an example of the overall image of a home automation system according to an embodiment.
FIG. 2 is a list showing other examples of sensors according to an embodiment.
FIG. 3 is a diagram illustrating an example of a host device according to an embodiment.
FIG. 4 is a functional block diagram of a host device according to an embodiment.
FIGS. 5A and 5B are diagrams illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
FIGS. 6A and 6B are diagrams illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
FIGS. 7A and 7B are diagrams illustrating an example of a processing sequence in registering a reserved word according to an embodiment.
FIGS. 8A and 8B are diagrams illustrating an example of a processing sequence in recognizing a reserved word according to an embodiment.
FIGS. 9A and 9B are diagrams illustrating an example of a processing sequence in recognizing a reserved word according to an embodiment.
FIGS. 10A and 10B are diagrams illustrating an example of a processing sequence for controlling a corresponding device or sensor based on words for controlling devices and sensors that the user continues to utter after a reserved word is recognized, according to an embodiment.
FIGS. 11A and 11B are diagrams illustrating an example of a processing sequence for the case where the words for controlling devices and sensors that the user continues to utter after a reserved word is recognized continue within a fixed time, according to an embodiment.
FIGS. 12A and 12B are diagrams illustrating an example of a processing sequence for the case where the words for controlling devices and sensors that the user continues to utter after a reserved word is recognized continue beyond a fixed time, according to an embodiment.
FIG. 13 is a list specifically showing the contents of control information used when controlling devices and sensors after a reserved word is recognized, according to an embodiment.
FIG. 14 is a list showing examples of operation contents changed according to a plurality of reserved words, according to an embodiment.
FIGS. 15A and 15B are diagrams illustrating an example of a processing sequence for registering a plurality of reserved words together with the operation contents to be changed according to each reserved word, according to an embodiment.
FIGS. 16A and 16B are diagrams illustrating an example of a processing sequence for setting operation contents according to each reserved word in reserved word recognition, according to an embodiment.
FIG. 17 is a list showing examples of operation contents set according to the words that follow a reserved word, according to an embodiment.
FIGS. 18A to 18C are diagrams illustrating an example of a processing sequence for setting operation contents according to the words that follow a registered reserved word when that reserved word is recognized, according to an embodiment.
FIGS. 18D and 18E are diagrams illustrating another example of a processing sequence for setting operation contents according to the words that follow a registered reserved word when that reserved word is recognized, according to an embodiment.
FIGS. 19A and 19B are diagrams illustrating an example of a processing sequence for setting operation contents according to the words that follow a recognized reserved word in reserved word recognition, according to an embodiment.
FIG. 20 is a list showing examples of the types of speech recognition dictionary used according to each of a plurality of reserved words, according to an embodiment.
FIGS. 21A and 21B are diagrams illustrating an example of a processing sequence for changing the type of speech recognition dictionary used according to a recognized reserved word, according to an embodiment.
FIG. 22 is a list showing an example of changing, for a plurality of reserved words, the operation contents set according to the words that follow each reserved word and the type of speech recognition dictionary used, according to an embodiment.
FIG. 23 is a list showing an example of changing the type of speech recognition dictionary according to contents other than reserved words, according to an embodiment.
FIG. 24 is a diagram showing a sequence of processing for registering the type of speech recognition dictionary to be changed according to contents other than reserved words, according to an embodiment.
FIG. 25 is a diagram showing a sequence of processing for changing the type of speech recognition dictionary registered according to contents other than reserved words, according to an embodiment.
FIG. 26 is a list showing examples of reserved words (for relief) for displaying registered reserved words when the user has forgotten them, together with the corresponding ranges of reserved words to be displayed, according to an embodiment.
FIG. 27 is a functional block diagram of a host device according to an embodiment.
FIG. 28 is a diagram showing an example of the passage of time when the host device 332 records the audio or video of a registration scene or a recognition scene, upon occurrence of a scene of registering a reserved word, additional word, or additional information, or of a scene of recognizing a reserved word or additional word, according to an embodiment.
FIG. 29 is a diagram showing an example of a state in which the data to be reproduced is displayed when reproducing each item of recorded audio or video scene data, according to an embodiment.
FIG. 1 is a diagram showing an example of the overall configuration of the home automation system according to the present embodiment. The home automation system consists of a cloud server 1 comprising a group of servers placed in the cloud; a home 3 in which various sensors 310, various equipment devices 320, and various home appliances 340 are arranged, connected to one another by a network 333 via a host device 332 having an HGW (HomeGateWay) function; and the Internet 2, which connects the cloud server 1 and the host device 332.
The home 3 is a home, office, or small-scale business establishment in which various sensors 310, various equipment devices 320, and various home appliances 340 are arranged and connected to one another by the home network 333 via the host device 332 having the HGW function; its scale does not matter.
The host device 332 has functions of controlling the devices and sensors connected via the network 333 based on preset information and on information notified from the sensors connected via the network 333, and of centrally managing information about each device and sensor.
Furthermore, the host device 332 has a microphone and can capture words uttered by the user 331. When the host device 332 recognizes a predetermined keyword (hereinafter called a reserved word) among the words uttered by the user 331, it captures the words that the user 331 utters following the reserved word and, by analyzing the content of the captured words, returns a response corresponding to the analysis result to the user 331, or controls devices and sensors connected via the network 333 according to the analysis result.
 Conversely, unless the host device 332 recognizes a reserved word in the words uttered by the user 331, it does not continue to capture the user's speech. This prevents the host device 332 from operating in response to unwanted ambient sound that it happens to pick up.
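The gating behavior described above can be sketched as follows. This is an illustrative sketch only; the class name, the reserved word, and the one-command-per-reserved-word policy are assumptions for illustration, not part of the embodiment:

```python
from dataclasses import dataclass, field

@dataclass
class HostDeviceSketch:
    """Hypothetical model of the host device's reserved-word gating."""
    reserved_words: set
    listening: bool = False  # True only after a reserved word was recognized
    forwarded: list = field(default_factory=list)  # words passed on for analysis

    def on_utterance(self, word: str) -> None:
        if not self.listening:
            # Ambient speech is ignored until a reserved word is recognized.
            if word in self.reserved_words:
                self.listening = True
        else:
            # Words following the reserved word are captured for analysis.
            self.forwarded.append(word)
            self.listening = False  # one command per reserved word, for simplicity

host = HostDeviceSketch(reserved_words={"hello-home"})
host.on_utterance("turn on the lights")  # ignored: no reserved word yet
host.on_utterance("hello-home")          # reserved word recognized
host.on_utterance("turn on the lights")  # captured and forwarded
```

Only the third utterance is forwarded; the first, identical utterance is discarded because no reserved word preceded it.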
 Recognition of the reserved word is performed within the host device 332; the words the user 331 utters following the reserved word are captured continuously, and the content of the captured words is analyzed in the cloud server 1. Details of the functions of the host device 332 will be described later.
 Regarding the various equipment devices 320 and the various home appliances 340, for convenience of explanation the equipment devices 320 denote devices that are not easily moved, while the home appliances 340 denote devices that are relatively easy to move. The names of the exemplified equipment devices and home appliances do not limit the capabilities or functions of the individual devices.
 Specific examples of the various sensors 310 include a security camera 311, a fire alarm 312, a human presence sensor 313, and a temperature sensor 314. Specific examples of the various equipment devices 320 include an intercom 325, a light 326, an air conditioner 327, and a water heater 328. Specific examples of the various home appliances 340 include a washing machine 341, a refrigerator 342, a microwave oven 343, an electric fan 344, a rice cooker 345, and a television 346.
 FIG. 2 shows other examples of the various sensors 310 shown in FIG. 1.
 FIG. 3 shows various examples of the host device 332 shown in FIG. 1.
 The host device 332-1 is the host device 332 shown in FIG. 1 and is an example of a stationary type with a built-in HGW function. The host device 332-1 is connected to the other devices and sensors arranged in the home 3 through the network 333, and is connected to the cloud server 1 through the Internet 2. Because the host device 332-1 is stationary, it is an example that carries no means of autonomous movement, such as a motor.
 The host device 332-2 is an example of a stationary type without a built-in HGW function. The host device 332-2 is therefore connected to the HGW 330 through the network 333. Via the HGW 330, the host device 332-2 is connected through the network 333 to the other devices and sensors arranged in the home 3, and through the Internet 2 to the cloud server 1. Because the host device 332-2 is stationary, it is an example that carries no means of autonomous movement, such as a motor.
 The host device 332-3 is an example of a movable type with a built-in HGW function. The host device 332-3 is connected to the other devices and sensors through the network 333, and to the cloud server 1 through the Internet 2. Because the host device 332-3 is movable, it is an example that carries means of autonomous movement, such as a motor.
 The host device 332-4 is an example of a movable type without a built-in HGW function. The host device 332-4 is therefore connected to the HGW 330 through the network 333. Via the HGW 330, the host device 332-4 is connected through the network 333 to the other devices and sensors, and through the Internet 2 to the cloud server 1. Because the host device 332-4 is movable, it is an example that carries means of autonomous movement, such as a motor.
 FIG. 4 shows the functional blocks of the host device 332 shown in FIG. 1. The host device 332 has a system controller 402 that controls the entire internal processing; a control management unit 401, a trigger setting unit 403, a trigger recognition unit 405, and an input management unit 420, each controlled through the system controller; and a network I/F 427 for connecting to the network 333. The control management unit 401 internally comprises APP-Mg 401-1, which manages a plurality of applications for controlling the various operations of the host device 332, and CONF-Mg 401-2, which manages settings such as the initial settings and the various state and operation settings of each functional block of the host device 332.
 In addition, as interfaces (I/F) with the user 331, the host device 332 has a microphone 421 for capturing the words uttered by the user 331, a speaker 423 for outputting responses to the user 331 by voice, and a display unit 425 for notifying the user 331 of the state of the host device 332.
 The microphone 421 is connected to the input management unit 420. Depending on the state it manages internally, the input management unit 420 controls whether the voice data input from the microphone 421 is sent to the trigger setting unit 403, the trigger recognition unit 405, or the voice processing unit 407. The display unit 425 notifies the user 331 of the state of the host device 332 and is, for example, an LED (Light Emitting Diode) or an LCD (Liquid Crystal Display).
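The mode-dependent routing performed by the input management unit 420 can be sketched as a simple dispatch table. The mode strings and destination names below are illustrative assumptions, not identifiers from the embodiment:

```python
# Sketch of input management unit 420: route microphone voice data to one of
# the trigger setting unit 403, trigger recognition unit 405, or voice
# processing unit 407, according to the internally managed mode.

def route_voice_data(mode: str) -> str:
    destinations = {
        "reserved_word_registration": "trigger_setting_unit_403",
        "operation": "trigger_recognition_unit_405",
        "voice_processing": "voice_processing_unit_407",
    }
    if mode not in destinations:
        raise ValueError(f"unknown mode: {mode}")
    return destinations[mode]
```

For example, in the operation mode the data goes to the trigger recognition unit, matching the sequence of FIGS. 8A and 8B.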
 The memory 410 is divided into three areas: an operation mode storage area 410-1, a reserved word storage area 410-2, and a voice storage area 410-3. The contents of the information stored in each area will be described later.
 As described above, when the host device 332 recognizes a reserved word in the words uttered by the user 331, it captures the words the user 331 utters following the reserved word and analyzes their content; based on the analysis result, it returns a response to the user 331 or controls the operation of the devices and sensors connected through the network 333.
 To realize these functions, the host device 332 performs four main processes. The first is registration of reserved words. The second is recognition of reserved words. The third is registration of the control content for the devices and sensors whose operation is to be controlled. The fourth is control of the devices and sensors whose control content has been registered.
 First, registration of reserved words, the first of these processes, will be described.
 The host device 332 has a function of registering reserved words in the host device 332. To register a reserved word, the host device 332 has a mode for registering reserved words (hereinafter referred to as the reserved word registration mode).
 FIGS. 5A and 5B show an example of the processing sequence of the host device 332 from the start to the completion of reserved word registration, in a state where the host device 332 has transitioned to the reserved word registration mode in order to register a reserved word.
 Note that the host device 332 may allow the mode to be changed by recognizing words uttered by the user 331 in a predetermined order. Alternatively, a menu screen may be displayed on the display unit 425, and the mode may be changed by the user 331 operating that menu screen. Alternatively, the mode may be changed by the user 331 operating a menu screen, displayed on a smartphone or tablet connected via the network I/F 427, for changing the mode of the host device 332.
 When the user 331 utters a word to be registered as a reserved word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S501). The input management unit 420 has a function of determining the transfer destination of input voice data according to the state it manages internally. When the host device 332 is in the setting mode, the input management unit 420 transfers the received voice data to the trigger setting unit 403 (S502). The trigger setting unit 403 stores the received voice data in the voice storage area 410-3 of the memory 410 (S503) and checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S504).
 If the check of whether the number of times the voice of the user 331 has been captured has reached the prescribed number finds that it has not, the trigger setting unit 403 displays an indication prompting the user 331 to utter the word to be registered (S507) and sends an input continuation notification to the input management unit 420 (S506). On receiving the input continuation notification, the input management unit 420 transitions its internal state to waiting for voice input from the microphone (S500).
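The capture-and-count loop of S501 to S509 can be sketched as follows, under the assumption that the prescribed number of captures is a fixed constant; all names are illustrative:

```python
# Sketch of the registration loop: store each capture (S503), check the count
# (S504), and once the prescribed number is reached, read out the stored batch
# for sending to the recognition data conversion unit (S508/S509).

PRESCRIBED_COUNT = 3  # assumed value; the embodiment lets this be configured

def register_reserved_word(utterances):
    """Return the batch of captured voice data to send to the cloud once the
    prescribed count is reached, or None if the user must be prompted again."""
    stored = []                               # stands in for voice storage area 410-3
    for voice_data in utterances:
        stored.append(voice_data)             # S503: store the capture
        if len(stored) >= PRESCRIBED_COUNT:   # S504: count check
            return stored                     # S508/S509: read out and send
    return None                               # S506/S507: prompt for more input
```

Returning None corresponds to the branch in which the registration-incomplete indication is shown and the input management unit re-enters the input-waiting state.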
 Note that the indication prompting the user 331 to input the word to be registered is preferably given in a manner the user 331 can recognize: the trigger setting unit 403 sends a registration-incomplete notification to the display unit 425 (S505), and the display unit 425 that receives it, for example, blinks a light emitting diode (LED) in red (S507). A voice method may be used instead of the display method to prompt the user 331 to input the word to be registered. In this case, the trigger setting unit 403 sends the registration-incomplete notification to the speaker 423, and the speaker 423 that receives it may, for example, announce "Please input again" to the user 331. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, when the host device 332 is of the movable type, the trigger setting unit 403 may instruct a moving means, not shown, so that the host device 332, for example, repeatedly rotates back and forth through a certain angular width.
 If the check of whether the number of times the voice of the user 331 has been captured has reached the prescribed number finds that it has, the trigger setting unit 403 reads the voice data stored so far in the voice storage area 410-3 (S508) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 of the cloud server 1 (S509).
 The recognition data conversion unit 101-1 converts the voice data sent from the trigger setting unit 403 into recognition data for recognizing it as a reserved word (S510). When the conversion to recognition data is complete, the recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 through the Internet 2 (S511). The trigger setting unit 403, having received the recognition data, stores it in the reserved word storage area 410-2 of the memory 410 (S512).
 The trigger setting unit 403 gives an indication informing the user 331 that registration of the reserved word is complete (S514). This indication is preferably given in a manner the user 331 can recognize: the trigger setting unit 403 sends a registration-completion notification to the display unit 425 (S514), and the display unit 425 that receives it, for example, lights the LED in green. Alternatively, the trigger setting unit 403 may use a voice method instead of the display method to notify the user 331 that registration of the reserved word is complete. In this case, the trigger setting unit 403 sends the registration-completion notification to the speaker 423, and the speaker 423 that receives it may, for example, announce "Registration is complete" to the user 331. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to notify the user 331 that registration of the reserved word is complete. Alternatively, when the host device 332 is of the movable type, the trigger setting unit 403 may instruct a moving means, not shown, so that the host device 332, for example, repeatedly moves back and forth linearly over a certain distance.
 As described above, the trigger setting unit 403 has the role of managing the flow of data in the registration of reserved words.
 FIGS. 6A and 6B show another example of the sequence from the start to the completion of reserved word registration. The voice data captured by the host device 332 may be insufficient for registration as a reserved word; this example shows the processing in such a case.
 The processing of S600 to S615 shown in FIGS. 6A and 6B is identical to the processing of S500 to S515 shown in FIGS. 5A and 5B, respectively. The difference between the processing in FIGS. 5A and 5B and that in FIGS. 6A and 6B is that the processing of S616 to S619 is added to the processing of FIG. 6B.
 If the trigger setting unit 403 checks whether the number of times the words uttered by the user 331 have been captured has reached the prescribed number (S604) and finds that it has, the trigger setting unit 403 reads the voice data stored so far in the voice storage area 410-3 (S608) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the voice recognition cloud 101 of the cloud server 1 (S609).
 If the trigger setting unit 403 finds that the number of times the words uttered by the user 331 have been captured has not reached the prescribed number, it displays an indication prompting the user 331 to utter the word to be registered (S607) and sends an input continuation notification to the input management unit 420 (S606). On receiving the input continuation notification, the input management unit 420 transitions its internal state to waiting for voice input from the microphone (S600).
 Note that the indication prompting the user 331 to input the word to be registered is preferably given in a manner the user 331 can recognize: the trigger setting unit 403 sends a registration-incomplete notification to the display unit 425 (S605), and the display unit 425 that receives it, for example, blinks the LED in red (S607). A voice method may be used instead of the display method to prompt the user 331 to input the word to be registered. In this case, the trigger setting unit 403 sends the registration-incomplete notification to the speaker 423, and the speaker 423 that receives it may, for example, announce "Please input again" to the user 331. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, when the host device 332 is of the movable type, the trigger setting unit 403 may instruct a moving means, not shown, so that the host device 332, for example, repeatedly rotates back and forth through a certain angular width.
 When converting all the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the sent voice data can be converted into recognition data (S616). If it determines that some of the sent voice data cannot be converted into recognition data, the recognition data conversion unit 101-1 sends a voice data addition request to the trigger setting unit 403 through the Internet 2 (S617). On receiving the voice data addition request, the trigger setting unit 403 sets the number of additional times the user 331 is to input the word to be registered as a reserved word (S618) and sends an input continuation notification to the input management unit 420 (S619).
 At the point when the trigger setting unit 403 has set the number of additional inputs to be made by the user 331 (S618), the LED of the display unit 425, for example, remains lit in red. Following this indication, the user 331 utters the word to be registered as a reserved word the number of additional times set in S618.
 On receiving the input continuation notification (S619), the input management unit 420 transitions its internal state to input waiting (S600) and waits for input of the words uttered by the user 331.
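The cloud-side decision of S616 and S617 can be sketched as follows. The convertibility test used here (treating an empty sample as unconvertible, e.g. silence or clipped audio) is an invented stand-in for the actual acoustic check, and the return values are illustrative:

```python
# Sketch of S616-S619: if some samples cannot be converted into recognition
# data, the cloud requests additional voice data for the failed samples.

def convertible(sample: str) -> bool:
    # Stand-in for the convertibility check (S616).
    return bool(sample)

def check_batch(samples):
    """Return ("registered", n) if all n samples convert, otherwise
    ("additional_request", k) where k is the number of failed samples,
    corresponding to the additional input count set in S618."""
    failed = [s for s in samples if not convertible(s)]
    if failed:
        return ("additional_request", len(failed))  # S617
    return ("registered", len(samples))
```

An `"additional_request"` result corresponds to the host device re-entering the input-waiting state and prompting the user for the additional utterances.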
 In the processing shown in FIGS. 5A and 5B and in FIGS. 6A and 6B, the captured voice data is sent to the recognition data conversion unit 101-1 in the cloud server 1 as a batch after the number of times the input management unit 420 has captured the voice uttered by the user 331 reaches the prescribed number; however, the captured voice data may instead be sent to the recognition data conversion unit 101-1 each time the input management unit 420 captures the voice uttered by the user 331. FIGS. 7A and 7B show an example of the sequence in which, each time the input management unit 420 captures the voice uttered by the user 331, the captured voice data is sent one utterance at a time to the recognition data conversion unit 101-1 in the cloud server 1 and converted into recognition data.
 The processing of S700 to S702 shown in FIG. 7A is identical to the processing of S500 to S502 shown in FIG. 5A, respectively. The processing of S703 and S704 shown in FIG. 7A is identical to the processing of S505 and S507 shown in FIG. 5A, respectively.
 When the user 331 utters a word to be registered as a reserved word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S701). Since the host device 332 is in the reserved word registration mode, the input management unit 420 transfers the received voice data to the trigger setting unit 403 (S702). Each time it receives voice data, the trigger setting unit 403 sends it to the recognition data conversion unit 101-1 in the cloud server 1 (S706). When converting the voice data sent from the trigger setting unit 403 into recognition data, the recognition data conversion unit 101-1 determines whether the sent voice data can be converted into recognition data (S707).
 If it determines that the sent voice data cannot be converted into recognition data, the recognition data conversion unit 101-1 sends a voice data addition request to the trigger setting unit 403 through the Internet 2 (S708). On receiving the voice data addition request (S708), the trigger setting unit 403 checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S714). If the check finds that the prescribed number has not been reached, the trigger setting unit 403 continues the indication prompting the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), causing the input management unit 420 to transition to waiting for voice input from the microphone (S700). On receiving the input continuation notification (S715), the input management unit 420 transitions its internal state to input waiting (S700) and waits for input of the words uttered by the user 331.
 If it determines that the sent voice data can be converted into recognition data (S707), the recognition data conversion unit 101-1 converts the voice data into recognition data (S709). After the conversion (S709), the recognition data conversion unit 101-1 uses all the recognition data, including data already converted, to determine whether sufficient accuracy has been secured to recognize the voice data input from the microphone 421 as the reserved word (S710).
 If it determines from all the recognition data that sufficient accuracy has been secured to recognize the voice data input from the microphone 421 as the reserved word, the recognition data conversion unit 101-1, in order to have the user 331 stop uttering the word to be registered, notifies the trigger setting unit 403 through the Internet 2 of the recognition data with information added indicating that the recognition data is sufficient (recognition data sufficiency notification) (S711). The trigger setting unit 403, having received the recognition data (with the sufficiency notification), recognizes that the recognition data received up to this point is sufficient to recognize the voice data input from the microphone 421 as the reserved word, and stops prompting the user 331 for further input of the word to be registered, even if the number of times the voice of the user 331 has been captured has not reached the prescribed number (S712). The trigger setting unit 403 stores all the recognition data received up to this point in the reserved word storage area 410-2 (S716) and sends a registration-completion notification to the input management unit 420, the display unit 425, and the recognition data conversion unit 101-1 (S717) (S718) (S719).
 As a result, depending on the accuracy of the converted recognition data, the user 331 can be allowed to stop uttering the word to be registered as a reserved word before the number of captures reaches the prescribed number, enabling more flexible reserved word registration processing. Note that the prescribed number can be changed by the user 331 as a setting value of the host device 332, and can also be changed as one piece of the additional information described later.
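The per-utterance flow of FIGS. 7A and 7B, including the early stop of S710 to S712, can be sketched as follows. The sufficiency rule (two successful conversions) and the "conversion" itself are invented purely for illustration; the real criterion would be an acoustic accuracy measure:

```python
# Sketch of incremental registration: each capture is converted immediately,
# and registration completes early once the accumulated recognition data is
# judged accurate enough, even before the prescribed count is reached.

PRESCRIBED_COUNT = 5  # assumed value
SUFFICIENT = 2        # hypothetical accuracy threshold (S710)

def incremental_registration(utterances):
    recognition_data = []
    for i, voice_data in enumerate(utterances, start=1):
        if voice_data:                                   # S707: convertible?
            recognition_data.append(voice_data.upper())  # S709: convert (stand-in)
        if len(recognition_data) >= SUFFICIENT:          # S710: accuracy check
            return ("complete_early", recognition_data)  # S711/S712
        if i >= PRESCRIBED_COUNT:                        # S714: count reached
            return ("complete", recognition_data)
    return ("need_more_input", recognition_data)         # S715
```

The `"complete_early"` outcome models the sufficiency notification: the user is released from uttering the word the full prescribed number of times.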
 If it determines that the recognition data created up to this point does not secure sufficient accuracy to recognize the voice data input from the microphone 421 as the reserved word, the recognition data conversion unit 101-1 sends only the converted recognition data to the trigger setting unit 403 (S713). The trigger setting unit 403, having received the recognition data, checks whether the number of times the voice of the user 331 has been captured has reached the prescribed number (S714). If the check finds that the prescribed number has not been reached, the trigger setting unit 403 continues the indication prompting the user 331 to utter the word to be registered and sends an input continuation notification to the input management unit 420 (S715), causing the input management unit 420 to transition to waiting for voice input from the microphone (S700).
 Note that the indication prompting the user 331 to input the word to be registered is preferably given in a manner the user 331 can recognize: the trigger setting unit 403 sends a registration-incomplete notification to the display unit 425 (S703), and the display unit 425 that receives it, for example, blinks the LED in red (S704). A voice method may be used instead of the display method to prompt the user 331 to input the word to be registered. In this case, the trigger setting unit 403 sends the registration-incomplete notification to the speaker 423, and the speaker 423 that receives it may, for example, announce "Please input again" to the user 331. Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the word to be registered. Alternatively, when the host device 332 is of the movable type, the trigger setting unit 403 may instruct a moving means, not shown, so that the host device 332, for example, repeatedly rotates back and forth through a certain angular width.
 認識用データを受信したトリガー設定部403は、規定回数に達しているかの確認(S714)の結果規定回数に達していると判定した場合、登録完了通知を入力管理部420、表示部425、認識用データ変換部101-1に登録完了通知を送付する(S717)(S718)(S719)。登録完了通知を受信(S718)した認識用データ変換部101-1は、S710の処理を行うために一時的に保存していた変換済み認識用データをクリアする。 When the trigger setting unit 403 that has received the recognition data determines that the specified number has been reached as a result of checking whether the specified number has been reached (S714), a registration completion notification is received from the input management unit 420, the display unit 425, and the recognition. A registration completion notice is sent to the data conversion unit 101-1 (S717) (S718) (S719). Receiving the registration completion notification (S718), the recognition data conversion unit 101-1 clears the converted recognition data temporarily stored for performing the processing of S710.
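The registration flow of S700 through S719 can be summarized as a loop that keeps prompting the user until a required number of voice captures has been collected. The following is a minimal sketch of that loop; the count value and all function and notification names are illustrative assumptions, not taken from the patent.

```python
REQUIRED_CAPTURES = 3  # the "specified number" of voice captures (illustrative value)

def register_reserved_word(capture_voice, convert_to_recognition_data, notify):
    """Sketch of the S700-S719 registration loop (names are hypothetical)."""
    captures = 0
    while captures < REQUIRED_CAPTURES:
        voice = capture_voice()                # S700: wait for microphone input
        convert_to_recognition_data(voice)     # conversion unit 101-1 builds recognition data
        captures += 1
        if captures < REQUIRED_CAPTURES:
            notify("registration-incomplete")  # S703: e.g. display device blinks LED red (S704)
    notify("registration-complete")            # S717-S719: registration completion notification
    return captures
```

In this sketch the `notify` callback stands in for the registration-incomplete and registration-completion notifications sent to the display device 425, the speaker 423, and the other units.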
 Next, recognition of reserved words, the second process of the host device 332, will be described.
 When the host device 332 recognizes a reserved word among the words uttered by the user 331, it analyzes the content of the words the user 331 utters subsequently and controls devices and sensors based on the analysis result. In order to recognize the reserved word, and to control devices and sensors once the reserved word has been recognized, the host device 332 has a mode for reserved-word recognition and device/sensor control (hereinafter referred to as the operation mode).
 FIGS. 8A and 8B show an example of the processing sequence of the host device 332 in the operation mode, up to the point where a word uttered by the user 331 is recognized as one of the registered reserved words.
 When the user 331 utters a word, the host device 332 takes the voice data input from the microphone 421 into the input management unit 420 (S801). When the host device 332 is in the operation mode, the input management unit 420 transfers the received voice data to the trigger recognition unit 405 (S802). Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S803) and compares it with the transferred voice data in order to determine whether that voice data is a reserved word (S804).
 If the trigger recognition unit 405 determines that the input voice data cannot be recognized as a reserved word (S805), it causes a display prompting the user 331 to utter a reserved word (S808) and sends an input continuation notification to the input management unit 420 (S807). The prompt is preferably given by a method the user 331 can recognize: the trigger recognition unit 405 transmits a recognition-incomplete notification to the display unit 425 (S806), and the display unit 425 that receives it blinks an LED in red, for example (S808). A voice prompt may be used instead of the display: in this case the trigger recognition unit 405 transmits the recognition-incomplete notification to the speaker 423, and the speaker 423 that receives it announces, for example, "I couldn't hear you" to the user 331. Alternatively, the trigger recognition unit 405 may use both the display method and the voice method to prompt the user 331 for voice input. Further, when the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332, for example, rotates back and forth repeatedly within a certain angular range.
 When the input voice data can be recognized as a reserved word (S805), the trigger recognition unit 405 causes a display indicating that the voice uttered by the user 331 has been recognized as a reserved word (S810). This indication is preferably given by a method the user 331 can recognize: the trigger recognition unit 405 transmits a recognition completion notification to the display device 425 (S809), and the display device 425 that receives it lights an LED in green, for example (S810). A voice notification may be used instead of the display: in this case the trigger recognition unit 405 transmits the recognition completion notification to the speaker 423, and the speaker 423 that receives it announces, for example, "Yes, yes" or "I heard you" to the user 331. Alternatively, the trigger recognition unit 405 may use both the display method and the voice method to indicate that the voice uttered by the user 331 has been recognized as a reserved word. Further, when the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332, for example, moves back and forth linearly over a certain distance.
 FIGS. 9A and 9B show another example of the processing sequence of the host device 332 in the operation mode, up to the point where a word uttered by the user 331 is recognized as one of the registered reserved words.
 The difference between the sequence of FIGS. 9A and 9B and that of FIGS. 8A and 8B is that a recognition probability is taken into account in the process of recognizing the reserved word. The recognition probability is the level of agreement obtained by comparing feature points, such as frequency components and intensity, of the recognition data with those of the voice data transferred from the input management unit 420. Steps S900 to S912 in FIGS. 9A and 9B are identical to steps S800 to S812, respectively; the difference from the processing of FIGS. 8A and 8B is that steps S913 to S916 are added.
 Upon receiving the voice data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S903) and compares it with the transferred voice data (S904).
 When the trigger recognition unit 405 determines that the input voice data has been recognized as a reserved word (S905), it proceeds to the recognition probability determination process (S913).
 The voice recognition processing performed by the trigger recognition unit 405 here compares feature points, such as frequency components and intensity, of the recognition data read from the reserved word storage area 410-2 of the memory 410 with those of the voice data transferred from the input management unit 420, and determines that the voice data transferred from the input management unit 420 matches the recognition data when the two agree at or above a certain level.
 When comparing the recognition data with the feature points, such as frequency components and intensity, of the voice data transferred from the input management unit 420, the host device 332 may also provide a plurality of thresholds for judging the level of agreement between the two. In this way, when recognizing a reserved word among the words uttered by the user, the host device 332 is not limited to the two-way determination "reserved word recognized / reserved word not recognized"; it can add a determination meaning "close to a reserved word, but not the correct reserved word," for example "reserved word recognized / reserved word not recognized / cannot be said to be recognized." Providing a plurality of recognition probability thresholds in this way has the advantage that, when the user 331 does not remember a reserved word exactly, the user 331 can repeatedly utter words close to the reserved word; the host device 332, having captured those words, responds according to the determination result "cannot be said to be recognized," and the user 331, seeing that response, can work toward the correct reserved word.
 FIGS. 9A and 9B show an example in which two recognition probability thresholds are provided. Let threshold 1 be the threshold at or above which a reserved word is recognized, and threshold 0 be the threshold below which a reserved word is not recognized. As a result of the comparison in S904, if the recognition probability is at or above threshold 1, the determination is that the reserved word has been recognized; if the recognition probability is at or above threshold 0 but below threshold 1, the determination is that the reserved word cannot be said to have been recognized; and if the recognition probability is below threshold 0, the determination is that the reserved word has not been recognized. Accordingly, the processing of S905 compares the recognition probability against threshold 0, and the processing of S913 compares the recognition probability against threshold 1.
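The three-way determination of S905 and S913 can be sketched as a simple comparison against the two thresholds. The threshold values and the label strings below are illustrative assumptions; the patent specifies only the ordering threshold 0 < threshold 1, not concrete values.

```python
THRESHOLD_0 = 0.5  # below this: reserved word not recognized (illustrative value)
THRESHOLD_1 = 0.8  # at or above this: reserved word recognized (illustrative value)

def classify(recognition_probability):
    """Three-way determination corresponding to S905 and S913."""
    if recognition_probability >= THRESHOLD_1:
        return "recognized"       # S905 and S913 both pass: recognition completion (S909-S910)
    if recognition_probability >= THRESHOLD_0:
        return "near-miss"        # S913 fails: "cannot be said to be recognized" (S914-S916)
    return "not-recognized"       # S905 fails: recognition incomplete (S906-S908)
```

Each label would drive a different user feedback path, e.g. a green LED for "recognized," a blinking green LED for "near-miss," and a blinking red LED for "not-recognized."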
 When the host device 332 determines that the recognition probability is at or above threshold 0 but below threshold 1 (S913), it causes a display prompting the user 331 to utter the reserved word (S915) and sends an input continuation notification to the input management unit 420 (S916). The prompt is preferably given by a method the user 331 can recognize: the trigger recognition unit 405 sends an insufficient-recognition notification to the display unit 425 (S914), and the display unit 425 that receives it blinks an LED in green, for example (S915).
 By making the display that prompts the user 331 to utter the reserved word when the recognition probability is low in this way different from the display shown when recognition fails (S908) and from the display shown when recognition succeeds (S910), the user 331 can recognize that the words he or she uttered are close to a reserved word but are not the reserved word uttered correctly.
 A voice prompt may be used instead of the display: in this case the trigger recognition unit 405 transmits the insufficient-recognition notification to the speaker 423 (S914), and the speaker 423 that receives it announces, for example, "Did you call me?" to the user 331. Alternatively, the trigger recognition unit 405 may use both the display method and the voice method to prompt the user 331 for voice input. Further, when the host device 332 is movable, the trigger setting unit 403 may instruct a moving means (not shown) so that the host device 332, for example, rotates back and forth repeatedly within a certain angular range.
 Next, the third process of the host device 332, registration of the control content of the devices and sensors whose operation is to be controlled, and the fourth process, control of the devices and sensors whose control content has been registered, will be described.
 First, an overall picture of device and sensor control using the host device 332 will be described.
 When the host device 332 recognizes a reserved word among the words uttered by the user 331, it continues to capture the words the user utters after the reserved word has been recognized, and controls devices and sensors by analyzing the content of the captured words.
 FIGS. 10A and 10B show an example of the processing sequence in which, after recognition of the reserved word is complete, the host device controls a device or sensor based on the content of voice data, captured from the microphone 421, that contains control content for the device or sensor. Since reserved word recognition has been completed, the internal state of the input management unit 420 has transitioned to "recognized" (S1000).
 When the user 331 utters words containing content for controlling a device or sensor, the host device 332 takes the voice data (control content) into the input management unit 420 through the microphone 421 (S1001). Since its internal state is "recognized," the input management unit 420 transfers the input voice data (control content) to the voice processing unit 407 (S1002). The voice processing unit 407 sends the transferred voice data (control content) through the Internet 2 to the voice-to-text conversion unit 101-2 in the voice recognition cloud 101 on the cloud server 1.
 The voice-to-text conversion unit 101-2 converts the voice data sent through the Internet 2 into text data (S1004). By this processing, the voice originally uttered by the user 331 and captured through the microphone 421 is converted into text data.
 When the conversion into text data is complete, the voice-to-text conversion unit 101-2 stores the converted text data internally and sends a conversion completion notification to the voice processing unit 407 (S1005).
 Upon receiving the conversion completion notification, the voice processing unit 407 transmits a text analysis request to the voice-to-text conversion unit 101-2 (S1006). Upon receiving the text analysis request, the voice-to-text conversion unit 101-2 sends the text analysis request, together with the internally stored converted text data, to the text analysis unit 102-1 (S1007). Upon receiving the text analysis request (S1007), the text analysis unit 102-1 analyzes the content of the accompanying text data (S1008). When the analysis of the content of the sent text data is complete, the text analysis unit 102-1 sends the analysis result to the response/action generation unit 102-2 as a text analysis result notification (S1009). Upon receiving the text analysis result (S1009), the response/action generation unit 102-2 generates, based on its content, the target device and a command for controlling that device (S1010), and sends the generated command to the voice processing unit 407 as a response/action generation result notification (S1011).
 Upon receiving the response/action generation result notification (S1011), the voice processing unit 407 identifies the device or sensor to be controlled and its control content from the content of the notification (S1012). The voice processing unit 407 converts the identified device or sensor and its control content into a format the target device or sensor can recognize, and transmits it as an action notification to the target device or sensor through the network 333 at the necessary timing (S1013).
 Upon receiving the action notification (S1013), the target device or sensor to which the action notification is addressed performs an operation based on the control content contained in it (S1014).
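The overall S1001 to S1014 flow, from captured voice to an action notification, can be sketched as a short pipeline. The function names below are hypothetical stand-ins for the voice-to-text conversion unit 101-2, the text analysis unit 102-1, the response/action generation unit 102-2, and the action notification step; they are not names used in the patent.

```python
def handle_utterance(voice_data, to_text, analyze, generate_action, send_action):
    """Sketch of the S1001-S1014 flow: voice -> text -> analysis -> action."""
    text = to_text(voice_data)          # S1004: voice-to-text conversion (unit 101-2)
    meaning = analyze(text)             # S1008: text analysis (unit 102-1)
    action = generate_action(meaning)   # S1010: response/action generation (unit 102-2)
    if action is not None:
        send_action(action)             # S1013: action notification to the target device
    return action                       # S1012: identified (target, command) content
```

In the patent's architecture the first three steps run in the cloud server 1 and the last step in the host device 332; the sketch collapses them into one call chain for clarity.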
 When the user 331 utters voice continuously, the host device 332 determines the continuous voice to be a single series of utterances and can capture it without requiring the user 331 to utter the reserved word partway through. Conversely, when the user 331 utters voice only after a certain interval has passed, the host device 332 requests input of the reserved word again. Each case will be described with reference to FIGS. 11A and 11B and FIGS. 12A and 12B.
 FIGS. 11A and 11B show an example of the processing sequence when the user 331 utters words continuously within a time T0 after recognition of the reserved word is complete. When the host device 332 takes the voice data (control content) input from the microphone 421 into the input management unit 420 (S1101), the input management unit 420 starts an input interval confirmation timer T. When the next voice data (control content) uttered by the user 331 is taken into the input management unit 420 through the microphone 421 at a time T1 before the time (=T0) at which the input interval confirmation timer T expires (S1121), the input management unit 420 transfers the captured voice data (control content) to the voice processing unit 407 (S1122) and, at the same time, restarts the running input interval confirmation timer T. The voice processing unit 407 sends the transferred voice data (control content) through the Internet 2 to the voice-to-text conversion unit 101-2 in the voice recognition cloud 101 on the cloud server 1 (S1123). Thereafter, as in the processing from S1104 to S1110, processing of the voice data sent to the voice recognition cloud 101 (S1123) continues.
 Although the input interval confirmation timer T is started at the timing when the input management unit 420 captures the voice data input from the microphone 421, this is not limiting; for example, the timer may be started at the timing when the input management unit 420 transfers the data sent from the microphone 421 to the trigger setting unit 403 or the voice processing unit 407, or at the timing when the internal state of the input management unit 420 transitions to "recognized" (S1100).
 FIGS. 12A and 12B show an example in which the user 331 does not utter voice continuously within the time T0. When the host device 332 takes the voice data (control content) input from the microphone 421 into the input management unit 420 (S1201), the input management unit 420 starts the input interval confirmation timer T. Once the time (=T0) at which the input interval confirmation timer T expires has passed, the input management unit 420 transitions its internal state to "waiting for input" (S1220).
 When the host device 332 captures the next voice data input from the microphone 421 after the expiry time (=T0) of the input interval confirmation timer T has passed (S1224), it does not execute processing to control a device or sensor based on the captured voice data, but instead displays a prompt urging the user 331 to utter the reserved word.
 When the input interval confirmation timer T expires, the input management unit transitions its internal state to "waiting for input" (S1220) and sends a timeout notification to the voice processing unit 407 (S1221). Upon receiving the timeout notification, the voice processing unit 407 transmits a recognition-incomplete notification to the display unit 425 (S1222), and the display unit 425 that receives it gives a display prompting the user 331 to utter the reserved word, for example by blinking an LED in red (S1223).
 When the next voice data input from the microphone 421 is captured after the input interval confirmation timer T has expired (S1224), the input management unit 420 transitions its internal state to "recognizing" (S1225) and transfers the captured voice data to the trigger recognition unit 405 (S1226). Thereafter, the host device 332 performs the processing from S803 to S812 in FIGS. 8A and 8B or from S903 to S916 in FIGS. 9A and 9B, and performs reserved word recognition again.
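The behavior of the input interval confirmation timer T can be modeled as a small state machine: each utterance within T0 of the previous one restarts the timer and stays in the "recognized" state, while an utterance after T0 has elapsed sends the device back to reserved-word recognition. The class below is a minimal sketch under that reading; the class name, state strings, and the injectable `now` parameter (added for testability) are assumptions, not from the patent.

```python
import time

class InputManager:
    """Sketch of the input interval confirmation timer T of FIGS. 11A/11B and 12A/12B."""

    def __init__(self, t0_seconds):
        self.t0 = t0_seconds               # expiry time T0 of timer T
        self.state = "recognized"          # S1000/S1100: reserved word already accepted
        self.last_input = time.monotonic() # timer T started on voice capture

    def on_voice_data(self, now=None):
        """Handle the next captured voice data; returns the resulting state."""
        now = time.monotonic() if now is None else now
        if now - self.last_input > self.t0:
            # Timer T expired (S1220): the new utterance triggers reserved-word
            # recognition again (S1225-S1226) instead of device control.
            self.state = "recognizing"
        else:
            # Within T0 (S1121): treat as part of the same series and restart timer T.
            self.last_input = now
        return self.state
```

A monotonic clock is used deliberately, since interval measurement must not be affected by wall-clock adjustments.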
 Next, registration of the control content for controlling devices and sensors using the host device 332, and control of the devices and sensors based on the registered control content, will be described.
 FIG. 13 shows a specific example of the content of the control information the host device 332 uses when, after recognizing a reserved word, it controls the various sensors 310, the various equipment devices 320, and the various home appliances 340 as shown in the sequence diagrams of FIGS. 10A and 10B.
 Item 1 is a specific example of the information (hereinafter referred to as response/action information) for controlling the various sensors 310, the various equipment devices 320, and the various home appliances 340 that is contained in the response/action generation result notification transmitted by the response/action generation unit 102-2. This response/action information consists of a "target," such as a device or sensor controlled by the host device 332, and a "command" representing the content with which that target is controlled. Upon receiving the response/action generation result notification, the host device 332 extracts the action information contained in it and controls the target device based on the content of that action information.
 Examples of "commands" include a "start command" that starts (operates) the device to be controlled, a "stop command" that ends (stops) it, an "operation change command" that changes the content (mode) of an ongoing operation, and a "setting change command" that changes content (a mode) preset in the target device.
 In order for the response/action generation unit 102-2 to generate the response/action information contained in the response/action generation result notification, the user 331 must register in advance, in the response/action generation unit 102-2 as an initial setting of the host device 332, combinations of the device to be controlled, its control content, and the words to be uttered to the host device 332 to make it control that device. Registration of response/action information in the initial setting of the host device 332 will be described below using the example of FIG. 13.
 項目2は、ホスト機器332を通して制御する機器である「対象」である。この「対象」は、各種センサ310や各種設備機器320や各種家電機器340に含まれる機器やセンサの識別名称であり、具体例としてエアコン1を記載している。 Item 2 is “target” which is a device controlled through the host device 332. The “target” is an identification name of devices and sensors included in the various sensors 310, the various equipment devices 320, and the various home appliances 340, and the air conditioner 1 is described as a specific example.
 項目3は、「項目2」に示す機器の制御内容である「命令」である。この「命令」は、具体例として項目2に挙げたエアコン1の命令を記載しており、エアコンを動かす「起動命令」、エアコンを停止させる「停止命令」、エアコンの動作内容を変える「動作変更命令」、エアコンの設定内容を変える「設定変更命令」を例として記載している。 Item 3 is a “command” which is the control content of the device shown in “Item 2”. This “command” describes the command of the air conditioner 1 listed in item 2 as a specific example. The “start command” for moving the air conditioner, the “stop command” for stopping the air conditioner, and the “change operation” for changing the operation content of the air conditioner. “Command” and “Setting change command” for changing the setting contents of the air conditioner are described as examples.
 項目2及び項目3の各機器やセンサの製品仕様は、記載していない製品仕様の情報が保存されている製品仕様クラウドサーバに予め保存されている。ユーザ331は、ホスト機器332を通して制御したい対象機器や対象センサの項目2及び項目3の製品仕様の情報を製品仕様クラウドサーバから入手する。 The product specifications of each device and sensor of item 2 and item 3 are stored in advance in a product specification cloud server in which information on product specifications not described is stored. The user 331 obtains the product specification information of the item 2 and the item 3 of the target device and target sensor to be controlled through the host device 332 from the product specification cloud server.
 次にユーザ331は、ホスト機器332を通して項目2及び項目3の制御内容を実行する際に、ホスト機器332に発する言葉である項目4=「フレーズ」を決定する。この「フレーズ」は、項目3に挙げたエアコン1の命令に対応する内容であることが望ましく、例えばエアコンを動かす「起動命令」に対しては「エアコンつけて」、エアコンを停止させる「停止命令」に対しては「エアコンけして」、エアコンの動作内容である「冷房」を「ドライ」に変える「動作変更命令」に対しては「ドライにして」、エアコンの設定内容である運転開始時間を「夜10時運転開始」に変える「設定変更命令」に対しては「夜10時にエアコンつけて」を例として記載している。 Next, when the user 331 executes the control contents of the items 2 and 3 through the host device 332, the user 331 determines item 4 = “phrase”, which is a word uttered to the host device 332. This “phrase” is preferably the content corresponding to the command of the air conditioner 1 listed in item 3, for example, “turn on command” for “start command” for operating the air conditioner, “stop command for stopping the air conditioner” ”For air conditioning”, change the air conditioner operation “cooling” to “dry” “operation change command” for “change to dry”, air conditioner setting content for operation start time “Setting change command” for changing “to start operation at 10 o'clock” describes “turn on air conditioner at 10 o'clock” as an example.
 The user 331 creates the (target, command, phrase) combinations determined as described above as the initial settings of the host device 332. The user 331 repeats this for every device to be controlled through the host device 332, and finally generates a response/action information list that gathers the (target, command, phrase) entries for all controlled devices into one. The created response/action information list is registered in the response/action generation unit 102-2 through the host device 332.
 Once the response/action information list is registered in the response/action generation unit 102-2, as shown in FIGS. 10A and 10B, the host device 332 can control the devices and sensors by continuing to capture and analyze the words the user 331 utters after recognition of the reserved word is complete.
 For example, when the user 331 says "turn on the air conditioner", the speech-to-text conversion unit 101-2 converts the input speech data into text, and the text analysis unit 102-1 analyzes the text data as meaning "turn on the air conditioner". Based on this analysis result, the response/action generation unit 102-2 refers to the already registered response/action information list and searches for the response/action information corresponding to the analyzed phrase "turn on the air conditioner". It thereby extracts the response/action information (target = air conditioner 1, command = start operation), sets this response/action information in a response/action generation result notification, and notifies the voice processing unit 407.
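The lookup described above — a spoken phrase resolved through the response/action information list into a (target, command) pair — can be sketched as follows. This is an illustrative sketch only, not the patent's actual implementation; the class name `ResponseActionList` and the use of a plain phrase-keyed dictionary are assumptions made for illustration.

```python
class ResponseActionList:
    """Sketch of the response/action information list of (target, command, phrase) entries."""

    def __init__(self):
        self._entries = {}  # phrase -> (target, command)

    def register(self, target, command, phrase):
        # Corresponds to registering one (target, command, phrase)
        # combination in the response/action generation unit 102-2.
        self._entries[phrase] = (target, command)

    def lookup(self, analyzed_phrase):
        # Corresponds to searching the list for the phrase produced by
        # the text analysis unit 102-1; returns None for unknown phrases.
        return self._entries.get(analyzed_phrase)


actions = ResponseActionList()
actions.register("air conditioner 1", "start operation", "turn on the air conditioner")
actions.register("air conditioner 1", "stop operation", "turn off the air conditioner")

# The analyzed phrase resolves to (target = air conditioner 1, command = start operation).
result = actions.lookup("turn on the air conditioner")
```

In this sketch, the (target, command) pair returned by `lookup` is what would be placed in the response/action generation result notification sent to the voice processing unit 407.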
 The voice processing unit 407 refers to the response/action information set in the received response/action generation result notification and controls the corresponding device or sensor among the various sensors 310, various facility devices 320, and various home appliances 340.
 Next, a description is given of cases where, when devices and sensors are controlled using the host device 332, the control content for those devices and sensors, or the operation of the host device 332 itself, is changed according to various conditions.
 FIG. 14 is a list of examples of the operations performed when a plurality of reserved words are registered in the host device 332: the host device 332 recognizes a word uttered by the user 331 as one of the reserved words and acts according to the recognized reserved word.
 The host device 332 can register a plurality of reserved words and, for each of them, can set operation content to be applied when that reserved word is recognized (hereinafter called additional information 1).
 As shown in FIG. 14, assume that the host device 332 has registered three reserved words, for example "Iroha", "Ore-sama da", and "Musuko ya". When the host device 332 recognizes a word uttered by the user 331 as the reserved word "Iroha", it does not change the operation content already set. When it recognizes the reserved word "Ore-sama da", it changes its operation so that thereafter, whenever it recognizes the user's 331 speech, it always announces "Yes, master, with pleasure" through the speaker 423. When it recognizes the reserved word "Musuko ya", the host device 332 determines that the user 331 is a senior user; since seniors tend to speak slowly, it changes the setting so that the expiration time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B becomes longer than the normal setting time.
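The table of FIG. 14 can be sketched as a mapping from each reserved word to its additional information 1, applied to the host device's settings when that reserved word is recognized. The dictionary layout, the handler keys, and the concrete value of `DEFAULT_T0` below are assumptions for illustration; only the three example behaviours come from the text.

```python
DEFAULT_T0 = 2.0  # normal expiration time T0 of the input interval confirmation timer (assumed value, seconds)

# Reserved word -> additional information 1 (settings changes applied on recognition).
additional_info_1 = {
    "Iroha": {},                                          # keep the operation content already set
    "Ore-sama da": {"announce": "Yes, master, with pleasure"},
    "Musuko ya": {"timer_t0": DEFAULT_T0 * 2},            # senior user: lengthen T0
}


def apply_reserved_word(word, settings):
    """Apply the additional information 1 registered for `word` to the
    host device settings (represented here as a plain dict)."""
    settings.update(additional_info_1.get(word, {}))
    return settings


settings = {"timer_t0": DEFAULT_T0}
apply_reserved_word("Musuko ya", settings)  # timer_t0 is doubled for the senior user
```

Recognizing "Iroha" leaves `settings` unchanged, while "Musuko ya" lengthens the timer, matching the FIG. 14 examples.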
 The example of FIG. 14 shows the host device 332 changing its own operation content, but the invention is not limited to this; the host device 332 may instead control the operation of the devices and sensors connected to it via the network 333.
 In order to change its operation according to a plurality of reserved words, the host device 332 must have the additional information 1 for each reserved word registered in it in advance.
 When registering a reserved word, the host device 332 therefore has a mode in which the additional information 1 corresponding to the reserved word being registered is registered together with it (hereinafter called the reserved word registration (additional information 1) mode).
 FIGS. 15A and 15B show an example of the processing sequence of the host device 332 from the start of reserved word registration to the completion of additional information 1 registration, in a state where the host device 332 has transitioned to the "reserved word registration (additional information 1) mode" in order to register a reserved word together with its corresponding additional information 1. The processing of S1500 to S1514 shown in FIGS. 15A and 15B is identical to the processing of S500 to S514 shown in FIGS. 5A and 5B, respectively. The differences from the processing of FIGS. 5A and 5B are that S1515 differs from S515 and that S1516 to S1523 are added.
 The trigger setting unit 403 performs a display informing the user 331 that registration of the reserved word is complete (S1515). For this display, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S1514), and the display device 425 that receives the notification preferably uses a display method the user 331 can recognize, for example blinking an LED in green. The trigger setting unit 403 can thereby prompt the user 331 to register the additional information 1.
 Recognizing that the LED is blinking green (S1515), the user 331 can set the additional information 1 corresponding to the reserved word whose registration was completed in S1511.
 The additional information 1 may be set by having the host device 332 capture the speech uttered by the user 331 through the microphone 421 and registering it by analyzing the captured speech data. Alternatively, a menu for setting the additional information 1 may be displayed on the display device 425 and registered by the user 331 operating according to that menu. Alternatively, using an external device connected via the network I/F 427 shown in FIG. 4, such as a smartphone or tablet, a menu for setting the additional information 1 corresponding to the reserved word may be displayed on the screen of the smartphone or tablet and registered by the user 331 operating according to the displayed menu screen. FIGS. 15A and 15B show an example of the processing sequence in which a menu for setting the additional information 1 is displayed on the display unit 425 and the user 331 registers the additional information 1 by operating according to that menu.
 When the LED blinks green to prompt the user 331 to input the additional information 1 (S1515), a menu for registering the additional information 1 is displayed on the display unit 425. The user 331 creates the additional information 1 by operating according to the displayed menu screen. The completed additional information 1 is captured by the input management unit 420 (S1517). The input management unit 420 transfers the captured additional information 1 to the trigger setting unit 403. The trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1519).
 When storing the additional information 1 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word registered in S1513.
 The voice processing unit 407 also performs a display (S1522) informing the user 331 that registration of the additional information 1 is complete. For this display, the voice processing unit 407 transmits a registration completion notification to the display device 425 (S1520), and the display device 425 that receives the notification preferably uses a display method the user 331 can recognize, for example lighting an LED in green.
 FIGS. 16A and 16B show an example of a sequence in which, after the additional information 1 has been stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 15A and 15B, a reserved word is recognized in the words uttered by the user 331, the additional information 1 of the recognized reserved word is read from the reserved word storage area 410-2, and the corresponding operation is set in the host device 332.
 The processing of S1600 to S1612 shown in FIGS. 16A and 16B is identical to the processing of S800 to S812 shown in FIGS. 8A and 8B, respectively. The difference from the processing of FIGS. 8A and 8B is that the processing of S1613 and S1614 is added.
 When a word uttered by the user 331 is recognized as a reserved word (S1605), the trigger recognition unit 405 reads the additional information 1 corresponding to that reserved word from the reserved word storage area 410-2 of the memory 410. Having read the additional information 1 (S1613), the trigger recognition unit 405 sets the operation described by it in the host device 332 (S1614). If the contents of the example shown in FIG. 14 are stored in the reserved word storage area 410-2 and "Musuko ya" is recognized as a reserved word in S1605, the trigger recognition unit 405 sets the expiration time T0 of the input interval confirmation timer T to a value longer than normal in S1614.
 FIG. 17(A) is a list of examples of operations in which, when a word uttered by the user 331 is recognized as a reserved word registered in the host device 332, the host device 332 performs a specific operation according to the words the user 331 utters following the recognized reserved word.
 When the host device 332 recognizes a word uttered by the user 331 as a registered reserved word, it can set operation content (hereinafter called additional information 2) according to the content of the words the user 331 utters following the recognized reserved word (hereinafter called additional words).
 For example, as shown in FIG. 17(A), assume that "Iroha" is registered as a reserved word. In this case, when the host device 332 recognizes the reserved word "Iroha" but does not recognize any word from the user 331 following it, the operation content already set is not changed. When the host device 332 recognizes "chan" as the word uttered by the user 331 following the reserved word "Iroha", it determines that the user 331 is in a good mood and, when responding through the speaker 423, changes its operation to raise the tone of the response. When it recognizes "ya" following "Iroha", it infers that the user 331 is a senior user who tends to speak slowly, and changes the expiration time T0 of the input interval confirmation timer shown in FIGS. 11A and 11B to be longer than the normal setting time. When it recognizes "oi" following "Iroha", it determines that the user 331 is angry and immediately announces "I am very sorry" through the speaker 423.
 The example of FIG. 17(A) shows the host device 332 changing its operation based on the content of additional information 2 by setting a plurality of additional words for one reserved word and setting additional information 2 for each combination of the reserved word and an additional word; however, additional information 2 can also be set for each combination of a plurality of reserved words and a plurality of additional words. As shown in FIG. 17(B), assume for example that the host device 332 has registered three reserved words: "Iroha", "Ookini", and "Aa shindo". In this case, additional words may be defined for each reserved word, and additional information 2 may be set for each reserved word + additional word combination.
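The tables of FIGS. 17(A) and 17(B) can be sketched as additional information 2 keyed by the (reserved word, additional word) combination, where an empty additional word stands for the reserved word spoken alone. The concrete behaviours below are the examples given in the text; the dictionary layout and the fallback rule are illustrative assumptions.

```python
# (reserved word, additional word) -> additional information 2.
additional_info_2 = {
    ("Iroha", ""): {},                                   # keep current settings
    ("Iroha", "chan"): {"response_tone": "high"},        # user in a good mood
    ("Iroha", "ya"): {"timer_t0_factor": 2.0},           # senior user: longer T0
    ("Iroha", "oi"): {"announce": "I am very sorry"},    # user is angry
    ("Aa shindo", ""): {"announce_refrigerator": "beer"},
}


def lookup_additional_info_2(reserved_word, additional_word=""):
    """Return the operation content for a recognized reserved word and,
    optionally, a following additional word. Falls back to the
    reserved-word-only entry when the additional word is unregistered."""
    key = (reserved_word, additional_word)
    if key in additional_info_2:
        return additional_info_2[key]
    return additional_info_2.get((reserved_word, ""), {})


info = lookup_additional_info_2("Iroha", "oi")  # -> the "angry user" operation content
```

In this sketch, recognizing "Aa shindo" by itself already yields operation content, matching the reserved-word-only behaviour described for FIG. 17(B).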
 Some users may also want a specific operation to be performed merely by uttering a reserved word. For example, if an individual has a habitual phrase, registering that phrase in the host device 332 as a reserved word, together with an operation corresponding to it, makes it possible to easily control the devices and sensors in a way suited to that individual's characteristics. In the example of the reserved word "Aa shindo" in FIG. 17(B), when the host device 332 recognizes the reserved word "Aa shindo" in the words uttered by the user 331, it can, merely on recognizing the reserved word, announce through the speaker 423 information about the beer stored in a refrigerator connected to the network 333.
 In order to change its operation according to the additional word that follows a reserved word, the host device 332 must register in advance the combination of the additional word corresponding to each reserved word and the additional information 2 describing the operation for that reserved word + additional word combination. The host device 332 therefore has modes for additionally registering corresponding additional words and additional information for already registered reserved words. The mode for adding additional information 1 to a reserved word already registered in the host device 332 is called the additional information 1 additional registration mode, and the mode for adding an additional word and additional information 2 is called the additional information 2 additional registration mode.
 As with the additional information 1, the additional information 2 may be set by having the host device 332 capture the speech uttered by the user 331 through the microphone 421 and registering it by analyzing the captured speech data. Alternatively, a menu for setting the additional information 2 may be displayed on the display device 425 and registered by the user 331 operating according to the displayed menu. Alternatively, using an external device connected via the network I/F 427 shown in FIG. 4, such as a smartphone or tablet, a menu for setting the additional information 2 corresponding to a reserved word and additional word may be displayed on the screen of the smartphone or tablet and registered by the user 331 operating according to the displayed menu screen.
 FIGS. 18A, 18B, and 18C show an example of a processing sequence for registering an additional word, and the operation content (additional information 2) for that additional word, for the registered reserved words shown in FIGS. 17(A) and 17(B).
 To additionally register an additional word for a registered reserved word, the user 331 switches the host device 332 to the "additional information 2 additional registration mode". After switching the host device to this mode, the user 331 utters a reserved word already registered in the host device 332 and the additional word to be registered for that reserved word. The host device 332 first recognizes the reserved word in the words uttered by the user 331 (S1805).
 The host device 332 captures the words uttered by the user 331 into the input management unit 420 through the microphone 421 (S1801). On capturing the speech data, the input management unit 420 transitions its internally managed state to "recognizing (reserved word)" (S1802) and transfers the input speech data to the trigger recognition unit 405 (S1803).
 On receiving the speech data transferred from the input management unit 420, the trigger recognition unit 405 reads the recognition data from the reserved word storage area 410-2 of the memory 410 (S1804) and compares it with the transferred speech data (S1805). If the input speech data is recognized as a reserved word, the trigger recognition unit 405 sends a recognition completion notification to the input management unit 420 (S1806). On receiving the recognition completion notification, the input management unit 420 transitions its internally managed state from "recognizing (reserved word)" to "awaiting input (additional word)" (S1807).
 The host device 332 captures the words the user 331 utters following the reserved word into the input management unit 420 through the microphone 421 (S1808). Since its internally managed state is "awaiting input (additional word)" (S1807), the input management unit 420 transfers the input speech data to the trigger setting unit 403 (S1809). Thereafter, as with the reserved word registration described with reference to FIGS. 5A and 5B, the trigger setting unit 403 captures the additional word the specified number of times (S1811) while storing the received speech data in the speech storage area 410-3 of the memory 410 (S1810).
 If, on checking whether the specified number of times has been reached, the trigger setting unit 403 determines that it has not, it performs a display prompting the user 331 to input the speech of the additional word to be registered (S1813) and transmits an input continuation notification to the input management unit 420 (S1814). For this display prompting the user 331 to input the speech to be registered as the additional word, the trigger setting unit 403 transmits a registration incomplete notification to the display device 425 (S1812), and the display device 425 that receives the notification preferably uses a display method the user 331 can recognize, for example blinking an LED in red. Instead of a display, a voice method may be used to prompt the user 331 to input the speech to be registered: the trigger setting unit 403 transmits a registration incomplete notification to the speaker 423, and the speaker 423 that receives it announces to the user 331, for example, "Please input again". Alternatively, the trigger setting unit 403 may use both the display method and the voice method to prompt the user 331 to input the speech to be registered.
 If the trigger setting unit 403 determines that the specified number of times has been reached, it reads the speech data stored so far in the speech storage area 410-3 (S1815) and sends it through the Internet 2 to the recognition data conversion unit 101-1 in the speech recognition cloud 101 on the cloud server 1 (S1816).
 The recognition data conversion unit 101-1 converts the speech data sent from the trigger setting unit 403 into recognition data for recognizing the additional word (S1817). When the conversion to recognition data is complete, the recognition data conversion unit 101-1 sends the recognition data to the trigger setting unit 403 through the Internet 2 (S1818). On receiving the recognition data for recognizing the additional word (hereinafter called recognition data (additional word)), the trigger setting unit 403 stores the received data in the reserved word storage area 410-2 of the memory 410 (S1819). When storing the recognition data (additional word), the trigger setting unit 403 stores it in association with the reserved word recognized in S1806. This makes it possible to store the recognition data (additional word) associated with the reserved word recognized in S1806.
 The trigger setting unit 403 also performs a display (S1822) informing the user 331 that registration of the additional word is complete. For this display, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S1821), and the display device 425 that receives the notification preferably uses a display method the user 331 can recognize, for example blinking an LED in green (S1822). Alternatively, the trigger setting unit 403 may use a voice method instead of a display to notify the user 331 that registration of the additional word is complete: it transmits a registration completion notification to the speaker 423 (S1821), and the speaker 423 that receives it announces to the user 331, for example, "Registration is complete". Alternatively, the trigger setting unit 403 may use both the display method and the voice method. The user 331 can thereby know the timing at which to speak the content of the additional information 2, the operation content corresponding to the additional word.
 When the LED blinks green to prompt the user 331 to input the additional information 2 (S1822), a menu for registering the additional information 2 is displayed on the display unit 425. The user 331 creates the additional information 2 by operating according to the displayed menu screen. The completed additional information 2 is captured by the input management unit 420 (S1824). The input management unit 420 transfers the captured additional information 2 to the trigger setting unit 403 (S1825). The trigger setting unit 403 stores the transferred additional information 2 in the reserved word storage area 410-2 of the memory 410 (S1826).
 When storing the additional information 2 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word recognized in S1806. This makes it possible to store the operation content (additional information 2) associated with the reserved word recognized in S1806 and with the additional word stored in S1819.
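The associations built up in S1819 and S1826 — each reserved word linked to its additional words' recognition data and the corresponding additional information 2 — can be sketched as a nested record in the reserved word storage area 410-2. The nested-dictionary structure and the function names below are illustrative assumptions, not the patent's concrete storage format.

```python
reserved_word_area = {}  # sketch of reserved word storage area 410-2: reserved word -> record


def store_recognition_data(reserved_word, additional_word, recognition_data):
    """S1819: store the additional word's recognition data (received from
    the recognition data conversion unit 101-1) in association with the
    reserved word recognized in S1806."""
    record = reserved_word_area.setdefault(reserved_word, {"additional_words": {}})
    record["additional_words"][additional_word] = {
        "recognition_data": recognition_data,
        "additional_info_2": None,  # filled in later by S1826
    }


def store_additional_info_2(reserved_word, additional_word, info):
    """S1826: store the operation content in association with both the
    reserved word (S1806) and the additional word saved in S1819."""
    entry = reserved_word_area[reserved_word]["additional_words"][additional_word]
    entry["additional_info_2"] = info


store_recognition_data("Iroha", "oi", b"\x01\x02")  # dummy recognition data
store_additional_info_2("Iroha", "oi", {"announce": "I am very sorry"})
```

With this layout, recognizing "Iroha" followed by "oi" locates both the recognition data and the operation content through a single reserved-word lookup.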
 It is also possible to add only additional information to a registered reserved word afterwards.
 FIGS. 18D and 18E show an example of a processing sequence in which, unlike FIGS. 18A, 18B, and 18C, only additional information is added to a registered reserved word.
 The processing of S1850 to S1856 shown in FIG. 18D is identical to the processing of S1800 to S1806 shown in FIG. 18A, respectively. The processing of S1871 to S1880 shown in FIGS. 18D and 18E is identical to the processing of S1821 to S1830 shown in FIG. 18C, respectively. The difference between the sequence example of FIGS. 18A, 18B, and 18C and that of FIGS. 18D and 18E is that FIGS. 18D and 18E have no processing corresponding to the additional word registration processing of S1807 to S1820 in FIGS. 18A, 18B, and 18C.
 When the LED blinks green to prompt the user 331 to enter additional information 1 (S1871), a menu for registering additional information 1 is displayed on the display unit 425. The user 331 creates additional information 1 by operating according to the displayed menu screen. When creation is complete, additional information 1 is taken into the input management unit 420 (S1874). The input management unit 420 transfers the captured additional information 1 to the trigger setting unit 403 (S1875). The trigger setting unit 403 stores the transferred additional information 1 in the reserved word storage area 410-2 of the memory 410 (S1876).
 When storing additional information 1 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word recognized in S1856. This makes it possible to store the operation content associated with the reserved word recognized in S1856.
 FIGS. 19A and 19B show a sequence example in which, after an additional word and additional information 2 have been stored in the reserved word storage area 410-2 of the memory 410 by the processing shown in FIGS. 18A, 18B, and 18C, a reserved word and an additional word are recognized in the words uttered by the user 331, the additional information 2 corresponding to the recognized combination of reserved word and additional word is read from the reserved word storage area 410-2, and an operation is set for the host device 332.
 The processing from S1900 to S1908 shown in FIG. 19A is identical to the processing from S1600 to S1608 shown in FIG. 16A. The processing of FIGS. 19A and 19B differs from that of FIGS. 16A and 16B in that additional word recognition processing (S1909 to S1911) is added and in that additional information 2 is read out (S1912 to S1913).
 When recognition of the reserved word succeeds in S1905 of FIG. 19A for the data captured from the words uttered by the user 331, the trigger recognition unit 405 compares the voice data that follows the successfully recognized reserved word with the recognition data (additional words) read from the reserved word storage area 410-2 of the memory 410, in order to determine whether that voice data is an additional word (S1911). When the voice data following the reserved word is recognized as an additional word, the trigger recognition unit 405 reads the additional information 2 corresponding to that reserved word and additional word from the reserved word storage area 410-2 of the memory 410 (S1912). Having read additional information 2, the trigger recognition unit 405 sets the operation described by the read additional information 2 in the host device 332 (S1913).
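The S1905 to S1913 flow — recognize the reserved word, check whether the continuation is a registered additional word, then read and apply additional information 2 — could be sketched as follows. The token-based string matching and all names are illustrative assumptions; the actual device compares voice data against recognition data, not text.

```python
def recognize_and_apply(tokens, area):
    """Return the operation to set in the host device, or None when the
    first token is not a registered reserved word (S1905/S1911/S1912)."""
    if not tokens or tokens[0] not in area:
        return None                            # reserved word not recognized
    entry = area[tokens[0]]
    if len(tokens) > 1 and tokens[1] in entry["additional"]:
        # Reserved word followed by an additional word: additional info 2.
        return entry["additional"][tokens[1]]
    return entry.get("default")                # reserved word alone

# Sample registration, mirroring the storage sketch above.
area = {"konnichiwa": {"default": "answer normally",
                       "additional": {"chan": "raise response tone"}}}
```

For example, `recognize_and_apply(["konnichiwa", "chan"], area)` would select the tone-raising operation, while an unregistered word yields no operation.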
 As described above, by registering reserved words, additional words, and additional information in the host device 332, the host device 332 can freely control its own operation and the operation of the devices and sensors connected to it over the network, making it possible to control devices and sensors in a way that suits each individual's lifestyle.
 FIG. 20 is a list of examples in which, when a plurality of reserved words are registered in the host device 332 and a word uttered by the user 331 is recognized as one of them, the speech recognition dictionary used by the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 is changed according to the recognized reserved word.
 The host device 332 can register a plurality of reserved words. When the host device 332 recognizes that a word uttered by the user 331 is one of the plurality of registered reserved words, it can change, according to the recognized reserved word, the speech recognition dictionary that the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 uses to convert speech into text. For example, as shown in FIGS. 21A and 21B, assume that the host device 332 has registered three reserved words: "Konnichiwa", "Hello", and "Ookini". In this case, when the host device 332 recognizes the reserved word "Konnichiwa", it can issue a command to change the speech recognition dictionary used by the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 to a Japanese dictionary. When it recognizes the reserved word "Hello", the host device 332 can instruct the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 to change the dictionary to an English dictionary. Furthermore, when it recognizes the reserved word "Ookini", the host device 332 can issue a command to change the speech recognition dictionary used by the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 to a dialect dictionary (Kansai dialect).
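The reserved-word-to-dictionary selection of FIG. 20 amounts to a lookup table. A minimal sketch follows; the dictionary names and the fallback behavior are assumptions for illustration.

```python
# Hypothetical mapping from a recognized reserved word to the speech
# recognition dictionary the speech-to-text conversion unit 101-2 should
# switch to, following the FIG. 20 examples.
DICTIONARY_BY_RESERVED_WORD = {
    "konnichiwa": "japanese",
    "hello": "english",
    "ookini": "kansai_dialect",
}

def dictionary_for(reserved_word, current="japanese"):
    # Keep the current dictionary when the reserved word has no entry.
    return DICTIONARY_BY_RESERVED_WORD.get(reserved_word, current)
```

The host device would send the returned dictionary type to the speech recognition cloud over the Internet rather than switch dictionaries locally.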
 In order for the speech recognition dictionary used by the speech-to-text conversion unit 101-2 of the speech recognition cloud 101 to be changed according to the reserved word recognized by the host device 332, the user 331 must, when registering a reserved word in the host device 332, also register the type of speech recognition dictionary to be used by the speech-to-text conversion unit 101-2 for that reserved word (hereinafter referred to as additional information 3).
 The processing sequence for registering the type of speech recognition dictionary corresponding to a reserved word (additional information 3) together with the registration of the reserved word is identical to the processing sequence for registering additional information 1 for a reserved word shown in FIGS. 15A and 15B; instead of entering additional information 1 on the menu screen displayed on the display unit 425 (S1516), the user simply selects the input screen for additional information 3 and enters it there. The flow of the processing for registering additional information 3 is described below using the processing from S1514 onward in FIG. 15B, with "additional information 1" in S1514 and subsequent steps read as "additional information 3".
 When the LED blinks green to prompt the user 331 to enter additional information 3 (S1514), a menu for registering additional information 3 is displayed on the display unit 425. By performing the input operation for additional information 3 according to the displayed menu screen, the user 331 can select a dictionary type as additional information 3. When entry is complete, additional information 3 is taken into the input management unit 420 (S1516). The input management unit 420 transfers the captured additional information 3 to the trigger setting unit 403. The trigger setting unit 403 stores the transferred additional information 3 in the reserved word storage area 410-2 of the memory 410.
 When storing additional information 3 in the reserved word storage area 410-2 of the memory 410, the trigger setting unit 403 stores it in association with the reserved word registered in S1513.
 FIGS. 21A and 21B show a sequence example in which, when a plurality of reserved words are registered in the host device 332 as shown in FIG. 20, the type of speech recognition dictionary used by the speech-to-text conversion unit 101-2 is changed whenever one of the reserved words is recognized by the host device 332. The processing from S2100 to S2113 shown in FIGS. 21A and 21B is identical to the processing from S1600 to S1613 shown in FIGS. 16A and 16B. The processing of FIGS. 21A and 21B differs from that of FIGS. 16A and 16B in the following respect: in FIGS. 16A and 16B, after the trigger recognition unit 405 reads additional information 1, it sets the operation of the host device 332 based on the content of that additional information 1 (S1614), whereas in FIGS. 21A and 21B, after the trigger recognition unit 405 reads additional information 3, it exchanges messages with the speech-to-text conversion unit 101-2 (S2114-1 to S2114-3) in order to change the type of speech recognition dictionary used by the speech-to-text conversion unit 101-2 based on the content of that additional information 3.
 The indication that informs the user that recognition of the reserved word and the change of the speech recognition dictionary have been completed is desirably given by a display method the user 331 can recognize: for example, the trigger setting unit 403 transmits a registration completion notification to the display device 425 (S2109), and the display device 425 that receives it lights the LED in green. Alternatively, the trigger recognition unit 405 may send a recognition completion notification to the speaker 423, and the speaker 423 that receives it may announce to the user 331 by voice, for example, "Yes, what is it? By the way, I have changed the speech recognition dictionary to the dialect dictionary (Kansai dialect)." Alternatively, to notify the user 331 that recognition of the reserved word and the change to the speech recognition dictionary corresponding to the recognized reserved word have been completed, the trigger recognition unit 405 may use both the display method via the display device 425 and the voice method via the speaker 423.
 Note that the operation content corresponding to a reserved word (additional information 1) shown in FIG. 14, the operation content per additional word for a reserved word (additional information 2) shown in FIGS. 17A and 17B, and the type of speech recognition dictionary for a reserved word (additional information 3) shown in FIG. 20 can be registered in combination.
 FIG. 22 is a list of the combinations used when the registration of operation content corresponding to a reserved word shown in FIG. 14, the registration of additional words for a reserved word and of operation content for those additional words shown in FIG. 17A, and the registration of a speech recognition dictionary type for a reserved word shown in FIG. 20 are performed in combination. For example, for the reserved word "Konnichiwa", the host device 332 is set to use the Japanese dictionary as the speech recognition dictionary type. The host device 332 also registers "chan", "ya", and "oi" as additional words for the reserved word "Konnichiwa": when the additional word is "chan", the operation content of the host device 332 is changed so that it raises the tone of its responses; when the additional word is "ya", the settings are changed so that the expiration time T0 of the input interval confirmation timer T is lengthened; and when the additional word is "oi", the operation content is set so that "I'm sorry" is announced immediately.
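A combined registration entry like the "Konnichiwa" row described above might be represented as one record holding additional information 3 alongside the per-additional-word operations of additional information 2. Every field name and value below is an assumption made for illustration.

```python
# One combined entry for the reserved word "Konnichiwa" (FIG. 22 sketch).
konnichiwa_entry = {
    "dictionary": "japanese",                       # additional information 3
    "additional": {                                 # additional information 2
        "chan": {"operation": "raise_response_tone"},
        "ya":   {"operation": "extend_timer", "timer_T0_seconds": 10},
        "oi":   {"operation": "announce", "text": "I'm sorry"},
    },
}

def operation_for(entry, additional_word):
    # Look up the operation content registered for an additional word.
    return entry["additional"].get(additional_word)
```

Keeping all three kinds of additional information under one reserved word entry lets a single recognition event both switch the dictionary and select the response behavior.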
 FIG. 23 is a list of examples in which the type of speech recognition dictionary used by the text conversion unit 101-2 is changed according to conditions other than reserved words (hereinafter referred to as change conditions). For example, FIG. 23(A) shows an example in which a time of day is set as the change condition: the host device 332 instructs the text conversion unit 101-2 of the speech recognition cloud 101 to change the type of speech recognition dictionary used when converting voice data into text according to the time at which the dictionary is used.
 For example, the host device 332 instructs the text conversion unit 101-2 through the Internet 2 to use the general family dictionary from 05:00 to 08:00, the housewife dictionary from 08:00 to 16:00, the general family dictionary from 16:00 to 20:00, and the adult dictionary from 20:00 to 05:00.
 FIG. 23(B) shows an example in which the change condition is the operation status of the host device 332. The host device 332 can instruct the text conversion unit 101-2 to change the type of speech recognition dictionary it uses according to the operation status of the host device 332 at the time the dictionary is used.
 For example, the host device 332 instructs the text conversion unit 101-2 through the Internet 2 to use the time/route search dictionary when the operation status is "leaving for work now", the general dictionary when the operation status is "going out", and the refresh dictionary when the operation status is "night mode".
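The two change-condition tables above can both be modeled as simple lookups: a time schedule for FIG. 23(A) and a status map for FIG. 23(B). The dictionary names, the table layout, and the fall-through to the adult dictionary are illustrative assumptions.

```python
from datetime import time

# FIG. 23(A) sketch: (start, end) half-open intervals mapped to dictionaries;
# any other time (20:00 to 05:00) falls through to the adult dictionary.
TIME_SCHEDULE = [
    (time(5, 0),  time(8, 0),  "family_general"),
    (time(8, 0),  time(16, 0), "housewife"),
    (time(16, 0), time(20, 0), "family_general"),
]

# FIG. 23(B) sketch: operation status mapped to a dictionary.
STATUS_DICTIONARY = {
    "leaving_for_work": "time_route_search",
    "going_out": "general",
    "night_mode": "refresh",
}

def dictionary_for_time(now):
    for start, end, name in TIME_SCHEDULE:
        if start <= now < end:
            return name
    return "adult"
```

In the described system the host device would evaluate these tables and then notify the chosen dictionary type to the text conversion unit 101-2 over the Internet 2.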
 The host device 332 has a mode (hereinafter referred to as the change condition registration mode) for registering change condition type information, that is, information on the type of speech recognition dictionary to be used according to each condition.
 In order to switch between speech recognition dictionary types according to the change conditions, the user 331 must register the change condition type information in the host device 332 in advance.
 As a registration method for switching between speech recognition dictionary types according to the change conditions, the host device 332 may capture the voice uttered by the user 331 through the microphone 421 and analyze the captured voice data. Alternatively, a menu for setting the change condition type information may be displayed on the display device 425, and the user 331 may register the information by operating according to that menu. Alternatively, using an external device connected via the network I/F 427 shown in FIG. 4, such as a smartphone or tablet, a menu for setting the change condition type information may be displayed on the screen of that smartphone or tablet, and the user 331 may register the information by operating according to the displayed menu screen.
 FIG. 24 shows an example of a processing sequence in which a menu for setting change condition type information is displayed on the display unit 425 and the user 331 operates according to that menu to register which speech recognition dictionary type is to be used under which change condition. The processing from S2417 to S2423 shown in FIG. 24 is identical to the processing from S1517 to S1523 in FIG. 15B, which is the registration sequence for additional information 1.
 The user 331 operates according to the displayed menu screen to enter the speech recognition dictionary types to be used under the respective change conditions. The completed change condition type information is taken into the input management unit 420 (S2417). The input management unit 420 transfers the captured change condition type information to the trigger setting unit 403 (S2418). The trigger setting unit 403 stores the transferred change condition type information in the reserved word storage area 410-2 of the memory 410 (S2419).
 FIG. 25 shows an example of a processing sequence in which, when change condition type information for changing the speech recognition dictionary type according to change conditions is stored in the reserved word storage area 410-2 of the memory 410 as shown in FIG. 23, the host device 332 notifies the speech-to-text conversion unit 101-2 of a dictionary change according to the content of the stored change condition type information.
 The processing of FIG. 25 is desirably performed immediately after the reserved word recognition processing shown in FIG. 9B has finished (S911). Alternatively, after a reserved word has been recognized, it is desirably performed at the timing (S1001) at which the host device 332 captures the words that the user 331 utters to the host device 332 in order to control a device or sensor, as shown in FIGS. 10A and 10B.
 FIG. 25 shows an example in which, when the user 331 utters words to the host device 332 in order to control a device or sensor as shown in FIGS. 10A and 10B, the determination of whether to change the speech recognition dictionary and the notification of the result are performed at the timing (S1001) at which the host device 332 captures those words.
 When recognition of the reserved word is complete, the host device 332 continues to capture the voice uttered by the user into the input management unit 420 through the microphone 421 (S2501). At the timing at which it captures the voice data, the input management unit 420 transmits a read request (change condition type information) to the voice processing unit 407 in order to read the change condition type information (S2502), and suspends processing of the captured voice data. On receiving the read request (change condition type information), the voice processing unit 407 reads from the reserved word storage area 410-2 of the memory 410 the change condition type information, which contains combinations of change conditions and speech recognition dictionary types (S2503). The voice processing unit 407 analyzes the "change condition" in the read change condition type information and determines whether its content matches the state of the host device 332 (S2504). If the determination finds a match, the voice processing unit 407 reads the "speech recognition dictionary type" corresponding to the "change condition" and notifies the speech-to-text conversion unit 101-2 through the Internet 2 of the new dictionary type by means of a speech recognition dictionary type notification (S2505). On receiving the speech recognition dictionary type notification, the speech-to-text conversion unit 101-2 refers to the notified dictionary type and changes the speech recognition dictionary currently in use to the notified type (S2506).
 When the change of the speech recognition dictionary type is complete, the speech-to-text conversion unit 101-2 sends a speech recognition dictionary change completion notification to the voice processing unit 407 to signal that the change is complete (S2507).
 When the voice processing unit 407 receives the speech recognition dictionary change completion notification (S2507), it transmits a read completion notification to the input management unit 420 to signal that reading of the change condition type information is complete (S2508). On receiving the read completion notification (S2508), the input management unit 420 resumes processing of the voice data captured in S2501.
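The S2503 to S2505 decision — read the stored (change condition, dictionary type) pairs, test each condition against the host state, and pick the dictionary type to notify — could be sketched as follows. Conditions are modeled here as predicate functions over a host-state dictionary; this representation and all names are assumptions for illustration.

```python
def select_dictionary(host_state, change_conditions, current):
    """Return the dictionary type to notify to the speech-to-text unit,
    or the current type when no change condition matches (S2504)."""
    for condition, dictionary in change_conditions:
        if condition(host_state):
            return dictionary
    return current

# Sample change condition type information, keyed on operation status.
conditions = [
    (lambda s: s.get("status") == "night_mode", "refresh"),
    (lambda s: s.get("status") == "going_out", "general"),
]
```

In the sequence of FIG. 25, a changed result would be sent as the speech recognition dictionary type notification, and an unchanged result would leave the current dictionary in place.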
 The user 331 may forget the reserved words registered in the host device 332. In preparation for such a case, it is desirable that the user 331 be able to check the registered reserved words by a simple method.
 FIG. 26 shows a list of example reserved words (hereinafter referred to as rescue reserved words) used to notify the user 331 of some or all of the registered reserved words when the user 331, having registered reserved words by the processing sequence example shown in FIGS. 5A and 5B, has forgotten them, together with the corresponding display content (display range). For example, for the rescue reserved word "I don't know", all of the reserved words registered in the host device 332 are displayed on the display unit 425 or in the display area of an external device connected to the host device 332. For the rescue reserved word "Tell me a little", a predetermined subset of the reserved words registered in the host device 332 is displayed on the display unit 425 or in the display area of an external device connected to the host device 332. For the rescue reserved word "The ones I don't use", those reserved words registered in the host device 332 whose usage history shows no use in the past year are displayed on the display unit 425 or in the display area of an external device connected to the host device 332. The external device connected to the host device 332 is desirably a device with a relatively large display screen, such as a smartphone, tablet, or liquid crystal television, so that the user can refer to many reserved words at once.
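The three display ranges above amount to filters over a registry of reserved words and their usage history. A minimal sketch follows; the rescue word strings, sample words, and dates are all invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical registry: each registered reserved word with its last use
# date (None = never used).
usage_history = {
    "konnichiwa": date(2018, 3, 1),
    "ohayou": date(2016, 1, 10),
    "ookini": None,
}

def words_to_display(rescue_word, today):
    if rescue_word == "I don't know":
        return sorted(usage_history)              # everything registered
    if rescue_word == "tell me a little":
        return sorted(usage_history)[:2]          # predetermined subset
    if rescue_word == "the ones I don't use":
        cutoff = today - timedelta(days=365)      # no use in the past year
        return sorted(w for w, last in usage_history.items()
                      if last is None or last < cutoff)
    return []
```

The returned list would then be rendered on the display unit 425 or forwarded to a connected external device with a larger screen.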
 Registration of a reserved word for displaying the registered reserved words in this way is performed by changing the mode of the host device to the setting mode (reserved word (for display)) and registering it according to the reserved word registration processing sequence shown in FIGS. 5A and 5B.
 In the above example, the corresponding reserved words are displayed immediately when the user utters one of the rescue reserved words shown in FIG. 26. However, the host device 332 may ask the user 331 for a password before displaying the corresponding reserved words. For example, after the user utters a rescue reserved word, the host device 332 may utter "mountain" through the speaker 423, and display the corresponding reserved words only when the user 331 responds with, for example, "river".
 Furthermore, the host device 332 can record, as audio or video, the scene in which it captures the words uttered by the user 331 and registers a reserved word, additional word, or additional information. It can likewise record the scene in which it recognizes a reserved word or additional word.
 FIG. 27 shows a functional block diagram of the host device 332 for the case where it records, as audio or video, the scenes in which the words uttered by the user 331 are captured and a reserved word, additional word, or additional information is registered, or in which a reserved word or additional word is recognized. The differences from FIG. 4 are as follows: the host device 2700 has a camera 2702 for recording scenes in which a reserved word, additional word, or additional information is registered, or in which a reserved word or additional word is recognized; the control management unit 2701 has EVT-Mg 2701-3 in addition to APP-Mg 2701-1 and CONF-Mg 2701-2; and the system controller 402 has a playback control function for playing back the recorded audio or video scene data. EVT-Mg 2701-3 has the function of performing the audio or video recording described below in response to the occurrence of a scene in which a reserved word, additional word, or additional information is registered, or of a scene in which a reserved word or additional word is recognized. The flow of processing by which the host device 332 captures the words uttered by the user 331 and records, as audio or video, the registration scenes for reserved words, additional words, or additional information, and the recognition scenes for reserved words or additional words, is described below.
 FIG. 28 shows the passage of time when the host device 332 records a registration scene or a recognition scene as audio or video, that is, when a scene occurs in which a reserved word, additional word, or additional information is registered, or a scene occurs in which a reserved word or additional word is recognized.
 Assume that at time t1 the host device 332 starts registering a word uttered by the user as a reserved word. The start of reserved word registration may be taken as, for example, the timing at which the input management unit 420 performs the processing of S502 in the reserved word registration sequence of FIGS. 5A and 5B. When the input management unit 420 recognizes the start of reserved word registration, it notifies EVT-Mg 2701-3 to that effect. On receiving the notification that reserved word registration has started, EVT-Mg 2701-3 records the reserved word registration scene as Rec1, as audio through the microphone 421 or as video through the camera 2702. The end of reserved word registration may be taken as, for example, the timing at which the input management unit 420 receives the registration completion notification of S512 in the reserved word registration sequence of FIGS. 5A and 5B. On determining that reserved word registration has ended, the input management unit 420 notifies EVT-Mg 2701-3 to that effect. On receiving the notification that reserved word registration is complete, EVT-Mg 2701-3 ends the audio recording of the reserved word registration scene through the microphone 421 or the video recording through the camera 2702.
 Similarly, assume that at time t2 the host device 332 starts recognizing a word uttered by the user as a reserved word. The start of reserved word recognition may be, for example, the timing at which the input management unit 420 performs the processing of S802 in the reserved word recognition sequence of FIGS. 8A and 8B. On recognizing the start of reserved word recognition, the input management unit 420 notifies EVT-Mg 2701-3 to that effect. Having received the notification that reserved word recognition has started, EVT-Mg 2701-3 records audio of the reserved word recognition scene through the microphone 421 as Rec2, or records video of the scene through the camera 2702 as Rec2. The end of reserved word recognition may be, for example, the timing at which the input management unit 420 receives the recognition completion notification of S811 in the reserved word recognition sequence of FIGS. 8A and 8B. Having detected the end of reserved word recognition, the input management unit 420 notifies EVT-Mg 2701-3 to that effect. On receiving the notification that reserved word recognition is complete, EVT-Mg 2701-3 stops the audio recording of the reserved word recognition scene being made through the microphone 421, or stops the video recording being made through the camera 2702.
 Similarly, the registration or recognition events that occur at t3 and t4 are recorded as audio or video.
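The start/stop flow above (a start notification from the input management unit opens a recording labeled Rec1, Rec2, and so on, and the corresponding completion notification closes it) can be sketched as simple event-driven bookkeeping. The sketch below is illustrative only; the class and method names are hypothetical and not taken from the patent, and actual microphone/camera capture is omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class RecordingEvent:
    """One recorded registration or recognition scene (Rec1, Rec2, ...)."""
    kind: str                      # "registration" or "recognition"
    word: str                      # the reserved word involved
    started_at: datetime
    ended_at: Optional[datetime] = None

class EventManager:
    """Hypothetical stand-in for the EVT-Mg role: open a recording on a
    start notification, close it on the completion notification."""

    def __init__(self) -> None:
        self.events: list[RecordingEvent] = []
        self._active: Optional[RecordingEvent] = None

    def on_start(self, kind: str, word: str, now: datetime) -> None:
        # Corresponds to the notification sent at S502 (registration)
        # or S802 (recognition); mic/camera capture would begin here.
        self._active = RecordingEvent(kind, word, started_at=now)

    def on_complete(self, now: datetime) -> None:
        # Corresponds to the completion notification (S512 / S811);
        # the finished scene is kept so it can be played back later.
        if self._active is not None:
            self._active.ended_at = now
            self.events.append(self._active)
            self._active = None

mgr = EventManager()
mgr.on_start("registration", "OOKINI", datetime(2018, 4, 1, 10, 0))  # t1
mgr.on_complete(datetime(2018, 4, 1, 10, 1))
mgr.on_start("recognition", "OOKINI", datetime(2018, 4, 1, 18, 30))  # t2
mgr.on_complete(datetime(2018, 4, 1, 18, 31))
```

The completed events list is what playback (FIG. 29) would be driven from.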
 The host device 332 can play back the recorded audio or video of the registration scenes and the recognition scenes.
 FIG. 29 shows an example of how the data to be played back is displayed when each piece of recorded audio or video is played back. In the example of FIG. 29, icons for the four pieces of data to be played back are displayed in a manner corresponding to the occurrence of events along the time axis of FIG. 28. These icons may be displayed, for example, on the display unit 425, or on an external device connected to the host device 332, such as a smartphone, a tablet, or a liquid crystal television.
 Each displayed icon indicates the date and time of the recording and the content of the recorded data. For example, if the icon reads reserved word registration "OOKINI", the recorded data is a scene in which "OOKINI" was registered as a reserved word. Similarly, if the icon reads reserved word recognition "OOKINI", the recorded data is a scene in which "OOKINI" was recognized as a reserved word.
 By selecting the icon of the data to be played back, the user 331 can check the recorded audio or video content of that data.
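As a rough illustration, the icon captions described above could be assembled from the recorded events as follows. The exact caption layout of FIG. 29 is not specified in the patent, so the format string here is an assumption made for the example.

```python
from datetime import datetime

def icon_label(kind: str, word: str, when: datetime) -> str:
    """Caption for one playback icon: recording date/time plus what the
    recorded scene contains. The layout is hypothetical."""
    action = {
        "registration": "reserved word registration",
        "recognition": "reserved word recognition",
    }[kind]
    return f'{when:%Y-%m-%d %H:%M} {action} "{word}"'

# One label per recorded scene, in time order as in FIG. 29.
labels = [
    icon_label("registration", "OOKINI", datetime(2018, 4, 1, 10, 0)),
    icon_label("recognition", "OOKINI", datetime(2018, 4, 1, 18, 30)),
]
```

A display layer (the display unit 425 or an external device) would then render one icon per label.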
 Furthermore, the host device 332 may instruct cameras and microphones connected via the network 333 to record audio or video of a registration scene or a recognition scene when a scene occurs in which a reserved word, an additional word, or additional information is registered, or when a scene occurs in which a reserved word or an additional word is recognized.
 As already described, by recognizing a reserved word among the words uttered by the user 331, the host device 332 can control devices and sensors connected via the network based on the content of the additional information associated with that reserved word. The control of such target devices and sensors may require a high level of security. For example, suppose that a reserved word whose additional information specifies opening and closing the door of a safe is registered in the host device 332, so that the safe door can be controlled through the host device. In this case, when the host device 332 recognizes the corresponding reserved word, it opens or closes the safe door and, at the same time, uses microphones and cameras near the safe to record audio or video of the surroundings of the safe, the device being controlled. Recording in this way makes it possible to maintain the security of the safe door operation. The user 331 can check the content of data recorded through network-connected microphones and cameras in the same way as data recorded through the microphone and camera built into the host device 332.
 When the control of a device or sensor by the host device 332 requires high security, the host device 332 may additionally verify, before executing the control, the identity of the person using audio recorded by microphones or video recorded by cameras near the device or sensor to be controlled. Before executing the control content of specific additional information, the host device 332 may compare pre-registered feature points, such as the voice or face of a specific person, with the audio collected or the video captured by the microphones or cameras near the device or sensor to be controlled, and execute the control content only when the identity of that person is confirmed.
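The identity check described above (compare pre-registered voice or face feature points against freshly captured ones, and execute the control only on a match) can be sketched as a gating function. Feature extraction itself is out of scope here; the cosine-similarity comparison and the 0.9 threshold are illustrative assumptions, not details from the patent.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify_and_execute(captured, registered, execute, threshold=0.9):
    """Run the high-security control only when the captured voice/face
    features are close enough to the pre-registered ones."""
    if cosine_similarity(captured, registered) >= threshold:
        execute()
        return True
    return False  # identity not confirmed: refuse the control
```

In a real system, `captured` would come from the microphones or cameras near the controlled device, and `execute` would trigger, for example, the safe-door actuator.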
 The above embodiments have been described assuming that the recognition data conversion unit 101-1, the speech-to-text conversion unit 101-2, the text analysis unit 102-1, and the response/action generation unit 102-2 all reside in the cloud server 1; however, some or all of them may reside in the host device 332. In that case as well, the operation sequences of the processes already described remain the same.
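Because the four units can be split between the cloud server 1 and the host device 332 without changing the operation sequences, the deployment can be thought of as a placement table consulted at dispatch time. The sketch below is purely illustrative: the stage names mirror the reference numerals, and the particular split shown is one arbitrary example, not a configuration prescribed by the patent.

```python
# Where each pipeline stage runs is a deployment choice; the order of the
# stages, and therefore the operation sequences, stays the same either way.
PLACEMENT = {
    "recognition_data_conversion": "cloud",  # unit 101-1
    "speech_to_text": "cloud",               # unit 101-2
    "text_analysis": "host",                 # unit 102-1 moved into host
    "response_action_generation": "host",    # unit 102-2 moved into host
}

def stages_on(location: str) -> list[str]:
    """List the pipeline stages deployed at the given location."""
    return [name for name, loc in PLACEMENT.items() if loc == location]
```

Moving a stage between locations then amounts to changing one table entry, with no change to the sequence logic.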
 While several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and their equivalents.
 DESCRIPTION OF SYMBOLS: 1 … cloud server, 2 … Internet, 3 … home, 101 … voice recognition cloud, 102 … corresponding action generation cloud, 310 … various sensors, 320 … various equipment, 330 … HGW (Home Gateway), 331 … user, 332 … host device, 340 … various home appliances.

Claims (15)

  1.  An electronic device that, according to the content of a first voice input from outside, determines whether to execute control of one or a plurality of devices based on the content of a second voice input after the first voice is input, the electronic device comprising:
     management means for creating and managing judgment voice data for judging that the first voice is a desired voice, from voices input from outside a plurality of times, and for judging that the first voice is the desired voice using the created and managed judgment voice data; and
     control means for executing control of the one or the plurality of devices based on the content of the second voice,
     wherein, when the management means judges that the first voice is the desired voice using the judgment voice data, the control means executes control of the one or the plurality of devices based on the content of the second voice.
  2.  The electronic device according to claim 1, wherein the one or the plurality of devices controlled based on the content of the second voice are connected to the electronic device via a network.
  3.  The electronic device according to claim 1, wherein the management means can create and manage a plurality of pieces of the judgment voice data.
  4.  The electronic device according to claim 1, further comprising a display unit capable of displaying a judgment result by the management means, wherein, when the management means judges that the first voice is the desired voice using the judgment voice data, an indication to that effect is displayed on the display unit.
  5.  The electronic device according to claim 1 or claim 4, further comprising an output unit that outputs a judgment result by the management means as voice, wherein, when the management means judges that the first voice is the desired voice using the judgment voice data, an indication to that effect is output from the output unit.
  6.  The electronic device according to claim 4, wherein the management means has a judgment criterion 1 comprising a plurality of criteria for judging that the first voice is the desired voice using the judgment voice data, and the content displayed on the display unit is changed according to which of the plurality of criteria of the judgment criterion 1 the judgment result satisfies.
  7.  The electronic device according to claim 5, wherein the management means has a judgment criterion 2 comprising a plurality of criteria for judging that the first voice is the desired voice using the judgment voice data, and the content output from the output unit is changed according to which of the plurality of criteria of the judgment criterion 2 the judgment result satisfies.
  8.  The electronic device according to any one of claims 1 to 7, wherein, when the management means judges that the first voice is the desired voice using the judgment voice data, the content of the control of the one or the plurality of devices is changed based on part or all of the content of the second voice.
  9.  The electronic device according to any one of claims 1 to 8, wherein, when the management means judges that the first voice is the desired voice using the judgment voice data, the operation content of the electronic device is changed based on part or all of the content of the second voice.
  10.  The electronic device according to claim 8, wherein, when the management means has a plurality of pieces of the judgment voice data, the content of the control of the one or the plurality of devices is changed according to the type of the judgment voice data used to judge that the first voice is the desired voice.
  11.  The electronic device according to claim 9, wherein, when the management means has a plurality of pieces of the judgment voice data, the operation content of the electronic device is changed according to the type of the judgment voice data used to judge that the first voice is the desired voice.
  12.  The electronic device according to any one of claims 1 to 11, wherein the management means has a confirmation timer for checking the timing at which voices are input from outside, and requests input of the first voice when the confirmation timer determines that the timings of voices input from outside are separated by a predetermined time or more.
  13.  The electronic device according to any one of claims 1 to 12, further comprising selection means for selecting a type of voice recognition dictionary used to analyze the content of the second voice, wherein, when the management means has a plurality of pieces of the judgment voice data, the selection means determines the type of the voice recognition dictionary according to the type of the judgment voice data used to judge that the first voice is the desired voice.
  14.  The electronic device according to claim 13, comprising the selection means for selecting the type of the voice recognition dictionary used to analyze the content of the second voice, wherein the selection means determines the type of the voice recognition dictionary according to the state of the electronic device.
  15.  A control method of determining, according to the content of a first voice input from outside, whether to execute control of one or a plurality of devices based on the content of a second voice input after the first voice is input, the method comprising:
     creating and managing judgment voice data for judging that the first voice is a desired voice, from voices input from outside a plurality of times, and judging that the first voice is the desired voice using the created and managed judgment voice data; and
     executing, when it is judged that the first voice is the desired voice using the judgment voice data, control of the one or the plurality of devices based on the content of the second voice.
PCT/JP2018/015306 2018-04-11 2018-04-11 Electronic device and control method for same WO2019198186A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/015306 WO2019198186A1 (en) 2018-04-11 2018-04-11 Electronic device and control method for same
CN201880077613.1A CN111656314A (en) 2018-04-11 2018-04-11 Electronic apparatus and control method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/015306 WO2019198186A1 (en) 2018-04-11 2018-04-11 Electronic device and control method for same

Publications (1)

Publication Number Publication Date
WO2019198186A1 true WO2019198186A1 (en) 2019-10-17

Family

ID=68164315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/015306 WO2019198186A1 (en) 2018-04-11 2018-04-11 Electronic device and control method for same

Country Status (2)

Country Link
CN (1) CN111656314A (en)
WO (1) WO2019198186A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018036397A (en) * 2016-08-30 2018-03-08 シャープ株式会社 Response system and apparatus
JP2018036653A (en) * 2012-08-10 2018-03-08 エイディシーテクノロジー株式会社 Voice response device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN202453859U (en) * 2011-12-20 2012-09-26 安徽科大讯飞信息科技股份有限公司 Voice interaction device for home appliance
CN104538030A (en) * 2014-12-11 2015-04-22 科大讯飞股份有限公司 Control system and method for controlling household appliances through voice
CN106773742B (en) * 2015-11-23 2019-10-25 宏碁股份有限公司 Sound control method and speech control system

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
JP2018036653A (en) * 2012-08-10 2018-03-08 エイディシーテクノロジー株式会社 Voice response device
JP2018036397A (en) * 2016-08-30 2018-03-08 シャープ株式会社 Response system and apparatus

Non-Patent Citations (1)

Title
"settings and use methods for Hey SIRI", 21 October 2016 (2016-10-21), Retrieved from the Internet <URL:http://dekiru.net/article/5312> *

Also Published As

Publication number Publication date
CN111656314A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
JP6659514B2 (en) Electronic device and control method thereof
JP6475386B2 (en) Device control method, device, and program
JP7198861B2 (en) Intelligent assistant for home automation
CN111512365B (en) Method and system for controlling multiple home devices
JP6567737B2 (en) Spoken dialogue control method
US9230560B2 (en) Smart home automation systems and methods
JP6053097B2 (en) Device operating system, device operating device, server, device operating method and program
WO2017059815A1 (en) Fast identification method and household intelligent robot
JP5753212B2 (en) Speech recognition system, server, and speech processing apparatus
US11303955B2 (en) Video integration with home assistant
KR20140089863A (en) Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof
JP7276129B2 (en) Information processing device, information processing system, information processing method, and program
US20190130707A1 (en) Event notification using an intelligent digital assistant
US11233490B2 (en) Context based volume adaptation by voice assistant devices
US20210157542A1 (en) Context based media selection based on preferences setting for active consumer(s)
WO2019198186A1 (en) Electronic device and control method for same
JP6858336B2 (en) Electronic devices and their control methods
JP6858335B2 (en) Electronic devices and their control methods
JP6858334B2 (en) Electronic devices and their control methods
JP7452528B2 (en) Information processing device and information processing method
WO2018023518A1 (en) Smart terminal for voice interaction and recognition
JP6921311B2 (en) Equipment control system, equipment, equipment control method and program
JP5973030B2 (en) Speech recognition system and speech processing apparatus
JP2020061046A (en) Voice operation apparatus, voice operation method, computer program, and voice operation system
JP2010072704A (en) Interface device and input method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18914916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP