WO2019229978A1 - Audio output device, apparatus control system, audio output method, and program - Google Patents


Info

Publication number
WO2019229978A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
history information
content
voice
sound
Prior art date
Application number
PCT/JP2018/021204
Other languages
French (fr)
Japanese (ja)
Inventor
Noriyuki Komiya (小宮 紀之)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2020522542A priority Critical patent/JP6945734B2/en
Priority to PCT/JP2018/021204 priority patent/WO2019229978A1/en
Publication of WO2019229978A1 publication Critical patent/WO2019229978A1/en

Classifications

    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • H04Q 9/00 Arrangements in telecontrol or telemetry systems for selectively calling a substation from a main station, in which substation desired apparatus is selected for applying a control signal thereto or for obtaining measured values therefrom

Definitions

  • the present invention relates to an audio output device, a device control system, an audio output method, and a program.
  • Such operations include screen operations and voice operations.
  • In a voice operation, for example, equipment is controlled using an equipment control device that controls the equipment based on the voice uttered by the user.
  • When an equipment control apparatus that controls facility devices based on voice is used, user convenience is often improved.
  • Patent Literature 1 describes a personal information storage system that includes a portable device which transmits user identification information when a user controls a device using the portable device, and that stores personal information acquired through use of the device in association with the user identification information.
  • However, since the technique described in Patent Literature 1 does not use a device control apparatus that controls equipment based on voice, it is difficult to apply it directly to techniques that use such a device control apparatus. For this reason, there is a demand for a technique for easily bringing a facility device into a desired control state using a device control device that controls the facility device based on voice.
  • the present invention has been made in view of the above problems, and an object thereof is to provide an audio output device, a device control system, an audio output method, and a program that make it easy to set a facility device to a desired control state using a device control device that controls the facility device based on sound.
  • In order to achieve the above object, an audio output device according to the present invention includes: operation detection means for detecting an operation on the facility device by the user; audio output means for outputting audio representing the content of the operation when the operation is detected by the operation detection means; action detection means for detecting an action performed by the user; and history information generation means for generating, when the operation detected by the operation detection means and the action detected by the action detection means have a predetermined relationship, history information in which the content of the operation and the content of the action are associated with each other.
  • When the action is detected by the action detection means, the audio output means outputs the audio representing the content of the operation associated with the content of the action in the history information.
  • In the present invention, the operation performed on the facility device by the user and the action performed by the user have a predetermined relationship, and when the action is detected, a sound representing the content of the operation is output. Therefore, according to this invention, a facility device can easily be brought into a desired control state using the device control apparatus that controls the facility device based on audio.
  • Configuration diagram of the device control system according to Embodiment 1 of the present invention
  • Configuration diagram of the audio output device according to Embodiment 1 of the present invention
  • Functional configuration diagram of the device control apparatus according to Embodiment 1 of the present invention
  • Diagram showing the history information according to Embodiment 1 of the present invention
  • Functional configuration diagram of the audio output device according to Embodiment 2 of the present invention
  • Diagram showing the history information according to Embodiment 2 of the present invention
  • Functional configuration diagram of the audio output device according to Embodiment 3 of the present invention
  • Configuration diagram of the device control system according to Embodiment 4 of the present invention
  • the device control system 1000 includes a sound output device 100 and a device control device 200, and the sound output device 100 and the device control device 200 cooperate to control facility devices.
  • the equipment is assumed to be an air conditioner 300.
  • the voice output device 100 receives an operation on the equipment from the user 10 and outputs a voice representing the operation.
  • the device control device 200 detects the sound output by the sound output device 100, generates a control command corresponding to the sound, and transmits the control command to the air conditioner 300.
  • the device control apparatus 200 and the air conditioner 300 are connected to each other via a communication network 600.
  • the communication network 600 is, for example, a wireless LAN (Local Area Network) built in a home.
  • the device control apparatus 200 is generally used to detect a voice spoken by the user 10 and transmit a control command corresponding to the voice to the air conditioner 300. In this embodiment, however, the device control apparatus 200 detects not a voice uttered by the user 10 but the voice that the voice output device 100 outputs in response to an operation received from the user 10, and transmits a control command corresponding to this voice to the air conditioner 300.
  • various effects can be expected by the audio output device 100 relaying the user 10 and the device control device 200.
  • the user 10 can control the air conditioner 300 not by voice operation but by screen operation.
  • control corresponding to another operation related to this operation can be automatically executed.
  • the audio output device 100 receives an operation for instructing control on the air conditioner 300 from the user 10.
  • the audio output device 100 outputs audio corresponding to the content of the operation received from the user 10.
  • the sound output from the sound output device 100 is detected by the device control device 200.
  • the audio output device 100 can automatically output audio for setting the air conditioner 300 to a control state desired by the user 10 based on the history of operations received from the user 10. For this reason, the audio output device 100 can output audio other than audio corresponding to the content of the accepted operation, or can output audio when no operation is accepted.
  • the audio output device 100 is, for example, a smartphone, a tablet terminal, or a personal computer.
  • the audio output device 100 includes a processor 11, a flash memory 12, a touch screen 13, a microphone 14, a speaker 15, and a communication interface 16.
  • the processor 11 controls the overall operation of the audio output device 100.
  • the processor 11 is, for example, a CPU (Central Processing Unit) that incorporates ROM (Read Only Memory), RAM (Random Access Memory), RTC (Real Time Clock), and the like.
  • the CPU operates according to a basic program stored in the ROM, for example, and uses the RAM as a work area.
  • the flash memory 12 is a nonvolatile memory that stores various types of information.
  • the flash memory 12 stores a program executed by the processor 11.
  • the touch screen 13 detects an operation performed by the user and supplies a signal indicating the detection result to the processor 11.
  • the touch screen 13 displays information according to control by the processor 11.
  • the microphone 14 is a device that converts sound into an electrical signal. For example, the microphone 14 converts a voice uttered by the user 10 into an electrical signal.
  • the speaker 15 is a device that converts a supplied electric signal into physical vibration and generates sound. For example, the speaker 15 outputs sound for transmitting various messages to the user 10.
  • the communication interface 16 is a communication interface for connecting the audio output device 100 to a telephone network (not shown) or the Internet (not shown).
  • the device control device 200 detects the sound output from the sound output device 100 and generates a control command corresponding to the detected sound.
  • the device control apparatus 200 transmits the generated control command to the air conditioner 300 via the communication network 600.
  • the device control apparatus 200 has a function of converting speech into words and a function of converting words into control commands.
  • the device control apparatus 200 is, for example, a smart speaker.
  • the device control apparatus 200 includes a processor 21, a flash memory 22, a touch screen 23, a microphone 24, a speaker 25, and a communication interface 26.
  • the processor 21 controls the overall operation of the device control apparatus 200.
  • the processor 21 is, for example, a CPU incorporating a ROM, RAM, RTC, and the like.
  • the CPU operates according to a basic program stored in the ROM, for example, and uses the RAM as a work area.
  • the flash memory 22 is a non-volatile memory that stores various types of information.
  • the flash memory 22 stores a program executed by the processor 21.
  • the touch screen 23 detects an operation performed by the user and supplies a signal indicating the detection result to the processor 21.
  • the touch screen 23 displays information according to control by the processor 21.
  • the microphone 24 is a device that converts sound into an electrical signal. For example, the microphone 24 converts the sound output from the sound output device 100 into an electrical signal.
  • the speaker 25 is a device that converts a supplied electric signal into physical vibration and utters a sound. For example, the speaker 25 outputs sound for transmitting various messages to the user 10.
  • the communication interface 26 is a communication interface for connecting the device control apparatus 200 to the communication network 600.
  • the air conditioner 300 is a facility device to be controlled by the device control system 1000.
  • the air conditioner 300 is, for example, a device that harmonizes air in a home space.
  • the air conditioner 300 includes, for example, a heating function, a cooling function, a dehumidifying function, and a blowing function.
  • the air conditioner 300 includes, for example, an indoor unit (not shown) installed in the house, an outdoor unit (not shown) installed outside the house, and a remote controller (not shown) for controlling the indoor unit and the outdoor unit.
  • the air conditioner 300 has a function of connecting to the communication network 600.
  • the air conditioner 300 is controlled according to a control command received from the device control apparatus 200 via the communication network 600.
  • the audio output device 100 functionally includes a control unit 101, an audio output unit 103, an audio information storage unit 104, an action detection unit 105, a history information generation unit 106, and a history information storage unit 107.
  • the behavior detection unit 105 includes an operation detection unit 102.
  • The operation detection means corresponds to the operation detection unit 102, for example.
  • The audio output means corresponds to the audio output unit 103, for example.
  • The action detection means corresponds to the action detection unit 105, for example.
  • The history information generation means corresponds to the history information generation unit 106, for example.
  • the control unit 101 controls the overall operation of the audio output device 100.
  • the control unit 101 causes the audio output unit 103 to output sound based on the detection result by the operation detection unit 102.
  • the control unit 101 causes the history information generation unit 106 to generate history information based on the detection result by the operation detection unit 102 and the detection result by the behavior detection unit 105.
  • the control unit 101 outputs a sound from the sound output unit 103 based on at least one detection result of the detection result by the operation detection unit 102 and the detection result by the behavior detection unit 105 and the history information. Output.
  • the function of the control unit 101 is realized, for example, when the processor 11 executes a program stored in the flash memory 12.
  • the operation detection unit 102 detects an operation performed on the air conditioner 300 by the user 10.
  • This operation is an operation for controlling the air conditioner 300, and is an operation for instructing control contents for the air conditioner 300.
  • This operation is, for example, an operation of instructing the air conditioner 300 to turn on the power, an operation of switching the air conditioning mode to cooling, or an operation of switching the set temperature to 22 °C.
  • This operation is a screen operation or a voice operation on the voice output device 100.
  • the operation detection unit 102 receives a screen operation on an operation screen for receiving an operation on the air conditioner 300.
  • For example, the operation detection unit 102 detects a voice uttered by the user 10 as a voice operation on the air conditioner 300.
  • It can be said that the operation detection unit 102 is substantially an operation reception unit that receives an operation by the user 10.
  • the function of the operation detection unit 102 is realized by the function of the touch screen 13 or the function of the microphone 14, for example.
  • When the operation detection unit 102 detects the operation, the audio output unit 103 outputs a sound representing the content of the operation. For example, suppose the control unit 101 specifies the content of the operation and acquires audio information representing the content of the control. In this case, the audio output unit 103 generates an electric signal based on the audio information supplied from the control unit 101 and produces audio corresponding to the electric signal.
  • the function of the audio output unit 103 is realized by the cooperation of the processor 11 and the speaker 15, for example.
  • the voice information storage unit 104 stores voice information.
  • the sound information is information indicating sound to be output for each operation content, that is, for each control content.
  • For example, the operation content "air conditioner 300: power: on" is associated with information corresponding to an electrical signal for outputting the sound "Please turn on the air conditioner 300."
  • The function of the voice information storage unit 104 is realized by the function of the flash memory 12, for example.
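The voice information described above is essentially a lookup from operation content to the phrase to synthesize. The following is a minimal, hypothetical sketch of that table; the key tuples and phrases are assumptions for illustration, not data from the publication.

```python
# Hypothetical sketch of the voice information held in the voice
# information storage unit 104: each operation content (equivalently,
# control content) maps to the phrase whose audio the audio output
# unit 103 should produce.
VOICE_INFO = {
    ("air_conditioner", "power", "on"): "Please turn on the air conditioner.",
    ("air_conditioner", "mode", "cooling"): "Please set the air conditioner to cooling.",
    ("air_conditioner", "set_temperature", 22): "Please set the temperature to 22 degrees.",
}

def phrase_for(operation):
    """Look up the phrase to speak for a detected operation (None if unknown)."""
    return VOICE_INFO.get(operation)
```

In a real device the stored value would be waveform or synthesis data rather than a plain string; a string stands in for it here.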
  • the behavior detection unit 105 detects a behavior performed by the user 10. This action is, for example, an operation on the air conditioner 300 by the user 10 or a utterance of words by the user 10. In the present embodiment, the operation on the air conditioner 300 is substantially an operation on the audio output device 100.
  • the function of the behavior detection unit 105 is realized by, for example, the function of the touch screen 13 or the function of the microphone 14.
  • When the operation detected by the operation detection unit 102 and the action detected by the action detection unit 105 have a predetermined relationship, the history information generation unit 106 generates history information in which the content of the operation and the content of the action are associated with each other.
  • the predetermined relationship is, for example, a relationship in which the difference between the detected times is within a threshold, or a relationship in which both were detected while a setting mode was active.
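The first variant of the predetermined relationship can be sketched as a simple time comparison. This is a minimal sketch assuming that variant; the threshold value and the function name are assumptions, not values from the publication (the text only says the time span is "about several minutes").

```python
from datetime import datetime, timedelta

# Sketch of the "predetermined relationship" check: the operation and the
# action are related when their detection times differ by no more than a
# threshold. The 3-minute default is an illustrative assumption.
def has_predetermined_relationship(operation_time, action_time,
                                   threshold=timedelta(minutes=3)):
    """True when the two detection times are within the threshold of each other."""
    return abs(operation_time - action_time) <= threshold
```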
  • the function of the history information generation unit 106 is realized by, for example, the processor 11 executing a program stored in the flash memory 12.
  • the history information storage unit 107 stores the history information generated by the history information generation unit 106.
  • the function of the history information storage unit 107 is realized by the function of the flash memory 12, for example.
  • When the action detection unit 105 detects the action, the voice output unit 103 outputs the voice representing the content of the operation associated with the content of the action in the history information. That is, when there is history information in which the detected action content and an operation content are associated with each other, the audio output unit 103 outputs a sound representing that operation content.
  • the history information generation unit 106 when the operation and the action are detected within a predetermined time, the history information generation unit 106 generates history information in which the content of the operation and the content of the action are associated with each other. This predetermined time is, for example, about several minutes. Then, the sound output unit 103 outputs the sound when the behavior is detected after the history information is generated.
  • In other words, when the action is detected, the voice output unit 103 considers it highly likely that the user 10 will perform the operation, and automatically outputs a voice for realizing the control corresponding to the operation.
  • the operation detection unit 102 detects a first operation on the air conditioner 300 by the user 10 and a second operation on the air conditioner 300 by the user 10.
  • When the first operation is detected, the voice output unit 103 outputs a first voice representing the content of the first operation; when the second operation is detected, it outputs a second voice representing the content of the second operation.
  • the behavior detection unit 105 includes an operation detection unit 102 and detects the second operation as the behavior performed by the user 10.
  • When the first operation and the second operation have the predetermined relationship, the history information generation unit 106 generates history information in which the content of the first operation is associated with the content of the second operation. Then, after the history information is generated, the voice output unit 103 outputs both the first voice and the second voice when the second operation is detected.
  • In other words, when the second operation is detected, the audio output unit 103 considers it highly likely that the first operation will also be performed, and therefore outputs, in addition to the second voice representing the content of the second operation, the first voice representing the content of the first operation.
  • In the present embodiment, when the first operation is detected before the predetermined time elapses after the second operation is detected, the history information generation unit 106 generates history information in which the content of the first operation is associated with the content of the second operation. That is, when there is a track record of the first operation being detected after the second operation was detected, a newly detected second operation causes the first voice representing the content of the first operation to be output in addition to the second voice representing the content of the second operation.
  • In other words, a voice representing the content of an operation that is considered highly likely to be executed after a certain operation is detected is output automatically, while a voice representing the content of an operation that is considered unlikely to be executed after a certain operation is detected is not output automatically.
  • For example, suppose that the second operation is an operation for turning on the power of the air conditioner 300 and the first operation is an operation for setting the air conditioning mode of the air conditioner 300 to cooling.
  • In this case, the first operation is likely to be performed after the second operation, but the second operation is unlikely to be performed after the first operation. Therefore, a sound representing the content of the operation detected later is additionally output only when the operation that was detected earlier is newly detected.
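This one-directional interlock can be sketched as a lookup keyed on the earlier operation; only that key triggers the voice group. The record layout and names below are assumptions for illustration.

```python
# Sketch of the one-directional interlock: history records key on the
# operation detected first (the "second operation"). Detecting it again
# yields the whole voice group; a follow-on operation on its own yields
# only its own voice. The sample data is illustrative.
HISTORY = {
    ("power", "on"): [("mode", "cooling"), ("set_temperature", 22)],
}

def voices_for(detected_operation, history=HISTORY):
    """Return the list of operation contents to voice, in order."""
    followups = history.get(detected_operation)
    if followups is None:
        return [detected_operation]          # no interlock: single voice
    return [detected_operation] + followups  # interlock: voice group
```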
  • It is preferable that the history information generation unit 106 generate the history information in which the content of the operation and the content of the action are associated with each other when the number of times the operation and the action have been detected within the predetermined time during the most recent predetermined period reaches a predetermined threshold.
  • the most recent predetermined period is, for example, the most recent month.
  • the predetermined time is, for example, several minutes.
  • the predetermined threshold is, for example, 5 times.
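Using the example values above (one month, five times), the frequency condition can be sketched as follows. The function name and the idea of passing the co-occurrence times as a list are assumptions, not details from the publication.

```python
from datetime import datetime, timedelta

# Sketch of the frequency condition: a history record is generated only
# when the operation/action pair co-occurred at least `threshold` times
# within the most recent `period`. Defaults follow the text's examples.
def should_generate_history(co_occurrence_times, now,
                            period=timedelta(days=30), threshold=5):
    """True when the pair co-occurred often enough in the recent period."""
    recent = [t for t in co_occurrence_times if now - t <= period]
    return len(recent) >= threshold
```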
  • the device control apparatus 200 functionally includes a control unit 201, a sound detection unit 202, a sound output unit 203, a sound information storage unit 204, a device control unit 205, and a command information storage unit 206.
  • the sound detection means included in the device control apparatus 200 corresponds to the sound detection unit 202, for example.
  • the device control means corresponds to the device control unit 205, for example.
  • the control unit 201 controls the overall operation of the device control apparatus 200.
  • the control unit 201 specifies the content of control for the air conditioner 300 from the sound detected by the sound detection unit 202, and causes the device control unit 205 to transmit a control command representing the specified content of control.
  • the function of the control unit 201 is realized, for example, when the processor 21 executes a program stored in the flash memory 22.
  • the voice detection unit 202 detects the voice output by the voice output unit 103. Therefore, it is desirable that the sound detection unit 202 be disposed near the sound output unit 103. For example, the voice detection unit 202 is disposed in an area within several meters from the voice output unit 103. The function of the voice detection unit 202 is realized by the function of the microphone 24, for example.
  • the sound output unit 203 outputs various sounds according to the control by the control unit 201.
  • the voice output unit 203 outputs a voice representing an announcement to the user 10.
  • the function of the audio output unit 203 is realized by the cooperation of the processor 21 and the speaker 25, for example.
  • the voice information storage unit 204 stores voice information.
  • the voice information is information used to specify the content of control from the voice detected by the voice detection unit 202, for example.
  • For example, the audio information associates each content of control with information representing the electrical signal corresponding to a sound expressing that content of control.
  • the function of the audio information storage unit 204 is realized by the function of the flash memory 22, for example.
  • the device control unit 205 controls the air conditioner 300 based on the content of the operation represented by the voice detected by the voice detection unit 202. For example, the device control unit 205 transmits a control command corresponding to the detected voice to the air conditioner 300 via the communication network 600 according to the control by the control unit 201.
  • the function of the device control unit 205 is realized by the cooperation of the processor 21 and the communication interface 26, for example.
  • the command information storage unit 206 stores command information.
  • the command information is, for example, information in which the control content corresponding to the operation content is associated with the control command.
  • the function of the command information storage unit 206 is realized by the function of the flash memory 22, for example.
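The command information is another lookup: from control content (recognized from the detected voice) to the control command sent over the network. Below is a hypothetical sketch; the command payloads are invented for illustration, since real commands are device-specific and not given in the publication.

```python
# Sketch of the command information in the command information storage
# unit 206: control content recognized from the detected voice maps to
# the control command transmitted to the air conditioner 300 via the
# communication network 600.
COMMAND_INFO = {
    ("power", "on"): {"cmd": "SET_POWER", "value": "ON"},
    ("mode", "cooling"): {"cmd": "SET_MODE", "value": "COOL"},
    ("set_temperature", 28): {"cmd": "SET_TEMP", "value": 28},
}

def command_for(control_content):
    """Return the control command for a recognized control content."""
    return COMMAND_INFO.get(control_content)
```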
  • the history information shown in FIG. 6 is information indicating all combinations of a plurality of operations performed continuously in the past.
  • the detection start time is the time when detection of the corresponding record combination is started.
  • the operation A is an operation performed first among a plurality of operations performed continuously in the past.
  • the operation A corresponds to the second operation, for example.
  • the operation B is the second operation among a plurality of operations performed continuously in the past.
  • the operation B corresponds to the first operation, for example.
  • the operation C is the third operation among a plurality of operations performed continuously in the past.
  • the operation C corresponds to the first operation, for example.
  • there is one second operation and there are one or more first operations.
  • the top record in the history information shown in FIG. 6 indicates a track record in which an operation for turning on the power of the air conditioner 300, an operation for setting the air conditioning mode of the air conditioner 300 to cooling, and an operation for setting the set temperature of the air conditioner 300 to 28 °C were executed in succession, starting at 12:00 on May 18, 2018.
  • In this case, when the operation detected first is newly detected, a sound for instructing that the set temperature of the air conditioner 300 be set to 28 °C is also output.
  • the content of the operation A corresponding to the second operation is the same in the top record and the bottom record. In such a case, it is preferable to employ the record with the newer detection start time, here the top record.
  • Although FIG. 6 shows an example in which all detected combinations of operations are included in the history information, the history information is not limited to this example. For example, only combinations of operations detected during the most recent predetermined period may be included in the history information, or only combinations of operations detected a predetermined number of times or more within that period.
  • a record with an older detection start time among combinations having a competitive relationship may be excluded from the history information.
  • the combinations in the competitive relationship are, for example, combinations in which the second operation is the same and at least one first operation is different.
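Resolving competing records as described above (same operation A, different follow-ons, keep the newest) can be sketched like this. The record layout is an assumption based on FIG. 6 as described in the text.

```python
# Sketch of pruning "competing" history records: records that share the
# same operation A (the trigger) but differ in their follow-on operations.
# Only the record with the newest detection start time survives.
def resolve_competing(records):
    """records: list of dicts with 'start', 'op_a', and 'followups' keys."""
    newest = {}
    # Sorting by start time lets later (newer) records overwrite older ones.
    for rec in sorted(records, key=lambda r: r["start"]):
        newest[rec["op_a"]] = rec
    return list(newest.values())
```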
  • the audio output process executed by the audio output device 100 will be described with reference to the flowchart of FIG.
  • the audio output process is executed in response to, for example, turning on the power of the audio output device 100.
  • the processor 11 determines whether or not an operation is detected (step S101). When determining that the operation is not detected (step S101: NO), the processor 11 returns the process to step S101. On the other hand, if the processor 11 determines that an operation has been detected (step S101: YES), the processor 11 stores the detection start time (step S102). When completing the process of step S102, the processor 11 determines whether or not there is an interlocking setting (step S103). Specifically, the processor 11 determines whether or not the history information includes a record having the operation detected in step S101 as the second operation.
  • In step S104, the processor 11 repeatedly performs a process of selecting one operation content included in the record and a process of outputting sound representing the selected operation content from the speaker 15, until all operation contents included in the record have been selected.
  • In step S105, the processor 11 causes the speaker 15 to output sound representing the content of the operation detected in step S101.
  • the processor 11 determines whether or not an operation is detected when the process of step S104 or the process of step S105 is completed (step S106). When it is determined that the operation is detected (step S106: YES), the processor 11 determines whether or not there is an interlocking setting (step S107). Specifically, the processor 11 determines whether or not the history information includes a record having the operation detected in step S106 as the second operation. If the processor 11 determines that there is an interlocking setting (step S107: YES), the processor 11 outputs a voice group (step S108). On the other hand, when determining that there is no interlocking setting (step S107: NO), the processor 11 outputs a single sound (step S109).
  • On the other hand, when the processor 11 determines in step S106 that no operation is detected (step S106: NO), it determines whether or not the first time has elapsed from the detection start time (step S110).
  • the first time is the above-described predetermined time, for example, several minutes.
  • When the first time has elapsed from the detection start time, the processor 11 determines whether or not a plurality of operations were detected within the first period (step S111).
  • the first period is a period until the first time elapses from the detection start time.
  • When determining that a plurality of operations were detected within the first period (step S111: YES), the processor 11 generates history information (step S112). For example, the processor 11 updates the history information so as to include a record in which the operation detected in step S101 is the second operation and the operation detected in step S106 is the first operation.
  • When determining that a plurality of operations were not detected within the first period (step S111: NO), the processor 11 returns the process to step S101.
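The steps above can be condensed into a single-cycle sketch. This is a non-authoritative simplification of the flowchart, assuming hardware I/O is replaced by injected callables and the history is a plain dict keyed on the first operation of a run; all names are assumptions.

```python
from datetime import datetime, timedelta

# Condensed sketch of the audio output process (steps S101-S112).
def audio_output_cycle(detected, history, first_time, speak):
    """detected: list of (time, operation) pairs detected in one cycle."""
    if not detected:
        return                              # S101: no operation detected
    start_time = detected[0][0]             # S102: store the detection start time
    for _, op in detected:
        followups = history.get(op)         # S103/S107: interlock setting?
        if followups:
            for content in [op] + followups:
                speak(content)              # S104/S108: output the voice group
        else:
            speak(op)                       # S105/S109: output a single voice
    # S110-S112: operations within the first period become a history record,
    # with the later operations recorded as follow-ons of the first.
    within = [op for t, op in detected if t - start_time <= first_time]
    if len(within) > 1:
        history[within[0]] = within[1:]
```

Run twice, the second cycle replays the follow-on voice learned in the first.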
  • the audio output method is realized by the audio output device 100 according to the present embodiment executing the audio output process shown in FIG.
  • In this voice output method, first, an operation performed on the facility device by the user 10 is detected, and when the operation is detected, a voice representing the content of the operation is output.
  • In this voice output method, an action performed by the user 10 is also detected.
  • When the action is detected and the operation and the action have a predetermined relationship, history information in which the content of the operation and the content of the action are associated is generated; thereafter, when the action is detected, the above voice is output.
  • In this way, the operation performed on the facility device by the user 10 and the action performed by the user 10 are in a predetermined relationship, and when the action is detected, the voice representing the content of the operation is output. Therefore, according to the present embodiment, the facility device can easily be brought into a desired control state by using the device control apparatus 200, which controls the facility device based on voice. For example, control of the facility device reflecting the preference of the user 10 can be expected to be realized by a single operation on the audio output device 100.
  • When there is a track record of a plurality of operations being performed within a predetermined time, and any one of these operations is performed, a voice representing the contents of the other operations is automatically output. Therefore, according to the present embodiment, the facility device can be brought into a desired control state with few operations.
  • When there is a track record of a plurality of operations being performed within a predetermined time, and the first of these operations is performed, voices representing the contents of the other operations are automatically output. Therefore, according to the present embodiment, the facility device can be appropriately brought into a desired control state with few operations.
  • The audio output device 120 functionally includes a control unit 101, an operation detection unit 102, an audio output unit 103, an audio information storage unit 104, an action detection unit 105, a history information generation unit 106, and a history information storage unit 107.
  • the behavior detection unit 105 includes a voice detection unit 108.
  • the sound detection means included in the sound output device 120 corresponds to, for example, the sound detection unit 108.
  • the operation detection unit 102 detects a first operation performed on the equipment by the user 10.
  • the voice output unit 103 outputs a first voice representing the content of the first operation.
  • the behavior detection unit 105 includes a voice detection unit 108 that detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action made by the user 10.
  • the function of the voice detection unit 108 is realized by the function of the microphone 14, for example.
  • the history information generation unit 106 generates history information in which the contents of the first operation and the above words are associated when the first operation and the third voice are detected within a predetermined time.
  • the sound output unit 103 outputs the first sound when the third sound is detected after the history information is generated.
  • The word expressed by the third voice is a word that is highly likely to be uttered along with the execution of the first operation, and is treated as a keyword. When there is a track record of this keyword having been uttered along with the execution of the first operation, and this keyword is uttered again, a voice representing the first operation is automatically output.
  • The history information may be information in which the content of the first operation is associated with a keyword uttered immediately before or immediately after the first operation, information in which the content of the first operation is associated with a keyword uttered immediately before the first operation, or information in which the content of the first operation is associated with a keyword uttered immediately after the first operation.
  • the history information is information in which a keyword, the content of the operation A, the content of the operation B, and the content of the operation C are associated with each other.
  • Operation A, operation B, and operation C are operations performed together with keyword utterances, and are first operations.
  • The top record of the history information shows that there is a track record of the keyword “air conditioner” being uttered together with an operation of turning on the power of the air conditioner 300, an operation of setting the air conditioning mode of the air conditioner 300 to cooling, and an operation of setting the set temperature to 28 °C.
  • Although FIG. 9 shows an example in which all detected combinations of a keyword and first operations are included in the history information, the history information is not limited to this example. For example, only the combinations of a keyword and first operations detected during the most recent predetermined period may be included in the history information. Alternatively, only the combinations of a keyword and first operations detected a predetermined number of times or more in the most recent predetermined period may be included. In addition, among combinations having a competitive relationship, the record with the older detection start time may be excluded from the history information.
  • Combinations having a competitive relationship are, for example, combinations that have the same keyword but differ in at least one first operation.
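The record-retention policy for competing combinations can be sketched as follows. The record fields and the choice of a dict keyed by keyword are illustrative assumptions, not the patent's data layout; the sketch keeps, per keyword, only the record whose detection start time is newest.

```python
def prune_competing(records):
    """Among records in a competitive relationship (same keyword, at least
    one differing first operation), keep only the record with the newest
    detection start time; older records are excluded from the history."""
    newest = {}
    for rec in records:  # each rec: {'keyword', 'detection_start', 'operations'}
        key = rec["keyword"]
        kept = newest.get(key)
        if kept is None or rec["detection_start"] > kept["detection_start"]:
            newest[key] = rec
    return list(newest.values())

records = [
    {"keyword": "air conditioner", "detection_start": 1,
     "operations": ["power on", "cooling", "28C"]},
    {"keyword": "air conditioner", "detection_start": 2,
     "operations": ["power on", "heating", "24C"]},
]
pruned = prune_competing(records)
print(pruned)
# only the newer "heating" record survives
```

Records with identical keyword and identical operations are simply duplicates, so keeping the newest is harmless for them as well.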
  • the voice output process is executed in response to, for example, the power of the voice output device 120 being turned on.
  • an example will be described in which the contents of a series of operations performed after the keyword utterance are associated with the keyword.
  • the processor 11 determines whether or not a word is detected (step S201). For example, the processor 11 determines whether or not a voice representing a word that can be a keyword has been detected by the microphone 14. If the processor 11 determines that no word is detected (step S201: NO), the process returns to step S201. On the other hand, if the processor 11 determines that a word has been detected (step S201: YES), the processor 11 stores the detection start time (step S202).
  • In step S203, the processor 11 determines whether or not there is an interlocking setting. Specifically, the processor 11 determines whether or not the history information includes a record having the word detected in step S201 as its keyword. If the processor 11 determines that there is an interlocking setting (step S203: YES), it outputs a single voice or a voice group (step S204): when the record contains the content of a single operation, a single voice is output; when the record contains the contents of a plurality of operations, a voice group is output.
  • When the process of step S204 is completed, the processor 11 determines whether or not an operation is detected (step S205). If the processor 11 determines that an operation is detected (step S205: YES), the process proceeds to step S206. When the process of step S206 is completed, or when the processor 11 determines that no operation is detected (step S205: NO), the processor 11 determines whether or not the first time has elapsed from the detection start time (step S207).
  • step S207: NO If the processor 11 determines that the first time has not elapsed since the detection start time (step S207: NO), the processor 11 returns the process to step S205.
  • If the processor 11 determines that the first time has elapsed from the detection start time (step S207: YES), the processor 11 determines whether or not an operation was detected within the first period (step S208).
  • step S208: YES the processor 11 generates history information (step S209).
  • the processor 11 updates the history information so as to include a record in which the keyword, which is the detected word, is associated with the content of the operation detected within the first period.
  • step S208: NO the processor 11 returns the process to step S201.
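One detection cycle of steps S203 through S209 might look like the sketch below. The function and field names are assumptions, and voice synthesis itself is reduced to returning the strings that would be announced.

```python
def keyword_cycle(word, history, ops_in_first_period):
    """word: the word detected in S201.
    ops_in_first_period: operations detected before the first time
    elapsed (steps S205/S208)."""
    announced = []
    for record in history:                       # S203: interlocking setting?
        if record["keyword"] == word:
            # S204: a single voice if the record holds one operation,
            # a voice group if it holds several.
            announced = list(record["operations"])
            break
    if ops_in_first_period:                      # S208: YES
        history.append({"keyword": word,         # S209: update history
                        "operations": list(ops_in_first_period)})
    return announced

history = []
# First cycle: the word is new, so nothing is announced but a record is made.
keyword_cycle("air conditioner", history, ["power on", "cooling", "28C"])
# Second cycle: the keyword alone now announces the recorded series.
print(keyword_cycle("air conditioner", history, []))
# -> ['power on', 'cooling', '28C']
```

This mirrors the embodiment's behavior: the first utterance-plus-operations session trains the record, and a later utterance of the same keyword replays the series as voices.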
  • According to the present embodiment, the facility device can be brought into a desired control state by uttering the keyword. Moreover, since the user can freely choose the keyword associated with the series of operations for bringing the facility device into the desired control state, the user's convenience is increased.
  • The audio output device 130 functionally includes a control unit 101, an audio output unit 103, an audio information storage unit 104, an action detection unit 105, a history information generation unit 106, and a history information storage unit 107.
  • the behavior detection unit 105 includes an operation detection unit 102 and a voice detection unit 108.
  • the behavior detection unit 105 includes a voice detection unit 108 that detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action performed by the user 10.
  • When the first operation, the second operation, and the third voice are detected within a predetermined time, the history information generation unit 106 generates history information in which the content of the first operation, the content of the second operation, and the above word are associated with each other.
  • the sound output unit 103 outputs the first sound and the second sound when the third sound is detected after the history information is generated.
  • When there is a track record of the first operation, the second operation, and the third voice being detected in succession, and at least one of the second operation and the third voice is detected, the first voice and the second voice are output.
  • the history information illustrated in FIG. 9 is generated.
  • In both the case where the keyword “air conditioner” is uttered and the case where the operation of turning on the power of the air conditioner 300 is performed, a voice instructing to turn on the power of the air conditioner 300, a voice instructing to set the air conditioning mode of the air conditioner 300 to cooling, and a voice instructing to set the set temperature of the air conditioner 300 to 28 °C are output.
  • According to the present embodiment, the facility device can be brought into a desired control state by the utterance of the keyword or by the first operation of the series of operations.
  • Embodiment 4: In Embodiments 1 to 3, examples in which there is only one facility device to be controlled have been described. In the present embodiment, an example in which there are a plurality of facility devices to be controlled will be described. As shown in FIG. 12, the facility devices to be controlled in this example are three: an air conditioner 300, a bathroom heater 310, and a water heater 320.
  • The operation detection unit 102 detects a first operation performed by the user 10 on a first facility device among the plurality of facility devices, and a second operation performed by the user 10 on a second facility device among the plurality of facility devices.
  • any one equipment device among the air conditioner 300, the bathroom heater 310, and the water heater 320 is the second equipment device, and the remaining two equipment devices are the first equipment devices.
  • any of the three equipment may be the second equipment.
  • the sound output unit 103 outputs a first sound representing the content of the first operation when the first operation is detected, and outputs a second sound representing the content of the second operation when the second operation is detected.
  • the behavior detection unit 105 includes an operation detection unit 102 and detects the second operation as the behavior performed by the user 10.
  • When the first operation and the second operation are detected within a predetermined time, the history information generation unit 106 generates history information in which the content of the first operation and the content of the second operation are associated with each other. Then, after the history information is generated, the sound output unit 103 outputs the first sound and the second sound when the second operation is detected.
  • the history information is information in which the detection start time, the content of the operation A, the content of the operation B, and the content of the operation C are associated with each other.
  • the operation A, the operation B, and the operation C are a series of operations detected continuously within a predetermined time. This series of operations may include a plurality of operations for one facility device. Any one of the operation A, the operation B, and the operation C is the second operation. The remaining two operations are the first operations.
  • The top record of the history information shows that there is a track record of an operation of turning on the power of the air conditioner 300, an operation of turning on the power of the bathroom heater 310, and an operation of turning on the power of the water heater 320 being performed in succession before the predetermined time elapsed from 12:00 on May 18, 2018, which is the detection start time.
  • When any one of the operation of turning on the power of the air conditioner 300, the operation of turning on the power of the bathroom heater 310, and the operation of turning on the power of the water heater 320 is then performed, a sound instructing to turn on the power of the air conditioner 300, a sound instructing to turn on the power of the bathroom heater 310, and a sound instructing to turn on the power of the water heater 320 are automatically output.
  • Embodiment 5: In Embodiment 4, an example in which no keyword is associated with the series of operations on the plurality of facility devices has been described.
  • an example in which keywords are associated with a series of operations on a plurality of facility devices will be described.
  • In the following, mainly the differences from Embodiment 4 will be described.
  • the behavior detection unit 105 includes a voice detection unit 108 that detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action performed by the user 10.
  • When the first operation, the second operation, and the third voice are detected within a predetermined time, the history information generation unit 106 generates history information in which the content of the first operation, the content of the second operation, and the above word are associated with each other.
  • the voice output unit 103 outputs the first voice and the second voice when the third voice is detected after the history information is generated.
  • When a series of operations on a plurality of facility devices and an utterance of the keyword corresponding to the third voice have been detected in succession, and one operation of the series of operations or the keyword utterance is detected, a series of sounds representing the series of operations is output.
  • the history information is information in which a keyword, the content of the operation A, the content of the operation B, and the content of the operation C are associated with each other.
  • the operation A, the operation B, and the operation C are a series of operations that are continuously detected within a predetermined time.
  • the keyword is a word detected within the predetermined time together with the series of operations. Any one of the operation A, the operation B, and the operation C is the second operation. The remaining two operations are the first operations.
  • The top record of the history information shows that there is a track record of the keyword “returned now” being uttered and an operation of turning on the power of the air conditioner 300, an operation of turning on the power of the bathroom heater 310, and an operation of turning on the power of the water heater 320 being performed in succession. When the utterance of the keyword “returned now” or any one of these power-on operations is then detected, a sound instructing to turn on the power of the air conditioner 300, a sound instructing to turn on the power of the bathroom heater 310, and a sound instructing to turn on the power of the water heater 320 are automatically output.
  • a plurality of facility devices can be brought into a desired control state by keyword utterance or one of a series of operations.
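Extending the earlier multi-device sketch with a keyword field gives this embodiment's behavior: either the keyword utterance or any operation in the series announces the whole series. The event encoding and field names are assumptions.

```python
def voices_for_event(event, history):
    """event: ('keyword', word) for an utterance, or ('operation', op)
    for a detected operation on a facility device."""
    kind, value = event
    for record in history:
        if (kind == "keyword" and record["keyword"] == value) or \
           (kind == "operation" and value in record["operations"]):
            return list(record["operations"])   # announce the whole series
    # No matching record: a lone operation is still announced; an
    # unrecognized keyword announces nothing.
    return [] if kind == "keyword" else [value]

history = [{"keyword": "returned now",
            "operations": ["aircon: power on",
                           "bathroom heater: power on",
                           "water heater: power on"]}]
print(voices_for_event(("keyword", "returned now"), history))
print(voices_for_event(("operation", "water heater: power on"), history))
```

Both calls announce the same three power-on voices, so the user can trigger the "coming home" series either by speaking or by touching any one device.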
  • Embodiment 6: In Embodiments 1 to 5, examples in which history information is automatically generated have been described. In the present embodiment, an example in which history information is manually generated will be described.
  • the touch screen 13 or the microphone 14 receives an instruction to shift to the setting mode.
  • the transition instruction receiving unit corresponds to, for example, the touch screen 13 or the microphone 14.
  • When the history information generation unit 106 receives the transition instruction via the touch screen 13 or the microphone 14, and an operation and an action are detected while the setting mode is active based on the transition instruction, the history information generation unit 106 generates history information in which the content of the operation and the content of the action are associated with each other.
  • the voice output unit 103 outputs the voice when the behavior is detected after the history information is generated.
  • the audio output device 100 utters “What will be linked?”.
  • the audio output device 100 utters “Hot water heater, power, on”
  • the audio output device 100 utters “What do you want to work with?”.
  • the audio output device 100 utters “End”
  • the audio output device 100 utters “What is the keyword?”.
  • the audio output device 100 utters “Setting is complete”.
  • the processor 11 determines whether or not there is an instruction to shift to the setting mode (step S301). When the processor 11 determines that there is no instruction to shift to the setting mode (step S301: NO), the processor 11 returns the process to step S301. On the other hand, when the processor 11 determines that there is an instruction to shift to the setting mode (step S301: YES), the processor 11 utters a message for prompting the operation (step S302).
  • step S303 the processor 11 determines whether or not there is a control designation operation.
  • the control designation operation is an operation for designating control by voice, for example.
  • step S303: YES the processor 11 stores the designated control (step S304).
  • In step S305, the processor 11 determines whether or not there is a setting end operation.
  • step S305: NO the processor 11 returns the process to step S302.
  • step S305: YES the processor 11 utters a message that prompts the keyword to be uttered (step S306).
  • step S306 determines whether or not there is a keyword utterance (step S307).
  • step S307: YES the processor 11 generates history information with a keyword (step S308).
  • step S307: NO the processor 11 generates history information without a keyword (step S309).
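The manual setting dialog of steps S301 through S309 can be reduced to the bookkeeping below. The spoken prompts ("What will be linked?", "What is the keyword?") are omitted, and the event encoding is an assumption for illustration.

```python
def run_setting_mode(events):
    """events: the user's inputs after the transition instruction, e.g.
    ('control', 'water heater: power on'), ('end',), ('keyword', 'bath time').
    Returns the history record generated in S308 (with a keyword) or
    S309 (without one)."""
    controls = []
    keyword = None
    for event in events:
        if event[0] == "control":      # S303: control designation operation
            controls.append(event[1])  # S304: store the designated control
        elif event[0] == "keyword":    # S307: keyword uttered
            keyword = event[1]
        # ('end',) merely moves the dialog on to the keyword prompt
        # (steps S305/S306), so it needs no bookkeeping here.
    return {"keyword": keyword, "operations": controls}

record = run_setting_mode([("control", "water heater: power on"),
                           ("end",),
                           ("keyword", "bath time")])
print(record)
# -> {'keyword': 'bath time', 'operations': ['water heater: power on']}
```

Dropping the final keyword event yields a record with `"keyword": None`, corresponding to the keyword-less history of step S309.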
  • In Embodiment 2, an example in which a series of operations on the facility device is performed after the utterance of a keyword has been described.
  • the keyword may be uttered after a series of operations on the equipment.
  • the operation described as the screen operation may be a voice operation, or the operation described as the voice operation may be a screen operation.
  • By applying the program, a personal computer or the like can also function as the audio output device 100 according to the present invention.
  • the distribution method of such a program is arbitrary.
  • The program may be stored and distributed on a computer-readable recording medium such as a CD-ROM (Compact Disk Read-Only Memory), a DVD (Digital Versatile Disk), or a memory card, or may be distributed via a communication network such as the Internet.
  • the present invention is applicable to a device control system including a device control device that controls equipment based on voice.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Selective Calling Equipment (AREA)
  • User Interface Of Digital Computer (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

In the present invention, an operation sensing unit (102) senses an operation performed by a user (10) on an equipment apparatus. If the operation sensing unit (102) has sensed an operation, an audio output unit (103) outputs audio representing the content of the operation. An action sensing unit (105) senses an action performed by the user. If the operation sensed by the operation sensing unit (102) and the action sensed by the action sensing unit (105) are in a predetermined relationship, a history information generation unit (106) generates history information in which the content of the operation and the content of the action are associated. If an action has been sensed by the action sensing unit (105), the audio output unit (103) outputs the audio representing the content of the operation which was associated with the content of the action, in the history information.

Description

Audio output device, device control system, audio output method, and program
The present invention relates to an audio output device, a device control system, an audio output method, and a program.
Currently, various techniques for controlling facility devices according to user operations are known. Such operations include screen operations and voice operations. When controlling a facility device by voice operation, for example, the facility device is controlled using a device control apparatus that controls it based on a voice uttered by the user. Using a device control apparatus that controls facility devices based on voice often improves user convenience.
By the way, it is very troublesome for the user to repeatedly perform operations corresponding to various control contents in order to bring a facility device into a desired control state. In order to reduce such annoyance, a method of storing, for each user, information indicating a desired control state is known. For example, Patent Literature 1 describes a personal information storage system that includes a portable device that transmits user identification information when the user controls a device using the portable device, and that stores personal information acquired through use of the device in association with the user identification information.
JP 2008-234371 A
However, since the technique described in Patent Literature 1 does not use a device control apparatus that controls facility devices based on voice, it is difficult to immediately apply it to techniques that use such a device control apparatus. For this reason, there is a demand for a technique for easily bringing a facility device into a desired control state by using a device control apparatus that controls the facility device based on voice.
The present invention has been made in view of the above problem, and an object thereof is to provide an audio output device, a device control system, an audio output method, and a program that easily bring a facility device into a desired control state by using a device control apparatus that controls the facility device based on voice.
In order to achieve the above object, an audio output device according to the present invention includes:
operation detection means for detecting an operation performed by a user on a facility device;
audio output means for outputting, when the operation is detected by the operation detection means, a voice representing the content of the operation;
action detection means for detecting an action performed by the user; and
history information generation means for generating, when the operation detected by the operation detection means and the action detected by the action detection means have a predetermined relationship, history information in which the content of the operation and the content of the action are associated with each other,
wherein, when the action is detected by the action detection means, the audio output means outputs the voice representing the content of the operation associated with the content of the action in the history information.
In the present invention, the operation performed by the user on the facility device and the action performed by the user have a predetermined relationship, and when the action is detected, a voice representing the content of the operation is output. Therefore, according to the present invention, the facility device can easily be brought into a desired control state by using a device control apparatus that controls the facility device based on voice.
  • FIG. 1 is a configuration diagram of the device control system according to Embodiment 1 of the present invention.
  • FIG. 2 is a configuration diagram of the audio output device according to Embodiment 1 of the present invention.
  • FIG. 3 is a configuration diagram of the device control apparatus according to Embodiment 1 of the present invention.
  • FIG. 4 is a functional configuration diagram of the audio output device according to Embodiment 1 of the present invention.
  • FIG. 5 is a functional configuration diagram of the device control apparatus according to Embodiment 1 of the present invention.
  • FIG. 6 is a diagram showing history information according to Embodiment 1 of the present invention.
  • FIG. 7 is a flowchart showing audio output processing executed by the audio output device according to Embodiment 1 of the present invention.
  • FIG. 8 is a functional configuration diagram of the audio output device according to Embodiment 2 of the present invention.
  • FIG. 9 is a diagram showing history information according to Embodiment 2 of the present invention.
  • FIG. 10 is a flowchart showing audio output processing executed by the audio output device according to Embodiment 2 of the present invention.
  • FIG. 11 is a functional configuration diagram of the audio output device according to Embodiment 3 of the present invention.
  • FIG. 12 is a configuration diagram of the device control system according to Embodiment 4 of the present invention.
  • FIG. 13 is a diagram showing history information according to Embodiment 4 of the present invention.
  • FIG. 14 is a diagram showing history information according to Embodiment 5 of the present invention.
  • FIG. 15 is a flowchart showing setting processing executed by the audio output device according to Embodiment 6 of the present invention.
(Embodiment 1)
First, the configuration of a device control system 1000 according to Embodiment 1 of the present invention will be described with reference to FIG. 1. The device control system 1000 includes an audio output device 100 and a device control apparatus 200, and the audio output device 100 and the device control apparatus 200 cooperate to control facility devices. In the present embodiment, the facility device is assumed to be an air conditioner 300. The audio output device 100 receives an operation on the facility device from a user 10 and outputs a voice representing the operation. The device control apparatus 200 detects the voice output by the audio output device 100, generates a control command corresponding to the voice, and transmits the control command to the air conditioner 300. The device control apparatus 200 and the air conditioner 300 are connected to each other via a communication network 600. The communication network 600 is, for example, a wireless LAN (Local Area Network) built in a home.
The device control apparatus 200 is generally used to detect a voice uttered by the user 10 and transmit a control command corresponding to the voice to the air conditioner 300. In the present embodiment, however, the device control apparatus 200 detects not the voice uttered by the user 10 but the voice that the audio output device 100 outputs in response to an operation received from the user 10, and transmits a control command corresponding to this voice to the air conditioner 300. Various effects can be expected from the audio output device 100 thus relaying between the user 10 and the device control apparatus 200. For example, with this configuration, the user 10 can control the air conditioner 300 by screen operation rather than voice operation. Moreover, with this configuration, not only the control corresponding to an operation performed by the user 10 but also the control corresponding to other operations related to that operation can be executed automatically.
The audio output device 100 receives, from the user 10, an operation instructing control of the air conditioner 300, and outputs a voice corresponding to the content of the received operation. The voice output by the audio output device 100 is detected by the device control apparatus 200. Based on the history of operations received from the user 10, the audio output device 100 can automatically output a voice for bringing the air conditioner 300 into the control state desired by the user 10. To this end, the audio output device 100 can also output a voice other than the voice corresponding to the content of the received operation, and can output a voice even when no operation is being received. The audio output device 100 is, for example, a smartphone, a tablet terminal, or a personal computer.
 The configuration of the audio output device 100 will now be described with reference to FIG. 2. As shown in FIG. 2, the audio output device 100 includes a processor 11, a flash memory 12, a touch screen 13, a microphone 14, a speaker 15, and a communication interface 16. The processor 11 controls the overall operation of the audio output device 100. The processor 11 is, for example, a CPU (Central Processing Unit) with built-in ROM (Read Only Memory), RAM (Random Access Memory), RTC (Real Time Clock), and the like. The CPU operates, for example, according to a basic program stored in the ROM and uses the RAM as a work area.
 The flash memory 12 is a nonvolatile memory that stores various types of information; for example, it stores a program executed by the processor 11. The touch screen 13 detects operations performed by the user and supplies a signal indicating the detection result to the processor 11. The touch screen 13 also displays information under the control of the processor 11.
 The microphone 14 is a device that converts sound into an electrical signal; for example, it converts voice uttered by the user 10 into an electrical signal. The speaker 15 is a device that converts a supplied electrical signal into physical vibration to produce sound; for example, it outputs voice for conveying various messages to the user 10. The communication interface 16 is a communication interface for connecting the audio output device 100 to a telephone network (not shown) or the Internet (not shown).
 The device control apparatus 200 detects the voice output by the audio output device 100 and generates a control command corresponding to the detected voice. The device control apparatus 200 transmits the generated control command to the air conditioner 300 via the communication network 600. The device control apparatus 200 has a function of converting voice into words and a function of converting words into control commands. The device control apparatus 200 is, for example, a smart speaker.
 The configuration of the device control apparatus 200 will now be described with reference to FIG. 3. As shown in FIG. 3, the device control apparatus 200 includes a processor 21, a flash memory 22, a touch screen 23, a microphone 24, a speaker 25, and a communication interface 26. The processor 21 controls the overall operation of the device control apparatus 200. The processor 21 is, for example, a CPU with built-in ROM, RAM, RTC, and the like. The CPU operates, for example, according to a basic program stored in the ROM and uses the RAM as a work area.
 The flash memory 22 is a nonvolatile memory that stores various types of information; for example, it stores a program executed by the processor 21. The touch screen 23 detects operations performed by the user and supplies a signal indicating the detection result to the processor 21. The touch screen 23 also displays information under the control of the processor 21.
 The microphone 24 is a device that converts sound into an electrical signal; for example, it converts the voice output by the audio output device 100 into an electrical signal. The speaker 25 is a device that converts a supplied electrical signal into physical vibration to produce sound; for example, it outputs voice for conveying various messages to the user 10. The communication interface 26 is a communication interface for connecting the device control apparatus 200 to the communication network 600.
 The air conditioner 300 is a facility device controlled by the device control system 1000. The air conditioner 300 is, for example, a device that conditions the air in a home space, and has, for example, a heating function, a cooling function, a dehumidifying function, and a fan function. The air conditioner 300 includes, for example, an indoor unit (not shown) installed inside the house, an outdoor unit (not shown) installed outside the house, and a remote controller (not shown) for operating the indoor unit and the outdoor unit. The air conditioner 300 has a function of connecting to the communication network 600, and is controlled according to control commands received from the device control apparatus 200 via the communication network 600.
 Next, the functions of the audio output device 100 will be described with reference to FIG. 4. As shown in FIG. 4, the audio output device 100 functionally includes a control unit 101, an audio output unit 103, an audio information storage unit 104, a behavior detection unit 105, a history information generation unit 106, and a history information storage unit 107. The behavior detection unit 105 includes an operation detection unit 102. The operation detection means corresponds, for example, to the operation detection unit 102; the audio output means corresponds, for example, to the audio output unit 103; the behavior detection means corresponds, for example, to the behavior detection unit 105; and the history information generation means corresponds, for example, to the history information generation unit 106.
 The control unit 101 controls the overall operation of the audio output device 100. For example, the control unit 101 causes the audio output unit 103 to output voice based on the detection result of the operation detection unit 102. The control unit 101 also causes, for example, the history information generation unit 106 to generate history information based on the detection result of the operation detection unit 102 and the detection result of the behavior detection unit 105. Further, the control unit 101 causes, for example, the audio output unit 103 to output voice based on the history information together with at least one of those two detection results. The functions of the control unit 101 are realized, for example, by the processor 11 executing a program stored in the flash memory 12.
 The operation detection unit 102 detects an operation performed on the air conditioner 300 by the user 10. This operation is an operation for controlling the air conditioner 300, that is, an operation instructing the content of control for the air conditioner 300. The operation is, for example, an operation instructing the air conditioner 300 to turn on its power, an operation switching the air-conditioning mode to cooling, or an operation changing the set temperature to 22°C. The operation is a screen operation or a voice operation on the audio output device 100. For example, the operation detection unit 102 receives a screen operation on an operation screen for accepting operations on the air conditioner 300, or detects voice expressing the content of control for the air conditioner 300. The operation detection unit 102 can therefore also be regarded, in substance, as an operation reception unit that receives operations by the user 10. The functions of the operation detection unit 102 are realized, for example, by the touch screen 13 or the microphone 14.
 When the operation detection unit 102 detects the above operation, the audio output unit 103 outputs voice representing the content of the operation. For example, suppose the control unit 101 identifies the content of the operation and acquires audio information representing the content of the control. In this case, the audio output unit 103 generates an electrical signal based on the audio information supplied from the control unit 101 and produces voice corresponding to that signal. The functions of the audio output unit 103 are realized, for example, by the processor 11 and the speaker 15 working together.
 The audio information storage unit 104 stores audio information. The audio information indicates, for each operation content, that is, for each control content, the voice to be output. For example, the audio information associates the operation content "air conditioner 300: power: on" with information corresponding to an electrical signal for outputting the voice "Please turn on the power of the air conditioner 300." The functions of the audio information storage unit 104 are realized, for example, by the flash memory 12.
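 As an illustration only, the correspondence held by the audio information storage unit 104 can be modeled as a lookup table from operation content to output phrase. The key format and the phrases below are hypothetical stand-ins for the stored electrical-signal information, not the actual stored data:

```python
# Hypothetical model of the audio information table: operation content
# (device, item, value) mapped to the phrase to synthesize and output.
VOICE_TABLE = {
    ("air_conditioner_300", "power", "on"):
        "Please turn on the power of the air conditioner 300.",
    ("air_conditioner_300", "mode", "cooling"):
        "Please set the air conditioner 300 to cooling mode.",
    ("air_conditioner_300", "set_temp", "28C"):
        "Please set the air conditioner 300 to 28 degrees.",
}

def phrase_for(operation):
    """Return the phrase registered for an operation, or None if unregistered."""
    return VOICE_TABLE.get(operation)
```

In this sketch, an unregistered operation simply yields no voice, which mirrors the fact that only operations with stored audio information can be spoken.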
 The behavior detection unit 105 detects behavior performed by the user 10. This behavior is, for example, an operation on the air conditioner 300 by the user 10, or the utterance of words by the user 10. In this embodiment, an operation on the air conditioner 300 is, in substance, an operation on the audio output device 100. The functions of the behavior detection unit 105 are realized, for example, by the touch screen 13 or the microphone 14.
 When the operation detected by the operation detection unit 102 and the behavior detected by the behavior detection unit 105 are in a predetermined relationship, the history information generation unit 106 generates history information in which the content of the operation and the content of the behavior are associated with each other. The predetermined relationship is, for example, a relationship in which the difference between the detection times is within a threshold, or a relationship in which both detection times fall within a setting mode. The functions of the history information generation unit 106 are realized, for example, by the processor 11 executing a program stored in the flash memory 12.
 The history information storage unit 107 stores the history information generated by the history information generation unit 106. The functions of the history information storage unit 107 are realized, for example, by the flash memory 12.
 When the behavior detection unit 105 detects the above behavior, the audio output unit 103 outputs the voice representing the content of the operation associated with the content of that behavior in the history information. That is, when history information exists in which the content of the detected behavior is associated with the content of an operation, the audio output unit 103 outputs voice representing the content of that operation.
 Here, when the operation and the behavior are detected within a predetermined time of each other, the history information generation unit 106 generates history information associating the content of the operation with the content of the behavior. The predetermined time is, for example, a few minutes. After the history information has been generated, the audio output unit 103 outputs the voice when the behavior is detected. When the user 10 has a track record of performing the operation and the behavior in succession at relatively short intervals, the user 10 is likely to perform the operation together with the behavior. Given such a track record, when the behavior is detected, the audio output unit 103 regards it as likely that the user 10 will perform the operation, and automatically outputs the voice for realizing the control corresponding to that operation.
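 A minimal sketch of this time-window rule, assuming detections are timestamped in seconds; the 180-second window is an illustrative value for "a few minutes":

```python
PREDETERMINED_TIME = 180.0  # seconds; stands in for "a few minutes"

def should_associate(operation_time, behavior_time, window=PREDETERMINED_TIME):
    """Associate the operation and the behavior in history information
    only if they were detected within the predetermined time of each other."""
    return abs(operation_time - behavior_time) <= window
```

For example, a behavior detected at t = 100 s and an operation detected at t = 160 s fall within the window and would be associated; an operation at t = 400 s would not.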
 Specifically, in this embodiment, the operation detection unit 102 detects a first operation on the air conditioner 300 by the user 10 and a second operation on the air conditioner 300 by the user 10. When the first operation is detected, the audio output unit 103 outputs a first voice representing the content of the first operation; when the second operation is detected, the audio output unit 103 outputs a second voice representing the content of the second operation. The behavior detection unit 105 includes the operation detection unit 102 and detects the second operation as the behavior performed by the user 10.
 When the first operation and the second operation are detected within the predetermined time, the history information generation unit 106 generates history information in which the content of the first operation and the content of the second operation are associated with each other. After this history information has been generated, the audio output unit 103 outputs both the first voice and the second voice when the second operation is detected. When there is a track record of the first operation and the second operation being detected in succession, the two operations are likely to be performed in succession again. Given such a track record, when the second operation is detected, the audio output unit 103 regards it as likely that the first operation will also be performed, and outputs not only the second voice representing the content of the second operation but also the first voice representing the content of the first operation.
 In this embodiment, the history information generation unit 106 generates the history information associating the content of the first operation with the content of the second operation when the first operation is detected before the predetermined time has elapsed since the second operation was detected. That is, when there is a track record of the first operation being detected after the second operation, a newly detected second operation triggers output of the first voice representing the content of the first operation in addition to the second voice representing the content of the second operation. Conversely, given the same track record, a newly detected first operation triggers output of only the first voice representing the content of the first operation; the second voice representing the content of the second operation is not output.
 Thus, in this embodiment, voice representing the content of an operation considered likely to be performed after a detected operation is output automatically, whereas voice representing the content of an operation considered unlikely to follow is not output automatically. For example, suppose the second operation turns on the power of the air conditioner 300 and the first operation sets the air-conditioning mode of the air conditioner 300 to cooling. The first operation is likely to be performed after the second operation, but the second operation is unlikely to be performed after the first operation. Therefore, only when the operation that was detected earlier is newly detected is the voice indicating the content of the later-detected operation additionally output.
 Here, it is preferable that the history information generation unit 106 generate the history information associating the content of the operation with the content of the behavior when the number of times the operation and the behavior have been detected within the predetermined time, within a most recent predetermined period, has reached a predetermined threshold. The most recent predetermined period is, for example, the most recent month; the predetermined time is, for example, a few minutes; and the predetermined threshold is, for example, five times. When the operation and the behavior have been detected in succession, for example, five times in the most recent month, the operation is considered likely to be performed together with the behavior. In such a case, it is preferable that the voice representing the content of the operation be output when the behavior is detected. This configuration suppresses inappropriate voice output.
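 The period-and-threshold condition above can be sketched as follows. The 30-day period and the threshold of five are the example values from the text; the list of co-detection timestamps is a hypothetical input:

```python
from datetime import datetime, timedelta

RECENT_PERIOD = timedelta(days=30)  # "the most recent month"
THRESHOLD = 5                       # "for example, five times"

def history_warranted(co_detection_times, now):
    """Return True when the operation and the behavior have been co-detected
    (within the predetermined time of each other) at least THRESHOLD times
    within the most recent predetermined period."""
    recent = [t for t in co_detection_times if now - t <= RECENT_PERIOD]
    return len(recent) >= THRESHOLD
```

With five co-detections spread over the past 30 days, `history_warranted` returns True; with only four, it returns False, and no history information would be generated yet.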
 Next, the functions of the device control apparatus 200 will be described with reference to FIG. 5. As shown in FIG. 5, the device control apparatus 200 functionally includes a control unit 201, a voice detection unit 202, an audio output unit 203, an audio information storage unit 204, a device control unit 205, and a command information storage unit 206. The voice detection means of the device control apparatus 200 corresponds, for example, to the voice detection unit 202; the device control means corresponds, for example, to the device control unit 205.
 The control unit 201 controls the overall operation of the device control apparatus 200. For example, the control unit 201 identifies the content of control for the air conditioner 300 from the voice detected by the voice detection unit 202, and causes the device control unit 205 to transmit a control command representing the identified content of control. The functions of the control unit 201 are realized, for example, by the processor 21 executing a program stored in the flash memory 22.
 The voice detection unit 202 detects the voice output by the audio output unit 103. It is therefore desirable that the voice detection unit 202 be placed near the audio output unit 103; for example, the voice detection unit 202 is placed within a few meters of the audio output unit 103. The functions of the voice detection unit 202 are realized, for example, by the microphone 24.
 The audio output unit 203 outputs various voices under the control of the control unit 201; for example, it outputs voice representing announcements to the user 10. The functions of the audio output unit 203 are realized, for example, by the processor 21 and the speaker 25 working together.
 The audio information storage unit 204 stores audio information. This audio information is used to identify the content of control from the voice detected by the voice detection unit 202. For example, for each content of control, the audio information indicates information representing the electrical signal corresponding to the voice expressing that content of control. The functions of the audio information storage unit 204 are realized, for example, by the flash memory 22.
 The device control unit 205 controls the air conditioner 300 based on the content of the operation represented by the voice detected by the voice detection unit 202. For example, under the control of the control unit 201, the device control unit 205 transmits a control command corresponding to the detected voice to the air conditioner 300 via the communication network 600. The functions of the device control unit 205 are realized, for example, by the processor 21 and the communication interface 26 working together.
 The command information storage unit 206 stores command information. The command information is, for example, information in which the content of control corresponding to the content of an operation is associated with a control command. The functions of the command information storage unit 206 are realized, for example, by the flash memory 22.
 Next, history information will be described with reference to FIG. 6. The history information shown in FIG. 6 indicates all combinations of operations that were performed in succession in the past. The detection start time is the time at which detection of the corresponding record's combination began. Operation A is the operation performed first among the successive operations and corresponds, for example, to the second operation. Operation B is the operation performed second and corresponds, for example, to the first operation. Operation C is the operation performed third and also corresponds, for example, to the first operation. In this embodiment, there is one second operation and there are one or more first operations.
 The top record of the history information shown in FIG. 6 indicates that, at 12:00 on May 18, 2018, an operation turning on the power of the air conditioner 300, an operation setting the air-conditioning mode of the air conditioner 300 to cooling, and an operation setting the set temperature of the air conditioner 300 to 28°C were performed in succession. Given such a track record, when an operation turning on the power of the air conditioner 300 is detected, the operation setting the air-conditioning mode to cooling and the operation setting the set temperature to 28°C are likely to be performed. Therefore, when the power-on operation is detected, in addition to the voice instructing that the power of the air conditioner 300 be turned on, a voice instructing that the air-conditioning mode of the air conditioner 300 be set to cooling and a voice instructing that the set temperature of the air conditioner 300 be set to 28°C are output. In the example shown in FIG. 6, the content of operation A, which corresponds to the second operation, is the same in the top record and the bottom record. In such a case, it is preferable to adopt the top record, whose detection start time is newer.
 Although FIG. 6 shows an example in which all detected combinations of operations are included in the history information, the history information is not limited to this example. For example, the history information may include only combinations detected within a most recent predetermined period, or only combinations detected at least a predetermined number of times within that period. Further, of two combinations in a competing relationship, the record with the older detection start time may be excluded from the history information. Combinations are in a competing relationship when, for example, their second operations are the same but at least one first operation differs.
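 The record structure of FIG. 6, together with the rule of adopting the record with the newest detection start time when several records share the same operation A, can be sketched as follows. The field names and the example values are hypothetical:

```python
# Hypothetical records modeled on FIG. 6: each holds a detection start time,
# the second operation (A), and the subsequent first operations (B, C, ...).
HISTORY = [
    {"start": "2018-05-18 12:00", "A": "power:on",
     "rest": ["mode:cooling", "set_temp:28C"]},
    {"start": "2018-04-01 09:00", "A": "power:on",
     "rest": ["mode:heating"]},
]

def record_for(operation_a, history=HISTORY):
    """Among records whose operation A matches the newly detected operation,
    adopt the one with the newest detection start time."""
    matches = [r for r in history if r["A"] == operation_a]
    # ISO-style timestamps compare correctly as strings.
    return max(matches, key=lambda r: r["start"]) if matches else None
```

Here a newly detected "power:on" selects the newer record, so the voices for "mode:cooling" and "set_temp:28C" would be output rather than the older "mode:heating".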
 Next, the audio output process executed by the audio output device 100 will be described with reference to the flowchart of FIG. 7. The audio output process is executed, for example, in response to the audio output device 100 being powered on.
 First, the processor 11 determines whether an operation has been detected (step S101). If no operation has been detected (step S101: NO), the processor 11 returns to step S101. If an operation has been detected (step S101: YES), the processor 11 stores the detection start time (step S102). After completing step S102, the processor 11 determines whether there is a linked setting (step S103). Specifically, the processor 11 determines whether the history information contains a record whose second operation is the operation detected in step S101.
 If the processor 11 determines that there is a linked setting (step S103: YES), it outputs a voice group (step S104). For example, the processor 11 repeats a process of selecting one operation content contained in the record and causing the speaker 15 to output a voice representing the selected operation content, until all operation contents contained in the record have been selected. If the processor 11 determines that there is no linked setting (step S103: NO), it outputs a single voice (step S105); for example, the processor 11 causes the speaker 15 to output a voice representing the content of the operation detected in step S101.
When the processor 11 has completed step S104 or step S105, it determines whether an operation has been detected (step S106). If the processor 11 determines that an operation has been detected (step S106: YES), it determines whether an interlock setting exists (step S107). Specifically, the processor 11 determines whether the history information contains a record in which the operation detected in step S106 is registered as the second operation. If the processor 11 determines that an interlock setting exists (step S107: YES), it outputs a voice group (step S108). If, on the other hand, the processor 11 determines that no interlock setting exists (step S107: NO), it outputs a single voice (step S109).
When the processor 11 has completed step S108 or step S109, or has determined that no operation has been detected (step S106: NO), it determines whether the first time has elapsed since the detection start time (step S110). The first time is the predetermined time described above, for example several minutes. If the processor 11 determines that the first time has not yet elapsed since the detection start time (step S110: NO), it returns the process to step S106.
If, on the other hand, the processor 11 determines that the first time has elapsed since the detection start time (step S110: YES), it determines whether a plurality of operations were detected within the first period (step S111). The first period is the period from the detection start time until the first time elapses. If the processor 11 determines that a plurality of operations were detected within the first period (step S111: YES), it generates history information (step S112). For example, the processor 11 updates the history information to include a record in which the operation detected in step S101 is the second operation and the operation detected in step S106 is the first operation. If the processor 11 determines that a plurality of operations were not detected within the first period (step S111: NO), or when step S112 has been completed, it returns the process to step S101.
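The interlock decision in steps S103 to S105 can be sketched as follows. This is a minimal illustrative sketch, not code from the publication: the record layout (a `second_op` trigger field and an `ops` list) and all names are assumptions made for the example.

```python
# Sketch (illustrative, not from the publication) of steps S103-S105 of
# FIG. 7: if the history information contains a record whose second
# operation matches the detected operation, voices for all operations in
# that record are output; otherwise only a single voice is output.

def voices_to_output(detected_op, history):
    """Return the list of operation contents to announce for detected_op.

    history is a list of records; each record is a dict with a
    'second_op' key (the triggering operation) and an 'ops' key
    (all operation contents registered in that record).
    """
    for record in history:
        if record["second_op"] == detected_op:   # interlock setting exists (S103: YES)
            return list(record["ops"])           # output the voice group (S104)
    return [detected_op]                         # no interlock setting: single voice (S105)


history = [
    {"second_op": "aircon power on",
     "ops": ["aircon power on", "aircon mode cooling", "aircon set 28C"]},
]

print(voices_to_output("aircon power on", history))
# voice group: all three operation contents are announced
print(voices_to_output("aircon power off", history))
# no matching record: only the detected operation itself is announced
```

In this sketch the record itself contains the triggering operation, so the triggering operation is announced as part of the voice group, matching the behaviour described for step S104.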
The audio output method according to the present embodiment is realized by the audio output device 100 according to the present embodiment executing the audio output process shown in FIG. 7. In this audio output method, first, an operation performed on equipment by the user 10 is detected, and when the operation is detected, a voice representing the content of the operation is output. The audio output method also detects an action performed by the user 10. When the action is detected and the operation and the action have a predetermined relationship, the audio output method outputs the above voice.
As described above, in the present embodiment, when an operation performed on equipment by the user 10 and an action performed by the user 10 have a predetermined relationship and the action is detected, a voice representing the content of the operation is output. Therefore, according to the present embodiment, the equipment can easily be brought into a desired control state by using the device control apparatus 200, which controls the equipment based on voice. For example, a single operation on the audio output device 100 can be expected to achieve equipment control that reflects the preferences of the user 10.
Further, in the present embodiment, when there is a history of a plurality of operations having been performed within the predetermined time and any one of these operations is performed, voices representing the contents of the other operations are automatically output. Therefore, according to the present embodiment, the equipment can be brought into a desired control state with few operations.
Further, in the present embodiment, when there is a history of a plurality of operations having been performed within the predetermined time and the first of these operations is performed, voices representing the contents of the other operations are automatically output. Therefore, according to the present embodiment, the equipment can be brought into a desired control state appropriately and with few operations.
(Embodiment 2)
In Embodiment 1, an example was described in which no keyword is associated with operations and actions performed in succession. In the present embodiment, an example is described in which a keyword is associated with operations and actions performed in succession. The following description focuses on the points that differ from Embodiment 1.
First, the functions of the audio output device 120 will be described with reference to FIG. 8. As shown in FIG. 8, the audio output device 120 functionally comprises a control unit 101, an operation detection unit 102, an audio output unit 103, an audio information storage unit 104, an action detection unit 105, a history information generation unit 106, and a history information storage unit 107. The action detection unit 105 comprises a voice detection unit 108. The voice detection means of the audio output device 120 corresponds, for example, to the voice detection unit 108.
The operation detection unit 102 detects a first operation performed on the equipment by the user 10. When the first operation is detected, the audio output unit 103 outputs a first voice representing the content of the first operation. The action detection unit 105 comprises the voice detection unit 108, which detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action performed by the user 10. The function of the voice detection unit 108 is realized, for example, by the function of the microphone 14.
When the first operation and the third voice are detected within the predetermined time, the history information generation unit 106 generates history information in which the content of the first operation is associated with the word. After this history information has been generated, the audio output unit 103 outputs the first voice when the third voice is detected.
The word represented by the third voice is a word that is highly likely to be uttered together with the execution of the first operation, and is treated as a keyword. When there is a history of this keyword having been uttered together with the execution of the first operation, a voice representing the first operation is automatically output when the keyword is uttered. Note that any number of first operations, as long as it is one or more, may be associated with the keyword in the history information. The history information may be information in which the content of the first operation is associated with a keyword uttered immediately before or immediately after the first operation, information in which the content of the first operation is associated with a keyword uttered immediately before the first operation, or information in which the content of the first operation is associated with a keyword uttered immediately after the first operation.
Next, the history information in the present embodiment will be described with reference to FIG. 9. In the present embodiment, the history information associates a keyword with the contents of operation A, operation B, and operation C. Operations A, B, and C are operations performed together with the utterance of the keyword, and are first operations. The top record of the history information indicates that there is a history of an operation turning on the power of the air conditioner 300, an operation setting the air-conditioning mode of the air conditioner 300 to cooling, and an operation setting the target temperature of the air conditioner 300 to 28°C having been performed together with the utterance of the keyword "air conditioner". When such a history exists and the keyword "air conditioner" is uttered, a voice instructing that the power of the air conditioner 300 be turned on, a voice instructing that the air-conditioning mode of the air conditioner 300 be set to cooling, and a voice instructing that the target temperature of the air conditioner 300 be set to 28°C are automatically output. Although FIG. 9 shows an example in which all detected combinations of a keyword and first operations are included in the history information, the history information is not limited to this example. For example, the history information may include only the combinations of a keyword and first operations detected within the most recent predetermined period, or only the combinations detected at least a predetermined number of times within the most recent predetermined period. Further, among competing combinations, the record with the older detection start time may be excluded from the history information. Competing combinations are, for example, combinations that have the same keyword but differ in at least one first operation.
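One of the pruning rules just described can be sketched as follows. This is an illustrative sketch only; the tuple layout of a record and the function name are assumptions, and timestamps are compared lexically for simplicity.

```python
# Sketch (illustrative, not from the publication) of excluding competing
# combinations: among records with the same keyword but different first
# operations, only the record with the newer detection start time is kept.

def prune_competing(records):
    """records: list of (start_time, keyword, ops) tuples.
    Returns the records with, for each keyword, only the newest entry kept."""
    newest = {}
    for start_time, keyword, ops in records:
        kept = newest.get(keyword)
        if kept is None or start_time > kept[0]:
            newest[keyword] = (start_time, keyword, ops)
    return sorted(newest.values())


records = [
    ("2018-05-18 12:00", "air conditioner",
     ("aircon power on", "aircon mode cooling", "aircon set 28C")),
    ("2018-06-01 09:30", "air conditioner",
     ("aircon power on", "aircon mode cooling", "aircon set 26C")),
]
print(prune_competing(records))
# only the newer 2018-06-01 record for "air conditioner" survives
```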
Next, the audio output process executed by the audio output device 120 will be described with reference to the flowchart of FIG. 10. The audio output process is executed, for example, in response to the audio output device 120 being powered on. Here, an example is described in which the contents of a series of operations performed after the utterance of a keyword are associated with that keyword.
First, the processor 11 determines whether a word has been detected (step S201). For example, the processor 11 determines whether a voice representing a word that can serve as a keyword has been detected by the microphone 14. If the processor 11 determines that no word has been detected (step S201: NO), it returns the process to step S201. If, on the other hand, the processor 11 determines that a word has been detected (step S201: YES), it stores the detection start time (step S202).
Upon completing step S202, the processor 11 determines whether an interlock setting exists (step S203). Specifically, the processor 11 determines whether the history information contains a record whose keyword is the word detected in step S201. If the processor 11 determines that an interlock setting exists (step S203: YES), it outputs a single voice or a voice group (step S204). When the record contains the content of a single operation, a single voice is output; when the record contains the contents of a plurality of operations, a voice group is output.
When the processor 11 has completed step S204, or has determined that no interlock setting exists (step S203: NO), it determines whether an operation has been detected (step S205). If the processor 11 determines that an operation has been detected (step S205: YES), it outputs a single voice (step S206). When the processor 11 has completed step S206, or has determined that no operation has been detected (step S205: NO), it determines whether the first time has elapsed since the detection start time (step S207).
If the processor 11 determines that the first time has not yet elapsed since the detection start time (step S207: NO), it returns the process to step S205. If, on the other hand, the processor 11 determines that the first time has elapsed since the detection start time (step S207: YES), it determines whether an operation was detected within the first period (step S208). If the processor 11 determines that an operation was detected within the first period (step S208: YES), it generates history information (step S209). The processor 11 updates the history information to include a record in which the detected word, i.e. the keyword, is associated with the contents of the operations detected within the first period. When the processor 11 has completed step S209, or has determined that no operation was detected within the first period (step S208: NO), it returns the process to step S201.
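The learning behaviour of FIG. 10 for one keyword utterance can be sketched as follows. The sketch is illustrative only: the mapping of a keyword to a list of operation contents, and all names, are assumptions made for the example.

```python
# Sketch (illustrative, not from the publication) of steps S203-S209 of
# FIG. 10 for a single keyword utterance: operations already linked to the
# keyword are announced (S203-S204), and operations detected within the
# first period are then recorded under the keyword (S208-S209).

def process_keyword_period(keyword, ops_in_first_period, history):
    """history maps a keyword to the list of operation contents linked to it.
    Returns the operation contents to announce for this utterance."""
    announced = list(history.get(keyword, []))        # S203: interlock setting?
    if ops_in_first_period:                           # S208: operations detected
        history[keyword] = list(ops_in_first_period)  # S209: update history
    return announced


history = {}
# First utterance of "air conditioner": nothing to announce yet, but the
# operations that follow the utterance are learned.
first = process_keyword_period(
    "air conditioner", ["aircon power on", "aircon mode cooling"], history)
# Second utterance of the same keyword: the learned operations are announced.
second = process_keyword_period("air conditioner", [], history)
print(first, second)
```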
In the present embodiment, when there is a history of at least one operation having been detected together with the utterance of a keyword, voices representing the contents of the at least one operation are automatically output when the utterance of the keyword is detected. Therefore, according to the present embodiment, the equipment can be brought into a desired control state by the utterance of a keyword. Further, in the present embodiment, the user can freely choose the keyword associated with the series of operations that brings the equipment into the desired control state, which improves the user's convenience.
(Embodiment 3)
In Embodiment 1, an example was described in which the action detected by the action detection unit 105 is an operation detected by the operation detection unit 102. In Embodiment 2, an example was described in which the action detected by the action detection unit 105 is a voice utterance detected by the voice detection unit 108. In the present embodiment, an example is described in which the actions detected by the action detection unit 105 are both operations detected by the operation detection unit 102 and voice utterances detected by the voice detection unit 108. The following description focuses on the points that differ from Embodiments 1 and 2.
First, the functions of the audio output device 130 will be described with reference to FIG. 11. As shown in FIG. 11, the audio output device 130 functionally comprises a control unit 101, an audio output unit 103, an audio information storage unit 104, an action detection unit 105, a history information generation unit 106, and a history information storage unit 107. The action detection unit 105 comprises the operation detection unit 102 and the voice detection unit 108.
The action detection unit 105 comprises the voice detection unit 108, which detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action performed by the user 10. When the first operation, the second operation, and the third voice are detected within the predetermined time, the history information generation unit 106 generates history information in which the content of the first operation, the content of the second operation, and the word are associated with one another. After the history information has been generated, the audio output unit 103 outputs the first voice and the second voice when the third voice is detected.
That is, in the present embodiment, when there is a history of the first operation, the second operation, and the third voice having been detected in succession, the first voice and the second voice are output when at least one of the second operation and the third voice is detected. For example, as in Embodiment 2, assume that the history information shown in FIG. 9 has been generated. In the present embodiment, both when the keyword "air conditioner" is uttered and when an operation turning on the power of the air conditioner 300 is performed, a voice instructing that the power of the air conditioner 300 be turned on, a voice instructing that the air-conditioning mode of the air conditioner 300 be set to cooling, and a voice instructing that the target temperature of the air conditioner 300 be set to 28°C are output.
In the present embodiment, when there is a history of a plurality of operations having been detected together with the utterance of a keyword, voices representing the contents of the plurality of operations are automatically output when the utterance of the keyword is detected or when the first of the plurality of operations is detected. Therefore, according to the present embodiment, the equipment can be brought into a desired control state by the utterance of the keyword or by the first operation of the series of operations.
(Embodiment 4)
In Embodiments 1 to 3, examples were described in which there is only one piece of equipment to be controlled. In the present embodiment, an example is described in which there are a plurality of pieces of equipment to be controlled. Specifically, as shown in FIG. 12, an example is described in which the equipment to be controlled consists of three devices: the air conditioner 300, the bathroom heater 310, and the water heater 320.
The operation detection unit 102 detects a first operation performed by the user 10 on first equipment among the plurality of pieces of equipment, and a second operation performed by the user 10 on second equipment among the plurality of pieces of equipment. In the present embodiment, any one of the air conditioner 300, the bathroom heater 310, and the water heater 320 is the second equipment, and the remaining two devices are the first equipment. However, any of the three devices may serve as the second equipment.
When the first operation is detected, the audio output unit 103 outputs a first voice representing the content of the first operation; when the second operation is detected, it outputs a second voice representing the content of the second operation. The action detection unit 105 comprises the operation detection unit 102 and detects the second operation as an action performed by the user 10. When the first operation and the second operation are detected within the predetermined time, the history information generation unit 106 generates history information in which the content of the first operation and the content of the second operation are associated with each other. After the history information has been generated, the audio output unit 103 outputs the first voice and the second voice when the second operation is detected.
Next, the history information in the present embodiment will be described with reference to FIG. 13. In the present embodiment, the history information associates a detection start time with the contents of operation A, operation B, and operation C. Operations A, B, and C are a series of operations detected in succession within the predetermined time. This series of operations may include a plurality of operations on a single piece of equipment. Any one of operations A, B, and C is the second operation; the remaining two are first operations.
The top record of the history information indicates that, between the detection start time of 12:00 on May 18, 2018 and the elapse of the predetermined time, an operation turning on the power of the air conditioner 300, an operation turning on the power of the bathroom heater 310, and an operation turning on the power of the water heater 320 were performed in succession. When such a history exists and any one of these three operations is detected, a voice instructing that the power of the air conditioner 300 be turned on, a voice instructing that the power of the bathroom heater 310 be turned on, and a voice instructing that the power of the water heater 320 be turned on are automatically output.
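The triggering behaviour just described can be sketched as follows. This is an illustrative sketch only; the record layout (a detection start time paired with the series of operation contents) and the function name are assumptions.

```python
# Sketch (illustrative, not from the publication) of the behaviour in
# Embodiment 4: when a detected operation appears anywhere in a recorded
# series of operations on several devices, the whole series is announced.

def series_to_announce(detected_op, history):
    """history: list of (start_time, ops) records, where ops is the series
    of operation contents detected in succession within the predetermined
    time."""
    for _start_time, ops in history:
        if detected_op in ops:
            return list(ops)          # announce the whole series
    return [detected_op]              # no matching series: single voice


history = [
    ("2018-05-18 12:00",
     ("aircon power on", "bathroom heater power on", "water heater power on")),
]
print(series_to_announce("bathroom heater power on", history))
# any one of the three operations triggers voices for all three devices
```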
In the present embodiment, when there is a history of a series of operations having been performed on a plurality of pieces of equipment, voices representing the contents of the series of operations are automatically output when any one operation of the series is detected. Therefore, according to the present embodiment, a plurality of pieces of equipment can be brought into a desired control state by a single operation of the series.
(Embodiment 5)
In Embodiment 4, an example was described in which no keyword is associated with the series of operations on the plurality of pieces of equipment. In the present embodiment, an example is described in which a keyword is associated with the series of operations on the plurality of pieces of equipment. The following description focuses on the points that differ from Embodiment 4.
The action detection unit 105 comprises the voice detection unit 108, which detects a third voice representing a word uttered by the user 10, and detects the utterance of the word by the user 10 as an action performed by the user 10. When the first operation, the second operation, and the third voice are detected within the predetermined time, the history information generation unit 106 generates history information in which the content of the first operation, the content of the second operation, and the word are associated with one another.
After the history information has been generated, the audio output unit 103 outputs the first voice and the second voice when the third voice is detected. Thus, in the present embodiment, when there is a history of a series of operations on a plurality of pieces of equipment and the utterance of the keyword corresponding to the third voice having been detected in succession, a series of voices representing the series of operations is output when either one operation of the series or the utterance of the keyword is detected.
Next, the history information in the present embodiment will be described with reference to FIG. 14. In the present embodiment, the history information associates a keyword with the contents of operation A, operation B, and operation C. Operations A, B, and C are a series of operations detected in succession within the predetermined time. The keyword is a word detected within the predetermined time together with the series of operations. Any one of operations A, B, and C is the second operation; the remaining two are first operations.
The top record of the history information indicates that the utterance of the keyword "I'm home now", an operation turning on the power of the air conditioner 300, an operation turning on the power of the bathroom heater 310, and an operation turning on the power of the water heater 320 were performed in succession. When such a history exists and either the utterance of the keyword "I'm home now" or any one of the three power-on operations is detected, a voice instructing that the power of the air conditioner 300 be turned on, a voice instructing that the power of the bathroom heater 310 be turned on, and a voice instructing that the power of the water heater 320 be turned on are automatically output.
In the present embodiment, when there is a history of a series of operations having been performed on a plurality of pieces of equipment together with the utterance of a keyword, voices representing the contents of the series of operations are automatically output when the utterance of the keyword or any one operation of the series is detected. Therefore, according to the present embodiment, a plurality of pieces of equipment can be brought into a desired control state by the utterance of the keyword or by a single operation of the series.
(Embodiment 6)
 In Embodiments 1 to 5, examples in which the history information is generated automatically were described. In this embodiment, an example in which the history information is generated manually is described. In this embodiment, the touch screen 13 or the microphone 14 receives an instruction to shift to a setting mode; the transition instruction receiving means thus corresponds to, for example, the touch screen 13 or the microphone 14.
 When the transition instruction is received by the touch screen 13 or the microphone 14 and an operation and a behavior are detected while the setting mode set in response to that instruction is active, the history information generation unit 106 generates history information in which the content of the operation and the content of the behavior are associated with each other. After the history information has been generated, the voice output unit 103 outputs the voice when the behavior is detected.
 For example, assume that the user 10 issues the transition instruction to the voice output device 100. The voice output device 100 speaks through cooperation of the processor 11 and the speaker 15, and detects the speech of the user 10 with the microphone 14. In this case, the voice output device 100 says, for example, "What should be linked?". When the user 10 says "Water heater, power, on", the device again asks "What should be linked?". When the user 10 says "Bathroom heating, power, on", the device asks "What should be linked?" once more. When the user 10 says "End", the device asks "What is the keyword?". When the user 10 says "I'm home now", the device responds "Setting complete".
 The setting process executed by the voice output device 100 will now be described with reference to FIG. 15.
 First, the processor 11 determines whether there is an instruction to shift to the setting mode (step S301). If the processor 11 determines that there is no such instruction (step S301: NO), it returns the process to step S301. If the processor 11 determines that there is an instruction to shift to the setting mode (step S301: YES), it utters a message prompting an operation (step S302).
 After completing step S302, the processor 11 determines whether there is a control designation operation (step S303). The control designation operation is, for example, an operation designating a control by voice. If the processor 11 determines that there is a control designation operation (step S303: YES), it stores the designated control (step S304). If it determines that there is no control designation operation (step S303: NO), or after completing step S304, the processor 11 determines whether there is a setting end operation (step S305). If it determines that there is no setting end operation (step S305: NO), the processor 11 returns the process to step S302. If it determines that there is a setting end operation (step S305: YES), the processor 11 utters a message prompting the user to speak a keyword (step S306).
 After completing step S306, the processor 11 determines whether a keyword has been uttered (step S307). If the processor 11 determines that a keyword has been uttered (step S307: YES), it generates history information with the keyword (step S308). If it determines that no keyword has been uttered (step S307: NO), it generates history information without a keyword (step S309). After completing step S308 or step S309, the processor 11 returns the process to step S301.
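 The setting process just described can be sketched as a simple dialogue loop. The `ScriptedIO` stand-in for the speaker/microphone pair and all names below are illustrative assumptions, not the patent's implementation:

```python
class ScriptedIO:
    """Stand-in for the speaker (say) and microphone (listen), for illustration."""
    def __init__(self, replies):
        self.replies = list(replies)
        self.prompts = []
    def say(self, text):
        self.prompts.append(text)
    def listen(self):
        return self.replies.pop(0)

def setting_mode(io):
    """Sketch of steps S302-S309: collect designated controls until a
    setting-end operation, then ask for a keyword and build one record."""
    controls = []
    while True:
        io.say("What should be linked?")   # S302: prompt an operation
        utterance = io.listen()
        if utterance == "end":             # S305: setting-end operation
            break
        controls.append(utterance)         # S303-S304: store the designated control
    io.say("What is the keyword?")         # S306: prompt for a keyword
    keyword = io.listen() or None          # S307: was a keyword uttered?
    # S308/S309: history information with or without a keyword.
    return {"keyword": keyword, "operations": tuple(controls)}

io = ScriptedIO(["water heater, power, on",
                 "bathroom heating, power, on",
                 "end",
                 "I'm home now"])
record = setting_mode(io)
```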
 In this embodiment, the series of operations desired by the user is set manually. Therefore, according to this embodiment, settings not intended by the user can be suppressed.
(Modification)
 While embodiments of the present invention have been described above, various modifications and applications are possible when implementing the present invention.
 In the present invention, which parts of the configurations, functions, and operations described in the above embodiments are adopted is arbitrary. Further configurations, functions, and operations may be employed in addition to those described above, and the configurations, functions, and operations described in the above embodiments may be combined freely.
 For example, in Embodiment 2, an example was described in which a series of operations on equipment is performed after the utterance of a keyword. The keyword may instead be uttered after the series of operations on the equipment.
 An operation described as a screen operation may be implemented as a voice operation, and an operation described as a voice operation may be implemented as a screen operation.
 By applying an operation program that defines the operation of the voice output device 100 according to the present invention to an existing personal computer or information terminal device, the personal computer or the like can be made to function as the voice output device 100 according to the present invention. The distribution method of such a program is arbitrary: for example, the program may be stored on and distributed via a computer-readable recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD (Digital Versatile Disc), or a memory card, or it may be distributed via a communication network such as the Internet.
 The present invention is capable of various embodiments and modifications without departing from its broad spirit and scope. The above-described embodiments are for explaining the present invention and do not limit its scope; that is, the scope of the present invention is indicated not by the embodiments but by the claims, and various modifications made within the scope of the claims and within the meaning of inventions equivalent thereto are regarded as within the scope of the present invention.
 The present invention is applicable to a device control system including a device control device that controls equipment based on voice.
10 user; 11, 21 processor; 12, 22 flash memory; 13, 23 touch screen; 14, 24 microphone; 15, 25 speaker; 16, 26 communication interface; 100, 120, 130 voice output device; 101, 201 control unit; 102 operation detection unit; 103, 203 voice output unit; 104, 204 voice information storage unit; 105 behavior detection unit; 106 history information generation unit; 107 history information storage unit; 108, 202 voice detection unit; 200 device control device; 205 device control unit; 206 command information storage unit; 300 air conditioner; 310 bathroom heater; 320 water heater; 600 communication network; 1000 device control system

Claims (13)

  1.  An audio output device comprising:
      operation detection means for detecting an operation on equipment by a user;
      audio output means for outputting, when the operation is detected by the operation detection means, a voice representing the content of the operation;
      behavior detection means for detecting a behavior performed by the user; and
      history information generation means for generating, when the operation detected by the operation detection means and the behavior detected by the behavior detection means are in a predetermined relationship, history information in which the content of the operation and the content of the behavior are associated with each other,
      wherein the audio output means outputs, when the behavior is detected by the behavior detection means, the voice representing the content of the operation associated by the history information with the content of the behavior.
  2.  The audio output device according to claim 1, wherein:
      the history information generation means generates, when the operation and the behavior are detected within a predetermined time, the history information in which the content of the operation and the content of the behavior are associated with each other; and
      the audio output means outputs the voice when the behavior is detected after the history information has been generated.
  3.  The audio output device according to claim 2, wherein:
      the operation detection means detects a first operation on the equipment by the user and a second operation on the equipment by the user;
      the audio output means outputs a first voice representing the content of the first operation when the first operation is detected, and outputs a second voice representing the content of the second operation when the second operation is detected;
      the behavior detection means comprises the operation detection means and detects the second operation as the behavior performed by the user;
      the history information generation means generates, when the first operation and the second operation are detected within the predetermined time, the history information in which the content of the first operation and the content of the second operation are associated with each other; and
      the audio output means outputs the first voice and the second voice when the second operation is detected after the history information has been generated.
  4.  The audio output device according to claim 3, wherein the history information generation means generates the history information in which the content of the first operation and the content of the second operation are associated with each other when the first operation is detected before the predetermined time elapses after the second operation is detected.
  5.  The audio output device according to claim 3 or 4, wherein:
      the behavior detection means comprises voice detection means for detecting a third voice representing a word uttered by the user, and detects the utterance of the word by the user as the behavior performed by the user;
      the history information generation means generates, when the first operation, the second operation, and the third voice are detected within the predetermined time, the history information in which the content of the first operation, the content of the second operation, and the word are associated with one another; and
      the audio output means outputs the first voice and the second voice when the third voice is detected after the history information has been generated.
  6.  The audio output device according to claim 2, wherein:
      the operation detection means detects a first operation on the equipment by the user;
      the audio output means outputs a first voice representing the content of the first operation when the first operation is detected;
      the behavior detection means comprises voice detection means for detecting a third voice representing a word uttered by the user, and detects the utterance of the word by the user as the behavior performed by the user;
      the history information generation means generates, when the first operation and the third voice are detected within the predetermined time, the history information in which the content of the first operation and the word are associated with each other; and
      the audio output means outputs the first voice when the third voice is detected after the history information has been generated.
  7.  The audio output device according to claim 2, wherein:
      there are a plurality of pieces of the equipment;
      the operation detection means detects a first operation by the user on a first piece of equipment among the plurality of pieces of equipment and a second operation by the user on a second piece of equipment among the plurality of pieces of equipment;
      the audio output means outputs a first voice representing the content of the first operation when the first operation is detected, and outputs a second voice representing the content of the second operation when the second operation is detected;
      the behavior detection means comprises the operation detection means and detects the second operation as the behavior performed by the user;
      the history information generation means generates, when the first operation and the second operation are detected within the predetermined time, the history information in which the content of the first operation and the content of the second operation are associated with each other; and
      the audio output means outputs the first voice and the second voice when the second operation is detected after the history information has been generated.
  8.  The audio output device according to claim 7, wherein:
      the behavior detection means comprises voice detection means for detecting a third voice representing a word uttered by the user, and detects the utterance of the word by the user as the behavior performed by the user;
      the history information generation means generates, when the first operation, the second operation, and the third voice are detected within the predetermined time, the history information in which the content of the first operation, the content of the second operation, and the word are associated with one another; and
      the audio output means outputs the first voice and the second voice when the third voice is detected after the history information has been generated.
  9.  The audio output device according to any one of claims 2 to 8, wherein the history information generation means generates the history information in which the content of the operation and the content of the behavior are associated with each other when the number of times the operation and the behavior have been detected within the predetermined time during a most recent predetermined period reaches a predetermined threshold.
  10.  The audio output device according to claim 1, further comprising transition instruction receiving means for receiving an instruction to shift to a setting mode, wherein:
      the history information generation means generates, when the transition instruction is received by the transition instruction receiving means and the operation and the behavior are detected while the setting mode set based on the transition instruction is active, the history information in which the content of the operation and the content of the behavior are associated with each other; and
      the audio output means outputs the voice when the behavior is detected after the history information has been generated.
  11.  A device control system comprising an audio output device and a device control device, wherein:
      the audio output device comprises
      operation detection means for detecting an operation on equipment by a user,
      audio output means for outputting, when the operation is detected by the operation detection means, a voice representing the content of the operation,
      behavior detection means for detecting a behavior performed by the user, and
      history information generation means for generating, when the operation detected by the operation detection means and the behavior detected by the behavior detection means are in a predetermined relationship, history information in which the content of the operation and the content of the behavior are associated with each other;
      the device control device comprises
      voice detection means for detecting the voice output by the audio output means, and
      device control means for controlling the equipment based on the content of the operation represented by the voice detected by the voice detection means; and
      the audio output means outputs, when the behavior is detected by the behavior detection means, the voice representing the content of the operation associated by the history information with the content of the behavior.
  12.  An audio output method comprising:
      detecting an operation on equipment by a user;
      outputting, when the operation is detected, a voice representing the content of the operation;
      detecting a behavior performed by the user; and
      outputting the voice when the behavior is detected and the operation and the behavior are in a predetermined relationship.
  13.  A program for causing a computer to function as:
      operation detection means for detecting an operation on equipment by a user;
      audio output means for outputting, when the operation is detected by the operation detection means, a voice representing the content of the operation;
      behavior detection means for detecting a behavior performed by the user; and
      history information generation means for generating, when the operation detected by the operation detection means and the behavior detected by the behavior detection means are in a predetermined relationship, history information in which the content of the operation and the content of the behavior are associated with each other,
      wherein the audio output means outputs, when the behavior is detected by the behavior detection means, the voice representing the content of the operation associated by the history information with the content of the behavior.
PCT/JP2018/021204 2018-06-01 2018-06-01 Audio output device, apparatus control system, audio output method, and program WO2019229978A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020522542A JP6945734B2 (en) 2018-06-01 2018-06-01 Audio output device, device control system, audio output method, and program
PCT/JP2018/021204 WO2019229978A1 (en) 2018-06-01 2018-06-01 Audio output device, apparatus control system, audio output method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/021204 WO2019229978A1 (en) 2018-06-01 2018-06-01 Audio output device, apparatus control system, audio output method, and program

Publications (1)

Publication Number Publication Date
WO2019229978A1 true WO2019229978A1 (en) 2019-12-05

Family

ID=68696888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/021204 WO2019229978A1 (en) 2018-06-01 2018-06-01 Audio output device, apparatus control system, audio output method, and program

Country Status (2)

Country Link
JP (1) JP6945734B2 (en)
WO (1) WO2019229978A1 (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04167695A (en) * 1990-10-26 1992-06-15 Sharp Corp Remote control system
JP2001036981A (en) * 1999-07-16 2001-02-09 Fujitsu Ltd Remote controller and computer readable recording medium recording remote control program


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020105466A1 (en) * 2018-11-21 2020-05-28 ソニー株式会社 Information processing device and information processing method
JPWO2020105466A1 (en) * 2018-11-21 2021-10-07 ソニーグループ株式会社 Information processing device and information processing method
JP7456387B2 (en) 2018-11-21 2024-03-27 ソニーグループ株式会社 Information processing device and information processing method
JP7474084B2 (en) 2020-03-17 2024-04-24 アイホン株式会社 Intercom System

Also Published As

Publication number Publication date
JP6945734B2 (en) 2021-10-06
JPWO2019229978A1 (en) 2021-01-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18920639; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020522542; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18920639; Country of ref document: EP; Kind code of ref document: A1)