CN113228166B - Command control device, control method, and nonvolatile storage medium - Google Patents

Info

Publication number: CN113228166B (grant of application CN202080005756.9A; earlier publication CN113228166A)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: command, instruction, unit, information, control
Inventors: 浜田知, 高橋大
Applicants/Assignees: Hisense Visual Technology Co Ltd; Toshiba Visual Solutions Corp
Legal status: Active (granted)

Classifications

    • G06F 3/16: Sound input; sound output (under G06F 3/00, input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements; section G, Physics; class G06, Computing; subclass G06F, Electric digital data processing)
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue (under G10L 15/00, Speech recognition; subclass G10L, Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding)
    • G10L 15/28: Constructional details of speech recognition systems (under G10L 15/00, Speech recognition)

Abstract

Embodiments of the present application relate to a command control device, a control method, and a nonvolatile storage medium. The command control device assists voice recognition of voice commands. The command control device of an embodiment generates an execution command including a control command for a device to be controlled based on an input voice, wherein the execution command includes either a 1st control command or a 2nd control command different from the 1st control command, the 1st control command being a control command included in a recognition command acquired by performing voice recognition on the voice.

Description

Command control device, control method, and nonvolatile storage medium
The present application claims priority to Japanese patent application No. 2019-221573, entitled "Command control device, control method, and control program", filed with the Japan Patent Office on December 6, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments described herein relate to a command control device, a control method, and a nonvolatile storage medium.
Background
In recent years, voice control techniques that use voice recognition to control devices by voice have been studied. For example, voice control can take the place of a television remote controller: when a user utters a control command, the voice carrying the command (hereinafter referred to as a voice command) is picked up by a microphone or the like attached to the television, and the command is extracted as data by voice recognition. The extracted command data becomes the command to be executed by the television. With this technique, a voice command uttered by a user may be recognized as an erroneous command under the influence of surrounding noise or the like, and improvements to the performance of voice recognition are being studied.
Prior art documents
Patent literature
Patent document 1: japanese patent laid-open publication No. 2017-167247.
Disclosure of Invention
However, depending on the level of surrounding noise, voice recognition may not function well, which limits the range in which voice control technology can be applied. In addition, voice control that operates normally under ordinary conditions may fail to operate because of increased noise in an emergency such as an earthquake.
The problem to be solved by the present application is to provide a command control device, a control method, and a nonvolatile storage medium that assist voice recognition of a voice command.
An instruction control device according to an embodiment of the present application generates an execution instruction including a control instruction for a device to be controlled based on an input voice. The execution instruction includes either a 1st control instruction or a 2nd control instruction different from the 1st control instruction, the 1st control instruction being a control instruction included in a recognition instruction acquired by performing voice recognition on the voice.
Drawings
Fig. 1 is a diagram showing an example of the overall configuration of the system according to embodiment 1;
fig. 2 is a block diagram showing an example of the functional configuration of the receiving apparatus of the embodiment;
fig. 3 is a block diagram showing an example of a functional configuration of the command control unit according to the embodiment;
fig. 4 is a diagram showing an example of a control target portion and a state of the control target portion that can be controlled by a voice command in the embodiment;
fig. 5 is a diagram showing an example of auxiliary information that can be utilized in the embodiment;
fig. 6 is a flowchart showing an example of processing operations for the command control section to receive a voice command and transmit an execution command in the embodiment;
fig. 7 is a diagram showing an example of the execution instruction generation conditions set in the instruction control unit for controlling the main power supply in the embodiment;
fig. 8 is a diagram showing an example of the execution command generation condition set in the command control unit for controlling the volume in embodiment 2;
fig. 9 is a diagram showing an example of the execution command generation conditions set in the command control unit for controlling the reception of digital content in embodiment 3;
fig. 10 is a diagram showing an example of the configuration of the entire system according to embodiment 4.
Description of the reference numerals
1 … command control unit; 2 … command control device; 10 … receiving device; 11 … command generating unit; 12 … sound detecting unit; 13 … command recognizing unit; 14 … user interface unit; 15 … interface unit; 16 … auxiliary information acquiring unit; 17 … state information acquiring unit; 18 … command transmitting unit; 19 … control unit; 20 … broadcast signal reception processing unit; 21 … auxiliary information output unit; 30 … content processing unit; 31 … auxiliary information output unit; 40 … recording/playing unit; 41 … auxiliary information output unit; 50 … device data generating unit; 51 … auxiliary information output unit; 60 … output control unit; 62 … state switching unit; 70 … sensor unit; 71 … auxiliary information output unit; 80 … main power supply unit; 82 … state switching unit; 90 … control unit; 91 … auxiliary information management unit; 92 … state information management unit; 93 … time management unit; 100 … remote controller; 101 … antenna; 102 … communication unit; 103 … display unit; 104 … speaker unit; 105 … interface unit; 111 … storage unit; 121 … sound analysis unit; 200 … content server device; 300 … viewing analysis device; 400 … voice recognition device; 500 … network.
Detailed Description
Hereinafter, embodiments will be described with reference to the drawings.
Fig. 1 is a diagram showing an example of the configuration of the entire system of the embodiment.
The receiving apparatus 10 is a device for viewing digital content, for example a television receiving apparatus (also referred to as a television apparatus, a television receiver, or a broadcast signal receiving apparatus) capable of receiving and presenting 2K or 4K/8K digital broadcasts such as terrestrial and satellite broadcasts. The receiving apparatus 10 has an external interface for connecting to a network 500 such as the Internet, and can present digital content provided by a content server apparatus 200 (which may be a cloud server) connected to the network 500. The receiving apparatus 10 may have a recording/playing function, in which case digital content recorded in the receiving apparatus 10 can be viewed through that function. Further, by connecting a video/audio player (not shown) to the receiving apparatus 10, digital content recorded by that player can also be viewed. Although one receiving device 10 is shown in fig. 1, a plurality of receiving devices 10 may be connected. The receiving apparatuses 10 need not all have the same functions, as long as each can receive digital content; specifically, a recorder without a monitor for viewing, a set-top box (STB) capable of receiving broadcast signals, and the like may be used. Television apparatuses, recorders, STBs, and the like are collectively referred to as receiving apparatuses 10.
In the present embodiment, a television device is mainly described as an example, but the television device may be replaced with another receiving device 10 such as a recorder or an STB. The receiving device 10 may also be a personal computer or a mobile device such as a smartphone, provided it has an equivalent receiving function.
The receiving apparatus 10 of the present embodiment is capable of performing voice-based control using voice recognition. The command control unit 1 performs voice recognition on a voice command uttered by a user, for example, and controls each function in the receiving apparatus 10 using a command (hereinafter, referred to as a recognition command) acquired by the voice recognition. The remote controller 100 is a remote controller for remotely controlling the receiving apparatus 10.
The receiving apparatus 10 can collect and store a viewing history (hereinafter referred to as viewing data) of the digital content that has been viewed on it. In addition, the receiving apparatus 10 can store data specific to itself (hereinafter referred to as device-specific data). The data comprising the viewing data and the device-specific data is referred to as viewing device data. The receiving apparatus 10 can output stored data such as the viewing device data to the viewing analysis apparatus 300 connected to the network 500, and can receive the result of the analysis of the viewing device data performed by the viewing analysis apparatus 300.
The term "viewing" in the present embodiment includes a state in which the digital content or the received digital content is received in a viewable state or a recorded state even if the user is not actually viewing the digital content. Therefore, the "viewing history" in the present embodiment includes, for example: a "reception history" in which a reception device such as an STB receives only a signal regardless of whether the signal is viewed; if the receiving apparatus is a video recorder, the recording apparatus receives a "recording history" or the like that has been reserved or recorded. Similarly, the "viewing time" in the present embodiment includes, for example: the time when the digital content is received by a receiving device such as an STB; when the receiving apparatus is a video recorder, the time of recording the digital content, and the like.
Therefore, when the receiving apparatus 10 is replaced with a receiving apparatus such as a recorder or an STB, the viewing device data includes reception log data, download history data, recording data, and the like generated by the receiving apparatus such as a recorder or an STB.
The device-specific data may be registered in the receiving apparatus 10 by the user using the remote controller 100, or may be set by the manufacturer as fixed values of the receiving apparatus 10. The device-specific data includes, for example, the IP address and identification number of the receiving apparatus 10, and demographic data (hereinafter, "demographics") registered in advance by the owner or the like. The demographics are data for classifying the users viewing the receiving apparatus 10, such as the age, sex, family composition, and living area of the user. The identification number is a number, character string, or the like for distinguishing different receiving apparatuses 10, and is composed of, for example, a combination of digits and letters.
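As a rough sketch, the viewing device data described above could be modeled as a simple record; the field names below are hypothetical, since the patent specifies only the kinds of data, not a format:

```python
from dataclasses import dataclass, field

@dataclass
class DeviceSpecificData:
    ip_address: str             # IP address of the receiving apparatus 10
    identification_number: str  # unique ID, e.g. a mix of digits and letters
    demographics: dict = field(default_factory=dict)  # age, sex, family, area

@dataclass
class ViewingDeviceData:
    device: DeviceSpecificData
    viewing_data: list = field(default_factory=list)  # viewing history entries

data = ViewingDeviceData(
    DeviceSpecificData("192.0.2.10", "A1B2C3",
                       {"age": 40, "sex": "F", "living_area": "Tokyo"}))
```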
The content server apparatus 200 is a cloud server or a networked server that provides digital content. The content server apparatus 200 is operated by, for example, a content service provider (CSP), a communications carrier, or another provider of digital content services known as over-the-top (OTT) services. The content server apparatus 200 also includes servers that provide services such as video on demand.
The viewing analysis apparatus 300 is a device that collects and stores, on the network or in the cloud, the viewing device data held by the receiving apparatus 10, analyzes the stored data to perform various viewing analyses, and outputs the analysis results, for example to the receiving apparatus 10 that requested the analysis. The viewing analysis apparatus 300 is built, for example, on a computer having a CPU, memory, and the like. It may be a personal computer, or a cloud service implemented with servers, databases, and the like.
The viewing analysis apparatus 300 may be configured to identify each of the receiving apparatuses 10 based on an identification number that the receiving apparatus 10 individually has, and perform viewing analysis on the individual receiving apparatus 10.
The voice recognition apparatus 400 is provided on the network 500 and is built, for example, on a computer having a CPU, memory, and the like; it may be a personal computer or a cloud server. The voice recognition apparatus 400 receives, via the network 500, digital data of a voice waveform (hereinafter referred to as voice data) corresponding to the user's voice picked up by a microphone or the like, estimates or recognizes the command issued by the user, and outputs the recognized command (recognition command) as text data. In the voice recognition apparatus 400, the commands that users issue to the control target may be tabulated as text data in advance; the text data acquired from the received voice data is then compared with the text data in the list, the closest command in the list is selected, and the text data of the selected command is output. However, the voice recognition method of the voice recognition apparatus 400 is not limited to this one.
The voice recognition apparatus 400 may be composed of a speech-to-text server (hereinafter, STT server) that converts voice data into a voice text (characters, words, or the like), and a remote recognition server that generates, from the tabulated commands, the command closest to the generated voice text as the recognition command. Specifically, the remote recognition server may refer to a command table registered in advance in a storage unit or the like (not shown), and output, as the recognition command, the command in the table that matches the voice text generated by the STT server or is closest to it.
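A minimal sketch of the closest-command selection described above, using simple string similarity in place of the unspecified matching method of the remote recognition server (the command list subset and the cutoff are assumptions):

```python
import difflib

# Commands tabulated in advance (illustrative subset, taken from Fig. 4).
COMMAND_LIST = ["power on", "power off", "on volume", "off volume",
                "tuner", "network", "recorder", "BS", "CS", "1ch", "2ch"]

def recognize_command(voice_text: str):
    """Return the tabulated command closest to the STT output, or None."""
    if voice_text in COMMAND_LIST:          # exact match
        return voice_text
    matches = difflib.get_close_matches(voice_text, COMMAND_LIST, n=1, cutoff=0.6)
    return matches[0] if matches else None  # nearest command, if close enough

print(recognize_command("powr off"))  # -> "power off"
```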
The network 500 is a network in which the receiving apparatus 10, the content server apparatus 200, the viewing analysis apparatus 300, and the voice recognition apparatus 400 are connected to enable communication between the apparatuses, and is, for example, the internet. The network 500 is not limited to the internet, and may be a network including a plurality of different networks if each device can communicate with each other.
Fig. 2 is a block diagram showing an example of the functional configuration of the receiving apparatus 10 according to the embodiment.
The receiving apparatus 10 is a device that performs reception processing on a signal superimposed with digital content to acquire the digital content and view it.
The broadcast signal reception processing unit 20 processes a broadcast signal, such as a digital broadcast received by the antenna 101, in accordance with the digital broadcasting specification, and acquires and outputs content data such as video, audio, and text. The specification may be, for example, the MPEG-2 TS scheme adopted in 2K digital broadcasting or the MPEG Media Transport (MMT) scheme adopted in 4K/8K digital broadcasting; both may be supported by providing a plurality of tuners. The processing according to the digital broadcasting specification includes demultiplexing, which separates the digital data stream into content data such as video, audio, and text, error correction decoding, decryption of encrypted data, and decoding of the coding applied to each kind of content data (video coding, audio coding, text coding, and the like).
The auxiliary information output unit 21 extracts information requested by the control unit 90 from the broadcast signal, and outputs the extracted information to the control unit 90. The auxiliary information output unit 21 extracts information requested from control information, program-related information, and the like included in the broadcast signal, and outputs the extracted information to the control unit 90. The information output to the control unit 90 is processed as necessary, and is used as auxiliary information in the command control unit 1.
The content processing unit 30 performs decoding processing and the like on the digital data input from the communication unit 102 with respect to the encoding processing performed by the content server apparatus 200, and outputs content data such as video, audio, and characters. Specifically, as the decoding process, the content processing unit 30 performs, for example, a demultiplexing process (separation process), an error correction code decoding process, a decoding process for encoded content data (video, text, audio, and the like), and the like.
The auxiliary information output unit 31 extracts information requested by the control unit 90 from the data output from the content server device 200, and outputs the extracted information to the control unit 90. The auxiliary information output unit 31 extracts, from the control information and the like output from the content server apparatus 200, related attribute information and the like such as the distribution provider of the content program, and outputs the extracted information to the control unit 90. The information output to the control unit 90 is processed as necessary, and is used as auxiliary information in the command control unit 1.
The recording/playing unit 40 is, for example, an optical disc player, an HDD recorder, or the like. It stores content data output from the broadcast signal reception processing unit 20 and the content processing unit 30, outputs the stored content data to the output control unit 60 in response to a playback request, and has it output as video, text, and audio from the display unit 103 and the speaker unit 104. The user views the digital content stored in the recording/playing unit 40 through the video, text, and audio output from the display unit 103 and the speaker unit 104. In the present embodiment, the recording/playing unit 40 is provided as a function of the receiving apparatus 10, but it may instead be an external device connected via the interface unit 105. When the receiving apparatus 10 includes a plurality of tuners (broadcast signal reception processing units 20), the recording/playing unit 40 may record content B received by one broadcast signal reception processing unit 20 while content A received by another broadcast signal reception processing unit 20 is being viewed.
The device data generating unit 50 collects data from the functional units (particularly, the broadcast signal reception processing unit 20, the content processing unit 30, and the video recording/playing unit 40) of the receiving apparatus 10, and generates viewing device data. The viewing device data includes device individual data and viewing data. The device-specific data is data unique to the receiving apparatus 10, and may be set in the receiving apparatus 10 by the user from a remote controller 100 or the like, for example, when the receiving apparatus 10 is installed. The identification number used in the viewing analysis may be assigned in advance, for example, at the time of purchase of the receiving apparatus 10, and may be set in the device-specific data or the like.
The viewing data is, for example, a viewing history, and may be generated sequentially by the receiving apparatus 10 each time viewing takes place. Specifically, when the broadcast signal reception processing unit 20, the content processing unit 30, or the recording/playing unit 40 is activated, the functional unit stores, for example, the channel providing the digital content, the program name (or a program ID based on it), and the date the program was viewed; when the functional unit is deactivated, it outputs the stored data to the device data generating unit 50 as viewing data. The device data generating unit 50 processes the viewing data as necessary and stores it in a storage unit (not shown) of the receiving apparatus 10. Alternatively, each functional unit (in particular, the broadcast signal reception processing unit 20, the content processing unit 30, and the recording/playing unit 40) may hold its own viewing data, and when an external device requests transmission of the viewing data, each functional unit outputs it to the device data generating unit 50, which processes it as necessary and outputs it to the external device as viewing device data.
The device data generating unit 50 outputs the generated viewing device data to the communication unit 102 in response to a request from an external apparatus (particularly, the viewing analysis apparatus 300). The viewing device data is transmitted from the communication unit 102 to the requested viewing analysis apparatus 300 via the network 500. In addition, when a change in the viewing device data is detected, the receiving apparatus 10 may transmit the viewing device data to the viewing analysis apparatus 300 at an arbitrary timing.
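The accumulation flow described above (store channel, program, and date on activation; emit a viewing-data record on deactivation) might look like the following sketch; the class and method names are hypothetical:

```python
import datetime

class ViewingLogger:
    """Per-functional-unit accumulator handing records to the device data
    generating unit 50 on deactivation (illustrative only)."""

    def __init__(self):
        self.records = []    # completed viewing-data entries
        self._current = None

    def on_activate(self, channel: str, program: str):
        # Remember what started playing and when.
        self._current = {"channel": channel, "program": program,
                         "start": datetime.datetime.now()}

    def on_deactivate(self):
        # Close the open record and keep it for the device data generating unit.
        if self._current is not None:
            self._current["end"] = datetime.datetime.now()
            self.records.append(self._current)
            self._current = None

logger = ViewingLogger()
logger.on_activate("1ch", "Program A")
logger.on_deactivate()
```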
The auxiliary information output unit 51 receives the viewing device data generated by the device data generation unit 50, the viewing data collected from the functional units of the reception apparatus 10, and the various analysis result data output by the viewing analysis apparatus 300, and outputs the received data to the control unit 90. The information output to the control unit 90 is processed as necessary, and is used as auxiliary information in the command control unit 1.
The output control unit 60 adjusts output timing, display method, and the like for the content data output from the broadcast signal reception processing unit 20, the content processing unit 30, and the video/audio playing unit 40, and outputs the content data to the display unit 103 and the speaker unit 104.
The state switching unit 62 switches the current display state (on state or off state, state being displayed, state not being displayed, or the like) of the display unit 103, the current output state (on state or off state, sound output state, state in which sound is being muted, or the like) of the speaker unit 104, and the like, in accordance with the control from the control unit 90. The control unit 90 manages the display state of the display unit 103 and the output state of the speaker 104, and the control unit 90 controls the state switching unit 62 in accordance with a control command from the remote controller 100 or the command control unit 1.
The sensor unit 70 is, for example, a camera provided on the front surface of the television, and can photograph the front surface of the television with the camera and estimate, for example, a user in front of the television from the photographed image by using image recognition or the like.
The auxiliary information output unit 71 outputs the user information estimated by the sensor unit 70 to the control unit 90. The information output to the control unit 90 is processed as necessary, and is used as auxiliary information in the command control unit 1.
The main power supply unit 80 is a power supply of the main function of the receiving apparatus 10, and is a power supply used when digital content is output to the display unit 103 and the speaker unit 104 of the television apparatus, or output is stopped (this operation is referred to as an on-off operation of the television apparatus), for example.
The state switching unit 82 switches the current state of the main power supply unit 80, that is, the on-off state of the television apparatus, in accordance with the control from the control unit 90. The control unit 90 manages the current state of the main power supply unit 80, and the control unit 90 controls the state switching unit 82 in accordance with a control command from the remote controller 100 or the command control unit 1.
The control unit 90 controls each function of the receiving apparatus 10. Specifically, the control unit 90 receives various control signals from the interface unit 105 and controls the functions of the receiving apparatus 10 accordingly. For example, when the user designates from the remote controller 100 whether to view content based on a broadcast signal or content from the content server apparatus 200, the control unit 90 receives the control signal from the remote controller via the interface unit 105 and controls the receiving apparatus 10 so as to carry out the designated function. In fig. 2, functional blocks that are not explicitly connected to the control unit 90 may still exchange data with one another.
The control unit 90 in the present embodiment receives an execution command from the command control unit 1 via the interface unit 105, and controls each function in the receiving apparatus 10 according to the execution command.
The auxiliary information management unit 91 receives the auxiliary information output by the auxiliary information output units (the auxiliary information output unit 21, the auxiliary information output unit 31, the auxiliary information output unit 41, the auxiliary information output unit 51, the auxiliary information output unit 71, and the like) in the respective functional units, and stores the auxiliary information in a storage unit (not shown). The auxiliary information management unit 91 transmits the stored auxiliary information of each functional unit from the interface 105 to the command control unit 1.
It can also be arranged that: the information stored in the auxiliary information management unit 91 is set as a predetermined item in an auxiliary information output unit in each functional unit, and is automatically output to the auxiliary information management unit 91 by the auxiliary information output unit at the time when the auxiliary information is updated in each functional unit. In addition, it is also possible to provide: the auxiliary information management unit 91 requests auxiliary information from the auxiliary information output unit of each function as necessary.
The state information management unit 92 stores state information of the state switching unit (the state switching unit 62, the state switching unit 82, and the like) in each functional unit in a storage unit (not shown). The state information management unit 92 transmits the stored state information of each function to the command control unit 1 via the interface 105. In the present embodiment, the control unit 90 always keeps the state information of the state information management unit 92 in the latest state in order to control the states of the respective functional units. Further, the information stored by the state information management unit 92 may be provided as: predetermined items are set in a state information output unit, not shown, in each functional unit, and the state information output unit automatically outputs the state information to the state information management unit 92 at the time when the state information is updated in each functional unit. Further, the state information management unit 92 may request the state information output unit of each function unit for the state information as necessary.
The time management unit 93 manages the time of the reception device 10. The clock may be provided inside the receiving apparatus 10, or the time may be extracted from a broadcast signal or the like.
The communication unit 102 performs data transmission and reception processing with various devices connected to the network 500 according to a predetermined communication protocol such as TCP/IP or UDP/IP.
The display unit 103 is, for example, a monitor for displaying video and characters. The speaker unit 104 is, for example, a speaker that outputs sound. The display unit 103 and the speaker unit 104 output the content data output from the output control unit 60 as video, text, and audio. The user views the video, text, and audio output from the display unit 103 and the speaker unit 104, thereby viewing the broadcast signal, the digital content provided from the content server apparatus 200, and the other recording/playing unit 40.
The interface unit 105 is an interface for controlling the receiving apparatus 10 from outside, for example from the remote controller 100; it receives control signals from the remote controller 100, from switches (not shown) attached to the receiving apparatus 10, and the like, and outputs them to the control unit 90 of the receiving apparatus 10. Instead of the remote controller 100, an interface that receives control signals from a terminal such as a smartphone (not shown) may be provided. The interface unit 105 may also have an interface for connecting external devices, for example an interface for connecting an external HDD recorder (not shown). In this case, the user can view digital content output from the external HDD recorder using the display unit 103 and the speaker unit 104.
The interface 105 in the present embodiment includes an interface capable of communicating with the interface unit 15 of the command control unit 1, and may be, for example, various types of serial buses. The interface 105 receives the execution command output from the command control unit 1, and outputs the execution command to the control unit 90.
Fig. 3 is a block diagram showing an example of the functional configuration of the command control unit according to the embodiment.
In the command control unit 1, the command generation unit 11 generates an execution command based on the recognition command generated by the command recognition unit 13 through voice recognition and the additional information (the auxiliary information and the status information) acquired by the auxiliary information acquisition unit 16 and the status acquisition unit 17, and outputs the execution command from the command transmission unit 18 to the outside of the command control unit 1. Hereinafter, each functional block will be described.
The instruction generating unit 11 stores combinations of identification instructions, additional information, and execution instructions as a table (hereinafter referred to as a condition combination table) in the storage unit 111. The condition combination table (list) can be set by the user in advance. It may also be built into the computer program of the instruction generating unit 11, or downloaded from an external server via the network 500, for example. When an identification instruction and additional information are input to the instruction generating unit 11, the instruction generating unit 11 generates and outputs an execution instruction by referring to the condition combination table.
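A minimal sketch of such a lookup, assuming a small illustrative table: the "power on"/"on" row follows the number = 1 example of fig. 7 discussed later, and the predicate fields stand in for arbitrary auxiliary-information conditions.

```python
CONDITION_TABLE = [
    # (identification instruction, state information, auxiliary predicate, execution instruction)
    ("power on",  "off", lambda aux: True, "power on"),
    ("power on",  "on",  lambda aux: True, "power off"),
    ("power off", "on",  lambda aux: aux.get("user") != "child", "power off"),
    ("power off", "on",  lambda aux: aux.get("user") == "child", None),  # "not to send"
]

def generate_execution_command(recognized: str, state: str, aux: dict):
    """Return the execution instruction of the first matching row; None means 'not to send'."""
    for rec, st, pred, execution in CONDITION_TABLE:
        if rec == recognized and st == state and pred(aux):
            return execution
    return None

print(generate_execution_command("power on", "on", {"user": "adult"}))  # -> "power off"
```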
The sound detection unit 12 is, for example, a microphone; it receives the sound uttered by the user, acquires sound data of the received sound waveform by sampling, and outputs the sound data.
The command recognition unit 13 performs voice recognition on the voice data output from the sound detection unit 12, estimates or recognizes the command issued by the user, and outputs the recognition command as text data. The command recognition unit 13 may include an internal voice data processing unit (voice recognition engine), or may acquire a recognition command by transmitting the voice data to the voice recognition apparatus 400 on the network 500, which includes such an engine. Using the voice recognition apparatus 400 can reduce the manufacturing cost of the command recognition unit 13. In addition, since the voice recognition apparatus 400 obtains voice data from many receiving apparatuses 10, it benefits from a learning effect that can improve recognition performance. On the other hand, because the voice recognition apparatus 400 is reached via the network 500, its processing delay is larger; the internal voice recognition engine of the command recognition unit 13 has the advantage of a smaller processing delay. Although fig. 3 shows an example in which the sound detection unit 12 (microphone) is built in, a smart speaker or the like may also be used, connected, for example, to the interface unit 105. Specifically, a smart speaker equipped with a voice recognition engine may generate a recognition command, and the command control unit 1 (or the command generating unit 11) may acquire that recognition command via the interface unit 105 and use it to generate an execution command.
The user interface unit 14 is an interface capable of inputting characters, such as a keyboard of a personal computer. Specifically, the user sets the condition combination table in the storage unit 111 from the user interface unit 14. The condition combination table may be set in the storage unit 111 using the remote controller 100, a smartphone, or the like. Further, the command generating unit 11 may download the condition combination table to the storage unit 111 from a server, not shown, provided outside the receiving apparatus 10 and providing the condition combination table.
The interface unit 15 performs various data communications with the interface unit 105. Specifically, the interface unit 15 receives the assist information and the status information from the interface unit 105. The interface unit 15 sends the execution command generated by the command generation unit 11 to the interface unit 105.
The auxiliary information acquiring unit 16 acquires auxiliary information from the auxiliary information managing unit 91 via the interface unit 15. The auxiliary information acquiring unit 16 may output an auxiliary information acquisition request to the auxiliary information managing unit 91 and acquire the auxiliary information from the auxiliary information managing unit 91, for example, in accordance with the timing at which the identification command is output from the command identifying unit 13.
The status acquisition unit 17 acquires status information from the status information management unit 92 via the interface unit 15. For example, the state information acquiring unit 17 may output a state information acquisition request to the state information managing unit 92 and acquire the state information from the state information managing unit 92, based on the time when the command recognizing unit 13 outputs the recognition command or the time when the command generating unit 11 receives the recognition command.
The command transmission unit 18 transmits the execution command generated and output by the command generation unit 11 from the interface unit 15 to the control unit 90. The execution command can also be transmitted to a device on the network 500 via the communication unit 102.
The control unit 19 controls each functional block in the command control unit 1. In fig. 3, data (and control) may also be exchanged between functional blocks that are not explicitly connected to the control unit 19.
The sound analysis unit 121 performs, for example, frequency analysis (voiceprint analysis) on the sound data output from the sound detection unit 12, and estimates attributes of the sound source, such as whether the speaker is a child, and the speaker's age and sex, to obtain a sound source estimation result. The sound analysis unit 121 processes the sound source estimation result as necessary and outputs it to the auxiliary information acquisition unit 16 as auxiliary information. The sound source estimation result can also be used to decide whether to activate the command generation unit 11, the command recognition unit 13, the command transmission unit 18, and the like. For example, when the estimation result indicates that the user who uttered the sound is a child, the sound analysis unit 121 may output the result to the control unit 19 so that no execution instruction is output, and the control unit 19 may enable or disable the functions of the instruction generation unit 11, the instruction recognition unit 13, and the instruction transmission unit 18 accordingly.
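A sketch of this gating, assuming a stand-in classifier for the voiceprint analysis (which the patent does not specify in detail):

```python
def estimate_speaker(sound_data: bytes) -> dict:
    """Stand-in for the frequency (voiceprint) analysis of the sound
    analysis unit 121; always returns a fixed result here."""
    return {"age_class": "adult", "sex": "unknown"}

def should_generate_command(attrs: dict) -> bool:
    # Example policy from the text: suppress execution instructions
    # when the speaker is judged to be a child.
    return attrs.get("age_class") != "child"

if should_generate_command(estimate_speaker(b"\x00\x01")):
    pass  # proceed: voice recognition by the command recognition unit 13
```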
Fig. 4 is a diagram showing an example of a control target part and a state of the control target part that can be controlled by a voice command in the embodiment.
From the left, the columns show "number", "control target portion", "available state value", and "voice command". The "control target portion" is a part of the receiving apparatus 10 that can be controlled by a voice command. The "available state value" lists the states the "control target portion" can take. The "voice command" is the control command the user utters to switch the "control target portion" to a given state value.
"number" =1 indicates that the "available state value" of the main power supply unit 80 as the "control target portion" of the receiving apparatus 10 is 2 states of on and off. In this case, the "voice command" for switching the state value is "power on" when the state value is turned on, and is "power off" when the state value is turned off. That is, the user utters "power on" to turn on the power of the television apparatus, and the user utters "power off" to turn off the power of the television apparatus.
"number" =2 indicates that the "available state value" of the state switching unit 62 of the output control unit 60 as the "control target portion" of the reception apparatus 10 is 2 states of on and off. That is, the user pronounces "on volume" in order to turn on the volume of the television apparatus so that sound is output from the speaker section 104, and the user pronounces "off volume" in order to turn off the volume of the television apparatus so that sound is not output from the speaker section 104 (mute).
"number" =3 denotes a content output device in a state where digital content is being output to the display unit 103 or the speaker unit 104. The content output device is a general term for the broadcast signal reception processing unit 20, the content processing unit 30, and the recording/playing unit 40. Specifically, "tuner" in the column of "available state value" corresponds to the broadcast signal reception processing unit 20, "network" corresponds to the content processing unit 30, and "recorder" corresponds to the video recording/playing unit 40. For example, the user utters "tuner" in order to set the broadcast signal reception processing unit 20 to a reception state and cause the display unit 103 and the speaker unit 104 to output content based on the broadcast signal. The user utters "network" in order to cause the display unit 103 and the speaker unit 104 to output the content from the content server apparatus 200 in order to cause the content processing unit 30 to be in the reception state. The user utters "recorder" so that the video/audio playing unit 40 is in a playing state and the display unit 103 and the speaker unit 104 output.
"number" =4 is a tuner in a state where digital content is being output to the display unit 103 or the speaker unit 104. The BS, CS, and terrestrial numbers of the "desirable state value" indicate tuners of BS digital broadcasting, CS digital broadcasting, and terrestrial digital television broadcasting (terrestrial digital broadcasting), respectively. For example, the user utters "BS" in order to put the BS digital broadcast into a reception state and cause the display unit 103 and the speaker unit 104 to output the content of the BS digital broadcast. In order to set the CS digital broadcast to a reception state and cause the display unit 103 and the speaker unit 104 to output the content of the CS digital broadcast, the user utters "CS". In order to put the terrestrial digital broadcast into a receiving state and cause the display unit 103 and the speaker unit 104 to output the contents of the terrestrial digital broadcast, the user pronounces "terrestrial digital".
The "number" =5 is a channel (program) which the content output apparatus is outputting to the display unit 103 and the speaker unit 104. 1ch, 2ch, and 3ch of the "available state value" indicate channel numbers. For example, when 1ch is set to the reception state and the display unit 103 and the speaker unit 104 output a program of 1ch, the user utters "1ch". The same is true for other channels. In addition, in general, a channel may also be defined for each content output device and for each tuner. For example, when the content output device is the broadcast signal reception processing unit 20, a channel number may be assigned to each of the tuners "BS", "CS", and "terrestrial digital". In addition, a unique channel number may be uniformly assigned to all channels of the content output apparatus. In this case, the type of content output device and the type of tuner can be specified based on the channel number.
The "desirable state value" may be a state of "regression (home)". "regression" is a state value that can be set by the user.
For example, in the example of No. 3, a case will be described in which the "tuner" is set to the "regression" state value of the content output device as the control target portion. For example, when the user views a program on the "network" of the content output apparatus, that is, when the current state of the content output apparatus is "network", the user issues a voice command to the "network" and the like, and the command recognition unit 13 recognizes and generates the "network" as a recognition command. When the identification command is input to the command generation unit 11, the command generation unit 11 generates and transmits an execution command "tuner" for switching the state to "tuner" which is "return", and switches the content output device to "tuner".
For example, in the example of No. 4, a case will be described in which "terrestrial digital" is set as a "regression" state value of the tuner as the control target portion. For example, when the user views the tuner of the "BS", that is, when the current state of the tuner is "BS", the user issues a sound of "BS" as a voice command, and the command recognition unit 13 recognizes and generates "BS" as a recognition command. When the identification command is input to the command generation unit 11, the command generation unit 11 generates and transmits an execution command "terrestrial digital" for switching the state to "terrestrial digital" which is "return", and switches the tuner to "terrestrial digital".
In the example of reference numeral 5, "1ch" is set as the "regression" state value of the channel as the controlled part. For example, when the user views a channel of "2ch", that is, when the current state of the channel is "2ch", the user utters "2ch" as a voice command, and the command recognition unit 13 recognizes and generates "2ch" as a recognition command. When the recognition command is input to the command generation unit 11, the command generation unit 11 generates and transmits an execution command "1ch" for switching the state to "1ch" which is "return", and switches the channel to "1ch".
The above-described setting of the "return" state value may be set by the user, or may be automatically set to a state value before the power supply of the television device was turned off in the previous time. Further, as the voice command, "return" may be provided. For example, when the user utters the voice command as "regression" and generates "regression" as the recognition command, the state may be shifted to the state value set as "regression".
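A minimal sketch of the "return" behavior, with hypothetical home values matching the examples above: uttering the name of the state that is already active switches the target back to its preset "return" value.

```python
HOME_VALUES = {"content_output": "tuner",
               "tuner": "terrestrial digital",
               "channel": "1ch"}

def resolve_execution_command(target: str, recognized: str, current: str) -> str:
    """If the recognized command names the already-active state, switch to
    the preset 'return (home)' value; otherwise switch to what was said."""
    if recognized == current:
        return HOME_VALUES[target]
    return recognized

print(resolve_execution_command("channel", "2ch", "2ch"))  # -> "1ch"
print(resolve_execution_command("channel", "3ch", "2ch"))  # -> "3ch"
```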
Fig. 5 is a diagram showing an example of auxiliary information that can be used in the present embodiment.
The "number", "information source", "auxiliary information", and "information value" are shown from the left column. The "information source" is a transmission source of the "auxiliary information" or an extraction source (extracted part) of the "auxiliary information". The "auxiliary information" is information used when the instruction control unit 1 generates an execution instruction in the present embodiment. The "information value" is a value of "auxiliary information".
"number" =1 denotes that the time management unit 93 of the reception apparatus 10 as the "information source" includes the time in the auxiliary information.
"number" =2 denotes an attribute of a source of a voice obtained by performing voice analysis by the voice analysis unit 121 of the reception apparatus 10 as an "information source", and the attribute of the source of the voice is included in the auxiliary information. The attribute of the sound transmission source is, for example, the age of the transmission source, the age-defined category of adults and children, and user identification information, which is the age-defined category of babies, kindergarten children, pupils, middle school students, high school students, college students, social people, or the like.
"number" =3 denotes an attribute that specifies user identification information from a voice recognition result obtained by performing voice recognition by the instruction recognition unit 13 of the receiving apparatus 10 as the "information source" and includes the user identification information in the auxiliary information.
"numbers" =4, 5, and 6 respectively indicate that the intensity of the sound, the height of the sound, and the speed of the sound are provided as auxiliary information in accordance with the intensity of the sound, the height of the sound, and the speed of the sound, which are obtained by performing sound analysis by the sound analysis unit 121 of the receiver apparatus 10 serving as the "information source".
The "number" =7 denotes that the sensor unit 70 (for example, a camera) of the receiving apparatus 10 as the "information source" specifies the user identification information from the image identification result acquired by the camera, and the auxiliary information includes the attribute of the user identification information.
The "numbers" =8 to 11 indicate "broadcast provider", "program broadcast time zone", "program type", and "program name" extracted from the broadcast signal by the broadcast signal reception processing unit 20 of the reception device 10 as the "information source" included in the auxiliary information, respectively. In the case of "broadcast operator", for example, the names of operators such as operator a, operator B, and operator C are used as the values of the auxiliary information. In the case of the "program broadcasting time zone", the broadcasting time zone is taken as a value of the auxiliary information. In the case of the "program genre", the value of the auxiliary information is, for example, the genre of a program such as drama, sports, music, news, education, and emergency information. In the case of "program name", for example, the names of programs such as program a, program B, and program C are used as the values of the auxiliary information.
The "number" =12, and the auxiliary information includes a demographic obtained from an analysis result provided by the viewing analysis device 300 as an "information source".
The "numbers" =13 to 16 indicate "broadcast provider", "program broadcast time zone", "program type", and "program name" extracted from broadcast waves when a program is recorded in a recording/playing section 40 (which may further include an external recorder or the like) of the receiving apparatus 10 as an "information source" included in the auxiliary information.
"numbers" =17 to 20 indicate "distribution provider", "program broadcast time zone", "program type", and "program name", respectively, which are extracted from information accompanying the content transmitted from the content server apparatus 200 of the receiving apparatus 10 as the "information source", and are included in the auxiliary information. The auxiliary information may include a user who recorded the digital content (particularly, a recording manager).
(embodiment 1)
The present embodiment shows an operation example of controlling the on/off state of the main power supply of the receiving apparatus based on a voice command uttered by the user and on additional information.
Fig. 6 is a flowchart showing an example of processing operation for the command control unit to receive the voice command and transmit the execution command in the embodiment.
The user utters the voice command "power on" or "power off" to control the on/off state of the main power supply of the receiving apparatus 10. The sound detection unit 12 detects the sound (Yes in step S11); if it does not detect a sound, it waits until one is detected (No in step S11). Whether a sound is detected may be determined, for example, by setting a threshold for the intensity of the input sound: if the intensity is at or above the threshold, a sound is judged to have been detected; if it is below the threshold, no sound is judged to have been detected.
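A sketch of that threshold rule (the intensity measure and threshold value are assumptions; the patent only requires comparing intensity against a threshold):

```python
THRESHOLD = 500.0  # hypothetical intensity units

def rms(samples):
    """Root-mean-square intensity of one buffer of audio samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def sound_detected(samples) -> bool:
    # At or above the threshold: detected; below: treated as silence/noise.
    return rms(samples) >= THRESHOLD

print(sound_detected([600.0, -580.0, 610.0]))  # -> True
```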
The sound detection unit 12 outputs the sound data of the received sound to the sound analysis unit 121, and the sound analysis unit 121 performs sound analysis processing on the received sound data (step S12). Specifically, the sound analysis unit 121 performs frequency analysis (voiceprint analysis) on the sound data to estimate the attribute of the sound source.
When the sound analysis unit 121 outputs the estimation result to the control unit 19, the control unit 19 determines, based on the estimation result, whether to generate a recognition command (step S13). The judgment rule for whether to generate a recognition command may be set in the storage unit 111 by the user from the user interface unit 14, for example. The control unit 19 accesses the storage unit 111 and refers to this judgment rule. When the control unit 19 determines that a recognition command is to be generated, it causes the command recognition unit 13 to perform voice recognition on the voice data (Yes in step S13). When it determines that no recognition command is to be generated, it controls the command recognition unit 13 not to perform voice recognition, and the unit stands by until the next sound is detected (No in step S13; S11). When determining that no recognition instruction is to be generated, the control unit 19 may also have the display unit 103 and the speaker unit 104 output a message such as "no recognition instruction was generated" to notify the user.
When the command recognition unit 13 is instructed by the control unit 19 to generate a recognition command, it acquires the recognition command as text data by voice recognition and outputs it to the command generation unit 11 (step S14). Step S14 may be configured so that the command recognition unit 13 first attempts to acquire a recognition command with its built-in voice recognition engine, and only if that recognition fails does it acquire the recognition command from the external voice recognition device 400, which has a higher recognition rate.
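A sketch of that two-stage arrangement, with stand-ins for both engines (the built-in engine here always fails, to show the fallback):

```python
def recognize_local(sound_data: bytes):
    return None  # stand-in: the built-in engine failed to recognize

def recognize_remote(sound_data: bytes):
    return "power on"  # stand-in for the voice recognition device 400

def recognize(sound_data: bytes):
    """Try the low-latency built-in engine first; fall back to the
    higher-recognition-rate engine on the network only on failure."""
    result = recognize_local(sound_data)
    if result is None:
        result = recognize_remote(sound_data)
    return result

print(recognize(b"..."))  # -> "power on"
```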
Upon receiving the identification command from the command identification unit 13, the command generation unit 11 refers to the condition combination table in the storage unit 111.
Fig. 7 is a diagram showing an example of the execution instruction generation conditions (combination condition table) set in the instruction control unit for controlling the main power supply in the embodiment.
In this example of the combination condition table, the columns from the left show "number", "execution instruction", "identification instruction", "status information", and "auxiliary information". Each row specifies the "execution instruction" to be generated when the "identification instruction", "status information", and "auxiliary information" conditions written in that row are met. The "identification instruction" is the recognition command generated and output by the command recognition unit 13. The combination condition table may be set in the storage unit 111 by the user from the user interface unit 14, for example. The table shown in fig. 7 is an example; further auxiliary information items (columns) may be added.
Returning to fig. 6, the processing operation of the command control unit 1 will be described. When, for example, "power on" is input as the "identification command" from the command identification unit 13 to the command generation unit 11, the command generation unit 11 refers to the combination condition table and instructs the auxiliary information acquisition unit 16 and the state information acquisition unit 17 to acquire the "auxiliary information" and "status information" listed as column entries of the table. The auxiliary information acquiring unit 16 requests "time", "intensity of sound", and "user identification information", the auxiliary information items in the combination condition table, from the auxiliary information managing unit 91, and outputs the acquired information to the command generating unit 11 (step S15). The state information acquisition unit 17 recognizes from the "identification command" "power on" that the control target is the main power supply unit 80, acquires the state information of the main power supply unit 80 from the state information management unit 92, and outputs it to the command generation unit 11 (step S16).
The command generation unit 11 then searches the combination condition table for a row whose conditions match the acquired auxiliary information, status information, and recognition command. When the matching row contains a command in its "execution command" column, that command is generated as the execution command and output to the command transmission unit 18 (yes in steps S17 and S18; S19). On the other hand, when the "execution command" entry of the matching row is "not to send", as at number 8 in fig. 7, no execution command is generated and the command control unit 1 waits for the next voice to be detected (no in steps S17 and S18). In that case the command control unit 1 may send the control unit 90 a message stating that no execution command was transmitted or generated, and the control unit 90 may display the message on the display unit 103 or output its content as sound from the speaker unit 104 to notify the user.
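Steps S17 to S19 thus amount to a first-match lookup over the rows of the combination condition table. The following Python sketch illustrates the idea under two assumptions of this illustration alone: rows are held as dictionaries, and a "-" cell matches any value. Only the number = 1 row is taken from fig. 7; the second row is a hypothetical counterpart.

    ANY = "-"  # a '-' cell matches any value, as in Fig. 7

    # Hypothetical excerpt of the Fig. 7 table for the main power supply;
    # 'execute' of None stands for the "not to send" entry.
    COMBINATION_TABLE = [
        # number 1: the recognized command repeats the current state -> toggle off
        {"recognition": "power on", "state": "on",  "aux": {}, "execute": "power off"},
        # assumed counterpart row: the state differs, so the command is used as-is
        {"recognition": "power on", "state": "off", "aux": {}, "execute": "power on"},
    ]

    def generate_execution_command(recognition, state, aux):
        """First-match lookup of steps S17-S19; None means nothing is sent."""
        for row in COMBINATION_TABLE:
            if row["recognition"] != recognition or row["state"] != state:
                continue
            # Every auxiliary condition of the row must hold; ANY matches all.
            if all(v == ANY or aux.get(k) == v for k, v in row["aux"].items()):
                return row["execute"]
        return None  # no matching row: wait for the next voice input (step S11)

    print(generate_execution_command("power on", "on", {"time": "day"}))  # power off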
Alternatively, when the "execution command" entry of the matching row is "not to send", the command control unit 1 may generate and transmit an "execution command for maintaining the current state". For example, when voiceprint analysis of the voice command by the sound analysis unit 121 supplies the command generation unit 11 with user identification information indicating that the speaker is a "child", the "execution command for maintaining the current state" may be generated and transmitted from the auxiliary information alone, without using the recognition command. In this case the command generation unit 11 may send the control unit 90 a message such as "this operation is not possible", and the control unit 90 may display the message on the display unit 103 or output its content as sound from the speaker unit 104 to notify the user.
When the execution command is input, the command transmission unit 18 transmits it to the control unit 90 via the interface unit 15 (step S101). The control unit 90 controls the function of the control target in accordance with the received execution command.
When an execution command is generated or transmitted, the command control unit 1 may send the control unit 90 a message stating that the execution command was transmitted. When the generated execution command differs from the recognition command, the command control unit 1 may send a message stating that the execution command differs from the recognition command. In either case a message may be displayed on the display unit 103, or sound containing the message content may be output from the speaker unit 104, to notify the user. With these settings the user can confirm whether the uttered voice command was executed as intended.
Details of the combination condition table will be described again with reference to fig. 7.
For number = 1, the auxiliary information is "-", meaning that no auxiliary information is used; this is an example in which the command generation unit 11 generates the execution command from the status information and the recognition command alone. In this example, when the "recognition command" is "power on" and the "status information" is "on", the execution command is "power off". Numbers 2 to 4 are analogous and their description is omitted.
Number = 5 is the following example: when the "recognition command" is "power on", the "status information" is "on", the first auxiliary item "time" is "day", and the second auxiliary item "intensity of sound" is "strong", the execution command is "not to send (or not to generate)". Here the value of the "time" auxiliary item is "day": the auxiliary information acquisition unit 16 may, for example, hold a correspondence between clock time and time classification values (morning, day, night), determine the classification value for the acquired time, and output that value ("day" in the number = 5 case) to the command generation unit 11. The same holds for the second auxiliary item, "intensity of sound": the auxiliary information acquisition unit 16 may hold a correspondence between the intensity input from the sound analysis unit 121 and an intensity classification (strong/weak), determine the classification value for the received intensity, and output that value ("strong" in the number = 5 case) to the command generation unit 11. Numbers 6 to 12 are analogous and their description is omitted. By using "time" and "intensity of sound" as auxiliary information, as in numbers 5 to 12, execution commands can be generated with allowance for situations in which ambient noise rises during the "day" period, the accuracy of the voice recognition engine deteriorates, and the recognition command differs from the user's voice command.
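The mapping from raw auxiliary values to classification values described above can be realized as a simple threshold table. In the Python sketch below, the hour boundaries and the intensity threshold are illustrative assumptions; the text does not specify them.

    def classify_time(hour):
        """Map a clock hour to a time classification value; the boundary
        hours here are illustrative assumptions, not taken from the text."""
        if 6 <= hour < 9:
            return "morning"
        if 9 <= hour < 18:
            return "day"
        return "night"

    def classify_intensity(level_db, threshold_db=60.0):
        """Map a measured sound level to the strong/weak classification;
        the 60 dB threshold is likewise an assumption."""
        return "strong" if level_db >= threshold_db else "weak"

    print(classify_time(14), classify_intensity(72.5))  # day strong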
Number = 13 is the following example: when the "recognition command" is "power on", the "status information" is "on", the first auxiliary item "time" is "night", and the second auxiliary item "user identification information" is "adult", the execution command is "power off". As with number = 5, when for example the age of the speaker obtained by voiceprint analysis in the sound analysis unit 121 is input to the auxiliary information acquisition unit 16, that unit may hold a correspondence between age and age classification values (adult, child), determine the classification value for the received age, and output that value ("adult" in the number = 13 case) to the command generation unit 11. Numbers 14 to 20 are analogous and their description is omitted. Using "time" and "user identification information" as auxiliary information, as in numbers 13 to 20, is useful when voice recognition accuracy does not deteriorate at "night" because ambient noise is low, but the user nevertheless wants to restrict the generation and transmission of execution commands at "night".
As described above, execution commands can be generated using the combination condition table of fig. 7 following the flowchart of fig. 6.
Consider the condition at number = 1 of the combination condition table of fig. 7, where the current state of the main power supply (the "status information") is "on". In that situation the user is unlikely to issue a voice command such as "power on"; it is far more likely that the user wants to turn the power off, the opposite of the current "on" state. If, however, a "power off" voice command is misrecognized as "power on" by the voice recognition engine of the command recognition unit 13 because of ambient noise or the like, the main power supply of the reception apparatus 10 cannot be turned off even though the user said "off", and the user's frustration grows. According to the present embodiment, the command control unit 1 can prevent this situation by using the number = 1 condition to interpret the recognition command in light of the status information of the reception apparatus 10. Likewise, when numbers 2 to 4 apply, if the recognition command is a control command that would switch the current state of the main power supply (the "status information"), the control command of the execution command is made to coincide with the recognition command; if instead the recognition command names the same state as the "status information", the control command of the execution command becomes the opposite of, and thus different from, the recognition command. The reasoning is that the voice recognition engine is particularly prone to confusing "on" and "off", while there is almost no chance that the user issues a voice command asking for the state the control target is already in; interpreting recognition results this way therefore assists voice recognition of voice commands.
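The rule embodied by numbers 1 to 4, namely that a recognized command matching the current state is reinterpreted as its opposite, can be written compactly. A minimal Python sketch, assuming a two-state on/off control target:

    def interpret_power_command(recognized, current_state):
        """Rule of numbers 1-4: a recognized on/off command that would leave
        the state unchanged is treated as a misrecognition and inverted."""
        requested = "on" if recognized == "power on" else "off"
        if requested == current_state:
            # Users rarely ask for the state the device is already in, so
            # assume the opposite command was actually spoken.
            return "power off" if current_state == "on" else "power on"
        return recognized  # the command switches the state: use it as-is

    print(interpret_power_command("power on", "on"))   # power off
    print(interpret_power_command("power on", "off"))  # power on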
The combination condition table of fig. 7 lists the condition combinations specifying execution commands as numbers 1 to 20, but these combinations may also be given priorities, with the command generation unit 11 consulting the table in priority order. Priority may simply follow the row order from number 1 to number 20 as in fig. 7; in that case numbers 1 to 4 take precedence, so the command generation unit 11 does not use the auxiliary information even when it has been acquired. Alternatively, a column indicating priority order may be added to the table of fig. 7 so that the user can set a priority for the condition combination of each row.
When there are several items of auxiliary information, priorities may also be assigned among them. For example, in the combination condition table of fig. 7 the auxiliary items could be given descending priority in the order "time", "intensity of sound", "user identification information". The command generation unit 11 then consults condition combinations containing higher-priority auxiliary items first. Concretely, the user might give the highest priority to condition combinations using only "time", next to those using only "time" and "intensity of sound", and then to those using only "time" and "user identification information".
The user may instead set a score expressing the degree of priority for each of "time", "intensity of sound", and "user identification information". The command generation unit 11 then totals the scores of the auxiliary items appearing in each condition combination and gives higher priority to combinations with a higher total. In this scheme the user only sets the per-item scores, and the command generation unit 11 computes the resulting priority order itself.
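One way to realize this score-based ordering is to total the per-item scores over the auxiliary items each row uses and sort the rows by that total. In the Python sketch below, the score values are placeholders a user might set:

    # Hypothetical user-assigned scores; a higher score means more important.
    SCORES = {"time": 4, "intensity of sound": 2, "user identification": 1}

    def row_priority(row):
        """Total score of the auxiliary items a table row uses."""
        return sum(SCORES.get(item, 0) for item in row["aux"])

    rows = [
        {"number": 5,  "aux": ["time", "intensity of sound"]},
        {"number": 13, "aux": ["time", "user identification"]},
        {"number": 1,  "aux": []},
    ]
    # The command generation unit would consult higher-scoring rows first.
    rows.sort(key=row_priority, reverse=True)
    print([r["number"] for r in rows])  # [5, 13, 1]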
The condition combinations in the combination condition table should preferably be set so that no two rows have substantially the same conditions. If rows with identical conditions do exist, a rule may be fixed in advance, for example that the command generation unit 11 selects the first such row in table order.
(embodiment 2)
This embodiment shows the operation of the command control unit 1 when the on/off state of the volume output from the speaker unit 104 of the receiving apparatus 10 is controlled using the voice command issued by the user, the status information, and the program type and user identification information as auxiliary information. The main power supply is assumed to be on.
Fig. 8 is a diagram showing an example of the execution command generation conditions set in the command control unit for controlling the volume in embodiment 2. The processing by which the command control unit 1 receives a voice command and transmits an execution command is again described along fig. 6; portions identical to embodiment 1 are not repeated.
To control the on/off state of the volume output of the speaker unit 104 of the receiving apparatus 10, the user utters the voice command "volume on" or "volume off". The sound detection unit 12 detects the voice (yes in step S11). Based on the result of the sound analysis processing, the control unit 19 determines that a recognition command is to be generated and causes the command recognition unit 13 to perform voice recognition on the voice data (yes in step S13). The command recognition unit 13 acquires the recognition command as text data by voice recognition and outputs it to the command generation unit 11 (step S14).
Upon receiving the recognition command from the command recognition unit 13, the command generation unit 11 refers to the combination condition table in the storage unit 111. When "volume on" is input as the recognition command, the command generation unit 11 refers to the table and instructs the auxiliary information acquisition unit 16 and the status information acquisition unit 17 to acquire the "auxiliary information" and "status information" columns, respectively. The auxiliary information acquisition unit 16 requests "program type" and "user identification information" (the auxiliary items of the table) from the auxiliary information management unit 91 and outputs the acquired values to the command generation unit 11 (step S15). The status information acquisition unit 17 recognizes from the recognition command "volume on" that the control target is the speaker unit 104, acquires the status information of the speaker unit 104 from the status information management unit 92, and outputs it to the command generation unit 11 (step S16).
Referring to the combination condition table, when the row matching the acquired auxiliary information, status information, and recognition command contains a command in its "execution command" column, the command generation unit 11 generates that command as the execution command and outputs it to the command transmission unit 18 (yes in steps S17 and S18; S19). When the "execution command" entry is "not to send", no execution command is generated and the command control unit 1 waits for the next voice to be detected (no in steps S17 and S18). When an execution command is input, the command transmission unit 18 transmits it from the interface unit 15 to the control unit 90 (step S101).
Details of the combination condition table will be described with reference to fig. 8.
For number = 1, the "status information" is "on", so the current volume is in the output state. The number = 1 condition combination is the following example: when the "recognition command" is "volume on", the first auxiliary item "program type" is "emergency information", and the second auxiliary item "user identification information" is "-" (unconditional), the command generation unit 11 produces "not to send (or not to generate)" as the execution command. Numbers 2 to 4 are analogous and their description is omitted. Using "program type" as auxiliary information, as in numbers 1 to 4, accommodates the expectation that the volume should be ON in every situation while the "program type" is "emergency information". For example, when an earthquake or similar event is reported in the broadcast signal received by the broadcast signal reception processing unit 20 and the volume happens to be off, the user will hurriedly utter a voice command to turn the volume on. In such an emergency, however, the ambient noise is likely to be loud, and the user's voice command is likely to be misrecognized. Therefore, whenever "emergency information" is acquired as the "program type" auxiliary information by the auxiliary information acquisition unit 16, the command generation unit 11 always turns the volume on, regardless of the received "recognition command" and the "status information" of the control target.
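This emergency override can sit as a guard in front of the ordinary table lookup. A Python sketch under the assumption that the normal lookup is supplied as a callable (the function names are illustrative):

    def decide_volume_command(recognition, volume_state, program_type, table_lookup):
        """Numbers 1-4 of Fig. 8 as a guard: while emergency information is
        broadcast, the resulting state must be volume-on, whatever was heard."""
        if program_type == "emergency information":
            # Volume already on: send nothing (number 1 is "not to send").
            # Volume off: force it on despite any possible misrecognition.
            return None if volume_state == "on" else "volume on"
        return table_lookup(recognition, volume_state)  # ordinary path

    # Stub lookup standing in for the normal combination-table search:
    print(decide_volume_command("volume off", "off", "emergency information",
                                lambda r, s: r))  # volume on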
"number" =5 to 12 are examples of using "user identification information" as the auxiliary information. For example, when a child wants to avoid watching digital content with the receiving apparatus 10 during daytime hours, a combination of conditions such as "number" =5 to 12 can be used.
As described above, by setting up the combination condition table as in this embodiment, the user can cover a variety of use cases and apply command control to voice commands.
(embodiment 3)
This embodiment shows the operation of the command control unit 1 when the receiving apparatus 10 controls the digital content output to the display unit 103 and the speaker unit 104, using the voice command issued by the user, the status information, and the user identification information and time as auxiliary information. The main power supply is assumed to be on. This embodiment also shows an example in which two execution commands are output for one voice command.
Fig. 9 is a diagram showing an example of the execution command generation conditions set in the command control unit to control the reception of digital content in embodiment 3.
The processing by which the command control unit 1 receives a voice command and transmits an execution command is described along the flowchart of fig. 6; portions identical to embodiment 1 are omitted.
Assume the user utters one of the voice commands "network", "tuner", or "recorder" to control which digital content is output to the display unit 103 and the speaker unit 104 of the receiving apparatus 10. The sound detection unit 12 detects the voice (yes in step S11). After steps S12 and S13, the command recognition unit 13 acquires the recognition command as text data by voice recognition and outputs it to the command generation unit 11 (step S14).
When the recognition command is input from the command recognition unit 13, the command generation unit 11 refers to the combination condition table in the storage unit 111 and instructs the auxiliary information acquisition unit 16 and the status information acquisition unit 17 to acquire the "auxiliary information" and "status information" columns of the table, respectively. The auxiliary information acquisition unit 16 requests "user identification information" and "time" (the auxiliary items of the table) from the auxiliary information management unit 91 and outputs the acquired values to the command generation unit 11 (step S15). The status information acquisition unit 17 recognizes from the recognition command that the control target is the content output device, acquires its status information from the status information management unit 92, and outputs it to the command generation unit 11 (step S16).
The command generation unit 11 generates the execution commands through steps S17 to S19 and outputs them to the command transmission unit 18, which transmits them to the control unit 90 (step S101). In this embodiment two execution commands are generated: a first execution command for the content output device and a second execution command for the program genre.
Details of the combination condition table will be described with reference to fig. 9.
For number = 1, the "status information" of the content output device is "tuner", indicating that the "tuner" is currently being output to the display unit 103 and the speaker unit 104. The number = 1 condition combination is the following example: when the recognition command received by the command generation unit 11 is "network" and the first auxiliary item, user identification information, is "adult", the command generation unit 11 generates an execution command for outputting content from the "network", one of the content output devices, to the display unit 103 and the speaker unit 104. Numbers 2 and 3 are analogous and their description is omitted.
For number = 4, the "status information" of the content output device is again "tuner". The number = 4 condition combination is the following example: when the recognition command received by the command generation unit 11 is "network", the first auxiliary item, user identification information, is "child", and the time is "day", the command generation unit 11 neither generates nor transmits an execution command, so that content from the "network", one of the content output devices, is not output to the display unit 103 and the speaker unit 104. The number = 4 situation can be used, for example, when one wants to prevent a child from viewing digital content from the "network" on the receiving apparatus 10 during daytime hours.
For number = 9, the "status information" of the content output device is "recorder", indicating that the "recorder" is currently being output to the display unit 103 and the speaker unit 104. In the number = 9 condition combination, when the recognition command received by the command generation unit 11 is "network", the first auxiliary item, user identification information, is "child", and the second auxiliary item, time, is "night", the command generation unit 11 generates two execution commands: "network" and, as the execution command for the program genre, "education". The two generated execution commands are transmitted from the command transmission unit 18 to the control unit 90, and an "education" program from the distribution service on the "network" is output to the display unit 103 and the speaker unit 104. The number = 9 situation can be used, for example, when a child wanting to watch the "network" on the receiving apparatus 10 at night is to be limited to digital content whose program genre is "education".
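A row such as number 9 carries two entries in its execution-command column; representing that column as a list lets one voice command fan out into several control commands. A Python sketch using the number = 9 values quoted above (the dictionary layout is an assumption of this illustration):

    # Number 9 of Fig. 9, with the execution column holding a list of commands.
    ROW_9 = {
        "recognition": "network", "state": "recorder",
        "aux": {"user identification": "child", "time": "night"},
        # first command: content output device; second: program genre
        "execute": ["network", "education"],
    }

    def commands_to_transmit(row):
        """Expand a matched row into the execution commands to send in turn."""
        return list(row["execute"])

    for command in commands_to_transmit(ROW_9):
        print("send to control unit 90:", command)  # network, then education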
As a further example, suppose the user is watching BS, that is, the status information of the control target part "tuner" is "BS". When the user utters "BS" or the like as a voice command, the command recognition unit 13 generates "BS" as the recognition command and outputs it to the command generation unit 11. On receiving this recognition command, the command generation unit 11 generates and transmits the execution command "terrestrial digital" to switch the tuner to the state value (assumed here to be "terrestrial digital") set as the "return" value of the control target part "tuner". The control unit 90 receives the execution command "terrestrial digital" and switches the tuner to "terrestrial digital". At the same time, the channel (program) to be output to the display unit 103 and the speaker unit 104 may be selected from the "return" state value set for the control target part "channel (program)". That is, in this example the command generation unit 11 generates execution commands for multiple control target parts, "tuner" and "channel (program)", from one voice command (or from the recognition command obtained by recognizing it). The "return" value set for "channel (program)" may be, for example, the channel that was being viewed before the television apparatus was last powered off, or a channel set in advance by the user. A single execution command may also be used instead of separate execution commands for the different control target parts.
As described above, by setting the combination condition table as in this embodiment, a plurality of execution commands can be generated from one voice command and then transmitted and executed.
(embodiment 4)
This embodiment shows an example of a system in which the command control unit 1 is used as a single independent device.
Fig. 10 is a diagram showing an example of the configuration of the entire system according to embodiment 4.
The command control device 2, the IoT device 600A, and the IoT device 600B are connected to the network 500. The IoT devices 600A and 600B have the same functions and are referred to as the IoT device 600 when no distinction is needed. Only two IoT devices 600 are shown in fig. 10, but any number may be connected.
The command control device 2 is built on a computer having a CPU, memory, and the like; it may be a personal computer, or a cloud service comprising servers, databases, and so on. The command control device 2 includes the command control unit 1 and a communication unit 102A, which correspond to the functions of the command control unit 1 and the communication unit 102 in fig. 3, respectively.
The IoT device 600 may be any kind of IoT device that is connected to the network 500 and performs data communication with other devices on the network. It may be a sensor such as a temperature sensor, a water leakage sensor, or a door opening/closing sensor, or a detection device using such sensors; it may be an electric appliance such as an air conditioner, a television, lighting, or a camera; and, if connected to the network 500, it may be a mobile terminal such as a smartphone. In this embodiment the IoT device 600 provides the command control device 2 with auxiliary information and status information, receives the execution command output by the command control device 2, and controls its own functions according to the received execution command.
Specifically, the IoT device 600 includes an auxiliary information output unit 21A, a state switching unit 22A, a control unit 90A, an auxiliary information management unit 91A, a status information management unit 92A, a communication unit 102A2, and an interface unit 105A. These correspond to the functions of the auxiliary information output unit 21, the state switching unit 82, the control unit 90, the auxiliary information management unit 91, the status information management unit 92, the communication unit 102, and the interface unit 105 in fig. 2, respectively.
That is, the functions of fig. 2 are divided between the command control device 2 and the IoT device 600A. The major difference between the receiving apparatus 10 of embodiment 1 and the IoT device 600 of this embodiment is that the receiving apparatus 10 contains the command control unit 1 while the IoT device 600 does not: here the command control unit 1 resides in the command control device 2, and the IoT device 600 receives execution commands from the command control device 2 via the network 500.
The control target of the IoT device 600 controllable by the command control device 2 may be, for example, the power supply (on/off) described in embodiments 1 to 3, or another control unit that starts and stops a functional unit.
The flowchart of fig. 6 again applies to the procedure by which the command control device 2 transmits an execution command to the IoT device 600; steps identical to embodiments 1 and 2 are not described in detail.
The user, near the command control device 2, remotely controls the IoT device 600A via the network 500. The user first sets the execution command generation conditions (combination condition table) for the IoT device 600A in the command control unit 1 of the command control device 2.
To control a function of the IoT device 600A, the user utters a voice command toward the sound detection unit 12 of the command control device 2. The sound detection unit 12 detects the voice, and the command recognition unit 13 generates a recognition command through the sound analysis processing of the sound analysis unit 121 and outputs it to the command generation unit 11 (yes in step S11; S12; yes in S13; S14).
When the recognition command is input from the command recognition unit 13, the command generation unit 11 refers to the combination condition table and instructs the auxiliary information acquisition unit 16 and the status information acquisition unit 17 to acquire the "auxiliary information" and "status information" columns of the table, respectively. The auxiliary information acquisition unit 16 requests each auxiliary item of the table from the auxiliary information management unit 91A, acquires it, and outputs it to the command generation unit 11 (step S15). Step S15 of this embodiment is described in detail below.
The auxiliary information acquisition unit 16 requests the auxiliary information from the IoT device 600A via the communication unit 102A. In the IoT device 600A, the control unit 90A receives the request via the communication unit 102A2, acquires the auxiliary information from the auxiliary information management unit 91A, and transmits it to the command control device 2 via the communication unit 102A2.
The status information acquisition unit 17 identifies the control target from the recognition command and requests its status information from the IoT device 600A via the communication unit 102A. In the IoT device 600A, the control unit 90A receives the request via the communication unit 102A2, acquires the status information from the status information management unit 92A, and transmits it to the command control device 2 via the communication unit 102A2. In the command control device 2, the command generation unit 11 receives the status information via the communication unit 102A (step S16).
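In this configuration the auxiliary and status information travel over the network rather than over an internal bus. The exchange can be pictured as a simple request/response pair; the JSON message format in the Python sketch below is purely illustrative, as the embodiment does not fix a wire protocol:

    import json

    def build_state_request(target):
        """Hypothetical JSON request the status information acquisition unit 17
        could send to an IoT device; all field names are assumptions."""
        return json.dumps({"type": "state_request", "target": target})

    def handle_request(request_text, state_table):
        """IoT-device side: answer a state request from its management unit."""
        request = json.loads(request_text)
        if request.get("type") == "state_request":
            value = state_table.get(request["target"], "unknown")
            return json.dumps({"type": "state_response", "state": value})
        return json.dumps({"type": "error"})

    # The command control device 2 asks IoT device 600A for its power state.
    reply = handle_request(build_state_request("power"), {"power": "on"})
    print(reply)  # {"type": "state_response", "state": "on"}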
Referring to the combination condition table, when the row matching the acquired auxiliary information, status information, and recognition command contains a command in its "execution command" column, the command generation unit 11 generates that command as the execution command and outputs it to the command transmission unit 18 (yes in steps S17 and S18; S19). When the "execution command" entry is "not to send", no execution command is generated and the command control unit 1 waits for the next voice to be detected (no in steps S17 and S18). In this case the command control unit 1 may display a message stating that no execution command was transmitted or generated on a display unit (not shown) of the command control device 2, or output the message content as sound from a speaker unit (not shown) of the command control device 2, to notify the user.
The command transmission unit 18 transmits the input execution command to the IoT device 600A via the communication unit 102A (step S101), and the IoT device 600A controls the target function according to the received execution command. When an execution command is generated or transmitted, the command control unit 1 may display a message stating that the execution command was transmitted on the display unit (not shown) of the command control device 2, or output the message content as sound from the speaker unit (not shown). When the generated execution command differs from the recognition command, the command control device 2 may likewise display or announce a message stating that the execution command differs from the recognition command. With these settings the user can confirm whether the intent of the uttered voice command was carried out accurately.
When the user issues voice commands to the command control device 2 connected to the network 500 in the sequence above, the IoT devices 600 on the network 500 can be controlled by execution commands.
According to at least one of the embodiments described above, a command control device, a control method, and a control program that assist voice recognition of voice commands can be provided.
In the embodiments above, the functions of the control unit 90 (including the auxiliary information management unit 91 and the status information management unit 92) were provided inside the receiving apparatus 10 or the IoT device 600 serving as the control target device, but all or part of the functions of the control unit 90 may instead be provided in an external control device; in that case the control device is likewise connected to the network 500. More generally, the command control device may distribute its functions across devices and hardware in various configurations and combinations, and is not limited to the embodiments above.
(modification example)
This modification shows an example in which parental control, such as viewing and operation restrictions for children, is made possible by using the "user identification information" and "time slot" of fig. 5 as auxiliary information. A remote controller or similar device for operating the television apparatus is assumed to be managed by an adult.
Specifically, when a child utters a voice command, the sound analysis unit 121 performs voiceprint analysis, and the auxiliary information acquisition unit 16 acquires "child" as the user identification information and inputs it to the command generation unit 11. At the same time, the voice command is input to the command recognition unit 13, which generates a recognition command and inputs it to the command generation unit 11. From, for example, the recognition command "power off" and the auxiliary information "child", the command generation unit 11 generates "power off" as the execution command and outputs it to the control unit 90. If, on the other hand, the command generation unit 11 obtains both "child" as the user identification information and "late night" as the time slot from the auxiliary information acquisition unit 16, it may generate, for example, an "execution command for maintaining the current state" from the auxiliary information alone.
Although not shown in fig. 5, parental control may also use a "program viewing target" or the like as auxiliary information. For example, the command recognition unit 13 recognizes from a voice command uttered by a child that the control target part is "channel (program)". When the command recognition unit 13 outputs identification information including the recognized control target part to the auxiliary information acquisition unit 16, that unit acquires the "program viewing target", the auxiliary information for the control target part "channel (program)" included in the identification information, from the broadcast signal reception processing unit 20 or the like and outputs it to the command generation unit 11. The command generation unit 11 generates the execution command from the two auxiliary items: the user identification information "child" acquired from the sound analysis unit 121 and the "program viewing target". For example, when children are included in the program viewing target, the recognition command is used directly as the execution command; when they are not, an "execution command for maintaining the current state" is generated instead.
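The modification's rule, generating the execution command from auxiliary information alone when the combination of user class and time slot forbids the operation, can be sketched as a guard placed before normal command generation. The class and time-slot labels below are those used in the text; everything else is illustrative Python:

    def parental_guard(user_class, time_slot, recognized_command):
        """Sketch of the modification: a child's command late at night is
        replaced by a command that keeps the current state, using the
        auxiliary information alone; other commands pass through unchanged."""
        if user_class == "child" and time_slot == "late night":
            return "maintain current state"
        return recognized_command

    print(parental_guard("child", "late night", "power on"))  # maintain current state
    print(parental_guard("adult", "late night", "power on"))  # power on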
As described above, this modification enables parental control of voice commands. It also shows that, among the recognition command, the status information, and the auxiliary information, an execution command can be generated from the auxiliary information acquired on the basis of the voice command alone, without using the recognition command.
According to at least one of the embodiments and modifications described above, a command control device, a control method, and a control program that assist voice recognition of voice commands can be provided.
The apparatus described above can also be expressed as follows.
(A-1)
An electronic device (the command control unit 1 or the command control device 2) comprising operation control means (the command generation unit 11) that acquires operation control information (a recognition command) from input voice (a voice command) and sets the operation state of a control target device (the receiving apparatus 10) to either a 1st mode or a 2nd mode different from the 1st mode, wherein, when the operation state of the control target device is the 1st mode and the operation control information is information relating to the operation state in the 1st mode, the operation state is set to the 2nd mode.
(A-2)
The electronic device according to (A-1), which acquires operation state information (status information) relating to the state of a control target part of the control target device.
(A-3)
The electronic device according to (A-1) or (A-2), which acquires control condition information (auxiliary information) distinct from the operation control information and the operation state information.
(A-4)
The electronic device according to (A-3), wherein the control condition information is set based on at least one of a time slot, user identification information, a user attribute, received content information, a broadcast operator, a distribution operator, and a recorded content manager.
(A-5)
A program for causing a computer to acquire operation control information (a recognition command) from input voice (a voice command) and to set the operation state of a control target device (the receiving apparatus 10) to either an operation state of a 1st mode or an operation state of a 2nd mode different from the 1st mode, wherein, when the operation state of the control target device is the 1st mode and the operation control information is information relating to the operation state in the 1st mode, the operation state is set to the 2nd mode.
Several embodiments of the present application have been described, but they are presented as examples and are not intended to limit the scope of the application. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the spirit of the application. Such embodiments and their modifications fall within the scope and gist of the application and within the invention described in the claims and its equivalents. Expressing a component of the claims in divided form, combining several components into one, or combining them differently also falls within the scope of the application, as does any example formed by combining two or more embodiments.
For clarity of description, the drawings may show the width, thickness, and shape of each part schematically rather than as actually implemented. In the block diagrams, data and signals may also be exchanged between modules that are not shown as connected, or in directions not indicated by arrows. The functions shown in the block diagrams and the processes shown in the flowcharts and sequence diagrams may be realized by hardware (an IC chip or the like), by software (a program or the like), or by a combination of both. The application also covers expression of the device as control logic, as a program containing instructions executed by a computer, and as a computer-readable storage medium storing such instructions. The terms and expressions used are not limiting; other expressions with substantially the same content and effect are likewise included in the application.

Claims (9)

1. A command control device, comprising:
a voice recognition unit that recognizes a voice input by a user and acquires a recognition command;
an additional information acquisition unit that acquires additional information including status information, which indicates a control status of one or more control targets, and auxiliary information, which is information other than the status information of the one or more control targets, the auxiliary information including user identification information based on a voice recognition result of the voice input by the user and on the intensity, pitch, and speed of the voice input by the user, one type of auxiliary information taking two or more values;
wherein the command control device generates an execution command including either a 1st control command or a 2nd control command different from the 1st control command, the 1st control command being the control command included in the recognition command acquired by voice recognition of the voice;
a storage mechanism that stores a condition combination table including correspondences among the recognition command, the status information, the auxiliary information and its values, and execution commands, wherein, for one type of auxiliary information, different values of that auxiliary information correspond to different execution commands; and
an execution command generation mechanism that generates the execution command according to the correspondences.
2. The command control device according to claim 1,
wherein, in a digital content reception scene, the command control device determines whether to generate the execution command based on the recognition command and the additional information and, for a first voice input by the user, determines a first execution command for a content output device for the digital content and a second execution command for a content type of the digital content according to the auxiliary information of the voice input by the user and the values of that auxiliary information.
3. The command control device according to claim 1,
wherein the command control device determines to transmit zero, one, or two or more of the execution commands based on the condition combination table.
4. The command control device according to claim 1, comprising a display device, wherein the auxiliary information comprises a combination of two or more of received content information, a broadcast operator, a distribution operator, and a recorded content manager.
5. The command control device according to claim 2, wherein the voice input by the user is one voice input, and the command control device is configured to generate two or more execution commands based on the correspondence among the recognition command corresponding to the one voice input, the additional information, and the execution commands, wherein a first execution command of the two or more execution commands is a command for controlling a control state of a first control target of the one or more control targets, and a second execution command of the two or more execution commands is an execution command associated with the auxiliary information corresponding to the recognition command.
6. The command control device according to any one of claims 1 to 3, comprising a display device, wherein the command control device is configured to cause the display device to display a message indicating that the transmitted execution command differs from the recognition command when the transmitted execution command is different from the recognition command.
7. The command control device according to any one of claims 1 to 3, comprising a speaker, wherein the command control device is configured to cause the speaker to play a voice message indicating that the transmitted execution command differs from the recognition command when the transmitted execution command is different from the recognition command.
8. A command control method, comprising:
recognizing a voice input by a user to obtain a recognition command;
acquiring additional information including status information, which indicates a control status of one or more control targets, and auxiliary information, which is information other than the status information of the one or more control targets, the auxiliary information including user identification information based on a voice recognition result of the voice input by the user and on the intensity, pitch, and speed of the voice input by the user, one type of auxiliary information taking two or more values;
acquiring a condition combination table including correspondences among the recognition command, the status information of the additional information, the values of the auxiliary information, and execution commands, wherein, for one type of auxiliary information, different values of that auxiliary information correspond to different execution commands; and
generating an execution command according to the condition combination table.
9. The method according to claim 8, wherein, in a digital content reception scene, whether to generate the execution command is determined based on the recognition command and the additional information and, for a voice input by the user, a first execution command for a content output device for the digital content and a second execution command for a content type of the digital content are determined according to the auxiliary information of the voice input by the user and the values of that auxiliary information.
CN202080005756.9A 2019-12-06 2020-11-13 Command control device, control method, and nonvolatile storage medium Active CN113228166B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2019-221573 2019-12-06
JP2019221573A JP7229906B2 (en) 2019-12-06 2019-12-06 Command controller, control method and control program
PCT/CN2020/128811 WO2021109839A1 (en) 2019-12-06 2020-11-13 Instruction control apparatus and method, and non-volatile storage medium

Publications (2)

Publication Number Publication Date
CN113228166A CN113228166A (en) 2021-08-06
CN113228166B true CN113228166B (en) 2023-02-28

Family

ID=76221412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080005756.9A Active CN113228166B (en) 2019-12-06 2020-11-13 Command control device, control method, and nonvolatile storage medium

Country Status (3)

Country Link
JP (1) JP7229906B2 (en)
CN (1) CN113228166B (en)
WO (1) WO2021109839A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0893901A3 (en) * 1997-07-23 2003-05-28 Nokia Corporation Method for controlling a telecommunication service and a terminal
EP1863274A2 (en) * 2006-05-31 2007-12-05 Funai Electric Co., Ltd. Electronic equipment and television receiver
JP2017028680A (en) * 2015-07-21 2017-02-02 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Control method and control device
CN108766424A (en) * 2018-05-28 2018-11-06 深圳市天乔科技有限公司 Smart home intelligent sound learning control method
CN108882014A (en) * 2018-06-13 2018-11-23 成都市极米科技有限公司 Management method, managing device and the readable storage medium storing program for executing of smart television children's desktop
CN109448721A (en) * 2018-12-20 2019-03-08 西安Tcl软件开发有限公司 Sound control method, system, Cloud Server and speech ciphering equipment
CN109819297A (en) * 2017-11-22 2019-05-28 中国移动通信有限公司研究院 A kind of method of controlling operation thereof and set-top box

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11119792A (en) * 1997-10-20 1999-04-30 Toyota Motor Corp Equipment controller with voice recognizing function, and voice recognizing device
JP4363076B2 (en) * 2002-06-28 2009-11-11 株式会社デンソー Voice control device
JP3724461B2 (en) * 2002-07-25 2005-12-07 株式会社デンソー Voice control device
JP5556529B2 (en) * 2010-09-16 2014-07-23 株式会社デンソー In-vehicle speech recognition device
CN103442138A (en) * 2013-08-26 2013-12-11 华为终端有限公司 Voice control method, device and terminal
CN105609103A (en) * 2015-12-18 2016-05-25 合肥寰景信息技术有限公司 Speech instant recognition system

Also Published As

Publication number Publication date
JP7229906B2 (en) 2023-02-28
CN113228166A (en) 2021-08-06
WO2021109839A1 (en) 2021-06-10
JP2021092612A (en) 2021-06-17

Similar Documents

Publication Publication Date Title
US20230017928A1 (en) Method and system for voice based media search
US9219949B2 (en) Display apparatus, interactive server, and method for providing response information
US20190333515A1 (en) Display apparatus, method for controlling the display apparatus, server and method for controlling the server
KR102304052B1 (en) Display device and operating method thereof
CN109478408B (en) Speech enhancement for speech recognition applications in a broadcast environment
US20140195230A1 (en) Display apparatus and method for controlling the same
US20140006022A1 (en) Display apparatus, method for controlling display apparatus, and interactive system
EP3940526A1 (en) Reception device and corresponding method
US8600732B2 (en) Translating programming content to match received voice command language
CN113228166B (en) Command control device, control method, and nonvolatile storage medium
JP6305538B2 (en) Electronic apparatus, method and program
CN112567757B (en) Electronic device with notification function and control method of electronic device
US11641505B1 (en) Speaker-identification model for controlling operation of a media player
KR20150084276A (en) Method for controlling internet protocol television and digital broadcast receiver thereto
CN114667566A (en) Voice instruction processing circuit, receiving apparatus, server, voice instruction accumulation system, and voice instruction accumulation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant