CN112930686B

CN112930686B - Receiving apparatus

Info

Publication number: CN112930686B
Application number: CN202080004651.1A
Authority: CN
Inventors: 山下丈次
Original assignee: Hisense Visual Technology Co Ltd; Toshiba Visual Solutions Corp
Current assignee: Hisense Visual Technology Co Ltd; Toshiba Visual Solutions Corp
Priority date: 2019-08-13
Filing date: 2020-08-13
Publication date: 2022-10-14
Anticipated expiration: 2040-08-13
Also published as: CN112930686A; WO2021027892A1; JP2021032906A; JP7206167B2

Abstract

The present application relates to a receiving apparatus. Its purpose is to reduce situations in which a voice recognition service is started when the service is not needed. A receiving device according to an embodiment includes a voice input unit, a selection unit, and a voice recognition unit. The voice input unit receives a user's voice. The selection unit selects either an enabled (valid) state or a disabled (invalid) state of voice recognition based on a predetermined condition. When the enabled state is selected, the voice recognition unit executes voice recognition processing on the voice input to the voice input unit; when the disabled state is selected, it does not execute the voice recognition processing.

Description

Receiving apparatus

The present application claims priority to Japanese Patent Application No. 2019-148384, entitled "Receiving Device", filed with the Japan Patent Office on August 13, 2019, the entire contents of which are incorporated herein by reference.

Technical Field

Embodiments of the present application relate to a receiving apparatus.

Background

In recent years, there has been increasing demand for voice recognition services that allow users to operate devices by voice. For example, television devices with a voice recognition function are known. In such a device, when a wake word uttered by the user is detected, the voice recognition service is activated; the device then gives some response, or lowers the volume of the content being played so that the user's voice can be recognized more easily.

However, in such a television device, the voice recognition service may be activated at a moment the user did not intend, for example because the wake word is falsely detected. This interrupts the user's viewing of the content and can annoy the user.

Prior art documents

Patent literature

Patent Document 1: Japanese Patent Laid-Open Publication No. 2013-235032.

Disclosure of Invention

The problem to be solved is to reduce situations in which the voice recognition service is started when it is not needed.

A receiving device according to an embodiment includes a voice input unit, a selection unit, and a voice recognition unit. The voice input unit receives a user's voice. The selection unit selects either an enabled (valid) state or a disabled (invalid) state of voice recognition based on a predetermined condition. When the enabled state is selected, the voice recognition unit executes voice recognition processing on the voice input to the voice input unit; when the disabled state is selected, it does not execute the voice recognition processing.

Drawings

Fig. 1 is a diagram showing an example of a hardware configuration of a television device according to a first embodiment;

Fig. 2 is a diagram showing an example of a functional configuration of the television device according to the first embodiment;

Fig. 3 is a flowchart showing an example of the flow of the selection processing of the valid state and the invalid state of the voice recognition in the first embodiment;

Fig. 4 is a diagram showing an example of a functional configuration of a television device according to a second embodiment;

Fig. 5 is a diagram showing an example of a functional configuration of a television device according to a third embodiment;

Fig. 6 is a diagram showing an example of a functional configuration of a television device according to a fourth embodiment;

Fig. 7 is a diagram showing an example of a functional configuration of a television device according to a fifth embodiment.

Description of the reference numerals

10 television device
11, 1011, 2011 acquisition unit
12 wake word detection unit
13 voice recognition unit
14 display control unit
15, 1015, 2015, 3015 selection unit
16 device control unit
17 application execution unit
18 learning unit
19 program table generation unit
110 display panel
111 operation unit
112 light receiving unit
115 memory
116 storage
117 microphone
119 remote controller
300 network

Detailed Description

(first embodiment)

Fig. 1 is a diagram showing an example of a hardware configuration of a television apparatus 10 according to the present embodiment. As shown in Fig. 1, the television device 10 includes an antenna 101, an input terminal 102a, a tuner 103, a demodulator 104, a demultiplexer 105, input terminals 102b and 102c, an A/D (analog/digital) converter 106, a selector 107, a signal processing unit 108, a speaker 109, a display panel 110, an operation unit 111, a light receiving unit 112, an IP communication unit 113, a CPU (Central Processing Unit) 114, a memory 115, a storage 116, a microphone 117, and an audio I/F (interface) 118. The television device 10 is an example of the receiving device in the present embodiment.

The antenna 101 receives a digital broadcast signal and supplies it to the tuner 103 via the input terminal 102a. The tuner 103 selects the broadcast signal of a desired channel from the broadcast signals supplied from the antenna 101 and supplies the selected signal to the demodulator 104. The broadcast signal is also called a broadcast wave.

The demodulator 104 demodulates the broadcast signal supplied from the tuner 103, and supplies the demodulated broadcast signal to the demultiplexer 105. The demultiplexer 105 separates the broadcast signal supplied from the demodulator 104 to generate an image signal and an audio signal, and supplies the generated image signal and audio signal to the selector 107.

The selector 107 selects one of the signals supplied from the demultiplexer 105, the A/D converter 106, and the input terminal 102c, and supplies the selected signal to the signal processing unit 108.

The signal processing unit 108 is configured to perform predetermined signal processing on the image signal supplied from the selector 107 and supply the processed image signal to the display panel 110. The signal processing unit 108 is configured to perform predetermined signal processing on the audio signal supplied from the selector 107 and supply the processed audio signal to the speaker 109.

The speaker 109 is configured to output a voice or various sounds based on the sound signal supplied from the signal processing unit 108. The speaker 109 changes the volume of the output voice or various sounds based on the control performed by the CPU 114.

The display panel 110 is configured to display an image such as a still image or a moving image based on an image signal supplied from the signal processing unit 108 or control performed by the CPU 114. The display panel 110 is an example of a display unit.

The input terminal 102b receives analog signals (video and audio signals) input from the outside. The input terminal 102c receives digital signals (video and audio signals) input from the outside. For example, a digital signal can be input to the input terminal 102c from a recorder, such as a BD recorder, equipped with a drive for recording and playing back a recording medium such as a BD (Blu-ray Disc) (registered trademark). The A/D converter 106 A/D-converts the analog signal supplied from the input terminal 102b and supplies the resulting digital signal to the selector 107.

The operation unit 111 receives an operation input from a user. The light receiving unit 112 receives infrared rays from the remote control 119. The IP communication unit 113 is a communication interface for performing IP (internet protocol) communication via the network 300.

The CPU 114 is a control unit that controls the entire television apparatus 10. The memory 115 includes a ROM (Read Only Memory) that stores the computer programs executed by the CPU 114, a RAM (Random Access Memory) that provides a work area for the CPU 114, and the like. The storage 116 is an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like, and stores, for example, the signal selected by the selector 107 as video data.

The microphone 117 picks up the user's speech and sends it to the audio I/F 118. The microphone 117 is an example of the voice input unit. The microphone 117 can input sound in the "on state" and cannot input sound in the "off state". In the present embodiment, the microphone 117 is automatically turned on when the television apparatus 10 starts. For example, when the enabled state of voice recognition is selected under the control of the CPU 114, the microphone 117 is kept in the on state; when the disabled state of voice recognition is selected, the microphone 117 is switched to the off state. The selection of the enabled and disabled states of voice recognition is described in detail later as the processing of the selection unit 15.
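The on/off gating of the microphone 117 can be illustrated with a minimal Python sketch. It is a model, not the embodiment's implementation: the `Microphone` class, its methods, and the representation of audio as a list of float samples are all hypothetical.

```python
from typing import List, Optional


class Microphone:
    """Hypothetical model of microphone 117: it captures audio only when on."""

    def __init__(self) -> None:
        # The embodiment turns the microphone on when the TV starts.
        self.is_on = True

    def turn_on(self) -> None:
        self.is_on = True

    def turn_off(self) -> None:
        self.is_on = False

    def capture(self, ambient: List[float]) -> Optional[List[float]]:
        """Return the captured samples, or None while the microphone is off."""
        if not self.is_on:
            return None  # off state: no voice can be input at all
        return list(ambient)


mic = Microphone()
print(mic.capture([0.1, -0.2]))  # [0.1, -0.2]
mic.turn_off()
print(mic.capture([0.1, -0.2]))  # None
```

Because `capture` yields nothing in the off state, no downstream unit receives audio, which mirrors how the embodiment disables recognition without reconfiguring the wake word detection or voice recognition units.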

The audio I/F 118 performs analog-to-digital conversion on the sound picked up by the microphone 117 and transmits the result to the CPU 114 as a sound signal.
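The amplitude quantization step of the A/D conversion performed by the audio I/F 118 can be sketched as clipping followed by scaling. The embodiment does not specify a sample format; the signed 16-bit PCM used here, the full-scale range of [-1.0, 1.0], and the function name `to_pcm16` are assumptions for illustration, and sampling in time is not modeled.

```python
from typing import List


def to_pcm16(analog: List[float]) -> List[int]:
    """Quantize amplitudes in [-1.0, 1.0] to signed 16-bit integers.

    Out-of-range values are clipped, as a converter saturates at full scale.
    """
    pcm = []
    for x in analog:
        x = max(-1.0, min(1.0, x))   # clip to full scale
        pcm.append(round(x * 32767))  # scale into the int16 range
    return pcm


print(to_pcm16([0.0, 0.5, -1.0, 1.5]))  # [0, 16384, -32767, 32767]
```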

Next, the functions of the television apparatus 10 according to the present embodiment will be described.

Fig. 2 is a diagram showing an example of a functional configuration of the television apparatus 10 according to the present embodiment. As shown in fig. 2, the television apparatus 10 includes an acquisition unit 11, a wakeup word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 15, and a device control unit 16.

The program executed by the television apparatus 10 according to the present embodiment has a module configuration including the above-described units (the acquisition unit, the wake word detection unit, the voice recognition unit, the display control unit, the selection unit, and the device control unit). The CPU 114, as the actual hardware, reads the program from the ROM or the like and executes it, thereby loading these units into a main storage device such as the RAM and generating them there.

The program executed by the television device 10 of the present embodiment is provided, for example, pre-installed in the ROM or the like. Alternatively, the program may be provided as a file in an installable or executable format recorded on a computer-readable storage medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk).

The program executed by the television device 10 according to the present embodiment may be stored in a computer connected to a network such as the internet and provided by downloading the program via the network. The program executed by the television device 10 of the present embodiment may be provided or distributed via a network such as the internet. In the present embodiment, each functional unit is described as being implemented by one CPU, but each functional unit may be implemented by a plurality of CPUs or various circuits.

The acquisition unit 11 acquires, via the audio I/F 118, the user's voice input to the microphone 117, and transmits the acquired voice to the wake word detection unit 12 and the voice recognition unit 13. Strictly speaking, the "sound" acquired by the acquisition unit 11 is the digital sound signal converted by the audio I/F 118, but it is simply referred to as "sound" below.

The acquisition unit 11 acquires various signals from the operation unit 111, the light receiving unit 112, the IP communication unit 113, the selector 107, the signal processing unit 108, and the like connected to the CPU 114. For example, the acquisition unit 11 receives an operation by the user based on infrared rays from the remote control 119 received by the light receiving unit 112 or an operation input to the operation unit 111. The acquisition unit 11 transmits the content of the received user operation to the display control unit 14 and the device control unit 16.

The wake word detection unit 12 detects a wake word in the sound acquired by the acquisition unit 11. The wake word is a preset voice command that triggers the start of the voice recognition service. Known voice recognition techniques may be used to determine whether the sound signal contains the wake word.

In the present embodiment, the settings of the wake word detection unit 12 do not change with the selection unit 15's choice between the enabled and disabled states of voice recognition, described later. However, when the disabled state is selected, the microphone 117 is turned off and cannot input sound, so no voice is acquired; consequently, the wake word detection unit 12 performs no wake word detection in the disabled state. When the enabled state is selected, the microphone 117 is turned on and can input sound, so the wake word detection unit 12 executes wake word detection on the voice input to the microphone 117.

When a wake word is detected in the sound acquired by the acquisition unit 11, the wake word detection unit 12 notifies the display control unit 14 and the device control unit 16 of the detection. When the user continues speaking after the wake word, the wake word detection unit 12 transmits the speech that follows the wake word to the voice recognition unit 13.
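The hand-off from the wake word detection unit 12 to the voice recognition unit 13 can be sketched as splitting an utterance at the wake word. This is a simplified sketch under stated assumptions: the input is an already-transcribed string, the wake word must appear at the start of the utterance, and the wake word "hey tv" and the function `split_after_wake_word` are hypothetical. Practical detectors generally work on the audio itself rather than on text.

```python
from typing import Optional, Tuple

WAKE_WORD = ("hey", "tv")  # hypothetical preset wake word, as lowercase tokens


def split_after_wake_word(transcript: str) -> Tuple[bool, Optional[str]]:
    """Return (wake word detected?, speech following the wake word or None)."""
    # Normalize case and treat commas as word separators.
    words = transcript.lower().replace(",", " ").split()
    if tuple(words[: len(WAKE_WORD)]) != WAKE_WORD:
        return False, None  # no wake word: nothing is forwarded
    following = " ".join(words[len(WAKE_WORD):])
    # Only the speech after the wake word goes to the voice recognition unit.
    return True, following or None


print(split_after_wake_word("Hey TV, volume up"))  # (True, 'volume up')
print(split_after_wake_word("hey tvs are great"))  # (False, None)
```

Matching on whole tokens rather than a raw prefix keeps "hey tvs" from being mistaken for the wake word, a small example of the false detections the embodiment is concerned with.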

The voice recognition unit 13 performs voice recognition processing on the voice input to the microphone 117. In the present embodiment, the settings of the voice recognition unit 13 itself do not change with the selection unit 15's choice between the enabled and disabled states of voice recognition, described later. However, when the disabled state is selected, the microphone 117 cannot input sound, so no voice is acquired and the voice recognition unit 13 does not execute voice recognition processing. When the enabled state is selected, the microphone 117 can input sound, so the voice recognition unit 13 executes voice recognition processing on the voice input to the microphone 117.

More specifically, when the wake word detection unit 12 detects the wake word, the voice recognition unit 13 performs voice recognition processing on the speech following the wake word to determine what the user said. Known techniques may be used for this processing; for example, the voice recognition unit 13 converts the user's speech into text data. The voice recognition unit 13 transmits the recognition result to the display control unit 14 and the device control unit 16. In the present embodiment, the voice recognition service is realized by the display control unit 14, the device control unit 16, and other functional units executing processing based on the result of recognizing the user's voice.

The display control unit 14 controls various displays on the display panel 110. For example, when the acquisition unit 11 acquires a user operation input via the remote control 119 or the like, the display control unit 14 displays an operation screen corresponding to the operation on the display panel 110. More specifically, when the user presses, for example, a button for starting recording reservation settings, the display control unit 14 displays on the display panel 110 an operation screen that can accept the user's operations. The operation screen may be displayed, for example, as an OSD (On Screen Display) superimposed on the content being played back, or full-screen over the entire display panel 110. In this embodiment, "content" includes television programs, moving images recorded on a DVD or the like, moving images played back by an application, and the like.

The display control unit 14 also displays various notification screens on the display panel 110. For example, the display control unit 14 displays a notification screen including a message for providing information to the user, warning the user, or calling the user's attention, as an OSD superimposed on the screen of the content being played.

When the wake word detection unit 12 detects the wake word, the display control unit 14 displays on the display panel 110 a message, an icon, or the like responding to the voice. This response may, for example, prompt the user to speak, or display the recognition result of the user's voice as text. The message or icon lets the user easily see that the wake word has been recognized and that what the user says next will be treated as an instruction to the television apparatus 10.

For example, while the operation screen or the notification screen is displayed on the display panel 110, the display control unit 14 sets in the memory 115 an operation screen display flag indicating that the operation screen is displayed, or a notification screen display flag indicating that the notification screen is displayed. When the operation screen or notification screen is closed, the display control unit 14 deletes the corresponding flag from the memory 115. The method of indicating that the operation screen or the notification screen is displayed is not limited to this. For example, the display control unit 14 may notify the selection unit 15 that the operation screen or notification screen has been displayed on the display panel 110, or that its display has ended.
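The flag handling of the display control unit 14 can be sketched as adding and removing entries in a shared set that stands in for the memory 115. The flag strings and the `DisplayController` class are hypothetical; the embodiment does not specify how the flags are encoded.

```python
from typing import Set

OPERATION_SCREEN_FLAG = "operation_screen_displayed"  # hypothetical flag names
NOTIFICATION_SCREEN_FLAG = "notification_screen_displayed"


class DisplayController:
    """Hypothetical sketch of the flag bookkeeping of display control unit 14."""

    def __init__(self, flags: Set[str]) -> None:
        self.flags = flags  # shared set standing in for the flag area of memory 115

    def show_operation_screen(self) -> None:
        self.flags.add(OPERATION_SCREEN_FLAG)

    def close_operation_screen(self) -> None:
        self.flags.discard(OPERATION_SCREEN_FLAG)  # no error if already cleared

    def show_notification_screen(self) -> None:
        self.flags.add(NOTIFICATION_SCREEN_FLAG)

    def close_notification_screen(self) -> None:
        self.flags.discard(NOTIFICATION_SCREEN_FLAG)


memory_flags: Set[str] = set()
display = DisplayController(memory_flags)
display.show_notification_screen()
print(memory_flags)  # {'notification_screen_displayed'}
display.close_notification_screen()
print(memory_flags)  # set()
```

Sharing one set lets the selection logic read the flags without calling the display controller, which corresponds to the flag-based method; the notification-based alternative would replace the shared set with callbacks.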

Further, the display control unit 14 controls the display of the display panel 110 based on a command included in the voice of the user recognized by the voice recognition unit 13. For example, the display control unit 14 controls the tuner 103 based on a command included in the user's voice, selects a channel on which a program designated by the user's voice is being broadcast, and displays the program on the display panel 110. Further, the display control unit 14 may play back the recorded data of the program stored in the memory 116 or an external storage device based on a command included in the user's voice and display the played data on the display panel 110.

The selection unit 15 selects either one of the valid state and the invalid state of the voice recognition based on a predetermined condition.

The predetermined condition in the present embodiment is that "at least either one of the operation screen and the notification screen is being displayed on the display panel 110". The selection unit 15 of the present embodiment selects the invalid state when the state of the display panel 110 of the television apparatus 10 satisfies a predetermined condition. The selection unit 15 selects the valid state when the state of the display panel 110 of the television apparatus 10 does not satisfy the predetermined condition.

For example, the selection unit 15 determines that the operation screen is being displayed when the operation screen display flag is set in the memory 115, and determines that the notification screen is being displayed when the notification screen display flag is set in the memory 115. When determining that at least one of the operation screen and the notification screen is being displayed on the display panel 110, the selection unit 15 determines that the television apparatus 10 satisfies the predetermined condition. In this case, the selection unit 15 selects the disabled state.

The method of determining whether the operation screen or the notification screen is displayed is not limited to this. For example, the selection unit 15 may determine whether at least one of the two screens is displayed on the display panel 110 based on display-state notifications received from the display control unit 14.

When determining that neither the operation screen nor the notification screen is displayed on the display panel 110, the selection unit 15 determines that the television apparatus 10 does not satisfy the predetermined condition. In this case, the selection unit 15 selects the enabled state.
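The rule applied by the selection unit 15 in this embodiment reduces to a single predicate over the display flags. The following Python sketch assumes the flags are held in a set of strings; the flag names, `RecognitionState`, and `select_state` are hypothetical.

```python
from enum import Enum
from typing import Set

OPERATION_SCREEN_FLAG = "operation_screen_displayed"  # hypothetical flag names
NOTIFICATION_SCREEN_FLAG = "notification_screen_displayed"


class RecognitionState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


def condition_satisfied(flags: Set[str]) -> bool:
    """Predetermined condition: at least one of the two screens is displayed."""
    return OPERATION_SCREEN_FLAG in flags or NOTIFICATION_SCREEN_FLAG in flags


def select_state(flags: Set[str]) -> RecognitionState:
    """Disabled when the condition holds, enabled otherwise."""
    if condition_satisfied(flags):
        return RecognitionState.DISABLED
    return RecognitionState.ENABLED


print(select_state(set()))                    # RecognitionState.ENABLED
print(select_state({OPERATION_SCREEN_FLAG}))  # RecognitionState.DISABLED
```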

The selection unit 15 transmits the selection result of the valid state and the invalid state of the voice recognition to the device control unit 16.

The device control unit 16 controls various devices included in the television apparatus 10. For example, when the selection unit 15 selects the disabled state of voice recognition, the device control unit 16 turns off the microphone 117. For example, when the selection unit 15 selects the valid state of the voice recognition, the device control unit 16 turns on the microphone 117.

When the wake word detection unit 12 detects a wake word, the device control unit 16 controls the speaker 109 to lower the volume. This reduces the chance that the content's sound interferes with the input of what the user says after the wake word.
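The two device-control actions described above, switching the microphone 117 with the selected state and lowering the speaker 109 volume on wake word detection, can be sketched in Python as follows. The classes, the 0 to 100 volume scale, and the reduction to 30 percent of the current volume are hypothetical; the embodiment only says that the volume is reduced.

```python
from dataclasses import dataclass


@dataclass
class Microphone:
    is_on: bool = True  # turned on at startup in the embodiment


@dataclass
class Speaker:
    volume: int = 20  # hypothetical 0-100 scale


DUCK_PERCENT = 30  # hypothetical: keep 30% of the volume while the user speaks


class DeviceController:
    """Hypothetical sketch of device control unit 16."""

    def __init__(self, mic: Microphone, speaker: Speaker) -> None:
        self.mic = mic
        self.speaker = speaker

    def apply_selection(self, enabled: bool) -> None:
        # Enabled state -> microphone on; disabled state -> microphone off.
        self.mic.is_on = enabled

    def on_wake_word_detected(self) -> None:
        # Lower the content volume so the following speech is easier to capture.
        self.speaker.volume = self.speaker.volume * DUCK_PERCENT // 100


controller = DeviceController(Microphone(), Speaker(volume=20))
controller.apply_selection(False)
print(controller.mic.is_on)  # False
controller.on_wake_word_detected()
print(controller.speaker.volume)  # 6
```

Integer arithmetic keeps the reduced volume on the same integer scale and avoids floating-point rounding surprises.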

The device control unit 16 controls various devices included in the television apparatus 10 based on commands included in the user's voice recognized by the voice recognition unit 13. For example, when the user's voice includes a command to increase the volume, the device control unit 16 controls the speaker 109 to increase the volume. The device control unit 16 may retrieve information from the internet based on a command included in the voice of the user recognized by the voice recognition unit 13.
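Executing a recognized command can be sketched as a lookup from recognized text to an action. The command phrases, the step size of 1, the upper limit of 100, the `volume down` entry, and the `handle_command` function are hypothetical; the embodiment only gives raising the volume as an example.

```python
from typing import Callable, Dict

VOLUME_STEP = 1   # hypothetical step size
MAX_VOLUME = 100  # hypothetical upper limit


class Speaker:
    def __init__(self, volume: int = 10) -> None:
        self.volume = volume


def volume_up(speaker: Speaker) -> None:
    speaker.volume = min(MAX_VOLUME, speaker.volume + VOLUME_STEP)


def volume_down(speaker: Speaker) -> None:
    speaker.volume = max(0, speaker.volume - VOLUME_STEP)


# Hypothetical mapping from recognized text to device actions.
COMMANDS: Dict[str, Callable[[Speaker], None]] = {
    "volume up": volume_up,
    "volume down": volume_down,
}


def handle_command(text: str, speaker: Speaker) -> bool:
    """Run the action for a recognized command; return False if it is unknown."""
    action = COMMANDS.get(text.strip().lower())
    if action is None:
        return False
    action(speaker)
    return True
```

A table keeps each command independent, so adding an action (for example, an Internet search) would mean adding one entry rather than another branch.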

Next, a flow of the selection process of the valid state and the invalid state of the voice recognition performed by the television apparatus 10 configured as described above will be described.

Fig. 3 is a flowchart showing an example of the flow of the selection processing between the enabled and disabled states of voice recognition in the present embodiment. The processing of the flowchart is assumed to run continuously while the television apparatus 10 is operating. At the start of the flowchart, voice recognition is in the enabled state and the microphone 117 is on.

First, the selection unit 15 determines whether or not the television apparatus 10 satisfies a predetermined condition based on, for example, whether or not the operation screen display flag or the notification screen display flag is set in the memory 115 (S1).

When the operation screen display flag or the notification screen display flag is set in the memory 115, the selection unit 15 determines that the television apparatus 10 satisfies the predetermined condition (yes in S1). In this case, the selection unit 15 selects the disabled state of the voice recognition (S2). The selection unit 15 transmits the selection of the disabled state of voice recognition to the device control unit 16.

Next, the device control unit 16 sets the microphone 117 to the "off state" (S3). Thereby, the microphone 117 is in a state of not receiving the input of the sound. After the microphone 117 is set to the "off state" by the device control unit 16, the process returns to S1, and the process is repeated.

When neither the operation screen display flag nor the notification screen display flag is set in the memory 115, the selection unit 15 determines that the television apparatus 10 does not satisfy the predetermined condition (no in S1). In this case, the selection unit 15 selects the enabled state of voice recognition (S4). For example, if the display of the operation screen or notification screen ends and its flag is deleted while voice recognition is in the disabled state, the selection unit 15 selects the enabled state, switching voice recognition from disabled to enabled. The selection unit 15 notifies the device control unit 16 that the enabled state has been selected.

Next, the device control unit 16 turns on the microphone 117 (S5), so that the microphone 117 can accept voice input. If the microphone 117 is already on, the device control unit 16 does nothing.

Next, the acquisition unit 11 acquires, via the audio I/F 118, the user's voice input to the microphone 117 (S6), and transmits the acquired voice to the wake word detection unit 12 and the voice recognition unit 13.

Then, the wake word detection unit 12 determines whether the sound acquired by the acquisition unit 11 contains a wake word (S7). When a wake word is detected (yes in S7), the wake word detection unit 12 notifies the display control unit 14 and the device control unit 16 of the detection. When the user continues speaking after the wake word, the wake word detection unit 12 transmits the speech that follows the wake word to the voice recognition unit 13.

Next, the device control unit 16 controls the speaker 109 to lower the volume of the content being played (S8). Further, the display control unit 14 displays a response message or an icon for the user on the display panel 110 (S9). Such a process by the device control unit 16 or the display control unit 14 is an example of a process at the start of the voice recognition service.

Then, the voice recognition section 13 performs voice recognition processing for the voice input into the microphone 117 after the wakeup word (S10). The voice recognition unit 13 transmits the voice recognition result of the voice recognition process to the display control unit 14 and the device control unit 16. Then, the display control section 14 or the device control section 16 implements the voice recognition service by executing the processing based on the voice recognition result (S11). Thereafter, the process returns to S1, and the process in the flowchart is repeated until the power of the television apparatus 10 is turned off.
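One pass through the flowchart of Fig. 3 can be condensed into a single Python function. This is a simplified sketch, not the embodiment's implementation: the `TvState` container, the flag names, the wake word "hey tv", the 30 percent volume reduction, the use of a transcript string in place of audio, and the log strings are hypothetical. The flowchart does not state what happens when no wake word is found in S7; the sketch assumes the process simply returns to S1.

```python
from dataclasses import dataclass, field
from typing import List, Set

SCREEN_FLAGS = {"operation_screen", "notification_screen"}  # hypothetical flag names
WAKE_WORD = ("hey", "tv")  # hypothetical wake word
DUCK_PERCENT = 30  # hypothetical volume reduction


@dataclass
class TvState:
    display_flags: Set[str] = field(default_factory=set)  # flags in memory 115
    mic_on: bool = True  # enabled state and microphone on at the start
    volume: int = 20
    log: List[str] = field(default_factory=list)


def run_once(tv: TvState, heard: str) -> None:
    """Execute S1 to S11 once; `heard` is what the user says, as text."""
    # S1: is the operation screen or the notification screen displayed?
    if tv.display_flags & SCREEN_FLAGS:
        # S2-S3: select the disabled state and turn the microphone off.
        tv.mic_on = False
        tv.log.append("disabled")
        return  # back to S1; `heard` is never captured
    # S4-S5: select the enabled state and turn the microphone on.
    tv.mic_on = True
    # S6: acquire the voice, tokenized here for the wake word check.
    words = heard.lower().replace(",", " ").split()
    # S7: wake word check (assumed to return to S1 when absent).
    if tuple(words[: len(WAKE_WORD)]) != WAKE_WORD:
        tv.log.append("no wake word")
        return
    tv.volume = tv.volume * DUCK_PERCENT // 100  # S8: lower content volume
    tv.log.append("show response icon")  # S9: respond on the display panel
    command = " ".join(words[len(WAKE_WORD):])  # S10: recognize following speech
    tv.log.append(f"execute: {command}")  # S11: carry out the service


tv = TvState(display_flags={"operation_screen"})
run_once(tv, "hey tv, volume up")
print(tv.mic_on, tv.log)  # False ['disabled']
tv.display_flags.clear()  # operation screen closed, flag deleted
run_once(tv, "hey tv, volume up")
print(tv.mic_on, tv.volume, tv.log[-1])  # True 6 execute: volume up
```

The early return in the disabled branch is the key point: the utterance is never examined, so a misheard wake word cannot start the service while a screen is open.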

In this way, the television apparatus 10 according to the present embodiment selects either one of the valid state and the invalid state of the voice recognition based on a predetermined condition, executes the voice recognition processing for the voice input to the microphone 117 when the valid state is selected, and does not execute the voice recognition processing when the invalid state is selected. Therefore, according to the television apparatus 10 of the present embodiment, it is possible to reduce the situation in which the voice recognition service is started when the voice recognition service is not necessary.

For example, the user's speech may be erroneously recognized as the wake word even though it is not the wake word. In general, while a user is operating a remote controller or the like, operations via the voice recognition service are often unnecessary. However, in the conventional technology, if the user's speech is misrecognized as the wake word while the user is operating a remote controller or the like and looking at an operation screen on the display panel, the voice recognition service starts and a response message or icon is displayed on the display panel, causing the operation screen to disappear or become harder to see.

In addition, when the notification screen is displayed on the display panel, the user may read a message or the like displayed on the notification screen, and therefore does not want the notification screen to be blocked by another screen until the display of the notification screen is completed. However, in the related art, even if the user is watching the notification screen on the display panel, if the voice of the user speaking is erroneously recognized as a wakeup word, the voice recognition service is started, and a response message or an icon for the user is displayed on the display panel, so that the notification screen disappears or becomes less visible. In such a case, the user may be annoyed and the provision of information to the user may be hindered.

In contrast, the television apparatus 10 according to the present embodiment determines that the predetermined condition is satisfied when at least one of the operation screen and the notification screen is displayed on the display panel 110, and selects the disabled state. This reduces situations in which the voice recognition service starts while the operation screen or notification screen is displayed, and therefore reduces situations in which a response message or icon appears while the user is using the operation screen or notification screen and makes that screen hard to see.

In addition, the television apparatus 10 according to the present embodiment turns the microphone 117 on when the enabled state is selected and off when the disabled state is selected. Because the user's voice physically cannot be input in the disabled state, situations in which the voice recognition service starts unnecessarily can be reduced.

In the present embodiment, the microphone 117, which is hardware, is used as an example of the voice input unit, but the acquisition unit 11, which is realized by a program, may also serve as an example of the voice input unit. The microphone 117 may be provided in the remote control 119 instead of the main body of the television apparatus 10. The voice input unit may also be realized by a voice recognition device external to the television apparatus 10.

In the present embodiment, the predetermined condition is that "at least one of the operation screen and the notification screen is being displayed on the display panel 110", but the predetermined condition may instead be that "the operation screen is being displayed on the display panel 110" or that "the notification screen is being displayed on the display panel 110". For example, when the predetermined condition is "the operation screen is being displayed on the display panel 110", the selection unit 15 determines that the predetermined condition is satisfied whenever the operation screen is displayed on the display panel 110, regardless of whether the notification screen is displayed. When the operation screen is not displayed on the display panel 110, the selection unit 15 determines that the predetermined condition is not satisfied, again regardless of whether the notification screen is displayed.

In the present embodiment, the wake-up word detection unit 12 and the voice recognition unit 13 are separate functional units, but the voice recognition unit 13 may be designed to include the function of the wake-up word detection unit 12. The voice recognition unit 13 and the wake-up word detection unit 12 may also be collectively referred to as a voice recognition unit. The contents of the voice recognition service described in the present embodiment are merely examples and are not limiting.

Note that lowering the sound volume, displaying a response message on the display panel 110, and the like in the present embodiment are examples of processing performed when the voice recognition service starts, and such processing is not limited to these examples. For example, the television apparatus 10 may output the response message as voice when the voice recognition service starts.

In the present embodiment, the selection unit 15 selects the invalid state of voice recognition when it determines that the predetermined condition is satisfied, and selects the valid state when it determines that the predetermined condition is not satisfied. However, this relationship may be reversed.

For example, when voice recognition is in the normal state, the selection unit 15 may select the valid state of voice recognition when it determines that the predetermined condition is satisfied, and may select the invalid state when it determines that the predetermined condition is not satisfied. As a specific example, when the predetermined condition is "neither the operation screen nor the notification screen is displayed on the display panel 110", the selection unit 15 may determine that the predetermined condition is satisfied and select the valid state of voice recognition when neither screen is displayed on the display panel 110. When either the operation screen or the notification screen is being displayed on the display panel 110, the selection unit 15 may determine that the predetermined condition is not satisfied and select the invalid state of voice recognition.
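The two formulations, a satisfied condition selecting the invalid state or a satisfied condition selecting the valid state, can be captured by a single polarity parameter. The following is a minimal sketch under that assumption; `make_selector` and `condition_selects_invalid` are hypothetical names, and the `screens` dictionary stands in for the display state.

```python
from enum import Enum
from typing import Callable


class VoiceState(Enum):
    VALID = "valid"
    INVALID = "invalid"


def make_selector(condition: Callable[[], bool],
                  condition_selects_invalid: bool) -> Callable[[], VoiceState]:
    """Return a selection function.

    condition_selects_invalid=True:  a satisfied condition selects INVALID.
    condition_selects_invalid=False: a satisfied condition selects VALID.
    """
    def select() -> VoiceState:
        met = condition()
        return VoiceState.INVALID if met == condition_selects_invalid else VoiceState.VALID
    return select


screens = {"operation": False, "notification": True}

# As described first: "at least one screen is displayed" selects INVALID.
any_shown = make_selector(lambda: screens["operation"] or screens["notification"], True)
# Reversed variant: "neither screen is displayed" selects VALID.
none_shown = make_selector(lambda: not (screens["operation"] or screens["notification"]), False)

print(any_shown(), none_shown())  # VoiceState.INVALID VoiceState.INVALID
```

For the screen example in the text, both selectors always agree; the reversed form matters when the normal state and the condition are defined the other way around, as in the valid-period variant of the third embodiment.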

(second embodiment)

In the first embodiment described above, the predetermined condition for selecting the invalid state of voice recognition is "at least one of the operation screen and the notification screen is being displayed on the display panel 110". In contrast, in the second embodiment, the predetermined condition for selecting the invalid state of voice recognition is "a predetermined application is being executed".

The hardware configuration of the television apparatus 10 of the present embodiment is the same as that of the first embodiment.

Fig. 4 is a diagram showing an example of a functional configuration of the television apparatus 10 according to the present embodiment. As shown in fig. 4, the television apparatus 10 includes an acquisition unit 11, a wakeup word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 1015, a device control unit 16, and an application execution unit 17. The application execution unit 17 is also realized by the CPU114 executing a program, as with other functional units. The acquisition unit 11, the wakeup word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment.

The application execution unit 17 executes an application for distributing content, and displays a moving image of the content distributed by the application on the display panel 110.

The content distribution application executed by the application execution unit 17 is an example of the predetermined application in the present embodiment. The content distribution application is, for example, an application that receives moving image content such as dramas or movies from an external server via the network 300, but it may also be an application that has other functions.

While the content distribution application is being executed, the application execution unit 17 sets, for example in the memory 115, an application execution flag indicating that the application is being executed.

The selection unit 1015 of the present embodiment selects either one of the valid state and the invalid state of the voice recognition based on predetermined conditions as in the first embodiment, but in the present embodiment, either one of the valid state and the invalid state is selected using a condition different from that of the first embodiment.

More specifically, the predetermined condition in the present embodiment is "a predetermined application (application to which content is distributed) is being executed". The selection unit 1015 according to the present embodiment acquires the execution state of the application for content delivery, determines that a predetermined condition is satisfied when the application for content delivery is being executed, and selects the disabled state of voice recognition. When the application for content delivery is not being executed, the selection unit 1015 determines that a predetermined condition is not satisfied and selects the valid state of voice recognition.

The selection unit 1015 determines whether or not a predetermined application is being executed based on the presence or absence of the application execution flag in the memory 115, for example, but may acquire the execution state of the predetermined application by another method.
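The flag-based check can be sketched as follows. This is an illustrative assumption, not the apparatus's actual memory layout: a Python dictionary stands in for the memory 115, `APP_EXEC_FLAG` is a hypothetical key, and a context manager models the flag being set only while the content distribution application runs.

```python
from contextlib import contextmanager
from enum import Enum


class VoiceState(Enum):
    VALID = "valid"
    INVALID = "invalid"


# Hypothetical stand-in for memory 115; the flag name is illustrative only.
memory: dict[str, bool] = {}
APP_EXEC_FLAG = "content_app_running"


@contextmanager
def run_content_app():
    """Application execution unit 17: keep the execution flag set while the
    content distribution application is running, and clear it afterwards."""
    memory[APP_EXEC_FLAG] = True
    try:
        yield
    finally:
        memory.pop(APP_EXEC_FLAG, None)


def select_state() -> VoiceState:
    """Selection unit 1015: invalid while the predetermined application runs."""
    return VoiceState.INVALID if memory.get(APP_EXEC_FLAG, False) else VoiceState.VALID


print(select_state())      # VoiceState.VALID
with run_content_app():
    print(select_state())  # VoiceState.INVALID
print(select_state())      # VoiceState.VALID
```

Clearing the flag in a `finally` clause is a design choice of this sketch, so that an abnormal application exit does not leave voice recognition stuck in the invalid state.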

The flow of the selection process of the valid state and the invalid state of the voice recognition in the present embodiment is the same as that in the first embodiment shown in fig. 3.

As described above, the television apparatus 10 according to the present embodiment selects the valid state when the application for content distribution is not being executed, and selects the invalid state when the application for content distribution is being executed. Therefore, according to the television apparatus 10 of the present embodiment, in addition to the effects of the first embodiment, the following effects can be obtained: it is possible to reduce a situation in which the voice recognition service is started when moving image content or the like is displayed on the display panel 110 by an application that is distributing the content.

That is, according to the television apparatus 10 of the present embodiment, it is possible to reduce the occurrence of a situation in which the moving image of the content displayed on the display panel 110 disappears due to the start of the voice recognition service, or the moving image of the content is blocked by displaying a response message on the moving image of the content. Further, since the volume of the speaker 109 is lowered at the start of the voice recognition service, the viewing of a moving image of the content being played may be obstructed. According to the television apparatus 10 of the present embodiment, since it is possible to reduce the situation in which the voice recognition service is started when the video content or the like is displayed on the display panel 110 by the application that is distributing the content, it is possible to reduce the situation in which the user is prevented from viewing the moving image of the content being played.

Moreover, even if the voice recognition service is not actually started, a user who is on guard against its possible start may be unable to concentrate on viewing the moving image content or the like.

In the present embodiment, the predetermined application is an application for content distribution, but which application among applications executable by the television apparatus 10 is to be the "predetermined application" may be set in the television apparatus 10 in advance or may be designed to be settable by a user.

(third embodiment)

In the third embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the invalid period".

Fig. 5 is a diagram showing an example of a functional configuration of the television apparatus 10 according to the present embodiment. As shown in fig. 5, the television apparatus 10 includes an acquisition unit 1011, a wakeup word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, and a device control unit 16. The wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment.

The television apparatus 10 of the present embodiment holds a setting of an invalid period, that is, a period during which voice recognition is in the invalid state. The setting of the invalid period is stored, for example, in the memory 116. In the present embodiment, the invalid period is registered or changed by a user operation, and the setting consists of, for example, the start time and the end time of the invalid period.

More specifically, the acquisition unit 1011 of the present embodiment receives the input operation of the start time and the end time of the invalid period by the user in addition to the functions of the first embodiment. For example, the acquisition unit 1011 receives input operations of the start time and the end time of the invalid period by the user based on the infrared ray received from the remote controller 119 by the light receiving unit 112 or an operation input to the operation unit 111, and stores the received invalid period information indicating the start time and the end time of the invalid period in the memory 116 or the like. The storage location of the invalid period information is not limited to this.

For example, the user may set 23:00 to 06:00 as an invalid period so that the voice recognition service is not started while the user is asleep. The user may also set 09:00 to 17:00 as an invalid period so that the voice recognition service is not started while the user is away from home.

In the present embodiment, every period that is not set as an invalid period is a valid period. As in the first embodiment, voice recognition is in the valid state and the microphone 117 is on in the normal state.

The selection unit 2015 of the present embodiment selects either the valid state or the invalid state of voice recognition based on a predetermined condition, as in the first embodiment, but in the present embodiment the selection uses a condition different from that of the first embodiment.

More specifically, the predetermined condition in the present embodiment is that "the current time is within the invalid period". When the current time is within the invalid period, the selection unit 2015 of the present embodiment determines that the predetermined condition is satisfied and selects the invalid state of voice recognition. When the current time is within the valid period, the selection unit 2015 determines that the predetermined condition is not satisfied and selects the valid state of voice recognition.
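Because an invalid period such as 23:00 to 06:00 crosses midnight, a plain `start <= now < end` comparison is not enough. The minimal sketch below assumes half-open periods given as daily start and end times, treats a period whose start is later than its end as wrapping past midnight, and uses hypothetical function names.

```python
from datetime import time


def in_period(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end). A period whose start is later
    than its end is treated as crossing midnight (e.g. 23:00 to 06:00).
    A period with start == end is treated as empty."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def select_state(now: time, invalid_periods: list[tuple[time, time]]) -> str:
    """Selection unit 2015: invalid inside any invalid period, valid otherwise
    (every time outside the invalid periods is a valid period)."""
    if any(in_period(now, s, e) for s, e in invalid_periods):
        return "invalid"
    return "valid"


# The two example settings from the text: bedtime and time away from home.
periods = [(time(23, 0), time(6, 0)), (time(9, 0), time(17, 0))]
print(select_state(time(2, 30), periods))   # invalid
print(select_state(time(7, 15), periods))   # valid
print(select_state(time(12, 0), periods))   # invalid
```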

Note that the flow of the state selection process of voice recognition in the present embodiment is the same as that in the first embodiment shown in fig. 3.

As described above, according to the television apparatus 10 of the present embodiment, by selecting the valid state when the current time is within the valid period and selecting the invalid state when the current time is within the invalid period, it is possible to reduce the situation in which the voice recognition service is started in a period in which the user does not desire the start of the voice recognition service, in addition to the effect of the first embodiment.

In the present embodiment, the user sets the invalid period, but the television apparatus 10 may instead accept a setting of a valid period. For example, voice recognition may be in the invalid state in the normal state of the television apparatus 10 and be enabled only during the set valid period. In this case, the predetermined condition may be, for example, "the current time is within the valid period", and the selection unit 2015 may select the valid state of voice recognition when it determines that the predetermined condition is satisfied, and the invalid state when it determines that the predetermined condition is not satisfied.

In the present embodiment, the invalid period is defined only by the start time and the end time, but may be defined in more detail by calendar information such as a day of the week or a holiday.

(fourth embodiment)

In the fourth embodiment, as in the third embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the invalid period". However, whereas the user sets the invalid period in the third embodiment, in the fourth embodiment the television apparatus 10 sets the invalid period based on a learning result.

Fig. 6 is a diagram showing an example of a functional configuration of the television apparatus 10 according to the present embodiment. As shown in fig. 6, the television apparatus 10 includes an acquisition unit 11, a wakeup word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, a device control unit 16, and a learning unit 18. Like the other functional units, the learning unit 18 is realized by the CPU114 executing a program. The acquisition unit 11, the wakeup word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment. The selection unit 2015 has the same function as that of the third embodiment.

The learning unit 18 learns patterns in the operations performed by the user and generates a trained model. The trained model in the present embodiment is, for example, information that associates a time of day with whether the voice recognition service is needed at that time. The learning unit 18 may use, for example, an unsupervised learning technique from among known machine learning and deep learning methods. The trained model is stored, for example, in the memory 116, but the storage location is not limited to this.

The input data to the learning unit 18 are the user's operations and the times at which they occur, for example, the times at which the user cancels the voice recognition service or the times at which the user uses it. For example, when the user ends a started voice recognition service with the remote control 119 or the like without using it, the learning unit 18 learns that the user performed the cancel operation and the time at which it was performed.

The learning unit 18 outputs the time when the voice recognition service is not necessary based on the learning result. The learning unit 18 stores the output result in the memory 116 or the like as invalid period information indicating the start time and the end time of the invalid period.
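The mapping from logged operation times to invalid period information can be illustrated with a deliberately simple stand-in. The text names unsupervised machine learning; the sketch below substitutes a per-hour frequency count, so it shows only the input (times of cancel and use operations) and the output (start and end times of invalid periods), not the patent's model. The function names, the hourly granularity, and the `min_events` threshold are all assumptions.

```python
from collections import Counter
from datetime import time


def learn_invalid_hours(cancel_hours: list[int], use_hours: list[int],
                        min_events: int = 3) -> list[int]:
    """Return the hours (0-23) in which cancellations of the voice recognition
    service outnumber uses, considering only hours with at least `min_events`
    observed events. A frequency count, not the patent's learning model."""
    cancels, uses = Counter(cancel_hours), Counter(use_hours)
    return [h for h in range(24)
            if cancels[h] + uses[h] >= min_events and cancels[h] > uses[h]]


def hours_to_periods(hours: list[int]) -> list[tuple[time, time]]:
    """Merge consecutive hours into (start, end) invalid periods.
    If all 24 hours are given, the result is the single pair (00:00, 00:00)."""
    periods: list[tuple[time, time]] = []
    for h in sorted(set(hours)):
        end = time((h + 1) % 24)
        if periods and periods[-1][1] == time(h):
            periods[-1] = (periods[-1][0], end)  # extend the current period
        else:
            periods.append((time(h), end))
    # Join a period that ends at midnight with one that starts at midnight.
    if len(periods) > 1 and periods[0][0] == time(0) and periods[-1][1] == time(0):
        periods[0] = (periods[-1][0], periods[0][1])
        periods.pop()
    return periods


cancel_times = [23, 23, 23, 0, 0, 0, 12]  # hours at which the user cancelled
use_times = [20, 20, 20, 12, 12]          # hours at which the user used the service
invalid = hours_to_periods(learn_invalid_hours(cancel_times, use_times))
print(invalid)  # one period from 23:00 to 01:00
```

The resulting (start, end) pairs have the same shape as the user-entered invalid period information of the third embodiment, so the same selection logic can consume them.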

Further, the learning unit 18 continues to learn the patterns of the user's operations even after the trained model has been generated, thereby improving the accuracy of the trained model.

In this way, the television apparatus 10 according to the present embodiment sets an invalid period for voice recognition based on the result of learning the patterns of the user's operations, selects the valid state when the current time is within the valid period, and selects the invalid state when the current time is within the invalid period. Therefore, in addition to the effects of the first and third embodiments, the television apparatus 10 of the present embodiment can reduce the time and effort the user spends setting the invalid period.

The input data to the learning unit 18 and the output from the learning unit 18 described in the present embodiment are merely examples and are not limiting. The learning unit 18 may also set invalid periods that differ according to calendar information, such as the day of the week or holidays, and not only according to the time of day.

In the present embodiment, the television apparatus 10 sets the invalid period of the voice recognition based on the result of learning the pattern of the user's operation, but may set the valid period of the voice recognition based on the result of learning.

(fifth embodiment)

In the fifth embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the period from the start time to the end time of the specific program".

Fig. 7 is a diagram showing an example of a functional configuration of the television apparatus 10 according to the present embodiment. As shown in fig. 7, the television apparatus 10 includes an acquisition unit 2011, a wakeup word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 3015, a device control unit 16, and a program table generation unit 19. The wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment.

In addition to the functions of the first embodiment, the acquisition unit 2011 according to the present embodiment acquires information about programs from the service information (SI) included in the broadcast signal, and transmits the acquired program information to the program table generation unit 19.

The acquisition unit 2011 according to the present embodiment receives an operation performed by the user to specify a specific program. For example, the acquisition unit 2011 receives an operation for specifying a specific program by the user based on infrared rays from the remote control 119 received by the light receiving unit 112 or an operation input to the operation unit 111. The acquiring unit 2011 acquires the start time and the end time of the specific program designated by the user from the program table stored in the memory 116. The acquisition unit 2011 stores the received program time information indicating the start time and the end time of the specific program in the memory 116 or the like. The storage location of the program time information is not limited to this.

The program table generation unit 19 generates a program table based on the program information acquired by the acquisition unit 2011, and stores the generated program table in, for example, the memory 116.

Alternatively, the apparatus may be designed so that the user directly inputs the start time and the end time of a specific program.

The selection unit 3015 of the present embodiment selects either the valid state or the invalid state of voice recognition based on predetermined conditions as in the first embodiment, but in the present embodiment, either the valid state or the invalid state is selected using a condition different from that of the first embodiment.

More specifically, the predetermined condition in the present embodiment is "the current time is within a period from the start time to the end time of the specific program". The "period from the start time to the end time of the specific program" is an example of the invalid period in the present embodiment.

The selection unit 3015 of the present embodiment selects either the valid state or the invalid state based on whether or not the current time is within a period from the start time to the end time of the specific program. For example, when the current time is within a period from the start time to the end time of the specific program, the selection unit 3015 determines that a predetermined condition is satisfied and selects the disabled state of the voice recognition. When the current time is not within the period from the start time to the end time of the specific program, the selection unit 3015 determines that a predetermined condition is not satisfied, and selects the valid state of the voice recognition.
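A minimal sketch of this check follows, assuming the program time information is kept as full date-and-time values, so that a program running past midnight needs no special handling. `ProgramSlot`, its fields, and the sample program are hypothetical illustrations, not data from the program table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProgramSlot:
    # Hypothetical record of the program time information kept in memory 116.
    title: str
    start: datetime
    end: datetime


def select_state(now: datetime, specific_programs: list[ProgramSlot]) -> str:
    """Selection unit 3015: invalid while the current time falls between the
    start time and the end time of any user-designated program."""
    if any(p.start <= now < p.end for p in specific_programs):
        return "invalid"
    return "valid"


# Illustrative designated program (not real schedule data).
favorites = [ProgramSlot("Evening drama",
                         datetime(2024, 1, 5, 21, 0), datetime(2024, 1, 5, 21, 54))]
print(select_state(datetime(2024, 1, 5, 21, 30), favorites))  # invalid
print(select_state(datetime(2024, 1, 5, 22, 0), favorites))   # valid
```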

In this way, the television apparatus 10 according to the present embodiment selects either the valid state or the invalid state based on whether the current time is within the period from the start time to the end time of the specific program. In addition to the effects of the first embodiment, the television apparatus 10 can therefore reduce situations in which the voice recognition service is started while the user is watching a specific program, so that the user is less likely to be disturbed by an unnecessary voice recognition service while watching a favorite program. The television apparatus 10 can also reduce erroneous operations in which, while the user is watching a specific program, the voice recognition service accidentally switches to another program or turns off the power of the television apparatus 10, and thus reduce situations in which the user misses the program because of such an erroneous operation.

In the present embodiment, the specific program is set by the user, but the specific program may be set by the television apparatus 10 based on a learning result obtained by learning the viewing history of the user.

In the present embodiment, the television apparatus 10 as an example of the receiving apparatus acquires information on a program from a broadcast signal, but the receiving apparatus may acquire program table data from the outside via the IP communication unit 113 and the network 300.

(modification 1)

In the first to fifth embodiments, the microphone 117 is switched on and off according to whether the valid state or the invalid state of voice recognition is selected, but the valid state and the invalid state of the voice recognition function may instead be switched while the microphone 117 remains on.

For example, when the invalid state of voice recognition is selected, the wake-up word detection unit 12 and the voice recognition unit 13 perform neither wake-up word detection nor voice recognition processing on voice input to the microphone 117. Therefore, when the invalid state is selected, the voice recognition service is not started even though the microphone 117 can still accept voice input.

When the valid state of voice recognition is selected, the wake-up word detection unit 12 and the voice recognition unit 13 perform wake-up word detection and voice recognition processing on voice input to the microphone 117, as in the first to fifth embodiments.
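Modification 1 can be pictured as a gate placed in front of wake-up word detection, with the microphone left on. In the hypothetical sketch below, plain text stands in for captured audio and a case-insensitive substring match stands in for the wake-up word detection unit 12; `VoicePipeline` and its attributes are illustrative names only.

```python
class VoicePipeline:
    """Hypothetical sketch of modification 1: the microphone stays on and the
    selected state decides whether wake-up word detection runs at all."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word.lower()
        self.valid = True          # state chosen by the selection unit
        self.microphone_on = True  # never switched off in this modification
        self.service_started = False

    def on_utterance(self, utterance: str) -> bool:
        """Handle one utterance (plain text stands in for audio).

        Returns True if the voice recognition service was started."""
        if not self.valid:
            # Invalid state: skip wake-up word detection and voice recognition.
            return False
        # Toy wake-up word detector: case-insensitive substring match.
        if self.wake_word in utterance.lower():
            self.service_started = True
            return True
        return False


pipeline = VoicePipeline("Hey TV")
pipeline.valid = False
print(pipeline.on_utterance("hey tv, turn it down"), pipeline.microphone_on)  # False True
pipeline.valid = True
print(pipeline.on_utterance("hey tv, turn it down"))  # True
```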

(modification 2)

In the first to fifth embodiments described above, the valid state and the invalid state of voice recognition are selected based on mutually different predetermined conditions, but the predetermined conditions of the different embodiments may be combined. For example, the predetermined condition for selecting the invalid state of voice recognition may be obtained by combining the predetermined conditions of the first to fifth embodiments with a logical OR: "at least one of the operation screen and the notification screen is being displayed on the display panel 110, OR the predetermined application is being executed, OR the current time is within an invalid period, OR the current time is within the period from the start time to the end time of the specific program". Alternatively, the predetermined condition may combine only some of these conditions.
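The OR combination can be sketched by expressing each embodiment's condition as a predicate over a snapshot of the apparatus state and selecting the invalid state if any predicate holds; passing a shorter list gives a combination of only some conditions. `TvStatus` and the predicate names are hypothetical, and the period checks follow the same half-open, midnight-wrapping assumptions as the earlier sketches.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable


@dataclass
class TvStatus:
    # Hypothetical snapshot of what the selection unit can observe.
    now: datetime
    operation_screen_shown: bool = False
    notification_screen_shown: bool = False
    content_app_running: bool = False


Condition = Callable[[TvStatus], bool]


def screen_shown(s: TvStatus) -> bool:
    """First embodiment: operation screen or notification screen displayed."""
    return s.operation_screen_shown or s.notification_screen_shown


def app_running(s: TvStatus) -> bool:
    """Second embodiment: the predetermined application is being executed."""
    return s.content_app_running


def within_daily_period(start: time, end: time) -> Condition:
    """Third/fourth embodiments: daily invalid period, may cross midnight."""
    def check(s: TvStatus) -> bool:
        t = s.now.time()
        return start <= t < end if start <= end else (t >= start or t < end)
    return check


def within_program(start: datetime, end: datetime) -> Condition:
    """Fifth embodiment: period from the start to the end of a specific program."""
    return lambda s: start <= s.now < end


def select_state(status: TvStatus, conditions: list[Condition]) -> str:
    """Select the invalid state if ANY of the combined conditions holds (OR)."""
    return "invalid" if any(c(status) for c in conditions) else "valid"


conditions = [screen_shown, app_running, within_daily_period(time(23, 0), time(6, 0))]
print(select_state(TvStatus(now=datetime(2024, 1, 5, 20, 0)), conditions))  # valid
print(select_state(TvStatus(now=datetime(2024, 1, 5, 20, 0), content_app_running=True), conditions))  # invalid
print(select_state(TvStatus(now=datetime(2024, 1, 5, 23, 30)), conditions))  # invalid
```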

(modification 3)

In the first to fifth embodiments, the television apparatus 10 is used as an example of the receiving apparatus, but the receiving apparatus is not limited to this. For example, the receiving apparatus may be a set-top box, a PC (Personal Computer) with a television function, or the like, or may be a recording/playing apparatus such as a BD (Blu-ray Disc) (registered trademark) recorder or a DVD recorder.

As described above, according to the first to fifth embodiments, it is possible to reduce the situation in which the voice recognition service is started when the voice recognition service is not required.

Several embodiments of the present application have been described, but these embodiments are presented only as examples and are not intended to limit the scope of the application. These new embodiments may be implemented in other various forms, and various omissions, substitutions, and changes may be made without departing from the spirit of the present application. These embodiments and modifications thereof are included in the scope and gist of the application, and are included in the inventions described in the claims and the scope equivalent thereto.

Claims

1. A receiving device comprising:

a voice input unit that inputs a voice of a user;

a selection unit that selects either one of a valid state and an invalid state of voice recognition based on a predetermined condition; and

a voice recognition unit that executes voice recognition processing for the voice input to the voice input unit when the valid state is selected, and does not execute the voice recognition processing when the invalid state is selected;

the predetermined condition is that the notification screen is being displayed on the display unit,

the selection unit determines that the predetermined condition is satisfied and determines the invalid state when the notification screen is displayed on the display unit, determines the valid state when the notification screen is not displayed on the display unit, and determines that the predetermined condition is satisfied when a notification screen display flag is set in a memory of the receiving device.

2. The receiving device of claim 1,

the predetermined condition is that a predetermined application is being executed,

the selection unit acquires an execution state of the predetermined application, determines the valid state when the predetermined application is not being executed, and determines the invalid state when the predetermined application is being executed.

3. The receiving device of claim 1,

the predetermined condition is that the current time is within an invalid period or within a valid period,

the selection unit determines the valid state when the current time is within the valid period, and determines the invalid state when the current time is within the invalid period.

4. The receiving device according to any one of claims 1 to 3, wherein

the voice input unit is a microphone, and

the receiving device further includes a device control unit that turns the microphone on when the selection unit determines the valid state, and turns the microphone off when the selection unit determines the invalid state.

5. The receiving device according to claim 3, wherein the invalid period is settable by a user.

6. The receiving device according to claim 3, wherein the invalid period includes a sleep period of the user or a period during which the user is away from home.

7. The receiving device according to any one of claims 1 to 3, wherein the valid state and the invalid state of a voice recognition function are switched while the microphone is kept in an on state.