WO2021027892A1 - Receiving device - Google Patents

Receiving device

Info

Publication number
WO2021027892A1
WO2021027892A1 PCT/CN2020/108978
Authority
WO
WIPO (PCT)
Prior art keywords
unit
voice recognition
state
voice
invalid
Prior art date
Application number
PCT/CN2020/108978
Other languages
English (en)
French (fr)
Inventor
山下丈次
Original Assignee
海信视像科技股份有限公司
东芝视频解决方案株式会社
Priority date
Filing date
Publication date
Application filed by 海信视像科技股份有限公司, 东芝视频解决方案株式会社
Priority to CN202080004651.1A priority Critical patent/CN112930686B/zh
Publication of WO2021027892A1 publication Critical patent/WO2021027892A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/441Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Definitions

  • the embodiment of the present application relates to a receiving device.
  • a voice recognition service that enables users to operate devices using voice.
  • devices such as a television device with a voice recognition function.
  • a voice recognition service is activated when a wake word (Wake Word) uttered by the user is detected; for example, a certain response is made, or the volume of the content being played is reduced so that the user's voice can be picked up more easily.
  • a wake word (Wake Word)
  • Patent Document 1 Japanese Patent Application Publication No. 2013-235032
  • the receiving device of the embodiment includes a voice input unit, a selection unit, and a voice recognition unit.
  • the voice input unit inputs the user's voice.
  • the selection unit selects one of the valid state and the invalid state of voice recognition based on predetermined conditions. When the valid state is selected, the voice recognition unit performs voice recognition processing for the voice input to the voice input unit, and when the invalid state is selected, the voice recognition unit does not perform voice recognition processing.
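The gating described above, in which recognition runs only while the selection unit has chosen the valid state, can be sketched as follows (an illustrative Python sketch; the class and method names are not from the patent):

```python
# Illustrative sketch (class and method names are not from the patent):
# voice recognition runs only while the selection unit has chosen the
# valid state; in the invalid state the input is ignored.
from enum import Enum


class RecognitionState(Enum):
    VALID = "valid"
    INVALID = "invalid"


class Receiver:
    def __init__(self):
        # Assume the device starts up with voice recognition valid.
        self.state = RecognitionState.VALID

    def select_state(self, condition_satisfied: bool) -> RecognitionState:
        # Selection unit: invalid while the predetermined condition holds,
        # valid otherwise.
        self.state = (RecognitionState.INVALID if condition_satisfied
                      else RecognitionState.VALID)
        return self.state

    def recognize(self, speech: str):
        # Voice recognition unit: processing is skipped in the invalid state.
        if self.state is RecognitionState.INVALID:
            return None
        return f"recognized:{speech}"
```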
  • FIG. 1 is a diagram showing an example of the hardware configuration of the television device of the first embodiment
  • FIG. 2 is a diagram showing an example of the functional configuration of the television device of the first embodiment
  • FIG. 3 is a flowchart showing an example of the flow of the selection process of the valid state and the invalid state of voice recognition in the first embodiment
  • FIG. 4 is a diagram showing an example of the functional configuration of the television device of the second embodiment
  • FIG. 5 is a diagram showing an example of the functional configuration of the television device of the third embodiment
  • Fig. 6 is a diagram showing an example of a functional configuration of a television device according to a fourth embodiment
  • Fig. 7 is a diagram showing an example of a functional configuration of a television device according to a fifth embodiment.
  • FIG. 1 is a diagram showing an example of the hardware configuration of a television device 10 according to this embodiment.
  • the television device 10 includes an antenna 101, an input terminal 102a, a tuner 103, a demodulator 104, a demultiplexer 105, input terminals 102b and 102c, an A/D (analog/digital) converter 106, a selector 107, a signal processing unit 108, a speaker 109, a display panel 110, an operation unit 111, a light receiving unit 112, an IP communication unit 113, a CPU (Central Processing Unit) 114, a memory 115, a storage 116, a microphone 117, and an audio I/F (interface) 118.
  • the television device 10 is an example of the receiving device in this embodiment.
  • the antenna 101 receives broadcasting signals of digital broadcasting, and supplies the received broadcasting signals to the tuner 103 via the input terminal 102a.
  • the tuner 103 selects a broadcast signal of a desired channel from the broadcast signals supplied from the antenna 101, and supplies the selected broadcast signal to the demodulator 104.
  • Broadcast signals are also called broadcast waves.
  • the demodulator 104 demodulates the broadcast signal supplied from the tuner 103 and supplies the demodulated broadcast signal to the demultiplexer 105.
  • the demultiplexer 105 separates the broadcast signal supplied from the demodulator 104 to generate an image signal and an audio signal, and supplies the generated image signal and audio signal to the selector 107.
  • the selector 107 is configured to select one of a plurality of signals supplied from the demultiplexer 105, the A/D converter 106, and the input terminal 102c, and supply the selected one signal to the signal processing unit 108.
  • the signal processing unit 108 is configured to perform predetermined signal processing on the image signal supplied from the selector 107 and supply the processed image signal to the display panel 110. In addition, the signal processing unit 108 is configured to perform predetermined signal processing on the audio signal supplied from the selector 107 and supply the processed audio signal to the speaker 109.
  • the speaker 109 is configured to output speech or various sounds based on the sound signal supplied from the signal processing unit 108. In addition, the speaker 109 changes the volume of the output voice or various sounds based on the control performed by the CPU 114.
  • the display panel 110 is configured to display images such as still images and moving images based on image signals supplied from the signal processing unit 108 or control by the CPU 114.
  • the display panel 110 is an example of a display unit.
  • the input terminal 102b receives an analog signal (image signal and audio signal) input from the outside.
  • the input terminal 102c is configured to receive digital signals (image signals and audio signals) input from the outside.
  • the input terminal 102c is designed so that it can be connected to, for example, a video recorder (BD recorder) equipped with a drive device for recording and playback of recording media such as a BD (Blu-ray Disc) (registered trademark).
  • BD Blu-ray Disc
  • the A/D converter 106 supplies the selector 107 with a digital signal generated by A/D conversion of the analog signal supplied from the input terminal 102b.
  • the operation unit 111 receives the user's operation input.
  • the light receiving unit 112 receives infrared rays from the remote controller 119.
  • the IP communication unit 113 is a communication interface for performing IP (Internet Protocol) communication via the network 300.
  • the CPU 114 is a control unit that controls the entire television device 10.
  • the memory 115 includes a ROM (Read Only Memory) that stores various computer programs executed by the CPU 114, a RAM (Random Access Memory) that provides a work area for the CPU 114, and the like.
  • the storage 116 is HDD (Hard Disk Drive), SSD (Solid State Drive), or the like.
  • the storage 116 stores the signal selected by the selector 107 as recording data, for example.
  • the microphone 117 acquires the voice of the user's speech and transmits it to the audio I/F 118.
  • the microphone 117 is an example of a voice input unit.
  • the microphone 117 is automatically turned on when the television device 10 is activated.
  • when the valid state of voice recognition is selected under the control of the CPU 114, the microphone 117 remains in the on state.
  • when the invalid state of voice recognition is selected under the control of the CPU 114, the microphone 117 is switched to the off state. The details of the selection of the valid state and the invalid state of voice recognition will be described later as processing of the selection unit 15.
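The on/off control of the microphone 117 described above might be modeled as below (a hypothetical sketch; `Microphone` and `DeviceControl` are illustrative stand-ins, not components defined by these names in the patent):

```python
# Hypothetical sketch of the microphone on/off control described above.
class Microphone:
    def __init__(self):
        self.is_on = True  # turned on automatically when the device starts

    def set_on(self, on: bool) -> None:
        self.is_on = on


class DeviceControl:
    def __init__(self, mic: Microphone):
        self.mic = mic

    def apply_selection(self, valid_state_selected: bool) -> None:
        # Valid state selected -> keep the microphone open;
        # invalid state selected -> switch it off so that no sound
        # can physically be input.
        self.mic.set_on(valid_state_selected)
```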
  • the audio I/F 118 performs analog/digital conversion on the sound acquired by the microphone 117 and transmits it to the CPU 114 as a sound signal.
  • FIG. 2 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment.
  • the television device 10 includes an acquisition unit 11, a wake word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 15, and a device control unit 16.
  • the program executed by the television device 10 of the present embodiment has a module structure including the above-mentioned units (acquisition unit, wake word detection unit, voice recognition unit, display control unit, selection unit, and device control unit). As actual hardware, the CPU 114 reads the program from the ROM or the like and executes it, whereby the above units are loaded onto a main storage device such as the RAM, and the acquisition unit, wake word detection unit, voice recognition unit, display control unit, selection unit, and device control unit are generated on the main storage device.
  • the program executed by the television device 10 of the present embodiment is provided, for example, by being incorporated in a ROM or the like in advance.
  • the program executed by the television device 10 of the present embodiment may also be provided in the form of an installable or executable file stored in a computer-readable storage medium such as a CD-ROM, flexible disk (FD), CD-R, or DVD (Digital Versatile Disk).
  • each functional unit is realized by one CPU, but each functional unit may be realized by a plurality of CPUs or various circuits.
  • the acquisition unit 11 acquires the user's voice input into the microphone 117 via the audio I/F 118.
  • the acquisition unit 11 sends the acquired voice to the wake word detection unit 12 and the voice recognition unit 13.
  • the “sound” acquired by the acquiring unit 11 is a digital sound signal converted by the audio I/F 118, but is simply described as “sound” below.
  • the acquisition unit 11 acquires various signals from the operation unit 111, the light receiving unit 112, the IP communication unit 113, the selector 107, the signal processing unit 108, and the like connected to the CPU 114.
  • the acquisition unit 11 receives the user's operation based on the infrared rays from the remote control 119 received by the light receiving unit 112 or the operation input to the operation unit 111.
  • the acquisition unit 11 transmits the received content of the user's operation to the display control unit 14 and the device control unit 16.
  • the wake word detection unit 12 detects a wake word (Wake Word) from the sound acquired by the acquisition unit 11.
  • the wake-up word is a prescribed voice command that becomes a trigger for starting the voice recognition service.
  • Wake words are pre-set words.
  • the method for judging whether the sound signal contains a wake-up word can use a known sound recognition technology.
  • the setting of the wake word detection unit 12 does not change according to whether the selection unit 15 (described later) selects the valid state or the invalid state of voice recognition. However, when the invalid state is selected, the microphone 117 is turned off and sound input cannot be performed, so no sound can be acquired. Therefore, the wake word detection unit 12 does not execute the wake word detection processing when the invalid state of voice recognition is selected.
  • when the valid state of voice recognition is selected, the microphone 117 is in the on state and voice input can be performed. Therefore, the wake word detection unit 12 executes the wake word detection processing for the sound input into the microphone 117.
  • the wake word detection unit 12 notifies the display control unit 14 and the device control unit 16 that the wake word is detected when the wake word is detected from the sound acquired by the acquisition unit 11. In addition, the wake-up word detection unit 12 transmits the voice following the wake-up word to the voice recognition unit 13 when the user's voice is input after the wake-up word.
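As a rough text-based stand-in for the wake word handling described above (real keyword spotting operates on audio features, not strings, and "hello tv" is a placeholder, not the device's actual wake word):

```python
# A rough text-based stand-in for the wake word handling described above.
WAKE_WORD = "hello tv"  # placeholder wake word


def detect_wake_word(utterance: str):
    """Return (detected, following_speech).

    When the utterance begins with the wake word, the speech that follows
    it is what gets forwarded to the voice recognition unit."""
    text = utterance.strip().lower()
    if text.startswith(WAKE_WORD):
        return True, text[len(WAKE_WORD):].strip()
    return False, ""
```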
  • the voice recognition unit 13 performs voice recognition processing for the voice input into the microphone 117.
  • the setting of the voice recognition unit 13 does not change according to whether the selection unit 15 (described later) selects the valid state or the invalid state of voice recognition. However, when the invalid state is selected, the microphone 117 cannot input sound, and therefore no sound can be acquired. Therefore, the voice recognition unit 13 does not execute voice recognition processing when the invalid state of voice recognition is selected.
  • when the valid state of voice recognition is selected, the microphone 117 can perform voice input. Therefore, the voice recognition unit 13 executes voice recognition processing for the voice input into the microphone 117.
  • the voice recognition unit 13 determines the content of the user's voice by performing voice recognition processing on the voice following the wake-up word.
  • a known technique can be applied to the voice recognition processing.
  • the voice recognition unit 13 uses a known technology to convert the user's voice content into text data.
  • the voice recognition unit 13 sends the voice recognition result to the display control unit 14 and the device control unit 16.
  • each functional unit such as the display control unit 14 or the device control unit 16 executes processing based on the result of the voice recognition unit 13 performing voice recognition on the user's voice, thereby realizing a voice recognition service.
  • the display control unit 14 controls various displays on the display panel 110. For example, when the acquisition unit 11 acquires a user's operation input via the remote controller 119 or the like, the display control unit 14 displays an operation screen corresponding to the operation on the display panel 110. More specifically, when the user performs an operation such as pressing a button for starting the setting of a recording reservation, the display control unit 14 displays on the display panel 110 an operation screen that can accept the user's operation.
  • the display form of the operation screen may be, for example, an OSD (On Screen Display) displayed superimposed on the screen of the content being played, or may be a full screen display displayed on the entire display panel 110.
  • “content” includes a TV program, a moving image recorded on a DVD or the like, or a moving image played by an application.
  • the display control unit 14 displays various notification screens on the display panel 110.
  • the display control unit 14 superimposes a notification screen including messages such as providing information to the user, issuing a warning to the user, or calling the user's attention, on the screen of the content being played, and displays it as an OSD.
  • the display control unit 14 displays a message or an icon in response to a voice on the display panel 110 when the wake-up word is detected by the wake-up word detection unit 12.
  • a message or an icon that responds to the voice may be, for example, content that urges the user to speak, or may be a format in which the recognition result of the user's voice is displayed as text data. Through the display of this message, icon, etc., the user can easily recognize that the wake-up word is recognized and the speech sound becomes an instruction to the television device 10.
  • when the display control unit 14 displays an operation screen or a notification screen on the display panel 110, it sets, in the memory 115, an operation screen display flag indicating that the operation screen is being displayed or a notification screen display flag indicating that the notification screen is being displayed. In addition, the display control unit 14 deletes the operation screen display flag or the notification screen display flag from the memory 115 when the display of the operation screen or the notification screen ends. It should be noted that the method of indicating that the operation screen or the notification screen is displayed on the display panel 110 is not limited to this. For example, the display control unit 14 may notify the selection unit 15 of a message indicating that an operation screen or a notification screen is displayed on the display panel 110, or a message indicating that the display of the operation screen or the notification screen has ended.
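The flag bookkeeping described above could be sketched like this (a plain dict stands in for the flag area of the memory 115, and the flag key names are invented for illustration):

```python
# Sketch of the display-flag bookkeeping described above; the key names
# are illustrative assumptions, not identifiers from the patent.
OPERATION_FLAG = "operation_screen_display_flag"
NOTIFICATION_FLAG = "notification_screen_display_flag"


class DisplayControl:
    def __init__(self, memory: dict):
        self.memory = memory  # stands in for the flag area of memory 115

    def show_operation_screen(self) -> None:
        self.memory[OPERATION_FLAG] = True     # set when display starts

    def end_operation_screen(self) -> None:
        self.memory.pop(OPERATION_FLAG, None)  # deleted when display ends

    def show_notification_screen(self) -> None:
        self.memory[NOTIFICATION_FLAG] = True

    def end_notification_screen(self) -> None:
        self.memory.pop(NOTIFICATION_FLAG, None)
```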
  • the display control unit 14 controls the display of the display panel 110 based on a command included in the user's voice recognized by the voice recognition unit 13. For example, the display control unit 14 controls the tuner 103 based on a command included in the user's voice, selects a channel on which a program designated by the user is broadcast, and displays the program on the display panel 110. In addition, the display control unit 14 may play the recording data of the program stored in the memory 116 or an external storage device and display it on the display panel 110 based on a command included in the user's voice.
  • the selection unit 15 selects one of the valid state and the invalid state of voice recognition based on predetermined conditions.
  • the predetermined condition in this embodiment is that "at least one of the operation screen and the notification screen is being displayed on the display panel 110".
  • the selection unit 15 of the present embodiment selects the invalid state when the state of the display panel 110 of the television device 10 satisfies a predetermined condition.
  • the selection unit 15 selects the valid state when the state of the display panel 110 of the television device 10 does not satisfy a predetermined condition.
  • when an operation screen display flag is set in the memory 115, the selection unit 15 determines that the operation screen is being displayed, and when a notification screen display flag is set in the memory 115, it determines that the notification screen is being displayed.
  • when the selection unit 15 determines that at least one of the operation screen and the notification screen is being displayed on the display panel 110, it determines that the television device 10 satisfies the predetermined condition. In this case, the selection unit 15 selects the invalid state.
  • the method of determining whether the operation screen or notification screen is displayed is not limited to this.
  • the selection unit 15 may also determine whether the operation screen or the notification screen is displayed based on a notification from the display control unit 14 as to whether at least one of the operation screen and the notification screen is being displayed on the display panel 110.
  • when the selection unit 15 determines that neither the operation screen nor the notification screen is displayed on the display panel 110, it determines that the television device 10 does not satisfy the predetermined condition. In this case, the selection unit 15 selects the valid state.
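Put together, the determinations above reduce to a simple predicate (illustrative Python; the flag key names are assumptions, not identifiers from the patent):

```python
# The selection unit's decision reduced to a predicate over the flags.
OPERATION_FLAG = "operation_screen_display_flag"
NOTIFICATION_FLAG = "notification_screen_display_flag"


def predetermined_condition(memory: dict) -> bool:
    # Satisfied when at least one of the operation screen and the
    # notification screen is being displayed (its flag is set).
    return OPERATION_FLAG in memory or NOTIFICATION_FLAG in memory


def select_state(memory: dict) -> str:
    # Invalid state while the condition holds, valid state otherwise.
    return "invalid" if predetermined_condition(memory) else "valid"
```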
  • the selection unit 15 transmits the selection result of the valid state or the invalid state of voice recognition to the device control unit 16.
  • the device control unit 16 controls various devices included in the television device 10. For example, the device control unit 16 sets the microphone 117 to the off state when the invalid state of voice recognition is selected by the selection unit 15. In addition, for example, the device control unit 16 sets the microphone 117 to the on state when the valid state of voice recognition is selected by the selection unit 15.
  • the device control unit 16 controls the speaker 109 to lower the volume when the wake-up word is detected by the wake-up word detection unit 12. This is to reduce the situation where the user's speech input after the wake-up word is disturbed by the sound of the content.
  • the device control unit 16 controls various devices included in the television device 10 based on commands included in the user's voice recognized by the voice recognition unit 13. For example, the device control unit 16 controls the speaker 109 to increase the volume when the user's voice includes a command such as "volume up". It should be noted that the device control unit 16 may retrieve information from the Internet based on a command included in the user's voice recognized by the voice recognition unit 13.
  • FIG. 3 is a flowchart showing an example of the flow of the selection process of the valid state and the invalid state of voice recognition in this embodiment. It is assumed that the processing of this flowchart is continuously executed while the television device 10 is operating. In addition, it is assumed that the voice recognition is in the active state and the microphone 117 is in the on state at the beginning of the flowchart.
  • the selection unit 15 determines whether or not the television device 10 satisfies a predetermined condition based on, for example, whether an operation screen display flag or a notification screen display flag is set in the memory 115 (S1).
  • when the selection unit 15 determines that the television device 10 satisfies the predetermined condition (Yes in S1), the selection unit 15 selects the invalid state of voice recognition (S2). The selection unit 15 notifies the device control unit 16 that the invalid state of voice recognition has been selected.
  • the device control unit 16 sets the microphone 117 to the off state (S3). As a result, the microphone 117 is in a state of not receiving sound input. After the device control unit 16 sets the microphone 117 to the off state, the process returns to S1, and the process is repeated.
  • when the selection unit 15 determines that the television device 10 does not satisfy the predetermined condition (No in S1), the selection unit 15 selects the valid state of voice recognition (S4). For example, when the display of the operation screen or the notification screen ends and the flag is deleted after voice recognition has become the invalid state, the selection unit 15 selects the valid state, and voice recognition is switched from the invalid state to the valid state. The selection unit 15 notifies the device control unit 16 that the valid state of voice recognition has been selected.
  • the device control unit 16 sets the microphone 117 to the on state (S5). As a result, the microphone 117 is in a state capable of receiving sound input. It should be noted that when the microphone 117 is already in the on state, the device control unit 16 does not perform any processing.
  • the acquisition unit 11 acquires the user's voice input into the microphone 117 via the audio I/F 118 (S6).
  • the acquisition unit 11 sends the acquired voice to the wake word detection unit 12 and the voice recognition unit 13.
  • the wake word detection unit 12 determines whether or not a wake word is included in the voice acquired by the acquisition unit 11 (S7).
  • the wake word detection unit 12 detects a wake word from the acquired voice (Yes in S7), it notifies the display control unit 14 and the device control unit 16 that the wake word is detected.
  • the wake-up word detection unit 12 transmits the voice following the wake-up word to the voice recognition unit 13 when the user's voice is input after the wake-up word.
  • the device control unit 16 controls the speaker 109 to lower the volume of the content being played (S8).
  • the display control unit 14 displays a response message or an icon to the user on the display panel 110 (S9).
  • Such processing by the device control unit 16 or the display control unit 14 is an example of processing when the voice recognition service is started.
  • the voice recognition unit 13 performs voice recognition processing for the voice input into the microphone 117 after the wake word (S10).
  • the voice recognition unit 13 sends the voice recognition result of the voice recognition process to the display control unit 14 and the device control unit 16.
  • the display control unit 14 or the device control unit 16 executes processing based on the voice recognition result to realize the voice recognition service (S11). After that, the process returns to S1, and the process of this flowchart is repeated until the power of the television device 10 is cut off.
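One pass of the S1-S11 flow of FIG. 3 might look like the following sketch (illustrative only: audio handling, volume control, and display are reduced to trace strings, and "hello tv" is a placeholder wake word):

```python
# One pass of the S1-S11 flow of FIG. 3, reduced to a trace of actions.
def run_cycle(memory: dict, mic: dict, utterance: str,
              wake_word: str = "hello tv") -> list:
    trace = []
    # S1: does the device satisfy the predetermined condition?
    if ("operation_screen_display_flag" in memory
            or "notification_screen_display_flag" in memory):
        trace.append("S2: invalid state selected")
        mic["on"] = False                       # S3: microphone off
        trace.append("S3: microphone off")
        return trace                            # back to S1
    trace.append("S4: valid state selected")
    mic["on"] = True                            # S5: microphone on
    text = utterance.strip().lower()            # S6: acquire voice
    if not text.startswith(wake_word):          # S7: wake word detected?
        return trace                            # No -> back to S1
    trace.append("S8: lower content volume")
    trace.append("S9: show response message/icon")
    command = text[len(wake_word):].strip()     # speech after the wake word
    trace.append(f"S10-S11: recognize and execute '{command}'")
    return trace
```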
  • the television device 10 of the present embodiment selects either the valid state or the invalid state of voice recognition based on predetermined conditions, and when the valid state is selected, the voice recognition process for the voice input into the microphone 117 is executed. When the invalid state is selected, voice recognition processing is not executed. Therefore, according to the television device 10 of the present embodiment, it is possible to reduce the situation that the voice recognition service is started when the voice recognition service is not required.
  • for example, when the operation screen is displayed on the display panel, if a voice spoken by the user that is not a wake word is misrecognized as a wake word, the voice recognition service is started, and a response message or icon for the user is displayed on the display panel, making the operation screen disappear or become difficult to see.
  • when the notification screen is displayed on the display panel, the user reads the message displayed on the notification screen; it is therefore undesirable for the notification screen to be blocked by other screens until its display ends.
  • even if the user is watching the notification screen on the display panel, if the voice of the user is mistakenly recognized as a wake word, the voice recognition service is started, and a response message or icon for the user is displayed on the display panel, making the notification screen disappear or become difficult to see. Such a situation sometimes annoys the user and hinders the provision of information to the user.
  • when at least one of the operation screen and the notification screen is displayed on the display panel 110, the television device 10 of the present embodiment determines that the predetermined condition is satisfied and selects the invalid state. Therefore, according to the television device 10 of the present embodiment, it is possible to reduce the situation in which the voice recognition service is started while the operation screen or the notification screen is displayed on the display panel 110, and thus the situation in which a response message or icon for the user is displayed on the display panel 110 while the user is using the operation screen or the notification screen, making the operation screen or the notification screen difficult to see.
  • the television device 10 of the present embodiment sets the microphone 117 to the on state when the valid state is selected, and sets the microphone 117 to the off state when the invalid state is selected. Therefore, according to the television device 10 of the present embodiment, it is physically impossible to input the user's voice in the invalid state, and the situation of starting the voice recognition service can be reduced.
  • in this embodiment, the microphone 117 as hardware is used as an example of the voice input unit, but the acquisition unit 11 realized by a program may also be used as an example of the voice input unit.
  • the microphone 117 may not be provided in the main body of the television device 10 but may be provided in the remote control 119.
  • the voice input unit may also be realized by a voice recognition device external to the television device 10.
  • the selection unit 15 may determine that the predetermined condition is satisfied whenever the operation screen is being displayed on the display panel 110, regardless of whether the notification screen is displayed. In addition, when the operation screen is not displayed on the display panel 110, the selection unit 15 may determine that the predetermined condition is not satisfied, regardless of whether the notification screen is displayed.
  • the wake word detection unit 12 and the voice recognition unit 13 are set as different functional units, but the voice recognition unit 13 may be designed to have the function of the wake word detection unit 12.
  • the voice recognition unit 13 and the wake word detection unit 12 may also be collectively referred to as a voice recognition unit. It should be noted that the content of the voice recognition service illustrated in this embodiment is only an example, and the content of the voice recognition service is not limited to the illustrated content.
  • the reduction of the volume and the display of the response message on the display panel 110 in this embodiment are examples of the processing at the start of the voice recognition service, and the processing at the start of the voice recognition service is not limited to this.
  • the television device 10 may output the response message in the form of voice when the voice recognition service starts.
  • the selection unit 15 selects the invalid state of voice recognition when it is determined that the predetermined condition is satisfied, and selects the valid state of voice recognition when it is determined that the predetermined condition is not satisfied, but the selection criterion is not limited to this.
  • for example, when voice recognition being in the invalid state is the normal state, the selection unit 15 may select the valid state of voice recognition when it is determined that the predetermined condition is satisfied, and select the invalid state of voice recognition when it is determined that the predetermined condition is not satisfied.
  • as a specific example, the predetermined condition may be that "neither the operation screen nor the notification screen is displayed on the display panel 110"
  • in that case, the selection unit 15 may determine that the predetermined condition is satisfied and select the valid state of voice recognition when neither the operation screen nor the notification screen is displayed on the display panel 110
  • conversely, when either the operation screen or the notification screen is being displayed on the display panel 110, the selection unit 15 may determine that the predetermined condition is not satisfied and select the invalid state of voice recognition.
  • the predetermined condition for selecting the invalid state of voice recognition is that "at least one of the operation screen and the notification screen is being displayed on the display panel 110".
  • the predetermined condition for selecting the invalid state of voice recognition is "the predetermined application is being executed”.
  • the hardware configuration of the television device 10 of this embodiment is the same as that of the first embodiment.
  • FIG. 4 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment.
  • the television device 10 includes an acquisition unit 11, a wake word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 1015, a device control unit 16, and an application execution unit 17.
  • the application execution unit 17 is also realized by the CPU 114 executing a program in the same way as other functional units.
  • the acquisition unit 11, the wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment.
  • the application execution unit 17 executes a content distribution application, and displays a dynamic image of the content distributed through the application on the display panel 110.
  • the content distribution application executed by the application execution unit 17 is an example of a predetermined application in this embodiment.
  • the content distribution application is, for example, an application that receives the distribution of content moving images such as TV series and movies from an external server via the network 300, but it may also be an application that includes other functions.
  • the application execution unit 17 sets, in the memory 115, an application execution flag indicating that the content distribution application is executing, for example, while the content distribution application is being executed.
  • the selection unit 1015 of this embodiment selects either the valid state or the invalid state of voice recognition based on a predetermined condition, as in the first embodiment; however, in this embodiment, a condition different from that of the first embodiment is used to make the selection.
  • the predetermined condition in this embodiment is that "the predetermined application (application for content distribution) is being executed".
  • the selection unit 1015 of this embodiment acquires the execution status of the content distribution application, and when the content distribution application is being executed, determines that the predetermined condition is satisfied and selects the invalid state of voice recognition. When the content distribution application is not being executed, the selection unit 1015 determines that the predetermined condition is not satisfied and selects the valid state of voice recognition.
  • the selection unit 1015 determines whether a predetermined application is being executed based on the presence or absence of an application execution flag in the memory 115, for example, but the execution status of the predetermined application may be obtained by other methods.
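The flag-based selection described in these bullets can be sketched as follows. The dictionary standing in for the memory 115 and the flag name `app_execution_flag` are illustrative assumptions for this sketch, not the actual implementation.

```python
def select_recognition_state(memory: dict) -> str:
    """Sketch of selection unit 1015 (second embodiment): voice
    recognition is invalidated while the predetermined (content
    distribution) application is running."""
    if memory.get("app_execution_flag"):
        return "invalid"   # predetermined condition satisfied
    return "valid"         # predetermined condition not satisfied
```

In this sketch the application execution unit would set and delete the flag, and the selection unit only reads it, mirroring the division of roles described above.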
  • the television device 10 of the present embodiment selects the valid state when the content distribution application is not being executed, and selects the invalid state when it is being executed. Therefore, in addition to the effects of the first embodiment, the television device 10 of the present embodiment can reduce situations in which the voice recognition service is started while moving image content distributed through the application is displayed on the display panel 110.
  • according to the television device 10 of the present embodiment, it is possible to reduce situations in which, due to the start of the voice recognition service, the moving image of the content displayed on the display panel 110 disappears or is blocked by a response message displayed over it.
  • furthermore, since the volume of the speaker 109 is lowered when the voice recognition service is started, viewing of the moving image being played may be hindered; the television device 10 of this embodiment can also reduce such situations.
  • the predetermined application is an application for content distribution, but which of the applications that can be executed by the television device 10 will become the "prescribed application" can be preset in the television device 10. It can also be designed to be set by the user.
  • the predetermined condition for selecting the invalid state of voice recognition is "the current moment is within the invalid period”.
  • the hardware configuration of the television device 10 of this embodiment is the same as that of the first embodiment.
  • FIG. 5 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment.
  • the television device 10 includes an acquisition unit 1011, a wake-up word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, and a device control unit 16.
  • the wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as the first embodiment.
  • the television device 10 of this embodiment has a setting for an invalid period during which voice recognition is in the invalid state.
  • the invalid period is the period during which voice recognition becomes invalid.
  • the setting of the invalid period is stored in the memory 116, for example. In this embodiment, the setting of the invalid period is registered or changed by the user's operation.
  • the setting of the invalid period is, for example, setting related to the start time and end time of the invalid period.
  • the acquisition unit 1011 of this embodiment has the function of the first embodiment, and also receives input operations of the start time and end time of the invalid period by the user.
  • the acquisition unit 1011 receives the user's input operation of the start time and end time of the invalid period based on the infrared rays from the remote control 119 received by the light receiving unit 112 or an operation input to the operation unit 111, and stores the received invalid period information, indicating the start time and end time of the invalid period, in the memory 116 or the like. It should be noted that the storage location of the invalid period information is not limited to this.
  • the user can, for example, set "PM23:00~AM06:00" as an invalid period to prevent the voice recognition service from being activated during bedtime.
  • likewise, the user can set "AM09:00~PM17:00" as an invalid period to prevent the voice recognition service from being activated while out.
  • in this embodiment, all periods not set as invalid periods are valid periods. It should be noted that, as in the first embodiment, in the normal state voice recognition is in the valid state and the microphone 117 is in the on state.
  • the selection unit 2015 of this embodiment selects either the valid state or the invalid state of voice recognition based on a predetermined condition, as in the first embodiment; however, in this embodiment, a condition different from that of the first embodiment is used to make the selection.
  • the predetermined condition in this embodiment is "the current time is within the invalid period”.
  • the selection unit 2015 of the present embodiment determines that a predetermined condition is satisfied when the current time is within the invalid period, and selects the invalid state of voice recognition.
  • when the current time is not within the invalid period, the selection unit 2015 determines that the predetermined condition is not satisfied, and selects the valid state of voice recognition.
  • the valid state is selected when the current time is within the valid period, and the invalid state is selected when the current time is within the invalid period; thereby, in addition to the effects of the first embodiment, it is possible to reduce situations in which the voice recognition service starts when the user does not expect it to.
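The time-based selection above can be sketched as follows. Note that an invalid period such as "PM23:00~AM06:00" crosses midnight, so the check must handle wraparound; the function names are illustrative assumptions.

```python
from datetime import time

def in_invalid_period(now: time, start: time, end: time) -> bool:
    """True when `now` falls in [start, end); handles periods that
    cross midnight, e.g. 23:00-06:00 for bedtime."""
    if start <= end:
        return start <= now < end
    # the period wraps past midnight
    return now >= start or now < end

def select_state(now: time, start: time, end: time) -> str:
    """Sketch of selection unit 2015 (third embodiment)."""
    return "invalid" if in_invalid_period(now, start, end) else "valid"
```

For example, with the bedtime setting 23:00–06:00, a wake word spoken at 23:30 would fall inside the invalid period and the service would not start.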
  • in this embodiment, the setting of the invalid period by the user is received, but the setting of a valid period may be received instead.
  • for example, when voice recognition being in the invalid state is the normal state of the television device 10, voice recognition may be put into the valid state only during the set valid period.
  • in that case, the predetermined condition may be, for example, "the current time is within the valid period".
  • the selection unit 2015 may select the valid state of voice recognition when it is determined that the predetermined condition is satisfied, and select the invalid state of voice recognition when it is determined that the predetermined condition is not satisfied.
  • the invalid period is defined only by the start time and the end time, but it can also be defined in more detail by using calendar information such as days of the week or holidays.
  • the predetermined condition for selecting the invalid state of voice recognition is "the current moment is within the invalid period" as in the third embodiment.
  • the user sets the invalid period.
  • the television device 10 sets the invalid period based on the learning result.
  • the hardware configuration of the television device 10 of this embodiment is the same as that of the first embodiment.
  • FIG. 6 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment.
  • the television device 10 includes an acquisition unit 11, a wake word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, a device control unit 16, and a learning unit 18.
  • the learning unit 18 is also realized by the CPU 114 executing a program in the same manner as other functional units.
  • the acquisition unit 11, the wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as those of the first embodiment.
  • the selection unit 2015 has the same function as the third embodiment.
  • the learning unit 18 learns the pattern of the operation performed by the user, and generates a learning completion model.
  • the learning completion model in this embodiment is information that associates a time with whether a voice recognition service is required at that time.
  • the learning method used by the learning unit 18 can be, for example, a well-known unsupervised machine learning or deep learning technique.
  • the learned model is stored in the memory 116, etc., but the storage location is not limited to this.
  • the input data of the learning unit 18 is the content and time of the user's operations, for example the time when the user performed a cancel operation of the voice recognition service, or the time when the user used the voice recognition service. For example, when the user does not use the started voice recognition service but ends it with the remote control 119 or the like, the learning unit 18 learns that time together with the fact that the user performed the cancel operation of the voice recognition service.
  • the learning unit 18 outputs the time when the voice recognition service is unnecessary based on the learning result.
  • the learning unit 18 stores the output result as invalid period information indicating the start time and end time of the invalid period in the memory 116 or the like.
  • the learning unit 18 continues to learn the mode of operation performed by the user after generating the learning completion model once to improve the accuracy of the learning completion model.
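The patent leaves the concrete learning algorithm open (it only says known unsupervised machine or deep learning techniques may be used), so the following is purely a toy stand-in for learning unit 18: it marks any hour of day in which the user repeatedly cancelled the started voice recognition service as part of the invalid period. The function name and threshold are illustrative assumptions.

```python
from collections import Counter

def infer_invalid_hours(cancel_hours, min_count=3):
    """Toy stand-in for learning unit 18: hours of day in which the
    user cancelled the voice recognition service at least `min_count`
    times are treated as belonging to the invalid period."""
    counts = Counter(cancel_hours)
    return sorted(h for h, c in counts.items() if c >= min_count)
```

A real implementation would keep updating this model as new cancel operations are observed, matching the continued learning described above.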
  • the television device 10 of the present embodiment sets the invalid period of voice recognition based on the result of learning the user's operation patterns, selects the valid state when the current time is within the valid period, and selects the invalid state when the current time is within the invalid period. Therefore, according to the television device 10 of this embodiment, in addition to the effects of the first and third embodiments, the time and effort required for the user to set the invalid period can be reduced.
  • the input data input to the learning unit 18 and the output result output from the learning unit 18 exemplified in the present embodiment are merely examples, and are not limited to this.
  • the learning unit 18 may not only set different invalid periods according to time, but also different invalid periods according to calendar information such as days of the week or holidays.
  • in this embodiment, the television device 10 sets the invalid period of voice recognition based on the result of learning the user's operation patterns, but it may instead set the valid period of voice recognition based on the learning result.
  • the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the period from the start time to the end time of the specific program”.
  • the hardware configuration of the television device 10 of this embodiment is the same as that of the first embodiment.
  • FIG. 7 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment.
  • the television device 10 includes an acquisition unit 2011, a wake word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 3015, a device control unit 16, and a program table generation unit 19.
  • the wake word detection unit 12, the voice recognition unit 13, the display control unit 14, and the device control unit 16 have the same functions as the first embodiment.
  • the acquisition unit 2011 of this embodiment also acquires program-related information from service information SI (Service Information) included in the broadcast signal.
  • the acquisition unit 2011 transmits the acquired information related to the program to the program table generating unit 19.
  • the acquisition unit 2011 of this embodiment receives an operation performed by the user for specifying a specific program.
  • the acquisition unit 2011 receives an operation for specifying a specific program performed by the user based on infrared rays from the remote control 119 received by the light receiving unit 112 or an operation input to the operation unit 111.
  • the acquisition unit 2011 acquires the start time and end time of the specific program designated by the user from the program table stored in the memory 116.
  • the acquisition unit 2011 stores the received program time information indicating the start time and end time of the specific program in the memory 116 or the like. It should be noted that the storage location of the program time information is not limited to this.
  • the program table generating unit 19 generates a program table based on the information related to the program acquired by the acquiring unit 2011.
  • the program table generating unit 19 stores the generated program table in the memory 116, for example.
  • the selection unit 3015 of this embodiment selects any of the valid state and the invalid state of voice recognition based on predetermined conditions as in the first embodiment. However, in this embodiment, conditions different from those of the first embodiment are used for Select any one of the valid state and the invalid state.
  • the predetermined condition in the present embodiment is "the current time is within the period from the start time to the end time of the specific program".
  • the "period from the start time to the end time of the specific program” is an example of the invalid period in this embodiment.
  • the selection unit 3015 of the present embodiment selects either the valid state or the invalid state based on whether the current time is within the period from the start time to the end time of the specific program. For example, when the current time is within the period from the start time to the end time of the specific program, the selection unit 3015 determines that a predetermined condition is satisfied, and selects the invalid state of voice recognition. In addition, when the current time is not within the period from the start time to the end time of the specific program, the selection unit 3015 determines that the predetermined condition is not satisfied, and selects the effective state of voice recognition.
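The selection by program period described above can be sketched as follows; the program's start and end times would come from the program table generated by the program table generation unit 19, and the function name is an illustrative assumption.

```python
from datetime import datetime

def select_state_for_program(now: datetime, start: datetime,
                             end: datetime) -> str:
    """Sketch of selection unit 3015 (fifth embodiment): recognition
    is invalidated from the start to the end of the specific program
    designated by the user."""
    return "invalid" if start <= now < end else "valid"
```
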
  • the television device 10 of the present embodiment selects either the valid state or the invalid state based on whether the current time is within the period from the start time to the end time of the specific program. Therefore, according to the television device 10 of this embodiment, in addition to the effects of the first embodiment, it is possible to prevent a situation in which the voice recognition service starts while the user is watching a specific program. Therefore, according to the television device 10 of the present embodiment, it is possible to reduce the situation where the user is hindered by the start of an unnecessary voice recognition service while watching a favorite program.
  • according to the television device 10 of the present embodiment, it is possible to reduce erroneous operations caused by the voice recognition service while the user is watching a specific program, such as accidentally switching to another program or cutting off the power of the television device 10, and thus to reduce situations in which the user misses a program because of such erroneous operations.
  • the television device 10 may set the specific program based on the learning result obtained by learning the user's viewing history.
  • the television device 10 as an example of the receiving device obtains program-related information from a broadcast signal, but the receiving device may obtain program schedule data from the outside via the IP communication unit 113 and the network 300.
  • in the first to fifth embodiments, the microphone 117 is switched between the on state and the off state when the valid state and the invalid state of voice recognition are selected, but the valid state and the invalid state of the voice recognition function may instead be switched while the microphone 117 is kept on.
  • in that case, when the invalid state of voice recognition is selected, the wake word detection unit 12 and the voice recognition unit 13 do not perform wake word detection processing or voice recognition processing on the sound input to the microphone 117; therefore, even though the microphone 117 remains in a state where voice can be input, the voice recognition service is not started.
  • when the valid state of voice recognition is selected, the wake word detection unit 12 and the voice recognition unit 13 perform wake word detection processing and voice recognition processing on the voice input to the microphone 117, in the same manner as in the first to fifth embodiments.
  • the effective state and the ineffective state of voice recognition are selected based on mutually different predetermined conditions, but the predetermined conditions in different embodiments may be combined.
  • for example, the predetermined condition for selecting the invalid state of voice recognition may be the OR combination of the predetermined conditions of the first to fifth embodiments, i.e. "at least one of the operation screen and the notification screen is being displayed on the display panel 110, the predetermined application is being executed, the current time is within the invalid period, or the current time is within the period from the start time to the end time of the specific program", or a combination of only some of these conditions.
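The OR combination suggested above can be sketched as a single predicate over the four conditions; the parameter names are illustrative assumptions, and each flag would be computed as in the respective embodiment.

```python
def predetermined_condition(screen_shown: bool, app_running: bool,
                            in_invalid_period: bool,
                            in_program_period: bool) -> bool:
    """OR combination of the conditions of the first to fifth
    embodiments: any one being true selects the invalid state."""
    return (screen_shown or app_running
            or in_invalid_period or in_program_period)

def combined_select(**flags) -> str:
    return "invalid" if predetermined_condition(**flags) else "valid"
```

A partial combination (e.g. only screen display and application execution) would simply drop the unused flags from the predicate.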
  • the television device 10 is taken as an example of the receiving device, but the receiving device is not limited to this.
  • the receiving device may also be a set-top box or a PC (Personal Computer) with a TV function, etc., or a video playback device such as a BD (Blu-ray Disc) (registered trademark) video recorder or a DVD recorder.


Abstract

The present application relates to a receiving device. An object of the present application is to reduce situations in which a voice recognition service is started when the service is not needed. A receiving device according to an embodiment includes a voice input unit, a selection unit, and a voice recognition unit. The voice input unit receives the user's voice. The selection unit selects either the valid state or the invalid state of voice recognition based on a predetermined condition. When the valid state is selected, the voice recognition unit performs voice recognition processing on the voice input to the voice input unit; when the invalid state is selected, the voice recognition unit does not perform voice recognition processing.

Description

Receiving Device
This application claims priority to Japanese Patent Application No. 2019-148384, entitled "Receiving Device", filed with the Japan Patent Office on August 13, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to a receiving device.
Background
In recent years, demand for voice recognition services that allow users to operate devices by voice has been growing. For example, devices such as television devices equipped with a voice recognition function are known. In such television devices, for example, when a wake word uttered by the user is detected, the voice recognition service is started; for instance, some response is made, or the volume of the content being played is lowered so that the user's voice can be recognized more easily.
However, in such television devices, the voice recognition service may be started at a timing not intended by the user, for example because of erroneous detection of the wake word. In such a case, the user may be annoyed because viewing of the content is disturbed.
Prior Art Documents
Patent Documents
Patent Document 1: Japanese Laid-Open Patent Publication No. 2013-235032
Summary
Situations in which a voice recognition service is started when it is not needed should be reduced.
A receiving device according to an embodiment includes a voice input unit, a selection unit, and a voice recognition unit. The voice input unit receives the user's voice. The selection unit selects either the valid state or the invalid state of voice recognition based on a predetermined condition. When the valid state is selected, the voice recognition unit performs voice recognition processing on the voice input to the voice input unit; when the invalid state is selected, the voice recognition unit does not perform voice recognition processing.
Brief Description of the Drawings
Fig. 1 is a diagram showing an example of the hardware configuration of the television device of the first embodiment;
Fig. 2 is a diagram showing an example of the functional configuration of the television device of the first embodiment;
Fig. 3 is a flowchart showing an example of the flow of the selection processing between the valid state and the invalid state of voice recognition in the first embodiment;
Fig. 4 is a diagram showing an example of the functional configuration of the television device of the second embodiment;
Fig. 5 is a diagram showing an example of the functional configuration of the television device of the third embodiment;
Fig. 6 is a diagram showing an example of the functional configuration of the television device of the fourth embodiment;
Fig. 7 is a diagram showing an example of the functional configuration of the television device of the fifth embodiment.
Description of Reference Signs
10: television device; 11, 1011, 2011: acquisition unit; 12: wake word detection unit; 13: voice recognition unit; 14: display control unit; 15, 1015, 2015, 3015: selection unit; 16: device control unit; 17: application execution unit; 18: learning unit; 19: program table generation unit; 110: display panel; 111: operation unit; 112: light receiving unit; 115: memory; 116: storage; 117: microphone; 119: remote control; 300: network.
Detailed Description
(First Embodiment)
Fig. 1 is a diagram showing an example of the hardware configuration of the television device 10 of this embodiment. As shown in Fig. 1, the television device 10 includes an antenna 101, an input terminal 102a, a tuner 103, a demodulator 104, a demultiplexer 105, input terminals 102b and 102c, an A/D (analog/digital) converter 106, a selector 107, a signal processing unit 108, a speaker 109, a display panel 110, an operation unit 111, a light receiving unit 112, an IP communication unit 113, a CPU (Central Processing Unit) 114, a memory 115, a storage 116, a microphone 117, and an audio I/F (interface) 118. The television device 10 is an example of the receiving device in this embodiment.
The antenna 101 receives broadcast signals of digital broadcasting and supplies the received broadcast signals to the tuner 103 via the input terminal 102a. The tuner 103 selects the broadcast signal of a desired channel from the broadcast signals supplied from the antenna 101, and supplies the selected broadcast signal to the demodulator 104. A broadcast signal is also referred to as a broadcast wave.
The demodulator 104 demodulates the broadcast signal supplied from the tuner 103 and supplies the demodulated broadcast signal to the demultiplexer 105. The demultiplexer 105 separates the broadcast signal supplied from the demodulator 104 to generate an image signal and a sound signal, and supplies the generated image signal and sound signal to the selector 107.
The selector 107 is configured to select one of the plurality of signals supplied from the demultiplexer 105, the A/D converter 106, and the input terminal 102c, and to supply the selected signal to the signal processing unit 108.
The signal processing unit 108 is configured to apply predetermined signal processing to the image signal supplied from the selector 107 and to supply the processed image signal to the display panel 110. The signal processing unit 108 is also configured to apply predetermined signal processing to the sound signal supplied from the selector 107 and to supply the processed sound signal to the speaker 109.
The speaker 109 is configured to output speech or various sounds based on the sound signal supplied from the signal processing unit 108. The speaker 109 also changes the volume of the output speech or sounds under the control of the CPU 114.
The display panel 110 is configured to display images such as still images and moving images based on the image signal supplied from the signal processing unit 108 or under the control of the CPU 114. The display panel 110 is an example of the display unit.
The input terminal 102b receives analog signals (image signals and sound signals) input from the outside. The input terminal 102c is configured to receive digital signals (image signals and sound signals) input from the outside. For example, the input terminal 102c is designed so that digital signals can be input to it from a recorder (BD recorder) or the like equipped with a drive device that performs recording and playback by driving a recording/playback storage medium such as a BD (Blu-ray Disc) (registered trademark). The A/D converter 106 supplies the selector 107 with a digital signal generated by applying A/D conversion to the analog signal supplied from the input terminal 102b.
The operation unit 111 receives the user's operation input. The light receiving unit 112 receives infrared rays from the remote control 119. The IP communication unit 113 is a communication interface for performing IP (Internet Protocol) communication via the network 300.
The CPU 114 is a control unit that controls the television device 10 as a whole. The memory 115 is a ROM (Read Only Memory) storing various computer programs to be executed by the CPU 114, a RAM (Random Access Memory) providing a work area to the CPU 114, and the like. The storage 116 is an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like. The storage 116 stores, for example, the signal selected by the selector 107 as recorded video data.
The microphone 117 picks up the voice spoken by the user and sends it to the audio I/F 118. The microphone 117 is an example of the voice input unit. The microphone 117 can receive voice input when set to the "on state" and cannot receive voice input when set to the "off state". In this embodiment, the microphone 117 automatically enters the on state when the television device 10 is started. For example, the microphone 117 remains in the on state when, under the control of the CPU 114, the valid state of voice recognition is selected. Also, for example, the microphone 117 is switched to the off state when, under the control of the CPU 114, the invalid state of voice recognition is selected. Details of the selection between the valid state and the invalid state of voice recognition will be described later as processing of the selection unit 15.
The audio I/F 118 performs analog/digital conversion on the voice picked up by the microphone 117 and sends it to the CPU 114 as a sound signal.
Next, the functions of the television device 10 of this embodiment will be described.
Fig. 2 is a diagram showing an example of the functional configuration of the television device 10 of this embodiment. As shown in Fig. 2, the television device 10 includes an acquisition unit 11, a wake word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 15, and a device control unit 16.
The program executed by the television device 10 of this embodiment has a module configuration including the above-described units (the acquisition unit, wake word detection unit, voice recognition unit, display control unit, selection unit, and device control unit). As actual hardware, the CPU 114 reads the program from the ROM or the like and executes it, whereby the above units are loaded onto a main storage device such as the RAM, and the acquisition unit, wake word detection unit, voice recognition unit, display control unit, selection unit, and device control unit are generated on the main storage device.
The program executed by the television device 10 of this embodiment is provided, for example, by being incorporated in the ROM or the like in advance. The program executed by the television device 10 of this embodiment may also be provided by being stored, as a file in an installable or executable format, in a computer-readable storage medium such as a CD-ROM, a flash memory (FD), a CD-R, or a DVD (Digital Versatile Disk).
The program executed by the television device 10 of this embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may also be provided or distributed via a network such as the Internet. In this embodiment, each functional unit is described as being realized by one CPU, but each functional unit may also be realized by a plurality of CPUs or various circuits.
The acquisition unit 11 acquires the user's voice input to the microphone 117 via the audio I/F 118. The acquisition unit 11 sends the acquired voice to the wake word detection unit 12 and the voice recognition unit 13. It should be noted that the "voice" acquired by the acquisition unit 11 is a digital sound signal converted by the audio I/F 118, but it is simply referred to as "voice" below.
The acquisition unit 11 also acquires various signals from the operation unit 111, the light receiving unit 112, the IP communication unit 113, the selector 107, the signal processing unit 108, and the like connected to the CPU 114. For example, the acquisition unit 11 receives the user's operation based on infrared rays from the remote control 119 received by the light receiving unit 112 or an operation input to the operation unit 111. The acquisition unit 11 sends the content of the received user operation to the display control unit 14 and the device control unit 16.
The wake word detection unit 12 detects a wake word from the voice acquired by the acquisition unit 11. The wake word is a predetermined voice command that triggers the start of the voice recognition service. The wake word is a word set in advance. A known voice recognition technique can be used as the method for determining whether the sound signal contains the wake word.
In this embodiment, the setting of the wake word detection unit 12 itself does not change according to the selection of the valid state or the invalid state of voice recognition by the selection unit 15 described later. However, when the invalid state is selected, the microphone 117 is in the off state and voice cannot be input, so no voice can be acquired. Therefore, the wake word detection unit 12 does not perform wake word detection processing when the invalid state of voice recognition is selected. On the other hand, when the valid state of voice recognition is selected, the microphone 117 is in the on state and voice can be input. Therefore, when the valid state of voice recognition is selected, the wake word detection unit 12 performs wake word detection processing on the voice input to the microphone 117.
When the wake word detection unit 12 detects the wake word from the voice acquired by the acquisition unit 11, it notifies the display control unit 14 and the device control unit 16 that the wake word has been detected. In addition, when the user's voice is input following the wake word, the wake word detection unit 12 sends the voice following the wake word to the voice recognition unit 13.
The voice recognition unit 13 performs voice recognition processing on the voice input to the microphone 117. In this embodiment, the setting of the voice recognition unit 13 itself does not change according to the selection of the valid state or the invalid state of voice recognition by the selection unit 15 described later. However, when the invalid state is selected, the microphone 117 cannot receive voice input, so no voice can be acquired. Therefore, the voice recognition unit 13 does not perform voice recognition processing when the invalid state of voice recognition is selected. On the other hand, when the valid state of voice recognition is selected, the microphone 117 can receive voice input. Therefore, when the valid state of voice recognition is selected, the voice recognition unit 13 performs voice recognition processing on the voice input to the microphone 117.
More specifically, when the wake word detection unit 12 detects the wake word, the voice recognition unit 13 determines the content of the user's voice by performing voice recognition processing on the voice following the wake word. A known technique can be applied to the voice recognition processing. For example, the voice recognition unit 13 converts the content of the user's voice into text data using a known technique. The voice recognition unit 13 sends the voice recognition result to the display control unit 14 and the device control unit 16. In this embodiment, the voice recognition service is realized by the functional units, such as the display control unit 14 and the device control unit 16, executing processing based on the result of voice recognition of the user's voice by the voice recognition unit 13.
The display control unit 14 controls various displays on the display panel 110. For example, when the acquisition unit 11 acquires a user operation input to the remote control 119 or the like, the display control unit 14 displays an operation screen corresponding to the operation on the display panel 110. More specifically, when the user performs an operation such as pressing a button for starting the setting of a recording reservation, the display control unit 14 displays on the display panel 110 an operation screen capable of receiving the user's operation. The display form of the operation screen may be, for example, an OSD (On Screen Display) superimposed on the screen of the content being played, or a full-screen display covering the entire display panel 110. It should be noted that in this embodiment, "content" includes television programs, moving images recorded on a DVD or the like, and moving images played using an application.
The display control unit 14 also displays various notification screens on the display panel 110. For example, the display control unit 14 superimposes, as an OSD on the screen of the content being played, a notification screen containing a message that provides information to the user, warns the user, or calls the user's attention.
When the wake word detection unit 12 detects the wake word, the display control unit 14 displays on the display panel 110 a message, an icon, or the like responding to the voice. The message or icon responding to the voice may be, for example, content prompting the user to speak, or a form in which the recognition result of the user's voice is displayed as text data. Through the display of this message or icon, the user can easily recognize that the wake word has been recognized and that the spoken voice becomes an instruction to the television device 10.
Also, for example, when displaying the operation screen or the notification screen on the display panel 110, the display control unit 14 sets in the memory 115 an operation screen display flag indicating that the operation screen is being displayed or a notification screen display flag indicating that the notification screen is being displayed. When the display of the operation screen or the notification screen ends, the display control unit 14 deletes the operation screen display flag or the notification screen display flag from the memory 115. It should be noted that the method of indicating that the operation screen or the notification screen is displayed on the display panel 110 is not limited to this. For example, the display control unit 14 may notify the selection unit 15 of a message indicating that the operation screen or the notification screen is displayed on the display panel 110, or a message indicating that the display of the operation screen or the notification screen has ended.
The display control unit 14 also controls the display of the display panel 110 based on a command contained in the user's voice recognized by the voice recognition unit 13. For example, based on a command contained in the user's voice, the display control unit 14 controls the tuner 103 to select the channel broadcasting the program specified by the user's voice and displays the program on the display panel 110. The display control unit 14 may also, based on a command contained in the user's voice, play recorded video data of a program stored in the storage 116 or an external storage device and display it on the display panel 110.
The selection unit 15 selects either the valid state or the invalid state of voice recognition based on a predetermined condition.
The predetermined condition in this embodiment is that "at least one of the operation screen and the notification screen is displayed on the display panel 110". The selection unit 15 of this embodiment selects the invalid state when the state of the display panel 110 of the television device 10 satisfies the predetermined condition. The selection unit 15 selects the valid state when the state of the display panel 110 of the television device 10 does not satisfy the predetermined condition.
For example, the selection unit 15 determines that the operation screen is being displayed when the operation screen display flag is set in the memory 115, and determines that the notification screen is being displayed when the notification screen display flag is set in the memory 115. When the selection unit 15 determines that at least one of the operation screen and the notification screen is displayed on the display panel 110, it determines that the television device 10 satisfies the predetermined condition. In this case, the selection unit 15 selects the invalid state.
It should be noted that the method of determining whether the operation screen or the notification screen is displayed is not limited to this; for example, the selection unit 15 may determine whether at least one of the operation screen and the notification screen is displayed on the display panel 110 based on the presence or absence of display of the operation screen or the notification screen acquired from the display control unit 14.
When the selection unit 15 determines that neither the operation screen nor the notification screen is displayed on the display panel 110, it determines that the television device 10 does not satisfy the predetermined condition. In this case, the selection unit 15 selects the valid state.
The selection unit 15 sends the result of selecting the valid state or the invalid state of voice recognition to the device control unit 16.
The device control unit 16 controls the various devices included in the television device 10. For example, when the invalid state of voice recognition is selected by the selection unit 15, the device control unit 16 sets the microphone 117 to the off state. Also, for example, when the valid state of voice recognition is selected by the selection unit 15, the device control unit 16 sets the microphone 117 to the on state.
When the wake word detection unit 12 detects the wake word, the device control unit 16 controls the speaker 109 to lower the volume. This is to reduce interference from the sound of the content with the input of the voice spoken by the user after the wake word.
The device control unit 16 also controls the various devices included in the television device 10 based on a command contained in the user's voice recognized by the voice recognition unit 13. For example, when the user's voice contains a command such as "raise the volume", the device control unit 16 controls the speaker 109 to raise the volume. It should be noted that the device control unit 16 may also retrieve information from the Internet based on a command contained in the user's voice recognized by the voice recognition unit 13.
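The selection unit 15's flag-based check can be sketched as follows. The dictionary standing in for the memory 115 and the flag names are illustrative assumptions for this sketch, not the actual implementation.

```python
def select_state_from_flags(memory: dict) -> str:
    """Sketch of selection unit 15 (first embodiment): the invalid
    state is chosen while an operation screen display flag or a
    notification screen display flag is set in memory."""
    if (memory.get("operation_screen_flag")
            or memory.get("notification_screen_flag")):
        return "invalid"   # at least one screen is being displayed
    return "valid"         # neither screen is being displayed
```
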
Next, the flow of the selection processing between the valid state and the invalid state of voice recognition executed by the television device 10 configured as above will be described.
Fig. 3 is a flowchart showing an example of the flow of the selection processing between the valid state and the invalid state of voice recognition in this embodiment. The processing of this flowchart is assumed to be executed continuously while the television device 10 is running. It is also assumed that at the start of this flowchart, voice recognition is in the valid state and the microphone 117 is in the on state.
First, the selection unit 15 determines whether the television device 10 satisfies the predetermined condition, for example based on whether the operation screen display flag or the notification screen display flag is set in the memory 115 (S1).
When the operation screen display flag or the notification screen display flag is set in the memory 115, the selection unit 15 determines that the television device 10 satisfies the predetermined condition ("Yes" in S1). In this case, the selection unit 15 selects the invalid state of voice recognition (S2). The selection unit 15 notifies the device control unit 16 that the invalid state of voice recognition has been selected.
Next, the device control unit 16 sets the microphone 117 to the "off state" (S3). As a result, the microphone 117 enters a state in which it does not receive voice input. After the device control unit 16 sets the microphone 117 to the "off state", the processing returns to S1 and is repeated.
When neither the operation screen display flag nor the notification screen display flag is set in the memory 115, the selection unit 15 determines that the television device 10 does not satisfy the predetermined condition ("No" in S1). In this case, the selection unit 15 selects the valid state of voice recognition (S4). For example, when the display of the operation screen or the notification screen ends and the flag is deleted after voice recognition has entered the invalid state, the selection unit 15 selects the valid state, whereby voice recognition switches from the invalid state to the valid state. The selection unit 15 notifies the device control unit 16 that the valid state of voice recognition has been selected.
Next, the device control unit 16 sets the microphone 117 to the on state (S5). As a result, the microphone 117 enters a state capable of receiving voice input. It should be noted that when the microphone 117 is already in the on state, the device control unit 16 does not perform any processing.
Next, the acquisition unit 11 acquires the user's voice input to the microphone 117 via the audio I/F 118 (S6). The acquisition unit 11 sends the acquired voice to the wake word detection unit 12 and the voice recognition unit 13.
Then, the wake word detection unit 12 determines whether the voice acquired by the acquisition unit 11 contains the wake word (S7). When the wake word detection unit 12 detects the wake word from the acquired voice ("Yes" in S7), it notifies the display control unit 14 and the device control unit 16 that the wake word has been detected. In addition, when the user's voice is input following the wake word, the wake word detection unit 12 sends the voice following the wake word to the voice recognition unit 13.
Next, the device control unit 16 controls the speaker 109 to lower the volume of the content being played (S8). The display control unit 14 displays a response message or icon for the user on the display panel 110 (S9). Such processing by the device control unit 16 and the display control unit 14 is an example of the processing at the start of the voice recognition service.
Then, the voice recognition unit 13 performs voice recognition processing on the voice input to the microphone 117 after the wake word (S10). The voice recognition unit 13 sends the voice recognition result of the voice recognition processing to the display control unit 14 and the device control unit 16. Then, the display control unit 14 or the device control unit 16 realizes the voice recognition service by executing processing based on the voice recognition result (S11). After that, the processing returns to S1, and the processing of this flowchart is repeated until the power of the television device 10 is turned off.
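One pass of the S1–S11 flow of Fig. 3 can be sketched as follows. The `memory` and `mic` dictionaries and the callables stand in for the memory 115, the microphone 117, and the functional units; all names are illustrative assumptions.

```python
def selection_cycle(memory, mic, read_sound, has_wake_word, run_service):
    """One iteration of the Fig. 3 flowchart (S1-S11)."""
    if (memory.get("operation_screen_flag")
            or memory.get("notification_screen_flag")):
        mic["on"] = False          # S2-S3: invalid state selected, mic off
        return "invalid"
    mic["on"] = True               # S4-S5: valid state selected, mic on
    sound = read_sound()           # S6: acquire the user's voice
    if has_wake_word(sound):       # S7: wake word detected?
        run_service(sound)         # S8-S11: lower volume, show response,
                                   # recognize the voice, run the service
    return "valid"
```

In the real device this cycle repeats until the power is turned off; here a single call models one pass.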
In this way, the television device 10 of this embodiment selects either the valid state or the invalid state of voice recognition based on the predetermined condition, performs voice recognition processing on the voice input to the microphone 117 when the valid state is selected, and does not perform voice recognition processing when the invalid state is selected. Therefore, the television device 10 of this embodiment can reduce situations in which the voice recognition service is started when it is not needed.
For example, there are cases where a voice spoken by the user is erroneously recognized as the wake word even though it is not. Usually, when the user is operating the remote control or the like, operations via the voice recognition service are mostly unnecessary. However, in the prior art, when the user operates the remote control or the like while looking at the operation screen on the display panel, if the user's spoken voice is erroneously recognized as the wake word, the voice recognition service starts, and a response message or icon for the user is displayed on the display panel, causing the operation screen to disappear or become hard to see.
Also, when a notification screen is displayed on the display panel, the user reads the message displayed on it, so it is undesirable for the notification screen to be blocked by another screen until its display ends. However, in the prior art, even while the user is viewing the notification screen on the display panel, if the user's spoken voice is erroneously recognized as the wake word, the voice recognition service starts, and a response message or icon for the user is displayed on the display panel, causing the notification screen to disappear or become hard to see. In such cases, the user may be annoyed, and the provision of information to the user may be hindered.
In contrast, the television device 10 of this embodiment determines that the television device 10 satisfies the predetermined condition and selects the invalid state when at least one of the operation screen and the notification screen is displayed on the display panel 110. Therefore, the television device 10 of this embodiment can reduce situations in which the voice recognition service is started while the operation screen or the notification screen is displayed on the display panel 110, and thus reduce situations in which a response message or icon for the user is displayed on the display panel 110 while the user is using the operation screen or the notification screen, making the screen difficult for the user to see.
The television device 10 of this embodiment also sets the microphone 117 to the on state when the valid state is selected, and to the off state when the invalid state is selected. Therefore, according to the television device 10 of this embodiment, input of the user's voice is physically impossible in the invalid state, and situations in which the voice recognition service starts can be reduced.
Note that in this embodiment the microphone 117, a hardware component, is taken as an example of the voice input unit, but the acquisition unit 11, implemented by a program, may also serve as the voice input unit. The microphone 117 may be provided on the remote controller 119 instead of on the main body of the television apparatus 10, and the voice input unit may also be implemented by a voice recognition device external to the television apparatus 10.
In this embodiment the predetermined condition is "at least one of the operation screen and the notification screen is displayed on the display panel 110", but it may instead be "the operation screen is displayed on the display panel 110" or "the notification screen is displayed on the display panel 110". For example, if the predetermined condition is "the operation screen is displayed on the display panel 110", the selection unit 15 determines that the condition is satisfied whenever the operation screen is displayed, regardless of whether the notification screen is displayed, and determines that it is not satisfied whenever the operation screen is not displayed, again regardless of the notification screen.
In this embodiment the wake-word detection unit 12 and the voice recognition unit 13 are separate functional units, but the voice recognition unit 13 may instead incorporate the function of the wake-word detection unit 12, and the two may be referred to collectively as a voice recognition unit. Note that the content of the voice recognition service illustrated in this embodiment is merely an example, and the service is not limited to it.
The lowering of the volume and the display of the response message and the like on the display panel 110 in this embodiment are examples of the processing performed when the voice recognition service starts; that processing is not limited to these. For example, the television apparatus 10 may output the response message as voice when the service starts.
In this embodiment the selection unit 15 selects the invalid state of voice recognition when it determines that the predetermined condition is satisfied and the valid state when it determines that the condition is not satisfied, but the selection criterion is not limited to this.
For example, when the invalid state of voice recognition is the normal state, the selection unit 15 may instead select the valid state when the predetermined condition is satisfied and the invalid state when it is not. As a concrete example, if the predetermined condition is "neither the operation screen nor the notification screen is displayed on the display panel 110", the selection unit 15 may determine that the condition is satisfied and select the valid state when neither screen is displayed, and determine that the condition is not satisfied and select the invalid state when either screen is displayed.
(Second Embodiment)
In the first embodiment above, the predetermined condition for selecting the invalid state of voice recognition is "at least one of the operation screen and the notification screen is displayed on the display panel 110". In this second embodiment, by contrast, the predetermined condition is "a predetermined application is being executed".
The hardware configuration of the television apparatus 10 of this embodiment is the same as in the first embodiment.
Next, the functions of the television apparatus 10 of this embodiment are described.
Fig. 4 shows an example of the functional configuration of the television apparatus 10 of this embodiment. As shown in Fig. 4, the television apparatus 10 includes an acquisition unit 11, a wake-word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 1015, a device control unit 16, and an application execution unit 17. Like the other functional units, the application execution unit 17 is implemented by the CPU 114 executing a program. The acquisition unit 11, wake-word detection unit 12, voice recognition unit 13, display control unit 14, and device control unit 16 have the same functions as in the first embodiment.
The application execution unit 17 executes a content-distribution application and displays moving images of the content distributed through that application on the display panel 110.
The content-distribution application executed by the application execution unit 17 is an example of the predetermined application in this embodiment. It is assumed to be, for example, an application that receives distribution of content moving images such as dramas and movies from an external server via the network 300, but it may also be an application that includes other functions.
While the content-distribution application is executing, the application execution unit 17 sets, for example, an application execution flag in the memory 115 indicating that the content-distribution application is executing.
As in the first embodiment, the selection unit 1015 of this embodiment selects either the valid or invalid state of voice recognition based on a predetermined condition, but in this embodiment a condition different from that of the first embodiment is used.
More specifically, the predetermined condition in this embodiment is "the predetermined application (the content-distribution application) is being executed". The selection unit 1015 of this embodiment obtains the execution status of the content-distribution application; when the application is executing, it determines that the predetermined condition is satisfied and selects the invalid state of voice recognition, and when the application is not executing, it determines that the condition is not satisfied and selects the valid state.
The selection unit 1015 determines whether the predetermined application is executing based, for example, on whether the application execution flag is present in the memory 115, but the execution status of the predetermined application may also be obtained by other methods.
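The app-execution-flag mechanism described above can be sketched as follows. `running_app`, the flag naming scheme, and `recognition_enabled` are hypothetical names used only for illustration:

```python
from contextlib import contextmanager

# Stand-in for the flags the text places in the memory 115.
memory_flags: set[str] = set()

@contextmanager
def running_app(name: str):
    """Sets an execution flag while the app runs, clearing it on exit,
    mirroring the application execution flag of the second embodiment."""
    memory_flags.add(f"app_running:{name}")
    try:
        yield
    finally:
        memory_flags.discard(f"app_running:{name}")

def recognition_enabled() -> bool:
    # Invalid state while any designated app is running; valid otherwise.
    return not any(f.startswith("app_running:") for f in memory_flags)

assert recognition_enabled()
with running_app("content_distribution"):
    assert not recognition_enabled()      # service suppressed during playback
assert recognition_enabled()              # flag cleared when the app exits
```

Using a context manager guarantees the flag is cleared even if the application exits abnormally, which matches the requirement that voice recognition return to the valid state once the app stops.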
The flow of the process for selecting the valid or invalid state of voice recognition in this embodiment is the same as in the first embodiment shown in Fig. 3.
In this way, the television apparatus 10 of this embodiment selects the valid state when the content-distribution application is not executing and the invalid state when it is executing. In addition to the effects of the first embodiment, the television apparatus 10 of this embodiment can therefore reduce situations in which the voice recognition service starts while moving-image content or the like is being displayed on the display panel 110 by the content-distribution application.
That is, the television apparatus 10 of this embodiment can reduce occurrences in which the start of the voice recognition service causes the content's moving image on the display panel 110 to disappear, or causes a response message or the like displayed over it to obscure the moving image. Moreover, because the volume of the speaker 109 is lowered when the voice recognition service starts, viewing of the content being played may be disturbed. The television apparatus 10 of this embodiment reduces situations in which the service starts while video content or the like is being displayed on the display panel 110 by the content-distribution application, and thus reduces situations in which the user's viewing of the playing content is disturbed.
In addition, even if the voice recognition service does not actually start, a user may remain on guard against it starting and be unable to concentrate on viewing the moving-image content; the television apparatus 10 of this embodiment can also reduce such cases.
Note that in this embodiment the predetermined application is the content-distribution application, but which of the applications executable by the television apparatus 10 becomes the "predetermined application" may be preset in the television apparatus 10 or made configurable by the user.
(Third Embodiment)
In this third embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within an invalid period".
The hardware configuration of the television apparatus 10 of this embodiment is the same as in the first embodiment.
Next, the functions of the television apparatus 10 of this embodiment are described.
Fig. 5 shows an example of the functional configuration of the television apparatus 10 of this embodiment. As shown in Fig. 5, the television apparatus 10 includes an acquisition unit 1011, a wake-word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, and a device control unit 16. The wake-word detection unit 12, voice recognition unit 13, display control unit 14, and device control unit 16 have the same functions as in the first embodiment.
The television apparatus 10 of this embodiment holds a setting for an invalid period, that is, a period during which voice recognition is in the invalid state. The invalid period setting is stored, for example, in the storage 116. In this embodiment, the setting is registered or changed by user operation and consists, for example, of the start time and end time of the invalid period.
More specifically, in addition to the functions of the first embodiment, the acquisition unit 1011 of this embodiment accepts the user's input of the start and end times of the invalid period. For example, based on infrared light from the remote controller 119 received by the light receiving unit 112 or an operation input to the operation unit 111, the acquisition unit 1011 accepts the user's input of the start and end times of the invalid period and stores the received invalid period information, indicating those start and end times, in the storage 116 or the like. Note that the storage location of the invalid period information is not limited to this.
For example, the user may set "23:00 to 06:00" as the invalid period to prevent the voice recognition service from starting while asleep, or set "09:00 to 17:00" as the invalid period to prevent the service from starting while out of the house.
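A time-window check like the one the invalid period requires can be sketched as below. Note that the bedtime example, 23:00 to 06:00, wraps past midnight, which a naive `start <= now < end` test would miss; the function name and half-open semantics are illustrative assumptions:

```python
from datetime import time

def in_invalid_period(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles periods that wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Wrapping period, e.g. 23:00-06:00: inside if after start OR before end.
    return now >= start or now < end

# The 23:00-06:00 bedtime example from the text:
assert in_invalid_period(time(23, 30), time(23, 0), time(6, 0))
assert in_invalid_period(time(2, 0), time(23, 0), time(6, 0))
assert not in_invalid_period(time(12, 0), time(23, 0), time(6, 0))
# The 09:00-17:00 daytime example:
assert in_invalid_period(time(12, 0), time(9, 0), time(17, 0))
```

The selection unit would evaluate such a predicate against the clock on each pass of the Fig. 3 loop, selecting the invalid state whenever it returns true.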
In this embodiment, all periods not set as the invalid period are valid periods. Note that, as in the first embodiment, the normal state is that voice recognition is valid and the microphone 117 is open.
As in the first embodiment, the selection unit 2015 of this embodiment selects either the valid or invalid state of voice recognition based on a predetermined condition, but in this embodiment a condition different from that of the first embodiment is used.
More specifically, the predetermined condition in this embodiment is "the current time is within the invalid period". The selection unit 2015 of this embodiment determines that the condition is satisfied and selects the invalid state of voice recognition when the current time is within the invalid period, and determines that the condition is not satisfied and selects the valid state when the current time is within a valid period.
The flow of the state selection process for voice recognition in this embodiment is the same as in the first embodiment shown in Fig. 3.
In this way, the television apparatus 10 of this embodiment selects the valid state when the current time is within a valid period and the invalid state when the current time is within the invalid period. In addition to the effects of the first embodiment, it can therefore reduce situations in which the voice recognition service starts during times of day when the user does not want it to start.
Note that in this embodiment the user's setting of an invalid period is accepted, but a setting of a valid period may be accepted instead. For example, if the invalid state of voice recognition is the normal state of the television apparatus 10, voice recognition is made valid only during the set valid period. In that case, the predetermined condition may be, for example, "the current time is within the valid period", and with this configuration the selection unit 2015 may select the valid state when it determines that the condition is satisfied and the invalid state when it determines that the condition is not satisfied.
Note also that in this embodiment the invalid period is defined only by a start time and an end time, but it may be defined in more detail using calendar information such as days of the week or holidays.
(Fourth Embodiment)
In this fourth embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the invalid period", as in the third embodiment. In the third embodiment, however, the user sets the invalid period, whereas in this fourth embodiment the television apparatus 10 sets the invalid period based on learning results.
The hardware configuration of the television apparatus 10 of this embodiment is the same as in the first embodiment.
Next, the functions of the television apparatus 10 of this embodiment are described.
Fig. 6 shows an example of the functional configuration of the television apparatus 10 of this embodiment. As shown in Fig. 6, the television apparatus 10 includes an acquisition unit 11, a wake-word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 2015, a device control unit 16, and a learning unit 18. Like the other functional units, the learning unit 18 is implemented by the CPU 114 executing a program. The acquisition unit 11, wake-word detection unit 12, voice recognition unit 13, display control unit 14, and device control unit 16 have the same functions as in the first embodiment, and the selection unit 2015 has the same function as in the third embodiment.
The learning unit 18 learns the patterns of the user's operations and generates a trained model. As an example, the trained model in this embodiment is information that associates times with whether the voice recognition service is needed at those times. The learning method of the learning unit 18 may apply, for example, known unsupervised learning techniques from machine learning or deep learning. The trained model is stored, for example, in the storage 116 or the like, but the storage location is not limited to this.
The input data of the learning unit 18 are the contents of the user's operations and their times, for example the times at which the user cancelled the voice recognition service and the times at which the user used it. For example, when the user ends a started voice recognition service with the remote controller 119 or the like without using it, the learning unit 18 learns that time together with the fact that the user performed a cancellation of the voice recognition service.
Based on the learning results, the learning unit 18 outputs the times at which the voice recognition service is not needed, and stores this output as invalid period information, indicating the start and end times of the invalid period, in the storage 116 or the like.
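One naive way to realize the mapping from cancellation times to an invalid period is a per-hour frequency count. The sketch below (function name, threshold, and hour granularity are all assumptions) is deliberately far simpler than the unsupervised machine learning the text mentions, but illustrates the same input/output relationship:

```python
from collections import Counter

def learn_invalid_hours(cancel_hours: list[int], threshold: int = 3) -> set[int]:
    """Naive stand-in for the learning unit 18: hours of day in which the user
    cancelled the voice service at least `threshold` times are marked invalid."""
    counts = Counter(cancel_hours)
    return {hour for hour, n in counts.items() if n >= threshold}

# The user repeatedly cancelled the service around 23:00 and 00:00:
log = [23, 23, 23, 0, 0, 0, 0, 14]
invalid = learn_invalid_hours(log)
assert invalid == {23, 0}   # a single cancellation at 14:00 is not enough
```

A real implementation would additionally convert the selected hours into contiguous start/end times for storage as invalid period information, and would keep updating the counts as new operations are observed.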
After generating the trained model once, the learning unit 18 continues to learn the patterns of the user's operations in order to improve the accuracy of the trained model.
The flow of the process for selecting the valid or invalid state of voice recognition in this embodiment is the same as in the first embodiment shown in Fig. 3.
In this way, the television apparatus 10 of this embodiment sets the invalid period of voice recognition based on the results of learning the patterns of the user's operations, selects the valid state when the current time is within a valid period, and selects the invalid state when the current time is within the invalid period. In addition to the effects of the first and third embodiments, the television apparatus 10 of this embodiment can therefore save the user the time and effort of setting the invalid period.
Note that the input data to the learning unit 18 and the output results from it illustrated in this embodiment are merely examples and are not limited to these. The learning unit 18 may also set invalid periods that differ not only by time of day but also by calendar information such as days of the week or holidays.
Note also that in this embodiment the television apparatus 10 sets the invalid period of voice recognition based on the results of learning the patterns of the user's operations, but it may instead set the valid period of voice recognition based on the learning results.
(Fifth Embodiment)
In this fifth embodiment, the predetermined condition for selecting the invalid state of voice recognition is "the current time is within the period from the start time to the end time of a specific program".
The hardware configuration of the television apparatus 10 of this embodiment is the same as in the first embodiment.
Next, the functions of the television apparatus 10 of this embodiment are described.
Fig. 7 shows an example of the functional configuration of the television apparatus 10 of this embodiment. As shown in Fig. 7, the television apparatus 10 includes an acquisition unit 2011, a wake-word detection unit 12, a voice recognition unit 13, a display control unit 14, a selection unit 3015, a device control unit 16, and a program guide generation unit 19. The wake-word detection unit 12, voice recognition unit 13, display control unit 14, and device control unit 16 have the same functions as in the first embodiment.
In addition to the functions of the first embodiment, the acquisition unit 2011 of this embodiment obtains program-related information from the service information (SI) contained in the broadcast signal and sends the obtained program-related information to the program guide generation unit 19.
The acquisition unit 2011 of this embodiment also accepts the user's operation designating a specific program, for example based on infrared light from the remote controller 119 received by the light receiving unit 112 or an operation input to the operation unit 111. The acquisition unit 2011 then obtains the start and end times of the specific program designated by the user from the program guide stored in the storage 116, and stores the received program time information, indicating the start and end times of the specific program, in the storage 116 or the like. Note that the storage location of the program time information is not limited to this.
The program guide generation unit 19 generates a program guide based on the program-related information obtained by the acquisition unit 2011 and stores the generated program guide, for example, in the storage 116.
Alternatively, the apparatus may be designed so that the user inputs the start and end times of the specific program directly.
As in the first embodiment, the selection unit 3015 of this embodiment selects either the valid or invalid state of voice recognition based on a predetermined condition, but in this embodiment a condition different from that of the first embodiment is used.
More specifically, the predetermined condition in this embodiment is "the current time is within the period from the start time to the end time of the specific program". The "period from the start time to the end time of the specific program" is an example of the invalid period in this embodiment.
The selection unit 3015 of this embodiment selects the valid or invalid state based on whether the current time is within the period from the start time to the end time of the specific program. For example, when the current time is within that period, the selection unit 3015 determines that the predetermined condition is satisfied and selects the invalid state of voice recognition; when the current time is not within that period, it determines that the condition is not satisfied and selects the valid state.
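The condition of this embodiment reduces to checking the current time against the designated program's start and end times taken from the program guide. The `programme` dictionary below is a hypothetical EPG entry, not a structure from the patent:

```python
from datetime import datetime

# Hypothetical program guide entry for a user-designated program.
programme = {
    "title": "Favourite Drama",
    "start": datetime(2020, 8, 13, 21, 0),
    "end":   datetime(2020, 8, 13, 22, 0),
}

def in_programme_window(now: datetime, prog: dict) -> bool:
    """Fifth-embodiment condition: `now` lies between the program's start and end."""
    return prog["start"] <= now < prog["end"]

assert in_programme_window(datetime(2020, 8, 13, 21, 30), programme)      # invalid state
assert not in_programme_window(datetime(2020, 8, 13, 22, 30), programme)  # valid state
```

In the device, the start and end times would come from the stored program time information rather than literals, and the check would run on each pass of the Fig. 3 loop.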
The flow of the process for selecting the valid or invalid state of voice recognition in this embodiment is the same as in the first embodiment shown in Fig. 3.
In this way, the television apparatus 10 of this embodiment selects either the valid or invalid state based on whether the current time is within the period from the start time to the end time of the specific program. In addition to the effects of the first embodiment, the television apparatus 10 of this embodiment can therefore prevent the voice recognition service from starting while the user is watching the specific program, and thus reduce situations in which the user is disturbed by an unnecessary start of the service while watching a favorite program. It can also reduce erroneous operations in which, while the user is watching the specific program, the voice recognition service unexpectedly switches to another program or powers off the television apparatus 10, and reduce situations in which such erroneous operations cause the user to miss the program.
Note that in this embodiment the user sets the specific program, but the television apparatus 10 may instead set the specific program based on results obtained by learning the user's viewing history.
Also, in this embodiment the television apparatus 10, as an example of the receiving device, obtains program-related information from the broadcast signal, but the receiving device may instead obtain program guide data from outside via the IP communication unit 113 and the network 300.
(Modification 1)
In the first to fifth embodiments above, the open and closed states of the microphone 117 are switched according to whether voice recognition is in the valid or invalid state, but the valid and invalid states of the voice recognition function may instead be switched while the microphone 117 remains open.
For example, when the invalid state of voice recognition is selected, the wake-word detection unit 12 and the voice recognition unit 13 do not perform wake-word detection processing or voice recognition processing on voice input to the microphone 117. Therefore, when the invalid state is selected, the voice recognition service does not start even though the microphone 117 can still accept voice input.
When the valid state of voice recognition is selected, the wake-word detection unit 12 and the voice recognition unit 13 perform wake-word detection processing and voice recognition processing on voice input to the microphone 117, as in the first to fifth embodiments.
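Modification 1 can be sketched as a software gate: audio still arrives from the open microphone, but downstream processing is skipped in the invalid state. `handle_audio` and `detect_wake_word` are placeholder names, and the wake-word detector here is a trivial stub:

```python
def handle_audio(samples: bytes, recognition_enabled: bool):
    """Modification 1 sketch: the microphone keeps delivering audio, but
    wake-word detection and recognition are skipped in the invalid state."""
    if not recognition_enabled:
        return None                       # audio arrives but is ignored
    return detect_wake_word(samples)      # hypothetical downstream step

def detect_wake_word(samples: bytes):
    # Placeholder: pretend any non-empty buffer contains the wake word.
    return "wake" if samples else None

# Invalid state: same audio, no processing.
assert handle_audio(b"\x01\x02", recognition_enabled=False) is None
# Valid state: the stub "detects" the wake word.
assert handle_audio(b"\x01\x02", recognition_enabled=True) == "wake"
```

Compared with closing the microphone, this variant keeps the capture path alive and gates only the recognition stage, which is what allows the state to be switched purely in software.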
(Modification 2)
In the first to fifth embodiments above, the valid and invalid states of voice recognition are selected based on mutually different predetermined conditions, but the predetermined conditions of different embodiments may be combined. For example, the predetermined condition for selecting the invalid state may be the OR combination of the conditions of the first to fifth embodiments, namely "at least one of the operation screen and the notification screen is displayed on the display panel 110, the predetermined application is being executed, the current time is within the invalid period, or the current time is within the period from the start time to the end time of the specific program", or it may be a condition combining only some of these predetermined conditions.
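The OR combination described in Modification 2 can be expressed as a single predicate over the per-embodiment conditions; the parameter names are illustrative:

```python
def invalid_state(screen_shown: bool, app_running: bool,
                  in_invalid_period: bool, in_programme: bool) -> bool:
    """Modification 2: the per-embodiment conditions combined as an OR condition.
    Any one condition being true is enough to select the invalid state."""
    return any([screen_shown, app_running, in_invalid_period, in_programme])

assert not invalid_state(False, False, False, False)  # no condition met: valid state
assert invalid_state(False, True, False, False)       # app running alone suffices
```

A partial combination, as the text allows, would simply pass fewer flags into `any()`.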
(Modification 3)
In the first to fifth embodiments above, the television apparatus 10 is taken as an example of the receiving device, but the receiving device is not limited to this. For example, the receiving device may be a set-top box or a PC (Personal Computer) with a television function, or a recording and playback device such as a BD (Blu-ray Disc) (registered trademark) recorder or a DVD recorder.
As described above, according to the first to fifth embodiments, situations in which the voice recognition service starts when it is not needed can be reduced.
Several embodiments of the present application have been described, but these embodiments are presented only as examples and are not intended to limit the scope of the application. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the application. These embodiments and their modifications are included in the scope and gist of the application, and in the inventions described in the claims and their equivalents.

Claims (5)

  1. A receiving device, comprising:
    a voice input unit to which a user's voice is input;
    a selection unit that selects either a valid state or an invalid state of voice recognition based on a predetermined condition; and
    a voice recognition unit that performs voice recognition processing on the voice input to the voice input unit when the valid state is selected, and does not perform the voice recognition processing when the invalid state is selected.
  2. The receiving device according to claim 1, wherein
    the predetermined condition is that at least one of a notification screen and an operation screen capable of accepting an operation by the user is displayed on a display unit, and
    the selection unit selects the invalid state when at least one of the operation screen and the notification screen is displayed on the display unit, and selects the valid state when neither the operation screen nor the notification screen is displayed on the display unit.
  3. The receiving device according to claim 1, wherein
    the predetermined condition is that a predetermined application is being executed, and
    the selection unit obtains an execution status of the predetermined application, selects the valid state when the predetermined application is not being executed, and selects the invalid state when the predetermined application is being executed.
  4. The receiving device according to claim 1, wherein
    the predetermined condition is that a current time is within an invalid period or a valid period, and
    the selection unit selects the valid state when the current time is within the valid period, and selects the invalid state when the current time is within the invalid period.
  5. The receiving device according to any one of claims 1 to 4, wherein
    the voice input unit is a microphone, and
    the receiving device further comprises a device control unit that sets the microphone to an open state when the selection unit has selected the valid state, and sets the microphone to a closed state when the selection unit has selected the invalid state.
PCT/CN2020/108978 2019-08-13 2020-08-13 Receiving device WO2021027892A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080004651.1A CN112930686B (zh) 2019-08-13 2020-08-13 Receiving device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019148384A JP7206167B2 (ja) 2019-08-13 Receiving device
JP2019-148384 2019-08-13

Publications (1)

Publication Number Publication Date
WO2021027892A1 true WO2021027892A1 (zh) 2021-02-18

Family

ID=74570548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108978 WO2021027892A1 (zh) 2019-08-13 2020-08-13 接收装置

Country Status (3)

Country Link
JP (1) JP7206167B2 (zh)
CN (1) CN112930686B (zh)
WO (1) WO2021027892A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100333163A1 (en) * 2009-06-25 2010-12-30 Echostar Technologies L.L.C. Voice enabled media presentation systems and methods
CN105979324A (zh) * 2016-05-31 2016-09-28 青岛海信电器股份有限公司 Method and apparatus for smart television to control remote controller microphone
CN108600796A (zh) * 2018-03-09 2018-09-28 百度在线网络技术(北京)有限公司 Control mode switching method for smart television, device, and computer-readable medium
CN108986809A (zh) * 2018-08-30 2018-12-11 广东小天才科技有限公司 Portable device and wake-up method and apparatus therefor
CN109346071A (zh) * 2018-09-26 2019-02-15 出门问问信息科技有限公司 Wake-up processing method, apparatus, and electronic device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57142096A (en) * 1981-02-27 1982-09-02 Citizen Watch Co Ltd Operating controller for electronic device
JPS59109093A (ja) * 1982-12-14 1984-06-23 三菱自動车工业株式会社 Registration-type voice recognition device
JPH04129976A (ja) * 1990-09-20 1992-04-30 Toshiba Corp Voice recognition control device for elevator
JP3101389B2 (ja) * 1992-01-24 2000-10-23 三洋电机株式会社 Operation switch device for vehicle
JP4188989B2 (ja) * 2006-09-15 2008-12-03 本田技研工业株式会社 Voice recognition device, voice recognition method, and voice recognition program
CN103151038A (zh) * 2011-12-06 2013-06-12 张国鸿 Method for implementing voice recognition control in electronic products
JP6459330B2 (ja) * 2014-09-17 2019-01-30 株式会社デンソー Voice recognition device, voice recognition method, and voice recognition program
JP6641830B2 (ja) 2015-09-18 2020-02-05 カシオ計算机株式会社 Electronic device, control method, and program
JP7230482B2 (ja) 2018-12-17 2023-03-01 コニカミノルタ株式会社 Image processing system, image forming apparatus, voice input prohibition determination method, and program


Also Published As

Publication number Publication date
CN112930686A (zh) 2021-06-08
JP7206167B2 (ja) 2023-01-17
CN112930686B (zh) 2022-10-14
JP2021032906A (ja) 2021-03-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20851555; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20851555; Country of ref document: EP; Kind code of ref document: A1)