WO2021004511A1 - Electronic device, non-volatile storage medium, and voice recognition method - Google Patents

Electronic device, non-volatile storage medium, and voice recognition method

Info

Publication number
WO2021004511A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
sound
unit
voice recognition
electronic device
Prior art date
Application number
PCT/CN2020/101150
Inventor
山下丈次
Original Assignee
海信视像科技股份有限公司
东芝视频解决方案株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 海信视像科技股份有限公司, 东芝视频解决方案株式会社
Priority to CN202080002706.5A (CN112243588B)
Publication of WO2021004511A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]; sound input device, e.g. microphone
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor, characterized by hardware details
    • H04N21/42222 Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the embodiments of the present application relate to electronic devices, non-volatile storage media, and voice recognition methods.
  • when an instruction is given from a distance to a device capable of displaying information, such as a television device (hereinafter referred to as "TV") or a personal computer (hereinafter referred to as "PC"), a remote controller is the basic means, and the remote controller may also be used as a sound collecting mechanism when searching for content or inputting characters.
  • in one arrangement, a microphone is built into the remote control: the voice uttered by the speaker (the person speaking) is collected by that microphone, transmitted from the remote control to the main body of the TV through wireless communication, and processed there (voice recognition). In another arrangement, the TV main body has a built-in microphone and directly collects the user's voice for processing.
  • Patent Document 1 Japanese Patent Application Publication No. 2006-319797
  • when the speaker holds the remote control, it is better to use the sound collected by the microphone of the remote control.
  • when the speaker does not hold the remote control, it is better to use the sound collected by the microphone on the TV main body side. The microphones therefore need to be used selectively according to the speaker's situation.
  • the problem to be solved by this application is to provide an electronic device, a program, and a voice recognition method that improve the speaker's instruction operability by providing a sound collecting unit on both the external terminal and the electronic device, and that can use the multiple sound collecting units selectively according to the speaker's situation so that the sound collected by each sound collecting unit is used effectively.
  • An embodiment provides an electronic device that is connected, wirelessly or by wire, to an external terminal having a first sound collection unit that collects first sound around itself, wherein the electronic device has a first sound acquisition unit, a second sound collection unit, a second sound acquisition unit, a voice recognition unit, and a control unit.
  • the first sound acquisition unit acquires the first sound collected by the first sound collection unit of the external terminal from the external terminal.
  • the second sound collection unit collects second sounds around itself.
  • the second sound acquisition unit acquires the second sound collected by the second sound collection unit.
  • the voice recognition unit performs voice recognition processing on the input first voice and/or second voice.
  • the control unit inputs a voice that matches a preset condition among the first voice and the second voice to the voice recognition unit to perform voice recognition processing.
  • FIG. 1 is a diagram showing the structure of a recording and playback device according to an embodiment
  • FIG. 2 is a flowchart showing a first example of operation of the recording and playback device
  • FIG. 3 is a flowchart showing a second example of the operation of the recording and playback device
  • FIG. 4 is a flowchart showing a third example of the operation of the recording and playback device.
  • 1...Recording and playback device, 14...Image display unit, 15...Speaker, 16...Operation unit, 18...IR receiving unit, 19...BT communication unit, 20...Remote control device (remote control), 21...Button, 21a...Setting button, 21b...Voice button, 22...Signal processing unit, 23...IR transmitter, 24...Microphone, 25...Sound processing unit, 26...BT communication unit, 50...Antenna, 51...Tuner, 52...OFDM demodulator, 53...Signal processing unit, 58...Graphics processing unit, 59...Sound processing unit, 61...OSD signal generation unit, 62...Image processing unit, 64...Input sound processing unit, 65...Control module, 68...Flash memory, 69...Setting unit, 70...Recording unit, 71...Voice recognition unit, 72...Control unit, 73...Communication interface (communication I/F), 76...USB interface (USB I/F), 81...Main body microphone, 100...Recording and playback device main body, 101, 102...Hard disk drive (HDD), 200, 201...Server, NTW...Network.
  • FIG. 1 is a diagram showing an example of a schematic configuration of a recording and playback device 1 according to an embodiment of an electronic device.
  • the recording and playback device 1 including the image display unit 14 will be described, but the image display unit 14 is not an essential structure.
  • when the electronic device is, for example, a digital video recorder or the main body of a computer, it does not include the image display unit 14 and outputs display information to an external image display unit (display) via various cables or the like.
  • an air conditioner, a refrigerator, or the like may also serve as the electronic device.
  • the recording and playback device 1 is an electronic device wirelessly connected to a remote control device 20 (hereinafter referred to as "remote control 20") as an external terminal, and includes a recording and playback device main body 100 that is connected via a network NTW to one or more service servers (server 200, server 201, etc.), which are computers that provide voice-based content retrieval services on the network.
  • the recording and playback device 1 may also be connected to the remote controller 20 by wire.
  • the recording and playback device main body 100 is connected to the remote controller 20 through wireless communication such as Bluetooth (registered trademark) and infrared communication.
  • the remote controller 20 may be a remote controller dedicated to the recording and playback device 1, as shown in this example, or may be a unit that has a communication function and a microphone, for example an information terminal such as a smartphone or a tablet.
  • the remote controller 20 has a plurality of buttons 21 for operating the functions of the recording and playback device main body 100, a signal processing unit 22, an IR transmitter 23 as a first transmitting unit, a microphone 24 as a first sound collecting unit, a sound processing unit 25, a Bluetooth communication unit 26 (hereinafter referred to as "BT communication unit 26") as a second transmitting unit, and the like.
  • the plurality of buttons 21 include a setting button 21a, which is a button for calling a setting function, and a voice button 21b, which is a button for operating a voice function.
  • the signal processing unit 22 generates signals corresponding to the pressing of the plurality of buttons 21.
  • the IR transmitter 23 outputs the signal generated by the signal processing unit 22 according to the operation of the voice button 21b through infrared communication.
  • when the voice button 21b is pressed, the signal processing unit 22 generates a signal for starting the recording operation of the voice function of the recording and playback device main body 100, that is, an instruction signal (a specific trigger signal) that instructs the recording and playback device main body 100 to start recording.
  • the microphone 24 has a narrow sound collection area (a directivity of 90° and a sound collection distance of several tens of centimeters). It becomes active when the voice button 21b is operated and collects the first sound around itself (mainly the voice that the speaker directs toward the microphone 24), so a relatively high-quality sound can be obtained.
  • the sound processing unit 25 digitizes the analog sound collected by the microphone 24 and transmits it to the BT communication unit 26.
  • the BT communication unit 26 transmits the sound digitized by the sound processing unit 25 through Bluetooth communication. That is, the sound processing unit 25 and the BT communication unit 26 together transmit the sound collected by the microphone 24 to the recording and playback device main body 100.
  • the recording and playback device main body 100 has an antenna 50 for terrestrial digital broadcast reception, a tuner 51, an OFDM demodulator 52, a signal processing unit 53, a graphics processing unit 58, a sound processing unit 59, an OSD signal generating unit 61, an image display unit 14, a speaker 15, an operation unit 16, various terminals not shown (image output terminal, sound output terminal, etc.), various interfaces (an IR receiving unit 18, a BT communication unit 19, and a communication interface 73 connected to a LAN and the external network NTW (hereinafter referred to as "communication I/F 73")), a main body microphone 81, a control module 65, a hard disk drive 101 (hereinafter referred to as "HDD 101"), and the like.
  • the HDD 101 provided inside the device is also called a built-in HDD or the like.
  • the antenna 50 supplies the received terrestrial digital television broadcasting signal to the tuner 51 for terrestrial digital broadcasting.
  • the tuner 51 selects a broadcast signal of a designated channel from the supplied broadcast signals and supplies it to an OFDM (orthogonal frequency division multiplexing) demodulator 52.
  • the OFDM demodulator 52 demodulates the broadcast signal of the input channel into digital image signals and audio signals, and outputs them to the signal processing unit 53.
  • the signal processing unit 53 performs predetermined digital signal processing on the digital image signal and audio signal input from the OFDM demodulator 52 and outputs the results to the graphics processing unit 58 and the sound processing unit 59.
  • the graphics processing unit 58 superimposes the OSD signal generated by the OSD (on screen display) signal generating unit 61 on the digital image signal supplied from the signal processing unit 53 and outputs it to the image processing unit 62.
  • the graphics processing unit 58 can selectively output the output image signal of the signal processing unit 53 and the output OSD signal of the OSD signal generation unit 61, or can output these two outputs in combination.
  • the image processing unit 62 adjusts brightness, chroma, etc. of the digital image signal input from the graphics processing unit 58 and supplies the image signal to the image display unit 14 and an image output terminal (not shown).
  • the image processing unit 62 functions as an output unit that outputs an image of the content to the screen.
  • the image display unit 14 is, for example, a display, a display panel, etc., and displays an image generated based on an image signal on the display panel.
  • the image signal supplied to the image output terminal is output to the external device.
  • the sound processing unit 59 converts the input digital sound signal into an analog sound signal that can be reproduced by the speaker 15 and outputs it to the speaker 15, thereby outputting sound.
  • the analog audio signal is output to the outside via an audio output terminal (not shown) such as a headset terminal.
  • the operating unit 16 is a button or switch provided in the main body 100 of the recording/reproducing device, and can perform operations substantially equivalent to the remote controller 20 for each function of the main body 100 of the recording/reproducing device.
  • the operating unit 16 inputs to the control module 65 a control instruction corresponding to a direct operation performed by the user.
  • the direct operation performed by the user refers to, for example, displaying the EPG (electronic program guide), selecting a TV broadcast channel (television station) from the EPG, starting program recording (REC), displaying the list of recorded programs for playback (past program list), selecting an item from the past program list (up, down, left, and right direction keys), playback (PLAY), etc.
  • the main body microphone 81 is a second sound collection unit that collects the second sound (the speaker's voice) around itself, within a certain directivity angle and a range of several meters in front of the screen of the image display unit 14.
  • its sound collection area (a directivity of 120° and a sound collection distance of several meters) is larger than that of the microphone 24 of the remote controller 20.
  • the input sound processing unit 64 digitizes the analog sound collected by the main body microphone 81 and outputs it to the control module 65.
  • the input sound processing unit 64 functions as a second sound acquisition unit for acquiring the second sound collected by the main body microphone 81.
  • the main body microphone 81 is normally kept in an active state in which it can collect sound. When the voice button 21b of the remote controller 20 is pressed, the main body microphone 81 switches to an inactive state (sound collection stopped), the microphone 24 of the remote controller 20 is made active, and the sound (first sound) collected by the microphone 24 is acquired from the remote controller 20.
  • alternatively, the main body microphone 81 may be kept active even when the voice button 21b of the remote controller 20 is pressed. In that case, of the sounds collected (or recorded) by the two microphones 24 and 81, the one with the stronger sound pressure, or the one collected more clearly (and therefore expected to give the higher voice recognition rate), is output to the voice recognition unit 71.
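The two microphone-handling behaviours described above can be sketched as a small state function. This is only an illustration: the names `ButtonPolicy`, `active_microphones`, `"remote"`, and `"main_body"` are hypothetical and do not appear in the application.

```python
from enum import Enum
from typing import FrozenSet


class ButtonPolicy(Enum):
    """Hypothetical names for the two behaviours described in the text."""
    SWITCH_TO_REMOTE = "switch"   # main body microphone stops while the voice button is pressed
    KEEP_BOTH_ACTIVE = "compare"  # both microphones collect; the better capture is chosen later


def active_microphones(voice_button_pressed: bool, policy: ButtonPolicy) -> FrozenSet[str]:
    """Return which microphones are collecting sound in the current state."""
    if not voice_button_pressed:
        # Default state: only the main body microphone is active.
        return frozenset({"main_body"})
    if policy is ButtonPolicy.SWITCH_TO_REMOTE:
        return frozenset({"remote"})
    return frozenset({"remote", "main_body"})
```

Under the second policy, a further step is needed to decide which of the two captures is passed to voice recognition.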
  • clarity can be evaluated with, for example, the Speech Intelligibility Index (SII) of ANSI S3.5-1997: for each frequency band, the signal-to-noise ratio is combined with a frequency-dependent coefficient (the contribution of that band to intelligibility) to give an intelligibility index for the band, and the sum of these per-band indexes gives the overall intelligibility index.
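As a rough illustration of the per-band weighting and summation described above, the sketch below uses a commonly cited simplification in which the band audibility is (SNR + 15) / 30 clipped to the range 0 to 1. The function names and the example band weights are assumptions made here, and the full ANSI S3.5-1997 procedure includes further corrections that are not modelled.

```python
from typing import Sequence


def band_audibility(snr_db: float) -> float:
    """Map a band SNR in dB to [0, 1] with the simplified (SNR + 15) / 30 rule."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def intelligibility_index(snr_db: Sequence[float], importance: Sequence[float]) -> float:
    """Sum the per-band indexes: band importance weight times band audibility."""
    if len(snr_db) != len(importance):
        raise ValueError("snr_db and importance must have the same number of bands")
    return sum(w * band_audibility(s) for s, w in zip(snr_db, importance))


# Hypothetical four-band example with equal importance weights that sum to 1.
example_sii = intelligibility_index([15.0, 0.0, -15.0, 30.0], [0.25, 0.25, 0.25, 0.25])
```

With importance weights that sum to 1, the result lies between 0 (nothing audible) and 1 (every band fully audible).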
  • the voice recognition rate can be evaluated according to any one of the sound pressure Pv and the intelligibility index SII.
  • the voice recognition rate can also be evaluated by the combination of the sound pressure Pv and the intelligibility index SII.
  • the voice recognition rate can be evaluated by linear addition of the sound pressure Pv and the intelligibility index SII, as shown in the following equation (1): R = K1 × Pv + K2 × SII (1)
  • the coefficients K1 and K2 are proportional coefficients.
  • of the two voices, the one with the larger value R determined by equation (1) can be taken as the voice with the higher voice recognition rate.
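A minimal sketch of this selection follows, reading equation (1) as the linear combination R = K1 × Pv + K2 × SII, which is an assumption based on the phrase "linear addition". The `Capture` type, the function names, the example values, and the tie-break in favour of the first sound are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    source: str            # e.g. "remote" (microphone 24) or "main_body" (microphone 81)
    sound_pressure: float  # Pv, in whatever normalised unit the device uses (assumption)
    sii: float             # intelligibility index of the capture


def recognition_score(capture: Capture, k1: float, k2: float) -> float:
    """R = K1 * Pv + K2 * SII (assumed form of equation (1))."""
    return k1 * capture.sound_pressure + k2 * capture.sii


def pick_capture(first: Capture, second: Capture, k1: float, k2: float) -> Capture:
    """Return the capture with the larger R; ties go to the first sound (assumption)."""
    if recognition_score(first, k1, k2) >= recognition_score(second, k1, k2):
        return first
    return second


remote = Capture("remote", sound_pressure=0.6, sii=0.9)
main_body = Capture("main_body", sound_pressure=0.8, sii=0.4)
chosen = pick_capture(remote, main_body, k1=0.5, k2=0.5)
```

Setting K2 to 0 reduces the rule to a pure sound-pressure comparison, and setting K1 to 0 reduces it to a pure intelligibility comparison, which matches the single-criterion evaluations mentioned in the text.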
  • the IR receiving unit 18 inputs an instruction (operation input) from the remote controller 20 to the control module 65 through infrared communication.
  • the instruction (operation input) from the remote controller 20 is, for example, channel (television station) selection, recording start (REC), playback of recorded programs (PLAY), pause (PAUSE), special playback, or menu display.
  • the BT communication unit 19 performs Bluetooth communication (near-field communication) with the remote controller 20.
  • the BT communication unit 19 receives the sound signal transmitted from the remote controller 20 and inputs it to the control module 65.
  • the BT communication unit 19 functions as a first sound acquisition unit that acquires the first sound collected by the microphone 24 of the remote control 20 from the remote control 20.
  • the USB I/F 76 communicates data and signals with external connection devices (input devices, storage devices) and the like corresponding to the USB standard.
  • input devices include, for example, a keyboard and a mouse.
  • storage devices include, as shown in this example, the HDD 102 or the like connected to a USB terminal.
  • the HDD 101 and HDD 102 can utilize various storage areas according to settings.
  • for example, the HDD 101 can be set to record, by reservation or manually, the programs that the user specifies from the electronic program guide (EPG), and the HDD 102 can be set to perform the time-shift function (also called the "full recording function" or "circular recording function"), that is, the function of recording, over a certain period, all programs of a specified channel (broadcasting station) in a specified time slot.
  • the HDD 101 is provided inside the device and the HDD 102 is connected outside the device.
  • multiple external HDDs 102 may be connected.
  • the communication I/F 73 is controlled by the control module 65 to access the external network NTW and communicate with various service servers on the external network NTW (server 200, server 201, etc., which provide content retrieval services based on voice recognition). Specifically, the communication I/F 73 is controlled by the control module 65 to send search requests for obtaining information (transmission of input information), receive search results (acquisition of information), and the like.
  • the server 200 manages program information used for viewing TV programs and reserving recordings, stores the history of recorded content, and, using an AI assistant function, provides a service that searches for programs and program-related content based on utterances (voice) (hereinafter referred to as "A service", "first search service", etc.).
  • the server 201 is a computer that, using an AI assistant function, provides a service that searches content on the Internet based on utterances (voice) (hereinafter referred to as "B service", "second search service", etc.), and can search a wide range of content such as traffic information, weather information, Internet programs, and dictionaries.
  • the above-mentioned services of the service servers support not only retrieval from voice but also retrieval from character data obtained by converting voice to text.
  • both the digital sound signal and its character data are called sound data.
  • the control module 65 includes a ROM (read only memory) 66 that stores a control program that manages the operation of the device, a RAM (random access memory) 67 that provides a work area for processing signals and data, a flash memory 68 that stores recording reservation information, various setting information, control information, and the like, a setting unit 69, a recording unit 70, a voice recognition unit 71, a control unit 72, etc.
  • the control module 65 centrally controls all functions (broadcast receiving function, program recording and playback function, setting function, voice function, communication function with the network) and operations of the recording and playback device main body 100, including the aforementioned signal processing.
  • the voice function refers to the voice recognition function of the voice recognition unit 71 including a voice/character conversion function and a syntax analysis function.
  • the recording and playback device main body 100 receives terrestrial digital broadcasts through the broadcast receiving function, and uses the playback function to play programs (image data including sound) recorded on the HDD 101 and HDD 102 through the recording function, thereby enabling the user to watch the programs.
  • the main body of the recording and playback device 100 is connected to the home network, thereby being able to play back programs stored (recorded) on other video recorders or home servers connected to the home network.
  • the flash memory 68 stores a recording reservation table for performing reservation recording using the reservation recording function, a recording reservation table of an individual program, recording information that is attribute information of the recorded program, setting information of a voice function, and the like.
  • the setting information may be set in advance, or may be set from the setting menu screen displayed by the setting unit 69 in accordance with a user's selection operation.
  • the setting information includes selection conditions for selecting any one of retrieval services provided by one or more service servers (server 200, server 201, etc.).
  • the flash memory 68 thus serves as a storage unit that stores the conditions for setting either of the two microphones 24, 81 to active (operating) or inactive (stopped), or the conditions for using either of the two sounds acquired through the two microphones 24, 81.
  • the setting unit 69 displays a screen for setting the setting information in the flash memory 68, and stores the determined setting information in the flash memory 68 based on the setting operation performed by the user.
  • the recording unit 70 stores (records) the first sound acquired by the BT communication unit 19 (first sound acquisition unit) and the second sound acquired by the input sound processing unit 64 (second sound acquisition unit) in the flash memory 68, the HDD 101, or the like.
  • the voice recognition unit 71 reads the voice recorded by the recording unit 70 from the flash memory 68 or the HDD 101 and analyzes it, that is, performs voice recognition processing.
  • instead of reading out recorded sound, the voice recognition unit 71 may analyze in real time the sound from the remote controller 20 (the first sound) received by the BT communication unit 19, or the sound (second sound) collected by the main body microphone 81. Analyzing the voice refers to the following voice recognition processing: the voice (the user's utterance) is converted to text, and the resulting text data is parsed using a preset analysis dictionary to extract meaningful words, characters, or character strings (keywords).
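The keyword-extraction half of this processing can be illustrated with a simple dictionary lookup over already-transcribed text. This is a stand-in, not the application's method: real syntactic analysis is more involved, the speech-to-text step is out of scope here, and the function name, the whitespace-style English tokenisation, and the sample dictionary are all assumptions.

```python
import re
from typing import Iterable, List


def extract_keywords(transcript: str, dictionary: Iterable[str]) -> List[str]:
    """Return dictionary words found in the transcript, in first-seen order, without repeats."""
    vocabulary = {word.lower() for word in dictionary}
    tokens = re.findall(r"[a-z0-9']+", transcript.lower())
    seen = set()
    keywords = []
    for token in tokens:
        if token in vocabulary and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


# Hypothetical transcript and analysis dictionary.
found = extract_keywords(
    "Show me tomorrow's weather and the weather news",
    ["weather", "news", "drama"],
)
```

The extracted keywords are what would accompany the recorded sound in a search request to a service server.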
  • of the first voice from the microphone 24 of the remote controller 20 and the second voice from the main body microphone 81, the control unit 72 inputs the voice that matches the preset conditions to the voice recognition unit 71 for voice recognition processing.
  • the conditions include the following conditions “1.” to “3.”, etc.
  • the voice recognition unit 71 recognizes the first voice obtained from the remote controller 20;
  • the control unit 72 calls the control program held in the ROM 66 to the work area provided by the RAM 67, and executes processing corresponding to the input signal and the control signal based on the called control program.
  • the control unit 72 controls, for example, the recording and playback function and the voice function, and acquires various information (attribute information) related to the content (program).
  • the control unit 72 controls the various parts of the device (setting unit 69, recording unit 70, voice recognition unit 71, etc.) based on the operation information (control input) from the operation unit 16 and the operation information (control input) from the remote controller 20 received by the IR receiving unit 18.
  • control unit 72 writes various setting information and management information related to other video recorders and television devices connected to the home server in the home network into the flash memory 68.
  • the control unit 72 controls the recording and playback function based on an operation instruction (control input) performed by the user or on recording reservation information for reservation recording, and records the output image signal, sound signal, etc. on a pre-designated HDD (either the HDD 101 or the HDD 102).
  • the control unit 72 causes the service server (either the server 200 or the server 201) that provides the search service to perform a content search using the character or character string based on the recognition result obtained by the voice recognition unit 71 and the acquired voice (first voice or second voice), and receives the search result.
  • control unit 72 makes a search request (transmission of input information) for obtaining content, reception of search results (acquisition of content), and the like to the service server (either the server 200 or the server 201).
  • the control unit 72 makes a search request to the service server (either the server 200 or the server 201) via the communication I/F 73 so that the requested service server searches for content using at least a part of the characters or character strings based on the recognition result obtained by the voice recognition unit 71 and the acquired sound, and the control unit 72 outputs the search result received from the server in response to the search request to the image display unit 14.
  • control unit 72 transmits/receives information to/from a service server (server 200, server 201, etc.) connected to the external network NTW via the communication I/F 73. Furthermore, the control unit 72 described above performs information transfer with the USB-compatible device via the USB I/F 76.
  • control unit 72 displays the content (program) of the selected channel received by the tuner 51.
  • control unit 72 refers to the recording reservation information included in the recording reservation list stored in the flash memory 68 to control the recording operation of the content (program) obtained based on the signal received by the tuner 51.
  • the recording operation also includes recording based on manual operation.
  • the recording storage place of the content (program) during the recording operation is, for example, the HDD 101 installed in the device, the HDD 102 connected via the USB I/F 76, and the like.
  • the control unit 72 activates the main body microphone 81 and collects sounds from the periphery of the main body microphone 81 (step S101 in FIG. 2).
  • if no signal resulting from an operation of the voice button 21b of the remote controller 20 is received while the main body microphone 81 is collecting sound (No in step S102), the control unit 72 controls the recording unit 70 and the voice recognition unit 71 so that the voice collected by the main body microphone 81 is recorded (step S103) and voice recognition processing is performed on the recorded voice (step S104).
  • the control unit 72 makes a search request (step S105) to the service server (either the server 200 or the server 201) set in advance as the request target.
  • the search request includes at least a part of the recorded sound and, as required, the words obtained as the analysis result.
  • the service server (either the server 200 or the server 201) that has received the search request performs a content search based on the received sound and words, and transmits the search result (content) to the recording and playback device main body 100.
  • when the recording and playback device main body 100 receives the search result (content) sent from the server (step S106), it outputs the content to the image display unit 14 (step S107) for display.
  • if the user operates a button 21 of the remote controller 20 while sound is being collected by the main body microphone 81 (step S101), the signal processing unit 22 in the remote controller 20 generates a signal corresponding to the button 21, and the generated signal is transmitted from the IR transmitter 23.
  • the signal processing unit 22 activates the microphone 24 and starts sound collection by the microphone 24.
  • the user speaks toward the microphone 24 of the remote controller 20
  • the user's voice is collected by the microphone 24, subjected to sound processing, and then transmitted from the BT communication unit 26.
  • the control unit 72 determines whether the signal is the signal of the voice button 21b (step S108 ).
  • if the result of the determination is that it is not the signal of the voice button 21b (No in step S108), control of the function corresponding to the signal is performed (step S109).
  • the control unit 72 refers to the conditions in the flash memory 68. Since condition "1." governing this operation is to stop the operation of the main body microphone 81 when a signal is received due to the operation of the voice button 21b of the remote controller 20, the control unit 72 sets the main body microphone 81 to inactive (step S110) and stops the collection of the second sound by the main body microphone 81.
  • control unit 72 controls the recording unit 70 to record the first sound from the remote controller 20 (step S112).
  • the recording and playback device main body 100 is provided with the setting unit 69, the recording unit 70, the voice recognition unit 71, and the control unit 72.
  • when the voice button 21b of the remote controller 20 is pressed and its signal is received
  • setting the main body microphone 81 to be inactive and using the first voice acquired from the microphone 24 of the remote controller 20 for voice recognition processing can improve the accuracy of voice recognition.
  • voice collection and voice recognition processing based on the main body microphone 81 are usually performed.
  • the control unit 72 takes the trigger as the cue to set the main body microphone 81 to inactive and the microphone 24 of the remote controller 20 to active.
  • the first sound collected by the remote control 20 that is close to the speaker is used for voice recognition processing.
  • the high-quality voice of the operating speaker (user) is used to perform voice recognition processing with high accuracy.
  • the control unit 72 activates the main body microphone 81 and collects sounds from the periphery of the main body microphone 81 (step S101 in FIG. 3).
  • if no signal resulting from an operation of the voice button 21b of the remote controller 20 is received while sound is being collected by the main body microphone 81 (No in step S102), the control unit 72 operates in the same manner as in the first operation example (steps S103 to S107).
  • if the user operates a button 21 of the remote controller 20 while sound is being collected by the main body microphone 81 (step S101), the signal processing unit 22 in the remote controller 20 generates a signal corresponding to the button 21, and the generated signal is transmitted from the IR transmitter 23.
  • the signal processing unit 22 activates the microphone 24 and starts sound collection by the microphone 24.
  • the user speaks to the microphone 24 of the remote controller 20
  • the user's voice is collected by the microphone 24, subjected to sound processing, and then transmitted from the BT communication unit 26.
  • the control unit 72 determines whether the signal is the signal of the voice button 21b (step S108 ).
  • if the result of the determination is that it is not the signal of the voice button 21b (No in step S108), control of the function corresponding to the signal is performed (step S109).
  • when the received signal is the signal of the voice button 21b (Yes in step S108), the control unit 72 waits to receive sound from the remote controller 20 and, when that sound is received (step S121), controls the recording unit 70 to record the sound from the remote controller 20 (step S122). Note that during this period the main body microphone 81 also remains active, so recording of the sound collected by the main body microphone 81 also continues (step S103).
  • the control unit 72 refers to the conditions in the flash memory 68. Since condition "2." governing this operation is to make the voice recognition unit 71 recognize the first voice obtained from the remote controller 20 when a signal is received due to the operation of the voice button 21b of the remote controller 20, the control unit 72 inputs, of the two voices recorded by the recording unit 70, the first voice obtained from the remote controller 20 to the voice recognition unit 71 and causes the voice recognition unit 71 to perform voice recognition processing (S123). After that, the operation using the voice recognition result of the voice recognition unit 71 is the same as in the first embodiment.
  • when a signal is received due to the operation of the voice button 21b of the remote controller 20, the control unit 72 inputs, of the two sounds (the first sound and the second sound) recorded separately by the recording unit 70, the recorded first sound from the remote controller 20 to the voice recognition unit 71, and causes the voice recognition unit 71 to perform voice recognition processing.
  • the trigger for the start of recording is the activation of the recording and playback device main body 100 or the pressing of the voice button 21b of the remote controller 20
  • taking the trigger as the cue, the second sound of the main body microphone 81 and the first sound of the microphone 24 of the remote controller 20 are recorded simultaneously.
  • the trigger source is the remote controller 20 that is close to the speaker (user)
  • the sound collected by the microphone 24 of the remote controller 20 is acquired and the voice recognition processing is performed. In this way, it is possible to improve the accuracy of voice recognition by performing recognition processing on the high-quality voice obtained by the remote controller 20 close to the speaker among the multiple voices recorded at the same time.
  • the operation from the activation of the recording/reproducing apparatus main body 100 to the recording of the sound collected by each microphone is the same as the second operation example, and the description thereof is omitted.
  • the control unit 72 refers to the conditions in the flash memory 68 while the two voices are being recorded separately. Since condition "3." governing this operation is to use whichever of the two recorded sounds has the better sound quality, the control unit 72 performs a sound quality check on the two sounds recorded separately by the recording unit 70, inputs whichever of the two checked voices has the higher voice recognition rate to the voice recognition unit 71, and causes the voice recognition unit 71 to perform voice recognition processing (S131, S132). After that, the operation using the voice recognition result of the voice recognition unit 71 is the same as in the first embodiment and the second embodiment.
  • the control unit 72 checks the quality of each of the multiple sounds (the first voice and the second voice) acquired from the microphone 24 of the remote controller 20 and from the main body microphone 81 and then recorded, and uses the best-quality sound among them for voice recognition processing; therefore, the accuracy of voice recognition can be improved.
  • the trigger for the microphone 24 to start sound collection is the same as in the second operation example.
  • alternatively, each microphone may collect sound at all times, with the voice recognition processing timed to when the voice button 21b of the remote controller 20 is pressed, that is, when the signal of the voice button 21b is received.
  • the remote controller 20 (external terminal)
  • the recording and playback device main body 100 (electronic device)
  • microphones (sound collecting units)
  • among the collected voices, those that match conditions "1." to "3." are used for voice recognition processing, which improves the ease with which the operator (speaker) gives instructions
  • the plurality of microphones 24 and 81 are used selectively according to the speaker's situation so that the sound collected by each of the microphones 24 and 81 is used effectively.
  • the sound collecting unit is switched to the microphone 24 close to the speaker, for example, so that high-quality sound data can be acquired.
  • microphones 24, 81, etc. are provided in the main body of the recording and playback device 100 and the remote controller 20, respectively.
  • microphones may be provided in each of multiple external terminals (a first remote controller and a second remote controller), and multiple sounds may be transmitted from the remote controllers to the recording and playback device main body 100.
  • the recording and playback device main body 100 acquires the first sound collected by the microphone of the first remote controller and the second sound collected by the microphone of the second remote controller, selects the sound that matches the condition preset in the recording and playback device main body 100, and uses it for voice recognition processing.
  • the constituent elements of the recording and playback device 1 shown in the above-mentioned embodiment may be realized by a program installed in a memory such as a hard disk device of a computer, or the program may be stored in advance in a computer-readable non-volatile storage medium so that the computer realizes the functions of the above-described solution of the present application by reading the program from the non-volatile storage medium.
  • Examples of storage media include recording media such as CD-ROM, flash memory, and removable media.
  • the constituent elements may be distributedly stored in different computers connected via a network, and the functions of the present invention may be realized by communicating between the computers that enable each constituent element to function.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Selective Calling Equipment (AREA)

Abstract

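Provided are an electronic device, a non-volatile storage medium, and a voice recognition method in which sound collecting units are provided in both an external terminal and the electronic device to improve the ease with which a speaker gives instructions, and in which a plurality of microphones are used selectively according to the speaker's situation so that the sound collected by each microphone is used effectively. The electronic device includes a first sound acquisition unit, a second sound collecting unit, a second sound acquisition unit, a voice recognition unit (71), and a control unit (72). The first sound acquisition unit acquires, from the external terminal, a first sound collected by a first sound collecting unit of the external terminal. The second sound collecting unit collects a second sound from its own surroundings. The second sound acquisition unit acquires the second sound collected by the second sound collecting unit. The voice recognition unit (71) performs voice recognition processing on the input first sound and/or second sound. The control unit (72) inputs whichever of the first sound and the second sound matches a preset condition to the voice recognition unit (71) for voice recognition processing.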

Description

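Electronic device, non-volatile storage medium, and voice recognition method
This application claims priority to Japanese Patent Application No. 2019-129339, entitled "Electronic Device, Program, and Voice Recognition Method", filed with the Japan Patent Office on July 11, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to an electronic device, a non-volatile storage medium, and a voice recognition method.
Background
In recent years, demand has been growing for services that operate devices and search for information and content by voice (AI-based voice-interactive content search services). Such a search service offers the convenience of operating a device or searching for information simply by speaking to the device, without holding a remote control device (hereinafter, "remote controller"), and has therefore spread rapidly.
Because the devices to be operated are not limited to the device the speaker talks to but can include every device in the home, the number of companies and device manufacturers providing such search services is expected to increase.
On the other hand, when instructions are given from a distance to a device that has an element capable of displaying information, such as a television set (hereinafter, "TV") or a personal computer (hereinafter, "PC"), remote controller operation is the basic means, and using the remote controller as a sound collecting mechanism for content search, character input, and the like has been considered.
Specific examples of effectively using sound collected by a TV include the following: a microphone is built into the remote controller, the speaker's voice is collected by that microphone, and the collected sound is sent from the remote controller to the TV main body by wireless communication for processing (voice recognition); or a microphone is built into the TV main body, which directly collects the user's voice for processing.
In the former example (microphone built into the remote controller), the microphone is close to the speaker, so high-quality sound can be collected and recognized with high accuracy. The drawback is that the speaker has to hold the remote controller.
In the latter example (microphone built into the TV main body), conversely, the speaker can speak without holding the remote controller, but because the microphone is far from the speaker, the sound it collects cannot be expected to be of high quality.
A solution that combines the advantages of both has therefore been considered, namely providing microphones in both the remote controller and the TV main body.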
Prior Art Documents
Patent Documents
Patent Document 1: Japanese Patent Laid-Open No. 2006-319797
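Summary
However, when microphones (sound collecting units) are provided in both the remote controller (external terminal) and the TV main body (electronic device), the sounds collected by the two microphones can be input to the TV main body at the same time (a conflict of sounds), and the collected sounds cannot be used effectively.
For example, when the speaker is holding the remote controller, it is better to use the sound collected by the remote controller's microphone; when the speaker is not holding the remote controller, it is better to use the sound collected by the microphone on the TV main body side. The microphones therefore need to be used selectively according to the speaker's situation.
The problem to be solved by the present application is to provide an electronic device, a program, and a voice recognition method in which sound collecting units are provided in both an external terminal and the electronic device to improve the ease with which a speaker gives instructions, and in which the plurality of sound collecting units can be used selectively according to the speaker's situation so that the sound collected by each sound collecting unit is used effectively.
An embodiment provides an electronic device that is connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first sound from its own surroundings. The electronic device includes a first sound acquisition unit, a second sound collecting unit, a second sound acquisition unit, a voice recognition unit, and a control unit. The first sound acquisition unit acquires, from the external terminal, the first sound collected by the first sound collecting unit of the external terminal. The second sound collecting unit collects a second sound from its own surroundings. The second sound acquisition unit acquires the second sound collected by the second sound collecting unit. The voice recognition unit performs voice recognition processing on the input first sound and/or second sound. The control unit inputs whichever of the first sound and the second sound matches a preset condition to the voice recognition unit for voice recognition processing.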
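Brief Description of the Drawings
FIG. 1 is a diagram showing the configuration of a recording and playback device according to an embodiment;
FIG. 2 is a flowchart showing a first operation example of the recording and playback device;
FIG. 3 is a flowchart showing a second operation example of the recording and playback device;
FIG. 4 is a flowchart showing a third operation example of the recording and playback device.
Description of Reference Signs
1: recording and playback device; 14: image display unit; 15: speaker; 16: operation unit; 18: IR receiver; 19: BT communication unit; 20: remote control device (remote controller); 21: buttons; 21a: setting button; 21b: voice button; 22: signal processing unit; 23: IR transmitter; 24: microphone; 25: sound processing unit; 26: BT communication unit; 50: antenna; 51: tuner; 52: OFDM demodulator; 53: signal processing unit; 58: graphics processing unit; 59: sound processing unit; 61: OSD signal generation unit; 62: image processing unit; 64: input sound processing unit; 65: control module; 68: flash memory; 69: setting unit; 70: recording unit; 71: voice recognition unit; 72: control unit; 73: communication interface (communication I/F); 76: USB interface (USB I/F); 81: main body microphone; 100: recording and playback device main body; 101, 102: hard disk drives (HDD); 200, 201: servers; NTW: network.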
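Detailed Description
Embodiments are described in detail below with reference to the drawings.
FIG. 1 is a diagram showing an example of the schematic configuration of a recording and playback device 1 according to one embodiment of an electronic device. This embodiment describes a recording and playback device 1 that includes an image display unit 14, but the image display unit 14 is not an essential component. When the electronic device is, for example, a digital video recorder or the main unit of a computer, it does not include the image display unit 14 and instead outputs display information to an external image display unit (display) via various cables or the like. The electronic device may also be, for example, an air conditioner or a refrigerator.
The configuration of the recording and playback device 1 is described with reference to FIG. 1. As shown in FIG. 1, the recording and playback device 1 is an electronic device wirelessly connected to a remote control device 20 serving as an external terminal (hereinafter, "remote controller 20"), and includes a recording and playback device main body 100. The recording and playback device main body 100 is connected via a network NTW to one or more service servers (server 200, server 201, etc.), which are computers that provide voice-based content search services on the network. The recording and playback device 1 may also be connected to the remote controller 20 by wire.
The recording and playback device main body 100 is connected to the remote controller 20 by wireless communication such as Bluetooth (registered trademark) and infrared communication. Besides being a remote controller dedicated to the recording and playback device 1 as in this example, the remote controller 20 may be, for example, a unit having a communication function for communicating with an information terminal such as a smartphone or tablet, or with a microphone.
The remote controller 20 has a plurality of buttons 21 for operating the functions of the recording and playback device main body 100, a signal processing unit 22, an IR transmitter 23 serving as a first transmitting unit, a microphone 24 serving as a first sound collecting unit, a sound processing unit 25, a Bluetooth communication unit 26 serving as a second transmitting unit (hereinafter, "BT communication unit 26"), and so on. The buttons 21 include a setting button 21a for calling up the setting function and a voice button 21b for activating the voice function.
The signal processing unit 22 generates signals corresponding to presses of the buttons 21. The IR transmitter 23 outputs, by infrared communication, the signal generated by the signal processing unit 22 in response to operation of the voice button 21b. When the voice button 21b is pressed, the signal processing unit 22 generates a signal for causing the voice function of the recording and playback device main body 100 to start a recording operation, that is, an instruction signal (a specific trigger signal) instructing the recording and playback device main body 100 to start recording.
The microphone 24 has a narrow sound collecting area (directivity of about 90° and a sound collecting distance of about several tens of centimeters). It becomes active when the voice button 21b is operated and then collects a first sound from its own surroundings (mainly the voice the speaker utters toward the microphone 24), so relatively high-quality sound can be obtained.
The sound processing unit 25 digitizes the analog sound collected by the microphone 24 and passes it to the BT communication unit 26. The BT communication unit 26 transmits the sound digitized by the sound processing unit 25 by Bluetooth communication. In other words, the BT communication unit 26 and the sound processing unit 25 transmit the sound collected by the microphone 24 to the recording and playback device main body 100.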
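The recording and playback device main body 100 has an antenna 50 for receiving terrestrial digital broadcasts, a tuner 51, an OFDM demodulator 52, a signal processing unit 53, a graphics processing unit 58, a sound processing unit 59, an OSD signal generation unit 61, the image display unit 14, a speaker 15, an operation unit 16, various terminals not shown (an image output terminal, a sound output terminal, etc.), various interfaces (an IR receiver 18, a BT communication unit 19, and a communication interface 73 (hereinafter, "communication I/F 73") connected to a LAN and the external network NTW), a main body microphone 81, a control module 65, a hard disk drive 101 (hereinafter, "HDD 101"), and so on. The HDD 101 installed inside the device is also called the built-in HDD.
The antenna 50 supplies the received terrestrial digital television broadcast signal to the tuner 51 for terrestrial digital broadcasts. The tuner 51 selects the broadcast signal of the specified channel from the supplied broadcast signals and supplies it to the OFDM (orthogonal frequency division multiplexing) demodulator 52.
The OFDM demodulator 52 demodulates the input broadcast signal of the channel into a digital image signal and sound signal and outputs them to the signal processing unit 53.
The signal processing unit 53 applies prescribed digital signal processing to the digital image signal and sound signal input from the OFDM demodulator 52 and outputs them to the graphics processing unit 58 and the sound processing unit 59.
The graphics processing unit 58 superimposes the OSD signal generated by the OSD (on screen display) signal generation unit 61 on the digital image signal supplied from the signal processing unit 53 and outputs the result to the image processing unit 62. The graphics processing unit 58 can selectively output the image signal from the signal processing unit 53 or the OSD signal from the OSD signal generation unit 61, or output a combination of the two.
The image processing unit 62 applies brightness, luminance, saturation, and similar processing to the digital image signal input from the graphics processing unit 58 and supplies the image signal to the image display unit 14 and the image output terminal (not shown). The image processing unit 62 functions as an output unit that outputs the image of the content to the screen.
The image display unit 14 is, for example, a display or display panel, and displays an image generated from the image signal on the display panel. When an external device is connected to the image output terminal, the image signal supplied to the image output terminal is output to the external device.
The sound processing unit 59 converts the input digital sound signal into an analog sound signal that the speaker 15 can play and outputs it to the speaker 15, thereby outputting sound. The analog sound signal is also output externally via a sound output terminal (not shown) such as a headphone terminal.
The operation unit 16 consists of components such as buttons and switches provided on the recording and playback device main body 100, and allows roughly the same operations on the functions of the recording and playback device main body 100 as the remote controller 20.
Specifically, the operation unit 16 inputs to the control module 65 control commands corresponding to direct operations by the user, such as displaying the EPG (electronic program guide) for viewing programs or reserving recordings, selecting a TV broadcast (program) channel (station) from the EPG, starting recording of a program (REC), displaying a list of programs for playing back recorded programs (past program guide), selecting a recorded program to play from the past program guide (up/down/left/right direction input), and playback (PLAY).
The main body microphone 81 is a second sound collecting unit that collects a second sound (the speaker's voice) from its own surroundings (a range of several meters in front of the screen of the image display unit 14, with directivity over a certain angle). It collects sound over a sound collecting area larger than that of the microphone 24 of the remote controller 20 (directivity of about 120° and a sound collecting distance of about several meters).
The input sound processing unit 64 digitizes the analog sound collected by the main body microphone 81 and outputs it to the control module 65. The input sound processing unit 64 functions as a second sound acquisition unit for acquiring the second sound collected by the main body microphone 81.
Normally, while the recording and playback device main body 100 is operating, the main body microphone 81 collects sound at all times in a state in which it can collect sound (active state). When the voice button 21b of the remote controller 20 is pressed, the main body microphone 81 is switched to the inactive state (sound collection stopped), the microphone 24 of the remote controller 20 is set to active, and the sound collected by the microphone 24 (the first sound) is acquired from the remote controller 20.
Alternatively, the main body microphone 81 may remain able to collect sound (active) even when the voice button 21b of the remote controller 20 is pressed. In that case, the sound output to the voice recognition unit 71 is the sound collected by, or a recording of the sound collected by, whichever of the two microphones 24 and 81 collected sound with the stronger pressure (higher sound pressure) or collected sound more clearly (higher intelligibility), and which therefore yields the higher voice recognition rate.
Sound clarity is evaluated, for example, by an intelligibility index (for instance SII: Speech Intelligibility Index). SII is standardized as "ANSI S3.5-1997". Basically, for each of the divided frequency bands, a per-band intelligibility index is obtained from the signal-to-noise ratio and a per-frequency coefficient (the contribution of that frequency to intelligibility), and the overall intelligibility index is obtained as the sum of these per-band indices.
This can also be simplified by restricting the calculation to the frequency region that contributes significantly to speech intelligibility (for example, 1000 Hz to 3000 Hz).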
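The band-limited simplification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculation defined in ANSI S3.5-1997: the function names, the example importance weights, and the mapping of a band's signal-to-noise ratio to an audibility value by clipping to ±15 dB are all assumptions made here for illustration.

```python
def band_audibility(snr_db):
    """Map a band signal-to-noise ratio (dB) to an audibility value in [0, 1].

    Assumption: SNR is clipped to the range -15 dB .. +15 dB and scaled linearly.
    """
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(bands, low_hz=1000.0, high_hz=3000.0):
    """Importance-weighted audibility over bands centred in [low_hz, high_hz].

    `bands` is an iterable of (centre_hz, snr_db, importance) tuples.
    The importance weights (hypothetical values supplied by the caller) are
    renormalised over the retained bands so the result stays in [0, 1].
    """
    kept = [(snr_db, weight) for hz, snr_db, weight in bands
            if low_hz <= hz <= high_hz]
    total_weight = sum(weight for _, weight in kept)
    if total_weight == 0:
        return 0.0
    weighted = sum(weight * band_audibility(snr_db) for snr_db, weight in kept)
    return weighted / total_weight
```

Bands outside 1000 Hz to 3000 Hz are simply ignored, which is the simplification the text describes; a full implementation would follow the standard's own band definitions and corrections.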
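In this case, the voice recognition rate can be evaluated from either the sound pressure Pv or the intelligibility index SII.
Note that the voice recognition rate may also be evaluated from a combination of the sound pressure Pv and the intelligibility index SII. For example, it can be evaluated by a linear sum of the sound pressure Pv and the intelligibility index SII, as in equation (1) below.
R = K1*Pv + K2*SII … Equation (1)
Here, K1 and K2 are proportionality coefficients.
That is, the sound for which the value R determined by equation (1) is larger can be regarded as the sound with the higher voice recognition rate.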
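As a hedged illustration of equation (1), the sketch below computes R for each of the two sounds and picks the larger. The coefficient values, the assumption that Pv has been normalised to a range comparable with SII, and the tie-breaking rule (prefer the first sound) are not given in the source and are chosen here only for the example.

```python
def recognition_score(pv, sii, k1=0.5, k2=0.5):
    """Equation (1): R = K1*Pv + K2*SII.

    k1 and k2 are placeholder coefficients; pv is assumed to be normalised.
    """
    return k1 * pv + k2 * sii


def pick_higher_score(first, second, k1=0.5, k2=0.5):
    """Return the label of the sound with the larger R.

    `first` and `second` are (label, pv, sii) tuples. On a tie the first
    sound is returned, which is an arbitrary choice made for this sketch.
    """
    r_first = recognition_score(first[1], first[2], k1, k2)
    r_second = recognition_score(second[1], second[2], k1, k2)
    return first[0] if r_first >= r_second else second[0]
```

Setting `k2=0` or `k1=0` reduces the score to sound pressure alone or intelligibility alone, which corresponds to evaluating the recognition rate from only one of the two measures.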
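The IR receiver 18 inputs to the control module 65, via infrared communication, commands corresponding to instructions (operation inputs) from the remote controller 20, such as channel (station) selection, start of recording (REC), playback of a recorded program (PLAY), pause (PAUSE), special playback, and menu display.
The BT communication unit 19 performs Bluetooth communication (short-range wireless communication) with the remote controller 20. The BT communication unit 19 receives the sound signal transmitted from the remote controller 20 and inputs it to the control module 65. The BT communication unit 19 functions as a first sound acquisition unit that acquires, from the remote controller 20, the first sound collected by the microphone 24 of the remote controller 20.
In addition, a WiFi (Wireless Fidelity) communication unit or the like may be provided for wireless communication with short-range wireless communication devices conforming to the WiFi standard or the like. A short-range wireless communication unit conforming to a standard such as NFC (Near Field Communication) may also be provided to communicate with external devices of the same standard.
The USB I/F 76 exchanges data and signals with external devices conforming to the USB standard (input devices, storage devices, etc.). Input devices include, for example, a keyboard and a mouse. The storage device is, as in this example, the HDD 102 connected to the USB terminal. The storage areas of the HDD 101 and the HDD 102 can be used in various ways depending on the settings.
For example, the HDD 101 can be set to perform reserved or manual recording of programs that the user individually specifies from the electronic program guide (EPG), and the HDD 102 can be set to perform recording by the time shift machine function (also called the all-program recording function, "full recording function", or "loop recording function"), which records, for a certain period, all programs on specific channels (broadcast platforms, distribution platforms) and in specific time slots designated in advance by the user. The opposite assignment is also possible.
Note that this example describes the HDD 101 installed inside the device and the HDD 102 connected externally, but a plurality of externally connected HDDs 102 may also be connected.
The communication I/F 73 is controlled by the control module 65 to access the external network NTW and to communicate with the various service servers on the external network NTW (the server 200, the server 201, etc., which provide content search services based on voice recognition). Specifically, the communication I/F 73 is controlled by the control module 65 to make search requests for obtaining information (transmission of input information), receive search results (acquisition of information), and so on.
The server 200 manages program information used for viewing TV programs, reserving recordings, keeping a history of recorded content, and so on, and provides an AI assistant search service that searches for programs and program-related content by utterance (voice) (hereinafter, "service A", "first search service", etc.).
The server 201 is a computer that provides an AI assistant search service for content on the Internet by utterance (voice) (hereinafter, "service B", "second search service", etc.), and can search a wide range of content such as traffic information, weather information, Internet programs, and dictionaries.
The services of these service servers support not only search by voice but also search by character data obtained by converting voice to text. Here, the digital sound signal and its character data are collectively called sound data.
The control module 65 includes a ROM (read only memory) 66 that stores the control program managing the operation of the device, a RAM (random access memory) 67 that provides a work area for processing signals and data, a flash memory 68 that stores recording reservation information, various setting information, control information, and the like, a setting unit 69, a recording unit 70, a voice recognition unit 71, a control unit 72, and so on. The control module 65 performs unified control of all functions (broadcast reception, program recording and playback, settings, the voice function, and network communication) and operations of the recording and playback device main body 100, including the signal processing described above. The voice function refers to the voice recognition function of the voice recognition unit 71, which includes a voice-to-text conversion function and a syntactic analysis function.
The recording and playback device main body 100 thus receives terrestrial digital broadcasts with the broadcast reception function and uses the playback function to play programs (image data including sound) recorded on the HDD 101 and the HDD 102 by the recording function, so that the user can watch programs. In addition, by connecting to a home network, the recording and playback device main body 100 can play programs saved (recorded) on other video recorders or home servers connected to the home network.
The flash memory 68 stores a recording reservation table, or recording reservation tables for individual programs, used for reserved recording with the reserved recording function; recording information, that is, attribute information of recorded programs; setting information for the voice function; and the like. The setting information may be preset, or may be set from the setting menu screen displayed by the setting unit 69 according to the user's selection. The setting information includes a selection condition for selecting one of the search services provided by the one or more service servers (server 200, server 201, etc.).
In other words, the flash memory 68 can be regarded as a storage unit that stores a condition for setting either of the two microphones 24 and 81 to active (operating) or inactive (stopped), or a condition for using either of the two sounds acquired by the two microphones 24 and 81.
The setting unit 69 displays a screen for setting the setting information in the flash memory 68 and, after the user's setting operation, stores the confirmed setting information in the flash memory 68.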
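The recording unit 70 stores (records) the first sound acquired by the BT communication unit 19 (first sound acquisition unit) and the second sound acquired by the input sound processing unit 64 (second sound acquisition unit) in the flash memory 68, the HDD 101, or the like.
The voice recognition unit 71 reads the sound recorded by the recording unit 70 from the flash memory 68, the HDD 101, or the like and analyzes it, that is, performs voice recognition processing.
Note that if the recording and playback device main body 100 has high processing capability, the recorded sound need not be read out for processing; instead, the sound from the remote controller 20 received by the BT communication unit 19 (the first sound) or the sound collected by the main body microphone 81 (the second sound) may be analyzed in real time. Analyzing sound refers to the following voice recognition processing: the sound (the voice uttered by the user) is converted to text, and the resulting text data is syntactically analyzed using a preset analysis dictionary to extract words, meaningful characters, or character strings (keywords).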
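The second half of the analysis step, extracting keywords from already transcribed text with a preset dictionary, can be sketched as below. Speech-to-text conversion is outside the sketch, and the plain word lookup used here is a deliberately simple stand-in for the syntactic analysis the text mentions; the function name and the dictionary contents are hypothetical.

```python
def extract_keywords(transcript, dictionary):
    """Return the words of `transcript` that appear in `dictionary`, in order.

    `transcript` is text already produced by speech-to-text conversion.
    `dictionary` is a set of lower-case keywords (a hypothetical analysis
    dictionary). Surrounding punctuation is stripped before lookup.
    """
    keywords = []
    for token in transcript.lower().split():
        word = token.strip(".,!?")
        if word in dictionary:
            keywords.append(word)
    return keywords
```

For a language written without spaces, such as Japanese or Chinese, the whitespace split would have to be replaced by a proper tokenizer; that choice is left open here.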
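The control unit 72 inputs whichever of the first sound from the microphone 24 of the remote controller 20 and the second sound from the main body microphone 81 matches a preset condition to the voice recognition unit 71 and has voice recognition processing performed on it.
The conditions include the following conditions "1." to "3.".
Condition "1.": for example, when a signal is received due to operation of the voice button 21b of the remote controller 20, stop the operation of the main body microphone 81;
Condition "2.": when a signal is received due to operation of the voice button 21b of the remote controller 20, have the voice recognition unit 71 recognize the first sound obtained from the remote controller 20;
Condition "3.": use whichever of the two recorded sounds has the better sound quality.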
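The three conditions above can be read as a small selection rule. The sketch below is one possible reading under stated assumptions: the enum names, the function signature, and the caller-supplied `quality` function are hypothetical, and stopping the main body microphone under condition "1." is modelled only by the second sound being absent (`None`).

```python
from enum import Enum


class Condition(Enum):
    STOP_BODY_MIC = 1       # condition "1."
    USE_REMOTE_SOUND = 2    # condition "2."
    USE_BETTER_QUALITY = 3  # condition "3."


def select_sound(condition, voice_button_received, first, second, quality):
    """Pick the sound to hand to voice recognition.

    `first` is the sound from the remote controller's microphone, `second`
    the sound from the main body microphone; either may be None when it was
    not collected. `quality` maps a sound to a comparable score and is only
    consulted for condition "3.".
    """
    if condition in (Condition.STOP_BODY_MIC, Condition.USE_REMOTE_SOUND):
        # Once the voice-button signal has arrived, both conditions feed the
        # remote controller's sound to recognition; otherwise the main body
        # microphone's sound is used.
        if voice_button_received and first is not None:
            return first
        return second
    candidates = [sound for sound in (first, second) if sound is not None]
    if not candidates:
        return None
    return max(candidates, key=quality)
```

Keeping the quality measure as a parameter leaves open whether it is sound pressure, an intelligibility index, or a combination of the two.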
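The control unit 72 loads the control program held in the ROM 66 into the work area provided by the RAM 67 and, based on the loaded control program, executes processing corresponding to input signals and control signals.
The control unit 72 controls, for example, the recording and playback function and the voice function, and acquires various information (attribute information) associated with content (programs).
The control unit 72 controls each part of the device (the setting unit 69, the recording unit 70, the voice recognition unit 71, etc.) based on operation information (control input) from the operation unit 16 and operation information (control input) from the remote controller 20 received by the IR receiver 18.
The control unit 72 also writes to the flash memory 68 various setting information, management information about other video recorders and television sets connected to a home server on the home network, and the like.
The control unit 72 controls the recording and playback function based on, for example, an operation instruction (control input) from the user or recording reservation information for reserved recording, and records the output image signal, sound signal, and so on to the HDD designated in advance (either the HDD 101 or the HDD 102).
The control unit 72 causes the service server that provides the search service (either the server 200 or the server 201) to search for content using the character or character string based on the recognition result obtained by the voice recognition unit 71 and the acquired sound (the first sound or the second sound), and receives the search result.
That is, the control unit 72 sends the service server (either the server 200 or the server 201) a search request for obtaining content (transmission of input information), receives the search results (acquisition of content), and so on.
Specifically, the control unit 72 makes a search request to the service server (either the server 200 or the server 201) via the communication I/F 73 so that the requested service server searches for content using the character or character string based on the recognition result obtained by the voice recognition unit 71 and at least a part of the acquired sound, and the control unit 72 outputs the search result received from that server in response to the search request to the image display unit 14.
The control unit 72 also transmits information to and receives information from the service servers (server 200, server 201, etc.) connected to the external network NTW via the communication I/F 73. Furthermore, the control unit 72 transfers information to and from USB-compatible devices via the USB I/F 76.
Furthermore, the control unit 72 displays the content (program) of the selected channel received by the tuner 51. The control unit 72 also refers to the recording reservation information included in the recording reservation list stored in the flash memory 68 and controls the recording of content (programs) obtained from the signal received by the tuner 51. Recording operations also include recording started by manual operation. The recording destination of the content (program) during a recording operation is, for example, the HDD 101 installed inside the device or the HDD 102 connected via the USB I/F 76.
The operations corresponding to conditions "1." to "3." above are described below with reference to FIGS. 2 to 4. First, the first operation example of the recording and playback device 1, corresponding to condition "1.", is described with reference to the flowchart of FIG. 2.
In this first operation example, when the recording and playback device main body 100 starts up, the control unit 72 sets the main body microphone 81 to active and collects sound from around the main body microphone 81 (step S101 in FIG. 2).
If no signal resulting from an operation of the voice button 21b of the remote controller 20 is received while the main body microphone 81 is collecting sound (No in step S102), the control unit 72 controls the recording unit 70 and the voice recognition unit 71 so that the sound collected by the main body microphone 81 is recorded (step S103) and voice recognition processing is performed on the recorded sound (step S104).
The control unit 72 then makes a search request, based on the result of the voice recognition processing (words (characters), character strings, keywords, etc.) and the sound, to the service server (either the server 200 or the server 201) set in advance as the request target (step S105). The search request includes at least a part of the recorded sound and, as required, the words obtained as the analysis result.
The service server (either the server 200 or the server 201) that has received the search request searches for content based on the received sound and words and transmits the search result (content) to the recording and playback device main body 100.
When the recording and playback device main body 100 receives the search result (content) sent from the server (step S106), it outputs the content to the image display unit 14 (step S107) for display.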
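The search request of step S105 carries at least part of the recorded sound and, when available, the recognized words. The source does not specify a wire format, so the sketch below is purely illustrative: the JSON layout, the field names `sound` and `words`, the base64 encoding, and the byte limit are all assumptions.

```python
import base64
import json


def build_search_request(recorded_sound, words, max_sound_bytes=16000):
    """Build a hypothetical JSON search request body.

    `recorded_sound` is raw bytes from the recording unit; only the first
    `max_sound_bytes` bytes are included ("at least a part of the recorded
    sound"). `words` is a list of recognized keywords and is omitted from
    the request when empty.
    """
    payload = {
        "sound": base64.b64encode(recorded_sound[:max_sound_bytes]).decode("ascii"),
    }
    if words:
        payload["words"] = list(words)
    return json.dumps(payload)
```

A real service server would define its own request schema and transport; this only shows the two pieces of information the text says the request contains.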
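On the other hand, if the user operates a button 21 of the remote controller 20 while the main body microphone 81 is collecting sound (step S101), the signal processing unit 22 in the remote controller 20 generates a signal corresponding to the button 21, and the generated signal is transmitted from the IR transmitter 23.
Here, when, for example, the voice button 21b, which is a specific button of the remote controller 20, is pressed, the signal processing unit 22 sets the microphone 24 to active and sound collection by the microphone 24 starts.
When the user then speaks toward the microphone 24 of the remote controller 20, the user's voice is collected by the microphone 24, subjected to sound processing, and transmitted from the BT communication unit 26.
In the recording and playback device main body 100, when the IR signal transmitted from the remote controller 20 is received by the IR receiver 18 (Yes in step S102), the control unit 72 determines whether the signal is the signal of the voice button 21b (step S108).
If the result of the determination is that it is not the signal of the voice button 21b (No in step S108), control of the function corresponding to the signal is performed (step S109).
On the other hand, when the received signal is the signal of the voice button 21b (Yes in step S108), the control unit 72 next refers to the conditions in the flash memory 68. Since condition "1." governing this operation is to stop the operation of the main body microphone 81 when a signal is received due to operation of the voice button 21b of the remote controller 20, the control unit 72 sets the main body microphone 81 to inactive (step S110) and stops the collection of the second sound by the main body microphone 81.
Then, when the first sound from the remote controller 20 is received (step S111), the control unit 72 controls the recording unit 70 to record the first sound from the remote controller 20 (step S112).
According to this first operation example, the recording and playback device main body 100 is provided with the setting unit 69, the recording unit 70, the voice recognition unit 71, and the control unit 72. When the voice button 21b of the remote controller 20 is pressed and its signal is received, the main body microphone 81 is set to inactive and the first sound acquired from the microphone 24 of the remote controller 20 is used for voice recognition processing, which improves the accuracy of voice recognition.
For example, sound collection and voice recognition processing based on the main body microphone 81 are normally performed. When the voice button 21b of the remote controller 20 is pressed and the trigger signal for starting recording is received, the control unit 72 takes the trigger as the cue to set the main body microphone 81 to inactive and the microphone 24 of the remote controller 20 to active, and uses the first sound collected by the remote controller 20, which is close to the speaker, for voice recognition processing. The high-quality voice of the speaker (user) who operated the remote controller 20 can thus be acquired and recognized with high accuracy.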
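The flow of the first operation example can be sketched as a small event-driven model. This is a toy illustration under stated assumptions: the class and method names and the signal strings such as "VOICE_BUTTON" are hypothetical, and recognition, search, and display (steps S104 to S107) are left out. Step numbers in the comments refer to FIG. 2 as described in the text.

```python
class MainBody:
    """Toy model of the first operation example (condition "1.")."""

    def __init__(self):
        self.body_mic_active = True  # S101: main body microphone active after start-up
        self.recordings = []         # (source, sound) pairs kept by the recording unit

    def on_body_sound(self, sound):
        """S103: record sound from the main body microphone while it is active."""
        if not self.body_mic_active:
            return False
        self.recordings.append(("second", sound))
        return True

    def on_ir_signal(self, signal):
        """S102/S108: classify a received IR signal."""
        if signal != "VOICE_BUTTON":
            return "other_function"        # S109: handle the corresponding function
        self.body_mic_active = False       # S110: stop the main body microphone
        return "wait_for_remote_sound"

    def on_remote_sound(self, sound):
        """S111/S112: record the first sound sent from the remote controller."""
        self.recordings.append(("first", sound))

    def sound_for_recognition(self):
        """Return the most recently recorded (source, sound) pair, if any."""
        return self.recordings[-1] if self.recordings else None
```

The text does not say when the main body microphone becomes active again after the remote controller's sound has been processed, so the sketch does not model that transition.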
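Next, the second operation example of the recording and playback device 1, corresponding to condition "2.", is described with reference to the flowchart of FIG. 3. In this second operation example, operations identical to those of the first operation example are given the same reference signs and their description is omitted.
In this second operation example, when the recording and playback device main body 100 starts up, the control unit 72 sets the main body microphone 81 to active and collects sound from around the main body microphone 81 (step S101 in FIG. 3).
If no signal resulting from an operation of the voice button 21b of the remote controller 20 is received while the main body microphone 81 is collecting sound (No in step S102), the control unit 72 operates in the same manner as in the first operation example (steps S103 to S107).
On the other hand, if the user operates a button 21 of the remote controller 20 while the main body microphone 81 is collecting sound (step S101), the signal processing unit 22 in the remote controller 20 generates a signal corresponding to the button 21, and the generated signal is transmitted from the IR transmitter 23.
Here, when, for example, the voice button 21b, which is a specific button of the remote controller 20, is pressed, the signal processing unit 22 sets the microphone 24 to active and sound collection by the microphone 24 starts.
When the user then speaks toward the microphone 24 of the remote controller 20, the user's voice is collected by the microphone 24, subjected to sound processing, and transmitted from the BT communication unit 26.
In the recording and playback device main body 100, when the IR signal transmitted from the remote controller 20 is received by the IR receiver 18 (Yes in step S102), the control unit 72 determines whether the signal is the signal of the voice button 21b (step S108).
If the result of the determination is that it is not the signal of the voice button 21b (No in step S108), control of the function corresponding to the signal is performed (step S109).
On the other hand, when the received signal is the signal of the voice button 21b (Yes in step S108), the control unit 72 waits to receive sound from the remote controller 20 and, when that sound is received (step S121), controls the recording unit 70 to record the sound from the remote controller 20 (step S122). Note that during this period the main body microphone 81 also remains active, so recording of the sound collected by the main body microphone 81 also continues (step S103).
Next, the control unit 72 refers to the conditions in the flash memory 68. Since condition "2." governing this operation is to have the voice recognition unit 71 recognize the first sound obtained from the remote controller 20 when a signal is received due to operation of the voice button 21b of the remote controller 20, the control unit 72 inputs, of the two sounds recorded separately by the recording unit 70, the first sound obtained from the remote controller 20 to the voice recognition unit 71 and causes the voice recognition unit 71 to perform voice recognition processing (S123). After that, the operation using the voice recognition result of the voice recognition unit 71 is the same as in the first embodiment.
According to this second operation example, when a signal is received due to operation of the voice button 21b of the remote controller 20, the control unit 72 inputs, of the two sounds (the first sound and the second sound) recorded separately by the recording unit 70, the recorded first sound from the remote controller 20 to the voice recognition unit 71 and causes the voice recognition unit 71 to perform voice recognition processing.
For example, when the trigger for starting recording is the start-up of the recording and playback device main body 100 or the pressing of the voice button 21b of the remote controller 20, that trigger is taken as the cue to record the second sound of the main body microphone 81 and the first sound of the microphone 24 of the remote controller 20 simultaneously. If the source of the trigger is the remote controller 20, which is close to the speaker (user), the sound collected by the microphone 24 of the remote controller 20 is taken for voice recognition processing. Performing recognition on the high-quality sound obtained by the remote controller 20 close to the speaker, out of the multiple sounds recorded at the same time, improves the accuracy of voice recognition.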
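The second operation example keeps both recordings and chooses between them afterwards. A minimal sketch, assuming sound arrives as byte chunks tagged with its source; the class name, the source labels "first" and "second", and the fallback to the main body recording when nothing has arrived from the remote controller are assumptions made for illustration.

```python
class DualRecorder:
    """Records both sounds separately and picks one for recognition."""

    def __init__(self):
        self.tracks = {"first": [], "second": []}

    def record(self, source, chunk):
        """Append a chunk of bytes to the track of the given source."""
        if source not in self.tracks:
            raise ValueError("unknown source: " + source)
        self.tracks[source].append(chunk)

    def input_for_recognition(self, voice_button_received):
        """Return (source, sound) to feed to voice recognition.

        With the voice-button signal received and a remote recording present,
        the remote controller's sound ("first") is used; otherwise the main
        body microphone's sound ("second") is used.
        """
        use_first = voice_button_received and self.tracks["first"]
        source = "first" if use_first else "second"
        return source, b"".join(self.tracks[source])
```

Unlike the first operation example, nothing is discarded at recording time, so the unused track remains available, for instance for the quality comparison of the third operation example.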
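Next, the third operation example of the recording and playback device 1, corresponding to condition "3.", is described with reference to the flowchart of FIG. 4. In this third operation example, operations identical to those of the second operation example are given the same reference signs and their description is omitted.
In this third operation example, the operations from the start-up of the recording and playback device main body 100 to the recording of the sound collected by each microphone are the same as in the second operation example, and their description is omitted.
The control unit 72 refers to the conditions in the flash memory 68 while the two sounds are being recorded separately. Since condition "3." governing this operation is to use whichever of the two recorded sounds has the better sound quality, the control unit 72 performs a sound quality check on the two sounds recorded separately by the recording unit 70, inputs whichever of the two checked sounds has the higher voice recognition rate to the voice recognition unit 71, and causes the voice recognition unit 71 to perform voice recognition processing (S131, S132). After that, the operation using the voice recognition result of the voice recognition unit 71 is the same as in the first embodiment and the second embodiment.
According to this third operation example, the control unit 72 checks the quality of each of the multiple sounds (the first sound and the second sound) acquired from the microphone 24 of the remote controller 20 and from the main body microphone 81 and then recorded, and uses the best-quality sound among them for voice recognition processing, so the accuracy of voice recognition can be improved.
Note that in this third operation example the trigger for the microphone 24 to start sound collection is the same as in the second operation example, but alternatively each microphone may collect sound at all times, with the voice recognition processing timed to when the voice button 21b of the remote controller 20 is pressed, that is, when the signal of the voice button 21b is received.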
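The always-collecting variant just described can be sketched with bounded buffers that are evaluated only when the voice button is pressed. Everything here is an assumption made for illustration: the class name, the buffer size, and in particular the use of RMS amplitude as the quality measure, which stands in for sound pressure and is not the sound quality check defined in the source.

```python
from collections import deque
import math


def rms(samples):
    """Root-mean-square amplitude, used here as a stand-in quality measure."""
    samples = list(samples)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class AlwaysOnCollector:
    """Both microphones collect continuously into bounded buffers."""

    def __init__(self, max_samples=16000):
        self.buffers = {
            "first": deque(maxlen=max_samples),   # remote controller microphone
            "second": deque(maxlen=max_samples),  # main body microphone
        }

    def collect(self, source, samples):
        """Append samples; the oldest are dropped once the buffer is full."""
        self.buffers[source].extend(samples)

    def on_voice_button(self):
        """On the voice-button signal, return (source, samples) of the better sound."""
        best = max(self.buffers, key=lambda name: rms(self.buffers[name]))
        return best, list(self.buffers[best])
```

Bounding the buffers keeps memory use fixed while the device runs; how much audio to retain before the button press is a design choice the text leaves open.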
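As described above, the recording and playback device 1 of this embodiment is configured so that microphones (sound collecting units) are provided in both the remote controller 20 (external terminal) and the recording and playback device main body 100 (electronic device) to collect sound, and the collected sound that matches conditions "1." to "3." is used for voice recognition processing. This improves the ease with which the operator (speaker) gives instructions and allows the microphones 24 and 81 to be used selectively according to the speaker's situation so that the sound collected by each of the microphones 24 and 81 is used effectively.
In addition, in this embodiment, using the microphones 24 and 81 selectively according to the speaker's situation switches the sound collecting unit to, for example, the microphone 24 close to the speaker, so that high-quality sound data can be acquired. It also has the effect of preventing the main body microphone 81 from responding erroneously while the microphone 24 of the remote controller 20 is collecting sound.
Note that the above embodiment shows an example in which the microphones 24 and 81 are provided in the remote controller 20 and the recording and playback device main body 100, respectively, but microphones may instead be provided in each of multiple external terminals (a first remote controller and a second remote controller), with multiple sounds transmitted from the remote controllers to the recording and playback device main body 100.
That is, the recording and playback device main body 100 may be configured to acquire the first sound collected by the microphone of the first remote controller and the second sound collected by the microphone of the second remote controller, select the sound that matches the condition preset in the recording and playback device main body 100, and use it for voice recognition processing.
An embodiment of the present invention has been described, but it is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the scope of the inventions described in the claims and their equivalents.
The constituent elements of the recording and playback device 1 shown in the above embodiment may also be realized by a program installed in a memory such as a hard disk device of a computer. Alternatively, the program may be stored in advance in a computer-readable non-volatile storage medium so that the computer realizes the functions of the above-described solution of the present application by reading the program from the non-volatile storage medium.
Examples of storage media include recording media such as CD-ROMs, flash memory, and removable media. Furthermore, the constituent elements may be stored in a distributed manner on different computers connected via a network, and the functions of the present invention may be realized by communication between the computers on which the constituent elements operate.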

Claims (7)

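  1. An electronic device connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first sound from its own surroundings, wherein
    the electronic device comprises:
    a first sound acquisition unit that acquires, from the external terminal, the first sound collected by the first sound collecting unit of the external terminal;
    a second sound collecting unit that collects a second sound from its own surroundings;
    a second sound acquisition unit that acquires the second sound collected by the second sound collecting unit;
    a voice recognition unit that performs voice recognition processing on the input first sound and/or second sound; and
    a control unit that inputs whichever of the first sound and the second sound matches a preset condition to the voice recognition unit for voice recognition processing.
  2. The electronic device according to claim 1, wherein
    the electronic device further comprises a recording unit that records the first sound acquired by the first sound acquisition unit and the second sound acquired by the second sound acquisition unit, and
    the control unit causes the voice recognition unit to recognize whichever of the first sound and the second sound recorded by the recording unit matches the condition.
  3. The electronic device according to claim 2, wherein
    the electronic device further comprises a receiving unit that receives an instruction signal transmitted from the external terminal,
    the condition is reception of a specific instruction signal from the external terminal, and
    when the specific instruction signal from the external terminal is received by the receiving unit, the control unit inputs, of the obtained first sound and second sound, the first sound obtained from the external terminal to the voice recognition unit for recognition.
  4. The electronic device according to claim 1, wherein
    the electronic device further comprises a receiving unit that receives an instruction signal transmitted from the external terminal,
    the condition is reception of a specific instruction signal from the external terminal, and
    when the specific instruction signal from the external terminal is received by the receiving unit, the control unit stops the operation of the second sound collecting unit and performs voice recognition processing on the first sound obtained from the external terminal.
  5. The electronic device according to claim 1 or 2, wherein
    the condition is to use the sound with the higher voice recognition rate, and
    the control unit causes the voice recognition unit to perform voice recognition processing on whichever of the first sound and the second sound has the higher voice recognition rate.
  6. A computer-readable non-volatile storage medium storing a program or instructions for operating an electronic device, the electronic device being connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first sound from its own surroundings, wherein
    the program or instructions cause the electronic device to function as:
    a first sound acquisition unit that acquires, from the external terminal, the first sound collected by the first sound collecting unit of the external terminal;
    a second sound acquisition unit that acquires a second sound, the second sound being sound collected from the surroundings of a second sound collecting unit by the second sound collecting unit provided in the electronic device;
    a voice recognition unit that performs voice recognition processing on the input first sound and/or second sound; and
    a control unit that inputs whichever of the first sound and the second sound matches a preset condition to the voice recognition unit for voice recognition processing.
  7. A voice recognition method for an electronic device, the electronic device being connected wirelessly or by wire to an external terminal having a first sound collecting unit that collects a first sound from its own surroundings, wherein
    the voice recognition method comprises the steps of:
    acquiring, from the external terminal, the first sound collected by the first sound collecting unit of the external terminal;
    acquiring a second sound, the second sound being sound collected from the surroundings of a second sound collecting unit by the second sound collecting unit provided in the electronic device; and
    performing voice recognition processing on whichever of the first sound and the second sound matches a preset condition.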
PCT/CN2020/101150 2019-07-11 2020-07-09 Electronic device, non-volatile storage medium, and voice recognition method WO2021004511A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080002706.5A CN112243588B (zh) 2019-07-11 2020-07-09 Electronic device, non-volatile storage medium, and voice recognition method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019129339A JP7216621B2 (ja) 2019-07-11 2019-07-11 Electronic device, program, and voice recognition method
JP2019-129339 2019-07-11

Publications (1)

Publication Number Publication Date
WO2021004511A1 true WO2021004511A1 (zh) 2021-01-14

Family

ID=74114403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/101150 WO2021004511A1 (zh) 2019-07-11 2020-07-09 Electronic device, non-volatile storage medium, and voice recognition method

Country Status (3)

Country Link
JP (1) JP7216621B2 (zh)
CN (1) CN112243588B (zh)
WO (1) WO2021004511A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103187063A (zh) * 2011-12-30 2013-07-03 三星电子株式会社 Electronic apparatus and method of controlling the electronic apparatus
CN103716669A (zh) * 2012-09-28 2014-04-09 三星电子株式会社 Electronic apparatus and control method thereof
CN108600810A (zh) * 2018-05-03 2018-09-28 四川长虹电器股份有限公司 Television system and method for improving voice recognition accuracy using a voice remote controller
EP3474557A1 (en) * 2016-07-05 2019-04-24 Samsung Electronics Co., Ltd. Image processing device, operation method of image processing device, and computer-readable recording medium
CN109767766A (zh) * 2019-01-23 2019-05-17 海信集团有限公司 Voice recognition method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03284589A (ja) * 1990-03-30 1991-12-16 Toshiba Corp Voice registration device for elevator
JP2001222291A (ja) 2000-02-08 2001-08-17 Kenwood Corp Control device using voice recognition device
ES2273870T3 (es) 2000-07-28 2007-05-16 Koninklijke Philips Electronics N.V. System for controlling an apparatus with voice instructions
JP4724943B2 (ja) 2001-04-05 2011-07-13 株式会社デンソー Voice recognition device
JP2011118822A (ja) 2009-12-07 2011-06-16 Nec Casio Mobile Communications Ltd Electronic device, utterance detection device, voice recognition operation system, voice recognition operation method, and program
JP2012047924A (ja) 2010-08-26 2012-03-08 Sony Corp Information processing apparatus, information processing method, and program
CN103594088A (zh) * 2013-11-11 2014-02-19 联想(北京)有限公司 Information processing method and electronic device
CN109542386B (zh) * 2017-09-22 2022-05-06 卡西欧计算机株式会社 Recording device

Also Published As

Publication number Publication date
CN112243588A (zh) 2021-01-19
JP7216621B2 (ja) 2023-02-01
JP2021015180A (ja) 2021-02-12
CN112243588B (zh) 2022-07-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20837444

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20837444

Country of ref document: EP

Kind code of ref document: A1