WO2019037732A1 - Television set and television system with a microphone array - Google Patents

Television set and television system with a microphone array

Info

Publication number
WO2019037732A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
audio signal
audio
television system
signal
Prior art date
Application number
PCT/CN2018/101657
Other languages
English (en)
French (fr)
Inventor
李新
卢铁军
Original Assignee
深圳创维-Rgb电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳创维-Rgb电子有限公司
Publication of WO2019037732A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/04: Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups

Definitions

  • The present invention relates to the field of televisions, and more particularly to a television set and a television system with a microphone array.
  • The microphone that a conventional smart TV uses for voice reception is easily disturbed by noise, which comes from two sources: the sound output by the TV itself and sound from the external environment.
  • When the TV is working normally, it emits sound through its loudspeaker. Once the microphone function is activated, the microphone picks up the loudspeaker output, and this interference is passed back to the main control chip and output again through the loudspeaker. The sound output by the TV is then mixed with the interference picked up by the microphone, so the normal sound carries considerable noise, which is especially noticeable when the surroundings are quiet.
  • When the external environment is noisy and the microphone function is turned on, the microphone is easily disturbed by ambient sound while recognizing a spoken command. As a result, recognition sensitivity is low, the feedback content is incorrect, and the user experience suffers.
  • The distance and position of the user relative to the TV are uncontrollable, so the signals captured by the microphone differ with distance and angle, resulting in poor recognition.
  • The existing technology therefore needs to be improved.
  • The main object of the present invention is to provide a television set and a television system with a microphone array, aiming to solve the technical problem in the prior art that low recognition sensitivity in voice interaction and incorrect feedback content lead to a poor user experience.
  • To achieve this object, the present invention provides a television system with a microphone array, the television system comprising: a microphone array, a processor, an intelligent voice server, and a loudspeaker;
  • the microphone array is configured to collect a first sound audio signal and send the first sound audio signal to the processor;
  • the processor is configured to perform echo cancellation and interference sound filtering on the first sound audio signal to obtain an original audio signal, convert the original audio signal into a digital signal, and send the digital signal to the intelligent voice server;
  • the intelligent voice server is configured to acquire response voice data that matches the digital signal and send the response voice data to the loudspeaker;
  • the loudspeaker is configured to output the response voice data.
  • The processor is further configured to analyze whether the digital signal includes target data corresponding to a preset keyword; if the digital signal includes the target data, the intelligent voice server is set to an on state, and if it does not, the intelligent voice server is set to an off state.
  • The intelligent voice server is further configured to determine, when in the on state, whether its local database contains local data matching the digital signal, and to use that local data as the response voice data when such a match exists.
  • The intelligent voice server is further configured to search the Internet for related resource data matching the digital signal when the local database contains no matching local data, and to use the related resource data as the response voice data.
  • The processor is further configured to receive the first sound audio signal and a loudspeaker return audio signal corresponding to the loudspeaker sound; compare the loudspeaker return audio signal with the first sound audio signal and eliminate from the first sound audio signal the audio component corresponding to the loudspeaker sound; take the first sound audio signal with that component eliminated as a second sound audio signal; identify in the second sound audio signal the original audio signal and an interference audio signal; and eliminate the interference audio signal to obtain the original audio signal.
  • The television system further includes an input/output buffer;
  • the input/output buffer is configured to temporarily store the first sound audio signal and the loudspeaker return audio signal corresponding to the loudspeaker sound, synchronize the loudspeaker return audio signal with the first sound audio signal, and send the synchronized first sound audio signal and loudspeaker return audio signal to the processor.
  • The television system further includes an automatic gain controller;
  • the automatic gain controller is configured to perform automatic gain control on the first sound audio signal and the loudspeaker return audio signal corresponding to the loudspeaker sound, so as to guarantee the output strength of both signals, and to send the gain-controlled first sound audio signal and loudspeaker return audio signal to the processor.
  • The intelligent voice server is further configured to establish a wireless connection with an external smart home appliance, generate a control signal according to the digital signal, and send the control signal to the external smart home appliance to implement voice control.
  • The processor is further configured to filter the first sound audio signal and the loudspeaker return audio signal according to a preset frequency range.
  • The present invention also provides a television set comprising the above television system with a microphone array.
  • In the present invention, the microphone array collects a first sound audio signal and sends it to the processor. The processor performs echo cancellation and interference sound filtering on the first sound audio signal to obtain an original audio signal, converts the original audio signal into a digital signal, and sends the digital signal to the intelligent voice server. The intelligent voice server acquires response voice data that matches the digital signal and sends it to the loudspeaker, which outputs it. This makes the whole voice interaction process more flexible and simple, improves voice recognition sensitivity more effectively, and significantly improves the accuracy of the voice interaction feedback and the user experience.
  • FIG. 1 is a structural block diagram of a first embodiment of the television set and television system with a microphone array of the present invention;
  • FIG. 2 is a schematic diagram of an arrangement of the microphone array in the television set and television system with a microphone array of the present invention;
  • FIG. 3 is a structural block diagram of a second embodiment of the television set and television system with a microphone array of the present invention;
  • FIG. 4 is a flow chart of echo cancellation and interference sound filtering in the television set and television system with a microphone array of the present invention.
  • Referring to FIG. 1, a structural block diagram of the first embodiment of the television set and television system with a microphone array of the present invention is shown.
  • The television system includes: a microphone array 10, a processor 20, an intelligent voice server 30, and a loudspeaker 40;
  • the microphone array 10 is configured to collect a first sound audio signal and send the first sound audio signal to the processor 20;
  • the microphone array 10 is further configured to determine a collection position for an external sound source, collect the first sound audio signal at the collection position, and send the first sound audio signal to the processor 20;
  • the microphone array 10 has far-field recognition and sound source localization functions and is composed of a certain number of acoustic sensors (generally microphones). The position of the external sound source is determined by the sound source localization function and taken as the collection position; the sound signal collected at the collection position is taken as the first sound audio signal and sent to the processor 20;
  • The microphone array 10 is a plurality of microphones arranged according to certain rules, such as the spacing between microphones, the number of microphones, and their direction. FIG. 2 is a schematic diagram of an arrangement of the microphone array in the television set and television system of the present invention.
  • The arrangement shown in FIG. 2 is one of many possible arrangements; other arrangement rules may of course be used, and this embodiment does not limit this. In FIG. 2, the number of microphones along the X axis is greater than or equal to 1, the number of microphones along the Y axis is greater than or equal to 1, and the total number of microphones is greater than or equal to 2.
  • Under normal circumstances, the user is at some distance from the smart television during voice interaction, and within a given space considerable environmental noise also interferes with the microphone array's recognition of the user's voice. The microphone array 10 uses the advantage of multiple (at least three) microphones together with far-field speech recognition to filter out sounds other than the user's voice arriving from the user's direction, achieving accurate recognition within a certain distance. The microphones locate the user's direction from the times at which each of them receives the user's voice, noise from other directions is filtered out by software algorithms, and far-field recognition assists in achieving more accurate recognition.
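The arrival-time localization described above can be illustrated with a minimal sketch for a single pair of microphones. The patent does not specify an algorithm; the brute-force cross-correlation, the function names `estimate_delay` and `arrival_angle`, and the 343 m/s speed of sound are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag, in samples, by which sig_b trails sig_a.

    Uses a brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay into an angle from broadside, in degrees."""
    path_difference = lag_samples / sample_rate * SPEED_OF_SOUND
    # Clamp so that measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.degrees(math.asin(ratio))
```

With more than two microphones, the same pairwise delays could be combined to resolve direction in two dimensions; that step is omitted here.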
  • The processor 20 is configured to receive the first sound audio signal, perform echo cancellation and interference sound filtering on it to obtain an original audio signal, convert the original audio signal into a digital signal, and send the digital signal to the intelligent voice server 30;
  • the processor 20 has audio processing capability and can perform echo cancellation and interference sound filtering on the first sound audio signal. The original audio signal is what remains of the first sound audio signal collected by the microphone array 10 after the interference signal and the echo signal have been removed. Once the original audio signal is obtained, it is converted into a digital signal and sent to the intelligent voice server 30;
  • Further, the processor 20 is configured to receive the first sound audio signal and the loudspeaker return audio signal corresponding to the loudspeaker sound; compare the loudspeaker return audio signal with the first sound audio signal and eliminate from the first sound audio signal the audio component corresponding to the loudspeaker sound; take the first sound audio signal with that component eliminated as a second sound audio signal; identify in the second sound audio signal the original audio signal and the interference audio signal; and eliminate the interference audio signal to obtain the original audio signal.
  • After the audio component corresponding to the loudspeaker sound has been eliminated from the first sound audio signal, the second sound audio signal is analyzed to obtain the spectra of the original audio signal and the interference audio signal. The identification may be a real-time comparison of the two signals by a software algorithm that passes the user's original audio signal and filters out the spectrum of the interference audio signal. Other identification methods may of course be used to eliminate the interference sound and the loudspeaker echo; this embodiment does not limit this.
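As an illustration of removing the loudspeaker component using the loudspeaker return signal as a reference, the following is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a common echo-cancellation technique swapped in here for illustration; the patent only speaks of comparing the two signals, and the function name and the parameters `taps`, `mu`, and `eps` are assumptions.

```python
def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    mic       -- samples captured by the microphone (voice plus echo)
    reference -- loudspeaker return samples, aligned with mic
    Returns the residual signal (mic minus the estimated echo).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        energy = sum(x * x for x in history) + eps
        step = mu * error / energy
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

In this sketch the residual corresponds to the second sound audio signal: it still contains the user's voice and any environmental interference, which would be handled by a later stage.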
  • Further, the processor 20 is configured to filter the first sound audio signal and the loudspeaker return audio signal according to a preset frequency range.
  • When filtering the first sound audio signal and the loudspeaker return audio signal, the processor 20 may remove the parts of both signals that fall outside the preset frequency range. This filtering acts as a preliminary screening of the two signals: it improves voice recognition sensitivity, avoids errors that interference would otherwise introduce when the audio signal is converted into a digital signal, and improves the accuracy and efficiency of speech recognition.
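The preliminary screening by frequency range can be sketched as follows. This is only an illustrative approach, assuming a naive discrete Fourier transform on a short frame and a hypothetical function name `band_limit`; the patent does not state how the filter is realized, and a real system would more likely use an FFT or a time-domain filter.

```python
import cmath
import math


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Zero every frequency component outside [low_hz, high_hz].

    Uses an O(n^2) discrete Fourier transform, so it is only suitable
    for short frames.
    """
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror the bins below it for real-valued input.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
```

For example, calling `band_limit(frame, 8000, 300, 3400)` would keep a typical telephone speech band; those limits are an assumed example, not values from the patent.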
  • In a specific implementation, the processor 20 may use a software algorithm to analyze the sounds arriving from different directions at the microphone array, identify who is speaking and from which direction, and label the distinct sound spectra from each direction, so that different people, or several people at once, can be recognized and answered one by one.
  • Further, the processor 20 is configured to analyze whether the digital signal includes target data corresponding to a preset keyword; if the digital signal includes the target data, the intelligent voice server 30 is set to an on state, and if it does not, the intelligent voice server 30 is set to an off state.
  • By identifying keywords in the digital signal, the processor 20 can quickly start and stop the intelligent voice server 30, which improves the efficiency of voice interaction;
  • the preset keyword may be a default keyword of the television system or a keyword set by the user, which is not limited in this embodiment.
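The keyword gating can be sketched as below. The patent describes matching target data inside the digital signal; this sketch assumes, purely for illustration, that the signal has already been turned into text, and the names `VoiceServer` and `update_wake_state` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoiceServer:
    """Stand-in for the intelligent voice server's on/off state."""
    active: bool = False


def update_wake_state(server, transcript, keywords):
    """Switch the server on if any preset keyword occurs, otherwise off."""
    text = transcript.lower()
    server.active = any(keyword.lower() in text for keyword in keywords)
    return server.active
```

The keyword list would hold either the system default or the user's own choice, as the description allows both.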
  • In a specific implementation, when the television is working and the loudspeaker is outputting sound normally, the processor 20 performs echo cancellation and interference sound filtering on the first sound audio signal collected by the microphone array 10 to obtain the original audio signal, converts the original audio signal into a digital signal, and analyzes whether the digital signal contains one of the preset keywords. If it does, the keyword "wakes up" the intelligent voice server 30, and a corresponding control instruction is generated to lower the system sound output, reducing the interference of the television's own volume with the voice feedback. The intelligent voice server 30 processes the digital signal promptly, and the voice feedback is output through the loudspeaker at normal volume.
  • The intelligent voice server 30 is configured to receive the digital signal, obtain response voice data that matches the digital signal, and send the response voice data to the loudspeaker 40;
  • the loudspeaker 40 is configured to receive the response voice data and output it.
  • The loudspeaker 40 may be the television's built-in loudspeaker or an external loudspeaker connected to the television, which is not limited in this embodiment.
  • Further, the intelligent voice server 30 is configured to send the response voice data to the processor 20, and the processor 20 generates a corresponding control instruction according to the response voice data to perform a corresponding operation;
  • the corresponding operation may be to control an external device, for example turning an external loudspeaker on or off or adjusting its volume, or to control the TV itself, for example calling up a display page or performing operations such as channel change, search, playback, return, and pause according to the control instruction. Other operations may of course be performed according to the control instruction, which is not limited in this embodiment.
  • In a specific implementation, the processor 20 and the intelligent voice server 30 are connected through a software function interface and a hardware function interface, so that the intelligent voice server 30 can send the response voice data matched with the digital signal to the processor 20, and the processor 20 can generate a corresponding control instruction according to the response voice data to perform a corresponding operation. For example, the user says, "Which variety shows are currently being played?" After the microphone array 10 collects the audio signal, the processor 20 performs echo cancellation and interference sound filtering to obtain the original audio signal and converts it into a digital signal. The intelligent voice server 30 then finds response voice data matching the digital signal and sends it to the processor 20 and the loudspeaker 40. The processor 20 generates a corresponding control instruction according to the response voice data, searches for the variety shows currently being played, displays the search results on the TV screen, and generates corresponding feedback voice data that is sent to the loudspeaker 40. The loudspeaker 40 outputs the response voice data and the feedback voice data, that is, "OK, searching for you" and "Seven variety shows currently being played have been found; you can select one to watch on the search results display". Other scenarios may of course be handled in other ways, which is not limited in this embodiment.
  • After receiving the digital signal, the intelligent voice server 30 performs big data analysis and processing on the digital signal, obtains response voice data that matches it, and sends the response voice data to the loudspeaker 40;
  • further, the intelligent voice server 30 is configured to determine, when in the on state, whether its local database contains local data matching the digital signal, and to use that local data as the response voice data when such a match exists;
  • that is, when the intelligent voice server 30 is in the on state, data matching the digital signal may be searched for in the local database, and if found, that data is used as the response voice data;
  • in a specific implementation, the intelligent voice server 30 may use a deep learning algorithm to recognize the data in the local database, generate a recognition result, and establish response mapping relationships between items of data in the local database according to the recognition result. When matching the digital signal against the local database, the server first searches the local database for data having the same meaning as the digital signal, then uses the response mapping relationship to find the data that stands in a response relationship to it, and uses that data as the response voice data.
  • Data matching the digital signal may of course be found in the local database by other means; this embodiment does not limit this.
  • The local database of the intelligent voice server 30 continuously accumulates and updates voice data and uses deep learning algorithms for speech recognition, semantic recognition, voiceprint recognition, and so on. As this big data keeps improving, voice data is provided more accurately, which improves the accuracy of recognizing the voice data and of feeding back the response voice data. Because the continuously accumulated local database stores the response mapping relationships established between its data according to the recognition results, the TV can behave intelligently and respond accurately by voice even when it is not connected to the Internet. Accurate voice response can replace the remote control and free the user's hands, enabling direct interaction between human and machine and improving the user experience.
  • Further, the intelligent voice server 30 is configured to search the Internet for related resource data matching the digital signal when the local database contains no matching local data, and to use the related resource data as the response voice data.
  • The related resource data searched for through the Internet is divided into internal resource data and external resource data. The internal resource data is resource data in the cloud back-end database matched to the current television set; the external resource data is resource data captured on the Internet that matches the digital signal. In practical applications, the local database cannot store too much data because it is limited by the memory size of the television set. When the local database contains no local data matching the digital signal, the server preferably first searches the internal resource data through the Internet for related resource data matching the digital signal; if none is found there, it searches the external resource data. Compared with the external resource data, the internal resource data is more open, free of charge, and more targeted, and searching it is faster and consumes fewer computing resources than searching the external resource data.
  • When related resource data matching the digital signal is found, it is used as the response voice data. If many matching items are found, they are sorted from high to low by their degree of matching with the digital signal and the item with the highest degree of matching is selected as the response voice data; alternatively, the data that best fits the user's speaking habits, as learned by surveying those habits, is selected as the response voice data.
  • The optimal item may of course be selected from the multiple related resource data by other means; this embodiment does not limit this.
  • In a specific implementation, after receiving the digital signal, the intelligent voice server 30 performs big data analysis and processing on it and compares it against the data in the local database. If the local database cannot match the digital signal, that is, if the resources of the local database are too limited, the server searches the Internet for related resource data matching the digital signal. It searches the internal resource data first; if nothing is found, it searches the external resource data, for example on open platforms or web pages, for the data that best matches the digital signal.
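The lookup order described above (local database, then internal cloud resources, then external resources, with candidates ranked by matching degree) can be sketched as follows. The word-overlap score, the 0.5 threshold, the use of plain dictionaries as data sources, and every function name are assumptions made for illustration; the patent does not define how the matching degree is computed.

```python
def match_score(query, candidate):
    """Toy matching degree: fraction of query words present in the candidate."""
    query_words = set(query.lower().split())
    candidate_words = set(candidate.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & candidate_words) / len(query_words)


def best_match(query, candidates, threshold=0.5):
    """Return the highest-scoring candidate, or None if none reaches the threshold."""
    ranked = sorted(candidates, key=lambda c: match_score(query, c), reverse=True)
    if ranked and match_score(query, ranked[0]) >= threshold:
        return ranked[0]
    return None


def resolve_response(query, local_db, internal_db, external_db):
    """Try each source in priority order and return (source_name, response)."""
    sources = (("local", local_db), ("internal", internal_db), ("external", external_db))
    for name, source in sources:
        hit = best_match(query, list(source))
        if hit is not None:
            return name, source[hit]
    return None, None
```

Each source here maps a stored utterance to its response; in a real system the internal and external lookups would be network calls rather than dictionary reads.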
  • Because the related resource data is obtained through the Internet, the local database can be updated in real time and the local resource capacity expanded, which optimizes and improves the efficiency of voice recognition and response and makes voice interaction more intelligent and user-friendly.
  • Cooperation with multiple voice-resource solution providers and with multiple web search engine resources can improve compatibility across platforms.
  • The intelligent voice server 30 preferentially selects the solution with the fastest feedback and the highest accuracy, so as to provide users with the highest-quality and most desired content.
  • Further, the intelligent voice server 30 is configured to establish a wireless connection with an external smart home appliance, generate a control signal according to the digital signal, and send the control signal to the external smart home appliance to implement voice control.
  • The intelligent voice server 30 can be used together with a smart home by establishing a wireless connection with an external smart home appliance; it may also be connected to an external smart home appliance by other means.
  • The TV converts the received sound data into control data and transmits the control data through wireless communication to the other smart home appliances interconnected with it, so that smart home appliances can be controlled by voice and interconnection is achieved.
  • The wireless connection may be made through Wireless Fidelity (WiFi) or through Bluetooth, which is not limited in this embodiment.
  • In this embodiment, the microphone array collects the first sound audio signal and sends it to the processor. The processor performs echo cancellation and interference sound filtering on the first sound audio signal to obtain an original audio signal, converts the original audio signal into a digital signal, and sends the digital signal to the intelligent voice server. The intelligent voice server acquires response voice data that matches the digital signal and sends it to the loudspeaker, which outputs it. This makes the whole voice interaction process more flexible and simple, improves voice recognition sensitivity more effectively, and significantly improves the accuracy of the voice interaction feedback and the user experience.
  • Further, FIG. 3 is a structural block diagram of a second embodiment of the television set and television system with a microphone array of the present invention. Referring to FIG. 3, the television system also includes an input/output buffer 50 and an automatic gain controller 60;
  • the input/output buffer 50 is configured to temporarily store the first sound audio signal and the loudspeaker return audio signal corresponding to the loudspeaker sound, synchronize the loudspeaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and loudspeaker return audio signal to the processor 20.
  • The automatic gain controller 60 is configured to perform automatic gain control on the first sound audio signal and the loudspeaker return audio signal corresponding to the loudspeaker sound, so as to guarantee the output strength of both signals, and to send the gain-controlled first sound audio signal and loudspeaker return audio signal to the processor 20.
  • The input/output buffer 50 coordinates and buffers: it temporarily stores the first sound audio signal and the loudspeaker return audio signal, synchronizes the loudspeaker return audio signal with the first sound audio signal, and sends the synchronized signals to the processor 20, so that a fast processor (such as a CPU) and slower peripherals can transfer data in synchrony;
  • the automatic gain controller 60 adjusts the output signals, that is, the first sound audio signal and the loudspeaker return audio signal, to guarantee the output signal strength.
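A minimal sketch of automatic gain control on one frame of samples is shown below. It assumes a simple block-based scheme that scales the frame toward a target RMS level with a capped gain; the function names and the `target_rms` and `max_gain` values are illustrative assumptions, since the patent does not describe how the automatic gain controller 60 works internally.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_agc(samples, target_rms=0.1, max_gain=20.0):
    """Scale a block toward target_rms without exceeding max_gain."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silence stays silent, nothing to amplify
    gain = min(target_rms / level, max_gain)
    return [s * gain for s in samples]
```

Applying the same function to both the microphone frame and the loudspeaker return frame would bring the two signals to comparable levels before they are compared; the gain cap limits how much background noise is amplified during quiet passages.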
  • Further, the television set and television system with a microphone array of the present invention also include a digital sampler and a digital filter. FIG. 4 shows the flow of echo cancellation and interference sound filtering in the television set and television system of the present invention.
  • The echo cancellation and interference sound filtering flow is as follows. The microphone array receives the first sound audio signal, which includes the original sound from different directions in the external environment, the interference sound, and the sound of the television itself. The microphone array sends the received first sound audio signal to the digital sampler, and the digital sampler also samples, through the line return path, the analog electrical signal output by the loudspeaker. The digital sampler converts the acquired sound signals into the pulse code modulation (PCM) signal format, and the digital filter removes the unwanted portions of the signal, such as random noise, to extract the useful portions, such as components within the audible frequency range of the human ear.
  • The useful signal is then passed to the input/output buffer, which performs level conversion on the signal from the digital filter and coordinates and buffers between the fast CPU and the slower peripherals so that data transfer is synchronized.
  • The signal sampled from the loudspeaker likewise has to be prepared for the subsequent echo cancellation. The processed data is sent to the automatic gain controller together with the data collected and processed from the microphone array, and the automatic gain controller performs automatic gain control on the first sound audio signal and the loudspeaker return audio signal, adjusting the output signals to guarantee their strength.
  • the processor performs phase-locked synchronization processing on the signal output by the automatic gain controller, and compiles and converts the two signals so that they can be stored and compared through the comparison logic and the system-on-chip (System) On Chip, SOC) side logic operation and software algorithm add the two signals, and remove the signal transmitted from the microphone array, that is, the audio audio signal corresponding to the acoustic sound in the first sound audio signal, that is, the microphone is removed. Received acoustic echo.
  • in this embodiment the input/output buffer performs level conversion on the first sound audio signal and the speaker return audio signal so that the two signals have the same voltage; after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, the automatic gain controller applies automatic gain control to both signals to guarantee their output strength. This raises the output strength of the two signals and achieves synchronous transmission of the speaker return audio signal and the first sound audio signal, which further improves the accuracy and efficiency of voice recognition and improves the user experience.
  • the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation.
  • on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the invention.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

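The present invention discloses a television and a television system with a microphone array. A microphone array captures a first sound audio signal and sends it to a processor. The processor performs echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, converts the original-voice audio signal into a digital signal, and sends the digital signal to an intelligent voice server. The intelligent voice server obtains response voice data matching the digital signal and sends it to a speaker, which outputs the response voice data. This makes the whole voice interaction process more flexible and simple, improves voice recognition sensitivity more effectively, and markedly improves the accuracy of the voice interaction feedback and the user experience.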

Description

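Television and Television System with a Microphone Array
Technical Field
The present invention relates to the field of televisions, and in particular to a television and a television system with a microphone array.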
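Background
With the development of technology and the spread of smart digital devices, human-computer interaction functions and systems are increasingly popular. People usually control a television with a remote control or buttons, whereas voice control bypasses the remote control and buttons: the user controls television functions directly by voice and can interact with the television, which achieves human-computer interaction, brings artificial intelligence to the television, and makes it more convenient for the user.
In a conventional smart television, the microphone responsible for voice reception is easily disturbed by ambient noise; the interference comes from the sound output of the television itself and from sounds in the external environment. When the television is working normally, it emits sound through its loudspeaker. If the microphone function is enabled at that time, the sound output by the loudspeaker is picked up by the microphone and interferes with it; the interference is fed back to the main control chip and output to the loudspeaker through the power amplifier. The sound output by the television is then mixed with the interference produced by the microphone, so the normal sound contains a large noise floor, which is especially noticeable in quiet surroundings. When the external environment is relatively noisy and the microphone function is enabled, recognition of spoken commands is easily disturbed by ambient sound, which results in low recognition sensitivity, incorrect feedback, and a poorer user experience. In addition, when the user uses the television's microphone function, neither the distance to the television nor the user's position can be controlled, so the signals captured by the microphone at different distances and angles differ, which leads to poor recognition. The existing technology therefore needs improvement.
The above content is provided only to aid understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.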
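Summary of the Invention
The main purpose of the present invention is to provide a television and a television system with a microphone array, aiming to solve the technical problem in the prior art that low recognition sensitivity and incorrect feedback in voice interaction result in a poor user experience.
To achieve the above purpose, the present invention provides a television system with a microphone array, the television system comprising: a microphone array, a processor, an intelligent voice server, and a speaker;
the microphone array is configured to capture a first sound audio signal and send the first sound audio signal to the processor;
the processor is configured to perform echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, convert the original-voice audio signal into a digital signal, and send the digital signal to the intelligent voice server;
the intelligent voice server is configured to obtain response voice data matching the digital signal and send the response voice data to the speaker;
the speaker is configured to output the response voice data.
Further, the processor is also configured to analyze whether the digital signal contains target data corresponding to a preset keyword; if the digital signal contains the target data, the intelligent voice server is set to an on state, and if the digital signal does not contain the target data, the intelligent voice server is set to an off state.
Further, the intelligent voice server is also configured, when in the on state, to determine whether local data matching the digital signal exists in a local database of the intelligent voice server and, when such local data exists in the local database, to use the local data as the response voice data.
Further, the intelligent voice server is also configured, when no local data matching the digital signal exists in the local database, to search the Internet for related resource data matching the digital signal and to use the related resource data as the response voice data.
Further, the processor is also configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the first sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.
Further, the television system also comprises an input/output buffer;
the input/output buffer is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor.
Further, the television system also comprises an automatic gain controller;
the automatic gain controller is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee their output strength, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor.
Further, the intelligent voice server is also configured to establish a wireless connection with an external smart home appliance, generate a control signal from the digital signal, and send the control signal to the external smart home appliance to achieve voice control.
Further, the processor is also configured to filter the first sound audio signal and the speaker return audio signal according to a preset frequency range.
In addition, to achieve the above purpose, the present invention also provides a television comprising the above television system with a microphone array.
In the present invention, a microphone array captures a first sound audio signal and sends it to the processor. The processor performs echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, converts the original-voice audio signal into a digital signal, and sends the digital signal to the intelligent voice server. The intelligent voice server obtains response voice data matching the digital signal and sends it to the speaker, which outputs the response voice data. This makes the whole voice interaction process more flexible and simple, improves voice recognition sensitivity more effectively, and markedly improves the accuracy of the voice interaction feedback and the user experience.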
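Brief Description of the Drawings
FIG. 1 is a structural block diagram of a first embodiment of the television and television system with a microphone array of the present invention;
FIG. 2 is a schematic diagram of the microphone arrangement in the television and television system with a microphone array of the present invention;
FIG. 3 is a structural block diagram of a second embodiment of the television and television system with a microphone array of the present invention;
FIG. 4 is a flowchart of the echo cancellation and interference-sound filtering process in the television and television system with a microphone array of the present invention.
The realization of the purpose, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.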
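Detailed Description
It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
Referring to FIG. 1, FIG. 1 is a structural block diagram of the first embodiment of the television and television system with a microphone array of the present invention.
The television system comprises: a microphone array 10, a processor 20, an intelligent voice server 30, and a speaker 40;
the microphone array 10 is configured to capture a first sound audio signal and send the first sound audio signal to the processor;
It should be noted that the microphone array 10 is also configured to determine a capture position for an external sound source, capture the first sound audio signal at that capture position, and send the first sound audio signal to the processor. The microphone array 10 has far-field recognition and sound source localization functions and consists of a certain number of acoustic sensors (generally microphones). It determines the position of the external sound source using the sound source localization function, takes that position as the capture position, takes the sound signal captured at the capture position as the first sound audio signal, and sends the first sound audio signal to the processor 20;
It can be understood that the microphone array 10 refers to a plurality of microphones arranged according to certain rules, such as the spacing between microphones, the number of microphones, and their orientation. For example, FIG. 2 is a schematic diagram of the microphone arrangement in the television and television system with a microphone array of the present invention. The arrangement shown in FIG. 2 is only one of many possible arrangements; other arrangement rules may of course be used, and this embodiment places no limitation on this. Referring to FIG. 2, the number of microphones m along the X axis is greater than or equal to 1, the number of microphones m along the Y axis is greater than or equal to 1, and the total number of microphones m is greater than or equal to 2.
In a specific implementation, the talker is usually some distance from the smart television during voice interaction, and within a given space a great deal of ambient noise can interfere with the microphone array's recognition of the talker's voice. The microphone array 10 uses the advantage of its multiple (at least three) microphones, together with its far-field voice recognition function, to filter out sounds in the talker's direction other than the talker's voice, achieving accurate recognition within a certain distance. Based on the differences in the times at which the microphones receive the talker's voice, the array locates the direction of the talker, and a software algorithm filters out noise from other directions, which assists far-field recognition and yields more accurate results.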
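The time-difference localization described above can be illustrated with a minimal Python sketch. It assumes a far-field source, a single pair of microphones, and a speed of sound of 343 m/s. The function names `estimate_delay_samples` and `arrival_angle_deg` are hypothetical; the patent does not specify its localization algorithm, so this is only one plausible reading.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed room-temperature value


def estimate_delay_samples(ref, other, max_lag):
    """Lag (in samples) by which `other` trails `ref`, found by
    brute-force cross-correlation over lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Angle of the source relative to the broadside of a two-microphone
    pair, from the arrival-time difference (far-field assumption)."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

Dividing the estimated lag by the sample rate gives the delay in seconds that `arrival_angle_deg` expects. An array with three or more microphones, as the text mentions, would combine several such pairs.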
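The processor 20 is configured to receive the first sound audio signal, perform echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, convert the original-voice audio signal into a digital signal, and send the digital signal to the intelligent voice server 30;
It should be noted that the processor 20 has audio processing capability and can perform echo cancellation and interference-sound filtering on the first sound audio signal to obtain the original-voice audio signal. The original-voice audio signal is the audio signal that remains after the interference signal and the echo signal are removed from the first sound audio signal captured by the microphone array 10. After obtaining the original-voice audio signal, the processor converts it into a digital signal and sends the digital signal to the intelligent voice server 30;
The processor 20 is also configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the first sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.
It should be noted that the speaker return audio signal is compared with the first sound audio signal in order to cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, and the resulting second sound audio signal is then recognized to obtain the spectra of the original-voice audio signal and the interference-sound audio signal. The recognition may use a software algorithm to compare these two signals in real time, letting the talker's original-voice audio signal pass and filtering out the spectrum of the interference-sound audio signal. Other recognition methods may of course be used to cancel the interference sound and the speaker echo, and this embodiment places no limitation on this.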
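One common way to cancel a known loudspeaker signal from a microphone capture is a normalized least-mean-squares (NLMS) adaptive filter. The Python sketch below is an illustration of that standard technique, not the patent's own method, which is described only as a comparison of the two signals. The names `nlms_echo_cancel` and `simulated_capture`, the tap count, and the step size are all assumptions.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` (the speaker return
    signal) from `mic` (the microphone capture); returns the residual."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # what remains after echo removal
        power = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / power
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def simulated_capture(n, echo_gain=0.6, seed=0):
    """Hypothetical test signal: the microphone hears only a
    one-sample-delayed, attenuated copy of the reference."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.0] + [echo_gain * x for x in ref[:-1]]
    return mic, ref
```

With `mic, ref = simulated_capture(2000)`, the residual returned by `nlms_echo_cancel(mic, ref)` decays toward zero as the filter learns the echo path. In a real capture the residual would instead contain the talker's voice and the remaining interference, which corresponds to the second sound audio signal in the text.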
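The processor 20 is also configured to filter the first sound audio signal and the speaker return audio signal according to a preset frequency range.
It should be noted that when the processor 20 filters the first sound audio signal and the speaker return audio signal, the parts of the two signals that fall outside the preset frequency range can be filtered out. This filtering amounts to a preliminary screening of the first sound audio signal and the speaker return audio signal. It improves the sensitivity of voice recognition, avoids errors in the digital signal caused by the interference audio signal and the speaker audio signal, and improves the accuracy and efficiency of voice recognition.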
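Filtering to a preset frequency range can be sketched in Python by zeroing the frequency bins that fall outside the range. The function name `band_limit` and the naive DFT are assumptions chosen to keep the example dependency-free; the patent does not state the filter design or the range values, and a practical system would more likely use an FFT or an IIR/FIR filter.

```python
import cmath
import math


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Keep only spectral components with low_hz <= f <= high_hz.
    Uses a naive O(n^2) DFT, so it is only suitable for short blocks."""
    n = len(samples)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 are the negative-frequency mirror images.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]
```

For example, a 64-sample block sampled at 64 Hz that mixes a 2 Hz tone and a 10 Hz tone comes back as the 2 Hz tone alone when filtered with `band_limit(block, 64, 1, 5)`.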
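In a specific implementation, the processor 20 can use software algorithms to distinguish the sounds from different directions obtained through the microphone array, identifying who is speaking and from which direction, and marking and distinguishing the different sound spectra from different directions, so that it can answer different people, or several people, one by one.
Further, the processor 20 is also configured to analyze whether the digital signal contains target data corresponding to a preset keyword; if the digital signal contains the target data, the intelligent voice server 30 is set to an on state, and if the digital signal does not contain the target data, the intelligent voice server 30 is set to an off state.
It can be understood that the processor 20 can quickly control the starting and stopping of the intelligent voice server 30 by recognizing keywords in the digital signal, which improves the efficiency of voice interaction. The preset keyword may be a default keyword of the television system or a keyword set by the user, and this embodiment places no limitation on this.
In a specific implementation, when the television is working and the speaker is outputting sound normally, the processor 20 performs echo cancellation and interference-sound filtering on the first sound audio signal captured by the microphone array 10 to obtain the original-voice audio signal, converts the original-voice audio signal into a digital signal, and analyzes whether the digital signal contains one of the preset keywords. If it does, the keyword "wakes up" the intelligent voice server 30, and the processor 20 generates a corresponding control instruction to reduce the system's sound output so that the television's own volume interferes less with the voice feedback. The intelligent voice server 30 promptly processes the digital signal and returns voice information, which is output through the speaker at normal volume.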
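The keyword-gated switching and volume reduction described above can be sketched as follows. This minimal Python example assumes the digital signal has already been transcribed to text, which the patent does not spell out; `next_server_state`, `output_volume`, and the 0.3 ducking factor are hypothetical.

```python
def next_server_state(transcript, wake_words):
    """Return "on" if any preset keyword occurs in the transcript,
    otherwise "off" (the intelligent voice server stays asleep)."""
    text = transcript.lower()
    for word in wake_words:
        if word.lower() in text:
            return "on"
    return "off"


def output_volume(state, normal_volume, duck_factor=0.3):
    """Lower the television's own output while the server is awake so
    that it interferes less with the voice feedback."""
    if state == "on":
        return normal_volume * duck_factor
    return normal_volume
```

A call such as `next_server_state("hello tv, what is on", ["hello tv"])` yields `"on"`, after which `output_volume` returns the reduced level.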
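The intelligent voice server 30 is configured to receive the digital signal, obtain response voice data matching the digital signal, and send the response voice data to the speaker 40;
the speaker 40 is configured to receive the response voice data and output the response voice data.
It should be noted that the speaker 40 may be the television's built-in speaker or an external speaker connected to the television, and this embodiment places no limitation on this.
It can be understood that the intelligent voice server 30 is also configured to send the response voice data to the processor 20, and the processor 20 generates a corresponding control instruction from the response voice data to perform the corresponding operation. The operation may be control of an external device, for example switching an external speaker on or off or adjusting its volume. It may also be control of the television itself, for example calling up a display page or, according to the control instruction, changing channel, searching, replaying, going back, or pausing. Other operations according to the control instruction are of course possible, and this embodiment places no limitation on this.
In a specific implementation, the processor 20 and the intelligent voice server 30 are connected through software and hardware functional interfaces, so that the intelligent voice server 30 sends the response voice data matching the digital signal to the processor 20, and the processor 20 generates a corresponding control instruction from the response voice data to perform the corresponding operation. For example, the user says: "Which variety shows are on right now?" After the microphone array 10 captures this audio signal, the processor 20 performs echo cancellation and interference-sound filtering to obtain the original-voice audio signal and converts it into a digital signal. The intelligent voice server 30 finds the response voice data matching the digital signal and sends it to the processor 20 and the speaker 40. The processor 20 generates a corresponding control instruction from the response voice data, looks up the variety shows currently being broadcast, displays the results on the television screen, and generates corresponding feedback voice data that it sends to the speaker 40. The speaker 40 outputs the feedback voice data and the response voice data, namely "OK, searching for you now" and "I have found seven variety shows that are on now; you can choose one to watch on the search results screen." Other scenarios may of course be handled differently, and this embodiment places no limitation on this.
It should be noted that after receiving the digital signal, the intelligent voice server 30 performs big-data analysis and processing on it, obtains the response voice data matching the digital signal, and sends the response voice data to the speaker 40;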
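Further, the intelligent voice server 30 is also configured, when in the on state, to determine whether local data matching the digital signal exists in the local database of the intelligent voice server 30 and, when such local data exists in the local database, to use the local data as the response voice data;
It should be noted that when the intelligent voice server 30 is in the on state, it can search the local database for data matching the digital signal and, if matching data is found, use that data as the response voice data;
It can be understood that the intelligent voice server 30 can use a deep learning algorithm to recognize the data in the local database and generate recognition results, and then establish response mapping relationships among the data in the local database according to those results. When matching the digital signal against the local database, it first finds the data in the local database that has the same meaning as the digital signal, then uses the response mapping relationship to find the data that stands in a response relationship to that data, and takes it as the response voice data. Data matching the digital signal may of course be found in the local database in other ways, and this embodiment places no limitation on this.
It should be understood that the local database of the intelligent voice server 30 continuously accumulates and updates voice data and uses deep learning algorithms for voice recognition, semantic recognition, voiceprint recognition, and so on. As the data accumulates and improves over time, voice data is provided more accurately, which improves the accuracy of recognizing voice data and of returning the response voice data. Through this accumulation, the local database stores the response mapping relationships established among its data according to the recognition results, so the television can behave intelligently and give accurate voice responses even when it is not connected to the network. Accurate voice responses can replace the remote control, free the user's hands, enable direct human-computer interaction, and improve the user experience.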
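Further, the intelligent voice server 30 is also configured, when no local data matching the digital signal exists in the local database, to search the Internet for related resource data matching the digital signal and to use the related resource data as the response voice data.
It should be understood that when no local data matching the digital signal exists in the local database, related resource data matching the digital signal is searched for over the Internet. The related resource data is divided into internal resource data and external resource data: the internal resource data is the resource data of the cloud back-end database matched to the current television, and the external resource data is resource data matching the digital signal that is crawled from the Internet. In practice, the local database cannot hold much data because it is limited by the television's memory size. When no local data matching the digital signal exists in the local database, the internal resource data is searched first over the Internet; if no matching related resource data exists in the internal resource data either, the external resource data is searched. Compared with the external resource data, the internal resource data is more open and free and more targeted, searching it takes less time, and searching it consumes fewer computing resources and costs less.
It should be noted that when the Internet is used to search for related resource data matching the digital signal and the related resource data is used as the response voice data, many matching items may be found. In that case the items may be sorted from highest to lowest by their degree of match with the digital signal, and the item with the highest degree of match selected as the response voice data. Alternatively, data matching the user's speaking habits, as determined by surveying those habits, may be selected as the response voice data. The best item may of course be selected from the many related resource data in other ways, and this embodiment places no limitation on this.
It can be understood that after receiving the digital signal, the intelligent voice server 30 performs big-data analysis and processing on it and compares it with the data in the local database. If the data in the local database cannot be matched to the digital signal, that is, the resources of the local database are limited, the Internet can be used to search for related resource data matching the digital signal. The internal resource data can be searched first; if nothing is found, the external resource data can be searched, for example on open platforms or web pages, and the data with the highest degree of match with the digital signal is used as the response voice data. By saving the best-matching related resource data in the local database, the local database can be updated in real time and the local resource capacity expanded, which optimizes and improves the efficiency of voice recognition and response and makes voice interaction more intelligent and user-friendly.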
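The lookup order described above (local database, then internal cloud resources, then external resources, with the best match saved back locally) can be sketched in Python. The function `find_response`, the dictionary used as the local database, and the searcher callables returning `(response, score)` pairs are all hypothetical stand-ins; the patent does not define these interfaces or how the match score is computed.

```python
def find_response(query, local_db, internal_search, external_search):
    """Return (response, source). Tries the local database first, then
    the internal and external searchers in that order. Each searcher
    returns a list of (response, match_score) candidates."""
    if query in local_db:
        return local_db[query], "local"
    tiers = (("internal", internal_search), ("external", external_search))
    for source, search in tiers:
        candidates = search(query)
        if candidates:
            # Pick the candidate with the highest match score.
            best = max(candidates, key=lambda c: c[1])[0]
            local_db[query] = best  # update the local database
            return best, source
    return None, "none"
```

Because the best match is written back to `local_db`, a repeated query is answered locally the second time, which mirrors the real-time update of the local database described in the text.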
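In a specific implementation, cooperation with multiple voice resource solution providers and multiple web search engine resources can make the various platforms compatible. During use, the intelligent voice server 30 preferentially selects the solution with the fastest and most accurate feedback, providing the user with the best-quality and most-wanted content.
Further, the intelligent voice server 30 is also configured to establish a wireless connection with an external smart home appliance, generate a control signal from the digital signal, and send the control signal to the external smart home appliance to achieve voice control.
It should be noted that the intelligent voice server 30 can be used together with smart-home-like functions to establish a wireless connection with external smart home appliances. The connection to external smart home appliances may of course be made in other ways, and this embodiment places no limitation on this. The television converts the received sound data into control data and sends the control data over wireless communication to the other smart home appliances interconnected with the television, which achieves voice control of smart home appliances and thus interconnection.
It can be understood that the wireless connection may be made over WiFi or over Bluetooth, and this embodiment places no limitation on this. For example, when the television is connected to a Bluetooth smart speaker that is working normally and the user says to the television "Turn off the Bluetooth speaker sound", the television system sends sound-off data to the speaker; the Bluetooth speaker receives and processes the data and turns off its own volume, achieving the control purpose.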
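Turning a recognized utterance into control data for a connected appliance can be sketched as a phrase-to-command lookup. This Python example is illustrative only: the message layout, the device identifier `bt_speaker`, and the function `build_control_message` are hypothetical, and the actual WiFi or Bluetooth transport is left out.

```python
def build_control_message(utterance, commands):
    """Map a recognized utterance to a control message.
    `commands` maps a trigger phrase to a (device_id, action) pair."""
    text = utterance.strip().lower()
    for phrase, (device, action) in commands.items():
        if phrase in text:
            return {"device": device, "action": action}
    return None  # no known command in the utterance


# Hypothetical command table for the Bluetooth speaker example.
COMMANDS = {
    "turn off the bluetooth speaker sound": ("bt_speaker", "mute"),
}
```

The returned dictionary would then be serialized and sent to the appliance over whichever wireless link is in use.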
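In this embodiment, the microphone array captures a first sound audio signal and sends it to the processor. The processor performs echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, converts the original-voice audio signal into a digital signal, and sends the digital signal to the intelligent voice server. The intelligent voice server obtains response voice data matching the digital signal and sends it to the speaker, which outputs the response voice data. This makes the whole voice interaction process more flexible and simple, improves voice recognition sensitivity more effectively, and markedly improves the accuracy of the voice interaction feedback and the user experience.
Based on the above first embodiment, a second embodiment of the television and television system with a microphone array of the present invention is proposed. FIG. 3 is a structural block diagram of the second embodiment. Referring to FIG. 3, the television system further comprises: an input/output buffer 50 and an automatic gain controller 60;
the input/output buffer 50 is configured, after the processor 20 receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor 20.
the automatic gain controller 60 is configured, after the processor 20 receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee their output strength, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor 20.
It can be understood that the input/output buffer 50 serves to coordinate and buffer: it temporarily stores the first sound audio signal and the speaker return audio signal, synchronizes the two signals, and then sends the synchronized first sound audio signal and speaker return audio signal to the processor 20, so that data transfer between the high-speed processor (for example a CPU) and slower peripherals can be synchronized. The automatic gain controller 60 can adjust the output signals, namely the first sound audio signal and the speaker return audio signal, to guarantee the output signal strength.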
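Automatic gain control can be illustrated with a block-based Python sketch that scales each block of samples toward a target RMS level, with a cap on the gain so that near-silence is not amplified without limit. The target level, the gain cap, and the names `rms` and `apply_agc` are assumptions; the patent does not describe how its automatic gain controller is implemented.

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples (0.0 if empty)."""
    if not block:
        return 0.0
    return math.sqrt(sum(x * x for x in block) / len(block))


def apply_agc(block, target_rms=0.1, max_gain=10.0):
    """Scale the block toward target_rms, limiting the gain to max_gain."""
    level = rms(block)
    if level == 0.0:
        return list(block)  # nothing to scale
    gain = min(target_rms / level, max_gain)
    return [x * gain for x in block]
```

Applying the same function to both the microphone signal and the speaker return signal brings them to comparable levels before they are compared.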
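In a specific implementation, the television and television system with a microphone array of the present invention also comprise a digital sampler and a digital filter. As shown in FIG. 4, the flowchart of the echo cancellation and interference-sound filtering process, the flow is as follows. The microphone array receives the first sound audio signal, which contains the original voice from different directions in the external environment, interference sound, and the sound of the television itself. The microphone array sends the received first sound audio signal to the digital sampler, and at the same time the digital sampler samples, through the line echo path, the analog electrical signal output by the speaker. The digital sampler converts the captured sound signals into PCM format by pulse code modulation (PCM) processing, and the digital filter removes the unwanted parts of the signal, such as random noise, and extracts the useful parts, such as components within the audible frequency range of the human ear. The useful output signal is passed to the input/output buffer, which performs level conversion on the signal that has passed through the digital filter and coordinates and buffers between the high-speed CPU and the slower peripherals so that data transfer is synchronized. The signal sampled from the speaker must be prepared for the subsequent echo cancellation, and this processed data is sent to the automatic gain controller together with the data captured and processed by the microphone array. The automatic gain controller applies automatic gain control to the first sound audio signal and the speaker return audio signal, adjusting the output signals to guarantee their strength. The processor performs phase-locked synchronization on the signals output by the automatic gain controller, then encodes and converts the two signals so that they can be stored and compared. Using comparison logic, system-on-chip (SoC) side logic operations, and software algorithms, it performs an addition operation on the two signals to remove, from the signal delivered by the microphone array, the speaker audio signal corresponding to the speaker sound in the first sound audio signal; in other words, it removes the speaker echo picked up by the microphones. The resulting second sound audio signal is decoded and then passed through residual echo and noise suppression, which suppresses or filters out interference other than the talker's spectrum. The signal is then passed to the audio processor to form the original-voice audio signal, which is converted into a digital signal and sent to the intelligent voice server. The intelligent voice server parses the useful information obtained through the audio processor, obtains the response voice data matching the digital signal, and sends the response voice data to the speaker, which outputs it. The intelligent voice server also sends the response voice data to the processor, which generates a corresponding control instruction from the response voice data to perform the corresponding operation, achieving the purposes of interaction and control.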
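The PCM conversion step in this flow can be illustrated with a short Python sketch that quantizes normalized samples to 16-bit linear PCM. The 16-bit depth, the little-endian byte order, and the function names `to_pcm16` and `from_pcm16` are assumptions; the patent names PCM but gives no bit depth or sample rate.

```python
import struct


def to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to little-endian signed 16-bit PCM.
    Out-of-range samples are clipped before quantization."""
    ints = []
    for x in samples:
        clipped = max(-1.0, min(1.0, x))
        ints.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def from_pcm16(data):
    """Decode little-endian signed 16-bit PCM bytes back to floats.
    A trailing odd byte, if any, is ignored."""
    count = len(data) // 2
    values = struct.unpack("<%dh" % count, data[:count * 2])
    return [v / 32767 for v in values]
```

Each sample occupies two bytes, so a decoded value differs from the original by at most the quantization step of roughly 1/32767.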
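In this embodiment, the input/output buffer performs level conversion on the first sound audio signal and the speaker return audio signal so that the two signals have the same voltage, and after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, the automatic gain controller applies automatic gain control to both signals to guarantee their output strength. This raises the output strength of the first sound audio signal and the speaker return audio signal and achieves synchronous transmission of the two signals, which further improves the accuracy and efficiency of voice recognition and improves the user experience.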
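The buffering and synchronization role of the input/output buffer can be sketched in software as two queues that release frames only in matched pairs. This Python example is an analogy under stated assumptions: the class `FramePairBuffer` is hypothetical, both streams are assumed to deliver equally sized frames in order, and the hardware level conversion mentioned in the text is not modeled.

```python
from collections import deque


class FramePairBuffer:
    """Temporarily stores microphone and speaker-return frames and
    releases them only as synchronized (mic, ref) pairs."""

    def __init__(self):
        self._mic = deque()
        self._ref = deque()

    def push_mic(self, frame):
        self._mic.append(frame)

    def push_ref(self, frame):
        self._ref.append(frame)

    def pop_pairs(self):
        """Return every pair for which both frames have arrived."""
        pairs = []
        while self._mic and self._ref:
            pairs.append((self._mic.popleft(), self._ref.popleft()))
        return pairs
```

If the microphone stream runs ahead, its extra frames simply wait in the buffer until the matching speaker-return frames arrive, so the processor always receives the two signals together.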
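It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises that element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. On this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.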

Claims (20)

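  1. A television system with a microphone array, characterized in that the television system comprises: a microphone array, a processor, an intelligent voice server, and a speaker;
    the microphone array is configured to capture a first sound audio signal and send the first sound audio signal to the processor;
    the processor is configured to perform echo cancellation and interference-sound filtering on the first sound audio signal to obtain an original-voice audio signal, convert the original-voice audio signal into a digital signal, and send the digital signal to the intelligent voice server;
    the intelligent voice server is configured to obtain response voice data matching the digital signal and send the response voice data to the speaker;
    the speaker is configured to output the response voice data.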
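  2. The television system according to claim 1, characterized in that the processor is further configured to analyze whether the digital signal contains target data corresponding to a preset keyword; if the digital signal contains the target data, the intelligent voice server is set to an on state, and if the digital signal does not contain the target data, the intelligent voice server is set to an off state.
  3. The television system according to claim 2, characterized in that the intelligent voice server is further configured, when in the on state, to determine whether local data matching the digital signal exists in a local database of the intelligent voice server and, when the local data matching the digital signal exists in the local database, to use the local data as the response voice data.
  4. The television system according to claim 3, characterized in that the intelligent voice server is further configured, when no local data matching the digital signal exists in the local database, to search the Internet for related resource data matching the digital signal and to use the related resource data as the response voice data.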
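  5. The television system according to claim 1, characterized in that the processor is further configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.
  6. The television system according to claim 2, characterized in that the processor is further configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.
  7. The television system according to claim 3, characterized in that the processor is further configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.
  8. The television system according to claim 4, characterized in that the processor is further configured to receive the first sound audio signal and a speaker return audio signal corresponding to the speaker sound, compare the speaker return audio signal with the sound audio signal, cancel the speaker audio signal corresponding to the speaker sound in the first sound audio signal, take the first sound audio signal from which the speaker audio signal has been cancelled as a second sound audio signal, perform recognition on the second sound audio signal to obtain the original-voice audio signal and an interference-sound audio signal, and cancel the interference-sound audio signal to obtain the original-voice audio signal.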
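  9. The television system according to claim 5, characterized in that the television system further comprises: an input/output buffer;
    the input/output buffer is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor.
  10. The television system according to claim 5, characterized in that the television system further comprises: an automatic gain controller;
    the automatic gain controller is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee the output strength of the first sound audio signal and the speaker return audio signal, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor.
  11. The television system according to claim 5, characterized in that the intelligent voice server is further configured to establish a wireless connection with an external smart home appliance, generate a control signal from the digital signal, and send the control signal to the external smart home appliance to achieve voice control.
  12. The television system according to claim 5, characterized in that the processor is further configured to filter the first sound audio signal and the speaker return audio signal according to a preset frequency range.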
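  13. The television system according to claim 6, characterized in that the television system further comprises: an input/output buffer;
    the input/output buffer is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor.
  14. The television system according to claim 6, characterized in that the television system further comprises: an automatic gain controller;
    the automatic gain controller is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee the output strength of the first sound audio signal and the speaker return audio signal, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor.
  15. The television system according to claim 7, characterized in that the television system further comprises: an input/output buffer;
    the input/output buffer is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor.
  16. The television system according to claim 7, characterized in that the television system further comprises: an automatic gain controller;
    the automatic gain controller is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee the output strength of the first sound audio signal and the speaker return audio signal, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor.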
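  17. The television system according to claim 8, characterized in that the television system further comprises: an input/output buffer;
    the input/output buffer is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to temporarily store the first sound audio signal and the speaker return audio signal, synchronize the speaker return audio signal with the first sound audio signal, and then send the synchronized first sound audio signal and speaker return audio signal to the processor.
  18. The television system according to claim 8, characterized in that the television system further comprises: an automatic gain controller;
    the automatic gain controller is configured, after the processor receives the first sound audio signal and the speaker return audio signal corresponding to the speaker sound, to apply automatic gain control to the first sound audio signal and the speaker return audio signal so as to guarantee the output strength of the first sound audio signal and the speaker return audio signal, and to send the gain-controlled first sound audio signal and speaker return audio signal to the processor.
  19. The television system according to claim 8, characterized in that the intelligent voice server is further configured to establish a wireless connection with an external smart home appliance, generate a control signal from the digital signal, and send the control signal to the external smart home appliance to achieve voice control.
  20. A television, characterized in that the television comprises the television system according to any one of claims 1 to 19.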
PCT/CN2018/101657 2017-08-23 2018-08-22 麦克风阵列的电视机及电视系统 WO2019037732A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710732950.7 2017-08-23
CN201710732950.7A CN107454508B (zh) 2017-08-23 2017-08-23 麦克风阵列的电视机及电视系统

Publications (1)

Publication Number Publication Date
WO2019037732A1 true WO2019037732A1 (zh) 2019-02-28

Family

ID=60493278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101657 WO2019037732A1 (zh) 2017-08-23 2018-08-22 麦克风阵列的电视机及电视系统

Country Status (2)

Country Link
CN (1) CN107454508B (zh)
WO (1) WO2019037732A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220205665A1 (en) * 2020-12-31 2022-06-30 Lennox Industries Inc. Occupancy tracking using environmental information

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107454508B (zh) * 2017-08-23 2020-07-14 深圳创维-Rgb电子有限公司 麦克风阵列的电视机及电视系统
CN108260037B (zh) * 2018-01-05 2019-10-22 深圳市沃特沃德股份有限公司 语音采集装置和家电设备
KR102459920B1 (ko) * 2018-01-25 2022-10-27 삼성전자주식회사 저전력 에코 제거를 지원하는 애플리케이션 프로세서, 이를 포함하는 전자 장치 및 그 동작 방법
CN108235189A (zh) * 2018-02-07 2018-06-29 深圳创维-Rgb电子有限公司 一种语音信号的回声消除装置及电视机
CN108320745A (zh) * 2018-02-08 2018-07-24 北京小米移动软件有限公司 控制显示的方法及装置
CN110312093A (zh) * 2018-03-27 2019-10-08 晨星半导体股份有限公司 电子装置及相关的信号处理方法
CN108305627A (zh) * 2018-03-30 2018-07-20 合肥惠科金扬科技有限公司 一种智能显示器及系统
CN108289267A (zh) * 2018-04-14 2018-07-17 北京智网时代科技有限公司 消除电视干扰的回声消除装置、方法、音箱、音频发送器
CN110493616B (zh) * 2018-05-15 2021-08-06 中国移动通信有限公司研究院 一种音频信号处理方法、装置、介质和设备
CN110866157B (zh) * 2018-08-27 2022-07-15 北京猎户星空科技有限公司 机器人应答方法、装置及机器人
CN109192219B (zh) * 2018-09-11 2021-12-17 四川长虹电器股份有限公司 基于关键词改进麦克风阵列远场拾音的方法
CN110166882B (zh) * 2018-09-29 2021-05-25 腾讯科技(深圳)有限公司 远场拾音设备、及远场拾音设备中采集人声信号的方法
CN109120993B (zh) * 2018-09-30 2021-12-03 Tcl通力电子(惠州)有限公司 语音识别方法、智能终端、语音识别系统及可读存储介质
CN109284505A (zh) * 2018-11-07 2019-01-29 江苏中润普达信息技术有限公司 一种用于车载的自然语言语义分析方法
CN109493861A (zh) * 2018-12-05 2019-03-19 百度在线网络技术(北京)有限公司 利用语音控制电器的方法、装置、设备和可读存储介质
CN109462794B (zh) * 2018-12-11 2021-02-12 Oppo广东移动通信有限公司 智能音箱及用于智能音箱的语音交互方法
CN109979452A (zh) * 2019-03-21 2019-07-05 中山安信通机器人制造有限公司 车载机器人自然语言处理方法、计算机装置及计算机可读存储介质
CN112152890B (zh) * 2019-06-28 2022-01-21 海信视像科技股份有限公司 一种基于智能音箱的控制系统及方法
CN110289025A (zh) * 2019-07-29 2019-09-27 东莞市居胜电子有限公司 一种多媒体视讯音响系统
CN110691301A (zh) * 2019-09-25 2020-01-14 晶晨半导体(深圳)有限公司 一种测试远场语音设备与外置喇叭之间延迟时间的方法
CN110797040A (zh) * 2019-10-28 2020-02-14 星络智能科技有限公司 一种噪声消除方法、智能音箱及存储介质
CN111223484A (zh) * 2020-01-10 2020-06-02 广州华夏职业学院 一种基于ai算法提升广告牌交互速率的方法
CN111462743B (zh) * 2020-03-30 2023-09-12 北京声智科技有限公司 一种语音信号处理方法及装置
CN111667826B (zh) * 2020-05-28 2023-12-26 深圳创维-Rgb电子有限公司 具有ai语音控制功能的广电监视器及ai语音控制方法
CN112420064B (zh) * 2020-10-21 2024-04-02 深圳创维-Rgb电子有限公司 无线音箱设备语音回声消除处理方法、装置及智能终端

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601838A (zh) * 2014-12-18 2015-05-06 深圳狗尾草智能科技有限公司 一种语音、无线控制智能家用电器操作系统
CN105227967A (zh) * 2015-10-08 2016-01-06 微鲸科技有限公司 支持智能翻译的电视机
CN106358061A (zh) * 2016-11-11 2017-01-25 四川长虹电器股份有限公司 电视语音遥控系统及方法
CN106548783A (zh) * 2016-12-09 2017-03-29 西安Tcl软件开发有限公司 语音增强方法、装置及智能音箱、智能电视
CN106910500A (zh) * 2016-12-23 2017-06-30 北京第九实验室科技有限公司 对带麦克风阵列的设备进行语音控制的方法及设备
CN107454508A (zh) * 2017-08-23 2017-12-08 深圳创维-Rgb电子有限公司 麦克风阵列的电视机及电视系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842306B (zh) * 2012-08-31 2016-05-04 深圳Tcl新技术有限公司 语音控制方法及装置、语音响应方法及装置
CN204667052U (zh) * 2015-06-03 2015-09-23 深圳市轻生活科技有限公司 一种智能语音交互终端
CN105163233A (zh) * 2015-06-25 2015-12-16 康佳集团股份有限公司 一种智能云音箱与智能终端交互方法及系统
CN106297815B (zh) * 2016-07-27 2017-09-01 武汉诚迈科技有限公司 一种语音识别场景中回音消除的方法
CN106898348B (zh) * 2016-12-29 2020-02-07 北京小鸟听听科技有限公司 一种出声设备的去混响控制方法和装置

Also Published As

Publication number Publication date
CN107454508A (zh) 2017-12-08
CN107454508B (zh) 2020-07-14

Similar Documents

Publication Publication Date Title
WO2019037732A1 (zh) 麦克风阵列的电视机及电视系统
JP6428954B2 (ja) 情報処理装置、情報処理方法およびプログラム
CN110223690A (zh) 基于图像与语音融合的人机交互方法及装置
US11301208B2 (en) Control method, control device, and control system
CN108363557A (zh) 人机交互方法、装置、计算机设备和存储介质
JP2017138476A (ja) 情報処理装置、情報処理方法、及びプログラム
WO2020130549A1 (en) Electronic device and method for controlling electronic device
CN109473097B (zh) 一种智能语音设备及其控制方法
CN110992955A (zh) 一种智能设备的语音操作方法、装置、设备及存储介质
CN106875946B (zh) 语音控制交互系统
JP2017144521A (ja) 情報処理装置、情報処理方法、及びプログラム
CN107452398B (zh) 回声获取方法、电子设备及计算机可读存储介质
WO2020138662A1 (ko) 전자 장치 및 그의 제어 방법
JP2002034092A (ja) 収音装置
JP7400364B2 (ja) 音声認識システム及び情報処理方法
CN111800700A (zh) 环境中对象提示方法、装置、耳机设备及存储介质
CN108769799B (zh) 一种信息处理方法及电子设备
JP6943192B2 (ja) 家電機器および場所検索システム
CN111182416A (zh) 处理方法、装置及电子设备
KR101442027B1 (ko) 음향패턴을 이용하여 휴대형 단말기용 이어폰 인식하는 음향처리 시스템, 음향패턴을 이용한 휴대형 단말기용 이어폰 인식방법 및 이를 이용한 입력음향 처리 방법.
JP6934831B2 (ja) 対話装置及びプログラム
US11170754B2 (en) Information processor, information processing method, and program
WO2022059911A1 (ko) 전자 장치 및 그 제어 방법
JP6794872B2 (ja) 音声取引システムおよび連携制御装置
JPWO2020166173A1 (ja) 情報処理装置及び情報処理方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18847505

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18847505

Country of ref document: EP

Kind code of ref document: A1