WO2020063798A1

WO2020063798A1 - Echo cancellation method, device and intelligent loudspeaker box

Info

Publication number: WO2020063798A1
Application number: PCT/CN2019/108343
Authority: WO
Inventors: 韩中波; 夏萌; 吴海全; 迟欣; 张恩勤; 曹磊; 师瑞文
Original assignee: 深圳市冠旭电子股份有限公司
Priority date: 2018-09-27
Filing date: 2019-09-27
Publication date: 2020-04-02

Abstract

The present application discloses an echo cancellation method and device and an intelligent loudspeaker box. The method comprises: acquiring N first audio signals corresponding to N audio channels connected with the input end of a loudspeaker, where N is an integer greater than or equal to 2; performing linear transformation on the N first audio signals to synthesize a second audio signal, and using the second audio signal as a reference signal for echo cancellation; and acquiring a third audio signal collected by a microphone, and performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal. By removing the need to perform echo cancellation separately on the audio signal of each of the plurality of audio channels, the embodiments of the present application improve echo cancellation efficiency; by using an audio signal synthesized from the audio signals of the plurality of audio channels as the reference signal for echo cancellation, they simulate the echo audio signal more accurately and improve the output sound quality of the loudspeaker after echo cancellation.

Description

Title of invention: Echo cancellation method, device and smart speaker

[0001] This application claims priority to Chinese patent application No. 201811130274.7, filed with the Chinese Patent Office on September 27, 2018, and entitled "A method, device and smart terminal for echo cancellation", and to Chinese patent application No. 201811561782.0, filed with the Chinese Patent Office on December 20, 2018, and entitled "An Echo Cancellation Method, Device and Smart Speaker", the entire contents of which are incorporated herein by reference.

Technical field

[0002] The present application relates to the field of signal processing technologies, and in particular, to an echo cancellation method, device, and smart speaker.

[0003] With the continuous pursuit of audiovisual enjoyment, smart speaker systems have steadily evolved from mono to stereo and multichannel audio playback. Noise interference arises during audio playback, for example because both the audio playback equipment (speakers) and the audio capture devices (microphones) are components of the same speaker system.

[0004] When the audio played by a speaker is picked up by a microphone and fed back into the system, echo interference occurs, making the speaker system unable to recognize or play the truly useful voice signal. However, current echo cancellation technologies generally support only mono and cannot meet the requirements of audio playback over multiple audio channels, as in mainstream formats such as 5.1-channel or 7.1-channel playback.

Summary of invention

Technical problem

[0005] The purpose of the embodiments of the present application is to provide an echo cancellation method, device, and smart speaker, intended to solve the problem that existing echo cancellation technology cannot support audio playback over multiple audio channels.

Solution to problem

Technical solutions

[0006] In order to solve the above technical problems, the technical solutions adopted in the embodiments of the present application are:

[0007] In a first aspect, an echo cancellation method is provided, which is applied to a smart speaker. The method includes:

[0008] acquiring N first audio signals corresponding to N audio channels connected to a speaker input end, where N is an integer and N ≥ 2;

[0009] linearly transforming the N first audio signals to synthesize a second audio signal, and using the second audio signal as a reference signal for echo cancellation;

[0010] acquiring a third audio signal collected by a microphone, and performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[0011] In a second aspect, an echo cancellation device is provided, where the device includes:

[0012] an acquisition module, configured to acquire N first audio signals corresponding to N audio channels connected to the speaker input end, where N is an integer and N ≥ 2;

[0013] a synthesizing module, configured to linearly transform the N first audio signals to synthesize a second audio signal, and use the second audio signal as a reference signal for echo cancellation;

[0014] a cancellation module, configured to acquire a third audio signal collected by a microphone, and perform echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[0015] According to a third aspect, a smart speaker is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method of the first aspect are implemented.

[0016] In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the steps of the method of the first aspect.

[0017] The beneficial effects of the echo cancellation method provided in the embodiments of the present application are as follows. N first audio signals corresponding to N audio channels connected to the speaker input end are acquired, where N is an integer and N ≥ 2; the N first audio signals are linearly transformed to synthesize a second audio signal, which is used as the reference signal for echo cancellation; and a third audio signal collected by a microphone is acquired and subjected to echo cancellation according to the reference signal to generate a fourth audio signal. Because the N first audio signals of the N audio channels are synthesized into a single second audio signal that serves as the reference signal, the audio signals of multiple audio channels can be echo-cancelled in a unified way, eliminating the need to perform echo cancellation separately on the audio signal of each channel and thus improving echo cancellation efficiency. Moreover, because the echo audio signal collected by the microphone is itself synthesized from the audio signals of the multiple audio channels, using a reference signal synthesized from those same channel signals simulates the echo audio signal more accurately and can improve the sound quality of the speaker output after echo cancellation.

The beneficial effects of the invention

Brief description of the drawings


[0018] In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the exemplary technology are briefly introduced below.

[0019] FIG. 1 is a schematic flowchart of an echo cancellation method provided in Embodiment 1 of the present application;

[0020] FIG. 2 is a schematic flowchart of an echo cancellation method provided in Embodiment 2 of the present application;

[0021] FIG. 3 is a schematic flowchart of an echo cancellation method provided in Embodiment 3 of the present application;

[0022] FIG. 4 is a schematic flowchart of an echo cancellation method provided in Embodiment 4 of the present application;

[0023] FIG. 5 is a schematic flowchart of an echo cancellation method provided in Embodiment 5 of the present application;

[0024] FIG. 6 is a schematic flowchart of an echo cancellation method provided in Embodiment 6 of the present application;

[0025] FIG. 7 is a schematic diagram of an echo cancellation device provided in Embodiment 7 of the present application;

[0026] FIG. 8 is a schematic structural diagram of a smart speaker provided in Embodiment 8 of the present application.

Invention Examples

Embodiments of the invention

[0027] In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.

[0028] It should be noted that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. "Multiple" means two or more, unless specifically defined otherwise.

[0029] In order to explain the technical solution described in this application, detailed descriptions are given below with reference to specific drawings and embodiments.

Embodiment 1

[0031] The echo cancellation method provided in this embodiment of the present application may be applied to an audio playback device or system, such as a smart speaker, that includes a speaker and a microphone. As shown in FIG. 1, the echo cancellation method provided in Embodiment 1 of the present application includes:

[0032] Step S101: Obtain N first audio signals corresponding to N audio channels connected to a speaker input end, where N is an integer and N ≥ 2;

[0033] In this embodiment of the present application, current mainstream speakers and audio playback systems play high-quality formats such as 5.1-channel or 7.1-channel sound. A speaker or audio playback system capable of multichannel playback contains multiple audio channels, each of which carries its own audio signal. The speaker may be one or more speakers, and the N audio channels may be connected to one or more speakers. When the N first audio signals carried by the N audio channels are transmitted to the speakers, those N first audio signals are acquired. A speaker is a transducer that converts an electrical signal into an acoustic signal.

[0034] Step S102: Linearly transform the N first audio signals into a second audio signal, and use the second audio signal as a reference signal for echo cancellation;

[0035] In this embodiment of the present application, the N first audio signals are played through the speaker. When the microphone picks up the sound produced by the speaker playing the N first audio signals, acoustic echo occurs. Because the acoustic echo originates from the N first audio signals, the N first audio signals are linearly transformed to synthesize a second audio signal, and the second audio signal is used as the reference signal for echo cancellation.

[0036] In one embodiment, linearly transforming the N first audio signals to synthesize the second audio signal includes: acquiring the gain values used for gain processing in each of the N audio channels; assigning corresponding weights to the N first audio signals according to the gain values of the N audio channels; and multiplying the amplitude of each of the N first audio signals by its corresponding weight and accumulating the products to generate the second audio signal. Acquiring the gain values of the N audio channels can be understood as follows: the gain amplifier in each of the N audio channels amplifies that channel's audio signal, and the gain value of the channel is the gain amplification coefficient of that amplifier, which may be a preset gain amplification parameter of the gain amplifier corresponding to each audio channel. Assigning weights according to the gain values can be understood as follows: a mapping table between different gain values and their corresponding weights can be established in advance, and each channel is assigned the weight that corresponds to the magnitude of its gain value. The second audio signal can be understood as the audio signal, synthesized from the N first audio signals, that is collected by the microphone.
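The weighted synthesis in [0036] amounts to forming the reference as r[t] = w_1·x_1[t] + ... + w_N·x_N[t]. The sketch below assumes NumPy, stores the N first audio signals as the rows of an array, and uses a hypothetical gain-to-weight table (`GAIN_TO_WEIGHT`) with a nearest-entry lookup; the application gives neither table values nor a lookup rule, so both are illustrative only.

```python
import numpy as np

# Hypothetical pre-established mapping: channel gain value -> weight.
# The values are illustrative only; the application does not specify them.
GAIN_TO_WEIGHT = {0.5: 0.2, 1.0: 0.4, 2.0: 0.8}


def assign_weights(gains):
    """Assign each channel the weight of the nearest gain entry in the table."""
    table_gains = np.array(sorted(GAIN_TO_WEIGHT))
    weights = []
    for g in gains:
        nearest = float(table_gains[np.argmin(np.abs(table_gains - g))])
        weights.append(GAIN_TO_WEIGHT[nearest])
    return np.array(weights)


def synthesize_reference(first_signals, gains):
    """Weighted sum of N channel signals (rows) -> one reference signal."""
    x = np.asarray(first_signals, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("expected an (N, num_samples) array with N >= 2")
    w = assign_weights(gains)
    # Multiply each channel's amplitude by its weight, then accumulate.
    return w @ x


# Usage: two channels with gains 1.0 and 2.0 -> weights 0.4 and 0.8.
left = np.array([1.0, 0.0, -1.0])
right = np.array([0.5, 0.5, 0.5])
reference = synthesize_reference([left, right], gains=[1.0, 2.0])
# reference is approximately [0.8, 0.4, 0.0]
```

A nearest-entry lookup is only one way to read "according to the size of the gain values"; interpolating between table entries would fit the description equally well.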

[0037] Step S103: Acquire a third audio signal collected by a microphone, and perform echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[0038] In this embodiment of the present application, the third audio signal collected by the microphone includes a useful audio signal and a noise audio signal, and the noise audio signal includes the echo audio signal, synthesized from the N first audio signals emitted by the speaker, that the microphone picks up. The fourth audio signal can be understood as the audio signal that remains after the echo signal is removed from the third audio signal, and it may be generated by performing echo cancellation on the third audio signal according to the reference signal. Specifically, the reference signal may be fed, as the reference input, into an echo canceller designed according to acoustic echo cancellation (AEC) technology, which performs echo cancellation and generates the fourth audio signal.

[0039] In one embodiment, acquiring the third audio signal collected by the microphone and performing echo cancellation on it according to the reference signal to generate the fourth audio signal includes: acquiring an echo estimation signal generated by an echo canceller according to the reference signal; and acquiring the third audio signal collected by the microphone and subtracting the echo estimation signal from the third audio signal to generate the fourth audio signal. The reference signal may be passed through an adaptive filter in the acoustic echo canceller to generate the echo estimation signal, which is then used to cancel the echo in the third audio signal (the microphone signal containing both the useful audio signal and the echo audio signal); specifically, the fourth audio signal may be generated by subtracting the echo estimation signal from the third audio signal.
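The subtraction structure described in [0039] can be sketched as follows, assuming NumPy and modelling the speaker-to-microphone echo path as an FIR filter. The coefficients are fixed here only to make the data flow visible; in the application they come from an adaptive filter (see Embodiments 5 and 6), and the function names are hypothetical.

```python
import numpy as np


def echo_estimate(reference, filter_coeffs):
    """Filter the reference with an FIR model of the speaker-to-mic echo path."""
    h = np.asarray(filter_coeffs, dtype=float)
    # Causal convolution, truncated so the estimate aligns with the reference.
    return np.convolve(reference, h)[: len(reference)]


def cancel_echo(third_signal, reference, filter_coeffs):
    """Fourth audio signal = third (microphone) signal minus the echo estimate."""
    third = np.asarray(third_signal, dtype=float)
    return third - echo_estimate(reference, filter_coeffs)


# Usage: synthetic microphone signal = near-end speech + echo of the reference.
rng = np.random.default_rng(0)
reference = rng.standard_normal(200)
near_end = 0.1 * rng.standard_normal(200)
true_path = np.array([0.6, 0.3, -0.1])
third = near_end + np.convolve(reference, true_path)[:200]
fourth = cancel_echo(third, reference, true_path)  # recovers near_end
```

When the filter matches the true echo path, the residual equals the near-end signal; any mismatch leaves residual echo in the fourth signal, which is why the coefficients are adapted in practice.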

[0040] It can be seen that, in this embodiment of the present application, the N first audio signals of the N audio channels are synthesized into a second audio signal that serves as the reference signal for echo cancellation. Audio signals of multiple audio channels are thus echo-cancelled in a unified way, eliminating the need to perform echo cancellation separately on each audio channel and improving echo cancellation efficiency. In addition, because the echo audio signal contained in the noise audio signal collected by the microphone is synthesized from the audio signals of the multiple audio channels, using a reference signal synthesized from those channel signals simulates the echo audio signal more accurately and can improve the sound quality of the speaker output after echo cancellation.

[0041] Embodiment 2

[0042] As shown in FIG. 2, the echo cancellation method provided in Embodiment 2 of the present application includes:

[0043] Step S201: Obtain N first audio signals corresponding to the N audio channels connected to the speaker input end, where N is an integer and N ≥ 2;

[0044] Step S202: Linearly transform the N first audio signals to synthesize a second audio signal, and use the second audio signal as a reference signal for echo cancellation;

[0045] Step S203: Acquire a third audio signal collected by the microphone, and perform echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[0046] In one embodiment, after the third audio signal collected by the microphone is acquired and echo-cancelled according to the reference signal to generate the fourth audio signal, the method further includes: calculating an audio signal difference value from the fourth audio signal and a preset standard audio signal by using the Perceptual Evaluation of Audio Quality (PEAQ) algorithm, and determining whether the audio signal difference value is within a preset audio signal difference range; and, if it is not, returning the audio signal difference value to the echo canceller so that the echo canceller adjusts its filter coefficients according to the audio signal difference value. PEAQ imitates the human auditory system: it analyzes and compares a reference signal with a test signal to produce an objective measure of the difference in audio quality. Here, the stored standard audio signal of the speaker serves as the PEAQ reference signal and the echo-cancelled fourth audio signal serves as the PEAQ test signal, so PEAQ can compute the audio signal difference value from the two. On receiving the difference value, the echo canceller can adjust (increase or decrease) its filter coefficients until the difference value falls within the preset audio signal difference range.
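The range check and feedback path of [0046] can be sketched as below. Real PEAQ (ITU-R BS.1387) relies on a psychoacoustic ear model that is far too large for a short example, so this sketch swaps in a plain residual-to-reference energy ratio in dB as a clearly labelled stand-in. NumPy, the function names, and the `feedback` callback are assumptions, not part of the application.

```python
import numpy as np


def difference_value(fourth_signal, standard_signal):
    """Placeholder for PEAQ: residual-to-reference energy ratio in dB.

    This simple energy ratio is NOT PEAQ; it only stands in for it here.
    """
    ref = np.asarray(standard_signal, dtype=float)
    test = np.asarray(fourth_signal, dtype=float)
    error_energy = np.sum((test - ref) ** 2)
    ref_energy = np.sum(ref ** 2) + 1e-12
    return 10.0 * np.log10(error_energy / ref_energy + 1e-12)


def check_and_feed_back(fourth_signal, standard_signal, allowed_range, feedback):
    """Return True if the difference is in range; otherwise report it back.

    feedback: callable receiving the difference value, e.g. a method of the
    echo canceller that increases or decreases its filter coefficients.
    """
    low, high = allowed_range
    d = difference_value(fourth_signal, standard_signal)
    if low <= d <= high:
        return True
    feedback(d)
    return False


# Usage: an offset signal falls outside the allowed range and is reported back.
standard = np.sin(np.linspace(0.0, 10.0, 500))
reported = []
ok = check_and_feed_back(standard + 0.5, standard, (-np.inf, -20.0), reported.append)
```

In a full loop, the canceller would adjust its coefficients after each report and the check would be repeated until it returns True.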

[0047] In the embodiment of the present application, the above steps S201, S202, and S203 are the same or similar to the above steps S101, S102, and S103, respectively. For details, refer to the related description of the above steps S101 to S103, and details are not described herein again.

[0048] Step S204: Frequency-divide the fourth audio signal, input the resulting signals into the corresponding N audio channels, and, after gain processing, input them into the speakers connected to the N audio channels, so as to instruct the speakers to play the gain-processed fourth audio signal.

[0049] In this embodiment of the present application, the fourth audio signal is the useful audio signal remaining after echo cancellation. The fourth audio signal is frequency-divided into N corresponding audio signals, which are input into the corresponding N audio channels; after gain amplification in those channels, they are played by the one or more speakers connected to the N audio channels.
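The frequency division and per-channel gain described in [0048] and [0049] could look like the sketch below. It assumes NumPy and uses ideal FFT-domain band masks as a stand-in for the crossover filters a real product would use; the crossover frequency, the gains, and the function names are hypothetical.

```python
import numpy as np


def frequency_divide(signal, sample_rate, band_edges_hz):
    """Split a signal into len(band_edges_hz) + 1 bands with ideal FFT masks.

    band_edges_hz: ascending crossover frequencies, e.g. [200, 2000] -> 3 bands.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    edges = [0.0, *band_edges_hz, np.inf]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)   # bins belonging to this channel
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.array(bands)


def apply_channel_gains(bands, gains):
    """Per-channel gain amplification before the signals reach the speakers."""
    return np.asarray(gains, dtype=float)[:, None] * bands


# Usage: split a 100 Hz + 1 kHz test signal into N = 2 channels at 500 Hz.
fs = 8000
t = np.arange(800) / fs
fourth = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
bands = frequency_divide(fourth, fs, band_edges_hz=[500])
driven = apply_channel_gains(bands, gains=[2.0, 0.5])
```

Because the masks partition the spectrum, the bands sum back to the original signal; practical crossovers (for example Linkwitz-Riley filters) give up that exactness in exchange for causal, low-latency operation.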

[0050] It can be seen that, in this embodiment of the present application, the N first audio signals of the N audio channels are synthesized into one second audio signal that serves as the reference signal for echo cancellation. Audio signals of multiple audio channels are thus echo-cancelled in a unified way, eliminating the need to perform echo cancellation separately on each audio channel and improving echo cancellation efficiency. In addition, because the echo audio signal contained in the noise audio signal collected by the microphone is synthesized from the audio signals of the multiple audio channels, using a reference signal synthesized from those channel signals simulates the echo audio signal more accurately and can improve the sound quality of the speaker output after echo cancellation.

Embodiment 3

[0052] As shown in FIG. 3, the echo cancellation method provided in Embodiment 3 of the present application includes:

[0053] Step S301: Obtain N first audio signals corresponding to the N audio channels connected to the speaker input end, where N is an integer and N ≥ 2.

[0054] Step S302: Linearly transform the N first audio signals to synthesize a second audio signal, and use the second audio signal as a reference signal for echo cancellation.

[0055] In this embodiment of the present application, steps S301 and S302 are the same as or similar to steps S101 and S102, respectively. For details, refer to the related descriptions of steps S101 to S102; they are not repeated here.

[0056] Step S303: Detect the working mode of the smart speaker.

[0057] In this embodiment of the present application, the current working mode of the smart speaker is detected. The working modes include a voice working mode and a music playback mode: the voice working mode covers use scenarios such as voice playback and telephone calls, and the music playback mode covers use scenarios such as playing music.

[0058] Step S304: Acquire a third audio signal collected by a microphone and an echo estimation signal generated by the echo canceller according to the reference signal and the working mode, and subtract the echo estimation signal from the third audio signal to generate the fourth audio signal.

[0059] Specifically, when the microphone collects the third audio signal, the adaptive filter coefficients of the echo canceller that correspond to the detected current working mode of the smart speaker are selected, an echo estimation signal is generated from the reference signal, and the echo estimation signal is subtracted from the third audio signal to generate the fourth audio signal, thereby performing echo cancellation.

[0060] In this embodiment of the present application, because the collected audio signal is echo-cancelled by the echo canceller according to the working mode of the smart speaker, echo cancellation can be tailored to the characteristics of each mode, which can effectively reduce echo cancellation error.

Embodiment 4

[0062] This embodiment further describes Embodiment 3. For parts that are the same as or similar to Embodiment 3, refer to the related description of Embodiment 3; they are not repeated here. In this embodiment, the echo canceller includes a first adaptive filter and a second adaptive filter. As shown in FIG. 4, step S304 includes:

[0063] Step S401: if the working mode of the smart speaker is a first preset working mode, obtain the echo estimation signal generated by the first adaptive filter according to the reference signal;

[0064] In this embodiment of the present application, when it is detected that the smart speaker is in a first use scenario included in the first preset working mode, the smart speaker is in the first preset working mode; the echo estimation signal generated by the first adaptive filter preset for the first preset working mode is acquired, and echo cancellation is performed on the third audio signal according to that echo estimation signal.

[0065] In one embodiment, the first preset working mode may be the voice working mode; in that case, the corresponding first use scenarios are, for example, the smart speaker being used for voice playback or a telephone call.

[0066] Step S402: if the working mode of the smart speaker is a second preset working mode, obtain the echo estimation signal generated by the second adaptive filter according to the reference signal.

[0067] In this embodiment of the present application, the second preset working mode includes a second use scenario of the smart speaker. When it is detected that the smart speaker is in the second use scenario, the smart speaker is in the second preset working mode; the echo estimation signal generated by the second adaptive filter preset for the second preset working mode is acquired, and echo cancellation is performed on the third audio signal according to that echo estimation signal.

[0068] In one embodiment, the second preset working mode may be the music playback mode; in that case, the corresponding second use scenario is, for example, the smart speaker being used for music playback.
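The mode-dependent selection of Embodiments 3 and 4 amounts to a lookup from the detected working mode to the filter that produces the echo estimate. The sketch below assumes NumPy and uses hypothetical class names; coefficient adaptation is deliberately left out so that only the dispatch logic is visible.

```python
from enum import Enum

import numpy as np


class WorkingMode(Enum):
    VOICE = "voice"   # first preset working mode: voice playback, phone calls
    MUSIC = "music"   # second preset working mode: music playback


class FirEchoFilter:
    """Minimal echo-path model; coefficient adaptation is omitted here."""

    def __init__(self, coeffs):
        self.coeffs = np.asarray(coeffs, dtype=float)

    def estimate(self, reference):
        return np.convolve(reference, self.coeffs)[: len(reference)]


class ModeSwitchingEchoCanceller:
    """Echo canceller holding one adaptive filter per preset working mode."""

    def __init__(self, first_filter, second_filter):
        self.filters = {WorkingMode.VOICE: first_filter,
                        WorkingMode.MUSIC: second_filter}

    def process(self, third_signal, reference, mode):
        # Select the filter for the detected mode, then subtract its estimate.
        echo_est = self.filters[mode].estimate(reference)
        return np.asarray(third_signal, dtype=float) - echo_est


# Usage
canceller = ModeSwitchingEchoCanceller(FirEchoFilter([0.5]), FirEchoFilter([0.8, 0.1]))
reference = np.array([1.0, 2.0, 3.0])
third = np.array([1.0, 1.0, 1.0])
voice_out = canceller.process(third, reference, WorkingMode.VOICE)
music_out = canceller.process(third, reference, WorkingMode.MUSIC)
```

Keeping a separate filter per mode means each filter's state is preserved when the speaker switches modes, rather than re-converging from scratch.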

[0069] Embodiment 5

[0070] This embodiment further describes Embodiment 4. For parts that are the same as or similar to Embodiment 4, refer to the related description of Embodiment 4; they are not repeated here. As shown in FIG. 5, step S401 includes:

[0071] Step S501: If the working mode of the smart speaker is the voice working mode, determine, by using a least mean squares (LMS) algorithm, the coefficients of the first adaptive filter corresponding to the voice working mode;

[0072] In this embodiment of the present application, when the microphone collects the third audio signal and the smart speaker is in the voice working mode, the coefficients of the first adaptive filter corresponding to the voice working mode are determined by using the Least Mean Squares (LMS) algorithm. According to this embodiment, the LMS algorithm has good convergence performance: in addition to converging faster and being more stable than the Recursive Least Squares (RLS) algorithm, it has a higher initial convergence rate, smaller weight noise, and stronger noise suppression. Therefore, when a voice signal is detected, determining the coefficients of the first adaptive filter with LMS enables the first adaptive filter to suppress noise in the third audio signal more effectively.

[0073] Step S502: Generate the echo estimation signal according to the coefficients of the first adaptive filter and the reference signal.

[0074] Specifically, the coefficients of the first adaptive filter are adjusted, the reference signal is passed through the first adaptive filter to generate the echo estimation signal, and the echo estimation signal is subtracted from the third audio signal to generate the fourth audio signal.
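Paragraphs [0072] to [0074] name LMS for the voice mode without giving update equations. The sketch below uses the textbook LMS update w ← w + μ·e[n]·u[n], where u[n] holds the most recent reference samples and e[n] is the fourth-signal sample. NumPy, the tap count, and the step size μ are assumptions, not values taken from the application.

```python
import numpy as np


def lms_echo_cancel(third_signal, reference, num_taps=16, step_size=0.01):
    """Sample-by-sample LMS echo canceller.

    Returns (fourth_signal, final_coeffs). For stability, step_size should stay
    well below roughly 2 / (num_taps * mean reference power).
    """
    d = np.asarray(third_signal, dtype=float)
    x = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)          # adaptive filter coefficients
    u = np.zeros(num_taps)          # recent reference samples, newest first
    fourth = np.zeros_like(d)
    for n in range(len(d)):
        u = np.roll(u, 1)
        u[0] = x[n]
        y = w @ u                   # echo estimation signal sample
        e = d[n] - y                # fourth-signal sample (error)
        w = w + step_size * e * u   # LMS update: w <- w + mu * e * u
        fourth[n] = e
    return fourth, w


# Usage: identify a 4-tap echo path from a white-noise reference.
rng = np.random.default_rng(1)
reference = rng.standard_normal(5000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])
third = np.convolve(reference, echo_path)[:5000]
fourth, coeffs = lms_echo_cancel(third, reference, num_taps=8, step_size=0.01)
```

Normalized LMS (NLMS), which divides μ by the instantaneous reference power, is a common variant that makes the step size less sensitive to playback level.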

[0075] In this embodiment of the present application, when the smart speaker is in the voice working mode, the voice signal collected by the microphone is echo-cancelled through the first adaptive filter, so echo cancellation is matched to the characteristics of this mode, which can effectively reduce echo cancellation error.

Embodiment 6

[0077] This embodiment further describes Embodiment 4. For parts that are the same as or similar to Embodiment 4, refer to the related description of Embodiment 4; they are not repeated here. As shown in FIG. 6, step S402 includes:

[0078] Step S601: If the working mode of the smart speaker is the music playback mode, determine, by using a recursive least squares (RLS) algorithm, the coefficients of the second adaptive filter corresponding to the music playback mode;

[0079] In this embodiment of the present application, when the microphone collects the third audio signal and the smart speaker is in the music playback mode, the coefficients of the second adaptive filter corresponding to the music playback mode are determined by the RLS algorithm. Music contains many frequency components, and the RLS algorithm adapts better than LMS to non-stationary signals, so here its filtering performance is significantly better than that of the LMS algorithm. Determining the coefficients of the second adaptive filter with RLS therefore makes the second adaptive filter's echo cancellation of the third audio signal more adaptive.

[0080] Step S602: Generate the echo estimation signal according to the coefficients of the second adaptive filter and the reference signal.

[0081] Specifically, the coefficients of the second adaptive filter are adjusted, the reference signal is passed through the second adaptive filter to generate the echo estimation signal, and the echo estimation signal is subtracted from the third audio signal to generate the fourth audio signal.
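For the music mode, [0079] names RLS. The sketch below is the standard exponentially weighted RLS recursion, with gain vector k, inverse-correlation matrix P, and forgetting factor λ. NumPy and the parameter values (forgetting factor, initial regularisation `delta`, tap count) are assumptions.

```python
import numpy as np


def rls_echo_cancel(third_signal, reference, num_taps=16, forgetting=0.999, delta=0.01):
    """Exponentially weighted RLS echo canceller.

    forgetting: lambda in (0, 1]; delta: small regulariser, P starts as I / delta.
    Returns (fourth_signal, final_coeffs).
    """
    d = np.asarray(third_signal, dtype=float)
    x = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)
    P = np.eye(num_taps) / delta    # estimate of the inverse correlation matrix
    u = np.zeros(num_taps)          # recent reference samples, newest first
    fourth = np.zeros_like(d)
    for n in range(len(d)):
        u = np.roll(u, 1)
        u[0] = x[n]
        Pu = P @ u
        k = Pu / (forgetting + u @ Pu)            # gain vector
        e = d[n] - w @ u                          # a priori error = fourth-signal sample
        w = w + k * e
        P = (P - np.outer(k, Pu)) / forgetting    # inverse-correlation update
        fourth[n] = e
    return fourth, w


# Usage: the same kind of 4-tap echo path, identified with RLS.
rng = np.random.default_rng(2)
reference = rng.standard_normal(2000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])
third = np.convolve(reference, echo_path)[:2000]
fourth, coeffs = rls_echo_cancel(third, reference, num_taps=8)
```

RLS costs O(L²) operations per sample for L taps versus O(L) for LMS, which is the usual price paid for its faster tracking.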

[0082] In this embodiment of the present application, when the smart speaker is in the music playback mode, the audio signal collected by the microphone is echo-cancelled through the second adaptive filter, so echo cancellation is matched to the characteristics of this mode, which can effectively reduce echo cancellation error.

Embodiment 7

[0084] An embodiment of the present application provides an echo cancellation device, which can be integrated into an audio playback device or system, such as a smart speaker, that includes a speaker and a microphone, and is configured to execute the method steps of Embodiments 1 to 6. For ease of description, only the parts relevant to this embodiment of the present application are shown. As shown in FIG. 7, the echo cancellation device 700 includes:

[0085] The acquisition module 701 is configured to acquire N first audio signals corresponding to the N audio channels connected to the speaker input end, where N is an integer and N ≥ 2;

[0086] The synthesizing module 702 is configured to linearly transform the N first audio signals to synthesize a second audio signal, and to use the second audio signal as a reference signal for echo cancellation;

[0087] In one embodiment, the synthesizing module 702 includes:

[0088] a first acquisition unit, configured to acquire gain values for gain processing in the N audio channels, respectively;

[0089] an assigning unit, configured to assign corresponding weights to the N first audio signals according to the gain values corresponding to the N audio channels;

[0090] an accumulating unit, configured to multiply the amplitudes of the N first audio signals by the corresponding weights, and then accumulate to generate the second audio signal.

[0091] The cancellation module 703 is configured to acquire a third audio signal collected by a microphone, and perform echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[0092] In one embodiment, the cancellation module 703 includes:

[0093] a second obtaining unit, configured to obtain an echo estimation signal generated by the adaptive filter according to the reference signal;

[0094] a generating unit, configured to obtain a third audio signal collected by a microphone, and subtract the echo estimation signal from the third audio signal to generate the fourth audio signal.
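To show how the modules of device 700 described so far fit together, the sketch below maps acquisition module 701, synthesizing module 702, and cancellation module 703 onto methods of one hypothetical class. It assumes NumPy, passes the channel weights in directly instead of looking them up from gain values, and uses a fixed echo path in place of the adaptive filter.

```python
import numpy as np


class EchoCancellationDevice:
    """Hypothetical mapping of modules 701 to 703 onto methods."""

    def __init__(self, weights, echo_path):
        self.weights = np.asarray(weights, dtype=float)      # derived from channel gains
        self.echo_path = np.asarray(echo_path, dtype=float)  # stands in for the adaptive filter

    def acquire(self, channel_signals):
        """Acquisition module 701: collect the N >= 2 first audio signals."""
        x = np.asarray(channel_signals, dtype=float)
        if x.ndim != 2 or x.shape[0] < 2:
            raise ValueError("expected N >= 2 channel signals")
        return x

    def synthesize(self, first_signals):
        """Synthesizing module 702: weighted sum -> reference signal."""
        return self.weights @ first_signals

    def cancel(self, third_signal, reference):
        """Cancellation module 703: subtract the echo estimate from the mic signal."""
        estimate = np.convolve(reference, self.echo_path)[: len(reference)]
        return np.asarray(third_signal, dtype=float) - estimate

    def process(self, channel_signals, third_signal):
        reference = self.synthesize(self.acquire(channel_signals))
        return self.cancel(third_signal, reference)


# Usage
device = EchoCancellationDevice(weights=[0.5, 0.5], echo_path=[0.7])
channels = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
near_end = np.array([0.1, -0.2, 0.3])
third = near_end + 0.7 * (0.5 * channels[0] + 0.5 * channels[1])
fourth = device.process(channels, third)  # approximately equal to near_end
```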

[0095] In an embodiment, the echo cancellation device 700 further includes:

[0096] a frequency division processing module, configured to frequency-divide the fourth audio signal, input the resulting signals into the corresponding N audio channels, and, after gain processing, input them into the speakers connected to the N audio channels, so as to instruct the speakers to play the gain-processed fourth audio signal.

[0097] In an embodiment, the echo cancellation device 700 further includes:

[0098] a judging module, configured to calculate an audio signal difference value from the fourth audio signal and a preset standard audio signal by using the Perceptual Evaluation of Audio Quality (PEAQ) algorithm, and to determine whether the audio signal difference value is within a preset audio signal difference range; and, if it is not, to return the audio signal difference value to the adaptive filter so that the adaptive filter adjusts its filter coefficients according to the audio signal difference value.

[0099] In one embodiment, the echo cancellation device 700 further includes a detection module for detecting a working mode of the smart speaker;

[0100] Correspondingly, the second obtaining unit is specifically configured to:

[0101] acquire an echo estimation signal generated by the echo canceller according to the reference signal and the working mode.

[0102] In one embodiment, the echo canceller includes a first adaptive filter and a second adaptive filter, and the second obtaining unit is further specifically configured to:

[0103] if the working mode of the smart speaker is a first preset working mode, acquiring the echo estimation signal generated by the first adaptive filter according to the reference signal;

[0104] if the working mode of the smart speaker is a second preset working mode, acquiring the echo estimation signal generated by the second adaptive filter according to the reference signal.
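Selecting the adaptive filter by working mode is a simple dispatch. In this sketch, the two filters are passed in as callables that map a reference signal to an echo estimation signal. `WorkingMode` and `ModeSwitchingCanceller` are hypothetical names, and the two mode values correspond to the voice and music playback modes named in the following embodiments.

```python
from enum import Enum
from typing import Callable, List, Sequence

EchoFilter = Callable[[Sequence[float]], List[float]]


class WorkingMode(Enum):
    VOICE = "voice"                    # first preset working mode
    MUSIC_PLAYBACK = "music_playback"  # second preset working mode


class ModeSwitchingCanceller:
    """Echo canceller holding a first and a second adaptive filter."""

    def __init__(self, first_filter: EchoFilter, second_filter: EchoFilter) -> None:
        self.first_filter = first_filter
        self.second_filter = second_filter

    def echo_estimate(self, reference: Sequence[float], mode: WorkingMode) -> List[float]:
        # Route the reference signal to the filter that matches the detected mode.
        if mode is WorkingMode.VOICE:
            return self.first_filter(reference)
        if mode is WorkingMode.MUSIC_PLAYBACK:
            return self.second_filter(reference)
        raise ValueError(f"unsupported working mode: {mode!r}")


if __name__ == "__main__":
    canceller = ModeSwitchingCanceller(
        first_filter=lambda ref: [0.5 * x for x in ref],    # placeholder filters
        second_filter=lambda ref: [0.25 * x for x in ref],
    )
    print(canceller.echo_estimate([1.0, 2.0], WorkingMode.VOICE))
```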

[0105] In one embodiment, the first preset working mode is a voice working mode.

[0106] In one embodiment, the second obtaining unit is specifically configured to: determine, by a least mean square (LMS) algorithm, the coefficients of the first adaptive filter corresponding to the voice working mode; and

[0107] acquire the echo estimation signal generated according to the coefficients of the first adaptive filter and the reference signal.
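A minimal least mean square (LMS) adaptive filter for the voice working mode is sketched below. It uses the textbook update w ← w + μ·e·x, where x holds the most recent reference samples and e is the microphone sample minus the echo estimate. The class name `LMSFilter`, the tap count, and the step size μ are illustrative assumptions. Practical systems often use the normalized variant (NLMS), which the text does not mention.

```python
import random
from typing import List, Sequence, Tuple


class LMSFilter:
    """Least-mean-square adaptive FIR filter (hypothetical illustration)."""

    def __init__(self, num_taps: int, step_size: float) -> None:
        self.coeffs = [0.0] * num_taps   # filter coefficients w
        self.history = [0.0] * num_taps  # x[n], x[n-1], ..., most recent first
        self.step_size = step_size       # mu

    def process(self, ref_sample: float, mic_sample: float) -> Tuple[float, float]:
        """Consume one sample pair; return (echo estimate, echo-cancelled output)."""
        self.history = [ref_sample] + self.history[:-1]
        estimate = sum(w * x for w, x in zip(self.coeffs, self.history))
        error = mic_sample - estimate
        # LMS coefficient update: w <- w + mu * e * x
        self.coeffs = [w + self.step_size * error * x
                       for w, x in zip(self.coeffs, self.history)]
        return estimate, error


def convolve(path: Sequence[float], signal: Sequence[float]) -> List[float]:
    """Simulated echo: signal passed through an FIR echo path (zero initial state)."""
    return [sum(path[k] * signal[n - k] for k in range(len(path)) if n - k >= 0)
            for n in range(len(signal))]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    true_path = [0.6, -0.3, 0.1]               # assumed echo path for the demo
    mic = convolve(true_path, reference)       # echo only, no near-end speech
    lms = LMSFilter(num_taps=3, step_size=0.05)
    for x, d in zip(reference, mic):
        lms.process(x, d)
    print([round(w, 3) for w in lms.coeffs])
```

LMS costs O(N) operations per sample for N taps. This low cost is one plausible reason to pair it with speech, although the text does not state its rationale.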

[0109] In one embodiment, the second preset working mode is a music playback mode.

[0110] In an embodiment, the second obtaining unit is specifically configured to: determine, by a recursive least squares (RLS) algorithm, the coefficients of the second adaptive filter corresponding to the music playback mode; and

[0111] acquire the echo estimation signal generated according to the coefficients of the second adaptive filter and the reference signal.
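For the music playback mode, a recursive least squares (RLS) filter can be sketched with the standard exponentially weighted recursion:

- gain vector k = Px / (λ + xᵀPx)
- coefficient update w ← w + k·e
- inverse-correlation update P ← (P − k·xᵀP) / λ, with P initialized to I/δ

The class name `RLSFilter` and the values of the forgetting factor λ and the initialization constant δ are assumptions for illustration. RLS typically converges faster than LMS but costs more per sample (O(N²) versus O(N) for N taps), though the text does not explain why each algorithm is assigned to its mode.

```python
import random
from typing import List, Sequence, Tuple


class RLSFilter:
    """Exponentially weighted recursive-least-squares FIR filter (hypothetical)."""

    def __init__(self, num_taps: int, forgetting: float = 0.999,
                 delta: float = 0.01) -> None:
        n = num_taps
        self.coeffs = [0.0] * n
        self.history = [0.0] * n                 # most recent sample first
        self.lam = forgetting                    # forgetting factor lambda
        # Inverse correlation matrix estimate, initialized to I / delta.
        self.P = [[(1.0 / delta if i == j else 0.0) for j in range(n)]
                  for i in range(n)]

    def process(self, ref_sample: float, mic_sample: float) -> Tuple[float, float]:
        """Consume one sample pair; return (echo estimate, echo-cancelled output)."""
        self.history = [ref_sample] + self.history[:-1]
        x, P, n = self.history, self.P, len(self.coeffs)
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]           # k = P x / (lambda + x^T P x)
        estimate = sum(w * xi for w, xi in zip(self.coeffs, x))
        error = mic_sample - estimate            # a priori error
        self.coeffs = [w + g * error for w, g in zip(self.coeffs, gain)]
        # P <- (P - k x^T P) / lambda; x^T P equals (P x)^T because P is symmetric.
        self.P = [[(P[i][j] - gain[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return estimate, error


def convolve(path: Sequence[float], signal: Sequence[float]) -> List[float]:
    """Simulated echo: signal passed through an FIR echo path (zero initial state)."""
    return [sum(path[k] * signal[n - k] for k in range(len(path)) if n - k >= 0)
            for n in range(len(signal))]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    true_path = [0.6, -0.3, 0.1]               # assumed echo path for the demo
    mic = convolve(true_path, reference)
    rls = RLSFilter(num_taps=3)
    for x, d in zip(reference, mic):
        rls.process(x, d)
    print([round(w, 3) for w in rls.coeffs])
```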

[0112] It can be seen that, in this embodiment of the present application, the N first audio signals in the N audio channels are combined into a second audio signal that serves as the reference signal for echo cancellation. Because the audio signals of multiple audio channels are synthesized into a single reference signal, echo cancellation is performed once, uniformly, for all channels. This eliminates the need to perform separate echo cancellation for the audio signal in each channel and thereby improves the efficiency of echo cancellation. Moreover, the echo contained in the noisy audio signal collected by the microphone is itself a signal synthesized from the audio signals in the multiple audio channels. Using a reference signal synthesized from those same channels therefore simulates the echo more accurately and improves the sound quality of the speaker output after echo cancellation.

Embodiment 8

[0114] FIG. 8 is a schematic structural diagram of a smart speaker according to an embodiment of the present application. The smart speaker 800 includes a processor 801, a memory 802, and a computer program 803 stored in the memory 802 and executable on the processor 801. When the processor 801 executes the computer program 803, the steps of the foregoing echo cancellation method embodiments are implemented.

[0115] Exemplarily, the computer program 803 may be divided into one or more units/modules, which are stored in the memory 802 and executed by the processor 801 to implement the present application. The one or more units/modules may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 803 in the smart speaker 800.

[0116] The processor 801 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

[0117] The memory 802 may be an internal storage unit of the smart speaker 800, such as a hard disk or a memory of the smart speaker 800. The memory 802 may also be an external storage device of the smart speaker 800, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the smart speaker 800. Further, the memory 802 may include both an internal storage unit of the smart speaker 800 and an external storage device. The memory 802 is configured to store the computer program and other programs and data required by the smart speaker 800, and may also be used to temporarily store data that has been output or is to be output.

[0118] Those skilled in the art can understand that FIG. 8 is only an example of the smart speaker 800 and does not constitute a limitation on it. The smart speaker 800 may include more or fewer components than shown, combine some components, or use different components; for example, it may further include an input/output device, a network access device, a bus, and the like.

[0119] Those skilled in the art can clearly understand that, for convenience and brevity of description, the above division of functional units and modules is merely an example. In practical applications, the foregoing functions may be allocated to different functional units and modules as required, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiment may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not intended to limit the protection scope of this application. For the specific working process of the units and modules in the smart speaker, reference may be made to the corresponding process in the foregoing method embodiments, and details are not described herein again.

[0120] In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not detailed or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

[0121] Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0122] In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely schematic. The division of the modules or units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

[0123] The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.

[0124] In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of software functional unit.

[0125] If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes of the methods in the foregoing embodiments of this application may also be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium.

[0126] The above are only optional embodiments of the present application and are not intended to limit the present application. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included in the scope of the claims of this application.

Claims


[Claim 1] An echo cancellation method applied to a smart speaker, characterized in that the method includes: acquiring N first audio signals corresponding to N audio channels connected to a speaker input end, wherein N is an integer greater than or equal to 2;

Performing linear transformation on the N first audio signals to synthesize a second audio signal, and using the second audio signal as a reference signal for echo cancellation;

Acquiring a third audio signal collected by a microphone, and performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[Claim 2] The echo cancellation method according to claim 1, wherein linearly transforming the N first audio signals to synthesize a second audio signal comprises: respectively obtaining the gain values used for gain processing in the N audio channels;

Assigning corresponding weights to the N first audio signals according to the gain values corresponding to the N audio channels;

Multiplying the amplitudes of the N first audio signals by the corresponding weights and accumulating them to generate the second audio signal.

[Claim 3] The echo cancellation method according to claim 1, wherein acquiring a third audio signal collected by a microphone and performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal comprises:

Obtaining an echo estimation signal generated by the echo canceller according to the reference signal;

Acquiring a third audio signal collected by a microphone, and subtracting the echo estimation signal from the third audio signal to generate the fourth audio signal.

[Claim 4] The echo cancellation method according to any one of claims 1 to 3, wherein after performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal, the method further comprises:

Performing frequency division on the fourth audio signal, inputting the resulting signals into the corresponding N audio channels respectively, and, after gain processing, inputting them to the speakers connected to the N audio channels, so as to instruct the speakers to play the gain-processed fourth audio signal.

[Claim 5] The echo cancellation method according to claim 3, wherein after performing echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal, the method further comprises:

Calculating an audio signal difference value from the fourth audio signal and a preset standard audio signal using the Perceptual Evaluation of Audio Quality (PEAQ) algorithm, and determining whether the audio signal difference value is within a preset audio signal difference range;

If the audio signal difference value is not within a preset audio signal difference range, returning the audio signal difference value to the echo canceller, so that the echo canceller adjusts a filter coefficient according to the audio signal difference value.

[Claim 6] The echo cancellation method according to claim 3, wherein the echo cancellation method further comprises: detecting a working mode of the smart speaker;

Correspondingly, the acquiring the echo estimation signal generated by the echo canceller according to the reference signal specifically includes:

Acquiring an echo estimation signal generated by the echo canceller according to the reference signal and the working mode.

[Claim 7] The echo cancellation method according to claim 6, wherein the echo canceller includes a first adaptive filter and a second adaptive filter, and acquiring the echo estimation signal generated by the echo canceller according to the reference signal and the working mode comprises:

If the working mode of the smart speaker is a first preset working mode, obtaining the echo estimation signal generated by the first adaptive filter according to the reference signal;

If the working mode of the smart speaker is a second preset working mode, obtaining the echo estimation signal generated by the second adaptive filter according to the reference signal.

[Claim 8] The echo cancellation method according to claim 7, wherein the first preset working mode is a voice working mode.

[Claim 9] The echo cancellation method according to claim 8, wherein acquiring the echo estimation signal generated by the first adaptive filter according to the reference signal comprises: determining, by a least mean square algorithm, coefficients of the first adaptive filter corresponding to the voice working mode;

Obtaining the echo estimation signal generated according to the coefficients of the first adaptive filter and the reference signal.

[Claim 10] The echo cancellation method according to claim 7, wherein the second preset working mode is a music playback mode.

[Claim 11] The echo cancellation method according to claim 10, wherein acquiring the echo estimation signal generated by the second adaptive filter according to the reference signal comprises: determining, by a recursive least squares algorithm, coefficients of the second adaptive filter corresponding to the music playback mode;

Obtaining the echo estimation signal generated according to the coefficients of the second adaptive filter and the reference signal.

[Claim 12] An echo cancellation device, characterized in that the device includes:

An acquisition module, configured to acquire N first audio signals corresponding to N audio channels connected to a speaker input end, wherein N is an integer greater than or equal to 2;

A synthesis module, configured to perform linear transformation on the N first audio signals to synthesize a second audio signal, and use the second audio signal as a reference signal for echo cancellation;

A cancellation module, configured to acquire a third audio signal collected by a microphone, and perform echo cancellation on the third audio signal according to the reference signal to generate a fourth audio signal.

[Claim 13] The echo cancellation device according to claim 12, wherein the synthesis module includes: a first acquisition unit, configured to respectively acquire the gain values used for gain processing in the N audio channels;

An allocation unit, configured to allocate corresponding weights to the N first audio signals according to the gain values corresponding to the N audio channels;

And an accumulating unit, configured to multiply the amplitudes of the N first audio signals by corresponding weights, and then accumulate to generate the second audio signal.

[Claim 14] The echo cancellation device according to claim 12, wherein the cancellation module includes: a second acquisition unit, configured to acquire an echo estimation signal generated by the echo canceller according to the reference signal;

And a generating unit, configured to acquire a third audio signal collected by a microphone, and subtract the echo estimation signal from the third audio signal to generate the fourth audio signal.

[Claim 15] The echo cancellation device according to any one of claims 12 to 14, wherein the device further comprises:

A frequency division processing module, configured to perform frequency division on the fourth audio signal, input the resulting signals into the corresponding N audio channels respectively, and, after gain processing, input them to the speakers connected to the N audio channels, so as to instruct the speakers to play the gain-processed fourth audio signal.

[Claim 16] The echo cancellation device according to claim 14, wherein the device further comprises: a judging module, configured to calculate an audio signal difference value from the fourth audio signal and a preset standard audio signal using the Perceptual Evaluation of Audio Quality (PEAQ) algorithm, and determine whether the audio signal difference value is within a preset audio signal difference range;

[Claim 17] The echo cancellation device according to claim 14, wherein the device further comprises a detection module for detecting a working mode of the smart speaker;

Correspondingly, the second obtaining unit is specifically configured to:

[Claim 18] The echo cancellation device according to claim 17, wherein the echo canceller includes a first adaptive filter and a second adaptive filter, and the second obtaining unit is further specifically configured to:

[Claim 19] A smart speaker, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the steps of the method according to any one of claims 1 to 11 are implemented.

[Claim 20] A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 11 are implemented.