WO2020103209A1 - Echo cancellation method and terminal - Google Patents

Echo cancellation method and terminal

Info

Publication number
WO2020103209A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
audio
terminal
playback device
played
Prior art date
Application number
PCT/CN2018/119887
Other languages
English (en)
French (fr)
Inventor
林惠东
陈郭皇寿
孙磊
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司
Priority to US17/285,616 (US20210321005A1)
Priority to EP18940719.0A (EP3882913A4)
Publication of WO2020103209A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/085 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using digital techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02163 Only one microphone

Definitions

  • The embodiments of the present invention relate to the technical field of real-time audio and video communication, and in particular to an echo cancellation method and a terminal.
  • A plain audio or video call scenario is somewhat monotonous and no longer meets users' needs. "Watch and talk" applications have therefore emerged, in which a user watches TV programs and talks on the same terminal (such as a mobile phone or a TV); voice chat during game play likewise involves both voice collection and playback.
  • In such scenarios, the background sound of the TV program is collected by the microphone together with the human voice and sent to the far end, which degrades call quality; similarly, during a game, the different call ends pick up each other's voices back and forth, generating unwanted echoes and noisy sound that harm the user experience.
  • One existing approach simply treats the background sound collected by the terminal as noise and suppresses it.
  • With this approach the background sound cannot be accurately identified and only a small part of the noise is eliminated, so voice call quality suffers.
  • Another existing approach obtains the background sound at the software layer, synthesizes it with the far-end audio, and uses the result directly as reference data for echo cancellation.
  • However, the background sound often has to be synthesized again after it is acquired, so it differs from the data actually played, which weakens the cancellation effect. A technical solution that effectively eliminates external echoes and improves voice call quality is therefore urgently needed.
  • the embodiments of the present application provide an echo cancellation method and terminal.
  • an embodiment of the present application provides an echo cancellation method.
  • the method includes:
  • the terminal collects first-end audio data, where the first-end audio data includes the voice of the first-end user and the audio played by the audio playback device on the terminal;
  • the terminal queries the reference audio data corresponding to the first-end audio data from the buffer area, and the buffer area buffers the audio data on the audio playback device as reference audio data;
  • the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and obtains corrected audio data;
  • the terminal sends the corrected audio data to the second-end user terminal.
  • Because the terminal caches the audio data on the audio playback device in advance as reference audio data, when the terminal collects the audio played on the audio playback device together with the voice of the first-end user during playback, it can use the reference audio data to eliminate the played audio from the first-end audio data and keep only the voice of the first-end user. This prevents the played audio from interfering with the first-end user's voice and improves the call quality for the second-end user.
  • the audio data on the audio playback device includes audio data to be played on the audio playback device.
  • querying the reference audio data corresponding to the first-end audio data from the buffer area by the terminal includes:
  • the terminal determines the similarity between the first-end audio data and each reference audio data in the buffer area
  • the terminal determines the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
  • Because the reference audio data corresponding to the first-end audio data is determined by comparing the similarity between the first-end audio data and each piece of reference audio data, the acquisition time of the first-end audio data does not have to correspond strictly to the buffering time of the reference audio data, which improves the stability of echo cancellation and reduces complexity.
  • Optionally, before the terminal sends the corrected audio data to the second-end user terminal, the method further includes:
  • the terminal performs gain processing on the corrected audio data.
  • After the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and obtains the corrected audio data, the power of the corrected audio is correspondingly weaker. Gain processing is therefore applied to the corrected audio data to raise the power of the audio received by the second-end user terminal, which improves the call between the first-end user and the second-end user.
  • Optionally, the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and obtaining the corrected audio data includes:
  • the terminal inputs the reference audio data and the first-end audio data to a linear adaptive filter; the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
  • Optionally, the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and obtaining the corrected audio data includes:
  • the terminal inputs the reference audio data and the first-end audio data to a linear adaptive filter; the linear adaptive filter uses the reference audio data to estimate the echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
  • Optionally, before the terminal inputs the reference audio data and the first-end audio data to the linear adaptive filter, the method further includes:
  • the terminal adjusts the audio parameters of the reference audio data and of the first-end audio data to preset values that match the linear adaptive filter.
  • Optionally, the method further includes:
  • when determining that the attenuation from the first-end audio data to the corrected audio data is greater than a preset threshold, the terminal replaces the corrected audio data with comfort noise.
  • When the attenuation from the first-end audio data to the corrected audio data exceeds the preset threshold, most of the first-end audio data is audio played by the audio playback device and the first-end user's voice makes up only a very small proportion. The first-end audio data can then be discarded directly, with comfort noise added in its place to avoid audible fluctuations.
  • an embodiment of the present application provides a terminal, including:
  • a collection module configured to collect first-end audio data, where the first-end audio data includes the voice of the first-end user and the audio played by the audio playback device on the terminal;
  • the query module is used to query the reference audio data corresponding to the first-end audio data from the buffer area, and the buffer area buffers the audio data on the audio playback device as reference audio data;
  • a processing module configured to use the reference audio data to eliminate the audio played by the audio playback device in the first-end audio data, and determine the modified audio data;
  • the sending module is used to send the modified audio data to the second-end user terminal.
  • the audio data on the audio playback device includes audio data to be played on the audio playback device.
  • the query module is specifically used to:
  • the reference audio data with the highest similarity to the first-end audio data is determined as the reference audio data corresponding to the first-end audio data.
  • a gain module is also included.
  • the gain module is specifically used for:
  • processing module is specifically used to:
  • the reference audio data and the first-end audio data are input to a linear adaptive filter, and the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
  • processing module is specifically used to:
  • the reference audio data and the first-end audio data are input to a linear adaptive filter; the linear adaptive filter uses the reference audio data to estimate the echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
  • processing module is also used to:
  • processing module is also used to:
  • the modified audio data is replaced with comfort noise.
  • An embodiment of the present application provides a terminal device, including at least one processor and at least one memory, wherein the memory stores a computer program and, when the program is executed by the processor, the processor executes the steps of the above echo cancellation method.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program executable by a terminal device; when the program runs on the terminal device, it causes the terminal device to perform the steps of the foregoing echo cancellation method.
  • Because the terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when audio is played on the audio playback device and the terminal collects both the played audio and the first-end user's voice, it can use the reference audio data to eliminate the played audio from the first-end audio data and keep the first-end user's voice. This prevents the played audio from interfering with the first-end user's voice and improves the call quality between the first-end user and the second-end user.
  • Furthermore, the linear adaptive filter is used to fit the echo audio corresponding to the reference audio data, so that the echo audio is closer to the audio actually played by the audio playback device; cancelling the played audio in the first-end audio data with this echo audio therefore gives a better echo cancellation result.
  • the modified audio data is subjected to gain processing and then sent to the second-end user terminal to increase the power of the modified audio and improve the voice effect heard by the second-end user.
  • FIG. 1 is an application scenario diagram provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an echo cancellation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an echo cancellation method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • the echo cancellation method in the embodiment of the present application may be applied to the application scenario shown in FIG. 1, where the application scenario includes a first-end user terminal 101 and a second-end user terminal 102.
  • the first-end user terminal 101 and the second-end user terminal 102 are electronic devices equipped with a call function and an audio and video playback function.
  • the electronic device may be a smart TV, a smart phone, a tablet computer, a portable personal computer, or the like.
  • the first-end user terminal 101 and the second-end user terminal 102 can make a call by dialing a phone or by using instant messaging software, where the call includes a voice call and a video call.
  • An application program for playing audio and video is installed on the first-end user terminal 101 and the second-end user terminal 102.
  • In the following description, the first-end user terminal 101 is taken as the near-end user terminal and the second-end user terminal 102 as the far-end user terminal, by way of example.
  • The first-end user terminal 101, that is, the near-end user terminal, collects the voice of the near-end user and the audio played by the audio playback device, performs echo cancellation on the collected audio data, and sends the result to the far-end user terminal.
  • The far-end user terminal receives the echo-cancelled audio data sent by the near-end user terminal.
  • Alternatively, the first-end user terminal 101 may be the far-end user terminal and the second-end user terminal 102 may be the near-end user terminal.
  • the TV saves the TV audio data that the speaker needs to play in the buffer area as reference audio data.
  • The speaker plays the TV audio.
  • The first-end user speaks into the microphone on the TV.
  • The microphone collects the TV audio played by the speaker and the first-end user's voice as the first-end audio data. If echo cancellation is not applied to the audio collected by the microphone, the TV audio played by the speaker is also sent to the second-end user terminal, so the second-end user hears audio other than the first-end user's voice, which degrades call quality.
  • The reference audio data in the buffer area is therefore used to eliminate the TV audio played by the speaker from the first-end audio data, leaving the first-end user's voice, which is gain-processed and sent to the second-end user, thereby improving call quality.
  • an embodiment of the present application provides a flow of an echo cancellation method.
  • The flow of this method may be executed by a terminal, and the terminal may be the above-mentioned first-end user terminal 101. The flow includes the following steps:
  • Step S201: the terminal collects the first-end audio data, which includes the voice of the first-end user and the audio played by the audio playback device on the terminal.
  • the terminal is an electronic device with a call function and an audio and video playback function.
  • the electronic device may be a smart TV, a smart phone, a tablet computer, a portable personal computer, or the like.
  • the terminal collects the first end audio data through the microphone.
  • the audio playback device on the terminal may be a speaker.
  • the audio played by the audio playback device may be audio in a video, such as audio of a TV program, audio played in a player, and so on.
  • the audio played by the audio playback device can also be pure audio, such as music played by a music player, radio broadcasted by a radio station, and cell phone ringtones.
  • the audio played by the audio playback device may also be the voice of the second-end user received by the terminal.
  • the audio playback device may simultaneously play multiple audios. For example, when the speaker simultaneously receives the audio data of the TV program and the voice data of the second end user, the audio of the TV program and the voice of the second end user are simultaneously played.
  • The duration of the audio collected by the terminal each time can be preset; for example, each collection lasts 5 ms.
  • The first-end audio data further includes audio parameters of the first-end audio data.
  • The audio parameters include the audio size, sampling rate, number of channels, bit width, and whether the channels are interleaved.
  • Step S202: the terminal queries the reference audio data corresponding to the first-end audio data from the buffer area.
  • the buffer area is preset, and the buffer area buffers the audio data on the audio playback device as reference audio data.
  • The duration of each piece of reference audio data cached in the buffer area can be set in advance, and it corresponds to the duration of the audio played on the audio playback device that the terminal collects each time. For example, each piece of reference audio data lasts 5 ms, and each collected segment of audio played on the audio playback device also lasts 5 ms.
  • the audio data on the audio playback device includes audio data to be played on the audio playback device, and the buffer area buffers the audio data to be played on the audio playback device as reference audio data.
  • the audio data on the audio playback device includes audio data that has been played on the audio playback device, and the buffer area buffers the audio data that has been played on the audio playback device as reference audio data.
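  • The buffer area described above can be sketched as a bounded queue of fixed-length frames. This is only an illustration: the 16 kHz sampling rate, the capacity, and the `ReferenceBuffer` class are assumptions, not details given in the text; only the 5 ms frame duration comes from the example above.

```python
from collections import deque

SAMPLE_RATE = 16_000          # assumed sampling rate (Hz); not given in the text
FRAME_MS = 5                  # frame duration used as the example in the text
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # samples per 5 ms frame


class ReferenceBuffer:
    """Bounded buffer of fixed-length reference frames (hypothetical helper)."""

    def __init__(self, max_frames=64):
        # A deque with maxlen silently discards the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push_playback(self, samples):
        """Split audio headed for the speaker into 5 ms frames and cache them."""
        whole = len(samples) - len(samples) % FRAME_LEN
        for start in range(0, whole, FRAME_LEN):
            self._frames.append(list(samples[start:start + FRAME_LEN]))
        return samples[whole:]   # leftover samples that do not fill a frame yet

    def frames(self):
        return list(self._frames)


buf = ReferenceBuffer(max_frames=4)
leftover = buf.push_playback([0.0] * 200)    # two full frames plus a partial one
```

  • Keeping the reference frames the same length as the capture frames makes the later lookup a frame-to-frame comparison rather than a search over arbitrary offsets.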
  • Step S203: the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and obtains the corrected audio data.
  • Because the reference audio data is highly correlated with the audio played by the audio playback device, subtracting the reference audio data from the first-end audio data cancels the played audio in the first-end audio data.
  • Step S204: the terminal sends the corrected audio data to the second-end user terminal.
  • Because the terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when the terminal collects the audio played on the audio playback device together with the voice of the first-end user during playback, it can use the reference audio data to eliminate the played audio from the first-end audio data and keep the voice of the first-end user. This prevents the played audio from interfering with the first-end user's voice and improves the quality of the call between the first-end user and the second-end user.
  • For querying the reference audio data, the embodiments of the present application provide at least the following implementation modes:
  • In one implementation mode, the terminal determines the similarity between the first-end audio data and each piece of reference audio data in the buffer area, and takes the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
  • Multiple pieces of reference audio data are cached in the buffer area in advance. For each piece, the first-end audio data and that reference audio data are input into a linear adaptive filter, and the similarity between them is determined from the convergence speed of the linear adaptive filter. Once the similarity between the first-end audio data and every piece of reference audio data has been obtained, the piece with the highest similarity is selected from the buffer area as the reference audio data corresponding to the first-end audio data.
  • Because the reference audio data corresponding to the first-end audio data is determined by comparing the similarity between the first-end audio data and each piece of reference audio data, the acquisition time of the first-end audio data does not have to correspond strictly to the buffering time of the reference audio data, which improves the stability of echo cancellation and reduces complexity.
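  • The highest-similarity lookup can be sketched as follows. The text derives similarity from the convergence speed of the linear adaptive filter; this sketch swaps in a simpler stand-in, the absolute normalized correlation between frames, so the function names and the similarity measure are assumptions made for illustration.

```python
import math


def similarity(mic_frame, ref_frame):
    """Absolute normalized correlation in [0, 1]; a stand-in similarity measure."""
    dot = sum(m * r for m, r in zip(mic_frame, ref_frame))
    norm = math.sqrt(sum(m * m for m in mic_frame) * sum(r * r for r in ref_frame))
    return abs(dot) / norm if norm > 0 else 0.0


def find_reference(mic_frame, ref_frames):
    """Return the index of the buffered frame most similar to the captured frame."""
    scores = [similarity(mic_frame, ref) for ref in ref_frames]
    return max(range(len(scores)), key=scores.__getitem__)


# Three buffered reference frames with different content (synthetic test tones).
refs = [
    [math.sin(0.3 * n) for n in range(80)],
    [math.sin(0.7 * n) for n in range(80)],
    [math.sin(1.1 * n) for n in range(80)],
]
voice = [0.2 * math.sin(0.05 * n) for n in range(80)]
# Captured frame: an attenuated copy of refs[1] plus the near-end voice.
mic = [0.6 * e + v for e, v in zip(refs[1], voice)]
best = find_reference(mic, refs)
```

  • Because the match depends only on signal content, the capture and buffering timestamps never need to be compared, which is the robustness benefit the text describes.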
  • In another implementation mode, the duration of each collected piece of first-end audio data and the duration of each piece of reference audio data are set to the same value in advance, each buffered piece of reference audio data is assigned a sequence number, each piece of first-end audio data is also assigned a sequence number, and the reference audio data corresponding to a piece of first-end audio data is determined by matching the sequence numbers.
  • Because the reference audio data is numbered in the order in which it is buffered and the first-end audio data is numbered in the order in which it is collected, the two can be matched by sequence number, which improves matching efficiency.
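  • A minimal sketch of sequence-number matching, assuming both streams start together and use equal frame durations. The `SequencedBuffer` class and its method names are hypothetical.

```python
class SequencedBuffer:
    """Caches reference frames under consecutive sequence numbers (hypothetical helper)."""

    def __init__(self):
        self._next_ref_seq = 0
        self._next_mic_seq = 0
        self._refs = {}

    def cache_reference(self, frame):
        """Number a reference frame in buffering order and store it."""
        seq = self._next_ref_seq
        self._refs[seq] = frame
        self._next_ref_seq += 1
        return seq

    def tag_capture(self, frame):
        """Number a captured frame in collection order."""
        seq = self._next_mic_seq
        self._next_mic_seq += 1
        return seq, frame

    def match(self, mic_seq):
        """Look up the reference frame with the same number; None if it is missing."""
        # pop() also frees the slot once the frame has been used.
        return self._refs.pop(mic_seq, None)


buf = SequencedBuffer()
buf.cache_reference([0.1, 0.2])
buf.cache_reference([0.3, 0.4])
seq0, _ = buf.tag_capture([0.15, 0.25])
seq1, _ = buf.tag_capture([0.35, 0.45])
ref_for_second = buf.match(seq1)
```

  • The lookup is a single dictionary access per frame, in contrast to the similarity search, which must score every buffered frame.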
  • a linear adaptive filter may be used to determine the modified audio data.
  • the embodiment of the present application provides at least the following two implementation modes:
  • In the first implementation mode, the reference audio data and the first-end audio data are input to a linear adaptive filter, and the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
  • In the second implementation mode, the reference audio data and the first-end audio data are input to a linear adaptive filter; the linear adaptive filter uses the reference audio data to estimate the echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
  • An echo fitting model is established.
  • The echo fitting model is used to bring the reference audio data as close as possible to the audio played by the audio playback device.
  • The coefficients of the linear adaptive filter are adjusted based on the echo fitting model.
  • After the linear adaptive filter has converged and stabilized, the reference audio data and the first-end audio data are input to it.
  • The linear adaptive filter first estimates the echo audio from the reference audio data; this echo audio is very close to the audio played by the audio playback device. It then subtracts the echo audio from the first-end audio data and outputs the corrected audio data, in which the played audio has been eliminated.
  • Because the linear adaptive filter first estimates the echo audio corresponding to the reference audio data, the estimate is closer to the audio actually played on the audio playback device than the raw reference audio data is; using this echo audio to eliminate the played audio from the first-end audio data therefore improves the echo cancellation effect.
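  • The estimate-then-subtract behaviour can be illustrated with a normalized least-mean-squares (NLMS) filter. The text only says "linear adaptive filter", so NLMS, the tap count, the step size, and the simulated echo path below are all assumptions chosen for a small runnable sketch.

```python
import random


def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return (corrected samples, final weights): mic minus an adaptively estimated echo of ref."""
    w = [0.0] * taps                 # filter coefficients, adapted sample by sample
    history = [0.0] * taps           # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, history))   # estimated echo audio
        e = d - echo_est                                        # corrected sample
        power = sum(h * h for h in history) + eps               # normalization term
        step = mu * e / power
        w = [wi + step * hi for wi, hi in zip(w, history)]
        out.append(e)
    return out, w


rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Simulated echo path (made up for the demo): direct sound plus one delayed reflection.
echo = [0.5 * ref[n] + (0.3 * ref[n - 2] if n >= 2 else 0.0) for n in range(len(ref))]
corrected, weights = nlms_cancel(echo, ref)   # echo-only capture, no near-end voice
```

  • In this echo-only demo the residual shrinks as the weights approach the simulated path, which mirrors the text's point that the filter should have converged before its output is relied on.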
  • Optionally, before inputting the reference audio data and the first-end audio data to the linear adaptive filter, the terminal adjusts the audio parameters of both to the preset values that match the linear adaptive filter.
  • For example, when the sampling rates of the reference audio data and the first-end audio data do not match the sampling rate supported by the linear adaptive filter, both are adjusted to the supported sampling rate.
  • When the numbers of channels of the reference audio data and the first-end audio data do not match the number of channels supported by the linear adaptive filter, both are adjusted to the supported number of channels.
  • When the linear adaptive filter expects the audio data of the channels to be interleaved but the reference audio data and the first-end audio data are non-interleaved, both are converted to interleaved form.
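  • The three adjustments above (interleaving, channel count, sampling rate) can be sketched with plain lists of samples. The helper names are hypothetical, and the linear-interpolation resampler is a deliberately crude stand-in with no anti-aliasing filter.

```python
def interleave(channels):
    """[[L0, L1, ...], [R0, R1, ...]] -> [L0, R0, L1, R1, ...]"""
    return [s for frame in zip(*channels) for s in frame]


def downmix_to_mono(channels):
    """Average the channels sample by sample to get a single channel."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate):
    """Linear-interpolation resampler; crude, with no anti-aliasing filter."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in the source signal
        j = min(int(pos), len(samples) - 2)      # left neighbour index
        frac = min(pos - j, 1.0)                 # clamp so the tail is held, not extrapolated
        out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
    return out


stereo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # planar (non-interleaved) stereo
packed = interleave(stereo)
mono = downmix_to_mono(stereo)
upsampled = resample_linear([0.0, 2.0], 1, 2)
```

  • Applying the same conversion to both the reference and the captured audio keeps the two streams sample-aligned when they reach the filter.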
  • After the terminal inputs the reference audio data and the first-end audio data into the linear adaptive filter and the filter outputs the corrected audio data, the corrected audio data may still contain audio played by the audio playback device, so further echo cancellation may be performed on it.
  • The embodiments of the present application use the following method:
  • When determining that the attenuation from the first-end audio data to the corrected audio data is greater than a preset threshold, the terminal replaces the corrected audio data with comfort noise.
  • When the attenuation from the first-end audio data to the corrected audio data exceeds the preset threshold, most of the first-end audio data is audio played by the audio playback device and the first-end user's voice makes up only a very small proportion. The first-end audio data can then be discarded directly, with comfort noise added in its place to avoid audible fluctuations.
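  • A minimal sketch of the attenuation check, assuming the attenuation is measured as an energy ratio in decibels. The 20 dB threshold, the noise level, and the function names are illustrative assumptions; the text only specifies "a preset threshold" and "comfort noise".

```python
import math
import random


def energy(frame):
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame)


def attenuation_db(mic_frame, corrected_frame, floor=1e-12):
    """How much quieter the corrected frame is than the captured frame, in dB."""
    return 10.0 * math.log10((energy(mic_frame) + floor) / (energy(corrected_frame) + floor))


def suppress_residual(mic_frame, corrected_frame, threshold_db=20.0,
                      noise_level=1e-3, rng=None):
    """Swap the corrected frame for low-level noise when almost everything was echo."""
    rng = rng or random.Random()
    if attenuation_db(mic_frame, corrected_frame) > threshold_db:
        return [rng.uniform(-noise_level, noise_level) for _ in corrected_frame]
    return corrected_frame


mic = [0.5] * 80
mostly_echo = [0.001] * 80     # cancellation removed nearly all of the energy
with_voice = [0.4] * 80        # most of the energy survived, so voice is present
out_echo = suppress_residual(mic, mostly_echo, rng=random.Random(1))
out_voice = suppress_residual(mic, with_voice)
```

  • Low-level noise rather than digital silence is used so that the listener does not hear the line alternately going dead and coming back.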
  • The terminal may first perform gain processing on the corrected audio data and then send the gain-processed audio data to the second-end user terminal. After the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and obtains the corrected audio data, the power of the corrected audio is correspondingly weaker; gain processing therefore raises the power of the audio received by the second-end user terminal, which improves the call between the first-end user and the second-end user.
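  • Gain processing can be sketched as scaling each frame toward a target level. The target RMS, the gain cap, and the clipping limit below are assumed values, and `apply_gain` is a hypothetical helper; the text does not say how the gain is chosen.

```python
import math


def apply_gain(frame, target_rms=0.1, max_gain=8.0, peak=1.0):
    """Boost a frame toward target_rms, never attenuating and never exceeding max_gain."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0.0:
        return list(frame)                       # nothing to amplify
    gain = min(max(target_rms / rms, 1.0), max_gain)
    # Clip to the valid sample range so the boost cannot overflow.
    return [max(-peak, min(peak, s * gain)) for s in frame]


weakened = [0.05, -0.05] * 40      # corrected audio after cancellation, RMS 0.05
boosted = apply_gain(weakened)
very_quiet = apply_gain([0.001] * 80)
```

  • Capping the gain limits how much residual echo or background noise is amplified along with the voice.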
  • The following describes the echo cancellation method provided by the embodiments of the present application in a specific implementation scenario, in which the first-end user terminal is the near-end user terminal and the second-end user terminal is the far-end user terminal.
  • The first-end user terminal collects the first-end audio data through a microphone, the audio playback device on the first-end user terminal is a speaker, a player on the first-end user terminal is playing a video, and the first-end user is using the first-end user terminal to make a call with the second-end user.
  • the first-end user terminal collects audio data to be played on the speaker as reference audio data through a hardware chip interface.
  • the audio data to be played on the speaker includes audio played by the player and voice of the second end user.
  • the microphone collects the audio played in the speaker as the first-end audio data, and the audio played on the speaker includes the audio played on the player and the voice of the user at the second end.
  • The linear adaptive filter estimates the echo audio from the reference audio data; this echo audio is very close to the audio played by the speaker. The filter then subtracts the echo audio from the first-end audio data and outputs the corrected audio data.
  • Next, the terminal determines whether the attenuation from the first-end audio data to the corrected audio data is greater than the preset threshold. If so, the corrected audio data is replaced with comfort noise; otherwise, the corrected audio data is gain-processed and sent to the second-end user terminal.
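  • The per-frame control flow of this scenario can be sketched as below. To stay short, the canceller is passed in as a function and the demo uses plain subtraction (the first implementation mode) rather than an adaptive filter; the threshold, the fixed gain, and the choice to also send the comfort-noise frame are assumptions.

```python
import math
import random


def rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def process_frame(mic, ref, cancel, send, threshold_db=20.0, gain=2.0, rng=None):
    """One captured frame: cancel -> attenuation check -> comfort noise or gain -> send."""
    rng = rng or random.Random()
    corrected = cancel(mic, ref)
    atten_db = 20.0 * math.log10((rms(mic) + 1e-9) / (rms(corrected) + 1e-9))
    if atten_db > threshold_db:
        out = [rng.uniform(-1e-3, 1e-3) for _ in corrected]       # comfort noise
        kind = "comfort_noise"
    else:
        out = [max(-1.0, min(1.0, s * gain)) for s in corrected]  # gain with clipping
        kind = "voice"
    send(out)   # assumption: the comfort-noise frame is transmitted as well
    return kind


def subtract(mic, ref):
    """Direct subtraction of the reference from the captured frame."""
    return [m - r for m, r in zip(mic, ref)]


sent = []
ref = [0.5 * math.sin(0.3 * n) for n in range(80)]
voice = [0.1 * math.sin(0.05 * n) for n in range(80)]
kind_echo_only = process_frame(ref, ref, subtract, sent.append)
kind_with_voice = process_frame([r + v for r, v in zip(ref, voice)], ref,
                                subtract, sent.append)
```

  • Passing `cancel` and `send` as parameters keeps the decision logic independent of the particular filter and transport in use.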
  • Because the first-end user terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when it collects the audio played on the audio playback device together with the first-end user's voice during playback, it can use the reference audio data to eliminate the played audio from the first-end audio data and keep the first-end user's voice. This prevents the played audio from interfering with the first-end user's voice and improves the call quality between the first-end user and the second-end user.
  • Furthermore, the linear adaptive filter is used to fit the echo audio corresponding to the reference audio data, so that the echo audio is closer to the audio actually played by the audio playback device; cancelling the played audio in the first-end audio data with this echo audio therefore gives a better echo cancellation result.
  • the modified audio data is subjected to gain processing and then sent to the second-end user terminal to increase the power of the modified audio and improve the voice effect heard by the second-end user.
  • The terminal 400 includes:
  • a collection module 401, configured to collect first-end audio data, where the first-end audio data includes the voice of the first-end user and the audio played by the audio playback device on the terminal;
  • a query module 402, configured to query the buffer for the reference audio data corresponding to the first-end audio data, where the buffer caches the audio data on the audio playback device as reference audio data;
  • a processing module 403, configured to use the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determine the corrected audio data;
  • a sending module 404, configured to send the corrected audio data to the second-end user terminal.
  • Optionally, the audio data on the audio playback device includes audio data to be played on the audio playback device.
  • Optionally, the query module 402 is specifically configured to determine the similarity between the first-end audio data and each piece of reference audio data in the buffer, and to determine the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
  • Optionally, the terminal further includes a gain module 405, which is specifically configured to perform gain processing on the corrected audio data before the corrected audio data is sent to the second-end user terminal.
  • Optionally, the processing module 403 is specifically configured to input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
  • Optionally, the processing module 403 is specifically configured to input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter uses the reference audio data to estimate echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
  • Optionally, the processing module 403 is further configured to adjust the audio parameters of the reference audio data and of the first-end audio data to preset values that match the linear adaptive filter before inputting them to the linear adaptive filter.
  • Optionally, the processing module 403 is further configured to replace the corrected audio data with comfort noise when it determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold.
  • An embodiment of the present application provides a terminal device, as shown in FIG. 5, including at least one processor 501 and a memory 502 connected to the at least one processor. The embodiments of the present application do not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5 they are connected by a bus as an example.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on.
  • The memory 502 stores instructions executable by the at least one processor 501. By executing the instructions stored in the memory 502, the at least one processor 501 can perform the steps included in the foregoing echo cancellation method.
  • The processor 501 is the control center of the terminal device. It can use various interfaces and lines to connect the parts of the terminal device, and it cancels echo by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502.
  • The processor 501 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501.
  • In some embodiments the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments they may be implemented separately on independent chips.
  • The processor 501 may be a general-purpose processor such as a central processing unit (CPU), or a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The memory 502 is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • The memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disc, and so on.
  • The memory 502 may also be, without limitation, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • The memory 502 in the embodiments of the present application may also be a circuit or any other device capable of providing a storage function, used to store program instructions and/or data.
  • The embodiments of the present application also provide a computer-readable storage medium that stores computer instructions which, when run on a terminal device, cause the terminal device to perform the steps of the echo cancellation method described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

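The embodiments of the present application provide an echo cancellation method and a terminal, relating to the technical field of real-time audio and video communication. In the method, a terminal collects first-end audio data, which includes the voice of a first-end user and the audio played by an audio playback device on the terminal. The terminal then queries a buffer for the reference audio data corresponding to the first-end audio data, where the buffer caches the audio data to be played on the audio playback device as reference audio data. Next, the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determines corrected audio data. Finally, the terminal sends the corrected audio data to a second-end user terminal. Because the reference audio data is used to eliminate the audio played on the audio playback device from the first-end audio data, leaving the first-end user's voice, the played audio does not interfere with the first-end user's voice, which improves the call quality between the first-end user and the second-end user.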

Description

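Echo cancellation method and terminal
Technical Field
Embodiments of the present invention relate to the technical field of real-time audio and video communication, and in particular to an echo cancellation method and a terminal.
Background
As bandwidth and terminal performance improve, a plain audio or video call has become somewhat dull and no longer meets users' needs. This has given rise to "watch and chat" applications, in which a user watches a television program and makes a call on the same terminal (such as a mobile phone or a television set); likewise, voice capture and playback are needed while playing a game. However, the background sound produced by the television program is picked up by the microphone together with the user's voice and sent to the far end, which degrades call quality. Similarly, during a game the voice is captured back and forth by the different call ends, which produces unwanted echo, makes the audio noisy, and harms the user experience.
In the prior art, one approach simply treats the background sound captured by the terminal as noise and suppresses it. Because the background sound cannot be identified accurately this way, only a small part of the noise is removed, and voice call quality suffers. Another approach obtains the background sound at the software layer, mixes it with the far-end audio, and uses the result directly as reference data for echo cancellation. However, because the background sound usually has to be remixed after it is obtained, it differs from the data actually played, which weakens the cancellation. A technical solution that can effectively cancel external echo and improve voice call quality is therefore urgently needed.
Summary
To address the prior-art problem that, when a user makes a call while watching a video, the background sound of the video is captured by the microphone and sent to the far end and thus degrades call quality, the embodiments of the present application provide an echo cancellation method and a terminal.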
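In one aspect, an embodiment of the present application provides an echo cancellation method, the method including:
a terminal collects first-end audio data, the first-end audio data including the voice of a first-end user and the audio played by an audio playback device on the terminal;
the terminal queries a buffer for reference audio data corresponding to the first-end audio data, the buffer caching audio data on the audio playback device as reference audio data;
the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and determines corrected audio data;
the terminal sends the corrected audio data to a second-end user terminal.
Because the terminal caches the audio data on the audio playback device in advance as reference audio data, when the audio playback device plays audio, the terminal collects the played audio together with the first-end user's voice during playback and uses the reference audio data to eliminate the played audio from the first-end audio data. This leaves the first-end user's voice, prevents the played audio from interfering with it, and thereby improves the call quality between the first-end user and the second-end user.
Optionally, the audio data on the audio playback device includes audio data to be played on the audio playback device.
Optionally, the terminal querying the buffer for the reference audio data corresponding to the first-end audio data includes:
the terminal determines the similarity between the first-end audio data and each piece of reference audio data in the buffer;
the terminal determines the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
Because multiple pieces of reference audio data are cached in the buffer in advance, when the terminal collects first-end audio data it determines the corresponding reference audio data by comparing the similarity between the first-end audio data and each piece of reference audio data, without having to strictly match the collection time of the first-end audio data to the caching time of the reference audio data. This improves the stability of echo cancellation and reduces complexity.
Optionally, before the terminal sends the corrected audio data to the second-end user terminal, the method further includes:
the terminal performs gain processing on the corrected audio data.
Eliminating the audio played by the audio playback device from the first-end audio data weakens the power of the corrected audio. Gain processing is therefore applied to the corrected audio data to raise the power of the audio received by the second-end user terminal, which improves the call between the first-end user and the second-end user.
Optionally, the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determining the corrected audio data includes:
the terminal inputs the reference audio data and the first-end audio data to a linear adaptive filter, and the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
Optionally, the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determining the corrected audio data includes:
the terminal inputs the reference audio data and the first-end audio data to a linear adaptive filter, and the linear adaptive filter uses the reference audio data to estimate echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
Optionally, before the terminal inputs the reference audio data and the first-end audio data to the linear adaptive filter, the method further includes:
the terminal adjusts the audio parameters of the reference audio data and the audio parameters of the first-end audio data to preset values that match the linear adaptive filter.
Optionally, the method further includes:
when the terminal determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold, the terminal replaces the corrected audio data with comfort noise.
When the attenuation value of the first-end audio data relative to the corrected audio data exceeds the preset threshold, most of the first-end audio data is audio played by the audio playback device and the first-end user's voice accounts for only a small share. The first-end audio data can then simply be discarded and comfort noise added in its place, which avoids audible fluctuation.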
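In another aspect, an embodiment of the present application provides a terminal, including:
a collection module, configured to collect first-end audio data, the first-end audio data including the voice of a first-end user and the audio played by an audio playback device on the terminal;
a query module, configured to query a buffer for reference audio data corresponding to the first-end audio data, the buffer caching audio data on the audio playback device as reference audio data;
a processing module, configured to use the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and to determine corrected audio data;
a sending module, configured to send the corrected audio data to a second-end user terminal.
Optionally, the audio data on the audio playback device includes audio data to be played on the audio playback device.
Optionally, the query module is specifically configured to:
determine the similarity between the first-end audio data and each piece of reference audio data in the buffer;
determine the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
Optionally, the terminal further includes a gain module;
the gain module is specifically configured to:
perform gain processing on the corrected audio data before the corrected audio data is sent to the second-end user terminal.
Optionally, the processing module is specifically configured to:
input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
Optionally, the processing module is specifically configured to:
input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter uses the reference audio data to estimate echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
Optionally, the processing module is further configured to:
before inputting the reference audio data and the first-end audio data to the linear adaptive filter, adjust the audio parameters of the reference audio data and the audio parameters of the first-end audio data to preset values that match the linear adaptive filter.
Optionally, the processing module is further configured to:
replace the corrected audio data with comfort noise when it determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold.
In yet another aspect, an embodiment of the present application provides a terminal device, including at least one processor and at least one memory, where the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the echo cancellation method described above.
In a further aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program executable by a terminal device, where the program, when run on the terminal device, causes the terminal device to perform the steps of the echo cancellation method described above.
In the embodiments of the present application, because the terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when the audio playback device plays audio, the terminal collects the played audio together with the first-end user's voice during playback and uses the reference audio data to eliminate the played audio from the first-end audio data. This leaves the first-end user's voice, prevents the played audio from interfering with it, and improves the call quality between the first-end user and the second-end user. In addition, a linear adaptive filter is used to fit the echo audio corresponding to the reference audio data, so that the echo audio is closer to the audio played by the audio playback device; using the echo audio to cancel the played audio in the first-end audio data therefore gives better echo cancellation. Finally, the corrected audio data is gain-processed before it is sent to the second-end user terminal, which raises the power of the corrected audio and improves the voice heard by the second-end user.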
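Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an echo cancellation method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an echo cancellation method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.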
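The echo cancellation method in the embodiments of the present application can be applied to the application scenario shown in FIG. 1, which includes a first-end user terminal 101 and a second-end user terminal 102.
The first-end user terminal 101 and the second-end user terminal 102 are electronic devices with call and audio/video playback functions; such a device may be a smart TV, a smartphone, a tablet computer, a portable personal computer, or the like. The two terminals may call each other by dialing a telephone number or through instant messaging software, where calls include voice calls, video calls, and so on. Applications for playing audio and video are installed on both terminals. In this embodiment, the first-end user terminal 101 is taken as the near-end user terminal and the second-end user terminal 102 as the far-end user terminal. The first-end user terminal 101, that is, the near-end user terminal, collects the near-end user's voice and the audio played by the audio playback device, performs echo cancellation on the collected audio data, and sends the result to the far-end user terminal. The far-end user terminal receives the echo-cancelled audio data sent by the near-end user terminal.
By the same principle, in other embodiments the first-end user terminal 101 may be the far-end user terminal, in which case the second-end user terminal 102 is the near-end user terminal.
A specific example follows in which the first-end user terminal 101 and the second-end user terminal 102 are both televisions, the first-end user terminal 101 being the near-end user terminal and the second-end user terminal 102 the far-end user terminal. WeChat is assumed to be installed on both terminals. The first-end user makes a voice call with the second-end user through WeChat installed on the television while a television program plays on the television.
The television stores the television audio data that the speaker is to play in the buffer as reference audio data. The speaker plays the television audio while the first-end user speaks into the microphone on the television, and the microphone collects the television audio played by the speaker together with the first-end user's voice as first-end audio data. Without echo cancellation on the audio collected by the microphone, the television audio played by the speaker would also be sent to the second-end user terminal, so the second-end user would hear audio other than the first-end user's voice, which degrades call quality. Therefore, in the embodiments of the present application, the reference audio data in the buffer is used to eliminate the television audio played by the speaker from the first-end audio data to obtain the first-end user's voice, which is gain-processed and then sent to the second-end user, thereby improving call quality.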
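Based on the application scenario shown in FIG. 1, an embodiment of the present application provides the flow of an echo cancellation method. The flow may be executed by a terminal, which may be the first-end user terminal 101 described above. As shown in FIG. 2, the method includes the following steps:
Step S201: the terminal collects first-end audio data, which includes the voice of the first-end user and the audio played by the audio playback device on the terminal.
Optionally, the terminal is an electronic device with call and audio/video playback functions, such as a smart TV, a smartphone, a tablet computer, or a portable personal computer.
The terminal collects the first-end audio data through a microphone. The audio playback device on the terminal may be a speaker. The audio played by the audio playback device may be the audio of a video, such as the audio of a television program or audio played in a player. It may also be pure audio, such as music played by a music player, a radio broadcast, or a phone ringtone. It may also be the voice of the second-end user received by the terminal.
When the audio playback device receives multiple pieces of audio to be played, it can play them at the same time. For example, when the speaker receives the audio data of a television program and the voice data of the second-end user at the same time, it plays both simultaneously. The duration of the audio that the terminal collects each time can be preset, for example to 5 ms.
Optionally, the first-end audio data also includes its audio parameters, specifically the audio size, sample rate, number of channels, bit width, whether the data is interleaved, and so on.
Step S202: the terminal queries the buffer for the reference audio data corresponding to the first-end audio data.
A buffer is set up in advance and caches the audio data on the audio playback device as reference audio data. The duration of each segment of reference audio data in the buffer can be preset and corresponds to the duration of the played audio that the terminal collects each time. For example, each segment of reference audio data lasts 5 ms, and the terminal collects 5 ms of audio played on the audio playback device each time.
In one possible implementation, the audio data on the audio playback device includes audio data to be played on the audio playback device, and the buffer caches this to-be-played audio data as reference audio data.
In another possible implementation, the audio data on the audio playback device includes audio data already played on the audio playback device, and the buffer caches this already-played audio data as reference audio data.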
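The buffer described for step S202 can be pictured with a minimal Python sketch. It is an illustration only, not the implementation disclosed here: the `ReferenceBuffer` class, its method names, the 16 kHz sample rate, and the capacity limit are all assumptions. Only the 5 ms segment duration comes from the text.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000  # assumption: the source does not specify a sample rate
SEGMENT_MS = 5           # example segment duration taken from the text
SEGMENT_SAMPLES = SAMPLE_RATE_HZ * SEGMENT_MS // 1000  # samples per 5 ms segment


class ReferenceBuffer:
    """Hypothetical bounded cache of fixed-length reference segments."""

    def __init__(self, max_segments=64):
        # A deque with maxlen silently discards the oldest segment when full.
        self._segments = deque(maxlen=max_segments)

    def push_playback(self, samples):
        """Cache audio queued for the speaker as whole 5 ms segments.

        A trailing partial segment is dropped in this sketch.
        """
        usable = len(samples) - len(samples) % SEGMENT_SAMPLES
        for start in range(0, usable, SEGMENT_SAMPLES):
            self._segments.append(list(samples[start:start + SEGMENT_SAMPLES]))

    def segments(self):
        """Return the cached segments, oldest first."""
        return list(self._segments)


buf = ReferenceBuffer(max_segments=4)
buf.push_playback([0.0] * 400)  # 25 ms of audio: five segments, four retained
```

Bounding the cache keeps memory use fixed; whether old segments should be evicted this way is a design choice the source leaves open.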
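Step S203: the terminal uses the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and determines corrected audio data.
The reference audio data is highly correlated with the audio played by the audio playback device, so subtracting the reference audio data from the first-end audio data cancels the audio played by the audio playback device in the first-end audio data.
Step S204: the terminal sends the corrected audio data to the second-end user terminal.
Because the terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when the audio playback device plays audio, the terminal collects the played audio together with the first-end user's voice during playback and uses the reference audio data to eliminate the played audio from the first-end audio data. This leaves the first-end user's voice, prevents the played audio from interfering with it, and improves the call quality between the first-end user and the second-end user.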
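The subtraction in step S203 can be illustrated with a small Python sketch. The sample values and the 60 % room attenuation are made up for illustration, and `subtract_reference` is a hypothetical helper name. The sketch shows that plain subtraction recovers the voice only when the microphone hears the playback exactly as cached.

```python
def subtract_reference(captured, reference):
    """Sample-by-sample subtraction of the reference from the captured audio."""
    return [c - r for c, r in zip(captured, reference)]


voice = [0.10, -0.20, 0.30, 0.00]      # made-up near-end speech samples
playback = [0.50, 0.50, -0.50, -0.50]  # made-up reference (speaker) samples

# Ideal case: the microphone hears the playback exactly as cached.
ideal_capture = [v + p for v, p in zip(voice, playback)]
ideal_result = subtract_reference(ideal_capture, playback)

# Less ideal case: the room attenuates the playback to 60 % before capture.
room_capture = [v + 0.6 * p for v, p in zip(voice, playback)]
room_result = subtract_reference(room_capture, playback)
```

In the second case a residual of 40 % of the playback remains in `room_result`, which is one way to see why an estimated echo, rather than the raw reference, is worth subtracting.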
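Optionally, in step S202 above, when the terminal queries the buffer for the reference audio data corresponding to the first-end audio data, the embodiments of the present application provide at least the following implementations:
In one possible implementation, the terminal determines the similarity between the first-end audio data and each piece of reference audio data in the buffer, and determines the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
In a specific implementation, multiple pieces of reference audio data are cached in the buffer in advance. For each piece, the first-end audio data and that reference audio data are input to a linear adaptive filter, and the similarity between them is determined from the convergence speed of the linear adaptive filter. Once the similarity between the first-end audio data and every piece of reference audio data has been obtained, the piece with the highest similarity to the first-end audio data is selected from the cached pieces as the reference audio data corresponding to the first-end audio data.
Because multiple pieces of reference audio data are cached in the buffer in advance, when the terminal collects first-end audio data it determines the corresponding reference audio data by comparing the similarity between the first-end audio data and each piece of reference audio data, without having to strictly match the collection time of the first-end audio data to the caching time of the reference audio data. This improves the stability of echo cancellation and reduces complexity.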
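The selection rule above (pick the cached segment with the highest similarity) can be sketched in Python as follows. Note the substitution: the text derives similarity from the convergence speed of the linear adaptive filter, whereas this sketch uses zero-lag normalized correlation as a simpler stand-in. The function names and sample values are assumptions.

```python
import math


def similarity(captured, reference):
    """Absolute normalized correlation at zero lag, in the range [0, 1].

    Stand-in metric; the source uses adaptive-filter convergence speed instead.
    """
    dot = sum(c * r for c, r in zip(captured, reference))
    norm = math.sqrt(sum(c * c for c in captured) * sum(r * r for r in reference))
    return abs(dot) / norm if norm > 0 else 0.0


def find_reference(captured, cached_segments):
    """Return (index, segment) of the cached segment most similar to the capture."""
    best = max(range(len(cached_segments)),
               key=lambda i: similarity(captured, cached_segments[i]))
    return best, cached_segments[best]


cached = [
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0, -1.0],
    [0.0, 1.0, 0.0, -1.0],
]
# Made-up capture: segment 1 at 60 % level plus a little near-end speech.
captured = [0.65, 0.55, -0.55, -0.65]
best_index, best_segment = find_reference(captured, cached)
```

Whatever metric is used, the selection step stays the same: score every cached segment and keep the maximum.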
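In one possible implementation, the duration of each collected segment of first-end audio data and the duration of each segment of reference audio data are preset to the same value. Each cached segment of reference audio data is assigned a sequence number, each collected segment of first-end audio data is also assigned a sequence number, and the reference audio data corresponding to the first-end audio data is determined by matching sequence numbers.
Sequence numbers are assigned to the reference audio data in the order in which it is cached and to the first-end audio data in the order in which it is collected. The sequence numbers are then used to match the first-end audio data with the reference audio data, which improves matching efficiency.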
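Sequence-number matching can be sketched with a dictionary keyed by sequence number. The `SequencedStore` class and its method names are hypothetical, and the sketch assumes that caching and collection start in step so that equal sequence numbers refer to the same time span.

```python
class SequencedStore:
    """Hypothetical store that numbers segments in arrival order."""

    def __init__(self):
        self._next_seq = 0
        self._items = {}

    def add(self, segment):
        """Store a segment and return the sequence number assigned to it."""
        seq = self._next_seq
        self._items[seq] = segment
        self._next_seq += 1
        return seq

    def pop(self, seq):
        """Remove and return the segment with this number, or None if absent."""
        return self._items.pop(seq, None)


references = SequencedStore()
ref_seq_a = references.add([0.5, 0.5])    # first cached reference segment
ref_seq_b = references.add([-0.5, -0.5])  # second cached reference segment

capture_seq = 0                            # number given to the first capture
matched = references.pop(capture_seq)      # constant-time lookup, no comparison
missing = references.pop(99)               # no reference cached under this number
```

Compared with similarity search, the lookup cost does not grow with the number of cached segments, which matches the efficiency claim in the text.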
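Optionally, in step S203 above, a linear adaptive filter can be used to determine the corrected audio data. The embodiments of the present application provide at least the following two implementations:
In one possible implementation, the reference audio data and the first-end audio data are input to a linear adaptive filter, which subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
In another possible implementation, the reference audio data and the first-end audio data are input to a linear adaptive filter, which uses the reference audio data to estimate echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
Specifically, after the audio playback device plays the audio, the audio is reflected by obstacles such as walls, so the played audio as collected by the terminal still differs somewhat from the reference audio data. In the embodiments of the present application, an echo fitting model is therefore first built on the basis of the correlation between the reference audio data and the audio played by the audio playback device. The echo fitting model is used to make the reference audio data approximate the played audio as closely as possible. The coefficients of the linear adaptive filter are then adjusted based on the echo fitting model, and once the linear adaptive filter has converged and stabilized, the reference audio data and the first-end audio data are input to it. The linear adaptive filter first estimates the echo audio from the reference audio data, the echo audio closely approximating the audio played by the audio playback device. It then subtracts the echo audio from the first-end audio data and outputs corrected audio data from which the audio played by the audio playback device has been eliminated.
Because the linear adaptive filter first estimates the echo audio corresponding to the reference audio data, so that it is closer to the audio actually played on the audio playback device, using the echo audio to eliminate the played audio from the first-end audio data improves echo cancellation.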
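The text does not name a particular adaptation algorithm for the linear adaptive filter. As one common choice, the sketch below uses normalized least mean squares (NLMS); the tap count, step size, and the two-tap echo path (direct sound plus one reflection) are assumptions made for illustration, and no near-end voice is mixed in so that convergence is easy to check.

```python
import random


def nlms_cancel(captured, reference, taps=8, step=0.5, eps=1e-8):
    """Estimate the echo from the reference with NLMS and subtract it.

    Returns the corrected samples and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    corrected = []
    for mic, ref in zip(captured, reference):
        history = [ref] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic - echo_estimate  # corrected sample = capture minus echo
        power = sum(x * x for x in history) + eps
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        corrected.append(error)
    return corrected, weights


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Assumed echo path: direct sound at 0.6 plus one reflection at 0.3, one sample later.
captured = [0.6 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
            for n in range(len(reference))]
corrected, weights = nlms_cancel(captured, reference)
```

After adaptation the first two weights approach the assumed path gains and the residual in `corrected` becomes very small. A practical canceller would also need to pause adaptation while the near-end user is speaking, which this sketch omits.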
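Optionally, before the terminal inputs the reference audio data and the first-end audio data to the linear adaptive filter, it adjusts the audio parameters of the reference audio data and of the first-end audio data to preset values that match the linear adaptive filter.
For example, when the sample rates of the reference audio data and the first-end audio data do not match the sample rate supported by the linear adaptive filter, both are adjusted to the sample rate supported by the linear adaptive filter.
For example, when the channel counts of the reference audio data and the first-end audio data do not match the channel count supported by the linear adaptive filter, both are adjusted to the channel count supported by the linear adaptive filter.
For example, when the linear adaptive filter supports interleaved audio data across channels but the reference audio data and the first-end audio data are both non-interleaved, the reference audio data and the first-end audio data are converted to interleaved form.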
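The three adjustments above (sample rate, channel count, interleaving) can be sketched with small Python helpers. The function names are hypothetical, the downmix by averaging is one possible way to reduce the channel count, and the resampler is a naive linear interpolator without anti-aliasing, included only to show the shape of the step.

```python
def interleave(channels):
    """[[L0, L1, ...], [R0, R1, ...]] -> [L0, R0, L1, R1, ...]."""
    return [sample for frame in zip(*channels) for sample in frame]


def downmix_to_mono(channels):
    """Average the channels sample by sample to reduce the channel count to one."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


stereo = [[1, 2, 3], [4, 5, 6]]
interleaved = interleave(stereo)                        # [1, 4, 2, 5, 3, 6]
mono = downmix_to_mono([[1.0, 3.0], [3.0, 5.0]])        # [2.0, 4.0]
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4)
```

The same helpers would be applied to both the reference and the captured audio so that the two inputs reach the filter in an identical format.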
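Optionally, after the terminal inputs the reference audio data and the first-end audio data to the linear adaptive filter and obtains the corrected audio data, the corrected audio data may still contain audio played by the audio playback device, so further echo cancellation can be applied to it. The embodiments of the present application do this as follows:
When the terminal determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold, it replaces the corrected audio data with comfort noise. An attenuation value above the preset threshold indicates that most of the first-end audio data is audio played by the audio playback device and that the first-end user's voice accounts for only a small share. The first-end audio data can then simply be discarded and comfort noise added in its place, which avoids audible fluctuation.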
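The threshold test can be sketched in Python under stated assumptions: the text does not define how the attenuation value is computed, so this sketch takes it as the energy ratio between the captured and corrected segments in decibels. The 20 dB threshold, the noise amplitude, and the function names are likewise hypothetical.

```python
import math
import random


def energy(samples):
    """Mean squared amplitude of a segment."""
    return sum(s * s for s in samples) / max(len(samples), 1)


def attenuation_db(captured, corrected, floor=1e-12):
    """Assumed definition: captured-to-corrected energy ratio in decibels."""
    return 10.0 * math.log10((energy(captured) + floor) / (energy(corrected) + floor))


def comfort_noise(length, amplitude=1e-3, seed=None):
    """Low-level uniform noise used in place of a suppressed segment."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(length)]


def suppress_residual(captured, corrected, threshold_db=20.0):
    """Replace the corrected segment with comfort noise if it was mostly echo."""
    if attenuation_db(captured, corrected) > threshold_db:
        return comfort_noise(len(corrected))
    return corrected


captured = [0.5, -0.5] * 40
echo_only = [0.001, -0.001] * 40   # almost everything was cancelled
with_voice = [0.2, -0.2] * 40      # a good share of the capture survived

out_echo = suppress_residual(captured, echo_only)
out_voice = suppress_residual(captured, with_voice)
```

Here `out_echo` is freshly generated noise, while `out_voice` is the corrected segment passed through unchanged.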
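Optionally, in step S204 above, before sending the corrected audio data to the second-end user terminal, the terminal may first perform gain processing on it and then send the gain-processed data. Eliminating the audio played by the audio playback device from the first-end audio data weakens the power of the corrected audio, so gain processing is applied to the corrected audio data to raise the power of the audio received by the second-end user terminal, which improves the call between the first-end user and the second-end user.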
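Gain processing can be sketched as a fixed amplification with clipping. The text does not say how the gain is chosen, so the decibel parameter, the clipping limit, and the `apply_gain` name are assumptions; a real system might instead use automatic gain control.

```python
def apply_gain(samples, gain_db, limit=1.0):
    """Scale samples by a gain in decibels and clip to [-limit, limit]."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio for the given dB value
    return [max(-limit, min(limit, s * factor)) for s in samples]


corrected = [0.1, -0.6]
boosted = apply_gain(corrected, gain_db=6.0)  # about double the amplitude
```

The second sample would exceed full scale after amplification, so it is clipped to the limit rather than allowed to overflow.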
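To better explain the embodiments of the present application, the echo cancellation method they provide is described below in a specific implementation scenario, in which the first-end user terminal is the near-end user terminal and the second-end user terminal is the far-end user terminal. The first-end user terminal collects the first-end audio data through a microphone, the audio playback device on the first-end user terminal is a speaker, and a player on the first-end user terminal plays a video while the first-end user talks with the second-end user through the first-end user terminal. As shown in FIG. 3, the first-end user terminal collects, through a hardware chip interface, the audio data to be played on the speaker as reference audio data; this audio data includes the audio played by the player and the voice of the second-end user. The microphone collects the audio played by the speaker as first-end audio data; the audio played on the speaker includes the audio played by the player and the voice of the second-end user. The first-end audio data and the reference audio data are input to the linear adaptive filter, which estimates the echo audio from the reference audio data; the echo audio closely approximates the audio played on the speaker. The echo audio is then subtracted from the first-end audio data to output corrected audio data. Next, it is determined whether the attenuation value of the first-end audio data relative to the corrected audio data is greater than the preset threshold. If so, the corrected audio data is replaced with comfort noise; otherwise, the corrected audio data is gain-processed and sent to the second-end user terminal.
Because the first-end user terminal caches the audio data to be played on the audio playback device in advance as reference audio data, when the audio playback device plays audio, the first-end user terminal collects the played audio together with the first-end user's voice during playback and uses the reference audio data to eliminate the played audio from the first-end audio data. This leaves the first-end user's voice, prevents the played audio from interfering with it, and improves the call quality between the first-end user and the second-end user. In addition, a linear adaptive filter is used to fit the echo audio corresponding to the reference audio data, so that the echo audio is closer to the audio played by the audio playback device; using the echo audio to cancel the played audio in the first-end audio data therefore gives better echo cancellation. Finally, the corrected audio data is gain-processed before it is sent to the second-end user terminal, which raises the power of the corrected audio and improves the voice heard by the second-end user.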
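Based on the same technical concept, an embodiment of the present application provides a terminal. As shown in FIG. 4, the terminal 400 includes:
a collection module 401, configured to collect first-end audio data, the first-end audio data including the voice of a first-end user and the audio played by an audio playback device on the terminal;
a query module 402, configured to query a buffer for reference audio data corresponding to the first-end audio data, the buffer caching audio data on the audio playback device as reference audio data;
a processing module 403, configured to use the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and to determine corrected audio data;
a sending module 404, configured to send the corrected audio data to a second-end user terminal.
Optionally, the audio data on the audio playback device includes audio data to be played on the audio playback device.
Optionally, the query module 402 is specifically configured to:
determine the similarity between the first-end audio data and each piece of reference audio data in the buffer;
determine the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
Optionally, the terminal further includes a gain module 405;
the gain module 405 is specifically configured to:
perform gain processing on the corrected audio data before the corrected audio data is sent to the second-end user terminal.
Optionally, the processing module 403 is specifically configured to:
input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter subtracts the reference audio data from the first-end audio data and outputs the corrected audio data.
Optionally, the processing module 403 is specifically configured to:
input the reference audio data and the first-end audio data to a linear adaptive filter, where the linear adaptive filter uses the reference audio data to estimate echo audio, subtracts the echo audio from the first-end audio data, and outputs the corrected audio data.
Optionally, the processing module 403 is further configured to:
before inputting the reference audio data and the first-end audio data to the linear adaptive filter, adjust the audio parameters of the reference audio data and the audio parameters of the first-end audio data to preset values that match the linear adaptive filter.
Optionally, the processing module 403 is further configured to:
replace the corrected audio data with comfort noise when it determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold.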
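Based on the same technical concept, an embodiment of the present application provides a terminal device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiments of the present application do not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5 they are connected by a bus as an example. The bus can be divided into an address bus, a data bus, a control bus, and so on.
In the embodiments of the present application, the memory 502 stores instructions executable by the at least one processor 501, and by executing the instructions stored in the memory 502 the at least one processor 501 can perform the steps included in the foregoing echo cancellation method.
The processor 501 is the control center of the terminal device. It can use various interfaces and lines to connect the parts of the terminal device, and it cancels echo by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501. In some embodiments the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments they may be implemented separately on independent chips.
The processor 501 may be a general-purpose processor such as a central processing unit (CPU), or a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 502, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disc, and so on. The memory 502 may also be, without limitation, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 502 in the embodiments of the present application may also be a circuit or any other device capable of providing a storage function, used to store program instructions and/or data.
Based on the same inventive concept, the embodiments of the present application also provide a computer-readable storage medium that stores computer instructions which, when run on a terminal device, cause the terminal device to perform the steps of the echo cancellation method described above.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.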

Claims (18)

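  1. An echo cancellation method, characterized by comprising:
    collecting, by a terminal, first-end audio data, the first-end audio data including the voice of a first-end user and the audio played by an audio playback device on the terminal;
    querying, by the terminal, a buffer for reference audio data corresponding to the first-end audio data, the buffer caching audio data on the audio playback device as reference audio data;
    using, by the terminal, the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and determining corrected audio data;
    sending, by the terminal, the corrected audio data to a second-end user terminal.
  2. The method according to claim 1, characterized in that the audio data on the audio playback device includes audio data to be played on the audio playback device.
  3. The method according to claim 1, characterized in that the terminal querying the buffer for the reference audio data corresponding to the first-end audio data comprises:
    determining, by the terminal, the similarity between the first-end audio data and each piece of reference audio data in the buffer;
    determining, by the terminal, the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
  4. The method according to any one of claims 1 to 3, characterized in that, before the terminal sends the corrected audio data to the second-end user terminal, the method further comprises:
    performing, by the terminal, gain processing on the corrected audio data.
  5. The method according to claim 4, characterized in that the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determining the corrected audio data comprises:
    inputting, by the terminal, the reference audio data and the first-end audio data to a linear adaptive filter, the linear adaptive filter subtracting the reference audio data from the first-end audio data and outputting the corrected audio data.
  6. The method according to claim 4, characterized in that the terminal using the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data and determining the corrected audio data comprises:
    inputting, by the terminal, the reference audio data and the first-end audio data to a linear adaptive filter, the linear adaptive filter using the reference audio data to estimate echo audio, subtracting the echo audio from the first-end audio data, and outputting the corrected audio data.
  7. The method according to claim 5 or 6, characterized in that, before the terminal inputs the reference audio data and the first-end audio data to the linear adaptive filter, the method further comprises:
    adjusting, by the terminal, the audio parameters of the reference audio data and the audio parameters of the first-end audio data to preset values that match the linear adaptive filter.
  8. The method according to claim 7, characterized by further comprising:
    replacing, by the terminal, the corrected audio data with comfort noise when the terminal determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold.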
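  9. A terminal, characterized by comprising:
    a collection module, configured to collect first-end audio data, the first-end audio data including the voice of a first-end user and the audio played by an audio playback device on the terminal;
    a query module, configured to query a buffer for reference audio data corresponding to the first-end audio data, the buffer caching audio data on the audio playback device as reference audio data;
    a processing module, configured to use the reference audio data to eliminate the audio played by the audio playback device from the first-end audio data, and to determine corrected audio data;
    a sending module, configured to send the corrected audio data to a second-end user terminal.
  10. The terminal according to claim 9, characterized in that the audio data on the audio playback device includes audio data to be played on the audio playback device.
  11. The terminal according to claim 9, characterized in that the query module is specifically configured to:
    determine the similarity between the first-end audio data and each piece of reference audio data in the buffer;
    determine the reference audio data with the highest similarity to the first-end audio data as the reference audio data corresponding to the first-end audio data.
  12. The terminal according to any one of claims 9 to 11, characterized by further comprising a gain module;
    the gain module is specifically configured to:
    perform gain processing on the corrected audio data before the corrected audio data is sent to the second-end user terminal.
  13. The terminal according to claim 12, characterized in that the processing module is specifically configured to:
    input the reference audio data and the first-end audio data to a linear adaptive filter, the linear adaptive filter subtracting the reference audio data from the first-end audio data and outputting the corrected audio data.
  14. The terminal according to claim 12, characterized in that the processing module is specifically configured to:
    input the reference audio data and the first-end audio data to a linear adaptive filter, the linear adaptive filter using the reference audio data to estimate echo audio, subtracting the echo audio from the first-end audio data, and outputting the corrected audio data.
  15. The terminal according to claim 13 or 14, characterized in that the processing module is further configured to:
    before inputting the reference audio data and the first-end audio data to the linear adaptive filter, adjust the audio parameters of the reference audio data and the audio parameters of the first-end audio data to preset values that match the linear adaptive filter.
  16. The terminal according to claim 15, characterized in that the processing module is further configured to:
    replace the corrected audio data with comfort noise when it determines that the attenuation value of the first-end audio data relative to the corrected audio data is greater than a preset threshold.
  17. A terminal device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
  18. A computer-readable medium, characterized in that it stores a computer program executable by a terminal device, and the program, when run on the terminal device, causes the terminal device to perform the steps of the method according to any one of claims 1 to 8.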
PCT/CN2018/119887 2018-11-20 2018-12-07 Echo cancellation method and terminal WO2020103209A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/285,616 US20210321005A1 (en) 2018-11-20 2018-12-07 Method and terminal for echo cancellation
EP18940719.0A EP3882913A4 (en) 2018-11-20 2018-12-07 ECHO CANCELLATION PROCEDURE AND TERMINAL DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811382416.9 2018-11-20
CN201811382416.9A CN109346098B (zh) 2018-11-20 2018-11-20 一种回声消除方法及终端

Publications (1)

Publication Number Publication Date
WO2020103209A1 true WO2020103209A1 (zh) 2020-05-28

Family

ID=65316533

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/119887 WO2020103209A1 (zh) 2018-11-20 2018-12-07 Echo cancellation method and terminal

Country Status (4)

Country Link
US (1) US20210321005A1 (zh)
EP (1) EP3882913A4 (zh)
CN (1) CN109346098B (zh)
WO (1) WO2020103209A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110138650A (zh) * 2019-05-14 2019-08-16 北京达佳互联信息技术有限公司 即时通讯的音质优化方法、装置及设备
CN111028854B (zh) * 2019-12-06 2022-10-11 北京达佳互联信息技术有限公司 一种音频数据的处理方法、装置、电子设备及存储介质
CN113160782B (zh) * 2020-01-22 2022-11-01 百度在线网络技术(北京)有限公司 音频处理的方法、装置、电子设备及可读存储介质
CN113470673A (zh) * 2020-03-30 2021-10-01 阿里巴巴集团控股有限公司 数据处理方法、装置、设备和存储介质
CN113179447A (zh) * 2021-04-08 2021-07-27 上海视龙软件有限公司 用于网页播放媒体流的回声消除的方法、装置及设备
CN113726936B (zh) * 2021-08-30 2023-10-24 联想(北京)有限公司 一种音频数据处理方法及装置
CN114267367A (zh) * 2021-12-14 2022-04-01 海尔优家智能科技(北京)有限公司 回声联合消除方法及智能语音设备、电子设备、存储介质
CN114979344A (zh) * 2022-05-09 2022-08-30 北京字节跳动网络技术有限公司 回声消除方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140119518A1 (en) * 2012-10-31 2014-05-01 Citrix Systems, Inc. Systems and methods of monitoring performance of acoustic echo cancellation
CN104010100A (zh) * 2014-05-08 2014-08-27 深圳市汇川技术股份有限公司 VoIP通信中的回声消除系统及方法
CN106210371A (zh) * 2016-08-31 2016-12-07 广州视源电子科技股份有限公司 一种回声时延的确定方法、装置及智能会议设备
CN108091343A (zh) * 2018-01-16 2018-05-29 北京三体云联科技有限公司 一种回声消除方法及装置
CN108510997A (zh) * 2018-01-18 2018-09-07 晨星半导体股份有限公司 电子设备及应用于电子设备的回声消除方法

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4091418A (en) * 1977-03-02 1978-05-23 Zenith Radio Corporation Automatic channel equalization with time base expansion
JPS6053336A (ja) * 1983-09-02 1985-03-27 Nec Corp 反響消去装置
US5295136A (en) * 1992-04-13 1994-03-15 Motorola, Inc. Method of performing convergence in a, least mean square, adaptive filter, echo canceller
US5949888A (en) * 1995-09-15 1999-09-07 Hughes Electronics Corporaton Comfort noise generator for echo cancelers
US8271279B2 (en) * 2003-02-21 2012-09-18 Qnx Software Systems Limited Signature noise removal
US8311819B2 (en) * 2005-06-15 2012-11-13 Qnx Software Systems Limited System for detecting speech with background voice estimates and noise estimates
CN103312913B (zh) * 2013-07-03 2015-12-23 苏州科达科技股份有限公司 一种消除回声的系统及方法
US9516159B2 (en) * 2014-11-04 2016-12-06 Apple Inc. System and method of double talk detection with acoustic echo and noise control
KR20160076059A (ko) * 2014-12-22 2016-06-30 삼성전자주식회사 디스플레이장치 및 그 반향 제거방법
TWI671737B (zh) * 2015-08-07 2019-09-11 圓剛科技股份有限公司 回音消除裝置以及回音消除方法
WO2017053493A1 (en) * 2015-09-25 2017-03-30 Microsemi Semiconductor (U.S.) Inc. Comfort noise generation apparatus and method
US10225395B2 (en) * 2015-12-09 2019-03-05 Whatsapp Inc. Techniques to dynamically engage echo cancellation
US9912373B1 (en) * 2016-10-19 2018-03-06 Whatsapp Inc. Techniques to detect echoes using audio fingerprinting
CN106817490A (zh) * 2017-01-12 2017-06-09 努比亚技术有限公司 一种终端和声音播放方法
US10389885B2 (en) * 2017-02-01 2019-08-20 Cisco Technology, Inc. Full-duplex adaptive echo cancellation in a conference endpoint
US20220303386A1 (en) * 2021-03-22 2022-09-22 DSP Concepts, Inc. Method and system for voice conferencing with continuous double-talk

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140119518A1 (en) * 2012-10-31 2014-05-01 Citrix Systems, Inc. Systems and methods of monitoring performance of acoustic echo cancellation
CN104010100A (zh) * 2014-05-08 2014-08-27 深圳市汇川技术股份有限公司 VoIP通信中的回声消除系统及方法
CN106210371A (zh) * 2016-08-31 2016-12-07 广州视源电子科技股份有限公司 一种回声时延的确定方法、装置及智能会议设备
CN108091343A (zh) * 2018-01-16 2018-05-29 北京三体云联科技有限公司 一种回声消除方法及装置
CN108510997A (zh) * 2018-01-18 2018-09-07 晨星半导体股份有限公司 电子设备及应用于电子设备的回声消除方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3882913A4 *

Also Published As

Publication number Publication date
CN109346098A (zh) 2019-02-15
CN109346098B (zh) 2022-06-07
EP3882913A4 (en) 2021-10-20
US20210321005A1 (en) 2021-10-14
EP3882913A1 (en) 2021-09-22

Similar Documents

Publication Publication Date Title
WO2020103209A1 (zh) Echo cancellation method and terminal
US8744091B2 (en) Intelligibility control using ambient noise detection
KR102031023B1 (ko) 적응형 잡음 제거 시스템에서 잡음 방지 생성기 응답 및 2차 경로 응답의 시퀀싱된 적응
US9666176B2 (en) Systems and methods for adaptive noise cancellation by adaptively shaping internal white noise to train a secondary path
CN105144674B (zh) 多通道回声消除与噪声抑制
US9491545B2 (en) Methods and devices for reverberation suppression
US9066176B2 (en) Systems and methods for adaptive noise cancellation including dynamic bias of coefficients of an adaptive noise cancellation system
US9264808B2 (en) Systems and methods for detection and cancellation of narrow-band noise
US8774399B2 (en) System for reducing speakerphone echo
CN108076239B (zh) 一种改善ip电话回声的方法
US9413434B2 (en) Cancellation of interfering audio on a mobile device
CN106791244B (zh) 回声消除方法、装置以及通话设备
US9392364B1 (en) Virtual microphone for adaptive noise cancellation in personal audio devices
US20150086006A1 (en) Echo suppressor using past echo path characteristics for updating
US8744524B2 (en) User interface tone echo cancellation
US20190221226A1 (en) Electronic apparatus and echo cancellation method applied to electronic apparatus
US8582754B2 (en) Method and system for echo cancellation in presence of streamed audio
CN112702460B (zh) 一种语音通信的回声消除方法及装置
WO2024131371A1 (zh) 一种语音处理方法、装置和电子设备
US20110116644A1 (en) Simulated background noise enabled echo canceller
US9392365B1 (en) Psychoacoustic hearing and masking thresholds-based noise compensator system
CN113038340B (zh) 基于安卓设备的声学回音消除调优方法、系统及存储介质
CN114979344A (zh) 回声消除方法、装置、设备及存储介质
CN113608714A (zh) 一种回音消除方法、电子设备和计算机可读存储介质
US20230267910A1 (en) Method for reducing echo in a hearing instrument and hearing instrument

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18940719

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018940719

Country of ref document: EP

Effective date: 20210617