CN114979344A

CN114979344A - Echo cancellation method, device, equipment and storage medium

Info

Publication number: CN114979344A
Application number: CN202210503264.3A
Authority: CN
Inventors: 周新权; 熊伟浩; 戚耿鑫
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2022-05-09
Filing date: 2022-05-09
Publication date: 2022-08-30

Abstract

The disclosure provides an echo cancellation method, apparatus, device and storage medium. The method includes: acquiring a target near-end call voice; acquiring background sound as a first reference signal; acquiring far-end call voice as a second reference signal; and performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and sending the echo-cancelled near-end call voice to a far end. Because echo cancellation is performed on the target near-end call voice based on both the first reference signal corresponding to the background sound and the second reference signal corresponding to the far-end call voice, echo arising during a voice call can be cancelled and voice call quality is improved.

Description

Echo cancellation method, device, equipment and storage medium

Technical Field

The embodiments of the present disclosure relate to the field of audio signal processing technologies, and in particular, to an echo cancellation method, apparatus, device, and storage medium.

Background

Among the system audio modes, the media mode offers good playback quality, uses the system media volume, and allows the playback volume to be adjusted down to 0, so it is adopted by most communication software. However, in some specific scenarios, for example when the voice call function of an entertainment application (APP) plays audio through the device's loudspeaker, the background sound (BGM) of the entertainment APP may cause echo interference to the user's call voice. It is therefore very important to implement echo cancellation for BGM in the media mode.

Disclosure of Invention

The embodiment of the disclosure provides an echo cancellation method, an echo cancellation device, echo cancellation equipment and a storage medium, which can cancel echo in a voice call process and improve voice call quality.

In a first aspect, an embodiment of the present disclosure provides an echo cancellation method, including:

acquiring a target near-end call voice;

acquiring background sound as a first reference signal;

acquiring far-end call voice as a second reference signal;

and performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and sending the near-end call voice after the echo cancellation to a far end.

In a second aspect, an embodiment of the present disclosure further provides an echo cancellation device, including:

the target near-end call voice acquisition module is used for acquiring target near-end call voice;

the first reference signal acquisition module is used for acquiring background sound as a first reference signal;

the second reference signal acquisition module is used for acquiring the far-end call voice as a second reference signal;

and the echo cancellation module is used for performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and sending the near-end call voice after the echo cancellation to a far end.

In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:

one or more processing devices;

storage means for storing one or more programs;

when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to implement the echo cancellation method according to the embodiment of the present disclosure.

In a fourth aspect, the disclosed embodiments also provide a computer-readable medium, on which a computer program is stored, where the program, when executed by a processing device, implements an echo cancellation method according to the disclosed embodiments.

The embodiments of the disclosure provide an echo cancellation method, apparatus, device and storage medium. The method includes: acquiring a target near-end call voice; acquiring background sound as a first reference signal; acquiring far-end call voice as a second reference signal; and performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and sending the echo-cancelled near-end call voice to the far end. Because echo cancellation is performed on the target near-end call voice based on both the first reference signal corresponding to the background sound and the second reference signal corresponding to the far-end call voice, echo arising during a voice call can be cancelled and voice call quality is improved.

Drawings

Fig. 1 is a flow chart of an echo cancellation method in an embodiment of the present disclosure;

Fig. 2a is an exemplary diagram of an audio waveform collected by a first microphone in this embodiment;

Fig. 2b is an exemplary diagram of an audio waveform collected by the second microphone in this embodiment;

Fig. 3 is a schematic diagram of an implementation of echo cancellation in an embodiment of the invention;

Fig. 4 is a schematic structural diagram of an echo cancellation device in an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of an electronic device in the embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Fig. 1 is a flowchart of an echo cancellation method provided in an embodiment of the present disclosure, where the embodiment is applicable to an echo cancellation situation in a voice call process, and the method may be executed by an echo cancellation device, where the echo cancellation device may be composed of hardware and/or software, and may be generally integrated in a device with an echo cancellation function, where the device may be an electronic device such as a server, a mobile terminal, or a server cluster. As shown in fig. 1, the method specifically includes the following steps:

S110: acquiring the target near-end call voice.

The near-end call voice may be call voice collected by a near-end microphone for transmission to the far end, where the near end can be understood as the local client and the far end as a remote client that has established a call connection with the near end. The near-end call voice may include the voice of the near-end speaker, background sound played by the near-end loudspeaker, and far-end call voice. The background sound can be understood as the sound played by the communication software itself while the near-end user and the far-end user carry out a voice call through the communication software of the mobile terminal. The far-end call voice may be call voice collected by a far-end microphone for transmission to the near end. For example, when the near-end user and one or more far-end users carry out a voice call through entertainment voice software with loudspeaker playback, the near-end call voice collected by the microphone of the near-end device includes, in addition to the voice of the near-end speaker, the background sound output by the entertainment voice software and the far-end call voice, both played by the loudspeaker. One such scenario is a voice call carried out while playing a game, where the call function is embedded in the game application (APP).

Optionally, the target near-end call voice may be acquired as follows: if near-end call voices of multiple channels are collected, determining the signal echo ratio of the near-end call voice of each channel; and determining the near-end call voice of the channel with the largest signal echo ratio as the target near-end call voice.

The signal echo ratio may be the ratio of the speech signal collected by the microphone to the echo. A larger signal echo ratio indicates a better audio signal collected by the microphone, that is, a stronger speaker voice signal and a lower proportion of echo. For example, take one mobile phone with two microphones that collect the near-end call voice simultaneously. Fig. 2a is an exemplary diagram of an audio waveform collected by the first microphone in this embodiment. Fig. 2b is an exemplary diagram of an audio waveform collected by the second microphone in this embodiment. In both Fig. 2a and Fig. 2b, the first half of the waveform is the speaker's voice in the near-end call voice and the second half is the echo, and the two differ markedly in their characteristics. The echo collected by the first microphone in Fig. 2a is much lower than the echo collected by the second microphone in Fig. 2b; that is, the near-end call voice of the first microphone has the larger signal echo ratio, so the near-end call voice collected by the first microphone is used as the target near-end call voice.

In this embodiment, if the near-end device has at least two microphones, at least two channels collect the near-end call voice, the signal echo ratio of the near-end call voice of each channel may be calculated respectively, and the near-end call voice collected by the corresponding microphone may be selected according to the signal echo ratio. Specifically, the near-end call voice of the channel with the largest signal echo ratio is used as the target near-end call voice. In this embodiment, the target near-end speech is selected based on the maximum signal echo ratio of the near-end speech, so that the efficiency and effect of echo cancellation processing can be improved on the basis of ensuring the speech quality.
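The channel selection described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes that, as in Fig. 2a and Fig. 2b, each channel provides one segment containing only the speaker's voice and one containing only echo, and it expresses the signal echo ratio in decibels. The function names `power`, `signal_echo_ratio_db` and `select_target_channel` are hypothetical.

```python
import math

def power(samples):
    """Mean squared amplitude of a segment."""
    return sum(s * s for s in samples) / len(samples)

def signal_echo_ratio_db(speech_segment, echo_segment, eps=1e-12):
    """Signal echo ratio in dB (assumed definition: speech power over echo power)."""
    return 10.0 * math.log10((power(speech_segment) + eps) / (power(echo_segment) + eps))

def select_target_channel(channels):
    """channels: list of (speech_segment, echo_segment) pairs, one per microphone.
    Returns the index of the channel with the largest ratio, plus all ratios."""
    ratios = [signal_echo_ratio_db(speech, echo) for speech, echo in channels]
    best = max(range(len(ratios)), key=lambda i: ratios[i])
    return best, ratios

# Two microphones with the same speech level but different echo levels.
mic1 = ([0.5, -0.5, 0.5, -0.5], [0.05, -0.05, 0.05, -0.05])
mic2 = ([0.5, -0.5, 0.5, -0.5], [0.30, -0.30, 0.30, -0.30])
best_index, ratios = select_target_channel([mic1, mic2])
```

In this toy input the first microphone picks up less echo, so it is the one selected as the source of the target near-end call voice.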

S120: acquiring background sound as a first reference signal.

In this embodiment, since the background sound is played by the entertainment software in which the voice module resides, the voice call system cannot acquire the audio data corresponding to the background sound through its own functions. Therefore, in this embodiment, the audio data corresponding to the background sound may be acquired by means of a third-party callback technology. Specifically, the audio data corresponding to the background sound may be obtained by hooking the system's audio play callback function, and used as the first reference signal.

Optionally, the background sound may be acquired as the first reference signal as follows: calling a set audio play function; and acquiring the background sound as the first reference signal based on the set audio play function.

The set audio play function may be a callback function installed by hooking the system's audio playback.

In this embodiment, based on the hook technology, the set audio play function captures the playback data of the background sound that the system play function (Play function) obtains, thereby acquiring the background sound, which is used as the first reference signal. The system play function obtains the playback data of the background sound and passes it to the loudspeaker for playback. In this embodiment, acquiring the background sound by calling the set audio play function makes it more convenient to obtain the first reference signal.

Optionally, the set audio play function may be called as follows: establishing a correspondence between the system play function name and the set audio play function address; when the system play function is called, obtaining the set audio play function address based on the correspondence; and calling the set audio play function based on the set audio play function address.

In this embodiment, the corresponding relationship between the system play function name and the set audio play function address may be established in the global offset table of the dynamic library. The dynamic library may be a library that is shared by multiple processes and does not occupy additional physical memory. The addresses of all functions in the dynamic library are determined in the loading process and loaded into the memory, and the function addresses in the dynamic library are different along with the difference of the loading addresses of the whole dynamic library. When the dynamic library calls the function of the target dynamic library, a Global Offset Table (GOT) can be added in the middle of the call, and the real function can be indirectly called through the GOT. The global offset table may be a table storing the correspondence between function names and function addresses.

Specifically, in the global offset table of the target dynamic library, the system play function name is kept unchanged, the system play function address corresponding to the system play function name is replaced with the set audio play function address, and a corresponding relationship is established between the system play function name and the set audio play function address.

In this embodiment, when the dynamic library calls the system play function, the set audio play function address is queried according to the corresponding relationship in the global offset table, so that the set audio play function is called according to the set audio play function address. In this embodiment, through a hook technology, a corresponding relationship between a system play function name and a set audio play function address is established in a global offset table of a target dynamic library, so that when a system play function is called, the set audio play function can be called, audio data played by entertainment software itself, that is, background sound, can be obtained, and then the audio data can be used as a first reference signal for echo cancellation, and acquisition of the first reference signal is facilitated.
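The redirection described above can be illustrated with a small analogy. The sketch below does not patch a real global offset table; it models the table as a dictionary from function name to callable, keeps the name unchanged, and swaps the entry for a capturing function. All names (`got`, `system_play`, `install_hook`, `call_through_table`) are hypothetical, and forwarding to the original function so that playback continues is an assumption rather than something the text states.

```python
# Hypothetical stand-in for a dynamic library's global offset table:
# function name -> callable (playing the role of the function address).
captured_frames = []

def system_play(frame):
    """Stand-in for the system play function that hands data to the loudspeaker."""
    return len(frame)  # pretend this is the number of samples played

got = {"system_play": system_play}

def install_hook(table, name, capture):
    """Keep the name, replace the entry with a function that records the data."""
    original = table[name]

    def set_audio_play(frame):
        capture.append(list(frame))  # copy of the playback data (first reference signal)
        return original(frame)       # assumption: playback then proceeds unchanged

    table[name] = set_audio_play
    return original

def call_through_table(table, name, frame):
    """Callers resolve the function through the table on every call."""
    return table[name](frame)

install_hook(got, "system_play", captured_frames)
played = call_through_table(got, "system_play", [0.1, 0.2, 0.3])
```

Because callers look the function up by name each time, they reach the capturing function without any change to their own code, which mirrors the indirect call through the GOT.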

S130: acquiring the far-end call voice as a second reference signal.

The second reference signal may be understood as the audio data corresponding to the far-end call voice. The audio data to be played can be acquired using conventional techniques in the audio signal field, yielding the far-end call voice, namely the second reference signal. For example, a set function (e.g., a Play function) in the voice call system may be called to obtain the audio data corresponding to the far-end call voice.

S140: performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and sending the near-end call voice after echo cancellation to the far end.

The echo may be the background sound and far-end call voice that are played by the near-end loudspeaker and collected by the near-end microphone. Acoustic echo cancellation (AEC) may be performed by a delay estimation algorithm, an adaptive filter algorithm, and a residual echo suppression algorithm. This embodiment does not limit the specific delay estimation algorithm, adaptive filter algorithm, or residual echo suppression algorithm. The steps of echo cancellation may be: calculating the delay between the reference signal and the echo through the delay estimation algorithm, so as to align the reference signal with the echo; suppressing most of the echo energy through the adaptive filter algorithm; and further suppressing the residual echo energy through the residual echo suppression algorithm.
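The three steps above can be sketched as follows. Since the embodiment does not fix any particular algorithm, the choices here are assumptions made only for illustration: cross-correlation for delay estimation, a normalized least-mean-squares (NLMS) filter as the adaptive filter, and a fixed attenuation while the reference is active as the residual echo suppressor. All function and parameter names are hypothetical.

```python
import random

def estimate_delay(reference, mic, max_delay):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    def correlation(lag):
        return sum(reference[n - lag] * mic[n] for n in range(lag, len(mic)))
    return max(range(max_delay + 1), key=correlation)

def nlms_filter(reference, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of the reference from the mic signal."""
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(w * v for w, v in zip(weights, x))
        error = mic[n] - echo_estimate
        norm = sum(v * v for v in x) + eps
        weights = [w + step * error * v / norm for w, v in zip(weights, x)]
        residual.append(error)
    return residual

def suppress_residual(residual, reference, coefficient, floor=1e-3):
    """Attenuate the residual by (1 - coefficient) while the reference is active."""
    return [r * (1.0 - coefficient) if abs(x) > floor else r
            for r, x in zip(residual, reference)]

def cancel_echo(reference, mic, max_delay=16, coefficient=0.5):
    delay = estimate_delay(reference, mic, max_delay)
    aligned = [0.0] * delay + reference[:len(reference) - delay]
    residual = nlms_filter(aligned, mic)
    return delay, suppress_residual(residual, aligned, coefficient)

# Toy input: the mic picks up only an echo, i.e. the reference delayed by
# 5 samples and scaled by 0.6.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] * 5 + [0.6 * x for x in reference[:-5]]
delay, cleaned = cancel_echo(reference, mic)
```

Aligning the reference first keeps the adaptive filter short, because it only has to model the echo path rather than the playback delay as well.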

Optionally, the method for performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal may be: performing echo cancellation on the target near-end call voice based on the first reference signal, and performing further echo cancellation on the target near-end call voice after the echo cancellation based on the second reference signal; or, performing echo cancellation on the target near-end call voice based on the second reference signal, and performing further echo cancellation on the target near-end call voice after echo cancellation based on the first reference signal.

In this embodiment, a dual echo cancellation method may be used for the two kinds of echo present in the target near-end call voice, namely the background sound and the far-end call voice. The two echo cancellation stages can be cascaded, so that the echoes introduced by the background sound and by the far-end call voice are cancelled one after the other in the cascade order. This embodiment does not limit the order of the cascade. Specifically, echo cancellation may be performed on the target near-end call voice based on the first reference signal, and then, based on the second reference signal, echo cancellation may be further performed on the target near-end call voice. Alternatively, echo cancellation is performed on the target near-end call voice based on the second reference signal, and on this basis, echo cancellation is further performed on the target near-end call voice based on the first reference signal.

In this embodiment, the dual echo cancellation method covers both the cancellation of the background sound echo and the cancellation of the far-end call voice echo, so that the quality of the near-end call voice is preserved while the far-end call voice echo does not leak through, which improves the echo cancellation effect.

Optionally, performing echo cancellation on the target near-end call voice based on the first reference signal may be: performing echo cancellation on the target near-end call voice using a first suppression coefficient based on the first reference signal. Performing echo cancellation on the target near-end call voice based on the second reference signal may be: performing echo cancellation on the target near-end call voice using a second suppression coefficient based on the second reference signal; wherein the first suppression coefficient is smaller than the second suppression coefficient.

The first suppression coefficient may be a suppression coefficient used when the background sound echo is cancelled. The second suppression coefficient may be a suppression coefficient used when canceling the far-end call voice echo.

In this embodiment, the echo of the background sound differs from the echo of the call voice: the background sound is generally present continuously, whereas call voice generally alternates between the two ends. Therefore, for the echo introduced by the background sound, a relatively small first suppression coefficient is used when performing echo cancellation on the target near-end call voice based on the first reference signal, so as to preserve the quality of the near-end call voice while cancelling as much echo as possible. When performing echo cancellation on the target near-end call voice based on the second reference signal, a relatively large second suppression coefficient is used, so as to ensure that the echo of the far-end call voice is cancelled and the echo cancellation effect is improved.
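A toy cascade can make the role of the two coefficients concrete. The sketch below is built on assumptions that the text does not specify: each stage removes the echo with a single least-squares gain and then attenuates the residual by (1 - coefficient) wherever its reference is active. The names `cancel_stage` and `dual_cancel`, and the coefficient values 0.1 and 0.6, are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cancel_stage(signal, reference, coefficient, active_floor=1e-3):
    """One stage: linear echo removal, then residual suppression."""
    gain = dot(signal, reference) / (dot(reference, reference) + 1e-12)
    residual = [s - gain * r for s, r in zip(signal, reference)]
    return [e * (1.0 - coefficient) if abs(r) > active_floor else e
            for e, r in zip(residual, reference)]

def dual_cancel(mic, bgm_ref, far_ref, first_coeff=0.1, second_coeff=0.6,
                bgm_first=True):
    """Cascade two stages; the cascade order is selectable."""
    if not first_coeff < second_coeff:
        raise ValueError("first suppression coefficient must be the smaller one")
    stages = [(bgm_ref, first_coeff), (far_ref, second_coeff)]
    if not bgm_first:
        stages.reverse()
    out = mic
    for reference, coefficient in stages:
        out = cancel_stage(out, reference, coefficient)
    return out

# Background sound is present throughout; the near-end speaker talks in the
# first half and the far end talks in the second half.
bgm = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, -1.0, 1.0]
speech = [0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0]
mic = [s + 0.5 * b + 0.4 * f for s, b, f in zip(speech, bgm, far)]
cleaned = dual_cancel(mic, bgm, far)
```

In this input the near-end speech overlaps only the continuously present background sound, so it is attenuated only by the small first coefficient, while the larger second coefficient acts only where the far end is talking.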

Optionally, after performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, the method further includes: performing silence suppression (Voice Activity Detection, VAD) processing on the target near-end call voice after echo cancellation.

Silence suppression VAD may be a technique for cancelling the residual background sound echo. Specifically, the target near-end call voice after echo cancellation is judged in real time. If the judgment result is that the current target near-end call voice contains only background sound, silence suppression VAD may set the signal to 0, so as to further suppress the background sound echo. If the judgment result is that the current target near-end call voice contains both the speaker's voice and background sound, the target near-end call voice is retained in full and no silence suppression VAD processing is performed.
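The gating behavior described above can be sketched frame by frame. How a frame is judged is not specified in the text, so the energy threshold used here is purely an assumption, as are the names `frame_energy`, `is_background_only` and `vad_gate`.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def is_background_only(frame, speech_threshold):
    """Assumed decision rule: low-energy frames hold only residual background sound."""
    return frame_energy(frame) < speech_threshold

def vad_gate(signal, frame_len, speech_threshold):
    """Zero frames judged as background only; keep frames with speech untouched."""
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if is_background_only(frame, speech_threshold):
            out.extend([0.0] * len(frame))
        else:
            out.extend(frame)
    return out

# First frame: faint residual background sound. Second frame: speaker's voice.
echo_cancelled = [0.01, -0.01, 0.01, -0.01, 0.5, -0.4, 0.5, -0.4]
gated = vad_gate(echo_cancelled, frame_len=4, speech_threshold=0.01)
```

Frames that contain speech pass through unchanged, so any background sound mixed into them is left to the earlier echo cancellation stages.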

In this embodiment, the silence suppression VAD is adopted to process the echo-cancelled target near-end call voice again, so that the residual background sound echo can be cancelled, and the echo-cancelled near-end call voice is sent to the far-end, thereby improving the user experience of voice call.

Fig. 3 is a schematic diagram of an implementation of echo cancellation according to an embodiment of the present invention. As shown in Fig. 3, a plurality of microphones collect near-end call voice; the microphone with the largest signal echo ratio is selected, and the near-end call voice it collects is determined as the target near-end call voice. The set audio play function is called based on the hook technology to obtain the background sound as the first reference signal, and the background sound echo is cancelled based on the first reference signal. The far-end call voice is used as the second reference signal, and the far-end call voice echo is cancelled based on the second reference signal. Silence suppression VAD processing is then performed on the echo-cancelled target near-end call voice, and the resulting near-end call voice is sent to the far end.

Fig. 4 is a schematic structural diagram of an echo cancellation device according to an embodiment of the present disclosure, and as shown in fig. 4, the echo cancellation device includes:

a target near-end call voice obtaining module 210, configured to obtain a target near-end call voice;

a first reference signal obtaining module 220, configured to obtain a background sound as a first reference signal;

a second reference signal obtaining module 230, configured to obtain a far-end call voice as a second reference signal;

and the echo cancellation module 240 is configured to perform echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, and send the near-end call voice after the echo cancellation to the far-end.

Optionally, the target near-end call voice obtaining module 210 is further configured to:

if near-end call voices of multiple channels are collected, determining the signal echo ratio of the near-end call voice of each channel;

and determining the near-end call voice of the channel with the maximum signal echo ratio as the target near-end call voice.

Optionally, the first reference signal obtaining module 220 is further configured to:

calling a set audio play function;

and acquiring background sound as a first reference signal based on the set audio play function.

Optionally, the first reference signal obtaining module 220 is further configured to:

establishing a corresponding relation between the name of a system playing function and a set audio playing function address;

when a system playing function is called, a set audio playing function address is obtained based on the corresponding relation;

and calling the set audio playing function based on the set audio playing function address.

Optionally, the echo cancellation module 240 is further configured to:

performing echo cancellation on the target near-end call voice based on the first reference signal, and performing further echo cancellation on the target near-end call voice after the echo cancellation based on the second reference signal; or,

and performing echo cancellation on the target near-end call voice based on the second reference signal, and performing further echo cancellation on the echo-cancelled target near-end call voice based on the first reference signal.

Optionally, the echo cancellation module 240 is further configured to:

performing echo cancellation on the target near-end call voice by adopting a first suppression coefficient based on the first reference signal;

performing echo cancellation on the target near-end call voice based on the second reference signal, including:

performing echo cancellation on the target near-end call voice by adopting a second suppression coefficient based on a second reference signal; wherein the first suppression coefficient is smaller than the second suppression coefficient.

Optionally, the echo cancellation module 240 is further configured to:

and performing silence suppression (VAD) processing on the target near-end call voice after echo cancellation.

The device can execute the methods provided by all the embodiments of the disclosure, and has corresponding functional modules and beneficial effects for executing the methods. For details of the technology not described in detail in this embodiment, reference may be made to the methods provided in all the foregoing embodiments of the disclosure.

Referring now to FIG. 5, a block diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like, or various forms of servers such as a stand-alone server or a server cluster. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in Fig. 5, electronic device 300 may include a processing device (e.g., central processing unit, graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the echo cancellation method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target near-end call voice; acquire background sound as a first reference signal and far-end call voice as a second reference signal; perform echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal; and send the echo-cancelled near-end call voice to a far end.
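As a rough illustration of these program steps, the following Python sketch cancels echo from a microphone signal in two cascaded stages, first against the background-sound reference and then against the far-end reference. It is a minimal sketch under stated assumptions, not the implementation of this disclosure: the normalized least-mean-squares (NLMS) adaptive filter is a commonly used technique substituted here for illustration, and the names `nlms_cancel`, `cancel_echo`, `simulate`, the tap count, the step size, and the toy echo paths are all hypothetical.

```python
import random


def nlms_cancel(mic, ref, taps=8, mu=0.1, eps=1e-8):
    """Remove the part of `mic` that is linearly predictable from `ref` (NLMS)."""
    weights = [0.0] * taps      # running estimate of the echo path
    history = [0.0] * taps      # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_hat = sum(w * h for w, h in zip(weights, history))
        err = d - echo_hat      # echo-cancelled output sample
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out


def cancel_echo(near_end, bgm_ref, far_end_ref):
    """Cascade: cancel against the first reference, then against the second."""
    stage1 = nlms_cancel(near_end, bgm_ref)
    return nlms_cancel(stage1, far_end_ref)


def power(signal):
    return sum(s * s for s in signal) / len(signal)


def simulate(n=4000, seed=0):
    """Toy scenario (assumed): white-noise references, short fixed echo paths."""
    rng = random.Random(seed)
    bgm = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = []
    for i in range(n):
        bgm_echo = 0.6 * bgm[i] + (0.3 * bgm[i - 1] if i >= 1 else 0.0)
        far_echo = 0.5 * far[i] + (0.2 * far[i - 2] if i >= 2 else 0.0)
        mic.append(bgm_echo + far_echo)   # near-end talker silent: pure echo
    cleaned = cancel_echo(mic, bgm, far)
    # Compare power over the last samples, after the filters have adapted.
    return power(mic[-500:]), power(cleaned[-500:])


if __name__ == "__main__":
    before, after = simulate()
    print("echo power before:", before, "after:", after)
```

In this toy run the near-end talker is silent, so any remaining output power is residual echo; the cleaned signal returned by `cancel_echo` stands in for what would be sent to the far end.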

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, an echo cancellation method is disclosed, comprising:

acquiring a target near-end call voice;

acquiring background sound as a first reference signal;

acquiring far-end call voice as a second reference signal;

Further, acquiring the target near-end call voice comprises:

Further, acquiring a background sound as a first reference signal includes:

calling and setting an audio playing function;

and acquiring background sound as a first reference signal based on the set audio playing function.

Further, the calling and setting an audio playing function includes:

establishing a corresponding relation between the system playing function name and the set audio playing function address;

when the system playing function is called, the address of the set audio playing function is obtained based on the corresponding relation;
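The name-to-address correspondence described above can be pictured as a small dispatch table. The following Python sketch is an illustration only, under stated assumptions: `system_play`, `set_audio_play`, `hook_table`, `call_by_name` and `captured_frames` are hypothetical names, the playback function is a stand-in rather than a real platform API, and redirecting the call to the set function once its address is obtained is an assumption of the sketch.

```python
captured_frames = []   # background sound collected as the first reference signal


def system_play(frame):
    """Stand-in for the platform playback function (hypothetical)."""
    return len(frame)  # pretend to play and report how many samples were played


def set_audio_play(frame):
    """The 'set' function: keep a copy of the frame, then play it as usual."""
    captured_frames.append(list(frame))
    return system_play(frame)


# Original functions, addressed by name.
originals = {"system_play": system_play}

# Correspondence between the system playing function name and the
# address (here: the function object) of the set audio playing function.
hook_table = {"system_play": set_audio_play}


def call_by_name(name, frame):
    """Look up the callee by name; a hooked name resolves to the set function."""
    target = hook_table.get(name, originals[name])
    return target(frame)


if __name__ == "__main__":
    call_by_name("system_play", [0.1, 0.2, 0.3])
    print(captured_frames)
```

With this arrangement the application keeps calling the playback function by its usual name, while every played frame is also available as reference data.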

Further, performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, including:

performing echo cancellation on the target near-end call voice based on the first reference signal, and performing further echo cancellation on the echo-cancelled target near-end call voice based on the second reference signal; or

Further, performing echo cancellation on the target near-end call voice based on the first reference signal, including:

performing echo cancellation on the target near-end call voice by adopting a second suppression coefficient based on the second reference signal; wherein the first suppression coefficient is less than the second suppression coefficient.
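To make the role of the two coefficients concrete, the following Python sketch applies a per-frame gain that shrinks as the estimated echo grows, scaled by a suppression coefficient. This passage does not state how a suppression coefficient enters the computation, so the gain rule, the function name `suppress`, and the coefficient values below are assumptions chosen only for illustration; the one property taken from the text is that the first coefficient is smaller than the second.

```python
def suppress(frame, echo_estimate, coefficient, floor=0.0):
    """Scale a frame by an assumed gain: 1 - coefficient * echo_power / mic_power."""
    mic_power = sum(s * s for s in frame) + 1e-12   # avoid division by zero
    echo_power = sum(s * s for s in echo_estimate)
    gain = max(floor, 1.0 - coefficient * echo_power / mic_power)
    return [gain * s for s in frame]


# Hypothetical values; only the ordering FIRST < SECOND comes from the text.
FIRST_SUPPRESSION_COEFFICIENT = 1.0    # used with the background-sound reference
SECOND_SUPPRESSION_COEFFICIENT = 2.0   # used with the far-end reference


if __name__ == "__main__":
    frame = [0.5, -0.5, 0.5, -0.5]
    echo = [0.25, -0.25, 0.25, -0.25]
    print(suppress(frame, echo, FIRST_SUPPRESSION_COEFFICIENT))
    print(suppress(frame, echo, SECOND_SUPPRESSION_COEFFICIENT))
```

Under this assumed rule, the smaller first coefficient attenuates a frame less than the second coefficient does for the same estimated echo level.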

Further, after performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal, the method further includes:

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions of the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. An echo cancellation method, comprising:

acquiring a target near-end call voice;

acquiring background sound as a first reference signal;

acquiring far-end call voice as a second reference signal;

2. The method of claim 1, wherein acquiring the target near-end call voice comprises:

if near-end call voices of multiple channels are collected, determining the signal-to-echo ratio of the near-end call voice of each channel;

3. The method of claim 1, wherein acquiring the background sound as the first reference signal comprises:

calling and setting an audio playing function;

4. The method of claim 3, wherein calling the set audio playing function comprises:

5. The method of claim 1, wherein performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal comprises:

6. The method of claim 5, wherein performing echo cancellation on the target near-end call voice based on the first reference signal comprises:

7. The method of claim 1, further comprising, after performing echo cancellation on the target near-end call voice based on the first reference signal and the second reference signal:

8. An echo cancellation device, comprising:

9. An electronic device, characterized in that the electronic device comprises:

one or more processing devices;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the echo cancellation method of any one of claims 1-7.

10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processing device, carries out the echo cancellation method according to any one of claims 1-7.