CN110138991B

CN110138991B - Echo cancellation method and device

Info

Publication number: CN110138991B
Application number: CN201910542019.1A
Authority: CN
Inventors: 解增辉
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2021-05-25
Anticipated expiration: 2039-06-21
Also published as: CN110138991A

Abstract

The invention discloses an echo cancellation method and device, and relates to the technical field of voice processing. One embodiment of the method comprises: collecting first audio data and second audio data of an audio input end, wherein the first audio data is automatically generated by a power-on self-starting process; determining, when third audio data exists, reference audio data according to the first audio data and the third audio data, wherein the third audio data is audio data output by an application program; and acquiring the reference audio data and performing echo cancellation processing on the second audio data according to the reference audio data. This embodiment can remove the echo mixed into the audio data collected at the audio input end and improve voice communication quality. The process requires no additional hardware module and no change to hardware wiring, which avoids the increased hardware cost and the difficulty of modifying hardware modules in finished equipment.

Description

Echo cancellation method and device

Technical Field

The present invention relates to the field of speech processing technologies, and in particular, to an echo cancellation method and apparatus.

Background

Speech is a natural and widely used means of interaction, for example in instant messaging applications, in interaction with smart devices, and in other speech recognition scenarios. Echo often degrades call quality and user experience. For example, an instant messaging application requires real-time voice communication between two or more parties. If a user does not wear an earphone and instead uses the loudspeaker, the other party's voice played by the loudspeaker is collected by the user's microphone and sent back to the other party. The other party then hears not only the user's voice but also their own earlier speech, which degrades communication quality and user experience; in more serious cases, the feedback loop oscillates and produces howling. Similarly, in many voice recognition scenarios, the device that recognizes the input voice may itself be playing audio (e.g., audio/video playback or a voice broadcast). The audio played by the device's loudspeaker is then mixed into the audio picked up by the device's microphone, that is, the loudspeaker sound is fed back to the microphone. The loudspeaker audio present in the microphone input is generally called an echo (acoustic echo), and it often interferes with the device's recognition of external speech.

At present, most echo cancellation methods eliminate echo by means of a reference sound, and acquiring the reference sound is a key part of echo cancellation. A common method is to add an ADC (Analog-to-Digital Converter) that performs analog-to-digital conversion on the audio output signal and to use the converted signal as the reference sound.

In the process of implementing the invention, the inventor found at least the following problems in the prior art: conventional echo cancellation is implemented in hardware, by integrating a processing chip into the hardware circuit, adding a hardware module, or changing the hardware wiring. This not only increases cost; in some systems the hardware loop is also difficult to modify, so acquiring the reference sound through hardware is hard to realize.

Disclosure of Invention

In view of this, embodiments of the present invention provide an echo cancellation method and apparatus that can cancel the echo mixed into the audio data collected at an audio input end and improve voice communication quality. The process requires no additional hardware module and no change to hardware wiring, which avoids the increased hardware cost and the difficulty of modifying hardware modules in a finished device.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an echo cancellation method including:

collecting first audio data and second audio data of an audio input end, wherein the first audio data is automatically generated by a power-on self-starting process;

determining, when third audio data exists, reference audio data according to the first audio data and the third audio data;

and acquiring the reference audio data, and performing echo cancellation processing on the second audio data according to the reference audio data.

Optionally, the method further comprises: when third audio data does not exist, taking the first audio data as the reference audio data.

Optionally, the method further comprises: when the power-on self-starting process is abnormal and the third audio data does not exist, generating fourth audio data and taking the fourth audio data as the reference audio data.

Optionally, the first audio data and the fourth audio data are both null data.

Optionally, when the power-on self-starting process is abnormal and the third audio data does not exist, the data amount of fourth audio data to be acquired is determined according to the following process:

determining a current time and a time when fourth audio data was first generated;

determining the data amount of fourth audio data expected to be acquired according to the current time, the time when fourth audio data was first generated, the sampling rate, the number of channels, and the sampling precision;

and determining the data amount of fourth audio data to be acquired according to the data amount of fourth audio data expected to be acquired and the data amount of fourth audio data already acquired before the current time.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an echo cancellation device including:

an acquisition module, configured to collect first audio data and second audio data of an audio input end, wherein the first audio data is automatically generated by a power-on self-starting process;

a reference sound determination module, configured to determine, when third audio data exists, reference audio data according to the first audio data and the third audio data;

and an echo cancellation module, configured to acquire the reference audio data and perform echo cancellation processing on the second audio data according to the reference audio data.

Optionally, the reference sound determination module is further configured to: when third audio data does not exist, take the first audio data as the reference audio data.

Optionally, the reference sound determination module is further configured to: when the power-on self-starting process is abnormal and the third audio data does not exist, generate fourth audio data and take the fourth audio data as the reference audio data.

Optionally, the first audio data and the fourth audio data are both null data.

Optionally, the echo cancellation module is further configured to: when the power-on self-starting process is abnormal and the third audio data does not exist, determine the data amount of fourth audio data to be acquired according to the following process:

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the echo cancellation method according to the embodiment of the present invention.

To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing an echo cancellation method of an embodiment of the present invention when executed by a processor.

One embodiment of the above invention has the following advantages or benefits: the reference audio data is determined according to the first audio data generated by the power-on self-starting process and the third audio data output by the application program, and echo cancellation is performed on the second audio data collected at the audio input end according to the reference audio data. The first audio data and the third audio data that are output for playing can therefore be used as the reference audio data purely in software, without adding a hardware module or changing hardware wiring, which avoids the increased hardware cost and the difficulty of modifying hardware modules in finished equipment.

Further effects of the above optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

fig. 1 is a schematic diagram of a main flow of an echo cancellation method according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a main flow of an echo cancellation method according to another embodiment of the present invention;

fig. 3 is a schematic diagram of main modules of an echo cancellation device according to an embodiment of the present invention;

fig. 4 is a schematic diagram of main modules of an echo cancellation device according to another embodiment of the present invention;

fig. 5 is a schematic diagram of main modules of an echo cancellation device according to another embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

fig. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of a main flow of an echo cancellation method according to an embodiment of the present invention. As shown in fig. 1, the method includes:

step S101: collecting first audio data and second audio data of an audio input end, wherein the first audio data is automatically generated by a power-on self-starting process;

step S102: when third audio data exists, determining reference audio data according to the first audio data and the third audio data, wherein the third audio data is audio data output by an application program;

step S103: when third audio data does not exist, taking the first audio data as the reference audio data;

step S104: acquiring the reference audio data, and performing echo cancellation processing on the second audio data according to the reference audio data.

For step S101, in this embodiment the power-on self-starting process is configured to automatically generate the first audio data while the device is powered on and to send the first audio data to the audio output end for playing. As a specific example, a power-on self-starting process nulaudiotrack is created; after power-on, the process runs automatically and generates the first audio data. The process nulaudiotrack may be created using AudioTrack (a class that manages and plays a single audio resource). As an example, the first audio data may consist entirely of zero values, and such data may be referred to as null data.
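The notion of null data can be illustrated with a short sketch. It is not the AudioTrack-based process described above; it only shows, in plain Python, what a zero-valued PCM buffer of a given duration looks like. The function name `null_pcm_frame` and the default parameters (16 kHz, mono, signed 16-bit samples) are assumptions made for illustration.

```python
def null_pcm_frame(duration_ms: int, sample_rate: int = 16000,
                   channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Return a zero-valued (silent) signed-PCM buffer of the given duration.

    Hypothetical helper: the parameter defaults are illustrative assumptions.
    """
    bytes_per_sample = bits_per_sample // 8
    frames = sample_rate * duration_ms // 1000   # sample frames in the interval
    return bytes(frames * channels * bytes_per_sample)


frame = null_pcm_frame(20)
print(len(frame))  # 640 bytes: 20 ms of 16 kHz mono 16-bit audio
```

A buffer like this carries no audible signal, yet writing it to the output path still gives the reference path a continuous stream of data.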

In this embodiment, the audio input end may be a recording unit (e.g., a microphone) of a voice recording and playing device (e.g., a smart phone or a smart speaker), and the audio output end is a voice playing unit (e.g., a speaker) of the voice recording and playing device.

For step S102, the third audio data is audio data output by an application program built in the voice recording and playing device. When the application program outputs the third audio data, the third audio data is sent to the audio output end to be played.

When the first audio data is acquired and the third audio data exists, the audio played at the audio output end consists of the first audio data and the third audio data, and this combined audio data is the reference audio data. As a specific example, the audio data played at the audio output end is obtained by mixing the first audio data and the third audio data, for example by linearly superimposing the first audio data and the third audio data.
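Linear superposition can be sketched as a sample-wise sum. The example below assumes signed 16-bit PCM and saturates the sum to the int16 range; the saturation step and the name `mix_linear` are assumptions for illustration and are not specified in the text.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_linear(first: array, third: array) -> array:
    """Sample-wise sum of two 16-bit PCM buffers, saturated to int16.

    A shorter buffer is treated as if padded with zeros (an assumption).
    """
    length = max(len(first), len(third))
    mixed = array("h")
    for i in range(length):
        a = first[i] if i < len(first) else 0
        b = third[i] if i < len(third) else 0
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed


first = array("h", [0, 0, 0, 0])             # null data from the self-starting process
third = array("h", [1000, -2000, 30000, 5])  # application audio
print(mix_linear(first, third).tolist())     # [1000, -2000, 30000, 5]
```

When the first audio data is null data, the mix equals the third audio data, so the reference audio data is exactly what the application plays.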

For step S103, when the first audio data is acquired and the third audio data does not exist, the acquired first audio data is taken as the reference audio data.

As for step S104, the echo cancellation processing on the second audio data means: the reference audio data is eliminated from the collected second audio data, that is, the audio data (sound data played by a loudspeaker) at the audio output end is eliminated from the collected second audio data (sound data collected by a microphone), so that the sound recorded at the audio input end is only the sound of the local user speaking.

While the second audio data is being collected at the audio input end, the dominant source of echo interference is the audio data played at the audio output end. Therefore, taking the audio data played at the audio output end as the reference audio data for echo cancellation processing removes the echo from the second audio data and restores the audio data that is meant to be collected, such as the user's voice signal, thereby greatly improving voice communication quality.
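The text does not name a specific cancellation algorithm. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a commonly used technique for reference-based echo cancellation; the choice of NLMS, the function name `nlms_cancel`, and all parameter values are assumptions, not part of the described method.

```python
import random


def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively filtered copy of `ref` from `mic` (NLMS sketch)."""
    w = [0.0] * taps   # estimated echo-path coefficients
    x = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))   # estimated echo
        e = d - y                                  # echo-reduced output sample
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out


def energy(samples):
    return sum(v * v for v in samples)


# Synthetic check: the "microphone" hears only a delayed, attenuated reference.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * r for r in ref[:-2]]
residual = nlms_cancel(mic, ref)
print(energy(residual[-200:]), energy(mic[-200:]))
```

In this synthetic case the residual energy over the last samples should be far below the microphone energy, because the filter has learned the simulated echo path. With a real microphone signal the local user's speech would remain in the output.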

Fig. 2 is a schematic diagram of a main flow of an echo cancellation method according to another embodiment of the present invention. As shown in fig. 2, the method includes:

step S201: collecting second audio data of the audio input end;

step S202: detecting whether the power-on self-starting process works normally and whether third audio data exists;

step S203: when the power-on self-starting process works normally and third audio data exists, determining reference audio data according to the first audio data generated by the power-on self-starting process and the third audio data, wherein the third audio data is audio data output by an application program;

step S204: when the power-on self-starting process works normally and third audio data does not exist, taking the first audio data generated by the power-on self-starting process as the reference audio data;

step S205: when the power-on self-starting process works abnormally and third audio data exists, taking the third audio data as the reference audio data;

step S206: when the power-on self-starting process works abnormally and third audio data does not exist, generating fourth audio data and taking the fourth audio data as the reference audio data;

step S207: and acquiring the reference audio data, and performing echo cancellation processing on the second audio data according to the reference audio data.

Under certain conditions the power-on self-starting process works abnormally, for example when it is abnormally suspended, and it can then no longer generate the first audio data. Because echo cancellation processing must be supported by reference audio data, in this embodiment, when the power-on self-starting process works abnormally, the third audio data is used as the reference audio data if it exists; if the third audio data does not exist, fourth audio data is automatically generated and used as the reference audio data. The fourth audio data may be null data.

According to the echo cancellation method provided by the embodiment of the invention, when the audio output end is playing audio, the audio data played by the audio output end is used as the reference audio data for echo cancellation processing; when the audio output end is not playing audio, the reference audio data is generated automatically. The process requires no additional hardware module and no change to hardware wiring, which avoids the increased hardware cost and the difficulty of modifying hardware modules in finished equipment.
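The four cases of steps S203 to S206 can be summarized as a small selection function. This is a minimal sketch: the function name `select_reference`, the list-of-integers sample representation, and the assumption that the first and third audio frames have equal length are all illustrative and not taken from the text.

```python
from typing import List, Optional, Sequence


def select_reference(process_ok: bool,
                     first: Optional[Sequence[int]],
                     third: Optional[Sequence[int]],
                     frame_len: int) -> List[int]:
    """Pick the reference frame according to steps S203 to S206 (sketch).

    Assumes `first` is provided whenever `process_ok` is True and that
    `first` and `third` have the same length when both are present.
    """
    has_third = third is not None and len(third) > 0
    if process_ok and has_third:      # S203: mix first and third audio data
        return [a + b for a, b in zip(first, third)]
    if process_ok:                    # S204: first audio data only
        return list(first)
    if has_third:                     # S205: third audio data only
        return list(third)
    return [0] * frame_len            # S206: generated fourth audio data (null data)


print(select_reference(False, None, None, 4))  # [0, 0, 0, 0]
```

In every branch the caller receives a frame, which reflects the requirement that echo cancellation always has reference audio data available.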

As a specific example, for step S202 a cache module may be preset, and both the first audio data generated by the power-on self-starting process and the third audio data output by the application program are written into the cache module. To detect whether audio data exists at the audio output end, it is checked whether audio data has been written into the cache module (the first audio data, the third audio data, or the mix of the two). If so, audio data exists at the audio output end; if not, no audio data exists at the audio output end. Preferably, the detection covers a preset time period. For example, with a preset time period of 1 minute, if no audio data is written into the cache module within 1 minute, it is determined that no audio data exists at the audio output end.

In an optional embodiment, when the power-on self-starting process is abnormal and the third audio data does not exist, the data amount of fourth audio data to be acquired is determined according to the following process:
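The time-window check on the cache module can be sketched as follows. The class name `OutputActivityMonitor`, the injectable clock, and the use of the last write time as the detection criterion are assumptions made for illustration.

```python
import time


class OutputActivityMonitor:
    """Tracks whether audio data was written within a preset time period."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock          # injectable so the check can be tested
        self._last_write = None      # time of the most recent write, if any

    def on_write(self, data: bytes) -> None:
        """Call whenever audio data is written into the cache module."""
        if data:
            self._last_write = self._clock()

    def has_audio(self) -> bool:
        """True if a write happened within the preset time period."""
        if self._last_write is None:
            return False
        return self._clock() - self._last_write <= self._window


monitor = OutputActivityMonitor(window_seconds=60.0)
monitor.on_write(bytes(640))   # null data still counts as written audio data
print(monitor.has_audio())
```

Note that a zero-valued buffer is still a write, so the monitor reports audio as present while the power-on self-starting process is healthy.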

determining a current time and a time when fourth audio data is first generated; wherein, the current time is the time of the current request for reading the reference audio data;

Specifically, the data amount of the fourth audio data expected to be acquired may be determined according to the following formula (1):

D = (currenttime - starttime) × samplerate × channels × format / 8    (1)

wherein D represents the data amount of the fourth audio data expected to be acquired, currenttime represents the current time, starttime represents the time when the fourth audio data was first generated, samplerate represents the sampling rate, channels represents the number of channels, and format represents the sampling precision in bits, so that format / 8 is the number of bytes per sample.

The data amount of the fourth audio data to be acquired is then determined according to the following formula (2):

L = D - M    (2)

wherein L represents the data amount of the fourth audio data to be acquired, and M represents the data amount of the fourth audio data that has already been acquired before the current time.

In this embodiment, the current time and the time when the fourth audio data was first generated are both obtained from a clock module built into the voice recording and playing device. Because the clock module may introduce timing errors during operation, the data amount to be acquired is not computed from the interval since the previous request. Instead, the total data amount that should have been acquired between the first generation of the fourth audio data and the current time is calculated first, and the data amount already acquired is then subtracted to obtain the data amount for the current request. In this way the timing error of individual requests does not accumulate, and the result is more accurate.
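The two formulas can be worked through in a short sketch. It assumes that times are integer milliseconds and that the sampling precision is given in bits; the function names and the default parameters (16 kHz, mono, 16-bit) are illustrative assumptions.

```python
def expected_bytes(current_ms: int, start_ms: int, sample_rate: int,
                   channels: int, bits_per_sample: int) -> int:
    """Formula (1): total bytes expected since the fourth audio data first started."""
    frames = (current_ms - start_ms) * sample_rate // 1000
    return frames * channels * bits_per_sample // 8


def bytes_to_acquire(current_ms: int, start_ms: int, acquired: int,
                     sample_rate: int = 16000, channels: int = 1,
                     bits_per_sample: int = 16) -> int:
    """Formula (2): L = D - M, clamped at zero (the clamp is an assumption)."""
    d = expected_bytes(current_ms, start_ms, sample_rate, channels, bits_per_sample)
    return max(0, d - acquired)


acquired = 0
for now_ms in (20, 41, 60):          # the second request arrives 1 ms late
    n = bytes_to_acquire(now_ms, 0, acquired)
    acquired += n
    print(now_ms, n, acquired)
# 20 640 640
# 41 672 1312
# 60 608 1920
```

The late second request receives a slightly larger amount and the third a slightly smaller one, so the running total after 60 ms is 1920 bytes, exactly 60 ms of audio. Computing each request from the cumulative expectation keeps jitter from accumulating.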

Fig. 3 is a schematic diagram of main modules of an echo cancellation device 300 according to an embodiment of the present invention. As shown in fig. 3, the device 300 includes:

the acquisition module 301 is configured to acquire first audio data and second audio data at an audio input end, where the first audio data is automatically generated by a power-on self-starting process;

a reference sound determining module 302, configured to determine, in a case that third audio data exists, reference audio data according to the first audio data and the third audio data, where the third audio data is audio data output by an application;

and an echo cancellation module 303, configured to obtain the reference audio data, and perform echo cancellation processing on the second audio data according to the reference audio data.

Optionally, the reference sound determining module 302 is further configured to: when third audio data does not exist, take the first audio data as the reference audio data.

Optionally, the first audio data and the fourth audio data are both null data.

Optionally, the echo cancellation module 303 is further configured to: when the power-on self-starting process is abnormal and the third audio data does not exist, determine the data amount of the fourth audio data to be acquired according to the following process:

The echo cancellation device of the embodiment of the invention determines the reference audio data according to the first audio data generated by the power-on self-starting process and the third audio data output by the application program, and performs echo cancellation processing on the second audio data collected at the audio input end according to the reference audio data. The first audio data and the third audio data that are output for playing can therefore be used as the reference audio data purely in software, without adding a hardware module or changing hardware wiring, which avoids the increased hardware cost and the difficulty of modifying hardware modules in finished equipment.

Fig. 4 is a schematic structural diagram of an echo cancellation device according to another embodiment of the present invention. As shown in fig. 4, the echo cancellation device uses the audio data played by the loudspeaker at the audio output end as the reference audio data of the reference sound determining module 402, so that the echo cancellation module 403 cancels the echo in the second audio data collected by the microphone and restores the audio data that is meant to be collected, such as the user's voice signal, thereby greatly improving voice communication quality.

Fig. 5 is a schematic structural diagram of an echo cancellation device according to another embodiment of the present invention. As shown in fig. 5, the power-on self-starting process generates the first audio data while the device is powered on, the application audio output unit represents an application program outputting the third audio data, and the mixer mixes the first audio data and the third audio data. When the first audio data and the third audio data exist, they are sent to the reference sound determining module 502 at the same time as they are sent to the loudspeaker for playing. When the power-on self-starting process is abnormal and the third audio data does not exist, the reference sound determining module 502 generates fourth audio data and takes the fourth audio data as the reference audio data. The echo cancellation module 503 acts as a consumer and obtains the reference audio data from the reference sound determining module EchoAudio.

According to the echo cancellation device provided by the embodiment of the invention, when the loudspeaker plays audio, the audio data played by the loudspeaker is used as the reference audio data of the reference sound determination module 502, when the loudspeaker does not play audio, fourth audio data is generated and is used as the reference audio data of the reference sound determination module 502, and the echo cancellation module 503 is used as a consumer to acquire the reference audio data from the reference sound determination module 502 so as to cancel the echo in the second audio data acquired by the microphone, and restore the audio data which is desired to be acquired, such as the sound signal of the user, so that the voice communication quality is greatly improved. In addition, the process does not need to increase hardware modules and change hardware wiring, and the problems that the hardware cost is increased and finished equipment is difficult to modify the hardware modules are solved.
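The producer and consumer relationship between the playback path and the echo cancellation module can be sketched with a small thread-safe buffer. The class name `ReferenceBuffer` and the behavior of padding a short read with zero bytes (standing in for generated fourth audio data) are assumptions for illustration and are not taken from the text.

```python
import threading


class ReferenceBuffer:
    """Holds played-out PCM bytes until the echo canceller consumes them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = bytearray()

    def write(self, pcm: bytes) -> None:
        """Producer side: called with the data that is sent to the loudspeaker."""
        with self._lock:
            self._data.extend(pcm)

    def read(self, size: int) -> bytes:
        """Consumer side: return exactly `size` bytes, zero-padded if short."""
        with self._lock:
            chunk = bytes(self._data[:size])
            del self._data[:size]
        return chunk + bytes(size - len(chunk))   # pad with null data


buf = ReferenceBuffer()
buf.write(b"\x01\x02\x03")
print(buf.read(2))  # b'\x01\x02'
print(buf.read(4))  # b'\x03\x00\x00\x00'
```

Because `read` always returns the requested number of bytes, the consumer never stalls when nothing is being played.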

The device can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.

Fig. 6 shows an exemplary system architecture 600 to which the echo cancellation method or echo cancellation device of embodiments of the present invention may be applied.

As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves as a medium providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 601, 602, and 603.

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 605 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 601, 602, and 603. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.

It should be noted that the echo cancellation method provided in the embodiment of the present invention is generally executed by the server 605, and accordingly, the echo cancellation apparatus is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a sending module, an obtaining module, a determining module, and a first processing module. In some cases, the names of these modules do not constitute a limitation on the modules themselves; for example, the sending module may also be described as a "module that sends a picture acquisition request to a connected server".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:

determining reference audio data according to the first audio data and the third audio data under the condition that third audio data exists, wherein the third audio data is audio data output by an application program;

The technical scheme of the embodiment of the invention can eliminate the echo mixed into the audio data collected by the audio input end and improve the voice communication quality. The process requires neither an additional hardware module nor changes to hardware wiring, thereby avoiding increased hardware cost and the difficulty of modifying hardware modules in finished equipment.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An echo cancellation method, comprising:

acquiring the reference audio data, and performing echo cancellation processing on the second audio data according to the reference audio data;

generating fourth audio data under the condition that the starting-up self-starting process is abnormal and the third audio data does not exist, wherein the fourth audio data is used as reference audio data;

under the condition that the starting-up self-starting process is abnormal and the third audio data does not exist, determining the data volume of the fourth audio data to be acquired according to the following process:

determining the data volume of the fourth audio data to be acquired according to the data volume of the fourth audio data expected to be acquired and the data volume of the fourth audio data acquired before the current time;

the data amount of the expected acquired fourth audio data is determined as:

D＝(currenttime–starttime)×simplerate×channels×format/8

wherein D represents the data volume of the fourth audio data which is expected to be acquired, currenttime represents the current time, starttime represents the time of first generating the fourth audio data, simplerate represents the sampling rate, channels represents the number of channels, and format represents the sampling precision;

the data amount of the fourth audio data to be acquired is determined as:

L＝D-M
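The data-volume calculation recited in claim 1 can be illustrated with a minimal Python sketch. It assumes that times are given in seconds, that format is the sampling precision in bits (which the division by 8 suggests), and that M is the data volume, in bytes, of the fourth audio data already acquired before the current time. The function and parameter names are hypothetical and are not part of the claimed method.

```python
def expected_bytes(current_time: float, start_time: float,
                   sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """D = (currenttime - starttime) * simplerate * channels * format / 8.

    Assumption: times are in seconds and bits_per_sample is the sampling
    precision in bits, so the result is a byte count.
    """
    # Round down to whole sample frames so the result stays frame-aligned
    # (a design choice of this sketch, not stated in the claims).
    frames = int((current_time - start_time) * sample_rate)
    return frames * channels * bits_per_sample // 8


def bytes_to_acquire(expected: int, already_acquired: int) -> int:
    """L = D - M, where M (assumed) is the number of bytes acquired so far."""
    return expected - already_acquired


# Example: 2 s elapsed, 16 kHz, mono, 16-bit samples, 60000 bytes already acquired.
d = expected_bytes(12.0, 10.0, 16000, 1, 16)
l = bytes_to_acquire(d, 60000)
print(d, l)  # 64000 4000
```

Deriving D from the elapsed time, rather than accumulating fixed-size chunks, keeps the generated reference data in step with the clock even if individual acquisitions are delayed.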

2. The method of claim 1, further comprising:

taking the first audio data as the reference audio data in the absence of the third audio data.

3. The method of claim 2, wherein the first audio data and the fourth audio data are both null data.
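The choice of reference audio data described in claims 1 to 3 could be sketched in Python as follows. This is only an illustration under several assumptions that the claims do not specify: audio is 16-bit signed PCM in native byte order, "null data" means zero-valued samples, the first and third audio data are combined by sample-wise addition with clipping, and the third audio data is used alone when the starting-up self-starting process is abnormal. All names are hypothetical, and `None` stands for audio data that does not exist.

```python
import array
from typing import Optional


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Sample-wise sum of two 16-bit PCM buffers, clipped to the int16 range.

    Assumption: the claims do not say how the first and third audio data
    are combined; simple additive mixing is used here for illustration.
    """
    x = array.array("h")
    x.frombytes(a)
    y = array.array("h")
    y.frombytes(b)
    mixed = array.array("h", (max(-32768, min(32767, p + q)) for p, q in zip(x, y)))
    return mixed.tobytes()


def select_reference(first: Optional[bytes], third: Optional[bytes],
                     fourth_len: int) -> bytes:
    """Pick the reference audio data for echo cancellation.

    first:      audio from the starting-up self-starting process
                (None if that process is abnormal)
    third:      audio output by an application program (None if absent)
    fourth_len: number of bytes of fourth audio data to generate if needed
    """
    if third is not None:
        # Assumed behaviour when the self-starting process is abnormal.
        return third if first is None else mix_pcm16(first, third)
    if first is not None:
        return first  # claim 2: first audio data is the reference
    return bytes(fourth_len)  # fourth audio data, generated as zero-valued null data


silence = bytes(4)                             # two zero-valued samples
app_audio = array.array("h", [100, -200]).tobytes()
assert select_reference(silence, app_audio, 4) == app_audio
assert select_reference(None, None, 6) == bytes(6)
```

Because the first audio data is null data, mixing it with the third audio data leaves the application's output unchanged, so the reference tracks what the speaker actually plays.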

4. An echo cancellation device, comprising:

the echo eliminating module is used for acquiring the reference audio data and carrying out echo eliminating processing on the second audio data according to the reference audio data;

the reference tone determination module is further configured to: generate fourth audio data under the condition that the starting-up self-starting process is abnormal and the third audio data does not exist, and take the fourth audio data as reference audio data;

the echo cancellation module is further configured to: under the condition that the starting-up self-starting process is abnormal and the third audio data does not exist, determine the data volume of the fourth audio data to be acquired according to the following process:

the data amount of the expected acquired fourth audio data is determined as:

D＝(currenttime–starttime)×simplerate×channels×format/8

the data amount of the fourth audio data to be acquired is determined as:

L＝D-M

5. The apparatus of claim 4, wherein the reference tone determination module is further configured to: take the first audio data as the reference audio data in the absence of the third audio data.

6. The apparatus of claim 5, wherein the first audio data and the fourth audio data are both null data.

7. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.

8. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-3.