CN115706755A - Echo cancellation method, electronic device, and storage medium - Google Patents


Info

Publication number: CN115706755A
Authority: CN (China)
Prior art keywords: equipment, echo, audio, ultrasonic wave, time delay
Legal status: Pending
Application number: CN202110902994.6A
Other languages: Chinese (zh)
Inventors: 丁浩, 钟小飞, 李刚, 张斌
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202110902994.6A
Publication of CN115706755A


Abstract

Embodiments of the present application provide an echo cancellation method, an electronic device, and a storage medium, relating to the field of communication technologies. The method includes: transmitting a first ultrasonic wave to a second device using a speaker; receiving, in a first transmission mode, first data sent by the second device, where the first data includes the first ultrasonic wave acquired by the second device; calculating, according to the first ultrasonic wave and the first data, a first echo time delay of an audio loop corresponding to the first device and the second device, where the audio loop includes the first transmission mode; and performing echo cancellation, according to the first echo time delay, on the audio loop in which the first device and the second device are located. The method can estimate the echo time delay more accurately and therefore cancel the echo more effectively.

Description

Echo cancellation method, electronic device, and storage medium
Technical Field
Embodiments of the present application relate to the field of communication technologies, and in particular, to an echo cancellation method, an electronic device, and a storage medium.
Background
In everyday remote communication, a point-to-point two-party call typically uses a single pickup device such as a mobile phone or tablet. A single device, however, has a limited pickup distance and capability: once the speaker is far away, the pickup quality drops noticeably. A multi-person conference call therefore usually requires a dedicated conference room fitted with a professional conference multimedia system, such as an "octopus" speakerphone, together with matching microphones, to ensure the voice quality of the call.
With the rapid development of electronic and information technology, electronic devices such as mobile phones and tablet computers have become widespread. A microphone-equipped device such as a phone or tablet can therefore serve as a distributed pickup device. Combining several distributed pickup devices into a joint microphone can effectively improve pickup at different positions in a space, while removing the cabling constraints of wired microphones and the high cost of a multimedia system, greatly improving the pickup experience in existing conference scenarios.
In double talk, however, acoustic echo is generated. For example, when the far-end speaker's voice reaches the near-end device over the network, the near-end loudspeaker plays it; because the call is real-time duplex, the audio played by the near-end loudspeaker is picked up again by the near-end microphone. If acoustic echo cancellation is not performed, that audio is transmitted back to the far end over the network, so the far-end user hears an echo of his or her own speech, which seriously degrades the voice-call experience.
Disclosure of Invention
Embodiments of the present application provide an echo cancellation method, an electronic device, and a storage medium, so as to provide a way of calculating the echo time delay between distributed devices and thereby implement echo cancellation between them.
In a first aspect, an embodiment of the present application provides an echo cancellation method, applied to a first device that includes a speaker, the method including:
transmitting a first ultrasonic wave to a second device using the speaker, where the first device may be a device with a loudspeaker, such as a mobile phone or tablet, and the second device is an electronic device having a microphone;
receiving, in a first transmission mode, first data sent by the second device, where the first data includes the first ultrasonic wave acquired by the second device; the first data may take the form of ultrasonic audio data, for example audio frames, and the first transmission mode may be a wireless transmission mode such as WIFI, a mobile network, or Bluetooth;
calculating, according to the first ultrasonic wave and the first data, a first echo time delay of an audio loop corresponding to the first device and the second device, where the audio loop includes the first transmission mode; the echo delay may be calculated using an existing AEC algorithm; and
performing echo cancellation, according to the first echo time delay, on the audio loop in which the first device and the second device are located.
In this embodiment of the application, the echo time delay between the central device and a distributed device is calculated and the echo is dynamically cancelled according to that delay, which improves the accuracy of the delay calculation, cancels the echo effectively, and improves the user experience.
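The delay calculation itself is not spelled out here; as a minimal illustrative sketch (not the patented implementation), a brute-force cross-correlation between the transmitted ultrasonic reference and the copy returned by the second device could take the lag that maximizes the correlation as the echo delay:

```python
def estimate_echo_delay(reference, captured, sample_rate):
    """Estimate the echo delay between the ultrasonic reference played by
    the first device and the copy of it captured and returned by the
    second device.

    Brute-force cross-correlation: the lag that maximizes the correlation
    between the two signals is taken as the echo delay.
    Returns (delay_in_samples, delay_in_seconds).
    """
    best_lag, best_score = 0, float("-inf")
    max_lag = max(len(captured) - len(reference), 0)
    for lag in range(max_lag + 1):
        score = sum(r * captured[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_lag / sample_rate
```

For real signals, an FFT-based correlation would replace the O(N·M) loop; the sketch only illustrates the lag-search principle.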
In one possible implementation, the first device further includes a microphone, and the method further includes:
collecting the first ultrasonic wave using the microphone;
calculating, based on the collected first ultrasonic wave, a second echo time delay corresponding to the first device; and
performing echo cancellation on the first device based on the second echo time delay.
In this embodiment of the application, the central device can also produce an echo; by collecting the first ultrasonic wave played by its own loudspeaker, the central device calculates its own echo time delay and can thereby cancel its own echo, improving the user experience.
In one possible implementation, the method further includes:
updating the first echo time delay of the audio loop corresponding to the first device and the second device.
In this embodiment of the application, the echo time delay between the central device and the distributed device is updated dynamically, so the delay can be calculated, and the echo cancelled, more accurately, improving the user experience.
In one possible implementation, the updating of the first echo time delay of the audio loop corresponding to the first device and the second device includes:
when it is detected that at least one of the first device and the second device has moved, sending a second ultrasonic wave to the second device; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received second data sent by the second device, where the second data includes the second ultrasonic wave acquired by the second device.
In this embodiment of the application, the echo time delay can be effectively updated for the scenario in which the central device or a distributed device moves, so the echo can still be cancelled effectively.
In one possible implementation, the first ultrasonic wave has a preset first frequency, the second ultrasonic wave has a preset second frequency, and the two preset frequencies are different.
In this embodiment of the application, by distinguishing the frequencies of the two ultrasonic waves, the distributed device can effectively recognize a delay-update scenario and feed back ultrasonic audio data accordingly, so the echo can be cancelled effectively.
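The document does not fix concrete frequencies or a detection method; as an illustrative sketch only (the 20 kHz/21 kHz values, the 48 kHz sample rate, and the Goertzel-based detector are all assumptions), a receiving device could classify which probe a frame carries by comparing the signal power at the two preset frequencies:

```python
import math

SAMPLE_RATE = 48000          # assumed capture rate
FREQ_PROBE_FIRST = 20000.0   # assumed frequency of the first ultrasonic wave
FREQ_PROBE_SECOND = 21000.0  # assumed frequency of the second ultrasonic wave

def probe_tone(freq_hz, duration_s=0.05):
    """Generate a pure probe tone at the given frequency."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def goertzel_power(frame, freq_hz):
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def classify_probe(frame):
    """Return 1 if the frame carries the first probe, 2 for the second."""
    p_first = goertzel_power(frame, FREQ_PROBE_FIRST)
    p_second = goertzel_power(frame, FREQ_PROBE_SECOND)
    return 1 if p_first >= p_second else 2
```

The Goertzel recurrence evaluates the power of one frequency bin in a single pass, which is cheaper than a full FFT when only two candidate frequencies need to be checked.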
In one possible implementation, the second ultrasonic wave is superimposed on user speech collected by the first device and then sent to the second device.
In this embodiment of the application, sending the ultrasonic wave superimposed on the user's speech improves the transmission efficiency of the ultrasonic wave.
In one possible implementation, the updating of the first echo time delay of the audio loop corresponding to the first device and the second device includes:
sending the first ultrasonic wave to the second device on a preset period; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In this embodiment of the application, by sending the ultrasonic wave periodically, the central device can periodically update the echo time delay between itself and the distributed devices, so the delay is calculated, and the echo cancelled, more effectively.
In one possible implementation, the updating of the first echo time delay of the audio loop corresponding to the first device and the second device includes:
when it is detected that the first device is in a silent state, sending the first ultrasonic wave to the second device; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In this embodiment of the application, the ultrasonic wave is sent while the central device is silent (for example, during a conference break or when no one is speaking), so the echo time delay can be updated while improving the transmission efficiency of the ultrasonic wave and avoiding the device-resource cost of sending it at other times.
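How the silent state is detected is left open here; a minimal sketch (the energy threshold and frame-based criterion are assumptions, not the patented detector) could gate the probe on the energy of recently captured audio frames:

```python
def is_silent(frame, threshold=1e-4):
    """Energy-based silence test on one captured audio frame."""
    if not frame:
        return True
    energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
    return energy < threshold

def maybe_send_probe(recent_frames, send_probe):
    """Trigger the ultrasonic probe only when every recent frame is
    silent, so the delay measurement does not overlap live speech."""
    if recent_frames and all(is_silent(f) for f in recent_frames):
        send_probe()
        return True
    return False
```

Requiring several consecutive silent frames, rather than one, avoids firing the probe in a brief pause between words.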
In one possible implementation, the updating of the first echo time delay of the audio loop corresponding to the first device and the second device includes:
sending the first ultrasonic wave to the second device based on a detected preset event; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In this embodiment of the application, the echo time delay can be recalculated for events such as a new distributed device coming online, a conference restarting, or a conference being interrupted, so the delay is updated for those events and the echo can be cancelled more accurately based on the updated delay.
In one possible implementation, the first data further includes a device number corresponding to the second device, and the device number is used to identify the second device.
In this embodiment of the application, carrying the device number in the data identifies the corresponding distributed device, so echo can be cancelled specifically for that device rather than for all distributed devices, improving the efficiency of echo cancellation.
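The wire format of the first data is not specified; a hypothetical sketch (the header layout, field widths, and 16-bit PCM encoding are all assumptions) that tags each ultrasonic audio frame with the sender's device number could look like:

```python
import struct

def pack_frame(device_number, sequence, samples):
    """Serialize one ultrasonic audio frame: a 4-byte device number,
    a 4-byte sequence counter, then little-endian 16-bit PCM samples."""
    header = struct.pack("<II", device_number, sequence)
    body = struct.pack("<%dh" % len(samples), *samples)
    return header + body

def unpack_frame(payload):
    """Recover (device_number, sequence, samples) from a packed frame."""
    device_number, sequence = struct.unpack_from("<II", payload, 0)
    count = (len(payload) - 8) // 2
    samples = list(struct.unpack_from("<%dh" % count, payload, 8))
    return device_number, sequence, samples
```

The sequence counter is an added convenience for detecting dropped frames over the wireless channel; only the device number is called for in the text above.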
In one possible implementation, the first transmission mode includes a wireless network transmission mode such as WIFI, a mobile network, or Bluetooth.
In this embodiment of the application, the ultrasonic audio data is sent over a wireless network such as WIFI, a mobile network, or Bluetooth, so a distributed device can send the ultrasonic audio data to the central device quickly, enabling the central device to calculate the echo time delay quickly.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first transmission mode includes a second channel, and the first channel and the second channel are consistent.
In this embodiment of the application, keeping the channel on which echo is cancelled consistent with the channel over which the ultrasonic audio data is transmitted ensures that each echo-time-delay calculation is consistent, which in turn ensures the accuracy of echo cancellation and improves the user experience.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first device has a third channel for processing the first ultrasonic wave and the first data, and the first channel and the third channel are consistent.
In this embodiment of the application, keeping the processing channel inside the central device consistent with the channel on which echo cancellation is to be performed avoids delay errors introduced by different processing channels within the central device, ensuring the accuracy of the echo-time-delay calculation and therefore of the echo cancellation.
In a second aspect, an embodiment of the present application provides an echo cancellation apparatus, applied to a first device that includes a speaker, the apparatus including:
a transmitting module, configured to transmit a first ultrasonic wave to a second device using the speaker;
a receiving module, configured to receive, in a first transmission mode, first data sent by the second device, where the first data includes the first ultrasonic wave acquired by the second device;
a computing module, configured to calculate, according to the first ultrasonic wave and the first data, a first echo time delay of an audio loop corresponding to the first device and the second device, where the audio loop includes the first transmission mode; and
a first cancellation module, configured to perform echo cancellation, according to the first echo time delay, on the audio loop in which the first device and the second device are located.
In one possible implementation, the first device further includes a microphone, and the apparatus further includes:
a second cancellation module, configured to collect the first ultrasonic wave using the microphone, calculate, based on the collected first ultrasonic wave, a second echo time delay corresponding to the first device, and perform echo cancellation on the first device based on the second echo time delay.
In one possible implementation, the apparatus further includes:
an updating module, configured to update the first echo time delay of the audio loop corresponding to the first device and the second device.
In one possible implementation, the updating module is further configured to:
when it is detected that at least one of the first device and the second device has moved, send a second ultrasonic wave to the second device; and
update the first echo time delay of the audio loop corresponding to the first device and the second device based on received second data sent by the second device, where the second data includes the second ultrasonic wave acquired by the second device.
In one possible implementation, the first ultrasonic wave has a preset first frequency, the second ultrasonic wave has a preset second frequency, and the two preset frequencies are different.
In one possible implementation, the second ultrasonic wave is superimposed on user speech collected by the first device and then sent to the second device.
In one possible implementation, the updating module is further configured to:
send the first ultrasonic wave to the second device on a preset period; and
update the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, the updating module is further configured to:
when it is detected that the first device is in a silent state, send the first ultrasonic wave to the second device; and
update the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, the updating module is further configured to:
send the first ultrasonic wave to the second device based on a detected preset event; and
update the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, the first data further includes a device number corresponding to the second device, and the device number is used to identify the second device.
In one possible implementation, the first transmission mode includes a wireless network transmission mode such as WIFI, a mobile network, or Bluetooth.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first transmission mode includes a second channel, and the first channel and the second channel are consistent.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first device has a third channel for processing the first ultrasonic wave and the first data, and the first channel and the third channel are consistent.
In a third aspect, an embodiment of the present application provides a first device, including:
a memory for storing computer program code, the computer program code including instructions that, when read from the memory, cause the first device to perform the following steps:
transmitting a first ultrasonic wave to a second device using a speaker;
receiving, in a first transmission mode, first data sent by the second device, where the first data includes the first ultrasonic wave acquired by the second device;
calculating, according to the first ultrasonic wave and the first data, a first echo time delay of an audio loop corresponding to the first device and the second device, where the audio loop includes the first transmission mode; and
performing echo cancellation, according to the first echo time delay, on the audio loop in which the first device and the second device are located.
In one possible implementation, the first device further includes a microphone, and when the instructions are executed by the first device, the first device further performs the following steps:
collecting the first ultrasonic wave using the microphone;
calculating, based on the collected first ultrasonic wave, a second echo time delay corresponding to the first device; and
performing echo cancellation on the first device based on the second echo time delay.
In one possible implementation, when the instructions are executed by the first device, the first device further performs the following step:
updating the first echo time delay of the audio loop corresponding to the first device and the second device.
In one possible implementation, when the instructions are executed by the first device, the step of updating the first echo time delay of the audio loop corresponding to the first device and the second device includes:
when it is detected that at least one of the first device and the second device has moved, sending a second ultrasonic wave to the second device; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received second data sent by the second device, where the second data includes the second ultrasonic wave acquired by the second device.
In one possible implementation, the first ultrasonic wave has a preset first frequency, the second ultrasonic wave has a preset second frequency, and the two preset frequencies are different.
In one possible implementation, the second ultrasonic wave is superimposed on user speech collected by the first device and then sent to the second device.
In one possible implementation, when the instructions are executed by the first device, the step of updating the first echo time delay of the audio loop corresponding to the first device and the second device includes:
sending the first ultrasonic wave to the second device on a preset period; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, when the instructions are executed by the first device, the step of updating the first echo time delay of the audio loop corresponding to the first device and the second device includes:
when it is detected that the first device is in a silent state, sending the first ultrasonic wave to the second device; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, when the instructions are executed by the first device, the step of updating the first echo time delay of the audio loop corresponding to the first device and the second device includes:
sending the first ultrasonic wave to the second device based on a detected preset event; and
updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received first data sent by the second device.
In one possible implementation, the first data further includes a device number corresponding to the second device, and the device number is used to identify the second device.
In one possible implementation, the first transmission mode includes a wireless network transmission mode such as WIFI, a mobile network, or Bluetooth.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first transmission mode includes a second channel, and the first channel and the second channel are consistent.
In one possible implementation, the audio loop between the first device and the second device includes a first channel on which echo cancellation is to be performed, the first device has a third channel for processing the first ultrasonic wave and the first data, and the first channel and the third channel are kept consistent.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program stored therein which, when run on a computer, causes the computer to perform the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program which, when executed by a computer, performs the method according to the first aspect.
In a possible design, the program of the fifth aspect may be stored in whole or in part on a storage medium packaged with the processor, or in whole or in part on a memory not packaged with the processor.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic system architecture diagram of an electronic device according to an embodiment of the present application;
fig. 3 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart illustrating an embodiment of an echo cancellation method provided in the present application;
FIGS. 5a and 5b are schematic waveforms of ultrasonic waves provided by embodiments of the present application;
fig. 6 is a schematic diagram of echo delay calculation according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an embodiment of an echo cancellation device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the description of the embodiments, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone.
In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features. A feature defined as "first" or "second" may thus explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality of" means two or more unless otherwise specified.
In everyday remote communication, a point-to-point two-party call typically uses a single pickup device such as a mobile phone or tablet. A single device, however, has a limited pickup distance and capability: once the speaker is far away, the pickup quality drops noticeably. A multi-person conference call therefore usually requires a dedicated conference room fitted with a professional conference multimedia system, such as a conference "octopus" speakerphone, together with matching microphones, to ensure the voice quality of the call.
With the rapid development of electronic and information technology, electronic devices such as mobile phones and tablet computers have become widespread. A microphone-equipped device such as a phone or tablet can therefore serve as a distributed pickup device. Several distributed pickup devices combined can serve as a joint microphone, effectively improving pickup at different positions in a space, while removing the cabling constraints of wired microphones and the high cost of a multimedia system, greatly improving the pickup experience in existing conference scenarios.
In double talk, however, acoustic echo is generated. For example, when the far-end speaker's voice reaches the near-end device over the network, the near-end loudspeaker plays it; because the call is real-time duplex, the audio played by the near-end loudspeaker is picked up again by the near-end microphone. If acoustic echo cancellation is not performed, that audio is transmitted back to the far end over the network, so the far-end user hears an echo of his or her own speech, which seriously degrades the voice-call experience.
One existing echo cancellation approach targets a single device, that is, the microphone and the loudspeaker are in the same device. In a distributed joint microphone, however, the distributed microphones serving as voice input devices are scattered across spatial positions, while the loudspeaker playing the sound may be a speaker box, a large screen, or the like, whose position is not fixed. The distance between microphone and loudspeaker therefore cannot be determined, making the echo-time-delay estimate inaccurate. In addition, because the microphone devices joining the distributed microphone array to collect audio may be of different types, such as phones or tablets, the echo time delay of different devices cannot be estimated accurately. Echo cancellation depends on the echo time delay, so when the delay cannot be estimated accurately the echo cannot be cancelled effectively, degrading the user experience.
To solve the above problem, an embodiment of the present application provides an echo cancellation method applied to the electronic device 100. The electronic device 100 may be an electronic device having a speaker and a microphone, such as a mobile phone or a tablet. It should be understood that the embodiment of the present application does not limit the type of the electronic device 100; in some embodiments, the electronic device 100 may also be another type of electronic device.
Fig. 1 shows an application scenario of the echo cancellation method. As shown in fig. 1, the scenario includes a first device 10, a second device 20, and a second device 21, each of which may be the electronic device 100. It should be understood that the scenario shows three electronic devices 100 only by way of example and does not limit the embodiment of the present application; in some embodiments, the scenario may include more or fewer electronic devices 100.
In the application scenario shown in fig. 1, the first device 10, the second device 20, and the second device 21 may form a distributed joint microphone network. Taking as an example the first device 10 cancelling the echo of the audio loop between the first device 10 and the second device 20: the second device 20 has a microphone; the first device 10 sends an ultrasonic wave to the second device 20 through its speaker; the second device 20 collects the ultrasonic wave through its microphone and sends the collected ultrasonic wave to the first device 10 over the WIFI channel 1101. The first device 10 can then calculate the echo delay from the sent and received ultrasonic waves and cancel the echo of the audio loop between the first device 10 and the second device 20 according to that delay. It should be understood that the WIFI channel 1101 is merely illustrative and does not limit the embodiments of the present application; in some embodiments, another wireless channel may be used.
Next, a system block diagram of the electronic device 100 is described with reference to fig. 2. As shown in fig. 2, the electronic device 100 may include an application layer 200, a driver layer 300, and a physical layer 400.
The application layer 200 provides different applications, which may be common functional applications or customized functional applications. In the embodiment of the present application, a preset ultrasonic audio may be transmitted through such an application.
The driver layer 300 includes an Automatic Echo Cancellation (AEC) module 310, a driver module 320, and a delay estimation module 330. The delay estimation module 330 is configured to estimate the echo delay, and the AEC module 310 is configured to perform echo cancellation based on the echo delay. It is understood that the echo cancellation itself may use an existing AEC algorithm and is not described further here. The driver module 320 includes a speaker driver 321 and a microphone driver 322. The speaker driver 321 is configured to receive audio data and play it through a speaker; the microphone driver 322 is configured to receive the audio played by the speaker and convert it into audio data.
It should be noted that the data channel through which the delay estimation module 330 calculates the echo delay is the same as the data channel through which the AEC module 310 performs echo cancellation; that is, the two use the same thread or the same process. This keeps the system delays as consistent as possible, so that the delays of the data channels are the same. In a particular implementation, the delay estimation module 330 may be placed inside the AEC module 310, which guarantees the consistency of the delay path.
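To illustrate how the AEC module might use the delay produced by the delay estimation module, the sketch below shifts the far-end reference signal by the estimated echo delay so that it lines up with the echo component in the microphone signal, then subtracts it. This is a hedged simplification: a real AEC additionally runs an adaptive filter (e.g. NLMS) to model the echo path, and the names here are illustrative, not from the patent.

```python
def cancel_echo(mic, reference, delay, echo_gain=1.0):
    """Subtract a delayed, scaled copy of `reference` from `mic`."""
    out = []
    for n, m in enumerate(mic):
        idx = n - delay
        # Only subtract where the delayed reference actually overlaps.
        echo = echo_gain * reference[idx] if 0 <= idx < len(reference) else 0.0
        out.append(m - echo)
    return out

# Toy loop: the far-end audio reappears in the mic 3 samples later at half
# amplitude; the near-end signal is silence for simplicity.
far_end = [1.0, 2.0, 3.0, 4.0]
mic = [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
clean = cancel_echo(mic, far_end, delay=3, echo_gain=0.5)
# With an accurate delay estimate, `clean` is (near) zero everywhere.
```

The point of the patent's design choice is visible here: if the delay fed into this step were estimated over a different data path with a different latency, the subtraction would be misaligned and the residual echo would remain.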
The physical layer 400 is used to provide different types of communication interfaces, which may be interfaces between two electronic devices 100, and which may be wireless communication interfaces such as WIFI, bluetooth, etc.
An exemplary electronic device provided in the following embodiments of the present application is first described below with reference to fig. 3. Fig. 3 shows a schematic structural diagram of an electronic device 100, and the electronic device 100 may be the first device 10, the second device 20, or the second device 21.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller can generate operation control signals according to the instruction opcode and timing signals, thereby controlling instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or cycled through. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thereby improves system efficiency.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of answering a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
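As a small illustration of the PCM idea mentioned above (sampling, quantizing, and encoding an analog signal), the sketch below quantizes normalized samples to signed integer codes. The bit depth and values are examples only, not parameters from this document.

```python
def pcm_quantize(sample, bits=8):
    """Quantize a sample in [-1.0, 1.0] to a signed integer PCM code."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit PCM
    clipped = max(-1.0, min(1.0, sample)) # clip out-of-range input
    return round(clipped * levels)

# 1.2 is clipped to 1.0 before encoding.
codes = [pcm_quantize(s) for s in (0.0, 0.5, -1.0, 1.2)]
```

Real PCM audio pipelines commonly use 16- or 24-bit codes; the principle is the same.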
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel forms. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through a UART interface to implement a Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface, so as to implement the function of playing music through a Bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a display screen serial interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured to carry control signals or data signals. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. It can also be used to connect earphones and play audio through them, and to connect other electronic devices, such as AR devices.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative and is not limited to the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In other embodiments, the power management module 141 may be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light is transmitted to the camera's photosensitive element through the lens, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, converting it into a visible image. The ISP can also optimize the noise, brightness, and skin color of the image through algorithms, and can optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or the like on the frequency bin energy.
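The frequency-bin operation mentioned above can be illustrated with a direct discrete Fourier transform: the energy of bin k is the squared magnitude of the DFT at that bin. A real DSP would use an optimized FFT; this pure-Python sketch with hypothetical names is for illustration only.

```python
import cmath
import math

def bin_energy(signal, k):
    """Energy of DFT bin k of `signal` (direct DFT, for illustration)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
              for i, x in enumerate(signal))
    return abs(acc) ** 2

# A pure tone at bin 2 of an 8-sample frame concentrates its energy there
# (and in the mirrored bin 6); other bins stay near zero.
tone = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
energies = [bin_energy(tone, k) for k in range(8)]
```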
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons of the human brain, it processes input information quickly and can also continuously learn by itself. Applications such as intelligent recognition of the electronic device 100 can be implemented through the NPU, for example: image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The electronic apparatus 100 can listen to music through the speaker 170A or listen to a handsfree call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic apparatus 100 receives a call or voice information, it can receive voice by placing the receiver 170B close to the ear of the person.
The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal into the microphone 170C by speaking close to it. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on.
The earphone interface 170D is used to connect a wired earphone. The headset interface 170D may be the USB interface 130, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and can convert it into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations applied to the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on a short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the electronic device 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon flipping open can then be set according to the detected opening or closing state of the holster or the flip cover.
The acceleration sensor 180E may detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes), and may detect the magnitude and direction of gravity when the electronic device 100 is stationary. It may also be used to identify the attitude of the electronic device and is applied in scenarios such as landscape/portrait switching and pedometers.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, such as in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 may determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear for a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 can utilize the collected fingerprint characteristics to unlock the fingerprint, access the application lock, photograph the fingerprint, answer an incoming call with the fingerprint, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown caused by the low temperature. In other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
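The tiered temperature policy described above can be sketched as a simple mapping from the reported temperature to a list of protective actions. The threshold values below are hypothetical placeholders, not figures from this document.

```python
HIGH_TEMP = 45.0      # above this, throttle the nearby processor (assumed value)
LOW_TEMP = 0.0        # below this, heat the battery (assumed value)
CRITICAL_LOW = -10.0  # below this, also boost the battery output voltage (assumed value)

def thermal_actions(temperature_c):
    """Return the list of protective actions for a reported temperature."""
    actions = []
    if temperature_c > HIGH_TEMP:
        actions.append("throttle_cpu")
    if temperature_c < LOW_TEMP:
        actions.append("heat_battery")
    if temperature_c < CRITICAL_LOW:
        actions.append("boost_battery_voltage")
    return actions
```

Note that the two low-temperature tiers are cumulative: a critically low temperature triggers both battery heating and voltage boosting.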
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of a vibrating bone of the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the vocal-part bone acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse out heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be attached to or detached from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a standard SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time; the types of the cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, as well as with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calling and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
Fig. 4 is a schematic flowchart of an embodiment of an echo cancellation method according to an embodiment of the present application, including:
step 401, the central device receives and plays a common audio stream.
In particular, the central device may be a voice playing device in a distributed combined microphone network, that is, a master device. Other devices in the distributed combined microphone network may be used as sound pickup devices, i.e., distributed devices. For example, the distributed combined microphone network may be composed of a plurality of devices (e.g., the first device 10, the second device 20, and the second device 21). The first device 10, the second device 20 and the second device 21 may be connected through a local area network to form a distributed combined microphone network. The user may set any one of the first device 10, the second device 20, or the second device 21 in the above-described distributed combined microphone network as the center device. For example, if the user sets the first device 10 as a central device, the first device 10 may be used as a voice playing device, and the other devices (e.g., the second device 20 and the second device 21) may be used as distributed devices, that is, sound pickup devices.
After the central device is determined, it can receive the common audio stream and play it. It is understood that the frame rate of the common audio stream may be 5 ms or 20 ms, although the embodiment of the present application is not limited thereto; in some embodiments, other frame rates are possible.
Step 402, the center device sends ultrasonic audio data.
Specifically, the ultrasonic audio data may be transmitted through the application layer 200. The ultrasonic audio data can be obtained from an ultrasonic audio file preset in the central device, and the preset ultrasonic audio data has a first preset frequency. Alternatively, the ultrasonic audio file may be downloaded from the Internet in real time; the source of the ultrasonic audio data is not particularly limited in the embodiments of the present application. In addition, the preset frequency of the ultrasonic audio data may be greater than 20 kHz, i.e., a high-frequency sound wave, so that the playback is imperceptible to the user and the user experience is improved. The ultrasonic audio data may be data having a frequency-domain characteristic; for example, the ultrasonic audio data may be a continuous sine wave.
In a specific implementation, the ultrasonic audio data may only include a waveform of a preset ultrasonic wave, or may include a preset ultrasonic waveform and blank data. For example, fig. 5a shows ultrasonic audio data 500 including only a waveform of a preset ultrasonic wave, and fig. 5b shows ultrasonic audio data 510 including a waveform of a preset ultrasonic wave and blank data. As shown in fig. 5b, the ultrasonic audio data 510 includes a preset ultrasonic waveform 511 and blank data 512.
In step 403, the center device plays the ultrasonic wave.
Specifically, the central device can drive the speaker to play the ultrasonic wave through the speaker driver 321, so that other distributed devices can collect the ultrasonic wave through their microphones. It is understood that, after the central device plays the ultrasonic wave, the ultrasonic audio data corresponding to the ultrasonic wave may be stored and used as a reference waveform. In addition, the ultrasonic wave in this application may also use other frequency points or frequency bands, as long as those frequencies are difficult for the human ear to perceive; this prevents the ultrasonic wave from disturbing the user and thus improves the user experience.
At step 404, the distributed device acquires ultrasound.
Specifically, each distributed device may collect ultrasound waves played by the central device through a microphone and may convert the collected ultrasound waves into ultrasound audio data through the microphone driver 322.
It should be noted that, since the center device may also have a microphone, an echo may also be generated between the microphone of the center device and the speaker of the center device. Therefore, the center device can also collect the above-mentioned ultrasonic waves, whereby the echo in the center device can be cancelled.
In step 405, the distributed devices transmit the ultrasonic audio data to the central device.
Specifically, after receiving the ultrasonic wave through its microphone, each distributed device may send ultrasonic audio data corresponding to the ultrasonic wave to the central device through a wireless communication interface (e.g., WIFI, bluetooth) in the physical layer 400. For example, the distributed devices may send the ultrasonic audio data to the central device in the form of data packets. It will be understood that the distributed device also picks up the normal audio stream played by the central device, and therefore also transmits the normal audio stream data to the central device in the form of data packets. In addition, it should be noted that the channel through which the distributed device sends the ultrasonic audio data to the central device is consistent with the channel in which the echo is to be cancelled, so that the echo time delay can be calculated more truly and accurately, which in turn ensures the accuracy of echo cancellation. The channel may include an actual physical channel or a channel type; for example, the channel may be the WIFI channel 1101 in the application scenario shown in fig. 1.
Further, after receiving the ultrasonic audio data sent by the distributed device, the central device may perform calculation through an internal software module and an internal hardware module (i.e., a transmission channel that sends the ultrasonic waves, receives the ultrasonic audio data, and calculates the echo delay based on the ultrasonic audio data), so that the echo delay may be calculated. Therefore, the central equipment can also keep the software or hardware module for processing the ultrasonic waves and the ultrasonic audio data consistent or approximately consistent with the channel of the echo to be eliminated, thereby ensuring the accuracy of the echo elimination.
Each data packet may include one audio frame in audio data (the audio data may be ultrasonic audio data or general audio data), and each data packet may also include a plurality of audio frames in the audio data. The audio frame may be a frame of data in the audio stream, and for example, if the frame rate of the audio stream is 5ms, the audio stream generates one frame of audio data every 5ms, that is, one audio frame of 5 ms.
Further, when each distributed device transmits audio data to the central device, the device number of the distributed device may also be transmitted to the central device. The device number may be an identification number of the distributed device, and the device number may be used to distinguish different distributed devices. Therefore, after the central device receives the audio data sent by the distributed devices, the central device can distinguish which distributed device sent the audio data. Taking the second device 20 as an example, when the second device 20 sends the audio data of the second device 20 to the center device, the audio data sent by the second device 20 and the device number of the second device 20 may be encapsulated into a data packet and sent.
In step 406, the central device calculates the echo time delay.
Specifically, the echo time delay may be calculated by the central device. After the central device receives the audio data sent by each distributed device, the echo time delay of each distributed device can be calculated. In a specific implementation, the center device may identify the ultrasonic audio data in the received audio data, and then may determine the echo delay according to a time difference between a time when the ultrasonic audio data is transmitted and a time when the ultrasonic audio data is received.
Take the example that the audio data packet only contains one audio frame. After the central device receives the audio data packet sent by any distributed device, the central device can read the audio data packet to obtain the audio frame in the audio data packet. Next, the received plurality of audio frames may be combined into a waveform of the ultrasonic wave to be verified (for convenience of explanation, "the waveform of the ultrasonic wave to be verified composed of the plurality of audio frames" will be referred to as "the ultrasonic wave to be verified" hereinafter), and compared with the ultrasonic audio data previously stored in the above-described step 403 (for convenience of explanation, "the ultrasonic audio data previously stored in the step 403" will be referred to as "the first ultrasonic wave" hereinafter).
It is to be understood that, during the period between the time when the ultrasonic audio data is transmitted and the time when it is received, the central device may also be receiving the normal audio stream of step 401. Therefore, when the central device recognizes the ultrasonic audio data in the received audio data, it can count the total number of normal audio frames received between the time when the ultrasonic audio data was transmitted and the time when it was received, and the echo delay can then be calculated from that total.
The above manner of identifying the ultrasonic audio data in the received audio data may be a manner of comparing the ultrasonic wave to be verified with the first ultrasonic wave. In a specific implementation, the comparison may be to compare the frequency domain data of the ultrasonic wave to be verified with the frequency domain data of the first ultrasonic wave. Thereby, the similarity between the waveform of the ultrasonic wave to be verified and the waveform of the first ultrasonic wave can be obtained. If the similarity is greater than the preset first threshold, it may be determined that the waveform of the ultrasonic wave to be verified is similar to the waveform of the first ultrasonic wave, and if the similarity is less than or equal to the preset first threshold, it may be determined that the waveform of the ultrasonic wave to be verified is not similar to the waveform of the first ultrasonic wave.
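The frequency-domain comparison can be sketched as a cosine similarity between DFT magnitude spectra. The patent does not specify the similarity measure, so the measure below is an assumption; a value near 1.0 means the two spectra match:

```python
import cmath
import math

def spectrum_similarity(a, b):
    # Cosine similarity between the DFT magnitude spectra of two
    # equal-length signals; 1.0 means identical spectral shape.
    def mags(x):
        N = len(x)
        return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                        for n in range(N)))
                for k in range(N)]
    ma, mb = mags(a), mags(b)
    dot = sum(p * q for p, q in zip(ma, mb))
    na = math.sqrt(sum(p * p for p in ma))
    nb = math.sqrt(sum(q * q for q in mb))
    return dot / (na * nb) if na and nb else 0.0
```

Comparing this similarity against the preset first threshold then decides whether the waveform to be verified matches the first ultrasonic wave.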
It should be noted that the ultrasonic audio data to be verified may include a plurality of audio frames to be verified, where the total number of the audio frames to be verified may be determined by the total number of the audio frames in the first ultrasonic audio data. For example, assuming that the first ultrasonic wave includes 5 audio frames, the center device may compare the ultrasonic wave to be verified, which is composed of consecutive 5 audio frames, with the first ultrasonic wave after receiving one audio frame. After the comparison, if the central device determines that the waveform of the ultrasonic wave to be verified is similar to the waveform of the first ultrasonic wave, the total number of common audio frames between the time of sending the ultrasonic audio data and the time of receiving the ultrasonic audio data may be calculated, so that the echo delay may be calculated according to the total number of the common audio frames, for example, the echo delay = the total number of the common audio frames × frame rate. The time when the center device transmits the ultrasonic audio data (i.e., the first ultrasonic wave) may be determined by the time when the first received data packet is received after the ultrasonic audio data is transmitted.
Now, the comparison process between the waveform to be verified and the first ultrasonic waveform will be described by taking a sliding-window approach as an example with reference to fig. 6. As shown in fig. 6, the waveform 600 is the waveform of the first ultrasonic wave, that is, the previously stored ultrasonic reference waveform. The sliding window 610 contains a plurality of audio frames, which can be read from the audio packets received by the central device. The size of the sliding window 610 (i.e., the total number of audio frames that can be accommodated in the sliding window 610) can be determined by the total number of audio frames of the waveform of the first ultrasonic wave. It will be appreciated that the sliding window 610 may slide forward with each received audio frame. Illustratively, at time t, the central device receives audio frame x, having received 4 audio frames, i.e., audio frame x-1, audio frame x-2, audio frame x-3, and audio frame x-4, at times t-1, t-2, t-3, and t-4, respectively. At this time (i.e., time t), the central device has received a total of 5 audio frames, i.e., audio frame x, audio frame x-1, audio frame x-2, audio frame x-3, and audio frame x-4. Assuming that the first ultrasonic wave includes 5 audio frames, the central device may compare the ultrasonic wave to be verified, which is composed of the 5 audio frames x, x-1, x-2, x-3, and x-4, with the first ultrasonic wave. If the ultrasonic wave to be verified is not similar to the first ultrasonic wave, the central device continues to receive the audio frames sent by the distributed device. At time t+1, the central device receives audio frame x+1.
The sliding window is then moved forward by the position of one audio frame, so that it contains the 5 audio frames x+1, x, x-1, x-2, and x-3, and the ultrasonic wave to be verified composed of these 5 audio frames can be compared with the first ultrasonic wave. The sliding window 610 may continue to advance in the manner described above until an ultrasonic wave to be verified is found that is similar to the first ultrasonic wave. When the central device finds an ultrasonic wave to be verified that is similar to the first ultrasonic wave (for convenience of explanation, this ultrasonic wave is referred to as the second ultrasonic wave hereinafter), the total number of common audio frames between the time when the first audio frame y in the second ultrasonic wave was received and the time when the central device transmitted the first ultrasonic wave (that is, the time when the first audio frame z was received after the first ultrasonic wave was transmitted) can be calculated, and the echo time delay is calculated from that total number of audio frames.
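The sliding-window search can be sketched as follows; frames are compared through a pluggable similarity predicate (the frame representation and predicate are illustrative assumptions), and the echo delay is the number of ordinary frames received before the match multiplied by the frame rate (frame duration):

```python
def sliding_window_delay(received_frames, ref_frames, frame_ms, is_similar):
    # Slide a window of len(ref_frames) frames over the received stream.
    # The start index of the first matching window equals the number of
    # ordinary audio frames received before the echo, so the echo delay
    # is start * frame_ms milliseconds.
    w = len(ref_frames)
    for start in range(len(received_frames) - w + 1):
        if is_similar(received_frames[start:start + w], ref_frames):
            return start * frame_ms
    return None  # no window similar to the reference was found
```

For example, with a 5 ms frame rate and a match starting at the fourth received frame, the estimated echo delay is 3 × 5 = 15 ms.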
It is to be understood that the sliding window manner in fig. 6 is only an exemplary illustration and is not a limitation to the embodiments of the present application, and in some embodiments, the waveforms may be compared in other manners.
Preferably, after receiving an audio data packet sent by any one of the distributed devices, the central device may also continuously number the audio data packets sent by the same distributed device, where the audio data packet may include an ultrasonic audio data packet and a general audio data packet. Then, the central device may read a plurality of consecutively numbered audio data packets of the same distributed device, thereby obtaining audio frames in the audio data packets and obtaining ultrasonic waves to be verified, which are composed of the plurality of audio frames. If the central device compares the first ultrasonic wave with the ultrasonic wave to be verified, and after the first ultrasonic wave is determined to be similar to the ultrasonic wave to be verified, the total number of the common audio frames between the moment of sending the first ultrasonic wave and the moment of receiving the ultrasonic audio data can be determined according to the difference value of the numbers. For example, assuming that the first received audio packet after the center device transmits the first ultrasonic wave is numbered n, and the first audio packet in the second ultrasonic wave (which is similar to the first ultrasonic wave) is numbered m, the total number of normal audio frames = m-n, so that the echo delay can be calculated according to the total number of normal audio frames, for example, the echo delay = the total number of normal audio frames × the frame rate.
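The sequence-number arithmetic in this paragraph (ordinary frames = m − n, echo delay = that count × frame rate) amounts to a one-liner; the helper name is illustrative:

```python
def delay_from_numbers(n_first_after_send, m_first_of_echo, frame_ms):
    # n: number of the first packet received after the first ultrasonic
    #    wave was sent; m: number of the first packet of the matching
    #    (second) ultrasonic wave. Each frame lasts frame_ms milliseconds.
    return (m_first_of_echo - n_first_after_send) * frame_ms
```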
Further, the central device may also calculate its own echo delay; for example, the central device may compare the ultrasonic wave acquired by its own microphone with the first ultrasonic wave, thereby obtaining the echo delay of the central device itself. The specific echo delay algorithm may follow the manner shown in fig. 6 and is not described here again.
Step 407, the central device updates the echo delay.
Optionally, after the central device calculates the echo delay, it may further update the echo delay. The scenarios for updating the echo delay include the following four. The updated echo delay includes the echo delay between the central device and the distributed devices.
The first method is as follows: mobile update
When the central device or any one of the distributed devices moves, the distance between the central device and that distributed device changes, so the echo time delay between them can be recalculated; the central device then obtains the updated echo time delay and can cancel the echo accordingly. Device movement may be detected by a vibration sensor, a displacement sensor, or a wireless positioning technology (e.g., UWB, Bluetooth, or RSSI); in this way it can be determined that the distance between the central device and the distributed device has changed. It is to be understood that the above examples only illustrate ways of detecting device movement and do not limit the embodiments of the present application; in some embodiments, device movement may also be determined in other ways.
The central device may then retransmit the ultrasonic wave to the distributed devices through the speaker. The ultrasonic wave may be the same data as in step 402 or different data; the audio data of the retransmitted ultrasonic wave is not particularly limited in the embodiment of the present application.
After each distributed device acquires the ultrasonic wave re-sent by the central device through its microphone, it sends the audio data corresponding to the ultrasonic wave to the central device through the wireless communication interface of the physical layer 400, so that the central device can recalculate the echo time delay between the central device and the distributed devices; for the specific process of calculating the echo time delay, reference may be made to steps 402 to 406 above, which are not described here again. The ultrasonic wave may be sent alone, or it may be mixed with the normal voice of the user collected by the microphone of the central device and then sent, where mixing means superposing the ultrasonic wave and the normal user voice. Sending the ultrasonic wave mixed with the user's normal voice can improve data transmission efficiency. Since the ultrasonic wave is high-frequency data, after the distributed device receives the mixed data, the ultrasonic wave may be extracted by a filter; for example, the low-frequency signal data may be removed by a high-pass filter, so that the high-frequency ultrasonic audio data is obtained.
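A minimal sketch of separating the ultrasonic component from the mixed signal: a first-order high-pass filter (the filter order and coefficient are assumptions, not specified by the patent) attenuates the low-frequency voice while passing the high-frequency ultrasound:

```python
def highpass(mixed, alpha=0.95):
    # First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
    # Slowly varying (low-frequency, voice) content decays toward zero;
    # rapidly varying (high-frequency, ultrasonic) content passes through.
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in mixed:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A production implementation would instead use a properly designed digital filter with a cutoff just below the ultrasonic band.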
Optionally, since all the devices in the central device and the distributed devices are not necessarily moved, in this embodiment, if only a part of the distributed devices are changed in distance from the central device, only the echo delay between the central device and the part of the distributed devices may be updated. For example, if the distance between the center device and the second device 20 changes and the distance between the center device and the second device 21 does not change, only the echo delay between the center device and the second device 20 needs to be updated. Take a scene in which the distance between the center device and the second device 20 has changed as an example. The center device may acquire the ultrasonic wave of the preset second frequency and transmit the ultrasonic wave of the preset second frequency to all the distributed devices. It will be appreciated that the preset second frequency may be a different frequency than the preset first frequency. The preset first frequency may be used to instruct each distributed device to transmit ultrasonic audio data, and the preset second frequency may be used to instruct only the distributed device that has moved to transmit ultrasonic audio data. At this time, when the second device 20 receives the ultrasonic wave of the preset second frequency, the ultrasonic wave is identified by Fast Fourier Transform (FFT), whereby the frequency of the ultrasonic wave can be identified. When the second device 20 determines that the frequency of the ultrasonic wave is the preset second frequency, the second device 20 further determines whether the distance from the center device is changed. When the second device 20 determines that the distance from the center device has changed, the second device 20 may transmit ultrasonic audio data corresponding to the ultrasonic waves to the center device. 
When the second device 21 receives the ultrasonic wave of the preset second frequency, the ultrasonic wave is identified through Fast Fourier Transform (FFT), so that the frequency of the ultrasonic wave can be identified. When the second device 21 determines that the frequency of the ultrasonic wave is the preset second frequency, the second device 21 further determines whether the distance from the center device is changed. When the second device 21 determines that the distance from the center device has not changed, the second device 21 does not need to transmit the ultrasonic audio data to the center device. The manner in which the distributed device determines that the distance between the distributed device and the central device changes may refer to the determination manner of the central device, and is not described herein again.
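The frequency-identification step used by the second devices can be sketched with a plain DFT standing in for the FFT mentioned in the text; the sample rate and tone frequency below are assumptions:

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    # Return the frequency (Hz) of the strongest DFT bin below Nyquist,
    # i.e. the frequency the device would compare against the preset
    # second frequency.
    N = len(samples)
    mags = [abs(sum(samples[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2)]
    k_max = max(range(N // 2), key=mags.__getitem__)
    return k_max * sample_rate / N
```

If the identified frequency equals the preset second frequency, the device proceeds to check whether its distance to the central device has changed.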
It should be noted that the above examples only exemplarily show the manner of transmitting the ultrasonic audio data by the frequency for instructing the partial distributed devices, and do not constitute a limitation to the embodiments of the present application, and in some embodiments, the ultrasonic audio data may also be transmitted by other manners, for example, different modulation codes.
The second method comprises the following steps: timed updates
The central device may preset a time length and start a timer after transmitting the first ultrasonic wave. When the preset time length elapses, the central device may reacquire the ultrasonic audio file and transmit the ultrasonic wave to the distributed devices through the speaker. When the distributed devices collect the ultrasonic wave transmitted by the central device, they can transmit the corresponding ultrasonic audio data to the central device, so that the central device can periodically update the echo time delay between itself and each distributed device and cancel the echo according to the updated echo time delay.
The third method comprises the following steps: silent updates
When the central device is in a silent state, it may actively acquire an ultrasonic audio file and send the ultrasonic wave to the distributed devices through the speaker. The silent state characterizes the situation in which the microphone of the central device does not collect any voice data and the speaker of the central device does not play any voice data. After the distributed devices collect the ultrasonic wave sent by the central device, they can send the corresponding ultrasonic audio data to the central device, so that the central device can update the echo time delay between itself and each distributed device while in the silent state and cancel the echo according to the updated echo time delay.
The method is as follows: event triggered update
The events may include events such as new distributed devices being online, a meeting being restarted, and a meeting being interrupted. The new online distributed device may be a distributed device newly added to the distributed combined microphone network.
The central equipment can detect the events in real time, and when the central equipment monitors the events, the central equipment can actively acquire the ultrasonic audio files and send the ultrasonic waves to the distributed equipment through the loudspeaker. Then, after the distributed devices acquire the ultrasonic waves sent by the central device through the loudspeaker, the distributed devices can send ultrasonic wave audio data corresponding to the ultrasonic waves to the central device, so that the central device can update the echo time delay between the central device and each distributed device when monitoring the event, and further can eliminate echoes according to the updated echo time delay.
Optionally, when the central device monitors the event, the echo time delay between the central device and a part of the distributed devices may also be updated. Taking the online new distributed device as an example, since the distances between the other distributed devices and the central device are not changed, the echo delay between the other distributed devices and the central device does not need to be updated, and only the echo delay between the central device and the newly added distributed device needs to be updated.
And step 408, the central equipment performs echo cancellation according to the echo delay.
Specifically, after the central device calculates its own echo delay and the echo delay between itself and each distributed device, echo cancellation can be performed based on these delays.
In a specific implementation, the central device may use an AEC algorithm to cancel its own echo. In addition, based on the echo time delay between the central device and each distributed device, the data collected by the microphones can be aligned with the data played by the speaker, so that each distributed device can perform echo cancellation according to its own distance from the central device, improving the adaptability between devices.
Taking the above central device and distributed devices as an example: a distributed device collects the user's voice through its microphone and sends it to the central device, and the central device plays that user voice through its speaker. At this moment, the microphone of the distributed device can pick up the user voice played by the central device; if it were not filtered out, this voice would be sent back to the central device again, causing an echo. Therefore, the distributed device needs to filter out the echo. By calculating the echo time delay between the central device and the distributed device, the central device can align the data played by its speaker with the data acquired by the microphone according to the echo time delay, so that the distributed device can filter out the data played by the central device from its microphone capture, thereby achieving echo cancellation.
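A minimal sketch of the alignment step: the speaker reference is shifted by the estimated echo delay and subtracted from the microphone capture. Real AEC additionally adapts a filter to the room response, so plain subtraction here is a deliberate simplification:

```python
def cancel_aligned_echo(mic, speaker, delay_samples):
    # Shift the speaker reference by the estimated echo delay and
    # subtract it from the microphone capture sample by sample.
    out = []
    for i, m in enumerate(mic):
        j = i - delay_samples
        ref = speaker[j] if 0 <= j < len(speaker) else 0.0
        out.append(m - ref)
    return out
```

With a correct delay estimate, the echo component in the microphone signal is cancelled exactly in this idealized model; with a wrong estimate, residual echo remains, which is why the patent emphasizes accurate delay estimation.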
Fig. 7 is a schematic structural diagram of an embodiment of the echo cancellation device of the present application. As shown in fig. 7, the echo cancellation device 70 may be applied to a first device that includes a speaker, and the echo cancellation device 70 includes: a transmitting module 71, a receiving module 72, a calculating module 73, and a first cancellation module 74; wherein:
a transmitting module 71, configured to transmit the first ultrasonic wave to the second device using a speaker;
the receiving module 72 is configured to receive first data sent by the second device through a first transmission manner, where the first data includes a first ultrasonic wave acquired by the second device;
the calculating module 73 is configured to calculate a first echo time delay of an audio loop corresponding to the first device and the second device according to the first ultrasonic wave and the first data, where the audio loop includes a first transmission mode;
and a first cancellation module 74, configured to perform echo cancellation on the audio loop where the first device and the second device are located according to the first echo delay.
In one possible implementation manner, the first device further includes a microphone, and the apparatus 70 further includes: a second cancellation module 75; wherein:
a second cancellation module 75 for acquiring the first ultrasonic wave using a microphone; calculating to obtain a second echo time delay corresponding to the first equipment based on the acquired first ultrasonic wave; and based on the second echo time delay, performing echo cancellation on the first equipment.
In one possible implementation manner, the apparatus 70 further includes: an update module 76; wherein:
and an updating module 76, configured to update the first echo delay of the audio loop corresponding to the first device and the second device.
In one possible implementation, the update module 76 is further configured to:
send a second ultrasonic wave to the second device when it is detected that at least one of the first device and the second device moves;
and update the first echo time delay of the audio loop corresponding to the first device and the second device based on received second data sent by the second device, where the second data includes the second ultrasonic wave acquired by the second device.
In one possible implementation, the first ultrasonic wave has a preset first frequency, the second ultrasonic wave has a preset second frequency, and the preset second frequency is different from the preset first frequency.
In one possible implementation, the second ultrasonic wave is superimposed on the user speech collected by the first device and then sent to the second device.
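The superposition described above can be sketched as adding a low-amplitude probe tone at a preset ultrasonic frequency on top of the speech samples before transmission. The sampling rate, the 20 kHz probe frequency, and the amplitude below are assumed values for illustration, not specified in the patent.

```python
import math

SAMPLE_RATE = 48_000  # assumed sampling rate
PROBE_HZ = 20_000     # assumed preset ultrasonic frequency
PROBE_AMP = 0.05      # kept small so the speech is not audibly affected

def superimpose_probe(speech):
    """Add a low-level ultrasonic probe tone on top of the speech samples."""
    return [s + PROBE_AMP * math.sin(2 * math.pi * PROBE_HZ * i / SAMPLE_RATE)
            for i, s in enumerate(speech)]
```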
In one possible implementation, the update module 76 is further configured to:
send the first ultrasonic wave to the second device based on a preset period;
and update the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
In one possible implementation, the update module 76 is further configured to:
send the first ultrasonic wave to the second device when it is detected that the first device is in a silent state;
and update the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
In one possible implementation, the update module 76 is further configured to:
send the first ultrasonic wave to the second device based on a detected preset event;
and update the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
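The update triggers described in the implementations above (device movement, a preset period, a silent state, and a detected preset event) can be consolidated into a single predicate. The sketch below is an assumed consolidation; the trigger names and the 30-second period are illustrative, not from the patent.

```python
PERIOD_S = 30.0  # assumed re-measurement period, not specified in the text

def should_update(now, last_update, device_moved, is_silent, preset_event):
    """Decide whether to re-send the probe and refresh the first echo time delay."""
    if device_moved:                   # at least one device moved
        return True
    if now - last_update >= PERIOD_S:  # the preset period has elapsed
        return True
    if is_silent:                      # the first device is in a silent state
        return True
    if preset_event:                   # a preset event was detected
        return True
    return False
```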
In one possible implementation, the first data further includes a device number corresponding to the second device, where the device number is used to identify the second device.
In one possible implementation, the first transmission mode includes a wireless network transmission mode selected from Wi-Fi, a mobile network, and Bluetooth.
In one possible implementation, the audio loop between the first device and the second device includes a first channel to be subjected to echo cancellation, the first transmission mode includes a second channel, and the first channel is consistent with the second channel.
In one possible implementation, the audio loop between the first device and the second device includes a first channel to be subjected to echo cancellation, the first device has a third channel for processing the first ultrasonic wave and the first data, and the first channel is consistent with the third channel.
The echo cancellation apparatus provided in the embodiment shown in fig. 7 may be used to implement the technical solutions of the method embodiments shown in figs. 1 to 6 of the present application; for the implementation principles and technical effects, reference may be made to the related description of the method embodiments.
It should be understood that the division of the modules of the echo cancellation apparatus shown in fig. 7 is merely a logical function division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separated. These modules may all be implemented in the form of software invoked by a processing element, or all in hardware, or some in the form of software invoked by a processing element and some in hardware. For example, the detection module may be a separately disposed processing element, or may be integrated into a chip of the electronic device; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
It should be understood that the connection relationship between the modules illustrated in the embodiment of the present application is merely an exemplary illustration and does not constitute a limitation on the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt an interface connection manner different from that in the above embodiments, or a combination of multiple interface connection manners.
It is to be understood that, in order to realize the above functions, the electronic devices and the like described above include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be considered as going beyond the scope of the embodiments of the present application.
In the embodiment of the present application, the electronic device and the like may be divided into functional modules according to the method example, for example, each functional module may be divided according to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. An echo cancellation method applied to a first device, the first device including a speaker, the method comprising:
transmitting a first ultrasonic wave to a second device using the speaker;
receiving first data sent by the second device in a first transmission mode, wherein the first data comprises the first ultrasonic wave acquired by the second device;
calculating a first echo time delay of an audio loop corresponding to the first device and the second device according to the first ultrasonic wave and the first data, wherein the audio loop comprises the first transmission mode;
and performing, according to the first echo time delay, echo cancellation on the audio loop where the first device and the second device are located.
2. The method of claim 1, wherein the first device further comprises a microphone, the method further comprising:
acquiring the first ultrasonic wave using the microphone;
calculating a second echo time delay corresponding to the first device based on the acquired first ultrasonic wave;
and performing echo cancellation on the first device based on the second echo time delay.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and updating the first echo time delay of the audio loop corresponding to the first device and the second device.
4. The method of claim 3, wherein the updating the first echo time delay of the audio loop corresponding to the first device and the second device comprises:
when detecting that at least one of the first device and the second device moves, sending a second ultrasonic wave to the second device;
and updating the first echo time delay of the audio loop corresponding to the first device and the second device based on received second data sent by the second device, wherein the second data comprises the second ultrasonic wave acquired by the second device.
5. The method of claim 4, wherein the first ultrasonic wave has a preset first frequency and the second ultrasonic wave has a preset second frequency, the preset second frequency being different from the preset first frequency.
6. The method of claim 4, wherein the second ultrasonic wave is superimposed with the user speech collected by the first device and then transmitted to the second device.
7. The method of claim 3, wherein the updating the first echo time delay of the audio loop corresponding to the first device and the second device comprises:
sending the first ultrasonic wave to the second device based on a preset period;
and updating the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
8. The method of claim 3, wherein the updating the first echo time delay of the audio loop corresponding to the first device and the second device comprises:
when detecting that the first device is in a silent state, sending the first ultrasonic wave to the second device;
and updating the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
9. The method of claim 3, wherein the updating the first echo time delay of the audio loop corresponding to the first device and the second device comprises:
sending the first ultrasonic wave to the second device based on a detected preset event;
and updating the first echo time delay of the audio loop corresponding to the first device and the second device based on the received first data sent by the second device.
10. The method of any of claims 1-9, wherein the first data further comprises a device number corresponding to the second device, the device number identifying the second device.
11. The method according to any one of claims 1-10, wherein the first transmission mode comprises a wireless network transmission mode selected from Wi-Fi, a mobile network, and Bluetooth.
12. The method according to any one of claims 1-11, wherein the audio loop between the first device and the second device comprises a first channel to be subjected to echo cancellation, the first transmission mode comprises a second channel, and the first channel is consistent with the second channel.
13. The method according to any one of claims 1-12, wherein the audio loop between the first device and the second device comprises a first channel to be subjected to echo cancellation, the first device has a third channel for processing the first ultrasonic wave and the first data, and the first channel is consistent with the third channel.
14. A first device, comprising: a memory for storing computer program code, the computer program code comprising instructions that, when read from the memory by the first device, cause the first device to perform the method of any of claims 1-13.
15. A computer-readable storage medium comprising computer instructions which, when run on a first device, cause the first device to perform the method of any one of claims 1-13.
CN202110902994.6A 2021-08-06 2021-08-06 Echo cancellation method, electronic device, and storage medium Pending CN115706755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110902994.6A CN115706755A (en) 2021-08-06 2021-08-06 Echo cancellation method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN115706755A true CN115706755A (en) 2023-02-17

Family

ID=85179075

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination