WO2022218175A1 - Method for estimating echo delay between distributed devices, and electronic device

Info

Publication number: WO2022218175A1
Application number: PCT/CN2022/084862
Authority: WO
Inventors: 丁浩; 钟小飞; 李刚; 张斌
Original assignee: 华为技术有限公司
Priority date: 2021-04-17
Filing date: 2022-04-01
Publication date: 2022-10-20
Also published as: CN115223581A

Abstract

A method for estimating an echo delay between distributed devices, and an electronic device, a system, a computer program product and a readable storage medium. The method comprises: firstly, a device determining the distance between devices, and determining a propagation delay between different devices; secondly, determining a processing delay between the different devices by means of an audio signal; and finally, determining an echo delay between the different devices on the basis of the propagation delay and the processing delay, and then performing echo cancellation.

Description

Method for estimating echo delay between distributed devices, and electronic device

This application claims the priority of the Chinese patent application filed on April 17, 2021 with the application number 202110415110.4 and titled "Echo Delay Estimation Method for Distributed Equipment and Electronic Equipment", the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the field of electronic technologies, and in particular, to a method for estimating an echo delay of a distributed device and an electronic device.

Background

With the popularization of smart devices and the build-out of communication infrastructure, users can communicate with distant friends directly through the Internet, which greatly shortens the interaction distance between people. In scenarios such as network calls and video conferences, after the voice of the far-end user is transmitted to the near-end device through the network, it is played through the near-end audio output device, so that the near-end user can directly hear the voice of the far-end user.

However, in network calls and video conferences, the audio output device and the audio input device are always working, so the voice of the far-end user played by the audio output device is acquired by the audio input device and transmitted back to the far-end user through the network. In this case, the far-end user not only hears what he or she has just said, but the echo interference may also grow larger and larger and result in howling, which greatly degrades the user experience.

To eliminate the influence of echo, one feasible approach is to fix the relative position of the audio output device and the audio input device, manually measure the propagation delay of sound between the two devices, and determine the echo delay in combination with the hardware and software characteristics of the audio output device and the audio input device. After the echo delay is determined, the echo delay parameter is manually configured for the echo cancellation module on the audio output device, so that the echo cancellation module works normally and the audio output device is prevented from playing echoes.

However, this method solves the echo problem only while the relative position of the audio output device and the audio input device remains unchanged. When the user moves the audio input device or the audio output device, the echo delay parameter already configured for the echo cancellation module is no longer valid, and the electronic device cannot effectively filter out the echo.

SUMMARY OF THE INVENTION

The embodiments of the present application provide a method for estimating the echo delay of distributed devices. The method includes: an audio output device establishes a spatial coordinate system, continuously receives location information transmitted by multiple audio input devices through a network connection, and continuously updates the echo delay of each audio input device based on the location information of that device, thereby eliminating the echo.

In a first aspect, at a first moment, a first electronic device determines a first distance, where the first distance is the distance between the first electronic device and a second electronic device at the first moment; the first electronic device determines a first propagation delay based on the first distance and the speed of sound, where the first propagation delay is the delay for a sound signal to propagate from the first electronic device to the second electronic device at the first moment; the first electronic device determines that the first propagation delay is a first echo delay, or the first electronic device determines that the sum of the first propagation delay and a first processing delay is the first echo delay, where the first processing delay is the sum of the delay from the first electronic device acquiring a voice signal to playing the voice signal and the delay from the second electronic device acquiring the voice signal to the second electronic device forwarding the voice signal to the first electronic device.

In the above embodiment, the first electronic device determines the propagation delay between the first electronic device and the second electronic device by determining the distance between them, and further determines the echo delay based on the propagation delay. First, the method allows the electronic devices to be moved at will without affecting echo cancellation, which improves the user experience. Second, the echo delay does not need to be measured manually, so users can join online videos, conferences, and calls at any time with their electronic devices without worrying about echo.
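As a minimal sketch of the calculation in the first aspect, the propagation delay can be taken as the distance divided by the speed of sound, and the echo delay as either that value alone or its sum with the processing delay. The function names and the constant 343 m/s (an approximate speed of sound in air at about 20 °C) are assumptions for illustration; the application does not specify them.

```python
# Assumed constant: approximate speed of sound in dry air at about 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def propagation_delay_s(distance_m, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Time for sound to travel distance_m, in seconds."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed_of_sound


def echo_delay_s(distance_m, processing_delay_s=0.0):
    """Echo delay as propagation delay plus an optional processing delay.

    With processing_delay_s left at 0.0 this is the variant in which the
    propagation delay alone is used as the echo delay.
    """
    return propagation_delay_s(distance_m) + processing_delay_s
```

For example, under these assumptions a device about 3.43 m away contributes roughly 10 ms of propagation delay, to which a separately measured processing delay is added.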

With reference to the embodiments of the first aspect, in some embodiments, at a second moment after the first moment, the first electronic device determines a second distance, where the second distance is the distance between the first electronic device and the second electronic device at the second moment; the first electronic device determines a second propagation delay based on the second distance and the speed of sound, where the second propagation delay is the delay for the sound signal to propagate from the first electronic device to the second electronic device at the second moment; the first electronic device determines that the second propagation delay is a second echo delay, or the first electronic device determines that the sum of the second propagation delay and the first processing delay is the second echo delay; the second echo delay is different from the first echo delay.

In the above embodiment, the first electronic device continuously updates the distance between the second electronic device and the first electronic device, and further updates the echo delay, so that the user can move the first electronic device/second electronic device at will, which improves the user experience.

With reference to the embodiments of the first aspect, in some embodiments, before the first electronic device determines the first distance at the first moment, the method further includes: at a third moment before the first moment, the first electronic device determines the first processing delay.

In the above embodiment, considering that in some cases the processing delay in the echo delay cannot be ignored, and that different electronic devices correspond to different processing delays, the first electronic device determines the first processing delay and, from it, the echo delay between the second electronic device and the first electronic device, which is used to cancel the echo related to the second electronic device.

With reference to the embodiments of the first aspect, in some embodiments, the first electronic device determining the first processing delay at a third moment before the first moment specifically includes: at the third moment before the first moment, when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays a first audio; the first electronic device receives the first audio sent by the second electronic device through a wireless network/near field communication service; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.

In the above embodiment, the user can bring the second electronic device close to the first electronic device, and the first processing delay is determined by calculating the time taken for the audio signal to pass between the first electronic device and the second electronic device, so that the first electronic device can determine the different processing delays caused by different types of electronic devices, and thus determine the echo delays.

With reference to the embodiments of the first aspect, in some embodiments, the first electronic device determining the first processing delay at a third moment before the first moment specifically includes: at the third moment before the first moment, in response to the user's input, the first electronic device plays the first audio; the first electronic device receives the first audio sent by the second electronic device through the wireless network/near field communication service; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.
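The two calibration variants above (triggered by proximity or by user input) can be sketched as follows. This is an illustrative sketch only: the function name, the 0.2 m default threshold, and the use of timestamps taken on the first electronic device's own clock are assumptions. The proximity check reflects the reading that, when the devices are close together, the time difference is dominated by processing rather than propagation.

```python
def calibrate_processing_delay(play_time_s, receive_time_s, distance_m=None,
                               distance_threshold_m=0.2, user_triggered=False):
    """Return the processing delay as (receive time - play time).

    Both timestamps are assumed to be taken on the first electronic device:
    when it plays the first audio and when it receives that audio back from
    the second electronic device. The calibration is accepted if the user
    requested it, or if the devices are closer than the (assumed) threshold.
    """
    close_enough = distance_m is not None and distance_m < distance_threshold_m
    if not (user_triggered or close_enough):
        raise ValueError("calibration requires user input or nearby devices")
    delay = receive_time_s - play_time_s
    if delay < 0:
        raise ValueError("receive time precedes play time")
    return delay
```

A caller might record `time.monotonic()` immediately before playback and again when the forwarded audio arrives, then pass both values to this function.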

With reference to the embodiments of the first aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the first electronic device receives first motion information, where the first motion information includes a motion state of the second electronic device; the first electronic device determines the first distance based on the first motion information.

In the above embodiment, the first electronic device receives the motion information of the second electronic device, such as acceleration, speed, attitude angle, and other parameters in the X-axis, Y-axis, and Z-axis directions, which helps to determine the distance between the first electronic device and the second electronic device more accurately.

With reference to the embodiments of the first aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the first electronic device receives first location information, where the first location information includes the location of the second electronic device; the first electronic device determines the first distance based on the first location information.

In the above embodiment, the first electronic device receives the location information of the second electronic device and can then determine the distance between the first electronic device and the second electronic device, which helps to determine that distance more accurately.
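Assuming the location information is a set of Cartesian coordinates in the spatial coordinate system established by the audio output device (the message format is not specified in this section), the first distance could be computed as a Euclidean distance. The function name and the choice of metres as the unit are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def distance_from_locations(first_device_xyz, second_device_xyz):
    """Euclidean distance between two devices in a shared coordinate system.

    Each argument is an (x, y, z) tuple, assumed to be expressed in metres.
    """
    return math.dist(first_device_xyz, second_device_xyz)
```

If the first electronic device is placed at the origin of its own coordinate system, the call reduces to the length of the second device's position vector.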

With reference to the embodiments of the first aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the first electronic device receives first distance information, where the first distance information includes the first distance; the first electronic device determines the first distance based on the first distance information.

In the above embodiment, after the second electronic device determines the distance from the first electronic device, it informs the first electronic device of the distance, thereby reducing the computational burden of the first electronic device.

With reference to the embodiments of the first aspect, in some embodiments, at a fourth moment, the first electronic device determines a second processing delay, where the second processing delay is the sum of the delay from the first electronic device acquiring the voice signal to playing the voice signal and the delay from a third electronic device acquiring the voice signal to the third electronic device forwarding the voice signal to the first electronic device; at a fifth moment after the fourth moment, the first electronic device determines a third distance, where the third distance is the distance between the first electronic device and the third electronic device at the fifth moment; the first electronic device determines a third propagation delay based on the third distance and the speed of sound, where the third propagation delay is the delay for the sound signal to propagate from the first electronic device to the third electronic device at the fifth moment; the first electronic device determines that the sum of the third propagation delay and the second processing delay is a third echo delay.

In the above embodiment, the electronic device can determine the echo delays of multiple devices, and use the corresponding echo delays to cancel the echoes related to different devices, which improves the user experience.
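One way to picture the multi-device case is a small per-device table on the first electronic device: each audio input device has its own calibrated processing delay and its own latest distance, and the echo delay for a device is recomputed whenever its distance is updated. The class, its method names, the device identifiers, and the 343 m/s constant below are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass, field

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed approximate value for air


@dataclass
class EchoDelayTracker:
    """Keeps one processing delay and one latest distance per device."""
    processing_delay_s: dict = field(default_factory=dict)
    distance_m: dict = field(default_factory=dict)

    def set_processing_delay(self, device_id, delay_s):
        # Stored once per device, e.g. after a calibration step.
        self.processing_delay_s[device_id] = delay_s

    def update_distance(self, device_id, distance_m):
        # Called whenever new distance/location data arrives for a device.
        self.distance_m[device_id] = distance_m

    def echo_delay_s(self, device_id):
        # Propagation delay from the latest distance, plus the device's
        # processing delay (treated as 0.0 if none has been calibrated).
        propagation = self.distance_m[device_id] / SPEED_OF_SOUND_M_PER_S
        return propagation + self.processing_delay_s.get(device_id, 0.0)
```

Keeping the processing delay separate from the distance means that moving a device only requires a new distance update, not a new calibration.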

In a second aspect, at a third moment before a first moment, a first electronic device determines a first processing delay. At the first moment, the first electronic device determines a first distance, where the first distance is the distance between the first electronic device and a second electronic device at the first moment; the first electronic device determines a first propagation delay based on the first distance and the speed of sound, where the first propagation delay is the delay for a sound signal to propagate from the first electronic device to the second electronic device at the first moment; the first electronic device determines that the first propagation delay is a first echo delay, or the first electronic device determines that the sum of the first propagation delay and the first processing delay is the first echo delay, where the first processing delay is the sum of the delay from the first electronic device acquiring a voice signal to playing the voice signal and the delay from the second electronic device acquiring the voice signal to the second electronic device forwarding the voice signal to the first electronic device.

With reference to some embodiments of the second aspect, in some embodiments, the first electronic device determining the first processing delay at the third moment before the first moment specifically includes: when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays a first audio; the second electronic device collects the first audio; the second electronic device sends the first audio to the first electronic device through a wireless network/near field communication service; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.

With reference to some embodiments of the second aspect, in some embodiments, when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays signaling, which specifically includes: at the third moment before the first moment, in response to the user's input, the first electronic device plays the first audio; the second electronic device collects the first audio; the second electronic device sends the first audio to the first electronic device; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.

With reference to some embodiments of the second aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the second electronic device determines first motion information, where the first motion information includes a motion state of the second electronic device; the second electronic device sends the first motion information to the first electronic device; the first electronic device receives the first motion information and determines the first distance based on the first motion information.

In the above embodiment, the second electronic device records its own motion information, such as acceleration, speed, attitude angle, and other parameters in the X-axis, Y-axis, and Z-axis directions, and sends the motion information to the first electronic device, which helps to determine the distance between the first electronic device and the second electronic device more accurately.

With reference to some embodiments of the second aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the second electronic device determines first location information, where the first location information includes the location of the second electronic device; the second electronic device sends the first location information to the first electronic device; the first electronic device receives the first location information and determines the first distance based on the first location information.

In the above embodiment, the second electronic device can determine its own location based on the motion information and send location information including that location to the first electronic device, so that the distance between the first electronic device and the second electronic device can be determined more accurately.

With reference to some embodiments of the second aspect, in some embodiments, the first electronic device determining the first distance specifically includes: the second electronic device determines first distance information, where the first distance information includes the first distance; the second electronic device sends the first distance information to the first electronic device; the first electronic device receives the first distance information and determines the first distance based on the first distance information.

In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, where the computer program code includes computer instructions that are invoked by the one or more processors to cause the electronic device to perform:

At a first moment, the first electronic device determines a first distance, where the first distance is the distance between the first electronic device and a second electronic device at the first moment; the first electronic device determines a first propagation delay based on the first distance and the speed of sound, where the first propagation delay is the delay for a sound signal to propagate from the first electronic device to the second electronic device at the first moment; the first electronic device determines that the first propagation delay is a first echo delay, or the first electronic device determines that the sum of the first propagation delay and a first processing delay is the first echo delay, where the first processing delay is the sum of the delay from the first electronic device acquiring a voice signal to playing the voice signal and the delay from the second electronic device acquiring the voice signal to the second electronic device forwarding the voice signal to the first electronic device.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: at a second moment after the first moment, the first electronic device determines a second distance, where the second distance is the distance between the first electronic device and the second electronic device at the second moment; the first electronic device determines a second propagation delay based on the second distance and the speed of sound, where the second propagation delay is the delay for the sound signal to propagate from the first electronic device to the second electronic device at the second moment; the first electronic device determines that the second propagation delay is a second echo delay, or the first electronic device determines that the sum of the second propagation delay and the first processing delay is the second echo delay; the second echo delay is different from the first echo delay.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: at a third moment before the first moment, the first electronic device determines the first processing delay.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: at a third moment before the first moment, when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays a first audio; the first electronic device receives the first audio sent by the second electronic device through a wireless network/near field communication service; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: at a third moment before the first moment, in response to a user input, the first electronic device plays the first audio; the first electronic device receives the first audio sent by the second electronic device through the wireless network/near field communication service; the first electronic device determines that the time difference between playing the first audio and receiving the first audio is the first processing delay.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: the first electronic device receives first motion information, where the first motion information includes a motion state of the second electronic device; the first electronic device determines the first distance based on the first motion information.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: the first electronic device receives first location information, where the first location information includes the location of the second electronic device; the first electronic device determines the first distance based on the first location information.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: the first electronic device receives first distance information, where the first distance information includes the first distance; the first electronic device determines the first distance based on the first distance information.

With reference to some embodiments of the third aspect, in some embodiments, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: at a fourth moment, the first electronic device determines a second processing delay, where the second processing delay is the sum of the delay from the first electronic device acquiring the voice signal to playing the voice signal and the delay from a third electronic device acquiring the voice signal to the third electronic device forwarding the voice signal to the first electronic device; at a fifth moment after the fourth moment, the first electronic device determines a third distance, where the third distance is the distance between the first electronic device and the third electronic device at the fifth moment; the first electronic device determines a third propagation delay based on the third distance and the speed of sound, where the third propagation delay is the delay for the sound signal to propagate from the first electronic device to the third electronic device at the fifth moment; the first electronic device determines that the sum of the third propagation delay and the second processing delay is a third echo delay.

In a fourth aspect, an embodiment of the present application provides a chip system applied to an electronic device. The chip system includes one or more processors, and the processors are configured to invoke computer instructions to cause the electronic device to perform the method described in the first aspect or any possible implementation of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product is run on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions are run on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.

It can be understood that the electronic device provided in the third aspect, the chip system provided in the fourth aspect, the computer program product provided in the fifth aspect, and the computer storage medium provided in the sixth aspect are all used to execute the methods provided by the embodiments of the present application. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.

Description of drawings

FIG. 1 is an exemplary schematic diagram of an echo generation process involved in the present application;

FIG. 2 is an exemplary schematic diagram of propagation delay and processing delay involved in the application;

FIG. 3 is an exemplary schematic diagram of an echo module involved in the present application;

FIG. 4 is an exemplary schematic diagram of a usage scenario of the echo cancellation method involved in the application;

FIG. 5 is another exemplary schematic diagram of a usage scenario of the echo cancellation method involved in the present application;

FIG. 6 is an exemplary schematic diagram of the orientation diagram of the audio input device in FIG. 5;

FIG. 7 is an exemplary schematic diagram of a conference scene involved in the application;

FIG. 8 is an exemplary schematic diagram of an echo cancellation data flow in the scenario shown in FIG. 7;

FIG. 9 is an exemplary schematic diagram of an echo delay estimation method provided by an embodiment of the present application;

FIG. 10A to FIG. 10C are exemplary schematic diagrams of two position calibration methods provided by the embodiments of the present application;

FIG. 11A to FIG. 11B are exemplary schematic diagrams of a process of determining an echo delay provided by an embodiment of the present application;

FIG. 12 is an exemplary schematic diagram of establishing a three-dimensional space coordinate system by the audio output device 200 provided by the embodiment of the present application;

FIG. 13 is an exemplary schematic diagram of an echo delay estimation method under a single device provided by an embodiment of the present application;

FIG. 14 is an exemplary schematic diagram of an echo delay estimation method under multiple devices provided by an embodiment of the present application;

FIG. 15 is a schematic diagram of an exemplary hardware structure of an electronic device 100 provided by an embodiment of the present application;

FIG. 16 is a schematic diagram of an exemplary software structure of an electronic device 100 according to an embodiment of the present application.

Detailed description

The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application. As used in the specification of this application and the appended claims, the singular expressions "a," "an," "the," and "the above" are intended to also include plural expressions unless the context clearly dictates otherwise. It will also be understood that, as used in this application, the term "and/or" refers to and includes any and all possible combinations of one or more of the listed items.

Hereinafter, the terms "first" and "second" are used only for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.

For ease of understanding, related terms and related concepts involved in the embodiments of the present application are first introduced below. The terms used in the embodiments of the present invention are only used to explain specific embodiments of the present invention, and are not intended to limit the present invention.

(1) Echo

In scenarios such as network calls and video conferences, after the far-end or near-end user makes a sound, the local audio output device will play the sound made by the user before, and the sound played by the audio output device is an echo.

Specifically, taking the far-end user as an example, when the far-end user makes a sound, the sound is acquired by the far-end audio input device and transmitted to the near-end audio output device through the network. After the near-end audio output device plays the sound, since the near-end audio input device is always working, the sound is acquired by the near-end audio input device and transmitted to the far-end audio output device through the network. The sound played by the far-end audio output device then includes what the far-end user said a moment earlier, and that sound is the echo.

The following takes the content shown in FIG. 1 as an example to specifically introduce the echo generation process.

FIG. 1 is an exemplary schematic diagram of the echo generation process involved in the present application.

As shown in FIG. 1, the voice 1 of the far-end user is transmitted to the near-end audio output device 1 through the network, and the audio output device 1 plays the voice 1. Both the voice 2 of the near-end user and the voice 1 of the far-end user are acquired by the audio input device 1, and then the voice 1 and the voice 2 are transmitted to the audio output device 2 through the network. The audio output device 2 will play voice 1 and voice 2. For the far-end user, voice 1 is an echo.

(2) Echo cancellation

Echo cancellation removes the echo from the superimposed signal of the sound signal and the echo signal by pre-estimating the echo delay and applying an adaptive algorithm. The echo cancellation module is the module responsible for echo cancellation.

Among them, when the interaction delay of the communication between the audio input device and the audio output device can be ignored, the echo delay is the sum of the propagation delay and the processing delay; when the interaction delay between the audio input device and the audio output device is not negligible, the echo delay can be the sum of the propagation delay, the processing delay and the interaction delay.

Specifically, when there is a wired connection between the audio input device and the audio output device, it is considered that the interaction delay between the audio input device and the audio output device can be ignored. When the audio input device and the audio output device are wirelessly connected, in some cases, for example, when the communication quality of the channel carrying the wireless connection is poor, the transmission rate of the channel is low, or the channel delay is large, it is considered that the interaction delay between the audio input device and the audio output device cannot be ignored.

Among them, wireless connections include wireless networks and short-range communication services. The wireless network includes cellular mobile communication, Wi-Fi, and the like. The short-range communication service can take many forms, such as Bluetooth, Hi-Link, Near Field Communication (NFC), Apple Wireless Direct Link (AWDL) and other protocols, which are not limited here.

Among them, when the interaction delay between the audio input device and the audio output device can be ignored, the propagation delay is the delay of the sound propagating through space between the audio output device and the audio input device. The processing delay consists of two parts, namely processing delay 1 and processing delay 2. Processing delay 1 is the delay from the time when the sound signal on the audio output device is sent to the echo cancellation module until the sound signal is played by the audio playback module of the audio output device. Processing delay 2 is the delay from the time when the sound signal is acquired by the audio input device until the sound signal is sent to the echo cancellation module on the audio output device.

Wherein, when the interaction delay between the audio input device and the audio output device cannot be ignored, the processing delay 2 can be further divided into a processing delay 21, an interaction delay, and a processing delay 22. The processing delay 21 is the time difference between the audio input module of the audio input device collecting the sound signal and the communication module of the audio input device sending the sound signal. The interaction delay is the time difference between the audio input device sending the sound signal and the audio output device receiving the sound signal; that is, the interaction delay is related to the communication performance of the channel carrying the data interaction between the audio output device and the audio input device. The processing delay 22 is the time difference between the audio output device receiving the sound signal and the echo cancellation module on the audio output device receiving the sound signal.

It is worth noting that the propagation delay is related to the relative position between the audio output device and the audio input device; the processing delay 1 is related to the hardware and software on the audio output device; the processing delay 2 is related to the hardware and software on both the audio input device and the audio output device.
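The composition of the echo delay described above can be summarized in a minimal Python sketch. The function name and parameter names are illustrative assumptions, not terms from this application; all values are in seconds, and the interaction delay defaults to zero for the case where it can be ignored (for example, a wired connection).

```python
def echo_delay_s(propagation: float,
                 processing_1: float,
                 processing_21: float,
                 processing_22: float,
                 interaction: float = 0.0) -> float:
    """Sum the delay components described in the text (all in seconds).

    processing_21 + processing_22 together form processing delay 2 without
    the interaction delay; interaction is 0.0 when it can be ignored.
    """
    parts = (propagation, processing_1, processing_21, processing_22, interaction)
    if any(p < 0 for p in parts):
        raise ValueError("delay components must be non-negative")
    return sum(parts)


# Hypothetical numbers, chosen only to show the two cases.
wired = echo_delay_s(0.001, 0.020, 0.010, 0.005)
wireless = echo_delay_s(0.001, 0.020, 0.010, 0.005, interaction=0.030)
```

With these made-up values the wireless case is larger than the wired case by exactly the interaction delay, which is the distinction the two cases above draw.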

The following takes the content shown in FIG. 2 as an example to exemplarily introduce the propagation delay and the processing delay.

FIG. 2 is an exemplary schematic diagram of propagation delay and processing delay involved in the present application.

As shown in FIG. 2, after the digital sound signal 1 of the far-end user is transmitted to the near-end device through the network, the device will transmit the digital sound signal 1 to the echo cancellation module and to the audio output device 1 at the same time. The audio output device 1 will convert the digital sound signal 1 into an analog sound signal through a D/A converter or the like and play it out. The audio input device 1 acquires the analog sound signal, converts the analog sound signal into a digital sound signal 2 through an A/D converter or the like, and then transmits the digital sound signal 2 to the echo cancellation module. After the echo cancellation module cancels the echo, the digital sound signal output by the echo cancellation module is transmitted to the far end through the network.

In the content shown in FIG. 2, the processing delay 1 is the time difference between the digital sound signal being sent to the echo cancellation module and the audio output device 1 playing the sound signal; the propagation delay is the time the analog sound signal takes to propagate as a sound wave between the audio output device 1 and the audio input device 1; the processing delay 2 is the time difference between the audio input device 1 acquiring the sound signal and the audio input device 1 sending the digital sound signal 2 to the echo cancellation module.

The following takes the content shown in FIG. 3 as an example to exemplarily introduce how the echo cancellation module cancels the echo.

FIG. 3 is an exemplary schematic diagram of the echo cancellation module involved in this application.

As shown in FIG. 3, the input of the echo cancellation module includes the echo delay, the near-end sound, and the reference sound; the output of the echo cancellation module is the near-end sound minus the echo. Among them, the near-end sound is all sound signals obtained by the near-end audio input device; the reference sound is the sound signal transmitted by the far end through the network.

The digital filter in the echo cancellation module calculates the weights of the filter through adaptive algorithms such as the least mean square (LMS) algorithm, the normalized least mean square (NLMS) algorithm, and the proportionate normalized least mean square (PNLMS) algorithm.

It is worth noting that the closer the echo delay input to the echo cancellation module is to the delay of the real echo, the less residual echo remains in the output of the echo cancellation module.
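To illustrate why the accuracy of the input echo delay matters, the following is a minimal pure-Python sketch of an NLMS filter whose reference signal is aligned by a delay estimate. It is only an illustration under simplifying assumptions: the echo path is a single delayed, attenuated copy of the reference, there is no near-end speech, and the filter length, step size and signal lengths are arbitrary. None of these names or values come from this application.

```python
import random


def nlms_residual_energy(reference, mic, delay_estimate, taps=8, mu=0.5, eps=1e-8):
    """Run NLMS on a delay-aligned reference; return the mean squared
    residual over the second half of the signal."""
    weights = [0.0] * taps
    residual_sq = []
    for n in range(len(mic)):
        # Reference samples covered by the filter, shifted by the delay estimate.
        window = []
        for k in range(taps):
            idx = n - delay_estimate - k
            window.append(reference[idx] if 0 <= idx < len(reference) else 0.0)
        echo_estimate = sum(w * x for w, x in zip(weights, window))
        error = mic[n] - echo_estimate  # near-end sound minus estimated echo
        norm = sum(x * x for x in window) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, window)]
        residual_sq.append(error * error)
    tail = residual_sq[len(residual_sq) // 2:]
    return sum(tail) / len(tail)


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
TRUE_DELAY = 40  # samples; hypothetical echo delay
mic = [0.6 * reference[n - TRUE_DELAY] if n >= TRUE_DELAY else 0.0
       for n in range(len(reference))]

# Delay estimate close to the true delay: the 8-tap window covers lag 40.
aligned = nlms_residual_energy(reference, mic, delay_estimate=38)
# No usable delay estimate: the window covers lags 0..7 and misses the echo.
misaligned = nlms_residual_energy(reference, mic, delay_estimate=0)
```

In this toy setup the aligned run converges to a residual close to zero, while the misaligned run cannot cancel the echo because the true lag lies outside the short filter window.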

Secondly, an echo cancellation method related to the present application is introduced below:

Specifically, when the audio input device and the audio output device are located on the same device, or the positions of the audio input device and the audio output device are fixed relative to each other, the device manufacturer will pre-test the echo delay of the device and write the tested echo delay value to the echo cancellation module. When the user uses the device, the echo cancellation module of the device can work normally according to the echo delay parameters determined by the factory test to eliminate the echo.

FIG. 4 is an exemplary schematic diagram of a usage scenario of the echo cancellation method involved in this application.

As shown in FIG. 4, for a mobile phone, since the positions of the microphone and the speaker are relatively fixed, the propagation delay can be determined by manually measuring the distance between the audio input device and the audio output device. For example, when the distance between the microphone and the speaker is about 15 cm, the propagation delay in the echo delay is about 0.15/340 ≈ 0.00044 seconds (about 0.44 ms), which can be ignored. The sum of processing delay 1 and processing delay 2 can be obtained by pre-testing. In this case, the echo delay is equal to the processing delay.
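The arithmetic in this example can be reproduced with a short Python sketch. The speed of sound of 340 m/s is the value used in the text; the 16 kHz sampling rate and the function names are assumptions added only to show how small the delay is in samples.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # value used in the text; varies with temperature


def propagation_delay_s(distance_m: float,
                        speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Acoustic travel time over distance_m, in seconds."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed


def delay_in_samples(delay_s: float, sample_rate_hz: int) -> int:
    """Delay rounded to the nearest whole sample."""
    return round(delay_s * sample_rate_hz)


phone_delay = propagation_delay_s(0.15)               # microphone 15 cm from speaker
phone_samples = delay_in_samples(phone_delay, 16000)  # assumed 16 kHz sampling rate
room_samples = delay_in_samples(propagation_delay_s(3.4), 16000)
```

At the assumed sampling rate, 15 cm corresponds to only a handful of samples, whereas a few meters corresponds to well over a hundred, which is why the propagation delay stops being negligible once the devices are separated.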

When the user uses the call function, video function, or voice function, the microphone receives the voice of the local user and the voice of the remote user played by the speaker. The analog signal is amplified by an amplifier and quantized into a digital signal by an A/D converter, and the digital signal is then passed to the echo cancellation module. The echo cancellation module filters the received sound signal; after filtering out the voice of the remote user, the echo cancellation module encodes the filtered sound signal and transmits it to the opposite-end user through the network.

When the relative position between the audio input device and the audio output device is not fixed and the echo delay cannot be accurately estimated, the energy of the echo obtained by the audio input device can be reduced to compensate for the degraded echo filtering performance of the echo cancellation module caused by the inaccurate estimate of the echo delay.

FIG. 5 is another exemplary schematic diagram of a usage scenario of the echo cancellation method involved in this application.

In the scenario shown in FIG. 5, since the distance between the audio input device and the audio output device is uncertain, the propagation delay in the echo delay cannot be determined. In order to reduce the echo in this scene, the user can choose an audio input device and an audio output device with strong directivity. For example, the audio input device can be a composite pressure and pressure-difference microphone. Such a microphone has good directivity: it mainly picks up the sound signal from the main lobe direction, while the sound signal from the side lobe direction is attenuated to a greater extent.

Among them, the strength of the directivity is positively related to the ratio of the main lobe to the side lobe in the directivity pattern of the audio input device/audio output device. The directivity pattern describes the normalized response of the audio input device/audio output device to sound signals from different directions.

In the scenario shown in FIG. 5, when the sound signal from the audio output device propagates to the audio input device along path 1 or path 2, the sound played by the audio output device mainly arrives from the side lobe direction of the microphone and is therefore attenuated to a greater extent. For path 2, the energy of the sound signal is further attenuated by reflection from objects such as walls. In concert halls, conference rooms and other similar scenes, objects such as walls are covered with sound-absorbing materials, so that the energy of the sound signal propagating along path 2 is attenuated even more when it is reflected.

FIG. 6 is an exemplary schematic diagram of the directivity pattern of the audio input device in FIG. 5.

When the directivity pattern of the audio input device is as shown in FIG. 6, it can be considered that the audio input device acquires the user's voice from the 0-degree direction and acquires the echo from the 120-degree to 240-degree directions. In this case, since the response of the audio input device to sounds from different directions is different, the energy of the echo obtained by the audio input device can be reduced by improving the directivity pattern of the audio input device.
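As a rough illustration of how a directivity pattern attenuates sound arriving off the main lobe, the sketch below evaluates a generic first-order pattern, alpha + (1 - alpha)·cos(θ), which is the textbook combination of a pressure (omnidirectional) and a pressure-difference (figure-of-eight) response. The pattern actually shown in FIG. 6 is not specified numerically in the text, so the cardioid choice (alpha = 0.5) and the function names are assumptions.

```python
import math


def first_order_response(angle_deg: float, alpha: float = 0.5) -> float:
    """Normalized magnitude response alpha + (1 - alpha) * cos(angle).

    alpha = 0.5 gives a cardioid; alpha = 1.0 is omnidirectional.
    """
    return abs(alpha + (1.0 - alpha) * math.cos(math.radians(angle_deg)))


def attenuation_db(angle_deg: float, alpha: float = 0.5, floor: float = 1e-6) -> float:
    """Response in dB relative to the 0-degree direction, floored to avoid log(0)."""
    return 20.0 * math.log10(max(first_order_response(angle_deg, alpha), floor))


front = first_order_response(0)    # user's voice, main lobe
side_db = attenuation_db(120)      # edge of the assumed echo sector
rear = first_order_response(180)   # middle of the assumed echo sector
```

For the assumed cardioid, sound arriving at 120 degrees is attenuated by about 12 dB relative to the front, and sound arriving at 180 degrees falls in the null of the pattern.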

In scenarios such as conferences/calls, since multiple audio input devices are scattered at different positions in the space, the propagation delay in the echo delay cannot be ignored, and the echo delays of different audio input devices are different. First, during the conference, the absolute spatial positions of the audio output device and the audio input device can change, and the relative distance between the audio output device and different audio input devices can also change. Second, the audio output device does not know what type of device the audio input device is; similarly, the audio input device does not know what type of device the audio output device is. In this case, the processing delay of different audio input devices cannot be estimated by pre-measurement. Third, since the direction of arrival of the echo on the audio input device is not known, the echo energy cannot be reduced by selecting a highly directional audio input device.

The following takes the content shown in FIG. 7 as an example to exemplarily introduce a multi-person conference scenario.

FIG. 7 is an exemplary schematic diagram of a conference scenario involved in this application.

As shown in FIG. 7 , in a conference/call scenario, the audio output device may be a tablet, and the audio input device may be a mobile phone. In this conference scenario, the local user's random movement will cause the random movement of the audio input device. In this case, the propagation delay in the echo delay varies with time. Moreover, since all devices with an audio input function can be used as audio input devices, it is impossible for the audio output device and the audio input device to know each other's processing delay.

FIG. 8 is an exemplary schematic diagram of an echo cancellation data flow in the scenario shown in FIG. 7 .

As shown in FIG. 8 , the digital sound signal 1 of the far-end user is sent to the near-end audio output device through the network. After the audio output device receives the digital sound signal 1, it converts it into an analog sound signal 1 and plays it out. The analog sound signal 1 propagates in the near-end space.

After the near-end user 1 speaks, the audio input device 1 can acquire the analog sound signal 2 of the near-end user 1 and the analog sound signal 1 propagating in space. The audio input device 1 will acquire all the analog sound signals, convert them into digital sound signals 2, and send them to the audio output module through the network. After the audio output module performs echo cancellation on the digital sound signal 2, it sends the echo-cancelled digital sound signal to the remote device. Similarly, after the near-end user 2 speaks, the audio input device 2 can acquire the analog sound signal 3 of the near-end user 2 and the analog sound signal 2 propagating in space. The audio input device 2 will acquire all analog sound signals, convert them into digital sound signals 3, and send them to the audio output module through the network. After the audio output module performs echo cancellation on the digital sound signal 3, it sends the echo-cancelled digital sound signal to the remote device.

Since the propagation delay and the processing delay in the echo delay cannot be determined, the echo signal cannot be effectively removed from the sound signal output by the echo cancellation module on the audio output device.

It is worth noting that if the conference software, call software and other software used by the user are local applications of the device, and the local application can access the driver of the electronic device, the processing delay 2 can be determined, but the echo delay cannot be determined.

Combining the content shown in FIG. 5 to FIG. 8, it can be determined that when the distance between the audio input device and the audio output device is not fixed, especially in daily scenarios such as conferences/calls, the method of determining the echo delay by measuring in advance cannot accurately estimate the echo delay in the user's real scene, and thus cannot effectively filter and suppress the echo. Secondly, since the audio output device and the audio input device can be different types of devices, such as mobile phones, tablets, wristbands, smart glasses, VR/AR devices, vehicle terminals, etc., it is impossible to require all audio input devices and audio output devices to have good directivity.

Next, the echo delay estimation method provided by the present application is described below by taking a single audio input device 201 and a single audio output device 200 as an example.

The echo delay estimation method provided by the present application estimates the processing delay by using signaling. In addition, the method uses sensors of the device, such as a gyroscope, an accelerometer and/or an image sensor, to perform spatial modeling, estimates the distance between different audio input devices and the audio output device, and then determines the propagation delay.

In some embodiments of the present application, the electronic device may not perform step S905, that is, the processing delay is considered to be negligible, and the propagation delay calculated in steps S906 and S907 is used as the echo delay.

FIG. 9 is an exemplary schematic diagram of an echo delay estimation method provided by an embodiment of the present application.

As shown in Figure 9, the echo delay estimation method provided by the application specifically includes:

S901: The user starts a conference application.

Specifically, before the user is ready to start participating in the network conference/call, the conference/call application will be pre-launched. After starting the conference/call application, the user needs to select a suitable device as the audio input device/audio output device.

After the electronic device starts the conference application, it will find the available audio input device and audio output device through wired network, wireless network, short-range communication service or image sensor and other methods. After discovering the available audio input devices and audio output devices, the application displays the available devices on the application interface for the user to select.

For example, the user can select the tablet as the audio output device 200 and the mobile phone as the audio input device 201; or, the user can select the tablet as the audio output device 200 and the Bluetooth headset as the audio input device 201; or, the user can select the mobile phone as the audio output device 200 and a microphone as the audio input device 201; alternatively, the user can select a projector with an audio function as the audio output device 200 and a tablet as the audio input device 201, etc., which are not limited herein.

S902: Enable position calibration.

Specifically, after the user selects the audio output device and the audio input device, the electronic device reminds the user to start the position calibration. When the distance between the audio input device and the audio output device falls below the distance threshold, the position calibration is complete.

Among them, the position calibration mainly includes two methods: first, the user manually confirms that the distance between the audio input device and the audio output device is lower than the distance threshold; second, the audio output device/audio input device determines that the distance between the two is lower than the distance threshold.

Taking the content shown in FIG. 10A to FIG. 10C as an example below, two position calibration methods are exemplarily introduced respectively.

FIGS. 10A to 10C are exemplary schematic diagrams of two position calibration methods provided by the embodiments of the present application.

The first position calibration method is shown in FIG. 10A and FIG. 10B .

As shown in FIG. 10A, when the user selects the mobile phone as the audio input device 201 and the tablet as the audio output device 200, the tablet and the mobile phone will remind the user to bring the audio input device and the audio output device close to each other. For example, an interface 1001 will be displayed on the mobile phone, and the content displayed in the interface 1001 is used to remind the user that the position calibration has been turned on.

As shown in FIG. 10B, the interface 1001 and the interface 2001 may also display other contents, such as informing the user how to complete the position calibration. When the user moves the audio input device and/or the audio output device so that they are close to each other, the user can click the confirmation control 1002 on the audio input device to inform the audio input device that the position calibration has been completed; alternatively, the user can click the confirmation control 2002 on the audio output device to inform the audio output device that the position calibration has been completed.

It is worth noting that, after the user clicks the confirmation control 1002 to inform the audio input device 201 that the position calibration has been completed, the audio input device can inform the audio output device that the position calibration has been completed; correspondingly, after the user clicks the confirmation control 2002 to inform the audio output device 200 that the position calibration has been completed, the audio output device can inform the audio input device that the position calibration has been completed.

The second position calibration method is shown in FIG. 10A and FIG. 10C .

When the user selects the mobile phone as the audio input device 201 and the tablet as the audio output device 200, the audio input device 201 and the audio output device 200 can enable the short-range communication service to estimate the distance between the audio input device 201 and the audio output device 200; alternatively, sensors such as a laser sensor can be used to estimate the distance between the audio input device 201 and the audio output device 200; other methods of estimating the distance between the audio input device 201 and the audio output device 200 can also be used.

Wherein, estimating the distance between the audio input device 201 and the audio output device 200 through the short-range communication service includes: the audio input device 201/audio output device 200 determines the distance between the two according to the strength of the received signal. For example, when the audio input device 201 is connected to the audio output device 200 via Bluetooth, the distance between the audio input device 201 and the audio output device 200 can be determined from the received signal strength indicator (RSSI) and the following relation:

$$d = 10^{\frac{A - \mathrm{RSSI}}{10 n}}$$

Among them, d is the distance between the audio input device 201 and the audio output device 200, A is the signal strength when the transmitting end and the receiving end are separated by 1 meter, and n is the attenuation factor related to the environment; RSSI and A are expressed in the same unit (for example, dBm).

After the distance between the audio input device 201 and the audio output device 200 is determined, it is determined whether the distance is less than or equal to a preset distance threshold. If the distance is less than or equal to the distance threshold, it may be considered that the audio input device 201 is close to the audio output device 200, that is, the position calibration has been completed; if the distance is greater than the distance threshold, it may be considered that the audio input device 201 is not close to the audio output device 200, that is, the position calibration has not been completed.
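The RSSI-based distance check can be sketched in Python as follows. The sketch assumes the common log-distance path-loss model, in which the received strength falls by 10·n dB per decade of distance relative to the strength A measured at 1 meter. The default values for A and n, the threshold, and the function names are hypothetical placeholders; real values depend on the hardware and the environment.

```python
def rssi_to_distance_m(rssi_dbm: float,
                       rssi_at_1m_dbm: float = -59.0,    # hypothetical calibration value A
                       attenuation_factor: float = 2.0   # hypothetical environment factor n
                       ) -> float:
    """Estimate distance from RSSI under a log-distance path-loss model."""
    if attenuation_factor <= 0:
        raise ValueError("attenuation factor must be positive")
    return 10.0 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * attenuation_factor))


def position_calibrated(rssi_dbm: float, threshold_m: float, **model) -> bool:
    """True when the estimated distance is less than or equal to the threshold."""
    return rssi_to_distance_m(rssi_dbm, **model) <= threshold_m


near = position_calibrated(-45.0, threshold_m=0.3)  # strong signal, devices close
far = position_calibrated(-59.0, threshold_m=0.3)   # about 1 m apart
```

RSSI readings fluctuate considerably in practice, so an implementation would likely smooth several readings before comparing against the threshold; that step is omitted here.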

As shown in FIG. 10C, the mobile phone is used as the audio input device 201, and the tablet is used as the audio output device 200. The mobile phone displays on the interface 1001 whether the calibration is completed according to the estimated distance between the mobile phone and the tablet and the distance threshold; similarly, the tablet displays on the interface 2001 whether the calibration is completed according to the estimated distance between the mobile phone and the tablet and the distance threshold.

S903: Determine whether the audio input device 201 is close to the audio output device 200.

Specifically, when the first position calibration method is used in step S902, the audio output device 200 or the audio input device 201 determines whether a confirmation input from the user is received. The confirmation input is that the user confirms that the audio output device 200 and the audio input device 201 are close to each other in space. How the audio output device 200 or the audio input device 201 obtains the user's input can refer to the content in FIG. 10A , and details are not repeated here. If the audio input device 201 or the audio output device 200 receives the confirmation input of the user, then step S905 and step S906 are performed; if the audio input device 201 or the audio output device 200 does not receive the confirmation input of the user, then step S904 is performed.

When the second position calibration method is used in step S902, the audio output device 200 or the audio input device 201 can determine whether the distance between the audio input device 201 and the audio output device 200 is less than or equal to the distance threshold. If it is less than or equal to the distance threshold, go to step S905 and step S906; if it is greater than the distance threshold, go to step S904.
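The branching in step S903 for the two calibration methods can be sketched as a small decision function. The method labels, the default threshold and the returned step labels are illustrative assumptions used only to mirror the flow described above.

```python
from typing import Optional


def next_step(method: str,
              user_confirmed: bool = False,
              distance_m: Optional[float] = None,
              threshold_m: float = 0.3) -> str:
    """Return the step(s) to execute after S903.

    method is "manual" (first calibration method, user confirmation) or
    "automatic" (second method, estimated distance against a threshold).
    """
    if method == "manual":
        calibrated = user_confirmed
    elif method == "automatic":
        calibrated = distance_m is not None and distance_m <= threshold_m
    else:
        raise ValueError(f"unknown calibration method: {method!r}")
    # Calibrated: measure the processing delay (S905) and set up coordinates (S906).
    # Not calibrated: remind the user to move the devices (S904), then re-check.
    return "S905+S906" if calibrated else "S904"
```

For example, `next_step("automatic", distance_m=0.2)` selects S905 and S906, while `next_step("manual")` without a user confirmation selects S904.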

S904: Remind the user to adjust the positions of the audio input device 201 and the audio output device 200.

The audio input device 201 and/or the audio output device 200 reminds the user to adjust the positions of the audio input device 201 and the audio output device 200 through various methods such as screen display, playing sound, and vibration. For how the reminder is given through the content displayed on the screen, reference may be made to the description of FIG. 10A, which will not be repeated here.

Step S903 is executed.

S905: The audio output device 200 sends voice signaling; after receiving the voice signaling, the audio input device 201 forwards the voice signaling to the audio output device 200, and the processing delay is then determined.

Specifically, after the position calibration is completed, the spatial positions of the audio input device 201 and the audio output device 200 are considered to be coincident or nearly coincident. At this time, the audio output device 200 can send signaling to the audio input device 201. The signaling can be an analog signal (a sound wave) that propagates freely in space. The audio input device 201 continuously collects the sound waves in the space, and the collected sound waves include the signaling. The signaling is a signal with specific time-domain waveform characteristics or spectral characteristics.

Since it is considered that the spatial positions of the audio input device 201 and the audio output device 200 are coincident, the propagation delay can be ignored in this case, so the echo delay is equal to the processing delay.

The audio input device 201 continuously transmits the acquired sound waves to the audio output device 200 through the network or the short-range communication service. The audio output device 200 can determine whether the received signal includes the signaling according to the time-domain feature, the frequency-domain feature, or the joint time-frequency feature of the signaling, and then determine the time difference between the signaling being sent out by the echo cancellation module of the audio output device 200 and the signaling, forwarded by the audio input device 201, being received by the echo cancellation module of the audio output device 200. This time difference is the processing delay.

Wherein, since the audio output device 200 knows the time-domain feature, the frequency-domain feature or the joint time-frequency feature of the signaling, the audio output device 200 can use methods such as a cross-correlation function, a matched filter, spectrum analysis, or time-frequency analysis to determine whether the signaling has been received and the time at which the signaling was received.
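One of the methods named above, the cross-correlation function, can be sketched in pure Python as follows. The sketch assumes the signaling is a short linear chirp and that the received stream is available as a list of samples; the waveform, sampling rate, noise level, detection threshold and function names are all illustrative assumptions, not values from this application.

```python
import math
import random


def make_chirp(num_samples, f_start_hz, f_end_hz, sample_rate_hz):
    """Linear chirp used here as a stand-in for the signaling waveform."""
    duration = num_samples / sample_rate_hz
    chirp = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        phase = 2.0 * math.pi * (f_start_hz * t
                                 + 0.5 * (f_end_hz - f_start_hz) * t * t / duration)
        chirp.append(math.sin(phase))
    return chirp


def find_signaling(received, template, threshold=0.5):
    """Slide the template over the received samples.

    Returns (lag, score) for the best normalized cross-correlation, or None
    when no lag reaches the threshold (signaling considered absent).
    """
    template_energy = sum(x * x for x in template)
    best_lag, best_score = None, 0.0
    for lag in range(len(received) - len(template) + 1):
        segment = received[lag:lag + len(template)]
        segment_energy = sum(x * x for x in segment)
        if segment_energy == 0.0:
            continue
        corr = sum(a * b for a, b in zip(segment, template))
        score = corr / math.sqrt(segment_energy * template_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None or best_score < threshold:
        return None
    return best_lag, best_score


SAMPLE_RATE = 8000
template = make_chirp(256, 1000.0, 3000.0, SAMPLE_RATE)
rng = random.Random(1)
noise_only = [rng.gauss(0.0, 0.05) for _ in range(1500)]
received = list(noise_only)
OFFSET = 480  # hypothetical arrival position of the signaling, in samples
for i, s in enumerate(template):
    received[OFFSET + i] += 0.5 * s

hit = find_signaling(received, template)
miss = find_signaling(noise_only, template)
arrival_time_s = hit[0] / SAMPLE_RATE
```

The lag of the correlation peak gives the reception time of the signaling relative to the start of the received buffer; relating that to the sending time yields the processing delay.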

It should be noted that, after the audio output device 200 and/or the audio input device 201 completes the position calibration, the audio output device 200 may inform the audio input device 201 that the position calibration is successful; or, the user may inform the audio input device 201 through interaction that the position calibration is successful. After the position calibration is successful, the user can freely move the audio input device 201 and/or the audio output device 200.

It should be noted that, in some cases, the audio input device 201 may send the signaling, and after receiving the signaling, the audio output device 200 forwards the signaling to the audio input device 201 so that the processing delay can be determined. After the audio input device 201 determines the processing delay, it sends the value of the processing delay to the audio output device 200.

The following takes the content shown in FIG. 11A to FIG. 11B as an example to exemplarily introduce the process in which the audio output device 200 determines whether the received signal includes the signaling and determines the processing delay.

FIG. 11A to FIG. 11B are exemplary schematic diagrams of a process of determining an echo delay provided by an embodiment of the present application.

As shown in FIG. 11A , after the signaling 1 is generated by the audio output device 200 , it can be transmitted to the audio output module of the audio output device 200 via the echo cancellation module of the audio output device 200 . The audio output module of the audio output device 200 converts the digital signal signaling 1 into a sound wave signal and plays it through a speaker; the audio input module in the audio input device 201 acquires and converts the sound wave signaling 1 into a digital signal signaling 1. The audio input module transmits signaling 1 to the communication module of the audio input device 201 . The communication module of the audio input device 201 will transmit the signaling 1 to the communication module of the audio output device 200 through the network or the short-range transmission service. The communication module of the audio output device 200 transmits the received signaling 1 to the echo cancellation module of the audio output device 200 .

Since the position calibration of the audio input device 201 and the audio output device 200 has been completed, the spatial distance between the audio input device 201 and the audio output device 200 can be ignored, and thus the propagation delay in the echo delay can be ignored. The audio output device 200 records the time when signaling 1 is sent by the echo cancellation module as T₁, and records the time when signaling 1 is received by the echo cancellation module as T₂. The audio output device 200 may determine the processing delay in the echo delay according to T₁ and T₂. For example, processing delay = T₂ - T₁.
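The bookkeeping of T₁ and T₂ can be sketched with a small helper class. The class name, method names and the injectable clock are assumptions made for illustration; a monotonic clock is used by default so that wall-clock adjustments do not distort the difference.

```python
import time
from typing import Callable, Optional


class SignalingTimer:
    """Records T1 (signaling sent) and T2 (signaling received back)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.t1: Optional[float] = None
        self.t2: Optional[float] = None

    def mark_sent(self) -> None:
        """Call when the echo cancellation module sends signaling 1."""
        self.t1 = self._clock()
        self.t2 = None

    def mark_received(self) -> None:
        """Call when the echo cancellation module receives signaling 1."""
        if self.t1 is None:
            raise RuntimeError("signaling has not been sent yet")
        self.t2 = self._clock()

    def processing_delay(self) -> float:
        """Processing delay = T2 - T1, in seconds."""
        if self.t1 is None or self.t2 is None:
            raise RuntimeError("both T1 and T2 must be recorded")
        return self.t2 - self.t1


# Usage with a fake clock so the example is deterministic.
ticks = iter([10.000, 10.085])
timer = SignalingTimer(clock=lambda: next(ticks))
timer.mark_sent()
timer.mark_received()
delay = timer.processing_delay()
```

Because both timestamps are taken on the audio output device 200, no clock synchronization between the two devices is needed for this measurement.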

It can be understood that the processing delay calculated by sending the signaling is equivalent to the delay from the time when the sound sent by the remote electronic device reaches the communication module on the audio output device 200 to the time when the echo forwarded by the audio input device 201 reaches the communication module on the audio output device 200.

As shown in FIG. 11B, the audio output device 200 can determine, according to the time-domain feature of signaling 1, that the echo cancellation module receives the signaling 1 forwarded by the audio input device 201 at time T₂. The processing delay in the echo delay is then determined.

Step S908 is executed.

S906: Establish a three-dimensional space coordinate system.

Specifically, after the calibration is completed, the audio output device 200 may establish a three-dimensional space coordinate system with itself as the origin.

In some embodiments of the present application, both the audio output device 200 and the audio input device 201 may establish a three-dimensional space coordinate system with itself as the origin.

The three-dimensional space coordinate system may be a Cartesian coordinate system, a polar coordinate system, a spherical coordinate system, or the like, which is not limited here.

Step S907 is executed.

S907: The audio input device 201 sends motion information, location information or distance information to the audio output device 200.

After the audio output device 200 establishes the three-dimensional space coordinate system, the audio input device 201 may periodically send motion information to the audio output device 200 through the network/short-range communication service. The motion information includes one or more of speed, acceleration, azimuth angle, pitch angle, yaw angle, and the like. After the audio output device 200 receives the motion information sent by the audio input device 201, it can calculate the coordinates of the audio input device 201 in the three-dimensional space coordinate system, then determine the distance between the audio input device 201 and itself, and determine the propagation delay in the echo delay according to the distance. Wherein, after determining the motion information, the audio input device 201 or the audio output device 200 may combine the motion information with an inertial navigation algorithm to determine the position information of the moving device.
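The chain from motion information to propagation delay can be sketched as follows. This is a deliberately simple dead-reckoning illustration, not the inertial navigation algorithm of this application: it assumes the acceleration samples are already expressed in the coordinate system of the audio output device 200 with gravity removed, that the device starts at rest at the origin (the calibrated position), and that the speed of sound is 340 m/s. Function names are hypothetical.

```python
import math


def integrate_motion(accelerations, dt,
                     position=(0.0, 0.0, 0.0), velocity=(0.0, 0.0, 0.0)):
    """Integrate acceleration samples (m/s^2) taken every dt seconds.

    Velocity is updated first and then used to update position
    (semi-implicit Euler). Returns (position, velocity).
    """
    px, py, pz = position
    vx, vy, vz = velocity
    for ax, ay, az in accelerations:
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        px += vx * dt
        py += vy * dt
        pz += vz * dt
    return (px, py, pz), (vx, vy, vz)


def propagation_delay_from_position(position, speed_of_sound=340.0):
    """Distance from the origin (audio output device) divided by the speed of sound."""
    distance = math.sqrt(sum(c * c for c in position))
    return distance / speed_of_sound


# One second of constant 1 m/s^2 acceleration along x, sampled at 10 Hz.
samples = [(1.0, 0.0, 0.0)] * 10
position, velocity = integrate_motion(samples, dt=0.1)
delay_s = propagation_delay_from_position(position)
```

Integrating raw accelerometer data drifts quickly in practice, which is one reason such estimates are usually combined with orientation data or other sensors.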

If the audio input device 201 also establishes a three-dimensional space coordinate system, the audio input device 201 can determine its own coordinates according to the motion information. The audio input device 201 sends its own location information to the audio output device 200 . The location information may include coordinates. After the audio output device 200 determines the location of the audio input device 201 , the distance between the two can be determined according to its own location and the location of the audio input device 201 .

If the audio input device 201 also establishes a three-dimensional space coordinate system, when the audio output device 200 is located at the origin, the audio input device 201 can directly determine its own coordinates according to the motion information, determine the distance between itself and the audio output device 200 according to its own coordinates, and send the distance information to the audio output device 200. The distance information includes the distance between the audio input device 201 and the audio output device 200.

Step S908 is executed.

It should be noted that the audio input device 201 may periodically send motion information, location information or distance information to the audio output device 200. Alternatively, the audio input device 201 sends the position information or distance information to the audio output device 200 only when it determines that the distance between its current position and the position at which it last sent the position information or distance information to the audio output device 200 is greater than a threshold.
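The threshold-triggered reporting described above may be sketched as follows. The class name `PositionReporter`, the threshold value in the example, and the choice to return `None` when nothing needs to be sent are illustrative assumptions only.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class PositionReporter:
    """Decides whether a new position is worth sending to the audio output device."""

    def __init__(self, threshold_m: float, initial: Vec3 = (0.0, 0.0, 0.0)) -> None:
        self.threshold_m = threshold_m
        self.last_sent = initial  # position contained in the last report

    def update(self, position: Vec3) -> Optional[Vec3]:
        """Return the position if it should be sent, otherwise None."""
        if math.dist(position, self.last_sent) > self.threshold_m:
            self.last_sent = position
            return position
        return None


reporter = PositionReporter(threshold_m=0.5)
reporter.update((0.3, 0.0, 0.0))  # moved 0.3 m: below the threshold, nothing is sent
reporter.update((0.6, 0.0, 0.0))  # moved 0.6 m since the last report: sent
```

Compared with purely periodic reporting, this variant avoids sending updates while the device is stationary, at the cost of one distance computation per sample.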

FIG. 12 is an exemplary schematic diagram of establishing a three-dimensional space coordinate system by the audio output device 200 according to an embodiment of the present application.

As shown in FIG. 12, after the coordinate system is established, the audio input device 201 can determine its own coordinates according to the motion information. For example, at a certain moment, the audio input device 201 determines its own coordinates as $(x_0, y_0, z_0)$ according to the motion information; at another moment, the audio input device 201 determines its own coordinates as $(x_1, y_1, z_1)$ according to the motion information.

The audio input device 201 can send its own position information, such as $(x_0, y_0, z_0)$ or $(x_1, y_1, z_1)$, to the audio output device 200 through the network. Alternatively, when the audio output device 200 does not move and remains at the origin, the audio input device 201 sends its distance information, such as

$\sqrt{x_0^2 + y_0^2 + z_0^2}$

or

$\sqrt{x_1^2 + y_1^2 + z_1^2}$,

to the audio output device 200. Alternatively, when the audio input device 201 has determined the position information of the audio output device 200, such as $(x_2, y_2, z_2)$, the audio input device 201 sends its distance information, such as

$\sqrt{(x_0 - x_2)^2 + (y_0 - y_2)^2 + (z_0 - z_2)^2}$

or

$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$,

to the audio output device 200.

Similarly, the audio output device 200 can determine the distance between the audio output device 200 and the audio input device 201 after receiving the motion information or the position information sent by the audio input device 201 .
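As an illustration of how a distance can be turned into the propagation delay, the following sketch computes the Euclidean distance between the two devices and divides it by the speed of sound. The value of 343 m/s (air at about 20 °C) and the function name `propagation_delay_s` are assumptions made for this example and are not specified by the present application.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Assumption: speed of sound in air at roughly 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def propagation_delay_s(input_pos: Vec3, output_pos: Vec3 = (0.0, 0.0, 0.0)) -> float:
    """Time in seconds for sound to travel from the output device to the input device."""
    distance_m = math.dist(input_pos, output_pos)
    return distance_m / SPEED_OF_SOUND_M_S


# A microphone 3.43 m from a loudspeaker at the origin hears it about 10 ms late.
delay = propagation_delay_s((3.43, 0.0, 0.0))
```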

S908: The echo cancellation module determines the echo delay, and performs echo cancellation on the received sound signal.

According to step S905, the audio output device 200 can determine the processing delay in the echo delay; according to steps S906 and S907, it can determine the propagation delay in the echo delay. The audio output device 200 may then determine the echo delay for the audio input device 201, where echo delay = processing delay + propagation delay. After determining the echo delay for the audio input device 201, the audio output device 200 can filter out the echo in the sound obtained by the audio input device 201 according to the echo delay, and send the filtered sound to the remote user's device.
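The relation echo delay = processing delay + propagation delay can be sketched as follows. The example converts the echo delay into a whole number of samples and shifts the far-end reference signal by that amount, which is one common way to align the reference with the captured sound before an echo canceller is applied. The function names, the 16 kHz sample rate, and the use of a simple zero-padded shift are assumptions for illustration; the echo cancellation filter itself is not shown.

```python
from typing import List, Sequence


def echo_delay_samples(processing_delay_s: float,
                       propagation_delay_s: float,
                       sample_rate_hz: int) -> int:
    """Echo delay = processing delay + propagation delay, rounded to whole samples."""
    return round((processing_delay_s + propagation_delay_s) * sample_rate_hz)


def delay_reference(reference: Sequence[float], delay: int) -> List[float]:
    """Shift the far-end reference by `delay` samples, zero-padding the start."""
    if delay <= 0:
        return list(reference)
    return ([0.0] * delay + list(reference))[:len(reference)]


# 20 ms processing delay plus 10 ms propagation delay at 16 kHz.
n = echo_delay_samples(0.020, 0.010, 16000)
aligned = delay_reference([1.0, 2.0, 3.0, 4.0], 2)
```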

End.

It is worth noting that, in scenarios such as network calls and video conferences, when a new device joins the call or conference, steps S901 to S908 can be executed in full, so that the audio output device 200 can determine the echo delay of the new user's new device. The echo delay of the new device is related to the distance between the new device and the audio output device 200.

Taking a single audio input device and a single audio output device as examples below, the echo delay estimation method provided by the present application as shown in FIG. 9 is exemplarily introduced in the form of a data flow.

FIG. 13 is an exemplary schematic diagram of an echo delay estimation method under a single device provided by an embodiment of the present application.

As shown in FIG. 13, corresponding to steps S901 and S902, after the user starts the conference/call application and selects the appropriate audio input device 201 and audio output device 200, the audio input device 201 and the audio output device 200 start position calibration.

Corresponding to steps S903 and S904, the audio input device 201 and the audio output device 200 continuously perform position calibration until the position calibration is successful. Whether the position calibration is successful may be determined in either of two ways: when the audio input device 201/audio output device 200 provides an interactive control for the user, according to whether the audio input device 201/audio output device 200 receives the user's confirmation input; or the audio input device 201/audio output device 200 may determine whether the position calibration is complete through the mobile cellular network, Wi-Fi, or a short-range communication service.

After the audio output device 200 receives the confirmation input from the user, the audio output device 200 determines that the position calibration is successful, and sends a message to inform the audio input device 201 that the position calibration is successful; correspondingly, if the audio input device 201 receives the confirmation input from the user, the audio input device 201 determines that the position calibration is successful, and sends a message to inform the audio output device 200 that the position calibration is successful.

When the audio input device 201 or the audio output device 200 uses the mobile cellular network, Wi-Fi, or a short-range communication service, the device determines whether the distance between the two is less than a distance threshold. In one case, either the audio input device 201 or the audio output device 200 determines, according to the mobile cellular network, Wi-Fi, or short-range communication service, that the distance between the two is less than the distance threshold, considers the position calibration successful, and sends a message to inform the other device that the calibration is successful. In another case, both the audio input device 201 and the audio output device 200 determine, according to the mobile cellular network, Wi-Fi, or short-range communication service, that the distance between them is less than the distance threshold, and the position calibration is considered successful.
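The two decision rules above (either device, or both devices, finding the distance below the threshold) can be written as a small predicate. The function name `is_calibrated`, the `require_both` flag, and the convention that `None` means a device has no distance estimate yet are assumptions for this sketch.

```python
from typing import Optional


def is_calibrated(distance_seen_by_output_m: Optional[float],
                  distance_seen_by_input_m: Optional[float],
                  threshold_m: float,
                  require_both: bool = False) -> bool:
    """Return True when position calibration is considered successful.

    Each argument is one device's own estimate of the distance between the
    two devices, or None if that device has no estimate yet.
    """
    votes = [
        d is not None and d < threshold_m
        for d in (distance_seen_by_output_m, distance_seen_by_input_m)
    ]
    return all(votes) if require_both else any(votes)


# Only the output device has an estimate, and it is below the 0.1 m threshold.
is_calibrated(0.05, None, threshold_m=0.1)                      # either-device rule
is_calibrated(0.05, None, threshold_m=0.1, require_both=True)   # both-device rule
```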

Corresponding to step S905, after the position calibration is successful, the audio output device 200 plays a signaling (sound wave) into the space, and the audio input device 201 continuously collects the sound in the space, converts the sound into a digital signal, and transmits it to the audio output device 200 through the network/near field communication service. The audio output device 200 determines the time difference between sending the signaling and receiving the signaling as the processing delay in the echo delay.
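One possible way to locate the signaling in the returned stream and obtain the time difference is a sliding correlation against the known signaling waveform, sketched below. The present application only states that the signaling is recognized by its time domain feature; the use of correlation, the function names, and the assumption that the arrival time of the first sample of the returned stream is known on the audio output device's clock are all illustrative choices.

```python
from typing import Sequence


def find_probe_offset(captured: Sequence[float], probe: Sequence[float]) -> int:
    """Return the sample offset at which `probe` best matches `captured`."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(captured) - len(probe) + 1):
        # Correlation score of the probe against this window of the capture.
        score = sum(c * p for c, p in zip(captured[offset:], probe))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def processing_delay_s(t_sent_s: float, t_stream_start_s: float,
                       offset: int, sample_rate_hz: int) -> float:
    """Time from playing the signaling to finding it in the returned stream."""
    t_received_s = t_stream_start_s + offset / sample_rate_hz
    return t_received_s - t_sent_s


probe = [1.0, -1.0, 1.0]
captured = [0.0, 0.0, 0.0, 1.0, -1.0, 1.0, 0.0, 0.0]
offset = find_probe_offset(captured, probe)
delay = processing_delay_s(10.0, 10.0, offset, sample_rate_hz=1000)
```

Because the two devices are close together during calibration, the acoustic propagation time is small, so the measured difference is dominated by capture, conversion, and transmission time.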

Corresponding to step S906, the audio output device 200 establishes a three-dimensional space coordinate system, wherein the initial positions of the audio output device 200 and the audio input device 201 are (0, 0, 0).

Corresponding to step S907 , the audio input device 201 may periodically calculate and update its position (coordinates) in the three-dimensional space coordinate system, and send the information to the audio output device 200 .

Corresponding to step S908, after receiving the location information of the audio input device 201, the audio output device 200 can determine the distance between the audio input device 201 and the audio output device 200 in combination with its own location. The propagation delay in the echo delay can then be determined according to this distance. Combined with the previously determined processing delay, the audio output device 200 can determine the echo delay.

The audio output device 200 can continuously adjust the parameters of the echo cancellation module based on the determined echo delay, so that echo cancellation can be performed on the received sound signal.

Secondly, the following takes multiple audio input devices and a single audio output device as examples to exemplarily introduce the echo delay estimation method provided by the present application as shown in FIG. 9 in the form of a data flow.

In scenarios such as network calls and video conferences, when a new device joins the call or conference, the new device, as the audio input device 202, can perform position calibration with another audio input device 201 that has completed position calibration and has started to work normally, and send the location information and the processing delay 21 to the audio output device 200. The audio output device 200 determines the echo delay of the audio input device 202 according to the network/near field communication service delay, the received location information, the processing delay 21, and the processing delay 22.

FIG. 14 is an exemplary schematic diagram of an echo delay estimation method under multiple devices provided by an embodiment of the present application.

As shown in FIG. 14 , the audio output device 200 has determined and continuously updated the echo delay of the audio input device 201 . When the audio input device 202 joins the conference as a new device, the position calibration can be performed with the audio input device 201 as a reference.

Corresponding to steps S902 , S903 and S904 , when the new audio input device 202 joins the conference/call, it can perform position calibration with the audio input device 201 .

When the position calibration is completed, the audio input device 201 sends its own position (coordinates) information to the audio input device 202, and the audio input device 202 uses it as the position of the audio input device 202.

Corresponding to step S907 , the audio input device 202 may periodically calculate and update its position (coordinates) in the three-dimensional space coordinate system, and send the information to the audio output device 200 .

The audio output device 200 determines and updates the echo delay of the audio input device 202 based on the location information of the audio input device 202.

The audio input device 202 sends its own processing delay 21 to the audio output device 200. The audio output device 200 may determine its own processing delay 22 and the interaction delay introduced by the network/near field communication service. In turn, the audio output device 200 may determine the echo delay of the audio input device 202.
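A minimal sketch of how these components might be combined is given below. The text lists the inputs but does not give an explicit formula for the multi-device case, so the sketch assumes that the components simply add; the 343 m/s speed of sound and the names `DelayInputs` and `echo_delay_s` are likewise assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Assumption: speed of sound in air at roughly 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


@dataclass
class DelayInputs:
    processing_delay_21_s: float   # reported by the audio input device 202
    processing_delay_22_s: float   # measured by the audio output device 200
    link_delay_s: float            # network/near field communication service delay
    input_position: Vec3           # reported position of the audio input device 202
    output_position: Vec3 = (0.0, 0.0, 0.0)


def echo_delay_s(d: DelayInputs) -> float:
    """Assumed additive model: both processing delays + link delay + propagation delay."""
    propagation_s = math.dist(d.input_position, d.output_position) / SPEED_OF_SOUND_M_S
    return d.processing_delay_21_s + d.processing_delay_22_s + d.link_delay_s + propagation_s


total = echo_delay_s(DelayInputs(0.005, 0.004, 0.011, (3.43, 0.0, 0.0)))
```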

Corresponding to step S908, the audio output device 200 can adjust the parameters of the echo cancellation module based on the echo delay of the audio input device 202 to filter out the echo in the sound collected by the audio input device 202; correspondingly, the audio output device 200 can adjust the parameters of the echo cancellation module based on the echo delay of the audio input device 201 to filter out the echo in the sound collected by the audio input device 201.

Next, the electronic device provided by the embodiments of the present application is described below.

The audio input device 201 , the audio input device 202 , and the audio output device 200 in the embodiments of the present application may be the electronic device 100 hereinafter.

FIG. 15 is a schematic diagram of an exemplary hardware structure of an electronic device 100 according to an embodiment of the present application.

The electronic device 100 may be a cell phone, tablet computer, desktop computer, laptop computer, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, cellular telephone, personal digital assistant (PDA), augmented reality (AR) device, virtual reality (VR) device, artificial intelligence (AI) device, wearable device, in-vehicle device, smart home device, and/or smart city device. The embodiments of the present application do not specifically limit the type of the electronic device.

The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It can be understood that the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.

The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be fetched directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.

The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces. For example, the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 . In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.

The PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.

The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160 . For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.

The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 . MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc. In some embodiments, the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 . The processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .

The GPIO interface can be configured by software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones. The interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.

The charging management module 140 is used to receive charging input from the charger.

The power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .

The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.

Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 . The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like. The mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .

The modem processor may include a modulator and a demodulator. Wherein, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 . In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.

Display screen 194 is used to display images, videos, and the like.

The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.

Camera 193 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.

The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.

Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.

The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process the input information, and can continuously learn by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.

The internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).

Random access memory can include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, the fifth generation DDR SDRAM is generally called DDR5 SDRAM), etc.

Non-volatile memory may include magnetic disk storage devices and flash memory.

Flash memory can be divided by operating principle into NOR FLASH, NAND FLASH, 3D NAND FLASH, etc.; by the number of potential levels of the storage cell into single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), etc.; and by storage specification into universal flash storage (UFS), embedded multimedia card (eMMC), etc.

The random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.

The non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.

The external memory interface 120 can be used to connect an external non-volatile memory, so as to expand the storage capacity of the electronic device 100. The external non-volatile memory communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving files such as music and video in the external non-volatile memory.

The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone jack 170D, the application processor, and the like.

The audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .

The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.

The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the voice can be heard by placing the receiver 170B close to the human ear.

The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.

The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. A capacitive pressure sensor may comprise at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, the instruction for creating a new short message is executed.
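The threshold behavior in the short message example can be sketched as follows. This is only an illustrative sketch: the function name `dispatch_sms_icon_touch`, the instruction labels, the threshold value, and the intensity units are hypothetical and are not specified by the application.

```python
# Hypothetical first pressure threshold, in arbitrary intensity units.
FIRST_PRESSURE_THRESHOLD = 0.5


def dispatch_sms_icon_touch(intensity: float,
                            threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction.

    Intensity below the threshold -> view the short message.
    Intensity at or above the threshold -> create a new short message.
    """
    if intensity < 0:
        raise ValueError("touch intensity must be non-negative")
    if intensity < threshold:
        return "view_short_message"
    return "create_short_message"


if __name__ == "__main__":
    for value in (0.2, 0.5, 0.9):
        print(value, dispatch_sms_icon_touch(value))
```

The comparison uses "less than" for the first branch so that an intensity exactly equal to the threshold falls into the second branch, matching the "greater than or equal to" wording above.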

The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to offset the shaking of the electronic device 100 through reverse motion to achieve anti-shake. The gyro sensor 180B can also be used for navigation and somatosensory game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
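The application does not state how altitude is derived from the air pressure value. As one possible illustration, the sketch below uses the widely used international barometric formula for the standard atmosphere; the function name `altitude_from_pressure` and the default sea-level reference pressure are assumptions made for this example.

```python
# Standard sea-level pressure in hPa (an assumed reference; in practice the
# reference pressure varies with weather and would need calibration).
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Estimate altitude in meters from air pressure in hPa.

    Uses the international barometric formula:
        h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


if __name__ == "__main__":
    for p in (1013.25, 950.0, 900.0):
        print(f"{p:8.2f} hPa -> {altitude_from_pressure(p):7.1f} m")
```

Because the reference pressure changes with the weather, an estimate of this kind is typically useful for relative height changes or as an aid to other positioning methods, consistent with the "assist in positioning and navigation" role described above.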

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover by means of the magnetic sensor 180D. Further, features such as automatic unlocking upon flip-open are set according to the detected opening and closing state of the holster or of the flip cover.

The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to identify the posture of the electronic device, and can be used in applications such as landscape/portrait screen switching and pedometers.

The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure distance to achieve fast focusing.

The proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes. The light emitting diodes may be infrared light emitting diodes. The electronic device 100 emits infrared light to the outside through the light emitting diode. The electronic device 100 uses photodiodes to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180L is used to sense ambient light brightness. The fingerprint sensor 180H is used to collect fingerprints.

The temperature sensor 180J is used to detect the temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy.

The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to touch operations may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100, at a location different from that of the display screen 194.

The bone conduction sensor 180M can acquire vibration signals.

The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.

The motor 191 can generate vibration prompts.

The indicator 192 can be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.

The SIM card interface 195 is used to connect a SIM card.

In some embodiments of the present application, in steps S903 and S904, the audio input device 201/audio output device 200 may use the distance sensor 180F to determine the distance between the audio input device 201 and the audio output device 200.

In some embodiments of the present application, in steps S903 and S904, the audio input device 201/audio output device 200 may use the wireless communication module 160 and the mobile communication module 150 to determine the distance between the audio input device 201 and the audio output device 200.

In some embodiments of the present application, in step S905, after the distance between the audio input device 201 and the audio output device 200 is determined, information such as the temperature and air pressure in the current space can be obtained from the air pressure sensor 180C, the temperature sensor 180J, and the like. This information allows the speed of sound to be determined more accurately, and therefore the propagation delay component of the echo delay to be determined more accurately.
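The relationship between distance, speed of sound, and echo delay can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application: the function names are hypothetical, and the speed of sound is computed with the common dry-air approximation c ≈ 331.3 · sqrt(1 + T / 273.15) m/s, which depends only on temperature T in degrees Celsius. Under this ideal-gas approximation the air pressure does not appear, so any pressure correction the application may intend is left out here.

```python
import math


def speed_of_sound(temperature_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at a temperature in Celsius."""
    if temperature_c <= -273.15:
        raise ValueError("temperature must be above absolute zero")
    return 331.3 * math.sqrt(1.0 + temperature_c / 273.15)


def propagation_delay(distance_m: float, temperature_c: float = 20.0) -> float:
    """Time in seconds for sound to travel distance_m between two devices."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed_of_sound(temperature_c)


def echo_delay(distance_m: float,
               processing_delay_s: float = 0.0,
               temperature_c: float = 20.0) -> float:
    """Echo delay as the sum of the propagation delay and the processing delay."""
    if processing_delay_s < 0:
        raise ValueError("processing delay must be non-negative")
    return propagation_delay(distance_m, temperature_c) + processing_delay_s


if __name__ == "__main__":
    # Hypothetical values: devices 3 m apart, 50 ms processing delay.
    for temp in (0.0, 20.0, 35.0):
        total_ms = echo_delay(3.0, 0.050, temp) * 1000.0
        print(f"{temp:5.1f} C: echo delay = {total_ms:.3f} ms")
```

At 20 °C and a distance of 3 m, the propagation delay alone is roughly 8.7 ms, so a temperature change of a few tens of degrees shifts it by only a fraction of a millisecond. Whether that matters depends on the resolution of the downstream echo canceller.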

It should be noted that the audio input device may not have the receiver 170B; the audio output device may not have the microphone 170C.

The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.

The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime and system libraries, and a kernel layer.

The application layer can include a series of application packages.

As shown in Figure 16, the application packages can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message. The application packages can also include meeting/calling applications such as WeLink.

The application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions. For example, the function may be a method for determining the distance between the audio input device 201 and the audio output device 200; or the function may be a method for establishing a three-dimensional space coordinate system; or the function may be a method for sending motion information, location information, or distance information to other devices.

As shown in Figure 16, the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.

A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.

Content providers are used to store and retrieve data and make these data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.

The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. The view system can be used to build applications. A display interface can consist of one or more views. For example, a display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures, and may include the interface shown in FIG. 7.

The phone manager is used to provide the communication functions of the electronic device 100, for example, the management of call status (including connecting, hanging up, etc.).

The resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.

The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify of download completion, provide message reminders, and so on. The notification manager can also display notifications in the status bar at the top of the system in the form of charts or scroll-bar text, such as notifications of applications running in the background, or display notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, or the indicator light flashes.

The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.

The core libraries consist of two parts: one is the functions that the Java language needs to call, and the other is the core libraries of Android.

The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

The system libraries can include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.

The Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers. An application can call the drivers of the kernel layer through the application framework layer, for example, call the audio driver to play signaling, call the sensor driver to determine the distance from other devices, or call the sensor driver to determine its own position.

As used in the above embodiments, the term "when" may be interpreted to mean "if" or "after" or "in response to determining..." or "in response to detecting..." depending on the context. Similarly, depending on the context, the phrases "in determining..." or "if detecting (the stated condition or event)" can be interpreted to mean "if determining..." or "in response to determining..." or "on detecting (the stated condition or event)" or "in response to the detection of (the stated condition or event)".

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line) or a wireless manner (e.g., infrared, wireless, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), semiconductor media (e.g., solid state drives), and the like.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The aforementioned storage medium includes media that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Claims

A method for echo delay estimation, comprising:

At the first moment, the first electronic device determines a first distance, where the first distance is the distance between the first electronic device and the second electronic device at the first moment;

The first electronic device determines a first propagation delay based on the first distance and the speed of sound, where the first propagation delay is the delay for a sound signal to propagate from the first electronic device to the second electronic device at the first moment;

The first electronic device determines that the first propagation delay is the first echo delay, or the first electronic device determines that the sum of the first propagation delay and the first processing delay is the first echo delay, where the first processing delay is the sum of the delay from the first electronic device acquiring a voice signal to playing the voice signal and the delay from the second electronic device collecting the voice signal to the second electronic device forwarding the voice signal to the first electronic device.
The method according to claim 1, wherein the method further comprises:

At a second time after the first time, the first electronic device determines a second distance, where the second distance is the distance between the first electronic device and the second electronic device at the second time;

The first electronic device determines a second propagation delay based on the second distance and the speed of sound, where the second propagation delay is the delay for a sound signal to propagate from the first electronic device to the second electronic device at the second moment;

The first electronic device determines that the second propagation delay is the second echo delay, or the first electronic device determines that the sum of the second propagation delay and the first processing delay is the second echo delay;

The second echo delay is different from the first echo delay.
The method according to claim 1 or 2, wherein, at the first moment, before the first electronic device determines the first distance, the method further comprises:

At a third time point before the first time point, the first electronic device determines the first processing delay.
The method according to claim 3, wherein, at a third time before the first time, the first electronic device determines the first processing delay, which specifically includes:

At a third moment before the first moment, when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays the first audio;

The first electronic device receives the first audio sent by the second electronic device through a wireless network/near field communication service;

The first electronic device determines that a time difference between playing the first audio and receiving the first audio is the first processing delay.
The method according to claim 3, wherein, at a third time before the first time, the first electronic device determines the first processing delay, which specifically includes:

At a third moment before the first moment, in response to the user's input, the first electronic device plays the first audio;

The first electronic device receives the first audio sent by the second electronic device through a wireless network/near field communication service;

The first electronic device determines that a time difference between playing the first audio and receiving the first audio is the first processing delay.
The method according to any one of claims 1 to 5, wherein the determining of the first distance by the first electronic device specifically includes:

the first electronic device receives first motion information, the first motion information includes a motion state of the second electronic device;

The first electronic device determines the first distance based on the first motion information.
The method according to any one of claims 1 to 5, wherein the determining of the first distance by the first electronic device specifically includes:

the first electronic device receives first location information, the first location information including the location of the second electronic device;

The first electronic device determines the first distance based on the first location information.
The method according to any one of claims 1 to 5, wherein the determining of the first distance by the first electronic device specifically includes:

the first electronic device receives first distance information, the first distance information includes the first distance;

The first electronic device determines the first distance based on the first distance information.
The method according to any one of claims 1 to 8, wherein the method further comprises:

At the fourth moment, the first electronic device determines a second processing delay, where the second processing delay is the sum of the delay from the first electronic device acquiring a voice signal to playing the voice signal and the delay from the third electronic device collecting the voice signal to the third electronic device forwarding the voice signal to the first electronic device;

At a fifth time after the fourth time, the first electronic device determines a third distance, where the third distance is the distance between the first electronic device and the third electronic device at the fifth time;

The first electronic device determines a third propagation delay based on the third distance and the speed of sound, where the third propagation delay is the delay for a sound signal to propagate from the first electronic device to the third electronic device at the fifth moment;

The first electronic device determines that the sum of the third propagation delay and the second processing delay is a third echo delay.
The method according to claim 3, wherein, at a third time before the first time, the first electronic device determines the first processing delay, which specifically includes:

When the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays the first audio;

the second electronic device collects the first audio;

The second electronic device sends the first audio to the first electronic device through a wireless network/near field communication service;

The first electronic device determines that a time difference between playing the first audio and receiving the first audio is the first processing delay.
The method according to claim 3, wherein when the distance between the first electronic device and the second electronic device is less than a distance threshold, the first electronic device plays signaling, which specifically includes:

At a third moment before the first moment, in response to the user's input, the first electronic device plays the first audio;

the second electronic device collects the first audio;

The second electronic device sends the first audio to the first electronic device through a wireless network/near field communication service;

The first electronic device determines that a time difference between playing the first audio and receiving the first audio is the first processing delay.
The method according to claim 10 or 11, wherein the determining of the first distance by the first electronic device specifically includes:

the second electronic device determines first motion information, where the first motion information includes a motion state of the second electronic device;

the second electronic device sends motion information to the first electronic device;

The first electronic device receives the motion information, and the first electronic device determines the first distance based on the first motion information.
The method according to claim 10 or 11, wherein the determining of the first distance by the first electronic device specifically includes:

the second electronic device determines first location information, the first location information including the location of the second electronic device;

the second electronic device sends the first location information to the first electronic device;

The first electronic device receives the first location information, and the first electronic device determines the first distance based on the first location information.
The method according to claim 10 or 11, wherein the determining of the first distance by the first electronic device specifically includes:

the second electronic device determines first distance information, the first distance information includes the first distance;

The second electronic device sends the first distance information to the first electronic device;

The first electronic device receives the first distance information, and the first electronic device determines the first distance based on the first distance information.
An electronic device, characterized in that the electronic device comprises: one or more processors and a memory;

The memory is coupled to the one or more processors and is used for storing computer program code, the computer program code comprising computer instructions that the one or more processors invoke to cause the electronic device to perform the method according to any one of claims 1 to 9.
A chip system, the chip system applied to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform the method according to any one of claims 1 to 9.
A computer program product comprising instructions, wherein, when the computer program product is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.
A computer-readable storage medium comprising instructions, characterized in that, when the instructions are executed on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.