WO2021190274A1 - Method and device for determining state of echo sound field, storage medium, and terminal

Info

Publication number: WO2021190274A1
Application number: PCT/CN2021/079181
Authority: WO
Inventors: 叶顺舟
Original assignee: 紫光展锐(重庆)科技有限公司
Priority date: 2020-03-26
Filing date: 2021-03-05
Publication date: 2021-09-30
Also published as: CN111654585A; CN111654585B

Abstract

A method and device for determining a state of an echo sound field, a storage medium, and a terminal. The method comprises: acquiring a signal to be determined; determining a far-end signal X_n(k), a near-end signal D_n(k), and a filter coefficient W_n(k) of said signal; determining a filter update degree Cef_update at least on the basis of the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); and determining, at least on the basis of whether the filter update degree Cef_update is greater than an update degree threshold Thrd_update, whether an echo sound field state of said signal is an echo path change state. The present invention effectively increases the accuracy of determining the echo path change state, leaves room for using more parameters to determine more echo sound field states, effectively implements multi-feature detection, and makes the determination of the echo sound field state more comprehensive.

Description

Method and device for determining state of echo sound field, storage medium and terminal

This application claims priority to Chinese patent application No. 202010223647.6, filed with the Chinese Patent Office on March 26, 2020 and entitled "Method and device for determining the state of the echo sound field, storage medium, and terminal", the entire content of which is incorporated herein by reference.

Technical field

The present invention relates to the technical field of acoustic echo cancellation, in particular to a method and device for determining the state of an echo sound field, a storage medium, and a terminal.

Background art

In real-time voice communication and Voice over Internet Protocol (VoIP) calls, the sound emitted by the speaker of a communication terminal is picked up by the microphone of the same terminal. If it is not processed, it is sent back, so the other party keeps hearing their own voice, which makes for a poor experience. In human-computer interaction, the microphone picks up the sound emitted by the interactive terminal together with the controller's speech; if the terminal's own sound is not removed from the microphone signal, it introduces strong interference when the terminal recognizes the controller's speech, which lowers the recognition success rate and eventually makes interaction difficult. Using an acoustic echo canceller (AEC) to cancel the echo is a well-known approach. A typical AEC system includes an adaptive filter (AF) for linear echo processing and a nonlinear part for residual echo processing.

Because echo sound fields are diverse and variable, the robustness and stability of AEC technology are severely challenged. For example, if the update of the adaptive filter is not controlled in double-talk and no-speech scenarios, the filter risks divergence and misalignment. At the same time, when the echo path changes, if the update speed is not increased, convergence is too slow and residual echo results. Similarly, in nonlinear or residual echo processing, failing to distinguish the single-talk state from the double-talk state often damages the effective speech and degrades double-talk performance.

Detection of the double-talk state (Double Talk State, DTS) is particularly important in determining the echo sound field state. Conventional double-talk detection (DTD) methods fall roughly into three categories: energy-based detection, correlation-based detection, and echo-path-based detection. Energy-based detection is the simplest, but it depends heavily on the stability of the echo signal strength, the near-end speech signal strength, and the background noise strength, and its misjudgment rate is very high. Correlation-based detection is limited by device characteristics: when the nonlinear distortion of the speaker is large, its performance drops sharply. Echo-path-based detection, for example by estimating the speaker impulse response or a variable impulse response, performs worse when the echo path changes.

However, in the prior art, the accuracy of determining the state of the echo sound field is low, which in turn affects the effect of echo cancellation.

Summary of the invention

The technical problem solved by the present invention is to provide a method and device for determining the state of the echo sound field, a storage medium, and a terminal, which can effectively improve the accuracy of determining the state of the echo path change.

In order to solve the above technical problem, an embodiment of the present invention provides a method for determining the state of an echo sound field, which includes the following steps: acquiring a signal to be determined; determining the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) of the signal to be determined; determining the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); and determining, at least according to whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, whether the echo sound field state of the signal to be determined is the echo path change state.

Optionally, the method for determining the echo sound field state further includes: determining, at least according to whether the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, whether the echo sound field state of the signal to be determined is the far-end single-talk state.

Optionally, determining the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) includes: determining the residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); determining the updated filter coefficient W_{n+1}(k) according to the residual signal E_n(k); and determining the filter update degree Cef_update according to the filter coefficient W_n(k) and the updated filter coefficient W_{n+1}(k).

Optionally, the method includes one or more of the following: the residual signal E_n(k) is determined using the following formula:

The updated filter coefficient W_{n+1}(k) is determined using the following formula, where μ_n(k) denotes the update step size of the filter coefficient W_n(k):

The filter update degree Cef_update is determined using the following formula:

Optionally, before determining whether the echo sound field state of the signal to be determined is the echo path change state, the method for determining the echo sound field state further includes: performing voice activity detection on the near-end signal D_n(k) to obtain the near-end voice activity flag DVflag; and if DVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is the idle state.

Optionally, before determining whether the echo sound field state of the signal to be determined is the echo path change state, the method for determining the echo sound field state further includes: performing voice activity detection on the far-end signal X_n(k) to obtain the far-end voice activity flag XVflag; and if XVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is the near-end single-talk state.

Optionally, before determining whether the echo sound field state of the signal to be determined is the echo path change state, the method for determining the echo sound field state further includes: determining the echo suppression ratio Err of the signal to be determined; and if the echo suppression ratio Err is greater than the preset echo threshold Thrd_err, determining that the echo sound field state of the signal to be determined is the far-end single-talk state.

Optionally, determining the echo suppression ratio Err of the signal to be determined includes: determining the residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); and determining the echo suppression ratio Err according to the near-end signal D_n(k) and the residual signal E_n(k).

The echo suppression ratio Err is determined using the following formula:

Wherein, k is the frequency index of the signal to be determined.
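
The formula for Err is not reproduced in this text. As an illustration only, the sketch below assumes an ERLE-style definition: the ratio, in dB, of the near-end energy to the residual energy summed over the frequency index k. That definition, the function name `echo_suppression_ratio_db`, and the regularization constant `eps` are assumptions, not the formula of the embodiment.

```python
import math

def echo_suppression_ratio_db(D, E, eps=1e-12):
    """Assumed form: 10*log10(sum_k |D_n(k)|^2 / sum_k |E_n(k)|^2)."""
    near_energy = sum(abs(d) ** 2 for d in D)    # energy of near-end spectrum
    resid_energy = sum(abs(e) ** 2 for e in E)   # energy of residual spectrum
    return 10.0 * math.log10((near_energy + eps) / (resid_energy + eps))

# Toy frame: the filter has removed 90% of the amplitude in every bin.
D = [1 + 1j, 2 - 1j, 0.5j]
E = [0.1 * d for d in D]
err = echo_suppression_ratio_db(D, E)
```

Under this assumed definition, a large Err means the residual is small relative to the near-end signal, which matches the use of Err > Thrd_err as evidence of the far-end single-talk state.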

Optionally, before determining whether the echo sound field state of the signal to be determined is the echo path change state, the method for determining the echo sound field state further includes: determining the normalized cross-correlation values C_YE and C_DE; and if C_DE is greater than the first preset cross-correlation threshold Thrd1_coh and C_YE is less than the second preset cross-correlation threshold Thrd2_coh, determining that the echo sound field state of the signal to be determined is the double-talk state; wherein the first preset cross-correlation threshold Thrd1_coh is greater than or equal to the second preset cross-correlation threshold Thrd2_coh.

Optionally, the method further includes one or more of the following: if the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, determining that the echo sound field state of the signal to be determined is the echo path change state; if the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, determining that the echo sound field state of the signal to be determined is the far-end single-talk state.
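
The optional checks above can be read as a decision cascade. The sketch below is one possible arrangement, assuming the checks run in the order in which this text presents them (DVflag, XVflag, Err, cross-correlation, then Cef_update); the enum, the function name `classify_state`, and the threshold values in `T` are hypothetical.

```python
from enum import Enum

class SoundFieldState(Enum):
    IDLE = "IDS"
    NEAR_END_SINGLE_TALK = "NSTS"
    FAR_END_SINGLE_TALK = "FSTS"
    DOUBLE_TALK = "DTS"
    PATH_CHANGE = "PCS"

def classify_state(dv_flag, xv_flag, err, c_de, c_ye, cef_update,
                   thrd_err, thrd1_coh, thrd2_coh, thrd_update):
    """Assumed ordering of the checks; each early return skips the later ones."""
    if dv_flag != 1:                          # no near-end voice activity
        return SoundFieldState.IDLE
    if xv_flag != 1:                          # no far-end activity, so no echo
        return SoundFieldState.NEAR_END_SINGLE_TALK
    if err > thrd_err:                        # echo already well suppressed
        return SoundFieldState.FAR_END_SINGLE_TALK
    if c_de > thrd1_coh and c_ye < thrd2_coh:
        return SoundFieldState.DOUBLE_TALK
    if cef_update > thrd_update:              # filter is updating rapidly
        return SoundFieldState.PATH_CHANGE
    return SoundFieldState.FAR_END_SINGLE_TALK

# Hypothetical thresholds, chosen only so the example runs.
T = dict(thrd_err=20.0, thrd1_coh=0.6, thrd2_coh=0.3, thrd_update=0.1)
state = classify_state(1, 1, 5.0, 0.2, 0.5, 0.5, **T)
```

In this example the frame passes both activity checks, fails the Err and double-talk tests, and is classified by the Cef_update comparison alone.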

Optionally, the normalized cross-correlation values C_YE and C_DE are determined using the following formula:

Wherein, M and L are the frequency band indexes of the signal to be determined.

Optionally, the normalized cross-correlation values C_YE and C_DE are normalized cross-correlation values in the linear region, where M and L are frequency band indexes of the linear region.
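
The formula for C_YE and C_DE is not reproduced in this text. A minimal sketch follows, assuming the common magnitude-normalized form, in which the cross-spectrum summed over bands M to L is divided by the geometric mean of the two band energies. The form, the function name, and `eps` are assumptions; C_DE would be obtained by passing D_n(k) and E_n(k), and C_YE by passing Y_n(k) and E_n(k).

```python
import math

def normalized_cross_correlation(A, B, M, L, eps=1e-12):
    """Assumed form: |sum_{k=M..L} A(k) conj(B(k))| / sqrt(sum |A|^2 * sum |B|^2)."""
    bands = range(M, L + 1)                       # inclusive band range
    cross = abs(sum(A[k] * B[k].conjugate() for k in bands))
    energy_a = sum(abs(A[k]) ** 2 for k in bands)
    energy_b = sum(abs(B[k]) ** 2 for k in bands)
    return cross / (math.sqrt(energy_a * energy_b) + eps)

D = [1 + 0j, 1j, 2 + 0j, 0j]
same = normalized_cross_correlation(D, D, 0, 3)                       # fully correlated
orth = normalized_cross_correlation([1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j], 0, 1)
```

With this form the value lies between 0 and 1, so thresholds such as Thrd1_coh and Thrd2_coh can be chosen on a fixed scale regardless of signal level.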

Optionally, the method for determining the echo sound field state further includes: adjusting the update step size μ_n(k) of the signal to be determined according to the echo sound field state of the signal to be determined, wherein μ_n(k) denotes the update step size of the filter coefficient W_n(k).

Optionally, adjusting the update step size μ_n(k) includes one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the echo path change state, increasing the update step size μ_n(k); if it is determined that the echo sound field state of the signal to be determined is the double-talk state, adjusting μ_n(k) to slow down the update; if it is determined that the echo sound field state of the signal to be determined is the idle state or the near-end single-talk state, setting μ_n(k) = 0.
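
As an illustration of the step-size rules above, the sketch below maps each state to a step size. The state labels follow the abbreviations used elsewhere in this document (PCS, DTS, IDS, NSTS, FSTS); the function name and the scaling factors `speedup` and `slowdown` are hypothetical, since the text only says "increase" and "slow down".

```python
def adjust_step_size(state, mu_normal, speedup=2.0, slowdown=0.25):
    """Return the step size mu_n(k) to use for one bin given the detected state."""
    if state == "PCS":               # echo path change: converge faster
        return mu_normal * speedup
    if state == "DTS":               # double talk: slow the update for robustness
        return mu_normal * slowdown
    if state in ("IDS", "NSTS"):     # idle or near-end single talk: freeze the filter
        return 0.0
    return mu_normal                 # far-end single talk: normal value

mu_by_state = {s: adjust_step_size(s, 0.1) for s in ("PCS", "DTS", "IDS", "NSTS", "FSTS")}
```

Keeping the mapping in one function makes it straightforward to apply the same rule to every frequency bin once the frame's state is known.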

Optionally, an echo adaptive filter is used to adjust the update step size μ_n(k) of the signal to be determined.

Optionally, the method for determining the echo sound field state further includes: determining whether to perform non-linear processing on the signal to be determined according to the echo sound field state of the signal to be determined.

Optionally, determining whether to perform non-linear processing on the signal to be determined includes one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the double-talk state, reducing the degree of non-linear processing; if it is determined that the echo sound field state of the signal to be determined is the echo path change state, enhancing the non-linear processing of the signal to be determined; if it is determined that the echo sound field state of the signal to be determined is the near-end single-talk state, stopping the non-linear processing of the signal to be determined; if it is determined that the echo sound field state of the signal to be determined is the idle state, stopping the non-linear processing of the signal to be determined.

Optionally, a post-processing non-linear processing unit is used to perform non-linear processing on the signal to be determined.

Optionally, the method for determining the echo sound field state further includes: determining, according to the echo sound field state of the signal to be determined, whether to reduce the noise update speed of the signal to be determined or to improve the non-stationary noise suppression capability for the signal to be determined.

Optionally, determining whether to reduce the noise update speed or to improve the non-stationary noise suppression capability includes one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the near-end single-talk state, reducing the noise update speed of the signal to be determined; if it is determined that the echo sound field state of the signal to be determined is the double-talk state, reducing the noise update speed of the signal to be determined; if it is determined that the echo sound field state of the signal to be determined is the far-end single-talk state, improving the non-stationary noise suppression capability for the signal to be determined; if it is determined that the echo sound field state of the signal to be determined is the echo path change state, improving the non-stationary noise suppression capability for the signal to be determined.

Optionally, a post-processing noise suppression unit is used to reduce the noise update speed of the signal to be determined or to improve the non-stationary noise suppression capability of the signal to be determined.
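
The per-state rules for non-linear processing and noise suppression can be collected in a single lookup table. The sketch below is one way to express them; the dataclass, its field names, and the string labels are hypothetical, and entries marked "normal" correspond to cases where this document describes no special handling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PostProcessingPolicy:
    nlp: str                        # "reduced", "enhanced", "off" or "normal"
    noise_update: str               # "slow" or "normal"
    nonstationary_suppression: str  # "raised" or "normal"

# Keys use the state abbreviations that appear in this document.
POLICY = {
    "DTS":  PostProcessingPolicy("reduced",  "slow",   "normal"),
    "PCS":  PostProcessingPolicy("enhanced", "normal", "raised"),
    "NSTS": PostProcessingPolicy("off",      "slow",   "normal"),
    "IDS":  PostProcessingPolicy("off",      "normal", "normal"),
    "FSTS": PostProcessingPolicy("normal",   "normal", "raised"),
}

policy = POLICY["DTS"]   # settings to hand to the NLP and NS units for this frame
```

A table of this kind keeps the state detector separate from the post-processing units, which only need to read the settings for the current state.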

Optionally, the method for determining the state of the echo sound field further includes: determining the temporary sound field state of the signal to be determined; and determining, according to the echo sound field state and the temporary sound field state of the signal to be determined, whether to maintain the double-talk state output for the signal to be determined or to delay the output of the echo path change for the signal to be determined.

Optionally, determining whether to maintain the double-talk state output or to delay the output of the echo path change includes one or more of the following: if the echo sound field state of the signal to be determined is the double-talk state and the temporary sound field state is the far-end single-talk state, maintaining the double-talk state output for the signal to be determined for the duration of a hold time; if the echo sound field state of the signal to be determined is the double-talk state and the temporary sound field state is the echo path change state, delaying the output of the echo path change for the signal to be determined for the duration of a start time.
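
The hold time and start time can be sketched as frame counters. The sketch below assumes that the "echo sound field state" is the state currently being output and the "temporary sound field state" is the raw per-frame decision; this reading, the class name, and the frame counts are assumptions rather than details given in this document.

```python
class StateSmoother:
    """Holds DTS over brief FSTS decisions and delays a DTS-to-PCS switch."""

    def __init__(self, hold_frames=2, onset_frames=2):
        self.hold_frames = hold_frames    # hold time, in frames
        self.onset_frames = onset_frames  # start time, in frames
        self.output = None                # state currently being output
        self.hold_left = 0
        self.onset_count = 0

    def update(self, raw):
        # Hold: keep outputting DTS while the hold time has not expired.
        if self.output == "DTS" and raw == "FSTS" and self.hold_left > 0:
            self.hold_left -= 1
            self.onset_count = 0
            return self.output
        # Start time: PCS must persist before it replaces DTS.
        if self.output == "DTS" and raw == "PCS":
            self.onset_count += 1
            if self.onset_count < self.onset_frames:
                return self.output
        else:
            self.onset_count = 0
        if raw == "DTS":
            self.hold_left = self.hold_frames   # re-arm the hold timer
        self.output = raw
        return raw

smoother = StateSmoother()
raw_states = ["DTS", "FSTS", "FSTS", "FSTS", "DTS", "PCS", "PCS"]
smoothed = [smoother.update(s) for s in raw_states]
```

In the example, two FSTS frames after double talk are still reported as DTS, and a single PCS frame is not enough to leave DTS.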

In order to solve the above technical problem, an embodiment of the present invention provides a device for determining the state of an echo sound field, which includes: an acquisition module, configured to acquire a signal to be determined; a signal determination module, configured to determine the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) of the signal to be determined; an update degree determination module, configured to determine the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); and a state determination module, configured to determine, at least according to whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, whether the echo sound field state of the signal to be determined is the echo path change state.

In order to solve the above technical problem, an embodiment of the present invention provides a storage medium on which computer instructions are stored, wherein the steps of the above method for determining the state of the echo sound field are performed when the computer instructions are run.

In order to solve the above technical problem, an embodiment of the present invention provides a terminal including a memory and a processor, wherein the memory stores computer instructions that can run on the processor, and the processor performs the steps of the above method for determining the state of the echo sound field when running the computer instructions.

Compared with the prior art, the technical solution of the embodiment of the present invention has the following beneficial effects:

In the embodiment of the present invention, whether the echo sound field state of the signal to be determined is the echo path change state is determined according to whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, so that, with appropriate parameters, a signal to be determined that is actually in the echo path change state can be identified. In the prior art, the echo sound field state is simply divided into a single-talk state and a double-talk state for detection, so the echo path change state is easily misjudged as the double-talk state. By comparison, the solution of the embodiment of the present invention can effectively improve the accuracy of determining the echo path change state; in subsequent steps it also leaves room to use more parameters to determine more echo sound field states, implementing multi-feature detection more effectively and making the determination of the echo sound field state more complete.

Further, the echo sound field state of the signal to be determined is determined to be the far-end single-talk state at least according to the filter update degree Cef_update being less than or equal to the preset update degree threshold Thrd_update. Compared with the prior art, in which the echo path change state is easily misjudged as the double-talk state, the solution of the embodiment of the present invention can further improve the accuracy of determining the echo path change state.

Further, the echo sound field state of the signal to be determined is determined to be the idle state when the near-end voice activity flag DVflag is not equal to 1. When DVflag is not 1, it can be considered that there is no voice at the near end; otherwise, there is voice at the near end and the signal to be determined needs further judgment.

Further, the echo sound field state of the signal to be determined is determined to be the near-end single-talk state when the far-end voice activity flag XVflag is not equal to 1. When XVflag is not 1, it can be considered that there is no signal at the far end and therefore no echo in the near-end signal, so the current state is the near-end single-talk state; otherwise, there is echo in the near-end signal and the signal to be determined needs further judgment.

Further, when the echo suppression ratio Err is greater than the preset echo threshold Thrd_err, the relative amplitude of the residual signal is very small: most of the near-end signal is echo that has been cancelled by the adaptive filter AF, and the current state is the far-end single-talk state. Otherwise, the relative amplitude of the residual signal is still high and the composition of the near-end signal is uncertain, so the signal to be determined needs further judgment.

Further, the normalized cross-correlation values are used to further determine the composition of the near-end signal and the residual signal. When the filter has converged, the residual signal E_n(k) is decorrelated from the echo signal; if C_DE is then greater than the threshold Thrd1_coh, the near-end signal contains many components unrelated to the echo. If the filter has not converged, however, the residual signal also contains a large echo component and this conclusion does not hold. C_YE is therefore used as a further check: if C_YE is less than the threshold Thrd2_coh, there are few echo components in the residual signal. Combined with the condition that C_DE is greater than the threshold Thrd1_coh, this confirms that the near-end signal contains components unrelated to the echo, and the current state is the double-talk state. Otherwise, the signal composition cannot be determined and the signal to be determined needs further judgment.

Further, after the double-talk state has been judged and excluded, the echo path change state is determined according to the filter update degree Cef_update being greater than the threshold Thrd_update, and the far-end single-talk state is determined according to Cef_update being less than or equal to the preset update degree threshold Thrd_update. Cef_update being greater than the threshold indicates that the filter is updating rapidly. Since the preceding judgment has already ruled out a definite double-talk state, interference from the near-end voice signal with the filter is not too high, so the rapid update is attributed to convergence or to an echo path change, and the current state is the echo path change state. Otherwise, the current features show no obvious distinction and the state is regarded as uncertain; in the embodiment of the present invention it is determined to be the far-end single-talk state.

Further, the normalized cross-correlation values C_YE and C_DE are normalized cross-correlation values in the linear region, where M and L are frequency band indexes of the linear region; taking the values in the linear region improves the accuracy of the judgment.

Further, when the signal to be determined is in the echo path change state, the update step size μ_n(k) can be increased to speed up the update and achieve fast convergence; when it is in the double-talk state DTS, μ_n(k) is adjusted to slow down the update and keep the filter robust; when it is in the far-end single-talk state FSTS, μ_n(k) takes its normal value without special adjustment; and when it is in the idle state IDS or the near-end single-talk state NSTS, μ_n(k) is set to 0 and the update is stopped to prevent divergence, thereby improving the signal transmission quality.

Further, when the signal to be determined is in the double-talk state, the degree of nonlinear processing can be reduced so that effective speech is not damaged and double-talk performance is preserved; when it is in the echo path change state PCS, the degree of nonlinear processing can be increased to prevent residual echo from leaking; when it is in the near-end single-talk state NSTS or the idle state IDS, nonlinear processing is stopped to avoid distorting near-end voice and ambient sound; and when it is in the far-end single-talk state FSTS, no special processing is done and the residual echo is suppressed normally, thereby improving the signal transmission quality.

Further, when the signal to be determined is in the near-end single-talk state or the double-talk state, the noise update speed can be slowed down to preserve the intelligibility of effective voice; when it is in the far-end single-talk state or the echo path change state, the non-stationary noise suppression capability is improved to suppress residual echo; and when it is in the idle state IDS, that is, the background noise state, no special processing is performed and the background noise is tracked normally, thereby improving the signal transmission quality.

Description of the drawings

Figure 1 is a schematic diagram of the structure of an AEC system in the prior art;

FIG. 2 is a flowchart of a method for determining the state of an echo sound field in an embodiment of the present invention;

FIG. 3 is a flowchart of another method for determining the state of an echo sound field in an embodiment of the present invention;

Figure 4 is a schematic structural diagram of an AEC system in an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a device for determining an echo sound field state in an embodiment of the present invention.

Detailed description

As mentioned above, in real-time voice communication and voice-over-IP transmission, the sound emitted by the speaker of the communication terminal is picked up by the microphone of the terminal. If it is sent out without processing, the other party keeps hearing their own voice, which makes for a poor experience. Using echo cancellation to cancel the echo is a well-known approach. A typical AEC system includes an adaptive filter AF for linear echo processing and a nonlinear part for residual echo processing.

Referring to Fig. 1, Fig. 1 is a schematic structural diagram of an AEC system in the prior art.

As shown in Figure 1, the signal x(n) is played through the speaker (SPK) to obtain the signal h(n), which is picked up by the microphone (MIC) and output as the signal d(n).

A short-time Fourier transform (STFT) is performed on the signal d(n) and the signal x(n) respectively to obtain the near-end signal D_n(k) and the far-end signal X_n(k). The adaptive filter (AF) calculates the echo estimate signal Y_n(k) from the far-end signal X_n(k) and the filter coefficient W_n(k), and subtracts it from the near-end signal D_n(k) to obtain the residual signal E_n(k).
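
The signal flow just described can be sketched for a single frame as follows. The sketch assumes a Hann window, a direct DFT written out with the standard library, and one complex filter tap per frequency bin, so that Y_n(k) = W_n(k) X_n(k); none of these choices are specified by this document, and the function names are hypothetical.

```python
import cmath
import math

def stft_frame(samples):
    """Hann-window one frame and return its one-sided DFT (bins 0..N/2)."""
    n = len(samples)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    return [sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
            for k in range(n // 2 + 1)]

def echo_estimate_and_residual(D, X, W):
    """Assumed single tap per bin: Y_n(k) = W_n(k) X_n(k), E_n(k) = D_n(k) - Y_n(k)."""
    Y = [w * x for w, x in zip(W, X)]
    E = [d - y for d, y in zip(D, Y)]
    return Y, E

# Toy case: the microphone signal is the far-end signal attenuated by half.
x = [math.sin(2 * math.pi * 3 * i / 16) for i in range(16)]
d = [0.5 * s for s in x]
X, D = stft_frame(x), stft_frame(d)
W = [0.5] * len(X)                  # a filter that already matches the echo path
Y, E = echo_estimate_and_residual(D, X, W)
```

Because the filter in this toy case matches the echo path exactly, the residual is zero up to rounding error in every bin.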

In a specific implementation, the filter coefficient W_n(k) can be updated to obtain W_{n+1}(k).

Further, the residual signal E_n(k) may be input to the non-linear processing unit (NLP) and the post-processing noise suppression unit (NS).

Detection of the double-talk state is particularly important in determining the echo sound field state. Conventional double-talk detection methods fall roughly into three categories: energy-based detection, correlation-based detection, and echo-path-based detection. Energy-based detection is the simplest, but it depends heavily on the stability of the echo signal strength, the near-end speech signal strength, and the background noise strength, and its misjudgment rate is very high. Correlation-based detection is limited by device characteristics: when the nonlinear distortion of the speaker is large, its performance drops sharply. Echo-path-based detection, for example by estimating the speaker impulse response or a variable impulse response, performs worse when the echo path changes. In the prior art, therefore, the accuracy of determining the state of the echo sound field is low, which in turn affects the effect of echo cancellation.

The inventors of the present invention have found through research that existing methods for determining the state of the echo sound field simply divide it into a single-talk state (Single Talk State, STS) and a double-talk state (Double Talk State, DTS). In practice, however, the path change state (Path Change State, PCS), which indicates a change of the echo path, lacks an effective detection method and is often misjudged as DTS. As a result, the echo that most needs to be processed is retained to the greatest extent, the echo sound field state is determined incorrectly, and the echo cancellation effect suffers.

In the embodiment of the present invention, whether the echo sound field state of the signal to be determined is the echo path change state is determined according to whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, so that, with appropriate parameters, a signal to be determined that is actually in the echo path change state can be identified. In the prior art, the echo sound field state is simply divided into a single-talk state and a double-talk state for detection, so the echo path change state is easily misjudged as the double-talk state. By comparison, the solution of the embodiment of the present invention can effectively improve the accuracy of determining the echo path change state.

In order to make the above objectives, features and beneficial effects of the present invention more obvious and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to FIG. 2, FIG. 2 is a flowchart of a method for determining the state of an echo sound field in an embodiment of the present invention. The method for determining the state of the echo sound field includes steps S21 to S24:

Step S21: Obtain a signal to be determined;

Step S22: Determine the far-end signal, the near-end signal, and filter coefficients of the signal to be determined;

Step S23: Determine the filter update degree at least according to the far-end signal, the near-end signal and the filter coefficient;

Step S24: Determine whether the echo sound field state of the signal to be determined is the echo path change state at least according to the filter update degree being greater than the preset update degree threshold.

It can be understood that, in specific implementation, the method can be implemented in the form of a software program that runs on a processor integrated inside a chip or a chip module.

In the specific implementation of step S21, signals to be determined with different echo sound field states may contain different signals; for example, a signal to be determined may include the signal obtained after the sound emitted by the speaker of the communication terminal is picked up by the microphone of the terminal, or may include only the far-end signal. In the embodiment of the present invention, accurately determining the echo sound field state of the signal to be determined allows echo cancellation to be achieved more effectively.

In the specific implementation of step S22, the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) of the signal to be determined are determined.

Specifically, conventional techniques may be used to determine the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) of the signal to be determined. For example, a short-time Fourier transform is performed on the signal d(n) and the signal x(n) shown in FIG. 1 to obtain the near-end signal D_n(k) and the far-end signal X_n(k), and the filter coefficient W_n(k) is determined by an appropriate method.

In the specific implementation of step S23, the filter update degree Cef_update is determined.

Further, the step of determining the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k) may include: determining the residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k), and the filter coefficient W_n(k); determining the updated filter coefficient W_{n+1}(k) according to the residual signal E_n(k); and determining the filter update degree Cef_update according to the filter coefficient W_n(k) and the updated filter coefficient W_{n+1}(k).

Still further, the following formula may be used to determine the residual signal E _n (k):

Furthermore, the following formula can be used to determine the updated filter coefficient W_{n+1}(k), where the update step size μ_n(k) indicates the update step size of the filter coefficient W_n(k):

Furthermore, the following formula can be used to determine the filter update degree Cef _update :

It should be pointed out that in the embodiment of the present invention, other appropriate methods may also be used to determine the above-mentioned parameters, which is not limited in the embodiment of the present invention.
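The formulas themselves are not reproduced in this text, so the following is only a sketch of one way the three quantities could be computed. It assumes a single complex coefficient per frequency bin, a normalized LMS (NLMS) style update, and an update degree defined as the relative energy of the coefficient change. All three choices, and the names nlms_step and update_degree, are assumptions and not the formulas of the embodiment.

```python
import numpy as np

def nlms_step(X, D, W, mu, eps=1e-10):
    """One per-bin NLMS-style step (assumed update rule, not from the source)."""
    E = D - W * X                                        # residual E_n(k)
    W_next = W + mu * np.conj(X) * E / (np.abs(X) ** 2 + eps)
    return E, W_next

def update_degree(W, W_next, eps=1e-10):
    """Assumed metric: energy of the coefficient change relative to W_n(k)."""
    num = np.sum(np.abs(W_next - W) ** 2)
    den = np.sum(np.abs(W) ** 2) + eps
    return float(num / den)

X = np.ones(4, dtype=complex)
W = np.full(4, 0.5 + 0j)

# Converged case: the echo path matches the filter, so nothing changes.
_, W_same = nlms_step(X, 0.5 * X, W, mu=0.5)
cef_converged = update_degree(W, W_same)

# Echo path change: the path flips sign, so the filter moves a lot.
_, W_moved = nlms_step(X, -0.5 * X, W, mu=0.5)
cef_changed = update_degree(W, W_moved)
```

Under these assumptions the update degree stays near zero while the filter tracks a fixed path and becomes large right after the path changes, which is the behaviour the threshold comparison in step S24 relies on.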

In the specific implementation of step S24, whether the echo sound field state of the signal to be determined is the echo path change state may be determined at least according to whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update.

Further, in a specific implementation manner of the embodiment of the present invention, if the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, it can be determined that the echo sound field state of the signal to be determined is the echo path change state.

Further, the method for determining the echo sound field state may further include: determining whether the echo sound field state of the signal to be determined is the far-end single talk state at least according to whether the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update.

In a specific implementation of the embodiment of the present invention, if the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, it can be determined that the echo sound field state of the signal to be determined is the far-end single talk state.

In the embodiment of the present invention, the echo sound field state of the signal to be determined is determined to be the far-end single talk state at least according to the filter update degree Cef_update being less than or equal to the preset update degree threshold Thrd_update. Compared with existing technologies, in which the echo path change state is easily misjudged as the dual-talk state, the solution of the embodiment of the present invention can further improve the accuracy of determining the echo path change state.

Further, before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method for determining the echo sound field state may further include: performing voice activation detection on the near-end signal D_n(k) to obtain the near-end voice activation flag DVflag; and if the near-end voice activation flag DVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is an idle state.

It should be pointed out that, in the embodiment of the present invention, the step of performing voice activation detection on the near-end signal D_n(k) and determining, according to the near-end voice activation flag DVflag, that the echo sound field state of the signal to be determined is the idle state may also be set to be executed after step S24. The embodiment of the present invention does not limit the order of the step of judging the near-end voice activation flag DVflag and step S24.

In the embodiment of the present invention, the echo sound field state of the signal to be determined is determined to be the idle state when the near-end voice activation flag DVflag is not equal to 1. When DVflag is not 1, it can be considered that there is no voice at the near end; otherwise, there is voice at the near end, and further judgment on the signal to be determined is needed.

Further, before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method for determining the echo sound field state may further include: performing voice activation detection on the far-end signal X_n(k) to obtain the far-end voice activation flag XVflag; and if the far-end voice activation flag XVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is the near-end single talk state.

It should be pointed out that, in the embodiment of the present invention, the step of performing voice activation detection on the far-end signal X_n(k) and determining, according to the far-end voice activation flag XVflag, that the echo sound field state of the signal to be determined is the near-end single talk state may also be set to be executed after step S24. The embodiment of the present invention does not limit the order of the step of judging the far-end voice activation flag XVflag and step S24.

It should be pointed out that the voice activation detection technology can adopt well-known technologies, such as energy detection, zero-crossing rate detection, spectral entropy detection, pitch detection, etc., which are not specifically limited in the embodiment of the present invention.
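Of the well-known techniques listed above, energy detection is the simplest to illustrate. The sketch below sets a flag to 1 when the mean power of one spectral frame exceeds a fixed level. The function name energy_vad and the -40 dB threshold are assumptions chosen for the example; a practical detector would usually adapt the threshold to the noise floor.

```python
import numpy as np

def energy_vad(spectrum, threshold_db=-40.0):
    """Return 1 if the frame's mean power exceeds threshold_db, else 0.

    threshold_db is an illustrative assumption, not a value from the source.
    """
    power = np.mean(np.abs(spectrum) ** 2)
    level_db = 10.0 * np.log10(power + 1e-12)   # small offset avoids log(0)
    return 1 if level_db > threshold_db else 0

dvflag_silence = energy_vad(np.zeros(8))   # silent frame
dvflag_active = energy_vad(np.ones(8))     # frame at 0 dB mean power
```

The same function could be applied to D_n(k) to obtain DVflag and to X_n(k) to obtain XVflag.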

In the embodiment of the present invention, the echo sound field state of the signal to be determined is determined to be the near-end single talk state when the far-end voice activation flag XVflag is not equal to 1. When XVflag is not 1, it can be considered that there is no signal at the far end and therefore no echo signal in the near-end signal, so the current state is the near-end single talk state. Otherwise, there is echo in the near-end signal, and further judgment on the signal to be determined is required.

Further, before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method for determining the echo sound field state may further include: determining the echo suppression ratio Err of the signal to be determined; and if the echo suppression ratio Err is greater than the preset echo threshold Thrd_err, determining that the echo sound field state of the signal to be determined is the far-end single talk state.

It should be pointed out that, in the embodiment of the present invention, the step of determining the echo suppression ratio Err of the signal to be determined and determining that the echo sound field state of the signal to be determined is the far-end single talk state may also be set to be executed after step S24. The embodiment of the present invention does not limit the order of the step of determining the echo suppression ratio Err of the signal to be determined and step S24.

Furthermore, the step of determining the echo suppression ratio Err of the signal to be determined may include: determining the residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k); and determining the echo suppression ratio Err according to the near-end signal D_n(k) and the residual signal E_n(k).

Furthermore, the following formula can be used to determine the signal echo suppression ratio Err:

Wherein, k is the frequency index of the signal to be determined.

In the embodiment of the present invention, an echo suppression ratio Err greater than the preset echo threshold Thrd_err indicates that the relative amplitude of the residual signal is very small; most of the near-end signal components are determined to be echo signals that have been eliminated by the adaptive filter AF, and the current state is the far-end single-talk state. Otherwise, the relative amplitude of the residual signal is still high and the components of the near-end signal are uncertain, so further judgment on the signal to be determined is required.

As a non-limiting example, the reference value of the threshold Thrd_err may be 12 to 20 dB.
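The formula for Err is not reproduced in this text. Since Err is derived from D_n(k) and E_n(k), summed over the frequency index k, and compared against a threshold in dB, one plausible reading is the ratio of near-end power to residual power expressed in decibels. The sketch below implements that reading; the exact definition and the name echo_suppression_ratio_db are assumptions.

```python
import numpy as np

def echo_suppression_ratio_db(D, E, eps=1e-12):
    """Assumed definition: 10*log10(sum_k |D_n(k)|^2 / sum_k |E_n(k)|^2)."""
    d_pow = np.sum(np.abs(D) ** 2)
    e_pow = np.sum(np.abs(E) ** 2)
    return float(10.0 * np.log10((d_pow + eps) / (e_pow + eps)))

D = np.ones(4, dtype=complex)
E = 0.1 * D                         # residual is 10x smaller in amplitude
err = echo_suppression_ratio_db(D, E)
is_far_end_single_talk = err > 15.0  # 15 dB lies in the 12 to 20 dB range
```

With a residual ten times smaller in amplitude than the near-end signal, this definition gives about 20 dB, which would exceed a threshold chosen inside the reference range.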

Further, before determining whether the echo sound field state of the signal to be determined is the echo path change state, the method for determining the echo sound field state may further include: determining the normalized cross-correlation values C_YE and C_DE; and if C_DE is greater than the first preset cross-correlation threshold Thrd1_coh and C_YE is less than the second preset cross-correlation threshold Thrd2_coh, determining that the echo sound field state of the signal to be determined is a dual-talk state; wherein the first preset cross-correlation threshold Thrd1_coh is greater than or equal to the second preset cross-correlation threshold Thrd2_coh.

Furthermore, the following formula can be used to determine the normalized cross-correlation values C _YE and C _DE :

Wherein, M and L are the frequency band indexes of the signal to be determined.

In the embodiment of the present invention, the near-end signal component in the residual signal is further determined by means of the normalized cross-correlation values. When the filter has converged, the residual signal E_n(k) is decorrelated from the echo signal; in that case, if C_DE is greater than the threshold Thrd1_coh, the near-end signal contains many components that are unrelated to the echo. However, if the filter has not converged, the residual signal also contains a large amount of echo components and this conclusion does not hold, so C_YE is used for further confirmation. If C_YE is less than the threshold Thrd2_coh, there are few echo components in the residual signal. Combined with the condition that C_DE is greater than the threshold Thrd1_coh, it can be confirmed that the near-end signal contains components unrelated to the echo, and the current state is the dual-talk state. Otherwise, the signal components cannot be determined, and the signal to be determined needs further judgment.

Furthermore, the normalized cross-correlation values C _YE and C _DE are normalized cross-correlation values in the linear region; wherein M and L are frequency band indexes of the linear region.

In the embodiment of the present invention, the normalized cross-correlation values C_YE and C_DE are the normalized cross-correlation values of the linear region, where M and L are the frequency band indexes of the linear region. Taking values in the linear region can improve the accuracy of the judgment.

It should be pointed out that M and L are set as the frequency band indexes corresponding to the linear region because the nonlinear distortion of the device has harmonic characteristics and is often distributed in the middle and high frequencies. The present invention gives a reference frequency range: M corresponds to a low frequency band in the 100 to 300 Hz interval, and L corresponds to a high frequency band in the 2500 to 3000 Hz interval. This range is only a reference value, and actual use is not limited to it.
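The formulas for C_YE and C_DE are not reproduced in this text. As a sketch, the example below uses a common form of normalized cross-correlation, the magnitude of the inner product divided by the product of the norms, evaluated only over the bins from M to L. This form, the 16 kHz sample rate, the 256-point transform and the helper names are all assumptions for illustration.

```python
import numpy as np

def normalized_xcorr(A, B, lo, hi, eps=1e-12):
    """Assumed form: |sum A conj(B)| / sqrt(sum|A|^2 * sum|B|^2), bins lo..hi."""
    a = A[lo:hi + 1]
    b = B[lo:hi + 1]
    num = np.abs(np.sum(a * np.conj(b)))
    den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2)) + eps
    return float(num / den)

def hz_to_bin(freq_hz, sample_rate, fft_len):
    """Map a frequency in Hz to the nearest FFT bin index."""
    return int(round(freq_hz * fft_len / sample_rate))

# Band limits taken from the reference ranges (100-300 Hz and 2500-3000 Hz).
M = hz_to_bin(250, sample_rate=16000, fft_len=256)
L = hz_to_bin(3000, sample_rate=16000, fft_len=256)

rng = np.random.default_rng(0)
D = rng.standard_normal(129) + 1j * rng.standard_normal(129)
c_same = normalized_xcorr(D, D, M, L)          # identical signals
c_orth = normalized_xcorr(np.array([1.0, 1.0]), np.array([1.0, -1.0]), 0, 1)
```

With this form, C_DE would be computed as normalized_xcorr(D, E, M, L) and C_YE as normalized_xcorr(Y, E, M, L); both values lie between 0 and 1, which is compatible with the 0.1 to 0.5 threshold ranges given as reference values.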

Further, if the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, it is determined that the echo sound field state of the signal to be determined is the echo path change state; if the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, it is determined that the echo sound field state of the signal to be determined is the far-end single talk state.

That is, in the embodiment of the present invention, the step of judging whether the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update may be set after the judgment of the dual-talk state.

It should be pointed out that the echo suppression ratio Err is the relative cancellation amount of the echo signal, which avoids the influence of the echo signal strength; the normalized cross-correlation quantities C _YE and C _DE are normalized and have nothing to do with the signal strength of the far and near ends. At the same time, the linear region calculation is used to reduce the influence of device distortion; the filter update degree Cef _update uses a certain degree of robustness of the AF itself to reflect the change intensity of the echo path. Therefore, the comprehensive use of these features can effectively solve the influence of uncertain factors such as echo signal strength changes, far and near-end signal strength changes, device distortion, and echo path changes on the detection accuracy.

In the embodiment of the present invention, after the dual-talk state has been judged and excluded, the echo path change state is determined according to the filter update degree Cef_update being greater than the threshold Thrd_update, and the far-end single talk state is determined according to the filter update degree Cef_update being less than or equal to the preset update degree threshold Thrd_update. A filter update degree above the threshold indicates that the filter is in a fast update state. Since the previous judgment has ruled out the definite dual-talk state, the interference of the near-end voice signal with the filter is not significant, so the fast update can only be caused by non-convergence or by an echo path change, and the current state is the echo path change state. Otherwise, the current features show no obvious distinction and the state is regarded as uncertain; in the embodiment of the present invention, it can be determined as the far-end single talk state.

It should be pointed out that when the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, the current features temporarily show no obvious distinguishability and the state can be regarded as uncertain. Based on research and practice, the inventor of the present invention chose to process this case as the far-end single talk state FSTS.

As a non-limiting example, the reference value of Thrd1_coh may be 0.3 to 0.5, and the reference value of Thrd2_coh may be 0.1 to 0.3.

Referring to FIG. 3, FIG. 3 is a flowchart of another method for determining the state of an echo sound field in an embodiment of the present invention. This other method may include steps S301 to S311, each of which is described below.

In step S301, it is judged whether DVflag is equal to 1; when the judgment result is yes, step S302 can be executed; otherwise, step S303 can be executed.

In step S302, it is judged whether XVflag is equal to 1; when the judgment result is yes, step S304 can be executed; otherwise, step S305 can be executed.

In step S303, it is determined that the state of the echo sound field is the idle state (IDS).

In step S304, it is judged whether Err is greater than Thrd _err ; when the judgment result is yes, step S306 can be executed; otherwise, step S307 can be executed.

In step S305, it is determined that the state of the echo sound field is the near-end single talk state (NSTS).

In step S306, it is determined that the state of the echo sound field is the far-end single talk state (FSTS).

In step S307, it is judged whether C_DE is greater than Thrd1_coh and C_YE is less than Thrd2_coh; when the judgment result is yes, step S308 can be executed; otherwise, step S309 can be executed.

In step S308, it is determined that the state of the echo sound field is a dual talk state (DTS).

In step S309, it is judged whether Cef_update is greater than Thrd_update; when the judgment result is yes, step S310 can be executed; otherwise, step S311 can be executed.

In step S310, it is determined that the echo sound field state is the echo path change state (PCS).

In step S311, it is determined that the state of the echo sound field is the far-end single talk state (FSTS).

It should be pointed out that the sequence number of each step in this embodiment does not represent a limitation on the execution order of each step. For example, the order of steps between steps S301, S302, S304, S307, and S309 is not limited.
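The decision flow of steps S301 to S311 can be sketched as a single function. The state abbreviations and the order of the checks follow the steps above. The default values of thrd_err, thrd1_coh and thrd2_coh are picked from inside the reference ranges given in the text, while the default for thrd_update is purely an assumption, since no reference value is given for it. The names State and classify are hypothetical.

```python
from enum import Enum

class State(Enum):
    IDS = "idle"
    NSTS = "near-end single talk"
    FSTS = "far-end single talk"
    DTS = "dual talk"
    PCS = "echo path change"

def classify(dvflag, xvflag, err, c_de, c_ye, cef_update,
             thrd_err=15.0, thrd1_coh=0.4, thrd2_coh=0.2, thrd_update=0.1):
    """Follow the S301..S311 flow; thrd_update is an assumed placeholder."""
    if dvflag != 1:                               # S301 -> S303
        return State.IDS
    if xvflag != 1:                               # S302 -> S305
        return State.NSTS
    if err > thrd_err:                            # S304 -> S306
        return State.FSTS
    if c_de > thrd1_coh and c_ye < thrd2_coh:     # S307 -> S308
        return State.DTS
    if cef_update > thrd_update:                  # S309 -> S310
        return State.PCS
    return State.FSTS                             # S311

example = classify(dvflag=1, xvflag=1, err=5.0, c_de=0.2, c_ye=0.5,
                   cef_update=0.8)
```

In the example call, both ends are active, the echo is poorly suppressed and the dual-talk condition is not met, so the large update degree leads to the echo path change state.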

In a specific implementation manner of the embodiment of the present invention, step S309 may be set after S307 to improve the accuracy of judging the change state of the echo path.

In the embodiment of the present invention, the selected features and decision methods are robust against uncertain factors such as signal strength changes (far end, near end and echo signals), device distortion and echo path changes, and the combined use of multiple features makes the detection accuracy higher and the performance more reliable.

Further, the method for determining the echo sound field state may further include adjusting the update step size μ_n(k) of the signal to be determined according to the echo sound field state of the signal to be determined; wherein the update step size μ_n(k) is used to indicate the update step size of the filter coefficient W_n(k).

Furthermore, adjusting the update step size μ_n(k) includes one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the echo path change state, the update step size μ_n(k) is increased; if it is determined that the echo sound field state of the signal to be determined is the dual-talk state, μ_n(k) is adjusted to slow down the update; if it is determined that the echo sound field state of the signal to be determined is the idle state or the near-end single talk state, μ_n(k) is set to 0.

Furthermore, an echo adaptive filter may be used to adjust the update step size μ _n (k) of the signal to be determined.

In the embodiment of the present invention, when the signal to be determined is in the echo path change state PCS, the value of the update step size μ_n(k) can be increased to speed up the update and achieve fast convergence; when the signal to be determined is in the dual-talk state DTS, μ_n(k) is adjusted to slow down the update and ensure the robustness of the filter; when the signal to be determined is in the far-end single talk state FSTS, μ_n(k) takes its normal value without special adjustment; when the signal to be determined is in the idle state IDS or the near-end single talk state NSTS, μ_n(k) is set to 0 and the update is stopped to prevent divergence, thereby improving the signal transmission quality.
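The step-size policy described above can be sketched as a small lookup. The direction of each adjustment follows the text; the numeric values (a normal step of 0.5, a factor of 2 for speeding up and 0.25 for slowing down) and the function name adjust_step_size are assumptions for illustration only.

```python
def adjust_step_size(state, mu_normal=0.5, speedup=2.0, slowdown=0.25):
    """Return mu_n(k) for a given state; the scale factors are assumed values."""
    if state in ("IDS", "NSTS"):
        return 0.0                     # stop updating to prevent divergence
    if state == "PCS":
        return mu_normal * speedup     # re-converge quickly after a path change
    if state == "DTS":
        return mu_normal * slowdown    # keep the filter robust during dual talk
    if state == "FSTS":
        return mu_normal               # normal value, no special adjustment
    raise ValueError(f"unknown state: {state}")

steps = {s: adjust_step_size(s) for s in ("IDS", "NSTS", "FSTS", "DTS", "PCS")}
```

Keeping the scale factors as parameters makes it straightforward to tune them per device without touching the state logic.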

Further, the method for determining the state of the echo sound field may further include: determining whether to perform nonlinear processing on the signal to be determined according to the state of the echo sound field of the signal to be determined.

Furthermore, the step of determining whether to perform nonlinear processing on the signal to be determined may include one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the dual-talk state, the degree of nonlinear processing is reduced; if it is determined that the echo sound field state of the signal to be determined is the echo path change state, the nonlinear processing of the signal to be determined is enhanced; if it is determined that the echo sound field state of the signal to be determined is the near-end single talk state, the nonlinear processing of the signal to be determined is stopped; if it is determined that the echo sound field state of the signal to be determined is the idle state, the nonlinear processing of the signal to be determined is stopped.

Furthermore, a post-processing non-linear processing unit can be used to perform non-linear processing on the signal to be determined.

In the embodiment of the present invention, when the signal to be determined is in the dual-talk state DTS, the degree of nonlinear processing can be reduced so that the effective voice is not damaged and dual-talk performance is ensured; when the signal to be determined is in the echo path change state PCS, the degree of nonlinear processing is enhanced to prevent leakage of residual echo; when the signal to be determined is in the near-end single talk state NSTS or the idle state IDS, nonlinear processing is stopped to avoid distorting the near-end voice and environmental sound; when the signal to be determined is in the far-end single talk state FSTS, no special processing is performed and the residual echo is suppressed normally, thereby improving the signal transmission quality.

Further, the method for determining the state of the echo sound field may further include: according to the state of the echo sound field of the signal to be determined, determining to reduce the noise update speed of the signal to be determined or to improve the non-stationary noise suppression capability of the signal to be determined.

Furthermore, the step of determining to reduce the noise update speed or to improve the non-stationary noise suppression capability may include one or more of the following: if it is determined that the echo sound field state of the signal to be determined is the near-end single talk state, the noise update speed of the signal to be determined is reduced; if it is determined that the echo sound field state of the signal to be determined is the dual-talk state, the noise update speed of the signal to be determined is reduced; if it is determined that the echo sound field state of the signal to be determined is the far-end single talk state, the non-stationary noise suppression capability of the signal to be determined is improved; if it is determined that the echo sound field state of the signal to be determined is the echo path change state, the non-stationary noise suppression capability of the signal to be determined is improved.

Furthermore, a post-processing noise suppression unit is used to reduce the noise update speed of the signal to be determined or to improve the non-stationary noise suppression capability of the signal to be determined.

In the embodiment of the present invention, when the signal to be determined is in the near-end single talk state or the dual-talk state, the noise update speed can be slowed down to ensure the intelligibility of effective speech; when the signal to be determined is in the far-end single talk state or the echo path change state, the non-stationary noise suppression capability is improved and the residual echo is suppressed; when the signal to be determined is in the idle state IDS, that is, the background noise state, no special processing is performed and the background noise is tracked normally, thereby improving the quality of signal transmission.
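The per-state behaviour of the post-processing nonlinear processing unit and the post-processing noise suppression unit can be summarized in one table. The sketch below encodes the state-to-action mapping described in the text as a dictionary; the action labels and the names POSTPROCESS_POLICY and postprocess_actions are hypothetical and stand in for whatever controls a real implementation exposes.

```python
# Each entry maps a state to (nonlinear processing action, noise suppression action).
# The action labels are illustrative names, not identifiers from the source.
POSTPROCESS_POLICY = {
    "DTS":  ("reduce",  "slow_noise_update"),
    "PCS":  ("enhance", "boost_nonstationary_suppression"),
    "NSTS": ("stop",    "slow_noise_update"),
    "IDS":  ("stop",    "normal"),
    "FSTS": ("normal",  "boost_nonstationary_suppression"),
}

def postprocess_actions(state):
    """Look up the (NLP, NS) actions for a detected echo sound field state."""
    return POSTPROCESS_POLICY[state]

nlp_action, ns_action = postprocess_actions("DTS")
```

A table of this kind keeps the detector and the post-processing units decoupled: the detector only outputs a state, and each unit reads its own column.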

Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an AEC system in an embodiment of the present invention.

As shown in FIG. 4, the signal x(n) passes through the loudspeaker (SPK) to obtain the signal h(n), which contains the echo; this signal, together with the voice signal (voice) and the noise signal (noise), passes through the microphone (MIC), which outputs the signal d(n).

The short-time Fourier transform (STFT) is performed on the signal d(n) and the signal x(n) to obtain the near-end signal D_n(k) and the far-end signal X_n(k). The adaptive filter (AF) calculates the echo estimation signal Y_n(k) from the far-end signal X_n(k) and the filter coefficient W_n(k), and the residual signal E_n(k) is obtained by subtracting Y_n(k) from the near-end signal D_n(k).
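To make the signal flow of the adaptive filter concrete, the following sketch simulates it directly in the frequency domain for a far-end-only scenario. It assumes one complex coefficient per bin, so that Y_n(k) = W_n(k) X_n(k), a synthetic fixed echo path H, and an NLMS-style coefficient update. These are assumptions for illustration and are not stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
num_frames, num_bins = 200, 16

# Assumed synthetic echo path, one complex gain per frequency bin.
H = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
W = np.zeros(num_bins, dtype=complex)   # filter coefficients W_n(k)
mu = 0.5                                # assumed constant step size

residual_power = []
for n in range(num_frames):
    X = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
    D = H * X            # near-end signal containing only echo
    Y = W * X            # echo estimate Y_n(k)
    E = D - Y            # residual E_n(k)
    # NLMS-style update (an assumed rule; the source leaves the method open).
    W = W + mu * np.conj(X) * E / (np.abs(X) ** 2 + 1e-10)
    residual_power.append(float(np.sum(np.abs(E) ** 2)))
```

In this idealized setting the residual power shrinks frame by frame as W approaches H; near-end speech, noise or a change in H would disturb that convergence, which is what the state detection described here is meant to identify.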

Further, the far-end signal X_n(k), the near-end signal D_n(k), the echo estimation signal Y_n(k), the residual signal E_n(k) and the filter coefficient W_n(k) may be input to the echo sound field state detection unit ESD, which performs signal feature calculation and judges the echo sound field state based on the calculation results to obtain the specific echo sound field state.

As mentioned above, in the embodiment of the present invention, the echo state can be subdivided into five sound field states: the far-end single talk state FSTS, the near-end single talk state NSTS, the dual-talk state DTS, the echo path change state PCS, and the idle state IDS (i.e., background noise).

Furthermore, an adaptive filter AF and a post-processing non-linear processing unit (NLP) and a post-processing noise suppression unit (NS) can be set to obtain a specific sound field state through ESD, and perform corresponding processing.

Further, the method for determining the state of the echo sound field may further include: determining the temporary sound field state of the signal to be determined; and determining, according to the echo sound field state and the temporary sound field state of the signal to be determined, to maintain the dual-talk state output for the signal to be determined or to suspend the output of the echo path change for the signal to be determined.

In the embodiment of the present invention, if the historical state is dual-talk DTS and EStemp is far-end single talk FSTS, the DTS output is maintained for the hold time Thold to protect the near-end voice to the greatest extent.

Further, determining to maintain the dual-talk state output for the signal to be determined or to suspend the output of the echo path change for the signal to be determined includes one or more of the following: if the echo sound field state of the signal to be determined is the dual-talk state and the temporary sound field state is the far-end single talk state, the dual-talk state output is maintained for the signal to be determined for the hold time; if the echo sound field state of the signal to be determined is the dual-talk state and the temporary sound field state is the echo path change state, the output of the echo path change is suspended for the signal to be determined for the start time.

In the embodiment of the present invention, if the historical state is dual-talk DTS and EStemp is echo path change PCS, the output of the PCS is suspended for the start time Tstart. During this time, the state output is forced to far-end single talk FSTS to achieve a compromise between reducing the risk of filter divergence and suppressing residual echo.

As a non-limiting example, the value of Thold and Tstart can be set between 20 and 100 ms.
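The hold and start timers can be sketched as a small smoother applied to the per-frame temporary state EStemp. The example assumes that both timers are counted in frames from the last frame classified as DTS (for instance, 5 frames of 10 ms for a 50 ms timer). That counting convention, and the name StateSmoother, are assumptions rather than details given in the text.

```python
class StateSmoother:
    """Apply Thold / Tstart style smoothing after a dual-talk (DTS) frame.

    hold_frames and start_frames are Thold and Tstart converted to frame
    counts; counting from the last DTS frame is an assumption.
    """

    def __init__(self, hold_frames=5, start_frames=5):
        self.hold_frames = hold_frames
        self.start_frames = start_frames
        self.since_dts = None          # frames since the last DTS frame

    def update(self, es_temp):
        if es_temp == "DTS":
            self.since_dts = 0
            return "DTS"
        if self.since_dts is None:     # no DTS seen yet, pass through
            return es_temp
        self.since_dts += 1
        if es_temp == "FSTS" and self.since_dts <= self.hold_frames:
            return "DTS"               # keep protecting near-end voice
        if es_temp == "PCS" and self.since_dts <= self.start_frames:
            return "FSTS"              # delay the PCS output
        return es_temp

hold = StateSmoother(hold_frames=2, start_frames=2)
held = [hold.update(s) for s in ["FSTS", "DTS", "FSTS", "FSTS", "FSTS"]]

delay = StateSmoother(hold_frames=2, start_frames=2)
delayed = [delay.update(s) for s in ["DTS", "PCS", "PCS", "PCS"]]
```

In the first sequence the DTS output persists for two extra frames before reverting to FSTS; in the second, PCS is reported as FSTS for two frames before being passed through.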

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a device for determining an echo sound field state in an embodiment of the present invention. The apparatus for determining the state of the echo sound field may include:

The obtaining module 51 is used to obtain the signal to be determined;

The signal determining module 52 is configured to determine the far-end signal X _n (k), the near-end signal D _n (k), and the filter coefficient W _n (k) of the signal to be determined;

The update degree determination module 53 is configured to determine the filter update degree Cef _update at least according to the far-end signal X _n (k), the near-end signal D _n (k) and the filter coefficient W _n (k);

The state determination module 54 is configured to determine whether the echo sound field state of the signal to be determined is the echo path change state at least according to the filter update degree Cef _{update being} greater than the preset update degree threshold Thrd _update.

In a specific implementation, the foregoing device may correspond to a chip with data processing function in user equipment, such as a baseband chip; or a chip module including a chip with data processing function in user equipment, or a user equipment.

For the principle, specific implementation and beneficial effects of the device for determining the state of the echo sound field, please refer to the foregoing and the related description of the method for determining the state of the echo sound field shown in FIGS. 2 to 4, which will not be repeated here.

The embodiment of the present invention also provides a storage medium on which computer instructions are stored, and the steps of the foregoing method are performed when the computer instructions are executed. The storage medium may be a computer-readable storage medium; for example, it may include non-volatile memory or non-transitory memory, and may also include optical disks, mechanical hard drives, solid state hard drives, and the like.

An embodiment of the present invention also provides a terminal, including a memory and a processor, where the memory stores computer instructions that can run on the processor, and the processor performs the steps of the above method when executing the computer instructions. The terminal includes, but is not limited to, terminal devices such as mobile phones, computers, and tablets.

The modules/units contained in the devices and products described in the above embodiments may be software modules/units, hardware modules/units, or partly software modules/units and partly hardware modules/units. For example, for a device or product applied to or integrated in a chip, all of its modules/units may be implemented in hardware such as circuits; alternatively, at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip, with the remaining modules/units (if any) implemented in hardware such as circuits. For a device or product applied to or integrated in a chip module, all of its modules/units may be implemented in hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) or in different components of the chip module; alternatively, at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip module, with the remaining modules/units (if any) implemented in hardware such as circuits. For a device or product applied to or integrated in a terminal, all of its modules/units may be implemented in hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) or in different components of the terminal; alternatively, at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the terminal, with the remaining modules/units (if any) implemented in hardware such as circuits.

Although the present invention is disclosed as above, the present invention is not limited to this. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention should be subject to the scope defined by the claims.

Claims

A method for determining the state of an echo sound field is characterized in that it comprises the following steps:

Obtain the signal to be determined;

Determine the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k) of the signal to be determined;

Determine the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k);

At least according to the filter update degree Cef_update being greater than the preset update degree threshold Thrd_update, it is determined whether the echo sound field state of the signal to be determined is the echo path change state.
The method for determining the state of the echo sound field according to claim 1, further comprising:

Determine whether the echo sound field state of the signal to be determined is a remote single talk state at least according to the filter update degree Cef update being less than or equal to the preset update degree threshold Thrd update.
The method for determining the state of the echo sound field according to claim 1, wherein determining the filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k) comprises:

Determining a residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k);

Determining an updated filter coefficient W_{n+1}(k) according to the residual signal E_n(k);

Determining the filter update degree Cef_update according to the filter coefficient W_n(k) and the updated filter coefficient W_{n+1}(k).
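The three steps of this claim can be illustrated with a minimal sketch. The formulas themselves are not reproduced in this text, so every expression below is an assumption: the residual is taken as D_n(k) minus the filtered far-end signal, the coefficient update is an NLMS-style step, and the update degree is the relative squared change of the coefficients. The function name `filter_update_degree` and the parameters `mu` and `eps` are hypothetical.

```python
def filter_update_degree(X, D, W, mu=0.5, eps=1e-10):
    """One frequency-domain adaptation step for a single frame n.

    X, D, W are lists of complex values, one entry per frequency bin k.
    Returns (E, W_next, cef_update). All three formulas are assumed
    forms chosen for illustration, not the patent's own equations.
    """
    # Step 1 (assumed): residual = near-end signal minus estimated echo W*X.
    E = [d - w * x for d, w, x in zip(D, W, X)]

    # Step 2 (assumed): NLMS-style update with a fixed step size mu.
    W_next = [
        w + mu * e * x.conjugate() / (abs(x) ** 2 + eps)
        for w, e, x in zip(W, E, X)
    ]

    # Step 3 (assumed): relative squared change of the coefficients.
    num = sum(abs(wn - w) ** 2 for wn, w in zip(W_next, W))
    den = sum(abs(w) ** 2 for w in W) + eps
    return E, W_next, num / den
```

Under these assumptions a converged filter gives an update degree near zero, while a filter that no longer matches the echo path gives a large value, which is the behavior the threshold comparison against Thrd_update relies on.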
The method for determining the state of the echo sound field according to claim 3, wherein one or more of the following is satisfied:

The following formula is used to determine the residual signal E_n(k):

The following formula is used to determine the updated filter coefficient W_{n+1}(k), where μ_n(k) denotes the update step size of the filter coefficient W_n(k):

The following formula is used to determine the filter update degree Cef_update:
The method for determining the state of the echo sound field according to claim 1, wherein before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method further comprises:

Performing voice activity detection on the near-end signal D_n(k) to obtain a near-end voice activity flag DVflag;

If the near-end voice activity flag DVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is an idle state.
The method for determining the state of the echo sound field according to claim 1, wherein before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method further comprises:

Performing voice activity detection on the far-end signal X_n(k) to obtain a far-end voice activity flag XVflag;

If the far-end voice activity flag XVflag is not equal to 1, determining that the echo sound field state of the signal to be determined is a near-end single talk state.
The method for determining the state of the echo sound field according to claim 1, wherein before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method further comprises:

Determining an echo suppression ratio Err of the signal to be determined;

If the echo suppression ratio Err is greater than a preset echo threshold Thrd_err, determining that the echo sound field state of the signal to be determined is a far-end single talk state.
The method for determining the state of the echo sound field according to claim 7, wherein determining the echo suppression ratio Err of the signal to be determined comprises:

Determining a residual signal E_n(k) according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k);

Determining the echo suppression ratio Err according to the near-end signal D_n(k) and the residual signal E_n(k).
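As a rough illustration of how an echo suppression ratio could be computed from D_n(k) and E_n(k), the sketch below assumes Err is the ratio of near-end energy to residual energy summed over the frequency bins k. This form is an assumption (the formula is not reproduced in this text); it is chosen because it grows as the filter removes more echo, which matches the claim that a large Err indicates the far-end single talk state. The names `echo_suppression_ratio`, `exceeds_echo_threshold` and `eps` are hypothetical.

```python
def echo_suppression_ratio(D, E, eps=1e-10):
    """Assumed form: near-end energy over residual energy, summed over bins k."""
    d_energy = sum(abs(d) ** 2 for d in D)
    e_energy = sum(abs(e) ** 2 for e in E)
    return d_energy / (e_energy + eps)


def exceeds_echo_threshold(err, thrd_err):
    """True when Err is greater than the preset echo threshold Thrd_err."""
    return err > thrd_err
```

With this definition, a residual that carries one quarter of the near-end energy gives Err of about 4; the threshold value itself would be tuned per product.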
The method for determining the state of the echo sound field according to claim 8, wherein one or more of the following is satisfied:

The following formula is used to determine the residual signal E_n(k):

The following formula is used to determine the echo suppression ratio Err:

Wherein, k is the frequency index of the signal to be determined.
The method for determining the state of the echo sound field according to claim 1, wherein before determining whether the echo sound field state of the signal to be determined is an echo path change state, the method further comprises:

Determining normalized cross-correlation values C_YE and C_DE;

If C_DE is greater than a first preset cross-correlation threshold Thrd1_coh and C_YE is less than a second preset cross-correlation threshold Thrd2_coh, determining that the echo sound field state of the signal to be determined is a dual-talk state;

Wherein, the first preset cross-correlation threshold Thrd1_coh is greater than or equal to the second preset cross-correlation threshold Thrd2_coh.
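The dual-talk test of this claim can be sketched as follows. The cross-correlation formula is not reproduced in this text, so the code assumes a conventional normalized cross-correlation over a band of bins from M to L. It also assumes that C_DE correlates the near-end signal D with the residual E, and that C_YE correlates an echo estimate Y (taken here to be the filter output) with E. The function names and the default threshold values are hypothetical.

```python
import math


def normalized_xcorr(A, B, m, l, eps=1e-10):
    """Assumed form: |sum A(k) conj(B(k))| / sqrt(sum|A|^2 * sum|B|^2), k = m..l."""
    a = A[m:l + 1]
    b = B[m:l + 1]
    num = abs(sum(x * y.conjugate() for x, y in zip(a, b)))
    den = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return num / (den + eps)


def is_dual_talk(c_de, c_ye, thrd1_coh=0.6, thrd2_coh=0.3):
    """Dual talk when C_DE > Thrd1_coh and C_YE < Thrd2_coh (illustrative thresholds)."""
    if thrd1_coh < thrd2_coh:
        raise ValueError("Thrd1_coh must be greater than or equal to Thrd2_coh")
    return c_de > thrd1_coh and c_ye < thrd2_coh
```

Under these assumptions, a residual that still resembles the near-end signal but no longer resembles the echo estimate suggests that near-end speech, rather than echo, dominates the residual.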
The method for determining the state of the echo sound field according to claim 10, further comprising one or more of the following:

If the filter update degree Cef_update is greater than the preset update degree threshold Thrd_update, determining that the echo sound field state of the signal to be determined is the echo path change state;

If the filter update degree Cef_update is less than or equal to the preset update degree threshold Thrd_update, determining that the echo sound field state of the signal to be determined is the far-end single talk state.
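Taken together, the checks in the preceding claims suggest a decision cascade. The sketch below is one possible ordering and is an assumption: the claims only state that each check precedes the echo path change decision, not how the checks are ordered among themselves. The function name `classify_state`, the string labels, and the keys of the threshold dictionary `thr` are hypothetical.

```python
def classify_state(dv_flag, xv_flag, err, c_de, c_ye, cef_update, thr):
    """Return an echo sound field state label for one frame.

    thr is a dict with keys "err", "coh1", "coh2" and "update" holding
    Thrd_err, Thrd1_coh, Thrd2_coh and Thrd_update. The order of the
    checks is an assumed ordering, not one stated by the claims.
    """
    if dv_flag != 1:
        return "idle"                      # no near-end activity
    if xv_flag != 1:
        return "near_end_single_talk"      # no far-end activity
    if err > thr["err"]:
        return "far_end_single_talk"       # echo is well suppressed
    if c_de > thr["coh1"] and c_ye < thr["coh2"]:
        return "dual_talk"
    if cef_update > thr["update"]:
        return "echo_path_change"          # filter is still changing a lot
    return "far_end_single_talk"
```

For example, with `thr = {"err": 10.0, "coh1": 0.6, "coh2": 0.3, "update": 0.5}`, a frame with both activity flags set, low Err, low C_DE and a large Cef_update is labeled as an echo path change.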
The method for determining the state of the echo sound field according to claim 10, wherein the following formula is used to determine the normalized cross-correlation values C_YE and C_DE:

Wherein, M and L are the frequency band indexes of the signal to be determined.
The method for determining the state of the echo sound field according to claim 12, wherein:

The normalized cross-correlation values C_YE and C_DE are normalized cross-correlation values in the linear region;

Wherein, M and L are the frequency band indexes of the linear region.
The method for determining the state of the echo sound field according to claim 1, further comprising:

Adjusting the update step size μ_n(k) of the signal to be determined according to the echo sound field state of the signal to be determined;

Wherein, the update step size μ_n(k) denotes the update step size of the filter coefficient W_n(k).
The method for determining the state of the echo sound field according to claim 14, wherein adjusting the update step size μ_n(k) comprises one or more of the following:

If it is determined that the echo sound field state of the signal to be determined is the echo path change state, increasing the update step size μ_n(k);

If it is determined that the echo sound field state of the signal to be determined is a dual-talk state, adjusting μ_n(k) to slow down the update;

If it is determined that the echo sound field state of the signal to be determined is the idle state or the near-end single talk state, setting μ_n(k) = 0.
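The step-size rules of this claim map naturally onto a small lookup. In the sketch below, the scaling factors `boost` and `slow` and the cap `mu_max` are hypothetical values chosen for illustration; the claim only states the direction of each adjustment. Leaving the step size unchanged in the far-end single talk state is likewise an assumption, since the claim does not address that state.

```python
def adjust_step_size(mu, state, boost=2.0, slow=0.1, mu_max=1.0):
    """Return an adjusted update step size for one bin, given the detected state."""
    if state in ("idle", "near_end_single_talk"):
        return 0.0                       # freeze adaptation
    if state == "echo_path_change":
        return min(mu * boost, mu_max)   # adapt faster, capped at mu_max
    if state == "dual_talk":
        return mu * slow                 # adapt slowly to limit divergence
    return mu                            # assumed: unchanged in other states
```

Freezing adaptation when there is no far-end excitation and slowing it during dual talk are common ways of keeping an adaptive echo filter from diverging.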
The method for determining the state of the echo sound field according to claim 14, wherein an echo adaptive filter is used to adjust the update step size μ_n(k) of the signal to be determined.
The method for determining the state of the echo sound field according to claim 1, further comprising:

According to the echo sound field state of the signal to be determined, it is determined whether to perform nonlinear processing on the signal to be determined.
The method for determining the state of the echo sound field according to claim 17, wherein determining whether to perform nonlinear processing on the signal to be determined comprises one or more of the following:

If it is determined that the echo sound field state of the signal to be determined is a dual-talk state, reduce the degree of non-linear processing;

If it is determined that the echo sound field state of the signal to be determined is an echo path change state, the nonlinear processing of the signal to be determined is enhanced;

If it is determined that the echo sound field state of the signal to be determined is the near-end single talk state, stop the non-linear processing of the signal to be determined;

If it is determined that the echo sound field state of the signal to be determined is an idle state, the non-linear processing of the signal to be determined is stopped.
The method for determining the state of the echo sound field according to claim 17, wherein a post-processing non-linear processing unit is used to perform non-linear processing on the signal to be determined.
The method for determining the state of the echo sound field according to claim 1, further comprising:

According to the echo sound field state of the signal to be determined, it is determined to reduce the noise update speed of the signal to be determined or to increase the non-stationary noise suppression capability of the signal to be determined.
The method for determining the state of the echo sound field according to claim 20, wherein the determining to reduce the noise update speed or to improve the non-stationary noise suppression capability includes one or more of the following:

If it is determined that the echo sound field state of the signal to be determined is the near-end single talk state, reducing the noise update speed of the signal to be determined;

If it is determined that the echo sound field state of the signal to be determined is a dual-talk state, reducing the noise update speed of the signal to be determined;

If it is determined that the echo sound field state of the signal to be determined is the far-end single talk state, improving the non-stationary noise suppression capability of the signal to be determined;

If it is determined that the echo sound field state of the signal to be determined is an echo path change state, the non-stationary noise suppression capability of the signal to be determined is improved.
The method for determining the state of the echo sound field according to claim 20, wherein a post-processing noise suppression unit is used to reduce the noise update speed of the signal to be determined or to improve the non-stationary noise suppression capability of the signal to be determined.
The method for determining the state of the echo sound field according to claim 1, further comprising:

Determining a temporary sound field state of the signal to be determined;

Determining, according to the echo sound field state and the temporary sound field state of the signal to be determined, to hold the dual-talk state output for the signal to be determined or to suspend the output of the echo path change state for the signal to be determined.
The method for determining the state of the echo sound field according to claim 23, wherein determining to hold the dual-talk state output for the signal to be determined or to suspend the output of the echo path change state for the signal to be determined comprises one or more of the following:

If the echo sound field state of the signal to be determined is a dual-talk state and the temporary sound field state is a far-end single talk state, holding the dual-talk state output for the signal to be determined for a holding time;

If the echo sound field state of the signal to be determined is a dual-talk state and the temporary sound field state is an echo path change state, temporarily suspending the output of the echo path change state for the signal to be determined based on a start time.
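One way to read the holding time and start time in this claim is as frame counters that smooth the output state. The sketch below is an interpretation under that assumption: the class `StateSmoother` and its parameters `hold_frames` and `onset_frames` are hypothetical, and the claim does not specify how either time is measured.

```python
class StateSmoother:
    """Smooths per-frame (temporary) states into an output state."""

    def __init__(self, hold_frames=10, onset_frames=5):
        self.hold_frames = hold_frames    # assumed holding time, in frames
        self.onset_frames = onset_frames  # assumed start time, in frames
        self.output = "idle"
        self._hold_left = 0
        self._onset_count = 0

    def update(self, temp_state):
        if self.output == "dual_talk" and temp_state == "far_end_single_talk":
            self._onset_count = 0
            if self._hold_left > 0:
                self._hold_left -= 1
                return self.output        # keep reporting dual talk
        elif self.output == "dual_talk" and temp_state == "echo_path_change":
            self._onset_count += 1
            if self._onset_count < self.onset_frames:
                return self.output        # suspend the echo path change output
        # Otherwise accept the temporary state as the new output state.
        self.output = temp_state
        self._onset_count = 0
        if temp_state == "dual_talk":
            self._hold_left = self.hold_frames
        return self.output
```

With `hold_frames=2`, two far-end single talk frames after dual talk still report dual talk, and the third switches the output. With `onset_frames=3`, an echo path change is reported only on its third consecutive frame.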
A device for determining the state of an echo sound field, characterized in that it comprises:

An acquisition module, configured to acquire a signal to be determined;

A signal determination module, configured to determine a far-end signal X_n(k), a near-end signal D_n(k) and a filter coefficient W_n(k) of the signal to be determined;

An update degree determination module, configured to determine a filter update degree Cef_update at least according to the far-end signal X_n(k), the near-end signal D_n(k) and the filter coefficient W_n(k);

A state determination module, configured to determine, at least according to the filter update degree Cef_update being greater than a preset update degree threshold Thrd_update, whether the echo sound field state of the signal to be determined is an echo path change state.
A storage medium having computer instructions stored thereon, wherein, when the computer instructions are run, the steps of the method for determining the state of the echo sound field according to any one of claims 1 to 24 are executed.
A terminal, comprising a memory and a processor, wherein computer instructions that can run on the processor are stored on the memory, and the processor, when running the computer instructions, executes the steps of the method for determining the state of the echo sound field according to any one of claims 1 to 24.