CN113345459A - Method and device for detecting double-talk state, computer equipment and storage medium - Google Patents


Publication number
CN113345459A
Authority
CN
China
Prior art keywords
signal
double
talk
far
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110805408.6A
Other languages
Chinese (zh)
Other versions
CN113345459B (en)
Inventor
秦永红
付贤会
刘武钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Rongxun Technology Co ltd
Original Assignee
Beijing Rongxun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Rongxun Technology Co ltd
Priority to CN202110805408.6A
Publication of CN113345459A
Application granted
Publication of CN113345459B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082: Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

The embodiment of the invention discloses a method and a device for detecting a double-talk state, computer equipment and a storage medium. The method comprises the following steps: acquiring a far-end speech reference signal and a near-end signal; determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal; passing the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal; determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal; calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal; and determining whether the current state is the double-talk state according to the double-talk detection decision value. The method improves the accuracy of double-talk state detection, adapts to various speaking contexts or scenes, is more robust than the prior art, and reduces voice interruption caused by false detection.

Description

Method and device for detecting double-talk state, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of audio signal processing, in particular to a method and a device for detecting a double-talk state, computer equipment and a storage medium.
Background
With the continuous development of information technology, various distributed intelligent hardware is increasingly widely applied in various fields, and echo suppression becomes a hot spot for research of technicians in related fields. Real-time transmission of speech over the internet has become widespread, and one of the key factors affecting speech quality is the problem of echo. One key indicator of the echo cancellation algorithm is double-talk detection, and if the double-talk detection is not accurate in the echo cancellation algorithm, voice interruption occurs. Therefore, in echo cancellation processing, double talk detection is crucial to speech quality.
At present, double-talk detection is mostly implemented with traditional time-domain or frequency-domain calculations, that is, detection is based on the cross-correlation coefficient between the far-end speech and the near-end speech, on spectrum calculations, and on similar ideas. However, the prior art has at least the following problems: in Voice over Internet Protocol (VoIP) communication, echo arises for complex reasons and is characterized by complex echo sources, large echo path delays, changing call scenes, changing call device types and the like, whereas the iteration factors and parameters of conventional double-talk detection algorithms are essentially fixed. Such algorithms are therefore limited and cannot adjust effectively as the speaking context or scene changes.
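As an illustration of the conventional approach described above, the following minimal Python sketch makes a double-talk decision from a zero-lag correlation coefficient and one fixed threshold. It is not taken from any particular prior-art system: the function names, the threshold of 0.9 and the test frames are all hypothetical, and a practical detector would also have to compensate for echo path delay.

```python
import math
from typing import Sequence


def correlation_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag Pearson correlation coefficient of two equal-length frames."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant frame carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def is_double_talk_fixed(far_end: Sequence[float], mic: Sequence[float],
                         threshold: float = 0.9) -> bool:
    """Declare double talk when the microphone frame is only weakly
    correlated with the far-end frame. The threshold is fixed (assumption)."""
    return abs(correlation_coefficient(far_end, mic)) < threshold


# Hypothetical frames: the echo is a scaled copy of the far-end signal.
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
echo_only = [0.5 * v for v in x]
near_end = [0.8, 0.8, -0.8, -0.8, 0.8, 0.8]
both = [e + z for e, z in zip(echo_only, near_end)]

echo_only_is_double_talk = is_double_talk_fixed(x, echo_only)
both_is_double_talk = is_double_talk_fixed(x, both)
```

Because the threshold never changes, the same value has to serve every device and room, which is the limitation the paragraph above points out.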
Disclosure of Invention
The embodiment of the invention provides a method and a device for detecting a double-talk state, computer equipment and a storage medium, which are used for improving the accuracy of double-talk state detection and are suitable for various scenes so as to reduce the problem of voice interruption caused by false detection.
In a first aspect, an embodiment of the present invention provides a method for detecting a double-talk state, where the method includes:
acquiring a far-end speech reference signal and a near-end signal;
determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
passing the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal;
and determining whether the current state is the double-talk state according to the double-talk detection decision value.
In a second aspect, an embodiment of the present invention further provides a device for detecting a double-talk state, where the device includes:
a signal acquisition module, configured to acquire a far-end speech reference signal and a near-end signal;
a near-end microphone input signal determination module, configured to determine a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
an estimated echo signal obtaining module, configured to pass the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
a residual echo output signal determining module, configured to determine a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
a double-talk detection decision value calculation module, configured to calculate a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal;
and a double-talk state determining module, configured to determine whether the current state is the double-talk state according to the double-talk detection decision value.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for detecting a double talk state provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting a double talk state provided in any embodiment of the present invention.
In the method for detecting a double-talk state provided by the embodiment of the invention, a far-end speech reference signal and a near-end signal are first acquired, and a near-end microphone input signal is determined according to them. At the same time, the far-end speech reference signal is passed through a preset adaptive filter to obtain an estimated echo signal. A residual echo output signal is then determined according to the near-end microphone input signal and the estimated echo signal. Finally, a double-talk detection decision value is calculated according to the estimated echo signal, the near-end signal and the residual echo output signal, and whether the current state is the double-talk state is determined according to that value. Because the decision value is recalculated adaptively from the estimated echo signal, the near-end signal and the residual echo output signal each time, the method improves the accuracy of double-talk state detection, adapts to various speaking contexts or scenes, is more robust than the prior art, and reduces voice interruption caused by false detection.
Drawings
Fig. 1 is a flowchart of a method for detecting a double-talk state according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a device for detecting a double-talk state according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a method for detecting a double talk state according to an embodiment of the present invention. The embodiment is applicable to the case of eliminating echo in a microphone collected signal, and the method can be executed by the device for detecting a double-talk state provided by the embodiment of the invention, and the device can be realized by hardware and/or software, and can be generally integrated in a computer device. As shown in fig. 1, the method specifically comprises the following steps:
S11. Acquiring a far-end speech reference signal and a near-end signal.
Specifically, during real-time voice transmission over the internet, the local computer device collects the near-end signal through its microphone and transmits it outward, and plays the received audio signal through its loudspeaker. While collecting the near-end signal, the microphone therefore also picks up the audio played by the loudspeaker, which produces an echo. The audio signal played by the loudspeaker may be used as the far-end speech reference signal.
S12. Determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal.
Specifically, after the far-end speech reference signal and the near-end signal are acquired, a signal actually acquired by the microphone, that is, a near-end microphone input signal, may be calculated according to the two signals.
Optionally, the determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal includes:
y(n)=z(n)+ξ*x(n)
where y(n) represents the near-end microphone input signal, z(n) represents the near-end signal, x(n) represents the far-end speech reference signal, and ξ*x(n) represents the superposition of the direct and reflected sounds of the far-end speech reference signal.
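The signal model above can be sketched in a few lines of Python. The sketch assumes that ξ is an echo-path impulse response and that "*" denotes convolution, which is one reading of "superposition of direct and reflected sounds"; the function names, the two-tap echo path and the sample values are hypothetical.

```python
from typing import List, Sequence


def convolve(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Causal convolution truncated to len(x): out[n] = sum_k h[k] * x[n - k]."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        out.append(acc)
    return out


def microphone_signal(z: Sequence[float], x: Sequence[float],
                      xi: Sequence[float]) -> List[float]:
    """y(n) = z(n) + (xi * x)(n), with '*' taken as convolution (assumption)."""
    echo = convolve(xi, x)
    return [zn + en for zn, en in zip(z, echo)]


x = [1.0, 0.0, 0.0, 0.0]   # far-end speech reference signal: a unit impulse
z = [0.0, 0.0, 0.5, 0.0]   # near-end signal
xi = [0.6, 0.3]            # hypothetical echo path: direct sound plus one reflection
y = microphone_signal(z, x, xi)
print(y)  # [0.6, 0.3, 0.5, 0.0]
```

The first two samples of y are pure echo and the third is pure near-end speech, so the microphone signal mixes both contributions exactly as the formula states.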
S13. Passing the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal.
Specifically, after the far-end speech reference signal is obtained, the far-end speech reference signal may be input to a preset adaptive filter, so as to obtain an estimated echo signal. Wherein the estimated echo signal obtained by the adaptive filter may be
Figure BDA0003166324670000051
Where N may be the number of partitioned subbands.
S14. Determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal.
Specifically, after the near-end microphone input signal and the estimated echo signal are obtained, the residual echo output signal, that is, the signal after echo removal, may be calculated from the two.
Optionally, the determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal includes:
e(n)=y(n)-ŷ(n)
where e(n) represents the residual echo output signal, y(n) represents the near-end microphone input signal, and ŷ(n) represents the estimated echo signal.
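Steps S13 and S14 can be illustrated with a small time-domain example. The text does not specify the adaptive filter's structure or update rule, so the sketch below substitutes a plain FIR filter with a normalized LMS (NLMS) update, a common choice for echo cancellation, and does not model the sub-band partitioning mentioned above. The function name, tap count, step size and echo path are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_canceller(x: Sequence[float], y: Sequence[float], taps: int = 4,
                        step: float = 0.5, eps: float = 1e-8
                        ) -> Tuple[List[float], List[float]]:
    """Return (estimated echo, residual echo) for far-end x and microphone y."""
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent far-end samples, newest first
    y_hat: List[float] = []
    e: List[float] = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        est = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo
        err = yn - est                                 # residual: e(n) = y(n) - estimate
        norm = sum(b * b for b in buf) + eps
        w = [wi + step * err * bi / norm for wi, bi in zip(w, buf)]
        y_hat.append(est)
        e.append(err)
    return y_hat, e


# Hypothetical far-end-only scenario: the microphone hears nothing but echo.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
path = [0.6, 0.3]  # hypothetical echo path
y = [sum(path[k] * x[n - k] for k in range(len(path)) if n - k >= 0)
     for n in range(len(x))]

y_hat, e = nlms_echo_canceller(x, y)
tail_power = sum(v * v for v in e[-50:]) / 50  # residual power after convergence
```

With no near-end speech, the residual power in the final frames falls far below the echo power. That is the situation in which the residual echo output signal is much smaller than the adaptive filter output.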
S15. Calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal.
Specifically, after the estimated echo signal, the near-end signal and the residual echo output signal are obtained, a double-talk detection decision value can be calculated from the three signals, so that whether the current state is the double-talk state can be determined according to that value.
Optionally, the calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal includes:
Figure BDA0003166324670000063
where λ represents the double-talk detection decision value; the symbol in Figure BDA0003166324670000064 represents the estimated echo signal; e(n) represents the residual echo output signal; z(n) represents the near-end signal; the symbols in Figures BDA0003166324670000065, BDA0003166324670000066 and BDA0003166324670000067 represent the mean square error of the estimated echo signal, of the residual echo output signal and of the near-end signal, respectively; and μ represents an amplification factor, which may specifically satisfy μ ≥ 1. Because the decision value is computed from mean square errors, which are statistical characteristics of the time-domain signals, the method is more robust.
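The formula for λ is an equation image that is not reproduced in this text, so the expression in the sketch below is NOT the patent's formula. It is a hypothetical form chosen only to match the behaviour stated in the description: λ is close to 1 when the near-end signal is close to 0, clearly smaller than 1 otherwise, and more discriminative as μ grows. The function names are assumptions, and the "mean square error" of each signal is approximated by its mean square value.

```python
from typing import Sequence


def mean_square(signal: Sequence[float]) -> float:
    """Mean of the squared samples of one frame."""
    return sum(v * v for v in signal) / len(signal)


def decision_value(y_hat: Sequence[float], e: Sequence[float],
                   z: Sequence[float], mu: float = 2.0,
                   eps: float = 1e-12) -> float:
    """Hypothetical decision value (assumption, not the patent's formula):
    lambda = P(y_hat) / (P(y_hat) + P(e) + mu * P(z))."""
    p_hat = mean_square(y_hat)
    p_e = mean_square(e)
    p_z = mean_square(z)
    return p_hat / (p_hat + p_e + mu * p_z + eps)


y_hat = [1.0, -1.0, 1.0, -1.0]      # estimated echo signal
e = [0.01, -0.01, 0.01, -0.01]      # small residual echo
silence = [0.0, 0.0, 0.0, 0.0]      # near-end signal close to 0
speech = [0.5, -0.5, 0.5, -0.5]     # near-end signal present

lam_single = decision_value(y_hat, e, silence)  # close to 1
lam_double = decision_value(y_hat, e, speech)   # clearly below 1
```

Under this assumed form, raising μ pushes the double-talk value further from 1 while leaving the single-talk value unchanged, which mirrors the role the description gives to the amplification factor.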
S16. Determining whether the current state is the double-talk state according to the double-talk detection decision value.
Optionally, the double-talk detection decision value includes a double-talk detection decision value for each sub-band, and the determining whether the current state is the double-talk state according to the double-talk detection decision value includes: comparing the decision value of each sub-band with a preset decision value threshold, and counting the number of sub-bands whose decision value is greater than or equal to the threshold; and comparing that number of sub-bands with a preset number: if the number of sub-bands is greater than or equal to the preset number, the current state is determined to be a single-talk state; otherwise, it is determined to be a double-talk state. Optionally, the preset decision value threshold lies in the range 0 to 1. Because the residual echo output signal is far smaller than the output signal of the adaptive filter, the decision value is close to 1 when the near-end signal is close to 0 and clearly smaller than 1 when the near-end signal is not 0. The threshold can therefore be chosen between 0 and 1, and the amplification factor μ gives double-talk detection a high degree of discrimination.
Specifically, after the decision value of each sub-band is calculated, the number of sub-bands whose decision value is greater than or equal to the preset decision value threshold is counted and compared with the preset number. If at least the preset number of sub-bands have a decision value greater than or equal to the threshold, the current state is judged to be the single-talk state; otherwise, it is judged to be the double-talk state. The preset number adjusts the sensitivity of double-talk detection, and its setting is not restricted by environmental factors.
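The sub-band vote described above can be written compactly. In the minimal sketch below, the function name, the threshold of 0.8, the preset number of 3 and the per-sub-band values are hypothetical; only the comparison logic follows the text.

```python
from typing import Sequence


def classify_talk_state(lambdas: Sequence[float], threshold: float,
                        preset_number: int) -> str:
    """Count the sub-bands whose decision value reaches the threshold and
    compare that count with the preset number."""
    count = sum(1 for lam in lambdas if lam >= threshold)
    return "single-talk" if count >= preset_number else "double-talk"


# Four sub-bands, threshold 0.8, at least 3 sub-bands required for single talk.
print(classify_talk_state([0.95, 0.90, 0.40, 0.97], 0.8, 3))  # single-talk
print(classify_talk_state([0.95, 0.30, 0.40, 0.50], 0.8, 3))  # double-talk
```

Raising the preset number makes the detector declare double talk more readily, and lowering it makes the single-talk decision easier to reach. This is how the preset number adjusts sensitivity.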
On the basis of the foregoing technical solution, optionally, after determining whether the current state is the double-talk state according to the double-talk detection decision value, the method further includes: if the current state is determined to be the double-talk state, taking the residual echo output signal as the final output signal to complete echo suppression. Specifically, if the current state is determined to be the double-talk state, the echo needs to be cancelled, which may specifically include: acquiring the far-end speech reference signal x(n); passing x(n) through the adaptive filter to obtain an output signal
Figure BDA0003166324670000071
Meanwhile, the near-end microphone input signal y(n)=z(n)+ξ*x(n) can be obtained, where z(n) is the near-end signal and ξ*x(n) is the superposition of the direct and reflected sounds of the far-end speech reference signal; calculating the output signal
Figure BDA0003166324670000072
Echo suppression is thereby completed.
According to the technical scheme provided by the embodiment of the invention, a far-end speech reference signal and a near-end signal are first acquired, and a near-end microphone input signal is determined according to them. At the same time, the far-end speech reference signal is passed through a preset adaptive filter to obtain an estimated echo signal. A residual echo output signal is then determined according to the near-end microphone input signal and the estimated echo signal. Finally, a double-talk detection decision value is calculated according to the estimated echo signal, the near-end signal and the residual echo output signal, and whether the current state is the double-talk state is determined according to that value. Because the decision value is recalculated adaptively from the estimated echo signal, the near-end signal and the residual echo output signal each time, the scheme improves the accuracy of double-talk state detection, adapts to various speaking contexts or scenes, is more robust than the prior art, and reduces voice interruption caused by false detection.
Example two
Fig. 2 is a schematic structural diagram of a device for detecting a double-talk state according to a second embodiment of the present invention. The device may be implemented in hardware and/or software, and may generally be integrated in a computer device to execute the method for detecting a double-talk state according to any embodiment of the present invention. As shown in fig. 2, the device includes:
a signal obtaining module 21, configured to obtain a far-end speech reference signal and a near-end signal;
a near-end microphone input signal determining module 22, configured to determine a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
an estimated echo signal obtaining module 23, configured to pass the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
a residual echo output signal determining module 24, configured to determine a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
a double-talk detection decision value calculation module 25, configured to calculate a double-talk detection decision value according to the estimated echo signal, the near-end signal, and the residual echo output signal;
and a double-talk state determining module 26, configured to determine whether the current state is the double-talk state according to the double-talk detection decision value.
According to the technical scheme provided by the embodiment of the invention, a far-end speech reference signal and a near-end signal are first acquired, and a near-end microphone input signal is determined according to them. At the same time, the far-end speech reference signal is passed through a preset adaptive filter to obtain an estimated echo signal. A residual echo output signal is then determined according to the near-end microphone input signal and the estimated echo signal. Finally, a double-talk detection decision value is calculated according to the estimated echo signal, the near-end signal and the residual echo output signal, and whether the current state is the double-talk state is determined according to that value. Because the decision value is recalculated adaptively from the estimated echo signal, the near-end signal and the residual echo output signal each time, the scheme improves the accuracy of double-talk state detection, adapts to various speaking contexts or scenes, is more robust than the prior art, and reduces voice interruption caused by false detection.
On the basis of the above technical solution, optionally, the double-talk state determining module 26 includes:
a sub-band number counting unit, configured to compare the double-talk detection decision value of each sub-band with a preset decision value threshold, and to count the number of sub-bands whose decision value is greater than or equal to the threshold;
and a sub-band number comparison unit, configured to compare the number of sub-bands with a preset number, and to determine that the current state is a single-talk state if the number of sub-bands is greater than or equal to the preset number, and a double-talk state otherwise.
On the basis of the above technical solution, optionally, the preset decision value threshold lies in the range 0 to 1.
On the basis of the above technical solution, optionally, the near-end microphone input signal determining module 22 is specifically configured to:
y(n)=z(n)+ξ*x(n)
where y(n) represents the near-end microphone input signal, z(n) represents the near-end signal, x(n) represents the far-end speech reference signal, and ξ*x(n) represents the superposition of the direct and reflected sounds of the far-end speech reference signal.
On the basis of the above technical solution, optionally, the residual echo output signal determining module 24 is specifically configured to:
e(n)=y(n)-ŷ(n)
where e(n) represents the residual echo output signal, y(n) represents the near-end microphone input signal, and ŷ(n) represents the estimated echo signal.
On the basis of the above technical solution, optionally, the double-talk detection decision value calculation module 25 is specifically configured to:
Figure BDA0003166324670000103
where λ represents the double-talk detection decision value; the symbol in Figure BDA0003166324670000104 represents the estimated echo signal; e(n) represents the residual echo output signal; z(n) represents the near-end signal; the symbols in Figures BDA0003166324670000105, BDA0003166324670000106 and BDA0003166324670000107 represent the mean square error of the estimated echo signal, of the residual echo output signal and of the near-end signal, respectively; and μ represents the amplification factor.
On the basis of the above technical solution, optionally, the device for detecting a double-talk state further includes:
an echo suppression module, configured to, after whether the current state is the double-talk state has been determined according to the double-talk detection decision value, take the residual echo output signal as the final output signal to complete echo suppression if the current state is determined to be the double-talk state.
The device for detecting the double-talk state provided by the embodiment of the invention can execute the method for detecting the double-talk state provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the device for detecting a double-talk state, the included units and modules are divided only according to functional logic; the division is not limited to the one above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are used only to distinguish them from one another and do not limit the protection scope of the present invention.
Example three
Fig. 3 is a schematic structural diagram of a computer device provided in the third embodiment of the present invention, and shows a block diagram of an exemplary computer device suitable for implementing the embodiment of the present invention. The computer device shown in fig. 3 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present invention. As shown in fig. 3, the computer apparatus includes a processor 31, a memory 32, an input device 33, and an output device 34; the number of the processors 31 in the computer device may be one or more, one processor 31 is taken as an example in fig. 3, the processor 31, the memory 32, the input device 33 and the output device 34 in the computer device may be connected by a bus or in other ways, and the connection by the bus is taken as an example in fig. 3.
The memory 32, as a computer-readable storage medium, may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for detecting a double-talk state in the embodiment of the present invention (for example, the signal acquisition module 21, the near-end microphone input signal determination module 22, the estimated echo signal acquisition module 23, the residual echo output signal determination module 24, the double-talk detection decision value calculation module 25, and the double-talk state determination module 26 in the device for detecting a double-talk state). The processor 31 executes the various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 32, that is, it implements the above-mentioned method for detecting a double-talk state.
The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 32 may further include memory located remotely from the processor 31, which may be connected to a computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 33 may be used to acquire a far-end speech reference signal and a near-end signal, and to generate key signal inputs and the like relating to user settings and function control of the computer apparatus. The output device 34 may be used to output the processed target audio data and the like.
Example four
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for detecting a double-talk state, the method including:
acquiring a far-end speech reference signal and a near-end signal;
determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
passing the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal, and the residual echo output signal; and
determining, according to the double-talk detection decision value, whether the current state is the double-talk state.
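The data flow of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `detect_double_talk` and the four callables passed to it are hypothetical placeholders, and the sample-wise subtraction used for the residual is the usual echo-canceller convention assumed here.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def detect_double_talk(
    far_end: Signal,
    near_end: Signal,
    mix: Callable[[Signal, Signal], List[float]],
    estimate_echo: Callable[[Signal], List[float]],
    decision_value: Callable[[Signal, Signal, Signal], float],
    is_double_talk: Callable[[float], bool],
) -> bool:
    """Chain the six steps; every callable is a hypothetical stand-in."""
    # Steps 1-2: the two signals are already acquired; build the microphone input.
    mic_input = mix(far_end, near_end)
    # Step 3: output of the adaptive filter (estimated echo).
    echo_hat = estimate_echo(far_end)
    # Step 4: residual echo output (assumed: sample-wise difference).
    residual = [y - y_hat for y, y_hat in zip(mic_input, echo_hat)]
    # Step 5: decision value from the estimated echo, near-end signal, and residual.
    lam = decision_value(echo_hat, near_end, residual)
    # Step 6: map the decision value to a talk state.
    return is_double_talk(lam)
```

Passing the stages in as callables keeps the sketch independent of any particular filter or decision formula, neither of which is fixed by the step list itself.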
The storage medium may be any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different second computer system connected to the computer system through a network (such as the internet). The second computer system may provide the program instructions to the computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method for detecting a double-talk state provided by any embodiment of the present invention.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software together with the necessary general-purpose hardware, and can also be implemented by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for detecting a double talk state is characterized by comprising the following steps:
acquiring a far-end speech reference signal and a near-end signal;
determining a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
passing the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
determining a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal, and the residual echo output signal; and
determining, according to the double-talk detection decision value, whether the current state is the double-talk state.
2. The method according to claim 1, wherein the double-talk detection decision value includes a double-talk detection decision value for each sub-band, and the determining whether the current state is the double-talk state according to the double-talk detection decision value includes:
comparing the double-talk detection decision value of each sub-band with a preset decision value threshold, and counting the number of sub-bands whose double-talk detection decision value is greater than or equal to the preset decision value threshold; and
comparing the number of sub-bands with a preset number; if the number of sub-bands is greater than or equal to the preset number, determining that the current state is a single-talk state, and otherwise determining that the current state is the double-talk state.
3. The method according to claim 2, wherein the preset decision value threshold is in the range of 0 to 1.
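The sub-band counting rule of claims 2 and 3 can be illustrated with a short sketch. The function and parameter names (`classify_talk_state`, `threshold`, `min_count`) are hypothetical, and the per-sub-band decision values are assumed to have been computed already.

```python
from typing import Sequence


def classify_talk_state(
    decision_values: Sequence[float],
    threshold: float,
    min_count: int,
) -> str:
    """Vote over sub-bands as described in claims 2 and 3.

    decision_values: one double-talk detection decision value per sub-band.
    threshold: preset decision value threshold, within 0 to 1 (claim 3).
    min_count: preset number of sub-bands.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is expected to lie in the range 0 to 1")
    # Count the sub-bands whose decision value reaches the threshold.
    count = sum(1 for value in decision_values if value >= threshold)
    # Enough sub-bands at or above the threshold means single-talk.
    return "single-talk" if count >= min_count else "double-talk"
```

For example, with four sub-band values `[0.9, 0.8, 0.2, 0.7]`, a threshold of `0.5`, and a preset number of `3`, three sub-bands reach the threshold, so the state is classified as single-talk.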
4. The method of claim 1, wherein the determining a near-end microphone input signal from the far-end speech reference signal and the near-end signal comprises:
y(n)=z(n)+ξ*x(n)
where y(n) represents the near-end microphone input signal, z(n) represents the near-end signal, x(n) represents the far-end speech reference signal, and ξ*x(n) represents the superposition of the direct and reflected sounds of the far-end speech reference signal.
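As an illustration of this signal model, the sketch below treats ξ as a short echo-path impulse response and `*` as convolution. That reading is an assumption; a single-element `xi` reduces it to a plain scalar gain. The function name `near_end_mic_input` is hypothetical.

```python
from typing import List, Sequence


def near_end_mic_input(
    z: Sequence[float],
    x: Sequence[float],
    xi: Sequence[float],
) -> List[float]:
    """Compute y(n) = z(n) + sum_k xi[k] * x(n - k).

    z: near-end signal, x: far-end speech reference signal,
    xi: assumed echo-path taps (direct sound first, reflections after).
    """
    y = []
    for n in range(len(z)):
        # Superpose the direct and delayed (reflected) copies of x.
        echo = sum(
            xi[k] * x[n - k]
            for k in range(len(xi))
            if 0 <= n - k < len(x)
        )
        y.append(z[n] + echo)
    return y
```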
5. The method of claim 1, wherein determining a residual echo output signal based on the near-end microphone input signal and the estimated echo signal comprises:
e(n)=y(n)-ŷ(n)
wherein e(n) represents the residual echo output signal, y(n) represents the near-end microphone input signal, and ŷ(n) represents the estimated echo signal.
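The claims do not state which adaptation algorithm the preset adaptive filter uses. As one common choice, the sketch below assumes a normalized LMS (NLMS) filter and takes the residual as the sample-wise difference between the microphone input and the estimated echo. The function name and the parameters `taps`, `step`, and `eps` are hypothetical.

```python
from typing import List, Sequence, Tuple


def nlms_echo_canceller(
    x: Sequence[float],
    y: Sequence[float],
    taps: int = 4,
    step: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (estimated echo, residual echo output, final filter weights).

    x: far-end speech reference signal, y: near-end microphone input signal.
    NLMS is an assumed stand-in for the "preset adaptive filter".
    """
    w = [0.0] * taps          # adaptive filter coefficients
    buf = [0.0] * taps        # most recent far-end samples, newest first
    y_hat: List[float] = []
    e: List[float] = []
    for n in range(len(y)):
        buf = [x[n]] + buf[:-1]
        est = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo sample
        err = y[n] - est                               # residual echo sample
        norm = sum(b * b for b in buf) + eps           # input energy (regularized)
        w = [wi + step * err * bi / norm for wi, bi in zip(w, buf)]
        y_hat.append(est)
        e.append(err)
    return y_hat, e, w
```

In this sketch the filter adapts on every sample. A practical canceller would typically slow or freeze adaptation while double-talk is detected, which is one reason a double-talk decision is needed.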
6. The method of claim 1, wherein said calculating a double talk detection decision value based on said estimated echo signal, said near-end signal and said residual echo output signal comprises:
Figure FDA0003166324660000023
wherein λ represents the double talk detection determination value,
Figure FDA0003166324660000024
representing said estimated echo signal, e(n) representing said residual echo output signal, z(n) representing said near-end signal,
Figure FDA0003166324660000025
representing the mean square error of the estimated echo signal,
Figure FDA0003166324660000026
representing the mean square error of the residual echo output signal,
Figure FDA0003166324660000027
represents the mean square error of the near-end signal and μ represents the amplification factor.
7. The method for detecting a double talk state according to claim 1, further comprising, after determining whether the double talk state is currently present according to the double talk detection determination value:
and if the current state is determined to be the double-talk state, taking the residual echo output signal as a final output signal to finish echo suppression.
8. A device for detecting a double-talk state, comprising:
the signal acquisition module is used for acquiring a far-end voice reference signal and a near-end signal;
a near-end microphone input signal determination module, configured to determine a near-end microphone input signal according to the far-end speech reference signal and the near-end signal;
an estimated echo signal obtaining module, configured to pass the far-end speech reference signal through a preset adaptive filter to obtain an estimated echo signal;
a residual echo output signal determining module, configured to determine a residual echo output signal according to the near-end microphone input signal and the estimated echo signal;
the double-talk detection decision value calculation module is used for calculating a double-talk detection decision value according to the estimated echo signal, the near-end signal and the residual echo output signal;
and the double-talk state determining module is used for determining whether the current state is the double-talk state according to the double-talk detection decision value.
9. A computer device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for detecting a double-talk state as recited in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for detecting a double talk state according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110805408.6A CN113345459B (en) 2021-07-16 2021-07-16 Method and device for detecting double-talk state, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110805408.6A CN113345459B (en) 2021-07-16 2021-07-16 Method and device for detecting double-talk state, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113345459A (en) 2021-09-03
CN113345459B (en) 2023-02-21

Family

ID=77480046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110805408.6A Active CN113345459B (en) 2021-07-16 2021-07-16 Method and device for detecting double-talk state, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113345459B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008131593A (en) * 2006-11-24 2008-06-05 Nippon Telegr & Teleph Corp <Ntt> Method of deciding double talk state, echo eraser using same and its, program and recording medium therefore
US20150023514A1 (en) * 2012-03-23 2015-01-22 Dolby Laboratories Licensing Corporation Method and Apparatus for Acoustic Echo Control
CN111083297A (en) * 2019-11-14 2020-04-28 维沃移动通信(杭州)有限公司 Echo cancellation method and electronic equipment
CN112185404A (en) * 2019-07-05 2021-01-05 南京工程学院 Low-complexity double-end detection method based on sub-band signal-to-noise ratio estimation
CN112292844A (en) * 2019-05-22 2021-01-29 深圳市汇顶科技股份有限公司 Double-end call detection method, double-end call detection device and echo cancellation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
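Xie Peng et al.: "Echo cancellation system based on sub-band affine projection and sub-band double-talk detection algorithms", Communications Technology (《通信技术》) *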

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113949776A (en) * 2021-10-19 2022-01-18 随锐科技集团股份有限公司 Double-end talk detection method and device based on double-step fast echo cancellation
CN113949776B (en) * 2021-10-19 2024-04-16 随锐科技集团股份有限公司 Double-end speaking detection method and device based on double-step rapid echo cancellation
CN114650340A (en) * 2022-04-21 2022-06-21 深圳市中科蓝讯科技股份有限公司 Echo cancellation method and device and electronic equipment

Also Published As

Publication number Publication date
CN113345459B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
EP0901267B1 (en) The detection of the speech activity of a source
US8014519B2 (en) Cross-correlation based echo canceller controllers
CN109473118B (en) Dual-channel speech enhancement method and device
US8498407B2 (en) Systems and methods for double-talk detection in acoustically harsh environments
CN113345459B (en) Method and device for detecting double-talk state, computer equipment and storage medium
US9378754B1 (en) Adaptive spatial classifier for multi-microphone systems
US10771621B2 (en) Acoustic echo cancellation based sub band domain active speaker detection for audio and video conferencing applications
US10978086B2 (en) Echo cancellation using a subset of multiple microphones as reference channels
US11349525B2 (en) Double talk detection method, double talk detection apparatus and echo cancellation system
CN110634496B (en) Double-talk detection method and device, computer equipment and storage medium
CN110995951B (en) Echo cancellation method, device and system based on double-end sounding detection
US8019075B2 (en) Hybrid echo canceller controllers
US8081753B2 (en) Hybrid echo canceller controllers
Papp et al. Hands-free voice communication with TV
CN110431624B (en) Residual echo detection method, residual echo detection device, voice processing chip and electronic equipment
US8831210B2 (en) Method and system for detection of onset of near-end signal in an echo cancellation system
CN111028855B (en) Echo suppression method, device, equipment and storage medium
CN111883153B (en) Microphone array-based double-end speaking state detection method and device
CN111970610B (en) Echo path detection method, audio signal processing method and system, storage medium, and terminal
CN111989934B (en) Echo cancellation device, echo cancellation method, signal processing chip, and electronic apparatus
CN111355855A (en) Echo processing method, device, equipment and storage medium
CN113393853B (en) Method and apparatus for processing mixed sound signal, storage medium, and electronic apparatus
CN112165558B (en) Method and device for detecting double-talk state, storage medium and terminal equipment
CN111883155A (en) Echo cancellation method, device and storage medium
CN113223547B (en) Double-talk detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A dual talk state detection method, device, computer equipment, and storage medium

Effective date of registration: 20231201

Granted publication date: 20230221

Pledgee: Beijing Yizhuang International Financing Guarantee Co.,Ltd.

Pledgor: Beijing Rongxun Technology Co.,Ltd.

Registration number: Y2023980068991
