CN113470677A - Audio processing method, device and system - Google Patents

Audio processing method, device and system

Info

Publication number
CN113470677A
Authority
CN
China
Prior art keywords
module
audio signal
processing
filtering
end audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110741217.8A
Other languages
Chinese (zh)
Inventor
修平平
刘焕
朱赛男
严涛
浦宏杰
鄢仁祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd
Priority to CN202110741217.8A
Publication of CN113470677A
Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 Public address systems
    • H04R27/04 Electric megaphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/009 Signal processing in [PA] systems to enhance the speech intelligibility

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides an audio processing method, an audio processing device and an audio processing system. The audio processing method comprises the following steps: inputting a far-end audio signal into an adaptive filtering module and acquiring a first filter coefficient after the adaptive filtering module converges; using the first filter coefficient as the filter coefficient of a reference filtering module of a double-filtering processing module, the double-filtering processing module comprising the reference filtering module and a real-time filtering module; performing double filtering processing on a near-end audio signal through the double-filtering processing module; mixing the double-filtered near-end audio signal with the far-end audio signal to obtain a mixed signal; and playing the mixed signal through near-end public address equipment, wherein the near-end public address equipment and the near-end acquisition equipment that captures the near-end audio signal are located in the same space. The application can reduce public address feedback.

Description

Audio processing method, device and system
Technical Field
The present application relates to the field of audio processing, and in particular, to an audio processing method, apparatus and system.
Background
In remote interaction scenarios such as education recording-and-broadcasting systems and online meetings, speech collected at the near end (locally) must be amplified and played locally so that every participant in a large local space can hear it at sufficient volume and clarity. At the same time, the system must connect to a network to support online remote interaction: people in the far-end space can hear the speech collected at the near end in real time, and their own speech, captured by the far-end acquisition equipment, is amplified and played at the near end, thereby realizing remote interaction.
In such a scenario, when the near-end acquisition device, the far-end acquisition device, the near-end public address device and the far-end public address device are used almost simultaneously, these devices may give rise to echo and howling while audio signals are acquired, played and transmitted.
How to alleviate the generation of echo and howling in a scene where a near-end acquisition device, a far-end acquisition device, a near-end public address device and a far-end public address device are used simultaneously is a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In order to overcome the defects in the prior art, the application provides an audio processing method, an audio processing device and an audio processing system, which can alleviate the generation of echo and howling in a scene where a near-end acquisition device, a far-end acquisition device, a near-end public address device and a far-end public address device are used simultaneously.
According to an aspect of the present application, there is provided an audio processing method including:
inputting a far-end audio signal into an adaptive filtering module, and acquiring a first filter coefficient after the adaptive filtering module converges;
using the first filter coefficient as the filter coefficient of a reference filtering module of a double-filtering processing module, wherein the double-filtering processing module comprises the reference filtering module and a real-time filtering module;
performing double filtering processing on the near-end audio signal through the double-filtering processing module;
mixing the near-end audio signal after the double filtering processing with the far-end audio signal to obtain a mixed signal;
and playing the mixed signal through near-end public address equipment, wherein the near-end public address equipment and the near-end acquisition equipment of the near-end audio signal are located in the same space.
In some embodiments of the present application, after obtaining the converged first filter coefficient of the adaptive filter module and before taking the first filter coefficient as the filter coefficient of the reference filter module, the method further includes:
performing confidence calculation on the first filter coefficient;
and when the confidence coefficient of the first filter coefficient is larger than a preset threshold value, executing the step of taking the first filter coefficient as the filter coefficient of the reference filter module.
In some embodiments of the present application, the performing confidence calculation on the first filter coefficient includes:
obtaining a plurality of self-adaptive filtering curves according to a plurality of historical first filtering coefficients and gains of historical audio signals at each frequency point, wherein the historical audio signals are output by the self-adaptive filtering modules corresponding to the historical first filtering coefficients;
clustering the multiple self-adaptive filter curves to obtain a reference first filter coefficient;
calculating the confidence of the first filter coefficient according to the reference first filter coefficient.
In some embodiments of the present application, after the double-filtering processing module performs double filtering processing on the near-end audio signal, and before the double-filtered near-end audio signal is mixed with the far-end audio signal to obtain the mixed signal, the method further includes:
performing decorrelation processing on the double-filtered near-end audio signal, so that the decorrelated near-end audio signal is mixed with the far-end audio signal; and/or
performing automatic equalization processing on set frequency points in the double-filtered near-end audio signal, so that the automatically equalized near-end audio signal is mixed with the far-end audio signal.
In some embodiments of the present application, performing automatic equalization processing on the set frequency points in the double-filtered near-end audio signal includes:
making the response sensitivity at the set frequency points in the near-end audio signal after the automatic equalization processing lower than the response sensitivity at the set frequency points in the near-end audio signal before the automatic equalization processing.
In some embodiments of the present application, after the inputting the far-end audio signal into the adaptive filtering module, the method further includes:
estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module;
and removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the resulting signal to the opposite end, where it serves as the far-end audio signal of the opposite end.
In some embodiments of the present application, the reference filtering module and the real-time filtering module respectively process the near-end audio signals, wherein the near-end audio signals output by the double filtering process are obtained by weighting the near-end audio signals output by the reference filtering module and the near-end audio signals output by the real-time filtering module.
According to still another aspect of the present application, there is also provided an audio processing apparatus including:
an adaptive filtering module configured to filter the far-end audio signal;
an acquisition module configured to input the far-end audio signal into the adaptive filtering module and acquire a first filter coefficient after the adaptive filtering module converges;
a parameter extraction module configured to use the first filter coefficient as the filter coefficient of a reference filtering module, wherein a double-filtering processing module includes the reference filtering module and a real-time filtering module;
the double-filtering processing module, configured to perform double filtering processing on the near-end audio signal;
an audio mixing module configured to mix the near-end audio signal after the double filtering processing with the far-end audio signal to obtain a mixed signal;
and a control module configured to control a near-end public address device to play the mixed signal, wherein the near-end public address device and a near-end acquisition device of the near-end audio signal are located in the same space.
In some embodiments of the present application, further comprising:
a decorrelation module configured to perform decorrelation processing on the double-filtered near-end audio signal, so that the decorrelated near-end audio signal is mixed with the far-end audio signal; and/or
an automatic equalization module configured to perform automatic equalization processing on set frequency points in the double-filtered near-end audio signal, so that the automatically equalized near-end audio signal is mixed with the far-end audio signal.
According to yet another aspect of the present application, there is also provided an audio processing system, comprising:
a near-end acquisition device for the near-end audio signal;
a near-end public address device located in the same space as the near-end acquisition device;
and the near-end audio processing apparatus as described above.
According to yet another aspect of the present application, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to yet another aspect of the present application, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Therefore, compared with the prior art, the scheme provided by the application has the following advantages:
The application inputs the far-end audio signal into the adaptive filtering module and uses the first filter coefficient acquired after the adaptive filtering module converges as the filter coefficient of the reference filtering module of the double-filtering processing module. This reduces the influence of the far-end audio signal played by the near-end public address equipment on the collected near-end audio signal and improves the echo suppression effect.
Drawings
The above and other features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flow chart of an audio processing method according to an embodiment of the application.
Fig. 2 shows a flow chart of forming the far-end audio signal of the opposite end according to a specific embodiment of the present application.
Fig. 3 shows a block diagram of an audio processing system according to an embodiment of the application.
Fig. 4 shows a schematic diagram of an audio processing system and an audio processing apparatus according to an embodiment of the application.
Fig. 5 is a block diagram of an audio processing system and an audio processing apparatus according to an embodiment of the present application.
Fig. 6 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure.
Fig. 7 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In order to overcome the defects in the prior art, the application provides an audio processing method and an audio processing apparatus, which can alleviate the generation of echo and howling in a scene where a near-end acquisition device, a far-end acquisition device, a near-end public address device and a far-end public address device are used simultaneously. The near-end acquisition equipment and the near-end public address equipment are located in the same space, and the far-end acquisition equipment and the far-end public address equipment are located in the same space. The audio processing system at the near end interacts with the audio processing system at the far end. The near-end public address equipment can play the audio signals collected by the near-end collecting equipment and the audio signals collected by the far-end collecting equipment. Meanwhile, the audio signal played by the near-end public address device may be collected by the near-end collection device. Similarly, when the far-end audio processing system is used as the near-end audio processing system, the far-end acquisition device and the far-end public address device acquire and play the audio signals in the same way.
Referring first to fig. 1, fig. 1 shows a flow chart of an audio processing method according to an embodiment of the present application. Fig. 1 shows the following steps in total:
Step S110: inputting a far-end audio signal into the adaptive filtering module, and acquiring a first filter coefficient after the adaptive filtering module converges.
Specifically, the adaptive filtering module may perform filtering using a normalized least mean square algorithm (NLMS). The invention is not limited thereto, and other filtering algorithms of the adaptive filtering module are also within the scope of the invention. The filter coefficient of the adaptive filter module changes dynamically.
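As an illustration of how an NLMS filter can converge to a coefficient group, the following is a minimal sketch rather than the implementation of the application. The function name `nlms_identify`, the tap count, the step size and the synthetic room response are all assumptions made for the example.

```python
import random


def nlms_identify(far_end, mic, num_taps=4, step=0.5, eps=1e-8):
    """Adapt FIR coefficients so that filtering far_end approximates mic.

    Returns (coeffs, errors); errors[n] is the residual left after the
    estimated echo has been subtracted from mic[n].
    """
    coeffs = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(coeffs, history))
        err = d - estimate
        power = sum(h * h for h in history) + eps
        # Normalised update: the step is scaled by the input power.
        coeffs = [w + step * err * h / power for w, h in zip(coeffs, history)]
        errors.append(err)
    return coeffs, errors


# Hypothetical room response, used only to synthesise test data.
room = [0.6, -0.3, 0.1, 0.05]
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(room[k] * far[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(far))]
first_coeffs, residual = nlms_identify(far, mic)
```

In this noise-free toy setting `first_coeffs` approaches `room`; in a real room, near-end speech and noise would disturb the adaptation, which is one reason a confidence check on the converged coefficients is useful.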
In a preferred embodiment, step S110 may be further followed by a step of performing confidence calculation on the first filter coefficient.
In some embodiments of the present application, the confidence calculation may be performed as follows: obtaining a plurality of adaptive filter curves according to a plurality of historical first filter coefficients and the gains, at each frequency point, of the historical audio signals output by the adaptive filtering module under the corresponding historical first filter coefficients; clustering the plurality of adaptive filter curves to obtain a reference first filter coefficient; and calculating the confidence of the first filter coefficient according to the reference first filter coefficient. The horizontal axis of an adaptive filter curve is the frequency point, and the vertical axis is the gain at each frequency point. Because different frequency points correspond to different filter coefficients, the first filter coefficient is in fact a filter coefficient group that represents the frequency points and their corresponding filter coefficients. For clarity, the implementation of the above steps is described below in terms of filter coefficient groups. A curve (a filter coefficient group) reflecting the trend of the transfer function is fitted from a plurality of filter coefficient groups (the historical first filter coefficients), and the variance of a newly input filter coefficient group relative to the fitted coefficient curve is calculated. If the variance is greater than a threshold, the newly input filter coefficient group lies too far from the fitted coefficient curve and is discarded. Otherwise, the newly input filter coefficient group is added, the filter coefficient group that was added earliest is discarded, and the coefficient curve is re-fitted based on the updated filter coefficient groups.
Specifically, the confidence may be inversely correlated with the variance, and thus, the confidence may be smaller when the variance is larger and the confidence may be larger when the variance is smaller, so that the next step may be performed when the confidence is larger than the set threshold.
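The sliding-window fitting and variance test described above can be sketched as follows. This is only an illustration under stated assumptions: the "fitted curve" is taken to be the per-frequency-point mean of the stored groups, the mapping `1 / (1 + variance)` is one arbitrary choice of an inverse relation, and the class name `CoefficientHistory` and the threshold value are hypothetical.

```python
from collections import deque


class CoefficientHistory:
    """Sliding window of coefficient groups with a fitted reference curve."""

    def __init__(self, groups, variance_threshold):
        self.window = deque(groups, maxlen=len(groups))
        self.variance_threshold = variance_threshold

    def fitted_curve(self):
        # Assumption: the fitted curve is the per-frequency-point mean.
        count = len(self.window)
        return [sum(column) / count for column in zip(*self.window)]

    def variance(self, group):
        curve = self.fitted_curve()
        return sum((g - c) ** 2 for g, c in zip(group, curve)) / len(curve)

    def confidence(self, group):
        # Inversely related to the variance, mapped into (0, 1].
        return 1.0 / (1.0 + self.variance(group))

    def offer(self, group):
        """Return True and update the window if the group is accepted."""
        if self.variance(group) > self.variance_threshold:
            return False  # too far from the fitted curve: discard
        self.window.append(group)  # maxlen drops the earliest group
        return True


history = CoefficientHistory(
    [[1.0, 0.5, 0.2], [1.1, 0.4, 0.2], [0.9, 0.6, 0.2]],
    variance_threshold=0.05,
)
close_group = [1.0, 0.5, 0.3]
far_group = [3.0, 2.0, 1.0]
```

Calling `history.offer(close_group)` accepts the group and evicts the oldest entry, whereas `history.offer(far_group)` leaves the window unchanged.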
In some embodiments of the present application, the confidence may be calculated by a pre-trained confidence calculation model. The confidence calculation model may be any machine learning model, such as a neural network model, a random forest model, etc., and the present invention is not limited thereto. The confidence calculation model can learn according to the sample data and the set confidence.
In other embodiments of the present application, only the far-end audio signal may be input to the adaptive filtering module to obtain the converged reference first filter coefficient (filter coefficient) when the near-end capturing device does not capture the audio signal (or when the audio intensity of the near-end captured audio signal is 0). Then, the degree of similarity between the first filter coefficient and the reference first filter coefficient can be used as the confidence of the first filter coefficient, so as to indicate whether the first filter coefficient really reflects the near-end environment transfer function.
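One way to express the "degree of similarity" numerically is cosine similarity between the two coefficient groups. The application does not name a specific measure, so the metric, the function name `similarity_confidence`, the sample values and the threshold below are all assumptions for illustration.

```python
import math


def similarity_confidence(first_coeffs, reference_coeffs):
    """Cosine similarity of two coefficient groups, clamped to [0, 1]."""
    dot = sum(a * b for a, b in zip(first_coeffs, reference_coeffs))
    norm_a = math.sqrt(sum(a * a for a in first_coeffs))
    norm_b = math.sqrt(sum(b * b for b in reference_coeffs))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return max(0.0, min(1.0, dot / (norm_a * norm_b)))


reference = [0.6, -0.3, 0.1]      # obtained while the near end is silent
candidate = [0.58, -0.31, 0.12]   # obtained during normal operation
PRESET_THRESHOLD = 0.95           # hypothetical value
use_as_reference = similarity_confidence(candidate, reference) > PRESET_THRESHOLD
```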
In still other embodiments of the present application, a functional relationship between the first filter coefficient and the filtering effect may be generated or fitted from the first filter coefficients and filtering effects obtained in the current near-end and far-end environments within a previously set time period. Since a better filtering effect corresponds to a higher confidence of the first filter coefficient, the confidence of the first filter coefficient can be calculated in real time from this functional relationship and used to judge the obtained first filter coefficient.
Therefore, performing the confidence calculation on the first filter coefficient helps ensure that the first filter coefficient truly reflects the environment transfer function, which further improves the echo suppression effect.
Step S130: taking the first filter coefficient as the filter coefficient of the reference filtering module.
In the above preferred example of calculating the confidence level, when the confidence level of the first filter coefficient is greater than the preset threshold, it indicates that the first filter coefficient can truly reflect the ambient transfer function of the near end, so that the near-end audio signal can be filtered by using the first filter coefficient (the near-end audio signal may include the far-end audio signal played by the near-end loudspeaker device).
Step S140: performing double filtering processing on the near-end audio signal through the double-filtering processing module.
Specifically, the dual filtering processing module includes a reference filtering module and a real-time filtering module. The reference filtering module and the real-time filtering module can adopt a normalized least mean square algorithm for filtering. The invention is not limited thereto, and other filtering algorithms of the reference filtering module and the real-time filtering module are also within the protection scope of the invention. The filter coefficients of the reference filter module are set by the adaptive filter module. And the filter coefficient of the real-time filter module changes dynamically.
Further, the normalized least mean square algorithm processes the input audio signal by using the reference signal, thereby realizing filtering operations such as echo cancellation and the like. In this embodiment, the filter coefficient of the reference filtering process is obtained by convergence and confidence calculation of the adaptive filtering module, so that the near-end and far-end environmental transfer functions can be estimated, and meanwhile, the output signal of the reference filtering module can be used as the reference signal of the real-time filtering module, so that the real-time filtering module can perform optimized echo cancellation and filtering operations with reference to the estimation of the near-end and far-end environmental transfer functions.
Specifically, the reference filtering module and the real-time filtering module each process the near-end audio signal, and the near-end audio signal output by the double filtering processing is obtained by weighting the near-end audio signal output by the reference filtering module and the near-end audio signal output by the real-time filtering module. The weights of the reference filtering module and the real-time filtering module can be set as required, or can be calculated based on the filtering effects of the two modules. For example, the weights may be derived from one or more of: the magnitude of the residual echo energy after filtering by the reference filtering module and by the real-time filtering module; the product of the differences of the residual echo energy after filtering by the two modules; and the product of the residual echo energy of the reference filtering module and the estimated echo difference energy. The estimated echo difference energy is calculated from the difference between the echoes estimated by the reference filtering module and the real-time filtering module. The present application is not so limited.
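A minimal sketch of one possible weighting rule follows. It assumes the frames are taken while only the far end is active, so that the energy of each branch output is residual echo energy, and it weights each branch inversely to that energy. The rule and the names `residual_energy` and `combine_outputs` are illustrative assumptions, not the weighting defined by the application.

```python
def residual_energy(samples):
    """Sum of squared samples in a frame."""
    return sum(s * s for s in samples)


def combine_outputs(reference_out, realtime_out, eps=1e-12):
    """Weight each branch inversely to its residual echo energy."""
    e_ref = residual_energy(reference_out)
    e_rt = residual_energy(realtime_out)
    # The branch that leaves less residual echo gets the larger weight.
    w_ref = (e_rt + eps) / (e_ref + e_rt + 2.0 * eps)
    w_rt = 1.0 - w_ref
    mixed = [w_ref * a + w_rt * b for a, b in zip(reference_out, realtime_out)]
    return mixed, (w_ref, w_rt)


reference_out = [0.2, -0.2, 0.2, -0.2]   # output of the reference branch
realtime_out = [0.1, -0.1, 0.1, -0.1]    # output of the real-time branch
combined, (w_ref, w_rt) = combine_outputs(reference_out, realtime_out)
```

Here the real-time branch leaves a quarter of the residual energy of the reference branch, so it receives the larger weight.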
Step S150: mixing the near-end audio signal after the double filtering processing with the far-end audio signal to obtain a mixed signal.
Step S160: playing the mixed signal through the near-end public address equipment.
Therefore, the audio signals collected by the near end and the audio signals collected by the far end can be played together, and the generation of echo and howling can be relieved.
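The mixing in step S150 can be pictured as a sample-wise weighted sum. The sketch below assumes floating-point samples in the range [-1, 1] and hard clipping of the sum; the gains, the clipping and the function name `mix` are assumptions for the example and are not specified by the application.

```python
def mix(near_end, far_end, near_gain=1.0, far_gain=1.0, limit=1.0):
    """Sum two equally long frames and clip the result to [-limit, limit]."""
    if len(near_end) != len(far_end):
        raise ValueError("frames must have the same length")
    mixed = []
    for a, b in zip(near_end, far_end):
        s = near_gain * a + far_gain * b
        mixed.append(max(-limit, min(limit, s)))
    return mixed


mixed_frame = mix([0.2, 0.7, -0.9], [0.1, 0.6, -0.4])
```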
In this audio processing method, the far-end audio signal is input into the adaptive filtering module, and the first filter coefficient acquired after the adaptive filtering module converges is used as the filter coefficient of the reference filtering module of the double-filtering processing module. This reduces the influence of the far-end audio signal played by the near-end public address equipment on the collected near-end audio signal and improves the echo suppression effect.
In some embodiments of the present application, after the double-filtering processing module performs the double filtering processing on the near-end audio signal in step S140 of fig. 1, and before the double-filtered near-end audio signal is mixed with the far-end audio signal in step S150 to obtain the mixed signal, the following step may further be performed: performing decorrelation processing on the double-filtered near-end audio signal, so that the decorrelated near-end audio signal is mixed with the far-end audio signal. Specifically, the decorrelation processing may be implemented by, for example, frequency shifting and/or phase modulation, which reduces the correlation between the audio signal to be played by the public address device and the audio signal collected by the acquisition device. This benefits the double filtering processing and helps avoid howling.
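As one illustration of phase modulation, the sketch below passes the signal through a first-order all-pass section whose coefficient is swept slowly by a sine wave. The choice of an all-pass section, the modulation depth and rate, and the name `phase_modulate` are assumptions made for the example; the application does not prescribe a particular decorrelator.

```python
import math


def phase_modulate(samples, sample_rate, depth=0.2, rate_hz=1.0):
    """First-order all-pass filter with a slowly time-varying coefficient.

    For a fixed coefficient the magnitude response of this section is
    flat, so mainly the phase of the signal is altered.
    """
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for n, x in enumerate(samples):
        a = depth * math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        # Difference equation of H(z) = (-a + z^-1) / (1 - a z^-1).
        y = -a * x + prev_in + a * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out


SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
decorrelated = phase_modulate(tone, SAMPLE_RATE)
```

The output keeps roughly the same level as the input while its phase drifts over time relative to the original signal.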
In some embodiments of the present application, after the double-filtering processing module performs the double filtering processing on the near-end audio signal in step S140 of fig. 1, and before the double-filtered near-end audio signal is mixed with the far-end audio signal in step S150 to obtain the mixed signal, the following step may further be performed: performing automatic equalization processing on set frequency points in the double-filtered near-end audio signal, so that the automatically equalized near-end audio signal is mixed with the far-end audio signal. In particular, poor room acoustics readily cause sound coloration: acoustic resonance of the room reinforces certain frequencies in the sound, which subsequently leads to howling, so automatic equalization can be applied at the frequency points that are unstable in the room sound field. Specifically, because the filter coefficients correspond to frequency points in the frequency domain, frequency points with larger coefficients can be taken as rough estimates of the possibly unstable points and used as the set frequency points. Filtering is then performed with set automatic equalization parameters (for example, a 1/3-octave bandwidth whose center frequency corresponds to the possibly unstable point, with a gain of -6 dB to -3 dB chosen according to the coefficient; the present application is not limited thereto), so that the response sensitivity at these frequency points is adaptively reduced. The response sensitivity at the set frequency points in the near-end audio signal after the automatic equalization processing is thus lower than before it, which prevents howling early and improves the sound transmission gain.
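The selection of a set frequency point and the 1/3-octave cut can be sketched as follows. The sketch assumes the coefficient magnitudes are available per FFT bin and uses the widely published biquad peaking-filter formulas for the cut; the sample rate, the FFT size, the helper names and the resonance at bin 40 are hypothetical.

```python
import math


def pick_unstable_bins(coeff_magnitudes, count=1):
    """Indices of the largest coefficient magnitudes (candidate points)."""
    order = sorted(range(len(coeff_magnitudes)),
                   key=lambda i: coeff_magnitudes[i], reverse=True)
    return sorted(order[:count])


def peaking_cut(center_hz, sample_rate, gain_db=-6.0,
                bandwidth_octaves=1.0 / 3.0):
    """Biquad peaking-filter coefficients (b, a), normalised so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    ratio = 2.0 ** bandwidth_octaves
    q = math.sqrt(ratio) / (ratio - 1.0)  # about 4.32 for 1/3 octave
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def gain_at(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at freq_hz."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    z1 = complex(math.cos(w), -math.sin(w))  # e^{-jw}
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


SAMPLE_RATE = 16000
FFT_SIZE = 512
magnitudes = [0.1] * (FFT_SIZE // 2 + 1)
magnitudes[40] = 0.9  # hypothetical resonance
bins = pick_unstable_bins(magnitudes)
center = bins[0] * SAMPLE_RATE / FFT_SIZE
b, a = peaking_cut(center, SAMPLE_RATE, gain_db=-6.0)
```

With these values the center frequency is 1250 Hz, and `gain_at(b, a, center, SAMPLE_RATE)` evaluates to the linear equivalent of -6 dB, while frequencies far from the center are left almost unchanged.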
Further, the step of the automatic equalization process may be performed before the step of the decorrelation process.
In some embodiments of the present application, after the step S110 in fig. 1 inputs the far-end audio signal into the adaptive filtering module, the steps shown in fig. 2 may be further performed:
Step S170: estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module.
Step S180: removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the resulting signal to the opposite end, where it serves as the far-end audio signal of the opposite end.
Through the above steps, the echo in the near-end audio signal collected by the near-end acquisition device can be processed at the near end. In addition, the audio signal from which the echo has been removed can serve as a reference signal for the adaptive filter, which helps the filter coefficients of the adaptive filter converge and improves the accuracy of echo estimation, and also makes it easier for the adaptive filter to provide a first filter coefficient with higher confidence to the double-filtering module.
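Steps S170 and S180 can be illustrated with a short sketch: the echo is estimated by filtering the far-end signal with the current coefficients and is then subtracted from the captured near-end signal. The coefficient values, the signals and the function names are made up for the example.

```python
def estimate_echo(far_end, coeffs):
    """Filter the far-end signal with the adaptive filter's coefficients."""
    echo = []
    for n in range(len(far_end)):
        echo.append(sum(coeffs[k] * far_end[n - k]
                        for k in range(len(coeffs)) if n - k >= 0))
    return echo


def remove_echo(near_end, far_end, coeffs):
    """Subtract the estimated echo from the captured near-end signal."""
    echo = estimate_echo(far_end, coeffs)
    return [d - e for d, e in zip(near_end, echo)]


coeffs = [0.5, 0.25]               # hypothetical converged coefficients
far = [1.0, 0.0, -1.0, 0.0]        # signal received from the opposite end
speech = [0.1, 0.2, 0.3, 0.4]      # local talker
captured = [s + e for s, e in zip(speech, estimate_echo(far, coeffs))]
to_peer = remove_echo(captured, far, coeffs)
```

Because the example uses the exact coefficients that generated the echo, `to_peer` recovers the local speech; with estimated coefficients some residual echo would remain.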
The above merely illustrates several exemplary embodiments of the present application, and the present application is not limited thereto. In each embodiment, adding steps, omitting steps, and changing the order of steps all fall within the protection scope of the present application; the embodiments may be implemented individually or in combination.
Fig. 3 shows a block diagram of an audio processing system according to an embodiment of the application. In particular, fig. 3 shows an audio processing system at the near end and an audio processing system at the far end. The near end and the far end are defined relative to each other. As shown in fig. 3, the audio processing system at the near end includes a near-end acquisition device 210, a near-end loudspeaker device 230, and a near-end audio processing apparatus 220. The near-end acquisition device 210 and the near-end loudspeaker device 230 are located in the same space, so that the audio signal played by the near-end loudspeaker device 230 is transferred to the near-end acquisition device 210 through the environment transfer function of the near end. The audio processing system at the far end includes a far-end acquisition device 240, a far-end loudspeaker device 260, and a far-end audio processing apparatus 250. The far-end acquisition device 240 and the far-end loudspeaker device 260 are located in the same space, so that the audio signal played by the far-end loudspeaker device 260 is transferred to the far-end acquisition device 240 through the environment transfer function of the far end. The near-end audio processing apparatus 220 and the far-end audio processing apparatus 250 communicate with each other in a wireless or wired manner, so that the near-end loudspeaker device 230 can play the audio signal collected by the far-end acquisition device 240 and, similarly, the far-end loudspeaker device 260 can play the audio signal collected by the near-end acquisition device 210. Further, the loudspeaker device may be, for example, a speaker, and the acquisition device may be, for example, a microphone; the present application is not limited in this respect.
Referring now to fig. 4, fig. 4 shows a schematic diagram of an audio processing system and an audio processing apparatus according to an embodiment of the present application. The audio processing system and the audio processing apparatus shown in fig. 4 are applied to the scenario shown in fig. 3. The audio processing system comprises a near-end acquisition device 210 for near-end audio signals, a near-end loudspeaker device 230 and a near-end audio processing means 220. The audio processing device 250 at the far end in fig. 3 may have the same structure as the audio processing device 220 at the near end.
The audio processing apparatus 220 at the near end includes an adaptive filtering module 221, an obtaining module 222, a parameter extracting module 224, a dual filtering processing module 225, a mixing module 226, and a control module 227.
The adaptive filtering module 221 is configured to filter the far-end audio signal.
The obtaining module 222 is configured to input the far-end audio signal into the adaptive filtering module and obtain a first filter coefficient after convergence of the adaptive filtering module.
The parameter extraction module 224 is configured to use the first filter coefficient as a filter coefficient of the reference filter module.
The dual filtering processing module 225 is configured to perform dual filtering processing on the near-end audio signal. The double-filtering processing module comprises a reference filtering module and a real-time filtering module. The reference filtering module and the real-time filtering module may have the same filter as the adaptive filtering module 221.
The mixing module 226 is configured to mix the near-end audio signal and the far-end audio signal after the double filtering processing to obtain a mixed signal.
The control module 227 is configured to control the near-end audio amplifying device to play the audio mixing signal.
Therefore, the far-end audio signal is input into the adaptive filtering module, and the first filter coefficient obtained after the adaptive filtering module converges is used as the filter coefficient of the reference filtering module of the dual-filtering processing module, so that the influence of the far-end audio signal played by the near-end public address device on the collected near-end audio signal is reduced and the echo suppression effect is improved.
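The data flow through the dual-filtering processing module 225 and the mixing module 226 can be sketched as below. Claim 7 states that the output of the double filtering is a weighted combination of the outputs of the reference filtering module and the real-time filtering module; the FIR filter structure, the single blending weight `reference_weight`, and the sample-wise sum used for mixing are all assumptions made for illustration.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter; samples before the start are taken as zero."""
    return [
        sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(signal))
    ]


def dual_filter(near_end, reference_coeffs, realtime_coeffs,
                reference_weight=0.5):
    """Blend the outputs of the reference and real-time filters."""
    ref_out = fir_filter(near_end, reference_coeffs)
    rt_out = fir_filter(near_end, realtime_coeffs)
    w = reference_weight  # hypothetical weight in [0, 1]
    return [w * r + (1.0 - w) * t for r, t in zip(ref_out, rt_out)]


def mix(near_end_processed, far_end):
    """Sample-wise sum of the processed near-end and far-end signals."""
    return [n + f for n, f in zip(near_end_processed, far_end)]
```

A larger `reference_weight` leans on the converged, slowly changing reference coefficients, while a smaller one favors the real-time filter that tracks the current environment.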
Referring now to fig. 5, fig. 5 is a block diagram illustrating an audio processing system and an audio processing device according to an embodiment of the present application. The audio processing system and the audio processing apparatus shown in fig. 5 are applied to the scenario shown in fig. 3. The audio processing system comprises a near-end acquisition device 210 for near-end audio signals, a near-end loudspeaker device 230 and a near-end audio processing means 220. The audio processing device 250 at the far end in fig. 3 may have the same structure as the audio processing device 220 at the near end.
The near-end audio processing apparatus 220 includes an adaptive filtering module 221, an obtaining module 222, a confidence calculation module 223, a parameter extraction module 224, a dual filtering processing module 225, a mixing module 226, a control module 227, an echo removal module 228, an automatic equalization module 229, and a decorrelation module 2210.
The adaptive filtering module 221, the obtaining module 222, the parameter extracting module 224, the dual filtering processing module 225, the mixing module 226, and the control module 227 have already been described with reference to fig. 4, and are not described herein again.
The confidence calculation module 223 is configured to perform confidence calculation on the first filter coefficient, and thus the parameter extraction module 224 is configured to use the first filter coefficient as the filter coefficient of the reference filter module of the dual filter processing module when the confidence of the first filter coefficient is greater than a preset threshold.
The echo removal module 228 is configured to estimate an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module, remove the echo signal of the far-end audio signal from the near-end audio signal, and send the resulting echo-removed signal to the opposite end as the far-end audio signal of the opposite end.
The automatic equalization module 229 is configured to perform automatic equalization on the set frequency points in the near-end audio signal after the double filtering processing, so as to perform sound mixing on the near-end audio signal after the automatic equalization processing and the far-end audio signal.
The decorrelation module 2210 is configured to decorrelate the dual-filtered near-end audio signal for mixing the decorrelated near-end audio signal and the far-end audio signal.
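The gating performed by the confidence calculation module 223 and the parameter extraction module 224 can be sketched as follows. The claims describe fitting a plurality of adaptive filter curves to obtain a reference first filter coefficient; this sketch replaces that fit with a simple per-tap mean of historical coefficient vectors and uses cosine similarity as the confidence measure. Both choices, the function names, and the default threshold of 0.9 are assumptions, not the method of the application.

```python
import math


def reference_coefficients(history):
    """Per-tap mean of historical coefficient vectors (stand-in for the fit)."""
    n = len(history)
    return [sum(col) / n for col in zip(*history)]


def confidence(coeffs, reference):
    """Cosine similarity between coeffs and reference, clipped to [0, 1]."""
    dot = sum(c * r for c, r in zip(coeffs, reference))
    norm = (math.sqrt(sum(c * c for c in coeffs))
            * math.sqrt(sum(r * r for r in reference)))
    if norm == 0.0:
        return 0.0
    return max(0.0, min(1.0, dot / norm))


def maybe_update_reference_filter(current, candidate, history, threshold=0.9):
    """Adopt the candidate coefficients only above the confidence threshold."""
    ref = reference_coefficients(history)
    if confidence(candidate, ref) > threshold:
        return list(candidate)
    return list(current)
```

With this gate, a first filter coefficient that departs sharply from the historical behavior, for example one captured while the adaptive filter was still diverging, leaves the reference filtering module unchanged.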
The audio processing device and system can be realized by software, hardware, firmware and any combination thereof. Fig. 3-5 are merely schematic illustrations of the audio processing apparatus and system provided in the present application, and the splitting, combining, and adding of modules are within the scope of the present application without departing from the spirit of the present application.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by, for example, a processor, may implement the steps of the audio processing method described in any one of the above embodiments. In some possible embodiments, the various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to the various exemplary embodiments of the present application described in the audio processing method section above of this specification, if the program product is run on the terminal device.
Referring to fig. 6, a program product 800 for implementing the above method according to an embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the audio processing method in any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the present application is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present application described in the above-mentioned audio processing method section of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1 or 2.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above-mentioned audio processing method according to the embodiments of the present disclosure.
In the present application, the far-end audio signal is input into the adaptive filtering module, and the first filter coefficient obtained after the adaptive filtering module converges is used as the filter coefficient of the reference filtering module of the dual-filtering processing module, so that the influence of the far-end audio signal played by the near-end public address device on the collected near-end audio signal is reduced and the echo suppression effect is improved.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. An audio processing method, comprising:
inputting a far-end audio signal into an adaptive filtering module, and acquiring a first filter coefficient after the adaptive filtering module converges;
using the first filter coefficient as a filter coefficient of a reference filtering module of a double-filtering processing module, wherein the double-filtering processing module comprises the reference filtering module and a real-time filtering module;
performing double filtering processing on a near-end audio signal through the double-filtering processing module;
mixing the near-end audio signal after the double filtering processing with the far-end audio signal to obtain a mixed signal; and
playing the mixed signal through a near-end public address device, wherein the near-end public address device and a near-end acquisition device of the near-end audio signal are located in the same space.
2. The audio processing method according to claim 1, wherein after obtaining the converged first filter coefficient of the adaptive filter module and before using the converged first filter coefficient as a filter coefficient of a reference filter module, the method further comprises:
performing confidence calculation on the first filter coefficient; and
when the confidence of the first filter coefficient is greater than a preset threshold, performing the step of using the first filter coefficient as the filter coefficient of the reference filtering module.
3. The audio processing method of claim 2, wherein the performing the confidence calculation on the first filter coefficient comprises:
obtaining a plurality of adaptive filter curves according to a plurality of historical first filter coefficients and gains of historical audio signals at each frequency point, wherein the historical audio signals are output by the adaptive filtering module under the corresponding historical first filter coefficients;
fitting the plurality of adaptive filter curves to obtain a reference first filter coefficient; and
calculating the confidence of the first filter coefficient according to the reference first filter coefficient.
4. The audio processing method of claim 1, wherein after the double-filtering processing module performs the double-filtering processing on the near-end audio signal, and before the double-filtered near-end audio signal and the far-end audio signal are mixed, the method further comprises:
performing decorrelation processing on the near-end audio signal subjected to the double filtering processing so as to perform audio mixing on the near-end audio signal subjected to the decorrelation processing and the far-end audio signal; and/or
automatically equalizing set frequency points in the near-end audio signal after the double filtering processing, so as to mix the near-end audio signal after the automatic equalization processing with the far-end audio signal.
5. The audio processing method according to claim 4, wherein, in the automatic equalization processing of the set frequency points in the near-end audio signal after the double filtering processing:
the response sensitivity of the set frequency points in the near-end audio signal after the automatic equalization processing is smaller than the response sensitivity of the set frequency points in the near-end audio signal before the automatic equalization processing.
6. The audio processing method of claim 1, wherein after said inputting the far-end audio signal into the adaptive filtering module, the method further comprises:
estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module; and
removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the resulting signal to an opposite end as the far-end audio signal of the opposite end.
7. The audio processing method as claimed in claim 1, wherein the reference filtering module and the real-time filtering module process the near-end audio signals respectively, and wherein the near-end audio signals output by the double filtering process are obtained by weighting the near-end audio signals output by the reference filtering module and the near-end audio signals output by the real-time filtering module.
8. An audio processing apparatus, comprising:
an adaptive filtering module configured to filter a far-end audio signal;
an acquisition module configured to input the far-end audio signal into the adaptive filtering module and acquire a first filter coefficient after the adaptive filtering module converges;
a parameter extraction module configured to use the first filter coefficient as a filter coefficient of a reference filtering module of a double-filtering processing module, wherein the double-filtering processing module comprises the reference filtering module and a real-time filtering module;
the double-filtering processing module, configured to perform double filtering processing on a near-end audio signal;
a mixing module configured to mix the near-end audio signal after the double filtering processing with the far-end audio signal to obtain a mixed signal; and
a control module configured to control a near-end public address device to play the mixed signal, wherein the near-end public address device and a near-end acquisition device of the near-end audio signal are located in the same space.
9. The audio processing apparatus of claim 8, further comprising:
a decorrelation module configured to perform decorrelation processing on the near-end audio signal subjected to the double filtering processing, so as to perform audio mixing on the near-end audio signal subjected to the decorrelation processing and the far-end audio signal; and/or
an automatic equalization module configured to perform automatic equalization processing on set frequency points in the near-end audio signal after the double filtering processing, so as to mix the near-end audio signal after the automatic equalization processing with the far-end audio signal.
10. An audio processing system, comprising:
a near-end acquisition device for a near-end audio signal;
a near-end public address device located in the same space as the near-end acquisition device; and
the audio processing apparatus according to claim 8 or 9, located at the near end.
CN202110741217.8A 2021-06-30 2021-06-30 Audio processing method, device and system Pending CN113470677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110741217.8A CN113470677A (en) 2021-06-30 2021-06-30 Audio processing method, device and system

Publications (1)

Publication Number Publication Date
CN113470677A true CN113470677A (en) 2021-10-01

Family

ID=77876960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110741217.8A Pending CN113470677A (en) 2021-06-30 2021-06-30 Audio processing method, device and system

Country Status (1)

Country Link
CN (1) CN113470677A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004258533A (en) * 2003-02-27 2004-09-16 National Institute Of Advanced Industrial & Technology Multichannel real-time sound signal processor
CN1842110A (en) * 2005-03-28 2006-10-04 华为技术有限公司 Echo eliminating device and method
KR20090010288A (en) * 2007-07-23 2009-01-30 삼성전자주식회사 Apparatus and method for echo cancelling in portable terminal
KR20110007394A (en) * 2009-07-16 2011-01-24 (주)시그젠 Real-time howling signal eliminating system and method
CN102938254A (en) * 2012-10-24 2013-02-20 中国科学技术大学 Voice signal enhancement system and method
CN109451195A (en) * 2018-09-18 2019-03-08 北京佳讯飞鸿电气股份有限公司 A kind of echo cancel method and system of adaptive double-end monitor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination