WO2013189199A1 - Method and device for dereverberation of single-channel speech - Google Patents
- Publication number
- WO2013189199A1 WO2013189199A1 PCT/CN2013/073584 CN2013073584W WO2013189199A1 WO 2013189199 A1 WO2013189199 A1 WO 2013189199A1 CN 2013073584 W CN2013073584 W CN 2013073584W WO 2013189199 A1 WO2013189199 A1 WO 2013189199A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- current frame
- sound
- power spectrum
- late
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Definitions
- the present invention relates to the field of speech enhancement, and more particularly to a method and apparatus for single channel speech dereverberation.
- in voice communication scenarios such as conference calls, smart-television network telephony, and the like,
- the speaker is relatively far from the microphone and the communication environment is a relatively closed space, so the signal received at the microphone end is susceptible to environmental reverberation.
- the signal received at the microphone end is a mixture of direct and reflected sounds; the reflected-sound part is the reverberation signal.
- when the reverberation is severe, the speech becomes unclear and the call quality suffers.
- severe reverberation also degrades the performance of sound-pickup systems and causes a significant degradation in the performance of speech recognition systems.
- this class of dereverberation methods does not require estimating the impulse response of the reverberant environment, so there is no need to compute an inverse filter or perform inverse filtering; such methods are also known as blind dereverberation methods.
- such methods are usually based on speech model assumptions, for example: reverberation changes the received voiced excitation pulses, making their periodicity less noticeable and affecting speech intelligibility.
- this type of method is generally based on the LPC (Linear Prediction Coding) model, assuming that the model producing the speech is an all-pole model and that reverberation or other additive noise introduces new zeros into the overall system, thus obscuring the voiced excitation pulses but not affecting the all-pole filter.
- the dereverberation method is then: estimate the LPC residual of the signal, and, according to a pitch-synchronous clustering criterion or kurtosis, estimate the clean pulse excitation sequence, thereby achieving dereverberation.
- the problem with this type of method is that the computational complexity is often very high, and the assumption that reverberation only introduces zeros (and does not affect the all-pole filter) is inconsistent with experimental analysis. Spectral subtraction is a better solution for dereverberation.
- the speech signal includes direct sound, early reflected sound and late reflected sound.
- the spectral subtraction method is used to remove the power spectrum of the late reflected sound from the power spectrum of the whole speech to improve the speech quality.
- the present invention provides a single-channel speech dereverberation method and apparatus that avoid the need to estimate the transfer function of the reverberant environment or the reverberation time in single-channel speech dereverberation.
- the invention discloses a single channel voice dereverberation method, and the method comprises:
- the input single-channel speech signal is framed, and the frame signals are processed in chronological order as follows: a short-time Fourier transform is performed on the current frame to obtain the power spectrum and phase spectrum of the current frame;
- the power spectrum of the estimated late reflected sound of the current frame is removed from the power spectrum of the current frame by spectral subtraction, and the power spectrum of the direct sound and early reflected sound of the current frame is obtained;
- the power spectrum of the direct sound and early reflected sound of the current frame is subjected to an inverse short-time Fourier transform together with the phase spectrum of the current frame to obtain the dereverberated signal of the current frame.
- the upper limit value of the duration range is set according to the attenuation characteristic of the late reflected sound; and/or the lower limit value of the duration range is set according to the speech correlation characteristic and the impulse response distribution area of the direct sound and the early reflected sound in the reverberant environment.
- the upper limit of the duration range is selected as a value between 0.3 seconds and 0.5 seconds.
- the lower limit of the duration range is selected as a value between 50 milliseconds and 80 milliseconds.
- linearly superimposing the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame specifically includes:
- an autoregressive AR model is used to linearly superimpose all the components in the power spectra of these frames to estimate the power spectrum of the late reflected sound of the current frame; or
- a moving average MA model is used to linearly superimpose the direct sound and early reflected sound components in the power spectra of these frames to estimate the power spectrum of the late reflected sound of the current frame; or
- an autoregressive AR model is used to linearly superimpose all the components in the power spectra of these frames, and a moving average MA model is used to linearly superimpose the direct sound and early reflected sound components in the power spectra of these frames, to estimate the power spectrum of the late reflected sound of the current frame.
- the invention also discloses a single channel voice dereverberation device, and the device comprises:
- a framing unit configured to frame the input single-channel speech signal and output the frame signals to the Fourier transform unit in chronological order;
- a Fourier transform unit configured to perform a short-time Fourier transform on the received current frame, obtain the power spectrum and phase spectrum of the current frame, output the power spectrum of the current frame to the spectral subtraction unit and the spectrum estimation unit, and output the phase spectrum to the inverse Fourier transform unit;
- a spectrum estimation unit configured to estimate the power spectrum of the late reflected sound of the current frame from the power spectra of several frames within a set duration before the current frame, and to output the estimated power spectrum of the late reflected sound of the current frame to the spectral subtraction unit;
- a spectral subtraction unit configured to remove, by spectral subtraction, the power spectrum of the late reflected sound of the current frame obtained from the spectrum estimation unit from the power spectrum of the current frame obtained from the Fourier transform unit, to obtain the power spectrum of the direct sound and early reflected sound of the current frame, and to output it to the inverse Fourier transform unit;
- an inverse Fourier transform unit configured to perform an inverse short-time Fourier transform on the power spectrum of the direct sound and early reflected sound of the current frame obtained from the spectral subtraction unit, together with the phase spectrum of the current frame obtained from the Fourier transform unit, and to output the dereverberated signal of the current frame.
- the spectrum estimation unit is specifically configured to set the upper limit value of the duration range according to the attenuation characteristic of the late reflected sound; and/or to set the lower limit value of the duration range according to the speech correlation characteristic and the impulse response distribution area of the direct sound and the early reflected sound in the reverberant environment.
- the spectrum estimation unit is specifically configured to select the upper limit of the duration range as a value between 0.3 seconds and 0.5 seconds.
- the spectrum estimation unit is specifically configured to select the lower limit of the duration range as a value between 50 milliseconds and 80 milliseconds.
- the spectrum estimation unit is specifically configured to: for several frames before the current frame whose distance to the current frame is within the set duration, apply the autoregressive AR model to linearly superimpose all the components in the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame; or, for several frames before the current frame within the set duration, apply the moving average MA model to linearly superimpose the direct sound and early reflected sound components in the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame;
- or apply the autoregressive AR model to linearly superimpose all the components in the power spectra of the frames and apply the moving average MA model to linearly superimpose the direct sound and early reflected sound components in the power spectra of the frames, to estimate the power spectrum of the late reflected sound of the current frame.
- the beneficial effects of the embodiments of the present invention are: the power spectrum of the late reflected sound of the current frame is estimated by linearly superimposing the power spectra of several frames before the current frame whose distance from the current frame is within the set duration.
- in this way, the power spectrum of the late reflected sound of the current frame can be estimated without estimating the transfer function or the reverberation time of the reverberant environment, and dereverberation is then performed by spectral subtraction, which simplifies the computation.
- setting the upper limit value according to the attenuation characteristic of the late reflected sound reduces the amount of superposition computation while ensuring the accuracy of the estimated power spectrum of the late reflected sound;
- the upper limit value is selected as a value between 0.3 seconds and 0.5 seconds, a threshold obtained through experiments.
- within this range the upper limit value does not need to be adjusted, and a good dereverberation effect can still be obtained;
- the lower limit value is set between 50 milliseconds and 80 milliseconds.
- the lower limit value does not need to be changed; the superposition effectively skips the period in which the direct sound and early reflected sound energy is concentrated, so the superposition result contains essentially no direct sound or early reflected sound. Dereverberation can therefore be performed while the useful direct sound and early reflected sound are retained, achieving better speech quality.
- Figure 2 is a schematic diagram of the impulse response of a real room
- FIG. 3 is a schematic diagram of the effect of the implementation of the present invention
- FIG. 3( a ) is a time domain diagram of the reverberation signal
- FIG. 3 ( b ) is a time domain diagram of the signal after dereverberation
- FIG. 3 ( c ) shows the energy envelope curves of the reverberation signal and the dereverberated signal
- FIG. 4 is a structural diagram of the single-channel speech dereverberation device of the present invention.
- FIG. 5 is a structural diagram of a specific embodiment of the single-channel speech dereverberation device according to the present invention.
- FIG. 1 is a flowchart of the single-channel speech dereverberation method provided by the present invention.
- Step S100: Frame the input single-channel speech signal, and process the frame signals in chronological order as follows.
- Step S200: Perform a short-time Fourier transform on the current frame to obtain the power spectrum and phase spectrum of the current frame.
- Step S300: Select several frames before the current frame whose distance from the current frame is within the set duration, and linearly superimpose the power spectra of these frames to estimate the power spectrum of the late reflected sound of the current frame.
- the several frames are a preset number of frames, and may be all frames within the duration or a part of the frames within the duration.
- Step S400: Remove the estimated power spectrum of the late reflected sound of the current frame from the power spectrum of the current frame by spectral subtraction, and obtain the power spectrum of the direct sound and early reflected sound of the current frame.
- Step S500: Perform an inverse short-time Fourier transform on the power spectrum of the direct sound and early reflected sound of the current frame together with the phase spectrum of the current frame to obtain the dereverberated signal of the current frame.
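Steps S100 to S500 can be sketched as a minimal STFT analysis-modification-synthesis loop in Python. The sampling rate, frame length, and hop size below are illustrative assumptions; the patent constrains only the superposition duration range:

```python
import numpy as np

# Hypothetical parameter choices; the patent fixes only the duration bounds
# (lower limit 50-80 ms, upper limit 0.3-0.5 s), not the frame geometry.
FS = 16000          # sampling rate (assumed)
FRAME = 512         # frame length (assumed)
HOP = 256           # frame spacing Δt in samples (assumed)

def stft(x):
    """Steps S100/S200: framing plus short-time Fourier transform."""
    win = np.hanning(FRAME)
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # one complex spectrum per frame

def dereverb_frame(spec, late_power):
    """Steps S400/S500 per frame: subtract the estimated late-reflection
    power spectrum, keep the original phase, and rebuild the spectrum."""
    power = np.abs(spec) ** 2
    clean_power = np.maximum(power - late_power, 0.0)   # floor at zero
    return np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
```

An inverse rFFT plus overlap-add of `dereverb_frame`'s outputs would complete step S500; the late-reflection power estimate itself comes from the superposition of step S300.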
- the signal acquired by the microphone is a mixture of direct and reflected sounds, which can be represented by the following reverberation model: x(t) = h(t) * s(t) + n(t)
- h(t) is the room impulse response between the sound source position and the microphone position
- * denotes the convolution operation, and n(t) denotes other additive noise in the reverberant environment.
- the impulse response of a real room is shown in Figure 2. It can be divided into 3 parts: the direct sound peak, the early reflections, and the late reflections. Convolving the direct peak with s(t) can simply be regarded as the reproduction of the signal from the sound source at the microphone end after a certain delay, corresponding to the direct sound portion of x(t). The impulse response of the early reflection portion corresponds to the subsequent period of time.
- the impulse response of the late reflection part is the long trailing part of the room impulse response after the direct and early parts are removed, and the reverberation caused by the convolution with this part is the reverberation component that degrades the listening experience.
- the dereverberation algorithm mainly removes the influence of this part.
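The three-part division of the room impulse response can be illustrated by splitting a measured response at assumed time boundaries. The 50 ms early/late boundary below is an assumption consistent with the 50-80 ms lower-limit range discussed in the text:

```python
import numpy as np

def split_room_impulse_response(h, fs, early_ms=50.0):
    """Split a room impulse response into direct, early, and late parts.

    The direct part runs up to the largest peak; the early reflections
    run from just after the peak to `early_ms` milliseconds after it;
    the remainder is the late-reflection tail. The 50 ms boundary is a
    hypothetical choice, not a value fixed by the patent.
    """
    peak = int(np.argmax(np.abs(h)))
    early_end = peak + int(early_ms * 1e-3 * fs)
    direct = h[:peak + 1]
    early = h[peak + 1:early_end]
    late = h[early_end:]
    return direct, early, late
```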
- the reverberation model can also be expressed with a statistical room impulse response model, in which the impulse response is modeled as a zero-mean Gaussian random variable whose exponential decay is determined by the reverberation time (RT60) of the reverberant environment.
- the power spectrum estimation of the late reflected sound is described in detail below.
- the signal power spectrum can be expressed as X(t, f) = Y(t, f) + R(t, f), where R(t, f) is the power spectrum of the late reflected sound and Y(t, f) is the power spectrum of the direct sound and early reflected sound, which should be preserved.
- spectral subtraction can be used to estimate Y(t, f) from X(t, f) to achieve dereverberation.
- the power spectrum of the late reflected sound has a linear relationship with some components of the signal power spectra before it, while, owing to the characteristics of human speech, the power spectrum of the direct sound and early reflected sound does not form a linear relationship with those past components. Therefore, by linearly superimposing the power spectrum components of frames within a specific duration before the current frame, the power spectrum of the late reflected sound of the current frame can be estimated. The power spectrum of the late reflected sound is then removed from the signal power spectrum by spectral subtraction, and single-channel speech dereverberation is realized.
- the upper limit value of the duration range is set according to the attenuation characteristic of the late reflected sound.
- the lower limit value of the duration range is set according to the speech correlation characteristic and the impulse response distribution area of the direct sound and the early reflected sound in the reverberant environment.
- setting the lower limit according to the impulse response distribution in the reverberant environment allows the linear superposition to skip the period in which the direct sound and early reflected sound energy is concentrated, so that useful speech is better preserved while the reverberation is removed.
- the lower limit value of the duration range is selected to be a value between 50 milliseconds and 80 milliseconds.
- the upper limit of the duration range is selected to be a value between 0.3 seconds and 0.5 seconds.
- the setting of the upper limit is related to the specific environment in which the method is applied.
- the upper limit value theoretically corresponds to the length of the room impulse response. However, since the impulse response in a real environment decays according to an exponential model, the farther a reflection is from the current time, the smaller its energy; beyond about 0.5 s the reflected sound energy is almost negligible. Therefore, in practice only a rough upper limit is needed to cover most reverberant environments.
- an upper limit value between 0.3 seconds and 0.5 seconds adapts well to near-anechoic environments (where the reverberation time is often short), general office environments (reverberation time 0.3-0.5 s), and even
- the reverberant environment of an auditorium (reverberation time > 1 s).
- the method of the present invention only estimates the linear component and bypasses the energy concentration period of the direct sound and early reflected sound, so even if the upper limit value is much longer than the reverberation time of a nearly anechoic room, the effective speech components will not be removed.
- linearly superimposing the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame specifically includes: applying an autoregressive AR model to linearly superimpose all the components in the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame.
- alternatively, it specifically includes: applying a moving average MA model to linearly superimpose the direct sound and early reflected sound components in the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame.
- alternatively, it specifically includes: applying an autoregressive AR model to linearly superimpose all the components in the power spectra of the frames and applying a moving average MA model to linearly superimpose the direct sound and early reflected sound components in the power spectra of the frames, to estimate the power spectrum of the late reflected sound of the current frame.
- in the combined ARMA form, the estimate can be written as R(t, f) = Σ_{j=M}^{J} a_j · X(t − j·Δt, f) + Σ_{k=M}^{K} b_k · Y(t − k·Δt, f), where:
- J is the order of the AR model derived from the upper limit of the set duration range, a_j are the AR model estimation parameters, K is the order of the MA model derived from the set upper limit, and b_k are the MA model estimation parameters;
- Y(t − j·Δt, f) is the power spectrum of the direct and early reflected sound of the j-th frame before the current frame;
- X(t − j·Δt, f) is the power spectrum of the j-th frame before the current frame, M is the starting lag derived from the set lower limit, and Δt is the frame spacing.
- the power spectrum estimation of the late reflected sound mentioned in the prior art is often a special case of the AR, MA, or ARMA model proposed above.
- other power spectrum estimation methods for the late reflected sound often need to estimate the ambient reverberation time during speech pauses.
- the ambient reverberation time (RT60) is an important parameter in the power spectrum estimation of the late reflected sound.
- the effect of applying the resulting gain function is shown in Figure 3.
- the reverberant signal (single-channel speech signal) was collected in a conference room; the sound source and microphone were 2 m apart, and the reverberation time (RT60) was about 0.45 s.
- when estimating the power spectrum of the late reflected sound, the lower limit was set to 80 ms and the upper limit to 0.5 s.
- the reverberation tail is obviously attenuated, and the speech quality is significantly improved.
- the apparatus of the present invention is shown in Figure 4; the single-channel speech dereverberation apparatus includes the following units.
- the framing unit 100 is configured to frame the input single-channel speech signal and output the frame signals to the Fourier transform unit 200 in chronological order.
- the Fourier transform unit 200 is configured to perform a short-time Fourier transform on the received current frame, obtain the power spectrum and phase spectrum of the current frame, output the power spectrum of the current frame to the spectral subtraction unit 400 and the spectrum estimation unit 300, and
- output the phase spectrum to the inverse Fourier transform unit 500.
- the spectrum estimation unit 300 is configured to linearly superimpose the power spectra of several frames before the current frame whose distance from the current frame is within the set duration, estimate the power spectrum of the late reflected sound of the current frame, and
- output the estimated power spectrum of the late reflected sound of the current frame to the spectral subtraction unit 400.
- the spectral subtraction unit 400 is configured to remove, by spectral subtraction, the power spectrum of the late reflected sound of the current frame obtained from the spectrum estimation unit 300 from the power spectrum of the current frame obtained from the Fourier transform unit 200, to obtain the power spectrum of the direct sound and early
- reflected sound of the current frame, and to output it to the inverse Fourier transform unit 500.
- the inverse Fourier transform unit 500 is configured to perform an inverse short-time Fourier transform on the power spectrum of the direct sound and early reflected sound of the current frame obtained from the spectral subtraction unit 400, together with the phase spectrum of the current frame obtained from the Fourier transform unit 200, and to output the dereverberated signal of the current frame.
- the spectrum estimation unit 300 is specifically configured to set the upper limit value of the duration range according to the attenuation characteristic of the late reflected sound.
- the spectrum estimation unit 300 is specifically configured to set the lower limit value of the duration range according to the speech correlation characteristic and the impulse response distribution area of the direct sound and early reflected sound in the reverberant environment.
- the spectrum estimation unit 300 is specifically configured to select the upper limit of the duration range as a value between 0.3 seconds and 0.5 seconds.
- the spectrum estimation unit 300 is specifically configured to select the lower limit of the duration range as a value between 50 milliseconds and 80 milliseconds.
- the spectrum estimation unit 300 is specifically configured to: for several frames before the current frame whose distance to the current frame is within the set duration, apply the autoregressive AR model to
- linearly superimpose all the components in the power spectra of the frames to estimate the power spectrum of the late reflected sound of the current frame.
- using the AR model, the power spectrum of the late reflected sound of the current frame is estimated by the following formula: R(t, f) = Σ_{j=M}^{J} a_j · X(t − j·Δt, f)
- where R(t, f) is the estimated power spectrum of the late reflected sound;
- M is the starting lag derived from the set lower limit value, J is the order of the AR model derived from the set upper limit value, and a_j are the AR model estimation parameters;
- X(t − j·Δt, f) is the power spectrum of the j-th frame before the current frame, and Δt is the frame spacing.
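The AR-style superposition above (the estimated late-reflection power spectrum as a weighted sum of the power spectra of the frames from lag M to lag J before the current frame) can be sketched in a few lines. The coefficients a_j are assumed given here, since their estimation is left open by the text:

```python
import numpy as np

def estimate_late_power_ar(past_power, a, M):
    """AR-style estimate R(t,f) = sum_{j=M}^{J} a_j * X(t - j*dt, f).

    past_power : array of shape (T, F); past_power[j-1] holds the power
        spectrum of the frame j frames before the current one.
    a : coefficients a_M .. a_J (assumed fixed for this sketch).
    M : starting lag derived from the lower limit of the set duration,
        so the sum skips frames still dominated by direct sound and
        early reflections.
    """
    J = M + len(a) - 1
    lags = past_power[M - 1:J]           # frames M..J before the current frame
    # weighted linear superposition over the lag axis -> one spectrum (F,)
    return np.tensordot(a, lags, axes=(0, 0))
```

With a 16 ms frame spacing, for example, a 50 ms lower limit and 0.5 s upper limit would correspond roughly to M = 4 and J = 31.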
- the spectrum estimation unit 300 is specifically configured to: apply a moving average MA model to the power spectra of the frames before the current frame within the set duration, and
- linearly superimpose the direct sound and early reflected sound components to estimate the power spectrum of the late reflected sound of the current frame: R(t, f) = Σ_{k=M}^{K} b_k · Y(t − k·Δt, f).
- M is the starting lag derived from the set lower limit value, K is the order of the MA model derived from the set upper limit value, and b_k are the MA model estimation parameters; Y(t − k·Δt, f) is the power spectrum of the direct sound
- and early reflected sound of the k-th frame before the current frame, and Δt is the frame spacing.
- the spectrum estimation unit 300 is specifically configured to: apply an autoregressive AR model to linearly superimpose all the components in the power spectra of the frames before the current frame whose distance to the current frame is within the set duration, and apply the moving average MA model to linearly superimpose the direct sound and early reflected sound components in the power spectra of these frames, to estimate the power spectrum of the late reflected sound of the current frame: R(t, f) = Σ_{j=M}^{J} a_j · X(t − j·Δt, f) + Σ_{k=M}^{K} b_k · Y(t − k·Δt, f).
- M is the starting lag derived from the set lower limit value, and J is the order of the AR model derived from the set upper limit value
- a_j are the AR model estimation parameters
- K is the order of the MA model derived from the set upper limit, and b_k are the MA model estimation parameters
- Y(t − j·Δt, f) is the power spectrum of the direct and early reflected sound of the j-th frame before the current frame
- X(t − j·Δt, f) is the power spectrum of the j-th frame before the current frame
- Δt is the frame spacing
- the spectral subtraction unit 400 is specifically configured to: compute, based on the power spectrum of the late reflected sound, a
- gain function, and multiply the gain function by the power spectrum of the current frame to obtain the power spectrum of the direct and early reflected sound of the current frame.
- the speech signal Y(t, f) from which the reverberation is removed can thus be obtained by spectral subtraction: Y(t, f) = G(t, f) · X(t, f), with G(t, f) = (X(t, f) − R(t, f)) / X(t, f).
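The gain-function form of the spectral subtraction described above can be sketched as follows. The spectral floor is an assumed safeguard, common in spectral-subtraction implementations but not specified in the text, to keep the gain strictly positive when the estimated late-reflection power exceeds the frame power:

```python
import numpy as np

def spectral_subtraction_gain(x_power, late_power, floor=1e-3):
    """Apply G(t,f) = max((X - R) / X, floor) to the frame power spectrum.

    x_power    : power spectrum X(t,f) of the current frame.
    late_power : estimated late-reflection power spectrum R(t,f).
    floor      : assumed spectral floor (hypothetical value).
    Returns the retained direct + early-reflection power spectrum.
    """
    gain = np.maximum(1.0 - late_power / np.maximum(x_power, 1e-12), floor)
    return gain * x_power
```

Flooring the gain rather than clipping the subtracted power to zero is a common design choice that reduces "musical noise" artifacts in the resynthesized speech.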
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Telephone Function (AREA)
- Reverberation, Karaoke And Other Acoustics (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2015516415A JP2015519614A (ja) | 2012-06-18 | 2013-04-01 | シングルチャンネル音声残響除去方法及びその装置 |
| US14/407,610 US9269369B2 (en) | 2012-06-18 | 2013-04-01 | Method and device for dereverberation of single-channel speech |
| DK13807732.6T DK2863391T3 (da) | 2012-06-18 | 2013-04-01 | Fremgangsmåde og indretning til fjernelse af efterklang af enkanal-tale |
| EP13807732.6A EP2863391B1 (en) | 2012-06-18 | 2013-04-01 | Method and device for dereverberation of single-channel speech |
| KR1020147035393A KR101614647B1 (ko) | 2012-06-18 | 2013-04-01 | 단일채널 음성의 반향제거를 위한 방법 및 장치 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210201879.7A CN102750956B (zh) | 2012-06-18 | 2012-06-18 | 一种单通道语音去混响的方法和装置 |
| CN201210201879.7 | 2012-06-18 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2013189199A1 true WO2013189199A1 (zh) | 2013-12-27 |
Family
ID=47031075
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2013/073584 Ceased WO2013189199A1 (zh) | 2012-06-18 | 2013-04-01 | 一种单通道语音去混响的方法和装置 |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US9269369B2 (en) |
| EP (1) | EP2863391B1 (en) |
| JP (2) | JP2015519614A (en) |
| KR (1) | KR101614647B1 (en) |
| CN (1) | CN102750956B (en) |
| DK (1) | DK2863391T3 (en) |
| WO (1) | WO2013189199A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2016054421A (ja) * | 2014-09-03 | 2016-04-14 | リオン株式会社 | 残響抑制装置 |
| CN111512367A (zh) * | 2017-09-21 | 2020-08-07 | 弗劳恩霍夫应用研究促进协会 | 提供处理的降噪且混响降低的音频信号的信号处理器和方法 |
| CN113160842A (zh) * | 2021-03-06 | 2021-07-23 | 西安电子科技大学 | 一种基于mclp的语音去混响方法及系统 |
| CN114255777A (zh) * | 2021-12-20 | 2022-03-29 | 苏州蛙声科技有限公司 | 实时语音去混响的混合方法及系统 |
| CN114898771A (zh) * | 2022-03-25 | 2022-08-12 | 沈阳化工大学 | 一种适用于美声教学的发声训练方法 |
Families Citing this family (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102750956B (zh) * | 2012-06-18 | 2014-07-16 | 歌尔声学股份有限公司 | Method and device for single-channel speech dereverberation |
| CN104867497A (zh) * | 2014-02-26 | 2015-08-26 | 北京信威通信技术股份有限公司 | Speech noise reduction method |
| CN106504763A (zh) * | 2015-12-22 | 2017-03-15 | 电子科技大学 | Microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction |
| CN107358962B (zh) * | 2017-06-08 | 2018-09-04 | 腾讯科技(深圳)有限公司 | Audio processing method and audio processing device |
| CN109754821B (zh) | 2017-11-07 | 2023-05-02 | 北京京东尚科信息技术有限公司 | Information processing method and system, computer system, and computer-readable medium |
| CN110111802B (zh) * | 2018-02-01 | 2021-04-27 | 南京大学 | Adaptive dereverberation method based on Kalman filtering |
| US10726857B2 (en) * | 2018-02-23 | 2020-07-28 | Cirrus Logic, Inc. | Signal processing for speech dereverberation |
| CN108986799A (zh) * | 2018-09-05 | 2018-12-11 | 河海大学 | Reverberation parameter estimation method based on cepstral filtering |
| CN109584896A (zh) * | 2018-11-01 | 2019-04-05 | 苏州奇梦者网络科技有限公司 | Speech chip and electronic device |
| CN112997249B (zh) * | 2018-11-30 | 2022-06-14 | 深圳市欢太科技有限公司 | Speech processing method and device, storage medium, and electronic device |
| CN110364161A (zh) * | 2019-08-22 | 2019-10-22 | 北京小米智能科技有限公司 | Method, electronic device, medium, and system for responding to a speech signal |
| CN111123202B (zh) * | 2020-01-06 | 2022-01-11 | 北京大学 | Indoor early-reflection localization method and system |
| EP3863303B1 (en) * | 2020-02-06 | 2022-11-23 | Universität Zürich | Estimating a direct-to-reverberant ratio of a sound signal |
| CN111489760B (zh) * | 2020-04-01 | 2023-05-16 | 腾讯科技(深圳)有限公司 | Speech signal dereverberation method and device, computer equipment, and storage medium |
| KR102191736B1 (ko) | 2020-07-28 | 2020-12-16 | 주식회사 수퍼톤 | Speech enhancement method and device using an artificial neural network |
| CN112599126B (zh) * | 2020-12-03 | 2022-05-27 | 海信视像科技股份有限公司 | Wake-up method for a smart device, smart device, and computing device |
| CN112863536A (zh) * | 2020-12-24 | 2021-05-28 | 深圳供电局有限公司 | Environmental noise extraction method and device, computer equipment, and storage medium |
| CN113362841B (zh) * | 2021-06-10 | 2023-05-02 | 北京小米移动软件有限公司 | Audio signal processing method and device, and storage medium |
| CN113223543B (zh) * | 2021-06-10 | 2023-04-28 | 北京小米移动软件有限公司 | Speech enhancement method and device, and storage medium |
| CN114333876B (zh) * | 2021-11-25 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Signal processing method and device |
| CN114898763B (zh) * | 2022-05-27 | 2025-04-25 | 随锐科技集团股份有限公司 | Method for predicting reverberation time and related products |
| CN116469407A (zh) * | 2023-05-04 | 2023-07-21 | 北京沃东天骏信息技术有限公司 | Audio processing method, device, equipment, and medium |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1989550A (zh) * | 2004-07-22 | 2007-06-27 | 皇家飞利浦电子股份有限公司 | Audio signal dereverberation |
| US20080059157A1 (en) * | 2006-09-04 | 2008-03-06 | Takashi Fukuda | Method and apparatus for processing speech signal data |
| US20080292108A1 (en) * | 2006-08-01 | 2008-11-27 | Markus Buck | Dereverberation system for use in a signal processing apparatus |
| CN101315772A (zh) * | 2008-07-17 | 2008-12-03 | 上海交通大学 | Speech reverberation reduction method based on Wiener filtering |
| CN101385386A (zh) * | 2006-03-03 | 2009-03-11 | 日本电信电话株式会社 | Dereverberation device, dereverberation method, dereverberation program, and recording medium |
| CN101454825A (zh) * | 2006-09-20 | 2009-06-10 | 哈曼国际工业有限公司 | Method and device for extracting and changing the reverberant content of an input signal |
| US8160262B2 (en) * | 2007-10-31 | 2012-04-17 | Nuance Communications, Inc. | Method for dereverberation of an acoustic signal |
| CN102750956A (zh) * | 2012-06-18 | 2012-10-24 | 歌尔声学股份有限公司 | Method and device for single-channel speech dereverberation |
Family Cites Families (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5029509A (en) * | 1989-05-10 | 1991-07-09 | Board Of Trustees Of The Leland Stanford Junior University | Musical synthesizer combining deterministic and stochastic waveforms |
| JPH0739968B2 (ja) * | 1991-03-25 | 1995-05-01 | 日本電信電話株式会社 | Acoustic transfer characteristic simulation method |
| JPH1091194A (ja) * | 1996-09-18 | 1998-04-10 | Sony Corp | Speech decoding method and device |
| US6011846A (en) * | 1996-12-19 | 2000-01-04 | Nortel Networks Corporation | Methods and apparatus for echo suppression |
| US6261101B1 (en) * | 1997-12-17 | 2001-07-17 | Scientific Learning Corp. | Method and apparatus for cognitive training of humans using adaptive timing of exercises |
| US6496795B1 (en) * | 1999-05-05 | 2002-12-17 | Microsoft Corporation | Modulated complex lapped transform for integrated signal enhancement and coding |
| US6618712B1 (en) * | 1999-05-28 | 2003-09-09 | Sandia Corporation | Particle analysis using laser ablation mass spectroscopy |
| JP2001175298A (ja) * | 1999-12-13 | 2001-06-29 | Fujitsu Ltd | Noise suppression device |
| JP2003533753A (ja) * | 2000-05-17 | 2003-11-11 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Spectral modeling |
| DE60110086T2 (de) * | 2000-07-27 | 2006-04-06 | Activated Content Corp., Inc., Burlingame | Stegotext encoder and decoder |
| US6862558B2 (en) * | 2001-02-14 | 2005-03-01 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Empirical mode decomposition for analyzing acoustical signals |
| WO2005122640A1 (en) * | 2004-06-08 | 2005-12-22 | Koninklijke Philips Electronics N.V. | Coding reverberant sound signals |
| DE602005020662D1 (de) * | 2004-10-13 | 2010-05-27 | Koninkl Philips Electronics Nv | Echo cancellation |
| JP4486527B2 (ja) * | 2005-03-07 | 2010-06-23 | 日本電信電話株式会社 | Acoustic signal analysis device and method, program, and recording medium |
| JP2007065204A (ja) * | 2005-08-30 | 2007-03-15 | Nippon Telegr & Teleph Corp <Ntt> | Dereverberation device, dereverberation method, dereverberation program, and recording medium therefor |
| US7856353B2 (en) * | 2007-08-07 | 2010-12-21 | Nuance Communications, Inc. | Method for processing speech signal data with reverberation filtering |
| JP5178370B2 (ja) * | 2007-08-09 | 2013-04-10 | 本田技研工業株式会社 | Sound source separation system |
| US20090154726A1 (en) * | 2007-08-22 | 2009-06-18 | Step Labs Inc. | System and Method for Noise Activity Detection |
| JP4532576B2 (ja) * | 2008-05-08 | 2010-08-25 | トヨタ自動車株式会社 | Processing device, speech recognition device, speech recognition system, speech recognition method, and speech recognition program |
| JP2009276365A (ja) * | 2008-05-12 | 2009-11-26 | Toyota Motor Corp | Processing device, speech recognition device, speech recognition system, and speech recognition method |
| JP4977100B2 (ja) * | 2008-08-11 | 2012-07-18 | 日本電信電話株式会社 | Dereverberation device, dereverberation method, program therefor, and recording medium |
| JP4960933B2 (ja) * | 2008-08-22 | 2012-06-27 | 日本電信電話株式会社 | Acoustic signal enhancement device and method, program, and recording medium |
| JP5645419B2 (ja) * | 2009-08-20 | 2014-12-24 | 三菱電機株式会社 | Dereverberation device |
| EP2545717A1 (de) * | 2010-03-10 | 2013-01-16 | Siemens Medical Instruments Pte. Ltd. | Dereverberation of signals of a binaural hearing device |
| WO2012014451A1 (ja) * | 2010-07-26 | 2012-02-02 | パナソニック株式会社 | Multi-input noise suppression device, multi-input noise suppression method, program, and integrated circuit |
| JP5751110B2 (ja) * | 2011-09-22 | 2015-07-22 | 富士通株式会社 | Reverberation suppression device, reverberation suppression method, and reverberation suppression program |
- 2012
  - 2012-06-18 CN CN201210201879.7A patent/CN102750956B/zh active Active
- 2013
  - 2013-04-01 EP EP13807732.6A patent/EP2863391B1/en active Active
  - 2013-04-01 KR KR1020147035393A patent/KR101614647B1/ko active Active
  - 2013-04-01 WO PCT/CN2013/073584 patent/WO2013189199A1/zh not_active Ceased
  - 2013-04-01 DK DK13807732.6T patent/DK2863391T3/da active
  - 2013-04-01 US US14/407,610 patent/US9269369B2/en active Active
  - 2013-04-01 JP JP2015516415A patent/JP2015519614A/ja active Pending
- 2016
  - 2016-10-28 JP JP2016211765A patent/JP6431884B2/ja active Active
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2016054421A (ja) * | 2014-09-03 | 2016-04-14 | リオン株式会社 | Reverberation suppression device |
| CN111512367A (zh) * | 2017-09-21 | 2020-08-07 | 弗劳恩霍夫应用研究促进协会 | Signal processor and method for providing a processed noise-reduced and reverberation-reduced audio signal |
| CN111512367B (zh) * | 2017-09-21 | 2023-03-14 | 弗劳恩霍夫应用研究促进协会 | Signal processor and method for providing a processed noise-reduced and reverberation-reduced audio signal |
| CN113160842A (zh) * | 2021-03-06 | 2021-07-23 | 西安电子科技大学 | MCLP-based speech dereverberation method and system |
| CN113160842B (zh) * | 2021-03-06 | 2024-04-09 | 西安电子科技大学 | MCLP-based speech dereverberation method and system |
| CN114255777A (zh) * | 2021-12-20 | 2022-03-29 | 苏州蛙声科技有限公司 | Hybrid method and system for real-time speech dereverberation |
| CN114898771A (zh) * | 2022-03-25 | 2022-08-12 | 沈阳化工大学 | Vocal training method suitable for bel canto teaching |
Also Published As
| Publication number | Publication date |
|---|---|
| EP2863391A4 (en) | 2015-09-09 |
| US20150149160A1 (en) | 2015-05-28 |
| EP2863391A1 (en) | 2015-04-22 |
| EP2863391B1 (en) | 2020-05-20 |
| JP2015519614A (ja) | 2015-07-09 |
| US9269369B2 (en) | 2016-02-23 |
| CN102750956A (zh) | 2012-10-24 |
| KR20150005719A (ko) | 2015-01-14 |
| DK2863391T3 (da) | 2020-08-03 |
| JP6431884B2 (ja) | 2018-11-28 |
| KR101614647B1 (ko) | 2016-04-21 |
| CN102750956B (zh) | 2014-07-16 |
| JP2017021385A (ja) | 2017-01-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2013189199A1 (zh) | Method and device for single-channel speech dereverberation | |
| US10891931B2 (en) | Single-channel, binaural and multi-channel dereverberation | |
| US11133019B2 (en) | Signal processor and method for providing a processed audio signal reducing noise and reverberation | |
| CN108172231B (zh) | Dereverberation method and system based on Kalman filtering | |
| CN108141656B (zh) | Method and device for digital signal processing of a microphone | |
| US20140025374A1 (en) | Speech enhancement to improve speech intelligibility and automatic speech recognition | |
| JP6019969B2 (ja) | Acoustic processing device | |
| EP3692529A1 (en) | An apparatus and a method for signal enhancement | |
| CN109920444B (zh) | Echo delay detection method and device, and computer-readable storage medium | |
| Zhou et al. | Speech dereverberation with a reverberation time shortening target | |
| CN106340302A (zh) | Dereverberation method and device for speech data | |
| Cherkassky et al. | Blind synchronization in wireless sensor networks with application to speech enhancement | |
| GB2577905A (en) | Processing audio signals | |
| CN112837697B (zh) | Echo suppression method and device | |
| CN202887704U (zh) | Single-channel speech dereverberation device | |
| EP3058564A1 (fr) | Sound spatialization with room effect, optimized in complexity | |
| CN121056569A (zh) | Acoustic echo cancellation method and processing terminal | |
| Islam et al. | Statistical modeling for suppression of late reverberation with inverse filtering for early reflections |
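The parent publication and most of the similar documents above concern the same core idea: estimate the late-reverberation power spectrum of the current frame from the power spectra of earlier sound frames, then suppress it bin by bin. As a purely illustrative sketch (this is not the method claimed in any of the patents listed; the T60 value, frame sizes, delay, and spectral floor below are all assumed parameters), a Lebart-style spectral-subtraction dereverberator can be written as:

```python
import numpy as np

def dereverb_spectral_subtraction(x, fs, t60=0.5, frame_len=512, hop=256,
                                  delay_frames=4, floor=0.1):
    """Suppress late reverberation via spectral subtraction (illustrative sketch).

    The late-reverberation power spectrum of the current frame is modelled as
    an exponentially decayed copy of the reverberant power spectrum several
    frames earlier; a per-bin gain derived from it is applied and the frames
    are overlap-added. All parameter values are assumptions for illustration.
    """
    win = np.hanning(frame_len)
    # Power decay over the delay, derived from the (assumed) T60:
    # 60 dB of decay in t60 seconds -> factor 10^(-6 * t / t60) in power.
    t_delay = delay_frames * hop / fs
    decay = 10.0 ** (-6.0 * t_delay / t60)

    n_frames = max(0, (len(x) - frame_len) // hop + 1)
    spectra = [np.fft.rfft(x[i * hop:i * hop + frame_len] * win)
               for i in range(n_frames)]

    out = np.zeros(n_frames * hop + frame_len)
    for i, X in enumerate(spectra):
        P = np.abs(X) ** 2
        if i >= delay_frames:
            # Estimated late-reverberation power spectrum of the current frame.
            P_late = decay * np.abs(spectra[i - delay_frames]) ** 2
        else:
            P_late = np.zeros_like(P)
        # Spectral-subtraction gain with a floor to limit musical noise.
        gain = np.sqrt(np.maximum((P - P_late) / np.maximum(P, 1e-12),
                                  floor ** 2))
        out[i * hop:i * hop + frame_len] += np.fft.irfft(gain * X,
                                                         n=frame_len) * win
    return out[:len(x)]
```

Because the gain never exceeds 1, the processed signal cannot gain energy relative to the windowed input; real implementations additionally smooth the gain across time and frequency to reduce artifacts.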
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13807732; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2015516415; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 14407610; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 20147035393; Country of ref document: KR; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 2013807732; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |