WO2014097470A1 - Reverberation removal device - Google Patents

Reverberation removal device

Info

Publication number
WO2014097470A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
filter
adaptive filter
prediction error
transfer function
Prior art date
Application number
PCT/JP2012/083266
Other languages
French (fr)
Japanese (ja)
Inventor
博秋 河崎
健作 藤井
Original Assignee
Toa株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toa株式会社
Priority to PCT/JP2012/083266
Publication of WO2014097470A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/04: Circuits for transducers, loudspeakers or microphones for correcting frequency response

Definitions

  • The present invention relates to a dereverberation apparatus, and in particular to a dereverberation apparatus that removes the reverberation component caused by an acoustic system from a collected sound signal, the collected sound signal being the output signal of a sound collecting means that captures sound from a sound source via an acoustic system having reverberation characteristics.
  • Conventionally, one dereverberation apparatus of this type is disclosed in, for example, Patent Document 1.
  • According to this prior art, the reverberation signal obtained after the sound source signal passes through the reverberation path is first divided into short time sections. A linear prediction coefficient (Linear Predictive Coefficient: LPC) is then obtained for each section. A residual signal is obtained by convolving the linear prediction coefficients with the reverberation signal, and the pitch frequency is further removed from the residual signal.
  • LPC Linear Predictive Coefficient
  • an acoustic transfer characteristic of an acoustic system including a reverberation path (for example, the same room) is measured in advance, and its outline is obtained.
  • this acoustic transfer characteristic is convolved with the residual signal after pitch frequency removal, whereby the residual signal is corrected.
  • an AR (Autoregressive) coefficient in a long time section is obtained from the corrected residual signal.
  • The AR coefficient is set in the FIR filter as a reverberation suppression filter coefficient, so that the FIR filter functions as a reverberation suppression filter. That is, when a reverberation signal is input to the FIR filter serving as the reverberation suppression filter, the reverberation component included in the reverberation signal is suppressed.
  • An object of the present invention is to provide a dereverberation apparatus that, unlike the prior art, does not need the acoustic transfer characteristics to be measured in advance, and that can remove reverberation more accurately than the prior art.
  • The present invention presupposes a dereverberation apparatus that removes the reverberation component caused by an acoustic system from a collected sound signal, the collected sound signal being the output signal of a sound collecting means that captures sound from a sound source via an acoustic system having reverberation characteristics.
  • On that premise, the apparatus comprises: original sound component removing means for removing, from the collected sound signal, the original sound component, which is the component of the original sound emitted from the sound source; inverse transfer function calculating means for obtaining the inverse transfer function, which is the reciprocal of the transfer function of the acoustic system, based on the original-sound-removed signal left after the original sound component removing means has removed the original sound component from the collected sound signal; and signal processing means for processing the collected sound signal based on the inverse transfer function obtained by the inverse transfer function calculating means.
  • That is, the collected sound signal output by the sound collecting means contains, in addition to the original sound component emitted from the sound source, reverberation components produced by repeated reflection of the original sound component (and, strictly speaking, various other noise components as well).
  • the original sound component removing means first removes the original sound component from the collected sound signal, that is, separates it.
  • Then, based on the original-sound-removed signal, that is, the signal left after the original sound component has been removed (separated) from the collected sound signal and which therefore contains only the reverberation component, the inverse transfer function calculating means obtains the reciprocal of the transfer function of the acoustic system, that is, the inverse transfer function.
  • The inverse transfer function of the acoustic system can therefore be obtained accurately without being affected by the original sound component.
  • The signal processing means then processes the collected sound signal based on this inverse transfer function of the acoustic system. As a result, only the reverberation component is accurately removed from the collected sound signal, and only the original sound component is accurately reproduced.
  • Because the inverse transfer function of the acoustic system is obtained from the collected sound signal alone, and the reverberation component is removed on that basis, so-called blind dereverberation is realized.
  • Blind dereverberation is generally considered difficult. In the present invention, however, the original sound component is removed from the collected sound signal as described above, and the inverse transfer function of the acoustic system is obtained based on the signal left after that removal.
  • As a result, the inverse transfer function of the acoustic system can be accurately obtained without being affected by the original sound component, and thus accurate dereverberation can be realized.
  • the original sound component removing means in the present invention may include a first adaptive filter to which a sound collection signal from the sound collection means is input.
  • the inverse transfer function calculation means may include a second adaptive filter to which a signal processed by the first adaptive filter is input.
  • the signal processing means may include a third adaptive filter to which a sound collection signal from the sound collection means is input.
  • The first adaptive filter has a relatively high responsiveness that can follow fluctuations of the original sound component, and operates so that the level of its own processed signal, or of the signal processed by the second adaptive filter, is minimized.
  • As a result, the transfer function of the first adaptive filter becomes almost equivalent to the inverse characteristic of the original sound component (strictly speaking, the inverse of a transfer function that generates the original sound component, such as a vocal tract transfer function). That is, the first adaptive filter identifies the inverse characteristic of the original sound component; in other words, it forms an inverse filter of the original sound component. Therefore, when the collected sound signal is input to the first adaptive filter, the original sound component is removed from the collected sound signal. The second adaptive filter, in turn, has a relatively low responsiveness that cannot follow fluctuations of the original sound component, and operates so that the level of its own processed signal is minimized.
  • As a result, the transfer function of the second adaptive filter becomes substantially equivalent to the inverse of the transfer function of the acoustic system. That is, the second adaptive filter identifies the inverse transfer function of the acoustic system; in other words, it forms the inverse filter of the acoustic system. The third adaptive filter realizes the processing based on this inverse transfer function by having the filter coefficients of the second adaptive filter copied into it.
  • As a result, the transfer function of the third adaptive filter also becomes substantially equivalent to the inverse transfer function of the acoustic system; that is, the inverse filter of the acoustic system is formed. When the collected sound signal is input to such a third adaptive filter, the reverberation component is removed from the collected sound signal.
  • the first adaptive filter referred to here has a relatively small number of taps (tap length). By setting the number of taps to be relatively small in this way, the first adaptive filter has a relatively high response.
  • The second adaptive filter has a larger number of taps than the first adaptive filter. As a result, the second adaptive filter has a lower responsiveness than the first adaptive filter.
  • the number of taps of the third adaptive filter is naturally the same as the number of taps of the second adaptive filter.
  • the step size for adjusting the update amount of the first adaptive filter is set to be relatively large. This also makes the first adaptive filter have high responsiveness.
  • the step size of the second adaptive filter is set to be smaller than the step size of the first adaptive filter. This also causes the second adaptive filter to have a response lower than the response of the first adaptive filter.
  • the responsiveness of the third adaptive filter is equivalent to the responsiveness of the second adaptive filter.
  • Such a first adaptive filter is preferably a linear prediction error filter, for example.
  • The second adaptive filter is also desirably a linear prediction error filter. It is known that a linear prediction error filter is suitable for forming an inverse filter, and that its processing (algorithm) basically causes no delay. Therefore, by adopting linear prediction error filters as the first adaptive filter (for forming an inverse filter of the original sound component) and the second adaptive filter (for forming an inverse filter of the acoustic system), real-time performance is realized. It is also known that a linear prediction error filter requires a relatively small amount of calculation.
  • the third adaptive filter is a linear prediction error filter configured to conform to the second adaptive filter.
  • FIG. 1 is a block diagram showing the schematic configuration of one embodiment of the present invention.
  • FIG. 2 is a block diagram showing the specific configuration of the first linear prediction error filter in the embodiment.
  • FIG. 3 is a block diagram showing the specific configuration of the second linear prediction error filter in the embodiment.
  • FIG. 4 is a diagram showing experimental results in the embodiment.
  • FIG. 5 is an enlarged view of part of FIG. 4.
  • FIG. 6 is a block diagram equivalently showing the transfer function of the acoustic system in the embodiment.
  • FIG. 7 is a diagram showing results different from those of FIG. 4 and FIG. 5.
  • As shown in FIG. 1, the dereverberation apparatus 10 includes a microphone 20 as the sound collection means.
  • The microphone 20 captures the sound emitted from the sound source 30, the so-called original sound S(z) (z: the variable of the z-transform), via an acoustic system (acoustic space) 40 having reverberation characteristics.
  • The collected sound signal X(z), which is the output signal of the microphone 20, is input to a first linear prediction error filter (Linear Prediction Error Filter: LPEF) 50 serving as the first adaptive filter.
  • LPEF: Linear Prediction Error Filter
  • the first linear prediction error filter 50 is for removing only the component of the original sound S (z) from the collected sound signal X (z) input from the microphone 20. Details of the first linear prediction error filter 50 will be described later. Then, the signal X ′ (z) processed by the first linear prediction error filter 50 is input to a second linear prediction error filter 60 as a second adaptive filter.
  • The second linear prediction error filter 60 obtains, that is, identifies, the inverse transfer function 1/R(z), which is the reciprocal of the transfer function R(z) of the acoustic system 40 from the sound source 30 to the microphone 20. It does so based on the signal X'(z) processed by the first linear prediction error filter 50, in short, based on the signal left after the component of the original sound S(z) has been removed from the collected sound signal X(z) obtained from the microphone 20. The inverse transfer function 1/R(z) identified by the second linear prediction error filter 60, strictly speaking the filter coefficients a_n (n: tap index) of the second linear prediction error filter 60 described later, is then copied to the dereverberation filter 70 serving as the third adaptive filter. Details of the second linear prediction error filter 60 will be described later.
  • the collected sound signal X (z) is input to the dereverberation filter 70 from the microphone 20.
  • The dereverberation filter 70 filters the collected sound signal X(z) according to the above-described inverse transfer function 1/R(z) of the acoustic system 40, so that the noise component due to the acoustic system 40, in particular the reverberation component, is removed from the collected sound signal X(z).
  • The signal from which the reverberation component has been removed, in other words the signal in which only the component of the original sound S(z) has been extracted, is output as the processed signal S'(z) of the dereverberation filter 70, and thus as the output of the entire dereverberation apparatus 10 according to the present embodiment. Details of the dereverberation filter 70 will also be described later.
  • the first linear prediction error filter 50 removes the component of the original sound S (z) from the collected sound signal X (z) from the microphone 20 as described above.
  • To that end, the first linear prediction error filter 50 operates so that the level of the signal E(z) processed by the second linear prediction error filter 60, that is, the so-called prediction error, is minimized.
  • the transfer function H S (z) of the first linear prediction error filter 50 is expressed by the following equation (1).
  • M is the number of taps.
  • By setting the number of taps M to a relatively small value, the responsiveness of the first linear prediction error filter 50 becomes high, and it can quickly follow fluctuations of the original sound S(z) component.
  • b_m is the filter coefficient of the m-th tap. As the filter coefficients b_m are updated as appropriate, the transfer function H_S(z) becomes substantially equivalent to the inverse characteristic 1/S(z) of the original sound S(z) (H_S(z) ≈ 1/S(z)).
  • The filter coefficients b_m are updated by, for example, the known learning identification method (Normalized Least Mean Square: NLMS).
  • When the step size μ_S is set to a relatively large value, the responsiveness of the first linear prediction error filter 50 becomes high.
  • an appropriate algorithm other than the learning identification method may be employed as an algorithm for updating the filter coefficient b m .
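The NLMS-updated prediction error filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the function name `nlms_lpef`, its parameters, and the regularization constant `eps` are hypothetical, and the filter here minimizes its own output rather than the downstream error E(z).

```python
import numpy as np


def nlms_lpef(x, num_taps, step_size, eps=1e-8):
    """Run an NLMS-adapted linear prediction error filter over x.

    Returns (residual, coeffs): the prediction error signal and the final
    predictor coefficients (b_1 .. b_M in the notation of the text).
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.zeros(num_taps)
    history = np.zeros(num_taps)  # x[n-1], x[n-2], ..., x[n-M]
    residual = np.empty_like(x)
    for n, sample in enumerate(x):
        prediction = coeffs @ history      # predict x[n] from past samples
        error = sample - prediction        # prediction error (filter output)
        residual[n] = error
        # NLMS update: step normalized by the power in the delay line.
        coeffs += step_size * error * history / (history @ history + eps)
        history = np.roll(history, 1)      # shift the delay line by one
        history[0] = sample
    return residual, coeffs


# Illustrative use on a synthetic first-order autoregressive signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
signal = np.zeros_like(noise)
for n in range(1, len(noise)):
    signal[n] = 0.9 * signal[n - 1] + noise[n]
residual, coeffs = nlms_lpef(signal, num_taps=2, step_size=0.05)
print("learned predictor coefficients:", coeffs)
```

A larger `step_size` or a smaller `num_taps` makes the filter track changes in the input faster, which is the sense in which the text calls a filter more responsive.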
  • Such a first linear prediction error filter 50 has a transversal configuration as shown in FIG. 2, for example.
  • According to this configuration, an adder 502 is provided to which the collected sound signal X(z) from the microphone 20 is input.
  • The collected sound signal X(z) is also sequentially delayed by M delay elements 504, 504, ....
  • The outputs of the delay elements 504, 504, ... are multiplied by the corresponding filter coefficients b_m by the corresponding multipliers 506, 506, ....
  • The multiplication results are then summed by adders 508, 508, ....
  • The sum P(z) produced by these adders 508, 508, ... is a prediction signal that uses the collected sound signal X(z) as its target signal, and this prediction signal P(z) is input to the adder 502 described above.
  • The signal after subtraction by the adder 502, that is, the residual signal that is the difference between the collected sound signal X(z) as the target signal and the prediction signal P(z), is output as the processed signal X'(z) of the first linear prediction error filter 50.
  • the first linear prediction error filter 50 having the transfer function H S (z) expressed by Equation 1 is realized.
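Equation 1 itself is not reproduced in this text. The transversal structure described above (M delayed copies of X(z), each weighted by b_m, summed into P(z) and subtracted from X(z)) implies a transfer function of the following form. This is a reconstruction from that description, so the sign convention and index range are assumptions rather than a quotation of the original equation.

```latex
\begin{aligned}
P(z)   &= \sum_{m=1}^{M} b_m \, z^{-m} \, X(z), \\
X'(z)  &= X(z) - P(z) = H_S(z)\, X(z), \\
H_S(z) &= 1 - \sum_{m=1}^{M} b_m \, z^{-m}.
\end{aligned}
```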
  • The transfer function H_S(z) of the first linear prediction error filter 50 is substantially equivalent to the inverse characteristic 1/S(z) of the original sound S(z), as described above. That is, the first linear prediction error filter 50 identifies the inverse characteristic of the original sound S(z); in other words, it forms an inverse filter of the original sound S(z). Therefore, when the collected sound signal X(z) is input to the first linear prediction error filter 50, the component of the original sound S(z) is removed from the collected sound signal X(z), and almost only the reverberation component remains. The processed signal X'(z) of the first linear prediction error filter 50, which contains substantially only the reverberation component, is then input to the second linear prediction error filter 60 as described above.
  • the second linear prediction error filter 60 identifies the inverse transfer function 1 / R (z) of the acoustic system 40 as described above.
  • Like the first linear prediction error filter 50, the second linear prediction error filter 60 operates so that the prediction error E(z) is minimized.
  • The second linear prediction error filter 60 has a transfer function H_R(z) expressed by Equation 2 below, which is similar in form to the transfer function H_S(z) of the first linear prediction error filter 50 expressed by Equation 1.
  • N is the number of taps.
  • The transfer function H_R(z) of the second linear prediction error filter 60 becomes substantially equivalent to the inverse transfer function 1/R(z) of the acoustic system 40 (H_R(z) ≈ 1/R(z)); that is, the inverse transfer function 1/R(z) of the acoustic system 40 is identified.
  • The filter coefficients a_n are also updated by, for example, the above-mentioned learning identification method.
  • The responsiveness of the second linear prediction error filter 60 is low.
  • An appropriate algorithm other than the learning identification method may also be employed to update the filter coefficients a_n of the second linear prediction error filter 60.
  • Such a second linear prediction error filter 60 has, for example, a transversal configuration as shown in FIG. 3, similar to the configuration of the first linear prediction error filter 50 shown in FIG. 2. That is, according to this configuration, an adder 602 is provided to which the signal X'(z) processed by the first linear prediction error filter 50 is input. The processed signal X'(z) is also sequentially delayed by N delay elements 604, 604, .... The outputs of the delay elements 604, 604, ... are multiplied by the corresponding filter coefficients a_n by the corresponding multipliers 606, 606, ..., and the results are summed by a further (N-1) adders 608, 608, .... The sum P'(z) produced by the adders 608, 608, ... is the prediction signal that is input to the adder 602.
  • The second linear prediction error filter 60 identifies the inverse transfer function 1/R(z) of the acoustic system 40 as described above; that is, it forms the inverse filter of the acoustic system 40. The transfer function H_R(z) of the second linear prediction error filter 60, strictly speaking its filter coefficients a_n, is then copied to the dereverberation filter 70 as described above.
  • The filter coefficients a_n are copied from the second linear prediction error filter 60 to the dereverberation filter 70, for example, at every sampling, that is, at a constant cycle. The copy may instead be performed every several samplings, or irregularly, for example when the filter coefficients a_n change by more than a certain amount.
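The signal flow described so far (a fast first filter, a slow second filter fed by the first, and coefficients copied at every sample into a third filter that processes the raw microphone signal) can be sketched as below. It is a simplified illustration under assumptions: the function name `dereverberate`, the default tap counts and step sizes, and `eps` are hypothetical, and each adaptive filter minimizes its own output.

```python
import numpy as np


def dereverberate(x, m_taps=8, n_taps=64, mu_s=0.5, mu_r=0.01, eps=1e-8):
    """Three-filter cascade sketch: returns (output, a, b).

    b: coefficients of the short, fast first filter (source whitening).
    a: coefficients of the long, slow second filter (room inverse).
    output: raw input filtered by a per-sample copy of a.
    """
    x = np.asarray(x, dtype=float)
    b = np.zeros(m_taps)
    a = np.zeros(n_taps)
    hist_x = np.zeros(max(m_taps, n_taps))  # past microphone samples
    hist_xp = np.zeros(n_taps)              # past first-filter outputs
    out = np.empty_like(x)
    for n, sample in enumerate(x):
        # First filter: remove the (fast-varying) source structure.
        u1 = hist_x[:m_taps]
        xp = sample - b @ u1
        b += mu_s * xp * u1 / (u1 @ u1 + eps)
        # Second filter: slowly learn the inverse of the acoustic system.
        e = xp - a @ hist_xp
        a += mu_r * e * hist_xp / (hist_xp @ hist_xp + eps)
        # Third filter: copy of a applied to the raw microphone signal.
        copied = a.copy()
        out[n] = sample - copied @ hist_x[:n_taps]
        # Shift both delay lines by one sample.
        hist_x = np.roll(hist_x, 1)
        hist_x[0] = sample
        hist_xp = np.roll(hist_xp, 1)
        hist_xp[0] = xp
    return out, a, b
```

Copying every sample is the simplest policy; copying every few samples, or only when the coefficients have moved by more than some threshold, would reduce the copy cost.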
  • The signal S'(z) (≈ S(z)) processed by the dereverberation filter 70 is the output of the entire dereverberation apparatus 10 according to the present embodiment.
  • The dereverberation effect of the dereverberation apparatus 10 according to this embodiment was confirmed experimentally. Specifically, in a conference room about 5.8 m wide, about 3.2 m deep, and about 2.7 m high, white noise was output as the original sound S(z) from a speaker serving as the sound source 30, and was picked up by the microphone 20 placed about 1.0 m from the speaker. The impulse response before dereverberation was confirmed by observing the collected sound signal X(z) from the microphone 20, and the impulse response after dereverberation was confirmed by observing the signal processed by the dereverberation filter 70 (the output of the entire dereverberation apparatus 10).
  • the configuration of the dereverberation filter 70 conforms to the configuration of the second linear prediction error filter 60.
  • The results shown in FIGS. 4 and 5 were obtained by this experiment.
  • FIG. 4A shows the impulse response (X(z)) before dereverberation.
  • FIG. 4B shows the impulse response (S'(z)) after dereverberation.
  • FIG. 5A is an enlarged view of the portion surrounded by broken-line frame A in FIG. 4A.
  • FIG. 5B is an enlarged view of the portion surrounded by broken-line frame B in FIG. 4B.
  • The component with the maximum amplitude (at time zero) is the component of the original sound S(z), the so-called direct sound component.
  • a component behind the direct sound component is a reverberation component.
  • After dereverberation, the component of the original sound S(z) (its amplitude), which is the direct sound component, is the same as before dereverberation; that is, it is not affected at all. Only the reverberation component is reduced effectively compared with before dereverberation. It was likewise confirmed that a good dereverberation effect is obtained when human speech is used as the original sound S(z).
  • A very good dereverberation effect can be obtained in an environment where flutter echoes occur. This is for the following reason.
  • In Equation 3, K is the number of taps.
  • ⁇ k is a k-th tap filter coefficient, and ⁇ k is another k-tap filter coefficient.
  • The denominator in Equation 3 corresponds to the minimum phase component, a typical example of which is a flutter echo.
  • The numerator in Equation 3 corresponds to a non-minimum phase component, a typical example of which is an irregular reflection echo.
  • The minimum phase component (the denominator in Equation 3) of the inverse transfer function 1/R(z) of the acoustic system 40 and the transfer function H_R(z) of the second linear prediction error filter 60 correspond to each other exactly. This means that the minimum phase component is reliably identified by the second linear prediction error filter 60. Therefore, the minimum phase component, including the flutter echo, is removed very well. Note that the non-minimum phase component (the numerator in Equation 3) of the inverse transfer function 1/R(z) of the acoustic system 40 is not given any special processing and is ignored.
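The claim that a prediction error filter can identify an all-pole (minimum phase) echo can be checked on a toy case. The sketch below assumes a single recursive echo x[n] = s[n] + g·x[n−D] driven by white noise, which is a stand-in for a flutter echo and not the room model of Equation 3; the names `delay`, `gain`, `n_taps`, `mu`, and `eps` are illustrative. Its exact inverse is the FIR filter 1 − g·z^(−D), so the learned coefficient at lag D would be expected to approach g while the others stay near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
delay, gain, n_taps = 5, 0.6, 8

# Synthetic "flutter echo": white-noise source fed through a recursive echo.
s = rng.standard_normal(20000)
x = np.zeros_like(s)
for n in range(len(s)):
    echo = gain * x[n - delay] if n >= delay else 0.0
    x[n] = s[n] + echo

# NLMS linear prediction error filter with a small step size (slow, accurate).
a = np.zeros(n_taps)          # a[k-1] weights x[n-k]
hist = np.zeros(n_taps)       # x[n-1], ..., x[n-n_taps]
mu, eps = 0.01, 1e-8
for sample in x:
    err = sample - a @ hist
    a += mu * err * hist / (hist @ hist + eps)
    hist = np.roll(hist, 1)
    hist[0] = sample

print("learned coefficients:", np.round(a, 2))
print("coefficient at lag", delay, "vs. echo gain:", round(a[delay - 1], 2), gain)
```

A non-minimum-phase room response has no stable causal inverse of this kind, which is consistent with the text's remark that the numerator part of Equation 3 is left untreated.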
  • the transfer function R (z) of the acoustic system 40 is equivalently expressed in a block diagram, it is as shown in FIG. According to FIG. 6, the original sound S (z) from the sound source 30 is multiplied by a filter coefficient ⁇ 0 by a multiplier 402 and then input to an adder 404. The output of the adder 404 is used as the output of the acoustic system 40, that is, input to the microphone 20.
  • The original sound S(z) from the sound source 30 is also sequentially delayed by K delay elements 406, 406, ....
  • The outputs of the delay elements 406, 406, ... are multiplied by the corresponding k-th tap filter coefficients by the corresponding multipliers 408, 408, ..., and are input to the corresponding adders 410, 410, ....
  • Meanwhile, the output of the adder 404 is sequentially delayed by K separate delay elements 412, 412, ....
  • The outputs of the delay elements 412, 412, ... are multiplied by the corresponding k-th tap filter coefficients by the corresponding multipliers 414, 414, ..., and are likewise input to the corresponding adders 410, 410, ....
  • Each adder 410 subtracts the multiplication result of the multiplier 414 from the multiplication result of the multiplier 408. The subtraction results of the adders 410, 410, ... are then summed and input to the above-described adder 404, where they are added to the original sound S(z) weighted by the multiplier 402. An equivalent circuit of the transfer function R(z) of the acoustic system 40 expressed by Equation 3 is thereby realized.
  • Compare the configuration of the equivalent circuit of the transfer function R(z) of the acoustic system 40 shown in FIG. 6 with the configuration of the second linear prediction error filter 60 shown in FIG. 3, which identifies the inverse transfer function 1/R(z).
  • The portion of the equivalent circuit in FIG. 6 that corresponds to the above-described minimum phase component (the denominator in Equation 3) and the configuration of the second linear prediction error filter 60 shown in FIG. 3 are inverses of each other. This also shows that the minimum phase component is reliably identified by the second linear prediction error filter 60, and that the minimum phase component, including the flutter echo, is satisfactorily removed.
  • The following measure may be taken in order to obtain the effect sooner.
  • In Equation 4, μ_0 is the initial value of the step size μ_R and is set arbitrarily.
  • E 0 is a target value of the prediction error E (z), which is also set arbitrarily.
  • P_S is the power of the input signal X'(z) and is approximated by Equation 5 below.
  • P E is the power of the output signal E (z) and is approximated by Equation (6). Note that the coefficient ⁇ in Equations 5 and 6 is determined as in Equation 7.
  • When the step size μ_R of the second linear prediction error filter 60 is controlled in this way, the identification speed (convergence speed) of the inverse transfer function 1/R(z) of the acoustic system 40 by the second linear prediction error filter 60 improves. This effect was confirmed by simulation.
  • the reverberation time is set to 1.024 s.
  • The result shown in FIG. 7 was obtained by this simulation.
  • FIG. 7(a) shows, for comparison, the result when the step size μ_R is fixed.
  • FIG. 7(b) shows the result when the step size μ_R is controlled.
  • According to the dereverberation apparatus 10 of the present embodiment, only the reverberation component of the collected sound signal X(z) from the microphone 20 can be accurately removed, and only the original sound S(z) can be accurately extracted.
  • Such a dereverberation apparatus 10 exhibits an extremely good dereverberation effect, particularly in an environment where the flutter echo described above occurs, or in an environment where the speaker and the microphone are about 1 m apart, as in a teleconference system. It is particularly effective for devices that are susceptible to reverberation, such as hands-free telephones, intercoms, and boundary microphones.
  • According to the dereverberation apparatus 10 of the present embodiment, unlike the above-described conventional technology, it is not necessary to measure the acoustic transfer characteristics in advance. Furthermore, the inverse transfer function 1/R(z) of the acoustic system 40 is identified based only on the collected sound signal X(z) from the microphone 20, and dereverberation is performed on that basis; in other words, so-called blind dereverberation is realized. Blind dereverberation is generally considered difficult, but in the dereverberation apparatus 10 of the present embodiment the component of the original sound S(z) is removed from the collected sound signal X(z) of the microphone 20 as described above.
  • The inverse transfer function 1/R(z) of the acoustic system 40 is then identified based on the processed signal X'(z) left after that removal, so it is identified accurately without being affected by the component of the original sound S(z), and accurate dereverberation is realized. In particular, a better dereverberation effect is obtained than with the prior art, whose reverberation suppression lacks accuracy. This contributes greatly to improved sound quality.
  • a good dereverberation effect can be obtained as described above by the three linear prediction error filters 50, 60 and 70 including the dereverberation filter 70. Since these linear prediction error filters 50, 60 and 70 basically do not cause a delay, a so-called real-time property is realized. Further, since the linear prediction error filters 50, 60 and 70 have a relatively small amount of calculation, the burden on the CPU and DSP (not shown) constituting them is reduced. In other words, as the CPU, DSP, etc., it is possible to adopt an inexpensive one whose capability is not necessarily high.
  • In the present embodiment, the first linear prediction error filter 50 is a transversal type, but it is not limited to this.
  • The first linear prediction error filter 50 may be realized by a configuration other than the transversal type, such as a lattice type.
  • However, because the transversal type has a simpler configuration than, for example, the lattice type, adopting the transversal type further reduces the burden on the CPU, DSP, or the like that implements the first linear prediction error filter 50. In other words, a less expensive CPU or DSP can be used.
  • Similarly, the second linear prediction error filter 60 is not limited to the transversal type and may be realized by another configuration, such as a lattice type.
  • In that case, the configuration of the dereverberation filter 70 is determined in accordance with the configuration of the second linear prediction error filter 60.
  • Instead of the first linear prediction error filter 50 as the first adaptive filter, an adaptive filter of another configuration (algorithm) may be employed.
  • However, by employing the first linear prediction error filter 50 as the first adaptive filter, real-time performance is realized, and the burden on the CPU, DSP, or the like that implements it is reduced.
  • The second linear prediction error filter 60 may likewise be replaced with an adaptive filter of another configuration.
  • In that case too, the configuration of the dereverberation filter 70 is determined in accordance with the configuration of the second linear prediction error filter 60.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

[Problem] To achieve accurate reverberation removal even while being of a simple configuration. [Solution] According to this reverberation removal device (10), by way of input of a collected sound signal X(z) from a microphone (20) into a first linear prediction error filter (50), the original sound component S(z) is removed from the collected sound signal X(z). Then, on the basis of a post-processing signal X'(z) by way of the first linear prediction error filter (50), a second linear prediction error filter (60) identifies an inverse transfer function 1/R(z) which is a reciprocal of a transfer function R(z) of an acoustic system (40) from an audio source (30) to the microphone (20). Then, the inverse transfer function 1/R(z) of the acoustic system (40) which has been identified by the second linear prediction error filter (60) is copied to a reverberation removal filter (70). By inputting the collected sound signal X(z) from the microphone (20) into the reverberation removal filter (70), reverberation components by the acoustic system (40) are removed from the collected sound signal X(z) so that only the original sound component S(z) is extracted.

Description

Reverberation removal device
The present invention relates to a dereverberation apparatus, and in particular to a dereverberation apparatus that removes the reverberation component introduced by an acoustic system from a collected sound signal, which is the output signal of sound collecting means that captures sound from a sound source via the acoustic system having reverberation characteristics.
A conventional dereverberation apparatus of this type is disclosed in, for example, Patent Document 1. In this prior art, the reverberant signal obtained after a sound source signal passes through a reverberation path is first divided into short time sections, and linear predictive coefficients (Linear Predictive Coefficient: LPC) are obtained for each section. A residual signal is then obtained by convolving the linear predictive coefficients with the reverberant signal, and the pitch frequency is further removed from this residual signal. Meanwhile, the acoustic transfer characteristic of the acoustic system including the reverberation path (for example, the same room) is measured in advance, and its outline is obtained. The residual signal after pitch frequency removal is corrected by convolving this outline with it. AR (autoregressive) coefficients over a long time section are then obtained from the corrected residual signal and set in an FIR filter as reverberation suppression filter coefficients, so that the FIR filter functions as a reverberation suppression filter. That is, when the reverberant signal is input to this FIR filter, the reverberation component contained in the signal is suppressed.
Patent Document 1: JP-A-9-261133
However, the prior art described above requires the acoustic transfer characteristic of the acoustic system to be measured in advance, which costs extra time and effort. Moreover, the transfer characteristic measured in advance is not that of the speaker's utterance position (that is, it belongs to a path different from the one traveled by the reverberant signal to be suppressed), so reverberation suppression based on it inevitably lacks accuracy.
Accordingly, an object of the present invention is to provide a dereverberation apparatus that, unlike the prior art, does not require the acoustic transfer characteristic to be measured in advance and that removes reverberation more accurately than the prior art.
To achieve this object, the present invention is premised on a dereverberation apparatus that removes the reverberation component introduced by an acoustic system from a collected sound signal, which is the output signal of sound collecting means that captures sound from a sound source via the acoustic system having reverberation characteristics. On this premise, the apparatus comprises: original sound component removing means that removes from the collected sound signal the original sound component, that is, the component of the original sound (from the sound source); inverse transfer function calculating means that obtains the inverse of the transfer function of the acoustic system on the basis of the original-sound-removed signal, that is, the collected sound signal after the original sound component has been removed by the original sound component removing means; and signal processing means that applies to the collected sound signal processing based on the inverse transfer function obtained by the inverse transfer function calculating means.
According to the present invention configured in this way, the collected sound signal output by the sound collecting means contains, in addition to the original sound component emitted from the sound source, a reverberation component produced by repeated reflection of that original sound component (and, strictly speaking, various noise components as well). To remove the reverberation component from such a collected sound signal, the original sound component removing means first removes, or so to speak separates, the original sound component from the collected sound signal. The inverse of the transfer function of the acoustic system, that is, the inverse transfer function, is then obtained by the inverse transfer function calculating means on the basis of the original-sound-removed signal, in other words a signal containing only the reverberation component. Following this procedure allows the inverse transfer function of the acoustic system to be obtained accurately without being affected by the original sound component. The signal processing means then applies processing based on this inverse transfer function to the collected sound signal. As a result, only the reverberation component is accurately removed from the collected sound signal, and only the original sound component is accurately reproduced.
That is, according to the present invention, the inverse transfer function of the acoustic system is obtained solely from the collected sound signal of the sound collecting means, and the reverberation component is removed on that basis, so that so-called blind dereverberation is realized. Blind dereverberation is generally considered difficult. In the present invention, however, the original sound component is removed from the collected sound signal as described above, and the inverse transfer function of the acoustic system is obtained from the original-sound-removed signal. The inverse transfer function is therefore obtained accurately, without being affected by the original sound component, and accurate dereverberation is realized. In particular, dereverberation is more accurate than in the prior art described above, in which reverberation suppression uses an acoustic transfer function that is not that of the speaker's utterance position. Moreover, unlike that prior art, the acoustic transfer function need not be measured in advance.
The original sound component removing means of the present invention may include a first adaptive filter to which the collected sound signal from the sound collecting means is input. The inverse transfer function calculating means may include a second adaptive filter to which the signal processed by the first adaptive filter is input. Further, the signal processing means may include a third adaptive filter to which the collected sound signal from the sound collecting means is input. In this case, the first adaptive filter has a relatively high responsiveness, sufficient to follow fluctuations of the original sound component, and operates so that the level of its own processed signal, or of the signal processed by the second adaptive filter, is minimized. As a result, the transfer function of the first adaptive filter becomes substantially equivalent to the inverse characteristic of the original sound component (strictly, of the transfer function that generates the original sound component, such as a vocal tract transfer function). That is, the first adaptive filter identifies the inverse characteristic of the original sound component and forms, so to speak, an inverse filter of the original sound component. Inputting the collected sound signal to such a first adaptive filter therefore removes the original sound component from it. The second adaptive filter has a relatively low responsiveness, insufficient to follow fluctuations of the original sound component, and operates so that the level of its own processed signal is minimized. As a result, the transfer function of the second adaptive filter becomes substantially equivalent to the inverse of the transfer function of the acoustic system. That is, the second adaptive filter identifies the inverse transfer function of the acoustic system and thus forms an inverse filter of the acoustic system. The third adaptive filter realizes the processing based on the inverse transfer function described above by having the filter coefficients of the second adaptive filter copied into it. Once the coefficients are copied, the transfer function of the third adaptive filter becomes substantially equivalent to the inverse transfer function of the acoustic system, so the third adaptive filter also forms an inverse filter of the acoustic system. Inputting the collected sound signal to the third adaptive filter removes the reverberation component from it.
The first adaptive filter referred to here has a relatively small number of taps (tap length), which gives it a relatively high responsiveness. The second adaptive filter has a larger number of taps than the first adaptive filter, which gives it a lower responsiveness than the first adaptive filter. The number of taps of the third adaptive filter is, naturally, the same as that of the second adaptive filter.
The step size that adjusts the update amount of the first adaptive filter is set relatively large, which also gives the first adaptive filter a high responsiveness. The step size of the second adaptive filter is set smaller than that of the first adaptive filter, which also gives the second adaptive filter a lower responsiveness than the first adaptive filter. The responsiveness of the third adaptive filter is equivalent to that of the second adaptive filter.
The first adaptive filter is preferably a linear prediction error filter, for example, and the second adaptive filter is likewise preferably a linear prediction error filter. A linear prediction error filter is well suited to forming an inverse filter, and its processing (algorithm) is known to introduce essentially no delay. Adopting linear prediction error filters as the first adaptive filter (which forms the inverse filter of the original sound component) and the second adaptive filter (which forms the inverse filter of the acoustic system) therefore achieves real-time operation. In addition, a linear prediction error filter is known to require a relatively small amount of computation, which reduces the load on the CPU (Central Processing Unit), DSP (Digital Signal Processor), or similar device that implements it. In other words, an inexpensive CPU or DSP of modest capability can be used. The third adaptive filter is a linear prediction error filter whose configuration conforms to that of the second adaptive filter.
FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing the specific configuration of the first linear prediction error filter in the embodiment.
FIG. 3 is a block diagram showing the specific configuration of the second linear prediction error filter in the embodiment.
FIG. 4 is a diagram showing experimental results for the embodiment.
FIG. 5 is an enlarged view of part of FIG. 4.
FIG. 6 is a block diagram showing an equivalent representation of the transfer function of the acoustic system in the embodiment.
FIG. 7 is a diagram showing experimental results different from those of FIGS. 4 and 5.
An embodiment of the present invention will be described with reference to FIGS. 1 to 7.
As shown in FIG. 1, the dereverberation apparatus 10 according to the present embodiment includes a microphone 20 as sound collecting means. The microphone 20 captures the sound emitted from a sound source 30, the so-called original sound S(z) (where z is the variable of the z-transform), via an acoustic system (acoustic space) 40 having reverberation characteristics. The collected sound signal X(z), which is the output signal of the microphone 20, is input to a first linear prediction error filter (Linear Prediction Error Filter; LPEF) 50 serving as the first adaptive filter.
The first linear prediction error filter 50 removes only the component of the original sound S(z) from the collected sound signal X(z) input from the microphone 20. The first linear prediction error filter 50 is described in detail later. The signal X'(z) processed by the first linear prediction error filter 50 is input to a second linear prediction error filter 60 serving as the second adaptive filter.
The second linear prediction error filter 60 obtains, that is, identifies, the inverse transfer function 1/R(z), which is the reciprocal of the transfer function R(z) of the acoustic system 40 from the sound source 30 to the microphone 20. It does so on the basis of the signal X'(z) processed by the first linear prediction error filter 50, in other words the collected sound signal X(z) from the microphone 20 after the component of the original sound S(z) has been removed. The inverse transfer function 1/R(z) identified by the second linear prediction error filter 60, or strictly speaking its filter coefficients a_n (n: tap number) described later, is copied to a dereverberation filter 70 serving as the third adaptive filter. The second linear prediction error filter 60 is also described in detail later.
Meanwhile, the collected sound signal X(z) from the microphone 20 is input to the dereverberation filter 70. The dereverberation filter 70 filters the collected sound signal X(z) in accordance with the inverse transfer function 1/R(z) of the acoustic system 40, thereby removing from X(z) the noise components introduced by the acoustic system 40, in particular the reverberation component. The signal from which the reverberation component has been removed, in other words the signal in which only the component of the original sound S(z) remains, is output from the dereverberation filter 70 as its processed signal S'(z), and this is the output of the dereverberation apparatus 10 as a whole. The dereverberation filter 70 is also described in detail later.
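The signal flow of FIG. 1 can be summarized in the z-domain as follows. This is a sketch under two simplifying assumptions that the text does not state in this form: the collected sound signal is modeled as the product X(z) = R(z)S(z) with noise ignored, and each adaptive filter is taken to have converged. H_S(z), H_R(z) and H_F(z) denote the transfer functions of the first linear prediction error filter 50, the second linear prediction error filter 60 and the dereverberation filter 70.

```latex
\begin{align}
X(z)  &= R(z)\,S(z) \\
X'(z) &= H_S(z)\,X(z) \approx \frac{1}{S(z)}\,R(z)\,S(z) = R(z) \\
E(z)  &= H_R(z)\,X'(z) \approx H_R(z)\,R(z)
        \quad\Rightarrow\quad H_R(z) \approx \frac{1}{R(z)} \\
S'(z) &= H_F(z)\,X(z) = H_R(z)\,X(z) \approx \frac{1}{R(z)}\,R(z)\,S(z) = S(z)
\end{align}
```

The second line is why the original sound has to be removed first: only then does the signal seen by the second filter reflect the acoustic system alone.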
As described above, the first linear prediction error filter 50 removes the component of the original sound S(z) from the collected sound signal X(z) from the microphone 20. To do so, it operates so that the level of the signal E(z) output by the second linear prediction error filter 60, the so-called prediction error, is minimized. Specifically, the transfer function H_S(z) of the first linear prediction error filter 50 is given by Equation 1 below.
[Equation 1]
$$H_S(z) = 1 - \sum_{m=1}^{M} b_m z^{-m}$$
In Equation 1, M is the number of taps. M is set to a relatively small value; for example, if the sampling frequency f of the collected sound signal X(z) is f = 16 kHz, M is about 16 to 64 (1 ms to 4 ms in terms of time). Making M relatively small gives the first linear prediction error filter 50 a high responsiveness, so that it can quickly follow fluctuations of the original sound S(z) component. Also in Equation 1, b_m is the filter coefficient of the m-th tap. By updating b_m appropriately, that is, so that the prediction error E(z) described above is minimized, the transfer function H_S(z) of the first linear prediction error filter 50 becomes substantially equivalent to the inverse characteristic 1/S(z) of the original sound S(z) (H_S(z) ≈ 1/S(z)). The filter coefficient b_m is updated by, for example, the known learning identification method (Normalized Least Mean Square; NLMS). The step size μ_S that adjusts the update amount of b_m is set to a relatively large value, for example about μ_S = 0.1 to 0.5, which also raises the responsiveness of the first linear prediction error filter 50. Of course, an appropriate algorithm other than the learning identification method may be used to update b_m.
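A linear prediction error filter updated by NLMS can be sketched in a few lines of Python. This is a minimal illustration and not the patent's implementation: the function name `nlms_lpef` and the regularization constant `eps` are assumptions introduced here, and the filter minimizes its own output, whereas the embodiment drives the first filter with the prediction error E(z) of the second filter. The test signal is a hypothetical first-order autoregressive source standing in for a strongly colored original sound.

```python
import random


def nlms_lpef(x, num_taps, mu, eps=1e-3):
    """Adaptive linear prediction error filter (sketch).

    Returns (residual, coeffs), where
    residual[k] = x[k] - sum_m coeffs[m-1] * x[k-m].
    `eps` is a small regularizer (an assumption, not from the source).
    """
    coeffs = [0.0] * num_taps
    hist = [0.0] * num_taps          # hist[0] = x[k-1], hist[1] = x[k-2], ...
    residual = []
    for sample in x:
        pred = sum(c * h for c, h in zip(coeffs, hist))
        err = sample - pred                      # prediction error
        power = sum(h * h for h in hist) + eps   # NLMS normalization term
        gain = mu * err / power
        coeffs = [c + gain * h for c, h in zip(coeffs, hist)]
        residual.append(err)
        hist = [sample] + hist[:-1]              # shift the delay line
    return residual, coeffs


# Hypothetical colored source: x[k] = 0.9 * x[k-1] + white noise.
random.seed(0)
signal, prev = [], 0.0
for _ in range(5000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    signal.append(prev)

# M = 16 and mu_S = 0.2 lie inside the ranges quoted in the text.
residual, b = nlms_lpef(signal, num_taps=16, mu=0.2)
```

After convergence the residual is close to the white excitation, so its power is well below that of the colored input.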
The first linear prediction error filter 50 has, for example, the transversal configuration shown in FIG. 2. In this configuration, an adder 502 receives the collected sound signal X(z) from the microphone 20. The collected sound signal X(z) is also delayed successively by M delay elements 504, 504, .... The output of each delay element 504 is multiplied by the corresponding filter coefficient b_m in the corresponding multiplier 506, and the products are summed by (M-1) further adders 508, 508, .... The resulting sum P(z) is a prediction signal whose target signal is the collected sound signal X(z). The prediction signal P(z) is input to the adder 502, where it is subtracted from the collected sound signal X(z). The output of the adder 502, that is, the residual signal equal to the difference between the target signal X(z) and the prediction signal P(z), is output as the processed signal X'(z) of the first linear prediction error filter 50. This realizes the first linear prediction error filter 50 having the transfer function H_S(z) given by Equation 1.
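In the time domain, the structure of FIG. 2 corresponds to the difference equations below. The sample index k and the lowercase sequences x[k], p[k] and x'[k] are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
p[k]  &= \sum_{m=1}^{M} b_m\, x[k-m] \\
x'[k] &= x[k] - p[k]
\end{align}
```

Taking the z-transform of both lines gives X'(z) = (1 - \sum_{m=1}^{M} b_m z^{-m}) X(z), which is a transfer function of the form H_S(z) = 1 - \sum_{m=1}^{M} b_m z^{-m}.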
As described above, the transfer function H_S(z) of the first linear prediction error filter 50 is substantially equivalent to the inverse characteristic 1/S(z) of the original sound S(z). That is, the first linear prediction error filter 50 identifies the inverse characteristic of the original sound S(z) and forms, so to speak, an inverse filter of the original sound S(z). When the collected sound signal X(z) is input to the first linear prediction error filter 50, the component of the original sound S(z) is therefore removed from X(z), and essentially only the reverberation component remains. The processed signal X'(z), which contains essentially only the reverberation component, is input to the second linear prediction error filter 60 as described above.
As described above, the second linear prediction error filter 60 identifies the inverse transfer function 1/R(z) of the acoustic system 40. Like the first linear prediction error filter 50, it operates so that the prediction error E(z) is minimized. Specifically, the second linear prediction error filter 60 has a transfer function H_R(z) given by Equation 2 below, which has the same form as the transfer function H_S(z) of the first linear prediction error filter 50 given by Equation 1.
[Equation 2]
$$H_R(z) = 1 - \sum_{n=1}^{N} a_n z^{-n}$$
In Equation 2, N is the number of taps. N is set to a relatively large value; for example, if the sampling frequency f is f = 16 kHz as above, N is about 1024 to 2048 (64 ms to 128 ms in terms of time). Making N relatively large lowers the responsiveness of the second linear prediction error filter 60, so that it cannot follow fluctuations of the original sound S(z) component, while making it well suited to identifying the inverse transfer function 1/R(z) of the acoustic system 40 accurately. Also in Equation 2, a_n is the filter coefficient of the n-th tap. By updating a_n appropriately, that is, so that the prediction error E(z) is minimized, the transfer function H_R(z) of the second linear prediction error filter 60 becomes substantially equivalent to the inverse transfer function 1/R(z) of the acoustic system 40 (H_R(z) ≈ 1/R(z)); in other words, the inverse transfer function 1/R(z) is identified. The filter coefficient a_n is also updated by, for example, the learning identification method described above. The step size μ_R that adjusts the update amount of a_n is set to a relatively small value, for example about μ_R = 0.001 to 0.01, which also lowers the responsiveness of the second linear prediction error filter 60. An appropriate algorithm other than the learning identification method may also be used to update a_n.
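The time spans quoted for the tap counts follow from dividing the number of taps by the sampling frequency. The short check below reproduces them; the helper name `taps_to_ms` is hypothetical and used only for illustration.

```python
def taps_to_ms(num_taps, sampling_hz=16_000):
    """Length of a tapped delay line in milliseconds."""
    return 1000.0 * num_taps / sampling_hz


# Tap counts quoted in the text for f = 16 kHz.
first_filter_span = [taps_to_ms(m) for m in (16, 64)]        # M range
second_filter_span = [taps_to_ms(n) for n in (1024, 2048)]   # N range
print(first_filter_span, second_filter_span)
```

The second filter therefore spans a window tens of times longer than the first, which is what lets it capture the reverberation tail while the first filter captures only the short-term structure of the source.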
The second linear prediction error filter 60 has, for example, the transversal configuration shown in FIG. 3, which is similar to the configuration of the first linear prediction error filter 50 shown in FIG. 2. In this configuration, an adder 602 receives the signal X'(z) processed by the first linear prediction error filter 50. The processed signal X'(z) is also delayed successively by N delay elements 604, 604, .... The output of each delay element 604 is multiplied by the corresponding filter coefficient a_n in the corresponding multiplier 606, and the products are summed by (N-1) further adders 608, 608, .... The resulting sum P'(z) is a prediction signal whose target signal is the processed signal X'(z) of the first linear prediction error filter 50. The prediction signal P'(z) is input to the adder 602, where it is subtracted from the processed signal X'(z). The output of the adder 602, that is, the residual signal equal to the difference between the target signal X'(z) and the prediction signal P'(z), is output as the prediction error E(z) of the second linear prediction error filter 60. This realizes the second linear prediction error filter 60 having the transfer function H_R(z) given by Equation 2.
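The cascade of the two adaptive filters can be tried on a toy signal. The sketch below is an illustration under several assumptions that are not from the source: each filter minimizes its own output (the embodiment instead drives the first filter with E(z)), the helper names `Lpef`, `identify_inverse` and `toy_room` are hypothetical, the "room" is a single recursive echo of gain 0.5 at a delay of 5 samples, and the tap counts and step sizes are scaled down so that a pure-Python run finishes quickly while keeping M < N and μ_S > μ_R. The echo delay is chosen longer than M so that the short first filter cannot model it.

```python
import random


class Lpef:
    """Sample-by-sample NLMS linear prediction error filter (sketch)."""

    def __init__(self, num_taps, mu, eps=1e-3):
        self.coeffs = [0.0] * num_taps
        self.hist = [0.0] * num_taps   # hist[0] is the most recent past sample
        self.mu = mu
        self.eps = eps                 # regularizer (assumption)

    def step(self, sample):
        pred = sum(c * h for c, h in zip(self.coeffs, self.hist))
        err = sample - pred
        power = sum(h * h for h in self.hist) + self.eps
        gain = self.mu * err / power
        self.coeffs = [c + gain * h for c, h in zip(self.coeffs, self.hist)]
        self.hist = [sample] + self.hist[:-1]
        return err


def identify_inverse(x, m_taps, mu_s, n_taps, mu_r):
    """Run the fast short filter, then the slow long filter on its residual."""
    first = Lpef(m_taps, mu_s)     # stands in for filter 50
    second = Lpef(n_taps, mu_r)    # stands in for filter 60
    for sample in x:
        x_prime = first.step(sample)
        second.step(x_prime)
    return first.coeffs, second.coeffs


def toy_room(source, delay=5, gain=0.5):
    """Hypothetical all-pole room: x[k] = s[k] + gain * x[k - delay]."""
    x = []
    for k, s in enumerate(source):
        echo = gain * x[k - delay] if k >= delay else 0.0
        x.append(s + echo)
    return x


random.seed(1)
source = [random.gauss(0.0, 1.0) for _ in range(20000)]  # white "original sound"
x = toy_room(source)
b, a = identify_inverse(x, m_taps=4, mu_s=0.05, n_taps=8, mu_r=0.01)
```

For this toy room the exact inverse is 1 - 0.5 z^{-5}, so the fifth coefficient of the second filter (`a[4]`) should settle near 0.5 and the others near zero.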
As described above, the second linear prediction error filter 60 identifies the inverse transfer function 1/R(z) of the acoustic system 40, that is, it forms an inverse filter of the acoustic system 40. The transfer function H_R(z) of the second linear prediction error filter 60, or strictly speaking its filter coefficients a_n, is copied to the dereverberation filter 70 as described above. The copy of the filter coefficients a_n from the second linear prediction error filter 60 to the dereverberation filter 70 is performed, for example, at every sample, that is, at a constant period. The copy may instead be performed once every several samples, or irregularly, for example when the filter coefficients a_n have changed by a certain amount or more.
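The three copy schedules mentioned above can be expressed as a single decision function. This is a sketch only: the name `should_copy`, its parameters, and the choice of the largest per-coefficient change as the measure of "a certain amount" are assumptions, since the text does not specify how the change is measured.

```python
def should_copy(sample_index, current, last_copied, period=1, threshold=None):
    """Decide whether to copy coefficients a_n to the dereverberation filter.

    period=1    -> copy at every sample
    period=K    -> copy once every K samples
    threshold=t -> copy only when some coefficient has moved by at least t
                   since the last copy (the change measure is an assumption)
    """
    if threshold is None:
        return sample_index % period == 0
    largest_change = max(abs(c - l) for c, l in zip(current, last_copied))
    return largest_change >= threshold
```

Copying less often reduces memory traffic at the cost of the dereverberation filter lagging slightly behind the identified inverse, which matters little because the second filter changes slowly by design.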
The dereverberation filter 70 has basically the same configuration as the second linear prediction error filter 60 shown in FIG. 3, except that it does not adapt by itself. Its transfer function H_F(z) is therefore equivalent to the transfer function H_R(z) of the second linear prediction error filter 60 given by Equation 2 (H_F(z) = H_R(z)). When the collected sound signal X(z) from the microphone 20 is input to the dereverberation filter 70, the reverberation component is removed from X(z) and only the component of the original sound S(z) is extracted. The processed signal S'(z) (≈ S(z)) of the dereverberation filter 70 is the output of the dereverberation apparatus 10 as a whole.
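The effect of a non-adaptive filter of the form H_F(z) = 1 - Σ a_n z^{-n} can be checked on a case where the exact inverse is known. The example below is a sketch with made-up data: the function name `dereverb`, the short source sequence, and the toy room (a recursive echo of gain 0.5 at a delay of 3 samples, whose exact inverse is 1 - 0.5 z^{-3}) are all assumptions for illustration.

```python
def dereverb(x, a):
    """Apply H_F(z) = 1 - sum_n a[n-1] * z^-n with fixed, copied coefficients."""
    hist = [0.0] * len(a)          # hist[0] = x[k-1], hist[1] = x[k-2], ...
    out = []
    for sample in x:
        out.append(sample - sum(an * h for an, h in zip(a, hist)))
        hist = [sample] + hist[:-1]
    return out


# Made-up "original sound" and a toy room: x[k] = s[k] + 0.5 * x[k-3].
source = [1.0, 0.0, 2.0, -1.0, 0.0, 0.0, 3.0, 0.0]
x = []
for k, s in enumerate(source):
    x.append(s + (0.5 * x[k - 3] if k >= 3 else 0.0))

# Coefficients the second filter would ideally have identified for this room.
restored = dereverb(x, [0.0, 0.0, 0.5])
```

With the ideal coefficients the restored sequence matches the source sample for sample; in the apparatus the coefficients are only approximately ideal, so the match is approximate (S'(z) ≈ S(z)).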
The dereverberation effect of the dereverberation apparatus 10 according to this embodiment was verified experimentally. Specifically, in a conference room about 5.8 m wide, about 3.2 m deep, and about 2.7 m high, a loudspeaker serving as the sound source 30 emitted white noise as the original sound S(z), which was picked up by the microphone 20 placed about 1.0 m from the loudspeaker. The impulse response before dereverberation was obtained by observing the collected sound signal X(z) of the microphone 20, and the impulse response after dereverberation was obtained by observing the signal processed by the dereverberation filter 70 (the output of the dereverberation apparatus 10 as a whole). The sampling frequency f was 16 kHz. The first linear prediction error filter 50 had M = 16 taps and a step size μ_S = 0.2. The second linear prediction error filter 60 had N = 2048 taps and a step size μ_R = 0.004. The configuration of the dereverberation filter 70 follows that of the second linear prediction error filter 60.
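The cascade used in this experiment can be sketched as follows, with the tap counts and step sizes quoted above as defaults. The adaptation algorithm is not stated in this section; normalized LMS (NLMS) is assumed purely for illustration, and the class and function names are hypothetical. The sketch is written in plain Python for readability and is far too slow for N = 2048 in real time.

```python
import random
from typing import List


class NlmsPredictionErrorFilter:
    """Transversal linear prediction error filter adapted with NLMS (assumed)."""

    def __init__(self, taps: int, step: float, eps: float = 1e-8) -> None:
        self.a = [0.0] * taps       # prediction coefficients a_1..a_taps
        self.hist = [0.0] * taps    # past inputs, newest first
        self.step = step
        self.eps = eps

    def process(self, x: float, adapt: bool = True) -> float:
        pred = sum(a * h for a, h in zip(self.a, self.hist))
        err = x - pred              # prediction error is the filter output
        if adapt:
            norm = sum(h * h for h in self.hist) + self.eps
            g = self.step * err / norm
            self.a = [a + g * h for a, h in zip(self.a, self.hist)]
        self.hist = [x] + self.hist[:-1]
        return err


def dereverberate(x: List[float], m: int = 16, mu_s: float = 0.2,
                  n: int = 2048, mu_r: float = 0.004) -> List[float]:
    f50 = NlmsPredictionErrorFilter(m, mu_s)   # fast: tracks the original sound
    f60 = NlmsPredictionErrorFilter(n, mu_r)   # slow: identifies 1/R(z)
    f70 = NlmsPredictionErrorFilter(n, 0.0)    # never adapts, uses copied a_n
    out: List[float] = []
    for sample in x:
        x_prime = f50.process(sample)          # X'(z): original sound removed
        f60.process(x_prime)                   # adapt on X'(z)
        f70.a = list(f60.a)                    # copy at every sample
        out.append(f70.process(sample, adapt=False))
    return out


# Small tap counts keep this demonstration fast.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
processed = dereverberate(noise, m=4, n=32)
```

The design point the sketch tries to capture is that filter 60 adapts on the output of filter 50 while filter 70 filters the raw microphone signal with the copied coefficients.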
This experiment produced the results shown in FIGS. 4 and 5. FIG. 4(a) shows the impulse response before dereverberation (X(z)), and FIG. 4(b) shows the impulse response after dereverberation (S'(z)). FIG. 5(a) is an enlarged view of the portion enclosed by the broken-line frame A in FIG. 4(a), and FIG. 5(b) is an enlarged view of the portion enclosed by the broken-line frame B in FIG. 4(b). In FIGS. 4 and 5, the component with the maximum amplitude (at time zero) is the component of the original sound S(z), that is, the so-called direct sound component. The components that follow the direct sound component are the reverberation components.
As is apparent from FIGS. 4 and 5, with the dereverberation apparatus 10 of this embodiment the amplitude of the original sound S(z) component, which is the direct sound component, is the same as before dereverberation; that is, it is not affected at all. Only the reverberation components are effectively reduced compared with before dereverberation. It was also confirmed that a similarly good dereverberation effect is obtained when human speech is used as the original sound S(z).
In particular, the dereverberation apparatus 10 of this embodiment provides a very good dereverberation effect in an environment where flutter echo (known in Japanese as 鳴竜) occurs. The reason is as follows.
Expressed as a formula, the transfer function R(z) of the acoustic system 40 is given by Equation 3 below, where K is the number of taps, α_k is the filter coefficient of the k-th tap, and β_k is another filter coefficient of the k-th tap.
[Equation 3: equation image not reproduced in this text]
The denominator of Equation 3 corresponds to the minimum phase component, of which flutter echo is a typical example. The numerator of Equation 3 corresponds to the non-minimum phase component, of which diffusely reflected echo is a typical example. Comparing the transfer function R(z) of the acoustic system 40 expressed by Equation 3 with the transfer function H_R(z) of the second linear prediction error filter 60 expressed by Equation 2 above, which is used to identify the inverse transfer function 1/R(z), shows that the minimum phase component of the inverse transfer function 1/R(z) of the acoustic system 40 (the denominator of Equation 3) and the transfer function H_R(z) of the second linear prediction error filter 60 correspond exactly to each other. This means that the minimum phase component is reliably identified by the second linear prediction error filter 60. The minimum phase component, including flutter echo, is therefore removed very well. The non-minimum phase component of the inverse transfer function 1/R(z) of the acoustic system 40 (the numerator of Equation 3) receives no special processing and is, so to speak, ignored.
The transfer function R(z) of the acoustic system 40 can be represented equivalently by the block diagram of FIG. 6. In FIG. 6, the original sound S(z) from the sound source 30 is multiplied by the filter coefficient β_0 in a multiplier 402 and then input to an adder 404. The output of the adder 404 is the output of the acoustic system 40, that is, the input to the microphone 20. The original sound S(z) from the sound source 30 is also delayed successively by K delay elements 406, 406, .... The output of each delay element 406 is multiplied by the corresponding filter coefficient β_k in a corresponding multiplier 408 and then input to a corresponding one of K adders 410, 410, .... The output of the adder 404 is likewise delayed successively by K further delay elements 412, 412, .... The output of each delay element 412 is multiplied by the corresponding filter coefficient α_k in a corresponding multiplier 414 and then input to the corresponding adder 410. Each adder 410 subtracts the product from the multiplier 414 from the product from the multiplier 408. The subtraction results of the adders 410, 410, ... are summed and input to the adder 404, where they are added to the original sound S(z) scaled by β_0. This realizes an equivalent circuit of the transfer function R(z) of the acoustic system 40 expressed by Equation 3.
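Reading the signal flow of FIG. 6 as described above, with s(n) the original sound and x(n) the output of the adder 404, gives the following difference equation and, after a z-transform, a pole-zero transfer function. This is a derivation from the verbal description of the block diagram only; the image of Equation 3 is not available here, so agreement with Equation 3 is expected but cannot be checked term by term.

```latex
\begin{align}
x(n) &= \beta_0\, s(n) + \sum_{k=1}^{K} \bigl[\, \beta_k\, s(n-k) - \alpha_k\, x(n-k) \,\bigr] \\
R(z) &= \frac{X(z)}{S(z)}
      = \frac{\displaystyle\sum_{k=0}^{K} \beta_k\, z^{-k}}
             {\displaystyle 1 + \sum_{k=1}^{K} \alpha_k\, z^{-k}}
\end{align}
```

In this form the feedback coefficients α_k appear only in the denominator and the feedforward coefficients β_k only in the numerator, which matches the roles assigned to the denominator and numerator in the text.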
Comparing the equivalent circuit of the transfer function R(z) of the acoustic system 40 shown in FIG. 6 with the configuration of the second linear prediction error filter 60 (transfer function H_R(z)) shown in FIG. 3, which is used to identify the inverse transfer function 1/R(z), shows that the part of the equivalent circuit corresponding to the minimum phase component (the denominator of Equation 3), namely the right-hand part of FIG. 6, and the configuration of the second linear prediction error filter 60 shown in FIG. 3 are exact inverses of each other. This also shows that the minimum phase component is reliably identified by the second linear prediction error filter 60 and, consequently, that the minimum phase component, including flutter echo, is removed well.
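The inverse relationship between the recursive (feedback) part of FIG. 6 and a transversal filter can be illustrated with a single-echo example. The sketch below assumes a feedback path with one non-zero coefficient α at one delay, using the sign convention of the FIG. 6 description (the delayed output is subtracted); the function names and the numerical values are illustrative only.

```python
from typing import List


def flutter_echo(s: List[float], alpha: float, delay: int) -> List[float]:
    """All-pole model x(n) = s(n) - alpha * x(n - delay) (feedback part only)."""
    x: List[float] = []
    for n, sn in enumerate(s):
        fb = x[n - delay] if n >= delay else 0.0
        x.append(sn - alpha * fb)
    return x


def inverse_fir(x: List[float], alpha: float, delay: int) -> List[float]:
    """All-zero filter 1 + alpha * z^-delay, the mirror image of the recursion."""
    return [xn + alpha * (x[n - delay] if n >= delay else 0.0)
            for n, xn in enumerate(x)]


impulse = [1.0] + [0.0] * 11
echoed = flutter_echo(impulse, alpha=-0.5, delay=3)     # repeats every 3 samples
restored = inverse_fir(echoed, alpha=-0.5, delay=3)     # echo train cancelled
```

Here the recursion produces an endlessly repeating, decaying echo train, and the two-tap transversal filter with the same coefficient collapses it back to the single direct-sound sample.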
To further improve the effect of the dereverberation apparatus 10 of this embodiment, or strictly speaking to obtain that effect sooner, the following measure may be taken.
That is, the step size μ_R of the second linear prediction error filter 60 is controlled as appropriate according to the signal X'(z) processed by the first linear prediction error filter 50, which is the input signal to the second linear prediction error filter 60, and the prediction error E(z), which is the output signal of the second linear prediction error filter 60. Specifically, the step size μ_R is controlled based on Equation 4 below.
[Equation 4: equation image not reproduced in this text]
In Equation 4, μ_0 is the initial value of the step size μ_R and is set arbitrarily. E_0 is the target value of the prediction error E(z) and is likewise set arbitrarily. P_S is the power of the input signal X'(z) and is approximated by Equation 5 below. P_E is the power of the output signal E(z) and is approximated by Equation 6. The coefficient ρ in Equations 5 and 6 is defined by Equation 7.
[Equations 5, 6, and 7: equation images not reproduced in this text]
Controlling the step size μ_R of the second linear prediction error filter 60 in this way increases the speed at which the second linear prediction error filter 60 identifies the inverse transfer function 1/R(z) of the acoustic system 40 (the convergence speed). This effect was verified by simulation.
Specifically, white noise was used as the original sound S(z). Exponentially decaying normal random numbers were used as the reverberation, with a reverberation time of 1.024 s. The initial value μ_0 of the step size μ_R in Equation 4 was set to μ_0 = 0.2, and the target value E_0 of the prediction error E(z) was set to E_0 = 0.1. For comparison, a configuration with the step size fixed at μ_R = 0.008 was also prepared. The other conditions were the same as in the earlier experiment described with reference to FIGS. 4 and 5.
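A synthetic reverberation of the kind described (normal random numbers under an exponentially decaying envelope) can be generated as sketched below. The specification does not say how the decay rate relates to the reverberation time; the sketch assumes the usual definition in which the level falls by 60 dB over the reverberation time, and the function name and seed are arbitrary.

```python
import math
import random
from typing import List


def synthetic_reverb(fs: int = 16000, rt: float = 1.024, seed: int = 0) -> List[float]:
    """Gaussian noise shaped by an exponential envelope.

    Assumption: the amplitude envelope drops by 60 dB (a factor of 1000)
    over the reverberation time `rt`.
    """
    length = round(fs * rt)               # 16384 samples for 16 kHz and 1.024 s
    decay = math.log(1000.0) / length     # per-sample decay rate of the envelope
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(length)]


h = synthetic_reverb()
```

With a 16 kHz sampling frequency, a reverberation time of 1.024 s corresponds to 16384 samples, which is a power of two (2^14).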
This simulation produced the results shown in FIG. 7. FIG. 7(a) shows the result for the comparison configuration with the step size μ_R fixed, and FIG. 7(b) shows the result with the step size μ_R controlled. As these results show, with the step size μ_R fixed it takes about 1 minute (= 96 × 10^4 samples) for the identification by the second linear prediction error filter 60 to stabilize, whereas with the step size μ_R controlled the time until the identification stabilizes is shortened to about 15 seconds (= 24 × 10^4 samples). It was thus confirmed that controlling the step size μ_R of the second linear prediction error filter 60 increases its identification speed. An experiment using an actual dereverberation apparatus 10 gave the same effect as the simulation.
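The sample counts and durations quoted above are consistent with the 16 kHz sampling frequency of the earlier experiment, as the following short check shows (the helper name is illustrative):

```python
FS_HZ = 16_000  # sampling frequency used in the experiment


def samples_to_seconds(n_samples: int, fs_hz: int = FS_HZ) -> float:
    """Convert a sample count to a duration in seconds."""
    return n_samples / fs_hz


fixed_step = samples_to_seconds(96 * 10**4)        # 60.0 s, about 1 minute
controlled_step = samples_to_seconds(24 * 10**4)   # 15.0 s
speedup = fixed_step / controlled_step             # 4.0
```

On these figures, step size control shortens the settling time by a factor of four.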
As described above, the dereverberation apparatus 10 of this embodiment can accurately remove only the reverberation component from the collected sound signal X(z) of the microphone 20 and accurately extract only the original sound S(z). Such a dereverberation apparatus 10 provides a very good dereverberation effect particularly in an environment where the flutter echo described above occurs, and in an environment where the talker and the microphone are about 1 m apart, as in a teleconference system. It is especially useful in devices that are susceptible to reverberation, such as hands-free telephones, intercoms, and boundary microphones.
In addition, unlike the prior art described above, the dereverberation apparatus 10 of this embodiment does not require the acoustic transfer characteristics to be measured in advance. Furthermore, with the dereverberation apparatus 10 of this embodiment, the inverse transfer function 1/R(z) of the acoustic system 40 is identified, and dereverberation is performed, based only on the collected sound signal X(z) of the microphone 20; that is, so-called blind dereverberation is realized. Blind dereverberation is generally considered difficult. In the dereverberation apparatus 10 of this embodiment, however, the component of the original sound S(z) is removed from the collected sound signal X(z) of the microphone 20 as described above, and the inverse transfer function 1/R(z) of the acoustic system 40 is identified based on the resulting processed signal X'(z). The inverse transfer function 1/R(z) is therefore identified accurately without being affected by the component of the original sound S(z), and accurate dereverberation is realized. In particular, a better dereverberation effect is obtained than with the prior art, which lacks accuracy in suppressing reverberation. This contributes greatly to improved sound quality.
Moreover, the dereverberation apparatus 10 of this embodiment obtains the good dereverberation effect described above with three linear prediction error filters 50, 60, and 70, including the dereverberation filter 70. Because these linear prediction error filters 50, 60, and 70 introduce essentially no delay, real-time operation is achieved. Because their computational load is relatively small, the burden on the CPU, DSP, or similar device (not shown) that implements them is reduced. In other words, an inexpensive CPU or DSP of modest capability can be used.
The content described in this embodiment is merely one specific example for realizing the present invention and does not limit the scope of the present invention.
For example, although the first linear prediction error filter 50 is of the transversal type in this embodiment, it is not limited to this type. The first linear prediction error filter 50 may be realized with a configuration other than the transversal type, such as a lattice type. However, the transversal type is simpler in configuration than, for example, the lattice type, so adopting it further reduces the burden on the CPU, DSP, or similar device that implements the first linear prediction error filter 50. In other words, an even less expensive CPU or DSP can be used.
Similarly, the second linear prediction error filter 60 is not limited to the transversal type and may be realized with a configuration other than the transversal type, such as a lattice type. Needless to say, the configuration of the dereverberation filter 70 is then determined in accordance with the configuration of the second linear prediction error filter 60.
Furthermore, an adaptive filter with another configuration (algorithm) may be used in place of the first linear prediction error filter 50 as the first adaptive filter. As described above, however, using the first linear prediction error filter 50 as the first adaptive filter achieves real-time operation and reduces the burden on the CPU, DSP, or similar device that implements it.
Similarly, an adaptive filter with another configuration may be used in place of the second linear prediction error filter 60. In this case as well, the configuration of the dereverberation filter 70 is determined in accordance with the configuration of the adaptive filter used in place of the second linear prediction error filter 60.
DESCRIPTION OF SYMBOLS
 10 Dereverberation apparatus
 20 Microphone
 30 Sound source
 40 Acoustic system
 50 First linear prediction error filter
 60 Second linear prediction error filter
 70 Dereverberation filter

Claims (5)

  1.  A dereverberation apparatus that removes, from a collected sound signal that is the output signal of sound collecting means capturing sound from a sound source via an acoustic system having reverberation characteristics, a reverberation component caused by the acoustic system, the apparatus comprising:
     original sound component removing means for removing, from the collected sound signal, an original sound component that is a component of the sound;
     inverse transfer function computing means for obtaining the inverse of the transfer function of the acoustic system based on an original-sound-removed signal, which is the collected sound signal after the original sound component has been removed by the original sound component removing means; and
     signal processing means for applying, to the collected sound signal, processing based on the inverse transfer function obtained by the inverse transfer function computing means.
  2.  The dereverberation apparatus according to claim 1, wherein:
     the original sound component removing means includes a first adaptive filter to which the collected sound signal is input;
     the inverse transfer function computing means includes a second adaptive filter to which the signal processed by the first adaptive filter is input;
     the signal processing means includes a third adaptive filter to which the collected sound signal is input;
     the first adaptive filter has a responsiveness high enough to follow fluctuations of the original sound component and operates so as to minimize the level of the signal processed by itself or of the signal processed by the second adaptive filter;
     the second adaptive filter has a responsiveness too low to follow fluctuations of the original sound component and operates so as to minimize the level of the signal processed by itself; and
     the third adaptive filter realizes the processing based on the inverse transfer function by having the filter coefficients of the second adaptive filter copied to it.
  3.  The dereverberation apparatus according to claim 2, wherein the number of taps of the first adaptive filter is smaller than the number of taps of the second adaptive filter.
  4.  The dereverberation apparatus according to claim 2 or 3, wherein the step size of the first adaptive filter is larger than the step size of the second adaptive filter.
  5.  The dereverberation apparatus according to any one of claims 2 to 4, wherein some or all of the first adaptive filter, the second adaptive filter, and the third adaptive filter are linear prediction error filters.
PCT/JP2012/083266 2012-12-21 2012-12-21 Reverberation removal device WO2014097470A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/083266 WO2014097470A1 (en) 2012-12-21 2012-12-21 Reverberation removal device

Publications (1)

Publication Number Publication Date
WO2014097470A1 true WO2014097470A1 (en) 2014-06-26

Family

ID=50977848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/083266 WO2014097470A1 (en) 2012-12-21 2012-12-21 Reverberation removal device

Country Status (1)

Country Link
WO (1) WO2014097470A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09261133A (en) * 1996-03-25 1997-10-03 Nippon Telegr & Teleph Corp <Ntt> Reverberation suppression method and its equipment
JPH09321860A (en) * 1996-03-25 1997-12-12 Nippon Telegr & Teleph Corp <Ntt> Reverberation elimination method and equipment therefor
JP2008058900A (en) * 2006-09-04 2008-03-13 Internatl Business Mach Corp <Ibm> Low-cost filter coefficient determination method in reverberation removal
JP2012048134A (en) * 2010-08-30 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> Reverberation removal method, reverberation removal device and program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429995A (en) * 2017-02-14 2018-08-21 株式会社东芝 Acoustic processing device, sound processing method and storage medium
CN108429995B (en) * 2017-02-14 2020-03-13 株式会社东芝 Sound processing device, sound processing method, and storage medium
CN113347536A (en) * 2021-05-20 2021-09-03 天津大学 Acoustic feedback suppression algorithm based on linear prediction and subband adaptive filtering
CN113347536B (en) * 2021-05-20 2023-05-26 天津大学 Acoustic feedback suppression algorithm based on linear prediction and sub-band adaptive filtering

Similar Documents

Publication Publication Date Title
CN109686381B (en) Signal processor for signal enhancement and related method
US10403299B2 (en) Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition
TWI661684B (en) Method and apparatus for adaptive beam forming
CN108172231B (en) Dereverberation method and system based on Kalman filtering
JP6019969B2 (en) Sound processor
WO2009148049A1 (en) Acoustic echo canceller and acoustic echo cancel method
JP5645419B2 (en) Reverberation removal device
JP7215541B2 (en) SIGNAL PROCESSING DEVICE, REMOTE CONFERENCE DEVICE, AND SIGNAL PROCESSING METHOD
WO2014103066A1 (en) Sound-source separation method, device, and program
WO2021171829A1 (en) Signal processing device, signal processing method, and program
WO2014097470A1 (en) Reverberation removal device
JP4690243B2 (en) Digital filter, periodic noise reduction device, and noise reduction device
JP5140785B1 (en) Directivity control method and apparatus
TWI579833B (en) Signal processing device and signal processing method
JP2006126841A (en) Periodic signal enhancement system
JP3609611B2 (en) Echo cancellation method and echo canceller
Prasad et al. Two microphone technique to improve the speech intelligibility under noisy environment
JP5975398B2 (en) Speech enhancement device
CN113347536B (en) Acoustic feedback suppression algorithm based on linear prediction and sub-band adaptive filtering
van Waterschoot et al. Adaptive feedback cancellation for audio signals using a warped all-pole near-end signal model
WO2024006778A1 (en) Audio de-reverberation
JP2004357053A (en) Echo canceler device and echo canceler method
Pawar et al. Implementation of binary masking technique for hearing aid application
Joorabchi et al. Comparative analysis of speech dereverberation in noisy acoustical environments
JP2005250266A (en) Echo suppressing method, and device, program and recording medium implementing the method,

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP