JP5516169B2 - Sound processing apparatus and program

Info

Publication number
JP5516169B2
Authority
JP
Japan
Prior art keywords
noise
component
acoustic signal
matrix
coefficient
Prior art date
Legal status
Expired - Fee Related
Application number
JP2010159543A
Other languages
Japanese (ja)
Other versions
JP2012022120A (en)
Inventor
多伸 近藤
Current Assignee
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date
Filing date
Publication date
Application filed by Yamaha Corp
Priority to JP2010159543A
Publication of JP2012022120A
Application granted
Publication of JP5516169B2

Landscapes

  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Circuit For Audible Band Transducer (AREA)

Description

The present invention relates to a technique for suppressing a noise component contained in an acoustic signal.

Techniques for suppressing the noise component in an acoustic signal containing a mixture of a target sound component and a noise component have been proposed. For example, Patent Document 1 discloses a technique for generating a noise-suppressed signal in which wind noise is suppressed: for each of a plurality of acoustic signals, the component with the minimum intensity is selected from among the low-frequency component of each acoustic signal and the average of those low-frequency components, and the selected component is then combined with the high-frequency component of each acoustic signal.

Japanese Patent No. 4356670

However, in the technique of Patent Document 1, the component used to generate the noise-suppressed signal is selected on the basis of intensity alone, so the target sound component may be removed when, for example, its intensity is small compared with the wind noise. Moreover, when the average component of the plurality of acoustic signals is selected as the low-frequency component of the noise-suppressed signal, the waveform of the target sound component changes significantly in the course of generating the noise-suppressed signal, so the target sound component is not faithfully reproduced. In view of the above circumstances, an object of the present invention is to suppress the noise component of an acoustic signal with high accuracy.

The means employed by the present invention to solve the above problems will now be described. To facilitate understanding of the present invention, the following description indicates in parentheses the correspondence between each element of the present invention and the elements of the embodiments described later; this is not intended to limit the scope of the present invention to the illustrated embodiments.

The sound processing apparatus of the present invention comprises: matrix decomposition means (for example, matrix decomposition unit 44) that, for each of a first acoustic signal and a second acoustic signal collected in parallel, generates, by non-negative matrix factorization of an observation matrix (for example, observation matrix Vi) whose elements are the time series of component values for each frequency of that acoustic signal, a basis matrix (for example, basis matrix Wi) containing a plurality of bases (for example, bases Ci[1] to Ci[K]) each indicating the per-frequency component values of a different component of the acoustic signal, and a coefficient matrix (for example, coefficient matrix Hi) containing a plurality of weight sequences (for example, weight sequences Ei[1] to Ei[K]) each indicating the time series of weight values of the corresponding basis; noise identification means (for example, noise identification unit 46) that identifies, from among the plurality of bases of the basis matrix of the first acoustic signal, a basis having a high correlation with a basis of the basis matrix of the second acoustic signal as a noise basis (for example, noise basis Ci_noise) corresponding to the noise component of the first acoustic signal; target sound extraction means (for example, target sound extraction unit 52) that generates an estimated target sound component (for example, spectrum YTi of estimated target sound signal qTi), in which the noise component is suppressed from the first acoustic signal, using the bases of the basis matrix other than the noise basis and the weight sequences of the coefficient matrix corresponding to bases other than the noise basis; noise extraction means (for example, noise extraction unit 54) that generates an estimated noise component (for example, spectrum YNi of estimated noise signal qNi), in which the target sound component is suppressed from the first acoustic signal, using the noise basis and the weight sequence of the coefficient matrix corresponding to that noise basis (for example, weight sequence Ei_noise); harmonic component extraction means (for example, harmonic component extraction unit 64) that extracts, from the estimated noise component, a residual component (for example, spectrum Ri of the residual component) corresponding to the harmonic structure of the target sound component; and target sound synthesis means (for example, target sound synthesis unit 66) that synthesizes the estimated target sound component and the residual component.

In the above configuration, the observation matrix of each of the first acoustic signal and the second acoustic signal is decomposed into a basis matrix and a coefficient matrix, and the estimated target sound component is extracted after excluding, from among the plurality of bases of the basis matrix of the first acoustic signal, the noise basis that is highly correlated with a basis of the basis matrix of the second acoustic signal. Therefore, the noise component can be suppressed with high accuracy even when the intensity of the target sound component of the first acoustic signal is low compared with the noise component. Furthermore, since the residual component corresponding to the harmonic structure of the target sound component is extracted from the estimated noise component and combined with the estimated target sound component, loss of the target sound component can be effectively prevented even when part of the target sound component (the residual component) remains in the estimated noise component. Moreover, since the residual component is extracted from the estimated noise component using the harmonic structure as a clue, the residual component can be extracted with high accuracy even when its intensity is low relative to the noise component. The scope of application of the present invention is not limited to configurations that process two channels of acoustic signals. That is, even in a configuration that processes three or more channels of acoustic signals, a configuration that satisfies the requirements of the present invention when attention is paid to two particular channels is naturally included within the scope of the present invention.

In a preferred aspect of the present invention, the harmonic component extraction means includes: frequency estimation means (for example, frequency estimation unit 72) that estimates the fundamental frequency of the target sound component; harmonic coefficient sequence generation means (for example, harmonic coefficient sequence generation unit 74) that generates a harmonic coefficient sequence in which each coefficient value is set so that, within the estimated noise component, harmonic components at frequencies that are integer multiples of the fundamental frequency are emphasized; and harmonic extraction means (for example, harmonic extraction unit 78) that extracts the residual component by applying the harmonic coefficient sequence to the estimated noise component. In this aspect, the residual component is extracted from the estimated noise component by applying a harmonic coefficient sequence generated according to the fundamental frequency of the target sound component estimated by the frequency estimation means. An appropriate residual component can therefore be extracted according to the frequency characteristics (harmonic structure) of the acoustic signal.
In a further preferred aspect, the frequency estimation means estimates the fundamental frequency of the estimated target sound component generated by the target sound extraction means. In this aspect, since the fundamental frequency is estimated from the estimated target sound component in which the noise component has been suppressed, the fundamental frequency of the target sound component can be estimated with higher accuracy than when it is estimated while the noise component is still mixed in. However, a method of estimating the fundamental frequency of the target sound component from the first acoustic signal or the second acoustic signal, in which the target sound component and the noise component are mixed, may also be adopted. The method of estimating the fundamental frequency (for example, processing in the frequency domain or in the time domain) is arbitrary.
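By way of illustration only (this is not part of the patented disclosure), the following is a minimal NumPy sketch of one way such a harmonic coefficient sequence could be built and applied to a single frame of the estimated noise spectrum; the Gaussian peak shape, bandwidth, and number of harmonics are assumptions chosen for the example.

```python
import numpy as np

def harmonic_mask(freqs, f0, n_harmonics=10, bandwidth_hz=20.0):
    """Coefficient sequence that emphasizes bins near integer multiples of f0."""
    mask = np.zeros_like(freqs)
    for h in range(1, n_harmonics + 1):
        mask = np.maximum(mask, np.exp(-0.5 * ((freqs - h * f0) / bandwidth_hz) ** 2))
    return mask  # values in [0, 1], peaking at the harmonics

def extract_residual(noise_frame_power, freqs, f0):
    """Apply the harmonic coefficient sequence to one frame of the estimated noise."""
    return harmonic_mask(freqs, f0) * noise_frame_power  # residual component Ri
```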

A sound processing apparatus according to a preferred aspect of the present invention comprises: first frequency analysis means (for example, frequency analysis unit 42) that generates the time series of spectra of the first acoustic signal and the second acoustic signal as the observation matrices under a first analysis parameter (for example, window width ωA and hop size δA); second frequency analysis means (for example, frequency analysis unit 62) that sequentially generates the spectra of the estimated target sound component and the estimated noise component under a second analysis parameter (for example, window width ωB and hop size δB) that differs from the first analysis parameter; and coefficient sequence correction means (for example, coefficient sequence correction unit 76) that generates a corrected coefficient sequence (for example, corrected coefficient sequence GBi) in which a coefficient value is set for each analysis point (for example, analysis point p2) arranged on the time axis and the frequency axis at intervals corresponding to the second analysis parameter. The noise extraction means generates, from the noise basis and the weight sequence corresponding to that noise basis, a noise coefficient sequence (for example, noise coefficient sequence GNi) in which a coefficient value is set for each analysis point (for example, analysis point p1) arranged on the time axis and the frequency axis at intervals corresponding to the first analysis parameter, and generates the estimated noise component by applying the noise coefficient sequence to the observation matrix. The coefficient sequence correction means generates the corrected coefficient sequence from the noise coefficient sequence, and the harmonic coefficient sequence generation means generates the harmonic coefficient sequence by extracting, from the corrected coefficient sequence, the components at frequencies that are integer multiples of the fundamental frequency. In this aspect, since the noise coefficient sequence used to extract the estimated noise component is reused to generate the harmonic coefficient sequence, the processing load required to extract the residual component is reduced compared with a configuration that does not use the noise coefficient sequence for that purpose. Moreover, since the noise coefficient sequence corresponding to the first analysis parameter is corrected into the corrected coefficient sequence corresponding to the second analysis parameter before being applied to the generation of the harmonic coefficient sequence (and further to the extraction of the residual component), an appropriate harmonic coefficient sequence for extracting the residual component can be generated even when the first analysis parameter and the second analysis parameter differ. Therefore, for example, the first analysis parameter can be set to values optimal for non-negative matrix factorization, while the second analysis parameter can be set to values optimal for estimating the fundamental frequency and for synthesizing the residual component with the estimated target sound component.
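As an illustration, a minimal sketch of the coefficient-sequence correction as nearest-neighbour resampling of GNi from the stage-1 analysis grid (tA, fA) onto the stage-2 grid (tB, fB); the interpolation rule is an assumption, since the text only requires that the values be re-expressed on the second grid.

```python
import numpy as np

def correct_coefficients(GN, tA, fA, tB, fB):
    """Resample the noise coefficient sequence GN (len(fA) x len(tA)) onto the
    time/frequency grid (tB, fB) used by the second frequency analysis stage."""
    fi = np.abs(fB[:, None] - fA[None, :]).argmin(axis=1)   # nearest frequency bin
    ti = np.abs(tB[:, None] - tA[None, :]).argmin(axis=1)   # nearest time frame
    return GN[np.ix_(fi, ti)]                               # corrected sequence GBi
```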

A sound processing apparatus according to a preferred aspect of the present invention comprises phase difference calculation means (for example, phase difference calculation unit 582) that calculates a phase difference (for example, phase difference ΔP[nA]) between the first acoustic signal and the second acoustic signal, and the target sound extraction means generates, from the bases of the basis matrix other than the noise basis and the weight sequences of the coefficient matrix corresponding to bases other than the noise basis, a target sound coefficient sequence in which each coefficient value is variably set according to the phase difference calculated by the phase difference calculation means, and applies it to the observation matrix. For example, each coefficient value of the target sound coefficient sequence is set according to the phase difference so that the larger the phase difference between the first acoustic signal and the second acoustic signal (that is, the more dominant the noise component), the greater the noise-suppression effect of the target sound coefficient sequence. In this aspect, since the phase difference between the first acoustic signal and the second acoustic signal is reflected in the target sound coefficient sequence, an estimated target sound component in which the noise component is sufficiently suppressed can be generated compared with a configuration in which the phase difference is not reflected in the target sound coefficient sequence. A configuration in which the phase difference calculated by the phase difference calculation means is reflected in the noise coefficient sequence may also be adopted; that is, the noise extraction means generates, from the noise basis and the weight sequence corresponding to that noise basis, a noise coefficient sequence in which each coefficient value is variably set according to the phase difference between the first acoustic signal and the second acoustic signal, and applies it to the observation matrix.
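A minimal sketch (assumptions only, not the patented implementation) of how a frame-wise phase difference between the two channel STFTs S1 and S2 could be mapped to a 0-1 weight on the target sound coefficients; the specific weighting function is chosen merely to illustrate "larger phase difference, stronger suppression".

```python
import numpy as np

def phase_weight(S1, S2, strength=1.0):
    """Frame-wise mean phase difference |dP[nA]| between the two channels, mapped
    to a 0-1 weight that shrinks the target sound coefficients where it is large."""
    dphi = np.abs(np.angle(S1 * np.conj(S2)))   # per-bin phase difference in [0, pi]
    dP = dphi.mean(axis=0)                      # one value per frame (cf. ΔP[nA])
    return 1.0 - strength * dP / np.pi          # usage sketch: GT_adjusted = GT * w
```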

A sound processing apparatus according to a preferred aspect of the present invention comprises intensity difference calculation means (for example, intensity difference calculation unit 584) that calculates an intensity difference (for example, intensity difference ΔA[nA]) between the first acoustic signal and the second acoustic signal, and the target sound extraction means generates, from the bases of the basis matrix other than the noise basis and the weight sequences of the coefficient matrix corresponding to bases other than the noise basis, a target sound coefficient sequence in which each coefficient value is variably set according to the intensity difference (for example, an amplitude difference or a power difference) calculated by the intensity difference calculation means, and applies it to the observation matrix. For example, each coefficient value of the target sound coefficient sequence is set according to the intensity difference so that the larger the intensity difference between the first acoustic signal and the second acoustic signal (that is, the more dominant the noise component), the greater the noise-suppression effect of the target sound coefficient sequence. In this aspect, since the intensity difference between the first acoustic signal and the second acoustic signal is reflected in the target sound coefficient sequence, an estimated target sound component in which the noise component is sufficiently suppressed can be generated compared with a configuration in which the intensity difference is not reflected in the target sound coefficient sequence. A configuration in which the intensity difference calculated by the intensity difference calculation means is reflected in the noise coefficient sequence may also be adopted; that is, the noise extraction means generates, from the noise basis and the weight sequence corresponding to that noise basis, a noise coefficient sequence in which each coefficient value is variably set according to the intensity difference between the first acoustic signal and the second acoustic signal, and applies it to the observation matrix.
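The analogous sketch for the intensity difference, again with an assumed weighting function chosen only to illustrate the stated tendency.

```python
import numpy as np

def intensity_weight(S1, S2, strength=1.0, eps=1e-12):
    """Frame-wise level difference (in dB, cf. ΔA[nA]) between the two channels,
    mapped to a 0-1 weight; a larger difference gives stronger suppression."""
    a1 = (np.abs(S1) ** 2).mean(axis=0)
    a2 = (np.abs(S2) ** 2).mean(axis=0)
    dA = np.abs(10.0 * np.log10((a1 + eps) / (a2 + eps)))
    return 1.0 / (1.0 + strength * dA)
```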

The sound processing apparatus according to each of the above aspects is realized not only by hardware (electronic circuits) such as a DSP (Digital Signal Processor) dedicated to the processing of acoustic signals, but also by the cooperation of a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) with a program. The program according to the present invention causes a computer to execute: a matrix decomposition process of generating, for each of a first acoustic signal and a second acoustic signal collected in parallel, by non-negative matrix factorization of an observation matrix whose elements are the time series of component values for each frequency of that acoustic signal, a basis matrix containing a plurality of bases each indicating the per-frequency component values of a different component of the acoustic signal and a coefficient matrix containing a plurality of weight sequences each indicating the time series of weight values of the corresponding basis; a noise identification process of identifying, from among the plurality of bases of the basis matrix of the first acoustic signal, a basis having a high correlation with a basis of the basis matrix of the second acoustic signal as a noise basis corresponding to the noise component of the first acoustic signal; a target sound extraction process of generating an estimated target sound component, in which the noise component is suppressed from the first acoustic signal, using the bases of the basis matrix other than the noise basis and the weight sequences of the coefficient matrix corresponding to bases other than the noise basis; a noise extraction process of generating an estimated noise component, in which the target sound component is suppressed from the first acoustic signal, using the noise basis and the weight sequence of the coefficient matrix corresponding to that noise basis; a harmonic component extraction process of extracting, from the estimated noise component, a residual component corresponding to the harmonic structure of the target sound component; and a target sound synthesis process of synthesizing the estimated target sound component and the residual component. This program achieves the same operation and effects as the sound processing apparatus according to the present invention. The program of the present invention is provided to users in a form stored on a computer-readable recording medium and installed on a computer, or is provided from a server device in the form of distribution over a communication network and installed on a computer.

FIG. 1 is a block diagram of a sound processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram of the first processing unit.
FIG. 3 is an explanatory diagram of an observation matrix.
FIG. 4 is an explanatory diagram of a basis matrix and a coefficient matrix.
FIG. 5 is a block diagram of a target sound extraction unit and a noise extraction unit.
FIG. 6 is a block diagram of a second processing unit.
FIG. 7 is an explanatory diagram of the analysis points assumed in the processing of the second processing unit.
FIG. 8 is a block diagram of a harmonic component extraction unit.
FIG. 9 is an explanatory diagram of the operation of the harmonic component extraction unit.
FIG. 10 is a block diagram of the first processing unit in a second embodiment.
FIG. 11 is a block diagram of the second processing unit in a third embodiment.
FIG. 12 is a block diagram of a harmonic component extraction unit in a fourth embodiment.

<A: First Embodiment>
FIG. 1 is a block diagram of a sound processing apparatus 100 according to the first embodiment of the present invention. As shown in FIG. 1, a signal supply device 12 and a sound emitting device 14 are connected to the sound processing apparatus 100. The signal supply device 12 supplies the sound processing apparatus 100 with a stereo-format acoustic signal s1 and acoustic signal s2 collected in parallel (simultaneously) at different positions. Each acoustic signal si (i = 1, 2) is a time-domain signal representing the sound pressure waveform of a mixed sound of a target sound component and a noise component. In FIG. 1, a plurality of sound collection devices 122 (for example, omnidirectional microphones) arranged apart from each other are illustrated as the signal supply device 12. However, a playback device that acquires each acoustic signal si from a portable or built-in recording medium and supplies it to the sound processing apparatus 100, or a communication device that receives each acoustic signal si over a communication network and supplies it to the sound processing apparatus 100, may also be employed as the signal supply device 12.

The sound processing apparatus 100 generates a stereo-format acoustic signal q1 and acoustic signal q2 from the acoustic signal s1 and the acoustic signal s2. Each acoustic signal qi is a time-domain signal in which the noise component is suppressed (the target sound component is emphasized) relative to the acoustic signal si. The sound emitting device 14 (for example, stereo speakers or stereo headphones) radiates sound waves corresponding to the acoustic signals q1 and q2 generated by the sound processing apparatus 100. An A/D converter that converts the acoustic signal si from analog to digital and a D/A converter that converts the acoustic signal qi from digital to analog are omitted from the figure for convenience.

As shown in FIG. 1, the sound processing apparatus 100 is realized as a computer system comprising an arithmetic processing device 22 and a storage device 24. The storage device 24 stores the program executed by the arithmetic processing device 22 and various data used by the arithmetic processing device 22. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of plural types of recording media, may be arbitrarily employed as the storage device 24. A configuration in which the acoustic signal s1 and the acoustic signal s2 are stored in the storage device 24 (so that the signal supply device 12 may be omitted) is also suitable.

By executing the program stored in the storage device 24, the arithmetic processing device 22 realizes a plurality of functions (a first processing unit 31 and a second processing unit 32) for generating the acoustic signal qi from the acoustic signal si. A configuration in which the functions of the arithmetic processing device 22 are distributed over a plurality of integrated circuits, or a configuration in which a dedicated electronic circuit (DSP) realizes the functions, may also be employed.

The first processing unit 31 in FIG. 1 generates, from the acoustic signal s1 and the acoustic signal s2, stereo-format estimated target sound signals qT1 and qT2 (T: target) in which the target sound component is emphasized (the noise component is suppressed), and stereo-format estimated noise signals qN1 and qN2 (N: noise) in which the noise component is emphasized (the target sound component is suppressed). That is, the acoustic signal si is separated into a target sound component (estimated target sound signal qTi) and a noise component (estimated noise signal qNi). However, since complete separation of the target sound component and the noise component is difficult, part of the target sound component that should properly be assigned to the estimated target sound signal qTi (hereinafter referred to as the "residual component") may be mixed into the estimated noise signal qNi. The second processing unit 32 therefore generates the acoustic signal qi (q1, q2) by extracting the residual component from the estimated noise signal qNi and combining it with the estimated target sound signal qTi.

FIG. 2 is a block diagram of the first processing unit 31. As shown in FIG. 2, the first processing unit 31 comprises a frequency analysis unit 42, a matrix decomposition unit 44, a noise identification unit 46, a target sound extraction unit 52, a noise extraction unit 54, and a waveform synthesis unit 56.

As shown in FIG. 3, the frequency analysis unit 42 sequentially generates the spectrum Si (S1, S2) of each acoustic signal si for every unit period (frame) on the time axis. The spectrum Si of each unit period is a power spectrum in which a plurality of component values (powers) xi corresponding to different frequencies (f1, f2, ..., fMA, ...) on the frequency axis are arranged. That is, as shown in FIG. 3, a component value xi is calculated for each analysis point (grid point) p1 arranged in a matrix on the time-frequency plane, corresponding to the time points t (t1, t2, ...) arranged at intervals ΔtA on the time axis and the frequencies f (f1, f2, ...) arranged at intervals ΔfA on the frequency axis.

Each spectrum Si is generated by a short-time Fourier transform that uses the window width (frame length) ωA of the unit period and the hop size (shift amount on the time axis) δA as analysis parameters. The interval ΔtA on the time axis and the interval ΔfA on the frequency axis of the analysis points p1 are variably set according to the analysis parameters (window width ωA, hop size δA) of the frequency analysis performed by the frequency analysis unit 42.

As shown in FIG. 3, the spectrum Si of each acoustic signal si is divided into a spectrum Xi within a band BLa and a spectrum XHi within a band BHa. The band BLa is set so as to include the frequencies of the noise component. In this embodiment, wind noise is assumed as the noise component. Wind noise is a noise component generated when the air itself flows and collides directly with the diaphragm of the sound collection device 122. The frequency of the diaphragm vibration caused by such air collisions is lower than the frequency of sound waves that propagate to the diaphragm as air vibrations (sound pressure changes). Specifically, low-frequency components of, for example, 1 kHz or less are dominant in wind noise. In view of this tendency, the band BLa is set to a range of 1 kHz or less containing MA frequencies f1 to fMA (MA is a natural number). The band BHa is the band on the higher-frequency side of the band BLa (for example, 1 kHz and above).

As shown in FIG. 3, the time series (spectrogram) of the spectra Si generated by the frequency analysis unit 42 is divided on the time axis into analysis periods T0 each containing NA time points t1 to tNA. The analysis period T0 is set to a long duration of, for example, several tens of seconds. As shown in FIG. 3, an observation matrix Vi, in which the component values xi[1,1] to xi[MA,NA] of the analysis points p1 corresponding to the MA frequencies f1 to fMA within the band BLa and the NA time points t1 to tNA within the analysis period T0 are arranged in MA rows × NA columns, is defined for each analysis period T0 for each of the acoustic signal s1 and the acoustic signal s2. The component value xi[mA,nA] means the component value xi of the analysis point p1 corresponding to the mA-th (mA = 1 to MA) frequency fmA of the MA frequencies f1 to fMA within the band BLa and the nA-th (nA = 1 to NA) time point tnA of the NA time points t1 to tNA within the analysis period T0.

As understood from the above description, the nA-th column of the observation matrix Vi corresponds to the sequence of MA component values xi[1,nA] to xi[MA,nA] of the spectrum Xi at the nA-th time point tnA within the analysis period T0, and the mA-th row of the observation matrix Vi corresponds to the time series of component values xi[mA,1] to xi[mA,NA] of the frequency fmA over the NA time points t1 to tNA within the analysis period T0. Since the component values xi[mA,nA] of the spectrum Xi represent power (non-negative values), the observation matrix Vi is a non-negative matrix (a matrix containing no negative numbers). A configuration in which the spectrum Si (Xi) is an amplitude spectrum may also be employed.
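As a concrete illustration only (not the patented implementation), the following NumPy/SciPy sketch builds the observation matrix Vi for one channel; the power spectrogram and the 1 kHz band split follow the description above, while the window and hop values are placeholders taken from the parameters quoted later in the embodiment.

```python
import numpy as np
from scipy.signal import stft

def observation_matrix(si, fs, win_a=0.512, hop_a=0.064, band_edge_hz=1000.0):
    """Power spectrogram of one channel si, split at band_edge_hz into (Vi, XHi)."""
    nperseg = int(win_a * fs)
    noverlap = nperseg - int(hop_a * fs)
    f, t, S = stft(si, fs=fs, window='hann', nperseg=nperseg, noverlap=noverlap)
    P = np.abs(S) ** 2                 # power spectrogram (non-negative)
    low = f <= band_edge_hz            # band BLa, where wind noise is dominant
    Vi = P[low, :]                     # observation matrix: MA rows x NA columns
    XHi = P[~low, :]                   # high band BHa, passed through unprocessed
    return f, t, Vi, XHi
```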

The matrix decomposition unit 44 in FIG. 2 generates a basis matrix Wi (W1, W2) and a coefficient matrix Hi (H1, H2) by non-negative matrix factorization (NMF) of each observation matrix Vi. As shown in FIG. 4, the basis matrix Wi is a non-negative matrix of MA rows × K columns in which component values wi[1,1] to wi[MA,K] are arranged, and the coefficient matrix Hi is a non-negative matrix of K rows × NA columns in which weight values hi[1,1] to hi[K,NA] are arranged (K is a natural number). The basis matrix Wi and the coefficient matrix Hi are generated so that their product approximates the observation matrix Vi (Vi ≈ Wi·Hi). The analysis parameters (window width ωA, hop size δA) applied by the frequency analysis unit 42 are set to values with which the non-negative matrix factorization of the observation matrix Vi can be performed appropriately.

As shown in FIG. 4, the basis matrix Wi consists of K bases (codebooks) Ci[1] to Ci[K]. The basis Ci[k] in the k-th column (k = 1 to K) corresponds to a power spectrum in which the component values wi[1,k] to wi[MA,k] at the frequencies f1 to fMA are arranged for one of the K types of acoustic components constituting the acoustic signal si within the analysis period T0. The coefficient matrix Hi, as shown in FIG. 4, consists of K weight sequences (excitations) Ei[1] to Ei[K]. The weight sequence Ei[k] in the k-th row corresponds to the time series of weight values hi[k,1] to hi[k,NA] per unit period for the acoustic component indicated by the basis Ci[k] of the basis matrix Wi (that is, the temporal variation of the component values wi[mA,k] of the basis Ci[k]). As understood from these definitions, the spectrum Xi of the acoustic signal si at time point tnA is approximated by the weighted sum of the K bases Ci[1] to Ci[K] using the K weight values hi[1,nA] to hi[K,nA] of the coefficient matrix Hi corresponding to that time point tnA (Xi ≈ hi[1,nA]×Ci[1] + hi[2,nA]×Ci[2] + ... + hi[K,nA]×Ci[K]).

A known method may be arbitrarily employed for the non-negative matrix factorization of the observation matrix Vi. For example, a method of iteratively updating the basis matrix Wi and the coefficient matrix Hi so as to minimize the difference (for example, the distance) between the product of the basis matrix Wi and the coefficient matrix Hi and the observation matrix Vi is suitably employed. The initial values of the basis matrix Wi (the initial values of the component values wi[mA,k]) applied to the iterative computation are set, for example, to random numbers. A configuration in which the initial values of the MA component values wi[1,k] to wi[MA,k] of each basis Ci[k] are set so as to simulate the spectrum of wind noise (a frequency characteristic that attenuates toward higher frequencies) is also suitable.
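A minimal NumPy sketch of one concrete instance of the "known method" left open above: the standard multiplicative-update (Lee-Seung) NMF with Euclidean cost and random initialization. This is offered only for illustration, not as the patented procedure.

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (MA x NA) into W (MA x K) and H (K x NA)."""
    rng = np.random.default_rng(seed)
    MA, NA = V.shape
    W = rng.random((MA, K)) + eps               # basis matrix Wi (random init)
    H = rng.random((K, NA)) + eps               # coefficient matrix Hi
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # multiplicative updates that
        W *= (V @ H.T) / (W @ H @ H.T + eps)    # reduce ||V - W H||_F^2
    return W, H
```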

The noise identification unit 46 in FIG. 2 identifies, among the K bases Ci[1] to Ci[K] of each basis matrix Wi, one basis Ci[k] corresponding to the noise component (wind noise) (hereinafter denoted "noise basis Ci_noise"). Since wind noise is generated by the turbulent flow of air colliding with the sound collection devices 122, the instantaneous frequency characteristics of the wind noise contained in the acoustic signals s1 and s2 collected at different positions are statistically independent of each other. However, compared with speech and the like, the long-term frequency characteristics of wind noise tend to remain similar regardless of the position of sound collection. That is, the frequency characteristics of wind noise over a long period such as the analysis period T0 tend to be similar between the acoustic signal s1 and the acoustic signal s2.

In view of this tendency, the noise identification unit 46 identifies, as the noise bases Ci_noise, the bases Ci[k] (C1[k1], C2[k2]) that are highly correlated with each other between the basis matrix W1 of the acoustic signal s1 (the K bases C1[1] to C1[K]) and the basis matrix W2 of the acoustic signal s2 (the K bases C2[1] to C2[K]), one from each basis matrix. For example, for every combination of one basis C1[k] of the basis matrix W1 and one basis C2[k] of the basis matrix W2, an index indicating the degree of correlation between the basis C1[k] and the basis C2[k] (a correlation index) is computed, and the basis C1[k1] and the basis C2[k2] of the combination for which the degree of correlation indicated by the correlation index is maximal (whether the values of the variables k1 and k2 coincide is immaterial) are extracted as the noise bases Ci_noise (C1_noise, C2_noise). As the correlation index between the basis C1[k] and the basis C2[k], for example, a distance (Euclidean distance) or an inner product is suitably employed.
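A sketch of this cross-channel search, taking the correlation index to be the normalized inner product (cosine similarity) between basis spectra, which is one of the options (inner product or distance) named above.

```python
import numpy as np

def find_noise_bases(W1, W2, eps=1e-12):
    """Return (k1, k2): the pair of columns of W1 and W2 with maximum correlation."""
    U1 = W1 / (np.linalg.norm(W1, axis=0, keepdims=True) + eps)
    U2 = W2 / (np.linalg.norm(W2, axis=0, keepdims=True) + eps)
    corr = U1.T @ U2                 # K x K matrix of cosine similarities
    k1, k2 = np.unravel_index(np.argmax(corr), corr.shape)
    return k1, k2                    # indices of C1_noise and C2_noise
```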

The target sound extraction unit 52 in FIG. 2 sequentially generates the spectrum YTi (YT1, YT2) of the estimated target sound signal qTi obtained by extracting the target sound component from the acoustic signal si. The noise extraction unit 54 generates the spectrum YNi (YN1, YN2) of the estimated noise signal qNi obtained by extracting the noise component from the acoustic signal si. FIG. 5 is a block diagram of the target sound extraction unit 52 and the noise extraction unit 54.

As shown in FIG. 5, the target sound extraction unit 52 comprises a coefficient sequence generation unit 522 and an extraction processing unit 524. The coefficient sequence generation unit 522 generates a target sound coefficient sequence GTi (GT1, GT2) for each analysis period T0. The target sound coefficient sequence GTi is a matrix of MA rows × NA columns in which coefficient values gTi[1,1] to gTi[MA,NA] are arranged. The coefficient value gTi[mA,nA] located in the mA-th row and nA-th column of the target sound coefficient sequence GTi corresponds to a gain (spectral gain) applied to the component value xi[mA,nA] at the frequency fmA of the spectrum Xi at time point tnA, and is variably set within the range of 0 to 1 according to the characteristics of the acoustic signal si (the intensity of the wind noise). That is, the more dominant the wind noise is in the acoustic component at frequency fmA of the acoustic signal si at time point tnA, the smaller the value to which the coefficient value gTi[mA,nA] is set.

As shown in FIG. 4, the coefficient sequence generation unit 522 of the first embodiment generates the target sound coefficient sequence GTi (GT1, GT2) from a matrix WTi of MA rows × (K-1) columns obtained by excluding the noise basis Ci_noise from the basis matrix Wi of the acoustic signal si, and a matrix HTi of (K-1) rows × NA columns obtained by excluding the weight sequence Ei_noise corresponding to the noise basis Ci_noise from the coefficient matrix Hi.

Specifically, the coefficient sequence generation unit 522 first computes a matrix VTi by multiplying the matrix WTi, from which the noise basis Ci_noise has been excluded, by the matrix HTi, from which the weight sequence Ei_noise has been excluded. As shown in FIG. 4, the matrix VTi is a matrix in which element values vTi[1,1] to vTi[MA,NA] are arranged in MA rows × NA columns. As understood from the above description, the MA element values vTi[1,nA] to vTi[MA,nA] located in the nA-th column of the matrix VTi correspond to an estimate of the power spectrum obtained by suppressing the wind noise from the spectrum Xi at the time point tnA.

Second, the coefficient sequence generation unit 522 calculates the coefficient values gTi[mA,nA] of the target sound coefficient sequence GTi by the following equation (A). The symbol v[mA,nA] in equation (A) denotes the element in the mA-th row and nA-th column of the MA-row × NA-column matrix obtained by multiplying the basis matrix Wi and the coefficient matrix Hi (that is, an estimate of the component value xi[mA,nA] of the spectrum Xi). The element value vTi[mA,nA] is divided by the element value v[mA,nA] in order to normalize the coefficient value gTi[mA,nA] to a value between 0 and 1. Since the target sound coefficient sequence GTi is generated from the matrix VTi from which the wind-noise basis Ci_noise and weight sequence Ei_noise have been excluded, the coefficient value gTi[mA,nA] is set to a smaller value the more dominant the wind noise is.
gTi[mA,nA] = vTi[mA,nA] / v[mA,nA] ……(A)

The extraction processing unit 524 in FIG. 5 applies the target sound coefficient sequence GTi generated by the coefficient sequence generation unit 522 to the observation matrix Vi of the acoustic signal si, thereby sequentially generating, for each analysis period T0, the time series of NA spectra YTi (the spectrogram within the analysis period T0) corresponding to the NA time points t1 to tNA within the analysis period T0. The spectrum YTi at time point tnA is a power spectrum consisting of MA component values yTi[1,nA] to yTi[MA,nA]. Specifically, the component value yTi[mA,nA] is set to the product of the coefficient value gTi[mA,nA] of the target sound coefficient sequence GTi and the component value xi[mA,nA] of the observation matrix Vi (yTi[mA,nA] = gTi[mA,nA] × xi[mA,nA]). As described above, the more dominant the wind noise, the smaller the coefficient value gTi[mA,nA]; the spectrum YTi generated by the extraction processing unit 524 therefore corresponds to a spectrum obtained by suppressing the wind noise from the spectrum Xi of the acoustic signal si.
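A sketch of equation (A) and the masking step, assuming W, H from the NMF sketch above and k_noise from the basis search; the division implements the normalization to the 0-1 range described in the text.

```python
import numpy as np

def target_sound_mask(V, W, H, k_noise, eps=1e-12):
    """Equation (A): gTi = vTi / v, then YTi = GTi * Vi (element-wise)."""
    keep = np.arange(W.shape[1]) != k_noise
    VT = W[:, keep] @ H[keep, :]       # reconstruction without the noise basis
    Vfull = W @ H + eps                # full reconstruction v[mA, nA]
    GT = VT / Vfull                    # target sound coefficient sequence GTi
    YT = GT * V                        # spectrogram with the wind noise suppressed
    return GT, YT
```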

Similarly to the target sound extraction unit 52, the noise extraction unit 54 in FIG. 5 comprises a coefficient sequence generation unit 542 that generates, for each analysis period T0, a noise coefficient sequence GNi (GN1, GN2) of MA rows × NA columns consisting of coefficient values gNi[1,1] to gNi[MA,NA], and an extraction processing unit 544 that applies the noise coefficient sequence GNi to the observation matrix Vi to generate the time series of NA spectra YNi (the spectrogram within the analysis period T0).

As shown in FIG. 4, the coefficient sequence generation unit 542 first computes a matrix VNi, in which element values vNi[1,1] to vNi[MA,NA] are arranged in MA rows × NA columns, by multiplying the noise basis Ci_noise identified by the noise identification unit 46 by the weight sequence Ei_noise corresponding to that noise basis Ci_noise. The matrix VNi corresponds to the spectrogram of the noise component of the acoustic signal si within the analysis period T0. Second, the coefficient sequence generation unit 542 computes coefficient values gNi[mA,nA] between 0 and 1 by equation (B), which is analogous to equation (A) above. As understood from the above description, the more dominant the wind noise is in the acoustic component at frequency fmA of the acoustic signal si at time point tnA, the larger the value to which the coefficient value gNi[mA,nA] is set.
gNi[mA,nA] = vNi[mA,nA] / v[mA,nA] ……(B)

The extraction processing unit 544 applies the noise coefficient sequence GNi generated by the coefficient sequence generation unit 542 to the observation matrix Vi of the acoustic signal si, thereby sequentially generating, for each analysis period T0, the time series (spectrogram) of the NA spectra YNi within the analysis period T0. The spectrum YNi is a power spectrum consisting of MA component values yNi[1,nA] to yNi[MA,nA]. Specifically, the component value yNi[mA,nA] is set to the product of the coefficient value gNi[mA,nA] of the noise coefficient sequence GNi and the component value xi[mA,nA] of the observation matrix Vi (yNi[mA,nA] = gNi[mA,nA] × xi[mA,nA]). As described above, the more dominant the noise component (wind noise), the larger the coefficient value gNi[mA,nA]; the spectrum YNi generated by the extraction processing unit 544 therefore corresponds to a spectrum obtained by extracting the wind noise from the spectrum Xi of the acoustic signal si.
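The complementary sketch for equation (B), assuming the same names as in the previous example.

```python
import numpy as np

def noise_mask(V, W, H, k_noise, eps=1e-12):
    """Equation (B): gNi = vNi / v, then YNi = GNi * Vi (element-wise)."""
    VN = np.outer(W[:, k_noise], H[k_noise, :])   # Ci_noise times Ei_noise
    Vfull = W @ H + eps
    GN = VN / Vfull                               # noise coefficient sequence GNi
    YN = GN * V                                   # spectrogram of the extracted wind noise
    return GN, YN
```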

As described above, the target sound extraction unit 52 extracts the target sound component from the acoustic signal si, and the noise extraction unit 54 extracts the noise component from the acoustic signal si. That is, the target sound extraction unit 52 and the noise extraction unit 54 function as elements that separate the acoustic signal si into the target sound component (YT1, YT2) and the noise component (YN1, YN2).

The waveform synthesis unit 56 in FIG. 2 generates a time-domain estimated target sound signal qTi (qT1, qT2) from the spectrum YTi (band BLa) generated by the target sound extraction unit 52 for each unit period and the spectrum XHi (band BHa) generated by the frequency analysis unit 42. Specifically, the waveform synthesis unit 56 generates a time-domain signal by an inverse Fourier transform that applies the amplitude spectrum obtained by adding the spectrum YTi and the spectrum XHi, together with the phase spectrum of the acoustic signal si, and generates the estimated target sound signal qTi by connecting the results of successive unit periods. The waveform synthesis unit 56 also generates a time-domain estimated noise signal qNi (qN1, qN2) from the spectrum YNi generated by the noise extraction unit 54 for each unit period.
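A sketch of the resynthesis step using SciPy's inverse STFT: the low-band magnitude is replaced by the masked one while the original phase of si is kept, mirroring the description. The band split and window/hop values match the earlier observation-matrix sketch and are assumptions, as is the requirement that YT_low was computed from the same STFT grid.

```python
import numpy as np
from scipy.signal import stft, istft

def resynthesize(si, YT_low, fs, band_edge_hz=1000.0, win_a=0.512, hop_a=0.064):
    """Combine the masked low-band power YT_low with the untouched high band and
    the original phase of si, then invert the STFT to obtain the time-domain qTi."""
    nperseg = int(win_a * fs)
    noverlap = nperseg - int(hop_a * fs)
    f, t, S = stft(si, fs=fs, window='hann', nperseg=nperseg, noverlap=noverlap)
    power = np.abs(S) ** 2
    low = f <= band_edge_hz
    power[low, :] = YT_low                        # replace band BLa by the masked power
    S_out = np.sqrt(power) * np.exp(1j * np.angle(S))  # reuse the phase spectrum of si
    _, qT = istft(S_out, fs=fs, window='hann', nperseg=nperseg, noverlap=noverlap)
    return qT
```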

As described above, the second processing unit 32 in FIG. 1 extracts the residual component from the estimated noise signal qNi generated by the above procedure and combines it with the estimated target sound signal qTi. The first embodiment assumes that the target sound component is a sound having a harmonic structure (typically speech), and extracts from the estimated noise signal qNi, as the residual component, the harmonic components (the fundamental component and the overtone components) that constitute that harmonic structure. FIG. 6 is a block diagram of the second processing unit 32. As shown in FIG. 6, the second processing unit 32 comprises a frequency analysis unit 62, a harmonic component extraction unit 64, a target sound synthesis unit 66, and a waveform synthesis unit 68.

The frequency analysis unit 62 sequentially generates, for each unit period, the spectrum STi (ST1, ST2) of the estimated target sound signal qTi and the spectrum SNi (SN1, SN2) of the estimated noise signal qNi. The spectrum STi of the estimated target sound signal qTi is a power spectrum in which a plurality of component values (powers) sTi are arranged. Similarly, the spectrum SNi of the estimated noise signal qNi is a power spectrum in which a plurality of component values sNi are arranged. As shown in FIG. 7, a component value sTi and a component value sNi are computed for each analysis point p2 corresponding to the time points t arranged at intervals ΔtB on the time axis and the frequencies f arranged at intervals ΔfB on the frequency axis. As shown in FIG. 6, the spectrum STi of the estimated target sound signal qTi is divided into a spectrum Zi within a band BLb and a spectrum ZHi within a band BHb. The band BLb is set to a range containing MB frequencies f1 to fMB (MB is a natural number) (for example, the band from 0.1 kHz to 4.4 kHz), and the band BHb is set on the higher-frequency side of the band BLb (for example, the band of 4.4 kHz and above).

The spectrum STi and the spectrum SNi are calculated by a short-time Fourier transform whose analysis parameters are the window width ωB of each unit period and the movement amount (shift amount on the time axis) δB. Whereas the analysis parameters of the frequency analysis unit 42 (window width ωA, movement amount δA) are chosen as values suitable for non-negative matrix factorization, the analysis parameters of the frequency analysis unit 62 (window width ωB, movement amount δB) are chosen as values suitable for the extraction and synthesis of the residual component (harmonic components) in the second processing unit 32. Because of this difference, the analysis parameters of the frequency analysis unit 42 (ωA, δA) and those of the frequency analysis unit 62 (ωB, δB) differ from each other. That is, the interval ΔtA on the time axis between the analysis points p1 assumed by the first processing unit 31 differs from the interval ΔtB between the analysis points p2 assumed by the second processing unit 32, and the interval ΔfA on the frequency axis between the analysis points p1 differs from the interval ΔfB between the analysis points p2. Specifically, the analysis parameters are chosen so that the time resolution of the frequency analysis unit 62 exceeds that of the frequency analysis unit 42 (ΔtB < ΔtA) and the frequency resolution of the frequency analysis unit 42 exceeds that of the frequency analysis unit 62 (ΔfA < ΔfB). For example, the window width ωA of the frequency analysis unit 42 is set to 512 ms and its movement amount δA to 64 ms, whereas the window width ωB of the frequency analysis unit 62 is set to 25 ms and its movement amount δB to 5 ms.
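
The following sketch illustrates, under hypothetical names (s, q, sr) and a hypothetical sampling rate, how two power spectrograms with deliberately different window widths and hop sizes could be computed, mirroring the long-window analysis of the factorization stage and the short-window analysis of the harmonic-extraction stage.

```python
import numpy as np

def stft_power(signal, sr, win_ms, hop_ms):
    """Power spectrogram computed with a given window width and hop (both in ms)."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hanning(win)
    frames = [signal[k:k + win] * window
              for k in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (frames, bins)

# First stage (NMF): long window, fine frequency grid, coarse time grid.
#   V = stft_power(s, sr, win_ms=512, hop_ms=64)   # analysis parameters of unit 42
# Second stage (harmonic extraction): short window, fine time grid.
#   S = stft_power(q, sr, win_ms=25, hop_ms=5)     # analysis parameters of unit 62
```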

The harmonic component extraction unit 64 of FIG. 6 sequentially extracts spectra Ri of the residual component from the estimated noise signal qNi of the noise component. A time series of NB spectra Ri (a spectrogram of the residual component) is generated sequentially for each analysis period T0.

FIG. 8 is a block diagram of the harmonic component extraction unit 64. As shown in FIG. 8, the harmonic component extraction unit 64 includes a frequency estimation unit 72, a harmonic coefficient sequence generation unit 74, a coefficient sequence correction unit 76, and a harmonic extraction unit 78. The frequency estimation unit 72 estimates, by analyzing the spectrum Zi of the estimated target sound signal qTi, the fundamental frequency Fi[nB] (Fi[1] to Fi[NB]) of the target sound component of the acoustic signal si (the estimated target sound signal qTi) for each of the NB unit periods within each analysis period T0. A known technique (for example, analysis of the harmonic structure or calculation of the cepstrum) may be employed arbitrarily to estimate the fundamental frequency Fi[nB].
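
As one concrete illustration of the cepstrum option mentioned above, the sketch below picks the strongest cepstral peak inside an assumed pitch range; the range limits and the function name are hypothetical and not part of the disclosure.

```python
import numpy as np

def estimate_f0_cepstrum(power_spectrum, sr, fmin=70.0, fmax=500.0):
    """Pick the fundamental as the strongest cepstral peak inside an assumed
    pitch range.  power_spectrum is the one-sided power spectrum of one unit
    period; fmin and fmax bound the quefrency search range."""
    log_spec = np.log(power_spectrum + 1e-12)
    cepstrum = np.fft.irfft(log_spec)               # quefrency domain
    q_lo = max(int(sr / fmax), 1)                   # shortest lag of interest
    q_hi = min(int(sr / fmin), len(cepstrum) - 1)   # longest lag of interest
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return sr / peak
```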

The harmonic coefficient sequence generation unit 74 generates a harmonic coefficient sequence GHi (H: harmonics) using the fundamental frequency Fi[nB] estimated by the frequency estimation unit 72 and the noise coefficient sequence GNi generated by the coefficient sequence generation unit 542 of the first processing unit 31. As shown in FIG. 9, the harmonic coefficient sequence GHi is a matrix of MB rows × NB columns in which coefficient values gHi[1,1] to gHi[MB,NB] corresponding to the different analysis points p2 in the time-frequency plane are arranged. The MB coefficient values gHi[1,nB] to gHi[MB,nB] constituting the nB-th column of the harmonic coefficient sequence GHi form a coefficient sequence indicating the harmonic structure of the target sound component in the nB-th unit period (time point tnB) of the analysis period T0.

The noise coefficient sequence GNi is used by the harmonic coefficient sequence generation unit 74 to generate the harmonic coefficient sequence GHi, but the analysis points p1 corresponding to the coefficient values gNi[mA,nA] of the noise coefficient sequence GNi differ from the analysis points p2 corresponding to the coefficient values gHi[mB,nB] of the harmonic coefficient sequence GHi. To compensate for this difference, the coefficient sequence correction unit 76 generates, as shown in FIG. 9, a correction coefficient sequence GBi for each analysis period T0 by correcting the noise coefficient sequence GNi generated by the coefficient sequence generation unit 542 of the first processing unit 31. The correction coefficient sequence GBi is a matrix of MB rows × NB columns in which coefficient values gBi[1,1] to gBi[MB,NB] are arranged.

Specifically, the coefficient sequence correction unit 76 generates each coefficient value gBi[mB,nB] of the correction coefficient sequence GBi by interpolation or thinning of the coefficient values gNi[mA,nA] of the noise coefficient sequence GNi. For example, when the number of rows MA of the noise coefficient sequence GNi exceeds the target number of rows MB (MA > MB), the coefficient sequence correction unit 76 generates MB coefficient values gBi[1,nA] to gBi[MB,nA] by thinning the MA coefficient values gNi[1,nA] to gNi[MA,nA] constituting each column of the noise coefficient sequence GNi. When the number of columns NA of the noise coefficient sequence GNi falls below the target number of columns NB (NA < NB), the coefficient sequence correction unit 76 generates NB coefficient values gBi[mA,1] to gBi[mA,NB] by interpolating the NA coefficient values gNi[mA,1] to gNi[mA,NA] constituting each row of the noise coefficient sequence GNi. A known technique (for example, linear interpolation) may be employed arbitrarily for the interpolation and thinning.
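
A minimal sketch of such a grid conversion is given below, assuming linear interpolation along both axes; the function name and the use of linear interpolation in the thinning direction as well are illustrative assumptions.

```python
import numpy as np

def resample_coefficients(gn, mb, nb):
    """Map an MA x NA coefficient matrix onto an MB x NB grid by linear
    interpolation along frequency (rows) and along time (columns)."""
    ma, na = gn.shape
    src_f, dst_f = np.linspace(0, 1, ma), np.linspace(0, 1, mb)
    tmp = np.empty((mb, na))
    for j in range(na):                       # frequency direction
        tmp[:, j] = np.interp(dst_f, src_f, gn[:, j])
    src_t, dst_t = np.linspace(0, 1, na), np.linspace(0, 1, nb)
    gb = np.empty((mb, nb))
    for i in range(mb):                       # time direction
        gb[i, :] = np.interp(dst_t, src_t, tmp[i, :])
    return gb
```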

As shown in FIG. 8, the harmonic coefficient sequence generation unit 74 includes a harmonic structure specifying unit 742 and a coefficient sequence synthesis unit 744. The harmonic structure specifying unit 742 generates a harmonic coefficient sequence Di indicating the harmonic structure of the target sound component (residual component) of the acoustic signal si. As shown in FIG. 9, the harmonic coefficient sequence Di is a matrix of MB rows × NB columns in which coefficient values di[1,1] to di[MB,NB] are arranged. The sequence of MB coefficient values di[1,nB] to di[MB,nB] constituting the nB-th column of the harmonic coefficient sequence Di specifies the harmonic structure of the target sound component in the nB-th unit period of the analysis period T0. Specifically, as shown in FIG. 9, among the MB coefficient values di[1,nB] to di[MB,nB], each coefficient value di[mB,nB] corresponding to a frequency that is an integer multiple of the fundamental frequency Fi[nB] estimated by the frequency estimation unit 72 (Fi[nB], 2Fi[nB], 3Fi[nB], ...) is set to 1, and the other coefficient values di[mB,nB] are set to zero.
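
The sketch below builds one column of such a 0/1 harmonic coefficient sequence from an estimated fundamental. The optional tolerance of a few neighboring bins is an assumption added here to absorb grid quantization and is not stated in the embodiment.

```python
import numpy as np

def harmonic_mask(f0, freqs, tol_bins=0):
    """One column of Di: 1 at the bins whose centre frequency is an integer
    multiple of the estimated fundamental f0 (assumed > 0), 0 elsewhere.
    freqs is the vector of bin frequencies f1..fMB; tol_bins optionally widens
    each harmonic line by a few bins."""
    d = np.zeros(len(freqs))
    df = freqs[1] - freqs[0]
    k = 1
    while k * f0 <= freqs[-1]:
        idx = int(round((k * f0 - freqs[0]) / df))
        if 0 <= idx < len(freqs):
            d[max(idx - tol_bins, 0):min(idx + tol_bins + 1, len(freqs))] = 1.0
        k += 1
    return d
```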

The coefficient sequence synthesis unit 744 of FIG. 8 generates the harmonic coefficient sequence GHi by combining the correction coefficient sequence GBi generated by the coefficient sequence correction unit 76 with the harmonic coefficient sequence Di generated by the harmonic structure specifying unit 742. Specifically, as shown in FIG. 9, each coefficient value gHi[mB,nB] of the harmonic coefficient sequence GHi is set to the product of the coefficient value gBi[mB,nB] of the correction coefficient sequence GBi and the coefficient value di[mB,nB] of the harmonic coefficient sequence Di (gHi[mB,nB] = gBi[mB,nB] × di[mB,nB]). The coefficient value gHi[mB,nB] is therefore set to a larger value the more dominant the residual component is at frequencies that are integer multiples of the fundamental frequency Fi[nB].

The harmonic extraction unit 78 of FIG. 8 applies the harmonic coefficient sequence GHi generated by the coefficient sequence synthesis unit 744 to the spectrum SNi of the estimated noise signal qNi, thereby generating the time series of NB spectra Ri (the spectrogram of the residual component) within the analysis period T0. The spectrum Ri at time point tnB is a power spectrum composed of MB component values ri[1,nB] to ri[MB,nB] corresponding to the frequencies f1 to fMB. The harmonic extraction unit 78 calculates, as the component value ri[mB,nB] of the spectrum Ri, the product of the component value sNi[mB,nB] at the frequency fmB in the spectrum SNi of the estimated noise signal qNi at time point tnB and the coefficient value gHi[mB,nB] of the harmonic coefficient sequence GHi (ri[mB,nB] = gHi[mB,nB] × sNi[mB,nB]). The spectrum Ri therefore corresponds to an estimate of the spectrum of the residual component (the harmonic component whose fundamental frequency is Fi[nB]) mixed into the estimated noise signal qNi. The above is the configuration and operation of the harmonic component extraction unit 64.
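
Putting the element-wise steps together, a minimal sketch (with hypothetical variable names, assuming all matrices share the MB × NB grid) is:

```python
import numpy as np

def extract_residual(gb, d, sn):
    """gb: correction coefficient matrix GBi, d: harmonic coefficient matrix Di
    (columns built as sketched above), sn: power spectrogram of the estimated
    noise signal.  All operations are element-wise products."""
    gh = gb * d      # harmonic coefficient sequence GHi
    return gh * sn   # residual-component spectrogram Ri
```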

The target sound synthesis unit 66 of FIG. 6 sequentially generates, for each unit period, a spectrum ZRi by combining (spectrum addition) the spectrum Zi (band BLb) that the frequency analysis unit 62 generated from the estimated target sound signal qTi with the spectrum Ri generated by the harmonic component extraction unit 64. That is, the spectrum ZRi corresponds to the power spectrum of the mixture of the target sound component within the band BLb of the estimated target sound signal qTi and the residual component remaining in the estimated noise signal qNi.

The waveform synthesis unit 68 generates a time-domain acoustic signal qi (q1, q2) from the spectrum ZRi (band BLb) generated by the target sound synthesis unit 66 for each unit period and the spectrum ZHi (band BHb) generated by the frequency analysis unit 62. The acoustic signal qi is generated by the waveform synthesis unit 68 in the same manner as the estimated target sound signal qTi is generated by the waveform synthesis unit 56. As is understood from the above description, the reproduced sound of the acoustic signal qi corresponds to a mixture of the target sound component of the estimated target sound signal qTi and the residual component of the estimated noise signal qNi.

As described above, in the first embodiment the observation matrix Vi of the acoustic signal si is decomposed into the basis matrix Wi and the coefficient matrix Hi, and the target sound coefficient sequence GTi is generated using the basis matrix Wi with the noise basis Ci_noise excluded (matrix WTi) and the coefficient matrix Hi with the weight sequence Ei_noise excluded (matrix HTi). Therefore, even when the intensity of the target sound component of the acoustic signal si is low compared with the noise component, wind noise can be suppressed with high accuracy. Moreover, since each basis Ci[k] of the basis matrix Wi other than the noise basis Ci_noise and each weight sequence Ei[k] of the coefficient matrix Hi other than the weight sequence Ei_noise are preserved, there is also the advantage that an acoustic signal qi in which the waveform of the target sound component of the acoustic signal si is faithfully maintained can be generated.
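
For orientation, a generic multiplicative-update NMF with Euclidean cost is sketched below; the embodiment does not prescribe a particular update rule, rank or iteration count, so these choices and the function name are assumptions.

```python
import numpy as np

def nmf(v, k, n_iter=200, eps=1e-12):
    """Multiplicative-update NMF, Euclidean cost: V (M x N) ~ W (M x k) @ H (k x N)."""
    rng = np.random.default_rng(0)
    w = rng.random((v.shape[0], k)) + eps
    h = rng.random((k, v.shape[1])) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# With the noise-basis column indices in noise_idx and the remaining indices in keep:
#   v_target = w[:, keep] @ h[keep, :]            # spectrogram with the noise suppressed
#   v_noise  = w[:, noise_idx] @ h[noise_idx, :]  # spectrogram of the noise component
```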

As a method of specifying the noise basis Ci_noise from the basis matrix Wi, a configuration may also be adopted in which, for example, a model created in advance so as to simulate the frequency characteristics of wind noise is compared with each basis Ci[k] of the basis matrix Wi. In a configuration that uses a wind-noise model, however, wind noise whose frequency characteristics differ from those of the model prepared in advance may not be suppressed sufficiently. In the first embodiment, on the other hand, each basis Ci[k] having a high correlation between the basis matrix W1 and the basis matrix W2 is specified as a noise basis Ci_noise, so that wind noise of diverse characteristics can be suppressed sufficiently compared with a configuration that uses a wind-noise model.
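
One plausible reading of this basis-correlation test, with an illustrative threshold and a hypothetical function name, is sketched below.

```python
import numpy as np

def find_noise_bases(w1, w2, threshold=0.9):
    """Return the column indices of W1 whose (Pearson) correlation with some
    column of W2 reaches the threshold; those columns are treated as noise bases."""
    def normalize(w):
        w = w - w.mean(axis=0, keepdims=True)
        return w / (np.linalg.norm(w, axis=0, keepdims=True) + 1e-12)
    corr = normalize(w1).T @ normalize(w2)      # K1 x K2 correlation matrix
    return np.where(corr.max(axis=1) >= threshold)[0]
```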

Furthermore, in the first embodiment, the target sound component (residual component) remaining in the estimated noise signal qNi is extracted and merged into the estimated target sound signal qTi (spectrum Zi), so that loss of the target sound component in the reproduced sound can be prevented compared with, for example, the case where the estimated target sound signal qTi is reproduced from the sound emitting device 14 as it is. Moreover, since the residual component is separated from the noise component through analysis of the harmonic structure, there is the advantage that the residual component can be extracted with high accuracy even when its intensity is low relative to the noise component.

In the first embodiment, the noise coefficient sequence GNi corresponding to the analysis points p1 is corrected so as to correspond to the analysis points p2 of the second processing unit 32 before being used for extraction of the residual component. Therefore, there is the advantage that the analysis parameters of the frequency analysis unit 42 of the first processing unit 31 (window width ωA, movement amount δA) and the analysis parameters of the frequency analysis unit 62 of the second processing unit 32 (window width ωB, movement amount δB) can be selected independently. Specifically, as described above, the analysis parameters of the frequency analysis unit 42 can be chosen as values appropriate for the non-negative matrix factorization performed by the matrix decomposition unit 44, while the analysis parameters of the frequency analysis unit 62 can be chosen as values appropriate for the extraction and synthesis of the residual component.

<B: Second Embodiment>
Next, a second embodiment of the present invention will be described. In the following examples, elements whose operations and functions are equivalent to those of the first embodiment are denoted by the reference numerals used in the above description, and their detailed description is omitted as appropriate.

A target sound component arriving from the front with respect to the two sound collecting devices 122 of the signal supply device 12 reaches each sound collecting device 122 with substantially equal intensity (amplitude) and with hardly any phase difference. Wind noise, on the other hand, originates in air turbulence as described above, so it is unlikely to reach the sound collecting devices 122 with the same phase and the same amplitude. Accordingly, the more dominant the wind noise is in the acoustic signals s1 and s2, the larger the phase difference and the intensity difference between them tend to become. Taking this tendency into account, in the present embodiment each coefficient value gTi[mA,nA] of the target sound coefficient sequence GTi and each coefficient value gNi[mA,nA] of the noise coefficient sequence GNi are set variably according to the phase difference and the intensity difference between the acoustic signal s1 and the acoustic signal s2.

As shown in FIG. 10, the sound processing apparatus 100 of the second embodiment has a configuration in which a phase difference calculation unit 582 and an intensity difference calculation unit 584 are added to the first processing unit 31 of the first embodiment. The components of the acoustic signal s1 and the acoustic signal s2 within a band BM are supplied to the phase difference calculation unit 582 and the intensity difference calculation unit 584. The band BM is set so as to contain the frequencies of the wind noise and the frequencies of the main target sound component; for example, the band BM is set to the range of 4 kHz and below (that is, a band containing the band BLa).

The phase difference calculation unit 582 of FIG. 10 sequentially calculates the phase difference ΔP[nA] between the acoustic signal s1 and the acoustic signal s2 for each unit period (each time point tnA). The phase difference ΔP[nA] is, for example, a representative value (for example, the average) of the phase differences at the respective frequencies within the band BM. Similarly, the intensity difference calculation unit 584 sequentially calculates, for each unit period, the intensity difference ΔA[nA] (for example, an amplitude difference or a power difference) between the acoustic signal s1 and the acoustic signal s2.
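
A minimal sketch of computing per-frame representative differences from complex STFT frames is given below; taking plain means as the representative values is one of the options the text allows, and the function and variable names are hypothetical.

```python
import numpy as np

def frame_differences(x1, x2, band):
    """Per-frame representative phase and intensity differences of two channels.
    x1, x2: complex STFT frames of shape (n_frames, n_bins); band: boolean mask
    selecting the bins inside band BM."""
    cross = x1[:, band] * np.conj(x2[:, band])
    phase_diff = np.mean(np.abs(np.angle(cross)), axis=1)                 # ΔP[n]
    power_diff = np.abs(np.mean(np.abs(x1[:, band]) ** 2
                                - np.abs(x2[:, band]) ** 2, axis=1))      # ΔA[n]
    return phase_diff, power_diff
```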

The coefficient sequence generation unit 522 of the target sound extraction unit 52 sets each coefficient value gTi[mA,nA] of the target sound coefficient sequence GTi variably according to the phase difference ΔP[nA] calculated by the phase difference calculation unit 582 and the intensity difference ΔA[nA] calculated by the intensity difference calculation unit 584. Specifically, the larger the phase difference ΔP[nA] or the intensity difference ΔA[nA] (that is, the more dominant the wind noise at the time point tnA), the smaller the value to which the coefficient sequence generation unit 522 corrects the coefficient value gTi[mA,nA] calculated by equation (A) above. The second embodiment therefore has the advantage over the first embodiment that an estimated target sound signal qTi in which wind noise is suppressed more thoroughly can be generated.

Conversely, the coefficient sequence generation unit 542 of the noise extraction unit 54 sets each coefficient value gNi[mA,nA] of the noise coefficient sequence GNi variably according to the phase difference ΔP[nA] and the intensity difference ΔA[nA]. Specifically, the larger the phase difference ΔP[nA] or the intensity difference ΔA[nA] (that is, the more dominant the wind noise at the time point tnA), the larger the value to which the coefficient sequence generation unit 542 corrects the coefficient value gNi[mA,nA] calculated by equation (B) above. The second embodiment therefore has the advantage over the first embodiment that an estimated noise signal qNi in which wind noise is emphasized more thoroughly can be generated.
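
The embodiment does not fix the exact correction rule, so the following sketch shows only one monotone choice consistent with the description (smaller gTi and larger gNi as the differences grow); the scaling constants and function name are arbitrary assumptions.

```python
import numpy as np

def adjust_coefficients(g_target, g_noise, dp, da, alpha=1.0, beta=1.0):
    """The larger the per-frame phase difference dp[n] or intensity difference
    da[n] (i.e. the more the wind noise dominates frame n), the smaller the
    target-sound coefficients and the larger the noise coefficients become."""
    wind = 1.0 / (1.0 + alpha * dp + beta * da)   # in (0, 1]; small when noise dominates
    return g_target * wind[np.newaxis, :], g_noise / wind[np.newaxis, :]
```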

<C: Third Embodiment>
The sound processing apparatus 100 of the third embodiment has a configuration in which an adjustment unit 65 is added to the second processing unit 32, as shown in FIG. 11. The adjustment unit 65 is an amplifier that reduces the intensity (power) of the spectrum SNi of the estimated noise signal qNi (for example, a multiplier that multiplies it by a value less than 1). The target sound synthesis unit 66 sequentially generates the spectrum ZRi for each unit period by combining the spectrum Zi (band BLb) and the spectrum Ri, as in the first embodiment, with the spectrum SNi after processing (attenuation) by the adjustment unit 65. That is, the noise component of the acoustic signal si is added to the reproduced sound at a low volume.

The third embodiment achieves the same effects as the first embodiment. In the configuration of the first embodiment, in which the spectrum ZRi is generated by combining only the spectrum Zi of the estimated target sound signal qTi and the spectrum Ri of the residual component, the noise component can be excluded to a high degree, but the reproduced sound may give an audibly unnatural impression. In the third embodiment, the spectrum SNi of the estimated noise signal qNi is also applied to the synthesis of the spectrum ZRi, which has the advantage that a reproduced sound with an audibly natural impression can be generated.
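
A one-line sketch of this mixing step, with an illustrative attenuation factor and hypothetical names, is:

```python
def mix_with_noise_floor(zi, ri, sn, attenuation=0.1):
    """ZRi for one unit period: target spectrum of band BLb plus residual spectrum
    plus an attenuated copy of the estimated noise spectrum (same grid for all)."""
    return zi + ri + attenuation * sn
```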

<D: Fourth Embodiment>
In the first embodiment, the target sound extraction unit 52 generates the time series of the spectrum YTi by applying the target sound coefficient sequence GTi, generated from the basis matrix Wi and the coefficient matrix Hi, to the observation matrix Vi, and the noise extraction unit 54 generates the time series of the spectrum YNi by applying the noise coefficient sequence GNi to the observation matrix Vi. The fourth embodiment simplifies the operation of the target sound extraction unit 52 and the noise extraction unit 54.

As described above with reference to FIG. 4, the matrix VTi obtained by multiplying the matrix WTi, in which the noise basis Ci_noise is excluded from the basis matrix Wi, by the matrix HTi, in which the weight sequence Ei_noise is excluded from the coefficient matrix Hi, approximates the spectrogram that would be obtained if the noise component were suppressed from the acoustic signal si. The target sound extraction unit 52 of the fourth embodiment therefore sequentially generates, for each analysis period T0, the matrix VTi as the time series (spectrogram) of the spectrum YTi after suppression of the noise component. The sequence of MA element values vTi[1,nA] to vTi[MA,nA] located in the nA-th column of the matrix VTi is supplied to the waveform synthesis unit 56 as the spectrum YTi.

Likewise, the matrix VNi obtained by multiplying the noise basis Ci_noise by the weight sequence Ei_noise approximates the spectrogram that would be obtained if the target sound component were suppressed from the acoustic signal si. The noise extraction unit 54 of the fourth embodiment therefore sequentially generates, for each analysis period T0, the matrix VNi as the time series (spectrogram) of the spectrum YNi after suppression of the target sound component. The sequence of MA element values vNi[1,nA] to vNi[MA,nA] located in the nA-th column of the matrix VNi is used as the spectrum YNi.
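
A minimal sketch of this direct reconstruction, assuming the noise-basis column indices have already been identified, is:

```python
import numpy as np

def split_spectrograms(w, h, noise_idx):
    """Reconstruct the two spectrograms directly from the factor matrices."""
    keep = np.setdiff1d(np.arange(w.shape[1]), noise_idx)
    v_target = w[:, keep] @ h[keep, :]              # time series of spectra YTi (matrix VTi)
    v_noise = w[:, noise_idx] @ h[noise_idx, :]     # time series of spectra YNi (matrix VNi)
    return v_target, v_noise
```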

FIG. 12 is a block diagram of the harmonic component extraction unit 64 in the fourth embodiment. As shown in FIG. 12, the harmonic component extraction unit 64 of the fourth embodiment has a configuration in which the coefficient sequence synthesis unit 744 of the first embodiment is omitted. The matrix VNi generated by the noise extraction unit 54 is supplied to the coefficient sequence correction unit 76.

The matrix VNi is a matrix of MA rows × NA columns composed of element values vNi[1,1] to vNi[MA,NA] corresponding to the analysis points p1. The coefficient sequence correction unit 76 generates a matrix VBi by correcting (interpolating or thinning) the element values vNi[mA,nA] of the matrix VNi. The matrix VBi is composed of element values vBi[1,1] to vBi[MB,NB] corresponding to the analysis points p2 determined by the analysis parameters of the frequency analysis unit 62 (window width ωB, movement amount δB).

The harmonic extraction unit 78 applies the harmonic coefficient sequence Di generated by the harmonic structure specifying unit 742 to the matrix VBi corrected by the coefficient sequence correction unit 76, thereby generating the time series of NB spectra Ri (the spectrogram of the residual component) within the analysis period T0. Specifically, the harmonic extraction unit 78 calculates the product of the element value vBi[mB,nB] of the matrix VBi and the coefficient value di[mB,nB] of the harmonic coefficient sequence Di as the component value ri[mB,nB] of the spectrum Ri. Since the matrix VBi approximates the estimated spectrum of the noise component after separation by the first processing unit 31, the spectrum Ri corresponds to an estimate of the spectrum obtained by extracting, from the separated noise component, the residual component of the target sound (that is, the harmonic components at frequencies that are integer multiples of the fundamental frequency Fi[nB]). The fourth embodiment therefore achieves the same effects as the first embodiment.

<E: Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more aspects selected arbitrarily from the following examples may be combined as appropriate.

(1) Modification 1
In each of the above embodiments, the fundamental frequency Fi[nB] of the target sound component of the acoustic signal si is estimated from the spectrum Zi of the estimated target sound signal qTi, but the method of estimating the fundamental frequency Fi[nB] is arbitrary. For example, the fundamental frequency Fi[nB] of the estimated target sound signal qTi may be estimated by time-domain processing (for example, a method using an autocorrelation function), as sketched below. A configuration in which the fundamental frequency Fi[nB] is estimated by analyzing the acoustic signal si (or the spectrum Xi) before extraction of the target sound component may also be adopted. However, since the accuracy of the estimation of the fundamental frequency Fi[nB] deteriorates while noise components are still mixed in, the configuration of the first embodiment, in which the fundamental frequency Fi[nB] is estimated after suppression of the noise component, is advantageous from the viewpoint of highly accurate estimation.
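
A minimal autocorrelation-based sketch of this time-domain alternative, with an assumed pitch search range and a hypothetical function name, is:

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=70.0, fmax=500.0):
    """Pick the lag with the strongest autocorrelation inside an assumed pitch range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = max(int(sr / fmax), 1)
    lag_hi = min(int(sr / fmin), len(ac) - 1)
    lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])
    return sr / lag
```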

Even when the fundamental frequency Fi[nB] is estimated from an acoustic signal si in which the target sound component and the noise component coexist, highly accurate estimation of the fundamental frequency Fi[nB] is possible if the noise component is attenuated by removing components below a cutoff frequency (low-cut filtering). However, assuming a situation in which the frequency of the noise component changes from moment to moment, it is very difficult to set the cutoff frequency to an optimum value. In each of the above embodiments, the noise component is effectively suppressed even when its frequency changes, and the fundamental frequency Fi[nB] is estimated from the estimated target sound signal qTi after suppression; there is therefore the advantage that the fundamental frequency Fi[nB] can be estimated with high accuracy without the problem of choosing a cutoff frequency as the frequency of the noise component changes.

(2) Modification 2
The configuration that limits the separation of the target sound component and the noise component (first processing unit 31) and the extraction of the residual component (second processing unit 32) to the low-frequency bands (BLa, BLb) may be omitted. For example, a configuration in which the entire band of the acoustic signal si is processed by the matrix decomposition unit 44 and the noise specifying unit 46 may also be adopted. However, since the intensity of wind noise decreases in the higher band (for example, the band BHa), a configuration that omits the band division of the acoustic signal si makes it difficult to extract an independent basis Ci[k] of the wind noise with high accuracy by non-negative matrix factorization. Therefore, when the frequency band of the noise component to be suppressed is known in advance, the above configuration, in which only the frequency band containing the noise component (band BLa) is processed by the matrix decomposition unit 44 and the noise specifying unit 46, is particularly suitable.

(3) Modification 3
In each of the above embodiments, the target sound coefficient sequence GTi and the noise coefficient sequence GNi are generated for each analysis period T0 of the acoustic signal si, but the division into analysis periods T0 may be omitted. For example, a configuration in which the time series of the spectrum Xi for each unit period over the entire acoustic signal si forms a single observation matrix Vi may also be adopted.

(4) Modification 4
In each of the above embodiments, the estimated target sound signal qTi is generated by multiplying each component value xi[mA,nA] of the acoustic signal si by the corresponding coefficient value gTi[mA,nA] of the target sound coefficient sequence GTi, but the manner of applying the target sound coefficient sequence GTi to the acoustic signal si may be changed as appropriate. For example, a configuration in which the coefficient value gTi[mA,nA] is added to each component value xi[mA,nA] of the acoustic signal si may be adopted. Contrary to the examples in the above embodiments, in a configuration in which the target sound coefficient sequence GTi is generated so that the coefficient value gTi[mA,nA] becomes larger the more dominant the wind noise is, the component value xi[mA,nA] may instead be divided by, or reduced by, the coefficient value gTi[mA,nA]. Similarly, for the noise coefficient sequence GNi used to emphasize the noise component, the manner of application to the acoustic signal si and its relationship to the dominance of the wind noise may be changed as appropriate.

(5) Modification 5
In each of the above embodiments, two channels of acoustic signals qi (q1, q2) are generated, but the above embodiments can be applied in the same way when only a single-channel (monaural) acoustic signal q1 is generated. For example, the separation of the target sound component and the noise component and the extraction and addition of the residual component are performed only for the single channel corresponding to the acoustic signal s1. In this configuration, the acoustic signal s2 is used to specify the noise basis C1_noise from the basis matrix W1 of the acoustic signal s1.

(6) Modification 6
The processing of the arithmetic processing device 22 may be executed in real time in parallel with the supply of the acoustic signal si, and the acoustic signal qi may be reproduced successively as each portion is processed. However, a configuration in which reproduction of the acoustic signal qi starts after processing of an acoustic signal si prepared in advance has been completed (batch processing) is also suitable.

DESCRIPTION OF REFERENCE NUMERALS: 100: sound processing apparatus; 12: signal supply device; 14: sound emitting device; 22: arithmetic processing device; 24: storage device; 31: first processing unit; 32: second processing unit; 42: frequency analysis unit; 44: matrix decomposition unit; 46: noise specifying unit; 52: target sound extraction unit; 54: noise extraction unit; 56: waveform synthesis unit; 522: coefficient sequence generation unit; 524: extraction processing unit; 542: coefficient sequence generation unit; 544: extraction processing unit; 582: phase difference calculation unit; 584: intensity difference calculation unit; 62: frequency analysis unit; 64: harmonic component extraction unit; 65: adjustment unit; 66: target sound synthesis unit; 68: waveform synthesis unit; 72: frequency estimation unit; 74: harmonic coefficient sequence generation unit; 742: harmonic structure specifying unit; 744: coefficient sequence synthesis unit; 76: coefficient sequence correction unit; 78: harmonic extraction unit.

Claims (6)

1. A sound processing apparatus comprising:
matrix decomposition means for generating, for each of a first acoustic signal and a second acoustic signal collected in parallel, by non-negative matrix factorization of an observation matrix whose elements are a time series of component values for each frequency of the acoustic signal, a basis matrix including a plurality of bases each indicating component values for each frequency of a different component of the acoustic signal, and a coefficient matrix including a plurality of weight sequences each indicating a time series of weight values of the respective bases;
noise specifying means for specifying, as a noise basis corresponding to a noise component of the first acoustic signal, a basis among the plurality of bases of the basis matrix of the first acoustic signal that has a high correlation with a basis of the basis matrix of the second acoustic signal;
target sound extraction means for generating an estimated target sound component in which the noise component is suppressed from the first acoustic signal, using each basis of the basis matrix other than the noise basis and each weight sequence of the coefficient matrix corresponding to a basis other than the noise basis;
noise extraction means for generating an estimated noise component in which the target sound component is suppressed from the first acoustic signal, using the noise basis and the weight sequence of the coefficient matrix corresponding to the noise basis;
harmonic component extraction means for extracting, from the estimated noise component, a residual component corresponding to a harmonic structure of the target sound component; and
target sound synthesis means for synthesizing the estimated target sound component and the residual component.
2. The sound processing apparatus according to claim 1, wherein the harmonic component extraction means includes:
frequency estimation means for estimating a fundamental frequency of the target sound component;
harmonic coefficient sequence generation means for generating a harmonic coefficient sequence in which each coefficient value is set so that harmonic components at frequencies that are integer multiples of the fundamental frequency are emphasized in the estimated noise component; and
harmonic extraction means for extracting the residual component by applying the harmonic coefficient sequence to the estimated noise component.
3. The sound processing apparatus according to claim 2, wherein the frequency estimation means estimates the fundamental frequency of the estimated target sound component generated by the target sound extraction means.
4. The sound processing apparatus according to claim 2 or claim 3, further comprising:
first frequency analysis means for generating, as the observation matrix, a time series of spectra for each unit section of the first acoustic signal and the second acoustic signal under a first analysis parameter including a window width and a movement amount of each unit section;
second frequency analysis means for sequentially generating, for each unit section, a spectrum of the estimated target sound component and a spectrum of the estimated noise component under a second analysis parameter including a window width and a movement amount different from those of the first analysis parameter; and
coefficient sequence correction means for generating a correction coefficient sequence in which a coefficient value is set for each analysis point arranged on the time axis and the frequency axis at intervals according to the second analysis parameter, wherein:
the noise extraction means generates, using the noise basis and the weight sequence corresponding to the noise basis, a noise coefficient sequence in which a coefficient value is set for each analysis point arranged on the time axis and the frequency axis at intervals according to the first analysis parameter, and generates the estimated noise component by applying the noise coefficient sequence to the observation matrix;
the coefficient sequence correction means generates the correction coefficient sequence from the noise coefficient sequence; and
the harmonic coefficient sequence generation means generates the harmonic coefficient sequence by extracting, from the correction coefficient sequence, components at frequencies that are integer multiples of the fundamental frequency.
5. The sound processing apparatus according to claim 4, further comprising:
phase difference calculation means for calculating a phase difference between the first acoustic signal and the second acoustic signal; and
intensity difference calculation means for calculating an intensity difference between the first acoustic signal and the second acoustic signal, wherein:
the target sound extraction means generates, from each basis of the basis matrix other than the noise basis and each weight sequence of the coefficient matrix corresponding to a basis other than the noise basis, a target sound coefficient sequence in which each coefficient value is variably set according to the phase difference and the intensity difference between the first acoustic signal and the second acoustic signal, and applies the target sound coefficient sequence to the observation matrix; and
the noise extraction means generates, from the noise basis and the weight sequence corresponding to the noise basis, a noise coefficient sequence in which each coefficient value is variably set according to the phase difference and the intensity difference between the first acoustic signal and the second acoustic signal, and applies the noise coefficient sequence to the observation matrix.
6. A program for causing a computer to execute:
a matrix decomposition process of generating, for each of a first acoustic signal and a second acoustic signal collected in parallel, by non-negative matrix factorization of an observation matrix whose elements are a time series of component values for each frequency of the acoustic signal, a basis matrix including a plurality of bases each indicating component values for each frequency of a different component of the acoustic signal, and a coefficient matrix including a plurality of weight sequences each indicating a time series of weight values of the respective bases;
a noise specifying process of specifying, as a noise basis corresponding to a noise component of the first acoustic signal, a basis among the plurality of bases of the basis matrix of the first acoustic signal that has a high correlation with a basis of the basis matrix of the second acoustic signal;
a target sound extraction process of generating an estimated target sound component in which the noise component is suppressed from the first acoustic signal, using each basis of the basis matrix other than the noise basis and each weight sequence of the coefficient matrix corresponding to a basis other than the noise basis;
a noise extraction process of generating an estimated noise component in which the target sound component is suppressed from the first acoustic signal, using the noise basis and the weight sequence of the coefficient matrix corresponding to the noise basis;
a harmonic component extraction process of extracting, from the estimated noise component, a residual component corresponding to a harmonic structure of the target sound component; and
a target sound synthesis process of synthesizing the estimated target sound component and the residual component.
JP2010159543A 2010-07-14 2010-07-14 Sound processing apparatus and program Expired - Fee Related JP5516169B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010159543A JP5516169B2 (en) 2010-07-14 2010-07-14 Sound processing apparatus and program


Publications (2)

Publication Number Publication Date
JP2012022120A (en) 2012-02-02
JP5516169B2 (en) 2014-06-11

Family

ID=45776455


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013160735A1 (en) * 2012-04-27 2013-10-31 Sony Mobile Communications Ab Noise suppression based on correlation of sound in a microphone array
JP6174856B2 (en) 2012-12-27 2017-08-02 キヤノン株式会社 Noise suppression device, control method thereof, and program
US9384553B2 (en) * 2013-04-03 2016-07-05 Mitsubishi Electric Research Laboratories, Inc. Method for factorizing images of a scene into basis images
JP2015118361A (en) * 2013-11-15 2015-06-25 キヤノン株式会社 Information processing apparatus, information processing method, and program
JP6371516B2 (en) 2013-11-15 2018-08-08 キヤノン株式会社 Acoustic signal processing apparatus and method
JP6482173B2 (en) * 2014-01-20 2019-03-13 キヤノン株式会社 Acoustic signal processing apparatus and method
JP6274872B2 (en) * 2014-01-21 2018-02-07 キヤノン株式会社 Sound processing apparatus and sound processing method
US10515650B2 (en) 2015-06-30 2019-12-24 Nec Corporation Signal processing apparatus, signal processing method, and signal processing program
JP7443823B2 (en) * 2020-02-28 2024-03-06 ヤマハ株式会社 Sound processing method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001124621A (en) * 1999-10-28 2001-05-11 Matsushita Electric Ind Co Ltd Noise measuring instrument capable of reducing wind noise
JP2006227152A (en) * 2005-02-16 2006-08-31 Nippon Telegr & Teleph Corp <Ntt> Computing device, and sound collecting device using the same
JP4356670B2 (en) * 2005-09-12 2009-11-04 ソニー株式会社 Noise reduction device, noise reduction method, noise reduction program, and sound collection device for electronic device
JP2008263483A (en) * 2007-04-13 2008-10-30 Sanyo Electric Co Ltd Wind noise reducing device, sound signal recorder, and imaging apparatus
US8015003B2 (en) * 2007-11-19 2011-09-06 Mitsubishi Electric Research Laboratories, Inc. Denoising acoustic signals using constrained non-negative matrix factorization
JP5454330B2 (en) * 2010-04-23 2014-03-26 ヤマハ株式会社 Sound processor



Legal Events

Date / Code / Description
2013-05-20 / A621 / Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2013-11-28 / A977 / Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2013-12-17 / A131 / Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2014-02-07 / A521 / Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
TRDD / Decision of grant or rejection written
2014-03-04 / A01 / Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2014-03-17 / A61 / First payment of annual fees during grant procedure (JAPANESE INTERMEDIATE CODE: A61)
R150 / Certificate of patent or registration of utility model (Ref document number: 5516169; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150)
LAPS / Cancellation because of no payment of annual fees