JP5605575B2

JP5605575B2 - Multi-channel acoustic signal processing method, system and program thereof

Info

Publication number: JP5605575B2
Application number: JP2010550500A
Authority: JP
Inventors: 剛範辻川; 正江森; 祥史大西
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-02-13
Filing date: 2010-02-08
Publication date: 2014-10-15
Anticipated expiration: 2030-02-08
Also published as: US20120029916A1; JPWO2010092915A1; WO2010092915A1; US9064499B2

Description

The present invention relates to a multi-channel acoustic signal processing method, a multi-channel acoustic signal processing system, and a program.

An example of a related multi-channel acoustic signal processing system is described in Patent Document 1. That system can extract a target speech by removing non-target speech and background noise from a mixed acoustic signal containing the speech of a plurality of speakers and noise, observed with a plurality of arbitrarily arranged microphones. It can also detect the target speech in the mixed acoustic signal.

FIG. 3 is a block diagram showing the configuration of the noise removal system disclosed in Patent Document 1. The configuration and operation of the part of that system that detects the target speech from the mixed acoustic signal are outlined below. The system has a signal separation unit 101 that receives and separates input time-series signals of a plurality of channels; a noise estimation unit 102 that receives the separated signals output from the signal separation unit 101 and estimates noise based on the intensity ratio from an intensity ratio calculation unit 106; and a noise section detection unit 103 that receives the separated signals output from the signal separation unit 101, the noise component estimated by the noise estimation unit 102, and the output of the intensity ratio calculation unit 106, and detects noise sections and speech sections.

Japanese Patent Laying-Open No. 2005-308771 (FIG. 1)

The part of the noise removal system of Patent Document 1 that detects the target speech from the mixed acoustic signal is intended to detect the target speech from a mixed acoustic signal containing the speech of a plurality of speakers and noise, observed with a plurality of arbitrarily arranged microphones. However, it has the following problem.

The problem is that the signal separation unit 1 is inefficient.

The reason is as follows. Suppose that a plurality of microphones are arbitrarily arranged and that, for example, a target speech is to be detected using the signals from those microphones (the microphone signals, shown as input time-series signals in FIG. 3). Some microphone signals then require signal separation and others do not. In other words, the degree to which signal separation is needed depends on the processing that follows the signal separation unit 1. When many microphone signals do not need signal separation, the signal separation unit 1 spends an enormous amount of computation on unnecessary processing, which is inefficient.

The present invention has been made in view of the above problem, and its object is to provide a multi-channel acoustic signal processing method, system, and program capable of efficiently separating multi-channel input signals.

The present invention for solving the above problem is a multi-channel acoustic signal processing method characterized by calculating a feature quantity for each channel from multi-channel input signals, calculating the similarity between channels of the per-channel feature quantities, selecting a plurality of channels having a high similarity, and separating signals using the input signals of the selected plurality of channels.

The present invention for solving the above problem is a multi-channel acoustic signal processing system characterized by having a feature quantity calculation unit that calculates a feature quantity for each channel from multi-channel input signals, a similarity calculation unit that calculates the similarity between channels of the per-channel feature quantities, a channel selection unit that selects a plurality of channels having a high similarity, and a signal separation unit that separates signals using the input signals of the selected plurality of channels.

The present invention for solving the above problem is a program characterized by causing an information processing apparatus to execute feature quantity calculation processing for calculating a feature quantity for each channel from multi-channel input signals, similarity calculation processing for calculating the similarity between channels of the per-channel feature quantities, channel selection processing for selecting a plurality of channels having a high similarity, and signal separation processing for separating signals using the input signals of the selected plurality of channels.

The present invention can exclude channels that do not require signal separation, and can thereby achieve its object of separating signals efficiently.

FIG. 1 is a block diagram showing the configuration of the best mode for carrying out the present invention.
FIG. 2 is a flowchart showing the operation of the best mode for carrying out the present invention.
FIG. 3 is a block diagram showing the configuration of the noise removal system of Patent Document 1.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of the multi-channel acoustic signal processing system of the present invention.

The multi-channel acoustic signal processing system illustrated in FIG. 1 has feature quantity calculation units 1-1 to 1-M that receive input signals 1 to M, respectively, and calculate a feature quantity for each channel; a similarity calculation unit 2 that receives the feature quantities and calculates the similarity between channels; a channel selection unit 3 that receives the inter-channel similarities and selects channels having a high similarity; and signal separation units 4-1 to 4-N that receive the input signals of the selected high-similarity channels and separate the signals.

FIG. 2 is a flowchart showing the processing procedure in the multi-channel acoustic signal processing system according to the embodiment of the present invention.

The details of the multi-channel acoustic signal processing system of the present embodiment are described below with reference to FIGS. 1 and 2.

Let the input signals 1 to M be x1(t) to xM(t), respectively, where t is a sample number. The feature quantity calculation units 1-1 to 1-M calculate feature quantities 1 to M from the input signals 1 to M, respectively (step S1).

F1(T) = [f11(T) f12(T) … f1L(T)] … (1-1)
F2(T) = [f21(T) f22(T) … f2L(T)] … (1-2)
⋮
FM(T) = [fM1(T) fM2(T) … fML(T)] … (1-M)

Here, F1(T) to FM(T) are the feature quantities 1 to M calculated from the input signals 1 to M. T is a time index; a plurality of samples t may be grouped into one section, and T may be used as the index of that time section.

As shown in Equations (1-1) to (1-M), the feature quantities F1(T) to FM(T) are each configured as a vector having L feature elements (L is a value of 1 or more). Possible feature elements include, for example, the time waveform (the input signal itself), statistics such as average power, the frequency spectrum, the logarithmic frequency spectrum, the cepstrum, the mel cepstrum, the likelihood with respect to an acoustic model, the reliability with respect to an acoustic model (including entropy), phoneme or syllable recognition results, and the speech section length.
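As an illustration of step S1, the following minimal sketch computes a two-element feature vector (L = 2) per time section T for each channel. The choice of average power and log power as the elements, the frame length, and the names `frame_features` and `multichannel_features` are assumptions made for this example; the specification does not prescribe them.

```python
import math

def frame_features(x, frame_len):
    """Return one feature vector F(T) = [average power, log power] per frame T.

    x is a list of samples x(t); frame_len is a hypothetical section length.
    """
    features = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        log_power = math.log10(power + 1e-12)  # small floor avoids log(0)
        features.append([power, log_power])
    return features

def multichannel_features(channels, frame_len):
    """Apply frame_features to each of the M input signals independently."""
    return [frame_features(x, frame_len) for x in channels]

# Two toy channels: the second is a quieter copy of the first.
x1 = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
x2 = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
F = multichannel_features([x1, x2], frame_len=4)
```

Here `F[m][T]` plays the role of the feature vector of channel m+1 at time index T. Other elements listed above, such as a cepstrum, could be appended to the same vector.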

As described above, the feature quantity need not be obtained directly from the input signals 1 to M; a per-channel value measured against some reference, such as an acoustic model, can also be used as the feature quantity. The feature quantities listed above are only examples, and other feature quantities may of course be used.

Next, the similarity calculation unit 2 receives the feature quantities 1 to M and calculates the similarity between channels (step S2).

The method of calculating the similarity depends on the feature elements.

A correlation value is generally suitable as an index of similarity. A distance (difference) value is also an index: the smaller the distance, the higher the similarity. When the feature quantity is a phoneme or syllable recognition result, the comparison is between character strings, and DP matching or the like may be used to calculate the similarity.
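The three kinds of index mentioned above can be sketched as follows. The specific choices (Pearson correlation, Euclidean distance, and a Levenshtein edit distance computed by dynamic programming as a simple stand-in for DP matching) are assumptions for illustration; the specification leaves the exact measures open.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length sequences (higher = more similar)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: treat a constant sequence as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def euclidean_distance(a, b):
    """Distance between two feature sequences (smaller = more similar)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def edit_distance(s, t):
    """Levenshtein distance between two symbol sequences, by dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]
```

For recognition results, `edit_distance` accepts either strings or lists of phoneme labels; a small distance between the outputs of two channels suggests that both channels observed the same utterance.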

The correlation value and distance value above are only examples, and the similarity may of course be calculated using other indices. It is not necessary to calculate the similarity for every combination of all channels; one of the M channels may be taken as a reference, and only the similarity to that reference channel may be calculated. A plurality of times T may also be treated as one section, and the similarity may be calculated over that time section. When the feature quantity includes the speech section length, the subsequent processing can be omitted for channels in which no speech section is detected.

The channel selection unit 3 receives the inter-channel similarities from the similarity calculation unit 2, selects channels having a high similarity, and groups them (step S3).

As the selection method, a clustering technique may be used: for example, the similarity is compared with a threshold and channels whose similarity exceeds the threshold are grouped, or channels whose similarity is relatively high are grouped. A channel may be selected into more than one group, and a channel may be selected into no group at all.
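A minimal sketch of threshold-based grouping for step S3 follows. It treats every pair of channels whose similarity exceeds the threshold as linked and returns the connected components as groups. This is only one possible clustering choice, and it is a simplification: each channel lands in at most one group, whereas the text above also allows a channel to belong to several groups. The name `group_channels` and the example matrix are hypothetical.

```python
def group_channels(similarity, threshold):
    """Group channels whose pairwise similarity exceeds threshold.

    similarity is an M x M symmetric matrix (list of lists). Groups are the
    connected components of the graph linking pairs above the threshold;
    channels with no such partner are left out of every group.
    """
    m = len(similarity)
    seen = set()
    groups = []
    for start in range(m):
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(m):
                if j not in seen and j != i and similarity[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups

# Channels 0 and 1 are similar; channels 2 and 3 resemble nothing strongly.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
]
groups = group_channels(sim, threshold=0.8)
```

With this matrix, only channels 0 and 1 form a group, so channels 2 and 3 would not be passed to any signal separation unit.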

The similarity calculation unit 2 and the channel selection unit 3 may also narrow down the channels to be selected by repeating the sequence of calculating the similarity and selecting channels for different feature quantities.
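The repeated narrowing can be sketched as a loop over feature types. In this hypothetical example, a channel survives a pass only if it is still similar to at least one other surviving channel under the current feature type; the survival rule, the shared threshold, and the name `narrow_channels` are all assumptions for illustration.

```python
def narrow_channels(similarity_by_feature, threshold):
    """Narrow the selected channels by applying feature types one after another.

    similarity_by_feature: list of M x M similarity matrices, one per feature
    type. A channel is kept in a pass only if its similarity to some other
    currently kept channel exceeds the threshold.
    """
    m = len(similarity_by_feature[0])
    kept = list(range(m))
    for sim in similarity_by_feature:
        kept = [i for i in kept
                if any(j != i and sim[i][j] > threshold for j in kept)]
    return kept

# Pass 1 (e.g. average power): all three channels look alike.
power_sim = [[1.0, 0.9, 0.85], [0.9, 1.0, 0.9], [0.85, 0.9, 1.0]]
# Pass 2 (e.g. spectrum): channel 2 no longer resembles the others.
spectrum_sim = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.1], [0.2, 0.1, 1.0]]
selected = narrow_channels([power_sim, spectrum_sim], threshold=0.8)
```

A cheap feature can be used in the first pass and a more expensive one later, so that the costly comparison runs only on channels that survived the earlier pass.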

The signal separation units 4-1 to 4-N perform signal separation for each group selected by the channel selection unit 3 (step S4).

For signal separation, a technique based on independent component analysis, a technique based on squared-error minimization, or the like may be used. The outputs of any one signal separation unit are expected to have a low similarity to each other, but the outputs of different signal separation units may include outputs that are highly similar. In that case, the similar outputs may be sorted out, keeping some and discarding the others.
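To make the squared-error-minimization idea concrete, the sketch below removes from one channel the component that can be explained by another channel of the same group, using a single least-squares coefficient. This is a deliberately simplified stand-in (one tap, two channels, no delay), not the separation method of the specification; the name `cancel_crosstalk` and the toy signals are hypothetical.

```python
def cancel_crosstalk(target, reference):
    """Subtract from target the scaled reference that minimizes squared error.

    Finds w minimizing sum((target[t] - w * reference[t]) ** 2), which gives
    w = <target, reference> / <reference, reference>, and returns the residual.
    """
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return list(target)  # nothing to cancel against a silent reference
    w = sum(x * r for x, r in zip(target, reference)) / energy
    return [x - w * r for x, r in zip(target, reference)]

# Toy sources: speaker 1 and speaker 2 (orthogonal sequences).
s1 = [1.0, -1.0, 1.0, -1.0]
s2 = [1.0, 1.0, -1.0, -1.0]
# Microphone 1 picks up speaker 1 plus leakage from speaker 2;
# microphone 2 picks up only speaker 2.
x1 = [a + 0.5 * b for a, b in zip(s1, s2)]
x2 = list(s2)
y1 = cancel_crosstalk(x1, x2)
```

In this toy case the residual `y1` equals speaker 1 exactly because the two sources are orthogonal; real recordings would need longer filters or an independent-component-analysis approach.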

In the present embodiment, signal separation is not performed over all channels at once. Instead, based on the similarity between channels, the unit over which signal separation is performed is made small, and channels that do not require signal separation are not input to the signal separation units. Signal separation can therefore be performed more efficiently than when it is performed over all channels.
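The efficiency argument can be illustrated with a rough cost comparison. The quadratic cost model below is purely an assumption for this example (for instance, a K x K demixing matrix has K squared coefficients); the specification gives no cost formula, and the channel counts are hypothetical.

```python
def separation_cost(num_channels):
    # Assumed cost model: work grows with the square of the channel count.
    return num_channels ** 2

def grouped_cost(group_sizes):
    """Total cost when each group is separated independently."""
    return sum(separation_cost(k) for k in group_sizes)

# Hypothetical case: M = 8 channels, of which 3 form one group, 2 form
# another, and the remaining 3 need no separation at all.
full = separation_cost(8)
grouped = grouped_cost([3, 2])
```

Under this assumed model, separating all 8 channels together costs 64 units, while separating the two groups costs 9 + 4 = 13 units.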

As described above, the present embodiment calculates the similarity between channels of the feature quantities calculated for each channel, and separates signals for the channels having a high similarity. With this configuration, channels that do not require signal separation can be excluded, so the object of the present invention, namely efficient signal separation, can be achieved.

In the embodiment described above, the feature quantity calculation units 1-1 to 1-M, the similarity calculation unit 2, the channel selection unit 3, and the signal separation units 4-1 to 4-N are configured as hardware, but all or some of them can also be configured by an information processing apparatus operating under a program.

The contents of the above embodiment can also be expressed as follows.

[Supplementary Note 1] A multi-channel acoustic signal processing method, characterized by:
calculating a feature quantity for each channel from multi-channel input signals;
calculating the similarity between channels of the per-channel feature quantities;
selecting a plurality of channels having a high similarity; and
separating signals using the input signals of the selected plurality of channels.

[Supplementary Note 2] The multi-channel acoustic signal processing method according to Supplementary Note 1, wherein the feature quantity calculated for each channel includes at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

[Supplementary Note 3] The multi-channel acoustic signal processing method according to Supplementary Note 1 or 2, wherein the index representing the similarity includes at least one of a correlation value and a distance value.

[Supplementary Note 4] The multi-channel acoustic signal processing method according to any one of Supplementary Notes 1 to 3, wherein calculating the similarity between channels and selecting a plurality of channels having a high similarity are repeated a plurality of times using different feature quantities to narrow down the channels to be selected.

[Supplementary Note 5] A multi-channel acoustic signal processing system, characterized by having:
a feature quantity calculation unit that calculates a feature quantity for each channel from multi-channel input signals;
a similarity calculation unit that calculates the similarity between channels of the per-channel feature quantities;
a channel selection unit that selects a plurality of channels having a high similarity; and
a signal separation unit that separates signals using the input signals of the selected plurality of channels.

[Supplementary Note 6] The multi-channel acoustic signal processing system according to Supplementary Note 5, wherein the feature quantity calculation unit calculates, as the feature quantity, at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

[Supplementary Note 7] The multi-channel acoustic signal processing system according to Supplementary Note 5 or 6, wherein the similarity calculation unit calculates at least one of a correlation value and a distance value as the index representing the similarity.

[Supplementary Note 8] The multi-channel acoustic signal processing system according to any one of Supplementary Notes 5 to 7, wherein
the feature quantity calculation unit calculates, for each channel, different feature quantities of different feature types, and
the similarity calculation unit selects channels a plurality of times using the different feature quantities to narrow down the channels to be selected.

[Supplementary Note 9] A program characterized by causing an information processing apparatus to execute:
feature quantity calculation processing for calculating a feature quantity for each channel from multi-channel input signals;
similarity calculation processing for calculating the similarity between channels of the per-channel feature quantities;
channel selection processing for selecting a plurality of channels having a high similarity; and
signal separation processing for separating signals using the input signals of the selected plurality of channels.

[Supplementary Note 10] The program according to Supplementary Note 9, wherein the feature quantity calculation processing calculates, as the feature quantity, at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

[Supplementary Note 11] The program according to Supplementary Note 9 or 10, wherein the similarity calculation processing calculates at least one of a correlation value and a distance value as the index representing the similarity.

[Supplementary Note 12] The program according to any one of Supplementary Notes 9 to 11, wherein
the feature quantity calculation processing and the similarity calculation processing are repeated a plurality of times using different feature quantities, and
the channel selection processing narrows down the channels to be selected.

The present invention has been described above with reference to preferred embodiments, but it is not necessarily limited to those embodiments and can be modified and carried out in various ways within the scope of its technical idea.

This application claims priority based on Japanese Patent Application No. 2009-031111 filed on February 13, 2009, the entire disclosure of which is incorporated herein.

The present invention is applicable to uses such as a multi-channel acoustic signal processing device that separates a mixed acoustic signal containing the speech of a plurality of speakers and noise, observed with a plurality of arbitrarily arranged microphones, and a program for implementing such a multi-channel acoustic signal processing device on a computer.

1-1 Feature quantity calculation unit that calculates a feature quantity from input signal 1
1-2 Feature quantity calculation unit that calculates a feature quantity from input signal 2
1-M Feature quantity calculation unit that calculates a feature quantity from input signal M
2 Similarity calculation unit
3 Channel selection unit
4-1 Signal separation unit that separates the signals of the channels selected as group 1
4-N Signal separation unit that separates the signals of the channels selected as group N

Claims

1. A multi-channel acoustic signal processing method, comprising:
calculating a feature quantity for each channel from multi-channel input signals that include a target signal in at least one channel;
calculating the similarity between channels of the per-channel feature quantities;
selecting a plurality of channels having a high similarity; and
separating the target signal included in the input signals of the selected plurality of channels by using the input signals of the selected plurality of channels.

2. The multi-channel acoustic signal processing method according to claim 1, wherein the feature quantity calculated for each channel includes at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

3. The multi-channel acoustic signal processing method according to claim 1, wherein the index representing the similarity includes at least one of a correlation value and a distance value.

4. The multi-channel acoustic signal processing method according to any one of the above claims, wherein calculating the similarity between channels and selecting a plurality of channels having a high similarity are repeated a plurality of times using different feature quantities to narrow down the channels to be selected.

5. A multi-channel acoustic signal processing system, comprising:
a feature quantity calculation unit that calculates a feature quantity for each channel from multi-channel input signals that include a target signal in at least one channel;
a similarity calculation unit that calculates the similarity between channels of the per-channel feature quantities;
a channel selection unit that selects a plurality of channels having a high similarity; and
a signal separation unit that separates the target signal included in the input signals of the selected plurality of channels by using the input signals of the selected plurality of channels.

6. The multi-channel acoustic signal processing system according to claim 5, wherein the feature quantity calculation unit calculates, as the feature quantity, at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

7. The multi-channel acoustic signal processing system according to claim 5 or 6, wherein the similarity calculation unit calculates at least one of a correlation value and a distance value as the index representing the similarity.

8. The multi-channel acoustic signal processing system according to claim 5, wherein
the feature quantity calculation unit calculates, for each channel, different feature quantities of different feature types, and
the similarity calculation unit selects channels a plurality of times using the different feature quantities to narrow down the channels to be selected.

9. A program for causing an information processing apparatus to execute:
feature quantity calculation processing for calculating a feature quantity for each channel from multi-channel input signals that include a target signal in at least one channel;
similarity calculation processing for calculating the similarity between channels of the per-channel feature quantities;
channel selection processing for selecting a plurality of channels having a high similarity; and
signal separation processing for separating the target signal included in the input signals of the selected plurality of channels by using the input signals of the selected plurality of channels.

10. The program according to claim 9, wherein the feature quantity calculation processing calculates, as the feature quantity, at least one of a time waveform, a statistic, a frequency spectrum, a logarithmic frequency spectrum, a cepstrum, a mel cepstrum, a likelihood with respect to an acoustic model, a reliability with respect to an acoustic model, a phoneme recognition result, a syllable recognition result, and a speech section length.

11. The program according to claim 9 or 10, wherein the similarity calculation processing calculates at least one of a correlation value and a distance value as the index representing the similarity.

12. The program according to any one of claims 9 to 11, wherein
the feature quantity calculation processing and the similarity calculation processing are repeated a plurality of times using different feature quantities, and
the channel selection processing narrows down the channels to be selected.