JP2005091560A - Method and apparatus for signal separation - Google Patents

Method and apparatus for signal separation

Info

Publication number
JP2005091560A
Authority
JP
Japan
Prior art keywords
signal
parameter value
identification
time
band
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003322746A
Other languages
Japanese (ja)
Inventor
Mitsunobu Kaminuma
Tsuyoki Nishikawa
Hiroshi Saruwatari
洋 猿渡
充伸 神沼
剛樹 西川
Original Assignee
Nissan Motor Co Ltd
日産自動車株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nissan Motor Co Ltd (日産自動車株式会社)
Priority to JP2003322746A
Publication of JP2005091560A
Application status: Pending

Abstract

When a necessary signal is to be separated and extracted from a plurality of signals, independent component analysis in the frequency domain and in the time domain suffers from reduced separation accuracy due to a phenomenon called permutation. This degradation is particularly noticeable when the number of signal sources is smaller than the number of sensors. An object of the present invention is therefore to solve the performance degradation caused by a mismatch between the number of signal sources and the number of sensors.
Independent component analysis (ICA) is performed sequentially in the frequency domain (FD) and the time domain (TD), and in particular the signal identification process in FDICA is divided into a plurality of sub-blocks. In this process the number of signal sources is estimated, and the result is used to substantially match the number of active sensors to the number of signal sources.
[Selection] Figure 1

Description

  The present invention relates to a method and apparatus for separating and extracting necessary signals from a plurality of signals detected by sensors such as a plurality of microphones (hereinafter abbreviated as microphones).

The technique of identifying source signals using only the observed signals, when a plurality of signals are observed in mixed form, is called Blind Source Separation (hereinafter BSS). In recent years, signal separation methods based on independent component analysis (hereinafter ICA) have become mainstream.
In such a signal separation method, for example, a plurality of sound signals are received by K microphones (sensors), and the fact that the sound signals arriving from the individual sound sources are statistically independent of one another is exploited to separate K or fewer sound sources from the received signals. Initially, sound source separation by ICA was difficult to apply to a microphone array because the arrival-time differences of the sounds coming from each source were not taken into account. In recent years, however, many techniques have been proposed that take these time differences into account, observe a plurality of sound signals with a microphone array, and obtain in the frequency domain the inverse transform of the mixing process of the signals arriving from the sound sources.

  In general, when the sound signals arriving from L sound sources are linearly mixed and observed by K microphones, the observed sound signals at a frequency f can be written as follows.

Here, S(f) is the vector of sound signals emitted from the sound sources, X(f) is the vector of signals observed by the microphone array at the sound receiving points, and A(f) is the mixing matrix composed of the propagation vectors describing the propagation characteristics of the spatial acoustic system between the sound sources and the sound receiving points; these can be written as follows.

Here, the superscript T denotes the transpose of a vector, and each primed symbol denotes an element (a scalar quantity) of the corresponding matrix. If the mixing matrix A(f) is known, then, using the observation signal vector X(f) at the sound receiving points,

the sound signal vector S(f) emitted from the sound sources can be calculated by computing the generalized inverse of A(f). In general, however, the mixing matrix A(f) is unknown, and the sound signal vector S(f) must be obtained using only the observation signal vector X(f).
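The equation images referenced in the preceding paragraphs are not reproduced in this text. A minimal reconstruction consistent with the surrounding definitions (the notation of the original figures may differ) is, in LaTeX form:

    X(f) = A(f)\,S(f), \qquad S(f) = [S_1(f), \ldots, S_L(f)]^{T}, \qquad X(f) = [X_1(f), \ldots, X_K(f)]^{T}

and, when the K-by-L mixing matrix A(f) is known,

    \hat{S}(f) = A^{+}(f)\,X(f)

where A^{+}(f) denotes a generalized (Moore-Penrose) inverse of A(f).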

  To solve the BSS problem, it is assumed that the sound signal vector S(f) is generated stochastically and that all components of S(f) are mutually independent. Since the observation signal vector X(f) detected by the microphones is a mixture of several source signals, the distributions of the components of X(f) are not independent. ICA is therefore used to search for the individual source signals among the independent components contained in the observation signal vector X(f), that is, in the mixed source signals. Specifically, a matrix W(f) (hereinafter referred to as the inverse mixing matrix) that converts the observation signal vector X(f) into independent components is calculated, and applying W(f) to X(f) yields an approximation of the sound signal vector S(f) emitted from the sources.

  As processing for obtaining the inverse transformation of the mixing process by ICA, a method of analyzing in the time domain and a method of analyzing in the frequency domain have been proposed. Here, the method of calculating in the frequency domain will be described with reference to FIG.

  In FIG. 12, the incoming signals from the sound sources are detected by the microphones 401 and 402 and then subjected to short-time frame analysis using an appropriate orthogonal transform (in FIG. 12, the short-time discrete Fourier transform, st-DFT). Plotting the complex spectral value of a specific frequency bin at the input of one microphone 401 then yields a time series. Here, a frequency bin denotes an individual complex component of the signal vector obtained by the short-time discrete Fourier transform. The same operation is performed on the input of the other microphone 402. The time-frequency signal sequence obtained in this way

can be written as follows. Next, signal separation is performed using the inverse mixing matrix W(f). Denoting by Y(f, t) the signal separated from the signals input to the microphones 401 and 402, this signal separation is expressed as follows.

Here, the inverse mixing matrix W(f) is optimized so that the L time-series outputs Y(f, t), that is, Y1'(f, t) and Y2'(f, t), are mutually independent. This processing is performed for all frequency bins. Finally, although not shown in FIG. 12, an inverse orthogonal transform is applied to the separated time series Y(f, t) to reconstruct the time waveforms of the source signals.
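As an illustration only, the per-bin separation just described can be sketched in Python/NumPy as follows. The STFT settings, the variable names, and the unmixing matrices W (which would in practice be learned by ICA for each frequency bin) are assumptions, not part of the patent text.

    import numpy as np
    from scipy.signal import stft, istft

    def separate_per_bin(mics, W, fs=8000, nperseg=256):
        """Apply a per-frequency-bin unmixing matrix to K microphone signals.

        mics : array (K, num_samples)  -- time-domain observations
        W    : array (F, L, K)         -- one L-by-K unmixing matrix per frequency bin
        Returns the separated time-domain signals, shape (L, num_samples_out).
        """
        _, _, X = stft(mics, fs=fs, nperseg=nperseg)   # X has shape (K, F, T_frames)
        F_bins, T_frames = X.shape[1], X.shape[2]
        L = W.shape[1]
        Y = np.empty((L, F_bins, T_frames), dtype=complex)
        for f in range(F_bins):                        # separation is done independently in each bin
            Y[:, f, :] = W[f] @ X[:, f, :]             # Y(f, t) = W(f) X(f, t)
        _, y = istft(Y, fs=fs, nperseg=nperseg)        # inverse transform back to time waveforms
        return y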

  In the above processing, as methods for evaluating independence and optimizing the demixing matrix, an unsupervised learning algorithm based on minimization of the Kullback-Leibler divergence (Non-Patent Document 1 below) and algorithms that decorrelate second-order or higher-order correlations have been proposed.

  In general, the frequency-domain analysis method is known to require less computation and to give better separation performance than the time-domain method. However, the frequency-domain method can suffer from a phenomenon (permutation) in which the sound sources analyzed at each frequency are swapped between adjacent frequency bins.

  On the other hand, Nishikawa et al. proposed in Non-Patent Document 2 below the processing method shown in FIG. 13, a multistage ICA (MSICA) in which ICA in the frequency domain (hereinafter FDICA) is used as the preceding stage and ICA in the time domain (hereinafter TDICA) as the subsequent stage, connecting the frequency-domain and time-domain ICAs in series so that each compensates for the weaknesses of the other. Nishikawa et al. reported that, with two signal sources and two microphones, the separation accuracy of the target signal is improved compared with the method using only the frequency domain (FDICA). However, no successful results have been reported for cases in which the numbers of signal sources and microphones exceed two.

  The ICA processing described above is not limited to sound signal processing. It is also used, for example, to separate signals that arrive mixed together in mobile communications and, as reported in Non-Patent Document 3 below, to separate and extract a target signal from measurement signals when signals generated at various locations inside the body are measured from outside using an electroencephalograph, a magnetoencephalograph, fMRI (Functional Magnetic Resonance Imaging), or the like.

"Basics of blind source separation using array signal processing", Technical report of IEICE, EA2001-7) T.Nishikawa, H.Saruwatari and K.Shikano, "Blind source separation of acoustic signals basedon Multistage ICA combining Frequency-domain ICA and Time-domain ICA", IEICETrans. Fundamentals, vol.E84-A, No.1 Jan 2001) "What is independent component analysis?" Computer Today, p38-43, 1998.9, No. 87, "Application to fMRI image analysis" Computer Today, p60-67, 2001.1 No. 95).

  One problem with the frequency-domain analysis method is the phenomenon called permutation, in which the sound sources analyzed at each frequency are swapped between adjacent frequency bins. This phenomenon arises in particular when the number of signal sources is smaller than the number of sensors, and it significantly lowers the separation accuracy of the separated target signal. In practice, however, it is difficult to keep the number of signal sources and the number of microphones always equal, and when an actual system is built, differences in the number of sound sources cause a large variation in the separation accuracy of the target signal.

  In the method of Nishikawa et al. in Non-Patent Document 2, TDICA is placed in the subsequent stage, so some benefit can be expected against the permutation problem occurring in the preceding FDICA. However, when the number of signal sources is smaller than the number of sensors, the permutation problem in each frequency bin becomes more complicated, so the drop in separation accuracy in the preceding FDICA stage is expected to be large. Moreover, if the number of signal sources is smaller than the number of sensors, the FDICA optimization learning easily falls into a local solution, resulting in poor convergence. A simple countermeasure would be to predict the number of signal sources and, from a redundantly arranged set of sensors, select the same number of sensors as the predicted number of signal sources, thereby matching the number of sensors to the number of sources. However, this approach is disadvantageous in terms of cost because not all of the sensors are used effectively.

  Accordingly, an object of the present invention is to construct a method that uses all of the sensors to secure the maximum separation performance for the target signal while preventing the separation performance from degrading even when the number of signal sources is smaller than the number of sensors, and to provide means for realizing such a method.

  To achieve the above object, the present invention employs independent component analysis (ICA). That is, as a first processing step, wave signals from a plurality of signal sources are detected by a plurality of fixed sensors, and the detected multichannel signals are subjected to a detection process such as amplification and waveform shaping. After this detection process, the signals are divided into channels for each frequency band and independent component analysis in the frequency domain (FDICA) is performed; a plurality of signal identification processes 1 then send out both a time signal group 1, which is data identified as relating to the target signal parameter values, and a time signal group 2, which is data identified as relating to unnecessary signal parameter values. Here, a parameter value corresponds to the energy contained in the frequency bin corresponding to the output of each element of the matrix representing the signals received from the plurality of sound sources.

Next, as a second processing step, the separated signals are further subjected to independent component analysis in the time domain (TDICA); a signal identification process 2 statistically analyzes the temporal characteristics of the time signal group 1 and the time signal group 2 and separates the target signal of at least one signal source. The signal identification process 2 includes a secondary attenuation process that attenuates the parameter values of the identified unnecessary signals.
In particular, in the present invention, since the signal identification process 1 analyzes the signals input from a plurality of sensors, it is divided into a plurality of sub-blocks, each using fewer than all of the sensors; when a signal group of a plurality of channels divided into frequency bands is input, each signal group is identified and processed independently in its sub-block.

  By using both independent component analysis in the frequency domain (FDICA) and independent component analysis in the time domain (TDICA), the method includes a primary attenuation process that attenuates the time signal group 2 identified in FDICA as relating to unnecessary signal parameter values, and a secondary attenuation process in TDICA that attenuates the unnecessary signal parameter values identified in addition to the target signal parameter value of the signal source. Furthermore, by dividing the signal identification process 1 into sub-blocks and estimating the number of sound sources from the received signals, the number of sound sources can be made effectively equal to the number of receiving sensors, so that signal extraction, that is, sound source separation, can be performed with high accuracy.

The basic configuration of the present invention and its operating principle are described below.
In the present invention, the permutation problem is solved by sub-blocking the FDICA placed in the preceding stage of the method of Nishikawa et al. (Non-Patent Document 2).
First, the multistage processing (hereinafter MSICA) that combines processing in the frequency domain (hereinafter FDICA) and processing in the time domain (hereinafter TDICA) is described. In the following, when the signals are expressed in the time-frequency domain, the input and output signals, namely the L-channel sound source signal vector S_L(f, m), the observed signal vector X_K(f, m), and the FDICA output signal vector Z_L(f, m), are written as follows.

The relationship between the sound source signal vector S_L(f, m) and the observed signal vector X_K(f, m) is given by the following.

Here, A_KL(f) is the K-row, L-column mixing matrix that gives the spatial propagation characteristics of the signals, m is the frame number in the short-time discrete Fourier (st-DFT) analysis, and f is the frequency. In the MSICA processing procedure, FDICA processing is first applied to the observed signals. The FDICA output signal Z_L(f, m) is obtained, using a separation matrix V_LL(f) that relates the input and output signals, as follows.

In FDICA, V_LL(f) is optimized so that the L output signals are mutually independent at each frequency f, in the same way as the processing between W(f) and Y(f, t) in FIG. 12.

  Next, the individual output signals after source separation by FDICA in the frequency domain, transformed back to the time domain as follows,

are regarded as the input signals of the next stage, TDICA, and TDICA processing is executed. Here, t denotes time and F^{-1}[ ] denotes the inverse discrete Fourier transform of the bracketed expression. The final separated signal, the TDICA output y_L(t), is given by the following. Here, w_LL(τ) is a separation filter matrix whose elements are FIR filters, and Q is the filter length. In TDICA, w_LL(τ) is optimized so that the L output signals are mutually independent.
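The MSICA equations summarized above are likewise not reproduced in this record. A standard formulation consistent with the definitions given (a reconstruction; the exact notation of the original figures may differ) is:

    X_K(f, m) = A_{KL}(f)\,S_L(f, m)
    Z_L(f, m) = V_{LL}(f)\,X_K(f, m)
    z_L(t) = F^{-1}\!\left[ Z_L(f, m) \right]
    y_L(t) = \sum_{\tau=0}^{Q-1} w_{LL}(\tau)\, z_L(t - \tau)

where V_{LL}(f) is the FDICA separation matrix optimized independently at each frequency and w_{LL}(τ) is the TDICA separation filter matrix whose elements are FIR filters of length Q.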

(Embodiment 1)
In the present invention, the observation signals obtained from the K microphones are treated as sets of L (< K) observation signals, which are regarded as the sub-blocks (FDICA1, FDICA2, ..., FDICAN) shown in FIG. 1. N sub-blocks are thus formed, and FDICA is performed in each sub-block. The FDICA separation process in the n-th sub-block is expressed by attaching the sub-block number n, in parentheses, as a superscript to (Equation 12), as follows.

Here, the sub-block quantities are defined as follows.

Next, the output signals of the N sub-blocks are regarded as the input signals of the next-stage TDICA, and TDICA processing is performed. The TDICA separation process is given by the following,

where

and w_L(L×N)(τ) is a separation filter matrix with L rows and (L × N) columns. The TDICA separation filter matrix w_L(L×N)(τ) is optimized by the following iterative learning.

Here, w^i_L(L×N)(τ) is the separation filter matrix at the i-th iteration, and α is the step size of the iterative learning. In Equation (22) the number of learning iterations may be fixed, but by terminating the learning when the identification level 2 exceeds a certain threshold, the convergence time can be shortened while the separation performance of the separation filter is guaranteed. As a calculation that gives an identification level 2, for example, the evaluation function of Nishikawa et al. (T. Nishikawa, H. Saruwatari and K. Shikano, "Blind source separation of acoustic signals based on multistage ICA combining frequency-domain ICA and time-domain ICA", IEICE Trans. Fundamentals, vol. E84-A, no. 1, Jan. 2001) may be used. That is, using the signal Y_L(f, t) obtained by frequency-transforming, in short-time frame analysis, time segments cut from the separated signal y_L(t) of (Equation 19),

learning may be repeated until the evaluation function J exceeds a certain threshold. In Equation (23), < >_t and < >_f denote averages of the bracketed expression over time and frequency respectively, the superscript H denotes the conjugate transpose of the matrix to which it is attached, diag denotes a diagonal matrix, and the double vertical bars on the right-hand side denote the Frobenius norm. Φ(Y_L(f, t)) is a function given by (Equation 24).
In the present invention, the input signals are divided into groups with a small number of channels in each sub-block, and FDICA is applied to each channel group. Therefore, even in a situation where using all the microphones would make the number of microphones larger than the number of signal sources, the drop in FDICA separation accuracy caused by this mismatch can be prevented by matching the number of channels in each channel group to the number of signal sources. Furthermore, since the output signals of all the sub-blocks are used as the TDICA input signals, the input information from all K microphones is used effectively.
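As an illustrative sketch only (the grouping pattern, the function name, and the array layout are assumptions, not taken from the patent), the sub-blocking of K microphone channels into N groups of L channels, followed by stacking all sub-block outputs as the TDICA input, might look like this in Python/NumPy:

    import numpy as np

    def subblock_fdica_outputs(X, groups, V):
        """Apply per-sub-block FDICA separation matrices and stack the results.

        X      : complex array (K, F, T)  -- STFT of the K microphone signals
        groups : list of channel-index lists, one per sub-block (each of length L)
        V      : list of arrays (F, L, L) -- separation matrices per sub-block and bin
        Returns Z of shape (N*L, F, T), the stacked sub-block outputs fed to TDICA.
        """
        outputs = []
        for n, chans in enumerate(groups):
            Xn = X[chans]                               # (L, F, T) observations of sub-block n
            Zn = np.einsum('flk,kft->lft', V[n], Xn)    # Z(n)(f, m) = V(n)(f) X(n)(f, m)
            outputs.append(Zn)
        return np.concatenate(outputs, axis=0)          # all sub-block outputs form the TDICA input

    # Example grouping for K = 12 microphones with L = 2 predicted sources (N = 6 sub-blocks)
    groups = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]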

FIG. 2 shows a block diagram of the above process.
In FIG. 2, the observation signals are detected and converted into electrical signals by the sensors 10-1 to 10-n and the detection process 20. The next step, the band division process 30, gives the observation signal X^(n)_L(f, m) of Equation (16). This band-divided signal is input to the signal identification process 1 indicated by 40, and the separation matrix V^(n)_LL(f) is obtained. Here, the signal identification process 1 statistically analyzes, for the parameter values indicating the state of the signal in each frequency-analyzed band of each channel, the temporal and frequency characteristics of the detection signals that arise from the differences in spatial position between the signal sources and the sensors and from the types of the signal sources (in a vehicle, for example, human voice, engine noise, road noise, and so on); from these parameter values it calculates an identification level 1 that identifies the target signal parameter value of at least one signal source among the signals input from the same signal source, and, using the identification level 1, it sends out either the time signal group 1 identified as a signal related to the target signal parameter value or the time signal group 2 identified as a signal related to the unnecessary signal parameter values. A plurality of such signal identification processes are provided.

  Next, in the primary attenuation process 50, the calculation of Expression (16) is executed to compute the output signal Z^(n)_L(f, m) of the n-th FDICA sub-block. Based on the result of this processing, the separation filter matrix w_L(L×N)(τ) of Equation (19) is obtained by the signal identification process 2 indicated by 60. In the signal identification process 2 indicated by 60, the signal identification level 2 of the separation filter is calculated, and learning is repeated until the signal identification level 2 reaches a desired level. In the secondary attenuation process 70, the calculation of Equation (19) is executed to compute the separated signal y_L(t). Equations (16) to (21) above are merely examples and do not represent the only calculation method or the whole of the present invention.
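A minimal sketch of the stop-at-threshold learning described above, assuming generic callbacks for the update step and for the separated output (the Equation (22) update and the Equation (23) score are not reproduced; the decorrelation-based stand-in score below is an assumption):

    import numpy as np

    def identification_level_2(y):
        """Stand-in score: close to 1 when the separated outputs y (L x T) are mutually decorrelated."""
        c = np.corrcoef(y)
        off_diag = c[~np.eye(c.shape[0], dtype=bool)]
        return 1.0 - float(np.max(np.abs(off_diag)))

    def learn_until_threshold(w, update_step, separate, threshold=0.9, max_iters=500):
        """Repeat a TDICA filter update until identification level 2 reaches the threshold.

        w           : initial separation filter matrix (any object the callbacks accept)
        update_step : callable w -> updated w   (e.g. one iteration of the Equation (22) rule)
        separate    : callable w -> separated signals y of shape (L, T)
        """
        for _ in range(max_iters):
            if identification_level_2(separate(w)) >= threshold:
                break                                   # separation quality judged sufficient: stop early
            w = update_step(w)
        return w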

To show the effect of the present invention, a sound source separation experiment was performed by off-line simulation. In this experiment, the separation accuracy of 2-channel MSICA is compared with that of 12-channel MSICA, to which the present invention is applied.
As the sound source signals, impulse responses with a reverberation time of 300 ms from the RWCP database, which is often used for this kind of experiment, were convolved with the source signals to produce reverberation-added sound (sampling frequency: 8 kHz).
The signal sources were speech signals, and experiments were conducted on 12 combinations of speakers and source positions. The numbers of microphones were 2 (2ch-MSICA) and 12 (12ch-MSICA, the proposed method); the microphones were arranged linearly at intervals of 2.83 cm at a height of 1.46 m from the floor. The sound source signals were assumed to arrive from two different azimuths at a height of 1.72 m from the floor, under two conditions (source arrangement pattern 1: sound radiated from azimuths of −60° and +40°; source arrangement pattern 2: sound radiated from azimuths of −40° and +20°, where the 0° direction is perpendicular to the microphone row). The distance between the sound sources and the center of the microphone array was 2.02 m, and the SNR when the two signals were mixed was 0 dB.

For FDICA, the separation method proposed by Saruwatari et al. (H. Saruwatari et al.: Proc. Eurospeech 2001, vol. 4, pp. 2603-2606, Sep. 2001) was used, and for TDICA, the separation method proposed by Choi et al. (S. Choi et al.: Proc. International Conference on ICA and BSS, pp. 371-376, Jan. 1999). The FDICA separation filter in each sub-block has 1024 taps, and its initial value is a blind-spot-controlling beamformer that forms blind spots (nulls) toward ±60°. The TDICA separation filter was set to 2048 taps. In this experiment, the noise reduction rate (NRR; output SNR [dB] − input SNR [dB]) was used as the objective evaluation measure of separation accuracy.
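For reference, the NRR measure defined above is simply the output SNR minus the input SNR; a short sketch (the convention of estimating each SNR from separate target and interference components is an assumption, not specified in the patent):

    import numpy as np

    def snr_db(target, interference):
        """SNR in dB between a target component and an interference component."""
        return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

    def noise_reduction_rate(target_out, interf_out, target_in, interf_in):
        """NRR = output SNR [dB] - input SNR [dB]."""
        return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)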

FIG. 3 shows the results of separation accuracy for two experimental conditions.
First, the convergence of the separation filters is examined. Under all conditions the NRR value is positive, so the SNR is improved relative to the mixture. The separation filter of the 2ch-MSICA therefore converges well, and the separation filter also converges well in the 12ch-MSICA, where the number of signal sources is small relative to the number of microphones.

  Next, the separation accuracy of the two methods is compared. In source arrangement pattern 1, the NRR of 2ch-MSICA is 11.92 dB while that of 12ch-MSICA is 15.06 dB, a performance improvement of 3.14 dB. In source arrangement pattern 2, the NRR of 2ch-MSICA is 7.92 dB while that of 12ch-MSICA is 10.98 dB, an improvement of 3.06 dB. It can thus be seen that the present invention improves the separation accuracy compared with the conventional method.

Next, the basic configuration of the apparatus corresponding to the above processing procedure is described with reference to FIG. 4, and the configuration of the central part of the processing apparatus with reference to FIG. 5.
The sensor means 110-1 to 110-n and the detection means 120 shown in FIG. 4 receive and detect the incoming observation signals. They can be realized by a sensor group such as microphones, shown as the sensors 210-1 to 210-n in FIG. 5, together with the filter 220 and the A/D converter 230.
The sensor group 210-1 to 210-n consists of a plurality (n) of sensors arranged at different spatial positions, each having the function of detecting a wave signal such as light, sound, vibration, a magnetic change, a magnetic-field change, electricity, or a radio wave and converting it into an electrical signal. Specifically, one or more sensors that detect waves, such as optical sensors, acoustic sensors, microphones, vibration sensors, magnetic sensors, electrical sensors, and antennas, are used.

The filter is used to remove noise contained in the electrical signal obtained from the sensors. A band-pass filter that removes from the signal detected by each sensor only those components that cannot be characteristics of the signal sources is sufficient, and this can be realized with an existing electrical filter circuit.
The A/D converter need only be a device whose sampling frequency is sufficient to discretize signals in the band of the signal sources accurately, and it can be realized by using an A/D conversion circuit or the like that converts the signal into a discrete information signal.

The band dividing means 130 in FIG. 4 converts the detected signals into a mathematically orthogonal space using an orthogonal transform. Specifically, a frequency transform such as the discrete Fourier transform, the Z transform, or the Laplace transform may be used, and the computation can be carried out by the arithmetic device 240 and the storage device 250 in FIG. 5.
The arithmetic device 240 is configured by combining one or more computers, or main arithmetic circuits such as the CPU, MPU, DSP, or FPGA of a general computer, with peripheral sub-arithmetic circuits and storage circuits. The storage device 250 is a device or medium that can store electrical signals, as represented by cache memory, main memory, disk memory, compact discs, flash memory, DVDs, tape, floppy disks, magneto-optical disks, MDs, and DAT.
The signal identification means 1 indicated by 140 calculates the identification level 1 of the separation filter in each frequency band and performs the operation of extracting the target signal from the divided signals. This can be realized by the arithmetic device 240 and the storage device 250 of FIG. 5.

  The primary attenuation means 150, which executes the FDICA processing, and the secondary attenuation means 160, which executes the TDICA processing, shown in FIG. 4, extract the necessary target signal from the input signals and attenuate the other, unnecessary signals. They can be realized by the arithmetic device 240 and the storage device 250 in FIG. 5.

(Embodiment 2)
The primary attenuation process of the present invention aims to separate the signals of a plurality of directional signal sources. In a real environment, however, diffuse signal sources also exist, and they degrade the source separation performance. A processing method, and an apparatus realizing it, are therefore required that prevent components other than the signal emitted from the target signal source from mixing in, in frequency bands where signal separation is difficult even with FDICA because of the presence of diffuse signal sources. In Embodiment 2, processing that removes diffuse noise by frequency band suppression (SBE, SubBand Elimination) is described.

First, processing using the method according to the present invention is described with reference to FIG. 6.
In FIG. 6, the sensors 10-1 to 10-n, the detection process 20, the band division process 30, and the primary attenuation process 50 are as described above. In FIG. 6, the primary sub-attenuation process 55 regards as an unnecessary component the parameter value indicating the state of the signal in any frequency band in which the identification level 1 does not reach a predetermined level in the primary attenuation process 50 and separation is therefore difficult, and it suppresses the corresponding time signal group 2. In this way, at least two sensors are used, and the identification level 1 is made higher when the target signal parameter value of at least one signal source obtained from the parameter values of the signal identification process 1 has high independence from the unnecessary signal parameter values in time, frequency, and geometric space. By providing the primary sub-attenuation process 55 after the primary attenuation process 50 in this manner, the time signal group 2 identified by the parameter value indicating the state of an unnecessary signal that is difficult to suppress within a frequency band can be suppressed more easily.

To determine the frequency bands of the unnecessary signals that were difficult to separate, for example, the band suppression method using a cost function based on the cosine distance proposed by Saruwatari et al. (Saruwatari et al., IEICE Technical Report EA2002-8, on in-car speech recognition using blind source separation and subband elimination processing) can be introduced. In the technique of Saruwatari et al., attention is paid to the fact that frequency bands with a larger value of the cost function used in the FDICA computation degrade the accuracy of signal separation by ICA, and the SNR improvement rate is raised by suppressing those bands.
That is, a cosine distance indicating the difference between the target signal parameter value and the unnecessary signal parameter values is defined as a cost function, and when the value of the cost function is low the independence is considered high. Here, the cost function evaluates the independence between the separated signals and can be obtained using higher-order correlation values between the separated signals or the cosine distance in the signal matrix space. The latter method using the cosine distance, in particular, is considered efficient because of its small amount of computation. Equation (25) expresses the cost function J(f) based on the cosine distance between two sound sources.

In Equation (25), Y1(f, t) and Y2(f, t) are the separated signals after the unnecessary bands have been removed, < > denotes the time average, and * denotes the complex conjugate. Processing such as smoothing is needed to apply the cost function obtained in this way in practice, but in any case this method can be introduced into the present invention; by removing the bands containing diffuse noise before the secondary attenuation process 70, an improvement in the separation performance of the secondary attenuation process 70 can be expected.
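A small sketch of a cosine-distance style cost per frequency bin, in the spirit of Equation (25) (the exact form of the patent's equation is not reproduced; the normalized inner product over time and the threshold rule below are assumptions):

    import numpy as np

    def cosine_cost(Y1, Y2):
        """Cost J(f) between two separated signals Y1, Y2 of shape (F, T).

        Values near 1 mean the two outputs remain similar (poorly separated);
        values near 0 mean they are nearly orthogonal (highly independent).
        """
        num = np.abs(np.mean(Y1 * np.conj(Y2), axis=1))
        den = np.sqrt(np.mean(np.abs(Y1) ** 2, axis=1) * np.mean(np.abs(Y2) ** 2, axis=1))
        return num / np.maximum(den, 1e-12)

    def bands_to_suppress(J, threshold=0.8):
        """Frequency bins whose cost exceeds the threshold are candidates for suppression."""
        return np.where(J > threshold)[0]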

  In the correction process 80 in FIG. 6, every time the primary attenuation process 50 for separation is calculated in the signal identification process 1 indicated by 40, the cost function J(f) is referred to and the frequency bands to be suppressed in the primary sub-attenuation process 55 are corrected, so that learning proceeds.

(Embodiment 3)
FIG. 7 shows Embodiment 3 of the present invention. Embodiment 3 is described with reference to FIGS. 7 and 5.

  The primary sub-attenuation means 155 in FIG. 7 suppresses the parameter values of the frequency bands that are difficult to separate with the primary attenuation means 150. Here, a parameter value corresponds to the energy contained in the frequency bin corresponding to the output of each element of the matrix representing the signals received from the plurality of sound sources. More specifically, the suppression of a parameter value may be performed by reducing the energy of the parameter value in the target frequency band; for example, the parameter value may be reduced to 1/n, or the target frequency band may be removed by combining a notch filter. This can be realized by using the arithmetic device 240 and the storage device 250 in FIG. 5.
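An illustrative sketch of the band suppression just described (the attenuation factor and the zeroing option are examples; the specific values are assumptions):

    import numpy as np

    def suppress_bands(Y, bad_bins, factor=0.1, remove=False):
        """Attenuate or remove the listed frequency bins of a time-frequency signal.

        Y        : complex array (F, T) for one channel
        bad_bins : indices of frequency bins judged difficult to separate
        factor   : multiplicative attenuation (e.g. 1/n) applied when remove is False
        """
        Y = Y.copy()
        if remove:
            Y[bad_bins, :] = 0.0          # notch-like removal of the whole band
        else:
            Y[bad_bins, :] *= factor      # reduce the energy of the band to a fraction
        return Y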

  The storage means 180 in FIG. 7 stores information on the bandwidths of the signals divided by the band dividing means 130 and information on the identification level used when identifying signals in the signal identification means 1 indicated by 140. The bandwidth information includes, for example, information on the analysis width used when the signal is analyzed and frequency-transformed, and information on the analysis width used when the signal is converted into subbands after frequency transformation. The identification-level information includes information on the cost function. The correction means 190 in FIG. 7 corrects the primary sub-attenuation means 155 with reference to the cost function every time the primary attenuation means 150 for separation is calculated in the signal identification means 1 indicated by 140.

(Embodiment 4)
In FDICA, good separation accuracy is obtained when the number of signal sources matches the number of microphones. Means for matching the number of signal sources with the number of microphones must therefore be realized, and for this the number of signal sources must be predicted.
For example, consider applying the present invention to a hands-free voice communication and voice input device in a vehicle cabin. In this case the target signal is the user's voice, and the unnecessary signals are the various noises generated in the vehicle, such as road noise, engine noise, and air-conditioner noise. Road noise can be predicted from the detected vehicle speed, engine noise from the vehicle speed and the presence or absence of idling, and air-conditioner noise from whether the air conditioner is on or off and from the state of the outlet switching. In Embodiment 4, the presence or absence of these unnecessary signals is predicted, and the process of determining the number of sub-blocks of the signal identification means 1 and the channels, and number of channels, assigned to each sub-block is described.
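A toy sketch of the source-count prediction described above (the rules, thresholds, and argument names are invented for illustration and are not specified in the patent):

    def predict_source_count(vehicle_speed_kmh, idling, aircon_on):
        """Estimate the number of active signal sources: the user's voice plus
        whichever in-vehicle noise sources the vehicle state suggests are present."""
        sources = 1                          # the user's voice (target signal)
        if vehicle_speed_kmh > 0:
            sources += 1                     # road noise expected while the vehicle is moving
        if idling or vehicle_speed_kmh > 0:
            sources += 1                     # engine noise while idling or driving
        if aircon_on:
            sources += 1                     # air-conditioner noise
        return sources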

FIG. 8 is a block diagram of this processing system. The configuration of the apparatus according to the present invention is described with reference to FIG. 8.
The prediction means 181 in FIG. 8 predicts the occurrence of the various unnecessary signals in the actual environment and sends the predicted number of signal sources to the changing means 191. The changing means 191 determines, according to the number of signal sources, the number of sub-blocks included in the signal identification means 1 indicated by 140 in FIG. 8 and the channels, and number of channels, assigned to each sub-block, and changes them accordingly.

(Embodiment 5)
The number of sub-blocks, and the channels and number of channels corresponding to the input/output terminals of each sub-block, can be determined as follows.

  FIG. 9 is a block diagram of a system for determining the above numbers. Hereinafter, the fifth embodiment will be described with reference to FIGS. 10 and 5 together.

  The table means 200 in FIG. 9 stores a plurality of standard patterns describing the number of sub-blocks to be set by the changing means 191 and the channels, and number of channels, corresponding to the input/output terminals of each sub-block. The table means 200 can be realized by using the storage device 250 of FIG. 5. An example of a standard pattern is shown in FIG. 10. FIG. 10 shows, for a system with five sensors corresponding to microphones, the sub-blocks formed when two signal sources are predicted, the paths from each sensor to each sub-block, and the number of channels input to each sub-block. In this case three sub-blocks are generated, and the number of channels input to each sub-block is two, the same as the predicted number of signal sources. The paths from the sensors to the sub-blocks are as follows: sensor 1 and sensor 2 are the inputs of sub-block 1, sensor 3 and sensor 4 are the inputs of sub-block 2, and sensor 4 and sensor 5 are the inputs of sub-block 3. The number of channels of the output signal from the TDICA corresponding to the secondary attenuation means 170 is two, the same as the predicted number of signal sources. Since these standard patterns of the number of sub-blocks and of the channels and channel counts per sub-block are affected by the positions of the sensors, it is desirable to determine them according to the environment to which the present invention is applied.
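As a purely illustrative encoding of the FIG. 10 example (the data-structure layout and key names are assumptions), the standard pattern for five sensors and two predicted sources could be held in a small table:

    # Standard pattern table: (number of sensors, predicted number of sources) -> sub-block wiring.
    # The 5-sensor / 2-source entry mirrors the FIG. 10 example described in the text.
    STANDARD_PATTERNS = {
        (5, 2): {
            "num_subblocks": 3,
            "channels_per_subblock": 2,     # equal to the predicted number of sources
            "sensor_to_subblock": [
                [1, 2],                     # sub-block 1
                [3, 4],                     # sub-block 2
                [4, 5],                     # sub-block 3 (sensor 4 is shared)
            ],
            "tdica_output_channels": 2,     # equal to the predicted number of sources
        },
    }

    def lookup_pattern(num_sensors, predicted_sources):
        """Return the stored wiring pattern for the given sensor and source counts, if any."""
        return STANDARD_PATTERNS.get((num_sensors, predicted_sources))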

(Embodiment 6)
In Embodiment 6, as a method of sensor arrangement, a method is disclosed in which sub-sensor arrays are constructed and one sub-block is associated with each sub-sensor array. Each of the sub-sensor arrays 304, 305, and 306 shown in FIG. 11 contains two sensors, and one sub-block is arranged after each sub-sensor array. When this method is used, it is assumed in advance that there are two signal sources, so it is desirable to arrange the sub-sensor arrays so that the signals can be detected in a spatially independent manner. By arranging a plurality of microphones as sensors in this way, the freedom of sensor placement is increased and practicality is improved.
Each of the above embodiments is merely an example to which the present invention is applied, and does not limit the scope of application of the present invention.

FIG. 1 is a system diagram showing the flow of signals in the present invention. FIG. 2 is a flowchart explaining the signal processing process according to Embodiment 1. FIG. 3 is a histogram showing signal processing results obtained with the present invention. FIG. 4 is a block diagram showing the basic configuration of the signal processing device according to Embodiment 1. FIG. 5 is a block diagram showing the interconnection of the central hardware of the signal processing system. FIG. 6 is a flowchart explaining the signal processing process according to Embodiment 2. FIG. 7 is a block diagram of the signal processing device according to Embodiment 3. FIG. 8 is a block diagram of the signal processing device according to Embodiment 4. FIG. 9 is a block diagram of the signal processing device according to Embodiment 5. FIG. 10 is an interconnection diagram of the sub-blocks in Embodiment 5. FIG. 11 is an interconnection diagram of the sub-blocks in Embodiment 6. FIG. 12 is a processing system diagram explaining conventional signal separation processing. FIG. 13 is a processing system diagram explaining another conventional signal separation processing.

Explanation of symbols

10-1 to 10-n, 110-1 to 110-n, 210-1 to 210-n: Sensor
20: Detection process
30: Band division process
40: Signal identification process 1
50: Primary attenuation process
55: Primary sub-attenuation process
60: Signal identification process 2
70: Secondary attenuation process
80: Correction process
120: Detection means
130: Band division means
140: Signal identification means 1
150: Primary attenuation means
155: Primary sub-attenuation means
160: Signal identification means 2
170: Secondary attenuation means
180: Storage means
181: Prediction means
190: Correction means
191: Changing means
200: Table means
220: Filter
230: A/D converter
240: Arithmetic device
250: Storage device
300: Sub-block 1
301: Sub-block 2
302: Sub-block 3
303: TDICA
304, 305, 306: Sub-sensor array

Claims (15)

  1. A signal separation method for detecting wave signals emitted from a plurality of signal sources with a plurality of fixed sensors and separating the signal of at least one signal source from the detected signals, the method comprising:
    A detection process of inputting signals from the plurality of sensors;
    A band division process of dividing a plurality of detection signals detected by the detection process into channels for each frequency band;
    A plurality of signal identification processes 1 which, for the parameter values indicating the state of the signal in each band divided for each channel by the band division process, statistically analyze the temporal characteristics and frequency characteristics of the detection signals caused by differences in spatial position between the signal sources and the sensors and by the signal types, thereby calculating an identification level 1 for identifying, from the parameter values, a target signal parameter value of at least one signal source among the signals input from the same signal source, and which, using the identification level 1, identify a time signal group 1 that is a signal related to the target signal parameter value and a time signal group 2 that is a signal related to an unnecessary signal parameter value and send out the time signal group 1 and the time signal group 2;
    A primary attenuation process for attenuating the time signal group 2;
    A signal identification process 2 which, for the time signal group 1 and the time signal group 2 sent out from the signal identification process 1, statistically analyzes the temporal characteristics of the time signal group 1 and the time signal group 2 caused by differences in spatial position between the signal sources and the sensors and by the signal types of the signal sources, and separates a target signal of at least one signal source; and
    A secondary attenuation process for attenuating, in the signal identification process 2, unnecessary signal parameter values other than the target signal parameter value of the identified at least one signal source;
    wherein the signal identification process 1 is composed of a plurality of sub-blocks, each using fewer than the total number of sensors, for analyzing the signals input from the plurality of sensors, and
    wherein, when a signal group of a plurality of channels detected by the detection process and divided by the band division process is input to each sub-block, the signal group is identified and processed independently in each sub-block.
  2. The signal separation method according to claim 1, wherein a signal identification level 2 is calculated in the signal identification process 2 and learning is repeated until the signal identification level 2 reaches a desired level.
  3. The signal separation method according to claim 1 or 2, wherein at least two sensors are used, and the identification level 1 identified from the parameter values of the signal identification process 1 is increased when the target signal parameter value of at least one signal source has higher independence, in time, frequency, and geometric space, from the unnecessary signal parameter values.
  4. The signal separation method according to claim 1 or 2, wherein a cosine distance indicating the difference between the target signal parameter value and the unnecessary signal parameter values is defined as a cost function, the independence is regarded as high when the value of the cost function is low, and the identification level 1 is increased accordingly.
  5. The signal separation method according to claim 1, comprising a primary sub-attenuation process in which, in a band where the identification level 1 calculated in the signal identification process 1 does not reach a desired level, the parameter value of that band is regarded as an unnecessary component and the signal of that band is attenuated.
  6. A signal separation method for detecting wave signals emitted from a plurality of signal sources with a plurality of fixed sensors and separating the signal of at least one signal source from the detected signals, the method comprising:
    A detection process of inputting signals from the plurality of sensors;
    A band division process of dividing a plurality of detection signals detected by the detection process into channels for each frequency band;
    A plurality of signal identification processes 1 which, for the parameter values indicating the state of the signal in each band divided for each channel by the band division process, statistically analyze the temporal characteristics and frequency characteristics of the detection signals caused by differences in spatial position between the signal sources and the sensors and by the signal types, thereby calculating an identification level 1 for identifying, from the parameter values, a target signal parameter value of at least one signal source among the signals input from the same signal source, and which, using the identification level 1, identify a time signal group 1 that is a signal related to the target signal parameter value and a time signal group 2 that is a signal related to an unnecessary signal parameter value and send out the time signal group 1 and the time signal group 2;
    A primary attenuation process for attenuating the time signal group 2;
    A signal identification process 2 which, for the time signal group 1 and the time signal group 2 sent out from the signal identification process 1, statistically analyzes the temporal characteristics of the time signal group 1 and the time signal group 2 caused by differences in spatial position between the signal sources and the sensors and by the signal types of the signal sources, and separates a target signal of at least one signal source;
    A secondary attenuation process for attenuating, in the signal identification process 2, unnecessary signal parameter values other than the target signal parameter value of the identified at least one signal source; and
    A primary sub-attenuation process in which, in a band where the identification level 1 calculated in the signal identification process 1 does not reach a desired level, the parameter value of that band is regarded as an unnecessary component and the signal of that band is attenuated.
  7. A signal separation device for detecting wave signals emitted from a plurality of signal sources with a plurality of fixed sensors and separating the signal of at least one signal source from the detected signals, the device comprising:
    Detecting means for inputting signals from the plurality of sensors;
    Band dividing means for dividing a plurality of detection signals detected by the detecting means into channels for each frequency band;
    A plurality of signal identification means 1 which, for the parameter values indicating the state of the signal in each band divided for each channel by the band dividing means, statistically analyze the temporal characteristics and frequency characteristics of the detection signals caused by differences in spatial position between the signal sources and the sensors and by the signal types, thereby calculating an identification level 1 for identifying, from the parameter values, a target signal parameter value of at least one signal source among the signals input from the same signal source, and which, using the identification level 1, identify a time signal group 1 that is a signal related to the target signal parameter value and a time signal group 2 that is a signal related to an unnecessary signal parameter value and send out the time signal group 1 and the time signal group 2;
    Primary attenuation means for attenuating the time signal group 2;
    Signal identification means 2 which, for the time signal group 1 and the time signal group 2 sent out from the signal identification means 1, statistically analyzes the temporal characteristics of the time signal group 1 and the time signal group 2 caused by differences in spatial position between the signal sources and the sensors and by the signal types of the signal sources, and separates a target signal of at least one signal source; and
    Secondary attenuation means for attenuating, in the signal identification means 2, unnecessary signal parameter values identified other than the target signal parameter value of the at least one identified signal source;
    wherein the signal identification means 1 is composed of a plurality of sub-blocks, each using fewer than the total number of sensors, for analyzing the signals input from the plurality of sensors, and
    wherein, when a signal group of a plurality of channels detected by the detecting means and divided by the band dividing means is input to each sub-block, the signal group is identified and processed independently in each sub-block.
  8. The signal separation device according to claim 7, wherein the signal identification means 2 calculates a signal identification level 2 and repeats learning until the signal identification level 2 reaches a desired level.
  9. The signal separation device according to claim 7 or 8, wherein at least two sensors are used, and the identification level 1 identified from the parameter values of the signal identification means 1 is increased when the target signal parameter value of at least one signal source has higher independence, in time, frequency, and geometric space, from the unnecessary signal parameter values.
  10. The signal separation device according to claim 7 or 8, wherein a cosine distance indicating the difference between the target signal parameter value and the unnecessary signal parameter values is defined as a cost function, the independence is regarded as high when the value of the cost function is low, and the identification level 1 is increased accordingly.
  11. The signal separation device according to claim 7, comprising primary sub-attenuation means which, in a band where the identification level 1 calculated by the signal identification means 1 does not reach a desired level, regards the parameter value of that band as an unnecessary component and attenuates the signal of that band.
  12. The signal separation device according to claim 7, further comprising: predicting means for predicting the number of signal sources; and changing means for changing the number of sub-blocks, and the channels and number of channels input to each sub-block, based on the predicted number of signal sources.
  13. The signal separation device according to claim 12, further comprising table means for holding standard patterns of the number of sub-blocks determined by the changing means and of the channels and the number of channels input to each sub-block.
  14. The signal separation device according to claim 7, wherein one or more sub-sensor arrays, each configured with fewer microphones than the total number of microphones and installed at positions where spatial independence from one another is easily maintained, are arranged so that one sub-sensor array is allocated to each sub-block.
  15. A signal separation device for detecting wave signals emitted from a plurality of signal sources with a plurality of fixed sensors and separating the signal of at least one signal source from the detected signals, the device comprising:
    Detecting means for inputting signals from the plurality of sensors;
    Band dividing means for dividing a plurality of detection signals detected by the detecting means into channels for each frequency band;
    A plurality of signal identification means 1 which, for the parameter values indicating the state of the signal in each band divided for each channel by the band dividing means, statistically analyze the temporal characteristics and frequency characteristics of the detection signals caused by differences in spatial position between the signal sources and the sensors and by the signal types, thereby calculating an identification level 1 for identifying, from the parameter values, a target signal parameter value of at least one signal source among the signals input from the same signal source, and which, using the identification level 1, identify a time signal group 1 that is a signal related to the target signal parameter value and a time signal group 2 that is a signal related to an unnecessary signal parameter value and send out the time signal group 1 and the time signal group 2;
    Primary attenuation means for attenuating the time signal group 2;
    Signal identification means 2 which, for the time signal group 1 and the time signal group 2 sent out from the signal identification means 1, statistically analyzes the temporal characteristics of the time signal group 1 and the time signal group 2 caused by differences in spatial position between the signal sources and the sensors and by the signal types of the signal sources, and separates a target signal of at least one signal source;
    Secondary attenuation means for attenuating, in the signal identification means 2, unnecessary signal parameter values other than the target signal parameter value of the identified at least one signal source; and
    Primary sub-attenuation means which, in a band where the identification level 1 calculated by the signal identification means 1 does not reach a desired level, regard the parameter value of that band as an unnecessary component and attenuate the signal of that band.
JP2003322746A 2003-09-16 2003-09-16 Method and apparatus for signal separation Pending JP2005091560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003322746A JP2005091560A (en) 2003-09-16 2003-09-16 Method and apparatus for signal separation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003322746A JP2005091560A (en) 2003-09-16 2003-09-16 Method and apparatus for signal separation

Publications (1)

Publication Number Publication Date
JP2005091560A true JP2005091560A (en) 2005-04-07

Family

ID=34454013

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003322746A Pending JP2005091560A (en) 2003-09-16 2003-09-16 Method and apparatus for signal separation

Country Status (1)

Country Link
JP (1) JP2005091560A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007096418A (en) * 2005-09-27 2007-04-12 Chubu Electric Power Co Inc Separation method of a plurality of sound sources
JP4652191B2 (en) * 2005-09-27 2011-03-16 中部電力株式会社 Multiple sound source separation method
JP2007219479A (en) * 2006-01-23 2007-08-30 Kobe Steel Ltd Device, program, and method for separating sound source
JP4496186B2 (en) * 2006-01-23 2010-07-07 国立大学法人 奈良先端科学技術大学院大学 Sound source separation device, sound source separation program, and sound source separation method
WO2007083814A1 (en) * 2006-01-23 2007-07-26 Kabushiki Kaisha Kobe Seiko Sho Sound source separation device and sound source separation method
JP2007295085A (en) * 2006-04-21 2007-11-08 Kobe Steel Ltd Sound source separation apparatus, and sound source separation method
WO2008072566A1 (en) * 2006-12-12 2008-06-19 Nec Corporation Signal separation reproduction device and signal separation reproduction method
US8345884B2 (en) 2006-12-12 2013-01-01 Nec Corporation Signal separation reproduction device and signal separation reproduction method
JP5131596B2 (en) * 2006-12-12 2013-01-30 日本電気株式会社 Signal separating / reproducing apparatus and signal separating / reproducing method
JP2009257933A (en) * 2008-04-17 2009-11-05 Kobe Steel Ltd Magnetic field measuring device, nondestructive inspection device, and magnetic field measurement signal processing method

Similar Documents

Publication Publication Date Title
JP3522954B2 (en) Microphone array input type speech recognition apparatus and method
Simmer et al. Post-filtering techniques
JP4671303B2 (en) Post filter for microphone array
US9280965B2 (en) Method for determining a noise reference signal for noise compensation and/or noise reduction
KR101258491B1 (en) Method and apparatus of processing audio signals in a communication system
Benesty et al. On microphone-array beamforming from a MIMO acoustic signal processing perspective
US20030138116A1 (en) Interference suppression techniques
JP2004334218A (en) Method and system for microphone array and method and device for speech recognition using same
Markovich et al. Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals
US7533015B2 (en) Signal enhancement via noise reduction for speech recognition
EP2245861B1 (en) Enhanced blind source separation algorithm for highly correlated mixtures
US7295972B2 (en) Method and apparatus for blind source separation using two sensors
KR20130117750A (en) Monaural noise suppression based on computational auditory scene analysis
JP5007442B2 (en) System and method using level differences between microphones for speech improvement
EP1887831B1 (en) Method, apparatus and program for estimating the direction of a sound source
JP4195267B2 (en) Speech recognition apparatus, speech recognition method and program thereof
EP2063419B1 (en) Speaker localization
JP4690072B2 (en) Beam forming system and method using a microphone array
Yoshioka et al. Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening
Ikram et al. Permutation inconsistency in blind speech separation: Investigation and solutions
JP5587396B2 (en) System, method and apparatus for signal separation
ES2398407T3 (en) Robust two microphone noise suppression system
JP4815661B2 (en) Signal processing apparatus and signal processing method
US8363850B2 (en) Audio signal processing method and apparatus for the same
EP2237271A1 (en) Method for determining a signal component for reducing noise in an input signal

Legal Events

Date        Code   Title / Description
2006-07-27  A621   Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2009-06-01  A977   Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2009-06-30  A131   Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2009-11-04  A02    Decision of refusal (JAPANESE INTERMEDIATE CODE: A02)