WO2005024788A1 - Signal separation method, signal separation device, signal separation program, and recording medium
- Publication number: WO2005024788A1
- Application number: PCT/JP2004/012629
- Authority: WO, WIPO (PCT)
- Prior art keywords: signal, value, vector, mask, observed
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
- G06F18/21347—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis using domain transformations
Definitions
- The present invention relates to the technical field of signal processing, and in particular to a signal separation method, a signal separation device, a signal separation program, and a recording medium storing the program, for estimating a target signal in situations where the required source signal (target signal) cannot be observed directly on its own and is observed only with other signals superimposed on it.
- Blind source separation (BSS) is a known technique that takes a mixed signal, obtained by mixing a plurality of source signals (such as audio signals), and separates and extracts the source signals as they were before mixing, without using knowledge of the source signals or of the mixing process.
- FIG. 27A is a block diagram conceptually illustrating this blind signal separation technique.
- The process in which the source signals $s_i$ emitted from the signal sources 701 are mixed and observed by the sensors 702 is called the "mixing process", and the process of extracting separated signals from the observations of the sensors 702 is called the "separation process".
- Let $N$ be the number of signal sources 701, $M$ the number of sensors 702, $s_i$ the signal (source signal) emitted by the i-th signal source 701 (signal source i), and $h_{ji}$ the impulse response from signal source i to the j-th sensor 702 (sensor j). The signal $x_j$ observed by sensor j is then modeled as the convolutive mixture of the source signals $s_i$ and the impulse responses $h_{ji}$:
- [Equation 1] $$x_j(t) = \sum_{i=1}^{N} \sum_{p=1}^{P} h_{ji}(p)\, s_i(t - p + 1)$$
- Here, "convolution" means that a signal is delayed in the course of propagation, multiplied by a predetermined coefficient, and then summed. All signals are sampled at a certain sampling frequency and are represented discretely.
- $P$ is the impulse response length, $t$ is the sampling time, and $p$ is the sweep variable (the operation of applying a different coefficient to each sample value of the time-shifted signal).
- It is assumed that the N signal sources 701 are statistically independent of each other and that each signal is sufficiently sparse. "Sparse" means that the signal is 0 at almost all times $t$; this sparseness is observed, for example, in audio signals.
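The convolutive mixing model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `convolutive_mix`, the nested-list layout `impulse_responses[j][i]`, and the index convention (coefficient number p, counted from 1, applied to the source sample delayed by p - 1 samples) are all chosen for the example.

```python
def convolutive_mix(sources, impulse_responses):
    """Mix N source signals into M observed signals.

    sources[i] is the sample list of source i.
    impulse_responses[j][i] is the coefficient list h_ji (hypothetical layout);
    coefficient index 0 corresponds to p = 1 in the 1-based convention.
    """
    n_samples = len(sources[0])
    observed = []
    for responses_to_sensor in impulse_responses:
        x_j = [0.0] * n_samples
        for s_i, h_ji in zip(sources, responses_to_sensor):
            for t in range(n_samples):
                for delay, coeff in enumerate(h_ji):
                    if t - delay >= 0:  # samples before t = 0 are taken as zero
                        x_j[t] += coeff * s_i[t - delay]
        observed.append(x_j)
    return observed


# Two sparse sources (each is zero most of the time) and two sensors.
sources = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
impulse_responses = [
    [[1.0, 0.5], [0.5, 0.0]],  # responses from sources 1 and 2 to sensor 1
    [[0.5, 0.0], [1.0, 0.5]],  # responses from sources 1 and 2 to sensor 2
]
observed = convolutive_mix(sources, impulse_responses)
```

Because the two example sources are never active at the same time, each observed sample can be traced back to a single source, which is the property the sparsity-based methods rely on.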
- The purpose of BSS is to estimate the separation system (W) 703 from the observed signals $x_j$ alone, without knowing the source signals $s_i$ or the impulse responses $h_{ji}$, and thereby to obtain the separated signals $y_k$.
- Let $f$ be the frequency and $m$ the time of the frame used for the DFT.
- Let $W(f, m)$ be an $(N \times M)$ matrix whose $jk$ element $W_{jk}(f, m)$ is the frequency response from the observed signal at sensor $k$ to the separated signal $y_j$. This $W(f, m)$ is called the separation matrix.
- In the time-frequency domain, the separated signal is given by $$Y(f, m) = W(f, m)\, X(f, m)$$
- Here $Y(f, m) = [Y_1(f, m), \dots, Y_N(f, m)]^T$ is the vector of separated signals in the time-frequency domain.
- Applying a short-time inverse discrete Fourier transform (IDFT) to $Y(f, m)$ yields the separated signals $y$, which are the estimates of the source signals.
- The separation matrix $W(f, m)$ is estimated from the observed signals alone.
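The frequency-domain separation step is a matrix-vector product evaluated independently at every time-frequency point. The sketch below shows one such point for two sources and two sensors. It is only an illustration: the helper names are hypothetical, and the separation matrix is taken as the inverse of a known mixing matrix, whereas in blind separation it has to be estimated from the observations.

```python
def invert_2x2(matrix):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_matrix(matrix, vector):
    """Matrix-vector product; with a separation matrix W this is Y = W X."""
    return [sum(m_jk * v_k for m_jk, v_k in zip(row, vector)) for row in matrix]


# Hypothetical mixing at one (f, m) point; only the first source is active.
H = [[1 + 0j, 0.5j], [0.5j, 1 + 0j]]
S = [2 + 0j, 0j]
X = apply_matrix(H, S)   # observed vector X(f, m)

W = invert_2x2(H)        # stand-in for an estimated separation matrix
Y = apply_matrix(W, X)   # separated vector Y(f, m)
```

With an exact inverse, `Y` reproduces `S`; an estimated separation matrix generally recovers the sources only up to permutation and scaling, which is why those ambiguities are treated separately in the text.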
- Conventional methods for estimating the separated signal $Y(f, m)$ include (a) a method using independent component analysis (conventional method 1), (b) a method using signal sparsity (conventional method 2), and (c) a method that estimates the mixing matrix using sparsity (conventional method 3). Each is described below.
- In the method using independent component analysis (ICA), a separation matrix $W(f, m)$ is estimated at each frequency.
- The ICA separation matrix estimator 705 obtains the separation matrix by, for example, the learning rule $$W(f) \leftarrow W(f) + \Delta W(f), \qquad \Delta W(f) = \mu \left[ I - \langle \Phi(Y(f, m))\, Y(f, m)^{H} \rangle \right] W(f)$$
- Here $[\cdot]^{H}$ denotes the conjugate transpose, $I$ is the unit matrix, $\langle \cdot \rangle$ is the time average, $\Phi$ is a certain nonlinear function, and $\mu$ is the update coefficient.
- The separation system obtained by ICA is a time-invariant linear system.
- Various ICA algorithms have been introduced, including those described in Non-Patent Document 1.
- Solving the permutation problem means arranging the outputs so that the separated signal components corresponding to the same source signal $s_i$ become the separated signals $Y_i(f, m)$ with the same subscript $i$ at all frequencies.
- For example, the estimated arrival directions of the signals, obtained using the inverse of the separation matrix (the Moore-Penrose pseudo-inverse when $N \neq M$), are checked, and the outputs are ordered so that the direction estimated for the i-th separated signal is the same at every frequency.
- The permutation/scaling solution unit 706, for example, computes the inverse $W^{-1}(f, m)$ of the separation matrix $W(f, m)$ obtained after the permutation has been resolved (the Moore-Penrose pseudo-inverse when $N \neq M$), and uses it to correct the scaling of each row $w_i(f, m)$ of the separation matrix $W(f, m)$.
- As described above, the permutation problem can be solved by, for example, a method based on estimating the signal arrival directions, a method using the similarity of the separated signals between frequencies, or a combination of both. The details are described in Patent Document 1 and Non-Patent Document 2. Note that ICA requires the number of signal sources $N$ and the number of sensors $M$ to satisfy $M \geq N$.
- When the number $N$ of signal sources and the number $M$ of sensors satisfy $M < N$, separation methods based on the sparsity of the signals are available (for example, Non-Patent Document 3).
- In this method, which signal source emitted the signal observed at each time is estimated by some means, and a binary mask that extracts only the signal at those times is used as the separation system $W(f, m)$, which makes it possible to separate the signals. This is the sparsity-based method.
- FIG. 28 (conventional method 2) is a block diagram for explaining the method using the sparsity.
- The following method is generally used to estimate the signal source at each time. Assuming that the signal sources are spatially separated, a phase difference and an amplitude ratio arise between the signals observed by the sensors, determined by the relative positions of each signal source and the sensors. Under the assumption that the observed signal at each time contains at most one source signal, the phase difference and amplitude ratio of the observed signal at each time are those of the single source signal contained in the observed signal at that time. The phase differences and amplitude ratios of the observed signal samples can therefore be clustered, and each source signal can be estimated by reconstructing the signal from the times belonging to each class.
- Instead of the phase difference itself, the arrival direction of the signal obtained from the phase difference may be used as the relative value $z(f, m)$.
- FIG. 29 illustrates this distribution.
- A representative value calculation section 753 calculates representative values (peak, mean, median, etc.) of these N classes. For convenience, they are denoted $a_1, a_2, \dots, a_N$.
- j is an arbitrary sensor number.
- FIG. 28 (conventional method 3) is a block diagram for explaining a method that estimates the mixing matrix based on sparsity.
- The mixed signal $X(f, m)$ is expressed using the mixing matrix $H(f)$.
- Applying a mask $M_k(f, m)$ gives $\hat{X}_k(f, m) = M_k(f, m)\, X(f, m)$.
- This is applied to the observed signals of all sensors, $X_1(f, m), \dots, X_M(f, m)$.
- The separated signal $\hat{X}_k(f, m)$ obtained in this way is sent to the mixing process calculation unit 756, where an estimate $\hat{H}(f)$ of the mixing matrix is calculated. Here $E[\cdot]$ is the mean over $m$.
- The $\hat{H}(f)$ obtained in this way is sent to the inverse matrix calculator 757, where the inverse matrix $\hat{H}(f)^{-1}$ is obtained. The signal separation unit 758 then performs the calculation of equation (7) above, whereby the separated signal $Y(f, m)$ can be estimated.
- Patent Document 1 JP 2004-145172 A
- Non-Patent Document 1 A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001, ISBN 0-471-40540
- Non-Patent Document 2 H. Sawada, R. Mukai, S. Araki, and S. Makino, "A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation," in Proc. the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA 2003), 2003, pp. 505-510
- Non-Patent Document 3 S. Rickard, R. Balan, and J. Rosca, "Real-Time Time-Frequency Based Blind Source Separation," 3rd International Conference on Independent Component Analysis and Blind Source Separation (ICA2001), San Diego,
- Non-Patent Document 4 F. Abrard, Y. Deville, and P. White, "From blind source separation to blind source cancellation in the underdetermined case: a new approach based on time-frequency analysis," Proceedings of the 3rd International Conference on Independent Component Analysis and Signal Separation (ICA'2001), pp. 734-739, San Diego, California, Dec. 2001.
- Non-Patent Document 5 Y. Deville, "Temporal and time-frequency correlation-based blind source separation methods," in Proc. ICASSP2003, Apr. 2003, pp. 1059-1064.
- However, because the sparsity is not perfect, two or more source signals may be present at the same frequency at a given time.
- The relative value $z(f, m)$ at such a time is far from the representative values $a_1, \dots, a_N$ to which it should correspond, and, depending on the value of $\epsilon$, the sample is excluded: the observed signal corresponding to it is treated as 0, and a zero component is padded into the separated signal. The smaller the value of $\epsilon$, the larger the proportion of excluded samples, and so the more zero components are padded. If many zero components are padded into each separated signal, the distortion of the separated signal increases, producing an audibly unpleasant noise called musical noise. Conversely, increasing the $\epsilon$ of the binary mask reduces the musical noise but degrades the separation performance.
- The present invention has been made in view of this point, and its aim is to provide a technique that can separate mixed signals with high quality even when the number $N$ of signal sources and the number $M$ of sensors satisfy $N > M$.
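The trade-off just described can be made concrete with a small sketch. The defining equation of the binary mask does not survive in this text, so the form used here (pass a sample when its relative value lies within a tolerance `eps` of a representative value, otherwise output 0) is an assumption, as are the function names and the example numbers.

```python
def binary_mask(z, representative, eps):
    """Assumed mask form: 1 if z is within eps of the representative value."""
    return 1 if abs(z - representative) <= eps else 0


def discarded_fraction(relative_values, representatives, eps):
    """Fraction of samples rejected by every mask, i.e. replaced by zeros."""
    discarded = sum(
        1
        for z in relative_values
        if all(binary_mask(z, a, eps) == 0 for a in representatives)
    )
    return discarded / len(relative_values)


# Hypothetical relative values (e.g. arrival directions in degrees).
# 45.0 and 75.0 stand for points where two sources overlap.
relative_values = [29.0, 31.0, 45.0, 58.0, 61.0, 75.0, 89.0, 92.0]
representatives = [30.0, 60.0, 90.0]

narrow = discarded_fraction(relative_values, representatives, eps=5.0)
wide = discarded_fraction(relative_values, representatives, eps=20.0)
```

With the narrow tolerance a quarter of the samples are zeroed, which is the source of musical noise; with the wide tolerance nothing is zeroed, but the overlapping samples are now passed to a single output and leak interference into it.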
- The first invention solves the above problem as follows.
- First, the values of the observed signals, which are mixtures of N ($N \geq 2$) signals observed by the M sensors, are converted into frequency-domain values, and these frequency-domain values are used to calculate, at each frequency, the relative values of the observed values between the sensors (including mappings of the relative values).
- These relative values are clustered into N classes, and a representative value of each class is calculated.
- Using these representative values, a mask is created for extracting, from the frequency-domain values, the values of the signals emitted by V ($V \leq M$) signal sources, and the mask is used to extract the values of the limited signal consisting of the signals emitted by those V signal sources.
- When $V \geq 2$, the limited signal is a mixture of the signals emitted by the V signal sources. The limited signal is therefore separated further to obtain the value of each separated signal.
- In this way the source signals can be extracted with high quality.
- This alone extracts only V source signals. All the source signals are therefore extracted by, for example, using several kinds of masks and repeating the same processing while changing the combination of signals to be extracted.
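Repeating the extraction with different combinations can be pictured as choosing several sets of V representative values until every source has appeared in at least one set. The greedy rule below is a hypothetical selection strategy written for illustration; the text only requires that the combinations change between passes, and the function name and example values are assumptions.

```python
def choose_mask_sets(representatives, v):
    """Pick sets of v representative values until all values are covered.

    Assumes the representative values are distinct and 1 <= v <= len(representatives).
    """
    covered = set()
    mask_sets = []
    while len(covered) < len(representatives):
        # Start each set with values that no earlier set has covered.
        group = [a for a in representatives if a not in covered][:v]
        # Pad with already covered values so that each set has exactly v members.
        for a in representatives:
            if len(group) == v:
                break
            if a not in group:
                group.append(a)
        mask_sets.append(tuple(sorted(group)))
        covered.update(group)
    return mask_sets


# Three sources (N = 3), two extracted per pass (V = 2).
mask_sets = choose_mask_sets([30.0, 60.0, 90.0], v=2)
```

Each resulting set defines one mask and one separation pass; a source that appears in more than one set is simply obtained more than once and can be reconciled when the outputs are integrated.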
- The second invention solves the above problem as follows.
- The observed signal values $x_1(t), \dots, x_M(t)$ are converted into frequency-domain values $X_1(f, m), \dots, X_M(f, m)$.
- The third invention solves the above problem as follows.
- The observed signal values $x_1(t), \dots, x_M(t)$ are transformed into frequency-domain values $X_1(f, m), \dots, X_M(f, m)$, and the vectors $X(f, m) = [X_1(f, m), \dots, X_M(f, m)]^T$ are clustered into N clusters $C_i(f)$ ($i = 1, \dots, N$) for each frequency $f$. If the source signals are sparse, the vectors can be clustered into N clusters $C_i(f)$ even when the number of sensors is insufficient ($N > M$), and the N representative vectors $a_i(f)$ can also be calculated.
- The separation matrix $W(f, m)$ is time-dependent, so the combination of separated signals obtained may differ when the discrete time $m$ differs. All the separated signals can therefore be obtained by obtaining separated signals for a plurality of discrete times $m$.
- FIG. 1 is a block diagram illustrating an overall configuration of a signal separation device according to a first embodiment.
- FIG. 2 is a block diagram illustrating details of a representative value generation unit, a mask control unit, a limited signal creation unit, and a limited signal separation unit in FIG. 1.
- FIG. 3 is a block diagram illustrating details of a mask creation unit in FIGS. 1 and 2;
- FIG. 4 is a flowchart for explaining processing of the signal separation device according to the first embodiment.
- FIG. 5 is an example of a histogram created by a clustering unit.
- FIG. 6 is a diagram for explaining how to take an estimated arrival direction ⁇ of a signal used when generating a mask having a smooth shape in the first embodiment.
- FIG. 7 is an example of a mask according to the first embodiment.
- FIG. 8 is a block diagram illustrating one system of a signal separation device according to a second embodiment.
- FIG. 9 is a block diagram illustrating one system of a signal separation device according to a third embodiment.
- FIG. 10 shows an example of a mask according to the third embodiment.
- FIG. 11 is a block diagram illustrating a configuration of a mask creation unit according to a fourth embodiment.
- FIG. 12: A is an example of a binary mask according to the sixth embodiment, and B is an example of a binary mask according to the seventh embodiment.
- FIG. 13 is a block diagram illustrating a configuration of a representative value generation unit, a mask control unit, and a limited signal generation unit according to an eighth embodiment.
- FIG. 14 is a flowchart illustrating signal separation processing according to the eighth embodiment.
- FIG. 15 is a block diagram illustrating a configuration of a signal separation device according to a ninth embodiment.
- FIG. 16 is a flowchart illustrating a process performed by the signal separation device according to the ninth embodiment.
- FIG. 17 is a flowchart for explaining separation matrix generation processing when the number of sensors is insufficient (M < N).
- FIG. 24 is a flowchart for explaining a separation matrix generation process applicable regardless of whether the number of sensors is sufficient for the number of signal sources.
- FIG. 25 is part of a block diagram illustrating a configuration that performs signal integration in the frequency domain and then converts to the time domain.
- FIG. 26 is an example of a signal separation device in which each embodiment is configured by a computer.
- FIG. 27 A is a block diagram conceptually illustrating a conventional blind signal separation technology, and B is an IC.
- FIG. 28 is a block diagram for explaining a method using sparsity and a method for estimating a mixing matrix using sparsity.
- FIG. 29 is an example of the distribution of relative values.
- This embodiment is an embodiment of the first invention, in which a smooth-shaped mask based on the directional characteristics of a null beamformer is used. Here the number V of signals making up the limited signal satisfies $2 \leq V \leq M$.
- FIG. 1 is a block diagram illustrating an overall configuration of a signal separation device 1 of the present embodiment.
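- FIG. 2 is a block diagram illustrating details of the representative value generation unit, the mask control unit, the limited signal creation unit, and the limited signal separation unit in FIG. 1.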
- FIG. 3 is a block diagram illustrating details of the mask creating unit 51_k in FIGS. 1 and 2.
- The arrows in these figures indicate the flow of data, but the flow of data to and from the control unit 10 and the temporary storage unit 90 is omitted. That is, even when data passes through the control unit 10 or the temporary storage unit 90, that step is not shown.
- FIG. 4 is a flowchart for explaining the processing of the signal separation device 1 in the present embodiment.
- the configuration and processing of the signal separation device 1 of this example will be described with reference to these drawings.
- the signal separation device 1 of the present embodiment includes a storage unit 2 and a signal separation processor 3 electrically connected to the storage unit 2 by wire or wirelessly.
- The storage unit 2 is, for example, a magnetic recording device such as a hard disk device, a flexible disk, or a magnetic tape; an optical disk device such as a DVD-RAM (Random Access Memory) or a CD-R (Recordable)/RW (Rewritable); a magneto-optical recording device such as an MO (Magneto-Optical disc); or a semiconductor memory such as an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) or a flash memory. The storage unit 2 may be housed in the same enclosure as the signal separation processor 3 or in a separate enclosure.
- the signal separation processor 3 of this example is, for example, hardware constituted by a processor, a RAM, and the like, and has each processing block described below.
- It is assumed that the signals emitted from the N signal sources are statistically independent of each other and that each signal is sufficiently sparse.
- "Sparse" is the property that a signal is 0 or close to 0 at most times $t$ and only rarely takes a large value. This sparsity is observed, for example, in audio signals.
- a non-white signal such as an audio signal is subjected to a short-time discrete Fourier transform or the like to form a time series for each frequency, so that the number of times closer to 0 increases and the sparseness is emphasized.
- a Gaussian distribution is often used as a model for a signal, but a signal having sparseness is modeled by a Laplace distribution instead of a Gaussian distribution.
- The M observed signal values $x_j(t)$ are converted into frequency-domain observed signals by the frequency domain transformation unit 20.
- The representative value generator 30 calculates N representative values $a_1, a_2, \dots, a_N$ corresponding to the respective source signals.
- The mask control unit 40 selects V ($2 \leq V \leq M$) of the representative values $a_1, a_2, \dots, a_N$.
- The limited signal generator 50-k extracts, from the observed signal values $X(f, m)$, the values of a limited signal consisting of V source signals.
- The limited signal separation unit estimates a separation system for obtaining the V separated signals. This system takes the M limited signal values $\hat{X}(f, m)$ as input and outputs V separated signal values $Y(f, m)$.
- Because the number of inputs M and the number of outputs V of this separation system satisfy $V \leq M$, [Conventional method 1] or [Conventional method 3] can be used to estimate it.
- The time domain conversion unit 70-k converts the separated signal values $Y(f, m)$ obtained in the time-frequency domain into signal values in the time domain.
- This processing alone yields only V separated signals. To obtain the other separated signals, the composition of the V representative values selected by the mask control unit 40 is changed, and the processing from the limited signal generation unit 50-k through the time domain conversion unit 70-k is carried out in multiple systems (u systems). Finally, the signal integration unit 80 integrates the outputs of the systems to obtain all N separated signals.
- In this way the source signals are separated and extracted from the observed signals.
- The signals in this example are signals for which sparsity can be assumed, such as speech signals, and the number N of sound sources is known or can be estimated.
- The sensors in this example are microphones or similar devices that can observe these signals, and they are assumed to be arranged on a straight line.
- The signal separation processor 3 accesses the storage unit 2, sequentially reads each observed signal value $x_j(t)$ from it, and sends the values to the frequency domain conversion unit 20 (step S1).
- The frequency domain conversion unit 20 sequentially converts these signal values into frequency-domain observed signal values $X_j(f, m)$ by a short-time discrete Fourier transform or the like, and stores them in the temporary storage unit 90 (step S2).
- The frequency-domain observed signal values $X_j(f, m)$ stored in the temporary storage unit 90 are sent to the representative value generation unit 30.
- The relative value calculation unit 31 of the representative value generation unit 30 uses the received frequency-domain observed signal values $X_j(f, m)$ to calculate the relative value $z(f, m)$ of the observed values between the sensors at each frequency (step S3).
- As the relative value, at least one of the phase difference and the amplitude ratio between sensors may be used, or a mapping that is not the phase difference itself (for example, the arrival direction of the signal obtained from the phase difference) may be used.
- Here $v$ is the signal speed and $d$ is the distance between sensor j1 and sensor j2.
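One way to turn a pair of frequency-domain observations into an arrival direction is sketched below. The exact relative-value equations are missing from this text, so the mapping used here is the usual far-field relation for two sensors (phase difference, then time delay, then the arccosine of delay times speed over spacing) and should be read as an assumption. The function name, the sign convention, and the default speed of 340 m/s are also illustrative.

```python
import cmath
import math


def arrival_direction_deg(x_j1, x_j2, freq_hz, distance_m, speed=340.0):
    """Estimate an arrival direction from two sensors at one (f, m) point.

    Assumes a far-field source and a spacing small enough that the phase
    difference does not wrap around.
    """
    phase_difference = cmath.phase(x_j1 / x_j2)
    delay = phase_difference / (2.0 * math.pi * freq_hz)
    # Clamp so that noise cannot push the argument outside acos's domain.
    cos_theta = max(-1.0, min(1.0, delay * speed / distance_m))
    return math.degrees(math.acos(cos_theta))


# Simulate a single source at 60 degrees, 1 kHz, sensors 4 cm apart.
freq_hz, distance_m, speed = 1000.0, 0.04, 340.0
true_delay = distance_m * math.cos(math.radians(60.0)) / speed
x_j2 = 1.0 + 0.0j
x_j1 = x_j2 * cmath.exp(2j * math.pi * freq_hz * true_delay)

estimated = arrival_direction_deg(x_j1, x_j2, freq_hz, distance_m, speed)
```

When the time-frequency point is dominated by one source, the estimate lands near that source's direction; points where sources overlap produce intermediate values, which is what spreads the histogram used in the clustering step.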
- The clustering unit 32 sequentially reads the relative values $z(f, m)$ from the temporary storage unit 90 and clusters them into N classes (step S4).
- In this example, the clustering unit 32 creates a histogram from the relative values $z(f, m)$.
- FIG. 5 is an example of a histogram created in this way.
- The clustering information (clusters $C_1, C_2, \dots, C_N$) generated by the clustering unit 32 is stored in the temporary storage unit 90.
- The representative value calculation unit 33 reads it and calculates the representative values $a_1, a_2, \dots, a_N$ of the N clusters $C_1, C_2, \dots, C_N$ (step S5). Specifically, for example, the peak of each class in the histogram may be used as the representative value, or the average value of each class may be used.
- For convenience, the N representative values are denoted $a_1, a_2, \dots, a_N$ in ascending order (see FIG. 5). Note that these representative values $a_1, a_2, \dots, a_N$ correspond to the arrival directions of the N signals.
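The clustering and representative-value steps can be sketched as follows. The text builds a histogram and takes its peaks or class averages; the sketch instead uses a plain one-dimensional k-means with class averages, which is a swapped-in technique chosen because it is short and deterministic. The function name and the sample values are hypothetical.

```python
def representative_values(relative_values, n_classes, n_iter=20):
    """Cluster scalar relative values into n_classes; return sorted class means."""
    ordered = sorted(relative_values)
    # Initial centers: evenly spaced order statistics of the data.
    centers = [
        ordered[(2 * i + 1) * len(ordered) // (2 * n_classes)]
        for i in range(n_classes)
    ]
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_classes)]
        for z in relative_values:
            nearest = min(range(n_classes), key=lambda i: abs(z - centers[i]))
            clusters[nearest].append(z)
        # A class that received no samples keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return sorted(centers)  # ascending order, matching a_1 <= ... <= a_N


# Hypothetical arrival directions (degrees) gathered over many (f, m) points.
z_values = [30.0, 88.0, 60.0, 28.0, 92.0, 62.0, 32.0, 58.0, 90.0]
a = representative_values(z_values, n_classes=3)
```

Returning the values in ascending order mirrors the naming convention used in the text, so `a[0]` plays the role of the smallest representative value.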
- The mask control unit 40 generates a set $G_0$ whose elements are the representative values $a_1, a_2, \dots, a_N$, assigns data specifying $G_0$ to a variable $SG_0$, and stores $SG_0$ in the temporary storage unit 90.
- The mask control unit 40 adds 1 to the variable $k$ stored in the temporary storage unit 90 and stores the result back in the temporary storage unit 90 as the new value of $k$ (step S7).
- The mask control unit 40 reads the variables $SG_0$ and $SG$ from the temporary storage unit 90. It then selects a set $G_k$ of V ($\leq M$) suitable representative values that includes an element of the complement $G^c$ of the set $G$ specified by $SG$, assigns data specifying $G_k$ to a variable $SG_k$, and stores $SG_k$ in the temporary storage unit 90 (step S8).
- The mask generation unit 51-k of the limited signal generation unit 50-k reads the variable $SG_k$ stored in the temporary storage unit 90 and creates a "smooth-shaped mask" for extracting the signals of the classes whose representative values belong to the set $G_k$ specified by $SG_k$ (step S9).
- A "smooth-shaped mask" is a function that takes a high-level value for relative values within a predetermined range (the limited range) containing the V ($2 \leq V \leq M$) representative values, takes a low-level value for representative values outside the limited range, and changes continuously from the high-level value to the low-level value as the relative value changes.
- A "high-level value" means a value sufficiently larger than 0 (for example, 1 or more), and a "low-level value" means a value sufficiently close to 0 (for example, 60 dB or more below the high-level value).
- The values themselves are not particularly limited.
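The definition of a smooth-shaped mask can be illustrated with a simple function that is flat inside the limited range, flat at a low level far outside it, and continuous in between. The raised-cosine transition used here is an assumption made purely for illustration; it is not the beamformer-derived mask of this embodiment. The function name, parameters, and example ranges are hypothetical.

```python
import math


def smooth_mask(z, lower, upper, transition, high=1.0, low=0.001):
    """Mask value for relative value z.

    high inside [lower, upper], low beyond the transition width, and a
    raised-cosine ramp in between. low = 0.001 is 60 dB below high = 1.0
    in amplitude terms.
    """
    if lower <= z <= upper:
        return high
    distance = lower - z if z < lower else z - upper
    if distance >= transition:
        return low
    weight = 0.5 * (1.0 + math.cos(math.pi * distance / transition))
    return low + (high - low) * weight


# Limited range 40..70 degrees with a 10 degree transition on each side.
inside = smooth_mask(45.0, 40.0, 70.0, 10.0)
halfway = smooth_mask(35.0, 40.0, 70.0, 10.0)
outside = smooth_mask(20.0, 40.0, 70.0, 10.0)
```

Unlike a binary mask, samples that fall slightly outside the limited range are attenuated rather than replaced by zeros, which is the property the text relies on to avoid padding zero components into the output.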
- In this embodiment, the "smooth-shaped mask" is created using the directional characteristics of a null beamformer formed by N - V + 1 sensors.
- This mask has sufficient sensitivity in the directions ($G_k$) of the V signals included in the limited signal, and low sensitivity (nulls) in the directions of the N - V signals to be removed.
- The mask creation unit 51-k reads the variables $SG_k$, $SG_0$, and $SG_k^c$ from the temporary storage unit 90.
- The mask creation unit 51-k takes an element of the set $G_k$ indicated by the variable $SG_k$ (a representative value within the limited range) and the elements of the set $G_k^c$ indicated by the variable $SG_k^c$ (representative values outside the limited range), and stores them in the temporary storage unit 90 as the angles $\theta_1$ and $\theta_i$.
- Here $d_j$ is the distance between sensor 1 and sensor j ($d_1$ is 0), $f$ is the frequency variable, and $v$ is the speed of the signal.
- The direction of arrival of the signal is obtained from the phase difference z(f, m) between the observation signals of the two sensors.
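The relation between an inter-sensor phase difference and a direction of arrival can be sketched as follows. This is a minimal illustration, not the patent's own formula: it assumes a far-field plane wave, two sensors separated by `spacing_m`, and a sign convention in which the phase of the second sensor relative to the first is 2πf·d·cos(θ)/v. The function name and the value 340 m/s for v are hypothetical.

```python
import cmath
import math

SPEED = 340.0  # assumed propagation speed v in m/s (hypothetical value)


def direction_of_arrival(x1, x2, freq_hz, spacing_m, speed=SPEED):
    """Estimate the arrival angle in radians from two frequency-domain samples."""
    # Phase difference z(f, m) between the two observations (assumed convention).
    phase_diff = cmath.phase(x2 / x1)
    cos_theta = phase_diff * speed / (2.0 * math.pi * freq_hz * spacing_m)
    # Clamp so that noise cannot push the argument outside the domain of acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)
```

The estimate is only unambiguous while 2πf·d/v stays below π, that is, while the sensor spacing is below half a wavelength.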
- θ_i is the angle, corresponding to the i-th signal source, formed by the line segment connecting the origin and that signal source and the line segment connecting the origin and the first sensor.
- the generated delay matrix H (f) is converted from the temporary storage unit 90 (Fig. 1) to the NBF creation unit 51b-k (Fig.
- This NBF matrix W(f) is stored in the temporary storage unit 90 (FIG. 1).
- The directivity calculating unit 51c-k sequentially reads the first-row elements of the NBF matrix W(f), together with d and v, from the temporary storage unit 90.
- The generated directional characteristic function F(f, θ) is sent to the mask configuration unit 51d-k.
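The chain from delay matrix to NBF matrix to directional characteristic can be sketched for the smallest case of two sensors. This is an illustration under stated assumptions, not the patent's exact equations: the delay matrix entries are taken as exp(j2πf·d_j·cos(θ_i)/v), the NBF matrix is taken as the inverse of the delay matrix, and F(f, θ) is the first row of W(f) applied to the steering vector for direction θ. All function names are hypothetical.

```python
import cmath
import math


def steering(freq_hz, positions, theta, speed=340.0):
    """Column of the delay matrix for a plane wave arriving from angle theta."""
    return [cmath.exp(2j * math.pi * freq_hz * d * math.cos(theta) / speed)
            for d in positions]


def nbf_first_row(freq_hz, positions, theta_keep, theta_null, speed=340.0):
    """First row of W(f) = H(f)^-1 for a 2x2 delay matrix H(f)."""
    a, c = steering(freq_hz, positions, theta_keep, speed)  # column 1 of H
    b, d = steering(freq_hz, positions, theta_null, speed)  # column 2 of H
    det = a * d - b * c
    # Inverse of [[a, b], [c, d]] has first row [d, -b] / det.
    return [d / det, -b / det]


def directivity(w_row, freq_hz, positions, theta, speed=340.0):
    """Directional characteristic F(f, theta) of one beamformer row."""
    s = steering(freq_hz, positions, theta, speed)
    return sum(w * e for w, e in zip(w_row, s))
```

By construction the gain is 1 toward `theta_keep` and 0 toward `theta_null`, which is the null that a mask built from |F(f, θ)| would use to suppress the removed signal.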
- The mask configuration unit 51d-k uses this directional characteristic function F(f, θ) and the relative value z(f, m) (z(f, m) in this example) read from the temporary storage unit 90 to generate a smooth-shaped mask M(f, m).
- the mask M (f, m) to be generated for example, the directional characteristic F (f
- the entire area of the value z (f, m) is called a limited signal area. Also, if G contains a or a
- the entire area is called a removal signal area. Also, if it contains a or a in G n G e, 0 ° ⁇ z
- A region that belongs to neither the limited signal region nor the removal signal region is called a transition region.
- For a, a value sufficiently larger than 0 is used, for example the maximum value of |F(f, θ)| in the removal signal region.
- For b, a small value is used, for example the minimum gain of the directional characteristic.
- the mask M (f, m) generated by the mask generation unit 51-k as described above is stored in the temporary storage unit 90
- Limited signal extraction section 52-k further reads frequency domain observation signal value X (f, m) from temporary storage section 90. Then, the limited signal extraction unit 52—k (FIG. 2) uses the mask M (f, m) and the observed signal value X (f,
- the limited signal value X "(f, m) is stored in the temporary storage unit 90, and the limited signal separating unit 60—kk
- The limited signal value X̂(f, m) is approximated as the value of a mixed signal composed of the signals emitted from V signal sources. Therefore, the method using independent component analysis described in [Conventional method 1] can be used to estimate the separation matrix. That is, the limited signal value X̂(f, m) is used as the input of the independent component analysis instead of the observed signal value X, and separation is performed using, for example, equation (2) described in [Conventional method 1].
- In the separation by ICA in the present embodiment, the ICA separation matrix estimating section 61-k first uses the limited signal value X̂(f, m) to generate the separation matrix W(f, m) in accordance with the learning rule of the above-mentioned equation (2).
- this separation matrix W (f, m) is stored in the temporary storage unit 90.
- In generating the separation matrix W(f, m), for example, feedback of the output value Y(f, m) from the permutation/scaling solution unit 62-k described below is used.
- the generated separation matrix W (f, m) is
- Permutation 'scaling solution unit 62-k for example, outputs the separated signal value Y (f, m
- this tag ⁇ is represented as a superscript nkq of the separated signal value Y.
- The permutation/scaling solution unit 62-k extracts the separation matrix W(f) from the temporary storage unit 90 and obtains its inverse matrix (the Moore-Penrose pseudo-inverse matrix when N ≠ M)
- the solution unit 62—k assigns a tag ⁇ indicating the representative value a to the separated signal Y (pair
- the scaling problem of the ICA is solved, and the separation matrix W (f) after the scaling problem is solved is stored in the temporary storage unit 90.
- Each separated signal value Y to which the tag ⁇ is added is sent to the time domain transform unit 70-k.
- The time domain transform unit 70-k converts each separated signal value Y obtained in the time-frequency domain into a time-domain signal value y by, for example, a short-time inverse discrete Fourier transform.
- The time domain transform unit 70-k extracts, for each frequency, the tag associated with the frequency-domain signal value Y from the temporary storage unit 90.
- The time domain transform unit 70-k then determines whether the tags at all frequencies are equal. If they are all equal, that tag is associated with the time-domain signal value y. If they are not all equal, the tag of the time-domain signal value y is determined by majority vote.
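The tag decision for a time-domain signal can be sketched as below. This is a minimal illustration with a hypothetical function name; the tie-breaking behaviour (the tag seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def time_domain_tag(tags_per_frequency):
    """Pick the tag of a time-domain signal from its per-frequency tags."""
    counts = Counter(tags_per_frequency)
    if len(counts) == 1:
        # All frequencies agree, so the common tag is used directly.
        return tags_per_frequency[0]
    # Otherwise decide by majority vote (ties: first-seen tag, an assumption).
    return counts.most_common(1)[0][0]
```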
- the mask control unit 40 extracts the variables SG and SG from the temporary storage unit 90,
- The variable SG is stored in the temporary storage unit 90 (step S14). The mask control unit 40 then reads the variables SG and SG_0 from the temporary storage unit 90 and determines whether this new set G is equal to the set G_0 (step S15). If it is not, the process returns to step S7.
- the selection / integration is performed to obtain all N separated signals (step S16). More specifically, for example, first, the signal integration unit 80 first reads each of the separated signals y (t).
- the signal integration unit 80 determines that all the separated signal values y (t)
- The signal integration unit 80 either appropriately selects one of the separated signal values having the same tag and outputs it as the separated signal value y(t), or calculates the average of the separated signal values having the same tag and uses it as the output signal (step S17).
- When one of the separated signal values y(t) is selected as the final separated signal value y(t), the signal integration unit 80 outputs, for example, the signal having the maximum power among the separated signal values y(t) having the same tag.
- Alternatively, the average of the separated signal values with the same tag is output as the final separated signal value y(t).
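The two integration options (maximum power or averaging over signals that share a tag) can be sketched as follows. The data layout, a list of `(tag, samples)` pairs, and the function name are assumptions made for illustration.

```python
def integrate(separated, mode="max_power"):
    """Merge separated signals that carry the same tag.

    separated: list of (tag, samples) pairs, samples being equal-length lists.
    Returns a dict mapping each tag to one output signal.
    """
    groups = {}
    for tag, samples in separated:
        groups.setdefault(tag, []).append(samples)

    output = {}
    for tag, candidates in groups.items():
        if mode == "max_power":
            # Keep the candidate whose sum of squared samples is largest.
            output[tag] = max(candidates, key=lambda y: sum(v * v for v in y))
        else:
            # Sample-wise average over all candidates with this tag.
            count = len(candidates)
            output[tag] = [sum(vals) / count for vals in zip(*candidates)]
    return output
```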
- the signal integration unit 80 In the case of processing, the signal integration unit 80
- N signals are separated with little distortion.
- a mixed signal (limited signal) composed of two or more and M or less original signals is extracted by a mask having a smooth shape. Therefore, signals (samples) for a wide range of relative values z (f, m) can be extracted as limited signals, compared to the binary mask of [Conventional method 2] that extracts only one signal value.
- the signals are separated and extracted using a mask having a smooth shape.
- the mask having the smooth shape has a shape in which the edge portion is smoothly spread. Therefore, if this smooth mask is used, even if there are two or more observation signals of the same frequency at a certain time and the sample value deviates from the representative values a,.
- Since the mask at that position can take a value other than 0, more signal can be extracted than with a binary mask whose value changes sharply. As a result, quality degradation caused by discontinuous zero-padding of components in the separated signal can be suppressed.
- The simulation assumes that audio signals from three speakers are used as the source signals and that the mixed signal is observed with two omnidirectional microphones in an environment without reverberation.
- SIR in the table is the signal-to-interference ratio (dB), an index of separation performance.
- SDR is the signal-to-distortion ratio (dB), an index of the degree of signal distortion. For both, higher values indicate better performance.
- SIR1 and SDR1 correspond to speaker 1
- SIR2 and SDR2 correspond to speaker 2
- SIR3 and SDR3 correspond to speaker 3.
- This embodiment is also an embodiment according to the first invention.
- a “mask having a smooth shape” is used in the limited signal generation unit, and a separation method based on the mixing matrix estimation is used in the limited signal separation unit. Note that in this embodiment, descriptions of items common to the first embodiment will be omitted.
- FIG. 8 is a block diagram illustrating only one system for obtaining V separated signal values in the signal separating device according to the present embodiment.
- the same components as those in the first embodiment are denoted by the same reference numerals as those in the first embodiment.
- The difference in configuration between the signal separation device 1 of the first embodiment and the signal separation device of the present embodiment is that the limited signal generation unit 50-k is replaced by the limited signal generation unit 150-k and the limited signal separating section 60-k is replaced by the limited signal separating section 160-k.
- the representative value generation unit 30 extracts, from the temporary storage unit 90, the observed signal value X (f, m) in the frequency domain generated by the frequency domain conversion unit 20 (FIG. 1).
- FIG. 8 shows a case where the relative value calculating unit 31 calculates the relative value z (f, m) of the observed value, performs clustering in the clustering unit 32, and calculates the representative value, as in the first embodiment.
- The representative values a_1, a_2, ..., a_N are calculated.
- the relative value z (f, m) is i
- One is a mask for extracting the limited signal value X(f, m), in which V (≤ M) signals corresponding to the representative values are mixed; it is the smooth-shaped mask shown in the first embodiment.
- the other is a binary mask M (f, m) that extracts signals containing only one signal, and k
- the limited signal extraction unit 152—k obtains a smooth-shaped mask M (f, m) from the temporary storage unit 90 (FIG. 1) and the observed signal value X (f, m). And limited
- the signal extraction unit 152—k (FIG. 8) converts the mask M (f, m) to the observed signal value X (f
- The mixing matrix is sent to the inverse matrix calculation unit 163-k, which first reduces the rank of the mixing matrix H′. That is, from the mixing matrix H′, only the V columns corresponding to the limited signal X(f, m) composed of V signals (that is, the columns corresponding to the V representative values a included in G) are extracted to create a square matrix H′.
- The inverse matrix calculation unit 163-k calculates the inverse matrix H′⁻¹(f) of the created square matrix H′.
- This embodiment is also an embodiment according to the first invention.
- In this embodiment, a “smooth-shaped mask” is used to extract from the observed signal only the signal emitted from any one signal source (called the “limited signal” in this embodiment), and the extracted limited signal is used as the separated signal.
- Description of items common to the first embodiment is omitted.
- FIG. 9 is a block diagram illustrating only one system part for obtaining one separated signal in the signal separating device of the present embodiment. Note that, in FIG. 9, the same reference numerals as in the first embodiment denote the same components as those in the first embodiment.
- The difference in configuration between the signal separation device 1 of the first embodiment and the signal separation device of the present embodiment is that the limited signal generation units 50-k are replaced by limited signal generation units 250-k and that the limited signal separator 60-k does not exist in the signal separation device of the present embodiment.
- the configuration and processing of the present embodiment will be described.
- Also in this embodiment, the representative value generation unit 30 extracts the frequency-domain observation signal value X(f, m) generated by the frequency domain conversion unit 20 from the temporary storage unit 90 (FIG. 1).
- In the representative value generator 30 (Fig. 9), the relative value calculator 31 calculates the relative value z(f, m) of the observed value, the clustering unit 32 performs clustering, and the representative value calculator 33 calculates the representative values a_1, a_2, ..., a_N.
- As the relative value z(f, m), at least one of the phase difference and the amplitude ratio, or a mapping thereof (for example, the direction of arrival of a signal obtained from the phase difference), can be used.
- the phase difference force between observation signals is obtained.
- the 250-k mask generator 251-k (Fig. 9) reads these representative values a, a, ..., a,
- This is a function that takes a low-level value for the value and whose transition from the high-level value to the low-level value as the relative value changes is continuous.
- The mask creation unit 251-k generates an (N × N) delay matrix H(f).
- the mask creation unit 251—k is configured to store the representative values a, a,.
- $H_{ji}(f) = \exp(j 2\pi f \tau_{ji})$
- The mask creation section 251-k uses this delay matrix H(f) to generate the NBF matrix W(f) of a null beamformer (NBF).
- The mask creation unit 251-k sequentially extracts the first-row elements of the NBF matrix W(f), together with d and v, from the temporary storage unit 90 and generates the directional characteristic function F(f, θ) shown in the above equation (10). Then, the mask creation unit 251-k uses this directional characteristic function F(f, θ) to generate a smooth-shaped mask M(f, m).
- Specifically, for example, a mask represented by equation (11) of the first embodiment (referred to as “mask 7”) or a mask represented by equation (12) (referred to as “mask 8”) is generated as the smooth-shaped mask M(f, m) in this embodiment.
- Alternatively, a “smooth-shaped mask” that uniformly reduces the gain in the removal signal region, as described below, may be generated.
- MDC ( f , m) ⁇
- The direction used here is, among the estimated directions of arrival of the N-1 signals to be removed (the N-1 representative values other than the representative value a to be extracted), the one closest to the estimated direction of arrival of the signal that is not removed (the representative value a to be extracted).
- Mask generator 251 The smooth-shaped mask M (f, m) generated by k
- the signal separation device returns the obtained separated signal Y (f, m) to a time-domain signal in the time-domain conversion unit, and outputs the signal as it is through the signal integration unit.
- The simulation assumes that audio signals from three speakers are used as the source signals and that the mixed signal is observed with two omnidirectional microphones in an environment without reverberation.
- This example is a simulation result when the way of mixing signals (specifically, the position of the speaker) is changed in the situation shown in Table 2.
- With the method of this embodiment, a much higher SDR than with conventional method 2 is obtained with almost no decrease in the separation performance SIR. This indicates that the signals are separated with little distortion. The method of the present embodiment is therefore effective for separating signals with low distortion when the number N of signal sources is larger than the number M of sensors.
- This embodiment is also an embodiment according to the first invention.
- In this embodiment, a smooth-shaped mask is generated by convolving a smooth-shaped function with the binary mask.
- the processing in the mask generation unit corresponding to the mask generation unit 51-k in FIG. 1 will be described.
- As the relative value z(f, m), the phase difference z_1(f, m), the amplitude ratio z_2(f, m), or the direction of arrival z_3(f, m) of the signal obtained from the phase difference z_1(f, m), as described in the first embodiment, is used.
- FIG. 11 is a block diagram illustrating the configuration of the mask creation unit 300-k according to the present embodiment.
- The binary mask creating unit 301-k generates a binary mask, that is, a function that takes a high-level value for relative values within a predetermined range including V representative values, takes a low-level value for relative values outside that range, and whose transition from the high-level value to the low-level value as the relative value changes is discontinuous.
- the mask generator 300-k is a binary mask for extracting a signal in which V signals are mixed.
- a_min and a_max are calculated by the following processing.
- The calculated variance σ² is stored in the temporary storage unit 90 (FIG. 1), and then the binary mask creation unit 301-k (FIG. 11) reads the variance σ² and the representative value a (in this example, the mean of cluster C) stored in the temporary storage unit 90 and uses them to calculate a_min and a_max.
- The binary mask F_b(z) generated as described above is stored in the temporary storage unit 90 (FIG. 1).
- The unimodal function generator 302-k (FIG. 11) generates a unimodal function g(z) whose value changes continuously with z and stores it in the temporary storage unit 90 (FIG. 1).
- An example of the unimodal function g(z) is a Gaussian.
- σ denotes the standard deviation of g(z).
- ⁇ ⁇ ( ⁇ , ⁇ ) k + v + 1 k k + v + 1.
- ⁇ and ⁇ are those of Expression (22). Also, min (hi,)
- The convolution mixing unit 303-k (FIG. 11) reads the binary mask F_b(z) and the unimodal function g(z) from the temporary storage unit 90 (FIG. 1) and calculates the function F(z) obtained by convolving the binary mask F_b(z) with the unimodal function g(z).
- The mask construction unit 304-k (FIG. 11) reads the relative value z(f, m) and the function F(z) from the temporary storage unit 90 (FIG. 1) and generates the mask obtained by substituting the relative value z(f, m) into the function F(z): M(f, m) = F(z(f, m)).
- the function of a smooth shape may be defined as F (z) and the mask of Expression (24) may be obtained.
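Smoothing a binary mask by convolving it with a Gaussian can be sketched numerically as follows. This is an illustration only: the integral is approximated by a Riemann sum, the Gaussian is normalized to unit area, and the names, the step size, and the truncation at four standard deviations are all assumptions.

```python
import math


def binary_mask(z, a_min, a_max):
    """High level (1) inside [a_min, a_max], low level (0) outside."""
    return 1.0 if a_min <= z <= a_max else 0.0


def gaussian(u, sigma):
    """Unit-area Gaussian used as the unimodal function g."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def smooth_mask(z, a_min, a_max, sigma, step=0.01, half_width=4.0):
    """Approximate F(z) = integral of F_b(z - u) g(u) du by a Riemann sum."""
    n = int(half_width * sigma / step)
    total = 0.0
    for i in range(-n, n + 1):
        u = i * step
        total += binary_mask(z - u, a_min, a_max) * gaussian(u, sigma) * step
    return total
```

With relative values expressed as arrival angles in degrees, `smooth_mask(90, 60, 120, 5)` is close to 1, the value at either edge is close to 0.5, and the mask decays continuously outside the range.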
- the representative value a in this example, the average value of the cluster C
- the variance values ⁇ 2 and a and a obtained as shown in Expressions (22) and (23) are used as the mask component 304—k ( Figure 11) reads, average a (f), variance
- g_i(z) is normalized as g_i(z)/g_i(a_i) so that its value at a_i is 1.
- [gk + V a max ⁇ z may be calculated to obtain the mask of Expression (24).
- This embodiment is also an embodiment according to the first invention.
- a mask having a smooth shape is generated from the difference between the odd functions.
- The processing in the mask creation unit corresponding to the mask generation unit 51-k in FIG. 1 will be described.
- the other configurations and processes are the same as those of the first to third embodiments.
- the mask creation unit according to the present embodiment is configured such that the relative value is 0 when the relative value is the lower limit value a of the limited range.
- the relative value z (f, m) includes at least one of the phase difference Zi (f, m) and the amplitude ratio z (f, m) shown in the first embodiment or the like, or a mapping thereof (for example, From the phase difference
- This embodiment is also an embodiment according to the first invention.
- the mask of the present embodiment is created in the mask creating section 51-k in FIGS. 1 and 2, takes a high level value for a relative value within a predetermined range including V representative values, It is a function (binary mask) that takes a low-level value for a representative value that is not within the range and has a discontinuous transition from a high-level value to a low-level value.
- V ⁇ M that is, for example,
- $B(f, m) = \begin{cases} 1 & a_{\min} \le z(f, m) \le a_{\max} \\ 0 & \text{otherwise} \end{cases}$ (25)
- A, a are set in the range of a a a a a a, a a a a a a
- a_min and a_max are generated by, for example, the same procedure as the method described in the fourth embodiment. Also in this embodiment, the phase difference z_1(f, m), the amplitude ratio z_2(f, m), the direction of arrival z_3(f, m) obtained from the phase difference z_1(f, m), and the like can be used as the relative value z(f, m).
- The number of representative values included in the range from a_min to a_max is at least 2 and at most the number M of sensors, and is preferably equal to the number M of sensors.
- a plurality of types of binary masks B (f, m) are created in this embodiment.
- the mask control unit 40 (Figs. 1 and 2) reads the representative values a, a, ..., a from the temporary memory unit 90, and reads these representative values a, a,. .., data that identifies the set G with elements a
- variable SG is assigned to the variable SG, and the variable SG is stored in the temporary storage unit 90. Also, mask control
- the mask control unit 40 sets a value obtained by adding 1 to the variable k stored in the temporary storage unit 90 as a new variable k and stores it again in the temporary storage unit 90 (FIG. 4: step S7).
- the mask control unit 40 calls the variables SG and SG from the temporary storage unit 90.
- the mask control unit 40 determines from the set G specified by the variable SG
- the mask creation unit 51 — k reads the variable SG stored in the temporary storage unit 90, and
- FIG. 12A is an example of a binary mask according to the present embodiment. This example takes a high-level value (e.g., 1) for relative values z(f, m) within a predetermined range including two representative values, and a low-level value for relative values outside that range.
- The high-level value is flat, and the transition between the high-level value and the low-level value is discontinuous.
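Applying such a range-type binary mask to the time-frequency samples can be sketched as below. The nested-list layout indexed by frequency and frame, the default levels 1 and 0, and the function names are assumptions for illustration.

```python
def binary_mask(z, a_min, a_max, high=1.0, low=0.0):
    """Flat high level inside [a_min, a_max], low level elsewhere."""
    return high if a_min <= z <= a_max else low


def extract_limited(X, Z, a_min, a_max):
    """Multiply every sample X[f][m] by the mask evaluated at Z[f][m]."""
    return [
        [binary_mask(z, a_min, a_max) * x for x, z in zip(x_row, z_row)]
        for x_row, z_row in zip(X, Z)
    ]
```

Samples whose relative value lies between the two representative values are kept as well, which is what distinguishes this mask from one that passes only a single representative value.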
- In this embodiment, the binary mask B(f, m) is used instead of the smooth-shaped mask M(f, m) of the first and second embodiments to extract, from the frequency-domain signal values, a mixed signal composed of the signals emitted from the signal sources (called the “limited signal” in this embodiment), and the processing of the first or second embodiment is then executed.
- a sample value located between 1 2 and 1 2 can also be extracted. Also, for example, a position between a and a
- Such a sample is highly likely to be the sample corresponding to the representative value a or a.
- the signal power degradation due to the binary mask B (f, m) of the present embodiment is caused by the limited signal being s
- A limited signal is extracted using the binary mask of the present embodiment, and ICA is applied to the limited signal to perform signal separation.
- The simulation assumes that audio signals from three speakers are used as the source signals and that the mixed signal is observed with two omnidirectional microphones in a reverberation-free environment.
- the method of this embodiment can obtain a much higher SDR than the conventional method 2 with almost no decrease in the separation performance SIR. This indicates that the method of this embodiment performs signal separation with much lower distortion.
- This embodiment is also an embodiment according to the first invention, and is a modification of the above-described sixth embodiment.
- the present embodiment is also an embodiment in which the limited signal is extracted using the binary mask when 2 ⁇ V ⁇ M, but there is a difference in the method of creating the binary mask B (f, m) and the process of calculating the limited signal. .
- Only the method of creating the binary mask B(f, m) and the process of calculating the limited signal are described below; the other processes and functional configurations are the same as in the first or second embodiment, and their description is omitted.
- The binary mask B(f, m) of this embodiment is for extracting the observation signal components other than the above-described limited signal.
- The binary mask B(f, m) created by the mask creation unit of this embodiment is a function that takes a low-level value for relative values within a predetermined range including V representative values (this set is denoted G), takes a high-level value for representative values (G^c) outside this predetermined range, and whose transition from the high-level value to the low-level value is discontinuous, where 2 ≤ V ≤ M.
- the mask generating unit 51- k in this embodiment for example, with the representative values included in the G e
- phase difference z (f, m) the amplitude ratio z (f, m), the direction of arrival z (f, m) of the signal obtained from the phase difference z (f, m), and the like are given.
- FIG. 12B is an example of the binary mask B (f, m) of the present embodiment.
- V two representative values a, a within a given range containing a.
- a high level value for example, 1
- the high level value of the binary mask of this example is flat, and the high level value and the low level value are discontinuous.
- the limited signal extraction unit of this embodiment converts the signal value X (f, m) in the frequency domain
- The binary mask M(f, m) in the above equation (3) takes a high-level value for only one representative value, but the processing of this embodiment may also be performed using a binary mask that takes a high-level value for two or more representative values. The processing of the present embodiment may also be performed using the above-described smooth-shaped mask instead of the binary mask.
- Once the limited signal X(f, m) is calculated, the same limited signal separation, time domain conversion, and signal integration processing as in the first or second embodiment is performed.
- This embodiment is an embodiment according to the second invention, in which signals are observed by M sensors, the observed values are clustered in an M-dimensional domain, and a mask is defined.
- a description will be given focusing on differences from the first embodiment, and a description of items common to the first embodiment will be omitted.
- FIG. 13 is a block diagram illustrating a configuration of the representative value generation unit 430, the mask control unit 40, and the limited signal generation unit 450-k in the present embodiment. This figure shows only one system for obtaining V separated signals. In this embodiment, 1 ⁇ V ⁇ M.
- the structural difference between the signal separation device of the present embodiment and the signal separation device 1 of the first embodiment is a representative value generation unit and a limited signal generation unit. That is, a representative value generation unit 430 (FIG. 13) is provided instead of the representative value generation unit 30 (FIG. 1) of the signal separation device 1 of the first embodiment, and the limited signal generation unit 50 of the signal separation device 1 is provided. Limited signal generator 450-k (FIG. 13) is provided instead of -k (FIG. 1). Other configurations are the same as those of the first embodiment.
- FIG. 14 is a flowchart for explaining signal separation processing in the present embodiment. Hereinafter, the signal separation processing of the present embodiment will be described with reference to this flowchart.
- the signal separation processor 3 executes the following processing under the control of the control unit 10.
- the signal separation processor 3 accesses the storage unit 2 under the control of the control unit 10, sequentially reads each observation signal value X (t) therefrom, and sends it to the frequency domain conversion unit 20 (step S21).
- The frequency domain transform unit 20 sequentially converts these signal values into frequency-domain observed signal values X(f, m) by a short-time discrete Fourier transform or the like and stores them in the temporary storage unit 90.
- The clustering unit 432 reads the frequency-domain observed signal values X_1(f, m), ..., X_M(f, m) stored in the temporary storage unit 90 (FIG. 1). The clustering unit 432 then clusters the observed signal vector $X(f, m) = [X_1(f, m), \ldots, X_M(f, m)]^T$ composed of these values.
- the purpose of clustering is to classify samples (observed signal vector X (f, m)) in which the same signal source is dominant (having a main component) into the same cluster.
- The resulting N clusters C_1(f), ..., C_N(f) need not be mutually disjoint (that is, C_i(f) ∩ C_j(f) need not be the empty set for i ≠ j).
- So that clustering can be performed properly, that is, so that samples (observed signal vectors X(f, m)) in which the same signal source is dominant are classified into the same cluster, the clustering unit 432 in this example normalizes each sample before clustering.
- The normalization unit 432a (FIG. 13) reads the observed signal vector X(f, m) from the temporary storage unit 90 (FIG. 1).
- the normalization unit 432a in this example performs the normalization of Expressions (28) and (29), and further performs
- The cluster generation unit 432b performs clustering on the normalized result.
- Here, $\|X(f, m)\|$ is the norm of $X(f, m)$, for example the $L_k$ norm defined by $\|X(f, m)\|_k = \left(\sum_{j=1}^{M} |X_j(f, m)|^k\right)^{1/k}$.
- For the clustering performed by the cluster generation unit 432b, a method described in many textbooks, such as hierarchical clustering or k-means clustering, can be used (see, for example, “Translation of Patterns” by Morio Onoe, New Technology Communications, ISBN 4-915851-24-9, Chapter 10).
- In any of these clustering methods, a distance between two samples X(f, m) and X′(f, m) is defined, the closeness of the samples is measured by that distance, and clustering is performed so that samples close to each other are included in the same cluster.
- For example, the cluster generation unit 432b performs clustering using the cosine distance between two normalized observation signal vectors as the distance measure. The cosine distance between two samples X(f, m) and X′(f, m) is defined by equation (32).
- Alternatively, the cluster generation unit 432b performs clustering using, as the distance measure, a norm ‖X(f, m) − X′(f, m)‖ of the difference between two normalized observation signal vectors, or the cosine distance (equation (32)) (end of the description of [Details of processing in clustering section 432]).
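Norm normalization and a cosine-type distance between complex observation vectors can be sketched as follows. Equation (32) is not legible in this text, so the form used here, one minus the magnitude of the normalized inner product, is an assumption, as are the function names and the choice of the L2 norm.

```python
import math


def normalize(x):
    """Scale a complex observation vector to unit L2 norm."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in x))
    return [v / norm for v in x]


def cosine_distance(x, y):
    """1 - |x^H y| / (||x|| ||y||); assumed form, not taken from equation (32)."""
    inner = sum(a.conjugate() * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(abs(a) ** 2 for a in x))
    norm_y = math.sqrt(sum(abs(b) ** 2 for b in y))
    return 1.0 - abs(inner) / (norm_x * norm_y)
```

Either function can be handed to a standard k-means or hierarchical clustering routine as the sample preprocessing step and the distance measure, respectively.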
- The representative value calculation unit 433 sequentially extracts each cluster C_i(f) stored in the temporary storage unit 90 (FIG. 1) and calculates a representative vector a_i(f) (corresponding to the “second vector”) representing each cluster C_i(f) (step S24).
- For example, the representative vector generation unit 433a (FIG. 13) of the representative value calculation unit 433 sequentially extracts each cluster C_i(f) stored in the temporary storage unit 90 (FIG. 1) and calculates, as the representative vector a_i(f), the mean of the sample values X(f, m) belonging to that cluster.
- Alternatively, the samples X(f, m) belonging to each cluster C_i(f) may be appropriately quantized, the most frequent value obtained, and this used as the representative vector a_i(f).
- The representative vector a_i(f) obtained in this manner is stored in the temporary storage unit 90 (FIG. 1).
- The reordering unit 433b (FIG. 13) reads these representative vectors a_1(f), ..., a_N(f) from the temporary storage unit 90 (FIG. 1) and changes the subscript i of each representative vector a_i(f) so that the correspondence between each representative vector a_1(f), ..., a_N(f) and each source signal s(t) is the same at all frequencies f (step S25).
- the reordering unit 433b uses the read representative vector a (f) of each frequency f,
- d_j is the position of sensor j
- v is the propagation speed of the signal
- a_ji(f) is the j-th element of the representative vector a_i(f)
- For d and v, data stored in advance in the temporary storage unit 90, for example, is used.
- the calculated estimated values ⁇ (f) are stored in the temporary storage unit 90 (FIG. 1), for example, in correspondence with the representative vector a (f) used for the calculation.
- the sorting unit 433b Fig. 13
- each estimated value ⁇ . (F) is read from the temporary storage unit 90, and these are rearranged in a predetermined order (for example, ascending order, descending order, etc.) for each frequency f.
- This rearrangement is performed by, for example, a known rearrangement algorithm.
- the reordering unit 433b reads this order information j'(f, a_i(f)) from the temporary storage unit 90 and, according to the order information j'(f, a_i(f)), changes the correspondence between each representative vector and the subscript i (that is, replaces the subscript i of a_i(f)). Each representative vector a_i(f) with its subscript i replaced is then stored in the temporary storage unit 90 (FIG. 1).
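The reordering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the per-frequency estimated values have already been calculated and are passed in as an array, and the function name `align_permutation` and the array layout are hypothetical.

```python
import numpy as np

def align_permutation(rep_vectors, estimates, descending=False):
    """Reorder representative vectors at each frequency by sorted estimates.

    rep_vectors: array of shape (F, M, N); column i at bin f is a_i(f).
    estimates:   real array of shape (F, N); one estimated value per column.
    Returns (aligned, order), where order[f, j] is the original column
    index placed at position j for frequency bin f.
    """
    rep_vectors = np.asarray(rep_vectors)
    estimates = np.asarray(estimates, dtype=float)
    # Sort the estimated values independently at every frequency bin.
    order = np.argsort(estimates, axis=1, kind="stable")
    if descending:
        order = order[:, ::-1]
    # Apply the same ordering to the columns (the subscript i) of each bin.
    aligned = np.take_along_axis(rep_vectors, order[:, None, :], axis=2)
    return aligned, order
```

After this step, position j holds the vector with the j-th smallest (or largest) estimated value at every frequency, so the same position refers to the same source across bins as long as the estimates order the sources consistently.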
- the mask control unit 40 substitutes data specifying a set G_0, whose elements are the respective representative vectors a_i(f), for a variable SG_0 and stores the variable SG_0 in the temporary storage unit 90.
- the mask control unit 40 sets a value obtained by adding 1 to the variable k stored in the temporary storage unit 90 as a new variable k and stores it again in the temporary storage unit 90 (step S27).
- the mask control unit 40 reads the variables SG_0 and SG from the temporary storage unit 90 (FIG. 1)
- and selects V (≤ M) representative vectors a_p(f) (p = 1, ..., V) (corresponding to the “third vectors”) that include elements of the complement G^c of the set G specified by SG (the superscript c denotes the complement) (step S28).
- the mask control unit 40 sets each representative vector a (f
- the mask generation unit 451-k (FIG. 13) of the limited signal generation unit 450-k reads the variables SG_k and SG_0 and the observed signal vector X(f, m) from the temporary storage unit 90 (FIG. 1) and generates the following mask M_k(f, m) (step S29).
- D(X(f, m), a_i(f)) is the squared Mahalanobis distance between the vectors X(f, m) and a_i(f).
- the mask M_k(f, m) is stored in the temporary storage unit 90 (FIG. 1), and the limited signal extraction unit 452-k (FIG. 13) reads the mask M_k(f, m) and the observed signal vector X(f, m) from the temporary storage unit 90.
- the limited signal separation unit 60-k separates the limited signals using the limited signal values X_k(f, m) (step S31).
- the limited signal value X ′ (f, m) is V k (l
- the inverse matrix (the Moore-Penrose pseudo-inverse matrix in the case of N ≠ M) of the separation matrix W(f) extracted from the temporary storage unit 90 by the permutation/scaling solution unit 62-k
- the permutation/scaling solution unit 62-k generates a representative value a for the separated signal Y.
- the permutation/scaling solution unit 62-k extracts the separation matrix W(f) from the temporary storage unit 90 and extracts each of its rows w(f).
- the scaling problem of the ICA is solved, and the separation matrix W (f) after the scaling problem is solved is stored in the temporary storage unit 90.
- each separated signal value Y_kq to which the tag Π_kq has been added is sent to the time domain transform unit 70-k.
- the time domain transform unit 70-k converts each separated signal value Y_kq obtained in the time-frequency domain into a signal value in the time domain by, for example, a short-time inverse discrete Fourier transform.
- the time domain transform unit 70-k extracts the tag Π_kq associated with the frequency-domain signal value Y_kq from the temporary storage unit 90 for each frequency and time.
- the time domain transform unit 70-k determines whether the tags Π_kq at each frequency and time are all equal. If they are all equal, the tag Π_kq associated with the frequency-domain signal value Y_kq is associated with the time-domain signal value y_kq as its tag.
- if they are not all equal, the tag of the time-domain signal value y_kq is determined by majority decision.
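The tag decision for a time-domain signal can be sketched as below. This is an illustrative sketch only: the function name `time_domain_tag` is hypothetical, and tags are assumed to be hashable values (for example, the index of a representative vector).

```python
from collections import Counter

def time_domain_tag(tags):
    """Pick one tag for a time-domain signal from its per-bin tags.

    tags: iterable of hashable tags, one per (frequency, time) point.
    If all tags agree, that tag is returned; otherwise the most
    frequent tag is chosen (majority decision).
    """
    counts = Counter(tags)
    if not counts:
        raise ValueError("at least one tag is required")
    tag, _ = counts.most_common(1)[0]
    return tag
```

The unanimous case needs no separate branch, because a tag shared by every point is also the most frequent one.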
- the mask control unit 40 extracts the variables SG and SG_k from the temporary storage unit 90,
- and the variable SG specifying the new set G is stored in the temporary storage unit 90 (step S34). The mask control unit 40 also reads the variables SG and SG_0 from the temporary storage unit 90 and determines whether this new set G is equal to the set G_0 (step S35).
- if G is not equal to G_0, the process returns to step S27.
- selection and integration are performed to obtain all N separated signals (step S36). Specifically, for example, the signal integration unit 80 first reads each separated signal y_kq(t)
- the signal integration unit 80 determines that all the separated signal values y (t)
- the signal integration unit 80 either appropriately selects one of the separated signal values having the same tag and outputs it as the final separated signal value y_i(t), or calculates the average of the separated signal values having the same tag and uses it as the output signal (step S37).
- in the former case, one of the separated signal values y_kq(t) is appropriately selected and output as the final separated signal value y_i(t).
- the signal integration unit 80 outputs, for example, the signal having the maximum power among the separated signal values y_kq(t) having the same tag a_i as the final separated signal value y_i(t).
- in the case of a process of outputting the average of the separated signal values having the same tag as the final separated signal value y_i(t), the signal integration unit 80
- N signals are separated with little distortion.
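The two integration options (select one signal per tag, or average the signals sharing a tag) can be sketched as follows. This is a minimal sketch under assumptions: the function name `integrate_by_tag` and the `mode` parameter are hypothetical, all signals are assumed to have equal length, and "power" is taken to be the mean squared amplitude.

```python
import numpy as np

def integrate_by_tag(signals, tags, mode="max_power"):
    """Merge time-domain separated signals that share a tag.

    signals: list of 1-D arrays of equal length.
    tags:    list of hashable tags, one per signal.
    mode:    "max_power" keeps the signal with the largest mean power;
             "average" averages all signals sharing the tag.
    Returns a dict mapping each tag to one output signal.
    """
    groups = {}
    for sig, tag in zip(signals, tags):
        groups.setdefault(tag, []).append(np.asarray(sig, dtype=float))
    out = {}
    for tag, members in groups.items():
        if mode == "average":
            out[tag] = np.mean(members, axis=0)
        elif mode == "max_power":
            out[tag] = max(members, key=lambda s: float(np.mean(s ** 2)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```

Selecting the maximum-power signal keeps one unmodified estimate per source, while averaging combines the redundant estimates obtained from different masks.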
- a limited signal value may be generated directly, without generating the mask M_k(f, m). That is, for example, the limited signal generation unit 450-k may determine whether the observed signal vector X(f, m) satisfies
- Equation (39): $\max_{a_p(f) \in G_k} D(X(f,m), a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m), a_q(f))$
- and the observed signal vectors X(f, m) determined to satisfy it may be extracted as values of the signals emitted from the signal sources.
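The criterion of Equation (39) can be illustrated for one frequency bin as follows. This is a hedged sketch, not the patent's implementation: the relation symbol is not legible in this text and is assumed here to be a strict "less than"; the covariance matrix used in the Mahalanobis distance is not given, so an identity matrix is assumed by default (which reduces the distance to a squared Euclidean distance); the function names are hypothetical.

```python
import numpy as np

def sq_mahalanobis(x, a, cov_inv=None):
    """Squared Mahalanobis distance between vectors x and a.

    cov_inv is the inverse covariance matrix; identity is assumed if omitted.
    """
    d = np.asarray(x) - np.asarray(a)
    if cov_inv is None:
        cov_inv = np.eye(d.shape[0])
    return float(np.real(np.conj(d) @ cov_inv @ d))

def limited_mask(X, rep_vectors, selected):
    """Binary mask over time frames for one frequency bin.

    X:           array (M, T), observed vectors X(f, m) as columns.
    rep_vectors: array (M, N), representative vectors as columns.
    selected:    indices of the columns that form the subset G_k.
    A frame gets 1 when its largest distance to G_k is smaller than its
    smallest distance to the complement of G_k, and 0 otherwise.
    """
    n = rep_vectors.shape[1]
    inside = sorted(set(selected))
    outside = [i for i in range(n) if i not in inside]
    if not inside or not outside:
        raise ValueError("G_k and its complement must both be non-empty")
    mask = np.zeros(X.shape[1])
    for t in range(X.shape[1]):
        dist = [sq_mahalanobis(X[:, t], rep_vectors[:, i]) for i in range(n)]
        worst_in = max(dist[i] for i in inside)
        best_out = min(dist[i] for i in outside)
        mask[t] = 1.0 if worst_in < best_out else 0.0
    return mask
```

Multiplying each column of `X` by the corresponding mask value, or simply keeping the columns where the test holds, gives the limited signal values directly.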
- This embodiment is an embodiment according to the third invention.
- FIG. 15 is a block diagram illustrating the configuration of a blind signal separation device 500 according to the present embodiment.
- the arrows in this figure indicate the flow of data, but the flow of data to and from the control unit 521 and the temporary storage unit 522 is omitted. That is, even when data passes through the control unit 521 or the temporary storage unit 522, the description of that passage is omitted.
- the signal separation device 500 of the present embodiment includes a storage unit 501 and a signal separation processor 502 electrically connected to the storage unit 501 by wire or wirelessly.
- examples of the storage unit 501 include magnetic recording devices such as a hard disk device, a flexible disk, and a magnetic tape; optical disk devices such as a DVD-RAM (Random Access Memory) and a CD-R (Recordable) / RW (Rewritable); magneto-optical recording devices such as an MO (Magneto-Optical disc); and semiconductor memories such as an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) and a flash memory. Further, the storage unit 501 may be present in the same housing as the signal separation processor 502, or may be configured in a separate housing.
- the signal separation processor 502 in this example is hardware constituted by, for example, a processor, a RAM, and the like, and includes a frequency domain transforming section 511, a mixing matrix estimating section 512, a permutation problem solving section 513, a scaling problem solving section 514, a column selection section 516, a matrix generation section 517, a separation matrix generation section 518, a separation signal generation section 519, a time domain conversion section 520, a control section 521, and a temporary storage section 522.
- the mixing matrix estimating unit 512 of this example includes a clustering unit 512a, a representative vector calculation unit 512b, and a vector integration unit 512c. Further, the clustering unit 512a has a normalization unit 512aa and a cluster generation unit 512ab.
- FIG. 16 is a flowchart for explaining the entire processing of the signal separation device 500 in the present embodiment. Hereinafter, the processing of the signal separation device 500 will be described with reference to the figures. In the following, a case will be described where signals emitted from N (N ≥ 2) signal sources are mixed and observed by M sensors.
- the signal separation device 500 executes the following processing under the control of the control unit 521. First, the values x_1(t), ..., x_M(t) of the observed signals observed by the M sensors are written as
- the frequency domain transforming section 511 converts these observed signal values x_1(t), ..., x_M(t) into time-frequency-domain values by a short-time discrete Fourier transform.
- the generated estimated mixing matrix A (f) is stored in the temporary storage unit 522.
- the permutation problem solving unit 513 reads the estimated mixing matrix A(f) from the temporary storage unit 522 and sorts the columns of the estimated mixing matrix A(f) to solve the permutation problem (step S55). In this process, the separated signal values Y_1(f, m), ..., Y_N(f, m) can be used as feedback, in which case the permutation problem can be solved more accurately.
- the scaling problem solving unit 514 normalizes the columns of the estimated mixing matrix A(f) to solve the scaling problem (step S56), and then, using this estimated mixing matrix A(f), the separation matrix generation unit 518 generates a separation matrix W(f, m) (step S57).
- using the separation matrix W(f, m), a separated signal vector Y(f, m) = [Y_1(f, m), ..., Y_N(f, m)] is calculated (step S58).
- the output separated signal values Y_1(f, m), ..., Y_N(f, m) are stored in the temporary storage unit 522.
- the time domain transform unit 520 converts the separated signal values Y_1(f, m), ..., Y_N(f, m), whereby separated signal values y_1(t), ..., y_N(t) in the time domain are obtained (step S59).
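Steps S56 to S58 can be illustrated for a single frequency bin as follows. This sketch rests on several assumptions that the text does not confirm: it covers only the case N ≤ M, it uses one separation matrix for all frames (whereas the text writes W(f, m), which may vary with m), it normalizes columns to unit norm as one possible scaling convention, and it takes the Moore-Penrose pseudo-inverse of the normalized matrix as the separation matrix. The function name `separate_bin` is hypothetical.

```python
import numpy as np

def separate_bin(A_est, X):
    """Separate one frequency bin given an estimated mixing matrix.

    A_est: complex array (M, N), estimated mixing matrix A(f), N <= M.
    X:     complex array (M, T), observed vectors X(f, m) as columns.
    Returns (W, Y): separation matrix (N, M) and separated signals (N, T).
    """
    A_est = np.asarray(A_est, dtype=complex)
    norms = np.linalg.norm(A_est, axis=0)
    if np.any(norms == 0):
        raise ValueError("estimated mixing matrix has a zero column")
    A_norm = A_est / norms          # assumed scaling fix: unit-norm columns
    W = np.linalg.pinv(A_norm)      # Moore-Penrose pseudo-inverse
    Y = W @ np.asarray(X)           # separated signal vectors, one per frame
    return W, Y
```

With this convention each separated signal equals the corresponding source multiplied by the norm of its mixing column, which is the usual scaling ambiguity made explicit.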
- the clustering unit 512a combines the observed signal components X_1(f, m), ..., X_M(f, m) of all the sensors read from the temporary storage unit 522 into an observed signal vector X(f, m).
- N clusters C_i(f), equal in number to the signal sources, are generated by clustering, and these are stored in the temporary storage unit 522 (step S52).
- the purpose of clustering is to classify samples (observed signal vectors X(f, m)) in which the same signal source is dominant (is the main component) into the same cluster. Note that the obtained N clusters C_1(f), ..., C_N(f) are
- C_i(f) ∩ C_j(f) is an empty set, i ≠ j) and that do not belong to a cluster
- the representative vector calculation section 512b reads each cluster C_i(f) from the temporary storage section 522 and calculates the average value of the samples X(f, m) belonging to each cluster C_i(f),
- $a_i(f) = \sum_{X(f,m) \in C_i(f)} X(f,m) \,/\, |C_i(f)|$,
- as the representative vector a_i(f) for each signal source (step S53).
- the samples X(f, m) belonging to each cluster C_i(f) may instead be appropriately quantized, the most probable value obtained, and this value used as the representative vector a_i(f).
- estimated mixing matrix A(f) = [a_1(f), ..., a_N(f)]
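The averaging in step S53 and the assembly of the estimated mixing matrix can be sketched as follows. The function name `representative_vectors` and the use of an integer label array to describe cluster membership are assumptions made for illustration.

```python
import numpy as np

def representative_vectors(samples, labels, n_clusters):
    """Average the samples of each cluster at one frequency bin.

    samples: complex array (T, M), one observed vector X(f, m) per row.
    labels:  int array (T,), cluster index of each sample; -1 = unassigned.
    Returns an (M, n_clusters) matrix whose column i is the mean of
    cluster i, i.e. an estimated mixing matrix for this bin.
    """
    samples = np.asarray(samples)
    labels = np.asarray(labels)
    A = np.zeros((samples.shape[1], n_clusters), dtype=complex)
    for i in range(n_clusters):
        members = samples[labels == i]
        if len(members) == 0:
            raise ValueError(f"cluster {i} is empty")
        A[:, i] = members.mean(axis=0)
    return A
```

The column order of the result is simply the cluster order, so it still carries the permutation and scaling ambiguities that the later steps resolve.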
- the estimated mixing matrix A(f) includes arbitrariness in the order of its vectors (arbitrariness of permutation) and arbitrariness in the magnitude of each vector (arbitrariness of scaling). That is, the representative vector a_i(f) is
- Π is a permutation expressing the arbitrariness of permutation.
- so that samples (observed signal vectors X(f, m)) in which the same signal source is dominant are properly clustered together, the clustering unit 512a in this example performs clustering after each sample has been normalized by the normalization unit 512aa.
- the normalization unit 512aa in this example further includes:
- Clustering is performed after normalization of.
- ||X(f, m)|| is the norm of X(f, m).
- as the clustering method, for example, a method described in many textbooks, such as hierarchical clustering or k-means clustering, is used (for example, "Translation by Morio Onoe")
- in each clustering method, the distance between two samples X(f, m) and X'(f, m) is defined, the closeness between samples is measured according to that distance, and clustering is performed so that samples close to each other are included in the same cluster.
- the clustering unit 512a performs clustering using the cosine distance between two normalized observed signal vectors X(f, m) as the distance measure.
- the cosine distance between the two samples X (f, m) and X '(f, m) is
- the clustering unit 512a causes the cluster generation unit 512ab to calculate the distance between the two normalized observation signal vectors.
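The normalization and distance calculation can be sketched as below. The exact normalization formula and the cosine distance formula are not legible in this text, so both are assumptions: the sample is scaled to unit norm and rotated so that a reference element is real and non-negative, and the distance is taken as one minus the modulus of the normalized Hermitian inner product. The function names and the `ref` parameter are hypothetical.

```python
import numpy as np

def normalize_sample(x, ref=0, eps=1e-12):
    """Scale x to unit norm and rotate it so element `ref` is real and >= 0."""
    x = np.asarray(x, dtype=complex)
    norm = np.linalg.norm(x)
    if norm < eps:
        return x
    # Unit-modulus phase factor of the reference element (1 if it is ~0).
    phase = x[ref] / abs(x[ref]) if abs(x[ref]) > eps else 1.0
    return x / (norm * phase)

def cosine_distance(x, y, eps=1e-12):
    """1 - |x^H y| / (||x|| ||y||); 0 for parallel vectors, 1 for orthogonal."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom < eps:
        return 1.0
    # np.vdot conjugates its first argument, giving the Hermitian product.
    return float(1.0 - abs(np.vdot(x, y)) / denom)
```

Under this normalization, samples that are complex multiples of the same mixing vector map to the same point, so a standard method such as k-means can group them with this distance.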
- the representative vector a_i(f) of each cluster C_i is estimated as the mixing vector h(f) (large
- in cluster C_i, only a certain source signal S is dominant and the other source signals are close to zero.
- the observed signal vectors X(f, m) normalized by Equation (36) can be seen to gather on the straight line of the vector obtained by multiplying the mixing vector h(f) by sign*(H(f)).
- the position on the straight line depends on the magnitude |S(f, m)| of the signal source
- the columns of the estimated mixing matrix A(f) calculated at each frequency f are rearranged so that the representative vectors a_i(f) for the same signal source s_k(t) are the same at all frequencies f (step S55). That is, the subscript i is reassigned so that the correspondence between each separated signal Y_1(f, m), ..., Y_N(f, m) and each signal source is the same at each frequency f.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Quality & Reliability (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Indication And Recording Devices For Special Purposes And Tariff Metering Devices (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE602004022175T DE602004022175D1 (de) | 2003-09-02 | 2004-09-01 | Signaltrennverfahren, signaltrenneinrichtung,signaltrennprogramm und aufzeichnungsmedium |
JP2005513646A JP3949150B2 (ja) | 2003-09-02 | 2004-09-01 | 信号分離方法、信号分離装置、信号分離プログラム及び記録媒体 |
EP04772585A EP1662485B1 (en) | 2003-09-02 | 2004-09-01 | Signal separation method, signal separation device, signal separation program, and recording medium |
US10/539,609 US7496482B2 (en) | 2003-09-02 | 2004-09-01 | Signal separation method, signal separation device and recording medium |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003-309720 | 2003-09-02 | ||
JP2003309720 | 2003-09-02 | ||
JP2004-195818 | 2004-07-01 | ||
JP2004195818 | 2004-07-01 | ||
JP2004195867 | 2004-07-01 | ||
JP2004-195867 | 2004-07-01 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2005024788A1 true WO2005024788A1 (ja) | 2005-03-17 |
WO2005024788A9 WO2005024788A9 (ja) | 2007-05-18 |
Family
ID=34279554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/012629 WO2005024788A1 (ja) | 2003-09-02 | 2004-09-01 | 信号分離方法、信号分離装置、信号分離プログラム及び記録媒体 |
Country Status (5)
Country | Link |
---|---|
US (1) | US7496482B2 (ja) |
EP (2) | EP1662485B1 (ja) |
JP (1) | JP3949150B2 (ja) |
DE (2) | DE602004022175D1 (ja) |
WO (1) | WO2005024788A1 (ja) |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006090589A1 (ja) * | 2005-02-25 | 2006-08-31 | Pioneer Corporation | 音分離装置、音分離方法、音分離プログラムおよびコンピュータに読み取り可能な記録媒体 |
JP2006330687A (ja) * | 2005-04-28 | 2006-12-07 | Nippon Telegr & Teleph Corp <Ntt> | 信号分離装置、信号分離方法、そのプログラムおよび記録媒体 |
WO2007083814A1 (ja) * | 2006-01-23 | 2007-07-26 | Kabushiki Kaisha Kobe Seiko Sho | 音源分離装置及び音源分離方法 |
JP2007243326A (ja) * | 2006-03-06 | 2007-09-20 | Mitsubishi Electric Corp | 信号分離方法およびその方法を使用した信号分離装置 |
JP2007295085A (ja) * | 2006-04-21 | 2007-11-08 | Kobe Steel Ltd | 音源分離装置及び音源分離方法 |
JP2008052117A (ja) * | 2006-08-25 | 2008-03-06 | Oki Electric Ind Co Ltd | 雑音除去装置、方法及びプログラム |
JP2008134298A (ja) * | 2006-11-27 | 2008-06-12 | Megachips System Solutions Inc | 信号処理装置、信号処理方法およびプログラム |
WO2008072566A1 (ja) * | 2006-12-12 | 2008-06-19 | Nec Corporation | 信号分離再生装置および信号分離再生方法 |
JP2008158035A (ja) * | 2006-12-21 | 2008-07-10 | Nippon Telegr & Teleph Corp <Ntt> | 多音源有音区間判定装置、方法、プログラム及びその記録媒体 |
JP2008203474A (ja) * | 2007-02-20 | 2008-09-04 | Nippon Telegr & Teleph Corp <Ntt> | 多信号強調装置、方法、プログラム及びその記録媒体 |
JP2008219458A (ja) * | 2007-03-05 | 2008-09-18 | Kobe Steel Ltd | 音源分離装置,音源分離プログラム及び音源分離方法 |
JP2008227916A (ja) * | 2007-03-13 | 2008-09-25 | Nippon Telegr & Teleph Corp <Ntt> | 信号分離装置、信号分離方法、信号分離プログラム、記録媒体 |
JPWO2006132249A1 (ja) * | 2005-06-06 | 2009-01-08 | 国立大学法人佐賀大学 | 信号分離装置 |
WO2010005050A1 (ja) * | 2008-07-11 | 2010-01-14 | 日本電気株式会社 | 信号分析装置、信号制御装置及びその方法と、プログラム |
WO2010092913A1 (ja) * | 2009-02-13 | 2010-08-19 | 日本電気株式会社 | 多チャンネル音響信号処理方法、そのシステム及びプログラム |
WO2010092915A1 (ja) * | 2009-02-13 | 2010-08-19 | 日本電気株式会社 | 多チャンネル音響信号処理方法、そのシステム及びプログラム |
JP2010217773A (ja) * | 2009-03-18 | 2010-09-30 | Yamaha Corp | 信号処理装置およびプログラム |
JP2011027825A (ja) * | 2009-07-22 | 2011-02-10 | Sony Corp | 音声処理装置、音声処理方法およびプログラム |
JP2011107602A (ja) * | 2009-11-20 | 2011-06-02 | Sony Corp | 信号処理装置、および信号処理方法、並びにプログラム |
US20110164567A1 (en) * | 2006-04-27 | 2011-07-07 | Interdigital Technology Corporation | Method and apparatus for performing blind signal separation in an ofdm mimo system |
JP2012507049A (ja) * | 2008-10-24 | 2012-03-22 | クゥアルコム・インコーポレイテッド | コヒーレンス検出のためのシステム、方法、装置、およびコンピュータ可読媒体 |
WO2012105386A1 (ja) * | 2011-02-01 | 2012-08-09 | 日本電気株式会社 | 有音区間検出装置、有音区間検出方法、及び有音区間検出プログラム |
WO2012105385A1 (ja) * | 2011-02-01 | 2012-08-09 | 日本電気株式会社 | 有音区間分類装置、有音区間分類方法、及び有音区間分類プログラム |
JP2013504283A (ja) * | 2009-09-07 | 2013-02-04 | クゥアルコム・インコーポレイテッド | マルチチャネル信号の残響除去のためのシステム、方法、装置、およびコンピュータ可読媒体 |
JP2013070395A (ja) * | 2008-01-29 | 2013-04-18 | Qualcomm Inc | 高度に相関する混合のための強調ブラインド信号源分離アルゴリズム |
US8620672B2 (en) | 2009-06-09 | 2013-12-31 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal |
JP2014077899A (ja) * | 2012-10-11 | 2014-05-01 | Institute Of National Colleges Of Technology Japan | 信号処理方法、装置、プログラム、およびプログラムを記録したコンピュータ読み取り可能な記録媒体 |
JP2014089249A (ja) * | 2012-10-29 | 2014-05-15 | Mitsubishi Electric Corp | 音源分離装置 |
US8954324B2 (en) | 2007-09-28 | 2015-02-10 | Qualcomm Incorporated | Multiple microphone voice activity detector |
CN110491410A (zh) * | 2019-04-12 | 2019-11-22 | 腾讯科技(深圳)有限公司 | 语音分离方法、语音识别方法及相关设备 |
CN115810364A (zh) * | 2023-02-07 | 2023-03-17 | 海纳科德(湖北)科技有限公司 | 混音环境中的端到端目标声信号提取方法及系统 |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1942932B (zh) * | 2005-02-08 | 2010-07-28 | 日本电信电话株式会社 | 信号分离装置和信号分离方法 |
JP2007034184A (ja) * | 2005-07-29 | 2007-02-08 | Kobe Steel Ltd | 音源分離装置,音源分離プログラム及び音源分離方法 |
US7472041B2 (en) * | 2005-08-26 | 2008-12-30 | Step Communications Corporation | Method and apparatus for accommodating device and/or signal mismatch in a sensor array |
US20070083365A1 (en) * | 2005-10-06 | 2007-04-12 | Dts, Inc. | Neural network classifier for separating audio sources from a monophonic audio signal |
US8130940B2 (en) * | 2005-12-05 | 2012-03-06 | Telefonaktiebolaget L M Ericsson (Publ) | Echo detection |
US8898056B2 (en) * | 2006-03-01 | 2014-11-25 | Qualcomm Incorporated | System and method for generating a separated signal by reordering frequency components |
US8131542B2 (en) * | 2007-06-08 | 2012-03-06 | Honda Motor Co., Ltd. | Sound source separation system which converges a separation matrix using a dynamic update amount based on a cost function |
US7987090B2 (en) * | 2007-08-09 | 2011-07-26 | Honda Motor Co., Ltd. | Sound-source separation system |
US8175871B2 (en) * | 2007-09-28 | 2012-05-08 | Qualcomm Incorporated | Apparatus and method of noise and echo reduction in multiple microphone audio systems |
US8755469B1 (en) * | 2008-04-15 | 2014-06-17 | The United States Of America, As Represented By The Secretary Of The Army | Method of spectrum mapping and exploitation using distributed sensors |
US8321214B2 (en) * | 2008-06-02 | 2012-11-27 | Qualcomm Incorporated | Systems, methods, and apparatus for multichannel signal amplitude balancing |
JP5277887B2 (ja) * | 2008-11-14 | 2013-08-28 | ヤマハ株式会社 | 信号処理装置およびプログラム |
EP2350926A2 (en) * | 2008-11-24 | 2011-08-03 | Institut Ruder Boskovic | Method of and system for blind extraction of more than two pure components out of spectroscopic or spectrometric measurements of only two mixtures by means of sparse component analysis |
EP2476008B1 (en) * | 2009-09-10 | 2015-04-29 | Rudjer Boskovic Institute | Underdetermined blind extraction of components from mixtures in 1d and 2d nmr spectroscopy and mass spectrometry by means of combined sparse component analysis and detection of single component points |
KR101612704B1 (ko) * | 2009-10-30 | 2016-04-18 | 삼성전자 주식회사 | 다중음원 위치 추적장치 및 그 방법 |
KR101419377B1 (ko) * | 2009-12-18 | 2014-07-15 | 배재대학교 산학협력단 | 암묵신호 분리 방법 및 이를 수행하는 장치 |
US8521477B2 (en) * | 2009-12-18 | 2013-08-27 | Electronics And Telecommunications Research Institute | Method for separating blind signal and apparatus for performing the same |
US8897455B2 (en) * | 2010-02-18 | 2014-11-25 | Qualcomm Incorporated | Microphone array subset selection for robust noise reduction |
JP5726790B2 (ja) * | 2012-03-09 | 2015-06-03 | 日本電信電話株式会社 | 音源分離装置、音源分離方法、およびプログラム |
JP6059072B2 (ja) * | 2013-04-24 | 2017-01-11 | 日本電信電話株式会社 | モデル推定装置、音源分離装置、モデル推定方法、音源分離方法及びプログラム |
JP2015135318A (ja) * | 2013-12-17 | 2015-07-27 | キヤノン株式会社 | データ処理装置、データ表示システム、試料データ取得システム、及びデータ処理方法 |
DE102015203003A1 (de) * | 2015-02-19 | 2016-08-25 | Robert Bosch Gmbh | Batteriespeichersystem mit unterschiedlichen Zelltypen |
US10991362B2 (en) * | 2015-03-18 | 2021-04-27 | Industry-University Cooperation Foundation Sogang University | Online target-speech extraction method based on auxiliary function for robust automatic speech recognition |
US11694707B2 (en) | 2015-03-18 | 2023-07-04 | Industry-University Cooperation Foundation Sogang University | Online target-speech extraction method based on auxiliary function for robust automatic speech recognition |
US10725174B2 (en) * | 2015-08-24 | 2020-07-28 | Hifi Engineering Inc. | Method and system for determining the distance to an acoustically reflective object in a conduit |
CN105352998B (zh) * | 2015-11-17 | 2017-12-26 | 电子科技大学 | 脉冲涡流红外热图像的独立成分个数确定方法 |
CN109285557B (zh) * | 2017-07-19 | 2022-11-01 | 杭州海康威视数字技术股份有限公司 | 一种定向拾音方法、装置及电子设备 |
US20190278551A1 (en) * | 2018-03-06 | 2019-09-12 | Silicon Video Systems, Inc. | Variable layout module |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3480477B2 (ja) * | 1995-07-26 | 2003-12-22 | ソニー株式会社 | 動き検出回路および動き検出方法、並びに輝度・色信号分離装置 |
JPH1084284A (ja) * | 1996-09-06 | 1998-03-31 | Sony Corp | 信号再生方法および装置 |
US6954494B2 (en) | 2001-10-25 | 2005-10-11 | Siemens Corporate Research, Inc. | Online blind source separation |
JP3975153B2 (ja) | 2002-10-28 | 2007-09-12 | 日本電信電話株式会社 | ブラインド信号分離方法及び装置、ブラインド信号分離プログラム並びにそのプログラムを記録した記録媒体 |
-
2004
- 2004-09-01 EP EP04772585A patent/EP1662485B1/en not_active Expired - Lifetime
- 2004-09-01 DE DE602004022175T patent/DE602004022175D1/de not_active Expired - Lifetime
- 2004-09-01 US US10/539,609 patent/US7496482B2/en not_active Expired - Fee Related
- 2004-09-01 DE DE602004027774T patent/DE602004027774D1/de not_active Expired - Lifetime
- 2004-09-01 JP JP2005513646A patent/JP3949150B2/ja not_active Expired - Fee Related
- 2004-09-01 WO PCT/JP2004/012629 patent/WO2005024788A1/ja active Application Filing
- 2004-09-01 EP EP09004195A patent/EP2068308B1/en not_active Expired - Lifetime
Non-Patent Citations (4)
Title |
---|
ARAKI S. ET AL.: "Jikan shuhasu masking to ICA no heiyo ni yoru ongensu > microphone-su no baai no blind ongen bunri", THE ACOUSTICAL SOCIETY OF JAPAN (ASJ) 2003 NEN SHUKI KENKYU HAPPYOKAI KOEN RONBUNSHU -I-, 17 September 2003 (2003-09-17), pages 587 - 588, XP002985749 * |
RICKARD S. ET AL.: "On the approximative W-disjoint orthogonality of speech", PROC. ICASSP, vol. 1, 2002, pages 529 - 532, XP002985747 * |
SARUWATARI H.: "Onsei.onkyo shingo o taisho toshita blind ongen bunri", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS GIJUTSU KENKYU HOKOKU UTSUSHIN HOSHIKI], vol. 101, no. 669, 25 February 2002 (2002-02-25), pages 59 - 66, XP002985748 * |
See also references of EP1662485A4 * |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006090589A1 (ja) * | 2005-02-25 | 2006-08-31 | Pioneer Corporation | 音分離装置、音分離方法、音分離プログラムおよびコンピュータに読み取り可能な記録媒体 |
JP2006330687A (ja) * | 2005-04-28 | 2006-12-07 | Nippon Telegr & Teleph Corp <Ntt> | 信号分離装置、信号分離方法、そのプログラムおよび記録媒体 |
JP4653674B2 (ja) * | 2005-04-28 | 2011-03-16 | 日本電信電話株式会社 | 信号分離装置、信号分離方法、そのプログラムおよび記録媒体 |
JPWO2006132249A1 (ja) * | 2005-06-06 | 2009-01-08 | 国立大学法人佐賀大学 | 信号分離装置 |
WO2007083814A1 (ja) * | 2006-01-23 | 2007-07-26 | Kabushiki Kaisha Kobe Seiko Sho | 音源分離装置及び音源分離方法 |
JP2007219479A (ja) * | 2006-01-23 | 2007-08-30 | Kobe Steel Ltd | 音源分離装置、音源分離プログラム及び音源分離方法 |
JP4496186B2 (ja) * | 2006-01-23 | 2010-07-07 | 株式会社神戸製鋼所 | 音源分離装置、音源分離プログラム及び音源分離方法 |
JP2007243326A (ja) * | 2006-03-06 | 2007-09-20 | Mitsubishi Electric Corp | 信号分離方法およびその方法を使用した信号分離装置 |
JP4650891B2 (ja) * | 2006-03-06 | 2011-03-16 | 三菱電機株式会社 | 信号分離方法およびその方法を使用した信号分離装置 |
JP2007295085A (ja) * | 2006-04-21 | 2007-11-08 | Kobe Steel Ltd | 音源分離装置及び音源分離方法 |
US8634499B2 (en) * | 2006-04-27 | 2014-01-21 | Interdigital Technology Corporation | Method and apparatus for performing blind signal separation in an OFDM MIMO system |
US20110164567A1 (en) * | 2006-04-27 | 2011-07-07 | Interdigital Technology Corporation | Method and apparatus for performing blind signal separation in an ofdm mimo system |
JP2008052117A (ja) * | 2006-08-25 | 2008-03-06 | Oki Electric Ind Co Ltd | 雑音除去装置、方法及びプログラム |
JP2008134298A (ja) * | 2006-11-27 | 2008-06-12 | Megachips System Solutions Inc | 信号処理装置、信号処理方法およびプログラム |
WO2008072566A1 (ja) * | 2006-12-12 | 2008-06-19 | Nec Corporation | 信号分離再生装置および信号分離再生方法 |
JP5131596B2 (ja) * | 2006-12-12 | 2013-01-30 | 日本電気株式会社 | 信号分離再生装置および信号分離再生方法 |
US8345884B2 (en) | 2006-12-12 | 2013-01-01 | Nec Corporation | Signal separation reproduction device and signal separation reproduction method |
JP4746533B2 (ja) * | 2006-12-21 | 2011-08-10 | 日本電信電話株式会社 | 多音源有音区間判定装置、方法、プログラム及びその記録媒体 |
JP2008158035A (ja) * | 2006-12-21 | 2008-07-10 | Nippon Telegr & Teleph Corp <Ntt> | 多音源有音区間判定装置、方法、プログラム及びその記録媒体 |
JP2008203474A (ja) * | 2007-02-20 | 2008-09-04 | Nippon Telegr & Teleph Corp <Ntt> | 多信号強調装置、方法、プログラム及びその記録媒体 |
JP2008219458A (ja) * | 2007-03-05 | 2008-09-18 | Kobe Steel Ltd | 音源分離装置,音源分離プログラム及び音源分離方法 |
JP2008227916A (ja) * | 2007-03-13 | 2008-09-25 | Nippon Telegr & Teleph Corp <Ntt> | 信号分離装置、信号分離方法、信号分離プログラム、記録媒体 |
US8954324B2 (en) | 2007-09-28 | 2015-02-10 | Qualcomm Incorporated | Multiple microphone voice activity detector |
JP2013070395A (ja) * | 2008-01-29 | 2013-04-18 | Qualcomm Inc | 高度に相関する混合のための強調ブラインド信号源分離アルゴリズム |
WO2010005050A1 (ja) * | 2008-07-11 | 2010-01-14 | 日本電気株式会社 | 信号分析装置、信号制御装置及びその方法と、プログラム |
JPWO2010005050A1 (ja) * | 2008-07-11 | 2012-01-05 | 日本電気株式会社 | 信号分析装置、信号制御装置及びその方法と、プログラム |
JP2012507049A (ja) * | 2008-10-24 | 2012-03-22 | クゥアルコム・インコーポレイテッド | コヒーレンス検出のためのシステム、方法、装置、およびコンピュータ可読媒体 |
US8724829B2 (en) | 2008-10-24 | 2014-05-13 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for coherence detection |
US8954323B2 (en) | 2009-02-13 | 2015-02-10 | Nec Corporation | Method for processing multichannel acoustic signal, system thereof, and program |
WO2010092913A1 (ja) * | 2009-02-13 | 2010-08-19 | 日本電気株式会社 | 多チャンネル音響信号処理方法、そのシステム及びプログラム |
WO2010092915A1 (ja) * | 2009-02-13 | 2010-08-19 | 日本電気株式会社 | 多チャンネル音響信号処理方法、そのシステム及びプログラム |
US9064499B2 (en) | 2009-02-13 | 2015-06-23 | Nec Corporation | Method for processing multichannel acoustic signal, system therefor, and program |
JP2010217773A (ja) * | 2009-03-18 | 2010-09-30 | Yamaha Corp | 信号処理装置およびプログラム |
US8620672B2 (en) | 2009-06-09 | 2013-12-31 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal |
JP2011027825A (ja) * | 2009-07-22 | 2011-02-10 | Sony Corp | 音声処理装置、音声処理方法およびプログラム |
JP2013504283A (ja) * | 2009-09-07 | 2013-02-04 | クゥアルコム・インコーポレイテッド | マルチチャネル信号の残響除去のためのシステム、方法、装置、およびコンピュータ可読媒体 |
JP2011107602A (ja) * | 2009-11-20 | 2011-06-02 | Sony Corp | 信号処理装置、および信号処理方法、並びにプログラム |
WO2012105385A1 (ja) * | 2011-02-01 | 2012-08-09 | 日本電気株式会社 | 有音区間分類装置、有音区間分類方法、及び有音区間分類プログラム |
WO2012105386A1 (ja) * | 2011-02-01 | 2012-08-09 | 日本電気株式会社 | 有音区間検出装置、有音区間検出方法、及び有音区間検出プログラム |
US9245539B2 (en) | 2011-02-01 | 2016-01-26 | Nec Corporation | Voiced sound interval detection device, voiced sound interval detection method and voiced sound interval detection program |
JP5994639B2 (ja) * | 2011-02-01 | 2016-09-21 | 日本電気株式会社 | 有音区間検出装置、有音区間検出方法、及び有音区間検出プログラム |
US9530435B2 (en) | 2011-02-01 | 2016-12-27 | Nec Corporation | Voiced sound interval classification device, voiced sound interval classification method and voiced sound interval classification program |
JP2014077899A (ja) * | 2012-10-11 | 2014-05-01 | Institute Of National Colleges Of Technology Japan | 信号処理方法、装置、プログラム、およびプログラムを記録したコンピュータ読み取り可能な記録媒体 |
JP2014089249A (ja) * | 2012-10-29 | 2014-05-15 | Mitsubishi Electric Corp | 音源分離装置 |
CN110491410A (zh) * | 2019-04-12 | 2019-11-22 | 腾讯科技(深圳)有限公司 | 语音分离方法、语音识别方法及相关设备 |
CN115810364A (zh) * | 2023-02-07 | 2023-03-17 | 海纳科德(湖北)科技有限公司 | 混音环境中的端到端目标声信号提取方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2005024788A1 (ja) | 2006-11-09 |
EP2068308B1 (en) | 2010-06-16 |
DE602004022175D1 (de) | 2009-09-03 |
EP2068308A3 (en) | 2009-07-08 |
DE602004027774D1 (de) | 2010-07-29 |
JP3949150B2 (ja) | 2007-07-25 |
US7496482B2 (en) | 2009-02-24 |
EP1662485A1 (en) | 2006-05-31 |
EP1662485A4 (en) | 2008-01-23 |
US20060058983A1 (en) | 2006-03-16 |
EP1662485B1 (en) | 2009-07-22 |
EP2068308A2 (en) | 2009-06-10 |
WO2005024788A9 (ja) | 2007-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
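WO2005024788A1 (ja) | | Signal separation method, signal separation device, signal separation program, and recording medium |
EP3479377B1 (en) | | Speech recognition |
CN109661705B (zh) | | Sound source separation device and method, and program |
US10176826B2 (en) | | Separating audio sources |
JP4406428B2 (ja) | | Signal separation device, signal separation method, signal separation program, and recording medium |
US20140078867A1 (en) | | Sound direction estimation device, sound direction estimation method, and sound direction estimation program |
JP6334895B2 (ja) | | Signal processing device, control method therefor, and program |
US20180070170A1 (en) | | Sound processing apparatus and sound processing method |
JP6345327B1 (ja) | | Speech extraction device, speech extraction method, and speech extraction program |
JP6992873B2 (ja) | | Sound source separation device, sound source separation method, and program |
JP6538624B2 (ja) | | Signal processing device, signal processing method, and signal processing program |
JP4769238B2 (ja) | | Signal separation device, signal separation method, program, and recording medium |
JP2019049685A (ja) | | Speech extraction device, speech extraction method, and speech extraction program |
JP2013167698A (ja) | | Device and method for estimating spectral shape features of a signal for each sound source; device, method, and program for estimating spectral features of a target signal |
WO2012023268A1 (ja) | | Multi-microphone speaker classification device, method, and program |
JP6973254B2 (ja) | | Signal analysis device, signal analysis method, and signal analysis program |
WO2021112066A1 (ja) | | Acoustic analysis device, acoustic analysis method, and acoustic analysis program |
WO2020184210A1 (ja) | | Noise spatial covariance matrix estimation device, noise spatial covariance matrix estimation method, and program |
JP5147012B2 (ja) | | Target signal section estimation device, target signal section estimation method, target signal section estimation program, and recording medium |
JP6915579B2 (ja) | | Signal analysis device, signal analysis method, and signal analysis program |
JP2019035851A (ja) | | Target sound source estimation device, target sound source estimation method, and target sound source estimation program |
Wei et al. | | Underdetermined Blind Source Separation Based on Spatial Estimation and Compressed Sensing |
Selouani et al. | | Evolutionary Algorithms and Speech Recognition |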
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 2005513646; Country of ref document: JP |
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 20048015707; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2004772585; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2006058983; Country of ref document: US; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 10539609; Country of ref document: US |
| | WWP | WIPO information: published in national office | Ref document number: 10539609; Country of ref document: US |
| | WWP | WIPO information: published in national office | Ref document number: 2004772585; Country of ref document: EP |