US10878832B2 - Mask estimation apparatus, mask estimation method, and mask estimation program - Google Patents
Mask estimation apparatus, mask estimation method, and mask estimation program
- Publication number
- US10878832B2 (application US15/998,742)
- Authority
- US
- United States
- Prior art keywords
- distribution
- component
- parameter
- masks
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
-
- G06K9/624—
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Definitions
- the present invention relates to a mask estimation apparatus, a mask estimation method, and a mask estimation program.
- a technology is known that estimates masks indicating the degree of contribution of each acoustic signal at each time-frequency point; such masks are used to estimate desired sound from observation signals acquired by a plurality of microphones.
- the masks are used for noise reduction or source separation of the observation signals.
- a position parameter and a spread parameter of a mixture distribution modeling the feature vectors are updated to adjust the positions and the spread of component distributions of the mixture distribution.
- a component distribution merely represents a shape of distribution that is rotationally symmetric about some axis.
- the shape of distribution of the feature vectors is affected by various factors including the positions of microphones and transfer characteristics in an acoustic space, and is not always rotationally symmetric.
- the shape of distribution of the feature vectors may be an elliptical shape of distribution.
- the mixture distribution cannot sufficiently approximate to the distribution of feature vectors, and the problem is that the accuracy of mask estimation is not always high.
- One example of an embodiment disclosed herein has been made in view of the above, and, for example, it is an object thereof to provide a mask estimation apparatus, a mask estimation method, and a mask estimation program for further improving the accuracy of mask estimation.
- a mask estimation apparatus extracts, from a plurality of observation signals acquired at different positions under a situation where a plurality of acoustic signals are mixed, feature vectors obtained by collecting time-frequency components of the observation signals for each time-frequency point.
- the mask estimation apparatus uses the feature vectors, a mixture weight of each component distribution, and a shape parameter that is a model parameter capable of controlling a shape of each component distribution to update masks indicating a proportion in which each component distribution contributes to each time-frequency point, where a probability distribution of the feature vectors is modeled by a mixture distribution consisting of a plurality of component distributions.
- the mask estimation apparatus updates the mixture weight based on the updated masks.
- the mask estimation apparatus updates the shape parameter based on the feature vectors and the masks.
- masks can be accurately estimated even when the distribution of the feature vectors is not rotationally symmetric.
- FIG. 1 is a diagram illustrating an example of a mask estimation apparatus according to a first embodiment (second embodiment).
- FIG. 2 is a flowchart illustrating an example of mask estimation processing according to the first embodiment (second embodiment).
- FIG. 3 is a diagram illustrating an example of a desired sound estimation system according to a third embodiment.
- FIG. 4 is a flowchart illustrating an example of desired sound estimation processing according to the third embodiment.
- FIG. 5 is a diagram illustrating an example of a computer in which a desired sound estimation system including the mask estimation apparatus and the desired sound estimation apparatus according to the embodiments is implemented by executing a computer program.
- FIG. 6 is a diagram illustrating an example of a mask estimation apparatus according to the background art.
- in the following, the expression "^A" for "A" is equivalent to "the symbol A with ^ (circumflex) above it". Furthermore, the expression "vector A" is used when "A" is a vector, the expression "matrix A" is used when "A" is a matrix, and the expression "set A" is used when "A" is a set.
- FIG. 6 is a diagram illustrating an example of a mask estimation apparatus according to the background art.
- a mask estimation apparatus 10 A according to the background art is connected with a storage unit 20 A.
- the mask estimation apparatus 10 A includes a feature extraction unit 11 A, a mask update unit 12 A, a mixture weight update unit 13 A, a position parameter update unit 14 A, and a spread parameter update unit 15 A.
- the storage unit 20 A stores therein a mixture weight for each component distribution, a position parameter indicating the position of each component distribution, and a spread parameter indicating the spread of each component distribution, where the probability distribution of feature vectors extracted by the feature extraction unit 11 A is modeled by a mixture distribution consisting of a plurality of component distributions.
- the feature extraction unit 11 A receives M (M>1) observation signals recorded at different positions in a situation where N (N>1) acoustic signals are mixed, makes an M-dimensional column vector obtained by collecting the time-frequency components of all observation signals for each time-frequency point, and uses the M-dimensional column vector to extract an M-dimensional feature vector having a predetermined norm.
- the mask update unit 12 A receives the feature vectors extracted by the feature extraction unit 11 A from the feature extraction unit 11 A, receives the mixture weight, the position parameter, and the spread parameter stored in the storage unit 20 A from the storage unit 20 A, and updates masks indicating the proportion in which each component distribution contributes to each time-frequency point.
- the mixture weight update unit 13 A receives the masks updated by the mask update unit 12 A, and updates the mixture weight.
- the position parameter update unit 14 A receives the feature vectors extracted by the feature extraction unit 11 A and the masks updated by the mask update unit 12 A, and updates the position parameter.
- the spread parameter update unit 15 A receives the feature vectors extracted by the feature extraction unit 11 A and the masks updated by the mask update unit 12 A, and updates the spread parameter.
- the above Literature 1 assumes that the number N of acoustic signals is already known and that N−1 thereof are desired sounds and one thereof is background noise.
- the case N>2 corresponds to the case where a conversation by N−1 persons is recorded by M microphones under the presence of background noise, for example.
- an M-dimensional column vector (hereinafter referred to as an “observation signal vector”) obtained by collecting, for each time-frequency point, time-frequency components of observation signals obtained by applying time-frequency analysis such as the short-time Fourier transform to M observation signals is represented by y(t,f).
- an observation signal vector y(t,f) takes one of the following N states at each time-frequency point.
- State S n (1≤n≤N−1): the state where only the n-th desired sound is present; State S N : the state where no desired sound is present.
- observation signal vector y(t,f) can be modeled by either of the following expressions (1-1) and (1-2).
- y(t,f)=s n(t,f)+v(t,f) (1-1)
- y(t,f)=v(t,f) (1-2)
- the above expression (1-1) represents the case where only the n-th of the desired sounds is present at the time-frequency point
- the above expression (1-2) represents the case where none of the desired sounds is present at the time-frequency point.
- the vector s n (t,f) in the above expressions (1-1) and (1-2) represents a component corresponding to the n-th desired sound
- the vector v(t,f) represents a component corresponding to background noise.
- the mask γ n (t,f) is defined as the posterior probability of the state S n at the time-frequency point (t,f).
- the masks ⁇ n (t,f) can be used for various applications, such as the estimation of each desired sound s n (t,f). For example, by using the masks ⁇ n (t,f) to collect the time-frequency components where the desired sound is present, statistics representing characteristics of the desired sound can be estimated. Each desired sound s n (t,f) can be estimated by a filter designed by using the statistics.
- the feature extraction unit 11 A extracts a feature vector representing information as to from which direction the sound arrives at each time-frequency point.
- the length (norm) of an observation signal vector y(t,f) depends on an acoustic signal itself, but it is assumed that the direction of the observation signal vector y(t,f) is determined by only the position of its sound source. Based on this assumption, an M-dimensional feature vector z(t,f) having a given norm is extracted from the observation signal vector y(t,f) as a feature vector representing the sound source position.
- the feature vector z(t,f) can be extracted by the following expression (2).
- ∥·∥ represents the 2-norm.
- the direction of arrival of sound is different depending on which of the states S n the time-frequency point takes.
- the feature vector z(t,f) has a different probability distribution for each state S n .
- Clustering is performed by estimating model parameters (set) ⁇ so that the mixture distribution in the above expression (3) approximates to the distribution of the feature vector z(t,f).
- Each component distribution of the mixture distribution in the above expression (3) is represented by the following expression (4).
- K is the number of components
- the model parameters (set) ⁇ represent a set ⁇ k (f),a k (f), ⁇ 2 k (f) ⁇ of the model parameters of the mixture distribution in the above expression (3)
- the superscript H represents Hermitian transposition of a matrix. If we compare the component distribution p w (z(t,f);a k (f),σ 2 k (f)) in the above expression (3) to a mountain, the mixture weight α k (f), as a model parameter of the mixture distribution in the above expression (3), corresponds to the height of the mountain, the position parameter a k (f) corresponds to the position of the mountain, and the spread parameter σ 2 k (f) corresponds to the spread of the foot of the mountain.
- the number K of components is equal to the number N of acoustic signals (assumed to be already known).
- the posterior probability that the time-frequency point corresponds to the k-th component distribution under the condition that the feature vector z(t,f) is observed (that is, the mask) is obtained by the following expression (5) based on Bayes' theorem.
- the question is how to estimate the model parameters ⁇ .
- the model parameters ⁇ and the masks ⁇ k (t,f) are estimated by alternating two steps, namely a step of updating the masks ⁇ k (t,f) by the above expression (5) using the model parameters ⁇ and a step of updating the model parameters ⁇ using the masks ⁇ k (t,f).
- the masks ⁇ k (t,f) are used to update the model parameters ⁇ by the following expressions (6-1) to (6-3).
- a k(f)=UNIT-NORM EIGENVECTOR CORRESPONDING TO MAXIMUM EIGENVALUE OF R k(f) (6-2)
- the matrix R k (f) is calculated by the following expression (7).
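- as a concrete illustration, the following is a minimal numpy sketch of the updates in the above expressions (6-2) and (7) for one frequency bin; since the display of expression (7) is not reproduced above, the usual mask-weighted covariance form is assumed for R k (f):

```python
import numpy as np

def update_position_parameter(Z, gamma_k):
    """Background-art M step for one component k and one frequency bin f.

    Z       : (T, M) complex array of feature vectors z(t, f)
    gamma_k : (T,)   masks gamma_k(t, f) of component k

    Assumes R_k(f) is the mask-weighted covariance of the feature vectors
    (a common form; the display of expression (7) is not reproduced here).
    """
    # Assumed expression (7): R_k(f) = sum_t gamma_k z z^H / sum_t gamma_k.
    R_k = np.einsum('t,tm,tn->mn', gamma_k, Z, Z.conj()) / gamma_k.sum()
    # Expression (6-2): unit-norm eigenvector of the maximum eigenvalue.
    _, eigvecs = np.linalg.eigh(R_k)  # eigenvalues in ascending order
    a_k = eigvecs[:, -1]              # eigh returns unit-norm eigenvectors
    return a_k
```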
- the above-mentioned iterative processing can be theoretically derived as optimization of the log-likelihood by the expectation-maximization algorithm.
- the above expression (5) corresponds to processing in which the mask update unit 12 A updates the masks ⁇ k (t,f)
- the above expression (6-1) corresponds to processing in which the mixture weight update unit 13 A updates the mixture weight ⁇ k (f)
- the above expression (6-2) corresponds to processing in which the position parameter update unit 14 A updates the position parameter a k (f)
- the above expression (6-3) corresponds to processing in which the spread parameter update unit 15 A updates the spread parameter ⁇ 2 k (f).
- the masks are estimated by the above expression (5) based on the mixture distribution in the above expression (3), and hence the estimation accuracy of the masks is greatly affected by how accurately the mixture distribution in the above expression (3) can approximate to the distribution of the feature vectors z(t,f).
- the position parameter a k (f) and the spread parameter ⁇ 2 k (f) in the mixture distribution in the above expression (3) are updated to adjust the position and the spread of the component distribution in the above expression (4).
- the component distribution in the above expression (4) can represent only a shape of distribution that is rotationally symmetric about some axis.
- the shape of distribution of the feature vectors z(t,f) is affected by various factors including the arrangement of microphones and acoustic transfer characteristics of a room, and is not always rotationally symmetric.
- the mixture distribution in the above expression (3) cannot always sufficiently approximate to the distribution of the feature vectors z(t,f), and the problem is that the accuracy of mask estimation by the above expression (5) is not high.
- a mask estimation apparatus in a first aspect of the disclosed embodiments updates masks based on a mixture weight of each component distribution and a shape parameter that is a model parameter capable of controlling the shape of each component distribution, where the probability distribution of an M-dimensional feature vector based on M observation signals (M>1) recorded in a situation in which N (N>1) acoustic signals are mixed is modeled by a mixture distribution consisting of component distributions.
- the shape means the attributes of a figure except those representing its position and spread. Examples of the shape include information corresponding to the major axis and the minor axis of an ellipse.
- the shape parameter is updated to adjust the shapes of the component distributions, and hence as compared with the method in the above Literature 1, the mixture distribution can approximate more accurately to the distribution of the feature vectors, and the masks can be estimated more accurately.
- a mask estimation apparatus in a second aspect of the disclosed embodiments is configured so that, in the mask estimation apparatus in the first aspect, each component distribution is a complex Bingham distribution, where the probability distribution of the feature vectors is modeled by a mixture distribution consisting of component distributions, and the shape parameter is a parameter matrix of the complex Bingham distributions.
- a probability distribution p(z(t,f); ⁇ ) of the feature vectors z(t,f) is modeled by a mixture distribution (hereinafter referred to as “mixed complex Bingham distributions”) represented by the following expression (8) in which the component distributions are complex Bingham distributions.
- the matrix B is a parameter matrix of the complex Bingham distribution
- c(B) is a normalization constant.
- the parameters Θ represent the set {α k (f), B k (f)} of model parameters of the mixed complex Bingham distributions in the above expression (8).
- the mixture weight α k (f), which is a model parameter in the mixture distribution in the above expression (8), represents the height of the component distribution k
- the matrix B k (f) is a shape parameter capable of controlling the shape of distribution (such as distribution spreads in axial directions of an ellipse representing the shape of distribution) in addition to the position and the spread of the component distribution k.
- the first eigenvector of the matrix B k (f) represents the position of the component distribution k
- the absolute value of the difference between the first eigenvalue and the second eigenvalue of the matrix B k (f) represents the smallness of spread of the component distribution k
- the absolute value of the difference between the first eigenvalue and the m-th eigenvalue (3 ⁇ m ⁇ M) of the matrix B k (f) represents the smallness of a distribution spread in the (m ⁇ 1)th axial direction of an ellipse representing the shape of distribution of the component distribution k.
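- purely as an illustration of this eigenvalue reading (the helper and its names are ours, not the patent's), the geometry encoded in a given B k (f) can be inspected as follows:

```python
import numpy as np

def describe_bingham_shape(B_k):
    """Reads off the position and per-axis spread encoded in a Hermitian
    parameter matrix B_k(f), per the eigenvalue interpretation above."""
    eigvals, eigvecs = np.linalg.eigh(B_k)              # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # first = largest
    position = eigvecs[:, 0]                            # first eigenvector
    # |e_1 - e_m|: the larger the gap, the smaller the spread on that axis.
    spread_smallness = np.abs(eigvals[0] - eigvals[1:])
    return position, spread_smallness
```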
- in a special case of the parameter matrix B, the complex Bingham distribution in the above expression (9) is represented by the following expression (10).
- the above expression (10) has the same shape as in the above expression (4), and is rotationally symmetric about the axis parallel to the vector a.
- in the complex Bingham distribution in the above expression (9), no special restrictions are imposed on the parameter matrix B, and by updating the parameter matrix B, the shape of distribution of the complex Bingham distribution in the above expression (9) can be adjusted to represent a shape of distribution that is not rotationally symmetric.
- the use of the complex Bingham distribution in the above expression (9) enables the mixture distribution in the above expression (8) to sufficiently approximate to the distribution of the feature vectors z(t,f) even when the shape of distribution of the feature vectors z(t,f) is not rotationally symmetric.
- the masks can be estimated more accurately than in the method described in the above Literature 1.
- an algorithm for estimating the masks ⁇ k (t,f) and the model parameters ⁇ by alternating two steps, namely, a step of updating the masks ⁇ k (t,f) using the model parameters ⁇ and a step of updating the model parameters ⁇ using the masks ⁇ k (t,f), can be derived.
- a mask estimation apparatus in a third aspect of the disclosed embodiments is configured so that, in the mask estimation apparatus in the first aspect, each component distribution is a complex angular central Gaussian (cACG) distribution, where the probability distribution of the feature vectors is modeled by a mixture distribution consisting of component distributions, and the shape parameter is a parameter matrix of the complex angular central Gaussian distribution.
- the probability distribution p(z(t,f); ⁇ ) of the feature vector z(t,f) is modeled by a mixture distribution (hereinafter referred to as “mixed complex angular central Gaussian distributions”) in the following expression (11) in which the component distributions are complex angular central Gaussian distributions.
- p A (z;Σ) in the above expression (11) is a complex angular central Gaussian distribution defined by the following expression (12), in which the parameter matrix is a matrix Σ.
- the matrix ⁇ k (f) is a shape parameter capable of controlling the shape of distribution (such as distribution spreads in axial directions of an ellipse representing the shape of distribution) in addition to the position and the spread of the component distribution k.
- the first eigenvector of the matrix ⁇ k (f) represents the position of the component distribution k
- the ratio obtained by dividing the first eigenvalue of the matrix ⁇ k (f) by the second eigenvalue represents the smallness of the spread of the component distribution k
- the ratio obtained by dividing the first eigenvalue of the matrix ⁇ k (f) by the m-th eigenvalue (3 ⁇ m ⁇ M) represents the smallness of the distribution spread in the (m ⁇ 1)th axial direction of an ellipse representing the shape of distribution of the component distribution k.
- the model parameters ⁇ represent a set ⁇ k (f); ⁇ k (f) ⁇ of model parameters of the mixed complex angular central Gaussian distributions in the above expression (11).
- in the first embodiment, the complex Bingham distribution is used as a component distribution.
- it is assumed that the number N of acoustic signals is already known and that sounds in which N−1 desired sounds and one background noise are mixed are recorded by M microphones.
- FIG. 1 is a diagram illustrating an example of a mask estimation apparatus according to the first embodiment.
- a mask estimation apparatus 10 according to the first embodiment is connected with a storage unit 20 .
- the mask estimation apparatus 10 includes a feature extraction unit 11 , a mask update unit 12 , a mixture weight update unit 13 , and a parameter update unit 14 .
- with τ being the number (sample number) representing time, an observation signal in the time domain recorded by a microphone m is denoted by y m (τ).
- each observation signal y m (τ) is a mixture of components of N−1 desired sounds (indexed by n, where n is a positive integer) and background noise v m (τ).
- the feature extraction unit 11 receives observation signals recorded by a plurality of microphones, and applies time-frequency analysis to each observation signal y m ( ⁇ ) to determine a time-frequency component y m (t,f) of each observation signal (m represents the microphone index and is an integer between 1 and M).
- time-frequency analysis such as the short-time Fourier transform and the short-time discrete cosine transform can be used as the time-frequency analysis.
- the feature extraction unit 11 then makes an M-dimensional column vector (referred to as an "observation signal vector") y(t,f), represented by the following expression (14), which is obtained by collecting time-frequency components of all observation signals for each time-frequency point.
- y(t,f)=[y 1(t,f), y 2(t,f), …, y M(t,f)] T (14)
- the feature extraction unit 11 uses the observation signal vector y(t,f) to extract an M-dimensional feature vector z(t,f) having a predetermined norm.
- as the feature vector z(t,f), various kinds of feature vectors can be used, such as the ones described in Literature 2 "H. Sawada, S. Araki, and S. Makino, "Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment," IEEE Transactions on Audio, Speech and Language Processing (ASLP), vol. 19, no. 3, pp. 516-527, March 2011." and Literature 3 "D. H. Tran Vu and R.
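- a minimal sketch of the feature extraction unit 11 in numpy/scipy, assuming the unit-norm normalization z(t,f)=y(t,f)/∥y(t,f)∥ (the display of expression (2) is not reproduced above; the text allows other feature vectors as well):

```python
import numpy as np
from scipy.signal import stft

def extract_feature_vectors(x, fs=16000, nperseg=512):
    """Feature extraction sketch.

    x : (M, n_samples) real array of M time-domain observation signals.
    Returns z of shape (M, T, F): unit-norm feature vectors z(t, f).
    """
    # Time-frequency analysis (short-time Fourier transform) per microphone.
    _, _, Y = stft(x, fs=fs, nperseg=nperseg)  # Y: (M, F, T)
    Y = np.moveaxis(Y, 1, 2)                   # (M, T, F): stacking of expression (14)
    # Assumed expression (2): z(t, f) = y(t, f) / ||y(t, f)|| (guard against silence).
    norms = np.linalg.norm(Y, axis=0, keepdims=True)
    return Y / np.maximum(norms, 1e-12)
```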
- the mask update unit 12 receives the feature vector z(t,f), and calculates and outputs masks ⁇ k (t,f) indicating the proportion in which the k-th component distribution contributes to each time-frequency point (t,f).
- the initial values of the model parameters ⁇ can be set by various methods, including the use of random numbers.
- the mask update unit 12 receives the feature vector z(t,f) and the current estimated values of the model parameters ⁇ , and updates masks corresponding to the k-th component distribution by the following expression (16), for example, as the posterior probability ⁇ k (t,f).
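- a sketch of this mask update for one frequency bin; the displays of expressions (16) and (17) are not reproduced above, so the standard complex Bingham density exp(z H Bz)/c(B) is assumed, and the normalization constant of expression (17) is passed in as a hypothetical helper log_c:

```python
import numpy as np

def update_masks_bingham(Z, alpha, B, log_c):
    """E step sketch for one frequency bin f.

    Z     : (T, M) feature vectors z(t, f)
    alpha : (K,)   mixture weights alpha_k(f)
    B     : (K, M, M) Hermitian parameter matrices B_k(f)
    log_c : hypothetical helper returning ln c(B_k) per expression (17)
    """
    # Assumed log-density of each component: z^H B_k z - ln c(B_k).
    log_p = np.stack([
        np.einsum('tm,mn,tn->t', Z.conj(), B[k], Z).real - log_c(B[k])
        for k in range(B.shape[0])
    ])                                                  # (K, T)
    log_post = np.log(alpha)[:, None] + log_p
    log_post -= log_post.max(axis=0, keepdims=True)     # numerical stability
    gamma = np.exp(log_post)
    return gamma / gamma.sum(axis=0, keepdims=True)     # masks gamma_k(t, f)
```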
- the mixture weight update unit 13 receives the posterior probability ⁇ k (t,f), and updates the mixture weight by the following expression (18).
- the parameter update unit 14 receives the feature vector z(t,f) and the posterior probability ⁇ k (t,f), and calculates a positive definite Hermitian matrix R k (f) by the following expression (19).
- the parameter update unit 14 performs eigenvalue decomposition of the matrix R k (f) as represented by the following expression (20).
- R k(f)=U k(f)D k(f)U k H(f) (20)
- a unitary matrix U k (f) of eigenvectors of the matrix R k (f) and a diagonal matrix D k (f) of eigenvalues of the matrix R k (f) are determined.
- the m-th diagonal component (eigenvalue) d km (f) of the matrix D k (f) is determined on the assumption that the diagonal components are arranged in ascending order d k1 (f)≤…≤d kM (f).
- the parameter update unit 14 updates the parameter matrix B k (f) with the following expression (21).
- B k(f)=U k(f)E k(f)U k H(f) (21)
- the matrix E k (f) in the above expression (21) is a diagonal matrix in which the m-th diagonal component is e km (f).
- e km (f) is given by the following expression (22).
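- a sketch of this parameter update for one component and one bin; the display of expression (22) is not reproduced above, so the common approximation e km =−1/d km , shifted so that the maximum eigenvalue of B k (f) is 0 (as stated in the derivation below), is assumed:

```python
import numpy as np

def update_bingham_parameter(Z, gamma_k):
    """M step sketch for expressions (19)-(22), one component k, one bin f."""
    # Expression (19): mask-weighted positive definite Hermitian matrix R_k(f).
    R_k = np.einsum('t,tm,tn->mn', gamma_k, Z, Z.conj()) / gamma_k.sum()
    # Expression (20): eigenvalue decomposition, d_k1 <= ... <= d_kM.
    d, U = np.linalg.eigh(R_k)
    # Assumed form of expression (22): e_km = -1/d_km, shifted so e_kM = 0.
    e = -1.0 / np.maximum(d, 1e-12)
    e -= e[-1]                      # pin the maximum eigenvalue of B_k(f) to 0
    # Expression (21): B_k(f) = U_k(f) E_k(f) U_k^H(f).
    return (U * e[None, :]) @ U.conj().T
```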
- the processing in the mask update unit 12 , the mixture weight update unit 13 , and the parameter update unit 14 is repeatedly performed until the finish conditions are satisfied.
- finish conditions various conditions such as the condition that “the processing is repeatedly performed for a predetermined number of times (for example, 20 times)” and the condition that “the increase amount of the log-likelihood function (described later) before and after update is equal to or smaller than a predetermined threshold” can be used.
- the storage unit 20 stores therein the mixture weight updated by the mixture weight update unit 13 and the shape parameter updated by the parameter update unit 14 , and provides the stored mixture weight and shape parameter in the next processing in the mask update unit 12 and the parameter update unit 14 .
- the processing in the mask estimation apparatus 10 is derived by solving an optimization problem of maximizing a log-likelihood function L( ⁇ ) in the following expression (23) with respect to model parameters ⁇ .
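- the display of expression (23) is not reproduced above; consistent with the mixed complex Bingham distributions in the above expression (8), it takes the form

```latex
L(\Theta) = \sum_{t,f} \ln \sum_{k=1}^{K} \alpha_k(f)\, p_B\bigl(z(t,f); B_k(f)\bigr)
```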
- the log-likelihood function L( ⁇ ) in the above expression (23) can be optimized by alternatingly repeating E Step and M Step described below based on the expectation-maximization algorithm.
- the current estimated values of the model parameters ⁇ are used to update the posterior probability ⁇ k (t,f) of the k-th component distribution by the following expression (24-2).
- the normalization constant c(.) is defined by the above expression (17).
- the posterior probability ⁇ k (t,f) updated at E Step is used to maximize a Q function defined by the following expressions (25-1) and (25-2), thereby updating the model parameters ⁇ .
- the matrix R k (f) is defined by the above expression (19), and tr represents the trace of the matrix.
- an update rule for the shape parameter B k (f) of the complex Bingham distribution, which is a component distribution, is derived.
- Eigenvalue decomposition of the matrix R k (f) is defined by the following expression (26-1), and eigenvalue decomposition of the matrix B k (f) is defined by the following expression (26-2).
- R k(f)=U k(f)D k(f)U k H(f) (26-1)
- B k(f)=V k(f)E k(f)V k H(f) (26-2)
- the matrix U k (f) in the above expression (26-1) is a unitary matrix of eigenvectors of the matrix R k (f)
- the matrix V k (f) in the above expression (26-2) is a unitary matrix of eigenvectors of the matrix B k (f).
- the matrix D k (f) in the above expression (26-1) is a diagonal matrix of eigenvalues of the matrix R k (f)
- the matrix E k (f) in the above expression (26-2) is a diagonal matrix of eigenvalues of the matrix B k (f).
- the matrix D k (f) and the matrix E k (f) are represented by the following expressions (27-1) and (27-2), respectively. Note that the maximum eigenvalue e kM (f) of the matrix B k (f) is 0. When the eigenvector matrix V k (f) is chosen to coincide with U k (f), the trace term satisfies tr{B k(f)R k(f)}=tr{E k(f)D k(f)} (28).
- the first eigenvector represents the position of a peak of the distribution of the sound source of interest.
- FIG. 2 is a flowchart illustrating an example of mask estimation processing according to the first embodiment.
- the feature extraction unit 11 extracts an M-dimensional feature vector z(t,f) from observation signals recorded by microphones (Step S 11 ).
- the mask update unit 12 calculates and updates masks ⁇ k (t,f) based on the feature vector z(t,f), a mixture weight, and a shape parameter (Step S 12 ).
- the mixture weight update unit 13 updates the mixture weight based on a posterior probability ⁇ k (t,f) (Step S 13 ).
- the parameter update unit 14 updates the parameter matrix based on the feature vector z(t,f) and the posterior probability ⁇ k (t,f) (Step S 14 ).
- the mask update unit 12 determines whether the finish conditions are satisfied (Step S 15 ). When the finish conditions are satisfied (Yes at Step S 15 ), the mask update unit 12 finishes the mask estimation processing. When the finish conditions are not satisfied (No at Step S 15 ), the mask update unit 12 moves the processing to Step S 12 .
- in the second embodiment, a complex angular central Gaussian distribution is used as a component distribution.
- it is assumed that the number N of acoustic signals is already known and that sounds in which N−1 desired sounds and one background noise are mixed are recorded by M microphones.
- the mask estimation apparatus 10 - 2 is connected with a storage unit 20 - 2 .
- the mask estimation apparatus 10 - 2 includes a feature extraction unit 11 , a mask update unit 12 - 2 , a mixture weight update unit 13 - 2 , and a parameter update unit 14 - 2 .
- the feature extraction unit 11 is similar to that in the first embodiment.
- the mask update unit 12 - 2 receives a feature vector z(t,f), and calculates and outputs a mask γ k (t,f) indicating the proportion in which the k-th component distribution contributes to each time-frequency point (t,f).
- the matrix ⁇ k (f) is a parameter matrix of complex angular central Gaussian distributions.
- the mixture weight update unit 13 - 2 receives the posterior probability ⁇ k (t,f), and updates the mixture weight with the following expression (32).
- the parameter update unit 14 - 2 receives the feature vector z(t,f), the parameter matrix Σ k (f), and the posterior probability γ k (t,f), and updates the parameter matrix Σ k (f) with the following expression (33).
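- a sketch of one EM iteration of the second embodiment for one frequency bin; the displays of expressions (31) to (33) are not reproduced above, so the standard mixed complex angular central Gaussian updates are assumed (the constant factor of the density cancels between components):

```python
import numpy as np

def em_step_cacg(Z, alpha, Sigma):
    """One EM iteration sketch for one frequency bin f.

    Z     : (T, M) feature vectors z(t, f)
    alpha : (K,)   mixture weights alpha_k(f)
    Sigma : (K, M, M) Hermitian parameter matrices Sigma_k(f)
    """
    T, M = Z.shape
    K = alpha.shape[0]
    # Quadratic forms q[k, t] = z^H Sigma_k^{-1} z.
    q = np.stack([
        np.einsum('tm,mn,tn->t', Z.conj(), np.linalg.inv(Sigma[k]), Z).real
        for k in range(K)
    ])
    q = np.maximum(q, 1e-12)
    # E step: gamma_k proportional to alpha_k / (det(Sigma_k) q^M), in logs.
    log_post = (np.log(alpha)[:, None]
                - np.log(np.linalg.det(Sigma).real)[:, None]
                - M * np.log(q))
    log_post -= log_post.max(axis=0, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=0, keepdims=True)
    # M step: mixture weights and parameter matrices.
    alpha_new = gamma.mean(axis=1)
    Sigma_new = M * np.einsum('kt,tm,tn->kmn', gamma / q, Z, Z.conj())
    Sigma_new /= gamma.sum(axis=1)[:, None, None]
    return gamma, alpha_new, Sigma_new
```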
- the processing in the mask update unit 12 - 2 , the mixture weight update unit 13 - 2 , and the parameter update unit 14 - 2 is repeatedly performed until the finish conditions are satisfied similarly to the first embodiment.
- the storage unit 20 - 2 stores therein the mixture weight updated by the mixture weight update unit 13 - 2 and the shape parameter updated by the parameter update unit 14 - 2 , and provides the stored mixture weight and shape parameter in the next processing in the mask update unit 12 - 2 and the parameter update unit 14 - 2 .
- the processing in the mask estimation apparatus 10 - 2 is derived by solving an optimization problem of maximizing the log-likelihood function L( ⁇ ) in the following expression (34) with respect to the model parameters ⁇ .
- the log-likelihood function L( ⁇ ) in the above expression (34) can be optimized by alternatingly repeating E Step and M Step described below based on the expectation-maximization algorithm.
- the current estimated values of the model parameters ⁇ are used to update the posterior probability ⁇ k (t,f) of the k-th component distribution with the following expression (35-2).
- the posterior probability γ k (t,f) updated at E Step is used to update the model parameters Θ by maximizing the Q function defined by the following expressions (36-1) and (36-2). Note that ln represents the natural logarithm.
- mask estimation processing according to the second embodiment is executed in accordance with the processing order in the flowchart representing an example of mask estimation processing illustrated in FIG. 2 similarly to the mask estimation processing according to the first embodiment.
- N ⁇ 1 of the N acoustic signals are desired sounds, and one of the N acoustic signals is background noise.
- even when this is not the case, masks corresponding to each acoustic signal can be estimated by the same processing.
- in the above embodiments, the parameters Θ are regarded as deterministic, and the parameters Θ are estimated by the maximum-likelihood method.
- the parameters ⁇ may be regarded as random variables to give a prior distribution, and the parameters ⁇ may be estimated by posterior probability maximization.
- the third embodiment discloses a desired sound estimation system configured to estimate desired sound by using either the mask estimation apparatus 10 in the first embodiment or the mask estimation apparatus 10 - 2 in the second embodiment.
- FIG. 3 is a diagram illustrating an example of a desired sound estimation system according to the third embodiment.
- a desired sound estimation system 100 includes either the mask estimation apparatus 10 in the first embodiment or the mask estimation apparatus 10 - 2 in the second embodiment, and a desired sound estimation apparatus 30 .
- the desired sound estimation apparatus 30 includes a matrix calculation unit 31 , a Wiener filter calculation unit 32 , and a desired sound estimation unit 33 .
- due to the permutation problem, in order to estimate each desired sound by using the masks γ k (t,f) determined in the first embodiment or the second embodiment, measures are taken such that clusters corresponding to acoustic signals having the same index n have the same cluster index irrespective of the frequency f. This is called permutation alignment.
- the permutation alignment can be achieved by various methods, such as the method described in the above Literature 2 “H. Sawada, S. Araki, and S. Makino, “Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment,” IEEE Transactions on Audio, Speech and Language Processing (ASLP), vol. 19, no. 3, pp. 516-527, March 2011.”
- a mask after the permutation alignment corresponding to the n-th acoustic signal is denoted by γ n (t,f) again. Furthermore, it is assumed that which of the N masks γ n (t,f) corresponds to background noise is already known. For example, separated sounds created by masking using the masks can be compared by listening, so that a cluster corresponding to background noise can be identified manually.
- the matrix calculation unit 31 calculates a noise covariance matrix ⁇ v (f) by the following expression (39).
- the matrix calculation unit 31 calculates a desired sound covariance matrix Φ n (f) by the following expression (40).
- Φ n(f)=Φ n+v(f)−Φ v(f) (40)
- the matrix calculation unit 31 determines an observation covariance matrix ⁇ y (f) by the following expression (41).
- the Wiener filter calculation unit 32 calculates a multichannel Wiener filter W n (f) by the following expression (42).
- W n(f)=Φ y −1(f)Φ n(f) (42)
- the desired sound estimation unit 33 applies the multichannel Wiener filter W n (f) to the observation signal vector y(t,f) as represented by the following expression (43), so that background noise and components of desired sounds other than the n-th desired sound can be suppressed to obtain an estimated value ŝ n (t,f) of the component of the desired sound n.
- ŝ n(t,f)=W n H(f)y(t,f) (43)
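- a minimal numpy sketch of the desired sound estimation in expressions (40), (42), and (43) for one frequency bin; the displays of expressions (39) and (41) are not reproduced above, so the covariance matrices are assumed to be mask-weighted (respectively plain) averages of y(t,f)y H (t,f):

```python
import numpy as np

def multichannel_wiener_filter(Y, gamma_n, gamma_v):
    """Desired sound estimation sketch for one frequency bin f.

    Y       : (T, M) observation signal vectors y(t, f)
    gamma_n : (T,)   aligned mask of the n-th desired sound
    gamma_v : (T,)   aligned mask of the background noise
    """
    outer = Y[:, :, None] @ Y[:, None, :].conj()       # (T, M, M): y y^H
    # Assumed mask-weighted covariance matrices (displays not reproduced).
    Phi_nv = np.einsum('t,tmn->mn', gamma_n, outer) / gamma_n.sum()
    Phi_v = np.einsum('t,tmn->mn', gamma_v, outer) / gamma_v.sum()  # (39)
    Phi_n = Phi_nv - Phi_v                             # expression (40)
    Phi_y = outer.mean(axis=0)                         # expression (41), assumed
    W_n = np.linalg.solve(Phi_y, Phi_n)                # expression (42)
    return Y @ W_n.conj()                              # expression (43): W_n^H y(t, f)
```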
- FIG. 4 is a flowchart illustrating an example of desired sound estimation processing according to the third embodiment.
- the matrix calculation unit 31 in the desired sound estimation apparatus 30 acquires mask information from the mask estimation apparatus 10 ( 10 - 2 ) (Step S 21 ).
- the matrix calculation unit 31 calculates a desired sound covariance matrix ⁇ n+v (f) with noise (Step S 22 ).
- the matrix calculation unit 31 calculates a noise covariance matrix ⁇ v (f) (Step S 23 ).
- the matrix calculation unit 31 calculates a desired sound covariance matrix ⁇ n (f) (Step S 24 ).
- the matrix calculation unit 31 calculates an observation covariance matrix ⁇ y (f) (Step S 25 ).
- the Wiener filter calculation unit 32 calculates a multichannel Wiener filter W n (f) (Step S 26 ).
- the desired sound estimation unit 33 applies the multichannel Wiener filter W n (f) calculated at Step S 26 to the observation signal vector y(t,f) to obtain and output an estimated value ŝ n (t,f) of components of the desired sound n (Step S 27 ).
- Speech recognition performance was as follows.
- the word error rate in the case where speech recognition was performed without mask estimation was 14.29(%). Furthermore, the word error rate in the case where speech recognition was performed in a manner that multichannel Wiener filtering was applied after masks were estimated by mixed complex Watson distributions was 9.51(%). The word error rate in the case where multichannel Wiener filtering was applied after masks were estimated by mixed complex Bingham distributions in the desired sound estimation system 100 including the mask estimation apparatus 10 in the first embodiment was 8.53(%). From the above, it is understood that speech recognition performance in the third embodiment is improved as compared with the related art.
- each processing performed in the desired sound estimation system 100 including the mask estimation apparatus 10 ( 10 - 2 ) and the desired sound estimation apparatus 30 in the above-mentioned embodiments may be implemented by a processing device such as a central processing unit (CPU) and a computer program analyzed and executed by the processing device. Furthermore, each processing performed in the desired sound estimation system 100 including the mask estimation apparatus 10 ( 10 - 2 ) and the desired sound estimation apparatus 30 may be implemented as hardware by wired logic.
- the whole or part of the processing described above to be automatically performed among the processing described in the embodiments may be manually performed.
- the whole or part of the processing described above to be manually performed among the processing described in the embodiments may be automatically performed by a publicly known method.
- each processing described above with reference to the flowchart in the embodiments may be performed with appropriate replacement of execution order and parallel execution as long as the final execution results are identical.
- the above-mentioned and illustrated processing procedure, control procedure, specific names, and information including various kinds of data and parameters can be changed as appropriate unless otherwise specified.
- FIG. 5 is a diagram illustrating an example of a computer in which a desired sound estimation system including the mask estimation apparatus and the desired sound estimation apparatus according to the embodiments is implemented by executing a computer program.
- a computer 1000 includes, for example, a memory 1010 and a CPU 1020 . Furthermore, the computer 1000 includes a hard disk drive interface 1030 , a disk drive interface 1040 , a serial port interface 1050 , a video adapter 1060 , and a network interface 1070 . In the computer 1000 , these units are connected by a bus 1080 .
- the memory 1010 includes a ROM 1011 and a RAM 1012 .
- the ROM 1011 stores a boot program such as BIOS therein.
- the hard disk drive interface 1030 is connected to a hard disk drive 1031 .
- the disk drive interface 1040 is connected to a disk drive 1041 .
- a removable storage medium such as a magnetic disk or an optical disc is inserted into the disk drive 1041 .
- the serial port interface 1050 is connected to, for example, a mouse 1051 and a keyboard 1052 .
- the video adapter 1060 is connected to a display 1061 .
- the hard disk drive 1031 stores an OS 1091 , an application program 1092 , a program module 1093 , and program data 1094 therein.
- a computer program that defines each processing of the mask estimation apparatus 10 ( 10 - 2 ) and the desired sound estimation apparatus 30 is stored in, for example, the hard disk drive 1031 as a program module 1093 in which a command executed by the computer 1000 is written.
- a program module 1093 for executing information processing similarly to the functional configurations in the mask estimation apparatus 10 ( 10 - 2 ) and the desired sound estimation apparatus 30 is stored in the hard disk drive 1031 .
- setting data used in the processing in the embodiments is stored in, for example, the memory 1010 or the hard disk drive 1031 as program data 1094 .
- the CPU 1020 reads the program module 1093 or the program data 1094 stored in the memory 1010 or the hard disk drive 1031 onto the RAM 1012 and executes it as necessary.
- the program module 1093 and the program data 1094 may be stored in, for example, a removable storage medium and read by the CPU 1020 through the disk drive 1041 , without being limited to the case of being stored in the hard disk drive 1031 .
- the program module 1093 and the program data 1094 may be stored in another computer connected through a network (such as local area network (LAN) and wide area network (WAN)). Then, the program module 1093 and the program data 1094 may be read by the CPU 1020 through the network interface 1070 .
Abstract
Description
- Non Patent Literature 1: M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, “A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction,” IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL. 21, NO. 9, SEPTEMBER 2013, pp. 1913-1928.
- 10, 10-2, 10A Mask estimation apparatus
- 11, 11A Feature extraction unit
- 12, 12-2, 12A Mask update unit
- 13, 13-2, 13A Mixture weight update unit
- 14, 14-2 Parameter update unit
- 14A Position parameter update unit
- 15A Spread parameter update unit
- 20, 20-2, 20A Storage unit
- 30 Desired sound estimation apparatus
- 31 Matrix calculation unit
- 32 Wiener filter calculation unit
- 33 Desired sound estimation unit
- 1000 Computer
- 1010 Memory
- 1020 CPU
Claims (11)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016-027424 | 2016-02-16 | ||
PCT/JP2016/087996 WO2017141542A1 (en) | 2016-02-16 | 2016-12-20 | Mask estimation apparatus, mask estimation method, and mask estimation program |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190267019A1 US20190267019A1 (en) | 2019-08-29 |
US10878832B2 true US10878832B2 (en) | 2020-12-29 |
Family
ID=59625834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/998,742 Active 2037-07-05 US10878832B2 (en) | 2016-02-16 | 2016-12-20 | Mask estimation apparatus, mask estimation method, and mask estimation program |
Country Status (4)
Country | Link |
---|---|
US (1) | US10878832B2 (en) |
JP (1) | JP6535112B2 (en) |
CN (1) | CN108701468B (en) |
WO (1) | WO2017141542A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019163487A1 (en) * | 2018-02-23 | 2019-08-29 | 日本電信電話株式会社 | Signal analysis device, signal analysis method, and signal analysis program |
JP6915579B2 (en) * | 2018-04-06 | 2021-08-04 | 日本電信電話株式会社 | Signal analyzer, signal analysis method and signal analysis program |
JP6992709B2 (en) * | 2018-08-31 | 2022-01-13 | 日本電信電話株式会社 | Mask estimation device, mask estimation method and mask estimation program |
CN109859769B (en) * | 2019-01-30 | 2021-09-17 | 西安讯飞超脑信息科技有限公司 | Mask estimation method and device |
CN110674528B (en) * | 2019-09-20 | 2024-04-09 | 深圳前海微众银行股份有限公司 | Federal learning privacy data processing method, device, system and storage medium |
CN113539290B (en) * | 2020-04-22 | 2024-04-12 | 华为技术有限公司 | Voice noise reduction method and device |
CN115699171A (en) | 2020-06-11 | 2023-02-03 | 杜比实验室特许公司 | Generalized stereo background and panning source separation with minimal training |
CN112564885B (en) * | 2020-11-26 | 2022-07-12 | 南京农业大学 | Side channel test analysis method based on mask variable maximum probability density function distribution |
WO2022130445A1 (en) * | 2020-12-14 | 2022-06-23 | 日本電信電話株式会社 | Sound source signal generation device, sound source signal generation method, program |
US11755888B1 (en) * | 2023-01-09 | 2023-09-12 | Fudan University | Method and system for accelerating score-based generative models with preconditioned diffusion sampling |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5728888B2 (en) * | 2010-10-29 | 2015-06-03 | ソニー株式会社 | Signal processing apparatus and method, and program |
CN103594093A (en) * | 2012-08-15 | 2014-02-19 | 王景芳 | Method for enhancing voice based on signal to noise ratio soft masking |
JP6253226B2 (en) * | 2012-10-29 | 2017-12-27 | 三菱電機株式会社 | Sound source separation device |
JP6059072B2 (en) * | 2013-04-24 | 2017-01-11 | 日本電信電話株式会社 | Model estimation device, sound source separation device, model estimation method, sound source separation method, and program |
CN105096961B (en) * | 2014-05-06 | 2019-02-01 | 华为技术有限公司 | Speech separating method and device |
-
2016
- 2016-12-20 WO PCT/JP2016/087996 patent/WO2017141542A1/en active Application Filing
- 2016-12-20 US US15/998,742 patent/US10878832B2/en active Active
- 2016-12-20 JP JP2017567967A patent/JP6535112B2/en active Active
- 2016-12-20 CN CN201680081856.3A patent/CN108701468B/en active Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6816632B1 (en) * | 2000-02-17 | 2004-11-09 | Wake Forest University Health Sciences | Geometric motion analysis |
US20060034361A1 (en) * | 2004-08-14 | 2006-02-16 | Samsung Electronics Co., Ltd | Method and apparatus for eliminating cross-channel interference, and multi-channel source separation method and multi-channel source separation apparatus using the same |
US20060126714A1 (en) * | 2004-12-15 | 2006-06-15 | Spirox Corporation | Method and apparatus for measuring signal jitters |
US20060277035A1 (en) | 2005-06-03 | 2006-12-07 | Atsuo Hiroe | Audio signal separation device and method thereof |
JP2006337851A (en) | 2005-06-03 | 2006-12-14 | Sony Corp | Speech signal separating device and method |
US20070025556A1 (en) * | 2005-07-26 | 2007-02-01 | Kabushiki Kaisha Kobe Seiko Sho | Sound source separation apparatus and sound source separation method |
US20070133811A1 (en) * | 2005-12-08 | 2007-06-14 | Kabushiki Kaisha Kobe Seiko Sho | Sound source separation apparatus and sound source separation method |
US20100020204A1 (en) * | 2008-03-18 | 2010-01-28 | Fleischer Jason W | System and method for nonlinear self-filtering via dynamical stochastic resonance |
US20090242787A1 (en) * | 2008-03-25 | 2009-10-01 | Nuflare Technology, Inc. | Charged-particle beam writing method and charged-particle beam writing apparatus |
US20100125352A1 (en) * | 2008-11-14 | 2010-05-20 | Yamaha Corporation | Sound Processing Device |
US20110149301A1 (en) * | 2009-12-23 | 2011-06-23 | Samsung Electronics Co., Ltd. | Beam position measuring apparatus and method |
JP2012163676A (en) | 2011-02-04 | 2012-08-30 | Yamaha Corp | Acoustic processing device |
US20130311142A1 (en) * | 2012-05-16 | 2013-11-21 | Toshiba Medical Systems Corporation | Random coincidence reduction in positron emission tomography using tangential time-of-flight mask |
US20160005394A1 (en) * | 2013-02-14 | 2016-01-07 | Sony Corporation | Voice recognition apparatus, voice recognition method and program |
US20150124988A1 (en) * | 2013-11-07 | 2015-05-07 | Continental Automotive Systems,Inc. | Cotalker nulling based on multi super directional beamformer |
US20160216384A1 (en) * | 2015-01-26 | 2016-07-28 | Brimrose Technology Corporation | Detection of nuclear radiation via mercurous halides |
Non-Patent Citations (5)
Title |
---|
International Search Report dated Mar. 14, 2017 in PCT/JP2016/087996 filed Dec. 20, 2016. |
Ito, N. et al., "Modeling Audio Directional Statistics Using a Complex Bingham Mixture Model for Blind Source Extraction from Diffuse Noise", ICASSP, 2016, pp. 465-468. |
Ito, N. et al., "Permutation-Free Clustering Method for Underdetermined Blind Source Separation Based on Source Location Information", The Institute of Electronics, Information and Communication Engineers (IEICE), vol. J97-A, No. 4, 2014, pp. 234-246 (with English translation). |
Office Action dated Dec. 18, 2018 in Japanese Patent Application No. 2017-567967 (with English translation). |
Souden, M. et al., "A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction", IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 9, Sep. 2013, pp. 1913-1928.
Also Published As
Publication number | Publication date |
---|---|
CN108701468B (en) | 2023-06-02 |
WO2017141542A1 (en) | 2017-08-24 |
JPWO2017141542A1 (en) | 2018-07-12 |
JP6535112B2 (en) | 2019-06-26 |
CN108701468A (en) | 2018-10-23 |
US20190267019A1 (en) | 2019-08-29 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
| AS | Assignment | Owner name: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ITO, NOBUTAKA; ARAKI, SHOKO; NAKATANI, TOMOHIRO; SIGNING DATES FROM 20180614 TO 20180620; REEL/FRAME: 047253/0908
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED
| STCF | Information on status: patent grant | Free format text: PATENTED CASE