CN108322858B - Multi-microphone sound enhancement method based on tensor resolution - Google Patents
- Publication number
- CN108322858B (application CN201810070662.4A)
- Authority
- CN
- China
- Prior art keywords
- tensor
- observation
- orthogonal
- noise
- decomposition
- Prior art date
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a multi-microphone speech enhancement method based on tensor decomposition, comprising: representing the multi-channel speech signals observed by multiple microphones as a 3D tensor and projecting the tensor onto a series of orthogonal bases; and, using the minimum statistical risk criterion, collecting noise segments in real time, tracking the noise covariance, and computing the optimal threshold according to the tensor block size. The invention expresses the received multi-channel data as a third-order tensor, which retains the original spatial and temporal information, so that background noise and weakly directional noise can be removed more effectively while speech distortion is kept as low as possible.
Description
Technical Field
The invention relates to the field of speech noise reduction, and in particular to a multi-microphone speech enhancement method based on tensor decomposition.
Background
In the field of speech enhancement, classical single-channel algorithms can remove a large amount of background noise, but they easily cause speech distortion and may even introduce "musical" noise, degrading speech quality. With a microphone array and a beamforming algorithm, directional interference can be suppressed well.
Traditional microphone-array speech enhancement algorithms can be divided into time-domain and frequency-domain noise reduction algorithms. Time-domain algorithms typically concatenate the speech frames output by all microphones and apply optimal linear filtering to the lengthened frame. Frequency-domain algorithms apply a Fourier transform to the frames of each microphone, extract the corresponding time-frequency units to form a snapshot vector, and apply optimal linear filtering to that snapshot vector. However, such vector-based representations cannot fully exploit the spatial, temporal and frequency information carried by multi-channel data, and thus leave room for improvement.
Furthermore, in many cases the noise received by the microphone array is not purely directional interference, which makes beamforming techniques based on spatial filtering prone to performance loss. For background noise with weak or no directivity, the spatial filtering of beamforming is ineffective and may leave considerable residual noise.
Disclosure of Invention
The invention aims to provide a multi-microphone speech enhancement method based on tensor decomposition. Compared with traditional beamforming methods, the method expresses the received multi-channel data as a third-order tensor, thereby retaining the original spatial and temporal information, removing background noise and weakly directional noise more effectively, and keeping speech distortion as low as possible.
The purpose of the invention is realized by the following technical scheme:
A multi-microphone speech enhancement method based on tensor decomposition, comprising:
step (1): expressing the observed multi-channel speech data as a third-order tensor, taken as the observation tensor, and sparsely reconstructing each of its three dimensions with three orthogonal square matrices to obtain a core tensor containing the projection coefficients. Specifically, preset or preselected orthogonal bases are used as projection matrices, and the three dimensions of the observation tensor are projected onto the three orthogonal matrices, respectively, to obtain the core tensor of projection coefficients. The input of this step is the observation tensor and the output is the core tensor containing the projection coefficients.
Step (2): presetting a non-negative threshold and setting to zero the projection coefficients in the core tensor whose magnitude is below the threshold, thereby suppressing noise and reconstructing clean speech. Specifically, an optimal threshold is designed with the minimum statistical risk criterion, and its value is determined by the standard deviation of the noise and the size of the observation tensor. The input of this step is the core tensor and the output is the tensor representing clean speech.
Further, in the multi-microphone speech enhancement method based on tensor decomposition, the step (1) includes:
step (11), representing observed multi-channel voice data as a third-order tensor through a signal receiving model of a microphone array, and taking the third-order tensor as an observation tensor;
the signal reception model of the microphone array is represented as follows:
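$\mathcal{Y}(l,k,n) = \mathcal{X}(l,k,n) + \mathcal{N}(l,k,n),\quad \mathcal{Y}, \mathcal{X}, \mathcal{N} \in \mathbb{R}^{L \times K \times N}$;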
where $\mathcal{Y}$ denotes the observation tensor, $\mathcal{X}$ the tensor to be estimated, which represents clean speech, and $\mathcal{N}$ the noise tensor; $\mathcal{Y}(l,k,n)$, $\mathcal{X}(l,k,n)$ and $\mathcal{N}(l,k,n)$ denote the $l$-th element of the $k$-th frame in the $n$-th receiving channel of the observation tensor, the tensor to be estimated and the noise tensor, respectively; and $L$, $K$ and $N$ denote the frame length, the number of frames and the number of microphones, respectively.
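To make the index convention concrete, here is a minimal NumPy sketch that builds the $L \times K \times N$ observation tensor from raw microphone signals. It assumes non-overlapping, unwindowed frames and an `(N, T)` input layout, neither of which the text specifies; the function name `to_observation_tensor` is hypothetical.

```python
import numpy as np

def to_observation_tensor(signals: np.ndarray, frame_len: int) -> np.ndarray:
    """Arrange N microphone signals into an L x K x N observation tensor.

    signals: shape (N, T), one row per microphone (assumed layout).
    Frames are non-overlapping; leftover samples that do not fill a whole
    frame are discarded (an assumption, not taken from the source).
    """
    n_mics, n_samples = signals.shape
    n_frames = n_samples // frame_len                       # K
    trimmed = signals[:, : n_frames * frame_len]            # (N, K*L)
    frames = trimmed.reshape(n_mics, n_frames, frame_len)   # (N, K, L)
    # Reorder to (L, K, N): Y[l, k, n] is sample l of frame k on microphone n.
    return frames.transpose(2, 1, 0)


# Two microphones, ten samples each, frame length L = 4 -> K = 2 frames.
signals = np.arange(20).reshape(2, 10)
Y = to_observation_tensor(signals, frame_len=4)
print(Y.shape)     # (4, 2, 2)
print(Y[1, 1, 1])  # 15: sample 1 of frame 1 on microphone 1 is signals[1, 5]
```

Overlapping or windowed framing would change only the slicing step; the layout $\mathcal{Y}(l,k,n)$ stays the same.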
each of the three dimensions of the observation tensor is then sparsely reconstructed with one of three orthogonal square matrices to obtain a core tensor containing the projection coefficients;
the decomposition of the observation tensor typically has the form:
$\mathcal{Y} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3,\quad \boldsymbol{\Sigma} \in \mathbb{R}^{L' \times K' \times N'},\ U_1 \in \mathbb{R}^{L \times L'},\ U_2 \in \mathbb{R}^{K \times K'},\ U_3 \in \mathbb{R}^{N \times N'},\ L' \le L,\ K' \le K,\ N' \le N$
where $\{U_1, U_2, U_3\}$ denote the basis matrices and $\boldsymbol{\Sigma}$ the core tensor. Specifically, $U_1$ is the basis matrix of the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ the basis matrix of the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ the basis matrix of the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\boldsymbol{\Sigma}$ contains the projection coefficients of the observation tensor on $U_1$, $U_2$, $U_3$, and $L'$, $K'$, $N'$ denote the truncated size of the core tensor.
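The mode-$n$ product notation can be checked numerically. A minimal NumPy sketch, assuming 0-based mode indices (the text's $\times_1$, $\times_2$, $\times_3$ correspond to modes 0, 1, 2 here) and random orthogonal square bases generated by QR factorization purely for illustration; the helper names `mode_n_product`, `tucker_reconstruct` and `project` are hypothetical.

```python
import numpy as np

def mode_n_product(T, M, mode):
    """T x_n M with 0-based `mode` (the text's x1, x2, x3 are modes 0, 1, 2).

    M has shape (J, I_mode); axis `mode` of T (size I_mode) is replaced by one of size J.
    """
    out = np.tensordot(M, T, axes=([1], [mode]))  # contracted result puts J first
    return np.moveaxis(out, 0, mode)

def tucker_reconstruct(core, U1, U2, U3):
    """Sigma x1 U1 x2 U2 x3 U3."""
    return mode_n_product(mode_n_product(mode_n_product(core, U1, 0), U2, 1), U3, 2)

def project(Y, U1, U2, U3):
    """Projection coefficients Sigma = Y x1 U1^T x2 U2^T x3 U3^T."""
    return mode_n_product(mode_n_product(mode_n_product(Y, U1.T, 0), U2.T, 1), U3.T, 2)


rng = np.random.default_rng(0)
L, K, N = 8, 5, 3
# Random orthogonal square bases (L' = L, K' = K, N' = N), via QR of Gaussian matrices.
U1 = np.linalg.qr(rng.standard_normal((L, L)))[0]
U2 = np.linalg.qr(rng.standard_normal((K, K)))[0]
U3 = np.linalg.qr(rng.standard_normal((N, N)))[0]
Y = rng.standard_normal((L, K, N))
Sigma = project(Y, U1, U2, U3)
print(np.allclose(tucker_reconstruct(Sigma, U1, U2, U3), Y))  # True
```

Because the bases are square and orthogonal, projection and reconstruction are exact inverses, so all of the denoising can be done by modifying the core tensor.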
Canonical polyadic (CP) decomposition writes the observation tensor in its most basic form, a sum of rank-1 tensors, and can be obtained by solving:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } L' = K' = N' = R,\ \boldsymbol{\Sigma} \text{ is superdiagonal}$
which yields a superdiagonal core tensor and non-orthogonal basis matrices, where $R$ denotes the rank of the clean speech tensor.
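The statement that a superdiagonal core turns the product form into a sum of rank-1 tensors can be checked numerically. The sketch below uses random, non-orthogonal factor matrices and illustrative weights (all values are assumptions); it does not fit a CP model, it only verifies the identity between the two forms.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, N, R = 6, 4, 3, 2

# CP factor matrices: columns need not be orthonormal (unlike the orthogonal decomposition).
A = rng.standard_normal((L, R))
B = rng.standard_normal((K, R))
C = rng.standard_normal((N, R))
w = np.array([3.0, 0.5])  # superdiagonal entries of the R x R x R core (illustrative values)

# Superdiagonal core: core[r, r, r] = w[r], zero elsewhere.
core = np.zeros((R, R, R))
idx = np.arange(R)
core[idx, idx, idx] = w

# Product form: core x1 A x2 B x3 C ...
tucker_form = np.einsum('abc,la,kb,nc->lkn', core, A, B, C)
# ... equals the sum of R weighted rank-1 tensors w_r * (a_r outer b_r outer c_r).
rank1_sum = sum(w[r] * np.einsum('l,k,n->lkn', A[:, r], B[:, r], C[:, r]) for r in range(R))
print(np.allclose(tucker_form, rank1_sum))  # True
```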
Orthogonal decomposition writes the observation tensor as the product of a core tensor and three orthogonal basis matrices, and can be obtained by solving:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } U_1^\top U_1 = I,\ U_2^\top U_2 = I,\ U_3^\top U_3 = I$
which yields a non-diagonal core tensor and orthogonal basis matrices.
Note that if $L' \le L$, $K' \le K$, $N' \le N$, the clean speech tensor can be approximately reconstructed directly as $\hat{\mathcal{X}} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3$, recovering the original speech signal. In the present invention, $L' = L$, $K' = K$ and $N' = N$ are chosen, so the orthogonal square matrices $\{U_1, U_2, U_3\}$ serve as the basis matrices; a threshold $\lambda$ is then designed, and the projection coefficients in $\boldsymbol{\Sigma}$ whose absolute value is smaller than $\lambda$ are set to zero, thereby suppressing noise. In general, an excessively large threshold causes more speech distortion, while too small a threshold leaves more residual noise. The selection of the optimal threshold is described in detail below:
for the following linear observation model:
$y(i) = x(i) + n(i),\quad i = 1, 2, \ldots, Q$
where $n(i)$ follows a univariate Gaussian distribution, the $n(i)$ at different times are mutually independent, and $x(i)$ follows a Gaussian distribution. The minimum statistical risk estimate of $x(i)$ can then be expressed as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is the hard threshold operator, which sets the components of $y(i)$ whose magnitude is below the threshold $\lambda$ to 0. Based on the minimum statistical risk criterion, the optimal threshold $\lambda$ is determined by the noise standard deviation and the number of samples $Q$.
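A sketch of the scalar model: hard thresholding a sparse signal in white Gaussian noise. The extracted text does not preserve the threshold expression, so `universal_threshold` assumes the universal-threshold form $\delta\sqrt{2\log Q}$ of Donoho and Johnstone, using by default the base-2 logarithm the text mentions elsewhere. The signal values, seed and helper names are illustrative.

```python
import numpy as np

def hard_threshold(y, lam):
    """H_lambda: zero every component with |y| < lam, keep the rest unchanged."""
    return np.where(np.abs(y) < lam, 0.0, y)

def universal_threshold(delta, Q, log_base=2.0):
    """ASSUMED form delta * sqrt(2 * log_b(Q)).

    log_base=2 follows the text's remark that log is base 2; the classical
    universal threshold of Donoho and Johnstone uses the natural log (np.e).
    """
    return delta * np.sqrt(2.0 * np.log(Q) / np.log(log_base))


rng = np.random.default_rng(0)
Q, delta = 4096, 0.1
x = np.zeros(Q)
x[[10, 500, 2048]] = [2.0, -1.5, 1.0]        # sparse clean signal (illustrative)
y = x + delta * rng.standard_normal(Q)        # y(i) = x(i) + n(i)
lam = universal_threshold(delta, Q)           # 0.1 * sqrt(2 * 12), about 0.49
x_hat = hard_threshold(y, lam)
print(np.sum((x_hat - x) ** 2) < np.sum((y - x) ** 2))  # True
```

The threshold grows only with the logarithm of the number of samples, so almost every pure-noise sample falls below it while the few large components survive.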
For a multi-channel data reception model, the following relationship is satisfied:
$\mathcal{Y} = \mathcal{X} + \mathcal{N} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3$
where the entries of the noise tensor $\mathcal{N}$ are assumed to be independent and identically Gaussian distributed. The formula above is equivalent to:
$\boldsymbol{\Sigma} = \mathcal{Y} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top = \mathcal{X}' + \mathcal{N}',\quad \mathcal{X}' = \mathcal{X} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top,\quad \mathcal{N}' = \mathcal{N} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top$
Recovering $\mathcal{X}'$ immediately recovers $\mathcal{X}$. Because $U_1$, $U_2$, $U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotations, they do not change the independent, identically distributed Gaussian nature of the noise, so $\mathcal{N}'$ keeps it. Unfolding the tensors in the formula above into vectors, it can be rewritten as $\boldsymbol{\sigma} = \mathbf{x}' + \mathbf{n}'$, where all three vectors have length $NLK$ (i.e., $N \times L \times K$, with $N$, $L$, $K$ the number of microphones, the frame length and the number of frames). $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}' = H_\lambda(\boldsymbol{\sigma})$; by the definition of $\mathcal{X}'$, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, which in turn yields $\hat{\mathcal{X}} = \hat{\mathcal{X}}' \times_1 U_1 \times_2 U_2 \times_3 U_3$. The optimal threshold is determined by $\delta$, the standard deviation of the noise, and by the length $NLK$; the logarithm in its expression is base 2.
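The argument rests on two facts: with column-major vectorization, the projection is multiplication by the orthogonal matrix $(U_3 \otimes U_2 \otimes U_1)^\top$, so i.i.d. Gaussian noise stays i.i.d. Gaussian; and thresholding $\boldsymbol{\sigma}$ and mapping back gives $\hat{\mathcal{X}}$. The sketch below checks both on synthetic data. The random bases, the sparse clean core, the noise level and the threshold form $\delta\sqrt{2\log_2(NLK)}$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 8, 6, 4

def random_orthogonal(m):
    """Random m x m orthogonal matrix (illustrative stand-in for a chosen basis)."""
    return np.linalg.qr(rng.standard_normal((m, m)))[0]

U1, U2, U3 = random_orthogonal(L), random_orthogonal(K), random_orthogonal(N)

# Clean tensor X with a sparse core in this basis (synthetic data).
core_x = np.zeros((L, K, N))
core_x[0, 0, 0], core_x[2, 1, 3], core_x[5, 4, 1] = 5.0, -3.0, 2.0
X = np.einsum('abc,la,kb,nc->lkn', core_x, U1, U2, U3)
delta = 0.1
Y = X + delta * rng.standard_normal((L, K, N))          # Y = X + N, i.i.d. Gaussian noise

# Sigma = Y x1 U1^T x2 U2^T x3 U3^T = X' + N'
Sigma = np.einsum('lkn,la,kb,nc->abc', Y, U1, U2, U3)

# Vectorized view (column-major vec): sigma = W^T vec(Y) with W = U3 kron U2 kron U1.
W = np.kron(U3, np.kron(U2, U1))                         # NLK x NLK
print(np.allclose(W.T @ W, np.eye(N * L * K)))           # True: W is orthogonal
print(np.allclose(W.T @ Y.ravel(order='F'), Sigma.ravel(order='F')))  # True

# x'_hat = H_lambda(sigma), reshaped and mapped back to X_hat (threshold form assumed).
lam = delta * np.sqrt(2.0 * np.log2(N * L * K))
Sigma_hat = np.where(np.abs(Sigma) < lam, 0.0, Sigma)
X_hat = np.einsum('abc,la,kb,nc->lkn', Sigma_hat, U1, U2, U3)
print(np.linalg.norm(X_hat - X) < np.linalg.norm(Y - X))  # True
```

In practice the Kronecker matrix `W` never needs to be formed; it appears here only to show that the tensor and vector views agree.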
According to the technical scheme provided by the invention, on the one hand, compared with conventional multi-channel speech enhancement algorithms, representing the received data as a third-order tensor effectively preserves the spatio-temporal correlation of the original signal; on the other hand, the method has a low computational cost, since effective noise reduction only requires determining the optimal threshold coefficient.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a multi-channel speech enhancement algorithm provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of calculating the optimal threshold according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a multi-microphone speech enhancement method based on tensor decomposition, which mainly comprises the following steps as shown in figure 1:
Step 11: select the orthogonal basis matrices to use (3D-DCT orthogonal basis, supervised orthogonal basis, or unsupervised orthogonal basis).
Step 12: project the tensor onto the selected basis matrices, and truncate the projection coefficients with the optimal threshold.
As shown in FIG. 2, the flow for calculating the optimal threshold operator according to the minimum statistical risk criterion, provided by the embodiment of the present invention, mainly includes the following steps:
Step 21: track and extract silent (noise-only) segments in real time, and calculate the noise variance.
Step 22: select the tensor block size according to the signal sampling rate and other factors, and calculate the optimal truncation threshold.
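Steps 21 and 22 could be sketched as follows. The text does not say how silent segments are detected or how the variance is tracked, so this assumes an external voice-activity flag per frame and a hypothetical exponentially smoothed estimate (`NoiseTracker`, smoothing factor `alpha`); the threshold form $\delta\sqrt{2\log_2(NLK)}$ is likewise an assumption consistent with the variables the text names.

```python
import numpy as np

class NoiseTracker:
    """Hypothetical recursive noise-variance tracker.

    Frames flagged as noise-only (by an external detector, assumed) update an
    exponentially smoothed variance; frames containing speech leave it unchanged.
    Assumes zero-mean noise, so the mean square is used as the variance.
    """

    def __init__(self, alpha=0.9, init_var=1.0):
        self.alpha = alpha        # smoothing factor (assumed value)
        self.var = init_var

    def update(self, frame, is_noise_only):
        if is_noise_only:
            frame_power = float(np.mean(np.square(frame)))
            self.var = self.alpha * self.var + (1.0 - self.alpha) * frame_power
        return self.var

def block_threshold(noise_var, L, K, N):
    """Threshold for an L x K x N tensor block; ASSUMED form delta * sqrt(2 * log2(NLK))."""
    return np.sqrt(noise_var) * np.sqrt(2.0 * np.log2(N * L * K))


rng = np.random.default_rng(0)
L, N = 256, 4                                  # frame length, microphones (illustrative)
tracker = NoiseTracker()
for _ in range(200):                           # noise-only frames, true std 0.05
    tracker.update(0.05 * rng.standard_normal((L, N)), is_noise_only=True)
var_noise = tracker.var                        # close to 0.05 ** 2 = 0.0025
tracker.update(rng.standard_normal((L, N)), is_noise_only=False)  # speech frame: ignored
lam = block_threshold(tracker.var, L=L, K=16, N=N)
print(var_noise, lam)
```

Larger blocks raise the threshold only logarithmically, so the block size mainly trades latency against how much spatio-temporal structure each tensor captures.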
Compared with traditional multi-channel speech enhancement algorithms, the scheme of the invention enhances multi-channel speech signals using a higher-order tensor representation and can effectively preserve the spatio-temporal correlation of the signals; in addition, compared with the traditional multi-channel Wiener filtering algorithm, it has a low computational cost and achieves enhancement by determining only the optimal threshold.
For ease of understanding, the two steps above are described in detail below.
1. Sparse reconstruction of the three dimensions of the tensor using three orthogonal matrices
The signal reception model of the microphone array is represented as follows:
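$\mathcal{Y}(l,k,n) = \mathcal{X}(l,k,n) + \mathcal{N}(l,k,n),\quad \mathcal{Y}, \mathcal{X}, \mathcal{N} \in \mathbb{R}^{L \times K \times N}$;
where $\mathcal{Y}(l,k,n)$ is the $l$-th element of the $k$-th frame in the $n$-th receiving channel.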
The decomposition of the observation tensor typically has the form:
$\mathcal{Y} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3,\quad \boldsymbol{\Sigma} \in \mathbb{R}^{L' \times K' \times N'},\ U_1 \in \mathbb{R}^{L \times L'},\ U_2 \in \mathbb{R}^{K \times K'},\ U_3 \in \mathbb{R}^{N \times N'},\ L' \le L,\ K' \le K,\ N' \le N$
where $\{U_1, U_2, U_3\}$ denote the basis matrices and $\boldsymbol{\Sigma}$ the core tensor. Specifically, $U_1$ is the basis matrix of the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ the basis matrix of the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ the basis matrix of the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\boldsymbol{\Sigma}$ contains the projection coefficients of the observation tensor onto these basis matrices, and $L'$, $K'$, $N'$ denote the size of the core tensor.
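Canonical polyadic decomposition solves the following problem:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } L' = K' = N' = R,\ \boldsymbol{\Sigma} \text{ is superdiagonal}$
which yields a superdiagonal core tensor and non-orthogonal basis matrices, where $R$ denotes the rank of the clean speech tensor.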
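Orthogonal tensor decomposition solves the following problem:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } U_1^\top U_1 = I,\ U_2^\top U_2 = I,\ U_3^\top U_3 = I$
which yields a non-diagonal core tensor and orthogonal basis matrices.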
Note that if $L' \le L$, $K' \le K$, $N' \le N$, the clean speech tensor can be approximately reconstructed directly as $\hat{\mathcal{X}} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3$, recovering the original speech signal. In the present invention, $L' = L$, $K' = K$ and $N' = N$ are chosen, so the orthogonal square matrices $\{U_1, U_2, U_3\}$ serve as the basis matrices; a threshold $\lambda$ is then designed, and the projection coefficients in $\boldsymbol{\Sigma}$ whose absolute value is smaller than $\lambda$ are set to zero, thereby suppressing noise. In general, an excessively large threshold causes more speech distortion, while too small a threshold leaves more residual noise. The selection of the optimal threshold is derived later.
The embodiment of the invention considers four kinds of basis matrices: $\{U_1, U_2, U_3\}$ may be 3-dimensional discrete cosine transform (3D-DCT) basis matrices, supervised basis matrices, (unsupervised) approximate basis matrices, or (unsupervised) exact basis matrices. The 3D-DCT basis consists of one DCT matrix per mode, of sizes $L \times L$, $K \times K$ and $N \times N$.
The 3D-DCT basis is a general, data-independent basis.
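The extracted text omits the DCT formula. A common choice, assumed here, is the orthonormal DCT-II matrix for each mode; the sketch builds it with NumPy, checks orthogonality, and shows that a constant tensor maps entirely onto the DC coefficient. `dct_basis` is a hypothetical helper name.

```python
import numpy as np

def dct_basis(m):
    """m x m orthonormal DCT-II basis (assumed form); column j is the j-th cosine vector."""
    i = np.arange(m)[:, None]   # sample index -> rows
    j = np.arange(m)[None, :]   # frequency index -> columns
    U = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * i + 1) * j / (2 * m))
    U[:, 0] /= np.sqrt(2.0)     # DC column scaled to unit norm
    return U


L, K, N = 8, 6, 4
U1, U2, U3 = dct_basis(L), dct_basis(K), dct_basis(N)
print(all(np.allclose(U.T @ U, np.eye(U.shape[0])) for U in (U1, U2, U3)))  # True

# 3D-DCT coefficients: Sigma = Y x1 U1^T x2 U2^T x3 U3^T
Y = np.ones((L, K, N))
Sigma = np.einsum('lkn,la,kb,nc->abc', Y, U1, U2, U3)
print(np.isclose(Sigma[0, 0, 0], np.sqrt(L * K * N)))  # True: all energy in the DC term
```

These coefficients correspond to a separable orthonormal DCT-II along each axis; if SciPy is available, `scipy.fft.dctn(Y, norm='ortho')` should produce the same array, although that equivalence is not exercised here.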
In practice, clean multi-channel speech data can be collected as training data for a specific problem, so as to obtain the orthogonal basis matrices that are optimal for that problem. For example, the present invention obtains supervised basis matrices by solving an optimization problem over the basis matrices and the sparse core tensors of the training blocks.
Here, $\mathcal{X}_i \in \mathbb{R}^{L \times K \times N}$, $i = 1, 2, \ldots, T$, denotes a training block of clean speech. Because of the latent variables $\boldsymbol{\Sigma}_i$, the problem has no closed-form optimal solution, so an alternating iterative method is used to obtain a locally optimal solution. First, $U_1$, $U_2$, $U_3$ are initialized to the 3D-DCT matrices. Second, given $U_1$, $U_2$, $U_3$, a soft or hard threshold operator yields the sparse $\boldsymbol{\Sigma}_i$. Third, given $\boldsymbol{\Sigma}_i$, $U_2$ and $U_3$, $U_1$ is updated. Fourth, given $\boldsymbol{\Sigma}_i$, $U_1$ and $U_3$, $U_2$ is updated. Fifth, given $\boldsymbol{\Sigma}_i$, $U_1$ and $U_2$, $U_3$ is updated. Steps two to five are repeated until the whole process converges.
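For example, in step three the following optimization problem must be solved:
$\min_{U_1^\top U_1 = I} \sum_{i=1}^{T} \left\| \mathcal{X}_i - \boldsymbol{\Sigma}_i \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2$
This problem can be transformed into:
$\min_{U_1^\top U_1 = I} \sum_{i=1}^{T} \left\| \mathbf{X}_{i(1)} - U_1 \boldsymbol{\Sigma}_{i(1)} (U_3 \otimes U_2)^\top \right\|_F^2$
where $\mathbf{X}_{i(1)}$ denotes the mode-1 unfolding matrix of the $i$-th clean speech block and $\boldsymbol{\Sigma}_{i(1)}$ that of its core tensor. Since $U_1$, $U_2$ and $U_3$ are orthogonal, $\| U_1 \boldsymbol{\Sigma}_{i(1)} (U_3 \otimes U_2)^\top \|_F = \| \boldsymbol{\Sigma}_{i(1)} \|_F$ does not depend on $U_1$, and the problem simplifies to:
$\max_{U_1^\top U_1 = I} \sum_{i=1}^{T} \operatorname{tr}\left( U_1^\top \mathbf{X}_{i(1)} (U_3 \otimes U_2) \boldsymbol{\Sigma}_{i(1)}^\top \right)$
which is further equivalent to:
$\max_{U_1^\top U_1 = I} \operatorname{tr}\left( U_1^\top M \right),\quad M = \sum_{i=1}^{T} \mathbf{X}_{i(1)} (U_3 \otimes U_2) \boldsymbol{\Sigma}_{i(1)}^\top$
Suppose the SVD of $M$ is $M = P D Q^\top$. The problem then becomes:
$\max_{U_1^\top U_1 = I} \operatorname{tr}(Z D),\quad Z = Q^\top U_1^\top P$
Because $Z$ is an orthogonal matrix, none of its entries can exceed 1 in magnitude, so $\operatorname{tr}(Z D) = \sum_j z_{jj} d_{jj} \le \sum_j d_{jj}$, with equality only when $Z = I$. That is, in this step the optimal choice is $U_1 = P Q^\top$. $U_2$ and $U_3$ are updated in the same way, and the whole process converges after 20 to 30 iterations.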
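The $U_1$ update is an orthogonal Procrustes step. The sketch computes the cross matrix $M$ with `einsum` rather than forming unfoldings, checks it against the explicit mode-1 unfolding with a Kronecker product, and confirms that $U_1 = PQ^\top$ attains $\operatorname{tr}(U_1^\top M) = \sum_j d_{jj}$ without increasing the fitting error. The random arrays are stand-ins for clean training blocks and their sparse cores; the helper names are hypothetical.

```python
import numpy as np

def procrustes_update(M):
    """Orthogonal U maximizing tr(U^T M): with M = P D Q^T, return U = P Q^T."""
    P, _, Qt = np.linalg.svd(M)
    return P @ Qt

def mode1_cross_matrix(X_blocks, S_blocks, U2, U3):
    """M = sum_i X_i(1) (U3 kron U2) Sigma_i(1)^T, via einsum (no explicit unfolding)."""
    return sum(np.einsum('lkn,kb,nc,abc->la', X, U2, U3, S)
               for X, S in zip(X_blocks, S_blocks))


rng = np.random.default_rng(0)
L, K, N, T = 6, 5, 3, 4

def rand_orth(m):
    return np.linalg.qr(rng.standard_normal((m, m)))[0]

U1, U2, U3 = rand_orth(L), rand_orth(K), rand_orth(N)
X_blocks = [rng.standard_normal((L, K, N)) for _ in range(T)]   # stand-ins for clean blocks
S_blocks = [rng.standard_normal((L, K, N)) for _ in range(T)]   # stand-ins for sparse cores

def fit_error(U1):
    """sum_i ||X_i - Sigma_i x1 U1 x2 U2 x3 U3||_F^2 with U2, U3 held fixed."""
    return sum(np.sum((X - np.einsum('abc,la,kb,nc->lkn', S, U1, U2, U3)) ** 2)
               for X, S in zip(X_blocks, S_blocks))

M = mode1_cross_matrix(X_blocks, S_blocks, U2, U3)
U1_new = procrustes_update(M)
print(np.isclose(np.trace(U1_new.T @ M), np.linalg.svd(M, compute_uv=False).sum()))  # True
print(fit_error(U1_new) <= fit_error(U1))  # True
```

The modes 2 and 3 updates follow by permuting which basis matrices enter the cross matrix.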
The supervised basis matrices described above achieve the best results on the training data, but in practice the test data usually does not match the training data exactly, which may degrade the performance of the supervised basis matrices. The invention therefore proposes an unsupervised learning approach that infers the orthogonal basis matrices best suited to the test data directly from the test data, formulated as an optimization problem on the test data itself.
Its variables are the unsupervised basis matrices and a sparse tensor containing the projection coefficients. Based on this problem, the present invention provides two kinds of unsupervised basis matrices: approximate basis matrices and exact basis matrices.
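The approximate basis matrices can be obtained with the higher-order singular value decomposition (HOSVD) algorithm: performing an SVD of each of the unfoldings $\mathbf{Y}_{(1)}$, $\mathbf{Y}_{(2)}$ and $\mathbf{Y}_{(3)}$ yields $U_1$, $U_2$ and $U_3$, respectively. The exact basis matrices are obtained by further optimization starting from the approximate basis matrices: each unknown (the sparse core tensor and each of the three basis matrices) is updated in turn while the others are held fixed, and the whole process is iterated until convergence. For example, the update of $U_1$ can be converted into:
$\max_{U_1^\top U_1 = I} \operatorname{tr}\left( U_1^\top \mathbf{Y}_{(1)} (U_3 \otimes U_2) \boldsymbol{\Sigma}_{(1)}^\top \right)$
Assuming the singular value decomposition of the matrix $\mathbf{Y}_{(1)} (U_3 \otimes U_2) \boldsymbol{\Sigma}_{(1)}^\top$ is $M' \Sigma' N'^\top$, the solution can be written directly as $U_1 = M' N'^\top$, where $M'$ and $N'$ are the singular vector matrices and $\Sigma'$ is the diagonal matrix of non-negative singular values.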
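A sketch of both unsupervised bases: HOSVD for the approximate bases, then alternating hard thresholding and Procrustes updates for the exact bases. The objective $\|\mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3\|_F^2 + \lambda^2 \|\boldsymbol{\Sigma}\|_0$ is an assumption (an $\ell_0$ penalty is the one for which hard thresholding is the exact core update); the text does not state the penalty. Helper names, `lam` and `n_iter` are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding (0-based mode): rows indexed by that mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_bases(Y):
    """Approximate unsupervised bases: left singular vectors of each unfolding."""
    return [np.linalg.svd(unfold(Y, m), full_matrices=True)[0] for m in range(3)]

def project(Y, U):
    """Sigma = Y x1 U1^T x2 U2^T x3 U3^T."""
    return np.einsum('lkn,la,kb,nc->abc', Y, *U)

def reconstruct(S, U):
    """Sigma x1 U1 x2 U2 x3 U3."""
    return np.einsum('abc,la,kb,nc->lkn', S, *U)

# einsum patterns for each mode's cross matrix, e.g. mode 0: Y_(1) (U3 kron U2) Sigma_(1)^T
CROSS = ('lkn,kb,nc,abc->la', 'lkn,la,nc,abc->kb', 'lkn,la,kb,abc->nc')

def exact_bases(Y, lam, n_iter=30):
    """Alternating minimization of the assumed objective, starting from HOSVD."""
    U = hosvd_bases(Y)
    history = []
    for _ in range(n_iter):
        # Core update: with square orthogonal bases, hard thresholding is exact.
        S = project(Y, U)
        S = np.where(np.abs(S) < lam, 0.0, S)
        # Basis updates: one orthogonal Procrustes step per mode, others fixed.
        for m in range(3):
            others = [U[j] for j in range(3) if j != m]
            M = np.einsum(CROSS[m], Y, *others, S)
            P, _, Qt = np.linalg.svd(M)
            U[m] = P @ Qt
        cost = np.sum((Y - reconstruct(S, U)) ** 2) + lam ** 2 * np.count_nonzero(S)
        history.append(cost)
    return U, history


rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 6, 4))
U_exact, history = exact_bases(Y, lam=1.0, n_iter=10)
print(f"{history[0]:.3f} -> {history[-1]:.3f}")
```

Each step minimizes the assumed objective over one block of variables, so the recorded cost should not increase. Summing the cross matrix over several clean training blocks instead of a single $\mathcal{Y}$ would give a loop of the same shape for the supervised bases.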
2. Selecting a non-negative threshold and setting the coefficients in the core tensor below the threshold to zero
For the following linear observation model:
$y(i) = x(i) + n(i),\quad i = 1, 2, \ldots, Q$
where $n(i)$ follows a univariate Gaussian distribution, the $n(i)$ at different times are mutually independent, and $x(i)$ follows a Gaussian distribution. The minimum statistical risk estimate of $x(i)$ can then be expressed as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is the hard threshold operator, which sets the components of $y(i)$ whose magnitude is below the threshold $\lambda$ to 0. Based on the minimum statistical risk criterion, the optimal threshold $\lambda$ is determined by the noise standard deviation and the number of samples $Q$.
For a multi-channel data reception model, the following relationship is satisfied:
$\mathcal{Y} = \mathcal{X} + \mathcal{N} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3$
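where the entries of the noise tensor $\mathcal{N}$ are assumed to be independent and identically Gaussian distributed. The formula above is equivalent to:
$\boldsymbol{\Sigma} = \mathcal{Y} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top = \mathcal{X}' + \mathcal{N}',\quad \mathcal{X}' = \mathcal{X} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top,\quad \mathcal{N}' = \mathcal{N} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top$
Here we only need to recover $\mathcal{X}'$ to recover $\mathcal{X}$. Because $U_1$, $U_2$, $U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotations, they do not change the independent, identically distributed Gaussian nature of $\mathcal{N}'$. Unfolding the tensors in the formula above into vectors, it can be rewritten as $\boldsymbol{\sigma} = \mathbf{x}' + \mathbf{n}'$, where all three vectors have length $NLK$ (i.e., $N \times L \times K$, with $N$, $L$, $K$ the number of microphones, the frame length and the number of frames). Then $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}' = H_\lambda(\boldsymbol{\sigma})$; by definition, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, which in turn yields $\hat{\mathcal{X}} = \hat{\mathcal{X}}' \times_1 U_1 \times_2 U_2 \times_3 U_3$. The optimal threshold is thus determined by the noise standard deviation $\delta$ and the length $NLK$.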
From the description of the embodiments above, those skilled in the art can clearly understand that the embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that enable a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (3)
1. A multi-microphone speech enhancement method based on tensor decomposition, characterized by comprising the following steps:
step (1): expressing the observed multi-channel speech data as a third-order tensor, taken as the observation tensor, and sparsely reconstructing each of its three dimensions with three orthogonal square matrices to obtain a core tensor containing projection coefficients, which comprises: using preset or preselected orthogonal bases as projection matrices and projecting the three dimensions of the observation tensor onto the three orthogonal matrices, respectively, to obtain the core tensor containing the projection coefficients;
step (2): presetting a non-negative threshold and setting to zero the projection coefficients in the core tensor whose magnitude is below the threshold, thereby suppressing noise and reconstructing clean speech, which comprises: designing an optimal threshold with the minimum statistical risk criterion, the threshold being determined by the standard deviation of the noise and the size of the observation tensor.
2. The tensor decomposition-based multi-microphone speech enhancement method of claim 1, wherein the step (1) comprises:
step (11), representing observed multi-channel voice data as a third-order tensor through a signal receiving model of a microphone array, and taking the third-order tensor as an observation tensor;
the signal reception model of the microphone array is represented as follows:
$\mathcal{Y}(l,k,n) = \mathcal{X}(l,k,n) + \mathcal{N}(l,k,n),\quad \mathcal{Y}, \mathcal{X}, \mathcal{N} \in \mathbb{R}^{L \times K \times N}$;
wherein $\mathcal{Y}$ denotes the observation tensor, $\mathcal{X}$ the tensor to be estimated, which represents clean speech, and $\mathcal{N}$ the noise tensor; $\mathcal{Y}(l,k,n)$, $\mathcal{X}(l,k,n)$ and $\mathcal{N}(l,k,n)$ denote the $l$-th element of the $k$-th frame in the $n$-th receiving channel of the observation tensor, the tensor to be estimated and the noise tensor, respectively; and $L$, $K$ and $N$ denote the frame length, the number of frames and the number of microphones, respectively;
sparsely reconstructing each of the three dimensions of the observation tensor with three orthogonal square matrices to obtain a core tensor containing projection coefficients;
the decomposition of the observed tensor takes the form:
$\mathcal{Y} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3,\quad \boldsymbol{\Sigma} \in \mathbb{R}^{L' \times K' \times N'},\ U_1 \in \mathbb{R}^{L \times L'},\ U_2 \in \mathbb{R}^{K \times K'},\ U_3 \in \mathbb{R}^{N \times N'},\ L' \le L,\ K' \le K,\ N' \le N$
wherein $\{U_1, U_2, U_3\}$ denote the basis matrices and $\boldsymbol{\Sigma}$ the core tensor; specifically, $U_1$ is the basis matrix of the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ the basis matrix of the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ the basis matrix of the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\boldsymbol{\Sigma}$ contains the projection coefficients of the observation tensor on the basis matrices; $\times_1$, $\times_2$ and $\times_3$ denote the mode-1, mode-2 and mode-3 products, by which $\boldsymbol{\Sigma}$ is multiplied in turn by $U_1$, $U_2$ and $U_3$; and $L'$, $K'$, $N'$ denote the size of the core tensor;
by canonical polyadic decomposition, the observation tensor can be decomposed into a sum of a finite number of rank-1 tensors, which can be realized by:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } L' = K' = N' = R,\ \boldsymbol{\Sigma} \text{ is superdiagonal}$
obtaining a superdiagonal core tensor and non-orthogonal basis matrices, where $R$ denotes the rank of the clean speech tensor;
by orthogonal decomposition, the observation tensor can be decomposed into the product of three orthogonal basis matrices and one core tensor, which can be realized by:
$\min_{\boldsymbol{\Sigma}, U_1, U_2, U_3} \left\| \mathcal{Y} - \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3 \right\|_F^2 \quad \text{s.t. } U_1^\top U_1 = I,\ U_2^\top U_2 = I,\ U_3^\top U_3 = I$
obtaining a non-diagonal core tensor and orthogonal basis matrices.
3. The multi-microphone speech enhancement method based on tensor decomposition as recited in claim 2, wherein in the step (2):
for the following linear observation model:
$y(i) = x(i) + n(i),\quad i = 1, 2, \ldots, Q$
where $Q$ denotes the total number of samples, $n(i)$ follows a univariate Gaussian distribution, the $n(i)$ at different times are mutually independent, and $x(i)$ follows a Gaussian distribution; the minimum statistical risk estimate of $x(i)$ can then be expressed as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is the hard threshold operator, which sets the components of $y(i)$ whose magnitude is below the threshold $\lambda$ to 0; based on the minimum statistical risk criterion, the optimal threshold $\lambda$ is determined by the noise standard deviation and $Q$.
For a multi-channel data reception model, the following relationship is satisfied:
$\mathcal{Y} = \mathcal{X} + \mathcal{N} = \boldsymbol{\Sigma} \times_1 U_1 \times_2 U_2 \times_3 U_3$
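wherein the entries of the noise tensor $\mathcal{N}$ are assumed to be independent and identically Gaussian distributed, and the formula above is equivalent to:
$\boldsymbol{\Sigma} = \mathcal{Y} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top = \mathcal{X}' + \mathcal{N}',\quad \mathcal{X}' = \mathcal{X} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top,\quad \mathcal{N}' = \mathcal{N} \times_1 U_1^\top \times_2 U_2^\top \times_3 U_3^\top$
recovering $\mathcal{X}'$ immediately recovers $\mathcal{X}$; because $U_1$, $U_2$, $U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotations, they do not change the independent, identically distributed Gaussian nature of the noise; unfolding the tensors in the formula above into vectors, it can be rewritten as $\boldsymbol{\sigma} = \mathbf{x}' + \mathbf{n}'$, where all three vectors have length $NLK$; $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}' = H_\lambda(\boldsymbol{\sigma})$; by the definition of $\mathcal{X}'$, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, which in turn yields $\hat{\mathcal{X}} = \hat{\mathcal{X}}' \times_1 U_1 \times_2 U_2 \times_3 U_3$; and the optimal threshold is determined by $\delta$, the standard deviation of the noise, and by $NLK$, where the logarithm in its expression is base 2 and $NLK$ denotes $N \times L \times K$.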
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810070662.4A CN108322858B (en) | 2018-01-25 | 2018-01-25 | Multi-microphone sound enhancement method based on tensor resolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108322858A CN108322858A (en) | 2018-07-24 |
CN108322858B true CN108322858B (en) | 2019-11-22 |
Family
ID=62887100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810070662.4A Active CN108322858B (en) | 2018-01-25 | 2018-01-25 | Multi-microphone sound enhancement method based on tensor resolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108322858B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312270B (en) * | 2020-02-10 | 2022-11-22 | 腾讯科技(深圳)有限公司 | Voice enhancement method and device, electronic equipment and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982805A (en) * | 2012-12-27 | 2013-03-20 | 北京理工大学 | Multi-channel audio signal compressing method based on tensor decomposition |
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
CN106127297A (en) * | 2016-06-02 | 2016-11-16 | 中国科学院自动化研究所 | The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method |
US9576583B1 (en) * | 2014-12-01 | 2017-02-21 | Cedar Audio Ltd | Restoring audio signals with mask and latent variables |
CN106981292A (en) * | 2017-05-16 | 2017-07-25 | 北京理工大学 | A kind of multichannel spatial audio signal compression modeled based on tensor and restoration methods |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10468036B2 (en) * | 2014-04-30 | 2019-11-05 | Accusonus, Inc. | Methods and systems for processing and mixing signals using signal decomposition |
2018
- 2018-01-25: CN application CN201810070662.4A (CN108322858B), status active
Non-Patent Citations (1)
Title |
---|
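Research on Robust Speech Feature Extraction Methods Based on Auditory Perception and Tensor Models; Wu Qiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2011-07-15 (No. 7); full text * |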
Also Published As
Publication number | Publication date |
---|---|
CN108322858A (en) | 2018-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108447498B (en) | Speech enhancement method applied to microphone array | |
US8848933B2 (en) | Signal enhancement device, method thereof, program, and recording medium | |
CN108172231B (en) | Dereverberation method and system based on Kalman filtering | |
JP2007526511A (en) | Method and apparatus for blind separation of multipath multichannel mixed signals in the frequency domain | |
CN110998723B (en) | Signal processing device using neural network, signal processing method, and recording medium | |
Geng et al. | End-to-end speech enhancement based on discrete cosine transform | |
Nesta et al. | Blind source extraction for robust speech recognition in multisource noisy environments | |
Nesta et al. | Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction | |
Habets et al. | Dereverberation | |
EP3440671A1 (en) | Audio source parameterization | |
JPWO2019163487A1 (en) | Signal analyzer, signal analysis method and signal analysis program | |
CN117854536B (en) | RNN noise reduction method and system based on multidimensional voice feature combination | |
Kubo et al. | Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation | |
KR101043114B1 (en) | Method of Restoration of Sound, Recording Media of the same and Apparatus of the same | |
Şimşekli et al. | Non-negative tensor factorization models for Bayesian audio processing | |
CN108322858B (en) | Multi-microphone sound enhancement method based on tensor resolution | |
CN108875824B (en) | Single-channel blind source separation method | |
CN112037813B (en) | Voice extraction method for high-power target signal | |
CN109644304B (en) | Source separation for reverberant environments | |
CN103176947B (en) | A kind of multi channel signals denoising method based on signal correlation | |
CN111312270B (en) | Voice enhancement method and device, electronic equipment and computer readable storage medium | |
CN101322183A (en) | Signal distortion elimination apparatus, method, program, and recording medium having the program recorded thereon | |
CN116705049A (en) | Underwater acoustic signal enhancement method and device, electronic equipment and storage medium | |
WO2016162165A1 (en) | Method and device for encoding multiple audio signals, and method and device for decoding a mixture of multiple audio signals with improved separation | |
Inoue et al. | Sepnet: a deep separation matrix prediction network for multichannel audio source separation |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |