CN108322858B - Multi-microphone sound enhancement method based on tensor resolution - Google Patents

Info

Publication number
CN108322858B
Authority
CN
China
Prior art keywords
tensor
observation
orthogonal
noise
decomposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810070662.4A
Other languages
Chinese (zh)
Other versions
CN108322858A (en)
Inventor
叶中付
童仁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201810070662.4A
Publication of CN108322858A
Application granted
Publication of CN108322858B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/04: Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a multi-microphone speech enhancement method based on tensor decomposition, comprising: representing the multi-channel speech signal observed by multiple microphones as a third-order (3D) tensor and projecting the tensor onto a set of orthogonal bases; and, using the minimum statistical risk criterion, collecting noise segments in real time, tracking the noise covariance, and calculating the optimal threshold according to the tensor block size. The invention expresses the received multi-channel data as a third-order tensor so as to retain the original spatial and temporal information; it can therefore remove background noise and weakly directional noise more effectively while keeping speech distortion as low as possible.

Description

Multi-microphone speech enhancement method based on tensor decomposition
Technical Field
The invention relates to the field of voice noise reduction, in particular to a multi-microphone voice enhancement method based on tensor decomposition.
Background
In the field of speech enhancement, classical single-channel algorithms can remove much of the background noise, but they easily cause speech distortion and can even introduce "musical" noise, which impairs speech quality. By adopting a microphone array together with a beamforming algorithm, directional interference can be suppressed well.
Traditional speech enhancement algorithms based on microphone arrays can be divided into time domain noise reduction algorithms and frequency domain noise reduction algorithms. Time domain algorithms typically splice the speech frames output by each microphone and optimally linearly filter the lengthened frames. The frequency domain algorithm performs Fourier transform on frames of each microphone, extracts time-frequency units corresponding to the frames to form a snapshot vector, and performs optimal linear filtering on the snapshot vector. However, the vector-based representation method cannot fully utilize the spatial, temporal, and frequency information carried in the multi-channel data, and thus has room for improvement.
Furthermore, in many cases the noise received by the microphone array is not completely directional interference, which makes beamforming techniques based on the spatial filtering principle highly vulnerable to performance loss. For background noise with insignificant or even no directivity, the spatial filtering effect of the beamforming algorithm is poor, which may cause more noise residue.
Disclosure of Invention
The invention aims to provide a multi-microphone speech enhancement method based on tensor decomposition. Compared with the traditional beam forming method, the method expresses the received multi-channel data as a third-order tensor, thereby retaining the original spatial information and time information, more obviously removing background noise and weak directional noise and reducing voice distortion as much as possible.
The purpose of the invention is realized by the following technical scheme:
A multi-microphone speech enhancement method based on tensor decomposition, comprising:
Step (1), expressing the observed multi-channel voice data as a third-order tensor, which serves as the observation tensor, and performing sparse reconstruction on the three dimensions of the observation tensor with three orthogonal square matrices to obtain a core tensor containing the projection coefficients. Specifically, a preset or preselected orthogonal basis is used as the projection matrix for each dimension, and the three dimensions of the observation tensor are projected onto the three orthogonal matrices. The input to this step is the observation tensor and the output is the core tensor.
Step (2), presetting a non-negative threshold and setting to zero every projection coefficient in the core tensor whose amplitude is lower than the threshold, thereby suppressing noise and reconstructing clean speech. Specifically, the optimal threshold is designed with the minimum statistical risk criterion, and its size is determined by the standard deviation of the noise and the size of the observation tensor. The input to this step is the core tensor and the output is the tensor representing the clean speech.
Further, in the multi-microphone speech enhancement method based on tensor decomposition, the step (1) includes:
step (11), representing observed multi-channel voice data as a third-order tensor through a signal receiving model of a microphone array, and taking the third-order tensor as an observation tensor;
the signal reception model of the microphone array is represented as follows:
$$\mathcal{Y}(l,k,n)=\mathcal{X}(l,k,n)+\mathcal{N}(l,k,n),\qquad \mathcal{Y},\mathcal{X},\mathcal{N}\in\mathbb{R}^{L\times K\times N}$$
wherein $\mathcal{Y}$ represents the observation tensor, $\mathcal{X}$ represents the tensor to be estimated, which represents clean speech, and $\mathcal{N}$ represents the noise tensor; $\mathcal{Y}(l,k,n)$, $\mathcal{X}(l,k,n)$ and $\mathcal{N}(l,k,n)$ respectively represent the l-th element of the k-th frame in the n-th receiving channel of the observation tensor, the tensor to be estimated and the noise tensor; and L, K and N respectively represent the frame length, the number of frames and the number of microphones;
respectively carrying out sparse reconstruction on three dimensions of the observation tensor by using three orthogonal square matrixes to obtain a core tensor containing a projection coefficient;
the decomposition of the observation tensor typically has the form:
$$\mathcal{Y}\approx\Sigma\times_1U_1\times_2U_2\times_3U_3,\quad \Sigma\in\mathbb{R}^{L'\times K'\times N'},\ U_1\in\mathbb{R}^{L\times L'},\ U_2\in\mathbb{R}^{K\times K'},\ U_3\in\mathbb{R}^{N\times N'},\quad L'\le L,\ K'\le K,\ N'\le N$$
wherein $\{U_1,U_2,U_3\}$ denotes the basis matrices and $\Sigma$ represents the core tensor. Specifically, $U_1$ is the basis matrix expressing the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ is the basis matrix expressing the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ is the basis matrix expressing the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\Sigma$ contains the projection coefficients of the observation tensor on the basis matrices $U_1,U_2,U_3$; and L′, K′, N′ represent the truncated size of the core tensor.
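The projection onto the three basis matrices and the reverse reconstruction can be sketched with NumPy as follows. This is a minimal illustration, not the patent's implementation: the helper names (`mode_product`, `project`, `reconstruct`, `random_orthogonal`) are hypothetical, and random orthogonal square matrices stand in for the basis matrices purely to show that, with L′ = L, K′ = K, N′ = N, the projection is lossless and preserves the norm.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply every mode-`mode` fiber of `tensor` by `matrix` (mode-n product)."""
    # Bring the chosen mode to the front, multiply, then move it back.
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def project(Y, bases):
    """Core tensor: Y x1 U1^T x2 U2^T x3 U3^T."""
    core = Y
    for mode, U in enumerate(bases):
        core = mode_product(core, U.T, mode)
    return core

def reconstruct(core, bases):
    """Inverse operation: core x1 U1 x2 U2 x3 U3."""
    Y = core
    for mode, U in enumerate(bases):
        Y = mode_product(Y, U, mode)
    return Y

def random_orthogonal(size, rng):
    """Illustrative orthogonal square matrix obtained from a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q

rng = np.random.default_rng(0)
L, K, N = 8, 5, 3                      # frame length, frames, microphones
Y = rng.standard_normal((L, K, N))     # stand-in observation tensor
bases = [random_orthogonal(s, rng) for s in (L, K, N)]

core = project(Y, bases)               # projection coefficients
Y_back = reconstruct(core, bases)      # exact because the bases are square
```

Because each basis matrix is orthogonal and square, `Y_back` matches `Y` up to rounding; noise suppression only happens once coefficients of `core` are zeroed before reconstruction.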
By canonical polyadic decomposition, the observation tensor can be decomposed into its most basic form, a sum of rank-1 tensors. The canonical polyadic decomposition of the tensor is obtained by solving:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ L'=K'=N'=R,\ \Sigma\ \text{is diagonal}$$
which yields a superdiagonal core tensor and non-orthogonal basis matrices, where R denotes the rank of the clean speech tensor.
By orthogonal decomposition of the tensor, the observation tensor can be decomposed into the product of three orthogonal basis matrices and a core tensor. The orthogonal decomposition of the tensor is obtained by solving:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I,\ U_2^TU_2=I,\ U_3^TU_3=I$$
which yields a non-diagonal core tensor and orthogonal basis matrices.
Note that if L′ ≤ L, K′ ≤ K, N′ ≤ N, the clean speech tensor can be approximately reconstructed directly through $\hat{\mathcal{X}}=\Sigma\times_1U_1\times_2U_2\times_3U_3$, thereby recovering the original speech signal. In the present invention, we select L′ = L, K′ = K and N′ = N to obtain orthogonal square matrices $\{U_1,U_2,U_3\}$ as the basis matrices; a threshold λ is then designed, and every projection coefficient in $\Sigma$ whose absolute value is smaller than λ is set to zero, thereby suppressing noise. In general, an excessively large threshold causes more speech distortion, while an excessively small threshold leaves more residual noise; the selection of the optimal threshold is described in detail below:
for the following linear observation model:
$$y(i)=x(i)+n(i),\quad i=1,2,\dots,Q$$
where n(i) obeys a univariate Gaussian distribution, the n(i) at different times are independent of each other, and x(i) obeys a Gaussian distribution, the minimum statistical risk estimate of x(i) can be expressed as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is a hard threshold operator that sets to 0 the components of y(i) whose magnitude is below the threshold λ; based on the minimum statistical risk criterion, the optimal threshold is
$$\lambda=\delta\sqrt{2\log Q}$$
where δ is the standard deviation of the noise.
For a multi-channel data reception model, the following relationship is satisfied:
$$\mathcal{Y}=\mathcal{X}+\mathcal{N}=\Sigma\times_1U_1\times_2U_2\times_3U_3$$
Here the noise tensor $\mathcal{N}$ satisfies the assumption of mutually independent Gaussian distributions, and the above formula is equivalent to:
$$\mathcal{Y}'=\mathcal{X}'+\mathcal{N}',\qquad \mathcal{Y}'=\mathcal{Y}\times_1U_1^T\times_2U_2^T\times_3U_3^T=\Sigma$$
with $\mathcal{X}'$ and $\mathcal{N}'$ obtained from $\mathcal{X}$ and $\mathcal{N}$ in the same way. By recovering $\mathcal{X}'$ we immediately recover $\mathcal{X}$. Since $U_1,U_2,U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotational transformations, they do not change the independent, identically distributed Gaussian nature of $\mathcal{N}$. By unfolding the tensors in the above formula into vectors, it can be rewritten as $\mathbf{y}'=\mathbf{x}'+\mathbf{n}'$, where the vectors $\mathbf{y}'$, $\mathbf{x}'$ and $\mathbf{n}'$ all have length NLK (namely N × L × K, where N, L, K respectively indicate the number of microphones, the frame length and the number of frames). $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}'=H_\lambda(\mathbf{y}')$; according to the definition of the tensor $\mathcal{X}'$, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, from which $\mathcal{X}$ can in turn be restored. The optimal threshold is
$$\lambda=\delta\sqrt{2\log(NLK)}$$
where δ represents the standard deviation of the noise; N, L, K respectively indicate the number of microphones, the frame length and the number of frames; and log represents the base-2 logarithm.
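As a small worked example of the thresholding rule, the sketch below computes a threshold and applies the hard threshold operator to a handful of coefficients. It assumes the threshold has the universal-threshold form λ = δ·sqrt(2·log(NLK)), with the base-2 logarithm stated in the text; the exact expression is not legible in this copy of the patent, so the formula, the function names and the sample values are all illustrative assumptions.

```python
import math

def optimal_threshold(noise_std, num_mics, frame_len, num_frames):
    """Assumed form: lambda = delta * sqrt(2 * log2(N * L * K))."""
    q = num_mics * frame_len * num_frames   # number of coefficients, NLK
    return noise_std * math.sqrt(2.0 * math.log2(q))

def hard_threshold(values, lam):
    """H_lambda: keep coefficients with magnitude >= lam, zero the rest."""
    return [v if abs(v) >= lam else 0.0 for v in values]

# Illustrative numbers: 4 microphones, 16 samples per frame, 16 frames.
# NLK = 1024, log2(1024) = 10, so lambda = 0.5 * sqrt(20).
lam = optimal_threshold(noise_std=0.5, num_mics=4, frame_len=16, num_frames=16)

coefficients = [3.0, -0.4, 2.0, -5.0, 0.1]
kept = hard_threshold(coefficients, lam)
print(lam, kept)
```

With these values the threshold is about 2.236, so only the coefficients 3.0 and -5.0 survive. Larger tensor blocks raise the threshold slowly (through the logarithm), while the noise standard deviation scales it linearly.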
According to the technical scheme provided by the invention, on the one hand, compared with traditional multi-channel speech enhancement algorithms, expressing the received data as a third-order tensor effectively preserves the space-time correlation in the original signal; on the other hand, the method has a small computational load and achieves effective noise reduction merely by determining the optimal threshold.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flow chart of a multi-channel speech enhancement algorithm provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of calculating an optimal threshold according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a multi-microphone speech enhancement method based on tensor decomposition, which mainly comprises the following steps as shown in figure 1:
Step 11, selecting the orthogonal basis matrices to use (3D-DCT orthogonal basis, supervised orthogonal basis, or unsupervised orthogonal basis).
Step 12, projecting the tensor onto the selected basis matrices, and selecting an optimal threshold to truncate the projection coefficients.
The flowchart for calculating the optimal threshold operator according to the minimum statistical risk criterion provided by the embodiment of the present invention, as shown in fig. 2, mainly includes the following steps:
Step 21, tracking and extracting silent (noise-only) segments in real time, and calculating the noise variance.
Step 22, selecting the size of the tensor block according to the signal sampling rate and the like, and calculating the optimal truncation threshold.
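Step 21 can be illustrated with a simple running estimate of the noise variance. The patent does not say how noise-only segments are detected, so the energy test, the smoothing constant and the assumption that the first few frames contain only noise are all hypothetical choices made for this sketch.

```python
import numpy as np

def track_noise_variance(frames, energy_factor=2.0, smoothing=0.9, init_frames=5):
    """Running noise-variance estimate over a sequence of frames.

    frames: array of shape (num_frames, frame_len).
    Assumption: the first `init_frames` frames contain noise only.
    """
    noise_var = float(np.mean(frames[:init_frames] ** 2))
    for frame in frames[init_frames:]:
        energy = float(np.mean(frame ** 2))
        # Treat low-energy frames as noise-only and update the estimate;
        # frames with clearly higher energy are assumed to contain speech.
        if energy < energy_factor * noise_var:
            noise_var = smoothing * noise_var + (1.0 - smoothing) * energy
    return noise_var

rng = np.random.default_rng(1)
true_std = 0.1
frames = true_std * rng.standard_normal((200, 256))
# Synthetic "speech" burst with much higher energy in frames 50..99.
frames[50:100] += np.sin(0.05 * np.arange(256))

estimated_std = np.sqrt(track_noise_variance(frames))
```

The estimated standard deviation is the quantity δ that step 22 combines with the tensor block size to obtain the truncation threshold.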
Compared with traditional multi-channel speech enhancement algorithms, the scheme of the invention enhances multi-channel speech signals by using a higher-order tensor representation, which effectively preserves the space-time correlation of the signals; in addition, compared with the traditional multi-channel Wiener filtering algorithm, the scheme of the invention has a small computational load and achieves enhancement merely by determining the optimal threshold.
For ease of understanding, the following description will be made in detail with respect to the above two steps.
1. Sparse reconstruction of three dimensions of tensor by using three orthogonal matrixes
The signal reception model of the microphone array is represented as follows:
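$$\mathcal{Y}(l,k,n)=\mathcal{X}(l,k,n)+\mathcal{N}(l,k,n),\qquad \mathcal{Y},\mathcal{X},\mathcal{N}\in\mathbb{R}^{L\times K\times N}$$
wherein $\mathcal{Y}(l,k,n)$ is the l-th element of the k-th frame in the n-th receive channel.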
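The indexing convention of the reception model can be made concrete by stacking frames into an L × K × N array. The sketch below assumes non-overlapping frames and a `(num_samples, num_mics)` input layout; neither detail is given in the text, and the function name is hypothetical.

```python
import numpy as np

def build_observation_tensor(signals, frame_len, num_frames):
    """Stack multi-channel samples into a tensor of shape (L, K, N).

    signals: array of shape (num_samples, num_mics).
    Result:  Y[l, k, n] is the l-th sample of the k-th frame of microphone n.
    Assumption: frames are consecutive and do not overlap.
    """
    num_samples, num_mics = signals.shape
    needed = frame_len * num_frames
    if num_samples < needed:
        raise ValueError("not enough samples for the requested block size")
    block = signals[:needed]                                  # (L*K, N)
    frames = block.reshape(num_frames, frame_len, num_mics)   # (K, L, N)
    return frames.transpose(1, 0, 2)                          # (L, K, N)

# Tiny example: 12 samples from 2 microphones, frames of 3 samples.
signals = np.arange(24).reshape(12, 2)
Y = build_observation_tensor(signals, frame_len=3, num_frames=4)
```

Here `Y[1, 2, 1]` is the second sample of the third frame of the second microphone, that is, `signals[7, 1]`.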
The decomposition of the observation tensor typically has the form:
$$\mathcal{Y}\approx\Sigma\times_1U_1\times_2U_2\times_3U_3,\quad \Sigma\in\mathbb{R}^{L'\times K'\times N'},\ U_1\in\mathbb{R}^{L\times L'},\ U_2\in\mathbb{R}^{K\times K'},\ U_3\in\mathbb{R}^{N\times N'},\quad L'\le L,\ K'\le K,\ N'\le N$$
wherein $\{U_1,U_2,U_3\}$ denotes the basis matrices and $\Sigma$ represents the core tensor. Specifically, $U_1$ is the basis matrix expressing the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ is the basis matrix expressing the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ is the basis matrix expressing the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\Sigma$ contains the projection coefficients of the observation tensor onto these basis matrices; and L′, K′, N′ represent the size of the core tensor.
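Canonical polyadic decomposition solves the following problem:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ L'=K'=N'=R,\ \Sigma\ \text{is diagonal}$$
which yields a superdiagonal core tensor and non-orthogonal basis matrices, where R denotes the rank of the clean speech tensor.
The orthogonal tensor decomposition instead solves the following problem:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I,\ U_2^TU_2=I,\ U_3^TU_3=I$$
which yields a non-diagonal core tensor and orthogonal basis matrices.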
Note that if L′ ≤ L, K′ ≤ K, N′ ≤ N, the clean speech tensor can be approximately reconstructed directly through $\hat{\mathcal{X}}=\Sigma\times_1U_1\times_2U_2\times_3U_3$, thereby recovering the original speech signal. In the present invention, we select L′ = L, K′ = K and N′ = N to obtain orthogonal square matrices $\{U_1,U_2,U_3\}$ as the basis matrices; a threshold λ is then designed, and every projection coefficient in $\Sigma$ whose absolute value is smaller than λ is set to zero, thereby suppressing noise. In general, an excessively large threshold causes more speech distortion, while an excessively small threshold leaves more residual noise; the selection of the optimal threshold is derived later.
The embodiment of the invention considers four kinds of basis matrices: $\{U_1,U_2,U_3\}$ may be a three-dimensional discrete cosine transform (3D-DCT) basis, a supervised basis, an unsupervised approximate basis, or an (unsupervised) exact basis.
The 3D-DCT basis applies a discrete cosine transform matrix along each of the three modes; it is a data-independent, general-purpose basis.
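A data-independent DCT basis can be built directly from the cosine formula. The patent's own definition of the 3D-DCT basis is not legible in this copy, so the sketch below assumes the orthonormal DCT-II, one matrix per mode, with basis vectors stored as columns; the function name `dct_basis` is hypothetical.

```python
import numpy as np

def dct_basis(size):
    """Orthonormal DCT-II basis; column j is the j-th cosine basis vector."""
    i = np.arange(size)[:, None]      # sample index (rows)
    j = np.arange(size)[None, :]      # frequency index (columns)
    U = np.sqrt(2.0 / size) * np.cos(np.pi * (2 * i + 1) * j / (2 * size))
    U[:, 0] = np.sqrt(1.0 / size)     # constant vector needs its own scaling
    return U

L, K, N = 4, 3, 2
U1, U2, U3 = dct_basis(L), dct_basis(K), dct_basis(N)

# Project a constant tensor: all energy lands in the (0, 0, 0) coefficient.
Y = np.ones((L, K, N))
core = np.einsum('lkn,la,kb,nc->abc', Y, U1, U2, U3)
```

For the all-ones tensor, `core[0, 0, 0]` equals sqrt(L·K·N) and every other coefficient is zero up to rounding, which is the kind of sparsity the thresholding step relies on.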
In practice, clean multi-channel voice data can be collected as training data for a specific problem, so as to obtain the orthogonal basis matrices that are optimal for that problem. For example, the present invention obtains supervised basis matrices by solving the following optimization problem:
$$\min_{U_1,U_2,U_3,\{\Sigma_i\}}\ \sum_{i=1}^{T}\left\|\mathcal{X}_i-\Sigma_i\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ U_j^TU_j=I\ (j=1,2,3),\ \Sigma_i\ \text{sparse}$$
Here $\mathcal{X}_i\in\mathbb{R}^{L\times K\times N}$, i = 1, 2, …, T, denotes a training block consisting of clean speech. Because of the hidden variables $\Sigma_i$, the above problem has no explicit optimal solution, so a loop iteration method is adopted to obtain a locally optimal solution. In the first step, we initialize $\{U_1,U_2,U_3\}$ as the 3D-DCT matrices. In the second step, given $\{U_1,U_2,U_3\}$, we use a soft or hard threshold operator to obtain sparse $\Sigma_i$. In the third step, given $\Sigma_i$ and $\{U_2,U_3\}$, we update $U_1$. In the fourth step, given $\Sigma_i$ and $\{U_1,U_3\}$, we update $U_2$. In the fifth step, given $\Sigma_i$ and $\{U_1,U_2\}$, we update $U_3$. Steps two to five are repeated until the whole process converges.
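For example, in step three, we need to solve the following optimization problem:
$$\min_{U_1}\ \sum_{i=1}^{T}\left\|\mathcal{X}_i-\Sigma_i\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I$$
This problem can be translated into:
$$\min_{U_1}\ \sum_{i=1}^{T}\left\|X_{i(1)}-U_1\Sigma_{i(1)}(U_3\otimes U_2)^T\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I$$
Here $X_{i(1)}$ represents the mode-1 unfolding matrix of a clean speech block, and $\Sigma_{i(1)}$ that of its core tensor. Writing $B_i=\Sigma_{i(1)}(U_3\otimes U_2)^T$, the above problem can be simplified as follows:
$$\min_{U_1}\ \sum_{i=1}^{T}\left\|X_{i(1)}-U_1B_i\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I$$
Since $U_1$ is orthogonal, $\|U_1B_i\|_F=\|B_i\|_F$ does not depend on $U_1$, and the problem is further equivalent to:
$$\max_{U_1}\ \mathrm{tr}\left(U_1^TP\right),\qquad P=\sum_{i=1}^{T}X_{i(1)}B_i^T\quad\text{s.t.}\ U_1^TU_1=I$$
Suppose that the SVD of $P$ is $P=M\Lambda N^T$; the above problem further translates into:
$$\max_{U_1}\ \mathrm{tr}\left(Z\Lambda\right),\qquad Z=N^TU_1^TM$$
Because the diagonal elements of the orthogonal matrix $Z$ cannot exceed 1, we have $\mathrm{tr}(Z\Lambda)\le\mathrm{tr}(\Lambda)$, with equality only when $Z=I$. That is, in this step, the optimal $U_1$ is $U_1=MN^T$. Similarly, we can update $U_2$ and $U_3$; the whole process converges after 20-30 cycles.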
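The closed-form update in this step is the classical orthogonal Procrustes solution: the orthogonal matrix that maximizes tr(UᵀP) is the product of the left and right singular vector matrices of P. The sketch below checks this on synthetic data; the matrices `U_true`, `B` and `X_unf` are illustrative stand-ins for the unfolded training blocks, not quantities defined in the patent.

```python
import numpy as np

def procrustes_update(P):
    """Orthogonal U maximizing trace(U.T @ P), from the SVD P = M diag(s) N^T."""
    M, _, Nt = np.linalg.svd(P)
    return M @ Nt

rng = np.random.default_rng(2)
size, width = 6, 40

# Hidden orthogonal matrix that generated the data (illustrative only).
U_true, _ = np.linalg.qr(rng.standard_normal((size, size)))
B = rng.standard_normal((size, width))   # plays the role of the fixed factor B_i
X_unf = U_true @ B                       # plays the role of the unfolded block

P = X_unf @ B.T                          # cross term accumulated over blocks
U_est = procrustes_update(P)

# Any other orthogonal matrix gives a smaller (or equal) trace.
U_other, _ = np.linalg.qr(rng.standard_normal((size, size)))
best = np.trace(U_est.T @ P)
other = np.trace(U_other.T @ P)
```

In this noise-free setting the update recovers `U_true` exactly (up to rounding), because `P` equals `U_true` times a symmetric positive definite matrix.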
The supervised basis matrices described above achieve the best results on the training data, but in practice the test data usually do not match the training data exactly, which may cause the supervised basis matrices to suffer some performance degradation. Therefore, the invention proposes an unsupervised learning approach that automatically infers, from the test data themselves, the orthogonal basis matrices best suited to them. The specific optimization problem is as follows:
$$\min_{\hat U_1,\hat U_2,\hat U_3,\hat\Sigma}\ \left\|\mathcal{Y}-\hat\Sigma\times_1\hat U_1\times_2\hat U_2\times_3\hat U_3\right\|_F^2\quad\text{s.t.}\ \hat U_j^T\hat U_j=I\ (j=1,2,3),\ \hat\Sigma\ \text{sparse}$$
Here $\{\hat U_1,\hat U_2,\hat U_3\}$ represents the unsupervised basis matrices, and $\hat\Sigma$ represents a sparse tensor containing the projection coefficients. Based on the above problem, the present invention provides two kinds of unsupervised basis matrices, namely an approximate basis and an exact basis.
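The approximate basis matrices can be obtained by the higher-order singular value decomposition (HOSVD) algorithm: performing SVD on the unfolding matrices $Y_{(1)}$, $Y_{(2)}$ and $Y_{(3)}$ yields $\hat U_1$, $\hat U_2$ and $\hat U_3$ respectively. The exact basis matrices are obtained by further optimization starting from the approximate basis matrices. First fix $\hat U_2$ and $\hat U_3$ and update $\hat U_1$; then fix $\hat U_1$ and $\hat U_3$ and update $\hat U_2$; by analogy, update $\hat U_3$. The whole process is iterated until convergence. For example, the update of $\hat U_1$ can be converted into:
$$\max_{\hat U_1}\ \mathrm{tr}\left(\hat U_1^TP'\right),\qquad P'=Y_{(1)}\left[\hat\Sigma_{(1)}(\hat U_3\otimes\hat U_2)^T\right]^T\quad\text{s.t.}\ \hat U_1^T\hat U_1=I$$
Suppose the singular value decomposition of the matrix $P'$ can be expressed as $P'=M'\Sigma'N'^T$; the solution to the above problem can then be written directly as $\hat U_1=M'N'^T$. Here M′ and N′ are singular vector matrices, and Σ′ is a diagonal matrix composed of non-negative singular values.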
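The approximate (HOSVD) basis can be sketched by taking the left singular vectors of each unfolding. To always obtain a square orthogonal matrix, even when one dimension exceeds the product of the other two, this sketch computes them as eigenvectors of the Gram matrix of the unfolding; that choice, and the helper names, are assumptions of the example rather than details from the patent.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: the fibers of that mode become the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_bases(Y):
    """Square orthogonal basis per mode from the left singular vectors."""
    bases = []
    for mode in range(Y.ndim):
        A = unfold(Y, mode)
        # Eigenvectors of A A^T are the left singular vectors of A.
        _, vecs = np.linalg.eigh(A @ A.T)
        bases.append(vecs[:, ::-1])        # strongest direction first
    return bases

rng = np.random.default_rng(3)
a, b, c = rng.standard_normal(8), rng.standard_normal(5), rng.standard_normal(3)
Y = np.einsum('l,k,n->lkn', a, b, c)       # rank-1 test tensor

U1, U2, U3 = hosvd_bases(Y)
core = np.einsum('lkn,la,kb,nc->abc', Y, U1, U2, U3)
Y_back = np.einsum('abc,la,kb,nc->lkn', core, U1, U2, U3)
```

For a rank-1 tensor the data-adapted basis concentrates all energy in the single coefficient `core[0, 0, 0]`, which shows why a basis inferred from the data can give a sparser core than a fixed one.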
2. Selecting a non-negative threshold value, and setting coefficients lower than the threshold value in the core tensor to be zero
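For the following linear observation model:
$$y(i)=x(i)+n(i),\quad i=1,2,\dots,Q$$
where n(i) obeys a univariate Gaussian distribution, the n(i) at different times are independent of each other, and x(i) obeys a Gaussian distribution, the minimum statistical risk estimate of x(i) can be expressed as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is a hard threshold operator that sets to 0 the components of y(i) whose magnitude is below the threshold λ. Based on the minimum statistical risk criterion, the optimal threshold is
$$\lambda=\delta\sqrt{2\log Q}$$
where δ is the standard deviation of the noise.
For a multi-channel data reception model, the following relationship is satisfied:
$$\mathcal{Y}=\mathcal{X}+\mathcal{N}=\Sigma\times_1U_1\times_2U_2\times_3U_3$$
Here the noise tensor $\mathcal{N}$ satisfies the assumption of mutually independent Gaussian distributions. The above formula is equivalent to:
$$\mathcal{Y}'=\mathcal{X}'+\mathcal{N}',\qquad \mathcal{Y}'=\mathcal{Y}\times_1U_1^T\times_2U_2^T\times_3U_3^T=\Sigma$$
with $\mathcal{X}'$ and $\mathcal{N}'$ obtained from $\mathcal{X}$ and $\mathcal{N}$ in the same way. Here we only need to recover $\mathcal{X}'$ to immediately recover $\mathcal{X}$. Since $U_1,U_2,U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotational transformations, they do not change the independent, identically distributed Gaussian nature of $\mathcal{N}'$. By unfolding the tensors in the above formula into vectors, it can be rewritten as $\mathbf{y}'=\mathbf{x}'+\mathbf{n}'$, where the vectors $\mathbf{y}'$, $\mathbf{x}'$ and $\mathbf{n}'$ all have length NLK (i.e., N × L × K, where N, L, K respectively indicate the number of microphones, the frame length and the number of frames). Then $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}'=H_\lambda(\mathbf{y}')$. By definition, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, from which $\mathcal{X}$ can in turn be restored. Thus the optimal threshold is
$$\lambda=\delta\sqrt{2\log(NLK)}$$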
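Putting the two steps together, the whole enhancement chain is project, threshold, reconstruct. The sketch below runs it on synthetic data whose clean part is exactly sparse in a known orthogonal basis. The threshold uses the assumed form δ·sqrt(2·log2(NLK)), and the random bases, block size and coefficient values are illustrative choices, not values from the patent.

```python
import numpy as np

def denoise(Y, bases, noise_std):
    """Project onto orthogonal bases, hard-threshold the core, reconstruct."""
    U1, U2, U3 = bases
    core = np.einsum('lkn,la,kb,nc->abc', Y, U1, U2, U3)
    lam = noise_std * np.sqrt(2.0 * np.log2(Y.size))   # assumed threshold form
    core = np.where(np.abs(core) >= lam, core, 0.0)    # hard threshold
    return np.einsum('abc,la,kb,nc->lkn', core, U1, U2, U3)

rng = np.random.default_rng(4)
L, K, N = 16, 8, 4
noise_std = 0.1

# Illustrative orthogonal square bases (QR of random matrices).
bases = [np.linalg.qr(rng.standard_normal((s, s)))[0] for s in (L, K, N)]

# Clean tensor with only 10 non-zero projection coefficients.
clean_core = np.zeros((L, K, N))
positions = rng.choice(clean_core.size, size=10, replace=False)
clean_core.flat[positions] = 5.0
X = np.einsum('abc,la,kb,nc->lkn', clean_core, *bases)

Y = X + noise_std * rng.standard_normal(X.shape)       # noisy observation
X_hat = denoise(Y, bases, noise_std)

err_before = np.linalg.norm(Y - X)
err_after = np.linalg.norm(X_hat - X)
```

Because the orthogonal projection leaves the noise white, almost all noise coefficients fall below the threshold, and the remaining error comes mainly from the noise riding on the few retained coefficients.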
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A multi-microphone speech enhancement method based on tensor decomposition is characterized by comprising the following steps:
step (1), expressing the observed multi-channel voice data as a third-order tensor, and taking the third-order tensor as an observation tensor; and three dimensions of the observation tensor are respectively subjected to sparse reconstruction by using three orthogonal square matrixes to obtain a core tensor containing a projection coefficient, and the method comprises the following steps: respectively projecting three dimensions of the observation tensor onto the three orthogonal matrixes by using a preset or preselected orthogonal basis as a projection matrix to obtain a core tensor containing a projection coefficient;
step (2), a non-negative threshold value is preset, and the projection coefficient with the amplitude lower than the threshold value in the core tensor is set to be zero, so that the suppression of noise and the reconstruction of clean voice are realized; the method comprises the following steps: and designing an optimal threshold value by adopting a minimum statistical risk criterion, wherein the size of the threshold value is determined by the standard deviation of the noise and the size of the observation tensor.
2. The tensor decomposition-based multi-microphone speech enhancement method of claim 1, wherein the step (1) comprises:
step (11), representing observed multi-channel voice data as a third-order tensor through a signal receiving model of a microphone array, and taking the third-order tensor as an observation tensor;
the signal reception model of the microphone array is represented as follows:
$$\mathcal{Y}(l,k,n)=\mathcal{X}(l,k,n)+\mathcal{N}(l,k,n),\qquad \mathcal{Y},\mathcal{X},\mathcal{N}\in\mathbb{R}^{L\times K\times N}$$
wherein $\mathcal{Y}$ represents the observation tensor, $\mathcal{X}$ represents the tensor to be estimated, which represents the clean speech, and $\mathcal{N}$ represents the noise tensor; $\mathcal{Y}(l,k,n)$, $\mathcal{X}(l,k,n)$ and $\mathcal{N}(l,k,n)$ respectively represent the l-th element of the k-th frame in the n-th receiving channel of the observation tensor, the tensor to be estimated and the noise tensor; and L, K and N respectively represent the frame length, the number of frames and the number of microphones;
respectively carrying out sparse reconstruction on three dimensions of the observation tensor by using three orthogonal square matrixes to obtain a core tensor containing a projection coefficient;
the decomposition of the observed tensor takes the form:
$$\mathcal{Y}\approx\Sigma\times_1U_1\times_2U_2\times_3U_3,\quad \Sigma\in\mathbb{R}^{L'\times K'\times N'},\ U_1\in\mathbb{R}^{L\times L'},\ U_2\in\mathbb{R}^{K\times K'},\ U_3\in\mathbb{R}^{N\times N'},\quad L'\le L,\ K'\le K,\ N'\le N$$
wherein $\{U_1,U_2,U_3\}$ denotes the basis matrices and $\Sigma$ represents the core tensor; specifically, $U_1$ is the basis matrix expressing the mode-1 fibers $\mathcal{Y}(:,k,n)$ of the observation tensor, $U_2$ is the basis matrix expressing the mode-2 fibers $\mathcal{Y}(l,:,n)$, and $U_3$ is the basis matrix expressing the mode-3 fibers $\mathcal{Y}(l,k,:)$; $\Sigma$ contains the projection coefficients of the observation tensor on the basis matrices; $\times_1$, $\times_2$, $\times_3$ respectively represent the mode-1, mode-2 and mode-3 products of $\Sigma$ with $U_1$, $U_2$, $U_3$ in sequence; and L′, K′ and N′ represent the size of the core tensor;
by canonical polyadic decomposition, the observation tensor can be decomposed into a sum of a finite number of rank-1 tensors, and the canonical polyadic decomposition of the tensor is realized by the following formula:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ L'=K'=N'=R,$$
obtaining a superdiagonal core tensor and non-orthogonal basis matrices, where R represents the rank of the clean speech tensor;
by orthogonal decomposition, the observation tensor can be decomposed into the product of three orthogonal basis matrices and one core tensor, and the orthogonal decomposition of the tensor is realized by the following formula:
$$\min_{\Sigma,U_1,U_2,U_3}\ \left\|\mathcal{Y}-\Sigma\times_1U_1\times_2U_2\times_3U_3\right\|_F^2\quad\text{s.t.}\ U_1^TU_1=I,\ U_2^TU_2=I,\ U_3^TU_3=I$$
obtaining a non-diagonal core tensor and orthogonal basis matrices.
3. The multi-microphone speech enhancement method based on tensor decomposition as recited in claim 2, wherein in the step (2):
for the following linear observation model:
$$y(i)=x(i)+n(i),\quad i=1,2,\dots,Q$$
where Q represents the total number of samples, n(i) obeys a univariate Gaussian distribution, the n(i) at different times are independent of each other, and x(i) obeys a Gaussian distribution, the minimum statistical risk estimate of x(i) can be represented as $H_\lambda(y(i))$, where $H_\lambda(\cdot)$ is a hard threshold operator that sets to 0 the components of y(i) whose magnitude is below the threshold λ; based on the minimum statistical risk criterion, the optimal threshold is
$$\lambda=\delta\sqrt{2\log Q}$$
for a multi-channel data reception model, the following relationship is satisfied:
$$\mathcal{Y}=\mathcal{X}+\mathcal{N}=\Sigma\times_1U_1\times_2U_2\times_3U_3$$
here the noise tensor $\mathcal{N}$ satisfies the assumption of mutually independent Gaussian distributions, and the above formula is equivalent to:
$$\mathcal{Y}'=\mathcal{X}'+\mathcal{N}',\qquad \mathcal{Y}'=\mathcal{Y}\times_1U_1^T\times_2U_2^T\times_3U_3^T=\Sigma$$
with $\mathcal{X}'$ and $\mathcal{N}'$ obtained from $\mathcal{X}$ and $\mathcal{N}$ in the same way; by recovering $\mathcal{X}'$, $\mathcal{X}$ is immediately recovered; since $U_1,U_2,U_3$ are all orthogonal matrices, and orthogonal matrices correspond to rotational transformations, they do not change the independent, identically distributed Gaussian nature of $\mathcal{N}$; by unfolding the tensors in the above formula into vectors, it can be rewritten as $\mathbf{y}'=\mathbf{x}'+\mathbf{n}'$, where the vectors $\mathbf{y}'$, $\mathbf{x}'$ and $\mathbf{n}'$ all have length NLK; $\mathbf{x}'$ can be estimated as $\hat{\mathbf{x}}'=H_\lambda(\mathbf{y}')$; according to the definition of the tensor $\mathcal{X}'$, $\hat{\mathbf{x}}'$ can be reshaped into $\hat{\mathcal{X}}'$, from which $\mathcal{X}$ can in turn be restored; and the optimal threshold is $\lambda=\delta\sqrt{2\log(NLK)}$, where δ represents the standard deviation of the noise, log represents the base-2 logarithm, and NLK represents N × L × K.
CN201810070662.4A 2018-01-25 2018-01-25 Multi-microphone sound enhancement method based on tensor resolution Active CN108322858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810070662.4A CN108322858B (en) 2018-01-25 2018-01-25 Multi-microphone sound enhancement method based on tensor resolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810070662.4A CN108322858B (en) 2018-01-25 2018-01-25 Multi-microphone sound enhancement method based on tensor resolution

Publications (2)

Publication Number Publication Date
CN108322858A CN108322858A (en) 2018-07-24
CN108322858B true CN108322858B (en) 2019-11-22

Family

ID=62887100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810070662.4A Active CN108322858B (en) 2018-01-25 2018-01-25 Multi-microphone sound enhancement method based on tensor resolution

Country Status (1)

Country Link
CN (1) CN108322858B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312270B (en) * 2020-02-10 2022-11-22 腾讯科技(深圳)有限公司 Voice enhancement method and device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982805A (en) * 2012-12-27 2013-03-20 北京理工大学 Multi-channel audio signal compressing method based on tensor decomposition
CN103117059A (en) * 2012-12-27 2013-05-22 北京理工大学 Voice signal characteristics extracting method based on tensor decomposition
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
US9576583B1 (en) * 2014-12-01 2017-02-21 Cedar Audio Ltd Restoring audio signals with mask and latent variables
CN106981292A (en) * 2017-05-16 2017-07-25 北京理工大学 A kind of multichannel spatial audio signal compression modeled based on tensor and restoration methods

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10468036B2 (en) * 2014-04-30 2019-11-05 Accusonus, Inc. Methods and systems for processing and mixing signals using signal decomposition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于听觉感知与张量模型的鲁棒语音特征提取方法研究;吴强;《中国博士学位论文全文数据库 信息科技辑》;20110715(第7期);全文 *

Also Published As

Publication number Publication date
CN108322858A (en) 2018-07-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant