CN111145768B - Speech enhancement method based on WSHRRPCA algorithm - Google Patents


Info

Publication number
CN111145768B
Authority
CN
China
Prior art keywords
time
spectrum
speech
enhanced
fourier
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911290388.2A
Other languages
Chinese (zh)
Other versions
CN111145768A (en)
Inventor
罗勇江
杨腾飞
杨家利
毕鲁浩
汤建龙
王钟慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Shengxin Technology Co ltd
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201911290388.2A
Publication of CN111145768A
Application granted
Publication of CN111145768B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G10L21/0232 Processing in the frequency domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a speech enhancement method based on the WSHRRPCA algorithm. It mainly addresses the poor speech enhancement performance of existing algorithms in colored-noise environments. The method proceeds as follows. A whitening model is built from samples of the noisy speech and used to whiten the noisy speech in the time domain. The short-time Fourier transform then yields the time-frequency amplitude spectrum and time-frequency phase spectrum of the whitened noisy speech. The order of the spectral elements in each column of the time-frequency amplitude spectrum is scrambled by hash-function mapping. The rearranged spectrum is decomposed with a robust principal component analysis algorithm to obtain an enhanced time-frequency amplitude spectrum, and the original order is restored. The enhanced time-frequency amplitude spectrum and the time-frequency phase spectrum form an enhanced time-frequency spectrum, from which a complete whitened enhanced speech signal is reconstructed in the time domain. Finally, inverse whitening with the whitening model yields the enhanced speech. The invention can effectively suppress various kinds of noise in noisy speech and can be applied in speech receiving, speech coding and speech recognition systems.

Description

Speech enhancement method based on WSHRRPCA algorithm
Technical Field
The invention belongs to the technical field of signal processing, and more specifically to speech signal processing. It relates in particular to a speech enhancement method based on the whitened short-time Fourier spectrum hash-rearrangement robust principal component analysis algorithm WSHRRPCA (Whitened-short-time-Fourier-spectral-hash-reordered Robust Principal Component Analysis). The method can be used in speech receiving systems, where it performs speech enhancement and noise reduction. It can also be used in speech recognition systems, where it raises the signal-to-noise ratio of the input signal in the front-end preprocessing stage, thereby improving the system's robustness to interference and its recognition rate.
Background
Speech enhancement technology is widely used in voice calls, teleconferencing, on-site recording, military eavesdropping, hearing aids, speech recognition equipment and similar fields, and many preprocessing modules of speech coding and speech recognition systems rely on it. Traditional speech enhancement algorithms fall into three main categories: spectral subtraction, statistical-model-based algorithms, and subspace algorithms. Each has limitations in practice. Spectral subtraction enhances the signal based on an estimate of the noise spectrum. When non-stationary noise occurs, this estimate becomes inaccurate, which degrades the enhancement, and the algorithm tends to produce unnatural "musical noise". Statistical-model-based algorithms generally assume that the speech and noise signals are statistically independent and Gaussian distributed. Subspace algorithms assume that the clean speech subspace and the noise subspace are orthogonal, an assumption that often does not hold in practice. To overcome these limitations, researchers have turned to new theories. In recent years, compressed sensing based on convex optimization has become a research hotspot in digital signal processing, together with the matrix rank minimization and low-rank matrix recovery theory derived from it. Robust principal component analysis, a low-rank and sparse matrix decomposition algorithm from low-rank matrix recovery theory, has been applied to speech enhancement with good results.
However, speech enhancement methods based on robust principal component analysis have two disadvantages. First, they perform well in white noise, but colored noise distributes its energy differently from white noise, so these methods are less effective at removing colored noise. Second, part of the low-rank speech components is removed together with the noise, resulting in a loss of speech components and degrading the enhancement.
Sun et al., in the paper "A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition" (Speech Communication, 60: 44-55, 2014), propose a speech enhancement method based on a matrix decomposition algorithm with low-rank and sparse constraints. The method is implemented in three steps. First, the short-time Fourier transform is used to obtain the time-frequency amplitude spectrum and time-frequency phase spectrum of the noisy speech, and a three-point median filter smooths the time-frequency amplitude spectrum. Second, a constrained low-rank and sparse matrix decomposition algorithm decomposes the time-frequency amplitude spectrum of the noisy speech into a low-rank matrix and a sparse matrix, and binary time-frequency masking is applied to the sparse matrix. Third, the time-frequency spectrum of the enhanced speech is reconstructed from the sparse matrix and the phase spectrum of the noisy speech, and the time-domain enhanced speech is reconstructed with the inverse short-time Fourier transform. The main problem of this method is that it reduces the chance of wrongly removing low-rank speech components only by limiting the rank of the low-rank matrix. This does not solve the problem fundamentally, so part of the low-rank speech is still removed as noise. Moreover, the additional constraint on the sparsity of the sparse matrix means that, under strong background noise, a large portion of the speech components is eliminated and speech quality degrades.
Disclosure of Invention
The purpose of the invention is to overcome the above shortcomings of the prior art by providing a speech enhancement method based on a whitened short-time Fourier spectrum hash-rearrangement robust principal component analysis algorithm. The method obtains high-quality enhanced speech in noisy environments and is mainly intended for speech receiving, speech coding and speech recognition systems.
The specific idea for achieving the purpose of the invention is as follows. First, a whitening model is built from part of the samples of the noisy speech and used to whiten the noisy speech in the time domain. The short-time Fourier transform is then applied to the whitened noisy speech to obtain its time-frequency amplitude spectrum and time-frequency phase spectrum. Next, the order of the spectral elements in each column of the time-frequency amplitude spectrum is scrambled by hash-function mapping to obtain a rearranged time-frequency amplitude spectrum. This rearranged spectrum is decomposed with a robust principal component analysis algorithm to obtain an enhanced time-frequency amplitude spectrum, after which the original order of each column is restored. The enhanced time-frequency amplitude spectrum and the time-frequency phase spectrum then form an enhanced time-frequency spectrum, from which a complete whitened enhanced speech signal is reconstructed in the time domain. Finally, inverse whitening with the established whitening model yields the enhanced speech. The invention can be used for speech enhancement in various speech processing systems to restore the quality and intelligibility of speech severely corrupted by noise.
The method specifically comprises the following steps:
(1) generating the whitened noisy speech $x_w(n)$:
(1a) selecting an integer N in the range [1000, 1500] as the number of samples, and building a whitening filter from the first N samples of the noisy speech $x(n)$;
(1b) whitening the noisy speech $x(n)$ with the whitening filter obtained in (1a) to obtain the whitened noisy speech $x_w(n)$;
(2) generating the time-frequency amplitude spectrum $|D_w|$ and the time-frequency phase spectrum $\angle D_w$ of the whitened noisy speech $x_w(n)$:
(2a) choosing a frame duration in the range [20, 40] milliseconds and a frame shift (the displacement of each frame relative to the previous one) in the range [25%, 75%] of the frame length, and dividing the whitened noisy speech $x_w(n)$ into short-time speech frames;
(2b) selecting, in time order, the next unprocessed short-time speech frame as the current frame;
(2c) performing a Fourier transform on the current frame to obtain its Fourier spectrum, and computing the magnitude and phase of that spectrum to obtain a Fourier amplitude spectrum and a Fourier phase spectrum;
(2d) checking whether all short-time speech frames have been processed; if so, executing step (2e), otherwise returning to step (2b);
(2e) arranging the Fourier amplitude spectra of all frames as column vectors in time order to form the time-frequency amplitude spectrum $|D_w|$ of the whitened noisy speech, and arranging the Fourier phase spectra of all frames as column vectors in time order to form its time-frequency phase spectrum $\angle D_w$;
(3) generating the rearranged time-frequency amplitude spectrum $|D_w|_r$:
(3a) selecting, in time order, the next unprocessed column of the time-frequency amplitude spectrum $|D_w|$ as the current Fourier amplitude spectrum;
(3b) generating a new ordering for the spectral elements of the current Fourier amplitude spectrum with a hash function, and rearranging the elements in that order to obtain a rearranged Fourier amplitude spectrum;
(3c) checking whether all columns of $|D_w|$ have been processed; if so, executing step (3d), otherwise returning to step (3a);
(3d) arranging all rearranged Fourier amplitude spectra as column vectors in time order to form the rearranged time-frequency amplitude spectrum $|D_w|_r$;
(4) generating the enhanced time-frequency amplitude spectrum $|S_w|$:
(4a) selecting an integer Q in the range [6, 10] as the number of columns used to estimate the noise intensity in the rearranged time-frequency amplitude spectrum $|D_w|_r$, and estimating the noise intensity in $|D_w|_r$ from its first Q rearranged Fourier amplitude spectra;
(4b) according to the noise intensity estimated in (4a), enhancing $|D_w|_r$ with a robust principal component analysis algorithm to generate the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$;
(4c) restoring the original order of the Fourier amplitude spectrum elements in every column of $|S_w|_r$, using the orderings generated in (3b), to obtain the enhanced time-frequency amplitude spectrum $|S_w|$;
(5) forming the enhanced time-frequency spectrum $S_w$:
combining the enhanced time-frequency amplitude spectrum $|S_w|$ with the time-frequency phase spectrum $\angle D_w$ to form the enhanced time-frequency spectrum $S_w$;
(6) reconstructing the whitened enhanced speech $y_w(n)$:
(6a) selecting, in time order, the next unprocessed column of the enhanced time-frequency spectrum $S_w$ as the current enhanced Fourier spectrum;
(6b) performing an inverse Fourier transform on the current enhanced Fourier spectrum to obtain one frame of whitened short-time enhanced speech;
(6c) checking whether all columns of $S_w$ have been processed; if so, executing step (6d), otherwise returning to step (6a);
(6d) combining all whitened short-time enhanced speech frames into the complete whitened enhanced speech $y_w(n)$ with the overlap-add method;
(7) generating the enhanced speech $y(n)$:
applying inverse whitening to the whitened enhanced speech $y_w(n)$ with the whitening filter obtained in (1a) to obtain the enhanced speech $y(n)$.
Compared with the prior art, the invention has the following advantages:
First, the invention adds a whitening stage. When the background noise is colored, whitening converts it into approximately white noise, which improves the ability to eliminate colored noise. Whitening does not degrade the noise reduction of the invention in a white-noise environment.
Second, before the enhanced time-frequency amplitude spectrum is generated, the invention uses hash-function mapping to scramble the order of the spectral elements in each column of the original time-frequency amplitude spectrum. As a result, the low-rank speech components become close to full rank and lose their low-rank character. They are therefore retained in the enhanced speech instead of being removed as noise, which improves the quality of the enhanced speech.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 compares a top view of the time-frequency amplitude spectrum of the colored noise F16 with a top view of the time-frequency amplitude spectrum of its whitened version, in simulation experiment 1 of the present invention;
FIG. 3 visually compares the time-frequency amplitude spectra produced by the method of the present invention and by the speech enhancement method based on the robust principal component analysis algorithm under colored F16 noise, in simulation experiment 2 of the present invention;
FIG. 4 compares objective indices of the average speech enhancement performance of the method of the present invention and of the speech enhancement method based on the robust principal component analysis algorithm under six different types of colored noise, in simulation experiment 3.
Detailed Description
The implementation steps of the method of the invention are described in further detail below with reference to fig. 1.
Step 1, generating the whitened noisy speech $x_w(n)$.
(1.1) Select an integer N in the range [1000, 1500] as the number of samples, and build a whitening filter from the first N samples of the noisy speech $x(n)$, as follows:
Sub-step 1: select an integer p in the range [30, 50] as the order of the whitening filter, and build a p-th order linear predictor from the first N samples of the noisy speech $x(n)$, with transfer function
$$P(z) = \sum_{i=1}^{p} a_i z^{-i}$$
where the predictor coefficients $a_i$ (i = 1, 2, …, p) are solved with the autocorrelation method;
Sub-step 2: use the p-th order linear predictor to build a p-th order whitening filter with transfer function
$$W(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$$
(1.2) Whiten the noisy speech $x(n)$ with the whitening filter obtained in (1.1) to obtain the whitened noisy speech $x_w(n)$. Whitening $x(n)$ means filtering it with the p-th order whitening filter built in (1.1) of this step.
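Step 1 can be illustrated with a minimal pure-Python sketch. It assumes the conventional linear-prediction sign convention P(z) = Σ a_i z^{-i} and W(z) = 1 - P(z), since the original formulas are images. The autocorrelation normal equations are solved with the Levinson-Durbin recursion. The names `autocorrelation`, `levinson_durbin` and `whiten` are hypothetical helpers, not part of the patent.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sample list x."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve sum_j a_j r[|i-j|] = r[i] (i = 1..p) for the predictor coefficients.

    Returns [a_1, ..., a_p] for the predictor P(z) = sum_i a_i z^{-i}.
    """
    if r[0] == 0:
        return [0.0] * p             # silent input: nothing to predict
    a = [0.0] * (p + 1)              # a[0] is unused
    err = r[0]                       # prediction-error power of order 0
    for i in range(1, p + 1):
        # Reflection coefficient of order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def whiten(x, a):
    """Filter x with the FIR prediction-error filter W(z) = 1 - sum_i a_i z^{-i}."""
    return [x[n] - sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n >= i)
            for n in range(len(x))]


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    # Colored test signal: AR(1) process x[n] = 0.9 x[n-1] + e[n].
    x = [0.0]
    for _ in range(1999):
        x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
    N, p = 1024, 40                  # values used in the patent's simulation
    a = levinson_durbin(autocorrelation(x[:N], p), p)
    xw = whiten(x, a)
    print(round(a[0], 2))            # close to 0.9 for this AR(1) signal
```

In this sketch, only the first N samples are used to fit the coefficients, and the resulting filter is then applied to the whole signal, mirroring (1.1) and (1.2).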
Step 2, generating the time-frequency amplitude spectrum $|D_w|$ and the time-frequency phase spectrum $\angle D_w$ of the whitened noisy speech $x_w(n)$.
(2.1) Choose a frame duration in the range [20, 40] milliseconds and a frame shift (the displacement of each frame relative to the previous one) in the range [25%, 75%] of the frame length, and divide the whitened noisy speech $x_w(n)$ into short-time speech frames.
(2.2) Select, in time order, the next unprocessed short-time speech frame as the current frame.
(2.3) Perform a Fourier transform on the current frame to obtain its Fourier spectrum, and compute the magnitude and phase of that spectrum to obtain a Fourier amplitude spectrum and a Fourier phase spectrum.
(2.4) Check whether all short-time speech frames have been processed; if so, execute (2.5) of this step, otherwise return to (2.2) of this step.
(2.5) Arrange the Fourier amplitude spectra of all frames as column vectors in time order to form the time-frequency amplitude spectrum $|D_w|$ of the whitened noisy speech. Arrange the Fourier phase spectra of all frames as column vectors in time order to form its time-frequency phase spectrum $\angle D_w$. Both are matrices, $|D_w| \in \mathbb{R}^{m\times n}$ and $\angle D_w \in \mathbb{R}^{m\times n}$, where $\in$ denotes set membership and $\mathbb{R}$ indicates that the elements of $|D_w|$ and $\angle D_w$ are real numbers. Here m is the number of rows and n is the number of columns of $|D_w|$ and $\angle D_w$.
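The framing and Fourier analysis of step 2 can be sketched in pure Python, using a naive O(m²) DFT for clarity. The Hann analysis window is an assumption, since the patent does not name a window. It is chosen because, with a 50% frame shift as in the simulation (32 ms frames, 16 ms shift), shifted periodic Hann windows sum to one, so plain overlap-add can rebuild the signal later. `stft_amp_phase` is a hypothetical name.

```python
import cmath
import math


def hann_periodic(length):
    """Periodic Hann window; copies shifted by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / length) for t in range(length)]


def dft(frame):
    """Naive discrete Fourier transform of a real frame (returns m complex bins)."""
    m = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / m) for t in range(m))
            for k in range(m)]


def stft_amp_phase(x, frame_len, hop):
    """Return (|D_w|, angle D_w) as m x n nested lists: row = frequency bin, column = frame.

    Trailing samples that do not fill a whole frame are dropped in this sketch.
    """
    win = hann_periodic(frame_len)
    columns = [dft([x[s + t] * win[t] for t in range(frame_len)])
               for s in range(0, len(x) - frame_len + 1, hop)]
    m, n = frame_len, len(columns)
    amp = [[abs(columns[j][i]) for j in range(n)] for i in range(m)]
    phase = [[cmath.phase(columns[j][i]) for j in range(n)] for i in range(m)]
    return amp, phase
```

At the 8000 Hz sampling rate used in the simulation, 32 ms frames and a 16 ms shift correspond to `frame_len=256` and `hop=128`. This sketch keeps all m DFT bins so that the inverse transform stays simple; keeping only the non-negative-frequency half would be an equally valid design choice.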
Step 3, generating the rearranged time-frequency amplitude spectrum $|D_w|_r$.
(3.1) Select, in time order, the next unprocessed column of the time-frequency amplitude spectrum $|D_w|$ as the current Fourier amplitude spectrum.
(3.2) Use a hash function to generate a new ordering for the spectral elements of the current Fourier amplitude spectrum, and rearrange the elements in that order to obtain a rearranged Fourier amplitude spectrum. The specific steps are as follows.
Let the current Fourier amplitude spectrum be $X = [x_1, x_2, \ldots, x_m]^T \in \mathbb{R}^{m\times 1}$, where X is a column vector and $x_1, x_2, \ldots, x_m$ are its m spectral elements. The subscripts 1, 2, …, m give the order of the elements within the Fourier amplitude spectrum, T denotes transposition, $\in$ denotes set membership, and $\mathbb{R}$ indicates that the spectral elements are real numbers;
(3.2.1) select an integer a in the range [2, m) that is coprime to m and an integer b in the range [0, m), and construct the hash function $f(k) = (ak + b)_m + 1$, where $(\cdot)_m$ denotes the modulo-m operation, k is the subscript of a spectral element, and k = 1, 2, …, m. Because a and m are coprime, f maps {1, 2, …, m} one-to-one onto itself, so the rearrangement is a permutation and can be undone exactly;
(3.2.2) use f(k) to map the original subscript sequence 1, 2, …, m to the new subscript sequence f(1), f(2), …, f(m);
(3.2.3) rearrange the spectral elements according to the new subscript sequence f(1), f(2), …, f(m) to obtain the rearranged Fourier amplitude spectrum $X_r = [x_{f(1)}, x_{f(2)}, \ldots, x_{f(m)}]^T \in \mathbb{R}^{m\times 1}$.
(3.3) Check whether all columns of $|D_w|$ have been processed; if so, execute (3.4) of this step, otherwise return to (3.1) of this step.
(3.4) Arrange all rearranged Fourier amplitude spectra as column vectors in time order to form the rearranged time-frequency amplitude spectrum $|D_w|_r$.
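The hash rearrangement of (3.2), together with the order restoration later needed in (4.3), can be sketched in pure Python. All function names are hypothetical. The sketch assumes that the same a and b are used for every column; the text only states that each column is rearranged with a hash function of this form.

```python
import math


def hash_order(m, a, b):
    """New subscript sequence f(1), ..., f(m) with f(k) = ((a*k + b) mod m) + 1."""
    if not (2 <= a < m and math.gcd(a, m) == 1):
        raise ValueError("a must lie in [2, m) and be coprime to m")
    if not 0 <= b < m:
        raise ValueError("b must lie in [0, m)")
    return [(a * k + b) % m + 1 for k in range(1, m + 1)]


def rearrange(column, order):
    """X_r = [x_f(1), ..., x_f(m)]; order holds the 1-based subscripts f(k)."""
    return [column[f - 1] for f in order]


def restore(column_r, order):
    """Undo rearrange(): element k of X_r goes back to position f(k)."""
    out = [0.0] * len(order)
    for k, f in enumerate(order):
        out[f - 1] = column_r[k]
    return out


def map_columns(D, fn):
    """Apply fn to every column of an m x n nested list (rows = frequency bins)."""
    cols = [fn([row[j] for row in D]) for j in range(len(D[0]))]
    return [[col[i] for col in cols] for i in range(len(D))]


if __name__ == "__main__":
    order = hash_order(8, 3, 1)
    print(order)  # [5, 8, 3, 6, 1, 4, 7, 2]
```

A typical use would be `Dr = map_columns(D, lambda c: rearrange(c, order))` before the decomposition and `map_columns(Sr, lambda c: restore(c, order))` afterwards. When m is a power of two, as with a 256-point frame, any odd a in [3, m) satisfies the coprimality condition.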
Step 4, generating the enhanced time-frequency amplitude spectrum $|S_w|$.
(4.1) Select an integer Q in the range [6, 10] as the number of columns used to estimate the noise intensity in the rearranged time-frequency amplitude spectrum $|D_w|_r$, and estimate the noise intensity in $|D_w|_r$ from its first Q rearranged Fourier amplitude spectra.
(4.2) According to the noise intensity estimated in (4.1) of this step, enhance the rearranged time-frequency amplitude spectrum $|D_w|_r$ with a robust principal component analysis algorithm to generate the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$.
The robust principal component analysis algorithm works as follows. The robust principal component analysis model of the rearranged time-frequency amplitude spectrum $|D_w|_r$ is solved with the augmented Lagrange multiplier (ALM) method, which decomposes $|D_w|_r$ into the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$ and the low-rank rearranged time-frequency amplitude spectrum $|L_w|_r$. The optimization problem is:
Under the constraint $|D_w|_r = |L_w|'_r + |S_w|'_r$, find the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$ and the low-rank rearranged time-frequency amplitude spectrum $|L_w|_r$ that minimize $\big\| |L_w|'_r \big\|_* + \lambda \big\| |S_w|'_r \big\|_1$, i.e.
$$\big(|L_w|_r,\ |S_w|_r\big) = \arg\min_{|L_w|'_r,\ |S_w|'_r} \big\| |L_w|'_r \big\|_* + \lambda \big\| |S_w|'_r \big\|_1 \quad \text{s.t.} \quad |D_w|_r = |L_w|'_r + |S_w|'_r$$
In this problem, $|L_w|'_r$ denotes the low-rank rearranged time-frequency amplitude spectrum containing the noise information, and $|S_w|'_r$ denotes the sparse rearranged time-frequency amplitude spectrum containing the speech information. $\|\cdot\|_*$ denotes the nuclear norm, λ denotes the weight, and $\|\cdot\|_1$ denotes the 1-norm (the sum of the absolute values of the matrix elements).
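A compact NumPy sketch of this decomposition is given below. It uses the inexact ALM iteration, which alternates soft thresholding and singular value thresholding. This is a swapped-in solver, not the Exact ALM method used in the patent's simulation. The step-size schedule (μ₀ = 1.25/‖D‖₂, growth factor 1.5) and the default λ = 1/sqrt(max(m, n)) are common defaults from the RPCA literature and are assumptions here; in this method, λ would instead come from the adaptive formula in the simulation section. The returned S plays the role of $|S_w|_r$ (speech), and L plays the role of $|L_w|_r$ (noise).

```python
import numpy as np


def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1 (elementwise shrinkage)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||X||_* (shrink the singular values)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500):
    """Minimize ||L||_* + lam * ||S||_1 subject to D = L + S; returns (L, S)."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    spectral_norm = np.linalg.norm(D, 2)
    Y = D / max(spectral_norm, np.abs(D).max() / lam)  # Lagrange multiplier init
    mu = 1.25 / spectral_norm       # penalty parameter, increased geometrically
    mu_max, rho = mu * 1e7, 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        S = soft_threshold(D - L + Y / mu, lam / mu)
        L = singular_value_threshold(D - S + Y / mu, 1.0 / mu)
        residual = D - L - S        # violation of the constraint D = L + S
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") < tol * d_norm:
            break
    return L, S
```

Applied to the method, one would call `rpca_inexact_alm(Dr, lam)` on the rearranged amplitude spectrum and keep the second output. The hash rearrangement is what keeps low-rank speech structure from being absorbed into L.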
(4.3) Using the orderings generated in (3.2) of step 3, restore the original order of the Fourier amplitude spectrum elements in every column of $|S_w|_r$ to obtain the enhanced time-frequency amplitude spectrum $|S_w|$.
Step 5, forming the enhanced time-frequency spectrum $S_w$.
Combine the enhanced time-frequency amplitude spectrum $|S_w|$ generated in step 4 with the time-frequency phase spectrum $\angle D_w$ obtained in (2.5) of step 2 to form the enhanced time-frequency spectrum $S_w$: each element of $S_w$ takes its magnitude from $|S_w|$ and its phase from $\angle D_w$.
Step 6, reconstructing the whitened enhanced speech $y_w(n)$.
(6.1) Select, in time order, the next unprocessed column of the enhanced time-frequency spectrum $S_w$ as the current enhanced Fourier spectrum.
(6.2) Perform an inverse Fourier transform on the current enhanced Fourier spectrum to obtain one frame of whitened short-time enhanced speech.
(6.3) Check whether all columns of $S_w$ have been processed; if so, execute (6.4) of this step, otherwise return to (6.1) of this step.
(6.4) Combine all whitened short-time enhanced speech frames into the complete whitened enhanced speech $y_w(n)$ with the overlap-add method.
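Steps 5 and 6 can be sketched together in pure Python. The sketch forms $S_w = |S_w| \, e^{j \angle D_w}$ elementwise, inverse-transforms each column, and overlap-adds the frames. It assumes the analysis frames were taken with a periodic Hann window at a 50% shift; under that condition the windows sum to one, and plain overlap-add reconstructs the signal wherever two frames overlap. Keeping only the real part of each inverse DFT is also an assumption: a modified amplitude spectrum need not stay conjugate-symmetric. All function names are hypothetical.

```python
import cmath
import math


def compose_spectrum(amp, phase):
    """S_w[i][j] = |S_w|[i][j] * exp(1j * angle D_w[i][j]) for m x n nested lists."""
    return [[cmath.rect(a, p) for a, p in zip(amp_row, phase_row)]
            for amp_row, phase_row in zip(amp, phase)]


def idft(spectrum):
    """Naive inverse DFT; only the real part is kept because the output is a real signal."""
    m = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / m) for k in range(m)).real / m
            for t in range(m)]


def overlap_add(spec, hop, length):
    """Inverse-transform every column (frame) of spec and overlap-add frames hop samples apart."""
    m, n = len(spec), len(spec[0])
    y = [0.0] * length
    for j in range(n):
        frame = idft([spec[i][j] for i in range(m)])
        for t in range(m):
            if j * hop + t < length:
                y[j * hop + t] += frame[t]
    return y
```

In the simulation setting, this would be called with `hop=128` on 256-row spectra. The first and last half-frames are covered by only one window, so they are not exactly reconstructed in this sketch.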
Step 7, generating the enhanced speech $y(n)$.
Using the whitening filter obtained in (1.1) of step 1, apply inverse whitening to the whitened enhanced speech $y_w(n)$ obtained in step 6 to obtain the enhanced speech $y(n)$, as follows.
(7a) Use the whitening filter obtained in (1.1) of step 1 to build an inverse whitening filter with transfer function $W_I(z) = 1/W(z)$.
(7b) Filter the whitened enhanced speech $y_w(n)$ with the inverse whitening filter to obtain the enhanced speech $y(n)$.
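The inverse whitening filter can be sketched as an all-pole recursion. `inverse_whiten` is a hypothetical helper. It takes the same coefficients $a_1, \ldots, a_p$ as the whitening filter and assumes the convention $W(z) = 1 - \sum_i a_i z^{-i}$. Because the autocorrelation method yields a minimum-phase $W(z)$, the recursive filter $1/W(z)$ is stable.

```python
def inverse_whiten(e, a):
    """All-pole filter W_I(z) = 1 / (1 - sum_i a_i z^{-i}): y[n] = e[n] + sum_i a_i y[n-i]."""
    y = []
    for n, sample in enumerate(e):
        acc = sample
        for i, coeff in enumerate(a, start=1):
            if n >= i:
                acc += coeff * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    # The impulse response of 1 / (1 - 0.5 z^{-1}) is 0.5**n.
    print(inverse_whiten([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

Feeding a whitened signal through this recursion with the same coefficients undoes the FIR whitening exactly, up to floating-point rounding.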
The application effect of the invention is further explained by combining the following simulation:
1. Simulation conditions
The simulation experiments were implemented in MATLAB. The speech sampling rate is 8000 Hz, each short-time speech frame lasts 32 milliseconds, and the frame shift (the displacement of each frame relative to the previous one) is 16 milliseconds. The first 1024 samples of the noisy speech are used to build a 40th-order whitening filter. The robust principal component analysis problem is solved with the Exact ALM (Exact Augmented Lagrange Multiplier) method. The weight parameter of the robust principal component analysis algorithm is set adaptively from the noise intensity in the rearranged Fourier amplitude spectrum $|D_w|_r$ by the following formula:
$$\lambda = -0.004\,\zeta + 0.1181$$
where λ is the weight parameter of the robust principal component analysis algorithm and ζ is an estimate of the signal-to-noise ratio in the rearranged Fourier amplitude spectrum $|D_w|_r$. Specifically, ζ is determined by the following formula:
[Formula for ζ: equation image not reproduced]
In this formula, ζ denotes the estimate of the signal-to-noise ratio in the rearranged Fourier amplitude spectrum $|D_w|_r$, $\log_{10}(\cdot)$ denotes the base-10 logarithm, and Σ denotes summation. The squared-element term (also an image in the original) denotes the square of the spectral element in row i, column j of $|D_w|_r$. n is the number of columns of $|D_w|_r$, and Q is the number of columns of $|D_w|_r$ used to estimate the noise intensity, which is set to 8 in the simulation experiments.
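The adaptive weight is a linear function of the SNR estimate ζ, and `rpca_weight` below computes it exactly as in that formula. The ζ estimator, by contrast, is HYPOTHETICAL, because the patent's formula for ζ is only available as an image. `estimate_snr_db` merely assumes one plausible form built from the ingredients the text lists: squared spectral elements, all n columns, and the first Q columns treated as noise-only. It should not be read as the patent's formula. Because the hash rearrangement only permutes elements within a column, column energies of $|D_w|_r$ equal those of $|D_w|$.

```python
import math


def estimate_snr_db(D_r, Q):
    """HYPOTHETICAL SNR estimate zeta (dB) for an m x n amplitude spectrum; not the patent's formula.

    Assumes the first Q columns are noise-only: the noise energy is their mean column
    energy, and the speech energy is the mean column energy over all n columns minus it.
    """
    m, n = len(D_r), len(D_r[0])
    col_energy = [sum(D_r[i][j] ** 2 for i in range(m)) for j in range(n)]
    noise = max(sum(col_energy[:Q]) / Q, 1e-12)
    speech = max(sum(col_energy) / n - noise, 1e-12 * noise)
    return 10.0 * math.log10(speech / noise)


def rpca_weight(zeta):
    """Adaptive RPCA weight from the simulation section: lambda = -0.004 * zeta + 0.1181."""
    return -0.004 * zeta + 0.1181
```

With Q = 8 as in the simulation, `rpca_weight(estimate_snr_db(Dr, 8))` would supply λ to an RPCA solver. Under this linear rule, a higher estimated SNR gives a smaller λ, which penalizes the sparse (speech) part less.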
2. Simulation content
There are three simulation experiments. Simulation experiment 1 whitens colored noise to illustrate the effectiveness of the whitening process in the invention. Fig. 2 compares the top view of the time-frequency amplitude spectrum of the colored noise F16 with that of the whitened signal obtained in simulation experiment 1. Fig. 2(a) shows a top view of the time-frequency amplitude spectrum of the colored noise F16, and Fig. 2(b) shows a top view of the time-frequency amplitude spectrum of the signal obtained by whitening the F16 noise. In each time-frequency amplitude spectrum in Fig. 2, the horizontal axis is time in seconds, the vertical axis is frequency in kilohertz, and the spectrum is shown as a logarithmic spectrum with values in decibels.
Simulation experiment 2 visually compares the speech enhancement performance of the method of the present invention with that of the speech enhancement method based on the robust principal component analysis algorithm; the resulting time-frequency amplitude spectra are shown in Fig. 3. In this experiment, a clean speech segment is corrupted by colored F16 noise at a signal-to-noise ratio of 5 dB. It is then enhanced both with the method of the present invention and with the existing speech enhancement method based on the robust principal component analysis algorithm. Fig. 3(a) shows a top view of the time-frequency amplitude spectrum of the clean speech, and Fig. 3(b) shows that of the colored F16 noise. Fig. 3(c) and Fig. 3(d) show those of the speech component and the noise component obtained by the method based on the robust principal component analysis algorithm. Fig. 3(e) and Fig. 3(f) show those of the speech component and the noise component obtained by the method of the present invention. In each time-frequency amplitude spectrum in Fig. 3, the horizontal axis is time in seconds, the vertical axis is frequency in kilohertz, and the spectrum is shown as a logarithmic spectrum with values in decibels.
Simulation experiment 3 compares the average speech enhancement effect of the method of the present invention and of the existing speech enhancement method based on the robust principal component analysis (RPCA) algorithm under six types of colored noise (buccaneer1, buccaneer2, f16, factory1, hfchannel and pink); the results are shown in Fig. 4. The speech enhancement effect is measured by two objective indexes: the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The SDR is the ratio, in decibels, of the speech signal energy to the noise energy contained in the enhanced speech; PESQ is an objective index that predicts the subjective quality of the enhanced speech. For both indexes, larger values indicate better speech enhancement. In Fig. 4(a), the circle-marked curve shows how the average SDR of the speech enhanced by the method of the present invention over the six colored noises varies with the signal-to-noise ratio (SNR) of the noisy speech, and the diamond-marked curve shows the same for the RPCA-based method; the abscissa is the SNR of the noisy speech in decibels and the ordinate is the SDR in decibels.
In Fig. 4(b), the circle-marked curve shows how the average PESQ score of the speech enhanced by the method of the present invention over the six colored noises varies with the SNR, and the diamond-marked curve shows the same for the RPCA-based method; the abscissa is the SNR in decibels and the ordinate is the PESQ score.
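The SDR as worded above (energy of the clean speech divided by the energy of what else remains in the enhanced speech, in dB) can be computed with the short sketch below. It assumes a time-aligned clean reference of the same length and treats the whole residual (enhanced minus clean) as noise; the BSS Eval SDR usually reported in the literature additionally allows a short distortion filter on the reference, which this sketch omits. `simple_sdr_db` is a hypothetical name. PESQ is specified in ITU-T Recommendation P.862 and is not reproduced here.

```python
import math
from typing import Sequence


def simple_sdr_db(clean: Sequence[float], enhanced: Sequence[float]) -> float:
    """Hypothetical helper: 10*log10(clean energy / residual energy).

    Assumption: the residual enhanced - clean is counted entirely as noise
    (no distortion filter, unlike BSS Eval SDR).
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((y - s) ** 2 for s, y in zip(clean, enhanced))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


clean = [1.0, 0.0, -1.0, 0.0]
enhanced = [1.1, 0.0, -0.9, 0.0]
print(round(simple_sdr_db(clean, enhanced), 3))  # 20.0
```

Here the residual carries 1/100 of the clean energy, which gives 20 dB.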
3. Simulation result analysis:
As can be seen from Fig. 2, the energy of the colored noise F16 is concentrated mainly in the 0 Hz to 1.5 kHz and 2.5 kHz to 3 kHz bands, whereas after whitening the signal energy is distributed roughly uniformly over the whole 0 Hz to 4 kHz band, similar to white noise. The whitening step of the present invention therefore effectively converts colored noise into approximately white noise.
Fig. 3 shows that a large amount of speech remains in the noise component obtained by the speech enhancement method based on the robust principal component analysis (RPCA) algorithm (Fig. 3(d)), whereas very little speech remains in the noise component obtained by the method of the present invention (Fig. 3(f)); meanwhile, the speech component in Fig. 3(e) carries more energy than that in Fig. 3(c). This shows visually that the method of the present invention achieves a better speech enhancement effect. The improvement comes from scrambling the order of the Fourier spectrum elements within each frame: this reduces the frame-to-frame similarity of the speech in the time-frequency amplitude spectrum, which alleviates the wrong decomposition of speech components with low-rank structure into the low-rank component, and thus improves speech enhancement.
As can be seen from Fig. 4, the signal-to-distortion ratio (SDR) curve of the method of the present invention lies above that of the speech enhancement method based on the robust principal component analysis (RPCA) algorithm. For the perceptual evaluation of speech quality (PESQ) index, the method of the present invention scores slightly lower than the RPCA-based method when the SNR of the noisy speech is 0 dB and -5 dB. Taking both indexes together, the method of the present invention removes noise better in a variety of colored noise environments while retaining as much of the speech component as possible in the enhanced speech, and therefore provides a good speech enhancement effect.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (5)

1. A speech enhancement method based on the whitened short-time Fourier spectrum hash-rearrangement robust principal component analysis (WSHRRPCA) algorithm, characterized by comprising the following steps:
(1) Generating the whitened noisy speech $x_w(n)$:
(1a) Selecting an integer in the range [1000, 1500] as the number of samples N, and establishing a whitening filter from the first N samples of the noisy speech $x(n)$, specifically:
(1a1) Selecting an integer p in the range [30, 50] as the order of the whitening filter, and establishing a p-order linear predictor from the first N samples of the noisy speech $x(n)$, the transfer function of the linear predictor being
$$P(z) = \sum_{i=1}^{p} a_i z^{-i}$$
and solving for the linear predictor coefficients $a_i$ ($i = 1, 2, \ldots, p$) by the autocorrelation method;
(1a2) Using the p-order linear predictor to establish a p-order whitening filter with transfer function
$$W(z) = 1 - P(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$$
(1b) Whitening the noisy speech $x(n)$ with the whitening filter obtained in (1a), specifically: filtering $x(n)$ with the p-order whitening filter established in step (1a) to obtain the whitened noisy speech $x_w(n)$;
(2) Generating the time-frequency amplitude spectrum $|D_w|$ and the time-frequency phase spectrum $\angle D_w$ of the whitened noisy speech $x_w(n)$:
(2a) Selecting a frame duration in the range [20, 40] ms and a value in the range [25%, 75%] of the frame length as the shift of each frame relative to the previous frame, and dividing the whitened noisy speech $x_w(n)$ into a number of short-time speech frames;
(2b) Selecting, in time order, the next unprocessed short-time speech frame as the current frame;
(2c) Performing a Fourier transform on the current frame to obtain its Fourier spectrum, and computing the magnitude and phase of this spectrum to obtain the Fourier magnitude spectrum and the Fourier phase spectrum of the frame;
(2d) Determining whether all short-time speech frames have been processed; if so, executing step (2e), otherwise returning to step (2b);
(2e) Arranging the Fourier magnitude spectra of all frames as column vectors in time order to form the time-frequency amplitude spectrum $|D_w|$ of the whitened noisy speech, and arranging the Fourier phase spectra of all frames as column vectors in time order to form the time-frequency phase spectrum $\angle D_w$ of the whitened noisy speech;
(3) Generating the rearranged time-frequency amplitude spectrum $|D_w|_r$:
(3a) Selecting, in time order, the next unprocessed column of the time-frequency amplitude spectrum $|D_w|$ as the current Fourier magnitude spectrum;
(3b) Generating a new arrangement order for the spectral elements of the current Fourier magnitude spectrum with a hash function, and rearranging the spectral elements in this order to obtain a rearranged Fourier magnitude spectrum;
(3c) Determining whether all columns of $|D_w|$ have been processed; if so, executing step (3d), otherwise returning to step (3a);
(3d) Arranging all rearranged Fourier magnitude spectra as column vectors in time order to form the rearranged time-frequency amplitude spectrum $|D_w|_r$;
(4) Generating the enhanced time-frequency amplitude spectrum $|S_w|$:
(4a) Selecting an integer Q in the range [6, 10] as the number of columns used to estimate the noise intensity in the rearranged time-frequency amplitude spectrum $|D_w|_r$, and estimating the noise intensity in $|D_w|_r$ from its first Q rearranged Fourier magnitude spectra;
(4b) Enhancing the rearranged time-frequency amplitude spectrum $|D_w|_r$ with the robust principal component analysis algorithm according to the noise intensity estimated in (4a), to generate the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$;
(4c) Restoring, according to the new arrangement orders generated in (3b), the original order of the Fourier magnitude spectrum elements in every column of $|S_w|_r$, to obtain the enhanced time-frequency amplitude spectrum $|S_w|$;
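(5) Composing the enhanced time-frequency spectrum $S_w$:
Combining the enhanced time-frequency amplitude spectrum $|S_w|$ with the time-frequency phase spectrum $\angle D_w$ to compose the enhanced time-frequency spectrum $S_w$, i.e. element-wise $S_w = |S_w| \odot e^{j \angle D_w}$;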
(6) Reconstructing the whitened enhanced speech $y_w(n)$:
(6a) Selecting, in time order, the next unprocessed column of the enhanced time-frequency spectrum $S_w$ as the current enhanced Fourier spectrum;
(6b) Performing an inverse Fourier transform on the current enhanced Fourier spectrum to obtain one frame of whitened short-time enhanced speech;
(6c) Determining whether all columns of $S_w$ have been processed; if so, executing step (6d), otherwise returning to step (6a);
(6d) Reconstructing the complete whitened enhanced speech $y_w(n)$ from all whitened short-time enhanced speech frames by the overlap-add method;
(7) Generating the enhanced speech $y(n)$:
Performing inverse whitening on the whitened enhanced speech $y_w(n)$ with the whitening filter obtained in (1a) to obtain the enhanced speech $y(n)$.
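Step (1) can be sketched in Python as follows. It assumes the standard LPC convention, predictor $P(z) = \sum_i a_i z^{-i}$ and whitening filter $W(z) = 1 - P(z)$, and solves the autocorrelation method with the Levinson-Durbin recursion. The helper names (`autocorr`, `lpc_autocorrelation`, `whiten`) and the synthetic AR(1) test signal are illustrative assumptions, not taken from the patent; the order p = 4 is deliberately smaller than the claimed [30, 50] so the pure-Python demo stays fast.

```python
import random
from typing import List, Sequence


def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag], as used by the autocorrelation method."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def lpc_autocorrelation(x: Sequence[float], p: int) -> List[float]:
    """Coefficients a_1..a_p of P(z) = sum_i a_i z^-i via Levinson-Durbin."""
    r = autocorr(x, p)
    if r[0] <= 0.0:
        raise ValueError("signal has zero energy")
    a = [0.0] * (p + 1)  # a[0] unused so that a[i] matches the subscript in the claim
    err = r[0]  # prediction-error energy of the current order
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def whiten(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """FIR whitening filter W(z) = 1 - sum_i a_i z^-i with zero initial state."""
    p = len(a)
    return [x[n] - sum(a[i - 1] * x[n - i] for i in range(1, min(p, n) + 1))
            for n in range(len(x))]


# Hypothetical coloured test signal: AR(1) process with its pole at 0.9.
rng = random.Random(0)
x = [0.0] * 4000
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + rng.gauss(0.0, 1.0)

N, p = 1200, 4  # claim: N in [1000, 1500], p in [30, 50]; small p keeps the demo fast
a = lpc_autocorrelation(x[:N], p)  # filter designed from the first N samples only
xw = whiten(x, a)  # then applied to the whole signal, as in step (1b)
```

The coloured input has a lag-1 normalised autocorrelation near 0.9, while the whitened output's is close to zero, which is the flattening seen in Fig. 2.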
2. The method of claim 1, wherein the time-frequency amplitude spectrum $|D_w|$ and the time-frequency phase spectrum $\angle D_w$ in step (2e) are both matrices, with $|D_w| \in \mathbb{R}^{m \times n}$ and $\angle D_w \in \mathbb{R}^{m \times n}$, where $\in$ denotes membership of a set, $\mathbb{R}$ indicates that the elements of $|D_w|$ and $\angle D_w$ are real numbers, m is the number of rows and n is the number of columns of $|D_w|$ and $\angle D_w$.
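Steps (2), (5) and (6) of claim 1, with the m × n matrix layout of claim 2, can be sketched as below. The patent does not specify the analysis window, the DFT length, or whether only the non-redundant half of the spectrum is kept, so this sketch assumes a periodic Hann window, a hop of half the frame length (inside the claimed 25% to 75% range) and a full-length DFT, so that m equals the frame length. A naive O(L²) DFT keeps it dependency-free where a real implementation would use an FFT; all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dft(frame: Sequence[float]) -> List[complex]:
    L = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / L) for t in range(L))
            for k in range(L)]


def idft(spec: Sequence[complex]) -> List[complex]:
    L = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / L) for k in range(L)) / L
            for t in range(L)]


def stft_mag_phase(x: Sequence[float], frame_len: int, hop: int) -> Tuple[Matrix, Matrix]:
    """Steps (2a)-(2e): |D| and angle(D) as m x n matrices (rows = bins, columns = frames)."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]  # periodic Hann
    cols = [dft([x[s + t] * win[t] for t in range(frame_len)])
            for s in range(0, len(x) - frame_len + 1, hop)]
    mag = [[abs(c[i]) for c in cols] for i in range(frame_len)]
    phase = [[cmath.phase(c[i]) for c in cols] for i in range(frame_len)]
    return mag, phase


def istft_overlap_add(mag: Matrix, phase: Matrix, hop: int, length: int) -> List[float]:
    """Steps (5)-(6): S = |S| * exp(j * phase) per column, inverse DFT, overlap-add."""
    m, n = len(mag), len(mag[0])
    y = [0.0] * length
    for j in range(n):
        frame = idft([cmath.rect(mag[i][j], phase[i][j]) for i in range(m)])
        for t in range(m):
            # Real part: modified magnitudes may break conjugate symmetry.
            y[j * hop + t] += frame[t].real
    return y


x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
mag, phase = stft_mag_phase(x, frame_len=16, hop=8)
y = istft_overlap_add(mag, phase, hop=8, length=len(x))
print(len(mag), len(mag[0]))  # 16 7
```

With unmodified magnitudes, samples covered by two frames (indices 8 to 55 here) are reconstructed exactly, because periodic Hann windows at 50% overlap sum to one; padding the signal by half a frame at each end would extend this to the whole signal.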
3. The method of claim 1, wherein generating, in step (3b), a new arrangement order for the spectral elements of the current Fourier magnitude spectrum with a hash function and rearranging the spectral elements in this order to obtain a rearranged Fourier magnitude spectrum comprises the following steps:
Let the current Fourier magnitude spectrum be $X = [x_1, x_2, \ldots, x_m]^T \in \mathbb{R}^{m \times 1}$, where X is a column vector, $x_1, x_2, \ldots, x_m$ are its m spectral elements, the subscripts 1, 2, …, m give the order of the spectral elements in the Fourier magnitude spectrum, T denotes vector transposition, $\in$ denotes membership of a set, and $\mathbb{R}$ indicates that the spectral elements are real numbers;
(3b1) Selecting an integer a in the range [2, m) that is coprime to m, selecting an integer b in the range [0, m), and constructing the hash function $f(k) = (ak + b)_m + 1$, where $(\cdot)_m$ denotes the modulo-m operation, k is the subscript of a spectral element, and k = 1, 2, …, m;
(3b2) Mapping the original subscript sequence 1, 2, …, m of the spectral elements to the new subscript sequence f(1), f(2), …, f(m) with the hash function f(k);
(3b3) Rearranging the spectral elements according to the new subscript sequence f(1), f(2), …, f(m) to obtain the rearranged Fourier magnitude spectrum $X_r = [x_{f(1)}, x_{f(2)}, \ldots, x_{f(m)}]^T \in \mathbb{R}^{m \times 1}$.
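Claim 3's hash rearrangement, and its undoing in step (4c), can be sketched in Python as follows. Because a is coprime to m, the map k to (ak + b) mod m hits every residue exactly once as k runs over 1, …, m, so f is a permutation of {1, …, m} and can be inverted. The claims do not say how a and b change from frame to frame; since one fixed permutation applied to every column would leave the rank of the matrix unchanged, this sketch assumes a fresh (a, b) pair is drawn per frame and kept for step (4c). `random_key` and the other function names are hypothetical.

```python
import random
from math import gcd
from typing import List, Sequence, Tuple


def hash_order(m: int, a: int, b: int) -> List[int]:
    """(3b1)-(3b2): the new subscripts f(1), ..., f(m) with f(k) = (a*k + b) mod m + 1."""
    if not (2 <= a < m and gcd(a, m) == 1):
        raise ValueError("a must lie in [2, m) and be coprime to m")
    if not 0 <= b < m:
        raise ValueError("b must lie in [0, m)")
    return [(a * k + b) % m + 1 for k in range(1, m + 1)]


def random_key(m: int, rng: random.Random) -> Tuple[int, int]:
    """Hypothetical helper: draw a valid (a, b) pair for one frame (needs m >= 3)."""
    candidates = [a for a in range(2, m) if gcd(a, m) == 1]
    return rng.choice(candidates), rng.randrange(m)


def rearrange(x: Sequence[float], order: Sequence[int]) -> List[float]:
    """(3b3): X_r = [x_f(1), ..., x_f(m)], with 1-based subscripts as in the claim."""
    return [x[f - 1] for f in order]


def restore(xr: Sequence[float], order: Sequence[int]) -> List[float]:
    """Step (4c): put element k of X_r back at position f(k)."""
    x = [0.0] * len(xr)
    for k, f in enumerate(order):
        x[f - 1] = xr[k]
    return x


order = hash_order(8, 3, 0)
print(order)  # [4, 7, 2, 5, 8, 3, 6, 1]
X = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
Xr = rearrange(X, order)
print(Xr)  # [40.0, 70.0, 20.0, 50.0, 80.0, 30.0, 60.0, 10.0]
```

Storing only the (a, b) pair per frame is enough to rebuild the order for step (4c), so the rearrangement costs no extra memory beyond two integers per column.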
4. The method of claim 1, wherein the robust principal component analysis algorithm in step (4b) consists in solving the robust principal component analysis model of the rearranged time-frequency amplitude spectrum $|D_w|_r$ with the augmented Lagrange multiplier (ALM) method, thereby decomposing $|D_w|_r$ into the sparse rearranged time-frequency amplitude spectrum $|S_w|_r$ and the low-rank rearranged time-frequency amplitude spectrum $|L_w|_r$.
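Claim 4 names the ALM method without further detail. A common RPCA formulation solves $\min \|L\|_* + \lambda \|S\|_1$ subject to $D = L + S$, and the NumPy sketch below follows the widely used inexact-ALM scheme for it, with the usual defaults $\lambda = 1/\sqrt{\max(m, n)}$ and a penalty growth factor of 1.5. How the noise intensity estimated in step (4a) enters the decomposition is not stated in the text, so it is left out; the function name `rpca_inexact_alm`, the parameter values and the synthetic test matrix are all assumptions.

```python
from typing import Optional, Tuple

import numpy as np


def rpca_inexact_alm(D: np.ndarray, lam: Optional[float] = None, tol: float = 1e-7,
                     max_iter: int = 500) -> Tuple[np.ndarray, np.ndarray]:
    """Split D into low-rank L and sparse S with D ~= L + S (inexact ALM sketch)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(D, 2)  # largest singular value
    Y = D / max(norm_two, np.abs(D).max() / lam)  # dual variable initialisation
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    d_fro = np.linalg.norm(D, "fro")
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # L-update: singular value thresholding at 1/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: element-wise soft thresholding at lam/mu
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Z = D - L - S  # constraint residual
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / d_fro < tol:
            break
    return L, S


# Hypothetical check: rank-2 matrix plus 5% large sparse spikes.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
S0[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
L_hat, S_hat = rpca_inexact_alm(L0 + S0)
```

In the terms of claim 4, D would be $|D_w|_r$, S the sparse part $|S_w|_r$ and L the low-rank part $|L_w|_r$.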
5. The method of claim 1, wherein, in step (7), the inverse whitening of the whitened enhanced speech $y_w(n)$ with the whitening filter obtained in step (1a), to obtain the enhanced speech $y(n)$, is carried out as follows:
(7a) Establishing, from the whitening filter obtained in (1a), an inverse whitening filter with transfer function $W_I(z) = 1/W(z)$;
(7b) Filtering the whitened enhanced speech $y_w(n)$ with the inverse whitening filter to obtain the enhanced speech $y(n)$.
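The inverse filter $W_I(z) = 1/W(z)$ of claim 5 is an all-pole filter. Assuming the convention $W(z) = 1 - \sum_i a_i z^{-i}$, it can be sketched as below; the FIR `whiten` helper is repeated only so that the round trip can be checked, and the coefficients are made up. Stability of $1/W(z)$ requires $W(z)$ to be minimum phase, which LPC coefficients from the autocorrelation method satisfy in exact arithmetic.

```python
from typing import List, Sequence


def whiten(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """FIR W(z) = 1 - sum_i a_i z^-i (zero initial state), repeated for the round trip."""
    return [x[n] - sum(a[i - 1] * x[n - i] for i in range(1, min(len(a), n) + 1))
            for n in range(len(x))]


def inverse_whiten(yw: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole W_I(z) = 1 / W(z): y(n) = y_w(n) + sum_i a_i y(n - i), zero initial state."""
    y: List[float] = []
    for n, v in enumerate(yw):
        y.append(v + sum(a[i - 1] * y[n - i] for i in range(1, min(len(a), n) + 1)))
    return y


a = [0.9, -0.2]  # hypothetical coefficients; zeros of W(z) at 0.5 and 0.4, so 1/W(z) is stable
x = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75]
x_back = inverse_whiten(whiten(x, a), a)
```

When nothing is changed between the two filters, the round trip returns the input up to rounding; in the method, the RPCA step alters $y_w(n)$ in between, and the inverse filter restores the original spectral colouring.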
CN201911290388.2A 2019-12-16 2019-12-16 Speech enhancement method based on WSHRRPCA algorithm Active CN111145768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911290388.2A CN111145768B (en) 2019-12-16 2019-12-16 Speech enhancement method based on WSHRRPCA algorithm

Publications (2)

Publication Number Publication Date
CN111145768A CN111145768A (en) 2020-05-12
CN111145768B true CN111145768B (en) 2022-05-17

Family

ID=70518298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911290388.2A Active CN111145768B (en) 2019-12-16 2019-12-16 Speech enhancement method based on WSHRRPCA algorithm

Country Status (1)

Country Link
CN (1) CN111145768B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115267650B (en) * 2022-05-25 2024-09-13 Guangdong University of Technology Information source number estimation method used under conditions of color noise and small snapshot number

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215671A (en) * 2018-11-08 2019-01-15 Xidian University Speech-enhancement system and method based on MFrSRRPCA algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1253581B1 (en) * 2001-04-27 2004-06-30 CSEM Centre Suisse d'Electronique et de Microtechnique S.A. - Recherche et Développement Method and system for speech enhancement in a noisy environment
US8374854B2 (en) * 2008-03-28 2013-02-12 Southern Methodist University Spatio-temporal speech enhancement technique based on generalized eigenvalue decomposition

Also Published As

Publication number Publication date
CN111145768A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN107845389B (en) Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN110619885B (en) Method for generating confrontation network voice enhancement based on deep complete convolution neural network
CN109215671B (en) Voice enhancement system and method based on MFrSRRPCA algorithm
CN105679330B (en) Based on the digital deaf-aid noise-reduction method for improving subband signal-to-noise ratio (SNR) estimation
CN111508518B (en) Single-channel speech enhancement method based on joint dictionary learning and sparse representation
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
CN111899750B (en) Speech enhancement algorithm combining cochlear speech features and hopping deep neural network
Vanjari et al. Comparative Analysis of Speech Enhancement Techniques in Perceptive of Hearing Aid Design
CN114566176B (en) Residual echo cancellation method and system based on deep neural network
CN110808057A (en) Voice enhancement method for generating confrontation network based on constraint naive
Do et al. Speech Separation in the Frequency Domain with Autoencoder.
CN111145768B (en) Speech enhancement method based on WSHRRPCA algorithm
CN115574922A (en) Hydroelectric generating set vibration signal noise reduction method and system based on cross entropy
Min et al. Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement
CN116682444A (en) Single-channel voice enhancement method based on waveform spectrum fusion network
Nossier et al. Two-stage deep learning approach for speech enhancement and reconstruction in the frequency and time domains
CN116469394A (en) Robust speaker identification method based on spectrogram denoising and countermeasure learning
Rao et al. Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration
TWI749547B (en) Speech enhancement system based on deep learning
CN113066483B (en) Sparse continuous constraint-based method for generating countermeasure network voice enhancement
Surendran et al. Perceptual subspace speech enhancement with variance normalization
Meher et al. Dynamic spectral subtraction on AWGN speech
Sulong et al. Speech enhancement based on wiener filter and compressive sensing
Goswami et al. Phase aware speech enhancement using realisation of Complex-valued LSTM
Childers et al. Co--Channel speech separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230424

Address after: 710065 5th Floor, Block B, Productivity Building, No. 3, Dianzi West Street, Electronic Industrial Park, Hi-tech Zone, Xi'an City, Shaanxi Province

Patentee after: Xi'an Shengxin Technology Co.,Ltd.

Address before: 710071 No. 2 Taibai South Road, Shaanxi, Xi'an

Patentee before: XIDIAN University

CP02 Change in the address of a patent holder

Address after: Room 810, Building C, 8th Floor, Chuangye Building, No. 16 Gaoxin 1st Road, Xi'an City, Shaanxi Province, 710065

Patentee after: Xi'an Shengxin Technology Co.,Ltd.

Address before: 710065 5th Floor, Block B, Productivity Building, No. 3, Dianzi West Street, Electronic Industrial Park, Hi-tech Zone, Xi'an City, Shaanxi Province

Patentee before: Xi'an Shengxin Technology Co.,Ltd.