WO1986005619A1 - Analyzer for speech in noise prone environments - Google Patents
- Publication number
- WO1986005619A1 (application PCT/US1985/002255)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- signals
- noise
- autocorrelation
- signal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
Abstract
A speech pattern from a noisy environment is analyzed to form a representative sequence of speech parameter signals. The noise-contaminated speech pattern is partitioned into successive time frame intervals. A predictive error signal is generated for each successive interval responsive to the noise-contaminated speech of the interval and an estimate of the contaminating noise. Predictive parameter signals are selected to minimize the time frame predictive error signal so that accurate digital codes representative of the speech are generated.
Description
ANALYZER FOR SPEECH IN NOISE PRONE ENVIRONMENTS
Field of the Invention
This invention relates to a method for analyzing speech in a noise prone environment. Background of the Invention
The analysis of speech patterns using linear prediction techniques is well known in the art and is used in speech processing, speech recognition, and speech synthesis. U. S. Patent 3,624,302 discloses arrangements to transform speech signals into linear prediction signals for such uses as speech coding, speech recognition, and speech synthesis. The prediction analysis is based on a pth order all-pole model of the speech signal. In noisy environments such as factories, moving vehicles, or large offices with typewriters and other business equipment, however, the background noise contaminates the speech so that the all-pole model is not accurate. Even in relatively quiet environments, or where noise-canceling microphones are used, noise always present in the background affects the accuracy of speech analysis.
With additive noise, the problem is that the speech signal spectrum is no longer an all-pole spectrum, and the usual methods of estimating the pth order all-pole parameters from the first p+1 correlations of the signal are no longer valid. While such an analysis results in a match of the first p+1 correlations, it does not guarantee the matching of the higher order correlations and affects the degree to which speech parameters obtained from the analysis correspond to the speech pattern applied to the analyzer. It is an object of the invention to provide improved speech analysis that mitigates the effects of environmental noise on the generation of speech parameters. The problems are solved in accordance with this invention by a method which comprises the steps of partitioning an input speech pattern into successive time frame intervals, forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern, generating noise signals representative of the environment, forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals, generating a signal corresponding to the difference between said first and second autocorrelation signals, and producing a set of signals representative of the current time frame interval speech responsive to said difference corresponding signal and the second autocorrelation signals. Summary of the Invention
The invention is directed to an arrangement for analyzing noise-contaminated speech in which the speech pattern is partitioned into successive time frame intervals. A predictive error signal is generated for each interval responsive to the noise-contaminated speech of the interval and an estimate of the noise. Predictive parameter signals are selected to minimize the time frame predictive error signal so that accurate digital codes representative of the speech are generated. Brief Description of the Drawing
FIG. 1 depicts a flow chart showing the general operation of a speech analyzer illustrative of the invention; FIG. 2 shows a block diagram of a circuit adapted to analyze speech patterns in accordance with the flow chart of FIG. 1 that is illustrative of the invention;
FIGS. 3-5 are detailed flow charts illustrating the speech analysis process of FIG. 1; and FIG. 6 shows waveforms illustrating the autocorrelations of speech patterns obtained through the analyzer of FIGS. 1 and 2. Detailed Description
As is well known in the art, linear predictive (LPC) analysis utilizes a pth order all-pole model for the speech signal. For an all-pole spectrum, matching the first p+1 autocorrelations assures that the subsequent correlations also match, and in this way the underlying spectrum of the process is recovered. Where additive white or other noise is present, however, the signal spectrum no longer fits the all-pole model and should be modeled as a sum of an all-pole spectrum and the noise spectrum. Using only the all-pole spectrum results in inaccuracies in the higher order correlations.
According to the invention, the parameters of a pth order all-pole filter are determined such that the sum of the autocorrelations based on the all-pole model and the noise autocorrelations match the autocorrelations of speech contaminated with noise over a large number of lags beyond p. Using speech contaminated with noise, the error for the kth sample lag is

ξk = rk − (r̂k + β·ηk) (1)
" rk " βηk"rk (2) where rk is the autocorrelation function of the speech contaminated with noise at the kth sample lag, r^ is the corresponding autocorrelation based on the all-pole model, nk is the autocorrelation function of the noise signal, and ø is the unknown noise intensity. As aforementioned, the matching of the first ρ+1 autocorrelations result in
where a1,...,ap are the predictor coefficients of the all-pole model. The unknown predictor coefficients ai and the unknown noise intensity factor β are determined by minimizing an extended sum squared error

E = Σ [rk − β·ηk − r̂k]², the sum taken over lags k = p+1,...,p+q (4)
The minimization of the extended sum squared error of Equation (4) requires solving a set of nonlinear equations in p+1 unknowns. Such equations are difficult to formulate and solve using data processing techniques. In accordance with the invention, the first p+1 correlations
of the speech signal and the all-pole filter and noise model are matched exactly while simultaneously the noise factor β is selected to minimize the mismatch at the next correlations. The mismatch is then a function of β which is determined by a one-dimensional search. The optimum linear predictor coefficients characterizing a time frame interval are obtained as solutions of p+1 linear equations
Σ (r(k−i) − β·η(k−i))·a(i) = r(k) − β·η(k), sum over i = 1 to p, 1 ≤ k ≤ p (5)

where r(i) is the ith autocorrelation coefficient of the noisy speech signal. Equation (5) is solved for a number of values of β between zero and the highest expected value of the noise power expressed as a fraction of the speech power. The value of β that minimizes the extended sum squared error in Equation (4) is selected as the optimum value and the linear prediction parameter signals a1, a2, ..., ap are formed.
FIG. 6 illustrates the effect of noise on the speech pattern analysis. Waveform 601 shows the true autocorrelation function for a 20 millisecond time frame interval of speech contaminated by 10 dB of additive white noise. Waveform 603 shows the autocorrelation function obtained from a pth order all-pole model in accordance with the prior art, and waveform 605 illustrates the autocorrelation function obtained from the modified autocorrelation analysis in accordance with the invention. It is readily seen that waveform 605 follows the true autocorrelation function of waveform 601 very closely, while waveform 603, obtained from an all-pole model analysis, deviates significantly from the true autocorrelation function.
A general flow chart illustrating the noisy speech pattern analysis arrangement is shown in FIG. 1, and FIG. 2 depicts a block diagram of a microprocessor circuit adapted to carry out the operations of the flow chart of FIG. 1.
Referring to FIGS. 1 and 2, a speech pattern and
the environmental noise associated therewith are sampled and digitized as indicated in step 101 of FIG. 1. This is accomplished in the circuit of FIG. 2 by receiving the noisy speech pattern at microphone 201 and low pass filtering the speech signal in filter 205. The bandlimited signal from the filter is sampled in analog-to-digital converter 210 at a prescribed rate, and each sample is converted into a digital code corresponding to the magnitude of the sample. Step 105 of FIG. 1 is performed and the digitized speech samples from converter 210 are partitioned into time frame intervals in floating point array processor 220, hereinafter referred to as arithmetic processor 220, under control of control processor 215. Such partitioning may be done on a frame-by-frame basis as the speech signal is received from microphone 201. The time frame interval signal samples are processed successively, and a set of LPC speech parameter signals are produced in arithmetic processor 220 and transferred therefrom to utilization device 260. The utilization device may comprise speech processing equipment such as a speech recognizer, synthesizer or coder, or general purpose data processing equipment such as a mainframe computer or a personal computer. The programmed instructions controlling the operation of the circuit of FIG. 2 in carrying out the operations of the flow chart of FIG. 1, stored in control memory 230, are listed in Fortran language form in the Appendix hereto.
Autocorrelation signals r(i) for the current time frame are generated by methods well known in the art
(step 110) in arithmetic processor 220 according to stored instructions in program memory 230 of FIG. 2 using the frame speech sample signals of store 240 and the window signal in store 235. Windowed speech samples are stored in memory 245 and the r(i) correlation signals are placed in store 250 of FIG. 2. In the frame analysis, N successive digitized speech samples s(1), s(2), ..., s(n), ..., s(N) are
windowed by combining the sample signals with a window function signal w(n) stored in memory 235 of FIG. 2 as is well known in the art. The windowed sample signals

x(n) = s(n)·w(n), 1 ≤ n ≤ N (6)
are produced in arithmetic processor 220 and are stored in memory 245. The autocorrelation signal formation of step 110 in FIG. 1 is shown in greater detail in FIG. 3.
Referring to FIG. 3, the autocorrelation index signal k is initially set to zero (step 301). The autocorrelation signals are then iteratively formed in steps 305, 310 and 315 until the autocorrelation index reaches p+q+1. Each autocorrelation signal is

r(k) = Σ x(n)·x(n−k), summed over n = k,...,N−1 (7)
The r(k) signals are stored in autocorrelation signal store 250 as they are produced in step 305.
The noise contribution level signal is then initially set to a minimum prescribed value as per index K in step 115 of FIG. 1. The noise contribution signal corresponds to the noise autocorrelation patterns expected during a time frame interval. Such noise pattern signals may be fixed as white or colored noise or may be obtained by sampling the particular speech analyzer environment in the absence of speech. The sampling may represent the average noise background or may be obtained in the first several milliseconds of each speech analysis operation prior to the application of speech to microphone 201. The loop including steps 120 to 140 that is operative to form a set of modified autocorrelation signals is then entered. This loop is adapted to form modified autocorrelation signals of the current time frame interval by subtracting the noise contribution signal indexed by K (step 120), form linear prediction parameter signals for the current time frame interval, form the modified autocorrelation signals (step 125), form all-pole model autocorrelation signals and to generate an error signal
corresponding to the match between the all-pole model autocorrelations and modified autocorrelation signals (step 130). Noise contribution index signal K is then incremented (step 135) and the loop is iterated for the predetermined set of noise contribution indices.
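One plausible way to obtain the noise autocorrelation pattern η(m) described above is to correlate a speech-free stretch of the analyzer's environment, such as the first few milliseconds before speech reaches the microphone. The helper below is an invented sketch assuming NumPy, not part of the patent's listing; it is normalized so η(0) = 1, matching the DATA statement in the Appendix.

```python
import numpy as np

def noise_autocorrelation(noise_samples, p=12, q=12):
    """Estimate the noise autocorrelation pattern eta(0..p+q) from a
    speech-free segment of the environment (hypothetical helper)."""
    x = np.asarray(noise_samples, dtype=float)
    n = len(x)
    eta = np.array([np.dot(x[k:], x[:n - k]) for k in range(p + q + 1)])
    return eta / eta[0]   # normalize so eta(0) = 1
```

For roughly white background noise the higher-lag values are small, approximating the fixed white-noise pattern η = (1, 0, 0, ...) that the Appendix uses by default.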
The flow chart of FIG. 4 illustrates the operations of the modified autocorrelation signal formation loop in greater detail. In FIG. 4, a singularity flag IS is initially reset to zero in step 403. If, during the iterations through the loop from step 410 through 430, unacceptable values for predictor coefficients are obtained, the singularity flag is set and no further modified autocorrelation signals are formed. Noise index K is then set to zero in step 405. Index K is incremented by a predetermined amount each iteration through the loop to provide modified autocorrelation signals responsive to different values of noise contribution.
The modified autocorrelation signal loop is then iterated for increasing values of noise index K until K exceeds Kmax corresponding to the maximum noise contribution signal expected. In step 410, a modified autocorrelation signal is generated in accordance with

c(m) = r(m) − β·η(m), 0 ≤ m ≤ p+q (8)

Step 415 is then entered and the linear prediction coefficients resulting from the current modified autocorrelation signals of the time frame interval are generated from
Σ c(m−j)·a(j) = c(m), sum over j = 1 to p, 1 ≤ m ≤ p (9)

Step 415 includes generating signals

s1 = c(m) + Σ a(j)·c(m−j), sum over j = 1 to p (10)

and

s2 = c(0) + Σ a(j)·c(j), sum over j = 1 to p (11)

and forming the signal

a(m) = −s1/s2 (12)
If s2 in Equation (11) is equal to or less than zero, the singularity flag IS is set and the modified autocorrelation forming loop is exited. Otherwise, the LPC coefficients are produced by generating signals b(j) = a(j) for 1 ≤ j ≤ m and using the b(j) signals in
a(j) = b(j) + a(m)·b(m−j) for 1 ≤ j ≤ m−1 (13)

This process is iterated for modified autocorrelation index m = 1 through m = p.
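The recursion of Equations (9)-(13), including the singularity test on s2, can be sketched as follows. This assumes NumPy, and ac_to_pc is an invented stand-in for the ACTOPC subroutine of the Appendix, with a(0) = 1 as the leading coefficient.

```python
import numpy as np

def ac_to_pc(c, p):
    """Convert autocorrelation signals c(0..p) to predictor coefficients
    a(1..p) via Equations (9)-(13); returns (a, singular), where the
    singular flag mirrors IS in the flow chart (set when s2 <= 0)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    for m in range(1, p + 1):
        s1 = c[m] + sum(a[j] * c[m - j] for j in range(1, m))   # Eq. (10)
        s2 = c[0] + sum(a[j] * c[j] for j in range(1, m))       # Eq. (11)
        if s2 <= 0.0:
            return a, True        # singularity: unacceptable coefficients
        rc = -s1 / s2             # Eq. (12): new coefficient a(m)
        b = a.copy()              # b(j) = a(j), as in the text
        for j in range(1, m):
            a[j] = b[j] + rc * b[m - j]                         # Eq. (13)
        a[m] = rc
    return a, False
```

For the autocorrelation sequence of a first-order process, c(k) = 0.5^k, the recursion returns a(1) ≈ −0.5 and a(2) ≈ 0, as expected for a single-pole model.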
At this point in the operation of the circuit of
FIG. 2, the error signal

e(K) = Σ [c(p+i) − Σ c(p+i−j)·a(j)]², outer sum over i = 1 to q, inner sum over j = 1 to p (14)

representing the degree of match between the modified autocorrelation signals of FIG. 4 and the corresponding autocorrelation signals based on the all-pole model for the present index K is formed in processor 220 of FIG. 2. Index K is then incremented in step 425 and the next iteration is started in step 410.
When index Kmax has been exceeded in decision step 430, the index K* corresponding to the minimum matching error signal is determined (step 145) by a search through the correlation signal matching errors obtained in the iterations of loop 450 in FIG. 4. The noise contribution signal for index K* is then used to form LPC speech parameter signals for the current frame (step 150 of
FIG. 1) and transferred to utilization device 260. The formation of the parameter signals corresponding to the noisy speech and the additive noise is shown in greater detail in FIG. 5. Step 501 is entered after the noise index K* has been determined in step 145 of FIG. 1.
In step 501 the noise corrected autocorrelation signals
are generated responsive to the initial autocorrelation signals r(m) and the noise signal β = K*/100. The linear prediction coefficients corresponding to the noise corrected autocorrelation coefficients are then formed from the relationship
Σ c(|m−j|)·a(j) = c(m), sum over j = 1 to p (16)
These noise corrected speech parameters are then transferred to utilization device 260. Control is then passed to step 110 via step 155 and the circuit of FIG. 2 processes the next set of N digitized speech sample signals of store 240.
The invention has been shown and described with reference to a particular embodiment thereof. It is to be understood that changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.
APPENDIX A
SUBROUTINE NLPC (A)
COMMON /STO 235/ W(0:200)
COMMON /STO 240/ S(0:200)
COMMON /STO 245/ X(0:200)
COMMON /STO 250/ R(0:40)
COMMON /STO 255/ C(0:40)
REAL ETA(0:40),A(0:12),E(0:100),T(0:12)
DATA (ETA(I),I=0,0)/1.0/
INTEGER P,Q
P=12
Q=12
N=172
KMAX=100
DO 1 I=0,N-1
1 W(I)=0.54-0.46*COS((2*3.14159265*I)/N)
C+++ STEP 110 - GENERATE AUTOCORRELATION SIGNALS
DO 2 M=0,N-1
2 X(M)=S(M)*W(M)
DO 3 K=0,P+Q
R(K)=0
DO 4 M=K,N-1
4 R(K)=R(K)+X(M)*X(M-K)
3 CONTINUE
R0=R(0)
DO 5 M=0,P+Q
5 R(M)=R(M)/R0
K=0
100 CONTINUE
C+++ STEP 120 - MODIFY AUTOCORRELATION SIGNALS
BETA=0.001*K
C0=R(0)-BETA*ETA(0)
DO 6 M=0,P+Q
6 C(M)=(R(M)-BETA*ETA(M))/C0
C+++ STEP 125 - FORM SPEECH PARAMETER SIGNALS
CALL ACTOPC (C(1),A,P,T,IS)
IF(IS.EQ.1)GOTO 150
C+++ STEP 130 - GENERATE CORRELATION MATCHING ERROR
E(K)=0
DO 7 I=1,Q
SM=0
DO 8 J=1,P
8 SM=SM-C(P+I-J)*A(J)
7 E(K)=E(K)+(C(P+I)-SM)**2
WRITE(IFCW,908)K,BETA,E(K)
908 FORMAT(I5,F8.4,F8.4)
K=K+1
IF(K.LE.KMAX)GOTO 100
150 CONTINUE
C+++ STEP 145 - DETERMINE INDEX OF MINIMUM MATCHING ERROR
KSTAR=0
EMIN=E(0)
DO 9 I=0,K-1
IF(E(I).GE.EMIN)GOTO 9
EMIN=E(I)
KSTAR=I
9 CONTINUE
C+++ STEP 150 - FORM SPEECH PARAMETER SIGNALS
BETA=0.001*KSTAR
C0=R(0)-BETA*ETA(0)
DO 16 M=0,P+Q
16 C(M)=(R(M)-BETA*ETA(M))/C0
CALL ACTOPC (C(1),A,P,T,IS)
RETURN
END

SUBROUTINE ACTOPC (A,X,N,T,M)
DIMENSION A(1),X(1),T(1)
M=0
X(1)=1
X(2)=-A(1)
DO 3 I=2,N
S1=A(I)
S2=1
DO 4 J=1,I-1
S1=S1+A(I-J)*X(J+1)
4 S2=S2+A(J)*X(J+1)
IF(S2.GT.0.0)GOTO 7
M=1
RETURN
7 RC=-S1/S2
T(I)=RC
X(I+1)=RC
DO 1 J=1,(I/2)
TI=X(J+1)
TJ=X(I-J+1)
X(J+1)=TI+RC*TJ
1 X(I-J+1)=TI*RC+TJ
3 CONTINUE
10 M=0
RETURN
END
Claims
1. A method for analyzing speech in a noise prone environment comprising the steps of: partitioning an input speech pattern into successive time frame intervals; forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern; generating noise signals representative of the environment; forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals; generating a signal corresponding to the difference between said first and second autocorrelation signals; and producing a set of signals representative of the current time frame interval speech responsive to said difference corresponding signal and the second autocorrelation signals.
2. A method for analyzing speech in a noise prone environment according to claim 1
CHARACTERIZED IN THAT the noise signal generating and first and second autocorrelation signal forming steps comprise: producing a time frame interval noise signal of prescribed magnitude; forming first and second autocorrelation signals responsive to the produced noise signal of prescribed magnitude; and repeating the noise signal producing step and the first and second autocorrelation signal forming step for prescribed magnitudes over a predetermined range of prescribed magnitudes.
3. A method for analyzing speech in a noise prone environment according to claim 2
CHARACTERIZED IN THAT
generating the difference corresponding signal comprises the steps of: producing a signal representative of the difference between the first autocorrelation signals of the current time frame interval and the second autocorrelation signals of the current time frame interval for each prescribed magnitude noise signal; and selecting the minimum of the difference representative signals.
4. A method for analyzing speech in a noise prone environment according to claim 3
CHARACTERIZED IN THAT producing the set of current time frame interval speech pattern representative signals comprises: generating a set of corrected autocorrelation signals responsive to the minimum difference representative signal; and producing a set of speech parameter signals for the current time frame interval responsive to the corrected autocorrelation signals.
5. A method for analyzing speech in a noise prone environment according to claim 4
CHARACTERIZED IN THAT the speech parameter signals are linear prediction coefficient signals.
6. A method for analyzing speech in a noise prone environment according to claim 5
CHARACTERIZED IN THAT the first autocorrelation signals correspond to all-pole model autocorrelation signals and said second autocorrelation signals correspond to noise signal reduced input speech autocorrelation signals.
7. A method for analyzing speech in a noise prone environment according to claims 1, 2, 3, 4, 5, or 6 CHARACTERIZED IN THAT the speech pattern comprises speech and additive noise.
8. A method for analyzing speech in a noise prone environment according to claims 1, 2, 3, 4, 5, or 6
CHARACTERIZED IN THAT each time frame interval noise signal comprises a set of autocorrelation signals generated responsive to environmental noise preceding start of the speech pattern.
9. A speech analyzer for a noise prone environment
CHARACTERIZED IN THAT the analyzer comprises: means for partitioning an input speech pattern into successive time frame intervals; means for forming a signal representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern; means for generating noise signals representative of the speech environment; means for forming a set of first and second autocorrelation signals of the current time frame interval responsive to said input speech autocorrelation signals and said generated noise signals; means for generating a signal corresponding to the difference between said first autocorrelation signals and said second autocorrelation signals; and means for producing a set of signals representative of the current time frame interval speech pattern responsive to said difference corresponding signal and said first autocorrelation signals.
10. A speech analyzer for a noise prone environment according to claim 9
CHARACTERIZED IN THAT the noise signal generating and first and second autocorrelation signal forming means comprise: means for producing a time frame interval noise signal of prescribed magnitude; means for forming first and second autocorrelation signals responsive to the produced noise signal of
prescribed magnitude; and means for iteratively operating the noise signal producing means and the first and second autocorrelation signal forming means for prescribed magnitudes over a predetermined range of prescribed magnitudes.
11. A speech analyzer for a noise prone environment according to claim 10
CHARACTERIZED IN THAT the difference corresponding signal generating means comprises: means for producing a signal representative of the difference between the first autocorrelation signals of the current time frame interval and the second autocorrelation signals of the current time frame interval for each prescribed magnitude noise signal; and means for selecting the minimum of said difference representative signals.
12. A speech analyzer for a noise prone environment according to claim 11
CHARACTERIZED IN THAT the means for producing the set of current time frame interval speech pattern representative signals comprises: means for generating a set of corrected autocorrelation signals responsive to the minimum difference representative signal; and means for producing a set of speech parameter signals for the current time frame interval responsive to the corrected autocorrelation signals.
13. A speech analyzer for a noise prone environment according to claim 12
CHARACTERIZED IN THAT the speech parameter signals are linear prediction coefficient signals.
14. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12, or 13 CHARACTERIZED IN THAT
the speech pattern comprises speech and additive noise.
15. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12, or 13 CHARACTERIZED IN THAT the first autocorrelation signals correspond to all-pole model autocorrelation signals and said second autocorrelation signals correspond to noise signal reduced input speech autocorrelation signals.
16. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12, or 13 CHARACTERIZED IN THAT the means for generating the time frame interval noise signal comprises means for forming a set of autocorrelation signals responsive to environmental noise preceding start of the speech pattern.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71488885A | 1985-03-22 | 1985-03-22 | |
US714,888 | 1985-03-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1986005619A1 true WO1986005619A1 (en) | 1986-09-25 |
Family
ID=24871862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1985/002255 WO1986005619A1 (en) | 1985-03-22 | 1985-11-14 | Analyzer for speech in noise prone environments |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP0226590A1 (en) |
JP (1) | JPS62502288A (en) |
AU (1) | AU5202086A (en) |
ES (1) | ES8704658A1 (en) |
WO (1) | WO1986005619A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NL8500843A (en) * | 1985-03-22 | 1986-10-16 | Koninkl Philips Electronics Nv | MULTIPULS EXCITATION LINEAR-PREDICTIVE VOICE CODER. |
-
1985
- 1985-11-14 JP JP50526685A patent/JPS62502288A/en active Pending
- 1985-11-14 AU AU52020/86A patent/AU5202086A/en not_active Abandoned
- 1985-11-14 WO PCT/US1985/002255 patent/WO1986005619A1/en not_active Application Discontinuation
- 1985-11-14 EP EP19850906004 patent/EP0226590A1/en not_active Withdrawn
- 1985-11-21 ES ES549155A patent/ES8704658A1/en not_active Expired
Non-Patent Citations (2)
Title |
---|
ICASSP 81, IEEE International Conference on Acoustics, Speech and Signal Processing, March 30 - April 1, 1981 Atlanta, Vol. 3, IEEE (New York, US) C.K. UN et al.: "Improving LPC Analysis of Noisy Speech by Autocorrelation Subtraction Method", pages 1082-1085, see page 1082, right-hand column, lines 1-16 * |
ICASSP 85, IEEE International Conference on Acoustics, Speech and Signal Processing, March 26-29, 1985, Tampa, Vol. 2, IEEE, (New York, US) V.K. JAIN et al.: "Robust LPC Analysis of Speech by Extended Correlation Matching", pages 473-476, see paragraph II: "Suboptimum Solution of Extended Correlation Matching Problem" * |
Also Published As
Publication number | Publication date |
---|---|
EP0226590A1 (en) | 1987-07-01 |
AU5202086A (en) | 1986-10-13 |
ES549155A0 (en) | 1987-04-16 |
JPS62502288A (en) | 1987-09-03 |
ES8704658A1 (en) | 1987-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lim et al. | All-pole modeling of degraded speech | |
EP1309964B1 (en) | Fast frequency-domain pitch estimation | |
CA1301339C (en) | Parallel processing pitch detector | |
US4283601A (en) | Preprocessing method and device for speech recognition device | |
US5179626A (en) | Harmonic speech coding arrangement where a set of parameters for a continuous magnitude spectrum is determined by a speech analyzer and the parameters are used by a synthesizer to determine a spectrum which is used to determine senusoids for synthesis | |
KR960002388B1 (en) | Speech encoding process system and voice synthesizing method | |
US7272551B2 (en) | Computational effectiveness enhancement of frequency domain pitch estimators | |
EP0336658A2 (en) | Vector quantization in a harmonic speech coding arrangement | |
GB1533337A (en) | Speech analysis and synthesis system | |
EP0528324A2 (en) | Auditory model for parametrization of speech | |
EP1511011B1 (en) | Noise reduction for robust speech recognition | |
EP0470245A1 (en) | Method for spectral estimation to improve noise robustness for speech recognition. | |
US5097508A (en) | Digital speech coder having improved long term lag parameter determination | |
CA1061906A (en) | Speech signal fundamental period extractor | |
US4890328A (en) | Voice synthesis utilizing multi-level filter excitation | |
US5884251A (en) | Voice coding and decoding method and device therefor | |
Atal et al. | Linear prediction analysis of speech based on a pole‐zero representation | |
US4922539A (en) | Method of encoding speech signals involving the extraction of speech formant candidates in real time | |
US6456965B1 (en) | Multi-stage pitch and mixed voicing estimation for harmonic speech coders | |
US5007094A (en) | Multipulse excited pole-zero filtering approach for noise reduction | |
EP1199712B1 (en) | Noise reduction method | |
KR100323487B1 (en) | Burst here Linear prediction | |
WO1986005619A1 (en) | Analyzer for speech in noise prone environments | |
US6438517B1 (en) | Multi-stage pitch and mixed voicing estimation for harmonic speech coders | |
KR0128851B1 (en) | Pitch detecting method by spectrum harmonics matching of variable length dual impulse having different polarity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE FR GB IT LU NL SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1985906004 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1985906004 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1985906004 Country of ref document: EP |