EP0226590A1 - Analyzer for speech in noise prone environments - Google Patents

Analyzer for speech in noise prone environments

Info

Publication number
EP0226590A1
Authority
EP
European Patent Office
Prior art keywords
speech
signals
noise
autocorrelation
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19850906004
Other languages
German (de)
French (fr)
Inventor
Bishnu Saroop Atal
Vijay Kumar Jain
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
American Telephone and Telegraph Co Inc
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc, AT&T Corp filed Critical American Telephone and Telegraph Co Inc
Publication of EP0226590A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Noise Elimination (AREA)

Abstract

A speech pattern occurring in a noisy environment is analyzed to form a sequence of representative speech parameter signals. The noise-contaminated speech pattern is partitioned into successive time frame intervals. A predictive error signal is generated for each successive interval responsive to the noise-contaminated speech of the interval and an estimate of the ambient noise. The predictive parameter signals are selected to minimize the predictive error signal of each time frame so that accurate digital codes representative of the speech are produced.

Description

ANALYZER FOR SPEECH IN NOISE PRONE ENVIRONMENTS
Field of the Invention
This invention relates to a method for analyzing speech in a noise prone environment. Background of the Invention
The analysis of speech patterns using linear prediction techniques is well known in the art and is used in speech processing, speech recognition, and speech synthesis. U.S. Patent 3,624,302 discloses arrangements to transform speech signals into linear prediction signals for such uses as speech coding, speech recognition, and speech synthesis. The prediction analysis is based on a pth order all-pole model of the speech signal. In noisy environments such as factories, moving vehicles, or large offices with typewriters and other business equipment, however, the background noise contaminates the speech so that the all-pole model is not accurate. Even in relatively quiet environments, or where noise-canceling microphones are used, noise always present in the background affects the accuracy of speech analysis.
With additive noise, the problem is that the speech signal spectrum is no longer an all-pole spectrum, and the usual methods of estimating the pth order all-pole parameters from the first p+1 correlations of the signal are no longer valid. While such an analysis results in a match of the first p+1 correlations, it does not guarantee the matching of the higher order correlations, and it affects the degree to which speech parameters obtained from the analysis correspond to the speech pattern applied to the analyzer. It is an object of the invention to provide improved speech analysis that mitigates the effects of environmental noise on the generation of speech parameters. The problems are solved in accordance with this invention by a method which comprises the steps of partitioning an input speech pattern into successive time frame intervals, forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern, generating noise signals representative of the environment, forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals, generating a signal corresponding to the difference between said first and second autocorrelation signals, and producing a set of signals representative of the current time frame interval speech responsive to said difference corresponding signal and the second autocorrelation signals. Summary of the Invention
The invention is directed to an arrangement for analyzing noise-contaminated speech in which the speech pattern is partitioned into successive time frame intervals. A predictive error signal is generated for each interval responsive to the noise-contaminated speech of the interval and an estimate of the noise. Predictive parameter signals are selected to minimize the time frame predictive error signal so that accurate digital codes representative of the speech are generated. Brief Description of the Drawing
FIG. 1 depicts a flow chart showing the general operation of a speech analyzer illustrative of the invention FIG. 2 shows a block diagram of a circuit adapted to analyze speech patterns in accordance with the flow chart of FIG. 1 that is illustrative of the invention;
FIGS. 3-5 are detailed flow charts illustrating the speech analysis process of FIG. 1; and FIG. 6 shows waveforms illustrating the autocorrelations of speech patterns obtained through the analyzer of FIGS. 1 and 2. Detailed Description
As is well known in the art, linear predictive (LPC) analysis utilizes a pth order all-pole model for the speech signal. For an all-pole spectrum, matching the first p+1 autocorrelations assures that the subsequent correlations also match; in this way the underlying spectrum of the process is recovered. Where additive white or other noise is present, however, the signal spectrum no longer fits an all-pole spectrum and should be modeled as the sum of an all-pole spectrum and the noise spectrum. Using only the all-pole spectrum results in inaccuracies in the higher order correlations.
According to the invention, the parameters of a pth order all-pole filter are determined such that the sum of the autocorrelations based on the all-pole model and the noise autocorrelations matches the autocorrelations of the speech contaminated with noise over a large number of lags beyond p. Using speech contaminated with noise, the error for the kth sample lag is

ξ(k) = r(k) - (r̂(k) + β*η(k)) (1)

= r(k) - β*η(k) - r̂(k) (2)

where r(k) is the autocorrelation function of the speech contaminated with noise at the kth sample lag, r̂(k) is the corresponding autocorrelation based on the all-pole model, η(k) is the autocorrelation function of the noise signal, and β is the unknown noise intensity. As aforementioned, the matching of the first p+1 autocorrelations results in

Σ (r(k-i) - β*η(k-i))*a(i) = r(k) - β*η(k), 1 ≤ k ≤ p, (3)

where the sum is over i = 1, ..., p and a(1), ..., a(p) are the predictor coefficients of the all-pole model. The unknown predictor coefficients a(i) and the unknown noise intensity factor β are determined by minimizing an extended sum squared error

E(β) = Σ ξ(k)², summed over k = p+1, ..., p+q. (4)

The minimization of the extended sum squared error of Equation (4) requires solving a set of nonlinear equations in p+1 unknowns. Such equations are difficult to formulate and solve using data processing techniques. In accordance with the invention, the first p+1 correlations of the speech signal and the all-pole filter and noise model are matched exactly while simultaneously the noise factor β is selected to minimize the mismatch at the next q correlations. The mismatch is then a function of β which is determined by a one-dimensional search. The optimum linear predictor coefficients characterizing a time frame interval are obtained as solutions of the linear equations

Σ (r(k-i) - β*η(k-i))*a(i) = r(k) - β*η(k), 1 ≤ k ≤ p, (5)

where r(i) is the ith autocorrelation coefficient of the noisy speech signal and the sum is over i = 1, ..., p. Equation (5) is solved for a number of values of β between zero and the highest expected value of the noise power expressed as a fraction of the speech power. The value of β that minimizes the extended sum squared error in Equation (4) is selected as the optimum value and the linear prediction parameter signals a(1), a(2), ..., a(p) are formed.
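The one-dimensional search over β can be illustrated with a short sketch. This is a hypothetical Python rendering rather than the Fortran of the Appendix; the helper names (solve_predictor, extended_error, best_beta), the use of plain Gaussian elimination in place of a Levinson-type recursion, and the toy autocorrelation values in the usage example are all assumptions made for brevity. For each candidate β, the scaled noise autocorrelation is subtracted, the linear equations (5) are solved, and the mismatch at lags p+1 through p+q is evaluated.

```python
def solve_predictor(c, p):
    """Solve sum_j c(|k-j|)*a(j) = c(k), 1 <= k <= p, by naive Gaussian
    elimination (no pivoting; adequate for these small Toeplitz systems)."""
    m = [[c[abs(k - j)] for j in range(1, p + 1)] for k in range(1, p + 1)]
    b = [c[k] for k in range(1, p + 1)]
    for col in range(p):                       # forward elimination
        piv = m[col][col]
        for row in range(col + 1, p):
            f = m[row][col] / piv
            for j in range(col, p):
                m[row][j] -= f * m[col][j]
            b[row] -= f * b[col]
    a = [0.0] * (p + 1)                        # a[1..p] hold the coefficients
    for k in range(p - 1, -1, -1):             # back substitution
        s = b[k] - sum(m[k][j] * a[j + 1] for j in range(k + 1, p))
        a[k + 1] = s / m[k][k]
    return a

def extended_error(c, a, p, q):
    """Mismatch at lags p+1 .. p+q, in the spirit of Equation (4)."""
    return sum((c[p + i] - sum(c[abs(p + i - j)] * a[j]
                               for j in range(1, p + 1))) ** 2
               for i in range(1, q + 1))

def best_beta(r, eta, p, q, betas):
    """Grid search: subtract beta*eta, solve (5), keep the beta that
    minimizes the extended error."""
    errs = []
    for beta in betas:
        c = [r[k] - beta * eta[k] for k in range(p + q + 1)]
        a = solve_predictor(c, p)
        errs.append((extended_error(c, a, p, q), beta))
    return min(errs)[1]
```

With a toy sequence r(k) = 0.5**k plus white noise of intensity 0.2 at lag zero, the search recovers β = 0.2, the value at which the extended error vanishes.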
FIG. 6 illustrates the effect of noise on the speech pattern analysis. Waveform 601 shows the true autocorrelation function for a 20 millisecond time frame interval of speech contaminated by 10 dB of additive white noise. Waveform 603 shows the autocorrelation function obtained from a pth order all-pole model in accordance with the prior art, and waveform 605 illustrates the autocorrelation function obtained from the modified autocorrelation analysis in accordance with the invention. It is readily seen that waveform 605 follows the true autocorrelation function of waveform 601 very closely, while waveform 603 obtained from an all-pole model analysis deviates significantly from the true autocorrelation function.
A general flow chart illustrating the noisy speech pattern analysis arrangement is shown in FIG. 1, and FIG. 2 depicts a block diagram of a microprocessor circuit adapted to carry out the operations of the flow chart of FIG. 1.
Referring to FIGS. 1 and 2, a speech pattern and the environmental noise associated therewith are sampled and digitized as indicated in step 101 of FIG. 1. This is accomplished in the circuit of FIG. 2 by receiving the noisy speech pattern at microphone 201 and low pass filtering the speech signal in filter 205. The bandlimited signal from the filter is sampled in analog-to-digital converter 210 at a prescribed rate, and each sample is converted into a digital code corresponding to the magnitude of the sample. Step 105 of FIG. 1 is performed and the digitized speech samples from converter 210 are partitioned into time frame intervals in floating point array processor 220, hereinafter referred to as arithmetic processor 220, under control of control processor 215. Such partitioning may be done on a frame-by-frame basis as the speech signal is received from microphone 201. The time frame interval signal samples are processed successively, and a set of LPC speech parameter signals are produced in arithmetic processor 220 and transferred therefrom to utilization device 260. The utilization device may comprise speech processing equipment such as a speech recognizer, synthesizer or coder, or general purpose data processing equipment such as a mainframe computer or a personal computer. The programmed instructions controlling the operation of the circuit of FIG. 2 in carrying out the operations of the flow chart of FIG. 1, stored in control memory 230, are listed in Fortran language form in the Appendix hereto.
Autocorrelation signals r(i) for the current time frame are generated by methods well known in the art
(step 110) in arithmetic processor 220 according to stored instructions in program memory 230 of FIG. 2 using the frame speech sample signals of store 240 and the window signal in store 235. Windowed speech samples are stored in memory 245 and the r(i) correlation signals are placed in store 250 of FIG. 2. In the frame analysis, N successive digitized speech samples s(1), s(2), ..., s(n), ..., s(N) are windowed by combining the sample signals with a window function signal w(n) stored in memory 235 of FIG. 2 as is well known in the art. The windowed sample signals

x(n) = s(n)*w(n), 1 ≤ n ≤ N, (6)
are produced in arithmetic processor 220 and are stored in memory 245. The autocorrelation signal formation of step 110 in FIG. 1 is shown in greater detail in FIG. 3.
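The window operation of step 110 and Equation (6) can be sketched as follows. The 0.54/0.46 Hamming coefficients match the window generated at the start of the Appendix; the function names themselves are illustrative.

```python
import math

def hamming_window(N):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/N), n = 0 .. N-1, as in the Appendix.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

def window_samples(s):
    # Equation (6): x(n) = s(n) * w(n) over the N samples of the frame.
    w = hamming_window(len(s))
    return [sn * wn for sn, wn in zip(s, w)]
```

The window is small at the frame edges (w(0) = 0.08) and peaks mid-frame, which reduces the spectral leakage that abrupt frame boundaries would otherwise introduce into the autocorrelation estimates.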
Referring to FIG. 3, the autocorrelation index signal k is initially set to zero (step 301). The autocorrelation signals are then iteratively formed in steps 305, 310 and 315 until the autocorrelation index reaches p+q+1. Each autocorrelation signal is

r(k) = Σ x(n)*x(n-k), summed over n = k, ..., N-1. (7)
The r(k) signals are stored in autocorrelation signal store 250 as they are produced in step 305.
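The loop of steps 301 through 315 amounts to the following sketch (an illustrative Python rendering of Equation (7), with the summation limits used in the Appendix, n = k through N-1; the function name is an assumption):

```python
def autocorrelation(x, max_lag):
    # Equation (7): r(k) = sum over n = k .. N-1 of x(n)*x(n-k),
    # computed for each lag k = 0 .. max_lag (p+q in the patent).
    N = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, N))
            for k in range(max_lag + 1)]
```

For instance, autocorrelation([1.0, 2.0, 3.0], 2) gives [14.0, 8.0, 3.0].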
The noise contribution level signal is then initially set to a minimum prescribed value as per index K in step 115 of FIG. 1. The noise contribution signal corresponds to the noise autocorrelation patterns expected during a time frame interval. Such noise pattern signals may be fixed as white or colored noise or may be obtained by sampling the particular speech analyzer environment in the absence of speech. The sampling may represent the average noise background or may be obtained in the first several milliseconds of each speech analysis operation prior to the application of speech to microphone 201. The loop including steps 120 to 140 that is operative to form a set of modified autocorrelation signals is then entered. This loop is adapted to form modified autocorrelation signals of the current time frame interval by subtracting the noise contribution signal indexed by K (step 120), form linear prediction parameter signals for the current time frame interval, form the modified autocorrelation signals (step 125), form all-pole model autocorrelation signals, and generate an error signal corresponding to the match between the all-pole model autocorrelations and the modified autocorrelation signals (step 130). Noise contribution index signal K is then incremented (step 135) and the loop is iterated for the predetermined set of noise contribution indices.
The flow chart of FIG. 4 illustrates the operations of the modified autocorrelation signal formation loop in greater detail. In FIG. 4, a singularity flag IS is initially reset to zero in step 403. If during the iterations through the loop from step 410 through 430 unacceptable values for predictor coefficients are obtained, the singularity flag is set and no further modified autocorrelation signals are formed. Noise index K is then set to zero in step 405. Index K is incremented by a predetermined amount each iteration through the loop to provide modified autocorrelation signals responsive to different values of noise contribution.
The modified autocorrelation signal loop is then iterated for increasing values of noise index K until K exceeds Kmax corresponding to the maximum noise contribution signal expected. In step 410, a modified autocorrelation signal is generated in accordance with

c(m) = r(m) - β*η(m), 0 ≤ m ≤ p+q. (8)

Step 415 is then entered and the linear prediction coefficients resulting from the current modified autocorrelation signals of the time frame interval are generated from

Σ c(m-j)*a(j) = c(m), 1 ≤ m ≤ p, (9)

where the sum is over j = 1, ..., p. Step 415 includes generating the signals

s1 = c(m) + Σ a(j)*c(m-j), summed over j = 1, ..., m-1, (10)

and

s2 = c(0) + Σ a(j)*c(j), summed over j = 1, ..., m-1, (11)

and forming the signal

a(m) = -s1/s2. (12)

If s2 in Equation (11) is equal to or less than zero, the singularity flag IS is set and the modified autocorrelation forming loop is exited. Otherwise, the LPC coefficients are produced by generating signals b(j) = a(j) for 1 ≤ j ≤ m-1 and using the b(j) signals in

a(j) = b(j) + a(m)*b(m-j) for 1 ≤ j ≤ m-1. (13)

This process is iterated for modified autocorrelation index m = 1 through m = p.
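The recursion of Equations (10) through (13), carried out by the ACTOPC routine in the Appendix, can be sketched in Python as follows. This is a hypothetical transcription for illustration, not the patent's Fortran; the summation limits follow the loop bounds of ACTOPC, and in this sign convention the coefficients satisfy c(m) + Σ a(j)*c(m-j) ≈ 0.

```python
def actopc(c, p):
    """Levinson-type recursion of Equations (10)-(13): convert modified
    autocorrelations c(0..p) into predictor coefficients a(1..p).
    Returns (a, singular); `singular` mirrors the IS flag, set when
    s2 becomes non-positive."""
    a = [0.0] * (p + 1)
    a[1] = -c[1] / c[0]                    # first-order solution
    for m in range(2, p + 1):
        # Equations (10) and (11), sums over j = 1 .. m-1.
        s1 = c[m] + sum(a[j] * c[m - j] for j in range(1, m))
        s2 = c[0] + sum(a[j] * c[j] for j in range(1, m))
        if s2 <= 0.0:                      # singularity: set IS and exit
            return a, True
        rc = -s1 / s2                      # Equation (12)
        b = a[:]                           # b(j) = a(j)
        for j in range(1, m):
            a[j] = b[j] + rc * b[m - j]    # Equation (13)
        a[m] = rc
    return a, False
```

For the autocorrelation sequence c(k) = 0.5**k of a first-order all-pole signal, the recursion yields a(1) = -0.5 with all higher coefficients zero, while a sequence driving s2 to zero or below raises the singularity flag instead of producing coefficients.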
At this point in the operation of the circuit of FIG. 2, the error signal

e(K) = Σ [c(p+i) - Σ c(p+i-j)*a(j)]², with the outer sum over i = 1, ..., q and the inner sum over j = 1, ..., p, (14)

representing the degree of match between the modified autocorrelation signals of FIG. 4 and the corresponding autocorrelation signals based on the all-pole model for the present index K, is formed in processor 220 of FIG. 2. Index K is then incremented in step 425 and the next iteration is started in step 410.
When index Kmax has been exceeded in decision step 430, the index K* corresponding to the minimum matching error signal is determined (step 145) by a search through the correlation signal matching errors obtained in the iterations of loop 450 in FIG. 4. The noise contribution signal for index K* is then used to form LPC speech parameter signals for the current frame (step 150 of FIG. 1) and transferred to utilization device 260. The formation of the parameter signals corresponding to the noisy speech and the additive noise is shown in greater detail in FIG. 5. Step 501 is entered after the noise index K* has been determined in step 145 of FIG. 1.
In step 501 the noise corrected autocorrelation signals are generated responsive to the initial autocorrelation signals r(m) and the noise signal β = K*/100. The linear prediction coefficients corresponding to the noise corrected autocorrelation coefficients are then formed from the relationship

Σ c(|m-j|)*a(j) = c(m), 1 ≤ m ≤ p, (16)

where the sum is over j = 1, ..., p.
These noise corrected speech parameters are then transferred to utilization device 260. Control is then passed to step 110 via step 155 and the circuit of FIG. 2 processes the next set of N digitized speech sample signals of store 240.
The invention has been shown and described with reference to a particular embodiment thereof. It is to be understood that changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.
APPENDIX A
      SUBROUTINE NLPC (A)
      COMMON /STO235/ W(0:200)
      COMMON /STO240/ S(0:200)
      COMMON /STO245/ X(0:200)
      COMMON /STO250/ R(0:40)
      COMMON /STO255/ C(0:40)
      REAL ETA(0:40),A(0:12),E(0:100),T(0:12)
      DATA (ETA(I),I=0,0)/1.0/
      INTEGER P,Q
      P=12
      Q=12
      N=172
      KMAX=100
      DO 1 I=0,N-1
    1 W(I)=0.54-0.46*COS((2*3.14159265*I)/N)
C+++  STEP 110 - GENERATE AUTOCORRELATION SIGNALS
      DO 2 M=0,N-1
    2 X(M)=S(M)*W(M)
      DO 3 K=0,P+Q
      R(K)=0
      DO 4 M=K,N-1
    4 R(K)=R(K)+X(M)*X(M-K)
    3 CONTINUE
      R0=R(0)
      DO 5 M=0,P+Q
    5 R(M)=R(M)/R0
      K=0
  100 CONTINUE
C+++  STEP 120 - MODIFY AUTOCORRELATION SIGNALS
      BETA=0.001*K
      C0=R(0)-BETA*ETA(0)
      DO 6 M=0,P+Q
    6 C(M)=(R(M)-BETA*ETA(M))/C0
C+++  STEP 125 - FORM SPEECH PARAMETER SIGNALS
      CALL ACTOPC (C(1),A,P,T,IS)
      IF(IS.EQ.1)GOTO 150
C+++  STEP 130 - GENERATE CORRELATION MATCHING ERROR
      E(K)=0
      DO 7 I=1,Q
      SM=0
      DO 8 J=1,P
    8 SM=SM-C(P+I-J)*A(J)
    7 E(K)=E(K)+(C(P+I)-SM)**2
      WRITE(IFCW,908)K,BETA,E(K)
  908 FORMAT(I5,F8.4,F8.4)
      K=K+1
      IF(K.LE.KMAX)GOTO 100
  150 CONTINUE
C+++  STEP 145 - DETERMINE INDEX OF MINIMUM MATCHING ERROR
      KSTAR=0
      EMIN=E(0)
      DO 9 I=0,K-1
      IF(E(I).GE.EMIN)GOTO 9
      EMIN=E(I)
      KSTAR=I
    9 CONTINUE
C+++  STEP 150 - FORM SPEECH PARAMETER SIGNALS
      BETA=0.001*KSTAR
      C0=R(0)-BETA*ETA(0)
      DO 16 M=0,P+Q
   16 C(M)=(R(M)-BETA*ETA(M))/C0
      CALL ACTOPC (C(1),A,P,T,IS)
      RETURN
      END

      SUBROUTINE ACTOPC (A,X,N,T,M)
      DIMENSION A(1),X(1),T(1)
      M=0
      X(1)=1
      X(2)=-A(1)
      DO 3 I=2,N
      S1=A(I)
      S2=1
      DO 4 J=1,I-1
      S1=S1+A(I-J)*X(J+1)
    4 S2=S2+A(J)*X(J+1)
      IF(S2.GT.0.0)GOTO 7
      M=1
      RETURN
    7 RC=-S1/S2
      T(I)=RC
      X(I+1)=RC
      DO 1 J=1,(I/2)
    5 TI=X(J+1)
      TJ=X(I-J+1)
      X(J+1)=TI+RC*TJ
    1 X(I-J+1)=TI*RC+TJ
    3 CONTINUE
   10 M=0
      RETURN
      END
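The β-sweep of subroutine NLPC above can be summarized in Python. This is a sketch rather than a line-for-line port: the names `lpc_from_autocorr` and `find_kstar` are our own, `lpc_from_autocorr` plays the role of ACTOPC (returning None where ACTOPC sets the error flag IS=1), and the matching error uses the standard all-pole extended-correlation prediction in the spirit of step 130 rather than the exact SM indexing of the FORTRAN.

```python
import numpy as np

def lpc_from_autocorr(c, p):
    # Durbin recursion on autocorrelations c(0..p); returns None when a
    # stage is unstable, mirroring ACTOPC's IS=1 escape.
    a = np.zeros(p + 1)
    err = c[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            return None
        k = (c[i] - np.dot(a[1:i], c[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]

def find_kstar(s, eta, p=12, q=12, kmax=100):
    # Sweep beta = 0.001*K (steps 120-145) and return the index K* that
    # minimizes the correlation matching error over the extended lags.
    n = len(s)
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * np.arange(n) / n)   # Hamming window
    x = np.asarray(s, float) * w
    # Step 110: autocorrelation lags 0..p+q, normalized by r(0).
    r = np.array([np.dot(x[m:], x[:n - m]) for m in range(p + q + 1)])
    r = r / r[0]
    kstar, emin = 0, np.inf
    for k in range(kmax + 1):
        beta = 0.001 * k                                 # candidate noise level
        c = (r - beta * eta) / (r[0] - beta * eta[0])    # step 120
        a = lpc_from_autocorr(c, p)                      # step 125
        if a is None:                                    # unstable: skip beta
            continue
        # Step 130 (in spirit): predicted c(p+i) from the all-pole relation.
        pred = [np.dot(a, c[p + i - 1::-1][:p]) for i in range(1, q + 1)]
        e = float(np.sum((c[p + 1:p + q + 1] - pred) ** 2))
        if e < emin:                                     # step 145
            kstar, emin = k, e
    return kstar
```

With a white-noise environment, `eta` is simply a unit impulse (all energy at lag zero), which is the case the DATA statement for ETA in NLPC initializes.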

Claims

1. A method for analyzing speech in a noise prone environment comprising the steps of: partitioning an input speech pattern into successive time frame intervals; forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern; generating noise signals representative of the environment; forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals; generating a signal corresponding to the difference between said first and second autocorrelation signals; and producing a set of signals representative of the current time frame interval speech responsive to said difference corresponding signal and the second autocorrelation signals.
2. A method for analyzing speech in a noise prone environment according to claim 1
CHARACTERIZED IN THAT the noise signal generating and first and second autocorrelation signal forming steps comprise: producing a time frame interval noise signal of prescribed magnitude; forming first and second autocorrelation signals responsive to the produced noise signal of prescribed magnitude; and repeating the noise signal producing step and the first and second autocorrelation signal forming step for prescribed magnitudes over a predetermined range of prescribed magnitudes.
3. A method for analyzing speech in a noise prone environment according to claim 2
CHARACTERIZED IN THAT generating the difference corresponding signal comprises the steps of: producing a signal representative of the difference between the first autocorrelation signals of the current time frame interval and the second autocorrelation signals of the current time frame interval for each prescribed magnitude noise signal; and selecting the minimum of the difference representative signals.
4. A method for analyzing speech in a noise prone environment according to claim 3
CHARACTERIZED IN THAT producing the set of current time frame interval speech pattern representative signals comprises: generating a set of corrected autocorrelation signals responsive to the minimum difference representative signal; and producing a set of speech parameter signals for the current time frame interval responsive to the corrected autocorrelation signals.
5. A method for analyzing speech in a noise prone environment according to claim 4
CHARACTERIZED IN THAT the speech parameter signals are linear prediction coefficient signals.
6. A method for analyzing speech in a noise prone environment according to claim 5
CHARACTERIZED IN THAT the first autocorrelation signals correspond to all-pole model autocorrelation signals and said second autocorrelation signals correspond to noise signal reduced input speech autocorrelation signals.
7. A method for analyzing speech in a noise prone environment according to claims 1, 2, 3, 4, 5 or 6 CHARACTERIZED IN THAT the speech pattern comprises speech and additive noise.
8. A method for analyzing speech in a noise prone environment according to claims 1, 2, 3, 4, 5, or 6
CHARACTERIZED IN THAT each time frame interval noise signal comprises a set of autocorrelation signals generated responsive to environmental noise preceding start of the speech pattern.
9. A speech analyzer for a noise prone environment
CHARACTERIZED IN THAT the analyzer comprises: means for partitioning an input speech pattern into successive time frame intervals; means for forming a signal representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern; means for generating noise signals representative of the speech environment; means for forming a set of first and second autocorrelation signals of the current time frame interval responsive to said input speech autocorrelation signals and said generated noise signals; means for generating a signal corresponding to the difference between said first autocorrelation signals and said second autocorrelation signals; and means for producing a set of signals representative of the current time frame interval speech pattern responsive to said difference corresponding signal and said first autocorrelation signals.
10. A speech analyzer for a noise prone environment according to claim 9
CHARACTERIZED IN THAT the noise signal generating and first and second autocorrelation signal forming means comprise: means for producing a time frame interval noise signal of prescribed magnitude; means for forming first and second autocorrelation signals responsive to the produced noise signal of prescribed magnitude; and means for iteratively operating the noise signal producing means and the first and second autocorrelation signal forming means for prescribed magnitudes over a predetermined range of prescribed magnitudes.
11. A speech analyzer for a noise prone environment according to claim 10
CHARACTERIZED IN THAT the difference corresponding signal generating means comprises: means for producing a signal representative of the difference between the first autocorrelation signals of the current time frame interval and the second autocorrelation signals of the current time frame interval for each prescribed magnitude noise signal; and means for selecting the minimum of said difference representative signals.
12. A speech analyzer for a noise prone environment according to claim 11 CHARACTERIZED IN THAT the means for producing the set of current time frame interval speech pattern representative signals comprises: means for generating a set of corrected autocorrelation signals responsive to the minimum difference representative signal; and means for producing a set of speech parameter signals for the current time frame interval responsive to the corrected autocorrelation signals.
13. A speech analyzer for a noise prone environment according to claim 12 CHARACTERIZED IN THAT the speech parameter signals are linear prediction coefficient signals.
14. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12, or 13 CHARACTERIZED IN THAT the speech pattern comprises speech and additive noise.
15. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12, or 13 CHARACTERIZED IN THAT the first autocorrelation signals correspond to all-pole model autocorrelation signals and said second autocorrelation signals correspond to noise signal reduced input speech autocorrelation signals.
16. A speech analyzer for a noise prone environment according to claims 9, 10, 11, 12 or 13 CHARACTERIZED IN THAT the means for generating the time frame interval noise signal comprises means for forming a set of autocorrelation signals responsive to environmental noise preceding start of the speech pattern.
EP19850906004 1985-03-22 1985-11-14 Analyzer for speech in noise prone environments Withdrawn EP0226590A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US71488885A 1985-03-22 1985-03-22
US714888 1985-03-22

Publications (1)

Publication Number Publication Date
EP0226590A1 true EP0226590A1 (en) 1987-07-01

Family

ID=24871862

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19850906004 Withdrawn EP0226590A1 (en) 1985-03-22 1985-11-14 Analyzer for speech in noise prone environments

Country Status (5)

Country Link
EP (1) EP0226590A1 (en)
JP (1) JPS62502288A (en)
AU (1) AU5202086A (en)
ES (1) ES8704658A1 (en)
WO (1) WO1986005619A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8500843A (en) * 1985-03-22 1986-10-16 Koninkl Philips Electronics Nv MULTIPULS EXCITATION LINEAR-PREDICTIVE VOICE CODER.

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO8605619A1 *

Also Published As

Publication number Publication date
ES8704658A1 (en) 1987-04-16
ES549155A0 (en) 1987-04-16
WO1986005619A1 (en) 1986-09-25
JPS62502288A (en) 1987-09-03
AU5202086A (en) 1986-10-13

Similar Documents

Publication Publication Date Title
EP1309964B1 (en) Fast frequency-domain pitch estimation
Lim et al. All-pole modeling of degraded speech
US5179626A (en) Harmonic speech coding arrangement where a set of parameters for a continuous magnitude spectrum is determined by a speech analyzer and the parameters are used by a synthesizer to determine a spectrum which is used to determine senusoids for synthesis
US5450522A (en) Auditory model for parametrization of speech
US7272551B2 (en) Computational effectiveness enhancement of frequency domain pitch estimators
US5305421A (en) Low bit rate speech coding system and compression
CA1123955A (en) Speech analysis and synthesis apparatus
KR960002388B1 (en) Speech encoding process system and voice synthesizing method
US5023910A (en) Vector quantization in a harmonic speech coding arrangement
US4283601A (en) Preprocessing method and device for speech recognition device
US5459815A (en) Speech recognition method using time-frequency masking mechanism
GB1533337A (en) Speech analysis and synthesis system
EP0235181A1 (en) A parallel processing pitch detector.
EP0470245A1 (en) Method for spectral estimation to improve noise robustness for speech recognition.
US4081605A (en) Speech signal fundamental period extractor
US5884251A (en) Voice coding and decoding method and device therefor
Atal et al. Linear prediction analysis of speech based on a pole‐zero representation
US4922539A (en) Method of encoding speech signals involving the extraction of speech formant candidates in real time
US4890328A (en) Voice synthesis utilizing multi-level filter excitation
US5007094A (en) Multipulse excited pole-zero filtering approach for noise reduction
US6912496B1 (en) Preprocessing modules for quality enhancement of MBE coders and decoders for signals having transmission path characteristics
AU2394895A (en) A multi-pulse analysis speech processing system and method
US7043424B2 (en) Pitch mark determination using a fundamental frequency based adaptable filter
EP0226590A1 (en) Analyzer for speech in noise prone environments
Srivastava Fundamentals of linear prediction

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19870302

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): BE DE FR GB IT NL SE

17Q First examination report despatched

Effective date: 19881031

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19890311

RIN1 Information on inventor provided before grant (corrected)

Inventor name: JAIN, VIJAY, KUMAR

Inventor name: ATAL, BISHNU, SAROOP