EP0226590A1 - Analyzer for speech in noise prone environments - Google Patents
- Publication number
- EP0226590A1 (application EP85906004A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- signals
- noise
- autocorrelation
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
Definitions
- This invention relates to a method for analyzing speech in a noise prone environment.
- The problem is that the speech signal spectrum is no longer an all-pole spectrum, so the usual methods of estimating the pth order all-pole parameters from the first p+1 correlations of the signal are no longer valid. While such an analysis matches the first p+1 correlations, it does not guarantee the matching of the higher order correlations, and it degrades the degree to which the speech parameters obtained from the analysis correspond to the speech pattern applied to the analyzer.
- The problems are solved in accordance with this invention by a method which comprises the steps of: partitioning an input speech pattern into successive time frame intervals; forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern; generating noise signals representative of the environment; forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals; generating a signal corresponding to the difference between said first and second autocorrelation signals; and producing a set of signals representative of the current time frame interval speech responsive to said difference-corresponding signal and the second autocorrelation signals.
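The claimed steps lend themselves to a compact sketch. The following Python fragment is illustrative only (the patent's implementation is in Fortran, per the Appendix; the frame length, model order, and the noise-intensity factor `alpha` used here are assumed values): it partitions a sampled pattern into successive time frame intervals, forms per-frame autocorrelation signals, and generates the difference-corresponding signal of the final steps.

```python
import numpy as np

def frame_autocorrelations(pattern, frame_len, p):
    """Partition a sampled pattern into successive frames and form the
    first p+1 autocorrelation lags of each frame."""
    n_frames = len(pattern) // frame_len
    frames = pattern[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.array([[np.dot(f[k:], f[:frame_len - k]) for k in range(p + 1)]
                     for f in frames])

# Illustrative data: one noisy-speech buffer and one noise-only buffer.
rng = np.random.default_rng(0)
speech = rng.standard_normal(320)                      # stand-in samples
noise = rng.standard_normal(320)
r_speech = frame_autocorrelations(speech, 160, 10)[0]  # first autocorrelation signal
r_noise = frame_autocorrelations(noise, 160, 10)[0]    # noise autocorrelation signal
alpha = 0.1                                            # assumed noise-intensity factor
r_corrected = r_speech - alpha * r_noise               # difference-corresponding signal
```

The corrected lags `r_corrected` then drive the production of the frame's speech parameter signals.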
- The invention is directed to an arrangement for analyzing noise-contaminated speech in which the speech pattern is partitioned into successive time frame intervals.
- A predictive error signal is generated for each interval responsive to the noise-contaminated speech of the interval and an estimate of the noise.
- Predictive parameter signals are selected to minimize the time frame predictive error signal so that accurate digital codes representative of the speech are generated.
- FIG. 1 depicts a flow chart showing the general operation of a speech analyzer illustrative of the invention
- FIG. 2 shows a block diagram of a circuit adapted to analyze speech patterns in accordance with the flow chart of FIG. 1 that is illustrative of the invention
- FIGS. 3-5 are detailed flow charts illustrating the speech analysis process of FIG. 1; and FIG. 6 shows waveforms illustrating the autocorrelations of speech patterns obtained through the analyzer of FIGS. 1 and 2.
Detailed Description
- LPC linear predictive analysis
- The parameters of a pth order all-pole filter are determined such that the sum of the autocorrelations based on the all-pole model and the noise autocorrelations matches the autocorrelations of speech contaminated with noise over a large number of lags beyond p.
- r_k is the autocorrelation function of the speech contaminated with noise at the kth sample lag
- r̂_k is the corresponding autocorrelation based on the all-pole model
- n_k is the autocorrelation function of the noise signal
- α is the unknown noise intensity
- Equation (4) requires solving a set of nonlinear equations in p+1 unknowns. Such equations are difficult to formulate and solve using data processing techniques.
- The first p+1 correlations of the speech signal are matched exactly by the all-pole filter and noise model, while the noise factor α is simultaneously selected to minimize the mismatch at the next correlations.
- The mismatch is then a function of α, which is determined by a one-dimensional search.
- The optimum linear predictor coefficients characterizing a time frame interval are obtained as solutions of p+1 linear equations.
- Equation (5) is solved for a number of values of α between zero and the highest expected value of the noise power expressed as a fraction of the speech power.
- The value of α that minimizes the extended sum squared error in Equation (4) is selected as the optimum value, and the linear prediction parameter signals a1, a2, ..., ap are formed.
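The matching-and-search procedure described above can be sketched as follows. This is an illustrative reconstruction, not the patent's Appendix code: the grid of candidate α values, the function names, and the use of a direct linear solve in place of a recursion are all assumptions made here.

```python
import numpy as np

def lpc_from_autocorr(r, p):
    """Solve the p x p Toeplitz normal equations R a = [r(1)..r(p)]."""
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:p + 1])

def model_autocorr(a, r, p, lags):
    """Extend the autocorrelation beyond lag p with the all-pole recursion
    r_hat(k) = sum_i a_i * r_hat(k - i)."""
    r_hat = list(r[:p + 1])
    for k in range(p + 1, lags + 1):
        r_hat.append(sum(a[i] * r_hat[k - 1 - i] for i in range(p)))
    return np.array(r_hat)

def search_alpha(r_noisy, r_noise, p, lags, alphas):
    """One-dimensional search: for each candidate alpha, match the first p+1
    noise-corrected lags exactly, then score the residual mismatch
    r(k) - r_hat(k) - alpha*n(k) at the higher lags; keep the minimizer."""
    best_err, best_alpha, best_a = np.inf, None, None
    for alpha in alphas:
        r = r_noisy - alpha * r_noise              # noise-corrected lags
        try:
            a = lpc_from_autocorr(r, p)
        except np.linalg.LinAlgError:              # cf. the singularity flag of FIG. 4
            continue
        r_hat = model_autocorr(a, r, p, lags)
        err = np.sum((r_noisy[p + 1:lags + 1]
                      - r_hat[p + 1:]
                      - alpha * r_noise[p + 1:lags + 1]) ** 2)
        if err < best_err:
            best_err, best_alpha, best_a = err, alpha, a
    return best_err, best_alpha, best_a
```

For each candidate α the first p+1 noise-corrected correlations are matched exactly, the extended mismatch at the lags beyond p plays the role of the error of Equation (4), and the minimizing α is retained along with its prediction parameter signals.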
- FIG. 6 illustrates the effect of noise on the speech pattern analysis.
- Waveform 601 shows the true autocorrelation function for a 20 millisecond time frame interval of speech contaminated by 10 dB of additive white noise.
- Waveform 603 shows the autocorrelation function obtained from a pth order all-pole model in accordance with the prior art.
- Waveform 605 illustrates the autocorrelation function obtained from the modified autocorrelation analysis in accordance with the invention. It is readily seen that waveform 605 follows the true autocorrelation function of waveform 601 very closely, while waveform 603, obtained from an all-pole model analysis, deviates significantly from the true autocorrelation function.
- A general flow chart illustrating the noisy speech pattern analysis arrangement is shown in FIG. 1, and FIG. 2 depicts a block diagram of a microprocessor circuit adapted to carry out the operations of the flow chart of FIG. 1.
- A speech pattern and the environmental noise associated therewith are sampled and digitized as indicated in step 101 of FIG. 1. This is accomplished in the circuit of FIG. 2 by receiving the noisy speech pattern at microphone 201 and low-pass filtering the speech signal in filter 205.
- The bandlimited signal from the filter is sampled in analog-to-digital converter 210 at a prescribed rate, and each sample is converted into a digital code corresponding to the magnitude of the sample.
- Step 105 of FIG. 1 is performed, and the digitized speech samples from converter 210 are partitioned into time frame intervals in floating point array processor 220, hereinafter referred to as arithmetic processor 220, under control of control processor 215.
- Such partitioning may be done on a frame-by-frame basis as the speech signal is received from microphone 201.
- The time frame interval signal samples are processed successively, and a set of LPC speech parameter signals is produced in arithmetic processor 220 and transferred therefrom to utilization device 260.
- The utilization device may comprise speech processing equipment, such as a speech recognizer, synthesizer or coder, or general purpose data processing equipment, such as a mainframe computer or a personal computer.
- The programmed instructions in control memory 230 controlling the operation of the circuit of FIG. 2 in carrying out the operations of the flow chart of FIG. 1 are listed in Fortran language form in the Appendix hereto.
- The autocorrelation signals are formed in step 110 in arithmetic processor 220, according to instructions stored in program memory 230 of FIG. 2, using the frame speech sample signals of store 240 and the window signal in store 235.
- Windowed speech samples are stored in memory 245, and the r(i) correlation signals are placed in store 250 of FIG. 2.
- N successive digitized speech samples s(1), s(2), ..., s(n), ..., s(N) are windowed by combining the sample signals with a window function signal w(n) stored in memory 235 of FIG. 2, as is well known in the art.
- The windowed sample signals are
  x(n) = s(n)·w(n), 1 ≤ n ≤ N  (6)
- Step 110 in FIG. 1 is shown in greater detail in FIG. 3.
- The autocorrelation index signal k is initially set to zero (step 301).
- The autocorrelation signals are then iteratively formed in steps 305 and 310; each autocorrelation signal is
  r(k) = Σ_n x(n)·x(n−k)  (7)
- The r(k) signals are stored in autocorrelation signal store 250 as they are produced in step 305.
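Steps 301-310 amount to windowing per Equation (6) followed by the lag products of Equation (7). A minimal sketch follows; a Hamming window is assumed here, since the actual window signal held in store 235 is not specified in this excerpt.

```python
import numpy as np

def windowed_autocorr(s, p):
    """Window the frame samples (Eq. 6) and form lags r(0)..r(p) (Eq. 7)."""
    x = s * np.hamming(len(s))                       # x(n) = s(n) * w(n)
    return np.array([np.dot(x[k:], x[:len(x) - k])   # r(k) = sum_n x(n) x(n-k)
                     for k in range(p + 1)])
```

By the Cauchy-Schwarz inequality the lag-zero term r(0) bounds every other lag, which is why r(0) serves as the frame energy reference in the later normalization steps.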
- The noise contribution level signal is then initially set to a minimum prescribed value, as per index K in step 115 of FIG. 1.
- The noise contribution signal corresponds to the noise autocorrelation patterns expected during a time frame interval.
- Such noise pattern signals may be fixed as white or colored noise or may be obtained by sampling the particular speech analyzer environment in the absence of speech. The sampling may represent the average noise background or may be obtained in the first several milliseconds of each speech analysis operation prior to the application of speech to microphone 201.
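One way to realize the silence-sampling alternative described above is sketched below. This is an assumed estimator, since the patent does not give one in this excerpt, and the normalization to r_n(0) = 1 (so that the overall intensity is carried entirely by the factor α) is a design choice made here, not taken from the source.

```python
import numpy as np

def noise_autocorr_from_silence(samples, n_silence, p):
    """Estimate the expected noise autocorrelation pattern from the first
    few milliseconds of the recording, before speech reaches the microphone."""
    noise = samples[:n_silence]
    r_n = np.array([np.dot(noise[k:], noise[:n_silence - k])
                    for k in range(p + 1)])
    return r_n / r_n[0]   # normalized pattern; intensity is carried by alpha
```

For stationary background noise the same pattern can be reused across frames; for a slowly varying environment it would be refreshed at the start of each analysis operation, as the passage suggests.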
- The loop including steps 120 to 140, which is operative to form a set of modified autocorrelation signals, is then entered.
- This loop is adapted to form modified autocorrelation signals for the current time frame interval by subtracting the noise contribution signal α indexed by K (step 120), to form linear prediction parameter signals and the modified autocorrelation signals for the interval (step 125), and to form all-pole model autocorrelation signals and generate an error signal corresponding to the match between the all-pole model autocorrelations and the modified autocorrelation signals (step 130).
- Noise contribution index signal K is then incremented (step 135) and the loop is iterated for the predetermined set of noise contribution indices.
- FIG. 4 illustrates the operations of the modified autocorrelation signal formation loop in greater detail.
- A singularity flag IS is initially reset to zero in step 403. If unacceptable values for the predictor coefficients are obtained during the iterations through the loop from step 410 through 430, the singularity flag is set and no further modified autocorrelation signals are formed.
- Noise index K is then set to zero in step 405. Index K is incremented by a predetermined amount on each iteration through the loop to provide modified autocorrelation signals responsive to different values of noise contribution.
- The modified autocorrelation signal loop is then iterated for increasing values of noise index K until K exceeds Kmax, corresponding to the maximum noise contribution signal expected.
- Step 415 is then entered and the linear prediction coefficients resulting from the current modified autocorrelation signals of the time frame interval are generated from
- Step 415 includes generating signals
- Index K* corresponding to the minimum matching error signal is determined (step 145) by a search through the correlation signal matching errors obtained in the iterations of loop 450 in FIG. 4.
- The noise contribution signal for index K* is then used to form LPC speech parameter signals for the current frame (step 150 of FIG. 1).
- Step 501 is entered after the noise index K* has been determined in step 150 of FIG. 1.
- The linear prediction coefficients corresponding to the noise-corrected autocorrelation coefficients are then formed from the relationship
- Control is then passed to step 110 via step 155, and the circuit of FIG. 2 processes the next set of N digitized speech sample signals of store 240.
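The relationship referred to here is the standard set of p+1 Toeplitz normal equations in the noise-corrected autocorrelations. The Fortran Appendix is not reproduced in this excerpt, so the Levinson-Durbin recursion below is an assumed equivalent solver, not the patent's own listing.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for predictor coefficients
    a(1)..a(p) and the residual prediction error energy."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient from the current prediction residual.
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):          # update interior coefficients
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)           # shrink the error energy
    return -a[1:], err                 # predictor coefficients a1..ap, energy
```

Feeding the noise-corrected lags of step 150 into such a solver yields the frame's LPC speech parameter signals directly.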
Abstract
A speech pattern captured in a noisy environment is analyzed to form a representative sequence of speech parameter signals. The noise-contaminated speech pattern is partitioned into successive time frame intervals. A predictive error signal is produced for each successive interval responsive to the noise-contaminated speech of the interval and an estimate of the ambient noise. The predictive parameter signals are selected to minimize the predictive error signal of each time frame, so as to produce accurate digital codes representing the speech.
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71488885A | 1985-03-22 | 1985-03-22 | |
US714888 | 1985-03-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0226590A1 true EP0226590A1 (en) | 1987-07-01 |
Family
ID=24871862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19850906004 Withdrawn EP0226590A1 (en) | 1985-03-22 | 1985-11-14 | Analyzer for speech in noise prone environments |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP0226590A1 (en) |
JP (1) | JPS62502288A (en) |
AU (1) | AU5202086A (en) |
ES (1) | ES8704658A1 (en) |
WO (1) | WO1986005619A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NL8500843A (en) * | 1985-03-22 | 1986-10-16 | Koninkl Philips Electronics Nv | MULTIPULS EXCITATION LINEAR-PREDICTIVE VOICE CODER. |
-
1985
- 1985-11-14 JP JP50526685A patent/JPS62502288A/en active Pending
- 1985-11-14 WO PCT/US1985/002255 patent/WO1986005619A1/en not_active Application Discontinuation
- 1985-11-14 AU AU52020/86A patent/AU5202086A/en not_active Abandoned
- 1985-11-14 EP EP19850906004 patent/EP0226590A1/en not_active Withdrawn
- 1985-11-21 ES ES549155A patent/ES8704658A1/en not_active Expired
Non-Patent Citations (1)
Title |
---|
See references of WO8605619A1 * |
Also Published As
Publication number | Publication date |
---|---|
ES8704658A1 (en) | 1987-04-16 |
ES549155A0 (en) | 1987-04-16 |
WO1986005619A1 (en) | 1986-09-25 |
JPS62502288A (en) | 1987-09-03 |
AU5202086A (en) | 1986-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1309964B1 (en) | Fast frequency-domain pitch estimation | |
Lim et al. | All-pole modeling of degraded speech | |
US5179626A (en) | Harmonic speech coding arrangement where a set of parameters for a continuous magnitude spectrum is determined by a speech analyzer and the parameters are used by a synthesizer to determine a spectrum which is used to determine senusoids for synthesis | |
US5450522A (en) | Auditory model for parametrization of speech | |
US7272551B2 (en) | Computational effectiveness enhancement of frequency domain pitch estimators | |
US5305421A (en) | Low bit rate speech coding system and compression | |
CA1123955A (en) | Speech analysis and synthesis apparatus | |
KR960002388B1 (en) | Speech encoding process system and voice synthesizing method | |
US5023910A (en) | Vector quantization in a harmonic speech coding arrangement | |
US4283601A (en) | Preprocessing method and device for speech recognition device | |
US5459815A (en) | Speech recognition method using time-frequency masking mechanism | |
GB1533337A (en) | Speech analysis and synthesis system | |
EP0235181A1 (en) | A parallel processing pitch detector. | |
EP0470245A1 (en) | Method for spectral estimation to improve noise robustness for speech recognition. | |
US4081605A (en) | Speech signal fundamental period extractor | |
US5884251A (en) | Voice coding and decoding method and device therefor | |
Atal et al. | Linear prediction analysis of speech based on a pole‐zero representation | |
US4922539A (en) | Method of encoding speech signals involving the extraction of speech formant candidates in real time | |
US4890328A (en) | Voice synthesis utilizing multi-level filter excitation | |
US5007094A (en) | Multipulse excited pole-zero filtering approach for noise reduction | |
US6912496B1 (en) | Preprocessing modules for quality enhancement of MBE coders and decoders for signals having transmission path characteristics | |
AU2394895A (en) | A multi-pulse analysis speech processing system and method | |
US7043424B2 (en) | Pitch mark determination using a fundamental frequency based adaptable filter | |
EP0226590A1 (en) | Analyzer for speech in noise prone environments | |
Srivastava | Fundamentals of linear prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19870302 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): BE DE FR GB IT NL SE |
|
17Q | First examination report despatched |
Effective date: 19881031 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 19890311 |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: JAIN, VIJAY, KUMAR
Inventor name: ATAL, BISHNU, SAROOP