EP0226590A1 - Speech analyzer for noise-prone environments - Google Patents

Speech analyzer for noise-prone environments

Info

Publication number
EP0226590A1
EP0226590A1 (application EP19850906004 / EP85906004A)
Authority
EP
European Patent Office
Prior art keywords
speech
signals
noise
autocorrelation
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19850906004
Other languages
English (en)
French (fr)
Inventor
Bishnu Saroop Atal
Vijay Kumar Jain
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
American Telephone and Telegraph Co Inc
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc, AT&T Corp filed Critical American Telephone and Telegraph Co Inc
Publication of EP0226590A1
Legal status: Withdrawn (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Analysis-synthesis techniques using predictive techniques
    • G10L19/06: Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Techniques characterised by the type of extracted parameters
    • G10L25/06: The extracted parameters being correlation coefficients
    • G10L25/12: The extracted parameters being prediction coefficients

Definitions

  • This invention relates to a method for analyzing speech in a noise prone environment.
  • The problem is that the speech signal spectrum is no longer an all-pole spectrum, and the usual methods of estimating the pth order all-pole parameters from the first p+1 correlations of the signal are no longer valid. While such an analysis results in a match of the first p+1 correlations, it does not guarantee the matching of the higher order correlations and affects the degree to which speech parameters obtained from the analysis correspond to the speech pattern applied to the analyzer.
  • The problems are solved in accordance with this invention by a method which comprises the steps of partitioning an input speech pattern into successive time frame intervals, forming signals representative of the autocorrelation of the input speech of the current time frame interval responsive to the input speech pattern, generating noise signals representative of the environment, forming first and second autocorrelation signals responsive to said input speech autocorrelation signals and said generated noise signals, generating a signal corresponding to the difference between said first and second autocorrelation signals, and producing a set of signals representative of the current time frame interval speech responsive to said difference corresponding signal and the second autocorrelation signals.
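As a rough sketch of these claimed steps, the fragment below partitions a pattern into fixed-length frames, forms each frame's autocorrelation, and subtracts a scaled noise autocorrelation to obtain the "difference" signal. The frame length, predictor order, and scale factor `lam` are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

FRAME = 160          # assumed 20 ms frames at an 8 kHz sampling rate
P = 10               # assumed predictor order

def autocorr(x, p):
    """r(0..p) for one frame: the claim's 'signals representative of the
    autocorrelation of the input speech of the current time frame'."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)])

def analyze(pattern, r_noise, lam, p=P):
    """Per-frame difference between the noisy-speech autocorrelation and a
    scaled noise autocorrelation (lam is the assumed noise scale factor)."""
    frames = [pattern[i:i + FRAME]
              for i in range(0, len(pattern) - FRAME + 1, FRAME)]
    return [autocorr(f, p) - lam * r_noise for f in frames]
```

For white environmental noise the noise autocorrelation is concentrated at lag zero, so `r_noise` would be a delta-like vector; a colored-noise estimate would fill the higher lags as well.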
  • The invention is directed to an arrangement for analyzing noise-contaminated speech in which the speech pattern is partitioned into successive time frame intervals.
  • A predictive error signal is generated for each interval responsive to the noise-contaminated speech of the interval and an estimate of the noise.
  • Predictive parameter signals are selected to minimize the time frame predictive error signal so that accurate digital codes representative of the speech are generated.
  • FIG. 1 depicts a flow chart showing the general operation of a speech analyzer illustrative of the invention;
  • FIG. 2 shows a block diagram of a circuit adapted to analyze speech patterns in accordance with the flow chart of FIG. 1 that is illustrative of the invention;
  • FIGS. 3-5 are detailed flow charts illustrating the speech analysis process of FIG. 1; and
  • FIG. 6 shows waveforms illustrating the autocorrelations of speech patterns obtained through the analyzer of FIGS. 1 and 2.

Detailed Description
  • The analyzer is based on LPC (linear predictive coding) analysis.
  • The parameters of a pth order all-pole filter are determined such that the sum of the autocorrelations based on the all-pole model and the noise autocorrelations matches the autocorrelations of speech contaminated with noise over a large number of lags beyond p.
  • r_k is the autocorrelation function of the speech contaminated with noise at the kth sample lag;
  • r̂_k is the corresponding autocorrelation based on the all-pole model;
  • n_k is the autocorrelation function of the noise signal; and
  • λ is the unknown noise intensity.
  • Equation (4) requires solving a set of nonlinear equations in p+1 unknowns. Such equations are difficult to formulate and solve using data processing techniques.
  • The first p+1 correlations of the speech signal and the all-pole filter and noise model are matched exactly, while simultaneously the noise factor λ is selected to minimize the mismatch at the next correlations.
  • The mismatch is then a function of λ, which is determined by a one-dimensional search.
  • The optimum linear predictor coefficients characterizing a time frame interval are obtained as solutions of p+1 linear equations.
  • Equation (5) is solved for a number of values of λ between zero and the highest expected value of the noise power expressed as a fraction of the speech power.
  • The value of λ that minimizes the extended sum squared error in Equation (4) is selected as the optimum value, and the linear prediction parameter signals a1, a2, ..., ap are formed.
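A minimal sketch of this one-dimensional search, assuming a direct linear solve of the normal equations in place of whatever solver the patent's Fortran Appendix uses: for each candidate noise factor, the noise-corrected autocorrelations are formed, a pth-order predictor is fitted, the model autocorrelations are extended beyond lag p by the all-pole recursion, and the candidate minimizing the extended mismatch is kept.

```python
import numpy as np

def lpc_from_autocorr(r, p):
    """Predictor a1..ap from the normal equations R a = [r1..rp]
    (equation (5) in spirit; a direct linear solve is used here)."""
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:p + 1])

def extended_error(r, p, m):
    """Squared mismatch at lags p+1..m between r and the all-pole model's
    autocorrelations, extended by the recursion r̂(k) = Σ a_i r̂(k-i)."""
    a = lpc_from_autocorr(r, p)
    r_ext = list(r[:p + 1])
    for k in range(p + 1, m + 1):
        r_ext.append(sum(a[i] * r_ext[k - 1 - i] for i in range(p)))
    return float(np.sum((np.asarray(r_ext[p + 1:]) - r[p + 1:m + 1]) ** 2))

def best_noise_level(r_noisy, r_noise, p, m, grid):
    """One-dimensional grid search over the noise factor (λ of the text)."""
    errs = {lam: extended_error(r_noisy - lam * r_noise, p, m)
            for lam in grid}
    return min(errs, key=errs.get)
```

The grid would run from zero to the highest expected noise-to-speech power ratio, as the text describes; the grid spacing here is left to the caller.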
  • FIG. 6 illustrates the effect of noise on the speech pattern analysis.
  • Waveform 601 shows the true autocorrelation function for a 20 millisecond time frame interval of speech contaminated by 10 dB of additive white noise.
  • Waveform 603 shows the autocorrelation function obtained from a pth order all-pole model in accordance with the prior art, and waveform 605 illustrates the autocorrelation function obtained from the modified autocorrelation analysis in accordance with the invention. It is readily seen that waveform 605 follows the true autocorrelation function of waveform 601 very closely, while waveform 603, obtained from an all-pole model analysis, deviates significantly from the true autocorrelation function.
  • A general flow chart illustrating the noisy speech pattern analysis arrangement is shown in FIG. 1, and FIG. 2 depicts a block diagram of a microprocessor circuit adapted to carry out the operations of the flow chart of FIG. 1.
  • A speech pattern and the environmental noise associated therewith are sampled and digitized as indicated in step 101 of FIG. 1. This is accomplished in the circuit of FIG. 2 by receiving the noisy speech pattern at microphone 201 and low-pass filtering the speech signal in filter 205.
  • The bandlimited signal from the filter is sampled in analog-to-digital converter 210 at a prescribed rate, and each sample is converted into a digital code corresponding to the magnitude of the sample.
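The sample-to-code conversion can be pictured as uniform quantization. The sampling rate, converter resolution, and full-scale value below are assumptions for illustration; the text specifies only "a prescribed rate".

```python
import numpy as np

BITS = 12            # assumed converter resolution (not fixed by the text)

def a_to_d(analog, full_scale=1.0, bits=BITS):
    """Uniform quantization of bandlimited sample values into signed
    digital codes, sketching what converter 210 does."""
    levels = 2 ** (bits - 1)
    codes = np.round(analog / full_scale * levels)
    return np.clip(codes, -levels, levels - 1).astype(int)

# samples at 0, half scale, negative full scale, positive full scale
codes = a_to_d(np.array([0.0, 0.5, -1.0, 1.0]))
```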
  • Step 105 of FIG. 1 is performed and the digitized speech samples from converter 210 are partitioned into time frame intervals in floating point array processor 220, hereinafter referred to as arithmetic processor 220, under control of control processor 215.
  • Such partitioning may be done on a frame-by-frame basis as the speech signal is received from microphone 201.
  • The time frame interval signal samples are processed successively, and a set of LPC speech parameter signals are produced in arithmetic processor 220 and transferred therefrom to utilization device 260.
  • The utilization device may comprise speech processing equipment such as a speech recognizer, synthesizer or coder, or general purpose data processing equipment such as a mainframe computer or a personal computer.
  • The programmed instructions controlling the operation of the circuit of FIG. 2 in carrying out the operations of the flow chart of FIG. 1 in control memory 230 are listed in Fortran language form in the Appendix hereto.
  • The correlation signals are formed in step 110 in arithmetic processor 220 according to stored instructions in program memory 230 of FIG. 2, using the frame speech sample signals of store 240 and the window signal in store 235.
  • Windowed speech samples are stored in memory 245 and the r(i) correlation signals are placed in store 250 of FIG. 2.
  • N successive digitized speech samples s(1), s(2), ..., s(n), ..., s(N) are windowed by combining the sample signals with a window function signal w(n) stored in memory 235 of FIG. 2, as is well known in the art.
  • The windowed sample signals are formed as x(n) = s(n)·w(n), 1 ≤ n ≤ N. (6)
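Equation (6) is a per-sample product. A minimal sketch, assuming a Hamming window (a common choice; the patent leaves the stored window function w(n) unspecified) and a 160-sample frame:

```python
import numpy as np

N = 160                      # assumed samples per 20 ms frame at 8 kHz
s = np.ones(N)               # placeholder frame of speech samples s(n)
w = np.hamming(N)            # assumed window choice; w(n) is read from
                             # memory 235 in the actual circuit
x = s * w                    # equation (6): x(n) = s(n) * w(n), 1 <= n <= N
```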
  • Step 110 in FIG. 1 is shown in greater detail in FIG. 3.
  • The autocorrelation index signal k is initially set to zero (step 301).
  • The autocorrelation signals are then iteratively formed in steps 305 and 310.
  • Each autocorrelation signal is formed as r(k) = Σ x(n)·x(n−k) (summed over n). (7)
  • The r(k) signals are stored in autocorrelation signal store 250 as they are produced in step 305.
  • The noise contribution level signal is then initially set to a minimum prescribed value, as per index K in step 115 of FIG. 1.
  • The noise contribution signal corresponds to the noise autocorrelation patterns expected during a time frame interval.
  • Such noise pattern signals may be fixed as white or colored noise or may be obtained by sampling the particular speech analyzer environment in the absence of speech. The sampling may represent the average noise background or may be obtained in the first several milliseconds of each speech analysis operation prior to the application of speech to microphone 201.
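The "sample the environment in the absence of speech" option can be sketched as follows; the sampling rate and the length of the speech-free lead-in are illustrative assumptions, since the text says only "the first several milliseconds".

```python
import numpy as np

FS = 8000                    # assumed sampling rate
LEAD_MS = 50                 # assumed speech-free lead-in duration

def noise_autocorr(samples, p, fs=FS, lead_ms=LEAD_MS):
    """Estimate the environment's autocorrelation pattern n(0..p) from the
    first few milliseconds of a recording, before speech reaches the
    microphone."""
    lead = samples[: fs * lead_ms // 1000]
    n = len(lead)
    return np.array([np.dot(lead[:n - k], lead[k:]) / n
                     for k in range(p + 1)])
```

For a fixed white-noise model, this estimate would instead be a delta-like vector; a colored-noise model fills the higher lags.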
  • The loop including steps 120 to 140, which is operative to form a set of modified autocorrelation signals, is then entered.
  • This loop is adapted to form modified autocorrelation signals of the current time frame interval by subtracting the noise contribution signal λ indexed by K (step 120), forming linear prediction parameter signals for the current time frame interval and the modified autocorrelation signals (step 125), and forming all-pole model autocorrelation signals and generating an error signal corresponding to the match between the all-pole model autocorrelations and the modified autocorrelation signals (step 130).
  • Noise contribution index signal K is then incremented (step 135) and the loop is iterated for the predetermined set of noise contribution indices.
  • FIG. 4 illustrates the operations of the modified autocorrelation signal formation loop in greater detail.
  • A singularity flag IS is initially reset to zero in step 403. If, during the iterations through the loop from step 410 through 430, unacceptable values for predictor coefficients are obtained, the singularity flag is set and no further modified autocorrelation signals are formed.
  • Noise index K is then set to zero in step 405. Index K is incremented by a predetermined amount each iteration through the loop to provide modified autocorrelation signals responsive to different values of noise contribution.
  • The modified autocorrelation signal loop is then iterated for increasing values of noise index K until K exceeds Kmax, corresponding to the maximum noise contribution signal expected.
  • Step 415 is then entered and the linear prediction coefficients resulting from the current modified autocorrelation signals of the time frame interval are generated from
  • Step 415 includes generating signals
  • The index K* corresponding to the minimum matching error signal is determined (step 145) by a search through the correlation signal matching errors obtained in the iterations of loop 450 in FIG. 4.
  • The noise contribution signal for index K* is then used to form LPC speech parameter signals for the current frame (step 150 of FIG. 1).
  • Step 501 is entered after the noise index K* has been determined in step 150 of FIG. 1.
  • The linear prediction coefficients corresponding to the noise corrected autocorrelation coefficients are then formed from the relationship
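The relationship itself is elided in this extraction. A standard way to obtain predictor coefficients from (noise-corrected) autocorrelations is the Levinson-Durbin recursion, sketched below as an assumption; the patent's Fortran Appendix may use a different solver.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: predictor coefficients a1..ap from
    autocorrelations r(0..p), solving the Toeplitz normal equations."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    e = r[0]                                  # prediction error energy
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                          # reflection coefficient
        prev = a.copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        e *= 1.0 - k * k
    return -a[1:], e                          # a1..ap and residual energy

# Autocorrelations of an AR(1) process with coefficient 0.5:
coeffs, err = levinson_durbin(np.array([1.0, 0.5, 0.25]), p=2)
```

A reflection coefficient with magnitude at or above one would signal the "unacceptable values for predictor coefficients" that set the singularity flag IS in FIG. 4.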
  • Control is then passed to step 110 via step 155, and the circuit of FIG. 2 processes the next set of N digitized speech sample signals of store 240.
EP19850906004 1985-03-22 1985-11-14 Speech analyzer for noise-prone environments Withdrawn EP0226590A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US71488885A 1985-03-22 1985-03-22
US714888 1985-03-22

Publications (1)

Publication Number Publication Date
EP0226590A1 true EP0226590A1 (de) 1987-07-01

Family

ID=24871862

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19850906004 Withdrawn EP0226590A1 (de) 1985-03-22 1985-11-14 Speech analyzer for noise-prone environments

Country Status (5)

Country Link
EP (1) EP0226590A1 (de)
JP (1) JPS62502288A (de)
AU (1) AU5202086A (de)
ES (1) ES8704658A1 (de)
WO (1) WO1986005619A1 (de)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8500843A (nl) * 1985-03-22 1986-10-16 Koninkl Philips Electronics Nv Multipuls-excitatie lineair-predictieve spraakcoder.

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO8605619A1 *

Also Published As

Publication number Publication date
JPS62502288A (ja) 1987-09-03
WO1986005619A1 (en) 1986-09-25
ES549155A0 (es) 1987-04-16
AU5202086A (en) 1986-10-13
ES8704658A1 (es) 1987-04-16

Similar Documents

Publication Publication Date Title
EP1309964B1 (de) Fast frequency-domain pitch estimation
Lim et al. All-pole modeling of degraded speech
US5179626A Harmonic speech coding arrangement where a set of parameters for a continuous magnitude spectrum is determined by a speech analyzer and the parameters are used by a synthesizer to determine a spectrum which is used to determine sinusoids for synthesis
US5450522A (en) Auditory model for parametrization of speech
US7272551B2 (en) Computational effectiveness enhancement of frequency domain pitch estimators
US5305421A (en) Low bit rate speech coding system and compression
CA1123955A (en) Speech analysis and synthesis apparatus
KR960002388B1 (ko) 언어 엔코딩 처리 시스템 및 음성 합성방법
US5023910A (en) Vector quantization in a harmonic speech coding arrangement
CA1301339C (en) Parallel processing pitch detector
US4283601A (en) Preprocessing method and device for speech recognition device
US5459815A (en) Speech recognition method using time-frequency masking mechanism
GB1533337A (en) Speech analysis and synthesis system
EP0470245A1 (de) Spectral estimation method for improving noise robustness in speech recognition
US4081605A (en) Speech signal fundamental period extractor
US5884251A (en) Voice coding and decoding method and device therefor
Atal et al. Linear prediction analysis of speech based on a pole‐zero representation
US4922539A (en) Method of encoding speech signals involving the extraction of speech formant candidates in real time
US4890328A (en) Voice synthesis utilizing multi-level filter excitation
US5007094A (en) Multipulse excited pole-zero filtering approach for noise reduction
US6912496B1 (en) Preprocessing modules for quality enhancement of MBE coders and decoders for signals having transmission path characteristics
AU2394895A (en) A multi-pulse analysis speech processing system and method
US7043424B2 (en) Pitch mark determination using a fundamental frequency based adaptable filter
EP0226590A1 (de) Speech analyzer for noise-prone environments
Srivastava Fundamentals of linear prediction

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19870302

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): BE DE FR GB IT NL SE

17Q First examination report despatched

Effective date: 19881031

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19890311

RIN1 Information on inventor provided before grant (corrected)

Inventor name: JAIN, VIJAY, KUMAR

Inventor name: ATAL, BISHNU, SAROOP