EP0052120A4 - Improvements in signal processing. - Google Patents

Improvements in signal processing.

Info

Publication number
EP0052120A4
Authority
EP
European Patent Office
Prior art keywords
coefficients
data
filter
pef
formant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19810901295
Other languages
English (en)
French (fr)
Other versions
EP0052120A1 (de)
Inventor
John Sinclair Reid
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being prediction coefficients
    • G10L25/15: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being formant information

Definitions

  • The present invention relates to signal processing and more particularly to signal processing by means of linear prediction coding.
  • The invention is primarily concerned with speech signals but is not limited thereto.
  • A fundamental problem in the analysis of speech, and in similar fields, concerns the identification of the broad features, or trend, of the power spectrum of a signal in the presence of fine structure brought about by the harmonics of low frequency components and by the presence of noise.
  • The broad peaks in the envelope of the power spectrum of a signal are known as formants.
  • The problem is therefore one of formant identification.
  • Present techniques apply either in the frequency domain or in the time domain.
  • In the frequency domain, the power spectrum of the signal is first approximated in some way, such as by feeding the signal through a parallel array of filters or by computing the Fourier transform of a segment of the input data. Smoothing procedures are then applied, and peaks in the smoothed frequency domain data are taken as estimates of the formant peaks of the spectrum.
  • A more sophisticated time domain approach used in the analysis of speech, and of which the invention is a special case, is the linear prediction method. Under this method, successive time segments of the signal are each assumed to be the outcome of a stationary autoregressive random process.
  • PEF prediction error filter
  • ⁇ ' is the transpose of ⁇ .
  • Equation (2) is solved for the vector which comprises an estimate of the population prediction error filter (PEF) coefficient vector, ⁇ .
  • PPF population prediction error filter
  • The linear prediction method of estimating spectral trends has several advantages over other methods, viz: (i) only a small number of parameters are required to represent spectral trends; (ii) the estimated spectrum at low orders represents a smooth approximation to the population spectrum; (iii) for moderate values of the order, n, the estimated spectrum is completely unaffected by the presence of pitch harmonics, since these cannot affect the values of the covariances.
  • The invention is an implementation of a multiprocessing approach to real time spectral analysis, analogous to that which must occur in living organisms, and is intended to exploit the cheapness and power of contemporary silicon chip technology.
  • A linear prediction method is used, but it is a method which avoids the time consuming aspects of conventional linear prediction methods. Essentially it involves a process by which the inversion of a single high order matrix is replaced by the simultaneous inversion of a number of second order matrices to yield a solution to equation (2) above. This solution, although mathematically imprecise, is sufficiently accurate for practical purposes.
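
To make the second order step concrete, here is a minimal Python sketch of a single second order PEF estimate. It is not the patent's circuit: equation (2) is not reproduced in this record, so the sketch assumes the usual autocorrelation form of the normal equations and the sign convention e[i] = x[i] + c1*x[i-1] + c2*x[i-2]. The function names `pef2` and `resonator_impulse_response` are hypothetical. The point it illustrates is that a second order system can be solved in closed form, with no general matrix inversion.

```python
import math

def pef2(x):
    """Estimate second order PEF coefficients (c1, c2), assuming the
    convention e[i] = x[i] + c1*x[i-1] + c2*x[i-2], by solving the
    2x2 normal equations with Cramer's rule."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    r2 = sum(x[i] * x[i - 2] for i in range(2, len(x)))
    det = r0 * r0 - r1 * r1  # determinant of [[r0, r1], [r1, r0]]
    if det <= 0.0:
        raise ValueError("degenerate covariance matrix")
    c1 = r1 * (r2 - r0) / det
    c2 = (r1 * r1 - r0 * r2) / det
    return c1, c2

def resonator_impulse_response(c1, c2, n):
    """Impulse response of the all-pole filter 1 / (1 + c1*z^-1 + c2*z^-2)."""
    h = [1.0, -c1]
    while len(h) < n:
        h.append(-c1 * h[-1] - c2 * h[-2])
    return h[:n]

# A single damped resonance: pole radius 0.9 at 0.1 cycles per sample.
true_c1 = -2.0 * 0.9 * math.cos(2.0 * math.pi * 0.1)
true_c2 = 0.9 ** 2
est_c1, est_c2 = pef2(resonator_impulse_response(true_c1, true_c2, 400))
print(est_c1, est_c2)
```

For this idealised single-resonance input the estimate reproduces the generating coefficients to within rounding; real speech segments contain several resonances, which is the situation the rest of the description addresses.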
  • The spectrum, S n , derived for a particular data sequence is completely controlled by the roots of the polynomial Y (z) defined in equation (4).
  • The polynomial may be factorized into a number of binomial factors having real coefficients, viz:
  • Each binomial factor in (7) corresponds to a pair of roots in the complex number plane. Those root pairs which are close to the unit circle are related to peaks in the spectrum, S n , which are called formants in the case of speech data. Thus some of the binomial coefficient pairs (a 1 , a 2 ) , (b 1 , b 2 ) etc. are associated with particular formants, while others, which are of less practical importance, merely control the spectral trend. Now suppose that the binomial coefficient pair (a 1 , a 2 ) associated with a particular formant is known precisely. The pair can be used as the coefficients of a non-recursive filter which acts on the data according to equation (32) below to yield a new data sequence whose linear prediction spectrum S* is given by
  • If the second order PEF coefficients are found for this second data sequence, they will summarize the trend in the spectrum after the partial removal of the dominant peak. Convolution of the original data sequence with this second set of coefficients will have the effect of reducing the residual peaks in this data sequence, leaving the dominant peak more isolated than before, and the sequence will yield PEF coefficients less contaminated by the residual peaks. In general, if this process of convoluting a pair of data sequences with the PEF coefficients derived from the alternate sequence is continued, one sequence of sequences will converge to a limit sequence in which the dominant peak or formant is present in isolation. The other will converge to a limit sequence which has the spectrum of the original sequence but with the dominant peak removed, and which can itself be operated on in the same way so as to remove further peaks.
  • The data is divided into a plurality of data streams, for example streams A and B, whereby the PEF coefficients computed from stream A are convoluted with the original data stream to yield stream B, while the PEF coefficients from stream B are convoluted with the original stream to yield stream A.
  • In a network as symmetric as this, the PEF coefficients yielded by both streams are identical and are a poor approximation to the PEF coefficients of the dominant peak. Some asymmetry must be deliberately introduced into such a network.
  • One method of introducing an asymmetry is to filter stream A with a recursive filter (see equation (33) below) whose coefficients are equal to or derived from the PEF coefficients computed from stream A itself.
  • This method is effective in isolating the dominant formant in stream A while stream B comprises a stream in which the dominant formant has been removed and which can be further processed in order to isolate remaining formants.
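
The cross-coupling just described can be sketched in Python as a single update pass. This is only an illustration of the wiring under stated assumptions: equations (32) and (33) are not reproduced in this record, so the filter forms, the zero initial conditions, the `gain` factor applied to the recursive filter coefficients, and all function names are hypothetical. The sketch makes no claim about convergence behaviour.

```python
import math

def pef2(x):
    """Second order PEF estimate (autocorrelation form, assumed convention
    e[i] = x[i] + c1*x[i-1] + c2*x[i-2])."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    r2 = sum(x[i] * x[i - 2] for i in range(2, len(x)))
    det = r0 * r0 - r1 * r1
    return r1 * (r2 - r0) / det, (r1 * r1 - r0 * r2) / det

def fir2(coeffs, x):
    """Non-recursive filter: y[i] = x[i] + c1*x[i-1] + c2*x[i-2]."""
    c1, c2 = coeffs
    y = []
    for i, v in enumerate(x):
        x1 = x[i - 1] if i >= 1 else 0.0
        x2 = x[i - 2] if i >= 2 else 0.0
        y.append(v + c1 * x1 + c2 * x2)
    return y

def iir2(coeffs, x, gain):
    """Recursive filter: y[i] = x[i] - gain*(c1*y[i-1] + c2*y[i-2])."""
    c1, c2 = coeffs
    y = []
    for i, v in enumerate(x):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y.append(v - gain * (c1 * y1 + c2 * y2))
    return y

def two_stream_pass(x, stream_a, stream_b, gain=0.5):
    """One pass: each stream's coefficients filter the ORIGINAL data to
    produce the other stream; stream A is then filtered recursively with
    its own (scaled) coefficients to break the symmetry."""
    coeff_a = pef2(stream_a)
    coeff_b = pef2(stream_b)
    new_b = fir2(coeff_a, x)
    new_a = iir2(coeff_a, fir2(coeff_b, x), gain)
    return new_a, new_b, coeff_a, coeff_b

# Test signal: a strong component plus a weaker one (arbitrary frequencies).
x = [math.sin(0.5 * i) + 0.3 * math.sin(1.7 * i) for i in range(512)]
a, b = list(x), list(x)
for _ in range(5):
    a, b, coeff_a, coeff_b = two_stream_pass(x, a, b)
print(coeff_a, coeff_b)
```

On the first pass both streams equal the original data, so both coefficient pairs are identical; the recursive filter on stream A is what makes the two streams differ on later passes. A `gain` between 0 and 1 keeps the recursive filter stable whenever the estimated PEF is itself minimum phase.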
  • The network selected to perform a particular task will depend upon the application: the type of spectral information required, the available hardware and so on. Even in the case of speech analysis, different networks may be appropriate to each of three distinct applications, viz: phoneme recognition, speaker identification, and speech compression for storage and transmission.
  • A cascaded two stream embodiment will yield, at any instant, several pairs of coefficients: one pair from each module, associated with the formant isolated in the A stream by that module, plus a final coefficient pair associated with the B stream of the final module, from which all the formants have been removed.
  • Each pair of PEF coefficients, C 1 and C 2 , contains information about both the centre frequency and the bandwidth of the corresponding formant (see equation (34) below).
  • This second dimension of information is extremely useful in practice since the second coefficient can be used directly as a criterion for accepting or rejecting a particular formant during phoneme recognition; small values of C 2 , that is, less than some threshold value, correspond to peaks which are too broad and weak to be considered valid as formants.
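
As an illustration of how a coefficient pair carries both quantities, the following Python sketch uses the standard relation between a second order polynomial 1 + C1*z^-1 + C2*z^-2 and its complex root pair (radius sqrt(C2), angle arccos(-C1 / (2*sqrt(C2)))). Equation (34) is not reproduced in this record, so this sign convention is an assumption, and the function names, the sample rate and the threshold value 0.5 are hypothetical.

```python
import math

def formant_from_pef(c1, c2, sample_rate):
    """Return (centre_hz, bandwidth_hz) for the complex root pair of
    1 + c1*z^-1 + c2*z^-2, or None if the roots are real."""
    if c2 <= 0.0 or c1 * c1 >= 4.0 * c2:
        return None
    radius = math.sqrt(c2)                    # distance of the roots from the origin
    angle = math.acos(-c1 / (2.0 * radius))   # root angle in radians per sample
    centre_hz = angle * sample_rate / (2.0 * math.pi)
    bandwidth_hz = -math.log(radius) * sample_rate / math.pi
    return centre_hz, bandwidth_hz

def is_valid_formant(c2, threshold=0.5):
    """Accept a formant only if C2 reaches the threshold (small C2 means
    roots far from the unit circle, i.e. a broad, weak peak)."""
    return c2 >= threshold

# A root pair of radius 0.95 at 500 Hz, sampled at 8000 Hz.
c1 = -2.0 * 0.95 * math.cos(2.0 * math.pi * 500.0 / 8000.0)
c2 = 0.95 ** 2
centre_hz, bandwidth_hz = formant_from_pef(c1, c2, 8000.0)
print(centre_hz, bandwidth_hz, is_valid_formant(c2))
```

Because C2 alone fixes the root radius in this convention, thresholding C2 is equivalent to thresholding the bandwidth, which is consistent with the acceptance criterion described above.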
  • Fig. 1 shows a circuit block diagram of an embodiment of the device as a formant tracker.
  • Fig. 2 shows a circuit block diagram of one of the second order prediction filter estimators depicted by circles in Fig. 1.
  • The PEF estimator 15 will now compute PEF coefficients related to the residual peaks in data sequence 6 and pass these coefficients to filter 11.
  • The residual peaks will then be attenuated in data stream 5, allowing PEF estimator 14 to compute coefficients relating to the dominant peak which are less contaminated by the residual peaks than was previously the case. This effect of isolating the dominant peak will be further reinforced by the action of filter 13.
  • The coefficients may be used as the coefficients of a non-recursive (or "finite impulse response") filter which acts on a data sequence ⁇ x i ⁇ to yield a new data sequence ⁇ y i ⁇ , viz:
  • A network for the recognition of different phonemes in speech data could dispense with the coefficient modifiers such as 16 and replace them with switches, so that full positive feedback is maintained for a short time, causing rapid convergence in the isolation of the formants.
  • The recursive filters such as 13 can be switched out of the network for the remainder of the duration of the phoneme.
  • Statistically significant changes in the coefficient values appearing in the buffers 17, 18, 19 and 20 can be used to detect the onset of a new phoneme and so cause the recursive filters to again become operative. In this way a speech recognition device can be constructed which is phoneme synchronous, thus avoiding the need for time axis normalization.
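
One simple way to flag such a change is sketched below in Python. The text does not say which statistical test is intended, so the rule used here (deviation from the mean of a trailing window by more than `k` standard deviations) is an assumption, as are the function name, the window length, the factor `k` and the `floor` on the spread.

```python
import statistics

def detect_onsets(track, window=8, k=4.0, floor=1e-3):
    """Return the indices at which a coefficient value departs from the
    mean of the preceding `window` values by more than k standard
    deviations. The history restarts after each detected onset."""
    onsets = []
    start = 0
    for i in range(len(track)):
        history = track[max(start, i - window):i]
        if len(history) < window:
            continue  # not enough history since the last onset
        mean = statistics.mean(history)
        spread = max(statistics.pstdev(history), floor)
        if abs(track[i] - mean) > k * spread:
            onsets.append(i)
            start = i
    return onsets

# A coefficient track that jitters around -1.40, then jumps to about 0.60.
jitter = [-0.002, 0.0, 0.002, -0.001, 0.001]
track = [-1.40 + jitter[i % 5] for i in range(20)]
track += [0.60 + jitter[i % 5] for i in range(20)]
print(detect_onsets(track))
```

In a network like the one described, each detected index would be the moment at which the recursive filters are switched back in.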
  • The simplest embodiment of the invention comprises the PEF estimator of Fig. 2.
  • This device alone is unsuited to the analysis of spectrally complex signals such as speech but it can be used for the recognition of simpler signals such as the signalling tones used in telephone switching systems. In this way a single device can be used to distinguish between a wide variety of tones when the input (line 1 in Fig. 2) is fed from an appropriate line in the telephone system.
  • The coefficients C 1 and C 2 generated by the device are then compared with predefined values in order to classify the incoming signal and to cause the remainder of the telephone system to take appropriate action.
  • An analogue embodiment of the device utilizing fixed delays in place of shift registers would be more suited to this application.
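
The tone recognition use described above can be sketched in Python as a nearest-match comparison of an estimated coefficient pair against stored reference pairs. This is a digital illustration only. The tone table, the sample rate, the distance measure and the `max_distance` tolerance are hypothetical, and the reference pairs assume the convention that an undamped sinusoid of angular frequency w gives C1 = -2*cos(w), C2 = 1.

```python
import math

SAMPLE_RATE = 8000.0
# Hypothetical signalling tones; a real system would use its own table.
TONES_HZ = {"tone_400": 400.0, "tone_1000": 1000.0, "tone_2600": 2600.0}

def pef2(x):
    """Second order PEF estimate (autocorrelation form, assumed convention
    e[i] = x[i] + c1*x[i-1] + c2*x[i-2])."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    r2 = sum(x[i] * x[i - 2] for i in range(2, len(x)))
    det = r0 * r0 - r1 * r1
    return r1 * (r2 - r0) / det, (r1 * r1 - r0 * r2) / det

def reference_pair(freq_hz):
    """Ideal (C1, C2) for an undamped sinusoid: roots on the unit circle."""
    return -2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE), 1.0

def classify(x, max_distance=0.2):
    """Return the name of the closest reference tone, or None if no
    reference pair lies within max_distance of the estimate."""
    c1, c2 = pef2(x)
    best_name, best_dist = None, float("inf")
    for name, freq_hz in TONES_HZ.items():
        ref1, ref2 = reference_pair(freq_hz)
        dist = math.hypot(c1 - ref1, c2 - ref2)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

def tone(freq_hz, n=400):
    """Generate n samples of a unit amplitude sinusoid."""
    return [math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

print(classify(tone(1000.0)), classify(tone(1800.0)))
```

A tone that is not in the table falls outside the tolerance of every reference pair and is rejected rather than forced onto the nearest entry.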
EP19810901295 1980-05-19 1981-05-18 Improvements in signal processing. Withdrawn EP0052120A4 (de)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
AUPE360580 1980-05-19
AU3605/80 1980-05-19
AU5556/80 1980-09-12
AUPE555680 1980-09-12

Publications (2)

Publication Number Publication Date
EP0052120A1 (de) 1982-05-26
EP0052120A4 (de) 1983-12-09

Family

ID=25642381

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19810901295 Withdrawn EP0052120A4 (de) 1980-05-19 1981-05-18 Improvements in signal processing.

Country Status (5)

Country Link
EP (1) EP0052120A4 (de)
JP (1) JPS57500901A (de)
BR (1) BR8108616A (de)
DK (1) DK21282A (de)
WO (1) WO1981003392A1 (de)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3296374A (en) * 1963-06-28 1967-01-03 Ibm Speech analyzing system
US3327057A (en) * 1963-11-08 1967-06-20 Bell Telephone Labor Inc Speech analysis
US3335225A (en) * 1964-02-20 1967-08-08 Melpar Inc Formant period tracker
US3369076A (en) * 1964-05-18 1968-02-13 Ibm Formant locating system
US3649765A (en) * 1969-10-29 1972-03-14 Bell Telephone Labor Inc Speech analyzer-synthesizer system employing improved formant extractor
US4032711A (en) * 1975-12-31 1977-06-28 Bell Telephone Laboratories, Incorporated Speaker recognition arrangement
US4092493A (en) * 1976-11-30 1978-05-30 Bell Telephone Laboratories, Incorporated Speech recognition system
US4220819A (en) * 1979-03-30 1980-09-02 Bell Telephone Laboratories, Incorporated Residual excited predictive speech coding system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-26, no. 6, December 1978, NEW YORK (US) *
See also references of WO8103392A1 *

Also Published As

Publication number Publication date
DK21282A (da) 1982-01-19
WO1981003392A1 (en) 1981-11-26
EP0052120A1 (de) 1982-05-26
JPS57500901A (de) 1982-05-20
BR8108616A (pt) 1982-04-06

Similar Documents

Publication Publication Date Title
Le Roux et al. A fixed point computation of partial correlation coefficients
Chang et al. Analysis of conjugate gradient algorithms for adaptive filtering
Lee et al. Blind source separation of real world signals
Lim et al. A new algorithm for two-dimensional maximum entropy power spectrum estimation
Allen Short term spectral analysis, synthesis, and modification by discrete Fourier transform
Rabiner et al. The chirp z-transform algorithm
US6167417A (en) Convolutive blind source separation using a multiple decorrelation method
US4489434A (en) Speech recognition method and apparatus
Rabiner On the use of autocorrelation analysis for pitch detection
Stapleton et al. Adaptive noise cancellation for a class of nonlinear, dynamic reference channels
Trancoso et al. Efficient procedures for finding the optimum innovation in stochastic coders
US4486900A (en) Real time pitch detection by stream processing
CA1172362A (en) Continuous speech recognition method
US20050228518A1 (en) Filter set for frequency analysis
Robinson Logical convolution and discrete Walsh and Fourier power spectra
Barnwell Recursive windowing for generating autocorrelation coefficients for LPC analysis
GB2107101A (en) Continous word string recognition
GB2107100A (en) Continuous speech recognition
JP2008017511A (ja) Digital filter having high accuracy and high efficiency
Makhoul et al. Adaptive lattice methods for linear prediction
NL7812151A (nl) Method and device for determining the pitch in human speech.
Scheibler SDR—medium rare with fast computations
CN111883154A (zh) Echo cancellation method and device, computer-readable storage medium, and electronic device
JPS6356560B2 (de)
EP1417755A1 (de) Method and apparatus for estimating the error characteristics of an impulse response using the least squares method

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): AT CH DE FR GB NL SE

17P Request for examination filed

Effective date: 19820524

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Withdrawal date: 19840525