EP0052120A4 - IMPROVEMENTS IN SIGNAL PROCESSING. - Google Patents
IMPROVEMENTS IN SIGNAL PROCESSING.
Info
- Publication number
- EP0052120A4
- Authority
- EP
- European Patent Office
- Prior art keywords
- coefficients
- data
- filter
- pef
- formant
- Prior art date
- Legal status
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
Definitions
- The present invention relates to signal processing, and more particularly to signal processing by means of linear prediction coding.
- The invention is primarily concerned with speech signals but is not limited thereto.
- A fundamental problem in the analysis of speech, and in similar fields, concerns the identification of the broad features, or trend, of the power spectrum of a signal. These features must be identified in the presence of fine structure caused by the harmonics of low-frequency components and by noise.
- The broad peaks in the envelope of the power spectrum of a signal are known as formants.
- The problem is therefore one of formant identification.
- Present techniques operate either in the frequency domain or in the time domain.
- In the frequency domain, the power spectrum of the signal is first approximated in some way, such as by feeding the signal through a parallel array of filters or by computing the Fourier transform of a segment of the input data. Smoothing procedures are then applied, and peaks in the smoothed frequency-domain data are taken as estimates of the formant peaks of the spectrum.
- A more sophisticated time-domain approach used in the analysis of speech, of which the invention is a special case, is the linear prediction method. Under this method, successive time segments of the signal are each assumed to be the outcome of a stationary autoregressive random process.
- PEF: prediction error filter.
- ⁇' is the transpose of ⁇.
- Equation (2) is solved for the vector that comprises an estimate of the population prediction error filter (PEF) coefficient vector, ⁇.
- PPF: population prediction error filter.
- The linear prediction method of estimating spectral trends has several advantages over other methods, namely: (i) only a small number of parameters is required to represent spectral trends; (ii) at low orders, the estimated spectrum represents a smooth approximation to the population spectrum and is completely unaffected by the presence of pitch harmonics, since these cannot affect the values of the covariances, for moderate values of the order, n, …
- The invention is an implementation of a multiprocessing approach to real-time spectral analysis, analogous to that which must occur in living organisms. It is intended to exploit the cheapness and power of contemporary silicon chip technology.
- A linear prediction method is used, but one that avoids the time-consuming aspects of conventional linear prediction methods. Essentially, it replaces the inversion of a single high-order matrix with the simultaneous inversion of a number of second-order matrices, yielding a solution to equation (2) above. This solution, although mathematically imprecise, is sufficiently accurate for practical purposes.
- The spectrum, $S_n$, derived for a particular data sequence is completely controlled by the roots of the polynomial $Y(z)$ defined in equation (4).
- The polynomial may be factorized into a number of binomial factors having real coefficients, viz:
- Each binomial factor in (7) corresponds to a pair of roots in the complex plane. Those root pairs that are close to the unit circle are related to peaks in the spectrum, $S_n$, which are called formants in the case of speech data. Thus some of the binomial coefficient pairs $(a_1, a_2)$, $(b_1, b_2)$, etc. are associated with particular formants, while others, of less practical importance, merely control the spectral trend. Now suppose that the binomial coefficient pair $(a_1, a_2)$ associated with a particular formant is known precisely. The pair can be used as the coefficients of a non-recursive filter which acts on the data according to equation (32) below. The result is a new data sequence whose linear prediction spectrum $S^*$ is given by
- If the second-order PEF coefficients are found for this second data sequence, they will summarize the trend in the spectrum after the partial removal of the dominant peak. Convolving the original data sequence with this second set of coefficients will reduce the residual peaks in this data sequence. The dominant peak is then left more isolated than before, and the sequence will yield PEF coefficients less contaminated by the residual peaks. In general, this process can be continued: each of a pair of data sequences is convolved with the PEF coefficients derived from the alternate sequence. One sequence of sequences will then converge to a limit sequence in which the dominant peak or formant is present in isolation. The other will converge to a limit sequence that has the spectrum of the original sequence but with the dominant peak removed, and that can itself be operated on in the same way so as to remove further peaks.
- The data is divided into a plurality of data streams, for example stream A and stream B. The PEF coefficients computed from stream A are convolved with the original data stream to yield stream B, while the PEF coefficients from stream B are convolved with the original stream to yield stream A.
- In such a symmetric arrangement, the PEF coefficients yielded by both streams are identical and are a poor approximation to the PEF coefficients of the dominant peak. Some asymmetry must therefore be deliberately introduced into such a network.
- One method of introducing asymmetry is to filter stream A with a recursive filter (see equation (33) below) whose coefficients are equal to, or derived from, the PEF coefficients computed from stream A itself.
- This method is effective in isolating the dominant formant in stream A. Stream B, meanwhile, becomes a stream in which the dominant formant has been removed, and it can be further processed in order to isolate the remaining formants.
- The network selected to perform a particular task will depend upon the application: the type of spectral information required, the available hardware and so on. Even in the case of speech analysis, different networks may be appropriate to each of three distinct applications, namely phoneme recognition, speaker identification, and speech compression for storage and transmission.
- A cascaded two-stream embodiment will yield, at any instant, several pairs of coefficients. Each module contributes one pair, associated with the formant isolated in its A stream. A final pair comes from the B stream of the final module, from which all the formants have been removed.
- Each pair of PEF coefficients, $C_1$ and $C_2$, contains information about both the centre frequency and the bandwidth of the corresponding formant (see equation (34) below).
- This second dimension of information is extremely useful in practice. The second coefficient can be used directly as a criterion for accepting or rejecting a particular formant during phoneme recognition. Small values of $C_2$, that is, values less than some threshold, correspond to peaks that are too broad and weak to be considered valid formants.
- Fig. 1 shows a circuit block diagram of an embodiment of the device as a formant tracker.
- Fig. 2 shows a circuit block diagram of one of the second-order prediction filter estimators depicted by circles in Fig. 1.
- The PEF estimator 15 will now compute PEF coefficients related to the residual peaks in data sequence 6 and pass these coefficients to filter 11.
- The residual peaks will then be attenuated in data stream 5. PEF estimator 14 can therefore compute coefficients relating to the dominant peak that are less contaminated by the residual peaks than was previously the case. The action of filter 13 further reinforces this isolation of the dominant peak.
- Such coefficients may be used as the coefficients of a non-recursive (or "finite impulse response") filter which acts on a data sequence $\{x_i\}$ to yield a new data sequence $\{y_i\}$, viz:
- A network for the recognition of different phonemes in speech data could dispense with the coefficient modifiers, such as 16, and replace them with switches. Full positive feedback is then maintained for a short time, causing rapid convergence in the isolation of the formants.
- The recursive filters, such as 13, can be switched out of the network for the remainder of the duration of the phoneme.
- Statistically significant changes in the coefficient values appearing in the buffers 17, 18, 19 and 20 can be used to detect the onset of a new phoneme and so cause the recursive filters to become operative again. In this way a phoneme-synchronous speech recognition device can be constructed, avoiding the need for time-axis normalization.
- The simplest embodiment of the invention comprises the PEF estimator of Fig. 2.
- This device alone is unsuited to the analysis of spectrally complex signals such as speech. It can, however, be used for the recognition of simpler signals, such as the signalling tones used in telephone switching systems. In this way a single device can distinguish between a wide variety of tones when the input (line 1 in Fig. 2) is fed from an appropriate line in the telephone system.
- The coefficients $C_1$ and $C_2$ generated by the device are then compared with predefined values. This comparison classifies the incoming signal and causes the remainder of the telephone system to take appropriate action.
- An analogue embodiment of the device, using fixed delays in place of shift registers, would be better suited to this application.
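The factorization of the PEF polynomial into real binomial (second-order) factors can be illustrated numerically. Equation (7) is not reproduced on this page, so the sketch below rests on assumptions. It writes the polynomial as `1 + g1*z^-1 + ... + gn*z^-n` and each real factor as `1 + a1*z^-1 + a2*z^-2`. The function name `second_order_factors` and this sign convention are illustrative, not the patent's notation.

The sketch finds the roots with NumPy and turns each complex-conjugate pair into one real second-order factor. Any remaining real roots are paired up. For a conjugate pair, `a2` is the squared root radius, so factors with `a2` close to 1 have roots near the unit circle and mark spectral peaks.

```python
import numpy as np


def second_order_factors(g, tol=1e-9):
    """Split 1 + g[1]*z^-1 + ... + g[n]*z^-n (with g[0] == 1) into real
    factors 1 + a1*z^-1 + a2*z^-2, returned as a list of (a1, a2) pairs.

    Assumed convention for illustration; not the patent's equation (7)."""
    roots = np.roots(g)  # zeros of z^n + g1*z^(n-1) + ... + gn
    upper = [r for r in roots if r.imag > tol]  # one root from each conjugate pair
    reals = sorted(r.real for r in roots if abs(r.imag) <= tol)

    # (1 - r z^-1)(1 - conj(r) z^-1) = 1 - 2*Re(r) z^-1 + |r|^2 z^-2
    factors = [(-2.0 * r.real, abs(r) ** 2) for r in upper]

    # Pair the real roots: (1 - p z^-1)(1 - q z^-1) = 1 - (p+q) z^-1 + p*q z^-2.
    # An odd one out becomes a first-order factor (q = 0, so a2 = 0).
    for i in range(0, len(reals), 2):
        p = reals[i]
        q = reals[i + 1] if i + 1 < len(reals) else 0.0
        factors.append((-(p + q), p * q))
    return factors
```

For example, multiplying out `(1 - 1.6z^-1 + 0.81z^-2)(1 + 0.5z^-1 + 0.64z^-2)` with `np.polymul` and passing the result to `second_order_factors` should recover the pairs (-1.6, 0.81) and (0.5, 0.64), in some order. The first pair has root radius 0.9 and so corresponds to the sharper spectral peak.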
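The cross-coupled two-stream scheme with its deliberate asymmetry can be sketched on a block of samples rather than in real time. This is a minimal sketch under several assumptions, since the patent's equations (32) and (33) are not reproduced here:

- The second-order PEF is taken as `y[i] = x[i] + c1*x[i-1] + c2*x[i-2]`, with `(c1, c2)` found by solving the 2x2 autocorrelation normal equations in closed form.
- The "coefficient modifier" is modelled as a hypothetical factor `rho` that scales c1 by rho and c2 by rho², pulling the recursive filter's poles inward.
- The iteration count is arbitrary.

Stream B is the original data filtered by stream A's coefficients. Stream A is the original data filtered by stream B's coefficients and then by a recursive filter built from stream A's own coefficients, which supplies the asymmetry.

```python
import math

import numpy as np


def pef2(x):
    """Second-order PEF (c1, c2) for y[i] = x[i] + c1*x[i-1] + c2*x[i-2],
    from the 2x2 autocorrelation normal equations solved in closed form."""
    n = len(x)
    r0 = np.dot(x, x) / n
    r1 = np.dot(x[:-1], x[1:]) / n
    r2 = np.dot(x[:-2], x[2:]) / n
    det = r0 * r0 - r1 * r1
    return r1 * (r2 - r0) / det, (r1 * r1 - r0 * r2) / det


def fir(x, c):
    """Non-recursive filter: y[i] = x[i] + c1*x[i-1] + c2*x[i-2] (zero history)."""
    return np.convolve(x, [1.0, c[0], c[1]])[: len(x)]


def allpole(x, c):
    """Recursive filter: y[i] = x[i] - c1*y[i-1] - c2*y[i-2] (zero initial state)."""
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = xi - c[0] * y1 - c[1] * y2
        y[i] = yi
        y2, y1 = y1, yi
    return y


def centre_freq(c):
    """Normalised frequency (cycles/sample) of the root pair of 1 + c1*z^-1 + c2*z^-2."""
    if c[1] <= 0.0:
        return float("nan")
    cos_t = max(-1.0, min(1.0, -c[0] / (2.0 * math.sqrt(c[1]))))
    return math.acos(cos_t) / (2.0 * math.pi)


def two_stream(x, iterations=10, rho=0.9):
    """Cross-coupled A/B streams; returns the final (c_A, c_B) coefficient pairs."""
    x = np.asarray(x, dtype=float)
    a = b = x
    for _ in range(iterations):
        ca, cb = pef2(a), pef2(b)
        # Stream B: original data with A's peak filtered out by A's coefficients.
        b = fir(x, ca)
        # Hypothetical coefficient modifier: moves the poles from radius |p| to rho*|p|.
        ca_mod = (rho * ca[0], rho * rho * ca[1])
        # Stream A: B's peak filtered out, then A's own peak boosted recursively.
        a = allpole(fir(x, cb), ca_mod)
    return pef2(a), pef2(b)
```

Consider a test signal with a strong component at 0.05 cycles/sample and a weaker one at 0.2 cycles/sample. Here the A-stream coefficients should settle near the stronger peak and the B-stream coefficients near the weaker one. Moving `rho` towards 1 approaches the full-positive-feedback case, at the cost of a more sharply resonant recursive filter.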
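The exact relation between a coefficient pair and a formant's centre frequency and bandwidth (equation (34)) is not reproduced on this page. A common way to read these off, assumed here, treats `(C1, C2)` as the factor `1 + C1*z^-1 + C2*z^-2` with a complex root pair `r*exp(±j*theta)`. Then `C2 = r**2` and `C1 = -2*r*cos(theta)`. The centre frequency is `theta*fs/(2*pi)`, and `-ln(r)*fs/pi` is a standard approximation to the 3 dB bandwidth.

Under this reading, a small `C2` means a small root radius and hence a broad peak, consistent with the acceptance criterion described for phoneme recognition. The threshold of 0.5 is a hypothetical value chosen only for illustration.

```python
import math


def formant_from_pef(c1, c2, fs):
    """Return (centre_hz, bandwidth_hz) for the factor 1 + c1*z^-1 + c2*z^-2,
    or None when its roots are real (no resonant peak).

    Assumes the standard pole-pair relations, not the patent's equation (34)."""
    if c2 <= 0.0 or c1 * c1 >= 4.0 * c2:
        return None
    r = math.sqrt(c2)                   # root radius
    theta = math.acos(-c1 / (2.0 * r))  # root angle, radians per sample
    centre = theta * fs / (2.0 * math.pi)
    bandwidth = -math.log(r) * fs / math.pi  # approximate 3 dB bandwidth
    return centre, bandwidth


def accept_formant(c2, threshold=0.5):
    """Reject peaks whose C2 falls below a (hypothetical) threshold as too broad."""
    return c2 >= threshold
```

For instance, a pole pair at radius 0.95 and angle pi/4 with `fs = 8000` gives a centre frequency of 1000 Hz and a bandwidth of roughly 131 Hz.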
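The tone-recognition use of the single PEF estimator, comparing `C1` and `C2` with predefined values, can be sketched as a nearest-neighbour lookup. This is a minimal sketch. It assumes the second-order PEF convention `y[i] = x[i] + c1*x[i-1] + c2*x[i-2]` with autocorrelation estimates. The tone names, frequencies, 8 kHz sampling rate and distance limit are all hypothetical, not values from the patent or from any telephone signalling standard. For a pure tone at angular frequency `w`, the PEF roots lie on the unit circle, so the reference pair is `(-2*cos(w), 1)`.

```python
import math


def pef2(x):
    """Second-order PEF (c1, c2) for y[i] = x[i] + c1*x[i-1] + c2*x[i-2],
    solving the 2x2 autocorrelation normal equations in closed form."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(3)]
    det = r[0] * r[0] - r[1] * r[1]
    if det <= 0.0:
        raise ValueError("degenerate input (e.g. silence)")
    c1 = r[1] * (r[2] - r[0]) / det
    c2 = (r[1] * r[1] - r[0] * r[2]) / det
    return c1, c2


def reference(freq_hz, fs):
    """Ideal (c1, c2) for a pure tone: root pair exactly on the unit circle."""
    return -2.0 * math.cos(2.0 * math.pi * freq_hz / fs), 1.0


def classify(x, table, max_dist=0.05):
    """Return the key whose reference (c1, c2) is nearest the estimate,
    or None if even the nearest entry is farther away than max_dist."""
    c1, c2 = pef2(x)
    best, best_d = None, float("inf")
    for name, (ref1, ref2) in table.items():
        d = math.hypot(c1 - ref1, c2 - ref2)
        if d < best_d:
            best, best_d = name, d
    return best if best_d <= max_dist else None


# Hypothetical tone table at a hypothetical 8 kHz sampling rate.
FS = 8000
TONES = {"A": reference(400, FS), "B": reference(1000, FS), "C": reference(2000, FS)}
```

With this table, 800 samples of a 1000 Hz tone should map to entry `"B"`. A 3000 Hz tone lies far from every reference pair and is reported as unrecognized (`None`).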
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPE360580 | 1980-05-19 | ||
AU3605/80 | 1980-05-19 | ||
AU5556/80 | 1980-09-12 | ||
AUPE555680 | 1980-09-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0052120A1 (en) | 1982-05-26 |
EP0052120A4 (en) | 1983-12-09 |
Family
ID=25642381
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19810901295 (EP0052120A4, withdrawn) | 1980-05-19 | 1981-05-18 | IMPROVEMENTS IN SIGNAL PROCESSING. |
Country Status (5)
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3296374A (en) * | 1963-06-28 | 1967-01-03 | Ibm | Speech analyzing system |
US3327057A (en) * | 1963-11-08 | 1967-06-20 | Bell Telephone Labor Inc | Speech analysis |
US3335225A (en) * | 1964-02-20 | 1967-08-08 | Melpar Inc | Formant period tracker |
US3369076A (en) * | 1964-05-18 | 1968-02-13 | Ibm | Formant locating system |
US3649765A (en) * | 1969-10-29 | 1972-03-14 | Bell Telephone Labor Inc | Speech analyzer-synthesizer system employing improved formant extractor |
US4032711A (en) * | 1975-12-31 | 1977-06-28 | Bell Telephone Laboratories, Incorporated | Speaker recognition arrangement |
US4092493A (en) * | 1976-11-30 | 1978-05-30 | Bell Telephone Laboratories, Incorporated | Speech recognition system |
US4220819A (en) * | 1979-03-30 | 1980-09-02 | Bell Telephone Laboratories, Incorporated | Residual excited predictive speech coding system |
1981
- 1981-05-18 JP JP56501583A patent/JPS57500901A/ja active Pending
- 1981-05-18 WO PCT/AU1981/000060 patent/WO1981003392A1/en not_active Application Discontinuation
- 1981-05-18 BR BR8108616A patent/BR8108616A/pt unknown
- 1981-05-18 EP EP19810901295 patent/EP0052120A4/en not_active Withdrawn
1982
- 1982-01-19 DK DK21282A patent/DK21282A/da not_active Application Discontinuation
Non-Patent Citations (2)
Title |
---|
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-26, no. 6, December 1978, NEW YORK (US) * |
See also references of WO8103392A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO1981003392A1 (en) | 1981-11-26 |
BR8108616A (pt) | 1982-04-06 |
EP0052120A1 (en) | 1982-05-26 |
DK21282A (da) | 1982-01-19 |
JPS57500901A (ja) | 1982-05-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chang et al. | Analysis of conjugate gradient algorithms for adaptive filtering | |
Le Roux et al. | A fixed point computation of partial correlation coefficients | |
Lee et al. | Blind source separation of real world signals | |
Lim et al. | A new algorithm for two-dimensional maximum entropy power spectrum estimation | |
EP1070390B1 (en) | Convolutive blind source separation using a multiple decorrelation method | |
Rabiner et al. | Applications of a nonlinear smoothing algorithm to speech processing | |
Heinonen et al. | FIR-median hybrid filters with predictive FIR substructures | |
Trancoso et al. | Efficient procedures for finding the optimum innovation in stochastic coders | |
Stapleton et al. | Adaptive noise cancellation for a class of nonlinear, dynamic reference channels | |
CA1172362A (en) | Continuous speech recognition method | |
US20050228518A1 (en) | Filter set for frequency analysis | |
Luo et al. | Ultra-lightweight speech separation via group communication | |
Robinson | Logical convolution and discrete Walsh and Fourier power spectra | |
GB2107100A (en) | Continuous speech recognition | |
EP0182989B1 (en) | Normalization of speech signals | |
NL7812151A (nl) | Werkwijze en inrichting voor het bepalen van de toon- hoogte in menselijke spraak. | |
Sakuma et al. | MLP-based architecture with variable length input for automatic speech recognition | |
Friedlander et al. | Least squares algorithms for adaptive linear-phase filtering | |
Kaveh et al. | An optimum tapered Burg algorithm for linear prediction and spectral analysis | |
Südholt et al. | Pruning deep neural network models of guitar distortion effects | |
Laakso et al. | Energy-based effective length of the impulse response of a recursive filter | |
JPS6356560B2 | | |
WO2003015272A1 (en) | Method and apparatus for providing an error characterization estimate of an impulse response derived using least squares | |
WO1981003392A1 (en) | Improvements in signal processing | |
US4161625A (en) | Method for determining the fundamental frequency of a voice signal |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Designated state(s): AT CH DE FR GB NL SE |
17P | Request for examination filed | Effective date: 19820524 |
STAA | Information on the status of an EP patent application or granted EP patent | STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
18W | Application withdrawn | Withdrawal date: 19840525 |