WO2005027541A1

WO2005027541A1 - Signal-processing methods and apparatus

Info

Publication number: WO2005027541A1
Application number: PCT/GB2004/003929
Authority: WO
Inventors: Fredrik Gustaf Rodin
Original assignee: Fulcrum Systems Limited
Priority date: 2003-09-15
Filing date: 2004-09-15
Publication date: 2005-03-24
Also published as: GB2406745A; GB0321565D0; GB2406745B; GB0420544D0

Abstract

Processing of a received telephone-signal to limit sound pressure from an earpiece (3) involves frequency-dependent limitation (7) after DTMF-tone suppression (5, 6). The energies (SyN) of synthesised frequency-components (1-N) of the sampled signal are computed within the limiter (7) from weighting coefficients (CN1, CN2) derived iteratively by an adaptive LMS filter (10) for the sine and cosine pairs of the synthesised components. The computed energies (SyN) after weighting (WN) according to characteristics of the earpiece (3) are combined (17) to give an estimate (ET) of the energy-content of the signal-sample. Comparison (19) of the estimate (ET) with itself before and after application of a non-linear function (18) produces an attenuation factor that is applied (20) to limit the signal supplied to the earpiece (3). DTMF-tone detection (6) uses LMS-filter synthesis (25) of the sample-signal at DTMF-tone frequencies, and sorting (27) of their computed energies (SY1-8). The presence of DTMF tones in the sample-signal is detected if signals (CS6-CS8) representing the three highest energies are found (31) to satisfy three criteria (28-30) attributable to the presence of DTMF-tone pairs.

Description

Signal-Processing Methods and Apparatus

This invention relates to signal-processing methods and apparatus.

The invention is concerned especially, though not exclusively, with signal-processing methods and apparatus for use in the context of telephony.

Extensive use is made of the telephone for calls to emergency and enquiry services, and the operators receiving such calls are very often subjected to high sound-pressure levels that can be injurious, or sounds that are otherwise of a nature to create undesirable stress. For example, amplification may have to be used in order to enable the operator to hear the caller, so that bursts of interference on the telephone line or wireless link may result in pain or even injury to the operator. Furthermore, for example, where the operator is communicating a telephone number to a caller, the caller may start to enter that number into their telephone with the result that the operator is subjected to the stress of hearing DTMF dial tones.

It is one of the objects of the present invention to provide signal-processing methods and apparatus that may be used to reduce significantly the likelihood of injury and stress in the above context.

According to one aspect of the present invention there is provided a method of processing an input signal having a multiplicity of different frequency components within a range of frequencies, wherein synthesis of the signal in a plurality of its components within the range is carried out to derive representations of the energy-content of the signal at the frequencies of those respective components, and wherein the energy-content representations are utilised as discriminants in the processing of the signal.

According to another aspect of the invention apparatus for processing an input signal having a multiplicity of different frequency components within a range of frequencies, comprises means for synthesis of the signal in a plurality of its components within the range, means for deriving representations of the energy-content of the signal at the frequencies of those respective components, and means for processing the input signal in dependence upon the derived energy-content representations.

The input signal of both aspects of the invention may be sampled repetitively and the synthesis of the input signal may be carried out in respect of each sample in turn to derive fresh energy-content representations in relation to each sample. Frequency analysis of digital-signal samples is currently performed within the framework of the discrete Fourier transform (DFT); the fast Fourier transform (FFT) is employed when computational efficiency is crucial, which is often the case in real-time applications. The DFT operates on blocks of input-signal samples, each referred to as a frame, and cannot operate sample by sample as in the case of the method and apparatus of the present invention.

The synthesis of the input signal in the method and apparatus of the present invention may involve RLS (recursive least squares) filtering of the input signal to derive coefficients related to the individual components. The filtering, which may be LMS (least mean squares) filtering, may involve quadrature-frequency signal pairs weighted individually in accordance with coefficients derived iteratively in the filtering, and the energy-content representations may each be derived as the square root of the sum of the squares of the two coefficients of each weighted signal pair.

The processing carried out by the method and apparatus of the invention may be, for example, for limiting the sound pressure level produced by an earphone or other electro-acoustic transducer that is driven by the input signal. As an alternative, or in addition, the method and apparatus of the invention may be applied to the detection and suppression of DTMF tones in a telephone system.

Signal-processing methods and apparatus according to the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a schematic representation of the signal-processing apparatus according to the invention;

Figure 2 is a schematic representation of a signal-limiter that forms part of the apparatus of Figure 1;

Figure 3 is illustrative of an adaptive filter that forms part of the signal-limiter of Figure 2;

Figure 4 is a schematic representation of a DTMF-tone detector that forms part of the apparatus of Figure 1; and

Figures 5 and 6 are illustrative of part of the operation of the DTMF-tone detector of Figure 4.

The operation of the signal-processing apparatus of Figure 1 is carried out by electronic processing circuits in accordance with software, but for ease of understanding its operation will be described in terms of separate stages with dedicated function. It is to be understood in this connection that it is not of the essence of the invention whether dedicated hardware or software-controlled apparatus is utilised.

Referring to Figure 1, the signal-processing apparatus 1 is connected as part of a telephone system between the received-signal output 2 of a land-line or wireless terminal and an electro-acoustic transducer 3 (for example an earphone of a telephone headset). The signal from the output 2 is supplied to an analogue-digital converter 4 so as to produce a digital signal y(n) with a sampling frequency f_s, that is supplied to a notch filter 5 used in the suppression of DTMF tones detected by a DTMF-tone detector 6. The output from the notch filter 5 is supplied via a limiter 7 to a digital-analogue converter 8 and thence to the transducer 3. The limiter 7 is responsive to the frequency spectrum of the signal-samples y(n) it receives from the filter 5 to limit the sound energy emitted by the transducer 3.

The limiter 7 has an effective structure as illustrated in Figure 2, and will now be described.

Referring to Figure 2, the limiter 7 includes a first energy-tracking section 9 that involves an LMS (least-mean-squares) filter stage 10. The filter stage 10 is used to synthesize the input sample y(n) in every time instant n as a sum of N equally-spaced narrow-band frequency components; the sum is identified as synthesized signal ŷ(n). Synthesis is achieved by choosing a mix of narrow-band components that minimizes the error (y(n) - ŷ(n)), so that the resulting synthesized signal ŷ(n) has a discrete power spectrum that is comparable with the continuous spectrum of the input signal y(n). An error e = y(n) - ŷ(n) is introduced since spectral information is lost in ŷ(n); however, the larger the value of N the smaller the error e.

The LMS filter stage 10 has the effective structure illustrated in Figure 3 where N (for example, thirteen) pairs of quadrature-frequency signals c_11 sin(2πf_11 n) and c_12 cos(2πf_12 n) to c_N1 sin(2πf_N1 n) and c_N2 cos(2πf_N2 n) are generated using respective pairs of weighting stages 11. The N pairs of quadrature-frequency signals generated are summed in a stage 12 and the result subtracted from the input sample signal y(n) in a comparator stage 13 to derive an instantaneous-error signal. This error signal is fed back to the stages 11 to adjust iteratively the weighting coefficients c_11 and c_12 to c_N1 and c_N2 applied to the pairs of quadrature-frequency signals respectively. The iterative adjustment is towards reducing the error signal to zero so that the signal derived by the stage 12 is a synthesized signal ŷ(n) approximating to the input signal y(n).
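The feedback loop of Figure 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `lms_synthesize`, the step size `mu`, the update rule `c += 2*mu*e*x` (the textbook LMS form) and the use of a single frequency per sine/cosine pair are all assumptions made for illustration.

```python
import math

def lms_synthesize(samples, freqs_hz, fs_hz, mu=0.01):
    """Track sine/cosine weights for each frequency, one update per sample.

    Hypothetical helper: names, step size and update rule are assumptions.
    """
    n_comp = len(freqs_hz)
    c_sin = [0.0] * n_comp   # c_k1 weights (sine branch)
    c_cos = [0.0] * n_comp   # c_k2 weights (cosine branch)
    errors = []
    for n, y in enumerate(samples):
        s = [math.sin(2 * math.pi * f * n / fs_hz) for f in freqs_hz]
        c = [math.cos(2 * math.pi * f * n / fs_hz) for f in freqs_hz]
        # Stage 12: sum of the weighted quadrature pairs (synthesized sample).
        y_hat = sum(c_sin[k] * s[k] + c_cos[k] * c[k] for k in range(n_comp))
        # Stage 13: instantaneous error, fed back to the weighting stages 11.
        e = y - y_hat
        for k in range(n_comp):
            c_sin[k] += 2 * mu * e * s[k]
            c_cos[k] += 2 * mu * e * c[k]
        errors.append(e)
    return c_sin, c_cos, errors
```

Because the weights are updated once per sample, an up-to-date set of coefficients is available at every time instant n, without waiting for a frame of samples to accumulate.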

Returning to consideration of Figure 2, signals in accordance with the adjusted pairs of weighting coefficients c_11 and c_12 to c_N1 and c_N2 in stages 11, are applied within the energy-tracking section 9 of unit 7 to respective computation stages 13. The stages 13 derive N signals Sy_1 to Sy_N representative of the energy content of the N frequency-component signals respectively, of the synthesized signal ŷ(n). In this regard each energy-content signal Sy_1 to Sy_N is computed as the square root of the sum of the squares of the respective pair of weighting coefficients c_11 and c_12 to c_N1 and c_N2. The phase information involved in each pair of frequency-component signals is not significant to the approximation provided in the energy-signals Sy_1 to Sy_N.
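The remark about phase can be checked numerically. The sketch below, with hypothetical helper names, uses the identity A sin(ωn + φ) = (A cos φ) sin(ωn) + (A sin φ) cos(ωn) to show that the square root of the sum of the squares of a coefficient pair returns the amplitude A whatever the phase φ.

```python
import math

def component_energy(c_k1, c_k2):
    """Energy-content signal for one quadrature pair: sqrt(c_k1^2 + c_k2^2)."""
    return math.hypot(c_k1, c_k2)

def weights_for_tone(amplitude, phase_rad):
    """Ideal sine/cosine weights for a tone of given amplitude and phase."""
    # A*sin(wn + phase) = (A cos phase) sin(wn) + (A sin phase) cos(wn)
    return amplitude * math.cos(phase_rad), amplitude * math.sin(phase_rad)

# The same amplitude at three different phases yields the same energy signal.
energies = [component_energy(*weights_for_tone(0.8, p)) for p in (0.0, 1.0, 2.5)]
```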

The signals Sy_1 to Sy_N are supplied from the energy-tracking section 9 of the limiter 7 to a second, frequency-weighting section 14 which comprises N weighting stages 15. The weighting stages 15 impose weighting coefficients w_1 to w_N on the signals Sy_1 to Sy_N respectively, and pass the signals as so weighted, on to a third, attenuation-computing section 16 of the limiter 7.

In the third section 16, a computing stage 17 derives the square root of the sum of the squares of the weighted signals Sy_1.w_1 to Sy_N.w_N, as an estimate E_T of the total energy content of the signal y(n). This estimate is made up of the sum of approximations to the energy content of each discrete frequency component of the signal y(n) as adjusted by its respective weighting coefficient w_1 to w_N. By setting these coefficients w_1 to w_N in accordance with the frequency-dependent transfer function of the transducer 3, a correlation can be achieved between the energy estimate E_T and the frequency-dependent sound pressure that would be produced by the transducer 3 if fed with the synthesized signal ŷ(n).
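The computation of stage 17 can be written out as a short Python sketch. The function name and the example weights are hypothetical; real weights would come from calibration against the transducer.

```python
import math

def total_energy_estimate(energies, weights):
    """E_T = sqrt(sum_k (Sy_k * w_k)^2), as described for stage 17."""
    if len(energies) != len(weights):
        raise ValueError("one weight per frequency component is required")
    return math.sqrt(sum((s * w) ** 2 for s, w in zip(energies, weights)))

# Illustrative values only: a weight of 0 removes a component from the
# estimate, a weight above 1 makes the limiter more sensitive to it.
e_t_flat = total_energy_estimate([3.0, 4.0], [1.0, 1.0])
e_t_shaped = total_energy_estimate([3.0, 4.0], [2.0, 0.0])
```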

A signal representative of E_T is supplied from stage 17 as an input to a non-linear stage 18 within the section 16. The stage 18 operates according to a non-linear function to produce an output signal representative of E_OUT. The non-linear function imposes a maximum magnitude of (a + b) on the signal-representation of E_OUT, with E_OUT related proportionately to E_T for energy-magnitudes of E_T less than or equal to magnitude b, but of reducing proportionality to E_T as the magnitude of E_T increases above magnitude b. The transfer function of stage 18 is as represented at (1) in the annexed sheet of formulae.

The signal-representation of E_OUT is compared with the signal-representation of E_T in a comparator stage 19 to derive an attenuation factor. This factor is applied via a fourth section 20 of the limiter 7, to the input signal y(n) before this latter signal is supplied to the converter 8 for application to the transducer 3.
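Formula (1) is not reproduced in this text, so the following Python sketch substitutes an assumed function that merely satisfies the stated properties: E_OUT equals E_T up to b, grows ever more slowly above b, and never exceeds (a + b). The exponential knee, the function names and the use of the ratio E_OUT / E_T as the attenuation factor are all assumptions, not the patent's formula.

```python
import math

def soft_limit(e_t, a, b):
    """Assumed stand-in for the non-linear stage 18 (not formula (1))."""
    if e_t <= b:
        return e_t                      # proportional region
    # Above b the output approaches, but never exceeds, a + b.
    return b + a * (1.0 - math.exp(-(e_t - b) / a))

def attenuation_factor(e_t, a, b):
    """Assumed comparison of stage 19: the ratio E_OUT / E_T."""
    if e_t <= 0.0:
        return 1.0                      # nothing to attenuate
    return soft_limit(e_t, a, b) / e_t

def limit_sample(y, e_t, a, b):
    """Section 20: scale the input sample by the attenuation factor."""
    return y * attenuation_factor(e_t, a, b)
```

With this choice the factor is exactly 1 while the estimate stays at or below b, so quiet speech passes unchanged, and it falls towards (a + b) / E_T for very large estimates.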

The essence of the frequency-dependent limiting carried out by the limiter 7 is the sample-by-sample spectral analysis provided by the power spectrum Sy_1 to Sy_N. The LMS adaptive filter stage 10 minimizes the error e between signals y(n) and ŷ(n) for every time-instant n according to the formula (2) with respect to the matrix (3), of the annexed sheet. The desired accuracy (that is to say, the maximal tolerated mean error) is dependent on the number of iterations of the LMS filter stage 10 that take place, and for which there is repeated updating of the matrix (3), for every time instant n.

Tuning and calibration of the limiter 7 with respect to the specific transducer 3 is necessary before use. The weighting coefficients w_1 to w_N and limit-maximum (a + b) are set by sweeping a narrow-band tone through the frequencies covered in the bandwidth of the input signal (for example, 200 to 4000 Hz). The resulting sound pressure at the transducer 3 is logged by a sound analyzer, measuring the maximum sound-pressure level in 1/3 octave bands. Through superposition, and assuming that the input signal is a stationary sinusoid with variance greater than (a + b), the output of the transducer 3 is set to a limiting level for every discrete frequency by first adjusting the overall maximum (a + b) and then each weighting coefficient w_1 to w_N individually.

The transmission of DTMF tones to an operator's headset is a principal cause of nuisance and even injury, and although the limiter 7 is capable of suppressing such tones to some extent, the fast and efficient suppression of them is achieved in the apparatus of Figure 1 via the notch filter 5. The notch filter 5 acts to block DTMF tones that are detected by the DTMF-tone detector 6. The detector 6 and its mode of operation will now be described with reference to Figure 4.

Referring to Figure 4, the DTMF-tone detector 6 receives signal y(n) containing speech and/or DTMF signal components. For every time instant n, the detector 6 classifies the content of the current sample y(n) as DTMF or non-DTMF, namely, including or not including, respectively, DTMF tones. The classification is made according to three functional blocks or stages 22, 23 and 24.

Stages 22 and 23 act together as a feature extraction engine, providing the classifier stage 24 with features significant for DTMF tones and for speech respectively. Block 24 makes a decision on the extracted features of y(n) and supplies a DTMF-tone index to the notch filter 5 if a DTMF tone is detected.

The method of detection is based on a-priori knowledge of the characteristic spectral shapes of speech and of pairs of narrow-band tones. The major energy-content of speech is concentrated in the lower end of the frequency domain of 200-2000 Hz. Also, speech is generally composed of a fundamental frequency f_i and a set of narrow-band energy components located at multiples (i.e. harmonics) of that fundamental frequency. Since these additional energy components are equally spaced within the frequency domain, the probability that more than one of them coincides with the eight standardised DTMF-tone frequencies F_i, i={1,8}, is small. Moreover, a signal containing a DTMF tone alone has an energy spectrum dominated by two narrow energy components, and a basic assumption in the present context is that the energy of all DTMF tones is larger than that of any harmonic-frequency component of speech. The operation of stages 22, 23 and 24 will now be described in more detail with reference to Figure 4.

Referring to Figure 4, stage 22 includes an LMS filter stage 25 that has the same effective structure as that illustrated in Figure 3 for generating pairs of quadrature-frequency signals. In this case, however, just eight pairs of quadrature-frequency signals c_11 sin(2πf_11 n) and c_12 cos(2πf_12 n) to c_81 sin(2πf_81 n) and c_82 cos(2πf_82 n) are generated using the respective pairs of weighting stages 11. The eight pairs of quadrature-frequency signals generated in this case have the same frequencies as the eight standard DTMF-tone frequencies respectively, and are summed in the stage 12. The resultant sum is subtracted from the input sample signal y(n) in the comparator stage 13 to derive an instantaneous-error signal. This error signal is fed back to the stages 11 to adjust iteratively the weighting coefficients c_11 and c_12 to c_81 and c_82 applied to the pairs of quadrature-frequency signals respectively. The iterative adjustment is towards reducing the error signal to zero so that the signal derived by the stage 12 is a synthesized signal ŷ(n) approximating to the DTMF-tone frequency components in input signal y(n).

Returning to consideration of Figure 4, signals in accordance with the adjusted pairs of weighting coefficients c_11 and c_12 to c_81 and c_82 are applied within section 22 of the detector apparatus 6 to respective computation stages 26. The stages 26 derive eight signals Sy_1 to Sy_8 representative respectively of the energy content of the eight DTMF-tone frequency components of the synthesized signal ŷ(n). In this regard each energy-content signal Sy_1 to Sy_8 is computed as the square root of the sum of the squares of the respective pair of weighting coefficients c_11 and c_12 to c_81 and c_82. The phase information involved in each pair of frequency-component signals is not significant to the approximation provided in the energy-signals Sy_1 to Sy_8.

The signals Sy_1 to Sy_8 are supplied from the first section 22 to a sorter 27 of the second section 23. The sorter 27 acts to sort the signals Sy_1 to Sy_8 in ascending order of magnitude, so as to produce representative signals c_s1 to c_s8 in that order. The three ordered signals c_s6 to c_s8 correspond to whichever three of the energy-signals Sy_1 to Sy_8 are of the highest magnitude, and are supplied for classification to the classifying stage 24.
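The action of the sorter 27 can be sketched in Python as follows. The function names are hypothetical, and carrying the original component index alongside each energy is an assumption made so that the frequencies behind c_s7 and c_s8 can be identified later.

```python
def sort_energies(energies):
    """Return (energy, original_index) pairs in ascending order of energy."""
    order = sorted(range(len(energies)), key=lambda k: energies[k])
    return [(energies[k], k) for k in order]

def top_three(energies):
    """Return the pairs playing the roles of c_s6, c_s7 and c_s8."""
    c_s6, c_s7, c_s8 = sort_energies(energies)[-3:]
    return c_s6, c_s7, c_s8

# Illustrative energies for the eight DTMF-frequency components.
example = [0.02, 0.5, 0.01, 0.03, 0.04, 0.6, 0.05, 0.015]
c_s6, c_s7, c_s8 = top_three(example)
```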

The classification carried out in stage 24 is based upon three criteria (represented by blocks 28 to 30) of which two (blocks 28 and 29) are derived from the spectral properties of speech and DTMF signals respectively. To visualise the concept of classification, the distribution of energy content within two typical sets of the signals c_s1 to c_s8 is illustrated in Figures 5 and 6. Figure 5 illustrates the energy distribution for the component frequencies where y(n) contains speech alone, showing that the highest energy is to be found in only one of the DTMF-tone frequencies. Figure 6 illustrates the corresponding energy distribution where y(n) contains DTMF tones; the presence of a pair of DTMF-tone frequencies is indicated by the high energy concentrated in two of the component frequencies.

The first criterion used for classification is applied in block 28 to determine whether:

(c_s7 - c_s6) / (c_s7 + c_s6) > A

where A has a value close to 1. If that criterion is satisfied, the signal y(n) is likely to contain a DTMF tone. In this latter case the block 28 signals a "1", whereas otherwise it signals a "0". The second criterion is applied in block 29 to determine whether:

(c_s8 - c_s7) / (c_s8 + c_s7) < B

where B has a value less than 1, and preferably 0.5. This criterion is based on the fact that where the signal y(n) contains speech the difference (c_s8 - c_s7) is likely to increase as some coefficients become excited by a harmonic speech-component (see Figure 5), and the sum (c_s8 + c_s7) will stay relatively small. On the other hand, if a DTMF-tone pair is received the value of the fraction (c_s8 - c_s7) / (c_s8 + c_s7) will be forced down. If the value of the fraction is less than B, the block 29 signals a "1"; if it is not, it signals a "0".
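The first two criteria can be sketched in Python as below. The threshold A = 0.9 is an assumed value chosen only because the text says A is close to 1; B = 0.5 is the preferred value given above. The function names and the handling of an all-zero input are also assumptions.

```python
def normalised_gap(larger, smaller):
    """(larger - smaller) / (larger + smaller), defined as 0 for silence."""
    total = larger + smaller
    return 0.0 if total == 0.0 else (larger - smaller) / total

def criterion_one(c_s6, c_s7, a_threshold=0.9):
    """Block 28: the second-highest energy stands well clear of the third."""
    return normalised_gap(c_s7, c_s6) > a_threshold

def criterion_two(c_s7, c_s8, b_threshold=0.5):
    """Block 29: the two highest energies are of comparable size."""
    return normalised_gap(c_s8, c_s7) < b_threshold
```

For a tone pair such as c_s6 = 0.01, c_s7 = 0.5, c_s8 = 0.55 both functions return True, whereas a speech-like set such as 0.1, 0.12, 0.6 fails both.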

The third criterion, applied in the block 30, is based on the consideration that DTMF tones are each generated as a sum of two narrow-band sinusoids, with frequencies determined by the DTMF matrix. This matrix has four rows and four columns, each associated with a particular frequency, with each matrix element F_ij, i={1,4}, j={5,8} corresponding to a DTMF tone.
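For reference, the standard DTMF matrix can be written as a small Python lookup. The frequency values and key layout are the standard DTMF assignments and are not listed in this text; treating F_1 to F_4 as the low (row) group and F_5 to F_8 as the high (column) group is an assumption, as are the names used.

```python
# Standard DTMF frequencies in Hz; the grouping into F_1..F_4 and F_5..F_8
# is an assumed mapping onto the indices used in the description.
LOW_GROUP_HZ = (697, 770, 852, 941)       # rows, taken as F_1 to F_4
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)  # columns, taken as F_5 to F_8
KEYPAD = ("123A", "456B", "789C", "*0#D")

def key_for(low_hz, high_hz):
    """Return the key whose tone pair is (low_hz, high_hz)."""
    row = LOW_GROUP_HZ.index(low_hz)
    col = HIGH_GROUP_HZ.index(high_hz)
    return KEYPAD[row][col]
```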

The third criterion exploits the property that if a DTMF tone is indicated by block 28 as likely to be present, one of the two signals c_s7 and c_s8 must correspond to F_i, i={1,4} and the other to F_j, j={5,8}. This criterion can be summarised as:

c_s7 ~ F_i → c_s8 ~ F_j,  c_s7 ~ F_j → c_s8 ~ F_i

and if satisfied indicates that the signal y(n) contains a DTMF tone. If the criterion is satisfied the block 30 signals a "1"; if it is not, it signals a "0". The signals from the blocks 28 to 30 are supplied to an AND gate 31. It is only in the event that each block 28 to 30 signals a "1", that the detector 6 signals to the notch filter 5 that the signal y(n) is classified DTMF; otherwise it signals that it is classified non-DTMF.
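The third criterion and the AND gate 31 can be sketched in Python as follows. The zero-based indexing (0 to 3 for F_1 to F_4, 4 to 7 for F_5 to F_8) and the function names are assumptions made for illustration.

```python
LOW_GROUP = range(0, 4)    # zero-based indices standing for F_1 to F_4
HIGH_GROUP = range(4, 8)   # zero-based indices standing for F_5 to F_8

def criterion_three(index_s7, index_s8):
    """Block 30: the two strongest components lie in different groups."""
    return ((index_s7 in LOW_GROUP and index_s8 in HIGH_GROUP) or
            (index_s7 in HIGH_GROUP and index_s8 in LOW_GROUP))

def classify_dtmf(flag_28, flag_29, flag_30):
    """AND gate 31: classified DTMF only if all three blocks signal a 1."""
    return flag_28 and flag_29 and flag_30
```

Two strong components from the same group, for example indices 1 and 2, fail the third criterion even when the first two criteria are met, so the sample is classified non-DTMF.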

The notch filter 5 responds to the condition in which the signal y(n) is classified DTMF, to suppress from that signal as applied to the limiter 7, frequencies corresponding to those represented by the coefficients c_s7 and c_s8. Identification of those two frequencies is provided by signals supplied to the notch filter 5 from the sorter 27.

Although the system described above utilises LMS filters for stages 10 and 25, any filter from the RLS (recursive least squares) family may be used instead. However, the LMS filter has been found to provide high computational efficiency in the context of the described telephony system.

Formulae

Claims

1. A method of processing an input signal having a multiplicity of different frequency components within a range of frequencies, wherein synthesis of the signal in a plurality of its components within the range is carried out to derive representations of the energy-content of the signal at the frequencies of those respective components, and wherein the energy-content representations are utilised as discriminants in the processing of the signal.

2. A method according to Claim 1 wherein the input signal is sampled repetitively and the synthesis of the input signal is carried out in respect of each sample in turn to derive fresh energy-content representations in relation to each sample.

3. A method according to Claim 1 or Claim 2 wherein synthesis of the input signal in said plurality of components involves recursive least squares filtering of the signal to derive coefficients related to those individual components.

4. A method according to Claim 3 wherein the filtering involves quadrature-frequency signal pairs weighted individually in accordance with coefficients derived iteratively in the filtering, and the energy-content representations are each derived as the square root of the sum of the squares of the two coefficients of each weighted signal pair.

5. A method according to Claim 3 or Claim 4 wherein the filtering is LMS (least mean squares) filtering.

6. A method according to any one of Claims 1 to 5 wherein comparison is made between the square root of the sum of the squares of the energy-content representations, and the result of submitting that sum to a non-linear attenuation function, and wherein an attenuation factor for application to the input signal is derived from the result of the comparison.

7. A method according to Claim 6 for use in a telephony system, wherein the input signal contains speech components and is supplied to an electro-acoustic transducer via a limiter, and wherein the limiter is responsive to the attenuation factor to attenuate the input signal accordingly and thereby limit the sound pressure produced by the transducer.

8. A method according to Claim 6 or Claim 7 wherein the energy-content representations include weightings in accordance with the frequency-dependent transfer function of the transducer.

9. A method according to any one of Claims 1 to 5 wherein the energy-content representation representing the highest energy-content is selected, the selected representation is utilised in one or more comparisons between the energy-content representations, and processing of the input signal is carried out in dependence upon the results of the comparisons.

10. A method according to Claim 9 wherein the synthesis of the input signal is in more than three of its components, and wherein three energy-content representations representing the highest energy-content are selected, the selected representations are utilised in one or more comparisons between them, and processing of the input signal is carried out in dependence upon the results of the comparisons.

11. A method according to Claim 10 for use in a telephony system for suppressing DTMF tones in the input signal, the synthesis of the input signal is in the eight DTMF-tone component frequencies, and the processing of the input signal involves suppression of the DTMF-tone frequencies in the input signal in dependence upon the results of the comparisons.

12. Apparatus for processing an input signal having a multiplicity of different frequency components within a range of frequencies, comprising means for synthesis of the signal in a plurality of its components within the range, means for deriving representations of the energy-content of the signal at the frequencies of those respective components, and means for processing the input signal in dependence upon the derived energy-content representations.

13. Apparatus according to Claim 12 wherein the input signal is sampled repetitively and the synthesis of the input signal is carried out in respect of each sample in turn to derive fresh energy-content representations in relation to each sample.

14. Apparatus according to Claim 12 or Claim 13 wherein synthesis of the input signal in said plurality of components involves recursive least squares filtering of the signal to derive coefficients related to those individual components.

15. Apparatus according to Claim 14 wherein the filtering involves quadrature-frequency signal pairs weighted individually in accordance with coefficients derived iteratively in the filtering, and the energy-content representations are each derived as the square root of the sum of the squares of the two coefficients of each weighted signal pair.

16. Apparatus according to Claim 14 or Claim 15 wherein the filtering is LMS (least mean squares) filtering.

17. Apparatus according to any one of Claims 12 to 16 wherein comparison is made between the square root of the sum of the squares of the energy-content representations, and the result of submitting that sum to a non-linear attenuation function, and wherein an attenuation factor for application to the input signal is derived from the result of the comparison.

18. Apparatus according to Claim 17 for use in a telephony system, wherein the input signal contains speech components and is supplied to an electro-acoustic transducer via a limiter, and wherein the limiter is responsive to the attenuation factor to attenuate the input signal accordingly and thereby limit the sound pressure produced by the transducer.

19. Apparatus according to Claim 17 or Claim 18 wherein the energy-content representations include weightings in accordance with the frequency-dependent transfer function of the transducer.

20. Apparatus according to any one of Claims 12 to 16 wherein the energy-content representation representing the highest energy-content is selected, the selected representation is utilised in one or more comparisons between the energy-content representations, and processing of the input signal is carried out in dependence upon the results of the comparisons.

21. Apparatus according to Claim 20 wherein the synthesis of the input signal is in more than three of its components, and wherein three energy-content representations representing the highest energy-content are selected, the selected representations are utilised in one or more comparisons between them, and processing of the input signal is carried out in dependence upon the results of the comparisons.

22. Apparatus according to Claim 21 for use in a telephony system for suppressing DTMF tones in the input signal, the synthesis of the input signal is in the eight DTMF-tone component frequencies, and the processing of the input signal involves suppression of the DTMF-tone frequencies in the input signal in dependence upon the results of the comparisons.