WO1986000133A1 - Adaptive speech detector system

Info

Publication number: WO1986000133A1
Application number: PCT/AU1985/000121
Authority: WO
Inventors: David Spalding
Original assignee: Plessey Australia Pty. Limited
Priority date: 1984-06-08
Filing date: 1985-06-06
Publication date: 1986-01-03
Also published as: AU4439385A; CA1227573A; EP0186671A1; EP0186671A4; NZ212331A; JPS61502368A; AU584904B2

Abstract

A speech detector circuit for selectively transmitting speech signals while suppressing ambient sound related signals comprises an input from a microphone producing a speech signal and an input from a microphone producing a noise signal. The circuit includes a comparator which compares the speech signal with peak values of the noise signal to produce a pulse signal which indicates when the speech signal exceeds a threshold and a discriminator responsive to the pulse signal to produce a detected speech signal when the pulse signal occurs within a selected period which is spaced from a preceding pulse signal by a selected period.

Description

ADAPTIVE SPEECH DETECTOR SYSTEM

TECHNICAL FIELD OF THE INVENTION

This invention relates to speech detector systems. The invention is particularly suitable for use in combination with a transmitter for voice communication which is equipped with a dual-microphone amplifier system arranged so as selectively to transmit speech signals whilst suppressing ambient-sound-related signals.

There are a number of applications for speech detector systems. For example, wireless transmitters, especially of the portable type, are commonly required to have a VOX facility, that is, automatic keying of the transmitter in response to speech signals.

BACKGROUND ART

In previously-known VOX systems which utilise a single microphone channel, it has been difficult to combine effective recognition of the operator's speech with adequate rejection of unwanted background noise or speech. Various techniques have been employed to improve the voice discrimination, including band-separation filtering and the use of noise-cancelling microphones coupled with automatic gain control, abbreviated as "AGC". However, where a wide amplitude-dynamic-range is achieved by the use of AGC it is difficult to avoid false-triggering of the VOX circuit. In addition, the AGC system tends to amplify background noise signals and circuit noise to an objectionable level during periods when the speech is interrupted. This is especially evident when a transmitter is switched to a manual-keying or press-to-talk mode, sometimes abbreviated as "PTT".

A system which has been used successfully to overcome the foregoing disadvantages of VOX and AGC systems comprises a two-channel dual-microphone arrangement in which one microphone receives the operator's speech, superimposed on ambient noise, and the other principally receives ambient noise. Not only is noise cancellation possible, over at least the lower part of the frequency range, but the ambient noise signal may be used to control the speech channel gain in the absence of speech and, by this method, prevent the noise signal at the speech channel output from rising above, for example, a level 10 dB below the nominal speech signal output level. A significant advantage of this system when combined with a speech detector is that, at least in the steady state, there is a well-defined difference in speech and noise levels which is easily discriminated in a simple comparator circuit.

Some constraints and disadvantages of the dual-microphone system are:

A time delay must be included in the speech detector response to take account of the AGC attack time which can be, for example, up to 5 ms.

The two microphones must be closely matched in sensitivity and frequency response.

Direct noise cancellation is usually only possible at frequencies up to about 500 Hz because of the transit time difference for a sound pressure wave travelling to each of the two microphones, this varying with the direction of the sound source.

There is a variation of the automatic-gain-controlled output level of either channel with input signal level, depending on the loop gain of the AGC loop.

There is an uncertainty in the output reference level of the AGC system because of production tolerances in components.

There is a need for precise tracking, or equality of gain, in the control elements of the speech and noise channels.

There is an uncertainty in the speech detector reference level owing to production tolerances in components.

A number of the constraints and disadvantages listed above can produce effects which are additive. This can give rise to a substantial error in the effective speech detector threshold. Even with the inclusion in the circuit of a speech detector threshold adjustment means to take account of static mismatch with the preceding microphone and AGC amplifier system, dynamic uncertainties may still give rise to false-triggering by the speech detector.

An object of the present invention is to provide circuit means which at least ameliorate some of the above disadvantages of the prior art and which in preferred embodiments reduce if not eliminate the effects of variations in AGC levels, mismatch between speech detector and AGC reference levels, tracking errors of AGC elements and transient signals during the AGC attack time. In principle this is achieved by combining the functions of AGC and speech detection to eliminate mismatch occurring in separate circuits and, further, by dynamically comparing the detected speech level with the detected noise level instead of with a constant threshold.

DISCLOSURE OF INVENTION

This invention consists in a speech detector system comprising means for producing a first (speech) signal representative of speech superimposed on ambient noise and a second (noise) signal representative of said ambient noise; comparator means to determine a speech signal threshold according to peak values of said second signal and to produce a third (pulse) signal which provides an indication of when said first signal exceeds said speech signal threshold; and discriminator means responsive to an indication by said third signal to produce a fourth signal indicating detected speech when an indication occurs within a second selected period which is spaced from a preceding indication by a first selected period.

For preference the speech signal and noise signal are subjected to automatic gain control and noise cancellation prior to input to the comparison means. For preference also the ratio of the noise cancelled speech signal to the noise signal fed to the comparison means is greater than 1:1.

Other aspects of the invention will be apparent from the description which follows.

BRIEF DESCRIPTION OF DRAWINGS

An embodiment of the invention will now be described by way of example only with reference to the accompanying drawings wherein:

Figure 1 is a schematic block diagram showing part of a wireless transmitter speech detector system having a two channel audio system with speech and noise inputs.

Figure 2 shows a first circuit suitable for use as the peak comparator shown as a block in Fig. 1.

Figure 3 shows a circuit suitable for use as the pulse discriminator shown as a block in Fig. 1.

Figure 4 shows a preferred circuit which combines the peak comparator and pulse discriminator in a single circuit.

Figure 5 shows a further preferred circuit which combines the peak comparator and pulse discriminator.

BEST MODE FOR CARRYING OUT INVENTION

The block diagram of Figure 1 shows schematically a wireless transmitter speech detector system having a two channel audio system with speech and noise inputs, and AGC. The rectified speech and noise signals are compared to detect the presence of speech.

The AGC system operates in substantially the known manner to control the peak noise level at the output of the noise channel to a predetermined value equal to V_REF/10. The high-frequency components of noise in the speech channel, which are not removed by the low-frequency noise-cancellation circuit, will generally be of similar amplitude to the output of the noise channel. The foregoing control mode is over-ridden when the level of signal in the speech channel output exceeds V_REF, thus suppressing the noise further and enabling an effective signal-to-noise ratio for speech of 10 dB to be maintained in noisy environments.

It has been found that, in a practical system, it is desirable to have the VOX control switched at a speech-to-noise ratio of approximately 6dB, based on peak values. In previously-known speech detectors it has been usual, therefore, to detect, by means of a comparator amplifier, the instances when a signal in the speech channel exceeds a predetermined value of

In the present invention, the peak value of the rectified noise-channel signal is itself used, after amplification, as the comparator reference as shown in Fig. 2.
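The difference between a fixed comparator reference and one derived from the noise peak can be illustrated with a little arithmetic. The following is a minimal sketch only, assuming the usual 20·log10 convention for peak-amplitude ratios; the function names and the default gain ratio `n` are hypothetical and are not taken from the specification.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a peak-amplitude ratio (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio):
    """Convert a peak-amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


def adaptive_reference(noise_peak, n=2.0):
    """Hypothetical helper: comparator reference taken as n times the detected noise peak."""
    return n * noise_peak


# A 6 dB speech-to-noise margin corresponds to roughly a factor of two in peak amplitude.
margin = db_to_amplitude_ratio(6.0)

# The reference follows the noise level instead of staying at one preset value.
references = [adaptive_reference(peak) for peak in (0.05, 0.1, 0.2)]
```

With a fixed reference the effective margin changes whenever the AGC output level drifts; with the noise-derived reference the margin stays at the chosen ratio.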

Figure 2 shows a peak comparator circuit for use in the arrangement shown in Figure 1. Rectified speech signal I_S, representative of speech superimposed on ambient noise (a first signal), and rectified noise signal nI_N, representative of ambient noise (a second signal), are fed to the inputs as shown after amplification by the amplifiers shown in Figure 1, where n is the ratio of the amplification of the noise channel to the amplification of the speech channel.

The current signals I_S and nI_N respectively generate voltages V_S and V_N across the input resistors R1 and R2.

Voltage V_N generated by signal nI_N appears at the base of transistor Q1 which has its collector connected to a positive power supply V. The emitter of transistor Q1 is connected to an R-C combination of resistor R4 and capacitor C2. Transistor Q1, capacitor C2 and resistor R4 comprise peak detecting means. The transistor Q1 operates as a voltage follower when V_N is greater than the voltage on capacitor C2 and is switched off when V_N is less than the voltage on capacitor C2. Current to charge capacitor C2 is drawn from supply V. The charging current is therefore independent of the current drawn by resistor R2, thereby avoiding response lag. Resistor R4 is in parallel with capacitor C2 and the decay time constant R4C2 is long enough for acceptable smoothing of random noise.

Voltage V_S generated by signal I_S charges capacitor C1 through diode D1 only when the voltage stored across capacitor C1 is less than V_S, that is, when a positive pulse or peak occurs in the speech signal I_S. The time constant of resistor R1 and capacitor C1 is short enough to allow C1 to be charged by peaks in I_S corresponding to speech "glottal" pulses, but long enough to filter out transient noise. Resistor R3 is in parallel with capacitor C1 and its value sets the decay time of the charge on capacitor C1. The decay time constant of resistor R3 and capacitor C1 is made equal to the time constant of resistor R4 and capacitor C2 to provide good dynamic tracking.

The voltages appearing across capacitors C1 and C2 are fed to the inputs of a differential amplifier OP1 which produces a positive output when the voltage across C1 exceeds the voltage across C2 and a negative output in the reverse situation. That is, the voltage across capacitor C2 constitutes a speech signal threshold and the output of amplifier OP1 comprises a third signal which provides an indication of when the speech signal exceeds the speech signal threshold.

The rectified noise signal amplification in Figure 1 is usually twice the rectified speech signal amplification so that a positive peak in the speech signal of at least twice the level of ambient noise is required to produce a positive output from differential amplifier OP1.
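The behaviour of the peak comparator can be approximated in discrete time. The sketch below is an idealised model, not the circuit itself: the sample interval, the attack and decay time constants and the function name are all assumed values chosen for illustration, diode and base-emitter drops are ignored, and the noise peak hold is treated as an ideal follower.

```python
def peak_comparator(speech, noise, n=2.0, dt=1e-4, tau_attack=2e-4, tau_decay=2e-2):
    """Idealised discrete-time model of a two-channel peak comparator.

    speech, noise: sequences of rectified (non-negative) samples.
    n: assumed ratio of noise-channel to speech-channel amplification.
    Returns a list of booleans, True where the held speech peak exceeds
    the held, n-times amplified noise peak.
    """
    decay = 1.0 - dt / tau_decay        # same decay factor on both holds for tracking
    attack = min(1.0, dt / tau_attack)  # finite charging rate of the speech hold
    v_c1 = 0.0  # speech peak hold (stands in for the voltage on C1)
    v_c2 = 0.0  # noise peak hold (stands in for the voltage on C2)
    out = []
    for s, x in zip(speech, noise):
        # Noise hold: follows the amplified input at once when it is higher,
        # otherwise decays through the parallel resistor.
        v_c2 = max(v_c2 * decay, n * x)
        # Speech hold: decays, then charges towards the input only on peaks.
        v_c1 *= decay
        if s > v_c1:
            v_c1 += (s - v_c1) * attack
        out.append(v_c1 > v_c2)
    return out


# Speech at 1.5 times the noise level stays below the threshold of twice the noise.
quiet = peak_comparator([0.15] * 50, [0.1] * 50)
# Speech at ten times the noise level exceeds it.
loud = peak_comparator([1.0] * 50, [0.1] * 50)
```

Giving both holds the same decay factor mirrors the equal time constants described above, so that the two stored voltages fall together when both inputs drop.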

Fig. 3 shows one kind of pulse discriminator circuit, the principal purpose of which is to generate an output VOX control pulse only if a series of speech pulses is received but not when a single pulse or short burst of pulses is received. This is achieved by discriminating between glottal pulses comprising speech, which typically occur at 6-8 ms intervals, and single pulses or short bursts of pulses separated by less than 3 ms generated by noise. In this way the system provides immunity to transient noise such as that caused by an impact, which is typically too fast for AGC response to be effective.

In the system shown, a bistable latch is used to generate a VOX control or "speech detected" signal. When a trigger pulse is received from the peak comparator, a timer comprising a double pulse generator generates two control pulses, A and B, with a first selected period T1 and a second selected period T2 respectively. Control pulse A and the trigger pulse from the peak comparator are fed to an AND gate. The output of the AND gate is connected to the SET input of the bistable latch. A delay is included in the trigger pulse connection to prevent a trigger pulse reaching the AND gate before control pulse A. In this way control pulse A inhibits the setting of the bistable latch by either the initial trigger pulse or any subsequent pulse occurring within the period T1, typically 5 ms. However, a trigger pulse occurring within the period T2 is able to set the bistable latch as well as re-starting the timing period T2 of control pulse B. The latter function ensures that the output VOX control pulse has at least a period equal to the initial or minimum value of T2, 10 ms for example. This period is determined by the requirements of any ensuing transmitter circuit. At the end of the period T2 after the last trigger pulse, the bistable latch is reset and the control pulse A trigger is enabled.
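The timing logic just described can be expressed as a small event-driven model. This is a sketch under stated assumptions: trigger pulses are treated as instantaneous events, a trigger arriving while control pulse A is active is assumed to be ignored entirely (the description does not say whether it restarts T2), and the function name and interval representation are hypothetical.

```python
def pulse_discriminator(trigger_times_ms, t1_ms=5.0, t2_ms=10.0):
    """Model of the double-pulse timer and bistable latch.

    trigger_times_ms: times (ms) of trigger pulses from the peak comparator.
    Returns a list of (set_time, reset_time) intervals during which the
    'speech detected' latch is set.
    """
    intervals = []
    a_end = None   # end of control pulse A (inhibit window, period T1)
    b_end = None   # end of control pulse B (accept window, period T2)
    set_at = None  # time the latch was set, or None while it is reset
    for t in sorted(trigger_times_ms):
        if b_end is not None and t >= b_end:
            # T2 ran out after the last trigger: reset latch, re-enable A.
            if set_at is not None:
                intervals.append((set_at, b_end))
            a_end = b_end = set_at = None
        if b_end is None:
            # Timer idle: this trigger starts A and B but cannot set the latch.
            a_end = t + t1_ms
            b_end = t + t2_ms
        elif t >= a_end:
            # Outside A but inside B: set the latch and restart T2.
            if set_at is None:
                set_at = t
            b_end = t + t2_ms
        # Otherwise the trigger falls inside A and is inhibited (assumption).
    if set_at is not None:
        intervals.append((set_at, b_end))
    return intervals


speech_like = pulse_discriminator([0.0, 7.0, 14.0, 21.0])  # glottal spacing of 7 ms
impact_like = pulse_discriminator([0.0, 1.0, 2.0, 3.0])    # short burst inside T1
```

In this model the pulse train at 7 ms spacing sets the latch on its second pulse and holds it until T2 expires after the last pulse, while the short burst never sets it.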

Fig. 4 shows a system which combines the peak comparator and pulse-discriminator functions in one circuit, resulting in a substantial saving in components which can be important in the envisaged applications.

The transistors Q1 and Q2 act as peak detecting means for the noise and speech rectified inputs respectively, and because transistors Q1 and Q2 share a common connection to ground via the parallel combination of resistor R4 and a storage element comprising capacitor C2, the collector-emitter current of either transistor on input signal peaks is dependent on the previous peak value to which capacitor C2 has been charged. By suitable choice of the decay time of capacitor C2 via resistor R4, the transistor pair Q1, Q2 can therefore also act to suppress second and subsequent peaks separated by less than the normal "glottal" period.

This is achieved because the total charge flowing into the collector of transistor Q2 during a detected speech pulse input I_S is dependent on the instantaneous charge of capacitor C2, which is arranged to decay via resistor R4. It follows that, if a rapid burst of pulses of similar amplitude occurs in I_S, capacitor C2 will charge up rapidly and only one or two large pulses of charge will flow into the collector of transistor Q2, followed by small charge pulses which are sufficient to keep capacitor C2 charged. In contrast, speech signals typically have large impulses, or glottal pulses, spaced apart at 6-8 ms intervals with smaller amplitude pulses in between. By suitable choice of decay time, the charge on capacitor C2 decays enough for each glottal pulse to produce a large current impulse in the collector lead of transistor Q2. It is therefore quite a simple matter to discriminate between speech and either continuous or impact noise by integrating the collector charge impulses and comparing this voltage with a suitable reference value using a Schmitt trigger circuit.
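The dependence of each collector charge impulse on the charge remaining on the shared capacitor can be shown with a simple model. The sketch below is an idealisation: it considers the speech input only, takes each impulse as the amount by which a pulse exceeds the decayed capacitor voltage, ignores base-emitter drops, and uses an assumed decay time constant of 4 ms; the function name is hypothetical.

```python
import math


def collector_impulses(pulses, tau_ms=4.0):
    """Approximate the charge impulses delivered on each input pulse.

    pulses: list of (time_ms, amplitude) in time order.
    tau_ms: assumed decay time constant of the shared storage capacitor.
    Returns the impulse size for each pulse, in arbitrary units.
    """
    v_c2 = 0.0      # voltage held on the shared capacitor
    last_t = None
    impulses = []
    for t, amp in pulses:
        if last_t is not None:
            # Exponential decay through the parallel resistor since the last pulse.
            v_c2 *= math.exp(-(t - last_t) / tau_ms)
        # Only the part of the pulse above the stored voltage draws charge.
        impulses.append(max(0.0, amp - v_c2))
        v_c2 = max(v_c2, amp)
        last_t = t
    return impulses


burst = collector_impulses([(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0)])  # 1 ms spacing
glottal = collector_impulses([(0, 1.0), (7, 1.0), (14, 1.0)])          # 7 ms spacing
```

With these assumed values the burst yields one full-size impulse followed by impulses of about a fifth of that size, whereas every pulse at glottal spacing yields an impulse of more than four fifths of full size.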

Integration is achieved in the circuit of Figure 4 by means of the R-C combination comprising resistor R5 and capacitor C3. Charge impulses from the collector lead of transistor Q2 result in a voltage appearing across capacitor C3 with respect to constant voltage supply V. The reference voltage is chosen so that it is exceeded by the voltage across C3 when three consecutive pulses occur in the collector lead of transistor Q2 without substantial decay of the charge on capacitor C3 between pulses. This is achieved by suitable choice of resistor R5 so that "staircase" integration of current pulses is obtained for pulses occurring within a selected period.

The Schmitt trigger is preferred to a comparator, for example, to ensure clean switching and an output pulse width in excess of some arbitrary value, dependent on the needs of ensuing circuitry. This is related partly to the decay time-constant of capacitor C3 which is controlled by resistor R5. The decay time-constant is principally determined by the need for recovery between typical impact-noise occurrences, but must be long enough for effective "staircase" integration of glottal pulses. A resistor R6 is included in the collector lead of transistor Q2 to limit the amplitude of charge impulses in the event of the AGC system being overloaded momentarily by a large impact-noise signal, thus ensuring that effective discrimination is maintained.
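The "staircase" integration and Schmitt trigger action can be sketched numerically. The model below is illustrative only: each charge impulse is treated as an instantaneous voltage step on the integrating capacitor, and the 20 ms decay time constant, the step sizes and the two switching thresholds are assumed values chosen so that the third pulse of a glottal-spaced train crosses the upper threshold.

```python
import math


def staircase_schmitt(impulses, tau_ms=20.0, v_on=1.8, v_off=1.0):
    """Leaky integration of charge impulses followed by a Schmitt trigger.

    impulses: list of (time_ms, step_volts) in time order.
    Returns a list of (time_ms, voltage, state) evaluated after each impulse.
    """
    v = 0.0
    state = False
    last_t = None
    trace = []
    for t, step in impulses:
        if last_t is not None:
            # The integrator voltage leaks away between impulses.
            v *= math.exp(-(t - last_t) / tau_ms)
        if state and v < v_off:
            state = False   # released once the voltage has fallen below the lower threshold
        v += step           # one stair of the staircase
        if not state and v > v_on:
            state = True    # switched on above the upper threshold
        trace.append((t, v, state))
        last_t = t
    return trace


# Three large impulses at glottal spacing climb past the upper threshold.
speech_trace = staircase_schmitt([(0, 0.9), (7, 0.9), (14, 0.9)])
# One large impulse followed by small ones, as from an impact, does not.
impact_trace = staircase_schmitt([(0, 0.9), (1, 0.2), (2, 0.2), (3, 0.2)])
```

The gap between the two thresholds gives the hysteresis that keeps the output from chattering as the integrator voltage decays between pulses.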

As will be apparent to those skilled in the art, the amplification of detected noise input I_N by a greater factor n than speech input I_S further enhances the discrimination of the circuit by, firstly, generating a comparator threshold proportional to the ambient noise level and thereby reducing the frequency of Q2 collector impulses resulting from random noise and, secondly, because of the previously-discussed properties of the two-microphone system, permitting operator speech to be distinguished from more distant "ambient" speech, the effectiveness of the approach being improved because of the reduced reliance on AGC control accuracy.

Figure 5 shows a further circuit combining the peak comparator and pulse discriminator functions. The circuit operates in substantially the same manner as the circuit of Figure 4; however, the transistors Q4 and Q5 act as a current mirror in a known manner to produce a current flowing through the collector of transistor Q5 proportional to the collector current of transistor Q2. This allows the R-C combination of resistor R5 and capacitor C3 to be connected to ground, thereby eliminating the need for maintaining the voltage supply V in Figure 4 constant. It will be apparent that resistor R5 and capacitor C3 effectively integrate charge impulses flowing from the collector of transistor Q5 in substantially the same manner as described above to control the voltage at the input of Schmitt trigger S1, which generates a VOX control or "speech detected" signal when a predetermined threshold voltage is reached.

The amplitude of charge impulses from the collector of Q5 is limited by the shunting action of Q3 in response to the voltage developed across R7 by the current flowing into Q2 collector.

As will be apparent to those skilled in the art, the invention hereof may be embodied in other circuits which function in an equivalent or analogous manner and such embodiments are within the scope hereof.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:-

1. A speech detector system comprising means for producing a first (speech) signal representative of speech superimposed on ambient noise and a second (noise) signal representative of said ambient noise; comparator means to determine a speech signal threshold according to peak values of said second signal and to produce a third (pulse) signal which provides an indication of when said first signal exceeds said speech signal threshold; and discriminator means responsive to an indication by said third signal to produce a fourth signal indicating detected speech when an indication occurs within a second selected period which is spaced from a preceding indication by a first selected period.

2. A speech detector system as claimed in claim 1 wherein said speech signal threshold is determined from said second signal by peak detecting means including a storage element.

3. A speech detector system as claimed in claim 2 wherein said first signal is compared with said speech signal threshold by means of a differential amplifier, the output of the differential amplifier producing said third signal.

4. A speech detector system as claimed in claim 2 wherein said comparator means comprises a three terminal amplifier to generate said third signal, said amplifier having the terminal common to input and output connected to the storage element of said peak detector means.

5. A speech detector system as claimed in claim 4 wherein current flow through the output terminal of said three terminal amplifier comprises said third signal.

6. A speech detector system as claimed in claim 5 wherein said 3-terminal amplifier is a transistor and said terminal common to input and output is the emitter.

7. A speech detector system as claimed in claim 4 or claim 5 further comprising means to limit said third signal amplitude.

8. A speech detector system as claimed in claim 5 or 6 further comprising a current mirror to produce a mirrored current flow proportional to said third signal current flow and wherein said discriminator means is responsive to the mirrored current flow.

9. A speech detector as claimed in claim 8 further comprising transistor current shunting means to limit said mirrored current flow.

10. A speech detector system as claimed in any one of claims 1 to 9 wherein said discriminator means comprises a control pulse generator which in response to an indication in said third signal generates a first control pulse having a duration equal to said first selected period and a second control pulse having a duration equal to said second selected period, said first control pulse acting to inhibit the production of a fourth signal during the duration thereof and said second control pulse acting to permit production of a fourth signal only in response to pulse indications in said third signal occurring during the duration of said second control pulse.

11. A speech detector system as claimed in claim 10 wherein a bistable latch generates said fourth signal and said first and second control pulses act to control operation of the bistable latch by said third signal.

12. A speech detector system as claimed in any one of claims 1 to 9 wherein said discriminator means comprises an integrator the output of which controls production of said fourth signal, said second selected period being determined by the integration constant.

13. A speech detector system as claimed in claim 12 wherein the output of the integrator is compared with a reference voltage to produce said fourth signal.

14. A speech detector system as claimed in any one of claims 11 to 13 wherein the integrator comprises a capacitor and the integration constant is determined by the decay time of the capacitor through a parallel connected resistance.

15. A speech detector system as claimed in any of claims 1 to 14 wherein said first signal and said second signal are amplified and the second signal is amplified by a greater amount than the first signal.

16. A speech detector system substantially as herein described with reference to Figures 1, 2 and 3, or Figures 1 and 4, or Figures 1 and 5 of the accompanying drawings.