CA1130920A - Speech detector with variable threshold - Google Patents

Speech detector with variable threshold


Publication number
CA1130920A (application CA343,335A)
William G. Crouse
Charles R. Knox
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority to US 017,791 (US1779179A)
Application filed by International Business Machines Corp
Publication of CA1130920A
Legal status: Expired



    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H04J3/00 Time-division multiplex systems
    • H04J3/17 Time-division multiplex systems in which the transmission channel allotted to a first user may be taken away and re-allotted to a second user if the first user becomes inactive, e.g. TASI
    • H04J3/175 Speech activity or inactivity detectors



Abstract

A speech activity detection circuit and method are described in which a digitized sample of the analog speech signal in a communications system is taken n times in a period of time t. The magnitude M of each sample is calculated and the largest value of M which is encountered over a given period t is stored as a peak magnitude M(t). This value is compared to a value M(t-1) which is the smallest value of M(t) from prior periods. The lesser of the two values is stored as the new value for M(t-1). The process is continued for i periods and at the end of the ith period, the value M(t-1) is multiplied by a constant k and the product is stored as the threshold value for the next i periods of time t. Thus, the largest signal sample magnitude over a period of time t becomes, upon multiplication by constant k, the candidate "standard" threshold level for determining whether speech exists. If this candidate is smaller than the previously stored value for a previous period of time, then the candidate becomes the new standard. In this manner, the threshold level effectively floats or varies with the conditions of the incoming signal.



Technical Field

This invention relates to analog or digital voice communications systems such as those usually employed in long distance communications using radio transmission channels. In particular, the invention is related to communications systems such as those used in satellite transmission schemes, speech storage systems and voice activity compression techniques in which speech signal detectors are utilized.

Prior Art

In communications systems employing satellites or using radio channel communication or speech storage, the capacity of the channel in terms of bandwidth, storage, or time is at a premium. In order to make the maximum use of the available capacity, techniques of voice activity compression are often used. Such techniques take advantage of the fact that during human speech, many periods of inactivity during which information is not conveyed exist. During such periods of time it is possible to transmit active portions of other conversations on the same channel, thereby increasing the actual usage of the channel and the effective transmission of more informative data.

Naturally, a mechanism for recognizing the periods during which speech is present is a primary requisite at the heart of any voice activity compression system. These mechanisms have been referred to in the prior art as "speech detectors" and that phrase will be used herein for the overall function and apparatus employed for this purpose.

Some prior art speech detectors are well known. These have been of two general types. The first type uses a technique of spectrum analysis in which the frequency spectrum of a received signal is analyzed in order to detect speech. The performance of such a detector is excellent, but it is expensive to implement and may not operate in real time because of the extensive digital or computer processing required in order to analyze the signal patterns, using Fourier transforms for example, in order to isolate those analog signals which contain speech data as compared to noise or inactivity.

The second type of prior art speech detector ordinarily used compares the magnitude of a received signal instantaneously in question to a fixed threshold value. If the signal magnitude exceeds the threshold it is classified as a speech signal. This type of detector works well provided that the background noise in the system is well below the threshold level and that speech segments which are delivered at low relative magnitude consistently exceed the threshold.

In environments such as those which use public switched telephone networks, the range of speech levels encountered overlaps the range of noise levels generally existing. For a given set of connections establishing a communication path, a given fixed threshold will be too high and low level speech signals will be lost. On other connections, the background noise will be at or near the threshold, which will cause the activity criteria indicative of speech signals on the channel to approach 100%, thereby over-dedicating the available channel capacity to a given set of communicators.
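The two failure modes of a fixed threshold can be illustrated with a small sketch. The sample values, the threshold of 50 and the helper name `activity_fraction` are hypothetical and chosen only for illustration; they do not come from the patent.

```python
def activity_fraction(magnitudes, threshold):
    """Fraction of samples a fixed-threshold detector classifies as speech."""
    flagged = sum(1 for m in magnitudes if m > threshold)
    return flagged / len(magnitudes)


FIXED_THRESHOLD = 50  # hypothetical fixed threshold, arbitrary units

# Connection A: quiet line (noise near 5) carrying low-level speech (peaks near 40).
quiet_line = [5, 4, 40, 38, 6, 35, 5, 4]
# Connection B: noisy line (noise between 55 and 60) with no speech at all.
noisy_line = [55, 58, 60, 57, 56, 59, 55, 58]

print(activity_fraction(quiet_line, FIXED_THRESHOLD))  # 0.0, the low-level speech is lost
print(activity_fraction(noisy_line, FIXED_THRESHOLD))  # 1.0, noise alone occupies the channel
```

On the quiet connection every speech sample falls below the threshold, while on the noisy connection every noise sample exceeds it, so no single fixed value serves both.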

A solution to the problems noted above is obviously to adjust the threshold level so that it is always just above the background noise level. This adjustment must be made when only noise is present on the line or in the channel. Therefore, the problem becomes one of determining how to distinguish between speech and background noise. This of course was the original problem for which the varying threshold solution was proposed and hence an effective means and method of carrying out this technique is necessarily iterative and continuous.

The prior art solutions just discussed have not inexpensively and effectively solved the voice activity detection problem in a manner easily adapted to the satellite communication network problems where channel capacity is at a premium and electronic processing capability is costly and/or limited.

Objects of the Invention

In view of the foregoing difficulties and shortcomings with the known prior art, it is an object of this invention to provide an improved voice activity detection apparatus and method which will effectively set a variable threshold for a comparison standard above which speech signals are defined and below which noise or inactivity is defined so that an effective allocation of channel capacity can be made.

A second object of the present invention is to provide an improved voice detection apparatus which operates in real time and effectively provides an accurate floating threshold level for controlling allocation of the channel capacity to various potential users.

Summary

The foregoing objects and others not enumerated are met in the present invention by providing a sampling circuit for the analog signal input. The sampling circuit is operated n times in a period of time t. The magnitude M of each sample so taken is measured or calculated and the largest value encountered during the period is stored as M(t). This value is compared to the previously stored value for a previous period of time t-1 which represents the smallest M(t) from the prior periods. The lesser of the two values is stored as a new value M(t-1). The process continues for i periods of time t. At the end of the ith period, the value stored is multiplied by a constant k and the product is stored as the threshold value for the next i periods of time and the value M(t-1) is set to a high level.
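The procedure summarized above can be sketched in batch form as follows. This is only an illustrative reading of the summary, not the patented circuit: the function name `next_threshold` and the use of positive infinity as the "high level" starting value are assumptions.

```python
def next_threshold(samples, n, i, k):
    """Return k times the smallest per-period peak magnitude over i periods of n samples."""
    if len(samples) < n * i:
        raise ValueError("need at least n * i samples")
    smallest_peak = float("inf")  # M(t-1), assumed to start at a 'high level'
    for period in range(i):
        window = samples[period * n:(period + 1) * n]
        peak = max(abs(x) for x in window)        # M(t): largest magnitude in this period
        smallest_peak = min(smallest_peak, peak)  # lesser of M(t) and M(t-1)
    return k * smallest_peak


# Three periods of four samples each; the per-period peaks are 30, 3 and 4.
samples = [1, -2, 30, -25, 2, -3, 1, 2, -4, 2, 1, 3]
print(next_threshold(samples, n=4, i=3, k=2.25))  # 6.75
```

The period containing the loud samples (peak 30) does not raise the threshold, because only the smallest peak, which is assumed to reflect background noise, is scaled by k.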

Brief Description of the Drawings

The present invention will now be described in greater detail with reference to illustrative drawings of a preferred embodiment thereof and a detailed specification in which:

Figure 1 is a schematic diagram of a generalized fixed threshold speech detection circuit.
Figure 2 is a schematic diagram of a variable threshold speech activity detection circuit as a preferred embodiment of the present invention.

Detailed Specification

Turning to Figure 1, a generalized speech detection apparatus and technique are illustrated. In Figure 1, an incoming signal having a value X at a given instant and a magnitude value or level of X² is compared against a fixed threshold value in a comparator as shown. When the value of the incoming signal magnitude exceeds that of the threshold, "speech" is defined to be present. The detection signal so produced can be used to allocate communication channel capacity to the transmission of data presented by the speech signal X. During periods when the incoming signal X has a magnitude less than the threshold value, the channel capacity may be allocated to other users.
In Figure 2, the preferred embodiment of the present invention, the generation apparatus for the variable threshold for comparison against the incoming signal level is shown. In Figure 2 an incoming signal and/or noise component of a signal X is sampled at a sampling frequency fs by a sampling switch schematically illustrated as 1 and driven at a sampling rate fs established by a clock 2. This sample is then converted to the absolute value in the magnitude circuit generally illustrated as 3. This produces an output identified as signal A which is, in the final analysis, compared in comparator 4 against a threshold level B in the same manner as illustrated in Figure 1 so that if the A signal exceeds that of B, speech signals are defined as detected.

However, the A signal output is also stored under certain conditions as the value M(t) in register 5. Whether the value A is stored in register 5 is determined by the comparator 6 which compares the signal level A against the previously existing value in register 5. If the value of A is greater than the previously existing value in register 5, it will be stored in register 5 as a new value. Register 5 is read out at the clocking rate fs established by clock 2 and is reset once every n clock periods as set by the divide by n circuit 7 with the pulse delay 8 included to assure that the reset signal will occur after the setting of register 9.

The value coming from register 5 may be stored as a threshold value M(t-1) in register 9 once every n clock periods under certain circumstances as follows.

The value coming from register 5 is compared in comparator 10 against the contents of register 9 as shown. If the new value coming from register 5 is less than that in register 9, an enable signal is given by comparator 10 to load the contents from register 5 into register 9 as shown. This occurs once every n cycles from clock 2 as established by the divide by n circuit 7 as shown. A divide by i circuit 11 counts the n periods developed by divide by n circuit 7 to control, through pulse delay 12, a reset of register 9. The reset of register 9 actually sets the contents to a high level so that the next occurring value from register 5 will be lower than the content of register 9.

The contents of register 9, which are read out once every ith period under the control of divide by i counter 11, are multiplied in multiplier 14 by a constant value k stored in a constant value register 13, and the product is stored in a threshold value register 15 once every (n x i)th clock period as shown. The threshold value stored in register 15 is sent as the "B" comparison signal to comparator 4 to determine whether speech signals are defined in a given sample. The value stored in register 15 is constantly present at the output and a new value is loaded into this register whenever the enable and the clock inputs go to a high level.
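The register-level behaviour of Figure 2 can be approximated by a sample-by-sample software sketch. Several details are assumptions made for illustration and are not stated in the specification: register 5 resets to zero, the "high level" is modelled as positive infinity, the initial threshold is positive infinity (so nothing is detected before the first update), and each sample is compared against the threshold before any update triggered by that sample. Following the abstract and summary, register 5 is taken to hold the largest magnitude of the current period. The class and attribute names are hypothetical.

```python
class VariableThresholdDetector:
    """Software sketch of the Figure 2 circuit; reset and initial values are assumptions."""

    def __init__(self, n, i, k, initial_threshold=float("inf"), high_level=float("inf")):
        self.n, self.i, self.k = n, i, k
        self.high_level = high_level
        self.reg5 = 0                    # register 5: peak of the current period, M(t)
        self.reg9 = high_level           # register 9: smallest peak so far, M(t-1)
        self.reg15 = initial_threshold   # register 15: threshold B
        self.sample_count = 0            # models divide by n circuit 7
        self.period_count = 0            # models divide by i circuit 11

    def step(self, x):
        """Process one sample and return True if it is classified as speech."""
        a = abs(x)                       # magnitude circuit 3 produces signal A
        is_speech = a > self.reg15       # comparator 4: A against B
        if a > self.reg5:                # comparator 6: keep the larger magnitude
            self.reg5 = a
        self.sample_count += 1
        if self.sample_count == self.n:  # end of a period of n samples
            self.sample_count = 0
            if self.reg5 < self.reg9:    # comparator 10: keep the smaller peak
                self.reg9 = self.reg5
            self.reg5 = 0                # register 5 reset after register 9 is set
            self.period_count += 1
            if self.period_count == self.i:          # end of the ith period
                self.period_count = 0
                self.reg15 = self.k * self.reg9      # multiplier 14 loads register 15
                self.reg9 = self.high_level          # register 9 reset to a high level
        return is_speech


detector = VariableThresholdDetector(n=4, i=3, k=2.25)
for sample in [1, -2, 30, -25, 2, -3, 1, 2, -4, 2, 1, 3]:
    detector.step(sample)
print(detector.reg15)      # 6.75
print(detector.step(10))   # True
print(detector.step(5))    # False
```

Keeping the peak and minimum in two running registers means no sample history needs to be stored, which matches the specification's emphasis on simple hardware.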

The actual values for the various timings and variables are not too critical. The following numbers have been experimentally determined in an implementation of the preferred embodiment of this detector:

n = 480, t = 15 milliseconds, i = 60, k = 2.25.
The following general rules can be used in choosing the values for the preferred embodiment as just given:

n/t should be greater than the Nyquist rate.
2t should be less than the syllabic rate of speech.
i x t should be much much greater than the syllabic period.



k ~ i/t where k has been found by experiment to produce the best result when it lies in a range 1.9 to 3.0, with the preferred value for the present embodiment being 2.25.
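As a quick check of the experimentally determined values, the derived timings can be worked out directly. Reading the ratio of n to t as the sampling rate is an interpretation made here, and the variable names are chosen for illustration only.

```python
n = 480      # samples per period
t = 0.015    # period length in seconds (15 milliseconds)
i = 60       # periods per threshold update
k = 2.25     # threshold multiplier

sampling_rate_hz = n / t      # about 32,000 samples per second
update_interval_s = i * t     # about 0.9 seconds between threshold updates
samples_per_update = n * i    # 28,800 samples examined per threshold update

print(round(sampling_rate_hz), round(update_interval_s, 3), samples_per_update)
print(1.9 <= k <= 3.0)  # the preferred k lies inside the stated range
```

With these values the threshold is refreshed a little more often than once per second, and each 15 millisecond period contributes one peak magnitude to the running minimum.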
This speech detection circuit and method have been implemented and tested and have been found to work exceptionally well in real time. The circuitry is simple and inexpensive and of small size so that it may be easily implemented in large scale integration. The detector described is unique since the threshold level is constantly adjusted to the proper value without resorting to spectrum analysis to differentiate between speech and noise signal levels. The hardware cost used is much less than that for spectrum analysis and it can be used in the product rather than just within the laboratory as spectrum analysis techniques often are.



The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. In a communications system having means for adjustably setting a detection threshold value for an incoming signal stream above which threshold value speech signals are detected and below which threshold value no speech signals are defined, a method for periodically adjusting and setting a new operative level for said detection threshold, comprising steps of:

instantaneously sampling the signal level of said incoming signal stream n times in a period of time t;

measuring the magnitude M of each said sample;

comparing the magnitude M of each successive said sample with the largest magnitude M of all the preceding said samples during said sampling period t;

storing the largest magnitude M encountered during said period of time t as the peak sample magnitude M(t);

comparing said value M(t) to a previous value M(t-1) which represents the smallest M(t) from prior periods of time t; and following said comparison, storing the lesser of said two magnitudes as the new value M(t-1);

repeating said foregoing steps for i periods of time t and then multiplying the latest existing value of M(t-1) by a constant k; and storing said product as said new operative threshold level for use during the next i periods of time t; and resetting said value M(t-1) to a level higher than that anticipated for any of the samples to be taken over the next said period of time t.
CA343,335A 1979-03-05 1980-01-09 Speech detector with variable threshold Expired CA1130920A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US1779179A 1979-03-05 1979-03-05
US017,791 1979-03-05

Publications (1)

Publication Number Publication Date
CA1130920A (en) 1982-08-31



Family Applications (1)

Application Number Title Priority Date Filing Date
CA343,335A Expired CA1130920A (en) 1979-03-05 1980-01-09 Speech detector with variable threshold

Country Status (4)

Country Link
EP (1) EP0015363B1 (en)
JP (1) JPS5853356B2 (en)
CA (1) CA1130920A (en)
DE (1) DE3064505D1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5850596A (en) * 1981-09-22 1983-03-25 Hitachi Ltd Voice information storage reproduction system
US4460808A (en) * 1982-08-23 1984-07-17 At&T Bell Laboratories Adaptive signal receiving method and apparatus
JPH0467200B2 (en) * 1983-01-27 1992-10-27 Ei Teii Ando Teii Tekunorojiizu Inc
EP0127718B1 (en) * 1983-06-07 1987-03-18 International Business Machines Corporation Process for activity detection in a voice transmission system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3520999A (en) * 1967-03-27 1970-07-21 Bell Telephone Labor Inc Digital speech detection system
US3896273A (en) * 1971-01-08 1975-07-22 Communications Satellite Corp Digital echo suppressor
US3832491A (en) * 1973-02-13 1974-08-27 Communications Satellite Corp Digital voice switch with an adaptive digitally-controlled threshold
DE2317718B2 (en) * 1973-04-09 1976-11-18 for producing two frequencies for remote equipment circuitry binaerzeichenuebertragung a frequency shift in telegraph or
US3882458A (en) * 1974-03-27 1975-05-06 Gen Electric Voice operated switch including apparatus for establishing a variable threshold noise level
FR2304899B1 (en) * 1975-03-20 1977-11-18 Cit Alcatel
US4008375A (en) * 1975-08-21 1977-02-15 Communications Satellite Corporation (Comsat) Digital voice switch for single or multiple channel applications
US4052568A (en) * 1976-04-23 1977-10-04 Communications Satellite Corporation Digital voice switch
US4028496A (en) * 1976-08-17 1977-06-07 Bell Telephone Laboratories, Incorporated Digital speech detector
JPS5947503B2 (en) * 1977-04-15 1984-11-19 Nippon Electric Co
JPS6013535B2 (en) * 1977-05-12 1985-04-08 Nippon Electric Co

Also Published As

Publication number Publication date
DE3064505D1 (en) 1983-09-15
JPS55118100A (en) 1980-09-10
CA1130920A1 (en)
JPS5853356B2 (en) 1983-11-29
EP0015363A1 (en) 1980-09-17
EP0015363B1 (en) 1983-08-10

Legal Events

Date Code Title Description
MKEX Expiry