CA1077627A

CA1077627A - Method and apparatus for speech detection on pcm multiplexed voice channels

Info

Publication number: CA1077627A
Application number: CA251,930A
Authority: CA
Inventors: Fouad Daaboul; Jean-Pierre Adoul
Original assignee: Universite de Sherbrooke
Current assignee: Universite de Sherbrooke
Priority date: 1976-05-06
Filing date: 1976-05-06
Publication date: 1980-05-13

Abstract

ABSTRACT OF THE DISCLOSURE

The disclosure herein describes a method and a system for speech detection on PCM multiplexed voice channels; for each channel, a decision is reached every M samples regarding the channel activity; in addition, the nature of speech is detected as: voiced (compact or non-compact) or unvoiced (fricative or non-fricative) when the channel is active; pure silence, white noise or echo when the channel is inactive. The decision is based on the joint value of the amplitude, zero crossing of the signal and zero crossing of the signal derivative.

Description

FIELD OF THE INVENTION
The present invention relates generally to PCM (Pulse Code Modulation) telecommunications and, more particularly, to speech detection for use in a Time Assignment Speech Interpolation system in which all the signals are expressed in PCM-coded form and on a time-division basis; such a system is known in the art as a PCM-TASI system.
BACKGROUND OF THE INVENTION
TASI systems are well-known and consist basically in increasing the number of signal sources that can be switched over a fixed number of transmission lines by connecting a talker and a listener only when the talker is actually speaking. One example of such a system is described in U.S.
patent No. 3,030,447 issued April 17, 1962 to Saal.
Most conventional detectors operate on the analog (non-digital) vocal signal and consist in computing the mean power value of the signal and in comparing this value with a pre-determined decision threshold. More recent systems consist in periodically sampling the amplitude of voice-frequency signals and in translating these amplitude values into digital form (see, for example, U.S. patent No. 3,712,959 granted Jan. 23, 1973 to Fariello and U.S. patent No. 3,832,491 granted Aug. 27, 1974 to Sciulli). However, the decision reached concerning the status of a voice channel is based only on the amplitude of the vocal signal and a distinction is made only between noise and silence.
In present detectors, there is a certain delay before the beginning of the identification of speech so as to prevent undesired pulse noises which could cause the unwanted activation of a transmission channel. This delay is required in order to ensure that the talker has really begun to speak and is an inverse function of the signal amplitude. This solution, while avoiding false activation, reduces the intelligibility of the message since there is a chopping of the consonants of low amplitude which, however, contain very useful information.
Indeed, the differences between the sounds "ta" and "da"
or "pa" and "ba" are condensed in the first milliseconds.
Furthermore, in presently known detectors, since consonants carry a lot of information and since they are of low amplitude, there is a tendency to consider as speech all signals having a relatively low amplitude. This results in considering as speech: white noises of various origins which are inherent to all transmission channels; and echoes, i.e. vowels of high amplitude which the other talker transmits and which, by interference, are present in the channel under consideration. These echoes are evidently reduced but have sufficient amplitude to cause a reactivation.
OBJECTS OF THE INVENTION
An object of the present invention is to provide a speech detection system that instantly recognizes the presence or absence of speech without being affected by random noises.
It is a further object of the present invention to provide a speech detection system whereby, when speech is detected, the actual nature of speech may be known.
It is still a further object of this invention to provide a speech detection system whereby, when no speech is present on a channel, the type of silence or noise may be known.
The present invention is concerned with a speech detection system which analyses in real time the digital vocal signal and which detects the presence or absence of speech. This system makes it possible to control a group of telephone channels on the basis of silences during conversations. The present system differs from prior systems by its capability of discriminating speech from what is not speech rather than discriminating noise from silence. The present speech detection system provides, at all times, information on the nature of the speech: voiced compact, voiced non-compact, and unvoiced. The system can thus instantly distinguish the presence of short consonants, thereby ensuring a greater intelligibility of the telephone transmission.
STATEMENT OF THE INVENTION
The present invention relates to a method of speech detection in a PCM multiplexed voice-channel system which comprises: processing a predetermined batch of consecutive PCM samples; sequentially computing a series of parameters during processing of the predetermined batch of consecutive PCM samples, the parameters relating to: the amplitude, the zero crossing, and the zero crossing of the derivative of the vocal signal; and determining the status of each channel from information received as a result of the computing of the parameters over the batch.
Whereas a certain delay is required in presently known detectors to avoid unwanted noises of short duration, such delay is no longer needed in the present system since the present system is capable of recognizing these noises.
Furthermore, white noises are now detected independently of their amplitude; this is based on a characteristic which distinguishes white noise from other spoken sounds.
With the present invention, the voiced and unvoiced signals are treated separately; this provides an immunity against echoes and the unvoiced signals are not affected by this immunity. Hence, a voiced signal of insufficient amplitude to be a legitimate voiced signal will immediately be identified as an echo; on the other hand, the system will remain extremely sensitive to unvoiced signals (consonants) even of lower amplitude than that of an echo.
BRIEF DESCRIPTION OF THE DRAWINGS
A preferred embodiment will now be described with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of the speech detector made in accordance with the present invention; and
Figure 2 is a schematic representation of the basic principle of the decision stage of the present invention.
DESCRIPTION OF A PREFERRED EMBODIMENT
The voice speech detector of the subject invention operates on PCM samples. Conventionally, the analog voice information is applied to a PCM device which performs a sampling, typically at an 8 kHz rate; each sample is subsequently converted into an 8 bit binary code. In accordance with the specific embodiment described herein, the 8 bit samples are received in sub-assembly I in Fig. 1. The logarithm of the amplitude of each sample is coded by an integer taken between -127 and +127 (with a double zero: -0 and +0 for symmetry purposes).
The detector of the present invention operates on N multiplexed voice channels. For channel n, the detector computes four parameters from a batch of M consecutive samples. Thus, as far as channel n is concerned, a new set of four parameters is available every M samples. For a particular channel, the parameters are the four positive integers defined as:
a: the sum of the absolute values of the M samples:
$$a = \sum_{j=1}^{M} |X_j|$$
where $X_j$ is the integer corresponding to the jth sample of the waveform;

zo: the number of zero crossings of the waveform, i.e. the number of sign changes between consecutive samples;
zl: considering the sequence of the M differences between consecutive samples (i.e. $\Delta_j = X_j - X_{j-1}$, for j = 1, 2, 3 ... M), zl represents the number of sign changes among these M differences; in the sequel, this sequence of differences will be referred to as the signal derivative;
d: zl minus zo.
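The four definitions above can be sketched in Python as follows. This is a minimal illustration and not the circuit of the embodiment: the function names, the `previous_sample` argument used to form the first difference of a batch, and the convention that a zero value counts as positive are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def sign_changes(values: Sequence[int]) -> int:
    """Count sign changes between consecutive values.

    Assumption: a zero is treated as positive, so only a switch between
    negative and non-negative values counts as a sign change.
    """
    count = 0
    for prev, cur in zip(values, values[1:]):
        if (prev < 0) != (cur < 0):
            count += 1
    return count


def batch_parameters(samples: Sequence[int],
                     previous_sample: int = 0) -> Tuple[int, int, int, int]:
    """Return (a, zo, zl, d) for one batch of M samples of one channel.

    previous_sample is the last sample of the preceding batch (hypothetical
    argument); it is only used to form the first of the M differences.
    """
    a = sum(abs(x) for x in samples)          # sum of absolute values
    zo = sign_changes(samples)                # zero crossings of the waveform

    diffs: List[int] = []
    prev = previous_sample
    for x in samples:
        diffs.append(x - prev)                # difference between consecutive samples
        prev = x
    zl = sign_changes(diffs)                  # zero crossings of the "derivative"

    d = zl - zo
    return a, zo, zl, d
```

For instance, `batch_parameters([1, 2, 3, 2, 1, 2])` gives a = 11, zo = 0, zl = 2 and d = 2: the waveform never changes sign, but its slope reverses twice.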
The status of channel n is decided on the sole basis of the four integers along with its previous status.
For each channel, there are two operating modes. First, there is the computation mode which consists in computing the values of a, zo, zl and d, which is done sequentially, as soon as the PCM samples arrive at the input of the speech detector. Secondly, there is the decision mode which consists in providing a decision at the end of a predetermined batch of M samples. However, in order to carry out these operations, the parameters a, zo, zl and d are truncated to become, respectively, A, Zo, Zl and D. The decision is then obtained by means of three memories. In the embodiment described, the same ROM memory of 256 binary inputs and 8 binary outputs is consecutively used three times; this memory is divided into three fields of 128, 64 and 64 binary inputs, respectively.
Figure 2 illustrates a schematic representation of the truncation of a, zo, zl and d into A, Zo, Zl and D.
A = 0, 1, 2, 3, 4, 5, 6, 7; it is the binary number corresponding to the three highest bits of the binary number (in 11 bits for M = 48) corresponding to Ma + δ1, wherein δ1 is a constant which makes it possible to optimize the information contained in A. For M = 48, for example, δ1 = -20; for another value of M, another value of δ1 must be determined in order to keep the equivalence between a and A as close as possible to that given in the following Table 1a.
TABLE 1a
a              A
a ≤ 4          0
5 ≤ a < 12     1
12 ≤ a < 28    2, 3, 4
28 ≤ a         5, 6, 7
This value δ1 may be made adjustable with the mean level of a talker measured over a few seconds. This makes the detector directly adaptable in amplitude, which may represent an advantage in certain applications.
Zo = 0, 1, 2, 3, ... 15 is the binary number corresponding to the four highest bits of the binary number (in 5 bits for M = 48) corresponding to zo + δ2. For M = 48, δ2 is equal to -[illegible]; for another value of M, another value of δ2 must be determined to satisfy the equivalence of Table 1b.
TABLE 1b
zo    Zo
[entries illegible]
Zl = 0, 1, 2, 3, ... 7 is the binary number corresponding to the three highest bits of the binary number (in 6 bits for M = 48) corresponding to zl + δ3. For M = 48, δ3 is equal to +6; for another value of M, another value of δ3 must be determined to satisfy Table 1c.


TABLE 1c
zl                                  Zl
zl < [illegible]                    0, 1, 2
[illegible] ≤ zl < [illegible]      3, 4
[illegible] ≤ zl                    5, 6, 7
D = 0, 1, 2, ... 7 is the binary number corresponding to the three highest bits of the binary number (in 4 bits for M = 48) corresponding to zl - zo.
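The "highest bits" truncation described for Zl and D can be illustrated with a short Python sketch. It covers only the two cases whose constants are legible in the text for M = 48 (zl plus an offset of +6 held in 6 bits, and zl - zo held in 4 bits). The function names are hypothetical, and clamping out-of-range values to the available word width is an assumption, since the text does not say how overflow is handled.

```python
def top_bits(value: int, total_bits: int, kept_bits: int) -> int:
    """Keep the kept_bits highest bits of an unsigned total_bits-wide number.

    Assumption: values outside 0 .. 2**total_bits - 1 are clamped to that
    range before truncation.
    """
    max_value = (1 << total_bits) - 1
    clamped = min(max(value, 0), max_value)
    return clamped >> (total_bits - kept_bits)


def truncate_zl(zl: int, offset: int = 6) -> int:
    """Zl: three highest bits of the 6-bit number zl + offset (offset +6 for M = 48)."""
    return top_bits(zl + offset, total_bits=6, kept_bits=3)


def truncate_d(d: int) -> int:
    """D: three highest bits of the 4-bit number zl - zo."""
    return top_bits(d, total_bits=4, kept_bits=3)
```

Under these assumptions the truncation amounts to an integer division: Zl is (zl + 6) // 8 and D is d // 2, so for example zl = 18 maps to Zl = 3.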
The four new integers are processed two by two.
The memory field #1, which receives inputs D and Zo, provides two output binary parameters R = 0, 1 and Z = 0, 1 as in Table 1d.
TABLE 1d
zo, zl or d                R
1.18 ≤ zl/zo ≤ 1.42        0
If not                     1
It should be noted that R is a function of the ratio zl/zo; this value is easily obtainable from the parameters d and zo, which are sufficiently approximated by D and Zo. In essence, R identifies the presence of white noise.
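Since d = zl - zo, the ratio can be written zl/zo = 1 + d/zo, which is why d and zo suffice to obtain it. A minimal Python sketch of the test on the ratio is given below. It works on the untruncated counts rather than on the ROM lookup of D and Zo used in the embodiment; the function name, the inclusive bounds and the handling of zo = 0 are assumptions.

```python
def ratio_flag(zo: int, zl: int, low: float = 1.18, high: float = 1.42) -> int:
    """Return R = 0 when zl/zo lies within [low, high], and R = 1 otherwise.

    Assumptions: the bounds are inclusive, and a batch with no zero crossing
    (zo == 0), for which the ratio is undefined, is treated as outside the band.
    """
    if zo <= 0:
        return 1
    ratio = zl / zo          # equivalently 1 + (zl - zo) / zo
    return 0 if low <= ratio <= high else 1
```

With zo = 20 and zl = 26 the ratio is 1.3 and the sketch returns 0; with zo = 10 and zl = 30 the ratio is 3.0 and it returns 1.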
The memory field #2, which receives inputs Zl and A, provides an output binary number AZ = 0, 1 ... 6 of 3 bits in accordance with Table 2.

TABLE 2
        0  1  2  3  4  5  6  7
0       0  0  0  0  0  0  0  0
3       2  2  2  5  5  6  5  6
4       2  2  2  5  5  6  6  6
        2  2  2  3  3  6  6  6
AZ = f(A, Zl)
The memory field #3 receives inputs K, R, Z and AZ (K and R being two binary parameters, the derivation of which will be described hereinbelow); it provides, first, an intermediate parameter K = 0, 1, the value of which with respect to the inputs is given in Table 3a:
TABLE 3a
                 AZ
Zo  R  K     0  1  2  3  4  5  6
0   0  0     1  1  0  0  0  0  0
0   0  1     1  1  0  0  0  0  0
0   1  0     1  1  0  0  0  0  1
1   0  0     1  0  0  0  0  0  0
1   1  1     1  1  1  1  1  1  1
K = f(Zo, R, K, AZ)
TABLE 3b
                 AZ
Zo  R  K     0  1  2  3  4  5  6
0   1  1     5  7  1  2  6  3  [illegible]
1   0  1     5  4  4  4  4  4  4
S = f(Zo, R, K, AZ)
On the other hand, memory field #3 provides the status information S = 1, 2, ... 7, the value of which with respect to the inputs is given in Table 3b. This status may be conveniently described by seven binary variables referenced V, CM, NV, FR, SL, WN and EC, which take the values of 0 or 1 according to Table 4a.
TABLE 4a
STATUS    OUTPUT INFORMATION              IDENTIFIED
NUMBER    CORRESPONDING TO STATUS         WAVEFORM
S         V  CM  NV  FR  SL  WN  EC       CHANNEL    TYPE OF SPEECH
1         1  1   0   0   0   0   0        active     Voiced compact
2         1  0   0   0   0   0   0        active     Voiced non-compact
3         0  0   1   0   0   0   0        active     Unvoiced, non-fricative
4         0  0   1   1   0   0   0        active     Unvoiced, fricative
5         0  0   0   0   1   0   0        passive    Silence
6         0  0   0   0   1   1   0        passive    White noise
7         0  0   0   0   1   0   1        passive    Echo
The subscript j is given to the parameters and to the decisions pertaining to the present batch of M samples, and j - 1, j - 2 to the preceding decisions. Therefore, the parameters R and K applied to the input of memory field #3 for batch j may be defined by the following logic equations, in which the right-hand terms are the values delivered by the memory: R (input) = Rj "or" Rj-1 and K (input) = Kj-1 "and" Kj-2 (where "and" and "or" are the operators of the Boolean logic).
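The two sequence tests can be sketched in Python as below, reading the equations as: the R value used for batch j is the "or" of the R values of batches j and j-1, and the K value used for batch j is the "and" of the K values of batches j-1 and j-2. The class and method names are hypothetical, and starting with all past values at 0 is an assumption.

```python
from collections import deque


class SequenceTests:
    """Per-channel history for the "or" test on R and the "and" test on K."""

    def __init__(self) -> None:
        self.prev_r = 0                              # R of batch j-1
        self.k_history = deque([0, 0], maxlen=2)     # [K of j-2, K of j-1]

    def inputs_for_batch(self, r_j: int) -> tuple:
        """Return the (R, K) pair to present to the decision stage for batch j."""
        r_in = r_j | self.prev_r                         # Rj "or" Rj-1
        k_in = self.k_history[0] & self.k_history[1]     # Kj-1 "and" Kj-2
        return r_in, k_in

    def record(self, r_j: int, k_j: int) -> None:
        """Store the values of batch j once the decision stage has produced Kj."""
        self.prev_r = r_j
        self.k_history.append(k_j)
```

Splitting the step into `inputs_for_batch` and `record` reflects the fact that Kj is itself an output of the decision stage, so it can only be stored after the inputs for batch j have been formed.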
The ultimate decision, Sj*, concerning the status of a channel after the analysis of batch j is given in Table 4b.
TABLE 4b
               Sj
Sj-1      1  2  3  4  5  6  7
3         1  2  3  3  5  6  7
5         1  2  3  4  5  6  7
          1  2  3  4  5  6  7
Sj* = f(Sj, Sj-1)
Sj* is a function of the status Sj given by the memory field #3 as well as of the status Sj-1 which was identified by the same memory for the preceding batch. Sj* is equal to Sj, except in a few cases where it is equal to Sj-1. These exceptions correspond to a minor refinement of the decision concerning the type of speech: voiced, compact or non-compact, or unvoiced, fricative or non-fricative.
Referring to Figure 1, the detector made in accordance with the present invention includes fifteen sub-assemblies which are referenced in Roman numerals. The output of a sub-assembly is referred to by its Roman numeral, followed by the subscript: 1, 2, 3 ... .
A description of each sub-assembly and of its function will now be given. Standard digital integrated circuits well known to the person skilled in this art may be used to perform these functions and a detailed description thereof is believed not to be necessary for a full understanding of the present invention.
SUB-ASSEMBLY I
This sub-assembly receives the PCM samples of the waveform which constitute the input to the detector and computes sequentially the differences corresponding to the derivative of the signal. The sequential operation of the speech detector makes it possible to keep in the memory of this sub-assembly only one PCM sample per channel and the sign of the derivative.
SUB-ASSEMBLY II
For each channel, this sub-assembly detects the zero crossings of the waveform by comparing the signs of two successive samples and computing the sum (zo) of a batch of M samples.
SUB-ASSEMBLY III
For each channel, this sub-assembly computes the difference (d) between the number of zero crossings (zo) of the signal and the number of zero crossings of the derivative (zl) for a batch of M samples.
SUB-ASSEMBLY IV
For each channel, this sub-assembly detects the zero crossings of the derivative of the signal by comparing the signs of two successive samples and computing the sum (zl) for a batch of M samples.

SUB-ASSEMBLY V
For each channel, this sub-assembly takes the absolute value of the amplitude of each sample of the signal and computes the sum (a) thereof for a batch of M samples.
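The sequential operation of sub-assemblies I to V can be pictured with the following Python sketch, which keeps only one previous sample and the sign of the previous difference per channel, and updates the running counts as each sample arrives. It is an illustration under assumptions, not the hardware described: the class name, the initial state (previous sample 0, previous difference non-negative), the treatment of zero as positive, and the choice to carry the previous sample across batch boundaries are all hypothetical.

```python
class ChannelAccumulator:
    """Running computation of a, zo, zl and d for one channel."""

    def __init__(self, batch_size: int = 48) -> None:
        self.batch_size = batch_size
        self.last_sample = 0              # the single stored PCM sample
        self.last_diff_negative = False   # stored sign of the derivative
        self._reset()

    def _reset(self) -> None:
        self.a = 0
        self.zo = 0
        self.zl = 0
        self.count = 0

    def push(self, sample: int):
        """Process one sample; return (a, zo, zl, d) when a batch completes, else None."""
        diff_negative = (sample - self.last_sample) < 0
        if (sample < 0) != (self.last_sample < 0):
            self.zo += 1                  # sign change of the waveform
        if diff_negative != self.last_diff_negative:
            self.zl += 1                  # sign change of the derivative
        self.a += abs(sample)             # running sum of absolute values

        self.last_sample = sample
        self.last_diff_negative = diff_negative
        self.count += 1
        if self.count < self.batch_size:
            return None
        result = (self.a, self.zo, self.zl, self.zl - self.zo)
        self._reset()
        return result
```

A detector for N multiplexed channels would hold one such accumulator per channel and feed each incoming sample to the accumulator of the channel it belongs to.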
SUB-ASSEMBLY VI
For each successive channel, this sub-assembly effects a quantification or truncation on zo, which comes from sub-assembly II and becomes Zo, and keeps it in memory with a format of 4 bits. It also effects a quantification on d which comes from sub-assembly III and becomes D, and keeps it in memory with a format of 3 bits. It further includes a one bit memory for the addressing of sub-assembly X.
SUB-ASSEMBLY VII
For each successive channel, this sub-assembly effects a quantification on zl, coming from sub-assembly IV, which becomes Zl, and keeps it in memory with a format of 3 bits. It also effects a quantification on a, coming from the sub-assembly V, which becomes A, and keeps it in memory with a format of three bits. It further includes a two bit memory for the addressing of sub-assembly X.
SUB-ASSEMBLY VIII
For each successive channel, it keeps in memory the outputs of sub-assemblies XI and XII and the outputs X2 to X5 of sub-assembly X. It includes a two-bit memory for the addressing of sub-assembly X.
SUB-ASSEMBLY IX
For each channel, this sub-assembly successively directs the outputs of sub-assemblies VI, VII and VIII to the inputs of sub-assembly X.
SUB-ASSEMBLY X
This sub-assembly consists of a read only memory (ROM) including three fields respectively addressed by sub-assemblies VI, VII, VIII. The parameters R and Z resulting from the memory field #1 are the outputs X1 and X2 which respectively constitute the inputs of sub-assemblies XII and VIII. The memory field #2 gives parameter AZ on outputs X3, X4, X5, thereby completing the input of sub-assembly VIII. The information with respect to the status V, NV, SL, WN, EC resulting from memory field #3 is available on X2, X4, X6, X7 and X8 and is entered in sub-assembly XV whereas the parameters CM and FR on outputs X3 and X5, respectively, are entered in sub-assemblies XIII and XIV. The parameter K on output X1 constitutes the input of sub-assembly XI.
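The sizes of the three fields follow from the widths of their inputs: D (3 bits) with Zo (4 bits) gives 128 addresses, Zl (3 bits) with A (3 bits) gives 64, and K, R, Z (1 bit each) with AZ (3 bits) gives 64, for a total of 256. The Python sketch below shows one possible address layout. The base addresses, the bit ordering inside each field and the function names are assumptions; the text only gives the field sizes and which parameters address each field.

```python
FIELD1_BASE = 0      # 128 addresses (assumed placement)
FIELD2_BASE = 128    # 64 addresses (assumed placement)
FIELD3_BASE = 192    # 64 addresses (assumed placement)


def _check(value: int, bits: int, name: str) -> int:
    """Reject values that do not fit in the stated number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} must fit in {bits} bit(s)")
    return value


def field1_address(D: int, Zo: int) -> int:
    """Field #1: addressed by D (3 bits) and Zo (4 bits)."""
    return FIELD1_BASE + ((_check(D, 3, "D") << 4) | _check(Zo, 4, "Zo"))


def field2_address(Zl: int, A: int) -> int:
    """Field #2: addressed by Zl (3 bits) and A (3 bits)."""
    return FIELD2_BASE + ((_check(Zl, 3, "Zl") << 3) | _check(A, 3, "A"))


def field3_address(K: int, R: int, Z: int, AZ: int) -> int:
    """Field #3: addressed by K, R, Z (1 bit each) and AZ (3 bits)."""
    high = (_check(K, 1, "K") << 2) | (_check(R, 1, "R") << 1) | _check(Z, 1, "Z")
    return FIELD3_BASE + ((high << 3) | _check(AZ, 3, "AZ"))
```

With this layout the three fields tile the 256 addresses of the memory without overlap, which is consistent with a single ROM being read three times in succession for each batch.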
SUB-ASSEMBLY XI
For each channel, it provides a sequence test on parameter K between two consecutive batches of M samples;
this sub-assembly may include a pair of shift registers and an "AND" gate.
SUB-ASSEMBLY XII
For each channel, it provides a sequence test for parameter R between two consecutive batches of M samples;
this sub-assembly may include a shift register and an "OR"
gate.
SUB-ASSEMBLY XIII
For each channel, it provides a sequence test on the results NV and FR between two consecutive batches of M
samples. This sub-assembly may include a pair of shift registers and a two-input selector.
SUB-ASSEMBLY XIV
For each channel, it provides a sequence test on the results V and CM between two consecutive batches of M samples.
This sub-assembly may include a pair of shift registers and a two-input selector.
SUB-ASSEMBLY XV
For each successive channel, it keeps in memory the results V, CM, NV, FR, SL, WN, EC and makes them available during the time allotted to a channel. It may include a shift register which serves as a buffer memory for the results obtained.
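The seven results held for each channel correspond to the rows of Table 4a. As an illustration, the table can be transcribed into a Python lookup; the names `STATUS_TABLE`, `decode_status` and `is_active` are hypothetical and used only for this sketch.

```python
FLAG_NAMES = ("V", "CM", "NV", "FR", "SL", "WN", "EC")

# Transcription of Table 4a: status number -> (flags, channel state, waveform type)
STATUS_TABLE = {
    1: ((1, 1, 0, 0, 0, 0, 0), "active", "voiced compact"),
    2: ((1, 0, 0, 0, 0, 0, 0), "active", "voiced non-compact"),
    3: ((0, 0, 1, 0, 0, 0, 0), "active", "unvoiced, non-fricative"),
    4: ((0, 0, 1, 1, 0, 0, 0), "active", "unvoiced, fricative"),
    5: ((0, 0, 0, 0, 1, 0, 0), "passive", "silence"),
    6: ((0, 0, 0, 0, 1, 1, 0), "passive", "white noise"),
    7: ((0, 0, 0, 0, 1, 0, 1), "passive", "echo"),
}


def decode_status(s: int) -> dict:
    """Return the seven binary variables and the description for status s (1 to 7)."""
    flags, channel, kind = STATUS_TABLE[s]
    result = dict(zip(FLAG_NAMES, flags))
    result["channel"] = channel
    result["type"] = kind
    return result


def is_active(s: int) -> bool:
    """A channel is active when the waveform is voiced (V) or unvoiced (NV)."""
    flags = decode_status(s)
    return bool(flags["V"] or flags["NV"])
```

A channel-assignment stage would typically only need `is_active`, while the finer flags (CM, FR, WN, EC) describe the nature of the speech or of the inactivity.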
It is to be understood that the above described arrangements are merely illustrative of numerous and varied other arrangements which may form applications of the principles of the invention both in the calculation and in the decision (i.e.: several distinct memories, use of microprocessors...). It is evident that these other arrangements may readily be devised by persons skilled in the art without departing from the spirit and scope of the present invention.


Claims

WHAT IS CLAIMED IS:

1. A speech detector for use in a PCM multiplexed voice channel system comprising: means for processing a pre-determined batch of consecutive PCM samples; means for sequentially computing a series of parameters during processing of said predetermined batch of consecutive PCM samples, said parameters relating to: the amplitude of the vocal signal, the zero crossing of the vocal signal, the zero crossing of the derivative of the vocal signal; and means for determining the status of each channel from information received as a result of the computing of said parameters over said predetermined batch.

2. A speech detector as defined in Claim 1, further comprising means for determining the nature of speech detected from said information received as a result of the computing of said parameters over said predetermined batch, said speech being determined as voiced compact, voiced non-compact, unvoiced fricative, unvoiced non-fricative.

3. A speech detector as defined in Claim 1, wherein said determining means provide further information on said channel when said status is inactive, said further information pertaining to the presence of white noise, echo or pure silence.

4. A speech detector as defined in Claim 2, wherein said parameters include the following four integers:
a: the sum of the absolute values of said amplitude;
zo: the number of sign changes between consecutive PCM
samples;
zl: the number of sign changes among the sequence of differences between consecutive PCM samples;

d: the difference between zl and zo.

5. A speech detector as defined in Claim 4, further including means for effecting a quantification of the values of a, zo, zl and d.

6. A speech detector as defined in Claim 4, wherein white noise is determined by the value of the ratio zl/zo within predetermined limits.

7. A speech detector as defined in Claim 6, wherein said determining means include a ROM memory having three fields successively used for each batch of samples.

8. A method of speech detection in a PCM multiplexed voice channel system, comprising: processing a predetermined batch of consecutive PCM samples; sequentially computing a series of parameters during processing of said predetermined batch of consecutive PCM samples, said parameters relating to:
the amplitude of the vocal signal, the zero crossing of the vocal signal, the zero crossing of the derivative of the vocal signal; and determining the status of said channel from information received as a result of the computing of said parameters over said predetermined batch.

9. A method as defined in Claim 8, further determining the nature of speech detected from said information received as a result of said computing as: voiced compact, voiced non-compact, unvoiced fricative, unvoiced non-fricative.

10. A method as defined in Claim 8, further defining the nature of each channel when no speech is detected as:
white noise, pure silence, echo.

11. A method as defined in Claim 9, defining said parameters into four positive integers as follows:

a: the sum of absolute values of said amplitude of said PCM samples;
zo: the number of sign changes between consecutive PCM
samples;
zl: the number of sign changes among the sequence of differences between consecutive PCM samples;
d: being equal to zl - zo.

12. A method as defined in Claim 11, further effecting a quantification of said integers prior to the determining steps.