GB2229342A - Data reduction of digitised speech signals

Info

Publication number: GB2229342A
Application number: GB8905970A
Authority: GB
Inventors: Robert Anthony Severwright
Original assignee: GEC Marconi Ltd; Marconi Co Ltd
Current assignee: BAE Systems Electronics Ltd
Priority date: 1989-03-15
Filing date: 1989-03-15
Publication date: 1990-09-19
Also published as: GB8905970D0

Abstract

Speech signals, e.g. between a telephone 1 and a telephone 9, are digitised in circuit 2 and reconverted in circuit 8. Successive blocks of digitised data, e.g. 16 milliseconds thereof, are compared with each other in the frequency domain in comparator 14, the conversion having been effected in block 13. If one block is similar to its predecessor within pre-set limits, that block is not sent to the output 5, e.g. because switch 15 is opened, and instead a control signal 14b is passed to the communication link to instruct a store 19 to repeat the previous block at the other end of the communication link. Thus only critical data blocks are transmitted and the data rate is reduced. This can be used for communication with an aircraft.

Description

Data Reduction of Digitised Speech Signals

This invention relates to the data reduction of digitised speech signals.

The data rate for speech signals on the telephone system is normally 64 kbit/second but this rate is considerably faster than the minimum needed to convey the voice information content given the limited sound bandwidth of the telephone system.

There are situations where a reduction in this data rate is desirable. To this end, various techniques have been proposed for reducing the data rate without impairing the information content.

For example, advantage can be taken of the fact that over a large number of speech channel pairs, each channel of each pair is on average only transmitting speech data for 39% of the time (obviously usually one caller listens while the other speaks and there are also some mutual silences). Thus, a large number of channels in each direction can be grouped together and the number of channels in each direction can be under half the number of telephone conversations transmitted in that direction.

One channel thus carries the data for two or more conversations in a suitably labelled manner enabling the separate calls to be reconstituted at the other end of the channel. A disadvantage with such digital speech interpolation (DSI) is that, for a small proportion of the time, the telephone conversations in one direction exceed the statistical average and the capacity of the system, resulting in the phenomenon of freeze-out, when parts of a conversation are lost. To overcome this problem in a group of channels, so-called "bit stealing" is sometimes used, in which each pulse code modulated sample is truncated by one bit and another channel is provided by the extra bit across the truncated samples in 8 channels.

Another technique for data reduction, which can be used in conjunction with digital speech interpolation, is the use of a codec which not only digitises speech signals but also breaks the speech down into components which are transmitted separately. In particular, speech is broken down into an excitation signal, which represents the exciting frequency of the vocal cords, i.e. the pitch, and into a filtration signal, which represents the effect on the exciting frequency of the throat, mouth and nasal cavities, which enables listeners to distinguish e.g. different vowel sounds at the same pitch. With such a multi-pulse linear predictive codec (MPC), it is common to repeat the previous parameters if a data block is corrupted during transmission.

However, because all the data transmitted with such an MPC is critical to the information content in the speech (as opposed to the shape of the waveform), if the system is used in conjunction with DSI, it is not possible to overcome freeze-out by "bit stealing". This may be a problem in some applications, for example where only a limited bandwidth and hence a limited signalling rate is available for transmission of the speech data, as in telecommunication links with an aircraft.

The invention provides apparatus for the data reduction of digitised speech signals, comprising means to compare in the frequency domain blocks of digitised data representing speech signals with blocks of digitised data representing preceding portions of the speech signals, and to ascertain if the similarity is within predetermined limits.

Comparison of each block with that for the preceding portion of a speech signal enables further redundancy in digitised speech signals to be identified.

The apparatus may include means to pass to an output only those blocks which are not within the predetermined limits of similarity to preceding blocks, and may include means for generating signals to indicate when data blocks are similar to preceding blocks within the predetermined limits.

The comparison means may be arranged to calculate differences between the moduli of the amplitudes at different frequencies, and to ascertain if the sum of those differences is less than a predetermined value.

Alternatively, the comparison means may include a correlator to correlate e.g. cross-correlate each block with a preceding one.

The invention also provides a method of data reduction of digitised speech signals comprising the steps of comparing in the frequency domain blocks of digitised data representing speech signals with blocks of digitised data representing preceding portions of the speech signals and ascertaining if the similarity is within predetermined limits.

Apparatus for and a method of data reduction of digitised speech will now be described by way of example with reference to the accompanying drawings, in which:

Fig. 1 is a circuit block diagram of a telecommunication link including a first form of data reduction and data reconstruction apparatus;

Fig. 2 is a schematic representation of the processing taking place in the first form of data reduction apparatus; and

Fig. 3 is a circuit block diagram of an alternative form of data reduction and reconstruction apparatus.

Referring to figure 1, one channel of a channel pair providing simultaneous two way telephone communication is illustrated. Speech signals from a telephone 1 in a first location are fed to means 2 for digitising the speech signals, and to a data reducer 3. The output from the data reducer is fed, together with the output from seven identical channels (not shown), to a multiplexer 4 which employs time division multiplexing to combine the eight outputs into an output channel 5. This is a radio link to a demultiplexer 6 in a second location, but any form of communication link may be employed instead. The demultiplexer reconstitutes the radio channel as eight one-way channels of telephone traffic and, in the one channel that is fully illustrated, a circuit 7 for reconstructing the data expands the data which was compressed in the data reducer and feeds a digital-to-analogue converter 8, which supplies the telephone 9 at the other end of the one-way channel. The other seven channels are identical to that illustrated.

The eight channels described provide communication in one direction between telephone 1 and telephone 9, and between the other telephone pairs not illustrated. Communication in the other direction from telephone 9 to telephone 1 and for the other telephone pairs (not shown) is provided by circuitry corresponding to items 2, 3 and 4 at the second location and to items 6, 7 and 8 at the first location (not shown), a radio or other link being provided between the two locations in the other direction.

A speech digitising means 2 includes a low pass antialiasing filter 10 and an analogue-to-digital converter 11, which converts speech signals to 8000 pulse code modulated samples per second, a sample being eight bits long, giving a data rate of 64kbits/second.

The data compression means or data reducer 3 includes a store 12 into which successive 16 milli-second blocks of samples (128 samples) are fed. Converter 13 produces a representation of the frequency spectrum of one block of samples using a Fourier transform function e.g. a fast Fourier transform (FFT). While this is being done, the next block of samples is being clocked into a register ready for transfer to the store when all 128 samples have been clocked in. The converter 13 thus calculates the frequency spectrum of successive 16 milli-second blocks of samples.
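By way of illustration only, the block framing and spectrum calculation just described might be sketched in Python as follows. The function names split_into_blocks and magnitude_spectrum are hypothetical and do not appear in the specification, and a direct discrete Fourier transform is used here in place of an FFT purely for brevity; the magnitudes obtained are the same.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000    # PCM samples per second, as stated in the description
BITS_PER_SAMPLE = 8      # giving 8000 * 8 = 64000 bits/second
BLOCK_SAMPLES = 128      # 128 / 8000 s = 16 ms per block


def split_into_blocks(samples, block_len=BLOCK_SAMPLES):
    """Yield successive full blocks of samples.

    Assumption: a trailing partial block is simply dropped.
    """
    for start in range(0, len(samples) - block_len + 1, block_len):
        yield samples[start:start + block_len]


def magnitude_spectrum(block):
    """Return |X[k]| for k = 0 .. N/2 using a direct O(N^2) DFT.

    For real-valued input the bins above N/2 mirror these, so they are omitted.
    """
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(block))
        spectrum.append(abs(acc))
    return spectrum
```

With 128 samples per block at 8000 samples per second, adjacent frequency bins in this sketch are 62.5 Hz apart.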

The comparator 14 compares each block of 128 samples fed to it with the immediately preceding block of 128 samples. The comparator calculates, for each of a discrete number of frequencies across the frequency spectrum, the modulus of the difference between the amplitude of the frequency spectrum for one block and that for the previous block. These moduli are then summed and the sum is compared with a fixed constant k. If k is greater than the summation, the data block is considered sufficiently similar in information content to the preceding data block for there to be no need to transmit it to the output. On the other hand, if the summation is greater than k, then the data block is transmitted.
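A minimal sketch of this comparison, assuming the two spectra are supplied as equal-length lists of amplitudes, is given below. The names spectral_distance and is_redundant are hypothetical. The description does not say what happens when the summation exactly equals k; the sketch assumes that the block is then transmitted.

```python
def spectral_distance(prev_spectrum, curr_spectrum):
    """Sum over the discrete frequencies of |current - previous| amplitude."""
    if len(prev_spectrum) != len(curr_spectrum):
        raise ValueError("spectra must cover the same frequencies")
    return sum(abs(c - p) for p, c in zip(prev_spectrum, curr_spectrum))


def is_redundant(prev_spectrum, curr_spectrum, k):
    """True when the summation is below k, i.e. the block need not be sent.

    Assumption: equality with k counts as 'not similar enough'.
    """
    return spectral_distance(prev_spectrum, curr_spectrum) < k
```

In use, is_redundant(spectrum_a, spectrum_b, k) returning True would correspond to block B being withheld from the output.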

Referring to figure 2, the frequency spectrum of block A is very similar to that of block B but not similar to that of block C. The main peak of each spectrum will of course represent the pitch of the sound produced by the vocal cords (the excitation), and the overall shape of the response will represent the filtration produced by the facial cavity. Obviously, if A is similar to B, the difference between the spectra at the discrete frequencies will be small, and the summation will also be small. This would be the case for example during an open vowel sound.

Thus, data block B will not be passed to the output. In the case of blocks B and C, the difference between the spectra at the discrete frequencies will not be small, and the summation will be greater than k. Data block C will thus be passed to the output.

Data block C is compared with data block B after data blocks A and B have been compared, although it could if desired be compared instead with data block A, the last one which was transmitted, rather than the last one, B, which was not sent.

If a data block is not to be sent to an output, a switch 15 opens in a path 16, the signal in this path being suitably delayed so that block B appears at switch 15 just after the comparison with block A has been made.

This data block is not then passed to output 16a. At the same time, the control signal that caused switch 15 to open is also fed to the multiplexer 4 along a control signal path 14b.
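The send-or-suppress decision and the accompanying control signal might be modelled, purely as a sketch, by the loop below. The function reduce_blocks and its compare_with_last_sent option are hypothetical; each output entry is a pair (repeat_flag, payload), where a True flag stands for the control signal on path 14b and a payload of None stands for switch 15 being open. It is assumed that the very first block is always sent, which the description does not state.

```python
def reduce_blocks(blocks, spectra, k, compare_with_last_sent=False):
    """Return a list of (repeat_flag, payload) pairs, one per input block.

    blocks  -- the digitised data blocks
    spectra -- the amplitude spectrum of each block, in the same order
    k       -- the fixed similarity constant
    """
    output = []
    reference = None  # spectrum the next block is compared against
    for block, spectrum in zip(blocks, spectra):
        similar = (
            reference is not None
            and sum(abs(c - p) for p, c in zip(reference, spectrum)) < k
        )
        if similar:
            output.append((True, None))   # control signal only, no data
            if not compare_with_last_sent:
                reference = spectrum      # compare with immediate predecessor
        else:
            output.append((False, block))  # block passed to the output
            reference = spectrum
    return output
```

Comparing against the last transmitted block rather than the immediate predecessor prevents a slow drift, in which every step is individually below k, from going untransmitted indefinitely.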

At the remote location, the demultiplexer decodes and separates the output signal (which now forms an input signal 17) and the control signal 18. Each new data block transmitted and received is held in store 19, e.g. block A would be held in store 19. Then, when block B is not transmitted, control signal 18 will move switch contact 20 onto pole 20a connected to the store so that, in the 16 milli-second time block following receipt of block A on the input 17 (when the switch 20 was connected to pole 20b), the digital-to-analogue converter 8 will be fed block A again. If block C was the same as blocks A and B, the switch would stay in position 20a for the next 16 milli-seconds. But when the spectral content differs, e.g. block C compared to block B, the control signal switches 20 to 20b so that the corresponding data block is fed to the digital-to-analogue converter 8. Thus the digital-to-analogue converter is fed a continuous stream of signals at 64kbits/second.
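The behaviour of store 19 and switch 20 at the receiving end can be sketched as follows. The function reconstruct is a hypothetical name, and the input is assumed to be a sequence of (repeat_flag, payload) pairs in which a True flag represents control signal 18 selecting pole 20a. Receiving a repeat instruction before any block has been stored is treated as an error, a case the description does not address.

```python
def reconstruct(stream):
    """Rebuild a continuous sequence of blocks from (repeat_flag, payload) pairs."""
    store = None  # models store 19: the last block actually received
    rebuilt = []
    for repeat, payload in stream:
        if repeat:
            if store is None:
                raise ValueError("repeat requested before any block was received")
            rebuilt.append(store)      # switch 20 on pole 20a: replay the store
        else:
            store = payload            # switch 20 on pole 20b: new block
            rebuilt.append(payload)
    return rebuilt
```

One block is produced for every 16 milli-second period, whether or not a block was transmitted for that period.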

The digital-to-analogue converter 8 converts data to analogue speech signals via a digital-to-analogue converter 21 and a filter 22, so that appropriate speech signals are heard on the telephone 9. Although the speech signals now derive only from critical data blocks repeated where necessary, it has been found that the effect is not apparent to the listener, while the data rate to be transmitted may be reduced to a half of what it would otherwise have been.

As an alternative to the comparison method described, cross-correlation of the frequency spectrum of each data block with its predecessor may be used as a criterion for deciding whether or not to send it to the output.
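The specification does not say how the cross-correlation would be evaluated. One possible reading, offered only as an assumption, is a normalised correlation of the two amplitude spectra at zero lag, with a block withheld when the coefficient exceeds a chosen threshold. The names normalised_correlation and is_redundant_by_correlation, and the threshold parameter, are hypothetical.

```python
import math


def normalised_correlation(prev_spectrum, curr_spectrum):
    """Zero-lag correlation of two spectra, scaled to lie between 0 and 1
    for non-negative amplitudes."""
    num = sum(p * c for p, c in zip(prev_spectrum, curr_spectrum))
    den = math.sqrt(sum(p * p for p in prev_spectrum)
                    * sum(c * c for c in curr_spectrum))
    # Assumption: an all-zero spectrum is reported as uncorrelated (0.0).
    return num / den if den else 0.0


def is_redundant_by_correlation(prev_spectrum, curr_spectrum, threshold):
    """True when the spectra correlate strongly enough to skip the block."""
    return normalised_correlation(prev_spectrum, curr_spectrum) > threshold
```

Unlike the summed-difference test, this measure is insensitive to overall level, so a block that is merely louder than its predecessor would be judged similar; whether that is desirable is a design choice not settled by the description.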

The communication link may be a radio link e.g. for communication between a first location at a ground base station and a second location on an aeroplane or on one of several aeroplanes.

The invention may be used in conjunction with digital speech interpolation (DSI). As mentioned earlier, this relies on combining the various channels so that silences are not transmitted and data from the various channels is interspersed with each other on a lesser number of channels. This could be done in the multiplexer, and of course the sections of each channel must be labelled so that they can be reassembled correctly at the other end of the communication link.

In the case where freeze-out occurs, this can be alleviated by varying the limits within which a frequency spectrum must lie for it to be transmitted i.e. greater differences between successive blocks could be allowed before each block was transmitted.
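How the limits would be varied is not specified. As one hypothetical rule, the constant k could be scaled up in proportion to the overload on the link, so that fewer blocks qualify for transmission when more talkers are active than there are channels. The function adaptive_threshold and its parameters are assumptions made for illustration.

```python
def adaptive_threshold(base_k, active_talkers, available_channels):
    """Return a similarity constant that grows once demand exceeds capacity.

    Hypothetical rule: k is unchanged while the link is not overloaded and
    is scaled by the ratio of demand to capacity when it is.
    """
    if available_channels <= 0:
        raise ValueError("available_channels must be positive")
    load = active_talkers / available_channels
    return base_k * max(1.0, load)
```

A larger k allows greater differences between successive blocks before a block is transmitted, trading some fidelity for avoidance of freeze-out.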

Referring to figure 3, further advantages can be obtained by using in the data reducer (compressor) 3 and the data reconstructor (expander) 7, a codec 23 which breaks down speech into excitation and filtration components which are transmitted separately for the purpose of reducing the data rate to be transmitted e.g. a multipulse linear predictive codec (MPC). Such a codec can already reduce a 64kbit/second data rate to a 9.6kbit/second data rate.

A circuit 24 compares, for successive blocks of 16 milliseconds, the excitation signal for one block with that for the previous block (the excitation signal already being in the frequency domain). It may also compare the filtration signals for one block with those for the previous blocks (the two signals having been separated in circuit 25), and the decision to send or not may be based on the similarity of the excitation signals, on the similarity of the filtration signals, or on a combination of the two. The filtration signals may be represented by the multiplying (tap) co-efficients in a finite impulse response filter.

If a decision is made to refrain from sending a block, a switch opens and an appropriate control signal is sent so that, at the data reconstructor, the data block for the previous time period is repeated.

With the use of the invention in conjunction with an MPC codec, a data reduction to 4kbits/second per one-way channel may be achieved, enabling eight one-way channels to be accommodated in a bit rate of 38kbits/second, leaving 6kbits/second for control and 32kbits/second for data. This could be reduced further with the use of DSI and makes the system very useful for signalling to aircraft where, for various reasons, the bandwidth and hence data bit rate is very restricted. A 25 kHz analogue channel could be used as the carrier.
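The arithmetic behind these figures can be checked with the short sketch below, in which link_budget_kbits is a hypothetical helper and all rates are those quoted in the description.

```python
def link_budget_kbits(channels, per_channel_kbits, control_kbits):
    """Return (data rate, total rate) in kbits/second for a multiplexed link."""
    data = channels * per_channel_kbits
    return data, data + control_kbits


# Eight one-way channels at 4 kbits/second each plus 6 kbits/second of control.
data_rate, total_rate = link_budget_kbits(8, 4, 6)   # 32 and 38

# For comparison, eight channels of unreduced 64 kbits/second PCM.
uncompressed_rate = 8 * 64                           # 512
```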

Claims

1. Apparatus for data reduction of digitised speech signals, comprising means to compare in the frequency domain blocks of digitised data representing speech signals with blocks of digitised data representing preceding portions of speech signals, and to ascertain if the similarity is within predetermined limits.

2. Apparatus as claimed in claim 1, including means to pass to an output only those blocks which are not within the predetermined limits of the similarity to the preceding blocks.

3. Apparatus as claimed in claim 1 or claim 2, including means for generating signals to indicate when the data blocks are similar to preceding data blocks within the predetermined limits.

4. Apparatus as claimed in any one of claims 1 to 3, in which the comparison means is arranged to calculate the differences between the moduli of the components at different frequencies, and to ascertain if the sum of those differences is less than a predetermined value.

5. Apparatus as claimed in claim 4, in which the predetermined value is variable.

6. Apparatus as claimed in any one of claims 1 to 3, in which the comparison means includes a correlator to correlate each block with each preceding one.

7. Apparatus as claimed in any one of claims 1 to 6, in which the frequency spectrum of each block is compared with the frequency spectrum of the block for the preceding portion of the signal.

8. Apparatus as claimed in any one of claims 1 to 7, including a multipulse linear codec arranged to reduce the data rate by breaking speech into excitation and filtration components.

9. Apparatus as claimed in any one of claims 1 to 8, including a multiplexer arranged to employ digital speech interpolation to combine the output of the apparatus with the output of other apparatuses according to claim 1.

10. Apparatus for the data reduction of digitised speech signals substantially as herein described with reference to the accompanying drawings.

11. A method of data reduction of digitised speech signals comprising the steps of comparing in the frequency domain blocks of digitised data representing speech signals with blocks of digitised data representing preceding portions of the speech signals, and ascertaining if the similarity is within predetermined limits.

12. A method of data reduction substantially as hereinbefore described with reference to the accompanying drawings.