GB2332841A

GB2332841A - Speech communication systems

Publication number: GB2332841A
Application number: GB9727179A
Authority: GB
Inventors: Howard Thomas
Original assignee: Motorola Ltd
Current assignee: Motorola Solutions UK Ltd
Priority date: 1997-12-24
Filing date: 1997-12-24
Publication date: 1999-06-30
Anticipated expiration: 2017-12-24
Also published as: GB9727179D0; GB2332841B

Abstract

Apparatus for compressing a voice message prior to radio transmission includes a speech recogniser 3 and a comparator 6 for identifying those words which are redundant to the meaning of the message and eliminating them from the transmitted signal. Alternatively, longer words may be replaced with shorter synonyms. A syntax extractor 4 produces an error check signal which is used at the receiver to verify the accuracy of the received message, the result being relayed back to the transmitter. The speech recogniser 3 compares each word of a digital bit stream from encoder 1 with stored templates, and recognised words are compared at 6 with templates of redundant words stored at 7. The compressed word chain from comparator 6, along with the output of syntax extractor 4, is modulated at 8 and transmitted. In the receiver (Fig. 2, not shown), the received signal is demodulated and decoded for reconstruction as speech.

Description

SPEECH COMMUNICATION SYSTEMS

This invention relates to digital communication systems and particularly to the transmission and reception of voice data over an air interface.

In a typical digital radio communication system, there are components that digitally encode and decode speech for communication over radio frequencies. For example, in the GSM (Global System for Mobile Communications) a speech transcoder provides the encoding and decoding ability in one component and is sometimes referred to as a speech codec.

Speech encoders are designed to utilise techniques which exploit the redundancy in the speech signal in order to reduce the number of bits required to represent the speech signal. This is an important consideration when large quantities of speech are to be held in storage media (such as a voice mail system) or when a limited bandwidth is available to transmit the speech signal over a telecommunications channel.

The present invention aims to provide a more efficient means for transmission of voice messages than is presently possible using known techniques. This is achieved by extracting the essential meaning of a voiced message or spoken text and rejecting any redundant information prior to transmission.

Accordingly, the invention comprises, in a first aspect, voice message transmission apparatus including: means for digitising an input analogue speech signal comprising the voice message to produce a first digital output signal representing a string of words, means for recognising individual words comprising said string of words, and means for eliminating from said string of words those words which are redundant to the meaning of the voice message to produce a second digital output signal representing a modified string of words.

In one embodiment, the apparatus further includes means for modulating onto a carrier said second digital output signal for onward transmission.

Thus, the invention has the advantage that the information content of a voice message can be described and transmitted in a highly compressed manner with a high degree of redundancy being removed from any spoken text.

By this means, the efficiency of transmission of a voice message is greatly enhanced. This contributes to a reduced call-time (or air time in the case of radio communications) and a consequent cost saving. It also lessens the power requirements on transmission and receiving equipment.

The means for digitising an input analogue speech signal may comprise any suitable known speech encoder which, for example, utilises the well-known two-stage process of sampling and quantisation (i.e. pulse-code modulation). Codecs which utilise a non-linear quantisation process or devices which employ adaptive quantisation or differential quantisation, for example, are equally suitable.
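By way of illustration only, the two-stage process of sampling and quantisation, and one form of non-linear quantisation, might be sketched as follows. The function names, the 8 kHz sampling rate, the 8-bit word length and the choice of mu-law companding are assumptions made for this sketch and are not specified in this description.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Stage 1: sample a continuous-time signal (a function of time in seconds)."""
    n = round(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

def quantise_uniform(x, bits):
    """Stage 2: map an amplitude in [-1.0, 1.0] to an integer code of the given width."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))  # clip out-of-range amplitudes
    # Scale to [0, levels - 1] and round to the nearest level.
    return int(round((x + 1.0) / 2.0 * (levels - 1)))

def compress_mu_law(x, mu=255.0):
    """Non-linear (mu-law) companding applied before uniform quantisation."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

# Hypothetical input: a 440 Hz tone at half of full scale.
tone = lambda t: 0.5 * math.sin(2 * math.pi * 440.0 * t)
samples = sample(tone, duration_s=0.001, rate_hz=8000)
linear_codes = [quantise_uniform(s, bits=8) for s in samples]
companded_codes = [quantise_uniform(compress_mu_law(s), bits=8) for s in samples]
```

Companding before quantisation gives small amplitudes a larger share of the available codes, which is one reason a non-linear quantiser can suit speech.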

The above-mentioned speech encoders all operate directly on the time domain speech signal. Also suitable are other known devices which operate by encoding a modified or transformed version of the speech signal, for example, linear predictive coders.

The number of bits used for encoding the speech signal may be chosen to be greater for those words which are essential to the meaning of the voice message than for those which are not. These essential words would then be less affected by transmission errors or corruption.

The means for recognising individual words may comprise any suitable speech recogniser. Devices which utilise dynamic time warping or hidden Markov modelling, for example, are equally suitable. Such devices operate a pattern matching process by comparing a processed signal comprising the input word with stored word patterns.
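As a minimal sketch of pattern matching by dynamic time warping, the following compares an input feature sequence with stored word patterns and selects the closest. The one-dimensional features, the template values and the function names are hypothetical; a practical recogniser would operate on multi-dimensional spectral features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three permitted predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognise(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical stored word patterns and an input spoken more slowly.
templates = {"yes": [1.0, 3.0, 2.0], "no": [4.0, 4.0, 1.0, 0.0]}
spoken = [1.0, 1.0, 3.0, 3.0, 2.0]
best = recognise(spoken, templates)
```

The warping allows the same word spoken at different speeds to match its template, as with the stretched input above.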

Techniques for modulating the second digital output (which represents a modified voice message) onto a carrier may comprise any one of several known methods, for example frequency shift keying.
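A binary form of frequency shift keying might be sketched as below, purely as an illustration: each bit selects one of two tone frequencies for the duration of one bit period. The frequencies, bit rate and sample rate are hypothetical values chosen for the sketch, and the phase is kept continuous across bit boundaries as a design choice.

```python
import math

def fsk_modulate(bits, f0_hz=1200.0, f1_hz=2200.0, bit_rate=1200, sample_rate=48000):
    """Return waveform samples in which each bit is sent as a tone burst."""
    samples_per_bit = sample_rate // bit_rate
    out = []
    phase = 0.0  # carried across bits so the waveform has no phase jumps
    for bit in bits:
        freq = f1_hz if bit else f0_hz
        step = 2 * math.pi * freq / sample_rate
        for _ in range(samples_per_bit):
            out.append(math.sin(phase))
            phase += step
    return out

wave = fsk_modulate([1, 0, 1, 1])
```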

The modulated signal may then be transmitted over a communication channel in accordance with the usual practice which may utilise multiplexing techniques such as frequency division multiple access or time division multiple access.

The modulated, transmitted signal may then be detected and demodulated by conventional receiving apparatus to reproduce the voice message. Alternatively, the voice message may be audibly reconstructed by means of a speech synthesizer. One known way of generating synthetic speech firstly requires a set of control parameters for producing a particular utterance. These parameters can conveniently be derived by analysis of the original voice message by the speech encoder. In the case of linear prediction, the analysis process is automatic and provided that the prediction error signal is adequately reproduced, the resulting resynthesised speech can be of high quality and virtually indistinguishable from the original.

In one embodiment, the voice transmission apparatus further includes speech understanding means. Known speech understanding systems typically incorporate several interacting knowledge sources such as acoustic, phonetic and syntactic knowledge sources. This knowledge is usually in the form of a set of rules for each knowledge source. One type of speech understanding system suitable for this application is a speech processor which is adapted to extract from the first digital output signal information relating to the "prosody" of the voice message. In this context "prosody" means and includes, for example, pitch, syntax, intonation, emphasis, rhythm and spectral characteristics. Some or all of this information may be encoded and transmitted along with the transmitted modified voice message for reception by the receiving apparatus.

Therefore, in a second aspect, the present invention comprises voice message receiving apparatus including: means for receiving via a communication link, a communications signal comprising a first data signal representing a voice message and a second data signal representing "prosody" of the voice message, means for decoding the first data signal for reproducing the voice message, and means for comparing the first and second data signals to generate an error signal.

By virtue of this second aspect of the invention, any corruption of the original voice message which has occurred in the transmission process can be detected in a receiver by using the prosody information content of the original message and comparing this with the received voice message.

In a further embodiment, the error signal may be retransmitted back to the transmission apparatus. In this case, and on reception of the error signal, the transmission apparatus is adapted to, for example, retransmit the voice message over a different communications channel, or adjust the channel modulation level or the relevant encoding parameters (e.g. bit rate) and transmit a reconfigured signal. This process can be repeated until an acceptable error signal is derived at the receiving apparatus. This error measurement can be used as an error rate parameter similar to RXQUAL in a GSM system, for purposes including handover, power control and network quality.

Some embodiments of the invention will now be described, by way of example only, with reference to the following drawings of which: Figure 1 is a schematic block diagram of voice message transmission apparatus in accordance with the invention and Figure 2 is a schematic block diagram of voice message receiving apparatus in accordance with the invention.

In Figure 1, a speech encoder 1 has an input on line 2 comprising a voice message in analogue form. An output of the speech encoder 1 is connected to a speech recogniser 3, a syntax extractor 4 and a memory 5.

An output of the speech recogniser 3 is connected to a first input of a comparator 6. The second input of the comparator 6 is connected to word store 7 and a third input to an output of the syntax extractor 4. The output of the comparator 6 is connected to the memory 5 whose output along with a second output from the syntax extractor 4 is connected to modulator 8.

The modulator's output signal on line 9 is transmitted over an air interface by means of an antenna 10 via a transmit/receive duplexer 11, for reception by the apparatus of Figure 2. A second output of the duplexer 11 is connected to an error rate detector 12 whose output is fed to a second input of the speech encoder 1.

In Figure 2, an incoming signal is received by a second antenna 13 and passed through a second transmit/receive duplexer 14 to a demodulator 15. The demodulator 15 has two outputs. A first output on line 16 (relating to syntax) is input to an error signal generator 17. A second output on line 18 representing voice data is connected to a decoder 19 and to a second input of the error signal generator 17. The output of the decoder 19 is connected via line 20 to a loudspeaker (not shown) via suitable conditioning electronics (not shown).

In operation, with reference to Figure 1, a speech signal comprising a voice message is encoded into digital form by the speech encoder 1. The output from the speech encoder 1 consists of a digital bit stream which represents a string of words comprising the voice message. This bitstream is operated on by the syntax extractor 4 which generates an output error check signal for transmission via the modulator 8. The syntax extractor is programmed to extract the meaning of the voice message.

The bit-stream is also operated on by the speech recogniser 3 which performs a pattern matching exercise comparing each word (digitally represented) in the word string with stored templates. Each recognised word is then fed into the comparator 6 for filtering out unnecessary utterances. Also connected to the comparator 6 is the word store 7 in which are stored templates of words which are generally redundant to the meaning of a spoken message. In this simple example, the words stored are "the", "on", "in", "at" and meaningless utterances such as "um", "er".

Words such as these are looked for in the voice message word string by the comparator 6 and rejected.

As an example, consider the voice message "John will meet you at the conference in Cannes on February 17". Deleting the words "at", "the", "in", "on" still leaves the message comprehensible and accurate yet considerably reduces the overall duration of the message. (The syntax extractor output will also be indicative of the sense of a modified message).

Words that are to be retained in the message are held in the memory 5 and those to be rejected are not stored. The action of the memory 5 is under the control of the comparator 6.
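The filtering performed by the comparator 6, the word store 7 and the memory 5 can be sketched as follows, assuming for illustration that the recognised words are available as text strings rather than as templates. The names `REDUNDANT_WORDS` and `filter_redundant` are hypothetical.

```python
# Contents of the word store 7 in the example given above.
REDUNDANT_WORDS = {"the", "on", "in", "at", "um", "er"}

def filter_redundant(words, redundant=REDUNDANT_WORDS):
    """Retain (as in the memory 5) only those words not found in the word store."""
    return [w for w in words if w.lower() not in redundant]

message = "John will meet you at the conference in Cannes on February 17".split()
compressed = filter_redundant(message)
# compressed == ['John', 'will', 'meet', 'you', 'conference', 'Cannes', 'February', '17']
```

In this example twelve words are reduced to eight.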

When the comparator 6 has completed its word filtering process, the modified voice message (now considerably compressed into fewer words and therefore fewer bits of data) is fed out of the memory 5 and modulated onto a carrier by the modulator 8. The output from the syntax extractor 4 is also modulated onto a carrier by the modulator 8 and the resulting RF signals are transmitted via the duplexer 11 and antenna 10 in accordance with conventional techniques.

With reference to Figure 2 the transmitted signal is received and demodulated in the demodulator 15. The digital data stream output by the demodulator 15 and representing the voice message is decoded by the speech decoder 19 for reconstruction as an audible voice message. The digital data stream representing the error check signal is sent to the error signal generator 17. This device uses the information received from the syntax extractor 4 to search for corruption of the received voice message which is also input to the error signal generator 17 from the demodulator 15. If a syntax error is detected, then this fact is relayed back to the transmission apparatus of Figure 1. The voice message is then retransmitted using different transmission parameters. One option is to retransmit the same message over a different communications channel.

Another option is to re-encode the voice message by increasing the bit rate of the speech encoder 1. A further option is to alter the modulation level in the modulator 8.

On reception of the retransmitted signal, the error signal generator 17 performs a further error analysis. Depending on the result, the error signal generator 17 can signal to the transmitting apparatus and to the recipient of the reconstructed audio message that the transmission is valid or it can request the transmitting apparatus to change the variable parameters and retransmit until the error level is acceptable.
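The check and retransmission cycle described above might be sketched as follows. This description does not specify the form of the error check signal; the sketch assumes, purely for illustration, that it is the sequence of grammatical categories of the words, that it arrives uncorrupted, and that the radio link can be modelled by a toy function. The lexicon, function names and settings are all hypothetical.

```python
# Hypothetical lexicon mapping words to coarse grammatical categories.
LEXICON = {"john": "NOUN", "will": "VERB", "meet": "VERB", "you": "PRON",
           "conference": "NOUN", "cannes": "NOUN", "february": "NOUN", "17": "NUM"}

def extract_syntax(words):
    """Stand-in for the syntax extractor 4: one category per word."""
    return [LEXICON.get(w.lower(), "UNK") for w in words]

def error_count(received_words, received_check):
    """Stand-in for the error signal generator 17: count syntax mismatches."""
    extracted = extract_syntax(received_words)
    mismatches = sum(a != b for a, b in zip(extracted, received_check))
    return mismatches + abs(len(extracted) - len(received_check))

def send_until_valid(words, channel, settings_list, max_errors=0):
    """Try each set of transmission parameters until the error level is acceptable."""
    check = extract_syntax(words)  # assumed to be received intact
    for settings in settings_list:
        received = channel(words, settings)
        if error_count(received, check) <= max_errors:
            return settings, received
    return None, None

def toy_channel(words, settings):
    """Toy link: garbles the third word unless the bit rate has been raised."""
    if settings["bit_rate"] < 9600:
        return words[:2] + ["###"] + words[3:]
    return list(words)

words = ["John", "will", "meet", "you", "conference", "Cannes", "February", "17"]
settings_list = [{"channel": 1, "bit_rate": 4800}, {"channel": 1, "bit_rate": 9600}]
settings, received = send_until_valid(words, toy_channel, settings_list)
```

Here the first attempt fails the check and the second, at the higher bit rate, is accepted; a different channel or modulation level could be tried in the same way.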

In an alternative embodiment, the word store 7 is also provided with synonyms which may be used to replace longer words having the same meaning.
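Such a substitution might be sketched as below, again assuming the recognised words are available as text. The synonym table and the function name are hypothetical, and a replacement is made only where the stored synonym is actually shorter.

```python
# Hypothetical synonym entries held in the word store 7.
SYNONYMS = {"approximately": "about", "purchase": "buy", "assistance": "help"}

def shorten(words, synonyms=SYNONYMS):
    """Replace each word by its stored synonym when the synonym is shorter."""
    out = []
    for w in words:
        s = synonyms.get(w.lower())
        out.append(s if s is not None and len(s) < len(w) else w)
    return out

shortened = shorten(["Please", "purchase", "approximately", "ten", "tickets"])
# shortened == ['Please', 'buy', 'about', 'ten', 'tickets']
```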

Claims

1. Voice message transmission apparatus including: means for digitising an input analogue speech signal comprising the voice message to produce a first digital output signal representing a string of words, means for recognising individual words comprising said string of words, and means for eliminating from said string of words those words which are redundant to the meaning of the voice message to produce a second digital output signal representing a modified string of words.
2. Voice message transmission apparatus according to Claim 1 and further including speech processing means for extracting from the first digital output signal, information relating to the prosody of the voice message.
3. Voice message transmission apparatus according to Claim 1 and further including speech processing means for extracting from the first digital output signal, information relating to the syntax of the voice message.
4. Voice transmission apparatus according to any preceding Claim and further including means for modulating onto a carrier digital data representative of the voice message for radio transmission.
5. Voice message receiving apparatus including: means for receiving via a communication link a communications signal comprising a first data signal representing a voice message and a second data signal representing prosody of the voice message, means for decoding the first data signal for reproducing the voice message, and means for comparing the first and second data signals to generate an error signal.
6. Voice message receiving apparatus including: means for receiving via a communication link a first data signal representing a transmitted voice message and a second data signal relating to the syntax of the transmitted voice message, and means for extracting syntax information from the transmitted voice message as received and comparing the extracted syntax information with the second data signal.
7. A method for compressing a voice message signal for transmission and reception via a telecommunications channel including the steps: digitising an input analogue speech signal comprising the voice message signal to produce a first digital output signal representing a string of words, recognising individual words comprising said string of words, and eliminating from said string of words those words which are redundant to the meaning of the voice message to produce a second digital output signal representative of a compressed version of the voice message.
8. A method according to Claim 5 and further including steps of: extracting prosody information from the first digital output signal, and comparing said compressed version of the voice message with said prosody information to produce an error signal.
9. Voice message transmission apparatus substantially as hereinbefore described with reference to Figure 1.
10. Voice message receiving apparatus substantially as hereinbefore described with reference to Figure 2.
11. A method of transmitting and receiving a voice message signal substantially as hereinbefore described with reference to the drawings.