GB2254986A - Device for storing and reproducing speech - Google Patents


Info

Publication number
GB2254986A
Authority
GB
United Kingdom
Prior art keywords
speech
signal
memory
memory means
periods
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9204879A
Other versions
GB2254986B (en
GB9204879D0 (en
Inventor
Timo Kolehmainen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Oyj
Original Assignee
Nokia Mobile Phones Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1991-03-08
Filing date
1992-03-05
Publication date
1992-10-21
Application filed by Nokia Mobile Phones Ltd
Publication of GB9204879D0
Publication of GB2254986A
Application granted
Publication of GB2254986B
Anticipated expiration
Expired - Fee Related (current legal status)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65 Recording arrangements for recording a message from the calling party
    • H04M1/656 Recording arrangements for recording a message from the calling party for recording conversations
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65 Recording arrangements for recording a message from the calling party
    • H04M1/6505 Recording arrangements for recording a message from the calling party storing speech in digital form

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A speech codec (2), a voice activity detector (4, 7) and a memory (3) can be used for storing speech without pause periods, and for reproducing the original speech with the required pauses. Speech is first converted into digital form by an analog-to-digital converter and encoded into frames by an encoder. Speech frames containing active speech are stored in the memory (3) until the voice activity detector (4, 7) detects a pause. A start mark bit pattern and information on the duration of the pause are then stored in the memory. When the speech is reproduced, speech frames are read in succession from the memory (3), decoded and converted into analog form. Pauses are inserted between the speech frames, their positions and durations being read from the memory. The invention is particularly suitable for use in a telephone answering device incorporated in a digital radio telephone.

Description

Device for storing and reproducing speech

The invention relates to a device in which digital, encoded speech is stored in compressed form in a memory, and from which the speech can be reproduced in its original form.
In digital mobile telephone systems, for example, each telephone set is provided with a speech codec for encoding speech to be transmitted and decoding speech that is received. Speech coding is the subject of intensive research, and the present trend is to develop methods that achieve a low transmission rate while maintaining good speech quality.
Digital signal processing has enabled the development of sophisticated speech codecs that exploit correlations between samples of sampled speech, using methods known as short term prediction and long term prediction. Short term prediction uses linear predictive coding (LPC) algorithms, which exploit the correlation between successive samples, whereas long term prediction (LTP) algorithms exploit the correlation between successive fundamental frequency (pitch) periods. A speech codec based on these methods is standardized for the Pan-European digital cellular radio system known as Groupe Special Mobile (GSM), in accordance with Recommendation 06.10 of the European Telecommunications Standards Institute (ETSI); its encoder is a Regular Pulse Excitation - Long Term Prediction (RPE-LTP) encoder. This is a block-based encoder in which sampling is carried out at a frequency of 8 kHz.
It has a net bit rate of 13 kbit/s. The input speech samples are analysed in frames of 20 ms duration, giving an output frame of 260 bits. Another encoder is used in the dual-mode (analog/digital) system for the USA known as the USDM system. This encoder uses the VSELP (Vector-Sum Excited Linear Predictive) coding algorithm, a variant of code-excited linear prediction (CELP, also called stochastic coding) in which a structured code book is used to speed up the computation.
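The 260-bit frame size follows directly from the net bit rate and the 20 ms frame length. A quick check in Python, assuming (as stated later in the description for both systems) that USDM also uses 20 ms frames; the function name is illustrative only:

```python
FRAME_MS = 20  # frame length stated for GSM; assumed here for USDM as well


def bits_per_frame(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of encoded bits produced in one frame at a given net bit rate."""
    bits_x1000 = bit_rate_bps * frame_ms  # (bit/s) * ms = bits * 1000
    if bits_x1000 % 1000:
        raise ValueError("bit rate does not give a whole number of bits per frame")
    return bits_x1000 // 1000


print(bits_per_frame(13_000))  # GSM RPE-LTP at 13 kbit/s -> 260
print(bits_per_frame(7_950))   # USDM VSELP at 7.95 kbit/s -> 159
```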
Such mobile telephone systems also employ a Voice Activity Detector (VAD), incorporated in the encoder, whose function is to determine whether a signal is speech with background noise or background noise alone, with no active speech. If the VAD determines that the signal is only background noise, the transmitter of the telephone is switched off, and it is switched on again when a signal is interpreted as speech. The transmission is therefore discontinuous.
The VAD uses parameters calculated by the speech codec. Speech coding and the related voice activity detection are described in US patent No. 4,630,262.
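The VAD described here works from codec parameters, and a standardized codec VAD adapts to the background noise level. As a much simpler stand-in, the sketch below classifies a frame by comparing its mean energy against a fixed threshold; `is_active_speech` and the 40 dB threshold are hypothetical choices, not taken from the source or from any standard:

```python
import math
from typing import Sequence


def frame_energy_db(frame: Sequence[int]) -> float:
    """Mean power of one frame of PCM samples, in dB relative to 1 (squared LSB)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log10(0)


def is_active_speech(frame: Sequence[int], threshold_db: float = 40.0) -> bool:
    """Hypothetical fixed-threshold VAD: True if the frame looks like active speech.

    A real codec VAD adapts its decision to the background noise level and
    uses codec parameters; this only compares frame energy to a constant.
    """
    return frame_energy_db(frame) > threshold_db


quiet = [3, -2, 1, -4] * 40              # 160 low-level samples (background noise)
loud = [1500, -1200, 900, -1700] * 40    # 160 high-level samples
print(is_active_speech(quiet), is_active_speech(loud))  # False True
```

In a discontinuous-transmission scheme, the transmitter would be switched on only for frames where such a function returns True.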
According to the present invention, there is provided a device for storing digitized speech signals encoded into periods of temporally constant duration, comprising memory means for storing said periods of encoded speech signals; means for detecting the absence of active speech in the speech signal and operable to supply, in response thereto, a first signal indicative of a period not containing active speech to the memory means for storage therein, the memory means being operable in response to the first signal to cease storage of the periods of the encoded speech signal; and means for counting the number of periods of the encoded speech signal not containing active speech and operable to supply a second signal indicative of that number to the memory means for storage therein, said memory means being operable to recommence storage of said periods of the encoded speech signal after storage of said second signal. This has the advantage that only speech, and not pauses, is stored in the memory, which reduces the memory space required, so that the available memory is used economically when analog speech is stored in digital form.
The first and second signals may be in the form of bit patterns, each of one period duration, stored in adjacent memory locations of the memory means, which may be, for example, a RAM. This has the advantage that only two memory locations are required to record the position and length of each pause between periods of active speech, again reducing the memory space required.
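The saving from recording each pause as two locations is easy to quantify. A sketch, assuming one memory location per 20 ms frame; `locations_needed` is a hypothetical helper:

```python
from typing import Sequence, Tuple


def locations_needed(speech_frames: int, pause_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (with_pause_coding, storing_every_frame) memory-location counts.

    speech_frames: number of frames containing active speech.
    pause_lengths: number of silent frames in each pause.
    """
    with_pause_coding = speech_frames + 2 * len(pause_lengths)  # start mark + count per pause
    storing_every_frame = speech_frames + sum(pause_lengths)
    return with_pause_coding, storing_every_frame


# 10 s of active speech (500 frames of 20 ms) with pauses of 1 s, 0.5 s and 2 s
print(locations_needed(500, [50, 25, 100]))  # (506, 675)
print(locations_needed(10, [1, 1]))          # (14, 12)
```

The second call shows a consequence of the scheme as described: a single-frame pause costs two locations instead of one, a two-frame pause breaks even, and the saving starts at three frames. The source does not say whether very short pauses are treated specially.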
The detecting means may include at least one voice activity detector.
In a preferred embodiment, the device further includes means for reproducing the speech signal stored in the memory means, and means for detecting, in the periods read from the memory means, successive frames containing the first and second signals and operable to supply, in response thereto, a signal to the memory means such that the memory means inhibits reading out of the periods containing active speech that follow the detected first and second signals, for a time interval equal to the number of periods stored as the second signal.
A further advantage of the invention is that modern digital Public Land Mobile Network (PLMN) telephones are already provided with a speech codec, a voice activity detector and a digital signal processor, and these can be used in this device. The central processing unit of a digital radio telephone is always provided with a read/write memory, and this Random Access Memory (RAM) can be used as the memory means. This allows stored speech to be reproduced with the pauses of the original speech inserted.
The device is particularly appropriate for use in digital portable telephones as a telephone answering device, in which both the user and a caller can dictate a message. The invention can make use of the known speech encoding and voice activity detection (VAD) described above, and of the known technique of storing speech signals in digital form.
The invention will now be described, by way of example only, with reference to the accompanying figures, of which: Figure 1 is a schematic representation of the functional components of a telephone answering device in a digital Public Land Mobile Network (PLMN) telephone, and Figure 2 is a schematic representation of the contents of the memory in which speech has been stored.
Fig. 1 shows only those functional components of the telephone that are essential to the invention. A speech signal is supplied from a microphone to an Analog-to-Digital (A/D) converter 1, in which it is sampled at regular intervals. The sampling frequency depends on the encoder being used; for both the RPE-LTP encoder used in GSM and the VSELP encoder used in USDM it is 8 kHz. The signal supplied from the A/D converter 1 to the encoder 2 is a 13-bit pulse code modulated (PCM) signal. The samples are segmented into frames of 160 samples, so that the duration of one frame is 20 ms, as discussed above. Coding operations are carried out on these frames, and in GSM the encoded speech is produced at a rate of 13 kbit/s, as discussed above. In the American USDM system the rate is lower, 7.95 kbit/s.
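The framing step can be sketched as follows, assuming the samples arrive as a Python sequence of integers; `split_into_frames` is a hypothetical name, and a real codec works on a continuous sample stream rather than a complete list:

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 160
FRAME_MS = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ  # 20.0 ms per frame

PCM13_MIN, PCM13_MAX = -(2 ** 12), 2 ** 12 - 1    # range of a 13-bit signed sample


def split_into_frames(samples: Sequence[int]) -> List[List[int]]:
    """Group 13-bit PCM samples into complete 160-sample (20 ms) frames.

    Leftover samples that do not fill a frame are dropped in this sketch;
    a real encoder would keep them until the next frame is complete.
    """
    for s in samples:
        if not PCM13_MIN <= s <= PCM13_MAX:
            raise ValueError(f"sample {s} does not fit in 13 bits")
    n_frames = len(samples) // FRAME_SAMPLES
    return [list(samples[i * FRAME_SAMPLES:(i + 1) * FRAME_SAMPLES]) for i in range(n_frames)]


frames = split_into_frames([0] * 8000)  # one second of silence
print(len(frames), FRAME_MS)            # 50 20.0
```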
The digital signal supplied from the A/D converter 1 is also monitored by a Voice Activity Detector (VAD) 4.
VAD 4 can be separate from the encoder 2 or, as is usual, connected to it, in which case it makes use of the parameters calculated by the encoder. Using calculation algorithms, the VAD determines when the speech contains a pause (i.e. the signal is background noise) and when the signal contains active speech.
When a frame output from the encoder 2 of the speech codec is classified as containing speech, it is stored in a static read/write memory 3, for example a 256 kbit Static Random Access Memory (SRAM), which is able to store speech of 30 to 90 seconds' duration. The address of the memory location for each frame to be stored is given by an address counter 6, which is incremented when the frame has been stored. Subsequent frames continue to be stored until VAD 4 determines that the speech contains a pause, at which point it informs a silent period encoder 5 of this "silent" period. The silent period encoder 5 generates a predetermined bit pattern, which acts as a start mark for the pause and is stored in the memory 3.
Frames from the encoder 2 are then no longer stored in the memory 3, and the address counter 6 is therefore not incremented. At the same time a counter in the silent period encoder 5 is started; this counter is initially reset and is incremented every time a frame classified by the VAD 4 as not containing speech, i.e. a "silent" frame, comes from the encoder 2. For example, if the VAD 4 determines that N successive frames do not contain speech, after which it determines that a period of speech begins, the counter stops at N, and the value N obtained from the counter in the silent period encoder 5 is encoded in an appropriate form and stored in the memory 3, in the memory location adjacent to the start mark. Before that, the address counter 6 has been incremented, so that the value N is written to the correct memory location. Thereafter, frames containing speech from the encoder 2 are again stored in the memory 3, for as long as the VAD 4 determines that the signal from the A/D converter 1 contains speech. At the start of a new pause, the silent period encoder 5 again supplies a start mark bit pattern to the memory 3, the counter in the silent period encoder 5 starts counting the "silent" frames, and storing of frames from the encoder 2 is interrupted. The operation then continues as described above. Speech frames continue to be stored until the speech ends or the memory 3 is full. The contents of the memory 3 therefore become as shown in Fig. 2: the pauses that occurred between the frames containing speech are not stored as frames, but the start and duration of each pause are recorded.
Thus, the data on a pause requires no more than two memory locations: one for the start mark bit pattern and one for the number of "silent" frames.
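The storing procedure described above can be sketched in Python. This is a minimal illustration, not the patented circuit: memory locations are modelled as a list of byte strings, and `FRAME_BYTES`, the all-ones `START_MARK` pattern and the big-endian encoding of N are hypothetical choices. The source does not say how the start mark is kept distinct from a genuine speech frame, so in this sketch a speech frame equal to `START_MARK` would be misread on playback. The sketch also reserves room for the count whenever it writes a start mark, a detail the source does not discuss:

```python
from typing import Iterable, List, Tuple

FRAME_BYTES = 33                     # hypothetical: one 260-bit frame padded to 33 bytes
START_MARK = b"\xff" * FRAME_BYTES   # hypothetical start-mark bit pattern


def store_speech(frames: Iterable[Tuple[bytes, bool]], capacity: int) -> List[bytes]:
    """Store encoded frames, replacing each pause by START_MARK plus a frame count.

    frames:   (encoded_frame, is_speech) pairs; is_speech is the VAD decision.
    capacity: number of frame-sized memory locations available.
    """
    memory: List[bytes] = []   # list index plays the role of address counter 6
    silent_count = 0           # counter in silent period encoder 5
    in_pause = False

    for frame, is_speech in frames:
        if is_speech:
            if in_pause:
                # Speech resumes: store N next to the start mark (space was reserved).
                memory.append(silent_count.to_bytes(FRAME_BYTES, "big"))
                in_pause = False
            if len(memory) >= capacity:
                break                      # memory full
            memory.append(frame)
        else:
            if not in_pause:
                # Pause begins: write the start mark, keeping room for the count.
                if len(memory) + 2 > capacity:
                    break
                memory.append(START_MARK)
                in_pause = True
                silent_count = 0
            silent_count += 1              # silent frames are counted, not stored

    if in_pause:
        # Input ended during a pause: close the record with its count.
        memory.append(silent_count.to_bytes(FRAME_BYTES, "big"))
    return memory


def f(byte: int) -> bytes:
    """Hypothetical stand-in for one encoded frame."""
    return bytes([byte]) * FRAME_BYTES


mem = store_speech([(f(1), True), (f(0), False), (f(0), False), (f(0), False), (f(2), True)],
                   capacity=100)
print(len(mem))  # 4 locations: speech, start mark, count (3), speech
```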
Messages from an outside caller, i.e. a calling subscriber, may also be stored, in a similar way. In this case, however, a voice activity detector VAD 7 of a different type from the VAD 4 on the transmitting side is required on the receiving side of the radio transceiver, since the received speech is already encoded and no samples of the original digitized speech are available. The received speech is supplied to, and stored in, the memory 3 and is also supplied to the VAD 7. When the VAD 7 determines that a pause is about to begin, a start mark bit pattern is supplied from the silent period encoder 5 and stored in the memory 3, and the number of frames that contain no speech, i.e. "silent" frames, is counted and stored as described above. If the VAD 7 on the receiving side does not need samples of the original speech and can detect pauses from parameters of the received speech signal, the same VAD can be used in both the transmitter and receiver branches.
The encoded speech stored in the memory 3 is decoded by a decoder 9. The speech frames stored in the memory 3 are supplied in succession via a detector 8, which detects the start and the end of each silent period, to the decoder 9, which decodes them. The samples of the speech signal obtained by decoding are supplied to a Digital-to-Analog (D/A) converter 11, which converts them into an analog speech signal. If a start mark bit pattern indicating "pause begins" is found in the bit stream read from the memory 3, the RAM address counter 6 is incremented and the content of the next memory location is read to obtain the duration of the pause, i.e. the encoded number N, which indicates for how many frames the pause lasts. At this stage reading from the memory 3 is interrupted. The detector 8 sends a signal to a muting means 10, if used, so that no bits enter the D/A converter 11 and only low-level noise or silence is output. The counter of the detector 8 counts a number of 20 ms intervals equal to the number stored in the memory location that indicates the duration of the pause (e.g. N or M in Fig. 2), so that a pause of the required duration is produced. The RAM address counter 6 is then restarted, the muting means 10 is opened, and frames containing encoded speech are again read from the memory 3 until the next pause is detected. The RAM address counter 6 operates only when new information is needed from the memory 3: it is incremented, one frame-sized location at a time, when stored frames are read, when a start mark bit pattern is written to the memory 3, and when a start mark and the following bit pattern indicating the duration of a pause are read from it. During a pause the counter does not operate.
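The reproduction side can be sketched the same way, under the same hypothetical memory layout (`FRAME_BYTES`, an all-ones `START_MARK`, big-endian N). Muting is modelled by emitting frames of zero-valued samples, and the caller supplies the decoder; `play_back` and `fake_decode` are illustrative names only:

```python
from typing import Callable, Iterator, List

FRAME_BYTES = 33                     # hypothetical, must match the storage side
START_MARK = b"\xff" * FRAME_BYTES   # hypothetical start-mark bit pattern
FRAME_SAMPLES = 160                  # 20 ms at 8 kHz


def play_back(memory: List[bytes],
              decode: Callable[[bytes], List[int]]) -> Iterator[List[int]]:
    """Yield one 160-sample (20 ms) output frame at a time, re-inserting pauses.

    decode: caller-supplied stand-in for decoder 9 (encoded frame -> PCM samples).
    """
    addr = 0                                   # address counter 6
    while addr < len(memory):
        if memory[addr] == START_MARK:         # detector 8 finds "pause begins"
            if addr + 1 >= len(memory):
                break                          # mark without a count: stop
            n = int.from_bytes(memory[addr + 1], "big")
            addr += 2                          # counter idle during the pause
            for _ in range(n):                 # one muted 20 ms interval per silent frame
                yield [0] * FRAME_SAMPLES
        else:
            yield decode(memory[addr])
            addr += 1


def fake_decode(frame: bytes) -> List[int]:
    """Hypothetical decoder: repeats the frame's first byte as every sample."""
    return [frame[0]] * FRAME_SAMPLES


memory = [b"\x01" * FRAME_BYTES, START_MARK, (2).to_bytes(FRAME_BYTES, "big"), b"\x02" * FRAME_BYTES]
out = list(play_back(memory, fake_decode))
print(len(out))  # 4 frames: speech, two muted frames, speech
```

Each yielded frame stands for 20 ms of output; a real device would feed the frames to the D/A converter 11 at that rate rather than as fast as the generator runs.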
The device makes use of low bit rate speech codecs and of a voice activity detector. A telephone answering device can be installed in a telephone in several ways that are obvious to persons skilled in the art, depending, of course, on the telephone system in question.
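How long a message fits in memory depends on the codec rate and on the proportion of pauses. The text does not give the assumptions behind its figure of 30 to 90 seconds for a 256 kbit SRAM, so the sketch below only shows how these quantities interact; taking 1 kbit as 1024 bits and the `speech_fraction` parameter are assumptions:

```python
MEMORY_BITS = 256 * 1024   # 256 kbit SRAM, taking 1 kbit = 1024 bits (assumption)


def active_speech_seconds(bit_rate_bps: float, memory_bits: int = MEMORY_BITS) -> float:
    """Seconds of active speech that fit, ignoring the pause records."""
    return memory_bits / bit_rate_bps


def message_seconds(bit_rate_bps: float, speech_fraction: float,
                    memory_bits: int = MEMORY_BITS) -> float:
    """Total message length if only `speech_fraction` of it is active speech
    (the two locations per pause are ignored)."""
    if not 0 < speech_fraction <= 1:
        raise ValueError("speech_fraction must be in (0, 1]")
    return active_speech_seconds(bit_rate_bps, memory_bits) / speech_fraction


for rate in (13_000, 7_950):
    print(rate, round(active_speech_seconds(rate), 1), round(message_seconds(rate, 0.5), 1))
```

Under these assumptions the memory holds roughly 20 s of active speech at 13 kbit/s and 33 s at 7.95 kbit/s; longer messages fit only to the extent that they contain pauses.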
In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the present invention. For example, the mute circuit 10 is not indispensable: without it, the audio signal coming from the D/A converter 11 during a pause will be noise, which may be more pleasant to the human ear than total silence. The device naturally also requires various control and other lines between the processor of the telephone, the functional blocks and the keyboard of the device. These are not essential to the invention and may be implemented in several ways within the competence of the person skilled in the art.

Claims (12)

1. A device for storing digitized speech signals encoded into periods of temporally constant duration comprising: memory means for storing said periods of encoded speech signals; means for detecting the absence of active speech in the speech signal and operable to supply in response thereto a first signal indicative of a period not containing active speech to the memory means for storage therein, the memory means being operable in response to the first signal to cease storage of the periods of the encoded speech signal; and means for counting the number of periods of the encoded speech signal not containing active speech and operable to supply a second signal indicative of that number to the memory means for storage therein, said memory means being operable to recommence storage of said periods of the encoded speech signal after storage of said second signal.
2. A device according to claim 1 wherein the memory means is operable to store each successive period at consecutive memory locations within the memory means.
3. A device according to claim 2 wherein the first signal is a bit stream of a predetermined pattern of one period duration, the memory means being operable to store the predetermined bit stream pattern in a memory location next to the location of the last stored period of active speech.
4. A device according to claim 3 wherein the second signal is a bit stream of one period duration containing information on the number of counted periods of inactive speech, the memory means being operable to store the bit stream in a memory location next to the location of the predetermined bit stream pattern.
5. A device according to any preceding claim wherein the detecting means includes a voice activity detector arranged to sample the digitized signal before it is encoded.
6. A device according to any of the claims 1 to 3 wherein the detecting means includes a voice activity detector arranged to sample the digitized speech signal after it is encoded.
7. A device according to any preceding claim wherein the device further includes: means for reproducing the speech signal stored in the memory means; and means for detecting in the periods read from memory means, successive frames containing the first and second signals and operable to supply, in response thereto, to the memory means, a signal such that the memory means inhibits reading out of periods containing active speech subsequent to the detected first and second signals, for a time interval equal to the number of periods stored as the second signal.
8. A digital radio telephone answering device comprising a device according to any preceding claim.
9. A digital radio telephone responder according to claim 8 wherein the voice activity detector is the voice activity detector of the telephone.
10. A digital radio telephone responder according to claim 8 or claim 9 wherein the memory means comprises the central processing unit of the telephone.
11. A device according to any preceding claim, wherein the speech signals are encoded using Vector-Sum Excited Linear Predictive (VSELP) Coding.
12. A device for storing digitized speech signals substantially as herein described with reference to Figure 1 and Figure 2 of the accompanying drawings.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FI911167A FI91457C (en) 1991-03-08 1991-03-08 A method of storing speech in a memory means and reproducing a stored speech and apparatus for its use

Publications (3)

Publication Number Publication Date
GB9204879D0 1992-04-22
GB2254986A 1992-10-21
GB2254986B 1995-05-10

Family

ID=8532086

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9204879A Expired - Fee Related GB2254986B (en) 1991-03-08 1992-03-05 Device for storing and reproducing speech

Country Status (2)

Country Link
FI (1) FI91457C (en)
GB (1) GB2254986B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2272346A (en) * 1992-10-28 1994-05-11 Batalunt Limited Telephone answering system
GB2311189A (en) * 1996-03-15 1997-09-17 Casio Phonemate Inc Telephone answering device
EP0849924A2 (en) * 1996-12-20 1998-06-24 Nokia Mobile Phones Ltd. Method and arrangement for recording a call on a memory medium
AU725922B2 (en) * 1995-04-24 2000-10-26 Nec Corporation Speech reproducing device capable of reproducing long-time speech with reduced memory
GB2370206A (en) * 2000-12-15 2002-06-19 Ericsson Telefon Ab L M Storing a speech signal, e.g. in a mobile telephone

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4802221A (en) * 1986-07-21 1989-01-31 Ncr Corporation Digital system and method for compressing speech signals for storage and transmission

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2272346A (en) * 1992-10-28 1994-05-11 Batalunt Limited Telephone answering system
AU725922B2 (en) * 1995-04-24 2000-10-26 Nec Corporation Speech reproducing device capable of reproducing long-time speech with reduced memory
GB2311189A (en) * 1996-03-15 1997-09-17 Casio Phonemate Inc Telephone answering device
US5768349A (en) * 1996-03-15 1998-06-16 Casio Phonemate, Inc. Voice mail telephone answering device
GB2311189B (en) * 1996-03-15 2000-06-21 Casio Phonemate Inc Voice mail telephone answering device
EP0849924A2 (en) * 1996-12-20 1998-06-24 Nokia Mobile Phones Ltd. Method and arrangement for recording a call on a memory medium
US6138091A (en) * 1996-12-20 2000-10-24 Nokia Mobile Phones Ltd. Method and arrangement for simultaneous recording of incoming and outgoing voice signals with compression of silence periods
EP0849924A3 (en) * 1996-12-20 2003-06-04 Nokia Corporation Method and arrangement for recording a call on a memory medium
GB2370206A (en) * 2000-12-15 2002-06-19 Ericsson Telefon Ab L M Storing a speech signal, e.g. in a mobile telephone

Also Published As

Publication number Publication date
GB2254986B (en) 1995-05-10
FI91457B (en) 1994-03-15
FI911167A (en) 1992-09-09
FI911167A0 (en) 1991-03-08
GB9204879D0 (en) 1992-04-22
FI91457C (en) 1994-06-27

Similar Documents

Publication Publication Date Title
US5251261A (en) Device for the digital recording and reproduction of speech signals
US6055497A (en) System, arrangement, and method for replacing corrupted speech frames and a telecommunications system comprising such arrangement
US7319703B2 (en) Method and apparatus for reducing synchronization delay in packet-based voice terminals by resynchronizing during talk spurts
US5960389A (en) Methods for generating comfort noise during discontinuous transmission
KR100563293B1 (en) Method and system for speech frame error concealment in speech decoding
EP0680033A2 (en) Speech-rate modification for linear-prediction based analysis-by-synthesis speech coders
JP3896400B2 (en) Method and system for recording a digital signal in a storage element
US6718298B1 (en) Digital communications apparatus
JP2010092059A (en) Speech synthesizer based on variable rate speech coding
JPH0927757A (en) Method and device for reproducing sound in course of erasing
GB2254986A (en) Device for storing and reproducing speech
JP2001506764A (en) Methods and arrangements in telecommunications systems
KR101011320B1 (en) Identification and exclusion of pause frames for speech storage, transmission and playback
JP2861889B2 (en) Voice packet transmission system
JP2611728B2 (en) Video encoding / decoding system
US20030065512A1 (en) Communication device and a method for transmitting and receiving of natural speech
JPH10326100A (en) Voice recording method, voice reproducing method, and voice recording and reproducing device
US20050101301A1 (en) Apparatus and method for storing/reproducing voice in a wireless terminal
US6134519A (en) Voice encoder for generating natural background noise
JP3593183B2 (en) Voice decoding device
JP2002287800A (en) Speech signal processor
JPH1146163A (en) Digital portable telephone system
KR100244217B1 (en) The equipment and the method of transmitting and receiving both voice and data at the same time
KR940002078B1 (en) Apparatus for managing visitor by voice response and the record
JP2000078274A (en) Message recorder for variable rate coding system, and method for recording size reduced message in the variable rate coding system

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20060305