WO1992015987A1 - Voice packet substitution method and system - Google Patents

Voice packet substitution method and system

Info

Publication number
WO1992015987A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
packet
digital representations
digital
standard
Prior art date
Application number
PCT/US1992/001214
Other languages
English (en)
Inventor
James J. Berken
Mark Taylor
Huiyu Wang
Paul Odlyzko
Original Assignee
Motorola, Inc.
Priority date
Filing date
Publication date
Application filed by Motorola, Inc.
Publication of WO1992015987A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission

Definitions

  • This invention pertains to a voice technique for substituting missing or substantially corrupted voice segments.
  • the present invention is especially suited, but not limited to, wireless packet systems in which burst type packet errors occur and retransmission of voice packets cannot be utilized.
  • Pitch estimation is a technique in which the pitch of the voiced speech is estimated.
  • a destination device may utilize received voice packets to make pitch estimates. Such estimates can be utilized by the destination device to create a voice segment to be substituted for a missing voice packet. Such a substitution normally results in better perceived audio quality by a listener as compared to utilizing silence in place of a missing voice segment.
  • the destination device learns the position of the missing packets by time stamps and/or sequence numbers in the headers of correctly received packets. If the destination device calculates a "poor" pitch estimate based on a corrupted signal, the performance of the substitution system will be degraded thereby degrading the voice quality.
  • D.J. Goodman, et al. article "Waveform Substitution Techniques for
  • FIG. 1 is a block diagram of a voice packet origination and destination node in accordance with the present invention.
  • FIG. 2a illustrates a voice packet format utilized in a TDMA time slot.
  • FIG. 2b illustrates a TDMA frame format consisting of a plurality of time slots.
  • FIG. 3 illustrates a voice waveform in which a pitch estimate has been made.
  • FIG. 4 is a flow diagram illustrating steps in accordance with the present invention for transmitting a voice packet which includes a pitch estimate.
  • FIG. 5 is a flow diagram illustrating steps in accordance with the present invention for receiving voice packets and making substitutions based on previously received pitch estimates.
  • FIG. 6 is a block diagram of an embodiment for generating a pitch estimate and incorporation of same into a packet at an origination node.
  • FIG. 7 is a block diagram of an embodiment for receiving a packet in accord with the present invention.
  • a transmitted pitch estimate in accordance with the present invention provides improved voice quality by allowing a better selection of stored voice samples for substitution in place of a corrupted or missed voice packet.
  • This packet substitution technique is best suited to address voice packet losses between 0.5 and 10 percent. Below about a 0.5 percent error rate, voice quality is impaired but not normally to a level that many people would find objectionable. For error rates at or above ten percent, the unusable packets are so frequent that substitution techniques have difficulty in maintaining an acceptable voice quality level.
  • the use of a pitch estimate of previously received packets to determine suitable prior voice samples for substitution relies on the pitch of consecutive pitch periods (or cycles) being stationary relative to the packet transmission rate. Such substitution produces a sound quality better than the use of silent gaps in place of unusable or lost voice packets.
  • a communication controller 10 includes a transceiver device
  • the transceiver 22 includes one or more antennas 12 for RF communication.
  • the communication controller is connected by wire 14 to an analog to digital and digital to analog converter 20 which is connected by wire 16 to a telephone 18 that receives and transmits voice data.
  • the communication controller 10 functions to both transmit and receive voice packets within a packet network.
  • the controller receives analog signals from the telephone 18 and converts these signals to a digital representation by A/D converter 20.
  • These digital signals are processed by the MPU 24 under the control of an operational program stored in ROM 26 and RAM 28 into a packet format which includes digital representations of the voice.
  • These digital representations are part of a packet which is transmitted by the RF transmitter portion of the transceiver 22 as an RF signal over antenna 12 to another node in the packet network.
  • a packet signal is received by antenna 12 and demodulated by the RF receiver portion of the transceiver 22.
  • the received digital packet information is processed by the MPU 24 in accordance with ROM 26 and RAM 28 to reconstruct the voice information.
  • the recovered digital representations are reconverted by the digital to analog converter 20 which provides the analog voice information to telephone 18 which is utilized by a user to listen to the transmission. It will be apparent to those skilled in the art that the packet transmission rate will limit the number of different voice channels in a real time voice transmission system.
  • each voice packet 210 is segmented to contain a packet header 211, pitch estimate 212, voice data information 213, and an error check value or CRC 214.
  • a packet header 211 includes relevant packet information such as the packet length, address, and CRC on the header. The packet header is used by the communication controller to assist in routing voice packet information to the correct destination device.
  • the pitch estimate 212 requires a number of voice samples to have occurred before a pitch estimate can be calculated.
  • the pitch estimate may be arbitrarily set to a predetermined value, such as zero, at the transmitter when not enough samples have occurred to allow a pitch estimate to be calculated. After the pitch is estimated, it is inserted into the voice packet 210 and an error checking value such as a CRC 214 is generated and appended to the voice packet 210.
  • a CRC 214 is used to protect bits in each of the digital representations in the voice packet 210 and pitch estimate 212.
  • test listeners found the audio quality of recovered speech more acceptable when there were errors in the least significant bits of the digital speech representation as compared to errors in the most significant bits of the digital representations. This indicated that the most significant bits have a larger impact on voice quality than the least significant bits.
  • the voice quality experiment also indicated that with all bits protected the voice quality declined resulting in the listener hearing "chopped" or "slurred" voice due to the more frequent packet substitutions dictated by CRC failure. This phenomenon results because any single bit error will cause a packet to be substituted at the destination device when all bits are protected.
  • a CRC 214 which protects some but not all of the possible bits comprises an implementation of a predetermined reception accuracy standard
  • the advantage of this discovery relies on the fact that only a portion of the most significant bits need to be protected to ensure that the system is error sensitive enough to detect the corrupted voice signals with the largest impact. This will minimize the number of packet substitutions activated by a CRC failure. This will result in maintaining high voice quality since a substituted voice signal will be used only when needed. For example, where 8 bits is used for one voice sample, protecting the 4 most significant bits by a CRC represents a choice that results in improved voice quality as compared to protecting all 8 bits.
  • FIG. 2a illustrates a voice packet 210 which is transmitted during time slot number two in a TDMA system consisting of a plurality of time slots within each frame 200 as illustrated in FIG. 2b.
  • An originating node in accordance with the present invention generates pitch estimates 212 based on prior voice packets.
  • the pitch estimate 212 is transmitted with each voice packet 210 along with a voice data field 213 which includes digital representations of a number of voice samples.
  • a continuous voice analog voltage waveform such as shown in FIG. 3 is divided into a plurality of consecutive intervals as designated by the marks along the time axis in FIG. 3. These intervals each correspond to different packets appearing in a selected time slot on consecutive frames 200.
  • a plurality of different voice signals can be simultaneously transmitted during a TDMA frame 200, i.e. each time slot functions as a separate channel.
  • the pitch estimate 212 is calculated at the origination node. This is advantageous since the pitch estimate is made based upon consecutive uncorrupted voice samples which results in a more accurately determined pitch estimate than could be made if calculated based upon received information which may contain inaccuracies.
  • FIG. 3 illustrates a voice waveform divided into consecutive intervals as indicated by the marks along the time axis.
  • each interval consists of 16 samples.
  • One known method of determining pitch is to detect the positive and negative peaks of the speech signal. Peak detectors that use center clipping with threshold can provide a voiced/unvoiced classification. In the event of unvoiced speech the origination node forces the pitch estimate to equal 1 interval which causes the receiver to use the previously received packet in place of the corrupted or missed packet.
  • the pitch estimate 212 is calculated by measuring the elapsed time in units of the number of voice samples between consecutive significant positive and consecutive negative peaks of the waveform.
  • the speech pitch is not related to the packet transmission rate, hence a sufficient number of packet voice samples must be stored in memory in order to accommodate the normal pitch period found in human speech.
  • the pitch estimate 212 of the speech received during the preceding packet is used by the destination device to identify the stored digital representations of voice in memory that are to be substituted for the missed (or corrupted) voice packet. For example, if as shown in FIG. 3 a voice packet is missed and the last pitch estimate received was 52, i.e. 52 voice samples, then this pitch is used to identify the location relative to the current packet of the voice samples stored in memory to be used in place of the missing voice packet. In this example the receiver would replace the missing packet of 16 samples with the 16 voice samples received 52 samples earlier.
  • FIG. 4 is a flow diagram illustrating an exemplary method for the transmission of voice data such as by the communication controller 10.
  • N the number of packets transmitted
  • the variable N is incremented by 1 in step 42.
  • the originating device determines if N > X, i.e. if a sufficient number X of voice packets have been evaluated in memory to calculate a pitch estimate. If step 44 is NO, the originating device assigns a pitch estimate equal to the interval size in step 58 and control passes to step 48.
  • step 45 determines if a new pitch interval has been detected.
  • a NO determination by step 45 results in the pitch last calculated by step 46 being used in accord with step 47 and control passing to step 48.
  • a YES determination by step 45 causes a pitch estimate to be calculated in step 46 on the uncorrupted voice signal at the transmitter.
  • a CRC is calculated for a predetermined number of the most significant voice samples in each packet and on the pitch estimate in step 48.
  • the voice packet such as shown in FIG. 2A is transmitted.
  • a decision is made if voice processing is to continue, i.e. is more voiced data to be transmitted?
  • a YES decision returns control to step 42 for processing another voice packet.
  • a NO decision by step 52 terminates this method by RETURN 54.
  • FIG. 5 is a flow diagram illustrating an exemplary method for the reception of voice data transmitted in accord with the method of FIG. 4. Beginning at START 60, variable M which represents the number of received packets is set to zero. In step 62 the receiver attempts to receive a voice packet for a particular voice channel or time slot. In determination step 64 a decision is made if a voice packet is incorrectly received, i.e. not received at all or received with an error in its voice data or pitch estimate as determined by a locally generated CRC being unequal with the received CRC.
  • In step 66, memory at the voice receiver is updated with the pitch estimate and voice samples of the received voice packet. Updating of the memory includes storing the new pitch value, storing the new voice samples, and deleting the oldest of a predetermined number of stored voice samples. Variable M is incremented by one.
  • In step 68 the corresponding output voice signal is generated using the received packet voice samples.
  • In step 70 a determination is made as to whether to continue voice processing, i.e. more voiced packets to be received? A YES decision transfers control back to step 62 to process another packet. A NO decision by step 70 will terminate voice processing at RETURN 72.
  • a YES decision by step 64 could result from receipt of a corrupted voice packet as determined by the CRC or a missing voice packet.
  • the stored pitch estimate is used to identify the location of the stored voice samples (SVS) that will be used for substitution.
  • the voice sample memory is updated with the SVS and M is incremented by one. Updating the voice sample memory with replacement stored voice samples enhances the system's performance by providing reasonably good quality voice samples for future possible substitutions.
  • the voice output signal is generated using the SVS instead of the currently received voice samples. Control is then passed to step 70 which proceeds as explained above.
  • FIG. 6 is a block diagram of an illustrative embodiment of the generation of a pitch estimate and CRC in accordance with the present invention at an origination node.
  • An input terminal 100 receives serial PCM information representative of sequential voice samples (see FIG. 3) from an originating codec. This PCM information is converted into parallel form by serial to parallel converter 102.
  • a pitch calculator 104 receives the parallel information and implements a pitch calculation method. Pitch calculation methods and apparatus for implementing the methods are generally known in the art.
  • the output from the pitch calculator 104 is a pitch estimate which is transmitted with each voice packet. During voiced speech the value of the pitch estimate typically remains unchanged over a number of voice packets since the packet transmission rate is substantially faster than pitch changes.
  • the pitch estimate is converted to serial form by parallel to serial converter 106.
  • OR gate 107 couples either the input serial PCM or the serial pitch estimate to CRC generator 108.
  • the CRC generator calculates a check value which is coupled to packet assembler 110.
  • the assembler also receives the input PCM information in parallel form and the pitch estimate. It provides a control output for elements 106 and 108 to enable each.
  • the parallel to serial converter 106 is enabled at the correct packet time slot so that the pitch estimate is included in the CRC check value.
  • the packet assembler consists of conventional control circuitry as utilized in packet switches known in the art.
  • the function of the packet assembler is to organize information in the correct time relationship to form a packet to be transmitted. In the exemplary embodiment, the information is organized in accordance with FIG. 2A.
  • the packet header may contain a variety of relevant information tailored to a specific packet system and hence is not specifically addressed by this invention.
  • the pitch estimate for the current packet is inserted followed by sixteen bytes of 8 bits corresponding to one voice time slot.
  • the CRC is generated to protect all of the information in the pitch estimate field and a predetermined number of the most significant bits in each of the sixteen voice bytes.
  • providing CRC protection for three of the 8 bits in each voice byte was found to produce better voice quality than using only one bit or using all 8 bits.
  • FIG. 7 is a block diagram of an embodiment for implementing the voice substitution method based on received pitch and CRC in accordance with the present invention.
  • packet bytes of 8 bits and packet clock information are received by the parallel to serial converter 120 which converts the parallel bytes into serial form and provides same to the CRC generator 122 which calculates a CRC based upon the received pitch estimate, voice data, and transmitted CRC fields.
  • a packet disassembler 124 also receives the packet bytes and clock, and generates control signals as will be described.
  • packet disassemblers associated with packet communications systems are known and consist of conventional control circuitry to provide control and timing information to enable a received packet to be disassembled into its constituent parts.
  • a disassembler signal on line 126 supplies control information to the CRC generator 122 which controls the receipt of bits. This permits only the predetermined number of most significant bits to be input to the generator while permitting all of the pitch estimate and transmitted CRC information to be received by the CRC generator 122.
  • the value "zero" in reference 128 is provided as one input to comparator 130. Its other input consists of the output value as determined by CRC generator 122.
  • the comparator upon a command on line 132 compares the CRC generator output with the zero reference. If the comparison is true, i.e. if the CRC generator output is zero, the substitute control line 134 is not enabled. If the comparison is not true, i.e. if CRC generator output is other than zero, substitute control line 134 is enabled and thereby initiates the voice packet substitution in accordance with the present invention.
  • a temporary pitch memory 136 stores the pitch estimate byte when it is present on the packet byte line as determined by disassembler control line 138. Thus, memory 136 stores the pitch estimate for each packet.
  • the last correct pitch memory 140 stores the last correctly received pitch estimate. If the pitch estimate received in the current packet is correctly received as determined by substitute control line 134, the value in memory 136 is transferred to memory 140. If the current pitch estimate is not correctly received, it is not transferred to memory 140.
  • the voice data field in a number of consecutive voice packets is stored in dual port RAM 142 and is utilized in accordance with the present invention to provide substitute voice data when the voice data in a current packet is not correctly received as determined by control line 134.
  • Transmission gate 144 is normally enabled and permits packet bytes to be received by the data input of RAM 142. This gate is inhibited upon determination of an incorrectly received CRC by control line 134. Each voice byte is stored in the RAM at an address location determined by the RAM's input pointer. Control of this pointer is described as follows.
  • Transmission gate 146 couples either the packet clock or the codec clock to counter 148 which counts to the maximum number of voice bytes to be stored in memory of RAM 142.
  • a subtractor 150 receives the output counter value and is able to subtract a predetermined number corresponding to the number of voice bytes contained in a voice packet, i.e. sixteen in the illustrative example. Except when enabled by generation of a substitute command on control line 134, the subtractor merely passes the counter output to the pointer input of RAM 142. Thus, consecutive voice bytes are stored in RAM memory with the oldest byte being overwritten as new bytes are received. Voice data in an incorrectly received packet, i.e. one having a CRC error, will be initially input in RAM 142.
  • transmission gate 146 switches to provide counter 148 with the codec clock and causes the subtractor 150 to be enabled thereby subtracting 16 from the counter value. This effectively repositions the input pointer of RAM 142 to the first voice byte in the incorrectly received packet.
  • Transmission gate 152 is enabled and couples the data out from RAM 142 back to the data input. As will be described below the output pointer is set to a substitute memory location as determined by the last correctly received pitch stored in memory 140. Sixteen voice bytes previously stored in memory will be output at data out of RAM 142 and also input via gate 152 so as to overwrite the voice bytes in the currently received packet which had a CRC error.
  • the output data is converted from parallel to serial form by converter 154 which provides PCM output to the codec, which in turn translates the PCM into analog voice.
  • the output pointer of RAM 142 identifies the memory location address in the RAM which holds data to be output at the data out port.
  • a counter 156 is incremented by the codec clock and contains the same predetermined number as stored in counter 148, i.e. the maximum number of voice bytes stored in RAM 142.
  • the output value of counter 156 is coupled without being altered by subtractor 158 to the output pointer input of RAM 142 except upon receipt of an erroneous packet as determined by control line 134. It should be noted that although counters 148 and 156 contain the same predetermined number, the output value of counter 148 leads the output value of counter 156 so that the voice bytes stored in a memory location in RAM 142 will have been written previous to the attempt by counter 156 to access the same memory location.
  • Upon the output of counter 156 reaching a voice byte of an incorrectly received packet, subtractor 158 will subtract from the output value of counter 156 a number corresponding to the pitch substitution value stored in memory 140. This effectively reindexes the output pointer backwards to the voice samples to be substituted for the erroneous packet in accordance with the pitch estimate. The subtractor 158 continues to subtract as counter 156 indexes through the next sixteen counts which corresponds to the length of voice bytes in the incorrectly received packet. Then, subtractor 158 ceases to subtract the pitch value and passes the actual output value of counter 156 to the output pointer input of RAM 142. This advances the output pointer to the voice byte location it would have been if the current packet had been correctly received.
  • gate 152 couples them to the data input and overwrites the incorrectly received voice bytes in RAM 142 memory.
  • The packet substitution technique in accordance with the present invention can be practiced by a hardware implementation such as described in the embodiment shown in FIGS. 6 and 7 or may be implemented in a microprocessor or preferably a digital signal processor. Further, the particular packet environment and the modulation coding and decoding technique will impact the selection of the number of most significant bits which will yield optimal performance.
  • the calculation of the pitch estimate at the originating device results in a better pitch estimate of the original waveform than if generated based on received data.
  • This is advantageous since a voice packet substitution based on a more accurate pitch estimate will yield better voice quality.
  • Providing protection of only a predetermined number of the most significant bits of the transmitted voice samples with a check value further enhances the quality achieved by the substitution method.
  • voice quality is improved since a substitution voice sample set in this situation often results in worse voice quality than if the received voice samples are used.
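
The peak-based pitch estimate described above (the elapsed time, in voice samples, between consecutive significant peaks, with unvoiced speech forced to one interval) can be illustrated with a minimal Python sketch. The function name `estimate_pitch`, the 50 percent threshold, the use of positive peaks only, and the synthetic test tone are assumptions made for illustration; the description does not specify a particular peak detector.

```python
import math

INTERVAL = 16  # voice samples per packet, as in the illustrative embodiment


def estimate_pitch(samples, threshold_ratio=0.5):
    """Return the spacing, in samples, between the last two significant
    positive peaks, or INTERVAL when no periodicity is found (unvoiced).

    threshold_ratio is a hypothetical tuning parameter: a local maximum
    counts as significant only if it exceeds this fraction of the largest
    sample in the window.
    """
    if not samples:
        return INTERVAL
    peak_level = max(samples)
    if peak_level <= 0:
        return INTERVAL
    threshold = threshold_ratio * peak_level
    # Indices of local maxima that rise above the threshold.
    peaks = [
        i
        for i in range(1, len(samples) - 1)
        if samples[i] > threshold
        and samples[i] > samples[i - 1]
        and samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        return INTERVAL
    return peaks[-1] - peaks[-2]


# Synthetic "voiced" signal with a period of 52 samples (assumed test input).
tone = [math.sin(2 * math.pi * n / 52) for n in range(208)]
pitch = estimate_pitch(tone)
```

Because the estimate is computed on uncorrupted samples at the origination node, a simple detector of this kind may be adequate there, whereas the same detector applied to received data would inherit any channel errors.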
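
The check value scheme described above, in which the CRC covers the whole pitch estimate but only a predetermined number of the most significant bits of each voice byte, can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 6: the helper names, the use of `binascii.crc_hqx` (a 16-bit CRC-CCITT from the Python standard library), the zero-masking of unprotected bits, and the default of three protected bits are all choices made here; the description does not name a CRC polynomial.

```python
import binascii

BITS_PER_SAMPLE = 8


def protected_bytes(pitch, voice, msb_count=3):
    """Build the byte string the CRC is computed over: the pitch estimate
    byte, followed by each voice byte with its unprotected low bits zeroed."""
    mask = (0xFF << (BITS_PER_SAMPLE - msb_count)) & 0xFF
    return bytes([pitch]) + bytes(b & mask for b in voice)


def check_value(pitch, voice, msb_count=3):
    """Hypothetical check value: CRC-CCITT over the protected bits only."""
    return binascii.crc_hqx(protected_bytes(pitch, voice, msb_count), 0xFFFF)


def needs_substitution(pitch, voice, received_crc, msb_count=3):
    """True when the locally generated CRC differs from the received one."""
    return check_value(pitch, voice, msb_count) != received_crc


voice = bytes(range(100, 116))  # 16 voice bytes, one packet (made-up data)
crc = check_value(52, voice)

# Flip a least significant bit, then the most significant bit, of one byte.
lsb_error = bytes([voice[0] ^ 0x01]) + voice[1:]
msb_error = bytes([voice[0] ^ 0x80]) + voice[1:]
```

With these inputs, `needs_substitution(52, lsb_error, crc)` is false and `needs_substitution(52, msb_error, crc)` is true, so only the error with the larger audible impact triggers a packet substitution.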

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A pitch estimate (212) transmitted with voice packets (210) is used by the destination device (10) to determine which stored voice data to substitute for missing or corrupted voice packets. An error check value (214) based on the most significant bits of the transmitted digital voice representation allows the destination device (10) to determine whether the received voice representation meets a minimum quality criterion. If it does not, a substitute representation is made based on a received pitch estimate (212).
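
The destination-side substitution summarized in the abstract can be sketched in Python as follows. The class name `Receiver`, its methods, the 128-sample memory depth, and the fallback to one interval for an unusable pitch value are assumptions made for illustration; the interval of 16 samples and the pitch of 52 follow the example given for FIG. 3.

```python
from collections import deque

INTERVAL = 16   # voice samples per packet (illustrative embodiment)
HISTORY = 128   # hypothetical memory depth, in samples


class Receiver:
    """Keeps recent voice samples and the last correctly received pitch."""

    def __init__(self):
        self.memory = deque([0] * HISTORY, maxlen=HISTORY)
        self.last_pitch = INTERVAL

    def receive(self, packet):
        """packet is (pitch, samples) when correctly received, or None when
        missing or rejected by the check value. Returns the samples to play."""
        if packet is not None:
            pitch, samples = packet
            self.last_pitch = pitch
            out = list(samples)
        else:
            out = self._substitute()
        # Store the played samples (received or substituted) for possible
        # later substitutions; the deque drops the oldest samples itself.
        self.memory.extend(out)
        return out

    def _substitute(self):
        """Copy INTERVAL samples starting last_pitch samples back in memory."""
        # Assumed guard: fall back to one interval if the pitch is unusable.
        lag = self.last_pitch if 0 < self.last_pitch <= HISTORY else INTERVAL
        history = list(self.memory)
        out = []
        for _ in range(INTERVAL):
            sample = history[-lag]
            out.append(sample)
            history.append(sample)
        return out


# Made-up periodic signal with a period of 52 samples.
signal = [n % 52 for n in range(160)]
rx = Receiver()
for k in range(6):
    rx.receive((52, signal[k * INTERVAL:(k + 1) * INTERVAL]))
substituted = rx.receive(None)  # the seventh packet is lost
```

For this exactly periodic test signal the substituted samples equal the lost packet, `signal[96:112]`; real speech is only approximately periodic, so the match would be approximate.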
PCT/US1992/001214 1991-03-08 1992-02-13 Voice packet substitution method and system WO1992015987A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US66675291A 1991-03-08 1991-03-08
US666,752 1991-03-08

Publications (1)

Publication Number Publication Date
WO1992015987A1 (fr) 1992-09-17

Family

ID=24675307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1992/001214 WO1992015987A1 (fr) 1991-03-08 1992-02-13 Voice packet substitution method and system

Country Status (2)

Country Link
MX (1) MX9201012A (fr)
WO (1) WO1992015987A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4907277A (en) * 1983-10-28 1990-03-06 International Business Machines Corp. Method of reconstructing lost data in a digital voice transmission system and transmission system using said method
US5073940A (en) * 1989-11-24 1991-12-17 General Electric Company Method for protecting multi-pulse coders from fading and random pattern bit errors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Volume ASSP-34, Number 6, December 1986, GOODMAN et al., column 1 to column 2, line 14 on page 1440 and Section II on page 441. *

Also Published As

Publication number Publication date
MX9201012A (es) 1992-09-01

Similar Documents

Publication Publication Date Title
US5828672A (en) Estimation of radio channel bit error rate in a digital radio telecommunication network
US6055497A (en) System, arrangement, and method for replacing corrupted speech frames and a telecommunications system comprising such arrangement
CA2024742C (fr) Speech coding apparatus using multimode coding
US4757536A (en) Method and apparatus for transceiving cryptographically encoded digital data
EP0669026B1 (fr) Method and apparatus for suppressing erroneous information blocks in a communication system
RU2121237C1 (ru) Digital cellular communication system
US6687670B2 (en) Error concealment in digital audio receiver
EP1449305B1 (fr) Method for replacing corrupted audio data
EP1458145A1 (fr) Error concealment apparatus and method
JP2538575B2 (ja) Data muting method and apparatus for voice/digital communication systems
AU1251895A (en) Soft error correction in a TDMA radio system
US5687184A (en) Method and circuit arrangement for speech signal transmission
US20140032227A1 (en) Bit error management methods for wireless audio communication channels
US20030220787A1 (en) Method of and apparatus for pitch period estimation
CA2196565C (fr) Signal processing method and system for replacing uncorrectable blocks in a receiver, for block-coded audio signals
JP3165150B2 (ja) Error burst detection
US8238368B2 (en) Method and system making it possible to manage erratic interruptions in a transmission system
KR100332526B1 (ko) System and method for transmitting required audio information over a communication medium
WO1992015987A1 (fr) Voice packet substitution method and system
CA2256355A1 (fr) Method and device for error detection in digital transmission of data packets
US5805612A (en) Mechanism for repeater error mitigation
US20020035468A1 (en) Audio transmission system having a pitch period estimator for bad frame handling
JP2690284B2 (ja) Error correction method for data signals
JP3220321B2 (ja) Line quality estimation circuit
CA1258884A (fr) Method and apparatus for transmitting encrypted digital data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): BR JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU MC NL SE