Method and apparatus for coding an information signal

Info

Publication number
EP1112625A1
EP1112625A1 EP99943854A EP99943854A EP1112625A1 EP 1112625 A1 EP1112625 A1 EP 1112625A1 EP 99943854 A EP99943854 A EP 99943854A EP 99943854 A EP99943854 A EP 99943854A EP 1112625 A1 EP1112625 A1 EP 1112625A1
Authority
EP
European Patent Office
Prior art keywords
positions
pulse
pulses
signal
combinations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP99943854A
Other languages
German (de)
French (fr)
Other versions
EP1112625B1 (en)
EP1112625A4 (en)
Inventor
James P. Ashley
Weimin Peng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc
Publication of EP1112625A1
Publication of EP1112625A4
Application granted
Publication of EP1112625B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/107 Sparse pulse excitation, e.g. by using algebraic codebook

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
  • Control Of Motors That Do Not Use Commutators (AREA)
  • Paper (AREA)
  • Control Of El Displays (AREA)

Abstract

To achieve high quality speech reconstruction at low bit rates, constraints on position combinations among two or more pulses (403) are implemented. By placing constraints on position combinations, certain combinations of pulses are prohibited which allows the most significant pulses to always be coded, thereby improving speech quality. After all valid combinations are considered, a list of pulse pairs (codebook) which can be indexed using a single, predetermined bit length codeword is produced. The codeword is transmitted to a destination where it is used by a decoder to reconstruct the original information signal.

Description

METHOD AND APPARATUS FOR CODING AN INFORMATION SIGNAL
FIELD OF THE INVENTION
The present invention relates, in general, to communication systems and, more particularly, to coding information signals in such communication systems.
BACKGROUND OF THE INVENTION
Code-division multiple access (CDMA) communication systems are well known. One exemplary CDMA communication system is the so-called IS-95 which is defined for use in North America by the Telecommunications Industry Association (TIA). For more information on IS-95, see TIA/EIA/IS-95, Mobile Station-Base-station Compatibility Standard for Dual Mode Wideband Spread Spectrum Cellular System, January 1997, published by the Electronic Industries Association (EIA), 2001 Eye Street, N.W., Washington, D.C. 20006. A variable rate speech codec, and specifically a Code Excited Linear Prediction (CELP) codec, for use in communication systems compatible with IS-95 is defined in the document known as IS-127 and titled Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, September 1996. IS-127 is also published by the Electronic Industries Association (EIA), 2001 Eye Street, N.W., Washington, D.C. 20006.
In modern CELP codecs, there is a problem with maintaining high quality speech reproduction at low bit rates. The problem originates since there are too few bits available to appropriately model the "excitation" sequence or "codevector" which is used as the stimulus to the CELP synthesizer. Thus, a need exists for an improved method and apparatus which overcomes the deficiencies of the prior art.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 generally depicts a CELP decoder as is known in the prior art.
FIG. 2 generally depicts a Code Excited Linear Prediction (CELP) encoder as is known in the prior art.
FIG. 3 generally depicts a joint interleaved pulse permutation matrix in accordance with the invention.
FIG. 4 generally depicts a flow chart describing how the codebook is generated in accordance with the invention.
FIG. 5 generally depicts a joint interleaved pulse permutation matrix for pulses 3 and 4 in accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Stated generally, to achieve high quality speech reconstruction at low bit rates, constraints on position combinations among two or more pulses are implemented. By placing constraints on position combinations, certain combinations of pulses are prohibited which allows the most significant pulses to always be coded, thereby improving speech quality. After all valid combinations are considered, a list of pulse pairs (codebook) which can be indexed using a single, predetermined bit length codeword is produced. The codeword is transmitted to a destination where it is used by a decoder to reconstruct the original information signal.
Stated specifically, a method for coding an information signal comprises the steps of dividing the information signal into blocks and deriving a target signal based on a block of the information signal. The method further includes the steps of coding the target signal using pulse positioning techniques based on an error criteria, wherein the allowable positions of a given pulse are dependent on the positions of one or more other pulses, to produce coded pulse positions and transmitting the coded pulse positions to a destination.
In the preferred embodiment, the information signal further comprises a speech signal or an audio signal, and a block of the information signals further comprises a frame or a subframe of the information signals. The error criteria further comprises a perceptually weighted squared error criteria, and the allowable pulse positions are determined using an arbitrary closed-form expression $F(\lambda)$, in which at least one of the conditions within the expression pertains to at least two of the elements within $\lambda$.
FIG. 1 generally depicts a Code Excited Linear Prediction (CELP) decoder 100 as is known in the art. In modern CELP decoders, there is a problem with maintaining high quality speech reproduction at low bit rates. The problem originates since there are too few bits available to appropriately model the "excitation" sequence or "codevector" $c_k$ which is used as the stimulus to the CELP decoder 100. As shown in FIG. 1, the excitation sequence or "codevector" $c_k$ is generated from a fixed codebook 102 (FCB) using the appropriate codebook index $k$. This signal is scaled using the FCB gain factor $\gamma$ and combined with a signal $E(n)$ output from an adaptive codebook 104 (ACB) and scaled by a factor $\beta$, which is used to model the long term (or periodic) component of a speech signal (with period $\tau$). The signal $E_t(n)$, which represents the total excitation, is used as the input to the LPC synthesis filter 106, which models the coarse short term spectral shape, commonly referred to as "formants". The output of the synthesis filter 106 is then perceptually postfiltered by perceptual postfilter 108, in which the coding distortions are effectively "masked" by amplifying the signal spectra at frequencies that contain high speech energy, and attenuating those frequencies that contain less speech energy. Additionally, the total excitation signal $E_t(n)$ is used as the adaptive codebook for the next block of synthesized speech.
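The signal flow of FIG. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IS-127 decoder: the function name `celp_decode_subframe`, its argument layout, and the sign convention $A_q(z) = 1 + \sum_i a_i z^{-i}$ for the synthesis filter are all hypothetical, and the perceptual postfilter 108 is omitted.

```python
def celp_decode_subframe(c_k, gamma, beta, tau, past_excitation, lpc, synth_state):
    """Rebuild one subframe from the FCB and ACB contributions (illustrative sketch).

    c_k             : fixed-codebook codevector for this subframe
    gamma, beta     : FCB and ACB gains
    tau             : pitch period in samples (1 <= tau <= len(past_excitation))
    past_excitation : earlier total excitation E_t(n), oldest sample first
    lpc             : assumed coefficients a_1..a_p of A_q(z) = 1 + sum a_i z^-i
    synth_state     : last p output samples, most recent first
    """
    history = list(past_excitation)
    total = []
    for n in range(len(c_k)):
        # Adaptive codebook output: total excitation delayed by tau samples.
        acb = history[len(history) - tau]
        e_t = gamma * c_k[n] + beta * acb
        history.append(e_t)  # E_t(n) becomes ACB memory for later samples
        total.append(e_t)

    # All-pole LPC synthesis 1/A_q(z): s(n) = e(n) - sum_i a_i * s(n - i)
    state = list(synth_state)
    speech = []
    for e in total:
        s = e - sum(a * y for a, y in zip(lpc, state))
        speech.append(s)
        state = [s] + state[:-1]
    return speech, total
```

The returned `total` list plays the role of $E_t(n)$ and would be appended to `past_excitation` before decoding the next subframe.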
FIG. 2 generally depicts a CELP encoder 200. Within CELP encoder 200, the goal is to code the perceptually weighted target signal $x_w(n)$, which can be represented in general terms by the z-transform:
$$X_w(z) = S(z)W(z) - \beta E(z)H_{ZS}(z) - H_{ZIR}(z), \tag{1}$$
where $W(z)$ is the transfer function of the perceptual weighting filter 208, and is of the form:
$$W(z) = \frac{A(z/\lambda_1)}{A(z/\lambda_2)}, \tag{2}$$
and $H(z)$ is the transfer function of the perceptually weighted synthesis filters 206 and 210, and is of the form:
$$H(z) = \frac{A(z/\lambda_1)}{A_q(z)\,A(z/\lambda_2)}, \tag{3}$$
and where $A(z)$ are the unquantized direct form LPC coefficients, $A_q(z)$ are the quantized direct form LPC coefficients, and $\lambda_1$ and $\lambda_2$ are perceptual weighting coefficients. Additionally, $H_{ZS}(z)$ is the "zero state" response of $H(z)$ from filter 206, in which the initial state of $H(z)$ is all zeroes, and $H_{ZIR}(z)$ is the "zero input response" of $H(z)$ from filter 210, in which the previous state of $H(z)$ is allowed to evolve with no input excitation. The initial state used for generation of $H_{ZIR}(z)$ is derived from the total excitation $E_t(n)$ from the previous subframe.
To solve for the parameters necessary to generate $x_w(n)$, a fixed codebook (FCB) closed loop analysis in accordance with the invention is described. Here, the codebook index $k$ is chosen to minimize the mean square error between the perceptually weighted target signal $x_w(n)$ and the perceptually weighted excitation signal $\hat{x}_w(n)$. This can be expressed in time domain form as:
$$\min_k \left\{ \sum_{n=0}^{L-1} \bigl( x_w(n) - \gamma_k c_k(n) * h(n) \bigr)^2 \right\}, \quad 0 \le k < M, \tag{4}$$
where $c_k(n)$ is the codevector corresponding to FCB codebook index $k$, $\gamma_k$ is the optimal FCB gain associated with codevector $c_k(n)$, $h(n)$ is the impulse response of the perceptually weighted synthesis filter $H(z)$, $M$ is the codebook size, $L$ is the subframe length, $*$ denotes the convolution process and $\hat{x}_w(n) = \gamma_k c_k(n) * h(n)$. In the preferred embodiment, speech is coded every 20 milliseconds (ms) and each frame includes three subframes of length $L$. Eq. 4 can also be expressed in vector-matrix form as:
$$\min_k \left\{ (x_w - \gamma_k H c_k)^T (x_w - \gamma_k H c_k) \right\}, \quad 0 \le k < M, \tag{5}$$
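where $c_k$ and $x_w$ are length $L$ column vectors, $H$ is the $L \times L$ zero-state convolution matrix:
$$H = \begin{bmatrix} h(0) & 0 & 0 & \cdots & 0 \\ h(1) & h(0) & 0 & \cdots & 0 \\ h(2) & h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ h(L-1) & h(L-2) & h(L-3) & \cdots & h(0) \end{bmatrix}, \tag{6}$$
and $T$ denotes the appropriate vector or matrix transpose. Eq. 5 can be expanded to:
$$\min_k \left\{ x_w^T x_w - 2\gamma_k x_w^T H c_k + \gamma_k^2 c_k^T H^T H c_k \right\}, \quad 0 \le k < M, \tag{7}$$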
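and the optimal codebook gain $\gamma_k$ for codevector $c_k$ can be derived by setting the derivative (with respect to $\gamma_k$) of the above expression to zero:
$$\frac{\partial}{\partial \gamma_k} \left( x_w^T x_w - 2\gamma_k x_w^T H c_k + \gamma_k^2 c_k^T H^T H c_k \right) = 0, \tag{8}$$
and then solving for $\gamma_k$ to yield:
$$\gamma_k = \frac{x_w^T H c_k}{c_k^T H^T H c_k}. \tag{9}$$
Substituting this quantity into Eq. 7 produces:
$$\min_k \left\{ x_w^T x_w - \frac{(x_w^T H c_k)^2}{c_k^T H^T H c_k} \right\}, \quad 0 \le k < M. \tag{10}$$
Since the first term in Eq. 10 is constant with respect to $k$, it can be written as:
$$\max_k \left\{ \frac{(x_w^T H c_k)^2}{c_k^T H^T H c_k} \right\}, \quad 0 \le k < M. \tag{11}$$
From Eq. 11, it is important to note that much of the computational burden associated with the search can be avoided by precomputing the terms in Eq. 11 which do not depend on $k$; namely, by letting $d^T = x_w^T H$ and $\Phi = H^T H$. When this is done, Eq. 11 reduces to:
$$\max_k \left\{ \frac{(d^T c_k)^2}{c_k^T \Phi c_k} \right\}, \quad 0 \le k < M, \tag{12}$$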
which is equivalent to equation 4.5.7.2-1 of IS-127. The process of precomputing these terms is known as "backward filtering". The result is that the index $k$ corresponding to the codevector $c_k$ that results in the minimum squared error between the perceptually weighted target signal $x_w(n)$ and the perceptually weighted excitation signal $\hat{x}_w(n)$ can be found by maximizing the term in Eq. 12.
In the IS-127 half rate case (4.0 kbps), the FCB utilizes a multipulse configuration in which the excitation vector $c_k$ contains very few non-zero, unit magnitude values. This configuration is known in the art as Algebraic CELP, or ACELP. Since there are very few non-zero elements within $c_k$, the computational complexity involved with Eq. 12 is relatively low. For the IS-127 three "pulse" case, there are only 10 bits allocated for the pulse positions and associated signs for each of the three subframes (of length $L$ = 53, 53, 54). In this configuration, an associated "track" defines the allowable positions for each of the three pulses within $c_k$ (3 bits per pulse plus 1 bit for composite sign of +, -, + or -, +, -). As shown in Table 4.5.7.4-1 of IS-127, pulse 1 can occupy positions 0, 7, 14, . . . , 49, pulse 2 can occupy positions 2, 9, 16, . . . , 51, and pulse 3 can occupy positions 4, 11, 18, . . . , 53. This is known as "interleaved pulse permutation", which is well known in the art. The positions of the three pulses are optimized jointly so Eq. 12 is executed $8^3 = 512$ times. The sign bit is then set according to the sign of the gain term $\gamma_k$.
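The search described above can be sketched as follows. The sketch precomputes the backward-filtered target $d = H^T x_w$ and the correlation matrix $H^T H$ once, then scores each sparse codevector by $(d^T c_k)^2 / (c_k^T H^T H c_k)$ and takes the gain as $d^T c_k / (c_k^T H^T H c_k)$. The function names and the representation of a codevector as a list of `(position, amplitude)` pairs are assumptions made for illustration; they are not taken from IS-127.

```python
def backward_filter(x_w, h):
    """d(n) = sum over m >= n of x_w(m) * h(m - n), i.e. d = H^T x_w."""
    L = len(x_w)
    return [sum(x_w[m] * h[m - n] for m in range(n, L)) for n in range(L)]


def correlation_matrix(h, L):
    """phi(i, j) = sum over m >= max(i, j) of h(m - i) * h(m - j), i.e. H^T H."""
    return [[sum(h[m - i] * h[m - j] for m in range(max(i, j), L))
             for j in range(L)] for i in range(L)]


def search_codebook(x_w, h, codebook):
    """Return (best index k, optimal gain) over a list of sparse codevectors.

    Each codevector is a list of (position, amplitude) pairs; h must hold at
    least len(x_w) samples of the weighted synthesis impulse response.
    """
    L = len(x_w)
    d = backward_filter(x_w, h)        # computed once, independent of k
    phi = correlation_matrix(h, L)     # computed once, independent of k
    best_k, best_metric, best_gain = None, -1.0, 0.0
    for k, pulses in enumerate(codebook):
        num = sum(a * d[p] for p, a in pulses)                              # d^T c_k
        den = sum(a * b * phi[p][q] for p, a in pulses for q, b in pulses)  # c_k^T phi c_k
        if den > 0 and num * num / den > best_metric:
            best_k, best_metric, best_gain = k, num * num / den, num / den
    return best_k, best_gain
```

Because each codevector has only a handful of pulses, the inner sums touch a few entries of `d` and `phi` rather than all $L$ samples, which is why the per-candidate cost stays low.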
Table 1
Pulse | Positions
1 | 0, 7, 14, 21, 28, 35, 42, 49
2 | 2, 9, 16, 23, 30, 37, 44, 51
3 | 4, 11, 18, 25, 32, 39, 46, 53
Table 1 generally depicts pulse positions defined for IS-127 Rate 1/2. One problem in the above scenario is that the excitation codevector $c_k$ can contain "holes" in which certain positions are not represented by the vector space. That is, an optimal match to the target vector may require a pulse at position 12, but the definitions of the pulse positions in Table 1 do not allow a pulse to be located at that position. The constraints on positions may cause the pulse to be placed at a location merely close to the optimal position or, worse, the energy of the target signal may be completely missed at that position. This can cause distortion, and possibly audible artifacts, in the synthesized speech signal.
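The "holes" can be counted directly from the three tracks quoted in the text (eight positions per pulse, spaced seven samples apart, starting at 0, 2 and 4). The helper names below are hypothetical and the sketch only enumerates positions; it does not model the search.

```python
def is127_half_rate_tracks():
    """Tracks as listed in the text: 8 positions per pulse, 7 samples apart."""
    return [[start + 7 * m for m in range(8)] for start in (0, 2, 4)]


def unreachable_positions(tracks, subframe_length):
    """Sample positions inside the subframe that no pulse is allowed to occupy."""
    covered = {p for track in tracks for p in track}
    return [p for p in range(subframe_length) if p not in covered]


tracks = is127_half_rate_tracks()
holes = unreachable_positions(tracks, 54)
print(12 in holes)   # position 12 is one of the holes mentioned in the text
print(len(holes))    # number of positions that cannot carry a pulse
```

Only positions congruent to 0, 2 or 4 modulo 7 are reachable, so more than half of each subframe cannot carry a pulse in this configuration.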
In a similar example, a design requirement may be to have four pulses with one pulse on each of four separate tracks, with subframe sizes of $L$ = [53, 53, 54], and a bit allocation of 16 bits per subframe. In this scenario, the tracks would be configured as 4 pulses x 14 positions = 56 total positions, which could be positioned according to the prior art as in Table 2, which depicts examples of pulse positions as used in the prior art. Here, the bit allocation of 16 bits would be divided between the four tracks equally so that each track would receive four bits. The four bits per track would further be composed of three bits for position (comprising 8 different positions) and one sign bit to indicate the polarity of the pulse.
Table 2
As can be seen from this example, there are still holes in the vector space since all of the pulse positions cannot be adequately represented. One solution would be to allow all fourteen positions to be valid, e.g., the positions of pulse $p_0$ would be [0, 4, 8,..., 52], $p_1$ would be [1, 5, 9,..., 53], etc. The problem with this method is that four bits would be required to encode the position information, thereby violating the 16 bit per subframe requirement (4 tracks x (4 position bits + 1 sign bit) = 20 bits).
Another method for pulse coding that is known in the prior art deals with multiplexing the indices of two pulses into a single codeword. For example, in the IS-127 Rate 1 case (8.5 kbps), there are 11 possible pulse positions spread over five tracks. Rather than using four bits for each pulse position, the positions of two pulses can be coded jointly using only seven bits. This is accomplished by considering that the total number of positions for two pulses is 11 x 11 = 121, which is less than the total number of positions that can be coded with seven bits ($2^7 = 128$). Details of the coding can then be expressed as:
$$\text{Codeword} = 11 \left\lfloor \frac{p_i}{5} \right\rfloor + \left\lfloor \frac{p_j}{5} \right\rfloor, \tag{13}$$
where $p_i$ and $p_j$ are the positions of the $i$-th and $j$-th pulses, and $\lfloor x \rfloor$ represents the largest integer $\le x$.
The pulse positions can then be extracted at the decoder by:
$$\lambda_i = \left\lfloor \frac{\text{Codeword}}{11} \right\rfloor, \qquad \lambda_j = \text{Codeword} - 11 \left\lfloor \frac{\text{Codeword}}{11} \right\rfloor, \tag{14}$$
where $\lambda_i$ and $\lambda_j$ are the decimated positions within the appropriate track, which can be decoded using Table 2, where the value of $\lambda$ corresponds to the column in the table. The problem with using this method for the 14 position case in Table 2 is that a 14 x 14 = 196 position multiplex would still require 8 bits ($2^8 = 256$ possible positions), so there is no savings over simply using four bits per pulse. Clearly, none of the above prior art methods adequately represents all positions in the vector space in a way that would allow efficient, low rate coding of pulse positions.
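The two-pulse multiplex and its bit cost can be checked with a short sketch. It assumes, as the text describes, 11 decimated positions per track and five interleaved tracks, so that the decimated position is the sample position integer-divided by 5. The function names are illustrative only.

```python
def encode_pair(p_i, p_j, positions_per_track=11, num_tracks=5):
    """Multiplex two pulse positions into one codeword: 11 * lam_i + lam_j."""
    lam_i = p_i // num_tracks
    lam_j = p_j // num_tracks
    return positions_per_track * lam_i + lam_j


def decode_pair(codeword, positions_per_track=11):
    """Recover the decimated positions (lam_i, lam_j) from the codeword."""
    lam_i = codeword // positions_per_track
    lam_j = codeword - positions_per_track * lam_i
    return lam_i, lam_j


def bits_needed(num_combinations):
    """Smallest codeword length (in bits) that can index every combination."""
    return (num_combinations - 1).bit_length()


print(bits_needed(11 * 11))   # 121 combinations: the seven-bit case in the text
print(bits_needed(14 * 14))   # 196 combinations: no saving over 4 + 4 bits
```

The decoder recovers decimated positions only; the track offset of each pulse is known from which pulse is being decoded.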
As previously mentioned, design of an efficient 16 bit, 4 pulse, 56 position codebook (with all positions representable) is not readily achievable in the prior art. In accordance with the present invention, however, a method is presented which allows all pulse positions to be coded, while maintaining the design constraints as presented in the previous example. In addition, the present invention provides a general flexibility which allows efficient solutions to a wide variety of design constraints.
The present invention solves the aforementioned problems by placing constraints on position combinations among two or more pulses. For example, the allowable positions for a given pulse are jointly dependent on the associated positions of one or more other pulses. This can be seen for the 14 position track example in FIG. 3, where a joint interleaved pulse permutation matrix in accordance with the invention is shown. In this embodiment, the matrix depicted in FIG. 3 is for pulses 0 and 1, and the subframe length is $L = 54$. In this figure, the respective positions of pulse 0 are shown along the horizontal axis, and the positions of pulse 1 are shown along the vertical axis. The "forbidden" pulse combinations are designated by the shaded regions while the allowable combinations are unshaded. As one may notice, the number of unshaded regions is exactly the number of combinations that can be represented by the given number of bits, in this case $2^7 = 128$, and the number of shaded regions is exactly the total number of decimated positions of pulse 0 times the total number of decimated positions of pulse 1 minus the number of combinations that can be represented by the given number of bits, i.e., (14 x 14) - 128 = 68.
As the various pulse position codevectors are searched (via Eq. 12), when pulse $p_1$ is placed at $\lambda_1 = 0$ (corresponding to position (0 x 4) + 1 = 1), then the allowable positions for pulse $p_0$ would be [4, 8, 16, 20, 28, 32, 40, 48, 52].
Likewise, when pulse $p_1$ is placed at position 5 ($\lambda_1 = 1$), the allowable positions for pulse $p_0$ would be [0, 8, 12, 20, 24, 32, 36, 44, 52], and so on. After considering all valid combinations, a 128 x 2 list of pulse pairs (codebook) that can be indexed using a single 7 bit codeword is produced in accordance with the invention. This codeword is suitable for transmission to a destination for decoding and reconstruction. Furthermore, this codebook can be generated algebraically at run time, stored in volatile memory (RAM), or stored in nonvolatile memory (ROM).
FIG. 4 generally depicts a flow chart describing how the codebook is generated in accordance with the invention. First, the flowchart shows a basic nested loop structure in which all permutations of $0 \le i < M$ and $0 \le j < N$ are generated. In this example, $N$ and $M$ are the total number of allowable positions for each pulse. The decision in the innermost loop simply checks for forbidden combinations $[i,j]$ according to function $F(i,j)$ at step 402, which in the example of FIG. 3 is described as:
$$F(i,j) = \begin{cases} 1, & |i - j| \in [0, 3, 6, 9, 11] \\ 0, & \text{otherwise} \end{cases} \tag{15}$$
This function returns a value of 1 for cases when the absolute value of the difference of $i$ and $j$ is an element of the given set; otherwise, a zero is returned. This is shown in step 403. The elements of the given set correspond to the distances between the diagonal shaded elements of FIG. 3, and the expression is therefore sufficient in describing all necessary shaded regions. For allowed pulse combinations, the respective positions are calculated using the following expression:
$$G(\lambda, n) = \lambda \times N_{tracks} + n, \tag{16}$$
where $\lambda$ is the decimated track position, $N_{tracks}$ is the number of tracks, and $n$ is the track number. Once the codebook entry has been generated at step 403, the codebook index $k$ is incremented at step 404, and the process continues until the entire codebook is filled via steps 400-401 and 405-408. A similar technique would be used for generating position information for pulses $p_2$ and $p_3$ of the given example.
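The nested loop of FIG. 4 can be sketched as below. Two things are assumptions rather than statements from the text: the forbidden-distance set {0, 3, 6, 9, 11} is inferred from the allowable positions quoted earlier for pulse $p_1$ at positions 1 and 5 (it also yields exactly 68 forbidden and 128 allowed pairs), and the loop order that determines the codebook index $k$ is chosen arbitrarily. The names `F`, `G` and `build_codebook` mirror the symbols in the text but the code itself is illustrative.

```python
# Assumed set of forbidden |i - j| distances, inferred from the quoted examples.
FORBIDDEN_DISTANCES = {0, 3, 6, 9, 11}


def F(i, j):
    """Return 1 for a forbidden (shaded) combination of decimated positions."""
    return 1 if abs(i - j) in FORBIDDEN_DISTANCES else 0


def G(lam, n, num_tracks=4):
    """Map decimated position lam on track n to a sample position."""
    return lam * num_tracks + n


def build_codebook(M=14, N=14, tracks=(0, 1), forbidden=F):
    """List every allowed (p0, p1) pair; the list index serves as codeword k."""
    codebook = []
    for i in range(M):            # decimated position of the first pulse
        for j in range(N):        # decimated position of the second pulse
            if forbidden(i, j):
                continue          # shaded cell: skip, do not advance k
            codebook.append((G(i, tracks[0]), G(j, tracks[1])))
    return codebook


codebook = build_codebook()
print(len(codebook))                                    # size of the pair list
print(sorted(p0 for p0, p1 in codebook if p1 == 1))     # p0 choices when p1 is at 1
```

With 128 entries, any pair in the list can be addressed by a single 7 bit index, and a decoder holding the same list recovers both positions by table lookup.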
Although the previous example shows the forbidden regions to be strict upper left to lower right diagonal, any pattern utilizing 128 unshaded regions is feasible and assumed to be within the scope of the invention. Another aspect of the preferred embodiment is explained as follows: there are 4 x 14 = 56 total possible pulse positions. The length of a subframe, however, is not greater than 54 samples. Therefore, dedicating positions to locations greater than 53 (or 52 for subframes one and two) results in reduced coding efficiency, and thus, degraded quality. FIG. 5 generally depicts a joint interleaved pulse permutation matrix for pulses p2 and p3 in accordance with the present invention. As shown in FIG. 5, the positions 54 and 55 are omitted by the shaded regions, which allows more combinations to be represented in the valid vector space since the total number of unshaded regions is still 128. This can be observed by comparing the relative spacing between the diagonals in FIG. 3 and FIG. 5, where FIG. 3 has generally two spaces between forbidden diagonals while FIG. 5 has three spaces. The closed form expression for the forbidden combinations of FIG. 5 can be expressed as:
$$F(i,j) = \begin{cases} 1, & |i - j| \in \{0, 4, 8\} \\ 1, & i = M - 1 \ \text{or}\ j = N - 1 \\ 0, & \text{otherwise} \end{cases} \tag{17}$$
As one may observe, the example in FIG. 5 is inherently less restrictive and therefore results in higher coding accuracy.
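The counts quoted above can be checked with a short Python sketch. It assumes M = N = 14 decimated positions per track (from 4 x 14 = 56) and assumes that pulses p2 and p3 lie on tracks 2 and 3, which is consistent with positions 54 and 55 being the ones omitted; the function and variable names are illustrative only.

```python
M = N = 14      # decimated positions per track (assumed from 4 x 14 = 56)
N_TRACKS = 4    # number of interleaved tracks


def forbidden_fig5(i, j, m=M, n=N):
    """Equation (17): return 1 for a forbidden (i, j) pair, 0 otherwise."""
    if abs(i - j) in (0, 4, 8):
        return 1  # forbidden diagonals
    if i == m - 1 or j == n - 1:
        return 1  # last decimated position on either track
    return 0


# Every (i, j) pair that Equation (17) leaves unshaded.
allowed = [(i, j) for i in range(M) for j in range(N) if not forbidden_fig5(i, j)]

# Map each allowed pair to sample positions with Equation (16).
# Track numbers 2 and 3 are an assumption for pulses p2 and p3.
positions = [(i * N_TRACKS + 2, j * N_TRACKS + 3) for i, j in allowed]
max_position = max(max(pair) for pair in positions)

print(len(allowed), max_position)
```

Under these assumptions the sketch reports 128 allowed combinations, which fit exactly in a 7-bit index, and a largest sample position of 51, so no index is spent on locations beyond the end of the subframe.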
As one skilled in the art will appreciate, it is possible to form upper right to lower left diagonals and a number of various other patterns that may benefit a specific application using the techniques described herein in accordance with the invention. Furthermore, it is possible to extend the dimension of the number of pulses to beyond two so that any closed-form expression F(λ) is allowed, where λ is the vector of candidate pulse positions, and n is the number of pulses.
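A minimal sketch of this generalization, assuming the forbidden function takes the whole candidate vector as a tuple: the helper `allowed_vectors` and the example rule `pairwise_distinct` are hypothetical and are shown only to illustrate a condition that involves more than one element of the vector.

```python
from itertools import product


def allowed_vectors(f, sizes):
    """Yield every candidate vector lam for which f(lam) == 0.

    sizes[k] is the number of decimated positions available to pulse k.
    """
    for lam in product(*(range(s) for s in sizes)):
        if not f(lam):
            yield lam


def pairwise_distinct(lam):
    """Hypothetical rule: forbid any two pulses sharing a decimated position."""
    return 1 if len(set(lam)) < len(lam) else 0


# Three pulses, three decimated positions each.
vectors = list(allowed_vectors(pairwise_distinct, (3, 3, 3)))
```

With three pulses and three positions each, only the six orderings of (0, 1, 2) survive, so the valid vector space shrinks from 27 candidates to 6.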
While the invention has been particularly shown and described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. The corresponding structures, materials, acts and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or acts for performing the functions in combination with other claimed elements as specifically claimed.

Claims

What we claim is:
1. A method for coding an information signal comprising the steps of:
a) dividing the information signal into blocks;
b) deriving a target signal based on a block of the information signal;
c) coding the target signal using pulse positioning techniques based on an error criteria, wherein the allowable positions of a given pulse are dependent on the positions of one or more other pulses, to produce coded pulse positions; and
d) transmitting the coded pulse positions to a destination.
2. The method in claim 1, wherein the information signal further comprises a speech signal or an audio signal.
3. The method in claim 1, wherein a block of the information signals further comprise a frame or a subframe of the information signals.
4. The method in claim 1, wherein the error criteria further comprises a perceptually weighted squared error criteria.
5. The method in claim 1, wherein the allowable pulse positions are determined using an arbitrary closed-form expression F(λ), in which at least one of the conditions within the expression pertain to at least two of the elements within λ.
EP99943854A 1998-09-11 1999-08-24 Method for coding an information signal Expired - Lifetime EP1112625B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15143098A 1998-09-11 1998-09-11
US151430 1998-09-11
PCT/US1999/019217 WO2000016501A1 (en) 1998-09-11 1999-08-24 Method and apparatus for coding an information signal

Publications (3)

Publication Number Publication Date
EP1112625A1 EP1112625A1 (en) 2001-07-04
EP1112625A4 EP1112625A4 (en) 2004-06-16
EP1112625B1 EP1112625B1 (en) 2006-05-31

Family

ID=22538745

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99943854A Expired - Lifetime EP1112625B1 (en) 1998-09-11 1999-08-24 Method for coding an information signal

Country Status (6)

Country Link
EP (1) EP1112625B1 (en)
JP (1) JP4460165B2 (en)
KR (1) KR100409167B1 (en)
AT (1) ATE328407T1 (en)
DE (1) DE69931641T2 (en)
WO (1) WO2000016501A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6539349B1 (en) 2000-02-15 2003-03-25 Lucent Technologies Inc. Constraining pulse positions in CELP vocoding
US7889103B2 (en) * 2008-03-13 2011-02-15 Motorola Mobility, Inc. Method and apparatus for low complexity combinatorial coding of signals

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0397628A1 (en) * 1989-05-11 1990-11-14 Telefonaktiebolaget L M Ericsson Excitation pulse positioning method in a linear predictive speech coder

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2579356B1 (en) * 1985-03-22 1987-05-07 Cit Alcatel LOW-THROUGHPUT CODING METHOD OF MULTI-PULSE EXCITATION SIGNAL SPEECH
US5754976A (en) * 1990-02-23 1998-05-19 Universite De Sherbrooke Algebraic codebook with signal-selected pulse amplitude/position combinations for fast coding of speech
JP3057907B2 (en) * 1992-06-16 2000-07-04 松下電器産業株式会社 Audio coding device
KR950011967B1 (en) * 1992-07-31 1995-10-12 임홍식 Memory rearangement device for semiconductor recorder
JP3196595B2 (en) * 1995-09-27 2001-08-06 日本電気株式会社 Audio coding device
JP4063911B2 (en) * 1996-02-21 2008-03-19 松下電器産業株式会社 Speech encoding device
US5970444A (en) * 1997-03-13 1999-10-19 Nippon Telegraph And Telephone Corporation Speech coding method
US5963897A (en) * 1998-02-27 1999-10-05 Lernout & Hauspie Speech Products N.V. Apparatus and method for hybrid excited linear prediction speech encoding
JP3180762B2 (en) * 1998-05-11 2001-06-25 日本電気株式会社 Audio encoding device and audio decoding device
JP3824810B2 (en) * 1998-09-01 2006-09-20 富士通株式会社 Speech coding method, speech coding apparatus, and speech decoding apparatus

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0397628A1 (en) * 1989-05-11 1990-11-14 Telefonaktiebolaget L M Ericsson Excitation pulse positioning method in a linear predictive speech coder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG DEYUAN ED - YUAN B ET AL: "An 8 kb/s low complexity ACELP speech codec" SIGNAL PROCESSING, 1996., 3RD INTERNATIONAL CONFERENCE ON BEIJING, CHINA 14-18 OCT. 1996, NEW YORK, NY, USA,IEEE, US, 14 October 1996 (1996-10-14), pages 671-674, XP010209596 ISBN: 0-7803-2912-0 *
See also references of WO0016501A1 *

Also Published As

Publication number Publication date
KR100409167B1 (en) 2003-12-12
EP1112625B1 (en) 2006-05-31
JP2002525667A (en) 2002-08-13
DE69931641D1 (en) 2006-07-06
JP4460165B2 (en) 2010-05-12
ATE328407T1 (en) 2006-06-15
DE69931641T2 (en) 2006-10-05
WO2000016501A1 (en) 2000-03-23
KR20010073146A (en) 2001-07-31
EP1112625A4 (en) 2004-06-16

Similar Documents

Publication Publication Date Title
US6236960B1 (en) Factorial packing method and apparatus for information coding
US6141638A (en) Method and apparatus for coding an information signal
US7280959B2 (en) Indexing pulse positions and signs in algebraic codebooks for coding of wideband signals
DE69928288T2 (en) CODING PERIODIC LANGUAGE
US6055496A (en) Vector quantization in celp speech coder
EP1235203B1 (en) Method for concealing erased speech frames and decoder therefor
KR20010024935A (en) Speech coding
EP2805324B1 (en) System and method for mixed codebook excitation for speech coding
EP0815554A1 (en) Analysis-by-synthesis linear predictive speech coder
US6678651B2 (en) Short-term enhancement in CELP speech coding
US6826527B1 (en) Concealment of frame erasures and method
US6415252B1 (en) Method and apparatus for coding and decoding speech
EP1103953B1 (en) Method for concealing erased speech frames
EP1112625B1 (en) Method for coding an information signal
KR100718487B1 (en) Harmonic noise weighting in digital speech coders
Bessette et al. Techniques for high-quality ACELP coding of wideband speech
WO2002023536A2 (en) Formant emphasis in celp speech coding
JP3166697B2 (en) Audio encoding / decoding device and system
Saleem et al. Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality
EP1212750A1 (en) Multimode vselp speech coder
Ravishankar et al. Voice Coding Technology for Digital Aeronautical Communications

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010411

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

A4 Supplementary search report drawn up and despatched

Effective date: 20040506

RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 10L 19/10 B

Ipc: 7H 04B 7/216 A

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RTI1 Title (correction)

Free format text: METHOD FOR CODING AN INFORMATION SIGNAL

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

Ref country code: LI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

Ref country code: CH

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 69931641

Country of ref document: DE

Date of ref document: 20060706

Kind code of ref document: P

REG Reference to a national code

Ref country code: SE

Ref legal event code: TRGR

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060824

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060831

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060831

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20060831

Year of fee payment: 8

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060911

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20061031

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070301

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060901

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060824

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070824

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20110127 AND 20110202

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 69931641

Country of ref document: DE

Owner name: MOTOROLA MOBILITY, INC. ( N.D. GES. D. STAATES, US

Free format text: FORMER OWNER: MOTOROLA, INC., SCHAUMBURG, ILL., US

Effective date: 20110324

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

Owner name: MOTOROLA MOBILITY, INC., US

Effective date: 20110912

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FI

Payment date: 20160831

Year of fee payment: 18

Ref country code: GB

Payment date: 20160830

Year of fee payment: 18

Ref country code: DE

Payment date: 20160826

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20160825

Year of fee payment: 18

Ref country code: SE

Payment date: 20160829

Year of fee payment: 18

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20170831 AND 20170906

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

Owner name: GOOGLE TECHNOLOGY HOLDINGS LLC, US

Effective date: 20171214

Ref country code: FR

Ref legal event code: CD

Owner name: GOOGLE TECHNOLOGY HOLDINGS LLC, US

Effective date: 20171214

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 69931641

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20170824

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170825

Ref country code: FI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170824

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20180430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170824

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180301

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170831

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230520