EP0454552A2 - Method and device for low bit rate speech coding
- Publication number
- EP0454552A2 (application EP91401051A)
- Authority
- EP
- European Patent Office
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L19/07—Line spectrum pair [LSP] vocoders
Definitions
- the present invention relates to a method and a device for low bit rate coding of speech.
- the object of the invention is to overcome the aforementioned drawbacks.
- The subject of the invention is a low bit rate speech coding method, characterized in that it consists, after cutting the speech signal into frames of constant length, in calculating the characteristics of N filters modeling the vocal tract, together with the fundamental period (pitch), voicing and energy characteristics of the speech signal, over determined intervals of N successive frames, the energy of the speech signal being calculated a determined number P of times per frame, and in coding all of these characteristics.
- Figure 1 is a flowchart illustrating the speech coding method implemented by the invention.
- Figure 2 shows a mode of coding the LSP coefficients of the analysis filter used in Figure 1 to model the vocal tract.
- Figure 3 is a table of LSP coefficients.
- Figure 5 is a pitch coding table.
- Figure 6 is a flowchart illustrating the method for synthesizing the speech signal implemented by the invention.
- Figure 7 is a graph illustrating a mode of interpolation of the synthesis filters implemented by the invention.
- Figure 8 shows an embodiment of a device for implementing the method according to the invention.
- The coding method according to the invention consists, after cutting the speech signal into frames of constant length of approximately 20 to 25 ms, as is usual in vocoders, in determining and coding the characteristics of the speech signal over N successive frames, the energy of the signal being determined P times per frame.
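The framing described here can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 8 kHz sampling rate and the 22.5 ms frame length are assumed values (the text only gives a range of about 20 to 25 ms), and the function names are hypothetical.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=22.5):
    """Cut a sample sequence into contiguous frames of constant length.

    sample_rate_hz and frame_ms are assumed values chosen for illustration.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def group_frames(frames, n=3):
    """Group frames into blocks of n successive frames (incomplete tail dropped)."""
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]
```

With these assumed values a frame holds 180 samples, and each block of N = 3 frames is analyzed and coded together.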
- The synthesis of the speech signal on each frame then takes place by deframing and decoding the values of the coded characteristics of the speech signal.
- After the speech signal has been sampled on each frame and the samples quantized over a determined number of bits, the samples are pre-emphasized in step 3.
- The sampling operation makes the spectrum of the speech signal periodic.
- The number of samples taken into account for determining the coefficients of the vocal tract modeling filter is limited in a known manner by multiplying the pre-emphasized samples of step 3 by a Hamming window of duration equal to that of a frame; this window also has the advantage of strengthening the resonances.
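A minimal sketch of the pre-emphasis and windowing steps, under stated assumptions: the text does not give the pre-emphasis filter, so a first-order difference with a hypothetical coefficient `alpha = 0.9375` is used here, and the window is the standard Hamming definition.

```python
import math


def pre_emphasize(samples, alpha=0.9375):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1].

    alpha is an assumed value; the source does not specify the filter.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out


def hamming_window(length):
    """Standard Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def window_frame(frame):
    """Multiply a frame sample by sample with a Hamming window of equal length."""
    return [s * w for s, w in zip(frame, hamming_window(len(frame)))]
```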
- The coefficients K_i of the vocal tract modeling filter are calculated in step 5 from autocorrelation coefficients R_i defined by a relation of the form $R_i = \sum_n S_n S_{n+i}$ (1), where i is an integer ranging from 0 to 10, for example, and S_n represents a sample of the pre-emphasized and windowed signal.
- The coefficients K_i can be calculated in step 5 by applying the known algorithm of Le Roux and Gueguen, described in the article "A fixed point computation of partial correlation coefficients", IEEE Transactions on Acoustics, Speech, and Signal Processing, June 1977. This calculation amounts to inverting a square matrix whose elements are the coefficients R_i of relation (1).
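The following sketch computes autocorrelation coefficients and derives reflection (partial correlation) coefficients from them. Note the substitution: it uses the floating-point Levinson-Durbin recursion as a stand-in, not the fixed-point Le Roux-Gueguen algorithm named in the text; both solve the same set of normal equations. The sign convention for the coefficients (prediction filter A(z) = 1 + sum of a_j z^-j) is an assumption and varies between references.

```python
def autocorrelation(s, order=10):
    """Return R_0..R_order with R_i = sum over n of s[n] * s[n + i]."""
    n = len(s)
    return [sum(s[j] * s[j + i] for j in range(n - i)) for i in range(order + 1)]


def reflection_coefficients(r, order=10):
    """Levinson-Durbin recursion; returns k_1..k_order.

    Stand-in for the Le Roux-Gueguen algorithm cited in the text.
    """
    a = [1.0] + [0.0] * order   # prediction polynomial coefficients
    err = r[0]                  # prediction error energy
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            # Degenerate (e.g. all-zero) frame: no further prediction gain.
            ks.append(0.0)
            continue
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return ks
```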
- F_e represents the sampling frequency of the speech signal.
- the calculation of the fundamental period of the signal and the voicing takes place in a known manner by performing steps 9 and 10.
- the speech signal is classified into two categories of sounds, voiced sounds and unvoiced sounds.
- Voiced sounds, which are produced by the vocal cords, are modeled as a train of impulses whose fundamental period is called the "pitch".
- Unvoiced sounds, produced by turbulence, are modeled as white noise.
- In step 10, the method classifies each frame as either a voiced or an unvoiced sound. This classification takes place after a preprocessing of the signal that reinforces the useful information and limits the rest.
- This preprocessing consists in carrying out a first low-pass filtering of the signal, followed by clipping and a second filtering.
- The first filtering is carried out, for example, by means of a simple Butterworth filter of order 3 whose 3 dB cutoff frequency can be set at 600 Hz.
- The clipping then sets to zero amplitude the signal samples whose level is below a certain predetermined threshold, which may vary with the amplitude of the speech signal. This clipping accentuates the periodic aspect of the signal while reducing the details detrimental to subsequent processing.
- The second filtering smooths the results of the clipping by eliminating the high frequencies.
- A Butterworth filter identical to the first filter can be used.
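The clipping step of this preprocessing can be sketched as below. The two Butterworth filters are left out, and the rule that ties the threshold to the peak amplitude (`ratio = 0.3`) is a hypothetical choice; the text only says the threshold is predetermined and may depend on the signal amplitude.

```python
def clip_below_threshold(samples, ratio=0.3):
    """Set to zero every sample whose magnitude is below ratio * peak.

    ratio is an assumed value used only for illustration.
    """
    peak = max((abs(x) for x in samples), default=0.0)
    threshold = ratio * peak
    return [x if abs(x) >= threshold else 0.0 for x in samples]
```

Samples that survive keep their original amplitude, so the strongest peaks of each pitch period stand out against a zeroed background.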
- The energy calculation, which takes place in step 8, is executed on four subframes. It takes the base-2 logarithm of the sum of the energies of the pre-emphasized samples of each subframe.
- The subframes in each frame are either contiguous or overlapping, so that their length is a multiple of the pitch.
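A minimal sketch of the subframe energy calculation, assuming contiguous subframes of equal length (the pitch-synchronous overlapping variant is not shown). The `floor` parameter is an added assumption that avoids taking the logarithm of zero on silent subframes.

```python
import math


def subframe_log_energies(frame, n_sub=4, floor=1.0):
    """Return log2 of the summed squared samples for each contiguous subframe.

    floor is an assumed lower bound on the energy, used to keep log2 defined.
    """
    sub_len = len(frame) // n_sub
    energies = []
    for i in range(n_sub):
        sub = frame[i * sub_len:(i + 1) * sub_len]
        energy = sum(x * x for x in sub)
        energies.append(math.log2(max(energy, floor)))
    return energies
```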
- The coding of frame 3 is of scalar type. It applies the algorithm known as "Adaptive Backward Sequential", described for example in the article by N. Sugamura and N. Farvardin, "Quantizer design in LSP speech analysis", IEEE Journal on Selected Areas in Communications, vol. 6, February 1988.
- The coding algorithm is executed in descending order of the LSP coefficients, starting with the last, in the manner shown in FIGS. 2 and 3.
- The last LSP coefficient (LSP 10) is coded linearly between two frequency values F10MIN and F10MAX, on NV10 values coded on NB10 bits.
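The linear coding of the last LSP coefficient can be illustrated with a uniform quantizer. This is a sketch under assumptions: it takes NV10 to be 2^NB10 levels spread evenly from F10MIN to F10MAX inclusive, and the numeric values used in the usage note are hypothetical, since the text gives none.

```python
def quantize_linear(value, lo, hi, n_bits):
    """Uniformly quantize value on 2**n_bits levels between lo and hi inclusive.

    Returns (index, reconstructed_value). Values outside [lo, hi] are clamped.
    """
    levels = 1 << n_bits
    step = (hi - lo) / (levels - 1)
    clamped = min(max(value, lo), hi)
    index = int(round((clamped - lo) / step))
    return index, lo + index * step
```

For example, with hypothetical bounds F10MIN = 2000 Hz and F10MAX = 3500 Hz and NB10 = 4 bits, the step between levels is 100 Hz.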
- Frames 1 and 2 are not coded directly; what is coded is the type of interpolation that allows them to be quantized as faithfully as possible.
- The coder determines which of the 3 interpolations represented by the graph of FIG. 4 gives the best approximation of the values of frames 1 and 2.
- The method then chooses, from the 3 previous interpolations, the one that minimizes the quantization error, estimated by means of a function D_INTER defined below, and adopts the corresponding code value.
- $\mathrm{D\_INTER}(i) = W1 \cdot (\mathrm{LSPQ}(\text{case } i, \text{frame } 1) - \mathrm{LSP}(\text{frame } 1))^2 + W2 \cdot (\mathrm{LSPQ}(\text{case } i, \text{frame } 2) - \mathrm{LSP}(\text{frame } 2))^2$, where LSPQ(case i, frame j) is the value of the odd LSP coefficient of frame j quantized by means of type i interpolation,
- LSP(frame j) is the actual value in frame j of the odd LSP coefficient to be quantized,
- W1 is the value of the energy of frame 1.
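The selection among interpolation types can be sketched as a weighted squared-error minimization. The three interpolation rules themselves are defined in FIG. 4 and are not reproduced here, so the candidate values are passed in by the caller; treating W2 as the energy of frame 2 is an assumption made by symmetry with W1.

```python
def d_inter(lspq_f1, lspq_f2, lsp_f1, lsp_f2, w1, w2):
    """Weighted squared quantization error over frames 1 and 2."""
    return w1 * (lspq_f1 - lsp_f1) ** 2 + w2 * (lspq_f2 - lsp_f2) ** 2


def choose_interpolation(candidates, lsp_f1, lsp_f2, w1, w2):
    """Pick the interpolation type with the smallest D_INTER.

    candidates: list of (lspq_frame1, lspq_frame2) pairs, one per type.
    Returns (index_of_best_type, its_error); ties go to the lowest index.
    """
    errors = [d_inter(q1, q2, lsp_f1, lsp_f2, w1, w2) for q1, q2 in candidates]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]
```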
- This coding takes place on 8 bits.
- the pitch and voicing coding take place in step 14 on three consecutive frames.
- The current voicing type is determined among six possible cases from the voicing of frames 1, 2 and 3 and the voicing of frame 0, which precedes each group of frames 1, 2 and 3.
- A coding table represented in FIG. 5 makes it possible to associate with any pitch value a number from the table, subsequently designated by "N. table", whose value is the closest to the pitch.
- The code 0 is assigned to type 1.
- A code equal to the value "N. table" of the pitch of frame 3 is assigned to type 2.
- A code equal to 64, to which is added the value "N. table" of the pitch of frame 3, is assigned to type 3.
- A code equal to 128, to which is added the value "N. table" of the pitch of frame 1, is assigned to type 4.
- A code equal to 192, to which is added the value "N. table" of the pitch of frame 1, is assigned to type 5. Coding of type 6 takes place in a very particular way, by projecting the vector composed of the pitch values of the three frames onto the three eigenvectors (Vect 1, Vect 2, Vect 3) and coding the three projections obtained.
- Vect 1, Vect 2, Vect 3 are an approximation of the first 3 eigenvectors of the cross-correlation matrix.
- the value "N. table" which is the closest to the average (P1 + P2 + P3) / 3 of the pitches of frames 1, 2 and 3.
- The corresponding code is then coded on the 63 values of the coding table.
- The projection on the second eigenvector (Vect 2) is equal to the scalar product of the pitches of frames 1, 2 and 3 by the second eigenvector (Vect 2), and the projection on the third eigenvector (Vect 3) is equal to the scalar product of the pitches of frames 1, 2 and 3 by the third eigenvector (Vect 3).
- the corresponding codes can be obtained respectively on only 4 and 3 values from the coding table.
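The code assignment for voicing types 1 to 5 can be sketched as follows. The pitch table of FIG. 5 is not reproduced in the text, so `HYPOTHETICAL_TABLE` is an invented 63-entry table, and the choice of 1-based table numbers (which keeps code 0 free for type 1) is an assumption. Type 6, which relies on eigenvectors that are not given, is deliberately not handled.

```python
# Invented 63-entry pitch table (in samples); stands in for FIG. 5.
HYPOTHETICAL_TABLE = [20 + 2 * i for i in range(63)]


def nearest_table_number(pitch, table=HYPOTHETICAL_TABLE):
    """Return the 1-based number of the table entry closest to pitch."""
    index = min(range(len(table)), key=lambda i: abs(table[i] - pitch))
    return index + 1


def pitch_voicing_code(voicing_type, pitches, table=HYPOTHETICAL_TABLE):
    """Build the joint pitch and voicing code for voicing types 1 to 5.

    pitches is the triple (P1, P2, P3) for frames 1, 2 and 3.
    """
    p1, _p2, p3 = pitches
    if voicing_type == 1:
        return 0
    if voicing_type == 2:
        return nearest_table_number(p3, table)
    if voicing_type == 3:
        return 64 + nearest_table_number(p3, table)
    if voicing_type == 4:
        return 128 + nearest_table_number(p1, table)
    if voicing_type == 5:
        return 192 + nearest_table_number(p1, table)
    raise ValueError("type 6 uses eigenvector projections and is not sketched here")
```

Under these assumptions every code for types 1 to 5 fits in the range 0 to 255.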
- The coding of the energy, which is carried out in step 15, takes place on three consecutive frames in a known manner described in patent application FR 2 631 146. Four energy values, corresponding to the 4 subframes of each of the three frames, are coded. However, in order to eliminate the redundant information in these 12 values, a principal component analysis is performed, of the type described in the book "Data analysis elements" by Diday, Lemaire, Pouget and Testu, published by Dunod. Coding takes place in two steps. The first step is a change of basis: the energy vector of dimension 12, composed of the 12 energy values of the 3 frames, is projected onto the first 3 principal axes determined during the principal component analysis (more than 97% of the information is contained in these 3 projections).
- The second step consists in quantizing these 3 projections: the first projection is quantized on 4 bits, the second on 3 bits and the third on 2 bits.
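The two-step energy coding can be sketched as a projection followed by scalar quantization. The principal axes and the quantizer ranges are not given in the text, so both are supplied by the caller here, and the uniform quantizer is an assumption; only the 4, 3 and 2 bit allocation comes from the description.

```python
def project(vector, axes):
    """Project a vector onto each axis by scalar product."""
    return [sum(v * a for v, a in zip(vector, axis)) for axis in axes]


def quantize_uniform(x, lo, hi, n_bits):
    """Return the index of the uniform cell of [lo, hi] that contains x."""
    levels = 1 << n_bits
    step = (hi - lo) / levels
    index = int((min(max(x, lo), hi) - lo) / step)
    return min(index, levels - 1)


def code_energy(energies12, axes, ranges, bits=(4, 3, 2)):
    """Code 12 subframe energies as 3 quantized projections (9 bits in total).

    axes: three 12-dimensional principal axes (assumed known from training).
    ranges: three (lo, hi) pairs, hypothetical quantizer bounds.
    """
    projections = project(energies12, axes)
    return [quantize_uniform(p, lo, hi, b)
            for p, (lo, hi), b in zip(projections, ranges, bits)]
```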
- The synthesis takes place according to steps 17 to 28 of the flowchart of FIG. 6: on the one hand, steps 17 to 21 deframe and decode the values of the LSP coefficients of the filter (step 18), of the pitch (step 19), and of the voicing and the energy (step 20) for three consecutive frames; on the other hand, steps 22 to 28 synthesize the speech signal successively for each of the three frames on the basis of the information obtained during steps 17 to 21.
- Deframing and decoding follow the inverse of the framing and coding procedures defined during the analysis illustrated by the flowchart of FIG. 1.
- The shaping of the synthesis filter consists in performing, in step 23, an interpolation of the LSP coefficients over four subframes and a calculation transforming the LSP coefficients into coefficients A_i. This last calculation is followed in step 24 by a calculation of the gain of the synthesis filter for the 4 subframes, to which is added a calculation of the energy of the excitation signal of the filter. In order to avoid abrupt transitions between dissimilar filters, the transitions are made in step 23 in four steps, one every quarter of a frame.
- $\mathrm{LSP}(\mathrm{SSTr}_i, \mathrm{Tr}_N) = \dfrac{\mathrm{LSP}(\mathrm{Tr}_{N-1}) \cdot (4 - i) + \mathrm{LSP}(\mathrm{Tr}_N) \cdot i}{4}$
- where LSP(SSTr_i, Tr_N) designates the value of the interpolated filter in subframe i of frame N.
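The interpolation formula translates directly into code. In this sketch the subframe index i is assumed to run from 1 to 4, so that the last subframe carries exactly the LSP values of frame N; the text does not state the index range.

```python
def interpolate_lsp(lsp_prev, lsp_curr, n_sub=4):
    """Linearly interpolate LSP vectors between frame N-1 and frame N.

    Returns one LSP vector per subframe i = 1..n_sub, computed as
    (lsp_prev * (n_sub - i) + lsp_curr * i) / n_sub.
    """
    return [
        [(p * (n_sub - i) + c * i) / n_sub for p, c in zip(lsp_prev, lsp_curr)]
        for i in range(1, n_sub + 1)
    ]
```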
- The 12 decoded energies correspond to the energy of the speech signal after pre-emphasis; to obtain the energy of the excitation signal, this energy must be divided by the gain of the filter.
- the gain of the filter of each subframe is calculated using the coefficients K i according to the relation
- the last step consists in determining the value of the standard deviation of the energy of each subframe (value used during the calculation of the excitation).
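The relation giving the filter gain is not reproduced in the text. The sketch below therefore assumes the standard identity for an all-pole lattice filter, in which the energy gain equals 1 / prod(1 - K_i^2); this is an assumption and may differ from the patent's own relation. Energies are taken as linear values, and the per-sample normalization in `excitation_std` is likewise a hypothetical choice.

```python
import math


def filter_gain(ks):
    """Energy gain of an all-pole filter from its reflection coefficients.

    Assumes every |k| < 1 (stable filter); uses 1 / prod(1 - k**2).
    """
    gain = 1.0
    for k in ks:
        gain /= (1.0 - k * k)
    return gain


def excitation_energy(signal_energy, ks):
    """Energy of the excitation: the signal energy divided by the filter gain."""
    return signal_energy / filter_gain(ks)


def excitation_std(signal_energy, ks, n_samples):
    """Standard deviation of the excitation over a subframe of n_samples."""
    return math.sqrt(excitation_energy(signal_energy, ks) / n_samples)
```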
- The entire coding and decoding method according to the invention can be executed by means of a microprogrammed structure formed, as shown by way of example in FIG. 8, by a signal processing microprocessor 29 such as that sold by the company Texas Instruments under the designation TMS 320C25.
- the speech signal is first sampled by an analog to digital converter 30 before being applied to a data bus 31 of the microprocessor 29.
- An analog filter 32 coupled to an automatic gain control device 33 filters the speech signal before sampling.
- The programs and the data implemented for the execution of the method according to the invention are recorded in a read-only memory 34 and in a random access memory 35 connected to the microprocessor 29.
- An interface circuit 36 connects the microprocessor 29, via a data line 37, to transmission devices external to the vocoder, not shown.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9005400A FR2661541A1 (fr) | 1990-04-27 | 1990-04-27 | Method and device for low bit rate speech coding |
FR9005400 | 1990-04-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0454552A2 (de) | 1991-10-30 |
EP0454552A3 (en) | 1992-01-02 |
Family
ID=9396170
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19910401051 Withdrawn EP0454552A3 (en) | 1990-04-27 | 1991-04-19 | Method and apparatus for low bitrate speech coding |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP0454552A3 (de) |
JP (1) | JPH05507796A (de) |
CA (1) | CA2079884A1 (de) |
FR (1) | FR2661541A1 (de) |
WO (1) | WO1991017541A1 (de) |
-
1990
- 1990-04-27 FR FR9005400A patent/FR2661541A1/fr not_active Withdrawn
-
1991
- 1991-04-19 EP EP19910401051 patent/EP0454552A3/fr not_active Withdrawn
- 1991-04-19 WO PCT/FR1991/000329 patent/WO1991017541A1/fr active Application Filing
- 1991-04-19 CA CA 2079884 patent/CA2079884A1/fr not_active Abandoned
- 1991-04-19 JP JP91508756A patent/JPH05507796A/ja active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4701955A (en) * | 1982-10-21 | 1987-10-20 | Nec Corporation | Variable frame length vocoder |
US4852179A (en) * | 1987-10-05 | 1989-07-25 | Motorola, Inc. | Variable frame rate, fixed bit rate vocoding method |
EP0428445A1 (de) * | 1989-11-14 | 1991-05-22 | Thomson-Csf | Method and device for coding prediction filters in very low bit rate vocoders |
Non-Patent Citations (3)
Title |
---|
1977 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Hartford, Connecticut, 9-11 May 1977, pages 219-222, IEEE, New York, US; R. VISWANATHAN et al.: "The application of a functional perceptual model of speech to variable-rate LPC systems" * |
1978 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Tulsa, Oklahoma, 10-12 April 1978, pages 458-461, IEEE, New York, US; E. McLARNON: "A method for reducing the transmission rate of a channel vocoder by using frame interpolation" * |
ICASSP '83 - IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Boston, 14-16 April 1983, vol. 1, pages 69-72, IEEE, New York, US; R.M. SCHWARTZ et al.: "A comparison of methods for 300-400 B/S vocoders" * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0543700A2 (de) * | 1991-11-22 | 1993-05-26 | Thomson-Csf | Method for quantizing the speech signal energy in a low bit rate vocoder |
FR2684225A1 (fr) * | 1991-11-22 | 1993-05-28 | Thomson Csf | Method for quantizing the energy of the speech signal in a very low bit rate vocoder. |
EP0543700A3 (en) * | 1991-11-22 | 1993-09-29 | Thomson-Csf | Method for quantification of speed signal energy in a low bit rate vocoder |
EP0573398A2 (de) * | 1992-06-01 | 1993-12-08 | Hughes Aircraft Company | C.E.L.P. - Vocoder |
EP0573398A3 (de) * | 1992-06-01 | 1994-02-16 | Hughes Aircraft Co | |
US5495555A (en) * | 1992-06-01 | 1996-02-27 | Hughes Aircraft Company | High quality low bit rate celp-based speech codec |
Also Published As
Publication number | Publication date |
---|---|
WO1991017541A1 (fr) | 1991-11-14 |
EP0454552A3 (en) | 1992-01-02 |
JPH05507796A (ja) | 1993-11-04 |
CA2079884A1 (fr) | 1991-10-28 |
FR2661541A1 (fr) | 1991-10-31 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): DE ES GB IT |
PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): DE ES GB IT |
17P | Request for examination filed | Effective date: 19920624 |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: THOMSON-CSF |
17Q | First examination report despatched | Effective date: 19940803 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 19941214 |