CN1189862C - Decoder for phoneme of speech sound - Google Patents

Decoder for phoneme of speech sound

Info

Publication number
CN1189862C
CN1189862C CNB021059365A CN02105936A
Authority
CN
China
Prior art keywords
frame
parameter
unit
speech
pitch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB021059365A
Other languages
Chinese (zh)
Other versions
CN1450529A (en)
Inventor
杨凰琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Co Ltd
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to CNB021059365A priority Critical patent/CN1189862C/en
Publication of CN1450529A publication Critical patent/CN1450529A/en
Application granted granted Critical
Publication of CN1189862C publication Critical patent/CN1189862C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Electrophonic Musical Instruments (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a speech phoneme decoder for decoding speech data that has been encoded into three parameters: pitch period, amplitude, and spectrum. The decoder comprises an initialization unit, a parameter loading unit, a smoothing unit, a synthesis unit, and a voice output unit. After the initialization unit issues an initialization signal, the three parameters (pitch period, amplitude, and spectrum) are loaded and passed to the smoothing unit. On receiving the speech parameter data, the smoothing unit processes it by interpolation and forwards the result to the synthesis unit, which synthesizes speech data from the pitch-period, spectrum, and amplitude parameters in sequence and outputs it to the voice output unit.

Description

Decoder for phoneme of speech sound
Technical field
The present invention relates to a speech synthesizer, and more particularly to a speech phoneme decoder that decodes encoded speech on a phoneme basis.
Background technology
In the low- and mid-range electronic dictionary market, a "real human voice" pronunciation feature has become a key selling point. To stay competitive, every manufacturer works on improving the speech function while also trying to reduce production costs. Some manufacturers emphasize playback of recorded real speech, but its data volume is large, the vocabulary the system can output is severely limited, and the cost is considerable. Most manufacturers therefore approximate real pronunciation by analysis-synthesis of speech, which lets an electronic dictionary save speech-data memory while improving sound quality.
Analysis-synthesis analyzes the speech signal according to some processing method, extracts the necessary characteristic parameters, and then synthesizes speech from those parameters using a model of speech production. Different characteristic parameters therefore lead to corresponding speech coding and speech synthesis methods.
Because analysis-synthesis represents the original speech signal with a minimum of digital data, it is also commonly called speech compression; it involves the sampling, coding, and decoding of speech. For example, Adaptive Differential Pulse Code Modulation (ADPCM), a waveform coding scheme, aims to make the reconstructed signal resemble the original waveform as closely as possible; mathematically it adopts the minimum mean square error criterion. However, when the bit rate of ADPCM drops below 24 kbps (kilobits per second) the sound quality degrades, and the computational load is large.
The analysis-synthesis approach described above dramatically compresses the speech data volume and, combined with encryption, can also offer the extra advantage of secure communication. Its drawback is that the stress, tone, and pitch of the synthesized speech often differ from natural speech, making it sound unnatural and sometimes even hard to recognize.
Even with such compression there is still room to save memory. Moreover, most existing analysis-synthesis techniques operate on-line, so a "speech present" (voiced/unvoiced) decision must be added; in that decision the "voiced" portion is often misjudged as "unvoiced", which makes the synthesized speech sound hoarse.
Important research topics are therefore: how to make the speech produced by analysis-synthesis approach natural speech, i.e. improve the sound quality; how to reach the highest possible compression, i.e. occupy the least memory; and how to keep the analysis-synthesis process relatively simple.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a coding scheme based on speech phoneme classification: phonemes are divided into voiced, unvoiced, and silent, and only the voiced portion is encoded. During decoding, the encoded voiced portion is decoded simply by using the speech phoneme decoder of the present invention.
The object of the present invention is achieved as follows:
The invention provides a speech phoneme decoder that decodes speech data encoded as an amplitude parameter (RMS), a pitch parameter (Pitch), and spectrum parameters (RC's) coded by linear predictive coding (LPC). The encoded speech data is stored in a speech database, and the decoder decodes the speech data held there. The speech phoneme decoder of the present invention comprises: an initialization unit, a parameter loading unit, a smoothing unit, a synthesis unit, and a voice output unit.
The initialization unit generates an initialization signal (initial). The parameter loading unit, connected to the initialization unit, receives the initialization signal and loads speech data from the speech database one frame (Frame) at a time. The smoothing unit receives the speech data of the current frame loaded by the parameter loading unit and, taking the pitch (Pitch) in that frame as the processing length, smooths the amplitude, pitch, and spectrum parameters of the frame by interpolation; when it has finished processing the current frame, it sends a next-frame signal to the parameter loading unit so that the next frame's speech data is loaded. The synthesis unit receives the speech data of each pitch period processed by the smoothing unit and synthesizes it into a speech signal; after processing the speech data of each pitch period, it sends a next-pitch signal to the smoothing unit so that the next pitch period's data is processed. Finally, the synthesis unit delivers the synthesized speech signal to the voice output unit, which outputs the speech.
Smoothing is performed by interpolation, for which a scale parameter (Prop, proportion) must be calculated. Because synthesis is pitch-synchronous (one pitch period is synthesized at a time), the total length of the periods synthesized within a frame must stay below the speech length to be synthesized for that frame (Frame_len); the residual length (Frame_res = Frame_len − Synths) is carried over to the next frame, so the speech length to be synthesized in the next frame is Frame_len = Frame_res + 180. The scale parameter is Prop = (Synths + PitchI) / Frame_len.
Specifically, the technical solution disclosed by the invention includes the following features:
The parameter loading unit comprises a parameter decoder which decodes the pitch, amplitude, and spectrum parameters according to their coding order and outputs them in parallel to the smoothing unit.
After loading the speech data of the current frame, the parameter loading unit buffers it temporarily; upon receiving the next-frame signal sent by the smoothing unit, it loads the speech data of the next frame and delivers the speech data of both the current frame and the next frame to the smoothing unit.
The smoothing unit processes the speech data of the current frame and of the next frame by interpolation and outputs the result to the synthesis unit.
The smoothing unit comprises:
a ratio calculation unit, for calculating the ratio of the length already synthesized within the current frame to the current frame's synthesis frame length;
a pitch parameter smoothing unit, for receiving the pitch parameter of the current frame and the pitch parameter of the next frame and computing a synthesized pitch parameter by interpolation;
an amplitude parameter smoothing unit, for receiving the amplitude parameter of the current frame and the amplitude parameter of the next frame and computing a synthesized amplitude parameter by interpolation;
a spectrum parameter smoothing unit, for receiving the spectrum parameter of the current frame and the spectrum parameter of the next frame and computing a synthesized spectrum parameter by interpolation;
a synthesis length calculation unit, for calculating the synthesized length of the current frame, feeding the result to the ratio calculation unit, and outputting the next-frame signal to the parameter loading unit; and
a buffer, for storing the synthesized pitch parameter, the synthesized amplitude parameter, and the synthesized spectrum parameter and outputting them to the synthesis unit.
The synthesis unit comprises:
a pulse train generator, for outputting an excitation signal according to the pitch parameter;
a vocal tract filter, for receiving the excitation signal and, using the spectrum parameters as its filter coefficients, processing it into a synthesized speech signal; and
an amplitude adjustment unit, for multiplying the synthesized speech signal by the amplitude parameter to produce restored speech, which is output to the voice output unit.
The synthesis unit may further comprise a memory for temporarily storing the synthesized speech signal and the restored speech and for outputting the restored speech to the voice output unit.
The features of the present invention are described in detail below with reference to the drawings and the preferred embodiment.
Description of drawings
Fig. 1 is a system architecture diagram of the speech phoneme decoder of the present invention;
Fig. 2 shows a specific embodiment of the speech phoneme decoder of the present invention;
Fig. 3 is an architecture diagram of the initialization unit and the parameter loading unit in the embodiment;
Fig. 4 is an architecture diagram of the smoothing unit in the embodiment;
Fig. 5 is an architecture diagram of the synthesis unit in the embodiment.
Embodiment
Because the speech material in the electronic dictionary market is relatively regular and the required amount of data compression is large, the present invention uses linear predictive coding (Linear Predictive Coding, hereinafter LPC) as its coding and decoding scheme. Based on a model of speech production, LPC achieves compression by estimating the vocal tract filter parameters and the pitch (Pitch), and it can reach a very low bit rate, which makes it well suited as the coding method of the present invention.
The present invention classifies speech phonemes into "voiced", "unvoiced" (aspirant), and "silent". Voiced phonemes are compressed and encoded; unvoiced phonemes keep their original samples uncompressed; for silence only its duration is recorded. The parameters computed under this classification are of three kinds: amplitude (RMS, root mean square), pitch (Pitch, i.e. tone), and spectrum (RC's, reflection coefficients). The amplitude and pitch parameters are extracted frame by frame (one frame = 180 samples at an 8 kHz sampling rate). The spectrum parameters (RC's) are obtained from the LPC model, i.e. from the following transfer function (in the z-domain) H(z):
H(z) = A0 / (1 + a1·z^-1 + a2·z^-2 + … + a10·z^-10)
where A0 is the amplitude (gain) parameter, z is the complex variable of the z-transform (z = e^(jω) on the unit circle), and a1 … a10 are the LPC coefficients.
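The spectrum is stored as reflection coefficients RC0–RC9, while H(z) above is written with direct-form coefficients a1–a10; the patent does not spell out the conversion between the two. A common way to obtain the a_i from reflection coefficients is the step-up (Levinson) recursion, sketched below under that assumption. The function name is illustrative and reflection-coefficient sign conventions vary between references.

    def rc_to_lpc(rc):
        """Step-up recursion: reflection coefficients k_1..k_p -> direct-form
        LPC coefficients a_1..a_p, so that the denominator of H(z) is
        1 + a_1*z^-1 + ... + a_p*z^-p.  (Illustrative; not taken from the patent.)"""
        a = []                                    # a[j-1] holds a_j at the current order
        for i, k in enumerate(rc, start=1):
            prev = a[:]
            # a_j^(i) = a_j^(i-1) + k_i * a_(i-j)^(i-1),  j = 1 .. i-1
            a = [prev[j] + k * prev[i - 2 - j] for j in range(i - 1)]
            a.append(k)                           # a_i^(i) = k_i
        return a

    # Example with ten made-up reflection coefficients (RC0..RC9):
    rcs = [0.5, -0.3, 0.2, -0.1, 0.05, 0.0, 0.1, -0.05, 0.02, 0.01]
    a1_to_a10 = rc_to_lpc(rcs)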
With the three parameters above, a "voiced" speech frame (180 samples) can be encoded in 54 bits, which corresponds to a compressed bit rate of 2.4 kbps. The bit allocation of each parameter is as follows:
Parameter:  Pitch  RMS  RC0  RC1  RC2  RC3  RC4  RC5  RC6  RC7  RC8  RC9
Bits:       6      6    5    5    5    5    4    4    4    4    3    3    (total: 54 bits)
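As a rough illustration of the decoder's first step, the sketch below unpacks a 54-bit voiced-frame code word into its quantized fields. The field order follows the table above and most-significant-field-first packing is assumed; the patent does not state the packing order or the quantizer tables, so these details are hypothetical.

    # Assumed field order and widths (54 bits total), following the table above.
    FIELD_BITS = [("Pitch", 6), ("RMS", 6),
                  ("RC0", 5), ("RC1", 5), ("RC2", 5), ("RC3", 5),
                  ("RC4", 4), ("RC5", 4), ("RC6", 4), ("RC7", 4),
                  ("RC8", 3), ("RC9", 3)]

    def unpack_voiced_frame(codeword: int) -> dict:
        """Split a 54-bit code word into quantized parameter indices,
        assuming most-significant-field-first packing (an assumption)."""
        fields, remaining = {}, sum(w for _, w in FIELD_BITS)   # remaining = 54
        for name, width in FIELD_BITS:
            remaining -= width
            fields[name] = (codeword >> remaining) & ((1 << width) - 1)
        return fields   # dequantization tables are not given in the patent

    # 180 samples per frame at 8 kHz -> 54 * 8000 / 180 = 2400 bit/s (2.4 kbps)
    print(unpack_voiced_frame((1 << 54) - 1))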
When speech encoded by this phoneme coding method is decompressed, only the voiced portion needs processing: its amplitude, pitch, and spectrum parameters are smoothed by interpolation and the voiced speech is then restored with the speech synthesizer. The unvoiced portion is restored simply by reading the original samples at their stored address, and for silence it suffices to read out the stored duration.
For a speech database built in this way, with the three parameters above as the basis of the coding, the speech phoneme decoder only has to be designed according to the rules by which the database was built.
The decoder operates as follows. First, a bit stream, i.e. the encoded speech data selected from the speech database, is converted back into the three parameters used at coding time (amplitude, pitch, and spectrum), and these parameters are then synthesized into speech by the speech synthesizer. Synthesis is pitch-synchronous (one pitch period at a time): one group of parameters is read in per frame, the previous frame's parameters (RMS0, RC0, Pitch0) are retained, and the parameters needed for each synthesis period (RMS, RC, Pitch) are obtained by smoothing (Smoother) between the current frame's and the previous frame's parameters.
Smoothing is performed by interpolation, for which a scale parameter (Prop, proportion) must be calculated. Because synthesis is pitch-synchronous (one pitch period is synthesized at a time), the total length of the periods synthesized within a frame must stay below the speech length to be synthesized for that frame (Frame_len); the residual length (Frame_res = Frame_len − Synths) is carried over to the next frame, so the speech length to be synthesized in the next frame is Frame_len = Frame_res + 180. The scale parameter is Prop = (Synths + PitchI) / Frame_len.
The speech phoneme decoder designed by the present invention on the basis of the above coding method is described in detail below.
First, please refer to Fig. 1, the system architecture diagram of the speech phoneme decoder of the present invention. It comprises the following parts: an initialization unit 10, a parameter loading unit 20, a smoothing unit 30, a synthesis unit 40, and a voice output unit 50.
The initialization unit 10 first generates an initialization signal (initial), according to which the parameter loading unit 20 sets the initial values of the parameters. The parameter loading unit 20 then loads, in order, all the parameter values of the frame (Frame) to be synthesized, i.e. the three speech parameters of one frame at a time. The smoothing unit 30 smooths each speech parameter loaded by the parameter loading unit 20, handling one pitch period (Pitch) within the frame at a time; it delivers the smoothed parameters to the synthesis unit 40 to be synthesized into speech, and sends the next-frame signal (Next_Frame) to the parameter loading unit 20 so that it loads the speech parameters of the next frame. The speech signal synthesized by the synthesis unit 40 is delivered to the voice output unit 50, which outputs the speech, and the synthesis unit sends a next-pitch signal (Next_Pitch) to the smoothing unit 30 so that it processes the speech parameters of the next pitch period.
Next, the speech phoneme decoder of the present invention is described with a specific embodiment; please refer to Fig. 2, which shows the signal flow of the invention. The initialization unit 10 generates the initialization signal (initial). The parameter loading unit 20 sets the initial values according to the initialization signal; it is also responsible for loading the three parameters of a speech phoneme (RCj(10), RMSj, Pitchj) and for retaining the three parameters of the previous frame (RC0(10), RMS0, Pitch0); finally, from the synthesized length (L) of each frame sent by the smoothing unit, it produces the length (M) to be processed for the next frame. After receiving the parameters delivered by the parameter loading unit 20, the smoothing unit 30 smooths the three parameters of the frame being processed (RCj(10), RMSj, Pitchj) and sends the processed parameters (RC(10), RMS, Pitch) to the synthesis unit 40 one pitch period at a time; it also sends the synthesized length (L), i.e. the length of the speech synthesized in this frame, to the parameter loading unit 20. The synthesis unit 40 sends the next-pitch signal (Next_Pitch) to request the parameters of the next pitch period from the smoothing unit 30. Finally, after synthesizing the three parameters, the synthesis unit 40 delivers the result to the voice output unit 50 to output the speech.
The bit widths of the signals, as shown in Fig. 2, are as follows: the initialization signal (initial) is a 1-bit control signal; RC0(10) are signed 8-bit signals; RMS0 is an unsigned 16-bit signal; Pitch0 is an unsigned 8-bit signal; RCj(10) are signed 8-bit signals; RMSj is an unsigned 16-bit signal; Pitchj is an unsigned 8-bit signal; the synthesis frame length M is an unsigned 9-bit signal; the synthesized length L is an unsigned 9-bit signal; RC is a signed 8-bit signal; RMS is an unsigned 16-bit signal; Pitch is an unsigned 8-bit signal; Next_Frame is a 1-bit control signal; Next_Pitch is a 1-bit control signal; and the output of the synthesis unit is a signed 16-bit signal.
Please refer next to Fig. 3, which shows how the signals pass from the initialization unit to the parameter loading unit. First, the initialization signal (initial) generated by the initialization unit 10 makes the parameter loading unit 20 set the initial values of the parameters, including the synthesized length (L = 0), the synthesis frame length (M = 180 samples, taking a sampling rate of 8000 samples per second as an example), the amplitude (RMS0 = 0; RMSj is the root mean square of frame j), the pitch (Pitch0 = Pitch1; Pitchj is the pitch of frame j), and the spectrum parameters (RC0(i) = RC1(i), i = 0, 1, 2, …, 9; reflection coefficients). Data are read in by the data loading unit 24, in which the parameter decoder 241 decodes the 54-bit sequence read from the speech database into the speech phoneme parameters RCj(10), RMSj, and Pitchj and writes them into the second buffer 25. The second buffer 25 then passes the data it has read to the next stage, i.e. to the smoothing unit 30 and to the third buffer 26. The third buffer 26 holds these data temporarily as the reference data for the next frame's speech parameters: when the next-frame (Next_Frame) command from the smoothing unit is received, these parameters become the previous frame's parameter values (RC0(10), RMS0, Pitch0) and are supplied to the smoothing unit 30 as the reference data for smoothing.
In addition, because the frame length (180) is usually not an integer multiple of the pitch length, a remainder is left over at the end of each frame; this remainder is merged into the next frame's length, as follows. First, the initialization signal (initial) is fed to the buffer 21 and to the synthesis length calculation unit 33, clearing the output L of both to zero, so that the adder 23 outputs the first synthesis frame length (M = 180). To calculate the next synthesis frame length, the buffer 21 supplies the previous synthesis frame length (9 bits) to the subtractor 22, the previous synthesized length (L) is subtracted, and the adder 23 adds the default frame length constant (180), giving the next frame's length M = M − L + 180.
The parameters loaded by the parameter loading unit are then smoothed by the smoothing unit 30; please refer to Fig. 4. The smoothing unit 30 comprises a pitch parameter smoothing unit 31, a ratio calculation unit 32, a synthesis length calculation unit 33, an amplitude parameter smoothing unit 34, a spectrum parameter smoothing unit 35, and a buffer 36.
After the smoothing unit 30 has received the parameter data of two frames, i.e. the current frame's phoneme parameters (RCj(10), RMSj, Pitchj) and the previous frame's parameters (RC0(10), RMS0, Pitch0), it begins smoothing, performing one smoothing operation per pitch length (Pitch).
First the ratio calculation unit 32 computes the scale parameter, Prop = L / M. The pitch parameters (Pitchj, Pitch0) are then smoothed by the pitch parameter smoothing unit 31 to obtain the processed pitch, Pitch = Pitch0·(1 − Prop) + Pitchj·Prop, which is stored temporarily in the buffer 36. The amplitude parameters (RMSj, RMS0) are smoothed by the amplitude parameter smoothing unit 34, giving RMS = RMS0·(1 − Prop) + RMSj·Prop, likewise stored in the buffer 36. The smoothing of the spectrum parameters (RCj(10), RC0(10)) is handled by the spectrum parameter smoothing unit 35, giving RC(i) = RC0(i)·(1 − Prop) + RCj(i)·Prop for i = 0, 1, …, 9; the smoothed spectrum parameters are likewise deposited in the buffer 36.
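The per-pitch interpolation just described can be collected into a single sketch; it is a direct transcription of the three formulas, with function and variable names chosen for illustration only.

    def smooth_parameters(prev, cur, L, M):
        """Linear interpolation between the previous frame's parameters `prev`
        and the current frame's parameters `cur`, where L is the length already
        synthesized in this frame and M is the frame's synthesis length.
        Each parameter set is (pitch, rms, rc) with rc a list of 10 values."""
        prop = L / M                                    # Prop = L / M
        pitch0, rms0, rc0 = prev
        pitchj, rmsj, rcj = cur
        pitch = pitch0 * (1 - prop) + pitchj * prop     # Pitch smoothing
        rms   = rms0   * (1 - prop) + rmsj   * prop     # RMS smoothing
        rc    = [rc0[i] * (1 - prop) + rcj[i] * prop    # RC(i) smoothing, i = 0..9
                 for i in range(10)]
        return pitch, rms, rc

    # Example: halfway through a 180-sample frame (L = 90, M = 180),
    # the smoothed parameters lie midway between the two frames' values.
    p = smooth_parameters((40, 1000.0, [0.1] * 10), (60, 2000.0, [0.3] * 10), 90, 180)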
The pitch, amplitude, and spectrum parameters deposited in the buffer 36 are delivered to the next stage, i.e. to the synthesis unit 40. After the synthesis unit 40 has synthesized the parameters of one pitch period, it sends the next-pitch signal (Next_Pitch); this signal controls the output of the buffer 36, which, on receiving it, loads the smoothed speech parameters of the next pitch period. When the synthesis length calculation unit 33 inside the smoothing unit 30 receives the next-pitch (Next_Pitch) signal, it updates the length synthesized so far in this frame, L = L + Pitch; while L has not yet reached the synthesis frame length M, smoothing continues within the current frame; otherwise the unit sends the next-frame signal (Next_Frame) to the parameter loading unit 20 to load the next frame's parameters and resets L to 0. The initialization signal (initial) sent by the initialization unit 10 is also fed to the synthesis length calculation unit 33 and sets L = 0, initializing the unit.
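The frame-length bookkeeping (the per-pitch accumulation L = L + Pitch described here, and the remainder carry M = M − L + 180 performed by the parameter loading unit) can be sketched as below. The loop structure and the exact boundary test are our reading of the text, so treat them as assumptions rather than as the circuit of Figs. 3 and 4.

    DEFAULT_FRAME_LEN = 180   # samples per frame at 8 kHz

    def frame_lengths(pitches_per_frame):
        """For each frame, accumulate the synthesized length L pitch by pitch
        and carry the remainder into the next frame's synthesis length M.
        `pitches_per_frame` is a list of lists of pitch lengths (one list per frame).
        Returns the synthesis frame length M used for each frame (a sketch)."""
        M, lengths = DEFAULT_FRAME_LEN, []
        for pitches in pitches_per_frame:
            lengths.append(M)
            L = 0
            for pitch in pitches:          # Next_Pitch: L = L + Pitch
                L += pitch
                if L >= M:                 # frame finished -> Next_Frame
                    break
            M = M - L + DEFAULT_FRAME_LEN  # carry the remainder into the next frame
        return lengths

    # Example: pitch periods of 70 samples -> 3 periods (210 >= 180) end frame 1,
    # and the next frame's synthesis length becomes 180 - 210 + 180 = 150.
    print(frame_lengths([[70, 70, 70, 70], [70, 70, 70]]))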
The following work is carried out by the synthesis unit 40; please refer to Fig. 5. It comprises a pulse train generator 41, a vocal tract filter 42, an amplitude adjustment unit 43, and a memory 44.
The pulse train generator 41 outputs one period of a pulse signal. This pulse signal, a waveform simulating the vibration of the human vocal cords, is stored in advance in a memory included in the generator; its first Pitch values are taken, and if Pitch is greater than the stored pulse train length, the excess is padded with zeros. For example, if the stored pulse train is {p[1], p[2], …, p[25]}, then for Pitch > 25 the output is e(n) = {p[1], p[2], …, p[25], 0, …, 0}, and for Pitch ≤ 25 the output is e(n) = {p[1], p[2], …, p[Pitch]}.
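A minimal sketch of this excitation generator follows, assuming a placeholder pulse shape (the actual stored vocal-cord waveform is not given in the patent).

    # Placeholder glottal pulse shape; the real stored waveform is not specified.
    STORED_PULSE = [1.0 - n / 25.0 for n in range(25)]   # 25 samples, decaying ramp

    def excitation(pitch: int) -> list:
        """One pitch period of excitation: the first `pitch` samples of the stored
        pulse, zero-padded when the pitch period exceeds the stored length."""
        if pitch > len(STORED_PULSE):
            return STORED_PULSE + [0.0] * (pitch - len(STORED_PULSE))
        return STORED_PULSE[:pitch]

    e = excitation(40)   # 25 stored samples followed by 15 zeros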
The vocal tract filter 42 emulates the resonance effect that the human mouth, nasal cavity, and vocal tract exert on the vocal-cord vibration. It can be realized as an all-pole filter or as a lattice filter, and its filter parameters are RC(i), i = 0, 1, 2, …, 9.
After the pulse train has passed through the vocal tract filter 42, it passes through the amplitude adjustment unit 43, which yields the synthesized speech signal; the amplitude adjustment unit 43 computes the amount of amplitude adjustment required from RMS. After the speech has been synthesized, the amplitude adjustment unit 43 sends the next-pitch signal (Next_Pitch) to the smoothing unit 30.
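Putting the last three paragraphs together, the sketch below filters one excitation period through an all-pole vocal tract filter and then scales it. The patent also names a lattice realization; here direct-form coefficients (obtainable from the RC's, e.g. with the step-up recursion sketched earlier) and an RMS-matching gain are used, both of which are our assumptions rather than details taken from the patent.

    import math

    def synthesize_pitch_period(excitation, a, target_rms):
        """One pitch period of synthesis: direct-form all-pole vocal tract
        filtering followed by amplitude adjustment.  `a` holds a_1..a_10 of
        H(z) = A0 / (1 + a_1*z^-1 + ... + a_10*z^-10); the RMS-matching gain is
        an interpretation of the amplitude adjustment unit, not a stated detail."""
        y, hist = [], [0.0] * len(a)                  # output and output history
        for e in excitation:
            s = e - sum(a[i] * hist[i] for i in range(len(a)))   # all-pole filter
            y.append(s)
            hist = [s] + hist[:-1]
        rms = math.sqrt(sum(s * s for s in y) / len(y))
        gain = target_rms / rms if rms > 0 else 0.0   # amplitude adjustment from RMS
        return [gain * s for s in y]

    # Example with made-up coefficients and a 40-sample excitation period:
    a_coeffs = [-0.9, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    period = synthesize_pitch_period([1.0] + [0.0] * 39, a_coeffs, 1000.0)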
The memory 44 temporarily stores the speech signals computed by the vocal tract filter 42 and the amplitude adjustment unit 43.
Finally, the parameters processed by the smoothing unit 30 are synthesized by the synthesis unit 40 into the speech of one pitch period, which is delivered from the memory 44 to the voice output unit 50, and the speech can then be output. The voice output unit 50 has at least one memory buffer in which every synthesized speech period is stored.
Although the present invention has been disclosed above by way of the preferred embodiment, the embodiment is not intended to limit the invention. Any person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the claims.

Claims (7)

1. A speech phoneme decoder, characterized in that the decoder decodes encoded speech data coded with an amplitude parameter, a pitch parameter, and spectrum parameters coded by linear predictive coding, the encoded speech data being stored in a speech database, the speech phoneme decoder comprising:
an initialization unit, for generating an initialization signal;
a parameter loading unit, connected to the initialization unit, for receiving the initialization signal and loading the speech data of a current frame from the speech database one frame at a time;
a smoothing unit, for receiving the speech data of the current frame and, taking the pitch within the current frame as the processing length, smoothing the amplitude parameter, the pitch parameter, and the spectrum parameters in the current frame's speech data by interpolation, and for sending a next-frame signal to the parameter loading unit to load the speech data of the next frame;
a synthesis unit, for receiving the speech data of a pitch period processed by the smoothing unit and synthesizing it into a speech signal, and, after processing the speech data of that pitch period, sending a next-pitch signal to the smoothing unit to process the speech data of the next pitch period; and
a voice output unit, for receiving the speech signal delivered by the synthesis unit and outputting speech.
2. The speech phoneme decoder as claimed in claim 1, characterized in that the parameter loading unit comprises a parameter decoder which decodes the pitch parameter, the amplitude parameter, and the spectrum parameters according to their coding order and outputs them in parallel to the smoothing unit.
3. The speech phoneme decoder as claimed in claim 1, characterized in that, after loading the speech data of the current frame, the parameter loading unit temporarily stores it and, upon receiving the next-frame signal sent by the smoothing unit, loads the speech data of the next frame and delivers the speech data of the current frame and of the next frame to the smoothing unit.
4. The speech phoneme decoder as claimed in claim 1, characterized in that the smoothing unit processes the speech data of the current frame and of the next frame by interpolation and outputs the result to the synthesis unit.
5. The speech phoneme decoder as claimed in claim 1, characterized in that the smoothing unit comprises:
a ratio calculation unit, for calculating the ratio of the length already synthesized within the current frame to the current frame's synthesis frame length;
a pitch parameter smoothing unit, for receiving the pitch parameter of the current frame and the pitch parameter of the next frame and computing a synthesized pitch parameter by interpolation;
an amplitude parameter smoothing unit, for receiving the amplitude parameter of the current frame and the amplitude parameter of the next frame and computing a synthesized amplitude parameter by interpolation;
a spectrum parameter smoothing unit, for receiving the spectrum parameter of the current frame and the spectrum parameter of the next frame and computing a synthesized spectrum parameter by interpolation;
a synthesis length calculation unit, for calculating the synthesized length of the current frame, feeding the result to the ratio calculation unit, and outputting the next-frame signal to the parameter loading unit; and
a buffer, for storing the synthesized pitch parameter, the synthesized amplitude parameter, and the synthesized spectrum parameter and outputting them to the synthesis unit.
6. The speech phoneme decoder as claimed in claim 1, characterized in that the synthesis unit comprises:
a pulse train generator, for outputting an excitation signal according to the pitch parameter;
a vocal tract filter, for receiving the excitation signal and, using the spectrum parameters as its filter coefficients, processing it into a synthesized speech signal; and
an amplitude adjustment unit, for multiplying the synthesized speech signal by the amplitude parameter to produce restored speech, which is output to the voice output unit.
7. The speech phoneme decoder as claimed in claim 1 or 6, characterized in that the synthesis unit further comprises a memory for temporarily storing the synthesized speech signal and the restored speech and for outputting the restored speech to the voice output unit.
CNB021059365A 2002-04-09 2002-04-09 Decoder for phoneme of speech sound Expired - Fee Related CN1189862C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021059365A CN1189862C (en) 2002-04-09 2002-04-09 Decoder for phoneme of speech sound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB021059365A CN1189862C (en) 2002-04-09 2002-04-09 Decoder for phoneme of speech sound

Publications (2)

Publication Number Publication Date
CN1450529A CN1450529A (en) 2003-10-22
CN1189862C true CN1189862C (en) 2005-02-16

Family

ID=28680107

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021059365A Expired - Fee Related CN1189862C (en) 2002-04-09 2002-04-09 Decoder for phoneme of speech sound

Country Status (1)

Country Link
CN (1) CN1189862C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390939B (en) * 2019-07-15 2021-08-20 珠海市杰理科技股份有限公司 Audio compression method and device

Also Published As

Publication number Publication date
CN1450529A (en) 2003-10-22

Similar Documents

Publication Publication Date Title
CN1135721C (en) Audio signal coding method and apparatus
CN1154086C (en) CELP transcoding
TWI430263B (en) Audio signal encoder, audio signal decoder, method for encoding or decoding and audio signal using an aliasing-cancellation
CN103258541B (en) Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
CN101836251B (en) Scalable speech and audio encoding using combinatorial encoding of MDCT spectrum
KR101513184B1 (en) Concealment of transmission error in a digital audio signal in a hierarchical decoding structure
CN1244907C (en) High frequency intensifier coding for bandwidth expansion speech coder and decoder
CA2429832C (en) Lpc vector quantization apparatus
CN1266674C (en) Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder
CN102169692B (en) Signal processing method and device
CN101968781B (en) Method of making a window type decision based on MDCT data in audio encoding
IL135192A (en) Method and system for speech reconstruction from speech recognition features
CN1334952A (en) Coded enhancement feature for improved performance in coding communication signals
CN1470050A (en) Perceptually improved enhancement of encoded ocoustic signals
US20070011009A1 (en) Supporting a concatenative text-to-speech synthesis
US5991725A (en) System and method for enhanced speech quality in voice storage and retrieval systems
CN104025189A (en) Method for encoding voice signal, method for decoding voice signal, and apparatus using same
WO2013061584A1 (en) Hybrid sound-signal decoder, hybrid sound-signal encoder, sound-signal decoding method, and sound-signal encoding method
JPH08505959A (en) Text-to-speech synthesis system using vector quantization based speech coding / decoding
CN1173690A (en) Method and apparatus fro judging voiced/unvoiced sound and method for encoding the speech
CN104956438A (en) Systems and methods of performing noise modulation and gain adjustment
CN1050633A (en) Digital language scrambler with improved long-term predictor device
CN1051099A (en) The digital speech coder that has optimized signal energy parameters
CN101099199A (en) Audio encoding and decoding
CN1112674C (en) Predictive split-matrix quantization of spectral parameters for efficient coding of speech

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050216

Termination date: 20140409