CN1315033A - Signal processing techniques for time-scale and/or pitch modification of audio signals

Info

Publication number: CN1315033A
Application number: CN99810151A
Authority: CN
Inventors: S. M. J. Hoek
Original assignee: SIGMA AUDIO RESEARCH Ltd
Current assignee: SIGMA AUDIO RESEARCH Ltd; Sigma Audio Res Ltd
Priority date: 1998-08-28
Filing date: 1999-08-27
Publication date: 2001-09-26
Anticipated expiration: 2019-08-27
Also published as: EP1127349A1; US6266003B1; JP2002524759A; WO2000013172A1; CN1128436C; JP4527287B2; EP1127349B1; EP1127349A4; AU5454899A

Abstract

A method of signal processing for time-scale and/or pitch modification of audio signals is disclosed. The method involves encoding and resynthesising a waveform, whereby the waveform is sampled into a series of frames and each frame is multiplied by a windowing function whose peak is centred at approximately the zero point of the frame. The resulting function is then subjected to a fast Fourier transform, producing a frequency-domain waveform. This waveform is convolved with a variable kernel function whose specification varies with frequency. Maxima and associated minima in the magnitude spectrum of each convolved frame are located, so that each local maximum and its associated minima define one of a plurality of regions. Each region corresponds to a frequency component of the signal. Each region is analysed separately in the frequency-domain representation by summing the complex frequency components, or bins, falling within the region into a signal vector. The variable kernel function can usefully be varied to achieve differing trade-offs between frequency and temporal resolution across the frequency range of the signal.

Description

Signal processing techniques for time-scale and/or pitch modification of audio signals

Technical field

The present invention relates to the encoding and processing of digital signals. In particular, but not exclusively, the invention relates to time-scale and/or pitch modification of audio signals. Equally, the signal analysis and resynthesis methods described here are not limited to audio signals. It is envisaged that the invention may find application in encoding other signals with the (wavelet-like) method described here. One example of such an application is image compression. Indeed, the invention is applicable wherever it is desirable to analyse different regions of the frequency domain simultaneously with different time/spatial resolutions.

Background art

Many prior-art techniques for time-scale/pitch modification of audio signals are known in the art. These techniques can be broadly classified as follows.

(a) Time-domain methods:

These techniques attempt to estimate the fundamental period of a musical signal by detecting periodic activity in the audio signal. In this process, the input signal is delayed and multiplied by the undelayed signal, and the product is then smoothed in a low-pass filter to provide an approximate measure of the autocorrelation function. The autocorrelation function is then used to detect a non-periodic signal, or a weakly periodic signal, hidden in noise. Once the fundamental period of the musical signal has been found, the process is repeated and the analysed signal segments are overlapped. A significant drawback of these techniques is that most audio signals have no fundamental period. For example, polyphonic instruments, reverberant recordings and percussive sounds have no identifiable fundamental period. In addition, when these methods are applied, transients in the music are repeated. This gives notes multiple onsets and endings. A further problem is that overlapping delayed portions of the music can produce metallic, mechanical or echo-like audio artefacts.

(b) Sinusoidal analysis methods:

These techniques assume that the input signal is composed of pure sinusoids. The inherent drawback of this approach is therefore self-evident.

Sinusoidal analysis techniques use the short-time fast Fourier transform (FFT) to estimate the frequencies of the constituent sinusoids. The resulting signal is then synthesised with a bank of tone generators to produce the desired output. Short-time Fourier analysis captures information about the frequency content of the signal within a time interval selected by the window function. A significant drawback of this technique is that a single time-domain window is applied to all frequency components of the signal, so the analysis cannot correspond accurately to human perception of the signal content. Furthermore, conventional sinusoidal analysis methods determine the frequencies of the constituent sinusoids by searching for local maxima of the magnitude spectrum, taking into account the relative phase change between analysis frames. This technique ignores any sideband information located near each local maximum. The consequence is that any signal modulation occurring within a single analysis frame is excluded, resulting in smearing of the sound and almost complete loss of transients. In the case of audio, an example of such a transient is the pluck of a guitar string.

(c) Phase vocoder methods:

Such techniques treat the fast Fourier transform as a large bank of filters and process the output of each filter separately. The relative phase change between two consecutive analyses of the input is used to estimate the frequency of the signal content within each bin. The resulting frequency-domain signal is synthesised from this information, with each bin treated as a separate signal. Unlike sinusoidal analysis, this method preserves the spectral energy distribution of the original signal. However, it destroys the relative phase of any transient information. The resulting sound is therefore smeared and echo-like.

In view of the prior art, it is therefore desirable to analyse and process audio signals so that the resulting output retains the tonal character of the original signal and captures transient sounds accurately, without smearing the output signal or introducing echo-like qualities.

Accordingly, it is an object of the present invention to provide a technique for processing audio signals which achieves the above aims, ameliorates at least some of the shortcomings inherent in the prior art, or at least provides the public with a useful choice. A further object of the invention is to provide a method of signal analysis and synthesis which is also applicable to the encoding of signals generally.

Summary of the invention

In one aspect, the invention provides a method of encoding and resynthesising a waveform, the method comprising:

sampling the waveform to obtain a series of discrete samples and constructing from them a series of frames, each frame spanning a plurality of samples;

multiplying each frame by a windowing function (preferably a raised cosine), wherein the peak of the windowing function is centred substantially at the zero point of each frame;

applying a fast Fourier transform to each frame, thereby producing a frequency-domain waveform;

convolving the resulting frequency-domain data with a variable kernel function whose characteristics vary with frequency;

locating each local maximum and the surrounding minima in the magnitude spectrum of the convolved frame, wherein each local maximum and its associated minima define one of a plurality of regions, each region corresponding to a frequency component of the signal; and

analysing each region in the frequency-domain representation separately by summing the complex frequency components, or bins, falling within the defined region into a signal vector; wherein the variable kernel function can usefully be varied to achieve different trade-offs between frequency and temporal resolution across the frequency range of the signal.

In a preferred embodiment, the waveform corresponds to a digitised audio waveform, and the kernel function can be varied to approximate the perceptual characteristics of the human ear.

Where the waveform corresponds to an audio signal, the positions of the maxima correspond to the pitches of the perceived frequency components.

The method may also include the step of processing the signal while it is represented as signal vectors.

The processing may take the form of modifying the pitch or time scale (of an audio signal), or of further data reduction suited to efficient signal storage and/or transmission.

In the case of modifying an audio signal, the frequency positions and phases of the analysed signal vectors can be shifted as required to achieve time and/or pitch scaling.

Conversion back to a sampled time-domain representation of the signal can be achieved by accumulating into the frequency domain an equivalent signal whose components correspond to the signal vectors determined in analysing the original signal.

Preferably, an inverse fast Fourier transform is used to provide a time-domain signal that can be suitably windowed and accumulated to produce the decoded signal.

Preferably, the form of the convolution function is determined empirically by subjectively assessing the quality of the synthesised output.

Preferably, application of the kernel function to the frequency-domain data is implemented as a single-pole low-pass filter operation on the data, the position of the pole varying with frequency.

Preferably, in the case of analysing an audio signal, the pole is specified by a control function s(f) of the following form:

s(f)=0.4+0.26arctan(4ln(0.1f)-18)

Here, f is the frequency in hertz (cycles per second).

The frequency-domain filter can be specified by the following relation:

yout(f)=[1-s(f)]yin(f)+s(f)yout(f-1)

Preferably, for an audio signal, each signal vector is processed separately; for pitch shifting, the frequency of the component is multiplied by a real-valued pitch factor; and for pitch shifting and time-scale modification, the phase shift necessary for glitch-free reconstruction is calculated and applied.

Preferably, the method includes the following further steps:

zeroing the frequency-domain output array and, for each analysed frequency component, represented by an analysed signal vector, mapping the real-valued frequency to the two nearest integer-valued frequency bins; and

distributing the analysed signal vector between the two bins in proportion to 1 minus the difference between the real-valued frequency and the position of each bin.

In a further aspect, the resulting regions can be translated in frequency, so that the position of the maximum is scaled while the surrounding region is translated with it.

For each region having a maximum and associated first and second minima, to pitch-shift an audio signal the position of each maximum in the frame is scaled by the pitch-shift factor, and the associated harmonic information between the first and second minima is moved to the corresponding positions around the scaled maximum.

To time-stretch or time-compress the signal, each maximum is kept at the same position in the frequency domain while the frequency-domain band, or harmonic information, associated with the maximum is stretched or compressed, thereby stretching the amplitude and frequency modulation of the harmonic while preserving the pitch of the input signal.

The method may also include the following further steps:

resampling the data in each frame into a plurality of bins;

mapping each bin to a real-valued position in the output frame, where, for a bin x in the band of a maximum at frequency freq_max, the real-valued position in the output frequency domain is y, where

y = {freq}_{\max} \times shift + \frac{(x - {freq}_{\max})}{(scale)}

Here, shift is the frequency shift and scale is the time-stretch ratio.

Preferably, y is rounded down to the nearest integer z less than or equal to y, and the contribution is added to output bins z and z+1 in proportion to 1 minus the difference between y and the integer position of each bin.
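The mapping of a bin x to its output position y can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the implementation of the invention: the function names `output_position` and `split_weights` are hypothetical, and positions are expressed in bin units.

```python
import math

def output_position(x, freq_max, shift, scale):
    """Real-valued output position y of bin x in the band of a maximum at freq_max.

    shift is the frequency (pitch) shift factor and scale the time-stretch ratio,
    as in y = freq_max * shift + (x - freq_max) / scale.
    """
    return freq_max * shift + (x - freq_max) / scale

def split_weights(y):
    """Return (z, weight for bin z, weight for bin z + 1) for a real-valued position y."""
    z = math.floor(y)          # nearest integer less than or equal to y
    frac = y - z               # distance of y from bin z
    return z, 1.0 - frac, frac

# Pure pitch shift by a factor of 2: the maximum itself moves from bin 10 to bin 20.
print(output_position(10, 10, 2.0, 1.0))   # 20.0
# Pure time stretch by a factor of 2: the maximum stays put, its sideband is compressed.
print(output_position(12, 10, 1.0, 2.0))   # 11.0
```

With scale greater than 1 the band around a maximum becomes narrower in frequency, which corresponds to slower amplitude and frequency modulation in time, while the maximum itself keeps its position.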

In a further aspect, the invention provides software adapted to carry out the above method.

In a further aspect, the invention provides hardware adapted to carry out the above method.

Brief description of the drawings

The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

Fig. 1: shows a simplified schematic block diagram of one embodiment of the method of the invention (spread over pages 28 to 30);

Fig. 2: shows a simplified schematic block diagram of one embodiment of an alternative method of the invention (spread over pages 31 to 33);

Fig. 3: shows a schematic of the maximum/minimum search process;

Figs. 5a and 5b: illustrate pitch shifting and time stretching with respect to two maxima.

Preferred embodiments of the invention

Referring to Fig. 1, a simplified flow chart shows all the steps in one embodiment of the signal processing method. For clarity, the diagram is spread over pages 15 to 17.

An input audio signal is digitised into frames 10. Each such frame is then processed as follows:

Each frame 10 is windowed (20) with a wide raised cosine function 30 (for example), producing a time-domain-modulated representation of the input signal frame 10. A fast Fourier transform 50 is then applied to the frame, producing the frequency-domain representation 60 of the input signal.
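The windowing and transform step can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation of the embodiment: the names `raised_cosine_window`, `dft` and `analyse_frame` are hypothetical, the window is assumed to span the whole frame, "centred at the zero point" is read as the window peak sitting at sample index 0 (wrapping around the frame end), and a naive discrete Fourier transform stands in for the FFT to keep the example free of dependencies.

```python
import cmath
import math

def raised_cosine_window(n):
    """Raised-cosine window of length n whose peak (value 1) is at index 0."""
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * k / n)) for k in range(n)]

def dft(frame):
    """Naive O(n^2) discrete Fourier transform; a stand-in for an FFT."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def analyse_frame(frame):
    """Window one frame and return its complex frequency-domain representation."""
    window = raised_cosine_window(len(frame))
    windowed = [x * w for x, w in zip(frame, window)]
    return dft(windowed)

# A cosine lying exactly on bin 2 of a 16-sample frame.
n = 16
frame = [math.cos(2.0 * math.pi * 2 * t / n) for t in range(n)]
spectrum = analyse_frame(frame)
print([round(abs(v), 3) for v in spectrum[:5]])
```

Because the window peak is at index 0, the window itself contributes no linear phase term: in this example the main-lobe bins of the cosine are purely real.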

The frequency-domain data 60 are then filtered with a filter function 71 whose parameter is determined by s(f). In this example, the filter function can also be regarded as a single-pole low-pass filter. The function s(f) 70 specifies how the behaviour of the filter varies with frequency. The filter function 71 can be described by the following recurrence relation:

yout(f)=[1-s(f)]yin(f)+s(f)yout(f-1)

Thus s(f) controls the 'severity' of the filter 71. In effect, a different convolution kernel is used for each frequency bin. The real and imaginary parts of each bin are convolved separately. In this example embodiment, the filtering or convolution function 71 has the effect of 'blurring' the frequency-domain information, so the convolution function may be called a blurring function. Blurring, or spreading, the frequency-domain data corresponds to narrowing the equivalent window in the time-domain frame. Each frequency bin of the fast Fourier transform is therefore effectively calculated as though a time-domain window of a different size had been applied before the FFT operation.

The effect of the filter is not necessarily to blur the data. For example, shifting the time-domain samples by half the window size means that the frequency-domain data must be high-pass filtered to achieve equivalent windowing in the time domain.

The frequency-domain filter 71 is applied to the bins in ascending order of frequency and then in descending order of frequency. This ensures that no phase shift is introduced into the frequency-domain data.
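A minimal Python sketch of this two-pass filtering is given below. It is an illustration only: the names `single_pole_pass` and `blur_spectrum` are hypothetical, s is supplied as one coefficient per bin, and the filter state before the first bin of each pass is assumed to be zero, which the text does not specify.

```python
def single_pole_pass(values, s):
    """One pass of yout(f) = [1 - s(f)] * yin(f) + s(f) * yout(f - 1).

    values holds complex bins; because s is real, the real and imaginary
    parts are filtered independently. The initial state is assumed to be zero.
    """
    out = []
    prev = 0j
    for y_in, s_f in zip(values, s):
        prev = (1.0 - s_f) * y_in + s_f * prev
        out.append(prev)
    return out

def blur_spectrum(spectrum, s):
    """Apply the filter in ascending bin order, then in descending bin order."""
    forward = single_pole_pass(spectrum, s)
    backward = single_pole_pass(forward[::-1], s[::-1])
    return backward[::-1]

# An isolated spectral line is spread evenly to both sides when s is constant.
impulse = [0j] * 65
impulse[32] = 1 + 0j
blurred = blur_spectrum(impulse, [0.5] * 65)
print([round(v.real, 4) for v in blurred[30:35]])
```

With a constant coefficient the combined response is symmetric about the original bin, which illustrates why running the filter in both directions avoids a net shift along the frequency axis.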

A key aspect of the invention is that, when processing audio data, the control function s(f) is chosen to approximate the excitation response of the cilia on the membrane within the human ear. In effect, s(f) is chosen to approximate the time/frequency response of the human ear.

In this preferred embodiment, the form of the control function s(f) is determined empirically by assessing the quality of the output, or synthesised waveform, under varying conditions. Although this is a subjective process, repeated assessment of changes in the quality of the synthesised sound has been found to yield a very satisfactory convolution function.

The preferred form of the control function s(f) is:

s(f)=0.4+0.26arctan(4ln(0.1f)-18)

Here, f is the frequency in hertz (cycles per second).

In effect, the above steps approximate an efficient way of processing the signal through a large bank of filters, where the bandwidth of each filter can be controlled independently by the control function s(f).

Once the filter 71 has been applied, the convolved frequency-domain data 80 are analysed (90) to locate the local maxima and the associated local minima.

For this step, it has been found more efficient to use the intensity spectrum. Thus, for each frequency f, if I(f) > I(f-1) and I(f) > I(f+1), the datum is a local maximum. If I(f) < I(f-1) and I(f) < I(f+1), it is a local minimum. Here,

Mag(f) = \sqrt{real(f)^{2} + im(f)^{2}}

and

I(f) = intensity(f) = real(f)^{2} + im(f)^{2}

Referring to Fig. 2, each maximum and its associated local minima are used to define a region (shown by the hatched arrows in Fig. 3) corresponding to an audible harmonic in the original audio signal. The position of the maximum in the frequency domain corresponds to the perceived pitch of the harmonic, and the band of frequency-domain information around the maximum represents any associated amplitude or frequency modulation of that harmonic. Because it is important not to lose this information, the sum over the whole band around the peak is used to provide the signal vector. In this way the temporal resolution of the analysed sample matches the bandwidth of any modulation taking place.
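The search for maxima and minima and the summation of each region into a signal vector can be sketched in Python as follows. This is a minimal sketch, not the implementation of the embodiment: the names `find_regions` and `region_vectors` are hypothetical, and assigning each boundary minimum to the region on its right, so that no bin is counted twice, is an assumption the text does not settle.

```python
def intensity(z):
    """Squared magnitude real^2 + im^2 of a complex bin (no square root needed)."""
    return z.real * z.real + z.imag * z.imag

def find_regions(spectrum):
    """Return (lo, peak, hi) triples; each region covers bins lo .. hi - 1."""
    inten = [intensity(z) for z in spectrum]
    n = len(inten)
    inner = range(1, n - 1)
    maxima = [k for k in inner if inten[k] > inten[k - 1] and inten[k] > inten[k + 1]]
    minima = [k for k in inner if inten[k] < inten[k - 1] and inten[k] < inten[k + 1]]
    regions = []
    for peak in maxima:
        lo = max((m for m in minima if m < peak), default=0)
        hi = min((m for m in minima if m > peak), default=n)
        regions.append((lo, peak, hi))
    return regions

def region_vectors(spectrum):
    """Sum the complex bins of each region into one signal vector per maximum."""
    return [(peak, sum(spectrum[lo:hi])) for lo, peak, hi in find_regions(spectrum)]

spectrum = [0j, 1 + 0j, 3 + 0j, 1 + 0j, 0.5 + 0j, 2 + 0j, 5 + 0j, 2 + 0j, 0j]
print(find_regions(spectrum))    # [(0, 2, 4), (4, 6, 9)]
print(region_vectors(spectrum))  # [(2, (5+0j)), (6, (9.5+0j))]
```

Summing complex bins rather than magnitudes keeps the phase relationships inside the band, so the sideband information around each peak contributes to the resulting signal vector.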

Each region is processed separately according to the following technique. An accurate estimate of the position of each maximum is determined. Referring to the lower diagram in Fig. 3, the large arrow a (300) is the difference between the smallest of the three intensities (max-1) and the largest (max). The small arrow b (310) is the difference between the smallest (max-1) and the intermediate intensity (max+1). The ratio of the two is used to offset the integer position of the maximum.

Pitch shifting and time-scale modification are shown schematically by reference numeral 130 in Fig. 1. At this point, other applications are shown by the data reduction (133) or transmission/storage (134) steps. These alternative options are shown in Fig. 1.

The processed data are resynthesised as follows:

For the i-th analysed frequency component, vector(i) has a real-valued position y in the frequency-domain output.

y is rounded down to the nearest integer less than or equal to y, denoted z. Thus z = Int(y).

Then vector(i) is added to output bins z and z+1 in proportion to 1 minus the difference between y and the integer position of each bin:

Bin[z] = Bin[z] + [1-(y-z)] vector(i)

Bin[z+1] = Bin[z+1] + (y-z) vector(i)

All operations here are performed on complex numbers.
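The two update rules can be written directly in Python. The sketch below is illustrative: the function name `add_vector` is hypothetical, and ignoring a contribution that would fall past the end of the array is an assumption added to keep the example safe.

```python
import math

def add_vector(bins, y, vector):
    """Distribute a complex signal vector between bins z = Int(y) and z + 1."""
    z = math.floor(y)
    frac = y - z
    bins[z] += (1.0 - frac) * vector        # Bin[z]   += [1 - (y - z)] * vector
    if frac > 0.0 and z + 1 < len(bins):
        bins[z + 1] += frac * vector        # Bin[z+1] += (y - z) * vector

bins = [0j] * 8                              # frequency-domain output array, zeroed
add_vector(bins, 3.25, 2 + 4j)
print(bins[3], bins[4])                      # (1.5+3j) (0.5+1j)
```

The two weights always sum to 1, so the whole vector is deposited in the output array, with the larger share going to the bin nearer to y.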

To modify the time scale or pitch of the analysed signal, any phase shift must be compensated so that the synthesised output is coherent (that is, glitch-free). To this end, the output signal in any frame is advanced in time by a fixed number of samples. For a given pitch (frequency) measurement, it is therefore possible to determine how much the output phase should change so that the output joins smoothly with the previously synthesised frame.

However, the input time frame is moved by some other number of samples. The analysed phase values therefore change as the analysis window passes over the input data.

The difference between the rate of change of the input phase and the required rate of change of the output phase is therefore calculated. This phase difference is a measure of how fast the phase of the frequency-domain data must be rotated between analysis and synthesis. Each signal vector defined above has a frequency measurement. This measurement is used to calculate how fast to rotate a vector of magnitude 1, the vector being represented as a complex number. This vector is multiplied by the signal vector to provide the phase shift necessary for synthesis, without affecting the decay characteristics of each region or the timing of other modulation.

The phase shift (in radians) is given by the following formula:

Here, t_r = the reconstruction time step in samples, t_a = the analysis time step in samples, and t_2 = the FFT size in samples.

Because the frequency measurement gives the phase difference between one synthesis frame and the next, these differences must be accumulated as synthesis proceeds.

The accumulated sum applies only to one region, so regions must be tracked from one synthesis frame to the next.

A convenient data structure has been developed for tracking regions from one synthesis frame to the next, and is described with reference to Figs. 4a and 4b. An integer array holds, for every bin within a region, the position of that region's local maximum. A corresponding array holds the last phase value (in radians) used to rotate the phase of the region. This phase value is stored at the same index as the position of the maximum.

Thus, when a new frame is analysed and a local maximum is detected, the position of the maximum is used to index the integer array. This gives the index of the existing maximum in the previous frame. That index is then used to access the array containing the last phase value used for the corresponding region in the previous synthesis frame. This is shown in Figs. 3a and 3b, which show the nearest-maximum array and the phase array for analysis frame n. Considering analysis frame n+1, the first frequency maximum is at 7. From the previous frame, the corresponding 7th element of the nearest-maximum array is 5. From the previous frame n, the 5th element of the phase array is 12 degrees. This value is updated using the frequency estimate of the local maximum and then stored at position 7 in the phase array of the next frame. For the second region 410, the 13th element of the nearest-maximum array from the previous analysis frame n gives 16. The phase array from the previous analysis frame n gives a phase of 57 degrees. This phase value is updated using the frequency estimate and placed at position 13 of the next phase array.
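The two arrays and the look-up between frames can be sketched in Python as follows. This is an illustration under assumptions: the names `build_nearest_max` and `carry_phase` are hypothetical, regions are given as (lo, peak, hi) triples covering bins lo to hi - 1, the region boundaries in the example are invented to reproduce the quoted array entries, and the per-frame phase increments are passed in as plain numbers because their formula is not reproduced here.

```python
def build_nearest_max(regions, n_bins):
    """For every bin inside a region, store the position of that region's maximum."""
    nearest = [-1] * n_bins                   # -1 marks bins outside any region
    for lo, peak, hi in regions:
        for k in range(lo, hi):
            nearest[k] = peak
    return nearest

def carry_phase(prev_nearest, prev_phase, new_peaks, increments):
    """Carry each region's accumulated phase from the previous frame to the next."""
    next_phase = [0.0] * len(prev_phase)
    for peak, inc in zip(new_peaks, increments):
        old_peak = prev_nearest[peak]         # maximum that owned this bin before
        start = prev_phase[old_peak] if old_peak >= 0 else 0.0
        next_phase[peak] = start + inc        # stored at the new maximum's index
    return next_phase

# Frame n: maxima at bins 5 and 16, with last phases 12 and 57 (any consistent unit).
prev_nearest = build_nearest_max([(3, 5, 9), (10, 16, 20)], 24)
prev_phase = [0.0] * 24
prev_phase[5], prev_phase[16] = 12.0, 57.0
# Frame n+1: maxima detected at bins 7 and 13; the increments are made-up numbers.
next_phase = carry_phase(prev_nearest, prev_phase, [7, 13], [1.0, 2.0])
print(prev_nearest[7], prev_nearest[13])      # 5 16
print(next_phase[7], next_phase[13])          # 13.0 59.0
```

Both look-ups are simple array indexing, so following a region from one frame to the next costs constant time per maximum.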

The frequency-domain representation of the signal is constructed from the known signal components. Each signal vector is added to the frequency-domain output array. Because the frequency positions are real-valued, the energy of each signal vector is distributed between the positions of the two nearest (integer-valued) bins. The frequency-domain representation is then inverse Fourier transformed (150 in Fig. 1, page 16) to provide the time-domain representation of the synthesised signal. Because the signal is analysed with different temporal resolutions at different frequencies, the synthesised time-domain signal is valid only in the region corresponding to the highest temporal resolution used. For this reason, the synthesised time-domain signal is windowed (160) with a (relatively) small raised cosine window (170) before being added (172), in overlapping fashion, to the final synthesised signal (180).
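The final windowing and overlap-add stage can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation of the embodiment: the names `synthesise_segment` and `overlap_add` are hypothetical, a naive inverse DFT stands in for the inverse FFT, the zero point of the frame is taken to be sample index 0 (so the useful samples wrap around the frame end), and a hop of half the synthesis window width is chosen only because raised-cosine windows then sum to 1.

```python
import cmath
import math

def synthesise_segment(spectrum, width):
    """Inverse-transform one frame and keep `width` samples around its zero point."""
    n = len(spectrum)
    time = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    half = width // 2
    segment = []
    for offset in range(-half, half):
        w = 0.5 * (1.0 + math.cos(math.pi * offset / half))   # small raised cosine
        segment.append(time[offset % n] * w)                  # index 0 is the centre
    return segment

def overlap_add(output, segment, start):
    """Accumulate a windowed segment into the output signal at sample `start`."""
    for i, value in enumerate(segment):
        output[start + i] += value

# A frame whose only non-zero bin is DC (a constant signal of value 1).
n, width = 16, 8
spectrum = [n + 0j] + [0j] * (n - 1)
segment = synthesise_segment(spectrum, width)
output = [0.0] * 16
for start in (0, 4, 8):
    overlap_add(output, segment, start)
print([round(v, 3) for v in output])
```

In the fully overlapped middle of the output the constant input is recovered exactly; only the first and last half-windows, which have no overlapping neighbour, remain tapered.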

An alternative (but equivalent) method of processing the information to achieve pitch shifting and time stretching is as follows.

The alternative method is substantially similar to the first; the two share the steps of windowing (420), Fourier transformation (450), filtering (460) and minimum/maximum detection (490). The key difference between the two methods is as follows. The first method sums the contents of each region, whereas the alternative method keeps the contents of each region explicitly (510). The contents of each region are then translated and scaled (530) according to the pitch-shift and time-stretch factors respectively. For a pitch-shift operation, the contents of a region are translated so that the maximum is scaled in frequency. For a time-stretch operation, the contents of a region are scaled by the time-stretch factor so that the frequency of the maximum does not change.

Phase-shift compensation is carried out essentially as described above with reference to Figs. 4a and 4b. To synthesise the output, the frequency-domain data to be synthesised are copied, one region at a time, from the unaltered output of the Fourier transform step. The contents of each region are accumulated into the output frequency-domain buffer in the same way as in the first method.

There are many variations in implementing these two techniques, which will be apparent to those skilled in the art. The key feature of the invention, however, is the use of the control function s(f) to vary the frequency-domain filter with frequency. This produces an equivalent time-domain windowing effect that varies with frequency. When processing audio signals, the control function is chosen to reflect the response of the cilia of the human ear across the audio range. Although the shape of this curve is determined empirically, other curves may prove suitable for other processing techniques and applications.

The invention is further characterised by the identification and location of maxima and their associated minima. The technique disclosed is computationally efficient, allowing high-quality time stretching and pitch shifting of audio signals to be carried out quickly.

Experiments show that the tonal quality of the sound produced by this technique is markedly improved, and it is believed that this is achieved mainly by preserving the harmonic information in the sidebands of the local frequency maxima.

As to practical implementation of the invention, it is envisaged that the technique may be realised in software or hardware. In the latter case, the hardware may form part of an audio component such as an audio playback machine. Potential applications include the sound recording industry, where audio signal processing/synthesis generally has to meet very high standards of reproduction quality. Other applications include those in the entertainment industry, and it is envisaged that the technique may find application in audio reproduction/transmission systems where it is desired to alter pitch and time. It is also envisaged that applications may lie in general signal processing, data reduction and/or data transmission and storage. In the latter cases, the choice of the specific convolution function may be varied.

Where the foregoing description refers to elements or integers having known equivalents, such equivalents are included as if they were individually set forth.

Although the invention has been described by way of example and with reference to specific embodiments, it will be understood that modifications and/or improvements may be made without departing from the scope of the appended claims.

Claims

1. A method of encoding and resynthesising a waveform, the method comprising the steps of:

sampling the waveform to obtain a series of discrete samples and constructing a series of frames from those samples, each frame spanning a plurality of samples;

multiplying each frame by a windowing function, wherein the peak of the windowing function is centred substantially at the zero point of each frame;

convolving the resulting frequency-domain data with a variable kernel function, the characteristics of which vary with frequency;

locating each local maximum and the surrounding minima in the magnitude spectrum of the convolved frame, wherein each local maximum and its associated minima define one of a plurality of regions, each region corresponding to a frequency component of the signal; and

analysing each region in the frequency-domain representation separately by summing the complex frequency components, or bins, falling within the defined region into a signal vector, wherein the variable kernel function can usefully be varied to achieve different trade-offs between frequency and temporal resolution across the frequency range of the signal.

2. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the windowing function is a raised cosine.

3. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the waveform corresponds to a digitised audio waveform and the kernel function is varied to approximate the perceptual characteristics of the human ear.

4. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the waveform corresponds to an audio signal and the positions of the maxima correspond to the pitches of the perceived frequency components.

5. A method of encoding and resynthesising a waveform as claimed in claim 1, further including the step of processing the signal while it is represented as signal vectors.

6. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the processing takes the form of modifying the pitch or time scale (of an audio signal), or of further data reduction suited to efficient signal storage and/or transmission.

7. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein, in the case of modifying an audio signal, the frequency positions and phases of the analysed signal vectors are shifted by predetermined amounts to achieve time and/or pitch scaling.

8. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein conversion back to a sampled time-domain representation of the signal is achieved by accumulating an equivalent signal into the frequency domain, the components of the equivalent signal corresponding to the signal vectors determined in analysing the original signal.

9. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein an inverse fast Fourier transform is used to provide a time-domain signal that can be suitably windowed and accumulated to produce the decoded signal.

10. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the form of the convolution function is determined empirically by subjectively assessing the quality of the synthesised output.

11. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein application of the kernel function to the frequency-domain data is implemented as a single-pole low-pass filter operation on the data, the position of the pole varying with frequency.

12. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein, in the case of analysing an audio signal, the pole is specified by a control function s(f) of the following form:

s(f)=0.4+0.26arctan(4ln(0.1f)-18)

Here, f is the frequency in hertz (cycles per second).

13. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the frequency-domain filter is specified by the following relation:

yout(f)=[1-s(f)]yin(f)+s(f)yout(f-1)

14. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein, for an audio signal, each signal vector is processed separately; for pitch shifting, the frequency of the component is multiplied by a real-valued pitch factor; and for pitch shifting and time-scale modification, the phase shift necessary for glitch-free reconstruction is calculated and applied.

15. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the method includes the following further steps:

zeroing the frequency-domain output array, each analysed frequency component being represented by an analysed signal vector;

mapping the real-valued frequency to the two nearest integer-valued frequency bins; and

16. A method of encoding and resynthesising a waveform as claimed in claim 1, wherein the resulting regions in the frequency domain around each maximum are moved to different frequencies, the positions of the maximum and of the resulting signal being multiples of the frequency of the maximum, so that the position of the maximum is scaled while the surrounding region is translated with it.

17. A method of encoding and resynthesising a waveform as claimed in claim 16, wherein, for each region having a maximum and associated first and second minima, to pitch-shift an audio signal the position of each maximum in the frame is scaled, and the associated harmonic information between the first and second minima is moved to the corresponding positions around the maximum.

18. A method of encoding and resynthesising a waveform as claimed in claim 16 or 17, wherein, to time-stretch the signal, each maximum is kept at the same position in the frequency domain while the frequency-domain band, or harmonic information, associated with the maximum is compressed, thereby stretching the amplitude and frequency modulation of the harmonic while preserving the pitch of the input signal.

19. A method of encoding and resynthesising a waveform as claimed in claim, further including the following steps:

resampling the data in each frame into a plurality of bins;

mapping each bin to a real-valued position in the output frame, where, for a bin x in the band of a maximum at frequency freq_max, the real-valued position in the output frequency domain is y, where

y = {freq}_{\max} \times shift + \frac{(x - {freq}_{\max})}{(scale)}

20. A method of encoding and resynthesising a waveform as claimed in claim 19, wherein y is rounded down to the nearest integer z less than or equal to y, and the contribution is added to output bins z and z+1 in proportion to 1 minus the difference between y and the integer position of each bin.

21. Software that operates according to the method claimed in any one of claims 1 to 20.

22. Apparatus configured to carry out the method claimed in any one of claims 1 to 20.