WO1986005617A1 - Processing of acoustic waveforms - Google Patents
Processing of acoustic waveforms
- Publication number
- WO1986005617A1 WO1986005617A1 PCT/US1986/000543 US8600543W WO8605617A1 WO 1986005617 A1 WO1986005617 A1 WO 1986005617A1 US 8600543 W US8600543 W US 8600543W WO 8605617 A1 WO8605617 A1 WO 8605617A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frequency
- frame
- waveform
- components
- series
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Definitions
- The field of this invention is speech technology generally and, in particular, methods and devices for analyzing, digitally encoding, modifying and synthesizing speech or other acoustic waveforms.
- The problem of representing speech signals is approached by using a speech-production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract.
- The glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech.
- In the voiced state, the excitation is periodic with a period that is allowed to vary slowly over time relative to the analysis frame rate (typically 10-20 msec).
- In the unvoiced state, the glottal excitation is modeled as random noise with a flat spectrum. In both cases the power level of the excitation is also considered to be slowly time-varying.
- Speech coders at rates compatible with conventional transmission lines would meet a substantial need. At such rates the binary model is ill-suited for coding applications. Additionally, speech processing devices and methods that allow the user to modify various parameters in reconstructing a waveform would find substantial usage. For example, time-scale modification (without pitch alteration) would be a very useful feature for a variety of speech applications (e.g., slowing down speech for translation purposes or speeding it up for scanning purposes) as well as for musical composition or analysis. Unfortunately, time-scale (and other parameter) modifications also are not accomplished with high quality by devices employing the binary model.
- The basic method of the invention includes the steps of: (a) selecting frames (i.e., windows of about 20-40 milliseconds) of samples from the waveform; (b) analyzing each frame of samples to extract a set of frequency components; (c) tracking the components from one frame to the next; and (d) interpolating the values of the components from one frame to the next to obtain a parametric representation of the waveform.
- A synthetic waveform can then be constructed by generating a series of sine waves corresponding to the parametric representation.
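By way of illustration only, steps (a) and (b) can be sketched in a few lines of Python (NumPy). This is a hedged sketch, not the patented implementation: the FFT size, window choice, and peak limit are assumed values.

```python
import numpy as np

def extract_peaks(frame, fs, n_fft=512, max_peaks=80):
    """Window one frame and pick the spectral peaks (a sketch)."""
    windowed = frame * np.hamming(len(frame))
    X = np.fft.rfft(windowed, n_fft)
    mag = np.abs(X)
    # local maxima: the slope changes from rising to falling (concave down)
    pk = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])) + 1
    pk = pk[np.argsort(mag[pk])[::-1][:max_peaks]]  # keep the largest peaks
    return pk * fs / n_fft, mag[pk], np.angle(X[pk])  # freqs (Hz), amps, phases
```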
- In one embodiment, a device uses only the amplitudes and frequencies of the component sine waves to represent the waveform.
- In that case, phase continuity is maintained by defining the phase to be the integral of the instantaneous frequency.
- In another embodiment, explicit use is made of the measured phases as well as the amplitudes and frequencies of the components.
- The invention is particularly useful in speech coding and time-scale modification and has been demonstrated successfully in both of these applications.
- Robust devices can be built according to the invention to operate in environments of additive acoustic noise.
- The invention can also be used to analyze single- and multiple-speaker signals, music or even biological sounds.
- The invention will also find particular applications, for example, in reading machines for the blind, in broadcast journalism editing and in the transmission of music to remote players.
- The basic method summarized above is employed to choose amplitudes, frequencies, and phases corresponding to the largest peaks in a periodogram of the measured signal, independently of the speech state.
- The amplitudes, frequencies, and phases of the sine waves estimated on one frame are matched and allowed to evolve continuously into the corresponding parameter set on the successive frame. Because the number of estimated peaks is neither constant nor slowly varying, the matching process is not straightforward. Rapidly varying regions of speech, such as unvoiced/voiced transitions, can result in large changes in both the location and number of peaks.
- Pitch estimates are used to establish a set of harmonic frequency bins to which the frequency components are assigned.
- Pitch is used herein to mean the fundamental rate at which a speaker's vocal cords are vibrating.
- The amplitudes of the components can be coded directly, using adaptive differential pulse code modulation (ADPCM) across frequency, or indirectly, using linear predictive coding.
- Within each harmonic bin, the peak having the largest amplitude is selected and assigned to the frequency at the center of the bin. This results in a harmonic series based upon the coded pitch period.
- The phases can then be coded by using the frequencies to predict the phase at the end of the frame, unwrapping the measured phase with respect to this prediction, and then coding the phase residual using 4 bits per phase peak.
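A minimal sketch of this predict/unwrap/quantize step might read as follows; the uniform 4-bit quantizer and the variable names are illustrative assumptions rather than the patent's exact coder.

```python
import numpy as np

def code_phase(theta_meas, theta_prev, w, T, bits=4):
    """Code one phase as a quantized residual about a frequency-based prediction."""
    theta_pred = theta_prev + w * T            # predict phase from the frequency
    # unwrap the measurement onto the 2*pi interval nearest the prediction
    k = np.round((theta_pred - theta_meas) / (2 * np.pi))
    residual = (theta_meas + 2 * np.pi * k) - theta_pred
    # uniform quantization of the residual over [-pi, pi)
    q = int(np.round((residual + np.pi) / (2 * np.pi) * (2**bits - 1)))
    theta_decoded = theta_pred - np.pi + q * 2 * np.pi / (2**bits - 1)
    return q, theta_decoded
```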
- Phase tracks for the high-frequency peaks can be artificially generated. In one preferred embodiment, this is done by translating the frequency tracks of the baseband peaks to the high-frequency region of the uncoded phase peaks.
- This new coding scheme has the important property of adaptively allocating the bits for each speaker and hence is self-tuning to both low- and high-pitched speakers.
- Although pitch is used to provide side information for the coding algorithm, the standard voice-excitation model for speech is not used. This means that recourse is never made to a voiced/unvoiced decision. As a consequence, the invention is robust in noise and can be applied at various data transmission rates simply by changing the rules for the bit allocation.
- The invention is also well-suited for time-scale modification, which is accomplished by time-scaling the amplitudes and phases such that the frequency variations are preserved.
- The time scale at which the speech is played back is controlled simply by changing the rate at which the matched peaks are interpolated. This means that the time scale can be sped up or slowed down by any factor, and this factor can be time-varying. The rate can be controlled by a panel knob, which allows an operator complete flexibility in varying the time scale. There is no perceptual delay in performing the time-scaling.
- The pitch period can be derived from the Fourier transform.
- Other techniques, such as the Gold-Malpass techniques, can also be used. See generally, M.L. Malpass, "The Gold Pitch Detector in a Real Time Environment", Proc. of EASCON 1975 (Sept. 1975); B. Gold, "Description of a Computer Program for Pitch Detection", Fourth International Congress on Acoustics, Copenhagen, August 21-28, 1962; and B. Gold, "Note on Buzz-Hiss Detection", J. Acoust. Soc. Amer. 36, 1659-1661 (1964), all incorporated herein by reference.
- Interpolation is used broadly in this application to encompass various techniques for filling in data values between those measured at the frame boundaries.
- Linear interpolation is employed to fill in amplitude and frequency values.
- Phase values are obtained by first defining a series of instantaneous frequency values by interpolating matched frequency components from one frame to the next, and then integrating the series of instantaneous frequency values to obtain a series of interpolated phase values.
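For one matched component, the fill-in procedure just described might be sketched as below (frequencies in radians per sample; the linear frequency interpolation and the running-sum integration follow the description above):

```python
import numpy as np

def synthesize_track(a0, a1, w0, w1, theta0, T):
    """Linearly interpolate amplitude/frequency; integrate frequency for phase."""
    t = np.arange(T)
    amp = a0 + (a1 - a0) * t / T            # linear amplitude fill-in
    inst_freq = w0 + (w1 - w0) * t / T      # instantaneous frequency series
    phase = theta0 + np.cumsum(inst_freq)   # integration keeps phase continuous
    return amp * np.sin(phase), phase[-1]   # samples, plus phase for next frame
```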
- In another embodiment, the phase value of each frame is derived directly, and a cubic polynomial equation preferably is employed to obtain maximally smooth phase interpolations from frame to frame.
- Other techniques that accomplish the same purpose are also referred to in this application as interpolation techniques.
- For example, the so-called "overlap and add" method of filling in data values can also be used.
- In this approach, a weighted overlapping function can be applied to the resulting sine waves generated during each frame, and the overlapped values can then be summed to fill in the values between those measured at the frame boundaries.
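A hedged sketch of such weighted overlap-and-add synthesis is given below, assuming each frame's parameters are held fixed over a two-frame span and weighted with a triangular window so that the overlapped weights sum to roughly one:

```python
import numpy as np

def overlap_add(frames, T):
    """frames: list of (amps, freqs, phases) per frame; freqs in rad/sample."""
    out = np.zeros(T * (len(frames) + 1))
    win = np.bartlett(2 * T)                      # triangular weighting function
    for k, (amps, freqs, phases) in enumerate(frames):
        n = np.arange(2 * T)
        seg = sum(a * np.cos(w * n + p) for a, w, p in zip(amps, freqs, phases))
        out[k * T : k * T + 2 * T] += win * seg   # overlapped segments are summed
    return out
```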
- FIGURE 1 is a schematic block diagram of one embodiment of the invention in which only the magnitudes and frequencies of the components are used to reconstruct a sampled waveform.
- FIGURE 2 is an illustration of the extracted amplitude and frequency components of a waveform sampled according to the present invention.
- FIGURE 5 is an illustration of tracked frequency components of an exemplary speech pattern.
- FIGURE 6 is a schematic block diagram of another embodiment of the invention in which the magnitude and phase of frequency components are used to reconstruct a sampled waveform.
- FIGURE 7 is an illustrative set of cubic phase interpolation functions, useful in connection with the embodiment of FIGURE 6, from which the "maximally smooth" phase function is selected.
- FIGURE 8 is a schematic block diagram of another embodiment of the invention particularly useful for time-scale modification.
- FIGURE 9 is a schematic block diagram showing an embodiment of the system estimation function of FIGURE 8.
- FIGURE 10 is a block diagram of one real-time implementation of the invention.
- The speech waveform is modeled as a sum of sine waves. If s(n) represents the sampled speech waveform, then
- $s(n) = \sum_i a_i(n)\,\sin[\theta_i(n)]$   (1), where $a_i(n)$ and $\theta_i(n)$ are the time-varying amplitude and phase of the i-th tone.
- $f_0(n)$ represents the fundamental frequency at time n.
- Phase continuity, and hence waveform continuity, is guaranteed as a consequence of the definition of phase in terms of the instantaneous frequency. This means that waveform reconstruction is possible from the magnitude-only spectrum, since a high-resolution spectral analysis reveals the amplitudes and frequencies of the component sine waves.
- A block diagram of an analysis/synthesis system according to the invention is illustrated in FIGURE 1.
- The peaks of the magnitude of the discrete Fourier transform (DFT) of a windowed waveform are found simply by determining the locations of a change in slope (concave down).
- The total number of peaks can be limited, and this limit can be adapted to the expected average pitch of the speaker.
- The speech waveform can be digitized at a 10 kHz sampling rate, low-pass filtered at 5 kHz, and analyzed at 20 msec frame intervals with a 20 msec Hamming window.
- Speech representations according to the invention can also be obtained by employing an analysis window of variable duration.
- Preferably, the width of the analysis window is pitch-adaptive, being set, for example, at 2.5 times the average pitch period, with a minimum width of 20 msec.
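A sketch of such a pitch-adaptive window follows; the odd-length rounding is an implementation assumption, not a requirement stated above.

```python
import numpy as np

def pitch_adaptive_window(avg_pitch_hz, fs, factor=2.5, min_ms=20.0):
    """Hamming window spanning 2.5 average pitch periods, at least 20 msec."""
    n = int(max(factor * fs / avg_pitch_hz, min_ms * 1e-3 * fs))
    return np.hamming(n | 1)   # force an odd length so the window is centered
```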
- FIGURE 3 illustrates the basic process of frequency component matching. If the number of peaks were constant and slowly varying from frame to frame, the problem of matching the parameters estimated on one frame with those on a successive frame would simply require a frequency-ordered assignment of peaks. In practice, however, there will be spurious peaks that come and go due to the effects of sidelobe interaction; the locations of the peaks will change as the pitch changes; and there will be rapid changes in both the location and the number of peaks corresponding to rapidly varying regions of speech, such as at voiced/unvoiced transitions. In order to account for such rapid movements in the spectral peaks, the present invention employs the concept of "birth" and "death" of sinusoidal components as part of the matching process.
- FIGURE 4(a) depicts the case where all frequencies $\omega_m^{k+1}$ in frame k+1 lie outside a "matching interval" $\Delta$ of $\omega_n^k$, i.e., $|\omega_n^k - \omega_m^{k+1}| \ge \Delta$ for all m.
- In Step 2, a candidate match from Step 1 is confirmed.
- Suppose that a frequency $\omega_n^k$ of frame k has been tentatively matched to frequency $\omega_m^{k+1}$ of frame k+1.
- The candidate match is declared to be a definitive match if no remaining frequency of frame k lies closer to $\omega_m^{k+1}$; this condition is illustrated in FIGURE 4(c).
- Otherwise, Step 1 is repeated for the next frequency in the list, $\omega_{n+1}^k$.
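The flavor of this matching step can be captured in a short Python sketch. It is deliberately simplified to a greedy nearest-frequency pass: the Step 2 confirmation, in which a tentative match may be ceded to a better remaining candidate, is omitted, and all names are assumptions.

```python
def match_frames(freqs_k, freqs_k1, delta):
    """Match frame-k peaks to frame-(k+1) peaks within +/- delta (greedy sketch)."""
    matches, used = [], set()
    for i, f in enumerate(freqs_k):
        best, best_d = None, delta
        for j, g in enumerate(freqs_k1):
            if j not in used and abs(g - f) < best_d:
                best, best_d = j, abs(g - f)
        if best is None:
            matches.append((i, None))   # no candidate in the interval: track dies
        else:
            used.add(best)
            matches.append((i, best))   # continuing track
    births = [j for j in range(len(freqs_k1)) if j not in used]
    return matches, births              # unmatched new peaks are born in frame k+1
```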
- The results of applying the tracker to a segment of real speech are shown in FIGURE 5, which demonstrates the ability of the tracker to adapt quickly through transitory speech behavior such as voiced/unvoiced transitions and mixed voiced/unvoiced regions.
- FIGURE 6 shows a block diagram of a more comprehensive system in which phases are measured directly.
- The frequency components and their amplitudes are determined in the same manner as in the magnitude-only system described above and illustrated in FIGURE 1.
- Phase measurements are derived directly from the discrete Fourier transform by computing the arctangents at the estimated frequency peaks.
- The unwrapped phase is modeled by a phase interpolation function that is a cubic polynomial, namely $\theta(t) = \zeta + \gamma t + \alpha t^2 + \beta t^3$.
- The parameters of the polynomial must be chosen to satisfy the frequency and phase measurements obtained at the frame boundaries. Since the instantaneous frequency is the derivative of the phase, $\theta'(t) = \gamma + 2\alpha t + 3\beta t^2$ must match the measured frequencies at the frame endpoints.
- FIGURE 7 illustrates a typical set of cubic phase interpolation functions for a number of values of M. It seems clear on intuitive grounds that the best phase function to pick is the one that would have the least variation. This is what is meant by a maximally smooth frequency track. In fact, if the frequencies were constant and the vocal tract were stationary, the true phase would be linear. Therefore a reasonable criterion for "smoothness" is to choose M such that the integral of the squared second derivative of the phase, $f(M) = \int_0^T [\theta''(t; M)]^2\,dt$, is minimized.
- This phase function not only satisfies all of the measured phase and frequency endpoint constraints, but also unwraps the phase in such a way that $\theta(t)$ is maximally smooth.
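The maximally smooth cubic can be computed in closed form from the boundary measurements. The sketch below follows the published sinusoidal-model derivation; units (radians, radians per sample, samples per frame) are assumptions.

```python
import numpy as np

def cubic_phase(theta0, w0, theta1, w1, T):
    """Cubic phase from (theta0, w0) at frame k to (theta1, w1) at frame k+1."""
    # integer cycle count M minimizing the integral of the squared
    # second derivative of the phase over the frame
    M = np.round((theta0 + w0 * T - theta1 + 0.5 * (w1 - w0) * T) / (2 * np.pi))
    A = theta1 + 2 * np.pi * M - theta0 - w0 * T
    B = w1 - w0
    alpha = 3 * A / T**2 - B / T       # solve the two endpoint constraints
    beta = -2 * A / T**3 + B / T**2
    t = np.arange(T)
    return theta0 + w0 * t + alpha * t**2 + beta * t**3
```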
- Each frequency track will have associated with it an instantaneous unwrapped phase which accounts for both the rapid phase changes due to the frequency of each sinusoidal component and the slowly varying phase changes due to the glottal pulse and the vocal-tract transfer function.
- Let $\tilde{\theta}_l(t)$ denote the unwrapped phase function for the l-th frequency track.
- The invention as described in connection with FIGURE 6 has been used to develop a speech coding system for operation at 8 kilobits per second. At this rate, high-quality speech depends critically on the phase measurements and, thus, phase coding is a high priority. Since the sinusoidal representation also requires the specification of the amplitudes and frequencies, it is clear that relatively few peaks can be coded before all of the available bits are used. The first step, therefore, is to significantly reduce the number of parameters that must be coded. One way to do this is to force all of the frequencies to be harmonic.
- Noise-like waveforms can be represented (in an ensemble mean-squared-error sense) in terms of a harmonic expansion of sine waves, provided the spacing between adjacent harmonics is small enough that there is little change in the power-spectrum envelope (i.e., intervals less than about 100 Hz).
- This representation preserves the statistical properties of the input speech provided the amplitudes and phases are randomly varying from frame to frame. Since the amplitudes and phases are to be coded, this random variation inherent in the measurement variables can be preserved in the synthetic waveform.
- The number of sine-wave components to be coded is the bandwidth of the coded speech divided by the fundamental (e.g., 4 kHz of coded bandwidth with a 200 Hz fundamental yields 20 components). Since there is no guarantee that the number of measured peaks will equal this harmonic number, provision should be made for adjusting the number of peaks to be coded.
- A set of harmonic frequency bins is established, and the number of peaks falling within each bin is examined. If more than one peak is found, then only the amplitude and phase corresponding to the largest peak are retained for coding. If there are no peaks in a given bin, then a fictitious peak is created, having an amplitude and phase obtained by sampling the short-time Fourier transform at the frequency corresponding to the center of the bin.
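A sketch of this bin-filling rule is shown below; `stft_sample(f)` is an assumed helper returning the short-time Fourier transform amplitude and phase at frequency f, and the bin width of one fundamental is taken from the description above.

```python
def harmonic_peaks(freqs, amps, phases, f0, bandwidth, stft_sample):
    """One (frequency, amplitude, phase) triple per harmonic bin (a sketch)."""
    coded = []
    for h in range(1, int(bandwidth / f0) + 1):
        center = h * f0
        in_bin = [i for i, f in enumerate(freqs) if abs(f - center) < f0 / 2]
        if in_bin:
            i = max(in_bin, key=lambda i: amps[i])   # keep only the largest peak
            coded.append((center, amps[i], phases[i]))
        else:
            a, p = stft_sample(center)               # create a fictitious peak
            coded.append((center, a, p))
    return coded
```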
- The amplitudes are then coded by applying the same techniques used in channel vocoders. That is, a gain level is set, for example, by using 5 bits with 2 dB per level to code the amplitude of a first peak (i.e., the first peak above 300 Hz). Subsequent peaks are coded logarithmically using delta-modulation techniques across frequency. In one simulation, 3.6 kbps were assigned to code the amplitudes at a 50 Hz frame rate. Adaptive bit-allocation rules can be used to assign bits to peaks. For example, if the pitch is high there will be relatively few peaks to code, and there will be more bits per peak. Conversely, when the pitch is low there will be relatively few bits per peak; but since the peaks will be closer together, their values will be more correlated, hence the ADPCM coder should be able to track them well.
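A much-simplified sketch of the gain-plus-differential idea follows; plain first-order delta coding across frequency stands in for the ADPCM coder, and the bit widths echo the example values above.

```python
import numpy as np

def code_amplitudes_db(amps_db, step_db=2.0, gain_bits=5, delta_bits=3):
    """Code a gain for the first peak, then deltas across frequency (a sketch)."""
    gain = int(np.clip(round(amps_db[0] / step_db), 0, 2**gain_bits - 1))
    codes, prev = [gain], gain * step_db
    lo, hi = -(2**(delta_bits - 1)), 2**(delta_bits - 1) - 1
    for a in amps_db[1:]:
        d = int(np.clip(round((a - prev) / step_db), lo, hi))
        codes.append(d)
        prev += d * step_db        # the decoder tracks the same running value
    return codes
```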
- To code each phase, a fixed number of bits per peak (typically 4 or 5) is used.
- Another method uses the frequency track corresponding to the phase (to be coded) to predict the phase at the end of the current frame, unwraps the value, and then codes the phase residual using ADPCM techniques with 4 or 5 bits per phase peak. Since there remain only 4.4 kbps to code the phases and the fundamental (7 bits are used), at a 50 Hz frame rate it is possible to code at most 16 peaks.
- This suffices provided the pitch is greater than 250 Hz. If the pitch is less than 250 Hz, provision has to be made for regenerating a phase track for the uncoded high-frequency peaks. This is done by computing a differential frequency that is the difference between the derivative of the instantaneous cubic phase and the linear interpolation of the endpoint frequencies for that track. The differential frequency is translated to the high-frequency region by adding it to the linear interpolation of the endpoint frequencies corresponding to the track of the uncoded phase. The resulting instantaneous frequency function is then integrated to give the instantaneous phase function that is applied to the sine-wave generator. In this way the phase coherence intrinsic in voiced speech and the phase incoherence characteristic of unvoiced speech are effectively translated to the uncoded frequency regions.
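For one uncoded high-frequency track, this regeneration might be sketched as follows; `np.gradient` stands in for the analytic derivative of the cubic phase, and all names are assumptions.

```python
import numpy as np

def regenerate_phase(theta_cubic, w_lo, w_hi, T):
    """theta_cubic: coded baseband cubic phase sampled at t = 0..T-1.
    w_lo = (start, end) baseband frequencies; w_hi = same for the uncoded track."""
    t = np.arange(T)
    lin_lo = w_lo[0] + (w_lo[1] - w_lo[0]) * t / T
    lin_hi = w_hi[0] + (w_hi[1] - w_hi[0]) * t / T
    diff = np.gradient(theta_cubic) - lin_lo   # differential frequency
    inst_hi = lin_hi + diff                    # translate it to the high region
    return np.cumsum(inst_hi)                  # integrate to obtain the phase
```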
- In FIGURE 8, another embodiment of the invention is shown, particularly adapted for time-scale modification.
- In this embodiment, the representative sine waves are further defined to consist of system contributions (i.e., from the vocal tract) and excitation contributions (i.e., from the vocal cords).
- The excitation phase contributions are singled out for cubic interpolation.
- The procedure generally follows that described above in connection with the other embodiments; however, in a further step the measured amplitudes $A_l^k$ and phases $\theta_l^k$ are decomposed into vocal-tract and excitation components.
- The approach is to first form estimates of the vocal-tract amplitude and phase as functions of frequency at each analysis frame (i.e., $M(\omega, kR)$ and $\Phi(\omega, kR)$).
- System amplitude and phase estimates at the selected frequencies $\omega_l^k$ are then given by sampling these functions: $M_l^k = M(\omega_l^k, kR)$ and $\Phi_l^k = \Phi(\omega_l^k, kR)$.
- The decomposition problem then becomes that of estimating $M(\omega, kR)$ and $\Phi(\omega, kR)$ as functions of frequency from the high-resolution spectrum $X(\omega, kR)$.
- One approach to estimation of the system magnitude, and the corresponding estimation of the system phase through the use of the Hilbert transform, is shown in FIGURE 9 and is based on a homomorphic transformation.
- The Fourier transform of the logarithm of the high-resolution magnitude is first computed to obtain the "cepstrum".
- The imaginary component of the resulting inverse Fourier transform is the desired phase, and the real part is the smooth log-magnitude.
- In practice, uniformly spaced samples of the Fourier transform are computed with the FFT.
- The length of the FFT was chosen to be 512, which was sufficiently large to avoid aliasing in the cepstrum.
- The high-resolution spectrum used to estimate the sine-wave frequencies is also used to estimate the vocal-tract system function.
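One way to realize this homomorphic estimate in Python is sketched below; the lifter length is an assumed tuning value, and doubling the low-quefrency coefficients is the standard construction that yields the Hilbert-transform (minimum-phase) system phase.

```python
import numpy as np

def system_estimate(x_frame, n_fft=512, n_lifter=30):
    """Smooth system magnitude M(w) and phase Phi(w) via the cepstrum (sketch)."""
    X = np.fft.fft(x_frame, n_fft)
    log_mag = np.log(np.abs(X) + 1e-12)
    cep = np.fft.ifft(log_mag).real        # cepstrum of the log magnitude
    lift = np.zeros(n_fft)
    lift[0] = 1.0
    lift[1:n_lifter] = 2.0                 # keep low quefrencies, double them
    smooth = np.fft.fft(cep * lift)
    return np.exp(smooth.real), smooth.imag   # magnitude estimate, phase estimate
```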
- The remaining analysis steps in the time-scale modifying system of FIGURE 8 are analogous to those described above in connection with the other embodiments.
- All of the amplitudes and phases of the excitation and system components measured for an arbitrary frame k are associated with a corresponding set of parameters for frame k+1.
- The next step in the synthesis is to interpolate the matched excitation and system parameters across frame boundaries.
- The interpolation procedures are based on the assumption that the excitation and system functions are slowly varying across frame boundaries. This is consistent with the assumption that the model parameters are slowly varying relative to the duration of the vocal-tract impulse response. Since this slowly-varying constraint maps to a slowly varying excitation and system amplitude, it suffices to interpolate these functions linearly.
- The system phase estimate derived from the homomorphic analysis is unwrapped in frequency and is thus slowly varying when the system amplitude (from which it was derived) is slowly varying. Linear interpolation of samples of this function then results in a phase trajectory which reflects the underlying vocal-tract movement.
- This phase function is referred to as $\tilde{\Phi}_l(t)$, where $\tilde{\Phi}_l(0)$ corresponds to the $\Phi_l^k$ of Equation 22.
- The goal of time-scale modification is to maintain the perceptual quality of the original speech while changing the apparent rate of articulation. This implies that the frequency trajectories of the excitation (and thus the pitch contour) are stretched or compressed in time, and that the vocal tract changes at a slower or faster rate.
- The synthesis method of the previous section is ideally suited for this transformation, since it involves summing sine waves composed of vocal-cord excitation and vocal-tract system contributions for which explicit functional expressions have been derived.
- With a rate-change factor $\rho$, speech events which take place at a time $t_o$ according to the new time scale will have occurred at $\rho^{-1} t_o$ in the original time scale.
- The "events" which are time-scaled are the system amplitudes and phases, and the excitation amplitudes and frequencies, along each frequency track. Since the parameter estimates of the unmodified synthesis are available as continuous functions of time, in theory any rate change is possible.
- In conjunction with Equations (19) et seq., the cubic phase function $\tilde{\theta}'_l(n)$ is initialized by the value $\rho(t'_n)\,\theta_l(t'_n)$, where $\theta_l(t'_n)$ is the initial excitation phase obtained using (17).
- The invention can also be used to perform frequency and pitch scaling.
- The short-time spectral envelope of the synthetic waveform can be varied by scaling each frequency component, and the pitch of the synthetic waveform can be altered by scaling the excitation-contributed frequency components.
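As an illustration, pitch scaling of the excitation components might look like the sketch below; `system(f)` is an assumed helper that samples the vocal-tract magnitude and phase estimates at frequency f, so the spectral envelope is preserved while the harmonics move.

```python
def pitch_scale(excitation_tracks, beta, system):
    """Scale excitation frequencies by beta, resampling the envelope (a sketch).

    excitation_tracks: list of (frequency, excitation_phase) pairs.
    """
    scaled = []
    for f, phi_exc in excitation_tracks:
        f_new = beta * f
        M, phi_sys = system(f_new)      # envelope sampled at the new frequency
        scaled.append((f_new, M, phi_exc + phi_sys))
    return scaled
```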
- In FIGURE 10, a final embodiment of the invention is shown, which has been implemented and operated in real time.
- The illustrated embodiment was implemented in 16-bit fixed-point arithmetic using four Lincoln Digital Signal Processors (LDSPs).
- The foreground program operates on every input A/D sample, collecting 100 input speech samples into 10 msec buffers.
- A 10 msec buffer of synthesized speech is played out through a D/A converter.
- The most recent speech is pushed down into a 600 msec buffer. It is from this buffer that the data for the pitch-adaptive Hamming window is drawn and on which a 512-point Fast Fourier Transform (FFT) is applied.
- A set of amplitudes and frequencies is obtained by locating the peaks of the magnitude of the FFT.
- The data is supplied to the pitch-extraction module, from which is generated the pitch estimate that controls the pitch-adaptive windows. This parameter is also supplied to the coding module in the data-compression application.
- Another pitch-adaptive Hamming window is buffered and transferred to another LDSP for parallel computation.
- Another 512-point FFT is taken for the purpose of estimating the amplitudes, frequencies and phases to which the coding and speech-modification methods will be applied.
Abstract
A sinusoidal model for acoustic waveforms is applied to develop a new analysis/synthesis technique that characterizes a waveform by the amplitudes, frequencies and phases of its component sine waves. These parameters are estimated from a short-time Fourier transform. Rapid changes in the highly resolved spectral components are tracked using the concepts of "birth" and "death" of the underlying sine waves. The component values are interpolated from one frame to the next to yield a representation that is applied to a sine-wave generator. The resulting synthetic waveform preserves the general shape of the waveform and cannot be perceptually distinguished from the original. Moreover, the perceptual characteristics of the waveform are preserved in the presence of noise, as is the noise itself. The method and devices are particularly useful for speech coding, time-scale modification, frequency-scale modification and pitch modification.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71286685A | 1985-03-18 | 1985-03-18 | |
US712,866 | 1985-03-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1986005617A1 true WO1986005617A1 (fr) | 1986-09-25 |
Family
ID=24863876
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1986/000543 WO1986005617A1 (fr) | 1986-03-14 | Processing of acoustic waveforms |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP0215915A4 (fr) |
JP (1) | JP2759646B2 (fr) |
AU (1) | AU597573B2 (fr) |
CA (1) | CA1243122A (fr) |
WO (1) | WO1986005617A1 (fr) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0259950A1 (fr) * | 1986-09-11 | 1988-03-16 | AT&T Corp. | Digital sinusoidal vocoder with transmission of only a portion of the harmonics |
EP0260053A1 (fr) * | 1986-09-11 | 1988-03-16 | AT&T Corp. | Digital vocoder |
EP0285275A2 (fr) * | 1987-04-02 | 1988-10-05 | Massachusetts Institute Of Technology | Procédé et dispositif de prétraitement d'un signal acoustique |
EP0285276A2 (fr) * | 1987-04-02 | 1988-10-05 | Massachusetts Institute Of Technology | Codage de formes d'ondes acoustiques |
US4937873A (en) * | 1985-03-18 | 1990-06-26 | Massachusetts Institute Of Technology | Computationally efficient sine wave synthesis for acoustic waveform processing |
US5029509A (en) * | 1989-05-10 | 1991-07-09 | Board Of Trustees Of The Leland Stanford Junior University | Musical synthesizer combining deterministic and stochastic waveforms |
US5327521A (en) * | 1992-03-02 | 1994-07-05 | The Walt Disney Company | Speech transformation system |
DE4425767A1 (de) * | 1994-07-21 | 1996-01-25 | Rainer Dipl Ing Hettrich | Method for reproducing signals at a changed speed |
EP1127349A1 (fr) * | 1998-08-28 | 2001-08-29 | Sigma Audio Research Limited | Signal processing techniques for time scaling and/or pitch modification of audio signals |
WO2005055201A1 (fr) * | 2003-12-01 | 2005-06-16 | Aic | Highly optimized method for modeling a windowed signal |
US9343074B2 (en) | 2012-01-20 | 2016-05-17 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for audio encoding and decoding employing sinusoidal substitution |
US10607630B2 (en) | 2016-03-18 | 2020-03-31 | Fraunhofer-Gesellschaft Zur Förderung Der | Encoding by reconstructing phase information using a structure tensor on audio spectrograms |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3212785B2 (ja) * | 1993-12-22 | 2001-09-25 | 防衛庁技術研究本部長 | Signal detection device |
US5812737A (en) * | 1995-01-09 | 1998-09-22 | The Board Of Trustees Of The Leland Stanford Junior University | Harmonic and frequency-locked loop pitch tracker and sound separation system |
JP3262204B2 (ja) * | 1996-03-25 | 2002-03-04 | 日本電信電話株式会社 | Frequency component extraction device |
EP1259955B1 (fr) * | 2000-02-29 | 2006-01-11 | QUALCOMM Incorporated | Method and apparatus for measuring the phase of a quasi-periodic signal |
ES2269112T3 (es) * | 2000-02-29 | 2007-04-01 | Qualcomm Incorporated | Mixed-domain closed-loop multimode speech coder |
JP3404350B2 (ja) * | 2000-03-06 | 2003-05-06 | パナソニック モバイルコミュニケーションズ株式会社 | Speech coding parameter acquisition method, speech decoding method and apparatus |
SE0004221L (sv) * | 2000-11-17 | 2002-04-02 | Forskarpatent I Syd Ab | Method and device for speech analysis |
WO2002058053A1 (fr) * | 2001-01-22 | 2002-07-25 | Kanars Data Corporation | Encoding method and decoding method for digital voice data |
US8027242B2 (en) | 2005-10-21 | 2011-09-27 | Qualcomm Incorporated | Signal coding and decoding based on spectral dynamics |
US8392176B2 (en) | 2006-04-10 | 2013-03-05 | Qualcomm Incorporated | Processing of excitation in audio coding and decoding |
KR101080421B1 (ko) * | 2007-03-16 | 2011-11-04 | 삼성전자주식회사 | Sinusoidal audio coding method and apparatus |
US8428957B2 (en) | 2007-08-24 | 2013-04-23 | Qualcomm Incorporated | Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3360610A (en) * | 1964-05-07 | 1967-12-26 | Bell Telephone Labor Inc | Bandwidth compression utilizing magnitude and phase coded signals representative of the input signal |
US4058676A (en) * | 1975-07-07 | 1977-11-15 | International Communication Sciences | Speech analysis and synthesis system |
US4076958A (en) * | 1976-09-13 | 1978-02-28 | E-Systems, Inc. | Signal synthesizer spectrum contour scaler |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6017120B2 (ja) * | 1981-05-29 | 1985-05-01 | 松下電器産業株式会社 | Phoneme-segment editing type speech synthesis system |
JPS6040631B2 (ja) * | 1981-12-08 | 1985-09-11 | 松下電器産業株式会社 | Phoneme-segment editing type speech synthesis system |
JPS592033A (ja) * | 1982-06-28 | 1984-01-07 | Hitachi Ltd | Rear projection screen |
JPS597399A (ja) * | 1982-07-02 | 1984-01-14 | 松下電器産業株式会社 | Monosyllabic speech recognition device |
JPS5942598A (ja) * | 1982-09-03 | 1984-03-09 | 日本電信電話株式会社 | Synthesis-by-rule concatenation processing circuit |
JPS6088326A (ja) * | 1983-10-19 | 1985-05-18 | Kawai Musical Instr Mfg Co Ltd | Acoustic analysis device |
JPS6097398A (ja) * | 1983-11-01 | 1985-05-31 | 株式会社河合楽器製作所 | Acoustic analysis device |
JPH079591B2 (ja) * | 1983-11-01 | 1995-02-01 | 株式会社河合楽器製作所 | Musical instrument sound analysis device |
- 1986
- 1986-03-14 JP JP61501779A patent/JP2759646B2/ja not_active Expired - Lifetime
- 1986-03-14 WO PCT/US1986/000543 patent/WO1986005617A1/fr not_active Application Discontinuation
- 1986-03-14 AU AU56208/86A patent/AU597573B2/en not_active Expired
- 1986-03-14 EP EP19860902188 patent/EP0215915A4/fr not_active Withdrawn
- 1986-03-18 CA CA000504354A patent/CA1243122A/fr not_active Expired
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3360610A (en) * | 1964-05-07 | 1967-12-26 | Bell Telephone Labor Inc | Bandwidth compression utilizing magnitude and phase coded signals representative of the input signal |
US4058676A (en) * | 1975-07-07 | 1977-11-15 | International Communication Sciences | Speech analysis and synthesis system |
US4076958A (en) * | 1976-09-13 | 1978-02-28 | E-Systems, Inc. | Signal synthesizer spectrum contour scaler |
Non-Patent Citations (1)
Title |
---|
See also references of EP0215915A4 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4937873A (en) * | 1985-03-18 | 1990-06-26 | Massachusetts Institute Of Technology | Computationally efficient sine wave synthesis for acoustic waveform processing |
EP0259950A1 (fr) * | 1986-09-11 | 1988-03-16 | AT&T Corp. | Digital sinusoidal vocoder with transmission of only a portion of the harmonics |
EP0260053A1 (fr) * | 1986-09-11 | 1988-03-16 | AT&T Corp. | Digital vocoder |
EP0285276A2 (fr) * | 1987-04-02 | 1988-10-05 | Massachusetts Institute Of Technology | Codage de formes d'ondes acoustiques |
EP0285275A3 (fr) * | 1987-04-02 | 1989-11-23 | Massachusetts Institute Of Technology | Procédé et dispositif de prétraitement d'un signal acoustique |
EP0285276A3 (en) * | 1987-04-02 | 1989-11-23 | Massachusetts Institute Of Technology | Coding of acoustic waveforms |
EP0285275A2 (fr) * | 1987-04-02 | 1988-10-05 | Massachusetts Institute Of Technology | Procédé et dispositif de prétraitement d'un signal acoustique |
US5029509A (en) * | 1989-05-10 | 1991-07-09 | Board Of Trustees Of The Leland Stanford Junior University | Musical synthesizer combining deterministic and stochastic waveforms |
US5327521A (en) * | 1992-03-02 | 1994-07-05 | The Walt Disney Company | Speech transformation system |
DE4425767A1 (de) * | 1994-07-21 | 1996-01-25 | Rainer Dipl Ing Hettrich | Method for reproducing signals at a changed speed |
EP1127349A1 (fr) * | 1998-08-28 | 2001-08-29 | Sigma Audio Research Limited | Signal processing techniques for time scaling and/or pitch modification of audio signals |
EP1127349A4 (fr) * | 1998-08-28 | 2005-07-13 | Sigma Audio Res Ltd | Signal processing techniques for time scaling and/or pitch modification of audio signals |
WO2005055201A1 (fr) * | 2003-12-01 | 2005-06-16 | Aic | Highly optimized method for modeling a windowed signal |
US9343074B2 (en) | 2012-01-20 | 2016-05-17 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for audio encoding and decoding employing sinusoidal substitution |
US10607630B2 (en) | 2016-03-18 | 2020-03-31 | Fraunhofer-Gesellschaft Zur Förderung Der | Encoding by reconstructing phase information using a structure tensor on audio spectrograms |
Also Published As
Publication number | Publication date |
---|---|
EP0215915A4 (fr) | 1987-11-25 |
AU5620886A (en) | 1986-10-13 |
JP2759646B2 (ja) | 1998-05-28 |
CA1243122A (fr) | 1988-10-11 |
EP0215915A1 (fr) | 1987-04-01 |
AU597573B2 (en) | 1990-06-07 |
JPS62502572A (ja) | 1987-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4885790A (en) | Processing of acoustic waveforms | |
AU597573B2 (en) | Acoustic waveform processing | |
US4937873A (en) | Computationally efficient sine wave synthesis for acoustic waveform processing | |
McAulay et al. | Speech analysis/synthesis based on a sinusoidal representation | |
McAulay et al. | Pitch estimation and voicing detection based on a sinusoidal speech model | |
Malah | Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals | |
EP0388104B1 (fr) | Procédé pour l'analyse et la synthèse de la parole | |
McAulay et al. | Magnitude-only reconstruction using a sinusoidal speech model | |
Potamianos et al. | Speech analysis and synthesis using an AM–FM modulation model | |
US6496797B1 (en) | Apparatus and method of speech coding and decoding using multiple frames | |
US20050065784A1 (en) | Modification of acoustic signals using sinusoidal analysis and synthesis | |
JP3191926B2 (ja) | Coding system for acoustic waveforms | |
McAulay et al. | Mid-rate coding based on a sinusoidal representation of speech | |
Ferreira | Combined spectral envelope normalization and subtraction of sinusoidal components in the ODFT and MDCT frequency domains | |
Serra | Introducing the phase vocoder | |
Cavaliere et al. | Granular synthesis of musical signals | |
Zivanovic et al. | Single and piecewise polynomials for modeling of pitched sounds | |
US6438517B1 (en) | Multi-stage pitch and mixed voicing estimation for harmonic speech coders | |
Tabet et al. | Speech analysis and synthesis with a refined adaptive sinusoidal representation | |
KR100579797B1 (ko) | Speech codebook construction system and method | |
Parikh et al. | Frame erasure concealment using sinusoidal analysis-synthesis and its application to MDCT-based codecs | |
Kawahara et al. | Restructuring speech representations using straight-tempo: Possible role of a repetitive structure in sounds | |
Sercov et al. | An improved speech model with allowance for time-varying pitch harmonic amplitudes and frequencies in low bit-rate MBE coders. | |
JP3321933B2 (ja) | Pitch detection method | |
JP3398968B2 (ja) | Speech analysis and synthesis method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE FR GB IT LU NL SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1986902188 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1986902188 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1986902188 Country of ref document: EP |