CN100440314C - High quality real time sound changing method based on speech sound analysis and synthesis - Google Patents
- Publication number
- CN100440314C (application CNB2004100623371, CN200410062337A)
- Authority
- CN
- China
- Prior art keywords
- spectrum
- frequency
- time
- resonance peak
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The present invention relates to a high-quality real-time voice changing method based on speech analysis and synthesis, and belongs to the field of voice conversion. The signal is interpolated or decimated in the time domain according to the required change in duration. It is then transformed from the time domain to the frequency domain, where the amplitude spectrum and the phase spectrum are processed separately so that the fundamental frequency and the formants can be separated and adjusted independently. The effect of the duration adjustment on the fundamental frequency and the formants is compensated during this adjustment, and finally the time-domain signal is restored. The time-domain signal is converted to the frequency domain by the fast Fourier transform, the fundamental frequency and the formant positions of the speech are separated and adjusted individually, and the speech is synthesized again, so that duration, pitch, timbre and intensity can be adjusted to achieve voice conversion. The method can process speech in real time. It can be used directly in entertainment applications such as Internet telephony and voice chat rooms, and in practical applications such as dubbing and music synthesis. It can also be used in speech synthesis to improve the overall quality of synthetic speech.
Description
Technical field
The present invention relates to the field of voice transformation, and in particular to a high-quality real-time voice changing method based on speech analysis and synthesis.
Background art
Voice transformation changes acoustic features of speech such as pitch and speaking rate, so that new features can be produced on demand. It has many practical applications, for example dubbing, music synthesis, Internet chat and voice privacy. The technique broadens the scope of speech processing research and diversifies its applications.
The basic physical characteristics of speech are pitch, intensity, timbre and duration. Pitch is determined by the vibration frequency of the sounding body: the higher the frequency, the higher the sound. Women and children, for example, have shorter and thinner vocal cords that vibrate at a higher frequency when speaking or singing, whereas men and elderly people have longer and thicker vocal cords that vibrate at a lower frequency, so their voices sound deeper than those of women and children. Pitch can be changed by changing the fundamental frequency. Intensity corresponds to the strength of the sound and is determined by its amplitude, that is, by the size of the vibration. Timbre, also called tone quality, is the essential character of a sound. It depends on the form of the acoustic vibration and is the most basic feature that distinguishes one sound from another: a voice, a piano and a violin performing the same tune each sound different. Formants reflect the prominent harmonic components of a sound, so the height, position and number of the formants are considered to affect timbre. Duration is the length of the sound and is determined by how long the sounding body vibrates.
Pitch, intensity, timbre and duration are the basic elements of sound, but none of them exists independently of the others. Changing only one of them usually changes the others as well. For example, changing the sampling frequency at which a digital speech signal is played changes the speaking rate, that is, the duration. At the same time, however, the fundamental frequency and the formant positions also change, so the sound we hear differs not only in speaking rate but also in timbre and pitch, and the speaker's characteristics change beyond recognition. As another example, if only the fundamental frequency of the speech is scaled and the speech is resynthesized, the formant positions move together with the fundamental frequency and the timbre changes too. These problems need to be solved in voice transformation.
The present invention clarifies the relationship between these four factors and, by means of separation and compensation, adjusts pitch, intensity, timbre and duration independently. Speaker characteristics such as timbre, pitch and speaking rate can therefore be adjusted flexibly, giving a high-quality simulation of several speaker identities (elderly person, child, adult male, girl, and so on).
Summary of the invention
The object of the present invention is to provide a high-quality real-time voice changing method based on speech analysis and synthesis.
The method starts from an understanding of the physical attributes of speech. By studying how changes in each attribute affect the voice, it arrives at a digital signal processing method for changing the speaker identity of a voice. The invention is based on time-frequency analysis of the digital signal. The length of the speech is changed by interpolation and decimation in the time domain. The time-domain signal is transformed to the frequency domain by the short-time Fourier transform, and the phase spectrum, the amplitude spectrum and the spectral envelope of the amplitude spectrum are adjusted, so that the fundamental frequency of the speech is separated from the formant positions and the two can be adjusted individually. Finally the modified features are synthesized back into a speech signal, which changes the character of the voice. The invention adjusts the fundamental frequency, the formant positions, the time length and the intensity independently, so speaker characteristics such as timbre, pitch and speaking rate can be adjusted flexibly, giving a high-quality simulation of several speaker genders and ages (elderly person, child, adult male, girl, and so on).
A high-quality real-time voice changing method based on speech analysis and synthesis, using Fourier analysis and synthesis, comprises the following steps: the signal is interpolated or decimated in the time domain according to the required change in duration; it is then transformed to the frequency domain, where the amplitude spectrum and the phase spectrum are processed separately; the fundamental frequency and the formants are separated and adjusted independently, with the effect of the duration adjustment on both compensated during the adjustment; finally the time-domain signal is restored. The method is fast and gives high-quality results, so it meets the requirements of both real-time operation and practical use.
The fundamental frequency and the formant positions of the signal are adjusted independently in the frequency domain. Because the two are separated, the fundamental frequency of the speech signal and its harmonics can be changed while the formant positions are either preserved or adjusted freely, so timbre and pitch are changed independently.
The time length of the speech signal is changed directly in the time domain. The digital signal is resampled by interpolation or decimation, which stretches or compresses the time scale of the speech. The resulting changes in fundamental frequency and formant positions are then compensated, so that only the speaking rate changes.
The energy of the signal is accumulated and the energy ratio between the input and output signals is adjusted in real time, so the intensity of the output speech can be changed flexibly.
The amplitude spectrum and the phase spectrum are adjusted separately. The spectral envelope of the amplitude spectrum is computed and, on that basis, the shape of the spectral envelope of the new amplitude spectrum obtained after the fundamental frequency adjustment is changed, so that the formant positions can be adjusted freely without affecting the fundamental frequency.
The present invention is based on speech analysis and synthesis, as shown in Fig. 2.
A speech signal is treated as a short-time stationary signal, so it can be transformed to the frequency domain by the short-time Fourier transform for analysis and processing. The time window used for the short-time Fourier transform must not be too short: it usually has to contain several fundamental periods. Because of the short-time stationarity constraint it must not be too long either, so that the physical characteristics do not change noticeably within one frame. For speech, the fundamental frequency of a male voice is relatively low, usually about 125 Hz, which gives a fundamental period of about 8 ms. The window length is therefore usually chosen between about 24 ms and 32 ms. In digital signal processing, the length of the window function is the number of data samples in one frame, and it depends on the sampling rate of the speech signal.
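The relationship between window duration, sampling rate and frame length can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 16 kHz sampling rate and the function names are assumptions, not values from the patent.

```python
def frame_length_samples(sample_rate_hz, window_ms):
    """Number of samples in one analysis frame of the given duration."""
    return int(round(sample_rate_hz * window_ms / 1000.0))


def pitch_periods_in_window(f0_hz, window_ms):
    """How many fundamental periods fit inside the window."""
    return window_ms * f0_hz / 1000.0


# Assumed 16 kHz sampling rate; 125 Hz is the male fundamental quoted in the text.
for window_ms in (24, 32):
    print(window_ms,
          frame_length_samples(16000, window_ms),
          pitch_periods_in_window(125, window_ms))
```

At the assumed 16 kHz rate, a 32 ms window is 512 samples, which is also a convenient power-of-two FFT size, and it spans four periods of a 125 Hz fundamental.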
Performing the short-time Fourier transform is equivalent to windowing the frame, extending it periodically along the time axis, and computing the Fourier series of the resulting signal. The periodically extended frame is thus expressed as a superposition of complex sinusoids, and the Fourier coefficients obtained from the transform are the amplitudes of these complex sinusoidal components. If the frequency of every complex sinusoidal component is multiplied by the same scale factor p, then the fundamental frequency and the harmonic frequencies of the time-domain speech signal resynthesized by the inverse Fourier transform are also multiplied by p, which changes the fundamental frequency of the original signal.
In practice, the short-time Fourier transform is implemented by windowing followed by the fast Fourier transform (FFT). To adjust the frequency of each component after the transform and resynthesize the time-domain signal with the inverse fast Fourier transform (IFFT), the Fourier coefficients are first converted from rectangular to polar coordinates, which gives the amplitude spectrum and the phase spectrum. This makes it convenient to separate the fundamental frequency from the formant positions, and it also allows the following equivalent implementation: instead of moving a complex sinusoidal component from its original frequency $f_1$ to the new frequency $p \cdot f_1$, the amplitude and phase at each fixed frequency $f_2$ are replaced by the amplitude and phase of the component whose original frequency is $f_2 / p$, so that the IFFT can be used directly for synthesis.
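The fixed-grid equivalence described above can be sketched for the amplitude spectrum as a lookup at bin k / p. This is a minimal illustration, assuming linear interpolation between bins and zero fill beyond the last bin; the patent does not specify the interpolation rule, and `scale_spectrum_axis` is a hypothetical helper name.

```python
import numpy as np


def scale_spectrum_axis(values, p):
    """Return new[k] = values[k / p], linearly interpolated between bins.

    Bins whose source index k / p lies beyond the last bin are set to 0
    (an assumption; relevant only when p < 1).
    """
    values = np.asarray(values, dtype=float)
    k = np.arange(len(values))
    return np.interp(k / p, k, values, right=0.0)


# A single spectral peak at bin 10 moves to bin 15 when p = 1.5.
spectrum = np.zeros(64)
spectrum[10] = 1.0
shifted = scale_spectrum_axis(spectrum, 1.5)
print(int(np.argmax(shifted)))
```

The phase spectrum cannot be remapped this simply, because the phase advance per frame also scales with p.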
For the amplitude spectrum, the processing described above only requires interpolating or decimating the original amplitude spectrum by the scale factor. The phase spectrum must first be unwrapped, as shown in Fig. 3. When, within a frame, the frequency of a complex sinusoidal component is adjusted from $f_1$ to $p \cdot f_1$, the phase change of that component within the frame also becomes p times the original change, and this phase change accumulates frame by frame into the initial phase of the next frame. To apply this adjustment to the phase spectrum, the unwrapped phase difference between two adjacent frames (that is, the phase change over the previous frame) is multiplied by p; the initial phase obtained by accumulation then also becomes p times the original.
Method of unwrapping the phase spectrum:
Suppose the time shift between two frames is $t_w$. For the complex sinusoidal component of frequency $f_k$, at time $t$ ($t > 1$), the theoretical phase change relative to the previous frame is
$$\Delta\Phi_k(t) = 2\pi \cdot f_k \cdot t_w.$$
The actual initial phase difference between the two frames is
$$\Delta\theta_k(t) = \theta_k(t) - \theta_k(t-1).$$
Define
$$\Delta\varphi_k(t) = \bigl(\Delta\theta_k(t) - \Delta\Phi_k(t)\bigr) \bmod 2\pi + \Delta\Phi_k(t),$$
where [condition not preserved in the source text].
Then $\Delta\varphi_k(t)$ is the unwrapped phase change between the two adjacent frames at time $t$. Accumulating these changes gives the unwrapped initial phase at time $t$ [formula not preserved in the source text].
As stated above, changing the fundamental frequency only requires interpolating or decimating the original amplitude spectrum by the scale factor. Doing so, however, also moves the formant positions by the same ratio. Another method is therefore needed to adjust the formants without affecting the fundamental frequency. This is achieved by extracting the spectral envelope of the amplitude spectrum.
In the formulas that follow, e(n) is the spectral envelope of the original amplitude spectrum before adjustment. After the processing described above raises the fundamental frequency by a factor of p, the spectral envelope is stretched along with it. (The formula images that define the stretched envelope, the amplitude spectrum adjusted by interpolation, and the amplitude spectrum after formant compensation are not preserved in this text.) The compensated amplitude spectrum obtained in this way retains the spectral envelope e(n) of the original amplitude spectrum, so the original formant positions do not move, and the frequency adjustment is not affected. By the same reasoning, e(n) in the compensation formula can be replaced by a new spectral envelope in which the formants have been adjusted, which changes the formant positions.
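One way to write the compensation consistently with the surrounding description is shown below. The symbols $A$, $A_p$, $e_p$ and $\hat{A}$ are assumed notation introduced here for illustration, since the original symbols are not recoverable; only e(n) and the factors p and f come from the text.

```latex
% Assumed notation: A(n) original amplitude spectrum, e(n) its envelope.
% Interpolating the spectrum by p stretches both harmonics and envelope:
A_p(n) = A(n/p), \qquad e_p(n) = e(n/p)
% Dividing out the stretched envelope and restoring the original one
% keeps the formants in place while the harmonics stay scaled by p:
\hat{A}(n) = A_p(n)\,\frac{e(n)}{e_p(n)}
% Replacing e(n) by an envelope stretched by a formant factor f
% moves the formants instead of preserving them:
\hat{A}_f(n) = A_p(n)\,\frac{e(n/f)}{e_p(n)}
```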
Common methods for computing the spectral envelope include linear predictive coding (LPC), cepstral analysis, low-pass filtering, the discrete cepstrum, and interpolation between local peak points. To meet the real-time requirement, the chosen method must have low complexity while still giving good results. This example uses an improved cepstral analysis method. Experiments show that it is stable, suits many types of voice, and meets practical requirements in both quality and computational cost.
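The text does not describe what the improvement to cepstral analysis consists of, so the sketch below shows only the plain textbook version: take the log magnitude spectrum, keep the low-quefrency cepstral coefficients, and transform back. The function name, the `n_keep` parameter and its default value are assumptions.

```python
import numpy as np


def cepstral_envelope(magnitude, n_keep=30):
    """Smooth envelope of a one-sided magnitude spectrum (rfft layout).

    Only the first n_keep cepstral coefficients (and their mirror images)
    are kept, which removes the fine harmonic ripple.
    """
    log_mag = np.log(np.maximum(magnitude, 1e-10))   # avoid log(0)
    cepstrum = np.fft.irfft(log_mag)                 # real cepstrum
    lifter = np.zeros_like(cepstrum)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0                 # symmetric half
    return np.exp(np.fft.rfft(cepstrum * lifter).real)


# Harmonic-like ripple with a period of 12.8 bins is smoothed away.
k = np.arange(257)
rippled = 1.0 + 0.5 * np.cos(2 * np.pi * 40 * k / 512)
print(np.ptp(rippled), np.ptp(cepstral_envelope(rippled)))
```

A smaller `n_keep` gives a smoother envelope; too small a value blurs the formants, too large a value lets the harmonic structure leak into the envelope.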
The method above changes the fundamental frequency and the formant positions independently.
On this basis, independent adjustment of the duration also becomes easy to implement.
It is well known that changing the sampling frequency at which a digital speech signal is played changes the speaking rate, that is, the duration. The speech data can therefore first be interpolated or decimated in the time domain and then played at the original sampling rate, which slows down or speeds up the speech. At the same time, however, the fundamental frequency and the formant positions also change. If the time-domain signal is interpolated by a scale factor t, the fundamental period becomes t times the original, the fundamental frequency becomes 1/t of the original, and the formant positions are likewise scaled by 1/t.
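The 1/t effect is easy to reproduce numerically. The sketch below is an illustration under stated assumptions: it uses linear interpolation (the patent does not name the interpolation method), an 8 kHz sampling rate and a 200 Hz test tone chosen only for the demonstration, and a hypothetical helper name.

```python
import numpy as np


def resample_by_factor(x, t):
    """Stretch (t > 1) or compress (t < 1) a signal to about t times its
    length by linear interpolation between samples."""
    x = np.asarray(x, dtype=float)
    n_out = int(round(len(x) * t))
    positions = np.arange(n_out) / t        # where each output sample reads
    return np.interp(positions, np.arange(len(x)), x)


fs = 8000                                   # assumed sampling rate
tone = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
stretched = resample_by_factor(tone, 2.0)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(stretched))))
print(len(stretched), peak_bin * fs / len(stretched))
```

Played back at the same sampling rate, the stretched tone lasts twice as long and its frequency is halved, which is the side effect the compensation step has to undo.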
Given the method above for changing the fundamental frequency and the formant positions independently, it suffices to compensate both the fundamental frequency and the formant positions by the scale factor t in order to change only the duration.
As the discussion above shows, the three physical characteristics are adjusted in the order duration, fundamental frequency, formant position. Suppose their scale factors are t, p and f respectively, and the three characteristics are adjusted one after another. First the duration is adjusted by the factor t, and the fundamental frequency and the formant positions are compensated by the factor t. Next the fundamental frequency is adjusted by the factor p, and the formant positions are compensated by the factor 1/p. Finally the formant positions are adjusted by the factor f. The net result is equivalent to adjusting the duration by t, then the fundamental frequency by p*t, and finally adjusting the formant positions at that point by the factor $f/p$, so that the three characteristics are adjusted independently by t, p and f. In practice the formant adjustment can be simplified: the formants only need to be adjusted by f*t relative to their initial positions, as shown in Fig. 1.
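The bookkeeping of the three factors can be captured in a small helper. This is only a sketch: the function name and dictionary keys are hypothetical, and the products follow the t, p*t and f*t scalings stated in the text.

```python
def effective_factors(t, p, f):
    """Map the independent user factors (duration t, pitch p, formant f)
    onto the scalings that are actually applied to the signal."""
    for name, value in (("t", t), ("p", p), ("f", f)):
        if value <= 0:
            raise ValueError(f"factor {name} must be positive, got {value}")
    return {
        "time_resample": t,            # interpolation/decimation in time
        "harmonic_axis_scale": t * p,  # undoes the 1/t shift, then applies p
        "envelope_axis_scale": t * f,  # undoes the 1/t shift, then applies f
    }


print(effective_factors(0.5, 1.5, 1.25))
```

With t = 1 the helper reduces to plain pitch and formant scaling; with p = f = 1 both spectral scalings equal t, which is exactly the compensation needed to change only the duration.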
All three physical characteristics are adjusted by interpolating and decimating sample points. To ensure good voice changing quality while still meeting the needs of voice conversion, each scale factor is limited to the range 0.5 to 2. Experiments show that most adjustments made within this range give satisfactory results. Note that when the formant positions and the fundamental frequency are adjusted while the duration adjustment is also being compensated, the two combined ratios can become very large (well beyond a factor of 2), so that much information is lost or blurred. When the formant or fundamental frequency ratio is large, the duration should therefore not be adjusted by a large amount at the same time.
The intensity is adjusted as follows. Let $\Delta E_{i,n}$ and $\Delta E_{o,n}$ denote the energy of the n-th frame at the input (before spectral analysis, after the duration adjustment) and at the output (after the fundamental frequency and formants have been adjusted and the time-domain signal resynthesized), and let $E_{i,n}$ and $E_{o,n}$ denote the total energy of the input signal and of the output signal up to frame n. Then
$$E_{i,n} = E_{i,n-1} + \Delta E_{i,n},$$
$$E_{o,n} = E_{o,n-1} + \Delta E_{o,n}.$$
The formulas above ensure that the energy of the signal after the voice transformation stays essentially consistent with the original energy before the transformation, that is, the intensity remains unchanged. If the intensity is to be adjusted by some ratio, that scale factor is simply applied on top of this.
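The text gives the running energy totals but not the gain derived from them, so the sketch below fills that step with an assumption: the output frame is scaled by the square root of the ratio of accumulated input energy to accumulated output energy, optionally multiplied by a user loudness ratio. The class and parameter names are hypothetical.

```python
import numpy as np


class EnergyTracker:
    """Running input/output energy totals used to derive a per-frame gain."""

    def __init__(self):
        self.e_in = 0.0    # total input energy up to the current frame
        self.e_out = 0.0   # total output energy up to the current frame

    def gain(self, frame_in, frame_out, loudness_ratio=1.0):
        """Update both totals and return an amplitude gain for frame_out."""
        self.e_in += float(np.sum(np.square(frame_in)))
        self.e_out += float(np.sum(np.square(frame_out)))
        if self.e_out == 0.0:
            return loudness_ratio          # silent output: nothing to match
        # Assumed rule: amplitude gain is the square root of the energy ratio.
        return loudness_ratio * np.sqrt(self.e_in / self.e_out)


tracker = EnergyTracker()
g = tracker.gain(np.full(4, 2.0), np.ones(4))
print(g)
```

Using accumulated rather than per-frame energy smooths the gain over time, which avoids audible pumping between frames.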
Description of drawings
Fig. 1 is a detailed flowchart of the duration, fundamental frequency and formant position adjustment of the present invention;
Fig. 2 is a schematic flowchart of the signal analysis and synthesis steps of the present invention;
Fig. 3 is a schematic diagram of the phase unwrapping of the present invention.
Detailed description of the embodiments
The steps of the duration, fundamental frequency and formant position adjustment in Fig. 1 are as follows:
Step S1-1: in the time domain, interpolate or decimate the data points of a frame according to the adjustment factor t;
Step S1-2: transform to the frequency domain and convert from rectangular to polar coordinates, obtaining phase spectrum I and amplitude spectrum II;
Step S1-3: extract the envelope of amplitude spectrum II to obtain envelope spectrum III, and scale III along the frequency axis by the adjustment factor t*f to obtain envelope spectrum IV, in which the formant positions have been adjusted;
Step S1-4: divide amplitude spectrum II point by point by envelope spectrum III to obtain V, scale the abscissa of spectrum V along the frequency axis by the adjustment factor t*p, then multiply point by point by the adjusted envelope spectrum IV to obtain the adjusted amplitude spectrum VII;
Step S1-5: unwrap phase spectrum I using the phase difference from the adjacent frame to obtain the actual phase change at each frequency between the two frames, multiply this value by the adjustment factor t*p, scale the frequency axis by the adjustment factor t*p, then accumulate the adjusted phase differences to obtain the adjusted phase spectrum VIII of the current frame;
Step S1-6: convert the adjusted amplitude spectrum VII and phase spectrum VIII to rectangular coordinates and transform back to the time domain.
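The amplitude path of steps S1-3 and S1-4 can be sketched as follows. The envelope is passed in as an argument because the envelope extraction method is a separate choice; linear interpolation along the frequency axis and the function name are assumptions.

```python
import numpy as np


def adjust_amplitude(amp, envelope, t, p, f):
    """Steps S1-3 and S1-4 applied to one frame's amplitude spectrum.

    amp and envelope are arrays over frequency bins; the envelope must be
    strictly positive. t, p, f are the duration, pitch and formant factors.
    """
    amp = np.asarray(amp, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    k = np.arange(len(amp))
    # S1-3: move the formants by stretching the envelope (III -> IV).
    new_envelope = np.interp(k / (t * f), k, envelope)
    # S1-4: flatten (II / III = V), stretch the harmonic structure,
    # then re-apply the adjusted envelope.
    flat = amp / envelope
    new_flat = np.interp(k / (t * p), k, flat, right=0.0)
    return new_flat * new_envelope


# With a flat envelope, a harmonic at bin 10 moves to bin 20 for t * p = 2.
amp = np.zeros(64)
amp[10] = 1.0
print(int(np.argmax(adjust_amplitude(amp, np.ones(64), 1.0, 2.0, 1.0))))
```

Because the envelope is divided out before the harmonic axis is scaled, the two scalings t*p and t*f act on separate pieces of the spectrum and do not interfere.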
The steps of the speech signal analysis and synthesis in Fig. 2 are as follows:
Step S2-1: process the signal in the time domain, including splitting into frames, interpolation and windowing;
Step S2-2: transform each time-domain frame to the frequency domain by a time-frequency transform, process the spectrum, including adjusting the fundamental frequency and the formants, then return to the time domain by the inverse time-frequency transform;
Step S2-3: in the time domain, apply window function compensation to each frame, window it with the synthesis window function, and overlap-add the frames to obtain the complete time-domain signal.
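The three steps above follow the usual windowed overlap-add pattern, sketched below. The Hann window, the 512-sample frame, the 128-sample hop and the normalisation by the summed squared window (used here as one plausible reading of "window function compensation") are all assumptions; `process` is a placeholder for the spectral adjustment.

```python
import numpy as np


def analysis_synthesis(x, frame_len=512, hop=128, process=lambda s: s):
    """Frame, window, transform, process, inverse-transform and overlap-add."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window          # S2-1
        spectrum = process(np.fft.rfft(frame))                # S2-2
        y = np.fft.irfft(spectrum, n=frame_len) * window      # S2-3: synthesis window
        out[start:start + frame_len] += y                     # overlap-add
        norm[start:start + frame_len] += window ** 2          # window compensation
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out


rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
restored = analysis_synthesis(signal)
print(np.max(np.abs(restored[512:3584] - signal[512:3584])))
```

With the identity `process`, the interior of the signal is reconstructed to within rounding error, which is a useful sanity check before any spectral modification is plugged in.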
For the phase unwrapping of Fig. 3, the unwrapping procedure is as described in detail above.
To simulate male, female, child and elderly voices and convert between them, the present invention adjusts each physical characteristic on the following basis.
In ordinary speech, the fundamental frequency of a child's voice is generally considered the highest, reaching about 300 Hz; a female voice averages roughly 220 Hz, and a male voice averages about 125 Hz. These values give the general ratios between the fundamental frequencies of male, female and child voices. In practice, a female-to-male fundamental frequency ratio between 1.5 and 1.8 usually gives good results, while the child-to-male ratio must be above 1.8. To simulate an elderly voice, the fundamental frequency is usually lowered to a ratio of about 0.6 to 0.9 to obtain a realistic effect.
The formants of male, female and child voices generally stand in the simple ratio of roughly 6:7:8. In reality, the ratio between corresponding formants of male, female and child voices is not linear across frequency: the lower formants usually differ more, and the higher formants differ little. Under ordinary application conditions this can be ignored. The timbre of an elderly voice can be regarded as leaning toward the male voice, so a formant adjustment ratio of less than 1 is chosen for it.
When converting between male and female voices, the speaking rate is usually considered unchanged. For elderly and child voices, the speaking rate can be slowed slightly to match reality.
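The guidance above can be collected into a table of starting values for a male source voice. Every number below is an illustrative assumption chosen inside the ranges quoted in the text (for example 1.65 as a midpoint of 1.5 to 1.8); none is a value prescribed by the patent, and the preset and function names are hypothetical.

```python
# Factors relative to a male source voice: t = duration, p = pitch, f = formant.
PRESETS = {
    "male_to_female":  {"t": 1.0, "p": 1.65, "f": 7 / 6},   # 6:7 formant ratio
    "male_to_child":   {"t": 1.1, "p": 1.9,  "f": 8 / 6},   # 6:8 formant ratio
    "male_to_elderly": {"t": 1.1, "p": 0.75, "f": 0.95},    # formant ratio < 1
}


def within_limits(preset, low=0.5, high=2.0):
    """Check that every factor stays inside the recommended 0.5 to 2 range."""
    return all(low <= value <= high for value in preset.values())


for name, preset in PRESETS.items():
    print(name, within_limits(preset))
```

Here t greater than 1 lengthens the signal and therefore slows the speech slightly, as suggested for child and elderly voices; the values are meant to be tuned by ear.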
Example
Following the method proposed by the present invention, a demonstration program was implemented on a PC platform. It can record, play back the original sound, and process the sound to play a simulated male, female, elderly or child voice in real time. For each buffer in the playback buffer queue, the program first pre-processes the buffer and then adjusts the duration, fundamental frequency and formants of the speech data in it by predetermined ratios, simulating male, female, elderly and child voices respectively. The ratios of these three features can also be adjusted manually to obtain a more satisfactory simulation. The program runs in real time: adjustment and playback both happen in real time.
On a test platform with a P4 2.4 GHz CPU and 256 MB of memory, CPU usage is 2% when idle, about 10% when playing the original sound, and about 22% during real-time voice changing playback. The processor requirements of this voice changing method are well within an acceptable range, and the sound quality is satisfactory.
Claims (5)
1. A real-time voice changing method based on speech analysis and synthesis, using Fourier analysis and synthesis, characterized by comprising the following steps: interpolating or decimating the signal in the time domain according to the required change in time length; then transforming to the frequency domain; processing the amplitude spectrum and the phase spectrum separately; separating the fundamental frequency and the formants and adjusting them independently, while compensating during the adjustment for the effect of the time-length adjustment on both; and finally restoring the time-domain signal; wherein the fundamental frequency and the formant positions are adjusted as follows:
Step S1-1: in the time domain, interpolate or decimate the data points of a frame according to the adjustment factor t;
Step S1-2: transform to the frequency domain and convert from rectangular to polar coordinates, obtaining phase spectrum I and amplitude spectrum II;
Step S1-3: extract the envelope of amplitude spectrum II to obtain envelope spectrum III, and scale III along the frequency axis by the adjustment factor t*f to obtain envelope spectrum IV, in which the formant positions have been adjusted, where f denotes the formant position adjustment factor;
Step S1-4: divide amplitude spectrum II point by point by envelope spectrum III to obtain V, scale the abscissa of spectrum V along the frequency axis by the adjustment factor t*p, then multiply point by point by the adjusted envelope spectrum IV to obtain the adjusted amplitude spectrum VII, where p denotes the fundamental frequency adjustment factor;
Step S1-5: unwrap phase spectrum I using the phase difference from the adjacent frame to obtain the actual phase change at each frequency between the two frames, multiply this value by the adjustment factor t*p, scale the frequency axis by the adjustment factor t*p, then accumulate the adjusted phase differences to obtain the adjusted phase spectrum VIII of the current frame;
Step S1-6: convert the adjusted amplitude spectrum VII and phase spectrum VIII to rectangular coordinates and transform back to the time domain.
2. according to claim 1 based on speech analysis and synthetic real-time change of voice method, it is characterized in that, independent adjustment is carried out in fundamental frequency and resonance peak position to signal on frequency domain, fundamental frequency and resonance peak position are separated, the fundamental frequency and the harmonic wave thereof of voice signal both can have been changed, can keep the resonance peak position again simultaneously or the resonance peak position arbitrarily be adjusted the independent change of realization tone color and pitch.
3. according to claim 2 based on speech analysis and synthetic real-time change of voice method, it is characterized in that, directly the time span to voice signal changes on time domain, by interpolation or take out to cut digital signal realize is resampled, thereby elongate or shorten the time scale of voice, fundamental frequency and the resonance peak position that changes compensated, realize the effect that separately word speed is changed.
4. according to claim 1 based on speech analysis and synthetic real-time change of voice method, it is characterized in that, by asking for the spectrum envelope of amplitude spectrum, and based on this, the spectrum envelope of the new amplitude spectrum that carried out the adjusted spectrum signal of fundamental frequency is carried out in shape change, under the prerequisite that does not influence fundamental frequency, realize random adjustment to the resonance peak position.
5. The real-time voice changing method based on speech analysis and synthesis according to claim 1 or 2, characterized in that the speech analysis and synthesis steps are as follows:
Step S2-1: process the signal in the time domain, including splitting it into overlapping frames, interpolation, and windowing;
Step S2-2: transform each time-domain frame into the frequency domain by a time-frequency transform, process the spectrum (including adjusting the fundamental frequency and the formants (resonance peaks)), and then return to the time domain by the inverse time-frequency transform;
Step S2-3: apply window-function compensation to each frame in the time domain, window each frame with the synthesis window function, and overlap-add the frames to obtain the complete time-domain signal.
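The framing, spectral processing, and overlap-add steps described in the claims above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the use of NumPy, the Hann window, cepstral smoothing as the envelope estimator, linear interpolation for resampling, and every function and parameter name (`overlap_add_process`, `shift_formants`, `spectral_envelope`, `resample`, `frame_len`, `hop`, `n_ceps`, `ratio`, `factor`) are hypothetical choices that the claims do not specify. Fundamental-frequency adjustment is left out; only the envelope reshaping, the resampling, and the analysis/synthesis frame loop are shown.

```python
import numpy as np

def spectral_envelope(mag, n_ceps=30):
    """Smooth envelope of a magnitude spectrum via cepstral liftering
    (an assumed estimator; the claims only say an envelope is computed)."""
    log_mag = np.log(mag + 1e-9)
    ceps = np.fft.irfft(log_mag)          # real cepstrum of the frame
    lifter = np.zeros_like(ceps)
    lifter[:n_ceps] = 1.0                 # keep low quefrencies ...
    lifter[-n_ceps + 1:] = 1.0            # ... and their mirrored half
    return np.exp(np.fft.rfft(ceps * lifter).real)

def shift_formants(spectrum, ratio, n_ceps=30):
    """Warp the spectral envelope along frequency by `ratio` while leaving
    the harmonic fine structure (and hence the pitch) where it is."""
    mag = np.abs(spectrum)
    env = spectral_envelope(mag, n_ceps)
    bins = np.arange(len(mag))
    warped = np.interp(bins / ratio, bins, env)   # envelope moved up if ratio > 1
    return spectrum * (warped / env)

def resample(signal, factor):
    """Stretch (factor > 1) or shrink (factor < 1) a signal by linear
    interpolation; this changes duration and pitch together."""
    n_out = int(round(len(signal) * factor))
    pos = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(pos, np.arange(len(signal)), signal)

def overlap_add_process(signal, frame_len=512, hop=128, modify=lambda s: s):
    """Frame, window, transform, modify, inverse-transform, overlap-add."""
    window = np.hanning(frame_len)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window       # analysis window
        spec = modify(np.fft.rfft(frame))                      # frequency-domain step
        frame_out = np.fft.irfft(spec, n=frame_len) * window   # synthesis window
        out[start:start + frame_len] += frame_out
        norm[start:start + frame_len] += window ** 2
    covered = norm > 1e-8
    out[covered] /= norm[covered]          # window-function compensation
    return out
```

A call such as `overlap_add_process(x, modify=lambda s: shift_formants(s, 1.2))` would move the envelope upward by about 20 percent without resampling the signal. With the default identity `modify`, the interior of the signal is reconstructed up to floating-point error, which is a convenient sanity check on the window compensation.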
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100623371A CN100440314C (en) | 2004-07-06 | 2004-07-06 | High quality real time sound changing method based on speech sound analysis and synthesis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100623371A CN100440314C (en) | 2004-07-06 | 2004-07-06 | High quality real time sound changing method based on speech sound analysis and synthesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1719514A (en) | 2006-01-11 |
CN100440314C (en) | 2008-12-03 |
Family
ID=35931331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100623371A Expired - Fee Related CN100440314C (en) | 2004-07-06 | 2004-07-06 | High quality real time sound changing method based on speech sound analysis and synthesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100440314C (en) |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2107556A1 (en) * | 2008-04-04 | 2009-10-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio transform coding using pitch correction |
CN101304391A (en) * | 2008-06-30 | 2008-11-12 | 腾讯科技(深圳)有限公司 | Voice call method and system based on instant communication system |
CN101860617A (en) * | 2009-04-12 | 2010-10-13 | 比亚迪股份有限公司 | Mobile terminal with voice processing effect and method thereof |
US8744853B2 (en) * | 2009-05-28 | 2014-06-03 | International Business Machines Corporation | Speaker-adaptive synthesized voice |
CN101764879A (en) * | 2009-10-10 | 2010-06-30 | 宇龙计算机通信科技(深圳)有限公司 | Method for changing speech output of mobile terminal and mobile terminal |
CN102044245B (en) * | 2009-10-16 | 2013-03-27 | 成都玺汇科技有限公司 | Instant processing method of input voice information in intelligent machine |
CN101727899B (en) * | 2009-11-27 | 2014-07-30 | 北京中星微电子有限公司 | Method and system for processing audio data |
CN101917163B (en) * | 2010-07-29 | 2012-05-23 | 大连理工大学 | Method for improving electrohydraulic vibration exciting control waveform of non-sinusoidal periodic signal |
US8930182B2 (en) * | 2011-03-17 | 2015-01-06 | International Business Machines Corporation | Voice transformation with encoded information |
CN102307327B (en) * | 2011-08-10 | 2015-08-19 | 深圳万兴信息科技股份有限公司 | A kind of sound effect inflexion method and device |
CN103258539B (en) * | 2012-02-15 | 2015-09-23 | 展讯通信(上海)有限公司 | A kind of transform method of voice signal characteristic and device |
CN102592590B (en) * | 2012-02-21 | 2014-07-02 | 华南理工大学 | Arbitrarily adjustable method and device for changing phoneme naturally |
CN104205213B (en) * | 2012-03-23 | 2018-01-05 | 西门子公司 | Audio signal processing method and device and use its audiphone |
CN102682766A (en) * | 2012-05-12 | 2012-09-19 | 黄莹 | Self-learning lover voice swapper |
CN103730117A (en) * | 2012-10-12 | 2014-04-16 | 中兴通讯股份有限公司 | Self-adaptation intelligent voice device and method |
CN103489443B (en) * | 2013-09-17 | 2016-06-15 | 湖南大学 | A kind of sound imitates method and device |
CN103714824B (en) * | 2013-12-12 | 2017-06-16 | 小米科技有限责任公司 | A kind of audio-frequency processing method, device and terminal device |
CN105304092A (en) * | 2015-09-18 | 2016-02-03 | 深圳市海派通讯科技有限公司 | Real-time voice changing method based on intelligent terminal |
CN105632490A (en) * | 2015-12-18 | 2016-06-01 | 合肥寰景信息技术有限公司 | Context simulation method for network community voice communication |
CN105679331B (en) * | 2015-12-30 | 2019-09-06 | 广东工业大学 | A kind of information Signal separator and synthetic method and system |
CN105845146B (en) * | 2016-05-23 | 2019-09-06 | 珠海市杰理科技股份有限公司 | The method and device of Speech processing |
CN106128478B (en) * | 2016-06-28 | 2019-11-08 | 北京小米移动软件有限公司 | Voice broadcast method and device |
FR3062945B1 (en) * | 2017-02-13 | 2019-04-05 | Centre National De La Recherche Scientifique | METHOD AND APPARATUS FOR DYNAMICALLY CHANGING THE VOICE STAMP BY FREQUENCY SHIFTING THE FORMS OF A SPECTRAL ENVELOPE |
CN108053814B (en) * | 2017-11-06 | 2023-10-13 | 芋头科技(杭州)有限公司 | Speech synthesis system and method for simulating singing voice of user |
CN107863095A (en) * | 2017-11-21 | 2018-03-30 | 广州酷狗计算机科技有限公司 | Acoustic signal processing method, device and storage medium |
CN108492832A (en) * | 2018-03-21 | 2018-09-04 | 北京理工大学 | High quality sound transform method based on wavelet transformation |
CN108682413B (en) * | 2018-04-24 | 2020-09-29 | 上海师范大学 | Emotion persuasion system based on voice conversion |
CN108831437B (en) * | 2018-06-15 | 2020-09-01 | 百度在线网络技术(北京)有限公司 | Singing voice generation method, singing voice generation device, terminal and storage medium |
CN110661760A (en) * | 2018-06-29 | 2020-01-07 | 视联动力信息技术股份有限公司 | Data processing method and device |
CN109192218B (en) * | 2018-09-13 | 2021-05-07 | 广州酷狗计算机科技有限公司 | Method and apparatus for audio processing |
CN109410973B (en) * | 2018-11-07 | 2021-11-16 | 北京达佳互联信息技术有限公司 | Sound changing processing method, device and computer readable storage medium |
CN111383646B (en) * | 2018-12-28 | 2020-12-08 | 广州市百果园信息技术有限公司 | Voice signal transformation method, device, equipment and storage medium |
CN109859327A (en) * | 2019-02-20 | 2019-06-07 | 中山市嘉游动漫科技有限公司 | A kind of virtual cartoon scene construction method and device with reality of combination |
CN109920446B (en) * | 2019-03-12 | 2021-03-26 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio data processing method and device and computer storage medium |
CN113066472A (en) * | 2019-12-13 | 2021-07-02 | 科大讯飞股份有限公司 | Synthetic speech processing method and related device |
CN111833843B (en) * | 2020-07-21 | 2022-05-10 | 思必驰科技股份有限公司 | Speech synthesis method and system |
CN111816198A (en) * | 2020-08-05 | 2020-10-23 | 上海影卓信息科技有限公司 | Voice changing method and system for changing voice tone and tone color |
CN112309425A (en) * | 2020-10-14 | 2021-02-02 | 浙江大华技术股份有限公司 | Sound tone changing method, electronic equipment and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1118493A (en) * | 1994-08-01 | 1996-03-13 | 中国科学院声学研究所 | Language and speech converting system with synchronous fundamental tone waves |
WO2003071523A1 (en) * | 2002-02-19 | 2003-08-28 | Qualcomm, Incorporated | Speech converter utilizing preprogrammed voice profiles |
Non-Patent Citations (1)
Title |
---|
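Application of linear predictive coding in variable-duration speech synthesis. Liang Zhiqiang, Li Haizhou. Journal of South China University of Technology (Natural Science Edition), Vol. 26, No. 3. 1998 * |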
Also Published As
Publication number | Publication date |
---|---|
CN1719514A (en) | 2006-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100440314C (en) | High quality real time sound changing method based on speech sound analysis and synthesis | |
Watanabe | Formant estimation method using inverse-filter control | |
Cook | Real sound synthesis for interactive applications | |
US10008193B1 (en) | Method and system for speech-to-singing voice conversion | |
CN106971703A (en) | A kind of song synthetic method and device based on HMM | |
CN108417228A (en) | Voice tone color method for measuring similarity under instrument tamber migration | |
CN102419981B (en) | Zooming method and device for time scale and frequency scale of audio signal | |
JPH11513820A (en) | Control structure for speech synthesis | |
CN108766409A (en) | A kind of opera synthetic method, device and computer readable storage medium | |
WO2019107379A1 (en) | Audio synthesizing method, audio synthesizing device, and program | |
CN103258539A (en) | Method and device for transforming voice signal characteristics | |
CN106997765A (en) | The quantitatively characterizing method of voice tone color | |
New et al. | Voice conversion: From spoken vowels to singing vowels | |
Ardaillon | Synthesis and expressive transformation of singing voice | |
Bonada et al. | Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models | |
Yim et al. | Computationally efficient algorithm for time scale modification (GLS-TSM) | |
Tachibana et al. | A real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion techniques | |
Bonada et al. | Generation of growl-type voice qualities by spectral morphing | |
Yu et al. | Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis | |
Guillemain | Some roles of the vocal tract in clarinet breath attacks: Natural sounds analysis and model-based synthesis | |
US11495200B2 (en) | Real-time speech to singing conversion | |
CN109712634A (en) | A kind of automatic sound conversion method | |
WO2024087727A1 (en) | Voice data processing method based on in-vehicle voice ai, and related device | |
Shimizu et al. | Comparative evaluation of neural vocoders for speech synthesis of operatic singing | |
O'Reilly Regueiro | Evaluation of interpolation strategies for the morphing of musical sound objects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20081203; Termination date: 20210706 |