CN100440314C - High quality real time sound changing method based on speech sound analysis and synthesis - Google Patents

High quality real time sound changing method based on speech sound analysis and synthesis

Info

Publication number
CN100440314C
Authority
CN
China
Prior art keywords
spectrum
frequency
time
resonance peak
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004100623371A
Other languages
Chinese (zh)
Other versions
CN1719514A (en)
Inventor
孟猛
张树武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNB2004100623371A
Publication of CN1719514A
Application granted
Publication of CN100440314C
Expired - Fee Related
Anticipated expiration

Abstract

The present invention relates to a high-quality real-time voice changing method based on speech analysis and synthesis, belonging to the field of voice conversion technology. The signal is interpolated or decimated in the time domain according to the required change in duration. It is then transformed to the frequency domain, where the amplitude spectrum and the phase spectrum are processed separately so that the fundamental frequency and the formants are separated and adjusted independently. During this adjustment, the effect of the duration change on the fundamental frequency and the formants is compensated, and finally the time-domain signal is restored. The time-domain signal is converted to the frequency domain by the fast Fourier transform, the fundamental frequency and the formant positions of the speech are separated and adjusted individually, and the speech is resynthesized. In this way the duration, pitch and timbre are adjusted, the loudness can be changed, and voice conversion is achieved. The method processes speech in real time and can be used directly in entertainment applications such as Internet telephony and voice chat rooms, as well as in practical applications such as dubbing and music synthesis. It can also be used in speech synthesis to improve the overall quality of the synthesized speech.

Description

High-quality real-time voice changing method based on speech analysis and synthesis
Technical field
The present invention relates to the field of voice conversion technology, and in particular to a high-quality real-time voice changing method based on speech analysis and synthesis.
Background art
Voice conversion technology changes acoustic features of speech such as pitch and speaking rate, producing new features that meet a user's intent. It has wide practical application, for example in dubbing, music synthesis, Internet chat and voice secrecy. The technology broadens the scope of speech-processing research and diversifies its applications.
The basic physical features of speech are pitch, loudness, timbre and duration. Pitch is determined by the vibration frequency of the sounding body: the higher the frequency, the higher the sound. The vocal cords of women and children, for example, are relatively short and thin and vibrate at a high frequency during speech and singing, while those of men and the elderly are longer and thicker and vibrate at a lower frequency, so the voices of men and the elderly sound deeper than those of women and children. Pitch can be changed by changing the fundamental frequency. Loudness corresponds to the strength of the sound and is determined by its amplitude, that is, by the magnitude of the vibration. Timbre (tone quality) is the essential character of a sound. It depends on the form of the acoustic vibration and is the most basic feature by which different sounds are told apart: a voice, a piano and a violin performing the same tune each sound different. Formants reflect the prominent harmonic components of a sound, so the height, position and number of formants are considered to affect timbre. Duration is the length of the sound, determined by how long the sounding body vibrates.
As basic elements of sound, pitch, loudness, timbre and duration do not exist independently of one another. In general, changing only one of them changes the others as well. For example, changing the sampling frequency at which a digital audio signal is played back changes the speaking rate, that is, the duration, but the fundamental frequency and the formant positions of the speech change at the same time. What the listener hears is therefore not only a change in speaking rate: timbre and pitch change too, and the speaker's characteristics are altered beyond recognition. As another example, if only the fundamental frequency is scaled proportionally, then after resynthesis the formant positions have moved along with the fundamental frequency, and the timbre changes as well. These problems need to be solved in voice conversion technology.
The present invention clarifies the relationship among these four factors and, by means of separation and compensation, adjusts pitch, loudness, timbre and duration independently. Speaker characteristics such as timbre, pitch and speaking rate can therefore be adjusted flexibly, giving a high-quality simulation of several speaker identities (elderly person, child, adult male, girl, etc.).
Summary of the invention
The object of the present invention is to provide a high-quality real-time voice changing method based on speech analysis and synthesis.
The method is based on an understanding of the physical attributes of speech. By studying how changes in each attribute affect the speech, a digital signal processing method for changing the speaker-identity features of speech is obtained. The invention rests on time-frequency analysis of digital signals. The length of the speech is changed by interpolation and decimation in the time domain; the time-domain signal is transformed to the frequency domain by the short-time Fourier transform; and the phase spectrum, the amplitude spectrum and the spectral envelope of the amplitude spectrum are adjusted so that the fundamental frequency and the formant positions of the speech are separated and can be adjusted individually. Finally, the modified features are resynthesized into a speech signal, which changes the characteristics of the voice and achieves the voice change. The invention adjusts fundamental frequency, formant position, duration and loudness independently, so that speaker characteristics such as timbre, pitch and speaking rate can be adjusted flexibly, giving a high-quality simulation of speakers of different sex and age (elderly person, child, adult male, girl, etc.).
A high-quality real-time voice changing method based on speech analysis and synthesis, using Fourier analysis and synthesis, comprises the following steps: interpolating or decimating the signal in the time domain according to the required change in duration; transforming it to the frequency domain; processing the amplitude spectrum and the phase spectrum separately to separate the fundamental frequency and the formants and adjust them independently, while compensating for the effect of the duration adjustment on both; and finally restoring the time-domain signal. The method is fast and gives high-quality results, so it meets the requirements of real-time operation and practicality at the same time.
The fundamental frequency and the formant positions of the signal are adjusted independently in the frequency domain. Because the two are separated, the fundamental frequency and its harmonics can be changed while the formant positions are either kept or adjusted at will, so that timbre and pitch are changed independently.
The duration of the speech signal is changed directly in the time domain. The digital signal is resampled by interpolation or decimation, which stretches or compresses the time scale of the speech; the resulting changes in fundamental frequency and formant position are then compensated, so that only the speaking rate changes.
The energy of the signal is accumulated and the energy ratio of the input and output signals is adjusted in real time, so that the loudness of the output signal can be changed flexibly.
The amplitude spectrum and the phase spectrum are adjusted separately. The spectral envelope of the amplitude spectrum is extracted, and on this basis the shape of the spectral envelope of the new amplitude spectrum, obtained after the fundamental-frequency adjustment, is changed, so that the formant positions can be adjusted at will without affecting the fundamental frequency.
The present invention is based on speech analysis and synthesis, as shown in Fig. 2.
A speech signal is regarded as a short-time stationary signal, so it can be transformed to the frequency domain for analysis and processing by the short-time Fourier transform. The time window must not be too short: it should usually contain several fundamental periods. At the same time, because of the short-time stationarity constraint, it must not be too long, so that the physical characteristics do not change appreciably within one frame. For speech, the fundamental frequency of a male voice is relatively low, usually about 125 Hz, giving a fundamental period of about 8 ms; the window length is therefore usually chosen between about 24 ms and 32 ms. In digital signal processing, the length of the window function is the number of data samples in a frame, and it depends on the sampling rate of the speech signal.
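As an illustration of the window-length reasoning above, the following minimal sketch converts a window duration into a sample count and checks how many fundamental periods the window spans. The 16 kHz sampling rate and the function names are assumptions made for this example; they are not taken from the method itself.

```python
def frame_length_samples(sample_rate_hz, window_ms):
    """Number of samples in an analysis window of the given duration."""
    return int(round(sample_rate_hz * window_ms / 1000.0))


def pitch_periods_in_window(f0_hz, window_ms):
    """How many fundamental periods fit into the window."""
    return f0_hz * window_ms / 1000.0


# Assumed example values: 16 kHz sampling, 32 ms window, 125 Hz male voice.
n_samples = frame_length_samples(16000, 32)
n_periods = pitch_periods_in_window(125, 32)
print(n_samples, n_periods)
```

A power-of-two sample count such as the one obtained here is convenient for the FFT, which is one practical reason to pick the window length together with the sampling rate.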
Performing the short-time Fourier transform is equivalent to windowing the frame of speech, extending it periodically along the time axis, and computing the Fourier series of the resulting signal. The periodically extended frame is thus expressed as a superposition of complex sinusoids, and the Fourier coefficients obtained from the transform are the amplitudes of these complex sinusoidal components. If the frequency of every component is multiplied by the same scale factor p, then the fundamental frequency and the harmonic frequencies of the time-domain speech signal resynthesized by the inverse Fourier transform are also multiplied by p, which changes the fundamental frequency of the original signal.
In practice, the short-time Fourier transform is implemented by windowing and the fast Fourier transform (FFT). After the transform, in order to adjust the frequency of each component and resynthesize the time-domain signal with the inverse fast Fourier transform (IFFT), the Fourier coefficients are first converted from rectangular to polar coordinates, giving the amplitude spectrum and the phase spectrum. This makes it convenient to separate the fundamental frequency from the formant positions, and it also allows the following equivalent implementation: changing the frequency of a complex sinusoidal component from f_1 to p*f_1 is equivalent to replacing, at a fixed frequency f_2, the amplitude and phase of that component with the amplitude and phase of the original component at frequency f_2/p. The synthesis can then be carried out directly with the IFFT.
For the amplitude spectrum, the above processing only requires interpolating or decimating the original amplitude spectrum by the given ratio. The phase spectrum, however, must first be unwrapped, as shown in Fig. 3. If, in a given frame, the frequency of a complex sinusoidal component is adjusted from f_1 to p*f_1, the phase change of that component within the frame also becomes p times the original change, and this phase change accumulates frame by frame into the initial phase of the next frame. To adjust the phase spectrum accordingly, the phase difference of the unwrapped phase spectrum between two adjacent frames (that is, the phase change within the previous frame) is scaled to p times its original value, and the initial phase obtained by accumulation then also becomes p times the original.
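The equivalent implementation for the amplitude spectrum described above (the value at a fixed frequency bin is taken from the original spectrum at that frequency divided by p) can be sketched as follows. This is a minimal illustration that assumes linear interpolation between bins via NumPy's `np.interp`; the function name `scale_amplitude_spectrum` is hypothetical, and the text does not say which interpolation rule is used.

```python
import numpy as np


def scale_amplitude_spectrum(amplitude, p):
    """Return a_new with a_new[n] = amplitude[n / p] (linear interpolation).

    Bins that would read beyond the end of the original spectrum are set to 0.
    """
    bins = np.arange(len(amplitude))
    return np.interp(bins / p, bins, amplitude, right=0.0)


# A single spectral peak at bin 10 moves to bin 20 when p = 2.
spectrum = np.zeros(64)
spectrum[10] = 1.0
raised = scale_amplitude_spectrum(spectrum, 2.0)
print(int(np.argmax(raised)))
```

Because every bin is scaled by the same factor, all harmonics of the fundamental move together, which is what changes the pitch.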
Method of unwrapping the phase spectrum:
Suppose the time offset between two frames is $t_w$. For the complex sinusoidal component of frequency $f_k$, the theoretical value of the phase change between frame $t$ ($t > 1$) and the previous frame is
$$\Delta\Phi_k^{(t)} = 2\pi \cdot f_k \cdot t_w.$$
The actual initial phase difference between the two frames is
$$\Delta\theta_k^{(t)} = \theta_k^{(t)} - \theta_k^{(t-1)}.$$
Define
$$\Delta\varphi_k^{(t)} = \bigl(\Delta\theta_k^{(t)} - \Delta\Phi_k^{(t)}\bigr)\ \mathrm{MOD}\ 2\pi + \Delta\Phi_k^{(t)},$$
where $\mathrm{MOD}\ 2\pi$ denotes reduction modulo $2\pi$.
Then $\Delta\varphi_k^{(t)}$ is the unwrapped phase change between the two adjacent frames at frame $t$. Accumulating these changes gives the unwrapped initial phase at frame $t$:
$$\tilde{\theta}_k^{(t)} = \tilde{\theta}_k^{(t-1)} + \Delta\varphi_k^{(t)}, \qquad \tilde{\theta}_k^{(1)} = \theta_k^{(1)}.$$
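The unwrapping formulas above can be illustrated with a short sketch. It assumes that the MOD 2π operation maps the deviation to the interval [-π, π), which is the usual phase-vocoder convention but is not stated in the text; the function names are hypothetical.

```python
import math


def principal(x):
    """Map an angle to [-pi, pi) (assumed meaning of MOD 2*pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def unwrapped_increment(theta_now, theta_prev, f_k, t_w):
    """Unwrapped phase change of one component between two adjacent frames."""
    expected = 2.0 * math.pi * f_k * t_w   # theoretical change for frequency f_k
    measured = theta_now - theta_prev      # difference of the wrapped phases
    return principal(measured - expected) + expected


def accumulate(theta_first, increments, p=1.0):
    """Accumulate the increments (each scaled by p), starting at theta_first."""
    phases = [theta_first]
    for delta in increments:
        phases.append(phases[-1] + p * delta)
    return phases


# A component analysed at f_k = 100 Hz whose true frequency is 103 Hz,
# with a 10 ms frame offset: the increment recovers the true phase advance.
true_advance = 2.0 * math.pi * 103.0 * 0.01
theta_prev = 0.3
theta_now = principal(theta_prev + true_advance)
print(unwrapped_increment(theta_now, theta_prev, 100.0, 0.01), true_advance)
```

Scaling each increment by p before accumulation, as `accumulate` does, corresponds to the phase adjustment described in the text for a frequency change from f_1 to p*f_1.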
As stated above, changing the fundamental frequency only requires interpolating or decimating the original amplitude spectrum by the given ratio. Doing so, however, also moves the formant positions by the same ratio. Another method is therefore needed to adjust the formants without affecting the fundamental frequency. The method used here achieves this by extracting the spectral envelope of the amplitude spectrum.
In the formulas below, $e(n)$ is the spectral envelope of the original amplitude spectrum before adjustment. After the fundamental frequency has been raised by a factor of $p$ with the processing described above, the spectral envelope becomes $\hat{e}(n)$, with
$$\hat{e}(n) = e\!\left(\frac{n}{p}\right).$$
Let $\hat{a}(n)$ be the amplitude spectrum after adjustment by interpolation, and $\tilde{a}(n)$ the amplitude spectrum after the formants have been compensated. Then
$$\tilde{a}(n) = \frac{e(n)}{\hat{e}(n)}\,\hat{a}(n) = \frac{e(n)}{e(n/p)}\,\hat{a}(n).$$
The compensated amplitude spectrum $\tilde{a}(n)$ obtained in this way retains the spectral envelope $e(n)$ of the original amplitude spectrum, so the original formant positions do not move, and the adjustment of the fundamental frequency is not affected. By the same reasoning, $e(n)$ in the formula above can be replaced by a new spectral envelope in which the formants have been adjusted, which changes the formant positions.
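The compensation formula above can be checked numerically. The sketch below assumes that the envelope e(n) is already available as an array and that e(n/p) is evaluated by linear interpolation; the function name and the synthetic test spectra are made up for the illustration.

```python
import numpy as np


def compensate_formants(a_hat, envelope, p):
    """Apply a_tilde(n) = e(n) / e(n / p) * a_hat(n).

    a_hat    : amplitude spectrum after the pitch scaling by p
    envelope : spectral envelope e(n) of the ORIGINAL amplitude spectrum
    """
    bins = np.arange(len(envelope))
    e_hat = np.interp(bins / p, bins, envelope)       # shifted envelope e(n / p)
    return a_hat * envelope / np.maximum(e_hat, 1e-12)


# Synthetic spectrum = smooth envelope * harmonic fine structure.
bins = np.arange(128)
envelope = 0.1 + np.exp(-0.5 * ((bins - 40) / 6.0) ** 2)
fine = 1.0 + 0.9 * np.cos(2.0 * np.pi * bins / 8.0)
p = 0.5                                               # lower the pitch by one octave
a_hat = np.interp(bins / p, bins, envelope * fine)    # pitch-scaled spectrum
a_tilde = compensate_formants(a_hat, envelope, p)
fine_scaled = np.interp(bins / p, bins, fine)
print(np.allclose(a_tilde, envelope * fine_scaled))
```

The result has the moved harmonic structure under the original envelope, which is the behaviour the text describes: the pitch changes while the formant positions stay where they were.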
Common methods for obtaining the spectral envelope include linear predictive coding (LPC), cepstral analysis, low-pass filtering, the discrete cepstrum, and interpolation between local peak points. To meet the real-time requirement, the chosen method must have low complexity while still giving good results. This example uses an improved cepstral analysis method. Experiments show that the method is stable, is suitable for many types of voice, and meets practical requirements in both quality and computational cost.
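For orientation, a plain cepstral-smoothing envelope estimator is sketched below. This is the textbook form of cepstral analysis, not the improved variant mentioned above, whose details are not given in the text; the function name and the default number of retained cepstral coefficients are assumptions.

```python
import numpy as np


def cepstral_envelope(magnitude, n_coeffs=30):
    """Smooth a magnitude spectrum by keeping only low-quefrency cepstral terms.

    magnitude : magnitudes of np.fft.rfft output (length N/2 + 1, N even)
    n_coeffs  : number of low-quefrency coefficients to keep
    """
    log_mag = np.log(np.maximum(magnitude, 1e-10))
    cepstrum = np.fft.irfft(log_mag)            # real, symmetric, length N
    lifter = np.zeros_like(cepstrum)
    lifter[:n_coeffs] = 1.0                     # keep quefrencies 0 .. n_coeffs-1
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0          # and their mirrored counterparts
    smooth_log = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smooth_log)


# Example on a windowed synthetic frame (values chosen only for illustration).
rate, n = 16000, 512
time = np.arange(n) / rate
frame = sum(np.sin(2 * np.pi * 125 * k * time) / k for k in range(1, 20))
magnitude = np.abs(np.fft.rfft(frame * np.hanning(n)))
envelope = cepstral_envelope(magnitude, n_coeffs=24)
print(envelope.shape)
```

Keeping fewer coefficients gives a smoother envelope; keeping too many lets the harmonic ripple leak back in, so the cut-off is a tuning choice.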
The above method changes the fundamental frequency and the formant positions independently.
On this basis, independent adjustment of the duration also becomes easy to implement.
As is known, changing the sampling frequency at which a digital audio signal is played back changes the speaking rate of the speech, that is, its duration. The speech data can therefore first be interpolated or decimated in the time domain and then played at the original sampling rate, which slows down or speeds up the speaking rate. At the same time, however, the fundamental frequency and the formant positions of the speech also change. If the time-domain signal is interpolated by a scale factor t, the fundamental period becomes t times the original, the fundamental frequency becomes 1/t of the original, and the formant positions likewise change by the ratio 1/t.
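The effect described above is easy to reproduce. The following sketch stretches a pure tone by a factor t using linear interpolation (an assumed interpolation rule) and measures the dominant frequency before and after; the helper names and the 8 kHz rate are illustrative choices.

```python
import numpy as np


def stretch(signal, t):
    """Resample so that the result has about t times as many samples."""
    n_out = int(round(len(signal) * t))
    positions = np.arange(n_out) / t
    return np.interp(positions, np.arange(len(signal)), signal)


def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sample_rate / len(signal)


rate = 8000
tone = np.sin(2 * np.pi * 200 * np.arange(rate) / rate)   # 1 s of 200 Hz
slow = stretch(tone, 2.0)                                  # twice as long
print(dominant_frequency(tone, rate), dominant_frequency(slow, rate))
```

Played back at the original rate, the stretched tone lasts twice as long and sounds an octave lower, which is why the method goes on to compensate the fundamental frequency and the formants by the same factor t.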
Given the method described above for changing the fundamental frequency and the formant positions independently, compensating both the fundamental frequency and the formant positions by the scale factor t at this point changes only the duration.
As the discussion above shows, the three physical characteristics are adjusted in the order duration, fundamental frequency, formant position. Suppose their scale factors are t, p and f respectively. If the three features were adjusted one after another, the procedure would be as follows. First the duration is adjusted by the factor t, and the fundamental frequency and the formant positions are compensated by the factor t. Next the fundamental frequency is adjusted by the factor p, and the formant positions are compensated by the factor 1/p. Finally the formant positions are adjusted by the factor f. This is equivalent to first adjusting the duration by the factor t, then adjusting the fundamental frequency by the factor p*t, and finally adjusting the formant positions at that point by the factor
$$\frac{f}{p},$$
so that the three physical characteristics are adjusted independently by t, p and f. In practice the formant adjustment can be simplified: the formants only need to be adjusted by f*t relative to their initial positions, as shown in Fig. 1.
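The bookkeeping of the three factors can be verified with a few lines of arithmetic. In this sketch the formant factor applied after the pitch step is taken to be f/p, which is what the surrounding reasoning implies; the function names are hypothetical, and positions are tracked relative to the original signal (1.0 means unchanged).

```python
import math


def combined_factors(t, p, f):
    """Factors actually applied at each stage for user-level factors t, p, f."""
    return {
        "duration": t,
        "fundamental": p * t,
        "formant_after_pitch_step": f / p,     # derived from the text, see lead-in
        "envelope_on_stretched_frame": f * t,  # simplified form used in Fig. 1
    }


def simulate(t, p, f):
    """Track fundamental frequency and formant position through the stages."""
    c = combined_factors(t, p, f)
    f0 = formant = 1.0 / c["duration"]         # stretching by t divides both by t
    f0 *= c["fundamental"]                     # the pitch step moves the harmonics
    formant *= c["fundamental"]                # and drags the formants along
    formant *= c["formant_after_pitch_step"]   # final formant correction
    return f0, formant


f0, formant = simulate(t=1.25, p=1.6, f=1.2)
print(round(f0, 6), round(formant, 6))
```

The simplified form gives the same formant result: scaling the envelope of the stretched frame, whose formants sit at 1/t, by f*t also lands on f.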
The adjustment of three kinds of physical characteristicss all is the interpolation by sample point and smokes and cut realization, in order to guarantee change of voice effect preferably, and under the prerequisite that satisfies voice conversion requirement, each scale factor is limited between 0.5~2.Experimental result shows, the adjustment of in this scope, making, and major part can both obtain gratifying effect.Be noted that the compensation of when adjusting resonance peak position and fundamental frequency the duration of a sound being adjusted simultaneously, this two resize ratio is become very big (substantially exceeding 2 times), cause many information lose or fuzzy.Therefore, when the resize ratio of resonance peak position or fundamental frequency is big, should not do big adjustment to the duration of a sound simultaneously.
Loudness is adjusted as follows. Let $\Delta E_{i,n}$ and $\Delta E_{o,n}$ denote the energy of the $n$-th frame at the input (before spectral analysis, after the duration adjustment) and at the output (after the fundamental frequency and formants have been adjusted and the time-domain signal has been resynthesized), and let $E_{i,n}$ and $E_{o,n}$ denote the total energy of the input signal and of the output signal up to the $n$-th frame. Then
$$E_{i,n} = E_{i,n-1} + \Delta E_{i,n},$$
$$E_{o,n} = E_{o,n-1} + \Delta E_{o,n}.$$
Each data point $D_{n,k}$ of the $n$-th output frame is then adjusted to $\hat{D}_{n,k}$ according to
$$\hat{D}_{n,k} = D_{n,k} \cdot \frac{E_{i,n}}{E_{o,n}}.$$
This formula keeps the energy of the signal after the voice conversion essentially the same as the energy before it, so the loudness remains unchanged. If the loudness is to be adjusted by some ratio, $\hat{D}_{n,k}$ only needs to be scaled further by that ratio.
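A running-energy gain of this kind might be sketched as follows. The sketch follows the formula as printed, scaling each sample by the ratio of accumulated energies. Since a gain that equalizes energy exactly would be the square root of that ratio, the exponent is exposed as a parameter; that parameter, like the function name, is an assumption of this illustration and is not stated in the text.

```python
def normalize_loudness(input_frames, output_frames, gain_exponent=1.0, extra_gain=1.0):
    """Scale each output frame by (E_in / E_out) ** gain_exponent * extra_gain.

    E_in and E_out are energies accumulated over all frames seen so far.
    gain_exponent=1.0 follows the printed formula; 0.5 equalizes the energy.
    """
    e_in = 0.0
    e_out = 0.0
    result = []
    for frame_in, frame_out in zip(input_frames, output_frames):
        e_in += sum(v * v for v in frame_in)      # E_i,n = E_i,n-1 + dE_i,n
        e_out += sum(v * v for v in frame_out)    # E_o,n = E_o,n-1 + dE_o,n
        ratio = e_in / e_out if e_out > 0.0 else 1.0
        gain = (ratio ** gain_exponent) * extra_gain
        result.append([v * gain for v in frame_out])
    return result


frames_in = [[1.0, 1.0]]
frames_out = [[2.0, 2.0]]
print(normalize_loudness(frames_in, frames_out))
print(normalize_loudness(frames_in, frames_out, gain_exponent=0.5))
```

The `extra_gain` argument corresponds to the further loudness ratio mentioned in the text.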
Description of the drawings
Fig. 1 is a detailed flowchart of the duration, fundamental frequency and formant position adjustment of the present invention;
Fig. 2 is a schematic flowchart of the signal analysis and synthesis steps of the present invention;
Fig. 3 is a schematic diagram of phase unwrapping in the present invention.
Detailed description
The steps of the duration, fundamental frequency and formant position adjustment of Fig. 1 are as follows:
Step S1-1: in the time domain, interpolate or decimate the data points of a frame according to the adjustment factor t;
Step S1-2: transform to the frequency domain and convert from rectangular to polar coordinates, obtaining phase spectrum I and amplitude spectrum II;
Step S1-3: extract the envelope of amplitude spectrum II to obtain envelope spectrum III, and scale III along the frequency axis by the adjustment factor t*f to obtain envelope spectrum IV with adjusted formant positions;
Step S1-4: divide amplitude spectrum II point by point by envelope spectrum III to obtain spectrum V, scale spectrum V along the frequency axis by the adjustment factor t*p, and then multiply it point by point by the adjusted envelope spectrum IV to obtain the adjusted amplitude spectrum VII;
Step S1-5: unwrap phase spectrum I using the phase difference with the adjacent frame to obtain the actual phase change at each frequency between the two frames, multiply this value by the adjustment factor t*p, scale the frequency axis by the adjustment factor t*p, and accumulate the adjusted phase differences to obtain the adjusted phase spectrum VIII of the current frame;
Step S1-6: convert the adjusted amplitude spectrum VII and phase spectrum VIII to rectangular coordinates and transform back to the time domain.
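The amplitude path of these steps (S1-3 and S1-4) can be sketched as below. The sketch assumes the envelope has already been extracted by some method, uses linear interpolation for scaling along the frequency axis, and omits the phase path of step S1-5; all function names are hypothetical.

```python
import numpy as np


def scale_axis(spectrum, factor):
    """Return s_new with s_new[n] = spectrum[n / factor] (linear interpolation)."""
    bins = np.arange(len(spectrum))
    return np.interp(bins / factor, bins, spectrum)


def adjust_amplitude(amplitude, envelope, t, p, f):
    """Amplitude spectrum II and envelope III -> adjusted amplitude spectrum VII."""
    fine = amplitude / np.maximum(envelope, 1e-12)   # spectrum V (envelope removed)
    fine_scaled = scale_axis(fine, t * p)            # move the harmonics by t*p
    envelope_scaled = scale_axis(envelope, t * f)    # envelope IV, formants by t*f
    return fine_scaled * envelope_scaled             # spectrum VII


# With a flat fine structure, only the envelope peak moves: bin 20 -> bin 40.
bins = np.arange(128)
envelope = 0.1 + np.exp(-0.5 * ((bins - 20) / 3.0) ** 2)
adjusted = adjust_amplitude(envelope.copy(), envelope, t=1.0, p=1.0, f=2.0)
print(int(np.argmax(adjusted)))
```

Handling the fine structure and the envelope separately is what allows p and f to be chosen independently of each other.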
The speech signal analysis and synthesis of Fig. 2 proceeds as follows:
Step S2-1: process the signal in the time domain, including splicing, framing, interpolation and windowing;
Step S2-2: transform each frame obtained in the time domain to the frequency domain by the time-frequency transform, process the spectrum (including adjusting the fundamental frequency and the formants), and then return to the time domain by the inverse time-frequency transform;
Step S2-3: in the time domain, apply window-function compensation to each frame, window it again with the synthesis window function, and splice and add the frames to obtain the complete time-domain signal.
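A generic analysis/synthesis loop of this shape is sketched below. It assumes a Hann window for both analysis and synthesis, a hop of one quarter of the frame, and interprets window-function compensation as division by the accumulated product of the two windows; none of these choices is specified in the text, and the `process` hook stands in for the spectral adjustments of step S2-2.

```python
import numpy as np


def analysis_synthesis(signal, frame_len=512, hop=128, process=None):
    """Frame, window, FFT, process, IFFT, window again, overlap-add."""
    window = np.hanning(frame_len)
    output = np.zeros(len(signal))
    weight = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window      # analysis window
        spectrum = np.fft.rfft(frame)
        if process is not None:
            spectrum = process(spectrum)                       # spectral adjustments
        frame_out = np.fft.irfft(spectrum, n=frame_len) * window  # synthesis window
        output[start:start + frame_len] += frame_out
        weight[start:start + frame_len] += window ** 2
    valid = weight > 1e-8
    output[valid] /= weight[valid]                             # window compensation
    return output


# With no spectral processing the interior of the signal is reconstructed.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = analysis_synthesis(x)
print(np.allclose(x[512:-512], y[512:-512]))
```

Checking that the loop reproduces its input when `process` does nothing is a useful sanity test before adding any pitch or formant modification.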
The phase unwrapping of Fig. 3 is carried out as described in detail above.
To simulate male, female, child and elderly voices and to convert between them, the adjustment of each physical characteristic in the present invention is based on the following observations.
In ordinary speech, the fundamental frequency of a child's voice is generally considered the highest, reaching about 300 Hz; that of a female voice averages roughly 220 Hz, and that of a male voice averages about 125 Hz. From these values the approximate fundamental-frequency ratios among male, female and child voices can be obtained. In practice, a female-to-male fundamental-frequency ratio between 1.5 and 1.8 usually gives the best results, while the child-to-male ratio must be above 1.8. To simulate an elderly voice, the fundamental frequency usually has to be lowered to a ratio of about 0.6 to 0.9 to obtain a realistic result.
As for the formants, those of male, female and child voices roughly follow the simple ratio 6:7:8. In reality, the ratio between male, female and child voices is not the same for every formant: the lower-frequency formants usually differ more, and the higher-frequency ones differ little. In ordinary applications this can be ignored. An elderly voice can be regarded as having a timbre closer to a male voice, so a formant adjustment ratio of less than 1 is chosen.
When converting between male and female voices, the speaking rate is generally considered unchanged, whereas for elderly and child voices the speaking rate can be slowed slightly, in keeping with reality.
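The ratios quoted above translate into adjustment factors by simple arithmetic, as the following sketch shows. The table layout and the function names are illustrative; the numbers are the ones given in the text, and the check against the range 0.5 to 2 uses the limit stated earlier for each scale factor.

```python
# Fundamental-frequency ratio ranges relative to a male source voice.
PITCH_RATIO_FROM_MALE = {
    "female": (1.5, 1.8),
    "child": (1.8, None),      # "above 1.8"; no upper bound is stated
    "elderly": (0.6, 0.9),
}

# Rough formant ratio male : female : child = 6 : 7 : 8.
FORMANT_UNITS = {"male": 6, "female": 7, "child": 8}


def formant_factor(source, target):
    """Formant scale factor for converting a source voice to a target voice."""
    return FORMANT_UNITS[target] / FORMANT_UNITS[source]


def within_limits(factor, low=0.5, high=2.0):
    """Check a factor against the recommended range for each scale factor."""
    return low <= factor <= high


print(formant_factor("male", "female"), formant_factor("male", "child"))
print(125 * 1.5, 125 * 1.8)   # female range implied by a 125 Hz male voice
```

For a 125 Hz male voice the female range works out to 187.5 Hz to 225 Hz, which brackets the average of roughly 220 Hz quoted for a female voice.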
Example
Following the method proposed by the present invention, a demonstration program was implemented on a PC platform that can record, play back the original sound, and process the sound in real time to play a simulated male, female, elderly or child voice. For each buffer in the playback buffer queue, the program first preprocesses the buffer and then adjusts the duration, fundamental frequency and formants of the speech data in that buffer by predetermined ratios, simulating male, female, elderly and child voices respectively. The ratios for these three features can also be set manually to obtain a more satisfactory simulation. The program runs in real time: adjustments take effect immediately during playback.
On a test platform with a P4 2.4 GHz CPU and 256 MB of memory, CPU usage was 2% when idle, about 10% when playing the original sound, and about 22% during real-time voice-changing playback. The processor requirements of this voice changing method are well within an acceptable range, and the sound quality achieved is satisfactory.

Claims (5)

1. A real-time voice changing method based on speech analysis and synthesis, using Fourier analysis and synthesis, characterized by comprising the following steps: interpolating or decimating the signal in the time domain according to the required change in duration; transforming it to the frequency domain; processing the amplitude spectrum and the phase spectrum separately to separate the fundamental frequency and the formants and adjust them independently, while compensating for the effect of the duration adjustment on both; and finally restoring the time-domain signal; wherein the fundamental frequency and the formant positions are adjusted as follows:
Step S1-1: in the time domain, interpolate or decimate the data points of a frame according to the adjustment factor t;
Step S1-2: transform to the frequency domain and convert from rectangular to polar coordinates, obtaining phase spectrum I and amplitude spectrum II;
Step S1-3: extract the envelope of amplitude spectrum II to obtain envelope spectrum III, and scale III along the frequency axis by the adjustment factor t*f to obtain envelope spectrum IV with adjusted formant positions, where f denotes the formant position adjustment factor;
Step S1-4: divide amplitude spectrum II point by point by envelope spectrum III to obtain spectrum V, scale spectrum V along the frequency axis by the adjustment factor t*p, and then multiply it point by point by the adjusted envelope spectrum IV to obtain the adjusted amplitude spectrum VII, where p denotes the fundamental frequency adjustment factor;
Step S1-5: unwrap phase spectrum I using the phase difference with the adjacent frame to obtain the actual phase change at each frequency between the two frames, multiply this value by the adjustment factor t*p, scale the frequency axis by the adjustment factor t*p, and accumulate the adjusted phase differences to obtain the adjusted phase spectrum VIII of the current frame;
Step S1-6: convert the adjusted amplitude spectrum VII and phase spectrum VIII to rectangular coordinates and transform back to the time domain.
2. The real-time voice changing method based on speech analysis and synthesis according to claim 1, characterized in that the fundamental frequency and the formant positions of the signal are adjusted independently in the frequency domain; the fundamental frequency and the formant positions are separated, so that the fundamental frequency and its harmonics of the speech signal can be changed while the formant positions are either kept or adjusted at will, whereby timbre and pitch are changed independently.
3. The real-time voice changing method based on speech analysis and synthesis according to claim 2, characterized in that the duration of the speech signal is changed directly in the time domain; the digital signal is resampled by interpolation or decimation so as to stretch or compress the time scale of the speech, and the resulting changes in fundamental frequency and formant position are compensated, so that only the speaking rate is changed.
4. The real-time voice changing method based on speech analysis and synthesis according to claim 1, characterized in that the spectral envelope of the amplitude spectrum is extracted and, on this basis, the shape of the spectral envelope of the new amplitude spectrum obtained after the fundamental-frequency adjustment is changed, so that the formant positions are adjusted at will without affecting the fundamental frequency.
5. The real-time voice changing method based on speech analysis and synthesis according to claim 1 or 2, characterized in that the speech analysis and synthesis steps are as follows:
Step S2-1: the signal is processed in the time domain, including splitting into frames, interpolation, and windowing;
Step S2-2: each frame obtained in the time domain is transformed into the frequency domain by a time-frequency transform and processed on its spectrum, including adjustment of the fundamental frequency and the formants, and is then returned to the time domain by the inverse time-frequency transform;
Step S2-3: in the time domain, window-function compensation is applied to each frame, the frame is windowed with the synthesis window function, and the frames are then spliced and added together to obtain the complete time-domain signal.
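The duration change described in claim 3 can be illustrated with a minimal sketch. It assumes linear interpolation (the claim only says "interpolation or decimation"), and the names resample_linear and dominant_frequency, as well as the 200 Hz test tone, are hypothetical and not taken from the patent. The sketch shows why compensation is needed: stretching a signal by resampling lowers every frequency in it by the same factor, so the fundamental frequency and the formants must later be scaled back by that factor.

```python
import numpy as np

def resample_linear(x, factor):
    """Stretch (factor > 1) or shrink (factor < 1) a signal by linear interpolation."""
    x = np.asarray(x, dtype=float)
    n_out = max(1, int(round(len(x) * factor)))
    # Positions in the input signal that each output sample reads from.
    src = np.linspace(0.0, len(x) - 1, n_out)
    return np.interp(src, np.arange(len(x)), x)

def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the largest FFT magnitude bin, ignoring DC."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    spectrum[0] = 0.0
    return np.argmax(spectrum) * sample_rate / len(x)

sample_rate = 8000                            # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate      # one second of samples
tone = np.sin(2 * np.pi * 200.0 * t)          # 200 Hz stand-in for a voiced sound

factor = 1.25                                 # make the signal 25 % longer
stretched = resample_linear(tone, factor)

f_before = dominant_frequency(tone, sample_rate)
f_after = dominant_frequency(stretched, sample_rate)

# Played back at the same sample rate, all frequencies drop by `factor`,
# so the later frequency-domain stage has to scale them up by `factor`.
compensation = factor
print(f_before, f_after, f_after * compensation)
```

In this sketch the compensation is only computed, not applied; in the claimed method it would be applied in the frequency-domain stage together with the pitch and formant adjustments.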
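The envelope-based formant adjustment of claim 4 can be sketched as follows. The patent text here does not say how the spectral envelope is obtained, so this sketch assumes cepstral smoothing as one common choice; the functions spectral_envelope and shift_formants, the parameter n_coeffs, and the synthetic spectrum are all hypothetical. The idea is to split the amplitude spectrum into a smooth envelope (formants) and a fine structure (harmonics of the fundamental), warp only the envelope along the frequency axis, and recombine.

```python
import numpy as np

def spectral_envelope(magnitude, n_coeffs=30):
    """Smooth envelope of an rfft magnitude spectrum via cepstral liftering (assumed method)."""
    log_mag = np.log(np.maximum(magnitude, 1e-10))
    cepstrum = np.fft.irfft(log_mag)          # real cepstrum, length 2 * (len - 1)
    lifter = np.zeros(len(cepstrum))
    lifter[:n_coeffs] = 1.0                   # keep low quefrencies ...
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0        # ... and their mirror image
    return np.exp(np.fft.rfft(cepstrum * lifter).real)

def shift_formants(magnitude, ratio, n_coeffs=30):
    """Move envelope peaks to `ratio` times their frequency; harmonics stay in place."""
    envelope = spectral_envelope(magnitude, n_coeffs)
    fine = magnitude / envelope               # harmonic fine structure (carries the pitch)
    bins = np.arange(len(magnitude))
    warped = np.interp(bins / ratio, bins, envelope)
    return fine * warped

# Synthetic amplitude spectrum: harmonics every 12.8 bins, one formant at bin 128.
bins = np.arange(513)
harmonics = 1.1 + np.cos(2 * np.pi * bins / 12.8)
formant = np.exp(3.0 * np.exp(-0.5 * ((bins - 128) / 40.0) ** 2))
magnitude = harmonics * formant

shifted = shift_formants(magnitude, ratio=1.5)
print(np.argmax(magnitude), np.argmax(shifted))
```

Because only the envelope is warped, the harmonic spacing of the spectrum is unchanged, which corresponds to the claim's condition that the fundamental frequency is not affected.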
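Steps S2-1 to S2-3 of claim 5 follow the general shape of a windowed overlap-add analysis and synthesis loop, which can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the Hann windows, the frame length, the hop size, and the reading of "window-function compensation" as division by the accumulated window product are all assumptions, and the names analysis_synthesis and process are hypothetical. The interpolation mentioned in step S2-1 is left out.

```python
import numpy as np

def analysis_synthesis(signal, frame_len=512, hop=128, process=None):
    """Frame, window, FFT, optional spectral processing, inverse FFT, overlap-add."""
    signal = np.asarray(signal, dtype=float)
    analysis_win = np.hanning(frame_len)
    synthesis_win = np.hanning(frame_len)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        # S2-1: cut one frame and apply the analysis window.
        frame = signal[start:start + frame_len] * analysis_win
        # S2-2: go to the frequency domain and split amplitude and phase.
        spectrum = np.fft.rfft(frame)
        magnitude, phase = np.abs(spectrum), np.angle(spectrum)
        if process is not None:
            # Hook where pitch and formant adjustment would take place.
            magnitude, phase = process(magnitude, phase)
        frame = np.fft.irfft(magnitude * np.exp(1j * phase), n=frame_len)
        # S2-3: apply the synthesis window and add the frame into the output.
        out[start:start + frame_len] += frame * synthesis_win
        norm[start:start + frame_len] += analysis_win * synthesis_win
    # Window compensation: undo the accumulated gain of the two windows.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = analysis_synthesis(x)                                   # no spectral change
half = analysis_synthesis(x, process=lambda m, p: (0.5 * m, p))
print(np.max(np.abs(y[512:-512] - x[512:-512])))
```

With no spectral processing the loop reproduces the input away from the signal edges, which is a useful sanity check before any pitch or formant modification is plugged into the process hook.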
CNB2004100623371A 2004-07-06 2004-07-06 High quality real time sound changing method based on speech sound analysis and synthesis Expired - Fee Related CN100440314C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100623371A CN100440314C (en) 2004-07-06 2004-07-06 High quality real time sound changing method based on speech sound analysis and synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100623371A CN100440314C (en) 2004-07-06 2004-07-06 High quality real time sound changing method based on speech sound analysis and synthesis

Publications (2)

Publication Number Publication Date
CN1719514A CN1719514A (en) 2006-01-11
CN100440314C true CN100440314C (en) 2008-12-03

Family

ID=35931331

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100623371A Expired - Fee Related CN100440314C (en) 2004-07-06 2004-07-06 High quality real time sound changing method based on speech sound analysis and synthesis

Country Status (1)

Country Link
CN (1) CN100440314C (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2107556A1 (en) * 2008-04-04 2009-10-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transform coding using pitch correction
CN101304391A (en) * 2008-06-30 2008-11-12 腾讯科技(深圳)有限公司 Voice call method and system based on instant communication system
CN101860617A (en) * 2009-04-12 2010-10-13 比亚迪股份有限公司 Mobile terminal with voice processing effect and method thereof
US8744853B2 (en) * 2009-05-28 2014-06-03 International Business Machines Corporation Speaker-adaptive synthesized voice
CN101764879A (en) * 2009-10-10 2010-06-30 宇龙计算机通信科技(深圳)有限公司 Method for changing speech output of mobile terminal and mobile terminal
CN102044245B (en) * 2009-10-16 2013-03-27 成都玺汇科技有限公司 Instant processing method of input voice information in intelligent machine
CN101727899B (en) * 2009-11-27 2014-07-30 北京中星微电子有限公司 Method and system for processing audio data
CN101917163B (en) * 2010-07-29 2012-05-23 大连理工大学 Method for improving electrohydraulic vibration exciting control waveform of non-sinusoidal periodic signal
US8930182B2 (en) * 2011-03-17 2015-01-06 International Business Machines Corporation Voice transformation with encoded information
CN102307327B (en) * 2011-08-10 2015-08-19 深圳万兴信息科技股份有限公司 A kind of sound effect inflexion method and device
CN103258539B (en) * 2012-02-15 2015-09-23 展讯通信(上海)有限公司 A kind of transform method of voice signal characteristic and device
CN102592590B (en) * 2012-02-21 2014-07-02 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN104205213B (en) * 2012-03-23 2018-01-05 西门子公司 Audio signal processing method and device and use its audiphone
CN102682766A (en) * 2012-05-12 2012-09-19 黄莹 Self-learning lover voice swapper
CN103730117A (en) * 2012-10-12 2014-04-16 中兴通讯股份有限公司 Self-adaptation intelligent voice device and method
CN103489443B (en) * 2013-09-17 2016-06-15 湖南大学 A kind of sound imitates method and device
CN103714824B (en) * 2013-12-12 2017-06-16 小米科技有限责任公司 A kind of audio-frequency processing method, device and terminal device
CN105304092A (en) * 2015-09-18 2016-02-03 深圳市海派通讯科技有限公司 Real-time voice changing method based on intelligent terminal
CN105632490A (en) * 2015-12-18 2016-06-01 合肥寰景信息技术有限公司 Context simulation method for network community voice communication
CN105679331B (en) * 2015-12-30 2019-09-06 广东工业大学 A kind of information Signal separator and synthetic method and system
CN105845146B (en) * 2016-05-23 2019-09-06 珠海市杰理科技股份有限公司 The method and device of Speech processing
CN106128478B (en) * 2016-06-28 2019-11-08 北京小米移动软件有限公司 Voice broadcast method and device
FR3062945B1 (en) * 2017-02-13 2019-04-05 Centre National De La Recherche Scientifique METHOD AND APPARATUS FOR DYNAMICALLY CHANGING THE VOICE STAMP BY FREQUENCY SHIFTING THE FORMS OF A SPECTRAL ENVELOPE
CN108053814B (en) * 2017-11-06 2023-10-13 芋头科技(杭州)有限公司 Speech synthesis system and method for simulating singing voice of user
CN107863095A (en) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Acoustic signal processing method, device and storage medium
CN108492832A (en) * 2018-03-21 2018-09-04 北京理工大学 High quality sound transform method based on wavelet transformation
CN108682413B (en) * 2018-04-24 2020-09-29 上海师范大学 Emotion persuasion system based on voice conversion
CN108831437B (en) * 2018-06-15 2020-09-01 百度在线网络技术(北京)有限公司 Singing voice generation method, singing voice generation device, terminal and storage medium
CN110661760A (en) * 2018-06-29 2020-01-07 视联动力信息技术股份有限公司 Data processing method and device
CN109192218B (en) * 2018-09-13 2021-05-07 广州酷狗计算机科技有限公司 Method and apparatus for audio processing
CN109410973B (en) * 2018-11-07 2021-11-16 北京达佳互联信息技术有限公司 Sound changing processing method, device and computer readable storage medium
CN111383646B (en) * 2018-12-28 2020-12-08 广州市百果园信息技术有限公司 Voice signal transformation method, device, equipment and storage medium
CN109859327A (en) * 2019-02-20 2019-06-07 中山市嘉游动漫科技有限公司 A kind of virtual cartoon scene construction method and device with reality of combination
CN109920446B (en) * 2019-03-12 2021-03-26 腾讯音乐娱乐科技(深圳)有限公司 Audio data processing method and device and computer storage medium
CN113066472A (en) * 2019-12-13 2021-07-02 科大讯飞股份有限公司 Synthetic speech processing method and related device
CN111833843B (en) * 2020-07-21 2022-05-10 思必驰科技股份有限公司 Speech synthesis method and system
CN111816198A (en) * 2020-08-05 2020-10-23 上海影卓信息科技有限公司 Voice changing method and system for changing voice tone and tone color
CN112309425A (en) * 2020-10-14 2021-02-02 浙江大华技术股份有限公司 Sound tone changing method, electronic equipment and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1118493A (en) * 1994-08-01 1996-03-13 中国科学院声学研究所 Language and speech converting system with synchronous fundamental tone waves
WO2003071523A1 (en) * 2002-02-19 2003-08-28 Qualcomm, Incorporated Speech converter utilizing preprogrammed voice profiles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
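Application of linear predictive coding in variable-duration speech synthesis. 梁志强, 李海洲. Journal of South China University of Technology (Natural Science Edition), Vol. 26, No. 3. 1998 *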

Also Published As

Publication number Publication date
CN1719514A (en) 2006-01-11

Similar Documents

Publication Publication Date Title
CN100440314C (en) High quality real time sound changing method based on speech sound analysis and synthesis
Watanabe Formant estimation method using inverse-filter control
Cook Real sound synthesis for interactive applications
US10008193B1 (en) Method and system for speech-to-singing voice conversion
CN106971703A (en) A kind of song synthetic method and device based on HMM
CN108417228A (en) Voice tone color method for measuring similarity under instrument tamber migration
CN102419981B (en) Zooming method and device for time scale and frequency scale of audio signal
JPH11513820A (en) Control structure for speech synthesis
CN108766409A (en) A kind of opera synthetic method, device and computer readable storage medium
WO2019107379A1 (en) Audio synthesizing method, audio synthesizing device, and program
CN103258539A (en) Method and device for transforming voice signal characteristics
CN106997765A (en) The quantitatively characterizing method of voice tone color
New et al. Voice conversion: From spoken vowels to singing vowels
Ardaillon Synthesis and expressive transformation of singing voice
Bonada et al. Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models
Yim et al. Computationally efficient algorithm for time scale modification (GLS-TSM)
Tachibana et al. A real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion techniques
Bonada et al. Generation of growl-type voice qualities by spectral morphing
Yu et al. Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis
Guillemain Some roles of the vocal tract in clarinet breath attacks: Natural sounds analysis and model-based synthesis
US11495200B2 (en) Real-time speech to singing conversion
CN109712634A (en) A kind of automatic sound conversion method
WO2024087727A1 (en) Voice data processing method based on in-vehicle voice ai, and related device
Shimizu et al. Comparative evaluation of neural vocoders for speech synthesis of operatic singing
O'Reilly Regueiro Evaluation of interpolation strategies for the morphing of musical sound objects

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20081203

Termination date: 20210706