CN1248191C - Phoneme changing method based on digital signal processing - Google Patents
- Publication number
- CN1248191C (application number CNB031370144A / CN03137014A)
- Authority
- CN
- China
- Prior art keywords
- voice
- change
- pitch period
- length
- fundamental frequency
- Prior art date
- Legal status
- Expired - Lifetime
Landscapes
- Electrophonic Musical Instruments (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention discloses a voice changing method based on digital signal processing. The method comprises the following steps: (1) selecting the original voice signal whose voice is to be changed; (2) obtaining the pitch period length of the original voice signal; (3) locating each pitch period across the whole original voice signal according to that pitch period length; (4) deleting or inserting pitch periods between the pitch periods of the original voice signal to obtain a shortened or lengthened voice signal; (5) linearly stretching or compressing the shortened or lengthened voice signal to the same length as the original voice signal to obtain the voice-changed signal. The method is simple and practical, requires little computation, is suitable for real-time implementation on a DSP chip, and yields changed voice with high naturalness. Because the changed voice has the same length as the original voice, it can readily be transmitted in real time.
Description
Technical field
The present invention relates to a voice changing method and, more particularly, to a voice changing method based on digital signal processing.
Background technology
Fundamental frequency and formants are two very important features of speech. The fundamental frequency is the frequency at which the vocal cords vibrate when a voiced sound is produced. Its value is directly related to the speaker's sex: in general, male voices have a lower fundamental frequency and female voices a higher one. Age also has some influence: the fundamental frequency of the elderly is lower than that of young adults, which in turn is lower than that of children. Changing the fundamental frequency therefore changes how the speech sounds and influences a listener's judgment of the speaker's age and even sex.
A formant is a resonance frequency of the glottal wave in the vocal tract. Formant frequencies are strongly correlated with vocal tract length: the longer the vocal tract, the lower the formant frequencies, and vice versa. Men's vocal tracts are generally somewhat longer than women's, so the formant frequencies of male voices are correspondingly lower than those of female voices. Changing the formants can therefore also influence a listener's judgment of the speaker.
Most methods for modifying formant frequencies are based on parametric synthesis. Their common problems are a relatively large amount of computation, the need for manual intervention, and poor naturalness of the synthesized speech.
Many methods already exist for changing the fundamental frequency. Widely used ones include the PSOLA algorithm (Pitch Synchronous Overlap and Add), the hybrid harmonic/stochastic model (Hybrid Harmonic/Stochastic Model), and the autoregressive linear prediction coefficient method (Auto-Regressive LPC). PSOLA is the most widely used because it is simple, requires little computation, and produces highly natural synthesized speech. Owing to its inherent limitations, however, when the fundamental frequency must be changed over a large range the speech exhibits considerable aliasing, which causes strong noise. The other two methods give somewhat less natural speech after the fundamental frequency is changed, and both require more computation, which makes real-time implementation on a DSP chip difficult. In addition, all of these methods change the length of the original speech, which is a serious problem for real-time transmission of the changed voice. Moreover, most existing voice changing methods use analog circuits and are not suitable for implementation on a digital signal processor (DSP).
Summary of the invention
One object of the present invention is to provide an improved voice changing method that achieves the voice-change effect by changing the fundamental frequency and formant frequencies of the speech. A further object is to provide an improved voice changing method in which the voice-changed signal has the same length as the original voice signal.
The objects of the present invention are achieved as follows:
A voice changing method based on digital signal processing comprises the following steps:
(1) selecting the original voice signal whose voice is to be changed;
(2) when the original voice signal is periodic, calculating its fundamental frequency and the length of the pitch period corresponding to that fundamental frequency; when the original voice signal is not periodic, taking a frequency value between 65 Hz and 500 Hz, using the period corresponding to that frequency as the pitch period and its length as the pitch period length;
(3) locating the position of each pitch period in the whole original voice signal according to the pitch period length obtained in step (2);
(4) deleting or inserting pitch periods between the pitch periods of the original voice signal to obtain a shortened or lengthened voice signal;
(5) linearly stretching or compressing the shortened or lengthened voice signal obtained in step (4) to the same length as the original voice signal to obtain the voice-changed signal.
In step (4), pitch periods are deleted or inserted periodically between the pitch periods of the original voice signal.
When the desired fundamental frequency of the changed voice is $p$ times that of the original voice and $p > 1$, one pitch period is inserted every $(p-1)^{-1}$ pitch periods, and the inserted pitch period is a copy of the pitch period adjacent to the insertion point. When the desired fundamental frequency of the changed voice is $p$ times that of the original voice and $0 < p < 1$, the current pitch period is deleted every $(1-p)^{-1}$ pitch periods. Preferably, $1 < p \le 2$ or $0.5 \le p < 1$.
The linear stretching/compression method in step (5) is as follows. Let the length of the original speech be $N$ and the length of the shortened or lengthened voice signal obtained in step (4) be $M$; the scaling ratio is then $r = M/N$. Let the shortened or lengthened voice signal be the sequence $x(m)$, where $1 \le m \le M$, and let the voice-changed signal be the sequence $y(n)$, where $1 \le n \le N$. Let $A_n = nr$, $B_n = \lfloor A_n \rfloor$ and $C_n = B_n + 1$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$. Then

$$y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)],$$

where $y(n)$ is the $n$-th point of the voice-changed sequence.
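The choice of insertion or deletion interval can be checked with a short derivation. This is only a sketch under the assumption that all pitch periods have roughly the same length $T$; the symbols $k$, $T$, $T'$, $f_0$ and $f_0'$ are introduced here for illustration and do not appear in the patent text.

```latex
% k = number of pitch periods between two insertions or deletions
% N = original length, M = length after step (4), r = M/N as in step (5)
\begin{aligned}
p > 1:\quad     & k = (p-1)^{-1}, & M &= N\left(1 + \tfrac{1}{k}\right) = pN,\\
0 < p < 1:\quad & k = (1-p)^{-1}, & M &= N\left(1 - \tfrac{1}{k}\right) = pN,\\
                & r = \frac{M}{N} = p, & T' &= \frac{T}{r} = \frac{T}{p},
                  \qquad f_0' = \frac{f_0\,T}{T'} = p\,f_0 .
\end{aligned}
```

In both cases the intermediate signal is $p$ times as long as the original, so rescaling it back to $N$ samples scales every pitch period by $1/p$ and the fundamental frequency by $p$.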
The present invention is a voice changing method based on digital signal processing. The method is simple and practical, requires very little computation, is suitable for real-time implementation on a DSP chip, and produces changed voice with high naturalness. Moreover, the changed voice has the same length as the original voice, which facilitates real-time transmission of the voice-changed signal.
Description of drawings
Fig. 1 is a flowchart of the voice changing method of the present invention;
Fig. 2 is an example of locating the pitch periods of a voice signal.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
The flowchart of the voice changing method of the present invention is shown in Fig. 1. First, one frame of speech is input; the frame length can be adjusted as required by the actual application.
The fundamental frequency of the input original voice signal is then estimated. In this embodiment the fundamental frequency is estimated with the subharmonic summation method (Summation of Sub-Harmonic Method), so a fundamental frequency value is obtained whether or not the original speech is periodic. When the original speech is periodic, that is, when a pitch period exists, a meaningful fundamental frequency value is obtained. When the original speech is not periodic, that is, when no pitch period exists, as in unvoiced or silent segments, the value obtained is in effect a random number between 65 Hz and 500 Hz; the present invention nevertheless uses the period length corresponding to this frequency as the "pseudo" pitch period length of that speech segment.
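The fallback for non-periodic frames can be sketched in a few lines of Python. This sketch assumes a hypothetical estimator that reports `None` when a frame has no periodicity, whereas the embodiment's subharmonic summation estimator always returns a value; the names `frame_f0`, `F0_MIN_HZ` and `F0_MAX_HZ` are illustrative and not taken from the patent.

```python
import random

F0_MIN_HZ = 65.0    # lower bound of the pseudo fundamental frequency
F0_MAX_HZ = 500.0   # upper bound of the pseudo fundamental frequency

def frame_f0(estimated_f0_hz, rng=random):
    """Return a usable fundamental frequency for one frame.

    estimated_f0_hz is the output of some pitch estimator, or None when
    the frame shows no periodicity (unvoiced or silent segment).
    """
    if estimated_f0_hz is not None:
        return float(estimated_f0_hz)
    # No periodicity: pick a "pseudo" fundamental frequency in 65..500 Hz.
    return rng.uniform(F0_MIN_HZ, F0_MAX_HZ)
```

Passing a seeded `random.Random` instance as `rng` makes the pseudo value reproducible, which can help when testing.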
As this processing shows, the present invention in effect treats non-periodic unvoiced segments as if they were voiced. This is acceptable because unvoiced segments resemble white noise, so periodically deleting or inserting speech in them has almost no effect on how they are perceived. Processing voiced and unvoiced speech uniformly also simplifies the algorithm and, more importantly, avoids the voice change failing when a voiced segment is misjudged as unvoiced.
The pitch period length of a voice signal, in samples, equals the sampling rate of the speech divided by the fundamental frequency of that voice signal. The position of each pitch period in the whole original voice signal is located according to this period length.
Take speech sampled at 8000 samples per second as an example, with 1000 samples as one processing unit, i.e. one frame of speech. If the average fundamental frequency of this frame is 100 Hz, its pitch period is 80 samples, and pitch periods are located using this length.
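The arithmetic of this example can be written as a small Python helper. The function name `pitch_period_length` and the rounding to a whole number of samples are assumptions made here; the patent only states the division.

```python
def pitch_period_length(sample_rate_hz, f0_hz):
    """Pitch period in samples: sampling rate divided by fundamental frequency."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    # Rounding to the nearest whole sample is an assumption of this sketch.
    return round(sample_rate_hz / f0_hz)

period = pitch_period_length(8000, 100)   # 80 samples, as in the example
periods_per_frame = 1000 / period         # 12.5 pitch periods per 1000-sample frame
```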
In general, each pitch period has one maximum, and locating pitch periods by these maxima is the most convenient and reliable approach. First, a maximum is found near the middle of the frame; then, moving outward in both directions, a further maximum is found at every interval of one pitch period length until the maxima of all pitch periods have been found. This embodiment defines each pitch period as starting at the second zero crossing after its maximum and ending at the start of the next pitch period. The second zero crossing after each maximum is therefore sought, and the region of each pitch period is located from these points. The process is illustrated in Fig. 2: the solid lines mark the extremum of each pitch period, the dashed lines mark the second zero crossing after each extremum, and the span between two dashed lines is one pitch period. For the voice signal chosen above, for example, a maximum is first found near the 500th point of the current frame; maxima are then sought near the points 80 samples before and 80 samples after it, and so on until all maxima of the frame have been found. Finally, the second zero crossing after each maximum is found, which locates the region of each pitch period.
For a voice signal that has no pitch period, once its "pseudo" pitch period length has been obtained, the same search for maxima and zero crossings can be used to locate its pseudo pitch periods.
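The search for maxima and zero crossings described above might look as follows in Python. This is a sketch, not the patent's implementation: the function names, the size of the search window around each expected maximum (`search`, a quarter of a period by default) and the handling of frame edges are all assumptions.

```python
import math

def argmax_in(x, lo, hi):
    """Index of the largest sample in x[lo:hi], clipped to the signal bounds."""
    lo, hi = max(lo, 0), min(hi, len(x))
    return max(range(lo, hi), key=lambda i: x[i])

def find_peaks(x, period, search=None):
    """One maximum per pitch period, starting near the middle of the frame."""
    if period < 2:
        raise ValueError("period must be at least 2 samples")
    search = search if search is not None else max(1, period // 4)
    centre = len(x) // 2
    first = argmax_in(x, centre - period // 2, centre + period // 2 + 1)
    peaks = [first]
    p = first
    while p + period < len(x):            # walk towards the end of the frame
        p = argmax_in(x, p + period - search, p + period + search + 1)
        peaks.append(p)
    p = first
    while p - period >= 0:                # walk towards the start of the frame
        p = argmax_in(x, p - period - search, p - period + search + 1)
        peaks.insert(0, p)
    return peaks

def second_zero_crossing_after(x, start):
    """Index just after the second sign change following `start`, or None."""
    crossings = 0
    for i in range(start + 1, len(x)):
        if (x[i - 1] >= 0) != (x[i] >= 0):
            crossings += 1
            if crossings == 2:
                return i
    return None

def locate_pitch_periods(x, period):
    """Return (start, end) pairs; each period ends where the next one starts."""
    starts = []
    for peak in find_peaks(x, period):
        z = second_zero_crossing_after(x, peak)
        if z is not None and (not starts or z > starts[-1]):
            starts.append(z)
    return list(zip(starts[:-1], starts[1:]))
```

Samples before the first located start and after the last one are not assigned to any period in this sketch; how a real implementation carries them across frame boundaries is not specified in the text.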
Whether pitch periods are inserted or deleted depends on the voice-change requirement. To raise the fundamental frequency, pitch periods are inserted; to lower it, pitch periods are deleted. For example, if the desired fundamental frequency of the changed voice is 1.5 times that of the original, a copy of the pitch period adjacent to the insertion point is inserted every 1/(1.5 - 1) = 2 pitch periods; if the desired fundamental frequency is 0.8 times that of the original, the current pitch period is deleted every 1/(1 - 0.8) = 5 pitch periods. This yields a lengthened or shortened voice signal.
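A minimal Python sketch of this insertion and deletion step follows. It assumes the pitch periods are given as contiguous `(start, end)` index pairs, restricts $p$ to the preferred range 0.5 to 2, and rounds the interval to the nearest integer when $1/|p-1|$ is not whole; the function name `modify_pitch_periods` and these choices are illustrative, not taken from the patent.

```python
def modify_pitch_periods(x, periods, p):
    """Insert (p > 1) or delete (p < 1) one pitch period every k periods.

    x       : sequence of samples
    periods : sorted, contiguous (start, end) index pairs of pitch periods
    p       : desired ratio of new to original fundamental frequency
    """
    if not 0.5 <= p <= 2:
        raise ValueError("this sketch only supports 0.5 <= p <= 2")
    if p == 1 or not periods:
        return list(x)
    k = round(1 / abs(p - 1))                 # act on every k-th pitch period
    out = list(x[:periods[0][0]])             # samples before the first period
    for count, (start, end) in enumerate(periods, start=1):
        segment = list(x[start:end])
        hit = count % k == 0
        if p > 1:
            out.extend(segment)
            if hit:
                out.extend(segment)           # insert a copy of the adjacent period
        elif not hit:
            out.extend(segment)               # on a hit the current period is dropped
    out.extend(x[periods[-1][1]:])            # samples after the last period
    return out
```

With ten periods of ten samples each, `p = 1.5` gives 150 samples and `p = 0.8` gives 80 samples, matching the factors in the example above.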
Finally, the lengthened voice signal is compressed to the length of the original speech, or the shortened voice signal is stretched to the length of the original speech, to obtain the desired voice-changed signal. For example, suppose the lengthened speech obtained in the previous step has 1400 samples and its sequence is $x(m)$, where $1 \le m \le 1400$. The original speech has 1000 samples, so the voice-changed signal should also have 1000 points; let its sequence be $y(n)$, where $1 \le n \le 1000$. The scaling ratio is $r = 1400/1000 = 1.4$. Let $A_n = nr$, $B_n = \lfloor A_n \rfloor$ and $C_n = B_n + 1$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$, as in the following table:
n | 1 | 2 | 3 | … | 500 | 501 | … | 999 | 1000 |
---|---|---|---|---|---|---|---|---|---|
$A_n$ | 1.4 | 2.8 | 4.2 | … | 700 | 701.4 | … | 1398.6 | 1400 |
$B_n$ | 1 | 2 | 4 | … | 700 | 701 | … | 1398 | 1400 |
$C_n$ | 2 | 3 | 5 | … | 701 | 702 | … | 1399 | 1400 (1401) |
Applying the formula $y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)]$, where $y(n)$ is the $n$-th point of the voice-changed sequence, gives the compressed, voice-changed speech. In the lower right corner of the table, the computed value $C_n = 1401$ exceeds the maximum value 1400 that $m$ can take; in this embodiment it is replaced by the nearest valid value, 1400. The voice-changed speech is then output, giving a voice signal that has the voice-change effect and the same length as the original speech.
Following the same steps, a shortened voice signal can likewise be stretched to the length of the original speech.
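The interpolation formula can be sketched in Python as follows. The function name `resample_to_length` is illustrative. Clamping $C_n$ to $M$ follows the embodiment; treating $B_n = 0$ (which can occur for the first samples when stretching, since then $r < 1$) as $B_n = 1$ with zero interpolation weight is an assumption of this sketch, because the text does not cover that case.

```python
import math

def resample_to_length(x, n_out):
    """Linearly stretch or compress x (length M) to n_out samples (length N)."""
    m = len(x)
    if m == 0 or n_out <= 0:
        raise ValueError("need a non-empty input and a positive output length")
    y = []
    for n in range(1, n_out + 1):      # n is 1-based, as in the text
        a = n * m / n_out              # A_n = n * r with r = M / N
        b = math.floor(a)              # B_n: largest integer not greater than A_n
        frac = a - b                   # A_n - B_n
        if b < 1:                      # assumption: no sample x(0), use x(1)
            b, frac = 1, 0.0
        c = min(b + 1, m)              # C_n, clamped to M as in the embodiment
        y.append(x[b - 1] + frac * (x[c - 1] - x[b - 1]))
    return y
```

For the 1400-sample example, `resample_to_length(x, 1000)` reads $x(1)$ and $x(2)$ with weight 0.4 for the first output point and returns $x(1400)$ unchanged for the last one.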
Claims (7)
1. A voice changing method based on digital signal processing, comprising the following steps:
(1) selecting the original voice signal whose voice is to be changed;
(2) when the original voice signal is periodic, calculating its fundamental frequency and the length of the pitch period corresponding to that fundamental frequency; when the original voice signal is not periodic, taking a frequency value between 65 Hz and 500 Hz, using the period corresponding to that frequency as the pitch period and its length as the pitch period length;
(3) locating the position of each pitch period in the whole original voice signal according to the pitch period length obtained in step (2);
(4) deleting or inserting pitch periods between the pitch periods of the original voice signal to obtain a shortened or lengthened voice signal;
(5) linearly stretching or compressing the shortened or lengthened voice signal obtained in step (4) to the same length as the original voice signal to obtain the voice-changed signal.
2. The voice changing method according to claim 1, characterized in that in step (4) pitch periods are deleted or inserted periodically between the pitch periods of the original voice signal.
3. The voice changing method according to claim 2, characterized in that when the desired fundamental frequency of the changed voice is $p$ times that of the original voice and $p > 1$, one pitch period is inserted every $(p-1)^{-1}$ pitch periods, the inserted pitch period being a copy of the pitch period adjacent to the insertion point.
4. The voice changing method according to claim 2, characterized in that when the desired fundamental frequency of the changed voice is $p$ times that of the original voice and $0 < p < 1$, the current pitch period is deleted every $(1-p)^{-1}$ pitch periods.
5. The voice changing method according to claim 3, characterized in that $1 < p \le 2$.
6. The voice changing method according to claim 4, characterized in that $0.5 \le p < 1$.
7. The voice changing method according to claim 1, characterized in that the linear stretching/compression method in step (5) is:
the length of the original speech is $N$ and the length of the shortened or lengthened voice signal obtained in step (4) is $M$, so that the scaling ratio is $r = M/N$; the shortened or lengthened voice signal is the sequence $x(m)$, where $1 \le m \le M$; the voice-changed signal is the sequence $y(n)$, where $1 \le n \le N$;
let $A_n = nr$ and $B_n = \lfloor A_n \rfloor$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$; when $1 \le B_n < M$, $C_n = B_n + 1$, and when $B_n = M$, $C_n = B_n$; then

$$y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)],$$

where $y(n)$ is the $n$-th point of the voice-changed sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB031370144A CN1248191C (en) | 2003-06-19 | 2003-06-19 | Phoneme changing method based on digital signal processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB031370144A CN1248191C (en) | 2003-06-19 | 2003-06-19 | Phoneme changing method based on digital signal processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1567428A CN1567428A (en) | 2005-01-19 |
CN1248191C true CN1248191C (en) | 2006-03-29 |
Family
ID=34470332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB031370144A Expired - Lifetime CN1248191C (en) | 2003-06-19 | 2003-06-19 | Phoneme changing method based on digital signal processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1248191C (en) |
Also Published As
Publication number | Publication date |
---|---|
CN1567428A (en) | 2005-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CX01 | Expiry of patent term | ||
CX01 | Expiry of patent term |
Granted publication date: 20060329 |