CN1248191C - Phoneme changing method based on digital signal processing - Google Patents

Phoneme changing method based on digital signal processing

Info

Publication number
CN1248191C
Authority
CN
China
Prior art keywords
voice
change
pitch period
length
fundamental frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB031370144A
Other languages
Chinese (zh)
Other versions
CN1567428A (en
Inventor
李明
刘建
汪俊杰
庹凌云
颜永红
孙宝海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CNB031370144A priority Critical patent/CN1248191C/en
Publication of CN1567428A publication Critical patent/CN1567428A/en
Application granted granted Critical
Publication of CN1248191C publication Critical patent/CN1248191C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Electrophonic Musical Instruments (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention discloses a voice changing method based on digital signal processing. The method comprises the following steps: (1) select the original speech signal whose voice is to be changed; (2) obtain the pitch period length of the original speech signal; (3) locate the position of each pitch period across the entire original speech signal according to that pitch period length; (4) delete or insert pitch periods between the pitch periods of the original speech signal to obtain a shortened or lengthened speech signal; (5) linearly stretch or compress the shortened or lengthened speech signal to the same length as the original speech signal to obtain the voice-changed speech signal. The method is simple and practical, requires little computation, is suitable for real-time implementation on a DSP chip, and produces voice-changed speech with high naturalness. Because the voice-changed speech has the same length as the original speech, the method is well suited to real-time transmission of the voice-changed signal.

Description

A voice changing method based on digital signal processing
Technical field
The present invention relates to a voice changing method, and more particularly to a voice changing method based on digital signal processing.
Background art
Fundamental frequency and formants are two very important features of speech. The fundamental frequency is the frequency at which the vocal cords vibrate when a voiced sound is produced. Its value is directly related to the speaker's sex: in general, male voices have a lower fundamental frequency and female voices a higher one. Age also has some influence: elderly speakers have a lower fundamental frequency than young adults, and young adults have a lower fundamental frequency than children. Changing the fundamental frequency therefore changes how the voice sounds and affects a listener's judgment of the speaker's age and even sex.
A formant is a resonance frequency of the glottal wave in the vocal tract. Formant frequencies are strongly correlated with vocal tract length: the longer the vocal tract, the lower the formant frequencies, and vice versa. Men's vocal tracts are generally somewhat longer than women's, so the formant frequencies of male voices are correspondingly lower than those of female voices. Changing the formants can therefore also affect a listener's judgment of the speaker.
Most methods for modifying formant frequencies are based on parametric synthesis algorithms. These methods generally suffer from high computational cost, a need for manual intervention, and poor naturalness of the synthesized speech.
Many methods already exist for changing the fundamental frequency. The widely used ones include the PSOLA algorithm (Pitch Synchronous Overlap and Add), the hybrid harmonic/stochastic model method (Hybrid Harmonic/Stochastic Model), and the autoregressive linear prediction coefficient method (Auto-Regressive LPC). PSOLA is the most widely used because it is simple, computationally cheap, and yields highly natural synthesized speech. However, owing to limitations of the PSOLA algorithm itself, when the fundamental frequency must be changed over a large range the speech exhibits considerable aliasing, which produces strong noise. The other two methods give somewhat less natural speech after the fundamental frequency is changed, and both are computationally expensive, which makes real-time implementation on a DSP chip difficult. In addition, all of these methods change the length of the original speech, which causes serious problems for real-time transmission of the voice-changed speech. Moreover, existing voice changing methods are mostly implemented with analog circuits and are not suitable for implementation on a digital signal processor (DSP).
Summary of the invention
One object of the present invention is to provide an improved voice changing method that achieves the voice-change effect by changing the fundamental frequency and the formant frequencies of the speech. Another object is to provide an improved voice changing method in which the voice-changed speech signal has the same length as the original speech signal.
The objects of the present invention are achieved as follows:
A voice changing method based on digital signal processing comprises the following steps:
(1) select the original speech signal whose voice is to be changed;
(2) when the original speech signal is periodic, calculate its fundamental frequency and the length of the pitch period corresponding to that fundamental frequency; when the original speech is not periodic, take a frequency value between 65 Hz and 500 Hz, treat the period corresponding to that frequency as the pitch period, and use its length as the pitch period length;
(3) locate the position of each pitch period across the entire original speech signal according to the pitch period length obtained in step (2);
(4) delete or insert pitch periods between the pitch periods of the original speech signal to obtain a shortened or lengthened speech signal;
(5) linearly stretch or compress the shortened or lengthened speech signal obtained in step (4) to the same length as the original speech signal to obtain the voice-changed speech signal.
In step (4), pitch periods are deleted or inserted periodically between the pitch periods of the original speech signal.
When the desired fundamental frequency after the voice change is p times the original fundamental frequency and p > 1, one pitch period is inserted every $(p-1)^{-1}$ pitch periods, and the inserted pitch period is a copy of the pitch period adjacent to the insertion point. When the desired fundamental frequency is p times the original and 0 < p < 1, the current pitch period is deleted every $(1-p)^{-1}$ pitch periods. Preferably, 1 < p ≤ 2 or 0.5 ≤ p < 1.
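As a rough check on this rule, the following sketch assumes that every pitch period has the same length $T_0$ and that the signal contains a whole number of periods. Both are simplifying assumptions made here for illustration, not statements from the source; $f_0 = 1/T_0$ denotes the original fundamental frequency, N the original length, and M the length after step (4).

```latex
% Sketch under the equal-period assumption; N = original length, M = length after step (4).
\begin{align*}
p > 1:\quad & M \approx N\left[1 + (p-1)\right] = pN
  && \text{(one extra period per } (p-1)^{-1} \text{ periods)} \\
0 < p < 1:\quad & M \approx N\left[1 - (1-p)\right] = pN
  && \text{(one period removed per } (1-p)^{-1} \text{ periods)} \\
\text{step (5):}\quad & T' = T_0 \cdot \frac{N}{M} \approx \frac{T_0}{p}
  \;\Longrightarrow\; f' = \frac{1}{T'} \approx p\, f_0
\end{align*}
```

In both cases the resampling in step (5) scales every period by about 1/p, so the fundamental frequency becomes roughly p times the original while the total length returns to N. Because the same resampling scales the whole spectrum, the formant frequencies would be expected to move by roughly the same factor.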
The linear stretching/compression in step (5) works as follows. Let the length of the original speech be N and the length of the shortened or lengthened speech signal obtained in step (4) be M, so that the scaling ratio is $r = M/N$. Let the shortened or lengthened speech sequence be $x(m)$, $1 \le m \le M$, and let the voice-changed speech sequence be $y(n)$, $1 \le n \le N$. Let $A_n = nr$, $B_n = \lfloor A_n \rfloor$ and $C_n = B_n + 1$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$. Then $y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)]$, where $y(n)$ is the n-th sample of the voice-changed speech sequence.
The present invention is a voice changing method based on digital signal processing. The method is simple and practical, requires very little computation, is suitable for real-time implementation on a DSP chip, and yields voice-changed speech with high naturalness. In addition, the voice-changed speech has the same length as the original speech, which facilitates real-time transmission of the voice-changed speech signal.
Description of the drawings
Fig. 1 is a flowchart of the voice changing method of the present invention;
Fig. 2 shows an example of locating the pitch periods of a speech signal.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
The flowchart of the voice changing method of the present invention is shown in Fig. 1. First, one frame of speech is input; the frame length can be adjusted as needed for the application.
The fundamental frequency of the input original speech signal is then estimated. This embodiment estimates the fundamental frequency with the sub-harmonic summation method (Summation of Sub-Harmonic Method), which returns a fundamental frequency value whether or not the original speech is periodic. When the original speech is periodic, that is, when pitch periods exist, a meaningful fundamental frequency value is obtained. When the original speech is not periodic, that is, when no pitch period exists, as in unvoiced or silent segments, the value obtained is in effect a random number between 65 Hz and 500 Hz; the present invention nevertheless uses the period length corresponding to this frequency as the "pseudo" pitch period length of that speech segment.
As this processing shows, the present invention in effect treats aperiodic unvoiced segments as if they were voiced. This is acceptable because the unvoiced segments of speech resemble white noise, so periodically deleting or inserting speech within them has almost no effect on how listeners perceive them. Processing voiced and unvoiced speech uniformly also simplifies the algorithm and, more importantly, avoids the voice change failing when voiced speech is misclassified as unvoiced.
The pitch period length of a speech signal equals the sampling rate of the speech divided by the fundamental frequency of the signal. The position of each pitch period across the entire original speech signal is located according to this period length.
Take speech sampled at 8,000 samples per second as an example, with 1000 samples as one processing unit, that is, one frame of speech. If the average fundamental frequency of this frame is 100 Hz, its pitch period is 80 samples, and the pitch periods are located using this length.
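The arithmetic in this example can be sketched in a few lines of Python. The function names, the rounding to a whole number of samples, and the use of a uniform random draw as the stand-in frequency for aperiodic frames are assumptions made for illustration; the source only states that the value lies between 65 Hz and 500 Hz.

```python
import random


def pitch_period_samples(sample_rate_hz, f0_hz):
    """Pitch period length in samples: sampling rate divided by fundamental frequency.

    Rounding to the nearest whole sample is an assumption of this sketch.
    """
    return round(sample_rate_hz / f0_hz)


def pseudo_f0(rng, low_hz=65.0, high_hz=500.0):
    """Stand-in fundamental frequency for a frame with no periodicity.

    A uniform draw is assumed here; the source only gives the 65-500 Hz range.
    """
    return rng.uniform(low_hz, high_hz)


# The example from the text: 8000 samples per second and a 100 Hz fundamental.
period = pitch_period_samples(8000, 100.0)

# A "pseudo" pitch period for an unvoiced or silent frame.
pseudo_period = pitch_period_samples(8000, pseudo_f0(random.Random(0)))
```

At 8000 samples per second, the 65 Hz to 500 Hz range corresponds to pseudo pitch periods of roughly 16 to 123 samples.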
In general, each pitch period contains one maximum, and locating each pitch period by this maximum is the most convenient and reliable approach. First, a maximum is found near the middle of the frame; then a maximum is sought one pitch period length away on either side, and so on until the maxima of all pitch periods have been found. In this embodiment each pitch period is defined to start at the second zero crossing after its maximum and to end at the start of the next pitch period. The second zero crossing after each maximum is therefore located, which delimits the region of each pitch period. This locating process is illustrated in Fig. 2, where the solid lines mark the extremum of each pitch period, the dashed lines mark the second zero crossing after each extremum, and the span between two dashed lines is one pitch period. For the speech signal in the example above, a maximum is first found near sample 500 of the current frame; maxima are then sought about 80 samples before and about 80 samples after it, and the procedure is repeated until all maxima of the frame have been found. Finally, the second zero crossing after each maximum is located, which delimits the region of each pitch period.
For a speech signal that has no pitch period, once its "pseudo" pitch period length has been obtained, the same search for maxima and zero crossings can be used to locate its pseudo pitch periods.
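The search for maxima and zero crossings described above might look like the following minimal sketch. The function names, the search-window half-widths (half a period around the frame centre, a quarter period elsewhere), and the treatment of a sign change between neighbouring samples as a zero crossing are all assumptions made for illustration; the source does not specify them.

```python
import math


def find_peaks(frame, period):
    """Find one maximum per pitch period, starting near the middle of the frame."""
    n = len(frame)
    search = max(1, period // 4)  # assumed half-width of each later search window

    def argmax_near(center, half_width):
        lo = max(0, center - half_width)
        hi = min(n, center + half_width + 1)
        return max(range(lo, hi), key=lambda i: frame[i])

    first = argmax_near(n // 2, period // 2)
    peaks = [first]
    pos = first - period
    while pos >= 0:  # walk towards the start of the frame
        pos = argmax_near(pos, search)
        peaks.insert(0, pos)
        pos -= period
    pos = first + period
    while pos < n:  # walk towards the end of the frame
        pos = argmax_near(pos, search)
        peaks.append(pos)
        pos += period
    return peaks


def second_zero_crossing_after(frame, start):
    """Index just after the second sign change following `start`, or None."""
    count = 0
    for i in range(start, len(frame) - 1):
        if (frame[i] >= 0) != (frame[i + 1] >= 0):
            count += 1
            if count == 2:
                return i + 1
    return None


def locate_pitch_periods(frame, period):
    """Return (start, end) index pairs; each pair delimits one pitch period."""
    bounds = [second_zero_crossing_after(frame, p) for p in find_peaks(frame, period)]
    bounds = [b for b in bounds if b is not None]
    return list(zip(bounds, bounds[1:]))


# Synthetic 100 Hz tone at 8000 samples per second (period of 80 samples).
frame = [math.sin(2 * math.pi * (i + 0.25) / 80) for i in range(1000)]
regions = locate_pitch_periods(frame, 80)
```

Maxima whose second zero crossing falls outside the frame yield no boundary, so the samples before the first boundary and after the last one are left unassigned in this sketch.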
Whether pitch periods are inserted or deleted depends on the voice change required. To raise the fundamental frequency, pitch periods are inserted; to lower it, pitch periods are deleted. For example, if the desired fundamental frequency after the voice change is 1.5 times the original, a copy of the pitch period adjacent to the insertion point is inserted every 1/(1.5 − 1) = 2 pitch periods. If the desired fundamental frequency is 0.8 times the original, the current pitch period is deleted every 1/(1 − 0.8) = 5 pitch periods. This yields a lengthened or shortened speech signal.
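The insertion and deletion rule can be sketched as follows, operating on a list of already-separated pitch periods. The function name, the rounding of the interval to a whole number of periods, and the restriction to 0.5 ≤ p ≤ 2 (the preferred range given in the summary) are assumptions of this sketch; the source's worked examples only use ratios that give whole-number intervals.

```python
def change_period_count(periods, p):
    """Insert (p > 1) or delete (p < 1) whole pitch periods at regular intervals.

    `periods` is a list of sample lists, one per pitch period.
    """
    if not 0.5 <= p <= 2.0:
        raise ValueError("this sketch only handles 0.5 <= p <= 2")
    if p == 1:
        return list(periods)
    if p > 1:
        step = round(1 / (p - 1))  # insert one copy after every `step` periods
        out = []
        for i, period in enumerate(periods, start=1):
            out.append(period)
            if i % step == 0:
                out.append(period)  # copy of the period next to the insertion point
        return out
    step = round(1 / (1 - p))  # delete one period out of every `step`
    return [q for i, q in enumerate(periods, start=1) if i % step != 0]


# Ten dummy pitch periods of 80 samples each.
periods = [[float(i)] * 80 for i in range(10)]
longer = change_period_count(periods, 1.5)   # one copy inserted every 2 periods
shorter = change_period_count(periods, 0.8)  # one period deleted every 5 periods
```

With ten input periods, the 1.5 ratio yields fifteen periods and the 0.8 ratio yields eight, so the signal length scales by about p before the final resampling step.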
Finally, the lengthened speech signal is compressed to the same length as the original speech, or the shortened speech signal is stretched to the same length as the original speech, which gives the desired voice-changed speech signal. For example, suppose the lengthened speech obtained in the previous step is 1400 samples long, with sequence $x(m)$, $1 \le m \le 1400$. The original speech is 1000 samples long, so the voice-changed speech signal should also be 1000 samples long; let its sequence be $y(n)$, $1 \le n \le 1000$. The scaling ratio is $r = 1400/1000 = 1.4$. Let $A_n = nr$, $B_n = \lfloor A_n \rfloor$ and $C_n = B_n + 1$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$, as in the following table:
n      1     2     3     500   501     999      1000
A_n    1.4   2.8   4.2   700   701.4   1398.6   1400
B_n    1     2     4     700   701     1398     1400
C_n    2     3     5     701   702     1399     1400 (1401)
The compressed, voice-changed speech is then obtained from the formula $y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)]$, where $y(n)$ is the n-th sample of the voice-changed speech sequence. In the bottom-right cell of the table, the calculated value $C_n = 1401$ exceeds 1400, the largest value m can take; in this embodiment it is replaced by 1400, the closest valid value. Finally, the voice-changed speech is output, giving a speech signal that has the voice-change effect and the same length as the original speech.
The shortened speech signal can be stretched to the same length as the original speech by the same procedure.
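The interpolation formula can be sketched in Python as below, keeping the 1-based indices of the text. The function name is hypothetical. Clamping $C_n$ to M follows the embodiment; clamping $B_n$ to 1 when stretching (where $A_n$ can be smaller than 1) is an assumption of this sketch, since the source does not discuss that case.

```python
import math


def resample_linear(x, n_out):
    """Stretch or compress x (length M) to n_out samples by linear interpolation."""
    m_len = len(x)
    r = m_len / n_out  # scaling ratio r = M / N
    y = []
    for n in range(1, n_out + 1):  # 1-based index n, as in the text
        a = n * r                  # A_n
        b = math.floor(a)          # B_n: largest integer not greater than A_n
        frac = a - b               # A_n - B_n
        if b < 1:                  # assumed handling when stretching: reuse x(1)
            b, frac = 1, 0.0
        b = min(b, m_len)
        c = min(b + 1, m_len)      # C_n, replaced by M when it would exceed M
        y.append(x[b - 1] + frac * (x[c - 1] - x[b - 1]))
    return y


# The example from the text: 1400 samples compressed to 1000.
x = [float(m) for m in range(1, 1401)]  # test ramp with x(m) = m
y = resample_linear(x, 1000)
```

With the ramp input, each output sample equals its own $A_n$, which makes it easy to compare the result against the table above.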

Claims (7)

1. A voice changing method based on digital signal processing, comprising the following steps:
(1) selecting the original speech signal whose voice is to be changed;
(2) when the original speech signal is periodic, calculating its fundamental frequency and the length of the pitch period corresponding to that fundamental frequency; when the original speech is not periodic, taking a frequency value between 65 Hz and 500 Hz, treating the period corresponding to that frequency as the pitch period, and using its length as the pitch period length;
(3) locating the position of each pitch period across the entire original speech signal according to the pitch period length obtained in step (2);
(4) deleting or inserting pitch periods between the pitch periods of the original speech signal to obtain a shortened or lengthened speech signal;
(5) linearly stretching or compressing the shortened or lengthened speech signal obtained in step (4) to the same length as the original speech signal to obtain the voice-changed speech signal.
2. The voice changing method according to claim 1, characterized in that, in step (4), pitch periods are deleted or inserted periodically between the pitch periods of the original speech signal.
3. The voice changing method according to claim 2, characterized in that, when the desired fundamental frequency after the voice change is p times the original fundamental frequency and p > 1, one pitch period is inserted every $(p-1)^{-1}$ pitch periods, and the inserted pitch period is a copy of the pitch period adjacent to the insertion point.
4. The voice changing method according to claim 2, characterized in that, when the desired fundamental frequency after the voice change is p times the original fundamental frequency and 0 < p < 1, the current pitch period is deleted every $(1-p)^{-1}$ pitch periods.
5. The voice changing method according to claim 3, characterized in that 1 < p ≤ 2.
6. The voice changing method according to claim 4, characterized in that 0.5 ≤ p < 1.
7. The voice changing method according to claim 1, characterized in that the linear stretching/compression in step (5) is as follows:
the length of the original speech is N and the length of the shortened or lengthened speech signal obtained in step (4) is M, so that the scaling ratio is $r = M/N$; the shortened or lengthened speech sequence is $x(m)$, where $1 \le m \le M$; the voice-changed speech sequence is $y(n)$, where $1 \le n \le N$;
let $A_n = nr$ and $B_n = \lfloor A_n \rfloor$, where $\lfloor A_n \rfloor$ is the largest integer not greater than $A_n$; when $1 \le B_n < M$, $C_n = B_n + 1$, and when $B_n = M$, $C_n = B_n$; then $y(n) = x(B_n) + (A_n - B_n)\,[x(C_n) - x(B_n)]$, where $y(n)$ is the n-th sample of the voice-changed speech sequence.
CNB031370144A 2003-06-19 2003-06-19 Phoneme changing method based on digital signal processing Expired - Lifetime CN1248191C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031370144A CN1248191C (en) 2003-06-19 2003-06-19 Phoneme changing method based on digital signal processing

Publications (2)

Publication Number Publication Date
CN1567428A CN1567428A (en) 2005-01-19
CN1248191C true CN1248191C (en) 2006-03-29

Family

ID=34470332

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031370144A Expired - Lifetime CN1248191C (en) 2003-06-19 2003-06-19 Phoneme changing method based on digital signal processing

Country Status (1)

Country Link
CN (1) CN1248191C (en)

Also Published As

Publication number Publication date
CN1567428A (en) 2005-01-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20060329