Summary of the invention
The inventor has identified the above problems in the prior art and proposes a new technical scheme that transforms speech characteristics through a parameter-controlled frequency mapping, without building a parameter database.
An object of the present invention is to provide a method and a device for transforming speech characteristics.
According to one aspect of the present invention, a method for transforming speech signal characteristics is provided, the method comprising:
Dividing the spectrum of a speech signal into a plurality of subbands using the fundamental frequency value of the speech signal, the plurality of subbands comprising a fundamental tone subband and overtone subbands;
Mapping, based on a preset pitch variation rate and a preset pitch offset, the frequency values contained in one or more consecutive subbands to conversion frequency values, so as to obtain the conversion subband corresponding to each such subband;
Obtaining a reconstructed spectrum based on the conversion frequency values;
Synthesizing a new speech signal using the reconstructed spectrum.
Preferably, the step of dividing the spectrum of the speech signal into a plurality of subbands comprises:
Performing an N-point windowed DFT on the speech signal to obtain the amplitude values $X_A[n]_k$ corresponding to N/2+1 frequency values and the adjusted frequency values $X_F[n]_k$ normalized to N/2, where n is the frame number and k is the spectral index, k = 0, 1, 2, ..., N/2;
Dividing the N/2+1 adjusted frequency values $X_F[n]_k$ into a plurality of subbands using the adjusted frequency value corresponding to the fundamental frequency value, the plurality of subbands comprising a fundamental tone subband and overtone subbands.
Preferably, the step of mapping the frequency values contained in one or more consecutive subbands comprises:
Mapping at least one adjusted frequency value $X_F[n]_k$ contained in each of the one or more consecutive subbands to a conversion frequency value $X_F'[n]_k$, the at least one adjusted frequency value $X_F[n]_k$ including the adjusted frequency value corresponding to the frequency value with the maximum amplitude value in that subband;
The step of obtaining a reconstructed spectrum based on the conversion frequency values comprises:
Calculating the conversion spectral index k' corresponding to the conversion frequency value $X_F'[n]_k$ according to the following formula:
$$k' = round(X_F'[n]_k)$$
where the function round() denotes rounding to the nearest integer;
Obtaining, based on the conversion frequency value $X_F'[n]_k$, the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to the conversion spectral index k'.
Preferably, the mapping from the adjusted frequency value $X_F[n]_k$ to the corresponding conversion frequency value $X_F'[n]_k$ is performed using the following formulas:
$$X_F'[n]_k = r_{wide} \times (X_F[n]_k - f_b) + f_{b'}'$$
$$= r_{wide} \times (X_F[n]_k - (b+1) f_0) + (b'+1) f_{rate} f_0 + (b'+1) f_{offset}$$
$$f_{b'}' = (b'+1) f_0'$$
$$f_0' = f_{rate} f_0 + f_{offset}$$
where $f_0$ is the adjusted fundamental frequency value normalized to N/2, $f_0'$ is the conversion frequency value obtained by mapping the normalized fundamental frequency value, b is the subband index, $f_b$ is the center frequency value of subband b, $f_{b'}'$ is the center frequency value of subband b' to which the center frequency value of subband b is mapped, $f_{rate}$ denotes the pitch variation rate, $f_{offset}$ denotes the pitch offset, and $r_{wide}$ is the bandwidth control coefficient.
Preferably, when a conversion spectral index k' corresponds to only one spectral index k, the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to this conversion spectral index k' equals $X_F'[n]_k$;
When a conversion spectral index k' corresponds to multiple spectral indices k, the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to this conversion spectral index k' is the conversion frequency value $X_F'[n]_k$ of the adjusted frequency value $X_F[n]_k$ that has the maximum amplitude among the frequency values corresponding to those spectral indices k.
Preferably, the step of obtaining a reconstructed spectrum based on the conversion frequency values further comprises:
Obtaining, based on the amplitude values $X_A[n]_k$, the reconstruction amplitude value $X_{RA}[n]_{k'}$ corresponding to the conversion spectral index k'.
Preferably, when a conversion spectral index k' corresponds to only one spectral index k, the reconstruction amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index k' is the amplitude value $X_A[n]_k$ corresponding to that spectral index k;
When a conversion spectral index k' corresponds to multiple spectral indices k, the reconstruction amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index k' is the sum of the amplitude values $X_A[n]_k$ corresponding to those spectral indices k, or the mean-square sum of those amplitude values.
Preferably, the step of synthesizing a new speech signal using the reconstructed spectrum comprises:
Calculating, based on the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k', the reconstruction phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index k';
Synthesizing a new speech signal based on the reconstruction phase values $X_{RP}[n]_{k'}$ and the reconstruction amplitude values $X_{RA}[n]_{k'}$.
Preferably, before the frequency values contained in the one or more consecutive subbands are mapped to conversion frequency values, the method further comprises:
Dividing the amplitude value corresponding to each spectral index in the one or more consecutive subbands by the maximum amplitude value in the subband to which it belongs, so as to normalize the amplitude of that subband.
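By way of illustration only, the per-subband amplitude normalization described above might be sketched in Python as follows. The function name and the half-open boundary convention (subband b covering indices `band_bound[b]` up to but not including `band_bound[b + 1]`) are assumptions made for this example and are not taken from the text.

```python
def normalize_subband_amplitudes(amplitudes, band_bound):
    """Divide each amplitude by the maximum amplitude of its own subband.

    amplitudes : list of spectrum amplitudes, indexed by spectral index k
    band_bound : hypothetical list of boundary indices; subband b is assumed
                 to cover indices band_bound[b] .. band_bound[b + 1] - 1
    Returns (normalized amplitudes, list of per-subband maxima).
    """
    normalized = list(amplitudes)
    peaks = []
    for b in range(len(band_bound) - 1):
        start, stop = band_bound[b], band_bound[b + 1]
        peak = max(amplitudes[start:stop])
        peaks.append(peak)
        if peak > 0:  # leave an all-zero subband untouched
            for k in range(start, stop):
                normalized[k] = amplitudes[k] / peak
    return normalized, peaks
```

The per-subband maxima are returned as well because a later envelope adjustment needs some gain to restore the level of each subband; which gain is used is a separate choice.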
Preferably, the method further comprises:
Multiplying the reconstruction amplitude value $X_{RA}[n]_{k'}$ by the subband gain value and/or the EQ gain value corresponding to the conversion subband in which it lies, to obtain the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
Preferably, the method further comprises:
Presetting N/2+1 EQ gain values $g_0, g_1, g_2, \ldots, g_{N/2}$.
Preferably, the subband gain value is the subband envelope gain value corresponding to the spectral index with the maximum amplitude value in the conversion subband.
Preferably, the conversion spectral index k' corresponding to the maximum amplitude value in the conversion subband is linearly or nonlinearly transformed to obtain a new spectral index;
When the new spectral index is not greater than the maximum spectral index of the last subband before conversion, the subband envelope gain is the maximum amplitude value in the pre-conversion subband in which the new spectral index lies;
When the new spectral index is greater than the maximum spectral index of the last subband before conversion, the subband envelope gain is the maximum amplitude value in the last subband before conversion, the last subband being the subband with the highest frequency values.
Preferably, the step of synthesizing a new speech signal using the reconstructed spectrum comprises:
Calculating, based on the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k', the reconstruction phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index k';
Synthesizing a new speech signal based on the reconstruction phase values $X_{RP}[n]_{k'}$ and the envelope-adjusted amplitude values $X_{RA}'[n]_{k'}$.
Preferably, in the step of calculating, based on the reconstruction frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k', the reconstruction phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index k', the calculation is performed according to the following formula:
$$X_{RP}[n]_{k'} = res[X_{RP}[n-1]_{k'} + X_{RF}[n]_{k'} M / N]$$
where res[x] = x - round[x], $X_{RP}[0]_{k'} = 0$, and M is the number of sampling points in the output time interval.
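The phase recursion above can be illustrated with a minimal Python sketch. The function names are hypothetical. Two assumptions are made for the example: rounding is implemented as floor(x + 0.5), and the phase is read as being expressed in cycles rather than radians, which is what the residual operation suggests.

```python
import math


def res(x):
    """Integer-constrained residual: x minus x rounded to the nearest integer.

    Rounding is implemented as floor(x + 0.5), an assumption for this sketch.
    """
    return x - math.floor(x + 0.5)


def update_phase(prev_phase, recon_freq, hop, n_fft):
    """One step of X_RP[n] = res[X_RP[n-1] + X_RF[n] * M / N].

    prev_phase : reconstruction phase of the previous frame (assumed in cycles)
    recon_freq : reconstruction frequency value, normalized to N/2
    hop        : M, number of samples in the output time interval
    n_fft      : N, the DFT length
    """
    return res(prev_phase + recon_freq * hop / n_fft)
```

For example, with N = 256, M = 64 and a reconstruction frequency value of 3, the phase advances by 0.75 per frame and is wrapped back to -0.25.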
Preferably, the one or more consecutive subbands include the subband having the largest amplitude peak.
Preferably, when a conversion spectral index k' < 0, reconstruction of the spectrum related to that conversion spectral index is abandoned; when a conversion spectral index k' > N/2, reconstruction of the spectrum related to that conversion spectral index is abandoned.
Preferably, the fundamental frequency value of the speech signal is obtained by a pitch detection method.
According to another aspect of the present invention, a device for transforming speech signal characteristics is also provided, the device comprising:
A subband division unit, configured to divide the spectrum of a speech signal into a plurality of subbands using the fundamental frequency value of the speech signal, the plurality of subbands comprising a fundamental tone subband and overtone subbands;
A frequency mapping unit, configured to map, based on a preset pitch variation rate and a preset pitch offset, the frequency values contained in one or more consecutive subbands to conversion frequency values, so as to obtain the conversion subband corresponding to each such subband;
A spectrum reconstruction unit, configured to obtain a reconstructed spectrum based on the conversion frequency values;
A signal synthesis unit, configured to synthesize a new speech signal using the reconstructed spectrum.
The method provided by the present invention divides the spectrum of a speech signal into subbands using the fundamental frequency value of the speech signal and maps the frequency values contained in one or more consecutive subbands based on a preset pitch variation rate and a preset pitch offset. A reconstructed spectrum with the specified speech characteristics is thereby obtained, from which a speech signal with the new speech characteristics is synthesized.
In another preferred embodiment, using the adjusted frequency values $X_F[n]_k$ normalized to N/2, at least one adjusted frequency value $X_F[n]_k$ contained in each of the one or more consecutive subbands is mapped to a conversion frequency value $X_F'[n]_k$. The mapped frequency values $X_F[n]_k$ include the adjusted frequency value corresponding to the frequency value with the maximum amplitude value in the subband, so that only part of the frequency values need to be mapped, which reduces the amount of computation.
In another preferred embodiment, subband amplitude normalization can additionally be performed before the frequency mapping, and after the frequency mapping the spectral amplitude can be envelope-adjusted using subband gain values and/or EQ gain values that carry the desired speech characteristics, so that a more realistic transformation of speech characteristics is achieved.
In another preferred embodiment, a signal reconstruction method with phase measurement is used, so that a high-quality reconstructed speech signal that is smooth in time and frequency can be obtained.
Further features and advantages of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Embodiment
Various exemplary embodiments of the present invention are now described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for convenience of description, the sizes of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application or its use.
Techniques, methods and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be construed as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
The method and device for transforming speech characteristics provided by the present invention are described in detail below.
Transform method of speech signal characteristics
Embodiment one
Fig. 1 is a schematic flow chart of embodiment one of the method for transforming speech signal characteristics provided by the present invention.
In step 101, the spectrum of the speech signal is divided into a plurality of subbands using the fundamental frequency value of the speech signal; these subbands comprise a fundamental tone subband and overtone subbands.
The transformation of speech characteristics is usually approached from basic acoustics. Pitch depends on the frequency of the sound, and the frequency of the sound is related to the length, thickness and tension of the vocal cords. Short, thin, tight vocal cords vibrate at a higher frequency and produce a higher sound; otherwise the sound is lower. The vocal cords of women and children vibrate 150 to 300 times per second, those of infants even faster, and those of men 60 to 200 times per second. When a person is excited the voice is high, and when the person is depressed the voice is low. The opening and closing of the vocal cords shapes the airflow into a series of pulses. The duration of one opening-and-closing cycle, that is, the vibration period, is called the pitch period, and the reciprocal of the pitch period is the fundamental frequency. Changing the fundamental frequency of a sound can therefore markedly change its characteristics.
The timbre of a sound is determined by the combination of the fundamental tone and the overtones that make up the sound. For different sounding bodies, even if the frequency and intensity of the fundamental tone are identical, the intensities and distribution of the overtones (the harmonics) differ because of differences in physical structure, and the sound is perceived as completely different. People's voices differ largely because of differences in timbre, which give rise to different sound characteristics.
Because the features above have a significant influence on speech characteristics, the spectrum of the speech signal is divided into different subbands using the fundamental frequency value of the speech signal, comprising a fundamental tone subband and overtone subbands. For example, let b be the subband index, b = 0, 1, 2, ..., where 0 denotes the fundamental tone subband and 1, 2, 3, ... denote the 1st, 2nd, 3rd, ... overtone subbands. The frequency values can then be mapped on the basis of one or more subbands to obtain new frequency values.
In step 102, based on the preset pitch variation rate and pitch offset, the frequency values contained in one or more consecutive subbands are mapped to conversion frequency values, so as to obtain the conversion subband corresponding to each such subband.
The fundamental frequency is an important parameter that reflects the speaker's characteristics in a speech signal. It ranges over roughly 70 to 350 Hz and depends on the sex, age and specific circumstances of the speaker: it is lower for elderly men and higher for children and young women.
Because the fundamental frequency is a key feature of the speaker characteristics of a speech signal, changing the height of the fundamental tone spectrum of the speech signal can markedly change the auditory properties of the speech.
A parameter-controlled mapping is applied to the frequencies of the fundamental tone subband and the overtone subbands; the control parameters can include the pitch variation rate and the pitch offset. By mapping the frequency values with different preset pitch variation rates and pitch offsets, conversion frequency values with specific speech characteristics, and the conversion subbands corresponding to the original subbands, can be obtained.
In step 103, a reconstructed spectrum is obtained based on the conversion frequency values. The new spectrum has the specific speech characteristics.
In step 104, a new speech signal is synthesized using the reconstructed spectrum. Because the reconstructed spectrum has the specific speech characteristics, the new speech signal synthesized from it also has those characteristics, and the transformation of speech characteristics is thereby achieved.
The spectrum of the speech signal can be divided using the fundamental frequency value in different ways. Embodiment two introduces one specific division method and, based on it, the mapping of subband frequencies, the reconstruction of the spectrum and the synthesis of the speech signal.
Embodiment two
Fig. 2 is a schematic flow chart of embodiment two of the method for transforming speech signal characteristics provided by the present invention.
In step 201, an N-point windowed DFT (Discrete Fourier Transform) is performed on the speech signal to obtain the amplitude values $X_A[n]_k$ corresponding to N/2+1 frequency values and the adjusted frequency values $X_F[n]_k$ normalized to N/2, where n is the frame number and k is the spectral index, k = 0, 1, 2, ..., N/2.
A windowed DFT is performed on the speech signal frame by frame, from which the amplitude values and frequency values of the speech signal can be obtained by analysis. Specifically, an N-point windowed DFT is performed on the speech signal x(t) at a time interval L to obtain $X[n]_k$ and $X'[n]_k$:
k=0,1,2,…,N/2
where $h_{ana}$ denotes the N-point analysis window function, for which a Hamming window, a Hanning window or a sine window can be used; K is the number of periodic-extension points, 1 <= K <= L; and the subscript k denotes the k-th element of a vector. Because the speech signal is real-valued, only the first N/2+1 points of the N-point DFT spectrum need to be retained, that is, the spectral index k ranges from 0 to N/2.
As for the length N, it is generally required that N be a power of 2, to meet the requirements of the fast Fourier transform (FFT), and that the spectral resolution of the DFT satisfy $f_s/N$ < 32 Hz. For a speech signal with sampling frequency $f_s$ = 8000 Hz, N is greater than or equal to 256. L represents the update rate of analysis and synthesis: the smaller L is, the faster the update, the less the result is affected by the time-varying characteristics of the signal, and the higher the precision of analysis and synthesis, but the amount of computation increases accordingly. Weighing the amount of computation against the quality of the speech signal, L is generally required to be less than or equal to N/4.
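The choice of N and L described above can be sketched as follows. This is an illustrative helper only: the function name is hypothetical, and taking L exactly equal to N/4 is just one admissible choice under the stated constraint L <= N/4.

```python
def choose_frame_parameters(sample_rate_hz, max_resolution_hz=32.0):
    """Pick the smallest power-of-two N with f_s / N < max_resolution_hz.

    Returns (N, L), with L set to N/4, the largest hop the text allows.
    """
    n_fft = 1
    while sample_rate_hz / n_fft >= max_resolution_hz:
        n_fft *= 2
    hop = n_fft // 4
    return n_fft, hop
```

For $f_s$ = 8000 Hz this yields N = 256 (resolution 31.25 Hz) and L = 64, in line with the requirement that N be at least 256 at that sampling frequency.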
Converting $X'[n]_k$ and $X[n]_k$ from rectangular to polar coordinates yields the amplitude spectrum $X_A[n]_k$ of $X[n]_k$, the phase spectrum $X_P[n]_k$ of $X[n]_k$, and the phase spectrum $X_P'[n]_k$ of $X'[n]_k$:
k=0,1,2,…,N/2
k=0,1,2,…,N/2
Formula (2) yields the amplitude values $X_A[n]_k$ corresponding to the N/2+1 frequency values.
The adjusted frequency values $X_F[n]_k$ of the speech signal can be calculated from $X_P[n]_k$ and $X_P'[n]_k$. Define the integer-constrained residual as res[x] = x - round[x], where round[x] denotes rounding x to the nearest integer, that is, round[x] = int[x + 0.5].
The N/2+1 adjusted frequency values $X_F[n]_k$ are:
k=0,1,2,…,N/2
In formula (4), $X_F[n]_k$ are the N/2+1 adjusted frequency values normalized to N/2, that is, the frequency values lie in the range [0, N/2]. The adjusted frequency value $X_F[n]_k$ and the physical frequency $f_k$ of the signal are related by
$$f_k = X_F[n]_k \cdot f_s / N \quad (5)$$
where $f_s$ denotes the sampling frequency of the digital signal.
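The correspondence between adjusted and physical frequency can be sketched as a pair of conversion helpers. The function names are hypothetical, and the relation used is an assumption inferred from the statement elsewhere in this description that a physical frequency is normalized by dividing it by the sampling frequency $f_s$ and multiplying by N.

```python
def to_adjusted(freq_hz, sample_rate_hz, n_fft):
    """Physical frequency in Hz -> adjusted frequency normalized to N/2."""
    return freq_hz / sample_rate_hz * n_fft


def to_physical_hz(adjusted_freq, sample_rate_hz, n_fft):
    """Adjusted frequency normalized to N/2 -> physical frequency in Hz."""
    return adjusted_freq * sample_rate_hz / n_fft
```

Under this assumption, with $f_s$ = 8000 Hz and N = 256, a 200 Hz fundamental corresponds to an adjusted value of 6.4, and the upper limit N/2 = 128 corresponds to 4000 Hz.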
In step 202, the N/2+1 adjusted frequency values $X_F[n]_k$ are divided into a plurality of subbands using the adjusted frequency value corresponding to the fundamental frequency value; these subbands comprise a fundamental tone subband and overtone subbands.
During subband division, the division can be described by the boundary spectral index of each subband. For convenience of description, the vector $band_{bound}$ is used to denote the set of subband boundary spectral indices.
The left boundary of the fundamental tone subband (subband 0) is fixed. It can be set to $band_{bound}[0] = int[f_0/2]$, to $band_{bound}[0] = ceil[f_0/2]$, or to $band_{bound}[0] = int[f_0/2 + 0.5]$, which correspond to rounding $f_0/2$ down, rounding it up, and rounding it to the nearest integer, respectively. Here $f_0$ denotes the adjusted fundamental frequency value, that is, the fundamental frequency value normalized to N/2: the original physical fundamental frequency divided by the sampling frequency $f_s$ and multiplied by N.
To the left of the fundamental tone subband, the spectral indices from 0 to $band_{bound}[0]$ form the DC subband. The DC subband can be mapped separately, copied directly, or discarded. The fundamental tone and overtone subbands are divided by using the adjusted fundamental frequency value $f_0$ as a reference to find the spectral amplitude valleys between the fundamental tone and the overtones. Specifically, for subband b, the left boundary $k_s$ is defined as the known right boundary of the previous subband plus 1, that is, $k_s = band_{bound}[b] + 1$. Taking $F_b = F_{b-1} + f_0$ as the center (with initial value $F_0 = 3 f_0 / 2$), the spectral amplitude minimum between subband b and subband b+1, that is, the amplitude valley, is searched for within a range of several frequencies around $F_b$; this search range can be set to $[int[F_b - f_0/2], int[F_b + f_0/2]]$. The position index $k_e$ of the amplitude valley found by the search is the right boundary $band_{bound}[b+1]$ of subband b. After subband b has been divided, $F_b$ can be corrected as appropriate according to the boundary deviation: if $F_b$ is too far from $k_e$, $F_b$ can be forcibly increased or decreased to reduce the deviation.
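The valley search described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: the function name is hypothetical, the round-down variant is used for `band_bound[0]`, the search window is clipped to the valid index range, and the optional correction of $F_b$ after each subband is omitted.

```python
def divide_subbands(amplitudes, f0):
    """Return boundary indices [band_bound[0], band_bound[1], ...].

    amplitudes : list of N/2+1 spectrum amplitudes for one frame
    f0         : adjusted fundamental frequency, normalized to N/2
    band_bound[0] is the left boundary of the fundamental tone subband;
    band_bound[b + 1] is the amplitude valley that closes subband b.
    """
    last = len(amplitudes) - 1
    band_bound = [int(f0 / 2)]        # round-down variant of the left boundary
    center = 1.5 * f0                 # F_0 = 3 * f0 / 2
    while True:
        # search window [int(F_b - f0/2), int(F_b + f0/2)], clipped so that
        # each boundary lies strictly to the right of the previous one
        lo = max(int(center - f0 / 2), band_bound[-1] + 1)
        hi = min(int(center + f0 / 2), last)
        if lo > hi:
            break
        k_e = min(range(lo, hi + 1), key=lambda k: amplitudes[k])
        band_bound.append(k_e)
        if hi == last:                # reached the end of the spectrum
            break
        center += f0                  # F_b = F_(b-1) + f0
    return band_bound
```

For a synthetic spectrum with peaks at multiples of 8 and valleys halfway between them, calling the function with `f0 = 8` places the boundaries on the valleys.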
Fig. 3 is a schematic diagram of the spectrum of a speech signal divided into a plurality of subbands according to the method described above. In the figure, the first subband from the left is the fundamental tone subband (the DC subband lies to the left of the fundamental tone subband and is not marked), and the subbands to the right of the fundamental tone subband are overtone subbands.
Methods for obtaining the fundamental frequency are known to those skilled in the art and are not described in detail here. The fundamental frequency $f_0$ of the current speech frame can be obtained by pitch detection, that is, pitch period extraction. Pitch detection can be implemented, for example, by the autocorrelation method, the average magnitude difference method, the cepstrum method, the linear prediction method, and so on. The fundamental frequency $f_0$ obtained is the adjusted frequency value normalized to N/2, that is, the original physical fundamental frequency divided by the sampling frequency $f_s$ and multiplied by N.
In step 203, at least one adjusted frequency value $X_F[n]_k$ contained in each of the one or more consecutive subbands is mapped to a conversion frequency value $X_F'[n]_k$; the at least one adjusted frequency value $X_F[n]_k$ includes the adjusted frequency value corresponding to the frequency value with the maximum amplitude value in that subband.
With the subband as the basic unit, the frequency values of the spectrum in a subband can be mapped to specified frequency values according to a given fundamental frequency transformation rule, thereby building a new spectrum that realizes the fundamental frequency transformation. When mapping frequency values, the frequency values $X_F[n]_k$ of the entire spectrum contained in each of the selected one or more consecutive subbands can be mapped to conversion frequency values $X_F'[n]_k$. Alternatively, according to some mapping principle, only the frequency values $X_F[n]_k$ of part of the spectrum in a subband are mapped to conversion frequency values $X_F'[n]_k$; these partial frequency values include the adjusted frequency value corresponding to the frequency value with the maximum amplitude in the subband, where $X_F[n]_{k\_b}$ denotes the adjusted frequency value corresponding to the frequency value with the maximum amplitude in subband b. The selected one or more consecutive subbands can also include the subband with the largest amplitude peak among all subbands.
The goal of the frequency mapping is for the reconstructed spectrum to realize the fundamental frequency transformation. The fundamental frequency after transformation is:
$$f_0' = f_{rate} f_0 + f_{offset} \quad (6)$$
where $f_{rate}$ denotes the pitch variation rate and $f_{offset}$ denotes the pitch offset. $f_0$ is the adjusted fundamental frequency, that is, the adjusted frequency value normalized to N/2 that corresponds to the fundamental frequency value, and $f_0'$ is the conversion frequency value obtained by mapping the adjusted fundamental frequency value $f_0$. The transformation of $f_0$ is realized by adjusting the two parameters $f_{rate}$ and $f_{offset}$, which control, respectively, the fluctuation and the shift of the adjusted frequency value $f_0$ corresponding to the fundamental frequency value. If $f_{rate}$ > 1, the fluctuation of $f_0$ is amplified; conversely, if $f_{rate}$ < 1, the fluctuation of $f_0$ is reduced. $f_{offset}$ adjusts the reference point about which $f_0$ fluctuates. The pitch variation rate $f_{rate}$ can range from 0 to 10, and the pitch offset $f_{offset}$ can range from -N/4 to N/4.
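The effect of the two control parameters can be illustrated with a short sketch of formula (6). The function name is hypothetical, and treating the stated parameter ranges as hard limits that raise an error is an assumption made for the example; the text only says the parameters can take values in those ranges.

```python
def transform_fundamental(f0, f_rate, f_offset, n_fft):
    """Formula (6): f0' = f_rate * f0 + f_offset (all normalized to N/2)."""
    if not 0 <= f_rate <= 10:
        raise ValueError("f_rate is expected to lie in [0, 10]")
    if abs(f_offset) > n_fft / 4:
        raise ValueError("f_offset is expected to lie in [-N/4, N/4]")
    return f_rate * f0 + f_offset


# A pitch track that fluctuates between 6 and 7 around 6.5 ...
track = [6.0, 6.5, 7.0]
# ... keeps its center but doubles its fluctuation with f_rate = 2, f_offset = -6.5
wider = [transform_fundamental(f, 2.0, -6.5, 256) for f in track]
```

Here `wider` is `[5.5, 6.5, 7.5]`: the rate scales the swing of the track and the offset moves the reference point about which it swings.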
When frequency mapping is performed with the subband as the basic unit, the fundamental frequency transformation rule is followed: the center frequency value $f_b$ of subband b before conversion is mapped to the center frequency value $f_{b'}'$ of subband b' after conversion, where $f_b = (b+1) f_0$. The mapping of an arbitrary adjusted frequency value $X_F[n]_k$ in the subband to the corresponding conversion frequency value $X_F'[n]_k$ can be performed according to the following formulas:
$$X_F'[n]_k = r_{wide} \times (X_F[n]_k - f_b) + f_{b'}'$$
$$= r_{wide} \times (X_F[n]_k - (b+1) f_0) + (b'+1) f_{rate} f_0 + (b'+1) f_{offset} \quad (7)$$
$$f_{b'}' = (b'+1) f_0'$$
$$f_0' = f_{rate} f_0 + f_{offset}$$
where $f_{rate}$, $f_{offset}$, $f_0$ and $f_0'$ are defined as in formula (6), and $r_{wide}$ in formula (7) is the bandwidth control coefficient, used to control the change of the subband bandwidth $f_{wide\_b}$.
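Formula (7) can be written as a small function for illustration. The function and argument names are hypothetical; the arithmetic follows the formula term by term.

```python
def map_frequency(x_f, b, b_prime, f0, f_rate, f_offset, r_wide=1.0):
    """Formula (7): map an adjusted frequency in subband b to subband b'.

    x_f      : adjusted frequency value X_F[n]_k inside source subband b
    b        : source subband index (0 = fundamental tone subband)
    b_prime  : target subband index after conversion
    f0       : adjusted fundamental frequency, normalized to N/2
    r_wide   : bandwidth control coefficient
    """
    f_b = (b + 1) * f0                     # center of the source subband
    f0_new = f_rate * f0 + f_offset        # formula (6)
    f_b_new = (b_prime + 1) * f0_new       # center of the target subband
    return r_wide * (x_f - f_b) + f_b_new
```

With $f_0$ = 8, $f_{rate}$ = 1.5 and $f_{offset}$ = 0, a value of 17 in subband 1 (center 16) maps to 25 in converted subband 1 (center 24) when $r_{wide}$ = 1, and to 24.5 when $r_{wide}$ = 0.5, since the offset from the center is scaled by the bandwidth control coefficient.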
Fig. 7 is a schematic diagram of the frequency mapping and bandwidth conversion of a single subband. The subband bandwidth $f_{wide\_b}$ is the difference between the frequency value with the highest spectral index and the frequency value with the lowest spectral index in subband b.
If $r_{wide}$ = 1, the offsets of the other frequency values in the subband relative to the center frequency remain unchanged before and after the conversion. If $r_{wide} > f_0'/f_0$, the converted subband bandwidth may exceed the spacing between adjacent center frequencies, so that converted subbands overlap, which causes signal aliasing and degrades the quality of the converted signal. Therefore, when performing a mapping that lowers the fundamental frequency, $r_{wide}$ needs to be reduced so that it does not exceed $f_0'/f_0$, appropriately narrowing the converted subbands and avoiding subband aliasing.
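The bound on $r_{wide}$ can be made explicit with a short worked inequality. This is only a sketch, under the assumption (not stated in the text) that each subband before conversion is about $f_0$ wide, so that its converted width is about $r_{wide} f_0$, while adjacent converted center frequencies are $f_0'$ apart:

```latex
r_{wide}\, f_0 \le f_0'
\quad\Longleftrightarrow\quad
r_{wide} \le \frac{f_0'}{f_0}
```

For example, with $f_0 = 8$, $f_{rate} = 0.75$ and $f_{offset} = 0$, formula (6) gives $f_0' = 6$, so under this assumption $r_{wide}$ would be kept at or below $6/8 = 0.75$.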
Considering that the adjusted frequency value $X_F[n]_{k\_b}$ with the maximum amplitude in subband b is approximately equal to the center frequency value $f_b = (b+1) f_0$, formula (7) can also be approximated by a combination of one or more $X_F[n]_{k\_b}$, b = 0, 1, 2, ..., for example:
$$X_F'[n]_k = r_{wide} \times (X_F[n]_k - X_F[n]_{k\_b}) + X_F[n]_{k\_b'} f_{rate} + (b'+1) f_{offset} \quad (8)$$
With the subband as the basic unit, one or more consecutive subbands are selected for mapping. The fundamental frequency transformation through subband frequency mapping can be implemented in two ways.
In the first way, several consecutive subbands are used as the seed of the mapping and are mapped cyclically until the whole frequency axis is filled. The subbands used as the seed can be some of the subbands or all of them. Using several consecutive subbands as the seed preserves more of the characteristics of the original speech signal.
As a rule, several consecutive subbands with larger energy are selected. For example, M consecutive subbands starting from the fundamental tone subband are chosen as the seed of the mapping; the M subbands comprise 1 fundamental tone subband and M-1 overtone subbands. The fundamental tone subband is subband 0. Its center frequency $f_0$ (the adjusted frequency value $X_F[n]_{k\_0}$ corresponding to the frequency value with the maximum amplitude in subband 0 is approximately equal to $f_0$) is mapped to $f_0'$ (from subband 0 before conversion to subband 0 after conversion); the center frequency $f_1$ of the 1st overtone subband is mapped to $f_1' = 2 f_0'$ (from subband 1 before conversion to subband 1 after conversion); the center frequency $f_2$ of the 2nd overtone subband is mapped to $f_2' = 3 f_0'$ (from subband 2 before conversion to subband 2 after conversion); and so on, until the center frequency $f_{M-1}$ of the (M-1)-th overtone subband has been mapped to $f_{M-1}' = M f_0'$ (from subband M-1 before conversion to subband M-1 after conversion). If the frequency axis has not been filled at that point, that is, if $f_{M-1}'$ is not greater than or equal to N/2, a second round of mapping and copying is carried out: the center frequency $f_0$ of subband 0 (the fundamental tone subband) is mapped to $f_M' = M f_0' + f_0'$ (from subband 0 before conversion to subband M after conversion), the center frequency $f_1$ of the 1st overtone subband is mapped to $f_{M+1}' = M f_0' + 2 f_0'$ (from subband 1 before conversion to subband M+1 after conversion), and so on until the whole frequency axis is filled.
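The cyclic seed mapping described above can be sketched as a plan of (source subband, target subband, target center frequency) entries. The function name and the returned tuple layout are hypothetical, and the stopping rule (stop once a target center frequency reaches N/2) is a reading of the fill condition in the text rather than a confirmed detail.

```python
def cyclic_seed_targets(f0_new, num_seed, n_fft):
    """Plan the cyclic mapping of M seed subbands over the frequency axis.

    f0_new   : converted fundamental f0' (normalized to N/2), must be > 0
    num_seed : M, number of consecutive seed subbands starting at subband 0
    n_fft    : N, the DFT length
    Returns a list of (source subband b, target subband b', target center).
    """
    if f0_new <= 0:
        raise ValueError("f0_new must be positive for the axis to fill up")
    plan = []
    b_prime = 0
    while True:
        center = (b_prime + 1) * f0_new          # f_b'' = (b' + 1) * f0'
        plan.append((b_prime % num_seed, b_prime, center))
        if center >= n_fft / 2:                  # frequency axis is filled
            break
        b_prime += 1
    return plan
```

With $f_0'$ = 12, M = 3 and N = 128, the plan visits source subbands 0, 1, 2, 0, 1, 2 with target centers 12, 24, 36, 48, 60, 72; the fourth entry is the start of the second round, where subband 0 is mapped to $M f_0' + f_0'$ = 48. Setting `num_seed` to 1 repeats a single seed subband over the axis.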
Fig. 4 and Fig. 5 show schematic diagrams of the mapping with several consecutive subbands as the seed, for lowering and for raising the fundamental frequency, respectively.
In the second way, only a single subband (for example subband b) is used as the seed of the mapping. The center frequency $f_b$ of that subband is mapped cyclically and repeatedly to the target frequencies $f_{b'}' = (b'+1) f_0'$ (b' = 0, 1, 2, ...) (from subband b before conversion to subband b' after conversion) until the whole frequency axis is filled, that is, until the target frequency $f_{b'}'$ is greater than or equal to N/2.
Fig. 6 shows a schematic diagram of the repeated mapping with a single subband as the seed.
In step 204, the conversion spectral index k' corresponding to the conversion frequency value $X_F'[n]_k$ is calculated. Specifically, the following formula can be used:
$$k' = round(X_F'[n]_k) \quad (9)$$
where the function round() denotes rounding to the nearest integer. Formula (9) expresses the principle that the index follows the frequency mapping.
The conversion spectral index k' is the basis for reconstructing the spectrum after the frequency mapping.
When the conversion spectral index k' > N/2, reconstruction of the spectrum related to that conversion spectral index can be abandoned.
When the conversion spectral index k' < 0, reconstruction of the spectrum related to that conversion spectral index can likewise be abandoned.
In step 205, based on the conversion frequency value $X_F'[n]_k$, the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to the conversion spectral index k' is obtained.
When only one conversion frequency value $X_F'[n]_k$ corresponds to the conversion spectral index k' through formula (9), i.e. when a conversion spectral index k' corresponds to only one spectral index k, the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to this conversion spectral index k' can be set equal to $X_F'[n]_k$.
When multiple conversion frequency values $X_F'[n]_k$ correspond to one conversion spectral index k' through formula (9), i.e. when a conversion spectral index k' corresponds to multiple spectral indices k, the principle that frequency follows the maximum amplitude can be applied: the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to this conversion spectral index k' is the conversion frequency value $X_F'[n]_k$ of the adjusted frequency value $X_F[n]_k$ that has the largest amplitude among the frequency values of those spectral indices k.
The reconstructed frequency value $X_{RF}[n]_{k'}$ of any conversion spectral index k' that is not produced by formula (9) can be set to 0.
The reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k' can be obtained by the above method.
In step 206, based on the amplitude value $X_A[n]_k$, the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to the conversion spectral index k' is obtained.
Specifically, as in step 205, when only one conversion frequency value $X_F'[n]_k$ corresponds to the conversion spectral index k' through formula (9), i.e. when a conversion spectral index k' corresponds to only one spectral index k, the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index k' is $X_A[n]_k$.
When multiple conversion frequency values $X_F'[n]_k$ correspond to one conversion spectral index k' through formula (9), i.e. when a conversion spectral index k' corresponds to multiple spectral indices k, the amplitude summation principle can be applied: the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index k' is the sum of the amplitude values $X_A[n]_k$ of those spectral indices k, or the mean-square sum of those amplitude values $X_A[n]_k$.
The reconstructed amplitude value $X_{RA}[n]_{k'}$ of any conversion spectral index k' that is not produced by formula (9) can be set to 0.
The reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to each conversion spectral index k' can be obtained by the above method.
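Steps 205 and 206 can be sketched together in Python. The sketch below assumes plain lists of non-negative amplitudes and already-mapped frequencies, uses the plain-sum variant of the amplitude summation principle, and rounds half up for formula (9); the function and variable names are illustrative only.

```python
import math

def rebuild_spectrum(x_a, x_f_mapped, half_n):
    """Return (x_rf, x_ra), each of length half_n + 1.

    x_a        -- amplitude values X_A[n]_k (assumed non-negative)
    x_f_mapped -- conversion frequency values X_F'[n]_k, same length as x_a
    """
    x_rf = [0.0] * (half_n + 1)     # bins that are never hit stay at 0
    x_ra = [0.0] * (half_n + 1)
    best = [-1.0] * (half_n + 1)    # largest contributing amplitude seen per k'
    for amp, freq in zip(x_a, x_f_mapped):
        k_new = math.floor(freq + 0.5)      # formula (9), round half up
        if k_new < 0 or k_new > half_n:     # out-of-range bins are not rebuilt
            continue
        x_ra[k_new] += amp                  # amplitude summation principle
        if amp > best[k_new]:               # frequency follows the largest amplitude
            best[k_new] = amp
            x_rf[k_new] = freq
    return x_rf, x_ra
```

For example, if two source bins with amplitudes 1.0 and 3.0 both land on k' = 1, the rebuilt amplitude is 4.0 and the rebuilt frequency is taken from the bin with amplitude 3.0.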
Next, based on the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k', the reconstructed phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index k' can be calculated, so that a new voice signal can be synthesized from the reconstructed phase value $X_{RP}[n]_{k'}$ and the reconstructed amplitude value $X_{RA}[n]_{k'}$.
Changing the fundamental frequency of the voice has a pronounced effect on its characteristics; in addition, to further strengthen the speaker-characteristic effect of the voice signal, the spectral envelope can also be adjusted.
In step 207, the reconstructed amplitude value $X_{RA}[n]_{k'}$ from step 206 is multiplied by the subband gain value of the conversion subband in which it lies and/or by the equalization (EQ) gain value, to obtain the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
To facilitate adjustment of the spectral envelope, after the subbands have been divided and before the frequency values contained in one or more consecutive subbands are mapped, the amplitudes of each subband's frequencies can also be normalized: the amplitude value of each spectral index in the one or more consecutive subbands is divided by the maximum amplitude value in its subband, so that the normalized amplitudes lie in [0, 1] with a maximum of 1.
Fig. 8 shows a schematic diagram of the amplitude normalization of the subband frequencies.
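The normalization can be sketched as follows. The subband layout is passed as a hypothetical list of half-open `(start, stop)` index pairs, and a subband whose peak is zero is left unchanged; both choices are assumptions of this sketch. The per-subband peaks are returned as well, since they form the subband envelope before conversion.

```python
def normalize_subbands(x_a, bands):
    """Divide each amplitude by the peak of its subband.

    x_a   -- amplitude values per spectral index
    bands -- list of (start, stop) index pairs, stop exclusive (assumed layout)
    Returns (normalized amplitudes, per-subband peak values).
    """
    normalized = list(x_a)
    envelope = []
    for start, stop in bands:
        peak = max(x_a[start:stop], default=0.0)
        envelope.append(peak)
        if peak > 0.0:                      # avoid dividing by zero in silent subbands
            for k in range(start, stop):
                normalized[k] = x_a[k] / peak
    return normalized, envelope
```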
After the reconstructed amplitude value $X_{RA}[n]_{k'}$ has been obtained, the spectral envelope is adjusted. Specifically, the reconstructed amplitude value $X_{RA}[n]_{k'}$ can be multiplied by the corresponding subband gain value and/or EQ gain value.
For the subband gain value, all normalized spectral amplitude values in a subband after conversion can be multiplied by an independent subband gain value corresponding to that subband; the subband gain values correspond one-to-one to the subbands contained in the reconstructed frequency values.
Fig. 9 shows a schematic diagram of envelope adjustment applied to the subband-normalized spectral amplitudes.
The subband gain values used for spectral envelope adjustment can be an arbitrarily set group of gain values. To retain and exploit more information from the original voice signal, the subband gain values can also be obtained by adjusting the subband spectral envelope before conversion. The subband envelope before conversion is a group of gain values that stores the maximum amplitude value of each original subband spectrum after subband division. The conversion spectral index corresponding to the maximum amplitude value in each subband of the reconstructed frequencies is transformed linearly or nonlinearly to obtain a new spectral index. Below, $f_{scale}$ denotes the parameter that adjusts the frequency envelope curve.
When linear scaling is used, for a subband b in the reconstructed frequency values, let $k_b$ be the conversion spectral index corresponding to the maximum amplitude value in the subband, and let $k_b'$ be the new spectral index obtained after linearly scaling $k_b$. Then:
$k_b' = k_b \times f_{scale}$ (10)
When nonlinear scaling is used, a monotonically increasing scaling function can be set up:
$k_b' = F_{scale}[k_b, f_{scale}]$ (11)
This monotonically increasing scaling function satisfies:
$F_{scale}[0, f_{scale}] = 0$
$F_{scale}[N/2, f_{scale}] = N/2$ (12)
For example, one such scaling function is:
Whether linear or nonlinear scaling is used, after the conversion spectral index corresponding to the maximum amplitude value in subband b has been scaled from $k_b$ to $k_b'$, the subband number before conversion has to be looked up. A specific method can be as follows:
Compare band_bound[i] one by one, i = 0, 1, 2, ..., bands, where bands is the total number of subbands before conversion. If $k_b'$ satisfies band_bound[b'] < $k_b'$ <= band_bound[b'+1], the subband gain value of subband b after conversion is the b'-th gain value in the subband envelope before conversion; if $k_b'$ satisfies band_bound[bands] < $k_b'$, the subband gain value of b is the last gain value in the subband envelope before conversion. Fig. 10 and Fig. 11 respectively show schematic diagrams of adjusting the spectral envelope with the linear and the nonlinear scaling method. In other words, when the new spectral index is not greater than the maximum spectral index in the last subband before conversion, the subband envelope gain is the maximum amplitude value of the pre-conversion subband in which the new spectral index lies. When the new spectral index is greater than the maximum spectral index in the last subband before conversion, the subband envelope gain is the maximum amplitude value of the last subband before conversion, the last subband being the subband with the highest frequency values.
For the EQ gain value, the EQ gain $G = [g_0, g_1, g_2, \ldots, g_{N/2}]$ can be a preset vector with N/2+1 components that correspond one-to-one to the N/2+1 reconstructed amplitude values $X_{RA}[n]_{k'}$. Its purpose is to apply forced band equalization and global gain adjustment. For example, if the global gain is to be set to 2, i.e. the final volume of the converted voice is to be doubled, G = [2, 2, 2, ..., 2] can be used. If the low frequencies of the converted voice (the first quarter of the spectrum) are to be doubled, G = [2, 2, 2, ..., 2, 1, 1, 1, ..., 1] can be used, where the first quarter of the components are 2 and all following components are 1. To obtain a smoother effect, the value 2 in the above example can also be changed gradually to the value 1 over several components, for example [2, ..., 1.7, ..., 1.3, ..., 1].
The spectral envelope adjustment can also combine the subband gain value and the EQ gain value. For example, the reconstructed amplitude value $X_{RA}[n]_{k'}$ can be multiplied by the corresponding subband gain value and then by the corresponding EQ gain value, to obtain the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
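A short Python sketch of an EQ gain vector with a gradual transition, and of the combined gain application, is given below. The linear ramp shape and all function and parameter names are assumptions chosen for illustration; any gently varying transition would serve the same purpose.

```python
def eq_gain_vector(size, boosted, ramp, low_gain=2.0, high_gain=1.0):
    """Build an EQ gain vector of `size` components (size = N/2 + 1).

    The first `boosted` components get low_gain, the next `ramp` components
    move linearly towards high_gain, and the rest get high_gain.
    """
    gains = []
    for k in range(size):
        if k < boosted:
            gains.append(low_gain)
        elif k < boosted + ramp:
            t = (k - boosted + 1) / (ramp + 1)   # fraction of the way to high_gain
            gains.append(low_gain + t * (high_gain - low_gain))
        else:
            gains.append(high_gain)
    return gains

def apply_gains(x_ra, subband_gain_per_bin, eq_gain):
    """Multiply each rebuilt amplitude by its subband gain, then by its EQ gain."""
    return [a * g * e for a, g, e in zip(x_ra, subband_gain_per_bin, eq_gain)]
```

Setting `boosted=size` with `ramp=0` gives the pure global gain case G = [2, 2, ..., 2].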
The step of synthesizing a new voice signal from the reconstructed frequency value $X_{RF}[n]_{k'}$ and the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$ can specifically comprise step 208 and step 209.
In step 208, the reconstructed phase value $X_{RP}[n]_{k'}$ is calculated from the reconstructed frequency value $X_{RF}[n]_{k'}$.
The reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k' can be used to calculate the reconstructed phase value $X_{RP}[n]_{k'}$ corresponding to that conversion spectral index k'. Specifically, the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index k' and the reconstructed phase value $X_{RP}[n-1]_{k'}$ of the previous frame are used to obtain the reconstructed phase spectrum $X_{RP}[n]_{k'}$ of the current frame. The specific formula is:
Here, k' = 0, 1, 2, ..., N/2; $X_{RP}[n-1]_{k'}$ denotes the reconstructed phase value of the previous frame, whose initial value is zero, i.e. $X_{RP}[0]_{k'} = 0$; M is the number of sampling points in the output time interval; and res[x] = x - round[x].
Reconstructing the voice signal with phase measurement in this way yields high-quality reconstructed speech that is smooth in both time and frequency.
In step 209, a new voice signal is synthesized from the reconstructed phase value $X_{RP}[n]_{k'}$ and the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
The voice signal synthesis method is introduced below only by way of example. Those skilled in the art will understand how to synthesize a new voice signal from the reconstructed phase values and the envelope-adjusted amplitude values.
Converting the reconstructed phase value $X_{RP}[n]_{k'}$ and the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$ from polar coordinates to rectangular coordinates gives the N-point reconstructed spectrum $X_R[n]_{k'}$:
A windowed Inverse Discrete Fourier Transform (IDFT) is applied to the reconstructed spectrum $X_R[n]_{k'}$ to obtain the windowed time-domain signal $d_w(n)$:
$d_w(n) = [d(0), d(1), \ldots, d(N-1)] \cdot h_{syn} = [d(0) h_{syn}(0), d(1) h_{syn}(1), \ldots, d(N-1) h_{syn}(N-1)]$ (16)
Here, $h_{syn}$ is the synthesis window function; a Hamming window, a Hanning window or a sine window can be used.
The signal $d_w(n)$ is overlap-added to obtain the L-point time-domain output, i.e. the time-domain audio signal:
$z(n) = d_w(n) + z(n-1)$ (17)
$x_R(n)[l] = z(n)[l], \quad l = 0, 1, 2, \ldots, L-1$ (18)
After the output has been obtained, the buffer is updated:
$z(n)[l] = z(n)[l+L], \quad l = 0, 1, 2, \ldots, N-L-1$ (19)
$z(n)[l] = 0, \quad l = N-L, N-L+1, \ldots, N-1$
Here, the initial value of z(n) is zero.
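The overlap-add bookkeeping of formulas (17) to (19) can be sketched for one frame as follows. The buffer and frame are plain Python lists of length N and `hop` plays the role of L; the function name is hypothetical.

```python
def overlap_add_step(z_prev, d_w, hop):
    """One overlap-add step: returns (hop output samples, updated buffer).

    z_prev -- buffer carried over from the previous frame, length N (zeros at start)
    d_w    -- windowed IDFT output of the current frame, length N
    hop    -- number of output samples per frame, L
    """
    if len(z_prev) != len(d_w):
        raise ValueError("buffer and frame must have the same length")
    z = [a + b for a, b in zip(z_prev, d_w)]   # formula (17): accumulate the new frame
    out = z[:hop]                              # formula (18): first L samples are output
    z_next = z[hop:] + [0.0] * hop             # formula (19): shift left by L, zero the tail
    return out, z_next
```

Calling the function once per frame, starting from an all-zero buffer, and concatenating the returned `out` lists gives the output time-domain signal.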
It should be noted that, for the synthesis of the voice signal in step 209, if the spectral envelope is not adjusted, the new voice signal is synthesized from the reconstructed phase value $X_{RP}[n]_{k'}$ and the reconstructed amplitude value $X_{RA}[n]_{k'}$; that is, the reconstructed amplitude value $X_{RA}[n]_{k'}$ replaces the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$ in step 209.
In addition, for an audio signal that contains background noise or other non-speech content, the voice signal can first be identified within the audio signal. For example, the start point and end point of the voice signal can be found in the background noise by speech endpoint detection, also called Voice Activity Detection (VAD). Its goal is to separate the voice signal from other signals (such as ambient noise) within a segment of audio and to determine the endpoints of the speech accurately. Voice activity detection can be implemented in different ways, for example by methods well known to those skilled in the art such as short-time energy thresholds, short-time zero-crossing rate, short-time spectral entropy and spectral change rate. Voice activity detection outputs its result frame by frame, i.e. it decides whether the current frame contains a voice signal. If the current frame does not contain speech, its amplitude spectrum and frequency spectrum can be analyzed and then synthesized in a voice signal synthesis step similar to step 104.
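A minimal frame-level sketch of two of the features named above, short-time energy and short-time zero-crossing rate, is shown below. The decision uses the energy threshold only, and the threshold value is a hypothetical tuning parameter that would have to be chosen for the actual recording conditions; practical detectors usually combine several features.

```python
def short_time_energy(frame):
    """Mean squared sample value of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_has_speech(frame, energy_threshold):
    """Per-frame decision based on short-time energy alone (assumed rule)."""
    return short_time_energy(frame) > energy_threshold
```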
Apparatus for transforming voice signal characteristics
Fig. 12 is a structural schematic diagram of an apparatus embodiment corresponding to the method for transforming voice signal characteristics provided by the present invention.
The apparatus comprises a subband division unit 1201, a frequency mapping unit 1202, a spectrum reconstruction unit 1203 and a signal synthesis unit 1204.
The subband division unit 1201 is configured to divide the spectrum of the voice signal into multiple subbands using the fundamental frequency value of the voice signal, the multiple subbands comprising a fundamental tone subband and overtone subbands.
The frequency mapping unit 1202 is configured to map the frequency values contained in one or more consecutive subbands to conversion frequency values, based on a preset pitch variation rate and pitch offset, to obtain the conversion subband corresponding to each such subband.
The spectrum reconstruction unit 1203 is configured to obtain a reconstructed spectrum based on the conversion frequency values.
The signal synthesis unit 1204 is configured to synthesize a new voice signal from the reconstructed spectrum.
The new voice signal obtained after the voice characteristic transformation has characteristics different from those of the original voice signal. Fig. 13 shows the spectrogram of the original voice signal, which is a male voice reading aloud "sending Young female". Fig. 14 to Fig. 17 show the spectrograms of the voice signals after characteristic transformation by one embodiment of the transform method provided by the present invention. The target voice signals after transformation are, respectively, a deeper male voice, a loud and clear female voice, a sharp child's voice and a machine voice reading aloud. The sampling frequency of the original voice signal is 8 kHz, N is 512, L is 128, all window functions are Hanning windows, and the envelope scaling is linear. The spectrograms of the transformed voice signals are shown in Fig. 14 to Fig. 17:
Fig. 14 is the spectrogram of the voice signal obtained by transforming the original male reading voice into a deeper male voice, with the transformation parameters $f_{rate}=0.7$, $f_{offset}=0$, $r_{wide}=1$, $f_{scale}=1.1$, G=1.
Fig. 15 is the spectrogram of the voice signal obtained by transforming the original male reading voice into a loud and clear female voice, with the transformation parameters $f_{rate}=1.7$, $f_{offset}=2$, $r_{wide}=1$, $f_{scale}=0.8$, G=1.
Fig. 16 is the spectrogram of the voice signal obtained by transforming the original male reading voice into a sharp child's voice, with the transformation parameters $f_{rate}=2$, $f_{offset}=8$, $r_{wide}=1$, $f_{scale}=0.6$, G=1.
Fig. 17 is the spectrogram of the voice signal obtained by transforming the original male reading voice into a machine voice, with the transformation parameters $f_{rate}=0.1$, $f_{offset}=6$, $r_{wide}=1$, $f_{scale}=1$, G=1.
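For convenience, the four parameter sets of Fig. 14 to Fig. 17 can be collected in a small Python table. The dictionary name and the preset keys are hypothetical labels; only the numeric values are taken from the description.

```python
# Transformation parameters reported for Fig. 14 to Fig. 17 (preset names are illustrative).
VOICE_PRESETS = {
    "deeper_male":  {"f_rate": 0.7, "f_offset": 0, "r_wide": 1, "f_scale": 1.1, "G": 1},
    "clear_female": {"f_rate": 1.7, "f_offset": 2, "r_wide": 1, "f_scale": 0.8, "G": 1},
    "sharp_child":  {"f_rate": 2.0, "f_offset": 8, "r_wide": 1, "f_scale": 0.6, "G": 1},
    "machine":      {"f_rate": 0.1, "f_offset": 6, "r_wide": 1, "f_scale": 1.0, "G": 1},
}
```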
In actual listening tests, the target characteristics of the new voice signal are distinct, and the converted voice is of good quality, stable and smooth.
The method and apparatus for transforming voice signal characteristics according to the present invention have thus been described in detail. To avoid obscuring the concept of the present invention, some details known in the art have not been described. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the method steps is for illustration only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise stated. In addition, in some embodiments, the present invention can also be implemented as programs recorded on a recording medium, the programs comprising machine-readable instructions for implementing the method according to the present invention. The present invention thus also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the claims.