CN103258539B - Method and device for transforming voice signal characteristics - Google Patents

Publication number
CN103258539B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210033138.2A
Other languages
Chinese (zh)
Other versions
CN103258539A (en)
Inventor
吴晟
李昙
林福辉
张本好
徐晶明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd

Abstract

The invention discloses a method and a device for transforming the characteristics of a voice signal. The method comprises: using the fundamental frequency value of the voice signal to divide the spectrum of the voice signal into multiple subbands, comprising a fundamental subband and overtone subbands; based on a preset pitch change rate and pitch offset, mapping the frequency values contained in one or more consecutive subbands to transformed frequency values; obtaining a reconstructed spectrum from the transformed frequency values; and synthesizing a new voice signal from the reconstructed spectrum. Because the spectrum is divided into subbands according to the fundamental frequency, and the frequency values of one or more consecutive subbands undergo a parameter-controlled mapping, the reconstructed spectrum has specific voice characteristics, so a new voice signal with specific voice characteristics and distinct target features can be obtained from it.

Description

Method and device for transforming voice signal characteristics
Technical field
The present invention relates to the field of voice signal processing, and in particular to a method and a device for transforming voice signal characteristics.
Background
Voice characteristic transformation is a special speech-processing application whose purpose is to convert a digitized voice signal into another voice signal by signal-processing methods. To a listener, the voice before and after the transformation seems to come from different speakers: for example, a male voice becomes a female voice, or the voice of one man becomes the voice of another man. The signal can also be transformed into a voice with other particular effects, for example the voice of a cartoon character or a voice with animal vocal characteristics.
To change the characteristics of a voice signal, model-based and non-model-based methods are usually used.
Model-based methods rely on a mathematical model of speech production, and a parameter database of speaker characteristics is built according to that model. In practice the voice signal must be analyzed and recognized to extract voice information, which is then matched against the parameters of the target model in the parameter database in order to synthesize speech with the target characteristics. Such methods depend on a pronunciation model and resemble a system that converts voice information to elementary information (such as text or standard speech) and back to voice information. They require a parameter database of speaker characteristics to be built by training, which demands a very large amount of work.
Non-model-based methods usually extract characteristic parameters of the voice signal by speech signal processing, transform those parameters in a targeted way, and then reconstruct the voice signal by signal-processing methods. Chinese Patent Application No. 200710163066.2 discloses a method based on transform-domain analysis and reconstruction that changes the characteristics of a voice signal by warping its transform-domain spectrum. For the transformed signal to have the characteristics of the target voice, this non-model-based method still needs a database to be built to guide the transformation. Besides transform-domain methods there are also time-domain methods, such as time-domain pitch-synchronous overlap-add (TD-PSOLA); the speech quality of such methods is poor, so they are not well suited.
Both the model-based and the non-model-based methods above need a database to be built in order to reach a target voice, and therefore both require a very large amount of work.
Summary of the invention
The inventors identified the above problems in the prior art and propose a new technical solution that transforms voice characteristics through a parameter-controlled frequency mapping, without building a parameter database.
An object of the present invention is to provide a method and a device for transforming voice characteristics.
According to one aspect of the present invention, a method for transforming voice signal characteristics is provided, the method comprising:
Dividing the spectrum of the voice signal into multiple subbands using the fundamental frequency value of the voice signal, the subbands comprising a fundamental subband and overtone subbands;
Based on a preset pitch change rate and pitch offset, mapping the frequency values contained in one or more consecutive subbands to transformed frequency values, to obtain the transformed subband corresponding to each subband;
Obtaining a reconstructed spectrum based on the transformed frequency values;
Synthesizing a new voice signal from the reconstructed spectrum.
Preferably, the step of dividing the spectrum of the voice signal into multiple subbands comprises:
Performing an N-point windowed DFT on the voice signal to obtain the amplitude values $X_A[n]_k$ corresponding to N/2+1 frequency values and the adjusted frequency values $X_F[n]_k$ normalized to N/2, where n is the frame number and k is the spectral index, k = 0, 1, 2, ..., N/2;
Using the adjusted frequency value corresponding to the fundamental frequency value, dividing the N/2+1 adjusted frequency values $X_F[n]_k$ into multiple subbands comprising a fundamental subband and overtone subbands.
Preferably, the step of mapping the frequency values contained in one or more consecutive subbands comprises:
Mapping at least one adjusted frequency value $X_F[n]_k$ contained in each of the one or more consecutive subbands to a transformed frequency value $X_F'[n]_k$, the at least one adjusted frequency value including the one corresponding to the frequency value with the largest amplitude in that subband;
The step of obtaining the reconstructed spectrum based on the transformed frequency values comprises:
Computing the transformed spectral index $k'$ corresponding to the transformed frequency value $X_F'[n]_k$ by the following formula:
$$k' = \mathrm{round}(X_F'[n]_k)$$
where the function round() denotes rounding to the nearest integer;
Based on the transformed frequency values $X_F'[n]_k$, obtaining the transformed frequency value $X_F'[n]_{k'}$ corresponding to the transformed spectral index $k'$.
Preferably, the mapping from the adjusted frequency value $X_F[n]_k$ to the corresponding transformed frequency value $X_F'[n]_k$ is performed using the following formulas:
$$X_F'[n]_k = r_{wide}\,(X_F[n]_k - f_b) + f'_{b'} = r_{wide}\,(X_F[n]_k - (b+1) f_0) + (b'+1) f_{rate} f_0 + (b'+1) f_{offset}$$
$$f'_{b'} = (b'+1)\, f_0'$$
$$f_0' = f_{rate} f_0 + f_{offset}$$
where $f_0$ is the adjusted fundamental frequency value normalized to N/2, $f_0'$ is the transformed frequency value to which the normalized fundamental frequency value is mapped, $b$ is the subband number, $f_b$ is the center frequency value of subband $b$, $f'_{b'}$ is the center frequency value in subband $b'$ to which the center frequency value of subband $b$ is mapped, $f_{rate}$ is the pitch change rate, $f_{offset}$ is the pitch offset, and $r_{wide}$ is the bandwidth control coefficient.
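The mapping formula can be illustrated with a short Python sketch. It assumes, as the expanded form of the formula implies, that the center of subband b is (b+1)·f0; the function name `map_frequency` and its argument names are hypothetical and are not taken from the patent.

```python
def map_frequency(x_f, b, b_new, f0, f_rate, f_offset, r_wide=1.0):
    """Map one adjusted frequency value from subband b into subband b_new.

    All frequencies are in normalised (bin) units. Names are illustrative.
    """
    f_b = (b + 1) * f0                    # centre of the source subband
    f0_new = f_rate * f0 + f_offset       # transformed fundamental f0'
    f_b_new = (b_new + 1) * f0_new        # centre of the target subband
    # scale the distance from the source centre, then move to the target centre
    return r_wide * (x_f - f_b) + f_b_new
```

With `f_rate` = 2 and `f_offset` = 0 the fundamental is doubled; a value of `r_wide` below 1 narrows each subband around its new center.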
Preferably, when a transformed spectral index $k'$ corresponds to only one spectral index $k$, the reconstructed frequency value $X_{RF}[n]_{k'}$ for that $k'$ equals $X_F'[n]_k$;
When a transformed spectral index $k'$ corresponds to multiple spectral indices $k$, the reconstructed frequency value $X_{RF}[n]_{k'}$ for that $k'$ is the transformed frequency value $X_F'[n]_k$ of the adjusted frequency value $X_F[n]_k$ that has the largest amplitude among the frequency values of those spectral indices.
Preferably, the step of obtaining the reconstructed spectrum based on the transformed frequency values further comprises:
Based on the amplitude values $X_A[n]_k$, obtaining the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to the transformed spectral index $k'$.
Preferably, when a transformed spectral index $k'$ corresponds to only one spectral index $k$, the reconstructed amplitude value $X_{RA}[n]_{k'}$ for that $k'$ is the amplitude value $X_A[n]_k$ of that spectral index $k$;
When a transformed spectral index $k'$ corresponds to multiple spectral indices $k$, the reconstructed amplitude value $X_{RA}[n]_{k'}$ for that $k'$ is the sum of the amplitude values of those spectral indices, or the mean-square sum of those amplitude values.
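The collision rules for frequency and amplitude can be sketched together in Python. This is a minimal illustration rather than the patented implementation: it uses the plain sum of amplitudes (the mean-square variant is not shown), rounds with floor(x + 0.5), and skips indices below 0 or above N/2, which the text states separately. The name `rebuild_spectrum` and the use of `None` for unmapped bins are assumptions.

```python
import math

def rebuild_spectrum(amps, mapped_freqs):
    """amps[k] and mapped_freqs[k] for k = 0..N/2; None means 'bin not mapped'."""
    n_half = len(amps) - 1
    rec_amp = [0.0] * (n_half + 1)       # reconstructed amplitude per k'
    rec_freq = [None] * (n_half + 1)     # reconstructed frequency per k'
    loudest = [-1.0] * (n_half + 1)      # largest source amplitude seen per k'
    for a, f_new in zip(amps, mapped_freqs):
        if f_new is None:
            continue
        k_new = math.floor(f_new + 0.5)  # k' = round(X_F')
        if k_new < 0 or k_new > n_half:
            continue                     # outside the spectrum: not rebuilt
        rec_amp[k_new] += a              # several k on one k': amplitudes add
        if a > loudest[k_new]:           # frequency comes from the loudest k
            loudest[k_new] = a
            rec_freq[k_new] = f_new
    return rec_amp, rec_freq
```

In this sketch, bins that receive no mapped value keep amplitude 0 and frequency `None`.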
Preferably, the step of synthesizing a new voice signal from the reconstructed spectrum comprises:
Computing, from the reconstructed frequency value $X_{RF}[n]_{k'}$ of each transformed spectral index $k'$, the reconstructed phase value $X_{RP}[n]_{k'}$ for that index;
Synthesizing the new voice signal from the reconstructed phase values $X_{RP}[n]_{k'}$ and the reconstructed amplitude values $X_{RA}[n]_{k'}$.
Preferably, before the frequency values contained in the one or more consecutive subbands are mapped to transformed frequency values, the method further comprises:
Dividing the amplitude value of each spectral index in the one or more consecutive subbands by the largest amplitude value in its subband, so as to normalize the amplitude of the subband.
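As a small illustration of this normalization, the following Python sketch divides every amplitude in a subband by that subband's peak. Representing the subbands as inclusive (start, end) index pairs is an assumption made here for brevity, as is the function name.

```python
def normalize_subbands(amps, bands):
    """Scale each subband so that its largest amplitude becomes 1.

    bands: list of (start, end) spectral index pairs, end inclusive.
    """
    out = list(amps)
    for start, end in bands:
        peak = max(amps[start:end + 1])
        if peak > 0:                      # leave an all-zero subband untouched
            for k in range(start, end + 1):
                out[k] = amps[k] / peak
    return out
```

After this step only the shape inside each subband remains; the overall envelope is reapplied later through the subband and EQ gains.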
Preferably, the method further comprises:
Multiplying the reconstructed amplitude value $X_{RA}[n]_{k'}$ by the subband gain value of the transformed subband in which it lies and/or by an EQ gain value, to obtain the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
Preferably, the method further comprises:
Presetting N/2+1 EQ gain values $g_0, g_1, g_2, \ldots, g_{N/2}$.
Preferably, the subband gain value is the subband envelope gain value corresponding to the spectral index with the largest amplitude value in the transformed subband.
Preferably, the transformed spectral index $k'$ of the largest amplitude value in the transformed subband is linearly or nonlinearly transformed to obtain a new spectral index;
When the new spectral index is not greater than the largest spectral index in the last subband before the transformation, the subband envelope gain is the largest amplitude value in the pre-transformation subband that contains the new spectral index;
When the new spectral index is greater than the largest spectral index in the last subband before the transformation, the subband envelope gain is the largest amplitude value in the last subband before the transformation, the last subband being the subband with the highest frequency values.
Preferably, the step of synthesizing a new voice signal from the reconstructed spectrum comprises:
Computing, from the reconstructed frequency value $X_{RF}[n]_{k'}$ of each transformed spectral index $k'$, the reconstructed phase value $X_{RP}[n]_{k'}$ for that index;
Synthesizing the new voice signal from the reconstructed phase values $X_{RP}[n]_{k'}$ and the envelope-adjusted amplitude values $X_{RA}'[n]_{k'}$.
Preferably, in the step of computing the reconstructed phase value $X_{RP}[n]_{k'}$ from the reconstructed frequency value $X_{RF}[n]_{k'}$ of each transformed spectral index $k'$, the following formula is used:
$$X_{RP}[n]_{k'} = \mathrm{res}\!\left[X_{RP}[n-1]_{k'} + X_{RF}[n]_{k'}\,\frac{M}{N}\right]$$
where $\mathrm{res}[x] = x - \mathrm{round}[x]$, $X_{RP}[0]_{k'} = 0$, and M is the number of sampling points in the output time interval.
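The phase recursion can be sketched in a few lines of Python. The sketch assumes that phases are expressed in cycles (fractions of a full turn), which matches the division by 2π in the analysis formulas, and that round[x] is implemented as floor(x + 0.5). The function names are hypothetical.

```python
import math

def res(x):
    """Residual after rounding to the nearest integer, x - round[x]."""
    return x - math.floor(x + 0.5)

def next_phase(prev_phase, rec_freq, m, n):
    """Advance the phase of one bin by one output hop of m samples.

    prev_phase: reconstructed phase of the previous frame, in cycles
    rec_freq:   reconstructed frequency of this bin, in bins (0..n/2)
    """
    return res(prev_phase + rec_freq * m / n)
```

Starting from a phase of 0 in frame 0 and calling `next_phase` once per frame keeps each bin's phase wrapped to the range from -0.5 up to (but not including) 0.5 cycles.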
Preferably, the one or more consecutive subbands include the subband with the largest amplitude peak.
Preferably, when a transformed spectral index $k' < 0$, the spectrum associated with that index is not reconstructed; likewise, when $k' > N/2$, the spectrum associated with that index is not reconstructed.
Preferably, the fundamental frequency value of the voice signal is obtained by a pitch detection method.
According to another aspect of the present invention, a device for transforming voice signal characteristics is also provided, the device comprising:
A subband division unit, configured to divide the spectrum of the voice signal into multiple subbands using the fundamental frequency value of the voice signal, the subbands comprising a fundamental subband and overtone subbands;
A frequency mapping unit, configured to map, based on a preset pitch change rate and pitch offset, the frequency values contained in one or more consecutive subbands to transformed frequency values, to obtain the transformed subband corresponding to each subband;
A spectrum reconstruction unit, configured to obtain a reconstructed spectrum based on the transformed frequency values;
A signal synthesis unit, configured to synthesize a new voice signal from the reconstructed spectrum.
The method provided by the invention divides the spectrum of the voice signal into subbands using its fundamental frequency value and, based on a preset pitch change rate and pitch offset, maps the frequency values contained in one or more consecutive subbands, so that a reconstructed spectrum with specified voice features is obtained and a voice signal with new voice features is synthesized.
In another preferred embodiment, using the adjusted frequency values $X_F[n]_k$ normalized to N/2, at least one adjusted frequency value $X_F[n]_k$ contained in the one or more consecutive subbands is mapped to a transformed frequency value $X_F'[n]_k$. The mapped values include the adjusted frequency value corresponding to the frequency value with the largest amplitude in the subband, so only part of the frequency values need to be mapped, which reduces the amount of computation.
In another preferred embodiment, subband amplitude normalization is performed before the frequency mapping, and after the mapping the spectral amplitude is envelope-adjusted with subband gain values that carry voice features and/or with EQ gain values, giving a more realistic transformation of voice characteristics.
In another preferred embodiment, a signal reconstruction method with phase measurement is used, giving a high-quality reconstructed voice signal that is smooth in time and frequency.
Other features and advantages of the invention will become clear from the following detailed description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles.
The invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a schematic flowchart of embodiment one of the transform method provided by the invention;
Fig. 2 is a schematic flowchart of embodiment two of the transform method provided by the invention;
Fig. 3 is a schematic diagram of the subband division of the voice signal spectrum in embodiment two of the transform method provided by the invention;
Fig. 4 is a schematic diagram, for one embodiment of the transform method provided by the invention, of a mapping that lowers the fundamental frequency using multiple consecutive subbands as the seed of the mapping;
Fig. 5 is a schematic diagram, for one embodiment of the transform method provided by the invention, of a mapping that raises the fundamental frequency using multiple consecutive subbands as the seed of the mapping;
Fig. 6 is a schematic diagram, for one embodiment of the transform method provided by the invention, of repeated cyclic frequency mapping using a single subband as the seed of the mapping;
Fig. 7 is a schematic diagram, for one embodiment of the transform method provided by the invention, of the frequency mapping and bandwidth transformation of a single subband;
Fig. 8 is a schematic diagram, for one embodiment of the transform method provided by the invention, of normalizing the spectral amplitude of the subbands;
Fig. 9 is a schematic diagram, for one embodiment of the transform method provided by the invention, of the envelope adjustment of the spectral amplitude after subband normalization;
Fig. 10 is a schematic diagram, for one embodiment of the transform method provided by the invention, of adjusting the spectral envelope by linear scaling;
Fig. 11 is a schematic diagram, for one embodiment of the transform method provided by the invention, of adjusting the spectral envelope by nonlinear scaling;
Fig. 12 is a schematic structural diagram of an embodiment of the transform device provided by the invention;
Fig. 13 is the spectrogram of a voice signal of an original male voice reading aloud;
Fig. 14 is the spectrogram of the voice signal after the original male voice is transformed into a deeper male voice;
Fig. 15 is the spectrogram of the voice signal after the original male voice is transformed into a loud, clear female voice;
Fig. 16 is the spectrogram of the voice signal after the original male voice is transformed into a shrill child's voice;
Fig. 17 is the spectrogram of the voice signal after the original male voice is transformed into a machine voice.
Detailed description of the embodiments
Various exemplary embodiments of the invention are now described in detail with reference to the drawings. Note that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention or its application or use.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
Note that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it need not be discussed further in subsequent drawings.
The method and device for transforming voice characteristics provided by the invention are described in detail below.
Method for transforming voice signal characteristics
Embodiment one
Refer to Fig. 1, which is a schematic flowchart of embodiment one of the method for transforming voice signal characteristics provided by the invention.
In step 101, the fundamental frequency value of the voice signal is used to divide the spectrum of the voice signal into multiple subbands, comprising a fundamental subband and overtone subbands.
The transformation of voice characteristics is usually considered from basic acoustics: pitch depends on the frequency of the sound. That frequency is related to the length, thickness and tension of the vocal cords. Short, thin, tight vocal cords vibrate at a higher frequency and give a higher voice; otherwise the voice is lower. The vocal cords of women and children vibrate 150 to 300 times per second, those of infants even faster, and those of men 60 to 200 times per second. A person's voice is higher when excited and lower when depressed. The opening and closing of the vocal cords shapes the airflow into a series of pulses. The time of one opening-and-closing cycle, i.e. the vibration period, is called the pitch period, and its reciprocal is the fundamental frequency. Changing the fundamental frequency of a voice can therefore change its characteristics markedly.
The timbre of a sound is determined by the combination of the fundamental and the overtones that make it up. For different sound sources, even if the frequency and intensity of the fundamental are the same, the intensity and distribution of the overtones (harmonics) differ because of physical structure, so the sounds are perceived as completely different. That people's voices differ is largely due to differences in timbre, which give rise to different voice characteristics.
Based on these features, which strongly affect voice characteristics, the spectrum of the voice signal is divided using its fundamental frequency value into subbands comprising a fundamental subband and overtone subbands. For example, let b be the subband number, b = 0, 1, 2, ..., where 0 denotes the fundamental subband and 1, 2, 3, ... denote the 1st, 2nd, 3rd, ... overtone subbands. The frequency values can then be mapped on the basis of one or more subbands to obtain new frequency values.
In step 102, based on a preset pitch change rate and pitch offset, the frequency values contained in one or more consecutive subbands are mapped to transformed frequency values, to obtain the transformed subband corresponding to each subband.
The fundamental frequency is an important parameter of speaker characteristics in a voice signal. Its range is roughly 70 to 350 Hz and depends on the sex, age and particular circumstances of the speaker: it is lower for elderly men and higher for children and young women.
Because the fundamental frequency is an important feature affecting speaker characteristics, raising or lowering the pitch spectrum of a voice signal markedly changes how the voice is perceived.
The frequencies of the fundamental and overtone subbands undergo a parameter-controlled mapping whose controllable parameters can include the pitch change rate and the pitch offset. Mapping the frequency values with different preset pitch change rates and pitch offsets yields transformed frequency values with specific voice characteristics, and the transformed subband corresponding to each subband.
In step 103, a reconstructed spectrum is obtained from the transformed frequency values. The new spectrum has specific voice characteristics.
In step 104, a new voice signal is synthesized from the reconstructed spectrum. Because the reconstructed spectrum has specific voice characteristics, so does the synthesized signal, which achieves the transformation of voice characteristics.
The spectrum can be divided using the fundamental frequency value in different ways. Embodiment two introduces one specific division method and, based on it, describes the mapping of subband frequencies, the reconstruction of the spectrum, and the synthesis of the voice signal.
Embodiment two
Refer to Fig. 2, which is a schematic flowchart of embodiment two of the method for transforming voice signal characteristics provided by the invention.
In step 201, an N-point windowed discrete Fourier transform (DFT) is performed on the voice signal to obtain the amplitude values $X_A[n]_k$ corresponding to N/2+1 frequency values and the adjusted frequency values $X_F[n]_k$ normalized to N/2, where n is the frame number and k is the spectral index, k = 0, 1, 2, ..., N/2.
The voice signal is windowed and transformed frame by frame, so that its amplitude and frequency values can be analyzed. Specifically, for the voice signal x(t), an N-point windowed DFT is performed at time interval L to obtain $X[n]_k$ and $X'[n]_k$:
$$X[n]_k = \sum_{l=0}^{N-1} x(nL+l)\, h_{ana}(l)\, e^{-j\frac{2\pi}{N} lk} = X_r[n]_k + j\,X_i[n]_k$$
$$X'[n]_k = \sum_{l=0}^{N-1} x(nL+l-K)\, h_{ana}(l)\, e^{-j\frac{2\pi}{N} lk} = X_r'[n]_k + j\,X_i'[n]_k \qquad (1)$$
$$k = 0, 1, 2, \ldots, N/2$$
where $h_{ana}$ is the N-point analysis window function, for which a Hamming window, a Hanning window or a sine window can be used; K is the number of periodic-extension points, 1 ≤ K ≤ L; and the subscript k denotes the k-th element of a vector. Because the voice signal is real, only the first N/2+1 points of the N-point DFT spectrum need to be kept, i.e. the spectral index k ranges from 0 to N/2.
As for the length N, it is generally required to be a power of 2, to suit the fast Fourier transform (FFT), and to give a DFT spectral resolution $f_s/N$ < 32 Hz. For a voice signal with sampling frequency $f_s$ = 8000 Hz, N is greater than or equal to 256. L determines how fast analysis and synthesis are updated: the smaller L is, the faster the update, the smaller the effect of the signal's time variation, and the higher the accuracy of analysis and synthesis, but the amount of computation grows accordingly. Weighing computation against voice quality, L is generally required to be less than or equal to N/4.
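The two sizing rules for N and L can be checked with a small Python helper. It is only a sketch: it picks the smallest power of two that satisfies the resolution requirement and the largest hop the text allows (N/4), whereas the text permits any larger N and any smaller L. The function name and the `max_resolution_hz` parameter are illustrative.

```python
def choose_frame_params(fs, max_resolution_hz=32.0):
    """Return (N, L): smallest power-of-two N with fs / N < max_resolution_hz,
    and the largest hop size L = N / 4 permitted by the text."""
    n = 1
    while fs / n >= max_resolution_hz:
        n *= 2
    return n, n // 4
```

For a sampling frequency of 8000 Hz this gives N = 256 (a resolution of 31.25 Hz) and L = 64.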
Converting $X'[n]_k$ and $X[n]_k$ from rectangular to polar coordinates gives the amplitude spectrum $X_A[n]_k$ of $X[n]_k$, the phase spectrum $X_P[n]_k$ of $X[n]_k$, and the phase spectrum $X_P'[n]_k$ of $X'[n]_k$:
$$X_A[n]_k = \sqrt{X_r[n]_k^2 + X_i[n]_k^2}, \qquad (2)$$
$$k = 0, 1, 2, \ldots, N/2$$
$$X_P[n]_k = \frac{\tan^{-1}\!\left(X_i[n]_k / X_r[n]_k\right)}{2\pi},$$
$$X_P'[n]_k = \frac{\tan^{-1}\!\left(X_i'[n]_k / X_r'[n]_k\right)}{2\pi}, \qquad (3)$$
$$k = 0, 1, 2, \ldots, N/2$$
Formula (2) gives the amplitude values $X_A[n]_k$ corresponding to the N/2+1 frequency values.
The adjusted frequency values $X_F[n]_k$ of the voice signal can be computed from $X_P[n]_k$ and $X_P'[n]_k$. Define the integer residual as $\mathrm{res}[x] = x - \mathrm{round}[x]$, where $\mathrm{round}[x]$ rounds x to the nearest integer, i.e. $\mathrm{round}[x] = \mathrm{int}[x + 0.5]$.
The N/2+1 adjusted frequency values $X_F[n]_k$ are:
$$X_F[n]_k = k + \frac{N}{K}\,\mathrm{res}\!\left[X_P[n]_k - X_P'[n]_k - \frac{kK}{N}\right] \qquad (4)$$
$$k = 0, 1, 2, \ldots, N/2$$
In formula (4), $X_F[n]_k$ are the adjusted frequency values of the N/2+1 frequencies normalized to N/2, i.e. the frequency values lie in the range [0, N/2]. The adjusted frequency value $X_F[n]_k$ corresponds to the physical frequency $f_k$ of the signal as follows:
$$f_k = \frac{f_s\, X_F[n]_k}{N} \qquad (5)$$
where $f_s$ is the sampling frequency of the digital signal.
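Formulas (1) to (5) can be tried out on a single frame with the following standard-library Python sketch. Several choices here are assumptions made for illustration: a Hann window, a direct (slow) DFT sum instead of an FFT, K = 1 by default, and the four-quadrant `cmath.phase` in place of the plain arctangent so that the phase is unambiguous. The function names are hypothetical.

```python
import cmath
import math

def res(x):
    """Residual after rounding to the nearest integer, x - round[x]."""
    return x - math.floor(x + 0.5)

def windowed_dft(x, start, n):
    """Bins 0..n/2 of the Hann-windowed n-point DFT of x[start:start + n]."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * l / n) for l in range(n)]
    return [sum(x[start + l] * win[l] * cmath.exp(-2j * math.pi * l * k / n)
                for l in range(n))
            for k in range(n // 2 + 1)]

def analyse_frame(x, start, n, k_shift=1):
    """Return (amplitudes, adjusted frequencies) for the frame at `start`.

    Requires start >= k_shift, because the second DFT looks k_shift samples back.
    """
    cur = windowed_dft(x, start, n)              # X[n]_k
    prev = windowed_dft(x, start - k_shift, n)   # X'[n]_k, shifted by K samples
    amps = [abs(c) for c in cur]                 # formula (2)
    freqs = []
    for k in range(n // 2 + 1):
        p = cmath.phase(cur[k]) / (2 * math.pi)        # phase in cycles, (3)
        p_prev = cmath.phase(prev[k]) / (2 * math.pi)
        freqs.append(k + n / k_shift * res(p - p_prev - k * k_shift / n))  # (4)
    return amps, freqs
```

For a test tone lying between two bins, the adjusted frequency at the peak bin should recover the true frequency far more precisely than the bin index alone; formula (5) then converts it to hertz as `fs * freqs[k] / n`.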
In step 202, the adjusted frequency value corresponding to the fundamental frequency value is used to divide the N/2+1 adjusted frequency values $X_F[n]_k$ into multiple subbands, comprising a fundamental subband and overtone subbands.
During subband division, the division can be represented by the boundary spectral indices of the subbands. For convenience, the vector band_bound denotes the set of subband boundary spectral indices.
The left boundary of the fundamental subband (subband 0) is fixed. It can be set to band_bound[0] = int[$f_0$/2], to band_bound[0] = ceil[$f_0$/2], or to band_bound[0] = int[$f_0$/2 + 0.5], which round $f_0$/2 down, up, and to the nearest integer, respectively. Here $f_0$ is the adjusted fundamental frequency value, i.e. the fundamental frequency normalized to N/2: the physical fundamental frequency divided by the sampling frequency $f_s$ and multiplied by N.
To the left of the fundamental subband, the spectral indices from 0 to band_bound[0] form the DC subband. The DC subband can be mapped independently, copied directly, or discarded. The fundamental and overtone subbands are divided by using the adjusted fundamental frequency value $f_0$ as a reference to find the spectral amplitude valleys between the fundamental and the overtones. Specifically, for subband b, the left boundary $k_s$ is defined as the known right boundary of the previous subband plus 1, i.e. $k_s$ = band_bound[b] + 1. With $F_b = F_{b-1} + f_0$ as the center (initial value $F_0 = 3f_0/2$), the minimum of the spectral amplitude between subband b and subband b+1, i.e. the amplitude valley, is searched for within a range of several frequency bins around $F_b$; the search range can be set to [int[$F_b - f_0/2$], int[$F_b + f_0/2$]]. The index $k_e$ of the valley found is the right boundary band_bound[b+1] of subband b. After subband b has been divided, $F_b$ can be corrected according to the boundary deviation: if $F_b$ is too far from $k_e$, $F_b$ can be forcibly increased or decreased to reduce the deviation.
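The valley search can be sketched in Python as follows. This is a simplified reading of the procedure, not the patented implementation: it uses int[f0/2] for the first boundary, clips each search range to the spectrum and to the previous boundary, and omits the optional correction of F_b. The function name `divide_subbands` is hypothetical.

```python
def divide_subbands(amps, f0):
    """Return band_bound, the list of subband boundary spectral indices.

    amps: amplitude values for spectral indices 0..N/2
    f0:   fundamental frequency in normalised (bin) units
    """
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    n_half = len(amps) - 1
    band_bound = [int(f0 / 2)]            # left edge of the fundamental subband
    centre = 1.5 * f0                     # F_0 = 3 * f0 / 2
    while band_bound[-1] < n_half:
        lo = max(int(centre - f0 / 2), band_bound[-1] + 1)
        hi = min(int(centre + f0 / 2), n_half)
        if lo > hi:
            break
        # amplitude valley between subband b and subband b + 1
        k_e = min(range(lo, hi + 1), key=lambda k: amps[k])
        band_bound.append(k_e)
        centre += f0                      # F_b = F_(b-1) + f0
    return band_bound
```

On an idealized spectrum with peaks at multiples of `f0`, the returned boundaries fall on the valleys halfway between neighboring peaks; in this sketch the final boundary is the last spectral index.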
Referring to Fig. 3, this figure is a schematic diagram of dividing the spectrum of a voice signal into multiple subbands by the method described above. In the figure, the first subband from the left is the fundamental subband (the DC subband, to the left of the fundamental subband, is not marked), and the subbands to the right of the fundamental subband are overtone subbands.
In addition, methods for obtaining the fundamental frequency are well known to those skilled in the art and are not described in detail here. The fundamental frequency f_0 of the current frame of the voice signal can be obtained by pitch detection, i.e., pitch period extraction. Pitch detection can be implemented by, for example, the autocorrelation method, the average magnitude difference method, the cepstrum method, or the linear prediction method. The fundamental frequency f_0 obtained is an adjusted frequency value normalized to N/2, that is, the physical fundamental frequency divided by the sampling frequency f_s and then multiplied by N.
In step 203, at least one adjusted frequency value X_F[n]_k contained in each of one or more consecutive subbands is mapped to a transformed frequency value X_F'[n]_k. The at least one adjusted frequency value X_F[n]_k includes the adjusted frequency value corresponding to the frequency with the maximum amplitude in that subband.
Taking the subband as the basic unit, the frequency values of the spectrum in a subband can be mapped to specified frequency values according to a fundamental frequency transformation rule, so that a new spectrum is built and the fundamental frequency is transformed. In the mapping, the adjusted frequency values X_F[n]_k of all spectral components contained in each of the one or more consecutive subbands can be mapped to transformed frequency values X_F'[n]_k. Alternatively, only the adjusted frequency values X_F[n]_k of part of the spectrum in a subband can be mapped to transformed frequency values X_F'[n]_k according to some mapping principle; these must include the adjusted frequency value corresponding to the frequency with the maximum amplitude in the subband, which is denoted X_F[n]_k_b for subband b. The selected one or more consecutive subbands can also include the subband that contains the highest amplitude peak among all subbands.
The goal of the frequency mapping is for the reconstructed spectrum to realize a fundamental frequency transformation. The transformed fundamental frequency is:
$$f_0' = f_{rate}\, f_0 + f_{offset} \tag{6}$$
Here f_rate denotes the pitch variation rate and f_offset denotes the pitch offset. f_0 is the adjusted fundamental frequency, i.e., the adjusted frequency value, normalized to N/2, corresponding to the fundamental frequency, and f_0' is the transformed frequency value obtained by mapping f_0. The transformation of f_0 is controlled by the two parameters f_rate and f_offset, which control the fluctuation and the shift of f_0, respectively. If f_rate > 1, the fluctuation of f_0 is amplified; conversely, if f_rate < 1, the fluctuation of f_0 is reduced. f_offset adjusts the reference point of the fluctuation of f_0. The pitch variation rate f_rate can range from 0 to 10, and the pitch offset f_offset can range from -N/4 to N/4.
When the frequency mapping is carried out with the subband as the basic unit, the fundamental frequency transformation rule is followed: the center frequency value f_b of subband b before transformation is mapped to the center frequency value f_b'' of subband b' after transformation, where f_b = (b+1)f_0. Any adjusted frequency value X_F[n]_k in the subband can be mapped to its transformed frequency value X_F'[n]_k according to the following formula:
$$
\begin{aligned}
X_F'[n]_k &= r_{wide}\,\bigl(X_F[n]_k - f_b\bigr) + f_{b'}' \\
          &= r_{wide}\,\bigl(X_F[n]_k - (b+1) f_0\bigr) + (b'+1)\, f_{rate}\, f_0 + (b'+1)\, f_{offset} \\
f_{b'}' &= (b'+1)\, f_0' \\
f_0' &= f_{rate}\, f_0 + f_{offset}
\end{aligned}
\tag{7}
$$
Here f_rate, f_offset, f_0 and f_0' are defined as in formula (6), and r_wide in formula (7) is the bandwidth control coefficient, which controls the change of the subband bandwidth f_wide_b.
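As an illustration only, formulas (6) and (7) can be written as a small Python function. The function name `map_frequency` and its argument names are assumptions made for this sketch.

```python
def map_frequency(x, b, b_new, f0, f_rate=1.0, f_offset=0.0, r_wide=1.0):
    """Map an adjusted frequency value x in subband b to transformed subband b_new.

    All frequencies are in spectral-index units (normalized to N/2).
    """
    f0_new = f_rate * f0 + f_offset      # formula (6): transformed fundamental
    f_b = (b + 1) * f0                   # center of subband b before transformation
    f_b_new = (b_new + 1) * f0_new       # center of subband b' after transformation
    return r_wide * (x - f_b) + f_b_new  # formula (7)
```

For example, with f_0 = 10, f_rate = 1.5 and f_offset = 2, the transformed fundamental is 17; a component at 21 in subband 1 (center 20) lands at 35 in transformed subband 1 (center 34), so its offset of 1 from the center is preserved when r_wide = 1.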
Referring to Fig. 7, the figure is a schematic diagram of the frequency mapping and bandwidth transformation of a single subband. The subband bandwidth f_wide_b is the difference between the frequency value at the highest spectral index and the frequency value at the lowest spectral index in subband b.
If r_wide = 1, the offsets of the other frequency values in a subband relative to the center frequency are the same before and after the transformation. If r_wide > f_0'/f_0, the transformed subband bandwidth may exceed the spacing between adjacent center frequencies, so that transformed subbands overlap; this causes signal aliasing and degrades the quality of the transformed signal. Therefore, when the mapping lowers the fundamental frequency, r_wide needs to be reduced so that it does not exceed f_0'/f_0, which suitably narrows the transformed subbands and avoids aliasing between them.
Since the adjusted frequency value X_F[n]_k_b with the maximum amplitude in subband b is approximately equal to the center frequency value f_b = (b+1)f_0, formula (7) can also be approximated by a combination of one or more X_F[n]_k_b, b = 0, 1, 2, ..., for example:
$$X_F'[n]_k = r_{wide}\,\bigl(X_F[n]_k - X_F[n]_{k\_b}\bigr) + X_F[n]_{k\_b'}\, f_{rate} + (b'+1)\, f_{offset} \tag{8}$$
Taking the subband as the basic unit and selecting one or more consecutive subbands for mapping, the fundamental frequency transformation by subband frequency mapping can be implemented in two ways.
In the first way, several consecutive subbands are used as the mapping seed and are mapped cyclically until the whole frequency axis is filled. The seed subbands can be some of the subbands or all of them. Using several consecutive subbands as the mapping seed preserves more of the characteristics of the original voice signal.
In general, several consecutive subbands with relatively high energy are selected. For example, M consecutive subbands starting from the fundamental subband are chosen as the mapping seed; they comprise 1 fundamental subband and M-1 overtone subbands. The fundamental subband is subband 0. Its center frequency f_0 (the adjusted frequency value X_F[n]_k_0 of the maximum-amplitude frequency in subband 0 is approximately equal to f_0) is mapped to f_0' (from subband 0 before transformation to subband 0 after transformation). The center frequency f_1 of the 1st overtone subband is mapped to f_1' = 2f_0' (from subband 1 before transformation to subband 1 after transformation), the center frequency f_2 of the 2nd overtone subband is mapped to f_2' = 3f_0' (from subband 2 before transformation to subband 2 after transformation), and so on, until the center frequency f_(M-1) of the (M-1)-th overtone subband has been mapped to f_(M-1)' = Mf_0' (from subband M-1 before transformation to subband M-1 after transformation). If the frequency axis has not yet been filled, i.e., f_(M-1)' is less than N/2, a second round of mapping and copying is carried out: the center frequency f_0 of subband 0 (the fundamental subband) is mapped to f_M' = Mf_0' + f_0' (from subband 0 before transformation to subband M after transformation), the center frequency f_1 of the 1st overtone subband is mapped to f_(M+1)' = Mf_0' + 2f_0' (from subband 1 before transformation to subband M+1 after transformation), and so on until the whole frequency axis is filled.
Referring to Fig. 4 and Fig. 5, these figures are schematic diagrams of the mapping that lowers and raises the fundamental frequency, respectively, using several consecutive subbands as the mapping seed.
In the second way, only a single subband (for example, subband b) is used as the mapping seed, and its center frequency f_b is mapped cyclically to the target frequencies f_b'' = (b'+1)f_0' (b' = 0, 1, 2, ...) (from subband b before transformation to subband b' after transformation) until the whole frequency axis is filled, i.e., until the target frequency f_b'' is greater than or equal to N/2.
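As an illustration only, both implementations can be expressed as one cyclic assignment of seed subbands to transformed subbands; a single-subband seed is simply a seed list with one entry. The function name `cyclic_mapping` is an assumption, and stopping just before the target center reaches N/2 is one possible reading of "until the frequency axis is filled".

```python
def cyclic_mapping(seed_subbands, f0_new, half_n):
    """List (source subband b, transformed subband b', target center f_b'') triples.

    seed_subbands: indices of the consecutive subbands used as the mapping seed
    f0_new:        transformed fundamental f_0' in spectral-index units
    half_n:        N/2, the highest spectral index
    """
    if f0_new <= 0:
        raise ValueError("f0_new must be positive, otherwise the axis never fills")
    plan = []
    b_new = 0
    # Target centers are (b'+1) * f_0'; stop once the next center reaches N/2.
    while (b_new + 1) * f0_new < half_n:
        b = seed_subbands[b_new % len(seed_subbands)]  # reuse the seed cyclically
        plan.append((b, b_new, (b_new + 1) * f0_new))
        b_new += 1
    return plan
```

With a seed of subbands 0 to 2, f_0' = 20 and N/2 = 100, transformed subband 3 is filled by a second copy of subband 0, matching the second round of mapping described above.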
Referring to Fig. 6, the figure is a schematic diagram of repeated mapping using a single subband as the mapping seed.
In step 204, the transformed spectral index k' corresponding to each transformed frequency value X_F'[n]_k is calculated. Specifically, the following formula can be used:
$$k' = \mathrm{round}\bigl(X_F'[n]_k\bigr) \tag{9}$$
Here the function round() denotes rounding to the nearest integer (half up). Formula (9) expresses the principle that the index follows the frequency mapping.
The transformed spectral index k' is the basis for reconstructing the spectrum after the frequency mapping.
When the transformed spectral index k' > N/2, reconstruction of the spectrum associated with this transformed spectral index can be abandoned.
When the transformed spectral index k' < 0, reconstruction of the spectrum associated with this transformed spectral index can likewise be abandoned.
In step 205, based on the transformed frequency values X_F'[n]_k, the reconstruction frequency value X_RF[n]_k' corresponding to each transformed spectral index k' is obtained.
When only one transformed frequency value X_F'[n]_k corresponds to the transformed spectral index k' through formula (9), i.e., when a transformed spectral index k' corresponds to only one spectral index k, the reconstruction frequency value X_RF[n]_k' corresponding to this transformed spectral index k' can be set equal to X_F'[n]_k.
When several transformed frequency values X_F'[n]_k correspond to one transformed spectral index k' through formula (9), i.e., when a transformed spectral index k' corresponds to several spectral indices k, the principle that the frequency follows the maximum amplitude can be applied: the reconstruction frequency value X_RF[n]_k' corresponding to this transformed spectral index k' is the transformed frequency value X_F'[n]_k of the adjusted frequency value X_F[n]_k that has the maximum amplitude among the frequency values corresponding to these spectral indices k.
The reconstruction frequency value X_RF[n]_k' of any transformed spectral index k' that is not produced by formula (9) can be set to 0.
The reconstruction frequency values X_RF[n]_k' corresponding to the transformed spectral indices k' can be obtained by the above method.
In step 206, based on the amplitude values X_A[n]_k, the reconstruction amplitude value X_RA[n]_k' corresponding to each transformed spectral index k' is obtained.
Specifically, similarly to step 205, when only one transformed frequency value X_F'[n]_k corresponds to the transformed spectral index k' through formula (9), i.e., when a transformed spectral index k' corresponds to only one spectral index k, the reconstruction amplitude value X_RA[n]_k' corresponding to this transformed spectral index k' is X_A[n]_k.
When several transformed frequency values X_F'[n]_k correspond to one transformed spectral index k' through formula (9), i.e., when a transformed spectral index k' corresponds to several spectral indices k, the amplitude summation principle can be applied: the reconstruction amplitude value X_RA[n]_k' corresponding to this transformed spectral index k' is the sum of the amplitude values X_A[n]_k corresponding to these spectral indices k, or the mean-square sum of the amplitude values X_A[n]_k corresponding to these spectral indices k.
The reconstruction amplitude value X_RA[n]_k' of any transformed spectral index k' that is not produced by formula (9) can be set to 0.
The reconstruction amplitude values X_RA[n]_k' corresponding to the transformed spectral indices k' can be obtained by the above method.
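As an illustration only, steps 204 to 206 can be sketched together in Python. The function name `reconstruct` is an assumption; rounding uses floor(x + 0.5) because Python's built-in `round` rounds halves to even, and only the plain amplitude sum (not the mean-square variant) is implemented.

```python
import math

def reconstruct(freqs_new, amps, half_n):
    """Build reconstruction frequency and amplitude values for indices 0..N/2.

    freqs_new: transformed frequency values X_F'[n]_k of the mapped components
    amps:      amplitude values X_A[n]_k of the same components
    half_n:    N/2, the highest spectral index
    """
    rec_freq = [0.0] * (half_n + 1)   # indices never hit by formula (9) stay 0
    rec_amp = [0.0] * (half_n + 1)
    best_amp = [-1.0] * (half_n + 1)  # largest source amplitude seen per index
    for x_f, x_a in zip(freqs_new, amps):
        k_new = math.floor(x_f + 0.5)         # formula (9), rounding half up
        if k_new < 0 or k_new > half_n:
            continue                          # outside the spectrum: discard
        if x_a > best_amp[k_new]:
            best_amp[k_new] = x_a
            rec_freq[k_new] = x_f             # frequency follows the maximum amplitude
        rec_amp[k_new] += x_a                 # amplitude summation principle
    return rec_freq, rec_amp
```

In the usage below, two components collide on index 1: the frequency of the stronger one (1.4) is kept, and the amplitudes 0.5 and 2.0 add up to 2.5.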
Next, based on the reconstruction frequency value X_RF[n]_k' corresponding to each transformed spectral index k', the reconstruction phase value X_RP[n]_k' corresponding to the transformed spectral index k' can be calculated, so that a new voice signal can be synthesized from the reconstruction phase values X_RP[n]_k' and the reconstruction amplitude values X_RA[n]_k'.
Besides the fundamental frequency, which has a marked effect on voice characteristics, the spectral envelope can also be adjusted to further strengthen the change in the speaker characteristics of the voice signal.
In step 207, the reconstruction amplitude value X_RA[n]_k' from step 206 is multiplied by the subband gain value corresponding to the transformed subband in which it lies and/or by an equalization gain value, to obtain the envelope-adjusted amplitude value X_RA'[n]_k'.
To facilitate the adjustment of the spectral envelope, after the subbands have been divided and before the frequency values contained in the one or more consecutive subbands are mapped, the amplitudes of each subband can also be normalized: the amplitude value corresponding to each spectral index in the one or more consecutive subbands is divided by the maximum amplitude value in its subband, so that the normalized subband amplitudes lie in [0, 1] with a maximum of 1.
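As an illustration only, the per-subband normalization can be sketched as follows. The function name `normalize_subbands` is an assumption, as is the convention that subband b spans spectral indices band_bound[b]+1 to band_bound[b+1]; the returned peaks are the per-subband maxima that can later serve as the subband envelope before transformation.

```python
def normalize_subbands(amplitude, band_bound):
    """Divide each subband's amplitudes by that subband's maximum amplitude.

    Returns (normalized amplitudes, list of per-subband maxima).
    Indices outside every subband (e.g. the DC subband) are left unchanged.
    """
    normalized = list(amplitude)
    peaks = []
    for b in range(len(band_bound) - 1):
        lo, hi = band_bound[b] + 1, band_bound[b + 1]
        peak = max(amplitude[lo:hi + 1])
        peaks.append(peak)
        if peak > 0:                     # leave an all-zero subband untouched
            for k in range(lo, hi + 1):
                normalized[k] = amplitude[k] / peak
    return normalized, peaks
```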
Referring to Fig. 8, the figure is a schematic diagram of the amplitude normalization of the subbands.
After the reconstruction amplitude value X_RA[n]_k' has been obtained, the spectral envelope is adjusted; specifically, the reconstruction amplitude value X_RA[n]_k' can be multiplied by the corresponding subband gain value and/or equalization gain value.
For the subband gain value, all normalized spectral amplitude values in a transformed subband can be multiplied by an independent subband gain value corresponding to that subband; the subband gain values correspond one-to-one to the subbands contained in the reconstructed frequency values.
Referring to Fig. 9, the figure is a schematic diagram of the envelope adjustment of the spectral amplitudes after subband normalization.
The subband gain values used for spectral envelope adjustment can be any set of gain values. To retain and use more of the information in the original voice signal, the subband gain values can also be obtained by adjusting the subband spectral envelope from before the transformation. The subband envelope before transformation is a set of gain values that stores the maximum amplitude value of each original subband spectrum after subband division. The transformed spectral index corresponding to the maximum amplitude value in each subband of the reconstructed frequencies is transformed linearly or nonlinearly to obtain a new spectral index. In the following, f_scale denotes the parameter that adjusts the frequency envelope curve.
When linear scaling is used, for subband b in the reconstructed frequency values, let the transformed spectral index corresponding to the maximum amplitude value in the subband be k_b, and let the new spectral index obtained by linearly scaling k_b be k_b'; then:
$$k_b' = k_b \times f_{scale} \tag{10}$$
When nonlinear scaling is used, a monotonically increasing scaling function can be set up:
$$k_b' = F_{scale}[k_b, f_{scale}] \tag{11}$$
This monotonically increasing scaling function satisfies:
$$
\begin{aligned}
F_{scale}[0, f_{scale}] &= 0 \\
F_{scale}[N/2, f_{scale}] &= N/2
\end{aligned}
\tag{12}
$$
For example, one such scaling function is:
$$k_b' = F_{scale}[k_b, f_{scale}] = \frac{N}{2}\left(\frac{k_b}{N/2}\right)^{f_{scale}} \tag{13}$$
Whether linear or nonlinear scaling is used, after the transformed spectral index corresponding to the maximum amplitude value in subband b has been scaled from k_b to k_b', the subband number before transformation must be looked up. This can be done as follows:
Compare band_bound[i], i = 0, 1, 2, ..., bands, one by one, where bands is the total number of subbands before transformation. If k_b' satisfies band_bound[b'] < k_b' <= band_bound[b'+1], the subband gain value of transformed subband b is the b'-th gain value in the subband envelope before transformation; if k_b' satisfies band_bound[bands] < k_b', the subband gain value of b is the last gain value in the subband envelope before transformation. Fig. 10 and Fig. 11 are schematic diagrams of adjusting the spectral envelope by linear and nonlinear scaling, respectively. That is, when the new spectral index is not greater than the maximum spectral index of the last subband before transformation, the subband envelope gain is the maximum amplitude value of the pre-transformation subband in which the new spectral index lies. When the new spectral index is greater than the maximum spectral index of the last subband before transformation, the subband envelope gain is the maximum amplitude value of the last subband before transformation, the last subband being the subband with the highest frequency values.
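As an illustration only, the index scaling of formulas (10) and (13) and the subsequent gain lookup can be sketched as follows. The function names are assumptions, and the fallback to the first gain value when the scaled index lies at or below band_bound[0] is an assumption, since the text does not cover that case.

```python
def scaled_index(k_b, f_scale, half_n, linear=True):
    """Scale the peak index k_b of a transformed subband.

    linear=True uses formula (10); linear=False uses the power-law example
    of formula (13), which maps 0 to 0 and N/2 to N/2.
    """
    if linear:
        return k_b * f_scale
    return half_n * (k_b / half_n) ** f_scale

def subband_gain(k_scaled, band_bound, envelope):
    """Look up the pre-transformation envelope gain for a scaled index.

    band_bound: boundary indices before transformation (len(envelope) + 1 entries)
    envelope:   maximum amplitude of each subband before transformation
    """
    for b in range(len(envelope)):
        if band_bound[b] < k_scaled <= band_bound[b + 1]:
            return envelope[b]
    if k_scaled > band_bound[-1]:
        return envelope[-1]       # beyond the last subband: use its gain
    return envelope[0]            # assumption: below the first boundary
```

A value of f_scale below 1 under linear scaling makes a transformed subband read its gain from a lower original subband, which stretches the original envelope toward higher frequencies; a value above 1 compresses it.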
For the equalization gain value, the equalization gain G = [g_0, g_1, g_2, ..., g_N/2] can be a preset vector with N/2+1 components, which correspond one-to-one to the N/2+1 reconstruction amplitude values X_RA[n]_k'. Its purpose is to apply forced band equalization and global gain adjustment. For example, if the global gain is to be set to 2, i.e., the volume of the final transformed voice is to be doubled, G = [2, 2, 2, ..., 2] can be used; if the low frequencies of the transformed voice (the first 1/4 of the spectrum) are to be doubled, G = [2, 2, 2, ..., 2, 1, 1, 1, ..., 1] can be used, where the first 1/4 of the components are 2 and the remaining components are all 1. For a smoother effect, the value 2 in the above example can also be made to change gradually to the value 1 over several components, for example [2, ..., 1.7, ..., 1.3, ..., 1].
The adjustment of the spectral envelope can also combine the subband gain value and the equalization gain value: for example, the reconstruction amplitude value X_RA[n]_k' is multiplied by the corresponding subband gain value and then by the corresponding equalization gain value, to obtain the envelope-adjusted amplitude value X_RA'[n]_k'.
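As an illustration only, the low-frequency boost with an optional gradual transition, and the combined multiplication, can be sketched as follows. The function names, the linear ramp shape, and taking the first quarter as (N/2+1)//4 components are assumptions made for this sketch.

```python
def eq_gain_vector(half_n, low_gain=2.0, high_gain=1.0, ramp=0):
    """Build G with N/2+1 components: low_gain on the first quarter, then high_gain.

    ramp is the number of components over which the gain changes linearly
    from low_gain toward high_gain (0 gives an abrupt step).
    """
    n_bins = half_n + 1
    edge = n_bins // 4
    gains = []
    for k in range(n_bins):
        if k < edge:
            gains.append(low_gain)
        elif k < edge + ramp:
            t = (k - edge + 1) / (ramp + 1)            # fraction of the way to high_gain
            gains.append(low_gain + t * (high_gain - low_gain))
        else:
            gains.append(high_gain)
    return gains

def adjust_envelope(rec_amp, subband_gain_per_index, eq_gains):
    """Multiply each reconstruction amplitude by its subband gain and EQ gain."""
    return [a * s * g for a, s, g in zip(rec_amp, subband_gain_per_index, eq_gains)]
```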
The step of synthesizing a new voice signal from the reconstruction frequency values X_RF[n]_k' and the envelope-adjusted amplitude values X_RA'[n]_k' can specifically comprise step 208 and step 209.
In step 208, the reconstruction phase value X_RP[n]_k' is calculated from the reconstruction frequency value X_RF[n]_k'.
The reconstruction frequency value X_RF[n]_k' corresponding to each transformed spectral index k' can be used to calculate the reconstruction phase value X_RP[n]_k' corresponding to that transformed spectral index. Specifically, the reconstruction phase spectrum X_RP[n]_k' of the current frame is obtained from the reconstruction frequency value X_RF[n]_k' corresponding to each transformed spectral index k' and the reconstruction phase value X_RP[n-1]_k' of the previous frame. The formula is:
$$X_{RP}[n]_{k'} = \mathrm{res}\!\left[ X_{RP}[n-1]_{k'} + X_{RF}[n]_{k'}\,\frac{M}{N} \right] \tag{14}$$
Here k' = 0, 1, 2, ..., N/2; X_RP[n-1]_k' denotes the reconstruction phase value of the previous frame, whose initial value is zero, i.e., X_RP[0]_k' = 0; M is the number of samples in the output time interval; and res[x] = x - round[x].
Reconstructing the voice signal with phase values calculated in this way yields high-quality reconstructed speech that is smooth in both time and frequency.
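As an illustration only, the phase update of formula (14) can be sketched as follows. The function names are assumptions; the phase is expressed in cycles (it is multiplied by 2π in formula (15)), so res[] wraps it into [-0.5, 0.5).

```python
import math

def res(x):
    """res[x] = x - round[x], with halves rounded up."""
    return x - math.floor(x + 0.5)

def next_phase(prev_phase, rec_freq, hop, n_fft):
    """Advance the reconstruction phase of every index by one output frame.

    prev_phase: X_RP[n-1]_k' in cycles (all zeros for the first frame)
    rec_freq:   X_RF[n]_k' in spectral-index units
    hop:        M, the number of samples in the output time interval
    n_fft:      N, the transform length
    """
    return [res(p + f * hop / n_fft) for p, f in zip(prev_phase, rec_freq)]
```

With N = 512 and M = 128, a component at frequency 4.0 advances by exactly one cycle per frame, so its wrapped phase stays at 0.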
In step 209, a new voice signal is synthesized from the reconstruction phase values X_RP[n]_k' and the envelope-adjusted amplitude values X_RA'[n]_k'.
The method of voice signal synthesis is introduced below by way of example only. Those skilled in the art will understand how to synthesize a new voice signal from the reconstruction phase values and the envelope-adjusted amplitude values.
The reconstruction phase values X_RP[n]_k' and the envelope-adjusted amplitude values X_RA'[n]_k' are converted from polar to rectangular coordinates to obtain the N-point reconstructed spectrum X_R[n]_k':
$$
X_R[n]_{k'} =
\begin{cases}
X_{RA}'[n]_{k'}\; e^{\,j 2\pi X_{RP}[n]_{k'}}, & k' = 0, 1, \ldots, N/2 \\
X_{RA}'[n]_{N-k'}\; e^{-j 2\pi X_{RP}[n]_{N-k'}}, & k' = N/2+1, \ldots, N-1
\end{cases}
\tag{15}
$$
A windowed inverse discrete Fourier transform (IDFT) is applied to the reconstructed spectrum X_R[n]_k' to obtain the target signal d_w(n):
$$
\begin{aligned}
d_w(n) &= [d(0), d(1), \ldots, d(N-1)] \cdot h_{syn} \\
       &= [d(0)\,h_{syn}(0),\; d(1)\,h_{syn}(1),\; \ldots,\; d(N-1)\,h_{syn}(N-1)] \\
d(l) &= \frac{1}{N} \sum_{k'=0}^{N-1} X_R[n]_{k'}\; e^{\,j \frac{2\pi}{N} l k'}, \quad l = 0, 1, \ldots, N-1
\end{aligned}
\tag{16}
$$
Here h_syn is the synthesis window function; a Hamming window, a Hanning window or a sine window can be used.
The target signal d_w(n) is overlap-added to obtain the L-point time-domain output, i.e., the time-domain audio signal:
$$z(n) = d_w(n) + z(n-1) \tag{17}$$
$$x_R(n)[l] = z(n)[l], \quad l = 0, 1, 2, \ldots, L-1 \tag{18}$$
After the output has been obtained, the buffer is updated:
$$
\begin{aligned}
z(n)[l] &= z(n)[l+L], & l &= 0, 1, 2, \ldots, N-L-1 \\
z(n)[l] &= 0, & l &= N-L, N-L+1, \ldots, N-1
\end{aligned}
\tag{19}
$$
Here the initial value of z(n) is zero.
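As an illustration only, the synthesis of formulas (15) to (19) can be sketched in pure Python. The function names are assumptions; a direct O(N²) IDFT is used for clarity where a practical implementation would use an FFT, and the real part is taken because the conjugate-symmetric spectrum yields a real signal up to rounding.

```python
import cmath
import math

def hermitian_spectrum(amp, phase, n_fft):
    """Formula (15): N-point spectrum from amplitudes and phases (in cycles)."""
    half = n_fft // 2
    spec = [0j] * n_fft
    for k in range(half + 1):
        spec[k] = amp[k] * cmath.exp(2j * math.pi * phase[k])
    for k in range(half + 1, n_fft):
        spec[k] = spec[n_fft - k].conjugate()   # mirror the lower half
    return spec

def windowed_idft(spec, window):
    """Formula (16): inverse DFT followed by the synthesis window h_syn."""
    n_fft = len(spec)
    frame = []
    for l in range(n_fft):
        acc = sum(spec[k] * cmath.exp(2j * math.pi * l * k / n_fft)
                  for k in range(n_fft))
        frame.append((acc / n_fft).real * window[l])
    return frame

def overlap_add(buffer, frame, hop):
    """Formulas (17)-(19): return (L output samples, updated buffer)."""
    z = [b + d for b, d in zip(buffer, frame)]   # (17)
    output = z[:hop]                             # (18)
    new_buffer = z[hop:] + [0.0] * hop           # (19): shift left, zero the tail
    return output, new_buffer
```

For each frame, the output of `overlap_add` is appended to the output stream and the returned buffer is passed to the next call; the buffer starts as N zeros.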
It should be noted that, for the voice signal synthesis in step 209, if the spectral envelope is not adjusted, the new voice signal is synthesized from the reconstruction phase values X_RP[n]_k' and the reconstruction amplitude values X_RA[n]_k', i.e., the reconstruction amplitude value X_RA[n]_k' replaces the envelope-adjusted amplitude value X_RA'[n]_k' in step 209.
In addition, for an audio signal that contains background noise or other non-speech content, the voice signal can first be identified within the audio signal. For example, the start and end points of the voice signal can be found against the background noise by speech endpoint detection, also called voice activity detection (VAD), whose goal is to separate the voice signal from other signals (such as ambient noise) within a segment of audio and to determine the speech endpoints accurately. Voice activity detection can be implemented in different ways, for example by methods well known to those skilled in the art such as a short-time energy threshold, the short-time zero-crossing rate, the short-time spectral entropy, or the spectral change rate. Voice activity detection outputs a detection result per frame, i.e., it judges whether the current frame contains a voice signal. If the current frame contains no speech, its amplitude spectrum and frequency spectrum can still be analyzed and then synthesized in a step similar to the voice signal synthesis of step 104.
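As an illustration only, two of the frame-level measures mentioned above, short-time energy and short-time zero-crossing rate, can be sketched as follows. The function names and the single-threshold decision are assumptions; the threshold value itself is application-dependent and is not given in the text.

```python
def short_time_energy(frame):
    """Mean squared sample value of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_threshold):
    """Hypothetical per-frame decision using the energy threshold alone."""
    return short_time_energy(frame) > energy_threshold
```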
Device for transforming voice signal characteristics
Referring to Fig. 12, this figure is a schematic structural diagram of an embodiment of the transform device corresponding to the method for transforming voice signal characteristics provided by the invention.
The device comprises a subband division unit 1201, a frequency mapping unit 1202, a spectrum reconstruction unit 1203 and a signal synthesis unit 1204.
The subband division unit 1201 is configured to divide the spectrum of the voice signal into multiple subbands using the fundamental frequency value of the voice signal, the multiple subbands comprising a fundamental subband and overtone subbands.
The frequency mapping unit 1202 is configured to map the frequency values contained in one or more consecutive subbands to transformed frequency values, based on a preset pitch variation rate and pitch offset, to obtain the transformed subbands corresponding to those subbands.
The spectrum reconstruction unit 1203 is configured to obtain a reconstructed spectrum based on the transformed frequency values.
The signal synthesis unit 1204 is configured to synthesize a new voice signal using the reconstructed spectrum.
The new voice signal obtained after the voice characteristic transformation has characteristics different from those of the original voice signal. Referring to Fig. 13, the figure shows the spectrogram of the original voice signal, which is a male voice reading aloud "sending Young female". Fig. 14 to Fig. 17 show the spectrograms of the voice signals whose characteristics have been transformed by an embodiment of the transform method provided by the invention. The target voice signals after transformation are, respectively, a deeper male voice, a loud and clear female voice, a sharp child's voice, and a robotic voice. The sampling frequency of the original voice signal is 8 kHz, N is 512, L is 128, the window functions are all Hanning windows, and the envelope scaling is linear. The spectrograms of the transformed voice signals are shown in Fig. 14 to Fig. 17:
Fig. 14 is the spectrogram of the voice signal obtained by transforming the original male voice into a deeper male voice; the transformation parameters are f_rate = 0.7, f_offset = 0, r_wide = 1, f_scale = 1.1, G = 1.
Fig. 15 is the spectrogram of the voice signal obtained by transforming the original male voice into a loud and clear female voice; the transformation parameters are f_rate = 1.7, f_offset = 2, r_wide = 1, f_scale = 0.8, G = 1.
Fig. 16 is the spectrogram of the voice signal obtained by transforming the original male voice into a sharp child's voice; the transformation parameters are f_rate = 2, f_offset = 8, r_wide = 1, f_scale = 0.6, G = 1.
Fig. 17 is the spectrogram of the voice signal obtained by transforming the original male voice into a robotic voice; the transformation parameters are f_rate = 0.1, f_offset = 6, r_wide = 1, f_scale = 1, G = 1.
In actual listening tests, the target characteristics of the new voice signals are distinct, and the quality of the transformed voice signals is good, stable and smooth.
The method and device for transforming voice signal characteristics according to the invention have now been described in detail. To avoid obscuring the concept of the invention, some details well known in the art have not been described. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the invention can be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware and firmware. The order of the method steps given above is for illustration only, and the steps of the method of the invention are not limited to the order specifically described above unless otherwise stated. In addition, in some embodiments the invention can also be embodied as programs recorded on a recording medium, these programs comprising machine-readable instructions for implementing the method according to the invention. The invention therefore also covers a recording medium storing a program for executing the method according to the invention.
Although some specific embodiments of the invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (18)

1. A method for transforming voice signal characteristics, characterized in that the method comprises:
dividing the spectrum of a voice signal into multiple subbands using the fundamental frequency value of the voice signal, the multiple subbands comprising a fundamental subband and overtone subbands;
mapping, based on a preset pitch variation rate and pitch offset, the adjusted frequency values X_F[n]_k contained in one or more consecutive said subbands to transformed frequency values X_F'[n]_k, respectively, to obtain the transformed subband corresponding to each such subband;
obtaining a reconstructed spectrum based on the transformed frequency values;
synthesizing a new voice signal using the reconstructed spectrum;
wherein the mapping of the adjusted frequency value X_F[n]_k to the corresponding transformed frequency value X_F'[n]_k is performed using the following formula:
$$
\begin{aligned}
X_F'[n]_k &= r_{wide}\,\bigl(X_F[n]_k - f_b\bigr) + f_{b'}' \\
          &= r_{wide}\,\bigl(X_F[n]_k - (b+1) f_0\bigr) + (b'+1)\, f_{rate}\, f_0 + (b'+1)\, f_{offset} \\
f_{b'}' &= (b'+1)\, f_0' \\
f_0' &= f_{rate}\, f_0 + f_{offset}
\end{aligned}
$$
wherein f_0 is the adjusted fundamental frequency value normalized to N/2, f_0' is the transformed frequency value obtained by mapping the adjusted fundamental frequency value, b is the subband number, f_b is the center frequency value of subband b, f_b'' is the center frequency value of subband b' to which the center frequency value of subband b is mapped, f_rate denotes the pitch variation rate, f_offset denotes the pitch offset, and r_wide is the bandwidth control coefficient.
2. The method according to claim 1, characterized in that the step of dividing the spectrum of the voice signal into multiple subbands comprises:
performing an N-point windowed discrete Fourier transform on the voice signal to obtain the amplitude values X_A[n]_k corresponding to N/2+1 frequency values and the adjusted frequency values X_F[n]_k normalized to N/2, where n is the frame number and k is the spectral index, k = 0, 1, 2, ..., N/2;
dividing the N/2+1 adjusted frequency values X_F[n]_k into multiple subbands using the adjusted frequency value corresponding to the fundamental frequency value, the multiple subbands comprising a fundamental subband and overtone subbands.
3. The method according to claim 2, characterized in that the step of mapping the adjusted frequency values X_F[n]_k contained in one or more consecutive said subbands comprises:
mapping at least one adjusted frequency value X_F[n]_k contained in each of the one or more consecutive said subbands to a transformed frequency value X_F'[n]_k, the at least one adjusted frequency value X_F[n]_k including the adjusted frequency value corresponding to the frequency with the maximum amplitude value in that subband;
the step of obtaining a reconstructed spectrum based on the transformed frequency values comprises:
calculating the transformed spectral index k' corresponding to the transformed frequency value X_F'[n]_k according to the following formula:
$$k' = \mathrm{round}\bigl(X_F'[n]_k\bigr)$$
wherein the function round() denotes rounding to the nearest integer (half up);
obtaining, based on the transformed frequency values X_F'[n]_k, the transformed frequency value X_F'[n]_k' corresponding to the transformed spectral index k'.
4. method according to claim 3, is characterized in that,
When a conversion spectral index k' only corresponds to a described spectral index k, then corresponding with this conversion spectral index k' reconstruction frequency values X rF[n] k'equal X f' [n] k;
When a conversion spectral index k' corresponds to multiple spectral index k, the reconstruction frequency values X corresponding with this conversion spectral index k' rF[n] k'there is in frequency values corresponding to described multiple spectral index k the value X that adjusts frequency of amplitude peak f[n] kcorresponding conversion frequency value X f' [n] k.
5. method according to claim 4, is characterized in that,
The described step based on described conversion frequency value acquisition reconstructed spectrum also comprises:
Based on described range value X a[n] k, obtain the reconstruction range value X of conversion corresponding to spectral index k' rA[n] k'.
6. The method according to claim 5, characterized in that:
when a conversion spectral index $k'$ corresponds to only one spectral index $k$, the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index $k'$ is the amplitude value $X_A[n]_k$ corresponding to the spectral index $k$;
when a conversion spectral index $k'$ corresponds to multiple spectral indices $k$, the reconstructed amplitude value $X_{RA}[n]_{k'}$ corresponding to this conversion spectral index $k'$ is the sum of the amplitude values corresponding to the multiple spectral indices $k$, or the mean-square sum of the amplitude values corresponding to the multiple spectral indices $k$.
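The amplitude merging rule of claim 6 can be illustrated with a short Python sketch. The function name `rebuild_amplitudes` and the `mode` parameter are hypothetical. The second merging option is read here as the square root of the sum of squared amplitudes, which is an assumption about the translated wording and may not match the original exactly.

```python
import math


def rebuild_amplitudes(k_new_of, amp, mode="sum"):
    """Return a dict k' -> X_RA[n]_k'.

    k_new_of[k] is the conversion spectral index k' of source index k,
    amp[k] is the amplitude value X_A[n]_k.
    """
    groups = {}
    for k_new, a in zip(k_new_of, amp):
        groups.setdefault(k_new, []).append(a)

    merged = {}
    for k_new, values in groups.items():
        if len(values) == 1:
            # One source index only: the amplitude is copied unchanged.
            merged[k_new] = values[0]
        elif mode == "sum":
            merged[k_new] = sum(values)
        else:
            # Assumed reading of the second option: root of the sum of squares.
            merged[k_new] = math.sqrt(sum(v * v for v in values))
    return merged
```

Plain summation preserves total amplitude, while the square-based variant is closer to preserving energy when several bins collapse onto one index.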
7. The method according to claim 6, characterized in that the step of synthesizing a new voice signal using the reconstructed spectrum comprises:
calculating, based on the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index $k'$, the reconstructed phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index $k'$;
synthesizing the new voice signal based on the reconstructed phase value $X_{RP}[n]_{k'}$ and the reconstructed amplitude value $X_{RA}[n]_{k'}$.
8. The method according to claim 7, characterized in that, before the frequency values contained in one or more consecutive said subbands are separately mapped to conversion frequency values, the method further comprises:
dividing the amplitude value corresponding to each spectral index in the one or more consecutive subbands by the maximum amplitude value in the subband containing it, so as to normalize the amplitudes of the subband.
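The per-subband normalization of claim 8 amounts to dividing every amplitude in a subband by that subband's peak. A minimal Python sketch follows; the function name `normalize_subband` and the handling of an all-zero subband are assumptions not stated in the claim.

```python
def normalize_subband(amps):
    """Divide each amplitude in one subband by the largest amplitude of that subband."""
    peak = max(amps)
    if peak == 0:
        # Assumed behavior: leave a silent subband unchanged to avoid dividing by zero.
        return list(amps)
    return [a / peak for a in amps]
```

After this step the strongest bin of each subband has amplitude 1, so the spectral envelope can be reapplied separately through gain values.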
9. The method according to claim 8, characterized in that the method further comprises:
multiplying the reconstructed amplitude value $X_{RA}[n]_{k'}$ by the subband gain value corresponding to the conversion subband containing it and/or by an equalizer (EQ) gain value, to obtain the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
10. The method according to claim 9, characterized in that the method further comprises:
presetting N/2+1 EQ gain values $g_0, g_1, g_2, \ldots, g_{N/2}$.
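The gain step of claims 9 and 10 can be sketched in Python as below. The function name `apply_gains` and the data layout are hypothetical: `subband_gain_of` is assumed to map each conversion spectral index k' to the gain of the conversion subband containing it, and `eq_gain` is assumed to be the preset list g_0 … g_{N/2} indexed by k'. The claim allows either gain to be used alone; this sketch applies both.

```python
def apply_gains(recon_amp, subband_gain_of, eq_gain):
    """Return a dict k' -> envelope-adjusted amplitude X_RA'[n]_k'.

    recon_amp:       dict k' -> reconstructed amplitude X_RA[n]_k'
    subband_gain_of: dict k' -> gain of the conversion subband containing k' (assumed layout)
    eq_gain:         list of N/2+1 preset EQ gains, indexed by k' (assumed layout)
    """
    return {
        k_new: a * subband_gain_of[k_new] * eq_gain[k_new]
        for k_new, a in recon_amp.items()
    }
```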
11. The method according to claim 9, characterized in that the subband gain value is the subband envelope gain value corresponding to the spectral index having the maximum amplitude value in the conversion subband.
12. The method according to claim 11, characterized in that:
a linear or nonlinear transformation is applied to the conversion spectral index $k'$ corresponding to the maximum amplitude value in the conversion subband to obtain a new spectral index;
when the new spectral index is not greater than the maximum spectral index in the last subband before conversion, the subband envelope gain is the maximum amplitude value in the pre-conversion subband that contains the new spectral index;
when the new spectral index is greater than the maximum spectral index in the last subband before conversion, the subband envelope gain is the maximum amplitude value in the last subband before conversion, the last subband being the subband with the highest frequency values.
13. The method according to claim 9, characterized in that the step of synthesizing a new voice signal using the reconstructed spectrum comprises:
calculating, based on the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index $k'$, the reconstructed phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index $k'$;
synthesizing the new voice signal based on the reconstructed phase value $X_{RP}[n]_{k'}$ and the envelope-adjusted amplitude value $X_{RA}'[n]_{k'}$.
14. The method according to claim 13, characterized in that, in the step of calculating the reconstructed phase value $X_{RP}[n]_{k'}$ corresponding to the conversion spectral index $k'$ based on the reconstructed frequency value $X_{RF}[n]_{k'}$ corresponding to each conversion spectral index $k'$, the calculation is performed according to the following formula:
$$X_{RP}[n]_{k'} = \mathrm{res}\left[X_{RP}[n-1]_{k'} + X_{RF}[n]_{k'}\,\frac{M}{N}\right]$$
where $\mathrm{res}[x] = x - \mathrm{round}[x]$, $X_{RP}[0]_{k'} = 0$, and $M$ is the number of sample points in the output time interval.
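The phase recursion of claim 14 can be sketched in Python as follows. The helper names `res` and `next_phase` are hypothetical. Because res[x] removes the nearest integer, the result stays within half a unit of zero, which suggests the phase is expressed in cycles rather than radians; that reading is an assumption. Halves are rounded upward here, which is also an assumed convention.

```python
import math


def res(x):
    """Residual after removing the nearest integer: res[x] = x - round[x]."""
    return x - math.floor(x + 0.5)


def next_phase(prev_phase, recon_freq, M, N):
    """One step of X_RP[n]_k' = res[X_RP[n-1]_k' + X_RF[n]_k' * M / N].

    prev_phase: X_RP[n-1]_k' (0 for the first frame)
    recon_freq: X_RF[n]_k', in units of spectral bins
    M:          number of sample points in the output time interval
    N:          transform length
    """
    return res(prev_phase + recon_freq * M / N)
```

Starting from a phase of 0 and calling `next_phase` once per frame accumulates the phase advance of each bin while keeping it wrapped.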
15. The method according to claim 3, characterized in that the one or more consecutive subbands include the subband having the largest amplitude peak.
16. The method according to claim 3, characterized in that:
when the conversion spectral index $k' < 0$, reconstruction of the spectrum associated with this conversion spectral index is abandoned;
when the conversion spectral index $k' > N/2$, reconstruction of the spectrum associated with this conversion spectral index is abandoned.
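The range check of claim 16 simply drops any conversion spectral index that falls outside 0 … N/2. A minimal Python sketch, with the hypothetical function name `keep_in_range` and an even N assumed:

```python
def keep_in_range(recon, N):
    """Keep only entries whose conversion spectral index k' satisfies 0 <= k' <= N/2."""
    return {k_new: v for k_new, v in recon.items() if 0 <= k_new <= N // 2}
```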
17. The method according to claim 1, characterized in that the voice signal is detected using a pitch detection method.
18. A device for transforming voice signal characteristics, characterized in that the device comprises:
a subband division unit, configured to divide the spectrum of a voice signal into multiple subbands using the fundamental frequency value of the voice signal, the multiple subbands comprising a fundamental tone subband and overtone subbands;
a frequency mapping unit, configured to map, based on a preset pitch variation rate and pitch offset, the adjusted frequency values $X_F[n]_k$ contained in one or more consecutive said subbands to conversion frequency values $X_F'[n]_k$ respectively, so as to obtain the conversion subband corresponding to each such subband;
a spectrum reconstruction unit, configured to obtain a reconstructed spectrum based on the conversion frequency values;
a signal synthesis unit, configured to synthesize a new voice signal using the reconstructed spectrum;
wherein the mapping from the adjusted frequency value $X_F[n]_k$ to the corresponding conversion frequency value $X_F'[n]_k$ is performed using the following formulas:
$$X_F'[n]_k = r_{wide}\,(X_F[n]_k - f_b) + f'_{b'}$$
$$\phantom{X_F'[n]_k} = r_{wide}\,(X_F[n]_k - (b+1) f_0) + (b'+1) f_{rate} f_0 + (b'+1) f_{offset}$$
$$f'_{b'} = (b'+1) f_0'$$
$$f_0' = f_{rate} f_0 + f_{offset}$$
where $f_0$ is the adjusted fundamental frequency value normalized to N/2, $f_0'$ is the conversion frequency value obtained by mapping the adjusted fundamental frequency value, $b$ is the subband index, $f_b$ is the center frequency value of subband $b$, $f'_{b'}$ is the center frequency value of subband $b'$ obtained by mapping the center frequency value of subband $b$, $f_{rate}$ is the pitch variation rate, $f_{offset}$ is the pitch offset, and $r_{wide}$ is the bandwidth control coefficient.
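The mapping formula of claim 18 can be written as a small Python function. This is a sketch under stated assumptions: the function name `map_frequency` and its argument names are hypothetical, and the subband center frequency is taken as (b+1)·f_0, as the expanded second line of the formula indicates.

```python
def map_frequency(x, b, b_new, f0, f_rate, f_offset, r_wide):
    """Map an adjusted frequency value x in subband b to its conversion frequency value.

    x:        adjusted frequency value X_F[n]_k
    b, b_new: index of the source subband and of the conversion subband b'
    f0:       adjusted fundamental frequency value
    f_rate:   pitch variation rate
    f_offset: pitch offset
    r_wide:   bandwidth control coefficient
    """
    f_b = (b + 1) * f0                # center frequency of subband b
    f0_new = f_rate * f0 + f_offset   # mapped fundamental f_0'
    f_b_new = (b_new + 1) * f0_new    # center frequency of conversion subband b'
    # Scale the distance from the old center, then move it to the new center.
    return r_wide * (x - f_b) + f_b_new
```

With `r_wide = 1` a bin keeps its distance from the subband center, so a harmonic located exactly at the center of subband b lands exactly on the center of conversion subband b'.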
CN201210033138.2A 2012-02-15 2012-02-15 A kind of transform method of voice signal characteristic and device Active CN103258539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210033138.2A CN103258539B (en) 2012-02-15 2012-02-15 A kind of transform method of voice signal characteristic and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210033138.2A CN103258539B (en) 2012-02-15 2012-02-15 A kind of transform method of voice signal characteristic and device

Publications (2)

Publication Number Publication Date
CN103258539A CN103258539A (en) 2013-08-21
CN103258539B true CN103258539B (en) 2015-09-23

Family

ID=48962412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210033138.2A Active CN103258539B (en) 2012-02-15 2012-02-15 A kind of transform method of voice signal characteristic and device

Country Status (1)

Country Link
CN (1) CN103258539B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2963645A1 (en) 2014-07-01 2016-01-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Calculator and method for determining phase correction data for an audio signal
CN105575405A (en) * 2014-10-08 2016-05-11 展讯通信(上海)有限公司 Double-microphone voice active detection method and voice acquisition device
CN105786801A (en) * 2014-12-22 2016-07-20 中兴通讯股份有限公司 Speech translation method, communication method and related device
KR102125410B1 (en) * 2015-02-26 2020-06-22 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. Apparatus and method for processing audio signal to obtain processed audio signal using target time domain envelope
CN105118523A (en) * 2015-07-13 2015-12-02 努比亚技术有限公司 Audio processing method and device
CN107749301B (en) * 2017-09-18 2021-03-09 得理电子(上海)有限公司 Tone sample reconstruction method and system, storage medium and terminal device
CN107958672A (en) * 2017-12-12 2018-04-24 广州酷狗计算机科技有限公司 The method and apparatus for obtaining pitch waveform data
CN109165533B (en) * 2018-08-04 2022-06-03 深圳市马博士网络科技有限公司 Anti-peeping method of short video based on cross-group mechanism
CN109243479B (en) * 2018-09-20 2022-06-28 广州酷狗计算机科技有限公司 Audio signal processing method and device, electronic equipment and storage medium
CN113140225A (en) * 2020-01-20 2021-07-20 腾讯科技(深圳)有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN113066503B (en) * 2021-03-15 2023-12-08 广州酷狗计算机科技有限公司 Audio frame adjusting method, device, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1514931A (en) * 2002-06-07 2004-07-21 株式会社建伍 (Kenwood Corporation) Voice signal interpolation device, method and program
CN1719514A (en) * 2004-07-06 2006-01-11 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences) High-quality real-time voice changing method based on speech analysis and synthesis
CN101740034A (en) * 2008-11-04 2010-06-16 刘盛举 Method for realizing sound speed-variation without tone variation and system for realizing speed variation and tone variation
EP2234414A2 (en) * 2009-03-27 2010-09-29 Starkey Laboratories, Inc. System for automatic fitting using real ear measurement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1927981B1 (en) * 2006-12-01 2013-02-20 Nuance Communications, Inc. Spectral refinement of audio signals

Also Published As

Publication number Publication date
CN103258539A (en) 2013-08-21

Similar Documents

Publication Publication Date Title
CN103258539B (en) A kind of transform method of voice signal characteristic and device
CN106919662B (en) Music identification method and system
Epps et al. A novel instrument to measure acoustic resonances of the vocal tract during phonation
US9368103B2 (en) Estimation system of spectral envelopes and group delays for sound analysis and synthesis, and audio signal synthesis system
CN107851444A Method and system for decomposing an acoustic signal into sound objects, sound object and its use
Raitio et al. Analysis and synthesis of shouted speech.
CN101983402B (en) Speech analyzing apparatus, speech analyzing/synthesizing apparatus, correction rule information generating apparatus, speech analyzing system, speech analyzing method, correction rule information and generating method
Erro et al. HNM-based MFCC+ F0 extractor applied to statistical speech synthesis
CN106997765A Quantitative characterization method for voice timbre
JP2018004870A (en) Speech synthesis device and speech synthesis method
KR20180078252A (en) Method of forming excitation signal of parametric speech synthesis system based on gesture pulse model
CN103559893B Underwater target gammachirp cepstral coefficient auditory feature extraction method
Přibilová et al. Non-linear frequency scale mapping for voice conversion in text-to-speech system with cepstral description
JP7359164B2 (en) Sound signal synthesis method and neural network training method
Chadha et al. A comparative performance of various speech analysis-synthesis techniques
Makhijani et al. Speech enhancement using pitch detection approach for noisy environment
Juvela Perceptual spectral matching utilizing mel-scale filterbanks for statistical parametric speech synthesis with glottal excitation vocoder
Raitio et al. Phase perception of the glottal excitation of vocoded speech
Pohjalainen et al. Weighted linear prediction for speech analysis in noisy conditions.
Ganapathy et al. Temporal resolution analysis in frequency domain linear prediction
Arakawa et al. High quality voice manipulation method based on the vocal tract area function obtained from sub-band LSP of STRAIGHT spectrum
Gupta et al. A new framework for artificial bandwidth extension using H∞ filtering
JP7088403B2 (en) Sound signal generation method, generative model training method, sound signal generation system and program
CN113140204B (en) Digital music synthesis method and equipment for pulsar signal control
Christensen Metrics for vector quantization-based parametric speech enhancement and separation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20170120

Address after: 200127 room 3205F, building 707, Zhang Yang Road, Shanghai, China (Shanghai) free trade zone, No. 32

Patentee after: Xin Xin Finance Leasing Co.,Ltd.

Address before: Zuchongzhi road Shanghai Pudong New Area Zhangjiang High Tech Park of Shanghai City, 201203 Lane 2288 Pudong New Area Spreadtrum Center Building 1

Patentee before: Spreadtrum Communications (Shanghai) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20170711

Address after: 100033 room 2062, Wenstin Executive Apartment, 9 Financial Street, Beijing, Xicheng District

Patentee after: Xin Xin finance leasing (Beijing) Co.,Ltd.

Address before: 200127 room 3205F, building 707, Zhang Yang Road, Shanghai, China (Shanghai) free trade zone, No. 32

Patentee before: Xin Xin Finance Leasing Co.,Ltd.

TR01 Transfer of patent right
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130821

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xin Xin finance leasing (Beijing) Co.,Ltd.

Contract record no.: 2018990000163

Denomination of invention: Method and device for transforming voice signal characteristics

Granted publication date: 20150923

License type: Exclusive License

Record date: 20180626

TR01 Transfer of patent right

Effective date of registration: 20200309

Address after: 201203 Zuchongzhi Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: 100033 room 2062, Wenstin administrative apartments, 9 Financial Street B, Xicheng District, Beijing.

Patentee before: Xin Xin finance leasing (Beijing) Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200605

Address after: 361012 unit 05, 8 / F, building D, Xiamen international shipping center, No.97 Xiangyu Road, Xiamen area, China (Fujian) free trade zone, Xiamen City, Fujian Province

Patentee after: Xinxin Finance Leasing (Xiamen) Co.,Ltd.

Address before: 201203 Zuchongzhi Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 2288

Patentee before: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

TR01 Transfer of patent right
EC01 Cancellation of recordation of patent licensing contract

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xin Xin finance leasing (Beijing) Co.,Ltd.

Contract record no.: 2018990000163

Date of cancellation: 20210301

EC01 Cancellation of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130821

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xinxin Finance Leasing (Xiamen) Co.,Ltd.

Contract record no.: X2021110000010

Denomination of invention: A method and device for transforming voice signal characteristics

Granted publication date: 20150923

License type: Exclusive License

Record date: 20210317

EE01 Entry into force of recordation of patent licensing contract
TR01 Transfer of patent right

Effective date of registration: 20230717

Address after: 201203 Shanghai city Zuchongzhi road Pudong New Area Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: 361012 unit 05, 8 / F, building D, Xiamen international shipping center, 97 Xiangyu Road, Xiamen area, China (Fujian) pilot Free Trade Zone, Xiamen City, Fujian Province

Patentee before: Xinxin Finance Leasing (Xiamen) Co.,Ltd.

TR01 Transfer of patent right