CN104599682A

CN104599682A - Method for extracting pitch period of telephone wire quality voice

Info

Publication number: CN104599682A
Application number: CN201510017199.3A
Authority: CN
Inventors: 常亮; 唐昆; 崔慧娟
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2015-01-13
Filing date: 2015-01-13
Publication date: 2015-05-06

Abstract

The invention discloses a method for extracting the pitch period of telephone-line-quality speech. The method comprises the following steps: applying nonlinear processing to the original speech, and calculating a first time-domain autocorrelation function of the original speech and a second time-domain autocorrelation function of the nonlinearly processed speech; merging the first and second time-domain autocorrelation functions to obtain a third time-domain autocorrelation function; calculating a long-term pitch period for each frame of the original speech and correcting the third time-domain autocorrelation function accordingly; applying LPC inverse filtering to the original speech to obtain a residual signal, applying a fast Fourier transform (FFT) to the residual signal, and calculating a frequency-domain autocorrelation function from the transform result; calculating a time-domain weight and a frequency-domain weight for each pitch period candidate from the third time-domain autocorrelation function and the frequency-domain autocorrelation function, and obtaining the final weight from them; and performing path planning according to the final weights to determine the final pitch period value. The method extracts the pitch period of telephone-line-quality speech with high accuracy.

Description

Method for extracting the pitch period of telephone-line-quality speech

Technical field

The present invention relates to the technical field of digital voice communication, and in particular to a method for extracting the pitch period of telephone-line-quality speech.

Background art

The pitch period is a very important parameter in speech compression coding and is also used by many other speech-related technologies. Correct extraction of the pitch period is a prerequisite for reliable digital voice communication.

Current pitch period extraction techniques target speech with a complete spectrum, that is, speech covering 60-4000 Hz, and achieve high extraction accuracy on it. Telephone-line-quality speech does not refer only to speech from the telephone system; it also includes other speech whose spectrum is incomplete because it has passed through a 300-3400 Hz bandpass filter, such as speech from analog walkie-talkies. Because most of the fundamental frequency of telephone-line-quality speech has been filtered out (the fundamental frequency of the human voice ranges over 60-400 Hz), the true pitch period may not correspond to the maximum of the autocorrelation function and may not even appear among the candidate values. Current extraction techniques rely heavily on the autocorrelation function, so their accuracy on such speech is low and serious errors occur: for example, male voices become thin and sharp and female voices become thick and heavy. This degrades not only the listening experience but also speaker identification and the intelligibility of the spoken content.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above.

To this end, the object of the present invention is to propose a method for extracting the pitch period of telephone-line-quality speech. The method combines the time domain and the frequency domain and achieves high accuracy when extracting the pitch period of telephone-line-quality speech.

To achieve this object, embodiments of the present invention propose a method for extracting the pitch period of telephone-line-quality speech, comprising the following steps: applying nonlinear processing to the input original speech, and calculating a first time-domain autocorrelation function of the original speech and a second time-domain autocorrelation function of the nonlinearly processed speech; merging the first time-domain autocorrelation function and the second time-domain autocorrelation function to obtain a third time-domain autocorrelation function; calculating the long-term pitch period of each frame of the original speech, and correcting the third time-domain autocorrelation function according to the long-term pitch period; applying LPC inverse filtering to the original speech to obtain a residual signal, applying an FFT to the residual signal, and calculating a frequency-domain autocorrelation function from the transform result; calculating the time-domain weight and the frequency-domain weight of each pitch period candidate from the third time-domain autocorrelation function and the frequency-domain autocorrelation function, and obtaining the final weight of the pitch period candidate from the time-domain weight and the frequency-domain weight; and performing path planning according to the pitch period candidates and their final weights to determine the final pitch period value.

The method according to the embodiments of the present invention combines the time domain and the frequency domain. In the time domain, it introduces a new parameter, the long-term pitch period, and uses the short-term stationarity of speech to correct the autocorrelation function, removing lag values that cannot be the pitch period. In the frequency domain, it calculates a frequency-domain autocorrelation function, and the frequency-domain autocorrelation value corresponding to each pitch period candidate also forms part of that candidate's weight, which increases the weight of the true pitch period. The method can therefore improve the accuracy of pitch period extraction for telephone-line-quality speech.

In addition, the method for extracting the pitch period of telephone-line-quality speech according to the above embodiments of the present invention may also have the following additional technical features:

In some examples, the third time-domain autocorrelation function is calculated by the following formula:

R_{comb}(τ) = \begin{cases} R_{abs}(τ), & R_{abs}(τ) > R_{orig}(τ) \\ R_{orig}(τ), & R_{orig}(τ) > R_{abs}(τ) \end{cases},

where R_{comb}(τ) is the third time-domain autocorrelation function, R_{orig}(τ) is the first time-domain autocorrelation function of the original speech, and R_{abs}(τ) is the second time-domain autocorrelation function of the nonlinearly processed speech.

In some examples, calculating the long-term pitch period of each frame of the original speech specifically comprises:

where l is the frame index, p_{avg}(l) is the long-term pitch period of the current frame, p(l-1) is the long-term pitch period of the previous frame, P_{mid} lies in the region where the male and female pitch period ranges overlap, V_{l-1} equal to 0 or 1 indicates that the previous frame is unvoiced or voiced respectively, G_{l-1} is the energy of the previous frame, and G_0 is the energy threshold.

In some examples, if the previous frame of the speech signal is voiced and its energy is greater than the threshold G_0, the long-term pitch period of the current frame is updated with the long-term pitch period of the previous frame; otherwise it is updated with P_{mid}.

In some examples, the third time-domain autocorrelation function is corrected by the following formula:

where p_{th1} and p_{th2} are two thresholds.

In some examples, p_{th1} = 45 and p_{th2} = 26.

In some examples, if the long-term pitch period is greater than p_{th1}, the autocorrelation function value of each τ lying between p_{min} and p_{th2} is set to 0.

In some examples, applying an FFT to the residual signal and calculating the frequency-domain autocorrelation function from the transform result specifically comprises:

R_{sf}(f) = \frac{1}{2} \left( \frac{\sum_{m=6}^{46} S_{res}(m) S_{res}(m+f)}{\sum_{m=6}^{46} S_{res}(m) S_{res}(m)} + \frac{\sum_{m=24}^{64} S_{res}(m) S_{res}(m+f)}{\sum_{m=24}^{64} S_{res}(m) S_{res}(m)} \right),

where R_{sf}(f) is the frequency-domain autocorrelation function and S_{res}(m) is the FFT result of the residual signal.

In some examples, the final weight of a pitch period candidate is calculated by the following formula:

R_{sx}(τ, f) = α R_{comb}(τ) + (1 - α) R_{sf}(f),

where R_{sx}(τ, f) is the final weight of the pitch period candidate τ, α R_{comb}(τ) is the time-domain weight, (1 - α) R_{sf}(f) is the frequency-domain weight, τ and f correspond to each other, R_{comb}(τ) is the time-domain autocorrelation value, R_{sf}(f) is the frequency-domain autocorrelation value, and α is the weighting factor.

In some examples, α is 0.5.

Additional aspects and advantages of the present invention will be given in part in the following description; in part they will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a method for extracting the pitch period of telephone-line-quality speech according to one embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method for extracting the pitch period of telephone-line-quality speech according to another embodiment of the present invention;

Fig. 3 (a), (b) and (c) show, respectively, the spectrogram of the original speech, the spectrogram of speech synthesized using the conventional method, and the spectrogram of speech synthesized using the method of the embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and shall not be construed as limiting it.

The method for extracting the pitch period of telephone-line-quality speech according to the embodiments of the present invention is described below with reference to the drawings.

Fig. 1 is a flowchart of a method for extracting the pitch period of telephone-line-quality speech according to one embodiment of the present invention. Fig. 2 is a schematic flowchart of the method according to another embodiment. With reference to Fig. 1 and Fig. 2, the method comprises the following steps:

Step S101: apply nonlinear processing (for example, taking the absolute value) to the input original speech, and calculate the first time-domain autocorrelation function of the original speech and the second time-domain autocorrelation function of the nonlinearly processed speech.
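Step S101 can be illustrated with a minimal Python sketch. The function names are hypothetical, and two details are assumptions that the text does not specify: the autocorrelation is normalized by the frame energy so that R(0) = 1, and the mean of the rectified frame is removed before its autocorrelation is computed.

```python
import math

def normalized_autocorr(frame, max_lag):
    """Return [R(0), ..., R(max_lag)], normalized so that R(0) == 1 (assumption)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)  # silent frame: no correlation information
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]

def time_domain_autocorrs(frame, max_lag):
    """Compute R_orig (original frame) and R_abs (absolute-value frame)."""
    r_orig = normalized_autocorr(frame, max_lag)
    rectified = [abs(x) for x in frame]  # nonlinear processing: absolute value
    # Assumption: remove the DC offset that rectification introduces.
    mean = sum(rectified) / len(rectified)
    centered = [x - mean for x in rectified]
    r_abs = normalized_autocorr(centered, max_lag)
    return r_orig, r_abs
```

For a pure sine with a 40-sample period, `r_orig` peaks near lag 40, while `r_abs` peaks near lag 20 because rectification halves the period of a sine. Real speech is not a pure sine, so this only shows that the two functions can emphasize different lags.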

Step S102: merge the first time-domain autocorrelation function and the second time-domain autocorrelation function to obtain the third time-domain autocorrelation function. Specifically, in one embodiment of the present invention, the third time-domain autocorrelation function is calculated, for example, by the following formula:

R_{comb}(τ) = \begin{cases} R_{abs}(τ), & R_{abs}(τ) > R_{orig}(τ) \\ R_{orig}(τ), & R_{orig}(τ) > R_{abs}(τ) \end{cases},

where R_{comb}(τ) is the third time-domain autocorrelation function, R_{orig}(τ) is the first time-domain autocorrelation function of the original speech, and R_{abs}(τ) is the second time-domain autocorrelation function of the nonlinearly processed speech.
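The merge in step S102 takes, lag by lag, the larger of the two autocorrelation values. A minimal sketch follows; the function name `merge_autocorr` is hypothetical. The formula does not state what happens when the two values are equal, and here the common value is used.

```python
def merge_autocorr(r_orig, r_abs):
    """R_comb(tau): the larger of R_abs(tau) and R_orig(tau) at every lag."""
    if len(r_orig) != len(r_abs):
        raise ValueError("both sequences must cover the same lags")
    # max() returns the shared value when the two entries are equal.
    return [max(o, a) for o, a in zip(r_orig, r_abs)]
```

For example, `merge_autocorr([1.0, 0.2, 0.7], [1.0, 0.5, 0.1])` gives `[1.0, 0.5, 0.7]`.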

Step S103: calculate the long-term pitch period (LTAP) of each frame of the original speech, and correct the third time-domain autocorrelation function according to the long-term pitch period.

Specifically, in one embodiment of the present invention, the long-term pitch period of each frame of the original speech is calculated, for example, by the following formula:

where l is the frame index, p_{avg}(l) is the long-term pitch period of the current frame, p(l-1) is the long-term pitch period of the previous frame, P_{mid} lies in the region where the male and female pitch period ranges overlap, V_{l-1} equal to 0 or 1 indicates that the previous frame is unvoiced or voiced respectively, G_{l-1} is the energy of the previous frame, and G_0 is the energy threshold. Further, in some examples, if the previous frame of the speech signal is voiced and its energy exceeds the threshold G_0, the long-term pitch period of the current frame is updated with the pitch period of the previous frame; otherwise it is updated with P_{mid}.
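The selection rule described in prose above can be sketched as follows. The update formula itself is not reproduced in this text, so the sketch covers only the choice of the value used for the update: the previous frame's value when that frame is voiced and energetic enough, and P_{mid} otherwise. The function name and all numeric values in the usage are hypothetical, and the actual formula may combine the selected value with earlier history in a way not shown here.

```python
def long_term_pitch_update_value(prev_pitch, prev_voiced, prev_energy,
                                 p_mid, energy_threshold):
    """Pick the value used to update the current frame's long-term pitch period.

    prev_pitch       -- p(l-1), value carried over from the previous frame
    prev_voiced      -- V_{l-1}: 1 if the previous frame is voiced, 0 if unvoiced
    prev_energy      -- G_{l-1}, energy of the previous frame
    p_mid            -- P_mid, a lag in the male/female overlap region
    energy_threshold -- G_0
    """
    if prev_voiced == 1 and prev_energy > energy_threshold:
        return prev_pitch  # reliable previous frame: follow it
    return p_mid           # unvoiced or weak frame: fall back to P_mid
```

Falling back to P_{mid} keeps the long-term value in a neutral region when the previous frame gives no trustworthy pitch information.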

In one embodiment of the present invention, the third time-domain autocorrelation function is corrected, for example, by the following formula:

where p_{th1} and p_{th2} are two thresholds. In a specific example, based on extensive experimental experience, p_{th1} = 45 and p_{th2} = 26 may be used. Further, if the long-term pitch period is greater than p_{th1}, the autocorrelation function values of the τ values in the range from p_{min} to p_{th2} are set to 0, which removes them as possible pitch period candidates. The reason is that the distance between these τ values and the long-term pitch period exceeds the range of normal variation; if they were not removed, erroneous τ values would interfere with the extraction. Removing them gives the correct pitch period a higher probability of appearing among the candidates and a larger weight, which improves the accuracy of pitch period extraction.
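The correction can be sketched as below. The correction formula is not reproduced in this text, so the sketch follows one reading of the prose: when the long-term pitch period exceeds p_{th1}, every lag from p_{min} to p_{th2} is zeroed. The function name, the inclusive bounds and the value of `p_min` used in the usage note are assumptions.

```python
def correct_autocorr(r_comb, p_avg, p_min, p_th1=45, p_th2=26):
    """Zero out short lags that are implausibly far from the long-term pitch period.

    r_comb -- merged autocorrelation, indexed by lag tau
    p_avg  -- long-term pitch period of the current frame
    p_min  -- smallest lag considered as a candidate (value not given in the text)
    """
    corrected = list(r_comb)  # leave the caller's list untouched
    if p_avg > p_th1:
        last = min(p_th2, len(corrected) - 1)
        for tau in range(p_min, last + 1):  # inclusive bounds (assumption)
            corrected[tau] = 0.0
    return corrected
```

With `p_avg = 50` and a hypothetical `p_min = 16`, lags 16 to 26 are removed from the candidate set. With `p_avg = 40`, nothing changes.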

Step S104: apply LPC inverse filtering to the original speech to obtain a residual signal, apply a fast Fourier transform (FFT) to the residual signal, and calculate the frequency-domain autocorrelation function from the transform result.
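A minimal sketch of LPC inverse filtering followed by a spectrum computation is given below. The text does not state the LPC order, the analysis method or the FFT length, so the sketch assumes the autocorrelation method with the Levinson-Durbin recursion, an order supplied by the caller, and a naive DFT in place of an FFT (both produce the same values). All function names are hypothetical.

```python
import cmath

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion; returns [1, a_1, ..., a_order] of A(z)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: leave remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lpc_residual(frame, order):
    """Inverse filter: e[t] = sum_j a_j * x[t - j], with a_0 = 1."""
    a = lpc_coefficients(frame, order)
    return [
        sum(a[j] * frame[t - j] for j in range(order + 1) if t - j >= 0)
        for t in range(len(frame))
    ]

def magnitude_spectrum(signal, n_fft):
    """DFT magnitudes for bins 0..n_fft//2; the input is zero-padded or truncated."""
    padded = (list(signal) + [0.0] * n_fft)[:n_fft]
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n_fft)
                for t, x in enumerate(padded)))
        for k in range(n_fft // 2 + 1)
    ]
```

Inverse filtering flattens the spectral envelope, so the residual spectrum is dominated by the harmonic structure of the excitation, which is what the frequency-domain autocorrelation then measures.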

In one embodiment of the present invention, the frequency-domain autocorrelation function is calculated, for example, by the following formula:

R_{sf}(f) = \frac{1}{2} \left( \frac{\sum_{m=6}^{46} S_{res}(m) S_{res}(m+f)}{\sum_{m=6}^{46} S_{res}(m) S_{res}(m)} + \frac{\sum_{m=24}^{64} S_{res}(m) S_{res}(m+f)}{\sum_{m=24}^{64} S_{res}(m) S_{res}(m)} \right),
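The formula above averages a normalized correlation over two overlapping bands of FFT bins, 6 to 46 and 24 to 64. A minimal sketch follows, assuming that S_{res}(m) is a real-valued magnitude spectrum; the text only says it is the FFT result. The function name is hypothetical.

```python
def freq_autocorr(spectrum, f):
    """R_sf(f): mean of the normalized spectral correlation over two bin bands."""
    if f < 0 or 64 + f >= len(spectrum):
        raise ValueError("spectrum must contain bins up to 64 + f")

    def band(lo, hi):
        num = sum(spectrum[m] * spectrum[m + f] for m in range(lo, hi + 1))
        den = sum(spectrum[m] * spectrum[m] for m in range(lo, hi + 1))
        return num / den if den > 0.0 else 0.0  # empty band contributes 0

    return 0.5 * (band(6, 46) + band(24, 64))
```

For a spectrum with equal peaks every 8 bins, a shift of `f = 8` aligns every peak with the next one and yields 1.0, while a shift of `f = 4` aligns peaks with gaps and yields 0.0.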

Step S105: calculate the time-domain weight and the frequency-domain weight of each pitch period candidate from the third time-domain autocorrelation function and the frequency-domain autocorrelation function, and obtain the final weight of the pitch period candidate from the time-domain weight and the frequency-domain weight. In other words, the frequency-domain autocorrelation value also forms part of the weight of the pitch period candidate, and the final weight is determined, for example, by the following formula:

R_{sx}(τ, f) = α R_{comb}(τ) + (1 - α) R_{sf}(f),

where R_{sx}(τ, f) is the final weight of the pitch period candidate τ, α R_{comb}(τ) is the time-domain weight, (1 - α) R_{sf}(f) is the frequency-domain weight, τ and f correspond to each other, R_{comb}(τ) is the time-domain autocorrelation value, R_{sf}(f) is the frequency-domain autocorrelation value, and α is the weighting factor. More specifically, in some examples, α is 0.5.
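The weighted sum can be sketched directly. The text states that τ and f correspond but does not give the mapping; the helper `lag_to_bin_shift` below is a hypothetical choice based on the fact that a signal with a period of τ samples has harmonics spaced N/τ bins apart in an N-point FFT.

```python
def final_weight(r_comb_tau, r_sf_f, alpha=0.5):
    """R_sx(tau, f) = alpha * R_comb(tau) + (1 - alpha) * R_sf(f)."""
    return alpha * r_comb_tau + (1.0 - alpha) * r_sf_f

def lag_to_bin_shift(tau, n_fft):
    """Hypothetical tau -> f mapping: harmonic spacing, in bins, for period tau."""
    return round(n_fft / tau)
```

With α = 0.5, a candidate with R_{comb} = 0.8 and R_{sf} = 0.4 receives a final weight of about 0.6. A lag of 32 samples with a 256-point FFT maps to a shift of 8 bins under the assumed mapping.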

Step S106: perform path planning according to the pitch period candidates and their final weights to determine the final pitch period value. More specifically, a cost function is calculated from the final weights of the pitch period candidates, and the minimum-cost path is obtained by dynamic programming to determine the final pitch period value.
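The dynamic programming in step S106 can be sketched as a Viterbi-style search over the candidates of consecutive frames. The text does not define the cost function, so the one used here is purely an assumption: a local cost of 1 minus the final weight, plus a transition cost proportional to the absolute log ratio of consecutive pitch periods. The function name and the `jump_penalty` parameter are hypothetical.

```python
import math

def track_pitch(candidates, jump_penalty=1.0):
    """Pick one pitch period per frame along the minimum-cost path.

    candidates -- one non-empty list per frame of (tau, final_weight) pairs
    """
    if not candidates:
        return []
    # cost[i]: cheapest cumulative cost of a path ending at candidate i
    cost = [1.0 - w for _, w in candidates[0]]
    back = []  # back[t][i]: best predecessor of candidate i in frame t + 1
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for tau, w in cur:
            steps = [
                cost[i] + jump_penalty * abs(math.log(tau / p_tau))
                for i, (p_tau, _) in enumerate(prev)
            ]
            best = min(range(len(steps)), key=steps.__getitem__)
            new_cost.append(steps[best] + (1.0 - w))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last frame.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]
```

In the three-frame input `[[(40, 0.9), (20, 0.5)], [(40, 0.6), (20, 0.7)], [(40, 0.9), (20, 0.4)]]`, the middle frame's highest-weight candidate is 20, but the path search returns 40 for every frame because jumping to half the period and back costs more than the small weight difference. This is the kind of isolated frequency-multiple error that path planning is meant to suppress.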

As a more specific example, as shown in Fig. 3, Fig. 3(a) shows the spectrogram of the original telephone-line-quality speech, Fig. 3(b) shows the spectrogram of speech synthesized from it using a conventional pitch period extraction method, and Fig. 3(c) shows the spectrogram of speech synthesized using the method of the embodiment of the present invention. Both the method of the embodiment and the conventional method were applied in a speech compression system, and Fig. 3 compares the spectrograms of the synthesized speech. Comparing Fig. 3(b) with Fig. 3(c) shows clearly that the method of the embodiment estimates the fundamental frequency accurately, so the spectrogram of its synthesized speech is closer to that of the original telephone-line-quality speech, whereas the conventional method produces obvious frequency-multiple errors.

More specifically, in some examples, tests show that when extracting the pitch period of telephone-line-quality speech, the gross error rate of the method of the embodiment is 46.8% lower than that of the conventional method. For normal speech, the gross error rate of the method of the embodiment is 31.2% lower than that of the conventional method.

In summary, the method for extracting the pitch period of telephone-line-quality speech according to the embodiments of the present invention combines the time domain and the frequency domain. In the time domain, it introduces a new parameter, the long-term pitch period, and uses the short-term stationarity of speech to correct the autocorrelation function, removing lag values that cannot be the pitch period. In the frequency domain, it calculates a frequency-domain autocorrelation function, and the frequency-domain autocorrelation value corresponding to each pitch period candidate also forms part of that candidate's weight, which increases the weight of the true pitch period. The method can therefore improve the accuracy of pitch period extraction for telephone-line-quality speech.

In the description of the present invention, it should be understood that orientation or positional terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial" and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore shall not be construed as limiting the present invention.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically limited otherwise.

In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled" and "fixed" shall be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal connection between two elements or an interaction between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "on", "above" or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "under", "below" or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.

In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present invention.

Claims

1. A method for extracting the pitch period of telephone-line-quality speech, characterized by comprising the following steps:

applying nonlinear processing to the input original speech, and calculating a first time-domain autocorrelation function of the original speech and a second time-domain autocorrelation function of the nonlinearly processed speech;

merging the first time-domain autocorrelation function and the second time-domain autocorrelation function to obtain a third time-domain autocorrelation function;

calculating the long-term pitch period of each frame of the original speech, and correcting the third time-domain autocorrelation function according to the long-term pitch period;

applying LPC inverse filtering to the original speech to obtain a residual signal, applying an FFT to the residual signal, and calculating a frequency-domain autocorrelation function from the transform result;

calculating the time-domain weight and the frequency-domain weight of each pitch period candidate from the third time-domain autocorrelation function and the frequency-domain autocorrelation function, and obtaining the final weight of the pitch period candidate from the time-domain weight and the frequency-domain weight;

performing path planning according to the pitch period candidates and their final weights to determine the final pitch period value.

2. The method for extracting the pitch period of telephone-line-quality speech according to claim 1, characterized in that the third time-domain autocorrelation function is calculated by the following formula:

R_{comb}(τ) = \begin{cases} R_{abs}(τ), & R_{abs}(τ) > R_{orig}(τ) \\ R_{orig}(τ), & R_{orig}(τ) > R_{abs}(τ) \end{cases},

3. The method for extracting the pitch period of telephone-line-quality speech according to claim 1, characterized in that calculating the long-term pitch period of each frame of the original speech specifically comprises:

4. The method for extracting the pitch period of telephone-line-quality speech according to claim 3, characterized in that, if the previous frame of the speech signal is voiced and its energy is greater than the threshold G_0, the long-term pitch period of the current frame is updated with the long-term pitch period of the previous frame; otherwise it is updated with P_{mid}.

5. The method for extracting the pitch period of telephone-line-quality speech according to claim 4, characterized in that the third time-domain autocorrelation function is corrected by the following formula:

where p_{th1} and p_{th2} are two thresholds.

6. The method for extracting the pitch period of telephone-line-quality speech according to claim 5, characterized in that p_{th1} = 45 and p_{th2} = 26.

7. The method for extracting the pitch period of telephone-line-quality speech according to claim 5, characterized in that, if the long-term pitch period is greater than p_{th1}, the autocorrelation function value of each τ lying between p_{min} and p_{th2} is set to 0.

8. The method for extracting the pitch period of telephone-line-quality speech according to claim 1, characterized in that applying an FFT to the residual signal and calculating the frequency-domain autocorrelation function from the transform result specifically comprises:

R_{sf}(f) = \frac{1}{2} \left( \frac{\sum_{m=6}^{46} S_{res}(m) S_{res}(m+f)}{\sum_{m=6}^{46} S_{res}(m) S_{res}(m)} + \frac{\sum_{m=24}^{64} S_{res}(m) S_{res}(m+f)}{\sum_{m=24}^{64} S_{res}(m) S_{res}(m)} \right),

9. The method for extracting the pitch period of telephone-line-quality speech according to claim 1, characterized in that the final weight of the pitch period candidate is calculated by the following formula:

R_{sx}(τ, f) = α R_{comb}(τ) + (1 - α) R_{sf}(f),

10. The method for extracting the pitch period of telephone-line-quality speech according to claim 9, characterized in that α is 0.5.