CN108133713A - Method for estimating sound channel area under glottic closed phase - Google Patents
- Publication number
- CN108133713A CN108133713A CN201711206456.3A CN201711206456A CN108133713A CN 108133713 A CN108133713 A CN 108133713A CN 201711206456 A CN201711206456 A CN 201711206456A CN 108133713 A CN108133713 A CN 108133713A
- Authority
- CN
- China
- Prior art keywords
- sound channel
- gci
- glottis
- channel area
- estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a method for estimating the vocal tract area during the glottal closed phase. The method first determines the positions of two adjacent glottal closure instants using the DYPSA algorithm and, taking the interval between them as the analysis unit, computes an attenuating weight excitation function; it then computes the reflection coefficients of a closed-phase vocal tract model by weighted linear prediction, and finally iterates to obtain the discrete vocal tract area function. The superiority of the method is verified from the perspective of inverse filtering; six classes of vocal tract area features are selected for recognition analysis, achieving a 7% accuracy improvement over a feature-fusion optimization algorithm on the same speech corpus.
Description
Technical field
The present invention relates to the technical field of vocal tract area estimation by linear prediction, and more particularly to a method for estimating the vocal tract area during the glottal closed phase.
Background technology
The vocal tract is one of the key systems in speech production, and research on vocal tract shape can be applied to speech synthesis, speech recognition, speech training, music control, and more. Research shows that when producing the same utterance, certain pathological voices (e.g., vocal nodules, vocal cord polyps, hyperthyroid voice) exhibit vocal tract shapes different from those of normal voices. Medical procedures such as X-ray imaging, ultrasound imaging, and MRI (magnetic resonance imaging) can obtain accurate vocal tract areas, but these methods expose the subject to various kinds of radiation and electromagnetic waves, pose potential hazards to the human body, demand expensive equipment, and are complicated and inflexible to operate. Indirect estimation of the vocal tract shape only requires processing speech data and is simple and practical. The main current approaches to vocal tract area estimation are the formant method and the linear-prediction inverse-filtering method, where the inverse-filtering approach depends on assumptions about boundary conditions.
In research on estimating the vocal tract area by linear prediction, two different boundary conditions are used: either the glottis is fully closed, i.e., the glottal reflection coefficient is 1 and vocal tract loss is concentrated at the lips; or the lips are fully closed, i.e., the lip-end reflection coefficient is 1 and vocal tract loss is concentrated at the glottis.
In practice, neither assumption holds well, which hinders estimation of the vocal tract area function: during phonation the glottis opens and closes periodically, and only at very low frequencies can the lip-end radiation impedance be regarded as 0, so the boundary conditions cannot yield reasonable results; moreover, the articulation of certain vowels (such as /a/) violates the assumed conditions.
Deng H. proposed estimating the vocal tract area function during the glottal closed phase, but linked the closed phase only to the amplitude of the glottal wave: the portion of the glottal wave below half of its peak amplitude is regarded as the closed phase. This way of estimating is not strictly accurate, and it leaves insufficient data for autocorrelation analysis.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above shortcomings of the prior art; the present invention proposes a new algorithm based on the glottal closed-phase method, so as to accurately estimate the vocal tract area of closed-phase speech.
The present invention adopts the following technical scheme to solve the above technical problem.
A method for estimating the vocal tract area during the glottal closed phase, comprising the following steps:
Step 1: determine the positions GCI1 and GCI2 of two adjacent glottal closure instants;
Step 2: from the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight excitation function Wn, as follows:
Take the interval between GCI1 and GCI2 as one cycle, and set Wn to d in the neighborhoods of GCI1 and GCI2. With GCI1 as the origin, Wn rises from d to 1 with a constant slope, falls from 1 back to d with a slope of equal absolute value, and remains d thereafter until GCI2, so that Wn forms a trapezoidal piecewise function,
where d is a positive constant less than 1, n indexes the n-th speech sample from the origin, N is the number of samples in one cycle, α and β are the proportions occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weighting function rises from d to 1;
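The trapezoidal weight function of step 2 can be sketched as follows. This is an illustrative reconstruction, not the patent's exact formula (the formula image is not reproduced in the text): in particular, how α and β place the rising edge and the end of the unit plateau is an assumption, since the patent only states that they are the proportions of the segments. The defaults match the preferred values stated later (d = 10^-4, α = 0.05, β = 0.7, NSlope = 7).

```python
import numpy as np

def trapezoid_weights(N, d=1e-4, n_slope=7, alpha=0.05, beta=0.7):
    """Attenuating weight excitation function W_n over one glottal cycle.

    N samples span one cycle from GCI1 (the origin) to GCI2.  W_n is d
    near both GCIs, rises from d to 1 over n_slope samples, stays at 1
    over the closed-phase plateau, and falls back to d with the same
    absolute slope.  The split points derived from alpha and beta are
    assumptions, not the patent's exact definition.
    """
    w = np.full(N, d)
    rise_start = int(alpha * N)           # assumed: offset of the rising edge
    plateau_end = int(beta * N)           # assumed: end of the unit plateau
    rise = np.linspace(d, 1.0, n_slope)   # constant-slope rise from d to 1
    w[rise_start:rise_start + n_slope] = rise
    w[rise_start + n_slope:plateau_end] = 1.0
    w[plateau_end:plateau_end + n_slope] = rise[::-1]  # mirrored fall back to d
    return w
```

The weights stay small near the GCIs, where the main vocal tract excitation occurs, and equal 1 over the closed-phase interior, which is the behavior described in step 2.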
Step 3: compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: iteratively compute the discrete vocal tract area function of the lossless tube model:
Using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless tube model (under the common lossless-tube convention, μm = (A(m+1) − Am)/(A(m+1) + Am), so that A(m+1) = Am·(1 + μm)/(1 − μm)),
where μm is the reflection coefficient, i.e. the m-th prediction coefficient am(m) of the m-th order linear prediction, and Am denotes the cross-sectional area of the m-th tube segment.
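The area recursion of step 4 can be illustrated with a minimal sketch. The sign convention of the lossless-tube relation and the starting area are assumptions here; since the area function is normalized afterwards, the starting area only fixes the overall scale.

```python
import numpy as np

def tract_areas(refl, a_start=1.0):
    """Discrete vocal tract area function of a lossless tube model.

    refl is the sequence of reflection coefficients mu_m (the PARCOR
    coefficients a_m(m) of the m-th order linear prediction).  Uses the
    standard lossless-tube relation
        mu_m = (A_{m+1} - A_m) / (A_{m+1} + A_m),
    i.e. A_{m+1} = A_m * (1 + mu_m) / (1 - mu_m).  The sign convention
    and the starting area a_start are assumptions.
    """
    areas = [a_start]
    for mu in refl:
        areas.append(areas[-1] * (1.0 + mu) / (1.0 - mu))
    return np.asarray(areas)
```

With a 12th-order predictor, as used in the embodiment, this yields 13 node areas from the glottis to the lips, which can then be normalized and interpolated for plotting.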
As a further preferred scheme of the estimation method of the present invention, in step 1, a sustained vowel is chosen from the test set of the database, and the DYPSA algorithm is applied to its speech waveform to determine the positions GCI1 and GCI2 of two adjacent glottal closure instants.
As a further preferred scheme of the estimation method of the present invention, step 3 comprises the following steps:
Step 3.1: compute the mean square error of the weighted linear prediction, E = Σn Wn·en², with en = sn − Σ(i=1..P) ai·s(n−i), where E is the weighted linear prediction mean square error, en is the prediction error, Wn is the weight excitation function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: differentiate the mean square error E computed in step 3.1 and set the partial derivative with respect to every ai to 0, giving the normal equations Σ(i=1..P) ai·Σn Wn·s(n−i)·s(n−j) = Σn Wn·sn·s(n−j), j = 1, ..., P;
Step 3.3: solve the above matrix equation to obtain all prediction coefficients ai.
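Steps 3.1–3.3 amount to solving a weighted least-squares system. The sketch below mirrors the normal equations directly with an explicit loop rather than aiming for efficiency; it is an illustration, not the patent's implementation.

```python
import numpy as np

def weighted_lp(s, w, p):
    """Solve the weighted linear prediction normal equations.

    Minimizes E = sum_n w[n] * (s[n] - sum_{i=1..p} a_i s[n-i])^2,
    i.e. solves  sum_i a_i sum_n w_n s_{n-i} s_{n-j}
                 = sum_n w_n s_n s_{n-j},   j = 1..p.
    s and w are the samples of one closed-phase analysis span (the
    signal outside [GCI1, GCI2] is taken as zero).  Returns the
    predictor coefficients a_1..a_p.
    """
    s = np.asarray(s, dtype=float)
    R = np.zeros((p, p))
    r = np.zeros(p)
    for n in range(p, len(s)):
        past = s[n - p:n][::-1]            # s[n-1], ..., s[n-p]
        R += w[n] * np.outer(past, past)   # weighted autocorrelation matrix
        r += w[n] * s[n] * past            # weighted cross-correlation vector
    return np.linalg.solve(R, r)
```

With uniform weights this reduces to ordinary covariance-method linear prediction; the trapezoidal weights of step 2 simply de-emphasize the samples near the GCIs.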
As a further preferred scheme of the estimation method of the present invention, in step 4, the vocal tract from the glottis to the lips is regarded as a slowly varying, lossless, uniform acoustic tube, and the vocal tract is modeled as a lossless tube model formed by concatenating multiple tube segments of equal length and different cross-sectional areas.
As a further preferred scheme of the estimation method of the present invention, in step 2, d is 10^-4.
As a further preferred scheme of the estimation method of the present invention, in step 2, α is 0.05.
As a further preferred scheme of the estimation method of the present invention, in step 2, β is 0.7.
As a further preferred scheme of the estimation method of the present invention, in step 2, NSlope is 7.
Compared with the prior art, the above technical scheme of the present invention has the following technical effects:
The present invention determines glottal closure instants with the DYPSA algorithm and constructs a weighting function that attenuates the main excitation, so that weighted linear prediction yields the vocal tract area during the glottal closed phase. Comparing the vocal tract model parameters obtained by inverse filtering, the average reflection-coefficient estimation error of the amplitude-defined closed-phase method is 2.66, while the weighted linear prediction algorithm proposed here reduces the estimation error to 2.01, a 24.3% improvement. The method also achieves up to 99% recognition of normal versus pathological voices and a 96% accuracy in distinguishing vocal cord polyps from vocal nodules.
Description of the drawings
Fig. 1 is the flow chart of the implementation of the present invention;
Fig. 2(a) is the normalized glottal wave waveform;
Fig. 2(b) is the normalized glottal flow derivative waveform;
Fig. 3 shows the LF model of the glottal flow derivative and its weighting function;
Fig. 4 compares the vocal tract area obtained by MRI (magnetic resonance imaging) with the vocal tract areas obtained by two glottal closed-phase methods;
Fig. 5(a) is the vocal tract area distribution of normal voice over different frames;
Fig. 5(b) is the vocal tract area distribution of vocal nodule voice over different frames;
Fig. 5(c) is the vocal tract area distribution of vocal cord polyp voice over different frames;
Fig. 5(d) is the vocal tract area distribution of hyperthyroid edema voice over different frames;
Fig. 6 is the table of glottal source parameter errors for the three methods;
Fig. 7 shows the recognition results under three recognition algorithms.
Specific embodiment
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:
Embodiment 1
As shown in Fig. 1, the algorithm of the present invention first uses the DYPSA algorithm to obtain the GCI positions of the LF (Liljencrants-Fant) model. Within one vibration cycle, the vocal folds move from the open state to the closed state, forming the two main parts of the glottal wave pulse: the open phase and the closed phase. Fig. 2(a) shows the glottal wave signal, and Fig. 2(b) shows the LF model of the glottal flow derivative signal. The DYPSA algorithm uses a phase-slope (group-delay) function to obtain the GCIs automatically from the speech signal. The weighting function is then built from the detected GCIs, which also give the cycle T of the signal; weighted linear prediction analysis of the speech signal yields reflection coefficients equivalent to those of the tube model, and the discrete vocal tract area is finally computed by the iterative function.
The choice of the weighting function Wn is an essential part of weighted linear prediction. The attenuating weighting function chosen here (Fig. 3) is based mainly on the actual glottal flow derivative waveform and reduces the contribution of the speech samples near the GCIs, where the main vocal tract excitation occurs. As described in step 2, d is preferably 10^-4, α preferably 0.05, β preferably 0.7, and NSlope preferably 7, giving the weight vector Wn. In the matrix equation of formula (3), the positions GCI1 and GCI2 are known, the speech data sn are known, and the weights Wn are known, so all unknowns ai can be solved, the reflection coefficients μm obtained, and the discrete area function Am solved recursively.
The present invention uses the Soochow University voice database and the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set of the database is the sustained vowel /a/; normal and pathological voices are chosen from the database, where the pathological voices comprise three kinds: vocal nodule voice, vocal cord polyp voice, and hyperthyroid voice. From these samples, 100 normal voices and 230 pathological voices (100 vocal nodules, 100 vocal cord polyps, 30 hyperthyroid voices) are selected. The sampling frequency of the speech samples is 25 kHz; a frame length of 60 ms and a frame shift of 30 ms are used. The estimation of the vocal tract shape is influenced by the linear prediction order and the vocal tract length; the literature gives, for the vowel /a/, an optimal adult vocal tract length of 17 cm and an optimal linear prediction order of 12.
Fig. 4 compares the vocal tract area obtained by MRI with the vocal tract areas of two glottal closed-phase methods, where VTA-MRI denotes the vocal tract area obtained by magnetic resonance imaging, VTA-WLP (vocal tract area - weighted linear prediction) denotes the area obtained by the weighted linear prediction closed-phase method proposed here, and VTA-HPV (vocal tract area - half peak value) denotes the area obtained by Deng H.'s half-peak-amplitude closed-phase method. The abscissa is the tract node index from the glottis to the lips, and the ordinate is the normalized cross-sectional area. Since the linear prediction order used here is 12, the 12 tube-node areas obtained are interpolated at equal intervals. The figure shows that VTA-WLP is closer to VTA-MRI than VTA-HPV and reveals more detail. Computing the mean square error of each method's area data against the MRI area data gives MSE_area = 0.1542 for the HPV method and MSE_area = 0.0341 for the WLP method, demonstrating that the vocal tract area calculated with this method is more accurate.
Below are the area distributions of the different classes of normal and pathological voices over different frames: Fig. 5(a) shows the VTAF (vocal tract area function) of normal voice; Fig. 5(b) the VTAF of vocal nodule voice; Fig. 5(c) the VTAF of vocal cord polyp voice; Fig. 5(d) the VTAF of hyperthyroid edema voice. The figures show that when /a/ is produced, the lip-region area is larger; compared with the areas obtained by magnetic resonance imaging, normal voice matches reality, while the other three voice types appear disordered and differ considerably from the reference shape.
The validity of the improved method is verified from the inverse-filtering perspective, and the commonly used iterative adaptive inverse filtering algorithm (IAIF) is chosen for joint comparison. The IAIF algorithm models the formant influence of the vocal tract model and removes it by inverse filtering; it estimates the vocal tract model accurately by linear prediction and discrete all-pole methods, and finally obtains the glottal signal by inverse filtering. Accurate assessment of an inverse-filtering method requires synthetic speech with a known glottal wave signal: the LF model of the glottal wave is constructed from glottal flow derivative parameters extracted from raw speech, the corresponding vocal tract parameters are then extracted, and the test speech is synthesized.
The specific assessment compares the errors between the predicted parameters and the actual parameters of the test speech, including the normalized amplitude quotient NAQ = Ugm/(|Ugc'|·T0), the quasi-open quotient QOQ = (tc − t0)/T0, and the slope ratio Sr = Ugc'/Ugr', where T0 is the pitch period and the positions of the variables are indicated in Fig. 2. The relative errors are shown in Fig. 6, where CP_WLP (closed phase - weighted linear prediction) denotes the weighted linear prediction closed-phase method proposed here, CP_HPV (closed phase - half peak value) denotes Deng H.'s half-peak-amplitude closed-phase method, and IAIF is the iterative adaptive inverse filtering algorithm. Test group A gives the mean error over ten synthetic voices with fundamental frequency below 150 Hz (average 130 Hz); test group C gives the error over ten synthetic voices with fundamental frequency above 250 Hz (average 280 Hz); test group B covers ten synthetic voices with fundamental frequency between 150 and 250 Hz (average 220 Hz). The results show that, except for the high-frequency band of the Sr parameter, CP_WLP performs best on all parameters in all other bands, while the IAIF algorithm has the largest overall error. The clear advantage of the two closed-phase methods over IAIF inverse filtering illustrates the necessity of closed-phase analysis and proves the superiority of the CP_WLP method.
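The three glottal-source parameters used in this evaluation can be computed directly from the LF timing and amplitude marks indicated in Fig. 2. A minimal sketch follows; the argument names are assumptions, since the patent text only gives the quotient definitions:

```python
def glottal_source_params(u_gm, du_gc, du_gr, t0, tc, t0_next):
    """Glottal-source parameters from LF-model marks (names assumed).

    u_gm    : peak amplitude of the glottal flow pulse
    du_gc   : (negative) peak of the flow derivative at closure, Ugc'
    du_gr   : return-phase slope of the flow derivative, Ugr'
    t0, tc  : opening and closing instants of the pulse
    t0_next : opening instant of the next pulse, so T0 = t0_next - t0
    """
    T0 = t0_next - t0
    naq = u_gm / (abs(du_gc) * T0)   # normalized amplitude quotient
    qoq = (tc - t0) / T0             # quasi-open quotient
    sr = du_gc / du_gr               # slope ratio
    return naq, qoq, sr
```

The relative error of each quotient between the inverse-filtered estimate and the known synthesis parameters is what Fig. 6 tabulates.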
Fig. 7 gives the recognition results of the three recognition methods for task A (normal versus pathological voice) and the subdivision results for task B (100 vocal nodule voices versus 100 vocal cord polyp voices), including the recognition rate, the AUC index, and the Kappa index. The AUC and Kappa indices describe recognition quality: the closer both are to 1, the better the recognition result. The table shows that the B subdivision results are slightly lower than the A recognition results, because the difference between normal and pathological voices is more pronounced than the difference between the two kinds of pathological voices. The highest recognition rate for A reaches 99%, and the highest subdivision result for B reaches 96%. The results show that this algorithm achieves a 7% improvement over the feature-fusion optimization algorithm on the same speech corpus.
Claims (8)
1. A method for estimating the vocal tract area during the glottal closed phase, characterized by comprising the following steps:
Step 1: determine the positions GCI1 and GCI2 of two adjacent glottal closure instants;
Step 2: from the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight excitation function Wn, as follows:
take the interval between GCI1 and GCI2 as one cycle, and set Wn to d in the neighborhoods of GCI1 and GCI2; with GCI1 as the origin, Wn rises from d to 1 with a constant slope, falls from 1 back to d with a slope of equal absolute value, and remains d thereafter until GCI2, so that Wn forms a trapezoidal piecewise function,
where d is a positive constant less than 1, n indexes the n-th speech sample from the origin, N is the number of samples in one cycle, α and β are the proportions occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weighting function rises from d to 1;
Step 3: compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: iteratively compute the discrete vocal tract area function of the lossless tube model:
using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless tube model,
where μm is the reflection coefficient, i.e. the m-th prediction coefficient am(m) of the m-th order linear prediction, and Am denotes the cross-sectional area of the m-th tube segment.
2. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 1, a sustained vowel is chosen from the test set of the database, and the DYPSA algorithm is applied to its speech waveform to determine the positions GCI1 and GCI2 of two adjacent glottal closure instants.
3. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1: compute the mean square error E of the weighted linear prediction, where E is the weighted linear prediction mean square error, en is the prediction error, Wn is the weight excitation function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: differentiate the mean square error E computed in step 3.1 and set the partial derivative with respect to every ai to 0;
Step 3.3: solve the resulting matrix equation to obtain all prediction coefficients ai.
4. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 4, the vocal tract from the glottis to the lips is regarded as a slowly varying, lossless, uniform acoustic tube, and the vocal tract is modeled as a lossless tube model formed by concatenating multiple tube segments of equal length and different cross-sectional areas.
5. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, d is 10^-4.
6. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, α is 0.05.
7. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, β is 0.7.
8. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, NSlope is 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711206456.3A CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711206456.3A CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108133713A true CN108133713A (en) | 2018-06-08 |
CN108133713B CN108133713B (en) | 2020-10-02 |
Family
ID=62389887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711206456.3A Active CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108133713B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830232A (en) * | 2018-06-21 | 2018-11-16 | 浙江中点人工智能科技有限公司 | A kind of voice signal period divisions method based on multiple dimensioned nonlinear energy operator |
CN109119094A (en) * | 2018-07-25 | 2019-01-01 | 苏州大学 | Voice classification method by utilizing vocal cord modeling inversion |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5744742A (en) * | 1995-11-07 | 1998-04-28 | Euphonics, Incorporated | Parametric signal modeling musical synthesizer |
CN101578659A (en) * | 2007-05-14 | 2009-11-11 | 松下电器产业株式会社 | Voice tone converting device and voice tone converting method |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN102799759A (en) * | 2012-06-14 | 2012-11-28 | 天津大学 | Vocal tract morphological standardization method during large-scale physiological pronunciation data processing |
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
US9263052B1 (en) * | 2013-01-25 | 2016-02-16 | Google Inc. | Simultaneous estimation of fundamental frequency, voicing state, and glottal closure instant |
CN105679333A (en) * | 2016-03-03 | 2016-06-15 | 河海大学常州校区 | Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5744742A (en) * | 1995-11-07 | 1998-04-28 | Euphonics, Incorporated | Parametric signal modeling musical synthesizer |
CN101578659A (en) * | 2007-05-14 | 2009-11-11 | 松下电器产业株式会社 | Voice tone converting device and voice tone converting method |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN102799759A (en) * | 2012-06-14 | 2012-11-28 | 天津大学 | Vocal tract morphological standardization method during large-scale physiological pronunciation data processing |
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
US9263052B1 (en) * | 2013-01-25 | 2016-02-16 | Google Inc. | Simultaneous estimation of fundamental frequency, voicing state, and glottal closure instant |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
CN105679333A (en) * | 2016-03-03 | 2016-06-15 | 河海大学常州校区 | Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method |
Non-Patent Citations (5)
Title |
---|
HUIQUN DENG等: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 * |
HUIQUN DENG等: "ESTIMATING VOCAL-TRACT AREA FUNCTIONS FROM VOWEL SOUND SIGNALS OVER CLOSED GLOTTAL PHASES", 《2004 INTERNATIONAL CONFERENCE ON ACOUSTICS,SPEECH,AND SIGNAL PROCESSING》 * |
TALAL BIN AMIN等: "Glottal and Vocal Tract Characteristics of Voice Impersonators", 《IEEE TRANSACTIONS ON MULTIMEDIA》 * |
TAKAYUKI NAKAJIMA et al.: "Estimation of the Vocal Tract Area Function by Adaptive Inverse Filtering", 《Electronic Computer Reference Materials》 * |
ZHENLI YU et al.: "A Method for Estimating Vocal Tract Area Parameters from a Finite Number of Formant Frequencies of a Speech Signal", 《Acta Electronica Sinica》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830232A (en) * | 2018-06-21 | 2018-11-16 | 浙江中点人工智能科技有限公司 | Voice signal period segmentation method based on multi-scale nonlinear energy operator |
CN108830232B (en) * | 2018-06-21 | 2021-06-15 | 浙江中点人工智能科技有限公司 | Voice signal period segmentation method based on multi-scale nonlinear energy operator |
CN109119094A (en) * | 2018-07-25 | 2019-01-01 | 苏州大学 | Voice classification method by utilizing vocal cord modeling inversion |
Also Published As
Publication number | Publication date |
---|---|
CN108133713B (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ghosh et al. | A generalized smoothness criterion for acoustic-to-articulatory inversion | |
Uria et al. | A deep neural network for acoustic-articulatory speech inversion | |
Degottex et al. | Phase minimization for glottal model estimation | |
CN111048071B (en) | Voice data processing method, device, computer equipment and storage medium | |
CN104221018A (en) | Sound detecting apparatus, sound detecting method, sound feature value detecting apparatus, sound feature value detecting method, sound section detecting apparatus, sound section detecting method, and program | |
van Santen et al. | High-accuracy automatic segmentation. | |
EP2843659B1 (en) | Method and apparatus for detecting correctness of pitch period | |
CN108133713A (en) | Method for estimating sound channel area under glottic closed phase | |
Pruthi et al. | Simulation and analysis of nasalized vowels based on magnetic resonance imaging data | |
CN108369803B (en) | Method for forming an excitation signal for a parametric speech synthesis system based on a glottal pulse model | |
Greenwood et al. | Measurements of vocal tract shapes using magnetic resonance imaging | |
CA2947957C (en) | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system | |
Xie et al. | Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion | |
CN115116475B (en) | Voice depression automatic detection method and device based on time delay neural network | |
Rodriguez et al. | A fuzzy information space approach to speech signal non‐linear analysis | |
Airaksinen et al. | Automatic estimation of the lip radiation effect in glottal inverse filtering | |
Degottex et al. | Joint estimate of shape and time-synchronization of a glottal source model by phase flatness | |
Naikare et al. | Classification of voice disorders using i-vector analysis | |
Arroabarren et al. | Glottal source parameterization: a comparative study | |
Laprie | A concurrent curve strategy for formant tracking. | |
Wood et al. | Excitation synchronous formant analysis | |
Bous et al. | Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework | |
Vernekar et al. | Deep learning model for speech emotion classification based on GCI and GOI detection | |
WO2023242445A1 (en) | Glottal features extraction using neural networks | |
Rasilo | Estimation of vocal tract shape trajectory using lossy Kelly-Lochbaum model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||