CN109119094A - Voice classification method by utilizing vocal cord modeling inversion - Google Patents
- Publication number
- CN109119094A (application CN201810824379.6A, filed 2018-07-25)
- Authority
- CN
- China
- Prior art keywords
- glottis
- voice
- vocal cords
- model
- wave
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G — Physics; G10 — Musical instruments; acoustics; G10L — Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
- G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27 — characterised by the analysis technique
- G10L25/39 — using genetic algorithms
- G10L25/03 — characterised by the type of extracted parameters
- G10L25/24 — the extracted parameters being the cepstrum
- G10L25/48 — specially adapted for particular use
- G10L25/51 — for comparison or discrimination
Abstract
The invention discloses a voice classification method using vocal cord modeling inversion, which effectively distinguishes different voice types from the perspective of the voice production mechanism. The method uses complex cepstrum phase decomposition to obtain the actual glottal wave as the target glottal wave, then performs vocal cord dynamics model inversion with an optimization algorithm that matches the feature parameters of the target and model glottal waves. After an actual voice signal is input, the actual glottal wave is extracted as the target, and a genetic algorithm is used for inversion to optimize the original model, simulating the vocal cord vibration conditions under which different voices are produced. Experimental results show that the relative matching error of each feature parameter after model inversion does not exceed 1.95%, indicating a good inversion effect. Normal voices and special (pathological) voices were selected for identification and classification with high accuracy.
Description
Technical field
The present invention relates to the field of voice classification, and in particular to a voice classification method based on vocal cord modeling inversion.
Background art
Voice classification technology performs feature analysis on voice signals in order to distinguish different types of voices; it can be applied to emotional speech analysis, voice quality assessment, and similar tasks. Voice quality directly affects a person's verbal expression and is particularly important for teachers, broadcasters, and singers. When a person speaks for a long time or is under stress, the voice changes and hoarseness may even appear.
The acoustic analysis methods of voice widely used at present can only provide acoustic information about the voice; they cannot be linked to the physiological structure of the actual voice production system, cannot provide a good classification standard, and their classification results contain large errors. Research shows that when the same voice is produced, the vocal cord vibration conditions of some special voices (e.g. vocal nodules, vocal cord polyps, hyperthyroid voices) differ from those of normal voices, and voices produced under different emotional states also correspond to different vocal cord vibration conditions. Since the vocal cords are one of the key components of the speech production system and their vibration directly affects voice quality, the present work combines vocal cord modeling with acoustic analysis, simulating the glottal wave output by the vocal cords in different states in order to classify voices.
When vocal cord modeling is used for voice classification, the model outputs a glottal wave that simulates the actual voice, and the voice is classified on that basis. In practice, however, directly setting the physical parameters of the model makes it difficult to produce a glottal wave that matches the actual voice signal, which significantly affects the design of the subsequent classification standard.
Summary of the invention
The technical problem to be solved by the present invention: to address the shortcomings of the background art, the present invention builds on the traditional two-mass vocal cord model, adds an impact-force correction, and uses an inversion algorithm to perform matching optimization on the model, so that it accurately simulates the physiological conditions of actual vocal cord vibration and enables the classification of normal voices versus special voices.
The present invention adopts the following technical scheme to solve the above technical problem.
The present invention proposes a voice classification method based on vocal cord dynamics model inversion, which comprises the following steps:
Step 1: estimate the glottal wave using complex cepstrum phase decomposition (CCPD), specifically:
(1) First compute the pitch period of a frame of the voice signal and obtain the glottal closure instants of the frame with the DYPSA algorithm; the glottal closure instants correspond to the pitch periods, giving the exact position of the glottal closure instant within each pitch period.
(2) Take the voice signal within each pitch period, decompose it into a maximum-phase signal and a minimum-phase signal using the complex cepstrum, and differentiate both signals.
(3) Combine the differentiated maximum- and minimum-phase signals with the glottal closure instant: the maximum-phase signal is placed before the glottal closure instant and the minimum-phase signal after it, yielding an estimate of the glottal flow derivative.
(4) Integrate the glottal flow derivative to obtain the glottal source estimate.
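The core of step (2), splitting one pitch-period frame into minimum-phase and maximum-phase components via the complex cepstrum, can be sketched numerically. This is a minimal illustration only, not the patent's full CCPD pipeline: DYPSA, linear-phase handling before the cepstrum, differentiation, and the placement around the closure instant are omitted, and the function name is ours.

```python
import numpy as np

def split_min_max_phase(frame):
    """Split a frame into minimum-phase (causal cepstrum) and maximum-phase
    (anticausal cepstrum) components; their circular convolution reconstructs
    the frame. A production CCPD would also remove the linear phase term."""
    n = len(frame)
    X = np.fft.fft(frame)
    # complex log spectrum; the small offset regularizes log(0)
    log_X = np.log(np.abs(X) + 1e-12) + 1j * np.unwrap(np.angle(X))
    chat = np.fft.ifft(log_X)                  # complex cepstrum
    c_min = np.zeros(n, dtype=complex)         # causal part -> minimum phase
    c_max = np.zeros(n, dtype=complex)         # anticausal part -> maximum phase
    c_min[0] = c_max[0] = chat[0] / 2
    c_min[1:n // 2 + 1] = chat[1:n // 2 + 1]
    c_max[n // 2 + 1:] = chat[n // 2 + 1:]
    # back to the time domain through the exponential of the log spectrum
    x_min = np.fft.ifft(np.exp(np.fft.fft(c_min)))
    x_max = np.fft.ifft(np.exp(np.fft.fft(c_max)))
    return x_min, x_max
```

By construction the two components multiply back to the original spectrum, so convolving them recovers the frame, which is a convenient sanity check for the decomposition.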
Step 2: establish the impact-force-corrected two-mass vocal cord vibration model and determine the model optimization parameter vector, specifically:
Establish the impact-force-corrected two-mass vocal cord vibration model (IFCM), derive the dynamic equations of the vibrating system, and introduce the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr, and the subglottal pressure scaling factor QP as the model optimization parameters.
Here m_iα is the mass of each mass block, k_iα the stiffness coefficient, and r_iα the damping coefficient, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides; L is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the two vocal cords collide; a_i denotes the glottal area and Ps the subglottal pressure.
The model parameters are corrected as follows:
where m_il0, k_il0, k_cl0, c_il0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient of the left mass blocks; m_ir0, k_ir0, k_cr0, c_ir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
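The parameter-correction formulas of this step appear as equation images in the original patent and were lost in extraction. A plausible reconstruction, following the scaling convention common in asymmetric two-mass models (e.g. the Steinecke-Herzel formulation), is given below; this is an assumption for the reader's orientation, not the patent's verbatim formula:

```latex
m_{i\alpha} = \frac{m_{i\alpha 0}}{Q_\alpha}, \qquad
k_{i\alpha} = Q_\alpha \, k_{i\alpha 0}, \qquad
k_{c\alpha} = Q_{c\alpha} \, k_{c\alpha 0}, \qquad
c_{i\alpha} = Q_\alpha \, c_{i\alpha 0}, \qquad
P_s = Q_P \, P_{s0},
\qquad i = 1, 2, \; \alpha = l, r
```

Under this convention each Q factor rescales one side's natural frequency (mass down, stiffness up), so asymmetric left/right values of Ql and Qr directly model unilateral vocal cord abnormality.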
Step 3: extract the acoustic parameters from the glottal wave.
Take the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extract the glottal wave feature parameters from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'| (toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the speed ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next cycle, and tm the instant of the glottal wave peak Ug; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the glottal wave peak value.
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with a quasi-Newton method, specifically:
A. Generate the initial population and set the maximum number of generations.
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best.
The fitness function is:
where the subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population.
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next generation.
E. Repeat the above operations until the maximum number of generations is reached, then output the globally best individual.
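The GA-QN loop above can be sketched as follows. This is a generic hybrid, not the patent's exact operators: tournament selection, arithmetic crossover, Gaussian mutation, and SciPy's L-BFGS-B as the quasi-Newton refiner are our assumptions, and the fitness function is supplied by the caller (in the patent it measures the match between model and target glottal-wave parameters).

```python
import numpy as np
from scipy.optimize import minimize

def ga_qn(fitness, bounds, pop_size=20, generations=30, pc=0.7, pm=0.3, rng=None):
    """Hybrid GA + quasi-Newton: the GA explores globally (steps A-C),
    then BFGS locally refines every new individual (step D)."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))      # step A
    for _ in range(generations):
        fit = np.array([fitness(p) for p in pop])            # step B
        # tournament selection: each slot pits two random individuals
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # arithmetic crossover with probability pc
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                a = rng.random()
                children[i] = a * parents[i] + (1 - a) * parents[i + 1]
                children[i + 1] = a * parents[i + 1] + (1 - a) * parents[i]
        # Gaussian mutation with probability pm per gene
        mut = rng.random(children.shape) < pm
        children = np.clip(children + mut * rng.normal(0, 0.1, children.shape),
                           lo, hi)
        # quasi-Newton refinement of every individual (maximize = minimize -f)
        for i in range(pop_size):
            res = minimize(lambda x: -fitness(x), children[i],
                           method="L-BFGS-B", bounds=list(zip(lo, hi)))
            children[i] = res.x
        pop = children
    fit = np.array([fitness(p) for p in pop])
    return pop[np.argmax(fit)]
```

The local refinement is what lets the hybrid converge tightly: on a smooth fitness surface the GA only needs to land each individual in the right basin, and BFGS finishes the job.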
Step 5: classify the voice.
Based on an analysis of the variance of each feature parameter under different voice productions, the parameters are combined with weights, and two indices are proposed: the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS.
The two indices WAR and NWS are combined to distinguish normal voices from special voices. WAR describes the symmetry of the vocal cords: the smaller WAR is, the higher the asymmetry of the vocal cords. NWS describes the severity of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
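The two-index decision described above can be sketched as a simple rule. The WAR and NWS formulas and the decision thresholds are not reproduced in this text (the patent derives them from its training data), so everything below, including the threshold values, is a placeholder illustrating only the combination logic: low WAR flags asymmetry (unilateral abnormality), high NWS flags bilateral abnormality.

```python
def classify_voice(war, nws, war_thresh=0.5, nws_thresh=0.5):
    """Illustrative two-index rule; thresholds are placeholders, not the patent's."""
    if war >= war_thresh and nws <= nws_thresh:
        return "normal"                     # symmetric folds, low abnormality
    if war < war_thresh:
        return "special: unilateral"        # low WAR -> left-right asymmetry
    return "special: bilateral"             # high NWS -> bilateral abnormality
```

Using the two indices jointly is what allows the method to separate the two abnormality types that a single index confuses, which is the point made by Figs. 10 and 11.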
Compared with the prior art, the above technical scheme has the following technical effect: after an actual speech signal is input, the actual glottal wave is extracted as the target, and inversion with a genetic algorithm optimizes the original model, simulating the vocal cord vibration conditions of different voice productions. Experimental results show that the relative matching error of each feature parameter after model inversion does not exceed 3.6%, indicating a good inversion effect. Normal voices and special voices were selected for identification and analysis with high accuracy.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is the flow chart of the CCPD algorithm.
Fig. 3 is a diagram of the IFCM model.
Fig. 4 is the table of initial values of the model physical parameters.
Fig. 5 shows the normalized glottal wave Ug.
Fig. 6 shows the waveform of the glottal flow derivative Ug'.
Fig. 7 is the flow chart of the GA-QN algorithm.
Fig. 8 is the comparison table of glottal parameter errors after inversion of normal voices.
Fig. 9 is the comparison table of glottal parameter errors after inversion of special voices.
Fig. 10 is the table of recognition results for normal and special voices using a single classification index.
Fig. 11 is the table of recognition results for normal and special voices using the two classification indices combined.
Specific embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless defined as here, are not to be interpreted in an idealized or overly formal sense.
The present invention obtains practical voice glottis wave as target glottis wave, using optimization algorithm using cepstrum polyphase decomposition
Vocal cords kinetic model under operation is carried out by matching target and model glottis wave characteristic parameter, after optimizing according to inverting most
Excellent parameter vector proposes voice classification standard.Specific step is as follows:
Step 1: obtaining practical glottis wave;
The present invention is estimated using cepstrum polyphase decomposition (complex cepstrumphase decomposition, CCPD)
Count glottis wave.The pitch period for seeking a frame voice sound signal first passes through DYPSA (Dynamic Programming
Projected Phase-Slope Algorithm) algorithm obtain a frame voice sound signal glottis closing point position, glottis is closed
Chalaza position is corresponding with pitch period, obtains glottis closing point specific location in each pitch period.Obtain each fundamental tone week
Voice sound signal in this period is decomposed into maximum phase using the method for cepstrum and minimum phase is believed by the voice sound signal in the phase
Number and differential, with glottis closing point position ining conjunction with, the component part of maximum phase is opened with glottis to match, minimum phase composition part and
Glottis, which closes, to match.It is that glottis opens phase before glottis closing point, before maximum phase signal is placed on glottis closing point;Glottis closure
Phase is closed for glottis after point, after minimum phase signal is placed on glottis closing point, obtains derivative glottal flow estimation;By differential glottis
Wave integral, realizes glottal source estimation.
Step 2: determine the model optimization parameter vector.
The impact-force-corrected two-mass vocal cord vibration model (Impact Force Correction Model, IFCM) is established, the dynamic equations of the vibrating system are derived, and the left vocal cord scaling factor Ql, right vocal cord scaling factor Qr, left coupling scaling factor Qcl, right coupling scaling factor Qcr, and subglottal pressure scaling factor QP are introduced as the model optimization parameters.
Here m_iα is the mass of each mass block, k_iα the stiffness coefficient, and r_iα the damping coefficient, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides; L is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the two vocal cords collide; a_i denotes the glottal area and Ps the subglottal pressure.
The model parameters are corrected as follows:
where m_il0, k_il0, k_cl0, c_il0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient of the left mass blocks; m_ir0, k_ir0, k_cr0, c_ir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
Step 3: extract the acoustic parameters from the glottal wave.
Take the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extract the glottal wave feature parameters from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'| (toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the speed ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next cycle, and tm the instant of the glottal wave peak Ug; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the glottal wave peak value.
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with a quasi-Newton method, specifically:
A. Generate the initial population and set the maximum number of generations.
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best.
The fitness function is:
where the subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population.
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next generation.
E. Repeat the above operations until the maximum number of generations is reached, then output the globally best individual.
Step 5: classify the voice.
Based on an analysis of the variance of each feature parameter under different voice productions, the present invention combines the parameters with weights and proposes the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS:
The two indices WAR and NWS are combined to distinguish normal voices from special voices. WAR describes the symmetry of the vocal cords: the smaller WAR is, the higher the asymmetry of the vocal cords. NWS describes the severity of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
Embodiment 1:
The process of the invention is shown in Fig. 1. First the glottal wave is estimated using complex cepstrum phase decomposition (CCPD). The pitch period of a frame of the voice signal is computed, and the glottal closure instants of the frame are obtained with the DYPSA (Dynamic Programming Projected Phase-Slope Algorithm) algorithm; the glottal closure instants correspond to the pitch periods, giving the exact position of the closure instant within each pitch period. The voice signal within each pitch period is decomposed into maximum-phase and minimum-phase signals using the complex cepstrum and differentiated; combined with the glottal closure instant, the maximum-phase component corresponds to the glottal opening phase and the minimum-phase component to the glottal closing phase. Before the closure instant the glottis is opening, so the maximum-phase signal is placed before it; after the closure instant the glottis is closing, so the minimum-phase signal is placed after it, yielding the glottal flow derivative estimate. Integrating the glottal flow derivative realizes the glottal source estimation; the specific procedure is shown in Fig. 2.
The standard two-mass model shown in Fig. 3 is established: each vocal cord is represented by two mass blocks connected by springs k_iα and dampers r_iα, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides, and the two mass blocks on each side are coupled to each other through an additional spring k_cα. The model optimization parameters Ql, Qr, Qcl, Qcr, QP are determined, and the glottal airflow is coupled in to obtain the model glottal wave; the initial values of the physical parameters of the model are shown in Fig. 4.
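To make the two-mass structure concrete, here is a drastically simplified, dimensionless sketch of one side of the model: two coupled damped oscillators, driven by a constant "subglottal pressure" force while the glottis is open. All parameter values are illustrative stand-ins rather than the Fig. 4 initial values (which are an image in the original), and the collision forces and full Bernoulli aerodynamics of the real model are omitted.

```python
import numpy as np
from scipy.integrate import solve_ivp

def two_mass_rhs(t, y, m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025,
                 r1=0.02, r2=0.02, Ps=0.004, x0=0.18, L=1.4, d1=0.25):
    """Right-hand side of the simplified two-mass ODE.
    y = [x1, v1, x2, v2]: displacement/velocity of the lower (1) and upper (2)
    mass; x0 is the rest half-width of the glottis."""
    x1, v1, x2, v2 = y
    glottis_open = (x0 + x1 > 0) and (x0 + x2 > 0)   # both gaps positive
    F1 = Ps * L * d1 if glottis_open else 0.0        # pressure force on lower mass
    a1 = (F1 - r1 * v1 - k1 * x1 - kc * (x1 - x2)) / m1
    a2 = (-r2 * v2 - k2 * x2 - kc * (x2 - x1)) / m2
    return [v1, a1, v2, a2]

# integrate from a small initial displacement
sol = solve_ivp(two_mass_rhs, (0.0, 50.0), [0.01, 0.0, 0.01, 0.0], max_step=0.1)
```

In the full IFCM, the coupling of this mechanical system with the glottal airflow (plus the nonlinear pressure and collision terms left out here) is what sustains oscillation and produces the model glottal wave that the inversion matches against the target.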
The GA-QN algorithm performs the vocal cord dynamics model inversion by matching the feature parameters of the target and model glottal waves; the operating procedure is shown in Fig. 7. The population size Np is set to 50, the number of iterations does not exceed 300, the crossover probability Pc is 0.7, and the mutation probability Pm is 0.3. The fitness function of each individual in the population is as follows:
where the glottal wave feature parameters are the fundamental frequency F0 = 1/(toT - to), the open quotient OQ = (tc - to)/(toT - to), the closing quotient CIQ = (tc - tm)/(toT - to), the speed ratio Sr = Ugm'/MFDR, and the normalized amplitude quotient NAQ = Ugm/(|Ugc'| (toT - to)); the main time points in these formulas are shown in Fig. 5 and Fig. 6. The subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
The present invention uses the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set of the database consists of the sustained vowel /a/. Normal and special voices were selected from the database, the special voices comprising vocal cord polyp voices and vocal cord paralysis voices. Forty normal voices and forty special voices (20 with unilateral vocal cord abnormality and 20 with bilateral vocal cord abnormality) were chosen from these samples; the sampling frequency of the speech samples is 25 kHz.
The matching error values of each feature parameter of the glottal waves of normal and special voices are shown in Fig. 8 and Fig. 9. As can be seen from the figures, compared with the traditional two-mass model (TMM), the matching error of each feature value of the proposed IFCM model is smaller and the inversion effect is better, indicating that the IFCM model better reflects the actual vibration of the vocal cords during phonation. When the input voice is a special voice, abnormalities at the vocal cords reduce the degree of glottal closure and the regularity of the glottal wave, and the glottal frequency and amplitude vary, so the matching error of each feature parameter of the model is higher than for normal voices. This shows that the inversion effect for abnormal vocal cord voices is slightly inferior to that for normal voices; however, the matching error of every feature parameter is at most 1.95%, indicating that the vocal cord model inversion proposed here works well and provides a sound basis for the analysis of the optimized model parameters.
Fig. 10 shows the results of identifying normal versus special voices using only the WAR index or only the NWS index; Fig. 11 shows the results of identifying normal voices versus special voices and the abnormal vocal cord side using the WAR and NWS indices combined, including the recognition rate and the Kappa index. The Kappa index describes the quality of the identification: the closer it is to 1, the better the recognition result. As can be seen from the figures, with the WAR index alone the recognition rate for normal versus special voices is 86.25%, and with the NWS index alone it is 81.25%. Combining the two indices, the accuracy of the recognition results reaches 97.50%. The experimental results show that the proposed method can effectively and comprehensively distinguish normal voices from special voices and identify the abnormal condition of the vocal cords.
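The Kappa index reported in Figs. 10 and 11 is Cohen's kappa, which corrects raw accuracy for chance agreement. A short implementation, with an illustrative (not the patent's actual) confusion matrix: a balanced 40/40 split with one error in each class corresponds to 97.5% accuracy and κ = 0.95.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                  # observed agreement (= accuracy)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2    # agreement expected by chance
    return (po - pe) / (1.0 - pe)
```

Because the two classes are balanced here, chance agreement is 0.5, so kappa stretches the 0.5-1.0 accuracy range onto 0-1; that is why a value close to 1 indicates genuinely good recognition rather than class-imbalance luck.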
The above are only some embodiments of the invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A voice classification method based on vocal cord dynamics model inversion, characterized by comprising:
step 1: estimating the glottal wave using complex cepstrum phase decomposition (CCPD);
step 2: establishing the impact-force-corrected two-mass vocal cord vibration model and determining the model optimization parameter vector;
step 3: extracting the acoustic parameters from the glottal wave;
step 4: performing the vocal cord dynamics model inversion using the GA-QN optimization algorithm, which combines a genetic algorithm with a quasi-Newton method, by matching the feature parameters of the target and model glottal waves;
step 5: classifying the voice: based on an analysis of the variance of each feature parameter under different voice productions, combining the parameters with weights, proposing the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS, and using the WAR and NWS indices in combination to distinguish normal voices from special voices.
2. The voice classification method according to claim 1, characterized in that estimating the glottal wave using cepstrum phase decomposition (CCPD) in step 1 specifically comprises:
(1) first determining the pitch period of a frame of the voice signal, obtaining the glottal closure instants of the frame with the DYPSA algorithm, and aligning the closure instants with the pitch periods to locate the exact closure instant within each pitch period;
(2) taking the voice signal within each pitch period, decomposing it by the cepstral method into a maximum-phase signal and a minimum-phase signal, and differentiating both signals;
(3) combining the differentiated maximum- and minimum-phase signals with the glottal closure instant, placing the maximum-phase signal before the closure instant and the minimum-phase signal after it, to obtain an estimate of the glottal flow derivative;
(4) integrating the glottal flow derivative to obtain the glottal source estimate.
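The cepstral maximum/minimum-phase split at the core of sub-step (2) can be sketched as follows. This is an illustrative numpy implementation, not the patent's code: the function names, the FFT length of 64 and the toy two-zero test signal are all assumptions, and a real system would apply the split to each pitch period delimited by the DYPSA closure instants.

```python
import numpy as np

def complex_cepstrum(x, n):
    # complex cepstrum with the usual integer linear-phase (delay) removal
    X = np.fft.fft(x, n)
    ph = np.unwrap(np.angle(X))
    d = round(ph[n // 2] / np.pi)                  # integer delay in samples
    ph = ph - np.pi * d * np.arange(n) / (n // 2)
    return np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * ph))

def split_max_min_phase(x, n=64):
    # causal quefrencies -> minimum-phase part; negative (wrapped-around)
    # quefrencies -> maximum-phase part; exponentiate back to the time domain
    c = complex_cepstrum(x, n)
    c_min = np.zeros(n)
    c_min[: n // 2 + 1] = c[: n // 2 + 1]
    c_max = np.zeros(n)
    c_max[n // 2 + 1:] = c[n // 2 + 1:]
    x_min = np.real(np.fft.ifft(np.exp(np.fft.fft(c_min))))
    x_max = np.real(np.fft.ifft(np.exp(np.fft.fft(c_max))))
    return x_max, x_min

# toy mixed-phase "period": a minimum-phase zero (-0.5) convolved with a
# maximum-phase zero (-2); the split should recover the two factors
x = np.convolve([1.0, 0.5], [0.5, 1.0])
x_max, x_min = split_max_min_phase(x, 64)
```

The product of the two component spectra reproduces the magnitude spectrum of the input, which is the sanity check that the decomposition lost nothing.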
3. The voice classification method according to claim 1, characterized in that establishing the impact-force-corrected two-mass vocal cord vibration model and determining the model optimization parameter vector in step 2 specifically comprises:
establishing the impact-force-corrected two-mass vibration model (IFCM) of the vocal cords, obtaining the dynamic equations of the system vibration, and introducing the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr and the subglottal pressure scaling factor QP as the model optimization parameters;
where m_ia is the mass of each block, k_ia its stiffness coefficient and r_ia its damping coefficient, with i = 1, 2 denoting the lower and upper mass respectively and a = l, r denoting the left and right mass; l is the vocal cord length, d_i the thickness of each mass, k_ca the coupling coefficient, and c_ia the additional elastic coefficient applied when the two folds collide; a_i denotes the glottal gap area and P_i the subglottal pressure;
the parameters of the mathematical model are corrected as follows:
where m_il0, k_il0, k_cl0 and c_il0 respectively represent the standard values of the left mass, its stiffness coefficient, its coupling coefficient and the additional elastic coefficient at collision of the two folds; m_ir0, k_ir0, k_cr0 and c_ir0 represent the corresponding standard values on the right side; and P_i0 is the standard value of the subglottal pressure.
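The correction formulas themselves are not reproduced in this text, so the sketch below assumes the common Steinecke-Herzel convention for asymmetric two-mass models (mass divided by Q, stiffness multiplied by Q, coupling multiplied by Qc, subglottal pressure multiplied by QP) as a stand-in for the patent's omitted equations; the numeric standard values are likewise illustrative, not the patent's table.

```python
# illustrative standard (normal-voice) values in the usual CGS-style
# two-mass-model units; placeholders, not the patent's parameter table
STD = dict(m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025,
           c1=0.24, c2=0.024, Ps=0.008)

def scaled_params(Ql, Qr, Qcl, Qcr, QP, std=STD):
    # assumed scaling rule: mass -> m/Q, stiffness -> Q*k (Steinecke-Herzel),
    # coupling -> Qc*kc, subglottal pressure -> QP*Ps0
    def side(Q, Qc):
        return dict(m1=std["m1"] / Q, m2=std["m2"] / Q,
                    k1=Q * std["k1"], k2=Q * std["k2"],
                    kc=Qc * std["kc"],
                    c1=std["c1"], c2=std["c2"])
    return dict(left=side(Ql, Qcl), right=side(Qr, Qcr), Ps=QP * std["Ps"])

# e.g. a right fold slackened to Qr = 0.6 with raised driving pressure
p = scaled_params(1.0, 0.6, 1.0, 1.0, 1.2)
```

Under this convention a smaller Q makes a fold heavier and slacker, which lowers its natural frequency relative to the healthy side; the five factors (Ql, Qr, Qcl, Qcr, QP) form exactly the optimization vector of the claim.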
4. The voice classification method according to claim 1, characterized in that extracting the acoustic parameters of the glottal wave in step 3 specifically comprises:
taking the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extracting the glottal wave characteristic parameters from the principal time instants of the glottal wave and of the glottal flow derivative:
F0 = 1/(t_oT - t_o)   (5)
OQ = (t_c - t_o)/(t_oT - t_o)   (6)
CIQ = (t_c - t_m)/(t_oT - t_o)   (7)
Sr = Ug'_c/Ug'_r   (8)
NAQ = Ug_m/(|Ug'_c| (t_oT - t_o))   (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the slope ratio and NAQ the normalized amplitude quotient; t_o is the glottal opening instant, t_oT the opening instant of the following period, t_c the glottal closure instant and t_m the instant of the glottal wave peak Ug_m; Ug'_c is the minimum of the glottal flow derivative Ug' and Ug'_r its maximum.
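The five parameters of equations (5)-(9) can be computed from one glottal-flow period as sketched below. The landmark detection (peak of Ug, minimum of Ug') is deliberately naive, and the frame is assumed to start exactly at the opening instant t_o; the Rosenberg-type test pulse is an illustrative stand-in for a measured glottal wave.

```python
import numpy as np

def glottal_features(ug, fs):
    # one pitch period of glottal flow Ug sampled at fs Hz; the frame is
    # assumed to start at t_o, so the period t_oT - t_o equals len(ug)/fs
    ug = np.asarray(ug, dtype=float)
    dug = np.gradient(ug) * fs             # glottal flow derivative Ug'
    n = len(ug)
    T = n / fs                             # pitch period t_oT - t_o
    t = np.arange(n) / fs
    tm = t[np.argmax(ug)]                  # instant of the peak Ug_m
    tc = t[np.argmin(dug)]                 # closure instant: minimum of Ug'
    Ugm = float(ug.max())
    Ugc = float(dug.min())                 # Ug'_c
    Ugr = float(dug.max())                 # Ug'_r
    return dict(F0=1.0 / T,                # eq. (5)
                OQ=tc / T,                 # eq. (6), with t_o = 0
                CIQ=(tc - tm) / T,         # eq. (7)
                Sr=Ugc / Ugr,              # eq. (8), signed ratio
                NAQ=Ugm / (abs(Ugc) * T))  # eq. (9)

# usage on a synthetic Rosenberg-type pulse: 10 ms period at 10 kHz,
# 4 ms cosine opening phase, 1.6 ms quarter-cosine closing phase
fs, n, tp, tn = 10000, 100, 40, 16
i = np.arange(n)
ug = np.where(i <= tp, 0.5 * (1.0 - np.cos(np.pi * i / tp)),
              np.where(i <= tp + tn,
                       np.cos(np.pi * (i - tp) / (2 * tn)), 0.0))
feats = glottal_features(ug, fs)
```

For this pulse the features land in the physiologically plausible ranges (F0 = 100 Hz, OQ near 0.55, NAQ near 0.1), which is a quick plausibility check for the extraction.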
5. The voice classification method according to claim 4, characterized in that inverting the vocal cord dynamics model in step 4 with the combined genetic algorithm / quasi-Newton optimization algorithm (GA-QN), by matching the characteristic parameters of the target and model glottal waves, specifically comprises:
A. generating an initial population and setting the maximum number of generations;
B. computing the fitness f(Phi_i) of each individual in the population and taking the individual with the highest fitness as the optimum of the current generation;
the fitness function is:
where the subscript m denotes the model glottal wave characteristic parameters and the subscript o denotes the measured glottal wave characteristic parameters;
C. applying the selection, crossover and mutation operators to the current population to obtain a new population;
D. applying the quasi-Newton algorithm to every individual of the new population to obtain the next generation;
E. repeating steps B-D until the maximum number of generations is reached, and outputting the globally optimal individual.
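Steps A-E can be sketched as a genetic algorithm whose offspring are polished by a few BFGS (quasi-Newton) iterations. The toy cost below, a squared mismatch against a fixed target vector, stands in for the glottal-wave parameter matching, and the fitness 1/(1 + cost) mirrors step B only in spirit, since the patent's exact fitness formula is not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for the patent's cost: squared mismatch between a "model"
# parameter vector and a fixed "target" glottal-feature vector
target = np.array([1.2, -0.7, 0.4])
def cost(q):
    return float(np.sum((q - target) ** 2))

def numgrad(f, x, h=1e-6):
    # central-difference gradient, since the model is treated as a black box
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def bfgs(f, x, iters=3):
    # a few quasi-Newton iterations: BFGS inverse-Hessian update
    # plus an Armijo backtracking line search
    x = np.array(x, dtype=float)
    H = np.eye(len(x))
    g = numgrad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        p = -H @ g
        t, fx, slope = 1.0, f(x), float(g @ p)
        while f(x + t * p) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = numgrad(f, x_new)
        y = g_new - g
        if float(y @ s) > 1e-12:           # curvature condition
            rho = 1.0 / float(y @ s)
            I = np.eye(len(x))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

def ga_qn(f, dim, pop_size=16, gens=10):
    pop = rng.uniform(-2.0, 2.0, (pop_size, dim))        # A: initial population
    for _ in range(gens):
        fit = np.array([1.0 / (1.0 + f(q)) for q in pop])  # B: fitness
        new = []
        for _k in range(pop_size):                       # C: select/cross/mutate
            a, b = rng.integers(0, pop_size, 2)
            p1 = pop[a] if fit[a] > fit[b] else pop[b]   # tournament selection
            c, d = rng.integers(0, pop_size, 2)
            p2 = pop[c] if fit[c] > fit[d] else pop[d]
            w = rng.random()
            child = w * p1 + (1 - w) * p2                # arithmetic crossover
            new.append(child + rng.normal(0.0, 0.1, dim))  # gaussian mutation
        pop = np.array([bfgs(f, q) for q in new])        # D: quasi-Newton polish
    return min(pop, key=f)                               # E: global best

best = ga_qn(cost, 3)
```

The division of labor is the point of the hybrid: the GA explores globally so the search does not stall in a local minimum, while the BFGS step gives the fast local convergence a plain GA lacks.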
6. The voice classification method according to claim 3, characterized in that the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS in step 5 are as follows:
WAR and NWS are used jointly to distinguish normal voices from pathological voices; WAR describes the degree of symmetry of the vocal cords, and the smaller WAR is, the higher the degree of asymmetry; NWS describes the degree of bilateral vocal cord abnormality, and the larger NWS is, the more severe the bilateral abnormality.
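The defining formulas of WAR and NWS are not reproduced in this text, so the construction below is hypothetical: WAR as a weighted smaller-to-larger ratio of the left and right scaling factors (1 for symmetric folds, smaller for stronger asymmetry) and NWS as a weighted distance of both sides' factors from the normal value 1, with made-up weights w standing in for the variance-analysis weights of the claim.

```python
def war_nws(Ql, Qr, Qcl, Qcr, w=(0.7, 0.3)):
    # hypothetical weights w; in the patent they come from a variance
    # analysis of the parameters over different voice productions
    pairs = [(Ql, Qr), (Qcl, Qcr)]
    # WAR: weighted smaller/larger ratio of the left vs right factors;
    # 1 = symmetric folds, smaller values = stronger asymmetry
    WAR = sum(wi * min(a, b) / max(a, b) for wi, (a, b) in zip(w, pairs))
    # NWS: weighted distance of both sides from the normal value 1;
    # larger values = stronger bilateral abnormality
    NWS = sum(wi * (abs(a - 1.0) + abs(b - 1.0)) / 2.0
              for wi, (a, b) in zip(w, pairs))
    return WAR, NWS

normal = war_nws(1.0, 1.0, 1.0, 1.0)   # symmetric, normal folds
lesion = war_nws(1.0, 0.5, 1.0, 1.0)   # one abnormal right fold
```

Whatever the exact formulas, the classification logic of the claim is preserved: the pathological case scores a lower WAR and a higher NWS than the normal one, and a threshold on the (WAR, NWS) pair separates the two classes.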
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810824379.6A CN109119094B (en) | 2018-07-25 | 2018-07-25 | Vocal classification method using vocal cord modeling inversion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109119094A true CN109119094A (en) | 2019-01-01 |
CN109119094B CN109119094B (en) | 2023-04-28 |
Family
ID=64863285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810824379.6A Active CN109119094B (en) | 2018-07-25 | 2018-07-25 | Vocal classification method using vocal cord modeling inversion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109119094B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4343969A (en) * | 1978-10-02 | 1982-08-10 | Trans-Data Associates | Apparatus and method for articulatory speech recognition |
US20010020141A1 (en) * | 1995-10-31 | 2001-09-06 | Elvina Ivanovna Chahine | Method of restoring speech functions in patients suffering from various forms of dysarthria, and dysarthria probes |
US20080300867A1 (en) * | 2007-06-03 | 2008-12-04 | Yan Yuling | System and method of analyzing voice via visual and acoustic data |
CN101916566A (en) * | 2010-07-09 | 2010-12-15 | 西安交通大学 | Electronic larynx speech reconstructing method and system thereof |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
CN105359211A (en) * | 2013-09-09 | 2016-02-24 | 华为技术有限公司 | Unvoiced/voiced decision for speech processing |
CN108133713A (en) * | 2017-11-27 | 2018-06-08 | 苏州大学 | Method for estimating sound channel area under glottic closed phase |
Non-Patent Citations (2)
Title |
---|
HUIQUN DENG ET AL.: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 * |
ZENG XIAOLIANG ET AL.: "Pathological voice classification using parameter inversion of a vocal cord dynamics model",《ACTA ACUSTICA (声学学报)》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110870765A (en) * | 2019-06-27 | 2020-03-10 | 上海慧敏医疗器械有限公司 | Voice treatment instrument and method adopting glottis closing real-time measurement and audio-visual feedback technology |
CN111081273A (en) * | 2019-12-31 | 2020-04-28 | 湖南景程电子科技有限公司 | Voice emotion recognition method based on glottal wave signal feature extraction |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112201226B (en) * | 2020-09-28 | 2022-09-16 | 复旦大学 | Sound production mode judging method and system |
CN112562650A (en) * | 2020-10-31 | 2021-03-26 | 苏州大学 | Voice recognition classification method based on vocal cord characteristic parameters |
CN113012716A (en) * | 2021-02-26 | 2021-06-22 | 武汉星巡智能科技有限公司 | Method, device and equipment for identifying baby cry category |
CN113012716B (en) * | 2021-02-26 | 2023-08-04 | 武汉星巡智能科技有限公司 | Infant crying type identification method, device and equipment |
CN116631443A (en) * | 2021-02-26 | 2023-08-22 | 武汉星巡智能科技有限公司 | Infant crying type detection method, device and equipment based on vibration spectrum comparison |
CN116631443B (en) * | 2021-02-26 | 2024-05-07 | 武汉星巡智能科技有限公司 | Infant crying type detection method, device and equipment based on vibration spectrum comparison |
Also Published As
Publication number | Publication date |
---|---|
CN109119094B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109119094A (en) | Voice classification method by utilizing vocal cord modeling inversion | |
Drugman et al. | Glottal source processing: From analysis to applications | |
CN102664016B (en) | Singing evaluation method and system | |
CN102521281B (en) | Humming computer music searching method based on longest matching subsequence algorithm | |
CN105023570B (en) | A kind of method and system for realizing sound conversion | |
CN109243494A (en) | Childhood emotional recognition methods based on the long memory network in short-term of multiple attention mechanism | |
Ewert et al. | Piano transcription in the studio using an extensible alternating directions framework | |
Bhagatpatil et al. | An automatic infant’s cry detection using linear frequency cepstrum coefficients (LFCC) | |
Zhang et al. | Multiple vowels repair based on pitch extraction and line spectrum pair feature for voice disorder | |
Perez-Carrillo et al. | Indirect acquisition of violin instrumental controls from audio signal with hidden Markov models | |
Narendra et al. | Estimation of the glottal source from coded telephone speech using deep neural networks | |
CN111081273A (en) | Voice emotion recognition method based on glottal wave signal feature extraction | |
Cummings et al. | Glottal models for digital speech processing: A historical survey and new results | |
Le et al. | Personalized speech enhancement combining band-split rnn and speaker attentive module | |
Little et al. | Biomechanically informed nonlinear speech signal processing | |
Parlak et al. | Harmonic differences method for robust fundamental frequency detection in wideband and narrowband speech signals | |
Gao | Audio deepfake detection based on differences in human and machine generated speech | |
Albornoz et al. | Snore recognition using a reduced set of spectral features | |
Zheng et al. | Throat microphone speech enhancement via progressive learning of spectral mapping based on lstm-rnn | |
Zhang et al. | Pathological voice classification based on the features of an asymmetric fluid–structure interaction vocal cord model | |
Patil et al. | Combining evidence from spectral and source-like features for person recognition from humming | |
Lv et al. | Objective evaluation method of broadcasting vocal timbre based on feature selection | |
Nhu et al. | Singing performance of the talking robot with newly redesigned artificial vocal cords | |
Bai et al. | Glottal Features Under Workload in Human-Robot Interaction | |
Yan et al. | A Dual-Mode Real-Time Lip-Sync System for a Bionic Dinosaur Robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||