CN106157212A

CN106157212A - A Chinese dysarthria assessment method based on EMA

Info

Publication number: CN106157212A
Application number: CN201610521815.3A
Authority: CN
Inventors: 薛珮芸; 张雪英; 白静
Original assignee: Taiyuan University of Technology
Current assignee: Taiyuan University of Technology
Priority date: 2016-07-05
Filing date: 2016-07-05
Publication date: 2016-11-23

Abstract

The present invention relates to the technical field of pronunciation evaluation, and in particular to dysarthria assessment based on electromagnetic articulography (EMA). In this EMA-based Chinese dysarthria assessment method, the test corpus is determined according to the type of dysarthria. A standard database is built from four parameters measured on normal speakers: the tongue-tip Euclidean distance, the lip opening distance, the duration, and the slope of the formant locus equation. The same four parameters measured on the patient serve as comparison parameters, and fuzzy membership functions are used to grade the severity of the patient's articulation deficit. By combining kinematic and acoustic information, the invention allows a more comprehensive assessment of dysarthria patients and provides a theoretical basis and technical support for pathological research.

Description

A Chinese dysarthria assessment method based on EMA

Technical field

The present invention relates to the technical field of pronunciation evaluation, and in particular to dysarthria assessment based on electromagnetic articulography (EMA).

Background technology

Dysarthria is mostly assessed by analyzing the acoustic signal: normal speakers serve as the reference, and the patient's differences from them are studied and evaluated. Common methods include comparison of extracted formants, comparison of durations, and comparison of vowel and consonant accuracy. Most of the subjects studied are native speakers of English, and there has been comparatively little research on Chinese pronunciation.

In "Automatic Evaluation of Articulatory Disorders in Parkinson's Disease", Michal Novotný et al. propose a method for assessing the articulation deficits of Parkinson's disease patients automatically, by acoustic analysis of pronunciation features. The experiment recruited 24 Parkinson's disease patients and 22 age-matched normal speakers as a reference group, and asked the subjects to repeat the syllables /pa/, /ta/, /ka/ rapidly. The features used to describe articulation, which also serve as the assessment factors, are voice quality, coordination of laryngeal and vocal-tract movement, precision of consonant articulation, tongue movement, degree of occlusion, and speech timing. A support vector machine classifier based on these features distinguishes Parkinson's disease patients from normal speakers with an accuracy of 80%. The recorded audio is first labelled with the initial burst, the vowel onset, and the occlusion, and the severity of the articulation deficit is then graded from the six features above. The method can grade the deficit of dysarthria patients, but its assessment factors come only from analysis of the audio signal. Because it has no kinematic information about the actual movement of the tongue, the assessment is not comprehensive.

In "Vowel Acoustics in Parkinson's Disease and Multiple Sclerosis: Comparison of Clear, Loud, and Slow Speaking Conditions", Kris Tjaden et al. take normal speakers as the reference and compare the vowel articulation of Parkinson's disease and multiple sclerosis patients under clear, loud, and slow speaking conditions. The aim of the comparison is to find therapeutic approaches that improve speech intelligibility, increase tongue-tip displacement, and raise tongue movement speed. The paper names the main causes of dysarthria as distorted vowels, imprecise consonants, and imprecision and irregularity. However, it only analyzes formants extracted from the acoustic signal, so the basis for comparison is rather narrow.

In "Impact of the LSVT on vowel articulation and coarticulation in Parkinson's disease", Vincent Martel Sauvageau et al. use the locus equation to measure the intelligibility of pronunciation. The locus equation is a linear model of the relation between the second formant at its onset and at the vowel midpoint, and the speaker's articulation can be assessed effectively with this model. However, an assessment based on acoustic features alone ignores the patient's own deficit.

Summary of the invention

The technical problem to be solved by the present invention is how to use the kinematic information and the acoustic information of dysarthria patients together, in comparison with normal speakers (healthy controls), to evaluate the articulation of the impaired patients.

The technical solution adopted by the present invention is a Chinese dysarthria assessment method based on EMA, carried out in the following steps:

Step 1. Determine the test corpus according to the type of dysarthria. There may be one or more test corpora. Each test corpus consists of one initial together with all the finals it can be pronounced with, in every combination allowed by the standard initial-final combination rules of Chinese Pinyin. Select several normal speakers who hold a Mandarin proficiency rating of Level 2, Grade A or above and have no history of speech disorders, and have them read each test corpus in turn. With the EMA instrument, collect the following while they read: the normal tongue-tip static-frame coordinate, i.e. the position of the tongue tip when the speaker is ready to read but has not yet begun; the normal tongue-tip Euclidean distance, i.e. the largest Euclidean distance between the tongue-tip coordinate during the reading of each test corpus and the normal tongue-tip static-frame coordinate; the normal lip opening distance, i.e. the maximum opening between the upper and lower lips during the reading of each test corpus; the normal duration, i.e. the time taken to read each test corpus; and the slope of the normal formant locus equation. The standard database is built with the normal tongue-tip Euclidean distance, the normal lip opening distance, the normal duration, and the slope of the normal formant locus equation as parameters. The slope is obtained as follows. While a normal speaker reads a test corpus, for each combination of the initial with a final, the formant frequency at the start of the initial-to-final transition collected with the EMA instrument is taken as the ordinate and the formant frequency at the midpoint of the final as the abscissa. This gives as many discrete points as there are finals. These points are linearly distributed and tightly clustered, and the slope of the straight line fitted to them is the slope of the normal formant locus equation.

Step 2. The patient under test reads the test corpus selected according to the type of dysarthria. With the EMA instrument, collect the following while the patient reads each test corpus: the patient tongue-tip static-frame coordinate, i.e. the position of the tongue tip when the patient is ready to read but has not yet begun; the patient tongue-tip Euclidean distance, i.e. the largest Euclidean distance between the tongue-tip coordinate during the reading of each test corpus and the patient tongue-tip static-frame coordinate; the patient lip opening distance, i.e. the maximum opening between the upper and lower lips during the reading of each test corpus; and the patient duration, i.e. the time taken to read each test corpus. The patient tongue-tip Euclidean distance, the patient lip opening distance, the patient duration, and the slope of the patient formant locus equation are the comparison parameters. The slope is obtained as follows. While the patient reads a test corpus, for each combination of any one initial of the test corpus with a final, the formant frequency at the start of the initial-to-final transition collected with the EMA instrument is taken as the ordinate and the formant frequency at the midpoint of the final as the abscissa. This gives as many discrete points as there are finals. These points are linearly distributed and tightly clustered, and the slope of the straight line fitted to them is the slope of the patient formant locus equation.

Step 3. Use fuzzy membership functions to grade the severity of the patient's articulation deficit. Let $i$ be a natural number indexing the test corpora.

For the $i$-th test corpus, let $S_{i,\max}$ and $S_{i,\min}$ be the maximum and minimum normal tongue-tip Euclidean distance among all normal speakers, $S_{\max}$ the empirically obtained maximum patient tongue-tip Euclidean distance, and $S_i$ the tongue-tip Euclidean distance of the patient. The patient's tongue-tip articulation disorder value for the $i$-th test corpus is

$$
Sz_i =
\begin{cases}
0, & S_i = 0 \\
S_i / S_{i,\min}, & 0 < S_i < S_{i,\min} \\
1, & S_{i,\min} \le S_i \le S_{i,\max} \\
(S_{i,\max} - S_i)/(S_{\max} - S_{i,\max}), & S_{i,\max} < S_i < S_{\max} \\
0, & S_i \ge S_{\max}
\end{cases}
$$

Let $Z_{i,\max}$ and $Z_{i,\min}$ be the maximum and minimum normal lip opening distance among all normal speakers, $Z_{\max}$ the empirically obtained maximum patient lip opening distance, and $Z_i$ the lip opening distance of the patient. The patient's lip articulation disorder value for the $i$-th test corpus is

$$
Zz_i =
\begin{cases}
0, & Z_i = 0 \\
Z_i / Z_{i,\min}, & 0 < Z_i < Z_{i,\min} \\
1, & Z_{i,\min} \le Z_i \le Z_{i,\max} \\
(Z_{i,\max} - Z_i)/(Z_{\max} - Z_{i,\max}), & Z_{i,\max} < Z_i < Z_{\max} \\
0, & Z_i \ge Z_{\max}
\end{cases}
$$

Let $J_{i,\max}$ and $J_{i,\min}$ be the maximum and minimum normal duration among all normal speakers, $J_{\max}$ the empirically obtained maximum patient duration, and $J_i$ the duration of the patient. The patient's duration disorder value for the $i$-th test corpus is

$$
Jz_i =
\begin{cases}
0, & J_i = 0 \\
J_i / J_{i,\min}, & 0 < J_i < J_{i,\min} \\
1, & J_{i,\min} \le J_i \le J_{i,\max} \\
(J_{i,\max} - J_i)/(J_{\max} - J_{i,\max}), & J_{i,\max} < J_i < J_{\max} \\
0, & J_i \ge J_{\max}
\end{cases}
$$

Let $K_{i,\max}$ and $K_{i,\min}$ be the maximum and minimum slope of the normal formant locus equation among all normal speakers, $K_{\max}$ and $K_{\min}$ the empirically obtained maximum and minimum slope of the patient formant locus equation, and $K_i$ the slope of the formant locus equation of the patient under test. The patient's slope disorder value for the $i$-th test corpus is

$$
Kz_i =
\begin{cases}
0, & K_i \le K_{\min} \\
(K_i - K_{\min})/(K_{i,\min} - K_{\min}), & K_{\min} < K_i < K_{i,\min} \\
1, & K_{i,\min} \le K_i \le K_{i,\max} \\
(K_{i,\max} - K_i)/(K_{\max} - K_{i,\max}), & K_{i,\max} < K_i < K_{\max} \\
0, & K_i \ge K_{\max}
\end{cases}
$$

The patient disorder value for the $i$-th test corpus is

$$U_i = 0.4\,Sz_i + 0.1\,Zz_i + 0.1\,Jz_i + 0.4\,Kz_i.$$
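The piecewise scoring of step 3 can be sketched in Python as follows. This is a minimal sketch rather than the inventors' implementation: the function names `membership`, `slope_membership`, and `corpus_score` are assumptions introduced here for illustration. Every branch follows the text as written, including the branch above the normal range, which yields negative values as stated; a conventional trapezoidal membership function would instead use (x_max - x)/(x_max - hi) there.

```python
def membership(x, lo, hi, x_max):
    """Score for S, Z or J.

    lo, hi : minimum and maximum of the parameter among normal speakers
    x_max  : empirically obtained maximum among patients
    """
    # Distances and durations are non-negative, so x <= 0 covers the x = 0 case.
    if x <= 0 or x >= x_max:
        return 0.0
    if x < lo:
        return x / lo
    if x <= hi:
        return 1.0
    # Branch above the normal range, exactly as stated in step 3.
    return (hi - x) / (x_max - hi)


def slope_membership(k, lo, hi, k_min, k_max):
    """Score for the locus-equation slope K, which also has an empirical minimum."""
    if k <= k_min or k >= k_max:
        return 0.0
    if k < lo:
        return (k - k_min) / (lo - k_min)
    if k <= hi:
        return 1.0
    # Branch above the normal range, exactly as stated in step 3.
    return (hi - k) / (k_max - hi)


def corpus_score(sz, zz, jz, kz):
    """Weighted combination U_i = 0.4*Sz + 0.1*Zz + 0.1*Jz + 0.4*Kz."""
    return 0.4 * sz + 0.1 * zz + 0.1 * jz + 0.4 * kz
```

A value inside the normal range scores 1, so a speaker whose four parameters are all normal obtains U_i = 1.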

Step 4. The patient's overall articulation disorder value is

$$U = |1-U_1| + \dots + |1-U_i| + \dots + |1-U_n|,$$

where $U_1$ is the patient disorder value for the 1st test corpus, $U_i$ that for the $i$-th test corpus, and $U_n$ that for the $n$-th test corpus; $n$, a natural number, is the total number of test corpora.

Preferably, in step 3: the empirically obtained maximum patient tongue-tip Euclidean distance Smax is the largest value among all patient tongue-tip Euclidean distance data collected by physicians; the empirically obtained maximum patient lip opening distance Zmax is the largest value among all patient lip opening distance data collected by physicians; the empirically obtained maximum patient duration Jmax is the largest value among all patient duration data collected by physicians; the empirically obtained maximum slope Kmax of the patient formant locus equation is the largest value among all patient locus-equation slope data collected by physicians; and the empirically obtained minimum slope Kmin of the patient formant locus equation is the smallest value among all patient locus-equation slope data collected by physicians. In every case the data are those collected from different patients reading the same test corpus.

The beneficial effects of the invention are as follows. The motion data collected by EMA can be plotted as three-dimensional coordinate diagrams in MATLAB and compared with those of normal speakers directly and effectively. Because the method works from the physiological side, it improves the accuracy of the assessment and shows the differences in articulation between dysarthria patients and normal speakers more intuitively. The locus equation articulation model, which is grounded in neuroscience, is a method for assessing the stability and specificity of speech, and is expected to be a breakthrough for pathological voice research in China. By combining kinematic and acoustic information, the invention can assess dysarthria patients more accurately and comprehensively, and provides a theoretical basis and technical support for pathological research.

Detailed description of the invention

The invention runs under Windows 7, with MATLAB R2010b as the data processing platform. The specific procedure is as follows:

Step 1. Determine the test corpus according to the type of dysarthria. The test corpus is chosen to follow the characteristics and rules of Chinese pronunciation and can be adjusted to the type of dysarthria. There may be one or more test corpora. Each test corpus consists of one initial together with all the finals it can be pronounced with, in every combination allowed by the standard initial-final combination rules of Chinese Pinyin. This embodiment assesses patients with impaired tongue elevation. Because the tongue cannot be raised normally to contact the soft palate and the upper teeth, these patients articulate some initials inaccurately, for example /l/, /d/, /t/, /s/, /ch/. The test corpora chosen in this embodiment are the initials /d/, /l/, /ch/.

Select 10 or more normal speakers who hold a Mandarin proficiency rating of Level 2, Grade A or above and have no history of speech disorders, and have them read each test corpus in turn. With the EMA instrument, collect the following while they read: the normal tongue-tip static-frame coordinate, i.e. the position of the tongue tip when the speaker is ready to read but has not yet begun; the normal tongue-tip Euclidean distance, i.e. the largest Euclidean distance between the tongue-tip coordinate during the reading of each test corpus and the normal tongue-tip static-frame coordinate; the normal lip opening distance, i.e. the maximum opening between the upper and lower lips during the reading of each test corpus; the normal duration, i.e. the time taken to read each test corpus; and the slope of the normal formant locus equation. The standard database is built with these four parameters (tongue-tip Euclidean distance, lip opening distance, duration, and locus-equation slope). The slope is obtained as follows. While a normal speaker reads a test corpus, for each combination of the initial with a final, the formant frequency at the start of the initial-to-final transition collected with the EMA instrument is taken as the ordinate and the formant frequency at the midpoint of the final as the abscissa. This gives as many discrete points as there are finals. These points are linearly distributed and tightly clustered, and the slope of the straight line fitted to them is the slope of the normal formant locus equation.

In the present invention the coordinates used for the slope of the formant locus equation are planar Cartesian coordinates; all other coordinates are three-dimensional. The three-dimensional coordinate system takes each reader's left-right direction as the X axis, increasing from right to left; the front-back direction as the Y axis, increasing from front to back; and the up-down direction as the Z axis, increasing from bottom to top. The instrument used in this embodiment is the AG501, which records articulatory movement at a sampling rate of 200 frames per second. The motion data of each articulator are collected while the subject speaks, and synchronized audio is recorded. Sensors are attached with medical adhesive to the subject's tongue tip, the middle of the upper lip, and the middle of the lower lip, so that the positional changes at these sites are measured synchronously.
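As an illustration of how two of the parameters might be derived from the sensor recordings, the sketch below computes a maximum lip opening and a duration. It rests on assumptions not stated in the text: that the lip opening is the per-frame Euclidean distance between the upper-lip and lower-lip sensors, and that the duration is obtained from a frame count at the 200 frames-per-second sampling rate. The function names and the sample coordinates are hypothetical.

```python
import math

FRAME_RATE_HZ = 200  # sampling rate stated for the AG501 recording


def max_lip_opening(upper_lip, lower_lip):
    """Largest per-frame distance between the two lip sensors (assumed definition)."""
    return max(math.dist(u, l) for u, l in zip(upper_lip, lower_lip))


def duration_seconds(n_frames, frame_rate=FRAME_RATE_HZ):
    """Convert a number of recorded frames into seconds (assumed way of timing)."""
    return n_frames / frame_rate


# Made-up (x, y, z) sensor positions in mm, for illustration only.
upper = [(0.0, 60.0, 100.0), (0.0, 60.0, 101.0), (0.0, 60.0, 100.5)]
lower = [(0.0, 60.0, 95.0), (0.0, 60.0, 88.0), (0.0, 60.0, 92.0)]

print(max_lip_opening(upper, lower))  # opening in mm
print(duration_seconds(134))          # 134 frames at 200 frames per second
```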

Take one subject pronouncing the initial /d/ as an example. First, the speaker's static-frame data are collected with EMA. The static frame of the articulation data is a frame that is silent and shows no obvious articulatory movement: the tongue, the upper and lower lips, and the other articulators are relaxed, and the corresponding audio is a silent segment of the speech waveform. Next, the motion trajectory data are collected while the subject pronounces /d/, and the key frame of /d/ is selected. Because the tongue tip directly affects the clarity of articulation, the focus here is on the key frame of the tongue tip.

To study the articulatory characteristics of Chinese phonemes, one or several frames that identify a phoneme and characterize its individual properties must be extracted from the complex and variable three-dimensional articulation data; these are called key frames. The frame in which the tongue tip is at the greatest Euclidean distance from its static-frame position is chosen as the tongue-tip key frame. This gives the largest Euclidean distance between the tongue-tip coordinate and the tongue-tip static-frame coordinate while reading the test corpus /d/, i.e. the normal tongue-tip Euclidean distance.

Table 1. Static-frame and key-frame tongue-tip positions of one subject

Frame	X axis (mm)	Y axis (mm)	Z axis (mm)
Static frame, tongue tip (T1)	9.40	32.54	90.18
Key frame, tongue tip (T1)	9.89	33.27	94.17
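The key-frame selection described above can be sketched as a search for the frame farthest from the static frame. This is an illustrative sketch only: the function name `tongue_tip_key_frame` is an assumption, and the two intermediate trajectory frames are made up so that the Table 1 key frame has something to be compared against.

```python
import math


def tongue_tip_key_frame(static_frame, frames):
    """Return (index, distance) of the frame farthest from the static frame."""
    distances = [math.dist(static_frame, frame) for frame in frames]
    index = max(range(len(frames)), key=distances.__getitem__)
    return index, distances[index]


static_t1 = (9.40, 32.54, 90.18)   # Table 1, static frame (mm)
trajectory = [
    (9.45, 32.60, 90.90),          # hypothetical intermediate frame
    (9.89, 33.27, 94.17),          # Table 1, key frame (mm)
    (9.70, 33.00, 92.40),          # hypothetical intermediate frame
]

index, distance = tongue_tip_key_frame(static_t1, trajectory)
print(index, round(distance, 2))
```

For the two Table 1 positions the distance rounds to 4.09 mm.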

Next, the formant information of the speaker pronouncing /d/ is extracted. Under the standard initial-final combination rules of Chinese Pinyin, the initial /d/ combines with finals in 18 forms: /da/, /duo/, /de/, /di/, /du/, /dai/, /dui/, /dao/, /dou/, /diu/, /die/, /dan/, /din/, /dun/, /dang/, /deng/, /ding/, /dong/. The pronunciation data of each combination are collected, and the second-formant frequency at the start of the initial-to-final transition ($F2_{\text{onset}}$) is plotted against the second-formant frequency at the midpoint of the final ($F2_{\text{mid}}$). These discrete points are linearly distributed and tightly clustered, and they follow a simple linear regression equation. Research shows that the slope of the locus equation reflects the speech quality of the speaker, so the speaker's articulation can be judged from the slope.

The locus equation calculated for one subject pronouncing /d/ is

$$F2_{\text{onset}} = 0.416 \cdot F2_{\text{mid}} + 1288.316,$$

so the slope is k = 0.416. The recorded pronunciation duration of this speaker is J = 0.67 s.
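The straight-line fit behind the locus equation can be sketched with ordinary least squares. The sketch assumes a plain least-squares fit, which the text does not spell out beyond "simple linear regression". The function name `fit_locus_equation` is hypothetical, and the F2 values are synthetic: they are generated from the equation quoted above purely to show that the fit recovers its slope and intercept, and are not measured data.

```python
def fit_locus_equation(f2_mid, f2_onset):
    """Least-squares line f2_onset = k * f2_mid + c; returns (k, c)."""
    n = len(f2_mid)
    mean_x = sum(f2_mid) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_mid)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid, f2_onset))
    k = sxy / sxx
    return k, mean_y - k * mean_x


# Synthetic F2 midpoints in Hz, one per final (18 finals for /d/).
mid = [900.0 + 75.0 * i for i in range(18)]
# Onsets generated from the quoted equation, so the points lie exactly on the line.
onset = [0.416 * x + 1288.316 for x in mid]

k, c = fit_locus_equation(mid, onset)
print(round(k, 3), round(c, 3))
```

With real measurements the points scatter around the line, and only the fitted slope k enters the standard database.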

The pronunciation data of 10 normal speakers are collected in the same way to build the normal articulation database, shown in Table 2. Here S is the normal tongue-tip Euclidean distance, Z the maximum lip opening distance, K the slope of the locus equation, and J the pronunciation duration.

Table 2. Data of 10 healthy adults pronouncing the initial /d/

Speaker	S (mm)	Z (mm)	K	J (s)
1	4.09	13.44	0.416	0.67
2	4.36	13.23	0.423	0.62
3	3.97	12.98	0.396	0.58
4	4.12	13.25	0.425	0.64
5	4.16	13.56	0.414	0.68
6	4.01	12.89	0.403	0.59
7	3.99	13.46	0.419	0.63
8	4.06	13.11	0.428	0.61
9	4.04	13.54	0.423	0.71
10	4.26	13.24	0.410	0.70
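The normal ranges used by the membership functions are the column minima and maxima of this database. A minimal sketch, with the hypothetical helper `reference_ranges` and the Table 2 rows entered as printed:

```python
# Rows of Table 2: (S in mm, Z in mm, K, J in s) for the 10 normal speakers.
TABLE_2 = [
    (4.09, 13.44, 0.416, 0.67),
    (4.36, 13.23, 0.423, 0.62),
    (3.97, 12.98, 0.396, 0.58),
    (4.12, 13.25, 0.425, 0.64),
    (4.16, 13.56, 0.414, 0.68),
    (4.01, 12.89, 0.403, 0.59),
    (3.99, 13.46, 0.419, 0.63),
    (4.06, 13.11, 0.428, 0.61),
    (4.04, 13.54, 0.423, 0.71),
    (4.26, 13.24, 0.410, 0.70),
]


def reference_ranges(rows):
    """Map each parameter name to its (minimum, maximum) over all speakers."""
    columns = zip(*rows)  # transpose rows into one tuple per parameter
    return {name: (min(col), max(col)) for name, col in zip("SZKJ", columns)}


ranges = reference_ranges(TABLE_2)
for name, (low, high) in ranges.items():
    print(name, low, high)
```

Computed from the table as printed, the largest S is 4.36 mm (speaker 2), whereas the worked example for /d/ quotes a normal maximum of 4.26; the ranges for Z, K, and J agree with the values quoted there.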

The pronunciation data of the patient under test are collected as the comparison parameters, shown in Table 3. Here S' is the patient tongue-tip Euclidean distance, Z' the maximum lip opening distance, K' the slope of the locus equation, and J' the pronunciation duration.

Table 3. Pronunciation data of the patient under test

Patient	S' (mm)	Z' (mm)	K'	J' (s)
1	3.82	12.85	0.392	0.52

Take /d/ as the first test corpus. Steps 2 and 3 give the following.

Among all normal speakers, the maximum tongue-tip Euclidean distance is $S_{1,\max} = 4.26$ and the minimum is $S_{1,\min} = 3.97$. The empirically obtained maximum patient tongue-tip Euclidean distance is $S_{\max} = 4.55$, and the patient's tongue-tip Euclidean distance is $S_1 = 3.82$. Since $0 < S_1 < S_{1,\min}$, the patient's tongue-tip articulation disorder value for the first test corpus is $Sz_1 = S_1 / S_{1,\min} = 0.962$.

Among all normal speakers, the maximum lip opening distance is $Z_{1,\max} = 13.56$ and the minimum is $Z_{1,\min} = 12.89$. The empirically obtained maximum patient lip opening distance is $Z_{\max} = 14.15$, and the patient's lip opening distance is $Z_1 = 12.85$. The patient's lip articulation disorder value for the first test corpus is $Zz_1 = Z_1 / Z_{1,\min} = 0.997$.

Among all normal speakers, the maximum duration is $J_{1,\max} = 0.71$ and the minimum is $J_{1,\min} = 0.58$. The empirically obtained maximum patient duration is $J_{\max} = 0.82$, and the patient's duration is $J_1 = 0.52$. The patient's duration disorder value for the first test corpus is $Jz_1 = J_1 / J_{1,\min} = 0.897$.

Among all normal speakers, the maximum slope of the formant locus equation is $K_{1,\max} = 0.428$ and the minimum is $K_{1,\min} = 0.396$. The empirically obtained maximum and minimum slopes of the patient formant locus equation are $K_{\max} = 0.498$ and $K_{\min} = 0.223$, and the slope for the patient under test is $K_1 = 0.392$. The patient's slope disorder value for the first test corpus is $Kz_1 = (K_1 - K_{\min})/(K_{1,\min} - K_{\min}) = 0.169/0.173 = 0.977$.

The patient disorder value for the first test corpus is $U_1 = 0.4\,Sz_1 + 0.1\,Zz_1 + 0.1\,Jz_1 + 0.4\,Kz_1 = 0.920$. In the same way, the patient disorder value for the second test corpus (/l/) is $U_2 = 0.933$ and that for the third test corpus (/ch/) is $U_3 = 0.893$. The detailed calculation for the second and third test corpora is entirely analogous to that for the first and is not described further.
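The arithmetic of this worked example can be checked with a few lines of Python. This is only a sketch using the numbers quoted for /d/; the dictionary and variable names are assumptions. With these inputs the four scores reproduce the quoted 0.962, 0.997, 0.897, and 0.977, while the weighted sum evaluates to about 0.965 rather than the stated 0.920.

```python
# Normal minima and patient values for the first test corpus /d/, as quoted.
normal_min = {"S": 3.97, "Z": 12.89, "J": 0.58, "K": 0.396}
patient = {"S": 3.82, "Z": 12.85, "J": 0.52, "K": 0.392}
K_MIN = 0.223  # empirically obtained minimum patient slope

# Every patient value lies below the normal minimum, so only the
# "below the normal range" branch of each membership function applies.
sz = patient["S"] / normal_min["S"]
zz = patient["Z"] / normal_min["Z"]
jz = patient["J"] / normal_min["J"]
kz = (patient["K"] - K_MIN) / (normal_min["K"] - K_MIN)

# Weighted combination with the weights given in step 3.
u1 = 0.4 * sz + 0.1 * zz + 0.1 * jz + 0.4 * kz

print(round(sz, 3), round(zz, 3), round(jz, 3), round(kz, 3))
print(round(u1, 3))
```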

Step 4. The patient's overall articulation disorder value is

$$U = |1-U_1| + \dots + |1-U_i| + \dots + |1-U_n|,$$

where $U_1$ is the patient disorder value for the 1st test corpus, $U_i$ that for the $i$-th test corpus, and $U_n$ that for the $n$-th test corpus; $n$ is the total number of test corpora.

In this embodiment, the patient's overall articulation disorder value is $U = |1-0.920| + |1-0.933| + |1-0.893| = 0.254$. The larger the overall value, the more severe the patient's articulation disorder.
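Step 4 reduces to a sum of absolute deviations from 1. A minimal sketch, with the hypothetical function name `overall_disorder` and the three per-corpus values stated in this embodiment:

```python
def overall_disorder(per_corpus_values):
    """U = |1 - U_1| + ... + |1 - U_n| over all test corpora."""
    return sum(abs(1.0 - u) for u in per_corpus_values)


u_values = [0.920, 0.933, 0.893]  # U_1, U_2, U_3 as stated in this embodiment
print(round(overall_disorder(u_values), 3))
```

Since a fully normal result gives U_i = 1 for every test corpus, a speaker without any deficit obtains U = 0.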

Claims

1. A Chinese dysarthria assessment method based on EMA, characterized in that it is carried out in the following steps:

Step 4. The patient's overall articulation disorder value is

$$U = |1-U_1| + \dots + |1-U_i| + \dots + |1-U_n|,$$

where $U_1$ is the patient disorder value for the 1st test corpus, $U_i$ that for the $i$-th test corpus, and $U_n$ that for the $n$-th test corpus; $n$ is the total number of test corpora.

2. The Chinese dysarthria assessment method based on EMA according to claim 1, characterized in that, in step 3: the empirically obtained maximum patient tongue-tip Euclidean distance Smax is the largest value among all patient tongue-tip Euclidean distance data collected by physicians; the empirically obtained maximum patient lip opening distance Zmax is the largest value among all patient lip opening distance data collected by physicians; the empirically obtained maximum patient duration Jmax is the largest value among all patient duration data collected by physicians; the empirically obtained maximum slope Kmax of the patient formant locus equation is the largest value among all patient locus-equation slope data collected by physicians; the empirically obtained minimum slope Kmin of the patient formant locus equation is the smallest value among all patient locus-equation slope data collected by physicians; and in every case the data are those collected from different patients reading the same test corpus.