CN102799759A - Vocal tract morphological standardization method during large-scale physiological pronunciation data processing - Google Patents


Info

Publication number
CN102799759A
Authority
CN
China
Prior art keywords
thin plate
point
data
physiology
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101965474A
Other languages
Chinese (zh)
Inventor
魏建国
陈龙
党建武
宋婵
王宇光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN2012101965474A
Publication of CN102799759A

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for standardizing vocal tract morphology when processing large-scale physiological pronunciation data. The method comprises three steps: first, a template is established; second, landmark points are calibrated; finally, the parameters of each speaker's thin-plate spline function are determined from the correspondence between the landmark points of the template and those of the speaker model. Compared with the traditional linearization-based standardization method, the disclosed method standardizes the vocal tracts of different speakers morphologically, so that the physiological pronunciation data are standardized while the kinematic characteristics and spatial relationships of articulation are preserved.

Description

Vocal tract morphological standardization method during large-scale physiological pronunciation data processing
Technical field
The present invention relates to the field of speech articulation morphology analysis and processing, and in particular to a method for standardizing vocal tract morphology.
In research on the physiology of speech production, differences in vocal tract shape between subjects make it very difficult to study and model the essential characteristics of articulatory movement. For large-scale data in particular, standardizing the data of different speakers by hand is hardly feasible. A method for standardizing vocal tract shape based on the thin-plate spline function is therefore proposed. Compared with the widely used linearization-based standardization methods, this method effectively reduces morphological differences while preserving each subject's individual characteristics. It plays an important role in processing large-scale physiological pronunciation data.
Background technology
Phonetics is the discipline that studies the sounds of human language. It has two main branches: articulatory phonetics, which studies how the vocal organs act during speech production, and acoustic phonetics, which studies the acoustic properties of speech. Early phonetics concentrated on acoustic properties; today a growing number of researchers study the mechanisms of the vocal organs during articulation. However, researchers cannot yet exploit physiological pronunciation databases for experiments as fully as they can in acoustic speech research. Apart from the difficulty of acquiring physiological pronunciation data, vocal tract shape also differs between subjects. Eliminating these differences requires morphological standardization of the vocal tract shape, and standardization technology remains a bottleneck in articulatory research. Standardizing the physiological pronunciation data of different speakers to reduce their morphological differences is therefore an indispensable step in uncovering the articulatory essence and kinematic characteristics hidden behind the speech of different speakers. After standardization, the physiological pronunciation model should show reduced morphological differences between individuals while retaining the kinematic characteristics of the vocal organs during articulation, which also facilitates the simulation of articulation.
Because the vocal tract changes shape greatly during articulation, standardizing it with a simple rigid affine transformation alone is very difficult. Researchers in articulatory phonetics have proposed several vocal tract standardization methods, but all are based on linearization. Beckman et al. [1] applied a coordinate transformation to recorded MRI data by linearizing the vocal tract wall, thereby standardizing the data. Hashi et al. [2] standardized vocal tract posture during vowel production and formed an X-ray database. Both methods standardize the vocal tract along its length by linearizing the maxilla wall contour. Although linearization reduces differences between speakers, the experimental data in [6] show that morphological differences between speakers depend not only on vocal tract length but also on the sizes of the front and back cavities of the vocal tract. After standardization, linearization loses the relative spatial position of the maxilla and tongue surface contours, and it also loses the nonlinear relations in the morphological differences between speakers. For data collected where the vocal tract deforms strongly in particular, important nonlinear relations are lost, and the differences between individuals along the x axis may even increase.
In image registration and shape matching, a non-rigid standardization method based on the thin-plate spline mapping function is widely used. It can effectively solve the problems that arise in the linearization-based standardization described above.
Because earlier methods all rely on linearization to standardize the vocal tract, they lose the relative positions within the articulatory space and the nonlinear motion characteristics. To avoid losing vocal tract shape information, this study proposes a thin-plate-spline-based method for standardizing EMMA physiological pronunciation data across speakers. EMMA data from three speakers were used; averaging the three speakers' maxilla and tongue contours yields the template of the standardized vocal tract. An existing grid system is then used to mark landmark points in each of the three speakers' articulatory spaces and in the averaged template space. The thin-plate spline transformation function is determined from the correspondence between the landmark points in each speaker's articulatory space and those in the template space. This function then transforms the coordinates of the physiological pronunciation data, achieving standardization.
Summary of the invention
In view of the problems of the prior art described above, the present invention proposes a vocal tract morphological standardization method for large-scale physiological pronunciation data processing. It uses a non-rigid standardization based on the thin-plate spline mapping function to overcome the defects of linearization-based standardization. For each speaker, a thin-plate spline function is determined that transforms coordinates from that speaker's articulatory space to the template articulatory space. This preserves the relative position of the maxilla and tongue within the articulatory space as well as the nonlinear motion characteristics of the organs during articulation, and the standardization ultimately reduces the morphological differences of the vocal tract between individuals.
The method is characterized in that it comprises the following steps:
Step 1: obtain multiple sets of maxilla and tongue surface contour data from the physiological pronunciation database, and establish the template of the method from the average vocal tract shape during articulation computed from these data;
Step 2: use a grid system to mark the landmark points in the vocal tract shape of the template from the previous step. Specifically: first, compute the average shape of the tongue surface contour from the data of all vowel articulations in the physiological pronunciation database, then compute the mean position of the tongue surface center point from all tongue surface data in the database; determine the grid system from the average tongue surface contour and the mean position of the tongue surface center point; divide the resulting grid system into ten sectors of equal size, so that the ten sectors cover the whole space in which the vocal tract moves during articulation and the edges of the sectors intersect the maxilla curve, the median curve, the tongue surface curve and the curve beneath the tongue surface, giving 44 intersection points, which serve as the landmark points of the vocal tract;
Step 3: use the one-to-one correspondence between these landmark points and the original points in the physiological pronunciation database to determine the thin-plate spline function parameters, thereby standardizing vocal tract morphology in physiological pronunciation data processing.
Compared with the traditional linearization-based standardization method, the present invention standardizes the vocal tracts of different speakers morphologically. It standardizes the physiological pronunciation data while preserving the kinematic characteristics and spatial relationships of articulation, which makes it possible to analyze the essential motion of the organs during articulation without having to consider differences between individuals.
Description of drawings
Fig. 1 shows the landmark points of the three subjects and the template;
Fig. 2 shows the raw data before standardization, each subplot showing the data of one vowel;
Fig. 3 shows the data after standardization;
Fig. 4 shows the vocal tract data after standardization with the linearization method;
Fig. 5 compares the standard deviations of the subjects' raw data and standardized data;
Fig. 6 shows each subject's vowel articulation before and after standardization.
Embodiment
The present invention proposes a method based on the thin-plate spline mapping function for standardizing EMMA data across subjects. EMMA data from three speakers were used; averaging the three speakers' maxilla and tongue contours yields the standardized vocal tract template. An existing grid system is then used to mark landmark points in each of the three speakers' articulatory spaces and in the averaged template space. The thin-plate spline transformation function is determined from the correspondence between the landmark points in each speaker's articulatory space and those in the template space, and this function then performs the coordinate transformation, achieving standardization.
The embodiments, structure, features and effects of the present invention are described in detail below with reference to the drawings and preferred embodiments.
The thin-plate-spline-based standardization takes three steps: first, the template is established; second, the landmark points are calibrated; finally, the parameters of each speaker's thin-plate spline function are determined from the correspondence between the landmark points in the template and in the speaker model.
Template establishment
The present invention uses the EMMA database of NTT, which contains the physiological pronunciation data and acoustic speech data of three speakers. The database holds vocal tract contour data captured by an electromagnetic articulograph. The maxilla and tongue surface contours of the three speakers are averaged to remove the morphological differences of the vocal tract between individuals, giving the mean vocal tract contour, which serves as the reference template for standardization.
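The averaging step can be sketched as follows. This is a minimal illustration, assuming each speaker's contour has already been resampled to the same number of points; the function name `build_template` and the sample coordinates are hypothetical and do not come from the patent or the NTT database.

```python
import numpy as np

def build_template(contours):
    """Average per-speaker contours sampled at the same positions.

    contours: array-like of shape (n_speakers, n_points, 2), holding
    (x, y) coordinates. Returns the (n_points, 2) mean contour.
    """
    stacked = np.asarray(contours, dtype=float)
    return stacked.mean(axis=0)

# Made-up maxilla contours (cm) for three speakers, three points each.
maxilla = [
    [[0.0, 2.0], [1.0, 2.6], [2.0, 2.2]],
    [[0.0, 2.2], [1.0, 2.8], [2.0, 2.4]],
    [[0.0, 2.4], [1.0, 3.0], [2.0, 2.6]],
]
template_maxilla = build_template(maxilla)
```

The same function would be applied to the tongue surface contours to complete the template.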
Landmark point calibration
EMMA records two-dimensional data. Unlike imaging systems such as MRI and X-ray, it does not record three-dimensional articulation data and so cannot clearly capture the spatial deformation of the vocal tract organs during articulation. Beautemps et al. proposed in 1995 a model for vowels and fricatives that derives the vocal tract area function from the midsagittal profile and formant frequencies [9]. To address this problem, we use a grid system obtained by modifying that model [9] to mark landmark points in the vocal tract spaces of the three subjects and of the template, so that the spatial movement at different positions during articulation can be determined accurately. First, each speaker's tongue surface contour and maxilla contour are determined. For each speaker, the movement regions of sensors T1 to T4 over all vowel articulations in the database (more than 1,000 in total) are averaged to obtain the center of each sensor region; the line connecting the four centers is that speaker's tongue surface contour. The maxilla curve is likewise obtained by averaging. The position of the center point is then determined from the spatial position of the tongue surface contour. With the tongue surface contour and the center point fixed, 11 rays divide the whole articulatory space into ten sectors of equal angle. Each sector edge intersects the maxilla curve and the tongue surface curve at two points; the midpoints between these pairs of intersections give 11 points whose connecting line is the median curve. The points 1 cm below the tongue surface intersections along the sector edges give a further 11 points, whose connecting line is the curve beneath the tongue surface. In total 44 intersection points are obtained, which serve as the landmark points of the articulatory vocal tract space. Because the landmark points are defined in exactly the same way for the three speakers and the template, landmark points marked at the same position by the same method in the four spaces shown in Fig. 1 are taken to correspond one to one and carry the same label.
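The construction of the 44 landmark points can be sketched as below. This is only an illustration under stated assumptions: the intersections of each ray with the maxilla and tongue surface curves are taken as already known distances from the center point, "below the tongue surface" is interpreted as 1 cm toward the center along the ray, and the fan angles, function names and distances are all hypothetical.

```python
import numpy as np

def ray_directions(start_deg, end_deg, n_rays=11):
    """Unit vectors of n_rays rays that bound n_rays - 1 equal-angle sectors."""
    angles = np.deg2rad(np.linspace(start_deg, end_deg, n_rays))
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def build_landmarks(center, dirs, d_tongue, d_maxilla, offset=1.0):
    """Return the landmark points as a (4 * n_rays, 2) array.

    d_tongue, d_maxilla: distance (cm) from the center along each ray to
    its intersection with the tongue surface and maxilla curves.
    Row order: maxilla, median, tongue surface, beneath tongue surface.
    """
    center = np.asarray(center, dtype=float)
    d_tongue = np.asarray(d_tongue, dtype=float)
    d_maxilla = np.asarray(d_maxilla, dtype=float)
    maxilla = center + d_maxilla[:, None] * dirs
    tongue = center + d_tongue[:, None] * dirs
    median = 0.5 * (maxilla + tongue)             # midpoint of the two intersections
    below = center + (d_tongue - offset)[:, None] * dirs  # 1 cm toward the center
    return np.concatenate([maxilla, median, tongue, below], axis=0)

# Made-up geometry: a fan from 30 to 150 degrees, constant distances.
dirs = ray_directions(30.0, 150.0)
landmarks = build_landmarks([0.0, 0.0], dirs,
                            d_tongue=np.full(11, 3.0),
                            d_maxilla=np.full(11, 4.0))
```

With 11 rays and four curves the result has 44 rows, matching the number of landmark points described above. Using the same row order for every speaker and for the template gives the one-to-one correspondence directly by row index.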
Determination of the thin-plate spline function parameters
The whole of articulation is produced by the elastic contraction of the tongue and the movement of the jaw. After standardization by a rigid affine transformation, the vocal tract shapes of different speakers therefore do not match well. Standardizing vocal tract shape by linearization, for example, not only loses much useful information but actually increases the morphological differences between individuals along the x axis. The thin-plate spline transformation used here is a kind of non-rigid transformation, and it guarantees that the transformation is smooth globally.
The thin-plate transformation function is determined from the one-to-one correspondence of the landmark points of the grid system described above. Suppose there are n landmark points in the two-dimensional vocal tract coordinate system. The warp of the thin-plate spline is described by 2(n+3) parameters: 6 global affine parameters and 2n nonlinear parameters, two per landmark point. Half of the parameters describe the x direction and the other half the y direction. These 2(n+3) parameters can be determined from the linear system given in [7]. Let $(\hat{x}_i, \hat{y}_i)$, $i = 1, \dots, n$, denote the n landmark points in the plane (n = 44 in this experiment), and let $v_i$, $i = 1, 2, \dots, n$, be the corresponding values that the thin-plate spline function takes at these landmark points. The thin-plate spline interpolation function f(x, y) thus expresses the mapping $f(\hat{x}_i, \hat{y}_i) = v_i$. It is defined as follows:
$$f(x, y) = a_1 + a_2 x + a_3 y + \sum_{i=1}^{n} w_i\, r_i^2 \ln r_i^2 \tag{1}$$
In equation (1), $a_1 + a_2 x + a_3 y$ is the linear transformation and $\sum_{i=1}^{n} w_i\, r_i^2 \ln r_i^2$ is the nonlinear transformation. Here $r_i^2 = (x - \hat{x}_i)^2 + (y - \hat{y}_i)^2$ is the squared distance from the point to be transformed in each speaker's space to landmark point i, and x and y are the coordinates of the point inserted into the thin-plate spline f(x, y). Equation (1) describes the deformation of a thin plate of infinite extent loaded at each landmark point $(\hat{x}_i, \hat{y}_i)$. The weight of the load centered at $(\hat{x}_i, \hat{y}_i)$ is $w_i$, the parameter described by Zagorchev and Goshtasby in their 2006 article "A comparative study of transformation functions for nonrigid image registration" [7]. The thin-plate spline interpolation function has two parts: the linear part given by the first three terms, and the remainder, which describes the nonlinear warp of the spline. The values of $a_1$, $a_2$, $a_3$ and $w_i$, and hence the thin-plate spline function, are determined by requiring that the bending energy $E_f$ of f be minimal and that the landmark coordinates correspond one to one. $E_f$ is defined as follows:
$$E_f = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] dx\,dy \tag{2}$$
Equation (2), the standard thin-plate bending energy, measures how strongly f bends. When $E_f$ is smallest, the transformation performed by f(x, y) has the least bending and is closest to a planar transformation of the thin plate.
The following three constraints apply:
$$\sum_{i=1}^{n} w_i = 0 \tag{3}$$
$$\sum_{i=1}^{n} \hat{x}_i w_i = 0 \tag{4}$$
$$\sum_{i=1}^{n} \hat{y}_i w_i = 0 \tag{5}$$
Constraint (3) states that the loads applied to the thin plate must sum to zero, so that the plate stays at rest under the imposed loads rather than moving. Constraints (4) and (5) require that the imposed loads produce no net moment about the x and y axes, so that the plate does not rotate.
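Equation (1) and constraints (3) to (5) can be illustrated with a short sketch. The function names and the sample landmark points, weights and affine coefficients are hypothetical; the kernel is taken as 0 at zero distance, the limit of $r^2 \ln r^2$ as r goes to 0.

```python
import numpy as np

def tps_kernel(r2):
    """Kernel r^2 ln r^2 of equation (1), taken as 0 at r = 0."""
    r2 = np.asarray(r2, dtype=float)
    safe = np.where(r2 > 0, r2, 1.0)  # log(1) = 0, so zero distances contribute 0
    return r2 * np.log(safe)

def tps_eval(x, y, landmarks, a, w):
    """Evaluate f(x, y) of equation (1) at a single point."""
    landmarks = np.asarray(landmarks, dtype=float)
    r2 = (x - landmarks[:, 0]) ** 2 + (y - landmarks[:, 1]) ** 2
    return a[0] + a[1] * x + a[2] * y + np.dot(w, tps_kernel(r2))

def satisfies_constraints(landmarks, w, tol=1e-9):
    """Check constraints (3), (4) and (5) on the weights w."""
    landmarks = np.asarray(landmarks, dtype=float)
    w = np.asarray(w, dtype=float)
    sums = np.array([w.sum(), landmarks[:, 0] @ w, landmarks[:, 1] @ w])
    return bool(np.all(np.abs(sums) < tol))

# Made-up example: four landmark points on the unit square.
square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [1.0, -1.0, -1.0, 1.0]   # chosen so that (3)-(5) hold
affine = [1.0, 2.0, 3.0]           # a1, a2, a3
value = tps_eval(0.0, 0.0, square, affine, weights)
```

When all weights are zero the function reduces to its affine part, which is the case of zero bending energy.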
The TPS parameter vector a has the three components $a_1$, $a_2$ and $a_3$, and the vector w has the n components $w_i$. Both vectors are obtained from the following linear system:
$$\begin{bmatrix} A & P \\ P^{T} & O \end{bmatrix} \begin{bmatrix} w \\ a \end{bmatrix} = \begin{bmatrix} v \\ 0 \end{bmatrix} \tag{6}$$
where $A_{ij} = r_{ij}^2 \ln r_{ij}^2$ with $r_{ij}^2 = (\hat{x}_i - \hat{x}_j)^2 + (\hat{y}_i - \hat{y}_j)^2$ for i, j = 1, ..., n, and n is the number of landmark points, so that A is n × n (its diagonal entries are 0, the limit of $r^2 \ln r^2$ as r tends to 0). Each of the m raw data points to be transformed must also be evaluated against every landmark point as a load center; those kernel values form the separate m × n matrix B of equation (8). The i-th row of the matrix P is the three-element vector $(1, \hat{x}_i, \hat{y}_i)$, and O is the 3 × 3 zero matrix. The 0 on the right-hand side of equation (6) is the three-element zero vector. w, a and v are the vectors formed from the components $w_i$; $a_1$, $a_2$, $a_3$; and $v_i$ respectively. The (n+3) × (n+3) matrix on the left-hand side of equation (6) is denoted K below.
In this study, the landmark points $(\hat{x}_i, \hat{y}_i)$ of each subject's EMMA data are put in correspondence with the landmark points $(\hat{x}'_i, \hat{y}'_i)$ of the template. The aim is to map each coordinate $(x_j, y_j)$ of the EMMA data to the corresponding coordinate $(x'_j, y'_j)$ in the template coordinate system. What matters is therefore the set of 2D points obtained by warping with the thin-plate spline function defined by the many control points. To this end, the x and y coordinates of the data are each mapped with their own TPS function. The warp for the mapping $(\hat{x}_i, \hat{y}_i) \to (\hat{x}'_i, \hat{y}'_i)$ follows from equation (6) and is recovered by:
$$\begin{bmatrix} w_x & w_y \\ a_x & a_y \end{bmatrix} = K^{-1} \begin{bmatrix} \hat{x}' & \hat{y}' \\ 0 & 0 \end{bmatrix} \tag{7}$$
where $\hat{x}'$ and $\hat{y}'$ are the vectors formed from the components $\hat{x}'_i$ and $\hat{y}'_i$ respectively. $w_x$ and $a_x$ are the parameters for the x axis, and $w_y$ and $a_y$ are the parameters for the y axis. The transformation of a point $(x_j, y_j)$ to the coordinate $(x'_j, y'_j)$ is given by:
$$\begin{bmatrix} x' & y' \end{bmatrix} = \begin{bmatrix} B & Q \end{bmatrix} \begin{bmatrix} w_x & w_y \\ a_x & a_y \end{bmatrix} \tag{8}$$
where $B_{ji} = \left((x_j - \hat{x}_i)^2 + (y_j - \hat{y}_i)^2\right) \ln\left((x_j - \hat{x}_i)^2 + (y_j - \hat{y}_i)^2\right)$, i = 1, ..., n, j = 1, ..., m. The j-th row of the matrix Q is the vector $(1, x_j, y_j)$, and the j-th rows of the result vectors x' and y' are the values $x'_j$ and $y'_j$ obtained by substituting $x_j$ and $y_j$ into the formula.
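Equations (6) to (8) can be put together in a short NumPy sketch. It is an illustration only: the function names and the sample landmark coordinates are hypothetical, the landmark points are assumed to be distinct and not all collinear so that K is invertible, and the system is solved with `numpy.linalg.solve` rather than by forming $K^{-1}$ explicitly.

```python
import numpy as np

def tps_kernel(r2):
    """Kernel r^2 ln r^2, taken as 0 at r = 0."""
    r2 = np.asarray(r2, dtype=float)
    safe = np.where(r2 > 0, r2, 1.0)  # log(1) = 0 where the distance is zero
    return r2 * np.log(safe)

def sq_dists(p, q):
    """Squared distances between the rows of p (m, 2) and q (n, 2)."""
    diff = p[:, None, :] - q[None, :, :]
    return (diff ** 2).sum(axis=-1)

def fit_tps(src, dst):
    """Equation (7): parameters mapping landmarks src onto landmarks dst.

    Returns an (n + 3, 2) array; rows 0..n-1 hold (w_x, w_y) and the
    last three rows hold (a_x, a_y).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    P = np.hstack([np.ones((n, 1)), src])
    K = np.zeros((n + 3, n + 3))
    K[:n, :n] = tps_kernel(sq_dists(src, src))  # block A
    K[:n, n:] = P
    K[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(K, rhs)

def apply_tps(points, src, params):
    """Equation (8): map data points (m, 2) into the template space."""
    points = np.asarray(points, dtype=float)
    src = np.asarray(src, dtype=float)
    B = tps_kernel(sq_dists(points, src))
    Q = np.hstack([np.ones((len(points), 1)), points])
    return np.hstack([B, Q]) @ params

# Made-up landmarks: a speaker space and a slightly deformed template space.
speaker = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
template = speaker + np.array([[0, 0], [0.1, 0], [0, -0.1],
                               [0.05, 0.05], [0, 0.2]])
params = fit_tps(speaker, template)
warped = apply_tps(speaker, speaker, params)  # landmarks land on the template
```

Solving both coordinate columns in one call mirrors the two-column right-hand side of equation (7). In the setting described above, `speaker` and `template` would each hold the 44 landmark points, and `apply_tps` would be called on the sensor trajectories.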
The following example illustrates how the thin-plate-spline-based method described above is applied in articulatory research, and evaluates the effect of standardization in both the articulatory and the acoustic domains.
Standardization example in physiological pronunciation
To study the role of the different vocal organs in articulation, feature points that represent their state of motion must be selected. There is no explicit rule for this; based on experience, we chose four positions on the tongue surface from the tip to the root as the feature points of motion during articulation. In the experiment, four sensors T1 to T4 were placed at these feature points as measurement points, and EMMA captured their trajectories. Articulation is studied from the trajectory data of these feature points. However, differences between individuals appear not only in the vocal tract contour but also in the scatter of the feature points' movement regions. To study the movement characteristics of the vocal organs while eliminating the differences between individuals, we apply our nonlinear standardization method. The procedure is as follows:
First, the vocal tract contour positions of the three subjects are averaged to obtain the standardized template contour. The known grid technique is then used to mark landmark points in the vocal tract spaces of the three subjects and of the template. We take four curves: the maxilla line, the median line, the tongue surface line and the line beneath the tongue surface. With the tongue surface center point as the center, the articulatory space is divided into ten sectors; the intersections of the sector edges with the four curves are the landmark points. The landmark correspondence then determines each subject's thin-plate spline transformation parameters to the template, which gives our standardization method. The data coordinates that differ widely in Fig. 2 are standardized with the result shown in Fig. 3: the subjects' vocal tract shapes and trajectories almost coincide, demonstrating that individual differences are markedly reduced.
Evaluation of acoustic and articulatory standardization
To evaluate the effect of the experimental method, we chose the dominant method in this field, maxilla wall linearization, as the baseline. Fig. 4 shows the result of applying maxilla wall linearization to the same physiological pronunciation data. Compared with the TPS-based result in Fig. 3, the result of maxilla wall linearization shows more variation in its spatial distribution. Fig. 5 shows the standard deviations of the raw data of the subjects' tongue surface sensors, of the data standardized by linearization, and of the data standardized by the TPS method. After TPS standardization, the standard deviation of the data over the subjects' sensors and the five vowels is reduced by 0.8 mm along the x axis and by 2.4 mm along the y axis. Fig. 5 also shows that the data obtained by maxilla wall linearization have a larger standard deviation along the x axis than the raw data. This is because the linearization method, which straightens the maxilla wall while keeping the observation points perpendicular to it, increases the spread of the data along the x axis.
To evaluate the acoustic effect of the TPS-based standardization, we studied the acoustic features of the raw and standardized data. The standardized data were input to a physiological articulatory model to produce complete vocal tract shapes; the data of each vowel in 320 different contexts were then synthesized, and the first three formants were computed and compared with the original sounds in the EMMA database. Table 1 shows the mean first three formants of the EMMA data and of the synthesized vowels. The results show that TPS-based standardization preserves the acoustic characteristics of the vowels.
To evaluate whether the dynamic characteristics of speech are preserved after standardization, Fig. 6 plots the vowels in each subject's raw data and standardized data. The vowel structure obtained from each subject's standardized T1 to T4 data is clearly very similar in shape to the original vowel structure. This shows that the standardization method preserves each speaker's individual characteristics while reducing the differences between speakers caused by vocal organ morphology.
In Fig. 5, which compares the standard deviations of the subjects' raw data and standardized data, the left bar is the standard deviation of the original EMMA data, the middle bar is that of the data standardized by linearization, and the right bar is that of the data standardized by the TPS method.
Table 1. Mean and standard deviation of the formants
Compared with traditional linearization-based standardization, the standardization of the present invention not only eliminates the morphological differences between individuals' vocal organs but also preserves the individual kinematic characteristics of articulation. With this method we standardized the trajectories of the feature points at four sensor positions on each subject's tongue surface, from the tip to the root, during articulation. Before standardization, Fig. 2 shows that the subjects differ markedly in vocal tract contour position and that the movement regions of the feature points are widely scattered. After nonlinear standardization based on the thin-plate spline, Fig. 3 shows that the subjects' maxilla curves coincide and the movement regions of the feature points lie in almost the same region of space. The relative position of the maxilla and the tongue surface in the vocal tract model is thus preserved, and the individual kinematic characteristics of each vocal organ during articulation are not lost.
References:
[1] M. E. J. Beckman, T., T.-P. Jung, S.-h. Lee, K. d. Jong, A. K. Krishnamurthy, S. C. Ahalt, K. B. Cohen, and M. J. Collins, "Variability in the production of quantal vowels revisited," J. Acoust. Soc. Am., vol. 97, pp. 471-490, 1995.
[2] M. Hashi, J. R. Westbury, and K. Honda, "Vowel posture normalization," J. Acoust. Soc. Am., vol. 104, pp. 2426-2437, 1998.
[3] F. L. Bookstein, "Principal warps: Thin plate splines and the decomposition of deformations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, pp. 567-585, 1989.
[4] T. Okadome and M. Honda, "Generation of articulatory movements by using a kinematic triphone model," J. Acoust. Soc. Am., pp. 453-463, 2001.
[5] J. Dang and K. Honda, "Construction and control of a physiological articulatory model," J. Acoust. Soc. Am., vol. 115, pp. 853-870, 2004.
[6] C.-S. Yang and H. Kasuya, "Uniform and non-uniform normalization of vocal tracts measured by MRI across male, female and child," IEICE Trans. on Inf. & Syst., vol. E78-D, no. 6, pp. 732-737, 1995.
[7] L. Zagorchev and A. Goshtasby, "A comparative study of transformation functions for nonrigid image registration," IEEE Trans. Image Processing, vol. 15, pp. 529-538, 2006.
[8] J. Lim and M. H. Yang, "A Direct Method for modeling Non-rigid Motion with Thin Plate Spline," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[9] D. Beautemps, P. Badin, and R. Laboissière, "Deriving vocal-tract area function from midsagittal profiles and formants frequencies: A new model for vowels and fricative consonants based on experimental data," Speech Communication, vol. 16, pp. 27-47, 1995.

Claims (1)

1. A method for morphological normalization of the vocal tract in large-scale physiological articulation data processing, characterized in that the method comprises the following steps:
Step 1: obtain multiple groups of palate and tongue surface contour data from a physiological articulation database, and build the groups of templates for this method from the average vocal tract shape during physiological articulation computed from these data;
Step 2: use a grid system to mark the landmarks on the vocal tract shapes of the templates obtained in the previous step. Specifically: first, compute the average shape of the tongue surface contour curve from the data of all vowel articulations in the physiological articulation database, and then compute the mean position of the tongue surface center point from all tongue surface data in the database; determine the grid system from the average shape of the tongue surface contour curve and the mean position of the tongue surface center point; divide the whole grid system into ten sectors of equal size, so that the ten sectors cover the space in which the vocal tract moves during physiological articulation and the edges of each sector intersect the palate curve, the median curve, the tongue surface curve, and the curve below the tongue surface, giving 44 intersection points; these 44 points are taken as the landmarks of the vocal tract;
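The landmark construction in Step 2 can be sketched as follows. One reading consistent with the count of 44 is that ten equal sectors have eleven boundary rays, and each ray meets the four curves once (11 × 4 = 44). The sketch below adopts that reading as an assumption; the function names, the choice of angular range, and the representation of each curve as a polyline are all hypothetical and are not specified in the claim.

```python
import numpy as np

def cross2(a, b):
    """z-component of the 2-D cross product."""
    return a[0] * b[1] - a[1] * b[0]

def ray_polyline_intersection(origin, angle, polyline, eps=1e-9):
    """Nearest point where a ray from `origin` at `angle` meets the polyline, or None."""
    o = np.asarray(origin, dtype=float)
    d = np.array([np.cos(angle), np.sin(angle)])
    pts = np.asarray(polyline, dtype=float)
    best_t = None
    for p, q in zip(pts[:-1], pts[1:]):
        e = q - p
        denom = cross2(d, e)
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this segment
        t = cross2(p - o, e) / denom  # distance along the ray
        u = cross2(p - o, d) / denom  # position along the segment, 0..1
        if t >= 0 and -eps <= u <= 1 + eps and (best_t is None or t < best_t):
            best_t = t
    return None if best_t is None else o + best_t * d

def grid_landmarks(center, curves, start_angle, end_angle, n_sectors=10):
    """Intersect the n_sectors + 1 sector boundaries with every curve.

    With 10 sectors and 4 curves this yields 11 * 4 = 44 landmarks,
    ordered ray by ray and, within a ray, in the order of `curves`.
    """
    angles = np.linspace(start_angle, end_angle, n_sectors + 1)
    landmarks = []
    for angle in angles:
        for curve in curves:
            hit = ray_polyline_intersection(center, angle, curve)
            if hit is None:
                raise ValueError("a sector boundary does not cross one of the curves")
            landmarks.append(hit)
    return np.array(landmarks)
```

Here `center` would be the mean position of the tongue surface center point and `curves` the palate curve, median curve, tongue surface curve, and the curve below the tongue surface, each sampled as an array of (x, y) points.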
Step 3: use the one-to-one correspondence between the above landmarks and the original points in the physiological articulation database to determine the thin-plate spline function parameters, thereby achieving the morphological normalization of the vocal tract in physiological articulation data processing. The specific algorithm comprises:
Suppose there are n coordinate points in the two-dimensional coordinate system of the vocal tract. The warp of the thin-plate spline can then be described by 2(n+3) parameters, consisting of 6 global linear parameters and 2n nonlinear parameters associated with the n landmarks. Half of these parameters describe the x direction and the other half describe the y direction. The 2(n+3) parameters can be determined by the linear system described in [7]. Let $(\hat{x}_i, \hat{y}_i)$, $i = 1, \dots, n$, denote the n landmarks in the plane (44 landmarks in this experiment), and let $v_i$ be the function value that the thin-plate spline function is required to take at landmark i. The thin-plate spline interpolation function f(x, y) is then a mapping from $(\hat{x}_i, \hat{y}_i)$ to $v_i$, that is, $f(\hat{x}_i, \hat{y}_i) = v_i$. The thin-plate spline interpolation function is defined as follows:
$$f(x, y) = a_1 + a_2 x + a_3 y + \sum_{i=1}^{n} w_i \, r_i^2 \ln r_i^2 \qquad (1)$$
In equation (1), $a_1 + a_2 x + a_3 y$ is the linear transformation and $\sum_{i=1}^{n} w_i \, r_i^2 \ln r_i^2$ is the nonlinear transformation, where
$$r_i^2 = (x - \hat{x}_i)^2 + (y - \hat{y}_i)^2$$
is the squared distance from the point to be transformed in each speaker's space to landmark i, and x and y are the coordinates of the point to be transformed, i.e. the point substituted into the thin-plate spline f(x, y). Equation (1) describes the deformation of a thin plate of infinite extent with each landmark coordinate $(\hat{x}_i, \hat{y}_i)$ as a load center; $w_i$ is the weight of the load centered at $(\hat{x}_i, \hat{y}_i)$. The thin-plate spline interpolation function therefore consists of two parts: the linear change described by the first three terms, and the remaining part, which describes the nonlinear change of the spline warp. The values of $a_1$, $a_2$, $a_3$ and $w_i$, and hence the thin-plate spline function, are determined by requiring that the bending energy $E_f$ of the interpolation function f be minimal, together with the one-to-one correspondence of the landmark coordinates. $E_f$ is defined as follows:
$$E_f = \iint_{\mathbb{R}^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx \, dy \qquad (2)$$
Formula (2) is the bending energy. When $E_f$ is minimal, the transformation performed by f(x, y) has the least possible bending and is as close as possible to a transformation within the plane of the thin plate.
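Equation (1) and the definition of the squared distance to each landmark can be evaluated directly. The following is a minimal sketch under stated assumptions: the function names `tps_kernel` and `tps_eval` are hypothetical, and the kernel is taken to be 0 at zero distance (its limit value), which the text does not state explicitly.

```python
import numpy as np

def tps_kernel(r2):
    """r^2 ln r^2 for an array of squared distances, taken as 0 where r^2 = 0."""
    r2 = np.asarray(r2, dtype=float)
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def tps_eval(x, y, landmarks, a, w):
    """Evaluate f(x, y) = a1 + a2*x + a3*y + sum_i w_i r_i^2 ln r_i^2 at one point."""
    landmarks = np.asarray(landmarks, dtype=float)
    # squared distance from (x, y) to every landmark
    r2 = (x - landmarks[:, 0]) ** 2 + (y - landmarks[:, 1]) ** 2
    linear = a[0] + a[1] * x + a[2] * y
    nonlinear = np.sum(np.asarray(w, dtype=float) * tps_kernel(r2))
    return linear + nonlinear
```

With all weights equal to zero the nonlinear term vanishes and f reduces to the linear part, which is the case of zero bending energy.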
The following are three constraints:
$$\sum_{i=1}^{n} w_i = 0 \quad (3) \qquad \sum_{i=1}^{n} \hat{x}_i w_i = 0 \quad (4) \qquad \sum_{i=1}^{n} \hat{y}_i w_i = 0 \quad (5)$$
Constraint (3) states that the sum of all loads applied to the thin plate must be zero, which requires the plate to remain stationary rather than move under the imposed loads. Constraints (4) and (5) require that the moments of the loads with respect to the x axis and the y axis be zero, so that the plate does not rotate under the imposed loads.
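Constraints (3) to (5) can be checked for a given weight vector by stacking the three sums into one vector. The sketch below is illustrative only; `constraint_residuals` is a hypothetical helper name, and the square of four landmarks is a made-up example.

```python
import numpy as np

def constraint_residuals(landmarks, w):
    """Return [sum w_i, sum x_i w_i, sum y_i w_i]; all three are zero when (3)-(5) hold."""
    landmarks = np.asarray(landmarks, dtype=float)
    # row i of P is (1, x_i, y_i), so P^T w stacks the three sums
    P = np.column_stack([np.ones(len(landmarks)), landmarks])
    return P.T @ np.asarray(w, dtype=float)

# Four landmarks on the corners of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
balanced = np.array([1.0, -1.0, 1.0, -1.0])   # loads cancel in sum and in both moments
unbalanced = np.array([1.0, 0.0, 0.0, 0.0])   # a single load violates constraint (3)
print(constraint_residuals(square, balanced))
print(constraint_residuals(square, unbalanced))
```

Alternating loads on the corners of the square satisfy all three constraints, whereas a single load at the origin leaves a nonzero total load.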
The TPS parameter vector a contains the three components $a_1$, $a_2$ and $a_3$, and the vector w contains the n components $w_i$. These two vectors can be computed from the following linear equation:
$$\begin{bmatrix} A & P \\ P^T & O \end{bmatrix} \begin{bmatrix} w \\ a \end{bmatrix} = \begin{bmatrix} v \\ 0 \end{bmatrix} \qquad (6)$$
where
$$A_{ij} = r_{ij}^2 \ln r_{ij}^2, \qquad r_{ij}^2 = (\hat{x}_i - \hat{x}_j)^2 + (\hat{y}_i - \hat{y}_j)^2 .$$
Here n is the number of landmarks and m is the number of raw data coordinate points to be transformed. Because every point to be transformed is evaluated with each of the landmarks as a load center, there are n × m squared distances between data points and landmarks in total; the matrix A itself is built from the n landmarks alone and is n × n, consistent with the (n+3) × (n+3) size of K given below. The i-th row of the matrix P is the three-element vector $(1, \hat{x}_i, \hat{y}_i)$, and O is the 3 × 3 zero matrix. The 0 on the right-hand side of equation (6) is a three-element zero vector. w, a and v are the one-dimensional vectors formed from the components $w_i$; $a_1$, $a_2$, $a_3$; and $v_i$, respectively. In what follows, the (n+3) × (n+3) matrix on the left-hand side of the equation is denoted by K.
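The linear system of equation (6) can be assembled and solved numerically. The following is a minimal sketch rather than a reference implementation: the names `build_K` and `solve_tps` are hypothetical, the diagonal of A is set to 0 (the limit of the kernel at zero distance), and `np.linalg.solve` is used as a generic dense solver, which the text does not prescribe.

```python
import numpy as np

def tps_kernel(r2):
    """r^2 ln r^2, taken as 0 where r^2 = 0."""
    r2 = np.asarray(r2, dtype=float)
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def build_K(landmarks):
    """Assemble the (n+3) x (n+3) matrix K = [[A, P], [P^T, O]] of equation (6)."""
    L = np.asarray(landmarks, dtype=float)
    n = len(L)
    diff = L[:, None, :] - L[None, :, :]
    A = tps_kernel(np.sum(diff ** 2, axis=-1))   # n x n, landmark to landmark
    P = np.column_stack([np.ones(n), L])         # row i is (1, x_i, y_i)
    K = np.zeros((n + 3, n + 3))
    K[:n, :n] = A
    K[:n, n:] = P
    K[n:, :n] = P.T
    return K

def solve_tps(landmarks, v):
    """Solve equation (6) for the weights w (length n) and the linear part a (length 3)."""
    n = len(v)
    rhs = np.concatenate([np.asarray(v, dtype=float), np.zeros(3)])
    sol = np.linalg.solve(build_K(landmarks), rhs)
    return sol[:n], sol[n:]
```

The last three rows of the system are exactly constraints (3) to (5). If the target values v are already a linear function of the landmark coordinates, the solution has w equal to zero and a equal to the coefficients of that linear function.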
Given the correspondence between the landmarks of each subject's EMMA data and the landmarks defined in the template, the focus is on mapping the coordinates $(\hat{x}_i, \hat{y}_i)$ of the EMMA data to the corresponding coordinates $(\hat{x}'_i, \hat{y}'_i)$ in the template coordinate system. What matters is therefore the 2D points obtained by warping with the thin-plate spline function defined by the many control points. To this end, the x coordinate and the y coordinate of the data are each mapped with a TPS function. The thin-plate spline warp that maps $(\hat{x}_i, \hat{y}_i)$ to $(\hat{x}'_i, \hat{y}'_i)$ can be derived from equation (6) and recovered by the following formula:
$$\begin{bmatrix} w_x & w_y \\ a_x & a_y \end{bmatrix} = K^{-1} \begin{bmatrix} \hat{x}' & \hat{y}' \\ 0 & 0 \end{bmatrix} \qquad (7)$$
where $\hat{x}'$ and $\hat{y}'$ are the one-dimensional vectors formed from the components $\hat{x}'_i$ and $\hat{y}'_i$, respectively. $w_x$ and $a_x$ are the parameters for the x axis, and $w_y$ and $a_y$ are the parameters for the y axis. The transformation of a point $(x_j, y_j)$ to the coordinates $(x'_j, y'_j)$ is given by:
$$\begin{bmatrix} x' & y' \end{bmatrix} = \begin{bmatrix} B & Q \end{bmatrix} \begin{bmatrix} w_x & w_y \\ a_x & a_y \end{bmatrix} \qquad (8)$$
where
$$B_{ji} = \left( (x_j - \hat{x}_i)^2 + (y_j - \hat{y}_i)^2 \right) \ln \left( (x_j - \hat{x}_i)^2 + (y_j - \hat{y}_i)^2 \right), \quad i = 1, \dots, n, \quad j = 1, \dots, m.$$
The j-th row of the matrix Q is the vector $(1, x_j, y_j)$, and the j-th rows of the result vectors x' and y' are the values $x'_j$ and $y'_j$ obtained by substituting $x_j$ and $y_j$ into the formula.
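Equations (7) and (8) together amount to fitting one spline per axis on the landmark pairs and then applying it to the raw data points. The sketch below illustrates this under stated assumptions: the function names `fit_tps_warp` and `tps_transform` are hypothetical, the kernel is taken as 0 at zero distance, and the explicit inverse of K in equation (7) is replaced by `np.linalg.solve`, which gives the same parameters without forming the inverse.

```python
import numpy as np

def tps_kernel(r2):
    """r^2 ln r^2, taken as 0 where r^2 = 0."""
    r2 = np.asarray(r2, dtype=float)
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def pairwise_r2(points, landmarks):
    """Squared distances between every point (rows) and every landmark (columns)."""
    diff = points[:, None, :] - landmarks[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def fit_tps_warp(source, target):
    """Equation (7): solve for [w_x w_y; a_x a_y] from n landmark pairs.

    source: (n, 2) speaker landmarks; target: (n, 2) template landmarks.
    Returns W of shape (n, 2) and Aff of shape (3, 2).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(source)
    A = tps_kernel(pairwise_r2(source, source))
    P = np.column_stack([np.ones(n), source])
    K = np.block([[A, P], [P.T, np.zeros((3, 3))]])
    rhs = np.vstack([target, np.zeros((3, 2))])
    params = np.linalg.solve(K, rhs)   # same result as K^{-1} @ rhs
    return params[:n], params[n:]

def tps_transform(points, source, W, Aff):
    """Equation (8): map m raw points into the template space, [x' y'] = [B Q] params."""
    points = np.asarray(points, dtype=float)
    source = np.asarray(source, dtype=float)
    B = tps_kernel(pairwise_r2(points, source))           # m x n
    Q = np.column_stack([np.ones(len(points)), points])   # row j is (1, x_j, y_j)
    return B @ W + Q @ Aff
```

A typical use would fit the warp once per speaker from the 44 landmark pairs and then apply `tps_transform` to every recorded sensor position of that speaker. Applied to the source landmarks themselves, the transform returns the template landmarks.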
CN2012101965474A 2012-06-14 2012-06-14 Vocal tract morphological standardization method during large-scale physiological pronunciation data processing Pending CN102799759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101965474A CN102799759A (en) 2012-06-14 2012-06-14 Vocal tract morphological standardization method during large-scale physiological pronunciation data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101965474A CN102799759A (en) 2012-06-14 2012-06-14 Vocal tract morphological standardization method during large-scale physiological pronunciation data processing

Publications (1)

Publication Number Publication Date
CN102799759A true CN102799759A (en) 2012-11-28

Family

ID=47198868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101965474A Pending CN102799759A (en) 2012-06-14 2012-06-14 Vocal tract morphological standardization method during large-scale physiological pronunciation data processing

Country Status (1)

Country Link
CN (1) CN102799759A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133713A (en) * 2017-11-27 2018-06-08 苏州大学 A kind of method that sound channel area is estimated in the case where glottis closes phase
WO2019034183A1 (en) * 2017-08-17 2019-02-21 厦门快商通科技股份有限公司 Utterance testing method and device, and speech category learning method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034272A (en) * 2010-09-29 2011-04-27 浙江大学 Generating method of individualized maxillofacial soft tissue hexahedral mesh

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANGUO WEI ET AL.: "《Acoustics Speech and Signal Processing(ICASSP),2010 IEEE International Conference on》", 19 March 2010 *
LYUBOMIR ZAGORCHEV ET AL.: "A Comparative Study of Transformation Functions for Nonrigid Image Registration", 《IMAGE PROCESSING,IEEE TRANSACTIONS ON》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019034183A1 (en) * 2017-08-17 2019-02-21 厦门快商通科技股份有限公司 Utterance testing method and device, and speech category learning method and system
CN108133713A (en) * 2017-11-27 2018-06-08 苏州大学 A kind of method that sound channel area is estimated in the case where glottis closes phase
CN108133713B (en) * 2017-11-27 2020-10-02 苏州大学 Method for estimating sound channel area under glottic closed phase

Similar Documents

Publication Publication Date Title
CN101561710B (en) Man-machine interaction method based on estimation of human face posture
Cheng et al. A novel phonology-and radical-coded Chinese sign language recognition framework using accelerometer and surface electromyography sensors
CN109036467B (en) TF-LSTM-based CFFD extraction method, voice emotion recognition method and system
CN108364639A (en) Speech processing system and method
Du et al. Robust iterative closest point algorithm for registration of point sets with outliers
CN102203852B (en) Method for converting voice
CN101159064A (en) Image generation system and method for generating image
CN104008564A (en) Human face expression cloning method
CN103258340B (en) Is rich in the manner of articulation of the three-dimensional visualization Mandarin Chinese pronunciation dictionary of emotional expression ability
CN105701504B (en) Multi-modal manifold embedding grammar for zero sample learning
CN104346824A (en) Method and device for automatically synthesizing three-dimensional expression based on single facial image
CN108537145A (en) Human bodys' response method based on space-time skeleton character and depth belief network
CN101976453A (en) GPU-based three-dimensional face expression synthesis method
Wang et al. Evaluation of Chinese calligraphy by using DBSC vectorization and ICP algorithm
CN110197503A (en) Non-rigid point set method for registering based on enhanced affine transformation
CN103778661A (en) Method for generating three-dimensional motion model of speaker, system and computer thereof
Ryumin et al. Automatic detection and recognition of 3D manual gestures for human-machine interaction
CN106096642A (en) Based on the multi-modal affective characteristics fusion method differentiating locality preserving projections
CN102799759A (en) Vocal tract morphological standardization method during large-scale physiological pronunciation data processing
Gattone et al. A shape distance based on the Fisher–Rao metric and its application for shapes clustering
CN102945550B (en) A kind of method building remote sensing image semanteme based on Gaussian scale-space
CN102750549A (en) Automatic tongue contour extraction method based on nuclear magnetic resonance images
CN106055244B (en) Man-machine interaction method based on Kinect and voice
CN104064187A (en) Sign language conversion voice system
Girin et al. Extending the cascaded gaussian mixture regression framework for cross-speaker acoustic-articulatory mapping

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121128