CN104361620B

CN104361620B - A mouth-shape animation synthesis method based on a comprehensive weighting algorithm

Info

Publication number: CN104361620B
Application number: CN201410712164.7A
Authority: CN
Inventors: 韩慧健; 梁秀霞; 贾可亮; 张锐; 刘峥; other inventors requested that their names not be disclosed
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-11-27
Filing date: 2014-11-27
Publication date: 2017-07-28
Anticipated expiration: 2034-11-27
Also published as: CN104361620A

Abstract

A mouth-shape animation synthesis method based on a comprehensive weighting algorithm, comprising the following steps: the input Chinese text is analyzed and its characters are split into distinct Chinese visual phonemes, which are sent to a speech synthesis system to synthesize a basic visual phoneme stream; a realistic parametric face model is built on the MPEG-4 standard, and visual-phoneme animation frame parameters drive the deformation of the model; a background image is added, and noise is processed and added at graded levels, yielding mouth-shape animation that is vivid, realistic and effective.

Description

A mouth-shape animation synthesis method based on a comprehensive weighting algorithm

Technical field

The present invention relates to the field of facial expression animation research, and more particularly to research on synthesizing mouth-shape animation in which the mouth shape is matched to speech.

Background art

With the continuous progress of computer animation technology, people demand ever higher quality of mouth-shape animation in human-computer interaction. The development of Chinese mouth-shape animation, however, lags behind. On the one hand, mouth-shape animation is a multidisciplinary research direction spanning human-computer interaction, computer graphics, speech and linguistics, and these disciplines have developed unevenly, so building a realistic, highly automated mouth-shape animation system remains a challenging research topic. On the other hand, a quarter of the world's population speaks Chinese, so a Chinese mouth-shape animation system has an extremely broad market; yet because of the inherent complexity of Chinese, research on speech-driven mouth-shape animation systems for Chinese is comparatively scarce and its development comparatively slow. The work of domestic scholars in particular is still at an early stage and lacks accumulated theory and technique, so very little software can design Chinese mouth-shape animation. Well-known mouth-shape animation tools, such as Mimic for Poser and the 3ds Max plug-in Voice-O-Matic, mainly target English and support Chinese poorly.

For English mouth-shape animation, coarticulation models and text-driven, speech-driven and hybrid-driven methods have appeared in succession. Guiard-Marigny et al. proposed a method that synthesizes mouth-shape animation driven jointly by speech and images. Bregler et al. proposed the Video Rewrite method, which uses computer vision to track feature points on the speaker's lips and uses morphing techniques to combine these lip poses into the final mouth-shape animation sequence. Kang Liu and Jörn Ostermann proposed a correspondence between mouth shapes and letter phonemes in English and built face and mouth-shape animation synthesis algorithms on the MPEG-4 animation standard. Research on Chinese mouth-shape animation is scarcer, and the realism of Chinese mouth-shape animation synthesis can hardly reach or surpass the international state of the art in the short term, which makes research on Chinese mouth-shape animation an urgent need. In addition, the prior art gives no consideration to background noise or background images, so the animation is not vivid or realistic enough, scenes cannot be simulated according to actual needs, and the noise cannot be adjusted as required to improve the animation.

Starting from the construction of speech-driven mouth-shape animation synthesis, the present invention studies in depth the design of a three-dimensional lip-region model, the design of lip-motion sequences, a Chinese speech synchronization algorithm and personalized mouth-shape modeling. Given Chinese text as input, it uses information technology to synthesize and output virtual-human mouth-shape animation that is visually highly realistic, with lips well coordinated and synchronized with the speech. By adding a background image, the animation can simulate various scenes as needed; by processing and adding noise at graded levels, the animation becomes vivid and realistic and its effect is improved.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and provide a mouth-shape animation synthesis method based on a comprehensive weighting algorithm. The method outputs virtual-human mouth-shape animation that is visually highly realistic, with lips well coordinated and synchronized with the speech, and the animation is vivid, realistic and effective. The method comprises the following steps in order:

Step 1: Chinese text is input and analyzed, its characters are split into distinct Chinese visual phonemes, and these phonemes are sent to a speech synthesis system to synthesize a basic visual phoneme stream;

Step 2: A realistic parametric face model is built on the MPEG-4 standard, and visual-phoneme animation frame parameters drive the deformation of the model to produce facial mouth-shape animation;

Step 3: Input background noise synchronized with the input Chinese text is obtained from the input Chinese text; the input background noise is analyzed and smoothed to obtain the initial input background noise;

Step 4: From the distinct Chinese visual phonemes into which the characters were split, the phoneme input background noise after phoneme splitting is extracted for each phoneme; the phoneme input background noise is analyzed and smoothed to obtain the initial phoneme input background noise;

Step 5: The initial phoneme input background noise is used to correct the initial input background noise, giving the corrected input background noise;

Step 6: Based on the comprehensive weighting algorithm, the sound time control ratio is obtained, sound weight factors are added, the duration of each single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled so that the synthesized Chinese speech is synchronized with the facial mouth-shape animation;

Step 7: A background image is added according to the animation scene and synchronized with the synthesized Chinese speech and the facial mouth-shape animation;

Step 8: Based on the comprehensive weighting algorithm, the noise time control ratio is obtained, noise weight factors are added, and the noise synchronization time of the corrected input background noise is calculated;

Step 9: According to the needs of animation synthesis, the corrected input background noise is selectively added and synchronized with the composite animation of the synthesized Chinese speech, the facial mouth-shape animation and the background image, producing realistic facial mouth-shape animation.

Analyzing the input Chinese text and splitting its characters into distinct Chinese visual phonemes means dividing each character according to the initials and finals of standard Chinese pinyin, defining the initial part and the rhyme part of the mouth-shape pinyin, and converting the standard pinyin of the character into mouth-shape pinyin composed of initial-part and rhyme-part mouth-shape symbols.
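The division of a syllable into initial and final described above can be sketched as follows. This is a minimal illustration, assuming toneless pinyin input and the 23-entry initial list that counts y and w as initials; the function name `split_pinyin` is hypothetical and does not come from the patent.

```python
# Standard pinyin initials (23, counting y and w).
# Two-letter initials come first so that "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable):
    """Split a toneless pinyin syllable into (initial, final).

    Syllables without an initial (a, o, e, ai, en, er, ...) return an
    empty string as the initial.
    """
    syllable = syllable.lower()
    for initial in INITIALS:
        # Require at least one letter after the initial for the final.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable


if __name__ == "__main__":
    for s in ("dong", "hua", "zhang", "ai"):
        print(s, split_pinyin(s))
```

A syllable with no initial is returned with an empty first element, which lets a later stage substitute a neutral mouth shape for the missing initial part.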

The mouth-shape animation synthesis method based on the comprehensive weighting algorithm of the present invention achieves the following:

(1) Three-dimensional mouth-shape modeling based on a finite feature-point control method: lip-region feature points are chosen or defined according to the facial feature point parameters defined by MPEG-4, the state data of the feature points is tracked and analyzed comprehensively, the lip-region state data is normalized, and three-dimensional mouth-shape modeling is carried out;

(2) Lip-motion sequence design based on weighted control of the initial and rhyme parts: the proportions of time taken by the initial part and the rhyme part are weighted to control the role each plays in animation synthesis;

(3) As an innovation, the influence of punctuation marks in Chinese text on speech pauses during reading is applied to coordinated speech and mouth-shape animation: the pause durations of the various punctuation marks in text reading are analyzed statistically, Chinese punctuation marks are classified by pause duration, and a relational model is built between pause duration and overall reading speed; at the same time, the duration ratio parameters between adjacent lip shapes in the lip-motion sequence model are analyzed, and punctuation pauses and the lip-shape parameter model are combined in weighted processing to realize a Chinese speech mouth-shape animation system in which speech and mouth shape are coordinated and synchronized;

(4) Chinese visual phonemes are classified and mapped to basic pronunciation mouth shapes: according to the mouth-shape features of Chinese phoneme pronunciation, the initial part and rhyme part of pinyin are redivided, the standard initial table is simplified into six basic classes, and the rhyme part is divided into four mouth-shape classes; a cosine function handles the transition as the "initial part" key frame deforms into the "rhyme part" key frame, making the animation smoother and more fluid;

(5) A background image can be added, so that different background images can be selected as required and the animation is presented in different scenes, making it more vivid and realistic;

(6) Noise is processed and added at graded levels, so the noise level can be adjusted to suit different scenes. In a meeting, for example, one can choose no noise or a reduced noise level, so that the meeting is quieter and the audience can follow in an environment where the speech is clearly audible. When background noise needs to be shown, it can be presented as is or at the required level, for example the sound of water or birdsong accompanying the background environment, making the animation more vivid and realistic and improving the effect;

(7) Noise is likewise processed in layers using the comprehensive weighting algorithm, which makes animation synthesis and synchronization more flexible and brings the synthesized and synchronized result closer to what is required; the animation is vivid, realistic and effective.

Brief description of the drawings

Fig. 1 is a flow chart of Chinese speech synchronized mouth-shape processing;

Fig. 2 is a diagram of the facial animation parameter units (FAPU);

Fig. 3 shows the mouth region model;

Fig. 4 compares the actual time-domain waveform of a pronunciation with the animation synthesis control under sound weighting.

Detailed description of the embodiments

Specific implementations of the invention are described in detail below. It must be pointed out that the following embodiments only further illustrate the invention and shall not be construed as limiting its scope of protection; non-essential modifications and adaptations made by those skilled in the art in light of the above summary still fall within the scope of protection of the invention.

Analysis of the mouth-shape features of Chinese pinyin pronunciation

Classified by timbre, the basic units of speech are phonemes, syllables and tones. The phoneme is the smallest unit making up a syllable, or the smallest speech segment: dividing a syllable further by differences in timbre yields smallest units, each with its own characteristics, and these are phonemes. Mandarin pronunciation has 32 phonemes, which fall into two major classes, vowels and consonants: there are 10 vowel phonemes and 22 consonant phonemes. According to the pronunciation features of phonemes described in the Scheme for the Chinese Phonetic Alphabet, combined with the division into initials and finals in standard pinyin, the basic mouth shapes are divided into three levels, as shown in Table 1.

Generally speaking, one Chinese character represents one syllable. The only exception is the er-suffixed (erhua) syllable, a phenomenon of Mandarin pronunciation: the word for "playing", for example, is written as two characters but read as one syllable, "wanr". The present invention considers only the general rules of Chinese pinyin pronunciation. The special case of erhua described above is treated as two syllables: in system processing, "wanr" is analyzed as the two syllables "wan" and "er".

Table 1. Basic mouth-shape classification of Chinese pronunciation

Redefinition scheme for mouth-shape animation pinyin

In Mandarin, initials consist of consonants, 23 in total, including b, p, m, f and so on. There are 38 finals, each formed by a single vowel (such as a, o, e), a combination of two vowels (such as ai, ie) or three vowels (such as iao). Similarly to the division into initials and finals in standard pinyin, the pinyin of each Chinese character is defined as two parts: the initial part (s) and the rhyme part (y). The initial part and the rhyme part each correspond to one mouth-shape state. In mouth-shape animation, each time the character on screen speaks one Chinese character, the mouth shape deforms from the "initial part" key frame into the "rhyme part" key frame. The durations of these two key frames are controlled by sound weighting, so that the two animation durations are allocated sensibly and the animation looks more lifelike.

To make realistic, natural mouth-shape animation, all combinations of initials and finals must be considered. But building one mouth shape for every initial and final and then considering every combination would increase the system's time cost and add repetitive work. The feature classification in Table 1 shows that many phonemes have identical or similar pronunciation mouth shapes. To keep the method fast and easy to operate, the invention therefore adopts a compromise and redefines standard pinyin. According to the mouth-shape features in Table 1, the initial part and rhyme part of pinyin are redivided: the standard initial table is simplified into six basic classes (Table 2), and the rhyme part is divided into four mouth-shape classes (Table 3).

Table 2. Conversion table for standard pinyin initials

Table 3. Conversion table for standard pinyin finals

The initial part is defined mainly by grouping initials whose pronunciation mouth shapes are the same or similar: s-b, lips closed, blocking the airflow; s-f, upper teeth touching the lower lip to form a narrow gap; s-d, mouth slightly open, lips relaxed, with only slight mouth-shape change; s-g, jaw opened to a quarter of its maximum angle, lips relaxed; s-r, lips protruded and tightened; s-y, lips stretched to both sides. Likewise, by mouth-shape feature, the rhyme part is divided into: the y-a mouth shape, used mainly for unrounded finals pronounced with a large lip opening, such as a and an; the y-o mouth shape, used mainly for finals pronounced with slightly rounded lips pushed forward, such as o and ou; the y-e mouth shape, used mainly for finals pronounced with half-open, unrounded lips, such as e and i; and y-o, used mainly for rounded finals pronounced with the lips protruding forward and leaving only a small opening, such as u. Using Table 2 and Table 3, the invention converts the pinyin of every Chinese character into two parts, a mouth-shape initial part and a mouth-shape rhyme part; the two characters of the word "animation", for example, can be expressed as s-d → y-o and s-d → y-a. If s-b, s-d, s-f, s-r, s-y, y-a, y-o, s-g and y-e, y-i are made into 9 mouth-shape models, the transition between the key frames of each pair of models forms the pronunciation mouth-shape animation of one Chinese character.

The method of dividing a character into initial-part and rhyme-part mouth shapes by consonant and vowel applies to essentially all Chinese characters. The only exceptions are a few characters whose pinyin has no initial, such as a, o, e (hungry), ai (love), ei, ao (coat), en (grace) and er (child): in the pinyin division they have only a final. Under the classification above they would have only a rhyme-part mouth shape, so only a single rhyme-part mouth shape would exist in animation synthesis. For uniformity, a fixed initial-part mouth-shape symbol is added to all of them; it is called the natural model and denoted "&". The final result of the pinyin conversion above is shown in Table 4:

Once the initial part and rhyme part of the mouth-shape pinyin have been defined, the next task is conversion, that is, converting the standard pinyin of a Chinese character into mouth-shape pinyin composed of initial-part and rhyme-part symbols. For ease of programming, the mouth-shape labels of the initial part and rhyme part are simplified in this work: the leading "s-" and "y-" are removed and each label is written as a single letter. After simplification there are 10 symbol letters: a, o, e, i, b, d, f, r, y, g. Table 5 gives examples of pinyin conversion for some Chinese characters:

Table 4. Conversion table for single-syllable pinyin

Table 5. Examples of conversion for some Chinese characters
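The label simplification and the "&" placeholder described above can be illustrated with a small sketch. It assumes that the simplified mouth-shape pinyin of one character is simply the initial-part letter followed by the rhyme-part letter; the exact layout of Table 5 is not reproduced here, and the function names are hypothetical.

```python
NATURAL = "&"  # symbol the text assigns to the natural (neutral) mouth model


def simplify(label):
    """Drop the leading 's-' or 'y-', e.g. 's-d' -> 'd', 'y-a' -> 'a'."""
    if label[:2] in ("s-", "y-"):
        return label[2:]
    return label


def mouth_pinyin(initial_class, rhyme_class):
    """Return the two key-frame symbols for one character.

    initial_class is None for syllables that have no initial; they receive
    the natural-model symbol as their initial part.
    Assumption: the two symbols are concatenated, initial part first.
    """
    first = NATURAL if initial_class is None else simplify(initial_class)
    return first + simplify(rhyme_class)


if __name__ == "__main__":
    # The text's example: the two characters of "animation".
    print(mouth_pinyin("s-d", "y-o"), mouth_pinyin("s-d", "y-a"))
    # A syllable without an initial, such as "a".
    print(mouth_pinyin(None, "y-a"))
```

The mapping from concrete initials and finals to the s- and y- classes would come from Tables 2 and 3 and is deliberately left out of the sketch.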

Fig. 2 depicts the definition of the face model in its natural state: the gaze is in the Z direction and all facial muscles are relaxed; the eyelids are tangent to the iris; the pupil is one third of the iris diameter (IRISD0); the upper and lower lips touch naturally; the lip line is horizontal and level with the upper lip; the mouth is closed, and the tongue lies flat with its tip against the boundary between the upper and lower teeth.

Fig. 3 depicts the mouth region model. The recorded data points represent 372 degrees of freedom: the 124 vertices of the model are three-dimensional data in free space. Because only 145 training observations are used, the covariance matrix Σ has a rank of only 144, that is, at most 144 independent degrees of freedom. PCA of the matrix shows that most of the variation is represented by the first 10 components, or "feature points", i.e. the 10 largest eigenvalues λ of the covariance matrix Σ; the mouth-shape animation feature points are chosen on this basis.
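The rank of 144 stated for the covariance matrix follows from centering the data. The short derivation below is a sketch, assuming that Σ is the usual sample covariance of n = 145 observations, each a vector of 124 × 3 = 372 coordinates; the symbols x_k and \bar{x} are introduced here for illustration.

```latex
% Assumption: \Sigma is the sample covariance of n = 145 observations x_k \in \mathbb{R}^{372}.
\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad
\Sigma = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^{\top}

% The centered vectors satisfy one linear constraint:
\sum_{k=1}^{n} (x_k - \bar{x}) = 0

% so they span at most n - 1 dimensions, and therefore
\operatorname{rank}(\Sigma) \le \min(372,\; n - 1) = 144
```

Keeping only the eigenvectors of the 10 largest eigenvalues then reduces each mouth shape from 372 coordinates to 10 coefficients.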

The algorithm that adds time weights to sound to control mouth-shape animation synthesis rests on a basic principle: within the same class of phonemes, the basic mouth-shape changes of the animation are very similar, while across different phonemes they differ greatly. Likewise, in mouth-shape animation synthesis, the change of mouth-shape features between the initial-part and rhyme-part mouth shapes of different phonemes is pronounced; here sound time weight parameters are used to distinguish the differences between animation mouth-shape features.

Let the feature vectors of the time frames in speech segments a and b be X_i and Y_j respectively (1 ≤ i ≤ N_a, 1 ≤ j ≤ N_b). If the Euclidean distance between X_i and Y_j is d_ij, the inter-segment distance between a and b is:

D_{a,b} = \frac{1}{N_{a} N_{b}} \sum_{i=1}^{N_{a}} \sum_{j=1}^{N_{b}} d_{ij} \quad (1)

In this formula, D_{a,b} is the average of all feature-vector distances between a and b, and reflects the overall difference between a and b. Suppose the mouth-shape animation to be segmented is divided into T frames, labeled 1, ..., T. Taking frame t as the boundary and m frames on each side to form two sub-segments, i.e. i ∈ [t−m+1, ..., t] and j ∈ [t+1, ..., t+m], formula (1) gives the inter-segment distance of the two sub-segments as

D_{t} = \frac{1}{m^{2}} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij} \quad (2)
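The two inter-segment distances can be computed directly as the mean Euclidean distance over all frame pairs. The sketch below assumes each frame is a sequence of floats and that frames are numbered from 1 as in the text; the function names are hypothetical.

```python
import math


def segment_distance(a, b):
    """Mean Euclidean distance over all pairs (X_i, Y_j), as in formula (1)."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def boundary_distance(frames, t, m):
    """Distance between the m frames ending at t and the m frames after t.

    t is 1-based as in the text: the left segment is frames t-m+1..t and
    the right segment is frames t+1..t+m, as in formula (2).
    """
    if t - m < 0 or t + m > len(frames):
        raise ValueError("need m frames on each side of frame t")
    left = frames[t - m:t]       # frames t-m+1 .. t
    right = frames[t:t + m]      # frames t+1 .. t+m
    return segment_distance(left, right)


if __name__ == "__main__":
    frames = [(0.0,), (0.0,), (1.0,), (1.0,)]
    print(boundary_distance(frames, t=2, m=2))
```

Scanning t over the sequence and looking for large values of the boundary distance is one plausible way to locate the frame where the mouth shape changes most; the text does not spell out this step, so it is offered only as an interpretation.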

From the sound time control ratio computed with formulas (1) and (2), the sound weight factors w_s and w_y are added, the duration of each single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled:

t_{s} = w_{s} \bar{t}_{p}; \quad t_{y} = w_{y} \bar{t}_{p} \quad (3)

where w_s + w_y = 1.

To obtain an accurate average pronunciation duration \bar{t}_{p} of a Chinese character, pronunciation durations are sampled, with the same reading speed throughout the sampling process. The invention takes M groups of data with N samples each and averages them; after evaluation by the labeling system, the average time \bar{t}_{p} of the data with the smallest variance is taken as the standard duration of a Chinese character's mouth-shape animation for mouth-shape animation synthesis.
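The duration split and the choice of the standard duration can be sketched as follows. The sketch assumes that "smallest variance" means picking, among the M sample groups, the group whose durations vary least and using its mean; the function names and the sample values are illustrative only.

```python
from statistics import mean, pvariance


def standard_duration(groups):
    """Mean duration of the sample group with the smallest variance."""
    best = min(groups, key=pvariance)
    return mean(best)


def split_duration(t_p, w_s):
    """Split the character duration t_p into (t_s, t_y) with w_s + w_y = 1."""
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    w_y = 1.0 - w_s
    return w_s * t_p, w_y * t_p


if __name__ == "__main__":
    # Made-up durations in seconds: M = 2 groups of N = 3 samples.
    groups = [[0.30, 0.50, 0.40], [0.41, 0.40, 0.39]]
    t_p = standard_duration(groups)
    print(t_p, split_duration(t_p, w_s=0.25))
```

Because the two weights sum to one, the initial-part and rhyme-part durations always add up to the standard character duration.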

To account for the influence of Chinese punctuation on continuous mouth-shape changes, the synthesis of continuous animation considers the 7 punctuation marks with longer pauses that occur within or at the end of a sentence, namely the full stop, exclamation mark, question mark, enumeration comma, comma, semicolon and colon, and assigns each a different weight w_bi according to the length of its pause within or at the end of the sentence, as in formula (4):

t_{s}' = w_{s} \bar{t}_{p} w_{bi}; \quad t_{y}' = w_{y} \bar{t}_{p} w_{bi} \quad (4)

w_bi denotes the weight of the i-th punctuation mark. By varying the weight w_bi within a certain range, basic mouth shapes similar to the training set can be generated and used in the continuous animation synthesis channel.
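The punctuation weighting scales both key-frame durations by the weight of the mark that follows the character. In the sketch below, the weight values are invented purely for illustration (the text gives none), and the default weight of 1.0 for characters not followed by one of the seven marks is also an assumption.

```python
# Hypothetical weights for the seven punctuation marks; the patent does not
# publish concrete values, so these numbers are placeholders.
PAUSE_WEIGHTS = {
    "。": 2.0, "！": 2.0, "？": 2.0,   # sentence-final marks
    "；": 1.7, "：": 1.7,             # semicolon, colon
    "，": 1.5,                        # comma
    "、": 1.2,                        # enumeration comma
}


def weighted_durations(t_p, w_s, punct=None):
    """Return (t_s', t_y') = (w_s * t_p * w_b, w_y * t_p * w_b).

    punct is the punctuation mark following the character, if any.
    Assumption: characters with no listed mark use a weight of 1.0.
    """
    w_b = PAUSE_WEIGHTS.get(punct, 1.0)
    return w_s * t_p * w_b, (1.0 - w_s) * t_p * w_b


if __name__ == "__main__":
    print(weighted_durations(0.4, 0.5))        # no punctuation
    print(weighted_durations(0.4, 0.5, "。"))  # before a full stop
```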

Fig. 4, which compares the actual time-domain waveform of a pronunciation with the animation synthesis control under sound weighting, shows that animation synthesis is better after sound weighting: the upper plot is the time-domain waveform of a Chinese syllable, and the lower plot is the animation generation timing under single-syllable sound weighting control.

Continuous smooth control of mouth-shape animation synthesis. To solve the transition problem between two consecutive mouth-shape actions, the invention applies the sound weighting control algorithm to the initials and finals of standard pinyin to control the durations of the initial and final mouth shapes, and uses cosine interpolation in the transition between two mouth-shape animations, interpolating the offset from the end position of one action to the start position of the next, so that consecutive mouth-shape animations join with good continuity. The cosine interpolation algorithm is as follows:

Once t (the current time relative to t₀) has been determined by the audio device, the displacement of the lip nodes can be calculated. The position of each lip node can be defined as x_i(t) = [x(t), y(t), z(t)]', where i = 1, 2, ..., n indexes the control nodes that define the geometry and topology of the mouth. To interpolate node positions for complete control of the mouth shape, the topology must remain fixed and the control nodes in each lip-shape model must be consistent. The position X(s) of each node of an intermediate interpolated mouth shape can be calculated from the node positions X⁰ and X¹ of the initial and final mouth shapes:

X(s) = (1 - s) X^{0} + s X^{1} \quad (5)

The variable s is usually described as a linear or nonlinear transformation of t, with 0 ≤ s ≤ 1. Motion based on linear interpolation, however, does not show the acceleration and deceleration characteristic of the start and end of an action. A closer approximation of acceleration and deceleration uses a cosine function to improve the motion:

s' = s \cdot \frac{1 - \cos(\pi (s_{0} - s))}{2} \quad (6)

Cosine interpolation is an effective solution, and the system shows satisfactory results.
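A cosine transition between two key frames can be sketched as below. Note that the sketch uses the common ease-in/ease-out form s' = (1 − cos(πs))/2, which is swapped in for illustration and is not the exact expression of formula (6); the key-frame layout (a list of (x, y, z) tuples per mouth shape) and the function names are likewise assumptions.

```python
import math


def cosine_ease(s):
    """Map linear progress s in [0, 1] to eased progress (slow start and stop).

    Common ease-in/ease-out form, used here in place of formula (6).
    """
    return (1.0 - math.cos(math.pi * s)) / 2.0


def interpolate(x0, x1, s):
    """Blend two key frames; each is a list of (x, y, z) control nodes."""
    if len(x0) != len(x1):
        raise ValueError("key frames must share the same topology")
    e = cosine_ease(s)
    return [tuple(p + e * (q - p) for p, q in zip(n0, n1))
            for n0, n1 in zip(x0, x1)]


if __name__ == "__main__":
    initial_part = [(0.0, 0.0, 0.0)]   # hypothetical "initial part" key frame
    rhyme_part = [(2.0, 4.0, 6.0)]     # hypothetical "rhyme part" key frame
    for step in range(5):
        print(interpolate(initial_part, rhyme_part, step / 4))
```

The eased progress starts and ends with zero slope, so the mouth accelerates out of one key frame and decelerates into the next instead of moving at constant speed.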

Background noise is extracted with speech and noise separation techniques conventional in the art. Noise analysis and animation noise synchronization are handled in a way similar to the Chinese text: likewise based on the comprehensive weighting algorithm, the noise time control ratio is obtained, noise weight factors are added, and the noise synchronization time of the corrected input background noise is calculated, so that the animation can be synthesized according to actual needs.

Fig. 1 is the flow chart of Chinese speech synchronized mouth-shape processing. The specific steps are as follows:

Step1. Input the Chinese text;

Step2. Convert the text into Chinese pinyin;

Step3. Generate synthesized speech samples from the text;

Step4. Query the audio processor and determine the current phoneme from the speech playback processor;

Step5. Calculate the current mouth shape from the trajectory of the current syllable;

Step6. Display the mouth shape synchronized with the synthesized speech, and return to Step4 until no phonemes remain to be read.
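The loop formed by Step4 to Step6 can be sketched with a simulated audio clock. In a real system the current time would be queried from the audio device; here it is derived from the frame counter, and the timeline format (a sorted list of (start_time, symbol) pairs) and the function names are assumptions made for the example.

```python
import bisect


def current_phoneme(timeline, t):
    """Return the symbol active at time t, or None before the first entry.

    timeline is a list of (start_time, symbol) pairs sorted by start_time.
    """
    starts = [start for start, _ in timeline]
    i = bisect.bisect_right(starts, t) - 1
    return timeline[i][1] if i >= 0 else None


def run_sync(timeline, end_time, frame_rate=25.0):
    """Poll the simulated clock once per video frame (Step4) and record
    the mouth symbol that would be displayed (Step5 and Step6)."""
    shown = []
    frame = 0
    while frame / frame_rate < end_time:
        t = frame / frame_rate            # stands in for the audio clock
        shown.append(current_phoneme(timeline, t))
        frame += 1
    return shown


if __name__ == "__main__":
    timeline = [(0.0, "d"), (0.1, "o"), (0.3, "d"), (0.4, "a")]
    print(run_sync(timeline, end_time=0.6, frame_rate=10.0))
```

Driving the mouth from the audio clock rather than from a separate animation timer keeps the displayed mouth shape tied to what is actually being played.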

Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions and substitutions in form and detail may be made without departing from the scope and spirit of the invention disclosed in the appended claims, that all such changes fall within the scope of protection of the appended claims, and that the parts of the claimed product and the steps of the claimed method may be combined in any form. The description of the disclosed embodiments is therefore not intended to limit the scope of the invention but to describe it. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.

Claims

1. A mouth-shape animation synthesis method based on a comprehensive weighting algorithm, characterized in that it comprises the following steps in order:

Step 1: Chinese text is input and analyzed, its characters are split into distinct Chinese visual phonemes, and these phonemes are sent to a speech synthesis system to synthesize a basic visual phoneme stream;

Step 2: A realistic parametric face model is built on the MPEG-4 standard, and visual-phoneme animation frame parameters drive the deformation of the model to produce facial mouth-shape animation;

Step 3: Input background noise synchronized with the input Chinese text is obtained from the input Chinese text; the input background noise is analyzed and smoothed to obtain the initial input background noise;

Step 4: From the distinct Chinese visual phonemes into which the characters were split, the phoneme input background noise after phoneme splitting is extracted for each phoneme; the phoneme input background noise is analyzed and smoothed to obtain the initial phoneme input background noise;

Step 5: The initial phoneme input background noise is used to correct the initial input background noise, giving the corrected input background noise;

Step 6: Based on the comprehensive weighting algorithm, the sound time control ratio is obtained, sound weight factors are added, the duration of each single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled so that the synthesized Chinese speech is synchronized with the facial mouth-shape animation;

Step 8: Based on the comprehensive weighting algorithm, the noise time control ratio is obtained, noise weight factors are added, and the noise synchronization time of the corrected input background noise is calculated;

Step 9: According to the needs of animation synthesis, the corrected input background noise is selectively added and synchronized with the composite animation of the synthesized Chinese speech, the facial mouth-shape animation and the background image, producing realistic facial mouth-shape animation.

2. The mouth-shape animation synthesis method based on a comprehensive weighting algorithm as claimed in claim 1, characterized in that: analyzing the input Chinese text and splitting its characters into distinct Chinese visual phonemes means dividing each character according to the initials and finals of standard Chinese pinyin, defining the initial part and the rhyme part of the mouth-shape pinyin, and converting the standard pinyin of the character into mouth-shape pinyin composed of initial-part and rhyme-part mouth-shape symbols.

3. The mouth-shape animation synthesis method based on a comprehensive weighting algorithm as claimed in claim 1, characterized in that: obtaining the sound time control ratio based on the comprehensive weighting algorithm, adding sound weight factors, recalculating the duration of each single-phoneme mouth-shape animation and controlling the synthesis of the mouth-shape animation comprises the following steps:

The feature vectors of the time frames in speech segments a and b are X_i and Y_j respectively, where 1 ≤ i ≤ N_a and 1 ≤ j ≤ N_b; the Euclidean distance between X_i and Y_j is d_ij, and the inter-segment distance between a and b is:

D_{a,b} = \frac{1}{N_{a} N_{b}} \sum_{i=1}^{N_{a}} \sum_{j=1}^{N_{b}} d_{ij}

D_{a,b} is the average of all feature-vector distances between a and b and reflects the overall difference between a and b; the mouth-shape animation to be segmented is divided into T frames, labeled 1, ..., T; taking frame t as the boundary and m frames on each side to form two sub-segments, i.e. i ∈ [t−m+1, ..., t] and j ∈ [t+1, ..., t+m], the inter-segment distance of the two sub-segments is

D_{t} = \frac{1}{m^{2}} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij}

From the calculated sound time control ratio, the sound weight factors w_s and w_y are added, the duration of each single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled:

t_{s} = w_{s} \bar{t}_{p}; \quad t_{y} = w_{y} \bar{t}_{p};

where w_s + w_y = 1;

M groups of data with N samples each are taken and averaged; after evaluation by the labeling system, the average time \bar{t}_{p} of the data with the smallest variance is taken as the standard duration of a Chinese character's mouth-shape animation for mouth-shape animation synthesis.

4. The mouth-shape animation synthesis method based on a comprehensive weighting algorithm as claimed in claim 3, characterized in that: obtaining the sound time control ratio based on the comprehensive weighting algorithm, adding sound weight factors, recalculating the duration of each single-phoneme mouth-shape animation and controlling the synthesis of the mouth-shape animation further comprises the following steps: to account for the influence of Chinese punctuation on continuous mouth-shape changes, the synthesis of continuous animation considers the 7 punctuation marks with longer pauses that occur within or at the end of a sentence, namely the full stop, exclamation mark, question mark, enumeration comma, comma, semicolon and colon, and assigns each a different weight w_bi according to the length of its pause within or at the end of the sentence,

t_{s}' = w_{s} \bar{t}_{p} w_{bi}; \quad t_{y}' = w_{y} \bar{t}_{p} w_{bi};

w_bi denotes the weight of the i-th punctuation mark; by varying the weight w_bi within a certain range, basic mouth shapes similar to the training set are generated and used in the continuous animation synthesis channel.