CN104361620B - Mouth-shape animation synthesis method based on a comprehensive weighting algorithm - Google Patents


Info

Publication number: CN104361620B
Authority: CN (China)
Prior art keywords: mouth, Chinese, shape, speaks, cartoon
Legal status: Active (Google notes that the legal status is an assumption, not a legal conclusion)
Application number: CN201410712164.7A
Other languages: Chinese (zh)
Other versions: CN104361620A
Inventors: 韩慧健, 梁秀霞, 贾可亮, 张锐, 刘峥; the other inventors requested that their names not be disclosed
Current assignee: Individual (Google notes that listed assignees may be inaccurate)
Original assignee: Individual

Events (dates not listed):
Application CN201410712164.7A filed by Individual (priority application)
Publication of CN104361620A
Application granted
Publication of CN104361620B
Legal status: Active; anticipated expiration not listed


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers

Abstract

A mouth-shape animation synthesis method based on a comprehensive weighting algorithm comprises the following steps: the input Chinese text is analyzed and its characters are split into distinct Chinese visual phonemes; these phonemes are sent to a speech synthesis system, which synthesizes a basic visual-phoneme stream; a realistic parametric face model is built on the MPEG-4 standard, and the visual-phoneme animation frame parameters drive the deformation of the model; background images are added, and noise is processed and added in layers. The result is vivid, realistic, high-quality mouth-shape animation.

Description

Mouth-shape animation synthesis method based on a comprehensive weighting algorithm
Technical field
The present invention relates to the field of facial expression animation, and more particularly to the synthesis of mouth-shape animation in which the mouth shapes are matched to speech.
Background art
With the continuous progress of computer animation, people place ever higher demands on mouth-shape animation in human-computer interaction. Chinese mouth-shape animation, however, remains relatively underdeveloped. On the one hand, mouth-shape animation is a multidisciplinary research area spanning human-computer interaction, computer graphics, and speech and language processing, and these related disciplines have developed unevenly, so building a realistic, highly automated mouth-shape animation system is still a challenging research topic. On the other hand, a quarter of the world's population speaks Chinese, so Chinese mouth-shape animation systems have an extremely broad potential market; yet because of the inherent complexity of Chinese, research on speech-driven mouth-shape animation for Chinese is comparatively scarce and lags behind. Domestic research in particular lacks theoretical and technical accumulation and is still at an early stage. As a result, little software supports Chinese mouth-shape animation design, while well-known lip-sync tools such as Mimic for Poser and the 3ds Max plug-in Voice-O-Matic target English and support Chinese poorly.
Research on English mouth-shape animation has successively produced coarticulation models and text-driven, speech-driven and hybrid-driven methods. Guiard-Marigny et al. proposed synthesizing mouth-shape animation driven jointly by speech and images. Bregler et al. proposed the Video Rewrite method, which uses computer vision to track feature points on the speaker's lips and morphing techniques to integrate these lip poses into the final mouth-shape animation sequence. Kang Liu and Jörn Ostermann proposed a correspondence between mouth shapes and English letter phonemes, and built face and mouth-shape animation synthesis algorithms on the MPEG-4 animation standard. By contrast, research on Chinese mouth-shape animation is limited, and the realism of synthesized Chinese mouth-shape animation is unlikely to reach or surpass the international state of the art in the short term, which makes research on Chinese mouth-shape animation urgent. In addition, the prior art does not consider background noise or background images, so the animation is not vivid or realistic enough, scenes cannot be simulated as needed, and the noise cannot be adjusted to improve the animation.
Approaching the problem from the standpoint of speech-driven mouth-shape animation synthesis, the present invention studies in depth the design of a 3D lip model, the design of lip-motion sequences, a Chinese speech synchronization algorithm, and personalized mouth-shape modeling. Given input Chinese text, it uses information technology to synthesize and output a virtual-human mouth-shape animation with a strong visual sense of presence, in which lip motion and speech are closely coordinated and synchronized. Adding background images lets the animation simulate various scenes as required, and processing and adding noise in layers makes the animation vivid and realistic, improving its overall effect.
Summary of the invention
An object of the invention is to overcome the shortcomings of the prior art and provide a mouth-shape animation synthesis method based on a comprehensive weighting algorithm. The method outputs a virtual-human mouth-shape animation with a strong visual sense of presence, in which lip motion and speech are closely coordinated and synchronized, and the animation is vivid, realistic and effective. The method comprises the following steps, performed in order:
Step 1: Input Chinese text; analyze it, split the Chinese characters into distinct Chinese visual phonemes, and send these phonemes to a speech synthesis system, which synthesizes a basic visual-phoneme stream;
Step 2: Build a realistic parametric face model based on the MPEG-4 standard, and use the visual-phoneme animation frame parameters to drive the deformation of the model, producing facial mouth-shape animation;
Step 3: Obtain from the input Chinese text the input background noise that is synchronized with it; analyze the input background noise and smooth it to obtain an initial input background noise;
Step 4: For each of the distinct Chinese visual phonemes produced by splitting the characters, extract the phoneme input background noise of the split phoneme; analyze and smooth it to obtain an initial phoneme input background noise;
Step 5: Use the initial phoneme input background noise to correct the initial input background noise, obtaining a corrected input background noise;
Step 6: Based on the comprehensive weighting algorithm, obtain a sound-time control ratio, add the sound weight factors, recalculate the duration of each single-phoneme mouth-shape animation, and control the synthesis of the mouth-shape animation so that the synthesized Chinese speech is synchronized with the facial mouth-shape animation;
Step 7: Add a background image according to the animation scene and synchronize it with the synthesized Chinese speech and the facial mouth-shape animation;
Step 8: Based on the comprehensive weighting algorithm, obtain a noise-time control ratio, add the noise weight factor, and calculate the noise synchronization time of the corrected input background noise;
Step 9: As required by the animation, choose whether to add the corrected input background noise, and synchronize it with the composite of the synthesized Chinese speech, the facial mouth-shape animation and the background image, achieving realistic facial mouth-shape animation.
Analyzing the input Chinese text and splitting the characters into distinct Chinese visual phonemes follows the division of standard Chinese pinyin into initials and finals: a mouth-shape pinyin initial part and a mouth-shape pinyin final part are defined, and the standard pinyin of each character is converted into a mouth-shape pinyin composed of an initial-part symbol and a final-part symbol.
The mouth-shape animation synthesis method of the invention, based on the comprehensive weighting algorithm, achieves the following:
(1) 3D mouth-shape modeling based on a finite-feature-point control method: following the facial feature point parameters defined by MPEG-4, lip-region feature points are selected or defined, the state data of the tracked feature points are analyzed comprehensively, and the normalized lip-region state data are used for 3D mouth-shape modeling;
(2) Lip-motion sequence design based on a weighted control method for the differences between initials and finals: the proportions of time taken by the initial part and the final part are weighted to control the role each plays in animation synthesis;
(3) The influence of punctuation in Chinese text on speech pauses during reading is, as an innovation, applied to coordinated speech and mouth-shape animation: the pause durations of various punctuation marks in text reading are analyzed statistically, Chinese punctuation marks are classified by pause duration, and a model relating pause duration to overall reading speed is established. At the same time, the duration-ratio parameters between adjacent lip shapes in the lip-motion sequence model are analyzed, and punctuation pauses and the lip parameter model are combined in a weighting step, yielding a Chinese speech mouth-shape animation system in which speech and mouth shape are coordinated and synchronized;
(4) Chinese visual phonemes are classified and mapped to basic pronunciation mouth shapes: based on the mouth-shape features of Chinese phoneme pronunciation, the initial and final parts of pinyin are redefined, the standard initials are reduced to six basic classes and the finals to four mouth-shape classes, and a cosine function handles the transition in which the initial-part keyframe deforms into the final-part keyframe, making the animation smoother;
(5) Background images can be added, so the animation can use different backgrounds as required and be presented in different scenes, making it more vivid and realistic;
(6) Noise is processed and added in layers, so its level can be adjusted to suit different scenes. For a meeting, for example, no noise or a reduced noise level can be chosen so that the meeting is quieter and the audience can hear clearly. When background noise should be shown, it can be presented in full or at the required level, for example water sounds or birdsong accompanying the background environment, making the animation more vivid, realistic and effective;
(7) The comprehensive weighting algorithm is also used to process noise in layers, making animation synthesis and synchronization more flexible and bringing the synthesized, synchronized result closer to what is required, so that the animation is vivid, realistic and effective.
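The layered noise handling in items (6) and (7) can be pictured as mixing a background-noise track into the speech track at a selectable level. The sketch below is a minimal illustration under stated assumptions: the level names and gain values in `NOISE_LEVELS` are hypothetical (the text gives no numbers), both tracks are assumed to be float samples in [-1, 1] at the same sample rate, and the noise is looped when it is shorter than the speech.

```python
# HYPOTHETICAL level -> gain table; the patent does not publish numeric levels.
NOISE_LEVELS = {"off": 0.0, "low": 0.25, "medium": 0.5, "full": 1.0}


def mix_background_noise(speech, noise, level="full"):
    """Mix a background-noise track into a speech track at a chosen level.

    Both tracks are lists of float samples in [-1, 1] at the same rate.
    The noise is looped if shorter than the speech, and each mixed sample
    is clipped back into [-1, 1].
    """
    gain = NOISE_LEVELS[level]
    if not noise or gain == 0.0:
        return list(speech)             # e.g. a meeting scene with noise switched off
    mixed = []
    for n, s in enumerate(speech):
        v = s + gain * noise[n % len(noise)]
        mixed.append(max(-1.0, min(1.0, v)))
    return mixed
```

For a meeting scene one would call `mix_background_noise(speech, noise, "off")`; for a waterside scene with birdsong, `"full"` or `"medium"`. Clipping is the simplest way to keep the sum in range; a real mixer might normalize instead.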
Brief description of the drawings
Fig. 1 Flowchart of Chinese speech-synchronized mouth-shape processing;
Fig. 2 Facial animation parameter units (FAPU);
Fig. 3 Mouth-region model;
Fig. 4 Comparison of the actual time-domain waveform of a pronunciation with the animation-synthesis control under sound weighting.
Embodiment
Specific embodiments of the invention are described in detail below. The following embodiments merely illustrate the invention further and must not be construed as limiting its scope of protection; non-essential modifications and adaptations made to the invention by those skilled in the art on the basis of the above summary still fall within the scope of protection of the invention.
Analysis of mouth-shape features in Mandarin pronunciation
In terms of timbre, the basic units of speech are phones, syllables, tones and phonemes. A phoneme is the smallest unit, or smallest speech segment, that makes up a syllable: if a syllable is divided further according to differences in timbre, the smallest units obtained, each with its own characteristics, are its phonemes. Mandarin has 32 phonemes, divided into two major classes: 10 vowel phonemes and 22 consonant phonemes. Based on the pronunciation features of the phonemes described in the Scheme for the Chinese Phonetic Alphabet and on the division of standard pinyin into initials and finals, the basic mouth shapes are divided into three levels, as shown in Table 1.
In general, one Chinese character represents one syllable. The only exception is the erhua (r-suffixed) syllable, a phenomenon of Mandarin speech: 玩儿 ("play"), for example, is written as two Chinese characters but read as the single syllable "wanr". The present invention generally considers only standard Mandarin pronunciation rules; erhua syllables are treated as two syllables, so that 玩儿 is analyzed by the system as the two syllables "wan" and "er".
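The erhua rule above (one written r-suffixed syllable is analyzed as two syllables) can be sketched as a small preprocessing function. This is an illustration, assuming toneless lowercase pinyin input; the function name `split_erhua` is hypothetical.

```python
def split_erhua(syllable: str) -> list[str]:
    """Split an erhua (r-suffixed) pinyin syllable into its base syllable and "er".

    Assumes toneless pinyin. The standalone syllable "er" is left unchanged,
    since it is an ordinary zero-initial syllable, not an r-suffixed one.
    """
    s = syllable.lower()
    if len(s) > 1 and s.endswith("r") and s != "er":
        return [s[:-1], "er"]
    return [s]
```

For example, `split_erhua("wanr")` gives `["wan", "er"]`, while `split_erhua("hua")` and `split_erhua("er")` return the syllable unchanged.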
Table 1 Classification of basic mouth shapes in Chinese speech pronunciation
Redefinition of pinyin for mouth-shape animation
In Mandarin, initials consist of consonants; there are 23 of them, including b, p, m and f. There are 38 finals, each consisting of a single vowel (such as a, o, e), a diphthong (such as ai, ie) or a triphthong (such as iao). Analogous to the division of standard pinyin into initials and finals, the pinyin of each Chinese character is defined as two parts: the initial part (s) and the final part (y). Each part corresponds to one mouth-shape state: when making mouth-shape animation, every time the character speaks a Chinese character, the mouth shape transitions and deforms from the initial-part keyframe to the final-part keyframe. The durations of these two keyframes are controlled by sound weighting, which allocates the time between them reasonably and makes the animation more lifelike.
To produce realistic, natural mouth-shape animation, all combinations of initials and finals would have to be considered. Building a mouth shape for every initial and final and accounting for their combinations would not only increase the system's time cost but also add repetitive work. The mouth-shape classification above (Table 1) shows that many phonemes have identical or similar pronunciation mouth shapes, so, to keep the method fast and easy to operate, the present invention adopts a compromise and redefines standard pinyin. Based on the mouth-shape features in Table 1, the initial and final parts of pinyin are repartitioned: the standard initials are reduced to six basic classes (Table 2), and the finals to four mouth-shape classes (Table 3).
Table 2 Conversion table for standard pinyin initials
Table 3 Conversion table for standard pinyin finals
The initial-part classes group together initials whose pronunciation mouth shapes are the same or similar: s-b, the lips close and block the airflow; s-f, the upper teeth touch the lower lip, forming a narrow gap; s-d, the mouth is slightly open, the lips are relaxed and the mouth shape changes only subtly; s-g, the jaw opens to a quarter of its maximum angle and the lips are relaxed; s-r, the lips protrude and tighten; s-y, the lips stretch toward both sides. Likewise, by mouth-shape features, the final part is divided into: the y-a shape, for unrounded finals pronounced with a large lip aperture, such as a and an; the y-o shape, for finals pronounced with slightly rounded lips and the mouth pouting forward, such as o and ou; the y-e shape, for half-unrounded finals, such as e and i; and y-o is also used for rounded finals pronounced with the lips protruding forward and leaving only a small opening, such as u. Using Tables 2 and 3, the invention converts the pinyin of every Chinese character into two parts, a mouth-shape initial part and a mouth-shape final part; for example, the two characters 动画 ("animation") are expressed as s-d → y-o and s-d → y-a. If s-b, s-d, s-f, s-r, s-y, s-g, y-a, y-o, y-e and y-i are each built as a mouth-shape model, the transition between each pair of model keyframes forms the pronunciation mouth-shape animation of one Chinese character.
Dividing a Chinese character into initial-part and final-part mouth shapes by consonant and vowel applies to essentially all Chinese characters. The only exceptions are single-phoneme (zero-initial) syllables, such as a, o, e (as in "hungry"), ai ("love"), ei, ao ("coat"), en ("grace") and er ("child"), whose pinyin contains only a final. Under the classification above they would each have only a final-part mouth shape, so animation synthesis would contain a lone final-part shape. For uniformity, a fixed initial-part mouth-shape symbol, called the natural model and written "&", is added to them. The final result of this pinyin conversion is shown in Table 4:
Once the initial and final parts of mouth-shape pinyin are defined, the next step is conversion: the standard pinyin of each Chinese character is converted into mouth-shape pinyin composed of initial-part and final-part symbols. To simplify implementation, the mouth-shape labels are abbreviated by dropping the "s-" and "y-" prefixes and writing only the letter, which leaves 10 symbol letters: a, o, e, i, b, d, f, r, y, g. Table 5 gives some examples of converting the pinyin of Chinese characters:
Table 4 Conversion table for single-syllable pinyin
Table 5 Examples of converting some Chinese characters
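Since Tables 2 to 5 are not reproduced here, the conversion from standard pinyin to the abbreviated mouth-shape symbols can only be sketched with a partial, placeholder mapping. In the sketch below, the initial list is the standard set of Mandarin pinyin initials (y/w spellings are not handled). The entries d → d, h → d, ong → o and ua → a come from the 动画 (dong hua) example in the text, and a/an, o/ou, e/i follow the examples given for each final class; the b/p/m → b and f → f entries are hypothetical placeholders for Table 2. Zero-initial syllables receive the natural-model symbol "&". Unmapped entries raise `KeyError` rather than guessing.

```python
# Standard pinyin initials; two-letter ones first so "zh", "ch", "sh"
# are matched before "z", "c", "s".
PINYIN_INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
                   "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

# Initial -> mouth-shape initial symbol. "d"/"h" -> "d" come from the 动画
# example; the remaining entries are HYPOTHETICAL stand-ins for Table 2.
INITIAL_TO_SHAPE = {"d": "d", "h": "d", "b": "b", "p": "b", "m": "b", "f": "f"}

# Final -> mouth-shape final symbol, taken from the examples in the text;
# everything else must come from Table 3.
FINAL_TO_SHAPE = {"a": "a", "an": "a", "ua": "a",
                  "o": "o", "ou": "o", "ong": "o",
                  "e": "e", "i": "e"}

NATURAL = "&"  # fixed initial-part symbol for zero-initial syllables


def split_syllable(syllable):
    """Split toneless pinyin into (initial, final); initial is "" if absent."""
    for ini in PINYIN_INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return "", syllable


def to_mouth_shape(syllable):
    """Convert one pinyin syllable into an (initial-part, final-part) pair."""
    initial, final = split_syllable(syllable.lower())
    try:
        s = INITIAL_TO_SHAPE[initial] if initial else NATURAL
        y = FINAL_TO_SHAPE[final]
    except KeyError as missing:
        raise KeyError(f"no mouth-shape entry for {missing} in {syllable!r}") from None
    return s, y
```

With this table, `[to_mouth_shape(x) for x in ["dong", "hua"]]` yields `[("d", "o"), ("d", "a")]`, i.e. s-d → y-o and s-d → y-a, and `to_mouth_shape("a")` yields `("&", "a")`.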
Fig. 2 shows the face model defined in its neutral state: the gaze is in the direction of the Z axis; all facial muscles are relaxed; the eyelids are tangent to the iris; the pupil diameter is one third of the iris diameter (IRISD0); the upper and lower lips are in natural contact; the lip line is horizontal, with the lip corners at the same height; and the mouth is closed, with the tip of the tongue resting horizontally at the boundary between the upper and lower teeth.
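In MPEG-4 facial animation, the FAPUs of Fig. 2 are fractions of neutral-face distances: IRISD, ES (eye separation), ENS (eye-nose separation), MNS (mouth-nose separation) and MW (mouth width) are the corresponding neutral distances divided by 1024, and the angle unit AU is 10^-5 rad. A FAP amplitude multiplied by its unit gives a model-space displacement. The sketch below shows that conversion; the function names are hypothetical, and the neutral distances must be measured on the actual face model.

```python
def compute_fapus(irisd0, es0, ens0, mns0, mw0):
    """Return MPEG-4 face animation parameter units for a face model.

    Inputs are neutral-face distances in model units: iris diameter, eye
    separation, eye-nose separation, mouth-nose separation and mouth width.
    Distance FAPUs are these distances divided by 1024; AU is 1e-5 radians.
    """
    return {
        "IRISD": irisd0 / 1024.0,
        "ES": es0 / 1024.0,
        "ENS": ens0 / 1024.0,
        "MNS": mns0 / 1024.0,
        "MW": mw0 / 1024.0,
        "AU": 1e-5,
    }


def fap_displacement(fap_value, unit, fapus):
    """Convert a FAP amplitude expressed in `unit` into a model-space
    displacement (or, for AU, an angle in radians)."""
    return fap_value * fapus[unit]
```

Expressing lip motion in MW and MNS units keeps the same FAP stream usable on faces of different sizes, which is why the neutral face must be defined precisely.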
Fig. 3 shows the mouth-region model. Each recorded data point is a point in a 372-dimensional space: the 124 vertices of the model are three-dimensional coordinates (124 × 3 = 372). Because only 145 observed training samples are used, the covariance matrix Σ has rank at most 144, so PCA of the matrix can find at most 144 independent degrees of freedom. Most of the variance is captured by the first 10 components, or "feature points", i.e. those associated with the 10 largest eigenvalues λ of Σ, and these are used to select the mouth-shape animation feature points.
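The PCA step can be sketched as building the sample covariance of the training vectors and extracting the largest eigenpairs. The stdlib-only version below uses power iteration with deflation, which is adequate for symmetric positive semi-definite matrices such as Σ; the function names are hypothetical. It also illustrates the rank argument: with N samples the centered data span at most N − 1 dimensions, so the remaining eigenvalues are zero.

```python
import math
import random


def covariance(samples):
    """Sample covariance matrix (divisor N - 1) of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mean[k] for k in range(d)] for x in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpairs(cov, k, iters=500, seed=0):
    """Largest k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]            # working copy, deflated in place
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:               # remaining spectrum is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm                     # ||A v|| -> lambda once v has converged
        pairs.append((lam, v))
        for i in range(d):                 # deflate: A <- A - lambda v v^T
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return pairs
```

With three samples in three dimensions, only two eigenvalues are non-zero, mirroring the "145 samples, rank 144" argument. For the actual 145 × 372 data, a linear-algebra library such as `numpy.linalg.eigh`, which returns eigenvalues in ascending order, is the practical choice.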
The algorithm that controls mouth-shape animation synthesis through sound time weighting rests on this principle: within the same class of phonemes, the basic mouth-shape changes of the animation are highly similar, while across different phoneme classes they differ greatly. Likewise, in mouth-shape animation synthesis, the mouth-shape features of the initial part and the final part of different phonemes change markedly, so sound-time weight parameters are used here to capture the differences in mouth-shape features of the animation.
Let the feature vectors of the time frames in speech segments a and b be $X_i$ and $Y_j$ ($1 \le i \le N_a$, $1 \le j \le N_b$), and let $d_{ij}$ be the Euclidean distance between $X_i$ and $Y_j$. The inter-segment distance between segments a and b is then

$$D_{a,b} = \frac{1}{N_a N_b} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b} d_{ij} \qquad (1)$$

where $D_{a,b}$, the mean of all feature-vector distances between a and b, reflects the overall difference between them. Suppose the mouth-shape animation to be segmented is divided into T frames, labeled 1, …, T. Taking frame t as the boundary and m frames on each side to form two sub-segments, $i \in [t-m+1, t]$ and $j \in [t+1, t+m]$, formula (1) gives the inter-segment distance of the two sub-segments as

$$D(t) = \frac{1}{m^2} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij} \qquad (2)$$
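Formulas (1) and (2) translate directly into code. In this sketch each frame's feature vector is a plain tuple of floats; the text does not say which acoustic features are used, so that representation is an assumption, as are the function names. Frames are 1-indexed in the text and 0-indexed in the Python list, which the slicing accounts for.

```python
import math


def segment_distance(seg_a, seg_b):
    """Formula (1): mean Euclidean distance over all frame pairs (X_i, Y_j)."""
    total = sum(math.dist(x, y) for x in seg_a for y in seg_b)
    return total / (len(seg_a) * len(seg_b))


def boundary_distance(frames, t, m):
    """Formula (2): distance between frames t-m+1..t and frames t+1..t+m,
    using the text's 1-based frame numbering."""
    if t < m or t + m > len(frames):
        raise ValueError("need m frames on both sides of boundary t")
    left = frames[t - m:t]        # frames t-m+1 .. t
    right = frames[t:t + m]       # frames t+1 .. t+m
    return segment_distance(left, right)
```

Scanning `boundary_distance(frames, t, m)` over t gives a curve whose peaks mark where the feature vectors change most, which is one plausible way to locate the boundary between two mouth-shape states.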
Using the sound-time control ratio obtained from formulas (1) and (2), the sound weight factors $w_s$ (initial part) and $w_y$ (final part) are added, the duration of each single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled accordingly, where $w_s + w_y = 1$.
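The weighting formula itself is not reproduced in the text, so the following sketch assumes the simplest rule consistent with $w_s + w_y = 1$: the initial-part keyframe gets $w_s T$ and the final-part keyframe gets $w_y T$ of a syllable's duration T. The per-initial weight table and its default of 0.3 are hypothetical.

```python
def split_syllable_time(duration, w_s):
    """Split one syllable's animation time between its two keyframes.

    ASSUMED rule: T_s = w_s * T and T_y = w_y * T, with w_y = 1 - w_s.
    """
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    w_y = 1.0 - w_s
    return duration * w_s, duration * w_y


def keyframe_schedule(syllables, duration, weights, default_w_s=0.3):
    """Build (start, end, shape) entries for a list of (s, y) symbol pairs.

    `weights` maps an initial-part symbol to its w_s; the values and the
    default are HYPOTHETICAL, not taken from the patent.
    """
    schedule, t = [], 0.0
    for s, y in syllables:
        t_s, _ = split_syllable_time(duration, weights.get(s, default_w_s))
        schedule.append((t, t + t_s, "s-" + s))
        schedule.append((t + t_s, t + duration, "y-" + y))
        t += duration
    return schedule
```

For 动画 with a 1 s character duration and $w_s = 0.25$ for s-d, the schedule is s-d 0 to 0.25 s, y-o 0.25 to 1 s, s-d 1 to 1.25 s, y-a 1.25 to 2 s.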
To obtain an accurate average duration for the pronunciation of a Chinese character, character pronunciation durations are sampled, with the same reading speed used throughout sampling. The invention takes M groups of data with N samples per group and averages each group; after evaluation by a labeling system, the average of the group with the smallest variance is taken as the standard duration of a Chinese character's mouth-shape animation for synthesis.
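The selection rule just described (average each group, keep the mean of the lowest-variance group) is a few lines with the standard `statistics` module. The function name is hypothetical, and the evaluation by the labeling system is not modeled. When all groups have the same N, population and sample variance rank the groups identically, so the choice of variance does not matter.

```python
from statistics import fmean, pvariance


def standard_char_duration(groups):
    """Return the mean duration of the sample group with the smallest variance.

    `groups` holds M lists of N measured per-character durations (seconds),
    all recorded at the same reading speed.
    """
    best = min(groups, key=pvariance)
    return fmean(best)
```

For example, `standard_char_duration([[0.30, 0.40, 0.20], [0.31, 0.29, 0.30]])` picks the second, tighter group and returns about 0.30 s.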
The effect of Chinese punctuation on continuous mouth-shape changes is also considered. When synthesizing continuous animation, seven punctuation marks that produce longer pauses within or at the end of a sentence are taken into account: the full stop, exclamation mark, question mark, enumeration comma (、), comma, semicolon and colon. Each of the seven marks is assigned a different weight $w_{bi}$ according to the length of its pause within or at the end of a sentence, formula (4).
$w_{bi}$ denotes the weight of the i-th punctuation mark. Varying the weights $w_{bi}$ within a certain range generates basic mouth shapes similar to the training set, which are then used in continuous animation synthesis.
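Formula (4) and the actual values of $w_{bi}$ are not given, so the sketch below only illustrates the idea under explicit assumptions: each of the seven marks inserts a pause equal to its weight times the standard character duration, and the weights in `PAUSE_WEIGHTS` are hypothetical, ordered only by the intuition that sentence-final marks pause longest.

```python
# HYPOTHETICAL pause weights w_bi for the seven marks; the patent does not
# publish their values.
PAUSE_WEIGHTS = {
    "。": 1.0, "！": 1.0, "？": 1.0,   # sentence-final marks
    "；": 0.8, "：": 0.7,
    "，": 0.5, "、": 0.3,
}


def pause_duration(mark, char_duration):
    """ASSUMED model: pause = w_bi * standard per-character duration."""
    return PAUSE_WEIGHTS[mark] * char_duration


def timeline_with_pauses(text, char_duration):
    """Return (token, start, end) spans for a line of Chinese text.

    The seven weighted marks become pauses; every other symbol is treated
    as a spoken character lasting char_duration.
    """
    spans, t = [], 0.0
    for ch in text:
        if ch in PAUSE_WEIGHTS:
            d = pause_duration(ch, char_duration)
        else:
            d = char_duration
        spans.append((ch, t, t + d))
        t += d
    return spans
```

With a 0.25 s character duration, "你好，世界。" places a 0.125 s pause after 好 and ends at 1.375 s.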
Fig. 4, which compares the actual time-domain waveform of a pronunciation with the animation-synthesis control under sound weighting, shows that animation synthesis works better with sound weighting. The upper plot is the time-domain waveform of a Chinese syllable, and the lower plot is the timing control for generating the animation of a single syllable under sound weighting.
Continuous, smooth control of mouth-shape animation synthesis. To solve the problem of the transition between two consecutive mouth-shape actions, the invention applies the sound-weighting control algorithm to the initials and finals of standard pinyin to control the durations of the initial and final mouth shapes, and uses cosine-function interpolation for the transition between two mouth shapes, interpolating from the end position of one action to the start position of the next so that the mouth-shape animation is continuous. The cosine interpolation algorithm is as follows:
Once t (which depends on the current time and $t_0$) is determined by the audio device, the lip-shape displacement can be computed. The viseme of each lip node can be defined as $x_i(t) = [x(t), y(t), z(t)]^\top$, where $i = 1, 2, \dots, n$ indexes the control nodes that define the geometry and topology of the mouth. For the control-node positions of the complete mouth shape to be interpolated, the topology must remain fixed and the control nodes must correspond across every lip-shape target. The position $X(s)$ of each node of an intermediate interpolated mouth shape is computed from the positions of the start and end viseme nodes $X_0$ and $X_1$:

$$X(s) = (1 - s)\,X_0 + s\,X_1 \qquad (5)$$

The variable s is generally a linear or nonlinear function of t, with $0 \le s \le 1$. However, motion based on linear interpolation does not show the acceleration at the start and the deceleration at the end of a real action. A cosine function, which closely approximates this acceleration and deceleration, is therefore used to improve the motion:
$$s' = \frac{s\,\bigl(1 - \cos(\pi (s_0 - s))\bigr)}{2} \qquad (6)$$
Cosine interpolation proved an effective solution, and the system produced satisfactory results with it.
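A sketch of cosine-eased interpolation between the start and end viseme node positions $X_0$ and $X_1$ is shown below. It uses the widely used ease-in/ease-out form $s' = (1 - \cos \pi s)/2$, which has zero slope at both ends, rather than transcribing formula (6) literally, because $s_0$ is not defined in the text. Treat it as an illustration of the technique, not as the patent's exact curve. Each mouth shape is assumed to be a list of (x, y, z) control-node tuples with identical topology.

```python
import math


def cosine_ease(s):
    """Map linear progress s in [0, 1] to eased progress (1 - cos(pi*s)) / 2,
    which accelerates from rest and decelerates to rest."""
    s = min(max(s, 0.0), 1.0)
    return (1.0 - math.cos(math.pi * s)) / 2.0


def interpolate_shape(x0, x1, s):
    """Linear interpolation X0 + e * (X1 - X0) evaluated at the eased
    parameter e, applied node by node to (x, y, z) control points."""
    e = cosine_ease(s)
    return [tuple(a + e * (b - a) for a, b in zip(p0, p1))
            for p0, p1 in zip(x0, x1)]
```

Sampling s at evenly spaced video frames then yields node positions that move slowly near each keyframe and fastest midway through the transition.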
Background noise is extracted with the speech/noise separation techniques commonly used in the field. Noise analysis and the synchronization of noise with the animation are handled in the same way as for the Chinese text: based on the comprehensive weighting algorithm, a noise-time control ratio is obtained, the noise weight factor is added, and the noise synchronization time of the corrected input background noise is calculated, so that the animation can be synthesized as actually required.
The flowchart of Chinese speech-synchronized mouth-shape processing in Fig. 1 consists of the following steps:
Step1. Input Chinese text;
Step2. Convert the text into Chinese pinyin;
Step3. Generate synthesized speech samples from the text;
Step4. Query the audio processor and determine the current phoneme from the speech playback processor;
Step5. Compute the current mouth shape from the trajectory of the current syllable;
Step6. Display the synthesized speech together with the synchronized mouth-shape graphics, and return to Step4 until no phonemes remain to be read.
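The Step4 to Step6 loop can be sketched as polling a playback clock once per video frame, looking up which phoneme is playing, and emitting its mouth shape until the timeline is exhausted. In this sketch, a frame counter stands in for the audio device clock, the timeline is a list of (start, end, shape) entries, and the frame rate and the "&" rest shape used for gaps are assumptions; Step5's trajectory computation is reduced to a lookup.

```python
def current_phoneme(timeline, t):
    """Step4: return the shape of the (start, end, shape) entry playing at t."""
    for start, end, shape in timeline:
        if start <= t < end:
            return shape
    return None


def run_sync_loop(timeline, fps=25, rest_shape="&"):
    """Steps 4-6: poll the clock once per frame and emit the current mouth
    shape until no phonemes remain to be read."""
    if not timeline:
        return []
    total = timeline[-1][1]
    shapes, frame = [], 0
    while frame / fps < total:         # frame clock stands in for the audio device
        shape = current_phoneme(timeline, frame / fps)
        shapes.append(shape if shape is not None else rest_shape)
        frame += 1
    return shapes
```

At 4 frames per second, the timeline s-d 0 to 0.25 s, y-o 0.25 to 1 s produces one s-d frame followed by three y-o frames. A fuller version would blend adjacent shapes with an interpolation curve instead of switching abruptly.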
Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions and substitutions in form and detail may be made without departing from the scope and spirit of the invention as disclosed in the appended claims, and all such changes fall within the scope of protection of the appended claims. The components of the claimed product and the steps of the claimed method may be combined in any form. Accordingly, the description of the embodiments disclosed herein is intended to describe the invention, not to limit its scope; the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.

Claims (4)

1. A mouth-shape animation synthesis method based on an aggregative weighted algorithm, characterized in that it comprises the following steps, performed in sequence:
Step 1: Input Chinese text and analyze it, splitting the Chinese characters into distinct Chinese visual phonemes, and send these phonemes to a speech synthesis system to synthesize a basic visual-phoneme stream;
Step 2: Build a realistic parametric face model based on the MPEG-4 standard, and use the visual-phoneme animation frame parameters to drive the deformation of the model, producing the facial mouth-shape animation;
Step 3: Obtain from the input Chinese text the input background noise synchronized with it; analyze this input background noise and smooth it to obtain the initial input background noise;
Step 4: From the phonemes obtained by splitting the Chinese characters into distinct Chinese visual phonemes, extract the input background noise of each phoneme; analyze this phoneme input background noise and smooth it to obtain the initial phoneme input background noise;
Step 5: Correct the initial input background noise using the initial phoneme input background noise, obtaining the corrected input background noise;
Step 6: Based on the aggregative weighted algorithm, obtain the sound time control ratio, introduce the sound weight factors, recompute the duration of each single-phoneme mouth-shape animation, and control the synthesis of the mouth-shape animation so that the synthesized Chinese speech is synchronized with the facial mouth-shape animation;
Step 7: Add a background image according to the animation scene, synchronized with the synthesized Chinese speech and the facial mouth-shape animation;
Step 8: Based on the aggregative weighted algorithm, obtain the noise time control ratio, introduce the noise weight factor, and compute the noise synchronization time of the corrected input background noise;
Step 9: According to the needs of the animation synthesis, selectively add the corrected input background noise and synchronize it with the synthesized animation of Chinese speech, facial mouth-shape animation and background image, producing a realistic facial mouth-shape animation.
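Steps 3 to 5 of claim 1 leave both the smoothing filter and the correction rule unspecified. The sketch below illustrates them under stated assumptions only: background noise is represented as a per-frame energy envelope, smoothing is a centered moving average (a swapped-in technique, not necessarily the one used in the patent), and the hypothetical `correct_noise` function blends each phoneme's smoothed envelope into the global envelope over that phoneme's frames using a hypothetical weight `alpha`.

```python
from typing import List, Sequence, Tuple


def moving_average(x: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both edges.

    Assumption: stands in for the unspecified smoothing of Steps 3 and 4.
    """
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def correct_noise(initial: Sequence[float],
                  phoneme_spans: Sequence[Tuple[int, int]],
                  phoneme_noise: Sequence[Sequence[float]],
                  alpha: float = 0.5) -> List[float]:
    """Step 5 (hypothetical rule): correct the initial noise envelope.

    initial:       smoothed whole-utterance noise envelope (Step 3)
    phoneme_spans: 0-based (start, end) frame range of each phoneme, end exclusive
    phoneme_noise: raw noise envelope extracted for each phoneme (Step 4)
    alpha:         hypothetical weight given to the phoneme-level estimate
    """
    corrected = list(initial)
    for (start, end), local in zip(phoneme_spans, phoneme_noise):
        if len(local) != end - start:
            raise ValueError("phoneme noise length must match its span")
        # Step 4: smooth the per-phoneme envelope, then blend it in (Step 5).
        for offset, value in enumerate(moving_average(local)):
            i = start + offset
            corrected[i] = (1 - alpha) * corrected[i] + alpha * value
    return corrected
```

For example, smoothing `[0, 3, 0, 3]` gives `[1.5, 1.0, 2.0, 1.5]`; correcting 0-based frames 1 and 2 with a phoneme envelope of `[3, 3]` and `alpha = 0.5` then gives `[1.5, 2.0, 2.5, 1.5]`.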
2. The mouth-shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 1, characterized in that: analyzing the input Chinese text and splitting the Chinese characters into distinct Chinese visual phonemes is done according to the division of each character's standard Mandarin pinyin into an initial and a final; a mouth-shape pinyin initial part and a mouth-shape pinyin final part are defined, and the standard pinyin of each character is converted into a mouth-shape pinyin composed of a mouth-shape initial symbol and a mouth-shape final symbol.
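Claim 2 splits each syllable's standard pinyin into an initial and a final before mapping them to mouth-shape symbols. The patent does not publish its symbol table, so the sketch below assumes a hypothetical grouping of initials by place of articulation and keeps the final unchanged; `split_pinyin`, `to_mouth_shape_pinyin`, `INITIAL_SHAPE` and the symbol names are illustrative only.

```python
from typing import Tuple

# Standard pinyin initials, two-letter ones first so "zh" wins over "z".
# "y" and "w" are orthographic glides, treated here as initials for simplicity.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# Hypothetical mouth-shape classes for initials (NOT the patent's table).
INITIAL_SHAPE = {
    **dict.fromkeys(["b", "p", "m"], "B"),           # bilabial: lips closed
    "f": "F",                                        # labiodental
    **dict.fromkeys(["d", "t", "n", "l"], "D"),      # alveolar
    **dict.fromkeys(["g", "k", "h"], "G"),           # velar
    **dict.fromkeys(["j", "q", "x"], "J"),           # alveolo-palatal
    **dict.fromkeys(["zh", "ch", "sh", "r"], "ZH"),  # retroflex
    **dict.fromkeys(["z", "c", "s"], "Z"),           # dental sibilant
    "y": "Y", "w": "W",                              # glides
}


def split_pinyin(syllable: str) -> Tuple[str, str]:
    """Split a toned or untoned pinyin syllable into (initial, final)."""
    s = syllable.lower().rstrip("012345")  # drop a trailing tone digit
    for ini in INITIALS:
        if s.startswith(ini) and len(s) > len(ini):
            return ini, s[len(ini):]
    return "", s  # zero-initial syllable, e.g. "an" or "er"


def to_mouth_shape_pinyin(syllable: str) -> Tuple[str, str]:
    """Map a syllable to (mouth-shape initial symbol, final); "0" = no initial."""
    ini, fin = split_pinyin(syllable)
    return INITIAL_SHAPE.get(ini, "0"), fin
```

With this hypothetical table, `to_mouth_shape_pinyin("ma3")` yields `("B", "a")`, and a zero-initial syllable such as `"er2"` yields `("0", "er")`.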
3. The mouth-shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 1, characterized in that obtaining the sound time control ratio based on the aggregative weighted algorithm, introducing the sound weight factors, recomputing the duration of each single-phoneme mouth-shape animation, and controlling the synthesis of the mouth-shape animation comprises the following steps:
Let the feature vectors of the frames in speech segments a and b be $X_i$ and $Y_j$ respectively, where $1 \le i \le N_a$ and $1 \le j \le N_b$, and let $d_{ij}$ be the Euclidean distance between $X_i$ and $Y_j$. The inter-segment distance between segments a and b is then:
$$D_{a,b} = \frac{1}{N_a N_b} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b} d_{ij}$$
$D_{a,b}$ is the mean of all feature-vector distances between a and b and reflects the overall difference between them. The mouth-shape animation to be segmented is divided into $T$ frames, labeled $1, \dots, T$. Taking frame $t$ as the boundary, $m$ frames are taken on each side to form two sub-speech segments, i.e. $i \in [t-m+1, \dots, t]$ and $j \in [t+1, \dots, t+m]$. The inter-segment distance of these two sub-segments is
$$D_t = \frac{1}{m^2} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij}$$
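Both inter-segment distances can be computed directly from their definitions. This minimal sketch assumes each frame is already a numeric feature vector (the claim does not say which acoustic features are used) and keeps the claim's 1-based frame numbering. The function names are hypothetical, and `candidate_boundaries`, which treats strict local maxima of $D_t$ as segment boundaries, is an added assumption: the claim defines $D_t$ but not how it is thresholded.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per frame (features assumed)


def segment_distance(seg_a: Sequence[Frame], seg_b: Sequence[Frame]) -> float:
    """D_{a,b}: mean Euclidean distance d_ij over all pairs (X_i, Y_j)."""
    total = sum(math.dist(x, y) for x in seg_a for y in seg_b)
    return total / (len(seg_a) * len(seg_b))


def boundary_distance(frames: Sequence[Frame], t: int, m: int) -> float:
    """D_t for 1-based boundary frame t: frames t-m+1..t versus t+1..t+m."""
    if not m <= t <= len(frames) - m:
        raise ValueError("need m frames on each side of t")
    left = frames[t - m:t]    # 0-based slice holding 1-based frames t-m+1..t
    right = frames[t:t + m]   # 0-based slice holding 1-based frames t+1..t+m
    return segment_distance(left, right)  # equals (1/m^2) * sum of d_ij


def distance_curve(frames: Sequence[Frame], m: int) -> Dict[int, float]:
    """D_t for every admissible t (m <= t <= T - m)."""
    return {t: boundary_distance(frames, t, m)
            for t in range(m, len(frames) - m + 1)}


def candidate_boundaries(curve: Dict[int, float]) -> List[int]:
    """Hypothetical rule: strict local maxima of D_t mark segment boundaries."""
    return [t for t in curve
            if t - 1 in curve and t + 1 in curve
            and curve[t] > curve[t - 1] and curve[t] > curve[t + 1]]
```

For example, six one-dimensional frames that jump from 0 to 1 after frame 3 give, with $m = 2$, $D_2 = 0.5$, $D_3 = 1.0$ and $D_4 = 0.5$, so the peak falls at $t = 3$.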
From the computed sound time control ratio, introduce the sound weight factors $w_s$ and $w_y$, recompute the duration of each single-phoneme mouth-shape animation, and control the synthesis of the mouth-shape animation:
$$t_s = w_s \bar{t}_p; \quad t_y = w_y \bar{t}_p$$
where $w_s + w_y = 1$;
Take $M$ groups of data, each with $N$ sample points, and average each group; after evaluation by the labeling system, the average time $\bar{t}_p$ of the group with the smallest variance is used as the standard duration of the Chinese-character mouth-shape animation in mouth-shape animation synthesis.
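The timing rule of claim 3 can be sketched as follows. The sketch assumes durations in milliseconds and, in line with claim 2's initial/final split, reads $t_s$ and $t_y$ as the durations of a syllable's initial and final portions; the claim does not spell this out. The evaluation by the labeling system is not modeled: the group is chosen purely by minimum variance. Function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def standard_time(groups: Sequence[Sequence[float]]) -> float:
    """t_bar_p: mean of the sample group (one of M) with the smallest variance.

    Assumption: the labeling-system evaluation in the claim is omitted.
    """
    best = min(groups, key=pvariance)
    return mean(best)


def weighted_durations(t_p: float, w_s: float) -> Tuple[float, float]:
    """Return (t_s, t_y) = (w_s * t_bar_p, w_y * t_bar_p) with w_s + w_y = 1."""
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    w_y = 1.0 - w_s
    return w_s * t_p, w_y * t_p
```

For instance, with groups `[100, 300, 200]` and `[210, 230, 220]` the second group has the smaller variance, so $\bar{t}_p = 220$; with $w_s = 0.4$ this gives $t_s = 88$ and $t_y = 132$. Because the weights sum to 1, the split never changes the syllable's total duration.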
4. The mouth-shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 3, characterized in that obtaining the sound time control ratio based on the aggregative weighted algorithm, introducing the sound weight factors, recomputing the duration of each single-phoneme mouth-shape animation, and controlling the synthesis of the mouth-shape animation further comprises: taking into account the influence of Chinese punctuation on continuous mouth-shape changes, when synthesizing continuous animation, the 7 punctuation marks that produce longer pauses within or at the end of a sentence are considered, namely the full stop (。), exclamation mark (！), question mark (？), enumeration comma (、), comma (，), semicolon (；) and colon (：); according to the length of the pause each of these 7 marks produces within or at the end of a sentence, it is assigned a different weight $w_{bi}$:
$$t_s' = w_s \bar{t}_p w_{bi}; \quad t_y' = w_y \bar{t}_p w_{bi}$$
$w_{bi}$ denotes the weight of the $i$-th punctuation mark. By varying $w_{bi}$ within a certain range, basic mouth shapes of a similar training set are generated and used in the continuous animation synthesis channel.
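Claim 4 scales both durations by a punctuation weight $w_{bi}$. The patent gives no numeric weights, so the values in `PAUSE_WEIGHT` below are hypothetical placeholders that only respect the stated idea that marks causing longer pauses receive larger weights. Applying a mark's weight to the syllable immediately before it is also an assumption, as are the function names.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical weights w_bi for the seven full-width marks named in claim 4.
PAUSE_WEIGHT = {
    "。": 1.6, "！": 1.6, "？": 1.6,  # sentence-final marks: longest pause
    "；": 1.4, "：": 1.3,             # semicolon, colon
    "，": 1.2, "、": 1.1,             # comma, enumeration comma: shortest
}


def punctuated_durations(t_p: float, w_s: float,
                         mark: Optional[str] = None) -> Tuple[float, float]:
    """(t_s', t_y') = (w_s * t_p * w_bi, w_y * t_p * w_bi); w_bi = 1 if no mark."""
    w_b = PAUSE_WEIGHT.get(mark, 1.0)
    w_y = 1.0 - w_s
    return w_s * t_p * w_b, w_y * t_p * w_b


def sentence_timeline(tokens: Sequence[str], t_p: float,
                      w_s: float) -> List[Tuple[str, float, float]]:
    """Durations per syllable; a mark only stretches the syllable before it."""
    out = []
    for k, tok in enumerate(tokens):
        if tok in PAUSE_WEIGHT:
            continue  # punctuation itself has no mouth shape
        nxt = tokens[k + 1] if k + 1 < len(tokens) else None
        mark = nxt if nxt in PAUSE_WEIGHT else None
        out.append((tok, *punctuated_durations(t_p, w_s, mark)))
    return out
```

With $\bar{t}_p = 200$, $w_s = 0.5$ and the tokens `ni3 hao3 。`, the first syllable keeps (100, 100) while the syllable before the full stop is stretched to (160, 160) under the placeholder weight 1.6.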
CN201410712164.7A 2014-11-27 2014-11-27 A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm Active CN104361620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410712164.7A CN104361620B (en) 2014-11-27 2014-11-27 A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410712164.7A CN104361620B (en) 2014-11-27 2014-11-27 A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm

Publications (2)

Publication Number Publication Date
CN104361620A CN104361620A (en) 2015-02-18
CN104361620B true CN104361620B (en) 2017-07-28

Family

ID=52528878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410712164.7A Active CN104361620B (en) 2014-11-27 2014-11-27 A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm

Country Status (1)

Country Link
CN (1) CN104361620B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504304B (en) * 2016-09-14 2019-09-24 厦门黑镜科技有限公司 A kind of method and device of animation compound
CN106297792A (en) * 2016-09-14 2017-01-04 厦门幻世网络科技有限公司 The recognition methods of a kind of voice mouth shape cartoon and device
CN107808191A (en) * 2017-09-13 2018-03-16 北京光年无限科技有限公司 The output intent and system of the multi-modal interaction of visual human
CN108038461B (en) * 2017-12-22 2020-05-08 河南工学院 System and method for interactive simultaneous correction of mouth shape and tongue shape of foreign languages
CN108447474B (en) * 2018-03-12 2020-10-16 北京灵伴未来科技有限公司 Modeling and control method for synchronizing virtual character voice and mouth shape
CN108763190B (en) * 2018-04-12 2019-04-02 平安科技(深圳)有限公司 Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing
CN108777140B (en) * 2018-04-27 2020-07-28 南京邮电大学 Voice conversion method based on VAE under non-parallel corpus training
CN110853614A (en) * 2018-08-03 2020-02-28 Tcl集团股份有限公司 Virtual object mouth shape driving method and device and terminal equipment
TW202009924A (en) * 2018-08-16 2020-03-01 國立臺灣科技大學 Timbre-selectable human voice playback system, playback method thereof and computer-readable recording medium
CN109377540B (en) * 2018-09-30 2023-12-19 网易(杭州)网络有限公司 Method and device for synthesizing facial animation, storage medium, processor and terminal
CN109377539B (en) * 2018-11-06 2023-04-11 北京百度网讯科技有限公司 Method and apparatus for generating animation
CN109830236A (en) * 2019-03-27 2019-05-31 广东工业大学 A kind of double vision position shape of the mouth as one speaks synthetic method
CN110134305B (en) * 2019-04-02 2022-12-09 北京搜狗科技发展有限公司 Method and device for adjusting speech rate
CN110136698B (en) * 2019-04-11 2021-09-24 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for determining mouth shape
CN110176284A (en) * 2019-05-21 2019-08-27 杭州师范大学 A kind of speech apraxia recovery training method based on virtual reality
CN110413841A (en) * 2019-06-13 2019-11-05 深圳追一科技有限公司 Polymorphic exchange method, device, system, electronic equipment and storage medium
CN110347867B (en) * 2019-07-16 2022-04-19 北京百度网讯科技有限公司 Method and device for generating lip motion video
CN110428812B (en) * 2019-07-30 2022-04-05 天津大学 Method for synthesizing tongue ultrasonic video according to voice information based on dynamic time programming
CN110446066B (en) * 2019-08-28 2021-11-19 北京百度网讯科技有限公司 Method and apparatus for generating video
CN113689880A (en) * 2020-05-18 2021-11-23 北京搜狗科技发展有限公司 Method, device, electronic equipment and medium for driving virtual human in real time
WO2021232876A1 (en) * 2020-05-18 2021-11-25 北京搜狗科技发展有限公司 Method and apparatus for driving virtual human in real time, and electronic device and medium
CN111915707B (en) * 2020-07-01 2024-01-09 天津洪恩完美未来教育科技有限公司 Mouth shape animation display method and device based on audio information and storage medium
CN112184859B (en) * 2020-09-01 2023-10-03 魔珐(上海)信息科技有限公司 End-to-end virtual object animation generation method and device, storage medium and terminal
CN112331184B (en) * 2020-10-29 2024-03-15 网易(杭州)网络有限公司 Voice mouth shape synchronization method and device, electronic equipment and storage medium
CN112750187A (en) * 2021-01-19 2021-05-04 腾讯科技(深圳)有限公司 Animation generation method, device and equipment and computer readable storage medium
CN113112575B (en) * 2021-04-08 2024-04-30 深圳市山水原创动漫文化有限公司 Mouth shape generating method and device, computer equipment and storage medium
CN113643413A (en) * 2021-08-30 2021-11-12 北京沃东天骏信息技术有限公司 Animation processing method, animation processing device, animation processing medium and electronic equipment
CN115222856B (en) * 2022-05-20 2023-09-26 一点灵犀信息技术(广州)有限公司 Expression animation generation method and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101520903A (en) * 2009-04-23 2009-09-02 北京水晶石数字科技有限公司 Method for matching Chinese mouth shape of cartoon role
CN101826216A (en) * 2010-03-31 2010-09-08 中国科学院自动化研究所 Automatic generating system for role Chinese mouth shape cartoon
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method

Non-Patent Citations (4)

Title
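Research on Chinese speech-synchronized 3D mouth-shape animation based on a weighted algorithm; 毕永新 (Bi Yongxin) et al.; Journal of Graphics (图学学报); 2012-04-30; Vol. 33, No. 2; pp. 98-102 *
Research on Chinese speech-synchronized 3D mouth-shape animation based on a comprehensive weighted algorithm; 毕永新 (Bi Yongxin); China Master's Theses Full-text Database (electronic journal); 2013-03-15; Chapter 2 Section 2.1, Chapter 3 Section 3.4.2, Chapter 4 Sections 4.1-4.2, Chapter 5 Section 5.1 *
Design and implementation of an automatic Chinese pronunciation scoring system based on speech recognition; 吕军 (Lü Jun); Computer Engineering and Design (计算机工程与设计); 2007-03-31; Vol. 28, No. 5; pp. 1232-1235 *
3D mouth-shape animation based on prosodic text; 尹宝才 (Yin Baocai) et al.; Journal of Beijing University of Technology (北京工业大学学报); 2009-12-31; Vol. 35, No. 12; pp. 1690-1696 *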

Also Published As

Publication number Publication date
CN104361620A (en) 2015-02-18

Similar Documents

Publication Publication Date Title
CN104361620B (en) A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm
CN113781610B (en) Virtual face generation method
CN102169642B (en) Interactive virtual teacher system having intelligent error correction function
CN109712627A (en) It is a kind of using speech trigger virtual actor's facial expression and the voice system of mouth shape cartoon
Granström et al. Audiovisual representation of prosody in expressive speech communication
US20060009978A1 (en) Methods and systems for synthesis of accurate visible speech via transformation of motion capture data
CN103258340B (en) Is rich in the manner of articulation of the three-dimensional visualization Mandarin Chinese pronunciation dictionary of emotional expression ability
CN103218842A (en) Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
Naert et al. A survey on the animation of signing avatars: From sign representation to utterance synthesis
CN102820030A (en) Vocal organ visible speech synthesis system
Ma et al. Accurate automatic visible speech synthesis of arbitrary 3D models based on concatenation of diviseme motion capture data
Yu et al. Data-driven 3D visual pronunciation of Chinese IPA for language learning
Karpov et al. Multimodal synthesizer for Russian and Czech sign languages and audio-visual speech
Liu et al. An interactive speech training system with virtual reality articulation for Mandarin-speaking hearing impaired children
Massaro et al. A multilingual embodied conversational agent
Ouni et al. Training Baldi to be multilingual: A case study for an Arabic Badr
CN116665275A (en) Facial expression synthesis and interaction control method based on text-to-Chinese pinyin
Yu et al. 3D visual pronunciation of Mandarine Chinese for language learning
Li et al. A novel speech-driven lip-sync model with CNN and LSTM
Burgos et al. Engaging human-to-robot attention using conversational gestures and lip-synchronization
Busso et al. Learning expressive human-like head motion sequences from speech
Gibet et al. Toward a motor theory of sign language perception
Yu A real-time 3d visual singing synthesis: From appearance to internal articulators
Li et al. Multimodal 3D visible articulation system for syllable based mandarin chinese training
Edge et al. Model-based synthesis of visual speech movements from 3D video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant