CN104361620B - A mouth-shape animation synthesis method based on a comprehensive weighting algorithm - Google Patents
- Publication number
- CN104361620B CN201410712164.7A
- Authority
- CN
- China
- Prior art keywords
- mouth
- chinese
- shape
- speaks
- cartoon
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Processing Or Creating Images (AREA)
Abstract
A mouth-shape animation synthesis method based on a comprehensive weighting algorithm, comprising the following steps: the input Chinese text is analyzed and its characters are split into Chinese visual phonemes (visemes); these phonemes are sent to a speech synthesis system to synthesize a basic visual phoneme stream; a realistic parametric face model is built on the MPEG-4 standard; visual phoneme animation frame parameters drive the deformation of the model; and a background image and hierarchically processed noise are added, yielding vivid, realistic mouth-shape animation.
Description
Technical field
The present invention relates to the field of facial expression animation research, and more particularly to research on synthesizing mouth-shape animation in which the mouth shape matches the speech.
Background art
As computer animation technology advances, users expect ever more of mouth-shape animation in human-computer interaction. The development of Chinese mouth-shape animation, however, lags behind. On the one hand, mouth-shape animation is a multidisciplinary research direction spanning human-computer interaction, computer graphics, speech and linguistics, and these disciplines have developed unevenly, so building a realistic, highly automated mouth-shape animation system remains a challenging research topic. On the other hand, although a quarter of the world's population speaks Chinese and a Chinese mouth-shape animation system would have a very wide market, the complexity of the language itself means that research on speech-driven mouth-shape animation for Chinese is relatively scarce and its development relatively slow. The work of domestic scholars in particular is still at an early stage and lacks accumulated theory and technology. As a result, little software can produce Chinese mouth-shape animation, and well-known tools such as the Poser lip-sync tool Mimic and the 3ds max plug-in Voice-O-Matic mainly target English and support Chinese poorly.
Research on English mouth-shape animation has successively produced coarticulation models and text-driven, speech-driven and hybrid-driven methods. Guiard-Marigny et al. proposed a method that synthesizes mouth-shape animation driven jointly by speech and images. Bregler et al. proposed the Video Rewrite method, which uses computer vision to track feature points on the speaker's lips and uses morphing to combine these lip poses into the final mouth-shape animation sequence. Kang Liu and Jörn Ostermann proposed a correspondence between mouth shapes and letter phonemes in English and built face and mouth-shape animation synthesis algorithms on the MPEG-4 animation standard. Research on Chinese mouth-shape animation is comparatively scarce, and the realism of synthesized Chinese mouth-shape animation is unlikely to reach or surpass the international state of the art in the short term, which makes such research an urgent need. In addition, the prior art gives no consideration to background noise or background images, so the animation is not vivid or realistic enough, scenes cannot be simulated as required, and noise cannot be adjusted as needed to improve the animation.
Approaching the problem from the construction of speech-driven mouth-shape animation synthesis, the present invention studies in depth the design of a three-dimensional lip-region model, the design of lip-motion sequences, a Chinese speech synchronization algorithm and personalized mouth-shape modeling. Given Chinese text as input, it synthesizes and outputs virtual-human mouth-shape animation that is visually highly realistic, with mouth shape and speech closely coordinated and synchronized. An added background image lets the animation simulate various scenes as needed, and hierarchical processing and addition of noise make the animation more vivid and realistic.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a mouth-shape animation synthesis method based on a comprehensive weighting algorithm. The method outputs virtual-human mouth-shape animation that is visually highly realistic, with mouth shape and speech closely synchronized, and that is vivid and lifelike. It comprises the following steps in order:
Step 1: Input Chinese text, analyze it, split the Chinese characters into Chinese visual phonemes, and send these phonemes to a speech synthesis system to synthesize a basic visual phoneme stream;
Step 2: Build a realistic parametric face model based on the MPEG-4 standard, and use visual phoneme animation frame parameters to drive the deformation of the model, producing the facial mouth-shape animation;
Step 3: Obtain from the input Chinese text the input background noise synchronized with it, analyze the input background noise and smooth it to obtain the initial input background noise;
Step 4: From the visual phonemes obtained by splitting the Chinese characters, extract the per-phoneme input background noise, analyze it and smooth it to obtain the initial phoneme input background noise;
Step 5: Use the initial phoneme input background noise to correct the initial input background noise, obtaining the corrected input background noise;
Step 6: Based on the comprehensive weighting algorithm, obtain the sound time control ratio, add the sound weight factors, recalculate the duration of each single-phoneme mouth-shape animation, and control the synthesis of the mouth-shape animation so that the synthesized Chinese speech is synchronized with the facial mouth-shape animation;
Step 7: Add a background image according to the animation scene, synchronized with the synthesized Chinese speech and the facial mouth-shape animation;
Step 8: Based on the comprehensive weighting algorithm, obtain the noise time control ratio, add the noise weight factor, and calculate the noise synchronization time of the corrected input background noise;
Step 9: According to the needs of the animation, selectively add the corrected input background noise and synchronize it with the composite animation of synthesized Chinese speech, facial mouth-shape animation and background image, producing realistic facial mouth-shape animation.
Analyzing the input Chinese text and splitting the characters into Chinese visual phonemes means dividing each character according to the initials and finals of standard Chinese pinyin, defining the initial part and the final part of the mouth-shape pinyin, and converting the character's standard pinyin into mouth-shape pinyin composed of initial-part and final-part symbols.
The mouth-shape animation synthesis method of the invention provides the following:
(1) Three-dimensional mouth-shape modeling based on a finite feature-point control method: lip-region feature points are chosen or defined according to the facial feature point parameters defined by MPEG-4, the state data of the feature points are tracked and analyzed together, and the normalized lip-region state data are used for three-dimensional mouth-shape modeling;
(2) Lip-motion sequence design based on weighted control of the initial and final parts: the proportions of time taken by the initial part and the final part are weighted to control the role each plays in animation synthesis;
(3) The effect of punctuation marks in Chinese text on pauses in read speech is applied, as a novel step, to coordinated speech and mouth-shape animation: the pause durations of the various punctuation marks in read text are analyzed statistically, Chinese punctuation marks are classified by pause duration, and a model is built relating pause duration to overall reading speed; in parallel, the duration ratio parameters between adjacent lip shapes in the lip-motion sequence model are analyzed, and punctuation pauses and the lip-shape parameter model are combined by weighting, giving a Chinese speech mouth-shape animation system in which speech and mouth shape are coordinated and synchronized;
(4) Chinese visual phonemes are classified and mapped to basic pronunciation mouth shapes: according to the mouth-shape features of Chinese pinyin phonemes, the initial part and final part of pinyin are redivided, the standard initials table is simplified to six basic classes and the final part is divided into four mouth-shape classes, and a cosine function handles the transition from the initial-part key frame to the final-part key frame, making the animation smoother;
(5) A background image can be added, so that different backgrounds can be selected as required and the animation presented in different scenes, making it more vivid and realistic;
(6) Noise is processed and added by level, so that the noise level can be adjusted to the scene: in a meeting, for example, noise can be omitted or its level lowered so that the setting is quieter and the audience can hear clearly; when ambient noise is wanted, it can be presented as is or at the required level, for example the sound of water or birdsong accompanying the background, making the animation more vivid and realistic;
(7) The comprehensive weighting algorithm is also used to process noise by level, making synthesis and synchronization more flexible and bringing the synchronized result closer to what the synthesis requires.
Brief description of the drawings
Fig. 1: flowchart of Chinese speech-synchronized mouth-shape processing;
Fig. 2: facial animation parameter units (FAPU);
Fig. 3: mouth region model;
Fig. 4: comparison of the actual time-domain waveform of a pronunciation with the animation synthesis control under sound weighting.
Detailed description
The specific implementation of the invention is described in detail below. The following embodiments only further illustrate the invention and are not to be construed as limiting its scope of protection; non-essential modifications and adaptations that a person skilled in the art makes on the basis of the above summary still fall within the scope of protection of the invention.
Analysis of the mouth-shape features of Chinese pinyin pronunciation
From the standpoint of timbre, the basic units of speech are the phone, the syllable, the tone and the phoneme. Here a phoneme is taken to be the smallest unit that makes up a syllable, that is, the smallest speech segment. If a syllable is divided further by differences in timbre, the result is a set of minimal units, each with its own characteristics; these are the phonemes. Mandarin has 32 phonemes, which fall into two major classes, vowels and consonants: 10 vowel phonemes and 22 consonant phonemes. Based on the articulation features of the phonemes described in the "Scheme for the Chinese Phonetic Alphabet", combined with the division into initials and finals in standard Chinese pinyin, the basic mouth shapes are divided into three levels, as shown in Table 1.
Generally speaking, one Chinese character represents one syllable. The only exception is the erhua (r-colored) syllable, a phenomenon of Mandarin speech: the word for "playing", for example, is written with two characters but read as the single syllable "wanr". The invention considers only the general rules of Chinese pinyin pronunciation. The erhua special case is treated as two syllables, so that "wanr" is analyzed by the system as the two syllables "wan" and "er".
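The erhua rule above can be sketched in a few lines. This is a minimal illustration, assuming syllables arrive as toneless pinyin strings in which erhua is written as a trailing "r"; the function name `split_erhua` is hypothetical and does not come from the patent.

```python
from typing import List


def split_erhua(syllable: str) -> List[str]:
    """Split an erhua syllable such as "wanr" into two plain syllables.

    Assumption: erhua is written as a base syllable plus a trailing "r".
    The standalone syllable "er" is returned unchanged.
    """
    s = syllable.lower()
    if len(s) > 1 and s.endswith("r") and s != "er":
        return [s[:-1], "er"]
    return [s]


print(split_erhua("wanr"))
```

Syllables without the trailing "r" pass through unchanged, so the function can be applied to every syllable of the input before any further conversion.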
Table 1. Classification of basic mouth shapes for Chinese pronunciation
Redefinition scheme for mouth-shape animation pinyin
In Mandarin, the initials are consonants, 23 in total, including b, p, m, f and so on. There are 38 finals, each consisting of a single vowel (such as a, o, e), a combination of two vowels (such as ai, ie) or a combination of three vowels (such as iao). Much as standard pinyin divides a syllable into an initial and a final, the pinyin of each Chinese character is here defined as two parts: the initial part (s) and the final part (y). Each part corresponds to one mouth-shape state. In the animation, each time the character speaks one Chinese character the mouth deforms from the initial-part key frame to the final-part key frame. The durations of these two key frames are controlled by sound weighting, so that the timing of the two is distributed sensibly and the animation looks more lifelike.
To produce realistic, natural mouth-shape animation, every combination of initial and final would have to be considered. Building a separate mouth shape for every initial and every final and then handling their combinations would increase both the system's running time and the amount of repetitive work. The feature classification in Table 1 shows that many phonemes have identical or similar pronunciation mouth shapes, so for speed and ease of use the invention adopts a compromise and redefines standard Chinese pinyin. According to the mouth-shape features in Table 1, the initial part and final part of pinyin are redivided: the standard initials table is simplified to six basic classes (Table 2), and the final part is divided into four mouth-shape classes (Table 3).
Table 2. Conversion table for standard Chinese pinyin initials
Table 3. Conversion table for standard Chinese pinyin finals
The initial part is defined mainly by grouping initials with the same or similar mouth-shape features: s-b, lips closed, blocking the airflow; s-f, upper teeth touching the lower lip to form a narrow gap; s-d, mouth slightly open, lips relaxed, with only slight change in mouth shape; s-g, jaw opened to a quarter of its maximum angle, lips relaxed; s-r, lips protruded and tightened; s-y, lips stretched to both sides. Likewise by mouth-shape features, the final part is divided into: the y-a mouth shape, used mainly for unrounded finals pronounced with a wide lip opening, such as a, an; the y-o mouth shape, used mainly for finals pronounced with the lips slightly rounded and pushed forward, such as o, ou; the y-e mouth shape, used mainly for unrounded finals pronounced with the mouth half open, such as e, i; and y-o, used mainly for finals pronounced with the lips protruding forward in a rounded shape that leaves only a small gap, such as u. Using Table 2 and Table 3, the invention converts the pinyin of every Chinese character into a mouth-shape initial part and a mouth-shape final part; the two characters of the word "animation", for example, are expressed as s-d → y-o and s-d → y-a. If s-b, s-d, s-f, s-r, s-y, y-a, y-o, s-g and y-e, y-i are made into 9 mouth-shape models, then the change between each pair of model key frames constitutes the pronunciation mouth-shape animation of one Chinese character.
Dividing a character into initial-part and final-part mouth shapes by consonant and vowel applies to essentially all Chinese characters. The only exceptions are the pinyin of single-phoneme characters such as a, o, e (hungry), ai (love), ei, ao (coat), en (grace) and er (child), which consist of a final alone in the pinyin division. Under the classification above they would have only a final-part mouth shape, so only a single final-part mouth shape would exist during animation synthesis. For uniformity, a fixed initial-part mouth-shape symbol is prepended to each of them; it is called the natural model and written "&". The final result of this pinyin conversion is shown in Table 4:
Once the initial part and final part of the mouth-shape pinyin are defined, the next task is conversion, that is, converting the standard pinyin of a Chinese character into mouth-shape pinyin composed of initial-part and final-part symbols. For ease of implementation, the mouth-shape marks of the initial part and final part are simplified in this work by dropping the leading "s-" and "y-" and writing only the one remaining letter. After simplification there are 10 symbol letters: a, o, e, i, b, d, f, r, y, g. Table 5 gives some examples of converting the pinyin of Chinese characters:
Table 4. Conversion table for single-phoneme pinyin
Table 5. Examples of Chinese character pinyin conversion
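The first stage of the conversion described above, splitting a pinyin syllable into an initial and a final and prepending the natural model "&" when there is no initial, might look like the sketch below. It assumes toneless pinyin input and the usual list of 23 initials (counting y and w); the names `INITIALS`, `NATURAL_MODEL` and `split_syllable` are illustrative only.

```python
from typing import Tuple

# Standard pinyin initials. Two-letter initials come first so that
# "zh", "ch" and "sh" are matched before "z", "c" and "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# Fixed initial-part symbol for syllables that consist of a final only.
NATURAL_MODEL = "&"


def split_syllable(pinyin: str) -> Tuple[str, str]:
    """Return (initial, final) for one toneless pinyin syllable."""
    s = pinyin.lower()
    for initial in INITIALS:
        # Require a non-empty final after the initial.
        if s.startswith(initial) and len(s) > len(initial):
            return initial, s[len(initial):]
    return NATURAL_MODEL, s


for syllable in ("dong", "hua", "ai", "er"):
    print(syllable, split_syllable(syllable))
```

Mapping each initial and final onto the six initial-part and four final-part mouth-shape classes would then be a table lookup driven by Tables 2 and 3, which this sketch deliberately leaves out.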
Fig. 2 shows the definition of the face model in its neutral state: the gaze is along the Z axis and all facial muscles are relaxed; the eyelids are tangent to the iris; the pupil is one third of the iris diameter (IRISD0); the upper and lower lips are in natural contact; the lip line is horizontal and at the same height as the upper lip; the mouth is closed, and the tongue lies flat with its tip touching the boundary between the upper and lower teeth.
Fig. 3 shows the mouth region model. Each recorded data point is a point in a 372-dimensional space, since the 124 vertices of the model are three-dimensional (124 × 3 = 372). Because only 145 training observations are used, the covariance matrix Σ has rank at most 144, that is, at most 144 independent degrees of freedom. Principal component analysis (PCA) of the matrix shows that most of the variation is captured by the first 10 components or "feature points", namely the 10 largest eigenvalues λ of the covariance matrix Σ, and the mouth-shape animation feature points are chosen accordingly.
The algorithm that synthesizes animation by controlling the mouth shape with time-weighted sound rests on one basic principle: within the same class of phonemes the basic mouth-shape changes of the animation are very similar, whereas across different phonemes they differ greatly. Likewise, in mouth-shape animation synthesis the mouth-shape features of the initial part and the final part of different phonemes change markedly, so a sound time weighting parameter is used here to distinguish the differences in mouth-shape features.
Let the feature vectors of the time frames in speech segments a and b be $X_i$ and $Y_j$ respectively ($1 \le i \le N_a$, $1 \le j \le N_b$). If the Euclidean distance between $X_i$ and $Y_j$ is $d_{ij}$, the inter-segment distance between a and b is:
$$D_{a,b} = \frac{1}{N_a N_b} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b} d_{ij} \tag{1}$$
Here $D_{a,b}$ is the mean of all feature-vector distances between a and b and reflects the overall difference between them. Suppose the mouth-shape animation to be segmented is divided into T frames, labeled 1, ..., T. Taking frame t as the boundary and m frames on each side to form two sub-segments, that is, $i \in [t-m+1, \dots, t]$ and $j \in [t+1, \dots, t+m]$, formula (1) gives the inter-segment distance of the two sub-segments as:
$$D(t) = \frac{1}{m^2} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij} \tag{2}$$
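The two distances defined above can be computed directly from their description as the mean Euclidean distance over all frame pairs. The sketch below assumes each frame's feature vector is a plain sequence of floats; the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def segment_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Mean Euclidean distance over all pairs (one frame from a, one from b)."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def boundary_distance(frames: Sequence[Vector], t: int, m: int) -> float:
    """Distance between the m frames ending at frame t and the m frames after it.

    Frames are numbered 1..T as in the text, so frame k is frames[k - 1].
    """
    before = frames[t - m:t]   # frames t-m+1 .. t
    after = frames[t:t + m]    # frames t+1 .. t+m
    return segment_distance(before, after)


frames = [(0.0, 0.0), (0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
print(boundary_distance(frames, t=2, m=2))
```

A large value at frame t suggests that the frames on either side of t belong to different mouth shapes, which is how such a measure could be used to place the boundary between the initial part and the final part.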
With the sound time control ratio calculated from formulas (1) and (2), the sound weight factors $w_s$ and $w_y$ are added, the duration of the single-phoneme mouth-shape animation is recalculated, and the synthesis of the mouth-shape animation is controlled,
where $w_s + w_y = 1$.
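One possible reading of this weighting is sketched below, assuming (this is not stated in the text) that the two weights simply divide one syllable's total duration between the initial-part key frame and the final-part key frame. The function name and the sample weights are illustrative only.

```python
from typing import Tuple


def split_duration(total: float, w_s: float, w_y: float) -> Tuple[float, float]:
    """Divide a syllable duration between initial part and final part.

    Assumption: the weights act as simple proportions of the total time.
    """
    if w_s < 0 or w_y < 0 or abs(w_s + w_y - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return total * w_s, total * w_y


# Illustrative values: a 1.0 s syllable with a quarter of the time on the initial part.
print(split_duration(1.0, 0.25, 0.75))
```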
To obtain an accurate average duration for the pronunciation of a Chinese character, pronunciation durations are sampled, with the reading speed kept the same throughout sampling. The invention takes M groups of data with N sample points per group and averages each group. After evaluation by the labeling system, the average duration of the group with the smallest variance is taken as the standard duration of a Chinese character's mouth-shape animation for synthesis.
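The selection rule above, take the mean of the group whose samples vary least, can be sketched with the standard library. The choice of population variance (`pvariance`) rather than sample variance is an assumption, as are the function name and the sample durations.

```python
from statistics import mean, pvariance
from typing import Sequence


def standard_duration(groups: Sequence[Sequence[float]]) -> float:
    """Mean duration of the sample group with the smallest variance."""
    most_consistent = min(groups, key=pvariance)
    return mean(most_consistent)


# Illustrative durations in seconds: M = 2 groups of N = 3 samples each.
groups = [
    [0.30, 0.50, 0.40],
    [0.38, 0.40, 0.42],
]
print(standard_duration(groups))
```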
To account for the effect of Chinese punctuation on continuous mouth-shape changes, continuous animation synthesis considers the seven punctuation marks that produce longer pauses within or at the end of a sentence: the full stop, exclamation mark, question mark, enumeration comma, comma, semicolon and colon. Each of these seven marks is assigned a different weight $w_{bi}$ according to the length of the pause it produces within or at the end of a sentence, formula (4).
$w_{bi}$ is the weight of the i-th punctuation mark. Varying the weight $w_{bi}$ within a certain range generates basic mouth shapes similar to those of the training set, for use in continuous animation synthesis.
Fig. 4 compares the actual time-domain waveform of a pronunciation with the animation synthesis control under sound weighting and shows that synthesis is better after sound weighting. The upper plot is the time-domain waveform of a Chinese syllable; the lower plot is the animation generation timing for a single syllable under sound weighting control.
Continuous smoothing of mouth-shape animation synthesis: to solve the transition problem between two consecutive mouth-shape actions, the invention applies the sound weighting control algorithm to the initials and finals of standard Chinese pinyin to control the durations of the initial mouth shape and the final mouth shape, and uses cosine-function interpolation in the transition between two mouth-shape animations, interpolating the offset from the end position of one action to the start position of the next, so that consecutive mouth-shape animations join with good continuity. The cosine interpolation algorithm is as follows:
Once t (the current time, relative to $t_0$) has been determined by the audio device, the displacement of the lip nodes can be calculated. The position of each lip node is defined by $x_i(t) = [x(t), y(t), z(t)]'$, where $i = 1, 2, \dots, n$ indexes the control nodes that define the geometry and topology of the mouth. To interpolate the control node positions of a complete mouth shape, the topology must remain fixed, and the control nodes in every lip shape template must be consistent. The position $X(s)$ of each node of the intermediate interpolated mouth shape can be calculated from the positions of the initial and final nodes $X_0$ and $X_1$, as follows:
The variable s is generally a linear or nonlinear transformation of t, with $0 \le s \le 1$. Motion based on linear interpolation, however, does not show the acceleration and deceleration characteristic of real movement. A cosine function gives an interpolation that approximates acceleration and deceleration more closely:
$$s' = s\,\bigl(1 - \cos(\pi\,(s_0 - s))\bigr)/2 \tag{6}$$
Cosine interpolation is an effective solution, and the system produces satisfactory results with it.
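Cosine interpolation between two key frames might be sketched as follows. Two assumptions are made here: the intermediate shape is a linear blend of the two key frames, and the easing curve is the commonly used form s' = (1 − cos(πs))/2, which may differ from formula (6) as printed. Function and type names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Node = Tuple[float, float, float]


def cosine_ease(s: float) -> float:
    """Map s in [0, 1] to [0, 1], starting and ending with zero slope."""
    return (1.0 - math.cos(math.pi * s)) / 2.0


def interpolate_shape(x0: Sequence[Node], x1: Sequence[Node], s: float) -> List[Node]:
    """Blend two mouth shapes node by node using the eased parameter."""
    if len(x0) != len(x1):
        # The topology must stay fixed: both key frames need the same nodes.
        raise ValueError("key frames must have the same number of control nodes")
    w = cosine_ease(s)
    return [
        (a[0] + w * (b[0] - a[0]),
         a[1] + w * (b[1] - a[1]),
         a[2] + w * (b[2] - a[2]))
        for a, b in zip(x0, x1)
    ]


closed = [(0.0, 0.0, 0.0)]
opened = [(2.0, 4.0, 6.0)]
for s in (0.0, 0.25, 0.5, 1.0):
    print(s, interpolate_shape(closed, opened, s))
```

Because the eased parameter changes slowly near s = 0 and s = 1, the mouth accelerates out of the first key frame and decelerates into the second instead of moving at constant speed.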
Background noise is extracted with speech and noise separation techniques conventional in the art. Noise analysis and the synchronization of noise with the animation proceed in a way similar to the Chinese text: again based on the comprehensive weighting algorithm, the noise time control ratio is obtained, the noise weight factor is added, and the noise synchronization time of the corrected input background noise is calculated, so that the animation is synthesized according to actual needs.
Fig. 1 is the flow chart of Chinese speech-synchronized mouth shape processing. The specific steps are as follows:
Step 1. Input the Chinese text;
Step 2. Convert the text into the pinyin of the Chinese pronunciation;
Step 3. Generate samples of synthesized speech from the text;
Step 4. Query the audio processor and determine the current phoneme from the speech playback processor;
Step 5. Compute the current mouth shape from the trajectory of the current syllable;
Step 6. Display the mouth shape in synchrony with the synthesized speech, and return to Step 4 until no readable phoneme remains.
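The polling loop of Steps 4 to 6 can be sketched as below. This is only an illustration: the audio clock is simulated with integer milliseconds, and the function names, the frame interval and the phoneme timings are all hypothetical values chosen for the example, not taken from the patent.

```python
from bisect import bisect_right

def phoneme_at(starts_ms, phonemes, end_ms, t_ms):
    """Return the phoneme playing at t_ms, or None outside the utterance."""
    if t_ms < starts_ms[0] or t_ms >= end_ms:
        return None
    # The active phoneme is the last one whose start time is not after t_ms.
    return phonemes[bisect_right(starts_ms, t_ms) - 1]

def run_sync(starts_ms, phonemes, end_ms, frame_ms):
    """Poll the simulated audio clock once per frame until no phoneme remains."""
    frames = []
    t_ms = 0
    while True:
        current = phoneme_at(starts_ms, phonemes, end_ms, t_ms)  # Step 4
        if current is None:
            break
        # Step 5 would compute the mouth shape for `current` at this point.
        frames.append((t_ms, current))  # Step 6: record what is displayed
        t_ms += frame_ms
    return frames

# Hypothetical timing for the syllable "ni": initial "n", then final "i".
frames = run_sync([0, 80], ["n", "i"], 300, 40)
```

Driving the animation from the audio clock, rather than from a frame counter alone, keeps the displayed mouth shape tied to what is actually being played even if a frame is rendered late.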
Although illustrative embodiments of the invention have been described for purposes of illustration, those skilled in the art will understand that various modifications, additions and substitutions in form and detail may be made without departing from the scope and spirit of the invention disclosed in the appended claims, that all such changes fall within the scope of protection of the appended claims, and that the parts of the claimed product and the steps of the claimed method may be combined in any form. The description of the embodiments disclosed herein is therefore not intended to limit the scope of the invention but to describe it. Accordingly, the scope of the invention is not limited by the above embodiments but is defined by the claims or their equivalents.
Claims (4)
1. A mouth shape animation synthesis method based on an aggregative weighted algorithm, characterized in that it comprises the following steps in order:
Step 1: Input Chinese text and analyze it, splitting each Chinese character into its different Chinese visual phonemes, and send these phonemes to a speech synthesis system to synthesize a basic visual phoneme stream;
Step 2: Build a realistic parametric face model based on the MPEG-4 standard, and use the visual phoneme animation frame parameters to drive the deformation of the model, producing the facial mouth shape animation;
Step 3: Obtain, from the input Chinese text, the input ambient noise synchronized with that text, analyze the input ambient noise and smooth it to obtain the initial input ambient noise;
Step 4: From the visual phonemes into which the Chinese characters were split, extract the input ambient noise of each split phoneme, analyze the phoneme input ambient noise and smooth it to obtain the initial phoneme input ambient noise;
Step 5: Use the initial phoneme input ambient noise to correct the initial input ambient noise, obtaining the corrected input ambient noise;
Step 6: Based on the aggregative weighted algorithm, obtain the sound time control ratio, add the sound weighting factors, recalculate the duration of each single-phoneme mouth shape animation, and control the synthesis of the mouth shape animation so that the synthesized Chinese speech is synchronized with the facial mouth shape animation;
Step 7: Add a background image according to the animation scene, synchronized with the synthesized Chinese speech and the facial mouth shape animation;
Step 8: Based on the aggregative weighted algorithm, obtain the noise time control ratio, add the noise weighting factor, and calculate the noise synchronization time of the corrected input ambient noise;
Step 9: According to the needs of the animation synthesis, selectively add the corrected input ambient noise and synchronize it with the composite animation of the synthesized Chinese speech, the facial mouth shape animation and the background image, producing a lifelike facial mouth shape animation.
2. The mouth shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 1, characterized in that: in said analyzing of the input Chinese text and splitting of each Chinese character into different Chinese visual phonemes, each character is divided according to the initial and the final of its standard Chinese pinyin, the initial part and the final (rhyme) part of the mouth shape pinyin are defined, and the standard pinyin of the character is converted into a mouth shape pinyin composed of the symbols of the mouth shape initial part and the mouth shape final part.
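The division of a syllable into initial and final that claim 2 relies on can be illustrated with a short sketch. It assumes toneless, lower-case pinyin input and, as in common teaching practice, treats "y" and "w" as initials; the function name `split_pinyin` is hypothetical, and the mapping from these parts to mouth shape symbols is not shown because the source does not list it.

```python
# Standard pinyin initials; the two-letter initials come first so that
# "zh", "ch" and "sh" are matched before "z", "c" and "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_pinyin(syllable):
    """Split one toneless pinyin syllable into (initial, final)."""
    s = syllable.lower()
    for initial in INITIALS:
        if s.startswith(initial):
            return initial, s[len(initial):]
    return "", s  # zero-initial syllable such as "an" or "er"

# "zhong guo": each syllable yields an initial part and a final part.
parts = [split_pinyin(p) for p in ["zhong", "guo", "an"]]
```

Each resulting pair gives the two units whose mouth shapes are timed separately: the initial part and the final (rhyme) part.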
3. The mouth shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 1, characterized in that: said obtaining, based on the aggregative weighted algorithm, of the sound time control ratio, adding of the sound weighting factors, recalculating of the duration of the single-phoneme mouth shape animation and controlling of the synthesis of the mouth shape animation comprises the following steps:
The feature vectors of the time frames in speech segments a and b are X_i and Y_j respectively, where 1 ≤ i ≤ N_a and 1 ≤ j ≤ N_b; the Euclidean distance between X_i and Y_j is d_ij, and the inter-segment distance between segments a and b is then:

$$D_{a,b} = \frac{1}{N_a N_b} \sum_{i=1}^{N_a} \sum_{j=1}^{N_b} d_{ij}$$

D_{a,b} is the average of all feature-vector distances between a and b and reflects the overall difference between them. The mouth shape animation to be segmented is divided into T frames, labeled 1, ..., T; taking frame t as the boundary, m frames are taken before it and m frames after it to form two speech sub-segments, i.e. i ∈ [t−m+1, ..., t] and j ∈ [t+1, ..., t+m], and the inter-segment distance between these two sub-segments is

$$D_t = \frac{1}{m^2} \sum_{i=t-m+1}^{t} \sum_{j=t+1}^{t+m} d_{ij}$$

The sound time control ratio is calculated from this, the sound weighting factors ws and wy are added, and the duration of the single-phoneme mouth shape animation is recalculated to control the synthesis of the mouth shape animation:
where ws + wy = 1;
M groups of data with N sample points each are taken and averaged; after evaluation by the labeling system, the mean time at which the variance is smallest is taken as the standard time of the Chinese-character mouth shape animation used for mouth shape animation synthesis.
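The inter-segment distance of claim 3 can be sketched as follows, reading it as the mean Euclidean distance over every pair of frames drawn from the two segments, which is how the claim describes it. The function names, the 0-based frame indexing and the one-dimensional sample features are assumptions made for the example; how the distance is turned into the sound time control ratio is not shown because the source does not give that formula.

```python
import math

def inter_segment_distance(a, b):
    """Mean Euclidean distance over all pairs (x from a, y from b)."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))

def boundary_scores(frames, m):
    """Score each candidate boundary t (0-based) by comparing the m frames
    ending at t with the m frames starting at t + 1."""
    scores = {}
    for t in range(m - 1, len(frames) - m):
        before = frames[t - m + 1:t + 1]
        after = frames[t + 1:t + m + 1]
        scores[t] = inter_segment_distance(before, after)
    return scores

# Hypothetical 1-D features: three frames of one sound, then three of another.
features = [(0.0,), (0.0,), (0.0,), (1.0,), (1.0,), (1.0,)]
scores = boundary_scores(features, 2)
best = max(scores, key=scores.get)
```

A large score means the frames just before and just after t differ strongly, so the highest-scoring t is a natural candidate for the boundary between two phoneme segments.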
4. The mouth shape animation synthesis method based on an aggregative weighted algorithm as claimed in claim 3, characterized in that: said obtaining, based on the aggregative weighted algorithm, of the sound time control ratio, adding of the sound weighting factors, recalculating of the duration of the single-phoneme mouth shape animation and controlling of the synthesis of the mouth shape animation further comprises the following step: taking into account the influence of Chinese punctuation on continuous mouth shape changes, when continuous animation is synthesized the seven punctuation marks with longer pauses that occur within or at the end of a sentence are considered, namely the full stop, exclamation mark, question mark, enumeration comma, comma, semicolon and colon, and each of these seven marks is assigned a different weight w_bi according to the length of its pause within or at the end of the sentence,
where w_bi denotes the weight of the i-th of these marks; by varying the weights w_bi within a certain range, basic mouth shapes similar to those of the training set are generated and used in the continuous animation synthesis channel.
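The idea of punctuation-dependent pause weights can be illustrated with a small sketch. The weight values, the base pause length and the rule "pause = weight × base pause" are all hypothetical, chosen only to show the mechanism; the source names the seven marks but does not give their weights or the formula that uses them.

```python
# Hypothetical weights for the seven marks named in claim 4
# (full stop, exclamation mark, question mark, semicolon, colon,
# comma, enumeration comma). The values are illustrative only.
PAUSE_WEIGHTS = {
    "。": 1.0, "！": 1.0, "？": 1.0,
    "；": 0.75, "：": 0.75,
    "，": 0.5, "、": 0.25,
}

def pause_durations(text, base_pause_ms):
    """Return (mark, pause in ms) for each weighted mark in text, in order."""
    return [(ch, PAUSE_WEIGHTS[ch] * base_pause_ms)
            for ch in text if ch in PAUSE_WEIGHTS]

pauses = pause_durations("你好，世界。", 400)
```

During a pause the mouth would be held at, or eased back to, a rest shape for the computed time before the next syllable's animation starts.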
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410712164.7A CN104361620B (en) | 2014-11-27 | 2014-11-27 | A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410712164.7A CN104361620B (en) | 2014-11-27 | 2014-11-27 | A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361620A CN104361620A (en) | 2015-02-18 |
CN104361620B CN104361620B (en) | 2017-07-28 |
Family
ID=52528878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410712164.7A Active CN104361620B (en) | 2014-11-27 | 2014-11-27 | A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361620B (en) |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504304B (en) * | 2016-09-14 | 2019-09-24 | 厦门黑镜科技有限公司 | A kind of method and device of animation compound |
CN106297792A (en) * | 2016-09-14 | 2017-01-04 | 厦门幻世网络科技有限公司 | The recognition methods of a kind of voice mouth shape cartoon and device |
CN107808191A (en) * | 2017-09-13 | 2018-03-16 | 北京光年无限科技有限公司 | The output intent and system of the multi-modal interaction of visual human |
CN108038461B (en) * | 2017-12-22 | 2020-05-08 | 河南工学院 | System and method for interactive simultaneous correction of mouth shape and tongue shape of foreign languages |
CN108447474B (en) * | 2018-03-12 | 2020-10-16 | 北京灵伴未来科技有限公司 | Modeling and control method for synchronizing virtual character voice and mouth shape |
CN108763190B (en) * | 2018-04-12 | 2019-04-02 | 平安科技(深圳)有限公司 | Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing |
CN108777140B (en) * | 2018-04-27 | 2020-07-28 | 南京邮电大学 | Voice conversion method based on VAE under non-parallel corpus training |
CN110853614A (en) * | 2018-08-03 | 2020-02-28 | Tcl集团股份有限公司 | Virtual object mouth shape driving method and device and terminal equipment |
TW202009924A (en) * | 2018-08-16 | 2020-03-01 | 國立臺灣科技大學 | Timbre-selectable human voice playback system, playback method thereof and computer-readable recording medium |
CN109377540B (en) * | 2018-09-30 | 2023-12-19 | 网易(杭州)网络有限公司 | Method and device for synthesizing facial animation, storage medium, processor and terminal |
CN109377539B (en) * | 2018-11-06 | 2023-04-11 | 北京百度网讯科技有限公司 | Method and apparatus for generating animation |
CN109830236A (en) * | 2019-03-27 | 2019-05-31 | 广东工业大学 | A kind of double vision position shape of the mouth as one speaks synthetic method |
CN110134305B (en) * | 2019-04-02 | 2022-12-09 | 北京搜狗科技发展有限公司 | Method and device for adjusting speech rate |
CN110136698B (en) * | 2019-04-11 | 2021-09-24 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for determining mouth shape |
CN110176284A (en) * | 2019-05-21 | 2019-08-27 | 杭州师范大学 | A kind of speech apraxia recovery training method based on virtual reality |
CN110413841A (en) * | 2019-06-13 | 2019-11-05 | 深圳追一科技有限公司 | Polymorphic exchange method, device, system, electronic equipment and storage medium |
CN110347867B (en) * | 2019-07-16 | 2022-04-19 | 北京百度网讯科技有限公司 | Method and device for generating lip motion video |
CN110428812B (en) * | 2019-07-30 | 2022-04-05 | 天津大学 | Method for synthesizing tongue ultrasonic video according to voice information based on dynamic time programming |
CN110446066B (en) * | 2019-08-28 | 2021-11-19 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN113689880B (en) * | 2020-05-18 | 2024-05-28 | 北京搜狗科技发展有限公司 | Method, device, electronic equipment and medium for driving virtual person in real time |
CN113689879B (en) * | 2020-05-18 | 2024-05-14 | 北京搜狗科技发展有限公司 | Method, device, electronic equipment and medium for driving virtual person in real time |
CN111915707B (en) * | 2020-07-01 | 2024-01-09 | 天津洪恩完美未来教育科技有限公司 | Mouth shape animation display method and device based on audio information and storage medium |
CN112184859B (en) | 2020-09-01 | 2023-10-03 | 魔珐(上海)信息科技有限公司 | End-to-end virtual object animation generation method and device, storage medium and terminal |
CN112331184B (en) * | 2020-10-29 | 2024-03-15 | 网易(杭州)网络有限公司 | Voice mouth shape synchronization method and device, electronic equipment and storage medium |
CN112750187A (en) * | 2021-01-19 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Animation generation method, device and equipment and computer readable storage medium |
CN113112575B (en) * | 2021-04-08 | 2024-04-30 | 深圳市山水原创动漫文化有限公司 | Mouth shape generating method and device, computer equipment and storage medium |
CN113284506A (en) * | 2021-05-20 | 2021-08-20 | 北京沃东天骏信息技术有限公司 | Information mapping method and device, storage medium and electronic equipment |
CN113609255A (en) * | 2021-08-04 | 2021-11-05 | 元梦人文智能国际有限公司 | Method, system and storage medium for generating facial animation |
CN113643413A (en) * | 2021-08-30 | 2021-11-12 | 北京沃东天骏信息技术有限公司 | Animation processing method, animation processing device, animation processing medium and electronic equipment |
CN115222856B (en) * | 2022-05-20 | 2023-09-26 | 一点灵犀信息技术(广州)有限公司 | Expression animation generation method and electronic equipment |
CN114999440B (en) * | 2022-05-24 | 2024-07-26 | 北京百度网讯科技有限公司 | Avatar generation method, apparatus, device, storage medium, and program product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101520903A (en) * | 2009-04-23 | 2009-09-02 | 北京水晶石数字科技有限公司 | Method for matching Chinese mouth shape of cartoon role |
CN101826216A (en) * | 2010-03-31 | 2010-09-08 | 中国科学院自动化研究所 | Automatic generating system for role Chinese mouth shape cartoon |
CN103218842A (en) * | 2013-03-12 | 2013-07-24 | 西南交通大学 | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method |
Non-Patent Citations (4)
Title |
---|
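Research on Chinese speech-synchronized 3D mouth shape animation based on a weighted algorithm (基于加权算法的汉语语音同步三维口型动画研究); Bi Yongxin et al.; Journal of Graphics (《图学学报》); 20120430; Vol. 33, No. 2; pp. 98-102 * |
Research on Chinese speech-synchronized 3D mouth shape animation based on an aggregative weighted algorithm (基于综合加权算法的汉语语音同步三维口型动画研究); Bi Yongxin; China Master's Theses Full-text Database, electronic journal (《中国优秀硕士学位论文全文数据库(电子期刊)》); 20130315; Chapter 2 Section 2.1, Chapter 3 Section 3.4.2, Chapter 4 Sections 4.1-4.2, Chapter 5 Section 5.1 * |
Design and implementation of an automatic Chinese pronunciation scoring system based on speech recognition (基于语音识别的汉语发音自动评分系统的设计与实现); Lü Jun; Computer Engineering and Design (《计算机工程与设计》); 20070331; Vol. 28, No. 5; pp. 1232-1235 * |
3D mouth shape animation based on prosodic text (基于韵律文本的三维口型动画); Yin Baocai et al.; Journal of Beijing University of Technology (《北京工业大学学报》); 20091231; Vol. 35, No. 12; pp. 1690-1696 * |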
Also Published As
Publication number | Publication date |
---|---|
CN104361620A (en) | 2015-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104361620B (en) | A kind of mouth shape cartoon synthetic method based on aggregative weighted algorithm | |
CN113781610B (en) | Virtual face generation method | |
CN109712627A (en) | A voice system that uses speech to trigger a virtual character's facial expressions and mouth shape animation | 
CN103258340B (en) | Pronunciation method for a three-dimensional visualized Mandarin Chinese pronunciation dictionary rich in emotional expression | 
Wang et al. | Phoneme-level articulatory animation in pronunciation training | |
US20060009978A1 (en) | Methods and systems for synthesis of accurate visible speech via transformation of motion capture data | |
Granström et al. | Audiovisual representation of prosody in expressive speech communication | |
Naert et al. | A survey on the animation of signing avatars: From sign representation to utterance synthesis | |
CN102169642A (en) | Interactive virtual teacher system having intelligent error correction function | |
CN103218842A (en) | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method | |
CN105390133A (en) | Tibetan TTVS system realization method | |
De Martino et al. | Facial animation based on context-dependent visemes | |
Ma et al. | Accurate automatic visible speech synthesis of arbitrary 3D models based on concatenation of diviseme motion capture data | |
Yu et al. | Data-driven 3D visual pronunciation of Chinese IPA for language learning | |
CN116665275A (en) | Facial expression synthesis and interaction control method based on text-to-Chinese pinyin | |
Liu et al. | An interactive speech training system with virtual reality articulation for Mandarin-speaking hearing impaired children | |
Massaro et al. | A multilingual embodied conversational agent | |
Karpov et al. | Multimodal synthesizer for Russian and Czech sign languages and audio-visual speech | |
Ouni et al. | Training Baldi to be multilingual: A case study for an Arabic Badr | |
Yu et al. | 3D visual pronunciation of Mandarine Chinese for language learning | |
Li et al. | A novel speech-driven lip-sync model with CNN and LSTM | |
Burgos et al. | Engaging human-to-robot attention using conversational gestures and lip-synchronization | |
Busso et al. | Learning expressive human-like head motion sequences from speech | |
Gibet et al. | Toward a motor theory of sign language perception | |
Yu et al. | A realistic 3D articulatory animation system for emotional visual pronunciation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||