CN102254336A - Method and device for synthesizing face video - Google Patents
- Publication number: CN102254336A
- Application number: CN201110197873A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a method and a device for synthesizing a face video. The method comprises the following steps: building a target-person expression database containing multiple frames of face images; preprocessing the multiple frames of face images to extract the head pose and the facial expression in each frame; performing a joint similarity retrieval between the head pose and facial expression in a user-defined face image sequence and the head pose and facial expression in the multiple frames of face images, to obtain a retrieved image sequence matching the user-defined face image sequence; and applying a warping transformation and smoothing to the retrieved image sequence. With the method and device disclosed by the embodiments of the invention, a lifelike image sequence of the target person can be synthesized conveniently for any user-defined expression, and the degree of automation is high.
Description
Technical field
The present invention relates to the field of computer graphics, and in particular to a face video synthesis method and device.
Background art

In game production, film production, and virtual reality, techniques for simulating facial expressions have developed rapidly and many expression synthesis methods have been proposed, yet the synthesis of a real person's expressions still cannot meet practical demand. On the one hand, expression images produced by sample-based image deformation techniques lack realism and cannot satisfy the realism requirement of long expression sequences. On the other hand, facial motion capture can transfer an actor's expression to another character; the technique is relatively mature and has been applied in many film special effects (for example, the 3D film "Avatar"), but it is complicated to deploy and requires the actor to wear a helmet rig, which is inconvenient. Moreover, the actor often has to perform a given expression repeatedly until the quality requirement is met, which is very burdensome for the actor.

To synthesize a person's expressions realistically, the main difficulty lies in synthesizing the geometric and texture features of the face: when a person makes an expression, the facial contour changes with the underlying muscles, while ambient illumination simultaneously produces alternating highlights and shadows. The Facial Action Coding System (FACS) has been widely used in expression analysis and synthesis; it divides the face into a set of action units (AUs), specifies the muscle motions corresponding to several basic expressions, and produces different facial expressions by varying the combinations of these action units. Although this method can achieve fairly realistic facial expressions, it cannot be automated; that is, given one person's expression, it cannot map that expression onto another person while minimizing human intervention.
Summary of the invention
The purpose of the present invention is to solve at least one of the above technical deficiencies. To this end, the present invention provides a face video synthesis method and device whose advantages are that, for any user-defined expression, a realistic image sequence of the target person can be synthesized conveniently; the number of performances required of an actor is reduced; and the degree of automation is high.

According to one aspect of the present invention, a face video synthesis method is provided, comprising the steps of: building a target-person expression database containing multiple frames of face images; preprocessing the multiple frames of face images to extract the head pose and the facial expression in each frame; performing a joint similarity retrieval between the head pose and facial expression in a user-defined face image sequence and the head pose and facial expression in the multiple frames of face images, to obtain a retrieved image sequence matching the user-defined face image sequence; and applying a warping transformation and smoothing to the retrieved image sequence.

According to the face video synthesis method of the embodiment of the invention, a realistic image sequence of the target person can be synthesized conveniently for any user-defined expression.

According to one embodiment of the present invention, the step of building the target-person expression database comprises: collecting a plurality of basic-expression image sequences of the target person from multiple viewing angles and multiple head poses, wherein each of the basic-expression image sequences consists of a neutral expression and the transition from the neutral expression to the target expression.

According to the face video synthesis method of the embodiment of the invention, the target-person expression database can contain a large number of images.

According to one embodiment of the present invention, the step of preprocessing the multiple frames of face images comprises: extracting the head pose of the face in each frame of the multiple frames of face images and describing it with yaw, roll, and pitch angles; and aligning the face in each frame to a frontal standard pose, detecting feature description points, and using the feature description points to describe the facial expression.

According to the face video synthesis method of the embodiment of the invention, the head pose and the expression of the face can be described accurately.

According to one embodiment of the present invention, the feature description point detection is performed based on an active shape model.
According to one embodiment of present invention, the step of associating similarity retrieval of carrying out the expression of the posture position of people's face on the expression of the posture position of people's face in the described user-defined human face image sequence and people's face and the described multiframe facial image and people's face comprises: according to D
Pose=L (| Y
i-Y
j|)+L (| R
i-R
j|)+L (| P
i-P
j|) similarity of carrying out the posture position of people's face on the posture position of people's face in the described user-defined human face image sequence and the described multiframe facial image calculates, wherein Y represents crab angle, and R represents helix angle, and P represents side rake angle, I
iRepresent the arbitrary frame facial image in the described user-defined human face image sequence, I
jRepresent the arbitrary frame facial image in the described multiframe facial image, L (d) is the sigmoid function, is defined as
γ=ln99 wherein, T and σ are respectively average statistical and the standard deviations of variable d; According to described crab angle, described helix angle and described side rake angle, the posture position of people's face in the described user-defined human face image sequence is registered to positive attitude, basis then
Carry out the similarity of the expression of people's face on the expression of people's face in the described user-defined human face image sequence and the described multiframe facial image and calculate, wherein A
I, kIt is image I
iK feature description point behind the aligning, A
J, kIt is image I
jK feature description point behind the aligning, J is the number of unique point, w
kBe the weight of k unique point, L (d) is the sigmoid function, is defined as
The unique point range normalization that calculates is arrived [0,1]; According to D (i, j)=D
Pose(i, j)+λ D
Expression(i j) obtains the associating similarity, and wherein parameter lambda is to regulate the weight proportion of expression similarity and posture position similarity; And obtain image I in the user-defined human face image sequence according to described associating similarity
iA plurality of candidate images in described multiframe facial image.
According to people's face image synthesizing method of the embodiment of the invention, can obtain the posture position and the suitable associating similarity between the expression of people's face and obtain qualified a plurality of candidate image.
According to one embodiment of present invention, described a plurality of candidate images for described associating similarity retrieval acquisition, further calculate the associating similarity between described a plurality of candidate images, and use the Dijkstra shortest path first to obtain described retrieving images sequence to guarantee time continuity and Space Consistency.
According to people's face image synthesizing method of the embodiment of the invention, can guarantee the accuracy and the flatness of retrieving images sequence.
According to another aspect of the present invention, a face video synthesis device is provided, comprising: a target-person expression database for storing multiple frames of face images; a preprocessing module for extracting the head pose and the facial expression in each frame of face images in the target-person expression database; a joint similarity retrieval module for performing the joint similarity retrieval between the head pose and facial expression in a user-defined face image sequence and the head pose and facial expression in the multiple frames of face images, to obtain a retrieved image sequence matching the user-defined face image sequence; and a post-processing module for applying a warping transformation and smoothing to the retrieved image sequence.

According to the face video synthesis device of the embodiment of the invention, a realistic image sequence of the target person can be synthesized conveniently for any user-defined expression.

According to one embodiment of the present invention, the multiple frames of face images comprise a plurality of basic-expression image sequences of the target person collected from multiple viewing angles and multiple head poses, wherein each of the basic-expression image sequences consists of a neutral expression and the transition from the neutral expression to the target expression.

According to the face video synthesis device of the embodiment of the invention, the target-person expression database can contain a large number of images.

According to one embodiment of the present invention, the preprocessing module is further configured to: extract the head pose of the face in each frame of the multiple frames of face images and describe it with yaw, roll, and pitch angles; and align the face in each frame to the frontal standard pose and use the feature description points to describe the facial expression.

According to the face video synthesis device of the embodiment of the invention, the head pose and the expression of the face can be described accurately.

According to one embodiment of the present invention, the preprocessing module performs the feature description point detection based on an active shape model.
According to one embodiment of the present invention, the joint similarity retrieval module is further configured to: compute the pose similarity between the faces in the user-defined face image sequence and the faces in the multiple frames of face images according to

D_pose = L(|Y_i - Y_j|) + L(|R_i - R_j|) + L(|P_i - P_j|),

where Y denotes the yaw angle, R the roll angle, and P the pitch angle, I_i denotes any frame in the user-defined face image sequence, I_j denotes any frame in the multiple frames of face images, and L(d) is a sigmoid function defined as

L(d) = 1 / (1 + e^(-γ(d - T)/σ)),

where γ = ln 99 and T and σ are the statistical mean and the standard deviation of the variable d; register the head pose of the faces in the user-defined face image sequence to the frontal pose according to the yaw, roll, and pitch angles, and then compute the expression similarity between the faces in the user-defined face image sequence and the faces in the multiple frames of face images according to

D_expression = Σ_{k=1..J} w_k · L(|A_{i,k} - A_{j,k}|),

where A_{i,k} is the k-th feature description point of image I_i after alignment, A_{j,k} is the k-th feature description point of image I_j after alignment, J is the number of feature points, w_k is the weight of the k-th feature point, and L(d) is the sigmoid function defined above, which normalizes the computed feature-point distances to [0, 1]; obtain the joint similarity according to

D(i, j) = D_pose(i, j) + λ·D_expression(i, j),

where the parameter λ adjusts the relative weight of the expression similarity and the pose similarity; and obtain, according to the joint similarity, a plurality of candidate images in the multiple frames of face images for each image I_i in the user-defined face image sequence.

According to the face video synthesis device of the embodiment of the invention, a suitable joint similarity between head pose and facial expression can be obtained, yielding a qualified set of candidate images.

According to one embodiment of the present invention, for the plurality of candidate images, the joint similarity retrieval module further uses Dijkstra's shortest-path algorithm, according to the joint similarities between the candidate images, to obtain the retrieved image sequence so as to guarantee temporal continuity and spatial consistency.

According to the face video synthesis device of the embodiment of the invention, the accuracy and smoothness of the retrieved image sequence can be guaranteed.
Additional aspects and advantages of the present invention will be set forth in part in the following description; in part they will become apparent from the description, or may be learned through practice of the invention.

Description of drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic diagram of a face video synthesis method according to an embodiment of the invention;

Fig. 2 is a flowchart of a face video synthesis method according to an embodiment of the invention;

Fig. 3 is a flowchart of the joint similarity retrieval method according to an embodiment of the invention;

Fig. 4 is a schematic diagram of extracting facial feature description points based on an active shape model (ASM) according to an embodiment of the invention;

Fig. 5 is a schematic diagram of obtaining the retrieved image sequence according to an embodiment of the invention; and

Fig. 6 is a block diagram of a face video synthesis device according to an embodiment of the invention.
Embodiment
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the present invention and are not to be construed as limiting it.

In addition, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first", "second", or "third" may explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more.

Specific embodiments of the present invention are described below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of a face video synthesis method according to an embodiment of the invention. As shown in Fig. 1, the target-person expression database contains multiple frames of face images. According to the head pose and facial expression of the faces in a user-defined face image sequence, the head poses and facial expressions of the multiple frames of face images in the target-person expression database are searched to retrieve a matching image sequence, after which the image sequence is warped and smoothed.

Fig. 2 is a flowchart of a face video synthesis method according to an embodiment of the invention. As shown in Fig. 2, the face video synthesis method comprises the following steps.

Step S201: build the target-person expression database, which contains multiple frames of face images. Specifically, the target person performs a number of basic expressions under a multi-view acquisition system, repeating each basic expression several times while varying the head pose, so that expressions are captured from as many viewing angles and head poses as possible. Each basic-expression image sequence consists of a neutral expression and the transition from the neutral expression to the target expression. The basic expressions include happiness, anger, sadness, disgust, fear, surprise, and so on.

Step S202: preprocess the multiple frames of face images to extract the head pose and the facial expression in each frame. Specifically: extract the head pose of the face in each frame and describe it with yaw, roll, and pitch angles; align the face in each frame to the frontal standard pose, perform feature description point detection, and use the feature description points to describe the facial expression.
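To make the preprocessing output concrete, the per-frame descriptor can be sketched as below. The class layout, field names, and the landmark normalization are illustrative assumptions; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FrameDescriptor:
    """Pose and expression descriptor for one face frame (illustrative layout)."""
    yaw: float    # Y: rotation about the vertical axis, in degrees
    roll: float   # R: rotation about the viewing axis, in degrees
    pitch: float  # P: rotation about the horizontal axis, in degrees
    landmarks: np.ndarray  # (J, 2) ASM feature points after frontal alignment


def normalize_landmarks(points: np.ndarray) -> np.ndarray:
    """Translate landmarks to their centroid and scale to unit RMS radius,
    so that landmark distances are comparable across frames."""
    centered = points - points.mean(axis=0)
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / scale
```

A descriptor of this form is what the retrieval steps below compare, frame against frame.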
Step S203: perform the joint similarity retrieval between the head pose and facial expression in the user-defined face image sequence and the head pose and facial expression in the multiple frames of face images, to obtain a retrieved image sequence matching the user-defined face image sequence. The concrete steps of the joint similarity retrieval are described with reference to Fig. 3. Fig. 3 is a flowchart of the joint similarity retrieval method according to an embodiment of the invention. As shown in Fig. 3, the joint similarity retrieval comprises the following steps.
Step S2031: compute the pose similarity between the faces in the user-defined face image sequence and the faces in the multiple frames of face images according to

D_pose = L(|Y_i - Y_j|) + L(|R_i - R_j|) + L(|P_i - P_j|),

where Y denotes the yaw angle, R the roll angle, and P the pitch angle, I_i denotes any frame in the user-defined face image sequence, I_j denotes any frame in the multiple frames of face images, and L(d) is a sigmoid function defined as

L(d) = 1 / (1 + e^(-γ(d - T)/σ)),

where γ = ln 99 and T and σ are the statistical mean and the standard deviation of the variable d.
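A minimal sketch of the pose distance D_pose follows, assuming L is the logistic function consistent with the stated γ = ln 99 (so that L(T) = 0.5 and L(T + σ) = 0.99); the function names and the stats layout are illustrative:

```python
import math

GAMMA = math.log(99.0)  # γ = ln 99


def sigmoid_l(d: float, t: float, sigma: float) -> float:
    """Logistic mapping of a raw distance d to (0, 1); L(t) = 0.5, L(t + sigma) = 0.99."""
    return 1.0 / (1.0 + math.exp(-GAMMA * (d - t) / sigma))


def pose_distance(pose_i, pose_j, stats):
    """D_pose = L(|Yi - Yj|) + L(|Ri - Rj|) + L(|Pi - Pj|).
    pose_* are (yaw, roll, pitch) triples; stats maps each angle name to
    the (mean, std) of its absolute differences over the database."""
    total = 0.0
    for a, b, name in zip(pose_i, pose_j, ("yaw", "roll", "pitch")):
        t, sigma = stats[name]
        total += sigmoid_l(abs(a - b), t, sigma)
    return total
```

The sigmoid keeps each angle term in (0, 1), so no single angle difference can dominate the sum.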
Step S2032: register the head pose of the faces in the user-defined face image sequence to the frontal pose according to the yaw, roll, and pitch angles, and then compute the expression similarity between the faces in the user-defined face image sequence and the faces in the multiple frames of face images according to

D_expression = Σ_{k=1..J} w_k · L(|A_{i,k} - A_{j,k}|),

where A_{i,k} is the k-th feature description point of image I_i after alignment, A_{j,k} is the k-th feature description point of image I_j after alignment, J is the number of feature points, w_k is the weight of the k-th feature point, and L(d) is the sigmoid function defined above, which normalizes the computed feature-point distances to [0, 1]. The extraction of the feature description points is described with reference to Fig. 4.

Fig. 4 is a schematic diagram of extracting facial feature description points based on an active shape model (ASM) according to an embodiment of the invention. As shown in Fig. 4, the face in each frame of the target-person expression database is registered to the frontal standard pose, and active shape model feature description point detection is then performed; the plurality of detected feature description points defines the facial expression.
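Under the same assumption about the logistic form of L(d), the weighted expression distance D_expression over aligned ASM feature points can be sketched as:

```python
import numpy as np


def expression_distance(lm_i, lm_j, weights, t, sigma, gamma=np.log(99.0)):
    """D_expression = sum_k w_k * L(|A_{i,k} - A_{j,k}|).
    lm_* are (J, 2) arrays of feature points after frontal alignment;
    weights is a length-J vector of per-landmark weights w_k."""
    d = np.linalg.norm(lm_i - lm_j, axis=1)             # per-landmark distance
    l = 1.0 / (1.0 + np.exp(-gamma * (d - t) / sigma))  # normalize to (0, 1)
    return float(np.dot(weights, l))
```

Vectorizing over all J landmarks at once keeps the per-pair cost at a single pass over the landmark arrays.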
Step S2033: obtain the joint similarity of head pose and expression according to

D(i, j) = D_pose(i, j) + λ·D_expression(i, j),

where the parameter λ adjusts the relative weight of the expression similarity and the pose similarity.

Step S2034: according to the joint similarity obtained in step S2033, for each image in the user-defined face image sequence, retrieve the K nearest frames in the target-person expression database, that is, the K frames with the smallest joint similarity, so that each user image at each instant has K candidate images.
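The per-frame retrieval of the K candidates with the smallest joint similarity can be sketched as follows; the joint_distance callable stands in for D(i, j) = D_pose(i, j) + λ·D_expression(i, j), and the names are illustrative:

```python
import heapq


def top_k_candidates(user_frame, database, k, joint_distance):
    """Return the indices of the k database frames with the smallest
    joint similarity D to user_frame."""
    scored = ((joint_distance(user_frame, db_frame), idx)
              for idx, db_frame in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, scored)]
```

heapq.nsmallest avoids sorting the whole database when K is much smaller than the number of stored frames.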
Step S2035: for the K candidate images obtained in step S2034, further use Dijkstra's shortest-path algorithm, according to the joint similarities between the candidate images, to obtain the retrieved image sequence so as to guarantee temporal continuity and spatial consistency. The joint similarities between the K candidate images are computed with the same method as in the steps above. Specifically, Dijkstra's shortest-path algorithm is used to retrieve a shortest path from instant t = 1 to instant t = m, where m is the length of the user-defined face image sequence. Fig. 5 is a schematic diagram of obtaining the retrieved image sequence according to an embodiment of the invention. Before applying Dijkstra's shortest-path algorithm, a directed graph must be built. As shown in Fig. 5, the nodes of this directed graph are the retrieved candidate face images of the target person, directed edges are allowed only between candidate face images of adjacent instants, and the length (or cost) of an edge is defined as

L(C_{t,i}, C_{t+1,j}) = D(C_{t,i}, U_t) + D(C_{t+1,j}, U_{t+1}) + μ·D(C_{t,i}, C_{t+1,j}),

where C_{t,i} and C_{t+1,j} denote the i-th candidate image at instant t and the j-th candidate image at instant t+1, respectively, and U_t and U_{t+1} denote the t-th and (t+1)-th user-defined face images. This cost measure makes each selected candidate rank as high as possible (as close as possible to the corresponding image in the user-defined face image sequence) while avoiding abrupt changes between candidate images at different instants; μ is a parameter that balances the accuracy and the smoothness of the retrieved face image sequence.

Using the shortest-path algorithm, the shortest path between nodes C_{1,i} and C_{m,j} is obtained. From the permutations of the first and last nodes, there are K*K such paths in total; taking the minimum over these K^2 shortest paths yields the required retrieved image sequence.
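Because directed edges exist only between candidates of adjacent instants, the graph of Fig. 5 is a layered directed acyclic graph, so the minimum over the K*K shortest paths can be computed with a simple dynamic program that sums the edge lengths L(C_{t,i}, C_{t+1,j}) exactly as defined above; the result is equivalent to running Dijkstra's algorithm from every start node. A sketch with illustrative names:

```python
def best_candidate_path(cand_cost, edge_cost):
    """cand_cost[t][i]  : D(C_{t,i}, U_t), distance of candidate i at instant t
                          to the user frame.
    edge_cost(t, i, j)  : mu * D(C_{t,i}, C_{t+1,j}), the transition penalty.
    Each traversed edge contributes cand_cost[t][i] + cand_cost[t+1][j]
    + edge_cost(t, i, j), matching the edge length defined in the text.
    Returns the minimum-cost list of candidate indices, one per instant."""
    m, k = len(cand_cost), len(cand_cost[0])
    best = [0.0] * k                    # cheapest edge-sum of a path ending at (0, i)
    back = [[0] * k for _ in range(m)]  # back-pointers for path recovery
    for t in range(m - 1):
        new_best = []
        for j in range(k):
            costs = [best[i] + cand_cost[t][i] + cand_cost[t + 1][j]
                     + edge_cost(t, i, j) for i in range(k)]
            i_min = min(range(k), key=costs.__getitem__)
            back[t + 1][j] = i_min
            new_best.append(costs[i_min])
        best = new_best
    # trace back from the cheapest final node
    j = min(range(k), key=best.__getitem__)
    path = [j]
    for t in range(m - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]
```

The dynamic program runs in O(m·K^2) time, the same order as enumerating the edges of the layered graph once.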
Step S204: apply the warping transformation and smoothing to the retrieved image sequence. Although the retrieved image sequence obtained in step S203 preserves temporal continuity and spatial consistency to a certain extent, its facial expressions still differ from those in the user-defined images, so a warping transformation is needed to make the expressions in the retrieved image sequence better match the expressions in the images of the user-defined face image sequence. At the same time, the retrieved image sequence may exhibit video jitter, so further smoothing is needed. The concrete smoothing steps are as follows.

For the user image at instant t, the extracted facial feature description points are used to divide the face into regions such as the eyebrows, eyes, nose, and mouth, and the distances between the feature description points of the corresponding regions of U_t and C_t are computed. If such a distance exceeds a constant δ, the target person's action in that region is considered to deviate from the user's action. For example, if the mouth action does not match well, the candidate image in the target-person expression database whose mouth best matches the user's is retrieved again, and the best-matching mouth then replaces the original mouth of C_t; a Poisson-equation-based face inpainting algorithm can be used so that the substitution remains consistent with the texture of the surrounding region without producing an obvious sense of unreality.

After inpainting, the face may still jitter in the time domain, so the distances between the inpainted images at adjacent instants are computed; if jitter is detected, optical-flow optimization is applied between the images at the instants immediately before and after the jitter to achieve a smooth transition and replace the jittering frames.
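The per-region mismatch check that triggers the re-retrieval and Poisson blending can be sketched as follows; the landmark index groups and the threshold δ are illustrative assumptions, since the patent does not fix a landmark numbering:

```python
import numpy as np

# Hypothetical landmark index groups for an ASM model with 50 points.
REGIONS = {"eyebrows": range(0, 10), "eyes": range(10, 22),
           "nose": range(22, 31), "mouth": range(31, 50)}


def mismatched_regions(user_lm, cand_lm, delta):
    """Return the names of facial regions whose mean landmark distance
    between the user frame U_t and the retrieved frame C_t exceeds delta.
    Those regions are the ones re-retrieved and blended back in."""
    bad = []
    for name, idx in REGIONS.items():
        ids = list(idx)
        dist = np.linalg.norm(user_lm[ids] - cand_lm[ids], axis=1).mean()
        if dist > delta:
            bad.append(name)
    return bad
```

Checking regions independently lets a well-matched upper face keep its retrieved pixels while only the offending region, such as the mouth, is replaced.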
Fig. 6 is the block diagram of people's face video synthesizer according to an embodiment of the invention.As shown in Figure 6, people's face video synthesizer 10 comprises target person expression database 110, pretreatment module 120, associating similarity retrieval module 130 and post-processing module 140.
Particularly, target person expression database 110 is used to store the multiframe facial image.Pretreatment module 120 is used for extracting the posture position of people's face on target person expression database 110 each frame facial image and the expression of people's face.Associating similarity retrieval module 130 is used for carrying out the associating similarity retrieval of the expression of the posture position of people's face on the expression of the posture position of user-defined human face image sequence people face and people's face and the multiframe facial image and people's face, to obtain the retrieving images sequence with user-defined human face image sequence coupling.Post-processing module 140 is used for retrieving images sequence reel conversion and smoothing processing.
In one embodiment of the invention, the plurality of face images comprises a plurality of basic expression image sequences of the target person collected from a plurality of viewing angles and a plurality of head poses, wherein each of the basic expression image sequences consists of a neutral expression and the process of the expression changing from the neutral expression. Feature description points are detected based on an active shape model.
In one embodiment of the invention, the preprocessing module 120 is further configured to: extract the head pose of the face in each of the face images and describe it using a yaw angle, a roll angle, and a pitch angle; and align the face in each of the face images to a frontal canonical pose and describe the facial expression using the feature description points.
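The alignment step can be illustrated with a simple 2-D normalization of landmark points: translating the points to their centroid and scaling to unit RMS radius before expressions are compared. This is a reduced sketch; the patent additionally registers the yaw, roll, and pitch angles, and that rotation compensation is omitted here.

```python
import math

def normalize_landmarks(points):
    """Translate landmark points to their centroid and scale to unit RMS radius.

    `points` is a list of (x, y) tuples, e.g. active-shape-model landmarks.
    Rotation compensation (via the estimated yaw/roll/pitch) is omitted
    for brevity.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if scale == 0:
        return centered
    return [(x / scale, y / scale) for x, y in centered]
```

After normalization the landmark sets of two images are directly comparable point by point, which is what the expression similarity below relies on.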
In one embodiment of the invention, the joint similarity retrieval module 130 is further configured to: compute the pose similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to

D_Pose = L(|Y_i - Y_j|) + L(|R_i - R_j|) + L(|P_i - P_j|),

where Y denotes the yaw angle, R denotes the roll angle, P denotes the pitch angle, I_i denotes any frame in the user-defined face image sequence, I_j denotes any frame in the plurality of face images, and L(d) is a sigmoid function defined as

L(d) = 1 / (1 + e^(-γ(d - T)/σ)),

where γ = ln 99, and T and σ are respectively the statistical mean and the standard deviation of the variable d; register the head pose of the faces in the user-defined face image sequence to a frontal pose according to the yaw angle, the roll angle, and the pitch angle, and then compute the expression similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to

D_Expression(i, j) = Σ_{k=1..J} w_k · L(|A_{i,k} - A_{j,k}|),

where A_{i,k} is the k-th feature description point of image I_i after alignment, A_{j,k} is the k-th feature description point of image I_j after alignment, J is the number of feature points, w_k is the weight of the k-th feature point, and L(d) is the sigmoid function defined above, with the computed feature-point distances normalized to [0, 1]; obtain the joint similarity according to

D(i, j) = D_Pose(i, j) + λ · D_Expression(i, j),

where the parameter λ adjusts the relative weight of the expression similarity against the pose similarity; and obtain, according to the joint similarity, a plurality of candidate images in the plurality of face images for each image I_i of the user-defined face image sequence. For the plurality of candidate images, the joint similarity retrieval module 130 further uses the Dijkstra shortest-path algorithm on the joint similarities between the candidate images to obtain the retrieved image sequence, thereby guaranteeing temporal continuity and spatial consistency.
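The retrieval stage can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the explicit sigmoid form is an assumption consistent with γ = ln 99 (so that L(T) = 0.5 and L(T + σ) = 0.99), and because the candidate graph is layered frame by frame, the Dijkstra search over it reduces to a stage-wise dynamic program, which is what `best_path` implements.

```python
import math

GAMMA = math.log(99.0)

def sigmoid(d, mean, std):
    """Assumed form of L(d): with gamma = ln 99, L(mean) = 0.5 and
    L(mean + std) = 0.99. The exact formula image is absent from the source."""
    return 1.0 / (1.0 + math.exp(-GAMMA * (d - mean) / std))

def pose_distance(p1, p2, stats):
    """D_Pose over (yaw, roll, pitch) triples; stats[k] is the (mean, std)
    of the k-th angle difference."""
    return sum(sigmoid(abs(a - b), *stats[k])
               for k, (a, b) in enumerate(zip(p1, p2)))

def joint_distance(pose_d, expr_d, lam):
    """D(i, j) = D_Pose + lambda * D_Expression."""
    return pose_d + lam * expr_d

def best_path(layers, transition_cost):
    """Shortest path through layered candidates (the trellis special case
    of Dijkstra). layers[t] is the candidate list for frame t; returns one
    candidate index per frame."""
    costs = [0.0] * len(layers[0])
    back = []
    for t in range(1, len(layers)):
        new_costs, choices = [], []
        for cand in layers[t]:
            best = min(range(len(layers[t - 1])),
                       key=lambda i: costs[i] + transition_cost(layers[t - 1][i], cand))
            new_costs.append(costs[best] + transition_cost(layers[t - 1][best], cand))
            choices.append(best)
        costs, back = new_costs, back + [choices]
    idx = min(range(len(costs)), key=costs.__getitem__)  # cheapest end state
    path = [idx]
    for choices in reversed(back):                       # backtrack
        idx = choices[idx]
        path.append(idx)
    return list(reversed(path))
```

Using the absolute value difference between candidate frames as the transition cost, `best_path` picks the temporally smoothest sequence of candidates, mirroring the temporal-continuity constraint of the retrieval module.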
The face video synthesis method and device according to the embodiments of the invention have the following advantages: for any user-defined expression, the corresponding photorealistic image sequence of the target person can be synthesized conveniently; the number of acting sessions required of the performer is reduced; and the degree of automation is high.
In the description of this specification, reference to the terms "an embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be appreciated by those of ordinary skill in the art that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.
Claims (12)
1. A face video synthesis method, characterized by comprising the steps of:
establishing a target-person expression database, the target-person expression database comprising a plurality of face images;
preprocessing the plurality of face images to extract the head pose and the facial expression of each face image;
performing a joint similarity retrieval between the head pose and facial expression of the faces in a user-defined face image sequence and the head pose and facial expression of the faces in the plurality of face images, to obtain a retrieved image sequence matching the user-defined face image sequence; and
applying warping and smoothing to the retrieved image sequence.
2. The face video synthesis method according to claim 1, characterized in that the step of establishing the target-person expression database comprises:
collecting a plurality of basic expression image sequences of the target person from a plurality of viewing angles and a plurality of head poses, wherein each of the basic expression image sequences consists of a neutral expression and the process of the expression changing from the neutral expression.
3. The face video synthesis method according to claim 1, characterized in that the step of preprocessing the plurality of face images comprises:
extracting the head pose of the face in each of the face images and describing it using a yaw angle, a roll angle, and a pitch angle; and
aligning the face in each of the face images to a frontal canonical pose, detecting feature description points, and using the feature description points to describe the facial expression.
4. The face video synthesis method according to claim 3, characterized in that the feature description points are extracted based on an active shape model.
5. The face video synthesis method according to claim 3, characterized in that the step of performing the joint similarity retrieval between the head pose and facial expression of the faces in the user-defined face image sequence and the head pose and facial expression of the faces in the plurality of face images comprises:
computing the pose similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to
D_Pose = L(|Y_i - Y_j|) + L(|R_i - R_j|) + L(|P_i - P_j|),
where Y denotes the yaw angle, R denotes the roll angle, P denotes the pitch angle, I_i denotes any frame in the user-defined face image sequence, I_j denotes any frame in the plurality of face images, and L(d) is a sigmoid function defined as
L(d) = 1 / (1 + e^(-γ(d - T)/σ)),
where γ = ln 99, and T and σ are respectively the statistical mean and the standard deviation of the variable d;
registering the head pose of the faces in the user-defined face image sequence to a frontal pose according to the yaw angle, the roll angle, and the pitch angle, and then computing the expression similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to
D_Expression(i, j) = Σ_{k=1..J} w_k · L(|A_{i,k} - A_{j,k}|),
where A_{i,k} is the k-th feature description point of image I_i after alignment, A_{j,k} is the k-th feature description point of image I_j after alignment, J is the number of feature points, w_k is the weight of the k-th feature point, and L(d) is the sigmoid function defined above, the computed feature-point distances being normalized to [0, 1];
obtaining the joint similarity according to
D(i, j) = D_Pose(i, j) + λ · D_Expression(i, j),
where the parameter λ adjusts the relative weight of the expression similarity against the pose similarity; and
obtaining, according to the joint similarity, a plurality of candidate images in the plurality of face images for each image I_i of the user-defined face image sequence.
6. The face video synthesis method according to claim 5, characterized in that, for the plurality of candidate images obtained by the joint similarity retrieval, the joint similarities between the plurality of candidate images are further computed, and the Dijkstra shortest-path algorithm is used to obtain the retrieved image sequence so as to guarantee temporal continuity and spatial consistency.
7. A face video synthesis device, characterized by comprising:
a target-person expression database for storing a plurality of face images;
a preprocessing module for extracting the head pose and the facial expression of each face image in the target-person expression database;
a joint similarity retrieval module for performing a joint similarity retrieval between the head pose and facial expression of the faces in a user-defined face image sequence and the head pose and facial expression of the faces in the plurality of face images, to obtain a retrieved image sequence matching the user-defined face image sequence; and
a post-processing module for applying warping and smoothing to the retrieved image sequence.
8. The face video synthesis device according to claim 7, characterized in that the plurality of face images comprises a plurality of basic expression image sequences of the target person collected from a plurality of viewing angles and a plurality of head poses, wherein each of the basic expression image sequences consists of a neutral expression and the process of the expression changing from the neutral expression.
9. The face video synthesis device according to claim 7, characterized in that the preprocessing module is further configured to:
extract the head pose of the face in each of the face images and describe it using a yaw angle, a roll angle, and a pitch angle; and
align the face in each of the face images to a frontal canonical pose, detect feature description points, and use the feature description points to describe the facial expression.
10. The face video synthesis device according to claim 9, characterized in that the preprocessing module performs the feature description point detection based on an active shape model.
11. The face video synthesis device according to claim 9, characterized in that the joint similarity retrieval module is further configured to:
compute the pose similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to
D_Pose = L(|Y_i - Y_j|) + L(|R_i - R_j|) + L(|P_i - P_j|),
where Y denotes the yaw angle, R denotes the roll angle, P denotes the pitch angle, I_i denotes any frame in the user-defined face image sequence, I_j denotes any frame in the plurality of face images, and L(d) is a sigmoid function defined as
L(d) = 1 / (1 + e^(-γ(d - T)/σ)),
where γ = ln 99, and T and σ are respectively the statistical mean and the standard deviation of the variable d;
register the head pose of the faces in the user-defined face image sequence to a frontal pose according to the yaw angle, the roll angle, and the pitch angle, and then compute the expression similarity between a face in the user-defined face image sequence and a face in the plurality of face images according to
D_Expression(i, j) = Σ_{k=1..J} w_k · L(|A_{i,k} - A_{j,k}|),
where A_{i,k} is the k-th feature description point of image I_i after alignment, A_{j,k} is the k-th feature description point of image I_j after alignment, J is the number of feature points, w_k is the weight of the k-th feature point, and L(d) is the sigmoid function defined above, the computed feature-point distances being normalized to [0, 1];
obtain the joint similarity according to
D(i, j) = D_Pose(i, j) + λ · D_Expression(i, j),
where the parameter λ adjusts the relative weight of the expression similarity against the pose similarity; and
obtain, according to the joint similarity, a plurality of candidate images in the plurality of face images for each image I_i of the user-defined face image sequence.
12. The face video synthesis device according to claim 11, characterized in that, for the plurality of candidate images, the joint similarity retrieval module further uses the Dijkstra shortest-path algorithm on the joint similarities between the plurality of candidate images to obtain the retrieved image sequence so as to guarantee temporal continuity and spatial consistency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110197873 CN102254336B (en) | 2011-07-14 | 2011-07-14 | Method and device for synthesizing face video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110197873 CN102254336B (en) | 2011-07-14 | 2011-07-14 | Method and device for synthesizing face video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102254336A true CN102254336A (en) | 2011-11-23 |
CN102254336B CN102254336B (en) | 2013-01-16 |
Family
ID=44981577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110197873 Active CN102254336B (en) | 2011-07-14 | 2011-07-14 | Method and device for synthesizing face video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102254336B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104679832A (en) * | 2015-02-05 | 2015-06-03 | 四川长虹电器股份有限公司 | System and method for searching single or multi-body combined picture based on face recognition |
WO2015090147A1 (en) * | 2013-12-20 | 2015-06-25 | 百度在线网络技术(北京)有限公司 | Virtual video call method and terminal |
CN105190700A (en) * | 2013-06-04 | 2015-12-23 | 英特尔公司 | Avatar-based video encoding |
CN106303233A (en) * | 2016-08-08 | 2017-01-04 | 西安电子科技大学 | A kind of video method for secret protection merged based on expression |
WO2017035966A1 (en) * | 2015-08-28 | 2017-03-09 | 百度在线网络技术(北京)有限公司 | Method and device for processing facial image |
CN108510435A (en) * | 2018-03-28 | 2018-09-07 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109670386A (en) * | 2017-10-16 | 2019-04-23 | 深圳泰首智能技术有限公司 | Face identification method and terminal |
CN109784123A (en) * | 2017-11-10 | 2019-05-21 | 浙江思考者科技有限公司 | The analysis and judgment method of real's expression shape change |
CN109886091A (en) * | 2019-01-08 | 2019-06-14 | 东南大学 | Three-dimensional face expression recognition methods based on Weight part curl mode |
CN109993102A (en) * | 2019-03-28 | 2019-07-09 | 北京达佳互联信息技术有限公司 | Similar face retrieval method, apparatus and storage medium |
CN110675433A (en) * | 2019-10-31 | 2020-01-10 | 北京达佳互联信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
CN111274447A (en) * | 2020-01-13 | 2020-06-12 | 深圳壹账通智能科技有限公司 | Target expression generation method, device, medium and electronic equipment based on video |
CN111652121A (en) * | 2020-06-01 | 2020-09-11 | 腾讯科技(深圳)有限公司 | Training method of expression migration model, and expression migration method and device |
CN113269872A (en) * | 2021-06-01 | 2021-08-17 | 广东工业大学 | Synthetic video generation method based on three-dimensional face reconstruction and video key frame optimization |
CN113436302A (en) * | 2021-06-08 | 2021-09-24 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Face animation synthesis method and system |
WO2022016996A1 (en) * | 2020-07-22 | 2022-01-27 | 平安科技(深圳)有限公司 | Image processing method, device, electronic apparatus, and computer readable storage medium |
US11295502B2 (en) | 2014-12-23 | 2022-04-05 | Intel Corporation | Augmented facial animation |
US11303850B2 (en) | 2012-04-09 | 2022-04-12 | Intel Corporation | Communication using interactive avatars |
CN114429611A (en) * | 2022-04-06 | 2022-05-03 | 北京达佳互联信息技术有限公司 | Video synthesis method and device, electronic equipment and storage medium |
US11887231B2 (en) | 2015-12-18 | 2024-01-30 | Tahoe Research, Ltd. | Avatar animation system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1920880A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Video flow based people face expression fantasy method |
CN1920886A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Video flow based three-dimensional dynamic human face expression model construction method |
CN101179665A (en) * | 2007-11-02 | 2008-05-14 | 腾讯科技(深圳)有限公司 | Method and device for transmitting face synthesized video |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1920880A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Video flow based people face expression fantasy method |
CN1920886A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Video flow based three-dimensional dynamic human face expression model construction method |
CN101179665A (en) * | 2007-11-02 | 2008-05-14 | 腾讯科技(深圳)有限公司 | Method and device for transmitting face synthesized video |
Non-Patent Citations (3)
Title |
---|
Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, and Steven M. Seitz, "Being John Malkovich," European Conference on Computer Vision, 2010-09-30, pp. 1-13. * |
Qingshan Zhang, et al., "Geometry-Driven Photorealistic Facial Expression Synthesis," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 1, 2006-02-28, pp. 48-60. * |
Lei Linhua, "Virtual Face Synthesis Based on Video Sequences," China Master's Theses Full-text Database, Information Science and Technology, no. 1, 2008-01-15. * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11303850B2 (en) | 2012-04-09 | 2022-04-12 | Intel Corporation | Communication using interactive avatars |
CN105190700A (en) * | 2013-06-04 | 2015-12-23 | 英特尔公司 | Avatar-based video encoding |
WO2015090147A1 (en) * | 2013-12-20 | 2015-06-25 | 百度在线网络技术(北京)有限公司 | Virtual video call method and terminal |
US11295502B2 (en) | 2014-12-23 | 2022-04-05 | Intel Corporation | Augmented facial animation |
CN104679832A (en) * | 2015-02-05 | 2015-06-03 | 四川长虹电器股份有限公司 | System and method for searching single or multi-body combined picture based on face recognition |
US10599914B2 (en) | 2015-08-28 | 2020-03-24 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for human face image processing |
WO2017035966A1 (en) * | 2015-08-28 | 2017-03-09 | 百度在线网络技术(北京)有限公司 | Method and device for processing facial image |
US11887231B2 (en) | 2015-12-18 | 2024-01-30 | Tahoe Research, Ltd. | Avatar animation system |
CN106303233A (en) * | 2016-08-08 | 2017-01-04 | 西安电子科技大学 | A kind of video method for secret protection merged based on expression |
CN109670386A (en) * | 2017-10-16 | 2019-04-23 | 深圳泰首智能技术有限公司 | Face identification method and terminal |
CN109784123A (en) * | 2017-11-10 | 2019-05-21 | 浙江思考者科技有限公司 | The analysis and judgment method of real's expression shape change |
CN108510435A (en) * | 2018-03-28 | 2018-09-07 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109886091B (en) * | 2019-01-08 | 2021-06-01 | 东南大学 | Three-dimensional facial expression recognition method based on weighted local rotation mode |
CN109886091A (en) * | 2019-01-08 | 2019-06-14 | 东南大学 | Three-dimensional face expression recognition methods based on Weight part curl mode |
CN109993102A (en) * | 2019-03-28 | 2019-07-09 | 北京达佳互联信息技术有限公司 | Similar face retrieval method, apparatus and storage medium |
CN109993102B (en) * | 2019-03-28 | 2021-09-17 | 北京达佳互联信息技术有限公司 | Similar face retrieval method, device and storage medium |
CN110675433A (en) * | 2019-10-31 | 2020-01-10 | 北京达佳互联信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
US11450027B2 (en) | 2019-10-31 | 2022-09-20 | Beijing Dajia Internet Information Technologys Co., Ltd. | Method and electronic device for processing videos |
CN111274447A (en) * | 2020-01-13 | 2020-06-12 | 深圳壹账通智能科技有限公司 | Target expression generation method, device, medium and electronic equipment based on video |
JP7482242B2 (en) | 2020-06-01 | 2024-05-13 | テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド | Facial expression transfer model training method, facial expression transfer method and device, computer device and program |
WO2021244217A1 (en) * | 2020-06-01 | 2021-12-09 | 腾讯科技(深圳)有限公司 | Method for training expression transfer model, and expression transfer method and apparatus |
JP2023517211A (en) * | 2020-06-01 | 2023-04-24 | テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド | Facial expression transfer model training method, facial expression transfer method and device, computer device and program |
CN111652121A (en) * | 2020-06-01 | 2020-09-11 | 腾讯科技(深圳)有限公司 | Training method of expression migration model, and expression migration method and device |
CN111652121B (en) * | 2020-06-01 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Training method of expression migration model, and method and device for expression migration |
WO2022016996A1 (en) * | 2020-07-22 | 2022-01-27 | 平安科技(深圳)有限公司 | Image processing method, device, electronic apparatus, and computer readable storage medium |
CN113269872A (en) * | 2021-06-01 | 2021-08-17 | 广东工业大学 | Synthetic video generation method based on three-dimensional face reconstruction and video key frame optimization |
CN113436302B (en) * | 2021-06-08 | 2024-02-13 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Face animation synthesis method and system |
CN113436302A (en) * | 2021-06-08 | 2021-09-24 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Face animation synthesis method and system |
CN114429611B (en) * | 2022-04-06 | 2022-07-08 | 北京达佳互联信息技术有限公司 | Video synthesis method and device, electronic equipment and storage medium |
CN114429611A (en) * | 2022-04-06 | 2022-05-03 | 北京达佳互联信息技术有限公司 | Video synthesis method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102254336B (en) | 2013-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102254336B (en) | Method and device for synthesizing face video | |
CN111652828B (en) | Face image generation method, device, equipment and medium | |
Ge et al. | 3d convolutional neural networks for efficient and robust hand pose estimation from single depth images | |
US11238568B2 (en) | Method and system for reconstructing obstructed face portions for virtual reality environment | |
Liu et al. | Fusedream: Training-free text-to-image generation with improved clip+ gan space optimization | |
Chan et al. | Everybody dance now | |
US10846903B2 (en) | Single shot capture to animated VR avatar | |
Toshev et al. | Deeppose: Human pose estimation via deep neural networks | |
Shi et al. | Automatic acquisition of high-fidelity facial performances using monocular videos | |
Hassner | Viewing real-world faces in 3D | |
US20180227482A1 (en) | Scene-aware selection of filters and effects for visual digital media content | |
CN106600626B (en) | Three-dimensional human motion capture method and system | |
CN102567716B (en) | Face synthetic system and implementation method | |
CN110688948B (en) | Method and device for transforming gender of human face in video, electronic equipment and storage medium | |
CN104850825A (en) | Facial image face score calculating method based on convolutional neural network | |
Xiao et al. | Joint affinity propagation for multiple view segmentation | |
CN111950430B (en) | Multi-scale dressing style difference measurement and migration method and system based on color textures | |
CN105118023A (en) | Real-time video human face cartoonlization generating method based on human facial feature points | |
CN115083015B (en) | 3D human body posture estimation data labeling mode and corresponding model construction method | |
CN110232727A (en) | A kind of continuous posture movement assessment intelligent algorithm | |
Seddik et al. | Unsupervised facial expressions recognition and avatar reconstruction from Kinect | |
Zhang et al. | Styleavatar3d: Leveraging image-text diffusion models for high-fidelity 3d avatar generation | |
Lu et al. | Multi-task learning for single image depth estimation and segmentation based on unsupervised network | |
Li et al. | Everyone is a cartoonist: Selfie cartoonization with attentive adversarial networks | |
Li et al. | Ecnet: Effective controllable text-to-image diffusion models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |