CN110516598A - Method and apparatus for generating image - Google Patents


Info

Publication number
CN110516598A
CN110516598A
Authority
CN
China
Prior art keywords
image
facial
facial image
target
replaced
Prior art date
Legal status
Granted
Application number
CN201910797619.2A
Other languages
Chinese (zh)
Other versions
CN110516598B (en)
Inventor
胡天舒
张世昌
洪智滨
韩钧宇
刘经拓
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910797619.2A (granted as CN110516598B)
Publication of CN110516598A
Application granted
Publication of CN110516598B
Active legal status
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Abstract

Embodiments of the disclosure disclose a method and apparatus for generating an image. One specific embodiment of the method includes: acquiring a base image and a target facial image, where the base image includes a facial image to be replaced and a background; determining a matching facial image from a preset facial image library matching the facial image to be replaced, where the preset facial image library includes facial images of different facial poses of the face indicated by the facial image to be replaced, and the matching facial image shows that face in a facial pose consistent with the facial pose shown by the target facial image; and generating a target image based on replacing the facial image to be replaced with the matching facial image, where the target image includes a facial image consistent with the matching facial image and a background consistent with the base image. This embodiment improves the speed of generating an image whose facial pose is consistent with that of the target facial image.

Description

Method and apparatus for generating image
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating an image.
Background technique
With the rapid development of artificial intelligence technology, face-related interactive functions have gradually been added to applications such as short video and live streaming, so that a preset face template (for example, a cartoon character) can be driven by changes in a user's facial expression to produce an approximate expression.
A related approach is usually to deform the preset face template by detecting changes of facial key points, thereby reproducing on the face template an expression, mouth shape, and the like consistent with the user's facial pose.
Summary of the invention
Embodiments of the disclosure propose a method and apparatus for generating an image.
In a first aspect, an embodiment of the disclosure provides a method for generating an image, the method including: acquiring a base image and a target facial image, where the base image includes a facial image to be replaced and a background; determining a matching facial image from a preset facial image library matching the facial image to be replaced, where the preset facial image library includes facial images of different facial poses of the face indicated by the facial image to be replaced, and the matching facial image shows that face in a facial pose consistent with the facial pose shown by the target facial image; and generating a target image based on replacing the facial image to be replaced with the matching facial image, where the target image includes a facial image consistent with the matching facial image and a background consistent with the base image.
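The acquire/retrieve/replace flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `pose_embedding` is a hypothetical stand-in for a key-point-based feature extractor, the library is a plain list of face crops, and "replacement" is a simple paste into the region to be replaced.

```python
import numpy as np

def pose_embedding(face: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a facial key-point feature vector.
    Here it is just the per-channel mean; a real system would use
    detected key points, expression and Euler-angle features."""
    return face.mean(axis=(0, 1))

def generate_target_image(base_image, face_box, target_face, library):
    """Replace the face inside `face_box` of `base_image` with the
    library image whose pose best matches `target_face`."""
    y0, y1, x0, x1 = face_box
    query = pose_embedding(target_face)
    # Retrieval step: nearest neighbour in the preset facial image library.
    dists = [np.linalg.norm(pose_embedding(img) - query) for img in library]
    match = library[int(np.argmin(dists))]
    # Replacement step: paste the matching face over the region to be
    # replaced; the background of the base image is left untouched.
    out = base_image.copy()
    out[y0:y1, x0:x1] = match
    return out, match
```

In a full implementation the paste would be replaced by the alignment, triangulation and rendering steps described later in the disclosure.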
In some embodiments, acquiring the base image and the target facial image includes: acquiring a first video shot for a first user and a second video shot for a second user; extracting, from the first video, a video frame including a facial image of the first user as the base image; extracting, from the second video, a video frame including a facial image of the second user; and extracting the facial image of the second user from that video frame as the target facial image. After generating the target image based on replacing the facial image to be replaced with the matching facial image, the method further includes: generating a target video based on the target image, where the facial pose of the second user shown in the target video matches the facial pose of the first user shown in the first video.
In some embodiments, the preset facial image library is obtained as follows: acquiring a reference facial image library, where the reference facial image library includes images showing different facial poses of a reference face; inputting the images in the reference facial image library into a pre-trained image generation model to generate matching reference facial images, where the image generation model includes an encoding network, hidden-layer networks and a decoding network, and the facial pose shown by a matching reference facial image is consistent with the facial pose shown by the input image; and generating the preset facial image library based on the matching reference facial images.
In some embodiments, the hidden-layer networks include a first hidden-layer network and a second hidden-layer network, and the image generation model includes a first image generation submodel and a second image generation submodel. The first image generation submodel includes the encoding network, the first hidden-layer network, the second hidden-layer network and the decoding network; the second image generation submodel includes the encoding network, the decoding network and a target hidden-layer network, the target hidden-layer network being one of the first hidden-layer network and the second hidden-layer network.
In some embodiments, the image generation model is trained as follows: acquiring a sample reference facial image set and a sample facial image set, where the sample reference facial image set is a subset of the reference facial image library; applying image preprocessing transformations to the sample reference facial image set and the sample facial image set to generate a sample preprocessed reference facial image set and a sample preprocessed facial image set; and using the sample preprocessed reference facial images and the sample preprocessed facial images as inputs of the first image generation submodel and the second image generation submodel respectively, using the sample reference facial images and sample facial images corresponding to the inputs as expected outputs of the first image generation submodel and the second image generation submodel respectively, and training to obtain the image generation model.
In some embodiments, generating the target image based on replacing the facial image to be replaced with the matching facial image includes: performing face alignment between the matching facial image and the facial image to be replaced; performing triangulation based on the aligned matching facial image and the facial image to be replaced; performing replacement according to the correspondence between the triangular regions, obtained by the triangulation, in the aligned matching facial image and in the facial image to be replaced, to generate a quasi-target image; extracting the contour of the facial image from the quasi-target image; generating a mask according to the contour of the facial image; generating color distribution information of the facial image according to the mask and the quasi-target image; and rendering the facial image according to the color distribution information to generate the target image.
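The mask and color-rendering steps above can be sketched as follows, under stated simplifications: the contour is reduced to an axis-aligned rectangle (a real system would use a face contour), and "rendering according to color distribution information" is realized as a per-channel mean/standard-deviation transfer, which is one possible interpretation, not necessarily the patent's.

```python
import numpy as np

def mask_from_contour(shape, contour):
    """Build a binary mask from a simplified rectangular contour
    (y0, y1, x0, x1); a real contour would be a face outline."""
    y0, y1, x0, x1 = contour
    mask = np.zeros(shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def color_distribution(image, mask):
    """Per-channel mean and standard deviation of pixels inside the mask,
    i.e. the 'color distribution information' of the face region."""
    pixels = image[mask]
    return pixels.mean(axis=0), pixels.std(axis=0) + 1e-8

def render(quasi_target, mask, ref_mean, ref_std):
    """Shift the masked face region toward a reference color
    distribution (a simple mean/std transfer)."""
    out = quasi_target.astype(float).copy()
    mean, std = color_distribution(out, mask)
    out[mask] = (out[mask] - mean) / std * ref_std + ref_mean
    return out
```

Pixels outside the mask (the background of the base image) are left unchanged, which matches the requirement that the target image keep a background consistent with the base image.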
In a second aspect, an embodiment of the disclosure provides an apparatus for generating an image, the apparatus including: an acquisition unit configured to acquire a base image and a target facial image, where the base image includes a facial image to be replaced and a background; a determination unit configured to determine a matching facial image from a preset facial image library matching the facial image to be replaced, where the preset facial image library includes facial images of different facial poses of the face indicated by the facial image to be replaced, and the matching facial image shows that face in a facial pose consistent with the facial pose shown by the target facial image; and a first generation unit configured to generate a target image based on replacing the facial image to be replaced with the matching facial image, where the target image includes a facial image consistent with the matching facial image and a background consistent with the base image.
In some embodiments, the acquisition unit includes: an acquisition module configured to acquire a first video shot for a first user and a second video shot for a second user; a first extraction module configured to extract, from the first video, a video frame including a facial image of the first user as the base image; a second extraction module configured to extract, from the second video, a video frame including a facial image of the second user; and a third extraction module configured to extract the facial image of the second user from that video frame as the target facial image. The apparatus further includes: a second generation unit configured to generate a target video based on the target image, where the facial pose of the second user shown in the target video matches the facial pose of the first user shown in the first video.
In some embodiments, the preset facial image library is obtained as follows: acquiring a reference facial image library, where the reference facial image library includes images showing different facial poses of a reference face; inputting the images in the reference facial image library into a pre-trained image generation model to generate matching reference facial images, where the image generation model includes an encoding network, hidden-layer networks and a decoding network, and the facial pose shown by a matching reference facial image is consistent with the facial pose shown by the input image; and generating the preset facial image library based on the matching reference facial images.
In some embodiments, the hidden-layer networks include a first hidden-layer network and a second hidden-layer network, and the image generation model includes a first image generation submodel and a second image generation submodel. The first image generation submodel includes the encoding network, the first hidden-layer network, the second hidden-layer network and the decoding network; the second image generation submodel includes the encoding network, the decoding network and a target hidden-layer network, the target hidden-layer network being one of the first hidden-layer network and the second hidden-layer network.
In some embodiments, the image generation model is trained as follows: acquiring a sample reference facial image set and a sample facial image set, where the sample reference facial image set is a subset of the reference facial image library; applying image preprocessing transformations to the sample reference facial image set and the sample facial image set to generate a sample preprocessed reference facial image set and a sample preprocessed facial image set; and using the sample preprocessed reference facial images and the sample preprocessed facial images as inputs of the first image generation submodel and the second image generation submodel respectively, using the sample reference facial images and sample facial images corresponding to the inputs as expected outputs of the first image generation submodel and the second image generation submodel respectively, and training to obtain the image generation model.
In some embodiments, the first generation unit includes: an alignment module configured to perform face alignment between the matching facial image and the facial image to be replaced; a triangulation module configured to perform triangulation based on the aligned matching facial image and the facial image to be replaced; a first generation module configured to perform replacement according to the correspondence between the triangular regions, obtained by the triangulation, in the aligned matching facial image and in the facial image to be replaced, to generate a quasi-target image; a fourth extraction module configured to extract the contour of the facial image from the quasi-target image; a second generation module configured to generate a mask according to the contour of the facial image; a third generation module configured to generate color distribution information of the facial image according to the mask and the quasi-target image; and a fourth generation module configured to render the facial image according to the color distribution information to generate the target image.
In a third aspect, an embodiment of the disclosure provides an electronic device including: one or more processors; and a storage apparatus storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the disclosure provides a computer-readable medium storing a computer program, where the program, when executed by a processor, implements the method described in any implementation of the first aspect.
According to the method and apparatus for generating an image provided by embodiments of the disclosure, a base image and a target facial image are first acquired, where the base image includes a facial image to be replaced and a background. A matching facial image is then determined from a preset facial image library matching the facial image to be replaced, where the preset facial image library includes facial images of different facial poses of the face indicated by the facial image to be replaced, and the matching facial image shows that face in a facial pose consistent with the facial pose shown by the target facial image. A target image is then generated based on replacing the facial image to be replaced with the matching facial image, where the target image includes a facial image consistent with the matching facial image and a background consistent with the base image. This improves the speed of generating an image whose facial pose is consistent with that of the target facial image.
Detailed description of the invention
Other features, objects and advantages of the disclosure will become more apparent from the following detailed description of non-restrictive embodiments, read in conjunction with the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of an embodiment of the method for generating an image according to the disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for generating an image according to an embodiment of the disclosure;
Fig. 4 is a flowchart of another embodiment of the method for generating an image according to the disclosure;
Fig. 5 is a structural schematic diagram of an embodiment of the apparatus for generating an image according to the disclosure;
Fig. 6 is a structural schematic diagram of an electronic device adapted to implement embodiments of the disclosure.
Specific embodiment
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It may be understood that the specific embodiments described here are only used to explain the related invention, rather than to limit the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the accompanying drawings.
It should be noted that, in the absence of conflict, the embodiments in the disclosure and the features in the embodiments may be combined with each other. The disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary architecture 100 to which the method for generating an image or the apparatus for generating an image of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104, and a server 105. The network 104 serves as a medium providing a communication link between the terminal devices 101, 102 and 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102 and 103 interact with the server 105 via the network 104 to receive or send messages. Various communication client applications, such as web browser applications, search applications, instant messaging tools, email clients, social platform software, image processing applications and video editing applications, may be installed on the terminal devices 101, 102 and 103.
The terminal devices 101, 102 and 103 may be hardware or software. When the terminal devices are hardware, they may be various electronic devices having a display screen and supporting image processing, including but not limited to smartphones, tablet computers, laptop portable computers and desktop computers. When the terminal devices are software, they may be installed in the electronic devices listed above, and may be implemented as multiple software pieces or software modules (for example, for providing distributed services) or as a single software piece or software module. Specific limitations are not given here.
The server 105 may be a server providing various services, for example, a backend server providing support for the image processing applications on the terminal devices 101, 102 and 103. The backend server may process a received image and feed back the processing result (for example, a processed image) to the terminal device.
It should be noted that the image may alternatively be stored locally on the server 105, and the server 105 may directly extract and process the locally stored image; in this case, the terminal devices 101, 102 and 103 and the network 104 may be absent.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple software pieces or software modules (for example, for providing distributed services), or as a single software piece or software module. Specific limitations are not given here.
It should be noted that the method for generating an image provided by embodiments of the disclosure is generally executed by the server 105; correspondingly, the apparatus for generating an image is generally provided in the server 105. Optionally, the method for generating an image provided by embodiments of the disclosure may alternatively be executed directly by the terminal devices 101, 102 and 103; correspondingly, the apparatus for generating an image may also be provided in the terminal devices 101, 102 and 103.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided as needed.
Continuing to refer to Fig. 2, a flow 200 of an embodiment of the method for generating an image according to the disclosure is shown. The method for generating an image includes the following steps:
Step 201: acquiring a base image and a target facial image.
In this embodiment, the execution body of the method for generating an image (for example, the server 105 shown in Fig. 1) may acquire the base image and the target facial image through a wired or wireless connection. The base image may include a facial image to be replaced and a background. The background may include the image portion of the base image, other than the facial image to be replaced, determined using image matting. The target facial image may be any facial image specified in advance according to actual application requirements, or may be determined according to a rule, for example a facial image uploaded by a user terminal.
As an example, the execution body may acquire a base image and a target facial image pre-stored locally. As another example, the execution body may also acquire a base image and a target facial image sent by an electronic device (for example, a terminal device shown in Fig. 1) in communication connection with it.
Step 202: determining a matching facial image from a preset facial image library matching the facial image to be replaced.
In this embodiment, the execution body may first acquire the preset facial image library matching the facial image to be replaced. The preset facial image library may include facial images of various facial poses of different faces. A facial pose may include, but is not limited to, at least one of: expression, mouth shape, or attitude angle (Euler angles). Optionally, different faces may correspond to different preset facial image libraries, so that the faces shown by the facial images in one preset facial image library all correspond to the same person. The correspondence between a preset facial image library and the corresponding face may take many forms, such as a mapping table or the matching of an identifier (for example, an ID or a feature vector). The preset facial image library may generally include an index constructed from a feature representation (embedding) of each facial image. Thus, the execution body may acquire the preset facial image library matching the facial image to be replaced. It may be appreciated that the preset facial image library matching the facial image to be replaced may include facial images of different facial poses of the face indicated by the facial image to be replaced.
It should be noted that, in order to improve the accuracy of facial image matching and guarantee the training effect, the number of images in the preset facial image library is usually large; for example, the number of images in each library may be not less than 6000.
In this embodiment, the execution body may further determine the matching facial image from the preset facial image library matching the facial image to be replaced. The matching facial image shows the face indicated by the facial image to be replaced in a facial pose consistent with the facial pose shown by the target facial image. As an example, the execution body may extract facial key points from the target facial image to generate a target facial image feature vector, and then search, according to the target facial image feature vector, the preset facial image library matching the facial image to be replaced using various image retrieval methods. The execution body may then determine the best matching image retrieved (for example, with the smallest distance or the largest similarity) as the matching facial image. The image retrieval method may be, for example, an approximate nearest neighbor method or a multidimensional indexing method (MIM).
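The embedding-index retrieval described above can be sketched as follows. This is a brute-force cosine-similarity search over a precomputed index, standing in for the approximate nearest neighbor or multidimensional indexing methods the disclosure mentions; the embedding vectors themselves are assumed to come from an upstream key-point extractor.

```python
import numpy as np

def build_index(library_embeddings):
    """Precompute an L2-normalised embedding matrix as the library index
    (the 'feature representation index' of the preset library)."""
    e = np.asarray(library_embeddings, dtype=float)
    return e / np.linalg.norm(e, axis=1, keepdims=True)

def retrieve(index, query_embedding):
    """Return the library position with the highest cosine similarity to
    the target facial image's feature vector, and that similarity."""
    q = np.asarray(query_embedding, dtype=float)
    q = q / np.linalg.norm(q)
    sims = index @ q
    return int(np.argmax(sims)), float(sims.max())
```

For libraries of several thousand images, as suggested above, brute force is still fast; approximate methods matter mainly when many per-face libraries are searched at video frame rates.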
In some optional implementations of this embodiment, the preset facial image library may be obtained as follows:
First, a reference facial image library is acquired.
In these implementations, the execution body may acquire the reference facial image library locally or from an electronic device in communication connection (for example, a database server). The reference facial image library may include images showing different facial poses of a reference face. Generally, the faces shown by the facial images in the reference facial image library all correspond to the same person.
It should be noted that, in order to ensure the completeness of the preset facial image library, the number of facial images in the reference facial image library is usually large, for example not less than 6000. The reference facial image library may generally include an index constructed from the feature representation of each facial image.
Second, the images in the reference facial image library are input into a pre-trained image generation model to generate matching reference facial images.
In these implementations, the image generation model may include an encoding network, hidden-layer networks and a decoding network. The facial pose shown by a matching reference facial image is consistent with the facial pose shown by the input image.
As an example, the image generation model may be an autoencoder trained in advance using a machine learning method. The image generation model may be used to characterize the correspondence between matching reference facial images and the images in the reference facial image library. Thus, the execution body may input the images in the reference facial image library into the pre-trained image generation model to generate the matching reference facial images.
In these implementations, the hidden-layer networks may include a first hidden-layer network and a second hidden-layer network. The image generation model may include a first image generation submodel and a second image generation submodel. The first image generation submodel may include the encoding network (encoder), the first hidden-layer network, the second hidden-layer network and the decoding network (decoder). The second image generation submodel includes the encoding network, the decoding network and a target hidden-layer network, where the target hidden-layer network may be one of the first hidden-layer network and the second hidden-layer network.
Optionally, the first hidden-layer network and the second hidden-layer network may have the same network structure but usually have different network parameters.
Optionally, the first hidden-layer network and the second hidden-layer network may be connected in parallel between the encoding network and the decoding network.
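The shape relationships of the two submodels can be sketched as follows, using plain linear maps as hypothetical stand-ins for the four networks (the disclosure does not fix their internal structure, so the dimensions and linearity here are assumptions). The key point illustrated is how the second submodel keeps the decoder input dimension consistent by replicating and concatenating the single hidden-layer output.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_code, d_hid = 64, 16, 8

# Hypothetical linear stand-ins for the four networks.
W_enc = rng.standard_normal((d_in, d_code))     # shared encoding network
W_h1 = rng.standard_normal((d_code, d_hid))     # first hidden-layer network
W_h2 = rng.standard_normal((d_code, d_hid))     # second hidden-layer network
W_dec = rng.standard_normal((2 * d_hid, d_in))  # shared decoding network

def first_submodel(x):
    """Encoder -> both hidden-layer networks in parallel -> concatenate -> decoder."""
    c1 = x @ W_enc                          # first encoding
    c2, c3 = c1 @ W_h1, c1 @ W_h2           # second and third encodings
    c4 = np.concatenate([c2, c3], axis=-1)  # fourth encoding
    return c4 @ W_dec                       # first reconstructed image

def second_submodel(x, target_W=W_h1):
    """Encoder -> one target hidden-layer network; its output is replicated
    and concatenated so the decoder input keeps the same dimension."""
    c5 = x @ W_enc                          # fifth encoding
    c6 = c5 @ target_W                      # sixth encoding
    c7 = np.concatenate([c6, c6], axis=-1)  # seventh encoding (replicated)
    return c7 @ W_dec                       # second reconstructed image
```

Because the seventh encoding has the same dimension as the fourth, the two branches can share one decoding network, which is what lets the trained model transfer poses between the two faces.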
Optionally, based on the above optional implementations, the image generation model may be trained as follows:
First, a sample reference facial image set and a sample facial image set are acquired.
In these implementations, the execution body of the training steps may acquire the sample reference facial image set and the sample facial image set locally or from an electronic device in communication connection. The sample reference facial image set may be a subset of the reference facial image library. To improve the training effect of the model, the numbers of images in the sample reference facial image set and the sample facial image set are usually large; for example, the number of images in each set may be not less than 700.
It should be noted that the sizes of the sample reference facial images and the sample facial images are usually consistent, for example 128×128 pixels.
Second, image preprocessing transformations are applied to the sample reference facial image set and the sample facial image set to generate a sample preprocessed reference facial image set and a sample preprocessed facial image set.
In these implementations, the execution body may apply image preprocessing transformations to the images in the sample reference facial image set and the sample facial image set acquired in the first step. The image preprocessing transformations may include various operations that fine-tune an image, such as image warping, or adjusting brightness and contrast. Thus, a sample preprocessed reference facial image set and a sample preprocessed facial image set corresponding to the sample reference facial image set and the sample facial image set can be generated.
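A minimal sketch of such a preprocessing transformation is shown below, covering only the brightness/contrast adjustments mentioned above (the image-warping transform is omitted); the parameter values are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def preprocess(image, brightness=0.05, contrast=1.1, rng=None):
    """Light augmentation: scale contrast around the image mean and add a
    small random brightness shift, keeping values in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    shift = rng.uniform(-brightness, brightness)
    mean = image.mean()
    out = (image - mean) * contrast + mean + shift
    return np.clip(out, 0.0, 1.0)
```

Applying mild random transforms like this to every sample image gives the autoencoder slightly perturbed inputs while keeping the untouched originals as expected outputs, which is the pairing the training step below relies on.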
Third step, it is raw using sample preprocessing benchmark face image and sample preprocessing facial image as the first image The input that submodel is generated at submodel and the second image, will sample benchmark face image corresponding with input and sample face figure Desired output as generating submodel and the second image generation submodel respectively as the first image, training obtain image and generate mould Type.
Specifically, the executing subject of above-mentioned training step can be trained in accordance with the following steps:
S1, first by the sample preprocessing benchmark face image in sample preprocessing benchmark face image collection be input to just Beginning coding network obtains the coding of sample first;Then, above-mentioned sample first coding is separately input into initial first hidden layer network With initial second hidden layer network, the coding of sample second and sample third coding are respectively obtained;Later, above-mentioned sample second is encoded It is attached with sample third coding, obtains the coding of sample the 4th;Then, by the 4th coding input of sample to initial decoding net Network obtains the first reconstruction image of sample;Next, calculating obtained the first reconstruction image of sample using preset loss function Difference degree between sample benchmark face image corresponding with the sample preprocessing benchmark face image of input is as the first damage Mistake value.
S2. A sample preprocessed face image in the sample preprocessed face image set is input to the initial encoding network to obtain a fifth sample code. The fifth sample code is then input to the initial target hidden-layer network to obtain a sixth sample code. The sixth sample code is duplicated and the copies concatenated to obtain a seventh sample code, which is input to the initial decoding network to obtain a second sample reconstructed image. Next, a preset loss function is used to compute, as a second loss value, the degree of difference between the obtained second sample reconstructed image and the sample face image corresponding to the input sample preprocessed face image. The dimension of the seventh sample code is generally the same as that of the fourth sample code.
S3. Based on the computed degrees of difference, the network parameters of the initial encoding network, the initial first hidden-layer network, the initial second hidden-layer network, and the initial decoding network are adjusted, and training continues according to steps S1 and S2. Training ends when a preset training termination condition is met. Finally, the initial image generation model composed of the trained initial encoding network, initial first hidden-layer network, initial second hidden-layer network, and initial decoding network is determined to be the image generation model.
It should be noted that the loss function may be, for example, an MSE (mean squared error) loss function or an SSIM (structural similarity index) loss function. Alternatively, two or more loss functions may be selected simultaneously and combined with weights. Optionally, the first loss value and the second loss value may also undergo various processing, such as averaging. The preset training termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; a combined loss value computed from the first loss value and the second loss value is less than a preset difference threshold; the accuracy on a test set reaches a preset accuracy threshold.
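The forward passes of S1 and S2 and the combination of the two loss values can be sketched as follows, with single linear maps standing in for the encoding, hidden-layer, and decoding networks; all dimensions and weights are toy values assumed for illustration, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_code, d_hid = 16, 8, 4               # toy dimensions (assumed)

W_enc = rng.normal(size=(d_code, d_in))      # encoding network as one linear map
W_h1 = rng.normal(size=(d_hid, d_code))      # first hidden-layer network
W_h2 = rng.normal(size=(d_hid, d_code))      # second hidden-layer network
W_dec = rng.normal(size=(d_in, 2 * d_hid))   # decoding network

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# S1: reference branch -- encode, pass through BOTH hidden nets, concatenate, decode.
x_ref = rng.normal(size=d_in)                # stands in for a preprocessed reference face
c1 = W_enc @ x_ref                           # first sample code
c4 = np.concatenate([W_h1 @ c1, W_h2 @ c1])  # second + third codes, concatenated
recon1 = W_dec @ c4                          # first sample reconstructed image
loss1 = mse(recon1, x_ref)                   # first loss value

# S2: face branch -- encode, pass through the TARGET hidden net only, duplicate, decode.
x_face = rng.normal(size=d_in)               # stands in for a preprocessed face image
c6 = W_h2 @ (W_enc @ x_face)                 # sixth sample code (target net = second here)
c7 = np.concatenate([c6, c6])                # duplicated so dim(c7) == dim(c4)
recon2 = W_dec @ c7                          # second sample reconstructed image
loss2 = mse(recon2, x_face)                  # second loss value

total_loss = 0.5 * (loss1 + loss2)           # e.g. averaging the two loss values
```

The duplication in S2 is what keeps the decoder's input dimension identical across both branches, so a single shared decoding network suffices.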
Thus, by inputting a sample reference face image into the trained image generation model, a sample face image whose facial pose is consistent with that of the sample reference face image can be generated through the encoding network, the target hidden-layer network, and the decoding network. It will be appreciated that, when the sample face images in the sample face image set and the face image to be replaced correspond to the same person, the trained image generation model is the image generation model used to generate the preset face image database matched with the face image to be replaced.
It is worth noting that the execution subject of the training step may be the same as or different from the execution subject of the method for generating an image. If they are the same, the execution subject of the training step may store the network structure and network parameters of the trained image generation model locally after training. If they are different, the execution subject of the training step may, after training, send the network structure and network parameters of the trained image generation model to the execution subject of the method for generating an image.
Step 203, generating a target image based on the replacement of the face image to be replaced by the matching face image.
In this embodiment, the execution subject may replace the face image to be replaced with the matching face image using various methods, thereby generating the target image. The target image may include a face image consistent with the matching face image and a background consistent with the base image. As an example, the execution subject may first process the matching face image and the face image to be replaced into images of matching size (e.g., 128×128). The execution subject may then combine the matching face image with the background of the base image to generate the target image.
In some optional implementations of this embodiment, the execution subject may also apply fusion processing to the combined image using various methods to generate the target image. As an example, the execution subject may generate the target image by means of alpha blending, multi-band blending, graph cut, or the like.
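Of the fusion options listed, alpha blending is the simplest; the sketch below shows it on a single row of pixels, with an assumed feathered mask that is not taken from the patent:

```python
def alpha_blend(face_px, background_px, alpha):
    """Blend one face pixel over a background pixel; alpha in [0, 1]."""
    return alpha * face_px + (1.0 - alpha) * background_px

# Feather the transition: alpha falls off toward the face contour so the
# pasted face fades into the base image instead of showing a hard seam.
face_row = [200, 200, 200, 200]
background_row = [50, 50, 50, 50]
alpha_row = [1.0, 0.75, 0.25, 0.0]   # illustrative feathered mask values
blended = [alpha_blend(f, b, a) for f, b, a in zip(face_row, background_row, alpha_row)]
```

Multi-band blending would apply the same idea per frequency band, and graph cut would instead choose a seam along which the two images agree best.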
In some optional implementations of this embodiment, the execution subject may also generate the target image as follows:
First step, performing face alignment between the matching face image and the face image to be replaced.
In these implementations, the execution subject may perform face alignment between the matching face image and the face image to be replaced using various face alignment algorithms. As an example, the execution subject may first detect the positions of the facial keypoints (which may include, e.g., 150 points) in the matching face image and in the face image to be replaced. The execution subject may then perform face alignment using four of these points as references: the outer corner of the left eye (e.g., keypoint 13), the outer corner of the right eye (e.g., keypoint 34), the center of the upper lip (e.g., keypoint 60), and the center of the chin (e.g., keypoint 6).
Second step, performing triangulation based on the aligned matching face image and the face image to be replaced.
In these implementations, the execution subject may perform triangulation based on the positions of the facial keypoints in the matching face image and the face image to be replaced determined in the first step. As an example, the relevant API (Application Programming Interface) of OpenCV's Subdiv2D class may be called to triangulate the face images. The subdivision generally yields a plurality of non-overlapping triangular regions.
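A dependency-light sketch of this step, using SciPy's `Delaunay` in place of the OpenCV `Subdiv2D` API mentioned above (the keypoint coordinates are toy values, not detected landmarks):

```python
import numpy as np
from scipy.spatial import Delaunay

# Stand-ins for facial keypoint positions; in practice these would be the
# landmarks shared by the aligned matching face and the face to be replaced.
points = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],  # "face boundary" corners
    [0.5, 0.5],                                       # an interior landmark
])

tri = Delaunay(points)
triangles = tri.simplices   # each row: indices of one non-overlapping triangle
```

Because both images share the same keypoint indexing, each triangle of the face to be replaced has a directly corresponding triangle in the aligned matching face, which is what the third step exploits.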
Third step, performing replacement according to the correspondence between the triangular regions into which the triangulation divides the aligned matching face image and the face image to be replaced, generating a quasi-target image.
In these implementations, the execution subject may replace each triangular region into which the face image to be replaced is divided with the corresponding triangular region of the aligned matching face image, thereby generating the quasi-target image. In this way, a matching face image consistent with the facial pose of the face image to be replaced can be generated in the base image, and this matching face image has high validity and naturalness.
Fourth step, extracting the contour of the face image from the quasi-target image.

In these implementations, the execution subject may extract the contour of the face image using various methods, for example facial keypoint detection or edge detection techniques.
Fifth step, generating a mask according to the contour of the face image.

Sixth step, generating color distribution information of the face image according to the mask and the quasi-target image.
In these implementations, according to the mask generated in the fifth step and the quasi-target image generated in the third step, the execution subject may first determine the color distribution of the portion of the quasi-target image outside the face image. The execution subject may then determine the color distribution information of the face image using a linear color transformation.
Seventh step, rendering the face image according to the color distribution information to generate the target image.

In these implementations, the execution subject may render the face image in the quasi-target image to a skin tone consistent with the color distribution indicated by the color distribution information. The fusion of the face image and the background in the generated target image can thus be made more natural.
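The sixth and seventh steps can be sketched as a single-channel linear color transform that matches the mean and standard deviation of the face region to those of the surrounding region; the pixel values are toy data, and this matching rule is an assumed instance of a linear color transformation, not the patent's exact formula:

```python
import numpy as np

def linear_color_transfer(face, reference):
    """Shift and scale face values so their mean/std match the reference region.

    A minimal one-channel version of a linear color transformation; a real
    implementation would apply this per color channel, using the mask to
    separate the face region from the surrounding background.
    """
    f_mean, f_std = face.mean(), face.std()
    r_mean, r_std = reference.mean(), reference.std()
    return (face - f_mean) * (r_std / (f_std + 1e-8)) + r_mean

face_region = np.array([80.0, 100.0, 120.0])   # pixels inside the mask
surrounding = np.array([140.0, 160.0, 180.0])  # pixels outside the face contour
rendered = linear_color_transfer(face_region, surrounding)
```

After the transfer, the face region's overall tone statistics agree with the background, which is what makes the composite look natural at the contour.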
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating an image according to an embodiment of the present disclosure. In the application scenario of Fig. 3, a user 301 uploads a base image 3031 and a target face image 3032 using a terminal device 302. A background server 304 receives the images 303 sent by the terminal device 302. The background server 304 then determines a matching face image 306 from a preset face image database 305 matched with the face image to be replaced in the base image 3031, where the matching face image 306 shows the face shown in the base image 3031 with a facial pose consistent with that of the target face image 3032. The background server 304 then replaces the face image in the base image 3031 with the matching face image 306 to generate a target image 307. Optionally, the background server 304 may also send the generated target image 307 to the terminal device 302 for display to the user 301.
At present, one prior-art approach typically performs deformation adjustment on a face template under the supervision of facial keypoints, with the result that the generated face images are not natural enough. In contrast, the method provided by the above embodiments of the disclosure performs matching against a preset face image database, so that the quality of the matched face image is improved by guaranteeing the image quality within the preset face image database, and the drawback of online model methods, namely their tendency to generate failed images (bad cases), is avoided. Moreover, since the scheme described in this embodiment can match directly against the database without online training or model inference, the image generation speed can be increased and the waiting time reduced.
With further reference to Fig. 4, there is shown a flow 400 of another embodiment of the method for generating an image. The flow 400 of the method for generating an image includes the following steps:
Step 401, acquiring a first video shot for a first user and a second video shot for a second user.

In this embodiment, the execution subject of the method for generating an image (e.g., the server 105 shown in Fig. 1) may acquire, in various ways, the first video shot for the first user and the second video shot for the second user, either locally or from a communicatively connected electronic device (e.g., the terminal device shown in Fig. 1).
Step 402, extracting from the first video a video frame including a face image of the first user as the base image.

In this embodiment, the execution subject may extract, from the first video acquired in step 401, a video frame including a face image of the first user as the base image.
It should be noted that a video is essentially a sequence of images arranged in temporal order, so the first video may correspond to an image sequence including face images of the first user. Here, the execution subject may select, in various ways, a video frame including a face image of the first user from the image sequence as the base image. For example, the frame may be selected at random, or a video frame in which the face image is sharper may be preferentially selected as the base image.
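Preferring a sharper frame can be sketched by ranking frames with the variance of a Laplacian response, a common sharpness proxy (assumed here, not specified by the patent):

```python
def sharpness(frame):
    """Variance of a 4-neighbour Laplacian response; higher means sharper."""
    h, w = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Toy "frames": a high-contrast checkerboard stands in for a sharp frame,
# a constant image for a blurred one.
sharp = [[(x + y) % 2 * 255 for x in range(4)] for y in range(4)]
blurry = [[128] * 4 for _ in range(4)]
frames = [blurry, sharp, blurry]
base_index = max(range(len(frames)), key=lambda i: sharpness(frames[i]))
```

In practice the same ranking would be restricted to frames in which a face was actually detected.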
Step 403, extracting from the second video a video frame including a face image of the second user.

In this embodiment, the execution subject may extract a video frame including a face image of the second user from the second video, following steps similar to step 402.
Step 404, extracting the face image of the second user from the video frame including the face image of the second user as the target face image.

In this embodiment, the execution subject may extract the face image from the video frame extracted in step 403 as the target face image, using various face recognition and facial landmark extraction algorithms.
It should be noted that the explanations of the base image and the target face image are consistent with the description of step 201 in the previous embodiment, and details are not repeated here.
Step 405, determining a matching face image from a preset face image database matched with the face image to be replaced.

Step 406, generating a target image based on the replacement of the face image to be replaced by the matching face image.

Steps 405 and 406 above are consistent with steps 202 and 203 in the previous embodiment, respectively; the descriptions of steps 202 and 203 also apply to steps 405 and 406, and details are not repeated here.
Step 407, generating a target video based on the target image.

In this embodiment, the execution subject may first extract a plurality of base images and target face images from the first video and the second video acquired in step 401, respectively, to generate a base image sequence and a target face image sequence. The order of the images in the base image sequence and the target face image sequence may be consistent with the order of the video frames. The execution subject may then perform steps 405 to 406 for each base image and target face image in the extracted base image sequence and target face image sequence to generate a target image sequence, whose order may be consistent with the frame order of the first video or the second video. The execution subject can thereby generate a target video, in which the facial pose of the second user shown in the target video matches the facial pose of the first user shown in the first video.
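The frame-by-frame pairing described above can be sketched as follows, with a placeholder callback standing in for steps 405-406 (matching and replacement); the function name and frame labels are illustrative, not from the patent:

```python
def generate_target_video(base_frames, target_faces, generate_target_image):
    """Pair the i-th base frame with the i-th target face, preserving frame order.

    `generate_target_image` stands in for steps 405-406 (database matching
    followed by face replacement); it is a placeholder, not an implementation.
    """
    n = min(len(base_frames), len(target_faces))
    return [generate_target_image(base_frames[i], target_faces[i]) for i in range(n)]

# Toy stand-ins: frames are labels, and "generation" just records the pairing.
base_seq = ["base_0", "base_1", "base_2"]
face_seq = ["face_0", "face_1", "face_2"]
target_video = generate_target_video(base_seq, face_seq, lambda b, f: (b, f))
```

Because each output frame depends only on its own pair, the loop parallelizes trivially, which is consistent with the low-latency claim made for this embodiment.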
In some optional implementations of this embodiment, the execution subject may also send the generated target video to a communicatively connected target device (e.g., a mobile phone, a tablet, etc.), so that the target device displays the target video. As an example, the first video may be a video uploaded from a user terminal (e.g., a mobile phone, a tablet computer, etc.), and the second video may be a video shot by the user with the terminal itself. The execution subject may also send the generated target video back to the user terminal that uploaded the video. The user can thereby, through the user terminal, drive the expression of the person in the uploaded video with his or her own facial expression.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating an image in this embodiment embodies the steps of extracting the base image and the target face image from videos, and the step of generating the target video. The scheme described in this embodiment can thus drive the facial pose of the second user shown in the video according to the facial pose of the first user. Moreover, since the scheme described in this embodiment can match directly against the database without online training or model inference, the image generation speed is greatly improved, so that it can run with low latency on computers and mobile devices and is suitable for fields such as short video, live video streaming, and film and television special effects.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating an image. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating an image provided in this embodiment includes an acquiring unit 501, a determining unit 502, and a first generating unit 503. The acquiring unit 501 is configured to acquire a base image and a target face image, where the base image includes a face image to be replaced and a background. The determining unit 502 is configured to determine a matching face image from a preset face image database matched with the face image to be replaced, where the matched preset face image database includes face images of different facial poses of the face indicated by the face image to be replaced, and the matching face image represents the face indicated by the face image to be replaced presenting a facial pose consistent with the facial pose shown by the target face image. The first generating unit 503 is configured to generate a target image based on the replacement of the face image to be replaced by the matching face image, where the target image includes a face image consistent with the matching face image and a background consistent with the base image.
In this embodiment, for the specific processing of the acquiring unit 501, the determining unit 502, and the first generating unit 503 in the apparatus 500 for generating an image, and the technical effects they bring, reference may be made to the related descriptions of steps 201, 202, and 203 in the embodiment corresponding to Fig. 2, respectively, and details are not repeated here.
In some optional implementations of this embodiment, the acquiring unit 501 may include an acquiring module (not shown), a first extraction module (not shown), a second extraction module (not shown), and a third extraction module (not shown). The acquiring module may be configured to acquire a first video shot for a first user and a second video shot for a second user. The first extraction module may be configured to extract from the first video a video frame including a face image of the first user as the base image. The second extraction module may be configured to extract from the second video a video frame including a face image of the second user. The third extraction module may be configured to extract the face image of the second user from the video frame including the face image of the second user as the target face image. The apparatus for generating an image may further include a second generating unit (not shown) configured to generate a target video based on the target image, where the facial pose of the second user shown in the target video may match the facial pose of the first user shown in the first video.
In some optional implementations of this embodiment, the preset face image database may be obtained as follows: acquiring a reference face image library, where the reference face image library includes images showing different facial poses of a reference face; inputting the images in the reference face image library into a pre-trained image generation model to generate matching reference face images, where the image generation model includes an encoding network, a hidden-layer network, and a decoding network, and the facial pose shown by a matching reference face image is consistent with the facial pose shown by the input image; and generating the preset face image database based on the matching reference face images.
In some optional implementations of this embodiment, the hidden-layer network may include a first hidden-layer network and a second hidden-layer network. The image generation model may include a first image generation sub-model and a second image generation sub-model. The first image generation sub-model may include the encoding network, the first hidden-layer network, the second hidden-layer network, and the decoding network. The second image generation sub-model may include the encoding network, the decoding network, and a target hidden-layer network, where the target hidden-layer network may be one of the first hidden-layer network and the second hidden-layer network.
In some optional implementations of this embodiment, the image generation model may be obtained by training as follows: acquiring a sample reference face image set and a sample face image set, where the sample reference face image set includes a subset of the reference face image library; applying image preprocessing transformations to the sample reference face image set and the sample face image set to generate a sample preprocessed reference face image set and a sample preprocessed face image set; and using the sample preprocessed reference face images and the sample preprocessed face images as inputs to the first image generation sub-model and the second image generation sub-model, respectively, and the sample reference face images and sample face images corresponding to the inputs as the desired outputs of the first image generation sub-model and the second image generation sub-model, respectively, to train and obtain the image generation model.
In some optional implementations of this embodiment, the first generating unit 503 may include: an alignment module (not shown), a subdivision module (not shown), a first generation module (not shown), a fourth extraction module (not shown), a second generation module (not shown), a third generation module (not shown), and a fourth generation module (not shown). The alignment module may be configured to perform face alignment between the matching face image and the face image to be replaced. The subdivision module may be configured to perform triangulation based on the aligned matching face image and the face image to be replaced. The first generation module may be configured to perform replacement according to the correspondence between the triangular regions into which the triangulation divides the aligned matching face image and the face image to be replaced, generating a quasi-target image. The fourth extraction module may be configured to extract the contour of the face image from the quasi-target image. The second generation module may be configured to generate a mask according to the contour of the face image. The third generation module may be configured to generate color distribution information of the face image according to the mask and the quasi-target image. The fourth generation module may be configured to render the face image according to the color distribution information to generate the target image.
In the apparatus provided by the above embodiments of the disclosure, the acquiring unit 501 acquires the base image and the target face image, where the base image includes a face image to be replaced and a background. The determining unit 502 then determines a matching face image from the preset face image database matched with the face image to be replaced, where the matched preset face image database includes face images of different facial poses of the face indicated by the face image to be replaced, and the matching face image represents that face presenting a facial pose consistent with the facial pose shown by the target face image. Finally, the first generating unit 503 generates a target image based on the replacement of the face image to be replaced by the matching face image, where the target image includes a face image consistent with the matching face image and a background consistent with the base image. The speed of generating an image whose facial pose is consistent with the target face image is thereby improved.
Referring now to Fig. 6, there is shown a schematic structural diagram of an electronic device (e.g., the server shown in Fig. 1) 600 suitable for implementing embodiments of the present disclosure. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones and laptop computers, and fixed terminals such as digital TVs and desktop computers. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, and the like; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and the like; a storage apparatus 608 including, for example, a magnetic tape, a hard disk, and the like; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided. Each block shown in Fig. 6 may represent one apparatus, or may represent multiple apparatuses as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, or installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a base image and a target face image, where the base image includes a face image to be replaced and a background; determine a matching face image from a preset face image database matched with the face image to be replaced, where the matched preset face image database includes face images of different facial poses of the face indicated by the face image to be replaced, and the matching face image represents that face presenting a facial pose consistent with the facial pose shown by the target face image; and generate a target image based on the replacement of the face image to be replaced by the matching face image, where the target image includes a face image consistent with the matching face image and a background consistent with the base image.
Computer program code for performing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit, a determination unit and a first generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring a negative image and a target face image, wherein the negative image includes a face image to be replaced and a background".
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by mutually replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (14)

1. A method for generating an image, comprising:
acquiring a negative image and a target face image, wherein the negative image includes a face image to be replaced and a background;
determining a matching face image from a preset face image library matched with the face image to be replaced, wherein the matched preset face image library includes face images of different facial poses of the face indicated by the face image to be replaced, and the matching face image is used to characterize that the facial pose shown by the face image to be replaced is consistent with the facial pose shown by the target face image;
generating a target image based on replacing the face image to be replaced with the matching face image, wherein the target image includes a face image consistent with the matching face image and a background consistent with the negative image.
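Outside the claim language itself, the three claimed steps can be sketched as a toy pipeline. Everything below is illustrative only and not part of the claims: the pose tuples, the dict-based "images", and all function names (`estimate_pose`, `find_matching_face`, `generate_target_image`) are invented stand-ins, and a real system would use an actual head-pose estimator.

```python
# Hypothetical sketch of the claimed method; poses are precomputed
# (yaw, pitch, roll) tuples and images are plain dicts.

def estimate_pose(face):
    # Stand-in for a real head-pose estimator.
    return face["pose"]

def find_matching_face(target_face, preset_library):
    # Pick the library entry whose pose is closest to the target face's
    # pose (the claim requires the matching face to show a consistent pose).
    target_pose = estimate_pose(target_face)
    return min(
        preset_library,
        key=lambda e: sum((a - b) ** 2 for a, b in zip(estimate_pose(e), target_pose)),
    )

def generate_target_image(negative_image, target_face, preset_library):
    # Step 1: the negative image carries the face to be replaced plus the
    # background; step 2: determine the matching face; step 3: replace the
    # face region while keeping the negative image's background.
    match = find_matching_face(target_face, preset_library)
    return {"face": match["id"], "background": negative_image["background"]}

library = [
    {"id": "frontal", "pose": (0, 0, 0)},
    {"id": "left30", "pose": (-30, 0, 0)},
]
result = generate_target_image(
    negative_image={"background": "office"},
    target_face={"pose": (-28, 2, 1)},
    preset_library=library,
)
print(result)  # → {'face': 'left30', 'background': 'office'}
```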
2. The method according to claim 1, wherein acquiring the negative image and the target face image comprises:
acquiring a first video shot for a first user and a second video shot for a second user;
extracting, from the first video, a video frame including a face image of the first user as the negative image;
extracting, from the second video, a video frame including a face image of the second user;
extracting the face image of the second user from the video frame including the face image of the second user as the target face image; and
after generating the target image based on replacing the face image to be replaced with the matching face image, the method further comprises:
generating a target video based on the target image, wherein the facial pose of the second user shown in the target video matches the facial pose of the first user shown in the first video.
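Applied per frame, claim 2 amounts to reusing the claim-1 replacement across a whole video. The sketch below is a hedged toy version under the assumption that each "video" is a list of frame dicts; real frame decoding (for example via OpenCV's `cv2.VideoCapture`) and the actual face replacement are out of scope, and `swap_frame`/`generate_target_video` are invented names.

```python
# Per-frame sketch of claim 2: every frame of the first video serves as a
# negative image, so the second user's face follows the first user's pose.

def swap_frame(negative_frame, target_face):
    # Placeholder for the claim-1 replacement: keep the negative frame's
    # background, adopt the target face.
    return {"background": negative_frame["background"], "face": target_face}

def generate_target_video(first_video, second_video):
    # One target face image is extracted from the second user's video.
    target_face = second_video[0]["face"]
    return [swap_frame(frame, target_face) for frame in first_video]

first = [{"background": f"scene{i}"} for i in range(3)]
second = [{"face": "user2"}]
video = generate_target_video(first, second)
print([f["face"] for f in video])  # → ['user2', 'user2', 'user2']
```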
3. The method according to claim 1, wherein the preset face image library is obtained by:
acquiring a reference face image library, wherein the reference face image library includes images showing different facial poses of a reference face;
inputting the images in the reference face image library into a pre-trained image generation model to generate matching reference face images, wherein the image generation model includes an encoding network, a hidden-layer network and a decoding network, and the facial pose shown by each matching reference face image is consistent with the facial pose shown by the input image;
generating the preset face image library based on the matching reference face images.
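The library-building step of claim 3 can be sketched as a simple map over the reference poses. This is a toy stand-in, not the claimed model: `generate_matching_face` merely tags the pose through, whereas the real encoder/decoder would re-render the identity in that pose, and all names here are invented.

```python
# Hypothetical sketch of building the preset face image library: every
# reference-pose image is pushed through a (stubbed) pre-trained image
# generation model and the outputs are collected keyed by pose.

def generate_matching_face(model, reference_image):
    # Stand-in for the encoder/hidden-layer/decoder pipeline.
    return {"identity": model["identity"], "pose": reference_image["pose"]}

def build_preset_library(model, reference_library):
    return {img["pose"]: generate_matching_face(model, img)
            for img in reference_library}

reference_library = [{"pose": p} for p in ("frontal", "left30", "right30")]
library = build_preset_library({"identity": "target"}, reference_library)
print(sorted(library))  # → ['frontal', 'left30', 'right30']
```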
4. The method according to claim 3, wherein the hidden-layer network includes a first hidden-layer network and a second hidden-layer network, and the image generation model includes a first image generation sub-model and a second image generation sub-model, the first image generation sub-model including the encoding network, the first hidden-layer network, the second hidden-layer network and the decoding network, and the second image generation sub-model including the encoding network, the decoding network and a target hidden-layer network, the target hidden-layer network being one of the first hidden-layer network and the second hidden-layer network.
5. The method according to claim 4, wherein the image generation model is trained by:
acquiring a sample reference face image set and a sample face image set, wherein the sample reference face image set includes a subset of the reference face image library;
applying image preprocessing transformations to the sample reference face image set and the sample face image set to generate a sample preprocessed reference face image set and a sample preprocessed face image set;
taking the sample preprocessed reference face images and the sample preprocessed face images as inputs of the first image generation sub-model and the second image generation sub-model respectively, taking the sample reference face images and the sample face images corresponding to the inputs as the desired outputs of the first image generation sub-model and the second image generation sub-model respectively, and training to obtain the image generation model.
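The architecture of claims 4-5 (a shared encoder and decoder with two swappable hidden-layer branches) can be sketched with plain linear maps. This is only a shape-level illustration under invented layer sizes; the real sub-models would be trained networks, with the preprocessed images as inputs and the original samples as desired outputs.

```python
# Minimal NumPy sketch of the claimed architecture: one shared encoding
# network and one shared decoding network, with two hidden-layer branches.
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8  # flattened-image width and hidden width (arbitrary choices)

encoder = rng.normal(size=(D, H))
hidden_a = rng.normal(size=(H, H))  # first hidden-layer network
hidden_b = rng.normal(size=(H, H))  # second hidden-layer network
decoder = rng.normal(size=(H, D))

def generate(x, branch):
    # The first sub-model routes through both hidden-layer networks in
    # sequence; the second sub-model routes through a single target branch.
    h = x @ encoder
    for w in branch:
        h = h @ w
    return h @ decoder

x = rng.normal(size=(1, D))
y_first = generate(x, [hidden_a, hidden_b])  # first image generation sub-model
y_second = generate(x, [hidden_a])           # second sub-model, target = first hidden net
print(y_first.shape, y_second.shape)  # → (1, 16) (1, 16)
```

Training would then fit both sub-models as denoising autoencoders: preprocessed (perturbed) samples in, original samples as the desired outputs.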
6. The method according to any one of claims 1-5, wherein generating the target image based on replacing the face image to be replaced with the matching face image comprises:
performing face alignment between the matching face image and the face image to be replaced;
performing triangulation based on the aligned matching face image and the face image to be replaced;
performing replacement according to the correspondence between the triangular regions divided by the triangulation in the aligned matching face image and in the face image to be replaced, to generate a quasi-target image;
extracting a contour of the face image from the quasi-target image;
generating a mask according to the contour of the face image;
generating color distribution information of the face image according to the mask and the quasi-target image;
rendering the face image according to the color distribution information to generate the target image.
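The post-triangulation steps of claim 6 (contour-based mask, color distribution information, rendering) can be illustrated with a toy intensity image. This is a simplified stand-in with invented shapes: a real implementation would use landmark-driven Delaunay triangulation and a blending method such as Poisson/seamless cloning, neither of which is shown here.

```python
# Toy sketch of the mask and color-distribution steps: a bright "pasted
# face" patch is shifted toward the surrounding background statistics.
import numpy as np

H, W = 8, 8
quasi_target = np.full((H, W), 100.0)   # background intensity
face_region = (slice(2, 6), slice(2, 6))
quasi_target[face_region] = 180.0       # pasted face patch (quasi-target image)

# Contour -> mask: here simply where the pasted face sits.
mask = np.zeros((H, W), dtype=bool)
mask[face_region] = True

# Color-distribution information: match the masked region's mean and
# standard deviation to the background so the blend looks consistent.
bg = quasi_target[~mask]
bg_mean, bg_std = bg.mean(), bg.std() + 1e-6
face = quasi_target[mask]
rendered = quasi_target.copy()
rendered[mask] = (face - face.mean()) / (face.std() + 1e-6) * bg_std + bg_mean

print(float(rendered[mask].mean()))  # → 100.0
```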
7. An apparatus for generating an image, comprising:
an acquiring unit, configured to acquire a negative image and a target face image, wherein the negative image includes a face image to be replaced and a background;
a determination unit, configured to determine a matching face image from a preset face image library matched with the face image to be replaced, wherein the matched preset face image library includes face images of different facial poses of the face indicated by the face image to be replaced, and the matching face image is used to characterize that the facial pose shown by the face image to be replaced is consistent with the facial pose shown by the target face image;
a first generation unit, configured to generate a target image based on replacing the face image to be replaced with the matching face image, wherein the target image includes a face image consistent with the matching face image and a background consistent with the negative image.
8. The apparatus according to claim 7, wherein the acquiring unit includes:
an acquiring module, configured to acquire a first video shot for a first user and a second video shot for a second user;
a first extraction module, configured to extract, from the first video, a video frame including a face image of the first user as the negative image;
a second extraction module, configured to extract, from the second video, a video frame including a face image of the second user;
a third extraction module, configured to extract the face image of the second user from the video frame including the face image of the second user as the target face image;
and the apparatus further includes:
a second generation unit, configured to generate a target video based on the target image, wherein the facial pose of the second user shown in the target video matches the facial pose of the first user shown in the first video.
9. The apparatus according to claim 7, wherein the preset face image library is obtained by:
acquiring a reference face image library, wherein the reference face image library includes images showing different facial poses of a reference face;
inputting the images in the reference face image library into a pre-trained image generation model to generate matching reference face images, wherein the image generation model includes an encoding network, a hidden-layer network and a decoding network, and the facial pose shown by each matching reference face image is consistent with the facial pose shown by the input image;
generating the preset face image library based on the matching reference face images.
10. The apparatus according to claim 9, wherein the hidden-layer network includes a first hidden-layer network and a second hidden-layer network, and the image generation model includes a first image generation sub-model and a second image generation sub-model, the first image generation sub-model including the encoding network, the first hidden-layer network, the second hidden-layer network and the decoding network, and the second image generation sub-model including the encoding network, the decoding network and a target hidden-layer network, the target hidden-layer network being one of the first hidden-layer network and the second hidden-layer network.
11. The apparatus according to claim 10, wherein the image generation model is trained by:
acquiring a sample reference face image set and a sample face image set, wherein the sample reference face image set includes a subset of the reference face image library;
applying image preprocessing transformations to the sample reference face image set and the sample face image set to generate a sample preprocessed reference face image set and a sample preprocessed face image set;
taking the sample preprocessed reference face images and the sample preprocessed face images as inputs of the first image generation sub-model and the second image generation sub-model respectively, taking the sample reference face images and the sample face images corresponding to the inputs as the desired outputs of the first image generation sub-model and the second image generation sub-model respectively, and training to obtain the image generation model.
12. The apparatus according to any one of claims 7-11, wherein the first generation unit includes:
an alignment module, configured to perform face alignment between the matching face image and the face image to be replaced;
a subdivision module, configured to perform triangulation based on the aligned matching face image and the face image to be replaced;
a first generation module, configured to perform replacement according to the correspondence between the triangular regions divided by the triangulation in the aligned matching face image and in the face image to be replaced, to generate a quasi-target image;
a fourth extraction module, configured to extract a contour of the face image from the quasi-target image;
a second generation module, configured to generate a mask according to the contour of the face image;
a third generation module, configured to generate color distribution information of the face image according to the mask and the quasi-target image;
a fourth generation module, configured to render the face image according to the color distribution information to generate the target image.
13. An electronic device, comprising:
one or more processors;
a storage device on which one or more programs are stored;
the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201910797619.2A 2019-08-27 2019-08-27 Method and apparatus for generating image Active CN110516598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910797619.2A CN110516598B (en) 2019-08-27 2019-08-27 Method and apparatus for generating image


Publications (2)

Publication Number Publication Date
CN110516598A true CN110516598A (en) 2019-11-29
CN110516598B CN110516598B (en) 2022-03-01

Family

ID=68627322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910797619.2A Active CN110516598B (en) 2019-08-27 2019-08-27 Method and apparatus for generating image

Country Status (1)

Country Link
CN (1) CN110516598B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200334A (en) * 2017-12-28 2018-06-22 广东欧珀移动通信有限公司 Image capturing method, device, storage medium and electronic equipment
CN108460812A (en) * 2018-04-04 2018-08-28 北京红云智胜科技有限公司 A kind of expression packet generation system and method based on deep learning
CN108965740A (en) * 2018-07-11 2018-12-07 深圳超多维科技有限公司 A kind of real-time video is changed face method, apparatus, equipment and storage medium
CN109902632A (en) * 2019-03-02 2019-06-18 西安电子科技大学 A kind of video analysis device and video analysis method towards old man's exception
CN109977739A (en) * 2017-12-28 2019-07-05 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN110110672A (en) * 2019-05-10 2019-08-09 广东工业大学 A kind of facial expression recognizing method, device and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SOUBHIK SANYAL et al.: "Discriminative Pose-Free Descriptors for Face and Object Matching", International Conference on Computer Vision (ICCV) *
LIN Yuan et al.: "Face replacement based on realistic 3D head reconstruction", Journal of Tsinghua University (Science and Technology) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269700A (en) * 2021-04-29 2021-08-17 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN113269700B (en) * 2021-04-29 2023-12-12 北京达佳互联信息技术有限公司 Video generation method, device, electronic equipment and storage medium
CN113961746A (en) * 2021-09-29 2022-01-21 北京百度网讯科技有限公司 Video generation method and device, electronic equipment and readable storage medium
CN113961746B (en) * 2021-09-29 2023-11-21 北京百度网讯科技有限公司 Video generation method, device, electronic equipment and readable storage medium
CN115359166A (en) * 2022-10-20 2022-11-18 北京百度网讯科技有限公司 Image generation method and device, electronic equipment and medium
CN115359166B (en) * 2022-10-20 2023-03-24 北京百度网讯科技有限公司 Image generation method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN110516598B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110503703A (en) Method and apparatus for generating image
US11652956B2 (en) Emotion recognition in video conferencing
US11410457B2 (en) Face reenactment
US10867416B2 (en) Harmonizing composite images using deep learning
CN106682632B (en) Method and device for processing face image
US8692830B2 (en) Automatic avatar creation
US20180253865A1 (en) Image matting using deep learning
CN108197618B (en) Method and device for generating human face detection model
WO2021052375A1 (en) Target image generation method, apparatus, server and storage medium
CN112037320B (en) Image processing method, device, equipment and computer readable storage medium
CN108898185A (en) Method and apparatus for generating image recognition model
CN108961369A (en) The method and apparatus for generating 3D animation
CN108985257A (en) Method and apparatus for generating information
US11727717B2 (en) Data-driven, photorealistic social face-trait encoding, prediction, and manipulation using deep neural networks
CN110516598A (en) Method and apparatus for generating image
CN109101919A (en) Method and apparatus for generating information
CN108509892A (en) Method and apparatus for generating near-infrared image
CN114821734A (en) Method and device for driving expression of virtual character
US20230146178A1 (en) Attention based audio adjustment in virtual environments
CN109145783A (en) Method and apparatus for generating information
CN116997933A (en) Method and system for constructing facial position map
CN115147261A (en) Image processing method, device, storage medium, equipment and product
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
CN110381374B (en) Image processing method and device
CN114943799A (en) Face image processing method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant