CN116543064A - Image processing method, device, electronic equipment and medium - Google Patents

Image processing method, device, electronic equipment and medium

Info

Publication number
CN116543064A
Authority
CN
China
Prior art keywords
parameter
initial
face
image
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310280554.0A
Other languages
Chinese (zh)
Inventor
赵亚飞
王志强
张世昌
郭紫垣
范锡睿
陈毅
杜宗财
张伟伟
孙权
刘倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310280554.0A
Publication of CN116543064A
Legal status: Pending

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                    • G06N 3/08 Learning methods
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 5/00 Image enhancement or restoration
                    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
                • G06T 11/00 2D [Two Dimensional] image generation
                • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
                    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
                        • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
                            • G06V 40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides an image processing method, an image processing device, electronic equipment and a medium, relates to the field of artificial intelligence, in particular to the fields of augmented reality, virtual reality, computer vision, deep learning and the like, and can be applied to scenes such as the metaverse and virtual digital humans. The specific implementation scheme is as follows: performing three-dimensional reconstruction on an original image to obtain initial face parameters corresponding to a target object in the original image; adjusting the initial face parameters according to reference face parameters to obtain initial target face parameters; rendering the initial target face parameters to obtain a rendered image; and fusing the original image and the rendered image to obtain a target image.

Description

Image processing method, device, electronic equipment and medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the fields of augmented reality, virtual reality, computer vision, deep learning and the like, and can be applied to scenes such as the metaverse and virtual digital humans. The present disclosure relates in particular to an image processing method, apparatus, electronic device, storage medium and computer program product.
Background
At present, face reenactment (face replay) technology is widely used in fields such as video production, electronic games, and virtual digital humans. Face reenactment aims to generate a new speaker image or video from the identity of a source face image together with the mouth shape, expression, pose, and other information provided by driving information. However, in the related art, the facial features of the face images generated in this way fall short in expressiveness, accuracy, diversity, and the like.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, electronic device, storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided an image processing method including: performing three-dimensional reconstruction on the original image to obtain initial face parameters corresponding to the target object in the original image; adjusting the initial face parameters according to the reference face parameters to obtain initial target face parameters; rendering the initial target face parameters to obtain a rendered image; and fusing the original image and the rendered image to obtain a target image.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the first reconstruction module is used for carrying out three-dimensional reconstruction on the original image to obtain initial face parameters corresponding to the target object in the original image; the adjusting module is used for adjusting the initial face parameters according to the reference face parameters to obtain initial target face parameters; the rendering module is used for performing rendering processing on the initial target face parameters to obtain a rendered image; and the fusion module is used for carrying out fusion processing on the original image and the rendered image to obtain a target image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary system architecture to which image processing methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 is a flow chart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process for generating a target image using a fusion model according to an embodiment of the present disclosure;
FIG. 4A, FIG. 4B, and FIG. 4C are schematic diagrams of an image processing method according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 6 is a block diagram of an electronic device for implementing an image processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Fig. 1 is a schematic diagram of an exemplary system architecture to which the image processing method and apparatus may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, provided to assist those skilled in the art in understanding the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages and the like. Various client applications can be installed on the terminal devices 101, 102, 103, for example, animation applications, live-streaming applications, game applications, web browser applications, search applications, instant messaging tools, mail clients, or social platform software (by way of example only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as computing, network services, and middleware services.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
For example, the server 105 may acquire original images from the terminal devices 101, 102, 103 through the network 104, and perform face replay based on the original images to generate target images. The server 105 may also transmit the target image to the terminal devices 101, 102, 103.
It should be noted that, the image processing method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the image processing apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The image processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method 200 may include operations S210 to S240, for example.
In operation S210, the original image is three-dimensionally reconstructed to obtain initial face parameters corresponding to the target object in the original image.
In operation S220, the initial face parameter is adjusted according to the reference face parameter, resulting in an initial target face parameter.
In operation S230, a rendering process is performed on the initial target face parameters to obtain a rendered image.
In operation S240, the original image and the rendered image are fused to obtain a target image.
According to an embodiment of the present disclosure, the original image may be, for example, an image acquired by an image acquisition device, or an image frame in a video stream, which is not limited by the present disclosure.
The target object in the original image may be, for example, in human form, cartoon form, or another form, and may be selected according to the actual application scenario.
According to the embodiment of the disclosure, by performing three-dimensional reconstruction on the original image, the initial face parameters corresponding to the target object in the original image can be obtained. The initial facial parameters may be used to characterize initial facial features of the target object in the original image.
In one example, the original image may be three-dimensionally reconstructed, for example, using a parameterized face model, resulting in initial face parameters corresponding to the target object in the original image. Parameterized face models may include, for example, but are not limited to, the Basel Face Model (BFM), the Surrey Face Model (SFM), the Large Scale Facial Model (LSFM), FLAME (Faces Learned with an Articulated Model and Expressions), or the FaceWarehouse model, and may be selected according to practical application needs.
The initial facial parameters may include, for example, initial shape parameters, initial expression parameters, initial pose parameters, and initial texture parameters. The initial shape parameters characterize, before adjustment, the shapes of the facial features of the target object in the original image, the relative positions among those features, the face outline, and other characteristics related to the identity of the target object. The initial expression parameters characterize the expression of the target object in the original image before adjustment, including, for example, but not limited to, smiling, crying, and angry expressions. The initial pose parameters characterize the head pose and motion of the target object in the original image before adjustment. The initial texture parameters may characterize texture features of the target object in the original image before adjustment.
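To make this parameterization concrete, the following minimal Python sketch (not part of the patent text; the coefficient dimensions, names, and the zero-initialized fitting stub are all assumptions) shows one way the reconstructed facial parameters could be represented:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FaceParams:
    shape: np.ndarray       # identity-related coefficients (facial feature shapes, face outline)
    expression: np.ndarray  # expression coefficients (smiling, crying, anger, ...)
    pose: np.ndarray        # head pose/motion coefficients (e.g., rotation + translation)
    texture: np.ndarray     # skin texture coefficients


def reconstruct_face(image: np.ndarray,
                     dims: tuple = (80, 64, 6, 80)) -> FaceParams:
    """Fit a parameterized face model (e.g., BFM or FLAME) to an image.

    Stub: a real implementation would regress or optimize the coefficients
    from the image; here zero vectors of plausible sizes stand in for them.
    """
    n_shape, n_expr, n_pose, n_tex = dims
    return FaceParams(shape=np.zeros(n_shape),
                      expression=np.zeros(n_expr),
                      pose=np.zeros(n_pose),
                      texture=np.zeros(n_tex))
```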
By three-dimensionally reconstructing the original image, the initial facial features of the target object can be represented parametrically, decoupling features such as shape, expression, pose, and texture. This facilitates subsequent explicit adjustment of the initial facial parameters and diversified facial feature adjustment, so that the initial facial parameters can be adjusted flexibly for different application scenarios and the adjusted facial features of the target object better match user expectations.
According to an embodiment of the present disclosure, the reference face parameter may be, for example, a face parameter corresponding to a reference object in the reference image. The reference facial parameters may be used to characterize facial features of the reference subject. The reference object and the target object may be the same object or different objects.
The reference facial parameters may include, for example, a reference shape parameter, a reference expression parameter, a reference pose parameter, and a reference texture parameter. The reference shape parameter, the reference expression parameter, the reference gesture parameter, and the reference texture parameter have similar definitions as the shape parameter, the expression parameter, the gesture parameter, and the texture parameter in the initial face parameter, respectively, and are not described herein.
In some embodiments, each of the reference facial parameters may be from the same reference image or from a plurality of different reference images. In one example, the shape, expression, posture, texture, and other parameters in the reference facial parameters are the shape, expression, posture, texture, and other parameters corresponding to the reference object in the reference image. In another example, the reference shape parameter and the reference expression parameter in the reference face parameter may be a shape parameter and an expression parameter corresponding to the reference object in the reference image 1, the reference pose parameter in the reference face parameter may be a pose parameter corresponding to the reference object in the reference image 2, and the reference texture parameter in the reference face parameter may be a texture parameter corresponding to the reference object in the reference image 3.
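As a sketch of drawing the reference facial parameters from several reference images, reusing FaceParams and reconstruct_face from the sketch above (the image contents and the assignment of sources are illustrative):

```python
import numpy as np

# Three hypothetical reference images (blank arrays stand in for real photos).
ref_img_1 = np.zeros((256, 256, 3))
ref_img_2 = np.zeros((256, 256, 3))
ref_img_3 = np.zeros((256, 256, 3))

p1 = reconstruct_face(ref_img_1)  # supplies the shape and expression parameters
p2 = reconstruct_face(ref_img_2)  # supplies the pose parameters
p3 = reconstruct_face(ref_img_3)  # supplies the texture parameters

reference = FaceParams(shape=p1.shape, expression=p1.expression,
                       pose=p2.pose, texture=p3.texture)
```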
According to embodiments of the present disclosure, each of the initial face parameters may be adjusted using the reference face parameters as a reference in order to obtain initial target face parameters. Thereby, the adjustability of the facial features of the target object is achieved. Then, rendering processing can be performed on the initial target face parameters to obtain a rendered image. Then, the original image and the rendered image are fused to obtain a target image. By utilizing the original image and the rendered image to perform fusion processing, the accuracy and the precision of the facial features of the target object can be improved, so that a clear and vivid target image is generated, and the quality of the target image is improved.
According to the embodiments of the present disclosure, on one hand, three-dimensionally reconstructing the original image allows the initial facial features of the target object to be represented parametrically and decouples features such as shape, expression, pose, and texture, which facilitates explicit adjustment of the facial features of the target object and thus improves the diversity of the facial features of the target object in the target image. On the other hand, fusing the original image and the rendered image improves the accuracy and precision of the facial features of the target object, so that a clear and vivid target image is generated and the quality of the target image is improved.
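The overall flow of operations S210 to S240 can then be summarized as follows (continuing the sketches above; the adjust, render, and fuse callables are placeholders for the processing detailed in the remainder of this description):

```python
from typing import Callable

import numpy as np


def process_image(original: np.ndarray,
                  reference: FaceParams,
                  adjust: Callable[[FaceParams, FaceParams], FaceParams],
                  render: Callable[[FaceParams], np.ndarray],
                  fuse: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> np.ndarray:
    initial = reconstruct_face(original)        # S210: three-dimensional reconstruction
    target_params = adjust(initial, reference)  # S220: parameter adjustment
    rendered = render(target_params)            # S230: rendering
    return fuse(original, rendered)             # S240: fusion
```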
The expression, pose, shape, texture, and the like of the target object may change across different application scenarios. Accordingly, the initial facial parameters may be adjusted according to the requirements of the application scenario so as to obtain suitable facial features for the target object.
In one example, the initial shape parameters in the initial face parameters may be adjusted according to the reference shape parameters in the reference face parameters, so as to adjust the facial feature shapes of the target object. For example, the face shape of the target object may be adjusted from "fat" to "thin", or the mouth shape, eyes, and/or eyebrows of the target object may be adjusted.
For example, a shape adjustment parameter is determined from the reference shape parameter among the reference face parameters and the initial shape parameter among the initial face parameters. Then, the initial face parameters are adjusted based on the shape adjustment parameter to obtain the initial target face parameters. Thereby, the adjustment of the facial feature shapes of the target object is realized.
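A minimal sketch of this shape adjustment, under the assumption that the shape adjustment parameter is a (scaled) difference between the reference and initial shape coefficients (the patent does not fix a concrete formula):

```python
def adjust_shape(initial: FaceParams, reference: FaceParams,
                 strength: float = 1.0) -> FaceParams:
    # Shape adjustment parameter: scaled difference between the reference
    # and the initial shape coefficients.
    shape_delta = strength * (reference.shape - initial.shape)
    return FaceParams(shape=initial.shape + shape_delta,
                      expression=initial.expression,
                      pose=initial.pose,
                      texture=initial.texture)
```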
In another example, for example, an initial expression parameter in the initial facial parameters may be adjusted according to a reference expression parameter in the reference facial parameters in order to migrate the expression of the reference object onto the target object.
Because an expression also changes the shape of the face, when adjusting the initial expression parameter based on the reference expression parameter, an expression adjustment parameter and an additional shape adjustment parameter may be determined according to the reference expression parameter and the initial expression parameter. Then, the initial facial parameters are adjusted based on the expression adjustment parameter and the additional shape adjustment parameter to obtain the initial target facial parameters. Thereby, the adjustment of the expression of the target object is realized.
In another example, both the expression and the facial feature shapes of the target object may be adjusted. For example, in addition to adjusting the expression of the target object, the face shape may be changed, for example from "fat" to "thin", or the facial feature shapes may be adjusted at the same time to make the adjusted expression more vivid, and so on.
For example, in addition to adjusting the initial expression parameters based on the reference expression parameters, the shape adjustment parameter may also be determined from the reference shape parameter in the reference face parameters and the initial shape parameter in the initial face parameters. Then, the initial facial parameters are adjusted based on the shape adjustment parameter, the expression adjustment parameter, and the additional shape adjustment parameter to obtain the initial target facial parameters. Thereby, the adjustment of both the expression and the shape of the target object is achieved.
In another example, the expression, pose, and facial feature shapes of the target object may be adjusted together. For example, in addition to adjusting the expression and shape of the target object, the head pose of the target object may be changed, for example from a front-facing direction to a preset deflection angle (by way of example only).
For example, after the shape adjustment parameter, the expression adjustment parameter, and the additional shape adjustment parameter are determined, a pose adjustment parameter may also be determined from the reference pose parameter among the reference face parameters and the initial pose parameter among the initial face parameters. Then, the initial face parameters are adjusted based on the shape adjustment parameter, the expression adjustment parameter, the additional shape adjustment parameter, and the pose adjustment parameter to obtain the initial target face parameters. Thus, the expression, shape, and pose of the target object are adjusted.
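One possible combination of the shape, expression, additional shape, and pose adjustments is sketched below; the expression-to-shape coupling matrix is an assumption standing in for however the additional shape adjustment parameter is actually derived:

```python
import numpy as np


def adjust_full(initial: FaceParams, reference: FaceParams,
                expr_to_shape: np.ndarray) -> FaceParams:
    expr_delta = reference.expression - initial.expression  # expression adjustment parameter
    extra_shape_delta = expr_to_shape @ expr_delta          # additional shape adjustment parameter
    shape_delta = reference.shape - initial.shape           # shape adjustment parameter
    pose_delta = reference.pose - initial.pose              # pose adjustment parameter
    return FaceParams(shape=initial.shape + shape_delta + extra_shape_delta,
                      expression=initial.expression + expr_delta,
                      pose=initial.pose + pose_delta,
                      texture=initial.texture)

# With the default dimensions of the first sketch, expr_to_shape would be an
# 80 x 64 matrix mapping expression changes to induced shape changes.
```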
In some examples, the texture features of the target object may also be adjusted. For example, the skin tone of the target object may be adjusted, etc.
For example, the texture adjustment parameter may be determined from the reference texture parameter and the initial texture parameter. Then, the initial face parameters are adjusted based on the texture adjustment parameter to obtain the initial target face parameters. Thereby, adjustment of the texture features of the target object is achieved.
In other embodiments, the initial texture parameters and other facial parameters may be adjusted simultaneously, and the specific adjustment process is similar to the adjustment process described above, and will not be repeated here.
According to an embodiment of the present disclosure, the reference face parameter may be acquired as follows.
In one example, a reference image may be acquired, where the reference object in the reference image and the target object may be the same object or different objects. Three-dimensional reconstruction is then performed on the reference image to obtain the face parameters corresponding to the reference object, and those face parameters are determined as the reference face parameters.
It should be noted that there may be a plurality of reference images. In that case, three-dimensional reconstruction may be performed on each of the plurality of reference images to obtain the face parameters corresponding to each reference image, and the reference face parameters are then determined from the face parameters corresponding to the plurality of reference images.
In the embodiment of the present disclosure, the manner of three-dimensionally reconstructing the reference image is the same as or similar to the manner described above, and will not be described here again.
In another example, input information of the user may be received, the input information including the reference facial parameters, and the reference face parameters may be determined from the input information. Determining the reference facial parameters from user-provided input makes the facial features of the adjusted target object match the user's expectations more closely, thereby yielding a more vivid target image.
According to an embodiment of the present disclosure, the fusing of the original image and the rendered image to obtain the target image may include, for example, the following operations.
For example, the original image and the rendered image may each be encoded to obtain a first facial feature hidden code (latent code) corresponding to the original image and a second facial feature hidden code corresponding to the rendered image.
According to an embodiment of the present disclosure, the first facial feature hidden code characterizes facial features of the target object in the original image, such as features of expression, shape, pose, texture, etc. of the target object in the original image. The second facial feature hidden code characterizes facial features of the target object in the rendered image, such as features of expression, shape, pose, texture, etc. of the target object in the rendered image.
Then, the first facial feature hidden code and the second facial feature hidden code are fused to obtain a fusion feature hidden code, and the fusion feature hidden code is decoded to obtain the target image.
According to the embodiment of the disclosure, the accuracy and the precision of the facial features of the target object in the target image can be improved by performing fusion processing on the original image and the rendered image, so that the naturalness of the target image is improved.
According to an embodiment of the present disclosure, when fusing the first facial feature hidden code and the second facial feature hidden code, a fusion coefficient corresponding to the first facial feature hidden code can be obtained, where the fusion coefficient characterizes the degree to which the first facial feature hidden code contributes during hidden code fusion. The first facial feature hidden code and the second facial feature hidden code are then fused based on this fusion coefficient to obtain the fusion feature hidden code.
For example, the fusion coefficient corresponding to the second facial feature hidden code may be determined according to the fusion coefficient corresponding to the first facial feature hidden code. The sum of the fusion coefficient corresponding to the first facial feature hidden code and the fusion coefficient corresponding to the second facial feature hidden code is 1.
Then, the first facial feature hidden code and the second facial feature hidden code are weighted by their respective fusion coefficients, and the weighted hidden codes are fused to obtain the fusion feature hidden code.
In some embodiments, the fusion coefficient corresponding to the first facial feature hidden code and the fusion coefficient corresponding to the second facial feature hidden code may also be determined according to actual needs. For example, the fusion coefficients corresponding to the first face feature hidden code and the second face feature hidden code may be between 0 and 1, and the two may be the same or different.
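Under these definitions, the weighted hidden-code fusion reduces to a convex combination. A minimal sketch (the default coefficient of 0.5 is illustrative):

```python
import numpy as np


def blend_hidden_codes(code_original: np.ndarray,
                       code_rendered: np.ndarray,
                       alpha: float = 0.5) -> np.ndarray:
    """Weight each facial feature hidden code by its fusion coefficient;
    the two coefficients sum to 1."""
    assert 0.0 <= alpha <= 1.0
    return alpha * code_original + (1.0 - alpha) * code_rendered
```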
In some embodiments, the original image and the rendered image may be fused using a fusion model to obtain the target image.
FIG. 3 is a schematic diagram of a process for generating a target image using a fusion model according to an embodiment of the present disclosure. An example process of generating a target image using a fusion model is described below with reference to fig. 3.
As shown in fig. 3, the fusion model includes, for example, an encoding network 321, a hidden code fusion network 322, and a decoding network 323. Each network in the fusion model is pre-trained. In one example, the fusion model may employ a model from the style-based generative adversarial network (StyleGAN) series.
For example, after the original image 31 and the rendered image 32 are acquired, the original image 31 and the rendered image 32 may be subjected to depth coding processing using the coding network 321, resulting in a first facial feature hidden code 301 corresponding to the original image 31 and a second facial feature hidden code 302 corresponding to the rendered image 32. Thereafter, a fusion process may be performed using the hidden code fusion network 322 based on the first facial feature hidden code 301 and the second facial feature hidden code 302 to obtain a fused feature hidden code 303. Then, the fusion feature hidden code 303 is subjected to decoding processing using the decoding network 323, resulting in the target image 33.
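A stand-in for this encode-fuse-decode flow is sketched below, reusing blend_hidden_codes from the sketch above. The random linear maps only illustrate the data flow; a real fusion model would use pre-trained networks such as a StyleGAN-family encoder and generator:

```python
import numpy as np


class FusionModelSketch:
    def __init__(self, image_size: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random linear maps stand in for the encoding / decoding networks.
        self.encoder = rng.normal(size=(latent_dim, image_size)) / np.sqrt(image_size)
        self.decoder = rng.normal(size=(image_size, latent_dim)) / np.sqrt(latent_dim)

    def __call__(self, original: np.ndarray, rendered: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
        code_1 = self.encoder @ original.ravel()           # first facial feature hidden code
        code_2 = self.encoder @ rendered.ravel()           # second facial feature hidden code
        fused = blend_hidden_codes(code_1, code_2, alpha)  # hidden code fusion stand-in
        return (self.decoder @ fused).reshape(original.shape)  # decoded target image


# Usage on toy 64x64 grayscale images:
model = FusionModelSketch(image_size=64 * 64, latent_dim=512)
target = model(np.zeros((64, 64)), np.zeros((64, 64)))
```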
Fig. 4A, 4B, and 4C are schematic diagrams of an image processing method according to an embodiment of the present disclosure. The scheme of the present disclosure is exemplified below with reference to fig. 4A to 4C.
For simplicity, the embodiments of the present disclosure are illustrated by taking the target object in the original image as a human face and the adjusted initial facial parameter as the initial expression parameter. It should be understood that the illustrations in the embodiments of the disclosure are merely exemplary, intended to assist those skilled in the art in understanding the aspects of the disclosure, and are not intended to limit its scope.
As shown in fig. 4A, the original image 41 may be, for example, a face image containing a target object with a smiling expression (by way of example only). In the embodiment of the present disclosure, the expression of the target object in the original image 41 may be adjusted according to the reference facial parameters such that the expression is changed from a smiling expression to a silent expression (i.e., a state without lip movement). The manner of obtaining the reference facial parameters is similar to the process described above and is not repeated here.
The original image 41 may be three-dimensionally reconstructed using the parameterized face model 410 to obtain initial face parameters corresponding to the target object in the original image 41. The initial facial parameters may include, for example, initial shape parameters 41_1, initial texture parameters 41_2, initial pose parameters 41_3, initial expression parameters 41_4.
In embodiments of the present disclosure, the parameterized face model 410 may include, for example, but is not limited to, a BFM model, an SFM model, an LSFM model, a FLAME model, or a FaceWarehouse model, and may be selected according to actual application needs.
Next, the initial expression parameter 41_4 may be adjusted according to the reference expression parameter 40_4 among the acquired reference facial parameters. For example, the expression adjustment parameter 41_41 and the additional shape adjustment parameter 41_42 may be determined according to the reference expression parameter 40_4 and the initial expression parameter 41_4. Thereafter, the initial face parameters are adjusted according to the expression adjustment parameter 41_41 and the additional shape adjustment parameter 41_42. For example, the initial expression parameter 41_4 and the initial shape parameter 41_1 among the initial face parameters are adjusted according to the expression adjustment parameter 41_41 and the additional shape adjustment parameter 41_42, thereby obtaining the initial target face parameters 41_0. Then, the initial target face parameters 41_0 are three-dimensionally rendered, resulting in the rendered image 42. As shown in fig. 4B, the expression of the target object is adjusted from the smiling state to a relatively calm state, but the mouth of the target object has not yet fully closed.
As shown in fig. 4C, after the rendered image 42 is acquired, the original image 41 and the rendered image 42 are subjected to depth coding processing using the coding network 421, respectively, to obtain a first facial feature hidden code 401 corresponding to the original image 41 and a second facial feature hidden code 402 corresponding to the rendered image 42. Thereafter, fusion processing is performed based on the first face feature hidden code 401 and the second face feature hidden code 402 by using the hidden code fusion network 422, so as to obtain a fusion feature hidden code 403. Then, the fusion feature hidden code 403 is decoded by using the decoding network 423 to obtain the target image 43. In the target image 43, the expression of the target object has been adjusted to a silent expression. Thereby, the adjustment of the expression and shape of the target object is achieved. In the embodiment of the present disclosure, by performing fusion processing on the original image 41 and the rendered image 42, the accuracy and precision of the facial features of the target object in the target image 43 can be improved, thereby improving the naturalness of the target image 43.
In some embodiments, at least one of the expression parameters, texture parameters, shape parameters, and pose parameters in the initial facial parameters may also be adjusted to generate a target image of a target object containing different facial features. In other embodiments, target objects other than faces (e.g., cartoon or other forms of target objects) may also be adjusted to generate different forms of target images. These adjustment procedures are similar to those described above and will not be described here.
According to the embodiments of the present disclosure, on one hand, three-dimensionally reconstructing the original image allows the initial facial features of the target object to be represented parametrically and decouples features such as shape, expression, pose, and texture, which facilitates explicit adjustment of the facial features of the target object and thus improves the diversity of the facial features of the target object in the target image. On the other hand, fusing the original image and the rendered image improves the accuracy and precision of the facial features of the target object, so that a clear and vivid target image is generated and the quality of the target image is improved.
According to an embodiment of the present disclosure, a target avatar is driven according to a target face parameter of a target object in a target image to perform an action corresponding to the target face parameter.
The target face parameters are the face parameters corresponding to the target object in the target image. The target avatar may be driven according to the target face parameters to perform an action or pose corresponding to them. For example, if the target face parameters characterize the target object as having a sad expression and corresponding facial feature shapes, the target avatar may be driven to present the same sad expression and facial feature shapes.
In the embodiments of the present disclosure, the target avatar may be a character in a game, an animation, or another kind of video, or any other suitable character, and may be selected according to the actual application scenario. The target avatar may be in human form, cartoon form, or another form, which the present disclosure does not limit.
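As a sketch of such driving, reusing FaceParams from the first sketch (the key names, the clip range, and the pose layout describe a hypothetical avatar rig, not an interface defined by the patent):

```python
import numpy as np


def drive_avatar(target_params: FaceParams) -> dict:
    """Convert target face parameters into a driving signal for an avatar rig."""
    return {
        # Expression coefficients clipped to a typical blendshape weight range.
        "blendshape_weights": np.clip(target_params.expression, 0.0, 1.0),
        # Assume the first three pose entries are rotation, the next three translation.
        "head_rotation": target_params.pose[:3],
        "head_translation": target_params.pose[3:6],
    }
```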
Fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the image processing apparatus 500 includes: a first reconstruction module 510, an adjustment module 520, a rendering module 530, and a fusion module 540.
The first reconstruction module 510 is configured to perform three-dimensional reconstruction on an original image, so as to obtain initial facial parameters corresponding to a target object in the original image.
The adjustment module 520 is configured to adjust the initial face parameter according to the reference face parameter, so as to obtain an initial target face parameter.
The rendering module 530 is configured to perform rendering processing on the initial target face parameter, so as to obtain a rendered image.
The fusion module 540 is configured to perform fusion processing on the original image and the rendered image, so as to obtain a target image.
According to an embodiment of the present disclosure, the fusion module 540 includes: the first encoding unit, the first fusing unit and the first decoding unit. The first coding unit is used for respectively coding the original image and the rendered image to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image; the first fusion unit is used for carrying out fusion processing on the first facial feature hidden code and the second facial feature hidden code to obtain a fusion feature hidden code; and the first decoding unit is used for decoding the fusion characteristic hidden code to obtain a target image.
According to an embodiment of the present disclosure, the first fusing unit includes: the method comprises the steps of obtaining a subunit and a first fusion subunit. The acquisition subunit is used for acquiring a fusion coefficient corresponding to the first facial feature hidden code; the fusion coefficient is used for representing the fusion degree of the first facial feature hidden code during hidden code fusion; and the first fusion subunit is used for fusing the first facial feature hidden code and the second facial feature hidden code based on the fusion coefficient to obtain a fusion feature hidden code.
According to an embodiment of the present disclosure, the adjustment module 520 includes: a first determining unit and a first adjusting unit. The first determining unit is used for determining an expression adjusting parameter and an additional shape adjusting parameter according to the reference expression parameter in the reference face parameters and the initial expression parameter in the initial face parameters; and the first adjusting unit is used for adjusting the initial facial parameters based on the expression adjusting parameters and the additional shape adjusting parameters to obtain initial target facial parameters.
According to an embodiment of the present disclosure, the adjustment module 520 further includes: and a second determining unit for determining a shape adjustment parameter based on the reference shape parameter of the reference face parameters and the initial shape parameter of the initial face parameters. Wherein the first adjusting unit includes: the first adjusting subunit is configured to adjust the initial face parameter based on the shape adjusting parameter, the expression adjusting parameter, and the additional shape adjusting parameter, to obtain an initial target face parameter.
According to an embodiment of the present disclosure, the adjustment module 520 further includes: a third determination unit configured to determine an attitude adjustment parameter according to a reference attitude parameter among the reference face parameters and an initial attitude parameter among the initial face parameters; wherein the first adjusting unit includes: and the second adjusting subunit is used for adjusting the initial facial parameters based on the shape adjusting parameters, the expression adjusting parameters, the additional shape adjusting parameters and the posture adjusting parameters to obtain initial target facial parameters.
According to an embodiment of the present disclosure, the adjustment module 520 includes: a fourth determining unit and a second adjusting unit. The fourth determining unit is used for determining a shape adjustment parameter according to the reference shape parameter in the reference face parameters and the initial shape parameter in the initial face parameters; and the second adjusting unit is used for adjusting the initial face parameter based on the shape adjusting parameter to obtain the initial target face parameter.
According to an embodiment of the present disclosure, the image processing apparatus 500 further includes: an acquisition module, a second reconstruction module, and a first determination module. The acquisition module is used for acquiring a reference image; the second reconstruction module is used for carrying out three-dimensional reconstruction on the reference image to obtain facial parameters corresponding to the reference object in the reference image; and the first determining module is used for determining the face parameters corresponding to the reference object as the reference face parameters.
According to an embodiment of the present disclosure, the image processing apparatus 500 further includes: a receiving module and a second determining module. The receiving module is used for receiving input information of a user, and the second determining module is used for determining reference face parameters according to the input information.
According to an embodiment of the present disclosure, the image processing apparatus 500 further includes: and the driving module is used for driving the target virtual image to execute actions corresponding to the target face parameters according to the target face parameters of the target object in the target image.
According to an embodiment of the present disclosure, the fusion module 540 includes: and a second fusion unit. The second fusion unit is used for carrying out fusion processing on the original image and the rendered image by using the fusion model to obtain a target image.
According to an embodiment of the present disclosure, the fusion model includes an encoding network, a hidden code fusion network, and a decoding network; the second fusion unit includes: a coding subunit, a second fusion subunit, and a decoding subunit. The coding subunit is used for respectively coding the original image and the rendered image by using a coding network to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image; the second fusion subunit is used for carrying out fusion processing based on the first facial feature hidden code and the second facial feature hidden code by using the hidden code fusion network to obtain a fusion feature hidden code; and the decoding subunit is used for decoding the fusion characteristic hidden code by using a decoding network to obtain a target image.
According to an embodiment of the present disclosure, the first reconstruction module 510 includes: and the reconstruction unit is used for carrying out three-dimensional reconstruction on the original image by using the parameterized face model to obtain initial face parameters corresponding to the target object in the original image.
According to an embodiment of the present disclosure, the parameterized face model includes a Basel Face Model, a Surrey Face Model, a Large Scale Facial Model, a FLAME model, or a FaceWarehouse model.
It should be noted that, in the embodiment of the apparatus portion, the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each module/unit/subunit and the like are the same as or similar to the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each corresponding step in the embodiment of the method portion, and are not described herein again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the data involved (including but not limited to users' personal information) all comply with the relevant laws and regulations and do not violate public order and good morals.
In the technical solution of the present disclosure, the authorization or consent of the data owner is obtained before the relevant data is acquired or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements a method as in an embodiment of the present disclosure.
Fig. 6 is a block diagram of an electronic device for implementing an image processing method of an embodiment of the present disclosure.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, an image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the image processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (31)

1. An image processing method, comprising:
performing three-dimensional reconstruction on an original image to obtain initial face parameters corresponding to a target object in the original image;
adjusting the initial face parameters according to the reference face parameters to obtain initial target face parameters;
rendering the initial target face parameters to obtain a rendered image; and
fusing the original image and the rendered image to obtain a target image.
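By way of illustration only (not part of the claims), the following Python sketch walks through the four claimed steps. All helper functions, parameter names, and dimensions are hypothetical stand-ins; the patent does not prescribe a concrete implementation or library.

```python
import numpy as np

# Hypothetical stand-ins for the three stages; names and dimensions are
# illustrative assumptions, not disclosed details.
def reconstruct_3d(image):
    # step 1: three-dimensional reconstruction -> initial face parameters
    return {"shape": np.zeros(80), "expression": np.zeros(64), "pose": np.zeros(6)}

def render_face(params):
    # step 3: render an image from face parameters (placeholder output)
    return np.zeros((256, 256, 3), dtype=np.float32)

def fuse_images(original, rendered):
    # step 4: naive placeholder fusion of the two images
    return 0.5 * original + 0.5 * rendered

def process_image(original_image, reference_params):
    initial_params = reconstruct_3d(original_image)
    # step 2: simplistic stand-in for the claimed adjustment -
    # overwrite initial parameters with the reference values
    target_params = {**initial_params, **reference_params}
    rendered_image = render_face(target_params)
    return fuse_images(original_image, rendered_image)

target = process_image(np.zeros((256, 256, 3), dtype=np.float32),
                       {"expression": np.full(64, 0.1)})
```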
2. The method of claim 1, wherein the fusing the original image and the rendered image to obtain a target image comprises:
encoding the original image and the rendered image respectively to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image;
fusing the first facial feature hidden code and the second facial feature hidden code to obtain a fusion feature hidden code; and
decoding the fusion feature hidden code to obtain the target image.
3. The method of claim 2, wherein the fusing the first facial feature hidden code and the second facial feature hidden code to obtain a fused feature hidden code comprises:
acquiring a fusion coefficient corresponding to the first facial feature hidden code, wherein the fusion coefficient represents a degree to which the first facial feature hidden code is fused during hidden code fusion; and
fusing, based on the fusion coefficient, the first facial feature hidden code and the second facial feature hidden code to obtain the fusion feature hidden code.
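Claims 2 and 3 leave the fusion operation itself open. One plausible reading, sketched below under that assumption, treats the fusion coefficient as a blending weight for linear interpolation of the two hidden codes:

```python
import numpy as np

def fuse_hidden_codes(z_original, z_rendered, alpha=0.5):
    # alpha plays the role of the fusion coefficient of claim 3: the
    # degree to which the first (original-image) hidden code survives.
    return alpha * z_original + (1.0 - alpha) * z_rendered

z1 = np.random.randn(512)   # first facial feature hidden code (original image)
z2 = np.random.randn(512)   # second facial feature hidden code (rendered image)
z_fused = fuse_hidden_codes(z1, z2, alpha=0.7)
```

Under this reading, a larger coefficient preserves more of the original image's identity, while a smaller one lets the rendered, parameter-adjusted face dominate.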
4. The method according to any one of claims 1 to 3, wherein the adjusting the initial face parameters according to reference face parameters to obtain initial target face parameters comprises:
determining an expression adjustment parameter and an additional shape adjustment parameter according to a reference expression parameter in the reference face parameters and an initial expression parameter in the initial face parameters; and
adjusting the initial face parameters based on the expression adjustment parameter and the additional shape adjustment parameter to obtain the initial target face parameters.
5. The method of claim 4, wherein the adjusting the initial face parameters according to the reference face parameters further comprises:
determining a shape adjustment parameter according to a reference shape parameter in the reference face parameters and an initial shape parameter in the initial face parameters;
wherein the adjusting the initial face parameters based on the expression adjustment parameter and the additional shape adjustment parameter to obtain the initial target face parameters comprises:
adjusting the initial face parameters based on the shape adjustment parameter, the expression adjustment parameter, and the additional shape adjustment parameter to obtain the initial target face parameters.
6. The method of claim 5, wherein the adjusting the initial face parameters according to the reference face parameters further comprises:
determining a pose adjustment parameter according to a reference pose parameter in the reference face parameters and an initial pose parameter in the initial face parameters;
wherein the adjusting the initial face parameters based on the expression adjustment parameter and the additional shape adjustment parameter to obtain the initial target face parameters comprises:
adjusting the initial face parameters based on the shape adjustment parameter, the expression adjustment parameter, the additional shape adjustment parameter, and the pose adjustment parameter to obtain the initial target face parameters.
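Claims 4 to 6 derive each adjustment parameter from a reference/initial parameter pair. A minimal sketch, assuming each adjustment parameter is simply the offset between the two (the claims do not fix the arithmetic), follows:

```python
import numpy as np

def compute_adjustment(reference, initial):
    # one plausible reading: adjustment = reference - initial
    return reference - initial

initial = {"shape": np.zeros(80), "expression": np.zeros(64), "pose": np.zeros(6)}
reference = {"shape": np.ones(80), "expression": np.full(64, 0.2), "pose": np.zeros(6)}

adjustments = {k: compute_adjustment(reference[k], initial[k]) for k in initial}
extra_shape = np.zeros(80)  # hypothetical additional shape adjustment parameter

# apply all adjustments to obtain the initial target face parameters
target = {k: initial[k] + adjustments[k] for k in initial}
target["shape"] = target["shape"] + extra_shape
```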
7. The method according to any one of claims 1 to 3, wherein the adjusting the initial face parameters according to reference face parameters to obtain initial target face parameters comprises:
determining a shape adjustment parameter according to a reference shape parameter in the reference face parameters and an initial shape parameter in the initial face parameters; and
adjusting the initial face parameters based on the shape adjustment parameter to obtain the initial target face parameters.
8. The method of any one of claims 1 to 7, further comprising:
acquiring a reference image;
performing three-dimensional reconstruction on the reference image to obtain face parameters corresponding to a reference object in the reference image; and
determining the face parameters corresponding to the reference object as the reference face parameters.
9. The method of any one of claims 1 to 7, further comprising:
receiving input information of a user; and
determining the reference face parameters according to the input information.
10. The method of any one of claims 1 to 9, further comprising:
driving a target avatar, according to target face parameters of the target object in the target image, to perform an action corresponding to the target face parameters.
11. The method of any one of claims 1 to 10, wherein the fusing the original image and the rendered image to obtain a target image comprises:
fusing the original image and the rendered image by using a fusion model to obtain the target image.
12. The method of claim 11, wherein the fusion model comprises an encoding network, a hidden code fusion network, and a decoding network, and the fusing the original image and the rendered image by using the fusion model to obtain the target image comprises:
encoding the original image and the rendered image, respectively, by using the encoding network to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image;
performing, by using the hidden code fusion network, fusion processing based on the first facial feature hidden code and the second facial feature hidden code to obtain a fusion feature hidden code; and
decoding the fusion feature hidden code by using the decoding network to obtain the target image.
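For orientation only, a toy PyTorch layout with the three sub-networks of claim 12 (encoding, hidden code fusion, decoding) might look as follows. The layer shapes, the shared encoder, and the concatenation-based fusion are assumptions, not disclosed details:

```python
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        # encoding network (shared here for simplicity)
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))
        # hidden code fusion network: concatenate, then project back
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)
        # decoding network: latent code back to an image tensor
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64),
                                     nn.Unflatten(1, (3, 64, 64)))

    def forward(self, original, rendered):
        z1 = self.encoder(original)                 # first facial feature hidden code
        z2 = self.encoder(rendered)                 # second facial feature hidden code
        z = self.fuse(torch.cat([z1, z2], dim=1))   # fusion feature hidden code
        return self.decoder(z)                      # decoded target image

model = FusionModel()
out = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```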
13. The method of any one of claims 1 to 12, wherein the performing three-dimensional reconstruction on the original image to obtain the initial face parameters corresponding to the target object in the original image comprises:
performing three-dimensional reconstruction on the original image by using a parameterized face model to obtain the initial face parameters corresponding to the target object in the original image.
14. The method of claim 13, wherein the parameterized face model comprises a Basel Face Model, a Surrey Face Model, a Large Scale Facial Model, a FLAME model, or a FaceWarehouse model.
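The models listed in claim 14 belong to the 3D morphable model (3DMM) family, in which a face mesh is a mean shape plus linear identity and expression offsets (FLAME additionally articulates pose via blend skinning). A toy version of the linear part, with arbitrary dimensions chosen purely for illustration:

```python
import numpy as np

n_vertices, n_id, n_exp = 5000, 80, 64           # toy dimensions
mean_shape = np.zeros(3 * n_vertices)            # mean face geometry
id_basis = 0.01 * np.random.randn(3 * n_vertices, n_id)    # identity basis
exp_basis = 0.01 * np.random.randn(3 * n_vertices, n_exp)  # expression basis

def face_mesh(alpha, beta):
    # vertices = mean + B_id @ alpha + B_exp @ beta
    flat = mean_shape + id_basis @ alpha + exp_basis @ beta
    return flat.reshape(n_vertices, 3)

vertices = face_mesh(np.zeros(n_id), np.zeros(n_exp))  # mean face
```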
15. An image processing apparatus comprising:
the first reconstruction module is used for carrying out three-dimensional reconstruction on an original image to obtain initial face parameters corresponding to a target object in the original image;
the adjusting module is used for adjusting the initial face parameters according to the reference face parameters to obtain initial target face parameters;
The rendering module is used for performing rendering processing on the initial target face parameters to obtain a rendered image; and
and the fusion module is used for carrying out fusion processing on the original image and the rendering image to obtain a target image.
16. The apparatus of claim 15, wherein the fusion module comprises:
a first encoding unit configured to encode the original image and the rendered image, respectively, to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image;
a first fusion unit configured to fuse the first facial feature hidden code and the second facial feature hidden code to obtain a fusion feature hidden code; and
a first decoding unit configured to decode the fusion feature hidden code to obtain the target image.
17. The apparatus of claim 16, wherein the first fusing unit comprises:
an acquisition subunit configured to acquire a fusion coefficient corresponding to the first facial feature hidden code, wherein the fusion coefficient represents a degree to which the first facial feature hidden code is fused during hidden code fusion; and
a first fusion subunit configured to fuse, based on the fusion coefficient, the first facial feature hidden code and the second facial feature hidden code to obtain the fusion feature hidden code.
18. The apparatus of any one of claims 15 to 17, wherein the adjusting module comprises:
a first determining unit configured to determine an expression adjustment parameter and an additional shape adjustment parameter according to a reference expression parameter in the reference face parameters and an initial expression parameter in the initial face parameters; and
a first adjusting unit configured to adjust the initial face parameters based on the expression adjustment parameter and the additional shape adjustment parameter to obtain the initial target face parameters.
19. The apparatus of claim 18, wherein the adjusting module further comprises:
a second determining unit configured to determine a shape adjustment parameter according to a reference shape parameter in the reference face parameters and an initial shape parameter in the initial face parameters;
wherein the first adjusting unit comprises:
a first adjusting subunit configured to adjust the initial face parameters based on the shape adjustment parameter, the expression adjustment parameter, and the additional shape adjustment parameter to obtain the initial target face parameters.
20. The apparatus of claim 19, wherein the adjusting module further comprises:
a third determining unit configured to determine a pose adjustment parameter according to a reference pose parameter in the reference face parameters and an initial pose parameter in the initial face parameters;
wherein the first adjusting unit comprises:
a second adjusting subunit configured to adjust the initial face parameters based on the shape adjustment parameter, the expression adjustment parameter, the additional shape adjustment parameter, and the pose adjustment parameter to obtain the initial target face parameters.
21. The apparatus of any one of claims 15 to 17, wherein the adjusting module comprises:
a fourth determining unit configured to determine a shape adjustment parameter according to a reference shape parameter in the reference face parameters and an initial shape parameter in the initial face parameters; and
a second adjusting unit configured to adjust the initial face parameters based on the shape adjustment parameter to obtain the initial target face parameters.
22. The apparatus of any one of claims 15 to 21, further comprising:
an acquisition module configured to acquire a reference image;
a second reconstruction module configured to perform three-dimensional reconstruction on the reference image to obtain face parameters corresponding to a reference object in the reference image; and
a first determining module configured to determine the face parameters corresponding to the reference object as the reference face parameters.
23. The apparatus of any one of claims 15 to 21, further comprising:
a receiving module configured to receive input information of a user; and
a second determining module configured to determine the reference face parameters according to the input information.
24. The apparatus of any one of claims 15 to 23, further comprising:
a driving module configured to drive a target avatar, according to target face parameters of the target object in the target image, to perform an action corresponding to the target face parameters.
25. The apparatus of any one of claims 15 to 24, wherein the fusion module comprises:
a second fusion unit configured to fuse the original image and the rendered image by using a fusion model to obtain the target image.
26. The apparatus of claim 25, wherein the fusion model comprises an encoding network, a hidden code fusion network, and a decoding network, and the second fusion unit comprises:
an encoding subunit configured to encode the original image and the rendered image, respectively, by using the encoding network to obtain a first facial feature hidden code corresponding to the original image and a second facial feature hidden code corresponding to the rendered image;
a second fusion subunit configured to perform, by using the hidden code fusion network, fusion processing based on the first facial feature hidden code and the second facial feature hidden code to obtain a fusion feature hidden code; and
a decoding subunit configured to decode the fusion feature hidden code by using the decoding network to obtain the target image.
27. The apparatus of any one of claims 15 to 26, wherein the first reconstruction module comprises:
a reconstruction unit configured to perform three-dimensional reconstruction on the original image by using a parameterized face model to obtain the initial face parameters corresponding to the target object in the original image.
28. The apparatus of claim 27, wherein the parameterized face model comprises a Basel Face Model, a Surrey Face Model, a Large Scale Facial Model, a FLAME model, or a FaceWarehouse model.
29. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 14.
30. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 14.
31. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 14.
Application CN202310280554.0A, priority date 2023-03-21, filing date 2023-03-21: Image processing method, device, electronic equipment and medium (Pending; published as CN116543064A)

Priority Applications (1)

Application Number: CN202310280554.0A
Priority Date: 2023-03-21
Filing Date: 2023-03-21
Title: Image processing method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number: CN202310280554.0A
Priority Date: 2023-03-21
Filing Date: 2023-03-21
Title: Image processing method, device, electronic equipment and medium

Publications (1)

Publication Number: CN116543064A
Publication Date: 2023-08-04

Family

ID=87447831

Family Applications (1)

Application Number: CN202310280554.0A (status: Pending; published as CN116543064A)
Priority Date: 2023-03-21
Filing Date: 2023-03-21
Title: Image processing method, device, electronic equipment and medium

Country Status (1)

Country: CN
Publication: CN116543064A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination