CN117689770A - Expression generating method and device, electronic equipment and storage medium - Google Patents

Expression generating method and device, electronic equipment and storage medium

Info

Publication number
CN117689770A
Authority
CN
China
Prior art keywords
expression
target
model
preset
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311600735.3A
Other languages
Chinese (zh)
Inventor
王培娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuomi Private Ltd
Original Assignee
Zhuomi Private Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuomi Private Ltd filed Critical Zhuomi Private Ltd
Priority to CN202311600735.3A priority Critical patent/CN117689770A/en
Publication of CN117689770A publication Critical patent/CN117689770A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/60 - Editing figures and text; Combining figures or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20221 - Image fusion; Image merging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides an expression generating method, an expression generating device, electronic equipment and a storage medium, and relates to the technical field of image processing. The method includes: determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material; receiving a target image uploaded by the user to generate a character model of the target image, wherein the target image comprises a portrait; fusing the character model and the preset expression to generate a first target expression; and fusing the first target expression with the preset template material to generate a second target expression. The character model of the target image can be fused with the target style model, combining multiple expressions and template materials, so that expressions that convey emotion in a personalized way are generated quickly and accurately, improving the user experience.

Description

Expression generating method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an expression generating method, an expression generating device, an electronic device, and a storage medium.
Background
In online communication, when a text description cannot accurately convey what is meant, people often express their current emotion by sending an expression (emoticon or sticker), which creates a demand for making personalized expressions.
At present, many facial-expression-driving products on the market animate a specific photo uploaded by the user according to a selected video; the expression generated this way often cannot accurately express the user's emotion, which affects the user experience.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to propose an expression generating method to quickly generate an expression capable of accurately expressing a user's emotion.
A second object of the present application is to provide an expression generating apparatus.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
To achieve the above object, an embodiment of a first aspect of the present application provides an expression generating method, including:
determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material;
receiving a target image uploaded by the user to generate a character model of the target image, wherein the target image comprises a portrait;
fusing the character model and the preset expression to generate a first target expression;
and fusing the first target expression with the preset template material to generate a second target expression.
To achieve the above object, an embodiment of a second aspect of the present application provides an expression generating apparatus, including:
the determining module is used for determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material;
the receiving module is used for receiving the target image uploaded by the user to generate a character model of the target image, wherein the target image comprises a portrait;
the first fusion module is used for fusing the character model and the preset expression to generate a first target expression;
and the second fusion module is used for fusing the first target expression with the preset template material to generate a second target expression.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executed instructions stored in the memory to implement the expression generating method provided in the embodiment of the first aspect of the present application.
To achieve the above object, an embodiment of a fourth aspect of the present application provides a computer-readable storage medium, where computer-executable instructions are stored, where the computer-executable instructions are used to implement an expression generating method according to the embodiment of the first aspect of the present application when the computer-executable instructions are executed by a processor.
To achieve the above object, an embodiment of a fifth aspect of the present application proposes a computer program product, including a computer program, which when executed by a processor implements an expression generating method according to an embodiment of the first aspect of the present application.
According to the expression generating method and apparatus, the electronic equipment, and the storage medium of the present application, a target style model selected by the user is determined, where the target style model at least comprises a preset expression and a preset template material; a target image uploaded by the user is received to generate a character model of the target image, where the target image comprises a portrait; the character model and the preset expression are fused to generate a first target expression; and the first target expression is fused with the preset template material to generate a second target expression. The character model of the target image can be fused with the target style model, combining multiple expressions and template materials, so that a personalized style is merged in while the user's likeness is preserved, an expression that conveys the user's emotion is generated quickly and accurately, and the user experience is improved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flow chart of an expression generating method according to an embodiment of the present application;
fig. 2 is a flowchart of another expression generating method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an expression generating apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The expression generating method and device of the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an expression generating method according to an embodiment of the present application.
As shown in fig. 1, the expression generating method includes the steps of:
step 101, determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material.
Optionally, the target style model is a template that the user selects from a dynamic material library and from which the user expects the expression to be generated.
Optionally, the dynamic material library may be stored in a pre-established database, a material website, a mobile phone application (App), or other storage software; it may also be stored in a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other storage media; or it may be uploaded to a cloud server to implement cloud storage of the dynamic material library.
Step 102, receiving a target image uploaded by a user to generate a character model of the target image, wherein the target image comprises a portrait.
Optionally, the target image uploaded by the user is a clear photograph containing a portrait, so that style fusion can be performed on the portrait in the target image to generate an expression.
Optionally, the number of target images uploaded by the user is at least two, and they are used for generating the character model.
As one possible implementation, the character model is generated by training on the target images with a Stable Diffusion model, a deep-learning text-to-image generation model used to produce high-quality images.
And step 103, fusing the character model and the preset expression to generate a first target expression.
Optionally, the character model and the preset expression are fused, so that the personalized first target expression can be obtained.
Wherein the first target expression is dynamic, and may be, for example, video or a moving picture.
As a possible implementation, the character model and the preset expression are fused by using the checkpoint (model-merging) function of the Stable Diffusion model.
The first target expression is generated by fusing the target image and the preset expression with each other, so that the preset expression takes on the user's personalized style.
And 104, fusing the first target expression with a preset template material to generate a second target expression.
Optionally, the preset template material may be a dynamic sticker. When a sticker service is requested for the first target expression, the first target expression that has already been fused with the target image is further fused with the dynamic sticker, so that the generated second target expression carries more of the personalized style selected by the user and expresses the user's emotion more accurately.
In this embodiment, a target style model selected by the user is determined, where the target style model at least comprises a preset expression and a preset template material; a target image uploaded by the user is received to generate a character model of the target image, where the target image comprises a portrait; the character model and the preset expression are fused to generate a first target expression; and the first target expression is fused with the preset template material to generate a second target expression. The character model of the target image can be fused with the target style model, combining multiple expressions and template materials, so that a personalized style is merged in while the user's likeness is preserved, an expression that conveys the user's emotion is generated quickly and accurately, and the user experience is improved.
The embodiment provides another expression generating method, and fig. 2 is a schematic flow chart of another expression generating method provided in the embodiment of the present application.
As shown in fig. 2, the expression generating method may include the steps of:
step 201, determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material.
Optionally, when the user accesses the dynamic material library, the user can see all the style models in the dynamic material library. Any style model at least comprises a corresponding preset expression and a preset template material, and the user can select any style model in the dynamic material library as the target style model used to generate the expression.
The dynamic material library may include video material or animated-image material.
As a possible implementation, the preset expression included in the target style model is a surprised expression action of a character, and the preset template material is a dynamic "OMG" text sticker.
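The application does not specify how a style model entry is represented in the dynamic material library. Purely as an illustration, a minimal Python sketch of such an entry is given below; the class and field names (TargetStyleModel, preset_expression, and so on) and the file paths are assumptions, not terminology from the patent.

```python
# Minimal sketch (names and paths are assumptions): one entry of the dynamic material
# library, bundling the preset expression and the preset template material of a style.
from dataclasses import dataclass

@dataclass
class TargetStyleModel:
    style_id: str                  # identifier of the style model in the material library
    preset_expression: str         # driving clip with the expression action, e.g. a short video
    preset_template_material: str  # dynamic sticker, e.g. an animated text decal
    style_checkpoint: str          # weights of the style model used later for checkpoint merging

# Example entry matching the implementation described above.
surprised_omg = TargetStyleModel(
    style_id="surprised_omg",
    preset_expression="styles/surprised_action.mp4",
    preset_template_material="stickers/omg_text.gif",
    style_checkpoint="checkpoints/style_surprised.safetensors",
)
```

The dynamic material library would then simply be a collection of such entries from which the user picks the target style model.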
Step 202, receiving a target image uploaded by a user to generate a character model of the target image, wherein the target image comprises a portrait.
Optionally, the target image uploaded by the user is a clear photograph containing a portrait, so that style fusion can be performed on the portrait in the target image to generate an expression.
Optionally, the target image may consist of 5 to 20 portrait photos.
Further, portrait detection is performed on the target image to determine whether the target image meets a preset condition; if the target image does not meet the preset condition, the user is instructed to upload the target image again.
Since the expression is generated by processing the facial features in the portrait photo, the target image needs to contain the facial features; that is, the preset condition in this embodiment of the application is that the facial features are included.
As a possible implementation, the user uploads 5 clear portrait photos as the target image and portrait detection is performed on them; when the preset condition is not met, a pop-up window guides the user to upload the target image again. For example, if portrait detection finds no eyes in the target image, a pop-up reading "No eyes were recognized, please upload the photos again" is displayed.
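The application does not name the detector used for this portrait check. As a minimal sketch, assuming OpenCV and its bundled Haar cascades are available, the "eyes are present" preset condition and the corresponding pop-up message could be checked as follows; the function name and detection thresholds are illustrative assumptions.

```python
# Minimal sketch (detector choice and thresholds are assumptions): check the preset
# condition that a face with visible eyes is present in an uploaded photo.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def meets_preset_condition(image_path: str) -> tuple[bool, str]:
    """Return (ok, message); the message is shown in the pop-up window when ok is False."""
    img = cv2.imread(image_path)
    if img is None:
        return False, "The image could not be read, please upload the photo again"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False, "No face was recognized, please upload the photo again"
    x, y, w, h = faces[0]
    eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
    if len(eyes) == 0:
        return False, "No eyes were recognized, please upload the photo again"
    return True, "OK"
```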
Further, when the target image uploaded by the user meets the preset condition, the target image is uploaded to the server side for character model training.
In the embodiment of the application, character model training is performed with an underlying large model (base model) to generate the character model of the target image; the character model is used for extracting the facial features of the person in the target image.
For example, the open-source Stable Diffusion 1.5 model is selected for character model training. The principle of the Stable Diffusion model is that noise is added to a real image and then gradually removed by a neural network; as the noise is removed step by step, the real image is gradually recovered. The Stable Diffusion model is open source and can be downloaded directly.
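For reference, the forward noising relation just described is usually written as follows in the diffusion-model literature (this formula is standard background and is not taken from the patent text):

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

where x_0 is the real image and \bar{\alpha}_t shrinks toward 0 as the step t grows. The neural network is trained to predict \epsilon; sampling then runs the process in reverse, starting from pure noise and gradually recovering a clean image.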
In the embodiment of the application, the process of training the character model is as follows:
firstly, selecting an open-source Stable diffusion1.5 model as a bottom layer large model; and determining style prompt words based on the target image, and adjusting parameters based on the bottom layer large model to experiment different effects so as to achieve realistic character skin, expression and the like.
In the embodiment of the application, the style prompt words relate to a male keyword, a female keyword and a reverse keyword, and when the condition that the target image uploaded by the user is a male picture is monitored, a male keyword and reverse keyword generation result is selected; and selecting a female keyword and a reverse keyword to generate a result under the condition that the target image uploaded by the user is the female picture.
Wherein, male keywords such as back shadow, upper body, grin, jacket, etc.; female keywords such as Barbie, girl, red lips, earrings, skirt, etc.; the reverse keywords are used to describe what is not desired to appear in the image, mainly to prevent the appearance of some malformed, illegal or other inappropriate pictures.
In the present embodiment, the parameters include, but are not limited to, the number of sampling steps, face restoration, the high-resolution (hi-res) fix, the number of images generated per batch, and the like.
The more sampling steps there are, the smaller and more precise each denoising step from noise to image becomes, and the longer it takes the character model to output an image. In the embodiment of the present application, the number of sampling steps is at least 20; for example, the number of sampling steps may be 30.
Face restoration is used to improve the quality of generated faces, and the hi-res fix is used to improve the overall image quality.
The number of images generated per batch controls how many images the character model outputs in one run; for example, it may be 5.
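To illustrate how the prompt keywords and parameters above fit together, the following is a minimal sketch using the open-source diffusers library to run Stable Diffusion 1.5 with female keywords, reverse (negative) keywords, 30 sampling steps, and 5 images per batch. The model identifier, prompt wording, and file names are assumptions; face restoration and the hi-res fix are separate post-processing steps and are omitted here.

```python
# Minimal sketch (model id and prompt wording are assumptions): mapping the parameters
# described above onto the open-source Stable Diffusion 1.5 pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Female keywords; male keywords would be substituted when the uploaded photos are detected as male.
prompt = "girl, Barbie style, red lips, earrings, skirt, realistic skin"
negative_prompt = "deformed, disfigured, extra limbs, lowres, watermark"  # reverse keywords

result = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,    # sampling steps, at least 20 per the embodiment
    num_images_per_prompt=5,   # number of images generated per batch
)
for i, image in enumerate(result.images):
    image.save(f"candidate_{i}.png")
```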
After the initial parameters are entered, the base model can automatically generate a character model; whether the generated character model meets expectations is then checked, and if not, the parameters are adjusted until the generated character model meets expectations and training is completed.
The trained character model and its related parameters are then uploaded to the server.
Step 203, pulling the target style model; performing model fusion on the character model and the target style model to generate a target style photo; and performing photo driving on the target style photo with the preset expression to acquire the first target expression.
Optionally, the target style model selected by the user is pulled from the dynamic material library on the server, and the character model and the preset expression are fused by using the checkpoint (model-merging) function of the Stable Diffusion model.
Optionally, model fusion is performed on the character model and the target style model, and the fusion generates a stylized character photo in the target style.
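The checkpoint fusion mentioned above is commonly realized as a weighted average of the two models' weights, as in the checkpoint merger of the Stable Diffusion web UI. A minimal sketch of that idea is shown below; the merge ratio and file names are assumptions rather than values from the patent.

```python
# Minimal sketch (merge ratio and file names are assumptions): weighted-average merging
# of the trained character model and the target style model, key by key.
import torch

def merge_checkpoints(character_path: str, style_path: str, out_path: str, alpha: float = 0.5):
    """Blend two checkpoints as alpha * character + (1 - alpha) * style."""
    character = torch.load(character_path, map_location="cpu")
    style = torch.load(style_path, map_location="cpu")
    # Some .ckpt files nest the weights under a "state_dict" key.
    character = character.get("state_dict", character)
    style = style.get("state_dict", style)

    merged = {}
    for key, value in character.items():
        if key in style and torch.is_tensor(value) and value.shape == style[key].shape:
            merged[key] = alpha * value.float() + (1.0 - alpha) * style[key].float()
        else:
            merged[key] = value  # keep the character weights where the style model has no match
    torch.save({"state_dict": merged}, out_path)

# The 0.5 ratio is only an example; in practice it would be tuned per style.
merge_checkpoints("character_model.ckpt", "style_surprised.ckpt", "fused_style_model.ckpt")
```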
Further, photo driving is performed on the target style photo with the preset expression, so that the person in the target image moves, yielding the first target expression.
As a possible implementation, the target style model includes a surprised expression action of a character; after the character model and the target style model are fused, the surprised expression action is used to drive the target style photo, generating the surprised expression action of the person in the target image, i.e., the first target expression.
The surprised expression action corresponds to an image driving instruction containing a facial expression; according to the image driving instruction, the person in the target image is converted from a static state into a dynamic state making the corresponding surprised expression, so that the first target expression is generated.
Alternatively, the expression action contained in the target style model may be an expression action of a cartoon character.
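The application does not disclose which driving algorithm converts the static photo into motion. The following is only a structural sketch in the spirit of keypoint-based motion-transfer methods; motion_model, extract_keypoints, and warp_with_motion are hypothetical placeholders standing in for whatever driving model is actually deployed, not a real library API.

```python
# Structural sketch only (motion_model and its methods are hypothetical placeholders):
# drive the static target-style photo with the preset expression clip, frame by frame.
import imageio.v2 as imageio

def drive_photo(style_photo, driving_video_path, motion_model):
    driving_frames = imageio.mimread(driving_video_path, memtest=False)
    source_kp = motion_model.extract_keypoints(style_photo)              # hypothetical call
    out_frames = []
    for frame in driving_frames:
        driving_kp = motion_model.extract_keypoints(frame)               # hypothetical call
        warped = motion_model.warp_with_motion(style_photo, source_kp, driving_kp)  # hypothetical call
        out_frames.append(warped)
    return out_frames  # the first target expression as a sequence of frames
```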
Step 204, in response to the preset template material being a dynamic sticker, requesting a sticker service for the first target expression to acquire a fusion video of the first target expression and the preset template material; and converting the fusion video into a dynamic image format to obtain the second target expression.
Optionally, when the preset template material is a dynamic sticker, a sticker service is requested for the generated first target expression, so that the dynamic sticker matched with the target style model is fused with the first target expression.
As a possible implementation, the preset template material included in the target style model is a dynamic "OMG" text sticker; the dynamic sticker service of the target style model is requested for the first target expression, so that the dynamic "OMG" text sticker is added to the first target expression and a fusion video is generated.
Further, the fusion video contains the expression action and the dynamic sticker of the target style model, and the fusion video is converted into a dynamic image format, for example the Graphics Interchange Format (GIF), to obtain the second target expression.
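As a minimal sketch of this last stage, assuming imageio (with its ffmpeg plugin) and Pillow are available, the dynamic sticker frames can be composited over the driven expression frames and the result written out as a GIF; the file names, sticker position, and frame duration below are assumptions.

```python
# Minimal sketch (file names, sticker position and frame timing are assumptions):
# composite the dynamic sticker over the first-target-expression frames and export
# the fusion result in GIF format, i.e. the second target expression.
import imageio.v2 as imageio
from PIL import Image

expression_frames = imageio.mimread("first_target_expression.mp4", memtest=False)
sticker_frames = imageio.mimread("stickers/omg_text.gif")

fused = []
for i, frame in enumerate(expression_frames):
    base = Image.fromarray(frame).convert("RGBA")
    sticker = Image.fromarray(sticker_frames[i % len(sticker_frames)]).convert("RGBA")
    base.alpha_composite(sticker, dest=(10, 10))  # sticker position chosen arbitrarily here
    fused.append(base.convert("RGB"))

# Write the fusion video in GIF format; 80 ms per frame (about 12.5 fps) is an assumption.
fused[0].save("second_target_expression.gif", save_all=True,
              append_images=fused[1:], duration=80, loop=0)
```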
The input target image is thus converted from a character photo into a stylized image containing the expression action and the dynamic sticker of the target style model, and this stylized image serves as the finally acquired second target expression.
Further, the second target expression is returned to the user, where the second target expression includes the preset expression action and the preset dynamic sticker.
By modeling the real character photo, a stylized character picture is generated; on the basis of this stylized character picture, it is combined with different target style models to obtain combinations of various expression actions and dynamic stickers, so that a personalized expression unique to the user can be generated quickly to express the user's emotion and enrich the ways in which expressions can be used.
In this embodiment, a target style model selected by the user is determined, where the target style model at least comprises a preset expression and a preset template material; a target image uploaded by the user is received to generate a character model of the target image, where the target image comprises a portrait; the target style model is pulled; model fusion is performed on the character model and the target style model to generate a target style photo; photo driving is performed on the target style photo with the preset expression to obtain a first target expression; in response to the preset template material being a dynamic sticker, a sticker service is requested for the first target expression to acquire a fusion video of the first target expression and the preset template material; and the fusion video is converted into a dynamic image format to obtain a second target expression. The character model of the target image can be fused with the target style model, combining multiple expressions and template materials, so that a personalized style is merged in while the user's likeness is preserved, an expression that conveys the user's emotion is generated quickly and accurately, and the user experience is improved.
In order to achieve the above embodiment, the present application further provides an expression generating apparatus.
Fig. 3 is a schematic structural diagram of an expression generating apparatus according to an embodiment of the present application.
As shown in fig. 3, the expression generating apparatus includes: a determining module 301, a receiving module 302, a first fusing module 303 and a second fusing module 304.
The determining module 301 is configured to determine a target style model selected by a user, where the target style model includes at least a preset expression and a preset template material;
a receiving module 302, configured to receive a target image uploaded by a user, so as to generate a character model of the target image, where the target image includes a portrait;
the first fusion module 303 is configured to fuse the character model and a preset expression to generate a first target expression;
the second fusion module 304 is configured to fuse the first target expression with a preset template material, so as to generate a second target expression.
Further, in a possible implementation manner of the embodiment of the present application, the receiving module 302 is further configured to:
training an image generation model based on the target image to generate the character model;
performing portrait detection on the target image to determine whether the target image meets a preset condition; and
in response to the target image not meeting the preset condition, instructing the user to upload the target image again.
Further, in one possible implementation manner of the embodiment of the present application, the first fusing module 303 is further configured to:
pulling a target style model;
model fusion is carried out on the character model and the target style model so as to generate a target style photo;
and performing photo driving on the target style photo by using the preset expression to acquire the first target expression.
Further, in one possible implementation manner of the embodiment of the present application, the second fusing module 304 is further configured to:
in response to the preset template material being a dynamic sticker, requesting a sticker service for the first target expression to acquire a fusion video of the first target expression and the preset template material;
and converting the fusion video into a dynamic image format to obtain the second target expression.
It should be noted that the explanation of the embodiment of the expression generating method is also applicable to the expression generating device of this embodiment, and will not be repeated here.
In this embodiment, a target style model selected by the user is determined, where the target style model at least comprises a preset expression and a preset template material; a target image uploaded by the user is received to generate a character model of the target image, where the target image comprises a portrait; the character model and the preset expression are fused to generate a first target expression; and the first target expression is fused with the preset template material to generate a second target expression. The character model of the target image can be fused with the target style model, combining multiple expressions and template materials, so that a personalized style is merged in while the user's likeness is preserved, an expression that conveys the user's emotion is generated quickly and accurately, and the user experience is improved.
In order to achieve the above embodiments, the present application further proposes an electronic device including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the methods provided by the previous embodiments.
In order to implement the above-mentioned embodiments, the present application also proposes a computer-readable storage medium in which computer-executable instructions are stored, which when executed by a processor are adapted to implement the methods provided by the foregoing embodiments.
In order to implement the above embodiments, the present application also proposes a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the above embodiments.
The processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the personal information of the user related in the application all accord with the regulations of related laws and regulations, and do not violate the popular public order.
It should be noted that personal information from users should be collected for legitimate and reasonable uses and not shared or sold outside of these legitimate uses. In addition, such collection/sharing should be performed after receiving user informed consent, including but not limited to informing the user to read user agreements/user notifications and signing agreements/authorizations including authorization-related user information before the user uses the functionality. In addition, any necessary steps are taken to safeguard and ensure access to such personal information data and to ensure that other persons having access to the personal information data adhere to their privacy policies and procedures.
The present application contemplates embodiments that may provide a user with selective prevention of use or access to personal information data. That is, the present disclosure contemplates that hardware and/or software may be provided to prevent or block access to such personal information data. Once personal information data is no longer needed, risk can be minimized by limiting data collection and deleting data. In addition, personal identification is removed from such personal information, as applicable, to protect the privacy of the user.
In the foregoing descriptions of embodiments, descriptions of the terms "one embodiment," "some embodiments," "example," "particular example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (10)

1. An expression generating method, characterized by comprising the following steps:
determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material;
receiving a target image uploaded by the user to generate a character model of the target image, wherein the target image comprises a portrait;
fusing the character model and the preset expression to generate a first target expression;
and fusing the first target expression with the preset template material to generate a second target expression.
2. The expression generating method according to claim 1, wherein the receiving the target image uploaded by the user to generate the character model of the target image comprises:
training an image generation model based on the target image to generate the character model.
3. The expression generating method according to claim 1 or 2, characterized by further comprising, after said receiving the target image uploaded by the user:
and carrying out portrait detection on the target image to determine whether the target image meets preset conditions.
4. The expression generating method according to claim 3, wherein the preset condition is that facial features are included, the method further comprising:
and responding to the fact that the target image does not meet the preset condition, and indicating the user to upload the target image again.
5. The expression generating method according to claim 1, wherein the fusing the character model and the preset expression to generate the first target expression includes:
pulling the target style model;
model fusion is carried out on the character model and the target style model so as to generate a target style photo;
and performing photo driving on the target style photo by using the preset expression to acquire the first target expression.
6. The expression generating method according to claim 1, wherein the fusing the first target expression with the preset template material to generate a second target expression includes:
in response to the preset template material being a dynamic sticker, requesting a sticker service for the first target expression to acquire a fusion video of the first target expression and the preset template material;
and converting the fusion video into a dynamic image format to obtain the second target expression.
7. An expression generating apparatus, comprising:
the determining module is used for determining a target style model selected by a user, wherein the target style model at least comprises a preset expression and a preset template material;
the receiving module is used for receiving the target image uploaded by the user to generate a character model of the target image, wherein the target image comprises a portrait;
the first fusion module is used for fusing the character model and the preset expression to generate a first target expression;
and the second fusion module is used for fusing the first target expression with the preset template material to generate a second target expression.
8. The expression generating apparatus of claim 7, wherein the receiving module is further configured to:
training an image generation model based on the target image to generate the character model.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-6.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-6.
CN202311600735.3A 2023-11-27 2023-11-27 Expression generating method and device, electronic equipment and storage medium Pending CN117689770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311600735.3A CN117689770A (en) 2023-11-27 2023-11-27 Expression generating method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311600735.3A CN117689770A (en) 2023-11-27 2023-11-27 Expression generating method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117689770A (en) 2024-03-12

Family

ID=90130971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311600735.3A Pending CN117689770A (en) 2023-11-27 2023-11-27 Expression generating method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117689770A (en)

Similar Documents

Publication Publication Date Title
Kietzmann et al. Deepfakes: Trick or treat?
US10861210B2 (en) Techniques for providing audio and video effects
US20200342576A1 (en) Digital Image Completion by Learning Generation and Patch Matching Jointly
US8988436B2 (en) Training system and methods for dynamically injecting expression information into an animated facial mesh
CN108961369B (en) Method and device for generating 3D animation
US10452920B2 (en) Systems and methods for generating a summary storyboard from a plurality of image frames
US20150143209A1 (en) System and method for personalizing digital content
KR20210118437A (en) Image display selectively depicting motion
US20190273972A1 (en) User interface elements for content selection in media narrative presentation
Khodabakhsh et al. A taxonomy of audiovisual fake multimedia content creation technology
Lehmuskallio The camera as a sensor: The visualization of everyday digital photography as simulative, heuristic and layered pictures
CN116501432A (en) Vehicle wallpaper generation method and device, electronic equipment and readable storage medium
CN117689770A (en) Expression generating method and device, electronic equipment and storage medium
US11086928B2 (en) Composable templates for managing disturbing image and sounds
US20240045992A1 (en) Method and electronic device for removing sensitive information from image data
MacKinnon Autobiography and authenticity in stop-motion animation
KR20160010810A (en) Realistic character creation method and creating system capable of providing real voice
KR101824677B1 (en) Social Network Service System and Method Using User's Emotion Expression
CN116564272A (en) Method for providing voice content and electronic equipment
US20200388270A1 (en) Speech synthesizing devices and methods for mimicking voices of children for cartoons and other content
KR100965622B1 (en) Method and Apparatus for making sensitive character and animation
US20230394715A1 (en) Hierarchical model-based generation of images
JP6742731B2 (en) Neomedia generation device, neomedia generation method, and neomedia generation program
CN118212329A (en) Single-image half-body human body audio driving method and system based on diffusion model
CN114840167A (en) Sound effect generation method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination