CN113409437B - Virtual character face pinching method and device, electronic equipment and storage medium - Google Patents

Virtual character face pinching method and device, electronic equipment and storage medium

Info

Publication number
CN113409437B
CN113409437B (application CN202110697554.1A)
Authority
CN
China
Prior art keywords
skeleton point
face image
user
image sample
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110697554.1A
Other languages
Chinese (zh)
Other versions
CN113409437A (en)
Inventor
杨司琪
高永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202110697554.1A priority Critical patent/CN113409437B/en
Publication of CN113409437A publication Critical patent/CN113409437A/en
Application granted granted Critical
Publication of CN113409437B publication Critical patent/CN113409437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50Controlling the output signals based on the game progress
    • A63F13/52Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80Special adaptations for executing a specific game genre or game mode
    • A63F13/822Strategy games; Role-playing games
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/66Methods for processing data by generating or executing the game program for rendering three dimensional images
    • A63F2300/6607Methods for processing data by generating or executing the game program for rendering three dimensional images for animating game characters, e.g. skeleton kinematics
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/807Role playing or strategy games
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method, a device, electronic equipment and a storage medium for pinching a face of a virtual character, wherein the method comprises the following steps: responding to an acquisition request of a user, and acquiring a user face image of the user; extracting skeleton point parameters of the face image of the user by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the face image of the user; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters; and performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the face image of the user. The skeleton point generation network in the method can learn the characteristics of the real user face and the virtual face at the same time, and the preset skeleton point parameters can also provide data support for the skeleton point parameters corresponding to the user face image, so that the virtual face image similar to the user face can be quickly rendered, and the face pinching efficiency and the similarity are high.

Description

Virtual character face pinching method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, in particular to a method, a device, electronic equipment and a storage medium for pinching a face of a virtual character.
Background
With the development of computer technology and mobile terminals, more and more role-playing games are emerging. In order to meet the personalized requirements of different players, a face pinching function is generally provided when creating a player's virtual character, so that players can create characters according to their own preferences, for example, virtual characters resembling their own real faces.
In the related art, a face pinching scheme is provided in which a specific bone point parameter regressor is trained for each user, and a game character can be rendered by using the bone point parameters once the bone point parameter regressor outputs corresponding bone point parameters. Because this face pinching scheme trains the bone point parameter regressor in an iterative regression manner, the training process converges slowly, and different bone point parameter regressors need to be trained for different users, which is time-consuming and labor-consuming.
Disclosure of Invention
The embodiment of the disclosure at least provides a method, a device, electronic equipment and a storage medium for pinching a face of a virtual character.
In a first aspect, an embodiment of the present disclosure provides a method for pinching a face of a virtual character, the method including:
responding to an acquisition request of a user, and acquiring a user face image of the user;
extracting skeleton point parameters of the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
and performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the face image of the user.
In a possible implementation manner, the performing face rendering based on the skeletal point parameter to obtain a virtual face image corresponding to the user face image includes:
inputting the skeleton point parameters to a preset rendering engine so that the preset rendering engine reconstructs face information based on the skeleton point parameters;
and receiving the face information reconstructed by the preset rendering engine, and determining a virtual face image corresponding to the user face image based on the face information.
In one possible embodiment, the skeletal point generating network includes a feature extraction layer and a full connection layer; the step of extracting skeleton point parameters of the face image of the user by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the face image of the user comprises the following steps:
inputting the face image of the user to a feature extraction layer included in the skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into a full-connection layer included in the skeleton point generation network to obtain skeleton point parameters output by the full-connection layer;
and determining the skeleton point parameters corresponding to the face image of the user based on the skeleton point parameters output by the full connection layer.
In one possible embodiment, the skeletal point generating network is trained as follows:
acquiring a user face image sample and generating a virtual face image sample based on preset skeleton point parameters;
respectively carrying out feature extraction on the user face image sample and the virtual face image sample by using a feature extraction layer included in the skeleton point generation network to obtain a first image sample feature and a second image sample feature; extracting skeleton point parameters of the second image sample features by utilizing a full-connection layer included in the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image samples;
Determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset bone point parameter and the predicted bone point parameter;
and adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain a trained skeleton point generation network.
In a possible implementation manner, the method further includes a domain discrimination network, the determining a first loss function value based on the first image sample feature and the second image sample feature includes:
inputting the first image sample characteristics and the second image sample characteristics into the domain discrimination network to obtain a loss function value output by the domain discrimination network;
and determining the first loss function value based on the loss function value output by the domain discrimination network.
In a possible implementation manner, the adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain a trained skeleton point generation network includes:
judging whether the loss function sum value corresponding to the first loss function value and the second loss function value is smaller than a preset threshold value or not;
If not, adjusting any one or more networks of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and a second loss function value based on the adjusted networks;
and obtaining a trained network until the sum of the loss functions corresponding to the adjusted first loss function value and the second loss function value is smaller than a preset threshold value.
In one possible implementation manner, the inputting the first image sample feature and the second image sample feature into the domain discrimination network, to obtain the loss function value output by the domain discrimination network, includes:
inputting the first image sample characteristics into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category output by the domain discrimination network and a first labeling category indicated by the user face image sample; and
inputting the second image sample characteristics into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category output by the domain discrimination network and a second labeling category indicated by the virtual face image sample;
And determining a loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
In one possible embodiment, the method further comprises a gradient inversion layer connected with the domain discrimination network; the method further comprises the steps of:
inverting the gradient value corresponding to the loss function value output by the domain discrimination network by utilizing the gradient inversion layer to obtain an inverted gradient value;
and adjusting the skeleton point generation network according to the inverted gradient value.
In a possible implementation manner, the virtual face image sample is obtained according to the following steps:
responding to the input request of the user, and acquiring preset skeleton point parameters input by the user;
and carrying out face rendering on the preset skeleton point parameters by using a preset rendering engine to obtain a virtual face image sample.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for pinching a face of a virtual character, where the apparatus includes:
the acquisition module is used for responding to an acquisition request of a user and acquiring a user face image of the user;
the generation module is used for extracting skeleton point parameters of the user face image by utilizing the trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
And the rendering module is used for performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of pinching a face by a virtual character as described in any of the first aspect and its various embodiments.
In a fourth aspect, the presently disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of pinching faces for virtual characters as described in the first aspect and any of its various embodiments.
According to the scheme for pinching the face of the virtual character, under the condition that the face image of the user is obtained, the skeleton point parameters of the face image of the user can be extracted by utilizing the trained skeleton point generation network, so that the skeleton point parameters corresponding to the face image of the user can be obtained. The skeleton point generation network is used as a general network and can be obtained by training based on the user face image sample and the virtual face image sample, so that any real user face can extract corresponding skeleton point parameters through the skeleton point generation network, and time and labor are saved.
Meanwhile, the skeleton point generation network can learn the characteristics of the user face and the virtual face at the same time, and the preset skeleton point parameters can also provide data support for skeleton point parameters corresponding to the user face image, so that the virtual face image similar to the user face can be quickly rendered, and the face pinching efficiency and the similarity are high.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below; these drawings are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure, and together with the description serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may obtain other relevant drawings from these drawings without inventive effort.
FIG. 1 illustrates a flow chart of a method for pinching a face by a virtual character provided by embodiments of the present disclosure;
fig. 2 illustrates an application diagram of a method for pinching a face by a virtual character according to an embodiment of the present disclosure;
FIG. 3 illustrates an application diagram of another method for pinching faces by virtual characters provided by embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of an apparatus for pinching faces by virtual characters provided by an embodiment of the present disclosure;
fig. 5 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" is used herein to describe only one relationship, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research has found that the face pinching scheme provided in the related art trains a specific bone point parameter regressor for each user, and a game character can be rendered by using the bone point parameters once the bone point parameter regressor outputs corresponding bone point parameters. Because this face pinching scheme trains the bone point parameter regressor in an iterative regression manner, the training process converges slowly, and different bone point parameter regressors need to be trained for different users, which is time-consuming and labor-consuming.
Based on the above researches, the disclosure provides a method, a device, an electronic device and a storage medium for pinching a face of a virtual character.
For the understanding of the present embodiment, first, a detailed description will be given of a method for pinching a face by a virtual character disclosed in the embodiments of the present disclosure, where an execution subject of the method for pinching a face by a virtual character provided in the embodiments of the present disclosure is generally a computer device having a certain computing capability, and the computer device includes, for example: the terminal device, or server or other processing device, may be a User Equipment (UE), mobile device, user terminal, cellular telephone, cordless telephone, personal digital assistant (Personal Digital Assistant, PDA), handheld device, computing device, vehicle mounted device, wearable device, etc. In some possible implementations, the method of pinching the face of the avatar may be implemented by way of a processor invoking computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a method for pinching a face of a virtual character according to an embodiment of the disclosure is shown, where the method includes steps S101 to S103, where:
s101: responding to an acquisition request of a user, and acquiring a user face image of the user;
S102: extracting skeleton point parameters of the face image of the user by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the face image of the user; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
s103: and performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the face image of the user.
Here, in order to facilitate understanding of the method for pinching the face of a virtual character provided by the embodiment of the present disclosure, an application scenario of the method will first be described in detail. The method for pinching the face of a virtual character provided by the embodiment of the present disclosure can mainly be applied to the technical field of games or other related technical fields with face pinching requirements. Taking the field of game technology as an example, when a game user intends to use a virtual character to experience a game, it is often desirable that the virtual character used be more similar to the real person, so as to further enhance the sense of immersion in the game.
In order to meet the personalized needs of users, a corresponding face pinching scheme is provided in the related art, in which a specific skeletal point parameter regressor can be trained for each user, and the skeletal point parameters output by the regressor are used to render game characters. However, the above face pinching scheme trains the bone point parameter regressor in an iterative regression manner, so the training process converges slowly, and different bone point parameter regressors need to be trained for different users, which is time-consuming and labor-consuming.
In order to solve the above-mentioned problem, the embodiment of the present disclosure provides a face pinching scheme built around a skeleton point generation network that is trained with user face image samples together with virtual face image samples generated from preset skeleton point parameters. Because training constrains both the parameter matching degree between the output skeleton point parameters and the preset skeleton point parameters and the feature matching degree between the virtual face image samples and the corresponding user face image samples, the virtual face image finally rendered for a user from the user face image is more similar to that user face image; and because the skeleton point generation network serves as a general network, the face pinching efficiency is high.
The face image of the user may be a face image of a related user obtained after the authorization of the user, where the face image may be a face image obtained by shooting the user with the currently handheld user terminal, or may be a face image pre-stored in the user terminal, which is not specifically limited in the embodiment of the present disclosure.
The skeleton point generation network can be obtained by training based on the user face image sample and the virtual face image sample, namely, the generation network can learn the characteristics of the real user face and the characteristics of the virtual face at the same time, so that the similarity between the real face (corresponding to the user face) and the virtual face obtained by finally pinching the face is restrained through the consistency of the characteristics.
In the embodiment of the disclosure, based on the bone point parameters corresponding to the face image of the user, a virtual face image corresponding to the face image of the user can be obtained. Specifically, the method can be determined by the following steps:
step one, inputting skeleton point parameters into a preset rendering engine so that the preset rendering engine rebuilds face information based on the skeleton point parameters;
and step two, receiving face information reconstructed by a preset rendering engine, and determining a virtual face image corresponding to the face image of the user based on the face information.
Here, the face rendering may be implemented using a preset rendering engine. In the technical field of games, the preset rendering engine may be a game engine, and the game engine may reconstruct the input skeleton point parameters into face information. Since a three-dimensional face model can be determined from the relevant parameters of each bone point, once the bone point parameters corresponding to the user face image are determined, the three-dimensional face model can be determined from the corresponding parameter values, and the rendered virtual face image can then be obtained through the conversion relation between the three-dimensional space and the two-dimensional space.
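As a non-limiting illustration of the two steps above, the following is a minimal Python sketch assuming a hypothetical rendering-engine client; the class name RenderEngineClient and its methods are illustrative placeholders and do not correspond to any particular game engine's API.

```python
# Sketch of: (1) sending bone point parameters to a preset rendering engine,
# (2) receiving reconstructed face information and producing the virtual face image.
from typing import Sequence


class RenderEngineClient:
    """Hypothetical wrapper around a preset rendering engine (e.g. a game engine)."""

    def reconstruct_face(self, bone_params: Sequence[float]) -> dict:
        # The engine would deform a 3D face template according to the
        # bone point parameters and return the reconstructed face information.
        raise NotImplementedError

    def project_to_image(self, face_info: dict):
        # Convert the 3D face information into a 2D virtual face image
        # via the engine's 3D-to-2D projection.
        raise NotImplementedError


def render_virtual_face(engine: RenderEngineClient, bone_params: Sequence[float]):
    face_info = engine.reconstruct_face(bone_params)   # step one
    return engine.project_to_image(face_info)          # step two
```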
The skeleton point generation network in the embodiment of the disclosure may include a feature extraction layer for performing feature extraction and a full connection layer for generating skeleton point parameters, which may be specifically implemented by the following steps:
step one, inputting a face image of a user to a feature extraction layer included in a skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into a full-connection layer included in a skeleton point generation network to obtain skeleton point parameters output by the full-connection layer;
and thirdly, determining skeleton point parameters corresponding to the face image of the user based on the skeleton point parameters output by the full connection layer.
Here, the user face image may be input to the feature extraction layer first to extract image features, and in the case that the image features are input to the full connection layer, bone point parameters corresponding to the user face image may be determined.
It should be noted that the feature extraction layer in the embodiments of the present disclosure may be composed of four residual convolution modules. The fully connected layer may be a single layer or multiple layers, and different numbers of fully connected layers may be configured for different training purposes; for example, three fully connected layers may be provided.
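For illustration, a minimal PyTorch-style sketch of such a network is given below: a feature extraction layer built from four residual convolution modules followed by three fully connected layers. The channel widths, the pooling scheme and the number of bone point parameters (num_params) are assumptions made only for this sketch.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One residual convolution module of the feature extraction layer."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))


class SkeletonPointNet(nn.Module):
    """Skeleton point generation network: feature extraction layer + fully connected layers."""
    def __init__(self, num_params=200):
        super().__init__()
        # Feature extraction layer: four residual convolution modules.
        self.features = nn.Sequential(
            ResidualBlock(3, 64), ResidualBlock(64, 128),
            ResidualBlock(128, 256), ResidualBlock(256, 512),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Three fully connected layers mapping image features to bone point parameters.
        self.fc = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_params))

    def forward(self, face_image):
        feat = self.features(face_image)    # image features
        return self.fc(feat), feat          # bone point parameters, features
```

With this sketch, a batch of face images of shape (N, 3, H, W) yields (N, num_params) bone point parameters; the feature vector is also returned because the training procedure described below reuses it.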
To facilitate an understanding of the above-described bone point parameter determination process, further description will be provided with reference to fig. 2.
As shown in fig. 2, for an acquired user face image (i.e., a real face image), the image features output by the feature extraction layer can be determined after the image passes through the feature extraction layer included in the skeleton point generation network, and these image features can be input to three sequentially connected fully connected layers, so that the skeleton point parameters corresponding to the user face image are obtained. It can be seen that the mapping performed by the fully connected layers here is the correspondence between image features and skeleton point parameters.
The training process of the skeleton point generation network is considered to be a key step for realizing the face pinching of the virtual character, and the training process will be specifically described below.
The method for pinching the face of the virtual character provided by the embodiment of the disclosure can train skeleton points to generate a network through the following steps:
step one, acquiring a user face image sample and generating a virtual face image sample based on preset skeleton point parameters;
step two, respectively carrying out feature extraction on a user face image sample and a virtual face image sample by utilizing a feature extraction layer included in a skeleton point generation network to obtain a first image sample feature and a second image sample feature; extracting skeleton point parameters of the second image sample features by utilizing a full-connection layer included in the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image samples;
Determining a first loss function value based on the first image sample characteristic and the second image sample characteristic, and determining a second loss function value based on a preset bone point parameter and a predicted bone point parameter;
and step four, adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain a trained skeleton point generation network.
In the embodiment of the disclosure, the training process and the application process of the skeleton point generation network are the same, and the feature extraction layer included in the skeleton point generation network is required to extract features and the full connection layer included in the skeleton point generation network is required to generate skeleton point parameters.
Unlike the application stage, in which skeleton point parameters are determined directly from a single user face image, the training stage mainly determines the skeleton point parameters corresponding to virtual faces. This is because determining skeleton point parameters directly from a user face image is difficult: professional art personnel are often required to pinch out a game face resembling a real face by adjusting hundreds of parameters, and a single image can take about one hour or more. A virtual face image, by contrast, can be generated from preset skeleton point parameters input by a user and a preset rendering engine, which saves time and labor. In addition, the preset skeleton point parameters can serve as reference information for the predicted skeleton point parameters output by the fully connected layer, so the skeleton point generation network can be trained through the parameter matching degree.
In order to approximate the virtual face to the real face, here, training of the skeletal point generation network may be constrained by feature consistency between the first image features (features corresponding to the user face image samples) and the second image features (features corresponding to the virtual face image samples).
The feature consistency may correspond to a first loss function value, the parameter matching degree may correspond to a second loss function value, and the bone point generating network is adjusted by the two loss function values together, so as to obtain a trained bone point generating network.
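A minimal sketch of one training iteration's loss computation is given below, reusing the SkeletonPointNet sketch above. The use of mean-squared error for the second (parameter matching) loss is an assumption of this sketch; `domain_net` stands for the domain discrimination network, a possible implementation of which is sketched after the description that follows.

```python
import torch.nn.functional as F


def training_losses(net, domain_net, real_face, virtual_face, preset_params):
    _, real_feat = net(real_face)                 # first image sample feature
    pred_params, virt_feat = net(virtual_face)    # second image sample feature + predicted params
    # Second loss: parameter matching degree between preset and predicted bone point parameters.
    second_loss = F.mse_loss(pred_params, preset_params)
    # First loss: feature consistency, obtained via the domain discrimination network.
    first_loss = domain_net.domain_loss(real_feat, virt_feat)
    return first_loss, second_loss
```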
In an embodiment of the present disclosure, the determining of the first loss function value may be implemented based on a domain discrimination network, and may specifically be implemented by the following steps:
step one, inputting a first image sample feature and a second image sample feature into a domain discrimination network to obtain a loss function value output by the domain discrimination network;
and step two, determining a first loss function value based on the loss function value output by the domain discrimination network.
In the embodiment of the disclosure, the loss function value of the domain discrimination network output can be determined according to the following steps:
step one, inputting first image sample characteristics into a domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category output by the domain discrimination network and a first labeling category indicated by a user face image sample; inputting the characteristics of the second image sample into a domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category output by the domain discrimination network and a second labeling category indicated by the virtual face image sample;
And step two, determining a loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
Here, assuming that the label of the first labeling category indicated by the user face image sample is 0 and the label of the second labeling category indicated by the virtual face image sample is 1, the first loss function value is optimized so that the features of the real face are predicted as 0 after passing through the domain discrimination network and the features of the virtual face are predicted as 1 after passing through the domain discrimination network. It can be understood that the closer the first image category is to the first labeling category, and the closer the second image category is to the second labeling category, the smaller the loss function value.
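Following the labeling convention above (real face samples labeled 0, virtual face samples labeled 1), a possible sketch of the domain discrimination network and its loss is shown below. The hidden width is an assumption, and binary cross-entropy is one common choice for this two-class comparison rather than a value fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainDiscriminator(nn.Module):
    """Domain discrimination network over image sample features."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 1))

    def forward(self, features):
        return self.net(features)   # one logit per sample: 0 = real face, 1 = virtual face

    def domain_loss(self, real_feat, virt_feat):
        real_logits = self(real_feat)
        virt_logits = self(virt_feat)
        # First comparison result: predicted category vs. label 0 for real face samples.
        loss_real = F.binary_cross_entropy_with_logits(
            real_logits, torch.zeros_like(real_logits))
        # Second comparison result: predicted category vs. label 1 for virtual face samples.
        loss_virt = F.binary_cross_entropy_with_logits(
            virt_logits, torch.ones_like(virt_logits))
        return loss_real + loss_virt
```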
The method for pinching the face of the virtual character provided by the embodiment of the disclosure can adjust the skeleton point generation network based on the first loss function value and the second loss function value, and specifically comprises the following steps:
step one, judging whether the loss function sum value corresponding to the first loss function value and the second loss function value is smaller than a preset threshold value or not;
step two, if not, any one or more networks of a skeleton point generation network and a domain discrimination network are adjusted, and an adjusted first loss function value and an adjusted second loss function value are determined based on the adjusted networks;
And step three, obtaining a trained network until the sum of the loss functions corresponding to the adjusted first loss function value and the second loss function value is smaller than a preset threshold value.
Here, the network adjustment policy may be determined based on the result of comparing the loss function sum value with the preset threshold; for example, only the skeleton point generation network is adjusted, or only the domain discrimination network is adjusted, or both networks are adjusted, until each network converges once the loss function sum value is sufficiently small.
It should be noted that, in the process of adjusting the skeleton point generating network, the adjustment may be performed on the feature extraction layer included in the skeleton point generating network, or may be performed on the full-connection layer, where the adjustment may be selected adaptively according to different training requirements, and the method is not limited specifically.
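A sketch of this adjustment strategy is shown below, reusing the `training_losses` helper from the earlier sketch: the two loss values are summed and the networks are updated until the sum falls below a preset threshold. Updating both networks with a single Adam optimizer, the threshold value and the learning rate are all illustrative assumptions.

```python
import itertools
import torch


def train(net, domain_net, data_loader, threshold=0.05, lr=1e-4, max_steps=100000):
    optimizer = torch.optim.Adam(
        itertools.chain(net.parameters(), domain_net.parameters()), lr=lr)
    for step, (real_face, virtual_face, preset_params) in zip(
            range(max_steps), itertools.cycle(data_loader)):
        first_loss, second_loss = training_losses(
            net, domain_net, real_face, virtual_face, preset_params)
        total = first_loss + second_loss      # loss function sum value
        if total.item() < threshold:          # smaller than the preset threshold
            break                             # networks are considered trained
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
    return net, domain_net
```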
For convenience in describing the training process with respect to the network, the following description is further provided with reference to fig. 3:
As shown in fig. 3, the inputs are a real face image sample (corresponding to the user face image sample) and a virtual face image sample; after they pass through the feature extraction layer included in the skeleton point generation network, the first image sample feature corresponding to the user face image sample and the second image sample feature corresponding to the virtual face image sample can be determined.
The first image sample feature and the second image sample feature are input into a domain discrimination network, and a loss function value output by the domain discrimination network is obtained as a first loss function value. In addition, a predicted bone point parameter may be determined through the last two fully connected layers included in the bone point generation network, and a second loss function value may be determined based on the preset bone point parameter and the predicted bone point parameter.
Training about the network may be performed based on the first loss function value and the second loss function value.
Considering that the method for pinching the face of the virtual character provided by the embodiment of the disclosure aims at generating a virtual face more similar to a real face, here, the countermeasure learning can be realized by using a gradient inversion layer so that a skeleton point generation network cannot distinguish the real face and the virtual face, thereby facilitating the extraction of skeleton point parameters of a user face image to be processed by using the skeleton point generation network in the following.
The gradient inversion layer in embodiments of the present disclosure may be connected to a domain discrimination network. And reversing the gradient value corresponding to the loss function value output by the domain discrimination network by using the gradient reversing layer to obtain a reversed gradient value, and adjusting the skeleton point generation network according to the reversed gradient value.
In this way, after the gradient inversion layer, the features of the real face tend to be predicted as coming from the virtual face and the features of the virtual face tend to be predicted as coming from the real face, thereby achieving the purpose of making the features of the real face and the features of the virtual face consistent.
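For illustration, a minimal sketch of a gradient inversion (reversal) layer as it is commonly implemented in PyTorch is given below: the forward pass is the identity, while the backward pass negates the gradient flowing from the domain discrimination network back into the skeleton point generation network.

```python
import torch


class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)           # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()      # inverted gradient value


def reverse_gradient(features):
    return GradientReversal.apply(features)
```

In the training sketches above, the image sample features would be passed through `reverse_gradient` before entering the domain discrimination network, so that minimizing the discriminator's loss simultaneously pushes the feature extraction layer toward features the discriminator cannot distinguish.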
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a device for pinching a face of a virtual character corresponding to the method for pinching a face of a virtual character, and since the principle of solving the problem by the device in the embodiments of the present disclosure is similar to that of pinching a face of a virtual character in the embodiments of the present disclosure, implementation of the device may refer to implementation of the method, and repeated parts will not be repeated.
Referring to fig. 4, a schematic diagram of a device for pinching a face by a virtual character according to an embodiment of the disclosure is shown, where the device includes: an acquisition module 401, a generation module 402, and a rendering module 403; wherein:
an acquisition module 401, configured to acquire a user face image of a user in response to an acquisition request of the user;
A generating module 402, configured to extract skeleton point parameters of a face image of a user by using a trained skeleton point generating network, so as to obtain skeleton point parameters corresponding to the face image of the user; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
the rendering module 403 is configured to perform face rendering based on the skeleton point parameter, and obtain a virtual face image corresponding to the face image of the user.
According to the embodiment of the disclosure, under the condition that the user face image is obtained, the trained skeleton point generation network can be utilized to extract skeleton point parameters of the user face image so as to obtain skeleton point parameters corresponding to the user face image. The skeleton point generation network is used as a general network and can be obtained by training based on the user face image sample and the virtual face image sample, so that any real user face can extract corresponding skeleton point parameters through the skeleton point generation network, and time and labor are saved.
Meanwhile, the skeleton point generation network can learn the characteristics of the user face and the virtual face at the same time, and the preset skeleton point parameters can also provide data support for skeleton point parameters corresponding to the user face image, so that the virtual face image similar to the user face can be quickly rendered, and the face pinching efficiency and the similarity are high.
In a possible implementation manner, the rendering module 403 is configured to perform face rendering based on the skeleton point parameters according to the following steps to obtain a virtual face image corresponding to the face image of the user:
inputting the skeleton point parameters into a preset rendering engine so that the preset rendering engine reconstructs face information based on the skeleton point parameters;
and receiving the face information reconstructed by the preset rendering engine, and determining a virtual face image corresponding to the face image of the user based on the face information.
In one possible implementation, the skeletal point generating network includes a feature extraction layer and a fully connected layer; the generating module 402 is configured to extract skeleton point parameters of a face image of a user by using the trained skeleton point generating network according to the following steps:
inputting the face image of the user into a feature extraction layer included in a skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into a full-connection layer included in a skeleton point generation network to obtain skeleton point parameters output by the full-connection layer;
and determining skeleton point parameters corresponding to the face image of the user based on the skeleton point parameters output by the full connection layer.
In one possible implementation, the apparatus further includes a training module 404:
a training module 404, configured to train the skeletal point generating network according to the following steps:
acquiring a user face image sample and generating a virtual face image sample based on preset skeleton point parameters;
respectively carrying out feature extraction on a user face image sample and a virtual face image sample by utilizing a feature extraction layer included in a skeleton point generation network to obtain a first image sample feature and a second image sample feature; extracting skeleton point parameters of the second image sample features by utilizing a full-connection layer included in the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image samples;
determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset bone point parameter and the predicted bone point parameter;
and adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain a trained skeleton point generation network.
In a possible implementation manner, the method further includes a domain discrimination network, and the training module 404 is configured to determine a first loss function value based on the first image sample feature and the second image sample feature according to the following steps:
Inputting the first image sample characteristics and the second image sample characteristics into a domain discrimination network to obtain a loss function value output by the domain discrimination network;
a first loss function value is determined based on the loss function value output by the domain discrimination network.
In a possible implementation, the training module 404 is configured to adjust the skeleton point generating network based on the first loss function value and the second loss function value according to the following steps to obtain a trained skeleton point generating network:
judging whether the sum of the loss functions corresponding to the first loss function value and the second loss function value is smaller than a preset threshold value or not;
if not, adjusting any one or more networks of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and a second loss function value based on the adjusted networks;
and obtaining a trained network until the sum of the loss functions corresponding to the adjusted first loss function value and the second loss function value is smaller than a preset threshold value.
In a possible implementation manner, the training module 404 is configured to input the first image sample feature and the second image sample feature into the domain discrimination network according to the following steps to obtain a loss function value output by the domain discrimination network:
inputting the first image sample characteristics into a domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category output by the domain discrimination network and a first labeling category indicated by a user face image sample; and
inputting the second image sample characteristics into a domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category output by the domain discrimination network and a second labeling category indicated by the virtual face image sample;
and determining a loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
In one possible embodiment, the method further comprises a gradient inversion layer connected with the domain discrimination network; training module 404 is further configured to:
inverting the gradient value corresponding to the loss function value output by the domain discrimination network by utilizing the gradient inversion layer to obtain an inverted gradient value;
and adjusting the skeleton point generation network according to the inverted gradient value.
In a possible implementation manner, the obtaining module 401 is configured to obtain a virtual face image sample according to the following steps:
responding to an input request of a user, and acquiring preset skeleton point parameters input by the user;
And carrying out face rendering on the preset skeleton point parameters by using a preset rendering engine to obtain a virtual face image sample.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the disclosure further provides an electronic device, as shown in fig. 5, which is a schematic structural diagram of the electronic device provided by the embodiment of the disclosure, including: a processor 501, a memory 502, and a bus 503. The memory 502 stores machine readable instructions executable by the processor 501 (e.g., execution instructions corresponding to the acquisition module 401, the generation module 402, the rendering module 403, etc. in the apparatus of fig. 4), and when the electronic device is running, the processor 501 communicates with the memory 502 via the bus 503, and the machine readable instructions when executed by the processor 501 perform the following processing:
responding to an acquisition request of a user, acquiring a user face image of the user and a trained skeleton point generation network;
extracting skeleton point parameters of the face image of the user by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the face image of the user; the skeleton point generation network is obtained by training a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
And performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the face image of the user.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for pinching faces by virtual characters described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, where instructions included in the program code may be used to perform the steps of the method for pinching a face by a virtual character described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present disclosure, and are not intended to limit the scope of the disclosure, but the present disclosure is not limited thereto, and those skilled in the art will appreciate that while the foregoing examples are described in detail, it is not limited to the disclosure: any person skilled in the art, within the technical scope of the disclosure of the present disclosure, may modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method for pinching a face of a virtual character, the method comprising:
in response to an acquisition request of a user, acquiring a user face image of the user;
extracting skeleton point parameters from the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, wherein the skeleton point generation network is trained by using a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image;
wherein the training process of the skeleton point generation network comprises: acquiring a user face image sample, and generating a virtual face image sample based on preset skeleton point parameters; performing feature extraction on the user face image sample and the virtual face image sample respectively by using a feature extraction layer included in the skeleton point generation network to obtain a first image sample feature and a second image sample feature; performing skeleton point parameter extraction on the second image sample feature by using a fully connected layer included in the skeleton point generation network to determine predicted skeleton point parameters corresponding to the virtual face image sample; determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters; and adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network.
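For illustration only, the training step recited above can be sketched roughly in PyTorch as follows; the backbone layers, the parameter dimensionality, and the mean-squared-error form of the second loss are assumptions of this sketch, not details taken from the disclosure.

    import torch.nn as nn

    class SkeletonPointNet(nn.Module):
        # Sketch of the skeleton point generation network: a feature extraction
        # layer shared by both sample domains plus a fully connected layer that
        # regresses skeleton point parameters (layer sizes are illustrative).
        def __init__(self, num_params=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.fc = nn.Linear(64, num_params)

    def training_step(net, first_loss_fn, user_imgs, virtual_imgs, preset_params):
        feat_user = net.features(user_imgs)          # first image sample feature
        feat_virtual = net.features(virtual_imgs)    # second image sample feature
        pred_params = net.fc(feat_virtual)           # predicted skeleton point parameters
        loss1 = first_loss_fn(feat_user, feat_virtual)              # first loss function value
        loss2 = nn.functional.mse_loss(pred_params, preset_params)  # second loss function value
        return loss1 + loss2                         # combined value used to adjust the network

In a full training run this step would be repeated over batches of user and virtual samples, with the returned value backpropagated to adjust the network, as elaborated in claim 5 below.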
2. The method according to claim 1, wherein the performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image comprises:
inputting the skeleton point parameters to a preset rendering engine so that the preset rendering engine reconstructs face information based on the skeleton point parameters;
and receiving the face information reconstructed by the preset rendering engine, and determining a virtual face image corresponding to the user face image based on the face information.
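As a purely hypothetical illustration of this round trip, the sketch below invents a render_engine object with reconstruct and to_image methods; neither name corresponds to any real engine API.

    def render_virtual_face(render_engine, skeleton_params):
        # Hand the skeleton point parameters to the preset rendering engine,
        # receive the reconstructed face information, and build the virtual
        # face image from it (both engine methods are assumed interfaces).
        face_info = render_engine.reconstruct(skeleton_params)
        return render_engine.to_image(face_info)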
3. The method according to claim 1 or 2, wherein the skeleton point generation network comprises a feature extraction layer and a fully connected layer, and the extracting skeleton point parameters from the user face image by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image comprises:
inputting the user face image into the feature extraction layer included in the skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into the fully connected layer included in the skeleton point generation network to obtain skeleton point parameters output by the fully connected layer;
and determining the skeleton point parameters corresponding to the user face image based on the skeleton point parameters output by the fully connected layer.
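A minimal inference sketch of this two-stage path, assuming a network object that exposes features and fc attributes as in the illustrative model above:

    import torch

    def extract_skeleton_params(net, user_face_image):
        # Feature extraction layer first, then the fully connected layer;
        # the returned tensor holds the skeleton point parameters used for
        # rendering (net.features / net.fc are assumed attribute names).
        with torch.no_grad():
            image_features = net.features(user_face_image)
            return net.fc(image_features)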
4. The method according to claim 1, wherein a domain discrimination network is further included, and the determining a first loss function value based on the first image sample feature and the second image sample feature comprises:
inputting the first image sample characteristics and the second image sample characteristics into the domain discrimination network to obtain a loss function value output by the domain discrimination network;
and determining the first loss function value based on the loss function value output by the domain discrimination network.
5. The method according to claim 4, wherein the adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain a trained skeleton point generation network comprises:
determining whether the sum of the first loss function value and the second loss function value is smaller than a preset threshold;
if not, adjusting either or both of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and an adjusted second loss function value based on the adjusted network(s);
and obtaining the trained network when the sum of the adjusted first loss function value and the adjusted second loss function value is smaller than the preset threshold.
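One possible reading of this stopping rule as a training loop; the step_fn callable returning the two loss values per iteration and the single optimizer over both networks are assumptions of the sketch.

    def train_until_below_threshold(step_fn, optimizer, threshold, max_iters=100000):
        # Keep adjusting the skeleton point generation network and/or the domain
        # discrimination network until the sum of the first and second loss
        # function values drops below the preset threshold.
        for _ in range(max_iters):
            loss1, loss2 = step_fn()
            total = loss1 + loss2
            if total.item() < threshold:
                return                  # loss sum below the preset threshold: stop
            optimizer.zero_grad()
            total.backward()            # adjust the network(s)
            optimizer.step()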
6. The method according to claim 4 or 5, wherein the inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network comprises:
inputting the first image sample feature into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category output by the domain discrimination network and a first labeled category indicated by the user face image sample; and
inputting the second image sample feature into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category output by the domain discrimination network and a second labeled category indicated by the virtual face image sample;
and determining a loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
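A sketch of this two-way comparison, assuming binary cross-entropy as the criterion and assuming user face image samples are labeled with category 1 and virtual face image samples with category 0 (the label assignment is an assumption of the sketch):

    import torch
    import torch.nn.functional as F

    def domain_discrimination_loss(discriminator, feat_user, feat_virtual):
        # Classify both feature sets with the domain discrimination network and
        # compare each prediction against the labeled category of its sample.
        logits_user = discriminator(feat_user)        # first image category
        logits_virtual = discriminator(feat_virtual)  # second image category
        loss_user = F.binary_cross_entropy_with_logits(
            logits_user, torch.ones_like(logits_user))         # first comparison result
        loss_virtual = F.binary_cross_entropy_with_logits(
            logits_virtual, torch.zeros_like(logits_virtual))  # second comparison result
        return loss_user + loss_virtual               # loss value output by the network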
7. The method according to claim 4, wherein a gradient reversal layer connected to the domain discrimination network is further included, and the method further comprises:
reversing, by using the gradient reversal layer, the gradient values corresponding to the loss function value output by the domain discrimination network to obtain reversed gradient values;
and adjusting the skeleton point generation network according to the reversed gradient values.
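The gradient reversal layer is a standard construction (identity in the forward pass, negated gradient in the backward pass); the PyTorch sketch below adds a scaling factor alpha as an extra assumption.

    import torch

    class GradientReversal(torch.autograd.Function):
        # Identity on the forward pass; on the backward pass the gradient coming
        # from the domain discrimination loss is negated, so the skeleton point
        # generation network is updated with the reversed gradient values.
        @staticmethod
        def forward(ctx, x, alpha=1.0):
            ctx.alpha = alpha
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.alpha * grad_output, None  # reversed gradient; None for alpha

    # usage sketch: reversed_features = GradientReversal.apply(shared_features)

Routing the shared features through such a layer before the domain discrimination network lets a single backward pass both train the discriminator and push the feature extraction layer toward domain-invariant features.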
8. The method according to claim 1, wherein the virtual face image sample is obtained as follows:
in response to an input request of the user, acquiring the preset skeleton point parameters input by the user;
and performing face rendering on the preset skeleton point parameters by using a preset rendering engine to obtain the virtual face image sample.
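A hypothetical sketch of building such samples: every user-supplied set of preset skeleton point parameters is rendered once and stored together with the parameters that produced it (the render method is an assumed interface, not a real engine call).

    def build_virtual_samples(render_engine, preset_param_sets):
        # One (virtual face image sample, preset skeleton point parameters) pair
        # per parameter set; the pairs later supervise the second loss function.
        return [(render_engine.render(params), params) for params in preset_param_sets]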
9. An apparatus for pinching a face of a virtual character, the apparatus comprising:
an acquisition module configured to acquire, in response to an acquisition request of a user, a user face image of the user;
a generation module configured to extract skeleton point parameters from the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, wherein the skeleton point generation network is trained by using a user face image sample and a virtual face image sample generated based on preset skeleton point parameters, and the training process of the skeleton point generation network comprises: acquiring a user face image sample, and generating a virtual face image sample based on preset skeleton point parameters; performing feature extraction on the user face image sample and the virtual face image sample respectively by using a feature extraction layer included in the skeleton point generation network to obtain a first image sample feature and a second image sample feature; performing skeleton point parameter extraction on the second image sample feature by using a fully connected layer included in the skeleton point generation network to determine predicted skeleton point parameters corresponding to the virtual face image sample; determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters; and adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network;
and a rendering module configured to perform face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
10. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate with each other through the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the method for pinching a face of a virtual character according to any one of claims 1 to 8.
11. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the method for pinching a face of a virtual character according to any one of claims 1 to 8.
CN202110697554.1A 2021-06-23 2021-06-23 Virtual character face pinching method and device, electronic equipment and storage medium Active CN113409437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110697554.1A CN113409437B (en) 2021-06-23 2021-06-23 Virtual character face pinching method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113409437A CN113409437A (en) 2021-09-17
CN113409437B true CN113409437B (en) 2023-08-08

Family

ID=77682608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110697554.1A Active CN113409437B (en) 2021-06-23 2021-06-23 Virtual character face pinching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113409437B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049417B (en) * 2021-11-12 2023-11-24 抖音视界有限公司 Virtual character image generation method and device, readable medium and electronic equipment
CN114519757A (en) * 2022-02-17 2022-05-20 巨人移动技术有限公司 Face pinching processing method
CN115311127A (en) * 2022-02-25 2022-11-08 北京字跳网络技术有限公司 Face processing method and device, computer equipment and storage medium
CN114972661B (en) * 2022-08-01 2022-11-01 深圳元象信息科技有限公司 Face model construction method, face image generation device and storage medium
CN115253303A (en) * 2022-08-16 2022-11-01 北京字跳网络技术有限公司 Method, device, storage medium and electronic equipment for beautifying virtual object

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091127A (en) * 2019-12-16 2020-05-01 腾讯科技(深圳)有限公司 Image detection method, network model training method and related device
CN111354079A (en) * 2020-03-11 2020-06-30 腾讯科技(深圳)有限公司 Three-dimensional face reconstruction network training and virtual face image generation method and device
CN111414856A (en) * 2020-03-19 2020-07-14 支付宝(杭州)信息技术有限公司 Face image generation method and device for realizing user privacy protection
CN111632374A (en) * 2020-06-01 2020-09-08 网易(杭州)网络有限公司 Method and device for processing face of virtual character in game and readable storage medium
CN111951372A (en) * 2020-06-30 2020-11-17 重庆灵翎互娱科技有限公司 Three-dimensional face model generation method and equipment
CN112419144A (en) * 2020-11-25 2021-02-26 上海商汤智能科技有限公司 Face image processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010896B2 (en) * 2018-12-17 2021-05-18 Bodygram, Inc. Methods and systems for generating 3D datasets to train deep learning networks for measurements estimation

Also Published As

Publication number Publication date
CN113409437A (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.
