CN114820907A - Human face image cartoon processing method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN114820907A
CN114820907A (application CN202110119146.8A)
Authority
CN
China
Prior art keywords
face
image
dimensional
cartoon
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110119146.8A
Other languages
Chinese (zh)
Inventor
周泽生
王志斌
官林杰
李朋
周世威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110119146.8A priority Critical patent/CN114820907A/en
Publication of CN114820907A publication Critical patent/CN114820907A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/02 Non-photorealistic rendering
    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 5/92
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to a human face image cartoon processing method and apparatus, a computer device, and a storage medium. The method comprises: extracting face feature points and image features from a real face image; constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points; obtaining difference features between the simulated face model and a three-dimensional simulated face template; migrating the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image. With this method, a high-precision three-dimensional cartoon image that carries the real face's features and image features can be constructed effectively.

Description

Human face image cartoon processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technology and image processing technology, and in particular to a human face image cartoon processing method, apparatus, computer device, and storage medium.
Background
With the rapid development of computer technology and artificial intelligence, image processing techniques based on computer vision keep emerging; one example is reconstructing three-dimensional information from a two-dimensional image, such as generating a three-dimensional cartoon face. In the related art, features of the facial parts are usually extracted first, cartoon face materials similar to those facial parts are then matched in a three-dimensional cartoon face database, and the matched materials are scaled before three-dimensional cartoon face reconstruction, interpolation, and similar operations are performed to generate the three-dimensional cartoon face.
However, this approach requires the facial parts of the three-dimensional cartoon face to be handled one by one, so the processing efficiency is low, and the generated three-dimensional cartoon face has low precision and differs considerably from the real face.
Disclosure of Invention
In view of the foregoing, there is a need to provide a human face image cartoon processing method, apparatus, computer device, and storage medium capable of effectively generating a three-dimensional cartoon face that resembles the user's real face.
A human face image cartoon processing method, comprising:
extracting face feature points and image features from a real face image;
constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
obtaining difference features between the simulated face model and a three-dimensional simulated face template;
migrating the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and
performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
A human face image cartoon processing apparatus, the apparatus comprising:
a feature extraction module, configured to extract face feature points and image features from a real face image;
a three-dimensional face reconstruction module, configured to construct a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
a difference feature extraction module, configured to obtain difference features between the simulated face model and a three-dimensional simulated face template;
a difference feature migration module, configured to migrate the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and
a three-dimensional cartoon image generation module, configured to perform image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
In one embodiment, the face feature points are two-dimensional face feature points; the three-dimensional face reconstruction module is also used for acquiring a feature point mapping matrix of the two-dimensional face feature points in the three-dimensional simulation face template; performing parameter estimation based on the face characteristic points and the characteristic point mapping matrix to obtain three-dimensional face parameters; and constructing a three-dimensional simulated face model corresponding to the real face image based on the three-dimensional face parameters.
In one embodiment, the three-dimensional face reconstruction module is further configured to perform iterative estimation of camera parameters based on the face feature points and the feature point mapping matrix, and obtain camera parameters after a first iteration condition is satisfied; performing face parameter iterative estimation based on the face feature points, the feature point mapping matrix and the camera parameters, and obtaining face shape base parameters and face expression base parameters after a second iterative condition is met; and constructing a three-dimensional simulated face model corresponding to the real face image according to the camera parameters, the face shape base parameters and the face expression base parameters.
In one embodiment, the topological structure of the simulated face model comprises a plurality of triangular faces; the difference feature extraction module is further configured to obtain, for each triangular face in the simulated face model, a first deformation gradient relative to the corresponding triangular face in the three-dimensional simulated face template, and to derive from the first deformation gradients an affine transformation mapping matrix between each triangular face in the simulated face model and the corresponding triangular face in the three-dimensional simulated face template; the affine transformation mapping matrices represent the difference features between the simulated face model and the three-dimensional simulated face template.
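As an illustration, the per-triangle deformation gradient can be computed as in the minimal NumPy sketch below. It assumes the edge-plus-scaled-normal triangle frame used in classical deformation-transfer work; this is not the patent's prescribed implementation, and all names are illustrative.

```python
import numpy as np

def triangle_frame(v1, v2, v3):
    """3x3 local frame of a triangle: two edge vectors plus a scaled normal,
    so that non-degenerate triangles yield an invertible matrix."""
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    n = n / np.sqrt(np.linalg.norm(n))    # scaling used in deformation transfer
    return np.column_stack([e1, e2, n])

def deformation_gradient(template_tri, model_tri):
    """First deformation gradient Q of a model triangle relative to the
    corresponding template triangle, i.e. Q @ V_template = V_model."""
    V_t = triangle_frame(*template_tri)
    V_m = triangle_frame(*model_tri)
    return V_m @ np.linalg.inv(V_t)

# Computed over every triangle pair, the matrices {Q_i} form the affine
# transformation mapping that represents the difference features between
# the simulated face model and the three-dimensional simulated face template.
```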
In one embodiment, the topological structure of the three-dimensional cartoon face template comprises a plurality of triangular faces; the difference feature migration module is further used for aligning the simulated face model with the three-dimensional cartoon face template based on a semantic mapping relation between the three-dimensional simulated face template and the three-dimensional cartoon face template; searching triangular surfaces matched with the triangular surfaces in the simulated human face model in the aligned triangular surfaces in the three-dimensional cartoon human face template to obtain a triangular surface mapping relation between the triangular surfaces in the simulated human face model and the triangular surfaces in the three-dimensional cartoon human face template; and migrating the difference characteristics to the three-dimensional cartoon face template according to the triangular surface mapping relation to obtain a three-dimensional cartoon face model.
In one embodiment, the difference feature migration module is further configured to deform an original triangular face in the three-dimensional cartoon face template according to the triangular-face mapping relationship and the affine transformation mapping matrix, and to determine a second deformation gradient between the original and deformed triangular faces in the three-dimensional cartoon face template; to iteratively deform each triangular face in the aligned three-dimensional cartoon face template, according to the triangular-face mapping relationship and the affine transformation mapping matrix, in the direction that minimizes the difference between the second deformation gradient and the first deformation gradient; and to obtain the three-dimensional cartoon face model once an iteration stop condition is satisfied.
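A minimal sketch of this iterative deformation, assuming the same triangle-frame construction as above and using SciPy's general-purpose L-BFGS optimizer as a stand-in for whatever dedicated solver an implementation would use:

```python
import numpy as np
from scipy.optimize import minimize

def _frame(tri):
    """Triangle frame: two edges plus a scaled normal (as sketched above)."""
    e1, e2 = tri[1] - tri[0], tri[2] - tri[0]
    n = np.cross(e1, e2)
    return np.column_stack([e1, e2, n / np.sqrt(np.linalg.norm(n))])

def transfer_energy(flat_verts, faces, target_Q, rest_verts):
    """Sum over triangles of the squared Frobenius distance between the
    cartoon template's second deformation gradient (rest -> current) and
    the first deformation gradient migrated from the simulated model."""
    verts = flat_verts.reshape(-1, 3)
    energy = 0.0
    for face, Q_first in zip(faces, target_Q):
        Q_second = _frame(verts[face]) @ np.linalg.inv(_frame(rest_verts[face]))
        energy += np.sum((Q_second - Q_first) ** 2)
    return energy

def migrate(cartoon_verts, faces, target_Q):
    """Iteratively deform the aligned cartoon template; the optimizer's
    convergence test plays the role of the iteration stop condition."""
    res = minimize(transfer_energy, cartoon_verts.ravel(),
                   args=(faces, target_Q, cartoon_verts), method="L-BFGS-B")
    return res.x.reshape(-1, 3)   # vertices of the 3D cartoon face model
```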
In one embodiment, the feature extraction module is further configured to extract, from the real face image, facial-feature key points, contour key points, and the semantic information corresponding to each; obtain the face feature points from the facial-feature key points, the contour key points, and their semantic information; and extract the image features of the real face image based on the face feature points.
In one embodiment, the image features include native image features and additional image features; the feature extraction module is further configured to perform face alignment on the real face image to obtain an aligned real face image; identify native image features from the aligned real face image based on the face feature points; and identify additional image features from the aligned real face image based on the face feature points.
In one embodiment, the native image features include a hairstyle feature; the feature extraction module is further configured to extract hair features from the aligned real face image through a trained hairstyle recognition network and generate a hair mask image from the hair features; divide the hair mask image into at least two mask image sub-regions based on the face feature points; and obtain the hairstyle feature from the distribution of the hair features across the at least two mask image sub-regions.
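For instance, the sub-region statistics could be gathered as in the NumPy sketch below; the two-by-two split around the landmark centroid is an illustrative choice, not the patent's exact partition.

```python
import numpy as np

def hairstyle_descriptor(hair_mask, landmarks):
    """Divide a binary hair mask into four sub-regions around the face centre
    and describe the hairstyle by the hair-pixel density in each region."""
    cx, cy = landmarks.mean(axis=0)        # face centre from the feature points
    h, w = hair_mask.shape
    densities = []
    for rows in (slice(0, int(cy)), slice(int(cy), h)):       # above / below
        for cols in (slice(0, int(cx)), slice(int(cx), w)):   # left / right
            region = hair_mask[rows, cols]
            densities.append(float(region.mean()) if region.size else 0.0)
    # e.g. high density in the lower half suggests long hair past the chin
    return np.asarray(densities)
```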
In one embodiment, the feature extraction module is further configured to extract an additional-image region from the aligned real face image according to the distribution positions of the face feature points; identify the additional image category of that region through a trained target classification network; and obtain the additional image features of the real face image from the additional image categories.
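A sketch of such a target classification network using an off-the-shelf backbone (PyTorch/torchvision); the class list, checkpoint path, and input size are hypothetical placeholders, not details from the patent:

```python
import torch
from torchvision import models, transforms

# Hypothetical setup: a ResNet-18 fine-tuned on cropped eye regions to predict
# glasses categories. The class list and checkpoint file are illustrative.
GLASSES_CLASSES = ["none", "full_rim", "half_rim", "rimless", "sunglasses"]

model = models.resnet18(num_classes=len(GLASSES_CLASSES))
model.load_state_dict(torch.load("glasses_classifier.pth"))  # assumed weights
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_glasses(eye_region):
    """eye_region: a PIL image cropped around the eyes using the landmarks."""
    with torch.no_grad():
        logits = model(preprocess(eye_region).unsqueeze(0))
    return GLASSES_CLASSES[int(logits.argmax())]
```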
In one embodiment, the image features include skin tone features, native image features, and additional image features; the three-dimensional cartoon image generation module is further configured to extract pixels of a skin-color region based on the distribution positions of the face feature points and obtain the skin tone features from those pixels; perform skin-color rendering on the three-dimensional cartoon face model according to the skin tone features; and obtain matching image materials from the native image features and the additional image features, rendering these materials onto the skin-rendered three-dimensional cartoon face model to obtain a three-dimensional cartoon image with both face features and image features.
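The skin-colour sampling could look like the following minimal NumPy sketch; the cheek offsets and the 68-point landmark layout with the nose tip at index 30 are assumptions for illustration:

```python
import numpy as np

def mean_skin_tone(image, landmarks, patch=5, offset=40):
    """Average the pixels of two cheek patches located relative to the nose
    tip; the result can then be matched against a skin-tone chart entry."""
    x0, y0 = landmarks[30]                   # nose tip in a 68-point layout
    samples = []
    for dx in (-offset, offset):             # one patch per cheek
        x, y = int(x0 + dx), int(y0)
        samples.append(image[y - patch:y + patch,
                             x - patch:x + patch].reshape(-1, 3))
    return np.concatenate(samples).mean(axis=0)   # RGB skin-tone feature
```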
In one embodiment, the apparatus further comprises a display module configured to display a cartoon image selection interface that includes a three-dimensional image option; collect a real face image of the user in response to a selection of the three-dimensional image option; and display an image preview interface in which the real face image and the three-dimensional cartoon image are shown.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
extracting face feature points and image features from a real face image;
constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
obtaining difference features between the simulated face model and a three-dimensional simulated face template;
migrating the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and
performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
extracting face feature points and image features from a real face image;
constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
obtaining difference features between the simulated face model and a three-dimensional simulated face template;
migrating the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and
performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
A computer program product or computer program comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the storage medium and, when executing them, performs the following steps:
extracting face feature points and image features from a real face image;
constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
obtaining difference features between the simulated face model and a three-dimensional simulated face template;
migrating the difference features onto a three-dimensional cartoon face template, based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template, to obtain a three-dimensional cartoon face model; and
performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
According to the above human face image cartoon processing method, apparatus, computer device, and storage medium, after face feature points and image features are extracted from a real face image, a three-dimensional simulated face model corresponding to the real face image is constructed based on the face feature points, and the difference features between the simulated face model and a three-dimensional simulated face template are obtained. Because a semantic mapping relationship is predefined between the three-dimensional simulated face template and the three-dimensional cartoon face template, performing deformation migration on the cartoon template according to the difference features under this mapping transfers those differences accurately and effectively, yielding a three-dimensional cartoon face model that carries both cartoon characteristics and the real facial features of the image. Image rendering of this cartoon face model according to the image features then renders the image features of the real face image into the model, generating a three-dimensional cartoon image that more closely resembles the real face in the real face image.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a human face image cartoon processing method;
FIG. 2 is a schematic flow chart illustrating a cartoon processing method for a face image according to an embodiment;
FIG. 3 is a schematic diagram of a real face image and a constructed corresponding three-dimensional simulated face model in one embodiment;
FIG. 4 is a schematic diagram of a topology of a three-dimensional simulated face template in one embodiment;
FIG. 5 is a schematic diagram of a three-dimensional simulated face template and a corresponding three-dimensional simulated face model in one embodiment;
FIG. 6 is a schematic diagram illustrating an embodiment of aligning a simulated face model with a three-dimensional cartoon face template;
FIG. 7 is a diagram illustrating migration of difference features to a three-dimensional cartoon face template in one embodiment;
FIG. 8 is a diagram illustrating extraction of feature points of a real face image according to an embodiment;
FIG. 9 is a diagram illustrating an embodiment of face alignment processing performed on a real face image;
FIG. 10 is a diagram illustrating an embodiment of an image obtained by performing facial feature point extraction and facial alignment processing on a real face image and dividing a face region;
FIG. 11 is a schematic view of a hair mask map in one embodiment;
FIG. 12 is a schematic flow chart illustrating the classification of features of the glasses according to one embodiment;
FIG. 13 is a schematic view of a skin tone color chart in one embodiment;
FIG. 14 is a schematic diagram illustrating an effect of a three-dimensional cartoon image obtained by performing three-dimensional cartoon reconstruction on a real face image in one embodiment;
FIG. 15 is a schematic diagram illustrating the effects of three sets of real face images and corresponding three-dimensional cartoon images in one embodiment;
FIG. 16 is a schematic diagram of a cartoon image selection interface in one embodiment;
FIG. 17 is a diagram of an image capture interface in one embodiment;
FIG. 18 is a schematic diagram of an image preview interface in one embodiment;
FIG. 19 is a diagram of a three-dimensional cartoon image adjustment interface in one embodiment;
FIG. 20 is a block diagram showing the construction of a human face image cartoon processing apparatus according to an embodiment;
FIG. 21 is a diagram showing an internal structure of a computer device in one embodiment;
FIG. 22 is a diagram of the internal structure of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The human face image cartoon processing method of this application can be applied to a computer device, which may be a terminal or a server. The method can also be applied to a system including a terminal and a server, and implemented through their interaction.
The human face image cartoon processing method provided by this application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The server 104 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal 102 and the server 104 may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
Here, cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space, and information services on demand. The network that provides the resources is referred to as the "cloud". As a basic capability provider of cloud computing, a cloud computing resource pool (an IaaS, Infrastructure as a Service, platform for short) is established, and multiple types of virtual resources are deployed in the pool for external clients to use as needed.
Specifically, the terminal 102 acquires or collects a real face image, and uploads the real face image to the server 104. The server 104 extracts the face feature points and the image features from the real face image, constructs a three-dimensional simulated face model corresponding to the real face image based on the face feature points, and obtains the difference features between the simulated face model and the three-dimensional simulated face template. The server 104 further migrates the difference features to the three-dimensional cartoon face template based on the semantic mapping relationship between the three-dimensional simulation face template and the three-dimensional cartoon face template, so that a three-dimensional cartoon face model can be effectively obtained; and then performing image rendering on the cartoon face model according to the image characteristics, further generating a three-dimensional cartoon image similar to the real face, and outputting the three-dimensional cartoon image.
It can be understood that the human face image cartoon processing method in the embodiments of this application adopts computer vision, machine learning, and other artificial intelligence techniques, and can effectively and automatically generate a three-dimensional cartoon image similar to a real human face. Artificial Intelligence (AI) is a theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. It studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Computer Vision (CV) technology is the science of how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and further processing the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of acquiring information from images or multidimensional data. It can be understood that this application uses computer vision technology to cartoonize a real human face image so as to generate a three-dimensional cartoon image similar to the real face.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how computers can simulate or implement human learning behaviour to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning. It can be understood that the face feature point detection network, target classification network, hairstyle recognition network, and other networks used in some embodiments of this application are trained using machine learning techniques, which allows feature point extraction, target classification, hairstyle recognition, and similar processing to be performed more accurately on a real face image.
In an embodiment, as shown in FIG. 2, a human face image cartoon processing method is provided. The method is described here as applied to a computer device, which may specifically be the terminal or the server in FIG. 1; it can be understood that the method may also be applied to a system including a terminal and a server, and implemented through their interaction. In this embodiment, the method includes the following steps:
s202, extracting human face characteristic points and image characteristics from the real human face image.
A real face image is an image of a face captured in a real scene; it is a two-dimensional face image containing the user's face. It may be a frontal face picture shot by a camera in real time, or a face image obtained from local storage or the Internet.
The real face image in this embodiment may be a two-dimensional face image of any pose and expression.
It can be understood that face feature points are the features of a number of key points in the face, obtained by performing feature extraction on an image containing a face, and are used to represent face information; "a number of" here means more than two. The real face image contains feature points of at least one of the eyebrows, eyes, nose, lips, chin, and so on. In one embodiment, the face feature points may be distributed mainly over at least one of the eyebrow, nose bridge, eye, lip, and chin regions of the face.
In one embodiment, the face feature points may include facial-feature key points and contour key points, each of which also carries corresponding semantic information. For example, the semantic information may include key point positions and geometric relations between key points, such as at least one of the distance, area, and angle between them. The facial contour and expression can be reflected by the facial-feature key points, the contour key points, and their semantic information.
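By way of example, such inter-keypoint geometric relations reduce to elementary computations on 2D landmark coordinates; the three-point interface in this NumPy sketch is illustrative:

```python
import numpy as np

def landmark_geometry(p, q, r):
    """Distance p-q, area of triangle (p, q, r), and the angle at p:
    the kinds of inter-keypoint geometric features described above."""
    d = np.linalg.norm(q - p)
    area = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                     - (q[1] - p[1]) * (r[0] - p[0]))
    u, v = q - p, r - p
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return d, area, angle
```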
An image feature is a feature describing the appearance presented in the real face image.
In one embodiment, the image features may include native image features and additional image features. A native image feature is an image feature of the real face itself in the real face image, for example at least one of the user's hairstyle, skin tone, birthmark, and mole features. An additional image feature is an appearance-related feature that is not part of the user's own face, for example something worn by the user in the real face image; such features may lie in the face region or the region near the face. For example, the additional image features include accessory features, i.e., features of at least one accessory such as glasses or earrings.
After the computer device acquires the real face image, it extracts the face feature points from the real face image. Specifically, a pre-trained face detection network or a preset face feature point detection algorithm may be used to extract the face feature points, and the image features in the real face image are then extracted according to those feature points.
In one embodiment, since the original real face image may have various noises and random interferences, the acquired real face image may also be subjected to image preprocessing such as gray scale correction and noise filtering. Specifically, the computer device first detects a face region in a real face image, and preprocesses the real face image based on a face region detection result. For the real face image, the preprocessing process may include at least one of light compensation, gray scale transformation, histogram equalization, normalization, geometric correction, filtering, sharpening, and the like. After the real face image is preprocessed, the face characteristic points and the image characteristics are further extracted from the preprocessed real face image.
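One plausible preprocessing pass, sketched with OpenCV; the particular operations and kernel size are an illustrative subset of the options listed above:

```python
import cv2

def preprocess_face_image(bgr_image):
    """One possible preprocessing pass: grayscale transformation, histogram
    equalization against uneven lighting, and mild Gaussian filtering
    against sensor noise."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)               # gray-scale transformation
    return cv2.GaussianBlur(gray, (3, 3), 0)    # noise filtering
```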
In one embodiment, a pre-trained deep neural network may be used to extract the image features from the real face image. Specifically, the pre-trained deep neural network may contain sub-networks for different image features so as to extract them separately. In another embodiment, different deep neural networks may be used to extract different image features.
The method in the embodiments of this application cartoonizes a real face image: the real face image is three-dimensionally reconstructed to generate a three-dimensional cartoon image, so that the generated cartoon image has face features and image features similar to the real face.
S204, constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points.
The real face image is a two-dimensional image, and the two-dimensional image is a planar image without depth information. Three-dimensional (i.e., 3D) is a space system formed by adding a direction vector to a planar two-dimensional system. The three-dimensional simulated face model is a three-dimensional face geometric structure, and specifically can be a three-dimensional simulated face model with real face features in a real face image, which is constructed based on three-dimensional modeling of face feature points in a two-dimensional real face image.
After extracting the face feature points and the image features from the real face image, the computer device first establishes a mapping between the 2D and 3D feature points of the face. Specifically, the computer device may match the two-dimensional coordinates of each face feature point against the corresponding key feature point in a preset three-dimensional simulated face template to obtain its three-dimensional coordinates, correct the facial pose according to the face orientation to obtain expression-normalized facial features, and generate the three-dimensional simulated face model corresponding to the real face image from the mapped and corrected facial features through three-dimensional deformation.
The three-dimensional simulated face template is a three-dimensional average face model obtained by averaging the facial features of a large amount of face data. For example, a 3D Morphable Face Model (3DMM) may be used for face reconstruction, where face reconstruction means recovering a three-dimensional face model of the face from a two-dimensional real face image. The 3DMM is a general-purpose three-dimensional face model that represents a face with a fixed set of points and can be used to construct a three-dimensional face shape from a two-dimensional face image.
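The fixed-topology linear form of such a model can be written directly; in this NumPy sketch the array shapes and names are assumptions, not the patent's notation:

```python
import numpy as np

def assemble_face(mean_shape, id_basis, exp_basis, alpha_id, alpha_exp):
    """Linear 3DMM-style reconstruction: S = M + A_id @ a_id + A_exp @ a_exp.
    Assumed shapes: mean_shape (3N,), id_basis (3N, n_id), exp_basis
    (3N, n_exp); the result is the vertex vector of a fixed-topology face."""
    return mean_shape + id_basis @ alpha_id + exp_basis @ alpha_exp
```

The fitting procedure described later estimates the coefficients alpha_id and alpha_exp (together with the camera parameters) so that the projected model matches the detected 2D feature points.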
S206, obtaining the difference features between the simulated face model and the three-dimensional simulated face template.
It is understood that the difference feature refers to a difference between the facial features, that is, a variation feature of the face in the real face image relative to the average face template.
Specifically, after the computer device constructs a three-dimensional simulated face model corresponding to the real face image based on two-dimensional face feature points, the simulated face model is compared with an average three-dimensional simulated face template, and variation characteristics of the simulated face model relative to the average three-dimensional simulated face template are extracted, so that difference characteristics between the simulated face model and the three-dimensional simulated face template are extracted and obtained.
The difference features may be extracted as proportions, that is, as a proportional mapping relationship between the simulated face model and the three-dimensional simulated face template. This mapping reflects the change in the shape of the simulated face model, so extracting it effectively captures the difference features between the simulated face model and the template.
And S208, migrating the difference characteristics to the three-dimensional cartoon face template based on the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template to obtain a three-dimensional cartoon face model.
The three-dimensional cartoon face template refers to a preset general cartoon face model, and specifically may be an average three-dimensional cartoon face model obtained by averaging a large number of cartoon faces corresponding to the simulated face model.
It can be understood that the three-dimensional simulated face model and the three-dimensional simulated face template, and the three-dimensional cartoon face model and the three-dimensional cartoon face template are three-dimensional face models with topological structures. The topological structure comprises information such as vertex number, vertex sequence, connection relation among the vertices and the like. The three-dimensional simulated face model and the three-dimensional simulated face template are geometric models with consistent topological structures, and the three-dimensional simulated face model and the three-dimensional cartoon face template have different topological structures.
The semantic mapping relation is a mapping relation between a preset three-dimensional simulation face template and a three-dimensional cartoon face template. In one embodiment, the semantic mapping relationship may be a mapping relationship between triangular surfaces in topological structures corresponding to the three-dimensional simulated face template and the three-dimensional cartoon face template, respectively. In another embodiment, the semantic mapping relationship may also be a mapping relationship between a face key point in a three-dimensional simulated face template and a face key point in a three-dimensional cartoon face template.
After extracting the difference features between the simulated face model and the three-dimensional simulated face template, the computer device migrates them onto the three-dimensional cartoon face template based on the semantic mapping relationship between the two templates. Specifically, the computer device first aligns the three-dimensional simulated face template and the three-dimensional cartoon face template according to the semantic mapping between their topological structures, i.e., it deforms them into correspondence so that the two different face templates are semantically aligned.
Because the three-dimensional simulated face model and the three-dimensional simulated face template share the same topological structure, the computer device then uses the alignment between the topological structures of the two templates to migrate the difference features of the simulated face model, measured against the simulated face template, into the three-dimensional cartoon face template, thereby effectively generating a three-dimensional cartoon face model with both a cartoon appearance and real facial features.
S210, performing image rendering on the cartoon face model according to the image features to generate a three-dimensional cartoon image.
The generated cartoon face model has the characteristics of the face in the real face image.
It can be understood that the three-dimensional cartoon image is a three-dimensional non-real cartoon image with cartoon characteristics and real human face characteristics.
Since the generated cartoon face model has the characteristics of the face in the real face image, the computer device further performs image rendering on the cartoon face model according to the image characteristics to render the image characteristics in the real face image into the cartoon face model, thereby generating a three-dimensional cartoon image more similar to the real face in the real face image.
In this human face image cartoon processing method, after extracting the face feature points and the image features from the real face image, the computer device constructs a three-dimensional simulated face model corresponding to the real face image based on the face feature points and obtains the difference features between the simulated face model and the three-dimensional simulated face template. Because a semantic mapping relationship is predefined between the three-dimensional simulated face template and the three-dimensional cartoon face template, performing deformation migration on the cartoon template according to the difference features under this mapping transfers those differences accurately and effectively, yielding a three-dimensional cartoon face model that carries both cartoon characteristics and the real facial features of the image. Image rendering of this cartoon face model according to the image features then renders the image features of the real face image into the model, generating a three-dimensional cartoon image that more closely resembles the real face in the real face image.
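Pulled together, the method is a five-stage pipeline. The sketch below wires the stages as injected callables, since the patent specifies the data flow rather than any concrete API; every name is an illustrative stand-in:

```python
def cartoonize(real_face_image, extract_features, fit_face_model,
               diff_features, migrate_diff, render_image,
               sim_template, cartoon_template):
    """Data flow S202-S210 of the method, each stage supplied as a callable;
    all names are illustrative stand-ins, not the patent's API."""
    landmarks, image_feats = extract_features(real_face_image)       # S202
    simulated = fit_face_model(landmarks)                            # S204
    diff = diff_features(simulated, sim_template)                    # S206
    cartoon = migrate_diff(diff, sim_template, cartoon_template)     # S208
    return render_image(cartoon, image_feats)                        # S210
```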
In one embodiment, the face feature points are two-dimensional face feature points; a three-dimensional simulated face model corresponding to a real face image is constructed based on face feature points, and the method comprises the following steps: acquiring a feature point mapping matrix of two-dimensional face feature points in a three-dimensional simulated face template; performing parameter estimation based on the face characteristic points and the characteristic point mapping matrix to obtain three-dimensional face parameters; and constructing a three-dimensional simulated face model corresponding to the real face image based on the three-dimensional face parameters.
The three-dimensional face parameters refer to face parameters to be solved in the process of constructing a three-dimensional simulation face model corresponding to a real face image. For example, the three-dimensional face parameters may include face shape-based parameters, facial expression-based parameters, and the like.
The three-dimensional simulation face template is a three-dimensional face geometric structure, and specifically can be a parameterized model, namely a face geometric structure for expressing a face by using various parameters. The feature point mapping matrix refers to a mapping relationship between two-dimensional face feature points and three-dimensional face feature points, that is, a 2D-3D feature point mapping matrix.
After the computer equipment extracts the two-dimensional human face characteristic points from the real human face image, the three-dimensional human face reconstruction is carried out by utilizing the two-dimensional human face characteristic points. Specifically, the computer device firstly obtains a feature point mapping matrix of two-dimensional face feature points in the three-dimensional simulated face template to establish a mapping relation between the two-dimensional face feature points and the three-dimensional face feature points. The feature point mapping matrix is used for representing the mapping relation between two-dimensional face feature points and three-dimensional face feature points.
It can be understood that since the face includes strong a priori information, the face can be formed by linearly combining a series of parameters. Therefore, a three-dimensional simulated face model can be constructed through parameter estimation. The series of parameters include camera parameters, three-dimensional face parameters and the like.
And the computer equipment further performs parameter estimation according to the human face characteristic points and the characteristic point mapping matrix, and can specifically obtain the final three-dimensional human face parameters through parameter estimation. And the computer equipment further constructs a three-dimensional simulation face model corresponding to the real face in the real face image according to the three-dimensional face parameters obtained by estimation.
In one embodiment, the computer device may further estimate the camera parameters according to the face feature points and the feature point mapping matrix, and then perform parameter estimation according to the face feature points and the feature point mapping matrix and the camera parameters to obtain the final three-dimensional face parameters. And then according to the camera parameters and the three-dimensional face parameters obtained by estimation, a three-dimensional simulated face model corresponding to the real face in the real face image is constructed.
In this embodiment, parameter estimation is performed from the two-dimensional face feature points and their feature point mapping matrix in the three-dimensional simulated face template, and three-dimensional face reconstruction is then carried out with the resulting three-dimensional face parameters, so that a three-dimensional simulated face model with real facial features can be constructed accurately.
In one embodiment, the step of performing parameter estimation based on the feature points of the human face and the feature point mapping matrix to obtain three-dimensional human face parameters includes: performing camera parameter iterative estimation based on the face characteristic points and the characteristic point mapping matrix, and obtaining camera parameters after a first iterative condition is met; and performing face parameter iterative estimation based on the face characteristic points, the characteristic point mapping matrix and the camera parameters, and obtaining face shape base parameters and face expression base parameters after a second iteration condition is met.
The method comprises the following steps of constructing a three-dimensional simulation face model corresponding to a real face image based on three-dimensional face parameters, wherein the steps comprise: and constructing a three-dimensional simulated face model corresponding to the real face image according to the camera parameters, the face shape base parameters and the face expression base parameters.
It can be understood that, similar to the three-dimensional simulated face template, the three-dimensional simulated face model has a three-dimensional face model with a topological structure, and both have the same topological structure. The three-dimensional face parameters comprise face shape base parameters, face expression base parameters and the like. The face shape base parameter may be used to control the long-phase of the generated face, and the face expression base parameter may be used to control the expression of the generated face.
An iteration is one pass of a repeated feedback process whose aim is usually to approach a desired goal or result. In computing, it may be a program or instruction that the computer device executes repeatedly, i.e., a loop in the program is executed until a certain condition is satisfied; each repetition of the process is called an "iteration", and the result of each iteration serves as the initial value of the next.
Iterative estimation repeatedly executes a series of operation steps using a preset algorithm or formula (for example, algorithms for solving systems of equations or matrix eigenvalues), solving each subsequent quantity from the preceding one; each result of the process is obtained by applying the same operation to the previous result. In this embodiment, the camera parameters, face shape base parameters, and facial expression base parameters among the three-dimensional face parameters are solved by iterative estimation.
When the computer equipment carries out parameter estimation, the camera parameters and the three-dimensional face parameters can be respectively estimated by adopting a step-by-step iterative estimation mode. It is understood that the step-by-step iterative estimation may refer to dividing an iterative estimation process into iterative estimation steps of different stages to respectively solve the required parameter values. For example, the step-and-iteration estimation may include a first-stage iteration estimation and a second-stage iteration estimation.
It will be appreciated that the computer device may first estimate the camera parameters by a first stage iterative estimation. And then estimating the facial shape base parameters and the facial expression base parameters through the second-stage iterative estimation.
Specifically, the computer device first performs a first-stage iterative estimation from the face feature points and the feature point mapping matrix to perform an iterative estimation of camera parameters. And obtaining the camera parameters after the first iteration condition is met. Wherein the first iteration condition refers to a condition for stopping iterative estimation of the camera parameters. For example, the first iteration condition may specifically be that a preset iteration number is reached, and may also be that a convergence value of the camera parameter reaches a preset convergence threshold value, and the like.
After the computer device obtains the camera parameters, it performs, in the second stage, iterative estimation of the face parameters from the face feature points, the feature point mapping matrix, and the camera parameters. The face shape base parameters and facial expression base parameters are obtained once a second iteration condition is satisfied. The second iteration condition is the condition for stopping the iterative estimation of the face parameters; for example, it may be reaching a preset number of iterations, or the convergence values of the face shape base parameters and facial expression base parameters reaching a preset convergence threshold.
The computer device then constructs the three-dimensional simulated face model corresponding to the real face image from the estimated camera parameters, face shape base parameters, and facial expression base parameters.
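The staged loop might be organized as below; the three solver callables are placeholders for the closed-form linear solves derived in the following paragraphs, and the fixed iteration counts stand in for the first and second iteration conditions:

```python
def fit_parameters(landmarks, mapping, solve_camera, solve_shape, solve_expr,
                   n_camera_iters=5, n_face_iters=5):
    """First stage: iterate the camera estimate until its stop condition
    (here a fixed iteration count); second stage: alternate the shape and
    expression solves, each treating the other groups as constants."""
    alpha_id = alpha_exp = None          # solvers treat None as all-zero
    cam = None
    for _ in range(n_camera_iters):      # first iteration condition
        cam = solve_camera(landmarks, mapping, alpha_id, alpha_exp)
    for _ in range(n_face_iters):        # second iteration condition
        alpha_id = solve_shape(landmarks, mapping, cam, alpha_exp)
        alpha_exp = solve_expr(landmarks, mapping, cam, alpha_id)
    return cam, alpha_id, alpha_exp
```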
In one embodiment, the objective function for fitting the three-dimensional face model may be expressed as:

$$E = \sum_{k=1}^{K} \left\| L_k - \left( S\,R\,\big(M + A_{id}\,\alpha_{id} + A_{exp}\,\alpha_{exp}\big)_k + T \right) \right\|^2 + \lambda \, \| p \|^2$$

where M represents the three-dimensional simulated face template, i.e., the average face; S, R, and T are the camera parameters, S being the scaling scale, R the rotation matrix, and T the translation factor, i.e., the displacement vector; id denotes shape and exp denotes expression; K is the number of face feature points and k indexes the k-th face feature point; $A_{id}$ denotes the shape basis and $\alpha_{id}$ the shape base coefficients; $A_{exp}$ denotes the expression basis and $\alpha_{exp}$ the expression base coefficients; $L_k$ denotes the 2D face feature points; p denotes the 3DMM coefficients $(\alpha_{id}, \alpha_{exp})$; and λ is a regularization factor used to avoid overfitting.
The quantities to be solved are the camera parameters, the face shape base parameters, and the facial expression base parameters. The objective function is a non-linear equation, and the parameters can be solved jointly by non-linear optimization, for example by computing the Jacobian matrix of the cost function or by the Gauss-Newton method.
In order to ensure the real-time performance of the algorithm, a step-by-step linear solving mode can be adopted, and the joint solving process is decomposed into a step-by-step solving process of three groups of parameters, namely camera parameters, human face shape base parameters and human face expression base parameters. When one group of parameters is solved, the other groups of parameters are set as constants, so that the solving of each group of parameters is a linear problem, and an analytic result can be directly obtained. The specific parameter estimation steps are as follows:
(1) Solving the camera parameters: the shape base parameters and expression base parameters are set as constants, e.g., their initial values are all set to zero. The camera model is simplified to a weak perspective projection model, which is applicable when the depth of the object itself is small relative to its distance from the camera; the model has only scale, translation, and rotation components. It is computed with the gold standard method: each 2D-3D correspondence contributes a 2×8 equation block, where $X_i$ denotes the homogeneous coordinates of the 3D point, $(u_i, v_i)$ the non-homogeneous coordinates of the corresponding 2D point, and $P_1$ and $P_2$ the first two rows of the 3×4 camera projection matrix:

$$\begin{pmatrix} X_i^{T} & \mathbf{0}^{T} \\ \mathbf{0}^{T} & X_i^{T} \end{pmatrix} \begin{pmatrix} P_1^{T} \\ P_2^{T} \end{pmatrix} = \begin{pmatrix} u_i \\ v_i \end{pmatrix}$$

From n face feature points, a 2n×8 system of equations is constructed; solving it yields the projection matrix P, i.e., the matrix projecting the three-dimensional face feature points to the two-dimensional face feature points, from which the S, R, and T components of the camera parameters can then be decomposed.
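In code, the 2n×8 system and the subsequent decomposition could be realized as follows (NumPy); the scale/rotation factorization shown is one standard way to split the recovered affine rows and is an assumption, not the patent's prescribed decomposition:

```python
import numpy as np

def solve_weak_perspective(X3d, x2d):
    """Gold-standard fit of the top two rows of the projection matrix: each
    2D-3D pair contributes a 2x8 block, giving a 2n x 8 linear system.
    X3d: (n, 3) model points; x2d: (n, 2) image points."""
    n = len(X3d)
    Xh = np.hstack([X3d, np.ones((n, 1))])     # homogeneous 3D points, (n, 4)
    A = np.zeros((2 * n, 8))
    A[0::2, :4] = Xh                           # rows for the u coordinates
    A[1::2, 4:] = Xh                           # rows for the v coordinates
    b = x2d.reshape(-1)                        # (u1, v1, u2, v2, ...)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    P = p.reshape(2, 4)                        # stacked rows [P1; P2]
    # Decompose into scale, rotation, translation: P[:, :3] are (up to scale)
    # the first two rows of R, and P[:, 3] is the 2D translation.
    s = 0.5 * (np.linalg.norm(P[0, :3]) + np.linalg.norm(P[1, :3]))
    r1 = P[0, :3] / np.linalg.norm(P[0, :3])
    r2 = P[1, :3] / np.linalg.norm(P[1, :3])
    R = np.vstack([r1, r2, np.cross(r1, r2)])
    T = P[:, 3]
    return s, R, T
```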
(2) Solving the face shape basis coefficients: the camera parameters and the expression basis parameters are fixed, i.e. set as constants. The camera parameters were obtained in the previous step, and the expression basis parameters are initialized to zero. The objective function then reduces to an energy function of the following form:

$$E = \sum_{k=1}^{K} \left\| L_k - \left( S \cdot R \cdot \left( M + A_{id}\,\alpha_{id} + A_{exp}\,\alpha_{exp} \right)_k + T \right) \right\|^2 + \lambda \left\| \alpha_{id} \right\|^2$$
Let x = α_id denote the unknown, while S, R, T, A_id, A_exp, α_exp and L_k are all known quantities. Then:

$$A = S \cdot R \cdot A_{id}$$

$$b = S \cdot R \cdot \left( M + A_{exp}\,\alpha_{exp} \right) + T - L$$
The energy function can then be expressed as:

$$E(x) = \left\| Ax + b \right\|^2$$
by adding a diagonal matrix Ω to Ax + b for weighting, the following energy function can be obtained:
$$E(x) = \left\| \Omega \left( Ax + b \right) \right\|^2 + \lambda \left\| x \right\|^2$$
wherein x represents the shape basis coefficients to be solved, A the result of transforming the shape basis according to the average face and the camera matrix, b the term assembled from the 2D face feature points, the average face, the expression term and the translation, Ω the weighting factor, and λ the regularization factor, included to avoid overfitting. The analytic result can be obtained by linear least squares.
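A sketch of this closed-form linear step, following the energy E(x) = ||Ω(Ax + b)||² + λ||x||² exactly as written above; the diagonal weighting Ω is assumed to be supplied as a vector, and all names are illustrative.

```python
import numpy as np

def solve_linear_step(A, b, omega, lam):
    Om = np.diag(omega)                          # diagonal weighting matrix Omega
    lhs = A.T @ Om @ A + lam * np.eye(A.shape[1])
    rhs = A.T @ Om.T @ b
    return -np.linalg.solve(lhs, rhs)            # x = -(A^T Omega A + lam I)^-1 (A^T Omega^T b)
```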
(3) Solving the facial expression basis parameters: the camera parameters and the face shape basis parameters are set as constants, and the solution proceeds in the same manner as for the shape basis. The closed-form solution is:

$$x = -\left( A^{T} \Omega A + \lambda I \right)^{-1} \left( A^{T} \Omega^{T} b \right)$$
and (4) solving the energy function obtained in the step (3) to obtain a result of the facial expression base parameter.
By iteratively executing steps (1)-(3) until the iteration condition is met, the camera parameters, the face shape basis parameters and the facial expression basis parameters are obtained.
In the process of iteratively executing steps (1) - (3), camera parameters may be iteratively estimated in a first iteration stage, and after a first iteration condition is met, camera parameters are obtained.
And then in a second iteration stage, carrying out face parameter iterative estimation according to the face characteristic points, the characteristic point mapping matrix and the obtained camera parameters, and obtaining face shape base parameters and face expression base parameters after meeting a second iteration condition.
In another embodiment, the face shape basis parameters and the facial expression basis parameters may also be estimated in different iteration stages. That is, in the first iteration stage, camera parameter iterative estimation is performed, and after the first iteration condition is satisfied, the camera parameters are obtained. Then, in the second iteration stage, face parameter iterative estimation is carried out according to the face feature points, the feature point mapping matrix and the obtained camera parameters, and after the second iteration condition is met, the face shape basis parameters are obtained. Further, in a third iteration stage, face parameter iterative estimation is carried out according to the face feature points, the feature point mapping matrix, the obtained camera parameters and the face shape basis parameters, and after the third iteration condition is met, the facial expression basis parameters are obtained.
In this embodiment, the camera parameters have the largest influence on the result of the overall parameter estimation, the face shape basis parameters influence the whole face, and the facial expression basis parameters influence facial sub-regions. Therefore, in the stepwise linear solving process, parameter estimation proceeds in the order camera parameters, face shape basis parameters, facial expression basis parameters, i.e. the parameters are updated step by step from the largest to the smallest influence on the final result, so that a more accurate three-dimensional simulated face model, more faithful to the real face, can be obtained.
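Putting the three steps together, the following skeleton illustrates the alternating order described above. It reuses the `solve_weak_perspective` and `solve_linear_step` sketches from earlier; `assemble_linear_system`, which freezes the other parameter groups into A, b and Ω, is a hypothetical helper left unimplemented, so this is an outline rather than runnable code.

```python
import numpy as np

def fit_3dmm(L, idx, M, A_id, A_exp, n_iter=5, lam=1e-3):
    a_id = np.zeros(A_id.shape[1])
    a_exp = np.zeros(A_exp.shape[1])
    for _ in range(n_iter):
        V = (M + A_id @ a_id + A_exp @ a_exp).reshape(-1, 3)
        s, R, t = solve_weak_perspective(V[idx], L)                 # step (1): camera
        A, b, om = assemble_linear_system(s, R, t, M, A_exp @ a_exp, A_id, L, idx)
        a_id = solve_linear_step(A, b, om, lam)                     # step (2): shape basis
        A, b, om = assemble_linear_system(s, R, t, M, A_id @ a_id, A_exp, L, idx)
        a_exp = solve_linear_step(A, b, om, lam)                    # step (3): expression basis
    return s, R, t, a_id, a_exp
```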
Fig. 3 is a schematic diagram of a real face image and a constructed corresponding three-dimensional simulated face model in an embodiment. Fig. 3 (a) is an image obtained by extracting human face feature points from a real face image, and fig. 3 (b) is a three-dimensional simulated face model constructed with real face features in the real face image.
In another embodiment, the facial shape base parameters and facial expression base parameters corresponding to the facial image can be determined in an artificial intelligence-based manner. For example, by inputting the face image into a trained deep neural network model, the deep neural network model can extract deep features of the face image, and obtain face shape base parameters and face expression base parameters based on the extracted deep features. Then, a three-dimensional simulated face model corresponding to the real face image can be constructed based on the face shape base parameters and the face expression base parameters output by the neural network model and based on the three-dimensional simulated face template of the 3DMM model.
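A hedged sketch of this learning-based alternative: a small convolutional network that regresses the shape and expression basis coefficients directly from the face image. The backbone choice and the coefficient dimensions are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CoeffRegressor(nn.Module):
    def __init__(self, n_id=80, n_exp=64):
        super().__init__()
        self.backbone = models.resnet18(weights=None)   # deep feature extractor
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_id + n_exp)
        self.n_id = n_id

    def forward(self, img):                             # img: (B, 3, H, W)
        coeffs = self.backbone(img)
        return coeffs[:, :self.n_id], coeffs[:, self.n_id:]  # (alpha_id, alpha_exp)

# alpha_id, alpha_exp = CoeffRegressor()(torch.randn(1, 3, 224, 224))
```

The regressed coefficients can then be plugged into the 3DMM template exactly as in the optimization-based path above.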
In one embodiment, the topological structure of the simulated human face model comprises a plurality of triangular faces; acquiring difference characteristics between the simulated face model and the three-dimensional simulated face template, wherein the difference characteristics comprise: acquiring each triangular surface in the simulated face model and a first deformation gradient on the corresponding triangular surface in the three-dimensional simulated face template; obtaining an affine transformation mapping matrix between each triangular surface in the simulated face model and each corresponding triangular surface in the three-dimensional simulated face template according to the first deformation gradient; and the affine transformation mapping matrix is used for representing the difference characteristics between the simulated face model and the three-dimensional simulated face template.
Wherein, the deformation gradient refers to the degree of change in the geometric topological structure of the human face. The first deformation gradient reflects the change characteristics of the three-dimensional simulated face model corresponding to the real face relative to the three-dimensional simulated face template.
It can be understood that the three-dimensional simulated face model, like the three-dimensional simulated face template, is a three-dimensional geometric model. The topological structure of such a geometric model comprises a large number of vertexes and the triangular faces formed by them. Generally, the greater the number of vertexes, the higher the accuracy of the three-dimensional face model. Three-dimensional simulated face models with various shapes and expressions can be simulated by deforming these triangular faces.
Usually, the topology of a geometric model is composed of a large number of tiny triangular patches. The topological structure of the three-dimensional simulated face template comprises a plurality of triangular faces, and so does the topological structure of the three-dimensional simulated face model. The two have the same topological structure, i.e. each triangular face in the three-dimensional simulated face model has a corresponding triangular face in the three-dimensional simulated face template. In one embodiment, the basic unit for the deformation migration process is the triangular face of the geometric model.
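A sketch of a per-triangle deformation gradient in the style of classic deformation transfer, where each triangle is extended with a fourth direction along its scaled normal and the gradient is the matrix mapping the source frame to the deformed frame. `tri_src` and `tri_def` are (3, 3) arrays of vertex coordinates; this formulation is a standard choice, not necessarily the patent's exact one.

```python
import numpy as np

def triangle_frame(tri):
    e1, e2 = tri[1] - tri[0], tri[2] - tri[0]     # two edge vectors
    n = np.cross(e1, e2)
    v4 = n / np.sqrt(np.linalg.norm(n))           # scaled normal as third direction
    return np.column_stack([e1, e2, v4])          # local 3x3 frame of the triangle

def deformation_gradient(tri_src, tri_def):
    # maps the source triangle's frame onto the deformed triangle's frame
    return triangle_frame(tri_def) @ np.linalg.inv(triangle_frame(tri_src))
```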
For example, fig. 4 is a schematic diagram of the topology structure of a three-dimensional simulated face template in an embodiment. Fig. 4 includes a topology structure diagram (a) of the three-dimensional simulated face template, one triangular face (a1) in that topology, and the triangular face (b1) obtained after the triangular face (a1) is deformed. From the topology diagram (a) it can be seen that the topological structure of the three-dimensional simulated face template includes a plurality of vertexes; any three connected vertexes form a triangular face, and the whole topology can be regarded as composed of a plurality of triangular faces. Because the three-dimensional simulated face template is a deformable geometric model, in the process of reconstructing the simulated face model corresponding to the real face based on the template, the triangular faces in its topology can be considered to deform according to the characteristics of the real face, yielding the simulated face model corresponding to the real face. In fig. 4, the vertexes of one triangular face (a1) in the topology of the three-dimensional simulated face template are denoted $v_1$, $v_2$, $v_3$. By deforming the triangular face (a1), the triangular face (b1) with corresponding vertexes $\tilde{v}_1$, $\tilde{v}_2$, $\tilde{v}_3$ is obtained; both the shape of the triangular face and the positions of the corresponding vertexes change. Three-dimensional simulated face models with various shapes and expressions can be simulated and reconstructed by deforming each triangular face in the three-dimensional simulated face template.
As shown in fig. 5, fig. 5 includes a three-dimensional simulated face template (a), and a three-dimensional simulated face model (B) corresponding to the real face image and constructed based on the three-dimensional simulated face template (a). The simulated face model (B) constructed in the way not only has the characteristics of the three-dimensional simulated face template (A), but also has the real face characteristics in the real face image.
Because the three-dimensional simulated face model corresponding to the real face is generated by modeling based on the three-dimensional simulated face template, the three-dimensional simulated face model has the real face features in the real face image and the average face features in the three-dimensional simulated face template, and comprises important information such as facial features, contour features and the like.
Therefore, the three-dimensional cartoon face with the real face features of the user can be obtained by extracting the difference features between the three-dimensional simulated face model and the three-dimensional simulated face template, namely extracting the difference features between the three-dimensional face of the user and the average three-dimensional face, and then transferring the extracted difference features to the preset three-dimensional cartoon face template, namely the average cartoon face.
Specifically, after the computer device constructs a three-dimensional simulated face model corresponding to the real face image, each triangular face in the simulated face model and a first deformation gradient on the corresponding triangular face in the three-dimensional simulated face template are obtained according to the corresponding relationship between the three-dimensional simulated face model and the topological structure of the three-dimensional simulated face template.
Further, the triangular faces in the topological structures of the two models are related proportionally: what is extracted is the proportional mapping relation, not a difference (subtraction) relation, between the triangular faces of the user's simulated face model and the corresponding triangular faces of the three-dimensional simulated face template. Between each pair of corresponding triangular faces a triangle mapping relation, namely the first deformation gradient, is obtained. The first deformation gradient reflects the shape changes of the triangular face in spatial angle, edge length and so on, and thus reflects the shape changes of the constructed simulated face model.
And the computer equipment obtains an affine transformation mapping matrix between each triangular surface in the simulated face model and each corresponding triangular surface in the three-dimensional simulated face template according to the first deformation gradient between each triangular surface corresponding to the two three-dimensional models respectively, wherein the affine transformation mapping matrix comprises the mapping relation between all the triangular surfaces in the simulated face model and the three-dimensional simulated face template and the first deformation gradient. And the obtained affine transformation mapping matrix is used for representing the difference characteristics between the simulated face model and the three-dimensional simulated face template.
In the embodiment, the affine transformation mapping matrix is obtained by directly calculating the deformation gradient of the corresponding triangular surface between the simulated human face model of the user and the three-dimensional simulated human face template, so that the change characteristics of the simulated human face model relative to the three-dimensional simulated human face template can be effectively extracted, and the difference characteristics between the simulated human face model and the three-dimensional simulated human face template can be accurately extracted.
In one embodiment, the topological structure of the three-dimensional cartoon face template comprises a plurality of triangular faces; based on the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template, the difference features are transferred to the three-dimensional cartoon face template to obtain a three-dimensional cartoon face model, and the method comprises the following steps: aligning the simulated face model and the three-dimensional cartoon face template based on the semantic mapping relation between the three-dimensional simulated face template and the three-dimensional cartoon face template; searching triangular surfaces matched with the triangular surfaces in the simulated human face model in the aligned triangular surfaces in the three-dimensional cartoon human face template to obtain triangular surface mapping relations between the triangular surfaces in the simulated human face model and the triangular surfaces in the three-dimensional cartoon human face template; and transferring the difference characteristics to a three-dimensional cartoon face template according to the triangular surface mapping relation to obtain a three-dimensional cartoon face model.
It can be understood that, similar to the three-dimensional simulated face template, the three-dimensional cartoon face template is also a three-dimensional geometric topological structure model. But the corresponding topological structures of the three-dimensional simulation face template and the three-dimensional cartoon face template are different. The topological structure of the three-dimensional cartoon face template also comprises a large number of vertexes, and a plurality of triangular surfaces formed by the large number of vertexes, namely triangular surfaces. The three-dimensional simulation face template and the three-dimensional cartoon face template do not need to have the same number of vertexes, triangular surfaces or the same connection mode between points. The three-dimensional cartoon faces with various shapes and expressions can be simulated by deforming each triangular surface in the three-dimensional cartoon face template.
Because the semantic mapping relation between the topological structures of the three-dimensional simulated face template and the three-dimensional cartoon face template is pre-established, and the reconstructed simulated face model and the three-dimensional simulated face template have the same topological structure, the corresponding relation between the topological structures of the simulated face model and the three-dimensional cartoon face template can be obtained according to the semantic mapping relation between the three-dimensional simulated face template and the three-dimensional cartoon face template.
After the computer equipment extracts the difference characteristics between the three-dimensional simulation face model and the three-dimensional simulation face template of the user, the preset three-dimensional cartoon face template can be deformed by using the extracted difference characteristics according to the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template, so that the difference characteristics are transferred to the three-dimensional cartoon face template, and therefore the three-dimensional cartoon face with the real face characteristics of the user can be obtained.
Specifically, the computer device firstly obtains the semantic corresponding relation between the triangular surface in the simulated human face model and the triangular surface in the three-dimensional cartoon human face template according to the semantic mapping relation between the three-dimensional simulated human face template and the three-dimensional cartoon human face template, and further deforms the simulated human face model and the three-dimensional cartoon human face template to be consistent according to the semantic alignment principle.
For example, fig. 6 is a schematic diagram of aligning the simulated face model with the three-dimensional cartoon face template in one embodiment. The three-dimensional simulated face template (A) in fig. 6 has the same topological structure as the simulated face model (B), while the simulated face model (B) and the three-dimensional cartoon face template (C) have different topological structures; for example, one of them models the back of the head and the other does not, and the triangular-face correspondence between the simulated face model (B) and the three-dimensional cartoon face template (C) is unknown, so the two models need to be aligned first. Therefore, according to the preset semantic mapping relation between the three-dimensional simulated face template (A) and the three-dimensional cartoon face template (C), the simulated face model (B) and the three-dimensional cartoon face template (C) are semantically deformed and aligned to obtain the aligned three-dimensional cartoon face template (C1).
The computer device then searches, among the triangular faces of the aligned three-dimensional cartoon face template, for the triangular face matching each triangular face in the simulated face model. The matched triangular faces may be the two triangular faces that are most adjacent or semantically closest, i.e. that conform to the principle of semantic consistency. In this way the triangular-face mapping relation between each triangular face in the simulated face model and a triangular face in the three-dimensional cartoon face template can be obtained effectively. This triangular-face mapping relation is the correspondence through which the changes of the simulated face model are carried over to the three-dimensional cartoon face template.
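A sketch of this matching step after alignment: for each triangle of the simulated face model, the nearest triangle of the aligned cartoon template is found by centroid distance. A KD-tree is one simple way to realize "most adjacent"; the patent does not prescribe the search method.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_triangles(verts_sim, faces_sim, verts_cartoon, faces_cartoon):
    c_sim = verts_sim[faces_sim].mean(axis=1)          # (F_sim, 3) triangle centroids
    c_car = verts_cartoon[faces_cartoon].mean(axis=1)  # (F_car, 3) triangle centroids
    _, nearest = cKDTree(c_car).query(c_sim)           # index of the closest cartoon triangle
    return nearest                                     # mapping: sim triangle -> cartoon triangle
```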
The computer equipment can further perform deformation migration processing on the three-dimensional cartoon face template according to the triangular surface mapping relation and the difference characteristics, so that the difference characteristics are migrated to the three-dimensional cartoon face template, and therefore the three-dimensional cartoon face model with the real face characteristics can be effectively obtained.
For example, the difference at the nose between the reconstructed three-dimensional simulated face model and the three-dimensional simulated face template, such as a large nose or a high nose bridge, is applied to the nose of the three-dimensional cartoon face template, so that an equivalent change of the simulated face model's nose is realized on the cartoon face. In this way the difference features can be transferred to the three-dimensional cartoon face template semantically and accurately.
In this embodiment, a semantic mapping relationship is preset between the three-dimensional simulated face template and the three-dimensional cartoon face template, and then the semantic correspondence between the triangular face in the simulated face model and the triangular face in the three-dimensional cartoon face template can be obtained according to the semantic mapping relationship, so that the simulated face model and the three-dimensional cartoon face template can be accurately deformed to be consistent according to the semantic mapping relationship and the semantic alignment principle.
In one embodiment, migrating the difference features to a three-dimensional cartoon face template according to a triangular surface mapping relationship to obtain a three-dimensional cartoon face model, including: according to the triangular surface mapping relation and the affine transformation mapping matrix, carrying out deformation processing on an original triangular surface in the three-dimensional cartoon face template, and determining a second deformation gradient between the original triangular surface and the deformed triangular surface in the three-dimensional cartoon face template; performing iterative deformation processing on each triangular surface in the aligned three-dimensional cartoon face template according to a triangular surface mapping relation and an affine transformation mapping matrix in the direction of minimizing the difference between the second deformation gradient and the first deformation gradient; and obtaining the three-dimensional cartoon face model after the iteration stop condition is met.
The iterative deformation processing refers to performing deformation processing on the three-dimensional cartoon face template for multiple times to continuously optimize the cartoon face model, so that the final three-dimensional cartoon face model is obtained after corresponding iteration stop conditions are met.
It can be understood that the second deformation gradient is a deformation gradient between the original triangular surface and each deformed triangular surface in the three-dimensional cartoon face template in the process of deforming the triangular surface in the three-dimensional cartoon face template according to the obtained difference characteristics.
And the computer equipment aligns the simulated face model with the three-dimensional cartoon face template, acquires triangular faces in the simulated face model, and performs deformation processing on the original triangular faces in the three-dimensional cartoon face template according to the triangular face mapping relation and the affine transformation mapping matrix after acquiring the triangular face mapping relation between the triangular faces in the simulated face model and the triangular faces in the three-dimensional cartoon face template. Namely, according to the triangular surface mapping relation, according to a first deformation gradient corresponding to the affine transformation mapping matrix, deforming the corresponding original triangular surface in the three-dimensional cartoon face template, wherein the original triangular surface in the three-dimensional cartoon face template can be changed during deformation, and the computer equipment determines a second deformation gradient between the original triangular surface and the deformed triangular surface in the three-dimensional cartoon face template.
It can be understood that the process of reconstructing the three-dimensional cartoon face is a continuous optimization process, so that the computer device needs to perform iterative deformation processing to optimize the three-dimensional cartoon face model. Specifically, after the computer device determines the second deformation gradient, the computer device performs iterative deformation processing on each triangular surface in the aligned three-dimensional cartoon face template according to the triangular surface mapping relation and the affine transformation mapping matrix in the direction of minimizing the difference between the second deformation gradient and the first deformation gradient. And after the iteration stop condition is met, a final three-dimensional cartoon face model can be obtained.
The iteration stop condition may specifically be a preset number of iterations, or the difference between the second deformation gradient and the first deformation gradient reaching a preset difference threshold, or that difference no longer decreasing.
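A hedged sketch of this iterative deformation step: the cartoon template's vertices are optimized so that the second deformation gradient of each matched triangle approaches the corresponding first deformation gradient extracted from the user's model. Autodiff gradient descent stands in for whatever solver an implementation actually uses; `target_grads` holds the first deformation gradients gathered through the triangle mapping, and `src_frames_inv` the inverted frames of the original cartoon template triangles. The stopping criteria follow the text above.

```python
import torch

def transfer_deformation(v_cartoon, faces, target_grads, src_frames_inv,
                         n_iter=200, lr=1e-2, tol=1e-6):
    v = v_cartoon.clone().requires_grad_(True)
    opt = torch.optim.Adam([v], lr=lr)
    prev = float("inf")
    for _ in range(n_iter):
        e1 = v[faces[:, 1]] - v[faces[:, 0]]
        e2 = v[faces[:, 2]] - v[faces[:, 0]]
        n = torch.cross(e1, e2, dim=1)
        v4 = n / torch.linalg.norm(n, dim=1, keepdim=True).sqrt()
        frames = torch.stack([e1, e2, v4], dim=2)      # per-triangle deformed frames
        grads = frames @ src_frames_inv                # second deformation gradients
        loss = ((grads - target_grads) ** 2).sum()     # drive them toward the first gradients
        opt.zero_grad(); loss.backward(); opt.step()
        if abs(prev - loss.item()) < tol:              # difference no longer decreasing
            break
        prev = loss.item()
    return v.detach()
```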
Fig. 7 is a schematic diagram illustrating migration of the difference features to the three-dimensional cartoon face template in one embodiment. In fig. 7, the difference between the three-dimensional simulated face template (A) and the three-dimensional simulated face model (B) is obtained. Because the three-dimensional simulated face template (A) and the three-dimensional cartoon face template (C) have a preset semantic mapping relation, the affine transformation mapping matrices of the triangular faces between the three-dimensional simulated face template (A) and the simulated face model (B), namely the difference features, are obtained. The three-dimensional cartoon face template (C) is then deformed according to these difference features to obtain the three-dimensional cartoon face model (D). By driving the second deformation gradient between the three-dimensional cartoon face template (C) and the three-dimensional cartoon face model (D) to be consistent with the first deformation gradient between the three-dimensional simulated face template (A) and the simulated face model (B), the difference between (A) and (B) can be considered transferred to the three-dimensional cartoon face template (C), yielding the three-dimensional cartoon face model (D) corresponding to the real face image.
In this embodiment, the triangular surfaces in the three-dimensional cartoon face template are subjected to iterative deformation processing according to a preset semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template and affine transformation mapping matrices of the triangular surfaces in the three-dimensional cartoon face template and the simulated face model, so that a difference between a second deformation gradient of the triangular surfaces in the three-dimensional cartoon face template and a first deformation gradient corresponding to the affine transformation mapping matrices is minimized, and thus difference characteristics can be accurately migrated to the three-dimensional cartoon face template, and a three-dimensional cartoon face model with real face characteristics can be effectively obtained.
In one embodiment, extracting human face feature points and human image features from a real human face image comprises: extracting key points of five sense organs, contour key points and semantic information respectively corresponding to the key points and the contour key points from a real face image; obtaining face characteristic points according to the key points of the five sense organs, the key points of the outline and the semantic information respectively corresponding to the key points; and extracting the image characteristics of the real face image based on the face characteristic points.
The key points of the five sense organs refer to key points corresponding to the parts of the five sense organs in the human face, for example, the key points of the five sense organs include key points corresponding to at least one part of eyebrows, eyes, noses, lips, and chin. The contour key points refer to key points of the whole contour of the human face, for example, key points corresponding to the mandible contour of the human face. Semantic information corresponding to the key points of the five sense organs and the key points of the outline respectively refers to geometric feature information among the key points, such as distance, area, angle and the like.
After the real face image is obtained by the computer equipment, extracting key points of five sense organs, key points of outlines and semantic information respectively corresponding from the real face image so as to obtain the face characteristic points in the real face image. Specifically, the computer device may extract the face feature points in the real face image using a pre-trained face detection network or using a preset face feature point detection algorithm.
In one embodiment, the computer device may train a face feature extraction network in advance using sample images containing faces. The face feature extraction network may adopt a machine learning model based on a deep network such as a CNN (Convolutional Neural Network), ResNet (deep residual network), DenseNet (densely connected convolutional network), or DPN (Dual Path Network). The face feature extraction network in this embodiment improves its robustness, real-time performance and feature extraction accuracy through deep network structure optimization, network size pruning, sample image quality improvement and other aspects.
Specifically, the computer device may perform face feature extraction on the real face image through the trained face feature extraction network; for example, 256 face feature points may be extracted, each carrying specific semantic information. For instance, a point x may represent the position of the nose in the face, and a point y the position of an eyeball. Performing face feature extraction with a highly accurate pre-trained network yields reliable face point positions, i.e. the face feature points, from the real face image. The extracted face points enable accurate analysis of the face and are an important preceding step for the subsequent three-dimensional simulated face reconstruction and attribute analysis. As shown in fig. 8, by extracting the face feature points from the real face image (8a), the face feature point map (8b) is obtained. It can be understood that the number of face feature points actually extracted is not limited to the number shown in the face feature point map (8b). It can also be understood that the eyes in the real face image (8a) are masked to protect the privacy of the user's real face image; when the feature points are extracted, the eye feature points are actually collected, and the face feature point map (8b) also includes them.
Further, after extracting the face feature points from the real face image, the computer device further extracts the image features in the real face image based on the extracted face feature points. Specifically, the computer device may extract the image features of the corresponding region in the real face image according to the position distribution of the face feature points. For example, the typical image features include some image features of different regions in the human face, so the image features of the corresponding regions can be extracted according to the position distribution of the human face feature points.
In the embodiment, after the face feature points and the corresponding semantic information in the real face image are extracted, the image features of the real face image are further extracted according to the face feature points, so that the image features in the real face image can be more accurately extracted.
In one embodiment, the image features include native image features and additional image features; extracting the image features of the real face image based on the face feature points comprises: performing face alignment processing on the real face image to obtain an aligned real face image; identifying native image features from the aligned real face image based on the face feature points; and identifying additional image features from the aligned real face image based on the face feature points.
It can be understood that there may be situations such as face inclination in the original real face image relative to the image, that is, situations where the face in the real face image is not a frontal face. The face alignment processing means that the face in the original real face image is corrected, so that the face in the aligned real face image is a front face relative to the image.
The native image feature refers to an image feature of a real face in the real face image, for example, a feature related to an image, such as a hair, a birthmark, a mole, and the like, of a user in the real face image. The additional image feature refers to a feature corresponding to an additional accessory worn by a user in the real face image, and the additional image feature in the real face image is a feature of the accessory included in the face region and the region near the face. For example, additional character features include features of accessories such as glasses, hats, earrings, and the like.
After extracting the face feature points from the real face image, the computer device performs face alignment processing on the original real face image according to the face feature points. Specifically, the alignment may be carried out according to alignment key points or face feature points in the face, so as to correct the position of the face in the real face image and obtain the aligned real face image.
In an embodiment, as shown in fig. 9, after the face alignment processing is performed on the real face image (9a), an image (9b) after the face alignment processing can be obtained, so that it can be seen that the face in the image (9b) after the face alignment processing is a front face relative to the image, and therefore, the image features in the real face image can be extracted more accurately.
In one embodiment, the computer device may align the original real face image during the process of extracting the face feature points. Specifically, the position of the face in the original real face image may be corrected according to the alignment key points in the face or the initial face feature points, and the face feature points are then extracted further, so that the face feature points in the real face image can be extracted more accurately.
After the real face image is aligned by the computer equipment, the native image features are identified from the aligned real face image based on the face feature points. Meanwhile, the computer equipment also identifies additional image features from the aligned real human face images according to the human face feature points.
Specifically, the computer device extracts, from the position distribution of the face feature points, the native image features of the corresponding region in the real face image, for example, the native image features and the additional image features in the region within and around the face region.
In the embodiment, after the original real face image is subjected to face alignment, the original image features and the additional image features in the real face image can be extracted more accurately according to the position distribution of the face feature points.
In one embodiment, the native image features include a hair style feature; identifying the hair style feature from the aligned real face image based on the face feature points comprises: extracting hair features from the aligned real face image through a trained hair style recognition network, and generating a hair mask map according to the hair features; dividing the hair mask map into at least two mask map sub-regions based on the face feature points; and obtaining the hair style feature according to the distribution of the hair features over the at least two mask map sub-regions.
It is to be understood that the native image features include hair styling features, wherein the hair styling features include hair styling categories, for example, hair styling categories including short, medium, long, and the like. More specifically, the gender characteristics of the human face can be further identified, and different types of hair style characteristics can be further distinguished along with the gender characteristics. For example, if the gender characteristic of the real face is female, the hair style category includes short hair, medium hair, long hair, and straight hair, curly hair, etc. If the gender characteristic of the real face is male, the hair style category comprises extra short hair, medium hair and the like.
The computer equipment can train the hair style identification network by utilizing a large number of sample images in advance, so that the trained hair style identification network has the capability of identifying hair style characteristics in the real face image.
Specifically, the computer device extracts the face feature points from the real face image, performs face alignment processing on the original real face image, inputs the aligned real face image into the trained hair style recognition network, extracts the hair features from the aligned real face image through the hair style recognition network, and generates the hair mask map according to the extracted hair features.
The computer device further divides the hair mask image into at least two mask image sub-areas according to the face feature points. For example, the face region may be divided according to the position distribution of the face feature points in the face, so that the whole hair mask is divided into a plurality of mask sub-regions. The computer device obtains the hair style characteristics according to the distribution of the hair characteristics in the plurality of mask image sub-areas.
In one embodiment, the computer device may further divide the face region according to the position distribution of the face feature points in the face, and then input the image obtained by dividing the face region into the trained hair style recognition network. After the computer equipment extracts the hair mask image through the hair style identification network, the hair mask image can be directly divided into a plurality of mask image sub-areas according to the division marks for dividing the face area.
In one embodiment, as shown in fig. 10, the image is obtained by performing facial feature point extraction and facial alignment processing on a real facial image and dividing a facial region. The image shown in fig. 10 includes extracted human face feature points, that is, key points corresponding to at least one of eyebrows, eyes, nose, lips, chin, and mandible line in the human face. The computer device divides the face region according to the position distribution of the face feature points in the face, specifically, trisecting the face region, and marking the face region by the division marks 10a and 10b in fig. 10, thereby dividing the image into three sub-regions.
After the computer device performs hair feature extraction on the image in fig. 10, as shown in fig. 11, the computer device generates a hair mask map 11 according to the extracted hair features. The white area represents the hair feature in the real face image, and the black area represents the background except the hair feature. The division marks 11a and 11b in fig. 11 correspond to the division marks 10a and 10b in fig. 10, and thus the hair mask can be divided into three mask sub-regions according to the division marks 11a and 11 b.
The computer device further determines a hair style characteristic based on a distribution of the hair characteristic over the plurality of mask map sub-regions. For example, in fig. 11, the hair mask image is divided into three mask image sub-regions according to a division mark 11a and a division mark 11b, where the division mark 11a may be specifically obtained by dividing according to the ear root key point, and the division mark 11b may be specifically obtained by dividing according to the chin key point. Thus, the portion above the division mark 11a is the mask map sub-region Q1, the portion between the division mark 11a and the division mark 11b is the mask map sub-region Q2, and the portion below the division mark 11b is the mask map sub-region Q3.
If the hair features in the hair mask map are mostly concentrated in the mask map sub-region Q1, with hardly any distributed in the sub-regions Q2 and Q3, the hair style feature is determined to be short hair. If the hair features are mostly distributed in Q1 but a portion also falls in Q2, the hair style feature is determined to be medium hair. Similarly, if the hair features are mostly distributed in Q1 and portions also fall in Q2 and Q3, the hair style feature is determined to be long hair. In one embodiment, if there is no hair feature at all in the hair mask map, the corresponding hair style feature can be set to short hair for aesthetic reasons.
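A sketch of this rule, assuming a binary hair mask `mask` (H, W) and two row indices `y_ear` and `y_chin` derived from the ear-root and chin feature points; the threshold `eps` is an illustrative assumption.

```python
import numpy as np

def classify_hair(mask, y_ear, y_chin, eps=0.02):
    total = mask.sum()
    if total == 0:
        return "short"                       # no hair detected: default to short
    q2 = mask[y_ear:y_chin].sum() / total    # fraction between ear root and chin (Q2)
    q3 = mask[y_chin:].sum() / total         # fraction below the chin line (Q3)
    if q3 > eps:
        return "long"                        # hair reaches below the chin
    if q2 > eps:
        return "medium"                      # hair reaches between ear root and chin
    return "short"                           # hair concentrated above the ears (Q1)
```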
In this embodiment, after the hair features are extracted through the trained hair style recognition network and the hair mask image is generated, the hair mask image is divided into at least two mask image sub-regions based on the face feature points, and then the hair style features in the real face image can be accurately recognized through the distribution of the hair features in the at least two mask image sub-regions.
In one embodiment, identifying additional character features from the aligned real face images based on the face feature points comprises: extracting an additional image area image from the aligned real face image according to the distribution position of the face feature points; identifying additional image categories of the additional image area images through the trained target classification network; and obtaining the additional image characteristics in the real face image according to the additional image categories.
The target classification network is a deep learning network trained in advance to classify accessories in the real face image. The target classification network can adopt the MobileNet lightweight network, whose basic unit is the depthwise separable convolution. In some embodiments, the target classification network may also adopt a machine learning model based on a deep network such as a CNN (Convolutional Neural Network), ResNet (deep residual network), DenseNet (densely connected convolutional network), or DPN (Dual Path Network).
The computer device may pre-train the target classification network with a large number of sample images, such that the trained target classification network has the ability to recognize additional image features in the real face images.
Specifically, the computer device extracts a face feature point from a real face image, and after performing face alignment processing on an original real face image, firstly extracts an accessory region image from the aligned real face image according to the distribution position of the face feature point. And then the computer equipment inputs the accessory region image into the trained target classification network, and further identifies the type of the accessory part in the accessory region image through the target classification network so as to obtain the additional image characteristic in the real face image according to the type of the accessory part.
In one embodiment, the additional image features include a glasses feature. For example, the glasses feature includes at least one of no glasses, black-framed glasses, metal-framed glasses, and the like. Fig. 12 is a schematic flow chart of classifying the glasses feature in one embodiment. Taking the real face image in fig. 9 as an example, after the face feature points are extracted from the real face image, the area around the eyes is cropped from the aligned face image according to the eye key points in the face region to obtain a glasses region image, which serves as the input image 12a. The computer device then resizes the glasses region image to a predetermined size, which may be 64x32, to obtain the resized image 12b. The resized image 12b is then input into the target classification network, specifically the MobileNet network 1202; features are extracted from the image 12b through the convolutional layers of the MobileNet network 1202, the category of the accessory is identified through the fully connected layer 1204, and the classification result 1206 is output. The classification result is the additional image category, which may be any one of no glasses, black-framed glasses and metal-framed glasses. The computer device further derives the additional image features in the real face image from the additional image category.
In one embodiment, the target classification network adopts the MobileNet lightweight network, whose basic unit is the depthwise separable convolution. The MobileNet lightweight network comprises a plurality of network layers, each with attribute information such as the layer type, network stride, filter shape and input image size. For example, the layer type with its stride may include at least one of Conv/s2, Conv dw/s1, Conv/s1, Conv dw/s2, Avg Pool/s1, FC/s1, Softmax/s1, and so on. The filter shape may include at least one of 3x3x3x32, 3x3x32 dw, 1x1x32 dw, 3x3x64 dw, 1x1x64x128, 3x3x128 dw, 1x1x64x128, 3x3x256 dw, 3x3x512 dw, 3x3x1024 dw, and the like. The input image size may include at least one of 64x32x3, 112x112x32, 56x56x64, 56x56x128, 28x28x256, 14x14x512, 7x7x1024, 1x1x1024, and the like. The input image size of the initial network layer of the MobileNet lightweight network may be 64x32x3. The FC layer outputs 3 categories, for example no glasses, black-framed glasses and metal-framed glasses.
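A sketch of the depthwise separable convolution that forms the MobileNet basic unit, plus a minimal 3-way head matching the glasses categories. The channel counts and depth here are illustrative; the layer table above is what the patent actually specifies.

```python
import torch
import torch.nn as nn

def dw_separable(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),  # depthwise (Conv dw)
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise (Conv 1x1)
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

glasses_net = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # Conv/s2 stem
    dw_separable(32, 64), dw_separable(64, 128, stride=2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 3),                  # FC: no glasses / black frame / metal frame
)
# logits = glasses_net(torch.randn(1, 3, 32, 64))   # glasses region resized to 64x32
```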
In one experimental test example, about ten thousand sample images were collected as a training set, including 3500 images without glasses, 3420 images with black-framed glasses and 3320 images with metal-framed glasses. The target classification network was trained on the sample images in this training set. In testing, the trained target classification network achieved an overall classification accuracy of 92% on the glasses categories, showing that a target classification network with high classification accuracy can be trained effectively.
In one embodiment, the image features include a skin tone feature, native image features and additional image features; performing image rendering on the cartoon face model according to the image features to generate the three-dimensional cartoon image comprises: extracting pixels of the skin color region based on the distribution positions of the face feature points, and obtaining the skin tone feature from these pixels; performing skin color rendering on the three-dimensional cartoon face model according to the skin tone feature; and obtaining matched image materials according to the native image features and the additional image features, and rendering these materials onto the skin-rendered three-dimensional cartoon face model to obtain a three-dimensional cartoon image having the real face features and image features.
The skin color feature refers to a feature of skin color of a real face in a real face image. For example, the skin tone feature may be represented by a pixel color feature or the shade of the skin tone, or the like.
During the extraction of the real face image, the computer device also extracts the pixels of the skin color region according to the distribution positions of the face feature points and obtains the skin tone feature from these pixels. Specifically, based on the eye key points and lip key points among the face feature points, the computer device may remove the eye and mouth regions that would affect the skin color computation, so as to extract the skin color region usable for the skin tone feature; that is, mainly the pixels of the cheek portion are counted. The computer device further compares the pixels of the skin color region with preset skin tone color cards, and the color card with the smallest pixel difference is taken as the skin tone feature of the real face image. Fig. 13 is a schematic diagram of skin tone color cards in one embodiment; different skin tone features may be distinguished by color depth. Fig. 13 shows the skin tone color cards for five different skin tone features.
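A sketch of this skin tone step: average the cheek pixels selected by the feature points (with the eye and mouth regions already excluded) and pick the nearest of a set of predefined color cards. The five card values below are illustrative placeholders, not the patent's actual palette.

```python
import numpy as np

SKIN_CARDS = np.array([[247, 223, 205], [238, 205, 179], [219, 179, 148],
                       [189, 143, 110], [141, 97, 72]], dtype=float)  # light -> dark (RGB)

def skin_tone(image, cheek_mask):
    mean_rgb = image[cheek_mask].mean(axis=0)              # average cheek pixel color
    dists = np.linalg.norm(SKIN_CARDS - mean_rgb, axis=1)  # pixel difference per card
    return int(dists.argmin())                             # index of the closest color card
```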
The computer equipment extracts image features, namely skin color features, original image features and additional image features, in the real face image, and after a corresponding three-dimensional cartoon face model is constructed, skin color rendering is carried out on the cartoon face model according to pixel colors corresponding to the skin color features.
The computer device further obtains the matched image material according to the original image characteristic and the additional image characteristic. Wherein, the database comprises image materials respectively corresponding to various original image characteristics and additional image characteristics. The computer equipment renders the image materials into a three-dimensional cartoon face model after skin color rendering, so that a three-dimensional cartoon image with face characteristics and image characteristics can be effectively obtained.
In a specific embodiment, taking the real face image in fig. 9 as an example, after the corresponding three-dimensional cartoon face model is constructed, the extracted image features are rendered onto the cartoon face model to obtain the three-dimensional cartoon image. Fig. 14 shows the effect diagram 14(C1) of the three-dimensional cartoon image obtained by three-dimensional cartoonized reconstruction of the real face image 14(R1). The native image features of the real face in the real face image 14(R1) include the skin tone feature, the hair style feature and a mole mark feature, and the additional image features include a glasses feature. The effect diagram 14(C1) of the constructed three-dimensional cartoon image therefore also includes image features corresponding respectively to the skin tone, hair style, mole mark and glasses of the real face.
As shown in fig. 15, an effect diagram of a three-dimensional cartoon image is obtained after three-dimensional cartoon reconstruction is performed on a real face image in other embodiments. Fig. 15 shows three groups of real face images, namely, the real face image 15(R1), the real face image 15(R2), and the real face image 15(R3), respectively, and after three-dimensional cartoonized reconstruction is performed on the real images, corresponding three-dimensional cartoon image 15(C1), three-dimensional cartoon image 15(C2), and three-dimensional cartoon image 15(C3) are obtained, respectively.
In the embodiment, the skin color feature, the original image feature and the image feature corresponding to the additional image feature in the real face image are extracted, and after the three-dimensional cartoon face model corresponding to the real face is constructed, the colors corresponding to the skin color feature and the materials corresponding to the original image feature and the additional image feature are rendered on the cartoon face model, so that the three-dimensional cartoon image with high precision and the real face feature and the image feature can be constructed, and the similarity between the three-dimensional cartoon image and the real face is effectively improved.
In one embodiment, before extracting the face feature points and the image features from the real face image, the method further comprises: displaying a cartoon image selection interface, wherein the cartoon image selection interface comprises three-dimensional image options; collecting a real face image of a user in response to the selection operation of the three-dimensional image option; after the cartoon face model is subjected to image rendering according to the image characteristics and a three-dimensional cartoon image is generated, the cartoon processing method of the face image further comprises the following steps: and displaying the image preview interface, and displaying the real human face image and the three-dimensional cartoon image in the image preview interface.
The computer equipment can be a terminal, an application capable of realizing the three-dimensional cartoon image reconstruction function runs in the terminal, and the application comprises a cartoon image selection interface. It is to be understood that the avatar selection interface is an interface for instructing a user to select the type of avatar. For example, the cartoon character selection interface comprises a two-dimensional character option and a three-dimensional character option.
And after the user selects the option of the corresponding cartoon image type in the cartoon image selection interface, displaying the image acquisition interface to acquire the real face image of the user. Specifically, after the user selects the three-dimensional image option in the cartoon image selection interface, the terminal responds to the selection operation aiming at the three-dimensional image option, and then the real face image of the user is obtained. The terminal may further specifically display an image acquisition interface, and the image acquisition interface may specifically be a local image selection interface or an image capturing interface. The local image selection interface is used for acquiring the existing real face image from a local database of the terminal. The image shooting interface is used for acquiring real face images of the user in real time.
After the terminal acquires the real face image, firstly extracting face characteristic points and image characteristics from the real face image, then constructing a three-dimensional simulated face model corresponding to the real face image based on the face characteristic points, and acquiring difference characteristics between the simulated face model and the three-dimensional simulated face template. And further migrating the difference characteristics to the three-dimensional cartoon face template according to the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template, thereby obtaining a three-dimensional cartoon face model. And finally, performing image rendering on the cartoon face model according to the extracted image characteristics to obtain a three-dimensional cartoon image with the real face characteristics and the image characteristics of the real face, so that the three-dimensional cartoon image more similar to the real face can be accurately and effectively constructed.
After the terminal obtains the three-dimensional cartoon image corresponding to the real face image, it displays the image preview interface and displays the original real face image and the constructed three-dimensional cartoon image in the image preview interface, so that the similarity between the real face image and the three-dimensional cartoon image can be effectively compared in the image preview interface.
In a specific embodiment, FIG. 16 is a schematic diagram of a cartoon image selection interface, for example the cartoon image selection interface in a "centimeter show" application. The cartoon image selection interface includes a two-dimensional cartoon image display area 2D and a three-dimensional cartoon image display area 3D; a two-dimensional image option 2Da is displayed in the two-dimensional cartoon image display area 2D, and a three-dimensional image option 3Db is displayed in the three-dimensional cartoon image display area 3D.
An image shooting interface is displayed after the three-dimensional image option 3Db is selected. FIG. 17 is a schematic diagram of an image shooting interface in one embodiment. After the terminal displays the image shooting interface, the real face image of the current user is captured through the camera of the terminal, and the face detection frame 17a is used to detect whether a face is present in the image shooting interface. When a face is present in the current picture, the real face image of the current user is collected automatically, and the three-dimensional cartoon image construction processing is then performed.
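A minimal sketch of this auto-capture check follows. The disclosure does not specify a detector, so OpenCV's bundled Haar-cascade face detector is used purely as an illustrative stand-in for whatever drives the face detection frame 17a.

```python
import cv2

# Illustrative stand-in for the detector behind face detection frame 17a;
# OpenCV's stock Haar cascade is an assumption, not the disclosed detector.
_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def try_auto_capture(frame_bgr):
    """Return the frame when a face is present, else None (no capture)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return frame_bgr if len(faces) > 0 else None
```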
An image preview interface is displayed after the three-dimensional cartoon image corresponding to the real face image is constructed. FIG. 18 is a schematic diagram of an image preview interface in one embodiment. The image preview interface includes a three-dimensional cartoon image display area 18a and an original real face image display area 18b. The image preview interface may also include a finish option for saving the currently generated three-dimensional cartoon image.
Furthermore, after the three-dimensional cartoon image corresponding to the real face image is constructed and stored, the user may perform customized adjustment on the generated three-dimensional cartoon image. Specifically, the terminal may display a three-dimensional cartoon image adjustment interface in which the three-dimensional cartoon image can be adjusted in a customized manner. FIG. 19 is a schematic diagram of a three-dimensional cartoon image adjustment interface according to an embodiment. The three-dimensional cartoon image adjustment interface includes an adjustment preview area 19a and a material selection area 19b. The material selection area 19b includes materials corresponding to various face parts, image features, and the like; the user can select a material in the material selection area 19b and add it to the corresponding part of the three-dimensional cartoon image, or replace the material of the corresponding part. Customized adjustment of the automatically generated three-dimensional cartoon image can thus be realized, which effectively improves the editability and adaptability of the three-dimensional cartoon image.
In another application scenario, the computer device may further perform three-dimensional cartoon image construction on real face images corresponding to consecutive video frames. After extracting the human face characteristic points and the image characteristics from the real human face image, the computer device constructs a three-dimensional simulated human face model corresponding to the real human face image based on the human face characteristic points. And then, by acquiring the difference characteristics between the simulated face model and the three-dimensional simulated face template, migrating the difference characteristics into the three-dimensional cartoon face template according to the preset semantic mapping relation between the three-dimensional simulated face template and the three-dimensional cartoon face template, so as to obtain the three-dimensional cartoon face model. And then performing image rendering on the cartoon face model according to the image characteristics so as to render the image characteristics in the real face image into the cartoon face model, thereby generating a three-dimensional cartoon image which is more similar to the real face in the real face image.
Further, after the computer device constructs the three-dimensional cartoon image corresponding to the real face image of the initial frame, it only needs to adjust the constructed three-dimensional cartoon image according to the face pose and expression in the real face images of subsequent frames. Specifically, the computer device compares the camera parameters, face shape parameters, and face expression base parameters of the real face image in a subsequent frame with those of the real face image in the initial frame; if any of these parameters has changed, the corresponding parameters of the cartoon face model in the three-dimensional cartoon image are adjusted directly according to the changed camera, face shape, and face expression base parameters, so that the three-dimensional cartoon image stays consistent with the pose and expression of the corresponding real face image.
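A minimal sketch of this per-frame adjustment is shown below, assuming the fitted parameters are exposed as named arrays and that the cartoon model has a hypothetical set_parameters method; both assumptions are for illustration only.

```python
import numpy as np

def update_cartoon_for_frame(cartoon_model, init_params, frame_params, atol=1e-4):
    """Re-drive only the parameters that changed between the initial frame
    and the current frame; set_parameters is a hypothetical model API."""
    for key in ("camera", "shape_basis", "expression_basis"):
        if not np.allclose(init_params[key], frame_params[key], atol=atol):
            # A changed camera / shape / expression fit is pushed directly
            # onto the cartoon face model, so no full rebuild is needed.
            cartoon_model.set_parameters(key, frame_params[key])
    return cartoon_model
```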
For example, successive video frames, i.e. a video including a real face, may be presented in the interface of the terminal. And after the terminal generates the three-dimensional cartoon image corresponding to the real face in each video frame, displaying the video including the real face and the corresponding three-dimensional cartoon image in the interface at the same time. Therefore, the cartoon processing of the face image can be efficiently carried out on the real face image of the video frame, and the three-dimensional cartoon image similar to the real face can be quickly and accurately constructed.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, these steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 20, a face image cartoonization processing apparatus 2000 is provided, which may be implemented, as all or part of a computer device, by software modules, hardware modules, or a combination of the two. The apparatus specifically includes: a feature extraction module 2002, a three-dimensional face reconstruction module 2004, a difference feature extraction module 2006, a difference feature migration module 2008, and a three-dimensional cartoon image generation module 2010, wherein:
a feature extraction module 2002 for extracting human face feature points and image features from the real human face image;
a three-dimensional face reconstruction module 2004 for constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
a difference feature extraction module 2006, configured to obtain difference features between the simulated face model and the three-dimensional simulated face template;
the difference feature migration module 2008 is configured to migrate the difference features to the three-dimensional cartoon face template based on a semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template to obtain a three-dimensional cartoon face model;
and the three-dimensional cartoon image generation module 2010 is used for performing image rendering on the cartoon human face model according to the image characteristics to generate a three-dimensional cartoon image.
In one embodiment, the face feature points are two-dimensional face feature points; the three-dimensional face reconstruction module 2004 is further configured to obtain a feature point mapping matrix of two-dimensional face feature points in the three-dimensional simulated face template; performing parameter estimation based on the face characteristic points and the characteristic point mapping matrix to obtain three-dimensional face parameters; and constructing a three-dimensional simulated face model corresponding to the real face image based on the three-dimensional face parameters.
In one embodiment, the three-dimensional face reconstruction module 2004 is further configured to perform iterative estimation of camera parameters based on the face feature points and the feature point mapping matrix, and obtain camera parameters after a first iteration condition is satisfied; performing face parameter iterative estimation based on the face feature points, the feature point mapping matrix and the camera parameters, and obtaining face shape base parameters and face expression base parameters after a second iteration condition is met; and constructing a three-dimensional simulated face model corresponding to the real face image according to the camera parameters, the face shape base parameters and the face expression base parameters.
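For illustration, the two-stage estimation can be sketched as alternating least squares under a weak-perspective camera. The array layouts, the weak-perspective assumption, and the fixed outer-loop count standing in for the two iteration conditions are all assumptions made for this sketch, not the disclosure's exact formulation.

```python
import numpy as np

def fit_face_parameters(landmarks_2d, vertex_ids, mean_shape,
                        shape_basis, expr_basis, n_outer=5):
    """Alternating least-squares sketch of the two-stage estimation.

    landmarks_2d : (N, 2) detected 2D face feature points
    vertex_ids   : (N,) template vertex index per feature point
                   (playing the role of the feature point mapping matrix)
    mean_shape   : (V, 3) mean face vertices
    shape_basis  : (V, 3, Ks) face shape basis
    expr_basis   : (V, 3, Ke) face expression basis
    """
    Ks, Ke = shape_basis.shape[2], expr_basis.shape[2]
    c_s, c_e = np.zeros(Ks), np.zeros(Ke)
    for _ in range(n_outer):  # fixed count stands in for the iteration conditions
        verts = mean_shape + shape_basis @ c_s + expr_basis @ c_e
        p3 = verts[vertex_ids]                                   # (N, 3)
        # Stage 1: weak-perspective camera (2x3 matrix M and shift t).
        A = np.hstack([p3, np.ones((len(p3), 1))])               # (N, 4)
        sol, *_ = np.linalg.lstsq(A, landmarks_2d, rcond=None)
        M, t = sol[:3].T, sol[3]
        # Stage 2: shape and expression coefficients under the fixed camera;
        # the projection is linear in the coefficients.
        base = mean_shape[vertex_ids] @ M.T + t                  # (N, 2)
        Ds = np.einsum('ij,njk->nik', M, shape_basis[vertex_ids])
        De = np.einsum('ij,njk->nik', M, expr_basis[vertex_ids])
        D = np.concatenate([Ds, De], axis=2).reshape(-1, Ks + Ke)
        r = (landmarks_2d - base).reshape(-1)
        coef, *_ = np.linalg.lstsq(D, r, rcond=None)
        c_s, c_e = coef[:Ks], coef[Ks:]
    return M, t, c_s, c_e
```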
In one embodiment, the topological structure of the simulated face model includes a plurality of triangular faces. The difference feature extraction module 2006 is further configured to obtain, for each triangular face in the simulated face model, a first deformation gradient relative to the corresponding triangular face in the three-dimensional simulated face template; and obtain, according to the first deformation gradients, an affine transformation mapping matrix between each triangular face in the simulated face model and the corresponding triangular face in the three-dimensional simulated face template, where the affine transformation mapping matrices are used for representing the difference features between the simulated face model and the three-dimensional simulated face template.
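A per-triangle deformation gradient of this kind is standard in deformation transfer; the sketch below uses the classical construction of two edge vectors plus a scaled normal as the local frame. Whether the disclosure uses exactly this fourth-vector normalization is an assumption.

```python
import numpy as np

def triangle_frame(v1, v2, v3):
    """3x3 local frame of a triangle: two edges plus the scaled normal
    (the classical deformation-transfer construction)."""
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    n = n / np.sqrt(np.linalg.norm(n))  # n / sqrt(|n|), Sumner-style scaling
    return np.column_stack([e1, e2, n])

def first_deformation_gradients(template_verts, model_verts, tris):
    """For each triangle (shared topology assumed), the affine map Q with
    Q @ frame_template = frame_model, i.e. the first deformation gradient
    between the simulated template and the simulated face model."""
    grads = []
    for i, j, k in tris:
        F_t = triangle_frame(template_verts[i], template_verts[j], template_verts[k])
        F_m = triangle_frame(model_verts[i], model_verts[j], model_verts[k])
        grads.append(F_m @ np.linalg.inv(F_t))
    return np.array(grads)
```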
In one embodiment, the topological structure of the three-dimensional cartoon face template includes a plurality of triangular faces. The difference feature migration module 2008 is further configured to align the simulated face model with the three-dimensional cartoon face template based on the semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template; search, among the triangular faces in the aligned three-dimensional cartoon face template, for the triangular face matching each triangular face in the simulated face model, so as to obtain a triangular face mapping relationship between the triangular faces in the simulated face model and the triangular faces in the three-dimensional cartoon face template; and migrate the difference features to the three-dimensional cartoon face template according to the triangular face mapping relationship to obtain the three-dimensional cartoon face model.
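The matching step can be sketched as a nearest-centroid search over the aligned meshes; a KD-tree over triangle centroids is an illustrative choice, not necessarily the disclosed search.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_triangles(sim_verts, sim_tris, cartoon_verts, cartoon_tris):
    """For each simulated-model triangle, find the closest cartoon-template
    triangle by centroid distance on the aligned meshes; returns
    (cartoon_triangle_index, simulated_triangle_index) pairs."""
    cartoon_centroids = cartoon_verts[cartoon_tris].mean(axis=1)  # (Tc, 3)
    sim_centroids = sim_verts[sim_tris].mean(axis=1)              # (Ts, 3)
    tree = cKDTree(cartoon_centroids)
    _, nearest = tree.query(sim_centroids)
    return [(int(c), int(s)) for s, c in enumerate(nearest)]
```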
In one embodiment, the difference feature migration module 2008 is further configured to perform deformation processing on the original triangular faces in the three-dimensional cartoon face template according to the triangular face mapping relationship and the affine transformation mapping matrices, and determine a second deformation gradient between each original triangular face and the corresponding deformed triangular face in the three-dimensional cartoon face template; perform iterative deformation processing on the triangular faces in the aligned three-dimensional cartoon face template according to the triangular face mapping relationship and the affine transformation mapping matrices, in the direction of minimizing the difference between the second deformation gradients and the first deformation gradients; and obtain the three-dimensional cartoon face model after an iteration stop condition is satisfied.
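Deformation transfer of this kind is classically solved as a sparse linear least-squares system; the following sketch instead uses plain gradient descent on the squared difference of deformation gradients, reusing triangle_frame from the sketch above, purely to keep the illustration short. The step size and iteration count are illustrative, and the dependence of the frame's normal column on the vertices is ignored for brevity.

```python
import numpy as np

def migrate_differences(cartoon_verts, cartoon_tris, tri_map, first_grads,
                        n_iters=200, lr=1e-2):
    """Move the cartoon template's vertices so each triangle's second
    deformation gradient, taken against its own rest pose, approaches the
    matched first deformation gradient; tri_map holds
    (cartoon_triangle, simulated_triangle) pairs."""
    rest, x = cartoon_verts.copy(), cartoon_verts.copy()
    for _ in range(n_iters):
        g = np.zeros_like(x)
        for t_c, t_s in tri_map:
            i, j, k = cartoon_tris[t_c]
            A = np.linalg.inv(triangle_frame(rest[i], rest[j], rest[k]))
            Q = triangle_frame(x[i], x[j], x[k]) @ A  # second deformation gradient
            R = (Q - first_grads[t_s]) @ A.T          # residual w.r.t. edge columns
            g[j] += R[:, 0]; g[k] += R[:, 1]          # edge e1 -> vj, edge e2 -> vk
            g[i] -= R[:, 0] + R[:, 1]
        x -= lr * g
    return x
```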
In one embodiment, the feature extraction module 2002 is further configured to extract, from the real face image, key points of the five sense organs, contour key points, and the semantic information respectively corresponding to them; obtain the face feature points according to these key points and their semantic information; and extract the image features of the real face image based on the face feature points.
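As an illustration, a stock landmark detector already yields key points with fixed semantic indices. The sketch below uses dlib's 68-point predictor as a stand-in for the disclosed extraction; the model file path and the semantic grouping are assumptions.

```python
import cv2
import dlib

# dlib's 68-point predictor stands in for the disclosed landmark extraction;
# the model file path is an assumption (the file is downloaded separately).
_detector = dlib.get_frontal_face_detector()
_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# The predictor's fixed index layout plays the role of the per-point
# semantic information described above.
SEMANTICS = {"contour": range(0, 17), "brows": range(17, 27),
             "nose": range(27, 36), "eyes": range(36, 48),
             "mouth": range(48, 68)}

def extract_landmarks(image_bgr):
    """Return 68 (x, y) face feature points plus their semantic grouping."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    face = _detector(gray)[0]          # assume one face, as in auto-capture
    shape = _predictor(gray, face)
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return pts, SEMANTICS
```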
In one embodiment, the image features include native image features and additional image features. The feature extraction module 2002 is further configured to perform face alignment processing on the real face image to obtain an aligned real face image; identify native image features from the aligned real face image based on the face feature points; and identify additional image features from the aligned real face image based on the face feature points.
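Face alignment is commonly a warp that maps the eye landmarks to canonical positions. The following sketch, which reuses the SEMANTICS grouping from the landmark sketch above, shows one way to do it; the canonical coordinates and the eye index split are illustrative assumptions.

```python
import cv2
import numpy as np

def align_face(image_bgr, landmarks, semantics, out_size=256):
    """Warp the face so the eyes land on canonical positions; the canonical
    coordinates and the third anchor point are illustrative choices."""
    eyes = np.array([landmarks[i] for i in semantics["eyes"]], dtype=np.float32)
    left_eye, right_eye = eyes[:6].mean(axis=0), eyes[6:].mean(axis=0)
    mid = (left_eye + right_eye) / 2
    src = np.float32([left_eye, right_eye, mid + (0, 60)])  # point below the eyes
    dst = np.float32([[0.35, 0.40], [0.65, 0.40], [0.50, 0.63]]) * out_size
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image_bgr, M, (out_size, out_size))
```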
In one embodiment, the native image features include a hair style feature. The feature extraction module 2002 is further configured to extract hair features from the aligned real face image through a trained hair style recognition network, and generate a hair mask image according to the hair features; divide the hair mask image into at least two mask image sub-areas based on the face feature points; and obtain the hair style feature according to the distribution of the hair features in the at least two mask image sub-areas.
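A toy version of this sub-area vote might look as follows; the split lines, thresholds, and hair style labels are all illustrative assumptions, not the trained network's behavior.

```python
import numpy as np

def hairstyle_from_mask(hair_mask, landmarks):
    """Classify a binary hair mask by where its pixels fall relative to
    landmark-derived split lines; thresholds and labels are illustrative."""
    xs = np.array([p[0] for p in landmarks])
    ys = np.array([p[1] for p in landmarks])
    mid_x, top_y = int(xs.mean()), int(ys.min())   # face midline, brow line
    left = hair_mask[top_y:, :mid_x].sum()         # hair beside/below the face
    right = hair_mask[top_y:, mid_x:].sum()
    top = hair_mask[:top_y, :].sum()               # hair above the brow line
    total = max(left + right + top, 1)
    if (left + right) / total > 0.5:               # most hair hangs down
        return "long"
    return "short" if top / total > 0.8 else "medium"
```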
In one embodiment, the feature extraction module 2002 is further configured to extract an additional image region image from the aligned real face image according to the distribution position of the face feature points; identifying additional image categories of the additional image area images through the trained target classification network; and obtaining the additional image characteristics in the real face image according to the additional image categories.
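For the additional image features (for example, glasses), a sketch of the crop-and-classify step follows; the crop margin, the classifier object, and the label set are assumptions standing in for the trained target classification network.

```python
import numpy as np

def classify_additional_features(aligned_face, landmarks, semantics, classifier):
    """Crop a candidate region (around the eyes, for glasses) from the
    aligned face and run a trained classifier; 'classifier' and the labels
    are hypothetical stand-ins for the target classification network."""
    eye_pts = np.array([landmarks[i] for i in semantics["eyes"]])
    x0, y0 = eye_pts.min(axis=0) - 20      # 20 px margin, illustrative
    x1, y1 = eye_pts.max(axis=0) + 20
    region = aligned_face[max(y0, 0):y1, max(x0, 0):x1]
    label = classifier.predict(region)     # e.g. "none", "glasses", "sunglasses"
    return [] if label == "none" else [label]
```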
In one embodiment, the image features include a skin color feature, native image features, and additional image features. The three-dimensional cartoon image generation module 2010 is further configured to extract the pixels of a skin color area based on the distribution positions of the face feature points, and acquire the skin color feature according to those pixels; perform skin color rendering on the three-dimensional cartoon face model according to the skin color feature; and obtain matched image materials according to the native image features and the additional image features, and render the image materials onto the skin-color-rendered three-dimensional cartoon face model to obtain a three-dimensional cartoon image having the real face features and the image features.
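A sketch of the skin color sampling follows, assuming the SEMANTICS grouping above and an illustrative patch near the nose; the sampled region is an assumption, not the disclosed skin color area.

```python
import numpy as np

def skin_tone(aligned_face_bgr, landmarks, semantics):
    """Average the pixels of a landmark-located cheek patch as the skin
    color feature; the patch position and size are illustrative."""
    nose = np.array([landmarks[i] for i in semantics["nose"]])
    cx, cy = nose.mean(axis=0).astype(int)                  # around the nose
    patch = aligned_face_bgr[cy - 10:cy + 10, cx - 35:cx - 15]  # cheek patch
    return patch.reshape(-1, 3).mean(axis=0)                # mean BGR color
```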
In one embodiment, the device further comprises a display module for displaying a cartoon image selection interface, wherein the cartoon image selection interface comprises three-dimensional image options; collecting a real face image of a user in response to the selection operation of the three-dimensional image option; and displaying the image preview interface, and displaying the real human face image and the three-dimensional cartoon image in the image preview interface.
For the specific limitations of the face image cartoonization processing apparatus, reference may be made to the limitations of the face image cartoonization processing method above; details are not repeated here. All or part of the modules in the face image cartoonization processing apparatus may be implemented by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can invoke and perform the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 21. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing data such as real face images, three-dimensional simulation face templates, three-dimensional cartoon face templates and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize a human face image cartoon processing method.
In one embodiment, another computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 22. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to realize a human face image cartoon processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 21 and fig. 22 are only block diagrams of partial structures related to the solution of the present application and do not constitute a limitation on the computer device to which the solution is applied; a particular computer device may include more or fewer components than those shown in the figures, combine some components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features involve no contradiction, they should be considered to be within the scope of this specification.
The above embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that several variations and modifications can be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A cartoon processing method for face images is characterized by comprising the following steps:
extracting human face characteristic points and image characteristics from a real human face image;
constructing a three-dimensional simulated face model corresponding to the real face image based on the face feature points;
acquiring difference characteristics between the simulated face model and the three-dimensional simulated face template;
migrating the difference characteristics to the three-dimensional cartoon face template based on the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template to obtain a three-dimensional cartoon face model;
and performing image rendering on the cartoon face model according to the image characteristics to generate a three-dimensional cartoon image.
2. The method of claim 1, wherein the face feature points are two-dimensional face feature points; the constructing of the three-dimensional simulated face model corresponding to the real face image based on the face feature points comprises:
acquiring a feature point mapping matrix of the two-dimensional face feature points in a three-dimensional simulated face template;
performing parameter estimation based on the face characteristic points and the characteristic point mapping matrix to obtain three-dimensional face parameters;
and constructing a three-dimensional simulated face model corresponding to the real face image based on the three-dimensional face parameters.
3. The method of claim 2, wherein performing parameter estimation based on the face feature points and the feature point mapping matrix to obtain three-dimensional face parameters comprises:
performing camera parameter iterative estimation based on the face feature points and the feature point mapping matrix, and obtaining camera parameters after a first iterative condition is met;
performing face parameter iterative estimation based on the face feature points, the feature point mapping matrix and the camera parameters, and obtaining face shape base parameters and face expression base parameters after a second iterative condition is met;
the constructing of the three-dimensional simulation face model corresponding to the real face image based on the three-dimensional face parameters comprises the following steps:
and constructing a three-dimensional simulated face model corresponding to the real face image according to the camera parameters, the face shape base parameters and the face expression base parameters.
4. The method of claim 1, wherein the topology of the simulated face model comprises a plurality of triangular faces; the acquiring of the difference characteristics between the simulated face model and the three-dimensional simulated face template comprises the following steps:
acquiring, for each triangular surface in the simulated face model, a first deformation gradient relative to the corresponding triangular surface in the three-dimensional simulated face template;
obtaining an affine transformation mapping matrix between each triangular surface in the simulated face model and each corresponding triangular surface in the three-dimensional simulated face template according to the first deformation gradient; and the affine transformation mapping matrix is used for representing the difference characteristics between the simulation face model and the three-dimensional simulation face template.
5. The method of claim 4, wherein the topology of the three-dimensional cartoon face template comprises a plurality of triangular faces;
the method for obtaining the three-dimensional cartoon face model by migrating the difference features to the three-dimensional cartoon face template based on the semantic mapping relationship between the three-dimensional simulation face template and the three-dimensional cartoon face template comprises the following steps:
aligning the simulated face model and the three-dimensional cartoon face template based on the semantic mapping relationship between the three-dimensional simulated face template and the three-dimensional cartoon face template;
searching triangular surfaces matched with the triangular surfaces in the simulated human face model in the aligned triangular surfaces in the three-dimensional cartoon human face template to obtain a triangular surface mapping relation between the triangular surfaces in the simulated human face model and the triangular surfaces in the three-dimensional cartoon human face template;
and migrating the difference characteristics to the three-dimensional cartoon face template according to the triangular surface mapping relation to obtain a three-dimensional cartoon face model.
6. The method of claim 5, wherein the migrating the difference features to the three-dimensional cartoon face template according to the triangular surface mapping relationship to obtain a three-dimensional cartoon face model comprises:
according to the triangular surface mapping relation and the affine transformation mapping matrix, carrying out deformation processing on an original triangular surface in the three-dimensional cartoon face template, and determining a second deformation gradient between the original triangular surface and the deformed triangular surface in the three-dimensional cartoon face template;
performing iterative deformation processing on each triangular surface in the aligned three-dimensional cartoon face template according to the triangular surface mapping relation and the affine transformation mapping matrix in the direction of minimizing the difference between the second deformation gradient and the first deformation gradient;
and obtaining the three-dimensional cartoon face model after the iteration stop condition is met.
7. The method of claim 1, wherein extracting human face feature points and human image features from the real human face image comprises:
extracting key points of five sense organs, contour key points and semantic information respectively corresponding to the key points and the contour key points from a real face image;
obtaining face characteristic points according to the key points of the five sense organs, the contour key points and the semantic information respectively corresponding to the key points;
and extracting the image characteristics of the real face image based on the face characteristic points.
8. The method of claim 7, wherein the character features include native character features and additional character features; the extracting of the image features of the real face image based on the face feature points comprises:
carrying out face alignment processing on the real face image to obtain an aligned real face image;
identifying native image features from the aligned real face images based on the face feature points;
and identifying additional image features from the aligned real human face images based on the human face feature points.
9. The method of claim 8, wherein the native image feature comprises a hair style feature; the identifying hair style features from the aligned real face images based on the face feature points comprises:
extracting hair features in the aligned real face image through a trained hair style recognition network, and generating a hair mask image according to the hair features;
dividing the hair mask image into at least two mask image sub-regions based on the face feature points;
and obtaining hair style characteristics according to the distribution of the hair characteristics in the at least two mask image sub-areas.
10. The method of claim 8, wherein the identifying additional character features from the aligned real face images based on the face feature points comprises:
extracting an additional image area image from the aligned real face image according to the distribution position of the face feature points;
identifying additional image categories of the additional image area images through the trained target classification network;
and obtaining additional image characteristics in the real face image according to the additional image categories.
11. The method of claim 1, wherein the character features include skin tone features, native character features, and additional character features; the image rendering is carried out on the cartoon face model according to the image characteristics to generate a three-dimensional cartoon image, and the method comprises the following steps:
extracting pixels of a skin color area based on the distribution position of the face feature points, and acquiring skin color features according to the pixels of the skin color area;
performing skin color rendering on the three-dimensional cartoon face model according to the skin color characteristics;
and obtaining matched image materials according to the original image characteristics and the additional image characteristics, and rendering the image materials to a three-dimensional cartoon face model after skin color rendering to obtain a three-dimensional cartoon image with face characteristics and image characteristics.
12. The method according to any one of claims 1 to 11, wherein before extracting the human face feature points and the human image features from the real human face image, the method further comprises:
displaying a cartoon image selection interface, wherein the cartoon image selection interface comprises three-dimensional image options;
responding to the selection operation of the three-dimensional image option, and collecting a real face image of the user;
after the cartoon face model is subjected to image rendering according to the image characteristics and a three-dimensional cartoon image is generated, the method further comprises the following steps:
and displaying an image preview interface, and displaying the real human face image and the three-dimensional cartoon image in the image preview interface.
13. The human face image cartoon processing device is characterized by comprising:
the characteristic extraction module is used for extracting human face characteristic points and image characteristics from a real human face image;
the three-dimensional face reconstruction module is used for constructing a three-dimensional simulated face model corresponding to the real face image based on the face characteristic points;
the difference feature extraction module is used for acquiring difference features between the simulated face model and the three-dimensional simulated face template;
the difference feature migration module is used for migrating the difference features to the three-dimensional cartoon face template based on the semantic mapping relation between the three-dimensional simulation face template and the three-dimensional cartoon face template to obtain a three-dimensional cartoon face model;
and the three-dimensional cartoon image generation module is used for performing image rendering on the cartoon face model according to the image characteristics to generate a three-dimensional cartoon image.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202110119146.8A 2021-01-28 2021-01-28 Human face image cartoon processing method and device, computer equipment and storage medium Pending CN114820907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110119146.8A CN114820907A (en) 2021-01-28 2021-01-28 Human face image cartoon processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110119146.8A CN114820907A (en) 2021-01-28 2021-01-28 Human face image cartoon processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114820907A true CN114820907A (en) 2022-07-29

Family

ID=82525662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110119146.8A Pending CN114820907A (en) 2021-01-28 2021-01-28 Human face image cartoon processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114820907A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222895A (en) * 2022-08-30 2022-10-21 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium
CN115222895B (en) * 2022-08-30 2023-06-27 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US10489683B1 (en) Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
CN109408653B (en) Human body hairstyle generation method based on multi-feature retrieval and deformation
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN108305312B (en) Method and device for generating 3D virtual image
US20200193591A1 (en) Methods and systems for generating 3d datasets to train deep learning networks for measurements estimation
JP6207210B2 (en) Information processing apparatus and method
WO2022143645A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
CN111325851A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111325846B (en) Expression base determination method, avatar driving method, device and medium
CN113570684A (en) Image processing method, image processing device, computer equipment and storage medium
JP2022533464A (en) Three-dimensional model generation method and apparatus, computer equipment, and storage medium
US11507781B2 (en) Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
US20220351378A1 (en) Methods and systems for generating 3d datasets to train deep learning networks for measurements estimation
Liu et al. Image decolorization combining local features and exposure features
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN113628327A (en) Head three-dimensional reconstruction method and equipment
CN111815768B (en) Three-dimensional face reconstruction method and device
JP2024500896A (en) Methods, systems and methods for generating 3D head deformation models
CN108615256A (en) A kind of face three-dimensional rebuilding method and device
CN116997933A (en) Method and system for constructing facial position map
KR20230085931A (en) Method and system for extracting color from face images
CN114783022B (en) Information processing method, device, computer equipment and storage medium
CN113808277A (en) Image processing method and related device
CN114821404B (en) Information processing method, device, computer equipment and storage medium
Onizuka et al. Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code: HK; Ref legal event code: DE; Ref document number: 40071985
SE01 Entry into force of request for substantive examination