CN114359471A - Face image processing method, device and system - Google Patents
Face image processing method, device and system
- Publication number
- CN114359471A (application number CN202110014584.8A)
- Authority
- CN
- China
- Prior art keywords
- face
- virtual
- image
- components
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application discloses a face image processing method, device and system. The method comprises: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying the plurality of face components in the face image; acquiring a virtual face component matched with each face component, wherein the virtual face components are materials generated through learning with a neural network model; and splicing the virtual face components at target positions on the face image to generate a virtual object. The method and the device solve the technical problem in the related art that a face image processing method generates the virtual object from a stereotyped template, so that the generated virtual object has a poor effect.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a method, an apparatus, and a system for processing a face image.
Background
In the prior art, a virtual object is used to simulate a real object or living being: the virtual object is an image of the real object or living being rendered in a virtual scene, and the image can be viewed and interacted with. Because a stereotyped template is used in the process of generating the virtual object, the generated virtual object has a poor effect.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a face image processing method, device and system, so as to at least solve the technical problem in the related art that a face image processing method generates the virtual object from a stereotyped template, resulting in a poor effect of the generated virtual object.
According to an aspect of the embodiments of the present application, there is provided a method for processing a face image, including: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying a plurality of facial components in the facial image; acquiring virtual face components matched with each face component, wherein the virtual face components are materials generated by learning by adopting a neural network model; and splicing the virtual face component at the target position on the face image to generate a virtual object.
According to another aspect of the embodiments of the present application, there is also provided a method for processing a face image, including: displaying a facial image of the physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; if an image operation instruction is detected in any area of the interactive interface, triggering identification of a plurality of face components in the face image and acquiring a virtual face component matched with each face component, wherein the virtual face components are materials generated by learning with a neural network model; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face components at target positions on the face image.
According to another aspect of the embodiments of the present application, there is also provided a method for processing a face image, including: displaying a facial image of the physical object on the interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; an image operation instruction is sensed in the interactive interface; responding to an image operation instruction, executing and identifying a plurality of face components in the face image, and acquiring a virtual face component matched with each face component, wherein the virtual face component is a material generated by learning by adopting a neural network model; outputting a selection page on the interactive interface, wherein the selection page provides at least one face component option, and different face component options are used for representing face components at different positions to be selectively processed; displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face component to a target position on the face image.
According to another aspect of the embodiments of the present application, there is also provided a method for processing a face image, including: uploading, by a front-end client, a facial image captured of a physical object, wherein the facial image comprises images of a plurality of facial components of the physical object; the front-end client transmits the face image of the entity object to the background server; the method comprises the steps that a front-end client receives a plurality of virtual face components returned by a background server, wherein the virtual face components matched with each face component are obtained by identifying the plurality of face components in a face image, and the virtual face components are materials generated by learning through a neural network model; the front-end client receives the image operation instruction, splices the virtual face component at the target position on the face image, and generates a virtual object.
According to another aspect of the embodiments of the present application, there is also provided a method for processing a face image, including: displaying a facial image of the physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; in the case that an image operation instruction is detected in the interactive interface, responding to the image operation instruction, identifying a plurality of face components in the face image, and acquiring a virtual face component matched with each face component; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face components at target positions on the face image.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing apparatus, including: a first acquisition module for acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; a recognition module for recognizing the plurality of face components in the face image; a second acquisition module for acquiring a virtual face component matched with each face component, wherein the virtual face components are materials generated by learning with a neural network model; and a generating module for splicing the virtual face components at target positions on the face image to generate a virtual object.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing apparatus, including: a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; a processing module for triggering identification of a plurality of face components in the face image and acquiring a virtual face component matched with each face component if an image operation instruction is detected in any area of the interactive interface, wherein the virtual face components are materials generated by learning with a neural network model; and a second display module for displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face components at target positions on the face image.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing apparatus, including: a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; the sensing module is used for sensing an image operation instruction in the interactive interface; the processing module is used for responding to the image operation instruction, executing and identifying a plurality of face components in the face image and acquiring a virtual face component matched with each face component, wherein the virtual face component is a material generated by learning by adopting a neural network model; the output module is used for outputting a selection page on the interactive interface, and the selection page provides at least one face component option, wherein different face component options are used for representing face components at different positions to be selectively processed; and the second display module is used for displaying the virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face component on the target position on the face image.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing apparatus, including: an upload module to upload a facial image captured of a physical object, wherein the facial image includes images of a plurality of facial components of the physical object; the transmission module is used for transmitting the face image of the entity object to the background server; the receiving module is used for receiving a plurality of virtual face components returned by the background server, wherein the virtual face components matched with each face component are obtained by identifying the plurality of face components in the face image, and the virtual face components are materials generated by learning by adopting a neural network model; and the generating module is used for receiving the image operation instruction, splicing the virtual face component at the target position on the face image and generating a virtual object.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing apparatus, including: a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; a response module for, in the case that an image operation instruction is detected in the interactive interface, responding to the image operation instruction, identifying a plurality of face components in the face image, and acquiring a virtual face component matched with each face component; and a second display module for displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face components at target positions on the face image.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium including a stored program, where the program, when executed, controls an apparatus in which the computer-readable storage medium is located to execute the above-mentioned facial image processing method.
According to another aspect of the embodiments of the present application, there is also provided a processing terminal, including: the device comprises a memory and a processor, wherein the processor is used for running a program stored in the memory, and the program is used for executing the processing method of the face image when running.
According to another aspect of the embodiments of the present application, there is also provided a facial image processing system, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying a plurality of facial components in the facial image; acquiring virtual face components matched with each face component, wherein the virtual face components are materials generated by learning by adopting a neural network model; and splicing the virtual face component at the target position on the face image to generate a virtual object.
In the embodiments of the application, after the face image of the physical object is obtained, a plurality of face components can be determined by recognizing the face image, a virtual face component matched with each face component is obtained, and a virtual object corresponding to the physical object is generated by splicing the plurality of virtual face components at the target positions on the face image, thereby achieving the purpose of generating the virtual object. It is easy to notice that the virtual object is generated by splicing a plurality of virtual face components, and each virtual face component is matched with an individual face component in the face image; there is no need to match a template of the whole face image, so face images covering a relatively wide range of different face components can be handled, and the similarity between the generated virtual object and the physical object is higher. In addition, the virtual face components can be generated through neural network model learning, so the whole generation method is expandable. This achieves the technical effect of improving the generated virtual object while keeping a balance between likeness and stylization relative to the physical object, and further solves the technical problem in the related art that a face image processing method generates the virtual object from a stereotyped template, resulting in a poor effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a computer terminal (or mobile device) for implementing a method for processing a face image according to an embodiment of the present application;
FIG. 2 is a flow chart of a first method of processing a facial image according to an embodiment of the present application;
FIG. 3a is a schematic view of an alternative face component in accordance with embodiments of the present application;
FIG. 3b is a schematic view of an alternative eye component according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative segmentation of a face component according to embodiments of the present application;
FIG. 5 is a schematic diagram of alternative facial component point location information in accordance with embodiments of the present application;
FIG. 6 is a flow chart of an alternative method of processing a facial image according to an embodiment of the present application;
FIG. 7 is a flow chart of a second method of processing a facial image according to an embodiment of the present application;
FIG. 8 is a schematic illustration of an alternative interactive interface according to an embodiment of the present application;
fig. 9 is a flowchart of a processing method of a third face image according to an embodiment of the present application;
fig. 10 is a flowchart of a fourth method of processing a face image according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a first facial image processing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a second facial image processing apparatus according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a third facial image processing apparatus according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a fourth facial image processing apparatus according to an embodiment of the present application;
FIG. 15 is a flow chart of a fifth method for processing a facial image according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a fifth facial image processing apparatus according to an embodiment of the present application;
fig. 17 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or expressions appearing in the description of the embodiments of the present application are explained as follows:
Virtual image (avatar): three-dimensional mesh data is used to represent a human face, and texture content is added so that it approximates the skin color of a real person; a rendering engine then renders a realistic, interactive image in the virtual scene.
Face component: may refer to the facial-feature (five sense organs) components constituting a human face, for example an eye component, a nose component, an eyebrow component, a mouth component, a chin component, etc.; a complete face image can be obtained by combining different face components.
Neural network model: a complex network system formed by a large number of simple, widely interconnected processing units; it is a highly complex nonlinear dynamic system with large-scale parallelism, distributed storage and processing, and self-organizing, self-adaptive and self-learning capabilities.
Three-dimensional projection matrix: the process of transforming an object in three-dimensional space onto a two-dimensional plane may be referred to as a projective transformation, and this transformation can be represented by a matrix, the three-dimensional projection matrix.
Laplacian mesh deformation (Laplacian deformation): the principle is that the Laplacian coordinates of a point before deformation and after deformation should be kept as equal as possible. The Laplacian coordinates capture the detail information of the mesh model, which is not expected to change after the mesh is deformed.
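For readers unfamiliar with Laplacian coordinates, the following minimal Python sketch (not part of the patent; the mesh representation is an assumption chosen for illustration) shows how the delta coordinates that Laplacian deformation tries to preserve can be computed.

```python
import numpy as np

def laplacian_coordinates(vertices, neighbors):
    """Compute the Laplacian (delta) coordinate of every mesh vertex.

    vertices  : (n, 3) array of 3D vertex positions.
    neighbors : list where neighbors[i] is a list of indices of the vertices
                adjacent to vertex i.
    The delta coordinate is the offset of a vertex from the average of its
    neighbors; Laplacian deformation moves constrained vertices to new target
    positions while keeping these offsets as unchanged as possible.
    """
    deltas = np.zeros_like(vertices, dtype=float)
    for i, nbrs in enumerate(neighbors):
        deltas[i] = vertices[i] - vertices[nbrs].mean(axis=0)
    return deltas
```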
In application scenarios of various short-video platforms and live-streaming platforms, in order to let users create content conveniently and quickly, a function is provided for generating an attractive avatar that resembles the user. However, if a photo-realistic version of the avatar is generated, it easily falls into the uncanny valley effect and becomes unacceptable to the user; if a cute or cartoon version of the avatar is generated, it differs obviously from the real person.
To solve the problem of keeping a balance between likeness and stylization when generating the avatar, the application provides the following technical solution: human face features are extracted with deep learning, and a template database of models is used for post-processing, so the generation effect of the avatar can be improved by adding content to the template database.
Example 1
According to an embodiment of the present application, there is provided a method for processing a face image, it should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in an order different from that here.
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the face image processing method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the facial image processing method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing, i.e., implements the facial image processing method described above, by running the software programs and modules stored in the memory 104. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
It should be noted here that, in some embodiments, the computer device (or mobile device) shown in fig. 1 has a touch display (also referred to as a "touch screen" or "touch display screen"). In some embodiments, the computer device (or mobile device) shown in fig. 1 above has a Graphical User Interface (GUI) with which a user can interact by touching finger contacts and/or gestures on a touch-sensitive surface, where the human interaction functionality optionally includes the following interactions: executable instructions for creating web pages, drawing, word processing, making electronic documents, games, video conferencing, instant messaging, emailing, call interfacing, playing digital video, playing digital music, and/or web browsing, etc., for performing the above-described human-computer interaction functions, are configured/stored in one or more processor-executable computer program products or readable storage media.
Under the above operating environment, the present application provides a method for processing a face image as shown in fig. 2. Fig. 2 is a flowchart of a first method for processing a face image according to an embodiment of the present application. As shown in fig. 2, the method may include the steps of:
in step S202, a face image of the physical object is acquired, wherein the face image includes images of a plurality of face components of the physical object.
The physical object in the above steps may be a real object or a living being that needs to be visualized and virtualized, and since the present application is directed to face virtualization of the physical object, in the embodiment of the present application, the physical object may be a real person, a robot, and the like, but is not limited thereto. The facial components in the above steps can be facial features including face shape, eyebrow, eye, nose, mouth, etc., and can be set according to the generation requirement of the virtual image.
In an alternative embodiment, the face part of the physical object may be directly photographed by a camera, a mobile phone, a tablet computer, a notebook computer, or other photographing devices, so as to obtain the face image of the physical object. In another alternative embodiment, the solid object may be photographed by the photographing device, and the face image may be obtained by performing processing such as cropping on the image.
In the embodiment of the application, the virtual image generation device may be a mobile terminal such as a smart phone and a tablet computer of a user, or a computer terminal such as a notebook computer and a PC computer, and the user may directly shoot the face image or select the shot face image to obtain the face image and process the face image. In order to reduce the calculation amount of the mobile terminal or the computer terminal, the virtual image generation device can also be a server, and a user can upload the face image to the server for processing in the modes of internet, WIFI, 3G, 4G, 5G and the like.
In step S204, a plurality of face components in the face image are identified.
In an alternative embodiment, the face image may be processed by using an image target detection technique to mark the position of each face component in the face image, that is, the face image is processed by using a pre-trained neural network model to frame each face component in the face image.
In step S206, a virtual face component matching each face component is obtained, wherein the virtual face component is a material generated by learning using a neural network model.
The virtual face component in the above steps may be a virtual model designed by a designer for different face components. As shown in fig. 3a, for the face component, the virtual face component may be a virtual model designed for different face types, for example a normal face, a square face and a round face. As shown in fig. 3b, for the eye component, the virtual face component may be a virtual model designed for different eye types, e.g., a phoenix eye (danfeng eye), a peach-blossom eye, and a round eye.
In an alternative embodiment, different virtual models can be designed by a designer for different face components. To expand the materials, a neural network model can be built and trained with the virtual models designed by the designer, and more materials can then be generated with the trained neural network model, so that a material library is built. After the plurality of face components are recognized, matched materials can be selected from the material library by image matching to obtain the virtual face components, or materials of the same type can be selected from the material library by recognizing the type corresponding to each face component to obtain the virtual face components.
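A minimal sketch of how such a material-library lookup might be organized is shown below; the entry fields, file names and the 128-dimensional feature vectors are illustrative assumptions, not details given in the patent.

```python
import numpy as np

# Hypothetical material library: each entry is a designer-made or
# model-generated virtual face component with a type tag and a feature vector.
MATERIAL_LIBRARY = [
    {"component": "eye",  "type": "round eye",   "feature": np.random.rand(128), "mesh": "eye_round.obj"},
    {"component": "eye",  "type": "phoenix eye", "feature": np.random.rand(128), "mesh": "eye_phoenix.obj"},
    {"component": "face", "type": "square face", "feature": np.random.rand(128), "mesh": "face_square.obj"},
    # ... more materials
]

def match_material(component, face_type=None, query_feature=None):
    """Select a virtual face component either by its recognized type or by the
    nearest feature vector (image matching)."""
    candidates = [m for m in MATERIAL_LIBRARY if m["component"] == component]
    if face_type is not None:
        typed = [m for m in candidates if m["type"] == face_type]
        candidates = typed or candidates
    if query_feature is None:
        return candidates[0]
    return min(candidates, key=lambda m: np.linalg.norm(m["feature"] - query_feature))
```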
In step S208, the virtual face component is spliced to the target position on the face image, and a virtual object is generated.
The target positions in the above steps may be corresponding positions of different virtual face components on the face image, and the virtual object may be a three-dimensional avatar.
In an alternative embodiment, since the face image is a two-dimensional image and the virtual face component is a three-dimensional five-sense organ component, the target position of the virtual face component can be determined by converting the virtual face component into a corresponding two-dimensional component and matching the two-dimensional component with the face image in a three-dimensional projective transformation manner. After determining the target position of each virtual face component, all virtual face components may be stitched according to the target position of each virtual face component, so that a corresponding avatar may be generated.
According to the scheme provided by the embodiment of the application, after the face image of the physical object is obtained, the face image can be recognized to determine a plurality of face components, a virtual face component matched with each face component is obtained, and a virtual object corresponding to the physical object is then generated by splicing the plurality of virtual face components at the target positions on the face image, thereby achieving the purpose of generating the virtual object. It is easy to notice that the virtual object is generated by splicing a plurality of virtual face components, and each virtual face component is matched with an individual face component in the face image; there is no need to match a template of the whole face image, so face images covering a relatively wide range of different face components can be handled, and the similarity between the generated virtual object and the physical object is higher. In addition, the virtual face components can be generated through neural network model learning, so the whole generation method is expandable. This achieves the technical effect of improving the generated virtual object while keeping a balance between likeness and stylization relative to the physical object, and further solves the technical problem in the related art that a face image processing method generates the virtual object from a stereotyped template, resulting in a poor effect.
In the above embodiments of the present application, a neural network model is used to process a face image, identify a plurality of face components from the face image, and match a label to each face component, wherein the level of the label is set according to the attribute of the face component.
The neural network model in the above step may be a convolutional neural network model, and the backbone network may be ResNet-50, but is not limited thereto and may be chosen according to the required processing accuracy and processing speed. The input to the neural network model may be a face image, and the output is a predefined classification of the facial features (five sense organs) together with matching labels.
The matching labels in the above step can be divided into different levels, and different levels correspond to different ranges of facial features: the higher the level, the larger the range of the facial feature, which belongs to the coarse features; the lower the level, the smaller the range of the facial feature, which belongs to the detail features. For example, the primary labels may include: face shape, eyebrow, eye, nose and mouth, and the secondary labels may include: thickness of the lips, angle of the eyes, height of the cheekbones, width of the chin, position of the brow head, angle of the brow tail, etc., but are not limited thereto.
In an alternative embodiment, a large number of face images are obtained as training samples, and the facial features are manually annotated with classification and matching labels, so that the convolutional neural network model can be trained on these training samples. After the convolutional neural network model is trained, an acquired face image can be input into the model to extract the facial-feature representations and face labels, obtaining a plurality of face components and the label matched to each face component.
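As a hedged illustration of the kind of model described here, the PyTorch-style sketch below uses a ResNet-50 backbone with one classification head per facial feature; the head names and class counts are assumptions, since the patent does not specify the label taxonomy at this level of detail.

```python
import torch.nn as nn
from torchvision import models

class FaceComponentClassifier(nn.Module):
    """ResNet-50 backbone with one classification head per facial feature.
    Head names and class counts are illustrative, not taken from the patent."""

    def __init__(self, label_classes=None):
        super().__init__()
        label_classes = label_classes or {"face": 3, "eyebrow": 4, "eye": 5, "nose": 4, "mouth": 4}
        backbone = models.resnet50(weights=None)   # pretrained weights optional
        backbone.fc = nn.Identity()                # keep the 2048-d pooled feature
        self.backbone = backbone
        self.heads = nn.ModuleDict({name: nn.Linear(2048, n) for name, n in label_classes.items()})

    def forward(self, image):                      # image: (B, 3, H, W)
        feature = self.backbone(image)
        return {name: head(feature) for name, head in self.heads.items()}
```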
In the above embodiment of the present application, in step S206, the obtaining of the virtual face component matched with each face component includes: matching materials corresponding to each label from a material library based on the label of each face component; and generating virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
It should be noted that different face components can be divided as shown in fig. 4, and there is an overlapping area between different face areas, and the overlapping area is used as a transition area to prevent unsmooth stitching. As shown in fig. 5, the specific positions of the different face components can be determined by the point location information of the different face components, wherein the models such as eye shape, eyebrow shape, mouth shape, nose shape, face shape, etc. can be modified and made by the designer on the mesh vertices.
Alternatively, the priority of performing matching from the material library may be determined based on the level of the tag.
It should be noted that, because labels of different levels correspond to different ranges of facial features, the higher the level, the larger the facial-feature range and therefore the higher the corresponding priority; the lower the level, the smaller the facial-feature range and therefore the lower the corresponding priority. This avoids matching every face component against all materials at once, which would lengthen the matching time and reduce the matching efficiency.
In an optional embodiment, after the face components and their matching labels are obtained through the convolutional neural network model, the materials corresponding to the higher-level label of each face component may be matched first, and the successfully matched materials may then be matched again against the lower-level labels, so that the material corresponding to each label is obtained and each virtual face component is generated.
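A minimal sketch of this coarse-to-fine matching follows, assuming materials carry both a primary type tag and secondary detail tags; the tag names and the two-level dictionary layout are assumptions for illustration.

```python
def match_by_label_priority(primary_labels, secondary_labels, library):
    """Match materials coarse-to-fine.

    primary_labels   : e.g. {"eye": "round eye", "face": "square face"}
    secondary_labels : e.g. {"eye": {"eye_angle": "upturned"}}
    library          : iterable of dicts with "component", "type" and "tags" keys.
    The primary (higher-level) label is matched first; secondary (detail)
    labels only re-rank the candidates that survived the coarse match.
    """
    matched = {}
    for component, primary in primary_labels.items():
        candidates = [m for m in library
                      if m["component"] == component and m["type"] == primary]
        for key, value in secondary_labels.get(component, {}).items():
            refined = [m for m in candidates if m.get("tags", {}).get(key) == value]
            candidates = refined or candidates     # keep coarse matches if no detail hit
        if candidates:
            matched[component] = candidates[0]
    return matched
```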
In the above embodiment of the present application, before the step S208 of splicing the virtual face component at the target position on the face image, the method further includes: performing alignment processing on the virtual face component to align the virtual face component with the face image; and projecting the aligned virtual face component through a three-dimensional projection matrix to determine the spatial position of the virtual face component in space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
The space in the above step may be referred to as a camera space, that is, a space in which the face image is captured, but is not limited thereto.
In an alternative embodiment, the virtual face component may be aligned with the face image, and a rotation matrix R and a translation matrix T of the virtual face component M in the camera space are solved by a three-dimensional projection matrix, where R represents the rotation position of the virtual face component M in the camera coordinate system and T represents the translation position of the virtual face component M in the camera coordinate system.
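The text above states that R and T are solved through the three-dimensional projection matrix but does not give the procedure; one common approach that fits this description is a perspective-n-point (PnP) solve between the component's 3D landmark vertices and the corresponding 2D landmarks on the face image. The OpenCV sketch below is a hedged illustration, and the simple pinhole camera matrix is an assumption.

```python
import cv2
import numpy as np

def align_component_to_face(model_points_3d, image_points_2d, image_size):
    """Estimate the rotation matrix R and translation vector T of a 3D virtual
    face component in camera space from matched 2D landmarks on the face image.

    model_points_3d : (k, 3) landmark vertices on the component mesh.
    image_points_2d : (k, 2) corresponding landmarks detected on the face image.
    image_size      : (width, height) of the face image.
    """
    w, h = image_size
    camera_matrix = np.array([[w, 0, w / 2.0],     # assumed pinhole intrinsics
                              [0, w, h / 2.0],
                              [0, 0, 1.0]], dtype=np.float64)
    dist_coeffs = np.zeros(4)                      # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float64),
                                  image_points_2d.astype(np.float64),
                                  camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)                     # rotation vector -> 3x3 matrix
    return R, tvec
```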
In the above embodiment of the present application, in step S208, the splicing the virtual face component to the target position on the face image includes: determining a characteristic point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; acquiring the feature point position of each feature point in the feature point sequence in a grid based on the corresponding feature point sequence of the virtual face component on the plane; and splicing the virtual face components, and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In an alternative embodiment, based on the spatial position (R, T) of the virtual face component M in the camera space, the feature point sequence L relating the virtual face component M to the corresponding face image can be determined, and the target positions of the three-dimensional vertices of the virtual face component M can be obtained by solving the energy equation E shown below:
E = Σ_{i∈L} w_i * || P*(R_M*V_i + T_M) - N_i ||,
where w_i denotes the weight of each feature point, P denotes the three-dimensional projection matrix, R_M denotes the rotational position of the virtual face component M, T_M denotes the translational position of the virtual face component M, V_i denotes the 3D position of the i-th feature point on the mesh model (i.e., the feature point position described above), and N_i denotes the 2D position of the i-th feature point on the face image (i.e., the target position described above).
It should be noted that after the feature point positions of the feature points in the mesh are obtained, the virtual face component may be deformed using the Laplacian deformation algorithm.
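A numerical sketch of evaluating the energy E above is given below; the (2 x 3) shape assumed for the projection matrix P is an illustrative assumption, and the solver that actually minimizes E is not shown because the patent does not name one.

```python
import numpy as np

def splice_energy(P, R_M, T_M, V, N, w):
    """Evaluate E = sum_{i in L} w_i * || P*(R_M*V_i + T_M) - N_i ||.

    P    : (2, 3) projection matrix mapping camera-space points to the image plane.
    R_M  : (3, 3) rotational position of virtual face component M.
    T_M  : (3,)   translational position of virtual face component M.
    V    : (n, 3) 3D feature point positions on the mesh model.
    N    : (n, 2) 2D feature point positions on the face image.
    w    : (n,)   per-feature-point weights.
    """
    camera_points = V @ R_M.T + T_M                # R_M*V_i + T_M for every i
    projected = camera_points @ P.T                # (n, 2) projected feature points
    residuals = np.linalg.norm(projected - N, axis=1)
    return float(np.sum(w * residuals))
```

In practice this energy would be minimized over the three-dimensional vertex positions with a nonlinear least-squares routine; the specific optimizer is not stated in the patent.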
In the above embodiment of the present application, after the step S208 of splicing the virtual face component to the target position on the face image, the method further includes: superposing the virtual face components spliced on the face image; and fusing the superposition result to the face image.
In an alternative embodiment, the eye (E), nose (N) and mouth (M) components may be spliced onto the face component (F). For example, the spatial position of component E spliced onto component F may be obtained by solving the energy equation E shown below:
E = Σ_{i∈F∩E} || S*(R_E*V_i^E + T_E) - V_i^F ||,
where S denotes scaling and F∩E denotes the overlapping region of F and E; as shown in fig. 4, different regions in the face region division have overlapping portions, which serve as transition regions to prevent the boundary stitching from being unsmooth. After solving for R_E and T_E, the target positions of the vertices in the overlapping region F∩E are calculated, where D_i (i∈F∩E) denotes the target position of a vertex of the F∩E overlapping region in the Laplacian deformation algorithm; that is, the constraints in the Laplacian deformation algorithm are:
S*(R_E*V_i^E + T_E) = D_i, i∈F∩E,
V_i^F = D_i, i∈F∩E.
Then the Laplacian deformation algorithm can be used to deform component E and component F respectively, obtaining the deformed positions D_E and D_F; the post-deformation vertex positions R_i (i∈F∪E) are then computed from D_E and D_F, where w_E denotes the fusion weight terms predefined on component E.
The remaining components can be spliced by the same operation.
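The blending formula itself is not reproduced in the text above, so the sketch below should be read as one plausible interpretation only: a per-vertex linear blend of the deformed positions D_E and D_F in the overlap region using the predefined fusion weights w_E.

```python
import numpy as np

def fuse_overlap(D_E, D_F, w_E, overlap_idx_E, overlap_idx_F):
    """Blend the deformed vertex positions of components E and F in F ∩ E.

    D_E, D_F        : (nE, 3) and (nF, 3) deformed vertex positions.
    w_E             : (k,) fusion weights predefined on the k overlap vertices of E.
    overlap_idx_E/F : (k,) indices of the corresponding overlap vertices in E and F.
    The linear blend below is an assumption; the patent only states that
    predefined fusion weights on component E are used.
    """
    blended = (w_E[:, None] * D_E[overlap_idx_E]
               + (1.0 - w_E)[:, None] * D_F[overlap_idx_F])
    D_E, D_F = D_E.copy(), D_F.copy()
    D_E[overlap_idx_E] = blended
    D_F[overlap_idx_F] = blended
    return D_E, D_F
```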
It should be noted that each virtual face component can also be modified through the blend-shape (mixed model) coefficients, so that it can be superimposed on the face directly without stitching; however, because the above processing operates on the three-dimensional vertices of the virtual face component, the vertex deformation result then needs to be re-projected onto the blend-shape coefficients.
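Re-projecting deformed vertices onto blend-shape coefficients is usually a linear least-squares fit; the sketch below assumes a standard delta-blend-shape layout (mean shape plus a basis matrix), which the patent does not specify.

```python
import numpy as np

def reproject_to_blendshapes(deformed_vertices, mean_shape, blend_basis):
    """Fit blend-shape ("face-pinching mixed model") coefficients to a deformed mesh.

    deformed_vertices : (n, 3) vertex positions after Laplacian deformation.
    mean_shape        : (n, 3) neutral / mean face of the blend-shape model.
    blend_basis       : (3n, k) matrix whose columns are the blend-shape offsets.
    Returns the (k,) coefficient vector A minimizing ||blend_basis @ A - delta||.
    """
    delta = (deformed_vertices - mean_shape).reshape(-1)
    coeffs, *_ = np.linalg.lstsq(blend_basis, delta, rcond=None)
    return coeffs
```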
In the above embodiment of the present application, after the step S208 of splicing the virtual face component to the target position on the face image, the method further includes: generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model to the face pinching mixed model to obtain an expression parameter model of the face.
In an alternative embodiment, training data of virtual objects may be generated based on the face images and the virtual face components spliced on them, and an end-to-end deep learning model (i.e., the above virtual face model) may be trained; through this model, the generated virtual object can be re-projected onto the face-pinching mixed model, so that a parameterized representation A of each face can be obtained. The backbone network of the virtual face model can be ResNet-50 followed by a fully connected layer; the input is a face picture, and the output is the coefficient A of the face on the face-pinching mixed model.
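A minimal sketch of such a virtual face model is given below, again using a ResNet-50 backbone but with a fully connected head that regresses the blend-shape coefficients rather than class labels; the coefficient dimension is an assumption, since the patent does not give it.

```python
import torch.nn as nn
from torchvision import models

def build_virtual_face_model(num_coeffs=64):
    """ResNet-50 backbone whose fully connected layer regresses the coefficient
    vector A of the face on the face-pinching mixed (blend-shape) model.
    num_coeffs is an illustrative assumption."""
    model = models.resnet50(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_coeffs)
    return model
```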
A preferred embodiment of the present application is described in detail below with reference to fig. 6, and as shown in fig. 6, the method includes the steps of:
In step S61, after the face image is acquired, the facial-feature matching features and face labels in the face image may be extracted with the convolutional neural network model.
Step S62, selecting matched material from the material library to obtain a plurality of virtual face components.
Alternatively, matching materials including face, eye, nose, and mouth shapes may be acquired.
Step S63, performing post-processing optimization on the plurality of virtual face components to enhance the difference between the virtual face components corresponding to different face images.
Optionally, the post-processing procedure may include: putting the facial-feature components in correspondence with the face image, solving the three-dimensional projection matrix, and constructing the feature point sequence; then solving the energy equation and deforming the virtual face components.
And step S64, carrying out integral splicing optimization on the five sense organs to obtain an avatar.
Optionally, the spatial positions between the virtual face components are obtained by solving the energy equation, and the virtual face components are fused using the Laplacian deformation algorithm, so as to obtain the final avatar.
Through the above steps, the virtual object generation scheme provided by the embodiment of the application generates a stylized virtual object similar to the face image with a smaller workload. The scheme has the following advantages: the facial features in the generated virtual object have finer expression and post-processing and are more similar to the input face image; designers and artists only need to make a few models to obtain results with a certain degree of variety, which reduces their workload; the whole framework is expandable, and differentiable rendering can be added to further improve the network performance; after matching against the material library, the diversity of the generated virtual objects can be enhanced by further optimization with the information of the face image; and by processing the human face in sub-regions, the degree of modification of each part can be increased, enhancing the degree of freedom of the generated virtual object.
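Tying the four steps together, the following sketch shows the overall flow of S61-S64 as a single function; the four stage callables are stand-ins for the operations sketched earlier in this section, and their interfaces are assumptions.

```python
def generate_avatar(face_image, classify, match_materials, post_process, splice):
    """End-to-end flow of steps S61-S64.  Each callable is a stand-in for a
    stage sketched earlier; the exact interfaces are illustrative assumptions."""
    labels = classify(face_image)                      # S61: CNN feature/label extraction
    components = match_materials(labels)               # S62: material-library matching
    refined = post_process(components, face_image)     # S63: align, project, deform
    return splice(refined, face_image)                 # S64: energy-based splicing and fusion
```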
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
Example 2
According to the embodiment of the application, a method for processing the face image is further provided.
Fig. 7 is a flowchart of a second method for processing a face image according to an embodiment of the present application. As shown in fig. 7, the method may include the steps of:
step S702, displaying a face image of the entity object on the interactive interface, wherein the face image comprises images of a plurality of face components of the entity object.
The interactive interface in the above step may be an operation interface provided to the user on a display screen of the device implementing the face image processing method; for example, the interactive interface may be an operation interface displayed on a display screen of a mobile terminal, or an operation interface displayed on a display screen of a computer terminal, but is not limited thereto.
The physical object in the above steps may be a real object or a living being that needs to be visualized and virtualized, and since the present application is directed to face virtualization of the physical object, in the embodiment of the present application, the physical object may be a real person, a robot, and the like, but is not limited thereto. The facial components in the above steps can be facial features including face shape, eyebrow, eye, nose, mouth, etc., and can be set according to the generation requirement of the virtual image.
In an alternative embodiment, after the face image is captured, the user may select the captured face image by clicking the "upload image" button shown in fig. 8, or by dragging the face image into the dashed box, and the face image may then be displayed in the first display area shown in fig. 8.
In another alternative embodiment, the user may directly capture the face image by clicking on the "capture image" button as shown in fig. 8, and the captured face image may be displayed in the first display area as shown in fig. 8.
Step S704, if an image operation instruction is detected in any area of the interactive interface, identification of a plurality of face components in the face image is triggered, and a virtual face component matched with each face component is acquired, wherein the virtual face components are materials generated by learning with a neural network model.
The image operation instruction in the above step may be an instruction generated by the user clicking a specific button on the interactive interface, or an instruction generated by the user performing a predetermined gesture operation on the interactive interface; the instruction is used to generate a corresponding avatar based on the face image. The virtual face component in the above steps may be a virtual model designed by a designer for different face components.
In an alternative embodiment, after uploading the face image, the user may generate the image operation instruction by clicking the "avatar generation" button shown in fig. 8, or directly generate it by a gesture operation, so that the computer terminal, mobile terminal or server receives the instruction, recognizes the plurality of face components in the face image, and further obtains a virtual face component matching each face component from the material library.
Step S706, displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face components at target positions on the face image.
The target positions in the above steps may be corresponding positions of different virtual face components on the face image, and the virtual object may be a three-dimensional avatar.
In an alternative embodiment, the virtual object may be displayed in the second display area as shown in FIG. 8.
In the above embodiments of the present application, a neural network model is used to process a face image, identify a plurality of face components from the face image, and match a label to each face component, wherein the level of the label is set according to the attribute of the face component.
The neural network model in the above step may be a convolutional neural network model, and the backbone network may be ResNet-50, but is not limited thereto and may be chosen according to the required processing accuracy and processing speed. The input to the neural network model may be a face image, and the output is a predefined classification of the facial features (five sense organs) together with matching labels.
The matching labels in the above step can be divided into different levels, and different levels correspond to different ranges of facial features: the higher the level, the larger the range of the facial feature, which belongs to the coarse features; the lower the level, the smaller the range of the facial feature, which belongs to the detail features.
In the above embodiments of the present application, the obtaining of the virtual face component matched with each face component includes: matching materials corresponding to each label from a material library based on the label of each face component; and generating virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Alternatively, the priority of performing matching from the material library may be determined based on the level of the tag.
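A minimal sketch of one possible matching policy is given below: labels are tried level by level, so the priority of a query into the material library follows the label level. The dictionary-based material library and the label strings are hypothetical.

```python
# Sketch of label-driven material matching; the library layout and the policy
# of trying label levels in ascending order are assumptions for illustration.
from typing import Dict, Optional

def match_material(component_labels: Dict[int, str],
                   material_library: Dict[str, str]) -> Optional[str]:
    """component_labels maps label level -> label value; levels are tried in priority order."""
    for level in sorted(component_labels):
        label = component_labels[level]
        if label in material_library:
            return material_library[label]  # first hit at the highest-priority level wins
    return None                             # no material matched any label of this component

# Usage: a nose component carrying one coarse label and one detail label.
library = {"nose:high_bridge": "materials/nose_high_bridge.obj"}
print(match_material({1: "nose:high_bridge", 2: "nose:narrow_tip"}, library))
```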
In the above embodiment of the present application, in step S706, before displaying the virtual object on the interactive interface, the method further includes: aligning the virtual face component, and aligning the virtual face component with the face image; projecting the virtual face component after the alignment processing through a three-dimensional projection matrix, and determining the spatial position of the virtual face component in the space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
The space in the above step may be referred to as a camera space, that is, a space in which the face image is captured, but is not limited thereto.
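The following sketch shows one way the aligned component could be placed in camera space and projected with a rotation, a translation, and pinhole intrinsics. The intrinsic and pose values are made-up assumptions; in practice they would come from the alignment and fitting step.

```python
# Sketch of projecting an aligned virtual component from camera space onto the
# image plane; all numeric values below are placeholder assumptions.
import numpy as np

def project_component(vertices_3d, rotation, translation, intrinsics):
    """vertices_3d: (N, 3) component mesh points; returns (N, 2) pixel coordinates."""
    cam_points = vertices_3d @ rotation.T + translation  # rigid transform into camera space
    proj = cam_points @ intrinsics.T                     # apply pinhole intrinsics
    return proj[:, :2] / proj[:, 2:3]                    # perspective divide

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                     # rotational position of the component
t = np.array([0.0, 0.0, 50.0])    # translational position (in front of the camera)
pts_2d = project_component(np.random.rand(10, 3), R, t, K)
```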
In the above embodiments of the present application, the virtual face component is spliced to the target position on the face image in the following manner: determining a feature point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; acquiring the feature point position of each feature point of the feature point sequence in a grid based on the feature point sequence corresponding to the virtual face component on the plane; and splicing the virtual face components, and splicing the splicing result to the target position on the face image based on the feature point position of each feature point in the grid.
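As a hedged illustration of this splicing step, the sketch below maps projected feature points onto a fixed grid and warps the component texture to the target positions with an affine transform. The grid size, the three-point affine warp, and the use of OpenCV are assumptions, not requirements of the described method.

```python
# Sketch of mapping planar feature points to grid cells and splicing a component
# texture onto the face image; grid size and the affine warp are assumptions.
import numpy as np
import cv2

def feature_points_to_grid(points_2d, image_shape, grid_size=64):
    """Map (N, 2) pixel feature points to integer cells of a grid_size x grid_size grid."""
    h, w = image_shape[:2]
    cells = np.stack([points_2d[:, 0] * grid_size / w,
                      points_2d[:, 1] * grid_size / h], axis=1)
    return np.clip(cells.astype(int), 0, grid_size - 1)

def splice_component(face_img, component_img, src_tri, dst_tri):
    """Warp the component so three of its feature points land on the target positions."""
    warp = cv2.getAffineTransform(src_tri.astype(np.float32), dst_tri.astype(np.float32))
    warped = cv2.warpAffine(component_img, warp, (face_img.shape[1], face_img.shape[0]))
    mask = warped.sum(axis=2, keepdims=True) > 0  # crude mask of the warped component
    return np.where(mask, warped, face_img)
```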
In the above embodiment of the present application, after the splicing the virtual face component to the target position on the face image, the method further includes: superposing the virtual face components spliced on the face image; and fusing the superposition result to the face image.
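A minimal sketch of the superposition and fusion step, assuming a simple alpha blend controlled by a component mask; a feathered mask or gradient-domain blending could equally be used.

```python
# Sketch of fusing the superposed component overlay back onto the face image;
# the fixed alpha value is an assumption chosen only for illustration.
import numpy as np

def fuse_overlay(face_img, component_overlay, component_mask, alpha=0.85):
    """component_mask: (H, W, 1) values in [0, 1]; alpha controls component opacity."""
    face = face_img.astype(np.float32)
    overlay = component_overlay.astype(np.float32)
    blended = component_mask * (alpha * overlay + (1.0 - alpha) * face) \
              + (1.0 - component_mask) * face
    return blended.astype(np.uint8)
```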
In the above embodiment of the present application, in step S706, after the virtual object is displayed on the interactive interface, the method further includes: generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model to the face pinching mixed model to obtain an expression parameter model of the face.
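To illustrate the re-projection onto a blend ("face-pinching") model, the sketch below recovers blend coefficients by least squares from vertices predicted by a trained virtual face model. The blendshape basis, mesh layout, and solver choice are assumptions for this example.

```python
# Sketch of fitting blend ("face-pinching") coefficients to the output of a
# trained virtual face model; the basis and shapes are illustrative assumptions.
import numpy as np

def fit_blend_coefficients(predicted_vertices, mean_shape, blend_basis):
    """
    predicted_vertices, mean_shape: (V, 3) meshes; blend_basis: (K, V, 3).
    Returns the K coefficients that minimize the reprojection residual.
    """
    residual = (predicted_vertices - mean_shape).reshape(-1)  # (3V,)
    basis = blend_basis.reshape(blend_basis.shape[0], -1).T   # (3V, K)
    coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    return coeffs
```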
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 3
According to the embodiment of the application, a method for processing the face image is further provided.
Fig. 9 is a flowchart of a processing method of a third face image according to an embodiment of the present application. As shown in fig. 9, the method may include the steps of:
step S902, displaying a face image of the physical object on the interactive interface, wherein the face image includes images of a plurality of face components of the physical object.
The interactive interface in the above step may be an operation interface provided to the user on a display screen of the device implementing the facial image processing method; for example, the interactive interface may be an operation interface displayed on a display screen of the mobile terminal, or an operation interface displayed on a display screen of the computer terminal, but is not limited thereto.
The physical object in the above steps may be a real object or a living being that needs to be visualized and virtualized, and since the present application is directed to face virtualization of the physical object, in the embodiment of the present application, the physical object may be a real person, a robot, and the like, but is not limited thereto. The facial components in the above steps can be facial features including face shape, eyebrow, eye, nose, mouth, etc., and can be set according to the generation requirement of the virtual image.
And step S904, an avatar operation instruction is sensed in the interactive interface.
The avatar operation instruction in the above step may be an instruction generated by a user clicking a specific button on the interactive interface, or an instruction generated by a user performing a predetermined gesture operation on the interactive interface, the instruction being for generating a corresponding avatar based on the face image.
And step S906, in response to the avatar operation instruction, recognizing a plurality of face components in the face image, and acquiring a virtual face component matched with each face component, wherein the virtual face component is a material generated by learning with a neural network model.
The virtual face component in the above steps may be a virtual model designed by a designer for different face components.
Step S908, outputting a selection page on the interactive interface, where the selection page provides at least one facial component option, where different facial component options are used for characterizing the facial components at different positions to be selectively processed.
The selection page in the above steps may be a newly popped-up page, or a page displayed directly within an area of the interactive interface. Multiple face component options are displayed on the page, and the user may determine the face components to be virtualized by selecting different face component options.
It should be noted that different facial component options correspond to different charging standards, so that in actual use the user can select options according to the required generation accuracy and the intended use of the virtual object.
In an alternative embodiment, after the avatar operation instruction is received, a selection page may be output on the interactive interface, and the user may select face component options as desired, so that the computer terminal, the mobile terminal, or the server can determine, based on the selected options, the virtual face components required to generate the avatar.
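A small sketch of how the chosen options could restrict which recognized components are virtualized is given below; the option names and the component record format are assumptions made for illustration.

```python
# Sketch of filtering recognized components by the options ticked on the
# selection page; names and record layout are hypothetical.
def select_components(recognized_components, selected_options):
    """recognized_components: list of dicts with a 'name' key; selected_options: set of names."""
    return [c for c in recognized_components if c["name"] in selected_options]

# Usage: only the eyes and mouth were selected on the page.
components = [{"name": "eyebrow"}, {"name": "eye"}, {"name": "mouth"}]
print(select_components(components, {"eye", "mouth"}))
```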
Step S910, displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face component to a target position on the face image.
The target positions in the above steps may be corresponding positions of different virtual face components on the face image, and the virtual object may be a three-dimensional avatar.
In the above embodiments of the present application, a neural network model is used to process a face image, identify a plurality of face components from the face image, and match a label to each face component, wherein the level of the label is set according to the attribute of the face component.
The neural network model in the above steps may be a convolutional neural network model, and the backbone network may be ResNet-50, but is not limited thereto; it may be selected according to the processing accuracy and processing speed actually required. The input to the neural network model may be a face image, and the output is a predefined classification of the facial features together with the matching labels.
The matching labels in the above steps can be classified into different levels, and different levels correspond to different ranges of facial features: the higher the level, the larger the range of the facial feature, which belongs to the coarse features; the lower the level, the smaller the range of the facial feature, which belongs to the detail features.
In the above embodiments of the present application, the obtaining of the virtual face component matched with each face component includes: matching materials corresponding to each label from a material library based on the label of each face component; and generating virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Alternatively, the priority of performing matching from the material library may be determined based on the level of the tag.
In the above embodiment of the present application, in step S910, before displaying the virtual object on the interactive interface, the method further includes: aligning the virtual face component, and aligning the virtual face component with the face image; projecting the virtual face component after the alignment processing through a three-dimensional projection matrix, and determining the spatial position of the virtual face component in the space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
The space in the above step may be referred to as a camera space, that is, a space in which the face image is captured, but is not limited thereto.
In the above embodiments of the present application, the virtual face component is spliced to the target position on the face image in the following manner: determining a feature point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; acquiring the feature point position of each feature point of the feature point sequence in a grid based on the feature point sequence corresponding to the virtual face component on the plane; and splicing the virtual face components, and splicing the splicing result to the target position on the face image based on the feature point position of each feature point in the grid.
In the above embodiment of the present application, after the splicing the virtual face component to the target position on the face image, the method further includes: superposing the virtual face components spliced on the face image; and fusing the superposition result to the face image.
In the above embodiment of the present application, in step S910, after the virtual object is displayed on the interactive interface, the method further includes: generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model to the face pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 4
According to the embodiment of the application, a method for processing the face image is further provided.
Fig. 10 is a flowchart of a fourth method for processing a face image according to an embodiment of the present application. As shown in fig. 10, the method may include the steps of:
in step S1002, the front-end client uploads a face image captured of the physical object, wherein the face image includes images of a plurality of face components of the physical object.
The front-end client in the above steps may be an application installed on a mobile terminal used by the user, such as a mobile phone or a tablet computer, or an application installed on a computer terminal, such as a notebook computer or a desktop computer, but is not limited thereto. The physical object in the above steps may be a real object or a living being that needs to be visualized and virtualized, and since the present application is directed to face virtualization of the physical object, in the embodiment of the present application, the physical object may be a real person, a robot, and the like, but is not limited thereto. The facial components in the above steps can be facial features including face shape, eyebrow, eye, nose, mouth, etc., and can be set according to the generation requirement of the virtual image.
In an alternative embodiment, the user may capture the physical object through a camera, a mobile phone, a tablet computer, or other capturing devices, so as to obtain a face image of the physical object. After the face image is shot, the user can operate on the front-end client, select the face image needing to be processed and upload the face image.
In step S1004, the front-end client transmits the face image of the entity object to the background server.
The background server in the above steps may be a server for performing the facial image processing described above, and the server may be a cloud server, but is not limited thereto.
In an alternative embodiment, the front-end client may connect to the background server through the internet or a wireless network, and transmit the image data to be processed to the background server for processing.
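As a hedged sketch of this transmission step, the snippet below posts the face image to a background server over HTTP using the requests library; the endpoint URL, field name, and JSON response format are assumptions, and a real deployment could use any other transport.

```python
# Sketch of a front-end client uploading the face image to a background server;
# the URL and response format are placeholder assumptions.
import requests

def upload_face_image(path, server_url="https://example.com/api/avatar"):
    with open(path, "rb") as f:
        resp = requests.post(server_url, files={"face_image": f}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. a list of matched virtual face components

# components = upload_face_image("face.jpg")
```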
Step S1006, the front-end client receives the multiple virtual face components returned by the background server, wherein the virtual face components matched with each face component are obtained by identifying the multiple face components in the face image, and the virtual face components are materials generated by learning by adopting a neural network model.
The virtual face component in the above steps may be a virtual model designed by a designer for different face components.
Step S1008, the front-end client receives the avatar operation instruction, splices the virtual face component at the target position on the face image, and generates a virtual object.
The avatar operation instruction in the above step may be an instruction generated by a user clicking a specific button on the interactive interface, or an instruction generated by a user performing a predetermined gesture operation on the interactive interface, the instruction being for generating a corresponding avatar based on the face image. The target positions in the above steps may be corresponding positions of different virtual face components on the face image, and the virtual object may be a three-dimensional avatar.
In the above embodiments of the present application, a plurality of face components are identified from a face image by processing the face image using a neural network model, and a label is matched to each face component, wherein a level of the label is set according to an attribute of the face component.
The neural network model in the above steps may be a convolutional neural network model, and the backbone network may be ResNet-50, but is not limited thereto; it may be selected according to the processing accuracy and processing speed actually required. The input to the neural network model may be a face image, and the output is a predefined classification of the facial features together with the matching labels.
The matching labels in the above steps can be classified into different levels, and different levels correspond to different ranges of facial features: the higher the level, the larger the range of the facial feature, which belongs to the coarse features; the lower the level, the smaller the range of the facial feature, which belongs to the detail features.
In the above embodiments of the present application, the virtual face component matching each face component is obtained in the following manner: matching materials corresponding to each label from a material library based on the label of each face component; and generating virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Alternatively, the priority of performing matching from the material library may be determined based on the level of the tag.
In the above embodiment of the present application, in step S1008, before the splicing the virtual face component to the target position on the face image and generating the virtual object, the method further includes: aligning the virtual face component, and aligning the virtual face component with the face image; projecting the virtual face component after the alignment processing through a three-dimensional projection matrix, and determining the spatial position of the virtual face component in the space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
The space in the above step may be referred to as a camera space, that is, a space in which the face image is captured, but is not limited thereto.
In the above embodiment of the present application, in step S1008, the splicing of the virtual face component to the target position on the face image includes: determining a feature point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; acquiring the feature point position of each feature point of the feature point sequence in a grid based on the feature point sequence corresponding to the virtual face component on the plane; and splicing the virtual face components, and splicing the splicing result to the target position on the face image based on the feature point position of each feature point in the grid.
In the above embodiment of the present application, after the step S1008 of splicing the virtual face component to the target position on the face image, the method further includes: superposing the virtual face components spliced on the face image; and fusing the superposition result to the face image.
In the above embodiment of the present application, after the step S1008 of splicing the virtual face component to the target position on the face image, the method further includes: generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model to the face pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
According to an embodiment of the present application, there is also provided a face image processing apparatus for implementing the above face image processing method, as shown in fig. 11, the apparatus 1100 includes: a first acquisition module 1102, a recognition module 1104, a second acquisition module 1106, and a generation module 1108.
The first obtaining module 1102 is configured to obtain a face image of a physical object, where the face image includes images of a plurality of face components of the physical object; the recognition module 1104 is used for recognizing a plurality of face components in the face image; the second obtaining module 1106 is configured to obtain a virtual face component matching each face component, where the virtual face component is a material generated by learning using a neural network model; the generation module 1108 is configured to stitch the virtual face component to a target location on the face image to generate a virtual object.
It should be noted here that the first obtaining module 1102, the identifying module 1104, the second obtaining module 1106 and the generating module 1108 correspond to steps S202 to S208 in embodiment 1, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the recognition module is further configured to process the face image by using a neural network model, recognize a plurality of face components from the face image, and match a label to each face component, where a level of the label is set according to an attribute of the face component.
In the above embodiments of the present application, the second obtaining module includes: a matching unit and a generating unit.
The matching unit is used for matching materials corresponding to each label from the material library based on the label of each face component; the generating unit is used for generating virtual face components based on the matched materials, wherein the face areas corresponding to different virtual face components have overlapping areas.
In the above embodiment of the present application, the apparatus further includes: an alignment module and a projection module.
The alignment module is used for aligning the virtual face component and aligning the virtual face component with the face image; the projection module is used for projecting the aligned virtual face component through a three-dimensional projection matrix, and determining a spatial position of the virtual face component in a space, wherein the spatial position includes at least one of the following: a rotational position and a translational position.
In the above embodiments of the present application, the generating module includes: the device comprises a determining unit, an acquiring unit and a splicing unit.
The determining unit is used for determining a feature point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; the acquiring unit is used for acquiring the feature point position of each feature point in the feature point sequence in the grid based on the corresponding feature point sequence of the virtual face component on the plane; the splicing unit is used for splicing the virtual face components and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In the above embodiment of the present application, the apparatus further includes: the device comprises a superposition module and a fusion module.
The superposition module is used for superposing the virtual face component spliced on the face image; and the fusion module is used for fusing the superposition result to the face image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a training module and a third acquisition module.
The training module is used for generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and the third acquisition module is used for re-projecting the virtual face model onto the face-pinching mixed model to acquire the expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 6
According to an embodiment of the present application, there is also provided a face image processing apparatus for implementing the above face image processing method, as shown in fig. 12, the apparatus 1200 includes: a first display module 1202, a processing module 1204, and a second display module 1206.
Wherein the first display module 1202 is configured to display a facial image of the physical object on the interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; the processing module 1204 is configured to, if an avatar operation instruction is detected in any region of the interactive interface, trigger recognition of a plurality of face components in the face image and obtain a virtual face component matched with each face component, where the virtual face component is a material generated by learning using a neural network model; the second display module 1206 is configured to display a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target location on the face image.
It should be noted here that the first display module 1202, the processing module 1204 and the second display module 1206 correspond to steps S702 to S706 in embodiment 2, and the three modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the processing module is further configured to process the face image by using a neural network model, identify a plurality of face components from the face image, and match a label to each face component, where a level of the label is set according to an attribute of the face component.
In the above embodiments of the present application, the processing module includes: a matching unit and a generating unit.
The matching unit is used for matching materials corresponding to each label from the material library based on the label of each face component; the generating unit is used for generating virtual face components based on the matched materials, wherein the face areas corresponding to different virtual face components have overlapping areas.
In the above embodiment of the present application, the apparatus further includes: an alignment module and a projection module.
The alignment module is used for aligning the virtual face component and aligning the virtual face component with the face image; the projection module is used for projecting the aligned virtual face component through a three-dimensional projection matrix, and determining a spatial position of the virtual face component in a space, wherein the spatial position includes at least one of the following: a rotational position and a translational position.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module, a first obtaining module and a splicing module.
The determining module is used for determining a characteristic point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; the first acquisition module is used for acquiring the feature point position of each feature point in the feature point sequence in the grid based on the corresponding feature point sequence of the virtual face component on the plane; the splicing module is used for splicing the virtual face components and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In the above embodiment of the present application, the apparatus further includes: the device comprises a superposition module and a fusion module.
The superposition module is used for superposing the virtual face component spliced on the face image; and the fusion module is used for fusing the superposition result to the face image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a training module and a second acquisition module.
The training module is used for generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; the second obtaining module is used for re-projecting the virtual face model to the face-pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
According to an embodiment of the present application, there is also provided a facial image processing apparatus for implementing the above facial image processing method, as shown in fig. 13, the apparatus 1300 includes: a first display module 1302, a sensing module 1304, a processing module 1306, an output module 1308, and a second display module 1310.
Wherein the first display module 1302 is configured to display a facial image of the physical object on the interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; the sensing module 1304 is used for sensing an avatar operation instruction in the interactive interface; the processing module 1306 is configured to, in response to the avatar operation instruction, perform recognition on a plurality of face components in the face image, and obtain a virtual face component matching each face component, where the virtual face component is a material generated by learning using a neural network model; the output module 1308 is configured to output a selection page on the interactive interface, where the selection page provides at least one facial component option, and different facial component options are used to characterize the facial components at different positions to be selectively processed; the second display module 1310 is configured to display a virtual object on the interactive interface, wherein the virtual object is generated by stitching a virtual face component to a target location on the face image.
It should be noted that, the first display module 1302, the sensing module 1304, the processing module 1306, the output module 1308 and the second display module 1310 correspond to steps S902 to S910 in embodiment 3, and the five modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the processing module is further configured to process the face image by using a neural network model, identify a plurality of face components from the face image, and match a label to each face component, where a level of the label is set according to an attribute of the face component.
In the above embodiments of the present application, the processing module includes: a matching unit and a generating unit.
The matching unit is used for matching materials corresponding to each label from the material library based on the label of each face component; the generating unit is used for generating virtual face components based on the matched materials, wherein the face areas corresponding to different virtual face components have overlapping areas.
In the above embodiment of the present application, the apparatus further includes: an alignment module and a projection module.
The alignment module is used for aligning the virtual face component and aligning the virtual face component with the face image; the projection module is used for projecting the aligned virtual face component through a three-dimensional projection matrix, and determining a spatial position of the virtual face component in a space, wherein the spatial position includes at least one of the following: a rotational position and a translational position.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module, a first obtaining module and a splicing module.
The determining module is used for determining a characteristic point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; the first acquisition module is used for acquiring the feature point position of each feature point in the feature point sequence in the grid based on the corresponding feature point sequence of the virtual face component on the plane; the splicing module is used for splicing the virtual face components and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In the above embodiment of the present application, the apparatus further includes: the device comprises a superposition module and a fusion module.
The superposition module is used for superposing the virtual face component spliced on the face image; and the fusion module is used for fusing the superposition result to the face image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a training module and a second acquisition module.
The training module is used for generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; the second obtaining module is used for re-projecting the virtual face model to the face-pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 8
According to the embodiment of the application, a facial image processing device for implementing the facial image processing method is also provided, and the device can be deployed in a front-end client. As shown in fig. 14, the apparatus 1400 includes: an upload module 1402, a transmit module 1404, a receive module 1406, and a generate module 1408.
Wherein the upload module 1402 is configured to upload a face image captured of the physical object, wherein the face image comprises images of a plurality of face components of the physical object; the transmission module 1404 is configured to transmit the face image of the entity object to a background server; the receiving module 1406 is configured to receive a plurality of virtual face components returned by the background server, where a virtual face component matched with each face component is obtained by identifying the plurality of face components in the face image, and the virtual face component is a material generated by learning using a neural network model; the generating module 1408 is configured to receive the avatar operation instruction, splice the virtual face component to a target position on the face image, and generate a virtual object.
It should be noted here that the uploading module 1402, the transmitting module 1404, the receiving module 1406, and the generating module 1408 correspond to steps S1002 to S1008 in embodiment 4, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the apparatus further includes: a recognition module.
The recognition module is configured to process the face image by using a neural network model, identify a plurality of face components from the face image, and match a label to each face component, wherein the level of the label is set according to the attribute of the face component.
In the above embodiment of the present application, the apparatus further includes: the device comprises a matching module and a generating module.
The matching module is used for matching materials corresponding to each label from the material library based on the label of each face component; the generating module is used for generating virtual face components based on the matched materials, wherein the face areas corresponding to different virtual face components have overlapping areas.
In the above embodiment of the present application, the apparatus further includes: an alignment module and a projection module.
The alignment module is used for aligning the virtual face component and aligning the virtual face component with the face image; the projection module is used for projecting the aligned virtual face component through a three-dimensional projection matrix, and determining a spatial position of the virtual face component in a space, wherein the spatial position includes at least one of the following: a rotational position and a translational position.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module, a first obtaining module and a splicing module.
The determining module is used for determining a characteristic point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; the first acquisition module is used for acquiring the feature point position of each feature point in the feature point sequence in the grid based on the corresponding feature point sequence of the virtual face component on the plane; the splicing module is used for splicing the virtual face components and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In the above embodiment of the present application, the apparatus further includes: the device comprises a superposition module and a fusion module.
The superposition module is used for superposing the virtual face component spliced on the face image; and the fusion module is used for fusing the superposition result to the face image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a training module and a second acquisition module.
The training module is used for generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; the second obtaining module is used for re-projecting the virtual face model to the face-pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 9
According to an embodiment of the present application, there is also provided a face image processing system, including:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying a plurality of facial components in the facial image; acquiring virtual face components matched with each face component, wherein the virtual face components are materials generated by learning by adopting a neural network model; and splicing the virtual face component at the target position on the face image to generate a virtual object.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 10
According to the embodiment of the application, a method for processing the face image is further provided.
Fig. 15 is a flowchart of a fifth method for processing a face image according to an embodiment of the present application. As shown in fig. 15, the method may include the steps of:
step S1502 displays a face image of the physical object on the interactive interface, wherein the face image includes images of a plurality of face components of the physical object.
The interactive interface in the above step may be an operation interface provided to the user on a display screen of the device implementing the facial image processing method; for example, the interactive interface may be an operation interface displayed on a display screen of the mobile terminal, or an operation interface displayed on a display screen of the computer terminal, but is not limited thereto.
The physical object in the above steps may be a real object or a living being that needs to be visualized and virtualized, and since the present application is directed to face virtualization of the physical object, in the embodiment of the present application, the physical object may be a real person, a robot, and the like, but is not limited thereto. The facial components in the above steps can be facial features including face shape, eyebrow, eye, nose, mouth, etc., and can be set according to the generation requirement of the virtual image.
In an alternative embodiment, after the face image is captured, the user may select and upload it by clicking the "upload image" button shown in fig. 8, or by dragging the face image into the dashed box, and the uploaded face image may be displayed in the first display area shown in fig. 8.
In another alternative embodiment, the user may directly capture the face image by clicking on the "capture image" button as shown in fig. 8, and the captured face image may be displayed in the first display area as shown in fig. 8.
In step S1504, in a case where an avatar operation instruction is detected in any area of the interactive interface, a plurality of face components in the face image are recognized in response to the avatar operation instruction, and a virtual face component matching each face component is acquired.
The avatar operation instruction in the above step may be an instruction generated by a user clicking a specific button on the interactive interface, or an instruction generated by a user performing a predetermined gesture operation on the interactive interface, the instruction being for generating a corresponding avatar based on the face image. The virtual face component in the above steps may be a virtual model designed by a designer for different face components.
In an alternative embodiment, after uploading the face image, the user may generate the avatar operation command by clicking an "avatar generation" button as shown in fig. 8, or directly generate the avatar operation command by gesture operation, so that the computer terminal, the mobile terminal or the server may receive the avatar operation command, recognize a plurality of face components in the face image, and further obtain a virtual face component matching each face component from the material library.
In step S1506, a virtual object is displayed on the interactive interface, wherein the virtual object is generated by splicing the virtual face component to a target position on the face image.
The target positions in the above steps may be corresponding positions of different virtual face components on the face image, and the virtual object may be a three-dimensional avatar.
In an alternative embodiment, the virtual object may be displayed in the second display area as shown in FIG. 8.
In the above embodiments of the present application, a neural network model is used to process a face image, identify a plurality of face components from the face image, and match a label to each face component, wherein the level of the label is set according to the attribute of the face component.
The neural network model in the above steps may be a convolutional neural network model, and the backbone network may be ResNet-50, but is not limited thereto; it may be selected according to the processing accuracy and processing speed actually required. The input to the neural network model may be a face image, and the output is a predefined classification of the facial features together with the matching labels.
The matching labels in the above steps can be classified into different levels, and different levels correspond to different ranges of facial features: the higher the level, the larger the range of the facial feature, which belongs to the coarse features; the lower the level, the smaller the range of the facial feature, which belongs to the detail features.
In the above embodiments of the present application, the obtaining of the virtual face component matched with each face component includes: matching materials corresponding to each label from a material library based on the label of each face component; and generating virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Alternatively, the priority of performing matching from the material library may be determined based on the level of the tag.
In the foregoing embodiment of the present application, in step S1506, before displaying the virtual object on the interactive interface, the method further includes: aligning the virtual face component, and aligning the virtual face component with the face image; projecting the virtual face component after the alignment processing through a three-dimensional projection matrix, and determining the spatial position of the virtual face component in the space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
The space in the above step may be referred to as a camera space, that is, a space in which the face image is captured, but is not limited thereto.
In the above embodiments of the present application, the virtual face component is spliced to the target position on the face image in the following manner: determining a feature point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; acquiring the feature point position of each feature point of the feature point sequence in a grid based on the feature point sequence corresponding to the virtual face component on the plane; and splicing the virtual face components, and splicing the splicing result to the target position on the face image based on the feature point position of each feature point in the grid.
In the above embodiment of the present application, after the splicing the virtual face component to the target position on the face image, the method further includes: superposing the virtual face components spliced on the face image; and fusing the superposition result to the face image.
In the foregoing embodiment of the present application, in step S1506, after the virtual object is displayed on the interactive interface, the method further includes: generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model to the face pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 11
According to an embodiment of the present application, there is also provided a facial image processing apparatus for implementing the above facial image processing method, as shown in fig. 16, the apparatus 1600 includes: a first display module 1602, a response module 1604, and a second display module 1606.
Wherein the first display module 1602 is configured to display a facial image of the physical object on the interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object; the response module 1604 is configured to, in a case where an avatar operation instruction is detected in any region of the interactive interface, identify a plurality of face components in the face image in response to the avatar operation instruction, and acquire a virtual face component matching each face component; the second display module 1606 is configured to display a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target location on the face image.
It should be noted here that the first display module 1602, the response module 1604, and the second display module 1606 correspond to steps S1502 to S1506 in embodiment 10, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the response module is further configured to process the face image by using a neural network model, identify a plurality of face components from the face image, and match a label to each face component, where a level of the label is set according to an attribute of the face component.
In the above embodiments of the present application, the response module includes: a matching unit and a generating unit.
The matching unit is used for matching materials corresponding to each label from the material library based on the label of each face component; the generating unit is used for generating virtual face components based on the matched materials, wherein the face areas corresponding to different virtual face components have overlapping areas.
In the above embodiment of the present application, the apparatus further includes: an alignment module and a projection module.
The alignment module is used for aligning the virtual face component and aligning the virtual face component with the face image; the projection module is used for projecting the aligned virtual face component through a three-dimensional projection matrix, and determining a spatial position of the virtual face component in a space, wherein the spatial position includes at least one of the following: a rotational position and a translational position.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module, a first obtaining module and a splicing module.
The determining module is used for determining a characteristic point sequence corresponding to the virtual face component on a plane based on the spatial position of the virtual face component in the space; the first acquisition module is used for acquiring the feature point position of each feature point in the feature point sequence in the grid based on the corresponding feature point sequence of the virtual face component on the plane; the splicing module is used for splicing the virtual face components and splicing the splicing result to the target position on the face image based on the characteristic point position of each characteristic point in the grid.
In the above embodiment of the present application, the apparatus further includes: the device comprises a superposition module and a fusion module.
The superposition module is used for superposing the virtual face component spliced on the face image; and the fusion module is used for fusing the superposition result to the face image.
In the above embodiment of the present application, the apparatus further includes: the device comprises a training module and a second acquisition module.
The training module is used for generating training data of a virtual object based on the face image and a virtual face component spliced on the face image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model; the second obtaining module is used for re-projecting the virtual face model to the face-pinching mixed model to obtain an expression parameter model of the face.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 12
The embodiment of the application can provide a computer terminal, and the computer terminal can be any one computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal described above may execute program codes of the following steps in the processing method of the face image: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying a plurality of facial components in the facial image; acquiring virtual face components matched with each face component, wherein the virtual face components are materials generated by learning by adopting a neural network model; and splicing the virtual face component at the target position on the face image to generate a virtual object.
Alternatively, fig. 17 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 17, the computer terminal a may include: one or more processors 1702 (only one of which is shown), and a memory 1704.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the facial image processing method and apparatus in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the above-mentioned facial image processing method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of face components of the physical object; identifying a plurality of facial components in the facial image; acquiring virtual face components matched with each face component, wherein the virtual face components are materials generated by learning by adopting a neural network model; and splicing the virtual face component at the target position on the face image to generate a virtual object.
Optionally, the processor may further execute the program code of the following steps: processing the face image by using a neural network model, identifying a plurality of face components from the face image, and matching a label to each face component, wherein the grade of the label is set according to the attribute of the face component.
Optionally, the processor may further execute the program code of the following steps: matching, from a material library, the material corresponding to each label based on the label of each facial component; and generating the virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Optionally, the processor may further execute the program code of the following step: determining the priority of matching from the material library based on the level of the label.
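The following sketch shows one way the label level could determine the priority of matching from a material library; the library contents and file paths are hypothetical.

```python
# Sketch under assumed data layout: labels are processed in order of their level,
# so higher-priority (lower level value) labels are matched from the library first.
MATERIAL_LIBRARY = {
    "eye:round": "materials/eye_round.png",       # hypothetical material entries
    "mouth:full": "materials/mouth_full.png",
    "nose:straight": "materials/nose_straight.png",
}


def match_materials(labeled_components):
    # labeled_components: list of dicts with "label" and "level" keys;
    # a lower level value means the label is matched earlier.
    ordered = sorted(labeled_components, key=lambda c: c["level"])
    matches = {}
    for comp in ordered:
        material = MATERIAL_LIBRARY.get(comp["label"])
        if material is not None:
            matches[comp["label"]] = material
    return matches


print(match_materials([{"label": "mouth:full", "level": 2},
                       {"label": "eye:round", "level": 1}]))
# {'eye:round': 'materials/eye_round.png', 'mouth:full': 'materials/mouth_full.png'}
```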
Optionally, the processor may further execute the program code of the following steps: before stitching the virtual face component at the target position on the face image, aligning the virtual face component with the face image; and projecting the aligned virtual face component through a three-dimensional projection matrix to determine the spatial position of the virtual face component in space, wherein the spatial position comprises at least one of a rotational position and a translational position.
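A sketch of projecting the aligned component through a three-dimensional projection matrix composed of a rotational and a translational part is given below; the intrinsic matrix, rotation, and translation values are illustrative only.

```python
# Sketch: project 3D landmark points of an aligned virtual face component into
# image space with a 3x4 projection matrix [R | t]. The intrinsics, rotation,
# and translation below are illustrative, not parameters from this disclosure.
import numpy as np


def project_points(points_3d: np.ndarray, R: np.ndarray, t: np.ndarray,
                   K: np.ndarray) -> np.ndarray:
    """points_3d: (N, 3) component landmarks; returns (N, 2) pixel coordinates."""
    extrinsic = np.hstack([R, t.reshape(3, 1)])                   # rotational + translational position
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])   # (N, 4) homogeneous points
    cam = (K @ extrinsic @ homo.T).T                              # (N, 3) homogeneous image coords
    return cam[:, :2] / cam[:, 2:3]                               # perspective divide


# Illustrative usage with an identity rotation and a small forward translation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
landmarks = np.array([[0.0, 0.0, 0.0], [0.1, 0.05, 0.02]])
print(project_points(landmarks, R, t, K))
```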
Optionally, the processor may further execute the program code of the following steps: determining a feature point sequence of the virtual face component on a plane based on the spatial position of the virtual face component in space; obtaining, based on that feature point sequence, the position of each feature point in a grid; and stitching the virtual face components and attaching the stitching result to the target position on the face image based on the grid position of each feature point.
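The mapping from a component's feature point sequence on the plane to positions in a grid could look like the following sketch, assuming a regular grid with a fixed cell size (the cell size is an assumption).

```python
# Sketch: snap the projected feature point sequence of one virtual face component
# to cells of a regular grid; the grid positions then drive where the stitched
# result is placed on the face image.
import numpy as np


def feature_points_to_grid(points_2d: np.ndarray, cell_size: int = 16) -> np.ndarray:
    """points_2d: (N, 2) plane coordinates; returns integer (row, col) grid positions."""
    cols = np.floor(points_2d[:, 0] / cell_size).astype(int)
    rows = np.floor(points_2d[:, 1] / cell_size).astype(int)
    return np.stack([rows, cols], axis=1)


points = np.array([[321.0, 242.0], [336.5, 250.0]])
print(feature_points_to_grid(points))   # [[15 20] [15 21]]
```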
Optionally, the processor may further execute the program code of the following steps: after stitching the virtual face components at the target positions on the face image, superimposing the virtual face components stitched on the face image; and fusing the superimposition result onto the face image.
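A simple sketch of fusing the superimposed component layer onto the face image is shown below; alpha blending is used here as one common choice and is not prescribed by this application.

```python
# Sketch: fuse the superimposed virtual-component layer back onto the face image
# with a soft mask (alpha blending), one common fusion operator.
import numpy as np


def fuse(face_image: np.ndarray, component_layer: np.ndarray,
         mask: np.ndarray) -> np.ndarray:
    """mask: float array in [0, 1], 1 where the stitched components cover the face."""
    mask3 = mask[..., None]                                    # broadcast over channels
    blended = mask3 * component_layer + (1.0 - mask3) * face_image
    return blended.astype(face_image.dtype)


face = np.zeros((4, 4, 3), dtype=np.uint8)
layer = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 0.5
print(fuse(face, layer, mask)[1, 1])    # [127 127 127]
```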
Optionally, the processor may further execute the program code of the following steps: after stitching the virtual face components at the target positions on the face image, generating training data of the virtual object based on the face image and the virtual face components stitched on it, wherein the training data is used to train a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model onto a face-pinching blend model to obtain an expression parameter model of the face.
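The sketch below illustrates assembling training samples from the face image and its stitched virtual components, and expressing a virtual face as weights of a face-pinching blend (blendshape) model via a least-squares fit; the data layout and the solver are assumptions for illustration.

```python
# Sketch: build training samples and fit a virtual face to a blendshape basis.
# The record layout and the unconstrained least-squares solve are illustrative.
import numpy as np


def make_training_sample(face_image, stitched_components):
    # One record of training data for the virtual face (deep learning) model.
    return {"input": face_image, "target": stitched_components}


def fit_blend_weights(virtual_face_vertices: np.ndarray,
                      blendshapes: np.ndarray) -> np.ndarray:
    """virtual_face_vertices: (V*3,) flattened mesh of the virtual face model.
    blendshapes: (K, V*3) basis of the face-pinching blend model.
    Returns K shape/expression parameters."""
    weights, *_ = np.linalg.lstsq(blendshapes.T, virtual_face_vertices, rcond=None)
    return weights


basis = np.random.rand(5, 30)                       # 5 toy blendshapes over 10 vertices
target = basis.T @ np.array([0.2, 0.0, 0.5, 0.1, 0.0])
print(np.round(fit_blend_weights(target, basis), 3))   # recovers ~[0.2 0. 0.5 0.1 0.]
```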
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; if an image operation instruction is detected in any region of the interactive interface, triggering identification of the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component, wherein the virtual face components are materials generated through learning with a neural network model; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; sensing an image operation instruction in the interactive interface; in response to the image operation instruction, identifying the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component, wherein the virtual face components are materials generated through learning with a neural network model; outputting a selection page on the interactive interface, the selection page providing at least one facial component option, wherein different facial component options represent facial components at different positions to be selectively processed; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
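As a purely illustrative stand-in for the interactive flow above, the sketch below senses an instruction, recognizes components, offers a selection of facial component options, and returns the result to be displayed; every callback is a placeholder, not an interface of this application.

```python
# Illustrative console stand-in for the interactive-interface flow: recognize
# components, present a "selection page" of component options, stitch only the
# components the user selected, and return the virtual object to display.
def run_interactive_flow(face_image, recognize, match, stitch, choose_options):
    components = recognize(face_image)            # triggered by the operation instruction
    options = [c["name"] for c in components]     # the selection page's component options
    selected = choose_options(options)            # user picks components to process
    canvas = face_image
    for c in components:
        if c["name"] in selected:
            canvas = stitch(canvas, match(c), c)
    return canvas                                  # displayed as the virtual object


result = run_interactive_flow(
    face_image="face.png",                                        # placeholder image handle
    recognize=lambda img: [{"name": "eye"}, {"name": "mouth"}],
    match=lambda c: f"virtual_{c['name']}",
    stitch=lambda canvas, patch, c: f"{canvas}+{patch}",
    choose_options=lambda opts: {"eye"},                          # simulated user selection
)
print(result)   # face.png+virtual_eye
```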
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: uploading, by a front-end client, a face image captured of a physical object, wherein the face image comprises images of a plurality of facial components of the physical object; transmitting, by the front-end client, the face image of the physical object to a background server; receiving, by the front-end client, a plurality of virtual face components returned by the background server, wherein the virtual face component matched with each facial component is obtained by identifying the plurality of facial components in the face image, and the virtual face components are materials generated through learning with a neural network model; and receiving, by the front-end client, an image operation instruction, stitching the virtual face components at target positions on the face image, and generating a virtual object.
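The front-end/back-end exchange could be sketched as follows, assuming HTTP as the transport; the endpoint URL and the response fields are hypothetical and not specified by this application.

```python
# Sketch of the front-end client's exchange with the background server, with
# HTTP as an assumed transport. The endpoint path and JSON fields are hypothetical.
import requests


def request_virtual_components(image_path: str, server_url: str):
    # Upload the captured face image to the background server.
    with open(image_path, "rb") as f:
        response = requests.post(f"{server_url}/face/components", files={"image": f})
    response.raise_for_status()
    # The server identifies the facial components and returns the matched
    # virtual face components (assumed here to arrive as a JSON list).
    return response.json()["virtual_components"]


def on_image_operation_instruction(image_path: str, server_url: str, stitch):
    components = request_virtual_components(image_path, server_url)
    return stitch(image_path, components)   # stitching happens on the client side
```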
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; in a case that an image operation instruction is detected in the interactive interface, in response to the image operation instruction, identifying the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
By adopting the embodiments of the present application, a face image processing scheme is provided. A virtual face component matched with each facial component in the face image is acquired, and the virtual object is generated by stitching the plurality of virtual face components, so there is no need to match a template of the whole face image, and face images with widely differing facial components can be covered, giving the generated virtual object a high similarity to the physical object. In addition, the virtual face components can be generated through learning with a neural network model, so the overall generation method is extensible and the effect of the generated virtual object is improved, keeping a balance between expressiveness and stylization between the virtual object and the physical object. This solves the technical problem that face image processing methods in the related art produce virtual objects with poor effect because they generate them from stereotyped templates.
It can be understood by those skilled in the art that the structure shown in fig. 17 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD, and fig. 17 does not limit the structure of the electronic device described above. For example, computer terminal A may include more or fewer components (e.g., a network interface, a display device) than shown in fig. 17, or have a configuration different from that shown in fig. 17.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware of the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Example 13
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store the program code for executing the face image processing method provided in the above embodiments.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a face image of a physical object, wherein the face image comprises images of a plurality of facial components of the physical object; identifying the plurality of facial components in the face image; acquiring a virtual face component matched with each facial component, wherein the virtual face components are materials generated through learning with a neural network model; and stitching the virtual face components at target positions on the face image to generate a virtual object.
Optionally, the storage medium is further configured to store program code for performing the following steps: processing the face image with a neural network model, identifying the plurality of facial components from the face image, and assigning a label to each facial component, wherein the level of the label is set according to the attribute of the facial component.
Optionally, the storage medium is further configured to store program code for performing the following steps: matching, from a material library, the material corresponding to each label based on the label of each facial component; and generating the virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
Optionally, the storage medium is further configured to store program code for performing the following step: determining the priority of matching from the material library based on the level of the label.
Optionally, the storage medium is further configured to store program code for performing the following steps: before stitching the virtual face component at the target position on the face image, aligning the virtual face component with the face image; and projecting the aligned virtual face component through a three-dimensional projection matrix to determine the spatial position of the virtual face component in space, wherein the spatial position comprises at least one of a rotational position and a translational position.
Optionally, the storage medium is further configured to store program code for performing the following steps: determining a feature point sequence of the virtual face component on a plane based on the spatial position of the virtual face component in space; obtaining, based on that feature point sequence, the position of each feature point in a grid; and stitching the virtual face components and attaching the stitching result to the target position on the face image based on the grid position of each feature point.
Optionally, the storage medium is further configured to store program code for performing the following steps: after stitching the virtual face components at the target positions on the face image, superimposing the virtual face components stitched on the face image; and fusing the superimposition result onto the face image.
Optionally, the storage medium is further configured to store program code for performing the following steps: after stitching the virtual face components at the target positions on the face image, generating training data of the virtual object based on the face image and the virtual face components stitched on it, wherein the training data is used to train a virtual face model, and the virtual face model is a deep learning model; and re-projecting the virtual face model onto a face-pinching blend model to obtain an expression parameter model of the face.
Optionally, the storage medium is further configured to store program code for performing the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; if an image operation instruction is detected in any region of the interactive interface, triggering identification of the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component, wherein the virtual face components are materials generated through learning with a neural network model; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
Optionally, the storage medium is further configured to store program code for performing the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; sensing an image operation instruction in the interactive interface; in response to the image operation instruction, identifying the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component, wherein the virtual face components are materials generated through learning with a neural network model; outputting a selection page on the interactive interface, the selection page providing at least one facial component option, wherein different facial component options represent facial components at different positions to be selectively processed; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
Optionally, the storage medium is further configured to store program code for performing the following steps: uploading, by a front-end client, a face image captured of a physical object, wherein the face image comprises images of a plurality of facial components of the physical object; transmitting, by the front-end client, the face image of the physical object to a background server; receiving, by the front-end client, a plurality of virtual face components returned by the background server, wherein the virtual face component matched with each facial component is obtained by identifying the plurality of facial components in the face image, and the virtual face components are materials generated through learning with a neural network model; and receiving, by the front-end client, an image operation instruction, stitching the virtual face components at target positions on the face image, and generating a virtual object.
Optionally, the storage medium is further configured to store program code for performing the following steps: displaying a face image of the physical object on an interactive interface, wherein the face image includes images of a plurality of facial components of the physical object; in a case that an image operation instruction is detected in the interactive interface, in response to the image operation instruction, identifying the plurality of facial components in the face image and acquiring a virtual face component matched with each facial component; and displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face components to target positions on the face image.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in whole or in part in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.
Claims (20)
1. A method of processing a facial image, comprising:
obtaining a facial image of a physical object, wherein the facial image includes images of a plurality of facial components of the physical object;
identifying a plurality of facial components in the facial image;
acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
and stitching the virtual face component at a target position on the facial image to generate a virtual object.
2. The method of claim 1, wherein the facial image is processed using a neural network model, the plurality of facial components are identified from the facial image, and a label is assigned to each facial component, wherein the level of the label is set according to the attribute of the facial component.
3. The method of claim 2, wherein obtaining a virtual face component that matches each face component comprises:
matching materials corresponding to each label from a material library based on the label of each face component;
and generating the virtual face components based on the matched materials, wherein the face regions corresponding to different virtual face components have overlapping regions.
4. The method of claim 3, wherein the priority of performing matching from the material library is determined based on the level of the label.
5. The method of any of claims 1-4, wherein prior to stitching the virtual face component to a target location on the facial image, the method further comprises:
aligning the virtual face component with the face image;
projecting the virtual face component after the alignment processing through a three-dimensional projection matrix, and determining the spatial position of the virtual face component in the space, wherein the spatial position comprises at least one of the following: a rotational position and a translational position.
6. The method of claim 5, wherein stitching the virtual face component to a target location on the facial image comprises:
determining a corresponding feature point sequence of the virtual face component on a plane based on the spatial position of the virtual face component in space;
acquiring the feature point position of each feature point in the feature point sequence in a grid based on the corresponding feature point sequence of the virtual face component on the plane;
and stitching the virtual face components, and attaching the stitching result to the target position on the facial image based on the position of each feature point in the grid.
7. The method of claim 6, wherein after stitching the virtual face component to a target location on the facial image, the method further comprises:
superimposing the virtual face component stitched on the face image;
and fusing the superimposition result onto the facial image.
8. The method of claim 1, wherein after stitching the virtual face component to a target location on the facial image, the method further comprises:
generating training data of the virtual object based on the facial image and the virtual face component stitched on the facial image, wherein the training data is used for training a virtual face model, and the virtual face model is a deep learning model;
and re-projecting the virtual face model onto a face-pinching blend model to obtain an expression parameter model of the face.
9. A method of processing a facial image, comprising:
displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
if an image operation instruction is detected in any region of the interactive interface, triggering identification of a plurality of facial components in the facial image and acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target position on the facial image.
10. A method of processing a facial image, comprising:
displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
sensing an image operation instruction in the interactive interface;
in response to the image operation instruction, identifying a plurality of facial components in the facial image and acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
outputting a selection page on the interactive interface, the selection page providing at least one facial component option, wherein different facial component options represent facial components at different positions to be selectively processed;
displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target location on the facial image.
11. A method of processing a facial image, comprising:
uploading, by a front-end client, a facial image captured of a physical object, wherein the facial image comprises images of a plurality of facial components of the physical object;
the front-end client transmits the facial image of the physical object to a background server;
the front-end client receives a plurality of virtual face components returned by the background server, wherein the virtual face component matched with each facial component is obtained by identifying the plurality of facial components in the facial image, and the virtual face components are materials generated through learning with a neural network model;
and the front-end client receives an image operation instruction, stitches the virtual face component at a target position on the facial image, and generates a virtual object.
12. A method of processing a facial image, comprising:
displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
in a case that an image operation instruction is detected in the interactive interface, responding to the image operation instruction, identifying a plurality of facial components in the facial image, and acquiring a virtual face component matched with each facial component;
displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target position on the facial image.
13. A facial image processing apparatus comprising:
a first acquisition module for acquiring a facial image of a physical object, wherein the facial image includes images of a plurality of facial components of the physical object;
a recognition module to recognize a plurality of facial components in the facial image;
the second acquisition module is used for acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
and the generating module is used for stitching the virtual face component at a target position on the facial image to generate a virtual object.
14. A facial image processing apparatus comprising:
a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
the processing module is used for, if an image operation instruction is detected in any region of the interactive interface, triggering identification of a plurality of facial components in the facial image and acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
a second display module for displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target position on the facial image.
15. A facial image processing apparatus comprising:
a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
the sensing module is used for sensing an image operation instruction in the interactive interface;
the processing module is used for responding to the image operation instruction, identifying a plurality of facial components in the facial image, and acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model;
the output module is used for outputting a selection page on the interactive interface, the selection page providing at least one facial component option, wherein different facial component options represent facial components at different positions to be selectively processed;
a second display module for displaying a virtual object on the interactive interface, wherein the virtual object is generated by splicing the virtual face component to a target position on the face image.
16. A facial image processing apparatus comprising:
an upload module to upload a facial image captured of a physical object, wherein the facial image includes images of a plurality of facial components of the physical object;
the transmission module is used for transmitting the facial image of the physical object to a background server;
the receiving module is used for receiving a plurality of virtual face components returned by the background server, wherein the virtual face component matched with each facial component is obtained by identifying the plurality of facial components in the facial image, and the virtual face components are materials generated through learning with a neural network model;
and the generating module is used for receiving an image operation instruction, stitching the virtual face component at a target position on the facial image, and generating a virtual object.
17. A facial image processing apparatus comprising:
a first display module for displaying a facial image of a physical object on an interactive interface, wherein the facial image includes images of a plurality of facial components of the physical object;
the response module is used for, in a case that an image operation instruction is detected in the interactive interface, responding to the image operation instruction, identifying a plurality of facial components in the facial image, and acquiring a virtual face component matched with each facial component;
a second display module for displaying a virtual object on the interactive interface, wherein the virtual object is generated by stitching the virtual face component to a target position on the facial image.
18. A computer-readable storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the facial image processing method according to any one of claims 1 to 11.
19. A processing terminal, comprising: a memory and a processor for executing a program stored in the memory, wherein the program executes to perform the method of processing a facial image according to any one of claims 1 to 11.
20. A facial image processing system, comprising:
a processor; and
a memory coupled to the processor and configured to provide the processor with instructions for processing the following steps: obtaining a facial image of a physical object, wherein the facial image includes images of a plurality of facial components of the physical object; identifying a plurality of facial components in the facial image; acquiring a virtual face component matched with each facial component, wherein the virtual face component is a material generated through learning with a neural network model; and stitching the virtual face component at a target position on the facial image to generate a virtual object.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011055348 | 2020-09-29 | ||
CN2020110553482 | 2020-09-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114359471A true CN114359471A (en) | 2022-04-15 |
Family
ID=81095965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110014584.8A Pending CN114359471A (en) | 2020-09-29 | 2021-01-06 | Face image processing method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359471A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109308727A (en) * | 2018-09-07 | 2019-02-05 | 腾讯科技(深圳)有限公司 | Virtual image model generating method, device and storage medium |
CN109671016A (en) * | 2018-12-25 | 2019-04-23 | 网易(杭州)网络有限公司 | Generation method, device, storage medium and the terminal of faceform |
CN109857311A (en) * | 2019-02-14 | 2019-06-07 | 北京达佳互联信息技术有限公司 | Generate method, apparatus, terminal and the storage medium of human face three-dimensional model |
CN110782515A (en) * | 2019-10-31 | 2020-02-11 | 北京字节跳动网络技术有限公司 | Virtual image generation method and device, electronic equipment and storage medium |
CN111632374A (en) * | 2020-06-01 | 2020-09-08 | 网易(杭州)网络有限公司 | Method and device for processing face of virtual character in game and readable storage medium |
Non-Patent Citations (1)
Title |
---|
AL-OSAIMI, F. R. M., BENNAMOUN, M., & MIAN, A: "An Expression Deformation Approach to Non-rigid 3D Face Recognition", INTERNATIONAL JOURNAL OF COMPUTER VISION, vol. 3, no. 81, 20 September 2008 (2008-09-20), pages 5 - 6 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913058A (en) * | 2022-05-27 | 2022-08-16 | 北京字跳网络技术有限公司 | Display object determination method and device, electronic equipment and storage medium |
WO2023227045A1 (en) * | 2022-05-27 | 2023-11-30 | 北京字跳网络技术有限公司 | Display object determination method and apparatus, electronic device, and storage medium |
CN114913058B (en) * | 2022-05-27 | 2024-10-01 | 北京字跳网络技术有限公司 | Display object determining method and device, electronic equipment and storage medium |
WO2024027819A1 (en) * | 2022-08-05 | 2024-02-08 | 北京字跳网络技术有限公司 | Image processing method and apparatus, device, and storage medium |
CN116302294A (en) * | 2023-05-18 | 2023-06-23 | 安元科技股份有限公司 | Method and system for automatically identifying component attribute through interface |
CN116302294B (en) * | 2023-05-18 | 2023-09-01 | 安元科技股份有限公司 | Method and system for automatically identifying component attribute through interface |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11734844B2 (en) | 3D hand shape and pose estimation | |
US11798261B2 (en) | Image face manipulation | |
CN110807836B (en) | Three-dimensional face model generation method, device, equipment and medium | |
CN108305312B (en) | Method and device for generating 3D virtual image | |
CN114359471A (en) | Face image processing method, device and system | |
WO2021213067A1 (en) | Object display method and apparatus, device and storage medium | |
CN110458924B (en) | Three-dimensional face model establishing method and device and electronic equipment | |
CN109035415B (en) | Virtual model processing method, device, equipment and computer readable storage medium | |
JP2024500896A (en) | Methods, systems and methods for generating 3D head deformation models | |
CN113628327A (en) | Head three-dimensional reconstruction method and equipment | |
KR20230110787A (en) | Methods and systems for forming personalized 3D head and face models | |
JP2024503794A (en) | Method, system and computer program for extracting color from two-dimensional (2D) facial images | |
CN113822965A (en) | Image rendering processing method, device and equipment and computer storage medium | |
Varona et al. | Toward natural interaction through visual recognition of body gestures in real-time | |
Onizuka et al. | Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes | |
Nóbrega et al. | Interactive 3D content insertion in images for multimedia applications | |
CN106502401A (en) | A kind of display control method and device | |
WO2017147826A1 (en) | Image processing method for use in smart device, and device | |
AU2021227740A1 (en) | Face mesh deformation with detailed wrinkles | |
CN116524106B (en) | Image labeling method, device, equipment, storage medium and program product | |
US20240020901A1 (en) | Method and application for animating computer generated images | |
Agus et al. | PEEP: Perceptually Enhanced Exploration of Pictures. | |
US20240193877A1 (en) | Virtual production | |
CN114792356A (en) | Method, device and system for processing face image | |
CN114494579A (en) | Icon generation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20230911
Address after: Room 516, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province
Applicant after: Alibaba Dharma Institute (Hangzhou) Technology Co.,Ltd.
Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK
Applicant before: ALIBABA GROUP HOLDING Ltd.