CN114373034A - Image processing method, image processing apparatus, image processing device, storage medium, and computer program


Info

Publication number
CN114373034A
Authority
CN
China
Prior art keywords
description information
information
training
image
model
Prior art date
Legal status
Pending
Application number
CN202210024607.8A
Other languages
Chinese (zh)
Inventor
周志强 (Zhou Zhiqiang)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210024607.8A priority Critical patent/CN114373034A/en
Publication of CN114373034A publication Critical patent/CN114373034A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/344 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses an image processing method, an image processing apparatus, an image processing device, a storage medium, and a computer program. The method includes: acquiring an image to be processed, where the image to be processed includes a target part of an object, the target part is in a first form, and the first form does not match an expected form; acquiring description information corresponding to the first form; and correcting the form of the target part based on the description information corresponding to the first form, so that the form of the target part is corrected from the first form to a second form that matches the expected form. By adopting the embodiment of the application, human resources can be saved and the efficiency of form correction improved.

Description

Image processing method, image processing apparatus, image processing device, storage medium, and computer program
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, a storage medium, and a computer program.
Background
In some virtual reality applications and game applications, in order to provide players with more realistic and immersive effects, 3D (three-dimensional) technology is generally adopted to create 3D characters such as player characters and monster characters in games. Further, to make a 3D character more lifelike, the 3D character may be given expressions, such as smiling or sad expressions.
For example, when creating an expression for a 3D character, an actor typically gives a performance, and an animator then edits the controller parameters of the face controller bound to the face of the 3D character according to that performance to produce the specific expression; alternatively, the corresponding controller parameters are obtained automatically from photographs of the actor by certain technical methods.
However, in a 3D character expression created in this way, there is a certain error between the form of a given part and the form of the corresponding part of the actor (the actor's form can be understood as the expected form). It should be understood that the forms of different parts play a crucial role in how an expression is read, and even a small difference can greatly affect the emotion conveyed by certain expressions, especially for the various parts of the face. Therefore, after the form of a part is obtained, it is compared with the expected form, and if the two do not match, the form is corrected by manual adjustment. Manual correction not only consumes human resources but also reduces correction efficiency. Therefore, in the field of 3D character expression creation, how to efficiently correct the form of a part that does not match the expected form is a topic of active research.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing apparatus, an image processing device, a storage medium, and a computer program, which improve the efficiency of correcting the form of a target part.
In one aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and correcting the form of the target part based on the description information corresponding to the first form, so as to correct the form of the target part from the first form to a second form, wherein the second form matches the expected form.
In one aspect, an embodiment of the present application further provides an image processing apparatus, including:
an acquiring unit, configured to acquire an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
the acquiring unit is further configured to acquire description information corresponding to the first form;
and a correction unit configured to correct a form of the target portion based on the description information corresponding to the first form, and correct the form of the target portion from the first form to a second form, the second form matching the expected form.
In one aspect, an embodiment of the present application provides an image processing device, including: a processor adapted to implement one or more computer programs; and a computer storage medium storing one or more computer programs adapted to be loaded and executed by the processor to perform:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and correcting the form of the target part based on the description information corresponding to the first form, so as to correct the form of the target part from the first form to a second form, wherein the second form matches the expected form.
In one aspect, an embodiment of the present application provides a computer storage medium storing a computer program, which when executed by a processor of an image processing apparatus, is configured to perform:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and correcting the form of the target part based on the description information corresponding to the first form, so as to correct the form of the target part from the first form to a second form, wherein the second form matches the expected form.
In one aspect, embodiments of the present application provide a computer program product or a computer program, where the computer program product includes a computer program stored in a computer storage medium; a processor of the image processing device reads the computer program from the computer storage medium and executes it to cause the image processing device to perform:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and correcting the form of the target part based on the description information corresponding to the first form, so as to correct the form of the target part from the first form to a second form, wherein the second form matches the expected form.
In the embodiment of the application, the target part of the object in the image to be processed is in the first form. If the first form does not match the expected form, the description information corresponding to the first form can be acquired, and the form of the target part is then automatically corrected based on that description information, so that the form of the target part is corrected from the first form to a second form that matches the expected form. Compared with manually correcting the first form, automatically performing form correction according to the description information corresponding to the first form saves the human resources consumed by manual correction and improves the efficiency of form correction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1a is a schematic diagram of a 3D character image bound to a form controller according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a morphology controller controlling a morphology of a target site according to an embodiment of the present application;
FIG. 1c is a schematic view of a lip shape modification according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an association between an image to be processed and an image grid according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a parameter prediction model according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another image processing method provided in the embodiments of the present application;
FIG. 6a is a schematic diagram illustrating a training process of a parameter prediction model according to an embodiment of the present disclosure;
FIG. 6b is a schematic diagram of a training process for a position reconstruction model according to an embodiment of the present application;
FIG. 6c is a schematic diagram of a training process for an information category discrimination model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of adversarial training of a parameter prediction model and an information category discrimination model according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The embodiment of the application provides an image processing scheme. After an image to be processed is acquired, if it is determined that a target part of an object in the image to be processed is in a first form, it is further detected whether the first form matches an expected form; when the first form of the target part does not match the expected form, the form of the target part can be automatically corrected from the first form to a second form that matches the expected form, based on the description information corresponding to the first form.
In one embodiment, the image processing scheme may be executed by an image processing device, which may be a terminal, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart home appliance, a smart voice interaction device, or the like; alternatively, the image processing apparatus may be a server, such as an independent physical server, a server cluster or a distributed system configured by a plurality of physical servers, or a cloud server providing a cloud computing service.
In another embodiment, the image processing scheme may be executed jointly by an image processing device and an image management device. For example, the image management device may store a number of images to be processed and send an image to be processed to the image processing device, which executes the image processing scheme after acquiring it. Alternatively, the image processing device acquires the image to be processed from its local storage, acquires the description information corresponding to the first form of the target part in the image to be processed, and transmits the description information to the image management device; the image management device corrects the form of the target part based on the description information and finally feeds the correction result back to the image processing device.
Alternatively, the target part of the object in the image to be processed may be any one of the five sense organs included in the face of the object, such as the eyes, lips, nose, ears, or eyebrows; or the target part may be another part of the object's body, such as a finger or an arm. For convenience of description, unless otherwise specified, the embodiments of the present application are described by taking the lips as the target part.
The image processing scheme provided by the embodiment of the application can be applied to a 3D character expression making scene, for example, the image to be processed is a 3D character image, expressions are made for the 3D character image through the following steps (1) and (2), and the form of a target part in the made expressions is a first form. Specifically, the method comprises the following steps:
(1) Referring to fig. 1a, which is a schematic diagram of binding a 3D character image to a form controller according to an embodiment of the present application, 101 denotes the 3D character image. The 3D character image is associated with an image mesh, so that each part of the face of the object in the 3D character image can be described by several mesh vertices; for example, the lips can be described by the mesh vertices within 103. 102 denotes the part controls corresponding to the 3D character image in the form controller; a part control may include one or more components, and each part of the 3D character corresponds to one part control. For example, 1021 denotes the part control bound to the lips of the 3D character in the form controller. The controller parameters of the form controller may include the parameters of each part control, and the parameters of a part control may include the position information, size information, shape information, and so on of the components included in that part control. When the parameters of the part control corresponding to a certain part of the face are changed, the form of that part changes accordingly. Taking the lips as an example, and referring to fig. 1b, which shows the relationship between the form controller and the form of the lips according to an embodiment of the present application, assume that 011 in fig. 1b denotes the part control corresponding to the lips in the form controller and that this part control includes a component 012. When the component 012 is in the position and form shown by 013, the form of the lips of the 3D character is as shown by 014; when the component changes from the position and form shown by 013 to the position and form shown by 015, the form of the lips of the 3D character changes as shown by 016;
(2) An actor gives a performance, and the animator then edits the form controller parameters according to the performance to produce a specific expression. For example, during the actor's performance the form of the lips is as shown at 11 in fig. 1c, and this form is the expected form; the form controller parameters are edited to create an expression for the 3D character image, and the first form of the lips in the created expression is as shown at 12 in fig. 1c.
It can be seen that the first form shown at 12 differs from the form of the lips when the actor performs (the expected form). Therefore, the lips need to be corrected using the image processing scheme provided by the embodiment of the present application. Specifically, when the lips are in the current form, the description information corresponding to the current form is obtained; the description information may be the position information of the several mesh vertices used to represent the lips in the image mesh. The form obtained by performing form correction based on this description information is shown at 13 in fig. 1c.
As can be seen from fig. 1c, the form corrected by the image processing scheme of the present application matches the expected form closely, and the scheme requires no manual correction, which saves correction time and improves correction efficiency.
Optionally, the 3D character may be a game object in a game application. Before the game application is formally released, some expressions may be designed in advance for the game object in order to achieve a more realistic game effect, and the game object may present different expressions in different game scenes. For example, in a game battle scene, the game object may present a more excited expression; when the game is won, the game object may show a smiling expression. Different expressions can be designed for the game object through steps (1) and (2) above, and if the form of the target part in a designed expression does not match the expected form, the form of the target part is corrected through the image processing scheme, so that the expression of the game object is more accurate.
Optionally, the 3D character may also be a virtual human in a scenario of human-computer interaction based on a virtual human. Such a scenario may be: the virtual human serves a product, and the product is discussed or learned about through chat interaction between the virtual human and the user. In order to give the user the realistic feeling of chatting with a real person, several expressions can be designed for the virtual human in advance, so that the virtual human presents different forms in different chat scenarios. For example, if in the chat the user speaks favorably of the product, the virtual human may present a smiling expression; if the chat indicates that the user is dissatisfied with the product, the virtual human may present a sad expression. An expression is designed for the virtual human through steps (1) and (2) above, and if the form of the target part in the designed expression does not match the expected form, the form of the target part is corrected by the above image processing scheme.
In other embodiments, the scenario of human-computer interaction based on a virtual human may also be: the virtual human acts as a usage guide for an application, and when the user first enters the application, the virtual human shows the user how to use the application through certain body movements. Alternatively, the scenario may be: the virtual human is the avatar of a user in an application, with which the user represents himself or herself; through this avatar the user can have conversational interactions with friends in the application, or use it in the application to participate in activities that simulate real life, such as growing vegetables, holding meetings, and making friends.
In a specific implementation, any application scenario involving correction of an object's form can use the image processing scheme provided in the embodiment of the present application, so as to save correction time and improve correction efficiency.
Based on the image processing scheme, an embodiment of the present application provides an image processing method, and referring to fig. 2, a flowchart of the image processing method provided by the embodiment of the present application is shown. The image processing method described in fig. 2 may be executed by an image processing apparatus, and specifically may be executed by a processor of the image processing apparatus; the image processing method illustrated in fig. 2 may include the steps of:
step S201, acquiring an image to be processed, where the image to be processed includes a target portion of an object, a form of the target portion is a first form, and the first form is not matched with an expected form.
The image to be processed may be stored locally in the image processing device, and the image processing device may directly obtain the image to be processed from the local storage at this time; alternatively, the image to be processed may be generated by and stored in other devices, where the acquiring of the image to be processed by the image processing device means that the image processing device receives the image to be processed sent by the other devices.
The image to be processed may include the face of an object, which may be a human or an animal; in 3D character expression creation, it may refer to the 3D character image, and the image to be processed may be synthesized using 3D technology. For example, 12 in fig. 1c may represent an image to be processed.
The target portion in the image to be processed may refer to any one of five sense organs of the face of the subject, such as eyes, eyebrows, a nose, lips, and ears; alternatively, the target site may refer to other parts of the subject, such as the arm. The shape of the target portion may be a first shape, the first shape may be generated by the face controller, and details on how the first shape is generated by the face controller may be referred to in the foregoing, and will not be described herein again. It should be understood that different configurations of the target portion may convey different states of the subject, and that the first configuration may include a mouth-open state, a smile state, or a laugh state, etc., assuming the target portion is a lip.
In one embodiment, the first form of the target part in the image to be processed may be compared with the expected form to determine whether the first form matches the expected form. If the first form does not match the expected form, the subsequent steps S202 and S203 are required in order to correct the form of the target part; if the first form matches the expected form, steps S202 and S203 need not be performed. For example, if the target part is the lips, the first form is a smiling state, and the expected form is a smiling state with the mouth open, then the first form does not match the expected form; or the first form and the expected form may both be smiling, but a difference in the degree of smiling also means the first form does not match the expected form.
In one embodiment, comparing the first form of the target part in the image to be processed with the expected form and determining the matching condition between them may include: manually and visually comparing the first form with the expected form, and feeding back the matching condition between the first form and the expected form to the image processing device according to the comparison result.
In another embodiment, comparing the first morphology of the target region in the image to be processed with the expected morphology, and determining a matching condition between the first morphology and the expected morphology, may further include: carrying out similarity comparison processing on the target part in the first form and the target part in the expected form by adopting an image similarity analysis technology to obtain a similarity result; if the similarity result is less than the similarity threshold, it may be determined that the first morphology does not match the expected morphology; if the similarity result is greater than the similarity threshold, the first morphology is determined to match the expected morphology.
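For illustration only, the following is a minimal sketch (not part of the original disclosure) of the similarity-based matching check described above, assuming NumPy, grayscale crops of the target region, and an arbitrarily chosen similarity measure and threshold; the actual image similarity analysis technique is not specified by the application.

```python
import numpy as np

def morphologies_match(region_a: np.ndarray, region_b: np.ndarray,
                       similarity_threshold: float = 0.9) -> bool:
    """Compare the target part in the first form (region_a) with the same
    part in the expected form (region_b).

    Both inputs are grayscale crops of identical shape with values in [0, 1].
    Returns True when the similarity result exceeds the similarity threshold.
    """
    # One possible similarity result: 1 - normalized mean absolute difference.
    # Any image similarity analysis technique could be substituted here.
    similarity = 1.0 - np.abs(region_a - region_b).mean()
    return similarity > similarity_threshold

# Placeholder crops of the lip region standing in for real renders.
rng = np.random.default_rng(0)
lip_first_form = rng.random((64, 128))
lip_expected_form = rng.random((64, 128))
print(morphologies_match(lip_first_form, lip_expected_form))
```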
The expected form may be any form specified in advance; it may be the form of the target part of an object in any picture or video. For example, when creating a 3D character expression, the form of the actor's corresponding part during the actor's performance is specified in advance as the expected form.
Step S202, obtaining description information corresponding to the first form.
The image to be processed may be associated with an image mesh, the image mesh may include M mesh vertices, the mesh vertices refer to points constituting the mesh, there may be multiple meshes on the image mesh, and each mesh may be composed of multiple vertices. The target portion corresponds to N mesh vertices of the M mesh vertices, and a change in position of the N mesh vertices (where a change in position of the N mesh vertices may refer to a change in one or more mesh vertices of the N mesh vertices) may cause the target portion to have a different form, so that the position information of the N mesh vertices may be regarded as description information of the first form.
For example, referring to fig. 3, a schematic diagram of associating an image to be processed with an image mesh provided by an embodiment of the present application is provided, where the image mesh may include a plurality of mesh vertices, such as 301 and 302 in fig. 3, which each represent a mesh vertex, and each portion of the face of a subject in the image to be processed may correspond to several mesh vertices, such as a mesh vertex in the region 303 corresponding to a lip. Assuming that the target portion is a lip, the description information when the lip is in the first form is the position information of a plurality of mesh vertices in the region 303.
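As an illustrative sketch (not part of the original disclosure) of how the description information could be represented, the following assumes NumPy and hypothetical vertex counts and indices: the description information of the first form is simply the (N, 3) array of positions of the mesh vertices that correspond to the target part.

```python
import numpy as np

# mesh_vertices: positions of all M mesh vertices associated with the image,
# shape (M, 3). lip_vertex_indices: the N vertex indices corresponding to the
# target part (here the lips). Both values are illustrative placeholders.
rng = np.random.default_rng(1)
mesh_vertices = rng.random((5000, 3))        # M = 5000 (assumed)
lip_vertex_indices = np.arange(120, 160)     # N = 40 (assumed)

# Description information of the first form: positions of the N lip vertices.
description_info = mesh_vertices[lip_vertex_indices]   # shape (N, 3)
print(description_info.shape)                           # (40, 3)
```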
Step S203 corrects the form of the target portion based on the description information corresponding to the first form, and corrects the form of the target portion from the first form to a second form, the second form matching the expected form.
In one embodiment, modifying the form of the target portion based on the description information corresponding to the first form to modify the form of the target portion from the first form to the second form may refer to: predicting a target parameter of the form controller based on the description information corresponding to the first form, and calling the form controller to adjust the description information corresponding to the first form based on the target parameter, wherein the adjusted description information is used for reflecting the form of the target part as the second form.
The target parameter can be predicted based on the description information corresponding to the first form by calling a parameter prediction model, and the parameter prediction model may be obtained by pre-training. In a specific implementation, step S203 of correcting the form of the target part based on the description information corresponding to the first form, so as to correct the form of the target part from the first form to the second form, includes: calling the parameter prediction model to perform parameter prediction based on the description information corresponding to the first form to obtain a target parameter of the form controller; and calling the form controller to adjust the description information based on the target parameter, where the adjusted description information reflects that the form of the target part is the second form. The parameter prediction model may be any deep network structure. Referring to fig. 4, which is a schematic structural diagram of the parameter prediction model provided in the embodiment of the present application, the parameter prediction model described in fig. 4 may include three fully connected layers connected in sequence, where the input of each subsequent fully connected layer is the output of the previous fully connected layer, and the input of the first fully connected layer is the description information corresponding to the first form. Based on their respective inputs, the three fully connected layers successively fit the description information to the corresponding parameters of the form controller and output the target parameter. Each fully connected layer may be followed by an activation function, which may be LeakyReLU. LeakyReLU is one of the most commonly used activation functions in the deep learning field, and its specific expression can be shown by the following formula:
LeakyReLU(x) = x, if x ≥ 0; LeakyReLU(x) = α * x, if x < 0
In summary, if the input x to the activation function LeakyReLU is greater than or equal to 0, the output is the input itself; if the input is less than 0, the output is the input multiplied by a parameter α, where α typically takes the value 0.2. In the above formula, x represents the input of the activation function.
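The following is a minimal sketch (not part of the original disclosure) of a parameter prediction model with the structure described above, assuming PyTorch; the hidden-layer sizes, N, and H are illustrative assumptions, since the application does not fix them.

```python
import torch
import torch.nn as nn

N = 40   # number of mesh vertices for the target part (assumed)
H = 25   # number of form controller parameters for the target part (assumed)

# Mesh2Params: three fully connected layers, each followed by LeakyReLU(0.2),
# mapping description information of shape (1, N*3) to a parameter vector of
# shape (1, H). Hidden sizes are assumptions; the application does not fix them.
parameter_prediction_model = nn.Sequential(
    nn.Linear(N * 3, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 128),   nn.LeakyReLU(0.2),
    nn.Linear(128, H),     nn.LeakyReLU(0.2),
)

description_info = torch.randn(1, N * 3)   # placeholder first-form description information
target_params = parameter_prediction_model(description_info)
print(target_params.shape)                 # torch.Size([1, 25]), i.e. (1, H)
```

The (1, H) output would then be handed to the form controller, which adjusts the description information so that the target part takes the second form.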
In another embodiment, correcting the form of the target part based on the description information corresponding to the first form may also include: acquiring the description information corresponding to the expected form; and adjusting the description information corresponding to the first form in the direction that reduces the difference between the description information corresponding to the first form and the description information corresponding to the expected form, where the adjusted description information corresponds to the second form. This way of correcting the form of the target part can be applied in scenarios where the description information corresponding to the expected form can be acquired; the earlier way of correcting the form of the target part based on the description information corresponding to the first form is more suitable for application scenarios in which the description information corresponding to the expected form cannot be acquired, although it is also applicable when that description information can be acquired.
In the embodiment of the application, the target part of the object in the image to be processed is in the first form, and if the first form is not matched with the expected form, the description information corresponding to the first form can be acquired; further, the target portion is morphologically modified based on the description information corresponding to the first morphology such that the morphology of the target portion is modified from the first morphology to a second morphology that matches the expected morphology, such that the morphology of the target portion does not need to be modified manually by a person, but is automatically morphologically modified according to the description information of the first morphology such that the morphology of the target portion satisfies the expected morphology. The human resource consumption caused by manual correction is saved, and the form correction efficiency is improved.
Based on the above image processing method, an embodiment of the present application further provides another image processing method, and referring to fig. 5, a flowchart of the another image processing method provided by the embodiment of the present application is shown. The image processing method described in fig. 5 may be executed by an image processing apparatus, and may specifically be executed by a processor of the image processing apparatus. The image processing method of fig. 5 may include the steps of:
step S501, an image to be processed is obtained, the image to be processed comprises a target part of an object, the form of the target part is a first form, and the first form is not matched with an expected form.
And step S502, acquiring description information corresponding to the first form.
Some possible implementations included in step S501 may refer to the relevant descriptions in step S201 in fig. 2, and are not described herein again. Some possible implementations included in step S502 can refer to the related descriptions in step S202 in fig. 2, and are not described herein again.
And S503, calling a parameter prediction model to perform parameter prediction based on the description information to obtain a target parameter of the form controller.
In one embodiment, the parameter prediction model and the information category discrimination model are obtained through adversarial training. The adversarial training of the parameter prediction model and the information category discrimination model is described in detail below and includes the following steps s1-s5:
s 1: acquiring a first training sample set, wherein the first training sample set comprises sample images and sample image label information, the sample images comprise first sample images and second sample images, and the first sample images and the second sample images both comprise target parts of objects; the sample image tag information includes: first description information corresponding to the first training form of the target part in the first sample image, and second description information corresponding to the second training form of the target part in the second sample image; the first training morphology does not match the first expected training morphology and the second training morphology matches the second expected training morphology.
The first sample image and the second sample image may be the same or different, and the first expected training form and the second expected training form may be the same or different, and if the first expected training form and the second expected training form are the same, it indicates that the first sample image and the second sample image are the same, and the second training form is obtained by modifying the first training form. That is, the first description information and the second description information are not required to be mutually corresponding in the embodiment of the present application.
The target part is the same as in the foregoing embodiment of fig. 2. The first sample image corresponds to an image mesh, and the first description information corresponding to the first training form of the target part in the first sample image may be the first position information of the several mesh vertices corresponding to the target part in the image mesh associated with the first sample image. It should be noted that the image mesh associated with the first sample image, the image mesh associated with the second sample image, and the image mesh associated with the image to be processed may be the same. The first expected training form and the second expected training form may be determined in the same manner as the expected form; for example, both may be the form of the target part in an expression made by the actor.
The number of first sample images may be X1, where X1 is an integer greater than or equal to 1. Assume that the first description information is denoted Mcoarse; in the first sample image, the number of mesh vertices corresponding to the target part is N, and each mesh vertex is 3-dimensional, so the dimension of Mcoarse is (X1, N × 3), where N is much smaller than X1. The number of second sample images is X2, where X2 is an integer greater than or equal to 1. Assume that the second description information is denoted Mfine; in the second sample image, the number of mesh vertices corresponding to the target part is also N, and each mesh vertex is also 3-dimensional, so the dimension of Mfine is (X2, N × 3). X1 and X2 may be the same or different. It should be noted that, in the image meshes corresponding to different sample images in the embodiment of the present application, the number of mesh vertices corresponding to the target part may be the same or different, and may be set according to the product in practical applications. For convenience of description, in the following description of the embodiments of the present application, it is assumed that the image meshes corresponding to different images are the same, and that the number of mesh vertices corresponding to the same target part in the image meshes corresponding to different images is also the same, namely N.
In other words, if the number of first sample images is X1, the first description information includes the first description information of each first sample image, and the first description information of each first sample image can be regarded as a set of N mesh vertices. Similarly, if the number of second sample images is X2, the second description information includes the second description information of each second sample image, and the second description information of each second sample image can be regarded as a set of N mesh vertices.
That is, the first description information may include X1 rows, where each row contains the first position information of the N mesh vertices in one first sample image. For example, the first position information of a mesh vertex can be represented by a three-dimensional coordinate (x, y, z), and each row of the first description information may be represented as (x1, y1, z1, x2, y2, z2, ..., xN, yN, zN); that is, each row of the first description information puts together the positions of the N mesh vertices in one first sample image. Similarly, each row of the second description information is the second position information of the N mesh vertices in one second sample image.
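A small sketch (not part of the original disclosure) of how the first description information Mcoarse could be assembled, assuming NumPy and illustrative values of X1 and N:

```python
import numpy as np

X1, N = 100, 40     # number of first sample images and of target-part vertices (assumed)
rng = np.random.default_rng(2)

# Per-sample vertex positions: X1 samples, N vertices, 3 coordinates each.
vertex_positions = rng.random((X1, N, 3))

# First description information Mcoarse: each row lays the N vertex positions of
# one sample end to end, i.e. (x1, y1, z1, x2, y2, z2, ..., xN, yN, zN).
M_coarse = vertex_positions.reshape(X1, N * 3)
print(M_coarse.shape)    # (100, 120), i.e. (X1, N * 3)
```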
s 2: and calling a parameter prediction model to perform parameter prediction based on the first description information to obtain a first training parameter of the form controller. If the number of the first sample images may be X1, the number of the first description information is also X1, and the X1 first description information are input to the parametric prediction model one by one when the parametric prediction model is trained. Specifically, when the parameter prediction model is trained, first description information corresponding to one first sample image is input to the parameter prediction model at a time, and the parameter prediction model outputs one first training parameter. Based on this, the process of the parameter prediction model performing the parameter prediction process based on the first description information can be represented by fig. 6 a. The parameter prediction model may also be referred to as (Meah2Params), the dimension of the first description information input into the parameter prediction model each time is (1, N × 3), the dimension of the first training parameter output by the parameter prediction model is (1, H), and H represents the number of parameters used by the form controller to control the target region.
s 3: and reconstructing the description information based on the first training parameter to obtain reconstructed description information corresponding to the target part. In specific implementation, the first training parameter is used as the input of the position reconstruction model, and the position prediction model is called to reconstruct the description information to obtain the reconstruction description information corresponding to the target part. That is, the position reconstruction model is used to reconstruct a description information according to a parameter of a morphology controller, in other words, the position reconstruction model is used to simulate a process in which the morphology controller controls the description information based on the parameter.
The position reconstruction model may be a pre-trained model obtained by training on a third training sample set. The third training sample set includes a third sample image containing the target part of an object, and further includes correspondences between third description information and second training parameters of the form controller, where the third description information is the description information corresponding to the target part of the third sample image in a third training form. The correspondences between the third description information in the third sample image and the second training parameters of the form controller may include multiple groups. The third training form of the target part in the third sample image may be a form that matches an expected training form, or a form that does not match it. For example, if the expected training form is the form of the target part in an actor's expression, the third training form may be a form automatically generated by the form controller based on the actor's expression; such a form may have an error relative to the expected training form (the form of the target part in the expression made by the actor), so a third training form generated in this way is considered not to match the expected form. Alternatively, the third training form may be a form obtained by manually adjusting the form automatically generated by the form controller based on the actor's expression, in which case the third training form matches the expected training form. The expected training form may be the same as or different from the first expected training form and the second expected training form.
The third sample image may correspond to an image mesh, the target part may correspond to certain mesh vertices in the image mesh, and the third description information when the target part is in the third training form may be the position information of those mesh vertices. The correspondence between the third description information and a second training parameter of the form controller may be determined using the form controller. Specifically, a second training parameter is randomly generated in the form controller, and the form of the target part in the third sample image is modified by this second training parameter; the description information corresponding to the resulting form of the target part is the third description information corresponding to the current second training parameter. This process is repeated several times to obtain the correspondences between multiple groups of third description information and second training parameters.
Furthermore, a group consisting of third description information and a second training parameter is used as a training data pair, and the position reconstruction model is called to reconstruct description information from the second training parameter in the training data pair, so as to obtain training description information. Assuming that correspondences between X3 groups of third description information and second training parameters are obtained, that each mesh vertex is 3-dimensional, and that each piece of third description information corresponds to N mesh vertices, the dimension of the third description information is (X3, N × 3). Referring to fig. 6b, which is a training schematic diagram of the position reconstruction model provided in this embodiment of the present application, the position reconstruction model may also be referred to as Params2Mesh (P2M). The second training parameter in one group of correspondences is input into the P2M model each time; assuming that the number of parameters related to the target part in the form controller is H, the dimension of the second training parameter input into the P2M model each time is (1, H). The P2M model reconstructs description information based on the second training parameter and outputs training description information with dimension (1, N × 3).
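The following is a minimal sketch (not part of the original disclosure) of a Params2Mesh (P2M) model consistent with the dimensions just described, assuming PyTorch; the hidden sizes and the choice of output activation are assumptions.

```python
import torch
import torch.nn as nn

N = 40   # mesh vertices for the target part (assumed)
H = 25   # form controller parameters for the target part (assumed)

# Params2Mesh (P2M): maps a (1, H) controller parameter vector to reconstructed
# description information of shape (1, N*3).
position_reconstruction_model = nn.Sequential(
    nn.Linear(H, 512),   nn.LeakyReLU(0.2),
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, N * 3),
)

second_training_param = torch.randn(1, H)   # one randomly generated parameter set
training_description = position_reconstruction_model(second_training_param)
print(training_description.shape)           # torch.Size([1, 120]), i.e. (1, N * 3)
```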
After the training description information is obtained, the position reconstruction model may be further trained based on the training description information and the third description information. Specifically, a third loss function corresponding to the position reconstruction model is determined based on the training description information and the third description information; and updating the model parameters in the position reconstruction model based on the third loss function by adopting a back propagation algorithm, thereby realizing the training of the position reconstruction model. Optionally, determining the third loss function corresponding to the position reconstruction model based on the training description information and the third description information may refer to performing a mean square error operation on the training description information and the third description information, for example, determining the third loss function corresponding to the position reconstruction model based on the training description information and the third description information may be represented by the following formula (1):
L3 = MeanSquareError(Mpredict, Mground truth) (1)
where MeanSquareError(·) represents the mean square error operation, L3 represents the third loss function, Mpredict represents the training description information, and Mground truth represents the third description information.
The model parameters in the position reconstruction model are updated by calculating partial derivatives of the third loss function on the model parameters in the position reconstruction model. For example, it can be expressed by the following formula (2):
w't+1 = w't - α * ∂L3/∂w't (2)
In formula (2), w't+1 represents the updated network parameters in the position reconstruction model, and w't represents the network parameters in the position reconstruction model before the update. α is the learning rate and generally takes a value of about 10^-3. The model parameters are iteratively optimized until the third loss function no longer changes.
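Putting formulas (1) and (2) together, the following is an illustrative training-loop sketch (not part of the original disclosure), assuming PyTorch, random placeholder training pairs, and plain stochastic gradient descent with learning rate 1e-3 to mirror the update rule of formula (2); the architecture and data are assumptions.

```python
import torch
import torch.nn as nn

N, H, X3 = 40, 25, 200                    # illustrative dimensions
torch.manual_seed(0)

p2m = nn.Sequential(                       # position reconstruction model (P2M)
    nn.Linear(H, 512),   nn.LeakyReLU(0.2),
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, N * 3),
)

# X3 training pairs: randomly generated second training parameters and the third
# description information recorded from the form controller (placeholders here).
second_training_params = torch.randn(X3, H)
third_description_info = torch.randn(X3, N * 3)

mse = nn.MSELoss()
optimizer = torch.optim.SGD(p2m.parameters(), lr=1e-3)   # w' <- w' - alpha * dL3/dw'

for epoch in range(100):
    optimizer.zero_grad()
    training_description = p2m(second_training_params)            # Mpredict
    loss_3 = mse(training_description, third_description_info)    # formula (1)
    loss_3.backward()                                              # partial derivatives of L3
    optimizer.step()                                               # formula (2)
```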
Optionally, the position reconstruction model may also be any deep network structure, as long as it meets the input and output requirements, can be iteratively optimized through training, and ultimately produces correct description information from an input form controller parameter. The position reconstruction model provided by the embodiment of the present application may have the same network structure as the parameter prediction model, as shown in fig. 4. The differences between the position reconstruction model and the parameter prediction model are as follows: the inputs differ, the input of the position reconstruction model being a form controller parameter and the input of the parameter prediction model being description information; the outputs differ, the output of the position reconstruction model being description information and the output of the parameter prediction model being a form controller parameter; and the number of hidden units in each fully connected layer differs.
s 4: and calling an information type distinguishing model to distinguish the information type of the reconstructed description information to obtain a first distinguishing result. The label of the first type of information is used for indicating that the information category to which the second description information belongs is the second type of information. In the embodiment of the present application, the information categories may be classified into a first category of information and a second category of information, the first category of information and the second category of information may be used to reflect an accuracy of the morphology of the target portion compared with the expected morphology, and the first category of information indicates that the morphology of the target portion is the most accurate, or the morphology of the target portion is matched with the expected morphology; the second type of information indicates that the morphology of the target site is least accurate. In colloquial terms, the first type of information may be true information, the second type of information may be false information, the label of the first type of information may be represented by 1, and the label of the second type of information may be represented by 0.
The information category discrimination model may also be called a discriminator model. The information category discrimination model and the parameter prediction model form an adversarial network, in which the two models are obtained through adversarial training. Adversarial training means that when the parameter prediction model is trained, the model parameters of the information category discrimination model are assumed to be unchanged; when the information category discrimination model is trained, the model parameters of the parameter prediction model may be assumed to be unchanged. This ensures that the reconstructed description information obtained from the parameter prediction model after adversarial training can be recognized as first-class information by the information category discrimination model. That is, the objective of the adversarial training is to have the parameter prediction model and the information category discrimination model play a game against each other, so that the parameter prediction model becomes accurate enough that the reconstructed description information it generates is judged to be first-class information by the information category discrimination model, while the information category discrimination model must not become so strong that it would never recognize the reconstructed description information as first-class information instead of second-class information.
s5: and training the parameter prediction model based on the first description information, the reconstruction description information, the first class information label and the first discrimination result. In the specific implementation, the mean square error operation is carried out on the first description information and the reconstruction description information to obtain a first mean square operation result, and the mean square error operation is carried out on the first judgment result and the first class information label to obtain a second mean square operation result; acquiring a reconstruction loss weight and an antagonistic loss weight; multiplying the first mean square operation result and the reconstruction loss weight to obtain a reconstruction loss function, and multiplying the countermeasure loss weight and the second mean square operation result to obtain a countermeasure loss function; adding the reconstruction loss function and the countermeasure loss function to obtain a first loss function of the parameter prediction model; model parameters in the parametric prediction model are updated based on the first penalty function using a back propagation algorithm.
Assume that the first description information is denoted Mcoarse and the reconstructed description information is denoted Mrecons; then the first mean square operation result obtained by performing the mean square error operation on the first description information and the reconstructed description information may be represented as MSE(Mcoarse, Mrecons). Assuming that the first-class information label is represented by 1 and D denotes the information category discrimination model, the first discrimination result output by the information category discrimination model may be represented as D(Mrecons), and the second mean square operation result obtained by performing the mean square error operation on the first discrimination result and the first-class information label may be represented as MSE(D(Mrecons), 1).
Suppose the reconstruction loss weight is denoted wrecons; then the reconstruction loss function can be expressed as wrecons * MSE(Mcoarse, Mrecons). Suppose the adversarial loss weight is denoted wadv; then the adversarial loss function can be expressed as wadv * MSE(D(Mrecons), 1). The purpose of the reconstruction loss function is to make the parameter prediction model retain the correct part of the first description information, and the purpose of the adversarial loss function is to make the parameter prediction model correct the error between the reconstructed description information and the second description information; in other words, the adversarial loss function requires the reconstructed description information to be discriminated as first-class information by the information category discrimination model as far as possible. The reconstruction loss weight and the adversarial loss weight may be set according to the specific service scenario. For example, if the error between the first training form corresponding to the first description information and the first expected training form is small, more of the correct information in the first description information needs to be retained, so the reconstruction loss weight may be set larger and the adversarial loss weight smaller; conversely, if the error between the first training form and the first expected training form is larger, less of the information in the first description information needs to be retained, so the reconstruction loss weight may be set smaller and the adversarial loss weight larger.
The first loss function obtained by adding the reconstruction loss function and the countermeasure loss function can be expressed by the following formula (3):
L1 = w_recons * MSE(M_coarse, M_recons) + w_adv * MSE(D(M_recons), 1)    (3)
After the first loss function is obtained, the model parameters in the parameter prediction model are updated based on the first loss function using a back propagation algorithm. The specific implementation of this part is the same as that of updating the model parameters of the position reconstruction model based on the third loss function by using the back propagation algorithm in connection with formula (1), which can be referred to above and is not described herein again.
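For illustration only, the first loss function of formula (3) and the corresponding parameter update can be sketched as follows. This is a minimal sketch assuming a PyTorch-style implementation; the names m2p, p2m, discriminator, w_recons and w_adv are hypothetical and do not appear in the original disclosure.

```python
import torch
import torch.nn.functional as F

def train_parameter_prediction_step(m2p, p2m, discriminator, optimizer_m2p,
                                     m_coarse, w_recons=1.0, w_adv=0.1):
    """One training step of the parameter prediction model (formula (3)).

    m2p           -- parameter prediction model (description info -> controller params)
    p2m           -- position reconstruction model (controller params -> description info)
    discriminator -- information category discrimination model D
    m_coarse      -- first description information, shape (batch, N * 3)
    """
    optimizer_m2p.zero_grad()

    params = m2p(m_coarse)        # first training parameters of the form controller
    m_recons = p2m(params)        # reconstruction description information

    # Reconstruction loss: keep the correct part of the first description information.
    recons_loss = F.mse_loss(m_recons, m_coarse)

    # Countermeasure loss: push D to judge the reconstruction as first-type information (label 1).
    d_out = discriminator(m_recons)
    adv_loss = F.mse_loss(d_out, torch.ones_like(d_out))

    # First loss function L1 = w_recons * MSE(M_coarse, M_recons) + w_adv * MSE(D(M_recons), 1).
    loss = w_recons * recons_loss + w_adv * adv_loss
    loss.backward()               # back propagation
    optimizer_m2p.step()          # only the parameter prediction model is updated here
    return loss.item()
```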
Optionally, as can be seen from the foregoing, the parameter prediction model and the information category discrimination model undergo countermeasure training; therefore, in addition to training the parameter prediction model, the embodiment of the present application also needs to train the information category discrimination model. The purpose of training the information category discrimination model is to enable it to accurately discriminate whether a piece of description information belongs to the first type of information or the second type of information. For this, the information category discrimination model needs to learn what the first type of information is and what the second type of information is, and the data used for its training is therefore: the reconstruction description information obtained based on the parameter prediction model, whose information category is the second type of information, and the second description information corresponding to the target part of the second sample image in the second training form, whose information category is the first type of information. If training of the information category discrimination model is finished, then after a piece of reconstruction description information is input into it, the model determines that the information category to which the reconstruction description information belongs is the second type of information; similarly, after a piece of second description information is input into it, the model determines that the information category to which the second description information belongs is the first type of information.
The first type of information and the second type of information differ in feature distribution, and the information category discrimination model discriminates the information category to which the input data belongs mainly according to this difference.
The training process of the information category discrimination model can be summarized as follows: calling the information category discrimination model to perform information category discrimination processing on the second description information to obtain a second discrimination result; and training the information category discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result, and the second discrimination result, wherein the label of the second type of information is used for indicating that the information category to which the reconstruction description information belongs is the second type of information.
As can be seen from the foregoing, the data input into the information category discrimination model each time may be one piece of second description information or one piece of reconstruction description information, and the first discrimination result or the second discrimination result output by the model may be a label value. For example, if the label of the first type of information is represented by 1 and the label of the second type of information is represented by 0, the first discrimination result and the second discrimination result output by the model are values between 0 and 1. Referring to fig. 6b, which is a schematic diagram of training the information category discrimination model provided in this embodiment of the present application, assume in fig. 6b that the number of pieces of second description information is X2. As can be seen from the foregoing, each piece of second description information may be represented by N three-dimensional grid vertex coordinates, so the dimension of the second description information may be (N × 3, X2); the reconstruction description information is organized in the same way according to its own number of pieces. One piece of second description information or one piece of reconstruction description information is input into the information category discrimination model each time, so the dimension of the input data of the model is (N × 3, 1). The output of the model indicates the information category to which the input data belongs: if the label of the first type of information is represented by 1 and the label of the second type of information is represented by 0, the discrimination result output by the model is a single value between 0 and 1.
Optionally, training the information category discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result, and the second discrimination result may specifically include: performing a mean square error operation on the label of the second type of information and the first discrimination result to obtain a first operation result; performing a mean square error operation on the label of the first type of information and the second discrimination result to obtain a second operation result; adding the first operation result and the second operation result to obtain a second loss function corresponding to the information category discrimination model; and updating model parameters in the information category discrimination model based on the second loss function using a back propagation algorithm. The manner of updating the model parameters in the information category discrimination model based on the second loss function using the back propagation algorithm may refer to formula (2) above and is not repeated here. The second loss function of the information category discrimination model can be expressed as shown in the following formula (4):
L2 = MSE(D(M_recons), 0) + MSE(D(M_fine), 1)    (4)
In formula (4), L2 represents the second loss function, MSE(·) represents the mean square error operation, M_recons represents the reconstruction description information, 0 represents the label of the second type of information, 1 represents the label of the first type of information, and M_fine represents the second description information. D(M_recons) represents the first discrimination result and D(M_fine) represents the second discrimination result.
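For illustration, the second loss function of formula (4) can be sketched in the same assumed PyTorch style as above; the names discriminator, optimizer_d, m_recons and m_fine are hypothetical.

```python
import torch
import torch.nn.functional as F

def train_discriminator_step(discriminator, optimizer_d, m_recons, m_fine):
    """One training step of the information category discrimination model (formula (4)).

    m_recons -- reconstruction description information (second type of information, label 0)
    m_fine   -- second description information (first type of information, label 1)
    """
    optimizer_d.zero_grad()

    d_recons = discriminator(m_recons.detach())  # first discrimination result D(M_recons)
    d_fine = discriminator(m_fine)               # second discrimination result D(M_fine)

    # L2 = MSE(D(M_recons), 0) + MSE(D(M_fine), 1)
    loss = F.mse_loss(d_recons, torch.zeros_like(d_recons)) + \
           F.mse_loss(d_fine, torch.ones_like(d_fine))
    loss.backward()
    optimizer_d.step()
    return loss.item()
```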
It should be understood that the information category discrimination model and the parameter prediction model are alternately trained, and the training of the information category discrimination model and the training of the parameter prediction model do not specify the training sequence, so that the information category discrimination model can be trained once and then the parameter prediction model can be trained; alternatively, the parameter prediction model may be trained first, and then the information category discrimination model may be trained based on the reconstruction description information and the second description information generated in the parameter prediction model training process.
Step S504, the form controller is called to adjust the description information based on the target parameter, and the adjusted description information is used for reflecting that the form of the target part is a second form.
In an embodiment, some possible implementations included in step S504 can refer to the related description of step S203 in fig. 2, and are not described herein again.
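As a purely illustrative sketch of this inference flow (not part of the original disclosure): given the description information of the first form, a trained parameter prediction model outputs the target parameters of the form controller, and the form controller then produces the adjusted description information. The names m2p and form_controller are hypothetical.

```python
import torch

@torch.no_grad()
def correct_morphology(m2p, form_controller, description_info):
    """description_info: mesh vertex positions of the target part in the first form,
    flattened to shape (1, N * 3)."""
    target_params = m2p(description_info)           # parameter prediction for the form controller
    adjusted_info = form_controller(target_params)  # step S504: controller adjusts the description info
    return adjusted_info                            # reflects the second form of the target part
```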
Based on the descriptions of the above steps s1-s5, the embodiment of the present application provides a training schematic diagram of the countermeasure training between the parameter prediction model and the information category discrimination model, shown in fig. 7. In fig. 7, the parameter prediction model is denoted M2P and the information category discrimination model is the discriminator; the target part is assumed to be a lip. The first description information corresponding to the lip of the first sample image in the first training form does not match the first expected training form and may therefore be referred to as the mesh vertex positions of the rough mouth shape, while the second description information corresponding to the lip of the second sample image in the second training form matches the second expected training form and may therefore be referred to as the mesh vertex positions of the accurate mouth shape.
First, assuming that the model parameters of M2P remain unchanged, the discriminator is trained: the mesh vertex positions of the rough mouth shape are input into the parameter prediction model M2P, which performs parameter prediction based on this input and outputs the first training parameters; the first training parameters are used as the input of the position reconstruction model P2M, which reconstructs description information based on them to obtain the reconstruction description information, also called the mesh vertex positions of the corrected mouth shape; further, the mesh vertex positions of the corrected mouth shape and the mesh vertex positions of the accurate mouth shape are used as training data of the discriminator to train the discriminator. After training of the discriminator is completed, the mesh vertex positions of the rough mouth shape are again input into the parameter prediction model M2P, which performs parameter prediction based on this input and outputs the first training parameters; the first training parameters are used as the input of the position reconstruction model P2M, which performs description information reconstruction based on them to obtain the mesh vertex positions of the corrected mouth shape; the discriminator is called to discriminate the mesh vertex positions of the corrected mouth shape, the first loss function of the M2P model is obtained according to the discrimination result, the mesh vertex positions of the rough mouth shape, the label of the first type of information, and the mesh vertex positions of the corrected mouth shape, and the M2P model is then trained based on the first loss function.
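Putting the two phases together, the alternating countermeasure training in fig. 7 could look roughly like the sketch below. It reuses the two hypothetical step functions defined earlier, assumes rough_batches and fine_batches are iterables of mesh-vertex tensors, and assumes the position reconstruction model P2M has already been trained and is kept frozen.

```python
import torch

def adversarial_training(m2p, p2m, discriminator, rough_batches, fine_batches,
                         epochs=10, lr=1e-4):
    """Alternately train the discriminator and the parameter prediction model M2P."""
    opt_m2p = torch.optim.Adam(m2p.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)

    for _ in range(epochs):
        for m_coarse, m_fine in zip(rough_batches, fine_batches):
            # Phase 1: M2P fixed, train the discriminator on corrected vs. accurate mouth shapes.
            with torch.no_grad():
                m_recons = p2m(m2p(m_coarse))
            train_discriminator_step(discriminator, opt_d, m_recons, m_fine)

            # Phase 2: discriminator fixed, train M2P with the first loss function (formula (3)).
            train_parameter_prediction_step(m2p, p2m, discriminator, opt_m2p, m_coarse)
```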
In the embodiment of the application, the parameter prediction model and the information category discrimination model are subjected to countermeasure training by adopting the first training sample set, so that the parameter prediction model can more accurately correct the description information corresponding to a form which does not conform to the expected form, and the description information conforming to the expected form is obtained. Further, in practical applications, if the first form of the target portion in the image to be processed does not match the expected form, the first description information corresponding to the first form may be modified by calling the parameter prediction model, so that the form corresponding to the modified description information matches the expected form. Therefore, the form of the target part is automatically corrected according to the description information of the first form, the human resources consumed by manual modification are saved, and the correction efficiency is improved to a certain extent.
Based on the foregoing image processing method embodiment, an embodiment of the present application provides an image processing apparatus, and referring to fig. 8, a schematic structural diagram of the image processing apparatus provided in the embodiment of the present application is shown. The image processing apparatus illustrated in fig. 8 may operate as follows:
an acquiring unit 801, configured to acquire an image to be processed, where the image to be processed includes a target portion of an object, and a form of the target portion is a first form; the first form does not match the expected form;
the obtaining unit 801 is further configured to obtain description information corresponding to the first form;
a correcting unit 802, configured to correct the form of the target portion by using the description information corresponding to the first form, and correct the form of the target portion from the first form to a second form, where the second form matches the expected form.
In one embodiment, the target site is bound to a morphology controller; when the form of the target portion is modified based on the description information and the form of the target portion is modified from the first form to the second form, the modifying unit 802 performs the following steps:
calling a parameter prediction model to perform parameter prediction based on the description information to obtain a target parameter of the form controller;
and calling the form controller to adjust the description information based on the target parameter, wherein the adjusted description information is used for reflecting that the form of the target part is a second form.
In one embodiment, the parameter prediction model is obtained by performing countermeasure training with an information category judgment model; the image processing apparatus may further include a processing unit 803;
the obtaining unit 801 is further configured to obtain a first training sample set, where the first training sample set includes a sample image and sample image tag information, the sample image includes a first sample image and a second sample image, both the first sample image and the second sample image include a target portion of an object, and the sample image tag information includes: first description information corresponding to the first sample image when the target part is in a first training form, and second description information corresponding to the second sample image when the target part is in a second training form; the first training morphology does not match a first expected training morphology, the second training morphology matches a second expected training morphology;
the processing unit 803 is configured to invoke the parameter prediction model to perform parameter prediction based on the first description information, so as to obtain a first training parameter of the form controller; reconstructing description information based on the first training parameter to obtain reconstructed description information corresponding to the target part;
the processing unit 803 is further configured to invoke the information type discrimination model to perform information type discrimination on the reconstructed description information to obtain a first discrimination result; and training the parameter prediction model based on the first description information, the reconstruction description information, a label of first-class information and the first discrimination result, wherein the label of the first-class information is used for indicating that the information category to which the second description information belongs is first-class information.
In one embodiment, when the parameter prediction model is trained based on the first description information, the reconstruction description information, the label of the first type of information, and the first determination result, the processing unit 803 performs the following steps:
performing mean square error operation on the first description information and the reconstruction description information to obtain a first mean square operation result, and performing mean square error operation on the first judgment result and the label of the first type of information to obtain a second mean square operation result;
acquiring a reconstruction loss weight and a countermeasure loss weight;
multiplying the first mean square operation result and the reconstruction loss weight to obtain a reconstruction loss function, and multiplying the countermeasure loss weight and the second mean square operation result to obtain a countermeasure loss function;
adding the reconstruction loss function and the countermeasure loss function to obtain a first loss function of the parameter prediction model;
updating model parameters in the parametric prediction model based on the first loss function using a back propagation algorithm.
In one embodiment, the information category corresponding to the reconstruction description information is the second type of information; the processing unit 803 is further configured to invoke the information category discrimination model to perform information category discrimination processing on the second description information, so as to obtain a second discrimination result;
and training the information type discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result and the second discrimination result.
In one embodiment, when the processing unit 803 trains the information category judgment model based on the label of the first type information, the label of the second type information, the first judgment result and the second judgment result, the following steps are performed:
performing mean square error operation on the labels of the second type of information and the first judgment result to obtain a first operation result, and performing mean square error operation on the labels of the first type of information and the second judgment result to obtain a second operation result;
adding the first operation result and the second operation result to obtain a second loss function corresponding to the information type discrimination model;
and updating model parameters in the information category discrimination model based on the second loss function by adopting a back propagation algorithm.
In an embodiment, when the processing unit 803 reconstructs description information based on the first training parameter to obtain reconstructed description information corresponding to the target portion, the processing unit executes the following steps:
and taking the first training parameter as the input of a position reconstruction model, and calling the position reconstruction model to reconstruct the description information to obtain reconstruction description information corresponding to the target part.
In an embodiment, the obtaining unit 801 is further configured to obtain a second training sample set, where the second training sample set includes a third sample image, the third sample image includes a target portion of a subject, the second training sample set further includes a correspondence between third description information and a second training parameter of the form controller, and the third description information refers to description information corresponding to the target portion in a third training form in the third sample image;
the processing unit 803 is further configured to invoke the position reconstruction model to reconstruct description information based on the second training parameter, so as to obtain training description information; training the position reconstruction model based on the training description information and the third description information.
In one embodiment, the processing unit 803, when training the position reconstruction model based on the training description information and the third description information, performs the following steps:
determining a third loss function corresponding to the position reconstruction model based on the training description information and the third description information;
updating model parameters in the position reconstruction model based on the third loss function by adopting a back propagation algorithm.
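As an illustrative sketch only of this training step: the exact form of the third loss function is given by formula (1) earlier in the description, which is not reproduced here; the sketch below assumes, for illustration, a plain mean squared error between the training description information and the third description information, and the names p2m, optimizer_p2m and m_third are hypothetical.

```python
import torch
import torch.nn.functional as F

def train_position_reconstruction_step(p2m, optimizer_p2m, second_training_params, m_third):
    """One training step of the position reconstruction model P2M.

    second_training_params -- second training parameters of the form controller
    m_third                -- third description information (target part in the third training form)
    """
    optimizer_p2m.zero_grad()
    training_info = p2m(second_training_params)      # training description information
    third_loss = F.mse_loss(training_info, m_third)  # third loss function (MSE assumed for illustration)
    third_loss.backward()                            # back propagation
    optimizer_p2m.step()
    return third_loss.item()
```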
In one embodiment, the image to be processed is associated with an image mesh, the image mesh includes M mesh vertices, the target site corresponds to N mesh vertices of the M mesh vertices, M and N are both integers greater than 1 and N is less than or equal to M; the description information corresponding to the first form includes position information of the N mesh vertices.
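As a small illustrative example (assumed, not from the original text): if the target part corresponds to N mesh vertices of the image mesh, its description information can be packed into a single vector of length N × 3 before being fed to the parameter prediction model.

```python
import numpy as np

def pack_description_info(vertex_positions):
    """vertex_positions: array of shape (N, 3) holding the 3D coordinates of the
    N mesh vertices belonging to the target part. Returns a flat (N * 3,) vector."""
    vertex_positions = np.asarray(vertex_positions, dtype=np.float32)
    return vertex_positions.reshape(-1)

# Example: 4 hypothetical vertex positions of a target part.
info = pack_description_info([[0.1, 0.2, 0.3],
                              [0.0, 0.1, 0.2],
                              [0.4, 0.5, 0.6],
                              [0.3, 0.2, 0.1]])
print(info.shape)  # (12,)
```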
According to an embodiment of the present application, the steps involved in the image processing methods shown in fig. 2 and 5 may be performed by units in the image processing apparatus shown in fig. 8. For example, step S201 and step S202 shown in fig. 2 may be performed by the acquisition unit 801 in the image processing apparatus shown in fig. 8, step S203 may be performed by the correction unit 802 in the image processing apparatus shown in fig. 8, and step S501 and step S502 may be performed by the acquisition unit 801 in the image processing apparatus shown in fig. 8; step S503 and step S504 may be performed by the correction unit 802 in the image processing apparatus described in fig. 8.
According to another embodiment of the present application, the units in the image processing apparatus shown in fig. 8 may be respectively or entirely combined into one or several other units to form the image processing apparatus, or some unit(s) may be further split into multiple units with smaller functions to form the image processing apparatus, which may achieve the same operation without affecting the achievement of the technical effects of the embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the image processing apparatus may also include other units, and in practical applications, these functions may also be implemented by being assisted by other units, and may be implemented by cooperation of a plurality of units.
According to another embodiment of the present application, the image processing apparatus as shown in fig. 8 may be constructed by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 2 and fig. 5 on a general-purpose computing device, such as a computer including processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM), thereby implementing the image processing method according to the embodiment of the present application. The computer program may be, for example, recorded on a computer-readable storage medium, and loaded into and executed by the image processing apparatus via the computer-readable storage medium.
In the embodiment of the present application, the target part of the target face in the image to be processed is in the first form. If the first form does not match the expected form, the description information corresponding to the first form and the contour information of the target part in the first form can be acquired; the form of the target part is then corrected from the first form to the second form based on the description information corresponding to the first form and the contour information of the target part, where the second form matches the expected form. In this way, the form of the target part is corrected automatically according to the description information of the first form and the contour information of the target part in the first form, without requiring the first form to be modified manually, so that the form of the target part meets the expected form. This saves the human resources consumed by manual correction and improves the efficiency of form correction.
Based on the above image processing method embodiment and image processing apparatus embodiment, the present application provides an image processing apparatus. Referring to fig. 9, which is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present disclosure, the image processing apparatus described in fig. 9 may include a processor 901, an input interface 902, an output interface 903, and a computer storage medium 904. The processor 901, the input interface 902, the output interface 903, and the computer storage medium 904 may be connected by a bus or other means.
A computer storage medium 904 may be stored in the memory of the image processing apparatus, the computer storage medium 904 being used for storing a computer program, the processor 901 being used for executing the computer program stored by the computer storage medium 904. The processor 901 (or CPU) is a computing core and a control core of the image Processing apparatus, and is adapted to implement one or more computer programs, and specifically to load and execute:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and modifying the form of the target part based on the description information corresponding to the first form, and modifying the form of the target part from the first form to a second form, wherein the second form is matched with the expected form.
In the embodiment of the present application, the target part of the target face in the image to be processed is in the first form. If the first form does not match the expected form, the description information corresponding to the first form and the contour information of the target part in the first form can be acquired; the form of the target part is then corrected from the first form to the second form based on the description information corresponding to the first form and the contour information of the target part, where the second form matches the expected form. In this way, the form of the target part is corrected automatically according to the description information of the first form and the contour information of the target part in the first form, without requiring the first form to be modified manually, so that the form of the target part meets the expected form. This saves the human resources consumed by manual correction and improves the efficiency of form correction.
An embodiment of the present application also provides a computer storage medium (Memory), which is a Memory device of an image processing apparatus and is used to store programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium of the image processing apparatus and, of course, an extended storage medium supported by the image processing apparatus. The computer storage medium provides a storage space that stores an operating system of the image processing apparatus. Also stored in this memory space are one or more computer programs adapted to be loaded and executed by processor 901. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more computer programs stored in the computer storage medium may be loaded and executed by the processor 901 to:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form; acquiring description information corresponding to the first form; and modifying the form of the target part based on the description information corresponding to the first form, and modifying the form of the target part from the first form to a second form, wherein the second form is matched with the expected form.
In one embodiment, the target portion is bound to a form controller, and the processor 901 performs the following steps when modifying the form of the target portion based on the description information to modify the form of the target portion from a first form to a second form:
calling a parameter prediction model to perform parameter prediction based on the description information to obtain a target parameter of the form controller;
and calling the form controller to adjust the description information based on the target parameter, wherein the adjusted description information is used for reflecting that the form of the target part is a second form.
In one embodiment, the parameter prediction model is obtained by performing countermeasure training with the information category judgment model; the processor 901 is further configured to perform:
obtaining a first training sample set, wherein the first training sample set comprises sample images and sample image label information, the sample images comprise a first sample image and a second sample image, the first sample image and the second sample image both comprise a target part of an object, and the sample image label information comprises: first description information corresponding to the first sample image when the target part is in a first training form, and second description information corresponding to the second sample image when the target part is in a second training form; the first training morphology does not match a first expected training morphology, the second training morphology matches a second expected training morphology;
calling the parameter prediction model to perform parameter prediction based on the first description information to obtain a first training parameter of the form controller;
reconstructing description information based on the first training parameter to obtain reconstructed description information corresponding to the target part;
calling the information type distinguishing model to distinguish the information type of the reconstruction description information to obtain a first distinguishing result;
and training the parameter prediction model based on the first description information, the reconstruction description information, a label of first-class information and the first discrimination result, wherein the label of the first-class information is used for indicating that the information category to which the second description information belongs is first-class information.
In one embodiment, when the processor 901 trains the parameter prediction model based on the first description information, the reconstruction description information, the label of the first type information, and the first determination result, the following steps are performed:
performing mean square error operation on the first description information and the reconstruction description information to obtain a first mean square operation result, and performing mean square error operation on the first judgment result and the label of the first type of information to obtain a second mean square operation result;
acquiring a reconstruction loss weight and a countermeasure loss weight;
multiplying the first mean square operation result and the reconstruction loss weight to obtain a reconstruction loss function, and multiplying the countermeasure loss weight and the second mean square operation result to obtain a countermeasure loss function;
adding the reconstruction loss function and the countermeasure loss function to obtain a first loss function of the parameter prediction model;
updating model parameters in the parametric prediction model based on the first loss function using a back propagation algorithm.
In one embodiment, the information category corresponding to the reconstruction description information is the second type information; the processor 901 is further configured to load and execute:
calling the information type distinguishing model to carry out information type distinguishing processing on the second description information to obtain a second distinguishing result;
and training the information type discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result and the second discrimination result.
In an embodiment, when the processor 901 trains the information type discrimination model based on the label of the first type information, the label of the second type information, the first discrimination result, and the second discrimination result, the following steps are specifically performed:
performing mean square error operation on the labels of the second type of information and the first judgment result to obtain a first operation result, and performing mean square error operation on the labels of the first type of information and the second judgment result to obtain a second operation result;
adding the first operation result and the second operation result to obtain a second loss function corresponding to the information type discrimination model;
and updating model parameters in the information category discrimination model based on the second loss function by adopting a back propagation algorithm.
In an embodiment, when the processor 901 reconstructs description information based on the first training parameter to obtain reconstructed description information corresponding to the target portion, the following steps are performed:
and taking the first training parameter as the input of a position reconstruction model, and calling the position reconstruction model to reconstruct the description information to obtain reconstruction description information corresponding to the target part.
In one embodiment, the processor 901 is further configured to perform:
acquiring a second training sample set, wherein the second training sample set comprises a third sample image, the third sample image comprises a target part of an object, the second training sample set further comprises a corresponding relation between third description information and a second training parameter of the form controller, and the third description information refers to description information corresponding to the target part in the third sample image in a third training form;
calling the position reconstruction model to reconstruct description information based on the second training parameters to obtain training description information;
training the position reconstruction model based on the training description information and the third description information.
In one embodiment, the processor 901, when training the position reconstruction model based on the training description information and the third description information, performs the following steps:
determining a third loss function corresponding to the position reconstruction model based on the training description information and the third description information;
updating model parameters in the position reconstruction model based on the third loss function by adopting a back propagation algorithm.
In one embodiment, the image to be processed is associated with an image mesh, the image mesh includes M mesh vertices, the target site corresponds to N mesh vertices of the M mesh vertices, M and N are both integers greater than 1 and N is less than or equal to M; the description information corresponding to the first form includes position information of the N mesh vertices.
In the embodiment of the application, the target part of the object in the image to be processed is in the first form, if the first form does not match with the expected form, the description information corresponding to the first form can be acquired, and further, the form of the target part is automatically corrected based on the description information corresponding to the first form, so that the form of the target part is corrected from the first form to the second form, and the second form is matched with the expected form. Compared with the manual correction of the first form, the form correction is automatically performed according to the description information corresponding to the first form, so that the human resource consumption caused by the manual correction can be saved, and the form correction efficiency is improved.
An embodiment of the present application further provides a computer program product or a computer program, where the computer program product includes a computer program, the computer program is stored in a computer storage medium, and the computer program is loaded by the processor 901 and executes:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and modifying the form of the target part based on the description information corresponding to the first form, and modifying the form of the target part from the first form to a second form, wherein the second form is matched with the expected form.
In the embodiment of the application, the target part of the object in the image to be processed is in the first form, if the first form does not match with the expected form, the description information corresponding to the first form can be acquired, and further, the form of the target part is automatically corrected based on the description information corresponding to the first form, so that the form of the target part is corrected from the first form to the second form, and the second form is matched with the expected form. Compared with the manual correction of the first form, the form correction is automatically performed according to the description information corresponding to the first form, so that the human resource consumption caused by the manual correction can be saved, and the form correction efficiency is improved.

Claims (14)

1. An image processing method, comprising:
acquiring an image to be processed, wherein the image to be processed comprises a target part of an object, and the target part is in a first form; the first form does not match the expected form;
acquiring description information corresponding to the first form;
and modifying the form of the target part based on the description information corresponding to the first form, and modifying the form of the target part from the first form to a second form, wherein the second form is matched with the expected form.
2. The method of claim 1, wherein the target part is bound to a form controller, and wherein modifying the form of the target part from the first form to a second form based on the description information corresponding to the first form comprises:
calling a parameter prediction model to perform parameter prediction based on the description information to obtain a target parameter of the form controller;
and calling the form controller to adjust the description information based on the target parameter, wherein the adjusted description information is used for reflecting that the form of the target part is a second form.
3. The method of claim 2, wherein the parametric prediction model is trained against an information category discriminant model, the method further comprising:
obtaining a first training sample set, wherein the first training sample set includes sample images and sample image label information, the sample images include a first sample image and a second sample image, the first sample image and the second sample image both include a target portion of an object, and the sample image label information includes: first description information corresponding to the first sample image when the target part is in a first training form, and second description information corresponding to the second sample image when the target part is in a second training form; the first training morphology does not match a first expected training morphology, the second training morphology matches a second expected training morphology;
calling the parameter prediction model to perform parameter prediction based on the first description information to obtain a first training parameter of the form controller;
reconstructing description information based on the first training parameter to obtain reconstructed description information corresponding to the target part;
calling the information type distinguishing model to distinguish the information type of the reconstruction description information to obtain a first distinguishing result;
and training the parameter prediction model based on the first description information, the reconstruction description information, a label of first-class information and the first discrimination result, wherein the label of the first-class information is used for indicating that the information category to which the second description information belongs is first-class information.
4. The method of claim 3, wherein training the parametric prediction model based on the first description information, the reconstruction description information, the label of the first type of information, and the first decision comprises:
performing mean square error operation on the first description information and the reconstruction description information to obtain a first mean square operation result, and performing mean square error operation on the first judgment result and the label of the first type of information to obtain a second mean square operation result;
acquiring a reconstruction loss weight and a countermeasure loss weight;
multiplying the first mean square operation result and the reconstruction loss weight to obtain a reconstruction loss function, and multiplying the countermeasure loss weight and the second mean square operation result to obtain a countermeasure loss function;
adding the reconstruction loss function and the countermeasure loss function to obtain a first loss function of the parameter prediction model;
updating model parameters in the parametric prediction model based on the first loss function using a back propagation algorithm.
5. The method of claim 3, wherein the information category to which the reconstruction description information belongs is a second type of information, the method further comprising:
calling the information type distinguishing model to carry out information type distinguishing processing on the second description information to obtain a second distinguishing result;
and training the information type discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result and the second discrimination result.
6. The method of claim 5, wherein training the information category discrimination model based on the label of the first type of information, the label of the second type of information, the first discrimination result, and the second discrimination result comprises:
performing mean square error operation on the labels of the second type of information and the first judgment result to obtain a first operation result, and performing mean square error operation on the labels of the first type of information and the second judgment result to obtain a second operation result;
adding the first operation result and the second operation result to obtain a second loss function corresponding to the information type discrimination model;
and updating model parameters in the information category discrimination model based on the second loss function by adopting a back propagation algorithm.
7. The method of claim 3, wherein the reconstructing description information based on the first training parameter to obtain the reconstructed description information corresponding to the target region comprises:
and taking the first training parameter as the input of a position reconstruction model, and calling the position reconstruction model to reconstruct the description information to obtain reconstruction description information corresponding to the target part.
8. The method of claim 7, wherein the method further comprises:
acquiring a second training sample set, wherein the second training sample set comprises a third sample image, the third sample image comprises a target part of an object, the second training sample set further comprises a corresponding relation between third description information and a second training parameter of the form controller, and the third description information refers to description information corresponding to the target part in the third sample image in a third training form;
calling the position reconstruction model to reconstruct description information based on the second training parameters to obtain training description information;
training the position reconstruction model based on the training description information and the third description information.
9. The method of claim 8, wherein the training the position reconstruction model based on the training description information and the third description information comprises:
determining a third loss function corresponding to the position reconstruction model based on the training description information and the third description information;
updating model parameters in the position reconstruction model based on the third loss function by adopting a back propagation algorithm.
10. The method of any one of claims 1-9, wherein the image to be processed is associated with an image mesh comprising M mesh vertices thereon, the target site corresponds to N mesh vertices of the M mesh vertices, M and N are both integers greater than 1 and N is less than or equal to M; the description information corresponding to the first form includes position information of the N mesh vertices.
11. An image processing apparatus characterized by comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be processed, the image to be processed comprises a target part of an object, and the form of the target part is a first form; the first form does not match the expected form;
the acquiring unit is further configured to acquire description information corresponding to the first form;
and a correction unit configured to correct a form of the target portion based on the description information corresponding to the first form, and correct the form of the target portion from the first form to a second form, the second form matching the expected form.
12. An image processing apparatus characterized by comprising:
a processor adapted to implement one or more computer programs; and the number of the first and second groups,
computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the image processing method according to any of claims 1-10.
13. A computer storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the image processing method according to any one of claims 1 to 10.
14. A computer program product or computer program, characterized in that the computer program product comprises a computer program, the computer program being stored in a computer storage medium, and the computer program in the computer storage medium, when loaded and executed by a processor, implementing the image processing method according to any one of claims 1-10.
CN202210024607.8A 2022-01-10 2022-01-10 Image processing method, image processing apparatus, image processing device, storage medium, and computer program Pending CN114373034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210024607.8A CN114373034A (en) 2022-01-10 2022-01-10 Image processing method, image processing apparatus, image processing device, storage medium, and computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210024607.8A CN114373034A (en) 2022-01-10 2022-01-10 Image processing method, image processing apparatus, image processing device, storage medium, and computer program

Publications (1)

Publication Number Publication Date
CN114373034A true CN114373034A (en) 2022-04-19

Family

ID=81144652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210024607.8A Pending CN114373034A (en) 2022-01-10 2022-01-10 Image processing method, image processing apparatus, image processing device, storage medium, and computer program

Country Status (1)

Country Link
CN (1) CN114373034A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775024A (en) * 2022-12-09 2023-03-10 支付宝(杭州)信息技术有限公司 Virtual image model training method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914806A (en) * 2013-01-09 2014-07-09 三星电子株式会社 Display apparatus and control method for adjusting the eyes of a photographed user
CN110348387A (en) * 2019-07-12 2019-10-18 腾讯科技(深圳)有限公司 A kind of image processing method, device and computer readable storage medium
US20190377979A1 (en) * 2017-08-30 2019-12-12 Tencent Technology (Shenzhen) Company Limited Image description generation method, model training method, device and storage medium
CN111028305A (en) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Expression generation method, device, equipment and storage medium
US20200234480A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for realistic head turns and face animation synthesis on mobile device
CN112529999A (en) * 2020-11-03 2021-03-19 百果园技术(新加坡)有限公司 Parameter estimation model training method, device, equipment and storage medium
WO2021228183A1 (en) * 2020-05-13 2021-11-18 Huawei Technologies Co., Ltd. Facial re-enactment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914806A (en) * 2013-01-09 2014-07-09 三星电子株式会社 Display apparatus and control method for adjusting the eyes of a photographed user
US20190377979A1 (en) * 2017-08-30 2019-12-12 Tencent Technology (Shenzhen) Company Limited Image description generation method, model training method, device and storage medium
US20200234480A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for realistic head turns and face animation synthesis on mobile device
CN110348387A (en) * 2019-07-12 2019-10-18 腾讯科技(深圳)有限公司 A kind of image processing method, device and computer readable storage medium
CN111028305A (en) * 2019-10-18 2020-04-17 平安科技(深圳)有限公司 Expression generation method, device, equipment and storage medium
WO2021228183A1 (en) * 2020-05-13 2021-11-18 Huawei Technologies Co., Ltd. Facial re-enactment
CN112529999A (en) * 2020-11-03 2021-03-19 百果园技术(新加坡)有限公司 Parameter estimation model training method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIMOSTHENIS TSAGKRASOULIS ET AL: "Heritability maps of human face morphology through large-scale automated three-dimensional phenotyping", SCIENTIFIC REPORTS, 19 April 2017 (2017-04-19) *
曹仰杰 等: "生成式对抗网络及其计算机视觉应用研究综述", 中国图象图形学报, no. 10, 16 October 2018 (2018-10-16) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775024A (en) * 2022-12-09 2023-03-10 支付宝(杭州)信息技术有限公司 Virtual image model training method and device
CN115775024B (en) * 2022-12-09 2024-04-16 支付宝(杭州)信息技术有限公司 Virtual image model training method and device

Similar Documents

Publication Publication Date Title
US11741668B2 (en) Template based generation of 3D object meshes from 2D images
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
CN111354079B (en) Three-dimensional face reconstruction network training and virtual face image generation method and device
CN108961369B (en) Method and device for generating 3D animation
US20220207810A1 (en) Single image-based real-time body animation
US11514638B2 (en) 3D asset generation from 2D images
CN111260754B (en) Face image editing method and device and storage medium
WO2020150687A1 (en) Systems and methods for photorealistic real-time portrait animation
CN110555896B (en) Image generation method and device and storage medium
CN112862807B (en) Hair image-based data processing method and device
US20230177755A1 (en) Predicting facial expressions using character motion states
CN117635897B (en) Three-dimensional object posture complement method, device, equipment, storage medium and product
CN115699099A (en) Visual asset development using generation of countermeasure networks
CN114373034A (en) Image processing method, image processing apparatus, image processing device, storage medium, and computer program
US20230368395A1 (en) Image processing method, apparatus and device, storage medium and computer program
CN113408694A (en) Weight demodulation for generative neural networks
CN100474341C (en) Adaptive closed group caricaturing
CN115690276A (en) Video generation method and device of virtual image, computer equipment and storage medium
CN112991152A (en) Image processing method and device, electronic equipment and storage medium
CN113077383A (en) Model training method and model training device
CN113223128A (en) Method and apparatus for generating image
CN112439200B (en) Data processing method, data processing device, storage medium and electronic equipment
CN114972661B (en) Face model construction method, face image generation device and storage medium
Chai et al. Efficient mesh-based face beautifier on mobile devices
US20240221263A1 (en) System and method for training and using an implicit representation network for representing three dimensional objects

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40071575

Country of ref document: HK