CN118799440A - Digital human image generation method, device, equipment and readable storage medium - Google Patents

Digital human image generation method, device, equipment and readable storage medium

Info

Publication number
CN118799440A
Authority
CN
China
Prior art keywords
image
region
hair
digital
pixel
Legal status
Pending
Application number
CN202410873148.XA
Other languages
Chinese (zh)
Inventor
华泽宏
曹逸民
Current Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Application filed by Zhejiang Geely Holding Group Co Ltd and Geely Automobile Research Institute Ningbo Co Ltd
Priority to CN202410873148.XA
Publication of CN118799440A

Landscapes

  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The application provides a digital human image generation method, a device, equipment and a readable storage medium. The method determines, from the hair region of the digital person in an input original image, a region-of-interest image containing the head region to be driven, crops it into a first image, generates a digital human image containing head motion that matches the digital human head motion coefficients, and fuses that image back into the input original image. This solves the image distortion that occurs when the input image contains substantial content beyond the head region, keeps the fusion boundary between the hair region and the background region in the generated digital human image smooth, and makes the head movement of the digital person natural, thereby improving the quality of the digital human image and the naturalness and realism of the digital person.

Description

Digital human image generation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for generating a digital human image.
Background
With the rapid iterative development of digital human technology, the head posture and facial expression of a digital person matched with text or audio input can be simulated through a machine learning model, based on an original image describing the initial expression state of the digital person, so that the digital person can express itself naturally according to the text or audio input. The SADTALKER algorithm, an open-source technology, uses a 3DMM (3D Morphable Model, three-dimensional deformable model) to drive the head and face of the digital person in the original image to deform according to head motion coefficients corresponding to the input audio, thereby generating a matched digital human image.
However, the SADTALKER algorithm has certain limitations in practical applications. When generating the matched digital human image, the algorithm drives all pixels of the input image, so when the input image contains substantial content beyond the head region (such as a whole-body portrait of the digital person), the head and body of the digital person in the generated image show obvious distortion, and the generated digital human image is therefore distorted.
Disclosure of Invention
In view of the above, the present application provides a method, apparatus, device and readable storage medium for generating digital human images.
Specifically, the application is realized by the following technical scheme:
According to a first aspect of an embodiment of the present application, there is provided a digital human image generation method, the method including:
Determining a region of interest in an original image of a digital person; the region of interest includes a first hair region of the digital person;
cutting out a region of interest in the original image to obtain a first image;
Driving the head posture and facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image to generate a second image;
And fusing the second image with the original image according to the position information of the first image on the original image, and generating a digital human image corresponding to the original image.
Optionally, the determining the region of interest in the original image of the digital person includes:
Determining a first hair region of the digital person in an original image of the digital person, and acquiring position information of boundaries of the first hair region in a horizontal direction and a vertical direction;
obtaining the center coordinates of the first hair region according to the position information of the boundary;
And determining the region of interest according to the central coordinate and the region width of the first hair region in the horizontal direction and the vertical direction.
Optionally, the determining the region of interest according to the center coordinates and the region width of the first hair region in the horizontal direction and the vertical direction includes:
determining the horizontal side length of the region of interest according to the region width of the first hair region in the horizontal direction;
Determining the vertical side length of the region of interest according to the region width of the first hair region in the vertical direction;
and determining the region of interest according to the horizontal side length and the vertical side length and the center coordinates.
Optionally, the determining the region of interest according to the horizontal side length and the vertical side length and the center coordinate includes:
Adjusting the horizontal side length and the vertical side length according to the size of the original image in the case that the horizontal side length is greater than the width of the original image or the vertical side length is greater than the height of the original image;
And determining the region of interest according to the adjusted horizontal side length and the adjusted vertical side length.
Optionally, the fusing the second image with the original image according to the position information of the first image on the original image includes:
Constructing a target mask image of the second image according to a second hair region and key points of the face of the digital person on the second image; the target mask map is used for identifying a human head area and a non-human head area;
determining a fusion weight corresponding to each pixel point on the second image according to the pixel value of each pixel point on the target mask image;
And according to the fusion weight, carrying out linear fusion on the second image and the original image.
Optionally, before constructing the target mask map, the method further comprises:
Acquiring a first eye-mouth vector in a first image corresponding to the second image; the first eye-mouth vector is predetermined according to the coordinate information of key points of the mouth and eyes of the digital person;
Detecting face key points of the digital person on the second image, and generating a second eye-mouth vector according to coordinate information of key points of a mouth and eyes in the face key points;
Determining vector transformation parameters according to the first eye-mouth vector and the second eye-mouth vector;
determining a second hair vector of the second image according to the first hair vector of the digital person in the first image and the vector transformation parameter; the first hair vector is predetermined according to the coordinate information of the top and the bottom of the hair of the first hair area on the first image;
And determining a second hair region of the digital person on the second image according to the second hair vector.
Optionally, the method further comprises the step of pre-determining the first hair vector:
For a first hair region on the first image, respectively acquiring coordinates of a left lowest point and a right lowest point of a hair bottom region in the horizontal direction, and taking the average coordinates as the hair bottom coordinates;
in the vertical direction, determining the top coordinates of the hair according to the coordinates of each point in the top area of the hair;
And generating the first hair vector according to the hair bottom coordinate and the hair top coordinate.
Optionally, the constructing a target mask map of the second image according to the second hair region and the face key points of the digital person on the second image includes:
Creating a blank image; the pixel points of the blank image are in one-to-one correspondence with the pixel points of the corresponding positions on the second image;
According to the second hair area and the key points of the human face on the second image, determining a human head area and a non-human head area on the blank image;
And setting pixel values for the pixel points in the head region and the non-head region respectively, and generating the target mask map.
Optionally, after generating the target mask map, the method further comprises:
detecting a region of abrupt pixel-value change in the target mask map;
and smoothing the pixel points in the region of abrupt pixel-value change by Gaussian blur to obtain an updated target mask map.
Optionally, the determining, according to the pixel value of each pixel point on the target mask map, a fusion weight corresponding to each pixel point on the second image includes:
for each pixel point on the target mask graph, carrying out normalization processing on pixel values of the pixel points;
and taking the normalization processed result as a fusion weight corresponding to the pixel point with the same pixel position on the second image.
Optionally, the linearly fusing the second image with the original image according to the fusion weight includes:
acquiring a second pixel value of each pixel point on the second image, and acquiring a first pixel value of the same pixel position on the original image;
And according to the fusion weight corresponding to each pixel point on the second image, linearly superposing the first pixel value and the second pixel value to obtain the digital human image.
Optionally, in the case that the original image of the digital person includes a multi-frame image of the original video, the method further includes a digital person target video generation step including:
combining digital human images corresponding to each original image according to the inter-frame sequence of the multi-frame images in the original video to generate a first video;
and synchronizing the input driving voice data with the first video to generate a digital human target video matched with the driving voice data.
According to a second aspect of an embodiment of the present application, there is provided a digital human image generating apparatus including:
The interest region determining module is used for determining an interest region in the original image of the digital person; the region of interest includes a first hair region of the digital person;
The first image acquisition module is used for cutting out the region of interest in the original image to obtain a first image;
The driving processing module is used for driving the head posture and facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image to generate a second image;
And the digital human image generation module is used for fusing the second image with the original image according to the position information of the first image on the original image to generate a digital human image corresponding to the original image.
Optionally, the region of interest determining module includes:
A boundary position information acquisition module, configured to determine a first hair region of the digital person in an original image of the digital person, and acquire position information of a boundary of the first hair region in a horizontal direction and a vertical direction;
The center coordinate determining module is used for obtaining the center coordinate of the first hair region according to the position information of the boundary;
and the determining module is used for determining the region of interest according to the central coordinate and the region width of the first hair region in the horizontal direction and the vertical direction.
Optionally, the region of interest determining module is specifically configured to:
determining the horizontal side length of the region of interest according to the region width of the first hair region in the horizontal direction;
Determining the vertical side length of the region of interest according to the region width of the first hair region in the vertical direction;
and determining the region of interest according to the horizontal side length and the vertical side length and the center coordinates.
Optionally, in determining the region of interest according to the horizontal side length, the vertical side length and the center coordinates, the region of interest determining module is specifically configured to:
Adjusting the horizontal side length and the vertical side length according to the size of the original image in the case that the horizontal side length is greater than the width of the original image or the vertical side length is greater than the height of the original image;
And determining the region of interest according to the adjusted horizontal side length and the adjusted vertical side length.
Optionally, the digital human image generating module includes:
The target mask image construction module is used for constructing a target mask image of the second image according to the second hair region and the key points of the face of the digital person on the second image; the target mask map is used for identifying a human head area and a non-human head area;
The fusion weight determining module is used for determining the fusion weight corresponding to each pixel point on the second image according to the pixel value of each pixel point on the target mask image;
and the image fusion module is used for carrying out linear fusion on the second image and the original image according to the fusion weight.
Optionally, before constructing the target mask map, the apparatus further includes:
the first eye mouth vector acquisition module is used for acquiring a first eye mouth vector in a first image corresponding to the second image; the first eye mouth vector is predetermined according to the coordinate information of key points of the mouth and eyes of the digital person;
The second eye-mouth vector acquisition module is used for detecting face key points of the digital person on the second image and generating a second eye-mouth vector according to coordinate information of key points of a mouth and eyes in the face key points;
the vector transformation parameter determining module is used for determining vector transformation parameters according to the first eye mouth vector and the second eye mouth vector;
a second hair vector determining module, configured to determine a second hair vector of the second image according to the first hair vector of the digital person in the first image and the vector transformation parameter; the first hair vector is predetermined according to the coordinate information of the top and the bottom of the hair of the first hair area on the first image;
And the second hair area determining module is used for determining a second hair area of the digital person on the second image according to the second hair vector.
Optionally, the apparatus further comprises a first hair vector predetermination module, specifically for:
For a first hair region on the first image, respectively acquiring coordinates of a left lowest point and a right lowest point of a hair bottom region in the horizontal direction, and taking the average coordinates as the hair bottom coordinates;
in the vertical direction, determining the top coordinates of the hair according to the coordinates of each point in the top area of the hair;
And generating the first hair vector according to the hair bottom coordinate and the hair top coordinate.
Optionally, the target mask graph construction module is specifically configured to:
Creating a blank image; the pixel points of the blank image are in one-to-one correspondence with the pixel points of the corresponding positions on the second image;
According to the second hair area and the key points of the human face on the second image, determining a human head area and a non-human head area on the blank image;
And setting pixel values for the pixel points in the head region and the non-head region respectively, and generating the target mask map.
Optionally, after generating the target mask map, the apparatus further includes:
detecting a region of abrupt pixel-value change in the target mask map;
and smoothing the pixel points in the region of abrupt pixel-value change by Gaussian blur to obtain an updated target mask map.
Optionally, the fusion weight determining module is specifically configured to:
for each pixel point on the target mask graph, carrying out normalization processing on pixel values of the pixel points;
and taking the normalization processed result as a fusion weight corresponding to the pixel point with the same pixel position on the second image.
Optionally, the image fusion module is specifically configured to:
acquiring a second pixel value of each pixel point on the second image, and acquiring a first pixel value of the same pixel position on the original image;
And according to the fusion weight corresponding to each pixel point on the second image, linearly superposing the first pixel value and the second pixel value to obtain the digital human image.
Optionally, in the case that the original image of the digital person includes a multi-frame image of the original video, the apparatus further includes:
the first video generation module is used for combining the digital human images corresponding to each original image according to the inter-frame sequence of the multi-frame images in the original video to generate a first video;
And the audio synchronization module is used for synchronizing the input driving voice data with the first video and generating a digital human target video matched with the driving voice data.
According to a third aspect of embodiments of the present application, there is provided an electronic device including: a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the digital human image generation method by calling the computer program.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described digital human image generation method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the technical scheme provided by the application, a region-of-interest image containing the head region to be driven is determined from the hair region of the digital person in the input original image and cropped into a first image; a digital human image containing head motion matching the digital human head motion coefficients is generated and fused back into the input original image. This solves the image distortion that occurs when the input image contains substantial content beyond the head region, keeps the fusion boundary between the head region and the background region in the generated digital human image smooth, and makes the head motion of the digital person natural, thereby improving the quality of the digital human image and the naturalness and realism of the digital person.
Drawings
FIG. 1A is a schematic diagram of digital human image distortion produced by driving in one related art, according to an exemplary embodiment of the present application;
FIG. 1B is a schematic diagram of digital human image distortion produced by driving in another related art, according to an exemplary embodiment of the present application;
FIG. 2A is a flow chart of a digital human image generation method according to an exemplary embodiment of the present application;
FIG. 2B is a schematic representation of a digital person's hair area in an original image, according to an exemplary embodiment of the present application;
FIG. 2C is a schematic diagram of an image fusion scenario according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart illustrating a process for determining a region of interest from an original image in accordance with an exemplary embodiment of the present application;
FIG. 4A is a schematic diagram illustrating a stiff transition at a fusion boundary, according to an exemplary embodiment of the present application;
FIG. 4B is a flowchart illustrating a process for fusing a second image based on a mask map with an original image according to an exemplary embodiment of the present application;
FIG. 4C is a comparison of fusion results under different image fusion modes, according to an exemplary embodiment of the present application;
FIG. 5 is a flowchart illustrating steps for determining hair areas of a digital person on a second image according to an exemplary embodiment of the present application;
FIG. 6 is a flowchart illustrating the steps of predetermining a first hair vector for a digital person on a first image, in accordance with an exemplary embodiment of the present application;
FIG. 7A is a flowchart illustrating a target mask map pre-construction step according to an exemplary embodiment of the present application;
FIG. 7B is a target mask diagram illustrating an example embodiment of the present application for identifying a human head region and a non-human head region;
FIG. 8 is a flowchart illustrating an image fusion step in which a second image is fused with the original image, according to an exemplary embodiment of the present application;
FIG. 9 is a block diagram illustrating a digital human image generation method in accordance with an exemplary embodiment of the present application;
fig. 10 is a schematic structural view of a digital human image generating apparatus according to an exemplary embodiment of the present application;
fig. 11 is a hardware schematic of an electronic device according to an exemplary embodiment of the application.
Detailed Description
With the rapid iterative development of digital human technology, the head posture and facial expression of a digital person matched with text or audio input can be simulated through a machine learning model, based on an original image describing the initial expression state of the digital person, so that the digital person can express itself and move naturally according to the text or audio input. This brings convenience to fields such as live streaming and e-commerce in which digital humans are used for intelligent question-and-answer interaction.
One implementation commonly used for generating a matching digital human image from input audio is the open-source SADTALKER algorithm. For the input audio, the SADTALKER algorithm first uses a mapping network to generate the corresponding t frames of 3DMM head motion coefficients, including head pose coefficients and facial expression coefficients; it then obtains the original image of the digital person and uses the head motion coefficients corresponding to that original image to drive the head and facial deformation of the digital person in the original image, thereby generating the matched digital human image.
The image preprocessing stage of the SADTALKER algorithm includes three processing modes: a cropping mode, a deformation mode and a whole-body mode. In the cropping mode and the deformation mode, the algorithm requires that the input original image be at most a half-body portrait; when the input original image exceeds this limit, the generated matched digital human image shows obvious face and body distortion. As shown in FIG. 1A, the input original image is larger than a half-body portrait, and the output digital human image shows corresponding face and body distortion. The original image area accepted in the deformation mode is larger than that accepted in the cropping mode, so in the deformation mode, when the motion amplitude of the digital person's head is too large, distortion of the face and neck easily occurs; in the distortion shown in FIG. 1B, the neck and facial contour of the digital person are obviously distorted compared with the original image. In contrast, the whole-body mode supports inputting a whole-body portrait as the original image; however, due to algorithm limitations, the digital human image output in this mode uses a static head model, so only the mouth shape changes correctly while the head posture cannot change, and the generated digital person therefore lacks natural and realistic head movement.
It can be seen that the SADTALKER algorithm has corresponding limitations in practical applications and cannot generate, in a decoupled manner, digital human images containing both head movement and facial movement. When the input image contains substantial content beyond the head region (such as a whole-body portrait of the digital person) and the head and face of the digital person in the original image need to be driven simultaneously, the head and body of the digital person in the generated digital human image become obviously distorted, so that the generated digital human image is distorted and the visual effect and interactive experience of the user are affected.
It will be appreciated that the foregoing description of the SADTALKER algorithm and technical problems has been provided merely for the purpose of facilitating an understanding of the spirit and principles of the application, and that the application is not limited in this respect in any way. Rather, the present application may be adapted and applied to any image processing procedure having the limitations described above, including but not limited to SADTALKER algorithms.
In view of the above, the present application provides a digital human image generation method. The method determines, from the hair region of the digital person in the input original image, a region-of-interest image containing the head region to be driven, crops it into a first image, generates a digital human image containing head motion matching the digital human head motion coefficients, and fuses that image back into the input original image. This solves the image distortion that occurs when the input image contains substantial content beyond the head region, keeps the fusion boundary between the hair region and the background region in the generated digital human image smooth, and makes the head movement of the digital person natural, thereby improving the quality of the digital human image and the naturalness and realism of the digital person.
The digital human image generation method can be applied to various scenes involving driven generation of digital human images, such as virtual digital human teaching and virtual live streaming, and can also be applied to any other applicable scene in which digital human images are generated by algorithms including but not limited to the SADTALKER algorithm; the application is not limited in this respect. The digital persons involved in the embodiments of the application include two-dimensional digital persons, and may include, but are not limited to, realistic virtual digital persons, cartoon virtual digital persons and the like.
The digital human image generating method provided by the application is described with reference to fig. 2A-11, and the method can be implemented by a terminal device or a server, wherein the terminal device can include, but is not limited to, a smart device such as a mobile phone, a wireless handheld device, a tablet computer or a personal computer, and the server can include, but is not limited to, an edge server, a cloud server, a multi-server cluster, and the like.
Referring to the step flow chart of the digital human image generation method shown in fig. 2A, the method may comprise at least the steps of:
S201, determining a region of interest in an original image of a digital person; the region of interest includes a first hair region of the digital person;
The original image of the digital person provides the initial state of the head posture and facial expression of the digital person. It may be a static photo of the digital person, or each frame of a pre-recorded or pre-generated original video containing the image, actions and expressions of the digital person, each frame of which includes the digital person.
The first hair region refers to a set of pixels representing the hair of a digital person in an original image, which are typically visually connected to the face and neck of the digital person and cover around the head to form a specific pixel region, which is distinguished from other elements in the image such as skin, clothing, background, etc. The first hair region may be obtained by detecting and recognizing the original image by using an open-source detection algorithm, such as Canny edge detection, or by applying a machine learning or deep learning model, or the like. It will be appreciated by those skilled in the art that the "first" of the first hair region is merely intended to distinguish it from the same type of information as the hair regions of the digital person on other images in the present application, and is not intended to have a specific meaning.
As shown in the schematic diagram of the first hair region on the original image in fig. 2B, the first hair region of the digital person of the original image is precisely located by setting the non-hair region to black and the hair region to white.
The region of interest refers to an image region to be further processed on the original image, and in this embodiment, the region of interest includes at least a first hair region of the digital person on the original image, that is, a range of the region of interest is greater than or equal to a range of the first hair region of the digital person.
Based on the above, after the first hair region of the digital person in the original image is determined, the region of interest can be determined by means of scaling up or combining the position information of the key points of the face of the digital person on the original image according to the pixel width and the height of the first hair region, so that the region of interest contains the head region of the digital person and excessive unnecessary background features such as image background or digital person body parts are avoided.
S202, cutting out an interested region in the original image to obtain a first image;
That is, for the region of interest on the original image, the region image corresponding to the region of interest is cropped from the original image as the first image, based on the position of the region of interest on the original image. In an implementation, this may be achieved by a cropping function of an image processing library or framework, such as OpenCV (Open Source Computer Vision Library) or PIL (Python Imaging Library), to ensure that the boundaries of the region of interest are clear and complete.
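As a rough illustration (not part of the claimed method), the cropping can be sketched in Python with OpenCV-style arrays as follows; the (x, y, w, h) tuple representing the region of interest is an assumed representation.

```python
import cv2  # only needed for reading the image in the usage note below

def crop_roi(original_image, roi):
    """Crop the region of interest from the original image (minimal sketch).

    original_image: H x W x 3 array, e.g. as returned by cv2.imread().
    roi: (x, y, w, h) in pixel coordinates -- an assumed representation of
         the region of interest determined in step S201.
    Returns the first image and the ROI position kept for the later fusion.
    """
    x, y, w, h = roi
    first_image = original_image[y:y + h, x:x + w].copy()
    return first_image, (x, y, w, h)

# Usage (file name is a placeholder):
# original = cv2.imread("digital_person.png")
# first_image, roi_pos = crop_roi(original, (120, 40, 512, 512))
```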
S203, driving the head gesture and the facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image to generate a second image;
The digital human head motion coefficients include a digital human head pose coefficient and a facial expression coefficient for describing a digital human head motion state at a specific time point and under voice content in input driving voice data. The driving speech data is used to store content that the digital person is to speak, and may include text-to-speech audio files or user recorded audio.
In the SADTALKER algorithm processing flow, the input driving voice data is aligned with at least one frame of the pre-acquired original image of the digital person, so that each frame of original image corresponds to the voice content at a specific time point in the driving voice data. Since the first image is extracted from the original image, the digital person head motion coefficient of the first image can be generated by the mapping network for the specific time point and voice content in the driving voice data corresponding to that original image.
The second image is generated from the initial head state of the digital person provided by the first image: the head posture of the digital person in the first image is driven to change according to the head pose coefficient, and the facial expression and mouth shape of the digital person in the first image are driven according to the facial expression coefficient, so that an image matching the digital person head motion coefficients is generated. Driving the head posture and facial expression of the digital person in the first image to change can be implemented by the SADTALKER algorithm or other machine learning models. The second image corresponds to a specific time point and voice content in the driving voice data, and the head state and facial expression of the digital person presented in it accurately match that voice content at that time point.
S204, fusing the second image with the original image according to the position information of the first image on the original image, and generating a digital human image corresponding to the original image.
Fusing the second image with the original image means fusing the second image with the portion of the original image occupying the same area, so as to update the pixels in that area; the same area is the area of the original image matching the pixel area of the second image, i.e. the image area corresponding to the first image in the original image. In this embodiment, the generated digital human image differs from the original image in that the first image area on the digital human image, corresponding to the position information of the first image, retains the pixel information of the second image, while the other image areas outside the first image area retain the pixel information of the original image.
As shown in the image fusion illustration of FIG. 2C, the left image is the original image, and the rectangle ABCD in the image is the first image determined based on the region of interest on the original image. A second image is generated from the first image using the digital person head motion coefficients. When the second image is fused with the original image, it is aligned pixel by pixel with the rectangle ABCD on the original image and fused with that pixel area, generating a digital human image that matches the digital person head motion coefficients. The digital human image contains a rectangle A'B'C'D' corresponding to the original image and differs from the original image in that the pixels of the ABCD area have been updated by the image fusion.
The image fusion of the second image and the original image can be realized by Poisson fusion, or by directly replacing the first image on the original image with the second image combined with edge smoothing, or by the mask-map-based image fusion mode provided by the application; the specific implementation is described in the following embodiments.
In the embodiment of the disclosure, the first image containing the head region is extracted from the input original image, the head and face of the digital person in the first image are driven, and the processed second image is fused back into the original image, thereby generating a digital human image matching the digital person head motion coefficients. This avoids the distortion of the digital person that occurs when the input original image is used to drive the head and face of the digital person simultaneously, generates a more real and natural digital human image, and improves the visual effect and interactive experience of the user. Moreover, by extracting the first image containing the head region from the input original image, the digital human image driving process no longer depends on a specific type of input image, which improves processing flexibility and efficiency.
In some embodiments, for the region of interest described in step S201 above, based on the region of interest including the first hair region of the digital person in the original image, for step S201, referring to the step flowchart shown in fig. 3, the region of interest may be determined by:
S301, determining a first hair region of the digital person in an original image of the digital person, and acquiring position information of a boundary of the first hair region in a horizontal direction and a vertical direction;
for the first hair region, the first hair region may be obtained by detecting and recognizing an original image by using an open-source detection algorithm, such as Canny edge detection, or by applying a machine learning or deep learning model, etc., which is not limited in the present application.
The horizontal direction refers to a direction along the left side to the right side of the face of the digital person on the original image, and can correspond to an X-axis of a two-dimensional image coordinate system, and the vertical direction refers to a direction from the chin of the digital person to the top of the head, and corresponds to a Y-axis of the two-dimensional image coordinate system.
The positional information of the boundary in the horizontal direction refers to the leftmost boundary (X-coordinate minimum value) and the rightmost boundary (X-coordinate maximum value) of the first hair region in the horizontal direction, defining the range of the first hair region in the horizontal direction. Similarly, the position information of the boundary in the vertical direction is the lowest point (Y coordinate minimum) and the highest point (Y coordinate maximum) of the first hair region in the vertical direction, and defines the range of the first hair region in the vertical direction. It will be appreciated that the "maximum" and "minimum" are defined in terms of coordinate axis directions.
S302, obtaining the center coordinates of the first hair region according to the position information of the boundary;
According to the position information of the boundaries in the horizontal direction and the vertical direction, the span of the hair region in each direction can be determined: the horizontal span ΔX = X_max − X_min, and the vertical span ΔY = Y_max − Y_min.
According to the horizontal span ΔX and the vertical span ΔY, combined with the position information of the boundary, the center coordinates of the first hair region can be determined. For example, taking the minimum values X_min and Y_min as reference, the center coordinates may be expressed as (X_min + ΔX/2, Y_min + ΔY/2).
S303, determining the region of interest according to the central coordinate and the region width of the first hair region in the horizontal direction and the vertical direction.
The widths of the first hair region in the horizontal and vertical directions are its span values ΔX and ΔY. When the region of interest is determined according to the center coordinates, the center coordinates can be taken directly as the geometric center of the region of interest, set multiples of the span values ΔX and ΔY are taken as the side lengths of the region of interest, and the corresponding region of interest is obtained by combining these with the geometric center.
Since the hair region generally extends from the top of the head down to around the neck and the face lies below the hair near its center, a region of interest centered on the geometric center of the first hair region can contain the whole head region of the digital person, i.e., both the hair region and the face region, so that the head posture and facial expression of the digital person can subsequently be driven to change.
After the center coordinates of the first hair region are obtained, face key points of the original image can further be detected and the center coordinates adjusted based on them, so that the center coordinates represent the geometric center of the whole head region of the digital person on the original image; a region of interest with a more accurate range is then obtained from the adjusted center coordinates combined with the region widths of the first hair region in the horizontal and vertical directions.
When the region of interest is determined according to the center coordinates and the region widths of the first hair region in the horizontal and vertical directions, the horizontal side length of the region of interest can be determined from the region width of the first hair region in the horizontal direction, i.e. ΔX, and the vertical side length from the region width in the vertical direction, i.e. ΔY; the center coordinates are then taken as the center of the region of interest, and the rectangular size of the region of interest is defined by the horizontal and vertical side lengths, thereby determining the region of interest. When determining the horizontal and vertical side lengths, set multiples of ΔX and ΔY larger than 1 may be taken as the side lengths, and the multiples for ΔX and ΔY may be the same or different. For example, 1.2 times ΔX and ΔY may be taken as the side lengths of the region of interest, or 1.2 times ΔX and 1.4 times ΔY may be taken as the side lengths.
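As a rough illustration of steps S301 to S303 (a sketch under stated assumptions, not the claimed implementation), the region of interest can be derived from a binary hair mask as follows; the scale factors 1.2 and 1.4 correspond to the example multiples above and are illustrative only.

```python
import numpy as np

def region_of_interest_from_hair_mask(hair_mask, scale_x=1.2, scale_y=1.4):
    """Derive the region of interest from a binary hair mask (sketch).

    hair_mask: H x W array, non-zero where the first hair region was detected
               (e.g. the white pixels of FIG. 2B).
    scale_x / scale_y: multiples (> 1) applied to the hair-region widths;
                       1.2 and 1.4 are illustrative values only.
    Returns (x, y, w, h) of the region of interest in pixel coordinates.
    """
    ys, xs = np.nonzero(hair_mask)
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()

    dx, dy = x_max - x_min, y_max - y_min          # region widths dX, dY
    cx, cy = x_min + dx / 2.0, y_min + dy / 2.0    # center coordinates

    w, h = scale_x * dx, scale_y * dy              # ROI side lengths
    x, y = int(round(cx - w / 2.0)), int(round(cy - h / 2.0))
    return x, y, int(round(w)), int(round(h))
```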
In the embodiment of the disclosure, the boundary position information of the hair region in the horizontal and vertical directions of the original image is identified, the center coordinates of the hair region are calculated from it, and a region of interest containing the hair region is determined based on the center coordinates and the widths of the hair region.
In some embodiments, when the region of interest is determined according to the center coordinates and the region widths of the first hair region in the horizontal and vertical directions, the determined region of interest may exceed the image boundary of the original image. In this case, after the horizontal and vertical side lengths of the region of interest are determined from the region widths of the first hair region, if the horizontal side length is greater than the width of the original image or the vertical side length is greater than the height of the original image, the horizontal and vertical side lengths may be adjusted according to the size of the original image, and the region of interest determined from the adjusted side lengths.
That is, a side length that exceeds the boundary of the original image may be reduced to fit within the original image size, and the other side length may be reduced by the same scale in order to maintain the original shape and proportions of the region of interest.
In the embodiment of the disclosure, for the problem of the region of interest exceeding the boundary of the original image, the horizontal and vertical side lengths of the region of interest are adjusted according to the size of the original image so that the region of interest lies entirely within the original image. This avoids data loss and processing errors caused by accessing non-existent pixel data, while maintaining the integrity of the image and preventing distortion or stretching caused by improper cropping.
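A rough sketch of this side-length adjustment follows; shrinking both sides by the same factor preserves the aspect ratio, and shifting the rectangle back inside the image bounds is an added assumption not spelled out above.

```python
def clamp_roi_to_image(x, y, w, h, img_w, img_h):
    """Shrink an over-sized region of interest to fit the original image (sketch)."""
    # Shrink both side lengths by the same factor so the ROI keeps its shape.
    scale = min(1.0, img_w / float(w), img_h / float(h))
    w, h = int(w * scale), int(h * scale)
    # Shift the rectangle so it lies entirely inside the image (assumption).
    x = min(max(x, 0), img_w - w)
    y = min(max(y, 0), img_h - h)
    return x, y, w, h
```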
In some embodiments, regarding the fusion of the second image with the original image according to the position information of the first image on the original image in step S204: the Poisson fusion algorithm has high complexity and a large amount of computation. If the first image on the original image is directly replaced by the second image, then, because the second image is regenerated by a machine learning model on the basis of the original image, it cannot completely reproduce the complex details of the digital person in the original image in terms of color, pose, illumination, texture and the like; the background color of the second image may be inconsistent with that of the original image, and direct replacement may produce a stiff, clearly visible boundary and a split appearance. For example, referring to the schematic diagram of a stiff fusion boundary shown in FIG. 4A, when the second image is fused into the original image, the color difference at the fusion boundary indicated by the arrow causes a clear boundary (contrast enhanced for display).
Therefore, in order to achieve a better image fusion effect and reduce the amount of computation in the fusion process, this embodiment provides a mask-map-based image fusion method. Referring to the flowchart shown in FIG. 4B, step S204 can achieve image fusion through the following steps:
S401, constructing a target mask diagram of the second image according to a second hair region and key points of the face of the digital person on the second image; the target mask map is used for identifying a human head area and a non-human head area;
The second hair region of the digital person refers to the hair region displayed by the digital person on the second image after the head posture and facial expression have been driven to change. It can be detected and determined by an open-source hair detection algorithm or other machine learning model, or it can be determined quickly by the vector transformation approach provided in another embodiment of the application, which is described in detail below.
The face key points refer to a series of predefined points used for describing the characteristics of the digital face in the second image, are used for positioning different characteristic areas such as the outline, the eyebrows, the eyes, the nose, the mouth and the like of the face, and can be obtained by detecting the face key points of the second image through a machine learning or deep learning algorithm.
The mask map is used for distinguishing and processing different regions in an image, so as to specify the regions to be retained or attended to during image processing. In this embodiment, the target mask map is constructed according to the second hair region and the face key points of the digital person on the second image; based on the hair region and the face key points on the second image, the head region and the non-head region can be accurately identified, where the head region includes the hair region and the face region, so that the head region information on the second image is retained during the subsequent image fusion while the non-head region information on the original image is retained.
On this basis, different region pixel values are set for the head region and the non-head region in the target mask map. The region pixel values represent the contribution of the pixel points in the same region of the second image during image fusion; in order to correctly preserve the head region information of the second image during fusion, the pixel value of the head region is set significantly higher than that of the non-head region. The pixel values of the target mask map can be set using a grayscale mask map, a three-channel mask map, or the like, so that the head region can be correctly distinguished and the weights reasonably assigned during fusion.
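As a rough illustration of step S401 (assumptions: the second hair region is available as a binary mask, the head region is approximated by that mask plus the convex hull of the face key points, and the Gaussian-blur step reflects the mask smoothing described in the summary; the kernel size is illustrative):

```python
import cv2
import numpy as np

def build_target_mask(image_shape, hair_mask, face_keypoints, blur_ksize=31):
    """Construct a target mask map for the second image (minimal sketch).

    image_shape: (H, W) of the second image.
    hair_mask: H x W binary array marking the second hair region.
    face_keypoints: N x 2 array of (x, y) face key points on the second image.
    blur_ksize: odd Gaussian kernel size used to soften the head / non-head
                boundary; 31 is an illustrative value.
    Returns a single-channel mask: 255 in the head region, 0 elsewhere,
    with a blurred transition band.
    """
    mask = np.zeros(image_shape[:2], dtype=np.uint8)          # blank image

    # Head region = hair region plus the convex hull of the face key points.
    mask[hair_mask > 0] = 255
    hull = cv2.convexHull(np.asarray(face_keypoints, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)

    # Smooth abrupt pixel-value changes at the head boundary.
    mask = cv2.GaussianBlur(mask, (blur_ksize, blur_ksize), 0)
    return mask
```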
S402, determining a fusion weight corresponding to each pixel point on the second image according to the pixel value of each pixel point on the target mask image;
Since the pixel values of the pixel points on the target mask map represent the contribution of the corresponding region of the second image to the image fusion, the pixel value of each pixel point in the target mask map can be used as the basis: by mapping these pixel values to the interval [0, 1], a fusion weight is assigned to each pixel point on the second image in turn, where the fusion weight represents the contribution of the second image at that pixel position to the fused image.
In the process of distributing the fusion weight to each pixel point on the second image, if the image size and the resolution of the constructed target mask image are the same as those of the second image, for each pixel point on the target mask image, a unique corresponding pixel point can be found at the same pixel position on the second image. In this case, according to the pixel value of the pixel point on the target mask map, the fusion weight corresponding to the pixel point at the same pixel position on the second image can be determined.
If the image size and resolution of the constructed target mask map differ from those of the second image, so that the pixel points of the target mask map and the second image are not in one-to-one correspondence, the target mask map needs to be adjusted: its size and resolution can be adjusted to match the second image by bilinear interpolation, generating an updated target mask map, and the fusion weight corresponding to each pixel point of the second image is then determined from the updated target mask map.
Bilinear interpolation is a method for calculating interpolated pixel values in two-dimensional space. It calculates the interpolated pixel value of a target pixel point by considering the pixel values of the four nearest neighboring pixel points around it and the horizontal and vertical distances between them and the target pixel point, so that a smooth scaled image can be generated and the jagged edges that nearest-neighbor interpolation may cause are avoided.
For example, the relative position of the target pixel point with respect to its four nearest neighbor pixel points may be expressed as (u, v). Assuming the coordinates of the four nearest neighbor pixel points are (i, j), (i+1, j), (i, j+1) and (i+1, j+1), the coordinates of the target pixel point can be represented as (i+u, j+v), and the pixel value f obtained by bilinear interpolation may be represented as:
f(i+u, j+v) = (1−u)(1−v)·f(i, j) + (1−u)v·f(i, j+1) + u(1−v)·f(i+1, j) + uv·f(i+1, j+1)
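A short sketch of this size adjustment using OpenCV's bilinear interpolation (cv2.INTER_LINEAR), assuming a single-channel mask:

```python
import cv2

def match_mask_to_image(mask, second_image):
    """Resize the target mask map to the second image with bilinear interpolation (sketch)."""
    h, w = second_image.shape[:2]
    if mask.shape[:2] != (h, w):
        mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_LINEAR)
    return mask
```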
S403, according to the fusion weight, carrying out linear fusion on the second image and the original image to generate a digital human image corresponding to the original image.
Since a fusion weight has been set for each pixel point on the second image, each pixel point can be processed independently during image fusion. Specifically, each pixel point on the second image may be traversed and, according to its fusion weight, its pixel value linearly superposed with the pixel value of the pixel point at the same position on the original image, to obtain the pixel value of that pixel point in the fused image.
The linear superposition is essentially a weighted calculation, that is, the pixel values of the pixel points at the same position on the second image and the original image are multiplied by respective fusion weights, and then the two results are added to obtain a new pixel value of the fused pixel point. If the weight of a certain pixel point on the second image is high, the fused pixel value will be closer to the pixel value of the certain pixel point on the second image, otherwise, the fused pixel value will be closer to the pixel value of the certain pixel point on the original image.
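As a rough illustration of steps S402 and S403 (a sketch, assuming the mask is already aligned with the second image and that the patch of the original image at the first image's position has been extracted):

```python
import numpy as np

def fuse_images(original_patch, second_image, mask):
    """Linearly fuse the second image with the matching patch of the original image (sketch).

    original_patch: the area of the original image covered by the first image,
                    same size as second_image.
    mask: single-channel target mask map (values 0-255) aligned with second_image.
    """
    w = mask.astype(np.float32) / 255.0        # normalize to [0, 1] -> fusion weights
    w = w[..., None]                           # broadcast over the color channels
    fused = (w * second_image.astype(np.float32)
             + (1.0 - w) * original_patch.astype(np.float32))
    return fused.astype(np.uint8)

# The fused patch is then written back at the first image's position, e.g.:
# original[y:y + h, x:x + w] = fuse_images(original[y:y + h, x:x + w], second, mask)
```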
By fusing each generated second image with the corresponding original image in this way, the head motion state of the digital person contained in the second image can be accurately fused into the original image while the background and other information of the original image are preserved, ensuring a smooth and natural transition in the fused image, so that the driven head motion state of the digital person blends seamlessly with the original image and presents a vivid visual effect. For example, in the comparison of fusion results shown in FIG. 4C, the left image is a digital human image fused in a conventional way (contrast enhanced for display) and the right image is a digital human image fused in the way provided by this embodiment; compared with the obvious fusion boundary traces and color differences in the left image, the transition at the fusion boundary of the right image is smooth.
In the embodiment of the disclosure, the human head area and the non-human head area are identified by constructing the target mask image, the contribution degree of the pixel values of the target mask image to the image fusion is set, so that the fusion weight of each pixel point on the second image is determined according to the pixel values of the mask image during the image fusion, the accurate fusion of the second image containing the digital human head motion state and the original image is realized by the linear superposition of the pixel values, the problem of inconsistent background colors of the fused image is effectively solved, the color gradient effect is formed at the boundary part, the color transition of the image boundary is eliminated, and the visual effect of more naturalness and reality is presented.
In some embodiments, the target mask map according to the step S401 is constructed according to the second hair area of the digital person and the key points of the face on the second image, and in order to reduce the calculation amount of the mask map construction process, the second hair area may be determined by vector transformation without being acquired by a hair detection algorithm.
Based on this, before the foregoing step S401, the method may further include a step of determining the second hair region of the digital person on the second image. Referring to the flowchart of determining the hair region of the digital person on the second image shown in fig. 5, this step may include:
S501, acquiring a first eye-mouth vector in the first image corresponding to the second image; the first eye-mouth vector is predetermined according to the coordinate information of key points of the mouth and eyes of the digital person;
The first eye-mouth vector is used for describing the relative positional relationship between the eyes and the mouth of the digital person in the first image, and can be obtained by subtracting the average coordinates of the eye region from the average coordinates of the mouth region of the digital person.
After the first image is obtained, the face key points of the first image can be detected to obtain their position information, the average coordinates of the mouth region and the average coordinates of the eye region are respectively determined from this position information, and the average coordinates of the eye region are then subtracted from the average coordinates of the mouth region; the result is determined as the first eye-mouth vector. The face key point detection of the digital person on the first image and the calculation of the first eye-mouth vector may be performed in advance in a pre-processing stage and the result stored, the pre-stored first eye-mouth vector being directly recalled when determining the second hair region.
S502, detecting face key points of the digital person on the second image, and generating a second eye-mouth vector according to coordinate information of key points of a mouth and eyes in the face key points;
The second eye-mouth vector is used for describing the relative positional relationship between the eyes and the mouth after the head state of the digital person has been driven and transformed in the second image. After the driving processing is performed to generate the second image, the face key points of the digital person in the second image can be identified by an existing face key point detection method, and the second eye-mouth vector is determined by the same calculation as the first eye-mouth vector; the coordinate information of the face key points on the second image and that on the first image use the same coordinate system and the same coordinate determination manner.
That is, according to the coordinate information of the key points of the mouth and the eye region of the digital person on the second image, the average coordinate of the mouth region and the average coordinate of the eye region of the digital person on the second image are respectively determined, and the difference between the average coordinate of the mouth region and the average coordinate of the eye region is determined as the second eye-mouth vector.
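A minimal sketch of this calculation, assuming the mouth and eye key points have already been detected and grouped; names are illustrative:

```python
import numpy as np

def eye_mouth_vector(mouth_keypoints: np.ndarray, eye_keypoints: np.ndarray) -> np.ndarray:
    """mouth_keypoints / eye_keypoints: N x 2 arrays of detected key-point coordinates.
    Returns the average mouth coordinate minus the average eye coordinate."""
    return mouth_keypoints.mean(axis=0) - eye_keypoints.mean(axis=0)

# The same routine yields the first eye-mouth vector (on the first image)
# and the second eye-mouth vector (on the second image).
```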
S503, determining vector transformation parameters according to the first eye-mouth vector and the second eye-mouth vector;
When the facial expression of the digital person is driven to change according to the digital person's head motion coefficients, the various parts of the face (including the eyes, mouth, etc.) may undergo a certain displacement or rotation relative to their original positions, and such changes can be described by vector transformation parameters. That is, the second eye-mouth vector B' on the second image may be expressed as the product of the first eye-mouth vector B on the first image and an eye-mouth vector transformation matrix C, i.e. B' = CB; the vector transformation parameter is then the eye-mouth vector transformation matrix C = B'B⁻¹.
S504, determining a second hair vector of the second image according to the first hair vector of the digital person in the first image and the vector transformation parameters; the first hair vector is predetermined according to the coordinate information of the top and the bottom of the hair of the first hair area on the first image;
The hair vector is used to describe the position and orientation of the digital person's hair in an image; it is determined based on key feature points or regions of the hair region so as to represent the overall shape, orientation, or positional change of the hair. In this embodiment, the hair vector may be defined as the hair bottom coordinates minus the hair top coordinates, where the hair bottom coordinates are the average of the coordinates of the left and right lowest points of the hair region, and the hair top coordinates are taken at the point of the head top with the smallest y coordinate in the vertical direction.
The first hair vector is used for describing the position and orientation of the first hair region of the digital person in the first image, and the second hair vector is used for describing the position and orientation of the second hair region after the head state of the digital person has been driven and transformed in the second image.
Based on the consistency of the overall head movement, the vector transformation parameters describing the movement of the eye and mouth regions may be approximated to describe the movement of the hair region, and thus, after determining the vector transformation parameters of the digital human eye and mouth region movement of the first image to the second image, the second hair region on the second image may be determined from the vector transformation parameters and the first hair region on the first image.
The first hair region on the original image is determined based on the aforementioned step S201, and the first image is an image extracted from the original image that contains the first hair region; therefore, the position information of the first hair region on the original image can be mapped onto the first image, and the first hair vector can be determined from the position information of the first hair region on the first image. The determination of the first hair vector of the digital person on the first image may also be performed in advance in a pre-processing stage and the result stored, the pre-stored first hair vector being directly recalled when determining the second hair region.
For example, given that the vector transformation parameter of the eye-mouth region is the transformation matrix C and assuming that the first hair vector is the vector A, the second hair vector A' on the second image may be represented as A' = CA.
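The relation B' = CB does not by itself fix a unique 2×2 matrix C for two-dimensional vectors; one common concrete choice is a rotation-plus-uniform-scale (similarity) matrix that maps B onto B'. The sketch below uses that interpretation, which is an assumption rather than the only possible reading, and all names and numbers are illustrative:

```python
import numpy as np

def similarity_matrix(b: np.ndarray, b_prime: np.ndarray) -> np.ndarray:
    """Build a 2x2 rotation-plus-scale matrix C such that C @ b equals b_prime."""
    scale = np.linalg.norm(b_prime) / np.linalg.norm(b)
    angle = np.arctan2(b_prime[1], b_prime[0]) - np.arctan2(b[1], b[0])
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    return scale * np.array([[cos_a, -sin_a],
                             [sin_a,  cos_a]])

B = np.array([2.0, 40.0])        # first eye-mouth vector (mouth centre minus eye centre)
B_prime = np.array([6.0, 38.0])  # second eye-mouth vector after the driving processing
C = similarity_matrix(B, B_prime)

A = np.array([5.0, 130.0])       # first hair vector (hair bottom minus hair top)
A_prime = C @ A                  # estimated second hair vector, A' = CA
```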
S505, determining a second hair area of the digital person on the second image according to the second hair vector.
The first hair region of the digital person on the first image is mapped onto the second image as the transformation starting point, the key points of the first hair region are transformed according to the vector transformation parameters so as to match the second hair vector, and the second hair region of the digital person on the second image is rapidly located according to the coordinate information of the transformed key points.
In the embodiment of the disclosure, the eye-mouth vector and the hair vector of the digital person on the first image are stored in advance; after the second image is generated, the hair region of the digital person on the second image is rapidly determined from the hair vector on the first image by using the vector transformation relationship of the eye-mouth regions in the images before and after the driving processing. This avoids running hair region detection on the second image, reduces the amount of calculation in the image fusion process, and improves processing efficiency.
In some embodiments, for the first hair vector of the digital person in the first image referred to in the foregoing step S504, the first hair vector may be predetermined and stored so that it can be directly obtained and used in the image fusion process. Based on this, the step of predetermining the first hair vector may include, as shown in the step flowchart of fig. 6:
S601, respectively acquiring coordinates of a left lowest point and a right lowest point of a hair bottom area in a horizontal direction for a first hair area on the first image, and taking the average coordinates as hair bottom coordinates;
The horizontal direction refers to the direction from the left face boundary to the right face boundary; the hair bottom region refers to the region where the hair ends of the first hair region are located, along the vertical direction running from the digital person's chin to the top of the head. The left and right lowest points of the hair bottom region are the lowest points of the hair ends on either side of the face in the horizontal direction, and are typically edge points where the hair meets the background or another non-hair portion.
After the left lowest point and the right lowest point are obtained, based on their coordinates in the two-dimensional image coordinate system, the average of their horizontal coordinates (the x coordinates) is taken in the horizontal direction, and the average of their vertical coordinates (the y coordinates), or the smaller of the two vertical coordinates, is taken in the vertical direction, to obtain the hair bottom coordinates of the first hair region; the bottom coordinates represent the central position of the hair tail in the horizontal direction.
S602, determining the top coordinates of the hair according to the coordinates of each point of the top area of the hair in the vertical direction;
For the coordinates of the points in the hair top region, the point with the smallest vertical coordinate among all points can be used directly as the hair top coordinates; if several points share the smallest vertical coordinate, the average of their horizontal coordinates can be further calculated, and the point determined by this average and the smallest vertical coordinate is used as the hair top coordinates. Alternatively, if the hair top region exhibits a specific shape, the hair top coordinates may be determined by fitting that shape; for example, a straight line may be fitted to the top-edge points using the least squares method, and the topmost point of the line may be taken as the hair top coordinates.
And S603, generating the first hair vector according to the hair bottom coordinate and the hair top coordinate.
That is, the difference of the hair bottom coordinates minus the hair top coordinates is taken as the first hair vector.
For example, if the points M1 (x1, y1) and M2 (x2, y2) in the first hair region on the first image represent the left and right lowest points of the hair bottom region, respectively, and the point M3 (x3, y3) represents the point of the hair top coordinates, then the hair bottom coordinates may be represented as ((x1+x2)/2, (y1+y2)/2), and the first hair vector may be expressed as ((x1+x2)/2 - x3, (y1+y2)/2 - y3).
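A sketch of this calculation under the assumptions that the first hair region is available as a binary mask in the first image's coordinate system (x horizontal, y vertical and increasing downward) and that the left and right lowest points are separated by the mask's horizontal midline; all names are illustrative:

```python
import numpy as np

def first_hair_vector(hair_mask: np.ndarray) -> np.ndarray:
    """hair_mask: H x W boolean array, True where the first hair region lies."""
    ys, xs = np.nonzero(hair_mask)
    # Hair top: smallest y coordinate; average the x coordinates of the tied points
    top_y = ys.min()
    top = np.array([xs[ys == top_y].mean(), float(top_y)])
    # Hair bottom: lowest hair pixel on the left half and on the right half of the region
    mid_x = (xs.min() + xs.max()) / 2.0
    left, right = xs <= mid_x, xs > mid_x
    left_pt = np.array([xs[left][ys[left].argmax()], ys[left].max()], dtype=float)
    right_pt = np.array([xs[right][ys[right].argmax()], ys[right].max()], dtype=float)
    bottom = (left_pt + right_pt) / 2.0   # average coordinates of the two lowest points
    return bottom - top                   # first hair vector = hair bottom minus hair top
```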
In some embodiments, for the construction of the target mask map of the second image according to the second hair region and the face keypoints of the digital person on the second image in the aforementioned step S401, referring to the mask map construction step flowchart shown in fig. 7A, the target mask map may be generated by:
S701, creating a blank image; the pixel points of the blank image are in one-to-one correspondence with the pixel points of the corresponding positions on the second image;
The image size and resolution of the blank image are the same as those of the second image; a blank image is created based on the image size and resolution of the second image, and the pixel values of all its pixel points can be initialized to the pixel value (0, 0, 0) or to the gray value 0.
S702, determining a head area and a non-head area on the blank image according to the second hair area and the key points of the face on the second image;
The position information of the face key points and of the second hair region in the second image is mapped into the blank image; since the image size and resolution of the blank image are the same as those of the second image, the boundary coordinates of the face key points and the second hair region can be applied directly to the corresponding positions of the blank image. Based on the boundary of the hair region and of the face region determined by the face key points, the head region comprising the hair region and the face region, and the non-head region, can be accurately identified on the blank image.
S703, setting pixel values for the pixel points in the head region and the non-head region respectively, and generating the target mask map.
In this embodiment, since the target mask map is used to indicate that the head region on the second image is preserved in the image fusion process, and the pixel values of the target mask map represent the contribution of the pixels in the same region on the second image to the fused image, when the blank image is set in gray-scale form, the pixel value of the head region may be set to the maximum gray value 255 and the pixel value of the non-head region to the minimum gray value 0. Alternatively, the target mask map may be set in the form of a three-channel mask map, with the pixel values of the head region set to a first three-channel pixel value and those of the non-head region set to a second three-channel pixel value, where each channel of the first three-channel pixel value takes the maximum value, namely (255, 255, 255), and each channel of the second three-channel pixel value takes the minimum value, namely (0, 0, 0).
Through the pixel value setting mode, the image effect presented by the target mask image is an image comprising a black area and a white area, wherein the black area represents a non-human head area, and the white area represents a human head area. Referring to the target mask diagram shown in fig. 7B, a head region and a non-head region can be located through a face key point and a second hair region, black and white pixels are respectively set for the non-head region and the head region, and a target mask diagram is generated.
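A sketch of this construction, assuming the head region can be filled from two closed boundaries already mapped into the second image's coordinates, one for the second hair region and one for the face contour taken from the face key points; the use of OpenCV's fillPoly and all names are illustrative:

```python
import cv2
import numpy as np

def build_target_mask(second_image: np.ndarray,
                      hair_boundary: np.ndarray,
                      face_boundary: np.ndarray) -> np.ndarray:
    """hair_boundary / face_boundary: N x 2 integer arrays of boundary points."""
    h, w = second_image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)   # blank image, gray value 0
    cv2.fillPoly(mask, [hair_boundary.reshape(-1, 1, 2).astype(np.int32)], 255)  # hair region -> white
    cv2.fillPoly(mask, [face_boundary.reshape(-1, 1, 2).astype(np.int32)], 255)  # face region -> white
    return mask   # head region 255, non-head region 0
```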
In the embodiment of the disclosure, the face key points and the second hair region of the digital person on the second image are accurately detected, and the head region and the non-head region are determined on the preset blank image based on these face key points and the boundary of the second hair region; by further setting different pixel values so that the head region of the second image is preserved during image fusion, the construction of the target mask map is achieved, providing effective fusion boundary information for fusing the second image with the original image. Moreover, the position of the head region can be accurately indicated based on the target mask map, ensuring that the pixel information of the head region on the second image is completely preserved during fusion.
In some embodiments, after the target mask map is generated, the boundary between the head region and the non-head region in the target mask map is a hard boundary, that is, the pixel value jumps from 255 (white) to 0 (black). A jagged effect may therefore appear at the mask boundary when it guides image fusion, making the fused image look unnatural. To soften the hard boundary in the region of abrupt pixel value change and make the transition natural, the method may further include a step of performing Gaussian blur processing on the current target mask map: the region of abrupt pixel value change in the target mask map is first detected, and Gaussian blur smoothing is then applied to each pixel point of that region to obtain an updated target mask map.
Regarding gaussian blur processing, this can be achieved by: according to the image size of the target mask diagram, determining a Gaussian kernel and a standard deviation of Gaussian blur processing; determining a weight value of each position in the Gaussian kernel according to a two-dimensional Gaussian distribution formula; aiming at each pixel point to be processed in the target mask graph, aligning the center of a Gaussian kernel with the pixel point to be processed, and carrying out convolution operation on the pixel point and the neighborhood pixels thereof according to each weight value in the Gaussian kernel to obtain a pixel value of the pixel point after Gaussian blur; the neighborhood pixels are determined according to the coverage range of the Gaussian kernel.
The Gaussian kernel and the standard deviation of the Gaussian blur can be preset according to experience, or determined according to the resolution of the target mask map, and different Gaussian kernels can be set for target mask maps of different resolutions. For example, if the resolution of the image is a×b, the Gaussian kernel size can be determined as (a/16 + b/16).
The two-dimensional Gaussian distribution formula can be expressed as:
f(m,n) = (1/(2πσ²))·exp(-(m²+n²)/(2σ²))
wherein (m, n) represents the offset of each position in the gaussian kernel from the center of the gaussian kernel, σ is the standard deviation, and f (m, n) represents the weight value of that position in the gaussian kernel.
For example, assume there is a 3×3 Gaussian kernel; the coordinates of each location in the Gaussian kernel are shown in Table 1, where (0, 0) is the Gaussian kernel center and the coordinates of the surrounding locations represent their offsets from the Gaussian kernel center:
(-1,-1) (-1,0) (-1,1)
(0,-1) (0,0) (0,1)
(1,-1) (1,0) (1,1)
The coordinates of each position are substituted into the two-dimensional Gaussian distribution formula to calculate, so that the weight value of the position can be obtained, the weight value determines the relative importance between the pixel point to be processed and the neighborhood pixels around the pixel point to be processed, the center weight of the Gaussian kernel is highest, and the weight which is far from the center of the Gaussian kernel is gradually reduced.
For the pixel point (x, y) to be processed, the convolution operation formula of the Gaussian blur process can be expressed as follows:
g(x,y) = Σ_{m=-r}^{r} Σ_{n=-r}^{r} f(m,n)·s(x+m, y+n)
Wherein g (x, y) represents a pixel value of the pixel point after Gaussian blur, r represents a convolution kernel radius, s (x, y) represents a pixel value of the pixel point before Gaussian blur processing, and f (m, n) represents a weight value corresponding to a position (m, n) on the convolution kernel.
That is, the pixel point to be processed is aligned with the center of the Gaussian kernel, a neighborhood centered on that pixel point and of the same size as the Gaussian kernel is obtained, each weight value in the Gaussian kernel is multiplied by the pixel value of the corresponding neighborhood pixel point, and all the products are added to obtain a new pixel value, which is used as the pixel value of the pixel point to be processed after Gaussian blurring. The convolution operation is repeated for each pixel point to be processed in the target mask map, thereby obtaining the updated target mask map after Gaussian blur processing.
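A sketch of the kernel-weight calculation and of the convolution restricted to the region of abrupt pixel value change; in practice a library routine such as cv2.GaussianBlur could be applied instead, and the radius, sigma, and boundary-detection rule below are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(radius: int, sigma: float) -> np.ndarray:
    """Weights f(m, n) from the two-dimensional Gaussian formula, normalised to sum to 1."""
    ax = np.arange(-radius, radius + 1)
    m, n = np.meshgrid(ax, ax, indexing="ij")
    k = np.exp(-(m ** 2 + n ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return k / k.sum()

def blur_mask_boundary(mask: np.ndarray, radius: int = 3, sigma: float = 1.5) -> np.ndarray:
    """Smooth only the pixels near the abrupt 255 -> 0 jump of the target mask map."""
    kernel = gaussian_kernel(radius, sigma)
    m = mask.astype(np.float64)
    padded = np.pad(m, radius, mode="edge")
    out = m.copy()
    gy, gx = np.gradient(m)                          # non-zero only near the hard boundary
    for y, x in np.argwhere((np.abs(gy) + np.abs(gx)) > 0):
        patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
        out[y, x] = np.sum(kernel * patch)           # convolution at (x, y)
    return out.astype(np.uint8)
```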
In the embodiment of the disclosure, the hard boundary between the human head region and the non-human head region in the target mask image is softened by the Gaussian blur technology, so that the change of the boundary pixel value is smoother, the sawtooth effect is avoided in the image fusion process, the fused image is more natural and smooth, and the overall quality of the image and the user visual experience are improved.
In some embodiments, for the pixel value of each pixel point on the target mask map in the foregoing step S402, determining the fusion weight corresponding to each pixel point on the second image may be implemented by a pixel value normalization process, that is, for each pixel point on the target mask map, performing a normalization process on the pixel value of the pixel point; and taking the normalization processed result as a fusion weight corresponding to the pixel point with the same pixel position on the second image.
For the case where the pixel value of the human head region in the target mask map is set to the maximum gray value and the pixel value of the non-human head region is set to the minimum gray value, the normalization process may be performed by dividing each pixel value in the target mask map by the maximum gray value 255.
For the case where the pixel value of the head region is set to the first three-channel pixel value and that of the non-head region to the second three-channel pixel value, each channel of each pixel value may be divided by 255; the normalized fusion weight then comprises three channel values, each of which serves as the fusion weight of the corresponding channel when the pixel point on the second image is fused with the pixel point at the same pixel position on the original image. For example, for a pixel point in the region of abrupt pixel value change on the target mask map, assuming the pixel value after Gaussian blur processing is (128, 128, 128), the normalized result is (0.50, 0.50, 0.50); when the second image is fused with the original image, each channel value of that pixel is weighted by the fusion weight of the corresponding channel in the normalized result.
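A minimal sketch of the normalisation, assuming a single-channel (gray-scale) mask; for a three-channel mask the same division by 255 would be applied per channel, and all names are illustrative:

```python
import numpy as np

def fusion_weights(target_mask: np.ndarray) -> np.ndarray:
    """Map mask values 0..255 to fusion weights 0.0..1.0 for the second image."""
    return target_mask.astype(np.float32) / 255.0
```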
In some embodiments, for the linear fusion of the second image and the original image according to the fusion weights in the foregoing step S403, referring to the step flowchart shown in fig. 8, the image fusion may be implemented by the following steps:
S801, obtaining a second pixel value of each pixel point on the second image, and obtaining a first pixel value of the same pixel position on the original image;
Fusing the second image with the corresponding original image means that the second image is fused with the image block on the original image that matches the pixel region of the second image; the second image and that image block have the same pixel width and pixel height. Since a fusion weight is set for each pixel point on the second image, image fusion is essentially a pixel value weighting calculation based on the fusion weights, so the second pixel value of each pixel point on the second image and the first pixel value of the corresponding pixel point on the original image are acquired together to realize pixel-level image fusion.
S802, according to the fusion weight corresponding to each pixel point on the second image, the first pixel value and the second pixel value are linearly superimposed to obtain the digital human image.
The fusion weight represents the contribution of the pixel point on the second image to the fused image; if it is denoted ω, the fusion weight corresponding to the pixel point on the original image may be expressed as (1-ω). Based on this, the first pixel value and the second pixel value are linearly superimposed, that is, the second pixel value is multiplied by the fusion weight ω, the first pixel value is multiplied by (1-ω), and the two products are added; the sum is the pixel value of the pixel point at the corresponding position on the fused image.
When the first pixel value and the second pixel value are multiplied by their respective weights for linear superposition, the calculation may be carried out per channel. Assuming the three-channel pixel value of the pixel point (x, y) on the original image is I_R(x,y), I_G(x,y), I_B(x,y), the three-channel pixel value of the pixel point (x, y) on the second image is S_R(x,y), S_G(x,y), S_B(x,y), and the fusion weight is ω, the three-channel pixel value of the pixel point at the corresponding position on the fused image may be expressed as:
F_R(x,y) = ω·S_R(x,y) + (1-ω)·I_R(x,y)
F_G(x,y) = ω·S_G(x,y) + (1-ω)·I_G(x,y)
F_B(x,y) = ω·S_B(x,y) + (1-ω)·I_B(x,y)
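Putting steps S801 and S802 together, a vectorised sketch, assuming the weights come from the blurred and normalised mask and that original_block is the region of the original image matching the second image; all names are illustrative:

```python
import numpy as np

def linear_fuse(original_block: np.ndarray, second_image: np.ndarray,
                target_mask: np.ndarray) -> np.ndarray:
    """original_block, second_image: H x W x 3 uint8 images; target_mask: H x W, values 0..255."""
    w = (target_mask.astype(np.float32) / 255.0)[..., None]   # one weight per pixel, broadcast over channels
    fused = w * second_image.astype(np.float32) + (1.0 - w) * original_block.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```

The fused block would then be written back into the full original image at the position recorded when the first image was cropped.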
In the embodiment of the disclosure, pixel-level fusion weights determine the contribution of each pixel point of the second image and of the original image, and effective fusion of the second image with the original image is achieved through these fusion weights, generating a digital human image that contains the dynamic head motion of the digital person while preserving the realism of the background; this solves the problem of a stiff fusion boundary and achieves a smooth transition at the fusion boundary.
In some embodiments, in the case that each original image to be processed is a multi-frame image of an original video, the method may further include a digital human target video generation process, and may include the steps of: and combining the digital person images corresponding to each original image according to the inter-frame sequence of the multi-frame images in the original video to generate a first video, and synchronizing the pre-prepared driving voice data with the first video, for example, precisely aligning the voice data with the video frames according to a time axis so as to ensure that the mouth shape, expression and action of the digital person are matched with voice content, thereby generating the digital person target video.
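A sketch of the target-video assembly under the assumptions that the per-frame digital person images are already ordered, that a frame rate is known, and that an external ffmpeg executable is available for muxing the driving voice data; all names, codecs, and the ffmpeg invocation are illustrative:

```python
import subprocess
import cv2

def write_target_video(frames, fps, driving_audio_path, out_path="digital_human_target.mp4"):
    """frames: list of H x W x 3 BGR digital person images in original inter-frame order."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter("first_video.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)                       # first video: fused frames, no audio yet
    writer.release()
    # Align the pre-prepared driving voice data with the video frames on a common time axis
    subprocess.run(["ffmpeg", "-y", "-i", "first_video.mp4", "-i", driving_audio_path,
                    "-c:v", "copy", "-c:a", "aac", "-shortest", out_path], check=True)
```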
To better understand the digital human image generation method provided by the application, the method flow is described below by taking as an example a scene in which a front-facing, head-up whole-body image of a digital person is used as the original image and a digital person image with a head-up smile needs to be generated.
In this embodiment, referring to the flowchart of the method shown in fig. 9, the digital human image generating method provided by the present application mainly includes an input image processing stage, a digital human head motion driving stage, an image fusion stage, and a digital human image output stage.
S901, input image processing stage: this stage is divided into original image input and image cropping. The original image input is used to input the front-facing, head-up whole-body image of the digital person as the original image; the image cropping process is used to determine a region of interest in the whole-body image and crop the region-of-interest image from the whole-body image as the basic driving image, which mainly contains the head region of the digital person, thereby reducing unnecessary background image information.
Regarding the region of interest, hair detection from an open-source algorithm may be used: the whole-body image of the digital person is first detected, the hair region and the non-hair region are labeled respectively, and the maximum and minimum x coordinates in the horizontal direction and the maximum and minimum y coordinates in the vertical direction of the hair region are detected. The midpoints of these boundary coordinates are then taken as the center coordinates of the region of interest, and set multiples of Δx = x_max - x_min and Δy = y_max - y_min are taken as the side lengths of the region of interest. After the region of interest is determined, if one of its side lengths is larger than the pixel width or height of the input whole-body image, that side length can be reduced to the corresponding pixel dimension of the image, and the other side length can be reduced in the same proportion.
For example, assume that the bounding box coordinates of the hair region are: x_min=100, x_max=300, y_min=50, y_max=250, and the total pixel size of the whole-body image of the digital person is 800 x 600, the center coordinates of the region of interest can be expressed as:
Center x coordinate= (x_min+x_max)/2= (100+300)/2=200
Center y coordinate= (y_min+y_max)/2= (50+250)/2=150
Assuming 1.1 times as the magnification factor of the side length, the side length of the region of interest can be expressed as:
Width = 1.1 × (x_max - x_min) = 1.1 × (300 - 100) = 220
Height = 1.1 × (y_max - y_min) = 1.1 × (250 - 50) = 220
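A sketch of the region-of-interest calculation consistent with the numbers above, assuming the hair bounding box has already been obtained from the hair-detection step; the clamping rule for oversized regions follows the description above, and the names are illustrative:

```python
def region_of_interest(x_min, x_max, y_min, y_max, img_w, img_h, factor=1.1):
    """Return the centre and side lengths of the ROI built around the hair bounding box."""
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2   # e.g. (200, 150)
    width = factor * (x_max - x_min)                    # e.g. 1.1 * 200 = 220
    height = factor * (y_max - y_min)                   # e.g. 1.1 * 200 = 220
    # If one side exceeds the image, shrink it to the image size and scale the other equally
    scale = min(1.0, img_w / width, img_h / height)
    return cx, cy, width * scale, height * scale
```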
S902, digital human head motion driving stage: the digital person in the basic driving image output by step S901 is driven to change its head pose and facial expression according to the digital human head pose coefficients and facial expression coefficients corresponding to the input driving voice data, converting the head-up, forward-facing head state of the digital person in the basic driving image into a head-up smiling state, so as to obtain a driving-matched image;
S903, image fusion stage: the method at least comprises three steps of target mask diagram construction, gaussian blur processing and image fusion. The target mask image construction step is used for constructing a target mask image according to the hair area and the face key points of the digital person on the driving matching image, the mask image is used for identifying the head area and the non-head area, and the head area comprises the hair area of the digital person on the driving matching image and the face area positioned by the face key points. The Gaussian blur processing step is used for carrying out Gaussian blur processing on pixel points of a pixel value mutation area on the constructed target mask image. The image fusion step is used for determining the fusion weight of each pixel point on the driving matching image according to the pixel value of the target mask image, and carrying out pixel-level linear fusion on the driving matching image and the whole-body image according to the fusion weight.
S904, a digital human image output stage, which is used for outputting the digital human image subjected to the linear fusion processing.
In the embodiment of the disclosure, the region of interest including the head region for driving processing is determined from the input whole-body image, the corresponding driving matching image including the head motion is generated based on the region of interest, and the driving matching image is recombined to the input whole-body image, so that the problem of image distortion caused when the input image includes more partial images except the head region is solved, and the quality of the digital human image and the naturalness and realism of the digital human are improved.
Corresponding to the foregoing embodiment of the digital person generating method, referring to fig. 10, the present application also provides an embodiment of a digital person generating apparatus, the apparatus comprising:
A region of interest determination module 1010 for determining a region of interest in an original image of a digital person; the region of interest includes a first hair region of the digital person;
A first image obtaining module 1020, configured to crop a region of interest in the original image to obtain a first image;
The driving processing module 1030 is configured to perform driving processing on a head gesture and a facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image, so as to generate a second image;
the digital person image generating module 1040 is configured to fuse the second image with the original image according to the position information of the first image on the original image, and generate a digital person image corresponding to the original image.
In some embodiments, the region of interest determination module specifically includes:
A boundary position information acquisition module, configured to determine a first hair region of the digital person in an original image of the digital person, and acquire position information of a boundary of the first hair region in a horizontal direction and a vertical direction;
The center coordinate determining module is used for obtaining the center coordinate of the first hair region according to the position information of the boundary;
and the interested region determining module is used for determining the interested region according to the central coordinate and the region width of the first hair region in the horizontal direction and the vertical direction.
In some embodiments, the region of interest determination module is specifically configured to:
determining the horizontal side length of the region of interest according to the region width of the first hair region in the horizontal direction;
Determining the vertical side length of the region of interest according to the region width of the first hair region in the vertical direction;
and determining the region of interest according to the horizontal side length and the vertical side length and the center coordinates.
In some embodiments, the region of interest determining module is configured to determine the region of interest according to the horizontal side length and the vertical side length in combination with the center coordinates, and includes:
Adjusting the horizontal side length and the vertical side length according to the size of the original image in the case that the horizontal side length is greater than the width of the original image or the vertical side length is greater than the height of the original image;
And determining the region of interest according to the adjusted horizontal side length and the adjusted vertical side length.
In some embodiments, the digital human image generation module comprises:
The target mask image construction module is used for constructing a target mask image of the second image according to the second hair region and the key points of the face of the digital person on the second image; the target mask map is used for identifying a human head area and a non-human head area;
The fusion weight determining module is used for determining the fusion weight corresponding to each pixel point on the second image according to the pixel value of each pixel point on the target mask image;
and the image fusion module is used for carrying out linear fusion on the second image and the original image according to the fusion weight.
In some embodiments, prior to constructing the target mask map, the apparatus further comprises:
the first eye mouth vector acquisition module is used for acquiring a first eye mouth vector in a first image corresponding to the second image; the first eye mouth vector is predetermined according to the coordinate information of key points of the mouth and eyes of the digital person;
The second eye-mouth vector acquisition module is used for detecting face key points of the digital person on the second image and generating a second eye-mouth vector according to coordinate information of key points of a mouth and eyes in the face key points;
the vector transformation parameter determining module is used for determining vector transformation parameters according to the first eye mouth vector and the second eye mouth vector;
a second hair vector determining module, configured to determine a second hair vector of the second image according to the first hair vector of the digital person in the first image and the vector transformation parameter; the first hair vector is predetermined according to the coordinate information of the top and the bottom of the hair of the first hair area on the first image;
And the second hair area determining module is used for determining a second hair area of the digital person on the second image according to the second hair vector.
In some embodiments, the apparatus further comprises a first hair vector predetermination module, specifically for:
For a first hair region on the first image, respectively acquiring coordinates of a left lowest point and a right lowest point of a hair bottom region in the horizontal direction, and taking the average coordinates as the hair bottom coordinates;
in the vertical direction, determining the top coordinates of the hair according to the coordinates of each point in the top area of the hair;
And generating the first hair vector according to the hair bottom coordinate and the hair top coordinate.
In some embodiments, the target mask graph construction module is specifically configured to:
Creating a blank image; the pixel points of the blank image are in one-to-one correspondence with the pixel points of the corresponding positions on the second image;
According to the second hair area and the key points of the human face on the second image, determining a human head area and a non-human head area on the blank image;
And setting pixel values for the pixel points in the head region and the non-head region respectively, and generating the target mask map.
In some embodiments, after generating the target mask map, the apparatus further comprises:
detecting a pixel value mutation region in the target mask map;
And smoothing the pixel points of the pixel value mutation area by using Gaussian blur to obtain an updated target mask diagram.
In some embodiments, the fusion weight determination module is specifically configured to:
for each pixel point on the target mask graph, carrying out normalization processing on pixel values of the pixel points;
and taking the normalization processed result as a fusion weight corresponding to the pixel point with the same pixel position on the second image.
In some embodiments, the image fusion module is specifically configured to:
acquiring a second pixel value of each pixel point on the second image, and acquiring a first pixel value of the same pixel position on the original image;
And according to the fusion weight corresponding to each pixel point on the second image, linearly superposing the first pixel value and the second pixel value to obtain the digital human image.
In some embodiments, where the original image of the digital person comprises a multi-frame image of the original video, the apparatus further comprises:
the first video generation module is used for combining the digital human images corresponding to each original image according to the inter-frame sequence of the multi-frame images in the original video to generate a first video;
And the audio synchronization module is used for synchronizing the input driving voice data with the first video and generating a digital human target video matched with the driving voice data.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
The embodiment of the application also provides an electronic device, the structural schematic diagram of which is shown in fig. 11, the electronic device 1100 includes at least one processor 1101, a memory 1102 and a bus 1103, and at least one processor 1101 is electrically connected to the memory 1102; the memory 1102 is configured to store at least one computer executable instruction and the processor 1101 is configured to execute the at least one computer executable instruction to perform the steps of any one of the digital human image generation methods as provided by any one of the embodiments or any one of the alternative implementations of the present application.
Further, the processor 1101 may be an FPGA (Field-Programmable Gate Array) or another device with logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
The embodiment of the application also provides another readable storage medium, which stores a computer program for implementing the steps of any digital human image generating method provided by any embodiment or any optional implementation mode of the application when the computer program is executed by a processor.
The readable storage medium provided by the embodiments of the present application includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
The foregoing description covers only preferred embodiments of the application and is not intended to limit the application; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within the scope of protection of the application.

Claims (15)

1. A digital human image generation method, the method comprising:
Determining a region of interest in an original image of a digital person; the region of interest includes a first hair region of the digital person;
cutting out a region of interest in the original image to obtain a first image;
Driving the head gesture and facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image to generate a second image;
And fusing the second image with the original image according to the position information of the first image on the original image, and generating a digital human image corresponding to the original image.
2. The method of claim 1, wherein the determining the region of interest in the original image of the digital person comprises:
Determining a first hair region of the digital person in an original image of the digital person, and acquiring position information of boundaries of the first hair region in a horizontal direction and a vertical direction;
obtaining the center coordinates of the first hair region according to the position information of the boundary;
And determining the region of interest according to the central coordinate and the region width of the first hair region in the horizontal direction and the vertical direction.
3. The method of claim 2, wherein the determining the region of interest based on the center coordinates and the region width of the first hair region in the horizontal direction and the vertical direction comprises:
determining the horizontal side length of the region of interest according to the region width of the first hair region in the horizontal direction;
Determining the vertical side length of the region of interest according to the region width of the first hair region in the vertical direction;
and determining the region of interest according to the horizontal side length and the vertical side length and the center coordinates.
4. A method according to claim 3, wherein said determining said region of interest in combination with said central coordinates based on said horizontal and vertical side lengths comprises:
Adjusting the horizontal side length and the vertical side length according to the size of the original image in the case that the horizontal side length is greater than the width of the original image or the vertical side length is greater than the height of the original image;
And determining the region of interest according to the adjusted horizontal side length and the adjusted vertical side length.
5. The method of claim 1, wherein fusing the second image with the original image based on the positional information of the first image on the original image comprises:
Constructing a target mask image of the second image according to a second hair region and key points of the face of the digital person on the second image; the target mask map is used for identifying a human head area and a non-human head area;
determining a fusion weight corresponding to each pixel point on the second image according to the pixel value of each pixel point on the target mask image;
And according to the fusion weight, carrying out linear fusion on the second image and the original image.
6. The method of claim 5, wherein prior to constructing the target mask map, the method further comprises:
Acquiring a first eye-mouth vector in a first image corresponding to the second image; the first eye-mouth vector is predetermined according to the coordinate information of key points of the mouth and eyes of the digital person;
Detecting face key points of the digital person on the second image, and generating a second eye-mouth vector according to coordinate information of key points of a mouth and eyes in the face key points;
Determining vector transformation parameters according to the first eye-mouth vector and the second eye-mouth vector;
determining a second hair vector of the second image according to the first hair vector of the digital person in the first image and the vector transformation parameter; the first hair vector is predetermined according to the coordinate information of the top and the bottom of the hair of the first hair area on the first image;
And determining a second hair region of the digital person on the second image according to the second hair vector.
7. The method of claim 6, further comprising the step of pre-determining the first hair vector:
For a first hair region on the first image, respectively acquiring coordinates of a left lowest point and a right lowest point of a hair bottom region in the horizontal direction, and taking the average coordinates as the hair bottom coordinates;
in the vertical direction, determining the top coordinates of the hair according to the coordinates of each point in the top area of the hair;
And generating the first hair vector according to the hair bottom coordinate and the hair top coordinate.
8. The method of claim 5, wherein constructing the target mask map of the second image from the second hair region and the face keypoints of the digital person on the second image comprises:
Creating a blank image; the pixel points of the blank image are in one-to-one correspondence with the pixel points of the corresponding positions on the second image;
According to the second hair area and the key points of the human face on the second image, determining a human head area and a non-human head area on the blank image;
And setting pixel values for the pixel points in the head region and the non-head region respectively, and generating the target mask map.
9. The method of claim 8, wherein after generating the target mask map, the method further comprises:
detecting a pixel value mutation region in the target mask map;
And smoothing the pixel points of the pixel value mutation area by using Gaussian blur to obtain an updated target mask diagram.
10. The method of claim 5, wherein determining the fusion weight corresponding to each pixel on the second image according to the pixel value of each pixel on the target mask map comprises:
for each pixel point on the target mask graph, carrying out normalization processing on pixel values of the pixel points;
and taking the normalization processed result as a fusion weight corresponding to the pixel point with the same pixel position on the second image.
11. The method of claim 5, wherein the linearly fusing the second image with the original image according to the fusion weights comprises:
acquiring a second pixel value of each pixel point on the second image, and acquiring a first pixel value of the same pixel position on the original image;
And according to the fusion weight corresponding to each pixel point on the second image, linearly superposing the first pixel value and the second pixel value to obtain the digital human image.
12. The method according to claim 1, wherein in case the original image of the digital person comprises a multi-frame image of the original video, the method further comprises the digital person target video generation step of:
combining digital human images corresponding to each original image according to the inter-frame sequence of the multi-frame images in the original video to generate a first video;
and synchronizing the input driving voice data with the first video to generate a digital human target video matched with the driving voice data.
13. A digital human image generation apparatus, the apparatus comprising:
The interest region determining module is used for determining an interest region in the original image of the digital person; the region of interest includes a first hair region of the digital person;
The first image acquisition module is used for cutting out the region of interest in the original image to obtain a first image;
The driving processing module is used for driving and processing the head gesture and the facial expression of the digital person in the first image according to the digital person head motion coefficient of the first image to generate a second image;
And the digital human image generation module is used for fusing the second image with the original image according to the position information of the first image on the original image to generate a digital human image corresponding to the original image.
14. An electronic device, comprising: a memory, a processor;
the memory is used for storing a computer program;
The processor being operative to invoke the computer program to implement the method of any of claims 1-12.
15. A readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-12.