CN114049290A - Image processing method, device, equipment and storage medium

Image processing method, device, equipment and storage medium

Info

Publication number
CN114049290A
Authority
CN
China
Prior art keywords
image
head
synthesized
feature map
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111325365.8A
Other languages
Chinese (zh)
Inventor
束长勇
刘家铭
洪智滨
韩钧宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111325365.8A priority Critical patent/CN114049290A/en
Publication of CN114049290A publication Critical patent/CN114049290A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides an image processing method, device, equipment, and storage medium, relates to the field of artificial intelligence, in particular to the technical fields of deep learning and computer vision, and can be applied to scenes such as face image processing and face recognition. The specific implementation scheme is as follows: acquiring a reference image and a target person head image; replacing the head of a reference person in the reference image with the head of the target person to obtain an image to be synthesized, wherein the image to be synthesized comprises a partial reference background, the head of the target person, and a region to be filled between the partial reference background and the head of the target person; performing feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map; and generating a composite image based on the skin color sample feature map, the filling sample feature map, and the image to be synthesized. Because the skin color sample feature map and the filling sample feature map are extracted and used for image synthesis, the synthesized image is more natural and real.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of deep learning and computer vision technologies, which can be applied to scenes such as face image processing and face recognition, and in particular, to an image processing method, apparatus, device, storage medium, and computer program product.
Background
With the development of computing technology and artificial intelligence, fusion networks, which provide functions such as skin color alignment and neck and background filling, are widely applied to scenes such as face image editing and fusion, for example, fusing the head portrait of one person onto the body of a specific person, or into a specific scene or background.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, storage medium, and computer program product.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring a reference image and a target person head image, wherein the reference image comprises a reference background and a reference person head; replacing the head of a reference person in the reference image by the head of the target person to obtain an image to be synthesized, wherein the image to be synthesized comprises a part of reference background, the head of the target person and a region to be filled between the part of reference background and the head of the target person; performing feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map; and generating a composite image based on the skin color sample characteristic diagram, the filling sample characteristic diagram and the image to be synthesized.
According to a second aspect of the present disclosure, there is provided an image processing apparatus comprising: an acquisition module configured to acquire a reference image and a target person head image, wherein the reference image includes a reference background and a reference person head; the replacing module is configured to replace the head of the reference person in the reference image by using the head image of the target person to obtain an image to be synthesized, wherein the image to be synthesized comprises a part of reference background, the head of the target person and a region to be filled between the part of reference background and the head of the target person; the characteristic extraction module is configured to extract the characteristics of the reference image and the image to be synthesized to obtain a skin color sample characteristic diagram and a filling sample characteristic diagram; and the image generation module is configured to generate a composite image based on the skin color sample characteristic diagram, the filling sample characteristic diagram and the image to be synthesized.
According to a third aspect of the present disclosure, there is provided an electronic apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to implement a method as described in any of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in any implementation manner of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the image processing method, the device, the equipment, the storage medium and the computer program product, firstly, the head of a target person is used for replacing the head of a reference person in a reference image to obtain an image to be synthesized, then, the reference image and the image to be synthesized are subjected to feature extraction to obtain a skin color sample feature map and a filling sample feature map, and finally, a synthetic image is generated based on the skin color sample feature map, the filling sample feature map and the image to be synthesized, and the generated synthetic image is more natural and real.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture to which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of an image processing method of the present disclosure;
FIG. 3 is a flow diagram of another embodiment of an image processing method of the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of an image processing method of the present disclosure;
FIG. 5A is a schematic view of an application scenario of the image processing method of the present disclosure;
FIG. 5B is a schematic diagram of extracting a skin tone sample feature map and a fill sample feature map in the scene of FIG. 5A;
FIG. 6 is a schematic structural diagram of an embodiment of an image processing apparatus of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the image processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the image processing method or image processing apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user can use the terminal apparatuses 101, 102, 103 to interact with the server 105 through the network 104 to acquire image processing results and the like. Various client applications, such as an image composition application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the above-described electronic apparatuses. It may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may provide various image composition based services or applications. For example, the server 105 may process the reference image and the target person head image acquired from the terminal apparatuses 101, 102, 103, and generate a processing result (e.g., generate a composite image).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the image processing method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the image processing apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 shows a flowchart of an image processing method provided by an embodiment of the present disclosure, where the flowchart 200 includes the following steps:
step 201, acquiring a reference image and a target person head image.
In the present embodiment, the execution subject of the image processing method (e.g., the server 105 shown in fig. 1) may acquire the reference image and the target person head image. Wherein the reference image comprises a reference background and a reference person head. The reference image may be acquired by directly using an image sensor, for example, the image sensor may be a camera, or may be acquired from a local file storing a large number of images. For example, the reference image may be an image captured by a camera with the reference person as a target and the environment where the reference person is located as a background. The head image of the target person may be an image obtained by individually dividing a head region of a certain person from an image captured by the camera. Optionally, the reference image further comprises an exposed skin area other than the head of the reference person; illustratively, the reference image may include the neck and/or arms, etc. of the reference person in addition to the head of the reference person.
And 202, replacing the head of the reference person in the reference image by using the head image of the target person to obtain an image to be synthesized.
In this embodiment, after the execution main body acquires the reference image and the target person head image, it may replace the head of the reference person in the reference image with the target person head image to obtain an image to be synthesized, where the image to be synthesized includes a partial reference background, the head of the target person, and a region to be filled between the partial reference background and the head of the target person. In a specific implementation, considering that the head sizes, shapes and the like of different people differ, when replacing the head of the reference person, the head of the reference person together with the background within a preset distance or shape around the head may be removed in advance, and the target person head image may then be added at the position of the original reference person head, thereby obtaining the image to be synthesized. Here, the partial reference background refers to a part of the reference background in the reference image, for example, the background area remaining after the background within a preset distance or shape around the head of the reference person is removed from the reference background.
And 203, extracting the features of the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map.
In this embodiment, after obtaining the image to be synthesized, the executing entity may perform feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map. The feature extraction may use any existing extraction method, including, but not limited to, the HOG (histogram of oriented gradients) extraction algorithm, scale-invariant feature transform, neural network features, and the like. The skin color sample feature map characterizes skin color information of the reference person extracted from the reference image, and the filling sample feature map characterizes filling information, extracted from the reference image, for the region to be filled indicated by the image to be synthesized.
And step 204, generating a composite image based on the skin color sample characteristic diagram, the filling sample characteristic diagram and the image to be synthesized.
In this embodiment, after obtaining the skin color sample feature map and the filling sample feature map, the execution subject may generate a synthesized image based on the skin color sample feature map, the filling sample feature map, and the image to be synthesized. The execution subject may use a fusion network to fuse the image to be synthesized with the acquired skin color sample feature map and filling sample feature map to obtain the synthesized image, where the synthesized image includes the head of the target person, and the skin color of the target person is the same as that of the reference person.
The image processing method provided by this embodiment includes first replacing the head of a reference person in a reference image with the head of a target person to obtain an image to be synthesized, then performing feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map, and finally generating a synthesized image based on the skin color sample feature map, the filling sample feature map and the image to be synthesized, where the generated synthesized image is more natural and real.
With further continuing reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of an image processing method of the present disclosure. The image processing method comprises the following steps:
and 301, acquiring a reference image and a target person head image.
In this embodiment, the specific operation of step 301 has been described in detail in step 201 in the embodiment shown in fig. 2, and is not described herein again.
Step 302, performing five sense organs segmentation on the head of the reference person in the reference image to obtain a head mask, and taking the region except the head mask in the reference image as a reference background.
In the present embodiment, five sense organs segmentation means that an image is divided into regions corresponding to the respective facial organs (eyebrows, eyes, nose, mouth, ears), and the complete head region is then located from the divided regions. The head mask represents the region where the head is located. For example, in a specific implementation, the execution subject may extract the head region as a whole to obtain a binarized image as the head mask; specifically, the pixel values of the region where the head is located in the reference image may be set to 255, and the pixel values of the region other than the head may be set to 0, that is, the head is represented in white and the rest is filled in black. Optionally, in a specific implementation, before performing the five sense organs segmentation, the execution subject may further translate and scale the reference image according to the size of the target person head image, so that the size of the head of the reference person in the translated and scaled reference image is comparable to the size of the head of the target person.
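As a minimal illustrative sketch (not part of the original disclosure), the binarized head mask could be built from the label map produced by a face-parsing model roughly as follows; the `head_mask_from_parsing` function and the label ids treated as belonging to the head are assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical label map from a face-parsing model: each pixel holds a class id.
# The set of ids treated as "head" (skin, brows, eyes, nose, mouth, ears, hair)
# is an assumption for illustration only.
HEAD_LABELS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

def head_mask_from_parsing(label_map: np.ndarray) -> np.ndarray:
    """Return a binary head mask: 255 where the head is, 0 elsewhere."""
    mask = np.isin(label_map, list(HEAD_LABELS))
    return mask.astype(np.uint8) * 255
```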
And step 303, expanding the head mask to obtain an expanded region, wherein the area of the expanded region is larger than that of the image of the head of the target person.
In this embodiment, the expansion manner includes, but is not limited to, expanding outward from the outermost edge of the head by a preset distance with reference to the center of the head mask, or expanding the opposite side of the head by a preset distance with reference to one side of the head. For example, in the head mask obtained above, the execution subject may set to 255 the pixel values of the region outside the head whose distance from the outermost edge of the head is less than a preset value (for example, two millimeters), so as to obtain an expansion region whose area is larger than that of the original head. This embodiment does not limit the expansion size, and the above specific distance value is merely used for illustration.
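A minimal sketch of this expansion step using morphological dilation with OpenCV is shown below (an illustrative sketch only; the function name and the kernel radius standing in for the "preset distance" are assumptions):

```python
import cv2
import numpy as np

def expand_head_mask(head_mask: np.ndarray, radius_px: int = 15) -> np.ndarray:
    """Dilate the binary head mask (0/255, uint8) so the white head region
    grows outward by roughly radius_px pixels, giving the expansion region."""
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * radius_px + 1, 2 * radius_px + 1))
    return cv2.dilate(head_mask, kernel)
```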
And step 304, determining a region which does not intersect with the expansion region in the reference background as a part of the reference background, adding the head image of the target person to the expansion region, and determining a region which does not intersect with the head of the target person in the expansion region as a region to be filled.
In this embodiment, after obtaining the expansion region, the execution main body may map the expansion region onto the position corresponding to the reference person in the reference image by comparing the reference background with the expansion region; the expansion region may be mapped such that the original head region in the expansion region corresponds to the head region in the reference image, so that the region of the reference background that does not overlap the expansion region is obtained as the partial reference background. Further, after obtaining the expansion region, the execution main body may map the head of the target person into the expansion region. Since the expansion region is obtained by expanding the head of the reference person, the mapping of the target person head may be performed by aligning corresponding facial parts, for example, aligning the nose of the target person with the nose of the original reference person when adding the target person head portrait. Finally, the region of the expansion region that is not covered by the head of the target person is taken as the region to be filled.
And 305, integrating the head image of the target person, part of the reference background and the region to be filled to obtain an image to be synthesized.
In this embodiment, after the execution main body determines the area to be filled and the partial reference background, the execution main body integrates the obtained partial reference background, the area to be filled, and the head image of the target person to obtain an image to be synthesized, where a part of the background of the image to be synthesized surrounds the area to be filled, and the area to be filled surrounds the head image of the target person.
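The composition described in steps 302-305 could be sketched as follows (an illustrative sketch only, assuming already-aligned inputs; the function name, array layout, and the use of zero pixels to mark the region to be filled are assumptions, not the disclosed implementation):

```python
import numpy as np

def build_image_to_be_synthesized(reference_img, target_head_img,
                                  target_head_mask, expansion_mask):
    """Compose the image to be synthesized:
    - outside the expansion region: keep the reference background,
    - inside the target head mask: paste the target person's head,
    - the rest of the expansion region: left as the region to be filled (zeros here).
    Images are H x W x 3 uint8 arrays, masks are H x W arrays with values {0, 255}."""
    expansion = expansion_mask.astype(bool)
    head = target_head_mask.astype(bool)
    out = np.zeros_like(reference_img)
    out[~expansion] = reference_img[~expansion]   # partial reference background
    out[head] = target_head_img[head]             # target person head
    # pixels where expansion & ~head remain zero: the region to be filled
    return out
```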
Step 306, extracting the features of the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map;
and 307, generating a synthetic image based on the skin color sample characteristic diagram, the filling sample characteristic diagram and the image to be synthesized.
In the present embodiment, the specific operations of steps 306-307 have been described in detail in steps 203 and 204 in the embodiment shown in fig. 2, and are not described herein again.
According to the image processing method provided by the embodiment, the images to be synthesized are obtained by segmenting and expanding the reference image and integrating the segmented and expanded reference image with the head image of the target person, and the images to be synthesized are constructed by utilizing the reference image, so that the synthesized images are more real and natural.
With further continuing reference to fig. 4, fig. 4 shows a flow chart in yet another embodiment of an image processing method of the present disclosure, the image processing method comprising the steps of:
step 401, acquiring a reference image and a target person head image.
And 402, replacing the head of the reference person in the reference image by the head image of the target person to obtain an image to be synthesized.
In the present embodiment, the specific operations of steps 401 and 402 have been described in detail in step 201 and 202 in the embodiment shown in fig. 2, and are not described herein again.
And step 403, extracting the features of the reference image and the features of the image to be synthesized by using the feature extraction network.
In this embodiment, the feature extraction network may, without loss of generality, be a classical backbone network such as AlexNet, ZF Net, VGGNet, Inception, or ResNet. The feature extraction network adopts a dual-input, dual-output structure: specifically, the reference image and the image to be synthesized are used as the two inputs, and the features of the reference image and the features of the image to be synthesized are produced as the two outputs.
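A minimal PyTorch-style sketch of such a dual-input, dual-output extractor is shown below; the choice of a shared-weight ResNet-18 trunk and the layer at which it is cut are assumptions made for illustration, not the network prescribed by the disclosure:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualFeatureExtractor(nn.Module):
    """Shared-weight encoder producing feature maps for the reference image
    and the image to be synthesized (dual input, dual output)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # keep everything up to the last residual stage:
        # output feature maps of shape B x 512 x H/32 x W/32
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, reference_img: torch.Tensor, to_synthesize_img: torch.Tensor):
        return self.encoder(reference_img), self.encoder(to_synthesize_img)
```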
And step 404, extracting a skin color sample feature map and a filling sample feature map from the features of the reference image and the features of the image to be synthesized by adopting an attention mechanism.
In this embodiment, after the execution subject extracts the features of the reference image and the features of the image to be synthesized, the execution subject may extract skin color information of the reference person as a skin color sample feature map by combining the features of the reference image and the features of the image to be synthesized, and extract information of a region to be filled as a filling sample feature map by combining the features of the reference image and the features of the image to be synthesized, so as to provide richer fusion information for subsequent image synthesis.
Optionally, step 404 includes: determining the head features of the target person and the features of the region to be filled based on the features of the image to be synthesized; calculating an attention matrix from the head features of the target person and the features of the reference image to obtain a color attention feature map; multiplying the color attention feature map by the features of the reference image to obtain the skin color sample feature map; calculating an attention matrix from the features of the region to be filled and the features of the reference image to obtain a filled region attention feature map; and multiplying the filled region attention feature map by the features of the reference image to obtain the filling sample feature map.
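One way to realize the attention computation above is sketched below (an illustrative sketch, not the disclosed implementation; the flattening of spatial positions, the scaled softmax normalization, and the tensor shapes are assumptions). Calling the same routine once with the target-person head features and once with the region-to-be-filled features as queries against the reference-image features would yield the skin color sample feature map and the filling sample feature map, respectively:

```python
import torch

def attention_sample(query_feat: torch.Tensor, reference_feat: torch.Tensor) -> torch.Tensor:
    """query_feat:     B x C x Hq x Wq  (target-head or to-be-filled features)
    reference_feat: B x C x Hr x Wr  (features of the reference image)
    Returns a sampled feature map of shape B x C x Hq x Wq."""
    b, c, hq, wq = query_feat.shape
    q = query_feat.flatten(2).transpose(1, 2)            # B x (Hq*Wq) x C
    k = reference_feat.flatten(2)                        # B x C x (Hr*Wr)
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)       # attention matrix
    v = reference_feat.flatten(2).transpose(1, 2)        # B x (Hr*Wr) x C
    out = attn @ v                                       # B x (Hq*Wq) x C
    return out.transpose(1, 2).reshape(b, c, hq, wq)

# skin_color_sample_map = attention_sample(head_feat, ref_feat)
# fill_sample_map       = attention_sample(fill_region_feat, ref_feat)
```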
And step 405, performing color processing on the head image of the target person to obtain a head gray scale image.
In this embodiment, the color processing refers to converting the target person head image, which originally consists of three RGB channels, into a single-channel grayscale image, thereby obtaining the head grayscale map.
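For illustration, this conversion can be done with a single OpenCV call (a sketch; the RGB channel ordering of the input is an assumption):

```python
import cv2
import numpy as np

def to_head_grayscale(target_head_img: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB head image into a single-channel H x W grayscale map."""
    return cv2.cvtColor(target_head_img, cv2.COLOR_RGB2GRAY)
```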
And step 406, synthesizing the skin color sample feature map, the filling sample feature map, the head mask, the head gray scale map and part of the reference background to obtain a synthesized image.
In this embodiment, after the execution subject extracts the skin color sample feature map and the filling sample feature map, the execution subject sends the skin color sample feature map and the filling sample feature map, the head mask, the head grayscale map, and a part of the reference background into a pre-trained fusion network for fusion, so as to obtain a composite image.
Optionally, step 406 includes: stitching the skin color sample feature map, the filling sample feature map, the head mask, the head grayscale map, and the partial reference background to obtain a stitched map; and inputting the stitched map into a pre-trained fusion network for fusion to obtain the synthesized image. For example, fusion networks include, but are not limited to, a UNet network. Stitching refers to combining the information of each input along the channel dimension; for example, feature map 1 is a four-dimensional tensor denoted as B × C1 × W × H, feature map 2 is a four-dimensional tensor denoted as B × C2 × W × H, and stitching feature map 1 and feature map 2 along the channel dimension yields a stitched map denoted as B × (C1 + C2) × W × H. The number of channels is not limited in this embodiment; the channel counts here are merely used for illustration.
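A minimal sketch of this channel-wise stitching in PyTorch is shown below (the function name is an assumption, the mask and grayscale inputs are assumed to have already been converted to single-channel batch tensors, and the fusion network is only a placeholder):

```python
import torch

def splice_inputs(skin_color_map, fill_map, head_mask, head_gray, partial_background):
    """Concatenate all fusion-network inputs along the channel dimension.
    Feature maps: B x C1 x W x H and B x C2 x W x H; mask / grayscale: B x 1 x W x H;
    partial background: B x 3 x W x H.  Result: B x (C1 + C2 + 1 + 1 + 3) x W x H."""
    return torch.cat(
        [skin_color_map, fill_map, head_mask, head_gray, partial_background], dim=1)

# composite = fusion_net(splice_inputs(...))   # fusion_net: e.g. a UNet-style generator
```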
In this embodiment, the pre-trained fusion network uses a generator to produce a human body image with the specified pose, expression and ID: G(X_Input, X_Ref) = Y, where X_Input is the image to be synthesized, X_Ref is the reference image, and Y is the composite image output after fusion. The loss function used when training the fusion network comprises the following components:
(1) ID preservation loss. The intermediate features extracted by Arcface are aligned in a high-dimensional information space:
L_ID = ||Arcface(Y) - Arcface(X_GT)||_2
where X_GT is the ground truth obtained from the reference image.
(2) Image feature alignment loss. The intermediate features extracted by VGG19 are aligned in the high-dimensional information space:
L_VGG = ||VGG(Y) - VGG(X_GT)||_2
(3) Discriminator feature alignment loss. The intermediate features extracted by the discriminator D are aligned in the high-dimensional information space:
L_D = ||D(Y) - D(X_GT)||_2
(4) Discriminator loss. Adversarial training with the discriminator reduces artifacts in the generated images:
L_GAN = E[log D(X_GT)] + E[log(1 - D(Y))]
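The four terms above could be computed roughly as in the sketch below (an illustrative sketch under stated assumptions: `arcface`, `vgg_features`, `disc_features`, and `disc_score` are placeholder callables mapping an image batch to intermediate features or to a real/fake probability, not a specific library API; a squared-error form is used for the L2 alignment terms, and sign/weighting conventions during optimization are left out):

```python
import torch
import torch.nn.functional as F

def fusion_training_losses(Y, X_gt, arcface, vgg_features, disc_features, disc_score):
    """Sketch of the four training terms described above."""
    l_id  = F.mse_loss(arcface(Y), arcface(X_gt))              # (1) ID preservation loss
    l_vgg = F.mse_loss(vgg_features(Y), vgg_features(X_gt))    # (2) image feature alignment loss
    l_d   = F.mse_loss(disc_features(Y), disc_features(X_gt))  # (3) discriminator feature alignment loss
    # (4) adversarial term: L_GAN = E[log D(X_GT)] + E[log(1 - D(Y))],
    # assuming disc_score returns probabilities in (0, 1)
    l_gan = torch.log(disc_score(X_gt)).mean() + torch.log(1 - disc_score(Y)).mean()
    return l_id, l_vgg, l_d, l_gan
```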
in the image processing method provided by the embodiment, the attention mechanism is adopted to extract the skin color sample characteristic diagram and the filling sample characteristic diagram, so that the skin color sample characteristic diagram has more real skin texture compared with the average color information adopted by the fusion network, the reduction of the migration quality of skin color caused by the adoption of the average color information is avoided, meanwhile, the filling sample characteristic diagram avoids the information imagined by the network from being inconsistent with the original information of the region to be filled, more complete and reliable information is provided for the fusion network, and the image synthesis mode is enriched.
To facilitate understanding of the technical solution of the present disclosure, a head-swapping application scenario is taken as an example for a detailed description; please refer to fig. 5A and 5B. Fig. 5A illustrates an application scenario of the image processing method of the present disclosure, and fig. 5B is a schematic diagram of extracting the skin color sample feature map and the filling sample feature map in the application scenario of fig. 5A. In this application scenario, the reference image 1 includes the neck of the reference person in addition to the head of the reference person and the background. In the implementation process, the execution subject inputs the reference image 1 and the image to be synthesized 2 into the feature extraction network 3, and the feature extraction network 3 outputs the features 4 of the reference image and the features 5 of the image to be synthesized. Further, the execution subject applies attention feature extraction 6 to the features 4 of the reference image and the features 5 of the image to be synthesized to obtain the skin color sample feature map 7 and the filling sample feature map 8. Referring to fig. 5B, the features 5 of the image to be synthesized include the head features 5a of the target person and the features 5b of the region to be filled; the execution subject calculates attention matrices from the head features 5a of the target person and the features 5b of the region to be filled together with the features 4 of the reference image, and multiplies them by the features 4 of the reference image to obtain the skin color sample feature map 7 and the filling sample feature map 8. Referring again to fig. 5A, after the execution subject acquires the head mask 9, the partial reference background 10, and the head grayscale image 11 based on the image to be synthesized 2, it stitches these three with the skin color sample feature map 7 and the filling sample feature map 8 obtained in the previous step and inputs them into the pre-trained fusion network 12, and the pre-trained fusion network 12 outputs the synthesized image 13.
In this embodiment, the synthesized image obtained by the above method takes the target person as its subject, uses the background of the reference image, and further includes the neck of the reference person from the reference image. Because the skin color of the target person in the synthesized image is the same as the skin color of the reference person, there is no skin color difference between the head of the target person and the neck in the image, and the region where the head meets the background is more real and natural, thereby improving the attractiveness of the synthesized image.
Referring further to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image processing apparatus, and this apparatus embodiment corresponds to the method embodiment shown in fig. 2. The apparatus can be applied to various electronic devices.
As shown in fig. 6, the image processing apparatus 600 of the present embodiment may include: an acquisition module 601, a replacement module 602, a feature extraction module 603, and an image generation module 604. The acquisition module 601 is configured to acquire a reference image and a target person head image, where the reference image includes a reference background and the head of a reference person; the replacement module 602 is configured to replace the head of the reference person in the reference image with the target person head image to obtain an image to be synthesized, where the image to be synthesized includes a partial reference background, the head of the target person, and a region to be filled between the partial reference background and the head of the target person; the feature extraction module 603 is configured to perform feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map; and the image generation module 604 is configured to generate a composite image based on the skin color sample feature map, the filling sample feature map, and the image to be synthesized.
In the present embodiment, in the image processing apparatus 600: the specific processing and the technical effects thereof of the obtaining module 601, the replacing module 602, the feature extracting module 603 and the image generating module 604 can refer to the related descriptions of step 201 and step 204 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the feature extraction module 603 includes:
a first extraction module configured to extract features of a reference image and features of an image to be synthesized using a feature extraction network;
and the second extraction module is configured to adopt an attention mechanism to extract a skin color sample feature map and a filling sample feature map from the features of the reference image and the features of the image to be synthesized.
In some optional implementations of this embodiment, the second extracting module includes:
the characteristic determination module is configured to determine the head characteristic of the target person and the characteristic of the region to be filled based on the characteristic of the image to be synthesized;
the first calculation module is configured to calculate an attention matrix by using the head characteristics of the target person and the characteristics of the reference image to obtain a color attention characteristic map;
the first multiplication module is configured to multiply the color attention feature map and the features of the reference image to obtain a skin color sample feature map;
the second calculation module is configured to calculate an attention matrix by using the characteristics of the region to be filled and the characteristics of the reference image to obtain a filled region attention characteristic map;
and the second multiplying module is configured to multiply the filled region attention feature map and the features of the reference image to obtain a filled sample feature map.
In some optional implementations of this embodiment, the replacing module 602 includes:
the segmentation module is configured to perform five-sense organ segmentation on the head of a reference person in the reference image to obtain a head mask, and the region except the head mask in the reference image is used as a reference background;
the expansion module is configured to expand the head mask to obtain an expansion area, wherein the area of the expansion area is larger than that of the image of the head of the target person;
the region determining module is configured to determine a region, which does not intersect with the expansion region, in the reference background as a partial reference background, add the head image of the target person to the expansion region, and determine a region, which does not intersect with the head of the target person, in the expansion region as a region to be filled;
and the integration module is configured to integrate the head image of the target person, part of the reference background and the region to be filled to obtain an image to be synthesized.
In some optional implementations of this embodiment, the image generating module 604 includes:
the color processing module is configured to perform color processing on the head image of the target person to obtain a head gray scale image;
and the synthesis module is configured to synthesize the skin color sample feature map, the filling sample feature map, the head mask, the head gray scale map and part of the reference background to obtain a synthesized image.
In some optional implementations of this embodiment, the synthesizing module includes:
the splicing module is configured to splice the skin color sample characteristic graph, the filling sample characteristic graph, the head mask, the head gray-scale graph and part of the reference background to obtain a spliced graph;
and the fusion module is configured to input the mosaic into a fusion network trained in advance for fusion to obtain a composite image.
In some optional implementations of the present embodiment, the reference image further includes an exposed skin area other than the head of the reference person.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. An image processing method comprising:
acquiring a reference image and a target person head image, wherein the reference image comprises a reference background and a reference person head;
replacing the head of a reference person in the reference image with the head image of the target person to obtain an image to be synthesized, wherein the image to be synthesized comprises a part of reference background, the head of the target person and a region to be filled between the part of reference background and the head of the target person;
performing feature extraction on the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map;
and generating a composite image based on the skin color sample characteristic diagram, the filling sample characteristic diagram and the image to be synthesized.
2. The method according to claim 1, wherein the extracting features of the reference image and the image to be synthesized to obtain a skin color sample feature map and a filling sample feature map comprises:
extracting the features of the reference image and the features of the image to be synthesized by using a feature extraction network;
and adopting an attention mechanism to extract a skin color sample feature map and a filling sample feature map from the features of the reference image and the features of the image to be synthesized.
3. The method according to claim 2, wherein the extracting a skin color sample feature map and a filling sample feature map from the extracted features of the reference image and the image to be synthesized by using an attention mechanism comprises:
determining the head characteristics and the characteristics of the region to be filled of the target person based on the characteristics of the image to be synthesized;
calculating an attention matrix by using the head characteristics of the target person and the characteristics of the reference image to obtain a color attention characteristic diagram;
multiplying the color attention feature map and the features of the reference image to obtain the skin color sample feature map;
calculating an attention matrix by using the characteristics of the region to be filled and the characteristics of the reference image to obtain a filled region attention characteristic diagram;
and multiplying the attention feature map of the filling area with the features of the reference image to obtain the feature map of the filling sample.
4. The method according to any one of claims 1 to 3, wherein the replacing the head of the reference person in the reference image with the head image of the target person to obtain the image to be synthesized comprises:
performing five-sense organ segmentation on the head of the reference person in the reference image to obtain a head mask, and taking the region of the reference image except the head mask as the reference background;
expanding the head mask to obtain an expanded region, wherein the area of the expanded region is larger than that of the target person head image;
determining a region of the reference background, which does not intersect with the expansion region, as the partial reference background, adding the target person head image to the expansion region, and determining a region of the expansion region, which does not intersect with the target person head, as the region to be filled;
and integrating the head image of the target person, the partial reference background and the region to be filled to obtain the image to be synthesized.
5. The method of claim 4, wherein the generating a composite image based on the skin color exemplar feature map, the fill exemplar feature map, and the image to be composite comprises:
performing color processing on the head image of the target person to obtain a head gray scale image;
and synthesizing the skin color sample feature map, the filling sample feature map, the head mask, the head gray scale map and the partial reference background to obtain the synthesized image.
6. The method according to claim 5, wherein the synthesizing the skin color sample feature map, the filling sample feature map, the head mask, the head gray scale map and the partial reference background to obtain the synthesized image comprises:
splicing the skin color sample characteristic graph, the filling sample characteristic graph, the head mask, the head gray level graph and the partial reference background to obtain a spliced graph;
inputting the mosaic image into a pre-trained fusion network for fusion to obtain the composite image.
7. The method of claim 1, wherein the reference image further comprises an exposed area of skin other than the reference person's head.
8. An image processing apparatus comprising:
an acquisition module configured to acquire a reference image and a target person head image, wherein the reference image comprises a reference background and a reference person head;
the replacing module is configured to replace the head of the reference person in the reference image by using the head image of the target person to obtain an image to be synthesized, wherein the image to be synthesized comprises a partial reference background, the head of the target person and a region to be filled between the partial reference background and the head of the target person;
the characteristic extraction module is configured to extract the characteristics of the reference image and the image to be synthesized to obtain a skin color sample characteristic diagram and a filling sample characteristic diagram;
and the image generation module is configured to generate a composite image based on the skin color sample feature map, the filling sample feature map and the image to be synthesized.
9. The apparatus of claim 8, wherein the feature extraction module comprises:
a first extraction module configured to extract features of the reference image and features of the image to be synthesized using a feature extraction network;
and the second extraction module is configured to extract a skin color sample feature map and a filling sample feature map from the features of the reference image and the features of the image to be synthesized by adopting an attention mechanism.
10. The apparatus of claim 9, wherein the second extraction module comprises:
a feature determination module configured to determine features of the target person head and features of the region to be filled based on the features of the image to be synthesized;
a first calculation module configured to compute an attention matrix from the features of the target person head and the features of the reference image to obtain a color attention feature map;
a first multiplication module configured to multiply the color attention feature map with the features of the reference image to obtain the skin color sample feature map;
a second calculation module configured to compute an attention matrix from the features of the region to be filled and the features of the reference image to obtain a filled region attention feature map;
and a second multiplication module configured to multiply the filled region attention feature map with the features of the reference image to obtain the filling sample feature map.
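Claim 10 computes an attention matrix between features of the image to be synthesized (the target head features or the features of the region to be filled) and features of the reference image, then multiplies the resulting attention map with the reference features. The sketch below shows one plausible reading of that step as scaled dot-product attention over flattened spatial positions; the scaling, the softmax normalization and the function name are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def attention_sample(query_feat: torch.Tensor, reference_feat: torch.Tensor) -> torch.Tensor:
    """
    query_feat:     (N, C, Hq, Wq) features of the target head or of the region to be filled.
    reference_feat: (N, C, Hr, Wr) features of the reference image.
    Returns a sample feature map of shape (N, C, Hq, Wq).
    """
    n, c, hq, wq = query_feat.shape

    q = query_feat.flatten(2).transpose(1, 2)      # (N, Hq*Wq, C)
    k = reference_feat.flatten(2)                  # (N, C, Hr*Wr)
    v = reference_feat.flatten(2).transpose(1, 2)  # (N, Hr*Wr, C)

    attn = torch.bmm(q, k) / (c ** 0.5)            # attention matrix
    attn = F.softmax(attn, dim=-1)                 # attention map over reference positions

    out = torch.bmm(attn, v)                       # reference features weighted by attention
    return out.transpose(1, 2).reshape(n, c, hq, wq)
```

Under this reading, calling the function with the target head features would yield the skin color sample feature map, and calling it with the features of the region to be filled would yield the filling sample feature map.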
11. The apparatus according to any one of claims 8-10, wherein the replacement module comprises:
a segmentation module configured to perform facial feature segmentation on the head of the reference person in the reference image to obtain a head mask, and to use a region of the reference image other than the head mask as the reference background;
an expansion module configured to expand the head mask to obtain an expansion region, wherein the area of the expansion region is larger than that of the target person head image;
a region determination module configured to determine the region of the reference background that does not intersect the expansion region as the partial reference background, add the target person head image to the expansion region, and determine the region of the expansion region that does not intersect the target person head as the region to be filled;
and an integration module configured to integrate the target person head image, the partial reference background and the region to be filled to obtain the image to be synthesized.
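Claim 11's expansion and region determination can be pictured as a morphological dilation of the head mask followed by set operations between binary masks. The sketch below uses OpenCV and NumPy; the kernel size, the assumption that the target head is already aligned to the reference frame, and all function and variable names are illustrative choices rather than the patented procedure.

```python
import cv2
import numpy as np

def build_image_to_be_synthesized(reference_img: np.ndarray,
                                  head_mask: np.ndarray,
                                  target_head_img: np.ndarray,
                                  target_head_mask: np.ndarray,
                                  kernel_size: int = 15):
    """
    reference_img:    (H, W, 3) uint8 reference image.
    head_mask:        (H, W) uint8 mask of the reference person head (1 inside, 0 outside).
    target_head_img:  (H, W, 3) uint8 target head image aligned to the reference frame.
    target_head_mask: (H, W) uint8 mask of the target person head in the same frame.
    """
    # Expand the reference head mask so the expansion region exceeds the target head area.
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    expansion_region = cv2.dilate(head_mask, kernel, iterations=1)

    # Partial reference background: background pixels that do not intersect the expansion region.
    partial_background = reference_img * (1 - expansion_region)[..., None]

    # Region to be filled: inside the expansion region but outside the target head.
    region_to_fill = expansion_region * (1 - target_head_mask)

    # Integrate target head and partial background; the region to be filled stays empty for now.
    image_to_be_synthesized = partial_background + target_head_img * target_head_mask[..., None]
    return image_to_be_synthesized, partial_background, region_to_fill
```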
12. The apparatus of claim 11, wherein the image generation module comprises:
a color processing module configured to perform color processing on the target person head image to obtain a head grayscale map;
and a synthesis module configured to synthesize the skin color sample feature map, the filling sample feature map, the head mask, the head grayscale map and the partial reference background to obtain the composite image.
13. The apparatus of claim 12, wherein the synthesis module comprises:
a concatenation module configured to concatenate the skin color sample feature map, the filling sample feature map, the head mask, the head grayscale map and the partial reference background to obtain a concatenated map;
and a fusion module configured to input the concatenated map into a pre-trained fusion network for fusion to obtain the composite image.
14. The apparatus of claim 8, wherein the reference image further comprises an exposed area of skin other than the reference person's head.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202111325365.8A 2021-11-10 2021-11-10 Image processing method, device, equipment and storage medium Pending CN114049290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111325365.8A CN114049290A (en) 2021-11-10 2021-11-10 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111325365.8A CN114049290A (en) 2021-11-10 2021-11-10 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114049290A (en) 2022-02-15

Family

ID=80207957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111325365.8A Pending CN114049290A (en) 2021-11-10 2021-11-10 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114049290A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050286786A1 (en) * 2004-06-17 2005-12-29 Reiko Noda Apparatus and method for coding image based on level of visual attention and level of perceivable image quality distortion, and computer program product therefor
CN101263721A (en) * 2005-07-13 2008-09-10 日本电气株式会社 Color correction method and color correction device
US20070031032A1 (en) * 2005-08-05 2007-02-08 Samsung Electronics Co., Ltd. Method and apparatus for performing conversion of skin color into preference color by applying face detection and skin area detection
CN109784301A (en) * 2019-01-28 2019-05-21 广州酷狗计算机科技有限公司 Image processing method, device, computer equipment and storage medium
CN111027382A (en) * 2019-11-06 2020-04-17 华中师范大学 Attention mechanism-based lightweight face detection method and model
CN111063008A (en) * 2019-12-23 2020-04-24 北京达佳互联信息技术有限公司 Image processing method, device, equipment and storage medium
CN112967355A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Image filling method and device, electronic device and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RENWANG CHEN ET AL: "SimSwap: An Efficient Framework For High Fidelity Face Swapping", MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 16 October 2020 (2020-10-16) *
刘喜荣; 田启川: "Research on face detection method based on skin color model and template matching", Journal of Taiyuan University of Science and Technology, no. 05, 15 October 2010 (2010-10-15) *
邓波; 吴炜; 滕奇志: "Face region pre-detection based on visual attention mechanism", Video Engineering, no. 07, 17 July 2010 (2010-07-17) *
陈令; 汪亚明: "Face segmentation based on skin color smoothness", Journal of Test and Measurement Technology, no. 01, 15 March 2004 (2004-03-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385641A (en) * 2023-03-29 2023-07-04 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium
CN116385641B (en) * 2023-03-29 2024-03-19 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium
CN117291979A (en) * 2023-09-26 2023-12-26 北京鹰之眼智能健康科技有限公司 Ear hole positioning method, electronic equipment and storage medium
CN117291979B (en) * 2023-09-26 2024-04-26 北京鹰之眼智能健康科技有限公司 Ear hole positioning method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
US10599914B2 (en) Method and apparatus for human face image processing
CN106682632B (en) Method and device for processing face image
EP3876204A2 (en) Method and apparatus for generating human body three-dimensional model, device and storage medium
CN114187624B (en) Image generation method, device, electronic equipment and storage medium
CN108388889B (en) Method and device for analyzing face image
CN113221771A (en) Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product
CN112330527A (en) Image processing method, image processing apparatus, electronic device, and medium
US20230036338A1 (en) Method and apparatus for generating image restoration model, medium and program product
US20230047748A1 (en) Method of fusing image, and method of training image fusion model
CN110874575A (en) Face image processing method and related equipment
CN113052962A (en) Model training method, information output method, device, equipment and storage medium
CN113379877B (en) Face video generation method and device, electronic equipment and storage medium
KR20170002097A (en) Method for providing ultra light-weight data animation type based on sensitivity avatar emoticon
CN114049290A (en) Image processing method, device, equipment and storage medium
CN116309983B (en) Training method and generating method and device of virtual character model and electronic equipment
CN112884889B (en) Model training method, model training device, human head reconstruction method, human head reconstruction device, human head reconstruction equipment and storage medium
CN115147261A (en) Image processing method, device, storage medium, equipment and product
CN113380269B (en) Video image generation method, apparatus, device, medium, and computer program product
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN114066790A (en) Training method of image generation model, image generation method, device and equipment
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN116402914B (en) Method, device and product for determining stylized image generation model
US20220198828A1 (en) Method and apparatus for generating image
CN112785524B (en) Character image restoration method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination