CN114565508B - Virtual reloading method and device - Google Patents

Virtual reloading method and device

Info

Publication number
CN114565508B
CN114565508B (granted from application CN202210049595.4A)
Authority
CN
China
Prior art keywords
image
neck
clothing
area
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210049595.4A
Other languages
Chinese (zh)
Other versions
CN114565508A (en)
Inventor
苗锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing New Oxygen World Wide Technology Consulting Co ltd
Original Assignee
Soyoung Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Soyoung Technology Beijing Co Ltd filed Critical Soyoung Technology Beijing Co Ltd
Priority to CN202210049595.4A
Publication of CN114565508A
Application granted
Publication of CN114565508B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 3/00 Geometric image transformations in the plane of the image
            • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
          • G06T 5/00 Image enhancement or restoration
            • G06T 5/77 Retouching; Inpainting; Scratch removal
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
              • G06T 2207/20092 Interactive image processing based on input by user
                • G06T 2207/20104 Interactive definition of region of interest [ROI]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30196 Human being; Person
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/25 Fusion techniques
                • G06F 18/253 Fusion techniques of extracted features
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a virtual reloading (virtual outfit-changing) method, apparatus, device and storage medium. The method includes: acquiring a user image and a clothing template image; acquiring neck key points corresponding to the user image and clothing key points corresponding to the clothing template image; and detecting whether the user's neck is occluded. If it is, a changing-effect image is generated using the neck region in the clothing template image according to the neck key points and the clothing key points; if it is not, the changing-effect image is generated using the neck region in the user image according to the neck key points and the clothing key points. When the user's neck is not occluded, the user's neck region is repaired, the clothing region of the clothing template image is deformed, the user's hair region is deformed, and the user's own neck region is used for the outfit change. When the user's neck is occluded, the color of the model's neck region in the clothing template image is transferred, the model's neck and clothing regions are deformed, the user's hair region is deformed, and the model's neck region is used for the outfit change.

Description

Virtual reloading method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a virtual reloading method and device.
Background
Virtual reloading (virtual outfit changing) means combining a user image with a clothing template image to obtain a new image of the user wearing the clothing shown in the clothing template image.
In the related art, virtual reloading methods generally train a virtual reloading model on a large amount of data and perform the outfit change with the trained model. However, training such a model requires a large amount of computation, the model tends to depend on its training data set, and when the test data differ greatly from the training data the reloading effect produced by the model is poor.
Disclosure of Invention
The application provides a virtual reloading method and device in which a changing-effect image corresponding to a user image and a clothing template image is generated from the neck key points of the user image and the clothing key points of the clothing template image. The virtual reloading is carried out only by processing points and regions in the images, so no end-to-end reloading model needs to be trained; this reduces the amount of computation, removes the dependence on training data, improves the accuracy of the virtual reloading, and makes the changing effect more natural.
An embodiment of a first aspect of the present application provides a virtual reloading method, including:
acquiring a user image and a clothes template image to be reloaded;
acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image;
detecting whether a neck in the user image is occluded;
if the neck in the user image is blocked, generating a suit changing effect image corresponding to the user image and the suit template image by using the neck region in the suit template image according to the neck key point and the suit key point;
and if the neck in the user image is not blocked, generating a suit changing effect picture corresponding to the user image and the suit template image by using the neck region in the user image according to the neck key point and the suit key point.
In some embodiments of the present application, the generating a change effect map corresponding to the user image and the clothing template image by using a neck region in the user image according to the neck key point and the clothing key point includes:
repairing a neck region in the user image according to the neck key point, the clothes key point, the user image and the clothes template image;
according to the neck key points and the clothes key points, deformation processing is carried out on the clothes area in the clothes template image;
and covering the clothing area of the deformed clothing template image into the repaired user image to obtain a changing effect image.
In some embodiments of the present application, the repairing the neck region in the user image according to the neck keypoint, the clothing keypoint, the user image, and the clothing template image includes:
generating a mask image of a to-be-repaired area of the neck in the user image according to the neck key points, the clothes key points, the user image and the clothes template image;
if the area of the area to be repaired in the mask image of the area to be repaired is larger than 0, generating a background repair image corresponding to the user image;
and repairing the neck area in the user image according to the mask image of the area to be repaired and the background repair image.
In some embodiments of the present application, the generating a mask image of a to-be-repaired area of a neck in the user image according to the neck key point, the clothing key point, the user image, and the clothing template image includes:
acquiring a neck mask image corresponding to a neck region in the user image according to the user image;
acquiring a clothing mask image corresponding to the clothing template image according to the neck key point, the clothing key point and the clothing template image;
and generating a mask image of a to-be-repaired area of the neck in the user image according to the neck mask image and the clothing mask image.
In some embodiments of the present application, the obtaining, according to the user image, a neck mask image corresponding to a neck region in the user image includes:
detecting all face key points in the user image;
according to the face key points, carrying out face alignment processing on the user image;
segmenting a neck region from the aligned user image through a preset semantic segmentation model;
and generating a neck mask image corresponding to the neck area.
In some embodiments of the present application, the obtaining, according to the neck key point, the clothing key point and the clothing template image, a clothing mask image corresponding to the clothing template image includes:
aligning the clothing key points in the clothing template image with the neck key points;
and determining the aligned clothing template image as a clothing mask image.
In some embodiments of the present application, the generating a mask image of a to-be-repaired area of a neck in the user image according to the neck mask image and the garment mask image includes:
splicing the neck mask image and the clothing mask image according to the neck key points in the neck mask image and the clothing key points in the clothing mask image to obtain a spliced mask image;
determining a region to be detected from the edge of the collar of the garment to the upper edge of the neck region in the splicing mask image;
determining a region to be repaired from the region to be detected;
and generating a mask image of the area to be repaired corresponding to the area to be repaired.
In some embodiments of the present application, the determining a region to be repaired from the region to be detected includes:
traversing the attribution image of each pixel point in the area to be detected;
screening all pixel points of which the attributive image is neither the neck mask image nor the clothing mask image;
and determining the area formed by all the screened pixel points as the area to be repaired.
In some embodiments of the present application, the determining a region to be repaired from the region to be detected includes:
calculating, for each column of pixels in the region to be detected, the difference between the vertical coordinate of the collar-edge pixel and the vertical coordinate of the pixel at the lower edge of the neck region in that column;
marking the collar-edge pixels and the neck-lower-edge pixels whose difference is larger than zero;
and determining the area surrounded by the connecting lines of all the marked pixel points as the area to be repaired.
In some embodiments of the present application, the generating a background restoration map corresponding to the user image includes:
extracting a main color of a neck region in the user image;
drawing a pure color background picture corresponding to the main color;
cutting out an image of the head and neck region from the user image;
and covering the image of the head and neck region into the pure-color background image to obtain a background restoration image corresponding to the user image.
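As an illustrative sketch only (not part of the claimed method), the background restoration map described above can be assembled with NumPy as follows; the function and mask names are assumptions, and the previously extracted dominant color is simply passed in as a parameter.

import numpy as np

def background_repair_map(user_image, head_neck_mask, base_color):
    # base_color: dominant color of the neck region (see the sketch after the next step);
    # head_neck_mask: H x W mask that is non-zero over the head and neck region
    background = np.zeros_like(user_image)
    background[:] = base_color                          # solid-color backdrop in the dominant color
    background[head_neck_mask > 0] = user_image[head_neck_mask > 0]   # paste the head and neck back in
    return background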
In some embodiments of the present application, the extracting a dominant color of a neck region in the user image includes:
determining all color values contained in a neck region of the user image;
counting the number of pixel points corresponding to each color value in the neck region;
and determining the color value with the largest number of pixel points as the main color of the neck area.
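A minimal sketch of the dominant-color statistics described above, assuming the neck region is given as a binary mask; this is an illustration, not the patented implementation.

import numpy as np

def dominant_neck_color(image, neck_mask):
    # image: H x W x 3 color image; neck_mask: H x W mask, non-zero inside the neck region
    pixels = image[neck_mask > 0]                       # all pixel values of the neck region
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return colors[np.argmax(counts)]                    # color value with the largest pixel count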
In some embodiments of the present application, the repairing a neck region in the user image according to the mask image of the region to be repaired and the background repair map includes:
and inputting the mask image of the area to be repaired and the background repair image into a preset image repair network to obtain a repaired image corresponding to the user image.
In some embodiments of the application, the generating a change-garment effect map corresponding to the user image and the garment template image by using the neck region in the garment template image according to the neck key point and the garment key point includes:
performing color transfer processing on a neck region in the clothing template image according to the user image;
according to the neck key points, performing deformation processing on a neck area in the clothing template image;
according to the neck key points and the clothes key points, deformation processing is carried out on the clothes area in the clothes template image;
and covering the neck area and the clothing area of the deformed clothing template image in the user image to obtain a changing effect image.
In some embodiments of the present application, the performing, according to the user image, color migration processing on a neck region in the clothing template image includes:
extracting a face skin dominant color and a first neck skin dominant color in the user image;
and adjusting the color of the neck area of the clothing template image according to the face skin main color and the first neck skin main color.
In some embodiments of the present application, the adjusting the color of the neck region of the garment template image according to the face skin dominant color and the first neck skin dominant color comprises:
calculating a ratio of a first neck area of the user image to a face area of the user image, and determining whether the ratio is within a preset interval;
if so, fusing the main color of the face skin and the main color of the first neck skin, and adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the fused main colors;
if not, and if the ratio is larger than the upper limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the first neck skin main color;
if not, and the ratio is smaller than the lower limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the main color of the facial skin.
In some embodiments of the present application, the fusing the face skin dominant color and the first neck skin dominant color, and adjusting a UV channel value of each pixel point in a neck region of the clothing template image according to the fused dominant color, includes:
converting the color spaces of the main color of the face skin and the main color of the first neck skin into YUV color spaces respectively, and acquiring a UV channel value of the main color of the face skin and a UV channel value of the main color of the first neck skin in the YUV color spaces;
determining a UV channel value of a fusion main color according to the UV channel value and the corresponding weight coefficient of the main color of the face skin, and the UV channel value and the corresponding weight coefficient of the main color of the first neck skin;
and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the fused main color.
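The UV-channel fusion and replacement can be sketched with OpenCV as below; the equal weight coefficients are an assumption, since the embodiment only states that each dominant color is combined with a corresponding weight coefficient.

import cv2
import numpy as np

def transfer_fused_uv(template, neck_mask, face_color, neck_color, w_face=0.5, w_neck=0.5):
    # face_color / neck_color: dominant BGR colors; w_face / w_neck: assumed weight coefficients
    def to_yuv(color):
        return cv2.cvtColor(np.uint8([[color]]), cv2.COLOR_BGR2YUV)[0, 0]
    uv_face, uv_neck = to_yuv(face_color)[1:], to_yuv(neck_color)[1:]
    uv_fused = (w_face * uv_face + w_neck * uv_neck).astype(np.uint8)   # UV of the fused dominant color

    yuv = cv2.cvtColor(template, cv2.COLOR_BGR2YUV)
    yuv[neck_mask > 0, 1:] = uv_fused                   # keep Y, replace U and V in the neck region
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)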
In some embodiments of the present application, adjusting the UV channel value of each pixel point in the neck region of the garment template image according to the first neck skin dominant color comprises:
converting the color space of the first neck skin main color into a YUV color space, and acquiring a UV channel value of the first neck skin main color in the YUV color space;
converting the color space of the clothing template image into a YUV color space;
and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the first neck skin main color.
In some embodiments of the present application, adjusting the UV channel value of each pixel point in the neck region of the clothing template image according to the dominant color of the facial skin includes:
converting the color space of the main color of the facial skin into a YUV color space, and acquiring a UV channel value of the main color of the facial skin in the YUV color space;
converting the color space of the clothing template image into a YUV color space;
and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the main color of the facial skin.
In some embodiments of the present application, before the adjusting the color of the neck region of the garment template image according to the face skin dominant color and the first neck skin dominant color, the method further includes:
extracting a second neck skin dominant color of the garment template image;
and adjusting the brightness of each pixel point in the neck area of the clothing template image according to the brightness value of the second neck skin main color and the brightness value of the face skin main color.
In some embodiments of the present application, adjusting the brightness of each pixel point in the neck region of the clothing template image according to the brightness value of the second neck skin main color and the brightness value of the face skin main color includes:
determining a brightness adjustment parameter according to the brightness value of the second neck skin main color and the brightness value of the face skin main color;
and adjusting the brightness of each pixel point in the neck region of the clothing template image based on the brightness adjusting parameter.
In some embodiments of the present application, the determining a brightness adjustment parameter according to the brightness value of the second neck skin main color and the brightness value of the face skin main color includes:
converting the color space of both the second neck skin dominant color and the facial skin dominant color to an HSV color space;
respectively acquiring a first brightness value of the main color of the face skin and a second brightness value of the main color of the second neck skin in an HSV color space;
and calculating a brightness adjusting parameter according to the second brightness value and the first brightness value.
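The embodiment does not give the exact formula for the brightness adjustment parameter, so the sketch below assumes a simple ratio of the two HSV brightness (V) values that is then applied to the neck region of the garment template.

import cv2
import numpy as np

def match_neck_brightness(template, neck_mask, face_color, template_neck_color):
    # face_color: dominant face-skin color of the user image; template_neck_color: second
    # neck-skin dominant color of the garment template (both BGR triples)
    def v_of(color):
        return float(cv2.cvtColor(np.uint8([[color]]), cv2.COLOR_BGR2HSV)[0, 0, 2])
    gain = v_of(face_color) / max(v_of(template_neck_color), 1.0)   # assumed adjustment parameter

    hsv = cv2.cvtColor(template, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[neck_mask > 0, 2] = np.clip(hsv[neck_mask > 0, 2] * gain, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)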
In some embodiments of the present application, the deforming the neck region in the clothing template image according to the neck key point includes:
acquiring a neck key point of a neck area in the clothing template image;
determining a coordinate mapping matrix before and after deformation of a neck region in the clothing template image according to the neck key point corresponding to the user image and the neck key point corresponding to the clothing template image;
and according to the coordinate mapping matrix before and after the neck region is deformed, deforming the neck region in the clothing template image.
In some embodiments of the present application, the deforming the clothing region in the clothing template image according to the neck key point and the clothing key point includes:
determining a coordinate mapping matrix before and after deformation of a clothing region in the clothing template image according to the neck key point and the clothing key point;
and carrying out deformation processing on the clothing area in the clothing template image according to the coordinate mapping matrix.
In some embodiments of the present application, the determining a coordinate mapping matrix before and after deformation of a clothing region in the clothing template image according to the neck key point and the clothing key point includes:
calculating an abscissa mapping matrix before and after deformation of the clothing region in the clothing template image according to the neck key point and the clothing key point;
and calculating a longitudinal coordinate mapping matrix before and after deformation of the clothing region according to the neck key points and the clothing key points.
In some embodiments of the present application, the calculating an abscissa mapping matrix before and after deformation of the clothing region in the clothing template image according to the neck key point and the clothing key point includes:
dividing the width of the user image into a plurality of sections of first abscissa intervals along the horizontal direction according to the abscissa of each neck key point;
dividing the width of the clothes template image into a plurality of sections of second abscissa intervals along the horizontal direction according to the abscissa of each clothes key point, wherein the number of the first abscissa intervals is equal to that of the second abscissa intervals;
and calculating an abscissa mapping matrix corresponding to the clothing region in the clothing template image by utilizing a linear interpolation and a deformation coordinate mapping function according to the plurality of first abscissa intervals and the plurality of second abscissa intervals.
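A sketch of the horizontal mapping built from matched abscissa intervals: the destination (user-image) columns are mapped onto source (template) columns by piecewise-linear interpolation between corresponding key-point abscissas. The variable names and the use of np.interp are assumptions about the "linear interpolation and deformation coordinate mapping function".

import numpy as np

def build_abscissa_map(height, width, user_x_breaks, garment_x_breaks):
    # user_x_breaks / garment_x_breaks: matched, sorted key-point abscissas (including the image
    # borders) that split the user image and the garment template into equal numbers of intervals
    dst_x = np.arange(width, dtype=np.float32)
    src_x = np.interp(dst_x, user_x_breaks, garment_x_breaks).astype(np.float32)
    return np.tile(src_x, (height, 1))                  # (H, W) matrix of source abscissas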
In some embodiments of the present application, the calculating a vertical coordinate mapping matrix before and after the deformation of the clothing region according to the neck key point and the clothing key point includes:
calculating a scaling coefficient of a vertical coordinate corresponding to each horizontal coordinate in the clothing area according to the neck key point and the clothing key point;
and calculating a vertical coordinate mapping matrix corresponding to the clothing area by using a deformation coordinate mapping function according to the height of the clothing template image, the vertical coordinate of each coordinate point of the clothing area and the scaling coefficient corresponding to each vertical coordinate.
In some embodiments of the present application, the calculating a scaling factor of a ordinate corresponding to each abscissa in the clothing region according to the neck key point and the clothing key point includes:
dividing the width of the user image into a plurality of first abscissa intervals along the horizontal direction according to the abscissa of each neck key point;
respectively calculating a scaling coefficient corresponding to the vertical coordinate of each clothes key point according to the height of the clothes template image, the neck key point and the clothes key point;
and calculating the scaling coefficient of the ordinate corresponding to each abscissa in the clothing area by utilizing linear interpolation and a deformation coordinate mapping function according to the scaling coefficients corresponding to the first abscissa intervals and the ordinate of each clothing key point.
In some embodiments of the present application, before calculating, according to the neck key point and the clothing key point, a scaling factor of a ordinate corresponding to each abscissa in the clothing region, the method further includes:
carrying out integral scaling processing on the clothing region in the clothing template image, wherein after scaling, the key point with the maximum vertical coordinate on the neckline boundary line in the clothing template image coincides with the key point of the clavicle region in the user image that lies on the vertical central axis of the neck;
and recalculating each clothing key point in the clothing template image after the integral scaling processing.
In some embodiments of the present application, the performing an overall scaling process on the clothing region in the clothing template image includes:
calculating an overall scaling coefficient according to the height of the clothing template image, the vertical coordinate of the intersection point of the boundary lines of the left side and the right side of the neckline in the clothing template image and the vertical coordinate of a key point of the neck, on the vertical central axis of the neck, of the clavicle area in the user image;
and calculating a vertical coordinate mapping matrix of the clothing area before and after the integral scaling treatment by using a deformation coordinate mapping function according to the height of the clothing template image, the vertical coordinate of each coordinate point of the clothing area and the integral scaling coefficient.
In some embodiments of the present application, the recalculating each clothing keypoint in the clothing template image after the global scaling process comprises:
keeping the abscissa of each clothes key point unchanged after the integral scaling treatment;
and respectively calculating the vertical coordinate of each clothes key point after the integral scaling treatment according to the height of the clothes template image, the integral scaling coefficient and the vertical coordinate of each clothes key point before the integral scaling treatment.
In some embodiments of the present application, the calculating, according to the height of the clothing template image, the ordinate of each coordinate point of the clothing region, and the scaling factor corresponding to each ordinate, a ordinate mapping matrix corresponding to the clothing region by using a deformed coordinate mapping function includes:
and calculating a final ordinate mapping matrix corresponding to the clothing area by using a deformed coordinate mapping function according to the height of the clothing template image, the ordinate mapping matrix before and after the integral scaling treatment and the scaling coefficient corresponding to each ordinate.
In some embodiments of the present application, the deforming the clothing region in the clothing template image according to the coordinate mapping matrix includes:
according to the abscissa mapping matrix included by the coordinate mapping matrix, carrying out deformation processing on the clothing region in the clothing template image in the horizontal direction through a preset deformation algorithm;
and according to a vertical coordinate mapping matrix included in the coordinate mapping matrix, performing deformation processing on the clothing region in the vertical direction through the preset deformation algorithm.
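Assuming the abscissa and ordinate mapping matrices have been built as above, a bilinear remap can stand in for the unspecified preset deformation algorithm; this is a sketch, not the claimed algorithm.

import cv2
import numpy as np

def warp_garment(garment_template, map_x, map_y):
    # map_x / map_y: float32 (H, W) matrices giving, for every destination pixel, the source
    # coordinates to sample in the garment template
    return cv2.remap(garment_template, map_x, map_y,
                     interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0))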
In some embodiments of the present application, before the step of overlaying the deformed clothing template image to the user image, the method further includes:
obtaining a classification result of a hair region in the user image;
if the classification result is the long-hair-draped-behind type, judging whether the hair region needs to be deformed according to the clothing region in the clothing template image and the hair region;
and if so, performing deformation processing on the hair area in the user image.
In some embodiments of the present application, the determining whether the hair region needs to be deformed according to the clothes region and the hair region in the clothes template image includes:
traversing column-by-column a difference in longitudinal coordinates of adjacent edges between the garment region and the hair region;
acquiring a maximum vertical coordinate difference from the vertical coordinate differences;
if the maximum vertical coordinate difference is smaller than a preset value, determining that deformation processing is not needed to be carried out on the hair area;
if the maximum vertical coordinate difference is larger than the preset value, determining that the hair area needs to be deformed;
wherein the difference in the ordinates is the ordinate of the edge of the garment region minus the ordinate of the lower edge of the hair region.
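A sketch of the column-wise check above, assuming binary masks for the (already deformed) clothing region and the hair region; taking the garment's upper edge as "the edge of the garment region" is an assumption.

import numpy as np

def hair_needs_deformation(garment_mask, hair_mask, threshold=0):
    # garment_mask / hair_mask: H x W masks; threshold: the preset value named in the text
    max_diff = -np.inf
    for col in range(hair_mask.shape[1]):
        g_rows = np.where(garment_mask[:, col] > 0)[0]
        h_rows = np.where(hair_mask[:, col] > 0)[0]
        if g_rows.size == 0 or h_rows.size == 0:
            continue
        # ordinate of the garment's upper edge minus ordinate of the hair's lower edge
        max_diff = max(max_diff, g_rows.min() - h_rows.max())
    return max_diff > threshold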
In some embodiments of the application, the deforming the hair region in the user image includes:
expanding the face area of the user image towards the vertex direction to obtain a protection area;
determining a region to be treated using the hair region and the protective region;
and carrying out deformation processing on the area to be processed on the user image along the direction opposite to the head top direction to obtain the user image with deformed hair.
In some embodiments of the application, the deforming the region to be processed on the user image in a direction opposite to the vertex direction to obtain a deformed hair user image includes:
acquiring a deformation coefficient corresponding to each column on the region to be processed;
and according to the deformation coefficient corresponding to each column on the area to be processed, carrying out deformation processing on the area to be processed on the user image along the direction opposite to the vertex direction to obtain the user image with deformed hair.
In some embodiments of the present application, the deforming the to-be-processed area on the user image in a direction opposite to the vertex direction according to the deformation coefficient corresponding to each column on the to-be-processed area includes:
acquiring the maximum vertical coordinate and the minimum vertical coordinate of each column on the area to be processed;
for each column on the region to be processed, determining a deformed vertical coordinate of the column according to the maximum vertical coordinate, the minimum vertical coordinate and the deformation coefficient of the column;
and carrying out deformation processing on the area to be processed on the user image according to the deformed ordinate, the minimum ordinate and the maximum ordinate of each column on the area to be processed.
In some embodiments of the present application, the obtaining the deformation coefficient corresponding to each column on the region to be processed includes:
acquiring the maximum ordinate and the minimum ordinate of each column on the hair area and acquiring the minimum ordinate of each column on the clothes area;
for each column on the hair area, determining a deformation coefficient corresponding to the column by using the maximum ordinate and the minimum ordinate of the column on the hair area and the minimum ordinate of the column on the clothes area;
wherein the column of the area to be treated is the same as the column of the hair area.
In some embodiments of the present application, the deforming the to-be-processed area on the user image according to the deformed ordinate, the minimum ordinate, and the maximum ordinate of each column on the to-be-processed area includes:
generating a first map matrix and a second map matrix by respectively utilizing the width and the height of the user image, so that the mapping relation between the horizontal coordinate and the vertical coordinate of the user image is represented by the first map matrix and the second map matrix;
performing linear interpolation according to the deformed ordinate, the minimum ordinate and the maximum ordinate of each column on the region to be processed so as to update the second map matrix;
and performing deformation processing on the region to be processed on the user image by using the first map matrix and the updated second map matrix.
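The map-matrix update and remap can be sketched as follows; the per-column deformation coefficients are condensed here into precomputed per-column target bottom rows (y_new_max), which is an assumed simplification of the bookkeeping, not the claimed formula.

import cv2
import numpy as np

def stretch_region_down(user_image, y_min, y_old_max, y_new_max):
    # y_min, y_old_max, y_new_max: integer arrays of length W giving, per column, the top of the
    # region to be processed, its original bottom, and its deformed bottom (all inside the image)
    h, w = user_image.shape[:2]
    map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                               np.arange(h, dtype=np.float32))     # identity first and second maps
    for col in range(w):
        if y_new_max[col] <= y_min[col]:
            continue                                               # nothing to deform in this column
        rows = np.arange(y_min[col], y_new_max[col] + 1)
        # destination rows [y_min, y_new_max] sample source rows [y_min, y_old_max]
        map_y[rows, col] = np.interp(rows,
                                     [y_min[col], y_new_max[col]],
                                     [y_min[col], y_old_max[col]])
    return cv2.remap(user_image, map_x, map_y, interpolation=cv2.INTER_LINEAR)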
In some embodiments of the present application, said determining an area to be treated using said hair region and said protection region comprises:
and carrying out set subtraction operation between the hair area and the protection area to obtain an area to be treated.
In some embodiments of the present application, the detecting whether a neck in the user image is occluded includes:
determining an image to be detected according to the proportion of a target area in the user image;
and inputting the image to be detected into a trained neck shielding detection model, and judging whether the neck in the image to be detected is shielded or not by the neck shielding detection model.
In some embodiments of the present application, the determining an image to be detected according to a proportion of a target area in the user image includes:
detecting a human face area containing a neck in the user image as a target area;
determining a ratio between an area of the target region and an area of the user image;
if the ratio exceeds a ratio threshold, determining the user image as an image to be detected;
and if the ratio does not exceed the ratio threshold, cropping the target region from the user image, enlarging the cropped target region, and determining the enlarged region as the image to be detected.
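An illustrative pre-processing routine for the rule above; the ratio threshold of 0.35 and the 2x enlargement are example values, not values taken from the embodiment.

import cv2

def prepare_detection_input(user_image, target_box, ratio_threshold=0.35, scale=2):
    # target_box: (x, y, w, h) of the detected face region containing the neck
    x, y, w, h = target_box
    img_h, img_w = user_image.shape[:2]
    if (w * h) / float(img_w * img_h) > ratio_threshold:
        return user_image                               # the target region already dominates the photo
    crop = user_image[y:y + h, x:x + w]                 # cut the target region out
    return cv2.resize(crop, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)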
In some embodiments of the present application, the neck occlusion detection model comprises a first branch network, a second branch network, a fusion module, and an output layer, the first branch network and the second branch network executing in parallel and independently;
the judging, by the neck occlusion detection model, whether the neck in the image to be detected is occluded includes:
performing first preset convolution processing on the image to be detected through a first branch network in the neck shielding detection model to obtain a first characteristic diagram;
performing second preset convolution processing on the image to be detected through a second branch network in the neck shielding detection model to obtain a second feature map;
performing channel fusion on the first feature map and the second feature map through a fusion module in the neck shielding detection model to obtain a fusion feature map;
and obtaining a judgment result of whether the neck is shielded or not based on the fusion feature map through an output layer in the neck shielding detection model.
In some embodiments of the present application, the first branch network comprises a first depth separable convolutional layer and a first upsampling layer;
the performing, by the first branch network in the neck occlusion detection model, first preset convolution processing on the image to be detected to obtain the first feature map includes:
performing feature extraction on a channel of the image to be detected through a convolution kernel contained in the first depth separable convolution layer to obtain a single-channel feature map;
and performing dimensionality-increasing processing on the single-channel feature map through a preset number of 1 × 1 convolution kernels contained in the first upsampling layer to obtain the first feature map.
In some embodiments of the present application, the second branch network comprises a downsampling layer, a second depth separable convolutional layer, and a second upsampling layer;
the performing, by the second branch network in the neck occlusion detection model, second preset convolution processing on the image to be detected to obtain the second feature map includes:
performing dimensionality reduction processing on the image to be detected through a single-channel 1 x 1 convolution kernel contained in the down-sampling layer to obtain a single-channel characteristic diagram;
performing feature extraction on the single-channel feature map through a convolution kernel included in the second depth separable convolution layer to obtain a single-channel feature map with features extracted;
and performing dimensionality-increasing processing on the feature-extracted single-channel feature map through a preset number of 1 × 1 convolution kernels included in the second upsampling layer to obtain the second feature map.
In some embodiments of the present application, performing channel fusion on the first feature map and the second feature map by using a fusion module in the neck shielding detection model to obtain a fused feature map, including:
overlapping the first characteristic diagram and the second characteristic diagram according to the channel by a channel splicing layer in the fusion module to obtain a characteristic diagram after channel overlapping;
and disturbing the channel stacking sequence of the feature map after channel stacking through the channel mixing layer in the fusion module to obtain a fusion feature map.
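A minimal PyTorch sketch of the two-branch structure described above. Channel counts, kernel sizes and the two-class linear head are assumptions; only the overall topology (depthwise-separable convolutions, 1 × 1 channel expansion/reduction, channel concatenation followed by channel shuffling) follows the text.

import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class NeckOcclusionNet(nn.Module):
    def __init__(self, in_ch=3, mid_ch=16):
        super().__init__()
        # first branch: depthwise-separable extraction down to one channel, then 1x1 expansion
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),   # depthwise convolution
            nn.Conv2d(in_ch, 1, 1),                                # pointwise -> single-channel map
            nn.Conv2d(1, mid_ch, 1),                               # "up-sampling" 1x1 kernels
        )
        # second branch: 1x1 down-sampling to one channel, convolution, then 1x1 expansion
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, 1, 1),
            nn.Conv2d(1, 1, 3, padding=1),
            nn.Conv2d(1, mid_ch, 1),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2 * mid_ch, 2))        # occluded / not occluded

    def forward(self, x):
        fused = torch.cat([self.branch1(x), self.branch2(x)], dim=1)   # channel concatenation
        fused = channel_shuffle(fused)                                 # channel mixing layer
        return self.head(fused)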
In some embodiments of the present application, the training process of the neck occlusion detection model is as follows:
obtaining a data set, samples in the data set comprising positive samples and negative samples;
traversing each sample in the data set, inputting the currently traversed sample into a pre-constructed neck shielding detection model, predicting whether the sample is shielded by the neck shielding detection model, and outputting a prediction result;
calculating a loss value by using the prediction result, the positive and negative sample balance coefficients and the sample number balance coefficient;
and when the change rate of the loss value is greater than a change threshold value, optimizing network parameters of the neck shielding detection model according to the loss value, adjusting the positive and negative sample balance coefficients and the sample number balance coefficient, and continuously executing the process of traversing each sample in the data set until the change rate of the loss value is lower than the change threshold value.
In some embodiments of the application, after acquiring the data set, the method further comprises:
each sample in the data set is subjected to a data enhancement process, and the processed sample is added to the data set to expand the data set.
In some embodiments of the present application, the adjusting the positive and negative sample balance coefficients and the sample number balance coefficient includes:
if the loss value is larger than a preset threshold value, increasing both the positive and negative sample balance coefficients and the sample number balance coefficient by a first step length;
if the loss value is smaller than a preset threshold value, increasing both the positive and negative sample balance coefficients and the sample number balance coefficient by a second step length;
wherein the first step size is greater than the second step size.
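The embodiment names a positive/negative sample balance coefficient and a sample-number balance coefficient but not their functional form; the focal-style loss below is therefore only one plausible reading, with alpha and gamma standing in for the two coefficients.

import torch

def balanced_loss(logits, labels, alpha=0.75, gamma=2.0):
    # logits: (N, 2) predictions; labels: (N,) long tensor of 0/1 ground truth
    p = torch.softmax(logits, dim=1)[torch.arange(labels.numel()), labels]
    weight = alpha * (labels == 1).float() + (1.0 - alpha) * (labels == 0).float()
    return (-weight * (1.0 - p) ** gamma * torch.log(p.clamp_min(1e-8))).mean()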
In some embodiments of the present application, before the obtaining the key point of the neck corresponding to the user image, the method further includes:
determining the actual slope of a straight line determined by the labeled key point of the neck and the reference point in the sample graph;
inputting the sample graph into a pre-constructed neck key point detection model, so that the neck key point detection model can learn and output a neck key point;
calculating a loss value by using the neck key points output by the model, the labeled neck key points and the actual slope;
and when the loss value is larger than a preset value, optimizing the network parameters of the neck key point detection model according to the loss value, and continuing to execute the process of inputting the sample graph into the pre-constructed neck key point detection model until the loss value is lower than the preset value.
In some embodiments of the present application, the determining an actual slope of a straight line in the sample graph determined by the labeled neck key point and the reference point comprises:
acquiring a data set, wherein each sample image in the data set comprises a user head portrait;
for each sample map in the data set, locating a neck region in the sample map;
detecting a middle point of a contact edge of the neck area and the clothes and determining the middle point as a reference point;
labeling the key point of the neck on the sample graph, and determining the actual slope of a straight line determined by the labeled key point of the neck and the reference point.
In some embodiments of the present application, said labeling neck key points on the sample graph includes:
determining a straight line passing through the reference point by using a preset slope;
marking an intersection point between the straight line and the edge of the neck region as a neck key point on the sample graph;
horizontally overturning the straight line, and marking an intersection point between the overturned straight line and the edge of the neck region as another neck key point on the sample graph;
and fine-tuning the key points of the neck marked on the sample graph.
In some embodiments of the present application, said locating a neck region in said sample map comprises:
inputting the sample graph into a preset segmentation model so as to perform semantic segmentation on the sample graph by the segmentation model;
and determining a region formed by the pixels of which the semantic segmentation result is a neck as a neck region.
In some embodiments of the present application, the calculating a loss value using the model-output neck keypoints, the labeled neck keypoints, and the actual slope includes:
acquiring the position error between the neck key point output by the model and the labeled neck key point;
determining a loss weight based on the position error and the actual slope;
determining Euclidean distance between sample graph vector information carrying neck key points output by the model and sample graph vector information carrying labeled neck key points;
and calculating a loss value by using the loss weight and the Euclidean distance.
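A hedged sketch of the key-point loss: the way the loss weight combines the position error and the labeled slope is an assumption, since the embodiment only says the weight is determined based on the two quantities.

import numpy as np

def keypoint_loss(pred_pts, gt_pts, actual_slope):
    # pred_pts / gt_pts: (K, 2) arrays of predicted and labeled neck key points
    position_error = np.linalg.norm(pred_pts - gt_pts, axis=1).mean()
    loss_weight = 1.0 + position_error * abs(actual_slope)      # assumed combination
    euclidean = np.linalg.norm(pred_pts.ravel() - gt_pts.ravel())
    return loss_weight * euclidean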
In some embodiments of the present application, the method further comprises:
acquiring a plurality of original character images;
labeling each human body region in each original character image based on a preset semantic segmentation model and a preset matting model to obtain a training data set;
constructing a network structure of a segmentation model based on an attention mechanism;
training the network structure of the segmentation model according to the training data set to obtain an image segmentation model for human body region segmentation;
and carrying out human body region segmentation on the user image and the clothing template image through the image segmentation model.
In some embodiments of the present application, labeling each person region in each original person image based on a preset semantic segmentation model and a preset matting model to obtain a training data set includes:
based on the original character image, adopting a preset semantic segmentation model to obtain a semantic segmentation result of each human body region of the original character image;
correcting a semantic segmentation result corresponding to the original character image through a preset matting model;
marking each human body area in the original character image according to the corrected semantic segmentation result;
and storing the marked original character image in a training data set.
In some embodiments of the present application, modifying the semantic segmentation result corresponding to the original character image through a preset matting model includes:
performing matting processing on the original character image through the preset matting model, and dividing the original character image into a foreground pixel region and a background pixel region;
if the matting result indicates that the first pixel point is a background pixel and the semantic segmentation result indicates that the first pixel point is a foreground pixel, correcting the semantic segmentation result of the first pixel point to be the background pixel;
if the matting result indicates that the first pixel point is a foreground pixel and the semantic segmentation result indicates that the first pixel point is a background pixel, determining a target pixel point which is closest to the first pixel point and different from the semantic segmentation result of the first pixel point, and determining the semantic segmentation result of the target pixel point as the semantic segmentation result of the first pixel point.
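The two correction rules can be sketched with SciPy as below; using a distance transform to find the nearest pixel with a different segmentation result is an assumption, and the 0.5 matte threshold is illustrative.

import numpy as np
from scipy import ndimage

def refine_with_matte(seg_labels, matte, fg_threshold=0.5, background_label=0):
    # seg_labels: H x W integer label map from the semantic segmentation model;
    # matte: H x W alpha map in [0, 1] from the matting model
    refined = seg_labels.copy()
    matte_bg = matte < fg_threshold
    # rule 1: matting says background, segmentation says foreground -> background
    refined[matte_bg & (seg_labels != background_label)] = background_label
    # rule 2: matting says foreground, segmentation says background -> copy the label of the
    # nearest pixel whose segmentation result is not background
    conflict = (~matte_bg) & (seg_labels == background_label)
    if conflict.any():
        _, (iy, ix) = ndimage.distance_transform_edt(seg_labels == background_label,
                                                     return_indices=True)
        refined[conflict] = seg_labels[iy[conflict], ix[conflict]]
    return refined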
In some embodiments of the present application, the obtaining, based on the original person image, a semantic segmentation result of each human body region of the original person image by using a preset semantic segmentation model includes:
calculating an affine transformation matrix used for carrying out face alignment operation on the original figure image and a preset standard image of a preset semantic segmentation model;
performing matrix transformation on the original character image based on the affine transformation matrix to obtain a transformed image corresponding to the original character image;
and obtaining a semantic segmentation result corresponding to the original character image by adopting the preset semantic segmentation model based on the original character image and the transformed image.
In some embodiments of the present application, performing matrix transformation on an original person image based on an affine transformation matrix to obtain a transformed image corresponding to the original person image includes:
decomposing the affine transformation matrix to respectively obtain a rotation matrix, a translation matrix and a scaling matrix;
performing rotation, translation transformation and scaling transformation on the original figure image based on the rotation matrix, the translation matrix and the scaling matrix respectively to obtain a first transformed image;
and respectively cutting the first transformed image based on the scaling matrix and the image size corresponding to the scaling matrix with a preset magnification to obtain a plurality of second transformed images.
In some embodiments of the application, the obtaining, based on the original person image and the transformed image, a semantic segmentation result corresponding to the original person image by using a preset semantic segmentation model includes:
respectively obtaining semantic segmentation results of all human body regions of the original character image and semantic segmentation results of all human body regions of all second transformed images by adopting a semantic segmentation model based on the original character image and the second transformed images;
and correcting the semantic segmentation result of each human body region of the original character image according to the semantic segmentation result of each human body region of each second transformed image to obtain a semantic segmentation result corresponding to the original character image.
In some embodiments of the application, the modifying the semantic segmentation result of each human body region of the original person image according to the semantic segmentation result of each human body region of each second transformed image to obtain the semantic segmentation result corresponding to the original person image includes:
determining the confidence degrees of the segmentation results of the original person image and each second transformed image; the confidence coefficient of the segmentation result is in inverse proportional relation with the size of the image;
for each pixel point of the original character image, determining the segmentation result confidences of the original character image and/or of each second transformed image containing the pixel point,
and taking the semantic segmentation result with the highest confidence as the semantic segmentation result of the pixel point.
In some embodiments of the present application, the storing the annotated image of the original person in the training dataset includes:
performing data enhancement on the marked original figure image through a data enhancement library to obtain an enhanced data set;
and carrying out multi-scale random cutting on partial images in the enhanced data set to obtain the training data set.
In some embodiments of the present application, training an attention-based segmentation model according to the training data set to obtain an image segmentation model for segmenting a human image includes:
inputting the sample image in the training data set into a segmentation model based on an attention mechanism to obtain a segmentation result of the sample image;
calculating the integral loss value of the current training period according to the segmentation result and the labeling information of each sample image;
and optimizing the segmentation model based on the attention mechanism after each period of training through an optimizer and a learning rate scheduler.
In some embodiments of the present application, the calculating the overall loss value of the current training period includes:
calculating a cross entropy loss value of the current training period through a cross entropy loss function according to the segmentation result and the labeling information of each sample image;
calculating a Dice loss value of the current training period through a Dice loss function according to the segmentation result and the labeling information of each sample image;
and calculating the integral loss value of the current training period according to the cross entropy loss value and the Dice loss value.
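A sketch of the combined objective: cross-entropy plus a Dice term. Summing the two with equal weight is an assumption; the embodiment only states that the overall loss is computed from both values.

import torch
import torch.nn.functional as F

def segmentation_loss(logits, target, num_classes, eps=1e-6):
    # logits: (N, C, H, W) raw scores; target: (N, H, W) long tensor of class labels
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2 * inter + eps) / (union + eps)).mean()
    return ce + dice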
An embodiment of a second aspect of the present application provides a virtual reloading apparatus, including:
the image acquisition module is used for acquiring a user image and a clothes template image to be changed;
the key point acquisition module is used for acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image;
the neck shielding detection module is used for detecting whether the neck in the user image is shielded or not;
a generating module, configured to generate a suit changing effect map corresponding to the user image and the suit template image by using a neck region in the suit template image according to the neck key point and the suit key point if the neck occlusion detecting module detects that the neck in the user image is occluded; and if the neck shielding detection module detects that the neck in the user image is not shielded, generating a suit changing effect picture corresponding to the user image and the suit template image by using the neck region in the user image according to the neck key point and the suit key point.
Embodiments of the third aspect of the present application provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method of the first aspect.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, the program being executable by a processor to implement the method of the first aspect.
The technical scheme provided in the embodiment of the application at least has the following technical effects or advantages:
in the embodiment of the application, a neck key point in a user image and a clothes key point in a clothes template image are determined, and human body regions such as a hair region, a face region, a neck region and a clothes region in the user image and the clothes template image are segmented. Based on the processing, the neck area of the user image is repaired, the clothes area in the clothes template image is deformed, the hair area is deformed and the like, and the clothes area in the clothes template image is covered in the user image to achieve a good changing effect. Or, color migration is carried out on the neck area in the clothing template image, the neck area and the clothing area in the clothing template image are deformed, operations such as deformation of the hair area are carried out, and the good dressing change effect is achieved by covering the neck area and the clothing area in the clothing template image on the user image.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings.
In the drawings:
FIG. 1 is a flow chart illustrating a virtual reloading method according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a user image provided by an embodiment of the present application;
FIG. 3 illustrates a schematic diagram of a garment template image provided by an embodiment of the present application;
FIG. 4 is a labeled diagram of a key point of a neck of a sample graph provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a neck key point detection model according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating cropping of a first transformed image to obtain a series of second transformed images of different sizes according to an embodiment of the application;
fig. 7 is a schematic structural diagram illustrating an image segmentation model provided in an embodiment of the present application as a Swin-Transformer model;
Fig. 8 shows a schematic structural diagram of Swin Transformer Block provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a neck occlusion detection model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram illustrating a preset semantic segmentation model provided in an embodiment of the present application as an hrnet network;
fig. 11 is a schematic structural diagram illustrating a preset image restoration network provided in an embodiment of the present application as a CR-Fill network;
FIG. 12 is a diagram illustrating a reloading effect before applying the image restoration method of the present application according to an embodiment of the present application;
FIG. 13 is a diagram illustrating a reloading effect after applying the image restoration method of the present application according to an embodiment of the present application;
FIG. 14 is a flowchart illustrating a virtual reloading method based on an image restoration method according to an embodiment of the present application;
FIG. 15 is a diagram illustrating the effect of changing a garment after deformation of a garment region according to an embodiment of the present application;
FIG. 16 is a schematic flow chart illustrating a garment deformation method for virtual reloading provided in an embodiment of the present application;
FIG. 17 is a flow diagram illustrating image color migration provided by an embodiment of the present application;
FIG. 18 is a schematic comparison of clothes-changing effects before and after reloading in the prior art;
FIG. 19 is a schematic diagram illustrating a hair region and a face region obtained by segmentation according to an embodiment of the present application;
FIG. 20 is a schematic view of the protection area obtained after expansion of the face area shown in FIG. 19 in the overhead direction;
FIG. 21 is a schematic view of a region to be treated obtained from FIG. 19 (a) and FIG. 20;
FIG. 22 is a diagram illustrating a first map matrix and a second map matrix provided by an embodiment of the present application;
FIG. 23 illustrates a schematic comparison of a pre-and post-reloading schematic provided by an embodiment of the present application;
FIG. 24 is a flow chart illustrating a specific implementation of hair treatment provided by an embodiment of the present application;
FIG. 25 is a schematic diagram illustrating a virtual reloading apparatus according to an embodiment of the present application;
FIG. 26 is a diagram illustrating an electronic device according to an embodiment of the present application;
FIG. 27 is a schematic diagram of a storage medium provided by an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical terms or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
The following describes a virtual reloading method and apparatus proposed according to an embodiment of the present application with reference to the accompanying drawings.
Virtual reloading means combining a user image with a clothing template image to obtain a new user image in which the user wears the clothing of the clothing template image. In the related art, a virtual reloading method generally trains a virtual reloading model on a large amount of data and performs reloading through the trained model. However, training such a model requires a large amount of computation, the model easily becomes dependent on its training data set, and when the test data differ greatly from the training data, the reloading effect of the virtual reloading model is poor.
Based on the above, the embodiment of the application provides a virtual reloading method that does not perform virtual reloading through a network model, but instead performs image processing on the user image and the clothing template image to generate the corresponding reloading effect image. Therefore, no model needs to be trained, which reduces the amount of computation, and the reloading effect does not degrade due to dependence on a training data set, so that a good effect is ensured for every virtual reloading operation.
Referring to fig. 1, the method specifically includes the following steps:
step 101: and acquiring a user image and a clothes template image to be reloaded.
And in the virtual reloading scene, acquiring a user image and a clothes template image. The user image may be an image including a head and neck region and a body region below the neck of the user, and the head and neck region includes a head region and a neck region. The garment template image comprises a garment image which comprises a complete neckline area and a part or complete clothes area below the neckline.
Step 102: and acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image.
First, all face key points in the user image are detected through a preset face key point detection model. The detected face key points include key points of the face contour, eyes, nose, mouth, eyebrows and the like in the user image.
In the embodiment of the application, a standard face image is preset, and all preset standard face key points in the standard face image are marked. The standard face image can be an image with five sense organs not shielded and a connecting line of central points of two eyes parallel to a horizontal line. And carrying out face alignment on the user image according to the face key point corresponding to the user image and the preset standard face key point. That is, the face key points in the user image are aligned with the preset standard face key points in the preset standard face image, and specifically, a plurality of face key points of the facial features in the user image may be aligned with the corresponding standard face key points in the preset standard face image. For example, the face key points at two corners of the mouth, the nose tip, the centers of the left and right eyes and the centers of two eyebrows of the user image are respectively aligned with the corresponding face key points in the preset standard face image.
After aligning a user image with a preset standard face image, detecting a neck key point in the user image through a pre-trained neck key point detection model, wherein the neck key point comprises two key points at the joint of a left neck boundary line and a right neck boundary line with a shoulder and one key point of a clavicle region on a vertical central axis of the neck. In the user image shown in fig. 2, the key points of the neck include two key points p2_ d and p3_ d where the left and right neck boundary lines meet the shoulders, and a key point p1_ d where the clavicle region is located on the vertical central axis of the neck.
The clothing template image can also comprise a model face image, and the user can more intuitively see the wearing effect of the clothing template by wearing the clothing template image by the model. In the embodiment of the application, all face key points of a model face image in a clothing template image are detected in advance through a preset face key point detection model, the face key points are labeled in the clothing template image in advance, and the labeled face key points are called as pre-labeled virtual face key points. And aligning the clothing template image according to the virtual face key points pre-marked in the clothing template image and the preset standard face key points in the standard face image. Namely, aligning a plurality of pre-marked virtual face key points in the clothes template image with corresponding preset standard face key points through stretching deformation operation.
The clothing template image is also pre-labeled with a plurality of clothing key points, and the clothing key points can comprise two end points of the left and right boundary lines of the neckline and the intersection of the left and right boundary lines, as shown in fig. 3. With the vertex at the upper left corner of the clothing template image as the origin, the width of the clothing template image as the x-axis, and the height of the clothing template image as the y-axis, as shown in fig. 3, the acquired clothing key points may include two key points with the smallest vertical coordinate on the left and right sides of the neckline boundary line and one key point with the largest vertical coordinate on the neckline boundary line, that is, p2_ o, p3_ o, and p1_ o in fig. 3.
After the clothes template image is aligned with the preset standard face image, according to the clothes key points pre-marked in the clothes template image, the aligned clothes key points are determined from the aligned clothes template image. After alignment, the coordinates of each clothes key point after alignment are recalculated according to the clothes key point pre-marked in the clothes template image and the stretching degree in the alignment processing process.
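As an illustration of the alignment described above, the following is a minimal sketch assuming OpenCV and NumPy are available; the function names, the choice of a partial (similarity) affine transform, and the helper for re-computing pre-labeled clothes key points are illustrative assumptions rather than the exact implementation of the embodiment.

```python
import cv2
import numpy as np

def align_to_standard_face(image, face_points, standard_points, out_size):
    """Estimate a similarity transform mapping detected key points
    (e.g., eye centers, nose tip, mouth corners) onto the preset standard
    key points, and warp the image accordingly."""
    src = np.asarray(face_points, dtype=np.float32)
    dst = np.asarray(standard_points, dtype=np.float32)
    # Partial affine = rotation + uniform scale + translation.
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    aligned = cv2.warpAffine(image, matrix, out_size)
    return aligned, matrix

def transform_points(points, matrix):
    """Re-compute key-point coordinates (e.g., pre-labeled clothes key
    points) in the aligned image using the same affine matrix."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    return cv2.transform(pts, matrix).reshape(-1, 2)
```

The same matrix that warps the clothing template image can be reused to update the coordinates of its pre-labeled clothes key points, which corresponds to the recalculation of key-point coordinates after alignment described above.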
In the step, the user image and the clothes template image are aligned with the preset standard human face image, so that the position and the posture of the clothes area in the user image and the clothes template image both meet the requirements of clothes changing processing, and the condition of poor clothes changing effect caused by large difference of the position and the posture of the user image and the clothes template image is avoided. And the number of the acquired neck key points is equal to that of the garment key points, and the number of the neck key points and that of the garment key points can be at least 3.
The training process of the neck key point detection model for detecting the neck key points comprises the following steps A1-A5.
A1: and determining the actual slope of a straight line determined by the labeled key point of the neck and the reference point in the sample graph.
Usually, the neck is the key part connecting the face and the body, and there is a certain geometric constraint relationship between them: the two sides of the neck are bilaterally symmetric about the vertical central axis of the neck. Therefore, by marking one key point on each side of the neck, the position of the neck relative to the face can be determined.
In one possible implementation manner, after the data set is acquired, for each sample graph in the data set, a neck region in the sample graph is located, a middle point of a contact edge between the neck region and clothes is detected and determined as a reference point, then a neck key point is marked on the sample graph, and an actual slope of a straight line determined by the marked neck key point and the reference point is determined.
The middle point of the contact edge of the neck area and the clothes can be a key point of the clavicle area on the vertical central axis of the neck. The key points of the neck marked on the sample graph by the reference points may include two key points where the left and right neck boundary lines meet the shoulder.
Each sample graph in the data set contains a user head portrait, and an isosceles triangle constraint relation is formed between two neck key points and a reference point respectively marked on two sides of a neck by taking a middle point of a contact edge of a neck region and clothes as the reference point, so that constraint conditions can be provided for subsequent loss calculation by calculating an actual slope between the marked neck key points and the reference point, and the robustness of the model is improved.
In an optional embodiment, the sample map is input into a preset segmentation model, so that the sample map is subjected to semantic segmentation by the segmentation model, and a region formed by pixels of which the semantic segmentation result is a neck is determined as the neck region.
Wherein, the precision of the neck region can be ensured by using the semantic segmentation model to position the neck region. Illustratively, the segmentation model may employ a HRNet segmentation model.
In an optional embodiment, a straight line passing through the reference point is determined by using a preset slope, an intersection point between the straight line and the edge of the neck region is marked on the sample graph as a neck key point, then the straight line is horizontally turned, the intersection point between the turned straight line and the edge of the neck region is marked on the sample graph as another neck key point, and finally the neck key point marked on the sample graph is finely adjusted, so that marking of the sample graph is completed.
The preset slope is a slope preset according to practice before the sample is marked, and the marking workload can be saved by calculating and then finely adjusting the key point of the marking neck by using the geometric constraint and the slope.
As shown in fig. 4, a rectangular coordinate system is established with the point O as the origin at the middle point O of the edge where the neck region contacts the garment, a straight line m is determined by using the point O and the preset slope k, the intersection B between the straight line m and the edge of the neck region is used as a key point of the neck, the straight line m 'is obtained by horizontally turning the straight line m, and the intersection a between the straight line m' and the edge of the neck region is used as another key point of the neck.
Due to the influence of the image shooting environment, labeling errors may exist in the neck key points labeled by slope calculation under the action of geometric constraint, and therefore the final labeling of the sample graph is completed after the neck key points labeled on the sample graph are manually fine-tuned.
It should be noted that, the final labeled key point of the neck on the sample graph is a point after fine adjustment, so that a difference exists between an actual slope and a preset slope of a straight line determined by the labeled key point of the neck and the reference point.
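The slope-based pre-labeling described above could be sketched as follows, assuming a binary neck mask and an image coordinate system whose y-axis points down; the marching step and the function name are illustrative, and the returned points would still be manually fine-tuned as noted.

```python
import numpy as np

def initial_neck_keypoints(neck_mask, ref_point, slope):
    """Walk from the reference point O along a line with the preset slope
    until leaving the neck mask; the last in-mask pixel is one candidate
    key point.  The horizontally flipped line gives the symmetric
    candidate on the other side of the neck."""
    def march(dx):
        x, y = map(float, ref_point)
        last_inside = None
        h, w = neck_mask.shape
        while 0 <= int(round(x)) < w and 0 <= int(round(y)) < h:
            if neck_mask[int(round(y)), int(round(x))] > 0:
                last_inside = (int(round(x)), int(round(y)))
            x += dx        # +1: one side of the neck, -1: the mirrored side
            y -= slope     # image y-axis points down, so moving "up" is -y
        return last_inside

    return march(+1.0), march(-1.0)
```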
It should be further supplemented that, after the neck key point is marked on the sample graph, each sample graph in the data set may be subjected to a data enhancement process, and the processed sample graph is added to the data set to expand the data set and improve the model performance.
The data enhancement category may include clipping, flipping, morphing, color transformation, illumination transformation, and the like.
A2: and inputting the sample graph into a pre-constructed neck key point detection model, so that the neck key point detection model can learn and output the neck key points.
A3: and calculating a loss value by using the neck key point output by the model, the labeled neck key point and the actual slope.
In a possible implementation manner, a loss weight is determined according to the position error between the neck key point output by the model and the labeled neck key point, together with the actual slope; then the Euclidean distance between the sample-graph vector information carrying the neck key points output by the model and the sample-graph vector information carrying the labeled neck key points is determined, and the loss value is calculated by using the loss weight and the Euclidean distance.
Wherein, the calculation formula for calculating the loss weight according to the position error and the actual slope is as follows:
Y_n = K_1 * Y_1 + K_2 * Y_2 (formula 1)
In the above formula 1, K_1 and K_2 are the weight ratios of the position error and the actual slope, respectively, and are dynamically adjusted during training; Y_1 and Y_2 represent the position error and the actual slope, respectively. It can be seen that the position error and the actual slope are combined proportionally to generate the final loss weight Y_n.
The Euclidean distance between the sample-graph vector information carrying the neck key points output by the model and the sample-graph vector information carrying the labeled neck key points is calculated as follows:
d(a, b) = sqrt( Σ_i (a_i − b_i)² ) (formula 2)
In the above formula 2, a represents the sample-graph vector information carrying the neck key points output by the model, and b represents the sample-graph vector information carrying the labeled neck key points.
The calculation formula for calculating the loss value using the loss weight and the Euclidean distance is as follows:
Loss = (1 / (M · N)) Σ_{m=1..M} Σ_{n=1..N} Y_n · d_{mn} (formula 3)
In the above formula 3, M is the number of samples, N is the number of feature points of each sample (in the present invention N = 2), Y_n is the loss weight of the n-th feature point, and d_{mn} is the Euclidean distance of the n-th feature point in the m-th sample. Therefore, after the loss weight is calculated, the final loss can be obtained by combining it with the per-point Euclidean distance.
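A sketch of formulas 1 to 3 as reconstructed above, assuming PyTorch tensors of predicted and labeled key-point coordinates; the exact form of the position-error term Y_1 and the detach on the weight are assumptions.

```python
import torch

def keypoint_loss(pred, target, actual_slope, k1, k2):
    """Loss for the neck key-point detection model (sketch of formulas 1-3).

    pred, target:  [M, N, 2] predicted / labeled key-point coordinates
    actual_slope:  [M, N] actual slope term Y_2 associated with each key point
    k1, k2:        weight ratios for the position error and the slope,
                   dynamically adjusted during training
    """
    position_error = (pred - target).abs().sum(dim=-1)        # Y_1, per key point
    loss_weight = k1 * position_error + k2 * actual_slope     # Y_n (formula 1)
    distance = torch.linalg.norm(pred - target, dim=-1)       # d_mn (formula 2)
    m, n = pred.shape[:2]
    return (loss_weight.detach() * distance).sum() / (m * n)  # formula 3
```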
A4: and when the loss value is larger than the preset value, optimizing the network parameters of the neck key point detection model according to the loss value, and continuing to execute the process of the step A2.
For the optimization process of the network parameters, an Adamw optimizer can be used for optimization.
A5: and when the loss value is smaller than the preset value, stopping the training process.
It should be noted that, for the model training end condition, other index conditions are also included, for example, a condition that the accuracy rate of the model is higher than a certain value, a condition that the recall rate is higher than a certain value, and the like.
In the training data preparation stage, the actual slope between the labeled neck key point and the reference point is obtained, and in the training stage the loss is calculated using the actual slope, the prediction result and the labeling result, which improves the robustness and accuracy of model detection. The model is dedicated to detecting neck key points, can accurately predict the neck key points in an image, and provides a basis for subsequent operations on the face, so that the face and the neck fit together properly.
After the neck key point detection model is obtained through the operation training in the steps A1 to A5, the neck key point detection model can be used for detecting the neck key point in the user image. Since the clothing template image may include the head and neck region and the clothing region of the model, the neck key point of the neck region in the clothing template image may be detected by the neck key point detection model.
Before detecting the neck key points of the user image by using the neck key point detection model, the received user image may be preprocessed. Specifically, a preset type of data enhancement processing is performed on the user image, and the proportion of the area of the face region containing the neck to the area of the processed user image is determined. If the proportion exceeds a preset proportion, the face region containing the neck occupies a large part of the image, and the user image can be detected directly by the neck key point detection model. If the proportion does not exceed the preset proportion, the face region containing the neck occupies a small part of the user image and may not be detected accurately by the model; in this case the face region containing the neck is extracted from the user image, the extracted face region is enlarged, and the enlarged face region image is detected by the neck key point detection model, which improves the accuracy of neck key point detection.
The robustness of model detection can be improved by carrying out preset type data enhancement processing on the original user image. Illustratively, the preset kind of data enhancement processing may be flipping, morphing, color transformation, illumination transformation, and the like.
In some embodiments, the model structure of the neck keypoint detection model is as shown in fig. 5, the convolution module performs convolution operation on an image to be detected to obtain a basic feature map, the linear operation layer performs linear operation on the basic feature map to obtain a ghost feature map similar to the basic feature map, and the output layer in the neck keypoint detection model performs neck keypoint prediction according to the basic feature map and the ghost feature map. The image to be detected can be a user image or an enlarged image of a face region image extracted from the user image.
The linear operation belongs to simple operation, the calculated amount is small, all feature maps are obtained by combining the convolution module and the linear operation layer, and compared with the feature maps which are obtained by singly using the convolution operation, the linear operation has the characteristics of light weight and high efficiency.
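The structure described here (an ordinary convolution producing a basic feature map plus cheap linear operations producing ghost feature maps) resembles a GhostNet-style ghost module; the following PyTorch sketch is written under that assumption and is not taken from the patent.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Basic feature map from an ordinary convolution, plus 'ghost' feature
    maps generated by a cheap depthwise (per-channel) linear operation."""
    def __init__(self, in_ch, out_ch, ratio=2, kernel=1, cheap_kernel=3):
        super().__init__()
        base_ch = out_ch // ratio
        ghost_ch = out_ch - base_ch
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(base_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(base_ch, ghost_ch, cheap_kernel,
                      padding=cheap_kernel // 2, groups=base_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        base = self.primary(x)    # basic feature map from the convolution module
        ghost = self.cheap(base)  # ghost feature map from the cheap linear op
        return torch.cat([base, ghost], dim=1)
```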
In other embodiments of the present application, before the final clothes changing operation is performed on the user image and the clothes template image, a human body region segmentation operation is further performed on the user image and the clothes template image, specifically, the segmentation is performed through the following steps B1 to B5.
B1: a plurality of original person images are acquired.
The method comprises the steps of obtaining a plurality of original human images, wherein each original human image can comprise one or more human regions, and for the original human images comprising a plurality of human regions, respectively truncating images of the human regions from the original human images to obtain a plurality of original human images only comprising the human regions. When the image of the person region is captured, only the upper body may include the person image of the face, the neck, and the body.
Specifically, the original person image may be a person image disclosed on the internet, or may be a person image acquired by an image capturing device, which is not specifically limited in this embodiment.
B2: and marking each human body region in each original character image based on a preset semantic segmentation model and a preset matting model to obtain a training data set.
After the plurality of original person images are obtained in the above manner, the semantic segmentation result of each human body region of each original person image can be obtained by applying a preset semantic segmentation model. The preset semantic segmentation model can be a face parsing model; such a model can analyze the face, accurately identify the five sense organs, and also identify other parts of the human body, so performing semantic segmentation on the person image with it quickly yields accurate segmentation results for each part of the human body. The preset semantic segmentation model is not particularly limited in this embodiment, as long as it can segment each human body part of the original person image.
First, a plurality of key points of the five sense organs of the original person image are detected through a preset facial key point detection model, and the affine transformation matrix to be applied is obtained according to the preset standard key points of the five sense organs and the detected key points of the original person image, so that the original person image can be face-aligned with the preset standard image. The preset standard image is generally an image of the size used for training the preset semantic segmentation model, and its size can be determined according to the specific preset semantic segmentation model; for example, for the model mentioned above, the size of the preset standard image is usually 512 × 512, and may also be 224 × 224, 226 × 226, or the like. The specific number of detected key points of the five sense organs is not limited in this embodiment and may be 106, 117, or the like. In addition, it is sufficient that the detected face key points of the original person image allow the original person image to be aligned with the preset standard image.
The preset standard facial features key points can comprise preset left eye standard central points, right eye standard central points, nose tip standard points, left mouth angle standard points, right mouth angle standard points and the like. And respectively aligning a left eye central point, a right eye central point, a nose tip point, a left mouth corner point, a right mouth corner point and the like included in the detected multiple five sense organ key points of the original character image with a left eye standard central point, a right eye standard central point, a nose tip standard point, a left mouth corner standard point, a right mouth corner standard point and the like included in preset standard five sense organ key points one by one. For example, the center point of the left eye on the original character image is aligned with the standard center point of the left eye, the standard center point of the right eye on the original character image is aligned with the standard center point of the right eye, and so on.
After the affine transformation matrix is obtained, matrix transformation can be performed on the original character image based on the affine transformation matrix, so that the original character image and the preset standard image are subjected to face alignment, and semantic segmentation is performed on the original character image by applying a preset semantic segmentation model.
Specifically, the affine transformation matrix may be decomposed to obtain a rotation matrix, a translation matrix, and a scaling matrix. Then, the original image may be subjected to translation transformation and rotation transformation based on the translation matrix and the rotation matrix, respectively, which may generate relatively small errors, and then the image after translation transformation and rotation transformation may be subjected to scaling transformation, which may generally generate relatively large errors. Therefore, translation transformation and rotation transformation with small errors are firstly carried out, and then scaling transformation with large errors is carried out, so that the superposition error of the whole transformation can be effectively reduced, and the trueness of the transformed image is effectively guaranteed.
It should be noted that, the above-mentioned scheme of performing the translation transformation, the rotation transformation and then the scaling transformation is only a preferred embodiment of the present embodiment, and the present embodiment is not limited thereto.
In this embodiment, the original person image is transformed based on the rotation matrix, the translation matrix, and the scaling matrix to obtain a first transformed image, and the first transformed image may be aligned with the face of the preset standard image.
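A sketch of the staged application described above, assuming the affine matrix is a similarity transform (uniform scale, rotation, translation) estimated from the facial key points; intermediate canvases are kept at the source size for simplicity, so this illustrates the ordering of the transforms rather than a production implementation.

```python
import cv2
import numpy as np

def apply_in_stages(image, matrix, out_size):
    """Split a 2x3 similarity matrix into translation, rotation and scaling
    and apply the lower-error transforms (translation, rotation) before the
    higher-error scaling."""
    a, b = matrix[0, 0], matrix[1, 0]
    scale = float(np.hypot(a, b))              # uniform scale factor
    A = matrix[:, :2]                          # linear part: scale * rotation
    R = A / scale                              # pure rotation
    t = matrix[:, 2]
    t_prime = np.linalg.solve(A, t)            # translation expressed before A,
                                               # so the three stages compose back
                                               # to the original matrix
    h, w = image.shape[:2]
    translation = np.float32([[1, 0, t_prime[0]], [0, 1, t_prime[1]]])
    rotation = np.float32(np.hstack([R, np.zeros((2, 1))]))
    scaling = np.float32([[scale, 0, 0], [0, scale, 0]])

    out = cv2.warpAffine(image, translation, (w, h))
    out = cv2.warpAffine(out, rotation, (w, h))
    out = cv2.warpAffine(out, scaling, out_size)
    return out
```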
After the first transformed image is obtained, in order to obtain a better semantic segmentation result, the first transformed image may be clipped based on a certain rule to obtain a series of transformed images with different sizes, then the original character image and the corresponding series of transformed images are subjected to semantic segmentation respectively to obtain a semantic segmentation result of the original character image (which may be recorded as an initial semantic segmentation result) and a semantic segmentation result of the corresponding series of transformed images, and the initial semantic segmentation result is corrected according to the semantic segmentation result of the series of transformed images corresponding to the original character image to obtain the semantic segmentation result of the original character image.
In this embodiment, the first transformed image may be cropped based on the scaling matrix and an image size corresponding to the scaling matrix with a preset magnification to obtain a series of second transformed images with different sizes. Specifically, as shown in fig. 6, the image with the smallest center position in fig. 6 can be obtained by cropping the first transformed image based on the image size corresponding to the scaling matrix, that is, the size of the preset standard image. And then, based on the image size corresponding to the scaling matrix, carrying out multiple times of amplification according to a preset amplification factor until the size of the image after the next amplification is larger than that of the original character image, thus obtaining a plurality of image sizes, wherein each time of the image sizes is the same as the previous amplification factor, and respectively cutting the image after the first conversion by adopting the plurality of image sizes, thus obtaining a series of second converted images with different sizes corresponding to each original character image.
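A sketch of the multi-scale cropping described above, assuming square center crops starting from the preset standard size; the magnification factor and the centering choice are illustrative.

```python
def multi_scale_crops(image, base_size, magnification=1.5):
    """Produce a series of 'second transformed images': center crops whose
    sizes start at the preset standard size and grow by a fixed factor
    until the next size would exceed the image itself."""
    h, w = image.shape[:2]
    crops, size = [], float(base_size)
    while size <= min(h, w):
        s = int(size)
        top, left = (h - s) // 2, (w - s) // 2
        crops.append(image[top:top + s, left:left + s])
        size *= magnification
    return crops
```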
After the series of second transformed images with different sizes are obtained, a semantic segmentation result corresponding to the original character image can be obtained by adopting a preset semantic segmentation model based on the original character image and the corresponding second transformed images.
And respectively obtaining an initial semantic segmentation result of each human body region of the original character image and a semantic segmentation result of each human body region of each second transformed image by adopting a preset semantic segmentation model based on the original character image and the corresponding second transformed image. Specifically, the original character image and the second transformed image corresponding thereto may be sequentially input to a preset semantic segmentation model, and an obtained model output result is a semantic segmentation result.
And then, the semantic segmentation result of each human body region of the original character image can be corrected according to the semantic segmentation result of each human body region of each second transformed image, so that the semantic segmentation result corresponding to the original character image is obtained, and the semantic segmentation result of the original character image is more accurate.
For each pixel point of the original person image, the segmentation-result confidence of the original person image and/or of each second transformed image is determined based on the initial semantic segmentation result of the original person image and the semantic segmentation results of the human body regions of the corresponding second transformed images, where the segmentation-result confidence is inversely proportional to the image size.
As shown in fig. 6, from the second transformed image with the smallest size (i.e., the image size corresponding to the scaling matrix) up to the original person image, each image yields a semantic segmentation result, and each semantic segmentation result corresponds to a segmentation-result confidence. The segmentation-result confidence represents the reliability or accuracy of the segmentation result. Among all the obtained second transformed images, the smaller the image size, the closer it is to the image size used for training the preset segmentation model; therefore, the smaller the image size, the higher the confidence of the segmentation result corresponding to that second transformed image.
After the confidence degrees of the segmentation results of the original person image and the second transformed images are determined, the position of each pixel point of the original person image, namely the position of the pixel point on which one or more second transformed images are located, can be determined. As shown in fig. 6, each pixel point on the second transformed image with the smallest size is simultaneously located on all the images (including the original person image and each of the second transformed images). Each pixel point located outside the second transformed image with the smallest size and located on the second transformed image with other sizes (the image size corresponding to the scaling matrix with the preset magnification in the figure) may be located on the original person image and one or more second transformed images. And the pixel points outside all the second transformed images are only positioned on the original character image.
Each pixel point of the original character image is positioned on a plurality of images (including the converted image and the original character image), and corresponds to the confidence degrees of a plurality of segmentation results, and the semantic segmentation result with the highest confidence degree of the corresponding segmentation result can be used as the semantic segmentation result of the pixel point, so that the accuracy of the semantic segmentation result of the preset semantic segmentation model on the original character image is further improved.
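One way to realize the per-pixel selection described above is to write the crop results back in order of decreasing crop size, so that the smallest (highest-confidence) crop overwrites the others; this ordering trick is an implementation assumption, not the patent's own description.

```python
import numpy as np

def merge_segmentations(base_labels, crop_results):
    """base_labels:  [H, W] semantic labels of the original person image.
    crop_results:  list of (top, left, labels) for each second transformed
                   image, ordered from the largest crop to the smallest so
                   that the highest-confidence result is written last."""
    merged = base_labels.copy()
    for top, left, labels in crop_results:
        h, w = labels.shape
        merged[top:top + h, left:left + w] = labels
    return merged
```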
In some embodiments, in view of the limitation of the semantic segmentation model, which may have some objective errors, in order to further improve the accuracy of the semantic segmentation result of the original human image, the semantic segmentation result of the original human image may be further modified by other recognition models or classification models. For example, the semantic segmentation result corresponding to the original character image can be modified through the matting result of the matting model. It should be noted that, the present embodiment is not limited to the matting result of the matting model being used to correct the semantic segmentation result corresponding to the original character image, and may be any embodiment as long as each part of the character image can be accurately classified so as to correct the semantic segmentation result.
In general, the matting model can accurately identify foreground pixel regions and background pixel regions. Namely, the original character image is input into the matting model, and the foreground pixel area and the background pixel area of the original character image can be accurately divided through the matting processing of the matting model.
And when the semantic segmentation result corresponding to the original figure image is corrected by adopting a preset matting model, if the matting result of a certain pixel point is inconsistent with the semantic segmentation result, taking the matting result as the semantic segmentation result of the pixel point.
Specifically, if the matting result indicates that the first pixel is a background pixel and the semantic segmentation result indicates that the first pixel is a foreground pixel, the semantic segmentation result of the first pixel is corrected to be the background pixel. If the matting result indicates that the first pixel point is a foreground pixel and the semantic segmentation result indicates that the first pixel point is a background pixel, determining a target pixel point which is closest to the first pixel point and different from the semantic segmentation result of the first pixel point, and determining the semantic segmentation result of the target pixel point as the semantic segmentation result of the first pixel point.
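A sketch of the two correction rules above, assuming a label map in which 0 denotes background and a boolean foreground matte; the nearest-neighbor lookup via a distance transform is one illustrative way to find "the closest pixel whose segmentation result differs".

```python
import numpy as np
from scipy import ndimage

BACKGROUND = 0

def correct_with_matte(seg, matte_fg):
    """seg:      [H, W] semantic labels (0 = background, >0 = body parts).
    matte_fg: [H, W] boolean foreground mask from the matting model."""
    corrected = seg.copy()

    # Rule 1: matting says background but segmentation says foreground.
    corrected[(~matte_fg) & (seg != BACKGROUND)] = BACKGROUND

    # Rule 2: matting says foreground but segmentation says background ->
    # copy the label of the nearest non-background pixel.
    conflict = matte_fg & (seg == BACKGROUND)
    if conflict.any():
        _, inds = ndimage.distance_transform_edt(
            seg == BACKGROUND, return_indices=True)
        iy, ix = inds
        corrected[conflict] = seg[iy[conflict], ix[conflict]]
    return corrected
```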
After the semantic segmentation result corresponding to the original character image is modified, each human body region in the original character image can be labeled according to the modified semantic segmentation result.
In this embodiment, in order to further improve the segmentation accuracy between the neck and the body (clothes), the annotators may further modify the plurality of original person images by using a pixel annotation tool. Specifically, the semantic information of the clothes may be adjusted, and the entire clothes region, that is, the remaining region after removing the head and the neck, is marked as the body, so as to obtain semantic segmentation data labeled with the face, the neck, and the body.
The annotated original character image may be stored in a training dataset for model training of a network structure constructed based on an attention-based segmentation model.
In this embodiment, after obtaining the labeled original person image, the data enhancement database may perform data enhancement on the labeled original person image to obtain an enhanced data set.
Specifically, the original person image can be enhanced by using a data enhancement library with a good enhancement effect. Such a library is based on the highly optimized OpenCV library, so the processing speed is high and rapid data enhancement of the original person image can be realized. It can transform images, masks, key points and bounding boxes simultaneously, making it easier to extend to other tasks, and it can provide specific API interfaces for different image processing tasks, such as segmentation and detection, which makes personalized customization of the data set easier to realize. It is also easy to integrate into other frameworks such as PyTorch, and performs particularly well in medical image processing.
In addition, although using uniformly sized pictures makes it easier to train the model, the robustness of the resulting model is poor: if the aspect ratio of a picture to be segmented differs from that of the training pictures, the semantic range of the picture also differs, for example containing only the face or only the neck and clothes, and the model inference results do not perform well enough. Therefore, in this embodiment, multi-scale random cropping is performed on part of the images in the enhanced data set to obtain more images with different aspect ratios and larger differences in semantic range, and the training data set for training the network structure of the attention-based segmentation model is obtained based on the enhanced data set and the results of the multi-scale random cropping.
B3: and constructing a network structure of the segmentation model based on the attention mechanism.
The present embodiment is directed to constructing an image segmentation model with higher accuracy and faster processing speed, and based on the purpose, an analysis and research are performed on an existing network structure that can be used for image segmentation, and based on the research result, it is found that a classification network based on an attention mechanism is used as a backbone network, and not only can perform classification, but also can perform target detection, model migration learning of semantic segmentation, and the like.
In this embodiment, when constructing the network structure of the segmentation model, the Swin-Transformer model, an improved version of the Transformer model, is used. The Swin-Transformer model adopts the basic network structure of the Transformer model and, in order to obtain a better image segmentation result and a more efficient image segmentation model, improves the network structure of the Transformer model as follows: 1) a hierarchical construction mode commonly used in CNNs (Convolutional Neural Networks) is introduced to construct a hierarchical Transformer; 2) the idea of locality is introduced (local sensitivity: points that are close in space have a high probability of colliding after mapping, while points that are far apart have a low probability of colliding after mapping), and self-attention is computed within non-overlapping window regions (regions with no overlapping data).
The specific network architecture of the image segmentation model of this embodiment adopts a pyramid (hierarchical) structure, i.e., the model is divided into different layers (stages). As shown in FIG. 7, the network structure of the Swin-Transformer model may include an image input layer, a Patch Partition layer, and four computation layers. The application process is as follows:
firstly, the person image is input from the image input layer into the Patch Partition layer for pixel blocking; generally every 4 × 4 adjacent pixels form a patch, which is then flattened in the channel direction. Taking an RGB three-channel person image of size [H, W, 3] as an example, each patch has 4 × 4 = 16 pixels, each pixel has the three values R, G and B, and flattening gives 16 × 3 = 48; therefore, after Patch Partition, the shape (size) of the person image changes from [H, W, 3] to [H/4, W/4, 48].
Then the first-layer computation is performed: the channel data of each patch first undergoes a linear transformation through a Linear Embedding layer, which changes the channel dimension of each patch from 48 to C, i.e., the shape of the output person image changes from [H/4, W/4, 48] to [H/4, W/4, C]. Two Swin Transformer Blocks are then stacked.
Then the second-layer computation is performed: downsampling is done through a Patch Merging layer to reduce the resolution and adjust the number of channels, thereby forming the hierarchical design and saving a certain amount of computation. Assuming the input to Patch Merging is a single-channel feature map of size 4 × 4, Patch Merging groups each 2 × 2 block of adjacent pixels into a patch, and the pixels at the same position (same color) in each patch are pieced together to obtain 4 feature maps. These four feature maps are subsequently concatenated in the depth direction and passed through a LayerNorm layer. Finally, the depth of the concatenated feature map is linearly changed through a fully connected layer and halved (from 4 to 2 in this example, or from 4C to 2C in general). As can be seen from this simple example, after passing through the Patch Merging layer the height and width of the feature map are halved and its depth is doubled. Two more Swin Transformer Blocks are then stacked. Each downsampling can be set to a factor of two, with pixels selected at an interval of 2 in the row and column directions, spliced together as one tensor and then expanded, so that the number of channels becomes 4 times the original and is then adjusted to 2 times the original through the fully connected layer; the shape of the output person image thus changes from [H/4, W/4, C] to [H/8, W/8, 2C].
And sequentially performing third-layer calculation and fourth-layer calculation, wherein the third-layer calculation and the fourth-layer calculation are basically the same as the second-layer calculation, and Swin Transformer blocks with the same or different numbers can be stacked.
The structure of a Swin Transformer Block, shown in fig. 8, may be either of the two structures in fig. 8, and the two structures may also be applied stacked together. The process of entering the first Block is as follows: the feature map (the output data of the previous layer) first passes through a Layer Norm (LN) layer, then through W-MSA (Windows Multi-head Self-Attention), after which a residual connection is made; the connected feature map passes through a Layer Norm layer again, then through the fully connected layer MLP, and another residual connection is made. The feature map then enters the second Block where, similarly, it first passes through a Layer Norm layer, then through SW-MSA, followed by a residual connection; the connected feature map passes through a Layer Norm layer again, then through the fully connected layer MLP, followed by another residual connection.
Among them, W-MSA may divide the feature map into multiple windows of size M × M (e.g., M = 2) and then perform self-attention separately inside each window to reduce the amount of computation. SW-MSA (Shifted Windows Multi-Head Self-Attention) differs from W-MSA in that its windows are shifted, and the shift can be win_size/2. W-MSA and SW-MSA can be used in pairs: assuming W-MSA is used for the L-th layer and SW-MSA is used for the (L+1)-th layer, the windows of the two layers are shifted relative to each other (it can be understood that the windows are shifted from the top left toward the right and downward by a number of pixels), thereby enabling communication between the windows of the L-th layer.
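A minimal PyTorch sketch of the window partition used by W-MSA and the cyclic shift used by SW-MSA, assuming a [B, H, W, C] feature map whose height and width are divisible by the window size; the attention mask needed for the shifted windows is omitted.

```python
import torch

def window_partition(x, window_size):
    """Split a [B, H, W, C] feature map into non-overlapping M x M windows,
    returning [num_windows * B, M, M, C]; self-attention is then computed
    inside each window independently."""
    b, h, w, c = x.shape
    m = window_size
    x = x.view(b, h // m, m, w // m, m, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m, m, c)

def shift_for_sw_msa(x, window_size):
    """Cyclically shift the feature map by window_size // 2 so that the
    shifted windows of layer L+1 straddle the window borders of layer L."""
    s = window_size // 2
    return torch.roll(x, shifts=(-s, -s), dims=(1, 2))
```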
It should be noted that, in this embodiment, the specific implementation sequence of the step (B1 and B2) of acquiring the training data set and the step (B3) of constructing the network structure of the segmentation model is not specifically limited, that is, the step B3 may be performed first to construct the network structure of the segmentation model based on the attention mechanism, and then the step B1 and the step B2 may be performed, which does not affect the result of training the model.
B4: and training the network structure of the segmentation model according to the training data set to obtain an image segmentation model for segmenting the human body region.
After the network structure of the segmentation model is constructed, the model structure may be trained by using the data in the training data set, so as to obtain an image segmentation model with higher accuracy of segmentation result and higher segmentation efficiency.
The number of sample images acquired from the training data set in each training period may be multiple. The acquired sample images can be input into the Swin Transformer model; each sequentially connected layer structure in the Swin Transformer model extracts features from the sample image, and self-attention calculation is performed on the extracted feature maps to obtain the segmentation result of the sample image.
In some embodiments, only body and head and neck labels are labeled in the sample image. Correspondingly, only the probability of belonging to the body or to the head and neck of each region in the sample image may be included in the segmentation result.
The overall loss value of the current training period is calculated according to the segmentation result and the labeling information of each sample image. When only foreground and background are distinguished, a small target contains only a few pixels, so its loss can change greatly, the gradient can change drastically, and training becomes unstable; using the cross-entropy loss function alone is therefore very disadvantageous for small targets. For this reason, this embodiment trains the Swin Transformer model using the Generalized Dice loss function in addition to the original cross-entropy loss function, and the Dice loss and the cross-entropy loss can be set to a preset ratio, such as 1:1.
According to the segmentation result and the labeling information of each sample image, the cross entropy loss value of the current training period is calculated through a formula (1) included in a cross entropy loss function. Calculating the Dice loss value of the current training period through a formula (2) included by a Dice loss function according to the segmentation result and the labeling information of each sample image; and then, calculating the overall loss value of the current training period according to the cross entropy loss value and the Dice loss value through a formula (3).
L_ce = −(1/N) Σ_{i=1..N} Σ_{c=1..m} y_ic · log(p_ic) (1)
L_gd = 1 − 2 · ( Σ_c w_c Σ_i y_ic · p_ic ) / ( Σ_c w_c Σ_i (y_ic + p_ic) ) (2)
L_total = λ_1 · L_ce + λ_2 · L_gd (3)
In the above formulas (1) and (2), L_ce represents the cross-entropy loss value, L_gd denotes the Generalized Dice loss value, N denotes the total number of pixel positions, i denotes the pixel position, c denotes the label type (segmentation result), m denotes the total number of label types, p_ic indicates the segmentation result (predicted probability) of class c at the i-th position, y_ic indicates whether the true annotation class of the i-th position is class c, and Σ_c y_ic · p_ic denotes the probability that the i-th position is predicted as its true label class. λ_1 and λ_2 respectively represent the weights of L_ce and L_gd (in this embodiment λ_1 = 1, λ_2 = 1).
On the basis of the original loss, a detection weight w for each segmented class in portrait detection is introduced; the weight w_c of each category is calculated according to its pixel area ratio, so that the contributions of the target areas of the various categories (including the background category) to the loss can be balanced.
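A sketch of the combined loss in formula (3), assuming PyTorch logits and integer labels; the class weight w_c here follows the common Generalized Dice convention of weighting by the inverse squared class area, which is consistent with the "pixel area ratio" description above but is an assumption.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, lambda_ce=1.0, lambda_gd=1.0, eps=1e-6):
    """logits: [B, m, H, W] raw scores for m label classes.
    target: [B, H, W] integer ground-truth labels."""
    l_ce = F.cross_entropy(logits, target)                    # formula (1)

    probs = torch.softmax(logits, dim=1)                      # p_ic
    onehot = F.one_hot(target, probs.shape[1])                # y_ic
    onehot = onehot.permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    w = 1.0 / (onehot.sum(dims) ** 2 + eps)                   # area-based class weight
    intersect = (w * (probs * onehot).sum(dims)).sum()
    union = (w * (probs + onehot).sum(dims)).sum()
    l_gd = 1.0 - 2.0 * intersect / (union + eps)              # formula (2)

    return lambda_ce * l_ce + lambda_gd * l_gd                # formula (3)
```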
In order to further optimize the Swin Transformer model, this embodiment also optimizes the attention-based segmentation model after each training period through an optimizer and a learning rate scheduler. Specifically, the popular AdamW optimizer may be used, which more easily trains model performance equivalent to SGD + Momentum without the complex tuning the latter requires. The model jumps out of local optima through a cosine annealing learning rate, so that a better model is obtained by training. The learning rate scheduler formula may be as shown in formula (4):
η_t = η_min + (1/2) · (η_max − η_min) · (1 + cos( (T_cur / T_i) · π )) (4)
In the above formula (4), η_min represents the minimum learning rate, η_max represents the maximum learning rate, T_cur is the number of epochs (training rounds) since the last learning rate reset, and T_i indicates after how many epochs the learning rate is reset. When T_cur = T_i, η_t = η_min; when the learning rate is reset, T_cur = 0 and η_t = η_max.
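A usage sketch with PyTorch's built-in AdamW optimizer and CosineAnnealingLR scheduler, which implements the cosine-annealing schedule of formula (4); the learning rates, weight decay and per-epoch stepping are illustrative choices.

```python
import torch

def train(model, train_loader, num_epochs, loss_fn):
    """Train the segmentation model with AdamW and a cosine-annealed
    learning rate (CosineAnnealingLR realizes formula (4))."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=num_epochs, eta_min=1e-6)   # T_max plays the role of T_i
    for _ in range(num_epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()   # update the learning rate once per epoch
    return model
```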
B5: and carrying out human body region segmentation on the user image and the clothes template image through the image segmentation model.
For the user image and the clothing template image, preprocessing operations such as image alignment, denoising and the like can be carried out, then the user image and the clothing template image are input into the trained image segmentation model, the image segmentation model carries out image segmentation on the input user image and the clothing template image, and the probability that each pixel point in the user image and the clothing template image belongs to each human body region category is output.
In one implementation, the trained image segmentation model outputs only the probability that each pixel belongs to each human body region, where the human body regions comprise the body and the head-and-neck. If the probability that a pixel belongs to the body is greater than 0.5, the pixel is a pixel of the body region; if the probability that the pixel belongs to the body is less than 0.5, the human body region to which the pixel belongs is not the body region but the head-and-neck region.
In the embodiment of the application, when the image segmentation model for image segmentation of the character image is trained, the annotation preprocessing is performed on each human body region in the original character image of each training data set based on the preset semantic segmentation model and the preset matting model, so that the type of the human body region of the original character image can be accurately determined, the image annotation efficiency and the annotation accuracy can be greatly improved in the annotation process, and the acquisition efficiency of the training data set is effectively improved. The training data obtained based on the labeling process is adopted to train the network structure of the segmentation model based on the attention mechanism, so that the image segmentation speed and accuracy of the image segmentation model obtained by training can be improved, and a good segmentation effect can be obtained on the neck and body parts far away from the face.
In other embodiments of the present application, before the final change of the clothing template image and the user image, the portrait is extracted from the user image and the clothing template image, and the background images other than the extracted portrait are replaced with the preset background images.
In the virtual reloading scene, special requirements may be imposed on the background where the image is located after reloading, for example, in the virtual reloading scene of the identification photo, the image background in the reloading effect picture may be required to be a pure color background of blue, white or red. After the portrait is respectively extracted from the user image and the dress template image, the background image except the extracted portrait in the user image is replaced by a preset background image. Similarly, the background image except the extracted portrait in the clothing template image is replaced by the preset background image.
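A sketch of the background replacement, assuming the portrait extraction step yields an alpha matte in [0, 1]; the solid background colour (e.g., the blue/white/red ID-photo backgrounds mentioned above) and the array layout are illustrative.

```python
import numpy as np

def replace_background(image, alpha, bg_color=(255, 0, 0)):
    """Composite the extracted portrait over a preset solid background.
    image: [H, W, 3] uint8 image, alpha: [H, W] float matte in [0, 1]."""
    background = np.empty_like(image)
    background[...] = bg_color
    a = alpha[..., None].astype(np.float32)
    out = a * image.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return out.astype(np.uint8)
```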
The neck key points corresponding to the user image and the clothes key points corresponding to the clothes template image are obtained through the method, the face alignment processing is carried out on the user image and the clothes template image, and human body regions such as a face region, a hair region, a neck region, a body region and the like are respectively separated from the user image and the clothes template image. And moreover, the portrait is extracted from the user image and the clothing template image, and the background image is replaced. After the operations are completed, virtual reloading can be carried out by using the user image and the dress template image.
In the virtual reloading scene, when the user's own neck region in the user image is used for the clothes change, the color and brightness of the user's face and the user's own neck are the most natural after the change, and the reloading effect is good. However, in practical applications, the neck region in the user image may be occluded, so that it cannot be used for the clothes change, and only the neck of the model in the clothing template image can be used instead.
In this case, before the final change-over operation is performed in this step, whether or not the neck in the user image is blocked is also detected by the following operation in step 103.
Step 103: and detecting whether the neck in the user image is blocked, if not, executing the step 104, and if so, executing the step 105.
The embodiment of the application trains a neck occlusion detection model capable of detecting whether the neck region is occluded. The specific training process includes:
d1: a data set is acquired, the samples in the data set including positive samples and negative samples.
The positive sample refers to that the neck is not occluded, and the negative sample refers to that the neck is occluded, for example, the neck in the sample is occluded by an object such as hair or clothes.
It should be added that, after a data set including positive samples and negative samples is obtained, one data enhancement process may be performed on each sample in the data set, and the processed sample is added to the data set to expand the data set and improve the model performance.
The data enhancement category may include clipping, flipping, morphing, color transformation, illumination transformation, and the like.
D2: traversing each sample in the data set, inputting the currently traversed sample into a pre-constructed neck shielding detection model, predicting whether the sample is shielded by the neck shielding detection model, and outputting a prediction result.
D3: and calculating a loss value by using the prediction result, the positive and negative sample balance coefficients and the sample number balance coefficient.
Wherein, the calculation formula of the loss function is as follows:
Loss(p_t) = −α · (1 − p_t)^γ · log(p_t) (formula 1)
In the above formula 1, α represents the positive and negative sample balance coefficient with 0 < α < 1, γ represents the sample number balance coefficient with 0 < γ < 5, and p_t represents the probability that the model predicts that the input sample belongs to a positive sample.
It should be noted that α and γ are hyper-parameters; both are updated while the network parameters are optimized, so as to balance the contributions of the various samples to the loss.
As can be seen from Formula 1, α is a parameter smaller than 1 and is used to balance the uneven ratio of positive to negative samples, reducing the loss contributed when the predicted probability is already high. γ is a parameter greater than 0: when the predicted probability of a positive sample is small, that sample is hard to classify and its loss is larger, so γ down-weights easy samples, makes the model focus on hard samples, and has a smoothing effect.
The positive/negative sample balance coefficient addresses the imbalance between positive and negative samples during training, and the sample number balance coefficient addresses the imbalance between easy and hard samples. Calculating the loss with both coefficients keeps the loss from fluctuating sharply, so that model training converges stably.
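A minimal PyTorch sketch of a loss matching Formula 1 is given below; the class name, the use of PyTorch and the default coefficient values are illustrative assumptions, not part of the original disclosure.
import torch
import torch.nn as nn

class BalancedFocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        # alpha: positive/negative sample balance coefficient (0 < alpha < 1)
        # gamma: sample number (easy/hard) balance coefficient (0 < gamma < 5)
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        # logits: raw scores of shape (N,); targets: 0/1 labels of shape (N,)
        p = torch.sigmoid(logits)
        p_t = torch.where(targets == 1, p, 1.0 - p)   # probability of the true class
        eps = 1e-7
        loss = -self.alpha * (1.0 - p_t) ** self.gamma * torch.log(p_t + eps)
        return loss.mean()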
D4: and when the change rate of the loss value is greater than the change threshold value, optimizing the network parameters of the neck shielding detection model according to the loss value, adjusting the positive and negative sample balance coefficients and the sample number balance coefficient, and continuously executing the process of the step D2.
The optimization of the network parameters of the neck shielding detection model can be optimized by using a preset optimizer, such as an Adamw optimizer.
In a possible implementation manner, for the adjustment process of the positive and negative sample balance coefficients and the sample number balance coefficient, the adjustment step lengths of the two coefficients may be adjusted by setting a threshold, that is, if the loss value is greater than the preset threshold, the positive and negative sample balance coefficients and the sample number balance coefficient are both increased by a first step length, and if the loss value is less than the preset threshold, the positive and negative sample balance coefficients and the sample number balance coefficient are both increased by a second step length.
The first step length is larger than the second step length, that is, when the loss value is higher, the adjustment range of the two coefficients is larger, and when the loss value is lower, the adjustment range of the two coefficients is smaller.
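A minimal sketch of this adjustment rule; the threshold and step values are assumptions chosen only for illustration.
def adjust_balance_coefficients(alpha, gamma, loss_value,
                                loss_threshold=1.0, first_step=0.05, second_step=0.01):
    step = first_step if loss_value > loss_threshold else second_step
    alpha = min(alpha + step, 0.99)   # keep 0 < alpha < 1
    gamma = min(gamma + step, 4.99)   # keep 0 < gamma < 5
    return alpha, gamma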
D5: and when the change rate of the loss value is smaller than the change threshold value, ending the training process.
It will be understood by those skilled in the art that other index conditions, such as a condition that the accuracy of the model is higher than a certain value and the recall rate is higher than a certain value, are also included in the model training end condition, and these index conditions all belong to the conventional model training end condition and do not form a limitation to the scope of the present invention.
Therefore, the model training process is completed, the problem that positive and negative samples are unbalanced in the training process can be solved through the positive and negative sample balance coefficients, and the problem that difficult samples are unbalanced in the training process can be solved through the sample number balance coefficient, so that loss can be guaranteed not to change greatly by using the positive and negative sample balance coefficients and the sample number balance coefficient for loss calculation, and the model training can be stably converged.
After the neck occlusion detection model is trained, whether the neck region in the user image is occluded is detected through the following steps E1 and E2.
E1: determining the image to be detected according to the proportion of the target region in the user image.
The image to be detected is the input image on which the neck occlusion detection model can make an accurate prediction.
In a possible implementation, a face region containing the neck is detected in the user image as the target region, and the ratio of the area of the target region to the area of the user image is computed. If the ratio exceeds a ratio threshold, the face-and-neck region occupies a large share of the image and the background has little influence, so the user image can be used directly as the image to be detected. If the ratio does not exceed the threshold, the face-and-neck region occupies a small share of the image, the background influence is large, and the model may not detect accurately; in that case the target region is extracted from the user image, enlarged, and used as the image to be detected, which improves the accuracy of the detection. A sketch of this selection is given below.
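A minimal sketch of step E1, assuming the face-plus-neck target region is given as a bounding box (x, y, w, h); the ratio threshold and the model input size are illustrative assumptions.
import cv2

def select_image_to_detect(user_image, target_box, ratio_threshold=0.35,
                           model_input_size=(224, 224)):
    x, y, w, h = target_box
    img_h, img_w = user_image.shape[:2]
    area_ratio = (w * h) / float(img_w * img_h)
    if area_ratio >= ratio_threshold:
        # The target dominates the picture; the background has little influence.
        return cv2.resize(user_image, model_input_size)
    # Otherwise crop the target region and enlarge it to suppress the background.
    crop = user_image[y:y + h, x:x + w]
    return cv2.resize(crop, model_input_size, interpolation=cv2.INTER_LINEAR)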
It should be added that, before the image to be detected is determined, a preset category of data enhancement may be applied to the user image to improve the robustness of the detection.
Illustratively, the preset categories may include flipping, deformation, color transformation, illumination transformation, and the like.
E2: inputting the image to be detected into the trained neck occlusion detection model, which judges whether the neck in the image to be detected is occluded.
The structure of the neck occlusion detection model is shown in fig. 9: a first branch network applies a first preset convolution processing to the image to be detected to obtain a first feature map, and a second branch network applies a second preset convolution processing to obtain a second feature map; a fusion module then fuses the two feature maps by channel to obtain a fused feature map, and an output layer produces the occluded/not-occluded judgment from the fused feature map.
The first branch network and the second branch network run in parallel and independently, each performing only simple convolutions, and the fusion module fuses by channel, so the computation time is low.
For example, the output layer may implement the occluded/not-occluded classification with a fully connected layer.
In a possible implementation, as shown in fig. 9, the first branch network includes a first depth-separable convolution layer and a first up-sampling layer. In the first preset convolution processing, one convolution kernel of the first depth-separable convolution layer extracts features from one channel of the image to be detected to obtain a single-channel feature map, and the 1×1 convolution kernels of the first up-sampling layer (with a preset number of channels) then raise the dimension of this single-channel feature map to obtain the first feature map.
That is, the first depth-separable convolution layer randomly selects one channel and applies one convolution kernel to it to extract feature information and output a single-channel feature map, and the first up-sampling layer raises the dimension of that feature map with multi-channel 1×1 convolution kernels, outputting a first feature map with the same number of channels as the input image.
It can be seen that the number of channels of the 1 × 1 convolution kernel used in the first upsampling layer is the number of channels of the input image.
In a possible implementation, as shown in fig. 9, the second branch network includes a down-sampling layer, a second depth-separable convolution layer and a second up-sampling layer. In the second preset convolution processing, the single-channel 1×1 convolution kernel of the down-sampling layer reduces the image to be detected to a single-channel feature map, one convolution kernel of the second depth-separable convolution layer extracts features from this map, and the 1×1 convolution kernels of the second up-sampling layer (with a preset number of channels) raise the dimension of the extracted feature map to obtain the second feature map.
Because the feature map entering the second depth-separable convolution layer is already single-channel, that layer applies one convolution kernel to it to extract feature information, and the second up-sampling layer then raises the dimension of the extracted feature map with multi-channel 1×1 convolution kernels, outputting a second feature map with the same number of channels as the input image.
It can be seen that the number of channels of the 1 × 1 convolution kernel used by the second upsampling layer is also the number of channels of the input image.
In a possible implementation, as shown in fig. 9, the fusion module includes a channel splicing layer and a channel mixing layer. In the channel fusion of the first feature map and the second feature map, the channel splicing layer concatenates the two feature maps along the channel dimension, and the channel mixing layer then shuffles the channel order of the concatenated feature map to obtain the fused feature map.
Concatenating the feature maps by channel through the channel splicing layer and then randomly shuffling the channel order through the channel mixing layer improves the robustness of the model.
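A hedged PyTorch sketch of the two-branch structure described above and shown in fig. 9. The use of PyTorch, the 3×3 kernel sizes, and the choice of the first channel for branch 1 are assumptions made for illustration only.
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # Standard channel shuffle: regroup channels so the two branches are mixed.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class NeckOcclusionNet(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        # Branch 1: one kernel on a single input channel, then 1x1 kernels that
        # restore the channel count of the input image.
        self.b1_conv = nn.Conv2d(1, 1, 3, padding=1)
        self.b1_up = nn.Conv2d(1, in_channels, 1)
        # Branch 2: 1x1 down-dimension to one channel, one convolution on that
        # channel, then 1x1 up-dimension back to the input channel count.
        self.b2_down = nn.Conv2d(in_channels, 1, 1)
        self.b2_conv = nn.Conv2d(1, 1, 3, padding=1)
        self.b2_up = nn.Conv2d(1, in_channels, 1)
        # Output layer: global pooling + fully connected classification.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2 * in_channels, 2))

    def forward(self, x):
        f1 = self.b1_up(self.b1_conv(x[:, :1]))         # first feature map
        f2 = self.b2_up(self.b2_conv(self.b2_down(x)))  # second feature map
        fused = torch.cat([f1, f2], dim=1)              # channel splicing
        fused = channel_shuffle(fused, groups=2)        # channel mixing
        return self.head(fused)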
In this processing flow of the neck occlusion detection model, the two parallel branch networks perform different, lightweight convolutions to extract features before fusion, so the computation cost is lower than that of a conventional CNN architecture, which makes the model easy to migrate to mobile terminals.
Detecting neck occlusion in this way, with a deep learning model dedicated to the task, gives an accurate prediction of whether the neck region in the image is occluded and provides a reference for the subsequent reloading operation, ensuring that the user's neck region is not used for reloading when it is occluded and that a good reloading effect is obtained. Moreover, because the image to be detected is chosen according to the proportion of the target region in the user image, the influence of the image background on the model is reduced and the output is more accurate.
Step 104: generating a reloading effect image corresponding to the user image and the clothing template image by using the neck region in the user image, according to the neck key points and the clothing key points.
If the neck region in the user image is not occluded, the reloading effect image is generated using the neck region of the user image, according to the neck key points and the clothing key points. Specifically, the reloading is carried out through the following steps F1 to F3.
F1: and repairing the neck area in the user image according to the neck key point, the clothes key point, the user image and the clothes template image.
This step is to repair the neck region by the following operations F11 to F13.
F11: and generating a mask image of a to-be-repaired area of the neck in the user image according to the key points of the neck, the key points of the clothes, the user image and the template image of the clothes.
Firstly, according to a user image, a neck mask image corresponding to a neck region in the user image is obtained. Specifically, after a user image is obtained, all face key points in the user image are detected through a preset face key point detection model, and face alignment processing is performed on the user image based on the face key points. After the alignment processing, the neck region is segmented from the user image through a preset semantic segmentation model or an image segmentation model obtained by training in the text. The preset semantic segmentation model can be a model such as HRNet, FCN (full Convolution Network), U-Net and the like.
Taking an HRNet model as an example, the preset semantic segmentation model can be subjected to migration learning training by using HRNet, and subnets for connecting high resolution to low resolution in parallel are used, so that the high resolution can be maintained. The hrnet uses repeated multi-scale fusion, and utilizes low-resolution representation with the same depth and similar level to improve high-resolution representation, so that the high-resolution representation is also sufficient in estimation of the posture, and a preset semantic segmentation model shows more accurate prediction capability on segmentation of a neck region, and the hrnet network architecture is shown in fig. 10.
And after a neck region in the user image is obtained through a preset semantic segmentation model, a neck mask image corresponding to the neck region is generated. Specifically, according to the coordinates of each pixel point on the contour line of the segmented neck region, a mask region with the same shape and size as the neck region is drawn in the blank texture map, the pixel value of each pixel point in the mask region is set to be a preset value, and a neck mask image corresponding to the neck region in the user image is obtained. The preset values may be 255, 253, 248, etc.
The pixel values of the pixel points in the mask area, which is the same as the neck area in shape and size, in the neck mask image are preset values, and the pixel values of the pixel points outside the mask area are 0.
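A minimal sketch of drawing such a mask: the contour coordinates are filled into a blank texture map and the pixels inside are set to the preset value. The function name is illustrative.
import numpy as np
import cv2

def draw_region_mask(image_shape, contour_points, preset_value=255):
    # image_shape: (height, width) of the source image;
    # contour_points: N x 2 array of (x, y) pixel coordinates on the region contour.
    mask = np.zeros(image_shape[:2], dtype=np.uint8)   # blank texture map
    cv2.fillPoly(mask, [np.asarray(contour_points, dtype=np.int32)], preset_value)
    return mask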
For the clothing template image, a clothing mask image is obtained according to the neck key points, the clothing key points and the clothing template image. Specifically, the user image and the clothing template image are acquired, the user image is aligned with a preset standard face image, and the neck key point detection model trained herein is used to identify the neck key points in the aligned user image. These neck key points include at least two key points that are symmetric about the vertical central axis of the neck; they can be the two points where the neck meets the shoulders. The clothing key points in the clothing template image are then aligned with the neck key points. The clothing key points are the two neckline key points symmetric about the vertical central axis of the garment, i.e. the two highest points where the left and right sides of the neckline meet the neck. The aligned clothing template image is determined as the clothing mask image.
In another implementation, according to the coordinates of each pixel on the contour of the clothing region in the clothing template image, a contour region with the same shape and size as the clothing region is drawn in a blank texture map by coordinate indexing, and the pixel values inside the contour region are set to a preset value to obtain the clothing mask image. The preset value may be 255, 252, 248, or the like.
In the clothing mask image, the pixels inside the contour region (which has the same shape and size as the clothing region) take the preset value, and the pixels outside the contour region are 0.
After the neck mask image and the clothing mask image are obtained as above, the mask image of the region to be repaired in the user image is generated from them. Specifically, the neck mask image and the clothing mask image are stitched together according to the neck key points in the neck mask image and the clothing key points in the clothing mask image, yielding a stitched mask image.
Because the clothing key points in the clothing mask image are aligned with the neck key points in the neck mask image, the coordinates of the two (left and right) neckline key points equal the coordinates of the two (left and right) neck key points. The left and right clothing key points therefore coincide with the left and right neck key points, the two mask images can be stitched together, and in the resulting stitched mask image the clothing mask region sits below the neck mask region.
Next, a region to be detected is determined in the stitched mask image, from the edge of the garment collar up to the upper edge of the neck region. The upper boundary of the neck region and the upper boundary of the garment neckline are determined, and the minimum circumscribed rectangle covering both boundaries is drawn. This circumscribed rectangle is the region to be detected, and the region to be repaired is then determined within it.
In one implementation, the source image of every pixel in the region to be detected is examined. All pixels whose source image is neither the neck mask image nor the clothing mask image are selected, and the region formed by them is the region to be repaired. If no pixel is selected, the neck region is complete and the area of the region to be repaired is 0.
In another implementation, for each column of pixels in the region to be detected, the difference between the ordinate of the collar-edge pixel and the ordinate of the pixel on the lower edge of the neck region in that column is computed. The collar-edge and neck-lower-edge pixels whose difference is greater than zero are marked, and the region enclosed by the lines connecting all marked pixels is the region to be repaired (see the sketch below). If the difference is 0 in every column, the neck region is complete and the area of the region to be repaired is 0.
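A minimal sketch of this second implementation, assuming the stitched neck mask and clothing mask are available as two binary masks of the same size: for every pixel column, the gap between the lower edge of the neck and the upper edge of the collar, when positive, is added to the region to be repaired.
import numpy as np

def repair_region_mask(neck_mask, clothing_mask, preset_value=255):
    h, w = neck_mask.shape
    repair = np.zeros((h, w), dtype=np.uint8)
    for x in range(w):
        neck_rows = np.where(neck_mask[:, x] > 0)[0]
        collar_rows = np.where(clothing_mask[:, x] > 0)[0]
        if neck_rows.size == 0 or collar_rows.size == 0:
            continue
        neck_bottom = neck_rows.max()      # lower edge of the neck region
        collar_top = collar_rows.min()     # upper edge of the garment collar
        if collar_top - neck_bottom > 0:   # a gap covered by neither mask
            repair[neck_bottom + 1:collar_top, x] = preset_value
    return repair                          # all zero means the neck region is complete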
After the region to be repaired is determined in either way, its mask image is generated. Specifically, according to the coordinates of each pixel on the contour of the region to be repaired, a contour region with the same shape and size is drawn in a blank texture map by coordinate indexing, and the pixel values inside it are set to a preset value, yielding the mask image of the region to be repaired. The preset value may be 255, 252, 248, or the like.
In the mask image of the region to be repaired, the pixels inside the contour region take the preset value, and the pixels outside it are 0.
If the area of the region to be repaired is determined to be 0 by either method, the generated mask image of the region to be repaired is simply the blank texture map.
In the embodiment of the present application, the neck mask image is obtained from the user image, the clothing mask image is obtained from the clothing template image, and the two are used to generate the mask image of the region to be repaired. This gives an accurate way to compute the region to be repaired, so the neck repair for virtual reloading is better targeted: the subsequent repair does not affect the original neck region in the user image, the accuracy of the neck repair is improved, and the reloading effect is improved.
After the mask image of the region to be repaired is generated, it is checked whether the area of the region to be repaired is greater than 0. If it is, the region is repaired through steps F12 and F13 before the reloading. If the area is 0, the original neck region in the user image is complete and needs no repair; steps F12 and F13 are skipped, and the subsequent operations are carried out directly after the preset neckline key points in the clothing template image are aligned with the neck key points in the user image.
F12: and if the area of the area to be repaired in the mask image of the area to be repaired is larger than 0, generating a background repair image corresponding to the user image.
When it is determined in step F11 that the area of the region to be repaired in the mask image of the region to be repaired is greater than 0, it indicates that the original neck region in the user image is incomplete and needs to be repaired. At this time, the background restoration image corresponding to the user image is stored.
First, the dominant color of the neck region in the user image is extracted. Specifically, all color values contained within the neck region of the user image are determined. And counting the number of pixel points corresponding to each color value in the neck region. And determining the color value with the largest number of pixel points as the dominant color of the neck subarea.
And then drawing a pure-color background image corresponding to the main color of the neck region, deducting the image of the head and neck region from the user image, and covering the image of the head and neck region into the pure-color background image to obtain a background repairing image corresponding to the user image.
When the image of the head and neck region is overlaid into the pure-color background image, the image of the head and neck region can be overlaid into the pure-color background image in a coordinate indexing manner according to the coordinates of each pixel point in the head and neck region, so that the position of the head and neck region in the obtained background restoration image is the same as the position of the head and neck region in the user image.
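A minimal sketch of step F12, assuming binary masks of the neck region and of the head-and-neck region are available: the most frequent color value inside the neck region is taken as the dominant color, a solid background of that color is drawn, and the head-and-neck pixels are copied back to their original coordinates.
import numpy as np

def build_background_repair_image(user_image, neck_mask, head_neck_mask):
    neck_pixels = user_image[neck_mask > 0]                     # N x 3 color values
    colours, counts = np.unique(neck_pixels, axis=0, return_counts=True)
    dominant = colours[counts.argmax()]                         # dominant neck color
    background = np.zeros_like(user_image)
    background[:] = dominant                                    # solid-color background
    background[head_neck_mask > 0] = user_image[head_neck_mask > 0]  # coordinate indexing
    return background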
In the background repair image obtained in this way, the content around the neck region to be repaired is either the original neck region of the user image or the background color drawn from the neck's dominant color, i.e. everything is relevant to the neck repair. Regions unrelated to the neck (such as the clothing region) therefore cannot adversely affect the image repair and the virtual reloading, which ensures the accuracy of the neck repair.
F13: and repairing the neck area in the user image according to the mask image and the background repairing image of the area to be repaired.
Obtaining a mask image of the area to be repaired through the step F11, obtaining a background repair picture with the background color being the main color of the neck through the step F12, and inputting the mask image of the area to be repaired and the background repair picture into a preset image repair network to obtain a repaired image corresponding to the user image.
The preset image restoration network can be a CR-Fill network, the network structure of the CR-Fill network is an architecture from coarse to fine, the architecture is similar to that in deep Fill v2, but the CA Layer is removed, and CR Loss is applied for training. The coarse and fine networks are convolutional encoder-decoder type networks. The expanded convolutional layer is used to enlarge the receptive field. Gated convolution was used in all convolution and expansion convolution layers. The coarse network takes as input the incomplete image with the missing pixels set to zero and a binary mask indicating the missing regions and generates an initial prediction. The refinement network then takes the initial prediction as input and outputs the final repaired image, and the network architecture of the CR-Fill network is shown in fig. 11.
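A minimal stand-in sketch of this step's interface only: the document uses a CR-Fill-style deep inpainting network, and classical OpenCV inpainting is substituted here purely to illustrate the inputs (background repair image plus mask of the region to be repaired) and the output (repaired image). It is not the network described above.
import cv2

def repair_neck_region(background_repair_image, repair_mask):
    # repair_mask: 8-bit single-channel mask, non-zero where pixels must be synthesised.
    return cv2.inpaint(background_repair_image, repair_mask, 5, cv2.INPAINT_TELEA)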
The repaired image is then overlaid with the clothing template image whose preset neckline key points have been aligned with the neck key points of the user image, yielding the reloading effect image.
To show more intuitively how the image repair method provided by the embodiment of the present application affects the virtual reloading result, reference is made to the drawings. Fig. 12 shows the effect of virtual reloading without repairing the neck region: when the collar in the original user image is high and a garment with a lower collar from the clothing template image is overlaid directly onto the user image, part of the original collar is not covered by the template garment and shows through around the neck region (see the rectangular box in fig. 12), so the reloading effect is poor.
With the method provided by the embodiment of the present application, the neck mask image is obtained from the original image, the clothing mask image is obtained from the clothing template image, and the mask image of the region to be repaired is determined from the two. A background repair image whose background color is the neck's dominant color is generated, the repaired image is produced by the preset image repair network from the mask image and the background repair image, and the virtual reloading is then performed with the repaired user image. The result is shown in fig. 13; the rectangular box marks the repaired area. Comparing fig. 12 and fig. 13, it can be seen that repairing the neck region before the virtual reloading prevents the original collar from showing through, making the reloading more accurate and the effect more natural.
To further explain the image repair method provided by the embodiment of the present application, the control flow is described again with reference to the drawings. As shown in fig. 14, after the user image and the clothing template image are obtained, the user image is face-aligned using the key points of the facial features, and the designated neck key points on both sides of the neck are detected. The neck mask image is obtained from the aligned user image. The preset neckline key points of the clothing template image are aligned with the neck key points of the user image, and the clothing mask image is obtained from the aligned clothing template image. The mask image of the region to be repaired is determined from the neck mask image and the clothing mask image. The dominant color of the neck region is extracted from the aligned user image, and the background repair image is generated from the user image and that dominant color. The repaired image is then obtained from the mask image of the region to be repaired and the background repair image, and the aligned clothing template image is finally overlaid onto the repaired image to obtain the reloading effect image.
The mask image of the region to be repaired is determined from the neck mask image of the user image and the clothing mask image of the clothing template image. This gives an accurate way to compute the region to be repaired, so the neck repair for virtual reloading is better targeted, the repair does not affect the original neck region of the user image, and the accuracy of the repair is improved. Generating a background repair image whose background color is the neck's dominant color prevents regions unrelated to the neck (such as the clothing region) from adversely affecting the repair, ensures the accuracy of the repair, and makes the color and texture of the repaired region close to those of the original neck, so the repaired neck is complete with natural transitions. Because the virtual reloading is performed on the repaired image, the neck region is neither incomplete nor occluded after the reloading, and a good reloading effect is obtained.
F2: and according to the neck key points and the clothes key points, carrying out deformation processing on the clothes area in the clothes template image.
Because the model in the clothing template image is different from the human figure in the user image in weight, before the final reloading operation is carried out, the clothing region in the clothing template image is also subjected to deformation processing, so that the clothing region is matched with the user image, and the reloading effect is improved.
Firstly, according to the neck key points and the clothes key points, determining a coordinate mapping matrix before and after deformation of a clothes area in a clothes template image. According to the method, the coordinate mapping function before and after deformation of the clothing area is determined by using the deformation coordinate mapping function, the deformation coordinate mapping function is initialized, and the deformation is not performed during initialization, namely the abscissa after deformation is consistent with the abscissa before deformation, and the ordinate after deformation is consistent with the ordinate before deformation. The warped coordinate mapping function is represented in the form of a two-dimensional matrix with the first subscript being the ordinate and the second subscript being the abscissa. If the ordinate of a certain point is j and the abscissa is i, map [ j, i ] represents the coordinate before mapping. In order to facilitate calculation, in the embodiment of the application, the abscissa mapping and the ordinate mapping are calculated separately, the abscissa mapping function is designated as map _ x, and the ordinate mapping function is designated as map _ y.
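A minimal sketch of initialising the deformation coordinate mapping functions as identity mappings (no deformation): map_x[j, i] = i and map_y[j, i] = j, with the first index the ordinate and the second the abscissa.
import numpy as np

def init_identity_maps(h, w):
    map_x = np.tile(np.arange(w, dtype=np.float32), (h, 1))            # each row is 0..w-1
    map_y = np.tile(np.arange(h, dtype=np.float32)[:, None], (1, w))   # each column is 0..h-1
    return map_x, map_y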
The coordinate mapping matrices before and after deformation are determined through the following steps F21 and F22.
F21: calculating the abscissa mapping matrix of the clothing region in the clothing template image before and after deformation, according to the neck key points and the clothing key points.
The width of the user image is divided into several first abscissa intervals along the horizontal direction according to the abscissas of the neck key points. The number of first abscissa intervals is the number of neck key points plus 1.
In the user image shown in fig. 2, with the coordinate system shown there (origin at the top-left vertex), the width w of the user image can be divided into 4 first abscissa intervals by the abscissas of the three neck key points p2_d, p1_d and p3_d. From left to right these intervals are [0, p2_d['x']), [p2_d['x'], p1_d['x']), [p1_d['x'], p3_d['x']) and [p3_d['x'], w).
The width of the clothing template image is similarly divided into several second abscissa intervals along the horizontal direction according to the abscissas of the clothing key points. The number of second abscissa intervals is the number of clothing key points plus 1. Since the number of neck key points equals the number of clothing key points, the number of first abscissa intervals also equals the number of second abscissa intervals.
In the clothing template image shown in fig. 3, the width w can be divided into 4 second abscissa intervals by the abscissas of the three clothing key points p2_o, p1_o and p3_o. From left to right these intervals are [0, p2_o['x']), [p2_o['x'], p1_o['x']), [p1_o['x'], p3_o['x']) and [p3_o['x'], w).
The first abscissa intervals and the second abscissa intervals are equal in number and correspond one to one: the first interval from the left in the user image corresponds to the first interval from the left in the clothing template image, the second to the second, and so on.
For example, [0, p2_d['x']) in fig. 2 corresponds to [0, p2_o['x']) in fig. 3, [p2_d['x'], p1_d['x']) in fig. 2 corresponds to [p2_o['x'], p1_o['x']) in fig. 3, [p1_d['x'], p3_d['x']) in fig. 2 corresponds to [p1_o['x'], p3_o['x']) in fig. 3, and [p3_d['x'], w) in fig. 2 corresponds to [p3_o['x'], w) in fig. 3.
Consider a first abscissa interval in the user image and the corresponding second abscissa interval in the clothing template image. In the user image there are two points whose abscissas are the start and end abscissas of the first interval; call them point A and point B. Likewise, in the clothing template image there are two points whose abscissas are the start and end abscissas of the second interval; call them A' and B'. After the clothing region is deformed, A' corresponds to A and B' corresponds to B.
That is, each pair of corresponding intervals gives two pairs of coordinate points that correspond before and after deformation. A straight line can be determined from these two pairs, and by linear interpolation along that line the deformed abscissa in the user image can be obtained for every abscissa between the two endpoints of the second abscissa interval.
For example, [0, p2_d['x']) in fig. 2 corresponds to [0, p2_o['x']) in fig. 3, which gives the points (0, y1) and (p2_d['x'], y1) in the user image, where y1 may be the ordinate of p2_d, and the points (0, y2) and (p2_o['x'], y2) in the clothing template image, where y2 may be the ordinate of p2_o. After the clothing region is deformed, the point (0, y2) becomes (0, y1) and the point (p2_o['x'], y2) becomes (p2_d['x'], y1). A straight line is determined from these four points, and for every point with ordinate y2 and abscissa between 0 and p2_o['x'], the deformed abscissa is obtained by linear interpolation.
In this way, for each first abscissa interval of the user image and each second abscissa interval of the clothing template image, linear interpolation gives the deformed abscissa corresponding to every abscissa of the clothing template image. From the first and second abscissa intervals, the abscissa mapping matrix of the clothing region in the clothing template image is computed using linear interpolation and the deformation coordinate mapping function.
For example, for the interval division in fig. 2 and fig. 3, linear interpolation is used on each segment, and with the numpy toolkit the coordinate mapping of the 4 segments can be combined into the following formula:
map_x[:, :] = np.interp(np.arange(w), [0, p2_d['x'], p1_d['x'], p3_d['x'], w-1], [0, p2_o['x'], p1_o['x'], p3_o['x'], w-1])
where w is the width of the user image and of the clothing template image, and np.interp denotes the linear interpolation function.
In this step, a small number of key points (e.g. 3) divide the image abscissa into several intervals, the mapping between the clothing key points of the clothing template image and the neck key points of the user image is used with linear interpolation to determine the deformed abscissa of every other abscissa in the clothing template image, and the deformation coordinate mapping function expresses the abscissa mapping before and after deformation. The abscissa mapping matrix before and after deformation can therefore be determined from a few key points using simple geometric relations, with little computation, quickly and accurately.
F22: and calculating a longitudinal coordinate mapping matrix before and after deformation of the clothing region according to the neck key points and the clothing key points.
In one implementation, first, according to the key points of the neck and the key points of the clothing, the scaling factor of the ordinate corresponding to each abscissa in the clothing area is calculated.
Specifically, the width of the user image is divided into a plurality of first abscissa sections in the horizontal direction according to the abscissa of each neck key point. The division manner is the same as that in step S2, and as shown in fig. 2, 4 first abscissa intervals are defined as [0, p2_ [ 'x' ]), [ p2_ d [ 'x' ], p1_ d [ 'x' ]), [ p1_ d [ 'x' ], p3_ d [ 'x' ]), and [ p3_ d [ 'x' ], w) from left to right.
And then respectively calculating a scaling coefficient corresponding to the ordinate of each clothes key point according to the height of the clothes template image, the neck key point and the clothes key point. The height of the apparel template image is equal to the height of the user image. The number of key points of the neck is equal to that of key points of clothes, and the key points of the neck and the key points of the clothes correspond to each other one by one. For example, a key point of a neck on the left side of the boundary line of the neck corresponds to a key point of clothes at the end of the left side of the boundary line of the neck collar, a key point of a neck on the right side of the boundary line of the neck corresponds to a key point of clothes at the end of the right side of the boundary line of the neck collar, and a key point of a neck on the vertical central axis of the neck in the clavicle region corresponds to a key point of clothes at the intersection of the left and right side boundary lines of the neck collar.
And subtracting 1 from the height of the clothes template image and then subtracting the vertical coordinate of the clothes key point to obtain a first difference value for the neck key point and the clothes key point which are mutually corresponding. And subtracting 1 from the height of the clothes template image (or the height of the user image), and then subtracting the vertical coordinate of the corresponding neck key point to obtain a second difference value. And calculating the ratio of the first difference value to the second difference value, wherein the ratio is the scaling coefficient corresponding to the ordinate of the key point of the garment.
For example, suppose the heights of the clothing template image and the user image are both h. For the corresponding neck key point p2_d in fig. 2 and clothing key point p2_o in fig. 3, the scaling factor of the ordinate of p2_o is (h-1-p2_o['y'])/(h-1-p2_d['y']). Similarly, the scaling factor of the ordinate of p1_o is (h-1-p1_o['y'])/(h-1-p1_d['y']), and that of p3_o is (h-1-p3_o['y'])/(h-1-p3_d['y']).
After the scaling factor of the ordinate of each clothing key point is calculated, the scaling factor of the ordinate corresponding to each abscissa in the clothing region is calculated using linear interpolation and the deformation coordinate mapping function, from the first abscissa intervals of the user image and the per-key-point scaling factors.
For each first abscissa interval of the user image, the abscissas of its start point and end point each correspond to an ordinate scaling factor. From the start abscissa and its scaling factor and the end abscissa and its scaling factor, a straight line is determined for that interval, and linear interpolation along the line gives the ordinate scaling factor for every abscissa between the two endpoints.
For example, in the first abscissa interval [p2_d['x'], p1_d['x']) in fig. 2, the ordinate scaling factor at the start abscissa p2_d['x'] is (h-1-p2_o['y'])/(h-1-p2_d['y']) and the ordinate scaling factor at the end abscissa p1_d['x'] is (h-1-p1_o['y'])/(h-1-p1_d['y']). A straight line is determined from these two (abscissa, scaling factor) pairs, and linear interpolation along it gives the ordinate scaling factor for every abscissa between p2_d['x'] and p1_d['x'].
In this way, from the first abscissa intervals of the user image and the scaling factors at their endpoints, linear interpolation gives the ordinate scaling factor for every abscissa in every first abscissa interval.
For example, for the 4 first abscissa intervals in fig. 2, linear interpolation is applied on each segment, and with the numpy toolkit the 4 segments are combined into the following formula:
scale2 = np.interp(np.arange(w), [0, p2_d['x'], p1_d['x'], p3_d['x'], w-1], [1, (h-1-p2_o['y'])/(h-1-p2_d['y']), (h-1-p1_o['y'])/(h-1-p1_d['y']), (h-1-p3_o['y'])/(h-1-p3_d['y']), 1])
where w is the width of the user image and np.interp denotes the linear interpolation function.
Then, from the height of the clothing template image, the ordinate of each coordinate point of the clothing region and the ordinate scaling factor corresponding to each abscissa, the ordinate mapping matrix of the clothing region is calculated with the deformation coordinate mapping function.
Specifically, the ordinate mapping matrix of the clothing region is expressed by the following assignment:
map_y[:,:]=h-1-map_y[:,:]*scale2[np.newaxis,:]
where h is the height of the user image and of the clothing template image, and scale2 is the ordinate scaling factor corresponding to each abscissa. The map_y[:, :] on the right of the equal sign holds the coordinates of the points of the clothing region before deformation, and the map_y[:, :] on the left holds them after deformation. The abscissa of each point of the clothing region is unchanged before and after deformation; only the ordinate changes.
In other embodiments, before calculating the ordinate scaling factor for each abscissa in the manner above, it is considered that in a virtual reloading scene the neckline in the user image may differ greatly from the neckline in the clothing template image: the user's garment may be high-collared while the template garment has a lower neckline, or vice versa. The clothing region may therefore first be scaled as a whole, based on the clothing key point at the intersection of the left and right neckline boundaries in the clothing template image and the neck key point of the clavicle region on the vertical central axis of the neck in the user image, so that after scaling the key point with the largest ordinate on the neckline boundary of the clothing template image coincides with the clavicle-region neck key point of the user image.
First, the overall scaling factor is calculated from the height of the clothing template image, the ordinate of the intersection of the left and right neckline boundaries in the clothing template image, and the ordinate of the clavicle-region neck key point on the vertical central axis of the neck in the user image. The height of the clothing template image minus one, minus the ordinate of the neckline-boundary intersection, gives a third difference; the height minus one, minus the ordinate of the clavicle-region neck key point, gives a fourth difference. The ratio of the third difference to the fourth difference is the overall scaling factor.
For example, if the heights of the user image and the clothing template image are both h, the clavicle-region neck key point on the vertical central axis of the neck in fig. 2 is p1_d, and the intersection of the left and right neckline boundaries in fig. 3 is p1_o, the overall scaling factor is (h-1-p1_o['y'])/(h-1-p1_d['y']).
With the overall scaling factor computed, the ordinate mapping matrix before and after the overall scaling is calculated with the deformation coordinate mapping function from the height of the clothing template image, the ordinate of each coordinate point of the clothing region and the overall scaling factor. Specifically, the ordinate mapping matrix before and after the overall scaling is calculated by the following formula:
map_y[:,:]=(h-1-np.arange(h))[:,np.newaxis]*scale1
where h is the image height and scale1 is the overall scaling factor.
After the overall scaling factor is calculated in this way and the ordinate mapping matrix before and after the overall scaling is determined, each clothing key point in the clothing template image is recomputed for the overall scaling. The abscissa of each clothing key point is kept unchanged; the ordinate of each clothing key point after the overall scaling is calculated from the height of the clothing template image, the overall scaling factor and the ordinate of that key point before the scaling.
For any clothing key point, the height of the clothing template image minus one, minus the key point's ordinate before the overall scaling, is divided by the overall scaling factor; subtracting this ratio from the height minus one gives the key point's ordinate after the overall scaling.
For example, for the clothing key points p2_o, p1_o and p3_o in fig. 3, with the clothing template image height h and overall scaling factor scale1, the ordinate of p2_o becomes h-1-(h-1-p2_o['y'])/scale1, the ordinate of p1_o becomes h-1-(h-1-p1_o['y'])/scale1 = p1_d['y'], and the ordinate of p3_o becomes h-1-(h-1-p3_o['y'])/scale1.
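A minimal sketch of this recomputation: the abscissa of a clothing key point is kept, and its ordinate after the (virtual) overall scaling is y' = h - 1 - (h - 1 - y) / scale1.
def rescale_keypoint_y(y, h, scale1):
    # y: ordinate before the overall scaling; h: image height; scale1: overall scaling factor.
    return h - 1 - (h - 1 - y) / scale1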
With the clothing key points recomputed for the overall scaling, the ordinate scaling factor for each abscissa in the clothing region is then calculated from the neck key points of the user image and the recomputed clothing key points, in the same way as described above, which is not repeated here.
Finally, the ordinate mapping matrix of the clothing region is calculated with the deformation coordinate mapping function from the height of the clothing template image, the ordinate mapping matrix before and after the overall scaling, and the ordinate scaling factor for each abscissa. The final ordinate mapping matrix is formulated as:
map_y[:,:]=h-1-map_y[:,:]*scale2[np.newaxis,:]
where h is the height of the user image and of the clothing template image, and scale2 is the ordinate scaling factor for each abscissa computed from the clothing key points after the overall scaling. The map_y[:, :] on the right of the equal sign is the ordinate mapping matrix before and after the overall scaling, and the map_y[:, :] on the left gives the coordinates of the points of the clothing region after deformation. The abscissa of each point of the clothing region is unchanged before and after deformation; only the ordinate changes.
In this way, with only a few neck key points and clothing key points, the abscissa and ordinate mapping matrices before and after deformation are calculated by linear interpolation and the deformation coordinate mapping function from the geometric relations between corresponding key points, so the coordinate mapping of the clothing region is obtained accurately with little computation. Scaling the clothing region as a whole before computing the per-abscissa ordinate scaling factors makes the deformation finer, improves its precision, makes the deformed clothing region fit the user image better, and improves the virtual reloading effect.
The overall scaling of the clothing region can be purely virtual: the region is not actually resized; only the overall scaling factor, the new clothing key points after the overall scaling and the ordinate mapping matrix before and after the overall scaling are computed, and the final coordinate mapping matrix before and after deformation is then derived from these quantities.
The clothing region in the clothing template image is then deformed according to the computed abscissa and ordinate mapping matrices. Specifically, the horizontal deformation is applied with a preset deformation algorithm according to the abscissa mapping matrix, and the vertical deformation is applied with the preset deformation algorithm according to the ordinate mapping matrix.
The preset deformation algorithm may use the OpenCV remap function: the abscissa mapping matrix and ordinate mapping matrix computed in steps F21 and F22 are passed to the remap function together with the clothing template image, and the deformed clothing template image is output.
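A minimal sketch of applying the computed mapping matrices with OpenCV's remap function: map_x and map_y give, for every pixel of the deformed result, the source coordinate in the clothing template image before deformation.
import cv2

def warp_clothing(clothing_template, map_x, map_y):
    return cv2.remap(clothing_template,
                     map_x.astype('float32'), map_y.astype('float32'),
                     interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=0)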
And coordinates of each clothing key point in the deformed clothing template image are the same as coordinates of the corresponding neck key point in the user image, and coordinates of other pixel points except the clothing key point in the clothing template image are correspondingly changed. The deformed clothing template image is highly matched with the user image, a clothing mask corresponding to the clothing area is generated according to the deformed clothing template image, and the deformed clothing template image is covered in the user image according to the clothing mask to obtain a changing effect image.
As shown in fig. 15, the clothing region in fig. 3 is transformed according to the user image shown in fig. 2 and the clothing template image shown in fig. 3, and then the transformed clothing region is overlaid on the reloading effect diagram obtained in fig. 2. As can be seen from fig. 15, the virtual change of the garment is performed after the deformation of the garment region, so that the garment region is highly matched with the user image, and a good change effect is obtained.
In order to facilitate understanding of the garment deformation process provided by the embodiments of the present application, the following description is made with reference to the accompanying drawings. As shown in fig. 16, a user image and a clothing template image are first acquired, and the user image and the clothing template image are aligned, respectively. And then detecting a neck key point in the user image, and acquiring a clothes key point in the clothes template image. And calculating the integral scaling coefficient scale1 in the y direction of the garment according to the key points of the garment and the key points of the neck. The user image is divided into a plurality of sections of first abscissa intervals according to the key point of the neck, and the clothing area is divided into a plurality of sections of second abscissa intervals according to the key point of the clothing. And calculating a vertical coordinate mapping matrix corresponding to the y direction on the basis of scale1. And calculating an abscissa mapping matrix corresponding to the x direction according to the divided first abscissa intervals and the divided second abscissa intervals. And deforming the clothing area in the clothing template image by using the abscissa mapping matrix and the ordinate mapping matrix to obtain the deformed clothing template image.
A coordinate mapping matrix before and after deformation of the clothing region is determined from the neck key points in the user image and the clothing key points in the clothing template image, and the deformation is performed with this matrix. By exploiting the geometric relationship before and after deformation, the coordinate mapping matrix can be computed from only a few key points using linear interpolation and a deformation coordinate mapping function. This not only gives a good deformation effect, but also greatly reduces the amount of computation and increases the processing speed of the clothing deformation.
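For illustration, a sketch of building the abscissa mapping matrix by piecewise-linear interpolation over the corresponding abscissa intervals is given below. It assumes the key-point abscissas are sorted left to right and padded with the image borders so the interpolation spans the full width; all names are hypothetical:

import numpy as np

def build_map_x(width, height, neck_xs, garment_xs):
    # neck_xs: abscissas of the neck key points (destination breakpoints in the
    # user image); garment_xs: abscissas of the matching clothing key points
    # (source breakpoints in the clothing template image).
    xs = np.arange(width, dtype=np.float32)
    # Each first abscissa interval of the user image is mapped linearly onto
    # the corresponding second abscissa interval of the clothing template.
    row = np.interp(xs, neck_xs, garment_xs).astype(np.float32)
    return np.tile(row, (height, 1))   # the same mapping is used for every row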
F3: and covering the clothing area of the deformed clothing template image into the repaired user image to obtain a changing effect image.
After the neck region of the user image has been repaired in step F1 and the clothing region in the clothing template image has been deformed in step F2, the clothing region of the deformed clothing template image is overlaid on the repaired user image so that the clothing key points in the deformed clothing template image coincide with the corresponding neck key points in the repaired user image, yielding the changing effect image.
The neck area in the user image is repaired firstly, so that the condition that the neck area is incomplete after the user image is changed can be avoided. And deforming the clothing region in the clothing template image to enable the clothing region to be highly matched with the user image. The virtual dress change is carried out based on the user image after the neck is repaired and the clothing template image after the clothing area is deformed, and the best dress change effect can be obtained.
Step 105: and generating a reloading effect picture corresponding to the user image and the clothes template image by utilizing the neck area in the clothes template image according to the neck key point and the clothes key point.
If it is detected in step 103 that the neck region in the user image is occluded, the virtual change cannot be performed in the manner of step 104; the neck region of the model in the clothing template image has to be used instead. The change is performed through the following operations G1 to G4.
G1: and performing color migration processing on the neck region in the clothing template image according to the user image.
First, a face skin dominant color and a first neck skin dominant color in an image of a user are extracted.
After the user image and the clothing template image are obtained, the user image and the clothing template image can be subjected to appropriate preprocessing, such as digital conversion, noise reduction and the like, so that subsequent main color extraction is facilitated.
The method comprises the steps of firstly identifying a face area and a neck area of a user image through a preset semantic segmentation model or an image segmentation model trained in the foregoing and used for segmenting a human body area, then extracting a face skin dominant color of the user image from the face area, and extracting a first neck skin dominant color from the neck area.
For the face area of the user image, all the chromatic values contained in the face area can be determined firstly, then the number of pixel points corresponding to each chromatic value in the face area is counted, and the chromatic value with the largest number of pixel points in the face area is determined as the main color of the face skin corresponding to the face area. Similarly, for the first neck region, all the colorimetric values included in the first neck region may be determined first, then the number of the pixel points corresponding to each colorimetric value in the first neck region is counted, and then the colorimetric value with the largest number of the pixel points in the first neck region is determined as the main color of the neck skin corresponding to the first neck region.
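As a purely illustrative sketch of this dominant-color extraction (the function name and the mask representation are assumptions, not part of the embodiment):

import numpy as np

def dominant_color(image_bgr, region_mask):
    # Collect the pixels inside the region and count how often each exact
    # color value occurs; the most frequent value is the dominant color.
    pixels = image_bgr[region_mask > 0].reshape(-1, 3)
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return colors[np.argmax(counts)]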
Because the model's neck region in the clothing template image is used for the change, the changing effect map contains the user's face region together with the model's neck region, and a large color difference between the two would degrade the changing effect. Therefore, after the face skin dominant color and the first neck skin dominant color of the user image are obtained as described above, the color of the neck region of the clothing template image is adjusted according to them, so that the transition between the model's neck color in the clothing template image and the face color in the user image is more natural and the changing effect is improved.
The skin dominant color can be expressed in various color spaces such as RGB or YUV. Since the luminance signal Y and the chrominance signals U and V are separated in the YUV color space, the luminance value Y or the chrominance values UV of an image can be replaced individually, which makes the adjusted color more real and natural. This embodiment therefore works with the dominant color in the YUV color space and adjusts the UV channel value of each pixel point in the neck region of the clothing template image, which achieves a good color migration effect, makes the face and neck in the changing effect image look more real and natural, and also allows a higher processing speed.
When adjusting the UV channel values of the pixels in the neck region of the clothing template image, the ideal approach would be to transfer the colors of the face region and the first neck region of the user image directly to the clothing template image, which would guarantee a natural transition between face and neck in the changing effect image. In practice, however, the first neck region of the user image is often small, and an ideal color migration cannot be achieved that way. To address this, in this embodiment, when the area of the first neck region is small, the UV channel values of the pixels in the neck region of the clothing template image are adjusted based on the color of the face region of the user image.
Specifically, a ratio of a first neck area of the user image to a face area of the user image may be calculated first, and it may be determined whether the ratio is within a preset interval; if so, fusing the main color of the face skin and the main color of the first neck skin, and adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the fused main colors; if not, and the ratio is larger than the upper limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the primary color of the skin of the first neck; if not, and the ratio is smaller than the lower limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the main color of the facial skin.
It should be noted that, in the present embodiment, specific values of the preset interval are not specifically limited, and those skilled in the art can determine the preset interval according to actual situations, for example, the preset interval may be [0.2,0.3], [0.25,0.32], [0.18,0.26], and the like.
When the main color of the facial skin and the main color of the first neck skin are fused, the color space of the main color of the facial skin and the color space of the main color of the first neck skin can be converted into the YUV color space respectively, and the UV channel value of the main color of the facial skin and the UV channel value of the main color of the first neck skin under the YUV color space are obtained. And then determining the UV channel value of the fusion main color according to the UV channel value and the corresponding weight coefficient of the main color of the face skin, and the UV channel value and the corresponding weight coefficient of the main color of the first neck skin.
The sum of the weight coefficient corresponding to the UV channel value of the face skin dominant color and the weight coefficient corresponding to the UV channel value of the first neck skin dominant color may be equal to 1, and the weight coefficients may be related to the ratio of the first neck area to the face area of the user image. Specifically, if the weight coefficient corresponding to the UV channel value of the face skin dominant color is a, the weight coefficient corresponding to the UV channel value of the first neck skin dominant color is b, and the ratio of the first neck area to the face area of the user image is k, then a + b = 1, and the larger k is, the larger b becomes and the smaller a becomes.
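A hedged sketch of the ratio test and the UV-channel fusion described above is given below. The interval bounds and the weighting formula b = (k - low) / (high - low) are illustrative assumptions (the embodiment only requires a + b = 1 with b growing as k grows), and all function and variable names are hypothetical:

import cv2
import numpy as np

def migrate_neck_color(template_bgr, neck_mask, face_color, neck_color,
                       neck_area, face_area, low=0.2, high=0.3):
    # face_color / neck_color: dominant colors as BGR triples; low / high are
    # illustrative bounds of the preset interval.
    k = neck_area / face_area
    face_yuv = cv2.cvtColor(np.uint8([[face_color]]), cv2.COLOR_BGR2YUV)[0, 0]
    neck_yuv = cv2.cvtColor(np.uint8([[neck_color]]), cv2.COLOR_BGR2YUV)[0, 0]
    if k > high:                      # enough neck skin visible: use it directly
        uv = neck_yuv[1:]
    elif k < low:                     # hardly any neck skin: fall back to the face
        uv = face_yuv[1:]
    else:                             # fuse: larger k gives the neck color more weight
        b = (k - low) / (high - low)  # illustrative choice with a + b = 1
        a = 1.0 - b
        uv = a * face_yuv[1:] + b * neck_yuv[1:]
    out = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    out[neck_mask > 0, 1:] = uv       # replace only the UV channels of the neck
    return cv2.cvtColor(out.clip(0, 255).astype(np.uint8), cv2.COLOR_YUV2BGR)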
It should be noted that the determination of the weighting factor is only a preferred embodiment of the present invention, and this embodiment is not limited thereto, as long as the dominant color of the facial skin and the dominant color of the first neck skin can be fused, for example, the two weighting factors may also be constant values, and the sum of the two weighting factors may also be smaller than 1, or slightly larger than 1.
After the UV channel value of the fused dominant color (i.e., the fused dominant color) is determined, the UV channel value of each pixel point in the neck region of the clothing template image may be replaced with the UV channel value of the fused dominant color, thereby implementing color adjustment of the neck region of the clothing template image.
When the ratio of the first neck area of the user image to the face area of the user image is larger than the upper limit value of the preset interval, the color of the neck area of the clothing template image is adjusted by directly using the first neck skin main color. The color space of the first neck skin dominant color may be first converted into a YUV color space, and the UV channel value of the first neck skin dominant color in the YUV color space may be obtained. And then, converting the color space of the clothes template image into a YUV color space, and replacing the UV channel value of each pixel point in the neck region of the clothes template image with the UV channel value of the primary color of the skin of the first neck.
Similarly, when the ratio of the first neck area of the user image to the face area of the user image is smaller than the lower limit value of the preset interval, the color of the neck area of the clothing template image is adjusted by using the main color of the face skin of the user image. The color space of the main color of the facial skin can be converted into a YUV color space, and the UV channel value of the main color of the facial skin under the YUV color space can be obtained. And then, converting the color space of the clothing template image into a YUV color space, and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the main color of the facial skin.
In some embodiments of the present application, before the UV channel values of the pixels in the neck region of the clothing template image are adjusted according to the face skin dominant color and the first neck skin dominant color, the overall brightness of the user image can be adjusted first. The YUV color conversion is then performed on the brightness-adjusted user image, the UV channel values of the face skin dominant color and the first neck skin dominant color are obtained, and the UV channel values of the pixels in the neck region of the clothing template image are adjusted. In this way the overall brightness of the changing effect map transitions naturally, and the face color, the neck color and the transition region between them look more real and natural.
When the overall brightness of the user image is adjusted, the neck skin main color (hereinafter referred to as a second neck skin main color) of the model in the clothing template image can be extracted, and then the brightness value of each pixel point in the neck area of the clothing template image is adjusted according to the brightness value of the second neck skin main color and the brightness value of the face skin main color. Therefore, the brightness of the neck area of the clothing template image is integrally adjusted based on the brightness of the neck of the model in the clothing template image and the brightness of the face skin of the user image, and the coordination and the reality of the overall brightness can be guaranteed.
When the brightness value of each pixel point in the neck region of the clothing template image is adjusted according to the brightness value of the second neck skin main color and the brightness value of the face skin main color, a brightness adjustment parameter may be determined according to the brightness value of the second neck skin main color and the brightness value of the face skin main color, and then the brightness value of each pixel point in the neck region of the clothing template image is adjusted based on the brightness adjustment parameter.
The brightness adjustment parameter may be any value greater than 0 and may be determined from the brightness value of the second neck skin dominant color and the brightness value of the face skin dominant color; the specific value is not specifically limited in this embodiment. In some embodiments, the brightness adjustment parameter may be the ratio of the brightness value of the second neck skin dominant color to the brightness value of the face skin dominant color, or another parameter related to that ratio, such as its square or its square root.
Adjusting the brightness value of each pixel point in the neck region of the clothing template image based on the brightness adjustment parameter can be generally understood as multiplying the brightness value of each pixel point in the neck region of the clothing template image by the brightness adjustment parameter.
In this embodiment, by introducing the brightness adjustment parameter, the brightness of the neck region can be tuned further when the pixels in the neck region of the migrated image are adjusted. For example, the neck region near the chin is usually darker because the chin blocks the light, so the brightness adjustment parameter can be reduced appropriately there; the neck region far from the chin is usually brighter because of skin reflection, so the parameter can be increased appropriately there, which makes the skin color of the neck region look more real.
Specifically, when the brightness adjustment parameter is determined from the brightness value of the second neck skin dominant color and the brightness value of the face skin dominant color, the color spaces of both dominant colors may be converted into the HSV (Hue, Saturation, Value) color space, the first brightness value of the face skin dominant color and the second brightness value of the second neck skin dominant color in the HSV color space are then obtained, and the brightness adjustment parameter is calculated from the second brightness value and the first brightness value.
In the HSV color space, H represents the hue of the image, S the saturation and V the brightness. Unlike hardware-oriented color spaces such as RGB and CMY, the HSV color space is user-oriented: its model parameters (H, S and V) follow the physiological characteristics of human color perception, and the human visual system is more sensitive to brightness than to chromatic values. Calculating the brightness adjustment parameter with the brightness value in the HSV color space therefore gives a more accurate result that better matches human visual observation. It is understood that calculating the brightness adjustment parameter with parameters of the HSV color space is only a preferred implementation of this embodiment, which is not limited thereto, as long as the brightness adjustment parameter can be determined from a brightness calculation.
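The following sketch illustrates one possible realization of the brightness adjustment, assuming the dominant colors are given as BGR triples and the neck region of the clothing template image is given as a mask. The embodiment leaves the exact form of the adjustment parameter open; the face-to-neck brightness ratio used here, which pulls the template neck brightness toward the user's face brightness, is only one plausible choice:

import cv2
import numpy as np

def adjust_neck_brightness(template_bgr, neck_mask, face_color, model_neck_color):
    # Brightness (V channel) of the two dominant colors in HSV space.
    v_face = cv2.cvtColor(np.uint8([[face_color]]), cv2.COLOR_BGR2HSV)[0, 0, 2]
    v_neck = cv2.cvtColor(np.uint8([[model_neck_color]]), cv2.COLOR_BGR2HSV)[0, 0, 2]
    ratio = float(v_face) / max(float(v_neck), 1.0)   # illustrative adjustment parameter
    hsv = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[neck_mask > 0, 2] *= ratio    # scale the V channel of the neck region only
    return cv2.cvtColor(hsv.clip(0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)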
It should be noted that adjusting the overall brightness of the user image before adjusting the UV channel values of the pixels in the neck region of the clothing template image is only a preferred embodiment; this embodiment is not limited thereto, as long as the brightness adjustment makes the brightness distribution between the neck region of the clothing template image and the face region of the user image look real and natural. For example, the overall brightness of the clothing template image may instead be adjusted after the UV channel values of the pixels in its neck region have been adjusted, according to the face brightness of the user image and the neck brightness of the model image.
In order to facilitate understanding of the methods provided by the embodiments of the present application, the following description is made with reference to the accompanying drawings. As shown in fig. 17, the server first obtains the user image and the clothing template image uploaded by the user from the user's terminal device, aligns the faces of the user image and the clothing template image through face recognition, and segments the human body regions of both images with a segmentation model based on HRNet, obtaining the face region and the first neck region of the user image and the second neck region of the clothing template image. The face skin dominant color, the first neck skin dominant color and the second neck skin dominant color are then extracted from these regions, respectively. The overall brightness of the user image is adjusted according to the color brightness of the face region and the color brightness of the second neck region; color fusion is then performed on the face skin dominant color and the first neck skin dominant color of the adjusted user image to obtain the fused dominant-color UV channel value, and the UV channel value of each pixel in the neck region of the clothing template image is replaced with it, realizing the color migration. Finally, the image generated after color migration is returned to the service caller (i.e. the client). If the server detects that the user image is unqualified (for example blurred, or with an incomplete face), it can display on the client interface that the user image is unqualified and ask the user to upload a new one.
Based on the user image and the clothing template image, the face skin dominant color and the first neck skin dominant color are extracted from the user image, and the color of the neck region of the clothing template image is adjusted according to them. This achieves a good color migration effect: especially when the neck region in the user image is relatively small, it makes the face and neck after the change look more real and natural. Since only the UV channel values are adjusted, the amount of computation is small, a high processing speed can be achieved, and the related service can conveniently be run on the client.
G2: and performing deformation processing on the neck area in the clothing template image according to the neck key point of the user image.
The neck key points of the neck region in the clothing template image are obtained through the trained neck key point detection model. A coordinate mapping matrix before and after deformation of the neck region in the clothing template image is then determined from the neck key points corresponding to the user image and those corresponding to the clothing template image; as when determining the coordinate mapping matrix of the clothing region in step F2, an abscissa mapping matrix for the horizontal direction and an ordinate mapping matrix for the vertical direction are obtained. The neck region in the clothing template image is then deformed according to this coordinate mapping matrix: horizontally according to the abscissa mapping matrix before and after deformation, and vertically according to the ordinate mapping matrix before and after deformation.
The coordinates of the neck key points of the neck region of the model in the clothing template image after the deformation processing are the same as the coordinates of the corresponding neck key points in the user image, and the size and the shape of the neck region of the model are close to the size and the shape of the complete neck region of the user in the user image. Thus, a better changing effect can be obtained when changing the model by using the neck area.
G3: and performing deformation processing on the clothing area in the clothing template image according to the neck key point and the clothing key point.
The operation in this step is the same as the process of deforming the clothing region in the clothing template image in step F2, and is not described herein again.
G4: and covering the neck area and the clothing area of the deformed clothing template image in the user image to obtain a change effect image.
After the color migration of the neck region of the clothing template image in step G1, the deformation of the neck region in step G2 and the deformation of the clothing region in step G3, the deformed neck region and clothing region of the clothing template image are overlaid on the user image so that the deformed neck region connects with the face region in the user image, yielding the changing effect image. The changing effect map thus includes the original head region of the user image together with the deformed neck region and the deformed clothing region of the clothing template image.
After the neck area and the clothing area of the deformed clothing template image are covered on the user image, a certain deviation may exist at the part where the neck area is connected with the face area to influence the reloading effect. In other embodiments, after the neck area and the clothing area of the deformed clothing template image are covered on the user image, the face area or the skin area of the face of the user is obtained from the original user image, and the obtained face area or the skin area of the face is covered on the face of the reloading effect map again to obtain the final reloading effect map.
In some embodiments of the present application, the hair region in the user image may also affect the changing effect. For example, in the case of long hair draped behind the back shown in fig. 18(a), overlaying the clothing template image on the user image may leave a gap between the hair region and the clothing region in the changing effect image (as shown in fig. 18(b)), making the result look unnatural. Therefore, before step F3 or G4 is performed, the hair region in the user image is classified, and measures are taken according to the classification result to eliminate the influence of the hair region on the changing effect.
In the virtual reloading scene, if the hair in the original picture is short, the hair region does not need to be stretched or deformed and a good changing effect is obtained. If the hair in the original picture is long and draped in front, no gap appears between the hair region and the clothing region after the clothing template image is overlaid on the original picture, so the hair region likewise does not need to be stretched or deformed. If the hair in the original picture is draped behind, a gap may appear between the hair and the clothing after the change; in this case stretching and deforming the hair region improves the changing effect, so this case has to be handled separately, and the left and right sides have to be considered separately as well, since the hair may be draped in front on one side and behind on the other.
Based on the different influence of different hair types on the changing effect, the embodiment of the application divides hair into five categories: short hair (hair that does not pass the upper boundary of the shoulders), left-side front drape, right-side front drape, left-side back drape and right-side back drape. The hair category of the user image is identified through a preset classification model. In the virtual reloading scene, if the output hair category of the user image is left-side back drape and/or right-side back drape, the hair region on the back-draped side can be stretched and deformed before the change, and the clothing template image is then overlaid on the stretched user image; this avoids a gap between the clothing region and the hair region after the change and improves the changing effect.
If the classification result of the hair region in the user image is short hair or front drape, step F3 or G4 can be executed without any special processing of the hair region. If the classification result is a long back drape, it is still possible that the clothing in the clothing template image sits higher than the clothing in the user image, so that after the change the clothing template covers part of the hair and no gap remains between clothing and hair; in that case the hair region does not need to be deformed. Whether the hair region needs to be deformed therefore has to be judged further from the clothing region in the clothing template image and the hair region.
In one possible implementation, the ordinate differences of the adjacent edges between the clothing region and the hair region are traversed column by column and the maximum ordinate difference is taken. If the maximum ordinate difference is smaller than a preset value, the clothing region covers part of the hair and the hair region does not need to be deformed; if it is larger than the preset value, the hair region needs to be deformed.
The ordinates in the clothing region are values in a rectangular coordinate system whose origin is the upper left corner of the clothing template image, and the ordinates in the hair region are values in a rectangular coordinate system whose origin is the upper left corner of the user image. Since the clothing template image and the user image have the same size, the ordinate difference is simply the ordinate of the clothing region edge minus the ordinate of the hair region edge.
For example, for a given column, if the ordinate of the clothing region edge is smaller than the ordinate of the hair region edge, their difference is smaller than 0 and the clothing pixels in that column will cover the hair; if the ordinate of the clothing region edge is larger than the ordinate of the hair region edge, their difference is larger than 0 and the clothing pixels in that column will not cover the hair.
It follows that the above preset value can be set to 0.
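A sketch of this column-by-column check is given below, assuming the clothing region and the hair region are given as binary masks of equal size (the mask representation and the function name are assumptions):

import numpy as np

def hair_needs_deformation(clothes_mask, hair_mask, threshold=0):
    # For each column, compare the top edge (smallest ordinate) of the clothing
    # with the bottom edge (largest ordinate) of the hair; a positive difference
    # means the clothing starts below the hair end, i.e. a gap would remain.
    h, w = hair_mask.shape
    max_diff = -np.inf
    for col in range(w):
        clothes_ys = np.where(clothes_mask[:, col] > 0)[0]
        hair_ys = np.where(hair_mask[:, col] > 0)[0]
        if clothes_ys.size == 0 or hair_ys.size == 0:
            continue
        max_diff = max(max_diff, clothes_ys.min() - hair_ys.max())
    return max_diff > threshold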
If it is determined that the hair region needs to be deformed in the above manner, the hair region is deformed by the following operations of steps I1 to I4.
I1: the hair region and the face region in the user image are located.
The hair area and the face area are positioned by performing semantic segmentation on hair and faces in the user image. The hair area is an area formed by pixels of which the label type is hair, the white area shown in a diagram (a) in fig. 19 is a hair area, the face area is an area formed by pixels of which the label type is a face, the area includes an area surrounded by an auricle and the whole face, and the white area shown in a diagram (b) in fig. 19 is a face area.
I2: and expanding the face area of the user image towards the vertex direction to obtain a protection area.
By expanding the face region toward the top of the head, the protection region covers the region above the face that does not need to be processed, as shown in fig. 20.
It should be noted that, taking the changing scene as an example, if the user's hair in the image is short (the hair does not pass the shoulders) or is a long front drape on both sides, the original relationship between hair and clothing is preserved after the change, because there is a gap between short hair and the garment, or the long hair simply lies in front of the garment; the changing effect is not affected and the hair does not need to be deformed. If the user's hair is a long back drape, a gap appears after the change between the hair and the clothing on the back-draped side (on one side for a single-side back drape, on both sides for a double-side back drape), which affects the changing effect; deformation is therefore needed as long as the hair on at least one side is draped behind.
I3: the hair region and the protective region are used to define the region to be treated.
Since the protection region covers the hair that does not need to be treated, the region to be treated can be obtained by a set subtraction between the complete hair region and the protection region.
In an alternative embodiment, a set subtraction operation may be performed between the hair region and the protection region to obtain the region to be treated. Fig. 21 shows the region to be treated obtained by the set subtraction between fig. 19(a) and fig. 20, i.e. the hair region located under the auricle.
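A simple sketch of steps I2 and I3 is given below, in which the upward expansion of the face region is realized column by column; this is only one possible realization, since the embodiment does not prescribe how the expansion is implemented:

import numpy as np

def region_to_process(hair_mask, face_mask):
    # Protection region: in every column that contains face pixels, protect all
    # rows from the top of the image down to the lowest face pixel of that
    # column, i.e. the face region extended toward the top of the head.
    protection = np.zeros_like(face_mask)
    for col in range(face_mask.shape[1]):
        rows = np.where(face_mask[:, col] > 0)[0]
        if rows.size:
            protection[: rows.max() + 1, col] = 255
    # Set subtraction: keep only the hair pixels the protection does not cover.
    return np.where(protection > 0, 0, hair_mask)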
I4: and carrying out deformation processing on the area to be processed on the user image along the direction opposite to the direction of the top of the head to obtain the user image with deformed hair.
In a possible implementation manner, the deformation coefficient corresponding to each column of the to-be-processed region is obtained, and then the to-be-processed region on the user image is subjected to deformation processing along a direction opposite to the vertex direction according to the deformation coefficient corresponding to each column of the to-be-processed region, so that the user image with deformed hair is obtained.
Deforming in the direction opposite to the top of the head means the region to be processed is deformed downward. When each column of the hair region is processed, the larger the deformation coefficient, the longer the hair in that column remains after processing; the smaller the deformation coefficient, the shorter it becomes.
In an alternative embodiment, for the process of obtaining the deformation coefficient corresponding to each column on the to-be-processed area, the maximum ordinate and the minimum ordinate of each column on the hair area and the minimum ordinate of each column on the clothing area are obtained, and then for each column on the hair area, the deformation coefficient corresponding to the column is determined by using the maximum ordinate and the minimum ordinate of the column on the hair area and the minimum ordinate of the column on the clothing area.
The columns of the to-be-processed area are the same as the columns of the hair area, so that the deformation coefficient corresponding to each column of the hair area is the deformation coefficient corresponding to each column of the to-be-processed area. Or, since the area to be treated belongs to a part of the hair area, the columns on the area to be treated all belong to the columns on the hair area.
Optionally, the calculation formula of the deformation coefficient corresponding to a certain column is as follows:
scale = (min_y_cloth - min_y_hair) / (max_y_hair - min_y_hair) (Equation 1)

In Equation 1, scale is the deformation coefficient, min_y_cloth is the minimum ordinate of the column on the clothing region (i.e. the ordinate of the clothing shoulder edge), and max_y_hair and min_y_hair are the maximum and minimum ordinates of the column on the hair region (i.e. the ordinates of the hair end and the hair top), respectively.
In an optional embodiment, in a process of performing deformation processing on a to-be-processed area on a user image in a direction opposite to a vertex direction, a maximum vertical coordinate and a minimum vertical coordinate of each column on the to-be-processed area are obtained, then, for each column on the to-be-processed area, a deformed vertical coordinate of the column is determined according to the maximum vertical coordinate, the minimum vertical coordinate and a deformation coefficient of the column, and finally, deformation processing is performed on the to-be-processed area on the user image according to the deformed vertical coordinate, the minimum vertical coordinate and the maximum vertical coordinate of each column on the to-be-processed area.
Wherein, the calculation formula of the transformed vertical coordinate of a certain column is as follows:
end_h_new = start_h + (end_h - start_h) × scale (Equation 2)

In Equation 2, end_h_new is the deformed ordinate of the column, start_h is the minimum ordinate of the column, end_h is the maximum ordinate of the column, and scale is the deformation coefficient of the column.
That is, the vertical coordinate range of a certain column on the region to be processed is (start _ h, end _ h), and the vertical coordinate range of the column after processing becomes (start _ h, end _ h _ new).
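A tiny numeric illustration of Equations 1 and 2 with made-up coordinates:

# Illustrative values for one column (ordinates are row indices, growing downward):
min_y_hair = 120    # top of the hair in this column
max_y_hair = 480    # end of the hair in this column
min_y_cloth = 400   # shoulder edge of the clothing in this column

scale = (min_y_cloth - min_y_hair) / (max_y_hair - min_y_hair)   # Equation 1: 280 / 360 ≈ 0.78
start_h, end_h = 300, 480   # vertical extent of the region to be processed in this column
end_h_new = start_h + (end_h - start_h) * scale                  # Equation 2: 300 + 180 * scale = 440.0
# The hair in this column is therefore shortened so that it ends near row 440.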
In one possible implementation, when the region to be processed is deformed according to the deformed ordinate, the minimum ordinate and the maximum ordinate of each of its columns, a first map matrix and a second map matrix are first generated from the width and height of the user image; they represent the mapping of the abscissas and ordinates of the user image, respectively. Linear interpolation is then performed with the deformed ordinate, the minimum ordinate and the maximum ordinate of each column of the region to be processed to update the second map matrix, and the region to be processed on the user image is deformed using the first map matrix and the updated second map matrix.
Wherein, the first map matrix does not need to be updated because the column direction does not need to be processed.
In a specific implementation, the first map matrix and the second map matrix can be obtained by initializing the coordinate mapping matrices used by OpenCV's remap function, as follows:
import numpy as np

map_x = np.zeros((h, w), dtype=np.float32)
map_y = np.zeros((h, w), dtype=np.float32)
map_x[:, :] = np.arange(w)
map_y[:, :] = np.arange(h)[:, np.newaxis]
The first two statements initialize two all-zero map matrices of size h × w, where h is the height of the user image and w is its width; each element of the map matrices is a floating-point value.
As shown in fig. 22, for the first map matrix map _ x, each row element has a value of 0 to w-1, and for the second map matrix map _ y, each column element has a value of 0 to h-1.
Optionally, the linear interpolation function for updating the second map matrix map _ y is as follows:
map_y[start_h:end_h_new, col] = np.interp(np.arange(start_h, end_h_new), [start_h, end_h_new], [start_h, end_h])
wherein end_h_new is the deformed ordinate, start_h is the minimum ordinate, and end_h is the maximum ordinate.
It can be seen from the above linear interpolation function that the ordinate value representing each column of the region to be processed in the second map matrix is updated.
It should be noted that, after the hair-deformed user image is obtained, the changing image is obtained by overlaying the clothing region on the hair-deformed user image.
As shown in fig. 23, fig. 23(a) is the original image before the change and fig. 23(c) is the image after the change. Before the change, the user in the original image has long hair draped behind; after the processing of steps I1 to I4, the clothing region of the clothing template image is overlaid on the image according to the coordinate correspondence to realize the change, and as fig. 23(c) shows, no gap appears between the hair and the clothing.
After the hair region and the face region are located, the face region is expanded toward the top of the head to obtain the protection region, which covers the hair region that does not need to be processed. The region to be processed is then obtained from the located hair region and the protection region, and is deformed on the user image in the direction opposite to the top of the head to obtain the hair-deformed user image.
Further, when the method is applied to a changing scene, the face region is expanded toward the top of the head to cover the hair that does not need to be processed, leaving the region to be processed. After the region to be processed is subsequently deformed downward, no gap appears between the hair region and the clothing even when the change is performed, which ensures a seamless connection between clothing and hair after the change.
To facilitate an understanding of how the hair deformation is applied during the change, reference is made to the accompanying drawings. As shown in fig. 24, the user picture and the clothing template picture uploaded by the user are received and aligned so that they have the same size and the human bodies in the two pictures are aligned. The hair in the user picture is classified to obtain a classification result, and the clothing region in the clothing template picture is located. If the classification result is short hair or a front drape, no deformation is needed and the clothing region can be overlaid directly on the user picture according to the coordinate correspondence to realize the change. If the classification result is a long back drape, the hair region and the face region in the user picture are located, and it is further judged from the clothing region and the hair region whether the hair region needs to be deformed. If it does, the face region is expanded toward the top of the head to obtain the protection region, the region to be processed is determined from the hair region and the protection region, the region to be processed on the user picture is deformed in the direction opposite to the top of the head to obtain the hair-deformed user picture, and finally the clothing region is overlaid on the hair-processed user picture according to the coordinate correspondence to realize the change.
Based on the above description, in the changing scene the hair region in the user picture is classified. When the classification result is a long back drape, it is judged from the clothing region in the clothing template picture and the hair region whether the hair region needs to be deformed. If so, the face region is expanded toward the top of the head to cover the hair region that does not need to be processed, giving the protection region; the region to be processed is then obtained from the protection region and the hair region, the region to be processed on the user picture is deformed in the direction opposite to the top of the head to obtain the hair-deformed user picture, and the clothing region is overlaid on it to realize the change.
The hair region is deformed in the above manner when the hair type is a long back drape, and the operation of step F3 or step G4 is then executed to complete the change and obtain the changing effect picture.
According to the method and the device of the application, after the change based on the user image and the clothing template image, the effect picture can be scaled to a preset size; for example, a certificate photo usually has a specified size. The final image obtained after scaling to the preset size is returned to the user.
In other embodiments of the present application, the user image uploaded by the user may not need to be changed at all. For example, in a certificate-photo scene, if the clothing in the uploaded user image already meets the requirement and the user does not select a clothing template image, the uploaded user image is only scaled to the preset size.
In other embodiments, preset conditions required to be met by the user image in the reloading scene are also configured in advance, after the user image uploaded by the user is received, whether the user image meets the preset conditions or not is judged, and if yes, the user image is reloaded, resized and the like through the embodiment of the application. And if the user image is determined to be not in accordance with the preset condition, returning prompt information that the user image is not in accordance with the requirement so that the user can upload the user image meeting the requirement again.
The preset condition is related to a specific reloading scene, for example, in a certificate photo scene, the preset condition may include that a user image needs to include a complete face region and at least a partial neck region.
In the embodiment of the application, the neck key points in the user image and the clothing key points in the clothing template image are determined, and the human body regions such as the hair region, face region, neck region and clothing region in the user image and the clothing template image are segmented. On this basis, the neck region of the user image is repaired, the clothing region in the clothing template image is deformed, the hair region is deformed, and the clothing region of the clothing template image is overlaid on the user image to achieve a good changing effect. Alternatively, color migration is performed on the neck region in the clothing template image, the neck region and the clothing region in the clothing template image are deformed, the hair region is deformed, and the neck region and clothing region of the clothing template image are overlaid on the user image to achieve a good changing effect.
The embodiment of the present application further provides a virtual reloading device, configured to execute the virtual reloading method provided in any of the foregoing embodiments. As shown in fig. 25, the apparatus includes:
an image obtaining module 201, configured to obtain a user image and a clothing template image to be reloaded;
the key point acquisition module 202 is used for acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image;
the neck occlusion detection module 203 is configured to detect whether a neck in the user image is occluded;
the generating module 204 is configured to, if the neck occlusion detecting module detects that the neck in the user image is occluded, generate a suit changing effect map corresponding to the user image and the suit template image by using a neck region in the suit template image according to the neck key point and the suit key point; and if the neck shielding detection module detects that the neck in the user image is not shielded, generating a reloading effect picture corresponding to the user image and the clothes template image by using the neck area in the user image according to the neck key point and the clothes key point.
The generating module 204 is used for repairing a neck area in the user image according to the neck key point, the clothes key point, the user image and the clothes template image; according to the neck key points and the clothes key points, performing deformation processing on the clothes area in the clothes template image; and covering the clothing area of the deformed clothing template image into the repaired user image to obtain a changing effect image.
Further comprising: the neck repairing module is used for generating a mask image of a to-be-repaired area of a neck in the user image according to the neck key point, the clothes key point, the user image and the clothes template image; if the area of the area to be repaired in the mask image of the area to be repaired is larger than 0, generating a background repair image corresponding to the user image; and repairing the neck area in the user image according to the mask image and the background repairing image of the area to be repaired.
The neck repairing module is used for acquiring a neck mask image corresponding to a neck area in the user image according to the user image; acquiring a clothing mask image corresponding to the clothing template image according to the neck key point, the clothing key point and the clothing template image; and generating a mask image of a to-be-repaired area of the neck in the user image according to the neck mask image and the clothing mask image.
The neck repairing module is used for detecting all face key points in the user image; according to the key points of the face, carrying out face alignment processing on the user image; segmenting a neck region from the aligned user image through a preset semantic segmentation model; and generating a neck mask image corresponding to the neck area.
The neck repairing module is used for aligning the clothes key points in the clothes template image with the neck key points; and determining the aligned clothing template image as a clothing mask image.
The neck repairing module is used for splicing the neck mask image and the clothing mask image according to a neck key point in the neck mask image and a clothing key point in the clothing mask image to obtain a spliced mask image; determining a region to be detected from the edge of the collar of the garment to the upper edge of the neck region in the spliced mask image; determining a region to be repaired from the region to be detected; and generating a mask image of the area to be repaired corresponding to the area to be repaired.
The neck repairing module is used for traversing the attribution image of each pixel point in the area to be detected; screening out all pixel points of which the belonging image is neither the neck mask image nor the clothing mask image; and determining the area formed by all the screened pixel points as the area to be repaired.
The neck repairing module is used for calculating the difference value between the vertical coordinate of the pixel point at the edge of the collar in the same row of pixels and the vertical coordinate of the pixel point at the lower edge of the neck area; marking the pixels at the edge of the collar and the pixels at the lower edge of the neck area with the difference value larger than zero; and determining the area surrounded by the connecting lines of all the marked pixel points as the area to be repaired.
The neck repairing module is used for extracting the main color of a neck area in the user image; drawing a pure color background picture corresponding to the main color; deducting an image of the head and neck region from the user image; and covering the image of the head and neck region into the pure-color background image to obtain a background restoration image corresponding to the user image.
The neck repairing module is used for determining all color values contained in a neck sub-area of the user image; counting the number of pixel points corresponding to each color value in the neck region; and determining the color value with the largest number of pixel points as the dominant color of the neck region.
And the neck repairing module is used for inputting the mask image and the background repairing image of the area to be repaired into a preset image repairing network to obtain a repaired image corresponding to the user image.
The generating module 204 is configured to perform color migration processing on a neck region in the clothing template image according to the user image; according to the neck key points, performing deformation processing on a neck area in the clothing template image; according to the neck key points and the clothes key points, carrying out deformation processing on the clothes area in the clothes template image; and covering the neck area and the clothing area of the deformed clothing template image in the user image to obtain a change effect image.
Further comprising: the color migration module is used for extracting a main color of facial skin and a main color of first neck skin in the user image; and adjusting the color of the neck area of the clothes template image according to the main color of the face skin and the main color of the first neck skin.
The color migration module is used for calculating the ratio of the first neck area of the user image to the face area of the user image and determining whether the ratio is located in a preset interval; if so, fusing the main color of the face skin and the main color of the first neck skin, and adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the fused main colors; if not, and the ratio is larger than the upper limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the primary color of the skin of the first neck; if not, and the ratio is smaller than the lower limit value of the preset interval, adjusting the UV channel value of each pixel point in the neck area of the clothing template image according to the main color of the facial skin.
The color migration module is used for respectively converting the color space of the face skin main color and the color space of the first neck skin main color into a YUV color space and acquiring a UV channel value of the face skin main color and a UV channel value of the first neck skin main color in the YUV color space; determining a UV channel value of a fusion main color according to the UV channel value and the corresponding weight coefficient of the main color of the face skin, and the UV channel value and the corresponding weight coefficient of the main color of the first neck skin; and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the fused main color.
The color migration module is used for converting the color space of the first neck skin main color into a YUV color space and acquiring a UV channel value of the first neck skin main color in the YUV color space; converting the color space of the clothes template image into a YUV color space; and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the first neck skin main color.
The color migration module is used for converting the color space of the main color of the facial skin into a YUV color space and acquiring a UV channel value of the main color of the facial skin in the YUV color space; converting the color space of the image of the clothing template into a YUV color space; and replacing the UV channel value of each pixel point in the neck area of the clothing template image with the UV channel value of the main color of the facial skin.
The color migration module is used for extracting a second neck skin main color of the clothing template image; and adjusting the brightness of each pixel point in the neck area of the clothing template image according to the brightness value of the second neck skin main color and the brightness value of the face skin main color.
The color migration module is used for determining a brightness adjustment parameter according to the brightness value of the second neck skin main color and the brightness value of the face skin main color; and adjusting the brightness of each pixel point in the neck region of the clothing template image based on the brightness adjusting parameter.
The color transfer module is used for converting the color space of the second neck skin main color and the color space of the face skin main color into the HSV color space; respectively acquiring a first brightness value of a main color of facial skin and a second brightness value of a main color of second neck skin under an HSV color space; and calculating a brightness adjusting parameter according to the second brightness value and the first brightness value.
Further comprising: the neck deformation module is used for acquiring a neck key point of a neck area in the garment template image; determining a coordinate mapping matrix before and after deformation of a neck region in the clothing template image according to the neck key point corresponding to the user image and the neck key point corresponding to the clothing template image; and according to the coordinate mapping matrix before and after the neck region is deformed, deforming the neck region in the clothing template image.
Further comprising: the clothing deformation module is used for determining a coordinate mapping matrix before and after deformation of a clothing area in the clothing template image according to the neck key point and the clothing key point; and carrying out deformation processing on the clothing area in the clothing template image according to the coordinate mapping matrix.
The clothing deformation module is used for calculating a horizontal coordinate mapping matrix before and after deformation of a clothing area in the clothing template image according to the neck key point and the clothing key point; and calculating a longitudinal coordinate mapping matrix before and after deformation of the clothing region according to the neck key points and the clothing key points.
The clothing deformation module is used for dividing the width of the user image into a plurality of sections of first horizontal coordinate intervals along the horizontal direction according to the horizontal coordinate of each neck key point; dividing the width of the clothes template image into a plurality of sections of second abscissa intervals along the horizontal direction according to the abscissa of each clothes key point, wherein the number of the first abscissa intervals is equal to that of the second abscissa intervals; and calculating an abscissa mapping matrix corresponding to the clothing region in the clothing template image by utilizing a linear interpolation and a deformation coordinate mapping function according to the first abscissa intervals and the second abscissa intervals.
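As a rough illustration of this interval-based abscissa mapping (the knot construction from the image borders plus the sorted keypoint abscissas is an assumption), linear interpolation can produce, for every destination column, the source column to sample from:

```python
import numpy as np

def build_x_mapping(user_width, template_width, neck_xs, clothing_xs):
    """For every destination column of the user image, return the source abscissa in the
    clothing template to sample from, by piecewise-linear interpolation between
    corresponding keypoint abscissas."""
    dst_knots = np.concatenate(([0], np.sort(neck_xs), [user_width - 1])).astype(np.float32)
    src_knots = np.concatenate(([0], np.sort(clothing_xs), [template_width - 1])).astype(np.float32)
    # Equal interval counts, as required by the description above.
    assert len(dst_knots) == len(src_knots)
    dst_cols = np.arange(user_width, dtype=np.float32)
    return np.interp(dst_cols, dst_knots, src_knots)
```

Tiling the returned per-column vector over the image height yields an (H, W) abscissa mapping matrix of the kind consumed by the deformation step described further below.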
The clothing deformation module is used for calculating a scaling coefficient of a vertical coordinate corresponding to each horizontal coordinate in the clothing area according to the neck key point and the clothing key point; and calculating a vertical coordinate mapping matrix corresponding to the clothing area by using a deformation coordinate mapping function according to the height of the clothing template image, the vertical coordinate of each coordinate point of the clothing area and the scaling coefficient corresponding to each vertical coordinate.
The clothing deformation module is used for dividing the width of the user image into a plurality of sections of first horizontal coordinate intervals along the horizontal direction according to the horizontal coordinate of each neck key point; respectively calculating a scaling coefficient corresponding to the vertical coordinate of each clothes key point according to the height of the clothes template image, the neck key point and the clothes key point; and calculating the scaling coefficient of the ordinate corresponding to each abscissa in the clothing area by utilizing linear interpolation and a deformation coordinate mapping function according to the scaling coefficients corresponding to the first abscissa intervals and the ordinate of each clothing key point.
The clothing deformation module is used for carrying out overall scaling processing on the clothing area in the clothing template image, so that after scaling the key point with the maximum vertical coordinate on the border line of the collar in the clothing template image coincides with the key point of the clavicle area in the user image that lies on the vertical central axis of the neck; and recalculating each clothing key point in the clothing template image after the overall scaling processing.
The clothing deformation module is used for calculating an overall scaling coefficient according to the height of the clothing template image, the vertical coordinate of the intersection point of the boundary lines on the left and right sides of the neckline in the clothing template image, and the vertical coordinate of the key point of the clavicle area in the user image that lies on the vertical central axis of the neck; and calculating a vertical coordinate mapping matrix of the clothing area before and after the overall scaling processing by using a deformation coordinate mapping function according to the height of the clothing template image, the vertical coordinate of each coordinate point of the clothing area and the overall scaling coefficient.
The clothing deformation module is used for keeping the abscissa of each clothing key point unchanged after the integral scaling treatment; and respectively calculating the vertical coordinate of each clothes key point after the integral scaling treatment according to the height of the clothes template image, the integral scaling coefficient and the vertical coordinate of each clothes key point before the integral scaling treatment.
And the clothing deformation module is used for calculating a final longitudinal coordinate mapping matrix corresponding to the clothing area by using a deformation coordinate mapping function according to the height of the clothing template image, the longitudinal coordinate mapping matrix before and after the integral scaling processing and the scaling coefficient corresponding to each longitudinal coordinate.
The clothing deformation module is used for carrying out deformation processing in the horizontal direction on a clothing area in the clothing template image through a preset deformation algorithm according to the abscissa mapping matrix included by the coordinate mapping matrix; and according to a vertical coordinate mapping matrix included in the coordinate mapping matrix, performing deformation processing on the clothing region in the vertical direction through a preset deformation algorithm.
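The text does not name the preset deformation algorithm; one common choice, assumed here purely for illustration, is backward warping with OpenCV's remap, which consumes exactly the two coordinate mapping matrices described above:

```python
import cv2
import numpy as np

def warp_with_maps(template_bgr, map_x, map_y):
    """Apply the abscissa and ordinate mapping matrices; each map gives, for every
    destination pixel, the source coordinate to sample in the clothing template."""
    return cv2.remap(template_bgr,
                     map_x.astype(np.float32), map_y.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0))
```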
Further comprising: hair deformation processing for acquiring a classification result of a hair region in the user image; if the classification result is the type of the long-hair backward drape, judging whether the hair area needs to be subjected to deformation processing according to the clothes area and the hair area in the clothes template image; and if so, performing deformation processing on the hair area in the user image.
The hair deformation processing is used for traversing, column by column, the difference in vertical coordinates of adjacent edges between the garment region and the hair region; acquiring the maximum vertical coordinate difference from the vertical coordinate differences; if the maximum vertical coordinate difference is smaller than a preset value, determining that deformation processing does not need to be carried out on the hair area; if the maximum vertical coordinate difference is larger than the preset value, determining that the hair area needs to be deformed; wherein the difference in vertical coordinates is the vertical coordinate of the edge of the garment region minus the vertical coordinate of the lower edge of the hair region.
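A minimal sketch of this column-wise check, assuming binary masks for the garment and hair regions and an illustrative preset value:

```python
import numpy as np

def needs_hair_deformation(clothing_mask, hair_mask, threshold=5):
    """Traverse columns where both regions exist and compare the clothing edge with the
    lower hair edge; deform only if the maximum difference exceeds the preset value."""
    max_diff = None
    for col in range(hair_mask.shape[1]):
        hair_rows = np.where(hair_mask[:, col] > 0)[0]
        cloth_rows = np.where(clothing_mask[:, col] > 0)[0]
        if hair_rows.size == 0 or cloth_rows.size == 0:
            continue
        # Clothing edge ordinate minus the lower-edge ordinate of the hair.
        diff = int(cloth_rows.min()) - int(hair_rows.max())
        max_diff = diff if max_diff is None else max(max_diff, diff)
    return max_diff is not None and max_diff > threshold
```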
The hair deformation processing is used for expanding the face area of the user image towards the vertex direction to obtain a protection area; determining an area to be processed by using the hair area and the protection area; and carrying out deformation processing on the area to be processed on the user image along the direction opposite to the vertex direction to obtain the user image with deformed hair.
The hair deformation processing is used for acquiring the deformation coefficient corresponding to each column on the area to be processed; and according to the deformation coefficient corresponding to each column on the area to be processed, carrying out deformation processing on the area to be processed on the user image along the direction opposite to the vertex direction to obtain the user image with deformed hair.
The hair deformation processing is used for acquiring the maximum ordinate and the minimum ordinate of each column on the area to be processed; determining a deformed ordinate of each column according to the maximum ordinate, the minimum ordinate and the deformation coefficient of each column on the area to be processed; and carrying out deformation processing on the area to be processed on the user image according to the deformed ordinate, the minimum ordinate and the maximum ordinate of each column on the area to be processed.
The hair deformation processing is used for acquiring the maximum ordinate and the minimum ordinate of each column on the hair region, and acquiring the minimum ordinate of each column on the clothes region; for each column on the hair area, determining the deformation coefficient corresponding to the column by using the maximum ordinate and the minimum ordinate of the column on the hair area and the minimum ordinate of the column on the clothes area; wherein the columns of the area to be processed are the same as the columns of the hair area.
The hair deformation processing is used for generating a first map matrix and a second map matrix by utilizing the width and the height of the user image respectively so as to represent the mapping relation between the horizontal coordinate and the vertical coordinate of the user image by the first map matrix and the second map matrix; performing linear interpolation according to the deformed ordinate, the minimum ordinate and the maximum ordinate of each column on the region to be processed so as to update the second map matrix; and performing deformation processing on the region to be processed on the user image by using the first map matrix and the updated second map matrix.
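The map-matrix mechanics can be sketched roughly as follows; `y_min`, `y_max` and `y_new` are assumed to be per-column arrays for the region to be processed, and the per-column update rule below is only an assumption consistent with the linear-interpolation description, not a formula taken from the patent:

```python
import cv2
import numpy as np

def warp_hair_region(user_bgr, y_min, y_max, y_new):
    """Build identity map matrices for the user image, then, column by column, let the
    destination ordinates [y_new, y_max] sample the source ordinates [y_min, y_max] by
    linear interpolation, and apply the maps with cv2.remap."""
    h, w = user_bgr.shape[:2]
    # First and second map matrices: identity mapping of abscissas and ordinates.
    map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                               np.arange(h, dtype=np.float32))
    for col in range(w):
        y0, y1, yn = int(y_min[col]), int(y_max[col]), int(y_new[col])
        if y1 <= y0 or y1 <= yn:          # empty column or nothing to compress
            continue
        dst = np.arange(yn, y1 + 1)                # destination ordinates
        src = np.interp(dst, [yn, y1], [y0, y1])   # interpolated source ordinates
        map_y[dst, col] = src.astype(np.float32)
    return cv2.remap(user_bgr, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```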
The hair deformation processing is used for carrying out a set subtraction operation between the hair area and the protection area to obtain the area to be processed.
The neck shielding detection module 203 is used for determining an image to be detected according to the proportion of a target area in the user image; and inputting the image to be detected into the trained neck shielding detection model, and judging whether the neck in the image to be detected is shielded or not by the neck shielding detection model.
The neck shielding detection module 203 is configured to detect a face area including the neck in the user image as a target area; determine the ratio between the area of the target region and the area of the user image; if the ratio exceeds a ratio threshold, determine the user image as the image to be detected; and if the ratio does not exceed the ratio threshold, crop the target area from the user image, enlarge the cropped target area, and then determine it as the image to be detected.
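A minimal sketch of this input-selection step; the bounding-box format, the ratio threshold and the enlargement factor are illustrative assumptions:

```python
import cv2

def select_detection_input(user_bgr, target_box, ratio_threshold=0.35, upscale=2.0):
    """Use the whole user image when the face-plus-neck box occupies a large enough
    proportion of it; otherwise crop the box and enlarge the crop."""
    h, w = user_bgr.shape[:2]
    x0, y0, x1, y1 = target_box
    ratio = ((x1 - x0) * (y1 - y0)) / float(w * h)
    if ratio > ratio_threshold:
        return user_bgr
    crop = user_bgr[y0:y1, x0:x1]
    return cv2.resize(crop, None, fx=upscale, fy=upscale,
                      interpolation=cv2.INTER_LINEAR)
```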
The neck shielding detection model comprises a first branch network, a second branch network, a fusion module and an output layer, wherein the first branch network and the second branch network are executed in parallel and independently; the neck shielding detection module 203 is configured to perform first preset convolution processing on the image to be detected through the first branch network in the neck shielding detection model to obtain a first feature map; perform second preset convolution processing on the image to be detected through the second branch network in the neck shielding detection model to obtain a second feature map; perform channel fusion on the first feature map and the second feature map through the fusion module in the neck shielding detection model to obtain a fusion feature map; and obtain, through the output layer in the neck shielding detection model, a judgment result of whether the neck is shielded based on the fusion feature map.
The first branch network includes a first depthwise separable convolutional layer and a first upsampling layer; the neck shielding detection module 203 is used for performing feature extraction on the channels of the image to be detected through the convolution kernels included in the first depthwise separable convolutional layer to obtain a single-channel feature map; and performing dimension-raising processing on the single-channel feature map through a preset number of 1×1 convolution kernels contained in the first upsampling layer to obtain the first feature map.
The second branch network includes a downsampling layer, a second depthwise separable convolutional layer, and a second upsampling layer; the neck shielding detection module 203 is used for performing dimension-reduction processing on the image to be detected through a single-channel 1×1 convolution kernel contained in the downsampling layer to obtain a single-channel feature map; performing feature extraction on the single-channel feature map through the convolution kernels included in the second depthwise separable convolutional layer to obtain a feature-extracted single-channel feature map; and performing dimension-raising processing on the feature-extracted single-channel feature map through a preset number of 1×1 convolution kernels contained in the second upsampling layer to obtain the second feature map.
The neck shielding detection module 203 is used for stacking the first feature map and the second feature map channel by channel through the channel splicing layer in the fusion module to obtain a channel-stacked feature map; and shuffling the channel stacking order of the channel-stacked feature map through the channel mixing layer in the fusion module to obtain the fusion feature map.
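To make the two-branch structure more concrete, here is a minimal PyTorch sketch; the channel counts, kernel sizes and ReLU activations are illustrative assumptions, and the channel shuffle is the standard ShuffleNet-style operation rather than anything the text prescribes:

```python
import torch
import torch.nn as nn

class NeckOcclusionNet(nn.Module):
    def __init__(self, in_ch=3, mid_ch=8, groups=2):
        super().__init__()
        self.groups = groups
        # Branch 1: depthwise convolution, reduce to one channel, then 1x1 convs
        # raise the channel dimension again.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.Conv2d(in_ch, 1, 1),
            nn.Conv2d(1, mid_ch, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 down-sampling to one channel, a small convolution, then
        # 1x1 convs to raise the channel dimension.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, 1, 1),
            nn.Conv2d(1, 1, 3, padding=1),
            nn.Conv2d(1, mid_ch, 1), nn.ReLU(inplace=True))
        # Output head: pooling plus a linear layer -> occluded / not occluded.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(2 * mid_ch, 2))

    @staticmethod
    def channel_shuffle(x, groups):
        # Perturb the channel stacking order of the concatenated feature map.
        b, c, h, w = x.shape
        return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x):
        fused = torch.cat([self.branch1(x), self.branch2(x)], dim=1)  # channel splice
        return self.head(self.channel_shuffle(fused, self.groups))
```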
The neck shielding detection module 203 is configured to obtain a data set, where samples in the data set include a positive sample and a negative sample; traversing each sample in the data set, inputting the currently traversed sample into a pre-constructed neck shielding detection model, predicting whether the sample is shielded by the neck shielding detection model and outputting a prediction result; calculating a loss value by using the prediction result, the positive and negative sample balance coefficients and the sample number balance coefficient; and when the change rate of the loss value is greater than the change threshold value, optimizing network parameters of the neck shielding detection model according to the loss value, adjusting the positive and negative sample balance coefficients and the sample number balance coefficient, and continuously executing the process of traversing each sample in the data set until the change rate of the loss value is lower than the change threshold value.
The neck occlusion detection module 203 is configured to perform a data enhancement process on each sample in the data set, and add the processed sample to the data set to expand the data set.
The neck shielding detection module 203 is configured to increase both the positive and negative sample balance coefficients and the sample number balance coefficient by a first step length if the loss value is greater than a preset threshold; if the loss value is smaller than the preset threshold value, increasing the positive and negative sample balance coefficients and the sample number balance coefficient by a second step length; wherein the first step size is larger than the second step size.
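A rough sketch of a loss weighted by the two balance coefficients together with the step-based coefficient update; the exact loss formulation is not given in the text, so the weighting below is purely an assumption:

```python
import torch
import torch.nn.functional as F

def balanced_loss(logits, labels, alpha, beta):
    """Cross-entropy weighted by a positive/negative balance coefficient (alpha) and a
    sample-number balance coefficient (beta)."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    class_weight = torch.where(labels == 1,
                               torch.full_like(ce, alpha),
                               torch.full_like(ce, 1.0 - alpha))
    return beta * (class_weight * ce).mean()

def update_coefficients(loss_value, alpha, beta, threshold=0.5,
                        first_step=0.05, second_step=0.01):
    """Raise both coefficients by the larger first step when the loss exceeds the
    threshold, otherwise by the smaller second step."""
    step = first_step if loss_value > threshold else second_step
    return alpha + step, beta + step
```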
Further comprising: the neck key point detection module is used for determining the actual slope of a straight line determined by the marked neck key point and the reference point in the sample graph; inputting the sample graph into a pre-constructed neck key point detection model, so as to learn by the neck key point detection model, and outputting a neck key point; calculating a loss value by using the neck key points output by the model, the labeled neck key points and the actual slope; and when the loss value is larger than the preset value, optimizing the network parameters of the neck key point detection model according to the loss value, and continuously executing the process of inputting the sample graph into the pre-constructed neck key point detection model until the loss value is lower than the preset value.
The neck key point detection module is used for acquiring a data set, and each sample image in the data set comprises a user head portrait; for each sample graph in the data set, locating a neck region in the sample graph; detecting a middle point of a contact edge of the neck area and the clothes to determine as a reference point; and marking a neck key point on the sample graph, and determining the actual slope of a straight line determined by the marked neck key point and the reference point.
The neck key point detection module is used for determining a straight line passing through a reference point by utilizing a preset slope; marking an intersection point between the straight line and the edge of the neck area on the sample graph as a neck key point; horizontally turning the straight line, and marking an intersection point between the turned straight line and the edge of the neck region as another neck key point on the sample graph; and fine-tuning neck key points marked on the sample graph.
The neck key point detection module is used for inputting the sample image into a preset segmentation model so as to perform semantic segmentation on the sample image by the segmentation model; and determining a region formed by the pixels of which the semantic segmentation result is a neck as a neck region.
The neck key point detection module is used for acquiring the position error between the neck key points output by the model and the labeled neck key points; determining a loss weight according to the position error and the actual slope; determining the Euclidean distance between sample graph vector information carrying the neck key points output by the model and sample graph vector information carrying the labeled neck key points; and calculating a loss value by using the loss weight and the Euclidean distance.
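A minimal sketch of such a slope-aware loss; only the overall structure (a weight derived from the position error and the actual slope, multiplying a Euclidean distance) follows the description, while the concrete weighting formula is an assumption:

```python
import numpy as np

def keypoint_loss(pred_pts, gt_pts, actual_slope):
    """Weight (from position error and slope) times the Euclidean distance between the
    predicted and labeled keypoint vectors."""
    pred = np.asarray(pred_pts, dtype=np.float32).ravel()
    gt = np.asarray(gt_pts, dtype=np.float32).ravel()
    position_error = np.abs(pred - gt).mean()
    weight = 1.0 + position_error * abs(actual_slope)   # illustrative weighting
    return weight * np.linalg.norm(pred - gt)           # Euclidean distance
```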
Further comprising: the human body region segmentation module is used for acquiring a plurality of original character images; labeling each human body region in each original character image based on a preset semantic segmentation model and a preset matting model to obtain a training data set; constructing a network structure of a segmentation model based on an attention mechanism; training the network structure of the segmentation model according to the training data set to obtain an image segmentation model for segmenting human body regions; and carrying out human body region segmentation on the user image and the clothes template image through the image segmentation model.
The human body region segmentation module is used for obtaining semantic segmentation results of human body regions of the original character image by adopting a preset semantic segmentation model based on the original character image; correcting a semantic segmentation result corresponding to the original character image through a preset matting model; marking each human body area in the original character image according to the corrected semantic segmentation result; and storing the marked original character image in a training data set.
The human body region segmentation module is used for carrying out matting processing on the original character image through the preset matting model to segment a foreground pixel region and a background pixel region of the original character image; if the matting result indicates that a first pixel point is a background pixel and the semantic segmentation result indicates that the first pixel point is a foreground pixel, correcting the semantic segmentation result of the first pixel point into a background pixel; and if the matting result indicates that the first pixel point is a foreground pixel and the semantic segmentation result indicates that the first pixel point is a background pixel, determining a target pixel point which is closest to the first pixel point and whose semantic segmentation result differs from that of the first pixel point, and taking the semantic segmentation result of the target pixel point as the semantic segmentation result of the first pixel point.
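A minimal sketch of these two correction rules over per-pixel labels (0 denoting background); the nearest-neighbour lookup uses SciPy's Euclidean distance transform, which is an implementation choice rather than something the text prescribes:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def correct_with_matting(seg_labels, matte_fg):
    """Correct per-pixel semantic labels (0 = background) with a binary matting
    foreground mask, following the two rules described above."""
    seg = seg_labels.copy()
    matte = matte_fg.astype(bool)
    # Rule 1: matting background but segmentation foreground -> force background.
    seg[np.logical_and(~matte, seg > 0)] = 0
    # Rule 2: matting foreground but segmentation background -> copy the label of the
    # nearest pixel whose segmentation result differs (i.e. the nearest labeled pixel).
    need_fix = np.logical_and(matte, seg == 0)
    if need_fix.any() and (seg > 0).any():
        _, (iy, ix) = distance_transform_edt(seg == 0, return_indices=True)
        seg[need_fix] = seg[iy[need_fix], ix[need_fix]]
    return seg
```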
The human body region segmentation module is used for calculating an affine transformation matrix used for carrying out a face alignment operation between the original character image and a preset standard image of the preset semantic segmentation model; performing matrix transformation on the original character image based on the affine transformation matrix to obtain a transformed image corresponding to the original character image; and obtaining the semantic segmentation result corresponding to the original character image by adopting the preset semantic segmentation model based on the original character image and the transformed image.
The human body region segmentation module is used for decomposing the affine transformation matrix to obtain a rotation matrix, a translation matrix and a scaling matrix respectively; performing rotation, translation and scaling transformation on the original character image based on the rotation matrix, the translation matrix and the scaling matrix to obtain a first transformed image; and cropping the first transformed image based on the image sizes corresponding to the scaling matrix and to the scaling matrix with a preset magnification, respectively, to obtain a plurality of second transformed images.
The human body region segmentation module is used for respectively obtaining the semantic segmentation results of the human body regions of the original character image and the semantic segmentation results of the human body regions of each second transformed image by adopting the semantic segmentation model based on the original character image and the second transformed images; and correcting the semantic segmentation result of each human body region of the original character image according to the semantic segmentation result of each human body region of each second transformed image to obtain the semantic segmentation result corresponding to the original character image.
The human body region segmentation module is used for determining the confidence of the segmentation results of the original character image and of each second transformed image, wherein the confidence of a segmentation result is inversely proportional to the size of the image; and, for each pixel point of the original character image, determining the confidence of the segmentation result of the original character image and/or of each second transformed image, and taking the semantic segmentation result whose corresponding confidence is highest as the semantic segmentation result of that pixel point.
The human body region segmentation module is used for performing data enhancement on the labeled original character images through a data enhancement library to obtain an enhanced data set; and performing multi-scale random cropping on part of the images in the enhanced data set to obtain the training data set.
The human body region segmentation module is used for inputting the sample images in the training data set into a segmentation model based on an attention mechanism to obtain the segmentation results of the sample images; calculating the integral loss value of the current training period according to the segmentation result and the labeling information of each sample image; and optimizing the segmentation model based on the attention mechanism after each period of training through an optimizer and a learning rate scheduler.
The human body region segmentation module is used for calculating a cross entropy loss value of the current training period through a cross entropy loss function according to the segmentation result and the labeling information of each sample image; calculating a Dice loss value of the current training period through a Dice loss function according to the segmentation result and the labeling information of each sample image; and calculating the integral loss value of the current training period according to the cross entropy loss value and the Dice loss value.
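A minimal sketch of the combined objective, assuming PyTorch, multi-class logits and an illustrative equal weighting of the two terms:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, target, num_classes, dice_weight=0.5, eps=1e-6):
    """Overall loss of one training period: cross-entropy plus Dice loss."""
    ce = F.cross_entropy(logits, target)                       # cross-entropy term
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    dice = 1.0 - ((2 * inter + eps) / (union + eps)).mean()    # Dice term
    return (1.0 - dice_weight) * ce + dice_weight * dice
```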
The virtual reloading device provided by the above embodiments of the application and the virtual reloading method provided by the embodiments of the application are based on the same inventive concept, and have the same beneficial effects as the methods adopted, run, or implemented by the application programs stored in the virtual reloading device.
The embodiment of the application also provides electronic equipment for executing the virtual reloading method. Referring to fig. 26, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 26, the electronic device 8 includes: a processor 800, a memory 801, a bus 802 and a communication interface 803, the processor 800, the communication interface 803 and the memory 801 being connected by the bus 802; the memory 801 stores a computer program that can be executed on the processor 800, and the processor 800 executes the virtual reloading method provided by any of the foregoing embodiments when executing the computer program.
The memory 801 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the apparatus and at least one other network element is realized through at least one communication interface 803 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
Bus 802 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 801 is used for storing a program, and the processor 800 executes the program after receiving an execution instruction, and the virtual reloading method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 800, or implemented by the processor 800.
The processor 800 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 800. The processor 800 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 801, and the processor 800 reads the information in the memory 801 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the virtual reloading method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 27, an embodiment of the present application further provides a computer-readable storage medium, shown as an optical disc 30, on which a computer program (i.e., a program product) is stored; when the computer program is run by a processor, it executes the virtual reloading method provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memories (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiment of the present application and the virtual reloading method provided by the embodiment of the present application are based on the same inventive concept, and have the same beneficial effects as methods adopted, run, or implemented by application programs stored in the computer-readable storage medium.
It should be noted that:
in the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the application and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (17)

1. A virtual reloading method, comprising:
acquiring a user image and a clothes template image to be reloaded;
acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image;
detecting whether a neck in the user image is blocked;
if the neck in the user image is blocked, generating a reloading effect map corresponding to the user image and the clothes template image by using the neck area in the clothes template image according to the neck key point and the clothes key point;
if the neck in the user image is not blocked, generating a reloading effect map corresponding to the user image and the clothes template image by using the neck region in the user image according to the neck key point and the clothes key point;
generating a reloading effect map corresponding to the user image and the clothing template image by using the neck region in the user image, wherein the generation of the reloading effect map comprises the following steps:
repairing a neck area in the user image, and deforming a clothing area in the clothing template image; and performing virtual reloading based on the user image after neck repair and the clothing template image after clothing area deformation.
2. The method according to claim 1, wherein the generating of the reloading effect map corresponding to the user image and the clothing template image by using the neck region in the user image according to the neck key point and the clothing key point comprises:
repairing a neck region in the user image according to the neck key point, the clothes key point, the user image and the clothes template image;
according to the neck key points and the clothes key points, deformation processing is carried out on the clothes area in the clothes template image;
and covering the clothing area of the deformed clothing template image into the repaired user image to obtain the reloading effect map.
3. The method of claim 2, wherein said repairing a neck region in the user image from the neck keypoints, the clothing keypoints, the user image, and the clothing template image comprises:
generating a mask image of a to-be-repaired area of the neck in the user image according to the neck key points, the clothes key points, the user image and the clothes template image;
if the area of the area to be repaired in the mask image of the area to be repaired is larger than 0, generating a background repair image corresponding to the user image;
and repairing the neck area in the user image according to the mask image of the area to be repaired and the background repair image.
4. The method according to claim 3, wherein the generating a mask image of a to-be-repaired area of the neck in the user image according to the neck key point, the clothing key point, the user image and the clothing template image comprises:
acquiring a neck mask image corresponding to a neck region in the user image according to the user image;
acquiring a clothing mask image corresponding to the clothing template image according to the neck key point, the clothing key point and the clothing template image;
splicing the neck mask image and the clothing mask image according to the neck key points in the neck mask image and the clothing key points in the clothing mask image to obtain a spliced mask image;
determining a region to be detected from the edge of the collar of the garment to the upper edge of the neck region in the spliced mask image;
determining a region to be repaired from the region to be detected;
and generating a mask image of the area to be repaired corresponding to the area to be repaired.
5. The method according to claim 3, wherein the generating a background restoration map corresponding to the user image comprises:
extracting a main color of a neck region in the user image;
drawing a pure color background picture corresponding to the main color;
cutting out an image of the head and neck region from the user image;
and covering the image of the head and neck region into the pure-color background image to obtain a background restoration image corresponding to the user image.
6. The method according to claim 1, wherein the generating a reloading effect map corresponding to the user image and the clothing template image by using a neck region in the clothing template image according to the neck key point and the clothing key point comprises:
performing color transfer processing on a neck area in the clothing template image according to the user image;
according to the neck key points, performing deformation processing on a neck area in the clothing template image;
according to the neck key points and the clothes key points, deformation processing is carried out on the clothes area in the clothes template image;
and covering the neck area and the clothing area of the deformed clothing template image into the user image to obtain the reloading effect map.
7. The method according to claim 6, wherein the performing color migration processing on the neck region in the clothing template image according to the user image comprises:
extracting a face skin dominant color and a first neck skin dominant color in the user image;
and adjusting the color of the neck area of the clothing template image according to the face skin main color and the first neck skin main color.
8. The method of claim 7, wherein before adjusting the color of the neck region of the garment template image according to the face skin dominant color and the first neck skin dominant color, further comprising:
extracting a second neck skin dominant color of the clothing template image;
and adjusting the brightness of each pixel point in the neck area of the clothing template image according to the brightness value of the second neck skin main color and the brightness value of the face skin main color.
9. The method according to claim 2 or 6, wherein the deforming the clothing region in the clothing template image according to the neck key point and the clothing key point comprises:
determining a coordinate mapping matrix before and after deformation of a clothing region in the clothing template image according to the neck key point and the clothing key point;
and carrying out deformation processing on the clothing area in the clothing template image according to the coordinate mapping matrix.
10. The method of claim 9, wherein determining the coordinate mapping matrix before and after deformation of the garment region in the garment template image according to the neck keypoint and the garment keypoint comprises:
dividing the width of the user image into a plurality of sections of first abscissa intervals along the horizontal direction according to the abscissa of each neck key point;
dividing the width of the clothes template image into a plurality of sections of second abscissa intervals along the horizontal direction according to the abscissa of each clothes key point, wherein the number of the first abscissa intervals is equal to that of the second abscissa intervals;
and calculating an abscissa mapping matrix corresponding to the clothing region in the clothing template image by utilizing a linear interpolation and a deformation coordinate mapping function according to the plurality of first abscissa intervals and the plurality of second abscissa intervals.
11. The method of claim 9, wherein determining the coordinate mapping matrix before and after deformation of the garment region in the garment template image according to the neck keypoint and the garment keypoint comprises:
calculating a scaling coefficient of a vertical coordinate corresponding to each horizontal coordinate in the clothing area according to the neck key point and the clothing key point;
and calculating a vertical coordinate mapping matrix corresponding to the clothing area by using a deformation coordinate mapping function according to the height of the clothing template image, the vertical coordinate of each coordinate point of the clothing area and the scaling coefficient corresponding to each vertical coordinate.
12. The method according to claim 2 or 6, further comprising, before the step of covering the deformed clothing template image into the user image:
obtaining a classification result of a hair region in the user image;
if the classification result is the long-hair backward drape type, judging whether deformation processing needs to be carried out on the hair area according to the clothes area and the hair area in the clothes template image;
and if so, performing deformation processing on the hair area in the user image.
13. The method of claim 12, wherein said deforming said hair region in said user image comprises:
expanding the face area of the user image towards the vertex direction to obtain a protection area;
determining a region to be treated using the hair region and the protective region;
and carrying out deformation processing on the area to be processed on the user image along the direction opposite to the head top direction to obtain the user image with deformed hair.
14. The method of claim 1, wherein the detecting whether the neck in the user image is occluded comprises:
determining an image to be detected according to the proportion of a target area in the user image;
and inputting the image to be detected into a trained neck shielding detection model, and judging whether the neck in the image to be detected is shielded or not by the neck shielding detection model.
15. The method according to claim 1, wherein before the obtaining the key point of the neck corresponding to the user image, further comprises:
determining the actual slope of a straight line determined by the labeled key point of the neck and the reference point in the sample graph;
inputting the sample graph into a pre-constructed neck key point detection model, so that the neck key point detection model can learn and output a neck key point;
calculating a loss value by using the neck key points output by the model, the labeled neck key points and the actual slope;
and when the loss value is larger than a preset value, optimizing the network parameters of the neck key point detection model according to the loss value, and continuing to execute the process of inputting the sample graph into the pre-constructed neck key point detection model until the loss value is lower than the preset value.
16. The method according to claim 1, wherein before generating the reloading effect map corresponding to the user image and the clothes template image according to the neck key point and the clothes key point, the method further comprises:
acquiring a plurality of original character images;
labeling each human body region in each original character image based on a preset semantic segmentation model and a preset matting model to obtain a training data set;
constructing a network structure of a segmentation model based on an attention mechanism;
training the network structure of the segmentation model according to the training data set to obtain an image segmentation model for human body region segmentation;
and carrying out human body region segmentation on the user image and the clothing template image through the image segmentation model.
17. A virtual reloading apparatus, comprising:
the image acquisition module is used for acquiring a user image and a clothes template image to be changed;
the key point acquisition module is used for acquiring a neck key point corresponding to the user image and a clothes key point corresponding to the clothes template image;
the neck shielding detection module is used for detecting whether the neck in the user image is shielded or not;
a generating module, configured to generate a reloading effect map corresponding to the user image and the clothes template image by using a neck region in the clothes template image according to the neck key point and the clothes key point if the neck shielding detection module detects that the neck in the user image is occluded; and, if the neck shielding detection module detects that the neck in the user image is not occluded, generate a reloading effect map corresponding to the user image and the clothes template image by using the neck region in the user image according to the neck key point and the clothes key point;
under the condition that the neck in the user image is not blocked, the generation module repairs the neck area in the user image and deforms the clothing area in the clothing template image; and performing virtual reloading based on the user image after neck repair and the clothing template image after clothing area deformation.
CN202210049595.4A 2022-01-17 2022-01-17 Virtual reloading method and device Active CN114565508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210049595.4A CN114565508B (en) 2022-01-17 2022-01-17 Virtual reloading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210049595.4A CN114565508B (en) 2022-01-17 2022-01-17 Virtual reloading method and device

Publications (2)

Publication Number Publication Date
CN114565508A CN114565508A (en) 2022-05-31
CN114565508B true CN114565508B (en) 2023-04-18

Family

ID=81712735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210049595.4A Active CN114565508B (en) 2022-01-17 2022-01-17 Virtual reloading method and device

Country Status (1)

Country Link
CN (1) CN114565508B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100228B (en) * 2022-07-25 2022-12-20 江西现代职业技术学院 Image processing method, system, readable storage medium and computer device
CN117635763A (en) * 2023-11-28 2024-03-01 广州像素数据技术股份有限公司 Automatic reloading method, device, equipment and medium based on portrait component analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111787242A (en) * 2019-07-17 2020-10-16 北京京东尚科信息技术有限公司 Method and apparatus for virtual fitting
CN112562034A (en) * 2020-12-25 2021-03-26 咪咕文化科技有限公司 Image generation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052980B (en) * 2021-04-27 2022-10-14 云南大学 Virtual fitting method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111787242A (en) * 2019-07-17 2020-10-16 北京京东尚科信息技术有限公司 Method and apparatus for virtual fitting
CN112562034A (en) * 2020-12-25 2021-03-26 咪咕文化科技有限公司 Image generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114565508A (en) 2022-05-31


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20240510
Address after: 100102 room 1201, 12 / F, building 8, yard 34, Chuangyuan Road, Chaoyang District, Beijing
Patentee after: Beijing new oxygen world wide Technology Consulting Co.,Ltd.
Country or region after: China
Address before: 100102 room 901, 9 / F, room 1001, 10 / F, building 8, yard 34, Chuangyuan Road, Chaoyang District, Beijing
Patentee before: Beijing New Oxygen Technology Co.,Ltd.
Country or region before: China