CN112200818A - Image-based dressing area segmentation and dressing replacement method, device and equipment
- Publication number: CN112200818A (application CN202011105742.2A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T7/11 — Region-based segmentation
- G06T3/04 — Context-preserving transformations, e.g. by using an importance map
- G06T7/143 — Segmentation; edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02P90/30 — Computing systems specially adapted for manufacturing
Abstract
The embodiment of the present application provides an image-based dressing area segmentation and dressing replacement method, apparatus and device, relating to the field of image processing. An original image is input into a dressing segmentation network. A feature down-sampling module of the dressing segmentation network extracts first image features of the original image; the first image features are input into a feature extraction module to obtain feature vectors; the feature vectors are input into a feature up-sampling module and restored to a feature image consistent with the resolution of the original image; and a probability image of the original image is obtained from the feature image. The dressing area of the original image is then segmented according to the probability image. Because the dressing segmentation network is deployed on the client, the technical scheme achieves accurate segmentation of the dressing area in the image on the client, with stable performance and high processing speed. Meanwhile, the target dressing style selected by the user is fused into the dressing area, so the dressing style can be replaced on the client with a small amount of computation and stable operation.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a method, an apparatus, and a device for image-based dressing region segmentation and dressing replacement, and a computer-readable storage medium.
Background
With the rapid development of artificial intelligence, and of deep learning in particular, semantic segmentation has become an important research topic with a wide range of application scenarios. Dressing segmentation divides out the dressing area in an image; relevant post-processing then realizes functions such as changing the color or the style of the clothing.
In the related art, the positions of specific key points of the garment are generally predicted. For the upper-body garment there are six coordinate points: the left collar, right collar, left sleeve, right sleeve, left lower hem and right lower hem. For the lower body there are four coordinate points: the left waistline, right waistline, left lower hem and right lower hem. For the whole body there are eight coordinate points: the left collar, right collar, left sleeves, right sleeves, left waistline, right waistline, left lower hem and right lower hem. The user's dressing area is determined from these key points. However, the identified key points can only segment the dressing area roughly; special situations such as the outline of the clothes and occlusion in the middle of the clothes cannot be segmented accurately, so the segmentation precision is low and the segmentation effect is poor.
Disclosure of Invention
The object of the present application is to solve at least one of the above technical drawbacks, in particular the problem of low segmentation accuracy.
In a first aspect, an embodiment of the present application provides a method for segmenting a dressing area based on an image, including the following steps:
acquiring an original image to be segmented;
inputting the original image into a clothing segmentation network, extracting first image features of the original image by using a feature down-sampling module of the clothing segmentation network, inputting the first image features into a feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the clothing segmentation network for processing, restoring to obtain a feature image consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and segmenting the dressing area of the original image according to the probability image.
In one embodiment, the step of inputting the first image feature into the feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image includes:
inputting the first image feature into a semantic feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a semantic feature vector of the original image;
inputting the first image feature into a spatial feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a spatial feature vector of the original image; wherein the receptive field of the semantic feature vector is larger than the receptive field of the spatial feature vector;
and adding the semantic feature vector and the spatial feature vector pixel by pixel to obtain a total feature vector.
In one embodiment, the step of inputting the feature vector to a feature upsampling module of the clothing segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes:
inputting the total feature vector into a feature up-sampling module of the dressing segmentation network to perform network operation layer by layer, and extracting to obtain a second image feature;
and restoring according to the second image characteristic to obtain a characteristic image which is consistent with the resolution of the input image to be segmented.
In one embodiment, the feature downsampling module comprises a plurality of first convolution units, each first convolution unit being formed by stacking a first conventional convolution, batch normalization, a ReLU activation function, a second conventional convolution, batch normalization and a ReLU activation function;
the semantic feature extraction branch comprises ten first convolution modules; wherein the convolution step sizes of the first, third and seventh convolution modules are 2;
the spatial feature extraction branch comprises two stacked second convolution modules, wherein the convolution step size of each second convolution module is set to 1, the convolution layers of the second convolution modules adopt channel-separable convolution, and each second convolution module is formed by stacking a grouped convolution, batch normalization, a ReLU activation function, a 1 x 1 conventional convolution, batch normalization and a ReLU activation function;
the feature upsampling convolution block is formed by stacking a bilinear interpolation function, a conventional convolution, batch normalization and a ReLU activation function.
In one embodiment, the step of obtaining the probability image of the original image according to the feature image comprises:
converting the characteristic image into a probability image corresponding to the characteristic image by using a Sigmoid function; wherein the characteristic image is a single-channel image with the same size as the original image.
In one embodiment, the process of training the clothing segmentation network further includes:
classifying the original image by using a global pooling and fully-connected layer of the dressing segmentation network, and judging, according to the classification result output by the global pooling and fully-connected layer, whether the original image contains a dressing area; wherein the global pooling and fully-connected layer is disposed in the penultimate convolution module of the semantic feature extraction branch of the dressing segmentation network.
In one embodiment, the method for segmenting the dressing area of the image further comprises the following steps:
and training the dressing segmentation network through cross entropy loss function constraint according to the classification result.
In a second aspect, an embodiment of the present application provides an image-based dressing replacement method, including:
acquiring a target dressing style selected by a user;
determining a dressing area of an original image, wherein the dressing area is determined by inputting the original image into a dressing segmentation network, extracting a first image feature of the original image by using a feature down-sampling module of the dressing segmentation network, inputting the first image feature into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the dressing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and fusing the target dressing style to the dressing area to generate a target image.
In a third aspect, an embodiment of the present application provides an image-based dressing area segmentation apparatus, including:
the original image acquisition module is used for acquiring an original image to be segmented;
a probability image output module, configured to input the original image into a clothing segmentation network, extract a first image feature of the original image by using a feature down-sampling module of the clothing segmentation network, input the first image feature into a feature extraction module of the clothing segmentation network to obtain a feature vector with different receptive fields in the original image, input the feature vector into a feature up-sampling module of the clothing segmentation network for processing, restore to obtain a feature image that is consistent with a resolution of the input original image, and obtain a probability image of the original image according to the feature image;
and the dressing area segmentation module is used for receiving the probability image and segmenting the dressing area of the original image according to the probability image.
In a fourth aspect, an embodiment of the present application provides an image-based dressing replacement apparatus, including:
the dressing style acquisition module is used for acquiring a target dressing style selected by a user;
the system comprises a dressing area determining module, a dressing area determining module and a dressing area determining module, wherein the dressing area is determined by inputting an original image into a dressing segmentation network, extracting a first image feature of the original image by using a feature down-sampling module of the dressing segmentation network, inputting the first image feature into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the dressing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with the resolution of the input original image and obtaining a probability image of the original image according to the feature image;
and the target image generation module is used for fusing the target dressing style to the dressing area to generate a target image.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the image-based dressing region segmentation method according to the first aspect or the dressing replacement method according to the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the image-based dressing region segmentation method according to the first aspect or the dressing replacement method according to the second aspect.
In the method, apparatus and device for image-based dressing area segmentation and dressing replacement provided by the above embodiments, an original image is input into a dressing segmentation network; a feature down-sampling module of the dressing segmentation network extracts first image features of the original image; the first image features are input into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image; the feature vectors are input into a feature up-sampling module of the dressing segmentation network for processing and restored to a feature image whose resolution is consistent with that of the input original image; a probability image of the original image is obtained from the feature image; and the dressing area of the original image is segmented according to the probability image. The dressing segmentation network is a pre-trained convolutional neural network deployed on the client, which can accurately segment the dressing area in the image on the client with stable performance and high processing speed. Meanwhile, the target dressing style selected by the user is fused into the dressing area to generate the target image, so the dressing style can be replaced on the client with a small amount of computation and stable operation, meeting the user's requirements.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of an application system framework involved in an image-based dressing region segmentation process according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for segmenting a dressing area based on an image according to an embodiment of the present application;
FIG. 3 is a flow chart of an image-based dressing replacement method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an implementation of an image-based dressing replacement provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image-based dressing area segmentation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image-based dressing replacement device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The following describes an application scenario related to an embodiment of the present application.
The embodiments of the application apply to scenarios in which the effect of the clothing in an image is transformed. Specifically, the clothing area in the image is identified and segmented, and the original clothing is replaced with a target clothing style through clothing style transformation, color transformation and the like.
For example, a client deploys a clothing segmentation network for identifying and segmenting a clothing region of an image, determines the clothing region through the clothing segmentation network, acquires a target clothing style selected by a user, replaces an original clothing in the image with the target clothing style, and fuses the target clothing style into the original image for display, so that the display effect of the image is improved, and the image is more attractive.
In this application scenario, the dressing segmentation network needs to run on the client and must segment the dressing area accurately, so that the image display effect can be improved. Of course, the technical solution provided in the embodiments of the present application may also be applied to other scenarios, which are not listed here.
In order to better explain the technical solution of the present application, a certain application environment to which the present solution can be applied is shown below. Fig. 1 is a schematic diagram of an application system framework involved in an image-based dressing area segmentation process provided in an embodiment of the present application, and as shown in fig. 1, the application system 10 includes a client 101 and a server 102, and a communication connection is established between the client 101 and the server 102 through a wired network or a wireless network.
The client 101 may be a portable device such as a smart phone, a smart camera, a palm computer, a tablet computer, an electronic book, and a notebook computer, which is not limited to the above, and may have functions such as photographing and image processing, so as to implement image-based clothing region segmentation and special effect processing. Optionally, the client 101 has a touch screen, the client acquires a related original image, and a user may perform corresponding operations on the touch screen of the client 101 to implement the functions of dressing segmentation, image processing, special effect synthesis, and the like on the original image, and then obtain a target image, and upload the target image to the server 102, so as to issue the target image to other clients through the server 102 for display.
The server 102 includes a background server for providing background services for electronic devices, such as the client 101, and may be implemented by a stand-alone server or a server cluster composed of a plurality of servers. In one embodiment, the server may be an image sharing platform. After the user shoots the original image, the original image is correspondingly processed, such as dressing transformation, character beautification, background replacement and the like, the processed target image is uploaded to the server 102, and then the server 102 pushes the target image to other clients, so that other users can see the manufactured target image of the user.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
In the related image processing technology, a user can set related parameters at a client to identify a designated area, such as a dressing area, in an image through a server with better performance, and after the dressing area is subjected to style transformation according to the parameters set by the user, a processed image is generated and then pushed to each client.
In this process, the client and the server must operate in a networked state, and the neural network model is trained by relying on the high performance of the server. Because of its large computational load, the deep-learning-based neural network model of the related art cannot be deployed on the client, so image processing based on the neural network model cannot be realized in an offline state. Even though some improved dressing segmentation methods can be executed on the client, such image segmentation methods have low accuracy, resulting in a poor segmentation effect and degrading the image processing result.
In some related art, clothing region segmentation is performed by predicting the positions of clothing-related key points in an image, for example with a deep-learning clothing segmentation network. Generally, a clothing segmentation network in the related art includes a feature extraction module, a clothing semantic information extraction module, and a clothing segmentation prediction module. The semantic information extraction module comprises a convolution layer and two fully-connected layers, which respectively predict the position information of the clothing key points and the visibility of the key points; the clothing segmentation prediction module comprises a fully-connected layer, a softmax module and a regression module. The softmax module outputs the class probability, and the regression module outputs the specific key point positions of the upper garment, the lower garment and the whole-body garment. Because this clothing segmentation network is divided into several stages and uses a large number of fully-connected layers, its computational load is very large, making it difficult to deploy and run on a mobile phone. Meanwhile, defining the dressing area from key points can only segment it roughly; the outline of the clothes cannot be segmented accurately, and special conditions such as occlusion in the middle of the clothes exist, so the segmentation precision is low and the segmentation effect is poor.
The application provides a dressing area segmentation and dressing replacement method, device, equipment and computer readable storage medium based on images, and aims to solve the technical problems in the prior art.
Fig. 2 is a flowchart of an image-based clothing region segmentation method according to an embodiment of the present application, which is applicable to an image-based clothing region segmentation apparatus, such as a client. The following description will be given taking a mobile terminal as an example.
And S210, acquiring an original image to be segmented.
In this embodiment, the original image may be obtained by shooting with a device such as a mobile phone camera, or obtained from a local storage device or an external storage device, or downloaded from a server. Optionally, the original image may be any image, and the original image may include the image of the clothing feature, or may be an image that does not include the clothing feature.
In one embodiment, original images containing dressing features may be preliminarily screened out by preprocessing the original images, for example by coarse screening, so as to reduce the data processing load of the subsequent dressing region segmentation. In another embodiment, if an original image containing no dressing feature is acquired, it may be identified as such in subsequent processing and deleted, or the subsequent dressing segmentation and special effect processing may simply be skipped.
S220, inputting the original image into a clothing segmentation network, extracting first image features of the original image by using a feature down-sampling module of the clothing segmentation network, inputting the first image features into a feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the clothing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with that of the input original image, and obtaining a probability image of the original image according to the feature image.
In a convolutional neural network, the receptive field is defined as the size of the region of the input image to which a pixel on the feature image (feature map) output by each layer of the network is mapped. In plain terms, a point on the feature image corresponds to an area on the input image.
In the embodiment, the shallow feature of the original image is calculated, the feature vectors of different receptive fields in the shallow feature are determined, and the semantic information and the spatial information of the original image are determined, so that the accuracy of pixel classification and the edge segmentation precision of the original image are improved.
In this embodiment, the clothing segmentation network includes a feature downsampling module, a feature extraction module, and a feature upsampling module. The feature downsampling module includes a plurality of first convolution units, e.g., three stacked first convolution units, each first convolution unit being formed by stacking a first conventional convolution, batch normalization, a ReLU activation function, a second conventional convolution, batch normalization and a ReLU activation function. The step size of the first conventional convolution is set to 2 to achieve downsampling, and the second conventional convolution doubles the number of output channels.
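For illustration only, the following is a minimal PyTorch sketch of one such first convolution unit. The 3×3 kernel size and padding are not specified in this application and are assumptions; only the stride-2 downsampling and the channel doubling follow the description above.

```python
import torch
import torch.nn as nn

class FirstConvUnit(nn.Module):
    """One first convolution unit of the feature downsampling module:
    Conv(stride 2) + BN + ReLU, then Conv(double channels) + BN + ReLU."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            # first conventional convolution, stride 2 halves the resolution
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # second conventional convolution doubles the channel count
            nn.Conv2d(in_ch, in_ch * 2, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(in_ch * 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```

Stacking three such units, as the embodiment suggests, would reduce the resolution to 1/8 of the input while multiplying the channel count by 8.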
The semantic feature extraction branch comprises ten first convolution modules; the convolution step length of the first convolution module, the convolution step length of the third convolution module and the convolution step length of the seventh convolution module are 2, so that the receptive field of the extracted semantic features is enlarged, and the extracted image features have richer semantic information. Since three convolution modules with step size 2 are included, the resolution of the feature image output by the final feature extraction module is 1/8 of the input feature image.
The spatial feature extraction branch comprises two stacked second convolution modules, where the convolution step size of each second convolution module is set to 1 so that the resolution of the output feature image is not reduced and the feature image output by the spatial feature extraction branch has richer spatial information. The convolution layers of the second convolution modules adopt channel-separable convolution, and each second convolution module is formed by stacking a grouped convolution, batch normalization, a ReLU activation function, a 1 × 1 conventional convolution, batch normalization and a ReLU activation function.
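A sketch of one second convolution module follows, assuming that "channel-separable convolution" here denotes a depthwise 3×3 convolution (a grouped convolution with one group per channel) followed by the 1 × 1 conventional convolution; the kernel size of the grouped convolution is an assumption.

```python
import torch
import torch.nn as nn

class SecondConvModule(nn.Module):
    """One second convolution module of the spatial feature extraction branch:
    grouped conv + BN + ReLU + 1x1 conventional conv + BN + ReLU, all stride 1."""

    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            # grouped (depthwise) convolution; stride 1 keeps the resolution
            nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                      padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # 1x1 conventional convolution mixes information across channels
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```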
The feature upsampling convolution block is formed by stacking a bilinear interpolation function, a conventional convolution, batch normalization and a ReLU activation function. The feature upsampling module comprises three such upsampling convolution blocks, so that the resolution of the output feature image is gradually restored to be consistent with the resolution of the input original image, and the number of channels of the output feature image is 1.
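A corresponding sketch of one upsampling convolution block, again with an assumed 3×3 kernel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleBlock(nn.Module):
    """One feature upsampling convolution block: bilinear interpolation +
    conventional conv + BN + ReLU; three stacked blocks restore the resolution."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # double the resolution with bilinear interpolation, then convolve
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.conv(x)
```

The last block would set out_ch to 1 to produce the single-channel feature image described below.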
In an embodiment, the step S220 of inputting the first image feature into the feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image includes the following steps:
s2201, inputting the first image feature into a semantic feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a semantic feature vector of the original image.
S2202, inputting the first image feature to a spatial feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a spatial feature vector of the original image;
wherein the receptive field of the semantic feature vector is larger than the receptive field of the spatial feature vector.
S2203, adding the semantic feature vectors and the space feature vectors pixel by pixel to obtain a total feature vector.
In this embodiment, the feature extraction module of the dressing segmentation network comprises a network main branch and the semantic feature extraction branch and spatial feature extraction branch into which the network main branch splits. The original image input into the clothing segmentation network is processed by a three-layer convolution unit of the network main branch of the feature extraction module (Conv + BN + ReLU, where each convolution layer has a step size of 2 and doubles the number of channels), which extracts the first image features of the original image.
Furthermore, the network main branch is decomposed into a semantic feature extraction branch and a spatial feature extraction branch, the first image features extracted by the network main branch are analyzed and processed respectively, and feature vectors of different receptive fields are output, wherein the feature vectors learned by the semantic feature extraction branch, namely the semantic feature vectors, have larger receptive fields and richer semantic information, and are beneficial to improving the accuracy of pixel classification in the original image. The characteristic vector learned by the spatial characteristic extraction branch, namely the receptive field of the spatial characteristic vector is small, and the spatial characteristic vector has richer spatial information, thereby being beneficial to improving the segmentation precision of different regions in the original image.
In this embodiment, the semantic feature vector output by the semantic feature extraction branch and the spatial feature vector output by the spatial feature extraction branch are added pixel by pixel to obtain a total feature vector.
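The application does not spell out how the two branch outputs are brought to a common shape before the pixel-by-pixel addition; the sketch below assumes the lower-resolution semantic feature vector is bilinearly resized to the spatial branch's resolution first, and that the channel counts already match.

```python
import torch
import torch.nn.functional as F

def fuse_branch_features(semantic_feat: torch.Tensor,
                         spatial_feat: torch.Tensor) -> torch.Tensor:
    """Pixel-by-pixel addition of the semantic and spatial feature vectors."""
    if semantic_feat.shape[-2:] != spatial_feat.shape[-2:]:
        # assumed step: resize the semantic features to the spatial resolution
        semantic_feat = F.interpolate(semantic_feat, size=spatial_feat.shape[-2:],
                                      mode="bilinear", align_corners=False)
    return semantic_feat + spatial_feat  # total feature vector
```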
In an embodiment, the step S220 of inputting the feature vector to a feature upsampling module of the clothing segmentation network for processing, and the step of restoring to obtain a first feature image consistent with the resolution of the input original image includes:
and S2204, inputting the total feature vector to a feature upsampling module of the dressing segmentation network to perform network operation layer by layer, and extracting to obtain a second image feature.
In this embodiment, the total feature vector is input to the feature upsampling module of the clothing segmentation network, and the resolution of the output second image feature is gradually increased through the processing of the plurality of upsampling convolution modules of the feature upsampling module until the resolution of the output second image feature is consistent with the resolution of the input original image.
In this embodiment, the upsampling convolution modules double the resolution of the second image feature layer by layer using bilinear interpolation, so that when the resolution of the second image feature output by the last upsampling convolution module is consistent with the resolution of the input original image, the feature image corresponding to the original image is obtained.
The characteristic image has the same image resolution as the input original image, and is a single-channel image. In a single-channel image, commonly referred to as a gray-scale image, each pixel point can only have one value representing color, the pixel value of the single-channel image is between 0 and 255, 0 is black, 255 is white, and the intermediate value is gray of different levels.
The method comprises the steps of inputting an original image into a dressing segmentation network, carrying out semantic segmentation processing on the original image, and outputting a feature image corresponding to the original image, wherein the resolution of the feature image is consistent with that of the input original image and is a single-channel image, namely the pixel value range of the feature image is 0-255.
Further, the step of obtaining the probability image of the original image according to the feature image in step S220 includes:
s2205, converting the characteristic image into a probability image corresponding to the characteristic image by using a Sigmoid function.
The value range of the Sigmoid function is between 0 and 1, and the Sigmoid function has very good symmetry. In the embodiment, the feature image with the pixel value range of 0-255 is converted by using a Sigmoid function, and the value range of the probability image corresponding to the output feature image is 0-1.
The Sigmoid function is:

f(x) = 1 / (1 + e^(-x))

where x is the pixel value of the feature image, and f(x) is the value of the corresponding pixel of the probability image.
And S230, segmenting the dressing area of the original image according to the probability image.
After performing semantic segmentation on the original image, the dressing segmentation network outputs the corresponding probability image. The value of each pixel in the probability image lies in the range 0-1, and the dressing segmentation area is determined from these pixel values; for example, when the probability image is rendered as a grayscale image, a pixel value of 255 is white, and the white area is the dressing area.
In the present embodiment, the dressing area of the original image is segmented according to the probability image output by the dressing segmentation network. In other embodiments, the probability image may be further processed, such as guided filtering, to optimize the segmentation accuracy of the clothing region, or inter-frame mean smoothing to obtain a more accurate probability image, so as to further improve the segmentation effect of the clothing region of the image.
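As a minimal illustration of this step, a binary dressing mask can be taken from the probability image by thresholding; the 0.5 threshold below is an assumption, and the optional guided filtering or inter-frame smoothing mentioned above would be applied before it.

```python
import torch

def segment_dressing_area(prob_image: torch.Tensor,
                          threshold: float = 0.5) -> torch.Tensor:
    """Binary mask of the dressing area: 1 where the per-pixel probability
    exceeds the threshold, 0 elsewhere."""
    return (prob_image > threshold).float()
```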
In the image-based dressing area segmentation method described above, the original image is acquired and input into a clothing segmentation network; the feature down-sampling module of the clothing segmentation network extracts first image features of the original image; the first image features are input into the feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image; the feature vectors are input into the feature up-sampling module of the clothing segmentation network for processing and restored to a feature image consistent with the resolution of the input original image; the probability image of the original image is obtained from the feature image; and the clothing region of the original image is segmented according to the probability image. The clothing segmentation network is a pre-trained convolutional neural network deployed on the client, which can accurately segment the clothing region in the image on the client with stable performance and high processing speed, meeting users' requirements.
It should be noted that implementations of dressing segmentation in the related art, such as processing based on key point identification, generally involve a large amount of computation. If a neural network of the related art, such as a Unet network, is transplanted directly to a client, dressing segmentation runs slowly, and the network may even hang and stop working, causing the client to freeze. On the other hand, owing to characteristics of the clothing, such as hollowed-out gaps, recognition of the clothing region is prone to error and the segmentation accuracy is low. The present scheme performs dressing segmentation of the image with a lightweight dressing segmentation network that can be deployed on a client and segments the dressing area accurately in real time, avoiding stutter while guaranteeing segmentation precision.
In an embodiment, the method for segmenting a dressing area based on an image according to the present disclosure further includes: the clothing segmentation network is trained.
Optionally, in the process of training the clothing segmentation network, the original image is classified by a global pooling layer and a fully-connected layer of the clothing segmentation network, and whether the original image has a clothing region is judged according to the classification result output by the global pooling and fully-connected layer; the global pooling and fully-connected layer is disposed in the penultimate convolution module of the semantic feature extraction branch of the dressing segmentation network.
In this embodiment, a global pooling + fully-connected layer is added to the penultimate convolution module of the semantic feature extraction branch of the clothing segmentation network and is used to classify whether a clothing region exists in the original image: if the classification result output by the global pooling + fully-connected layer is the value 0, the clothing region does not exist in the original image, and if the classification result is the value 1, the clothing region exists in the original image.
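A sketch of this training-time head, assuming the penultimate convolution module outputs in_ch channels and that the fully-connected layer produces the two-dimensional vector (p_t, p_f) used in the loss below:

```python
import torch
import torch.nn as nn

class PresenceHead(nn.Module):
    """Global pooling + fully-connected layer used only during training to
    classify whether a clothing region is present in the original image."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling
        self.fc = nn.Linear(in_ch, 2)        # two outputs: present / absent

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        pooled = self.pool(feat).flatten(1)  # (N, in_ch)
        return self.fc(pooled)               # logits for (p_t, p_f)
```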
In one embodiment, the training process of the dressing segmentation network is constrained by a cross-entropy loss function according to the classification result, and the cross-entropy loss function is calculated as:

Loss = -[y · log(p_t) + (1 - y) · log(p_f)]

where (p_t, p_f) is the two-dimensional vector output by the clothing segmentation network, representing the predicted probabilities that a dressing area is present and absent in the original image, and y ∈ {0, 1} is the ground-truth value indicating whether the input original image contains a dressing area. Constraining training with the cross-entropy loss function optimizes the background classification of original images and reduces false detections on images that have no dressing area, so that the dressing segmentation network predicts the dressing area more accurately and stably.
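Expressed in PyTorch under the same assumptions, with the logits from the head above soft-maxed into (p_t, p_f):

```python
import torch

def presence_loss(logits: torch.Tensor, y: torch.Tensor,
                  eps: float = 1e-8) -> torch.Tensor:
    """Cross-entropy between predicted presence probabilities and label y
    (1 = dressing area present, 0 = absent); eps guards the logarithm."""
    p = torch.softmax(logits, dim=1)
    p_t, p_f = p[:, 0], p[:, 1]
    return -(y * torch.log(p_t + eps)
             + (1.0 - y) * torch.log(p_f + eps)).mean()
```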
In this embodiment, the global pooling + fully-connected layer added to the penultimate convolution module of the semantic feature extraction branch of the clothing segmentation network performs the step of classifying whether the original image has a clothing region only in the training stage; it is not evaluated in the prediction stage, so it adds no extra computation at inference time.
Fig. 3 is a flowchart of an image-based dressing replacement method according to an embodiment of the present application, where the image-based dressing replacement method is applicable to a dressing replacement device, such as a client.
Specifically, as shown in fig. 3, the image-based dressing replacement method may include the following steps:
and S410, acquiring the target dressing style selected by the user.
In this embodiment, the client downloads the dressing style materials locally in advance. The user pops up a panel of dressing style materials by clicking a relevant button on the display interface of the client, and different dressing style materials are displayed in the panel; these may be dressing style materials pre-configured by the system or dressing style materials customized by the user.
The user selects one or more of the dress style materials on the panel as a target dress style for subsequent processing.
And S420, determining the dressing area of the original image.
The dressing area is determined by inputting the original image into a dressing segmentation network, extracting first image features of the original image by using a feature down-sampling module of the dressing segmentation network, inputting the first image features into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the dressing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with that of the input original image, and obtaining a probability image of the original image according to the feature image.
In this embodiment, the original image is input to a clothing segmentation network, feature vectors with different receptive fields in the original image are obtained by using the clothing segmentation network, and a probability image of the original image is output according to the feature vectors.
In a convolutional neural network, the receptive field is defined as the size of the region of the input image to which a pixel on the feature image (feature map) output by each layer of the network is mapped. In plain terms, a point on the feature image corresponds to an area on the input image.
In the embodiment, the shallow feature of the original image is calculated, the feature vectors of different receptive fields in the shallow feature are determined, and the semantic information and the spatial information of the original image are determined, so that the accuracy of pixel classification and the edge segmentation precision of the original image are improved.
In this embodiment, the clothing segmentation network includes a feature downsampling module, a feature extraction module, and a feature upsampling module. The feature downsampling module includes a plurality of first convolution units, e.g., three stacked first convolution units, each of the first convolution units including a plurality of stacked first conventional convolutions, batch normalization, ReLU activation function, second conventional convolutions, batch normalization, and ReLU activation. Wherein the step size of the first conventional convolution is set to 2 to achieve downsampling and the second conventional convolution achieves doubling of the number of output channels.
The semantic feature extraction branch comprises ten first convolution modules; the convolution step length of the first convolution module, the convolution step length of the third convolution module and the convolution step length of the seventh convolution module are 2, so that the receptive field of the extracted semantic features is enlarged, and the extracted image features have richer semantic information. Since three convolution modules with step size 2 are included, the resolution of the feature image output by the final feature extraction module is 1/8 of the input feature image.
The spatial feature extraction branch comprises two stacked second convolution modules, where the convolution step size of each second convolution module is set to 1 so that the resolution of the output feature image is not reduced and the feature image output by the spatial feature extraction branch has richer spatial information. The convolution layers of the second convolution modules adopt channel-separable convolution, and each second convolution module is formed by stacking a grouped convolution, batch normalization, a ReLU activation function, a 1 × 1 conventional convolution, batch normalization and a ReLU activation function.
The feature upsampling convolution block is formed by stacking a bilinear interpolation function, a conventional convolution, batch normalization and a ReLU activation function. The feature upsampling module comprises three such upsampling convolution blocks, so that the resolution of the output feature image is gradually restored to be consistent with the resolution of the input original image, and the number of channels of the output feature image is 1.
In this embodiment, the feature extraction module of the dressing segmentation network comprises a network main branch and the semantic feature extraction branch and spatial feature extraction branch into which the network main branch splits. The original image input into the clothing segmentation network is processed by a three-layer convolution unit of the network main branch of the feature extraction module (Conv + BN + ReLU, where each convolution layer has a step size of 2 and doubles the number of channels), which extracts the first image features of the original image.
In an embodiment, the first image feature is input to a semantic feature extraction branch of a feature extraction module of a clothing segmentation network for processing, so as to obtain a semantic feature vector of the original image. Inputting the first image characteristic into a spatial characteristic extraction branch of a characteristic extraction module of a dressing segmentation network for processing to obtain a spatial characteristic vector of the original image; wherein the receptive field of the semantic feature vector is larger than the receptive field of the spatial feature vector. And adding the semantic feature vector and the spatial feature vector pixel by pixel to obtain a total feature vector.
In this embodiment, the network main branch is decomposed into a semantic feature extraction branch and a spatial feature extraction branch, which respectively analyze and process the first image features extracted by the network main branch and output feature vectors of different receptive fields, where the feature vectors learned by the semantic feature extraction branch, i.e., the semantic feature vectors, have a larger receptive field and richer semantic information, and are beneficial to improving the accuracy of pixel classification in the original image. The characteristic vector learned by the spatial characteristic extraction branch, namely the receptive field of the spatial characteristic vector is small, and the spatial characteristic vector has richer spatial information, thereby being beneficial to improving the segmentation precision of different regions in the original image.
In this embodiment, the semantic feature vector output by the semantic feature extraction branch and the spatial feature vector output by the spatial feature extraction branch are added pixel by pixel to obtain a total feature vector.
In one embodiment, the total feature vector is input to the feature upsampling module of the clothing segmentation network to perform network operation layer by layer, and the second image feature is extracted.
In this embodiment, the total feature vector is input to the feature upsampling module of the clothing segmentation network, and the resolution of the output second image feature is gradually increased through the processing of the plurality of upsampling convolution modules of the feature upsampling module until the resolution of the output second image feature is consistent with the resolution of the input original image.
In this embodiment, the upsampling convolution modules double the resolution of the second image feature layer by layer using bilinear interpolation, so that when the resolution of the second image feature output by the last upsampling convolution module is consistent with the resolution of the input original image, the feature image corresponding to the original image is obtained.
The characteristic image has the same image resolution as the input original image, and is a single-channel image. In a single-channel image, commonly referred to as a gray-scale image, each pixel point can only have one value representing color, the pixel value of the single-channel image is between 0 and 255, 0 is black, 255 is white, and the intermediate value is gray of different levels.
The method comprises the steps of inputting an original image into a dressing segmentation network, carrying out semantic segmentation processing on the original image, and outputting a feature image corresponding to the original image, wherein the resolution of the feature image is consistent with that of the input original image and is a single-channel image, namely the pixel value range of the feature image is 0-255.
Further, the feature image is converted into a probability image corresponding to the feature image by using a Sigmoid function.
The value range of the Sigmoid function is between 0 and 1, and the Sigmoid function has very good symmetry. In the embodiment, the feature image with the pixel value range of 0-255 is converted by using a Sigmoid function, and the value range of the probability image corresponding to the output feature image is 0-1.
The Sigmoid function is:

f(x) = 1 / (1 + e^(-x))

where x is the pixel value of the feature image, and f(x) is the value of the corresponding pixel of the probability image.
After performing semantic segmentation on the original image, the dressing segmentation network outputs the corresponding probability image. The value of each pixel in the probability image lies in the range 0-1, and the dressing segmentation area is determined from these pixel values; for example, when the probability image is rendered as a grayscale image, a pixel value of 255 is white, and the white area is the dressing area.
In the present embodiment, the dressing area of the original image is segmented according to the probability image output by the dressing segmentation network. In other embodiments, the probability image may be further processed, for example by guided filtering to optimize the segmentation accuracy of the clothing region, or by inter-frame mean smoothing to obtain a more accurate probability image, so as to further improve the segmentation effect of the clothing region of the image.
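A minimal sketch of deriving a binary dressing mask from the probability image; the 0.5 threshold is an assumption, and the optional refinements mentioned above would be applied to the probability image before this step.

```python
import torch

probability_image = torch.rand(1, 1, 512, 512)  # output of the sigmoid step (assumed size)

# Guided filtering or inter-frame averaging could refine probability_image here.
dressing_mask = probability_image > 0.5          # boolean mask of the dressing area
```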
And S430, fusing the target dressing style to the dressing area to generate a target image.
The target dressing style is overlaid on the dressing area of the original image; alternatively, the original dressing of the original image is removed and replaced with the target dressing style selected by the user, which is fused into the original image to obtain the target image, for example, changing a white T-shirt into a red T-shirt, or a checked shirt into a solid-color shirt.
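One way the fusion could be realized is alpha blending with the probability map as the blending weight; this is a hypothetical sketch, not the specific fusion used by this disclosure.

```python
import numpy as np

def replace_dressing(original: np.ndarray, target_style: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Blend a target dressing style into the dressing area.

    original, target_style: H x W x 3 float arrays in [0, 1].
    mask: H x W probability map in [0, 1] from the segmentation network.
    """
    alpha = mask[..., None]  # broadcast the mask over the color channels
    return alpha * target_style + (1.0 - alpha) * original

# Example: recolor the dressing area red.
original = np.random.rand(512, 512, 3)
red = np.zeros_like(original)
red[..., 0] = 1.0
mask = np.random.rand(512, 512)
target_image = replace_dressing(original, red, mask)
```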
In the image-based dressing replacement method provided by this embodiment, the target dressing style selected by the user is obtained; semantic segmentation is performed on the original image by a pre-trained lightweight dressing segmentation network to obtain a probability image corresponding to the original image; the dressing area of the original image is determined and segmented according to the probability image; and the target dressing style is fused into the dressing area to generate the target image. This achieves accurate segmentation of the dressing area and replacement of the dressing style on the client, with a small calculation amount and stable operation, meeting users' requirements.
In order to more clearly illustrate the present solution, an implementation process of the present solution is exemplarily described below with reference to fig. 4. Fig. 4 is a schematic diagram of an implementation of an image-based dressing replacement according to an embodiment of the present application.
As shown in fig. 4, the input original image is obtained and fed into the pre-trained dressing segmentation network for semantic segmentation. The network main branch of the dressing segmentation network extracts shallow features of the original image; the semantic feature extraction branch then learns, based on the shallow features, a semantic feature vector with a large receptive field, while the spatial feature extraction branch learns a spatial feature vector with a small receptive field. Meanwhile, a global pooling layer and a fully connected layer are added to the penultimate convolution module of the semantic feature extraction branch to classify whether the input original image contains a dressing region, and the training process is constrained by a cross-entropy loss function, which makes the training of the dressing segmentation network more stable and the segmentation precision higher. The semantic feature vector output by the semantic feature extraction branch and the spatial feature vector output by the spatial feature extraction branch are added, the probability image of the original image is restored from the sum, the dressing area is segmented according to the probability image, and the target dressing style selected by the user is fused into the dressing area of the original image to achieve the effect of changing clothes or colors.
In this embodiment, the classification result produced by the global pooling layer and fully connected layer added to the semantic feature extraction branch, together with the probability image output by the dressing segmentation network, is used to constrain training, which helps improve the stability of training and the edge segmentation precision of the dressing segmentation network.
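A sketch of how such an auxiliary head and its cross-entropy constraint might look in PyTorch; the channel count, batch size, and two-class formulation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AuxDressingClassifier(nn.Module):
    """Global pooling + fully connected head attached to the penultimate
    convolution module of the semantic branch; used only during training
    to classify whether the image contains a dressing region."""
    def __init__(self, in_ch: int = 256, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))

head = AuxDressingClassifier()
logits = head(torch.randn(4, 256, 14, 14))        # features from the semantic branch
labels = torch.tensor([1, 0, 1, 1])               # 1 = image contains a dressing region
aux_loss = nn.CrossEntropyLoss()(logits, labels)  # cross-entropy constraint on training
```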
The above examples are merely used to assist in explaining the technical solutions of the present disclosure, and the drawings and specific flows related thereto do not constitute a limitation on the usage scenarios of the technical solutions of the present disclosure.
The following describes in detail embodiments related to the image-based dressing area dividing apparatus and the dressing replacement apparatus.
Fig. 5 is a schematic structural diagram of an image-based dressing region segmentation apparatus according to an embodiment of the present application, which is executable by an image-based dressing region segmentation device, such as a client.
Specifically, as shown in fig. 5, the image-based dressing area dividing apparatus 200 includes: an original image acquisition module 210, a probability image output module 220 and a dressing area segmentation module 230.
The original image obtaining module 210 is configured to obtain an original image to be segmented; a probability image output module 220, configured to input the original image into a clothing segmentation network, extract a first image feature of the original image by using a feature down-sampling module of the clothing segmentation network, input the first image feature into a feature extraction module of the clothing segmentation network to obtain a feature vector with different receptive fields in the original image, input the feature vector into a feature up-sampling module of the clothing segmentation network for processing, restore to obtain a feature image with a resolution consistent with that of the input original image, and obtain a probability image of the original image according to the feature image; and a dressing region segmentation module 230, configured to receive the probability image, and segment the dressing region of the original image according to the probability image.
The image-based dressing area dividing device provided by the embodiment is deployed at a client, can realize accurate division of the dressing area of the image on the client, and has stable performance and high processing speed.
In one embodiment, the probabilistic image output module 220 includes: a semantic feature vector obtaining unit, a spatial feature vector obtaining unit and a total feature vector obtaining unit;
a semantic feature vector obtaining unit, configured to input the first image feature to a semantic feature extraction branch of a feature extraction module of the clothing segmentation network for processing, so as to obtain a semantic feature vector of the original image;
a spatial feature vector obtaining unit, configured to input the first image feature to a spatial feature extraction branch of a feature extraction module of the clothing segmentation network for processing, so as to obtain a spatial feature vector of the original image; wherein the receptive field of the semantic feature vector is larger than the receptive field of the spatial feature vector;
and the total feature vector obtaining unit is used for adding the semantic feature vector and the space feature vector pixel by pixel to obtain a total feature vector.
In one embodiment, the probabilistic image output module 220 includes: a second image feature extraction unit and a feature image obtaining unit;
the second image feature extraction unit is configured to input the total feature vector into the feature upsampling module of the dressing segmentation network, perform network operations layer by layer, and extract the second image feature;
and the feature image obtaining unit is configured to restore, according to the second image feature, a feature image whose resolution is consistent with that of the input image to be segmented.
In an embodiment, the feature downsampling module includes a plurality of first convolution units, each of which is formed by stacking a first conventional convolution, batch normalization, a ReLU activation function, a second conventional convolution, batch normalization, and a ReLU activation function;
the semantic feature extraction branch comprises ten first convolution modules, wherein the convolution stride of the first, third and seventh convolution modules is 2;
the spatial feature extraction branch comprises two stacked second convolution modules, wherein the convolution stride of each second convolution module is set to 1, the convolution layers of the second convolution modules adopt channel-separable convolution, and each second convolution module is formed by stacking a grouped convolution, batch normalization, a ReLU activation function, a 1 x 1 conventional convolution, batch normalization, and a ReLU activation function;
the feature upsampling convolution module is formed by stacking a bilinear interpolation function, a conventional convolution, batch normalization, and a ReLU activation function.
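The two building blocks just described can be sketched as follows; kernel sizes and the group count of the channel-separable convolution are assumptions, since the disclosure does not fix them here.

```python
import torch.nn as nn

def first_conv_unit(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """First convolution unit: conventional convolution, batch normalization
    and ReLU, stacked twice (3x3 kernels are an assumption)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

def second_conv_module(ch: int, groups: int = 8) -> nn.Sequential:
    """Second convolution module: grouped (channel-separable) convolution,
    BN, ReLU, then 1x1 conventional convolution, BN, ReLU, all at stride 1.
    The group count is an assumption; ch must be divisible by groups."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, stride=1, padding=1, groups=groups),
        nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 1),
        nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
    )
```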
In one embodiment, the probabilistic image output module 220 includes: and the probability image obtaining unit is used for converting the characteristic image into a probability image corresponding to the characteristic image by using a Sigmoid function.
In one embodiment, when training the clothing segmentation network, the image-based clothing region segmentation apparatus 200 further includes: a dressing area judgment module, configured to classify the original image by using a global pooling layer and a fully connected layer of the dressing segmentation network, and to judge whether the original image contains a dressing area according to the classification result they output; wherein the global pooling layer and the fully connected layer are disposed in the penultimate convolution module of the semantic feature extraction branch of the dressing segmentation network.
In one embodiment, the image-based dressing region segmenting apparatus 200 further includes: and the segmentation network training module is used for training the dressing segmentation network through cross entropy loss function constraint according to the classification result.
The image-based dressing region segmentation apparatus according to the embodiments of the present disclosure can perform the image-based dressing region segmentation method according to the embodiments of the present disclosure, and its implementation principle is similar. The actions performed by each module of the apparatus correspond to the steps of the method; for a detailed functional description of each module, reference may be made to the description of the corresponding image-based dressing region segmentation method above, which is not repeated here.
Fig. 6 is a schematic structural diagram of an image-based dressing replacement apparatus according to an embodiment of the present application, where the image-based dressing replacement apparatus is executable on an image-based dressing replacement device, such as a client.
Specifically, as shown in fig. 6, the image-based dressing change apparatus 400 includes: a dressing style acquisition module 410, a dressing area determination module 420, and a target image generation module 430.
The clothing style acquiring module 410 is configured to acquire a target clothing style selected by a user;
a clothing region determining module 420, configured to determine a clothing region of an original image, where the clothing region is determined by inputting the original image into a clothing segmentation network, extracting a first image feature of the original image by using a feature down-sampling module of the clothing segmentation network, inputting the first image feature into a feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the clothing segmentation network for processing, restoring to obtain a feature image consistent with a resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and a target image generation module 430, configured to fuse the target clothing style to the clothing region, and generate a target image.
The image-based dressing replacement device provided by this embodiment is deployed at the client, can realize accurate segmentation of the dressing area of an image and dressing style transformation on the client, has stable performance and a high processing speed, and meets users' requirements.
The image-based dressing replacement device according to the embodiment of the present disclosure can execute the image-based dressing replacement method according to the embodiment of the present disclosure, and the implementation principles thereof are similar, and are not described herein again.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the image-based dressing region segmentation method or the dressing replacement method in any of the above embodiments.
The electronic device provided above has the corresponding functions and beneficial effects when executing the image-based dressing region segmentation method or the dressing replacement method provided by any of the above embodiments.
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform an image-based dressing region segmentation method, including:
acquiring an original image to be segmented;
inputting the original image into a clothing segmentation network, extracting first image features of the original image by using a feature down-sampling module of the clothing segmentation network, inputting the first image features into a feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the clothing segmentation network for processing, restoring to obtain a feature image consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and receiving the probability image, and segmenting the dressing area of the original image according to the probability image.
The computer-executable instructions, when executed by a computer processor, are further configured to perform an image-based dressing replacement method, comprising:
acquiring a target dressing style selected by a user;
determining a dressing area of an original image, wherein the dressing area is determined by inputting the original image into a dressing segmentation network, extracting a first image feature of the original image by using a feature down-sampling module of the dressing segmentation network, inputting the first image feature into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the dressing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and fusing the target dressing style to the dressing area to generate a target image.
Of course, the storage medium containing computer-executable instructions provided by the embodiments of the present invention is not limited to the operations of the image-based dressing region segmentation method or the dressing replacement method described above, and has the corresponding functions and advantages.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by hardware, but the former is the preferred embodiment in many cases. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and which includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the image-based dressing region segmentation method or the dressing replacement method according to any embodiment of the present invention.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.
Claims (12)
1. A dressing area segmentation method based on images is characterized by comprising the following steps:
acquiring an original image to be segmented;
inputting the original image into a clothing segmentation network, extracting first image features of the original image by using a feature down-sampling module of the clothing segmentation network, inputting the first image features into a feature extraction module of the clothing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the clothing segmentation network for processing, restoring to obtain a feature image consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and segmenting the dressing area of the original image according to the probability image.
2. The method of claim 1, wherein the step of inputting the first image feature to the feature extraction module of the clothing segmentation network to obtain the feature vectors with different receptive fields in the original image comprises:
inputting the first image feature into a semantic feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a semantic feature vector of the original image;
inputting the first image feature into a spatial feature extraction branch of a feature extraction module of the clothing segmentation network for processing to obtain a spatial feature vector of the original image; wherein the receptive field of the semantic feature vector is larger than the receptive field of the spatial feature vector;
and adding the semantic feature vector and the spatial feature vector pixel by pixel to obtain a total feature vector.
3. The method of claim 2, wherein the step of inputting the feature vector into a feature upsampling module of the clothing segmentation network for processing to restore a feature image consistent with the resolution of the input original image comprises:
inputting the total feature vector into a feature up-sampling module of the dressing segmentation network to perform network operation layer by layer, and extracting to obtain a second image feature;
and restoring according to the second image characteristic to obtain a characteristic image which is consistent with the resolution of the input image to be segmented.
4. The image-based clothing region segmentation method according to claim 1, wherein the feature downsampling module includes a plurality of first convolution units, each of which is formed by stacking a first conventional convolution, batch normalization, a ReLU activation function, a second conventional convolution, batch normalization, and a ReLU activation function;
the semantic feature extraction branch comprises ten first convolution modules, wherein the convolution stride of the first, third and seventh convolution modules is 2;
the spatial feature extraction branch comprises two stacked second convolution modules, wherein the convolution stride of each second convolution module is set to 1, the convolution layers of the second convolution modules adopt channel-separable convolution, and each second convolution module is formed by stacking a grouped convolution, batch normalization, a ReLU activation function, a 1 x 1 conventional convolution, batch normalization, and a ReLU activation function;
the feature upsampling convolution module is formed by stacking a bilinear interpolation function, a conventional convolution, batch normalization, and a ReLU activation function.
5. The method of claim 1, wherein the step of deriving the probability image of the original image from the feature image comprises:
converting the characteristic image into a probability image corresponding to the characteristic image by using a Sigmoid function; wherein the characteristic image is a single-channel image with the same size as the original image.
6. The method of claim 1, wherein in training the clothing segmentation network, further comprising:
classifying the original image by using a global pooling layer and a fully connected layer of the dressing segmentation network, and judging whether the original image has a dressing area according to the classification result output by the global pooling layer and the fully connected layer; wherein the global pooling layer and the fully connected layer are disposed in a penultimate convolution module of the semantic feature extraction branch of the dressing segmentation network.
7. The image-based method of segmenting a clothing region according to claim 6, further comprising:
and training the dressing segmentation network through cross entropy loss function constraint according to the classification result.
8. An image-based dressing replacement method, comprising:
acquiring a target dressing style selected by a user;
determining a dressing area of an original image, wherein the dressing area is determined by inputting the original image into a dressing segmentation network, extracting a first image feature of the original image by using a feature down-sampling module of the dressing segmentation network, inputting the first image feature into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, inputting the feature vectors into a feature up-sampling module of the dressing segmentation network for processing, restoring to obtain a feature image with the resolution consistent with the resolution of the input original image, and obtaining a probability image of the original image according to the feature image;
and fusing the target dressing style to the dressing area to generate a target image.
9. An image-based dressing area dividing device, comprising:
the original image acquisition module is used for acquiring an original image to be segmented;
a probability image output module, configured to input the original image into a clothing segmentation network, extract a first image feature of the original image by using a feature down-sampling module of the clothing segmentation network, input the first image feature into a feature extraction module of the clothing segmentation network to obtain a feature vector with different receptive fields in the original image, input the feature vector into a feature up-sampling module of the clothing segmentation network for processing, restore to obtain a feature image that is consistent with a resolution of the input original image, and obtain a probability image of the original image according to the feature image;
and the dressing area segmentation module is used for segmenting the dressing area of the original image according to the probability image.
10. An image-based dressing change apparatus, comprising:
the dressing style acquisition module is used for acquiring a target dressing style selected by a user;
the system comprises a dressing area determining module, a dressing area determining module and a dressing area determining module, wherein the dressing area inputs an original image into a dressing segmentation network, a feature down-sampling module of the dressing segmentation network is used for extracting first image features of the original image, the first image features are input into a feature extraction module of the dressing segmentation network to obtain feature vectors with different receptive fields in the original image, the feature vectors are input into a feature up-sampling module of the dressing segmentation network for processing, a feature image with the resolution consistent with that of the input original image is obtained through restoration, and the dressing area determining module is used for determining a probability image of the original image according to the feature image;
and the target image generation module is used for fusing the target dressing style to the dressing area to generate a target image.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to: perform the image-based dressing region segmentation method according to any one of claims 1 to 7 or the image-based dressing replacement method according to claim 8.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the image-based dressing region segmentation method according to any one of claims 1 to 7 or the image-based dressing replacement method according to claim 8.