CN115393596A

CN115393596A - Garment image segmentation method based on artificial intelligence

Info

Publication number: CN115393596A
Application number: CN202211332438.0A
Authority: CN
Inventors: 余锋; 李会引; 王誉霖; 姜明华; 周昌龙; 宋坤芳
Original assignee: Wuhan Textile University
Current assignee: Wuhan Textile University
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2022-11-25
Anticipated expiration: 2042-10-28
Also published as: CN115393596B

Abstract

The invention discloses a garment image segmentation method based on artificial intelligence, and relates to the field of image segmentation. The clothing image segmentation method comprises the following steps: firstly, clothing image data is collected, the image is preprocessed and sent to a clothing feature extraction network of a clothing image segmentation model, clothing features in the image are extracted, then a clothing feature fusion restoration network of the clothing image segmentation model is used for fusing the image features to reduce feature loss, and the image with different labeling information is output. The clothing feature extraction network comprises a double-branch feature extraction module. The clothing feature fusion reduction network comprises a multi-scale feature fusion module and a result output module. According to the method, the attention mechanism and the depth separable convolution operation are fused, so that the accuracy and the speed of model reasoning are greatly improved, a large amount of manual labels are not needed, and great convenience is provided for subsequent clothes analysis work.

Description

Garment image segmentation method based on artificial intelligence

Technical Field

The invention relates to the field of image segmentation, in particular to a garment image segmentation method based on artificial intelligence.

Background

Image segmentation is the most fundamental image operation in computer vision processing, and the subsequent processing of computer vision depends on the quality of segmentation of a region of interest in an image. Most of the existing image segmentation technologies adopt traditional algorithms to process, such as statistical image energy histogram and edge detection and cutting. Or a mathematical morphology processing link of the image is added to improve the accuracy of segmentation, such as noise reduction processing of expansion corrosion and the like. When the traditional image segmentation algorithm is used for processing an image with a single scene and strong pixel continuity, the segmentation accuracy and efficiency are high; however, when complicated fashionable garment images are processed, especially in an environment with complicated scenes, if the garments worn by people in the images are to be segmented, the accuracy linearly slides down, and the images are hardly usable.

At present, the deep learning technology is used for the vast majority of image segmentation, segmentation tasks are carried out through a convolutional neural network, more segmentation networks with excellent performance are generated, the accuracy of the segmentation networks is successfully superior to that of the traditional algorithm, a plurality of scholars apply a semantic segmentation network based on deep learning to the field of clothing image segmentation, and the semantic segmentation network is combined with an attention mechanism and a semantic feature enhancement module, so that the network can more accurately segment clothing images. The image segmentation technology has a wide application range, and the clothing image segmentation is used as an important application of image segmentation in the clothing image field, and aims to classify different clothing, retain effective clothing information in an image, extract a target clothing region and filter other ineffective information. The effective information reserved by the clothing image segmentation technology can be used in the fields of clothing retrieval, virtual fitting and the like. There is therefore a need for an efficient and intelligent method for automatically segmenting high quality garments from complex images, including dresses, skirts, pants, shoes, etc.

The Chinese invention patent with the publication number of CN109325952A discloses a fashion clothing image segmentation method based on deep learning, which provides that weights in a network are adjusted by endowing a plurality of loss functions with different weights, and the method can improve the image accuracy of model segmentation, but the method is not enough only through innovation of the loss functions when images under complex scenes such as clothing images are segmented, and a more effective network structure design scheme needs to be introduced.

Disclosure of Invention

Aiming at the defects or improvement requirements of the prior art, the invention provides a garment image segmentation method based on artificial intelligence, and aims to build a network model suitable for garment segmentation through the fusion of an attention mechanism and a convolutional neural network, so that the network segmentation efficiency is effectively improved, and a high-quality garment segmentation image can be provided.

To test the above objects, according to one aspect of the present invention, there is provided an artificial intelligence based garment image segmentation method, comprising the steps of:

step 1, collecting clothing images to form a clothing data set, and labeling image segmentation information;

step 2, reading the pictures in the data set, preprocessing the read pictures, and sending the preprocessed pictures into a clothing feature extraction network of a clothing image segmentation model to extract clothing features so as to obtain a feature map with the clothing features;

the clothing feature extraction network is a double-branch feature extraction module, and the two branches simultaneously process data;

step 3, sending the characteristic graph with the clothing characteristics into a clothing characteristic fusion reduction network of a clothing image segmentation model to obtain a prediction probability graph, determining the category of each pixel according to the channel serial number of the maximum probability, and rendering the picture according to different categories to obtain a color label graph;

the clothing feature fusion restoration network comprises a multi-scale feature fusion module used for fusing feature maps and a result output module used for outputting the category of each pixel.

Further, the specific processing procedure of the dual-branch feature extraction module in step 2 is as follows: transmitting the preprocessed image into a double-branch feature extraction module, and simultaneously processing data by a first branch and a second branch; wherein the first branch consists of 4 attention mechanism modules and the second branch consists of 8 convolution modules; note that the specific operations within the mechanism module are: first, passing through a linear self-attention mechanism layer, then a separable convolution layer with a depth of 2 x 2 and an activation layer; the specific operations within the volume module are: the feature map is passed through a 7 x 7 depth separable convolutional layer, through an active layer, and finally through a 2 x 2 depth separable convolutional layer and an active layer, wherein the 2 x 2 depth separable convolutional layers in the 2 nd, 4 th, 6 th and 8 th convolutional modules are used to change the height and width of the feature map.

Further, the weighting loss function used in the training process of the clothing image segmentation model in step 2 is specifically represented by:

where H is the height of the predicted feature map, W is the width of the predicted feature map, y _ij Representing the pixel value, y, at the position of the ith row and the jth column on the real label _ij ' the pixel value at the ith row and the jth column on the prediction characteristic diagram can be globally focused on all the types of pixels in the network training process through a weighting mode, and the segmentation performance of the network is improved.

Further, the multi-scale feature fusion module in step 3 is composed of 2 feature upsampling modules, wherein the feature upsampling module specifically operates as follows: firstly, passing through a separable convolution layer and an activation layer with the depth of 3 multiplied by 3, and then passing through a sampling layer with the depth of 4 times; the specific operation of the result output module is as follows: the channel number is adjusted to be a classification number through a 3 x 3 depth separable convolution layer and an activation layer, then the classification number is adjusted through the 3 x 3 depth separable convolution layer, and finally a final prediction characteristic diagram is output through a softmax layer.

Further, the activation functions of the activation layer are all relx, and the specific calculation formula is as follows:

where max is the function of the maximum, α and β are the scaling factors, and x represents the value at any position of the feature map.

Further, the calculation method of the linear self-attention mechanism layer is as follows;

the data dimension of the feature map is H × W × C, wherein H is the height of the feature map, W is the width of the feature map, C is the channel of the feature map, the feature map of the input linear self-attention mechanism layer is divided into 3 branches to be subjected to 1 × 1 depth separable convolutional layers to obtain 3 feature maps which are respectively marked as q, k and v, and the specific calculation formula of the LSA is as follows:

wherein, the symbol represents matrix multiplication, and the size of n is H multiplied by W to represent q _i Vector, k _i Vector sum v _i Number of vectors, q _i Representing the vector of the characteristic diagram q at the position i, the vector dimension is C, wherein k _i And v _i And q is _i Similarly, sim is a function for calculating vector similarity, and the specific formula is as follows:

wherein Q and K represent vectors with the same dimension, the symbol ∙ represents vector point multiplication, and epsilon is the value of the similarity component of 0.01.

Further, in the second branch, the input of the 1 st convolution module is the clothing image data, the input of the 2 nd convolution module is the output of the 1 st convolution module, the input of the 3 rd convolution module is the product of the output of the 1 st attention mechanism module and the output of the 2 nd convolution module, the input of the 4 th convolution module is the output of the 3 rd convolution module, the input of the 5 th convolution module is the product of the output of the 2 nd attention mechanism module and the output of the 4 th convolution module, the input of the 6 th convolution module is the output of the 5 th convolution module, the input of the 7 th convolution module is the product of the output of the 3 rd attention mechanism module and the output of the 6 th convolution module, and the input of the 8 th convolution module is the output of the 7 th convolution module.

Further, the input of the 1 st feature upsampling module in the multi-scale feature fusion module is the product of the output of the 4 th attention mechanism module and the output of the 8 th convolution module, and the input of the 2 nd feature upsampling module is the product of the output of the 1 st feature upsampling module plus the output of the 2 nd attention mechanism module and the output of the 4 th convolution module.

Further, the formula of the softmax function calculation is as follows:

in the formula X _i Is the firstiThe output value of each channel, n is the number of output channels, i.e. the number of classified categories.

Further, different marking colors in step 3 represent different clothing types, including long sleeves, short sleeves, one-piece dresses, trousers, shoes, waistbands, scarves, glasses, coats, short skirts, sweaters, bags, ties and vests.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

(1) The attention mechanism-based double-branch feature extraction network and the multi-scale feature fusion network are used, so that the clothes data features can be amplified favorably, and the aim of improving the clothes image segmentation precision is fulfilled.

(2) In the method, a great amount of depth separable convolutions are used, and meanwhile, the used attention mechanism layer belongs to linear operation, so that the network reasoning time can be greatly reduced, and the segmented image with the label can be obtained in a short time when the image is input into a network.

(3) An effective design scheme is provided for the design of the garment segmentation depth network.

Drawings

Fig. 1 is a schematic view of an implementation flow of a method for segmenting a garment image based on artificial intelligence according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a dual-branch feature extraction module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.

Fig. 3 is a schematic view of an attention mechanism module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of a convolution module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a feature upsampling module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of a result output module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Referring to fig. 1, fig. 1 is a schematic view of an implementation flow of an artificial intelligence based garment image segmentation method provided by an embodiment of the present invention, which specifically includes the following steps:

(1) And collecting clothing images to form a clothing data set, and labeling image segmentation information.

Wherein, the source of image acquisition is each large garment platform.

(2) Reading pictures in the data set, preprocessing the read pictures, and sending the preprocessed pictures into a clothing feature extraction network of a clothing image segmentation model to extract clothing features so as to obtain a feature map with the clothing features;

the clothing feature extraction network comprises a double-branch feature extraction module;

specifically, the preprocessing operation includes data processing of random cropping, flipping, and image enhancement.

Fig. 2 is a schematic diagram of a dual-branch feature extraction module of a clothing image segmentation method based on artificial intelligence according to an embodiment, and the specific processing procedure is as follows: and transmitting the preprocessed image into a double-branch feature extraction module, wherein the first branch and the second branch can process data simultaneously. The first branch consists of 4 attention mechanism modules and the second branch consists of 8 convolution modules. Note that the specific operations within the mechanism module are: referring to fig. 3, fig. 3 is a schematic view of an attention mechanism module of an artificial intelligence-based garment image segmentation method according to an embodiment of the present invention. The specific operations within the volume module are: first a 7 x 7 depth separable convolutional layer, then an active layer, and finally a 2 x 2 depth separable convolutional layer and active layer. Wherein, the 2 × 2 depth separable convolution layer in the 2 nd, 4 th, 6 th and 8 th convolution modules is used to change the height and width of the feature map, please refer to fig. 4, fig. 4 is a schematic diagram of the convolution modules of the artificial intelligence based garment image segmentation method provided by the embodiment.

Specifically, the weight Loss function (Weighted Loss) used in the training process of the clothing image segmentation model is represented by the following formula:

Specifically, the activation functions of the activation layer and the subsequent activation operation are all relx, and the specific calculation formula is as follows:

where max is the maximum function, α and β are the scaling factors, where α is set to 0.7, β is set to 1.5, and x represents the value at any position of the feature map.

Specifically, the linear self-attention mechanism layer calculation method comprises the following steps: the data dimension of the characteristic diagram is H multiplied by W multiplied by C, wherein H is the height of the characteristic diagram, W is the width of the characteristic diagram, C is the channel of the characteristic diagram, the characteristic diagram of the input linear self-attention mechanism layer is divided into 3 branches to be subjected to 1 multiplied by 1 depth separable convolutional layers to obtain 3 characteristic diagrams which are respectively marked as q, k and v, and the linear self-attention mechanism layer is

The specific calculation formula of (Linear Self authorization) is:

where symbol denotes the matrix multiplication, n denotes the total number of vectors, of size H W, and vector q _i The vector represented at position i of the feature map q, the vector dimension being C, where k _i And v _i And q is _i Similarly, sim is a function for calculating vector similarity, and the specific formula is as follows:

where Q and K represent vectors of the same dimension, the notation ∙ represents a vector point product,εthe value of the similarity component is 0.01. The attention mechanism can calculate the similarity of each pixel and the surrounding pixels, and the higher the similarity is, the higher the weight is.

Specifically, the input of the 1 st convolution module in the second branch is the clothing image data, the input of the 2 nd convolution module is the output of the 1 st convolution module, the input of the 3 rd convolution module is the product of the output of the 1 st attention mechanism module and the output of the 2 nd convolution module, the input of the 4 th convolution module is the output of the 3 rd convolution module, the input of the 5 th convolution module is the product of the output of the 2 nd attention mechanism module and the output of the 4 th convolution module, the input of the 6 th convolution module is the output of the 5 th convolution module, the input of the 7 th convolution module is the product of the output of the 3 rd attention mechanism module and the output of the 6 th convolution module, and the input of the 8 th convolution module is the output of the 7 th convolution module.

(3) Sending the characteristic graph with the clothing characteristics into a clothing characteristic fusion reduction network of a clothing image segmentation model to obtain a prediction probability graph, determining the category of each pixel according to the channel serial number of the maximum probability, and rendering the graph according to different categories to obtain a color labeling graph;

the clothing feature fusion reduction network comprises a multi-scale feature fusion module and a result output module;

wherein different mark colours represent different clothing types, contain long sleeves, cotta, dress, trousers, shoes, waistband, scarf, glasses, overcoat, skirt, sweater, package, tie and undershirt.

The system comprises a multi-scale feature fusion module and a result output module. The multi-scale feature fusion module consists of 2 feature upsampling modules. The feature upsampling module specifically operates as follows: first through a 3 x 3 depth separable convolutional layer and active layer, and then through a 4 x upsampling layer, see fig. 5, which is a graph of fig. 5The embodiment provides a schematic diagram of a feature upsampling module of a garment image segmentation method based on artificial intelligence. The specific operation of the result output module is: referring to fig. 6, fig. 6 is a schematic diagram of a result output module of the clothing image segmentation method based on artificial intelligence according to the embodiment, where the number of channels is adjusted to be a classification number by passing through a 3 × 3 depth separable convolution layer and an active layer, and then passes through a 3 × 3 depth separable convolution layer, and finally passes through a softmax layer, so as to output a final prediction feature map. The formula of the softmax function calculation is as follows:

in the formula X _i Is the output value of the ith channel, and n is the number of output channels, i.e. the number of classified categories.

Specifically, the input of two feature upsampling modules in the multi-scale feature fusion module: the input to the 1 st feature upsampling module is the product of the 4 th attention mechanism module output and the 8 th convolution module output, and the input to the 2 nd feature upsampling module is the product of the 1 st feature upsampling module output plus the 2 nd attention mechanism module output and the 4 th convolution module output.

The invention provides a garment image segmentation method based on artificial intelligence, which uses a double-branch feature extraction network and a multi-scale feature fusion network based on an attention mechanism to amplify garment data features. A great amount of depth separable convolutions are used, meanwhile, the used attention mechanism layer belongs to linear operation, the network reasoning time can be greatly reduced, and the segmented image with the label can be obtained in a short time when the image is input into the network. The garment image segmentation method is specifically applied to the cooperation of the team and Wuhan microlite science and technology limited companies, the garment image data crawl from shopping websites such as amazon and Taobao, meanwhile, the Wuhan microlite science and technology limited companies provide technical support, a data set is manufactured through the implementation example method, a garment image segmentation model obtained through the network of people is trained, and the garment image with the label generated by the model is improved to a certain extent in comparison with the existing network in terms of segmentation quality and label image generation efficiency in verification under the actual condition.

Various modifications and alterations of this application may be made by those skilled in the art without departing from the spirit and scope of this application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A clothing image segmentation method based on artificial intelligence is characterized by comprising the following steps:

2. The artificial intelligence based garment image segmentation method of claim 1, wherein: the specific processing procedure of the double-branch feature extraction module in the step 2 is as follows: the preprocessed image is transmitted into a double-branch feature extraction module, and a first branch and a second branch process data at the same time; wherein the first branch consists of 4 attention mechanism modules and the second branch consists of 8 convolution modules; note that the specific operations within the mechanism module are: first, passing through a linear self-attention mechanism layer, then a separable convolution layer with a depth of 2 x 2 and an activation layer; the specific operations within the volume module are: the feature map is passed through a 7 x 7 depth separable convolutional layer, through an active layer, and finally through a 2 x 2 depth separable convolutional layer and an active layer, wherein the 2 x 2 depth separable convolutional layers in the 2 nd, 4 th, 6 th and 8 th convolutional modules are used to change the height and width of the feature map.

3. The artificial intelligence based garment image segmentation method of claim 1, wherein: the weighting loss function used in the training process of the clothing image segmentation model in the step 2 is specifically represented by the following formula:

4. The artificial intelligence based garment image segmentation method of claim 2, wherein: in step 3, the multi-scale feature fusion module is composed of 2 feature upsampling modules, wherein the feature upsampling module specifically operates as follows: firstly, passing through a separable convolution layer and an activation layer with the depth of 3 multiplied by 3, and then passing through a sampling layer with the depth of 4 times; the specific operation of the result output module is as follows: the channel number is adjusted to be a classification number through a 3 x 3 depth separable convolution layer and an activation layer, then the classification number is adjusted through the 3 x 3 depth separable convolution layer, and finally a final prediction characteristic diagram is output through a softmax layer.

5. The artificial intelligence based garment image segmentation method of claim 2 or 4, wherein: the activation functions of the activation layer are all relx, and the specific calculation formula is as follows:

6. The artificial intelligence based garment image segmentation method of claim 2, wherein: the linear self-attention mechanism layer calculation method is as follows;

where the symbol denotes a matrix multiplication, and n is H × W, and q is _i Vector, k _i Vector sum v _i Number of vectors, q _i Representing the vector at the position i of the feature map q, the vector dimension is C, k _i And v _i And q is _i Similarly, sim is a function for calculating vector similarity, and the specific formula is:

where Q and K represent vectors of the same dimension, the notation ∙ represents a vector point product,εthe value of the similarity component is 0.01.

7. The artificial intelligence based garment image segmentation method of claim 2, wherein: in the second branch, the input of the 1 st convolution module is clothing image data, the input of the 2 nd convolution module is the output of the 1 st convolution module, the input of the 3 rd convolution module is the product of the output of the 1 st attention mechanism module and the output of the 2 nd convolution module, the input of the 4 th convolution module is the output of the 3 rd convolution module, the input of the 5 th convolution module is the product of the output of the 2 nd attention mechanism module and the output of the 4 th convolution module, the input of the 6 th convolution module is the output of the 5 th convolution module, the input of the 7 th convolution module is the product of the output of the 3 rd attention mechanism module and the output of the 6 th convolution module, and the input of the 8 th convolution module is the output of the 7 th convolution module.

8. The artificial intelligence based garment image segmentation method of claim 4, wherein: the input of the 1 st feature upsampling module in the multi-scale feature fusion module is the product of the output of the 4 th attention mechanism module and the output of the 8 th convolution module, and the input of the 2 nd feature upsampling module is the product of the output of the 1 st feature upsampling module plus the output of the 2 nd attention mechanism module and the output of the 4 th convolution module.

9. The artificial intelligence based garment image segmentation method of claim 4, wherein: the formula for the calculation of the softmax function is:

10. The artificial intelligence based garment image segmentation method of claim 1, wherein: in the step 3, different marking colors represent different clothing types, including long sleeves, short sleeves, one-piece dresses, trousers, shoes, waistbands, scarves, glasses, coats, short skirts, sweaters, bags, ties and vests.