CN115393596B - Garment image segmentation method based on artificial intelligence - Google Patents

Garment image segmentation method based on artificial intelligence Download PDF

Info

Publication number
CN115393596B
CN115393596B
Authority
CN
China
Prior art keywords
module
clothing
output
convolution
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211332438.0A
Other languages
Chinese (zh)
Other versions
CN115393596A (en)
Inventor
余锋
李会引
王誉霖
姜明华
周昌龙
宋坤芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Textile University filed Critical Wuhan Textile University
Priority to CN202211332438.0A priority Critical patent/CN115393596B/en
Publication of CN115393596A publication Critical patent/CN115393596A/en
Application granted granted Critical
Publication of CN115393596B publication Critical patent/CN115393596B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a garment image segmentation method based on artificial intelligence, relating to the field of image segmentation. The method comprises the following steps: first, clothing image data are collected and the images are preprocessed and sent into the clothing feature extraction network of a clothing image segmentation model, which extracts the clothing features in the images; then the clothing feature fusion and restoration network of the model fuses the image features to reduce feature loss and outputs images with different labeling information. The clothing feature extraction network comprises a dual-branch feature extraction module; the clothing feature fusion and restoration network comprises a multi-scale feature fusion module and a result output module. By fusing an attention mechanism with depthwise separable convolution operations, the method greatly improves both the accuracy and the speed of model inference, requires no large amount of manual labeling, and provides great convenience for subsequent clothing analysis work.

Description

Garment image segmentation method based on artificial intelligence
Technical Field
The invention relates to the field of image segmentation, in particular to a garment image segmentation method based on artificial intelligence.
Background
Image segmentation is the most fundamental image operation in computer vision, and all downstream processing depends on how well the region of interest is segmented. Most existing image segmentation techniques rely on traditional algorithms, such as image energy histogram statistics or edge detection and cutting, sometimes augmented with mathematical morphology steps (e.g., dilation and erosion for noise reduction) to improve accuracy. Traditional segmentation algorithms achieve high accuracy and efficiency on images with a single scene and strong pixel continuity; however, on complex fashion garment images, particularly when segmenting the clothes worn by a person in a cluttered scene, their accuracy drops sharply to the point of being barely usable.
At present, deep learning is widely used for image segmentation, with the segmentation task performed by convolutional neural networks, and many high-performing segmentation networks have surpassed traditional algorithms in accuracy. A number of researchers have applied deep-learning-based semantic segmentation networks to garment image segmentation, combining an attention mechanism with semantic feature enhancement modules so that the network segments garment images more accurately. Image segmentation has a wide range of applications; garment image segmentation, as an important application in the clothing image field, aims to classify different garments, retain the effective clothing information in an image, extract the target garment region, and filter out other invalid information. The information retained by garment segmentation can be used in fields such as clothing retrieval and virtual fitting. There is therefore a need for an efficient, intelligent method that automatically segments high-quality garments (including dresses, skirts, pants, shoes, etc.) from complex images.
Chinese patent publication No. CN109325952A discloses a fashion clothing image segmentation method based on deep learning, which adjusts the weights in a network by assigning different weights to multiple loss functions. This can improve the accuracy of the segmented images, but it is not sufficient for images in complex scenes such as garment images, and a more effective network structure design is needed.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a garment image segmentation method based on artificial intelligence, which builds a network model suited to garment segmentation by fusing an attention mechanism with a convolutional neural network, thereby effectively improving segmentation efficiency while providing high-quality garment segmentation images.
To achieve the above objects, according to one aspect of the present invention, there is provided an artificial intelligence based garment image segmentation method, comprising the following steps:
step 1, collecting clothing images to form a clothing data set, and labeling image segmentation information;
step 2, reading the pictures in the data set, preprocessing the read pictures, and sending the preprocessed pictures into a clothing feature extraction network of a clothing image segmentation model to extract clothing features so as to obtain a feature map with the clothing features;
the clothing feature extraction network is a dual-branch feature extraction module whose two branches process data simultaneously;
step 3, sending the feature map with the clothing features into the clothing feature fusion and restoration network of the clothing image segmentation model to obtain a prediction probability map, determining the category of each pixel from the channel index with the maximum probability, and rendering the picture according to the different categories to obtain a color label map;
the clothing feature fusion and restoration network comprises a multi-scale feature fusion module for fusing the feature maps and a result output module for outputting the category of each pixel.
Further, the specific processing procedure of the dual-branch feature extraction module in step 2 is as follows: the preprocessed image is passed into the dual-branch feature extraction module, where the first branch and the second branch process data simultaneously. The first branch consists of 4 attention mechanism modules and the second branch consists of 8 convolution modules. The specific operations within an attention mechanism module are: first a linear self-attention layer, then a 2 × 2 depthwise separable convolutional layer and an activation layer. The specific operations within a convolution module are: a 7 × 7 depthwise separable convolutional layer, then an activation layer, and finally a 2 × 2 depthwise separable convolutional layer and an activation layer; the 2 × 2 depthwise separable convolutional layers in the 2nd, 4th, 6th and 8th convolution modules change the height and width of the feature map.
Further, the weighted loss function used in training the clothing image segmentation model in step 2 is as follows:
[weighted loss equation, rendered as an image in the original publication]
where H is the height of the predicted feature map, W is the width of the predicted feature map, y_ij is the pixel value at row i, column j of the ground-truth label, and y'_ij is the pixel value at row i, column j of the predicted feature map; through this weighting, the network attends globally to pixels of all classes during training, which improves its segmentation performance.
Further, the multi-scale feature fusion module in step 3 consists of 2 feature upsampling modules. A feature upsampling module operates as follows: first a 3 × 3 depthwise separable convolutional layer and an activation layer, then a 4× upsampling layer. The result output module operates as follows: a 3 × 3 depthwise separable convolutional layer and an activation layer adjust the number of channels to the number of classes, followed by another 3 × 3 depthwise separable convolutional layer, and finally a softmax layer outputs the final prediction feature map.
Further, the activation function of every activation layer is relx, computed as:
[relx activation equation, rendered as an image in the original publication]
where max is the maximum function, α and β are scaling factors, and x is the value at any position of the feature map.
Further, the linear self-attention layer is computed as follows:
the data dimension of the feature map is H × W × C, where H is the height, W the width and C the number of channels of the feature map. The feature map entering the linear self-attention layer is split into 3 branches, each passed through a 1 × 1 depthwise separable convolutional layer, yielding 3 feature maps denoted q, k and v. The LSA is computed as:
[linear self-attention (LSA) equation, rendered as an image in the original publication]
In the formula, the product is a matrix multiplication; n = H × W is the number of q_i, k_i and v_i vectors; q_i is the vector at position i of feature map q, with dimension C; k_i and v_i are defined analogously; sim is a vector-similarity function given by:
[similarity function equation, rendered as an image in the original publication]
where Q and K are vectors of the same dimension, ∙ denotes the vector dot product, and ε is a small constant in the similarity, set to 0.01.
Further, in the second branch, the input of the 1st convolution module is the clothing image data; the input of the 2nd convolution module is the output of the 1st convolution module; the input of the 3rd convolution module is the product of the output of the 1st attention mechanism module and the output of the 2nd convolution module; the input of the 4th convolution module is the output of the 3rd convolution module; the input of the 5th convolution module is the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module; the input of the 6th convolution module is the output of the 5th convolution module; the input of the 7th convolution module is the product of the output of the 3rd attention mechanism module and the output of the 6th convolution module; and the input of the 8th convolution module is the output of the 7th convolution module.
Further, the input of the 1st feature upsampling module in the multi-scale feature fusion module is the product of the output of the 4th attention mechanism module and the output of the 8th convolution module, and the input of the 2nd feature upsampling module is the output of the 1st feature upsampling module plus the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module.
Further, the softmax function is computed as:
softmax(X_i) = exp(X_i) / Σ_{j=1}^{n} exp(X_j)
where X_i is the output value of the i-th channel and n is the number of output channels, i.e., the number of classes.
Further, in step 3 different label colors indicate different garment types, including long sleeves, short sleeves, dresses, trousers, shoes, belts, scarves, glasses, coats, skirts, sweaters, bags, ties and vests.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The attention-based dual-branch feature extraction network and the multi-scale feature fusion network amplify the clothing data features, improving the accuracy of garment image segmentation.
(2) The method uses a large number of depthwise separable convolutions, and the attention layer used is a linear operation, which greatly reduces network inference time; once an image is fed into the network, the labeled segmented image is obtained in a short time.
(3) An effective design scheme is provided for the design of the garment segmentation depth network.
Drawings
Fig. 1 is a schematic view of an implementation flow of a method for segmenting a garment image based on artificial intelligence according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a dual-branch feature extraction module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.
Fig. 3 is a schematic view of an attention mechanism module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a convolution module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a feature upsampling module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a result output module of a clothing image segmentation method based on artificial intelligence according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, fig. 1 is a schematic view of an implementation flow of an artificial intelligence based garment image segmentation method provided in an embodiment, and specifically includes the following steps:
(1) Collect clothing images to form a clothing data set, and label the image segmentation information.
The images are collected from major garment platforms.
(2) Reading pictures in the data set, preprocessing the read pictures, and sending the preprocessed pictures into a clothing feature extraction network of a clothing image segmentation model to extract clothing features so as to obtain a feature map with the clothing features;
the clothing feature extraction network comprises a double-branch feature extraction module;
specifically, the preprocessing operation includes data processing of random cropping, flipping, and image enhancement.
Fig. 2 is a schematic diagram of the dual-branch feature extraction module of the clothing image segmentation method based on artificial intelligence according to an embodiment; the specific processing procedure is as follows: the preprocessed image is passed into the dual-branch feature extraction module, where the first branch and the second branch process data simultaneously. The first branch consists of 4 attention mechanism modules and the second branch of 8 convolution modules. The operations within an attention mechanism module (see fig. 3) are: first a linear self-attention layer, then a 2 × 2 depthwise separable convolutional layer and an activation layer. The operations within a convolution module (see fig. 4) are: first a 7 × 7 depthwise separable convolutional layer, then an activation layer, and finally a 2 × 2 depthwise separable convolutional layer and an activation layer; the 2 × 2 depthwise separable convolutional layers in the 2nd, 4th, 6th and 8th convolution modules change the height and width of the feature map.
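A minimal PyTorch sketch of the convolution module follows. This is a hypothetical implementation: the class names, channel counts and padding scheme are assumptions, and ReLU stands in for the patent's relx activation.

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ConvModule(nn.Module):
    """7x7 DW-separable conv -> activation -> 2x2 DW-separable conv -> activation.
    With downsample=True the 2x2 layer uses stride 2 and halves H and W, as in
    the 2nd, 4th, 6th and 8th convolution modules."""
    def __init__(self, in_ch, out_ch, downsample=False):
        super().__init__()
        self.conv7 = DWSeparableConv(in_ch, out_ch, 7, padding=3)
        # an even 2x2 kernel needs one extra row/column of padding to keep H, W
        self.pad = nn.Identity() if downsample else nn.ZeroPad2d((0, 1, 0, 1))
        self.conv2 = DWSeparableConv(out_ch, out_ch, 2,
                                     stride=2 if downsample else 1)
        self.act = nn.ReLU()  # stand-in for the patent's relx activation

    def forward(self, x):
        x = self.act(self.conv7(x))
        return self.act(self.conv2(self.pad(x)))

x = torch.randn(1, 3, 256, 256)
print(ConvModule(3, 64, downsample=True)(x).shape)  # torch.Size([1, 64, 128, 128])
```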
Specifically, the weighted loss function (Weighted Loss) used in training the clothing image segmentation model is:
[weighted loss equation, rendered as an image in the original publication]
where H is the height of the predicted feature map, W is the width of the predicted feature map, y_ij is the pixel value at row i, column j of the ground-truth label, and y'_ij is the pixel value at row i, column j of the predicted feature map. Through this weighting, the network attends globally to pixels of all classes during training, which improves its segmentation performance.
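The exact weighted-loss formula appears only as an image in the original publication; the following is a minimal sketch assuming a per-pixel weighted cross-entropy over the H × W prediction, where `class_weights` is a hypothetical stand-in for the patent's weighting scheme.

```python
import torch
import torch.nn.functional as F

def weighted_segmentation_loss(pred, target, class_weights):
    """Per-pixel weighted cross-entropy over an H x W prediction.

    pred:   (B, n_classes, H, W) raw scores from the network
    target: (B, H, W) integer class index per pixel
    class_weights: (n_classes,) weight per garment class, so that rare
                   classes still contribute to the global loss
    """
    # cross_entropy averages over all B*H*W pixels, weighting each pixel
    # by the weight of its ground-truth class
    return F.cross_entropy(pred, target, weight=class_weights)

pred = torch.randn(2, 15, 64, 64)             # 14 garment classes + background
target = torch.randint(0, 15, (2, 64, 64))
weights = torch.ones(15)
print(weighted_segmentation_loss(pred, target, weights))
```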
Specifically, the activation function of the activation layers and all subsequent activation operations is relx, computed as:
[relx activation equation, rendered as an image in the original publication]
where max is the maximum function, α and β are scaling factors (here α is set to 0.7 and β to 1.5), and x is the value at any position of the feature map.
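The relx formula itself is an image in the original; given that the description mentions only a max function and the scaling factors α and β, one plausible reading, used here purely as an assumption, is f(x) = max(αx, βx):

```python
import torch

def relx(x, alpha=0.7, beta=1.5):
    # Assumed form f(x) = max(alpha*x, beta*x): behaves like a leaky ReLU
    # with slope beta (1.5) for positive inputs and alpha (0.7) for negatives.
    return torch.maximum(alpha * x, beta * x)

print(relx(torch.tensor([-2.0, 0.0, 2.0])))  # tensor([-1.4000, 0.0000, 3.0000])
```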
Specifically, the linear self-attention layer is computed as follows: the data dimension of the feature map is H × W × C, where H is the height, W the width and C the number of channels of the feature map. The feature map entering the linear self-attention (Linear Self-Attention, LSA) layer is split into 3 branches, each passed through a 1 × 1 depthwise separable convolutional layer, yielding 3 feature maps denoted q, k and v. The LSA is computed as:
[linear self-attention (LSA) equation, rendered as an image in the original publication]
In the formula, the product is a matrix multiplication; n = H × W is the total number of vectors; q_i is the vector at position i of feature map q, with dimension C; k_i and v_i are defined analogously; sim is a vector-similarity function given by:
[similarity function equation, rendered as an image in the original publication]
where Q and K are vectors of the same dimension, ∙ denotes the vector dot product, and ε is a small constant in the similarity, set to 0.01. Through this attention mechanism the similarity of each pixel to its surrounding pixels is computed, and higher similarity yields a larger weight.
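The LSA and sim equations are images in the original; the sketch below assumes the standard normalized form of linear attention, out_i = Σ_j sim(q_i, k_j) v_j / Σ_j sim(q_i, k_j), with a non-negative dot-product similarity and ε = 0.01 used as a denominator stabilizer. Both choices are assumptions consistent with, but not confirmed by, the description.

```python
import torch
import torch.nn as nn

class LinearSelfAttention(nn.Module):
    """Assumed LSA: q, k and v come from 1x1 convolutions (a 1x1 depthwise
    separable convolution is a per-channel scale plus a pointwise mix; a plain
    1x1 convolution is used here for brevity), followed by a similarity-weighted
    average of the v vectors at every spatial position."""
    def __init__(self, channels, eps=0.01):
        super().__init__()
        self.eps = eps
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # flatten the n = H*W spatial positions: (B, n, C)
        q = self.to_q(x).flatten(2).transpose(1, 2)
        k = self.to_k(x).flatten(2).transpose(1, 2)
        v = self.to_v(x).flatten(2).transpose(1, 2)
        # non-negative features keep the similarity weights non-negative
        q, k = torch.relu(q), torch.relu(k)
        # out_i = sum_j sim(q_i,k_j) v_j / sum_j sim(q_i,k_j), sim = dot product;
        # computing k^T v first keeps the cost linear in n
        kv = k.transpose(1, 2) @ v                                   # (B, C, C)
        num = q @ kv                                                 # (B, n, C)
        den = q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + self.eps
        return (num / den).transpose(1, 2).reshape(b, c, h, w)

attn = LinearSelfAttention(64)
print(attn(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```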
Specifically, in the second branch, the input of the 1st convolution module is the clothing image data; the input of the 2nd convolution module is the output of the 1st convolution module; the input of the 3rd convolution module is the product of the output of the 1st attention mechanism module and the output of the 2nd convolution module; the input of the 4th convolution module is the output of the 3rd convolution module; the input of the 5th convolution module is the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module; the input of the 6th convolution module is the output of the 5th convolution module; the input of the 7th convolution module is the product of the output of the 3rd attention mechanism module and the output of the 6th convolution module; and the input of the 8th convolution module is the output of the 7th convolution module.
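Using the hypothetical `ConvModule`, `DWSeparableConv` and `LinearSelfAttention` classes from the sketches above, the wiring of the two branches might look as follows; the channel sizes and the chaining of the attention modules (whose inputs the text does not state explicitly) are assumptions:

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """First-branch building block: LSA, then a 2x2 DW-separable convolution
    (stride 2, so the spatial size tracks the conv branch) and an activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.lsa = LinearSelfAttention(in_ch)
        self.conv = DWSeparableConv(in_ch, out_ch, 2, stride=2)
        self.act = nn.ReLU()  # stand-in for relx

    def forward(self, x):
        return self.act(self.conv(self.lsa(x)))

class DualBranchExtractor(nn.Module):
    """Each attention module's output multiplies (gates) the output of every
    second convolution module, and the product feeds the next convolution
    module, per the wiring described above."""
    def __init__(self, chs=(3, 32, 64, 128, 256)):  # channel sizes assumed
        super().__init__()
        self.convs = nn.ModuleList()
        self.attns = nn.ModuleList()
        for i in range(4):
            self.convs.append(ConvModule(chs[i], chs[i + 1]))
            self.convs.append(ConvModule(chs[i + 1], chs[i + 1], downsample=True))
            self.attns.append(AttentionModule(chs[i], chs[i + 1]))

    def forward(self, img):
        x, a, gated = img, img, []
        for i in range(4):
            x = self.convs[2 * i](x)      # convolution modules 1, 3, 5, 7
            x = self.convs[2 * i + 1](x)  # convolution modules 2, 4, 6, 8
            a = self.attns[i](a)          # attention modules 1-4, chained
            x = a * x                     # the product feeds the next module
            gated.append(x)               # products 2 and 4 are reused later
        return gated
```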
(3) The feature map with the clothing features is sent into the clothing feature fusion and restoration network of the clothing image segmentation model to obtain a prediction probability map; the category of each pixel is determined from the channel index with the maximum probability, and the picture is rendered according to the different categories to obtain a color label map (a sketch of this rendering step follows the class list below).
The clothing feature fusion and restoration network comprises a multi-scale feature fusion module and a result output module.
Different label colors represent different garment types, including long sleeves, short sleeves, dresses, trousers, shoes, belts, scarves, glasses, coats, skirts, sweaters, bags, ties and vests.
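A sketch of the rendering step; the palette and the class order are illustrative, since the patent fixes neither:

```python
import numpy as np

# one RGB color per garment class; class order and colors are assumed
CLASSES = ["background", "long sleeves", "short sleeves", "dress", "trousers",
           "shoes", "belt", "scarf", "glasses", "coat", "skirt", "sweater",
           "bag", "tie", "vest"]
PALETTE = np.random.RandomState(0).randint(0, 255, size=(len(CLASSES), 3),
                                           dtype=np.uint8)

def render_label_map(prob_map):
    """prob_map: (n_classes, H, W) prediction probabilities.
    Each pixel takes the class whose channel has the maximum probability
    and is painted with that class's color."""
    class_idx = prob_map.argmax(axis=0)   # (H, W) channel index of max prob
    return PALETTE[class_idx]             # (H, W, 3) color label map

probs = np.random.rand(len(CLASSES), 64, 64)
print(render_label_map(probs).shape)      # (64, 64, 3)
```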
The multi-scale feature fusion module consists of 2 feature upsampling modules. A feature upsampling module (see fig. 5) operates as follows: first a 3 × 3 depthwise separable convolutional layer and an activation layer, then a 4× upsampling layer. The result output module (see fig. 6) operates as follows: a 3 × 3 depthwise separable convolutional layer and an activation layer adjust the number of channels to the number of classes, followed by another 3 × 3 depthwise separable convolutional layer, and finally a softmax layer outputs the final prediction feature map. The softmax function is computed as:
Figure 30220DEST_PATH_IMAGE007
in the formula X i Is the output value of the ith channel, and n is the number of output channels, i.e. the number of classified categories.
Specifically, the inputs of the two feature upsampling modules in the multi-scale feature fusion module are: the input of the 1st feature upsampling module is the product of the output of the 4th attention mechanism module and the output of the 8th convolution module; the input of the 2nd feature upsampling module is the output of the 1st feature upsampling module plus the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module.
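Continuing the naming assumptions of the earlier sketches, the fusion and output stage might be wired as follows; the 4× upsampling mode and channel sizes are assumptions:

```python
import torch
import torch.nn as nn

class FeatureUpsampleModule(nn.Module):
    """3x3 DW-separable conv -> activation -> 4x upsampling (fig. 5)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = DWSeparableConv(in_ch, out_ch, 3, padding=1)
        self.act = nn.ReLU()  # stand-in for relx
        self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False)

    def forward(self, x):
        return self.up(self.act(self.conv(x)))

class ResultOutputModule(nn.Module):
    """3x3 DW-separable conv + activation to n_classes channels, another 3x3
    DW-separable conv, then softmax over the channel dimension (fig. 6)."""
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.head = nn.Sequential(
            DWSeparableConv(in_ch, n_classes, 3, padding=1), nn.ReLU(),
            DWSeparableConv(n_classes, n_classes, 3, padding=1),
            nn.Softmax(dim=1))

    def forward(self, x):
        return self.head(x)

class FusionRestorationNetwork(nn.Module):
    """up1 takes the (attention 4 x conv 8) product; up2 takes up1's output
    plus the (attention 2 x conv 4) product, per the wiring described above."""
    def __init__(self, chs=(3, 32, 64, 128, 256), n_classes=15):
        super().__init__()
        self.up1 = FeatureUpsampleModule(chs[4], chs[2])  # to stage-2 size
        self.up2 = FeatureUpsampleModule(chs[2], chs[2])
        self.out = ResultOutputModule(chs[2], n_classes)

    def forward(self, gated):             # gated: DualBranchExtractor output
        x = self.up1(gated[3])            # attention 4 x conv 8 product
        x = self.up2(x + gated[1])        # plus attention 2 x conv 4 product
        return self.out(x)                # (B, n_classes, H, W) probabilities

# end-to-end shape check with the sketches above (hypothetical sizes)
feats = DualBranchExtractor()(torch.randn(1, 3, 256, 256))
print(FusionRestorationNetwork()(feats).shape)  # torch.Size([1, 15, 256, 256])
```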
The invention provides a garment image segmentation method based on artificial intelligence that uses an attention-based dual-branch feature extraction network and a multi-scale feature fusion network to amplify clothing data features. It uses a large number of depthwise separable convolutions, and the attention layer used is a linear operation, which greatly reduces network inference time; once an image is fed into the network, the labeled segmented image is obtained in a short time. The method has been applied in a cooperation between the team and Wuhan Microlite Science and Technology Co., Ltd.: the clothing image data were crawled from shopping websites such as Amazon and Taobao, with technical support provided by the company; a data set was produced by the method of this embodiment and used to train the garment image segmentation model obtained with our network. Under verification in real conditions, the labeled garment images generated by the model show a clear improvement over existing networks in both segmentation quality and label-image generation efficiency.
Various modifications and alterations of this application may be made by those skilled in the art without departing from the spirit and scope of this application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (7)

1. A clothing image segmentation method based on artificial intelligence is characterized by comprising the following steps:
step 1, collecting clothing images to form a clothing data set, and labeling image segmentation information;
step 2, reading the pictures in the data set, preprocessing the read pictures, and sending the preprocessed pictures into a clothing feature extraction network of a clothing image segmentation model to extract clothing features so as to obtain a feature map with the clothing features;
the clothing feature extraction network is a dual-branch feature extraction module whose two branches process data simultaneously;
the specific processing procedure of the dual-branch feature extraction module in step 2 is as follows: the preprocessed image is passed into the dual-branch feature extraction module, where the first branch and the second branch process data simultaneously; the first branch consists of 4 attention mechanism modules and the second branch consists of 8 convolution modules; the specific operations within an attention mechanism module are: first a linear self-attention layer, then a 2 × 2 depthwise separable convolutional layer and an activation layer; the specific operations within a convolution module are: a 7 × 7 depthwise separable convolutional layer, an activation layer, and then a 2 × 2 depthwise separable convolutional layer and an activation layer, wherein the 2 × 2 depthwise separable convolutional layers in the 2nd, 4th, 6th and 8th convolution modules change the height and width of the feature map;
the linear self-attention layer is computed as follows:
the data dimension of the feature map is H × W × C, where H is the height, W the width and C the number of channels of the feature map; the feature map entering the linear self-attention layer is split into 3 branches, each passed through a 1 × 1 depthwise separable convolutional layer, yielding 3 feature maps denoted q, k and v, and the LSA is computed as:
[linear self-attention (LSA) equation, rendered as an image in the original publication]
in the formula, the product is a matrix multiplication; n = H × W is the number of q_i, k_i and v_i vectors; q_i is the vector at position i of feature map q, with dimension C; k_i and v_i are defined analogously; sim is a vector-similarity function given by:
[similarity function equation, rendered as an image in the original publication]
where Q and K are vectors of the same dimension, ∙ denotes the vector dot product, and ε is a small constant in the similarity, set to 0.01;
step 3, sending the feature map with the clothing features into the clothing feature fusion and restoration network of the clothing image segmentation model to obtain a prediction probability map, determining the category of each pixel from the channel index with the maximum probability, and rendering the picture according to the different categories to obtain a color label map;
the clothing feature fusion and restoration network comprises a multi-scale feature fusion module for fusing the feature maps and a result output module for outputting the category of each pixel;
the weighted loss function used in training the clothing image segmentation model is as follows:
[weighted loss equation, rendered as an image in the original publication]
where H is the height of the predicted feature map, W is the width of the predicted feature map, y_ij is the pixel value at row i, column j of the ground-truth label, and y'_ij is the pixel value at row i, column j of the predicted feature map; through this weighting, the network attends globally to pixels of all classes during training, improving its segmentation performance.
2. The artificial intelligence based garment image segmentation method of claim 1, wherein: in step 3, the multi-scale feature fusion module consists of 2 feature upsampling modules, where a feature upsampling module operates as follows: first a 3 × 3 depthwise separable convolutional layer and an activation layer, then a 4× upsampling layer; the result output module operates as follows: a 3 × 3 depthwise separable convolutional layer and an activation layer adjust the number of channels to the number of classes, followed by another 3 × 3 depthwise separable convolutional layer, and finally a softmax layer outputs the final prediction feature map.
3. The artificial intelligence based garment image segmentation method of claim 2, wherein: the activation function of every activation layer is relx, computed as:
[relx activation equation, rendered as an image in the original publication]
where max is the maximum function, α and β are scaling factors, and x is the value at any position of the feature map.
4. The artificial intelligence based garment image segmentation method of claim 1, wherein: in the second branch, the input of the 1st convolution module is the clothing image data; the input of the 2nd convolution module is the output of the 1st convolution module; the input of the 3rd convolution module is the product of the output of the 1st attention mechanism module and the output of the 2nd convolution module; the input of the 4th convolution module is the output of the 3rd convolution module; the input of the 5th convolution module is the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module; the input of the 6th convolution module is the output of the 5th convolution module; the input of the 7th convolution module is the product of the output of the 3rd attention mechanism module and the output of the 6th convolution module; and the input of the 8th convolution module is the output of the 7th convolution module.
5. The artificial intelligence based garment image segmentation method of claim 2, wherein: the input of the 1st feature upsampling module in the multi-scale feature fusion module is the product of the output of the 4th attention mechanism module and the output of the 8th convolution module, and the input of the 2nd feature upsampling module is the output of the 1st feature upsampling module plus the product of the output of the 2nd attention mechanism module and the output of the 4th convolution module.
6. The artificial intelligence based garment image segmentation method of claim 2, wherein: the softmax function is computed as:
softmax(X_i) = exp(X_i) / Σ_{j=1}^{n} exp(X_j)
where X_i is the output value of the i-th channel and n is the number of output channels, i.e., the number of classes.
7. The artificial intelligence based garment image segmentation method of claim 1, wherein: in step 3, different label colors represent different garment types, including long sleeves, short sleeves, dresses, trousers, shoes, belts, scarves, glasses, coats, skirts, sweaters, bags, ties and vests.
CN202211332438.0A 2022-10-28 2022-10-28 Garment image segmentation method based on artificial intelligence Active CN115393596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211332438.0A CN115393596B (en) 2022-10-28 2022-10-28 Garment image segmentation method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211332438.0A CN115393596B (en) 2022-10-28 2022-10-28 Garment image segmentation method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN115393596A CN115393596A (en) 2022-11-25
CN115393596B true CN115393596B (en) 2023-02-21

Family

ID=84115221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211332438.0A Active CN115393596B (en) 2022-10-28 2022-10-28 Garment image segmentation method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115393596B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090670B (en) * 2023-04-03 2023-07-14 武汉纺织大学 Clothing fashion trend prediction method based on multiple attributes
CN116563553B (en) * 2023-07-10 2023-09-29 武汉纺织大学 Unmanned aerial vehicle image segmentation method and system based on deep learning
CN117689562B (en) * 2023-12-13 2024-06-07 北京中科金财科技股份有限公司 Virtual reloading method based on artificial intelligent diffusion model

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110663971B (en) * 2018-07-02 2022-03-29 天津工业大学 Red date quality classification method based on double-branch deep fusion convolutional neural network
CN111311518B (en) * 2020-03-04 2023-05-26 清华大学深圳国际研究生院 Image denoising method and device based on multi-scale mixed attention residual error network
CN111563508B (en) * 2020-04-20 2023-05-23 华南理工大学 Semantic segmentation method based on spatial information fusion
CN111667489B (en) * 2020-04-30 2022-04-05 华东师范大学 Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning
CN111860693A (en) * 2020-07-31 2020-10-30 元神科技(杭州)有限公司 Lightweight visual target detection method and system
CN112329658B (en) * 2020-11-10 2024-04-02 江苏科技大学 Detection algorithm improvement method for YOLOV3 network
CN113486851B (en) * 2021-07-28 2023-04-18 齐齐哈尔大学 Hyperspectral image classification method based on double-branch spectrum multi-scale attention network
CN113807356B (en) * 2021-07-29 2023-07-25 北京工商大学 End-to-end low-visibility image semantic segmentation method
CN113807206B (en) * 2021-08-30 2023-04-07 电子科技大学 SAR image target identification method based on denoising task assistance
CN114596317A (en) * 2022-03-15 2022-06-07 东北大学 CT image whole heart segmentation method based on deep learning
CN114708272A (en) * 2022-03-16 2022-07-05 杭州知衣科技有限公司 Garment image segmentation model establishing method and garment image segmentation method
CN114943963B (en) * 2022-04-29 2023-07-04 南京信息工程大学 Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network

Also Published As

Publication number Publication date
CN115393596A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN115393596B (en) Garment image segmentation method based on artificial intelligence
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
Ci et al. Video object segmentation by learning location-sensitive embeddings
CN109815867A (en) A kind of crowd density estimation and people flow rate statistical method
Tian et al. Review of object instance segmentation based on deep learning
CN110598554A (en) Multi-person posture estimation method based on counterstudy
CN111832443B (en) Construction method and application of construction violation detection model
Li et al. Adaptive deep convolutional neural networks for scene-specific object detection
CN108665481A (en) Multilayer depth characteristic fusion it is adaptive resist block infrared object tracking method
CN108898620A (en) Method for tracking target based on multiple twin neural network and regional nerve network
CN108520203B (en) Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature
CN111639571B (en) Video action recognition method based on contour convolution neural network
CN111027377B (en) Double-flow neural network time sequence action positioning method
Nuanmeesri A hybrid deep learning and optimized machine learning approach for rose leaf disease classification
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
CN112446356B (en) Method for detecting text with arbitrary shape in natural scene based on multiple polar coordinates
Hu et al. Real-time Target Tracking Based on PCANet-CSK Algorithm
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN113936309A (en) Facial block-based expression recognition method
CN113420827A (en) Semantic segmentation network training and image semantic segmentation method, device and equipment
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
Zhu et al. Supplement and suppression: Both boundary and nonboundary are helpful for salient object detection
Liu et al. Video face detection based on improved SSD model and target tracking algorithm
CN113763417B (en) Target tracking method based on twin network and residual error structure
CN111488797B (en) Pedestrian re-identification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant