CN114882253A - Fabric weave matching method based on contrast learning and self-attention mechanism - Google Patents

Fabric weave matching method based on contrast learning and self-attention mechanism

Info

Publication number
CN114882253A
CN114882253A (Application CN202210645732.0A)
Authority
CN
China
Prior art keywords
self
fabric
learning
images
weave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210645732.0A
Other languages
Chinese (zh)
Inventor
沈启承
罗钇凯
俞可扬
吴子朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210645732.0A priority Critical patent/CN114882253A/en
Publication of CN114882253A publication Critical patent/CN114882253A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
                • G06N 3/084 Backpropagation, e.g. using gradient descent
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/40 Extraction of image or video features
              • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
                • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
            • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
              • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a fabric weave matching method based on contrastive learning and a self-attention mechanism. A contrastive learning network comprising a residual network and a self-encoder is designed on a basic deep learning architecture; a contrastive loss function is used to compute the error between positive and negative examples, and iterative training is performed with gradient back-propagation. The trained self-encoder is then cascaded with multi-head self-attention layers to extract the global information and local details of the image to be retrieved; features are output through a fully connected layer and a normalized residual connection layer, and probability projection ranking is performed on these features to match the most suitable fabric weave. The method solves the problem of matching fabric weave patterns, introduces a multi-head self-attention mechanism into the traditional contrastive learning framework, achieves better matching of fabric weave images, and improves the precision and efficiency of target recognition and matching.

Description

Fabric weave matching method based on contrast learning and self-attention mechanism
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a fabric weave matching method based on contrastive learning and a self-attention mechanism.
Background
Fabric weave matching technology can help fabric manufacturers manage fabric information conveniently, and can also help downstream users such as fabric dealers and garment designers quickly find fabrics with a similar weave based on an existing fabric sample, so it has very broad applications. At present, products that apply deep learning to fabric weave matching are rare on the market; existing products have difficulty judging the fabric weave accurately, and because of their low recognition accuracy the fabrics they match often do not use the same weave, which greatly reduces the efficiency and practicality of fabric matching.
Computer vision image recognition mostly adopts deep networks trained by self-supervised or unsupervised learning. Compared with traditional image processing techniques based on purely mathematical methods, such as image filters, edge extraction and sharpening, deep learning can extract hierarchical features, handle high-dimensional sparse data such as images better, perform multiple nonlinear combinations of key information, and extract deeper features from the images, and therefore achieves better robustness and recognition accuracy.
However, traditional supervised learning based on convolutional neural networks needs a large amount of labeled training data and parameter tuning to obtain a relatively ideal result, and accurately labeling image data undoubtedly consumes a great deal of manpower and material resources; in addition, mislabeled samples mixed into the sample set may affect the accuracy of the model, which is not conducive to model training. Self-supervised learning represented by contrastive learning can train a model without labels on the sample data, and has shown recognition performance and accuracy superior to traditional convolutional neural networks; moreover, the self-encoder obtained by contrastive learning can be extracted separately and combined with other network structures. When the self-encoder is used for downstream tasks, the network structure is simpler than a traditional convolutional neural network, and fewer parameters and less computing resources are required. Applying the self-encoder to the fabric weave matching problem overcomes the inherent shortcomings of traditional neural networks in accuracy, robustness and training efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a fabric weave matching method based on contrastive learning and a self-attention mechanism. A self-encoder is trained with a basic deep learning architecture, and a multi-head self-attention mechanism is introduced into the traditional contrastive learning framework, which achieves better matching of fabric weave images and improves the precision and efficiency of target recognition and matching.
A fabric weave matching method based on contrastive learning and a self-attention mechanism specifically comprises the following steps:
Step 1: collect fabric images of various weaves, and take the fabric weave in each image as its label. Apply data enhancement and edge enhancement to the fabric images, scale them to the same size, and store them in a pattern matching database as tensors.
Step 2: construct a contrastive learning network comprising a self-encoder for feature extraction and a projection layer for nonlinear transformation. The projection layer consists of two sequentially cascaded residual networks and a linear rectification function.
Step 3: input the tensors and labels in the pattern matching database into the contrastive learning network constructed in step 2, and perform multiple rounds of iterative training. Compute the error and loss between positive and negative examples with a contrastive loss function, and back-propagate the gradients to the parameters of the contrastive learning network.
Step 4: input the fabric image to be matched into the self-encoder trained in step 3 to obtain a feature map, pass it through several multi-head self-attention layers in which dot products with the query, key and value matrices are computed, and then output a feature vector through a fully connected layer, normalization and a residual connection in sequence.
Step 5: input the feature vector obtained in step 4 into a classifier, perform probability projection ranking on it, and output the label of the fabric image to be matched. According to the obtained label, fabric images with the same label are retrieved from the pattern matching database as the matching result.
The invention has the following beneficial effects:
A contrastive-learning self-encoder with high recognition efficiency is pre-trained on the training set of the fabric weave pattern matching database; the key feature information in the image is extracted, and the fabric weave is judged accurately through pattern matching.
Drawings
FIG. 1 is a flow chart of the fabric weave matching method based on contrastive learning and a self-attention mechanism;
FIG. 2 shows fabric images of different weaves collected in the embodiment;
FIG. 3 is a schematic diagram of a residual network structure;
fig. 4 is a back propagation training flow diagram.
Detailed Description
The invention is further explained below with reference to the drawings; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, a fabric weave matching method based on contrastive learning and a self-attention mechanism specifically includes the following steps:
Step 1: collect 500,000 fabric images of various weaves by photomicrography; according to the different interlacing rules of the warp and weft yarns, divide them into plain weave, twill weave, satin weave, and various combined weaves formed by different combinations of these three basic weaves, and take the fabric weave in each image as its label. As shown in FIG. 2, a, b and c represent plain weave, twill weave and satin weave, respectively. Edge enhancement is applied to the fabric images with a supersampling technique and a Sobel-operator-based edge enhancement technique to highlight detail features. Finally, the preprocessed images are uniformly scaled to 128×128 pixels and stored in the pattern matching database as tensors.
The supersampling technique synthesizes several pixels of the image into one super pixel, so that the processed image has better quality, shows more detail, and has a higher effective pixel density.
Image processing based on the Sobel operator is a traditional image processing technique in which the image is convolved with the operator to extract its edges. Since fabric weave information is mostly contained in the edge regions of the textile, edge enhancement effectively reduces useless information, greatly improves training efficiency and accuracy, and reduces the computation cost. The Sobel operator used in this embodiment consists of two 3×3 matrices, one for the horizontal direction and one for the vertical direction, which are convolved with the image in the plane to obtain approximations of the horizontal and vertical luminance differences. The Sobel operator is as follows:
$$G_x=\begin{bmatrix}-1&0&+1\\-2&0&+2\\-1&0&+1\end{bmatrix}\ast A,\qquad G_y=\begin{bmatrix}+1&+2&+1\\0&0&0\\-1&-2&-1\end{bmatrix}\ast A$$
where A denotes the original image before processing, and Gx and Gy denote the images obtained from horizontal and vertical edge detection, respectively. The gradient approximation G at each pixel of the original image, combining the horizontal and vertical components, can be expressed as:
$$G=\sqrt{G_x^{2}+G_y^{2}}$$
the gradient direction θ is:
$$\theta=\arctan\!\left(\frac{G_y}{G_x}\right)$$
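As an illustration, the following is a minimal sketch of the Sobel-based edge-enhancement step on a grayscale image; the SciPy-based convolution and the blend weight alpha are assumptions of this sketch rather than details fixed by the embodiment.

```python
import numpy as np
from scipy.signal import convolve2d

# The two 3x3 Sobel kernels described above (horizontal and vertical).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=np.float32)

def sobel_edge_enhance(gray: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Convolve a grayscale image with the Sobel kernels and blend the gradient
    magnitude back into the original to emphasise weave edges (alpha is an
    assumed blend weight, not specified in the embodiment)."""
    gx = convolve2d(gray, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(gray, SOBEL_Y, mode="same", boundary="symm")
    g = np.sqrt(gx ** 2 + gy ** 2)                 # gradient magnitude G
    g = g / (g.max() + 1e-8) * 255.0               # rescale to the image range
    return np.clip(gray.astype(np.float32) + alpha * g, 0, 255).astype(np.uint8)
```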
and 2, constructing a contrast learning network, wherein the contrast learning network comprises a self-encoder for feature extraction and a projection layer for nonlinear transformation. In this embodiment, the self-encoder is composed of 3 sequentially cascaded fully-connected layers. The projection layer comprises two layers of residual networks as shown in fig. 3 and a linear rectification function.
Step 3: input the tensors and labels in the pattern matching database into the contrastive learning network constructed in step 2 in batches, and perform multiple rounds of iterative training. Before being fed into the contrastive learning network, each original picture collected in step 1 is randomly cropped, rotated, flipped or colour-shifted to produce several augmented pictures, and the augmented images of an original picture are taken as its positive examples. To improve learning, other pictures that are not in the pattern matching database are randomly selected and, after the same augmentation, serve as negative examples of the original picture. For any image, its representation after the neural network is required to be closer to its positive examples and farther from its negative examples, which realizes label-free self-supervised learning. The model is gradually optimized by back-propagation and gradient descent: the error and loss between positive and negative examples are computed with a contrastive loss function, which pulls positive examples as close together as possible and pushes negative examples as far apart as possible, and the gradients are back-propagated to the parameters of the contrastive learning network according to the training result.
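The augmentations used to build positive pairs could look like the following torchvision sketch; the concrete crop scale, rotation range and colour-jitter strengths are assumptions, as the embodiment only names the operations (cropping, rotation, flipping, colour change).

```python
from torchvision import transforms

# Assumed augmentation pipeline; every parameter value here is illustrative.
augment = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.6, 1.0)),
    transforms.RandomRotation(30),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

def make_positive_pair(pil_image):
    """Two independent augmentations of the same fabric image form a positive pair;
    augmented views of other images serve as negative examples."""
    return augment(pil_image), augment(pil_image)
```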
As shown in FIG. 4, in this embodiment gradient back-propagation is realized by computing the normalized temperature-scaled cross entropy (NT-Xent) loss; the specific steps are as follows, with a code sketch given after the list:
and s3.1, calculating cosine similarity of the characteristic images output by the projection layer in the comparative learning network.
s3.2, calculating the similar probability among different images by using the cosine similarity between the normalized index function and the different images; in a batch, there are comparisons between the own augmented image and other augmented images of the own image, and also between the own augmented image and the augmented images of other images, there will be a plurality of pairs of pairing combinations, all of which are required to calculate cosine similarity one by one and calculate probability of similarity using a normalized index function.
s3.3, calculating the loss of the set of images using the noise contrast estimation loss.
And s3.4, calculating the loss of all pairs of the whole batch and averaging to obtain the normalized temperature-scale cross entropy loss of the training.
And s3.5, reversely transmitting the result of the normalized temperature-scale cross entropy loss function to a comparison learning network, and dynamically adjusting parameters of the self-encoder and the projection layer.
And s3.6, carrying out repeated iterative training calculation according to gradient descent, and reducing the iterative times and the training time by adopting an adaptive moment estimation optimizer.
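A compact sketch of steps s3.1 to s3.4 is shown below, assuming two augmented views per image in a batch and an assumed temperature value; it follows the usual NT-Xent formulation rather than any implementation detail given in the embodiment.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss for one batch: z1 and z2 are the projection-layer outputs of the
    two augmented views of the same images, each of shape (N, d)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # s3.1: unit vectors, so dot products are cosine similarities
    sim = z @ z.t() / temperature                          # all pairwise similarities, temperature-scaled
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    # s3.2-s3.4: softmax over each row (normalized exponential), cross entropy against
    # the positive pair, averaged over all 2N samples in the batch
    return F.cross_entropy(sim, targets)
```

For steps s3.5 and s3.6, this loss would be back-propagated and the parameters updated with torch.optim.Adam, the adaptive moment estimation optimizer mentioned above.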
Step 4: input the fabric image to be matched into the self-encoder trained in step 3 to obtain a feature map, pass it through 3 multi-head self-attention layers in which dot products with the query, key and value matrices are computed, and finally output a feature vector through a fully connected layer, normalization and a residual connection in sequence.
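One of these layers might be sketched in PyTorch as follows; the embedding width, the number of heads and the token layout of the feature map are assumptions, while the attention, fully connected layer, normalization and residual connection follow the order described above.

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One multi-head self-attention layer: dot-product attention over query, key and
    value projections, then a fully connected layer, normalization and a residual
    connection (width and head count are assumptions)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, tokens, dim)
        a, _ = self.attn(x, x, x)              # Q, K and V all come from the feature map
        return self.norm(x + self.fc(a))       # fully connected layer, residual, normalization

# The embodiment uses 3 such layers in sequence.
attention_stack = nn.Sequential(*[AttentionBlock() for _ in range(3)])
```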
Step 5: input the feature vector obtained in step 4 into a classifier, perform pattern matching by probability projection ranking, and output the matching result of the fabric weave. In this embodiment, not only the result with the highest similarity probability for the fabric weave in the image is presented to the user; several results with high similarity probabilities are also presented as similarity recommendations.
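The ranking step could be sketched as follows, assuming a linear classifier head and a simple label lookup over the pattern matching database; the top_k cut-off and the database representation are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def rank_and_match(feature, classifier, db_images, db_labels, top_k=5):
    """Probability projection ranking: project the feature vector onto weave-class
    probabilities with softmax, return the best-matching weave label, the database
    images sharing that label, and several high-probability weaves as similarity
    recommendations."""
    probs = F.softmax(classifier(feature), dim=-1)       # probability projection
    ranked = torch.argsort(probs, descending=True)       # weave classes ranked by probability
    best = ranked[0].item()
    matches = [img for img, lbl in zip(db_images, db_labels) if lbl == best]
    return best, matches, ranked[:top_k].tolist()
```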

Claims (8)

1. A fabric weave matching method based on contrastive learning and a self-attention mechanism, characterized in that the method specifically comprises the following steps:
step 1, collecting fabric images of various weaves, and taking the fabric weave in each image as its label; performing data enhancement and edge enhancement on the fabric images, scaling them to the same size, and storing them in a pattern matching database as tensors;
step 2, constructing a contrastive learning network comprising a self-encoder for feature extraction and a projection layer for nonlinear transformation, the projection layer comprising two sequentially cascaded residual networks and a linear rectification function;
step 3, inputting the tensors and labels in the pattern matching database into the contrastive learning network constructed in step 2 and performing multiple rounds of iterative training; back-propagating gradients to the parameters of the contrastive learning network using a contrastive loss function;
step 4, inputting the fabric image to be matched into the self-encoder trained in step 3 to obtain a feature map, passing it through several multi-head self-attention layers in which dot products with the query, key and value matrices are computed, and finally outputting a feature vector through a fully connected layer, normalization and a residual connection in sequence;
step 5, inputting the feature vector obtained in step 4 into a classifier, performing probability projection ranking on it, and outputting the label of the fabric image to be matched; according to the obtained label, taking fabric images with the same label in the pattern matching database as the matching result.
2. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1, characterized in that: the weaves of the collected fabric images include plain weave, twill weave, satin weave, and combinations of two or more of them.
3. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1, characterized in that: edge enhancement is applied to the fabric images using a supersampling technique and a Sobel-operator-based edge enhancement technique.
4. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1, characterized in that: the Sobel operator is:
$$G_x=\begin{bmatrix}-1&0&+1\\-2&0&+2\\-1&0&+1\end{bmatrix}\ast A,\qquad G_y=\begin{bmatrix}+1&+2&+1\\0&0&0\\-1&-2&-1\end{bmatrix}\ast A$$
wherein A denotes the original image before processing, and Gx and Gy denote the images obtained from horizontal and vertical edge detection, respectively; the gradient approximation G at each pixel of the original image is expressed as:
$$G=\sqrt{G_x^{2}+G_y^{2}}$$
the gradient direction θ is:
$$\theta=\arctan\!\left(\frac{G_y}{G_x}\right)$$
5. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1, characterized in that: the self-encoder comprises 3 sequentially cascaded fully connected layers, and the number of multi-head self-attention layers is 3.
6. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1, characterized in that: the contrastive loss function is the normalized temperature-scaled cross entropy loss.
7. The fabric weave matching method based on contrastive learning and a self-attention mechanism according to claim 1 or 6, characterized in that the contrastive learning network is trained by the following specific steps:
s3.1, inputting the fabric images into the contrastive learning network in batches, and computing the cosine similarity of the feature vectors output by the projection layer;
s3.2, computing the similarity probabilities of different images with the normalized exponential function according to the cosine similarity results;
s3.3, computing the loss of a group of images using the noise contrastive estimation loss;
s3.4, computing the losses of all pairs in the whole batch and averaging them to obtain the normalized temperature-scaled cross entropy loss of this training round;
s3.5, back-propagating the result of the normalized temperature-scaled cross entropy loss function through the contrastive learning network, and dynamically adjusting the parameters of the self-encoder and the projection layer;
s3.6, repeating the iterative training by gradient descent, and using an adaptive moment estimation optimizer to reduce the number of iterations and the training time.
8. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to carry out the method of any one of claims 1 to 6.
CN202210645732.0A 2022-06-08 2022-06-08 Fabric weave matching method based on contrast learning and self-attention mechanism Pending CN114882253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210645732.0A CN114882253A (en) 2022-06-08 2022-06-08 Fabric weave matching method based on contrast learning and self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210645732.0A CN114882253A (en) 2022-06-08 2022-06-08 Fabric weave matching method based on contrast learning and self-attention mechanism

Publications (1)

Publication Number Publication Date
CN114882253A true CN114882253A (en) 2022-08-09

Family

ID=82680805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210645732.0A Pending CN114882253A (en) 2022-06-08 2022-06-08 Fabric weave matching method based on contrast learning and self-attention mechanism

Country Status (1)

Country Link
CN (1) CN114882253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240036A (en) * 2022-09-22 2022-10-25 武汉珈鹰智能科技有限公司 Training method, application method and storage medium of crack image recognition network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654121A (en) * 2016-04-11 2016-06-08 李云栋 Complex jacquard fabric defect detection method based on deep learning
CN107633272A (en) * 2017-10-09 2018-01-26 东华大学 A kind of DCNN textural defect recognition methods based on compressed sensing under small sample
US20220019410A1 (en) * 2020-07-14 2022-01-20 X Development Llc Code change graph node matching with machine learning
CN114937155A (en) * 2022-06-08 2022-08-23 杭州电子科技大学 Fabric image component identification method based on deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination