CN114723707A - Complex texture and pattern color difference detection method based on self-supervision contrast learning - Google Patents

Complex texture and pattern color difference detection method based on self-supervision contrast learning

Info

Publication number
CN114723707A
Authority
CN
China
Prior art keywords
btk
color difference
data set
image
enhanced data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210362124.9A
Other languages
Chinese (zh)
Inventor
程良伦 (Cheng Lianglun)
曾炜峰 (Zeng Weifeng)
黄国恒 (Huang Guoheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210362124.9A
Publication of CN114723707A
Legal status: Pending

Classifications

    • G06T 7/0004 Industrial image inspection
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06T 7/40 Analysis of texture
    • G06T 7/90 Determination of colour characteristics
    • G06T 2207/10024 Color image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30144 Printing quality
    • Y02P 90/30 Computing systems specially adapted for manufacturing


Abstract

The invention relates to the technical field of industrial detection and discloses a complex texture and pattern color difference detection method based on self-supervision contrast learning, which comprises the following steps: S1, constructing a labeled original data set; S2, performing data enhancement on the original data set to obtain an enhanced data set; S3, extracting image features of the enhanced data set through an encoder; S4, projecting the image features of the enhanced data through a projection network to obtain embedded vectors; S5, calculating the similarity between different images in the enhanced data set and the pre-training contrast loss according to the obtained embedded vectors, and improving the encoder according to the contrast loss; and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set. The invention solves the problems of high subjectivity and frequent false detection of adjacent color difference levels in the prior art.

Description

Complex texture and pattern color difference detection method based on self-supervision contrast learning
Technical Field
The invention relates to the technical field of industrial detection, in particular to a complex texture and pattern color difference detection method based on self-supervision contrast learning.
Background
Methods for evaluating the color difference quality of color printed products mainly include the visual method, the density method and the colorimetric method. The subjective visual inspection method is a print evaluation method that makes judgments based on human visual perception. It relies mainly on direct comparison between the set standard printed sheets and the production samples, analyses the visually perceived color difference between them, and observes the dot shape changes and dot overprinting of the various colors with a magnifying tool to make a comprehensive evaluation. The density detection method is based on the thickness of the ink layer of the printed matter; the density value directly reflects the reflectivity of the print, so the depth of the printed color and the thickness of the ink layer can be judged directly, which in turn guides the adjustment and control of printing production. For a long time the density detection method has been widely used by printing enterprises for quality inspection, but readings from different instruments differ noticeably, so the method is not well suited for wide use. The colorimetric detection method measures the chromaticity information of a printed matter and is a basic research tool for printing enterprises to use, measure and describe object colors. Chromaticity detection is not affected by subjective factors and can display its measurements objectively, but it cannot be directly associated with ink layer thickness, dot variation and the like, so the measured data cannot be used directly to guide production. The density and colorimetric detection methods overcome many problems of the subjective method and establish a similarity measurement reference. However, most industrial prints carry complex textures or patterns; density and chromaticity detection rely on fixed model libraries for their calculations and cannot meet the required universality and accuracy, and both methods can only take readings over roughly a 10-square-millimetre area of a printed sample, so a color difference detection model for complex texture and pattern images based on a saliency algorithm needs to be constructed. At present, corrugated-paper digital printers can print images at resolutions up to 1200 × 600; the pattern textures are fine and complex, and if tasks such as color difference detection and graded color separation are carried out with traditional template matching, the processing speed is low and the precision is low.
To address this problem, an existing self-supervised image classification method based on contrast learning comprises the following steps: S1, acquiring unlabeled data and randomly enhancing it to generate different views; S2, extracting features of the views and computing an unsupervised contrast loss to obtain an unsupervised classification model C1; S3, manually labeling part of the unlabeled data to serve as a training and verification set; S4, taking C1 as a pre-training model and fine-tuning it on the training and verification set; S5, extracting features of the training and verification set and computing a supervised contrast loss to obtain C2; S6, predicting labels of the unlabeled data with C2 and screening the data whose confidence exceeds a preset value as training samples; S7, based on these training samples and with C2 as the pre-training model, selecting a small network for training and fine-tuning, and taking the model with the highest verification accuracy as the optimal classification model C3.
However, existing color difference detection methods suffer from high subjectivity and frequent false detection between adjacent color difference levels, so how to devise an objective and highly accurate color difference detection method is a problem that urgently needs to be solved in this technical field.
Disclosure of Invention
The invention provides a complex texture and pattern color difference detection method based on self-supervision contrast learning, aiming at solving the problems of high subjectivity and frequent false detection of adjacent color difference levels in the prior art, and the method has the characteristics of objectivity and high accuracy.
In order to achieve the purpose of the invention, the technical scheme is as follows:
a complex texture and pattern color difference detection method based on self-supervision contrast learning comprises the following steps:
s1, obtaining an original image, and constructing an original data set with a label according to a color difference grading standard;
s2, performing data enhancement on the original data set to obtain an enhanced data set;
s3, pre-training is started, and image features of the enhanced data set are extracted through an encoder;
s4, projecting the image characteristics of the enhanced data through a projection network to obtain an embedded vector;
s5, calculating the similarity between different images in the enhanced data set and the contrast loss of pre-training according to the obtained embedded vector, improving an encoder according to the contrast loss, and ending the pre-training;
and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set.
Preferably, in step S1, the specific steps are:
s101, collecting sample image data sets with the number of N and the resolution ratio of w multiplied by h from an object to be analyzed;
S102, marking each sample image according to the color difference grade division standard, and recording the original data set as I = {I1, I2, ..., IN}.
Further, in step S2, the specific steps are:
S201, dividing the original data set into a plurality of batches according to the batch size n, and performing data enhancement on the original sample images batch by batch through the transformation function RT:
RT = random(flip, rotate, crop, zoom)
where random denotes a random selection function, flip denotes a flip transformation, rotate denotes a rotation transformation, crop denotes a cropping transformation and zoom denotes a scaling transformation;
S202, for each sample image Ii in each batch, randomly selecting two enhancement transformation types through the random function for enhancement;
and S203, outputting an enhanced data set containing 2N images with resolution w × h (a minimal code sketch of this augmentation step is given below).
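The following is a minimal sketch of one way steps S201-S203 could be implemented; it is an illustration only, assuming torchvision-style transforms on square images, and the transform parameters (rotation range, crop scale, zoom size) are arbitrary choices rather than values taken from the invention.

import random
from torchvision import transforms

# the four candidate enhancement transformations of RT = random(flip, rotate, crop, zoom)
FLIP   = transforms.RandomHorizontalFlip(p=1.0)
ROTATE = transforms.RandomRotation(degrees=15)
CROP   = transforms.RandomResizedCrop(size=576, scale=(0.8, 1.0))
ZOOM   = transforms.Compose([transforms.Resize(640), transforms.CenterCrop(576)])
CANDIDATES = [FLIP, ROTATE, CROP, ZOOM]

def augment_pair(image):
    # step S202: randomly select two enhancement transformation types for one sample image Ii
    return [random.choice(CANDIDATES)(image) for _ in range(2)]

# applying augment_pair to every image of every batch yields the 2N enhanced images of step S203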
Further, the encoder extracts image features of the enhanced data set, and the specific steps are as follows:
A01. Pass the input image with resolution w × h and Cinput channels through a convolution layer to obtain the tensor C1 with resolution (w/2) × (h/2) and 64 channels:
C1 = Conv7_2(Cinput)
where Conv7_2 denotes a convolution layer with convolution kernel size 7 and convolution kernel stride 2;
A02. Pass the tensor C1 through a batch normalization layer and an activation function layer in turn to obtain the feature map C2:
C2 = ActReLU(BN(C1))
where BN denotes the batch normalization layer and ActReLU denotes the activation function;
A03. Pass the feature map C2 through a max pooling layer to obtain the feature map C3 of shape (64, w/4, h/4):
C3 = MaxP3_2(C2)
where MaxP3_2 denotes a max pooling layer with pooling kernel size 3 and pooling kernel stride 2;
A04. Pass the feature map C3 through the four residual learning stages Stage1, Stage2, Stage3 and Stage4 in turn to obtain the image feature vector H:
Stage1: C4 = BTK2(BTK2(BTK1(C3)))
Stage2: C5 = BTK2(BTK2(BTK2(BTK1(C4))))
Stage3: C6 = BTK2(BTK2(BTK2(BTK2(BTK2(BTK1(C5))))))
Stage4: H = BTK2(BTK2(BTK1(C6)))
where BTK1() denotes the type I bottleneck block of the residual structure, BTK2() denotes the type II bottleneck block of the residual structure, C4, C5 and C6 are the feature maps obtained in residual learning stages Stage1, Stage2 and Stage3 respectively, and H is the m-dimensional feature vector obtained in residual learning stage Stage4 (a simplified code sketch of this encoder follows).
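The sketch below is one possible realisation of steps A01-A04, offered only as an illustration: the stem (7 × 7 convolution with stride 2, batch normalization, ReLU and 3 × 3 max pooling with stride 2) and the 3/4/6/3 block layout of the four stages coincide with a standard ResNet-34 backbone, so a torchvision ResNet-34 with its classification head removed is used here as a stand-in for the encoder; the exact internals of the BTK1/BTK2 blocks are not reproduced.

import torch
import torch.nn as nn
from torchvision.models import resnet34

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)   # untrained, as at the start of pre-training
        backbone.fc = nn.Identity()         # drop the ImageNet classifier, keep the features
        self.backbone = backbone

    def forward(self, x):                   # x: (batch, Cinput, w, h)
        return self.backbone(x)             # H: (batch, m), with m = 512 for this stand-in

# example: a batch of 576 x 576 RGB images yields 512-dimensional feature vectors H
H = Encoder()(torch.randn(4, 3, 576, 576))
print(H.shape)  # torch.Size([4, 512])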
Further, the image features of the enhanced data are projected through a projection network to obtain the embedded vector, specifically: H is input into a projection network that realizes a nonlinear transformation, and the process is expressed as:
z = FC(ActReLU(Dense(H)))
where FC is a fully connected layer and z is the embedded vector obtained from the projection.
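A minimal sketch of the projection network z = FC(ActReLU(Dense(H))) follows; the hidden and output widths (512 and 128) are illustrative assumptions, not values specified by the invention.

import torch.nn as nn

class ProjectionHead(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=512, out_dim=128):
        super().__init__()
        self.dense = nn.Linear(in_dim, hidden_dim)    # Dense(H)
        self.act   = nn.ReLU(inplace=True)            # ActReLU
        self.fc    = nn.Linear(hidden_dim, out_dim)   # FC, producing the embedded vector z

    def forward(self, h):
        return self.fc(self.act(self.dense(h)))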
Furthermore, the similarity between different images in the enhanced data set and the pre-training contrast loss are calculated from the obtained embedded vectors, specifically:
B01. For the embedded vectors zi and zj of any two enhanced images Ai and Aj in the enhanced data set, calculate the cosine similarity between Ai and Aj:
s(i, j) = (zi · zj) / (σ‖zi‖‖zj‖)
where σ is an adjustable parameter used to scale the input and expand the range of the cosine similarity beyond [-1, 1], and ‖zi‖ and ‖zj‖ denote the moduli of the embedded vectors;
B02. Pair the cosine similarities of matching views and calculate the noise contrastive estimation loss l(i, j):
l(i, j) = -log[ exp(s(i, j)) / Σk=1..2n 1[k≠i] exp(s(i, k)) ]
where k is a summation index and 1[k≠i] equals 1 when k ≠ i and 0 otherwise;
B03. With the image positions interchanged, calculate the contrastive estimation loss l(j, i) for the same pair of images again:
l(j, i) = -log[ exp(s(j, i)) / Σk=1..2n 1[k≠j] exp(s(j, k)) ]
B04. Calculate the losses of all pairs in a batch of size n and average them to obtain the contrast loss LN:
LN = (1/2n) Σk=1..n [ l(2k-1, 2k) + l(2k, 2k-1) ]
B05. Update the encoder according to LN, fix the encoder weights, and finish the pre-training (a code sketch of this loss calculation is given below).
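The following is a minimal sketch of steps B01-B04 (not the authors' own code): it assumes the 2n embedded vectors are arranged so that rows 2k-1 and 2k come from the two views of the same original sample, and uses σ = 0.1 purely as an example value.

import torch
import torch.nn.functional as F

def contrast_loss(z, sigma=0.1):
    # z: (2n, d) embedded vectors; returns the averaged contrast loss LN
    z = F.normalize(z, dim=1)                     # divide each zi by its modulus ||zi||
    s = (z @ z.t()) / sigma                       # s(i, j): scaled cosine similarity, step B01
    two_n = z.shape[0]
    s = s.masked_fill(torch.eye(two_n, dtype=torch.bool), float('-inf'))  # drop the k = i terms
    partner = torch.arange(two_n) ^ 1             # 0<->1, 2<->3, ...: index of the paired view
    log_prob = F.log_softmax(s, dim=1)            # log[ exp(s(i,j)) / sum_{k != i} exp(s(i,k)) ]
    return -log_prob[torch.arange(two_n), partner].mean()   # average of l(i, j) and l(j, i), B04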
Furthermore, the classification network comprises three Dense blocks, where each Dense block is a module containing several layers whose feature maps all have the same size; in the classification network, the image features of the enhanced data first pass through a convolution layer, then through the three Dense blocks in turn, and finally through a pooling layer and a linear function layer to output the result, with two adjacent Dense blocks connected by a convolution layer and a pooling layer.
Furthermore, the process of passing the image feature x of the enhanced data through a Dense block is:
f1 = Conv1_1(ActReLU(BN(x)))
f2 = Conv3_1(ActReLU(BN(f1)))
where Conv1_1 and Conv3_1 denote a convolution layer with convolution kernel size 1 and stride 1 and a convolution layer with convolution kernel size 3 and stride 1 respectively, f1 is the feature map after the 1 × 1 convolution, and f2 is the feature map after the 3 × 3 convolution (a code sketch of one such Dense block layer follows).
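A minimal sketch of one Dense block built from the layer above is given next; it is an interpretation under DenseNet-style assumptions (dense concatenation between layers, an assumed growth rate of 32 and four layers per block), none of which are fixed by the invention.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth=32):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, 4 * growth, kernel_size=1, stride=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * growth)
        self.conv2 = nn.Conv2d(4 * growth, growth, kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x):
        f1 = self.conv1(torch.relu(self.bn1(x)))    # f1 = Conv1_1(ActReLU(BN(x)))
        f2 = self.conv2(torch.relu(self.bn2(f1)))   # f2 = Conv3_1(ActReLU(BN(f1)))
        return torch.cat([x, f2], dim=1)            # dense connection: pass x and f2 onwards

class DenseBlock(nn.Module):
    def __init__(self, in_ch, num_layers=4, growth=32):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth, growth) for i in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)          # feature map sizes stay the same inside the block
        return x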
Further, the images in the enhanced data set are classified and detected, the loss of the classification network is calculated, and the classification network is trained according to this loss; the loss function Ld of the classification network is:
Ld = -Σn=1..T yn log(pn)
where T is the number of color difference levels, yn is the label, which equals 1 if the color difference level of the current sample is n and 0 otherwise, and pn is the probability that the current sample has color difference level n.
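With one-hot labels yn, the loss Ld above is the standard cross-entropy over the T color difference levels; a minimal sketch follows, in which the batch size of 8 and T = 5 levels are arbitrary example values.

import torch
import torch.nn.functional as F

logits = torch.randn(8, 5)                 # classification network outputs: 8 samples, T = 5 levels
labels = torch.randint(0, 5, (8,))         # ground-truth color difference level of each sample
loss_d = F.cross_entropy(logits, labels)   # equals -log pn of the true level, averaged over the batch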
A complex texture and pattern color difference detection system based on self-supervision contrast learning comprises a data acquisition module, a data enhancement module, an encoder, a projection network module and a classification network module; the data acquisition module is used for acquiring original images and constructing a labeled original data set according to the color difference grading standard, the data enhancement module is used for performing data enhancement on the original data set to obtain an enhanced data set, the encoder is used for acquiring the image features of the enhanced data, the projection network module is used for performing the projection operation on the image features through the projection network, and the classification network module is used for classifying the color differences to obtain a color difference detection grading result.
The invention has the following beneficial effects:
According to the invention, after the original data set is collected and organized, it is enhanced; the encoder is pre-trained on the enhanced data set in combination with the projection operation, which improves the feature extraction capability of the encoder, and a classification network is then connected in place of the projection network, so that the detection of complex texture and pattern color differences is realized. This solves the problems of high subjectivity and frequent false detection of adjacent color difference levels in the prior art, and the method has the characteristics of objectivity and high accuracy.
Drawings
FIG. 1 is a schematic flow chart of the complex texture and pattern color difference detection method based on the self-supervision contrast learning.
Fig. 2 is a schematic flow chart of the encoder extracting image features of the enhanced data set in the complex texture and pattern color difference detection method based on the self-supervised contrast learning in embodiment 2.
Fig. 3 is a schematic diagram of a type I bottleneck block and a type II bottleneck block in the complex texture and pattern color difference detection method based on the self-supervision contrast learning in embodiment 2.
FIG. 4 is a schematic flow chart of a classification network in the complex texture and pattern color difference detection method based on the self-supervision contrast learning.
FIG. 5 is a schematic diagram of a Dense block of a classification network in the complex texture and pattern color difference detection method based on the self-supervised contrast learning.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in fig. 1, a method for detecting color difference of complex texture and pattern based on self-supervision contrast learning includes the following steps:
s1, obtaining an original image, and constructing an original data set with a label according to a color difference grading standard;
s2, performing data enhancement on the original data set to obtain an enhanced data set;
s3, pre-training is started, and image features of the enhanced data set are extracted through an encoder;
s4, projecting the image characteristics of the enhanced data through a projection network to obtain an embedded vector;
s5, calculating the similarity between different images in the enhanced data set and the contrast loss of pre-training according to the obtained embedded vector, improving an encoder according to the contrast loss, and ending the pre-training;
and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set.
Example 2
As shown in fig. 1, a method for detecting color difference of complex texture and pattern based on self-supervision contrast learning includes the following steps:
s1, obtaining an original image, and constructing an original data set with a label according to a color difference grading standard;
s2, performing data enhancement on the original data set to obtain an enhanced data set;
s3, pre-training is started, and image features of the enhanced data set are extracted through an encoder;
s4, projecting the image characteristics of the enhanced data through a projection network to obtain an embedded vector;
s5, calculating the similarity between different images in the enhanced data set and the contrast loss of pre-training according to the obtained embedded vector, improving an encoder according to the contrast loss, and ending the pre-training;
and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set.
In one embodiment, step S1 includes the following steps:
S101, collecting a sample image data set with N images of resolution w × h from the object to be analyzed; in this embodiment, 600 sample images with a resolution of 576 × 576 were collected from a printing production line.
S102, marking each sample image according to the color difference grade division standard, and recording the original data set as I = {I1, I2, ..., IN}.
Step S2, the specific steps are:
S201, dividing the original data set into a plurality of batches according to the batch size n, and performing data enhancement on the original sample images batch by batch through the transformation function RT:
RT = random(flip, rotate, crop, zoom)
where random denotes a random selection function, flip denotes a flip transformation, rotate denotes a rotation transformation, crop denotes a cropping transformation and zoom denotes a scaling transformation; in this embodiment, the original data set is divided into batches of size 6;
S202, for each sample image Ii in each batch, randomly selecting two enhancement transformation types through the random function for enhancement;
and S203, outputting an enhanced data set containing 2N images with resolution w × h.
In this embodiment, after each batch of the data set is input into the data enhancement module, two enhancement transformation types are selected for each sample image Ii in the batch through the random function. The transformation functions do not change the color difference characteristics of a sample, so the enhancement results of the same sample have the same color difference characteristics and are ultimately assigned the same color difference grade by the classification network. After all batches have undergone the first round of data enhancement, an enhanced data set of 1200 images with resolution 576 × 576 is output.
In this embodiment, the enhanced data set with resolution 576 × 576 and C channels is input into the untrained initial encoder, and the process can be expressed as:
H = Extract(A)
where Extract() obtains the image features of the sample data through the encoder, converting each enhanced image Aij into an m-dimensional vector Hij for output. The encoder used in the pre-training process comprises two deep learning network parts: a feature extraction module and a residual module.
As shown in fig. 2 and 3, the encoder extracts image features of the enhanced data set, and the specific steps are as follows:
A01. Pass the input image with resolution w × h and Cinput channels through a convolution layer to obtain the tensor C1 with resolution (w/2) × (h/2) and 64 channels:
C1 = Conv7_2(Cinput)
where Conv7_2 denotes a convolution layer with convolution kernel size 7 and convolution kernel stride 2;
A02. Pass the tensor C1 through a batch normalization layer and an activation function layer in turn to obtain the feature map C2:
C2 = ActReLU(BN(C1))
where BN denotes the batch normalization layer and ActReLU denotes the activation function;
A03. Pass the feature map C2 through a max pooling layer to obtain the feature map C3 of shape (64, w/4, h/4):
C3 = MaxP3_2(C2)
where MaxP3_2 denotes a max pooling layer with pooling kernel size 3 and pooling kernel stride 2;
A04. The residual module contains two different residual-structure bottleneck blocks. The type I bottleneck block has two important adjustable parameters λ and μ: λ controls whether down-sampling is performed and μ controls whether the number of channels is reduced, which is achieved by placing a convolution layer on the residual branch so that the numbers of input and output channels may differ; the residual branch of the type II bottleneck block has no convolution layer, so its output channel number stays the same as its input channel number. Taking the type I bottleneck block as the boundary, the whole residual module can be divided into four successive residual learning stages, and the feature map C3 passes through the four stages Stage1, Stage2, Stage3 and Stage4 in turn to obtain the image feature vector H:
Stage1: C4 = BTK2(BTK2(BTK1(C3)))
Stage2: C5 = BTK2(BTK2(BTK2(BTK1(C4))))
Stage3: C6 = BTK2(BTK2(BTK2(BTK2(BTK2(BTK1(C5))))))
Stage4: H = BTK2(BTK2(BTK1(C6)))
where BTK1() denotes the type I bottleneck block of the residual structure, BTK2() denotes the type II bottleneck block of the residual structure, C4, C5 and C6 are the feature maps obtained in residual learning stages Stage1, Stage2 and Stage3 respectively, and H is the m-dimensional feature vector obtained in residual learning stage Stage4. In this embodiment, Stage1 loops through 1 type I and 2 type II bottleneck blocks, Stage2 through 1 type I and 3 type II bottleneck blocks, Stage3 through 1 type I and 5 type II bottleneck blocks, and Stage4 through 1 type I and 2 type II bottleneck blocks.
In this embodiment, the first stage takes as input the feature map C3 of shape (64, 144, 144) output by the feature extraction module. Since the sample image has at this point only undergone the convolution and max pooling operations of the feature extraction module and no residual learning has yet taken place, direct down-sampling would lose a large amount of information; moreover, the number of input channels is only 64 and does not need to be reduced to improve learning efficiency, so λ and μ default to performing no operation. After the first-stage residual learning, the down-sampling and channel-reduction operations can be enabled directly in the following three stages, further improving the model learning efficiency (a code sketch of the two bottleneck blocks is given below).
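The sketch below is one possible reading of the two bottleneck blocks, offered as an assumption rather than the exact patented layers: in BTK1 the flag lam stands for λ (stride-2 down-sampling on or off) and mu stands for μ (channel reduction on or off), with a convolution on the residual branch, while BTK2 keeps an identity shortcut so its channel count is unchanged.

import torch
import torch.nn as nn

class BTK1(nn.Module):                                   # type I bottleneck block
    def __init__(self, in_ch, out_ch, lam=True, mu=True):
        super().__init__()
        stride = 2 if lam else 1                         # lambda: perform down-sampling or not
        mid = out_ch // 2 if mu else out_ch              # mu: reduce the channel number or not
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 3, 1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Sequential(                   # convolution on the residual branch, so
            nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),   # input/output channels may differ
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

class BTK2(nn.Module):                                   # type II bottleneck block
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, padding=1, bias=False),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=1, bias=False),
            nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(self.body(x) + x)              # identity residual branch, channels unchanged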
The image features of the enhanced data are projected through a projection network to obtain the embedded vector, specifically: H is input into a projection network that realizes a nonlinear transformation, and the process is expressed as:
z=FC(ActReLU(Dense(H)))
where FC is a fully connected layer that extracts the correlation among the features and finally maps it into the output space, and z is the embedded vector obtained from the projection, used for the loss calculation of the pre-training.
In a specific embodiment, the similarity between different images in the enhanced data set and the pre-training contrast loss are calculated from the obtained embedded vectors, specifically:
B01. For the embedded vectors zi and zj of any two enhanced images Ai and Aj in the enhanced data set, calculate the cosine similarity between Ai and Aj:
s(i, j) = (zi · zj) / (σ‖zi‖‖zj‖)
where σ is an adjustable parameter used to scale the input and expand the range of the cosine similarity beyond [-1, 1], and ‖zi‖ and ‖zj‖ denote the moduli of the embedded vectors. The above formula is used to calculate the cosine similarity between every two enhanced images in a batch; ideally the similarity between enhanced images of the same color difference level is very high, that is, when the embedded vectors zi and zj correspond to the same sample, the calculated cosine similarity is higher.
B02. Pair the cosine similarities of matching views and calculate the noise contrastive estimation loss l(i, j):
l(i, j) = -log[ exp(s(i, j)) / Σk=1..2n 1[k≠i] exp(s(i, k)) ]
where k is a summation index and 1[k≠i] equals 1 when k ≠ i and 0 otherwise;
B03. With the image positions interchanged, calculate the contrastive estimation loss l(j, i) for the same pair of images again:
l(j, i) = -log[ exp(s(j, i)) / Σk=1..2n 1[k≠j] exp(s(j, k)) ]
B04. Calculate the losses of all pairs in a batch of size n and average them to obtain the contrast loss LN:
LN = (1/2n) Σk=1..n [ l(2k-1, 2k) + l(2k, 2k-1) ]
B05. Update the encoder according to LN, fix the encoder weights, and finish the pre-training. Driven by LN, the encoder and projection head representations improve over time, and the resulting representations place similar images at more similar locations in the embedding space (a sketch of this pre-training loop is given below).
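The following is a minimal sketch of the pre-training loop that ties the previous sketches together; the Encoder, ProjectionHead and contrast_loss definitions are the illustrative ones given earlier in this description, and the optimizer, learning rate, σ value and dummy data are assumptions for illustration only.

import torch

encoder, proj = Encoder(), ProjectionHead()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(proj.parameters()), lr=1e-3)

dummy_batches = [torch.randn(12, 3, 576, 576) for _ in range(3)]   # each batch: 2n = 12 augmented views
for views in dummy_batches:
    z = proj(encoder(views))                  # steps S3-S4: features H, then embedded vectors z
    loss = contrast_loss(z, sigma=0.1)        # step S5: contrast loss LN
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

for p in encoder.parameters():                # step B05: fix the encoder weights after pre-training
    p.requires_grad = False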
Example 3
As shown in fig. 1, a method for detecting color difference of complex texture and pattern based on self-supervision contrast learning includes the following steps:
s1, obtaining an original image, and constructing an original data set with a label according to a color difference grading standard;
s2, performing data enhancement on the original data set to obtain an enhanced data set;
s3, pre-training is started, and image features of the enhanced data set are extracted through an encoder;
s4, projecting the image characteristics of the enhanced data through a projection network to obtain an embedded vector;
s5, calculating the similarity between different images in the enhanced data set and the contrast loss of pre-training according to the obtained embedded vector, improving an encoder according to the contrast loss, and ending the pre-training;
and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set.
In one embodiment, step S1 includes the following steps:
s101, collecting sample image data sets with the number of N and the resolution ratio of w multiplied by h from an object to be analyzed;
S102, marking each sample image according to the color difference grade division standard, and recording the original data set as I = {I1, I2, ..., IN}.
Step S2, the specific steps are:
S201, dividing the original data set into a plurality of batches according to the batch size n, and performing data enhancement on the original sample images batch by batch through the transformation function RT:
RT = random(flip, rotate, crop, zoom)
where random denotes a random selection function, flip denotes a flip transformation, rotate denotes a rotation transformation, crop denotes a cropping transformation and zoom denotes a scaling transformation;
S202, for each sample image Ii in each batch, randomly selecting two enhancement transformation types through the random function for enhancement;
and S203, outputting an enhanced data set containing 2N images with resolution w × h.
In one embodiment, the encoder extracts image features of the enhanced data set by the following steps:
A01. Pass the input image with resolution w × h and Cinput channels through a convolution layer to obtain the tensor C1 with resolution (w/2) × (h/2) and 64 channels:
C1 = Conv7_2(Cinput)
where Conv7_2 denotes a convolution layer with convolution kernel size 7 and convolution kernel stride 2;
A02. Pass the tensor C1 through a batch normalization layer and an activation function layer in turn to obtain the feature map C2:
C2 = ActReLU(BN(C1))
where BN denotes the batch normalization layer and ActReLU denotes the activation function;
A03. Pass the feature map C2 through a max pooling layer to obtain the feature map C3 of shape (64, w/4, h/4):
C3 = MaxP3_2(C2)
where MaxP3_2 denotes a max pooling layer with pooling kernel size 3 and pooling kernel stride 2;
A04. Pass the feature map C3 through the four residual learning stages Stage1, Stage2, Stage3 and Stage4 in turn to obtain the image feature vector H:
Stage1: C4 = BTK2(BTK2(BTK1(C3)))
Stage2: C5 = BTK2(BTK2(BTK2(BTK1(C4))))
Stage3: C6 = BTK2(BTK2(BTK2(BTK2(BTK2(BTK1(C5))))))
Stage4: H = BTK2(BTK2(BTK1(C6)))
where BTK1() denotes the type I bottleneck block of the residual structure, BTK2() denotes the type II bottleneck block of the residual structure, C4, C5 and C6 are the feature maps obtained in residual learning stages Stage1, Stage2 and Stage3 respectively, and H is the m-dimensional feature vector obtained in residual learning stage Stage4.
The image features of the enhanced data are projected through a projection network to obtain the embedded vector, specifically: H is input into a projection network that realizes a nonlinear transformation, and the process is expressed as:
z = FC(ActReLU(Dense(H)))
where FC is a fully connected layer and z is the embedded vector obtained from the projection.
In a specific embodiment, the similarity between different images in the enhanced data set and the pre-training contrast loss are calculated from the obtained embedded vectors, specifically:
B01. For the embedded vectors zi and zj of any two enhanced images Ai and Aj in the enhanced data set, calculate the cosine similarity between Ai and Aj:
s(i, j) = (zi · zj) / (σ‖zi‖‖zj‖)
where σ is an adjustable parameter used to scale the input and expand the range of the cosine similarity beyond [-1, 1], and ‖zi‖ and ‖zj‖ denote the moduli of the embedded vectors;
B02. Pair the cosine similarities of matching views and calculate the noise contrastive estimation loss l(i, j):
l(i, j) = -log[ exp(s(i, j)) / Σk=1..2n 1[k≠i] exp(s(i, k)) ]
where k is a summation index and 1[k≠i] equals 1 when k ≠ i and 0 otherwise;
B03. With the image positions interchanged, calculate the contrastive estimation loss l(j, i) for the same pair of images again:
l(j, i) = -log[ exp(s(j, i)) / Σk=1..2n 1[k≠j] exp(s(j, k)) ]
B04. Calculate the losses of all pairs in a batch of size n and average them to obtain the contrast loss LN:
LN = (1/2n) Σk=1..n [ l(2k-1, 2k) + l(2k, 2k-1) ]
B05. Update the encoder according to LN, fix the encoder weights, and finish the pre-training.
As shown in fig. 4, in one specific implementation, the classification network comprises three Dense blocks, where each Dense block is a module containing several layers whose feature maps all have the same size; in the classification network, the image features of the enhanced data first pass through a convolution layer, then through the three Dense blocks in turn, and finally through a pooling layer and a linear function layer to output the result, with two adjacent Dense blocks connected by a convolution layer and a pooling layer, which reduces the feature map size and has the effect of compressing the model.
In this embodiment, a Dense block is a module comprising several layers; the feature maps of each layer have the same size, and a dense connection manner is adopted between the layers.
As shown in fig. 5, in one implementation, the process of passing the image feature x of the enhanced data through a Dense block is:
f1 = Conv1_1(ActReLU(BN(x)))
f2 = Conv3_1(ActReLU(BN(f1)))
where Conv1_1 and Conv3_1 denote a convolution layer with convolution kernel size 1 and stride 1 and a convolution layer with convolution kernel size 3 and stride 1 respectively, f1 is the feature map after the 1 × 1 convolution, and f2 is the feature map after the 3 × 3 convolution.
In one embodiment, the images in the enhanced data set are classified and detected, the loss of the classification network is calculated, and the classification network is trained according to this loss; the loss function Ld of the classification network is:
Ld = -Σn=1..T yn log(pn)
where T is the number of color difference levels, yn is the label, which equals 1 if the color difference level of the current sample is n and 0 otherwise, and pn is the probability that the current sample has color difference level n.
According to the method, after the original data set is collected and organized, it is enhanced; the encoder is pre-trained on the enhanced data set in combination with the projection operation, which improves its feature extraction capability, and a classification network is then connected in place of the projection network, so that the detection of complex texture and pattern color differences is realized. Following the self-supervised contrast learning idea, the method solves the problems of low speed and low detection precision for adjacent color difference levels in traditional color difference detection. Compared with supervised learning, it is better suited to scenarios with few samples, since the advantages of contrast learning remove the need for a large amount of labeled data; the generalization performance of the model is greatly improved, and the method has the characteristics of high detection precision, strong detection robustness and wide applicability.
Example 4
A complex texture and pattern color difference detection system based on self-supervision contrast learning comprises a data acquisition module, a data enhancement module, an encoder, a projection network module and a classification network module; the data acquisition module is used for acquiring original images and constructing a labeled original data set according to the color difference grading standard, the data enhancement module is used for performing data enhancement on the original data set to obtain an enhanced data set, the encoder is used for acquiring the image features of the enhanced data, the projection network module is used for performing the projection operation on the image features through the projection network, and the classification network module is used for classifying the color differences to obtain a color difference detection grading result.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A complex texture and pattern color difference detection method based on self-supervision contrast learning, characterized by comprising the following steps:
s1, obtaining an original image, and constructing an original data set with a label according to a color difference grading standard;
s2, performing data enhancement on the original data set to obtain an enhanced data set;
s3, pre-training is started, and image features of the enhanced data set are extracted through an encoder;
s4, projecting the image characteristics of the enhanced data through a projection network to obtain an embedded vector;
s5, calculating the similarity between different images in the enhanced data set and the contrast loss of pre-training according to the obtained embedded vector, improving an encoder according to the contrast loss, and ending the pre-training;
and S6, connecting a classification network after the improved encoder in place of the projection network, re-extracting the image features through the improved encoder, and performing classification detection on the images in the enhanced data set.
2. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 1, characterized in that: step S1, the specific steps are:
s101, collecting sample image data sets with the number of N and the resolution ratio of w multiplied by h from an object to be analyzed;
s102, marking each sample image according to a color difference grade division standard, and recording an original data set as I ═ I1,I2,……,IN}。
3. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 2, characterized in that: step S2, the specific steps are:
S201, dividing the original data set into a plurality of batches according to the batch size n, and performing data enhancement on the original sample images batch by batch through the transformation function RT:
RT = random(flip, rotate, crop, zoom)
where random denotes a random selection function, flip denotes a flip transformation, rotate denotes a rotation transformation, crop denotes a cropping transformation and zoom denotes a scaling transformation;
S202, for each sample image Ii in each batch, randomly selecting two enhancement transformation types through the random function for enhancement;
and S203, outputting an enhanced data set containing 2N images with resolution w × h.
4. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 3, characterized in that: the encoder extracts the image features of the enhanced data set, and the specific steps are as follows:
A01. Pass the input image with resolution w × h and Cinput channels through a convolution layer to obtain the tensor C1 with resolution (w/2) × (h/2) and 64 channels:
C1 = Conv7_2(Cinput)
where Conv7_2 denotes a convolution layer with convolution kernel size 7 and convolution kernel stride 2;
A02. Pass the tensor C1 through a batch normalization layer and an activation function layer in turn to obtain the feature map C2:
C2 = ActReLU(BN(C1))
where BN denotes the batch normalization layer and ActReLU denotes the activation function;
A03. Pass the feature map C2 through a max pooling layer to obtain the feature map C3 of shape (64, w/4, h/4):
C3 = MaxP3_2(C2)
where MaxP3_2 denotes a max pooling layer with pooling kernel size 3 and pooling kernel stride 2;
A04. Pass the feature map C3 through the four residual learning stages Stage1, Stage2, Stage3 and Stage4 in turn to obtain the image feature vector H:
Stage1: C4 = BTK2(BTK2(BTK1(C3)))
Stage2: C5 = BTK2(BTK2(BTK2(BTK1(C4))))
Stage3: C6 = BTK2(BTK2(BTK2(BTK2(BTK2(BTK1(C5))))))
Stage4: H = BTK2(BTK2(BTK1(C6)))
where BTK1() denotes the type I bottleneck block of the residual structure, BTK2() denotes the type II bottleneck block of the residual structure, C4, C5 and C6 are the feature maps obtained in residual learning stages Stage1, Stage2 and Stage3 respectively, and H is the m-dimensional feature vector obtained in residual learning stage Stage4.
5. The method of claim 4, wherein the method comprises: projecting the image features of the enhanced data through a projection network to obtain an embedded vector, specifically: inputting H into a projection network realizing nonlinear transformation, wherein the process expression is as follows:
z=FC(ActReLU(Dense(H)))
where FC is a fully connected layer and z is the embedded vector obtained from the projection.
6. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 5, characterized in that: according to the obtained embedded vectors, the similarity between different images in the enhanced data set and the pre-training contrast loss are calculated, specifically:
B01. For the embedded vectors zi and zj of any two enhanced images Ai and Aj in the enhanced data set, calculate the cosine similarity between Ai and Aj:
s(i, j) = (zi · zj) / (σ‖zi‖‖zj‖)
where σ is an adjustable parameter used to scale the input and expand the range of the cosine similarity beyond [-1, 1], and ‖zi‖ and ‖zj‖ denote the moduli of the embedded vectors;
B02. Pair the cosine similarities of matching views and calculate the noise contrastive estimation loss l(i, j):
l(i, j) = -log[ exp(s(i, j)) / Σk=1..2n 1[k≠i] exp(s(i, k)) ]
where k is a summation index and 1[k≠i] equals 1 when k ≠ i and 0 otherwise;
B03. With the image positions interchanged, calculate the contrastive estimation loss l(j, i) for the same pair of images again:
l(j, i) = -log[ exp(s(j, i)) / Σk=1..2n 1[k≠j] exp(s(j, k)) ]
B04. Calculate the losses of all pairs in a batch of size n and average them to obtain the contrast loss LN:
LN = (1/2n) Σk=1..n [ l(2k-1, 2k) + l(2k, 2k-1) ]
B05. Update the encoder according to LN, fix the encoder weights, and finish the pre-training.
7. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 1, characterized in that: the classification network comprises three Dense blocks, wherein each Dense block is a module comprising a plurality of layers, and the feature maps of all the layers have the same size; in the classification network, the image characteristics of the enhanced data pass through a convolution layer, sequentially pass through three Dense blocks, pass through a pooling layer and a linear function layer, and output a result, and two adjacent Dense blocks are connected through a convolution layer and a pooling layer.
8. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 7, characterized in that: the process of the image characteristic x of the enhanced data through the Dense block is as follows:
f1=Conv1_1(ActReLU(BN(x)))
f2=Conv3_1(ActReLU(BN(f1)))
where Conv1_1 and Conv3_1 denote a convolution layer with convolution kernel size 1 and stride 1 and a convolution layer with convolution kernel size 3 and stride 1 respectively, f1 is the feature map after the 1 × 1 convolution, and f2 is the feature map after the 3 × 3 convolution.
9. The method for detecting color difference of complex texture and pattern based on self-supervision contrast learning according to claim 8, characterized in that: the images in the enhanced data set are classified and detected, the loss of the classification network is calculated, and the classification network is trained according to this loss; the loss function Ld of the classification network is:
Ld = -Σn=1..T yn log(pn)
where T is the number of color difference levels, yn is the label, which equals 1 if the color difference level of the current sample is n and 0 otherwise, and pn is the probability that the current sample has color difference level n.
10. A complex texture and pattern color difference detection system based on self-supervision contrast learning, characterized in that: the system comprises a data acquisition module, a data enhancement module, an encoder, a projection network module and a classification network module; the data acquisition module is used for acquiring original images and constructing a labeled original data set according to the color difference grading standard, the encoder is used for acquiring the image features of the enhanced data, the projection network module is used for performing the projection operation on the image features through the projection network, and the classification network module is used for classifying the color differences to obtain a color difference detection grading result.
CN202210362124.9A 2022-04-07 2022-04-07 Complex texture and pattern color difference detection method based on self-supervision contrast learning Pending CN114723707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210362124.9A CN114723707A (en) 2022-04-07 2022-04-07 Complex texture and pattern color difference detection method based on self-supervision contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210362124.9A CN114723707A (en) 2022-04-07 2022-04-07 Complex texture and pattern color difference detection method based on self-supervision contrast learning

Publications (1)

Publication Number Publication Date
CN114723707A true CN114723707A (en) 2022-07-08

Family

ID=82242162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210362124.9A Pending CN114723707A (en) 2022-04-07 2022-04-07 Complex texture and pattern color difference detection method based on self-supervision contrast learning

Country Status (1)

Country Link
CN (1) CN114723707A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578589B (en) * 2022-10-12 2023-08-18 江苏瑞康成医疗科技有限公司 Unsupervised echocardiography section identification method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination