CN118334404A - Corrugated case grading and color separation system based on stable diffusion model - Google Patents


Info

Publication number
CN118334404A
CN118334404A
Authority
CN
China
Prior art keywords
light source
module
color separation
semantic
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410239409.2A
Other languages
Chinese (zh)
Inventor
罗富文
曾伟光
梁倩婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Tai Yi Machinery Co ltd
Original Assignee
Guangdong Tai Yi Machinery Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Tai Yi Machinery Co ltd filed Critical Guangdong Tai Yi Machinery Co ltd
Priority to CN202410239409.2A priority Critical patent/CN118334404A/en
Publication of CN118334404A publication Critical patent/CN118334404A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a corrugated case grading and color separation system based on a stable diffusion model, comprising a light source mapping module, a semantic guidance module and a grading color separation module. The light source mapping module maps images captured under different illumination to image feature information under a standard illumination condition; the semantic guidance module performs cross-attention calculation between the image feature information and semantic text features to generate a semantic image attention map; the grading color separation module computes the channel and spatial attention of the semantic image attention map through an efficient channel-spatial attention mechanism, and finally obtains the grading color separation result category through metric learning. By mapping image features with the light source mapping module, generating the semantic image attention map with the semantic guidance module, and obtaining the final result category in the grading color separation module through efficient channel-spatial attention calculation and metric learning, the invention greatly improves color separation efficiency and accuracy.

Description

Corrugated case grading and color separation system based on stable diffusion model
Technical Field
The invention relates to the field of industrial defect detection, in particular to a corrugated case grading and color separation system based on a stable diffusion model.
Background
Corrugated board is made into corrugated cartons through die cutting, creasing, stapling or gluing. Corrugated cartons are the most widely used packaging product, and their consumption has always ranked first among packaging products. For over half a century, corrugated cartons have gradually replaced transport packaging containers such as wooden crates thanks to their excellent usability and processability, becoming the mainstay of transport packaging.
The printing quality of corrugated board not only determines the appearance quality of the board itself, but also influences the sales prospects of the packaged products and the image of the producing enterprise, so improving the surface printing effect of corrugated board is of great significance.
In the current printing production of corrugated cartons, quality problems in which the printed colors deviate from the standard colors often occur due to factors such as process level and raw material quality, mainly manifested as one or more printing colors being too light or too dark, which seriously affects the color reproduction of the product and the user experience.
To detect and control such color deviation, the printed matter must undergo grading and color separation processing. Existing grading color separation methods are mainly based on computing differences in color values between images. Such methods are computationally expensive and can hardly meet the demand for rapid quality inspection on the production line; moreover, they cannot judge the degree of color difference as accurately as the human eye, so the color separation results are not accurate enough.
Existing grading color separation systems therefore need to be improved to increase color separation efficiency and accuracy.
Disclosure of Invention
In order to solve the above technical problems, a corrugated case grading and color separation system based on a stable diffusion model is provided, which improves color separation calculation efficiency and accuracy.
To achieve the above purpose, the technical scheme adopted by the invention is as follows: a corrugated case grading and color separation system based on a stable diffusion model comprises a light source mapping module, a semantic guidance module and a grading color separation module, wherein:
The light source mapping module maps images I_other captured under different illumination to image feature information f_standard under the standard illumination condition;
The semantic guidance module performs cross-attention calculation between the image feature information f_standard and the semantic text feature f_text to generate a semantic image attention map f_it;
The grading color separation module computes the channel and spatial attention of the semantic image attention map f_it through an efficient channel-spatial attention mechanism to obtain the channel-spatial attention map f_final, and f_final is subjected to metric learning to obtain the final grading color separation result category r.
Preferably, the light source mapping module includes a non-standard light source feature extraction block, a standard light source feature extraction block and a standard light source mapping module.
The non-standard light source feature extraction block consists of two convolution layers with kernel sizes K1=5 and K2=3, denoted G1;
The standard light source feature extraction block consists of stacked residual blocks, denoted G2;
The standard light source mapping module consists of three convolution layers with kernel sizes K'1=3, K'2=3 and K'3=5, denoted G3;
The images I_other under different illumination are thereby mapped to the image feature information f_standard under the standard illumination condition.
Preferably, the semantic guidance module comprises a text sequence construction module, a CLIP text encoder, an MLP text adapter, a denoising U-shaped network module and a feature extraction module;
The text sequence construction module constructs a CMYK text sequence τ containing the standard colors and possible deviation colors:
τ = {"Standard Cyan", "Standard Magenta", "Standard Yellow", "Standard Black", "Cyan shallow", "Cyan deep", "Magenta shallow", "Magenta deep", "Yellow shallow", "Yellow deep", "Black shallow", "Black deep"};
the CLIP text encoder acquires semantic features c of a text sequence tau;
the MLP text adapter semantic feature c is refined to obtain a text feature c', and the mathematical model of the text feature is as follows:
c←CLIP(τ)
c′←c+λMLP(c);;
the image standard light source characteristic f standard is simultaneously input into the denoising U-shaped network by combining with the text characteristic c', and cross attention attempt is obtained through semantic guidance of the text characteristic
The cross attention attemptThe multi-scale feature is obtained by extracting multi-scale features through the feature extraction module by three extended convolution layers F multiscale with the convolution kernel sizes of K 1、K2 and K 3 and the expansion rates of r 1、r2 and r 3 respectivelyFinally, f cross and f multiscale are spliced to obtain semantic and image attention force diagramAnd inputting the data to the grading color separation module.
Preferably, the hierarchical color separation module comprises a spatial attention calculation unit, a channel attention calculation unit and a similarity measurement unit,
The spatial attention calculation unit performs max pooling MaxPool and average pooling AvgPool on the input feature f_it along H and W respectively, then applies two 1D convolution layers Conv1D with kernel size K=2, and fixes the convolved values between [0,1] with a Sigmoid, denoted σ, obtaining the spatial attention map f_s.
The channel attention calculation unit takes the maximum and average of f_it over the channels of each feature point to obtain the channel attention map f_c:
f_c = σ(MaxPool(f_it), AvgPool(f_it));
The two results are then stacked, the channel number is adjusted by a convolution with kernel size 1, and a fused feature map f_sum is obtained through the Sigmoid; finally, the weights f_sum are multiplied with the input semantic image attention map f_it to obtain the result f_final.
The similarity measurement unit performs similarity measurement SM on f_final to obtain the final grading color separation result histogram r, with the mathematical model:
f_sum = σ(Conv1×1(concat(f_s, f_c)))
f_final = f_sum × f_it
r = SM(f_final).
Preferably, the process by which the light source mapping module collects the different illumination images I_other is as follows:
The light source mapping module acquires images shot under different light sources to obtain an image set I_total containing M images;
The M images are divided into a standard light source image set I_L1 and a non-standard light source image set I_L2;
The standard light source image set I_L1 is input to G1 of the light source mapping module, and the non-standard light source image set I_L2 is input to G2 of the light source mapping module, to train a mapping of images from non-standard light sources to standard light sources;
Meanwhile, a text prompt is constructed for each picture of the image set I_total to obtain a text prompt set {t_1, t_2, …, t_n}; text-image pairs [I_i, t_i] form a data set, which is used for training the semantic guidance module and the grading color separation module.
Preferably, for the light source mapping module, the loss function is the standard conditional GAN objective:
L_map = E_{x,y}[log D(x|y)] + E_{z,y}[log(1 − D(G(z|y)|y))]
wherein D(x|y) represents the discrimination of a real picture by the discriminator D given the standard light source image information y;
D(G(z|y)|y) reflects the quality of the image generated by the generator G given y as the standard light source condition.
Preferably, for the semantic guidance module, the loss function is:
L_guidance = μ·MSE(f_standard, f_cross) + ν·MSE(c, f_cross)
where MSE represents the mean square error and μ and ν are weights.
Preferably, in the grading color separation module, let the mean of the training sample vectors of each class be m; the centroid of the class-k training samples is then expressed as:
m_k = (1/|S_k|) Σ_{x_i ∈ S_k} x_i
wherein S_k is the support set of samples labeled with class k.
The beneficial technical effects of the invention are as follows: the light source mapping module maps the image features, the semantic guidance module generates the semantic image attention map, and the grading color separation module obtains the final grading color separation result category through efficient channel-spatial attention calculation and metric learning, so that color separation efficiency and accuracy are greatly improved.
Drawings
Fig. 1 is a schematic block diagram of a corrugated case grading and color separation system based on a stable diffusion model.
Fig. 2 is a schematic diagram of a light source mapping module according to the present invention.
FIG. 3 is a schematic diagram of a semantic guidance module according to the present invention.
FIG. 4 is a schematic diagram of a hierarchical color separation module according to the present invention.
Detailed Description
The present invention will be further described in detail with reference to the following examples, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent, but the scope of the present invention is not limited to the following specific examples.
The main aim of the invention is to construct a CMYK text sequence containing the standard colors and possible deviation colors, and to apply the text-to-image semantic guidance design so as to focus the attention of the grading color separation model on the key areas where color differences occur, thereby greatly improving color separation efficiency and accuracy. The scheme can rapidly judge which printing colors deviate during production, as well as the type and degree of the deviation, providing a basis for subsequent automatic color correction and quality optimization.
Specifically, one embodiment of the present invention is as follows:
As shown in fig. 1-4, a corrugated carton grading and color separation system based on a stable diffusion model comprises a light source mapping module 101, a semantic guidance module 102 and a grading and color separation module 103.
The functions of the functional modules are as follows:
The light source mapping module 101 maps images I_other captured under different illumination to image feature information f_standard under the standard illumination condition;
The semantic guidance module 102 performs cross-attention calculation between the image feature information f_standard and the semantic text feature f_text to generate a semantic image attention map f_it;
The grading color separation module 103 computes the channel and spatial attention of the semantic image attention map f_it through an efficient channel-spatial attention mechanism to obtain the channel-spatial attention map f_final, and f_final is subjected to metric learning to obtain the final grading color separation result category r.
The light source mapping module 101 adopts a pre-trained conditional GAN (conditional generative adversarial network), wherein the generator G is an auto-encoder with an encoding-decoding structure.
Specifically, the light source mapping module includes a non-standard light source feature extraction block 201, a standard light source feature extraction block 202 and a standard light source mapping module 203.
The non-standard light source feature extraction block 201 consists of two convolution layers with kernel sizes K1=5 and K2=3, denoted G1;
The standard light source feature extraction block 202 consists of stacked residual blocks, denoted G2;
The standard light source mapping module 203 consists of three convolution layers with kernel sizes K'1=3, K'2=3 and K'3=5, denoted G3;
The images I_other under different illumination are thereby mapped to the image feature information f_standard under the standard illumination condition.
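As an illustration, the generator structure described above (G1: two convolutions with K1=5 and K2=3; G2: stacked residual blocks; G3: three convolutions with K'1=3, K'2=3, K'3=5) can be sketched in PyTorch as follows. The channel widths, the number of residual blocks and the exact wiring of G1, G2 and G3 are assumptions, since the patent does not fully specify them:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block of the standard light source feature extractor G2."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class LightSourceMapper(nn.Module):
    """G1: non-standard feature extraction (convs K=5, K=3);
    G2: standard feature extraction (stacked residual blocks);
    G3: mapping head (convs K=3, 3, 5). Widths/wiring are assumptions."""
    def __init__(self, ch=32):
        super().__init__()
        self.g1 = nn.Sequential(
            nn.Conv2d(3, ch, 5, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.g2 = nn.Sequential(*[ResidualBlock(ch) for _ in range(2)])
        self.g3 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, padding=2))
    def forward(self, i_other):
        # f_standard = G3(G2(G1(I_other))): non-standard image -> standard-light features
        return self.g3(self.g2(self.g1(i_other)))

f_standard = LightSourceMapper()(torch.randn(1, 3, 64, 64))
print(f_standard.shape)  # torch.Size([1, 32, 64, 64])
```

All convolutions use "same" padding here so that spatial resolution is preserved end to end, which matches the module's role of producing a feature map aligned with the input image.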
The process by which the light source mapping module 101 collects the different illumination images I_other is as follows:
The light source mapping module 101 collects images shot under different light sources to obtain an image set I_total containing M images;
The M images are divided into a standard light source image set I_L1 and a non-standard light source image set I_L2;
The standard light source image set I_L1 is input to the non-standard light source feature extraction block G1 of the light source mapping module, and the non-standard light source image set I_L2 is input to the standard light source feature extraction block G2, to train a mapping of images from non-standard light sources to standard light sources;
Meanwhile, a text prompt is constructed for each picture of the image set I_total to obtain a text prompt set {t_1, t_2, …, t_n}; text-image pairs [I_i, t_i] form a data set, which is used for training the semantic guidance module and the grading color separation module.
For the light source mapping module, to guide the training of generating standard-light-source images from images under different light sources, the data set is trained with the light source mapping module loss function, formulated as the standard conditional GAN objective:
L_map = E_{x,y}[log D(x|y)] + E_{z,y}[log(1 − D(G(z|y)|y))]
wherein D(x|y) represents the discrimination of a real picture by the discriminator D given the standard light source image information y;
D(G(z|y)|y) reflects the quality of the image generated by the generator G given y as the standard light source condition.
The non-standard light source image is thus mapped, under the guidance of the standard light source features, to standard-light-source image features, providing light-source-independent feature information for the other modules.
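Numerically, the adversarial training signal described above behaves as in the following sketch. This is the generic conditional-GAN loss written for scalar discriminator probabilities, not the patent's exact formulation, and the non-saturating generator loss is an assumption:

```python
import numpy as np

def cgan_losses(d_real, d_fake):
    """Generic conditional-GAN losses, assuming d_real = D(x|y) and
    d_fake = D(G(z|y)|y) are discriminator probabilities in (0, 1)."""
    eps = 1e-8  # numerical guard for log(0)
    # discriminator: reward high scores on real, low scores on generated
    loss_d = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # generator: non-saturating loss, reward fooling the discriminator
    loss_g = -np.mean(np.log(d_fake + eps))
    return loss_d, loss_g

ld, lg = cgan_losses(np.array([0.9, 0.8]), np.array([0.2, 0.1]))
```

When the discriminator confidently separates real standard-light images from generated ones, loss_d approaches zero while loss_g grows, pushing G1/G2/G3 toward more convincing standard-light features.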
The semantic guidance module comprises a text sequence construction module, a CLIP (Contrastive Language-Image Pre-training) text encoder, an MLP (Multilayer Perceptron) text adapter, a denoising U-shaped network module and a feature extraction module;
The text sequence construction module constructs a CMYK text sequence τ containing the standard colors and possible deviation colors:
τ = {"Standard Cyan", "Standard Magenta", "Standard Yellow", "Standard Black", "Cyan shallow", "Cyan deep", "Magenta shallow", "Magenta deep", "Yellow shallow", "Yellow deep", "Black shallow", "Black deep"};
the text sequence τ contains a plurality of category labels;
the CLIP text encoder acquires semantic features c of a text sequence tau;
The MLP text adapter refines the semantic feature c to obtain the text feature c′, with the mathematical model:
c ← CLIP(τ)
c′ ← c + λ·MLP(c);
The image standard light source feature f_standard is input, together with the text feature c′, into the denoising U-shaped network DU (DU denotes Denoising U-Net), and a cross-attention map f_cross is obtained through semantic guidance of the text feature;
The multi-scale features f_multiscale are extracted from f_standard by the feature extraction module through three dilated convolution layers F_multiscale with kernel sizes K1, K2, K3 and dilation rates r1, r2, r3; finally, the cross-attention map f_cross and the multi-scale features f_multiscale are concatenated to obtain the semantic image attention map f_it, which is input to the grading color separation module.
The mathematical model of the whole process above is expressed as:
f_cross = DU(f_standard, c′)
f_multiscale = F_multiscale(f_standard)
f_it = concat(f_cross, f_multiscale).
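The feature-combination step above (the dilated multi-scale branches F_multiscale and the concatenation f_it = concat(f_cross, f_multiscale)) might look like the following PyTorch sketch. The kernel sizes, dilation rates and channel width are illustrative assumptions, and the denoising U-Net producing f_cross is stubbed with a random tensor:

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Three dilated conv branches (kernel sizes K1..K3, dilation rates
    r1..r3); the concrete values below are illustrative assumptions."""
    def __init__(self, ch=32, ks=(3, 3, 3), rates=(1, 2, 4)):
        super().__init__()
        # padding = r * (k // 2) keeps spatial size for odd kernels
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=r * (k // 2), dilation=r)
            for k, r in zip(ks, rates))
    def forward(self, f_standard):
        # f_multiscale: concatenation of all dilated branches
        return torch.cat([b(f_standard) for b in self.branches], dim=1)

f_standard = torch.randn(1, 32, 16, 16)
f_cross = torch.randn(1, 32, 16, 16)          # stand-in for DU(f_standard, c')
f_multiscale = MultiScaleExtractor()(f_standard)
# f_it = concat(f_cross, f_multiscale)
f_it = torch.cat([f_cross, f_multiscale], dim=1)
print(f_it.shape)  # torch.Size([1, 128, 16, 16])
```

The growing dilation rates widen the receptive field without extra parameters, which is why the module can pick up both local ink variations and larger printed regions.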
The grading color separation module includes a spatial attention calculation unit 301, a channel attention calculation unit 302 and a similarity measurement unit 303.
The spatial attention calculation unit performs max pooling MaxPool and average pooling AvgPool on the input semantic image attention map f_it along H and W respectively, then applies two 1D convolution layers Conv1D with kernel size K=2, and fixes the convolved values between [0,1] with a normalized Sigmoid, denoted σ, obtaining the spatial attention map f_s.
The channel attention calculation unit takes the maximum and average of the semantic image attention map f_it over the channels of each feature point to obtain the channel attention map f_c:
f_c = σ(MaxPool(f_it), AvgPool(f_it));
The two results are then stacked, the channel number is adjusted by a convolution with kernel size 1, and a fused feature map f_sum is obtained through the normalized Sigmoid; finally, the weights f_sum are multiplied with the input semantic image attention map f_it to obtain the final result f_final.
The similarity measurement unit performs similarity measurement SM on f_final to obtain the final grading color separation result histogram r, with the mathematical model:
f_sum = σ(Conv1×1(concat(f_s, f_c)))
f_final = f_sum × f_it
r = SM(f_final).
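The pooling-sigmoid-reweighting flow of f_s, f_c, f_sum and f_final can be sketched in NumPy as follows. The 1D and 1×1 convolutions of the patent are deliberately omitted for brevity (an assumption), and the CBAM-style orientation of the two maps is also an assumption; only the pooling, the sigmoid fusion and the final reweighting f_final = f_sum × f_it are kept:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(f_it):
    """Simplified channel-spatial attention sketch.
    f_it: array of shape (C, H, W)."""
    # spatial map f_s (H, W): max/avg statistics pooled across channels
    f_s = sigmoid(f_it.max(axis=0) + f_it.mean(axis=0))
    # channel map f_c (C,): max/avg statistics pooled across positions
    f_c = sigmoid(f_it.max(axis=(1, 2)) + f_it.mean(axis=(1, 2)))
    # fused weights f_sum in (0, 1), broadcast over both axes
    f_sum = f_c[:, None, None] * f_s[None, :, :]
    # f_final = f_sum * f_it: reweight the semantic image attention map
    return f_sum * f_it

f_it = np.random.randn(8, 4, 4)
f_final = channel_spatial_attention(f_it)
```

Because every fused weight lies strictly in (0, 1), the unit can only attenuate feature responses, concentrating the downstream similarity measurement on the regions the attention maps agree on.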
For the semantic guidance module, the loss function is:
L_guidance = μ·MSE(f_standard, f_cross) + ν·MSE(c, f_cross)
where MSE represents the mean square error and μ and ν are weights.
The loss function of the semantic guidance module measures the gap between the standard light source image features produced by the light source mapping module and the cross-attention feature map obtained under the guidance of the semantic feature c, and is used to train the semantic guidance module.
In the grading color separation module, the text sequence τ has a plurality of label categories, and each training image together with its label category forms a sample vector. Let the mean of the training sample vectors of each class be m; the centroid of the class-k training samples is then expressed as:
m_k = (1/|S_k|) Σ_{x_i ∈ S_k} x_i
wherein S_k is the support set of samples labeled with class k.
Before the real training, some samples need to be tested. In the test stage, each sample may come from a different light source, so a standard light source sample is first obtained through the light source mapping module; the semantic guidance module then produces the image-text feature map corresponding to the sample; finally, the image-text feature map passes through the grading color separation module and is compared with the standard classes by similarity measurement to obtain the final result.
In the test stage, for each test sample, the similarity between the test sample and the prototypes of all classes is calculated, and the test sample is assigned to the class with the highest similarity; cosine similarity is selected as the distance function d(·). To generate a distribution over all classes, a query sample q is drawn from the query set Q, and the distribution is expressed as:
p(y = k | q) = exp(d(q, m_k)) / Σ_{k′} exp(d(q, m_{k′}))
The test sample is measured for similarity against the centroid of each category, and the results are aggregated to obtain the final measurement result. The total loss function of the metric learning stage in the similarity measurement unit is the negative log-likelihood of the correct class:
L_metric = −log p(y = k | q).
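The metric-learning step above (class centroids m_k, cosine similarity d(·), softmax distribution over classes) follows the prototypical-network recipe and can be sketched as follows; the vector dimensions and toy data are purely illustrative:

```python
import numpy as np

def centroids(support, labels, num_classes):
    """m_k: mean of the support-set vectors S_k of each class k."""
    return np.stack([support[labels == k].mean(axis=0)
                     for k in range(num_classes)])

def classify(query, protos):
    """Cosine similarity d(q, m_k) to each prototype, softmax over
    classes, argmax as the grading color separation result category r."""
    qn = query / np.linalg.norm(query)
    pn = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = pn @ qn                        # d(q, m_k) for every class k
    p = np.exp(sims) / np.exp(sims).sum() # softmax distribution
    return int(np.argmax(p)), p

# toy example: two well-separated classes (e.g. "standard" vs deviated color)
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
m = centroids(support, labels, 2)
r, probs = classify(np.array([0.95, 0.05]), m)
print(r)  # 0
```

The training loss of the metric-learning stage would then be −log probs[true_class] averaged over the query set, which pulls each class's samples toward its centroid.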
Variations and modifications to the above will be obvious to persons skilled in the art from the foregoing description and teachings. Therefore, the invention is not limited to the specific embodiments disclosed and described above, and some modifications and changes of the invention should also fall within the scope of the claims of the invention. In addition, although specific terms are used in this specification, they are for convenience of description only and do not constitute any limitation on the invention.

Claims (7)

1. A corrugated case grading and color separation system based on a stable diffusion model, characterized by comprising a light source mapping module, a semantic guidance module and a grading color separation module, wherein:
The light source mapping module maps images I_other captured under different illumination to image feature information f_standard under the standard illumination condition;
The semantic guidance module performs cross-attention calculation between the image feature information f_standard and the semantic text feature f_text to generate a semantic image attention map f_it;
The grading color separation module computes the channel and spatial attention of the semantic image attention map f_it through an efficient channel-spatial attention mechanism to obtain the channel-spatial attention map f_final, and f_final is subjected to metric learning to obtain the final grading color separation result category r.
2. The corrugated case grading and color separation system based on a stable diffusion model of claim 1, wherein the light source mapping module comprises a non-standard light source feature extraction block, a standard light source feature extraction block and a standard light source mapping module;
The non-standard light source feature extraction block consists of two convolution layers with kernel sizes K1=5 and K2=3, denoted G1;
The standard light source feature extraction block consists of stacked residual blocks, denoted G2;
The standard light source mapping module consists of three convolution layers with kernel sizes K'1=3, K'2=3 and K'3=5, denoted G3.
3. The corrugated case grading and color separation system based on a stable diffusion model of claim 2, wherein the semantic guidance module comprises a text sequence construction module, a CLIP text encoder, an MLP text adapter, a denoising U-shaped network module and a feature extraction module;
The text sequence construction module constructs a CMYK text sequence τ containing the standard colors and possible deviation colors;
The CLIP text encoder acquires the semantic features c of the text sequence τ;
The MLP text adapter refines the semantic feature c to obtain the text feature c′;
The image standard light source feature f_standard is input into the denoising U-shaped network together with the text feature c′, and a cross-attention map f_cross is obtained through semantic guidance of the text feature;
The multi-scale features f_multiscale are extracted by the feature extraction module through three dilated convolution layers F_multiscale with kernel sizes K1, K2, K3 and dilation rates r1, r2, r3; finally, the cross-attention map f_cross and the multi-scale features f_multiscale are concatenated to obtain the semantic image attention map f_it, which is input to the grading color separation module.
4. The corrugated case grading and color separation system based on a stable diffusion model of claim 3, wherein the grading color separation module comprises a spatial attention calculation unit, a channel attention calculation unit and a similarity measurement unit;
The spatial attention calculation unit performs max pooling MaxPool and average pooling AvgPool on the input feature f_it along H and W respectively, then applies two 1D convolution layers Conv1D with kernel size K=2, and fixes the convolved values between [0,1] with the normalized Sigmoid, denoted σ, obtaining the spatial attention map f_s;
The channel attention calculation unit takes the maximum and average of f_it over the channels of each feature point to obtain the channel attention map f_c;
The two results are then stacked, the channel number is adjusted by a convolution with kernel size 1, and a fused feature map f_sum is obtained through the normalized Sigmoid; finally, the fused feature map f_sum is multiplied with the input semantic image attention map f_it to obtain the result f_final;
The similarity measurement unit performs similarity measurement SM on f_final to obtain the final grading color separation result histogram r.
5. The corrugated case grading and color separation system based on a stable diffusion model of claim 4, wherein the process by which the light source mapping module collects the different illumination images I_other is as follows:
The light source mapping module acquires images shot under different light sources to obtain an image set I_total containing M images;
The M images are divided into a standard light source image set I_L1 and a non-standard light source image set I_L2;
The standard light source image set I_L1 is input to G1 of the light source mapping module, and the non-standard light source image set I_L2 is input to G2 of the light source mapping module, to train a mapping of images from non-standard light sources to standard light sources;
Meanwhile, a text prompt is constructed for each picture of the image set I_total to obtain a text prompt set {t_1, t_2, …, t_n}; text-image pairs [I_i, t_i] form a data set, which is used for training the semantic guidance module and the grading color separation module.
6. The corrugated case grading and color separation system based on a stable diffusion model of claim 5, wherein for the light source mapping module, the loss function is the standard conditional GAN objective:
L_map = E_{x,y}[log D(x|y)] + E_{z,y}[log(1 − D(G(z|y)|y))]
wherein D(x|y) represents the discrimination of a real picture by the discriminator D given the standard light source image information y;
D(G(z|y)|y) reflects the quality of the image generated by the generator G given y as the standard light source condition.
7. The corrugated case grading and color separation system based on a stable diffusion model of claim 3, wherein for the semantic guidance module, the loss function is:
L_guidance = μ·MSE(f_standard, f_cross) + ν·MSE(c, f_cross)
where MSE represents the mean square error and μ and ν are weights.
CN202410239409.2A 2024-03-04 2024-03-04 Corrugated case grading and color separation system based on stable diffusion model Pending CN118334404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410239409.2A CN118334404A (en) 2024-03-04 2024-03-04 Corrugated case grading and color separation system based on stable diffusion model

Publications (1)

Publication Number Publication Date
CN118334404A true CN118334404A (en) 2024-07-12

Family

ID=91773104

Country Status (1)

Country Link
CN (1) CN118334404A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination