CN116612167A - Texture splicing method for removing defects of solid wood sawn timber - Google Patents

Texture splicing method for removing defects of solid wood sawn timber

Info

Publication number
CN116612167A
CN116612167A (application number CN202310533045.4A)
Authority
CN
China
Prior art keywords
image
model
texture
solid wood
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310533045.4A
Other languages
Chinese (zh)
Inventor
张怡卓
于慧伶
刘星宇
赵艳江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University
Priority to CN202310533045.4A
Publication of CN116612167A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/49Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T5/77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing


Abstract

The invention relates to the technical field of image processing, and in particular to a texture splicing method for removing defects from solid wood sawn timber. The method introduces the asymmetric encoder of the MAE (Masked Autoencoder) into the ViT model to reduce the computational cost of the model; constructs an RSwin decoder with multi-scale characteristics, which adjusts the size of the divided image blocks and completes the restoration of the image blocks from coarse to fine; and assigns different weights to the unmasked area through an L2 loss function weighted by distance from the defect center, making full use of the valid pixels to repair the missing area. The method solves the problem that existing Transformer models incur excessive computation by using a global attention mechanism, and the problem that ViT's strategy of dividing an image into several non-overlapping image blocks cannot perform pixel-level modeling in image restoration tasks, so that when the ViT model is used as a decoder the restoration result shows obvious block seams and a pronounced edge effect.

Description

Texture splicing method for removing defects of solid wood sawn timber
Technical Field
The invention relates to the technical field of image processing, in particular to a texture splicing method for removing defects of solid wood sawn timber.
Background
During the growth and processing of wood, various flaws such as knots, dead knots, wormholes and saw damage arise; these affect the physical properties of wood products, such as their mechanical properties, impair the appearance of the products and lower their grade. Defects in wood are an important criterion for evaluating its quality and commercial value. With the rapid development of sensor and computer technology, nondestructive inspection techniques such as laser, infrared and machine vision have gradually been applied to the field of wood inspection. Existing research focuses on the detection and localization of flaws, but in the splicing of solid wood veneers after the flaws have been localized, the visual consistency of the wood texture after splicing is rarely considered. Image restoration technology is therefore introduced to generate texture for the defective part of the wood; at a later stage, a matching algorithm finds wood whose texture is similar to that of the defect area for splicing, thereby improving the use value of the wood.
The difficulty of image restoration is to maintain the consistency of the image's texture structure while generating fine, realistic texture in the missing part. The main approaches to the current image generation task include the variational autoencoder (VAE), the denoising diffusion probabilistic model (DDPM) and the generative adversarial network (GAN). In recent years, generating the texture of the missing part of an image with combined GAN+CNN methods has been the main research direction of image restoration. However, when applied to board defect textures, existing image restoration networks suffer from obvious edge effects and incoherent visual texture, leading to low restoration accuracy and long model training times; the irregularity and complexity of wood texture deepen the difficulty, so it is hard to achieve an ideal effect with existing models.
In recent years the Transformer network, by virtue of its strong feature extraction capability, has achieved great success in the vision field and has overcome the limited receptive field of CNN models. However, the application of the Transformer model to the vision field is still imperfect: on the one hand, the Transformer uses a global attention mechanism, so the computational cost is excessive; on the other hand, when the Vision Transformer (ViT) approach of dividing an image into several non-overlapping image blocks is used for an image restoration task, pixel-level modeling is impossible, and especially when the ViT model is used as a decoder, the restoration result shows obvious block seams and a pronounced edge effect.
Unlike other image restoration tasks, wood defect texture restoration must consider the following points: wood texture is irregular, and the texture surrounding the defect is deformed by the defect's presence, so to ensure the consistency of the repaired texture the mask area covering the defect must be larger than in other repair tasks; in addition, repairing the texture of the wood defect region prepares for later wood splicing, and the splicing is performed with regular shapes.
Disclosure of Invention
Aiming at the defects of the existing method, the invention adopts the following technical scheme: the texture splicing method for removing the defects of the solid wood sawn timber comprises the following steps:
step one, acquiring wood texture images and preprocessing them with rotation, translation, mirroring and brightness transformations to construct a wood texture image training set and a validation set;
step two, feeding the training set data into an MRS-Transformer model for training.
Further, construction of the MRS-Transformer model includes:
step 21, introducing the asymmetric encoder of the MAE into the ViT model;
further, the asymmetric encoder uses a fixed masking strategy, discards the masked portion, and takes the unmasked visible blocks as input.
Further, the asymmetric encoder comprises a plurality of Transformer modules, each consisting of a multi-head self-attention layer and a fully connected layer, connected through a Norm layer and a residual connection.
Step 22, constructing an RSwin decoder with multi-scale characteristics, which adjusts the size of the divided image blocks and completes the restoration of the image blocks from coarse to fine;
further, the RSwin decoder uses the W-MSA and SW-MSA sliding-window attention mechanisms of the Swin Transformer model, with a Patch conversion layer connected after the SW-MSA.
Step 23, giving different weights to the unmasked areas through an L2 loss function weighted by distance from the defect center, making full use of the valid pixels to repair the missing area.
Further, step 23 specifically includes:
first, setting up a matrix of the same size as the original image, setting the pixels of the masked area to 0, and calculating for each unmasked pixel the Euclidean distance to the nearest masked area, to obtain a matrix template;
then taking α as the base and the matrix template as the exponent to obtain a weight matrix, and using the weight matrix as the per-pixel weight when the L2 loss is calculated.
Further, the formula of the L2 loss function is:

L_2 = mean(α^A ⊙ (I_r − I_o)^2)

where L_2 represents the L2 loss between the restored image and the original image, α^A represents the weight matrix applied element-wise, mean(·) represents averaging the loss over all pixels, I_r is the repaired image, and I_o is the original image.
The invention has the beneficial effects that:
1. An asymmetric encoder-decoder structure is designed: the encoder takes only the unmasked visible blocks as input, and the output of the encoder together with the mask blocks is input into the decoder. The encoder and decoder differ in input dimension, forming the asymmetric structure, and the mask blocks provide only position information, so the computational cost of the model encoder is reduced without losing wood image texture information;
2. A decoder with the multi-scale RSwin module is designed, which flexibly adjusts the size of the divided image blocks and repairs the wood image texture from coarse to fine in the decoding stage; using the property of Swin's shifted-window attention, the defect part can be repaired and the seam artifacts between image blocks are resolved;
3. An L2 loss weighted by distance from the defect center is proposed, giving different weights to the unmasked areas: the closer to the defect center, the larger the weight. This makes full use of the valid texture features at the edge of the wood defect, solves the problem of incoherent semantics at the junction of the repaired area and the unmasked area, and improves repair accuracy.
Drawings
FIG. 1 is a schematic diagram of repairing wood texture with the Vision Transformer model;
FIG. 2 is a schematic diagram of the defect image texture restoration steps;
FIG. 3 is a block diagram of the asymmetric encoder-decoder;
FIG. 4 is a block diagram of the RSwin decoder and the principles of the W-MSA and SW-MSA layers;
FIG. 5 shows the principle of the Patch conversion layer;
FIG. 6 is a construction diagram of the MRS-Transformer;
FIGS. 7 (a)-(f) are the original image and the images after 90° clockwise rotation, 90° counterclockwise rotation, brightness transformation, mirroring and contrast transformation, respectively;
FIG. 8 shows partial texture defect samples;
FIG. 9 shows the solid wood sawn timber original image, the cut defect image, the mask-strategy-processed image, the repaired image, and the image mapped back to its original position, respectively;
FIG. 10 shows the original image, the mask-strategy-processed image, and the images of ablation experiments A, B, C and D, respectively;
FIG. 11 shows the original image, the mask-strategy-processed image, and the results of DeepFill v2, TG-Net and MRS, respectively.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic illustrations showing only the basic structure of the invention and only those constructions relevant to it.
The texture splicing method for removing the defects of the solid wood sawn timber comprises the following steps:
vit is a model for applying a transducer to image classification, and the model is simple, good in effect, high in expandability and the like, so that researchers in many vision fields are attracted to the transducer model. In NLP, a sequence is input in a transducer, and an image in the visual field is two-dimensional, so before input, a two-dimensional image is required to be converted into a one-dimensional sequence, a method adopted by VIT is that the image is divided into a plurality of non-overlapping blocks (patches) with the same size, each patch is projected into a vector with a fixed length, then position information among the image blocks is marked by adding position codes (position embedding) to each vector, and then the vector is input into the transducer for training; the structure of the wood grain restoration using the VIT model is shown in fig. 1.
The MRS-Transformer model takes ViT as the basic framework of the image restoration model and optimizes it in three respects: the mask strategy (MASK), the encoder input form (Encode) and the decoder structure design (Decode).
Step one, acquiring wood texture images and preprocessing them with rotation, translation, mirroring and brightness transformations to construct a wood texture image training set and a validation set;
step two, feeding the training set data into the MRS-Transformer model for training.
Step 21, introducing an asymmetric encoder and decoder based on the ViT model, wherein the encoder adopts a fixed MASK strategy, discards the masked portion and takes only the unmasked visible blocks as input, so as to reduce the computational cost of the model;
MASK: image restoration tasks typically mask the image to simulate the defective area when training the network model, ensuring that the model reconstructs the complete image by learning the known information in the image. In the industrial scenario of solid wood board splicing, defective areas of the boards are removed as regular rectangles, so the MRS-Transformer model uses a rectangular mask in training. In addition, since the presence of a defect deforms the direction of the surrounding wood texture, the defect area and all surrounding deformed texture must be masked to preserve visual texture consistency, which makes the masked area far larger than in other image restoration algorithms. An asymmetric encoding-decoding model structure from the MAE is therefore introduced into the algorithm design: the encoder takes only the unmasked visible blocks as input, and the output of the encoder together with the mask blocks is input into the decoder. The encoder and decoder thus differ in input dimension, forming an asymmetric structure, so the larger the masked area, the fewer blocks the encoder receives and the less computation it requires. However, the fewer unmasked areas remain, the more difficult it is to reconstruct the texture. Repeated experiments showed that a mask rate of 0.5, i.e. a masked area of about 50% of the picture, maximizes the masked area while maintaining repair accuracy and reduces the computational cost to a minimum.
Since the size and location of wood texture defects vary, a strategy of training the model with a fixed mask size is proposed here in order to improve the accuracy of the model in repairing actual wood texture defects. The strategy is to locate the defect, determine the mask range, cut out a suitable wood texture image, expand or scale the defect image to a fixed size, input it into the model for texture restoration, and then map the repaired image back to its original position through the inverse operations, as shown in fig. 2.
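The fixed-mask pipeline of fig. 2 can be sketched, purely for illustration, roughly as follows (OpenCV-based; repair_defect, bbox and the model callable are hypothetical names, not from the patent):

```python
import cv2
import numpy as np

def repair_defect(image, bbox, model, size=256):
    """Hypothetical sketch: crop around the located defect, resize to the model's
    fixed input size, repair, then map the result back to the original position."""
    x, y, w, h = bbox                          # located defect (e.g. from a detector)
    cx, cy = x + w // 2, y + h // 2            # center the crop on the defect
    half = max(w, h)                           # window covers defect + deformed texture
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    crop = image[y0:y0 + 2 * half, x0:x0 + 2 * half]
    crop_in = cv2.resize(crop, (size, size), interpolation=cv2.INTER_CUBIC)
    repaired = model(crop_in)                  # masked repair at the fixed size
    # Inverse operation: scale back and paste into the original position.
    back = cv2.resize(repaired, (crop.shape[1], crop.shape[0]),
                      interpolation=cv2.INTER_CUBIC)
    out = image.copy()
    out[y0:y0 + back.shape[0], x0:x0 + back.shape[1]] = back
    return out
```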
The encoder consists of multiple Transformer modules; each module consists of a multi-head self-attention layer (Attention) and a fully connected layer (MLP), connected through a Norm layer and a residual connection. The multi-head self-attention mechanism is the core of the Transformer, and global modeling of the image is achieved through this mechanism. The formula of the attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)·V (1)

where Q, K and V are the query, key and value matrices respectively, and √d_k is a scaling factor.
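A minimal PyTorch sketch of formula (1), for illustration only:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V for one head."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # pairwise patch similarity
    return F.softmax(scores, dim=-1) @ v               # weighted sum of values

# Multi-head attention splits the embedding into several heads, applies the
# formula above per head and concatenates the results; torch.nn.MultiheadAttention
# packages exactly this computation.
```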
Although this mechanism achieves global relational modeling by calculating the similarity between all image block vectors, it also consumes a significant amount of computation. Because the masked blocks of the wood image cannot provide information for image restoration, feeding them into the network model only increases the computation and occupies extra memory. The invention therefore designs an asymmetric encoding-decoding structure, shown in fig. 3, in which the encoder takes only the unmasked blocks as input; after the encoder extracts features, its output is input into the decoder together with the mask blocks, and the restored image is output.
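As an illustrative sketch of this asymmetric forward pass (assuming a boolean per-patch mask and a learnable mask token; all names are hypothetical, and the encoder/decoder stand in for the ViT and RSwin stacks):

```python
import torch

def asymmetric_forward(tokens, mask, encoder, decoder, mask_token, pos_embed):
    """tokens: (B, N, D) position-encoded patch tokens; mask: (N,) bool,
    True on masked patches; mask_token: learnable (1, 1, D) parameter."""
    B, N, D = tokens.shape
    visible = tokens[:, ~mask, :]          # encoder sees only unmasked tokens
    latent = encoder(visible)              # heavy computation on ~50% of tokens

    # Rebuild the full-length sequence: encoder output at visible positions,
    # a shared mask token (position information only) at masked positions.
    full = mask_token.expand(B, N, D).clone()
    full[:, ~mask, :] = latent
    full = full + pos_embed                # positions for all N tokens
    return decoder(full)                   # decoder restores the image
```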
Step 22, designing an RSwin (RS) module decoder with multi-scale characteristics, which adjusts the size of the divided image blocks and completes the restoration of the image blocks from coarse to fine;
Decoder: because ViT takes input and produces output in the form of image blocks, an overly large image block in the decoding process degrades repair accuracy, while an overly small image block increases the computation and prolongs the repair time; moreover ViT cannot adaptively adjust the size of the divided image blocks. To address these shortcomings of the ViT design, a new decoder module, RSwin-Transformer, is designed for wood texture image restoration, as shown in fig. 4.
The module retains the sliding-window attention mechanisms (W-MSA and SW-MSA) of the Swin Transformer model, which reduce the computational complexity of the model while preserving the global modeling property. The computational costs of the ordinary attention mechanism (MSA) and the window attention mechanism with window size M (W-MSA) are respectively:

Ω(MSA) = 4hwC^2 + 2(hw)^2·C (2)

Ω(W-MSA) = 4hwC^2 + 2M^2·hwC (3)

where h and w represent the number of patch blocks along the height and width of the image, M represents the window size of the window attention, and C represents the number of channels. From the above equations it can be seen that the window attention mechanism lets the model's computational complexity grow linearly, rather than quadratically, with image size.
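Plugging the dimensions used later in this description into formulas (2) and (3) (h = w = 16 patches, C = 768; the window size M = 4 is assumed for illustration) shows the saving:

```python
def omega_msa(h, w, C):
    """Ω(MSA) = 4hwC^2 + 2(hw)^2·C — quadratic in the number of patches h·w."""
    return 4 * h * w * C**2 + 2 * (h * w)**2 * C

def omega_wmsa(h, w, C, M):
    """Ω(W-MSA) = 4hwC^2 + 2M^2·hwC — linear in the number of patches."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

print(omega_msa(16, 16, 768))      # ≈ 7.05e8 operations
print(omega_wmsa(16, 16, 768, 4))  # ≈ 6.10e8: the attention term shrinks from
                                   # ≈ 1.0e8 to ≈ 6.3e6, and the gap widens
                                   # rapidly as the number of patches h·w grows
```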
The RSwin module improves Swin's multi-scale design by placing a Patch conversion layer in the decoder. Its principle is shown in fig. 5: as the network deepens, the output size is doubled, the channel dimension is reduced and the image dimension is enlarged, so the basic block size in the window attention shrinks; this effectively alleviates the seam artifacts between repaired image blocks and improves repair accuracy.
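A minimal sketch of such a Patch conversion layer, modeled on the patch-expanding layers used in Swin-style decoders (the class name and the 2× expansion per layer are assumptions, not the patent's exact design):

```python
import torch.nn as nn

class PatchConversion(nn.Module):
    """Trade channel depth for spatial resolution: the inverse of Swin's patch merging."""
    def __init__(self, dim):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim)   # 2*dim = 2x2 sub-blocks of dim/2 channels
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x, H, W):                 # x: (B, H*W, dim)
        B, L, C = x.shape
        x = self.expand(x)                      # (B, H*W, 2*dim)
        x = x.view(B, H, W, 2 * C)
        # Rearrange 2x2 groups of channels into a 2x finer spatial grid.
        x = x.view(B, H, W, 2, 2, C // 2).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, 2 * H * 2 * W, C // 2) # (B, 4*H*W, dim/2)
        return self.norm(x)
```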
As shown in fig. 6, on a solid wood board image with defects, an image of suitable size is cut out centered on the defect according to the masking strategy and normalized to a standard size that can be input to the model. The image is then divided into non-overlapping blocks and the defect blocks at the center are masked. The unmasked blocks are turned into feature vectors by a linear mapping layer, added to position encodings and input into the encoder, which consists of 12 ViT layers. The features output by the encoder are input into the decoder together with the vectors mapped from the initially masked blocks; the decoder consists of 8 RSwin layers and outputs the restored wood texture image, which is mapped back to the corresponding position in the original image to give the final result.
Step 23, giving different weights to the unmasked areas through an L2 loss function weighted by distance from the defect center, so that the model makes full use of the valid pixels to repair the missing area.
Weighted L2 loss: conventional image restoration losses mostly compute the loss over the defect part alone or over the whole image, which leads to semantic inconsistency at the junction between the repaired defect and the non-defect region; valid pixels closer to the defect contribute more to the repair, and those farther away contribute less.
The MRS-Transformer model designs an L2 loss weighted by distance from the missing center. First, a matrix of the same size as the original image is set up, the pixels of the masked area are set to 0, and for each unmasked pixel the Euclidean distance to the nearest masked area is calculated, giving a matrix template. Taking α as the base (α is set to 0.999) and the matrix template as the exponent yields a weight matrix with values in the range [0, 1], which is used as the per-pixel weight when computing the L2 loss: the weight of the defect part of the matrix is 1, pixels close to the defect have weights near 1, and the weight decreases with distance from the defect.
The formula is as follows:

L_2 = mean(α^A ⊙ (I_r − I_o)^2)

where L_2 represents the L2 loss between the restored image and the original image, α^A represents the weight matrix, ⊙ denotes element-wise multiplication, mean(·) represents averaging the loss over all pixels, I_r is the repaired image, and I_o is the original image.
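For illustration, this weighted loss can be sketched with SciPy's Euclidean distance transform (names and the boolean-mask convention are assumptions):

```python
import torch
from scipy.ndimage import distance_transform_edt

def weighted_l2_loss(repaired, original, mask, alpha=0.999):
    """Distance-weighted L2 loss. `mask` is a boolean (H, W) array, True on the
    masked/defect region; `repaired`/`original` are tensors broadcastable to (H, W)."""
    # Distance from each unmasked pixel to the nearest masked pixel;
    # pixels inside the mask get distance 0, hence weight alpha**0 = 1.
    dist = distance_transform_edt(~mask)
    weights = torch.as_tensor(alpha ** dist, dtype=repaired.dtype,
                              device=repaired.device)   # values in (0, 1]
    return (weights * (repaired - original) ** 2).mean()
```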
Texture generation evaluation index:
four image quality evaluation indexes MSE (mean square error), PSNR (peak signal to noise ratio), SSIM (structural similarity) and LPIPS (learned perceptual image patch similarity) and model calculation amount GFLOPs were adopted as quantitative observation data. MSE is the mean of the square of the difference between original image X and the generated image Y
Wherein H represents the high of the image, Y represents the wide of the image, the MSE value range is [0,1], and the smaller the value is, the smaller the image distortion is.
PSNR (Peak Signal-to-Noise Ratio) is a full-reference image quality assessment index:

PSNR = 10·log10((2^n − 1)^2 / MSE)

where n is the color depth of each pixel in the image, here taken as 8. PSNR is given in dB, and a larger value indicates less distortion of the image.
SSIM (structural similarity) is also a full-reference image quality evaluation index, which measures image similarity from three aspects: luminance, contrast and structure;

SSIM(x, y) = l(x, y)·c(x, y)·s(x, y) (7)

where l(x, y) represents luminance, c(x, y) represents contrast and s(x, y) represents structure. The SSIM range is [0, 1], and the larger the value, the less the image distortion.
LPIPS (Learned Perceptual Image Patch Similarity) is a learned full-reference image quality evaluation index that matches human perception better than methods such as MSE, PSNR and SSIM. The lower the LPIPS value, the more similar the two images; conversely, the greater the difference.

Given a reference block x from the real image and a distorted block x_0 from the noisy image, the perceptual similarity metric is:

d(x, x_0) = Σ_l (1/(H_l·W_l)) Σ_{h,w} ||w_l ⊙ (ŷ_hw^l − ŷ_0hw^l)||_2^2

where d is the distance between x_0 and x. Feature stacks ŷ^l are extracted from the feature layers of a network and unit-normalized in the channel dimension; the vector w_l scales the active channels, and the L2 distance is computed over the channels, averaged over space and summed over the layers. When w_l = 1, this is equivalent to computing cosine similarity.
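The four indexes can be computed, for illustration, with scikit-image and the lpips package (assuming uint8 RGB inputs of identical shape):

```python
import numpy as np
import torch
import lpips                                   # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

loss_fn = lpips.LPIPS(net='alex')              # learned perceptual metric

def evaluate(original, repaired):
    """Compute MSE, PSNR, SSIM and LPIPS for two uint8 RGB images (H, W, 3)."""
    x = original.astype(np.float32) / 255.0
    y = repaired.astype(np.float32) / 255.0
    mse = float(np.mean((x - y) ** 2))         # on [0,1] images, MSE lies in [0,1]
    psnr = peak_signal_noise_ratio(original, repaired, data_range=255)
    ssim = structural_similarity(original, repaired, channel_axis=-1, data_range=255)
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None] * 2 - 1
    with torch.no_grad():                      # LPIPS expects (1,3,H,W) in [-1,1]
        lp = loss_fn(to_t(x), to_t(y)).item()
    return {'MSE': mse, 'PSNR': psnr, 'SSIM': ssim, 'LPIPS': lp}
```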
Data acquisition:
experiments 3000 wood grain images, 500 of which were defective, were acquired using an OscarF810CIRF industrial camera. The 2500 defect-free images are expanded to 10000 through data addition methods such as rotation, translation, mirror image, brightness conversion and the like, and the images are processed according to 8: the scale of 2 is divided into training and validation sets. To increase the model training speed and recognition speed, the high-resolution images are uniformly processed to 256×256 pixels. Some of the samples are shown in FIGS. 7 (a) - (f).
The defective wood texture pictures collected in the data set vary in size and defect position; the defect is placed at the center of the cut sample, which is uniformly processed to 256×256 pixels. Part of the processed samples are shown in fig. 8.
Experimental environment and key parameters:
the experimental environment is as follows: the system is Ubuntu20.4; the deep learning framework is Pytorch; the GPU is Tesla V100; the running memory is 32G. Main parameters of the model: the BatchSize was 32, the training process used an Adam optimizer with a learning rate of 1.8e-4, and the co-training 1200epoch model converged to the optimal state.
The image size (ImageSize) used for the experiments is unified at 256×256 pixels. In the encoding stage the picture is divided into non-overlapping image blocks with a patch size (PatchSize) of 16×16 pixels, giving N = 256 image blocks in total. The mapping dimension of each image block (encoding_embedded_dim) is 768, equal to the total number of pixel values in the three channels of one image block. The encoder uses a ViT model with a depth of 12 layers. The decoder uses 2 groups of RSwin modules, each with a depth of 4 layers; after each group of RSwin one Patch conversion layer is applied, so the size of the divided image blocks is reduced by a factor of 4, the image size is enlarged by a factor of 4, and the number of channels is reduced by a factor of 16.
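For reference, the key hyper-parameters above can be collected in a single configuration sketch (the field names are assumptions; the values are those stated in this description):

```python
from dataclasses import dataclass

@dataclass
class MRSTransformerConfig:
    image_size: int = 256          # input images are 256 x 256 pixels
    patch_size: int = 16           # 16 x 16 patches -> (256 // 16) ** 2 = 256 tokens
    embed_dim: int = 768           # 16 * 16 * 3 pixel values per patch
    encoder_depth: int = 12        # ViT encoder layers
    decoder_groups: int = 2        # RSwin groups in the decoder
    decoder_group_depth: int = 4   # layers per RSwin group (8 in total)
    mask_ratio: float = 0.5        # about 50% of the picture is masked
    batch_size: int = 32
    lr: float = 1.8e-4             # Adam optimizer learning rate
    epochs: int = 1200
```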
Solid wood plate defect texture generation experiment
To test the repair effect of the model on real defective wood samples, a texture generation experiment on the defect removal area was carried out. First, the defect position is determined on the original image (original) of the solid wood sawn timber; then the cutting range is determined according to the size of the defect and the deformation of the texture; the cut defect image is processed to the corresponding size (input) and fed into the MRS-Transformer model (Mask denotes the mask), giving a repaired image (output), which is then mapped back to its original position in the image (inpaint). As shown in fig. 9, the wood texture generated by the MRS-Transformer model is natural and coherent, which meets the requirement of visual texture consistency for splicing solid wood veneers; at a later stage a texture matching algorithm can find solid wood boards with similar, suitable textures in a database for splicing.
Ablation experiments
To demonstrate the effectiveness of the MRS-Transformer model, ablation experiments were designed.
Group A trains on the data set using the ViT model directly as both encoder and decoder, with L2 loss as the loss function; the encoder has 12 layers and the decoder 8 layers.
Group B introduces the asymmetric encoder-decoder structure on the basis of group A; the encoder takes only the unmasked blocks as input, L2 loss is used as the loss function, and the decoder structure is unchanged.
Group C replaces the decoder with the proposed RSwin structure on the basis of group B, again using L2 loss as the loss function; encoder 12 layers, decoder 8 layers.
Group D is the model of the invention: on the basis of group C, the weighted L2 loss is used as the loss function; the encoder and decoder are the same as in group C.
The experimental results are shown in Table 1; model quality is measured from several angles, including the four image evaluation indexes and the model computation. It can be seen that after group B improves the encoder input mode, the three indexes MSE, PSNR and SSIM show no obvious change and the LPIPS index is worse than that of group A, but the model computation is reduced by 28.3%. After group C redesigns the RSwin module as the decoder, the other three indexes again change little, while the LPIPS index drops by 21%, the detail of the generated images improves from the perspective of the human eye, and the computation is reduced by a further 26.3%. Using the weighted L2 loss on the basis of group C improves the repair accuracy at the edge of the missing wood area: MSE and LPIPS are reduced by 51.7% and 34.2% respectively compared with group C, and PSNR and SSIM are 12.2% and 7.5% higher respectively.
Table 1 ablation experiments
Besides the objective indexes, visual images from each improvement stage are selected as subjective references to make the experimental results more intuitive. As shown in fig. 10, comparing the repaired pictures of the experiments, it can be clearly observed that the result of the full model (group D) is closest to the original picture in texture structure and semantic information; after the RSwin structure is introduced on top of the asymmetric encoding-decoding structure, the blocky seam artifacts are obviously reduced and the repaired texture is more coherent and finer; after the weighted L2 loss is introduced, the blocky seams essentially disappear, the semantics at the edge junctions are coherent, and the texture transitions are more natural.
Comparative experiments
To verify the superiority of the model, the invention compares it on the solid wood veneer data set with two image restoration models that support fixed-size mask restoration, DeepFill v2 and TG-Net. Both DeepFill v2 and TG-Net are two-stage repair methods based on the GAN model: the outline of the missing region is repaired in a coarse generation network, and the output of the coarse stage is fed into a fine generation network to produce a finer result. The DeepFill v2 model proposes a gated convolution operation, which solves the problem that ordinary convolution treats all pixels as valid; by providing a learnable dynamic feature selection mechanism for each channel at each spatial position in all layers, it generalizes partial convolution, achieves a better repair effect and obtains leading results on several image restoration tasks. The TG-Net model also studies the removal of texture defects from solid wood sawn timber, proposing to normalize the foreground and background separately to improve texture generation in the missing area. The experimental results are shown in Table 2: the invention obtains 0.0003, 40.1233, 0.154 and 0.9173 on the four indexes MSE, PSNR, LPIPS and SSIM. Compared with DeepFill v2 and TG-Net, the MSE index is reduced by 47.0% and 66.9% and the LPIPS index by 60.6% and 42.5%, while the PSNR index is 16.1% and 26.2% higher and the SSIM index 7.3% and 5.8% higher, respectively. Besides the objective indexes, the model needs only 0.05 s to repair a single wood texture picture, nearly 5 times faster than the other two algorithms.
Table 2 comparative experiments
The invention also makes a visual comparison of the models' repair results, as shown in fig. 11; to ensure the authenticity of the experimental results, none of the repaired images received any post-processing. The comparison shows that the module of the invention generates the missing-part texture of the solid wood board better and achieves a very good result in semantic consistency at the edge of the missing area, whereas the other two models can neither remove the artifacts at the edge nor maintain semantic consistency at the junction, and the wood texture they generate is not coherent enough.
As can be seen from the experimental results in fig. 11, the MRS-Transformer model of the invention is superior to the DeepFill v2 and TG-Net models in both objective indexes and visual effect. The mask area in the real sawn timber texture repair task is large, which makes image generation difficult; the invention optimizes specifically for this problem, so its repair results are better than those of the other two models. In addition, although the gated convolution proposed by DeepFill v2 gives the network dynamic feature selection and optimizes the effect of partial convolution, its ability to learn valid features and to model long-range context is still weaker than that of a Transformer model, so its final repair effect is also inferior to the MRS-Transformer model. TG-Net, likewise a model for wood texture repair, proposes a foreground-background normalization method to increase the weight of mask-region features in the model and can generate the texture features of the missing part better, but in this way the generated texture is not coherent with the background region and color differences easily appear. The model proposed by the invention, through the L2 loss weighted by distance from the missing center, has a gradual transition in which the weight decreases from the mask region to the background region, so the generated texture features are more coherent.
Taking the above-described preferred embodiments of the present invention as illustrations, persons skilled in the relevant art can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the description, but must be determined according to the scope of the claims.

Claims (7)

1. A texture splicing method for removing defects of solid wood sawn timber, characterized by comprising the following steps:
step one, acquiring wood texture images, preprocessing the images, and constructing a wood texture image training set and a validation set;
step two, feeding the training set data into an MRS-Transformer model for training.
2. The texture splicing method for removing defects of solid wood sawn timber according to claim 1, wherein the construction of the MRS-Transformer model comprises:
step 21, introducing the asymmetric encoder of the MAE into the ViT model to reduce the computational cost of the model;
step 22, constructing an RSwin decoder with multi-scale characteristics, which adjusts the size of the divided image blocks and completes the restoration of the image blocks from coarse to fine;
step 23, setting different weights for the unmasked areas through an L2 loss function weighted by distance from the defect center, and repairing the missing areas using the valid pixels.
3. The method of claim 2, wherein the asymmetric encoder uses a fixed masking strategy, discards the masked portion and takes the unmasked visible blocks as input.
4. The texture splicing method for removing defects of solid wood sawn timber according to claim 2, wherein the asymmetric encoder comprises a plurality of Transformer modules, each Transformer module comprising a multi-head self-attention layer and a fully connected layer connected through a Norm layer and a residual connection.
5. The method of claim 4, wherein the RSwin decoder uses the W-MSA and SW-MSA sliding-window attention mechanisms of the Swin Transformer model and connects the Patch conversion layer after the SW-MSA.
6. The texture splicing method for removing defects of solid wood sawn timber according to claim 2, wherein step 23 comprises:
first, setting up a matrix of the same size as the original image, setting the pixels of the masked area to 0, and calculating for each unmasked pixel the Euclidean distance to the nearest masked area to obtain a matrix template;
then taking α as the base and the matrix template as the exponent to obtain a weight matrix, and using the weight matrix as the weight of the pixel points when the L2 loss is calculated.
7. The texture splicing method for removing defects of solid wood sawn timber according to claim 2, wherein the formula of the L2 loss function is:

L_2 = mean(α^A ⊙ (I_r − I_o)^2)

where L_2 represents the L2 loss between the restored image and the original image, α^A represents the weight matrix, I_r is the repaired image, and I_o is the original image.
CN202310533045.4A 2023-05-11 2023-05-11 Texture splicing method for removing defects of solid wood sawn timber Pending CN116612167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310533045.4A CN116612167A (en) 2023-05-11 2023-05-11 Texture splicing method for removing defects of solid wood sawn timber

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310533045.4A CN116612167A (en) 2023-05-11 2023-05-11 Texture splicing method for removing defects of solid wood sawn timber

Publications (1)

Publication Number Publication Date
CN116612167A true CN116612167A (en) 2023-08-18

Family

ID=87684679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310533045.4A Pending CN116612167A (en) 2023-05-11 2023-05-11 Texture splicing method for removing defects of solid wood sawn timber

Country Status (1)

Country Link
CN (1) CN116612167A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474806A (en) * 2023-12-26 2024-01-30 松立控股集团股份有限公司 Panoramic image restoration method based on global structure coding
CN117474806B (en) * 2023-12-26 2024-04-12 松立控股集团股份有限公司 Panoramic image restoration method based on global structure coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination