CN115731198A

CN115731198A - Intelligent detection system for leather surface defects

Info

Publication number: CN115731198A
Application number: CN202211481234.3A
Authority: CN
Inventors: 陈志强; 朱启锐; 林菲; 张聪
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-03-03

Abstract

The invention is mainly used for the intelligent detection of surface defects of cow leather and sheepskin leather, such as holes, scratches, rotten surfaces and pinholes, and provides an intelligent detection system for leather surface defects, which comprises the following steps. S1: acquire an image of the whole leather in a single capture and store the acquired leather surface image on a storage medium. S2: read the acquired whole-leather image from the storage medium and detect defects on the leather surface using a trained improved YOLO model. S3, defect segmentation: using the defect positions and defect types obtained in step S2, segment the defects with a segmentation model, crop the picture, and calculate the defect shape and defect area. S4: visually display the detection result of the system based on the position, type, shape and area data of the defects obtained in steps S2 and S3, combined with the whole-leather image data acquired in step S1.

Description

Intelligent detection system for leather surface defects

Technical Field

The invention relates to the field of automatic leather treatment, in particular to an intelligent detection system for leather surface defects.

Background

Leather is a high-quality clothing raw material that does not rot easily and has good flexibility. Over the course from animal growth to the tanned finished product, leather properties vary because of individual differences such as the animal's growth environment, and defects such as wormholes, dead marks, wrinkles, scratches and uneven color are unavoidable. The shape, size, texture and color of each piece of leather, and the shape, size and position of its surface defects, are random, which poses many challenges for leather selection. Stable, reliable and effective automatic detection and identification of leather surface blemishes is both a key technology and a technical difficulty in realizing the intelligent manufacturing of leather products, and it is also an important subject in industrial inspection.

Traditionally, surface defect inspection and classification are performed by human inspectors, which becomes a potential bottleneck in the production of leather articles. A survey of enterprises in Guangdong and Zhejiang, regions of China with a well-developed leather products industry, found that many enterprises still rely on traditional manual defect inspection and classification, some have achieved a mix of semi-automated and manual work, and none has realized a truly fully automatic defect detection system. The traditional mode of leather inspection and cutting can no longer meet the development needs of modern leather enterprises, and advanced processing equipment plays an important role in promoting the development of the leather industry. Stable, reliable and effective automatic detection of leather surface defects is therefore needed. Researchers and suppliers of automatic inspection equipment have paid attention to the automatic detection of leather surface blemishes since the 1990s, and traditional image processing methods such as wavelet transforms, mathematical morphology, Gabor filtering, fuzzy clustering, edge detection and threshold-based segmentation have been widely used for leather defect detection. Nevertheless, leather surface defect detection and the corresponding machine vision technology still have many problems in practical application: the existing detection theories and methods approach the problem from different angles and offer solutions only for specific application requirements, so that some algorithms may fail once the environment or the material changes. Deep learning, which can learn features directly from the original images, is now widely applied in image processing and has achieved excellent results. Deep learning frameworks are believed to offer guidance for designing and developing new solutions for leather defect inspection.

Conventional image processing methods tend to require multiple thresholds for the various defects handled by the algorithm, and they are very sensitive to lighting conditions and background color. When a new problem arises, these thresholds need to be adjusted, or the algorithm may even need to be redesigned. The existing intelligent detection systems for leather surface defects have the following limitations. Methods based on texture structure focus on finding the texture primitives of a texture image; they are suitable only for textures with obvious structural attributes and for specific applications, and the texture density, directionality or scale directly affects the texture pattern and its structural visual effect. Most methods based on statistics are computationally expensive, and their effectiveness and stability need to be improved; although they perform well on the binary task of deciding whether a defect is present, they have difficulty distinguishing between defect types. In filter-based methods, noise from texture primitives can appear during image segmentation and affect the detection result, and the accuracy of the algorithm can drop sharply when the production environment or the leather material changes. Most methods based on traditional classifiers perform well on specific data sets but differ too much from actual industrial conditions and are not easy to apply. Some deep learning methods are already used for leather defect detection, but they cover only a single defect type, are not sufficiently integrated with the industrial production environment, still require extensive evaluation as practical industrial models, and leave considerable room for optimization and improvement. The diversity of defects considered is insufficient and the dynamic variation of leather defects is not taken into account, so it is difficult to ensure the generalization performance of these algorithms. Texture statistical feature extraction, represented by the traditional gray-level co-occurrence matrix, is computationally expensive, and the effectiveness of the extracted features is challenged by the high variability of leather surface defects.

Disclosure of Invention

The invention is mainly used for the intelligent detection of surface defects of cow leather and sheepskin leather, such as holes, scratches, rotten surfaces and pinholes, and realizes the automatic detection of these defects. The developed system has an easy-to-use front-end interactive interface and accurate leather surface detection and segmentation models, and it obtains accurate data on the category, position, shape and area of the various defects on the leather surface.

The invention consists of an ultra-high-definition whole-leather image acquisition module, a data labeling and preprocessing module, a defect detection model, defect segmentation and a user interface; the technical scheme is shown in FIG. 1. The core model used in the invention is an improved YOLO network model, shown in FIG. 2. Compared with region-based convolutional neural network models, the main advantages of the YOLO series of network models are higher detection speed and higher accuracy. Owing to these advantages, YOLO models are widely applied in a variety of object detection scenarios and achieve good evaluation scores on common data sets. The leather surface defect detection task, however, deals with target objects that are not common in everyday life, and leather surface defects have distinctive characteristics such as texture and material. Based on this prior knowledge, the YOLO model is further improved so that it can better learn the characteristics of leather surface defects, increasing both detection speed and accuracy. The technical implementation is described in detail below:

S1, high-definition single-capture whole-leather image acquisition: animal leather such as cowhide or sheepskin is placed on a disc with a blue background, an image of the whole leather is acquired in a single capture using a fixed light source and an ultra-high-definition camera, and the acquired leather surface image is stored on a reliable storage medium with sufficient capacity.

S2, the acquired whole-leather image is read from the storage medium, and the leather surface is inspected using a trained improved YOLO model.

The YOLO model is improved as follows:

The improved YOLO model includes a backbone network, a neck network and a head network. The backbone network does most of the feature extraction; its CSP1_X module, SPP module and large number of CBS modules give it a strong feature extraction capability. The neck network mainly performs feature enhancement, chiefly through an FPN + PAN structure: the FPN structure uses top-down feature fusion to handle the multi-scale variation problem in object detection, and the PAN structure added on top of the FPN introduces a bottom-up information flow. The head network performs multi-scale detection: its three detection heads operate at 8-fold, 16-fold and 32-fold down-sampling of the original image and produce three feature maps of different sizes, which are used to detect targets of different sizes. The improved YOLO network model uses five main convolution modules: the CBS module, the Focus module, the CSP1_X module, the CSP2_X module and the SPP module.

The CBS module is a computation module that combines convolution, normalization and the SiLU activation function, and it is used extensively inside the other four modules.
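The convolution, normalization and SiLU sequence of the CBS module can be sketched in NumPy as follows. This is only an illustration under simplifying assumptions: a 1 x 1 convolution stands in for the real kernel, the normalization uses the statistics of a single feature map and omits the learned scale and shift, and the names `silu`, `batch_norm` and `cbs` are hypothetical.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) map to zero mean and unit variance.
    Assumption: learned scale/shift and running statistics are omitted."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def cbs(x, weight):
    """1x1 convolution -> normalization -> SiLU.
    x has shape (C_in, H, W); weight has shape (C_out, C_in)."""
    y = np.einsum("oc,chw->ohw", weight, x)
    return silu(batch_norm(y))

rng = np.random.default_rng(0)
feature = rng.standard_normal((3, 8, 8))
out = cbs(feature, rng.standard_normal((16, 3)))
print(out.shape)  # (16, 8, 8)
```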

The Focus module is a modified down-sampling step that aims to shrink the image, reducing the dimensionality of the feature map as far as possible while retaining the original useful information; at the cost of sacrificing a small amount of data, it reduces the data volume and the computational load of the model. It is mainly applied in the backbone network of the model. Its computation is as follows: the picture data are divided into blocks of size 4 x 4, and the pixels at the same position in each block are sampled and stitched together, which reduces the picture size and achieves the effect of down-sampling. Unlike ordinary down-sampling, the number of channels increases fourfold, so the feature map is first reshaped once; the reshaped feature map then passes through one CBS operation, and the result is passed to the next layer.
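The slicing step of the Focus module can be illustrated with a minimal NumPy sketch. It assumes the behaviour of the Focus layer in the public YOLOv5 code, where every second pixel is sampled in each direction so that a (C, H, W) image becomes a (4C, H/2, W/2) tensor. This matches the fourfold channel increase stated above, although the text gives the block size as 4 x 4. The function name `focus_slice` is hypothetical, and the CBS operation that follows is omitted.

```python
import numpy as np

def focus_slice(image):
    """Space-to-depth slicing: (C, H, W) -> (4C, H/2, W/2).
    Assumes H and W are even."""
    return np.concatenate(
        [
            image[:, 0::2, 0::2],  # even rows, even columns
            image[:, 1::2, 0::2],  # odd rows, even columns
            image[:, 0::2, 1::2],  # even rows, odd columns
            image[:, 1::2, 1::2],  # odd rows, odd columns
        ],
        axis=0,
    )

image = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
sliced = focus_slice(image)
print(sliced.shape)  # (12, 2, 2)
```

No pixel is discarded by the slicing itself: the spatial size is halved while the values move into extra channels.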

The CSP1_X module splits its input into two branches so that the model can learn more picture features, and it is mainly applied in the backbone network of the model. Its computation is as follows: the input is divided into two branches; one branch passes through a CBS module followed by the convolution computation of X dense units, while the other branch passes through a CBS module only. The results of the two branches are then concatenated, the concatenated feature map passes in sequence through BN, SiLU activation and a CBS module, and the result is passed to the next layer.

The CSP2_X module is broadly similar to the CSP1_X module; the only difference is that the CSP2_X module replaces the convolution computation of the X dense units in the CSP1_X module with 2 × X CBS modules. It is mainly applied in the neck network of the model.

The SPP module converts a feature map of arbitrary size received from the previous layer into a feature map of fixed size, and it is mainly applied in the backbone network. Its computation is as follows: the input feature map passes through a CBS module, and the result is split into three branches, each of which passes through a max-pooling layer of a different scale. The three results are then concatenated so that picture features at different scales can be learned. Finally, the concatenated features pass through a CBS module again, and the result is passed to the next layer.
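The three-branch pooling of the SPP module can be sketched as follows. The kernel sizes 5, 9 and 13, the stride of 1 and the size-preserving padding are assumptions borrowed from the public YOLOv5 code, since the text does not state them; the CBS operations before and after the pooling are omitted, and the function names are hypothetical.

```python
import numpy as np

def max_pool_same(x, k):
    """Max pooling with an odd kernel k, stride 1, and padding that keeps H and W."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full(x.shape, -np.inf, dtype=x.dtype)
    # Take the running maximum over all k*k shifted views of the padded map.
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out

def spp(x, kernels=(5, 9, 13)):
    """Pool a (C, H, W) map at three scales and concatenate along the channel axis."""
    return np.concatenate([max_pool_same(x, k) for k in kernels], axis=0)

rng = np.random.default_rng(0)
feature = rng.standard_normal((2, 20, 20))
pooled = spp(feature)
print(pooled.shape)  # (6, 20, 20)
```

Under these assumptions the spatial size is unchanged and only the channel count grows, so each output position sees its neighbourhood at three different scales.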

In the CSP1_X module, dense units replace residual units. The original YOLO model extracts features with residual units, which require little computation and therefore allow fast detection. Dense units increase the amount of computation and reduce detection speed, but they yield more accurate detection results and thereby greatly improve the accuracy of the subsequent segmentation step.
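The difference between the two kinds of unit can be illustrated at the level of tensor shapes. In this sketch, which assumes DenseNet-style concatenation and uses a 1 x 1 convolution as a stand-in for the real layers, a residual unit adds its output to its input, whereas a dense unit concatenates the two so that later units see all earlier features. All function names and the growth of 4 channels per unit are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution of a (C_in, H, W) map with a (C_out, C_in) weight."""
    return np.einsum("oc,chw->ohw", weight, x)

def residual_unit(x, weight):
    """Residual connection: output = input + f(input). Channel count is unchanged."""
    return x + conv1x1(x, weight)

def dense_unit(x, weight):
    """Dense connection: output = concat(input, f(input)). Channel count grows."""
    return np.concatenate([x, conv1x1(x, weight)], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
res_out = residual_unit(x, rng.standard_normal((8, 8)))

dense_out = x
for _ in range(3):  # X = 3 stacked dense units, each adding 4 channels
    dense_out = dense_unit(dense_out, rng.standard_normal((4, dense_out.shape[0])))

print(res_out.shape, dense_out.shape)  # (8, 4, 4) (20, 4, 4)
```

The growing channel count is one way to see why dense units cost more computation than residual units.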

The improved YOLO model uses data augmentation, making full use of the computer to generate data and increase the amount of data, for example by scaling, translation, rotation and color transformation. Data augmentation increases the number of training samples, and adding a suitable amount of noisy data improves the generalization ability of the model. In addition to these basic augmentation methods, the improved YOLO model uses Mosaic data augmentation, whose main idea is to randomly crop and scale four pictures and then randomly arrange and stitch them into a single picture. This enriches the data set, increases the number of small targets, and speeds up network training.
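A simplified sketch of the Mosaic idea follows. It scales four pictures with nearest-neighbour resizing and places them in random order around a random split point. Random cropping and the remapping of the defect annotations, which a real training pipeline would need, are left out, and the helper names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) image."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def mosaic(images, size, rng):
    """Stitch four (H, W, C) images into one (size, size, C) picture."""
    assert len(images) == 4
    # Random split point, kept away from the borders so every tile is non-empty.
    cx = int(rng.integers(size // 4, 3 * size // 4))
    cy = int(rng.integers(size // 4, 3 * size // 4))
    canvas = np.zeros((size, size, images[0].shape[2]), dtype=images[0].dtype)
    regions = [(0, cy, 0, cx), (0, cy, cx, size), (cy, size, 0, cx), (cy, size, cx, size)]
    order = rng.permutation(4)  # random arrangement of the four pictures
    for idx, (y0, y1, x0, x1) in zip(order, regions):
        canvas[y0:y1, x0:x1] = resize_nearest(images[idx], y1 - y0, x1 - x0)
    return canvas

rng = np.random.default_rng(0)
tiles = [np.full((32, 48, 3), value, dtype=np.uint8) for value in (1, 2, 3, 4)]
combined = mosaic(tiles, size=64, rng=rng)
print(combined.shape)  # (64, 64, 3)
```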

S3, defect segmentation: using the defect positions and defect types obtained in step S2, the defects are segmented with a segmentation model, the picture is cropped, and the defect shape and defect area are obtained by calculation.

The detailed segmentation process is as follows:

The main model used for segmentation is the improved MSCDAE model shown in FIG. 3, which is robust and highly accurate on textured-surface data sets. MSCDAE is a defect detection and localization method whose model is trained using only defect-free samples.

The method reconstructs image patches at different Gaussian pyramid levels with a convolutional denoising autoencoder network and combines the detection results from the channels of different resolutions. The network consists of three layers, each with an encoding stage and a decoding stage: the encoding stage consists of convolutional layers and pooling layers, and the decoding stage consists of convolutional layers and up-sampling layers. Each layer processes pictures at a different resolution and applies Gaussian blur to them. To adapt the model better to this system, dense connections are added at the convolution operations.
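The multi-scale reconstruction idea can be sketched without a trained network. In the example below, a three-level Gaussian pyramid is built with a small binomial blur, a `reconstruct` callable stands in for the trained autoencoder at each level, and the per-level reconstruction errors are resized and averaged into one defect score map. The stand-in used here simply fills each level with its median value and is purely an assumption for illustration; all function names are hypothetical.

```python
import numpy as np

def blur3(img):
    """3x3 binomial (Gaussian-like) blur of a 2D image with edge padding."""
    p = np.pad(img, 1, mode="edge")
    horiz = (p[:, :-2] + 2.0 * p[:, 1:-1] + p[:, 2:]) / 4.0
    return (horiz[:-2] + 2.0 * horiz[1:-1] + horiz[2:]) / 4.0

def gaussian_pyramid(img, levels=3):
    """Blur and halve the resolution repeatedly."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(blur3(pyramid[-1])[::2, ::2])
    return pyramid

def upsample_to(img, shape):
    """Nearest-neighbour resize of a 2D map to the given (H, W)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[rows][:, cols]

def multiscale_residual(img, reconstruct, levels=3):
    """Average the per-level reconstruction error at full resolution."""
    maps = []
    for level in gaussian_pyramid(img, levels):
        residual = np.abs(level - reconstruct(level))
        maps.append(upsample_to(residual, img.shape))
    return np.mean(maps, axis=0)

# Synthetic example: flat background with one bright 4x4 "defect".
image = np.zeros((32, 32))
image[14:18, 14:18] = 1.0
# Stand-in for a model trained on defect-free samples: it can only
# reproduce the background, so defects are reconstructed poorly.
background_only = lambda patch: np.full_like(patch, np.median(patch))
score = multiscale_residual(image, background_only)
row, col = np.unravel_index(np.argmax(score), score.shape)
print(row, col)
```

Thresholding such a score map would give a defect mask; how the threshold is chosen is not described in the text.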

The invention has the following advantages: the system automatically detects leather surface defects, accurately obtains the types and positions of all defects in a leather surface picture through model computation, and accurately segments the defects according to their location and type to obtain their shapes and areas.

Drawings

FIG. 1 is an overall schematic of the present invention;

FIG. 2 is the improved YOLO network model of the present invention;

FIG. 3 is the improved MSCDAE network model of the present invention;

FIG. 4 is a system flow diagram of the present invention;

FIG. 5 is a graph of the average precision of the model's defect detection;

FIG. 6 is a plot of the model's defect detection training loss;

FIG. 7 shows the defect detection results of the model.

Detailed Description

The technical solution of the present invention is further specifically described below by using specific embodiments and with reference to the accompanying drawings.

The invention consists of an ultra-high-definition whole-leather image acquisition module, a data labeling and preprocessing module, a defect detection model, defect segmentation and a user interface; the technical scheme is shown in FIG. 1. The core model used in the invention is an improved YOLO network model, shown in FIG. 2. Compared with region-based convolutional neural network models, the main advantages of the YOLO series of network models are higher detection speed and higher accuracy. Owing to these advantages, YOLO models are widely applied in a variety of object detection scenarios and achieve good evaluation scores on common data sets. The leather surface defect detection task, however, deals with target objects that are not common in everyday life, and leather surface defects have distinctive characteristics such as texture and material. Based on this prior knowledge, the YOLO model is further improved so that it can better learn the characteristics of leather surface defects, increasing both detection speed and accuracy. The technical implementation is described in detail below:

S1, high-definition single-capture whole-leather image acquisition: animal leather such as cowhide or sheepskin is placed on a disc with a blue background, an image of the whole leather is acquired in a single capture using a fixed light source and an ultra-high-definition camera, and the acquired leather surface image is stored on a reliable storage medium with sufficient capacity.

The YOLO model is improved as follows:

The improved YOLO model includes a backbone network, a neck network and a head network. The backbone network does most of the feature extraction; its CSP1_X module, SPP module and large number of CBS modules give it a strong feature extraction capability. The neck network mainly performs feature enhancement, chiefly through an FPN + PAN structure: the FPN structure uses top-down feature fusion to handle the multi-scale variation problem in object detection, and the PAN structure added on top of the FPN introduces a bottom-up information flow. The head network performs multi-scale detection: its three detection heads operate at 8-fold, 16-fold and 32-fold down-sampling of the original image and produce three feature maps of different sizes, which are used to detect targets of different sizes. The improved YOLO network model uses five main convolution modules: the CBS module, the Focus module, the CSP1_X module, the CSP2_X module and the SPP module.
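The relationship between the three down-sampling factors of the head network and the resulting grid sizes can be shown with a short calculation. The 640 x 640 input size is an assumption chosen for illustration, since the text does not state the network input resolution, and `head_grid_sizes` is a hypothetical helper.

```python
def head_grid_sizes(height, width, strides=(8, 16, 32)):
    """Return the (rows, cols) of the detection grid for each down-sampling factor."""
    return {stride: (height // stride, width // stride) for stride in strides}

grids = head_grid_sizes(640, 640)
for stride, (rows, cols) in grids.items():
    print(f"stride {stride}: {rows} x {cols} grid")
```

The finest grid (factor 8) has the most cells and suits small defects such as pinholes, while the coarsest grid (factor 32) covers large defects.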

The Focus module is a modified down-sampling step that aims to shrink the image, reducing the dimensionality of the feature map as far as possible while retaining the original useful information; at the cost of sacrificing a small amount of data, it reduces the data volume and the computational load of the model.

The CSP1_X module splits its input into two branches so that the model can learn more picture features, and it is mainly applied in the backbone network of the model. Its computation is as follows: the input is divided into two branches; one branch passes through a CBS module followed by the convolution computation of X dense units, while the other branch passes through a CBS module only. The results of the two branches are then concatenated, the concatenated feature map passes in sequence through BN, SiLU activation and a CBS module, and the result is passed to the next layer.

In the CSP1_X module, dense units replace residual units. The original YOLO model extracts features with residual units, which require little computation and therefore allow fast detection. Dense units increase the amount of computation and reduce detection speed, but they yield more accurate detection results and thereby greatly improve the accuracy of the subsequent segmentation step.

The improved YOLO model uses data augmentation, making full use of the computer to generate data and increase the amount of data, for example by scaling, translation, rotation and color transformation. Data augmentation increases the number of training samples, and adding a suitable amount of noisy data improves the generalization ability of the model. The improved YOLO model uses both the basic augmentation methods and Mosaic data augmentation, whose main idea is to randomly crop and scale four pictures and then randomly arrange and stitch them into a single picture. This enriches the data set, increases the number of small targets, and speeds up network training.

S3, defect segmentation: using the defect positions and defect types obtained in step S2, the defects are segmented with a segmentation model, the picture is cropped, and the defect shape and defect area are obtained by calculation.
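Once a binary defect mask is available, the area calculation in step S3 reduces to counting pixels. The sketch below assumes a calibrated scale `mm_per_pixel` relating image pixels to physical length; the text does not describe such a calibration, so this parameter and the function name `defect_area` are hypothetical.

```python
import numpy as np

def defect_area(mask, mm_per_pixel):
    """Return (pixel_count, area_mm2, bounding_box) for a 2D boolean mask.
    bounding_box is (x_min, y_min, x_max, y_max) in pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0, 0.0, None
    pixels = int(mask.sum())
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return pixels, pixels * mm_per_pixel ** 2, bbox

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:7] = True  # a 3 x 4 pixel defect
pixels, area_mm2, bbox = defect_area(mask, mm_per_pixel=0.5)
print(pixels, area_mm2, bbox)  # 12 3.0 (3, 2, 6, 4)
```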

The detailed segmentation process is as follows:

The system is efficient, accurate and fast, and can detect multiple defect categories in a single pass. As shown in FIG. 4, it comprises, from bottom to top, a data acquisition layer, a deep learning model layer and a user layer:

1. Data acquisition layer: images of the leather defect types encountered in production are collected. Once enough pictures of each defect type have been collected, they are annotated with the labelme software, and the annotation files and pictures are stored on a fixed storage medium in preparation for the deep learning model layer of the system.

2. Defect detection layer: a model based on the improved YOLO needs to be trained. The training process begins with picture annotation and preprocessing: because leather surface defect pictures are difficult to collect and picture data are scarce, data augmentation is performed during preprocessing before model training, using techniques such as rotation, cropping and stitching. With the defect detection model deployed on a server, a user can identify the type and location of defects and store the related information, which serves as the input of the defect segmentation module.

3. Defect segmentation: based on the defect positions and defect types obtained by the defect detection model, a defect segmentation model extracts the features of the corresponding regions and a segmentation algorithm segments the defects, so that the defect area and the unusable area can be calculated. These serve as important information for the automatic cutting and quality grading of leather.

4. User layer: supported by the deep learning models on the server, a user can interact with the web back end through a front-end interface on a computer, a mobile phone or another multimedia device and submit a leather image for inspection. After the models have run, the types and areas of all defects in the leather image are returned.
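The annotation step of the data acquisition layer produces labelme JSON files, which must be converted into the text label format used for YOLO training. The sketch below shows one possible conversion. It assumes the common YOLO text convention of one line per object containing a class index and a normalized box (center x, center y, width, height), and it reduces each labelme polygon to its bounding box. The class names in `CLASS_IDS` and the sample annotation are hypothetical.

```python
import json

# Hypothetical label names; the real names depend on how the pictures were annotated.
CLASS_IDS = {"hole": 0, "scratch": 1, "rotten_surface": 2, "pinhole": 3}

def labelme_to_yolo(annotation, class_ids=CLASS_IDS):
    """Convert one labelme annotation dict into a list of YOLO label lines."""
    width = annotation["imageWidth"]
    height = annotation["imageHeight"]
    lines = []
    for shape in annotation["shapes"]:
        xs = [point[0] for point in shape["points"]]
        ys = [point[1] for point in shape["points"]]
        # Bounding box of the polygon, normalized by the image size.
        x_center = (min(xs) + max(xs)) / 2 / width
        y_center = (min(ys) + max(ys)) / 2 / height
        box_w = (max(xs) - min(xs)) / width
        box_h = (max(ys) - min(ys)) / height
        class_id = class_ids[shape["label"]]
        lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}")
    return lines

# Made-up annotation in the labelme JSON layout, for illustration only.
sample = json.loads("""
{
  "imageWidth": 1000,
  "imageHeight": 500,
  "shapes": [
    {"label": "scratch", "shape_type": "polygon",
     "points": [[100, 100], [300, 120], [280, 200], [120, 180]]}
  ]
}
""")
lines = labelme_to_yolo(sample)
print(lines[0])  # 1 0.200000 0.300000 0.200000 0.200000
```

The polygon itself remains available in the JSON file for training the segmentation model.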

FIG. 5 is a graph of the average precision of model defect detection: the figure shows that the per-class average precision of the model in the system increases as the number of training rounds increases. The model reaches its best validation precision at one hundred and ten training rounds, at which point the weight parameter file with the best precision is saved for model deployment.

FIG. 6 is the training loss of model defect detection: the figure shows that the loss of the model in the system gradually decreases as the number of training rounds increases. After one hundred rounds of training, the loss decreases only slowly and the model tends to converge.

FIG. 7 shows the defect detection results of the model: the figure illustrates the defect detection capability of the model, and it can be seen that the model accurately detects the type and area of each defect in the leather.
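The checkpoint selection described for FIG. 5, keeping the weights from the training round with the best validation precision, can be sketched as follows. The precision values are made up for illustration and are not taken from the patent; `best_round` is a hypothetical helper.

```python
def best_round(val_precision):
    """Return (round_number, precision) of the best validation result.
    Rounds are numbered from 1; the earliest round wins ties."""
    best_idx, best_val = None, float("-inf")
    for round_number, value in enumerate(val_precision, start=1):
        if value > best_val:
            best_idx, best_val = round_number, value
    return best_idx, best_val

# Illustrative validation precision per training round (not real results).
history = [0.20, 0.50, 0.70, 0.65]
chosen_round, chosen_precision = best_round(history)
print(chosen_round, chosen_precision)  # 3 0.7
```

In a training loop, the weight file would be written whenever a new best value is found, so that the saved file always corresponds to the best round seen so far.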

Claims

1. An intelligent detection system for leather surface defects, characterized by comprising the following steps:

step S1, acquiring a whole-leather image: placing animal leather such as cowhide or sheepskin on a disc with a blue background, acquiring an image of the whole leather in a single capture using a fixed light source and a camera, and storing the acquired leather surface image on a storage medium;

step S2, reading the acquired whole-leather image from the storage medium, performing defect detection on the leather surface using a trained improved YOLO model, and calculating the defect type and the defect position;

step S3, defect segmentation: using the defect positions and types obtained in step S2, segmenting the defects with an MSCDAE segmentation model, cropping the picture, and calculating the shapes and areas of the defects;

step S4, visually displaying the detection result of the system based on the position, type, shape and area data of the defects obtained in steps S2 and S3, combined with the whole-leather image data acquired in step S1.

2. The system of claim 1, wherein the improved YOLO model in step S2 comprises: a backbone network, a neck network and a head network;

wherein the backbone network comprises a CSP1_X module, an SPP module and a plurality of CBS modules, and is used for extracting features;

the neck network performs feature enhancement using an FPN + PAN structure, wherein the FPN structure uses top-down feature fusion to handle the multi-scale variation problem in object detection, and the PAN structure is added on top of the FPN to introduce a bottom-up information flow;

the head network performs multi-scale detection, wherein the three detection heads operate at 8-fold, 16-fold and 32-fold down-sampling of the original image and produce three feature maps of different sizes, which are used to detect targets of different sizes;

the improved YOLO network model uses five main convolution modules: the CBS module, the Focus module, the CSP1_X module, the CSP2_X module and the SPP module;

the CBS module is a computation module that combines convolution, normalization and the SiLU activation function, and is used extensively inside the other four modules;

the Focus module is a modified down-sampling step applied in the backbone network of the model, and its computation is as follows: the picture data are divided into blocks of size 4 × 4, the pixels at the same position in each block are sampled and stitched together to reduce the picture size, and the result then passes through one CBS operation before being passed to the next layer;

the CSP1_X module divides its input into two branches, and its computation is as follows: one branch passes through a CBS module followed by the convolution computation of X dense units, while the other branch passes through a CBS module only; the results of the two branches are then concatenated, the concatenated feature map passes in sequence through BN, SiLU activation and a CBS module, and the result is passed to the next layer;

the SPP module converts a feature map of arbitrary size received from the previous layer into a feature map of fixed size and is applied in the backbone network, and its computation is as follows: the input feature map passes through a CBS module, the result is split into three branches, each of which passes through a max-pooling layer of a different scale, the three results are concatenated so that picture features at different scales can be learned, and the concatenated features finally pass through a CBS module again before the result is passed to the next layer;

in the CSP1_X module, dense units are used instead of residual units;

the improved YOLO model uses the Mosaic data augmentation method.

3. The system of claim 1, wherein the MSCDAE segmentation model is an improved MSCDAE model, MSCDAE being a defect detection and localization method whose model is trained using defect-free samples;

the method reconstructs image patches at different Gaussian pyramid levels with a convolutional denoising autoencoder network and combines the detection results from the channels of different resolutions; the network consists of three layers, each with an encoding stage and a decoding stage, the encoding stage consisting of convolutional layers and pooling layers and the decoding stage consisting of convolutional layers and up-sampling layers; each layer processes pictures at a different resolution and applies Gaussian blur to them; and, to adapt the model better to the system, dense connections are added at the convolution operations.

4. The system of claim 1, wherein the training process of the improved YOLO model and the MSCDAE-based segmentation model further comprises:

annotating and preprocessing the pictures:

annotation uses the open-source labelme software, which provides a visual interface; all defects in the training leather pictures are annotated manually with polygons, and a JSON file recording the defect types and key point coordinates is generated for each picture;

two tasks are performed during preprocessing: first, the JSON annotation file of each picture is converted by code into the text annotation file required by the improved YOLO model, so as to conform to its annotation format, and the prepared picture files and label files are integrated into the code; second, initial weight files are obtained: the improved YOLO model and the improved MSCDAE model are each trained on the public PASCAL VOC data set to obtain their respective weight files, and, following the principle of transfer learning, these two weight files are used as the initial weights when the improved YOLO model and the improved MSCDAE model are trained on the leather data set;

the annotated data set is fed into the improved YOLO model for object detection training, and the detection results are then fed into the improved MSCDAE model for segmentation training, each training process being distributed training across multiple GPUs;

the trained improved YOLO model and improved MSCDAE model are obtained and deployed into the system.