CN114612472B

CN114612472B - Leather defect segmentation network algorithm based on an improved SegNet

Info

Publication number: CN114612472B
Application number: CN202210507678.3A
Authority: CN
Inventors: 黄惠玲; 韩军; 张忠良; 王愉锦
Original assignee: Quanzhou Institute of Equipment Manufacturing
Current assignee: Quanzhou Institute of Equipment Manufacturing
Priority date: 2022-05-11
Filing date: 2022-05-11
Publication date: 2022-09-13
Anticipated expiration: 2042-05-11
Also published as: CN114612472A

Abstract

The invention relates to the technical field of image data processing. It provides a feature-fusion, multi-decoding leather defect segmentation network obtained by improving the SegNet network so that it suits raw leather, which is characterized by random texture and a wide variety of defects. A feature extraction and fusion module adds fusion of shallow and deep features to the SegNet encoding network, preserving richer detail and finer context information from the leather image. A defect restoration module expands the upsampling process of the SegNet decoding network into four upsampling paths, whose outputs are finally fused and expanded for pixel-level classification.

Description

Leather defect segmentation network algorithm based on an improved SegNet

Technical Field

The invention relates to the technical field of image data processing, and in particular to a leather defect segmentation network algorithm based on an improved SegNet.

Background

Leather is one of the most important materials in everyday life, used for example in leather clothing, leather shoes and automobile leather seats. According to a data report of the United Nations Industrial Development Organization, China's textile, clothing, leather and basic metal industries account for more than 30 percent of the world total added value of these industries. By 2018, the annual output of China's above-scale tanning enterprises exceeded 25 percent of total world leather output, ranking first in the world. In the leather-making process, most tanneries still grade leather by visual inspection. Manual inspection, however, produces large defect identification errors and inaccurate grading. Rapid and accurate calculation of the surface damage rate of leather is therefore a major problem for the current leather-making industry.

Current leather defect detection algorithms are mainly based on computer vision. By the form of their output, vision-based leather defect detection algorithms fall into two groups. Segmentation algorithms segment the defective regions of a leather image along their shape edges. Classification algorithms identify defective regions in the form of rectangular boxes or similar markers. Existing research on leather defect detection mainly uses methods such as principal component analysis (PCA), support vector machines (SVM), clustering, genetic algorithms, decision trees and neural networks.

Leather defect segmentation algorithms are mainly divided into non-deep-learning and deep-learning methods. Non-deep-learning methods mainly include statistical methods, frequency-domain analysis and model-based methods, for example defect detection based on Gabor filters or on the Fourier transform.

Gabor-filter-based detection targets surfaces with uniform texture. The image is convolved with a Gabor filter whose parameters are chosen so that uniformly textured parts and defective parts produce different responses. This reconstructs a clearer defect region and a clearer uniform-texture region, and threshold segmentation then locates the defects.

Fourier-transform-based detection uses the two-dimensional Fourier transform to extract the frequency characteristics of the random texture on the surface of industrial materials such as leather. It sets the frequency components within a certain range to zero and restores the image by the inverse Fourier transform. The texture region of the restored image then has approximately uniform grey levels while the defect region is clearly preserved, so the defects can be separated with simple threshold segmentation.
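For readers unfamiliar with the Gabor approach, the following pure-Python sketch shows its three steps on a toy greyscale image. It builds the real part of a Gabor kernel, filters the image with it, and thresholds the response. It is an illustration only, not code from the cited works.

The kernel size, sigma, wavelength, orientation and threshold are assumed values. The kernel is also shifted to zero mean (a common choice, assumed here) so that flat texture produces almost no response.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a size x size Gabor kernel (size odd), shifted to zero mean."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates to the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    # Remove the DC component so that flat, uniform regions give ~zero response.
    mean = sum(sum(r) for r in kernel) / (size * size)
    return [[v - mean for v in r] for r in kernel]


def filter2d(image, kernel):
    """'Valid' 2-D correlation without padding: output is (H-k+1) x (W-k+1)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            s = 0.0
            for u in range(k):
                for v in range(k):
                    s += image[i + u][j + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out


def threshold(response, t):
    """Binary mask: 1 where |response| > t (candidate defect), else 0."""
    return [[1 if abs(r) > t else 0 for r in row] for row in response]


# Usage: a uniform 15x15 "texture" with a bright 3x3 blemish in the middle.
img = [[100.0] * 15 for _ in range(15)]
for i in range(6, 9):
    for j in range(6, 9):
        img[i][j] = 200.0
mask = threshold(filter2d(img, gabor_kernel(5, sigma=2.0, theta=0.0, lambd=4.0)), 1.0)
```

Windows that see only the uniform background stay at 0 in the mask. Windows covering the blemish exceed the threshold.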

Deep-learning-based image segmentation is an extremely important research direction in computer vision. It classifies every pixel of an image, dividing the image at the pixel level. Owing to the strong generalization ability of deep learning, more and more applications that understand and reason from images have emerged, including autonomous driving, medical imaging, virtual reality and augmented reality.

Segmentation algorithms for leather defects have appeared in recent years. One proposed method combines classification and segmentation for two defect types, black lines and wrinkles. It uses AlexNet for image classification and U-Net for segmentation, and in experiments with 250 defective and 125 defect-free samples it achieved 95% classification performance and an intersection-over-union of 99.84%. However, it detects only a few defect types, and its segmentation and detection speed still needs improvement for actual leather production.

In 2021, a texture defect detection algorithm based on a deep convolutional generative adversarial network was proposed. It reconstructs the input image, performs an initial defect segmentation from the difference between the reconstruction and the original input, and adds an LDA sub-module to the model to eliminate false detections. However, this algorithm also detects few leather defect types and places very high quality requirements on the input images, which hinders practical application.

The SegNet network consists of an encoder and a decoder. The encoder is obtained by removing the fully connected layers from the VGG convolutional neural network, and each encoder layer has a corresponding decoder layer. The encoder extracts feature maps from the image, while the decoder restores the feature maps to the input image size for pixel-level classification. To do so, the decoder repeatedly upsamples the feature maps by a factor of two and applies convolutions until the original image size is restored. Finally, SegNet applies a softmax function to the decoder output to classify every pixel. SegNet performs well in semantic understanding of road scenes and can recognize relatively fine features in an image, as shown in Fig. 1.

The key innovation of SegNet is that the max-pooling positions (indices) of every pooling layer in the encoder are stored and passed to the corresponding decoder layer. Each upsampling step therefore recovers part of the information lost by pooling in the encoder, which is what allows SegNet to identify tiny defects. Finally, once the feature map has been upsampled to the input size, it is unfolded for pixel-level classification.
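The pooling-index mechanism can be sketched in a few lines of pure Python on a single 2-D feature map. The helper names are illustrative. Real implementations operate on batched multi-channel tensors; TensorFlow, for example, provides tf.nn.max_pool_with_argmax for the pooling half.

```python
def max_pool_2x2_with_indices(fmap):
    """2x2, stride-2 max pooling of a 2-D map (even height/width assumed).
    Returns the pooled map and, for each pooled cell, the (row, col) of its maximum."""
    pooled, indices = [], []
    for i in range(0, len(fmap), 2):
        prow, irow = [], []
        for j in range(0, len(fmap[0]), 2):
            window = [(fmap[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, pos = max(window, key=lambda t: t[0])  # first maximum wins ties
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """SegNet-style unpooling: put each value back at its recorded max position and
    fill the rest with zeros; decoder convolutions then densify this sparse map."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for value, (r, c) in zip(prow, irow):
            out[r][c] = value
    return out


fmap = [[1, 5, 2, 0],
        [3, 4, 8, 1],
        [0, 2, 1, 1],
        [6, 1, 0, 9]]
pooled, idx = max_pool_2x2_with_indices(fmap)
restored = max_unpool_2x2(pooled, idx, 4, 4)
```

Only the maxima survive unpooling, and each lands exactly where it came from. The zeros in between are then filled in by the decoder's trainable convolutions.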

Disclosure of Invention

To address these problems, the invention improves the SegNet network to suit raw leather with random texture and diverse defects, and provides a feature-fusion multi-decoding leather defect segmentation network based on SegNet. The network fuses features of different scales of the leather defect image to obtain complete defect information, decodes several times, and finally segments the defective regions. After extensive training, this yields a SegNet-based leather defect segmentation network algorithm with satisfactory performance.

To solve the above technical problem, the invention adopts the following technical solution: a leather defect segmentation network algorithm based on an improved SegNet, comprising a feature extraction and fusion module and a defect restoration module for processing a leather image;

the feature extraction and fusion module adds fusion of shallow and deep features to the SegNet encoding network, preserving richer detail and finer context information from the leather image;

the defect restoration module expands the upsampling process of the SegNet decoding network into four upsampling paths, whose outputs are finally fused and expanded for pixel-level classification.

In a further improvement, the feature extraction and fusion module has a five-layer structure, i.e. it extracts five levels of features. The first layer comprises two feature extraction operations and one max-pooling layer. The second, third, fourth and fifth layers each comprise three feature extraction operations and one max-pooling layer. The features extracted by the second, third and fourth layers are max-pooled by factors of 16, 8 and 4 respectively so that their spatial size equals that of the fifth-layer features. All of them are then stacked and fused along the channel direction to form the final feature map.

In a further improvement, each feature extraction operation comprises three steps: a convolution layer, a batch normalization layer and an activation layer. All convolution layers slide 3 × 3 kernels over the image to extract features, which captures sufficient image information while keeping computation fast.

In a further improvement, the defect restoration module has a six-layer structure. The first layer comprises one convolution layer. The second, third, fourth and fifth layers each comprise one upsampling layer and two convolution layers. The sixth layer comprises one upsampling layer and one convolution layer. The feature maps produced by the third, fourth and fifth layers are upsampled by factors of 8, 4 and 2 respectively so that their size equals that of the input image.

In a further improvement, the defect restoration module progressively upsamples the extracted leather image features back to the input image size and then performs pixel-level classification. The module first retains the SegNet decoding network. Before the 2× upsampling in the third, fourth, fifth and sixth layers, it adds the corresponding feature map from the feature extraction and fusion module element-wise (at the same pixel positions), so that the defect information lost in each upsampling is compensated.

In a further improvement, the defect restoration module performs upsampling using the max-pooling position information of the pooling layer in each structure of the feature extraction and fusion module until the input image size is restored. This forms the first upsampling path. The feature maps produced along the way by the third, fourth and fifth layers of this path are upsampled by factors of 8, 4 and 2 respectively, forming three new upsampling paths. The results of the four upsampling paths are stacked along the channel direction, and a 1 × 1 convolution layer retains their core information. Finally, a softmax function outputs the classification probability of each pixel.

The technical solution of the invention provides the following beneficial effects:

1. The algorithm improves on SegNet. It strengthens feature extraction and fusion while keeping SegNet's advantages, and it adapts well to the random texture and diverse defects of leather. It also solves the loss of defect information that occurs when a captured image contains both large and small defects. After feature maps of different layers are fused, most of the information from features of different depths is retained. Richer defect feature information is therefore available in the upsampling stage without a large increase in computation.

2. Two image enhancement preprocessing operations, gamma correction and histogram equalization, are first applied to the captured leather image. Gamma correction enhances the details of leather texture and defects. The grey-level histogram of the gamma-corrected image is then computed and equalized to improve contrast. This preprocessing improves image quality and thus the accuracy of the results.

3. Leather defects are automatically detected, classified and segmented, yielding information such as the size, number, coordinate position and type of each defect. This saves labour costs for enterprises and greatly improves defect detection efficiency.

Drawings

Fig. 1 is a schematic diagram of a SegNet network applied to road scenes, as described in the background.

Fig. 2 is a schematic diagram of the leather defect segmentation network algorithm based on an improved SegNet according to an embodiment of the present invention.

Figs. 3 and 4 are schematic diagrams showing the shallow and deep feature information extracted by the feature extraction and fusion module in the algorithm according to the embodiment of the present invention.

Fig. 5 shows a captured leather image containing both large and small defects.

Fig. 6 shows ten types of leather defects involved in the development of the algorithm.

Fig. 7 compares the present invention with a conventional leather defect detection algorithm. The first column shows original leather defect images and the second the corresponding labels. The third shows the detection results of the SegNet algorithm, and the fourth the detection results of the improved SegNet algorithm (i.e. the LDSN algorithm).

Detailed Description

The invention will now be further described with reference to the accompanying drawings and specific embodiments.

Referring to Figs. 1 to 7, an embodiment of the present invention discloses a leather defect segmentation network algorithm based on an improved SegNet. It comprises a feature extraction and fusion module and a defect restoration module for processing a leather image, as shown in Fig. 2.

Before the leather image enters the feature extraction and fusion module and the defect restoration module, it is preprocessed by gamma correction and histogram equalization. Before preprocessing, the acquired leather picture is cropped into images of a size that the SegNet network can use.

After gamma correction, the grey-level histogram of the image is computed and then equalized, which improves contrast.
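A minimal pure-Python sketch of the two preprocessing steps, operating on a flat list of 8-bit grey levels. The patent does not state the gamma value, so gamma = 0.8 below is an assumption. The equalization is the standard CDF-based mapping. In practice a library routine (such as OpenCV's equalizeHist) would be applied to whole images.

```python
def gamma_correct(pixels, gamma):
    """Power-law transform of 8-bit values: out = 255 * (in / 255) ** gamma."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def equalize_histogram(pixels, levels=256):
    """Histogram equalization: map each grey level through the normalized CDF."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # only one grey level present: nothing to stretch
        return list(pixels)
    lut = [round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c else 0
           for c in cdf]
    return [lut[p] for p in pixels]


def preprocess(pixels, gamma=0.8):
    """Gamma correction followed by histogram equalization (gamma value assumed)."""
    return equalize_histogram(gamma_correct(pixels, gamma))
```

After equalization the darkest occurring level maps to 0 and the brightest to 255, which is where the contrast gain comes from.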

The feature extraction and fusion module adds fusion of shallow and deep features to the SegNet encoding network, preserving richer detail and finer context information from the leather image. The defect restoration module expands the upsampling process of the SegNet decoding network into four upsampling paths, whose outputs are finally fused and expanded for pixel-level classification.

The feature extraction and fusion module has a five-layer structure, i.e. it extracts five levels of features. The first layer comprises two feature extraction operations and one max-pooling layer. The second, third, fourth and fifth layers each comprise three feature extraction operations and one max-pooling layer. The features extracted by the second, third and fourth layers are max-pooled by factors of 16, 8 and 4 respectively so that their spatial size equals that of the fifth-layer features. They are then stacked and fused along the channel direction to form the final feature map.
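The pooling factors 16, 8 and 4 can be checked with simple size arithmetic. The sketch makes three assumptions:

- the input is 256 × 256 (the crop size used in the experiments);
- each of the five layers ends with a 2 × 2, stride-2 max pool;
- the features fused from layers 2 to 4 are taken before their own pooling, while the layer-5 features are taken after it.

That last reading is our assumption. It is the one under which all the stated factors land on the same 8 × 8 size. The helper names are hypothetical.

```python
def encoder_shapes(input_size=256, num_layers=5):
    """(layer, size before its 2x2 max pool, size after it) for each encoder layer."""
    shapes, size = [], input_size
    for layer in range(1, num_layers + 1):
        shapes.append((layer, size, size // 2))
        size //= 2
    return shapes


def fusion_pool_factors(shapes, sources=(2, 3, 4), target_layer=5):
    """Extra max-pooling factor that shrinks each source layer's pre-pool features
    to the target layer's pooled output size, so they can be concatenated."""
    target = shapes[target_layer - 1][2]
    return {layer: shapes[layer - 1][1] // target for layer in sources}


shapes = encoder_shapes(256)
factors = fusion_pool_factors(shapes)
```

Once all four feature maps share the 8 × 8 size, stacking them along the channel direction is a plain concatenation. The fused channel count is the sum of the individual channel counts.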

Each feature extraction operation comprises three steps: a convolution layer, a batch normalization layer and an activation layer. The convolution layer extracts features by sliding a kernel over the image. Kernels of different sizes have different receptive fields. The larger the kernel, the larger the receptive field, the more image information is captured and the better the global features. However, the computation grows sharply, which limits how deep the model can be. All convolution kernels in the feature extraction and fusion part are therefore 3 × 3. The batch normalization layer speeds up training, letting the model converge quickly and allowing a larger learning rate. The activation layer uses a common activation function and is not described further here.
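The kernel-size trade-off can be made concrete with a little arithmetic. In this sketch the channel count of 64 is an arbitrary illustrative value, not one taken from the patent. Stacked stride-1 3 × 3 convolutions reach the same receptive field as one larger kernel with fewer weights, which is the usual argument for small kernels in VGG-style encoders. Batch normalization and activation layers do not change the receptive field.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Number of trainable parameters of one k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def stacked_receptive_field(kernel_sizes):
    """Receptive field (in pixels) of stride-1 convolutions applied in sequence."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


C = 64  # illustrative channel count, not taken from the patent
two_3x3 = 2 * conv_params(3, C, C, bias=False)   # 73,728 weights, 5x5 receptive field
one_5x5 = conv_params(5, C, C, bias=False)       # 102,400 weights, 5x5 receptive field
```

Three stacked 3 × 3 layers (7 × 7 receptive field) likewise need about 55% of the weights of a single 7 × 7 layer.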

In the feature extraction and fusion module, shallow layers learn detail information and deep layers learn macro-level information. Each layer is shallow relative to the layer after it and deep relative to the layer before it. Fig. 3 tiles the 32 feature maps that the first layer produces for a leather defect image; most of them mainly capture pixel-level details of the leather texture and defects. Fig. 4 tiles 32 feature maps obtained after the third layer; most of them mainly capture macro-level information about the texture and defects. Comparing Figs. 3 and 4 shows that the extracted information becomes coarser as the network deepens. Fusing feature maps of different layers therefore retains most of the information of features at different depths. Richer defect feature information is thus available in the upsampling stage without a large increase in computation.

The defect restoration module has six layers in total:

- the first layer comprises one convolution layer;
- the second, third, fourth and fifth layers each comprise one upsampling layer and two convolution layers;
- the sixth layer comprises one upsampling layer and one convolution layer.

The feature maps produced by the third, fourth and fifth layers are upsampled by factors of 8, 4 and 2 respectively so that their size equals that of the input image. The module progressively upsamples the extracted leather image features back to the input image size and then performs pixel-level classification.

In SegNet, although the max-pooling position information of each encoder layer is passed to the decoder, some defect information is still lost. The defect restoration module of this algorithm therefore first retains the decoding network. Before the 2× upsampling in the third, fourth, fifth and sixth layers, it adds the corresponding feature map from the feature extraction and fusion module element-wise (at the same pixel positions), so that the defect information lost in each upsampling is compensated.
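The sizes involved in the decoder can be traced the same way. This sketch assumes an 8 × 8 bottleneck (a 256 × 256 input after five 2 × 2 poolings). It shows that the 8×, 4× and 2× factors bring the outputs of decoder layers 3, 4 and 5 to 256 × 256.

It also shows the element-wise skip addition followed by a 2× upsampling. The text does not name the interpolation used for the side paths, so nearest-neighbour upsampling is an assumption, and the helper names are hypothetical.

```python
def decoder_sizes(bottleneck=8, upsampling_layers=5):
    """Output size of the six decoder layers: layer 1 convolves at the bottleneck
    resolution, and layers 2-6 each double the resolution."""
    sizes = [bottleneck]
    for _ in range(upsampling_layers):
        sizes.append(sizes[-1] * 2)
    return sizes


def side_path_factors(sizes, layers=(3, 4, 5)):
    """Upsampling factor that brings a decoder layer's output to the final size."""
    return {layer: sizes[-1] // sizes[layer - 1] for layer in layers}


def add_same_position(decoder_map, encoder_map):
    """Element-wise sum of two equally sized 2-D maps (the skip addition)."""
    return [[d + e for d, e in zip(dr, er)] for dr, er in zip(decoder_map, encoder_map)]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return [[v for v in row for _ in range(factor)] for row in fmap for _ in range(factor)]


sizes = decoder_sizes()
```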

The defect restoration module performs upsampling using the max-pooling position information of the pooling layer in each structure of the feature extraction and fusion module until the input image size is restored, forming the first upsampling path.

The acquired images contain both large and very small defects, so a single training sample may contain two very different defects, as shown in Fig. 5. To ensure that small defects are not lost over successive 2× upsampling stages, information from every layer must be combined directly at the classification stage. The feature maps produced along the first upsampling path by the third, fourth and fifth layers are therefore upsampled by factors of 8, 4 and 2 respectively, forming three new upsampling paths. The results of the four upsampling paths are stacked along the channel direction, and a 1 × 1 convolution layer retains their core information. A softmax function then outputs the classification probability of each pixel.

The algorithm can extract defects from wet blue leather (i.e. semi-finished leather that has undergone only initial processing) as well as from finished leather. It is generally used to inspect semi-finished leather.
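The fusion head at the end of the four paths can be sketched as channel concatenation, a 1 × 1 convolution and a per-pixel softmax. In this pure-Python illustration each path is a list of H × W channels. The two-class setting (normal versus defect) and the hand-picked weights are assumptions made only to keep the example small; in the network the 1 × 1 weights are learned.

```python
import math


def concat_channels(*paths):
    """Stack feature maps along the channel axis; each path is a list of H x W channels."""
    return [ch for path in paths for ch in path]


def conv1x1(channels, weights, biases):
    """1x1 convolution: at every pixel, a linear mix of the input channels.
    weights[c][k] maps input channel k to output channel c."""
    height, width = len(channels[0]), len(channels[0][0])
    return [[[sum(wk * ch[i][j] for wk, ch in zip(w_row, channels)) + b
              for j in range(width)] for i in range(height)]
            for w_row, b in zip(weights, biases)]


def pixel_softmax(logits):
    """Softmax over the class axis, computed independently at each pixel."""
    n, height, width = len(logits), len(logits[0]), len(logits[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(n)]
    for i in range(height):
        for j in range(width):
            m = max(logits[c][i][j] for c in range(n))  # subtract max for stability
            exps = [math.exp(logits[c][i][j] - m) for c in range(n)]
            total = sum(exps)
            for c in range(n):
                probs[c][i][j] = exps[c] / total
    return probs


def predict_labels(probs):
    """Per-pixel argmax over classes."""
    n, height, width = len(probs), len(probs[0]), len(probs[0][0])
    return [[max(range(n), key=lambda c: probs[c][i][j]) for j in range(width)]
            for i in range(height)]


# Four single-channel 2x2 "paths" (the first path plus three side paths).
paths = ([[[1, 0], [0, 0]]], [[[0, 0], [0, 1]]], [[[0, 0], [0, 0]]], [[[0, 0], [0, 0]]])
fused = concat_channels(*paths)
# Hand-picked weights for 2 classes: class 0 = normal, class 1 = defect.
logits = conv1x1(fused, weights=[[0, 0, 0, 0], [2, 2, 2, 2]], biases=[0, -1])
probs = pixel_softmax(logits)
labels = predict_labels(probs)
```

A 1 × 1 convolution mixes channels only within a pixel. It therefore compresses the stacked paths without blurring the fine spatial detail that the side paths were added to preserve.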

In leather production, the surface defects of a hide are a key factor in the quality of the finished leather, so defect detection is an important step. Current vision-based detection algorithms usually rely on statistical, mathematical and machine-learning tools, and both their detection efficiency and their precision are insufficient.

To address the shortcomings of current machine-learning-based leather defect detection algorithms, an improved SegNet leather defect segmentation network is proposed: LDSN (Leather Detection Segmentation Network). It fuses the features extracted by the convolutional network and incorporates an FCN-style structure in the upsampling stage to separate defective regions from normal regions in the leather image. LDSN was trained, validated and tested on leather defect images collected in cooperation with a tannery. Its results were compared with those of a traditional detection algorithm and a classical semantic segmentation algorithm.

The experiments show that LDSN has clear advantages in defect segmentation accuracy and generalization over traditional leather defect detection algorithms. Compared with the classical SegNet semantic segmentation algorithm, it improves IoU by more than 30% and AUC by 25%.

Several measures were taken during model building, training and tuning to obtain a better model and shorten optimization time. First, the network was trained with the Keras deep learning framework using its fit_generator method, which reads data in batches and saves memory. The network was then optimized with the Adam optimizer, which adapts the learning rate separately for each parameter and speeds up training. Because the data are imbalanced, sample weights were set during training according to the number of pixels of each defect class.
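The text does not give the exact weighting formula. The sketch below therefore assumes inverse-frequency weighting, the same formula as scikit-learn's "balanced" class-weight heuristic (total count divided by the number of classes times the per-class count). It then expands the class weights into a per-pixel weight map. The function names are hypothetical, and how the weights were passed to Keras is not specified in the text.

```python
def pixel_class_weights(masks, num_classes):
    """Inverse-frequency weights: w_c = total_pixels / (num_classes * count_c).
    A class with no pixels gets weight 0.0 (a convention chosen for this sketch)."""
    counts = [0] * num_classes
    for mask in masks:
        for row in mask:
            for label in row:
                counts[label] += 1
    total = sum(counts)
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def weight_map(mask, class_weights):
    """Per-pixel sample weights for one label mask."""
    return [[class_weights[label] for label in row] for row in mask]


masks = [[[0, 0, 0, 1],
          [0, 0, 0, 1]]]          # 6 normal pixels, 2 defect pixels
weights = pixel_class_weights(masks, num_classes=2)
```

Here the rare defect class gets weight 2.0 and the common normal class about 0.67. Misclassified defect pixels therefore cost three times as much in the loss.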

TensorFlow and Keras were used for algorithm development.

Training platform:
- CPU: Intel(R) Xeon(R) W-2102 @ 2.90 GHz (4 cores)
- GPU: NVIDIA TITAN RTX with 24 GB of video memory, with CUDA acceleration
- OS: Windows 10

Test platform:
- CPU: Intel(R) Core(TM) i5-9400 @ 2.90 GHz (6 cores)
- RAM: 32.0 GB (31.9 GB usable)
- OS: Windows 10

The experimental dataset was collected on site at a tannery in Jinjiang City using a Hikvision high-precision industrial camera. More than ten defect types were collected, including stab scratches, warts, suture wounds, open cuts, brand marks, slippery surface, skin moss, holes, healed wounds, rotten surface and insect holes; some of them are shown in Fig. 6.

In total, 105 leather images of 3072 × 2048 pixels were collected. To speed up training and enable fast detection in practice, each original image was cut into 1024 × 1024-pixel crops. These crops were downsampled twice by a factor of 2 to obtain 256 × 256-pixel images (see Fig. 6), giving 630 leather images. The defective regions in the label images were annotated with the LabelMe tool.

The acquired high-precision original images were cropped before the experiments. Fewer of the cropped images contain defects than are defect-free, so the images were mirrored up-down and left-right. Because leather texture is irregular, a mirrored image still differs substantially from the original and can serve as new data. Preprocessing yielded 1890 defect images; after removing images that had been mislabeled, 1809 leather defect images remained. Each image is 256 × 256 pixels and contains one or more defects.
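The reported counts follow from simple arithmetic. The sketch assumes non-overlapping crops, and assumes that every crop is kept together with its up-down and left-right mirror images. Under those assumptions it reproduces the 630 and 1890 figures. The helper names are hypothetical, and the later removal of mislabeled images is not modeled.

```python
def tiles_per_image(width, height, tile):
    """Number of non-overlapping tile x tile crops that fit in one image."""
    return (width // tile) * (height // tile)


def size_after_halvings(size, times):
    """Side length after `times` successive 2x downsamplings."""
    for _ in range(times):
        size //= 2
    return size


def mirror_up_down(img):
    return img[::-1]


def mirror_left_right(img):
    return [row[::-1] for row in img]


def augment(img):
    """The crop plus its up-down and left-right mirror images (3 samples per crop).
    The label mask must be mirrored in exactly the same way."""
    return [img, mirror_up_down(img), mirror_left_right(img)]


crops = 105 * tiles_per_image(3072, 2048, 1024)          # 6 crops per image
augmented = crops * len(augment([[1, 2], [3, 4]]))       # 3 variants per crop
```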

For binary classification, the evaluation criteria of machine-learning (including deep-learning) models are based on the confusion matrix, also called the error matrix. It is a standard format for accuracy assessment, represented as an n × n matrix. In artificial intelligence, the confusion matrix is mainly used to compare classification results with ground-truth values and shows the accuracy of the classification.

Each column of the confusion matrix represents a predicted class, and the total of each column is the number of samples predicted as that class. Each row represents the true class, and the total of each row is the number of samples that truly belong to that class. The cells are defined as follows (see Table 1):
- TP is the number of samples whose true class and predicted value are both 1;
- TN is the number whose true class and predicted value are both 0;
- FP is the number whose true class is 0 but predicted value is 1;
- FN is the number whose true class is 1 but predicted value is 0.

TABLE 1 Confusion matrix

True class \ predicted class | 1 | 0
1 | TP | FN
0 | FP | TN

Many model evaluation metrics can be derived from the confusion matrix.

Intersection over Union (IoU) measures the similarity between the predicted region and the actual region of an object in an image. The numerator is the area of the overlap between the predicted and actual regions, and the denominator is the area of their union. The regions need not be rectangular; in semantic segmentation the actual region follows the edge shape of the detected object.

Accuracy (ACC) measures the proportion of correct classifications, as shown in formula (1). Precision measures how accurate the model's positive predictions are, formula (2). Recall measures how many of the positive samples the model covers, formula (3). The false positive rate (FPR) measures the model's false detection rate, formula (4).

In addition, AUC (Area Under Curve) is an important metric for binary-classification deep-learning models. It is defined as the area enclosed by the ROC curve and the coordinate axis; it cannot exceed 1 and, for a useful model, lies between 0.5 and 1. The closer the AUC is to 1.0, the better the model discriminates. An AUC of 0.5 corresponds to random guessing, and such a model has little practical value.

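$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (4)$$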
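As a sketch, the confusion-matrix counts, formulas (1) to (4), and IoU can be computed from a predicted and a ground-truth binary mask. IoU is expressed in terms of the counts as TP / (TP + FP + FN). Treating defect versus normal as a binary problem is an assumption here; for several defect types the same computation would be repeated per class (one versus rest).

```python
def confusion_counts(pred, truth):
    """TP, TN, FP, FN for two binary pixel masks (1 = defect, 0 = normal)."""
    tp = tn = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p == 1 and t == 1:
                tp += 1
            elif p == 0 and t == 0:
                tn += 1
            elif p == 1:      # predicted defect, actually normal
                fp += 1
            else:             # predicted normal, actually defect
                fn += 1
    return tp, tn, fp, fn


def _ratio(num, den):
    # Zero denominators return 0.0 (a convention chosen for this sketch).
    return num / den if den else 0.0


def segmentation_metrics(tp, tn, fp, fn):
    """Formulas (1) to (4) plus IoU = TP / (TP + FP + FN)."""
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "Precision": _ratio(tp, tp + fp),
        "Recall": _ratio(tp, tp + fn),
        "FPR": _ratio(fp, fp + tn),
        "IoU": _ratio(tp, tp + fp + fn),
    }


truth = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
pred = [[1, 0, 1, 0],
        [1, 0, 0, 0]]
counts = confusion_counts(pred, truth)
m = segmentation_metrics(*counts)
```

In this toy case the prediction and the ground truth each cover 3 pixels and share 2, so the union is 4 pixels and IoU = 2 / 4 = 0.5. Accuracy is still 0.75 because most pixels are correctly labeled as normal. This is why IoU is the stricter measure on defect-sparse leather images.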

The experiments evaluate model performance with several metrics, mainly IoU, Accuracy, Precision, Recall and AUC. For IoU, Accuracy and AUC, larger values indicate a better model. Precision and Recall trade off against each other: it is difficult to make both high at once, especially on large datasets.
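AUC can be computed without drawing the ROC curve. It equals the probability that a randomly chosen positive pixel receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise sketch below is quadratic in the number of pixels and is meant only to illustrate the definition; full-size evaluation would use a sort-based or library implementation.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2),
    which equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]) gives 0.75, because three of the four positive/negative pairs are ranked correctly. A model that ranks worse than chance scores below 0.5, which is why 0.5 to 1 is the range for useful models.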

The LDSN model outperforms the SegNet network, as shown in Table 2. To keep model structure as the only variable in the comparison, the compared network receives the same image preprocessing. Its sample imbalance weights and number of epochs during training are also kept the same as for LDSN.

Table 2 shows that LDSN achieves the highest IoU and precision, 46.42% and 98.01% respectively, improvements of 30.71% and 14.2% over SegNet. This indicates that the defect regions segmented by LDSN overlap the true defect regions most closely, giving the best identification. In recall and accuracy, LDSN improves on SegNet by 4% and 51.13% respectively; the improvement in accuracy is particularly pronounced. LDSN also has a higher AUC than SegNet. Its false detection rate is 0.0083, clearly lower than SegNet's 0.2296, showing that LDSN is strongly resistant to falsely reporting defects. LDSN also detects faster.

Across these seven metrics, LDSN has clear advantages over SegNet in detecting and segmenting leather defects, most notably in IoU, accuracy, precision and AUC. LDSN effectively identifies the different leather defect types. In particular, its detections of eight defect types match the true shape and area of the defects more closely: healed wounds, warts, open cuts, brand marks, skin moss, holes, slippery surface and suture wounds.
In terms of detection speed, LDSN processes each leather image in 0.298 s.

Table 2 Comparison of LDSN with classical semantic segmentation networks

Table 3 gives the detection results for the different defect types. For five defects (healed wounds, broken holes, skin moss, branding, and warts), LDSN reaches an average IoU above 76.36%, with average recall above 86% and average accuracy above 88%, which is a good detection result. For three defects (open cuts, suture wounds, and slippery surfaces), the IoU is about 50%, with recall of 61.16% and precision of 85.54%. Table 3 also shows that stab scratches have the lowest IoU, only 29%, and also the lowest recall and accuracy of all detected defect types. A likely cause is that the stab scratches in the samples are short, so their features become weak and inconspicuous after the downsampling step of image preprocessing. Rotten-surface defects have boundaries that are hard to delineate even by eye, yet they still reach a detection result of 66%, with recall, precision, and accuracy all around 80%.
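Per-defect-type scores like those in Table 3 are usually read off a multi-class confusion matrix. The sketch below shows one way to do this. It assumes label maps in which 0 is background and each defect type has its own integer ID. This ID scheme and the `group_mean` helper are hypothetical. The group average mirrors statements such as "the average IoU over five defect types".

```python
import numpy as np

def confusion_matrix(pred, truth, num_classes):
    """Rows = ground-truth class, columns = predicted class, counted per pixel."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    idx = truth * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_scores(cm):
    """Per-class IoU, precision and recall; classes absent from both maps give NaN."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but truth is another class
    fn = cm.sum(axis=1) - tp  # truth is class c, but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
    return iou, precision, recall

def group_mean(values, class_ids):
    """Average a per-class score over a group of defect classes (hypothetical helper)."""
    return float(np.nanmean(np.asarray(values)[list(class_ids)]))
```

Pixels of one defect type that are predicted as another type count as a false negative for the true class and as a false positive for the predicted class. This matches the kind of confusion between defect types that the qualitative comparison describes for SegNet.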

Table 3 Comparison of detection results for different defect types

Fig. 7 gives a qualitative comparison of the improved SegNet algorithm and the original SegNet algorithm. Defect images differ in brightness and blur where the defect meets the leather texture. Because of this, SegNet may falsely detect some normal texture regions as defects, or may assign a defect to the wrong type. The LDSN network avoids this shortcoming and detects the defect regions correctly.

While there have been shown and described what are at present considered to be the fundamental and essential features of the invention and its advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims

1. A leather defect segmentation network algorithm based on SegNet improvement, characterized in that: it comprises a feature extraction and fusion module and a defect restoration module, wherein the feature extraction and fusion module is used for processing a leather image;

the defect restoration module expands the upsampling process of the SegNet decoding network into four upsampling paths, and finally fuses and expands the paths to perform pixel-level classification;

the defect restoration module has a six-layer structure, wherein the first layer comprises one convolution layer; the second, third, fourth, and fifth layers each comprise one upsampling layer and two convolution layers; the sixth layer comprises one upsampling layer and one convolution layer; and the feature maps extracted by the third, fourth, and fifth layers are upsampled by factors of 8, 4, and 2 respectively so that they have the same size as the input image;
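The factors 8, 4 and 2 follow from resolution bookkeeping, sketched below. The sketch rests on two assumptions. First, the encoder reduces the input by 32, which is five 2x max poolings as in claim 2. Second, decoder layer 1 (convolution only) keeps that resolution, and layers 2 to 6 each double it. The claim does not name the interpolation used on the three side paths, so nearest-neighbour repetition is used here purely as a stand-in. The helper names are hypothetical.

```python
import numpy as np

def decoder_resolutions(input_size, num_layers=6, encoder_stride=32):
    """Side length after each decoder layer (assumed: layer 1 keeps 1/32, layers 2..6 upsample 2x)."""
    size = input_size // encoder_stride
    sizes = [size]
    for _ in range(2, num_layers + 1):
        size *= 2
        sizes.append(size)
    return sizes

def side_path_factors(input_size, layers=(3, 4, 5)):
    """Upsampling factor that brings each chosen decoder layer back to the input size."""
    sizes = decoder_resolutions(input_size)
    return {layer: input_size // sizes[layer - 1] for layer in layers}

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array (stand-in interpolation)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Example with a 256x256 input: layers 3-5 sit at 32, 64 and 128 pixels.
factors = side_path_factors(256)
sides = {layer: np.zeros((4, 256 // f, 256 // f)) for layer, f in factors.items()}
aligned = {layer: upsample_nearest(feat, factors[layer]) for layer, feat in sides.items()}
```

Under these assumptions, the sixth layer is the only one that already reaches full resolution without a side path. This is why only layers 3 to 5 need the extra 8x, 4x and 2x upsampling.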

the defect restoration module progressively upsamples the extracted leather image features back to the size of the input image and then performs pixel-level classification; the defect restoration module first retains the decoding network and, before the twofold upsampling in the upsampling layers of the third, fourth, fifth, and sixth layers, adds the corresponding feature maps of the feature extraction and fusion module at the same pixel positions, so that the loss of defect information is offset after each upsampling;
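The skip connection described above can be sketched as a pixel-wise sum followed by a twofold upsampling. This sketch assumes the encoder and decoder feature maps already have identical (C, H, W) shapes; the claim implies this but does not spell it out. It also uses nearest-neighbour repetition as a stand-in for the index-based upsampling of the actual network. The function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Stand-in twofold upsampling (nearest neighbour) of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_add_then_upsample(decoder_feat, encoder_feat):
    """Add the matching encoder feature map at the same pixel positions, then upsample 2x."""
    if decoder_feat.shape != encoder_feat.shape:
        raise ValueError("skip connection needs identical (C, H, W) shapes")
    return upsample2x(decoder_feat + encoder_feat)
```

Addition keeps the channel count unchanged, unlike concatenation. The following convolution layers therefore see the same number of channels as in plain SegNet.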

the defect restoration module performs upsampling using the max-pooling position information of the corresponding pooling layers in the feature extraction and fusion module until the size of the input image is finally restored, thereby generating a first upsampling path; the feature maps extracted by the third, fourth, and fifth layers along this first upsampling path are upsampled by factors of 8, 4, and 2 respectively, forming three new upsampling paths; the results of the four upsampling paths are stacked along the channel direction, a 1x1 convolution layer retains the core information, and finally a softmax function outputs the classification probability of each pixel.
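The two mechanisms named in this step are SegNet-style unpooling with stored max-pooling positions and the fusion head (channel concatenation, 1x1 convolution, per-pixel softmax). Both are sketched below in NumPy. The sketch works on single (C, H, W) arrays with no batch dimension. The 1x1 convolution is written as an `einsum` over channels, and the weights are random placeholders rather than trained parameters. All function names are hypothetical.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling of a (C, H, W) array that also returns each max's flat index in H*W."""
    c, h, w = x.shape
    oh, ow = h // k, w // k
    windows = (x[:, :oh * k, :ow * k]
               .reshape(c, oh, k, ow, k)
               .transpose(0, 1, 3, 2, 4)
               .reshape(c, oh, ow, k * k))
    arg = windows.argmax(axis=-1)  # position inside each k x k window
    rows = np.arange(oh)[:, None] * k + arg // k
    cols = np.arange(ow)[None, :] * k + arg % k
    return windows.max(axis=-1), rows * w + cols

def max_unpool(pooled, indices, out_hw):
    """Place each pooled value back at its recorded position; all other pixels stay zero."""
    c = pooled.shape[0]
    h, w = out_hw
    out = np.zeros((c, h * w), dtype=pooled.dtype)
    np.put_along_axis(out, indices.reshape(c, -1), pooled.reshape(c, -1), axis=1)
    return out.reshape(c, h, w)

def fuse_paths_and_classify(paths, weight, bias):
    """Stack full-size paths along channels, apply a 1x1 convolution, then a per-pixel softmax."""
    stacked = np.concatenate(paths, axis=0)  # (sum of channels, H, W)
    logits = np.einsum("kc,chw->khw", weight, stacked) + bias[:, None, None]
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=0, keepdims=True)  # (num_classes, H, W)

# Example: four 2-channel paths at 4x4, fused into 3 classes with placeholder weights.
rng = np.random.default_rng(0)
paths = [rng.standard_normal((2, 4, 4)) for _ in range(4)]
probs = fuse_paths_and_classify(paths, rng.standard_normal((3, 8)), np.zeros(3))
```

Unpooling with stored indices returns each strong activation to the exact pixel it came from. This helps keep thin defect boundaries in place during upsampling. The zeros it leaves elsewhere are what the following convolution layers densify.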

2. The SegNet improvement-based leather defect segmentation network algorithm according to claim 1, characterized in that: the feature extraction and fusion module has a five-layer structure, i.e., it extracts five levels of features, wherein the first layer comprises two feature extraction operations and one max pooling layer, and the second, third, fourth, and fifth layers each comprise three feature extraction operations and one max pooling layer; the features extracted by the second, third, and fourth layers are max-pooled by factors of 16, 8, and 4 respectively so that their size matches that of the fifth-layer features, and they are then stacked and fused along the channel direction to form the final feature map.
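The multi-scale fusion in claim 2 can be sketched as three max-pooling steps followed by a channel concatenation. For the sizes to line up, this sketch assumes the following, which the claim does not state. The features of layers 2 to 4 are taken from their convolution output before their own pooling layer, at 1/2, 1/4 and 1/8 of the input. The fifth-layer feature is taken after its pooling, at 1/32. With a 256x256 input, all four maps are then 8x8. Channel counts in the example are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling of a (C, H, W) array."""
    c, h, w = x.shape
    return x[:, :h // k * k, :w // k * k].reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def fuse_encoder_features(f2, f3, f4, f5):
    """Pool layers 2-4 by 16, 8 and 4 to the layer-5 size, then stack along channels."""
    pooled = [max_pool(f2, 16), max_pool(f3, 8), max_pool(f4, 4)]
    for p in pooled:
        if p.shape[1:] != f5.shape[1:]:
            raise ValueError(f"size mismatch: {p.shape[1:]} vs {f5.shape[1:]}")
    return np.concatenate(pooled + [f5], axis=0)

# Example for a 256x256 input under the assumptions above.
f2, f3, f4, f5 = (np.zeros((2, 128, 128)), np.zeros((3, 64, 64)),
                  np.zeros((4, 32, 32)), np.zeros((5, 8, 8)))
fused = fuse_encoder_features(f2, f3, f4, f5)
```

The fused map has as many channels as the four inputs combined, so the first decoder convolution has to accept that wider input.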

3. The SegNet improvement-based leather defect segmentation network algorithm according to claim 2, characterized in that: each feature extraction operation comprises three processes, namely a convolution layer, a batch normalization layer, and an activation layer; in feature extraction, all convolution layers slide 3x3 convolution kernels over the image to extract features, which captures rich image information while keeping computation fast.
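One feature extraction operation (3x3 convolution, then batch normalization, then activation) is sketched below. The sketch makes three assumptions. Zero padding of 1 preserves spatial size. Batch normalization is shown in its inference form, using stored running statistics rather than batch statistics. The activation is ReLU, whereas the claim says only "activation layer". The convolution is computed as a cross-correlation, the usual convention in deep learning frameworks. The function names are hypothetical.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """3x3 convolution of (C_in, H, W) with weight (C_out, C_in, 3, 3), zero padding 1."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out + bias[:, None, None]

def batch_norm_inference(x, gamma, beta, mean, var, eps=1e-5):
    """Per-channel normalization with stored running statistics."""
    scale = gamma[:, None, None] / np.sqrt(var[:, None, None] + eps)
    return (x - mean[:, None, None]) * scale + beta[:, None, None]

def feature_extraction(x, weight, bias, gamma, beta, mean, var):
    """Convolution -> batch normalization -> ReLU (activation choice assumed)."""
    return np.maximum(batch_norm_inference(conv3x3(x, weight, bias), gamma, beta, mean, var), 0.0)
```

Stacking two or three of these 3x3 operations per layer, as claim 2 describes, enlarges the receptive field step by step. Each individual kernel still stays small and cheap.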