CN116403042A - Method and device for detecting defects of lightweight sanitary products - Google Patents

Method and device for detecting defects of lightweight sanitary products

Info

Publication number
CN116403042A
Authority
CN
China
Prior art keywords
module
convolution
block
feature
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310368796.5A
Other languages
Chinese (zh)
Inventor
石烈纯
杨辉华
杨牧
赵亮
赵文义
宋明望
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202310368796.5A priority Critical patent/CN116403042A/en
Publication of CN116403042A publication Critical patent/CN116403042A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting defects of lightweight sanitary products, which comprises the following steps: acquiring a surface image of a sanitary product and inputting it into a lightweight, deep-learning-based target detection neural network model; performing data enhancement processing on the surface image to obtain an initial feature map; performing multi-scale feature extraction on the initial feature map through a CSP module embedded with an EVC module; fusing feature maps of different scales through the feature pyramid method of an FPN module and fusing feature maps of different resolutions through the path aggregation method of a PANet module; and fusing the output features of the CSP module, the FPN module and the PANet module through the lightweight convolution operation of a GSConv module to obtain and output a final detection result. By integrating the GSConv and EVC modules into the deep learning target detection neural network model, the invention improves the accuracy and efficiency of target detection while keeping the model lightweight and fast.

Description

Method and device for detecting defects of lightweight sanitary products
Technical Field
The invention relates to a method for detecting defects of light-weight sanitary products, belonging to the technical field of computer vision and deep learning.
Background
Surface defects are an important issue in product production and quality control in the manufacturing industry. Traditional surface defect detection methods rely mainly on manual visual inspection, which is time-consuming, labor-intensive and prone to subjectivity and error. In recent years, with the continuous development and application of deep learning technology, surface defect detection methods based on deep learning have become a research hotspot. Such methods realize automatic detection and classification of different types of surface defects through self-learning and feature extraction from surface defect images, and offer higher precision and efficiency. However, existing deep-learning-based surface defect detection methods generally suffer from excessive model parameters, a large amount of computation and low operating efficiency, making it difficult to meet the real-time and response-speed requirements of production and manufacturing scenarios.
Currently, conventional target detection algorithms typically rely on sliding-window or region-proposal methods. However, these methods require dense sampling over the entire image, resulting in a huge computational load. Meanwhile, region-proposal-based methods also suffer from problems such as the need for non-maximum suppression and inaccurate proposals, which limit detection performance. In addition, traditional target detection algorithms mostly use shallow networks as the backbone, which limits their ability to understand and express target semantic information and further degrades detection performance. These problems make it difficult for conventional target detection algorithms to achieve efficient and accurate detection in complex scenarios.
In summary, existing deep-learning-based surface defect detection methods suffer from excessive model parameters, a large amount of computation and low operating efficiency. New methods and techniques are required to solve these problems.
Disclosure of Invention
The invention provides a method for detecting defects of light-weight sanitary products, which aims at solving at least one of the technical problems in the prior art.
The invention discloses a method for detecting defects of a lightweight sanitary product, which comprises the following steps:
s110, acquiring a surface image of the sanitary article and inputting the surface image into a lightweight target detection neural network model based on deep learning so as to perform data preprocessing on the surface image and further acquire an initial feature map;
the target detection neural network model comprises a CSP module, an FPN module, a PANet module and a GSConv module;
s200, performing multi-scale feature extraction on the initial feature map through the CSP module embedded with the EVC module;
s300, fusing the feature images with different scales through the FPV module, and fusing the feature images with different resolutions through the PANet module;
s400, fusing output characteristics of the CSP module, the FPV module and the PANet module through a lightweight convolution operation of the CSConv module so as to obtain and output a final detection result.
Further, the EVC module includes a STEM block, an MLP block and a visual center mechanism; the step S200 includes:
S310, performing feature smoothing processing on the input features of the EVC module through the STEM block; wherein said STEM block comprises a first 7×7 convolution;
S320, obtaining the output features of the STEM block, and capturing the global long-term dependency of the top-level features through the lightweight MLP block to obtain global information;
S330, obtaining the output features of the STEM block, and aggregating the local region features in the layer through the learnable visual center mechanism to obtain local information;
S340, connecting the result feature map of the global information and the result feature map of the local information together along the channel dimension to obtain the output feature of the EVC module;
wherein the MLP block is connected in parallel with the visual center mechanism.
Further, the output feature of the EVC module is calculated by the following formula:
X = cat(MLP(X_in); LVC(X_in))
wherein X represents the output feature of the EVC module, cat(·) represents feature map concatenation along the channel dimension, MLP(X_in) represents the feature output of the MLP block, LVC(X_in) represents the output feature of the visual center mechanism, and X_in represents the output feature of the STEM block, wherein the output feature of the STEM block is calculated by the following formula:
X_in = σ(BN(Conv7×7(X_4)))
wherein Conv7×7(·) represents a 7×7 convolution function with stride 1, BN(·) represents a batch normalization function, and σ(·) represents a ReLU activation function.
Further, the MLP block includes a first residual module and a second residual module, and the step S320 includes:
S321, inputting the output features of the STEM block into the first residual module based on depthwise convolution, and simultaneously carrying out group normalization processing; wherein the output feature X̃_in of the first residual module is calculated by the following formula:
X̃_in = DConv(GN(X_in)) + X_in
wherein GN(·) represents a group normalization function, and DConv(·) represents a depthwise convolution function with a kernel size of 1×1;
S322, inputting the processed output features of the first residual module into the second residual module based on channel MLP, so as to perform a channel scaling operation and a DropPath operation; wherein the output feature MLP(X_in) of the second residual module is calculated by the following formula:
MLP(X_in) = CMLP(GN(X̃_in)) + X̃_in
wherein CMLP(·) represents a channel MLP function.
Further, the visual center mechanism includes a convolution layer group and a CBR block, and the step S330 includes:
S331, encoding the output features of the STEM block through the convolution layer group, wherein the convolution layer group comprises a second 1×1 convolution, a third 3×3 convolution and a fourth 1×1 convolution;
S332, inputting the encoded features into the CBR block for processing, wherein the CBR block comprises a 3×3 convolution, a BN layer and a ReLU activation layer;
S333, inputting the processed features into a codebook;
wherein the output feature of the visual center mechanism is obtained from the output feature X_in of the STEM block and the local corner region feature Z, and is calculated as follows:
LVC(X_in) = X_in ⊕ Z
wherein Z represents the local corner region feature and ⊕ represents channel-wise addition;
wherein the local corner region feature Z is calculated as follows:
Z = X_in ⊙ δ(Conv1×1(e))
wherein Conv1×1(·) represents a 1×1 convolution function, δ(·) is the sigmoid function, ⊙ represents channel-wise multiplication, and e represents the overall information of the entire image with respect to the K codewords.
Further, the GSConv module includes a Ghost sub-block and a Shrink sub-block, and the step S400 includes:
S410, performing a channel dimension reduction operation on the input feature map through the Ghost sub-block based on group convolution to obtain a smaller Ghost feature map;
wherein the channel dimension reduction operation includes: dividing the input feature map into a plurality of groups, and then randomly selecting channels in each group to obtain the smaller Ghost feature map;
S420, performing a lightweight convolution operation on the Ghost feature map through the Shrink sub-block based on depthwise separable convolution to obtain a final feature map;
wherein the lightweight convolution operation includes: performing a depthwise convolution operation on each input channel respectively, and then performing a point-wise convolution on the obtained outputs to obtain the final feature map.
Further, the model adopts a weighted non-maximum suppression algorithm in prediction, and the model loss function is calculated as follows:
L_CIoU = 1 - IoU + R_CIoU
wherein L_CIoU represents the model loss function, IoU represents the intersection-over-union of the prediction box and the ground-truth box,
IoU = |B ∩ B^gt| / |B ∪ B^gt|
and R_CIoU represents a penalty term;
wherein the penalty term is obtained by minimizing the normalized distance between the center points of the two bounding boxes, and the penalty term R_CIoU is calculated by the following formula:
R_CIoU = ρ²(b, b^gt) / c² + αv
wherein b and b^gt represent the center points of B and B^gt, ρ(·) represents the Euclidean distance, c represents the diagonal length of the smallest enclosing box covering the two boxes, α represents the trade-off parameter, and v represents the consistency of the aspect ratio.
Further, when model training is carried out, a structured pruning operation is performed on the model; the structured pruning operation is based on weights or channels and comprises the following steps:
S510, pruning specific layers, channels or weights of the network model, and removing redundant connections and parameters;
S520, retraining the pruned model so that it fits the training data again and reaches an accuracy similar to that of the original model.
The technical scheme of the invention also relates to a computer device, which comprises a memory and a processor, wherein the processor executes the computer program stored in the memory to implement the method.
The invention also relates to a computer-readable storage medium, on which computer program instructions are stored, which, when being executed by a processor, carry out the above-mentioned method.
The beneficial effects of the invention are as follows.
The embodiment of the invention provides a lightweight sanitary product defect detection method and system based on deep learning technology. The system uses a lightweight neural network structure and end-to-end learning to realize rapid detection and classification of surface defects. Optimizing the model structure and parameters reduces the amount of computation, improves the operating efficiency and response speed of the model, and enables real-time defect detection and quality control. Meanwhile, by adopting an effective feature extraction method and classifier, rapid detection and classification of different types of surface defects can be realized, improving detection precision and efficiency. In order to meet the real-time requirements of a sanitary product defect detection platform, GSConv is introduced to reduce the complexity of the model while maintaining accuracy. In addition, the invention embeds the EVC module to improve the detection precision of small target samples, and a channel pruning technique greatly reduces the size and computation of the model without losing accuracy, thereby improving the operating efficiency of the model and accelerating its inference speed.
Further, additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a schematic architecture diagram of a target detection neural network model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an architecture of an EVC module according to an embodiment of the invention.
Fig. 3 is a schematic architecture diagram of a GSConv module according to an embodiment of the invention.
Detailed Description
The conception, specific structure, and technical effects produced by the present invention will be clearly and completely described below with reference to the embodiments and the drawings to fully understand the objects, aspects, and effects of the present invention.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly or indirectly fixed or connected to the other feature. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could also be termed a second element, and, similarly, a second element could also be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
Referring to fig. 1 to 3, the method for detecting defects of lightweight sanitary products according to the present invention comprises at least the following steps:
s110, acquiring a surface image of the sanitary article and inputting the surface image into a lightweight target detection neural network model based on deep learning so as to perform data enhancement processing on the surface image and further acquire an initial feature map;
the target detection neural network model comprises a CSP module, an FPN module, a PANet module and a GSConv module;
s200, carrying out multi-scale feature extraction on the initial feature map through a CSP module embedded with an EVC module;
s300, fusing the feature images with different scales through an FPV module, and fusing the feature images with different resolutions through a PANet module;
s400, fusing output characteristics of the CSP module, the FPV module and the PANet module through a lightweight convolution operation of the CSConv module to obtain and output a final detection result.
Referring to FIG. 1, an embodiment of the present invention develops a target detection neural network model based on the deep-learning YOLOv5 network, converting the target detection task into a regression problem. The basic framework of the model is divided into four parts: input, backbone, neck and prediction. In the input part, Mosaic data enhancement is employed to augment the dataset and improve the generalization ability of the model. Various data enhancement operations, such as random cropping, color perturbation and deformation, are applied to the input data to increase the sample size of the dataset, thereby improving the robustness and precision of the model. The backbone part adopts a CSP module for feature extraction; the CSP module is a lightweight feature extraction module that realizes efficient multi-scale feature extraction, performing feature extraction through CSPDarknet53 to improve the accuracy and speed of the model. The neck part is mainly composed of two modules, FPN and PANet, which aggregate image features at different scales. The FPN module fuses feature maps of different scales through a feature pyramid method, improving the detection capability of the model for small targets. The PANet module adopts a path aggregation method to aggregate feature maps of different resolutions together, improving the representation capability of the model. The prediction part fuses the outputs of the backbone network and the neck network through convolution operations to obtain the final detection result.
The model framework of the embodiment of the invention adopts a GE-YOLOv5 framework (see FIG. 1) to improve the accuracy and operating efficiency of target detection. The model is improved on the basis of YOLOv5: an EVC module is embedded in the network in the feature extraction part to improve the detection capability for small targets and the effect of multi-scale feature fusion, and lightweight GSConv convolution is adopted in the feature fusion part to replace standard convolution so as to reduce the number of model parameters and the amount of computation.
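For illustration only, the four-part data flow described above (CSP-style backbone, EVC refinement of the top-level feature, FPN/PANet-style neck, convolutional prediction head) can be sketched as a toy-sized PyTorch module. All layer widths, strides and module names below are illustrative assumptions, not the actual GE-YOLOv5 implementation; the EVC block is stood in for by a 1×1 convolution here and is sketched in detail further below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in, c_out, k=3, s=1):
    """Conv + BN + SiLU, the basic block used throughout YOLOv5-style models."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class GEYoloV5Sketch(nn.Module):
    """Toy-sized stand-in for the GE-YOLOv5 data flow: CSP-style backbone,
    EVC refinement of the top feature, FPN/PANet-style fusion, conv head."""
    def __init__(self, num_outputs=18):  # e.g. 3 anchors x (5 + 1 class)
        super().__init__()
        self.stem = conv_bn_act(3, 16, s=2)
        self.stage3 = conv_bn_act(16, 32, s=2)    # stride 4  -> P3 surrogate
        self.stage4 = conv_bn_act(32, 64, s=2)    # stride 8  -> P4 surrogate
        self.stage5 = conv_bn_act(64, 128, s=2)   # stride 16 -> P5 surrogate
        self.evc = conv_bn_act(128, 128, k=1)     # placeholder for the EVC block
        self.fuse4 = conv_bn_act(64 + 128, 64, k=1)   # FPN-style top-down fusion
        self.fuse3 = conv_bn_act(32 + 64, 32, k=1)
        self.down34 = conv_bn_act(32, 64, s=2)        # PANet-style bottom-up path
        self.heads = nn.ModuleList([nn.Conv2d(c, num_outputs, 1) for c in (32, 64, 128)])

    def forward(self, x):
        p3 = self.stage3(self.stem(x))
        p4 = self.stage4(p3)
        p5 = self.evc(self.stage5(p4))            # refine the top-level feature
        f4 = self.fuse4(torch.cat([p4, F.interpolate(p5, scale_factor=2)], dim=1))
        f3 = self.fuse3(torch.cat([p3, F.interpolate(f4, scale_factor=2)], dim=1))
        n4 = f4 + self.down34(f3)                 # simplified path aggregation
        return [head(f) for head, f in zip(self.heads, (f3, n4, p5))]

# usage: three per-scale prediction maps for a 640x640 image
# print([o.shape for o in GEYoloV5Sketch()(torch.zeros(1, 3, 640, 640))])
```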
Detailed description of step S200
In some embodiments of the present invention, in the feature extraction part, an EVC module is embedded in the CSP module. The EVC module is based on an explicit learnable visual center and improves the small target detection accuracy. Referring to fig. 2, the EVC module captures long-range dependencies through a lightweight multi-layer perceptron (MLP, Multilayer Perceptron) and aggregates the local corner regions of the input image through a parallel learnable visual center mechanism, so that the EVC module extracts a more comprehensive and discriminative feature representation without increasing the computational complexity, effectively improving the detection accuracy for small targets.
Further, the EVC module is composed of two sub-blocks connected in parallel: a lightweight MLP block used for capturing the global long-term dependency of the top-level features (namely, the global information), and a learnable visual center mechanism used for aggregating local region features within the layer so as to preserve the local corner regions (i.e., the local information). The resulting feature maps of the two sub-blocks are concatenated along the channel dimension as the output of the EVC module for downstream recognition.
Further, the input features of the EVC module are first smoothed by a STEM block to ensure that the feature map input to the EVC module is smooth. The STEM block consists of a 7×7 convolution with an output channel size of 256, followed by a batch normalization layer and an activation function layer. Through the STEM block, the EVC module can extract more comprehensive and discriminative features without increasing the computational complexity, effectively improving the accuracy of small target detection.
In an embodiment, the feature extraction of the embodiment of the present invention includes the following steps:
s310, performing feature smoothing processing on the input features of the EVC module through a STEM block; wherein the STEM block comprises a first convolution of 7x 7;
s320, obtaining output characteristics of STEM blocks, and capturing global long-term dependency of top-level characteristics through lightweight MLP blocks to obtain global information;
s330, obtaining output characteristics of STEM blocks, and aggregating local area characteristics in the layer through a learnable visual center mechanism to obtain local information;
s340, connecting the result feature map of the global information and the result feature map of the local information together along the channel dimension to obtain the output feature of the ENC module;
wherein the MLP blocks are connected in parallel with the vision center mechanism.
Further, the output feature of the EVC module is calculated by the following formula:
X = cat(MLP(X_in); LVC(X_in))
wherein X represents the output feature of the EVC module, cat(·) represents feature map concatenation along the channel dimension, MLP(X_in) represents the feature output of the MLP block, LVC(X_in) represents the output feature of the visual center mechanism, and X_in represents the output feature of the STEM block, wherein the output feature of the STEM block is calculated by the following formula:
X_in = σ(BN(Conv7×7(X_4)))
wherein Conv7×7(·) represents a 7×7 convolution function with stride 1, BN(·) represents a batch normalization function, and σ(·) represents a ReLU activation function.
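A minimal sketch of the EVC composition defined by the two formulas above, assuming PyTorch: the MLP and LVC branches are passed in as sub-modules (sketched separately further below, defaulting to identity placeholders here), X4 denotes the top-level backbone feature, and the 256-channel STEM output follows the description.

```python
import torch
import torch.nn as nn

class EVCSketch(nn.Module):
    """X = cat(MLP(X_in); LVC(X_in)), with X_in = ReLU(BN(Conv7x7(X4)))."""
    def __init__(self, channels, mlp_branch=None, lvc_branch=None, stem_out=256):
        super().__init__()
        # STEM block: 7x7 convolution (stride 1) + BN + ReLU for feature smoothing
        self.stem = nn.Sequential(
            nn.Conv2d(channels, stem_out, kernel_size=7, stride=1, padding=3, bias=False),
            nn.BatchNorm2d(stem_out),
            nn.ReLU(inplace=True),
        )
        # nn.Identity() stands in for the real branches when none are supplied
        self.mlp = mlp_branch or nn.Identity()  # global long-range dependencies
        self.lvc = lvc_branch or nn.Identity()  # learnable visual center (local info)

    def forward(self, x4):
        x_in = self.stem(x4)
        # the two branches run in parallel and are concatenated along channels
        return torch.cat([self.mlp(x_in), self.lvc(x_in)], dim=1)
```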
In an application embodiment, the EVC module includes a lightweight MLP block, which consists of two residual modules: a first residual module based on depthwise convolution and a second residual module based on channel MLP, where the input of the second residual module is the output of the first residual module. The second residual module performs a channel scaling operation and a DropPath operation (a regularization method for preventing overfitting) on the input features to improve the generalization ability and robustness of the features. For the first residual module based on depthwise convolution, the output feature X_in of the STEM block is first input into a depthwise convolution layer, and group normalization is applied at the same time. Compared with traditional spatial convolution, the depthwise convolution adopted here not only reduces the computational cost but also improves the feature representation capability; a residual connection with X_in is formed after channel scaling and DropPath.
Specifically, the output features of the STEM block are input into the first residual module based on depthwise convolution, and group normalization is carried out at the same time; wherein the output feature X̃_in of the first residual module is calculated as follows:
X̃_in = DConv(GN(X_in)) + X_in
wherein GN(·) represents a group normalization function, and DConv(·) represents a depthwise convolution function with a kernel size of 1×1.
Specifically, the processed output features of the first residual module are input into the second residual module based on channel MLP, so as to perform the channel scaling operation and the DropPath operation; wherein the output feature MLP(X_in) of the second residual module is calculated as follows:
MLP(X_in) = CMLP(GN(X̃_in)) + X̃_in
wherein CMLP(·) represents the channel MLP function.
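The two residual sub-blocks can be sketched as follows under the formulas above; the GroupNorm configuration, the channel-scaling parameterization, the channel-MLP expansion ratio, and the use of Dropout as a stand-in for DropPath are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

class ChannelScale(nn.Module):
    """Learnable per-channel scaling applied before the residual additions."""
    def __init__(self, channels, init=1e-2):
        super().__init__()
        self.gamma = nn.Parameter(init * torch.ones(1, channels, 1, 1))

    def forward(self, x):
        return self.gamma * x

class MLPBlockSketch(nn.Module):
    """X~_in = DConv(GN(X_in)) + X_in ;  MLP(X_in) = CMLP(GN(X~_in)) + X~_in."""
    def __init__(self, channels, expansion=4, drop_path=0.1):
        super().__init__()
        self.gn1 = nn.GroupNorm(1, channels)
        # depthwise convolution with kernel size 1x1, as in the description
        self.dconv = nn.Conv2d(channels, channels, kernel_size=1, groups=channels)
        self.scale1 = ChannelScale(channels)
        self.gn2 = nn.GroupNorm(1, channels)
        # channel MLP implemented with two 1x1 convolutions
        self.cmlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )
        self.scale2 = ChannelScale(channels)
        self.drop = nn.Dropout(drop_path)  # simple stand-in for stochastic depth

    def forward(self, x_in):
        x_tilde = x_in + self.drop(self.scale1(self.dconv(self.gn1(x_in))))
        return x_tilde + self.drop(self.scale2(self.cmlp(self.gn2(x_tilde))))
```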
In one application embodiment, the EVC module includes a learnable visual center mechanism (LVC, Learnable Visual Center), which is an encoder with an inherent dictionary made up of two components: 1) an inherent codebook B = {b_1, b_2, ..., b_K}, where N = H×W is the total number of spatial positions of the input feature, and H and W are the height and width of the feature map, respectively; 2) a set of learnable visual center scale factors S = {s_1, s_2, ..., s_K}.
Specifically, in the visual center mechanism the features output by the STEM block, X_in, are first encoded by a convolution layer group, where the convolution layer group includes a second 1×1 convolution, a third 3×3 convolution and a fourth 1×1 convolution. After encoding, the features are processed by a CBR block (Convolution-BN-ReLU), which consists of a 3×3 convolution followed by a batch normalization layer and a ReLU activation function. Through the above steps, the encoded feature X_in is input into the codebook.
The output feature of the visual center mechanism is obtained by channel-wise addition of the output feature X_in of the STEM block and the local corner region feature Z; the local corner region feature Z is calculated as follows:
Z = X_in ⊙ δ(Conv1×1(e))
wherein Conv1×1(·) represents a 1×1 convolution function, δ(·) is the sigmoid function, ⊙ represents channel-wise multiplication, and e represents the overall information of the entire image with respect to the K codewords.
It should be noted that, in the embodiment of the present invention, a set of scale factors s_k is used to sequentially map the position-wise differences x̃_i - b_k to the corresponding position information. The information of the entire image about the k-th codeword can be calculated by:
e_k = Σ_{i=1}^{N} ( e^{-s_k‖x̃_i - b_k‖²} / Σ_{j=1}^{K} e^{-s_j‖x̃_i - b_j‖²} ) (x̃_i - b_k)
wherein x̃_i is the i-th pixel feature, b_k is the k-th learnable visual codeword, s_k is the k-th scale factor, x̃_i - b_k is the information of each pixel position relative to the codeword, and K is the total number of visual centers. A fusion function φ, which contains a BN layer, a ReLU layer and a mean layer, is then used to fuse all the e_k. On this basis, the overall information of the entire image about the K codewords is calculated as follows:
e = Σ_{k=1}^{K} φ(e_k)
After the codebook output is obtained, it is further fed into a fully connected layer and a 1×1 convolution layer to predict features that highlight the key classes. After this, channel-wise multiplication is performed between the input feature X_in from the STEM block and the scaling coefficient δ(Conv1×1(e)). Finally, channel-wise addition is performed between the STEM block feature X_in and the local corner region feature Z.
According to the method provided by the embodiment of the invention, embedding the EVC module realizes intra-layer feature adjustment, which not only extracts global long-range dependencies but also preserves the local corner region information of the input image as much as possible, thereby improving the speed and precision of dense prediction tasks.
Detailed description of step S400
In some embodiments of the present invention, the output features of the CSP module, the FPN module and the PANet module are fused through the lightweight convolution operation of the GSConv module, so as to obtain and output the final detection result. Referring to fig. 3, the GSConv module is a convolution module based on group convolution and a group attention mechanism, and includes a Ghost sub-block based on group convolution and a Shrink sub-block based on depthwise separable convolution, thereby effectively improving the feature extraction capability of the model.
Specifically, the GSConv module divides the input feature map into a plurality of groups, performs an independent convolution operation on each group, and exchanges information between different groups through the group attention mechanism. In the convolution operation, the convolution kernel parameters within each group are shared, while the convolution kernel parameters between different groups are independent. In this way, the GSConv module enhances the multi-channel feature extraction capability of the model while effectively reducing the number of parameters and the computational complexity of the model.
In one embodiment, feature fusion of the embodiments of the present invention includes the steps of:
s410, performing channel dimension reduction operation on the input feature map through a Ghost sub-block based on the group convolution to obtain a smaller Ghost feature map. The channel dimension reduction operation comprises the following steps: dividing an input feature map into a plurality of groups, and then randomly selecting channels in each group to obtain a smaller Ghost feature map;
s420, performing lightweight convolution operation on the Ghost feature map through a Shrink sub-block based on depth separable convolution to obtain a final feature map. Wherein the lightweight convolution operation includes: and respectively carrying out deep convolution operation on each input channel, and then carrying out point-by-point convolution on the obtained output to obtain a final characteristic diagram.
In an application embodiment, the Ghost sub-block in the GSConv module is used for improving the calculation efficiency in the network based on the packet convolution, and the Ghost sub-block reduces the calculation amount and the parameter number by carrying out channel dimension reduction on the input feature map. Specifically, the Ghost section obtains a smaller Ghost profile (Ghost feature map) by dividing the input profile into several groups and then randomly selecting channels within each group. This Ghost profile will be passed on to the next Shrink sub-block.
In an application embodiment, the shrnk sub-blocks in the GSConv module are based on depth separable convolution (DWConv, depthwise Separable Convolution) for further compression of the Ghost feature map. Specifically, the shrnk section performs a deep convolution operation on Ghost feature map using DWConv, thereby further reducing the number of channels and the amount of computation. DWConv is based on a lightweight convolution operation and is divided into two steps of depth convolution and point-by-point convolution, namely, the depth convolution operation is respectively carried out on each input channel, and then the obtained output is subjected to point-by-point convolution to obtain a final feature map. By adopting the depth separable convolution, the calculation amount and the parameter number are greatly reduced while the model precision is ensured. The feature map output after the processing of the Shrink sub-block has smaller channel number and fewer parameter number, so that the whole network becomes lighter and more efficient.
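The Ghost and Shrink sub-blocks might be sketched roughly as follows in PyTorch. The group count, the channel-reduction ratio, and the use of a grouped convolution in place of explicit random channel selection are simplifying assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class GSConvSketch(nn.Module):
    """Ghost sub-block: grouped conv + channel reduction -> smaller 'ghost' map.
    Shrink sub-block: depthwise conv per channel + pointwise 1x1 conv
    (depthwise separable convolution) -> final lightweight feature map."""
    def __init__(self, c_in, c_out, groups=4, reduction=2):
        super().__init__()
        # assumes c_in and the reduced width are divisible by `groups`
        c_ghost = max(groups, c_in // reduction)
        self.ghost = nn.Sequential(
            nn.Conv2d(c_in, c_ghost, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(c_ghost),
            nn.SiLU(inplace=True),
        )
        self.shrink = nn.Sequential(
            # depthwise convolution: one filter per input channel
            nn.Conv2d(c_ghost, c_ghost, 3, padding=1, groups=c_ghost, bias=False),
            nn.BatchNorm2d(c_ghost),
            nn.SiLU(inplace=True),
            # pointwise 1x1 convolution mixes channels to the output width
            nn.Conv2d(c_ghost, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        return self.shrink(self.ghost(x))

# usage: replace a standard 3x3 convolution in the neck with GSConvSketch(256, 256)
```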
It should be noted that the GSConv module in the embodiment of the present invention performs group convolution, depthwise convolution and similar operations on the input feature map to compress its channel dimension and amount of computation, which greatly reduces the parameter count and computation of the network while maintaining the model accuracy, making the whole network lighter and more efficient. The GSConv module preserves the connections between feature maps as much as possible; however, if it were used at all stages of the model, the network would become deeper and the added depth would aggravate the resistance to data flow and significantly increase the inference time. By the time the feature maps reach the neck, they have already become slim (the channel dimension is maximized while the width and height dimensions are minimized) and no further transformation is required, so the model of the embodiment of the invention uses the GSConv module only in the neck stage.
In some embodiments of the present invention, when training the target detection neural network model, the Generalized IoU (GIoU) loss is used as the bounding box loss function in prediction, and a weighted non-maximum suppression (NMS) algorithm is used for post-processing. The loss function is as follows:
IoU = |B ∩ B^gt| / |B ∪ B^gt|
L_GIoU = 1 - IoU + |C - (B ∪ B^gt)| / |C|
wherein C is the smallest box covering both B and B^gt, B^gt = (x^gt, y^gt, w^gt, h^gt) is the ground-truth box, and B = (x, y, w, h) is the prediction box.
However, when the prediction box lies entirely inside the ground-truth box and keeps the same size, prediction boxes at different positions cannot be distinguished, so in some embodiments of the invention the GIoU loss is replaced with the Complete IoU (CIoU) loss. On the basis of the GIoU loss, the CIoU loss additionally considers the overlap area, the distance between the center points of the bounding boxes, and the consistency of the aspect ratio of the bounding boxes. The loss function may be defined as:
R_CIoU = ρ²(b, b^gt) / c² + αv
v = (4/π²) (arctan(w^gt/h^gt) - arctan(w/h))²
L_CIoU = 1 - IoU + R_CIoU
wherein R_CIoU is a penalty term defined by minimizing the normalized distance between the center points of the two bounding boxes, b and b^gt represent the center points of B and B^gt, ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest enclosing box covering the two boxes. α is a positive trade-off parameter and v measures the consistency of the aspect ratio.
The trade-off parameter α is defined as:
α = v / ((1 - IoU) + v)
so that the overlap area factor is given a higher regression priority, especially for non-overlapping cases.
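For reference, the CIoU formulation above can be computed as in the following sketch (PyTorch; boxes are assumed to be in (x1, y1, x2, y2) format, and the eps terms are added only for numerical stability).

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """L_CIoU = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v for (x1, y1, x2, y2) boxes."""
    # intersection and union -> IoU
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared center distance, normalized by the enclosing-box diagonal c^2
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency v and trade-off parameter alpha
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():
        alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```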
In an application embodiment, the method of the embodiment of the invention adopts sparse network pruning to optimize the GE-YOLOv5 model, thereby reducing the size and calculation cost of the model and improving the practicability of the model under the condition of maintaining certain precision through model pruning. The model pruning of the method successfully reduces the parameter number of the model to 3/16 of the original model, and maintains the performance and precision similar to those of the original model.
In the process of model training, the invention adopts structured pruning operation as a light weight method of the convolutional neural network. The structured pruning method has the advantages of being convenient for software and hardware realization and deployment, and simultaneously can effectively reduce the complexity of the model without affecting the precision of the network.
Specifically, the structured pruning method of the present invention comprises two steps of pruning and retraining. In the pruning stage, pruning is performed on specific layers, channels or weights in the network, and redundant connections and parameters are removed. And in the retraining stage, retraining the pruned model, so that the pruned model is retrained with training data, and the accuracy similar to that of the original model is achieved. The structured pruning method is based on the structural characteristics of the deep neural network, adopts a targeted pruning strategy, and removes unnecessary parameters in the network on the premise of not affecting the structural integrity and the feature expression capability of the network. The method is based on weight or pruning of channels, and can compress and accelerate the model.
Further, different pruning strategies may be employed for different network layers, such as channel, weight or filter based pruning, etc. The structured pruning method can effectively reduce the complexity of the model, thereby improving the training speed and the reasoning speed of the model and simultaneously reducing the storage space of the model.
Specifically, the weight or channel-based pruning operation of the present invention includes the steps of:
s510, pruning a specific layer, a channel or a weight of the network model, and removing redundant connection and parameters;
s520, retraining the pruned model, enabling the pruned model to be fitted with training data again, and achieving accuracy similar to that of the original model.
Furthermore, the invention uses a structured pruning method: a BN layer with learnable parameters γ and β is introduced into the neural network, which accelerates the training and convergence of the network. The channel data are normalized through translation and scaling, and the feature distribution of the network is learned. Adding an L1 regularization term to the loss function reduces the complexity of the model and yields a sparse network. In the method, a scale factor is introduced for each channel of the BN layer, and a penalty term on γ is added to the loss function so as to obtain a sparse network; the sparse training objective takes the form
L = Σ l(f(x, W), y) + λ Σ_{γ∈Γ} g(γ)
wherein the first term is the loss function of normal network training, the second term is the L1 regularization term on the scale factors (g(γ) = |γ|), λ is the balance factor between the two terms, and Γ is the set of all pruned channels. Finally, the magnitude of the scale factor γ of the sparse network is used as an index for measuring the importance of the network channels of each layer. The structured pruning method adopted here is more convenient for software and hardware implementation and deployment.
Furthermore, on the basis of the lightweight reference network, iterative sparse training gradually reduces the γ parameters of the BN layers in the neural network so that their overall distribution tends to 0, realizing sparsification of the network and facilitating channel screening and pruning. By adjusting the hyper-parameter λ, the network can obtain a suitable sparsity so that redundant channels can be screened out. Removing channels that contribute little does not affect the effective extraction of features by the model and also reduces the complexity of the network. Retraining and fine-tuning can then improve the accuracy and generalization performance of the model. Controlling the channels with a single threshold may cause the channel numbers of the network to lose their original regular structure, so the number of channels is constrained and masks are used for regrouping during channel selection, in order to meet the acceleration requirements of model deployment and realize the reconstruction of a new lightweight network model.
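The sparsity training and channel-selection procedure described above might look roughly like the following sketch in PyTorch; the regularization weight, the global pruning ratio and the threshold-based selection are illustrative assumptions, and the mask-based regrouping and rebuilding of the slimmer network are only indicated in comments.

```python
import torch
import torch.nn as nn

def l1_penalty_on_bn(model, lam=1e-4):
    """L1 sparsity term on the BN scale factors gamma: lam * sum(|gamma|)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def select_prune_channels(model, prune_ratio=0.5):
    """Rank all BN gammas globally and return a per-layer keep mask."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)   # global pruning threshold
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach().abs()
            keep = gamma > threshold
            if not bool(keep.any()):                  # never remove a whole layer
                keep[gamma.argmax()] = True
            masks[name] = keep                        # True = channel is kept
    return masks

# sparse training loop sketch:
#   loss = detection_loss(outputs, targets) + l1_penalty_on_bn(model)
#   loss.backward(); optimizer.step()
# after training: build the slimmer network from the masks, then retrain (fine-tune).
```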
The method for detecting the defects of the lightweight sanitary products has various advantages, such as high precision, quick operation, real-time performance, universality, high expandability and the like, and can provide technical support for automatic defect detection and quality control in manufacturing industry. The method can improve the accuracy and efficiency of target detection, and simultaneously maintain the characteristics of light weight and rapidness, so that the method is suitable for large-scale practical application scenes.
It should be appreciated that the method steps in embodiments of the present invention may be implemented or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, collectively executing on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, network or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, which when read by a computer, is operable to configure and operate the computer to perform the processes described herein. Further, the machine readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media includes instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. The invention may also include the computer itself when programmed according to the methods and techniques of the present invention.
The computer program can be applied to the input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to the non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.
The present invention is not limited to the above embodiments, but can be modified, equivalent, improved, etc. by the same means to achieve the technical effects of the present invention, which are included in the spirit and principle of the present invention. Various modifications and variations are possible in the technical solution and/or in the embodiments within the scope of the invention.

Claims (10)

1. A method of lightweight hygienic product defect detection, the method comprising the steps of:
s110, acquiring a surface image of the sanitary article and inputting the surface image into a lightweight target detection neural network model based on deep learning so as to perform data preprocessing on the surface image and further acquire an initial feature map;
the target detection neural network model comprises a CSP module, an FPN module, a PANet module and a GSConv module;
s200, performing multi-scale feature extraction on the initial feature map through the CSP module embedded with the EVC module;
s300, fusing the feature images with different scales through the FPV module, and fusing the feature images with different resolutions through the PANet module;
s400, fusing output characteristics of the CSP module, the FPV module and the PANet module through a lightweight convolution operation of the CSConv module so as to obtain and output a final detection result.
2. The method of claim 1, wherein the EVC module includes a STEM block, an MLP block and a visual center mechanism; the step S200 includes:
S310, performing feature smoothing processing on the input features of the EVC module through the STEM block; wherein said STEM block comprises a first 7×7 convolution;
S320, obtaining the output features of the STEM block, and capturing the global long-term dependency of the top-level features through the lightweight MLP block to obtain global information;
S330, obtaining the output features of the STEM block, and aggregating the local region features in the layer through the learnable visual center mechanism to obtain local information;
S340, connecting the result feature map of the global information and the result feature map of the local information together along the channel dimension to obtain the output feature of the EVC module;
wherein the MLP block is connected in parallel with the visual center mechanism.
3. The method of claim 2, wherein the output feature of the EVC module is calculated by the following formula:
X = cat(MLP(X_in); LVC(X_in))
wherein X represents the output feature of the EVC module, cat(·) represents feature map concatenation along the channel dimension, MLP(X_in) represents the feature output of the MLP block, LVC(X_in) represents the output feature of the visual center mechanism, and X_in represents the output feature of the STEM block, wherein the output feature of the STEM block is calculated by the following formula:
X_in = σ(BN(Conv7×7(X_4)))
wherein Conv7×7(·) represents a 7×7 convolution function with stride 1, BN(·) represents a batch normalization function, and σ(·) represents a ReLU activation function.
4. The method of claim 3, wherein the MLP block comprises a first residual module and a second residual module, the step S320 comprising:
S321, inputting the output features of the STEM block into the first residual module based on depthwise convolution, and simultaneously carrying out group normalization processing; wherein the output feature X̃_in of the first residual module is calculated by the following formula:
X̃_in = DConv(GN(X_in)) + X_in
wherein GN(·) represents a group normalization function, and DConv(·) represents a depthwise convolution function with a kernel size of 1×1;
S322, inputting the processed output features of the first residual module into the second residual module based on channel MLP to perform a channel scaling operation and a DropPath operation; wherein the output feature MLP(X_in) of the second residual module is calculated by the following formula:
MLP(X_in) = CMLP(GN(X̃_in)) + X̃_in
wherein CMLP(·) represents a channel MLP function.
5. A method according to claim 3, wherein the visual center mechanism comprises a convolution layer group and a CBR block, the step S330 comprising:
S331, encoding the output features of the STEM block through the convolution layer group, wherein the convolution layer group comprises a second 1×1 convolution, a third 3×3 convolution and a fourth 1×1 convolution;
S332, inputting the encoded features into the CBR block for processing, wherein the CBR block comprises a 3×3 convolution, a BN layer and a ReLU activation layer;
S333, inputting the processed features into a codebook;
wherein the output feature of the visual center mechanism is obtained from the output feature X_in of the STEM block and the local corner region feature Z, and is calculated as follows:
LVC(X_in) = X_in ⊕ Z
wherein Z represents the local corner region feature and ⊕ represents channel-wise addition;
wherein the local corner region feature Z is calculated as follows:
Z = X_in ⊙ δ(Conv1×1(e))
wherein Conv1×1(·) represents a 1×1 convolution function, δ(·) is the sigmoid function, ⊙ represents channel-wise multiplication, and e represents the overall information of the entire image with respect to the K codewords.
6. The method of claim 1, wherein the GSConv module includes a Ghost sub-block and a Shrink sub-block, and the step S400 includes:
S410, performing a channel dimension reduction operation on the input feature map through the Ghost sub-block based on group convolution to obtain a smaller Ghost feature map;
wherein the channel dimension reduction operation includes: dividing the input feature map into a plurality of groups, and then randomly selecting channels in each group to obtain the smaller Ghost feature map;
S420, performing a lightweight convolution operation on the Ghost feature map through the Shrink sub-block based on depthwise separable convolution to obtain a final feature map;
wherein the lightweight convolution operation includes: performing a depthwise convolution operation on each input channel respectively, and then performing a point-wise convolution on the obtained outputs to obtain the final feature map.
7. The method of claim 1, wherein the model adopts a weighted non-maximum suppression algorithm in prediction, and the model loss function is calculated as follows:
L_CIoU = 1 - IoU + R_CIoU
wherein L_CIoU represents the model loss function, IoU represents the intersection-over-union of the prediction box and the ground-truth box,
IoU = |B ∩ B^gt| / |B ∪ B^gt|
and R_CIoU represents a penalty term;
wherein the penalty term is obtained by minimizing the normalized distance between the center points of the two bounding boxes, and the penalty term R_CIoU is calculated by the following formula:
R_CIoU = ρ²(b, b^gt) / c² + αv
wherein b and b^gt represent the center points of B and B^gt, ρ(·) represents the Euclidean distance, c represents the diagonal length of the smallest enclosing box covering the two boxes, α represents the trade-off parameter, and v represents the consistency of the aspect ratio.
8. The method of claim 1, wherein a structured pruning operation is performed on the model when the model is trained; the structured pruning operation is based on weights or channels, and comprises the following steps:
s510, pruning a specific layer, a channel or a weight of the network model, and removing redundant connection and parameters;
s520, retraining the pruned model, enabling the pruned model to be fitted with training data again, and achieving accuracy similar to that of the original model.
9. A computer device comprising a memory and a processor, wherein the processor implements the method of any of claims 1 to 8 when executing a computer program stored in the memory.
10. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of any of claims 1 to 8.
CN202310368796.5A 2023-04-07 2023-04-07 Method and device for detecting defects of lightweight sanitary products Pending CN116403042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310368796.5A CN116403042A (en) 2023-04-07 2023-04-07 Method and device for detecting defects of lightweight sanitary products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310368796.5A CN116403042A (en) 2023-04-07 2023-04-07 Method and device for detecting defects of lightweight sanitary products

Publications (1)

Publication Number Publication Date
CN116403042A true CN116403042A (en) 2023-07-07

Family

ID=87017495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310368796.5A Pending CN116403042A (en) 2023-04-07 2023-04-07 Method and device for detecting defects of lightweight sanitary products

Country Status (1)

Country Link
CN (1) CN116403042A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078967A (en) * 2023-09-04 2023-11-17 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method
CN117078967B (en) * 2023-09-04 2024-03-01 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method

Similar Documents

Publication Publication Date Title
US11928602B2 (en) Systems and methods to enable continual, memory-bounded learning in artificial intelligence and deep learning continuously operating applications across networked compute edges
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN112329760B (en) Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN113158862A (en) Lightweight real-time face detection method based on multiple tasks
CN109766873B (en) Pedestrian re-identification method based on hybrid deformable convolution
CN114972213A (en) Two-stage mainboard image defect detection and positioning method based on machine vision
CN112529146B (en) Neural network model training method and device
CN113052006B (en) Image target detection method, system and readable storage medium based on convolutional neural network
CN113297972B (en) Transformer substation equipment defect intelligent analysis method based on data fusion deep learning
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
CN116403042A (en) Method and device for detecting defects of lightweight sanitary products
CN111696136A (en) Target tracking method based on coding and decoding structure
CN115588237A (en) Three-dimensional hand posture estimation method based on monocular RGB image
CN116434012A (en) Lightweight cotton boll detection method and system based on edge perception
CN115861650A (en) Shadow detection method and device based on attention mechanism and federal learning
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN113657414A (en) Object identification method
CN115761240B (en) Image semantic segmentation method and device for chaotic back propagation graph neural network
CN116309213A (en) High-real-time multi-source image fusion method based on generation countermeasure network
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
Rishita et al. Dog breed classifier using convolutional neural networks
CN115880477A (en) Apple detection positioning method and system based on deep convolutional neural network
CN112541469B (en) Crowd counting method and system based on self-adaptive classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination